We will be using GitHub for distributing and collecting your assignments. At this point you should already have a repository created on your behalf in the cs4157-hw GitHub org. Follow the instructions on the class listserv to get access to that repository.
To obtain the skeleton files that we have provided for you, you need to clone your private repository. Your repository page should have a button titled “< > Code”. Click on it, and under the “Clone” section, select “SSH” and copy the link from there. For example:
$ git clone git@github.com:cs4157-hw/hw5-<id>-<your-team-name>.git
The TAs will use scripts to download, extract, build, and display your code. It is essential that you DO NOT change the names of the skeleton files provided to you. If you deviate from the requirements, the grading scripts will fail and you will not receive credit for your work.
You need to have at least 5 git commits total, but we encourage you to have many
more. Your final submission should be pushed to the main
branch.
As always, your submission should not contain any binary files. Your program
must compile with no warnings, and should not produce any memory leaks or errors
when run under valgrind
. This requirement applies to all parts of the
assignment.
At a minimum, README.txt
should contain the following info:
The description should indicate whether your solution for the part is working or not. You may also want to include anything else you would like to communicate to the grader, such as extra functionality you implemented or how you tried to fix your non-working code.
Answers to written questions, if applicable, must be added to the skeleton file we have provided.
In this assignment, we’ll gradually build out a linker capable of linking together an arbitrary number of object files into a runnable executable.
We’ll start by writing out a minimal ELF file with dummy values in part 1 that
can be read by readelf -a
. In part 2, we’ll build out a linker that can link a
single simple program generated from assembly code. We’ll then add symbol
information and strings to our generated executable in part 3. To support simple
C programs, we’ll implement single-file relocations in part4 . Finally, we’ll
implement multi-file resolution & relocations in part 5 so that we can link
together an arbitrary number of object files. We will not implement static or
dynamic linking with libraries.
In this first part, we’ll write gen-exe
, a program that generates a minimal
ELF executable file called a.out
. Our goal is to be able to run
readelf -a a.out
to inspect generated executable without any warnings or
errors. We can do this by simply writing the ELF Header with mostly placeholder
values.
Here are some notes:
gen-exe
should write out a.out
in the current directory. You should
overwrite the file if it already exists and create it with 0755
permissions.
Note that if you have a file mode creation mask set using the umask
command,
the resulting file permissions will be different.man elf
and the
ELF Specification.
Use these resources to learn about the various ELF structures.a.out
is a minimal ELF file, just write 0 for fields in the ELF Header
that refer to offsets, addresses, and counts. You’ll revisit these fields once
we start building out a real linker in the later parts of this assignment.
However, there are many constant field values that you should fill out now. To
point out a couple:
e_ident
: Most of the bytes in here are straightforward. Note that for
OSABI, the ELF specification says to specify ELFOSABI_NONE
since we won’t
be using any special Linux extensions.e_type
: Specify ET_EXEC
since our linker will only support static
linking.We’ve placed our reference executable for this part on SPOC at
/opt/asp/bin/gen-exe
.
The objective for this part is to link the hello.o
object file under
test/hello
. It is an object file assembled from hello.s
, which is a minimal
implementation of a hello world program written in assembly.
Note that for this assigment, we will restrict the kinds of sections generated
by our linker for simplicity. We are primarily concerned with generating the
.text
section, where the executable’s runnable code will be. We will not
implement support for global/static variables (.data
and .bss
) and read-only
C string literals (.rodata
).
Here’s a high-level overview of how we’ll layout the generated ELF executable in this part:
+----------------------+ +
| ELF Header | |
+----------------------+ | Segment 0
| Program Header Table | |
+----------------------+ +
| .text | | Segment 1
+----------------------+ +
| Section Header Table |
+----------------------+
The Program Header Table will list two loadable segments in the generated executable:
The Section Header Table will list two sections in the generated executable
.text
.You’ll write your implementation in part2/cld.c
. Your linker should first
mmap
in the input object file and check that it’s a valid ELF object file. Use
the structures provided by <elf.h>
to parse the file and perform some validity
checks. Your linker should terminate with an error if the input object file is
invalid.
One way to navigate an ELF file is by searching through the section header table:
e_shoff
field will tell you the offset into the input
object file where the Section Header Table is located at.sh_name
a field, which is an index into the
.shstrtab
, which is the string table for section headers. The ELF
Header’s e_shstrndx
field gives you the index into the Section Header Table
that contains the entry for .shstrtab
. The Section Header’s sh_offset
field contains the offset into the file at which the section’s content is
located at..text
), you can iterate through the
Section Header Table and look up the name of the section in .shstrtab
.As you layout each piece of the ELF file, you’ll have to keep track of two offsets:
0x400000
.As illustrated above, everything is laid out after the ELF Header. Note that not all pieces of the ELF data are meant to loaded into memory and therefore only have a file offset – keep this in mind for the rest of the assignment.
You’ll also see that certain pieces of the ELF file have certain alignment requirements:
.text
will have a variable size because it contains the program code.
Whatever comes after .text
will have to be properly aligned. In this part, the
section header table comes after .text
, but that’ll change in later parts. For
simplcity, just align whatever comes after .text
to a page boundary.We will not be implementing strings and symbol information in this part. Most
notably, we won’t have a .shstrtab
in the output executable yet. You’ll have
to zero out the following for now:
sh_name
field in the section headers.e_shstrndx
field in the ELF Header.Be sure to update the ELF Header for the new data we’ve generated in this part, namely:
e_phoff
: The offset into the file at which the Program Header Table is located at.e_shoff
: The offset into the file at which the Section Header Table is located at.e_phnum
: Number of program headerse_shnum
: Number of section headerse_entry
: The entry point of the executable. At this point, our linker can
only support bareminimum assembly-based programs. Our provided hello
program
lays out the entry point right at the beginning of .text
, so specify the
run-time address of .text
for this field. This will change once we support
more complicated C programs.You should be able to link and run the simple hello
program. For your
reference, you should see something like the following when you run readelf -a
a.out
for this part:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401000
Start of program headers: 64 (bytes into file)
Start of section headers: 8192 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 2
Size of section headers: 64 (bytes)
Number of section headers: 2
Section header string table index: 0
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] <no-strings> NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] <no-strings> PROGBITS 0000000000401000 00001000
0000000000000029 0000000000000000 AX 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000b0 0x00000000000000b0 R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x0000000000000029 0x0000000000000029 R E 0x1000
There is no dynamic section in this file.
There are no relocations in this file.
No processor specific unwind information to decode
No version information found in this file.
You’ll see that there is no symbol or string information – we’ll do that next.
To support symbols and string information, we’ll have to add three more sections
to our generated ELF executable. Start by copying your part 2 implementation
into part3/
.
Here’s the updated layout for our ELF executable:
+----------------------+ +
| ELF Header | |
+----------------------+ | Segment 0
| Program Header Table | |
+----------------------+ +
| .text | | Segment 1
+----------------------+ +
| .symtab |
+----------------------+
| .strtab |
+----------------------+
| .shstrtab |
+----------------------+
| Section Header Table |
+----------------------+
Adding the symbol table (.symtab
) and its corresponding string table
(.strtab
) will enable readelf -a
to display symbol information for the
executable. Add two new Section Headers for .symtab
and .strtab
. The ELF spec’s discussion on
sections
explains two special Section Header fields that must be filled out for .symtab
:
sh_link
: The section header index of the associated string table – 3 for .strtab
in our layout.sh_info
: One greater than the symbol table index of the last local symbol.
This will always be 1 for us. For simplicty, we’ll only emit global symbols in the
generated symbol table. You don’t need to do anything special to ensure this at
this point as we can only support simple assembly-based programs like hello
that only have global symbols. We will revisit this in the next part. Note that
we write 1 because the symbol table has a NULL entry – see the ELF spec’s
discussion of
symbols for
more details.Because our target program hello
is so simple, you don’t need to build
.symtab
and .strtab
from scratch; you can almost copy them as they appear in
the input object file. You will have to adjust two fields in each Symbol Table
entry:
st_shndx
: The section header index of the section this symbol comes from. There are a few cases to consider:
SHN_ABS
: The symbol is not defined with respect to a section. You can just
emit it as-is.SHN_UNDEF
: The symbol is not defined in the object file (i.e., it is an
external reference). Since we only support a single input object file now,
you’ll have to throw an error for now. We’ll come back to this once we support
multiple input object files and global symbol resolution.SHN_COMMON
and SHN_XINDEX
: We will not support these kinds of symbols;
throw an error if you encounter them..text
. Adjust this field’s value to reflect the index of .text
in our generated output executable, which is 1. You’ll also have to adjust the
st_value
field for this case.st_value
: In the input object file, this holds the byte offset into the
section where the symbol is defined. Since we’re producing an executable, we
need to change this to be the virtual address at which the symbol is defined.
For this part, you can simply add the offset to the virtual address of the
.text
section in our executable.Don’t forget to include the NULL symbol at index 0 in when generating the Symbol Table.
The section header string table (.shstrtab
) should have five strings: an empty
string for the NULL section and then one for each section listed above. Update
the ELF Header’s e_shstrndx
field to reflect the index of the .shstrtab
header in the Section Header Table (4). You’ll also have to update the sh_name
field of each Section Header to contain the index into the .shstrtab
at which
the section’s string is located at.
Now that the Section Header Table comes after .shstrtab
, you’ll have to ensure
the start of the table is properly aligned. Ensure that the table starts at an
8-byte boundary.
Once you properly generate these three sections, readelf -a
will be able to
display section names and symbol info. Make sure that the generated symbol
addresses in the symbol table match up with their locations in the generated
executable (use objdump -d
to view the executable code disassembly).
Our goal for this part is to link test/echo1/echo.o
. In order to link object
files generated by a C program, we’ll have to perform symbol relocations to
resolve target addresses for function calls. This is the case even if your C
program’s main()
function doesn’t call any other functions because your
main()
function gets called by the C library’s startup
routines. You’ll start by copying your part 3 solution into part4/
.
Linking with the actual GNU C stdlib (glibc) is very complex. For the sake of
simplicity in this assignment, we will instead link with our own minimal C
library, my_libc
. It provides the startup routines and some syscall wrappers
for use in writing C programs.
The ELF spec’s discussion on relocations explains how to work with relocation entries. Also see CSAPP 7.7 on how to apply relocations. A few notes:
.text
section are stored in the .rela.text
section. We will not handle relocations for other sections since we are only
handling .text
.r_offset
field contains the byte offset into the section where the
relocation is to be applied. For .text
relocations, this refers to the
callsite in the input object’s .text
where we need to fill in the resolved
target function address.r_info
field contains two pieces of information:
ELF64_R_TYPE(r_info)
gives you the relocation type. We are
only interested in handling R_X86_64_PC32
/R_X86_64_PLT32
, which are
used for PC-relative encodings, where the function call target is relative
to the runtime address of the callsite. You should throw an error if you
encounter any other kind of relocation.ELF64_R_SYM(r_info)
gives you the index into the input object’s symbol
table of the symbol that is being referred to.Recall the sh_info
field of the Symbol Table Section Header from part 3. We
said that we’d always set it to 1 to simplify our implementation, but this meant
that we will not emit local symbols (other than the NULL symbol) in our
generated executable’s symbol table. Now that we’re working with C programs,
you’ll encounter local symbols frequently (e.g., static functions are local; see
the symbol table for echo1/echo.o
). Adjust your implementation to ensure that
we only emit global symbols (and the NULL symbol). You’ll also have to
regenerate .strtab
to ensure only global symbol names (and the initial empty
string for the NULL symbol) are included.
You’ll also have to change the ELF Header’s e_entry
field. The entrypoint of
the executable won’t necessarily be the start of the .text
section anymore.
Instead, change this to the runtime address of the _start
routine, which is
provided by my_libc
.
Start by copying your part 4 solution into part5/
. In this final part, we’ll
link the test/echo2
and test/sum
programs, which are composed of multiple
object files.
Each object file will have a mixture of symbol definitions and external symbol
references in its symbol table. External symbol references refer to symbols
define in some other file and appear as undefined symbols in the referencer’s
symbol table. The linker’s job is to merge the input .text
sections and
symbols into a single executable. Here’s an outline of the steps to be
performed:
.text
sections into a single output .text
sectionst_name
: Refer to the newly generated string tablest_value
: Refer to the memory address of the symbol in the output
.text
section, which is the concatenation of all the input .text
sectionsLast updated: 2024-04-03