seald

HW5: seald 🦭

Submission

We will be using GitHub for distributing and collecting your assignments. At this point you should already have a repository created on your behalf in the cs4157-hw GitHub org. Follow the instructions on the class listserv to get access to that repository.

To obtain the skeleton files that we have provided for you, you need to clone your private repository. Your repository page should have a button titled “< > Code”. Click on it, and under the “Clone” section, select “SSH” and copy the link from there. For example:

$ git clone git@github.com:cs4157-hw/hw5-<id>-<your-team-name>.git

The TAs will use scripts to download, extract, build, and display your code. It is essential that you DO NOT change the names of the skeleton files provided to you. If you deviate from the requirements, the grading scripts will fail and you will not receive credit for your work.

You need to have at least 5 git commits total, but we encourage you to have many more. Your final submission should be pushed to the main branch.

As always, your submission should not contain any binary files. Your program must compile with no warnings, and should not produce any memory leaks or errors when run under valgrind. This requirement applies to all parts of the assignment.

At a minimum, README.txt should contain the following info:

The description should indicate whether your solution for the part is working or not. You may also want to include anything else you would like to communicate to the grader, such as extra functionality you implemented or how you tried to fix your non-working code.

Answers to written questions, if applicable, must be added to the skeleton file we have provided.

Overview

In this assignment, we’ll gradually build out a linker capable of linking together an arbitrary number of object files into a runnable executable.

We’ll start by writing out a minimal ELF file with dummy values in part 1 that can be read by readelf -a. In part 2, we’ll build out a linker that can link a single simple program generated from assembly code. We’ll then add symbol information and strings to our generated executable in part 3. To support simple C programs, we’ll implement single-file relocations in part4 . Finally, we’ll implement multi-file resolution & relocations in part 5 so that we can link together an arbitrary number of object files. We will not implement static or dynamic linking with libraries.

Part 1: Minimal ELF File

In this first part, we’ll write gen-exe, a program that generates a minimal ELF executable file called a.out. Our goal is to be able to run readelf -a a.out to inspect generated executable without any warnings or errors. We can do this by simply writing the ELF Header with mostly placeholder values.

Here are some notes:

We’ve placed our reference executable for this part on SPOC at /opt/asp/bin/gen-exe.

Part 2: Minimal Executable

The objective for this part is to link the hello.o object file under test/hello. It is an object file assembled from hello.s, which is a minimal implementation of a hello world program written in assembly.

Note that for this assigment, we will restrict the kinds of sections generated by our linker for simplicity. We are primarily concerned with generating the .text section, where the executable’s runnable code will be. We will not implement support for global/static variables (.data and .bss) and read-only C string literals (.rodata).

Here’s a high-level overview of how we’ll layout the generated ELF executable in this part:

+----------------------+   +
| ELF Header           |   |
+----------------------+   |  Segment 0
| Program Header Table |   |
+----------------------+   +
| .text                |   |  Segment 1
+----------------------+   +
| Section Header Table |
+----------------------+

The Program Header Table will list two loadable segments in the generated executable:

The Section Header Table will list two sections in the generated executable

You’ll write your implementation in part2/cld.c. Your linker should first mmap in the input object file and check that it’s a valid ELF object file. Use the structures provided by <elf.h> to parse the file and perform some validity checks. Your linker should terminate with an error if the input object file is invalid.

One way to navigate an ELF file is by searching through the section header table:

As you layout each piece of the ELF file, you’ll have to keep track of two offsets:

As illustrated above, everything is laid out after the ELF Header. Note that not all pieces of the ELF data are meant to loaded into memory and therefore only have a file offset – keep this in mind for the rest of the assignment.

You’ll also see that certain pieces of the ELF file have certain alignment requirements:

We will not be implementing strings and symbol information in this part. Most notably, we won’t have a .shstrtab in the output executable yet. You’ll have to zero out the following for now:

Be sure to update the ELF Header for the new data we’ve generated in this part, namely:

You should be able to link and run the simple hello program. For your reference, you should see something like the following when you run readelf -a a.out for this part:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          8192 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         2
  Size of section headers:           64 (bytes)
  Number of section headers:         2
  Section header string table index: 0

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0] <no-strings>      NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] <no-strings>      PROGBITS         0000000000401000  00001000
       0000000000000029  0000000000000000  AX       0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000b0 0x00000000000000b0  R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x0000000000000029 0x0000000000000029  R E    0x1000

There is no dynamic section in this file.

There are no relocations in this file.
No processor specific unwind information to decode

No version information found in this file.

You’ll see that there is no symbol or string information – we’ll do that next.

Part 3: Symbols and Strings

To support symbols and string information, we’ll have to add three more sections to our generated ELF executable. Start by copying your part 2 implementation into part3/.

Here’s the updated layout for our ELF executable:

+----------------------+   +
| ELF Header           |   |
+----------------------+   |  Segment 0
| Program Header Table |   |
+----------------------+   +
| .text                |   |  Segment 1
+----------------------+   +
| .symtab              |
+----------------------+
| .strtab              |
+----------------------+
| .shstrtab            |
+----------------------+
| Section Header Table |
+----------------------+

Adding the symbol table (.symtab) and its corresponding string table (.strtab) will enable readelf -a to display symbol information for the executable. Add two new Section Headers for .symtab and .strtab. The ELF spec’s discussion on sections explains two special Section Header fields that must be filled out for .symtab:

Because our target program hello is so simple, you don’t need to build .symtab and .strtab from scratch; you can almost copy them as they appear in the input object file. You will have to adjust two fields in each Symbol Table entry:

Don’t forget to include the NULL symbol at index 0 in when generating the Symbol Table.

The section header string table (.shstrtab) should have five strings: an empty string for the NULL section and then one for each section listed above. Update the ELF Header’s e_shstrndx field to reflect the index of the .shstrtab header in the Section Header Table (4). You’ll also have to update the sh_name field of each Section Header to contain the index into the .shstrtab at which the section’s string is located at.

Now that the Section Header Table comes after .shstrtab, you’ll have to ensure the start of the table is properly aligned. Ensure that the table starts at an 8-byte boundary.

Once you properly generate these three sections, readelf -a will be able to display section names and symbol info. Make sure that the generated symbol addresses in the symbol table match up with their locations in the generated executable (use objdump -d to view the executable code disassembly).

Part 4: Single File Relocations

Our goal for this part is to link test/echo1/echo.o. In order to link object files generated by a C program, we’ll have to perform symbol relocations to resolve target addresses for function calls. This is the case even if your C program’s main() function doesn’t call any other functions because your main() function gets called by the C library’s startup routines. You’ll start by copying your part 3 solution into part4/.

Linking with the actual GNU C stdlib (glibc) is very complex. For the sake of simplicity in this assignment, we will instead link with our own minimal C library, my_libc. It provides the startup routines and some syscall wrappers for use in writing C programs.

The ELF spec’s discussion on relocations explains how to work with relocation entries. Also see CSAPP 7.7 on how to apply relocations. A few notes:

Recall the sh_info field of the Symbol Table Section Header from part 3. We said that we’d always set it to 1 to simplify our implementation, but this meant that we will not emit local symbols (other than the NULL symbol) in our generated executable’s symbol table. Now that we’re working with C programs, you’ll encounter local symbols frequently (e.g., static functions are local; see the symbol table for echo1/echo.o). Adjust your implementation to ensure that we only emit global symbols (and the NULL symbol). You’ll also have to regenerate .strtab to ensure only global symbol names (and the initial empty string for the NULL symbol) are included.

You’ll also have to change the ELF Header’s e_entry field. The entrypoint of the executable won’t necessarily be the start of the .text section anymore. Instead, change this to the runtime address of the _start routine, which is provided by my_libc.

Part 5: Linking Multiple Object Files

Start by copying your part 4 solution into part5/. In this final part, we’ll link the test/echo2 and test/sum programs, which are composed of multiple object files.

Each object file will have a mixture of symbol definitions and external symbol references in its symbol table. External symbol references refer to symbols define in some other file and appear as undefined symbols in the referencer’s symbol table. The linker’s job is to merge the input .text sections and symbols into a single executable. Here’s an outline of the steps to be performed:


Last updated: 2024-04-03