We will be using GitHub for distributing and collecting your assignments. At this point you should already have a repository created on your behalf in the cs4157-hw GitHub org. Follow the instructions on the class listserv to get access to that repository.
To obtain the skeleton files that we have provided for you, you need to clone your private repository. Your repository page should have a button titled "< > Code". Click on it, and under the "Clone" section, select "SSH" and copy the link from there. For example:
$ git clone git@github.com:cs4157-hw/hw4-<id>-<your-team-name>.git
The TAs will use scripts to download, extract, build, and display your code. It is essential that you DO NOT change the names of the skeleton files provided to you. If you deviate from the requirements, the grading scripts will fail and you will not receive credit for your work.
You need to have at least 5 git commits total, but we encourage you to have many
more. Your final submission should be pushed to the main branch.
As always, your submission should not contain any binary files. Your program
must compile with no warnings, and should not produce any memory leaks or errors
when run under valgrind. This requirement applies to all parts of the
assignment.
At a minimum, README.txt should contain the following info:
The description should indicate whether your solution for the part is working or not. You may also want to include anything else you would like to communicate to the grader, such as extra functionality you implemented or how you tried to fix your non-working code.
Answers to written questions, if applicable, must be added to the skeleton file we have provided.
In this assignment, you will implement a rudimentary debugger called ldb.
In part 1, you will be introduced to the ptrace() system call, which provides
the core debugger functionality.
In part 2, we'll start by implementing a few simple commands for ldb:
step/continue and read/write memory. We'll add support for breakpoints in
part 3. Lastly, we'll implement stack backtracing in part 4.
icount
In the part1/ directory, you'll find skeleton code for a program called
icount. Its usage is as follows:
./icount
usage: ./icount <target> [args...]
It takes a target program and command line arguments to execute. icount will
count the number of assembly instructions that the target program executes and
print it to stdout. One line of C code can translate into many assembly
instructions. As such, icount will step through the assembly instructions in
the target program, not lines of C code.
icount will use the ptrace() system call to trace the execution of the
target program. Read the overview of the ptrace() system call in man ptrace,
up to and including the description of PTRACE_TRACEME.
icount will fork() and have the child process invoke the PTRACE_TRACEME
command to indicate that the child process is to be traced by the parent
process. The child process will then go on to execute the target program. The
parent process will drive the child process forward by repeatedly invoking the
PTRACE_SINGLESTEP command. Read about PTRACE_SINGLESTEP in man ptrace and
how to query the tracee's status in man waitpid.
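The fork/trace/step flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the names count_steps() and tracee_finished() are hypothetical, error handling is thin, signals other than SIGTRAP are not forwarded, and exactly which stops should count toward the total is something to check against the reference executable.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* True once the wait status says the tracee has exited or been killed. */
int tracee_finished(int status)
{
    return WIFEXITED(status) || WIFSIGNALED(status);
}

/* Run argv[0] under ptrace and count the PTRACE_SINGLESTEP requests issued
 * until it terminates. Returns the count, or -1 on a parent-side error.
 * (Hypothetical helper; the skeleton code may be organized differently.) */
long count_steps(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then become the target program.
         * A successful exec stops the child so the parent can take over. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    long steps = 0;
    int status;

    for (;;) {
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (tracee_finished(status))
            break;
        /* Tracee is stopped: let it execute exactly one more instruction.
         * A fuller version would inspect WSTOPSIG(status) and forward
         * non-SIGTRAP signals through the last argument. */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) < 0)
            return -1;
        steps++;
    }
    return steps;
}
```

A caller would pass the target's argument vector, for example the slice of argv that follows the program name, and print the returned count.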
We've placed a reference executable under /opt/asp/bin on SPOC. Your
icount's output should match ours for simple programs like the sum example
from lecture. However, note that more complex programs probably won't execute
the same number of instructions every time they run, so your output might differ
from the reference executable for such programs.
Under the part2/ directory, you'll find skeleton code for the ldb debugger.
ldb.c implements the command-line interface for the debugger. It stubs out the
various features that you will implement in this assignment into helper
functions in ldb-info.c and ldb-step.c. The implementation of ldb.c is
complete already and will stay fixed throughout the assignment; do not modify
it.
Before you start coding, read through ldb.c carefully and make sure you
understand the execution structure of ldb. For each stubbed out function in
ldb-info.c and ldb-step.c, we've indicated in which part you'll implement
the function.
We've placed our ldb reference executable under /opt/asp/bin on SPOC. This
is our part 4 solution, but it applies to all parts.
In this part, we will implement the following ldb commands:
- s: Step forward by one instruction
- c: Continue execution
- i: Print the current values of some general purpose registers
- x <addr>: Print the contents of the tracee's memory at a specified address
- e <addr> <byte>: Write a byte to the tracee's memory at a specified address
Note that <addr> and <byte> specified above (and in the rest of ldb) must
be encoded in hexadecimal.
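For reference, hexadecimal arguments like these are commonly parsed with strtoull() in base 16. The sketch below is only an illustration: parse_hex() and parse_hex_byte() are hypothetical names, and the provided ldb.c may already handle argument parsing in its own way.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hexadecimal string such as "404040" or "0x404040".
 * Returns 0 and stores the value in *out, or returns -1 on bad input. */
int parse_hex(const char *s, uint64_t *out)
{
    char *end;

    if (s == NULL || *s == '\0')
        return -1;

    errno = 0;
    unsigned long long v = strtoull(s, &end, 16);
    /* Reject overflow, input with no digits, and trailing garbage. */
    if (errno != 0 || end == s || *end != '\0')
        return -1;

    *out = (uint64_t)v;
    return 0;
}

/* Parse a single byte; rejects values that do not fit in 8 bits. */
int parse_hex_byte(const char *s, uint8_t *out)
{
    uint64_t v;

    if (parse_hex(s, &v) != 0 || v > 0xff)
        return -1;
    *out = (uint8_t)v;
    return 0;
}
```

Note that strtoull() also accepts leading whitespace and a sign, so a stricter parser would check the first character itself.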
ptrace() supports various commands to implement these features. See
man ptrace for more details on PTRACE_SINGLESTEP, PTRACE_CONT,
PTRACE_GETREGS, PTRACE_PEEKDATA, and PTRACE_POKEDATA.
A few notes on implementation:
- See the RETURN VALUE section of man ptrace for how to correctly interpret
  the return value of PTRACE_PEEKDATA.
- PTRACE_POKEDATA will write 8 bytes to the specified address. To implement e,
  you must first read 8 bytes at the target address, edit the desired byte,
  and then write back the 8 bytes.
Since we don't have breakpoints yet, we can replicate the functionality by using
pause(). See sample/sum-pause.c: after performing some work, it calls
pause() and suspends execution. ldb will seem to hang at this point. We can
regain control by sending the tracee SIGINT.
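The read-modify-write sequence for the e command, together with the PTRACE_PEEKDATA return-value caveat from the implementation notes, might look something like the sketch below. The helper names replace_byte() and poke_byte() are hypothetical and not part of the skeleton; the code assumes a Linux target where long is 8 bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Return `word` with the byte at `offset` (0 = lowest address) set to `b`.
 * Going through an unsigned char pointer keeps this independent of byte
 * order. Out-of-range offsets leave the word unchanged. */
long replace_byte(long word, unsigned offset, uint8_t b)
{
    unsigned char *bytes = (unsigned char *)&word;

    if (offset < sizeof(word))
        bytes[offset] = b;
    return word;
}

/* Overwrite the single byte at `addr` in the tracee.
 * Returns 0 on success, -1 on error. */
int poke_byte(pid_t pid, uintptr_t addr, uint8_t b)
{
    /* PEEKDATA returns the data itself, so -1 can be a valid word.
     * Clear errno first and check it afterwards to detect real errors. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1;

    /* The byte at `addr` is byte 0 of the word that was read. */
    word = replace_byte(word, 0, b);

    if (ptrace(PTRACE_POKEDATA, pid, (void *)addr, (void *)word) < 0)
        return -1;
    return 0;
}
```

Keeping the byte edit in a separate pure function makes it easy to check without a live tracee.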
When a signal is sent to a tracee, the tracee is first stopped to give the
tracer the opportunity to inspect the signal. It is then up to the tracer to
forward the signal (or not) to the tracee on the next PTRACE_SINGLESTEP or
PTRACE_CONT. Our code in ldb.c will forward all signals to the tracee except
SIGTRAP, which is used by ptrace() and the OS kernel to notify the tracer of
various debugging events. For example, PTRACE_SINGLESTEP will arrange for a
SIGTRAP to be sent to the tracee, which causes the tracee to stop and the
tracer to regain control. This signal is meant for the tracer and shouldn't be
forwarded to the tracee. There are more nuances to SIGTRAP that we are
ignoring for simplicity (e.g., we don't forward user-generated SIGTRAPs to the
tracee).
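To make the forwarding rule concrete, here is one way the decision could be expressed. This is an illustrative sketch only: signal_to_forward() is a hypothetical name, it derives the signal from a waitpid() status rather than from the signal info that ldb.c actually uses, and it is not the code in ldb.c.

```c
#include <signal.h>
#include <sys/wait.h>

/* Given the wait status of a tracee, return the signal number to deliver on
 * the next resume request, or 0 to deliver nothing. */
int signal_to_forward(int status)
{
    if (!WIFSTOPPED(status))
        return 0;   /* tracee is not in a stop, nothing to forward */

    int sig = WSTOPSIG(status);
    if (sig == SIGTRAP)
        return 0;   /* debugging event: meant for the tracer only */
    return sig;     /* e.g. SIGINT sent with kill: deliver on resume */
}
```

The result would be passed as the final argument of the resume request, for example ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig); a value of 0 there means no signal is delivered.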
The main() function calls ldb_read_signal() to retrieve the current signal
info for the stopped tracee. Implement this function in this part using the
PTRACE_GETSIGINFO command.
To test your implementation, run ldb on the sum-pause sample program. When
the program hangs on the call to pause(), use kill to send a SIGINT to the
tracee to give control back to ldb. You should be able to examine the contents
of the a array, and modify them such that the final sum printed at the end of
the program changes.
Start by copying your part2/ directory into part3/.
In this part, we will implement the following ldb commands:
- b <addr>: Set a breakpoint in the tracee at the specified address
- b: List all breakpoints currently set
- d: Delete all breakpoints
Since ldb doesn't support source-code-level debugging, we'll have to specify
the memory address of the program instructions at which we want to set
breakpoints. You can use objdump -d <executable> to view the assembly code of
the executable along with the addresses of where the instructions will be loaded
into memory. Normally, however, linkers produce position-independent executables
(PIE), meaning that the memory address at which the executable is loaded can
change from run to run. Thus, the addresses listed in the objdump output won't
be the same as the addresses of the program code at runtime.
You'll see that the Makefile in the sample/ directory specifies the
-no-pie linker flag. This flag ensures that the executable produced is not
position-independent, and therefore the memory addresses in the objdump output
will match the program code addresses at runtime.
Implement breakpoints using the INT3 instruction as follows:
- When a breakpoint is set at addr, ldb will replace the byte at addr with
  INT3 (0xCC) and save the byte so it can be restored later.
- When a breakpoint is deleted, ldb will replace the INT3 byte at the
  breakpoint address with the original byte that we saved. Be sure to unwind
  %rip if the program was at a breakpoint when breakpoints get deleted with
  the d command.
- Update ldb_step() and ldb_cont() to account for breakpoints. Make sure s and
  c don't skip over the original instruction at the breakpoint.
  - When the tracee stops at a breakpoint at addr, %rip will be at addr + 1.
  - ldb temporarily restores the saved byte at addr, unwinds %rip to addr,
    steps forward to execute the instruction at addr, and then resets the
    byte at addr to INT3. Use the PTRACE_SETREGS command to update the value
    of %rip; implement ldb_write_regs() for this.
  - In your c implementation, it's possible that the tracee gets stopped
    because that instruction's execution caused a non-SIGTRAP signal to be
    sent to the tracee. In that case, you should return control to the
    debugger instead of continuing the tracee further.
Start by copying your part3/ directory into part4/.
We will implement the t command in this part, which prints the function call
stack trace starting at %rip. This command requires you to implement the
following functions:
- ldb_get_symbol_info(): Read the target program's symbol information from its
  ELF file on disk.
  - Use mmap() to map the ELF file into memory. The main() function will
    unmap it at exit.
  - <elf.h> defines various types and helper functions for parsing ELF files;
    use them to retrieve the symbol information. See man elf for more
    details.
- ldb_backtrace(): Trace through the tracee's current function call stack by
  following the saved frame pointers and return addresses on the stack.
  - main() function call.
- ldb_find_function(): Given a return address to a function, find the
  function's symbol object in the ELF symbol table.
  - For a symbol of type STT_FUNC, its st_value field is the address of
    the first instruction of the function and the st_size field is the total
    byte size of all of the instructions in the function. Use these fields to
    determine if the specified address is contained within a given function.
Once these functions are implemented, ldb will also print the function call
stack trace whenever the tracee gets sent SIGSEGV (for segfault) or SIGFPE
(e.g., for divide-by-zero error).
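The address-containment test described for ldb_find_function() can be illustrated with the types from <elf.h>. The sketch below is an assumption-laden example: sym_contains() and find_function() are hypothetical names, the symbol table is taken to be a plain array of Elf64_Sym entries, and a simple linear scan is used. The real function signatures are whatever the skeleton defines.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* True if `addr` falls inside the function described by `sym`. */
int sym_contains(const Elf64_Sym *sym, uint64_t addr)
{
    if (ELF64_ST_TYPE(sym->st_info) != STT_FUNC)
        return 0;
    /* Half-open range [st_value, st_value + st_size). An unresolved symbol
     * has size 0 and therefore never matches. */
    return addr >= sym->st_value && addr - sym->st_value < sym->st_size;
}

/* Linear scan of a symbol table; returns NULL when nothing matches. */
const Elf64_Sym *find_function(const Elf64_Sym *syms, size_t count,
                               uint64_t addr)
{
    for (size_t i = 0; i < count; i++)
        if (sym_contains(&syms[i], addr))
            return &syms[i];
    return NULL;
}
```

The symbol's name is not stored in the entry itself; st_name is an offset into the associated string table, as described in man elf.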
Recall that the compiler may choose to optimize out the usage of the frame
pointer. Since our backtrace logic depends on its presence, target programs
should be compiled with -fno-omit-frame-pointer.
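To see why the frame pointer matters, here is a rough sketch of walking the saved-frame-pointer chain on x86-64. Everything named here (walk_frames(), read_word_fn, struct fake_mem, fake_read()) is hypothetical. Memory is read through a callback so the logic can be exercised against an array; in a debugger the callback would wrap PTRACE_PEEKDATA. The stopping rule (a null frame pointer or a depth cap) is an assumption, and the assignment may require a different one.

```c
#include <stddef.h>
#include <stdint.h>

/* Callback that reads one 8-byte word at `addr`; returns 0 on success. */
typedef int (*read_word_fn)(uint64_t addr, uint64_t *out, void *ctx);

/* Follow the frame-pointer chain starting at `rbp`.
 * With frame pointers enabled, [rbp] holds the caller's saved rbp and
 * [rbp + 8] holds the return address into the caller.
 * Stores up to `max` return addresses in `ret_addrs`; returns how many. */
size_t walk_frames(uint64_t rbp, read_word_fn read_word, void *ctx,
                   uint64_t *ret_addrs, size_t max)
{
    size_t n = 0;

    while (rbp != 0 && n < max) {
        uint64_t saved_rbp, ret;

        if (read_word(rbp + 8, &ret, ctx) != 0 ||
            read_word(rbp, &saved_rbp, ctx) != 0)
            break;          /* unreadable memory: stop walking */

        ret_addrs[n++] = ret;
        rbp = saved_rbp;    /* move up to the caller's frame */
    }
    return n;
}

/* Array-backed stand-in for tracee memory, used only to try out the walk. */
struct fake_mem {
    uint64_t base;          /* address of words[0] */
    const uint64_t *words;
    size_t count;
};

int fake_read(uint64_t addr, uint64_t *out, void *ctx)
{
    const struct fake_mem *m = ctx;

    if (addr < m->base || (addr - m->base) % 8 != 0)
        return -1;
    size_t i = (addr - m->base) / 8;
    if (i >= m->count)
        return -1;
    *out = m->words[i];
    return 0;
}
```

The first line of a trace comes from %rip itself; each return address collected here identifies one caller further up the stack.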
You can see the symbol table inside of an ELF file by running
readelf --symbols <executable>. For an executable linked with -no-pie that
calls printf(), you'll see the following entry for printf:
23: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5
Because the GNU C library is dynamically linked by default, the function address
in the ELF file on disk is left unresolved. The function address is resolved
only at runtime. This makes it impossible for ldb to include functions from
dynamically linked libraries in its stack traces since we use the symbol table
in the ELF file on disk to resolve function addresses.
You can disable dynamic linking by specifying the -static linker flag. For an
executable linked with -static that calls printf(), you'll see the following
entry for printf in the symbol table:
824: 000000000040b690 201 FUNC GLOBAL DEFAULT 7 printf
The entry has a fully resolved address.
If you'd like to use ldb's backtrace functionality from within a library
function, make sure that the library is linked in statically.
Last Updated: 2024-11-02