x86-64 has 16 64-bit general purpose registers, as presented in CSAPP Figure 3.2:

Many x86 assembly instructions can operate on different widths of data. The operand width is specified by a suffix on the instruction name. CSAPP Figure 3.1 details these suffixes:

The following sequence of mov instructions demonstrates how to manipulate data at different widths:
movabsq $0x0011223344556677, %rax # %rax = 0011223344556677
movb $-1, %al # %rax = 00112233445566FF
movw $-1, %ax # %rax = 001122334455FFFF
movl $-1, %eax # %rax = 00000000FFFFFFFF
movq $-1, %rax # %rax = FFFFFFFFFFFFFFFF
The first mov fills %rax with a quadword (64-bit) quantity,
0x0011223344556677. A regular movq instruction can take at most a 32-bit
immediate (i.e., literal) value, which is sign-extended to 64 bits. movabs
allows you to specify a full 64-bit immediate value.
The next four mov instructions set the lower 1, 2, 4, and 8 bytes of %rax to
-1, respectively. Each mov instruction is augmented with the appropriate
data-width suffix. Sometimes the suffix is omitted because it can be deduced
from the register name.
Note that the movb and movw instructions leave the rest of the bits in the
register unchanged, but movl has the side effect of zeroing out the
higher-order 4 bytes of the register. x86-64 dictates that any instruction
that writes a 32-bit register, movl included, zeroes the higher-order 4 bytes.
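The partial-register rules above can be mimicked in C with masks on a 64-bit integer. This is only a minimal sketch of the visible behavior, not of how the hardware works; the helper names write_b, write_w, write_l, and write_q are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helpers that model writes to %al, %ax, %eax, and %rax
 * as updates to a 64-bit value standing in for %rax. */

/* movb: replace the low 1 byte, keep the upper 7 bytes. */
uint64_t write_b(uint64_t rax, uint8_t v) {
    return (rax & ~UINT64_C(0xFF)) | v;
}

/* movw: replace the low 2 bytes, keep the upper 6 bytes. */
uint64_t write_w(uint64_t rax, uint16_t v) {
    return (rax & ~UINT64_C(0xFFFF)) | v;
}

/* movl: replace the low 4 bytes and zero the upper 4 bytes. */
uint64_t write_l(uint64_t rax, uint32_t v) {
    (void)rax;  /* the old value does not survive a 32-bit write */
    return (uint64_t)v;
}

/* movq: replace all 8 bytes. */
uint64_t write_q(uint64_t rax, uint64_t v) {
    (void)rax;  /* every byte is overwritten */
    return v;
}
```

Feeding 0x0011223344556677 through write_b, write_w, write_l, and write_q in turn, each time with an all-ones value, reproduces the four register values shown in the comments of the mov sequence.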
The mov instruction is used to transfer data between registers and memory. In
this section, we’ll cover the x86 syntax used to refer to memory locations. The
leaq instruction, often written as lea since the operand width can be deduced
from the destination register, is used to perform pointer arithmetic. It shares
the same syntax with mov, but doesn’t dereference memory; it just calculates the
address. lea stands for “load effective address”.
CSAPP 3.8.2 gives some examples of using mov and lea instructions with
various forms of memory addressing. For the following examples, assume the following:
- E is stored in %rdx
- i is stored in %rcx
- The result is stored in %eax for data or %rax for pointers
- M[addr] refers to the data stored at addr in memory
As a side note, this memory access syntax is so convenient that compilers often
use lea to perform arithmetic unrelated to memory access.
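To make the address calculation concrete, here is a small C sketch of what an operand of the form disp(base,index,scale) evaluates to. The function effective_address is a hypothetical helper written for this note, and times_five is an example of arithmetic that a compiler might choose to implement with lea; the actual instruction selection depends on the compiler and its flags.

```c
#include <stdint.h>

/* Hypothetical helper: computes disp(base, index, scale) the way an
 * address operand does, without touching memory (like lea). */
uint64_t effective_address(uint64_t base, uint64_t index,
                           uint64_t scale, int64_t disp) {
    /* In real x86-64 encodings, scale must be 1, 2, 4, or 8. */
    return base + index * scale + (uint64_t)disp;
}

/* A compiler might translate this into a single instruction such as
 * lea (%rdi,%rdi,4),%rax, although the exact output is not guaranteed. */
long times_five(long x) {
    return x + x * 4;
}
```

For instance, effective_address(0x1000, 3, 8, 16) models 16(%base,%index,8) with a base of 0x1000 and an index of 3, which is the address of the element 3 quadwords past offset 16.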
The push and pop instructions are examples of instructions that operate on
specific registers. The %rsp register holds the address of the top of the
stack. The push instruction decreases %rsp (to grow the stack) and then
copies a value to the memory location pointed to by %rsp. The pop
instruction retrieves the value at the memory location pointed to by %rsp and
then increases %rsp (to shrink the stack).
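The order of the two steps in each instruction is easy to mix up, so here is a minimal C model of them. Everything in it is hypothetical: struct machine, push_q, and pop_q are illustration-only names, a 64-byte array stands in for stack memory, and there is no overflow or underflow checking.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a small byte array stands in for stack memory and
 * rsp is an offset into it. The stack grows toward lower offsets. */
enum { STACK_BYTES = 64 };

struct machine {
    uint8_t mem[STACK_BYTES];
    uint64_t rsp;  /* offset of the current top of the stack */
};

void machine_init(struct machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->rsp = STACK_BYTES;  /* empty stack: rsp is one past the end */
}

/* pushq: decrement rsp by 8, then store the value at the new top. */
void push_q(struct machine *m, uint64_t value) {
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &value, sizeof value);
}

/* popq: load the value at the top, then increment rsp by 8. */
uint64_t pop_q(struct machine *m) {
    uint64_t value;
    memcpy(&value, &m->mem[m->rsp], sizeof value);
    m->rsp += 8;
    return value;
}
```

Pushing two values and popping twice returns them in reverse order and leaves rsp where it started.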
CSAPP Figure 3.9 illustrates these semantics:

CSAPP Figure 3.25 depicts what the stack looks like when function P() calls function Q():

P’s stack frame:
- Before calling Q(), P() prepares the arguments. The first six arguments are passed through registers, %rdi, %rsi, %rdx, %rcx, %r8, %r9, in that order. The rest of the arguments are pushed onto P()’s stack frame, as depicted above.
- When P() executes the call instruction, it pushes the return address onto its stack frame before jumping to Q().

Q’s stack frame:
- %rbx, %rbp, %r12-%r15 are classified as “callee-saved” registers, meaning that the caller expects them to have the original values when the callee returns. If Q() intends to use them, it must first save them on the stack so they can later be restored.
- Q() then allocates space for its local variables in its stack frame.
- If Q() invokes another function that takes more than 6 parameters, it will prepare the arguments on its stack frame.
- Q() will store its return value in %rax (not depicted).
- When Q() returns via the ret instruction, it will pop off the saved return address from P()’s stack frame and jump to it.

%rbp (not depicted) is known as the frame pointer. It points to the start of
the current stack frame. You’ll typically see these instructions in a function:
push %rbp # Save the old frame pointer on the stack
mov %rsp,%rbp # Set the new frame pointer
# <function body>
mov %rbp,%rsp # Roll up the stack to %rbp
pop %rbp # Restore the old frame pointer
ret
The compiler may optimize away parts of or the entirety of the stack frame. For example, leaf procedures (i.e., functions that don’t call other functions) may not have a stack frame.
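One way to see the register and stack argument passing described above is to compile a function with more than six integer arguments. The sketch below uses two hypothetical functions, sum8 and call_sum8, and assumes the System V AMD64 calling convention that gcc uses on Linux; the exact instructions emitted will vary with compiler version and flags.

```c
/* Eight integer arguments: under the calling convention described above,
 * a1..a6 arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9, while a7 and a8
 * are passed on the caller's stack frame. */
long sum8(long a1, long a2, long a3, long a4,
          long a5, long a6, long a7, long a8) {
    return a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}

/* Plays the role of P(): it sets up the arguments and calls sum8(). */
long call_sum8(void) {
    return sum8(1, 2, 3, 4, 5, 6, 7, 8);
}
```

In the unoptimized assembly for call_sum8(), the values 7 and 8 should appear as stack writes before the call instruction, while 1 through 6 are moved into registers.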
Consider the following C program, divided into two source files:
sum.c:
long sum(long a, long b) {
    return a + b;
}

long sum_array(long *p, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        s = sum(s, p[i]);
    }
    return s;
}
main.c:
#include <stdio.h>

long sum_array(long *p, int n);

int main() {
    long a[5] = {0, 1, 2, 3, 4};
    long sum = sum_array(a, 5);
    printf("sum=%ld\n", sum);
}
With the x86 assembly essentials we’ve just covered, we can now dive into compiler-generated x86 assembly for this simple C program.
When gcc compiles a C source file into an object file, it first translates the
C code into assembly code, and then invokes the assembler to translate the
assembly code into machine code. You can generate the intermediate assembly
code using the -S flag (e.g., gcc -S code.c).
The object file is encoded using the ELF binary format, which we will cover
later in the class. We can use a disassembler to read the code region of the
object file as assembly code (e.g., objdump -d code.o).
sum.o
The disassembly of sum.o, heavily annotated by us, is shown below:
sum.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <sum>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp # Save the old frame pointer on the stack
5: 48 89 e5 mov %rsp,%rbp # Set the new frame pointer
8: 48 89 7d f8 mov %rdi,-0x8(%rbp) # Stores arguments a and b on the stack
c: 48 89 75 f0 mov %rsi,-0x10(%rbp) # without adjusting %rsp
10: 48 8b 55 f8 mov -0x8(%rbp),%rdx # Loads stack copies of a and b into %rdx and %rax
14: 48 8b 45 f0 mov -0x10(%rbp),%rax # Note that %rax holds function return value
18: 48 01 d0 add %rdx,%rax # b += a
1b: 5d pop %rbp # Restore the old frame pointer
1c: c3 ret # Return to caller
000000000000001d <sum_array>:
1d: f3 0f 1e fa endbr64
21: 55 push %rbp
22: 48 89 e5 mov %rsp,%rbp
25: 48 83 ec 20 sub $0x20,%rsp # Allocate 32 bytes on the stack
29: 48 89 7d e8 mov %rdi,-0x18(%rbp) # Store arguments p and n on the stack
2d: 89 75 e4 mov %esi,-0x1c(%rbp)
30: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp) # s = 0
37: 00
38: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp) # i = 0
3f: eb 2e jmp 6f <sum_array+0x52> # Jump to 0x6f
41: 8b 45 f4 mov -0xc(%rbp),%eax # %eax = i
44: 48 98 cltq # %rax = (long) %eax
46: 48 8d 14 c5 00 00 00 lea 0x0(,%rax,8),%rdx # Calculate byte-offset (i * sizeof(long))
4d: 00
4e: 48 8b 45 e8 mov -0x18(%rbp),%rax # %rax = p
52: 48 01 d0 add %rdx,%rax # %rax = p + byte-offset (address of p[i])
55: 48 8b 10 mov (%rax),%rdx # %rdx = p[i]
58: 48 8b 45 f8 mov -0x8(%rbp),%rax # %rax = s
5c: 48 89 d6 mov %rdx,%rsi # %rsi = p[i] (second arg)
5f: 48 89 c7 mov %rax,%rdi # %rdi = s (first arg)
62: e8 00 00 00 00 call 67 <sum_array+0x4a> # Call sum() (address not resolved yet)
67: 48 89 45 f8 mov %rax,-0x8(%rbp) # Store return value into s
6b: 83 45 f4 01 addl $0x1,-0xc(%rbp) # i++
6f: 8b 45 f4 mov -0xc(%rbp),%eax # %eax = i
72: 3b 45 e4 cmp -0x1c(%rbp),%eax # Compare i and n
75: 7c ca jl 41 <sum_array+0x24> # Jump to 0x41 if i < n
77: 48 8b 45 f8 mov -0x8(%rbp),%rax # %rax = s as return value
7b: c9 leave # mov %rbp,%rsp then pop %rbp
7c: c3 ret
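The control flow of the unoptimized sum_array() above can be easier to follow when written back in C with goto statements. The function sum_array_goto below is a hypothetical rewrite made for this note; it is meant to mirror the jump structure of the disassembly (jump to the loop test first, with the loop body placed above it), not to suggest how the loop should be written by hand.

```c
long sum(long a, long b) {
    return a + b;
}

/* Hypothetical rewrite of sum_array() whose control flow follows the
 * unoptimized disassembly: jump to the test first, loop body above it. */
long sum_array_goto(long *p, int n) {
    long s = 0;          /* movq $0x0,-0x8(%rbp) */
    int i = 0;           /* movl $0x0,-0xc(%rbp) */
    goto test;           /* jmp 6f */
body:
    s = sum(s, p[i]);    /* instructions 0x41 through 0x67 */
    i++;                 /* addl $0x1,-0xc(%rbp) */
test:
    if (i < n)           /* cmp -0x1c(%rbp),%eax */
        goto body;       /* jl 41 */
    return s;            /* mov -0x8(%rbp),%rax */
}
```

Placing the test at the bottom means each iteration executes only one conditional jump, and the initial jmp handles the case where the loop body should not run at all.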
A few additional notes:
- sum.c was compiled using -O0 -fno-omit-frame-pointer to ensure the compiler does not perform any optimizations and to ensure that %rbp is used as the frame pointer.
- sum() is a leaf procedure. Note that it does not build out its stack frame by manipulating %rsp like sum_array() does.
- The addresses in the call instructions are zero at this point. They will be fixed up by the linker when the final executable is built.
- sum_array() allocates 32 bytes for local variables (which only take up 24 bytes), probably to keep the stack pointer 16-byte aligned.
- Conditional jumps (like jl used in sum_array()) usually follow some comparison instructions (like cmp). The result of a comparison instruction is stored as several bits in the %rflags register. Subsequent conditional jumps refer to this register to determine if the jump should be taken.

sum.o (optional)
If we recompile sum.c using -O1 (for optimization level 1), the generated
machine code changes; see our annotations in-line:
sum.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <sum>:
0: f3 0f 1e fa endbr64
4: 48 8d 04 37 lea (%rdi,%rsi,1),%rax # %rax = a + (b * 1)
8: c3 ret
0000000000000009 <sum_array>:
9: f3 0f 1e fa endbr64
d: 85 f6 test %esi,%esi # Test n
f: 7e 20 jle 31 <sum_array+0x28> # Jump to 0x31 if n <= 0
11: 48 89 f8 mov %rdi,%rax # %rax = p
14: 8d 56 ff lea -0x1(%rsi),%edx # %edx = n - 1
17: 48 8d 4c d7 08 lea 0x8(%rdi,%rdx,8),%rcx # %rcx = (p + 8(n-1)) + 8, addr of one-past last element of array
1c: ba 00 00 00 00 mov $0x0,%edx # %edx = 0
21: 48 03 10 add (%rax),%rdx # %rdx += *p
24: 48 83 c0 08 add $0x8,%rax # p++
28: 48 39 c8 cmp %rcx,%rax # Compare p and addr of one-past last element of array
2b: 75 f4 jne 21 <sum_array+0x18> # Jump to 0x21 if not at end yet
2d: 48 89 d0 mov %rdx,%rax # %rdx -> %rax as return value
30: c3 ret
31: ba 00 00 00 00 mov $0x0,%edx # %edx = 0
36: eb f5 jmp 2d <sum_array+0x24> # Unconditional jump to 0x2d
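Reading the optimized sum_array() back into C can help confirm what the compiler did. The function sum_array_ptr below is a hypothetical reconstruction written for this note; it follows the instruction sequence above closely, but it is our reading of the disassembly rather than anything gcc itself produces.

```c
/* Hypothetical C rendering of the -O1 code for sum_array(). */
long sum_array_ptr(long *p, int n) {
    long s = 0;                /* mov $0x0,%edx (on both paths) */
    if (n <= 0)                /* test %esi,%esi ; jle 31 */
        return s;
    /* lea -0x1(%rsi),%edx ; lea 0x8(%rdi,%rdx,8),%rcx
     * The 32-bit lea zero-extends n - 1 before it is used as an index. */
    long *end = p + (unsigned)(n - 1) + 1;
    do {
        s += *p;               /* add (%rax),%rdx */
        p++;                   /* add $0x8,%rax */
    } while (p != end);        /* cmp %rcx,%rax ; jne 21 */
    return s;                  /* mov %rdx,%rax */
}
```

The index variable i has disappeared entirely: the loop terminates by comparing the moving pointer against a precomputed end address.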
A few things to note:
- sum() has been optimized to simply use the lea instruction to perform the addition.
- sum_array() no longer has a stack frame because all local variables are stored in registers.
- The loop now walks a pointer through the array instead of indexing with i, as if it had been written as:
    long *end = p + n;
    for (; p != end; p++) { ... }
- sum_array() doesn’t even call sum() anymore; it has inlined the addition.

main.o
The disassembly of main.o, heavily annotated by us, is shown below:
main.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 83 ec 30 sub $0x30,%rsp
c: 48 c7 45 d0 00 00 00 movq $0x0,-0x30(%rbp) # a[0] = 0
13: 00
14: 48 c7 45 d8 01 00 00 movq $0x1,-0x28(%rbp) # a[1] = 1
1b: 00
1c: 48 c7 45 e0 02 00 00 movq $0x2,-0x20(%rbp) # a[2] = 2
23: 00
24: 48 c7 45 e8 03 00 00 movq $0x3,-0x18(%rbp) # a[3] = 3
2b: 00
2c: 48 c7 45 f0 04 00 00 movq $0x4,-0x10(%rbp) # a[4] = 4
33: 00
34: 48 8d 45 d0 lea -0x30(%rbp),%rax # %rax = a
38: be 05 00 00 00 mov $0x5,%esi # %esi = 5 (second arg)
3d: 48 89 c7 mov %rax,%rdi # %rdi = a (first arg)
40: e8 00 00 00 00 call 45 <main+0x45> # Call sum_array()
45: 48 89 45 f8 mov %rax,-0x8(%rbp) # Store return value in sum
49: 48 8b 45 f8 mov -0x8(%rbp),%rax # %rax = sum
4d: 48 89 c6 mov %rax,%rsi # %rsi = %rax (second arg)
50: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # %rax = address of fmt string (unresolved)
57: 48 89 c7 mov %rax,%rdi # %rdi = %rax (first arg)
5a: b8 00 00 00 00 mov $0x0,%eax # %al = 0: no vector registers used by the variadic call
5f: e8 00 00 00 00 call 64 <main+0x64> # Call printf()
64: b8 00 00 00 00 mov $0x0,%eax # %eax = 0, the return value of main()
69: c9 leave
6a: c3 ret
Note that this object file was compiled using -fno-stack-protector. gcc will
protect functions that it deems vulnerable to buffer overflow attacks with a
special guard value. The guard is initialized when a function is entered and
then checked when the function exits. We disabled this feature to keep the
main() function short.
main
Once the object files are linked together to build the executable, the addresses
in the call instructions are resolved. For example, sum_array()’s call to
sum() is now resolved as an address relative to the instruction pointer:
1213: 48 89 c7 mov %rax,%rdi
1216: e8 99 ff ff ff call 11b4 <sum>
121b: 48 89 45 f8 mov %rax,-0x8(%rbp)
The address of sum() is calculated as 0x121b (the address of the next
instruction) + 0xffffff99 (the 4-byte signed integer following the e8 call
instruction, in little-endian order), or 0x121b - 0x67, which results in
0x11b4. This jump target encoding relative to the instruction pointer is
known as “PC-relative” encoding.
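The arithmetic above can be checked with a few lines of C. The function call_target is a hypothetical helper written for this note; it assumes the 5-byte e8 form of the call instruction shown in the disassembly (one opcode byte followed by a 4-byte signed displacement).

```c
#include <stdint.h>

/* Hypothetical helper: given the address of a 5-byte e8 call instruction
 * and its four displacement bytes (in memory order, i.e. little-endian),
 * return the call target. */
uint64_t call_target(uint64_t call_addr, const uint8_t disp[4]) {
    /* Reassemble the little-endian bytes into a 32-bit value. */
    uint32_t raw = (uint32_t)disp[0]
                 | (uint32_t)disp[1] << 8
                 | (uint32_t)disp[2] << 16
                 | (uint32_t)disp[3] << 24;
    /* Sign-extend to 64 bits so that 0xffffff99 becomes -0x67. */
    int64_t rel = (raw & UINT32_C(0x80000000))
                      ? (int64_t)raw - (INT64_C(1) << 32)
                      : (int64_t)raw;
    /* The displacement is relative to the next instruction. */
    uint64_t next = call_addr + 5;
    return next + (uint64_t)rel;
}
```

With call_addr = 0x1216 and bytes 99 ff ff ff, this yields 0x11b4, matching the linked executable. With the all-zero displacement from the object file (call at 0x62), it yields 0x67, which is why the unlinked disassembly shows the call pointing at the very next instruction.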
GDB is a command-line tool that you can use to debug your program. It lets you pause program execution at arbitrary points and inspect memory, register contents, variable values, etc. There exist many online tutorials on how to use GDB effectively. Also see CSAPP Section 3.10.2 and Figure 3.39 for an introduction to GDB and a list of useful commands.
Last Updated: 2024-03-12