x86-64 has sixteen 64-bit general-purpose registers, as presented in CSAPP Figure 3.2:
Many x86 assembly instructions can operate on different widths of data. The width is specified by a suffix on the instruction name. CSAPP Figure 3.1 details these suffixes:
The following sequence of mov instructions demonstrates how to manipulate data
at different widths:
movabsq $0x0011223344556677, %rax # %rax = 0011223344556677
movb $-1, %al # %rax = 00112233445566FF
movw $-1, %ax # %rax = 001122334455FFFF
movl $-1, %eax # %rax = 00000000FFFFFFFF
movq $-1, %rax # %rax = FFFFFFFFFFFFFFFF
The first mov fills %rax with a quadword (64-bit) quantity, 0x0011223344556677.
Regular mov instructions can only take a 32-bit immediate (i.e., literal) value.
movabs allows you to specify a 64-bit immediate value.
The next four mov instructions set the lower 1, 2, 4, and 8 bytes of %rax to -1,
respectively. Each mov instruction is augmented with the appropriate data-width
suffix. Sometimes the suffix is omitted because it can be deduced from the
register name.
Note that the movb and movw instructions leave the rest of the bits in the
register unchanged, but movl has the side effect of zeroing out the higher-order
4 bytes of the register. This is not specific to the example: x86-64 dictates
that movl with a register destination zeroes the higher-order 4 bytes.
The mov instruction is used to transfer data between registers and memory. In
this section, we’ll cover the x86 syntax used to refer to memory locations. The
leaq instruction, often abbreviated as lea since it almost always operates on
8-byte quadwords, is used to perform pointer arithmetic. It shares the same
syntax with mov, but doesn’t dereference memory; it just calculates the address.
lea stands for “load effective address”.
CSAPP 3.8.2 gives some examples of using mov and lea instructions with various
forms of memory addressing. For these examples, assume the following:
- E is stored in %rdx
- i is stored in %rcx
- The result is stored in %eax for data or %rax for pointers
- M[addr] refers to the data stored at addr in memory
As a side note, this memory access syntax is so convenient that compilers often
use lea to perform arithmetic unrelated to memory access.
The push and pop instructions are examples of instructions that implicitly
operate on a specific register. The %rsp register holds the address of the top
of the stack. The push instruction decreases %rsp (to grow the stack) and then
copies a value to the memory location pointed to by %rsp. The pop instruction
retrieves the value at the memory location pointed to by %rsp and then increases
%rsp (to shrink the stack).
CSAPP Figure 3.9 illustrates these semantics:
CSAPP Figure 3.25 depicts what the stack looks like when function P() calls
function Q():
P’s stack frame:
- Before calling Q(), P() prepares the arguments. The first six (integer or
  pointer) arguments are passed through registers, %rdi, %rsi, %rdx, %rcx, %r8,
  %r9, in that order. The rest of the arguments are pushed onto P()’s stack
  frame, as depicted above.
- When P() executes the call instruction, it pushes the return address onto
  its stack frame before jumping to Q().
Q’s stack frame:
- %rbx, %rbp, %r12-%r15 are classified as “callee-saved” registers, meaning
  that the caller expects them to have the original values when the callee
  returns. If Q() intends to use them, it must first save them on the stack so
  they can later be restored.
- Q() then allocates space for its local variables in its stack frame.
- If Q() invokes another function that takes more than 6 parameters, it will
  prepare the arguments on its stack frame.
- Q() will store its return value in %rax (not depicted).
- When Q() returns via the ret instruction, it will pop off the saved return
  address from P()’s stack frame and jump to it.
- %rbp (not depicted) is known as the frame pointer. It points to the start of
  the current stack frame. You’ll typically see these instructions in a
  function:
push %rbp # Save the old frame pointer on the stack
mov %rsp,%rbp # Set the new frame pointer
# <function body>
mov %rbp,%rsp # Roll up the stack to %rbp
pop %rbp # Restore the old frame pointer
ret
The compiler may optimize away parts of or the entirety of the stack frame. For example, leaf procedures (i.e., functions that don’t call other functions) may not have a stack frame.
Consider the following C program, divided into two source files:
sum.c:
long sum(long a, long b) {
    return a + b;
}
long sum_array(long *p, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        s = sum(s, p[i]);
    }
    return s;
}
main.c:
#include <stdio.h>
long sum_array(long *p, int n);
int main() {
    long a[5] = {0, 1, 2, 3, 4};
    long sum = sum_array(a, 5);
    printf("sum=%ld\n", sum);
}
With the x86 assembly essentials we’ve just covered, we can now dive into compiler-generated x86 assembly for this simple C program.
When gcc compiles a C source file into an object file, it first translates the
C code into assembly code, and then invokes the assembler to translate the
assembly code into machine code. One can generate the intermediate assembly
code using the -S flag (e.g., gcc -S code.c).
The object file is encoded using the ELF binary format, which we will cover
later in the class. We can use a disassembler to read the code region of the
object file as assembly code (e.g., objdump -d code.o).
sum.o
The disassembly of sum.o, heavily annotated by us, is shown below:
sum.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <sum>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp # Save the old frame pointer on the stack
5: 48 89 e5 mov %rsp,%rbp # Set the new frame pointer
8: 48 89 7d f8 mov %rdi,-0x8(%rbp) # Stores arguments a and b on the stack
c: 48 89 75 f0 mov %rsi,-0x10(%rbp) # without adjusting rsp
10: 48 8b 55 f8 mov -0x8(%rbp),%rdx # Loads stack copies of a and b into %rdx and %rax
14: 48 8b 45 f0 mov -0x10(%rbp),%rax # Note that %rax holds function return value
18: 48 01 d0 add %rdx,%rax # b += a
1b: 5d pop %rbp # Restore the old frame pointer
1c: c3 ret # Return to caller
000000000000001d <sum_array>:
1d: f3 0f 1e fa endbr64
21: 55 push %rbp
22: 48 89 e5 mov %rsp,%rbp
25: 48 83 ec 20 sub $0x20,%rsp # Allocate 32 bytes on the stack
29: 48 89 7d e8 mov %rdi,-0x18(%rbp) # Store arguments p and n on the stack
2d: 89 75 e4 mov %esi,-0x1c(%rbp)
30: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp) # s = 0
37: 00
38: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp) # i = 0
3f: eb 2e jmp 6f <sum_array+0x52> # Jump to 0x6f
41: 8b 45 f4 mov -0xc(%rbp),%eax # %eax = i
44: 48 98 cltq # %rax = (long) %eax
46: 48 8d 14 c5 00 00 00 lea 0x0(,%rax,8),%rdx # Calculate byte-offset (i * sizeof(long))
4d: 00
4e: 48 8b 45 e8 mov -0x18(%rbp),%rax # %rax = p
52: 48 01 d0 add %rdx,%rax # %rax = p + byte-offset = &p[i]
55: 48 8b 10 mov (%rax),%rdx # %rdx = p[i]
58: 48 8b 45 f8 mov -0x8(%rbp),%rax # %rax = s
5c: 48 89 d6 mov %rdx,%rsi # %rsi = p[i] (second arg)
5f: 48 89 c7 mov %rax,%rdi # %rdi = s (first arg)
62: e8 00 00 00 00 call 67 <sum_array+0x4a> # Call sum() (address not resolved yet)
67: 48 89 45 f8 mov %rax,-0x8(%rbp) # Store return value into s
6b: 83 45 f4 01 addl $0x1,-0xc(%rbp) # i++
6f: 8b 45 f4 mov -0xc(%rbp),%eax # %eax = i
72: 3b 45 e4 cmp -0x1c(%rbp),%eax # Compare i and n
75: 7c ca jl 41 <sum_array+0x24> # Jump to 0x41 if i < n
77: 48 8b 45 f8 mov -0x8(%rbp),%rax # %rax = s as return value
7b: c9 leave # mov %rbp,%rsp then pop %rbp
7c: c3 ret
A few additional notes:
- We compiled with -O0 -fno-omit-frame-pointer to ensure the compiler does not
  perform any optimizations and to ensure that %rbp is used as the frame
  pointer.
- sum() is a leaf procedure. Note that it does not build out its stack frame
  by manipulating %rsp like sum_array() does.
- The addresses in the call instructions are zero at this point. They will be
  fixed up by the linker when the final executable is built.
- sum_array() allocates 32 bytes for local variables (which only take up 24
  bytes), probably to keep the stack 16-byte aligned, as the x86-64 calling
  convention requires at each call.
- Conditional jumps (like the jl used in sum_array()) usually follow some
  comparison instructions (like cmp). The result of a comparison instruction is
  stored as several bits in the %rflags register. Subsequent conditional jumps
  refer to this register to determine if the jump should be taken.
sum.o (optional)
If we recompile sum.c using -O1 (for optimization level 1), the generated
machine code changes; see our annotations in-line:
sum.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <sum>:
0: f3 0f 1e fa endbr64
4: 48 8d 04 37 lea (%rdi,%rsi,1),%rax # %rax = a + (b * 1)
8: c3 ret
0000000000000009 <sum_array>:
9: f3 0f 1e fa endbr64
d: 85 f6 test %esi,%esi # Test n
f: 7e 20 jle 31 <sum_array+0x28> # Jump to 0x31 if n <= 0
11: 48 89 f8 mov %rdi,%rax # %rax = p
14: 8d 56 ff lea -0x1(%rsi),%edx # %edx = n - 1
17: 48 8d 4c d7 08 lea 0x8(%rdi,%rdx,8),%rcx # %rcx = (p + 8(n-1)) + 8, addr of one-past last element of array
1c: ba 00 00 00 00 mov $0x0,%edx # %edx = 0
21: 48 03 10 add (%rax),%rdx # %rdx += *p
24: 48 83 c0 08 add $0x8,%rax # p++
28: 48 39 c8 cmp %rcx,%rax # Compare p and addr of one-past last element of array
2b: 75 f4 jne 21 <sum_array+0x18> # Jump to 0x21 if not at end yet
2d: 48 89 d0 mov %rdx,%rax # %rdx -> %rax as return value
30: c3 ret
31: ba 00 00 00 00 mov $0x0,%edx # %edx = 0
36: eb f5 jmp 2d <sum_array+0x24> # Unconditional jump to 0x2d
A few things to note:
- sum() has been optimized to simply use the lea instruction to perform the
  addition.
- sum_array() no longer has a stack frame because all local variables are
  stored in registers.
- The loop has effectively been rewritten as:
  long *end = p + n;
  for (; p != end; p++) { ... }
- sum_array() doesn’t even call sum() anymore; it has inlined the addition.
main.o
The disassembly of main.o, heavily annotated by us, is shown below:
main.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 83 ec 30 sub $0x30,%rsp
c: 48 c7 45 d0 00 00 00 movq $0x0,-0x30(%rbp) # a[0] = 0
13: 00
14: 48 c7 45 d8 01 00 00 movq $0x1,-0x28(%rbp) # a[1] = 1
1b: 00
1c: 48 c7 45 e0 02 00 00 movq $0x2,-0x20(%rbp) # a[2] = 2
23: 00
24: 48 c7 45 e8 03 00 00 movq $0x3,-0x18(%rbp) # a[3] = 3
2b: 00
2c: 48 c7 45 f0 04 00 00 movq $0x4,-0x10(%rbp) # a[4] = 4
33: 00
34: 48 8d 45 d0 lea -0x30(%rbp),%rax # %rax = a
38: be 05 00 00 00 mov $0x5,%esi # %esi = 5 (second arg)
3d: 48 89 c7 mov %rax,%rdi # %rdi = a (first arg)
40: e8 00 00 00 00 call 45 <main+0x45> # Call sum_array()
45: 48 89 45 f8 mov %rax,-0x8(%rbp) # Store return value in sum
49: 48 8b 45 f8 mov -0x8(%rbp),%rax # %rax = sum
4d: 48 89 c6 mov %rax,%rsi # %rsi = %rax (second arg)
50: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # %rax = address of fmt string (unresolved)
57: 48 89 c7 mov %rax,%rdi # %rdi = %rax (first arg)
5a: b8 00 00 00 00 mov $0x0,%eax # %al = 0: no vector registers used by this variadic call
5f: e8 00 00 00 00 call 64 <main+0x64> # Call printf()
64: b8 00 00 00 00 mov $0x0,%eax # Return value of main() is 0
69: c9 leave
6a: c3 ret
Note that this object file was compiled using -fno-stack-protector. Otherwise,
gcc may protect functions that it deems vulnerable to buffer overflow attacks
with a special guard value. The guard is initialized when a function is entered
and then checked when the function exits. We disabled this feature to keep the
main() function short.
main
Once the object files are linked together to build the executable, the addresses
in the call instructions are resolved. For example, sum_array()’s call to sum()
is now resolved as an address relative to the instruction pointer:
1213: 48 89 c7 mov %rax,%rdi
1216: e8 99 ff ff ff call 11b4 <sum>
121b: 48 89 45 f8 mov %rax,-0x8(%rbp)
The address of sum() is calculated as 0x121b (the address of the next
instruction) + 0xffffff99 (the 4-byte signed integer following the e8 opcode of
the call instruction, in little-endian order), or 0x121b - 0x67, which results
in 0x11b4. This jump target encoding relative to the instruction pointer is
known as “PC-relative” encoding.
GDB is a command-line tool that you can use to debug your program. It lets you pause program execution at arbitrary points and inspect memory, register contents, variable values, etc. There exist many online tutorials on how to use GDB effectively. Also see CSAPP Section 3.10.2 and Figure 3.39 for an introduction to GDB and a list of useful commands.
Last Updated: 2024-03-12