COMS 4995 Advanced Systems Programming

x86-64 Assembly

x86 Data Formats & Registers

x86-64 has 16 64-bit general purpose registers, as presented in CSAPP Figure 3.2:


Many x86 assembly instructions can operate on different widths of data. The operand size is specified by a suffix on the instruction name. CSAPP Figure 3.1 details these suffixes:


The following sequence of mov instructions demonstrates how to manipulate data at different widths:

movabsq $0x0011223344556677, %rax  # %rax = 0011223344556677
movb $-1, %al                      # %rax = 00112233445566FF
movw $-1, %ax                      # %rax = 001122334455FFFF
movl $-1, %eax                     # %rax = 00000000FFFFFFFF
movq $-1, %rax                     # %rax = FFFFFFFFFFFFFFFF

The first mov fills %rax with a quadword (64-bit) quantity, 0x0011223344556677. Regular mov instructions can only take a 32-bit immediate (i.e., literal) value. movabs allows you to specify a 64-bit immediate value.
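To make the 32-bit immediate limit concrete, here is a minimal C sketch. The helper name `fits_simm32` is hypothetical (it is not part of any toolchain), and it assumes the immediate of a regular movq is sign-extended from 32 to 64 bits, so a value needs movabs exactly when it falls outside the signed 32-bit range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if v can be encoded as a 32-bit
 * immediate that is sign-extended to 64 bits, i.e., the form a regular
 * movq accepts. Values that fail this check need movabs. */
bool fits_simm32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

Under this check, -1 fits in a regular mov, while 0x0011223344556677 does not, which is why the first instruction above uses movabsq.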

The next four mov instructions set the lower 1, 2, 4, and 8 bytes of %rax to -1, respectively. Each mov instruction is augmented with the appropriate data-width suffix. Sometimes the suffix is omitted because it can be deduced from the register name.

Note that the movb and movw instructions leave the rest of the bits in the register unchanged, but movl has the side effect of zeroing out the higher-order 4 bytes of the register. On x86-64, any instruction that writes a 32-bit register zeroes the upper 4 bytes of the corresponding 64-bit register.
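The partial-register rules can be modeled in C with masks. This is only a sketch: the function names (`write_al`, `write_ax`, `write_eax`, `write_rax`) are made up for illustration, and each one takes the old 64-bit value of %rax and returns the new one.

```c
#include <stdint.h>

/* Write to %al: replace bits 0-7, keep bits 8-63. */
uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~UINT64_C(0xFF)) | v;
}

/* Write to %ax: replace bits 0-15, keep bits 16-63. */
uint64_t write_ax(uint64_t rax, uint16_t v) {
    return (rax & ~UINT64_C(0xFFFF)) | v;
}

/* Write to %eax: replace bits 0-31 and zero bits 32-63. */
uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax;              /* old contents are discarded entirely */
    return (uint64_t)v;
}

/* Write to %rax: replace all 64 bits. */
uint64_t write_rax(uint64_t rax, uint64_t v) {
    (void)rax;
    return v;
}
```

Starting from 0x0011223344556677 and applying the four writes in order with all-ones values reproduces the register contents shown in the comments of the mov sequence above.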

Memory Access

The mov instruction is used to transfer data between registers and memory. In this section, we’ll cover the x86 syntax used to refer to memory locations. The leaq instruction, often abbreviated as lea since it usually produces an 8-byte quadword address, is used to perform pointer arithmetic. It uses the same memory operand syntax as mov, but doesn’t dereference memory; it just calculates the address. lea stands for “load effective address”.

CSAPP 3.8.2 gives some examples of using mov and lea instructions with various forms of memory addressing. For the following examples, assume the following:


As a side note, this memory access syntax is so convenient that compilers often use lea to perform arithmetic unrelated to memory access.
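As a rough illustration of the general memory operand form D(Rb,Ri,S), the sketch below computes the address it denotes: D + Rb + Ri * S, where S is 1, 2, 4, or 8. The function name `effective_address` and its parameters are hypothetical. lea yields this number directly, while mov would go on to read or write memory at that address.

```c
#include <stdint.h>

/* Address denoted by the operand disp(base,index,scale).
 * scale is expected to be 1, 2, 4, or 8. Arithmetic wraps modulo 2^64,
 * as address calculation does on the hardware. */
uint64_t effective_address(int64_t disp, uint64_t base,
                           uint64_t index, unsigned scale) {
    return base + index * scale + (uint64_t)disp;
}
```

For example, with a base of 0x100, an index of 3, a scale of 8, and a displacement of 8, the operand 8(%rb,%ri,8) denotes address 0x120. With no base and a zero displacement, the same form simply multiplies the index by the scale, which is how compilers use lea for plain arithmetic.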

Stack Operations

The push and pop instructions are examples of instructions that operate on specific registers. The %rsp register holds the address of the top of the stack. The push instruction decreases %rsp (to grow the stack) and then copies a value to the memory location pointed to by %rsp. The pop instruction retrieves the value at the memory location pointed to by %rsp and then increases %rsp (to shrink the stack).
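The push and pop semantics can be sketched with a toy machine in C. Everything here is an assumption for illustration: `struct machine`, `machine_init`, `pushq`, and `popq` are made-up names, the "memory" is a 64-byte array whose offsets stand in for addresses, and there is no overflow or underflow checking.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

struct machine {
    uint8_t  mem[STACK_BYTES]; /* toy memory; offsets stand in for addresses */
    uint64_t rsp;              /* offset of the current top of the stack */
};

void machine_init(struct machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->rsp = STACK_BYTES;      /* empty stack: %rsp at the highest address */
}

/* push: decrease %rsp by 8, then store the value at (%rsp). */
void pushq(struct machine *m, uint64_t value) {
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &value, 8);
}

/* pop: load the value at (%rsp), then increase %rsp by 8. */
uint64_t popq(struct machine *m) {
    uint64_t value;
    memcpy(&value, &m->mem[m->rsp], 8);
    m->rsp += 8;
    return value;
}
```

Pushing two values and popping twice returns them in reverse order and leaves rsp where it started, which is the last-in, first-out behavior the stack relies on.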

CSAPP Figure 3.9 illustrates these semantics:


Stack Frame

CSAPP Figure 3.25 depicts what the stack looks like when function P() calls function Q():


P’s stack frame:

Q’s stack frame:

%rbp (not depicted) is known as the frame pointer. It points to the start of the current stack frame. You’ll typically see these instructions in a function:

push   %rbp              # Save the old frame pointer on the stack
mov    %rsp,%rbp         # Set the new frame pointer

# <function body>

mov    %rbp,%rsp         # Roll up the stack to %rbp
pop    %rbp              # Restore the old frame pointer

The compiler may optimize away parts of or the entirety of the stack frame. For example, leaf procedures (i.e., functions that don’t call other functions) may not have a stack frame.

Code Walkthrough

Consider the following C program, divided into two source files:


// sum.c
long sum(long a, long b) {
    return a + b;
}

long sum_array(long *p, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        s = sum(s, p[i]);
    }
    return s;
}


// main.c
#include <stdio.h>

long sum_array(long *p, int n);

int main() {
    long a[5] = {0, 1, 2, 3, 4};
    long sum = sum_array(a, 5);
    printf("sum=%ld\n", sum);
}

With the x86 assembly essentials we’ve just covered, we can now dive into compiler-generated x86 assembly for this simple C program.

When gcc compiles a C source file into an object file, it first translates the C code into assembly code, and then invokes the assembler to translate the assembly code into machine code. One can generate the intermediate assembly code using the -S flag (e.g., gcc -S code.c).

The object file is encoded using the ELF binary format, which we will cover later in the class. We can use a disassembler to read the code region of the object file as assembly code (e.g., objdump -d code.o).


The disassembly of sum.o, heavily annotated by us, is shown below:

sum.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <sum>:
   0:	f3 0f 1e fa          	endbr64 
   4:	55                   	push   %rbp              # Save the old frame pointer on the stack
   5:	48 89 e5             	mov    %rsp,%rbp         # Set the new frame pointer
   8:	48 89 7d f8          	mov    %rdi,-0x8(%rbp)   # Stores arguments a and b on the stack
   c:	48 89 75 f0          	mov    %rsi,-0x10(%rbp)  #   without adjusting rsp
  10:	48 8b 55 f8          	mov    -0x8(%rbp),%rdx   # Loads stack copies of a and b into %rdx and %rax
  14:	48 8b 45 f0          	mov    -0x10(%rbp),%rax  #   Note that %rax holds function return value
  18:	48 01 d0             	add    %rdx,%rax         # b += a
  1b:	5d                   	pop    %rbp              # Restore the old frame pointer
  1c:	c3                   	ret                      # Return to caller

000000000000001d <sum_array>:
  1d:	f3 0f 1e fa          	endbr64 
  21:	55                   	push   %rbp
  22:	48 89 e5             	mov    %rsp,%rbp
  25:	48 83 ec 20          	sub    $0x20,%rsp           # Allocate 32 bytes on the stack
  29:	48 89 7d e8          	mov    %rdi,-0x18(%rbp)     # Store arguments p and n on the stack
  2d:	89 75 e4             	mov    %esi,-0x1c(%rbp)
  30:	48 c7 45 f8 00 00 00 	movq   $0x0,-0x8(%rbp)      # s = 0
  37:	00 
  38:	c7 45 f4 00 00 00 00 	movl   $0x0,-0xc(%rbp)      # i = 0
  3f:	eb 2e                	jmp    6f <sum_array+0x52>  # Jump to 0x6f
  41:	8b 45 f4             	mov    -0xc(%rbp),%eax      # %eax = i
  44:	48 98                	cltq                        # %rax = (long) %eax
  46:	48 8d 14 c5 00 00 00 	lea    0x0(,%rax,8),%rdx    # Calculate byte-offset (i * sizeof(long))
  4d:	00 
  4e:	48 8b 45 e8          	mov    -0x18(%rbp),%rax     # %rax = p
  52:	48 01 d0             	add    %rdx,%rax            # p += byte-offset
  55:	48 8b 10             	mov    (%rax),%rdx          # %rdx = *p
  58:	48 8b 45 f8          	mov    -0x8(%rbp),%rax      # %rax = s
  5c:	48 89 d6             	mov    %rdx,%rsi            # %rsi = *p (second arg)
  5f:	48 89 c7             	mov    %rax,%rdi            # %rdi = s (first arg)
  62:	e8 00 00 00 00       	call   67 <sum_array+0x4a>  # Call sum() (address not resolved yet)
  67:	48 89 45 f8          	mov    %rax,-0x8(%rbp)      # Store return value into s
  6b:	83 45 f4 01          	addl   $0x1,-0xc(%rbp)      # i++
  6f:	8b 45 f4             	mov    -0xc(%rbp),%eax      # %eax = i
  72:	3b 45 e4             	cmp    -0x1c(%rbp),%eax     # Compare i and n
  75:	7c ca                	jl     41 <sum_array+0x24>  # Jump to 0x41 if i < n
  77:	48 8b 45 f8          	mov    -0x8(%rbp),%rax      # %rax = s as return value
  7b:	c9                   	leave                       # mov %rbp,%rsp then pop %rbp
  7c:	c3                   	ret    
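To relate the unoptimized listing back to C, the sketch below rewrites sum_array() in a shape that mirrors the control flow of the disassembly: jump to the loop test first, then run the body while i < n. The name `sum_array_unopt` and the labels are ours, and this is a reading aid rather than what gcc works from internally.

```c
#include <stddef.h>

long sum(long a, long b) {
    return a + b;
}

long sum_array_unopt(long *p, int n) {
    long s = 0;                                     /* movq $0x0,-0x8(%rbp) */
    int i = 0;                                      /* movl $0x0,-0xc(%rbp) */
    goto check;                                     /* jmp 6f */
body:
    {
        /* cltq; lea 0x0(,%rax,8),%rdx: byte offset of element i */
        long offset = (long)i * (long)sizeof(long);
        /* mov -0x18(%rbp),%rax; add %rdx,%rax: address of p[i] */
        long *elem = (long *)((char *)p + offset);
        /* mov (%rax),%rdx; then set up %rdi, %rsi and call sum */
        s = sum(s, *elem);
        i++;                                        /* addl $0x1,-0xc(%rbp) */
    }
check:
    if (i < n)                                      /* cmp; jl 41 */
        goto body;
    return s;                                       /* mov -0x8(%rbp),%rax */
}
```

Placing the test at the bottom means each iteration executes only one conditional jump, which is why the compiler emits the initial jmp to 0x6f.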

A few additional notes:

Optimized sum.o (optional)

If we recompile sum.c using -O1 (for optimization level 1), the generated machine code changes; see our annotations in-line:

sum.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <sum>:
   0:	f3 0f 1e fa          	endbr64 
   4:	48 8d 04 37          	lea    (%rdi,%rsi,1),%rax  # %rax = a + (b * 1)
   8:	c3                   	ret    

0000000000000009 <sum_array>:
   9:	f3 0f 1e fa          	endbr64 
   d:	85 f6                	test   %esi,%esi              # Test n
   f:	7e 20                	jle    31 <sum_array+0x28>    # Jump to 0x31 if n <= 0
  11:	48 89 f8             	mov    %rdi,%rax              # %rax = p
  14:	8d 56 ff             	lea    -0x1(%rsi),%edx        # %edx = n - 1
  17:	48 8d 4c d7 08       	lea    0x8(%rdi,%rdx,8),%rcx  # %rcx = (p + 8(n-1)) + 8, addr of one-past last element of array
  1c:	ba 00 00 00 00       	mov    $0x0,%edx              # %edx = 0
  21:	48 03 10             	add    (%rax),%rdx            # %rdx += *p
  24:	48 83 c0 08          	add    $0x8,%rax              # p++
  28:	48 39 c8             	cmp    %rcx,%rax              # Compare p and addr of one-past last element of array
  2b:	75 f4                	jne    21 <sum_array+0x18>    # Jump to 0x21 if not at end yet
  2d:	48 89 d0             	mov    %rdx,%rax              # %rdx -> %rax as return value
  30:	c3                   	ret    
  31:	ba 00 00 00 00       	mov    $0x0,%edx              # %edx = 0
  36:	eb f5                	jmp    2d <sum_array+0x24>    # Unconditional jump to 0x2d
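Read back into C, the -O1 listing behaves roughly like the sketch below: the call to sum() has been inlined as a plain addition, the index variable is gone, and the loop walks a pointer until it reaches a precomputed end address. The name `sum_array_o1` is ours and the code is an approximation of the listing, not compiler output.

```c
long sum_array_o1(long *p, int n) {
    long s = 0;                              /* mov $0x0,%edx */
    if (n <= 0)                              /* test %esi,%esi; jle 31 */
        return s;
    /* lea -0x1(%rsi),%edx; lea 0x8(%rdi,%rdx,8),%rcx:
     * address one past the last element. The 32-bit lea zero-extends
     * n - 1, which the unsigned cast models. */
    long *end = p + (unsigned)(n - 1) + 1;
    do {
        s += *p;                             /* add (%rax),%rdx */
        p++;                                 /* add $0x8,%rax */
    } while (p != end);                      /* cmp %rcx,%rax; jne 21 */
    return s;                                /* mov %rdx,%rax; ret */
}
```

The up-front n <= 0 check is what makes the do-while form safe, since the loop body always runs at least once.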

A few things to note:


The disassembly of main.o, heavily annotated by us, is shown below:

main.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
   0:	f3 0f 1e fa          	endbr64 
   4:	55                   	push   %rbp
   5:	48 89 e5             	mov    %rsp,%rbp
   8:	48 83 ec 30          	sub    $0x30,%rsp
   c:	48 c7 45 d0 00 00 00 	movq   $0x0,-0x30(%rbp)    # a[0] = 0
  13:	00 
  14:	48 c7 45 d8 01 00 00 	movq   $0x1,-0x28(%rbp)    # a[1] = 1
  1b:	00 
  1c:	48 c7 45 e0 02 00 00 	movq   $0x2,-0x20(%rbp)    # a[2] = 2
  23:	00 
  24:	48 c7 45 e8 03 00 00 	movq   $0x3,-0x18(%rbp)    # a[3] = 3
  2b:	00 
  2c:	48 c7 45 f0 04 00 00 	movq   $0x4,-0x10(%rbp)    # a[4] = 4
  33:	00 
  34:	48 8d 45 d0          	lea    -0x30(%rbp),%rax    # %rax = a
  38:	be 05 00 00 00       	mov    $0x5,%esi           # %esi = 5 (second arg)
  3d:	48 89 c7             	mov    %rax,%rdi           # %rdi = a (first arg)
  40:	e8 00 00 00 00       	call   45 <main+0x45>      # Call sum_array()
  45:	48 89 45 f8          	mov    %rax,-0x8(%rbp)     # Store return value in sum
  49:	48 8b 45 f8          	mov    -0x8(%rbp),%rax     # %rax = sum
  4d:	48 89 c6             	mov    %rax,%rsi           # %rsi = %rax (second arg)
  50:	48 8d 05 00 00 00 00 	lea    0x0(%rip),%rax      # %rax = address of fmt string (unresolved)
  57:	48 89 c7             	mov    %rax,%rdi           # %rdi = %rax (first arg)
  5a:	b8 00 00 00 00       	mov    $0x0,%eax           # %al = 0: no vector registers used for variadic args
  5f:	e8 00 00 00 00       	call   64 <main+0x64>      # Call printf()
  64:	b8 00 00 00 00       	mov    $0x0,%eax           # %eax = 0, main()'s return value
  69:	c9                   	leave  
  6a:	c3                   	ret    

Note that this object file was compiled using -fno-stack-protector. gcc will protect functions that it deems vulnerable to buffer overflow attacks with a special guard value. The guard is initialized when a function is entered and then checked when the function exits. We disabled this feature to keep the main() function short.


Once the object files are linked together to build the executable, the addresses in the call instructions are resolved. For example, the call from sum_array() to sum() is now resolved as an address relative to the instruction pointer:

1213:       48 89 c7                mov    %rax,%rdi
1216:       e8 99 ff ff ff          call   11b4 <sum>
121b:       48 89 45 f8             mov    %rax,-0x8(%rbp)

The address of sum() is calculated as 0x121b (the address of the next instruction) + 0xffffff99 (the 4-byte signed integer following the e8 call instruction, in little-endian order), or 0x121b - 0x67, which results in 0x11b4. This jump target encoding relative to the instruction pointer is known as “PC-relative” encoding.
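The calculation above can be checked with a short C sketch. The function name `call_target` is hypothetical. It takes the address of the instruction after the call and the four displacement bytes as they appear in the encoding, assembles them in little-endian order, sign-extends the result to 64 bits, and adds it to the next-instruction address.

```c
#include <stdint.h>

/* Compute the target of an e8 (call rel32) instruction.
 * next_ip: address of the instruction following the call.
 * rel:     the 4 displacement bytes in the order they appear in memory. */
uint64_t call_target(uint64_t next_ip, const uint8_t rel[4]) {
    /* Assemble the little-endian bytes into a 32-bit value. */
    uint32_t raw = (uint32_t)rel[0]
                 | ((uint32_t)rel[1] << 8)
                 | ((uint32_t)rel[2] << 16)
                 | ((uint32_t)rel[3] << 24);

    /* Sign-extend to 64 bits without relying on signed conversions. */
    uint64_t disp = raw;
    if (raw & UINT32_C(0x80000000))
        disp |= UINT64_C(0xFFFFFFFF00000000);

    /* Unsigned addition wraps modulo 2^64, so a negative displacement
     * correctly moves the target backward. */
    return next_ip + disp;
}
```

With next_ip set to 0x121b and the bytes 99 ff ff ff, the function returns 0x11b4, matching the address of sum() shown in the disassembly.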

GDB (GNU Debugger)

GDB is a command-line tool that you can use to debug your program. It lets you pause program execution at arbitrary points and inspect memory, register contents, variable values, etc. There are many online tutorials on how to use GDB effectively. Also see CSAPP Section 3.10.2 and Figure 3.39 for an introduction to GDB and a list of useful commands.

Last Updated: 2024-03-12