malloctopus

HW1: malloctopus 🐙

Submission

We will be using GitHub for distributing and collecting your assignments. At this point you should already have a repository created on your behalf in the cs4157-hw GitHub org. Follow the instructions on the class listserv to get access to that repository. Note that if you submitted the group form late, you will not have a repo yet - please reach out to the TA listserv in that case. You will have to wait until a little after the assignment has been released, but please feel free to work on it independently.

To obtain the skeleton files that we have provided for you, you need to clone your private repository. Your repository page should have a button titled “< > Code”. Click on it, and under the “Clone” section, select “SSH” and copy the link from there. For example:

$ git clone git@github.com:cs4157-hw/hw1-<id>-<your-team-name>.git

The TAs will use scripts to download, extract, build, and display your code. It is essential that you DO NOT change the names of the skeleton files provided to you. If you deviate from the requirements, the grading scripts will fail and you will not receive credit for your work.

You need to have at least 5 git commits total, but we encourage you to have many more. Your final submission should be pushed to the main branch.

For all assignments in this class, make sure:

Your submission does not contain any binary files.
Your code compiles without any warnings or errors.
Your code runs valgrind clean (without memory errors or leaks). You can run valgrind like this:
```
$ valgrind --leak-check=full <your program>
```

At a minimum, README.txt should contain the following info:

Name & UNI for all group members
Homework assignment number
Description for each part

The description should indicate whether your solution for the part is working or not. You may also want to include anything else you would like to communicate to the grader, such as extra functionality you implemented or how you tried to fix your non-working code.

Answers to written questions, if applicable, must be added to the skeleton file we have provided.

Overview

Each part of this assignment will iterate on the implementation of a memory allocator library with the following API:

void *mm_malloc(size_t size): Returns a pointer to a block of memory on the heap of at least size bytes. The allocated memory is aligned such that it can be used for any data type. Returns NULL on failure and sets errno to ENOMEM.
void mm_free(void *p): Frees the block pointed to by p, which must have been returned by a previous call to mm_malloc(). If p is not a pointer returned by mm_malloc() or if mm_free(p) has already been called before, the behavior of mm_free() is undefined. If p is NULL, no operation is performed.
int mm_init(void): Initializes the memory allocator library. Should be called before using mm_malloc() and mm_free().
void mm_deinit(void): Tears down the memory allocator library.

We’ve included binaries of our solution memory allocators for each part of this assignment on SPOC under /opt/asp/bin. Use them to verify the correctness of your implementations.

Starting in part 2, the allocator library API will also expose mm_checkheap(), which performs integrity checks on the heap data structure and prints its layout. We’ve already implemented this for you and included sample usage in mm-test.c. For any test driver (including ones you write) for your allocator, the output of mm_checkheap() should match the output produced by the test driver when linked against our solution allocator.

Part 1: Bump Allocator

Reading

CSAPP 3.9.3: Data Alignment

Concept

To warm up, let’s consider an incredibly simple implementation: a bump allocator. The underlying algorithm maintains a pointer to the next free byte on the heap. Each time malloc() gets called, the allocator returns the pointer’s current value and then advances the pointer by the allocated block’s size. If there’s not enough space in the heap to serve an allocation request, the heap is extended by calling the sbrk() system call. free() is implemented as a no-op, which means that a program that uses a bump allocator will simply keep increasing heap memory, despite “freeing” blocks it doesn’t need anymore.

`libmem`

To facilitate testing, your code for all parts of this assignment should not call sbrk() directly. We’ve adapted CSAPP’s mem_sbrk() library function which allocates memory from a preallocated pool. This allows us to place a limit on the maximum heap size to test failures. In your allocator implementation, use mem_sbrk() instead of sbrk().

This library is initialized and deinitialized using mem_init() and mem_deinit(), which are called by mm_init() and mm_deinit(), respectively.

Tasks

Complete the mm_malloc() function to finish the implementation of the bump allocator.

Here are some requirements:

Ensure that allocated blocks satisfy 64-bit memory alignment requirements. That is, each pointer returned by mm_malloc() is at a 16-byte boundary.
- The skeleton code ensures that the heap starts at a 16-byte boundary already, so you can satisfy the alignment requirement simply by ensuring all allocated block sizes are a multiple of 16 bytes. For example, a 5-byte request will be padded to 16 bytes; a 20-byte requested will be padded to 32 bytes; etc.
When there is not enough memory remaining on the heap to satisfy the request, call mem_sbrk() to grow the heap.
- Increment the heap in multiples of 4096 bytes so that mem_sbrk() does not get called too often. For example, if the shortfall is 10 bytes (e.g., 15 bytes are requested but you only have 5 bytes remaining on the heap), you’d call mem_sbrk(4096); if the shortfall is 5KB, you’d call mem_sbrk(8192).

Part 2: Implicit Free List Allocator

Reading

CSAPP 9.9.12: 32-bit implicit free list allocator implementation
CSAPP 3.9.1: Structures
The Lost Art of Struct Packing, sections 4-6: Alignment, struct packing, and bitfields

Tasks

While simple and fast, the bump allocator is not a realistic implementation for a general purpose memory allocator. In this part, we will study CSAPP’s implementation of a 32-bit implicit free list allocator in detail, and port it to a 64-bit implementation. Here are the tasks for this part:

The csapp/ directory contains the textbook’s 32-bit implementation, as described in CSAPP 9.9.12. We made minor modifications to make the code compile with our libmem and included a simple test driver for you to use. Study the code and play around with it in this directory.

Your 64-bit port of the textbook’s code will be implemented in part2/mm.c. Your job is to complete the implementations of all the functions in the file.

CSAPP’s 32-bit implementation aligns block payloads to 8-byte boundaries. Your 64-bit port should ensure block payloads are aligned to 16-byte boundaries.

Instead of using macros, we have provided the following structure definitions and function prototypes:

typedef struct {
   // Note that this diverges from CSAPP's header semantics. Here, the size
   // field stores the full block size in bytes using 60 bits. This is more
   // than sufficient -- Linux uses 48 or 57 bits for virtual addresses. We do
   // not use any part of those 60 bits to store the allocated bit. The three
   // bits after size are reserved for future use and the last bit is the
   // allocated bit.
    uint64_t      size : 60;
    uint64_t    unused :  3;
    uint64_t allocated :  1;

    char payload[];
} header_t;

typedef struct {
    uint64_t      size : 60;
    uint64_t    unused :  3;
    uint64_t allocated :  1;
} footer_t;

static inline header_t *header(void *payload) {
    // Implement this function
}

static inline footer_t *footer(void *payload) {
    // Implement this function
}

static inline void *next_payload(void *payload) {
    // Implement this function
}

static inline void *prev_payload(void *payload) {
    // Implement this function
}

They are meant to replace the following macros used by the CSAPP implementation:

#define PACK(size, alloc)  ((size) | (alloc))
#define GET(p)       (*(unsigned int *)(p))
#define PUT(p, val)  (*(unsigned int *)(p) = (val))
#define GET_SIZE(p)  (GET(p) & ~0x7)
#define GET_ALLOC(p) (GET(p) & 0x1)
#define HDRP(bp)       ((char *)(bp) - WSIZE)
#define FTRP(bp)       ((char *)(bp) + GET_SIZE(HDRP(bp)) - DSIZE)
#define NEXT_BLKP(bp)  ((char *)(bp) + GET_SIZE(((char *)(bp) - WSIZE)))
#define PREV_BLKP(bp)  ((char *)(bp) - GET_SIZE(((char *)(bp) - DSIZE)))

Do not use these macros. You should understand what they do and replace their usage with our structures/functions accordingly. Note that header_t uses a C99 flexible array member to refer to the payload that comes right after the header.

The textbook implementation of mm_init() calls extend_heap() with 4096 bytes for the first free block in the heap. Your implementation should do the same.
Your implementation of mm_malloc() should call extend_heap() with a multiple of 4096 bytes, similar to what we did in part 1. Note that this deviates from the textbook implementation.

Verify that your allocator works correctly. Write programs using sequences of mm_malloc() and mm_free() to test various parts of your code, including:
- extend_heap()
- Placement follows first-fit policy
- Larger blocks are split after placement
- Freed blocks are coalesced with neighboring blocks
- mm_malloc() handles out-of-memory errors gracefully
- 64-bit alignment

Your test drivers should be tracked in your submission repo; however, we will not be grading your test drivers.

Part 3: Explicit Free List

Reading

CSAPP 9.9.13: Explicit free lists (previously assigned)
CSAPP 3.9.2: Unions

Task

The implicit free list uses the block size as a “pointer” to the next/previous block in the list. This means that searching for a free block in the list requires going through all blocks in the list, which also includes the allocated blocks. In this part, we construct an explicit free list that only links together free blocks.

Recall from the previous part that the minimum user payload size had to be 16 bytes, resulting in a minimum block size of 32 bytes including the header and footer in order to satisfy 64-bit system data alignment requirements. The 16-byte user payload goes unused if the block is free. We can repurpose these unused bytes to hold prev/next pointers to other free blocks to make searching faster.

We now define header_t and add a couple more helper functions as follows:

typedef struct header {
    uint64_t      size : 60;
    uint64_t    unused :  3;
    uint64_t allocated :  1;

    union {
        // These links overlap with the first 16 bytes of a block's payload.
        struct {
            struct header *fprev, *fnext;
        } links;

        char payload[0];
    };
} header_t;

// ...

static inline void *next_free_payload(void *payload) {
    // Implement this function
}

static inline void *prev_free_payload(void *payload) {
    // Implement this function
}

The fprev and fnext pointers link free blocks into the explicit free list, which may be scattered across the heap. Free list neighbors are not necessarily adjacent in memory.

The union construct in C allows us to refer to the same memory location with multiple names. Here, the links overlap with the first 16 bytes of the block’s payload. When the block is allocated, we’ll use the payload field to refer to the start of the payload. When the the block is free, we’ll use links.fprev and links.fnext to refer to the free list pointers. We use a zero-sized array instead of C99 flexible array member that we used in part 2 since the latter cannot appear in a union.

With this new definition of header_t, update the allocator implementation to make use of the fprev/fnext pointers to traverse the free list. You’ll also have to place newly freed blocks back on to the free list. For simplicity, you should use a LIFO strategy: add newly freed blocks to the beginning of the free list.

Notes and requirements:

Start by copying your part2/ directory to part3/. Modify your code from part 2 to implement the explicit free list. Replace the definition of header_t and add the new helper function definitions from above.
Note that sizeof(header_t) is different from part 2, despite the size of the actual “header” not changing. Be sure that your implementations of header(), footer(), next_payload(), and prev_payload() account for this.
The explicit free list is organized as a circular doubly linked list with a sentinel node. When the free list is empty, the sentinel node simply points to itself. When the free list is not empty, the the sentinel node’s fnext points to the first real node of the list. The first node’s fprev points back to the sentinel. The last node’s fnext points to the sentinel. The sentinel node’s fprev points back to the last node of the list.
In part 2, the prologue block had an empty payload. For this part, modify the prologue block to contain a 16-byte payload to serve as the sentinel for explicit free list. You will need to account for this additional memory in mm_init(). heap_listp should point to the prologue’s payload.
Be careful updating the code for coalesce()! The newly freed block passed to coalesce() is not yet on the free list. This function first coalesces it with its free neighbors, which you should remove from the free list as part of the process. The resulting block is then added to the front of the free list (LIFO ordering).
Update checkheap() and printblock() to account for the new explicit free list.

Part 4: Segregated Free Lists

Reading

CSAPP 9.9.14: Segregated Free Lists (previously assigned)

Task

In this part, we will expand your part 3 implementation to maintain an array of segregated explicit free lists. Each segregated free list will only link free blocks of similiar sizes.

Start by copying your part3/ directory to part4/. Modify your code from part 3 to implement the segregated explicit free lists. Here are some top-level definitions you should use in this part:

#define NUM_SEG_LISTS 8
// Segregated free list bucket sizes:
// 0: 32-63 bytes
// 1: 64-127 bytes
// 2: 128-255 bytes
// 3: 256-511 bytes
// 4: 512-1023 bytes
// 5: 1024-2047 bytes
// 6: 2048-4095 bytes
// 7: 4096+ bytes
static char *seg_listp[NUM_SEG_LISTS] = {0};

// This function returns the index into the array segregated explicit free lists
// corresponding to the given size.
// Example: For 32 <= size < 64, this function returns 0.
static inline int seglist_index(size_t size) {
    // Implement this function
}

In part 3, we used a single prologue block as the sentinel node for the explicit free list. In this part, we expand the prologue of the heap to be 8 blocks, one sentinel node per segregated explicit free list. Each element of the seg_listp array points to the corresponding sentinel node’s payload, just as heap_listp pointed to the single prologue block’s payload in part 3. Be sure to update mm_init() to account for this change. The initial 4KB free block allocated by mm_init() will go into the free list referred to by seg_listp[7].

When free blocks are coalesced and split, be sure to place the resulting block on to the correct segregated free list.

Your find_fit() search algorithm should be updated to start at the snuggest size class and continue to iterate through larger size classes as needed.

You should also update checkheap() and printblock() to account for the new segregated explicit free lists.

Last Updated: 2024-02-01