Building a Memory Allocator from Scratch in C

Learn how memory allocation works by implementing your own malloc() and free() functions in C: a deep dive into heap management, fragmentation, and optimization techniques.

Memory allocation is one of those fundamental concepts in programming that we often take for granted. We call malloc() and free() without thinking about what happens under the hood. But understanding how allocators work is crucial for writing efficient, reliable software, especially in performance-critical applications.

In this post, we'll build a complete memory allocator from scratch in C, exploring the challenges and trade-offs involved in heap management.

Understanding the Memory Landscape

Before we dive into implementation, let's understand how memory is organized in a typical C program:

+----------------------+
|        Stack         |  ← Local variables, function calls
+----------------------+
|          ↓           |
|          ↑           |
|         Heap         |  ← Dynamic allocation (malloc/free)
+----------------------+
|         BSS          |  ← Uninitialized global/static data
+----------------------+
|         Data         |  ← Initialized global/static data
+----------------------+
|         Text         |  ← Program code
+----------------------+

The heap grows upward from lower addresses, while the stack grows downward. Our allocator will manage the heap region.

The Challenge of Memory Allocation

A memory allocator faces several challenges:

1. Tracking free blocks - Knowing which memory regions are available
2. Finding suitable blocks - Choosing the right free block for a request
3. Handling fragmentation - Dealing with scattered free memory
4. Alignment requirements - Ensuring proper memory alignment for different types
5. Overhead minimization - Keeping metadata small while maintaining functionality

Basic Allocator Design

We'll implement a simple but effective allocator using a free list approach. Here's the basic idea:

1. Request large memory chunks from the operating system
2. Split these chunks into smaller blocks as needed
3. Maintain a linked list of blocks, tracking which are free
4. Coalesce adjacent free blocks when freeing

Let's start with our header structure:

typedef struct block {
    size_t size;          /* usable size of this block, in bytes */
    struct block *next;   /* next block in the list */
    int is_free;          /* 1 if available, 0 if allocated */
} block_t;

/* Head of the block list; is_free distinguishes free from allocated blocks. */
static block_t *free_list = NULL;
static block_t *heap_start = NULL;

Finding Free Blocks

When malloc() is called, we need to find a suitable free block. We'll use the first-fit strategy - the first free block that's large enough:

static block_t *find_free_block(size_t size) {
    block_t *current = free_list;
    while (current) {
        if (current->is_free && current->size >= size) {
            return current;  /* first-fit: take the first block that is big enough */
        }
        current = current->next;
    }
    return NULL;  /* no suitable block; the caller must request more memory */
}

Requesting Memory from the OS

When we can't find a suitable free block, we need to request more memory from the operating system using sbrk() (production allocators typically use mmap() instead, but sbrk() keeps the example simple):

static block_t *request_memory(size_t size) {
    /* Round the request (plus header) up to a whole number of pages. */
    size_t total_size = sizeof(block_t) + size;
    size_t page_size = sysconf(_SC_PAGESIZE);
    size_t aligned_size = ((total_size + page_size - 1) / page_size) * page_size;

    void *block = sbrk(aligned_size);
    if (block == (void *)-1) {
        return NULL;  /* out of memory */
    }

    block_t *new_block = (block_t *)block;
    new_block->size = aligned_size - sizeof(block_t);
    new_block->is_free = 0;

    /* Link the new block into the list so free() can find it later. */
    new_block->next = free_list;
    free_list = new_block;

    if (heap_start == NULL) {
        heap_start = new_block;
    }
    return new_block;
}

Splitting Blocks

If we find a free block that's much larger than needed, we should split it to avoid wasting memory:

static void split_block(block_t *block, size_t size) {
    /* Only split if the remainder can hold a header plus a minimal payload. */
    if (block->size - size > sizeof(block_t) + 16) {
        block_t *new_block = (block_t *)((char *)block + sizeof(block_t) + size);
        new_block->size = block->size - size - sizeof(block_t);
        new_block->is_free = 1;

        /* Insert the remainder into the list right after the block being allocated. */
        new_block->next = block->next;
        block->next = new_block;
        block->size = size;
    }
}

Implementing malloc()

Now let's put it all together in our malloc() implementation:

void *malloc(size_t size) {
    if (size == 0) {
        return NULL;
    }

    size = (size + 7) & ~(size_t)7;  /* round up to 8-byte alignment */

    block_t *block = find_free_block(size);
    if (block) {
        split_block(block, size);
        block->is_free = 0;
    } else {
        block = request_memory(size);
        if (!block) {
            return NULL;
        }
        /* Carve the request out of the fresh chunk; the rest stays free. */
        split_block(block, size);
    }

    /* Return the payload, which starts just past the header. */
    return (void *)((char *)block + sizeof(block_t));
}

Implementing free()

Freeing memory is more complex than it seems. We need to:

1. Recover the block header from the user pointer
2. Mark the block as free
3. Coalesce with adjacent free blocks to reduce fragmentation

/* Returns 1 if b is immediately followed in memory by its list successor. */
static int physically_adjacent(block_t *b) {
    return b->next &&
           (char *)b + sizeof(block_t) + b->size == (char *)b->next;
}

static void coalesce_blocks(block_t *block) {
    /* Merge with the following block if it is free and physically adjacent. */
    if (physically_adjacent(block) && block->next->is_free) {
        block->size += sizeof(block_t) + block->next->size;
        block->next = block->next->next;
    }

    /* Find the block's list predecessor and merge into it if possible. */
    block_t *current = free_list;
    while (current && current->next != block) {
        current = current->next;
    }
    if (current && current->is_free && physically_adjacent(current)) {
        current->size += sizeof(block_t) + block->size;
        current->next = block->next;
    }
}

void free(void *ptr) {
    if (!ptr) {
        return;
    }

    /* The header sits immediately before the payload we handed out. */
    block_t *block = (block_t *)((char *)ptr - sizeof(block_t));
    block->is_free = 1;
    coalesce_blocks(block);
}

Complete Implementation

Let's put everything together in a complete, working allocator. (Note that because we define the standard names malloc() and free(), library routines such as printf() will route through our implementation as well.)

#include <stdio.h>
#include <stddef.h>
#include <unistd.h>

typedef struct block {
    size_t size;          /* usable size of this block, in bytes */
    struct block *next;   /* next block in the list */
    int is_free;          /* 1 if available, 0 if allocated */
} block_t;

static block_t *free_list = NULL;
static block_t *heap_start = NULL;

static block_t *find_free_block(size_t size) {
    block_t *current = free_list;
    while (current) {
        if (current->is_free && current->size >= size) {
            return current;
        }
        current = current->next;
    }
    return NULL;
}

static void split_block(block_t *block, size_t size) {
    if (block->size - size > sizeof(block_t) + 16) {
        block_t *new_block = (block_t *)((char *)block + sizeof(block_t) + size);
        new_block->size = block->size - size - sizeof(block_t);
        new_block->is_free = 1;
        new_block->next = block->next;
        block->next = new_block;
        block->size = size;
    }
}

static block_t *request_memory(size_t size) {
    size_t total_size = sizeof(block_t) + size;
    size_t page_size = sysconf(_SC_PAGESIZE);
    size_t aligned_size = ((total_size + page_size - 1) / page_size) * page_size;

    void *block = sbrk(aligned_size);
    if (block == (void *)-1) {
        return NULL;
    }

    block_t *new_block = (block_t *)block;
    new_block->size = aligned_size - sizeof(block_t);
    new_block->is_free = 0;
    new_block->next = free_list;
    free_list = new_block;

    if (heap_start == NULL) {
        heap_start = new_block;
    }
    return new_block;
}

static int physically_adjacent(block_t *b) {
    return b->next &&
           (char *)b + sizeof(block_t) + b->size == (char *)b->next;
}

static void coalesce_blocks(block_t *block) {
    if (physically_adjacent(block) && block->next->is_free) {
        block->size += sizeof(block_t) + block->next->size;
        block->next = block->next->next;
    }

    block_t *current = free_list;
    while (current && current->next != block) {
        current = current->next;
    }
    if (current && current->is_free && physically_adjacent(current)) {
        current->size += sizeof(block_t) + block->size;
        current->next = block->next;
    }
}

void *malloc(size_t size) {
    if (size == 0) return NULL;

    size = (size + 7) & ~(size_t)7;

    block_t *block = find_free_block(size);
    if (block) {
        split_block(block, size);
        block->is_free = 0;
    } else {
        block = request_memory(size);
        if (!block) return NULL;
        split_block(block, size);
    }
    return (void *)((char *)block + sizeof(block_t));
}

void free(void *ptr) {
    if (!ptr) return;

    block_t *block = (block_t *)((char *)ptr - sizeof(block_t));
    block->is_free = 1;
    coalesce_blocks(block);
}

void print_allocator_state(void) {
    printf("=== Allocator State ===\n");
    printf("Free list:\n");
    block_t *current = free_list;
    while (current) {
        printf("  Block at %p: size=%zu, free=%s\n",
               (void *)current, current->size,
               current->is_free ? "yes" : "no");
        current = current->next;
    }
    printf("\n");
}

Testing Our Allocator

Let's create a test program to verify our allocator works correctly:

int main(void) {
    printf("Testing custom allocator...\n\n");

    printf("1. Basic allocation test:\n");
    int *array = (int *)malloc(10 * sizeof(int));
    if (array) {
        for (int i = 0; i < 10; i++) {
            array[i] = i * i;
        }
        printf("   Allocated and filled array\n");
        print_allocator_state();

        free(array);
        printf("   Freed array\n");
        print_allocator_state();
    }

    printf("\n2. Multiple allocation test:\n");
    char *str1 = (char *)malloc(100);
    char *str2 = (char *)malloc(200);
    char *str3 = (char *)malloc(50);
    printf("   Allocated three blocks\n");
    print_allocator_state();

    free(str1);
    printf("   Freed first block\n");
    print_allocator_state();

    free(str3);
    printf("   Freed third block\n");
    print_allocator_state();

    free(str2);
    printf("   Freed second block\n");
    print_allocator_state();

    printf("\n3. Large allocation test:\n");
    void *large = malloc(10000);
    printf("   Allocated large block\n");
    print_allocator_state();

    free(large);
    printf("   Freed large block\n");
    print_allocator_state();

    printf("All tests completed!\n");
    return 0;
}

Advanced Topics and Optimizations

Our basic allocator works, but there are many ways to improve it:

1. Different Fit Strategies

Instead of first-fit, we could implement:

Best-fit: Find the smallest block that fits

Worst-fit: Find the largest block (good for variable-sized allocations)

Next-fit: Start searching from where we left off
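As an illustration, best-fit is a small change to the search loop from earlier. This sketch repeats the block header so it stands alone; find_best_block is our name for the variant:

```c
#include <stddef.h>

/* Same block header as in the article. */
typedef struct block {
    size_t size;
    struct block *next;
    int is_free;
} block_t;

static block_t *free_list = NULL;

/* Best-fit: scan the whole list and remember the smallest free block
 * that still satisfies the request, instead of stopping at the first hit. */
static block_t *find_best_block(size_t size) {
    block_t *best = NULL;
    for (block_t *cur = free_list; cur; cur = cur->next) {
        if (cur->is_free && cur->size >= size &&
            (best == NULL || cur->size < best->size)) {
            best = cur;
        }
    }
    return best;  /* NULL if nothing fits */
}
```

Best-fit wastes less space per allocation but always scans the entire list and tends to leave many tiny, hard-to-reuse fragments; first-fit is often faster in practice.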

2. Separate Free Lists

Maintain separate free lists for different size ranges:

#define NUM_BUCKETS 8  /* buckets cover sizes 16, 32, 64, ... bytes */

static block_t *free_buckets[NUM_BUCKETS];

/* Map an allocation size to its bucket: bucket i covers sizes up to 2^(i+4). */
int get_bucket(size_t size) {
    for (int i = 0; i < NUM_BUCKETS; i++) {
        if (size <= ((size_t)1 << (i + 4))) {
            return i;
        }
    }
    return NUM_BUCKETS - 1;  /* oversized requests share the last bucket */
}

3. Memory Alignment

Different data types have different alignment requirements:

size_t aligned_size = (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);

4. Debugging Features

Add debugging capabilities:

/* A canary is a known value written just past the payload; if it changes,
 * the program wrote out of bounds. Requires <stdint.h>; any fixed pattern works. */
#define CANARY_VALUE 0xDEADBEEF

void add_canary(block_t *block) {
    uint32_t *canary = (uint32_t *)((char *)block + sizeof(block_t) + block->size);
    *canary = CANARY_VALUE;
}

int check_canary(block_t *block) {
    uint32_t *canary = (uint32_t *)((char *)block + sizeof(block_t) + block->size);
    return *canary == CANARY_VALUE;
}

5. Memory Pools

For applications with many small, similar-sized allocations, memory pools can be much more efficient:

typedef struct pool {
    size_t block_size;   /* size of each fixed block */
    size_t pool_size;    /* total bytes in the pool */
    void *memory;        /* backing storage */
    void *free_list;     /* head of the intrusive free list */
} pool_t;

pool_t *create_pool(size_t block_size, size_t num_blocks) {
    pool_t *pool = malloc(sizeof(pool_t));
    pool->block_size = block_size;
    pool->pool_size = block_size * num_blocks;
    pool->memory = malloc(pool->pool_size);
    pool->free_list = pool->memory;

    /* Thread a free list through the blocks themselves. */
    for (size_t i = 0; i < num_blocks - 1; i++) {
        void **next = (void **)((char *)pool->memory + i * block_size);
        *next = (char *)pool->memory + (i + 1) * block_size;
    }
    /* Terminate the list at the last block. */
    *(void **)((char *)pool->memory + (num_blocks - 1) * block_size) = NULL;

    return pool;
}
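With the free list threaded through the blocks themselves, allocation and deallocation become O(1) pointer swaps. Here's a self-contained sketch (pool_alloc and pool_free are our names, and block_size must be at least sizeof(void *) for the intrusive list to fit):

```c
#include <stdlib.h>

typedef struct pool {
    size_t block_size;
    size_t pool_size;
    void *memory;
    void *free_list;   /* head of the intrusive free list */
} pool_t;

pool_t *create_pool(size_t block_size, size_t num_blocks) {
    pool_t *pool = malloc(sizeof(pool_t));
    pool->block_size = block_size;
    pool->pool_size = block_size * num_blocks;
    pool->memory = malloc(pool->pool_size);
    pool->free_list = pool->memory;

    /* Each free block stores a pointer to the next free block. */
    for (size_t i = 0; i < num_blocks - 1; i++) {
        void **next = (void **)((char *)pool->memory + i * block_size);
        *next = (char *)pool->memory + (i + 1) * block_size;
    }
    *(void **)((char *)pool->memory + (num_blocks - 1) * block_size) = NULL;
    return pool;
}

/* Pop a block off the free list: O(1). */
void *pool_alloc(pool_t *pool) {
    void *block = pool->free_list;
    if (block) {
        pool->free_list = *(void **)block;  /* advance to next free block */
    }
    return block;  /* NULL when the pool is exhausted */
}

/* Push a block back onto the free list: O(1). */
void pool_free(pool_t *pool, void *block) {
    *(void **)block = pool->free_list;
    pool->free_list = block;
}
```

Because every block is the same size, a pool has no external fragmentation and no per-block header overhead.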

Performance Considerations

Fragmentation

There are two types of fragmentation:

External fragmentation: Free memory is scattered in small pieces

Internal fragmentation: Allocated blocks are larger than needed

Our coalescing strategy helps with external fragmentation, but there are trade-offs between different strategies.

Cache Performance

Memory allocators can significantly impact cache performance. Consider:

Keeping frequently used data together

Minimizing metadata overhead

Aligning blocks to cache line boundaries

Thread Safety

Our current allocator is not thread-safe. To make it thread-safe, we'd need to add synchronization:

static pthread_mutex_t alloc_mutex = PTHREAD_MUTEX_INITIALIZER;

void *malloc(size_t size) {
    pthread_mutex_lock(&alloc_mutex);
    void *result = malloc_impl(size);  /* the single-threaded logic from above, renamed */
    pthread_mutex_unlock(&alloc_mutex);
    return result;
}

Real-World Allocators

Production allocators like jemalloc, tcmalloc, and mimalloc use sophisticated techniques:

Thread-local caching to avoid contention

Size classes for efficient small allocations

Lazy coalescing to balance performance and fragmentation

Huge pages for large allocations

NUMA awareness for multi-socket systems

Conclusion

Building a memory allocator teaches us valuable lessons about:

Memory management fundamentals - How the heap really works

Data structures - Linked lists, trees, and other structures in action

Trade-offs - Performance vs. memory usage vs. complexity

System programming - Interfacing with the operating system

Our simple allocator demonstrates the core concepts, but production allocators are much more sophisticated. They handle edge cases, optimize for specific workloads, and provide advanced features like debugging and profiling.

Understanding how allocators work helps us write better code - we can make more informed decisions about memory usage, recognize allocation patterns, and even choose the right allocator for our specific needs.

The next time you call malloc(), you'll have a deeper appreciation for the complex dance happening behind the scenes to manage your program's memory efficiently.

Want to dive deeper? Try implementing different allocation strategies, add debugging features, or explore how modern allocators like jemalloc handle massive workloads efficiently.