Master memory management: I built my own malloc, and you should too
Ever stop and think about how your computer is actually shuffling trillions of bytes around every second? 🤹 It's a question that always fascinated me. So, I decided to peek behind the curtain and dive into one of the most fundamental pieces of the puzzle: dynamic memory allocation.
In this post, I'll walk you through why malloc even exists, how it works on a deep level, and how I built my own version from scratch using the mmap system call. If that sounds complex, don't worry. I'm going to break it down from first principles. For me, grappling with this was a real "aha!" moment. And if you want to get your hands dirty, my completed project is on GitHub. Let's get into it.
// These are the functions we're going to build.
void* malloc(size_t size);
void free(void* ptr);
void* realloc(void* ptr, size_t size);
void* calloc(size_t count, size_t size);

// This is how we'll ask the OS for memory.
#include <sys/mman.h>
void* mmap(void* addr, size_t len, int prot, int flags, int fd, off_t offset);
int munmap(void* addr, size_t len);

// And these help us set some ground rules.
#include <sys/resource.h>
int getrlimit(int resource, struct rlimit* rlp);
int setrlimit(int resource, const struct rlimit* rlp);
Memory: The rigid, the fleeting, and the on-demand
Let's quickly touch on how C normally handles memory. It's a pretty rigid system.
- Static and global variables: These are set in stone when you compile. They exist from the moment the program starts to the second it stops, living right alongside the code itself.
- Automatic variables: These are the ones inside functions. They get created on the "stack" when a function is called and disappear the moment the function returns.
This works, but it has two massive limitations:
- You need to know the size of everything upfront. You can't just create an array and decide how big it is later.
- You're stuck with a fixed lifespan. The memory either lasts forever or for the duration of a single function call. Nothing in between.
This is why we need dynamic allocation. It's for all the situations where you don't know the "what" or the "when" at compile time.
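To make that concrete, here's a small standalone example (not part of the allocator itself) where the array size is only known at runtime, which is exactly the situation static and automatic storage can't handle:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t count = 0;

    printf("How many readings? ");
    if (scanf("%zu", &count) != 1 || count == 0)
        return 1;

    // The size is only known now, at runtime, so neither a global nor a
    // stack array will do; we allocate dynamically.
    double *readings = malloc(count * sizeof(*readings));
    if (readings == NULL)
        return 1;

    for (size_t i = 0; i < count; i++)
        readings[i] = 0.0;

    // ... use the array for however long we need it ...

    free(readings); // and we decide when its lifetime ends
    return 0;
}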
The kernel's power tool: mmap
#include <sys/mman.h>
void* mmap(void* addr, size_t len, int prot, int flags, int fd, off_t offset);
So, how do we get memory on demand? We have to ask the operating system, and the way to ask is through a system call. The specific one I focused on is mmap(). Think of it as a direct line to the kernel: you ask for a chunk of memory, and it maps that chunk into your program's virtual address space. It's the ultimate source of memory.
There's another tool called sbrk, but for this project, mmap is our weapon of choice. It's incredibly flexible for managing memory regions.
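To get a feel for it before building anything, here's a minimal standalone sketch that asks the kernel for a single page, uses it, and hands it back:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = (size_t)getpagesize(); // one page, typically 4096 bytes

    // Ask the kernel for an anonymous, private, read/write mapping.
    char *page = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (page == MAP_FAILED)
        return 1;

    page[0] = 42; // the memory is ours to read and write
    printf("the kernel gave us a page at %p\n", (void *)page);

    munmap(page, len); // and ours to give back
    return 0;
}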
If mmap is the source, why bother with malloc?
This was the first big question for me. If mmap gives us memory, why don't we just call it every time we need a new variable?
The answer is performance. System calls are expensive. They require a context switch from your program to the kernel, which is a time-consuming operation. Most applications request and release little bits of memory thousands of times a second. If every single one of those requests was a full-blown system call, our programs would grind to a halt.
This is where malloc comes in. It's a clever middleman. Instead of you going to the kernel for every little thing, malloc goes once and requests a huge chunk of memory. Then, it manages that chunk for you. When you ask for a bit of memory, malloc just slices off a piece from the chunk it's already holding. Yes, this adds a little bit of overhead (the malloc library itself uses some memory), but the speed gain is enormous. It's a classic engineering trade-off.
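To illustrate the trade-off, here's a toy "bump" allocator. It's for illustration only, not how my malloc works: it grabs one big chunk up front and then hands out slices with plain arithmetic, so only the very first call pays the system-call price.

#include <stddef.h>
#include <sys/mman.h>

static char  *g_chunk = NULL;               // the big region we grab from the kernel once
static size_t g_offset = 0;                 // how much of it we've handed out so far
static const size_t g_chunk_size = 1 << 20; // 1 MiB up front

// Toy bump allocator: one mmap call, then every "allocation" is just arithmetic.
void *toy_alloc(size_t size)
{
    if (g_chunk == NULL) {
        void *mem = mmap(NULL, g_chunk_size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANON, -1, 0);
        if (mem == MAP_FAILED)
            return NULL;
        g_chunk = mem;
    }
    if (g_offset + size > g_chunk_size)
        return NULL; // out of room; a real malloc would map another chunk
    void *ptr = g_chunk + g_offset;
    g_offset += size;
    return ptr; // no system call on this path, just pointer arithmetic
}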
Let's build this thing: My implementation
The library: The memory toolkit
My malloc library provides the classic trio:
- malloc: Asks for a block of memory and returns a pointer to it.
- free: Takes that pointer back when you're done and marks the memory as available.
- realloc: Lets you resize a block of memory you already allocated, keeping the original data.
The data structure: How I organized memory
To make this work, I had to decide how to keep track of everything. I settled on a two-level hierarchy:
- Heap: A large region of memory I request from the OS using mmap.
- Block: A smaller piece of a heap that I hand out when malloc is called.
Both of these need some metadata. I put a small header at the beginning of each heap and each block to store information. After a single malloc call, the memory map looks something like this:
Heap and Block Structure
Here are the C structs I defined for that metadata:
#include <stdbool.h>
#include <stddef.h>

// The three size classes a heap can belong to (enum definition added here so the structs compile).
typedef enum e_heap_group { TINY, SMALL, LARGE } t_heap_group;

// Metadata for a whole mmap'd region
typedef struct s_heap {
    struct s_heap *prev;
    struct s_heap *next;
    t_heap_group group; // TINY, SMALL, or LARGE
    size_t total_size;
    size_t free_size;
    size_t block_count;
} t_heap;

// Metadata for a single allocated block
typedef struct s_block {
    struct s_block *prev;
    struct s_block *next;
    size_t data_size;
    bool freed;
} t_block;
By giving each block next and prev pointers, I effectively created a linked list. This lets me walk through the heap to find free spots or to find the neighbors of a block I want to free.
These little macros were helpers to quickly jump from the start of a heap or block to the user-data area.
#define HEAP_SHIFT(start) ((void*)start + sizeof(t_heap))
#define BLOCK_SHIFT(start) ((void*)start + sizeof(t_block))
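With the structs and macros above, walking a heap's block list is a simple loop. Here's a sketch of a first-fit search; the helper name is mine and the code in the repo may differ:

// Hypothetical helper: walk a heap's block list (the first block sits right
// after the heap header) and return the first freed block big enough for
// `size` bytes, or NULL if none fits. Assumes the heap has at least one block.
static t_block *find_free_block(t_heap *heap, size_t size)
{
    t_block *block = (t_block *)HEAP_SHIFT(heap);

    while (block != NULL) {
        if (block->freed && block->data_size >= size)
            return block;
        block = block->next;
    }
    return NULL;
}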
Performance strategy: Not all allocations are equal
I quickly realized that treating a 10-byte allocation the same as a 10-megabyte allocation was a bad idea. To optimize, I created three categories: TINY, SMALL, and LARGE. My strategy was to pre-allocate memory pages for TINY and SMALL requests, aiming to fit at least 100 blocks in each heap. LARGE blocks are the exception; they are allocated one-off without pre-allocation since they're usually rare.
A quick pro-tip I learned: it's far more efficient to make your heap sizes a multiple of the system's page size. You can get this with getpagesize() (or getconf PAGE_SIZE in the terminal). On my machine, it's 4096 bytes.
So, I did some math to define my heap sizes:
// Four pages, split into 128 TINY blocks of 128 bytes each
#define TINY_HEAP_ALLOCATION_SIZE (4 * getpagesize())
#define TINY_BLOCK_SIZE (TINY_HEAP_ALLOCATION_SIZE / 128)

// Sixteen pages, split into 128 SMALL blocks of 512 bytes each
#define SMALL_HEAP_ALLOCATION_SIZE (16 * getpagesize())
#define SMALL_BLOCK_SIZE (SMALL_HEAP_ALLOCATION_SIZE / 128)
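With those sizes defined, routing a request to a heap group is just a pair of comparisons. A sketch of that decision, using the t_heap_group enum from the structs above (the helper name is mine):

// Hypothetical helper: map a requested size to a heap group.
static t_heap_group get_heap_group(size_t size)
{
    if (size <= (size_t)TINY_BLOCK_SIZE)
        return TINY;
    if (size <= (size_t)SMALL_BLOCK_SIZE)
        return SMALL;
    return LARGE; // large requests get their own one-off mmap
}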
The malloc algorithm: Finding a home for data
When a malloc call comes in, here's the logic my code follows:
- It first looks at a global pointer to see if any heaps already exist.
- It then walks through the list of heaps, looking for a free block that's big enough. I used the first-fit strategy: grab the first one that works. It's simple and fast.
- If it gets to the end of a heap and there's still room, it just adds a new block there.
- If the last heap is totally full, it's time to ask the OS for more land by calling mmap.
// The system call to create a new heap (returns MAP_FAILED on error).
t_heap *heap = (t_heap *)mmap(NULL, heap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
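Putting those steps together, the heart of malloc looks roughly like this. It's a simplified sketch built on the structs and helpers shown above; it skips block splitting, size validation, and the per-group heap lists my real code keeps:

// Hypothetical global list head for all heaps; this sketch always creates TINY-sized heaps.
static t_heap *g_heaps = NULL;

void *malloc_sketch(size_t size)
{
    // Steps 1 & 2: walk the existing heaps, first-fit search for a freed block.
    for (t_heap *heap = g_heaps; heap != NULL; heap = heap->next) {
        t_block *found = find_free_block(heap, size);
        if (found != NULL) {
            found->freed = false;
            return BLOCK_SHIFT(found); // hand back the user-data area
        }
        // Step 3, appending a fresh block at the end of a heap that still
        // has room, is omitted to keep the sketch short.
    }

    // Step 4: every heap is full, so ask the OS for a new one.
    size_t heap_size = TINY_HEAP_ALLOCATION_SIZE;
    t_heap *heap = mmap(NULL, heap_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANON, -1, 0);
    if (heap == MAP_FAILED)
        return NULL;
    heap->group = TINY; // the real code picks the group from the size
    heap->total_size = heap_size;
    heap->free_size = heap_size - sizeof(t_heap) - sizeof(t_block) - size;
    heap->block_count = 1;
    heap->prev = NULL;
    heap->next = g_heaps;
    g_heaps = heap;

    // Carve the first block out of the new heap.
    t_block *block = (t_block *)HEAP_SHIFT(heap);
    block->prev = NULL;
    block->next = NULL;
    block->data_size = size;
    block->freed = false;
    return BLOCK_SHIFT(block);
}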
free and the fragmentation problem
Memory Fragmentation
When free is called, simply marking a block as "available" is easy, but it creates a problem called fragmentation. You end up with lots of small, useless holes in your memory, like a game of Tetris gone wrong.
To fight this, I implemented a couple of key strategies:
- Coalescing: When a block is freed, I check if its neighbors are also free. If so, I merge them into one larger free block.
- Returning memory: If the block being freed is the very last one in a heap, and I have other heaps available, I just release the entire empty heap back to the OS with munmap. No point in holding onto empty memory.
// Give the memory back to the kernel.
munmap(heap, heap->total_size);
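Coalescing itself is just pointer surgery on the doubly linked block list. Here's a sketch of merging a freed block with a free right-hand neighbour (the helper name is mine), relying on blocks being laid out back to back inside a heap:

// Hypothetical helper: if the block's right-hand neighbour is also free,
// absorb its header and data so the two holes become one bigger one.
static void coalesce_with_next(t_block *block)
{
    t_block *next = block->next;

    if (next != NULL && next->freed) {
        block->data_size += sizeof(t_block) + next->data_size;
        block->next = next->next;
        if (next->next != NULL)
            next->next->prev = block;
    }
}

free runs the same merge against the previous neighbour, and only then checks whether the whole heap has gone empty and can be handed back with munmap.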
realloc: The shape-shifter
realloc often boils down to a simple recipe: malloc a new block of the desired size, memcpy the data from the old block to the new one, and then free the old block.
One edge case to be aware of is realloc(ptr, 0). The behavior here can vary. I took a "lazy" approach and just returned the original pointer. You should know, however, that some standards say this should be equivalent to free(ptr). My advice: don't use realloc to free memory. Use the right tool for the job.
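For completeness, here's the recipe as a sketch, using the block header to recover the old size. It skips the optimization where the existing block is already big enough to be reused:

#include <string.h>

// Sketch of the recipe: allocate, copy, free. The old block's header tells
// us how many bytes are safe to copy.
void *realloc_sketch(void *ptr, size_t size)
{
    if (ptr == NULL)
        return malloc(size); // realloc(NULL, n) behaves like malloc(n)
    if (size == 0)
        return ptr; // the "lazy" choice discussed above

    t_block *block = (t_block *)((char *)ptr - sizeof(t_block)); // step back to the header
    void *new_ptr = malloc(size);
    if (new_ptr == NULL)
        return NULL; // on failure the old block is left untouched

    size_t to_copy = block->data_size < size ? block->data_size : size;
    memcpy(new_ptr, ptr, to_copy);
    free(ptr);
    return new_ptr;
}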
Putting it to the test
The most rewarding part was seeing my malloc run real-world programs. I wrote a little script to force the dynamic linker to load my library instead of the standard system one.
#!/bin/sh
export DYLD_LIBRARY_PATH=.
export DYLD_INSERT_LIBRARIES=libft_malloc.so
export DYLD_FORCE_FLAT_NAMESPACE=1
"$@"
Saving this as run.sh let me do things like sh run.sh ls -l or sh run.sh vim and see if they worked.
The vim crash and the alignment lesson
And of course, they didn't all work at first. ls was fine, but running vim immediately caused a segmentation fault. What was going on?
The culprit was memory alignment. It turned out that the standard malloc on macOS (where I was testing) doesn't just return any pointer. It guarantees the address is a multiple of 16. Certain programs and SIMD instructions assume that alignment, and when it's missing they don't just slow down, they fault. My malloc made no such guarantee, and vim crashed.
The fix was a simple but powerful bitwise trick: size = (size + 15) & ~15;. This one line ensures that the size is always a multiple of 16, and thus the returned address will be properly aligned. A crucial lesson learned.
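To see why it works: adding 15 pushes any non-multiple of 16 past the next boundary, and masking with ~15 clears the low four bits, snapping the result back down to that boundary. A quick sketch you can run to convince yourself:

#include <stdio.h>

// Round a size up to the next multiple of 16.
static size_t align16(size_t size)
{
    return (size + 15) & ~(size_t)15;
}

int main(void)
{
    // 13 -> 16, 16 -> 16, 17 -> 32, 100 -> 112
    printf("%zu %zu %zu %zu\n", align16(13), align16(16), align16(17), align16(100));
    return 0;
}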
And that's the journey! We went from the kernel's mmap call all the way to a functioning, tested malloc library. For me, this project wasn't just about writing code; it was about demystifying a fundamental part of how our machines operate.
This was a powerful reminder that practice is everything. If you want to dig deeper, I encourage you to check out the complete implementation on my GitHub. Fork it, break it, and make it better. When you truly understand the foundation, you gain the power to build anything on top of it. Happy coding.