How Linux Kernel implements generic linked list: part 1
A circular doubly linked list is one of the most widely used data structures in the Linux system. It is important for process scheduling (RunQueue), buffer caches, device drivers, and more.
This is a 3-part series:
- Part 1: The building blocks of the Linux kernel’s circular doubly linked list.
- Part 2: How to implement essential list routines.
- Part 3: Why this generic list is important for process management and scheduling.
Assumptions and notes for readers
- You know the basics of linked lists.
- You understand C pointers.
- Most code snippets here do not follow the Linux code style for simplicity.
Why generic linked list?
The linked list is quite a common data structure used through out the Linux implementation. Let’s take a look at the simple search result in the Linux kernel source code which initialize the linked list.
Overall linked list has a finite set of routines such as initialize
, add_entry
,remove_entry
and more in the Linux’s context. Hence it makes sense to generalize the linked list implementation such that data type it holds can vary.
What’s the fundamental problem with the concrete type approach?
For example, let’s consider a Node
structure of rudimentary linked list which holds the process list of the type task_t
.
For a simplicity, we’ve dropped the other details from the real task_t
structure.
struct task_t {
//data properties
int pid;
//link properties
task_t *prev;
task_t *next;
}
The task_t
(Node) structure, it tightly binds data
(pid) and links
(prev, next) fields together. Now prev
and next
are pointers to the structure of concrete type task_t
. Hence it could not point to the data of any type other than task_t
.
Hence in order to make the list implementation generic to hold any data type, links
pointers needs to be independent of data type it points to. It means is should be able to point any addresses of any data type or struct
entity.
One option is to use a pointer to the type void *void
, however the tradeoff with approach is loosing the type safety. Is there any other way?
Solution: Separate the links from the structure implementation
struct list_head {
struct list_head *prev;
struct list_head *next;
};
Lets think of links
pointers role, its pointing to the next
or previous
node in list.
So what if we construct the links as a single struct which can point to next
and previous
links only? Hence the linked list could be formed as per the following diagram way. It’s generic enough now to be part of any node or struct.
Problem: What’s the use of such structure and how to associate such links and data types?
What if a node struct embed list_head
as a field? In the case of task_t
it would look like:
struct task_t {
int pid;
struct list_head tasks;
}
Then despite this how to get the handle of the data type task_t
from list node field tasks
(links)?
Solution: Calculate the offset to get the base address
The fields in a struct are stored in order in memory. If we know the address of tasks
, we can calculate the address of the whole task_t
struct by subtracting the offset of tasks
.
The entry or node in the linked list of task_t
is a structure. Hence memory allocated to fields of type task_t
is contiguous which is assured.
Then, how about tracing back the base address of the task_t
data(instance) by calculating the relative memory offset of the tasks
( or tasks.prev
or tasks.next
which is 8 or 16 bytes on 64 bit processor respectively) to the base address of the task_t
data(instance) in which tasks
field contains?
which means
base_address of previous (type_t) = address(link.prev) - 8 bytes
base_address of next (type_t) = address(link.next) - 16 bytes
The following example demonstrates calculating the byte offset of any field relative to its structure using a macro OFFSET_OF
. This macro accepts two arguments, the structure type T
and field name
belonging the structure.
/// offset.c
#include <stdio.h>
//calculates the byte offset of the field x with respect to its position in the struct T
#define OFFSET_OF(T, x) (unsigned long long int) (&(((T*)0) -> x))
struct list_head {
struct list_head *prev;
struct list_head *next;
};
struct task_t {
int pid;
struct list_head tasks;
};
int main() {
unsigned long int off_pid, off_tasks;
off_pid = OFFSET_OF(struct task_t, pid);
off_tasks = OFFSET_OF(struct task_t, tasks);
printf("task_t.pid offset %li\n", off_pid);
printf("task_t.tasks offset %li\n", off_tasks);
return 0;
}
$ gcc -o offset offset.c
$ ./offset
task_t.pid offset 0
task_t.tasks offset 8
The following example illustrate how address are assigned to the field.
//address.c
#include <stdio.h>
#include <stdint.h>
struct list_head {
struct list_head *prev;
struct list_head *next;
};
struct task_t {
int pid;
struct list_head tasks;
};
int main() {
struct task_t t1;
printf("%lu size pid\n", sizeof(t1.pid));
printf("%lu size prev\n", sizeof(t1.tasks.prev));
printf("%lu size next\n", sizeof(t1.tasks.next));
printf("%lu address of pid\n", (uintptr_t)&t1.pid); // also same as t1
printf("%lu address of prev\n", (uintptr_t)&t1.tasks.prev);
printf("%lu address of next\n", (uintptr_t)&t1.tasks.next);
return 0;
}
$ gcc -o address address.c
$ ./address
4 size pid
8 size prev
8 size next
140720804185888 address of pid
140720804185896 address of prev
140720804185904 address of next
Note: The pid
field is 4 bytes, but the next address starts after 8 bytes. The compiler adds 4 bytes of padding so the next field is aligned for faster access. This is called byte alignment
. The alignment may vary on 32 or 64 bit systems.
Let’s break down the OFFSET_OF
macro:
#define OFFSET_OF(T, x) (unsigned long long int) ((&(((T*)0) -> x)))
...
...
...
off_tasks = OFFSET_OF(struct task_t, tasks);
...
...
When you use OFFSET_OF(struct task_t, tasks)
, it expands to by C compilers pre-processor:
off_pid = (unsigned long long int) (&((struct task_t*)0) -> pid);
Step-by-step breakdown of the OFFSET_OF
macro
- (T)0
- This casts the integer
0
to a pointer of typeT*
. - So, it creates a pointer that “pretends” to point to a struct of type
T
at address 0.
- This casts the integer
- ((T)0) -> x
- This accesses the member
x
of the struct pointed to by the pointer. - Since the pointer is at address 0,
((T*)0)->x
is the address of memberx
as if the struct started at address 0.
- This accesses the member
- &(((T)0) -> x)
- This takes the address of the member
x
within the struct. - Since the base address is 0, the address of
x
is actually its offset from the start of the struct.
- This takes the address of the member
- (unsigned long long int)
- This casts the result to an unsigned long long integer, so you get the offset as a number (in bytes).
Now, to get back from a list_head
pointer to the parent structure (like task_t
), we use the CONTAINER_OF
macro. This is a common C trick to implement generic data structures.
How does CONTAINER_OF work?
Suppose you have a pointer to a field inside a struct (like tasks
inside task_t
). The macro calculates the address of the whole struct by subtracting the offset of the field from the field’s address.
Here’s a minimal example:
// generic_list_node.c
#include <stdio.h>
#include <stdlib.h>
#define OFFSET_OF(T, x) \
(unsigned long long int) ((&(((T*)0) -> x)))
#define CONTAINER_OF(x, T, name) \
(T*)((((char*)(x)) - OFFSET_OF(T,name)))
struct list_head {
struct list_head *prev, *next;
};
struct task_t {
int pid;
struct list_head tasks;
};
int main() {
// initialize list
struct list_head tasks_list;
struct task_t t1;
t1.pid = 1274;
// insert task_t entry
t1.tasks.next = &tasks_list;
t1.tasks.prev = &tasks_list;
tasks_list.next = &t1.tasks;
tasks_list.prev = &t1.tasks;
struct task_t *task = CONTAINER_OF(tasks_list.next, struct task_t, tasks);
printf("original task address: %p, retrieved task address: %p\n", &t1, task);
printf("pid %d\n", task -> pid);
return 0;
}
$ gcc -o node_access generic_list_node.c
$ ./node_access
original task address: 0x7ffee7dfc5f0, retrieved task address: 0x7ffee7dfc5f0
pid 1274
Explanation:
-
OFFSET_OF(type, member) gives the byte offset of member in type.
-
CONTAINER_OF(ptr, type, member) subtracts this offset from the pointer to the member, giving you the pointer to the whole struct.
-
In the example,
head.next
points tot1.tasks
. UsingCONTAINER_OF(tasks_list.next, struct task_t, tasks)
we get back tot1
.
Once the list is built, the tasks
list looks like this:
+-------------------+ +-------------------+ +-------------------+
| head_node | | task_t (node 1) | | task_t (node 2) |
| (tasks_list) | | | | |
| +-----------+ | | +-----------+ | | +-----------+ |
| | tasks |<---+------->| tasks |<---+------->| tasks |<---+
| +-----------+ | | +-----------+ | | +-----------+ |
+-------------------+ +-------------------+ +-------------------+
^ |
| |
+------------------------------------------------------+
Putting it all together
This is the basic idea behind the Linux kernel’s circular doubly linked list. This is still a simple version. You can see the real container_of
macro here and the offsetof
macro here.
In the next part, we’ll use this base to implement essential list routines.