Architecture
This page describes the structure of the kernite source tree, the architecture abstraction layer, the global lock hierarchy, and the constraints that govern kernel code.
Module Map
The kernel source (kernite/src/) is organized into the following modules:
| Module | Responsibility |
|---|---|
| | Kernel entry point ( |
| | Capability system: fat capabilities, CNode operations, global slot array, CDT, untyped retype, MemoryObject, IoPort, reference counting |
| | Inter-process communication: endpoints, notifications, futex, IRQ handler objects, IPC wait queue |
| | Memory management: VSpace (page tables, Maple tree), PMM (bitmap frame allocator), radix tree, node allocator |
| | Scheduling: EDF scheduler, thread control blocks, priority inheritance, sleep queue |
| | System call dispatch: 31 syscalls (numbers 0–30; |
| | x86_64 backend: GDT, IDT, APIC, ACPI, paging, SMP, CPUID, FPU, PIT, SMAP/SMEP, uaccess |
| | aarch64 backend: GICv3, PSCI, PL011 UART, paging (TTBR0/TTBR1), generic timer, FPU/NEON, SMP |
| | Init task bootstrap: CSpace setup, ELF loading, first user task creation |
| | Framebuffer text console: rendering, bitmap font |
| | TLV boot info parser |
| | ELF binary loader |
| | CPIO archive parser (initrd extraction) |
| | Shared ACPI table parsing (RSDP/XSDT, MCFG/PCIe ECAM) |
| | Random number generation (RDRAND on x86_64, generic fallback on aarch64) |
| | Compiler built-in stubs ( |
Architecture Abstraction
The arch/mod.rs module provides a unified interface consumed by generic kernel code.
Each architecture backend (x86_64/ or aarch64/) implements the same set of functions and types, selected at compile time via #[cfg(target_arch)].
Functions exported through arch/mod.rs:
| Function | Purpose |
|---|---|
| | Early machine bring-up: paging, exceptions, CPU-local state. Calls |
| | Start application processors (APs). |
| | Enable timer interrupts. Called after the scheduler is initialized. |
| | Remove the bootloader identity mapping after all APs have booted. |
| | Halt CPU until next interrupt. |
| | Enable / disable interrupts. |
| | Save old thread state, restore new thread state, switch page tables. |
| | Enter user mode for a newly created thread. |
| | Send inter-processor interrupt. |
| | Read monotonic tick counter / nanosecond timestamp. |
| | ACPI system power-off. Does not return. |
Re-exported sub-modules: paging, fpu, cpuid, uaccess.
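The following minimal sketch illustrates compile-time backend selection with #[cfg(target_arch = "...")]. The module name backend and the function arch_name are hypothetical stand-ins for the real exported functions; the fallback module exists only so the sketch also builds on hosts that are neither x86_64 nor aarch64.

```rust
// Hypothetical sketch: each backend module provides the same items, and
// exactly one of them is compiled in for the current target.
#[cfg(target_arch = "x86_64")]
mod backend {
    /// Name of the selected backend (hypothetical helper).
    pub fn arch_name() -> &'static str {
        "x86_64"
    }
}

#[cfg(target_arch = "aarch64")]
mod backend {
    /// Name of the selected backend (hypothetical helper).
    pub fn arch_name() -> &'static str {
        "aarch64"
    }
}

// Fallback so this sketch builds on other hosts; a real kernel would
// instead reject unsupported targets with compile_error!.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
mod backend {
    pub fn arch_name() -> &'static str {
        "unsupported"
    }
}

// Generic code only ever names `arch::...`, never a specific backend.
mod arch {
    pub use super::backend::arch_name;
}
```

Because the selection happens at compile time, generic kernel code pays no runtime dispatch cost and cannot accidentally call into the other architecture's backend.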
x86_64 Backend
| File | Content |
|---|---|
| | BSP bootstrap: GDT, IDT, APIC, ACPI, paging, CPUID |
| | AP trampoline (real → long mode) via ACPI MADT |
| | Global Descriptor Table with TSS |
| | Interrupt Descriptor Table (256 entries) |
| | Local APIC + I/O APIC (timer, IPI, IRQ routing) |
| | ACPI MADT / MCFG / HPET parsing |
| | 4-level page tables (PML4 → PDPT → PD → PT) |
| | Context switch via register save/restore |
| | CPUID feature detection |
| | x87/SSE/AVX eager context switch (XSAVE/XRSTOR on every TCB switch, |
| | Legacy PIT for APIC timer calibration |
| | SMAP/SMEP user memory access control |
Syscall entry: syscall instruction. Entry point in syscall.S swaps to the kernel GS base and stashes the caller’s user RSP in PerCpuData.saved_rsp (%gs:16), then loads the kernel stack pointer from PerCpuData.kernel_stack (%gs:8).
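Because syscall.S addresses PerCpuData through fixed %gs offsets, the Rust definition has to keep a matching #[repr(C)] layout. A sketch under that assumption: only kernel_stack (offset 8) and saved_rsp (offset 16) come from the text above; self_ptr at offset 0 and cpu_id are hypothetical placeholders.

```rust
use core::mem::offset_of;

/// Sketch of a per-CPU block matching the %gs offsets used by syscall.S.
/// `self_ptr` and `cpu_id` are hypothetical fillers, not the real fields.
#[repr(C)]
pub struct PerCpuData {
    pub self_ptr: u64,     // %gs:0  (hypothetical)
    pub kernel_stack: u64, // %gs:8  loaded into RSP on syscall entry
    pub saved_rsp: u64,    // %gs:16 caller's user RSP stashed on entry
    pub cpu_id: u32,       // hypothetical
}

// Compile-time guards: the assembly hard-codes these offsets, so reordering
// the fields must break the build instead of corrupting the stack switch.
const _: () = assert!(offset_of!(PerCpuData, kernel_stack) == 8);
const _: () = assert!(offset_of!(PerCpuData, saved_rsp) == 16);
```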
aarch64 Backend
| File | Content |
|---|---|
| | BSP bootstrap: GICv3, PSCI, generic timer, paging |
| | AP startup via PSCI |
| | GICv3 distributor (GICD), redistributor (GICR), CPU interface (ICC system registers) |
| | Power State Coordination Interface (HVC calls) |
| | Generic timer: EL1 physical timer (CNTP), PPI 30, 10 ms tick |
| | 4-level page tables (TTBR0 user / TTBR1 kernel) |
| | Context switch via |
| | NEON Q0-Q31 + FPCR + FPSR eager switch (save/restore on every TCB switch, |
| | Exception vector table (sync/IRQ/FIQ/SError at EL1/EL0) |
| | PL011 UART driver for early serial output |
Syscall entry: svc #0 instruction. The exception vector dispatches to the Rust syscall handler.
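For the 10 ms generic-timer tick, the reload value follows directly from the counter frequency. A minimal sketch, assuming the backend reads the frequency from CNTFRQ_EL0 and writes the result to CNTP_TVAL_EL0 on each tick; the function name ticks_per_period is hypothetical.

```rust
/// Number of generic-timer counts in one scheduler tick.
///
/// Hypothetical helper: `cntfrq_hz` would come from CNTFRQ_EL0 and the
/// result would be programmed into CNTP_TVAL_EL0.
pub fn ticks_per_period(cntfrq_hz: u64, period_ms: u64) -> u64 {
    // Multiply before dividing so frequencies that are not a whole
    // number of kHz are not truncated early.
    cntfrq_hz * period_ms / 1_000
}
```

For example, a 62.5 MHz counter needs 625,000 counts per 10 ms tick.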
Lock Ordering
The kernel enforces a strict lock acquisition order to prevent deadlocks. Locks must be acquired outermost-first and released in reverse order.
The global lock hierarchy (from lib.rs):
CAP_LOCK
→ endpoint.lock / ntfn.lock / tcb.lock / sc.lock
→ SLEEP_LOCK / FUTEX_LOCK / IRQ_LOCK
→ sched.lock_cpu
→ VSpace.lock
→ ASID_LOCK
→ MO.commit_lock | MO.rmap_lock
→ ut.alloc_lock
→ FRAME_LOCK
→ SERIAL_LOCK
MO.commit_lock and MO.rmap_lock sit at the same tier but are disjoint — they must never be held simultaneously.
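One way to catch ordering violations during development is a per-CPU record of held tiers. The sketch below is hypothetical (not the kernel's mechanism): it numbers the tiers 1 to 10 from outermost to innermost, which makes sched.lock_cpu tier 4, and rejects any acquisition that is not strictly inner to the most recently acquired lock. That strictness also forbids holding two locks from one tier, which matches the MO.commit_lock / MO.rmap_lock rule and is conservative for the other shared tiers.

```rust
/// Lock tiers from the hierarchy above, outermost = 1 (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Cap = 1,     // CAP_LOCK
    Object = 2,  // endpoint.lock / ntfn.lock / tcb.lock / sc.lock
    Queue = 3,   // SLEEP_LOCK / FUTEX_LOCK / IRQ_LOCK
    Sched = 4,   // sched.lock_cpu
    VSpace = 5,  // VSpace.lock
    Asid = 6,    // ASID_LOCK
    MemObj = 7,  // MO.commit_lock | MO.rmap_lock
    UtAlloc = 8, // ut.alloc_lock
    Frame = 9,   // FRAME_LOCK
    Serial = 10, // SERIAL_LOCK
}

/// Per-CPU stack of held tiers (hypothetical debug aid). A fixed array is
/// used because the kernel has no heap.
pub struct HeldLocks {
    stack: [Tier; 10],
    len: usize,
}

impl HeldLocks {
    pub const fn new() -> Self {
        Self { stack: [Tier::Cap; 10], len: 0 }
    }

    fn top(&self) -> Option<Tier> {
        if self.len == 0 { None } else { Some(self.stack[self.len - 1]) }
    }

    /// Record an acquisition; returns the conflicting held tier on violation.
    pub fn acquire(&mut self, t: Tier) -> Result<(), Tier> {
        if let Some(top) = self.top() {
            if t <= top {
                return Err(top);
            }
        }
        // Tiers are strictly increasing, so at most 10 entries: no overflow.
        self.stack[self.len] = t;
        self.len += 1;
        Ok(())
    }

    /// Record a release; only the innermost held lock may be released.
    pub fn release(&mut self, t: Tier) -> Result<(), Option<Tier>> {
        match self.top() {
            Some(top) if top == t => {
                self.len -= 1;
                Ok(())
            }
            other => Err(other),
        }
    }
}
```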
Nesting Patterns
- Slowpath syscalls: CAP_LOCK (capability lookup) → release → endpoint.lock (IPC operation).
- IPC capability transfer: endpoint.lock → release → CAP_LOCK (slot copy into receiver’s CNode) → reacquire endpoint.lock. The endpoint lock must be released before acquiring CAP_LOCK to preserve tier ordering.
- IPC fastpath: CAP_LOCK (copy endpoint cap to stack) → release → endpoint.lock → sched.lock_cpu.
- Timer / IPI handler: sched.lock_cpu (directly, no outer lock held).
- Context switch: releases sched.lock_cpu before the architectural switch and reacquires it on resume.
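The IPC capability transfer pattern is the only one above that drops a lock and later reacquires it. The host-side sketch below models it with std::sync::Mutex in place of the kernel's IRQ-saving spinlocks; Endpoint, CNode, pending_caps, and transfer_cap are hypothetical names, and the revalidation a real kernel would need after reacquiring endpoint.lock is only noted in a comment.

```rust
use std::sync::Mutex;

// Hypothetical stand-ins: the kernel uses spinlocks and fixed slot arrays,
// not std::sync::Mutex and Vec.
pub struct Endpoint {
    pub pending_caps: Vec<u64>, // capabilities the sender wants to transfer
}
pub struct CNode {
    pub slots: Vec<Option<u64>>, // receiver's CNode slots
}

/// endpoint.lock -> release -> CAP_LOCK -> release -> reacquire endpoint.lock.
/// Returns the receiver slot the capability landed in, if any.
pub fn transfer_cap(endpoint: &Mutex<Endpoint>, cap_lock: &Mutex<CNode>) -> Option<usize> {
    // Step 1: take the capability off the endpoint under endpoint.lock.
    // The temporary guard is dropped at the end of this statement, so
    // endpoint.lock is not held when CAP_LOCK (an outer tier) is taken.
    let cap = endpoint.lock().unwrap().pending_caps.pop()?;

    // Step 2: copy into the first free receiver slot under CAP_LOCK.
    let slot = {
        let mut cnode = cap_lock.lock().unwrap();
        let free = cnode.slots.iter().position(|s| s.is_none());
        if let Some(i) = free {
            cnode.slots[i] = Some(cap);
        }
        free
    }; // CAP_LOCK released here.

    // Step 3: reacquire endpoint.lock to finish the IPC operation. The
    // endpoint may have changed while unlocked; a real kernel would
    // revalidate its state here (omitted in this sketch).
    let mut ep = endpoint.lock().unwrap();
    if slot.is_none() {
        // No free slot: return the capability so it is not lost.
        ep.pending_caps.push(cap);
    }
    slot
}
```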
This document previously used SCHED_IPC_LOCK to refer to the scheduler/IPC lock tier.
The canonical names used in Lock Ordering are sched.lock_cpu (per-CPU scheduler state lock) and endpoint.lock (per-endpoint IPC lock).
Some architecture documents still use SCHED_IPC_LOCK or scheduler.lock_state as aliases for the tier-4 lock, sched.lock_cpu.
IRQ Save/Restore Pattern
All spinlock acquisitions disable interrupts first. Otherwise an interrupt handler on the same CPU could spin forever on a lock already held by the code path it interrupted:
let irq = save_irq_disable();
LOCK.lock();
// critical section
LOCK.unlock();
restore_irq(irq);
SERIAL_LOCK is the innermost lock.
The SerialGuard wrapper disables IRQs, acquires the lock, and flushes buffered console output on drop.
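A host-side sketch of the same pattern packaged as an RAII guard. It assumes a simulated interrupt flag (a real kernel would save and clear RFLAGS.IF or DAIF) and a hypothetical String buffer and write method; only the ordering (IRQs off, lock, flush, unlock, restore) is taken from the text above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Simulated interrupt-enable flag (stand-in for RFLAGS.IF / DAIF).
static IRQ_ENABLED: AtomicBool = AtomicBool::new(true);

/// Disable interrupts and return whether they were enabled before.
fn save_irq_disable() -> bool {
    IRQ_ENABLED.swap(false, Ordering::SeqCst)
}

/// Restore the state returned by `save_irq_disable`.
fn restore_irq(was_enabled: bool) {
    IRQ_ENABLED.store(was_enabled, Ordering::SeqCst);
}

/// Minimal test-and-set spinlock.
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }
    pub fn lock(&self) {
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            core::hint::spin_loop();
        }
    }
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
    pub fn is_locked(&self) -> bool {
        self.locked.load(Ordering::Relaxed)
    }
}

pub static SERIAL_LOCK: SpinLock = SpinLock::new();

/// Guard in the spirit of SerialGuard (buffer and `write` are hypothetical).
pub struct SerialGuard {
    saved_irq: bool,
    buf: String,
}

impl SerialGuard {
    pub fn new() -> Self {
        let saved_irq = save_irq_disable(); // 1. IRQs off first
        SERIAL_LOCK.lock();                 // 2. then take the lock
        Self { saved_irq, buf: String::new() }
    }
    pub fn write(&mut self, s: &str) {
        self.buf.push_str(s);
    }
}

impl Drop for SerialGuard {
    fn drop(&mut self) {
        print!("{}", self.buf);      // flush while still holding the lock
        SERIAL_LOCK.unlock();        // 3. release the lock
        restore_irq(self.saved_irq); // 4. restore the saved IRQ state last
    }
}
```

Tying the release to Drop means an early return from the critical section cannot leave interrupts disabled or the lock held.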
Safety Model
#![no_std] Constraints
Kernite uses only the core library.
The alloc crate is not available — there is no kernel heap.
All kernel objects are carved from untyped memory via capability-mediated retype operations.
Once created, kernel objects are never freed back to a general-purpose allocator; they are owned by their parent untyped memory for the lifetime of the system.
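Since there is no heap, creating an object reduces to carving an aligned range out of an untyped region. A minimal sketch, assuming a watermark scheme similar to the one seL4 uses for untyped objects; Untyped and retype here are hypothetical, and capability checks are omitted. The code uses only core.

```rust
/// Hypothetical untyped region that hands out objects by advancing a
/// watermark. Objects are never returned to it individually.
pub struct Untyped {
    base: usize,
    size: usize,
    watermark: usize, // bytes consumed so far, relative to `base`
}

impl Untyped {
    pub const fn new(base: usize, size: usize) -> Self {
        Self { base, size, watermark: 0 }
    }

    /// Carve `obj_size` bytes aligned to `align` (a power of two).
    /// Returns the object's address, or None if the region is exhausted.
    pub fn retype(&mut self, obj_size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        let cursor = self.base.checked_add(self.watermark)?;
        // Round the cursor up to the requested alignment.
        let start = cursor.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(obj_size)?;
        if end > self.base.checked_add(self.size)? {
            return None; // watermark is left unchanged on failure
        }
        self.watermark = end - self.base;
        Some(start)
    }
}
```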
No Floating Point
Floating-point and SIMD instructions are forbidden in kernel code:
- x86_64: compiled with -mno-sse -mno-mmx -mno-avx
- aarch64: compiled with -mgeneral-regs-only
FPU/SIMD state is managed by eager context switching on both architectures: every TCB switch unconditionally saves and restores the FPU/SIMD register file, so no lazy-FPU trap (#NM on x86_64, the CPACR_EL1 FP access trap on aarch64) is used. CR0.TS (x86_64) is held at 0 and CPACR_EL1.FPEN (aarch64) is held at 0b11 for the lifetime of the kernel.
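The two register settings above can be expressed as simple bit manipulations. A sketch: CR0.TS is bit 3 (Intel SDM) and CPACR_EL1.FPEN is bits [21:20] (Arm ARM); the helper names are hypothetical, and the privileged register reads and writes themselves are omitted.

```rust
/// CR0.TS ("task switched") is bit 3; while it is set, FPU/SSE
/// instructions raise #NM. Eager switching keeps it clear.
pub const CR0_TS: u64 = 1 << 3;

/// CPACR_EL1.FPEN occupies bits [21:20]; 0b11 means FP/SIMD accesses
/// from EL0 and EL1 are not trapped.
pub const CPACR_FPEN_SHIFT: u32 = 20;
pub const CPACR_FPEN_MASK: u64 = 0b11 << CPACR_FPEN_SHIFT;

/// CR0 value with TS cleared (hypothetical helper).
pub const fn cr0_eager_fpu(cr0: u64) -> u64 {
    cr0 & !CR0_TS
}

/// CPACR_EL1 value with FPEN = 0b11 (hypothetical helper).
pub const fn cpacr_eager_fpu(cpacr: u64) -> u64 {
    (cpacr & !CPACR_FPEN_MASK) | (0b11 << CPACR_FPEN_SHIFT)
}
```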
Unsafe Code Conventions
Every unsafe {} block carries a // SAFETY: comment explaining the invariant that makes the operation sound.
Every unsafe fn carries a # Safety documentation section listing caller obligations.
Rust 2024 edition rules apply:
- unsafe_op_in_unsafe_fn: every unsafe operation inside an unsafe fn requires an explicit unsafe {} block.
- No static mut references: use core::ptr::addr_of! / addr_of_mut! for raw pointer access.
- unsafe extern blocks: extern "C" blocks must be declared as unsafe extern, and items declared in them carry an explicit unsafe or safe annotation.
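A small example that follows the first two rules for a static mut accessed from an unsafe fn, including the // SAFETY: and # Safety conventions. TICKS and add_ticks are hypothetical; the unsafe extern rule is not exercised here because it would require linking an external symbol.

```rust
use core::ptr::addr_of_mut;

// Hypothetical global counter standing in for any kernel `static mut`.
static mut TICKS: u64 = 0;

/// Adds `n` to the global tick counter and returns the new value.
///
/// # Safety
/// The caller must guarantee exclusive access to `TICKS` for the duration
/// of the call (for example, by holding its lock with IRQs disabled).
#[deny(unsafe_op_in_unsafe_fn)]
unsafe fn add_ticks(n: u64) -> u64 {
    // Raw pointer via addr_of_mut!: no `&mut TICKS` reference is created.
    let p = addr_of_mut!(TICKS);
    // SAFETY: `p` points to a live, aligned static, and the caller
    // guarantees exclusive access per this function's contract.
    unsafe {
        *p += n;
        *p
    }
}
```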
All structures shared across FFI boundaries or passed to/from assembly use #[repr(C)].
The KernelObject header must be the first field of any kernel object struct to permit safe pointer casts for reference counting.
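A sketch of the header-first layout and the pointer cast it permits. The fields of KernelObject, the Endpoint payload, and obj_ref are hypothetical; the compile-time offset_of! check shows one way to enforce the first-field requirement.

```rust
use core::sync::atomic::{AtomicU32, Ordering};

/// Common header (field names are hypothetical).
#[repr(C)]
pub struct KernelObject {
    pub refcount: AtomicU32,
    pub kind: u32, // hypothetical object-type tag
}

/// Example kernel object: the header must be the first field.
#[repr(C)]
pub struct Endpoint {
    pub header: KernelObject,
    pub badge: u64, // hypothetical payload
}

// With #[repr(C)] the first field sits at offset 0, so a pointer to the
// object is also a valid pointer to its header.
const _: () = assert!(core::mem::offset_of!(Endpoint, header) == 0);

/// Take a reference through a type-erased header pointer; returns the new count.
///
/// # Safety
/// `obj` must point to a live kernel object whose first field is a
/// `KernelObject` header.
pub unsafe fn obj_ref(obj: *const KernelObject) -> u32 {
    // SAFETY: the caller guarantees `obj` points to a live header.
    let header = unsafe { &*obj };
    header.refcount.fetch_add(1, Ordering::Relaxed) + 1
}
```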
Error Model
The kernel uses three error enums, each with specific variants:
| Type | Scope |
|---|---|
| | Returned to userspace through the syscall ABI. Variants include |
| | Internal to capability operations. Variants: |
| | Internal to virtual memory operations. |
Error types are mapped between layers by dedicated conversion functions (e.g., syscall_error_from_cap_error()).
The kernel does not use impl From<X> for Y for error conversions to prevent accidental information loss.
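A sketch of such a conversion function. The name syscall_error_from_cap_error comes from the text above, but the variant lists are hypothetical (only InvalidArgument appears elsewhere on this page). Call sites convert explicitly with map_err instead of relying on ? to invoke a From impl.

```rust
/// Hypothetical capability-layer errors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CapError {
    InvalidSlot,
    SlotOccupied,
    InsufficientRights,
    UntypedExhausted,
}

/// Hypothetical syscall-ABI errors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyscallError {
    InvalidArgument,
    InvalidCapability,
    OutOfMemory,
}

/// Explicit, named conversion in place of `impl From<CapError> for SyscallError`.
/// The match has no wildcard arm, so adding a CapError variant forces this
/// mapping to be revisited.
pub fn syscall_error_from_cap_error(e: CapError) -> SyscallError {
    match e {
        CapError::InvalidSlot | CapError::SlotOccupied => SyscallError::InvalidArgument,
        CapError::InsufficientRights => SyscallError::InvalidCapability,
        CapError::UntypedExhausted => SyscallError::OutOfMemory,
    }
}
```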
Subsystem Architecture
Kernite supports a multi-personality design. Each personality provides the runtime environment for a different ABI family. Personality selection is determined by a subsystem ID carried in the process descriptor; procmgr dispatches new process requests to the appropriate personality server.
Defined Personalities
| Personality | Subsystem ID | Description |
|---|---|---|
| POSIX | 0 | Standard POSIX/Unix environment. Servers: |
| Win32 | 1 | Windows NT subsystem compatibility environment. Server: |
PE/COFF Loader Path
Win32 processes use ldtrona-pe.so as the image loader.
ldtrona-pe.so maps PE image sections into the lower virtual address space, establishes the import table, and transfers control to the PE entry point.
The PE image base address conventions (typically starting at 0x0000_0000_0040_0000) coexist with the ELF layout; the overall kernel/user VA split is unchanged.
Subsystem Dispatch
procmgr reads the subsystem ID from the new-process request and routes the request to the registered personality server for that ID.
Personality servers are registered at boot time via their .service file’s [Dependencies] block.
A process whose subsystem ID is not registered receives InvalidArgument from the new-process syscall.
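A minimal sketch of the routing rule, assuming a fixed table indexed by subsystem ID. PersonalityRegistry, ServerHandle, and Error are hypothetical names; the subsystem IDs come from the Defined Personalities table.

```rust
/// Subsystem IDs from the Defined Personalities table.
pub const SUBSYSTEM_POSIX: u32 = 0;
pub const SUBSYSTEM_WIN32: u32 = 1;

/// Hypothetical handle to a personality server's endpoint capability.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ServerHandle(pub u64);

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    InvalidArgument,
}

/// Hypothetical registry filled at boot as personality servers register.
pub struct PersonalityRegistry {
    servers: [Option<ServerHandle>; 2], // indexed by subsystem ID
}

impl PersonalityRegistry {
    pub const fn new() -> Self {
        Self { servers: [None; 2] }
    }

    pub fn register(&mut self, id: u32, server: ServerHandle) -> Result<(), Error> {
        let slot = self.servers.get_mut(id as usize).ok_or(Error::InvalidArgument)?;
        *slot = Some(server);
        Ok(())
    }

    /// Route a new-process request; unknown or unregistered IDs fail
    /// with InvalidArgument.
    pub fn route(&self, id: u32) -> Result<ServerHandle, Error> {
        self.servers
            .get(id as usize)
            .copied()
            .flatten()
            .ok_or(Error::InvalidArgument)
    }
}
```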
Related Pages
- Boot Sequence: the full initialization path from kmain to userspace
- Lock Ordering: detailed lock hierarchy with worked examples
- Unsafe Code Conventions: SAFETY comment format and FFI patterns