Architecture

This page describes the structure of the kernite source tree, the architecture abstraction layer, the global lock hierarchy, and the constraints that govern kernel code.

Module Map

The kernel source (kernite/src/) is organized into the following modules:

Module          Responsibility
lib.rs          Kernel entry point (kmain), serial I/O, panic handler, module declarations
cap/            Capability system: fat capabilities, CNode operations, global slot array, CDT, untyped retype, MemoryObject, IoPort, reference counting
ipc/            Inter-process communication: endpoints, notifications, futex, IRQ handler objects, IPC wait queue
mm/             Memory management: VSpace (page tables, Maple tree), PMM (bitmap frame allocator), radix tree, node allocator
sched/          Scheduling: EDF scheduler, thread control blocks, priority inheritance, sleep queue
syscall/        System call dispatch: 31 syscalls (numbers 0–30; SYS_SYSMEMINFO = 30 is the last assigned entry; 28 active + 1 deprecated (NBSend) + 2 reserved), capability invocation, IPC fastpath
arch/x86_64/    x86_64 backend: GDT, IDT, APIC, ACPI, paging, SMP, CPUID, FPU, PIT, SMAP/SMEP, uaccess
arch/aarch64/   aarch64 backend: GICv3, PSCI, PL011 UART, paging (TTBR0/TTBR1), generic timer, FPU/NEON, SMP
init.rs         Init task bootstrap: CSpace setup, ELF loading, first user task creation
console/        Framebuffer text console: rendering, bitmap font
bootinfo.rs     TLV boot info parser
elf.rs          ELF binary loader
cpio.rs         CPIO archive parser (initrd extraction)
acpi.rs         Shared ACPI table parsing (RSDP/XSDT, MCFG/PCIe ECAM)
rng.rs          Random number generation (RDRAND on x86_64, generic fallback on aarch64)
builtins.rs     Compiler built-in stubs (memcpy, memset, memcmp)

Architecture Abstraction

The arch/mod.rs module provides a unified interface consumed by generic kernel code. Each architecture backend (x86_64/ or aarch64/) implements the same set of functions and types, selected at compile time via #[cfg(target_arch)].

Functions exported through arch/mod.rs:

Function                    Purpose
init(boot_info)             Early machine bring-up: paging, exceptions, CPU-local state. Calls mm::init() internally.
init_smp(boot_info)         Start application processors (APs).
start_timer()               Enable timer interrupts. Called after the scheduler is initialized.
clear_boot_identity_map()   Remove the bootloader identity mapping after all APs have booted.
halt()                      Halt CPU until next interrupt.
sti() / cli()               Enable / disable interrupts.
context_switch(old, new)    Save old thread state, restore new thread state, switch page tables.
usermode_trampoline(tcb)    Enter user mode for a newly created thread.
send_ipi(cpu, kind)         Send inter-processor interrupt.
get_ticks() / now_ns()      Read monotonic tick counter / nanosecond timestamp.
shutdown()                  ACPI system power-off. Does not return.

Re-exported sub-modules: paging, fpu, cpuid, uaccess.
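
The compile-time selection described above can be sketched roughly as follows. This is a minimal host-buildable illustration, not the contents of arch/mod.rs: the module name backend, the constants, and the two accessor functions are hypothetical, and the halt mnemonics are the conventional ones for each architecture rather than something this page states.

```rust
// Exactly one `backend` module is compiled in, chosen by target architecture.
#[cfg(target_arch = "x86_64")]
mod backend {
    pub const NAME: &str = "x86_64";
    // Conventional idle instruction; assumed, not taken from the kernel source.
    pub const HALT_INSN: &str = "hlt";
}

#[cfg(target_arch = "aarch64")]
mod backend {
    pub const NAME: &str = "aarch64";
    // Conventional idle instruction; assumed, not taken from the kernel source.
    pub const HALT_INSN: &str = "wfi";
}

// Fallback so this sketch builds on other hosts; the kernel has only two backends.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
mod backend {
    pub const NAME: &str = "unsupported";
    pub const HALT_INSN: &str = "";
}

/// Generic code calls these without knowing which backend was selected.
pub fn arch_name() -> &'static str {
    backend::NAME
}

pub fn halt_instruction() -> &'static str {
    backend::HALT_INSN
}
```

Because every backend exposes the same names, generic code never needs its own #[cfg] attributes; a missing function in one backend shows up as a build error for that target only.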

x86_64 Backend

File          Content
boot.rs       BSP bootstrap: GDT, IDT, APIC, ACPI, paging, CPUID
ap_boot.rs    AP trampoline (real → long mode) via ACPI MADT
gdt.rs        Global Descriptor Table with TSS
idt.rs        Interrupt Descriptor Table (256 entries)
apic.rs       Local APIC + I/O APIC (timer, IPI, IRQ routing)
acpi.rs       ACPI MADT / MCFG / HPET parsing
paging.rs     4-level page tables (PML4 → PDPT → PD → PT)
context.rs    Context switch via register save/restore
cpuid.rs      CPUID feature detection
fpu.rs        x87/SSE/AVX eager context switch (XSAVE/XRSTOR on every TCB switch, CR0.TS held at 0)
pit.rs        Legacy PIT for APIC timer calibration
uaccess.rs    SMAP/SMEP user memory access control

Syscall entry: syscall instruction. Entry point in syscall.S swaps to the kernel GS base and stashes the caller’s user RSP in PerCpuData.saved_rsp (%gs:16), then loads the kernel stack pointer from PerCpuData.kernel_stack (%gs:8).
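
Since the assembly entry point addresses PerCpuData by fixed GS-relative offsets, the Rust definition has to keep those fields pinned. The sketch below shows one way to check that at compile time. Only the field names kernel_stack and saved_rsp and their offsets (8 and 16) come from the text above; the field at offset 0 (self_ptr) is a hypothetical placeholder, and the real structure very likely has more fields.

```rust
use core::mem::offset_of;

/// Per-CPU block reached through the kernel GS base.
#[repr(C)]
pub struct PerCpuData {
    /// Hypothetical placeholder for whatever occupies offset 0.
    pub self_ptr: u64,
    /// %gs:8, loaded into RSP on syscall entry.
    pub kernel_stack: u64,
    /// %gs:16, receives the caller's user RSP on syscall entry.
    pub saved_rsp: u64,
}

// Compile-time checks: the build fails if the struct drifts away from the
// offsets hard-coded in the assembly entry path.
const _: () = assert!(offset_of!(PerCpuData, kernel_stack) == 8);
const _: () = assert!(offset_of!(PerCpuData, saved_rsp) == 16);
```

With #[repr(C)] the field order is fixed, so reordering or inserting a field ahead of kernel_stack trips the assertions instead of silently corrupting the stack switch.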

aarch64 Backend

File           Content
boot.rs        BSP bootstrap: GICv3, PSCI, generic timer, paging
ap_boot.rs     AP startup via PSCI CPU_ON (HVC) with the logical CPU index delivered through PSCI’s context ID argument (received in x0)
gic.rs         GICv3 distributor (GICD), redistributor (GICR), CPU interface (ICC system registers)
psci.rs        Power State Coordination Interface (HVC calls)
timer.rs       Generic timer: EL1 physical timer (CNTP), PPI 30, 10 ms tick
paging.rs      4-level page tables (TTBR0 user / TTBR1 kernel)
context.rs     Context switch via x0-x30, sp_el0, elr_el1, spsr_el1 save/restore
fpu.rs         NEON Q0-Q31 + FPCR + FPSR eager switch (save/restore on every TCB switch, CPACR_EL1.FPEN = 0b11 permanently)
exceptions.rs  Exception vector table (sync/IRQ/FIQ/SError at EL1/EL0)
pl011.rs       PL011 UART driver for early serial output

Syscall entry: svc #0 instruction. The exception vector dispatches to the Rust syscall handler.
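
The 10 ms tick listed for timer.rs implies a simple relationship between the generic timer's counter frequency and the value programmed for each period. The arithmetic can be sketched as below; the function names are hypothetical and the 62.5 MHz figure in the usage note is only an example frequency, not a property of any particular platform.

```rust
/// Tick rate implied by a 10 ms tick: 100 ticks per second.
pub const TICK_HZ: u64 = 100;

/// Counter increments per 10 ms tick for a counter running at `cntfrq_hz`
/// (the frequency software reads from CNTFRQ_EL0).
pub fn counts_per_tick(cntfrq_hz: u64) -> u64 {
    cntfrq_hz / TICK_HZ
}

/// Converts a raw counter delta to nanoseconds. The multiplication is done
/// in 128 bits so it cannot overflow for any u64 delta.
pub fn counter_to_ns(delta: u64, cntfrq_hz: u64) -> u64 {
    ((delta as u128 * 1_000_000_000u128) / cntfrq_hz as u128) as u64
}
```

For a 62.5 MHz counter, counts_per_tick(62_500_000) is 625_000, and counter_to_ns(625_000, 62_500_000) is 10_000_000 ns, which is the 10 ms period again.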

Lock Ordering

The kernel enforces a strict lock acquisition order to prevent deadlocks. Locks must be acquired outermost-first; releasing order is the reverse.

The global lock hierarchy (from lib.rs):

CAP_LOCK
  → endpoint.lock / ntfn.lock / tcb.lock / sc.lock
    → SLEEP_LOCK / FUTEX_LOCK / IRQ_LOCK
      → sched.lock_cpu
        → VSpace.lock
          → ASID_LOCK
            → MO.commit_lock | MO.rmap_lock
              → ut.alloc_lock
                → FRAME_LOCK
                  → SERIAL_LOCK

MO.commit_lock and MO.rmap_lock sit at the same tier but are disjoint: they must never be held simultaneously.
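
One way to reason about the hierarchy is as a rank per tier, with a per-CPU record of what is currently held. The following is a hypothetical debug aid, not something this page says the kernel contains: the Tier and HeldLocks names are invented, and the rule that a lock must be strictly inner to the innermost held lock is a conservative assumption (it forbids holding two locks of the same tier, which the text above only states explicitly for MO.commit_lock and MO.rmap_lock).

```rust
/// Tiers from the hierarchy above, outermost first; the discriminant is the rank.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Cap = 1,       // CAP_LOCK
    Object = 2,    // endpoint.lock / ntfn.lock / tcb.lock / sc.lock
    Wait = 3,      // SLEEP_LOCK / FUTEX_LOCK / IRQ_LOCK
    Sched = 4,     // sched.lock_cpu
    VSpace = 5,    // VSpace.lock
    Asid = 6,      // ASID_LOCK
    MemObject = 7, // MO.commit_lock | MO.rmap_lock
    Untyped = 8,   // ut.alloc_lock
    Frame = 9,     // FRAME_LOCK
    Serial = 10,   // SERIAL_LOCK
}

/// Fixed-size record of held tiers (no heap, matching the kernel's constraints).
pub struct HeldLocks {
    held: [Tier; 10],
    depth: usize,
}

impl HeldLocks {
    pub const fn new() -> Self {
        Self { held: [Tier::Cap; 10], depth: 0 }
    }

    fn innermost(&self) -> Option<Tier> {
        if self.depth == 0 { None } else { Some(self.held[self.depth - 1]) }
    }

    /// Records an acquisition. Returns Err((held, requested)) if `tier` is not
    /// strictly inner to the innermost lock already held.
    pub fn acquire(&mut self, tier: Tier) -> Result<(), (Tier, Tier)> {
        if let Some(top) = self.innermost() {
            if tier <= top {
                return Err((top, tier));
            }
        }
        // Strictly increasing ranks bound the depth by the number of tiers.
        self.held[self.depth] = tier;
        self.depth += 1;
        Ok(())
    }

    /// Records a release; returns false if `tier` is not the innermost held lock.
    pub fn release(&mut self, tier: Tier) -> bool {
        if self.innermost() == Some(tier) {
            self.depth -= 1;
            true
        } else {
            false
        }
    }
}
```

Under this model, acquiring Tier::Object and then Tier::Sched succeeds, while a later attempt to take Tier::Cap is rejected until both have been released in reverse order.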

Nesting Patterns

Slowpath syscalls

CAP_LOCK (capability lookup) → release → endpoint.lock (IPC operation).

IPC capability transfer

endpoint.lock → release → CAP_LOCK (slot copy into receiver’s CNode) → reacquire endpoint.lock. The endpoint lock must be released before acquiring CAP_LOCK to preserve tier ordering.

IPC fastpath

CAP_LOCK (copy endpoint cap to stack) → release → endpoint.lock → sched.lock_cpu.

Timer / IPI handler

sched.lock_cpu (directly, no outer lock held).

Context switch

Releases sched.lock_cpu before the architectural switch, reacquires on resume.
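
The release-then-reacquire shape used for IPC capability transfer can be illustrated on a host with ordinary mutexes. This is a sketch under stated assumptions: std::sync::Mutex stands in for the kernel's spinlocks, and DemoEndpoint, DemoCapSpace, and transfer_cap are hypothetical names. A real implementation would also have to revalidate the endpoint state after reacquiring its lock, since another CPU may have run in between.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for an endpoint with one capability awaiting transfer.
pub struct DemoEndpoint {
    pub pending_cap: Option<u32>,
}

/// Hypothetical stand-in for the receiver's CNode slots, guarded by CAP_LOCK.
pub struct DemoCapSpace {
    pub slots: Vec<Option<u32>>,
}

/// Copies the pending capability into `dest_slot` without ever holding the
/// endpoint lock while acquiring the outer-tier capability lock.
pub fn transfer_cap(
    ep: &Mutex<DemoEndpoint>,
    caps: &Mutex<DemoCapSpace>,
    dest_slot: usize,
) -> bool {
    // Step 1: under endpoint.lock, read what needs to be transferred.
    let cap = {
        let guard = ep.lock().unwrap();
        match guard.pending_cap {
            Some(cap) => cap,
            None => return false,
        }
    }; // endpoint.lock released here

    // Step 2: under CAP_LOCK only, copy into an empty destination slot.
    {
        let mut cs = caps.lock().unwrap();
        match cs.slots.get_mut(dest_slot) {
            Some(slot) if slot.is_none() => *slot = Some(cap),
            _ => return false,
        }
    } // CAP_LOCK released here

    // Step 3: reacquire endpoint.lock and mark the transfer complete.
    ep.lock().unwrap().pending_cap = None;
    true
}
```

At no point are both guards alive at once, so the tier ordering (CAP_LOCK before endpoint.lock) is never inverted.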

Earlier versions of this document used SCHED_IPC_LOCK to refer to the scheduler/IPC lock tier. The canonical names in Lock Ordering are sched.lock_cpu (per-CPU scheduler state lock) and endpoint.lock (per-endpoint IPC lock). Some architecture documents still use SCHED_IPC_LOCK or scheduler.lock_state as aliases for the same tier-4 lock.

IRQ Save/Restore Pattern

All spinlock acquisitions disable interrupts first to prevent deadlock between interrupt handlers and the interrupted code path:

let irq = save_irq_disable();
LOCK.lock();
// critical section
LOCK.unlock();
restore_irq(irq);

SERIAL_LOCK is the innermost lock. The SerialGuard wrapper disables IRQs, acquires the lock, and flushes buffered console output on drop.
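
A guard in the style of SerialGuard can be sketched as an RAII type that pairs the IRQ save with the lock acquisition. The version below is a host-side simulation, not kernel code: an AtomicBool stands in for the CPU's interrupt-enable flag, save_irq_disable and restore_irq are reimplemented on top of it, and SpinLock and IrqGuard are hypothetical names. The console flush that SerialGuard performs on drop is omitted.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Simulated interrupt-enable flag; in a kernel this is CPU state.
static IRQ_ENABLED: AtomicBool = AtomicBool::new(true);

/// Disables (simulated) interrupts and returns the previous state.
pub fn save_irq_disable() -> bool {
    IRQ_ENABLED.swap(false, Ordering::SeqCst)
}

/// Restores the state previously returned by `save_irq_disable`.
pub fn restore_irq(was_enabled: bool) {
    IRQ_ENABLED.store(was_enabled, Ordering::SeqCst);
}

pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    /// Disables interrupts first, then spins until the lock is acquired.
    pub fn lock_irqsave(&self) -> IrqGuard<'_> {
        let irq = save_irq_disable();
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        IrqGuard { lock: self, irq }
    }
}

/// Releases the lock and restores the saved IRQ state when dropped.
pub struct IrqGuard<'a> {
    lock: &'a SpinLock,
    irq: bool,
}

impl Drop for IrqGuard<'_> {
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::Release); // unlock first
        restore_irq(self.irq); // then restore the saved interrupt state
    }
}
```

Because the guard restores the saved state rather than unconditionally enabling interrupts, nesting works: an inner guard taken while interrupts are already off leaves them off when it drops.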

Safety Model

#![no_std] Constraints

Kernite uses only the core library. The alloc crate is not available — there is no kernel heap. All kernel objects are carved from untyped memory via capability-mediated retype operations. Once created, kernel objects are never freed back to a general-purpose allocator; they are owned by their parent untyped memory for the lifetime of the system.
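
The idea of carving objects out of untyped memory without a heap can be modeled as a watermark that only moves forward. This is a simplified sketch, assuming a single contiguous region and plain addresses; the Untyped struct and its retype method are hypothetical and omit capabilities, object types, and the CDT bookkeeping a real retype involves.

```rust
/// Hypothetical model of an untyped region: it is carved, never freed piecemeal.
pub struct Untyped {
    base: usize,
    size: usize,
    watermark: usize, // bytes already handed out, measured from `base`
}

impl Untyped {
    pub const fn new(base: usize, size: usize) -> Self {
        Self { base, size, watermark: 0 }
    }

    /// Carves `size` bytes aligned to `align` (which must be a power of two).
    /// Returns the object's address, or None if the region cannot satisfy it.
    pub fn retype(&mut self, size: usize, align: usize) -> Option<usize> {
        if !align.is_power_of_two() {
            return None;
        }
        let start = self.base.checked_add(self.watermark)?;
        // Round up to the requested alignment.
        let aligned = start.checked_add(align - 1)? & !(align - 1);
        let end = aligned.checked_add(size)?;
        if end > self.base.checked_add(self.size)? {
            return None;
        }
        self.watermark = end - self.base;
        Some(aligned)
    }
}
```

There is deliberately no free operation: in this model, as in the text above, memory returns only when the whole parent region is reclaimed.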

No Floating Point

Floating-point and SIMD instructions are forbidden in kernel code:

  • x86_64: compiled with -mno-sse -mno-mmx -mno-avx

  • aarch64: compiled with -mgeneral-regs-only

FPU/SIMD state is managed via eager context switching on both architectures — every TCB switch unconditionally saves and restores the FPU/SIMD register file, so no #NM / CPACR trap vector is used for regular FPU faults. CR0.TS (x86_64) is held at 0 and CPACR_EL1.FPEN (aarch64) is held at 0b11 for the lifetime of the kernel.

Unsafe Code Conventions

Every unsafe {} block carries a // SAFETY: comment explaining the invariant that makes the operation sound.

Every unsafe fn carries a # Safety documentation section listing caller obligations.

Rust 2024 edition rules apply:

  • unsafe_op_in_unsafe_fn: every unsafe operation inside an unsafe fn requires an explicit unsafe {} block.

  • No static mut references: use core::ptr::addr_of! / addr_of_mut! for raw pointer access.

  • unsafe extern blocks: items declared in extern "C" blocks require explicit unsafe or safe annotation.

All structures shared across FFI boundaries or passed to/from assembly use #[repr(C)]. The KernelObject header must be the first field of any kernel object struct to permit safe pointer casts for reference counting.
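
The conventions above can be seen together in a small sketch. The KernelObject name comes from the text; its refcount field, the DemoObject type, the retain function, and the BOOT_FLAGS static are all hypothetical and exist only to show the comment and layout conventions.

```rust
use core::ptr::addr_of_mut;
use core::sync::atomic::{AtomicU32, Ordering};

/// Common header; must be the first field of every kernel object.
#[repr(C)]
pub struct KernelObject {
    pub refcount: AtomicU32, // hypothetical field layout
}

/// Hypothetical object type used only for this illustration.
#[repr(C)]
pub struct DemoObject {
    pub header: KernelObject, // first field, so a pointer cast to the header is valid
    pub payload: u64,
}

/// Increments the reference count through a header pointer and returns the new count.
///
/// # Safety
/// `obj` must point to a live `#[repr(C)]` struct whose first field is a `KernelObject`.
pub unsafe fn retain(obj: *const KernelObject) -> u32 {
    // SAFETY: the caller guarantees `obj` points to a live, properly aligned header.
    let header = unsafe { &*obj };
    header.refcount.fetch_add(1, Ordering::Relaxed) + 1
}

static mut BOOT_FLAGS: u32 = 0;

/// Sets bit `bit` (must be below 32) and returns the updated flags.
pub fn set_boot_flag(bit: u32) -> u32 {
    // SAFETY: this sketch assumes single-threaded early boot, so no other
    // access to BOOT_FLAGS can race with this one.
    unsafe {
        let p = addr_of_mut!(BOOT_FLAGS); // raw pointer; no reference to the static mut
        *p |= 1 << bit;
        *p
    }
}
```

A caller would write unsafe { retain(&obj as *const DemoObject as *const KernelObject) }, with its own // SAFETY: comment stating why the cast target is a valid header.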

Error Model

The kernel uses three error enums, each with specific variants:

Type          Scope
SyscallError  Returned to userspace through the syscall ABI. Variants include InvalidCapability, InvalidArgument, InsufficientRights, NotEnoughMemory, etc.
CapError      Internal to capability operations. Variants: InvalidSlot, SlotEmpty, GuardMismatch, DepthExceeded, InsufficientRights, RightsNotSubset, InvalidOperation, InvalidBadge, InvalidState.
VSpaceError   Internal to virtual memory operations.

Error types are mapped between layers by dedicated conversion functions (e.g., syscall_error_from_cap_error()). The kernel does not use impl From<X> for Y for error conversions to prevent accidental information loss.
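
A dedicated conversion function of this kind might look like the sketch below. The enum and variant names are taken from the table above (SyscallError is truncated to the four variants listed there), but the specific mapping from each CapError to a SyscallError is an assumption made for illustration; the kernel's actual mapping is not given on this page.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CapError {
    InvalidSlot,
    SlotEmpty,
    GuardMismatch,
    DepthExceeded,
    InsufficientRights,
    RightsNotSubset,
    InvalidOperation,
    InvalidBadge,
    InvalidState,
}

/// Only the variants named on this page; the real enum has more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyscallError {
    InvalidCapability,
    InvalidArgument,
    InsufficientRights,
    NotEnoughMemory,
}

/// Explicit, exhaustive mapping (the particular choices here are assumed).
/// There is no wildcard arm, so adding a CapError variant forces a decision here.
pub fn syscall_error_from_cap_error(e: CapError) -> SyscallError {
    match e {
        CapError::InvalidSlot => SyscallError::InvalidCapability,
        CapError::SlotEmpty => SyscallError::InvalidCapability,
        CapError::GuardMismatch => SyscallError::InvalidCapability,
        CapError::DepthExceeded => SyscallError::InvalidCapability,
        CapError::InsufficientRights => SyscallError::InsufficientRights,
        CapError::RightsNotSubset => SyscallError::InsufficientRights,
        CapError::InvalidOperation => SyscallError::InvalidArgument,
        CapError::InvalidBadge => SyscallError::InvalidArgument,
        CapError::InvalidState => SyscallError::InvalidArgument,
    }
}
```

Compared with impl From, a named function cannot be triggered implicitly by the ? operator, so every place where detail is collapsed is visible at the call site.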

Subsystem Architecture

Kernite supports a multi-personality design. Each personality provides the runtime environment for a different ABI family. Personality selection is determined by a subsystem ID carried in the process descriptor; procmgr dispatches new process requests to the appropriate personality server.

Defined Personalities

Personality  Subsystem ID  Description
POSIX        0             Standard POSIX/Unix environment. Servers: posix_ttysrv (terminal), posix_getty (login prompt), posix_login (authentication). ELF binaries loaded via ld-trona.so.
Win32        1             Windows NT subsystem compatibility environment. Server: win32_csrss (client/server runtime). PE/COFF binaries loaded via ldtrona-pe.so.

PE/COFF Loader Path

Win32 processes use ldtrona-pe.so as the image loader. ldtrona-pe.so maps PE image sections into the lower virtual address space, establishes the import table, and transfers control to the PE entry point. The PE image base address conventions (typically starting at 0x0000_0000_0040_0000) coexist with the ELF layout; the overall kernel/user VA split is unchanged.

Subsystem Dispatch

procmgr reads the subsystem ID from the new-process request and routes the request to the registered personality server for that ID. Personality servers are registered at boot time via their .service file’s [Dependencies] block. A process whose subsystem ID is not registered receives InvalidArgument from the new-process syscall.
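
The dispatch rule can be modeled as a small table indexed by subsystem ID. This is a hypothetical sketch of the lookup only: PersonalityRegistry, its methods, and the constant names are invented, the server is represented by a name string rather than an IPC endpoint, and registration through .service files is not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyscallError {
    InvalidArgument,
}

/// Subsystem IDs from the Defined Personalities table.
pub const SUBSYSTEM_POSIX: u32 = 0;
pub const SUBSYSTEM_WIN32: u32 = 1;

/// Hypothetical registry: index is the subsystem ID, value is the server name.
pub struct PersonalityRegistry {
    servers: [Option<&'static str>; 2],
}

impl PersonalityRegistry {
    pub const fn new() -> Self {
        Self { servers: [None; 2] }
    }

    /// Registers `server` for `id`; IDs outside the table are rejected.
    pub fn register(&mut self, id: u32, server: &'static str) -> Result<(), SyscallError> {
        let slot = self
            .servers
            .get_mut(id as usize)
            .ok_or(SyscallError::InvalidArgument)?;
        *slot = Some(server);
        Ok(())
    }

    /// Returns the server for `id`, or InvalidArgument if none is registered.
    pub fn dispatch(&self, id: u32) -> Result<&'static str, SyscallError> {
        self.servers
            .get(id as usize)
            .copied()
            .flatten()
            .ok_or(SyscallError::InvalidArgument)
    }
}
```

An out-of-range ID and an in-range ID with no registered server take the same error path, which matches the single InvalidArgument outcome described above.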