Physical Memory

The Physical Memory Manager (PMM) is the kernel’s sole owner of physical frames. Every frame in the system has an owner tag tracked by the PMM. Higher layers — MemoryObjects, VSpaces, and kernel subsystems — borrow frames from the PMM and return them when done.

Four-Tier Memory Model

The four tiers, from bottom to top:

PMM

Kernel-internal bitmap frame allocator. Provides pages for page tables, kernel stacks, radix tree nodes, maple tree nodes, and other kernel metadata. Approximately 1/8 of RAM.

Untyped memory

Raw physical memory exposed to userspace via capabilities. Source of kernel objects (via retype) and the primary source of MemoryObject data pages (via MO_COMMIT). Approximately 7/8 of RAM. See Untyped Memory.

MemoryObject (MO)

Page-granular memory abstraction. Borrows frames from untyped (primary) or PMM (fallback). Tracks pages via a 4-level radix tree. See Memory Objects.

VSpace

Observer layer. Maps MO pages into hardware page tables. Owns no frames. See Virtual Address Spaces.

Bitmap Allocator

Data Structure

pub struct FrameAllocator {
    bitmap: &'static mut [u64],    // 1 = free, 0 = used
    next_free: usize,              // hint: first potentially free frame
    total: usize,                  // highest frame index + 1
    free: usize,                   // count of free frames
    meta: &'static mut [FrameMeta], // per-frame metadata (16 bytes each)
    reserve: [u64; 32],            // emergency reserve pool (PhysAddr array)
    reserve_count: usize,
}

Each bit in the bitmap represents one 4 KB physical frame: a set bit (1) means the frame is free, and a clear bit (0) means it is used. The bitmap is indexed by frame number (phys_addr / PAGE_SIZE).

Allocation (alloc)

  1. Start scanning from next_free frame index.

  2. Find the first word with a set bit (free frame) using bit scanning.

  3. Clear the bit, decrement free, advance next_free.

  4. Return the physical address.

Time complexity: O(N/64) in the worst case (N = total frames). The next_free hint lets a typical allocation skip the fully allocated prefix of the bitmap, so the common case is close to O(1).
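The four steps above can be sketched as follows. This is a simplified illustration, not the kernel's implementation: BitmapSketch and new_all_free are hypothetical names, the bitmap is a heap-allocated Vec rather than a &'static mut slice, and per-frame metadata and locking are left out.

```rust
const PAGE_SIZE: u64 = 4096;

/// Simplified stand-in for FrameAllocator (hypothetical type, no metadata).
pub struct BitmapSketch {
    bitmap: Vec<u64>, // 1 = free, 0 = used
    next_free: usize, // hint: no free frame exists below this index
    total: usize,     // highest frame index + 1
    free: usize,      // count of free frames
}

impl BitmapSketch {
    /// Build an allocator in which all `total` frames start out free.
    pub fn new_all_free(total: usize) -> Self {
        let words = (total + 63) / 64;
        let mut bitmap = vec![u64::MAX; words];
        // Clear the padding bits past `total` so they are never handed out.
        let tail = total % 64;
        if tail != 0 {
            bitmap[words - 1] = (1u64 << tail) - 1;
        }
        Self { bitmap, next_free: 0, total, free: total }
    }

    /// Allocate one frame and return its physical address.
    pub fn alloc(&mut self) -> Option<u64> {
        if self.free == 0 {
            return None;
        }
        // Step 1: start at the word that contains the hint.
        for w in (self.next_free / 64)..self.bitmap.len() {
            let word = self.bitmap[w];
            if word == 0 {
                continue; // all 64 frames in this word are used
            }
            // Step 2: the lowest set bit is the first free frame in the word.
            let bit = word.trailing_zeros() as usize;
            let frame = w * 64 + bit;
            debug_assert!(frame < self.total);
            // Step 3: clear the bit, decrement free, advance the hint.
            self.bitmap[w] &= !(1u64 << bit);
            self.free -= 1;
            self.next_free = frame + 1;
            // Step 4: return the physical address.
            return Some(frame as u64 * PAGE_SIZE);
        }
        None
    }
}
```

Scanning one u64 word at a time is what gives the N/64 factor: a word equal to zero rules out 64 frames with a single comparison.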

Contiguous Allocation (alloc_contiguous)

Allocates n consecutive physical frames. Used for DMA buffers, capability slot arrays, and per-frame metadata arrays.

  1. Scan the bitmap for a run of n consecutive set bits.

  2. Clear all bits in the run.

  3. Return the physical address of the first frame.
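A minimal sketch of the run search, assuming a straightforward frame-by-frame scan. The free function form, the total parameter, and the helper is_free are illustrative choices; the sketch also leaves out the free counter and next_free bookkeeping.

```rust
const PAGE_SIZE: u64 = 4096;

/// Returns true if `frame` is marked free (bit set) in `bitmap`.
fn is_free(bitmap: &[u64], frame: usize) -> bool {
    bitmap[frame / 64] & (1u64 << (frame % 64)) != 0
}

/// Find `n` consecutive free frames among the first `total` frames,
/// mark them used, and return the physical address of the first one.
pub fn alloc_contiguous(bitmap: &mut [u64], total: usize, n: usize) -> Option<u64> {
    if n == 0 || n > total {
        return None;
    }
    let mut run_start = 0;
    let mut run_len = 0;
    for frame in 0..total {
        if is_free(bitmap, frame) {
            if run_len == 0 {
                run_start = frame;
            }
            run_len += 1;
            if run_len == n {
                // Step 2: clear every bit in the run.
                for f in run_start..run_start + n {
                    bitmap[f / 64] &= !(1u64 << (f % 64));
                }
                // Step 3: address of the first frame.
                return Some(run_start as u64 * PAGE_SIZE);
            }
        } else {
            run_len = 0; // a used frame breaks the run
        }
    }
    None
}
```

With the bitmap word 0b1011_1011 (frames 0, 1, 3, 4, 5, and 7 free), a request for three frames returns the run starting at frame 3. A second request for three frames then fails even though three frames (0, 1, and 7) are still free, because they are not adjacent.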

Free (free)

  1. Verify the frame’s current owner matches the expected_owner (panic in debug builds on mismatch).

  2. Set the bitmap bit.

  3. Increment free.

  4. Update next_free if the freed frame is below the current hint.

  5. Reset the FrameMeta owner to Free.
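The free path could look roughly like this. FreeSketch, the reduced Owner enum, and the free_count field name are assumptions made for the example; a plain Vec of owners stands in for the FrameMeta array.

```rust
const PAGE_SIZE: u64 = 4096;

/// Reduced owner set for illustration (the real FrameOwner has more variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Owner {
    Free,
    KernelPrivate,
    MoData,
}

/// Hypothetical allocator state; `free_count` plays the role of `free`.
pub struct FreeSketch {
    pub bitmap: Vec<u64>,
    pub next_free: usize,
    pub free_count: usize,
    pub owners: Vec<Owner>,
}

impl FreeSketch {
    /// Every frame starts out used and tagged with `owner`.
    pub fn new_all_used(total: usize, owner: Owner) -> Self {
        Self {
            bitmap: vec![0; (total + 63) / 64],
            next_free: total,
            free_count: 0,
            owners: vec![owner; total],
        }
    }

    pub fn free(&mut self, phys: u64, expected_owner: Owner) {
        let frame = (phys / PAGE_SIZE) as usize;
        // 1. Owner check; debug_assert_eq! compiles to nothing in release builds.
        debug_assert_eq!(self.owners[frame], expected_owner, "owner mismatch on free");
        // 2. Set the bitmap bit (1 = free).
        self.bitmap[frame / 64] |= 1u64 << (frame % 64);
        // 3. Increment the free count.
        self.free_count += 1;
        // 4. Pull the hint down if this frame lies below it.
        if frame < self.next_free {
            self.next_free = frame;
        }
        // 5. Reset the owner tag.
        self.owners[frame] = Owner::Free;
    }
}
```

Step 4 is what keeps the hint honest: without it, a frame freed below next_free would be invisible to an allocator that starts its scan at the hint.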

Two-Phase Initialization

The PMM initializes in two phases because the per-frame metadata array requires the direct physical map, which is not available during early boot.

Phase 1: Bitmap Only (identity-mapped)

During FrameAllocator::new(), called from arch::init() → mm::init():

  1. Scan the boot info memory map to find the highest usable physical address.

  2. Compute bitmap size: max_frames = max_phys / PAGE_SIZE, bitmap_words = (max_frames + 63) / 64.

  3. Allocate the bitmap from the first usable memory region within the identity-mapped window:

    • x86_64: 0x10_0000 - 0x4000_0000 (1 MB - 1 GB).

    • aarch64: 0x4000_0000 - 0x8000_0000 (QEMU virt RAM window).

  4. Zero the bitmap, then mark usable regions as free and reserved/kernel regions as used.

  5. At this point, alloc() works but meta[] is empty (zero-length slice).
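The sizing arithmetic in step 2 and the marking pass in step 4 can be illustrated with a small sketch. The Region type is a hypothetical stand-in for the boot info memory map entries, and the sketch skips details the real code has to handle, such as marking the kernel image and the bitmap's own frames as used.

```rust
const PAGE_SIZE: u64 = 4096;

/// Hypothetical boot memory-map entry.
#[derive(Clone, Copy)]
pub struct Region {
    pub base: u64,
    pub len: u64,
    pub usable: bool,
}

/// Steps 1-2: frame count and bitmap word count from the highest usable address.
pub fn bitmap_geometry(map: &[Region]) -> (usize, usize) {
    let max_phys = map
        .iter()
        .filter(|r| r.usable)
        .map(|r| r.base + r.len)
        .max()
        .unwrap_or(0);
    let max_frames = (max_phys / PAGE_SIZE) as usize;
    let bitmap_words = (max_frames + 63) / 64;
    (max_frames, bitmap_words)
}

/// Step 4: start from an all-zero (all used) bitmap, then set bits for usable frames.
pub fn build_bitmap(map: &[Region]) -> Vec<u64> {
    let (max_frames, words) = bitmap_geometry(map);
    let mut bitmap = vec![0u64; words];
    for r in map.iter().filter(|r| r.usable) {
        // Round inward so a partially covered frame stays marked used.
        let first = ((r.base + PAGE_SIZE - 1) / PAGE_SIZE) as usize;
        let last = (((r.base + r.len) / PAGE_SIZE) as usize).min(max_frames);
        for f in first..last {
            bitmap[f / 64] |= 1u64 << (f % 64);
        }
    }
    bitmap
}
```

For a machine whose usable memory ends at 1 GB, this gives 262,144 frames and 4,096 bitmap words, so the bitmap itself occupies 32 KB.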

Phase 2: Per-Frame Metadata (direct-mapped)

After paging::init() establishes the direct physical map:

  1. remap_bitmap() switches the bitmap pointer from identity-mapped to direct-mapped virtual address.

  2. Allocate the per-frame metadata array: max_frames * 16 bytes (one FrameMeta per frame).

  3. Zero-initialize all metadata entries.

  4. Populate the emergency reserve pool (32 pre-allocated frames).

After Phase 2, the full PMM API is available including ownership tracking and mapping reference counts.

Frame Ownership

Every physical frame carries a FrameOwner tag that identifies its current purpose:

pub enum FrameOwner {
    Free,
    UntypedReserved { ut: *mut UntypedMemory },
    MoData { mo: *mut MemoryObject, page_idx: u32 },
    MoMeta { mo: *mut MemoryObject, subkind: MoMetaKind },
    KernelPrivate { subkind: KernelMetaKind },
    PageCache,
    EmergencyReserve,
}

Each variant and its meaning:

Free

Unallocated. Available for alloc().

UntypedReserved

Frame is carved out of an UntypedMemory watermark but has not been promoted to a typed owner yet (reset / retype windows, device untypeds). Blocks double allocation while untyped still claims the backing range.

MoData

User-visible data page owned by a MemoryObject. page_idx identifies the page within the MO. mo points to the owning MO struct (valid for system lifetime because kernel objects are never freed).

MoMeta

MO-internal metadata page. Sub-kinds: Radix (radix tree node), Rmap (reverse map overflow page).

KernelPrivate

Kernel-private page. Sub-kinds: PageTable, KernelStack, MapleNode, General, CowPool. CowPool marks un-consumed pool entries donated to a VSpace COW pool via VSPACE_SET_COW_POOL / VSPACE_REPLENISH_COW_POOL.

PageCache

File-backed page cache entry (reserved for future MoKind::FileBacked).

EmergencyReserve

Reserved pool for fault-path allocation.

Per-Frame Metadata

Each frame has a 16-byte FrameMeta structure stored in a contiguous array indexed by frame number:

#[repr(C)]
pub struct FrameMeta {       // 16 bytes
    pub owner_tag: OwnerTag, // 1 byte — discriminant
    pub subkind: u8,         // 1 byte — sub-kind within owner type
    pub map_count: u8,       // 1 byte — VSpace mapping count
    pub flags: u8,           // 1 byte — DIRTY, REFERENCED, PINNED
    pub page_idx: u32,       // 4 bytes — page index within MO
    pub owner_ptr: u64,      // 8 bytes — pointer to owning MO
}

Memory overhead: 16 bytes per 4 KB frame = 0.39% of RAM. For 1 GB RAM (262,144 frames): 4 MB metadata array.
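The 16-byte size and the overhead figures can be checked mechanically. In the sketch below, the OwnerTag definition is an assumption (a one-byte enum mirroring the FrameOwner variants), and metadata_bytes is a hypothetical helper.

```rust
use std::mem::size_of;

/// Assumed one-byte discriminant; variant names mirror FrameOwner.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OwnerTag {
    Free = 0,
    UntypedReserved,
    MoData,
    MoMeta,
    KernelPrivate,
    PageCache,
    EmergencyReserve,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct FrameMeta {
    pub owner_tag: OwnerTag, // offset 0
    pub subkind: u8,         // offset 1
    pub map_count: u8,       // offset 2
    pub flags: u8,           // offset 3
    pub page_idx: u32,       // offset 4
    pub owner_ptr: u64,      // offset 8
}

/// Bytes of metadata needed to cover `ram_bytes` of RAM at 4 KB per frame.
pub const fn metadata_bytes(ram_bytes: u64) -> u64 {
    (ram_bytes / 4096) * size_of::<FrameMeta>() as u64
}

// Compile-time check that the layout has no hidden padding.
const _: () = assert!(size_of::<FrameMeta>() == 16);
```

With repr(C), the four one-byte fields fill offsets 0 to 3, page_idx sits naturally aligned at offset 4, and owner_ptr at offset 8, so no padding is inserted. The ratio 16 / 4096 is 0.390625%, and 262,144 frames times 16 bytes is exactly 4 MB.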

The PMM maintains per-subkind accounting for KernelMetaKind via the by_kmeta array and the kmeta_count() accessor (H7). This allows it to report how many frames are held under each KernelPrivate sub-kind (e.g., PageTable, CowPool, General) separately from the aggregate KernelPrivate total.

Mapping Reference Tracking

map_count tracks how many VSpace page table entries reference this frame. It is maintained separately from ownership:

pmm_retain_mapping(phys)

Increment map_count. Called when a VSpace installs a PTE pointing to this frame.

pmm_release_mapping(phys)

Decrement map_count. Called when a VSpace removes a PTE.

map_count and flags are not reset when ownership changes (via set_owner). They track VSpace state independently of the ownership lifecycle.
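A small sketch of this independence, using a reduced Meta struct and free functions in place of the real PMM entry points. The source does not say what happens when the u8 counter would overflow or underflow; refusing the operation and returning false is an assumption made here.

```rust
/// Reduced per-frame metadata for illustration.
#[derive(Clone, Copy, Default)]
pub struct Meta {
    pub owner_tag: u8,
    pub map_count: u8,
    pub flags: u8,
}

/// Increment map_count; returns false instead of wrapping past 255 (assumed policy).
pub fn retain_mapping(m: &mut Meta) -> bool {
    match m.map_count.checked_add(1) {
        Some(v) => {
            m.map_count = v;
            true
        }
        None => false,
    }
}

/// Decrement map_count; returns false if the count is already zero (assumed policy).
pub fn release_mapping(m: &mut Meta) -> bool {
    match m.map_count.checked_sub(1) {
        Some(v) => {
            m.map_count = v;
            true
        }
        None => false,
    }
}

/// Change the owner tag only; map_count and flags are deliberately untouched.
pub fn set_owner(m: &mut Meta, new_tag: u8) {
    m.owner_tag = new_tag;
}
```

A frame that is mapped twice and then retagged still reports a map_count of 2, which is what lets a frame change owner (for example at a COW commit) while existing PTEs continue to reference it.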

Allocation API

All PMM functions acquire FRAME_LOCK internally with IRQs disabled.

Each function and its description:

pmm_alloc(owner)

Allocate a single frame with the given ownership tag. Returns Option<PhysAddr>. Sets FrameMeta.owner on the allocated frame.

pmm_alloc_contiguous(n)

Allocate n consecutive frames. Returns Option<PhysAddr> (address of first frame). Used for DMA, slot arrays, metadata arrays.

pmm_free(phys, expected_owner)

Free a frame. Panics in debug builds if current owner does not match expected_owner.

pmm_set_owner(phys, new_owner)

Change ownership tag without free+alloc cycle. Used at the cow_install_atomic commit point to flip KernelPrivate{CowPool} to MoData.

pmm_transfer(phys, old, new)

Change ownership from a known old tag to a new tag without free+alloc cycle. Distinct from pmm_set_owner in that it verifies the current owner matches old before updating.

pmm_lookup(phys)

Return a copy of FrameMeta for a physical address. O(1) array index.

pmm_retain_mapping(phys)

Increment map_count.

pmm_release_mapping(phys)

Decrement map_count.

pmm_free_count()

Return the number of free frames.
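The difference between pmm_set_owner and pmm_transfer comes down to whether the current tag is checked first. The sketch below models that on a single owner slot. The reduced Owner enum is illustrative, and reporting a mismatch through Result is an assumption, since the source does not specify how pmm_transfer signals one.

```rust
/// Reduced owner set for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Owner {
    Free,
    CowPool,
    MoData,
}

/// Unconditional retag, in the spirit of pmm_set_owner.
pub fn set_owner(slot: &mut Owner, new: Owner) {
    *slot = new;
}

/// Checked retag, in the spirit of pmm_transfer: succeeds only if the
/// current owner equals `old`. On mismatch, the slot is left unchanged
/// and the actual owner is returned (assumed error shape).
pub fn transfer(slot: &mut Owner, old: Owner, new: Owner) -> Result<(), Owner> {
    if *slot != old {
        return Err(*slot);
    }
    *slot = new;
    Ok(())
}
```

The checked form suits call sites that can state what the frame should currently be, while the unconditional form fits commit points such as cow_install_atomic where the caller has already established the current state.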

Emergency Reserve

The PMM maintains a pool of 32 pre-allocated frames tagged EmergencyReserve. These are reserved exclusively for fault-path allocation:

  • Page fault handlers that need to allocate a radix tree node or page table page to resolve the fault.

  • Situations where normal allocation would fail due to memory pressure but forward progress is required.

In debug builds, the reserve allocator is guarded: attempting to use reserve frames from a non-fault code path triggers an assertion.
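A sketch of a fixed-size reserve pool with a debug-only guard, modeled on the reserve and reserve_count fields of FrameAllocator. The Reserve type, the in_fault_path flag, and alloc_with_fallback are hypothetical; the source does not describe how the kernel detects a fault-path caller.

```rust
/// Fixed pool of pre-allocated frames (physical addresses).
pub struct Reserve {
    pub frames: [u64; 32],
    pub count: usize,
}

impl Reserve {
    pub fn new() -> Self {
        Self { frames: [0; 32], count: 0 }
    }

    /// Add a frame to the pool; returns false if the pool is already full.
    pub fn push(&mut self, phys: u64) -> bool {
        if self.count == self.frames.len() {
            return false;
        }
        self.frames[self.count] = phys;
        self.count += 1;
        true
    }

    /// Take a frame from the pool. The assertion fires only in debug builds.
    pub fn take(&mut self, in_fault_path: bool) -> Option<u64> {
        debug_assert!(in_fault_path, "emergency reserve used outside the fault path");
        if self.count == 0 {
            return None;
        }
        self.count -= 1;
        Some(self.frames[self.count])
    }
}

/// Fault-path helper: use the normal allocation result if there is one,
/// and touch the reserve only when normal allocation has failed.
pub fn alloc_with_fallback(normal: Option<u64>, reserve: &mut Reserve) -> Option<u64> {
    normal.or_else(|| reserve.take(true))
}
```

Because or_else evaluates its closure lazily, the reserve is left untouched whenever the normal allocator succeeds.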

Direct Physical Map

All physical memory is permanently mapped at a fixed offset:

PHYS_MAP_OFFSET = 0xFFFF_8000_0000_0000

phys_to_virt(phys) returns phys + PHYS_MAP_OFFSET.
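As a minimal sketch, the translation is a single addition. The inverse virt_to_phys is not named in this document and is shown only as an assumed counterpart; plain u64 values stand in for the kernel's address types.

```rust
pub const PHYS_MAP_OFFSET: u64 = 0xFFFF_8000_0000_0000;

/// Direct-map virtual address for a physical address.
/// Assumes `phys` is small enough that the sum does not overflow.
pub const fn phys_to_virt(phys: u64) -> u64 {
    phys + PHYS_MAP_OFFSET
}

/// Assumed inverse: returns None for addresses below the direct map.
pub const fn virt_to_phys(virt: u64) -> Option<u64> {
    if virt >= PHYS_MAP_OFFSET {
        Some(virt - PHYS_MAP_OFFSET)
    } else {
        None
    }
}
```

For example, the physical address 0x10_0000 (1 MB) is reachable at the virtual address 0xFFFF_8000_0010_0000.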

The direct map is established by arch::init() → paging::init() during boot using 2 MB large pages where possible. It remains valid for the system lifetime.

All kernel data structures (bitmap, metadata array, radix tree nodes, page tables, kernel stacks) are accessed through the direct map.

CowPool Frame Lifecycle

Frames can be donated to a VSpace COW pool for use in the kernel fast-path fault handler:

  • Frames donated via VSPACE_SET_COW_POOL or VSPACE_REPLENISH_COW_POOL are retagged from their current owner to KernelPrivate{CowPool} under CAP_LOCK + FRAME_LOCK. At this point the frames are kernel-owned and the originating Frame capability is revoked.

  • Un-consumed CowPool entries are returned to the PMM during VSpace::cleanup via pmm_free(phys, KernelPrivate{CowPool}) before page-table teardown.

  • Consumed entries — those used to satisfy a COW fault — become MoData at the cow_install_atomic commit point via pmm_set_owner.
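The three transitions above can be modeled as a toy state machine over an owner array. Everything here is illustrative: CowPoolSketch, the UserFrame variant (standing in for whatever owner the frame had before donation), and the function names are assumptions, and locking and capability revocation are not modeled.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Owner {
    Free,
    UserFrame, // hypothetical pre-donation owner
    CowPool,
    MoData,
}

/// Frame indices currently sitting in a VSpace COW pool.
pub struct CowPoolSketch {
    pub entries: Vec<usize>,
}

/// Donation: retag the frame as CowPool and add it to the pool.
pub fn donate(owners: &mut [Owner], pool: &mut CowPoolSketch, frame: usize) {
    owners[frame] = Owner::CowPool;
    pool.entries.push(frame);
}

/// COW fault commit: a consumed entry becomes MoData.
pub fn consume(owners: &mut [Owner], pool: &mut CowPoolSketch) -> Option<usize> {
    let frame = pool.entries.pop()?;
    owners[frame] = Owner::MoData;
    Some(frame)
}

/// VSpace cleanup: un-consumed entries go back to Free. Returns how many were freed.
pub fn cleanup(owners: &mut [Owner], pool: &mut CowPoolSketch) -> usize {
    let mut freed = 0;
    for frame in pool.entries.drain(..) {
        debug_assert_eq!(owners[frame], Owner::CowPool);
        owners[frame] = Owner::Free;
        freed += 1;
    }
    freed
}
```

Donating three frames, consuming one, and then cleaning up leaves one MoData frame and two Free frames, so every donated frame ends in exactly one of the two terminal states.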

FRAME_LOCK

FRAME_LOCK is a global SpinLock with a test-and-test-and-set (TTAS) optimization that protects the bitmap and the metadata array. Its position in the lock hierarchy is tier 7 (only SERIAL_LOCK is inner).
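For readers unfamiliar with TTAS, the idea is to spin on a plain load and attempt the atomic write only when the lock looks free, which keeps waiting CPUs from bouncing the cache line between them. The following is a generic sketch of the technique, not the kernel's SpinLock; TtasLock and try_lock are illustrative names, and IRQ handling is omitted.

```rust
use core::hint::spin_loop;
use core::sync::atomic::{AtomicBool, Ordering};

pub struct TtasLock {
    locked: AtomicBool,
}

impl TtasLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        loop {
            // Test: read-only spin while another CPU holds the lock.
            while self.locked.load(Ordering::Relaxed) {
                spin_loop();
            }
            // Test-and-set: attempt the write only once the lock looked free.
            if self
                .locked
                .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                return;
            }
        }
    }

    /// Single acquisition attempt that never spins.
    pub fn try_lock(&self) -> bool {
        !self.locked.load(Ordering::Relaxed)
            && self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

The Acquire ordering on a successful exchange pairs with the Release store in unlock, so writes made inside one critical section are visible to the next holder.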

All access follows the IRQ save/restore protocol:

let irq = save_irq_disable();
FRAME_LOCK.lock();
// critical section
FRAME_LOCK.unlock();
restore_irq(irq);

Callers may arrive at FRAME_LOCK from inside commit_lock (MO radix mutations during COW resolve). The downstream chain is: commit_lock → ut.alloc_lock → FRAME_LOCK. Once code holding commit_lock has taken FRAME_LOCK, it must never attempt to acquire a lock that is outer to FRAME_LOCK in the hierarchy.

Error Conditions and Edge Cases

Allocation Failure

pmm_alloc() returns None when no free frames remain. Callers must handle this — there is no kernel panic on allocation failure (except for boot-time allocations that use .expect()).

Common callers and their failure handling:

  • MO commit: returns NotEnoughMemory to mmsrv, which can report OOM to the application.

  • Page table allocation: returns a page fault error to the faulting thread.

  • Radix tree node allocation: falls back to the emergency reserve.

Contiguous Allocation Fragmentation

pmm_alloc_contiguous(n) scans for n consecutive free frames. On a fragmented system, this can fail even when enough free frames exist in total. Most contiguous allocations (slot arrays, metadata arrays) happen during boot, when fragmentation is minimal; callers that request contiguous memory later, such as for DMA buffers, must handle a None result.

Owner Mismatch on Free

pmm_free(phys, expected_owner) verifies that the frame’s current FrameMeta.owner matches expected_owner. In debug builds, a mismatch triggers a panic with diagnostic output. In release builds, the free proceeds regardless (the check is elided).

This catches double-free and wrong-owner-free bugs during development.

Emergency Reserve Depletion

The emergency reserve pool has 32 frames. If a burst of page faults depletes the reserve before normal allocation recovers, subsequent fault-path allocations fail. The fault is then delivered as a VM fault IPC to mmsrv, which must reclaim memory (e.g., evict page cache entries) and retry.

In practice, 32 frames cover the deepest fault-path allocation chain (radix tree growth + page table allocation + the faulted page itself).

Phase 1/2 Gap

Between Phase 1 (bitmap allocated via identity map) and Phase 2 (per-frame metadata array via direct map), the PMM can allocate frames but cannot track ownership metadata. During this gap, all allocations are for kernel-internal boot purposes and do not require ownership tracking. The meta slice is a zero-length placeholder until Phase 2 populates it.