Physical Memory
The Physical Memory Manager (PMM) is the kernel’s sole owner of physical frames. Every frame in the system has an owner tag tracked by the PMM. Higher layers — MemoryObjects, VSpaces, and kernel subsystems — borrow frames from the PMM and return them when done.
Four-Tier Memory Model
The four tiers, from bottom to top:
- PMM: Kernel-internal bitmap frame allocator. Provides pages for page tables, kernel stacks, radix tree nodes, maple tree nodes, and other kernel metadata. Approximately 1/8 of RAM.
- Untyped memory: Raw physical memory exposed to userspace via capabilities. Source of kernel objects (via retype) and the primary source of MemoryObject data pages (via MO_COMMIT). Approximately 7/8 of RAM. See Untyped Memory.
- MemoryObject (MO): Page-granular memory abstraction. Borrows frames from untyped (primary) or PMM (fallback). Tracks pages via a 4-level radix tree. See Memory Objects.
- VSpace: Observer layer. Maps MO pages into hardware page tables. Owns no frames. See Virtual Address Spaces.
Bitmap Allocator
Data Structure
pub struct FrameAllocator {
    bitmap: &'static mut [u64],      // 1 = free, 0 = used
    next_free: usize,                // hint: first potentially free frame
    total: usize,                    // highest frame index + 1
    free: usize,                     // count of free frames
    meta: &'static mut [FrameMeta],  // per-frame metadata (16 bytes each)
    reserve: [u64; 32],              // emergency reserve pool (PhysAddr array)
    reserve_count: usize,
}
Each bit in the bitmap represents one 4 KB physical frame.
Bit 1 = free, bit 0 = used.
The bitmap is indexed by frame number (phys_addr / PAGE_SIZE).
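The indexing rule can be illustrated with a small sketch. The helper names (frame_index, bit_position, is_free) are hypothetical, and the least-significant-bit-first ordering within each word is an assumption; the source states only that the bitmap is indexed by frame number and that 1 means free.

```rust
const PAGE_SIZE: u64 = 4096;

/// Frame number for a physical address (phys_addr / PAGE_SIZE).
fn frame_index(phys: u64) -> usize {
    (phys / PAGE_SIZE) as usize
}

/// Location of a frame's bit: (word index, bit index within that word).
/// Assumes bit 0 of each word is the lowest-numbered frame in that word.
fn bit_position(frame: usize) -> (usize, u32) {
    (frame / 64, (frame % 64) as u32)
}

/// True if the frame's bit is set, following the 1 = free convention.
fn is_free(bitmap: &[u64], frame: usize) -> bool {
    let (word, bit) = bit_position(frame);
    bitmap[word] & (1u64 << bit) != 0
}
```

Under these assumptions, the frame at physical address 0x10_0000 (1 MB) is frame 256, which is bit 0 of word 4.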
Allocation (alloc)
- Start scanning from the next_free frame index.
- Find the first word with a set bit (free frame) using bit scanning.
- Clear the bit, decrement free, advance next_free.
- Return the physical address.
Time complexity: O(N/64) in the worst case (N = total frames), amortized O(1) with the next_free hint.
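The steps above can be sketched in userspace as follows. This is a toy model, not the kernel's code: ToyAllocator and free_frame are hypothetical names, the bitmap lives in a Vec rather than a static slice, and lowering the hint on free is an assumed policy that keeps the scan correct.

```rust
const PAGE_SIZE: u64 = 4096;

/// Toy allocator with a heap-backed bitmap (the kernel uses a static slice).
struct ToyAllocator {
    bitmap: Vec<u64>, // 1 = free, 0 = used
    next_free: usize, // hint: no free frame exists below this index
    total: usize,     // number of frames tracked
    free: usize,      // count of free frames
}

impl ToyAllocator {
    /// All `total` frames start free; padding bits in the last word stay 0.
    fn new(total: usize) -> Self {
        let words = (total + 63) / 64;
        let mut bitmap = vec![u64::MAX; words];
        let tail = total % 64;
        if tail != 0 {
            bitmap[words - 1] = (1u64 << tail) - 1;
        }
        ToyAllocator { bitmap, next_free: 0, total, free: total }
    }

    /// Scan words starting at the hint, take the lowest set bit found.
    fn alloc(&mut self) -> Option<u64> {
        if self.free == 0 {
            return None;
        }
        let mut word = self.next_free / 64;
        while word < self.bitmap.len() {
            let bits = self.bitmap[word];
            if bits != 0 {
                let bit = bits.trailing_zeros() as usize; // bit scan
                let frame = word * 64 + bit;
                self.bitmap[word] &= !(1u64 << bit); // mark used
                self.free -= 1;
                self.next_free = frame + 1;
                return Some(frame as u64 * PAGE_SIZE);
            }
            word += 1;
        }
        None
    }

    /// Return a frame and pull the hint back so the next scan can find it.
    fn free_frame(&mut self, phys: u64) {
        let frame = (phys / PAGE_SIZE) as usize;
        assert!(frame < self.total, "frame out of range");
        let mask = 1u64 << (frame % 64);
        assert!(self.bitmap[frame / 64] & mask == 0, "double free");
        self.bitmap[frame / 64] |= mask;
        self.free += 1;
        if frame < self.next_free {
            self.next_free = frame;
        }
    }
}
```

Because the hint only moves forward on allocation and backward on free, a run of allocations touches each bitmap word at most once, which is where the amortized O(1) behaviour comes from.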
Two-Phase Initialization
The PMM initializes in two phases because the per-frame metadata array requires the direct physical map, which is not available during early boot.
Phase 1: Bitmap Only (identity-mapped)
During FrameAllocator::new(), called from arch::init() → mm::init():
- Scan the boot info memory map to find the highest usable physical address.
- Compute bitmap size: max_frames = max_phys / PAGE_SIZE, bitmap_words = (max_frames + 63) / 64.
- Allocate the bitmap from the first usable memory region within the identity-mapped window:
  - x86_64: 0x10_0000-0x4000_0000 (1 MB - 1 GB).
  - aarch64: 0x4000_0000-0x8000_0000 (QEMU virt RAM window).
- Zero the bitmap, then mark usable regions as free and reserved/kernel regions as used.
- At this point, alloc() works but meta[] is empty (zero-length slice).
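As a worked example of the sizing and marking steps, the sketch below computes the bitmap geometry and marks one usable region free. The function names (bitmap_geometry, mark_region_free) are hypothetical, and rounding region bounds inward to whole frames is an assumption.

```rust
const PAGE_SIZE: u64 = 4096;

/// Returns (max_frames, bitmap_words, bitmap_bytes) for the highest
/// usable physical address, using the formulas from the steps above.
fn bitmap_geometry(max_phys: u64) -> (u64, u64, u64) {
    let max_frames = max_phys / PAGE_SIZE;
    let bitmap_words = (max_frames + 63) / 64;
    (max_frames, bitmap_words, bitmap_words * 8)
}

/// Mark every whole frame inside [start, end) as free (bit = 1).
/// The bitmap is assumed to be zeroed first, so all other frames stay used.
fn mark_region_free(bitmap: &mut [u64], start: u64, end: u64) {
    let first = (start + PAGE_SIZE - 1) / PAGE_SIZE; // round start up
    let last = end / PAGE_SIZE;                      // round end down (exclusive)
    for frame in first..last {
        bitmap[(frame / 64) as usize] |= 1u64 << (frame % 64);
    }
}
```

For 1 GB of physical address space this gives 262,144 frames, 4,096 bitmap words, and a 32 KB bitmap.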
Phase 2: Per-Frame Metadata (direct-mapped)
After paging::init() establishes the direct physical map:
- remap_bitmap() switches the bitmap pointer from its identity-mapped to its direct-mapped virtual address.
- Allocate the per-frame metadata array: max_frames * 16 bytes (one FrameMeta per frame).
- Zero-initialize all metadata entries.
- Populate the emergency reserve pool (32 pre-allocated frames).
After Phase 2, the full PMM API is available including ownership tracking and mapping reference counts.
Frame Ownership
Every physical frame carries a FrameOwner tag that identifies its current purpose:
pub enum FrameOwner {
    Free,
    UntypedReserved { ut: *mut UntypedMemory },
    MoData { mo: *mut MemoryObject, page_idx: u32 },
    MoMeta { mo: *mut MemoryObject, subkind: MoMetaKind },
    KernelPrivate { subkind: KernelMetaKind },
    PageCache,
    EmergencyReserve,
}
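| Variant | Meaning |
|---|---|
| Free | Unallocated. Available for allocation. |
| UntypedReserved | Frame is carved out of an UntypedMemory region. |
| MoData | User-visible data page owned by a MemoryObject. |
| MoMeta | MO-internal metadata page. Sub-kinds: see MoMetaKind. |
| KernelPrivate | Kernel-private page. Sub-kinds: see KernelMetaKind (e.g., PageTable, CowPool, General). |
| PageCache | File-backed page cache entry (reserved for future …). |
| EmergencyReserve | Reserved pool for fault-path allocation. |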
Per-Frame Metadata
Each frame has a 16-byte FrameMeta structure stored in a contiguous array indexed by frame number:
#[repr(C)]
pub struct FrameMeta {        // 16 bytes
    pub owner_tag: OwnerTag,  // 1 byte: discriminant
    pub subkind: u8,          // 1 byte: sub-kind within owner type
    pub map_count: u8,        // 1 byte: VSpace mapping count
    pub flags: u8,            // 1 byte: DIRTY, REFERENCED, PINNED
    pub page_idx: u32,        // 4 bytes: page index within MO
    pub owner_ptr: u64,       // 8 bytes: pointer to owning MO
}
Memory overhead: 16 bytes per 4 KB frame = 0.39% of RAM. For 1 GB RAM (262,144 frames): 4 MB metadata array.
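The size and overhead figures can be checked mechanically. In this sketch OwnerTag is replaced by a plain u8, on the assumption that it is a 1-byte discriminant as the field comment states; meta_bytes is a hypothetical helper.

```rust
use std::mem::size_of;

const PAGE_SIZE: u64 = 4096;

/// Same field layout as FrameMeta, with u8 standing in for OwnerTag.
#[allow(dead_code)]
#[repr(C)]
struct FrameMeta {
    owner_tag: u8,
    subkind: u8,
    map_count: u8,
    flags: u8,
    page_idx: u32,
    owner_ptr: u64,
}

/// Size of the metadata array needed to cover `ram_bytes` of physical memory.
fn meta_bytes(ram_bytes: u64) -> u64 {
    (ram_bytes / PAGE_SIZE) * size_of::<FrameMeta>() as u64
}
```

With repr(C) the four 1-byte fields and the u32 pack into the first 8 bytes, so the u64 lands on its natural alignment with no padding: 16 / 4096 = 0.39%, and meta_bytes(1 GB) is 4 MB.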
KernelMetaKind maintains per-subkind accounting via the by_kmeta array and the kmeta_count() accessor (H7). This allows the PMM to report how many frames are held under each KernelPrivate sub-kind (e.g., PageTable, CowPool, General) separately from the aggregate KernelPrivate total.
Mapping Reference Tracking
map_count tracks how many VSpace page table entries reference this frame.
It is maintained separately from ownership:
- pmm_retain_mapping(phys): Increment map_count. Called when a VSpace installs a PTE pointing to this frame.
- pmm_release_mapping(phys): Decrement map_count. Called when a VSpace removes a PTE.
map_count and flags are not reset when ownership changes (via set_owner).
They track VSpace state independently of the ownership lifecycle.
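A minimal sketch of this separation, operating on a single metadata entry rather than a physical address. The cut-down FrameMeta and OwnerTag are stand-ins, and returning false on counter overflow or underflow is an assumed policy; the source does not say how the 8-bit map_count behaves at its limits.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OwnerTag {
    MoData,
    KernelPrivate,
}

/// Cut-down metadata entry with only the fields this example needs.
struct FrameMeta {
    owner_tag: OwnerTag,
    map_count: u8,
    flags: u8,
}

/// One more PTE references the frame. Refuses to wrap at 255.
fn retain_mapping(meta: &mut FrameMeta) -> bool {
    match meta.map_count.checked_add(1) {
        Some(n) => { meta.map_count = n; true }
        None => false,
    }
}

/// One PTE was removed. Refuses to go below zero.
fn release_mapping(meta: &mut FrameMeta) -> bool {
    match meta.map_count.checked_sub(1) {
        Some(n) => { meta.map_count = n; true }
        None => false,
    }
}

/// Ownership change: only the tag moves; map_count and flags are left alone.
fn set_owner(meta: &mut FrameMeta, new_owner: OwnerTag) {
    meta.owner_tag = new_owner;
}
```

A frame mapped twice and then retagged from KernelPrivate to MoData still reports a map_count of 2 afterwards.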
Allocation API
All PMM functions acquire FRAME_LOCK internally with IRQs disabled.
| Function | Description |
|---|---|
| pmm_alloc | Allocate a single frame with the given ownership tag. Returns None when no free frames remain. |
| pmm_alloc_contiguous | Allocate n consecutive frames. |
| pmm_free | Free a frame. Panics in debug builds if current owner does not match expected_owner. |
| pmm_set_owner | Change ownership tag without free+alloc cycle. Used at the … |
| … | Change ownership from a known old tag to a new tag without free+alloc cycle. Distinct from … |
| … | Return a copy of … |
| pmm_retain_mapping | Increment map_count. |
| pmm_release_mapping | Decrement map_count. |
| … | Return the number of free frames. |
Emergency Reserve
The PMM maintains a pool of 32 pre-allocated frames tagged EmergencyReserve.
These are reserved exclusively for fault-path allocation:
- Page fault handlers that need to allocate a radix tree node or page table page to resolve the fault.
- Situations where normal allocation would fail due to memory pressure but forward progress is required.
In debug builds, the reserve allocator is guarded: attempting to use reserve frames from a non-fault code path triggers an assertion.
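The pool and its debug guard might look like the sketch below. The Reserve type, push, take, and the in_fault_path flag are hypothetical; the source specifies only the 32-entry array, the count, and that non-fault use asserts in debug builds.

```rust
const RESERVE_SLOTS: usize = 32;

/// Fixed-size stack of physical addresses, mirroring the
/// reserve / reserve_count fields of FrameAllocator.
struct Reserve {
    reserve: [u64; RESERVE_SLOTS],
    reserve_count: usize,
}

impl Reserve {
    fn new() -> Self {
        Reserve { reserve: [0; RESERVE_SLOTS], reserve_count: 0 }
    }

    /// Add a pre-allocated frame. Returns false when the pool is full.
    fn push(&mut self, phys: u64) -> bool {
        if self.reserve_count == RESERVE_SLOTS {
            return false;
        }
        self.reserve[self.reserve_count] = phys;
        self.reserve_count += 1;
        true
    }

    /// Take a frame for the fault path. `in_fault_path` is an assumed
    /// caller-supplied flag; the check compiles away in release builds.
    fn take(&mut self, in_fault_path: bool) -> Option<u64> {
        debug_assert!(in_fault_path, "reserve used outside the fault path");
        if self.reserve_count == 0 {
            return None;
        }
        self.reserve_count -= 1;
        Some(self.reserve[self.reserve_count])
    }
}
```

An empty pool returns None rather than panicking, so the caller decides how to report the failed fault.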
Direct Physical Map
All physical memory is permanently mapped at a fixed offset:
PHYS_MAP_OFFSET = 0xFFFF_8000_0000_0000
phys_to_virt(phys) returns phys + PHYS_MAP_OFFSET.
The direct map is established by arch::init() → paging::init() during boot using 2 MB large pages where possible.
It remains valid for the system lifetime.
All kernel data structures (bitmap, metadata array, radix tree nodes, page tables, kernel stacks) are accessed through the direct map.
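The translation is a single addition. In the sketch below, virt_to_phys is a hypothetical inverse that the source does not mention, and both helpers assume their argument lies inside the mapped range, so no overflow handling is shown.

```rust
const PHYS_MAP_OFFSET: u64 = 0xFFFF_8000_0000_0000;

/// Direct-map virtual address for a physical address.
fn phys_to_virt(phys: u64) -> u64 {
    phys + PHYS_MAP_OFFSET
}

/// Hypothetical inverse: valid only for addresses inside the direct map.
fn virt_to_phys(virt: u64) -> u64 {
    virt - PHYS_MAP_OFFSET
}
```

For example, the frame at physical 0x10_0000 is reachable at 0xFFFF_8000_0010_0000.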
CowPool Frame Lifecycle
Frames can be donated to a VSpace COW pool for use in the kernel fast-path fault handler:
- Frames donated via VSPACE_SET_COW_POOL or VSPACE_REPLENISH_COW_POOL are retagged from their current owner to KernelPrivate{CowPool} under CAP_LOCK + FRAME_LOCK. At this point the frames are kernel-owned and the originating Frame capability is revoked.
- Un-consumed CowPool entries are returned to the PMM during VSpace::cleanup via pmm_free(KernelPrivate{CowPool}) before page-table teardown.
- Consumed entries (those used to satisfy a COW fault) become MoData at the cow_install_atomic commit point via pmm_set_owner.
FRAME_LOCK
FRAME_LOCK is a global SpinLock with TTAS optimization protecting the bitmap and metadata array.
Position in the lock hierarchy: tier 7 (only SERIAL_LOCK is inner).
All access follows the IRQ save/restore protocol.
Callers may arrive at FRAME_LOCK from inside commit_lock (MO radix mutations during COW resolve). The downstream chain is: commit_lock → ut.alloc_lock → FRAME_LOCK. This means code holding commit_lock must never attempt to acquire a lock that is outer to FRAME_LOCK in the hierarchy.
let irq = save_irq_disable();
FRAME_LOCK.lock();
// critical section
FRAME_LOCK.unlock();
restore_irq(irq);
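For readers unfamiliar with TTAS (test-and-test-and-set), the following userspace sketch shows the idea using the standard library's atomics. TtasLock is a hypothetical name, not the kernel's SpinLock type, and the IRQ save/restore steps are omitted because they have no userspace equivalent.

```rust
use std::hint::spin_loop;
use std::sync::atomic::{AtomicBool, Ordering};

pub struct TtasLock {
    locked: AtomicBool,
}

impl TtasLock {
    pub const fn new() -> Self {
        TtasLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        loop {
            // Test: spin on a plain load so waiters do not generate
            // write traffic on the lock's cache line.
            while self.locked.load(Ordering::Relaxed) {
                spin_loop();
            }
            // Test-and-set: attempt the atomic write only once the lock looked free.
            if self
                .locked
                .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                return;
            }
        }
    }

    /// Single attempt; returns true if the lock was acquired.
    pub fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

The Acquire/Release pairing ensures that writes made inside one critical section are visible to the next holder of the lock.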
Error Conditions and Edge Cases
Allocation Failure
pmm_alloc() returns None when no free frames remain.
Callers must handle this — there is no kernel panic on allocation failure (except for boot-time allocations that use .expect()).
Common callers and their failure handling:
- MO commit: returns NotEnoughMemory to mmsrv, which can report OOM to the application.
- Page table allocation: returns a page fault error to the faulting thread.
- Radix tree node allocation: falls back to the emergency reserve.
Contiguous Allocation Fragmentation
pmm_alloc_contiguous(n) scans for n consecutive free frames.
On a fragmented system, this can fail even when sufficient total free frames exist.
Contiguous allocation is only used during boot (slot arrays, metadata arrays) when fragmentation is minimal.
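A simple linear scan shows why fragmentation defeats contiguous allocation. The function find_contiguous is a hypothetical sketch that only locates a run; it does not claim to match how pmm_alloc_contiguous is implemented.

```rust
/// Find the first run of `n` consecutive free frames (bit = 1) among the
/// first `total` frames. Returns the index of the run's first frame.
fn find_contiguous(bitmap: &[u64], total: usize, n: usize) -> Option<usize> {
    if n == 0 {
        return None;
    }
    let mut run = 0; // length of the current streak of free frames
    for frame in 0..total {
        let free = bitmap[frame / 64] & (1u64 << (frame % 64)) != 0;
        if free {
            run += 1;
            if run == n {
                return Some(frame + 1 - n);
            }
        } else {
            run = 0; // a used frame breaks the streak
        }
    }
    None
}
```

With the alternating pattern 0x5555_5555_5555_5555, half of the 64 frames are free, yet a request for two consecutive frames fails.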
Owner Mismatch on Free
pmm_free(phys, expected_owner) verifies that the owner recorded in the frame's FrameMeta (owner_tag and its associated fields) matches expected_owner.
In debug builds, a mismatch triggers a panic with diagnostic output.
In release builds, the free proceeds regardless (the check is elided).
This catches double-free and wrong-owner-free bugs during development.
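The debug-only check maps naturally onto Rust's debug_assert family, which is compiled out when debug assertions are disabled. The sketch below uses a hypothetical free_checked function on a cut-down metadata entry; whether the kernel uses debug_assert or an explicit cfg check is not stated in the source.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OwnerTag {
    Free,
    MoData,
}

struct FrameMeta {
    owner_tag: OwnerTag,
}

/// Release a frame. In debug builds a wrong owner panics with both tags
/// printed; in release builds the comparison is elided and the free proceeds.
fn free_checked(meta: &mut FrameMeta, expected_owner: OwnerTag) {
    debug_assert_eq!(meta.owner_tag, expected_owner, "pmm_free: owner mismatch");
    meta.owner_tag = OwnerTag::Free;
}
```

A second free_checked call with OwnerTag::MoData on the same entry would trip the assertion in a debug build, because the tag is already Free.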
Emergency Reserve Depletion
The emergency reserve pool has 32 frames. If a burst of page faults depletes the reserve before normal allocation recovers, subsequent fault-path allocations fail. The fault is then forwarded to mmsrv as a VM fault IPC, and mmsrv must reclaim memory (e.g., evict page cache entries) and retry.
In practice, 32 frames cover the deepest fault-path allocation chain (radix tree growth + page table allocation + the faulted page itself).
Phase 1/2 Gap
Between Phase 1 (bitmap allocated via identity map) and Phase 2 (per-frame metadata array via direct map), the PMM can allocate frames but cannot track ownership metadata.
During this gap, all allocations are for kernel-internal boot purposes and do not require ownership tracking.
The meta slice is a zero-length placeholder until Phase 2 populates it.
Related Pages
- Memory Objects: MO commit uses PMM as fallback frame source
- Virtual Address Spaces: page table page allocation from PMM
- Page Fault Handling: emergency reserve used in fault paths
- Untyped Memory: primary frame source for user data
- Architecture: lock ordering