Memory-Model Audit
| This document covers kernite PMM / MO / VSpace / paging as well as mmsrv and rsrcsrv (userland). The focus is the kernel’s contract side: it tracks kernel–userland invariants at the boundary, not userland implementation details. |
Purpose
This document is a living ledger for proof-relevant memory-model work in SaltyOS. It is meant to answer four questions at any point in time:
-
What state machine would a seL4-style proof need to refine?
-
What invariants must hold across PMM / MO / VSpace / mmsrv / rsrcsrv?
-
Which blocker-level issues are still open at current HEAD?
-
Which historically-known gaps have already been fixed?
This document is not a mechanized proof. It records static audit results, historical fixes, currently-open proof blockers, and the assumptions those statements depend on.
How To Update This Document
When a new audit pass is done:
-
Add one entry to == Audit Log.
-
Update == Current Status.
-
Add or update items in == Open Blockers and == Historical Fix Ledger.
-
If the proof obligations changed, update == Required Invariants.
Stable IDs:
-
H# = historically-identified blocker that is now closed.
-
O# = blocker that is open at current HEAD.
-
I# = proof-relevant invariant.
-
A# = load-bearing assumption.
Scope And Method
This audit covers:
-
kernite PMM / MO / VSpace / paging
-
mmsrv
-
rsrcsrv
Unless an audit-log entry says otherwise, this is a static source audit only. No build, boot, or runtime repro is implied.
Current Status
Current snapshot:
-
Date: 2026-04-17
-
Method: static source audit only
-
Current verdict: PROOF BLOCKERS CLOSED (pending build / boot verification)
-
Historical blocker closures retained: H1-H10
-
Current blocker-level issues: none
Status Table
| ID | Status | Area | Summary | Primary invariant(s) |
|---|---|---|---|---|
H1 |
CLOSED |
VSpace map/fork metadata |
Mapping metadata reservation made atomic before commit |
I2, I7, I8 |
H2 |
CLOSED |
MO reverse maps |
reverse_maps gained a synchronization domain and destroy walk protocol |
I2, I3 |
H3 |
CLOSED |
PMM free semantics |
pmm_free now proves unmapped-before-free locally via map_count |
I4 |
H4 |
CLOSED |
mmsrv mirror state |
Userspace region mirror gained transactional publication / rollback |
I5, I10, I11, I14 |
H5 |
CLOSED |
rsrcsrv quota sizing |
Kernel and rsrcsrv share one object sizing function |
I6, I9, I13 |
H6 |
CLOSED |
COW resolve syscall path |
resolve_cow_with_frame routes through cow_install_atomic; child MO backing is established as part of the same transaction |
I17 |
H7 |
CLOSED |
COW pooled fast path |
COW pool frames are retagged KernelPrivate\{CowPool} at donation and drained on VSpace teardown; kernel holds lifetime |
I18 |
H8 |
CLOSED |
MO radix tree access |
resolve_page_depth / is_local_committed read pages under commit_lock; parent chain walked per-MO without cross-nesting |
I19 |
H9 |
CLOSED |
COW fast paths |
Both handle_cow_fault and handle_cow_fault_pooled delegate to cow_install_atomic; radix commit failure rolls back PTE/PMM |
I17 |
H10 |
CLOSED |
VSpace teardown |
free_pdpt_recursive / free_pd_recursive call pmm_free(KernelPrivate\{PageTable}) after releasing the self-mapping ref |
I20 |
State Model That Must Be Proved
The implementation spreads memory truth across five places:
-
PMM per-frame state
kernite/src/mm/frame.rs
FrameOwner, map_count, and frame flags. -
MemoryObject state
kernite/src/cap/memory_object.rs
radix tree entries plus reverse_maps. -
VSpace state
kernite/src/mm/vspace.rs
hardware PTEs plus VmArea Maple tree metadata. -
mmsrv state
userland/core/mmsrv/src/types.rs
per-client MmRegion[] policy state. -
rsrcsrv state
userland/core/rsrcsrv/src/main.rs
untyped source pools, owner quotas, and handle metadata.
For a proof, these cannot merely be "usually consistent". They need one refinement story with explicit state transitions between:
PMM <-> MemoryObject <-> VSpace <-> mmsrv <-> rsrcsrv
Required Invariants
At minimum, the following must hold.
Core Ownership And Mapping Invariants
-
I1 Every physical frame has exactly one semantic owner in PMM.
-
I2 Every present MO-backed PTE is represented by a matching VmArea span and a matching MO ReverseMapEntry.
-
I3 Every ReverseMapEntry points to a live VSpace and a still-valid VA range.
-
I4 pmm_free() is called only after the last mapping reference is gone.
-
I5 mmsrv region metadata is a refinement of kernel VmArea state, not a competing second truth.
-
I6 rsrcsrv quota accounting must upper-bound, and ideally exactly match, actual kernel object memory usage.
Structural And Error-Cleanliness Invariants
-
I7 No two VmArea s in one VSpace overlap.
-
I8 map_mo / fork_range error paths leave no residual child-side PTE, VmArea, or reverse-map mutations attributable to the failed call.
-
I9 Kernel object sizing used by retype and quota accounting comes from one shared source of truth.
-
I10 Fork is child-atomic: success fully publishes child state, failure leaves the child in pre-call state.
-
I11 MmClient publication is atomic at the document’s abstraction level: no partially-swapped region/scalar state becomes observable.
-
I12 Policy code and kernel-VM code stay separated by an explicit layering boundary.
-
I13 Quota cost is idempotent: alloc-time charge equals free-time refund.
-
I14 Every kernel-side install performed by transactional fork is journaled exactly once before any later fallible step.
Assumption-Coupled And Caveat Invariants
-
I15 The current mmsrv publication model assumes a single-threaded main loop.
-
I16 Fork dedup preserves aliasing only within the bounded dedup table design; above the bound it degrades to a documented caveat, not a clean proof story.
Current-Head Blocking Invariants
-
I17 After any COW-resolution path, the installed present PTE must correspond to a backing entry in the child MO radix tree and PMM owner state must agree with that child MO.
-
I18 If the kernel stores a raw physical address obtained from a Frame capability, it must also hold a kernel-side lifetime claim or rehome ownership before userspace can revoke the original capability.
-
I19 Any read of MemoryObject.pages that can race with commit/remove/grow must either hold commit_lock or use a separately-proved lock-free publication protocol.
-
I20 Every PMM-allocated page-table frame for a user VSpace must eventually return to PMM via pmm_free(...) with the matching owner tag.
Open Blockers
None at current HEAD. The five blockers O1–O5 that were open on 2026-04-17 have all landed; see H6–H10 in the Historical Fix Ledger.
Historical Fix Ledger
These are blocker-level issues that were historically real and are still important to track, but are considered closed by the current implementation.
H1. Mapping happened before metadata registration — CLOSED
-
Status: CLOSED
-
Area: VSpace map/fork metadata
-
Primary invariants: I2, I7, I8
What landed:
-
overlap gate before metadata mutation;
-
maple reservation before insert;
-
reverse-map reservation ticket before commit;
-
infallible metadata commit after successful reservation.
Result:
Map/fork metadata no longer relies on post-hoc rollback symmetry to repair late-stage allocator failure.
H2. MemoryObject::reverse_maps had no synchronization domain — CLOSED
-
Status: CLOSED
-
Area: MO reverse-map integrity
-
Primary invariants: I2, I3
What landed:
-
dedicated MemoryObject::rmap_lock;
-
explicit lock order VSpace.lock -> MO.rmap_lock -> FRAME_LOCK;
-
snapshot + re-validate destroy walk to avoid inversion.
Result:
Reverse-map mutation and observation now have a named synchronization domain, and destroy no longer needs to violate global lock ordering.
H3. PMM free did not prove "unmapped before free" — CLOSED
-
Status: CLOSED
-
Area: PMM safety
-
Primary invariants: I4
What landed:
-
FrameAllocator::free_internal panics on non-zero map_count;
-
map_count widened to u32;
-
free-time diagnostics now report the failing frame and owner tag.
Result:
PMM now enforces "unmapped before free" locally instead of assuming higher layers got it right.
H4. mmsrv kept a second memory truth and preserved partial fork state — CLOSED
-
Status: CLOSED
-
Area: userspace VM mirror / fork publication
-
Primary invariants: I5, I10, I11, I14
What landed:
-
transactional mmsrv::txn rollback guard;
-
staged region-table publication;
-
atomic commit-swap of region/scalar state;
-
layering split between policy and kernel-VM code.
Result:
The userspace mirror is now explicitly transactional instead of being an independent best-effort state machine.
H5. rsrcsrv quota accounting was not a refinement of kernel bytes — CLOSED
-
Status: CLOSED
-
Area: quota sizing
-
Primary invariants: I6, I9, I13
What landed:
-
shared object_alloc_bytes sizing helper;
-
compile-time kernel size assertions;
-
cached cost_bytes for idempotent refund;
-
removal of ad hoc rsrcsrv object-size logic.
Result:
Kernel retype and rsrcsrv quota accounting now derive object byte cost from one shared source of truth.
H6. resolve_cow_with_frame published PTEs without establishing child-MO backing — CLOSED
-
Status: CLOSED
-
Area: COW resolve syscall path
-
Primary invariants: I17
What landed:
-
new VSpace::cow_install_atomic helper (Reserve BUSY → Publish PTE+TLB+PMM mapping-refs → Finalize radix → Commit-point pmm_set_owner(MoData)) under MO.commit_lock;
-
resolve_cow_with_frame delegates to the helper; the child MO is resolved via the VSpace Maple tree before any allocation;
-
syscall_vspace_cow_resolve is an always-consume donation: inside one CAP_LOCK critical section it revokes the Frame cap and retags the PMM owner to KernelPrivate\{CowPool}, then calls the helper. Success flips it to MoData. Failure returns the phys to PMM via pmm_free(CowPool). A post-install revoke would have let CDT’s reclaim_child_range overwrite the MoData tag back to UntypedReserved and panic the next owner-mismatch check — the donation-style ordering avoids this.
Result:
Every successful COW resolve exits with the (PTE, MO radix, PMM owner) triple all pointing at the new frame. External observers (commit_lock not held) only ever see pre-call or post-call states. ABI trade-off (recorded here for transparency): the Frame cap is consumed on every invocation, including failure. Retry requires the caller to allocate a fresh frame.
H7. COW scratch pool had no kernel-held lifetime claim — CLOSED
-
Status: CLOSED
-
Area: COW pooled fast path
-
Primary invariants: I18
What landed:
-
new KernelMetaKind::CowPool variant with per-subkind accounting;
-
VSPACE_SET_COW_POOL / VSPACE_REPLENISH_COW_POOL revoke each donated Frame cap and retag the backing frame to KernelPrivate\{CowPool} under CAP_LOCK + FRAME_LOCK;
-
VSpace::cleanup drains un-consumed pool entries via pmm_free(KernelPrivate\{CowPool}) before page-table teardown.
Result:
Pool entries are owned by the kernel for their full lifetime. Deletion or revocation of the originating Frame capability cannot reclaim the phys out from under the kernel.
H8. resolve_page_depth read pages outside its writers' synchronization domain — CLOSED
-
Status: CLOSED
-
Area: MO radix reader synchronization
-
Primary invariants: I19
What landed:
-
resolve_page_depth and is_local_committed acquire commit_lock around every pages.get — per MO, never nested across two MOs.
-
self.cow_parent is read inside self.commit_lock so the cow_parent slot value observed by the reader is coherent with respect to the lazy-collapse writer.
-
The lazy-collapse branch of MemoryObject::destroy now writes remaining.cow_parent under remaining.commit_lock, closing the torn-field read/write race on the field itself.
-
The lingering writer sites in syscall_mo_resize, map_demand_*, and the map-path flatten step now acquire commit_lock around commit_page / pages.remove.
-
MemoryObject::commit_lock docstring and the global lock-ordering comment in kernite/src/mm/mod.rs now declare commit_lock at the same nesting depth as rmap_lock (disjoint), with explicit commit_lock -> ut.alloc_lock -> FRAME_LOCK ordering downstream.
Result:
Every reader of MemoryObject.pages observes a state produced under the same lock the writers hold, and the cow_parent field itself is read/written coherently.
Scope of the cow_parent guarantee. The commit_lock guard serializes the write of cow_parent; the reader’s subsequent slot dereference (deref_cow_parent(old_slot) after the lock is released) still relies on A1 (kernel objects never freed) plus the CDT slot-lifetime rules. It does not prevent the cow_parent slot from being freed and reallocated to an unrelated cap in the window between the reader releasing commit_lock and calling deref_cow_parent. Tightening this (holding commit_lock across the deref, or gating parent-chain walks behind CAP_LOCK) is tracked as follow-up under == Secondary Proof Complications.
Returned phys values remain live for the caller’s VSpace.lock window via H3's map_count >= 1 invariant when the caller already holds an active mapping; first-mapping callers are addressed in the same caveats section.
H9. COW fast paths ignored radix-commit failure after publishing a PTE — CLOSED
-
Status: CLOSED
-
Area: COW fast-path atomicity
-
Primary invariants: I17
What landed:
-
handle_cow_fault and handle_cow_fault_pooled both delegate to cow_install_atomic; the helper’s Reserve step uses RadixTree::reserve_slot(BUSY) so radix-alloc failure returns the frame to PMM without ever touching PTE or PMM owner state;
-
write_entry failure rolls back the BUSY reservation;
-
RaceLost (another CPU already resolved) is reflected back to the fault handler as "continue", with the donated / allocated frame released with its current tag (KernelPrivate\{General} or KernelPrivate\{CowPool}).
Result:
No COW fast path can leave a present PTE pointing at a page the child MO has not committed. Failure paths are transactional across all three domains.
H10. User VSpace page-table frames leaked on teardown — CLOSED
-
Status: CLOSED
-
Area: VSpace teardown
-
Primary invariants: I20
What landed:
-
free_pdpt_recursive / free_pd_recursive call pmm_free(KernelPrivate\{PageTable}) on each PDPT, PD, and PT frame after the self-mapping ref is released via pmm_release_mapping;
-
PML4 / aarch64 L0 root continue to be owned by the parent untyped carve and are intentionally skipped;
-
applies unchanged on both x86_64 and aarch64 (the teardown helpers share one arch-generic 4-level walk).
Result:
User VSpace destruction returns every allocated page-table frame to PMM. The page-table population counter and the KernelPrivate\{PageTable} tag are now a faithful refinement of actual reclamation behaviour.
Secondary Proof Complications
Not blocker-level, but they complicate a mechanized proof and should stay visible. Grouped by origin.
Pre-existing
-
COW parent lifetime. Child MOs reference their parent through an internal cap slot and a manually-maintained refcount instead of a uniform CDT-managed lifetime.
-
Multiple mapping views. A single region is tracked in three places: PMM map_count, kernel VmArea / ReverseMapEntry, and userspace MmRegion. Consistency is invariant-level, not structural.
-
Bounded fork dedup. DEDUP_MAX = 16 caps alias preservation; beyond that the guarantee degrades to a documented caveat.
Introduced by H6–H10
-
Intermediate radix nodes on write_entry failure. cow_install_atomic removes the BUSY leaf on rollback but does not compact intermediate nodes allocated during Reserve. Memory accounting only; readers see the slot as uncommitted, and MO teardown frees the nodes.
-
Pool depth drift on RaceLost. handle_cow_fault_pooled consumes the pool entry before calling cow_install_atomic. A RaceLost frees the consumed frame via pmm_free(CowPool) without emitting a notification-ring entry, so mmsrv sees the deficit only at the next replenish. Correctness-neutral (another CPU already committed) but counter-visible.
-
First-mapping caller lifetime. resolve_page_depth's returned phys has map_count >= 1 only for callers already holding a mapping to the page. First-mapping callers (map-syscall initial install) must finish the PTE install inside the same VSpace.lock window; otherwise concurrent unmap from other VSpaces could drop map_count to zero and free the frame. All current callers satisfy this, but the window is narrow rather than absent. A tighter claim (incrementing map_count inside resolve_page_depth) is tracked as follow-up.
-
cow_parent slot-reuse window. H8 serializes the write of cow_parent under the child’s commit_lock, so the reader observes a coherent slot value. But the reader releases commit_lock before calling deref_cow_parent(old_slot); in the window between unlock and deref, the CDT may free and reallocate that slot to an unrelated cap. Today this is bounded by A1 (kernel objects never freed — the underlying MO storage is stable) plus the fact that deref_cow_parent type-checks ObjectType::MemoryObject before returning the pointer. A stricter closure (holding commit_lock across the deref, or gating parent-chain walks behind CAP_LOCK) is tracked as follow-up.
-
handle_cow_fault behavior change. When the faulting VA has a COW-bit PTE but the Maple-tree VMA lookup does not yield a backing MO, the function now returns Err(VSpaceError::NotMapped) instead of the previous silent-success path (which was itself the O1 bug). This is intentional and necessary for I17, but any user-visible code that relied on the silent-success behavior will now see the fault dispatcher’s error path (SIGSEGV-equivalent). The existing test_runner fork / mmap modules exercise COW; a just run pass is the empirical check that no legitimate caller regresses.
Load-Bearing Assumptions
If any assumption below changes, every dependent statement in this document must be re-audited.
A1. Kernel objects are never freed or reused during system lifetime
Source of assumption:
-
kernel objects are carved from untyped memory;
-
backing bytes are not reused during normal object destruction paths.
Why it matters:
-
raw *mut VSpace / *mut KernelObject identity is used in reverse-map and destroy logic;
-
snapshot + re-validate reasoning depends on pointer stability;
-
ABA is ruled out by design rather than by generation tagging.
Invalidation trigger:
-
any future object-memory reuse path;
-
any typed-object free-and-reallocate design;
-
any move away from monotonic untyped-backed object storage.
A2. mmsrv main loop is single-threaded
Source of assumption:
-
MmClient publication uses staged swap, not SMP-safe publication.
Why it matters:
-
I10 and I11 currently rely on the absence of concurrent readers/writers on the same MmClient;
-
the current commit-swap protocol is not an RCU or seq-lock design.
Invalidation trigger:
-
adding worker threads to mmsrv;
-
splitting policy handling across concurrent server threads;
-
introducing partial atomic publication of only some MmClient fields.
Audit Log
2026-04-17 — Close of O1-O5
-
Method: static source audit + targeted code walkthrough of the landed cow_install_atomic primitive and its three fault-path callers.
-
Result: O1-O5 closed, recorded as H6-H10.
-
Main landed fix themes: atomic 3-way COW transaction (PTE / MO radix / PMM owner), kernel-owned COW scratch pool via KernelPrivate\{CowPool}, radix reader / writer synchronization via commit_lock, user-VSpace page-table reclamation.
-
Build / boot verification pending — recorded as a near-term obligation.