IPC

substrate/ipc.rs is the second-narrowest file in the substrate after syscall.rs, and it is by far the busiest: every client-server interaction in SaltyOS — every VFS call, every procmgr spawn, every mmsrv mmap, every namesrv lookup — is a sequence of ipc::call_ctx or ipc::send_ctx calls on top of this file.

This page documents the three pieces you need to understand IPC: the TronaMsg / IpcBuffer / IpcContext types, the msginfo word encoding, and the ten IPC primitives. For which labels flow through these primitives, see IPC Protocol Labels.

The three types

TronaMsg — the user-visible message struct

TronaMsg is the struct callers fill in before sending and inspect after receiving. It lives in lib/trona/uapi/types/core.rs:

#[repr(C)]
#[derive(Clone, Copy)]
pub struct TronaMsg {
    pub label: u64,
    pub length: u64,
    pub regs: [u64; 32],
}
  • label is the IPC protocol label (VFS_WRITE, PM_SPAWN, MM_MMAP, …). It lives in bits 51:12 of the msginfo word, so only 40 bits are actually transmitted.

  • length is the number of valid message registers (0–20). The IPC layer enforces the upper bound.

  • regs[0..=3] are the four fast-path registers that travel in CPU registers (rdi, rsi, rdx, r10 on x86_64; x0-x3 on aarch64).

  • regs[4..20] (16 registers) travel through the IPC buffer page as overflow when length > 4.

  • regs[20..32] (12 registers) are unused — allocated for future expansion but never transmitted today.

The struct is 34 × 8 = 272 bytes regardless of how many registers are actually valid, which is fine for stack allocation and keeps the layout #[repr(C)]-stable.
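As a sanity check, the declared layout can be reproduced standalone and its size asserted. The zeroed() helper here is a convenience mirroring the constructor used in the client and server examples later on this page; only the struct declaration itself is confirmed by core.rs.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
pub struct TronaMsg {
    pub label: u64,
    pub length: u64,
    pub regs: [u64; 32],
}

impl TronaMsg {
    // Assumed convenience constructor, as used by the examples below.
    pub fn zeroed() -> Self {
        TronaMsg { label: 0, length: 0, regs: [0; 32] }
    }
}

fn main() {
    // 2 header words + 32 register words = 34 words = 272 bytes,
    // with no padding because every field is a u64 under #[repr(C)].
    assert_eq!(core::mem::size_of::<TronaMsg>(), 272);
    let m = TronaMsg::zeroed();
    assert_eq!(m.length, 0);
}
```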

IpcBuffer — the kernel-shared page

Every thread has a 4 KiB IpcBuffer page mapped at a fixed virtual address. The kernel reads from and writes to this page during IPC to transfer anything that does not fit in the syscall register window.

#[repr(C)]
pub struct IpcBuffer {
    pub msg: [u64; 34],
    pub badge: u64,
    pub caps: [u64; 4],
    pub receive_cnode: u64,
    pub receive_index: u64,
    pub receive_depth: u64,
    pub reserved: [u64; IPC_BUFFER_RESERVED_WORDS],  // 466 words = 3,728 bytes
}

Field-by-field:

  • msg[0..6] mirrors the TronaMsg header (label, length, and regs[0..=3]).

  • msg[6..34] holds the overflow message registers (the regs[4..32] range; only regs[4..20] can ever carry payload given the length cap of 20).

  • badge is written by the kernel on receive — it holds the badge of the sender endpoint.

  • caps[0..4] is the capability transfer window: up to four capability slots staged before a send and up to four slots to be populated on receive.

  • receive_cnode, receive_index, receive_depth configure the destination CNode / slot / depth where incoming capabilities should be placed. The kernel uses these on any receive path that could bring in caps.

  • reserved[0..466] is a per-syscall scratch area. Some operations write extra data here — for example, VSPACE_WALK writes result tuples starting at word offset 42 — and the caller parses them with helpers like vspace_walk_result_header() / vspace_walk_result_entry().

The total size is 34 + 1 + 4 + 3 + 466 = 508 words = 4,064 bytes, plus trailing padding to reach the 4,096-byte page boundary.
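The size arithmetic can be checked mechanically. This standalone sketch mirrors the declared fields, with IPC_BUFFER_RESERVED_WORDS = 466 as stated above:

```rust
const IPC_BUFFER_RESERVED_WORDS: usize = 466;

#[repr(C)]
pub struct IpcBuffer {
    pub msg: [u64; 34],
    pub badge: u64,
    pub caps: [u64; 4],
    pub receive_cnode: u64,
    pub receive_index: u64,
    pub receive_depth: u64,
    pub reserved: [u64; IPC_BUFFER_RESERVED_WORDS],
}

fn main() {
    // 34 + 1 + 4 + 3 + 466 = 508 words = 4,064 bytes.
    assert_eq!(core::mem::size_of::<IpcBuffer>(), 508 * 8);
    // The remaining 32 bytes up to the 4 KiB page are trailing padding.
    assert_eq!(4096 - core::mem::size_of::<IpcBuffer>(), 32);
}
```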

IpcContext — per-thread state

IpcContext is the struct every IPC call threads through:

#[repr(C)]
pub struct IpcContext {
    pub ipc_buffer: *mut IpcBuffer,
    pub send_cap_count: i32,
}

It is tiny on purpose: just the pointer to the thread’s IPC buffer plus a counter tracking how many caps have been staged for the next send. The counter is reset to 0 automatically after every send completes.

In a single-threaded process, substrate::__trona_ipc_ctx (the global static mut in lib.rs) is the only IpcContext anywhere. In a multi-threaded process, the substrate TLS layer keeps a per-thread IpcContext inside each ThreadLocalBlock, and tls::current_ipc_ctx() picks the right one.
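A minimal sketch of the single-threaded case, assuming only what is stated above: one global context, with a null buffer pointer standing in for the mapped IPC page. The multi-threaded TLS path is not modeled here.

```rust
#[repr(C)]
pub struct IpcContext {
    pub ipc_buffer: *mut u8, // *mut IpcBuffer in the real code
    pub send_cap_count: i32,
}

// Sketch of substrate::__trona_ipc_ctx for a single-threaded process.
#[allow(non_upper_case_globals)]
static mut __trona_ipc_ctx: IpcContext = IpcContext {
    ipc_buffer: core::ptr::null_mut(),
    send_cap_count: 0,
};

pub fn current_ipc_ctx() -> *mut IpcContext {
    // addr_of_mut! yields a raw pointer without creating a &mut
    // reference to the static.
    unsafe { core::ptr::addr_of_mut!(__trona_ipc_ctx) }
}

fn main() {
    let ctx = current_ipc_ctx();
    // A fresh context has no caps staged for the next send.
    unsafe { assert_eq!((*ctx).send_cap_count, 0); }
}
```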

The msginfo word — label, length, extra_caps

The kernel never sees a TronaMsg directly — what it sees on entry is a 64-bit msginfo word packing three fields:

Bits    Field       Meaning
6:0     length      0–127: number of message registers carrying payload. Capped at 20 by higher layers.
11:7    extra_caps  0–31: number of capability slots staged in IpcBuffer.caps[] for this transfer. Capped at 4 by the buffer size.
51:12   label       40-bit operation identifier (VFS_READ, PM_SPAWN, …).

substrate/ipc.rs exposes one encoder and three accessors so no call site ever manipulates the bit layout by hand:

#[inline(always)]
pub fn msginfo(label: u64, length: u64, caps: u64) -> u64 {
    (label << 12) | (caps << 7) | (length & 0x7F)
}

#[inline(always)]
pub fn msginfo_label(info: u64) -> u64 { (info >> 12) & 0xFF_FFFF_FFFF }

#[inline(always)]
pub fn msginfo_length(info: u64) -> u64 { info & 0x7F }

#[inline(always)]
pub fn msginfo_extracaps(info: u64) -> u64 { (info >> 7) & 0x1F }

All four are #[inline(always)], so the encoder collapses into a couple of shifts and an or at every call site.
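A round trip through the encoder and accessors (repeated here verbatim so the snippet is self-contained) makes the bit layout concrete; the label value is arbitrary, standing in for a real label like VFS_WRITE:

```rust
pub fn msginfo(label: u64, length: u64, caps: u64) -> u64 {
    (label << 12) | (caps << 7) | (length & 0x7F)
}
pub fn msginfo_label(info: u64) -> u64 { (info >> 12) & 0xFF_FFFF_FFFF }
pub fn msginfo_length(info: u64) -> u64 { info & 0x7F }
pub fn msginfo_extracaps(info: u64) -> u64 { (info >> 7) & 0x1F }

fn main() {
    // Hypothetical label 0x1234, 3 message registers, 2 staged caps.
    let info = msginfo(0x1234, 3, 2);
    assert_eq!(msginfo_label(info), 0x1234);
    assert_eq!(msginfo_length(info), 3);
    assert_eq!(msginfo_extracaps(info), 2);
}
```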

The capability transfer protocol

IPC can carry capabilities along with the scalar register payload. The protocol is:

  1. Before a send, the caller stages up to four caps in ipc_buffer.caps[0..4] via set_send_cap_ctx(ctx, slot_index, cap_slot). Each call bumps ctx.send_cap_count.

  2. The caller builds a msginfo word whose extra_caps field equals send_cap_count.

  3. On SYS_SEND/SYS_CALL/etc., the kernel reads the staged caps from ipc_buffer.caps[0..extra_caps] and delivers them to the receiver’s cap-receive slots.

  4. After the send returns, the substrate IPC layer clears ipc_buffer.caps[0..4] and resets send_cap_count to 0. The caller does not need to clean up.
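Steps 1, 2, and 4 can be sketched with a mock context. The real set_send_cap_ctx operates on the kernel-shared IpcBuffer page through raw pointers, so the types here are simplified stand-ins:

```rust
// Simplified stand-ins for IpcBuffer / IpcContext.
struct MockBuffer { caps: [u64; 4] }
struct MockCtx { buf: MockBuffer, send_cap_count: i32 }

fn msginfo(label: u64, length: u64, caps: u64) -> u64 {
    (label << 12) | (caps << 7) | (length & 0x7F)
}

// Step 1: stage a cap in the transfer window and bump the counter.
fn set_send_cap(ctx: &mut MockCtx, slot_index: usize, cap_slot: u64) {
    ctx.buf.caps[slot_index] = cap_slot;
    ctx.send_cap_count += 1;
}

// Step 4: what the IPC layer does after the send returns.
fn clear_after_send(ctx: &mut MockCtx) {
    ctx.buf.caps = [0; 4];
    ctx.send_cap_count = 0;
}

fn main() {
    let mut ctx = MockCtx { buf: MockBuffer { caps: [0; 4] }, send_cap_count: 0 };
    set_send_cap(&mut ctx, 0, 0x42); // hypothetical cap slot value
    // Step 2: extra_caps in the msginfo word equals send_cap_count.
    let info = msginfo(7, 2, ctx.send_cap_count as u64);
    assert_eq!((info >> 7) & 0x1F, 1);
    clear_after_send(&mut ctx); // step 4: automatic cleanup
    assert_eq!(ctx.send_cap_count, 0);
}
```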

The receiver side configures where incoming caps should go by writing to receive_cnode / receive_index / receive_depth before the receive. substrate/ipc.rs exposes this as:

pub unsafe fn set_receive_slot_ctx(
    ctx: *mut IpcContext,
    cnode: Cap,
    index: u64,
    depth: u64,
);

If the receiver does not configure a receive slot, any cap the sender staged is simply dropped on arrival — there is no error, the cap is just gone. This is almost never what you want, so servers that expect incoming caps configure the receive slot as their very first action in their event loop.

The ten IPC primitives

substrate/ipc.rs exposes ten *_ctx functions. Each one takes a *mut IpcContext (so it can inspect and reset send_cap_count) and maps to one of the kernel IPC syscalls.

  • send_ctx(ctx, ep, msg) [SYS_SEND]: Blocking send. Writes overflow regs + staged caps to the IPC buffer, encodes msginfo, calls SYS_SEND, clears staged caps on return.

  • recv_ctx(ctx, ep, msg, badge) [SYS_RECV]: Blocking receive. On wake, fills *msg with the received header + regs, and writes the sender’s badge into *badge.

  • recv_timed_ctx(ctx, ep, timeout_ns, msg, badge) [SYS_RECV_TIMED]: Like recv_ctx but returns TRONA_CANCELLED on timeout.

  • recv_any_ctx(ctx, endpoints, count, msg, badge, source) [SYS_RECV_ANY]: Blocking multi-endpoint receive. endpoints is a caller-provided array of badged endpoint slots, count its length. On wake, *source gets the index of the endpoint that received the message (or IPC_RECV_SOURCE_NOTIFICATION for a notification wake).

  • recv_any_timed_ctx(…) [SYS_RECV_ANY_TIMED]: recv_any_ctx with a timeout.

  • call_ctx(ctx, ep, msg, reply) [SYS_CALL]: Client RPC pattern: send + blocking receive on the same endpoint. Fills *reply with the server’s response. This is the kernel’s fastpath.

  • reply_recv_ctx(ctx, ep, reply, out_msg, badge) [SYS_REPLY_RECV]: Server event loop pattern: reply to the current caller and immediately block waiting for the next request on the same endpoint. Also fastpath.

  • reply_recv_any_ctx(…) / reply_recv_any_timed_ctx(…) [SYS_REPLY_RECV_ANY / SYS_REPLY_RECV_ANY_TIMED]: Server event loop across multiple endpoints (or with a timeout).

  • nbsend_ctx(ctx, ep, msg) [SYS_NBSEND]: Non-blocking send. Returns TRONA_WOULD_BLOCK if no receiver is waiting. Caps can still be staged.

The pattern a client uses looks like:

let mut msg = TronaMsg::zeroed();
msg.label = VFS_WRITE;
msg.length = 3;
msg.regs[0] = fd;
msg.regs[1] = len;
msg.regs[2] = buf_ptr;

let mut reply = TronaMsg::zeroed();
unsafe { ipc::call_ctx(current_ipc_ctx(), vfs_ep, &msg, &mut reply); }

if reply.label == TRONA_OK {
    bytes_written = reply.regs[0];
}

The pattern a server uses looks like:

let mut msg = TronaMsg::zeroed();
let mut reply = TronaMsg::zeroed();
let mut badge = 0u64;

// First iteration: receive without replying.
unsafe { ipc::recv_ctx(current_ipc_ctx(), my_ep, &mut msg, &mut badge); }

loop {
    // dispatch(msg.label, badge, ...)
    reply.label = TRONA_OK;
    reply.length = 1;
    reply.regs[0] = result;

    // Reply + wait for next request.
    unsafe {
        ipc::reply_recv_ctx(current_ipc_ctx(), my_ep, &reply, &mut msg, &mut badge);
    }
}

This server loop shape is so common that substrate/worker.rs builds a whole event-loop abstraction on top of it — see Threads, TLS, and Worker Pool.

Fastpath vs slowpath

The kernel’s IPC implementation has an assembly fastpath for SYS_CALL (#2) and SYS_REPLY_RECV (#3) that handles the common case — small message, no extra caps, synchronous partner, same CPU — without entering the generic C slowpath. The fastpath bails out to the slowpath when any of the following is true:

  • extra_caps > 0 — capability transfer requires the slowpath

  • length > 4 — more than four message registers means IPC buffer overflow

  • No waiting partner on the target endpoint

  • Target TCB on a different CPU

  • Target TCB is blocked in a fault
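The first two bail-out conditions depend only on the msginfo word, so they can be expressed as a predicate. This is illustrative only: the real check lives in kernel assembly and also tests the partner, CPU, and fault conditions.

```rust
fn msginfo_length(info: u64) -> u64 { info & 0x7F }
fn msginfo_extracaps(info: u64) -> u64 { (info >> 7) & 0x1F }

/// True when the message itself is fastpath-eligible: at most four
/// message registers and no capability transfer. The remaining
/// conditions (waiting partner, same CPU, no fault) depend on
/// kernel state and cannot be checked from userspace.
fn msg_fastpath_eligible(info: u64) -> bool {
    msginfo_length(info) <= 4 && msginfo_extracaps(info) == 0
}

fn main() {
    let small = (7u64 << 12) | 3;            // length 3, no caps
    let caps  = (7u64 << 12) | (1 << 7) | 3; // one staged cap
    let big   = (7u64 << 12) | 6;            // six registers
    assert!(msg_fastpath_eligible(small));
    assert!(!msg_fastpath_eligible(caps));
    assert!(!msg_fastpath_eligible(big));
}
```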

From the substrate side, none of this is visible: call_ctx and reply_recv_ctx behave identically whether the kernel takes the fastpath or the slowpath, because the observable behavior is the same. The only way to notice the fastpath is through timing — a slowpath IPC takes 3–5× as long as a fastpath one.

The kernel side of the fastpath lives in kernite/src/arch/x86_64/syscall.S and is covered in kernite: IPC fastpath.

Message register budget guidance

Because the first four message registers travel in CPU registers and anything beyond that pays a cache-line touch in the IPC buffer, message layout matters. The informal convention in the SaltyOS codebase is:

  • 4 or fewer regs: fast path. This is where SYS_CALL and SYS_REPLY_RECV want to live. Most PM_* and MM_* labels fit here.

  • 5–20 regs: slow path but still cheap. VFS_READ/VFS_WRITE and most socket operations end up here because they carry a buffer pointer, a length, flags, an offset, and sometimes credentials.

  • More than 20 regs or variable-size payload: use a bulk SHM region instead. VFS_BULK_SETUP + VFS_BULK_READ is the template for this pattern — see Poll, Pipe, and Bulk I/O.