Rust's Ownership Model as a Systems Design Tool
Type system as architectural skeleton — compile-time guarantees for concurrent exchange components.
Foundation · Core Engine & Data Structures
Why this week
You already know the mechanics of Rust's ownership model. This week is about promoting it from debugger to designer. You're going to encode architectural invariants — which component owns the order book, which can only read it, how shutdown signals propagate — directly into the type system so the compiler enforces them instead of a code reviewer.
The artifact: two queue implementations, an SPMC channel built on Mutex + Condvar and a lock-free SPSC ring buffer built on atomics, that you'll compare for throughput. Both are the primitives your matching engine will use to fan events out to the WebSocket server, the event log, and the risk engine without stepping on each other.
The goal is not to beat the compiler — it's to use the compiler. When you write fn send(&mut self, v: T), you are declaring to every future reader and to the borrow checker: exactly one caller can be mutating this at any moment, and only through this method. That declaration is free at runtime and unbreakable at compile time.
Day 1 — Re-reading Mara Bos with architect's eyes
Mara Bos, Ch. 1-3
Type system as concurrency architecture
Watch how Send/Sync compose from field types — thread-safety guarantees are structural, not gated by reviewer vigilance. Chapter 3's MutexGuard-as-RAII is the template for every 'must not forget to release X' invariant in a trading system.
Name three invariants in a trading exchange you could encode at the type level — what goes wrong at runtime today, and what shape of type would make it structurally impossible?
When would you pick Relaxed vs Acquire/Release vs SeqCst for updating the best-bid price that the WebSocket fanout reads? Give one concrete case for each.
Beyond MutexGuard, name one other 'must not forget to X' pattern in a trading system that deserves a Drop-based wrapper.
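One shape such a wrapper can take, as a minimal sketch (SessionGuard and its cancel-on-disconnect behavior are hypothetical, invented here for illustration): put the cleanup in Drop, so it runs on every exit path instead of depending on someone remembering it.

```rust
// Hypothetical RAII guard: cancelling a session's resting orders on
// disconnect must happen on every exit path, so it lives in Drop.
struct SessionGuard<'a> {
    open_orders: &'a mut Vec<u64>,
}

impl Drop for SessionGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal exit, early return, and unwind alike.
        self.open_orders.clear();
    }
}

fn main() {
    let mut orders = vec![101, 102, 103];
    {
        let _session = SessionGuard { open_orders: &mut orders };
        // ... handle the connection ...
    } // guard dropped here: resting orders cancelled automatically
    assert!(orders.is_empty());
    println!("resting orders after disconnect: {:?}", orders);
}
```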
Day 2 — Jon Gjengset's decision process
Crust of Rust
Reason about code, not syntax
Don't focus on the syntax. Watch how Jon reasons: 'this reference must outlive this scope because…' The mental process is the point.
Cell, RefCell, Rc, Arc — when to use each. Pay attention to the single-threaded Cell<Price> idea vs Mutex<Price> — zero overhead when the context allows.
Pick three Arc<Mutex<T>> or Arc<RwLock<T>> sites in code you've written. For each, could the data flow be restructured so only one task owns the data and others receive snapshots/messages? Write the rewrite even if you don't ship it — this is the muscle we're building.
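The Cell<Price> vs Mutex<Price> contrast is worth seeing concretely. A minimal sketch (BestBid and its field are made up for illustration): in single-threaded code, Cell gives interior mutability behind &self with no lock and no atomic, and the compiler rejects any attempt to send it across threads because Cell is !Sync.

```rust
use std::cell::Cell;

// Hypothetical single-threaded best-bid cache. Cell<u64> permits
// mutation through a shared reference at zero synchronization cost.
struct BestBid {
    price: Cell<u64>,
}

impl BestBid {
    fn observe(&self, p: u64) {
        // A plain load and store; no lock, no atomic read-modify-write.
        if p > self.price.get() {
            self.price.set(p);
        }
    }
}

fn main() {
    let bid = BestBid { price: Cell::new(0) };
    bid.observe(101);
    bid.observe(99);
    assert_eq!(bid.price.get(), 101);
    println!("best bid: {}", bid.price.get());
}
```

Swapping Cell for Mutex here would buy nothing but contention accounting; swapping Mutex for Cell in multi-threaded code fails to compile, which is exactly the structural guarantee this week is about.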
Day 3 — SPMC channel (Mutex + Condvar)
Safe & obvious
Version A: the type system does the talking
Design contract, before a single line of code:
- Sender<T> is not Clone. One producer, enforced structurally.
- Receiver<T> is Clone. Each clone shares the same queue.
- Dropping the sender wakes all blocked receivers so they can return None instead of hanging.
This asymmetry is the whole point. You're not documenting "please don't clone the sender" — you're making tx.clone() a compile error.
Build the SPMC channel with Mutex + Condvar
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
pub struct Sender<T> {
shared: Arc<Shared<T>>,
}
pub struct Receiver<T> {
shared: Arc<Shared<T>>,
}
struct Inner<T> {
    queue: VecDeque<T>,
    sender_alive: bool,
}
struct Shared<T> {
    // The alive flag lives under the same Mutex as the queue: a Condvar
    // only reliably guards state protected by the mutex you wait on, and
    // a flag behind a separate mutex invites lost wakeups.
    inner: Mutex<Inner<T>>,
    available: Condvar,
}
pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let shared = Arc::new(Shared {
        inner: Mutex::new(Inner {
            queue: VecDeque::new(),
            sender_alive: true,
        }),
        available: Condvar::new(),
    });
(
Sender { shared: shared.clone() },
Receiver { shared },
)
}
impl<T> Clone for Receiver<T> {
fn clone(&self) -> Self {
Receiver { shared: self.shared.clone() }
}
}
impl<T> Sender<T> {
pub fn send(&self, value: T) {
// TODO: push to queue, notify one receiver.
todo!()
}
}
impl<T> Drop for Sender<T> {
fn drop(&mut self) {
// TODO: mark sender_alive = false, wake every blocked receiver.
todo!()
}
}
impl<T> Receiver<T> {
pub fn recv(&self) -> Option<T> {
// TODO: pop-or-wait loop. Return None only when queue is empty
// AND sender_alive is false.
todo!()
}
}
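If you get stuck, here is one way the TODOs can come together, as a self-contained sketch rather than the canonical answer. Note that it keeps sender_alive under the same mutex as the queue: if the flag lived behind a separate lock, a sender could flip it and notify between a receiver's check and its wait, and that receiver would sleep forever.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

struct Inner<T> {
    queue: VecDeque<T>,
    sender_alive: bool,
}

struct Shared<T> {
    inner: Mutex<Inner<T>>,
    available: Condvar,
}

pub struct Sender<T> { shared: Arc<Shared<T>> }   // not Clone: one producer
pub struct Receiver<T> { shared: Arc<Shared<T>> } // Clone: many consumers

impl<T> Clone for Receiver<T> {
    fn clone(&self) -> Self {
        Receiver { shared: self.shared.clone() }
    }
}

pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let shared = Arc::new(Shared {
        inner: Mutex::new(Inner { queue: VecDeque::new(), sender_alive: true }),
        available: Condvar::new(),
    });
    (Sender { shared: shared.clone() }, Receiver { shared })
}

impl<T> Sender<T> {
    pub fn send(&self, value: T) {
        self.shared.inner.lock().unwrap().queue.push_back(value);
        // One item arrived, so one receiver is enough to wake.
        self.shared.available.notify_one();
    }
}

impl<T> Drop for Sender<T> {
    fn drop(&mut self) {
        self.shared.inner.lock().unwrap().sender_alive = false;
        // Every blocked receiver must wake to observe the dead sender.
        self.shared.available.notify_all();
    }
}

impl<T> Receiver<T> {
    pub fn recv(&self) -> Option<T> {
        let mut guard = self.shared.inner.lock().unwrap();
        loop {
            if let Some(v) = guard.queue.pop_front() {
                return Some(v);
            }
            if !guard.sender_alive {
                return None; // empty AND no producer left
            }
            // wait() releases the lock while sleeping and reacquires it
            // on wakeup; the loop re-checks because wakeups can be spurious.
            guard = self.shared.available.wait(guard).unwrap();
        }
    }
}
```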
Day 4 — Lock-free SPSC ring buffer
Atomics with purpose
Version B: trade safety-net for speed — deliberately
The mutex version is correct. It's also ~5-20× slower than what's achievable when you know the access pattern is single-producer, single-consumer. Trading engines almost always have one task producing market events and one downstream consumer per queue — so SPSC is the right primitive.
The trick: represent the queue as a fixed-size array of UnsafeCell<T> with atomic head/tail indices. The producer owns head, the consumer owns tail, and Acquire/Release ordering on the index publication is enough to make hand-off safe without any lock.
Build the SPSC ring buffer
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
const CAP: usize = 1024;
pub struct RingBuffer<T: Copy + Default> {
buf: UnsafeCell<[T; CAP]>,
    head: AtomicUsize, // next write index (owned by the producer)
    tail: AtomicUsize, // next read index (owned by the consumer)
}
// SAFETY: the indices partition the buffer. The producer only writes the
// free cells in [head, tail + CAP); the consumer only reads the full cells
// in [tail, head). Acquire/Release ordering on the index stores publishes
// each cell before the other side is allowed to touch it.
unsafe impl<T: Copy + Default + Send> Send for RingBuffer<T> {}
unsafe impl<T: Copy + Default + Send> Sync for RingBuffer<T> {}
pub struct Producer<T: Copy + Default> {
inner: Arc<RingBuffer<T>>,
}
pub struct Consumer<T: Copy + Default> {
inner: Arc<RingBuffer<T>>,
}
pub fn ring<T: Copy + Default>() -> (Producer<T>, Consumer<T>) {
let buf = Arc::new(RingBuffer {
buf: UnsafeCell::new([T::default(); CAP]),
head: AtomicUsize::new(0),
tail: AtomicUsize::new(0),
});
(Producer { inner: buf.clone() }, Consumer { inner: buf })
}
impl<T: Copy + Default> Producer<T> {
pub fn try_push(&self, value: T) -> Result<(), T> {
// TODO:
// 1. Read head (Relaxed — we are the only writer).
// 2. Read tail (Acquire — sync with consumer's last pop).
// 3. If head - tail == CAP, queue is full: return Err(value).
// 4. Write value into buf[head % CAP] (unsafe raw write).
// 5. Publish by storing head + 1 with Release.
todo!()
}
}
impl<T: Copy + Default> Consumer<T> {
pub fn try_pop(&self) -> Option<T> {
// TODO:
// 1. Read tail (Relaxed — we are the only writer).
// 2. Read head (Acquire — sync with producer's last push).
// 3. If head == tail, empty: return None.
// 4. Read buf[tail % CAP] (unsafe raw read) into a local.
// 5. Publish consumption by storing tail + 1 with Release.
todo!()
}
}
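The numbered steps translate almost line-for-line into code. The following is one possible fill-in, reproduced self-contained so it compiles on its own; treat it as a reference sketch to check your version against, not the only correct ordering recipe.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

const CAP: usize = 1024;

pub struct RingBuffer<T: Copy + Default> {
    buf: UnsafeCell<[T; CAP]>,
    head: AtomicUsize, // next write index (owned by the producer)
    tail: AtomicUsize, // next read index (owned by the consumer)
}

// SAFETY: producer writes only [head, tail + CAP), consumer reads only
// [tail, head); Acquire/Release on the indices orders those accesses.
unsafe impl<T: Copy + Default + Send> Send for RingBuffer<T> {}
unsafe impl<T: Copy + Default + Send> Sync for RingBuffer<T> {}

pub struct Producer<T: Copy + Default> { inner: Arc<RingBuffer<T>> }
pub struct Consumer<T: Copy + Default> { inner: Arc<RingBuffer<T>> }

pub fn ring<T: Copy + Default>() -> (Producer<T>, Consumer<T>) {
    let rb = Arc::new(RingBuffer {
        buf: UnsafeCell::new([T::default(); CAP]),
        head: AtomicUsize::new(0),
        tail: AtomicUsize::new(0),
    });
    (Producer { inner: rb.clone() }, Consumer { inner: rb })
}

impl<T: Copy + Default> Producer<T> {
    pub fn try_push(&self, value: T) -> Result<(), T> {
        let head = self.inner.head.load(Ordering::Relaxed); // we own head
        let tail = self.inner.tail.load(Ordering::Acquire); // sync with pops
        if head.wrapping_sub(tail) == CAP {
            return Err(value); // full
        }
        // SAFETY: this cell is outside [tail, head), so the consumer
        // cannot be reading it until the Release store below.
        unsafe { (*self.inner.buf.get())[head % CAP] = value; }
        self.inner.head.store(head.wrapping_add(1), Ordering::Release);
        Ok(())
    }
}

impl<T: Copy + Default> Consumer<T> {
    pub fn try_pop(&self) -> Option<T> {
        let tail = self.inner.tail.load(Ordering::Relaxed); // we own tail
        let head = self.inner.head.load(Ordering::Acquire); // sync with pushes
        if head == tail {
            return None; // empty
        }
        // SAFETY: this cell is inside [tail, head), so the producer
        // cannot be writing it until the Release store below frees it.
        let v = unsafe { (*self.inner.buf.get())[tail % CAP] };
        self.inner.tail.store(tail.wrapping_add(1), Ordering::Release);
        Some(v)
    }
}
```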
Day 5 — Benchmark them head-to-head
See the delta
Numbers make the design choice concrete
Throughput is what matters in a matching-engine event bus. Measure ops/sec on a single thread (Playground is single-core; a real multi-thread bench belongs on your laptop with criterion). Expect ~10× over the mutex baseline for the lock-free version — worth the unsafe.
Playground timings are directional. They share CPU with other tenants and show 5-30% run-to-run jitter. For publishable numbers run these locally under criterion; the numbers here exist to validate "the lock-free path is measurably faster", not to claim a particular throughput.
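The measurement loop itself can stay simple. A sketch of the baseline side (the function name and op count are arbitrary choices, and criterion replaces all of this locally); the same loop shape, with the ring's try_push/try_pop in place of the locked calls, times the lock-free side:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::Instant;

// Time `ops` push+pop pairs through the mutex baseline and report
// pairs per second. Two lock acquisitions per iteration, on purpose:
// that is the cost the lock-free version is trying to delete.
fn bench_mutex_baseline(ops: u64) -> f64 {
    let q = Mutex::new(VecDeque::<u64>::new());
    let start = Instant::now();
    for i in 0..ops {
        q.lock().unwrap().push_back(i);
        // Consume the value so the optimizer cannot delete the loop.
        assert_eq!(q.lock().unwrap().pop_front(), Some(i));
    }
    ops as f64 / start.elapsed().as_secs_f64()
}

fn main() {
    let pairs_per_sec = bench_mutex_baseline(1_000_000);
    println!("mutex baseline: {:.0} push+pop pairs/sec", pairs_per_sec);
}
```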
Day 6 — Comparative write-up
Assembly diff
What did the compiler actually emit?
Install with `cargo install cargo-show-asm`. On your laptop run it against both implementations and diff the hot path. Paste the key fragments into your Journal.
Paste the disassembly of try_push vs send. Where does the mutex version's overhead actually live? Point at the instructions — lock cmpxchg, pthread_mutex_lock, memory fences.
In one paragraph: which invariant from this week will you reuse in the matching engine? How will you encode it structurally?
Capstone — TypedChannel build
Ship a ring buffer that beats a mutex baseline
Target: Correctness passes. Ring ≥2× mutex. Assembly analysis recorded.
// Re-implement the ring buffer — the harness will bench it against a
// mutex baseline it ships internally. Signatures must match these.
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
pub const RING_CAP: usize = 1024;
pub struct Ring {
buf: UnsafeCell<[u64; RING_CAP]>,
head: AtomicUsize,
tail: AtomicUsize,
}
unsafe impl Send for Ring {}
unsafe impl Sync for Ring {}
pub fn ring() -> (Arc<Ring>, Arc<Ring>) {
let r = Arc::new(Ring {
buf: UnsafeCell::new([0u64; RING_CAP]),
head: AtomicUsize::new(0),
tail: AtomicUsize::new(0),
});
(r.clone(), r)
}
pub fn try_push(r: &Ring, value: u64) -> Result<(), u64> {
// TODO
todo!()
}
pub fn try_pop(r: &Ring) -> Option<u64> {
// TODO
todo!()
}
Push/pop single values, fail on overflow, return None on empty, and survive wrap-around across CAP.
Ring-buffer push-pop pair is at least twice as fast as a Mutex<VecDeque<u64>> baseline measured in the same Playground run.
Your Journal has a diff of cargo asm output for the two hot paths, with the expensive instruction in the mutex version called out.
Feeds into
- Week 3 uses the mutex channel pattern for the cancellation-aware ConnectionManager.
- Week 7 uses the SPSC ring buffer to publish BookUpdate events from the matching engine to the WebSocket fanout.
- Week 10 revisits cache-padding on the atomics so producer and consumer don't ping-pong the same cache line.