Rust's Ownership Model as a Systems Design Tool
Type system as architectural skeleton — compile-time guarantees for concurrent exchange components.
Foundation · Core Engine & Data Structures
Why this week
You already know the mechanics of Rust's ownership model. This week is about promoting it from debugger to designer. You're going to encode architectural invariants — which component owns the order book, which can only read it, how shutdown signals propagate — directly into the type system so the compiler enforces them instead of a code reviewer.
The artifact: two queue implementations, an SPMC channel built on Mutex + Condvar and a lock-free SPSC ring buffer built on atomics, that you'll compare for throughput. Both are the primitives your matching engine will use to fan events out to the WebSocket server, the event log, and the risk engine without stepping on each other.
The goal is not to beat the compiler — it's to use the compiler. When you write fn send(&mut self, v: T), you are declaring to every future reader and to the borrow checker: exactly one caller can be mutating this at any moment, and only through this method. That declaration is free at runtime and unbreakable at compile time.
Day 1 — Re-reading Mara Bos with architect's eyes
Mara Bos, Ch. 1-3
Type system as concurrency architecture
Watch how Send/Sync compose from field types — thread-safety guarantees are structural, not gated by reviewer vigilance. Chapter 3's MutexGuard-as-RAII is the template for every 'must not forget to release X' invariant in a trading system.
Name three invariants in a trading exchange you could encode at the type level — what goes wrong at runtime today, and what shape of type would make it structurally impossible?
When would you pick Relaxed vs Acquire/Release vs SeqCst for updating the best-bid price that the WebSocket fanout reads? Give one concrete case for each.
Beyond MutexGuard, name one other 'must not forget to X' pattern in a trading system that deserves a Drop-based wrapper.
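One shape such a wrapper can take, as a minimal sketch (SessionGuard and its cancel-on-disconnect behavior are hypothetical, invented here for illustration): put the cleanup in Drop, so it runs on every exit path instead of depending on someone remembering it.

```rust
// Hypothetical RAII guard: cancelling a session's resting orders on
// disconnect must happen on every exit path, so it lives in Drop.
struct SessionGuard<'a> {
    open_orders: &'a mut Vec<u64>,
}

impl Drop for SessionGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal exit, early return, and unwind alike.
        self.open_orders.clear();
    }
}

fn main() {
    let mut orders = vec![101, 102, 103];
    {
        let _session = SessionGuard { open_orders: &mut orders };
        // ... handle the connection ...
    } // guard dropped here: resting orders cancelled automatically
    assert!(orders.is_empty());
    println!("resting orders after disconnect: {:?}", orders);
}
```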
Day 2 — Jon Gjengset's decision process
Crust of Rust
Reason about code, not syntax
Don't focus on the syntax. Watch how Jon reasons: 'this reference must outlive this scope because…' The mental process is the point.
Cell, RefCell, Rc, Arc — when to use each. Pay attention to the single-threaded Cell<Price> idea vs Mutex<Price> — zero overhead when the context allows.
Pick three Arc<Mutex<T>> or Arc<RwLock<T>> sites in code you've written. For each, could the data flow be restructured so only one task owns the data and others receive snapshots/messages? Write the rewrite even if you don't ship it — this is the muscle we're building.
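The Cell<Price> vs Mutex<Price> contrast is worth seeing concretely. A minimal sketch (BestBid and its field are made up for illustration): in single-threaded code, Cell gives interior mutability behind &self with no lock and no atomic, and the compiler rejects any attempt to send it across threads because Cell is !Sync.

```rust
use std::cell::Cell;

// Hypothetical single-threaded best-bid cache. Cell<u64> permits
// mutation through a shared reference at zero synchronization cost.
struct BestBid {
    price: Cell<u64>,
}

impl BestBid {
    fn observe(&self, p: u64) {
        // A plain load and store; no lock, no atomic read-modify-write.
        if p > self.price.get() {
            self.price.set(p);
        }
    }
}

fn main() {
    let bid = BestBid { price: Cell::new(0) };
    bid.observe(101);
    bid.observe(99);
    assert_eq!(bid.price.get(), 101);
    println!("best bid: {}", bid.price.get());
}
```

Swapping Cell for Mutex here would buy nothing but contention accounting; swapping Mutex for Cell in multi-threaded code fails to compile, which is exactly the structural guarantee this week is about.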
Day 3 — SPMC channel (Mutex + Condvar)
Safe & obvious
Version A: the type system does the talking
Design contract, before a single line of code:
- Sender<T> is not Clone. One producer, enforced structurally.
- Receiver<T> is Clone. Each clone shares the same queue.
- Dropping the sender wakes all blocked receivers so they can return None instead of hanging.
This asymmetry is the whole point. You're not documenting "please don't clone the sender" — you're making tx.clone() a compile error.
Build the SPMC channel with Mutex + Condvar
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
pub struct Sender<T> {
shared: Arc<Shared<T>>,
}
pub struct Receiver<T> {
shared: Arc<Shared<T>>,
}
struct Inner<T> {
    queue: VecDeque<T>,
    sender_alive: bool,
}
struct Shared<T> {
    // The alive flag lives under the same Mutex as the queue: a Condvar
    // only reliably guards state protected by the mutex you wait on, and
    // a flag behind a separate mutex invites lost wakeups.
    inner: Mutex<Inner<T>>,
    available: Condvar,
}
pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let shared = Arc::new(Shared {
        inner: Mutex::new(Inner {
            queue: VecDeque::new(),
            sender_alive: true,
        }),
        available: Condvar::new(),
    });
(
Sender { shared: shared.clone() },
Receiver { shared },
)
}
impl<T> Clone for Receiver<T> {
fn clone(&self) -> Self {
Receiver { shared: self.shared.clone() }
}
}
impl<T> Sender<T> {
pub fn send(&self, value: T) {
// TODO: push to queue, notify one receiver.
todo!()
}
}
impl<T> Drop for Sender<T> {
fn drop(&mut self) {
// TODO: mark sender_alive = false, wake every blocked receiver.
todo!()
}
}
impl<T> Receiver<T> {
pub fn recv(&self) -> Option<T> {
// TODO: pop-or-wait loop. Return None only when queue is empty
// AND sender_alive is false.
todo!()
}
}
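If you get stuck, here is one way the TODOs can come together, as a self-contained sketch rather than the canonical answer. Note that it keeps sender_alive under the same mutex as the queue: if the flag lived behind a separate lock, a sender could flip it and notify between a receiver's check and its wait, and that receiver would sleep forever.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

struct Inner<T> {
    queue: VecDeque<T>,
    sender_alive: bool,
}

struct Shared<T> {
    inner: Mutex<Inner<T>>,
    available: Condvar,
}

pub struct Sender<T> { shared: Arc<Shared<T>> }   // not Clone: one producer
pub struct Receiver<T> { shared: Arc<Shared<T>> } // Clone: many consumers

impl<T> Clone for Receiver<T> {
    fn clone(&self) -> Self {
        Receiver { shared: self.shared.clone() }
    }
}

pub fn channel<T>() -> (Sender<T>, Receiver<T>) {
    let shared = Arc::new(Shared {
        inner: Mutex::new(Inner { queue: VecDeque::new(), sender_alive: true }),
        available: Condvar::new(),
    });
    (Sender { shared: shared.clone() }, Receiver { shared })
}

impl<T> Sender<T> {
    pub fn send(&self, value: T) {
        self.shared.inner.lock().unwrap().queue.push_back(value);
        // One item arrived, so one receiver is enough to wake.
        self.shared.available.notify_one();
    }
}

impl<T> Drop for Sender<T> {
    fn drop(&mut self) {
        self.shared.inner.lock().unwrap().sender_alive = false;
        // Every blocked receiver must wake to observe the dead sender.
        self.shared.available.notify_all();
    }
}

impl<T> Receiver<T> {
    pub fn recv(&self) -> Option<T> {
        let mut guard = self.shared.inner.lock().unwrap();
        loop {
            if let Some(v) = guard.queue.pop_front() {
                return Some(v);
            }
            if !guard.sender_alive {
                return None; // empty AND no producer left
            }
            // wait() releases the lock while sleeping and reacquires it
            // on wakeup; the loop re-checks because wakeups can be spurious.
            guard = self.shared.available.wait(guard).unwrap();
        }
    }
}
```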
Day 4 — Lock-free SPSC ring buffer
Atomics with purpose
Version B: trade safety-net for speed — deliberately
The mutex version is correct. It's also ~5-20× slower than what's achievable when you know the access pattern is single-producer, single-consumer. Trading engines almost always have one task producing market events and one downstream consumer per queue — so SPSC is the right primitive.
The trick: represent the queue as a fixed-size array of UnsafeCell<T> with atomic head/tail indices. The producer owns head, the consumer owns tail, and Acquire/Release ordering on the index publication is enough to make hand-off safe without any lock.
Build the SPSC ring buffer
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
const CAP: usize = 1024;
pub struct RingBuffer<T: Copy + Default> {
buf: UnsafeCell<[T; CAP]>,
    head: AtomicUsize, // next write index (owned by the producer)
    tail: AtomicUsize, // next read index (owned by the consumer)
}
// SAFETY: the indices partition the buffer. The producer only writes the
// free cells in [head, tail + CAP); the consumer only reads the full cells
// in [tail, head). Acquire/Release ordering on the index stores publishes
// each cell before the other side is allowed to touch it.
unsafe impl<T: Copy + Default + Send> Send for RingBuffer<T> {}
unsafe impl<T: Copy + Default + Send> Sync for RingBuffer<T> {}
pub struct Producer<T: Copy + Default> {
inner: Arc<RingBuffer<T>>,
}
pub struct Consumer<T: Copy + Default> {
inner: Arc<RingBuffer<T>>,
}
pub fn ring<T: Copy + Default>() -> (Producer<T>, Consumer<T>) {
let buf = Arc::new(RingBuffer {
buf: UnsafeCell::new([T::default(); CAP]),
head: AtomicUsize::new(0),
tail: AtomicUsize::new(0),
});
(Producer { inner: buf.clone() }, Consumer { inner: buf })
}
impl<T: Copy + Default> Producer<T> {
pub fn try_push(&self, value: T) -> Result<(), T> {
// TODO:
// 1. Read head (Relaxed — we are the only writer).
// 2. Read tail (Acquire — sync with consumer's last pop).
// 3. If head - tail == CAP, queue is full: return Err(value).
// 4. Write value into buf[head % CAP] (unsafe raw write).
// 5. Publish by storing head + 1 with Release.
todo!()
}
}
impl<T: Copy + Default> Consumer<T> {
pub fn try_pop(&self) -> Option<T> {
// TODO:
// 1. Read tail (Relaxed — we are the only writer).
// 2. Read head (Acquire — sync with producer's last push).
// 3. If head == tail, empty: return None.
// 4. Read buf[tail % CAP] (unsafe raw read) into a local.
// 5. Publish consumption by storing tail + 1 with Release.
todo!()
}
}
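The numbered steps translate almost line-for-line into code. The following is one possible fill-in, reproduced self-contained so it compiles on its own; treat it as a reference sketch to check your version against, not the only correct ordering recipe.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

const CAP: usize = 1024;

pub struct RingBuffer<T: Copy + Default> {
    buf: UnsafeCell<[T; CAP]>,
    head: AtomicUsize, // next write index (owned by the producer)
    tail: AtomicUsize, // next read index (owned by the consumer)
}

// SAFETY: producer writes only [head, tail + CAP), consumer reads only
// [tail, head); Acquire/Release on the indices orders those accesses.
unsafe impl<T: Copy + Default + Send> Send for RingBuffer<T> {}
unsafe impl<T: Copy + Default + Send> Sync for RingBuffer<T> {}

pub struct Producer<T: Copy + Default> { inner: Arc<RingBuffer<T>> }
pub struct Consumer<T: Copy + Default> { inner: Arc<RingBuffer<T>> }

pub fn ring<T: Copy + Default>() -> (Producer<T>, Consumer<T>) {
    let rb = Arc::new(RingBuffer {
        buf: UnsafeCell::new([T::default(); CAP]),
        head: AtomicUsize::new(0),
        tail: AtomicUsize::new(0),
    });
    (Producer { inner: rb.clone() }, Consumer { inner: rb })
}

impl<T: Copy + Default> Producer<T> {
    pub fn try_push(&self, value: T) -> Result<(), T> {
        let head = self.inner.head.load(Ordering::Relaxed); // we own head
        let tail = self.inner.tail.load(Ordering::Acquire); // sync with pops
        if head.wrapping_sub(tail) == CAP {
            return Err(value); // full
        }
        // SAFETY: this cell is outside [tail, head), so the consumer
        // cannot be reading it until the Release store below.
        unsafe { (*self.inner.buf.get())[head % CAP] = value; }
        self.inner.head.store(head.wrapping_add(1), Ordering::Release);
        Ok(())
    }
}

impl<T: Copy + Default> Consumer<T> {
    pub fn try_pop(&self) -> Option<T> {
        let tail = self.inner.tail.load(Ordering::Relaxed); // we own tail
        let head = self.inner.head.load(Ordering::Acquire); // sync with pushes
        if head == tail {
            return None; // empty
        }
        // SAFETY: this cell is inside [tail, head), so the producer
        // cannot be writing it until the Release store below frees it.
        let v = unsafe { (*self.inner.buf.get())[tail % CAP] };
        self.inner.tail.store(tail.wrapping_add(1), Ordering::Release);
        Some(v)
    }
}
```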
Day 5 — Benchmark them head-to-head
See the delta
Numbers make the design choice concrete
Throughput is what matters in a matching-engine event bus. Measure ops/sec on a single thread (Playground is single-core; a real multi-thread bench belongs on your laptop with criterion). Expect ~10× over the mutex baseline for the lock-free version — worth the unsafe.
Playground timings are directional. They share CPU with other tenants and show 5-30% run-to-run jitter. For publishable numbers run these locally under criterion; the numbers here exist to validate "the lock-free path is measurably faster", not to claim a particular throughput.
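The measurement loop itself can stay simple. A sketch of the baseline side (the function name and op count are arbitrary choices, and criterion replaces all of this locally); the same loop shape, with the ring's try_push/try_pop in place of the locked calls, times the lock-free side:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::Instant;

// Time `ops` push+pop pairs through the mutex baseline and report
// pairs per second. Two lock acquisitions per iteration, on purpose:
// that is the cost the lock-free version is trying to delete.
fn bench_mutex_baseline(ops: u64) -> f64 {
    let q = Mutex::new(VecDeque::<u64>::new());
    let start = Instant::now();
    for i in 0..ops {
        q.lock().unwrap().push_back(i);
        // Consume the value so the optimizer cannot delete the loop.
        assert_eq!(q.lock().unwrap().pop_front(), Some(i));
    }
    ops as f64 / start.elapsed().as_secs_f64()
}

fn main() {
    let pairs_per_sec = bench_mutex_baseline(1_000_000);
    println!("mutex baseline: {:.0} push+pop pairs/sec", pairs_per_sec);
}
```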
Day 6 — Comparative write-up
Assembly diff
What did the compiler actually emit?
Install with `cargo install cargo-show-asm`. On your laptop run it against both implementations and diff the hot path. Paste the key fragments into your Journal.
Paste the disassembly of try_push vs send. Where does the mutex version's overhead actually live? Point at the instructions — lock cmpxchg, pthread_mutex_lock, memory fences.
In one paragraph: which invariant from this week will you reuse in the matching engine? How will you encode it structurally?
Capstone — TypedChannel build
Ship a ring buffer that beats a mutex baseline
Target: Correctness passes. Ring ≥2× mutex. Assembly analysis recorded.
// Re-implement the ring buffer — the harness will bench it against a
// mutex baseline it ships internally. Signatures must match these.
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
pub const RING_CAP: usize = 1024;
pub struct Ring {
buf: UnsafeCell<[u64; RING_CAP]>,
head: AtomicUsize,
tail: AtomicUsize,
}
unsafe impl Send for Ring {}
unsafe impl Sync for Ring {}
pub fn ring() -> (Arc<Ring>, Arc<Ring>) {
let r = Arc::new(Ring {
buf: UnsafeCell::new([0u64; RING_CAP]),
head: AtomicUsize::new(0),
tail: AtomicUsize::new(0),
});
(r.clone(), r)
}
pub fn try_push(r: &Ring, value: u64) -> Result<(), u64> {
// TODO
todo!()
}
pub fn try_pop(r: &Ring) -> Option<u64> {
// TODO
todo!()
}
Push/pop single values, fail on overflow, return None on empty, and survive wrap-around across CAP.
Ring-buffer push-pop pair is at least twice as fast as a Mutex<VecDeque<u64>> baseline measured in the same Playground run.
Your Journal has a diff of cargo asm output for the two hot paths, with the expensive instruction in the mutex version called out.
Feeds into
- Week 3 uses the mutex channel pattern for the cancellation-aware ConnectionManager.
- Week 7 uses the SPSC ring buffer to publish BookUpdate events from the matching engine to the WebSocket fanout.
- Week 10 revisits cache-padding on the atomics so producer and consumer don't ping-pong the same cache line.