NUMA-first runtime for latency-critical Rust.
numaperf gives you explicit control over memory placement, thread pinning, and work scheduling on NUMA systems. Stop guessing where your data lives and start guaranteeing it.
Cross-node memory access latency on multi-socket servers is typically 2–3× local access. Source: project README.
What you get
A focused set of NUMA primitives — not a kitchen-sink runtime. Pick the facade or wire up only the crates you need.
Topology discovery
Query NUMA nodes, CPUs, and inter-node distances at runtime. No more assuming a fixed layout across hardware.
Thread pinning
RAII-based CPU affinity via ScopedPin. Affinity is restored on drop — no leaks across scopes or worker pools.
Memory placement
Explicit MemPolicy: Bind, Preferred, Interleave, Local. Pair with Prefault::Touch to make placement decisions stick at allocation time.
Work scheduling
NumaExecutor with per-node worker pools and configurable work stealing. Schedule queries on data-local workers, not whatever core was free.
Sharded data
NumaSharded<T> for per-node data structures. ShardedCounter for lock-free counting that does not bounce a cache line across the box.
Device locality
Map NICs and NVMe devices to their NUMA nodes. Allocate packet buffers on the NIC-local node and skip the cross-node copy.
Observability
Track locality ratios, generate health reports, identify cross-node traffic. Find the leak before it shows up in p99.
Hard mode
Strict enforcement when you need guarantees, graceful degradation when you do not. The default behaviour is your choice, not the OS’s.
The 30-second mental model
Discover the topology. Pin your thread. Allocate locally with a policy you actually chose. Touch pages to force placement. Everything else is observability.
use numaperf::{Topology, ScopedPin, NumaRegion, MemPolicy, NodeMask, Prefault};
fn main() -> Result<(), numaperf::NumaError> {
// Discover NUMA topology
let topo = Topology::discover()?;
let node0 = topo.numa_nodes()[0].id();
// Pin this thread to node 0's CPUs (restored on drop)
let _pin = ScopedPin::to_node(&topo, node0)?;
// Allocate 1 GiB bound to node 0
let region = NumaRegion::anon(
1024 * 1024 * 1024,
MemPolicy::Bind(NodeMask::single(node0)),
Default::default(),
Prefault::Touch,
)?;
// region.as_mut_slice() is now guaranteed local to node 0
println!("Allocated {} bytes on node {}", region.len(), node0);
Ok(())
} Example adapted directly from the project README.
Where it earns its keep
Database engines
Pin buffer pools to specific nodes, schedule queries on data-local workers.
Network processing
Allocate packet buffers on the NIC-local node, process without cross-node copies.
Scientific computing
Partition large arrays across nodes, compute with guaranteed locality.
Trading systems
Eliminate latency variance from NUMA effects with strict pinning and placement.
Crate layout
numaperf ships as a workspace. Use the facade for everything, or wire up only the crates you actually need.
| Crate | Purpose |
|---|---|
numaperf | Facade — re-exports all public APIs |
numaperf-topo | Topology discovery |
numaperf-affinity | Thread pinning |
numaperf-mem | Memory placement |
numaperf-sched | Work scheduling |
numaperf-sharded | Per-node data structures |
numaperf-io | Device locality |
numaperf-perf | Observability |
Platform support
| Platform | Support |
|---|---|
| Linux x86_64 | Full |
| Linux aarch64 | Full |
| macOS | Graceful degradation (no NUMA hardware) |
Windows is not currently supported. If you need it, open an issue describing the target.
Start reading
Quickstart
From `cargo add numaperf` to a bound NumaRegion in five minutes.
notesWhen NUMA actually matters
A recipe for the p99 hunter: knowing when to reach for explicit placement and when to leave the kernel alone.
comparenumaperf vs. Tokio
Where each tool sits on the perf/ergonomics curve — and why you usually want both.