v0 — Rust 1.70+ · Linux x86_64 / aarch64

NUMA-first runtime for latency-critical Rust.

numaperf gives you explicit control over memory placement, thread pinning, and work scheduling on NUMA systems. Stop guessing where your data lives and start guaranteeing it.

Quickstart → Features GitHub crates.io

Memory ratio (NUMA)

2–3×

Placement policies

Workspace crates

MSRV

1.70

Cross-node memory access latency on multi-socket servers is typically 2–3× local access. Source: project README.

What you get

A focused set of NUMA primitives — not a kitchen-sink runtime. Pick the facade or wire up only the crates you need. See all features →

topo

Topology discovery

Query NUMA nodes, CPUs, and inter-node distances at runtime. No more assuming a fixed layout across hardware.

affinity

Thread pinning

RAII-based CPU affinity via ScopedPin. Affinity is restored on drop — no leaks across scopes or worker pools.

mem

Memory placement

Explicit MemPolicy: Bind, Preferred, Interleave, Local. Pair with Prefault::Touch to make placement decisions stick at allocation time.

sched

Work scheduling

NumaExecutor with per-node worker pools and configurable work stealing. Schedule queries on data-local workers, not whatever core was free.

sharded

Sharded data

NumaSharded<T> for per-node data structures. ShardedCounter for lock-free counting that does not bounce a cache line across the box.

Device locality

Map NICs and NVMe devices to their NUMA nodes. Allocate packet buffers on the NIC-local node and skip the cross-node copy.

perf

Observability

Track locality ratios, generate health reports, identify cross-node traffic. Find the leak before it shows up in p99.

mode

Hard mode

Strict enforcement when you need guarantees, graceful degradation when you do not. The default behaviour is your choice, not the OS’s.

The 30-second mental model

Discover the topology. Pin your thread. Allocate locally with a policy you actually chose. Touch pages to force placement. Everything else is observability.

main.rs

use numaperf::{Topology, ScopedPin, NumaRegion, MemPolicy, NodeMask, Prefault};

fn main() -> Result<(), numaperf::NumaError> {
    // Discover NUMA topology
    let topo = Topology::discover()?;
    let node0 = topo.numa_nodes()[0].id();

    // Pin this thread to node 0's CPUs (restored on drop)
    let _pin = ScopedPin::to_node(&topo, node0)?;

    // Allocate 1 GiB bound to node 0
    let region = NumaRegion::anon(
        1024 * 1024 * 1024,
        MemPolicy::Bind(NodeMask::single(node0)),
        Default::default(),
        Prefault::Touch,
    )?;

    // region.as_mut_slice() is now guaranteed local to node 0
    println!("Allocated {} bytes on node {}", region.len(), node0);
    Ok(())
}

Example adapted directly from the project README.

Where it earns its keep

Latency-critical workloads on multi-socket hardware. Read the full use cases →

Database engines

Pin buffer pools to specific nodes, schedule queries on data-local workers.

Network processing

Allocate packet buffers on the NIC-local node, process without cross-node copies.

Scientific computing

Partition large arrays across nodes, compute with guaranteed locality.

Trading systems

Eliminate latency variance from NUMA effects with strict pinning and placement.

Crate layout

numaperf ships as a workspace. Use the facade for everything, or wire up only the crates you actually need.

Crate	Purpose
`numaperf`	Facade — re-exports all public APIs
`numaperf-topo`	Topology discovery
`numaperf-affinity`	Thread pinning
`numaperf-mem`	Memory placement
`numaperf-sched`	Work scheduling
`numaperf-sharded`	Per-node data structures
`numaperf-io`	Device locality
`numaperf-perf`	Observability

Platform support

Platform	Support
Linux x86_64	Full
Linux aarch64	Full
macOS	Graceful degradation (no NUMA hardware)

Windows is not currently supported. If you need it, open an issue describing the target.

Explore numaperf

Every part of the project, one click from here.

5 min