Compare · updated 2026-05-01

numaperf vs. Tokio

Tokio is excellent at moving futures between worker threads. numaperf is about controlling where those workers run, what memory they touch, and how to observe locality. They compose — you don’t pick one.

| Dimension | numaperf | Tokio | Note |
| --- | --- | --- | --- |
| Primary concern | Memory placement, thread pinning, per-node scheduling, locality observability | Async task execution, IO multiplexing, timers, work-stealing scheduler | Different jobs; comparing them is mostly about understanding the gap |
| Async model | Not an async runtime; can be used inside one | Full async/await runtime with multi-threaded executor | If you want async, you still want Tokio (or similar) |
| NUMA topology API | Yes: Topology::discover() with node, CPU, and distance info | No first-class NUMA awareness | |
| Thread pinning | RAII ScopedPin restored on drop | Not built-in; can be done by hand at worker startup | |
| Memory placement policies | Bind, Preferred, Interleave, Local (NumaRegion + MemPolicy) | Defers to the system allocator and the kernel | |
| Per-node worker pools | NumaExecutor with per-node pools and node-scoped work stealing | Single global worker pool with cross-CPU work stealing | |
| Per-node data structures | NumaSharded<T>, ShardedCounter | Not provided; use a third-party crate | |
| Device locality | Map NICs and NVMe devices to NUMA nodes | Not provided | |
| Locality observability | Locality ratios, cross-node traffic counters, health reports | Not provided | |
| Platforms | Linux x86_64, Linux aarch64 (full); macOS graceful degradation | Linux, macOS, Windows | |
| MSRV | 1.70 | Tokio publishes its own MSRV; see its release notes | Treat both MSRVs as living numbers |
| License | MIT | MIT | |

Why this comparison exists

People ask whether numaperf “replaces” Tokio in latency-critical Rust. The honest answer is no. Tokio is an async runtime — it moves futures between worker threads, handles IO readiness, runs timers. numaperf is a set of NUMA primitives — it controls where threads execute, where memory is placed, and how you observe locality. They live at different layers. The pattern that works in practice is to keep your general-purpose runtime where ergonomics matter, and reach for numaperf on the parts of the system where p99 is the SLO.

For a longer take on the architectural pattern, see Pinning vs scheduling: where Rust runtimes leave perf on the table.

Where they overlap

The only real overlap is thread management. Both have an opinion about how many threads to spawn and what they should be doing. Tokio’s opinion is “as many as there are CPUs, all interchangeable, all stealing from each other”. numaperf’s opinion is “as many as makes sense per node, each pinned, each working out of node-local memory”. For most async code, Tokio’s answer is the right one. For the hot path of a packet processor or an order book, numaperf’s answer is.
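
To make "as many as makes sense per node" concrete, here is a minimal sketch that reads the node-to-CPU layout straight from Linux sysfs and derives a worker count per node. It uses only the standard library and is not numaperf's API. The function names and the sizing rule (one worker per CPU, minus a few CPUs reserved for the control plane) are assumptions made for illustration.

```rust
use std::collections::BTreeMap;
use std::fs;

/// Parse a Linux cpulist string such as "0-3,8-11" into CPU ids.
/// Returns None if any token is malformed.
fn parse_cpulist(s: &str) -> Option<Vec<usize>> {
    let mut cpus = Vec::new();
    for part in s.trim().split(',').filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            Some((lo, hi)) => {
                let lo: usize = lo.trim().parse().ok()?;
                let hi: usize = hi.trim().parse().ok()?;
                if lo > hi {
                    return None;
                }
                cpus.extend(lo..=hi);
            }
            None => cpus.push(part.trim().parse().ok()?),
        }
    }
    Some(cpus)
}

/// Read node id -> CPU ids from /sys/devices/system/node/node<N>/cpulist.
/// Returns an empty map when that directory is absent (for example on macOS).
fn cpus_by_node() -> BTreeMap<usize, Vec<usize>> {
    let mut map = BTreeMap::new();
    let Ok(entries) = fs::read_dir("/sys/devices/system/node") else {
        return map;
    };
    for entry in entries.flatten() {
        let file_name = entry.file_name();
        let Some(name) = file_name.to_str() else { continue };
        let Some(id) = name.strip_prefix("node").and_then(|n| n.parse::<usize>().ok()) else {
            continue;
        };
        let Ok(text) = fs::read_to_string(entry.path().join("cpulist")) else {
            continue;
        };
        if let Some(cpus) = parse_cpulist(&text) {
            map.insert(id, cpus);
        }
    }
    map
}

/// Hypothetical sizing rule: one worker per CPU on the node, minus `reserved`
/// CPUs left for the control plane, but at least one worker on any node that
/// has CPUs. Nodes without CPUs (memory-only nodes) get no workers.
fn workers_for_node(cpus: &[usize], reserved: usize) -> usize {
    if cpus.is_empty() {
        return 0;
    }
    cpus.len().saturating_sub(reserved).max(1)
}
```

Calling `cpus_by_node()` at startup and mapping each entry through `workers_for_node` gives a per-node worker budget. Pinning those workers and placing their memory is the part this sketch leaves out.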

Where Tokio wins

  • Ecosystem. Tokio is the default async runtime for Rust. If you need an HTTP server, a database driver, a Kafka client, the well-supported version of it is built on Tokio.
  • Async ergonomics. async fn, await, tokio::spawn, select!: none of that exists in numaperf, and numaperf isn’t trying to provide it.
  • Cross-platform. Tokio runs on Linux, macOS, and Windows. numaperf only really earns its keep on Linux NUMA hardware; macOS support is graceful degradation, Windows is not in scope.
  • General-purpose throughput. For workloads that are IO-bound and not latency-critical, Tokio’s work-stealing scheduler is excellent and you have no reason to fight it.

Where numaperf wins

  • Explicit memory placement. numaperf gives you MemPolicy::Bind with Prefault::Touch. Tokio gives you whatever your allocator decided.
  • RAII pinning. ScopedPin cleans up on drop. Manual sched_setaffinity calls in a worker startup hook don’t.
  • Per-node scheduling. A NumaExecutor will not steal work across nodes. Tokio’s default multi-threaded executor steals across all of its workers by design, which is what you want for throughput. For tail latency on NUMA hardware, you want the opposite.
  • Locality observability. numaperf surfaces locality ratios and cross-node traffic per region. Nothing in the default Tokio stack does this.
  • Device locality. Mapping NICs and NVMe to NUMA nodes is a numaperf concern that no async runtime tries to solve.
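
As a rough illustration of what a locality ratio can look like, the sketch below derives one from the per-node counters the Linux kernel exposes in /sys/devices/system/node/node<N>/numastat. This is not how numaperf computes its per-region figures. The definition used here, local_node / (local_node + other_node), is an assumption chosen for the example, and the kernel counters track page allocations since boot rather than individual memory accesses.

```rust
use std::collections::HashMap;

/// Parse "key value" lines as found in
/// /sys/devices/system/node/node<N>/numastat. Malformed lines are skipped.
fn parse_numastat(text: &str) -> HashMap<String, u64> {
    text.lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let key = fields.next()?;
            let value = fields.next()?.parse::<u64>().ok()?;
            Some((key.to_string(), value))
        })
        .collect()
}

/// Assumed definition for this sketch:
///   local_node / (local_node + other_node)
/// where local_node counts pages allocated on this node for a task running
/// on it, and other_node counts pages allocated on this node for a task
/// running on a different node. Returns None if a counter is missing or
/// both counters are zero.
fn locality_ratio(stats: &HashMap<String, u64>) -> Option<f64> {
    let local = *stats.get("local_node")?;
    let other = *stats.get("other_node")?;
    let total = local.checked_add(other)?;
    if total == 0 {
        return None;
    }
    Some(local as f64 / total as f64)
}
```

Because the counters are cumulative, a dashboard would normally sample them periodically and compute the ratio over the difference between two samples, not over the raw totals.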

The hybrid pattern

In a real service, the layout we recommend is:

  1. Use Tokio (or your runtime of choice) for the control plane: configuration, RPC, observability, slow IO.
  2. At startup, call Topology::discover() to learn the layout.
  3. Spin up a numaperf executor for the hot path, with workers pinned to specific nodes and node-local memory regions.
  4. Hand work off from the control plane to the hot path via a bounded channel.
  5. Put numaperf’s locality ratio on a dashboard next to your latency SLOs.

This buys you the ecosystem of Tokio for the parts of the system that need ecosystem, and the locality guarantees of numaperf for the parts of the system where they pay for themselves.
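
The handoff in step 4 can be sketched with nothing but the standard library. The example below uses std::sync::mpsc::sync_channel as the bounded queue and a plain thread as a stand-in for a pinned hot-path worker. The `Order` type and both function names are hypothetical, and no pinning or memory placement happens here; in the recommended layout that would be numaperf's job.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError, sync_channel};
use std::thread::{self, JoinHandle};

/// Hypothetical unit of hot-path work.
struct Order {
    qty: u64,
}

/// Spawn a stand-in hot-path worker that drains the queue and returns the
/// total quantity it processed. A real worker would be pinned to a node and
/// work out of node-local memory; this sketch does neither.
fn spawn_hot_path(rx: Receiver<Order>) -> JoinHandle<u64> {
    thread::spawn(move || {
        let mut total = 0u64;
        // recv() returns Err once every sender has been dropped,
        // which ends the loop and shuts the worker down cleanly.
        while let Ok(order) = rx.recv() {
            total += order.qty;
        }
        total
    })
}

/// Non-blocking submit from the control plane. If the queue is full or the
/// worker is gone, the order is handed back so the caller can shed load or
/// retry instead of stalling.
fn submit(tx: &SyncSender<Order>, order: Order) -> Result<(), Order> {
    match tx.try_send(order) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(o)) | Err(TrySendError::Disconnected(o)) => Err(o),
    }
}
```

A control plane would create the queue with `sync_channel::<Order>(capacity)`, pass the receiver to `spawn_hot_path`, and call `submit` for each unit of work. Since `try_send` never blocks, it can be called from an async task without parking a runtime worker thread. The bound is the design choice that matters: when the hot path falls behind, the control plane sees the rejection immediately instead of building an unbounded backlog.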

When to skip numaperf

If you do not have a multi-socket box, or your workload is not tail-latency-sensitive, you do not need this. The defaults are fine. Use Tokio, ship the service, profile when something hurts.

