A header-mostly C++20 library that models numbers as machines that emit digits on demand rather than as fixed-precision floats. A number is a tiny forkable virtual machine: you ask it for the next digit, fork it in O(1) (or O(log n) for the series tier), reproject it into a new base, compare it against another machine — all while staying interval-honest (equality is undecidable, so the library never returns a false definite answer).
The design is organized into two tiers and four implementation phases:
- Automaton tier — constant- or bounded-state machines whose fork is a literal struct copy: rationals, quadratic irrationals, p-adic numbers.
- Series tier — growing arbitrary-precision accumulators for classical transcendentals (e, ln 2, 1/e) backed by compiled tail-bound oracles.
- Concepts
- Quick start
- Build & test
- The four phases
- JIT compilation path (Phase 4)
- Python bindings (Phase 3 user layer)
- Debugging a hang
- Header / milestone map
- The frozen ABI contract
- Honesty commitments
Every number is a state machine exposing a single step:
NumVMStep step(AutomatonVM) // -> { uint32_t digit, AutomatonVM next }
Two consequences fall out of this shape:
-
Fork is value-copy. For the automaton tier the entire state is a 40-byte POD (
AutomatonVM), so forking a number is amemcpy— O(1), no hidden state, no copy-on-write surprises. For the series tier the accumulators are arbitrary-precision integers, so fork is an explicit deep copy, advertised as O(log n) in the computation depth. -
Memory is the complexity metric. A rational stays at constant state. A degree-2 algebraic number's accumulated-root register grows O(log n) in the digit index. A transcendental's accumulators grow with depth. The library instruments
bit_width()so tests can assert these shapes.
Comparison and metrics are interval-honest: predicates return tri-state
(Less | Greater | Indistinguishable) or std::optional (pending), never a
fabricated definite answer for values that cannot be distinguished within
the requested precision.
#include "nam/number.hpp"
using namespace nam;
int main() {
// 1/7 = 0.(142857) — emit twelve digits.
Number r = Number::rational(1, 7, 10);
auto ds = r.digits(12); // {1,4,2,8,5,7,1,4,2,8,5,7}
// Base is a codec, not part of the value: 1/4 in base 2 is 0.01.
Number quarter = Number::rational(1, 4, 10);
quarter.in_base(2).digits(4); // {0,1,0,0}
// Fork is O(1) and exact for the automaton tier.
auto [a, b] = r.fork();
bool same = (a.digits(20) == b.digits(20)); // true
// Transcendentals stream from the series tier (1/e = 0.36787944...).
Number inv_e = Number::one_over_e(10);
inv_e.digits(6); // {3,6,7,8,7,9}
// Interval-honest comparison: 1/7 < 1/3.
Number third = Number::rational(1, 3, 10);
r.compare(third, 5); // Trit::Less
}# Dependencies (Debian/Ubuntu). Only build tooling is required for the core;
# the zstd/zlib/edit/curl packages are for the optional LLVM JIT backend.
sudo apt update && sudo apt install -y build-essential ninja-build cmake libzstd-dev zlib1g-dev libedit-dev libcurl4-openssl-dev
rm -rf build && cmake -S . -B build -G Ninja && cmake --build build
ctest --test-dir build --output-on-failureThe core library has zero external dependencies by default: arbitrary
precision is provided by a vendored nam::BigInt, bounded integers by
nam::BoundedInt, and the JIT by a function-pointer interpreter. Optional
upgrades are gated behind CMake flags:
| Flag | Effect | Default |
|---|---|---|
-DNAM_USE_GMP=ON |
Use GMP mpz_t for the series tier accumulators |
off |
-DNAM_BUILD_PYTHON=ON |
Build the pybind11 user-layer bindings | off |
-DNAM_BUILD_WASM=ON |
Build the Emscripten/embind WebAssembly bindings | off |
-DNAM_SANITIZE=ON |
ASan + UBSan on the test target | on |
| Phase | Theme | What it adds |
|---|---|---|
| 1 | Automaton tier | Frozen ABI, rationals, quadratic irrationals, base codecs, p-adics, comparison/metric, skip-ahead |
| 2 | Series tier | Vendored BigInt, transcendentals with tail-bound oracles, interval refinement, bounded LRU memo |
| 3 | User-facing API | Number type: operator surface, precision contexts, explicit memoization, honest comparisons |
| 4 | JIT / expression tree | compile(expr_tree) -> NumVMFn for runtime-built trees, sharing the static C ABI exactly |
A rational p/q in base b is a constant-state machine: the only mutable
state is the running remainder, advanced by long-division-by-multiplication.
Quadratic irrationals (sqrt(D)) use the classic digit-by-digit long-hand
square-root recurrence — genuinely two registers (remainder + root prefix),
with the root prefix's bit-width growing O(log n). p-adic rationals commit
digits locally from the least-significant end, giving ultimately periodic
expansions and therefore free skip-ahead.
Transcendentals are convergent series carrying an attached tail-bound
oracle (a compiled convergence proof). The value is tracked as an exact
rational num/den plus an error numerator; a digit is committed only when
the interval [num/den, (num+err)/den] is narrow enough that its leading
digit is unambiguous. Refinement pulls more terms until the digit commits or
a budget is exhausted (honest pending).
The Number type is a thin tagged union over the two tiers. It hides the raw
ABI behind an mpmath/Decimal-flavoured surface: digits(n), in_base(b),
fork(), skip(n), compare(...), to_string(...), scoped
precision_context(digits=N), and explicit .streaming() / .cached(N)
memoization modes.
See JIT compilation path.
The ergonomic user layer is exposed to Python via pybind11. It is off by
default so the core still builds with no external dependencies; enable it
with -DNAM_BUILD_PYTHON=ON (pybind11 is located via find_package,
falling back to FetchContent):
cmake -S . -B build -G Ninja -DNAM_BUILD_PYTHON=ON
cmake --build build
PYTHONPATH=build/bindings python3 bindings/example.py
PYTHONPATH=build/bindings python3 bindings/test_nam_py.pyThe honesty commitments cross the language boundary unchanged: comparison
maps to Python tri-state (True | False | None), fork cost is annotated by
tier (fork_cost()), memoization is an explicit mode (.streaming() /
.cached(N)), and precision_context(digits=N) is a scoped context manager.
The same ergonomic user layer compiles to WebAssembly via Emscripten +
embind. Because the core is header-only with zero dependencies, the WASM
target is a thin binding shim — no GMP/LLVM required. It is off by default;
enable it with -DNAM_BUILD_WASM=ON under the Emscripten toolchain:
# Requires the Emscripten SDK (emsdk) activated in the current shell.
emcmake cmake -S . -B build-wasm -G Ninja -DNAM_BUILD_WASM=ON
cmake --build build-wasm
node build-wasm/bindings/wasm/example.js
node build-wasm/bindings/wasm/test_nam_wasm.jsThe bindings/wasm/nam.js wrapper turns the raw embind surface into an
idiomatic JS API:
const createNam = require('./nam_wasm.js');
const nam = await require('./nam.js')(createNam);
nam.rational(1, 7, 10).digits(12); // [1,4,2,8,5,7,1,4,2,8,5,7]
nam.rational(1, 4, 10).in_base(2).digits(4); // [0,1,0,0]
nam.rational(1, 7, 10).definitely_less_than(nam.rational(1, 3, 10), 5); // true
nam.precision_context(50, () => nam.e(10).digits()); // scoped, RAIIThe honesty commitments cross the WASM boundary unchanged: comparison maps
to JS tri-state (-1 | 1 | null for compare, true | false | null for
definitely_less_than), fork cost is annotated by tier (fork_cost()),
memoization is an explicit mode (.streaming() / .cached(N)), and
precision_context(digits, fn) runs fn inside a scoped RAII precision
context (restored even if fn throws).
ctest buffers a test's stdout and only prints it after the test
finishes. If a test hangs (e.g. an infinite loop in digit generation), you
see only Start 1: nam_tests and nothing else. To diagnose, bypass ctest
and run the binary directly so the per-test [ RUN ] / [ OK ] lines stream
live — the last [ RUN ] printed is the test that is hanging:
./build/tests/nam_testsUseful variants:
# Stream ctest output live instead of buffering (ctest >= 3.17), and add a
# hard timeout so a hang fails fast instead of blocking CI forever.
ctest --test-dir build --output-on-failure \
--output-junit results.xml \
--test-output-size-passed 0 \
--timeout 30 -V
# Force-flush per line when piping through a file or another process:
stdbuf -oL -eL ./build/tests/nam_testsRun under gdb, let it hang, then interrupt with Ctrl-C and inspect the
stack — the top frames reveal exactly which generator/extractor is looping:
gdb ./build/tests/nam_tests
(gdb) run
# ... wait for the hang, then press Ctrl-C ...
(gdb) bt # backtrace of the stuck thread
(gdb) frame N # hop to the NAM frame of interest
(gdb) info locals # inspect loop counters / interval boundsOr attach to the already-running process:
./build/tests/nam_tests &
gdb -p $(pgrep -n nam_tests)
(gdb) btTests are built with ASan + UBSan by default (-DNAM_SANITIZE=ON), which
catches the non-hanging failure modes (UB, OOB, leaks). A hang is not a
sanitizer event, so combine sanitizers with an external watchdog:
timeout 30 ./build/tests/nam_tests || echo "hung or failed (exit $?)"timeout exits 124 on a hang, giving CI a clean signal instead of an
indefinite block.
cmake -S . -B build-debug -G Ninja -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug
gdb ./build-debug/tests/nam_tests| Milestone | Header | Status |
|---|---|---|
| M1 ABI | nam/abi.h, nam/generator.hpp |
frozen, static_asserted |
| M2 Rationals | nam/rational.hpp |
constant state, period detection |
| M3 Quadratics | nam/algebraic.hpp |
degree-2 sqrt recurrence, bit-width instrumented |
| M4 Codec | nam/codec.hpp |
base as projection, round-trips |
| M5 p-adics | nam/padic.hpp |
local commitment, valuation extractor |
| M6 Compare/Metric | nam/compare.hpp, nam/metric.hpp |
interval-honest, product automaton |
| M7 Skip | nam/skip.hpp |
periodic skip + modexp kernel |
| P2 BigInt | nam/big_int.hpp |
vendored arbitrary-precision signed integer |
| P2 Series | nam/series.hpp, nam/constants.hpp |
tail-bound oracles, explicit deep-copy fork |
| P2 Refine | nam/refine.hpp |
interval-honest online digit extraction |
| P2 Memo | nam/memo.hpp |
explicit bounded LRU, no hidden global state |
| P3 User API | nam/number.hpp |
Number type, precision contexts, fork cost |
| P3 WASM | bindings/wasm/* |
Emscripten/embind JS bindings, same user surface |
The automaton tier speaks a versioned, C-compatible ABI (nam/abi.h):
typedef struct {
uint32_t base; /* codec selector — NOT baked into the number */
uint32_t phase; /* periodic-orbit phase / digit index */
uint64_t state[4]; /* over-provisioned for degree-4 algebraic */
} AutomatonVM; /* sizeof == 40 */
typedef struct { uint32_t digit; AutomatonVM next; } NumVMStep;
typedef NumVMStep (*NumVMFn)(AutomatonVM);sizeof(AutomatonVM) == 40(4 + 4 + 4*8) — checked bystatic_assert.- Trivially copyable + standard layout: fork is a literal struct copy, O(1), no hidden state.
NAM_ABI_VERSIONbumps on any layout change. Fields are never reordered.
These hold across every tier and language binding.
Core (Phase 1)
- No
assert(x == y)on values anywhere — equality is undecidable. Tests compare bounded digit prefixes; fork determinism is checked exact. - Comparison predicates return tri-state (
Less | Greater | Indistinguishable) orstd::optional<bool>(pending), never a false definite answer. - No global mutable state below the user layer.
- Memory is the complexity metric:
BoundedInt::bit_width()growth is asserted O(log n) for the n-th digit of a quadratic irrational.
Series tier (Phase 2)
- Fork is an explicit deep copy of the
BigIntaccumulators —SeriesVM::fork()is O(size) in computation depth, never copy-on-write.series_fork_is_deep_copyproves the fork is unaffected by later mutation. - Every constant ships a compiled convergence proof: the tail-bound
oracle is part of the
SeriesSpec, so digits are only committed when the interval honestly distinguishes them. - Digit extraction is interval-honest:
next_digitreturnsstd::optional— a boundary value that cannot be distinguished yields a pending (nullopt), never a false definite digit. - Memoization is explicit and bounded:
LruDigitCache/CachedDigitSourceare opt-in wrappers; there is no hidden global cache.
User tier (Phase 3)
- Precision contexts are scoped, not global:
PrecisionContextis a thread-local RAII guard. Nesting restores the prior value on scope exit; there is no global mutable precision leaking across threads or scopes. - Fork cost is annotated by tier:
Number::fork_cost()reportsO(1)for the automaton tier andO(log n)for the series tier, and the fork itself dispatches to the matching copy (struct copy vs deep copy). - Comparison stays interval-honest at the user layer:
definitely_less_thanreturnsstd::optional<bool>(pending) andcomparereturns aTrit;agrees_withis exact for a finite prefix. No false definite equality. - Memoization is an explicit mode:
.streaming()(O(1) space) and.cached(max_digits=N)(bounded LRU) are user choices; forks receive independent caches so value semantics are never violated. - Base is a codec:
in_base(b)reprojects without mutating the original Number (base()of the source is unchanged).
JIT tier (Phase 4)
- The compiled function is a real, C-callable
NumVMFnwith the exact static ABI; the interpreter's dispatch table lives honestly behind the function pointer rather than corrupting any ABI field. - Both the interpreter and LLVM backends reproduce the static digit stream exactly — verified digit-for-digit in the Phase 4 tests.
# Clone the emsdk
cd ~
git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
# Download and install the latest SDK
./emsdk install latest
./emsdk activate latest
# Set up environment variables in your current shell
source ./emsdk_env.shThen configure your CMake build pointing to the toolchain file:
clear
source ~/emsdk/emsdk_env.sh
rm -rf build build-wasm
emcmake cmake -S . -B build-wasm -G Ninja -DNAM_BUILD_WASM=ON
cmake --build build-wasmEven easier — use the emcmake wrapper which auto-sets the toolchain:
emcmake cmake .. -DNAM_BUILD_WASM=ON
emmake make