Skip to content

Ixon: .ixe bundle format — main/assumptions header, metadata-last sections, blob integrity#474

Open
johnchandlerburnham wants to merge 1 commit into
jcb/compilerfrom
jcb/ixe-improvements
Open

Ixon: .ixe bundle format — main/assumptions header, metadata-last sections, blob integrity#474
johnchandlerburnham wants to merge 1 commit into
jcb/compilerfrom
jcb/ixe-improvements

Conversation

@johnchandlerburnham

@johnchandlerburnham johnchandlerburnham commented Jul 5, 2026

Copy link
Copy Markdown
Member

Merge #473 into main first.

Evolves the .ixe format in place (pre-alpha, Tag4(0xE,0) kept, no compat shims) so a
single env file can pin one Lean value as a self-contained, verifiable bundle — the
groundwork for using Ixon as a client/server interchange format. A constant's address
already pins its entire dependency DAG (refs tables are content addresses, recursively);
this PR adds the transport half: a distinguished root, an explicit trust boundary for
thin bundles, and closure helpers to produce/validate them.

Wire format

[Tag4(0xE,0)] [consts merkle root : 32]
[main : u8 flag + 32? ]              -- NEW: bundle root; readers enforce main ∈ consts
[assumptions : Tag0 count + 32*]     -- NEW: strictly ascending; thin-bundle cut points
§1 blobs    -- NEW: blake3(bytes) == addr verified per entry, every reader
§2 consts   -- unchanged (already verified)
§3 anon_hints  -- moved before metadata; ALWAYS written (derived from Named Def
               -- metadata when the map is empty)
§4 names / §5 named / §6 comms      -- metadata last
  • Metadata-last reorder: get_anon/get_anon_mmap read §1–§3 and stop — no more
    parse-and-discard of names/named/comms; parse_lazy_index stops before comms; the
    full readers now reject trailing bytes.
  • §3 is the canonical hint channel: read-time hint harvesting is gone (including
    EnvHandle::from_bytes' post-pass). merkle_root_canonical(assumptions) reproduces
    Claim.assumptions roots, so claim interop is derivable, not stored.
  • Integrity: blobs were previously unverified in every reader — a swapped blob
    silently changed a Nat/String literal under an otherwise-valid file. Lean getEnv
    also gains the per-const hash check it was missing.

Bundles

  • Env::bfs_closure — 3-edge traversal: Constant.refs, projection → Muts block
    (projections have empty refs; a refs-only walk returns just the root), and
    Muts block → member/ctor projection addresses (ingress_anon_block requires them).
    kernel::anon_work::closure_addrs now delegates to it.
  • Env::prune_to_closure(main, assumed) — builds a closed bundle: value closure cut at
    assumed (minimal: only cut points actually reached), genuine const bytes, blobs,
    per-const hints, and display metadata (named entries, name parent chains + string
    blobs, DataValue payload blobs, meta_refs edges, aux_gen originals).
  • Env::validate_closed() — receiver check: everything reachable from main is
    carried or assumed.

Byte-identical mirrors (bonus fix)

The Lean and Rust writers were never byte-identical: Rust emits the anonymous name as
names-table entry 0, Lean didn't — and nothing ever byte-compared the writers directly.
Lean now matches Rust, which also fixes a latent putIdx hazard (the getD 0 fallback
used to collide anon references with the first real name). New strict
serEnv == rsSerEnv tests lock this in.

Breaking change

All existing .ixe files are stale — regenerate with ix compile (readers emit a
"pre-bundle-format .ixe; recompile it" hint). No golden fixtures existed; .ixe is
gitignored and regenerated per run. FFI RawEnv ctor arity is 5→8 and RawEnvLazy
3→5 (the manual builders in ffi/compile.rs now populate the new slots — leaving them
unset was a segfault).

Verification

  • Rust: 1073 workspace tests, clippy --all-features clean. New ixon tests: bundle
    roundtrips through all four readers, blob-tampering rejection ×4, unsorted/dup
    assumptions, tampered main, trailing-garbage vs early-stop, hints-derivation byte
    parity, projection/Muts closure, prune/validate (offsets in tampering tests are now
    computed with the real parsers instead of hardcoded).
  • Lean: full lake test (ffi + ixon suites) green; generators produce bundle fields
    and content-addressed blobs.
  • Cross-language: whole-stdlib env — Lean serEnv == Rust rsSerEnv == roundtrip,
    byte-exact at 505,759,024 bytes (rust-serialize ignored suite).
  • End-to-end: ix compile Tests/MinimalDefs.lean → new-format .ixeix check-rs
    60,601/60,601 (meta) and 51,681/51,681 (anon); bench-typecheck executes and
    proves constants over the new format (on top of the Aiur Inductive-flag-drop port).

Follow-ups

  • prune_to_closure rescans env.named per fixpoint round — fine for bundles, worth
    an index if pruning against mathlib-scale envs becomes common.
  • Bundle transport niceties (store sync protocol, Commit.compileDef → bundle CLI) are
    future work; this PR is the format + helpers.

…tions, blob integrity

Evolves .ixe in place (Tag4(0xE,0) kept, pre-alpha) so a single env file
can pin one Lean value as a self-contained bundle:

- Header gains main : Option Address (bundle root; writers and all
  readers enforce main ∈ consts) and a strictly-ascending assumptions
  list (thin-bundle trust boundary; merkle_root_canonical over the
  leaves reproduces Claim.assumptions roots).
- Sections reordered hot-first: blobs, consts, anon_hints, names,
  named, comms. get_anon/get_anon_mmap stop after §3 (no metadata
  parse-and-discard); parse_lazy_index stops after §5; the full
  readers reject trailing bytes.
- anon_hints promoted from optional trailer to an always-written §3,
  derived from Named Def metadata when the map is empty; read-time
  hint harvesting removed (EnvHandle::from_bytes post-pass dropped).
- Blob entries are hash-verified in every reader on both sides (a
  swapped blob silently changed a Nat/String literal before); Lean
  getEnv also gains the const hash check it was missing.
- New Env::bfs_closure follows all three edge kinds (Constant.refs,
  Prj→block, Muts→member/ctor projections — a refs-only walk returns
  just the root for projections); prune_to_closure builds bundles
  (value closure + named-metadata pruning: name parent chains and
  string blobs, DataValue payload blobs, meta_refs edges, aux_gen
  originals, per-const hints); validate_closed is the receiver check.
  kernel::anon_work::closure_addrs now delegates to bfs_closure.
- Lean topologicalSortNames emits the anonymous name at index 0,
  matching Rust: serEnv and rsSerEnv are now byte-identical (verified
  byte-exact on a 505 MB whole-stdlib env) and the putIdx getD-0
  fallback now resolves to anon instead of colliding with the first
  real name.
- FFI mirrors: RawEnv carries main/assumptions/anonHints (ctor arity
  5→8), RawEnvLazy carries main/assumptions (3→5); the manual RawEnv
  builders in ffi compile.rs populate the new slots via
  set_raw_env_bundle_fields.
- Generators/fixtures content-address blobs and exercise the bundle
  fields; tampering tests compute offsets programmatically instead of
  hardcoding the header layout; docs/Ixon.md spec updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant