Skip to content

reduce: debug-only audit that interesting times are knowable from inputs#774

Draft
frankmcsherry wants to merge 1 commit into
TimelyDataflow:master-nextfrom
frankmcsherry:reduce-discovery-audit
Draft

reduce: debug-only audit that interesting times are knowable from inputs#774
frankmcsherry wants to merge 1 commit into
TimelyDataflow:master-nextfrom
frankmcsherry:reduce-discovery-audit

Conversation

@frankmcsherry

Copy link
Copy Markdown
Member

Draft / investigation — not a behavior change. The production reduce path is unchanged; this adds debug-only verification scaffolding to test whether reduce's interesting times can be determined up front from the inputs, independent of the values produced. That is the prerequisite for eventually splitting compute into a time-discovery phase and an evaluation phase.

What's here

  • compute is now a thin wrapper. The original interleaved algorithm is preserved verbatim as compute_live (the large diff is mostly relocation), so release builds are byte-for-byte the prior behavior with zero overhead — all new code is #[cfg(debug_assertions)]-gated.
  • In debug builds, compute additionally runs a prospective discover_times pass as a shadow (drives nothing) and compares the times it would enumerate against the times the live logic actually reconsidered, recording the result in the audit counters. Nothing on the worker path panics, so a divergence shows up as a counter, never a hang.
  • discover_times deliberately omits the value-dependent output_produced joins, so the comparison measures exactly what that (value-dependent) source contributes.
  • tests/reduce_audit.rs runs a 200-round incremental SCC and asserts the safety property missed == 0 (discovery must never drop a time the live logic reconsiders).

Finding

On the 200-round incremental SCC (50 nodes, 100 edges, random churn): reconsidered=13925, discovered=13925, missed=0, extra=0. The shadow reproduces the live interesting set exactly while omitting the output_produced joins — i.e. on this workload the output-seeded synthetic times are redundant with the input-time join-closure, supporting the claim that interesting times are knowable from inputs alone (a statement about which times to reconsider; value-independent, orthogonal to abelian-ness).

Status / next steps

  • Scaffolding is marked for removal once the split is decided.
  • Explanatory writeup ("Reasoning about interesting times") to be folded in.
  • Widen the audit across more reduce-driven tests before trusting a real split.

🤖 Generated with Claude Code

Adds verification scaffolding (debug builds only) to test whether the
reduce operator's interesting times can be determined up front from the
inputs, independent of the values produced -- a prerequisite for splitting
`compute` into a time-discovery phase and an evaluation phase.

The production path is unchanged: the original interleaved algorithm is
preserved verbatim as `compute_live`. In debug builds, `compute` also runs
a prospective `discover_times` pass as a shadow and compares the times it
would enumerate against the times the live logic actually reconsidered,
recording the discrepancy in the `audit` counters. `discover_times`
deliberately omits the value-dependent `output_produced` joins, so the
comparison measures exactly what those contribute. Nothing on the worker
path panics, so a divergence surfaces as a counter, not a hang.

tests/reduce_audit.rs runs a 200-round incremental SCC and asserts the
safety property `missed == 0` (discovery never drops a reconsidered time);
on that workload discovered == reconsidered exactly (extra == 0), i.e. the
output-seeded synthetic times are redundant with the input-time closure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant