Skip to content

reduce: structure compute as determine-times then evaluate#775

Draft
frankmcsherry wants to merge 6 commits into
TimelyDataflow:master-nextfrom
frankmcsherry:reduce-phase-split
Draft

reduce: structure compute as determine-times then evaluate#775
frankmcsherry wants to merge 6 commits into
TimelyDataflow:master-nextfrom
frankmcsherry:reduce-phase-split

Conversation

@frankmcsherry

Copy link
Copy Markdown
Member

Restructures the reduce operator's per-key compute into two passes — determine all interesting times, then evaluate them — replacing the original loop that interleaved time discovery with user-closure evaluation. Built as a sequence of behavior-preserving commits, each validated against tests/scc.rs (differential vs. sequential) and tests/reduce.rs.

Premise

Output can only change where the accumulated input changed, so output-change times lie within the join-closure of input times and need not separately seed the closure. This makes the interesting-time set knowable from inputs alone — verified empirically first (see #774): a shadow discovery omitting the value-dependent output_produced joins reproduced the live reconsidered set exactly (missed=0) across ~14k times on a 200-round incremental SCC.

Commits

  1. Add debug-only discovery shadow + audit — live path untouched; confirms the shadow matches what the live logic reconsiders.
  2. Stop seeding interesting times from output_produced — proven redundant; after this, time determination reads nothing produced by user logic and is value-independent.
  3. Split compute into determine-then-evaluate passes — the pivot. discover_times enumerates every interesting time in [.., upper) reasoning only about times; evaluate_times walks that fixed set, reforms collections, runs logic, emits diffs. All time determination now precedes any user-closure evaluation.
    4–5. Share one history load across both passescompute loads the input/output/batch histories once (the only cursor traversal, input/output advanced by meet(batch ∪ warned)); both passes are cursor-free, replaying the shared in-memory edits via ValueHistory::replay.
  4. Remove the scaffolding — clean two-pass reduce.

Validation

Full suite green at every step (lib, bfs, join, reduce, scc, trace); SCC's differential-vs-sequential oracle covers the new evaluate_times; release builds are byte-for-byte unchanged. Net +168/-135 in reduce.rs.

Notes / next

  • evaluate_times correctness rests on the SCC/reduce oracles; its running meet (glb of the remaining active suffix and the replay frontiers) stays <= every remaining active time, so buffer compaction is result-preserving — reasoned, exercised on a 200-round incremental workload, not formally proven.
  • Possible follow-ups: a non-destructive pass 1 to skip the pass-2 replay() rebuild; the clean pass-2 boundary now enables the abelian Möbius / independent-O(t) optimization as separate work; fold in the "Reasoning about interesting times" writeup as module docs.

🤖 Generated with Claude Code

frankmcsherry and others added 6 commits June 29, 2026 16:58
Additive, release path untouched. `compute` keeps the original interleaved
algorithm; in debug builds it additionally collects the in-interval times it
reconsiders and runs a prospective `discover_times` pass as a shadow (driving
nothing), comparing the two via the `audit` counters. `discover_times` reasons
only about times and omits the value-dependent `output_produced` joins, so the
comparison measures whether interesting times are knowable from inputs alone.

tests/reduce_audit.rs asserts the safety property missed == 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Output can only change where the accumulated input changed, so output-change
times lie within the join-closure of input times and need not separately seed
the closure. Drop the `output_produced` synthetic-time arm, keeping only its
`<= next_time` arm (which subtracts already-produced output when forming diffs).

After this, time determination reads nothing produced by user logic -- it is
value-independent and hoistable. Confirmed behavior-preserving: SCC still
matches the sequential oracle and the discovery audit is unchanged
(reconsidered == discovered, missed == 0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`compute` is now a two-pass orchestrator: `discover_times` determines every
interesting time in [.., upper) reasoning only about times, then
`evaluate_times` walks those times in order, reforms the input/output
collections, runs user logic, and emits diffs. All time determination now
precedes any user-closure evaluation -- the target invariant.

`evaluate_times` carries no time-discovery machinery (no synthetic times, no
interestingness tests): the active set is fixed, so it just walks it. Its
running meet is the glb of the remaining active suffix and the replay
frontiers, which stays <= every remaining active time, so buffer compaction is
result-preserving.

Each pass currently loads its histories independently (step 5 will share one
load). Validated: full suite green, SCC matches the sequential oracle, release
builds unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The two passes now run on a single load. `compute` loads the input/output/batch
histories once (the only cursor traversal), advancing input/output by
meet(batch ∪ warned). `discover_times` and `evaluate_times` are cursor-free:
each rebuilds its replay view from the shared in-memory edits via
`ValueHistory::replay`. Previously each pass re-walked the cursors.

Step 4 (simplify the evaluation loop) was already satisfied: `evaluate_times`
carries no discovery machinery -- it just walks the fixed active set.

Validated: full suite green, SCC matches the sequential oracle, release builds
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the debug-only `audit` module, its record calls, and tests/reduce_audit.rs
now that the two-pass split is in and validated. Output correctness is covered
by tests/scc.rs (differential vs sequential) and tests/reduce.rs. The audit can
be recovered from git history if tightness numbers are wanted later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pass 1 no longer consults the output history. Output is a function of the
accumulated input, so it can only change at input-closure times; the output
trace bears on the diff values (pass 2) but not on which times are interesting.
Drop the output stream, its synthetic-time joins, and its meet contribution from
`discover_times`, leaving a pass over batch + maintained-input + warned +
synthetic times. Restore the explanatory comments stripped during the split.

Validated: full suite green, SCC matches the sequential oracle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant