reduce: structure compute as determine-times then evaluate by frankmcsherry · Pull Request #775 · TimelyDataflow/differential-dataflow

frankmcsherry · 2026-06-29T21:20:06Z

Restructures the reduce operator's per-key compute into two passes — determine all interesting times, then evaluate them — replacing the original loop that interleaved time discovery with user-closure evaluation. Built as a sequence of behavior-preserving commits, each validated against tests/scc.rs (differential vs. sequential) and tests/reduce.rs.

Premise

Output can only change where the accumulated input changed, so output-change times lie within the join-closure of input times and need not separately seed the closure. This makes the interesting-time set knowable from inputs alone — verified empirically first (see #774): a shadow discovery omitting the value-dependent output_produced joins reproduced the live reconsidered set exactly (missed=0) across ~14k times on a 200-round incremental SCC.

Commits

Add debug-only discovery shadow + audit — live path untouched; confirms the shadow matches what the live logic reconsiders.
Stop seeding interesting times from output_produced — proven redundant; after this, time determination reads nothing produced by user logic and is value-independent.
Split compute into determine-then-evaluate passes — the pivot. discover_times enumerates every interesting time in [.., upper) reasoning only about times; evaluate_times walks that fixed set, reforms collections, runs logic, emits diffs. All time determination now precedes any user-closure evaluation.
4–5. Share one history load across both passes — compute loads the input/output/batch histories once (the only cursor traversal, input/output advanced by meet(batch ∪ warned)); both passes are cursor-free, replaying the shared in-memory edits via ValueHistory::replay.
Remove the scaffolding — clean two-pass reduce.

Validation

Full suite green at every step (lib, bfs, join, reduce, scc, trace); SCC's differential-vs-sequential oracle covers the new evaluate_times; release builds are byte-for-byte unchanged. Net +168/-135 in reduce.rs.

Notes / next

evaluate_times correctness rests on the SCC/reduce oracles; its running meet (glb of the remaining active suffix and the replay frontiers) stays <= every remaining active time, so buffer compaction is result-preserving — reasoned, exercised on a 200-round incremental workload, not formally proven.
Possible follow-ups: a non-destructive pass 1 to skip the pass-2 replay() rebuild; the clean pass-2 boundary now enables the abelian Möbius / independent-O(t) optimization as separate work; fold in the "Reasoning about interesting times" writeup as module docs.

🤖 Generated with Claude Code

Additive, release path untouched. `compute` keeps the original interleaved algorithm; in debug builds it additionally collects the in-interval times it reconsiders and runs a prospective `discover_times` pass as a shadow (driving nothing), comparing the two via the `audit` counters. `discover_times` reasons only about times and omits the value-dependent `output_produced` joins, so the comparison measures whether interesting times are knowable from inputs alone. tests/reduce_audit.rs asserts the safety property missed == 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Output can only change where the accumulated input changed, so output-change times lie within the join-closure of input times and need not separately seed the closure. Drop the `output_produced` synthetic-time arm, keeping only its `<= next_time` arm (which subtracts already-produced output when forming diffs). After this, time determination reads nothing produced by user logic -- it is value-independent and hoistable. Confirmed behavior-preserving: SCC still matches the sequential oracle and the discovery audit is unchanged (reconsidered == discovered, missed == 0). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`compute` is now a two-pass orchestrator: `discover_times` determines every interesting time in [.., upper) reasoning only about times, then `evaluate_times` walks those times in order, reforms the input/output collections, runs user logic, and emits diffs. All time determination now precedes any user-closure evaluation -- the target invariant. `evaluate_times` carries no time-discovery machinery (no synthetic times, no interestingness tests): the active set is fixed, so it just walks it. Its running meet is the glb of the remaining active suffix and the replay frontiers, which stays <= every remaining active time, so buffer compaction is result-preserving. Each pass currently loads its histories independently (step 5 will share one load). Validated: full suite green, SCC matches the sequential oracle, release builds unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The two passes now run on a single load. `compute` loads the input/output/batch histories once (the only cursor traversal), advancing input/output by meet(batch ∪ warned). `discover_times` and `evaluate_times` are cursor-free: each rebuilds its replay view from the shared in-memory edits via `ValueHistory::replay`. Previously each pass re-walked the cursors. Step 4 (simplify the evaluation loop) was already satisfied: `evaluate_times` carries no discovery machinery -- it just walks the fixed active set. Validated: full suite green, SCC matches the sequential oracle, release builds unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop the debug-only `audit` module, its record calls, and tests/reduce_audit.rs now that the two-pass split is in and validated. Output correctness is covered by tests/scc.rs (differential vs sequential) and tests/reduce.rs. The audit can be recovered from git history if tightness numbers are wanted later. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Pass 1 no longer consults the output history. Output is a function of the accumulated input, so it can only change at input-closure times; the output trace bears on the diff values (pass 2) but not on which times are interesting. Drop the output stream, its synthetic-time joins, and its meet contribution from `discover_times`, leaving a pass over batch + maintained-input + warned + synthetic times. Restore the explanatory comments stripped during the split. Validated: full suite green, SCC matches the sequential oracle. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

frankmcsherry and others added 6 commits June 29, 2026 16:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

reduce: structure compute as determine-times then evaluate#775

reduce: structure compute as determine-times then evaluate#775
frankmcsherry wants to merge 6 commits into
TimelyDataflow:master-nextfrom
frankmcsherry:reduce-phase-split

frankmcsherry commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

frankmcsherry commented Jun 29, 2026

Premise

Commits

Validation

Notes / next

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant