reduce: structure compute as determine-times then evaluate#775
Draft
frankmcsherry wants to merge 6 commits into
Draft
reduce: structure compute as determine-times then evaluate#775frankmcsherry wants to merge 6 commits into
frankmcsherry wants to merge 6 commits into
Conversation
Additive, release path untouched. `compute` keeps the original interleaved algorithm; in debug builds it additionally collects the in-interval times it reconsiders and runs a prospective `discover_times` pass as a shadow (driving nothing), comparing the two via the `audit` counters. `discover_times` reasons only about times and omits the value-dependent `output_produced` joins, so the comparison measures whether interesting times are knowable from inputs alone. tests/reduce_audit.rs asserts the safety property missed == 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Output can only change where the accumulated input changed, so output-change times lie within the join-closure of input times and need not separately seed the closure. Drop the `output_produced` synthetic-time arm, keeping only its `<= next_time` arm (which subtracts already-produced output when forming diffs). After this, time determination reads nothing produced by user logic -- it is value-independent and hoistable. Confirmed behavior-preserving: SCC still matches the sequential oracle and the discovery audit is unchanged (reconsidered == discovered, missed == 0). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`compute` is now a two-pass orchestrator: `discover_times` determines every interesting time in [.., upper) reasoning only about times, then `evaluate_times` walks those times in order, reforms the input/output collections, runs user logic, and emits diffs. All time determination now precedes any user-closure evaluation -- the target invariant. `evaluate_times` carries no time-discovery machinery (no synthetic times, no interestingness tests): the active set is fixed, so it just walks it. Its running meet is the glb of the remaining active suffix and the replay frontiers, which stays <= every remaining active time, so buffer compaction is result-preserving. Each pass currently loads its histories independently (step 5 will share one load). Validated: full suite green, SCC matches the sequential oracle, release builds unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The two passes now run on a single load. `compute` loads the input/output/batch histories once (the only cursor traversal), advancing input/output by meet(batch ∪ warned). `discover_times` and `evaluate_times` are cursor-free: each rebuilds its replay view from the shared in-memory edits via `ValueHistory::replay`. Previously each pass re-walked the cursors. Step 4 (simplify the evaluation loop) was already satisfied: `evaluate_times` carries no discovery machinery -- it just walks the fixed active set. Validated: full suite green, SCC matches the sequential oracle, release builds unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the debug-only `audit` module, its record calls, and tests/reduce_audit.rs now that the two-pass split is in and validated. Output correctness is covered by tests/scc.rs (differential vs sequential) and tests/reduce.rs. The audit can be recovered from git history if tightness numbers are wanted later. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pass 1 no longer consults the output history. Output is a function of the accumulated input, so it can only change at input-closure times; the output trace bears on the diff values (pass 2) but not on which times are interesting. Drop the output stream, its synthetic-time joins, and its meet contribution from `discover_times`, leaving a pass over batch + maintained-input + warned + synthetic times. Restore the explanatory comments stripped during the split. Validated: full suite green, SCC matches the sequential oracle. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Restructures the reduce operator's per-key
computeinto two passes — determine all interesting times, then evaluate them — replacing the original loop that interleaved time discovery with user-closure evaluation. Built as a sequence of behavior-preserving commits, each validated againsttests/scc.rs(differential vs. sequential) andtests/reduce.rs.Premise
Output can only change where the accumulated input changed, so output-change times lie within the join-closure of input times and need not separately seed the closure. This makes the interesting-time set knowable from inputs alone — verified empirically first (see #774): a shadow discovery omitting the value-dependent
output_producedjoins reproduced the live reconsidered set exactly (missed=0) across ~14k times on a 200-round incremental SCC.Commits
output_produced— proven redundant; after this, time determination reads nothing produced by user logic and is value-independent.computeinto determine-then-evaluate passes — the pivot.discover_timesenumerates every interesting time in[.., upper)reasoning only about times;evaluate_timeswalks that fixed set, reforms collections, runslogic, emits diffs. All time determination now precedes any user-closure evaluation.4–5. Share one history load across both passes —
computeloads the input/output/batch histories once (the only cursor traversal, input/output advanced bymeet(batch ∪ warned)); both passes are cursor-free, replaying the shared in-memory edits viaValueHistory::replay.Validation
Full suite green at every step (lib, bfs, join, reduce, scc, trace); SCC's differential-vs-sequential oracle covers the new
evaluate_times; release builds are byte-for-byte unchanged. Net+168/-135inreduce.rs.Notes / next
evaluate_timescorrectness rests on the SCC/reduce oracles; its running meet (glb of the remaining active suffix and the replay frontiers) stays<=every remaining active time, so buffer compaction is result-preserving — reasoned, exercised on a 200-round incremental workload, not formally proven.replay()rebuild; the clean pass-2 boundary now enables the abelian Möbius / independent-O(t)optimization as separate work; fold in the "Reasoning about interesting times" writeup as module docs.🤖 Generated with Claude Code