fix(vcr-ra): const-CSE size-regression guard — CSE-last + per-segment size guard (#242) #519
Merged
… size guard (#242)

gale's v0.17.0 burndown found SYNTH_CONST_CSE=1 GREW a tiny --relocatable function (gust_mix 90→92 B). On --relocatable the optimized path's inline const cache never runs (select_direct), so the post-hoc liveness::apply_const_cse acts alone: it retargeted a use, kept a constant resident longer, and defeated a downstream immediate-fold that would otherwise have absorbed the constant. A remove-movw + rename-use post-pass on already-register-assigned instructions cannot itself spill; it grows code only by changing what a later pass does.

Two fixes:

1. CSE-LAST: move the apply_const_cse call to run after every immediate-fold (fold_immediate_shifts / fold_uxth), before branch resolution. Foldable consts are already folded-and-gone, so CSE can no longer defeat a fold. This structurally eliminates gale's mechanism.
2. Per-segment SIZE GUARD in apply_const_cse: stage each segment's removals/retargets, estimate the rewritten segment via estimate_arm_byte_size (the #511 encoder mirror), and commit only if it does not grow, so a retarget that flips a 16-bit ldr to its 32-bit form (low→high base register) is declined.

Verification:

- Two contrasting liveness unit tests prove the guard non-vacuous: identical segments differing only in the resident register's class (high R8 → encoding flips → declines; low R2 → no flip → commits).
- const_cse_differential.py: flag-on values bit-identical to wasmtime across the corpus; new per-function no-regression gates on BOTH the optimized and --relocatable paths (the latter is the path gale's bug lives on; currently inert on the arithmetic corpus, a tripwire for when gust_mix.wat lands).
- Flag-off byte-identical (frozen gate 3/3, const_cse golden 2/2).
- cargo test --workspace green (85 suites); fmt + clippy clean.

const-CSE stays flag-off (SYNTH_CONST_CSE). The pressure/size prerequisite for the eventual default-on flip is now closed; alias-eviction remains the sole open prerequisite.

gale's exact gust_mix case is not yet reproduced in-tree; fixture requested on #242 to pin the trigger.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request on Jun 27, 2026
…t fires + stays correct on gale's path (#242) (#522)

The const-CSE no-regression fix (#519) ships with an honest gap: its `--relocatable` gate was INERT. gale's gust_mix 90→92 B regression was on `--relocatable`, which routes through `select_direct()` where only the post-hoc `apply_const_cse` runs. The `const_cse.wat` arithmetic corpus never makes the direct selector emit the redundant same-value-in-two-registers shape that pass dedups, so the gate gave zero positive evidence on the exact path the bug lived on.

This adds `const_cse_direct.wat`: single-param, pure-register, reloc-free shapes (a >8-bit const reused across several independent sub-expressions summed at the end) that DO make the direct selector emit the redundant `movw`, so post-hoc `apply_const_cse` fires on `--relocatable` (r1 44→40, r2 38→34).

The differential now runs a NON-VACUOUS direct-path gate that asserts:

(a) CSE actually fires on >=1 function; the gate fails if it goes blind;
(b) no function grows (the no-regression property on gale's path);
(c) every result is bit-identical to wasmtime under unicorn (correctness of post-hoc CSE on the direct selector's output).

This is the positive evidence on gale's exact path that #519 could not provide.

Behavior-frozen: new fixture + harness only, no codegen change. Frozen anchors (control_step 0x00210A55, flight_algo 0x07FDF307) and the const_cse flag-off golden are untouched (frozen gate 3/3, golden 2/2). The full differential passes (exit 0); flag-off byte-identical.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
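The three assertions of the direct-path gate can be sketched as a small checker. This is an illustration only: the record layout and the names `FnResult` and `check_direct_gate` are hypothetical, not the contents of `const_cse_differential.py`, and "CSE fired" is approximated here as "the flag-on build is smaller". The sizes are the ones quoted above; the result values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class FnResult:
    name: str
    size_off: int   # bytes with the flag off
    size_on: int    # bytes with the flag on
    got: int        # value computed by the flag-on code under the emulator
    expected: int   # reference value for the same input


def check_direct_gate(results):
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    # (a) the gate must not be blind: at least one function has to shrink
    if not any(r.size_on < r.size_off for r in results):
        failures.append("gate is blind: CSE fired on no function")
    for r in results:
        # (b) no-regression: no function may grow
        if r.size_on > r.size_off:
            failures.append(f"{r.name}: grew {r.size_off}->{r.size_on} B")
        # (c) correctness: flag-on value must match the reference exactly
        if r.got != r.expected:
            failures.append(f"{r.name}: value mismatch")
    return failures


# Sizes from the commit message; got/expected are placeholder values.
passing = [FnResult("r1", 44, 40, 7, 7), FnResult("r2", 38, 34, 9, 9)]
print(check_direct_gate(passing))
```

Condition (a) is what separates this gate from the earlier inert one: a corpus on which the pass never fires satisfies (b) and (c) trivially, so without (a) a pass result carries no evidence.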
What
Fixes the const-CSE size regression gale found in the v0.17.0 burndown (#242):
`SYNTH_CONST_CSE=1` grew a tiny `--relocatable` function (`gust_mix` 90→92 B).

Root cause. On `--relocatable`, the optimized path's inline const cache never runs (`select_direct()`), so the post-hoc `liveness::apply_const_cse` acts alone. A remove-`movw` + rename-use post-pass on already-register-assigned instructions cannot itself spill; it grows code only by changing what a later pass does. Here it retargeted a use, kept a constant resident longer, and defeated a downstream immediate-fold that would otherwise have absorbed the constant.
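The mechanism can be reproduced in a toy model. The sketch below is not synth's IR or its passes: the three instruction forms, the byte sizes, and the single-use folding rule are all assumptions, chosen only to show how a CSE pass can cost bytes by changing what a later fold sees, and why running the fold first removes the interaction.

```python
from collections import Counter

# Toy instruction forms (hypothetical, not synth's IR):
#   ("movw", dst, imm)        load a constant into a register
#   ("add",  dst, lhs, rhs)   register-register add
#   ("addi", dst, lhs, imm)   add with an immediate operand
# Assumption: every register is written at most once.
TOY_SIZE = {"movw": 4, "add": 2, "addi": 2}  # made-up byte sizes


def size(prog):
    return sum(TOY_SIZE[ins[0]] for ins in prog)


def fold_immediates(prog):
    """Absorb a movw into an add when that add's rhs is the constant's only use."""
    reads, rhs_reads = Counter(), Counter()
    for ins in prog:
        if ins[0] == "add":
            reads.update([ins[2], ins[3]])
            rhs_reads[ins[3]] += 1
        elif ins[0] == "addi":
            reads[ins[2]] += 1
    foldable = {
        ins[1]: ins[2]
        for ins in prog
        if ins[0] == "movw" and reads[ins[1]] == 1 and rhs_reads[ins[1]] == 1
    }
    out = []
    for ins in prog:
        if ins[0] == "movw" and ins[1] in foldable:
            continue  # constant is absorbed into its single use
        if ins[0] == "add" and ins[3] in foldable:
            out.append(("addi", ins[1], ins[2], foldable[ins[3]]))
        else:
            out.append(ins)
    return out


def const_cse(prog):
    """Drop a movw whose value is already resident and rename its uses."""
    resident, rename, out = {}, {}, []
    for ins in prog:
        if ins[0] == "movw":
            if ins[2] in resident:
                rename[ins[1]] = resident[ins[2]]
                continue
            resident[ins[2]] = ins[1]
            out.append(ins)
        elif ins[0] == "add":
            out.append(("add", ins[1], rename.get(ins[2], ins[2]), rename.get(ins[3], ins[3])))
        else:  # addi: only the lhs is a register
            out.append(("addi", ins[1], rename.get(ins[2], ins[2]), ins[3]))
    return out


prog = [
    ("movw", "r1", 7), ("add", "r4", "r0", "r1"),
    ("movw", "r2", 7), ("add", "r5", "r4", "r2"),
]
print(size(fold_immediates(prog)))             # fold only: 4
print(size(fold_immediates(const_cse(prog))))  # CSE before fold: 8
print(size(const_cse(fold_immediates(prog))))  # fold before CSE: 4
```

In the CSE-first order the surviving constant register has two readers, so the single-use fold no longer applies and the `movw` stays. With the fold first, there is no `movw` left for CSE to touch.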
Fix
1. CSE-last: move the `apply_const_cse` call to run after every immediate-fold (`fold_immediate_shifts` / `fold_uxth`), before branch resolution. Foldable consts are already folded-and-gone, so CSE can no longer defeat a fold. Structurally eliminates gale's mechanism.
2. Per-segment size guard in `apply_const_cse`: stage each segment's removals/retargets, estimate the rewritten segment via `estimate_arm_byte_size` (the #511 encoder mirror), and commit only if it does not grow. A retarget that flips a 16-bit `ldr` to its 32-bit form (low→high base register) is declined.
Verification
- Two contrasting `liveness` unit tests: identical segments differing only in the resident register's class (high R8 → encoding flips → declines; low R2 → no flip → commits).
- `const_cse_differential.py`: flag-on values bit-identical to wasmtime across the corpus; new per-function no-regression gates on both the optimized and `--relocatable` paths.
- `cargo test --workspace` green (85 suites); fmt + clippy clean.
Honesty / scope
const-CSE stays flag-off (`SYNTH_CONST_CSE`). The pressure/size prerequisite for the eventual default-on flip is now closed; alias-eviction remains the sole open prerequisite. gale's exact `gust_mix` case is not yet reproduced in-tree: the post-hoc pass is currently inert on the arithmetic `--relocatable` corpus, so that gate is a tripwire that lights up when a triggering fixture (`gust_mix.wat`, requested on #242) lands. The fix is structurally sound; this PR does not claim an empirical fix-on-gale.
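For readers who want the shape of the per-segment size guard, here is a minimal sketch under stated assumptions. It is not the code in `liveness::apply_const_cse`: the `Ins` type, `stage_const_cse`, and `guarded_const_cse` are hypothetical, and `estimate_size` is a two-rule stand-in for `estimate_arm_byte_size` (`movw` is always 32-bit; the 16-bit `ldr` immediate form needs both registers in r0-r7 and a word-aligned offset of at most 124). The toy also assumes no register is redefined inside a segment.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Ins:
    op: str                     # "movw" or "ldr"
    rd: int                     # destination register number
    base: Optional[int] = None  # base register (ldr only)
    imm: int = 0                # constant (movw) or byte offset (ldr)


def estimate_size(ins):
    """Stand-in size estimator covering only the two forms used here."""
    if ins.op == "movw":
        return 4
    if ins.op == "ldr":
        narrow = ins.rd < 8 and ins.base < 8 and ins.imm % 4 == 0 and 0 <= ins.imm <= 124
        return 2 if narrow else 4
    raise ValueError(f"unknown op {ins.op}")


def segment_size(segment):
    return sum(estimate_size(i) for i in segment)


def stage_const_cse(segment):
    """Build the rewritten segment without touching the original."""
    resident, rename, staged = {}, {}, []
    for ins in segment:
        if ins.op == "movw":
            if ins.imm in resident:          # value already held: drop this movw
                rename[ins.rd] = resident[ins.imm]
                continue
            resident[ins.imm] = ins.rd
            staged.append(ins)
        else:                                # retarget the ldr base register
            staged.append(replace(ins, base=rename.get(ins.base, ins.base)))
    return staged


def guarded_const_cse(segment):
    """Commit the staged rewrite only if the estimate does not grow."""
    staged = stage_const_cse(segment)
    if segment_size(staged) <= segment_size(segment):
        return staged
    return list(segment)


def make_segment(resident_reg):
    return [
        Ins("movw", resident_reg, imm=0x1234),
        Ins("movw", 3, imm=0x1234),
        Ins("ldr", 0, base=3, imm=0),
        Ins("ldr", 1, base=3, imm=4),
        Ins("ldr", 4, base=3, imm=8),
    ]


print(segment_size(guarded_const_cse(make_segment(8))))  # declined: stays 14
print(segment_size(guarded_const_cse(make_segment(2))))  # committed: 10
```

With the constant resident in R8, removing one `movw` saves 4 bytes but three loads widen by 2 bytes each, so the staged segment is 16 bytes against 14 and is declined. With R2 no load widens and the rewrite commits. The two segments differ only in the resident register's class, which is the contrast the unit tests rely on.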
🤖 Generated with Claude Code