
fix(vcr-ra): const-CSE size-regression guard — CSE-last + per-segment size guard (#242)#519

Merged
avrabe merged 1 commit into main from vcr-ra-const-cse-no-regression-242
Jun 26, 2026

@avrabe avrabe commented Jun 26, 2026

Contributor

What

Fixes the const-CSE size regression gale found in the v0.17.0 burndown (#242):
SYNTH_CONST_CSE=1 grew a tiny --relocatable function (gust_mix 90→92 B).

Root cause. On --relocatable, the optimized path's inline const cache never
runs (select_direct()), so the post-hoc liveness::apply_const_cse acts alone.
A remove-movw + rename-use post-pass on already-register-assigned instructions
cannot itself spill — it grows code only by changing what a later pass does. Here
it retargeted a use, kept a constant resident longer, and defeated a downstream
immediate-fold that would otherwise have absorbed the constant.
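
The pass-order interaction described above can be sketched on a toy IR. Everything in this block is a hypothetical illustration, not the synth-backend code: the `Insn` type, `toy_fold_shifts`, and `toy_const_cse` are invented names, and the toy fold is assumed to absorb a constant only when that constant has exactly one reader. Under those assumptions, running CSE before the fold leaves a shared constant that the fold can no longer absorb, while running CSE last does not.

```rust
// Toy IR (hypothetical, not the real backend instruction type).
#[derive(Clone, Debug, PartialEq)]
enum Insn {
    MovImm { rd: u8, imm: u32 },         // rd = imm
    LslReg { rd: u8, rn: u8, rm: u8 },   // rd = rn << rm
    LslImm { rd: u8, rn: u8, imm: u32 }, // rd = rn << imm
}

// Count how many instructions read `reg`.
fn reads(code: &[Insn], reg: u8) -> usize {
    code.iter()
        .filter(|i| match i {
            Insn::LslReg { rn, rm, .. } => *rn == reg || *rm == reg,
            Insn::LslImm { rn, .. } => *rn == reg,
            Insn::MovImm { .. } => false,
        })
        .count()
}

// Toy immediate-fold: `mov c, #k; lsl rd, rn, c` becomes `lsl rd, rn, #k`,
// but only when `c` has exactly one reader (assumption of this sketch).
fn toy_fold_shifts(code: &[Insn]) -> Vec<Insn> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < code.len() {
        if let (Insn::MovImm { rd: c, imm }, Some(Insn::LslReg { rd, rn, rm })) =
            (&code[i], code.get(i + 1))
        {
            if rm == c && rn != c && reads(code, *c) == 1 {
                out.push(Insn::LslImm { rd: *rd, rn: *rn, imm: *imm });
                i += 2;
                continue;
            }
        }
        out.push(code[i].clone());
        i += 1;
    }
    out
}

fn renamed(rename: &[(u8, u8)], r: u8) -> u8 {
    rename.iter().find(|&&(from, _)| from == r).map_or(r, |&(_, to)| to)
}

// Toy const-CSE: drop a later `mov` of an already-resident value and
// retarget its readers. Ignores register redefinition for brevity.
fn toy_const_cse(code: &[Insn]) -> Vec<Insn> {
    let mut resident: Vec<(u32, u8)> = Vec::new(); // (value, register)
    let mut rename: Vec<(u8, u8)> = Vec::new(); // (removed, resident)
    let mut out = Vec::new();
    for insn in code {
        match insn {
            Insn::MovImm { rd, imm } => {
                let hit = resident.iter().find(|&&(v, _)| v == *imm).map(|&(_, r)| r);
                match hit {
                    Some(keep) => rename.push((*rd, keep)),
                    None => {
                        resident.push((*imm, *rd));
                        out.push(insn.clone());
                    }
                }
            }
            Insn::LslReg { rd, rn, rm } => out.push(Insn::LslReg {
                rd: *rd,
                rn: renamed(&rename, *rn),
                rm: renamed(&rename, *rm),
            }),
            Insn::LslImm { rd, rn, imm } => out.push(Insn::LslImm {
                rd: *rd,
                rn: renamed(&rename, *rn),
                imm: *imm,
            }),
        }
    }
    out
}
```

On a four-instruction input that loads the same constant twice for two shifts, `toy_fold_shifts(&toy_const_cse(&code))` keeps one `MovImm` plus two register shifts, whereas `toy_const_cse(&toy_fold_shifts(&code))` ends with two immediate shifts and no resident constant.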

Fix

  1. CSE-last — move the apply_const_cse call to run after every immediate-fold
    (fold_immediate_shifts / fold_uxth), before branch resolution. Foldable consts
    are already folded-and-gone, so CSE can no longer defeat a fold. Structurally
    eliminates gale's mechanism.
  2. Per-segment size guard in apply_const_cse — stage each segment's removals/
    retargets, estimate the rewritten segment via estimate_arm_byte_size (the #511
    encoder mirror), commit only if it does not grow. A retarget that flips a 16-bit
    ldr to its 32-bit form (low→high base register) is declined.
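
A minimal sketch of the stage / estimate / commit-only-if-not-larger idea from item 2, under loud assumptions: the `Insn` type, `estimate_bytes`, `stage_const_cse`, and `cse_segment_guarded` are hypothetical stand-ins, not the signatures of `apply_const_cse` or `estimate_arm_byte_size`. The size model covers only two facts about Thumb-2: `movw` is a 32-bit encoding, and `ldr` with an immediate offset has a 16-bit form only for low registers (r0 to r7) with a word-aligned offset up to 124.

```rust
// Toy instruction model (hypothetical, not the backend's type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Insn {
    Movw { rd: u8, imm: u16 },        // rd = imm
    Ldr { rt: u8, rn: u8, imm: u16 }, // rt = [rn + imm]
}

// Toy stand-in for the byte-size estimator.
fn estimate_bytes(seg: &[Insn]) -> usize {
    seg.iter()
        .map(|i| match *i {
            Insn::Movw { .. } => 4,
            // 16-bit LDR (immediate): low Rt/Rn, offset multiple of 4, at most 124.
            Insn::Ldr { rt, rn, imm } if rt < 8 && rn < 8 && imm % 4 == 0 && imm <= 124 => 2,
            Insn::Ldr { .. } => 4,
        })
        .sum()
}

// Stage the rewrite: drop a later movw of an already-resident value and
// retarget load bases to the resident register. Ignores redefinitions.
fn stage_const_cse(seg: &[Insn]) -> Vec<Insn> {
    let mut resident: Vec<(u16, u8)> = Vec::new(); // (value, register)
    let mut rename: Vec<(u8, u8)> = Vec::new(); // (removed, resident)
    let mut out = Vec::new();
    for &insn in seg {
        match insn {
            Insn::Movw { rd, imm } => {
                let hit = resident.iter().find(|&&(v, _)| v == imm).map(|&(_, r)| r);
                match hit {
                    Some(keep) => rename.push((rd, keep)),
                    None => {
                        resident.push((imm, rd));
                        out.push(insn);
                    }
                }
            }
            Insn::Ldr { rt, rn, imm } => {
                let base = rename
                    .iter()
                    .find(|&&(from, _)| from == rn)
                    .map_or(rn, |&(_, to)| to);
                out.push(Insn::Ldr { rt, rn: base, imm });
            }
        }
    }
    out
}

// Size guard: commit the staged segment only if its estimate does not grow.
fn cse_segment_guarded(seg: &[Insn]) -> Vec<Insn> {
    let staged = stage_const_cse(seg);
    if estimate_bytes(&staged) <= estimate_bytes(seg) {
        staged
    } else {
        seg.to_vec() // decline: keep the original segment untouched
    }
}
```

In this toy model a segment with three loads through the duplicate register is declined when the resident register is r8 (14 B would become 16 B) and committed when it is r2 (14 B becomes 10 B). The three-load shape is chosen to make the toy arithmetic tip over; it is not a description of the PR's actual unit tests.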

Verification

  • Non-vacuous guard tests — two contrasting liveness unit tests: identical
    segments differing only in the resident register's class (high R8 → encoding flips
    → declines; low R2 → no flip → commits).
  • Differential (const_cse_differential.py) — flag-on values bit-identical to
    wasmtime across the corpus; new per-function no-regression gates on both the
    optimized and --relocatable paths.
  • Flag-off byte-identical — frozen gate 3/3, const_cse golden 2/2.
  • cargo test --workspace green (85 suites); fmt + clippy clean.

Honesty / scope

const-CSE stays flag-off (SYNTH_CONST_CSE). The pressure/size prerequisite for
the eventual default-on flip is now closed; alias-eviction remains the sole open
prerequisite. gale's exact gust_mix case is not yet reproduced in-tree — the
post-hoc pass is currently inert on the arithmetic --relocatable corpus, so that
gate is a tripwire that lights up when a triggering fixture (gust_mix.wat, requested
on #242) lands. The fix is structurally sound; this PR does not claim an empirical
fix-on-gale.

🤖 Generated with Claude Code

… size guard (#242)

gale's v0.17.0 burndown found SYNTH_CONST_CSE=1 GREW a tiny --relocatable
function (gust_mix 90→92 B). On --relocatable the optimized path's inline const
cache never runs (select_direct), so the post-hoc liveness::apply_const_cse acts
alone: it retargeted a use, kept a constant resident longer, and defeated a
downstream immediate-fold that would otherwise have absorbed the constant.

A remove-movw + rename-use post-pass on already-register-assigned instructions
cannot itself spill — it grows code only by changing what a later pass does. Two
fixes:

1. CSE-LAST: move the apply_const_cse call to run after every immediate-fold
   (fold_immediate_shifts / fold_uxth), before branch resolution. Foldable consts
   are already folded-and-gone, so CSE can no longer defeat a fold. This
   structurally eliminates gale's mechanism.

2. Per-segment SIZE GUARD in apply_const_cse: stage each segment's removals/
   retargets, estimate the rewritten segment via estimate_arm_byte_size (the #511
   encoder mirror), and commit only if it does not grow — so a retarget that flips
   a 16-bit ldr to its 32-bit form (low→high base register) is declined.

Verification:
- Two contrasting liveness unit tests prove the guard non-vacuous: identical
  segments differing only in the resident register's class (high R8 → encoding
  flips → declines; low R2 → no flip → commits).
- const_cse_differential.py: flag-on values bit-identical to wasmtime across the
  corpus; new per-function no-regression gates on BOTH the optimized and
  --relocatable paths (the latter is the path gale's bug lives on — currently
  inert on the arithmetic corpus, a tripwire for when gust_mix.wat lands).
- Flag-off byte-identical (frozen gate 3/3, const_cse golden 2/2).
- cargo test --workspace green (85 suites); fmt + clippy clean.

const-CSE stays flag-off (SYNTH_CONST_CSE). The pressure/size prerequisite for
the eventual default-on flip is now closed; alias-eviction remains the sole open
prerequisite. gale's exact gust_mix case is not yet reproduced in-tree — fixture
requested on #242 to pin the trigger.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 26, 2026


Codecov Report

❌ Patch coverage is 99.10714% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-backend/src/arm_backend.rs | 85.71% | 1 Missing ⚠️ |


@avrabe avrabe merged commit 2024817 into main Jun 26, 2026
24 checks passed
@avrabe avrabe deleted the vcr-ra-const-cse-no-regression-242 branch June 26, 2026 21:43
avrabe added a commit that referenced this pull request Jun 27, 2026
…t fires + stays correct on gale's path (#242) (#522)

The const-CSE no-regression fix (#519) ships with an honest gap: its
`--relocatable` gate was INERT. gale's gust_mix 90→92 B regression was on
`--relocatable`, which routes through `select_direct()` where only the post-hoc
`apply_const_cse` runs — but the `const_cse.wat` arithmetic corpus never makes
the direct selector emit the redundant same-value-in-two-registers shape that
pass dedups, so the gate gave zero positive evidence on the exact path the bug
lived on.

This adds `const_cse_direct.wat`: single-param, pure-register, reloc-free shapes
(a >8-bit const reused across several independent sub-expressions summed at the
end) that DO make the direct selector emit the redundant `movw`, so post-hoc
`apply_const_cse` fires on `--relocatable` (r1 44→40, r2 38→34). The differential
now runs a NON-VACUOUS direct-path gate that asserts:
  (a) CSE actually fires on >=1 function — fails if the gate goes blind;
  (b) no function grows (the no-regression property on gale's path);
  (c) every result is bit-identical to wasmtime under unicorn (correctness of
      post-hoc CSE on the direct selector's output).
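
The three assertions (a) to (c) can be sketched as a small check. This is a Rust rendering for illustration only: the actual gate lives in `const_cse_differential.py`, and the `Row` struct, its field names, and `direct_path_gate` are hypothetical. It also assumes that "CSE fires" can be observed as a function getting strictly smaller with the flag on.

```rust
// One function's measurements (all field names are hypothetical).
struct Row<'a> {
    name: &'a str,
    bytes_off: usize, // code size with SYNTH_CONST_CSE unset
    bytes_on: usize,  // code size with SYNTH_CONST_CSE=1
    got: u32,         // result produced by the compiled code
    want: u32,        // reference result
}

fn direct_path_gate(rows: &[Row<'_>]) -> Result<(), String> {
    // (a) Non-vacuous: at least one function must actually shrink.
    if !rows.iter().any(|r| r.bytes_on < r.bytes_off) {
        return Err("gate is blind: CSE fired on no function".to_string());
    }
    // (b) No regression: no function may grow with the flag on.
    if let Some(r) = rows.iter().find(|r| r.bytes_on > r.bytes_off) {
        return Err(format!("{} grew {} -> {} B", r.name, r.bytes_off, r.bytes_on));
    }
    // (c) Correctness: every result must match the reference bit for bit.
    if let Some(r) = rows.iter().find(|r| r.got != r.want) {
        return Err(format!("{} mismatch: got {:#x}, want {:#x}", r.name, r.got, r.want));
    }
    Ok(())
}
```

Checking (a) first is what keeps the gate from passing silently when the fixture stops triggering the pass; (b) and (c) alone would both hold trivially on an inert corpus.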

This is the positive evidence on gale's exact path that #519 could not provide.
Behavior-frozen: new fixture + harness only, no codegen change — frozen anchors
(control_step 0x00210A55, flight_algo 0x07FDF307) and the const_cse flag-off
golden are untouched (frozen gate 3/3, golden 2/2). The full differential passes
(exit 0); flag-off byte-identical.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>