feat(vcr-ra): frame-slot dead-store elimination behind SYNTH_STACK_FWD (flag-off) (#242)#515

Merged: avrabe merged 1 commit into main from vcr-ra-frame-slot-dce-242 on Jun 26, 2026

@avrabe avrabe commented Jun 26, 2026

What

Completes the stack-reload-forwarding lever: after forward_stack_reloads
(already built, behind SYNTH_STACK_FWD) turns a frame reload into a register
move, the original str rX,[sp,#N] is left behind as a dead store. This adds
eliminate_dead_frame_stores to remove it.
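
As a rough illustration of the first half of the lever, the sketch below models reload forwarding on a hypothetical mini-IR. The `Op` enum, the name `forward_reloads_sketch` and the clobber rules are assumptions made for illustration only; the real `forward_stack_reloads` works on the backend's own instruction type and may handle more cases.

```rust
use std::collections::HashMap;

/// Hypothetical mini-IR (not the backend's real instruction type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    /// `str rX, [sp, #off]`
    StrSp { src: u8, off: u32 },
    /// `ldr rY, [sp, #off]`
    LdrSp { dst: u8, off: u32 },
    /// `mov rY, rX`
    Mov { dst: u8, src: u8 },
    /// Any op this sketch does not model; all tracked facts are dropped.
    Other,
}

/// Rewrite a reload as a register move when a register is still known to
/// hold the slot's value. Stores are left in place, which is what leaves
/// a dead `str` behind for a later pass to remove.
fn forward_reloads_sketch(ops: &[Op]) -> Vec<Op> {
    // slot offset -> register currently known to hold that slot's value
    let mut known: HashMap<u32, u8> = HashMap::new();
    let mut out = Vec::with_capacity(ops.len());
    for &op in ops {
        match op {
            Op::StrSp { src, off } => {
                known.insert(off, src);
                out.push(op);
            }
            Op::LdrSp { dst, off } => {
                match known.get(&off).copied() {
                    Some(src) => out.push(Op::Mov { dst, src }),
                    None => out.push(op),
                }
                // `dst` was overwritten: forget other slots that lived in it.
                known.retain(|slot, reg| *reg != dst || *slot == off);
                // If nothing held this slot yet, `dst` does now.
                known.entry(off).or_insert(dst);
            }
            Op::Mov { dst, .. } => {
                known.retain(|_, reg| *reg != dst);
                out.push(op);
            }
            Op::Other => {
                known.clear();
                out.push(op);
            }
        }
    }
    out
}
```

Note that the store survives in the output even when its only reader has become a `mov`; removing it needs slot liveness rather than register liveness.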

This is part of the "optimize down toward native parity" push — the frame-resident
lowering (local.get→ldr [sp], local.set→str [sp]) is the dominant spurious
traffic gale measured on silicon (#209: flat_flight's spills fit in R0-R8 once
value-allocated).

eliminate_dead_stores can't catch these — a str defines no register, so its
register-def liveness is vacuous; slot liveness is what's needed.

Soundness — overwrite-only by design

A store is proven dead only by a later store to the same immediate slot
with no intervening read or aliasing op. Reaching the function end does not
count (the "abandoned at frame teardown" case needs epilogue reasoning we
deliberately skip — default is KEEP). The forward scan stops and keeps the store
at the first of:

  • a ldr from the slot,
  • ANY sp-relative register-offset access,
  • ANY sp-relative sub-word access (ldrb/ldrh/strb/… — could touch bytes inside the slot),
  • Push/Pop, a call (callee may read an outgoing stack-arg slot),
  • an SP redefinition, or any op reg_effect does not model (branch/label).

The pass rests on the documented load-bearing assumption that spill slots are
word-aligned and non-overlapping. Eight unit tests pin each boundary; the
sub-word and overwrite-at-end guards were verified failing before the fix
(non-vacuous) — closing the #483/#496/#507-class hole an advisor flagged before
it could reach the flip.
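
The overwrite-only rule can be sketched as a forward scan per store. This is a minimal model under stated assumptions, not the code in `liveness.rs`: the `Op` enum is hypothetical, and its single `Barrier` variant stands in for every stop condition listed above (register-offset or sub-word sp access, Push/Pop, calls, SP redefinition, branches and labels).

```rust
/// Hypothetical mini-IR (not the backend's real instruction type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    /// `str rX, [sp, #off]`, word-sized, immediate offset
    StrSp { src: u8, off: u32 },
    /// `ldr rY, [sp, #off]`, word-sized, immediate offset
    LdrSp { dst: u8, off: u32 },
    /// Register-only op; cannot touch the frame.
    Mov { dst: u8, src: u8 },
    /// Stand-in for anything that might read or alias a slot.
    Barrier,
}

/// A store is dead only if a later store to the same immediate slot is
/// reached with no intervening read or barrier.
fn store_is_dead(ops: &[Op], i: usize) -> bool {
    let Op::StrSp { off, .. } = ops[i] else {
        return false;
    };
    for &later in &ops[i + 1..] {
        match later {
            Op::StrSp { off: o, .. } if o == off => return true, // overwritten
            Op::LdrSp { off: o, .. } if o == off => return false, // read: keep
            Op::Barrier => return false,                          // unknown: keep
            _ => {} // other slots and register-only ops do not matter
        }
    }
    // Reaching the end of the function does not prove deadness: keep.
    false
}

/// Drop every store that the scan above proves dead.
fn eliminate_dead_frame_stores_sketch(ops: &[Op]) -> Vec<Op> {
    (0..ops.len())
        .filter(|&i| !store_is_dead(ops, i))
        .map(|i| ops[i])
        .collect()
}
```

Matching slots by offset equality alone is only sound because of the word-aligned, non-overlapping slot assumption; without it, a store to a neighbouring offset could partially overwrite the slot.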

Gating — DEFAULT-OFF, shipped behaviour unchanged

Both passes sit under SYNTH_STACK_FWD; off ⇒ byte-identical (no flag-off
code path changes; frozen gate green). The shipped behaviour of this PR is
unchanged: the win below is the flip's payoff, a separate silicon-gated step.

New CI oracle frame_slot_dce_differential.py EXECUTES the flag-ON build under
unicorn and asserts flight_algo's return matches wasmtime across several sensor
inputs (incl. the 0x07FDF307 frozen-anchor value), linear memory seeded as
wasmtime's; it also asserts the flag-off .text is byte-identical across builds.
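
The shape of such a differential check can be sketched as follows. The real oracle is a Python script driving unicorn and wasmtime; here both sides are plain closures, and the `u32 -> u32` signature, `Mismatch`, `differential_mismatches` and `text_is_byte_identical` are hypothetical names chosen for illustration.

```rust
/// One disagreement between the reference and the candidate build.
#[derive(Debug, PartialEq, Eq)]
struct Mismatch {
    input: u32,
    expected: u32,
    actual: u32,
}

/// Run both implementations on every input and collect disagreements.
/// An empty result means the candidate matched the reference everywhere.
fn differential_mismatches(
    inputs: &[u32],
    reference: impl Fn(u32) -> u32,
    candidate: impl Fn(u32) -> u32,
) -> Vec<Mismatch> {
    inputs
        .iter()
        .filter_map(|&input| {
            let expected = reference(input);
            let actual = candidate(input);
            (expected != actual).then(|| Mismatch { input, expected, actual })
        })
        .collect()
}

/// Flag-off guard: two builds must emit exactly the same `.text` bytes.
fn text_is_byte_identical(a: &[u8], b: &[u8]) -> bool {
    a == b
}
```

Executing the flag-on build and comparing bytes of the flag-off build answer different questions: the first checks that the optimization preserves results, the second that the default path did not move at all.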

Flip payoff (measured via objdump — NOT moved by this PR)

| metric (flat_flight flight_algo) | flag-off | flag-on |
|---|---|---|
| sp-relative memory ops | 20 | 7 (9 reloads → reg moves, 4 dead stores removed) |
| instructions | 139 | 135 |

VCR-RA / epic #242. Behaviour frozen on every shipped path.

…D (flag-off) (#242)

The optimized ARM path lowers wasm locals frame-resident (`local.get`→`ldr [sp]`,
`local.set`→`str [sp]`). On the silicon hot path most of that traffic is spurious
— gale #209 measured flat_flight's spills as fitting in R0-R8 once value-allocated.
`forward_stack_reloads` (already built, behind SYNTH_STACK_FWD) turns a reload
whose value is still in a register into a register move, but leaves the now-dead
`str` behind. This adds the missing completion: `eliminate_dead_frame_stores`
removes a `str rX,[sp,#N]` whose slot is overwritten before any read.

`eliminate_dead_stores` cannot catch these — a store defines no register, so its
register-def liveness is vacuous; slot liveness is what's needed.

SOUNDNESS — overwrite-only by design. A store is proven dead ONLY by a later
store to the SAME immediate slot with no intervening read or aliasing op.
Reaching the end of the function does NOT count (the "abandoned at frame
teardown" case needs epilogue reasoning we deliberately do not do — default is
KEEP). The forward scan stops, keeping the store, at the first of: a `ldr` from
the slot, ANY sp-relative register-offset access, ANY sp-relative SUB-WORD access
(ldrb/ldrh/strb/…, which could touch bytes inside the slot), Push/Pop, a call (a
callee may read an outgoing stack-arg slot), an SP redefinition, or any op
`reg_effect` does not model (branch/label). Rests on the load-bearing assumption
that spill slots are word-aligned and non-overlapping (documented at the pass).
8 unit tests pin each boundary (overwrite-removed, read-kept, distinct-slots,
call/SP-change/branch/sub-word/reg-offset all keep); the sub-word and
overwrite-at-end cases were verified failing before their guards (non-vacuous).

GATED DEFAULT-OFF. Both passes sit under `SYNTH_STACK_FWD`; off ⇒ byte-identical
(no flag-off code path changes; frozen gate green). The shipped behaviour of this
PR is therefore UNCHANGED — the win is the flip's payoff, a separate
silicon-gated step. New CI oracle `frame_slot_dce_differential.py` EXECUTES the
flag-ON build under unicorn and asserts flight_algo's return matches wasmtime
across several sensor inputs (incl. the 0x07FDF307 frozen-anchor value), with
linear memory seeded as wasmtime's; it also asserts the flag-off `.text` is
byte-identical across builds.

Flip payoff (measured, objdump, NOT shipped here): flat_flight `flight_algo`
sp-relative traffic 20→7 (9 reloads forwarded, 4 dead stores removed),
139→135 instructions.

VCR-RA / epic #242. Behaviour frozen on every shipped path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 26, 2026

Codecov Report

❌ Patch coverage is 91.85185% with 11 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-synthesis/src/liveness.rs | 95.31% | 6 Missing ⚠️ |
| crates/synth-backend/src/arm_backend.rs | 28.57% | 5 Missing ⚠️ |

@avrabe avrabe merged commit 43726f5 into main Jun 26, 2026
24 checks passed
@avrabe avrabe deleted the vcr-ra-frame-slot-dce-242 branch June 26, 2026 18:25
avrabe added a commit that referenced this pull request Jun 26, 2026
…t-on (#242) (#516)

The two paired frame-traffic passes shipped flag-off in #514/#515 —
`forward_stack_reloads` (a `local.set; local.get` reload becomes `mov rY,rX` when
rX still holds the value) and `eliminate_dead_frame_stores` (the now-dead
`str rX,[sp,#N]` whose slot is overwritten-before-read is removed) — go
DEFAULT-ON. Escape hatch: `SYNTH_NO_STACK_FWD=1` restores the frame-resident
bytes. Same gated path as the cmp→select (v0.13.0) and local-promotion (v0.14.0)
flips.

The win lands on the SHIPPED `--relocatable` path (the post-passes run on the
direct selector's output, which is what gale ships): flight_seam 774→738,
flight_seam_flat 910→878; control_step unchanged (no spurious slot reuse);
signed_div_const + all RV32 unchanged (ARM-only — verified m4 and m7 identically,
RISC-V byte-identical).

RESULTS bit-identical, proven on every frozen anchor: control_step 0x00210A55
(control_step_differential.py 13/13), flat AND inlined flight_algo 0x07FDF307
(flight_seam_differential.py MATCH both). The execution differential now runs
BOTH the default and the opt-out against wasmtime (a default flip is only safe if
the shipped path AND its rollback both match) and asserts the two emit different
bytes (flip engaged). Broad oracle: `cargo test --workspace` green under the new
default (the wast/spec suite is compile-only, so gale's G474RE silicon is the
broad-execution check). Clean-room verified (6/6 claims, independent harness).

GATING:
- Frozen goldens re-frozen (flight_seam, flight_seam_flat); control_step + RV32
  untouched. New `frozen_fixtures_stack_fwd_escape_hatch_restores_old_bytes`
  asserts `SYNTH_NO_STACK_FWD=1` restores the pre-flip bytes byte-for-byte — the
  rollback proof and a tripwire.
- CI oracle updated to test the default + the opt-out.
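
A default-on gate with an opt-out might look like the sketch below. How the backend actually parses `SYNTH_NO_STACK_FWD` is not shown in this commit message, so the function names and the rule that only the exact value `1` opts out are assumptions.

```rust
/// Hypothetical gate for the post-flip default: the passes run unless the
/// escape hatch is set.
fn stack_fwd_enabled(no_stack_fwd: Option<&str>) -> bool {
    // Assumption: only the exact value "1" opts out; unset or any other
    // value keeps the new default.
    no_stack_fwd != Some("1")
}

/// Reads the escape hatch from the process environment.
fn stack_fwd_enabled_from_env() -> bool {
    let value = std::env::var("SYNTH_NO_STACK_FWD").ok();
    stack_fwd_enabled(value.as_deref())
}
```

Keeping the decision in a pure function makes both sides of the flip easy to test without touching the process environment.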

This is an instruction/memory-op proxy win (flight_algo sp-traffic 20→7,
139→135 insns); the measured CYCLE number is gale's G474RE, confirmed post-ship
per the cmp→select silicon-gate-waiver precedent. const-CSE (SYNTH_CONST_CSE)
stays flag-off — its alias-eviction prerequisite is open and it is inert on
flat_flight.

VCR-RA / epic #242.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>