feat(vcr-ra): flip stack-reload-forwarding + frame-slot DCE to default-on (#242)#516

Merged
avrabe merged 1 commit into main from vcr-ra-stack-fwd-flip-242
Jun 26, 2026
avrabe (Contributor) commented Jun 26, 2026

What

Flips the two paired frame-traffic passes from #514/#515 (forward_stack_reloads
+ eliminate_dead_frame_stores) from opt-in to DEFAULT-ON. Escape hatch:
SYNTH_NO_STACK_FWD=1. This is the feature-loop step that turns the flag-off
levers into an actual delivered win, following the same gated path as cmp→select
(v0.13.0) and local-promotion (v0.14.0).

The win lands on the SHIPPED --relocatable path (the post-passes run on the
direct selector's output, which is what gale ships):

| fixture (--relocatable, cortex-m4) | before | after |
|---|---|---|
| flight_seam | 774 B | 738 B |
| flight_seam_flat | 910 B | 878 B |
| control_step | 304 B | 304 B (unchanged) |
| signed_div_const / all RV32 | unchanged | (ARM-only) |
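
For scale, the figures above work out to 36 B saved on flight_seam (about 4.7%) and 32 B on flight_seam_flat (about 3.5%). The arithmetic is sketched below; `size_delta` is a hypothetical helper written for this note, not something from the repository.

```rust
/// Bytes saved and the relative reduction in percent between two code sizes.
/// Hypothetical helper for illustration only.
pub fn size_delta(before: u32, after: u32) -> (i64, f64) {
    // Signed difference so that a regression would show up as a negative value.
    let saved = i64::from(before) - i64::from(after);
    let percent = 100.0 * saved as f64 / f64::from(before);
    (saved, percent)
}
```

Calling `size_delta(774, 738)` yields 36 bytes saved, and `size_delta(304, 304)` yields zero for the unchanged control_step fixture.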

Correctness — results bit-identical

| anchor | check |
|---|---|
| control_step 0x00210A55 | control_step_differential.py 13/13 |
| flat + inlined flight_algo 0x07FDF307 | flight_seam_differential.py MATCH both |

The execution differential now runs both the default and the SYNTH_NO_STACK_FWD
opt-out against wasmtime (a default flip is only safe if the shipped path and
its rollback both match) and asserts the two emit different bytes (flip
engaged). Broad oracle: cargo test --workspace green under the new default
(the wast/spec suite is compile-only, so gale's G474RE silicon is the
broad-execution check). Clean-room verified: 6/6 falsifiable claims
reproduced by an independent agent with its own harness (incl. confirming the flip
is ARM-wide m4+m7 and RISC-V byte-identical).
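
The three-way condition described above (default matches the reference, opt-out matches the reference, and the two builds differ in bytes) can be sketched as a single predicate. This is a minimal illustration only: `FlipCheck` and `check_flip` are hypothetical names, and the actual differential lives in the Python scripts named in the table, which are not reproduced here.

```rust
/// Outcome of the default-flip gate. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub enum FlipCheck {
    Pass,
    DefaultMismatch,
    OptOutMismatch,
    FlipNotEngaged,
}

/// `reference` stands in for the value produced by the reference runtime;
/// the two `*_result` values come from executing each build, and the two
/// byte slices are the emitted code of each build.
pub fn check_flip(
    reference: u32,
    default_result: u32,
    opt_out_result: u32,
    default_bytes: &[u8],
    opt_out_bytes: &[u8],
) -> FlipCheck {
    if default_result != reference {
        return FlipCheck::DefaultMismatch; // shipped path is wrong
    }
    if opt_out_result != reference {
        return FlipCheck::OptOutMismatch; // rollback path is wrong
    }
    if default_bytes == opt_out_bytes {
        return FlipCheck::FlipNotEngaged; // the passes changed nothing
    }
    FlipCheck::Pass
}
```

Checking the byte inequality last means a result mismatch is always reported first, which is the more serious failure.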

Soundness

The DCE is overwrite-only: a str [sp,#N] is dead only when a later store to the
same immediate slot overwrites it with no intervening read; reaching the function
end does not count. Sub-word sp accesses (ldrb/ldrh/…) act as blockers. That
closes the advisor-caught #483-class hole, and the accompanying test was verified
to fail pre-fix. 8 unit tests pin the boundaries.
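
The overwrite-only rule can be illustrated on a toy instruction list. The sketch below is an assumption-laden model, not the project's pass: the `Insn` enum is hypothetical, the code is treated as straight-line (no branches or calls), and slots are assumed to be distinct, non-overlapping word offsets.

```rust
/// Toy instruction model (hypothetical, not the project's real IR).
#[derive(Clone, Debug, PartialEq)]
pub enum Insn {
    /// Word store `str rX, [sp, #off]`.
    StrSp { src: u8, off: u32 },
    /// Word load `ldr rY, [sp, #off]`.
    LdrSp { dst: u8, off: u32 },
    /// Any sub-word sp access; always treated as a blocker.
    SubWordSp,
    /// Any instruction that does not touch the frame.
    Other,
}

/// Drops a word store only if a later word store to the same slot is
/// reached before any read of that slot and before any blocker.
pub fn eliminate_dead_frame_stores(code: &[Insn]) -> Vec<Insn> {
    let mut keep = vec![true; code.len()];
    for (i, insn) in code.iter().enumerate() {
        let Insn::StrSp { off, .. } = insn else { continue };
        for later in &code[i + 1..] {
            match later {
                // Overwritten before being read: the earlier store is dead.
                Insn::StrSp { off: o, .. } if o == off => {
                    keep[i] = false;
                    break;
                }
                // Read of the same slot: the store is live.
                Insn::LdrSp { off: o, .. } if o == off => break,
                // Sub-word access might touch the slot: stay conservative.
                Insn::SubWordSp => break,
                _ => {}
            }
        }
        // Falling off the end of the function leaves the store in place.
    }
    code.iter()
        .zip(keep)
        .filter(|(_, k)| *k)
        .map(|(insn, _)| insn.clone())
        .collect()
}
```

In this model a store followed only by unrelated instructions survives, because reaching the end of the list is never taken as proof of deadness.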

Gating

  • Frozen goldens re-frozen (flight_seam, flight_seam_flat); control_step + RV32
    untouched. New frozen_fixtures_stack_fwd_escape_hatch_restores_old_bytes
    asserts SYNTH_NO_STACK_FWD=1 restores the pre-flip bytes byte-for-byte — the
    rollback proof and a tripwire.
  • CI oracle updated to test the default + the opt-out.
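
A default-on pass with an environment-variable opt-out is typically gated along the following lines. This is a sketch under assumptions: only the variable name `SYNTH_NO_STACK_FWD` and the value `1` come from this PR; the function names and the choice to treat any other value as "not opted out" are hypothetical.

```rust
use std::env;

/// Pure decision function, kept separate so it can be tested without
/// touching the process environment. Assumption: only the exact value
/// "1" disables the passes.
pub fn stack_fwd_enabled_for(opt_out: Option<&str>) -> bool {
    opt_out != Some("1")
}

/// Reads the escape hatch from the environment; the passes run unless
/// SYNTH_NO_STACK_FWD=1 is set.
pub fn stack_fwd_enabled() -> bool {
    stack_fwd_enabled_for(env::var("SYNTH_NO_STACK_FWD").ok().as_deref())
}
```

Splitting the decision from the environment read keeps a rollback test deterministic: it can exercise both settings without mutating global process state.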

Honest framing

This is an instruction/memory-op proxy win (flight_algo sp-traffic 20→7,
139→135 insns). The measured cycle number is gale's G474RE, confirmed
post-ship per the cmp→select silicon-gate-waiver precedent — the merge is not
gated on silicon, the perf claim is. const-CSE stays flag-off (its
alias-eviction prerequisite is open and it is inert on flat_flight).

VCR-RA / epic #242.

…t-on (#242)

The two paired frame-traffic passes shipped flag-off in #514/#515 —
`forward_stack_reloads` (a `local.set; local.get` reload becomes `mov rY,rX` when
rX still holds the value) and `eliminate_dead_frame_stores` (the now-dead
`str rX,[sp,#N]` whose slot is overwritten-before-read is removed) — go
DEFAULT-ON. Escape hatch: `SYNTH_NO_STACK_FWD=1` restores the frame-resident
bytes. Same gated path as the cmp→select (v0.13.0) and local-promotion (v0.14.0)
flips.
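
The reload-forwarding idea described above can be sketched on a toy instruction list: remember which register was last stored to each slot, and rewrite a later load of that slot into a register move while that register is still intact. The `Insn` enum and the tracking scheme are hypothetical and assume word-sized, non-overlapping slots; the real pass is not shown in this PR.

```rust
use std::collections::HashMap;

/// Toy instruction model (hypothetical, not the project's real IR).
#[derive(Clone, Debug, PartialEq)]
pub enum Insn {
    StrSp { src: u8, off: u32 }, // str rX, [sp, #off]
    LdrSp { dst: u8, off: u32 }, // ldr rY, [sp, #off]
    Mov { dst: u8, src: u8 },    // mov rY, rX
    Def { dst: u8 },             // any other write to register dst
    Barrier,                     // label, branch, call, sub-word sp access
}

pub fn forward_stack_reloads(code: &[Insn]) -> Vec<Insn> {
    // slot offset -> register known to still hold the slot's value
    let mut holder: HashMap<u32, u8> = HashMap::new();
    let mut out = Vec::with_capacity(code.len());
    for insn in code {
        match *insn {
            Insn::StrSp { src, off } => {
                holder.insert(off, src);
                out.push(insn.clone());
            }
            Insn::LdrSp { dst, off } => match holder.get(&off).copied() {
                Some(src) => {
                    // The value is still in a register: reload becomes a move.
                    out.push(Insn::Mov { dst, src });
                    if src != dst {
                        holder.retain(|_, r| *r != dst);
                    }
                }
                None => {
                    out.push(insn.clone());
                    holder.retain(|_, r| *r != dst);
                }
            },
            Insn::Mov { dst, .. } | Insn::Def { dst } => {
                // dst is clobbered, so it no longer mirrors any slot.
                holder.retain(|_, r| *r != dst);
                out.push(insn.clone());
            }
            Insn::Barrier => {
                holder.clear(); // forget everything across control flow
                out.push(insn.clone());
            }
        }
    }
    out
}
```

The store itself is left in place here; removing it is the job of the separate dead-store pass, which is why the two are described as a pair.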

The win lands on the SHIPPED `--relocatable` path (the post-passes run on the
direct selector's output, which is what gale ships): flight_seam 774→738,
flight_seam_flat 910→878; control_step unchanged (no spurious slot reuse);
signed_div_const + all RV32 unchanged (ARM-only — verified m4 and m7 identically,
RISC-V byte-identical).

RESULTS bit-identical, proven on every frozen anchor: control_step 0x00210A55
(control_step_differential.py 13/13), flat AND inlined flight_algo 0x07FDF307
(flight_seam_differential.py MATCH both). The execution differential now runs
BOTH the default and the opt-out against wasmtime (a default flip is only safe if
the shipped path AND its rollback both match) and asserts the two emit different
bytes (flip engaged). Broad oracle: `cargo test --workspace` green under the new
default (the wast/spec suite is compile-only, so gale's G474RE silicon is the
broad-execution check). Clean-room verified (6/6 claims, independent harness).

GATING:
- Frozen goldens re-frozen (flight_seam, flight_seam_flat); control_step + RV32
  untouched. New `frozen_fixtures_stack_fwd_escape_hatch_restores_old_bytes`
  asserts `SYNTH_NO_STACK_FWD=1` restores the pre-flip bytes byte-for-byte — the
  rollback proof and a tripwire.
- CI oracle updated to test the default + the opt-out.

This is an instruction/memory-op proxy win (flight_algo sp-traffic 20→7,
139→135 insns); the measured CYCLE number is gale's G474RE, confirmed post-ship
per the cmp→select silicon-gate-waiver precedent. const-CSE (SYNTH_CONST_CSE)
stays flag-off — its alias-eviction prerequisite is open and it is inert on
flat_flight.

VCR-RA / epic #242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

avrabe merged commit b043101 into main Jun 26, 2026
24 checks passed
avrabe deleted the vcr-ra-stack-fwd-flip-242 branch June 26, 2026 19:11
avrabe added a commit that referenced this pull request Jun 26, 2026
#517)

Pin sweep (workspace + all intra-workspace path-deps + MODULE.bazel 0.16.0→0.17.0)
+ CHANGELOG for the SYNTH_STACK_FWD flip to default-on (PR #516, b043101).

The shipped --relocatable path now gets stack-reload forwarding + frame-slot
dead-store elimination by default: flight_seam 774→738 B, flight_seam_flat
910→878 B; control_step + all RISC-V unchanged. RESULTS bit-identical
(control_step 0x00210A55, flat+inlined flight_algo 0x07FDF307). Escape hatch
SYNTH_NO_STACK_FWD=1. Measured cycle win confirmed on G474RE post-ship.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>