
perf: many-webviews stress harness + Metal dst-texture cache + leak/fd robustness #11

Merged
wenkaifan0720 merged 5 commits into main from dig/perf-render on Jun 25, 2026

@wenkaifan0720

Collaborator

Summary

Performance hardening for the many-concurrent-webviews scenario, after an empirical stress audit. Key finding (verified): the engine is ALREADY smooth (0% jank) and leak-free at 12 concurrent webviews — so this PR is headroom + robustness, not a fix for observed jank. The #1 perf+leak risk is consumer wiring (Campus's ephemeral-host-per-tile), handed off separately.

Verified empirically (stress probe + sampler, added here)

  • 12 animating webviews on one shared host: ~6–8 ms frames, 0% jank, RSS flat ~2 GB.
  • create/dispose churn (×4 cycles): procs/RSS/fd return to baseline every cycle — no process/memory/fd leak.

Changes

  • Perf harness: example/lib/stress_probe.dart (N animating CefWebViews; shared/ephemeral/churn modes via --dart-define) + test/perf_sample.sh (cef_host procs/RSS/CPU + host-app fd sampler).
  • Metal dst-texture cache (cef_host): cache the GPU-blit destination MTLTexture per Slot instead of re-wrapping it every frame; recreate it only when the wrapped surface id changes; release it at every site that releases the surface, always under surface_mutex. Verified leak-free under churn. (Marginal CPU win at current scale: the bottleneck is the GPU waitUntilCompleted sync + uncoordinated pumps; see Deferred.)
  • Robustness: SIGKILL+reap a host still wedged after the death grace window (closes a zombie/orphan-host leak on the clean-shutdown race; removes the now-unused restoreSpawnedPid); raise soft RLIMIT_NOFILE at plugin registration (avoids EMFILE spawn failures with many agent-controlled tiles).
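For readers unfamiliar with the RLIMIT_NOFILE bump mentioned in the Robustness item, here is a minimal C++ sketch of raising the soft limit toward the hard cap. The helper name `RaiseSoftFdLimit` is hypothetical and this is not the PR's code (the plugin may do this from Swift). The `OPEN_MAX` clamp reflects the documented macOS behaviour that `setrlimit` rejects a soft RLIMIT_NOFILE above `OPEN_MAX`.

```cpp
#include <sys/resource.h>
#include <limits.h>

#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the PR): raise the soft RLIMIT_NOFILE toward
// the hard limit. Returns the soft limit in effect afterwards, or 0 if the
// current limits could not be read.
inline rlim_t RaiseSoftFdLimit() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;

  rlim_t target = lim.rlim_max;
#ifdef OPEN_MAX
  // macOS: a soft limit above OPEN_MAX is rejected with EINVAL, even when the
  // hard limit is RLIM_INFINITY.
  target = std::min<rlim_t>(target, OPEN_MAX);
#endif
  // Nothing to do if the cap is unbounded (no safe finite target known here)
  // or the soft limit is already at or above the target.
  if (target == RLIM_INFINITY || target <= lim.rlim_cur) return lim.rlim_cur;

  struct rlimit raised = lim;
  raised.rlim_cur = target;
  assert(raised.rlim_cur <= raised.rlim_max);
  if (setrlimit(RLIMIT_NOFILE, &raised) != 0) return lim.rlim_cur;  // keep old limit
  return target;
}
```

Calling it once at startup is enough; child processes inherit the raised soft limit.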

Deferred to a follow-up (by design)

The real render bottleneck is the per-frame GPU waitUntilCompleted sync + the N uncoordinated per-view 16 ms begin-frame pumps. The fix — a single CVDisplayLink driving all visible slots in phase + batching blits onto one command buffer/tick + idle-page pump backoff — is a delicate hot-path rework. Since the engine is already smooth, it's headroom for higher counts / 120 Hz and is best done as a dedicated, heavily-measured effort.

Verification

flutter analyze clean (package + sub-packages + example); example builds + renders; stress probe smooth + leak-free; CDP filter suite unaffected.

🤖 Generated with Claude Code

wenkaifan0720 and others added 5 commits June 24, 2026 16:50
…pler

example/lib/stress_probe.dart mounts a grid of N animating CefWebViews (shared or
ephemeral profile via --dart-define=CEF_EPHEMERAL, optional CEF_CHURN create/
dispose loop), reports Flutter frame timing, and offers +/- view controls.
test/perf_sample.sh samples cef_host process count + RSS + CPU + the host-app fd
count alongside it. Used to empirically verify smoothness + leak-freedom under
many concurrent webviews.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CompositeMetalLocked wrapped slot->surface as a fresh MTLTexture every paint (up to
60fps per visible browser). The dest surface is stable except on resize, so cache
the wrap on the Slot and recreate only when the wrapped IOSurface id changes,
released wherever surface is (OnBeforeClose / create-teardown / resize), all under
surface_mutex. Removes one of the two per-frame texture allocations. Verified
leak-free under create/dispose churn (procs/RSS return to baseline every cycle).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
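
The caching rule in the commit above (reuse the wrap while the surface id is unchanged, drop it wherever the surface is released) can be sketched in portable C++ as follows. All names are hypothetical: `Texture` stands in for the wrapped MTLTexture, `surface_id` for the IOSurface id, and the internal mutex for `surface_mutex`, which in the real code is presumably held by the caller. This is an illustration of the invalidation logic only, not the cef_host implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical per-slot cache: keeps one wrapped destination texture and
// re-wraps only when the backing surface id changes.
template <typename Texture>
class DstTextureCache {
 public:
  using WrapFn = std::function<std::shared_ptr<Texture>(uint32_t surface_id)>;

  explicit DstTextureCache(WrapFn wrap) : wrap_(std::move(wrap)) {
    assert(wrap_);
  }

  // Returns the cached wrap if it was made for `surface_id`; otherwise wraps
  // the new surface and remembers it.
  std::shared_ptr<Texture> Get(uint32_t surface_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!cached_ || cached_id_ != surface_id) {
      cached_ = wrap_(surface_id);
      cached_id_ = surface_id;
    }
    return cached_;
  }

  // Call at every site that releases the surface (close, teardown, resize),
  // so the cache never outlives the surface it wraps.
  void Reset() {
    std::lock_guard<std::mutex> lock(mutex_);
    cached_.reset();
    cached_id_ = 0;
  }

 private:
  std::mutex mutex_;
  WrapFn wrap_;
  std::shared_ptr<Texture> cached_;
  uint32_t cached_id_ = 0;
};
```

In steady state `Get` is a lock plus an integer compare per paint; the wrap callback runs only after a resize or a `Reset`.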
… scale

- handleHostDeath's reaper now SIGKILLs + reaps a child still wedged after the grace
  window, instead of handing the pid back for a later terminateProcess() the clean-
  shutdown path may never make — closing a zombie/orphan-host leak. Removes the
  now-unused restoreSpawnedPid.
- Raise the soft RLIMIT_NOFILE toward the hard cap at plugin registration: each
  cef_host costs several fds, so many agent-controlled tiles could approach a GUI
  app's default soft limit (often 256) and fail spawns with EMFILE.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
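
As an illustration of the "SIGKILL + reap after the grace window" behaviour described in the commit above, here is a minimal POSIX sketch. `ReapOrKill`, its polling interval, and its return convention are assumptions made for this example; the actual reaper in handleHostDeath is not shown in this PR page and may be structured differently.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <chrono>

// Hypothetical helper: give `child` up to `grace` to exit on its own; if it is
// still alive afterwards, SIGKILL it and block until it is reaped.
// Returns true if this call reaped the child.
inline bool ReapOrKill(pid_t child, std::chrono::milliseconds grace) {
  assert(child > 0);
  const auto deadline = std::chrono::steady_clock::now() + grace;
  int status = 0;

  while (std::chrono::steady_clock::now() < deadline) {
    const pid_t r = waitpid(child, &status, WNOHANG);
    if (r == child) return true;                  // exited and reaped
    if (r == -1 && errno != EINTR) return false;  // not our child / already reaped
    usleep(10 * 1000);                            // poll every 10 ms (arbitrary)
  }

  // Still wedged after the grace window: force-kill, then reap so no zombie
  // is left behind.
  kill(child, SIGKILL);
  pid_t r;
  do {
    r = waitpid(child, &status, 0);
  } while (r == -1 && errno == EINTR);
  return r == child;
}
```

Reaping inside the same call is what avoids depending on a later terminate step that the clean-shutdown path may never reach.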
…om crispness

Render many OSR webviews on one shared cef_host reliably and fast, and keep them
crisp when the canvas is zoomed in.

NEVER-BLANK. Creating N CEF browsers concurrently raced the first-frame GPU
shared-image allocation on the one shared GPU/Viz process — losers silently
Stop() -> permanent blank tile. Fix: the create-pacer now gates on first PAINT
(not the bind ack) and is a sliding-window semaphore (createInFlight + an
establishment window K, env FLUTTER_CEF_ESTAB_WINDOW, default 3). A patient
watchdog (firstPaintGrace ~10s) reports onPaintStalled as a REPEATING signal so a
consumer can do a bounded recreate; the per-slot begin-frame pump is liveness, so
a blank tile is merely slow and paints when resources free. removeBrowser/hide and
shutdown/host-death all free the establishment slot (no zombie-slot throttle).
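
The sliding-window pacing described above can be sketched as a small single-threaded C++ class. `CreatePacer` and its method names are hypothetical; the sketch only models the accounting stated in the text (at most K establishments in flight, a slot freed on first paint or on removal), not the watchdog or the real create path.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <unordered_set>
#include <utility>

// Hypothetical create-pacer: at most `window` browsers may be "establishing"
// (created but not yet painted) at once. Not thread-safe; call from one thread.
class CreatePacer {
 public:
  using StartFn = std::function<void(int browser_id)>;

  CreatePacer(std::size_t window, StartFn start)
      : window_(window), start_(std::move(start)) {
    assert(window_ > 0 && start_);
  }

  // Queue a create; it starts immediately if a slot is free.
  void RequestCreate(int id) {
    pending_.push_back(id);
    Drain();
  }

  // The slot is freed by the first paint, not by the bind ack.
  void OnFirstPaint(int id) { Release(id); }

  // Removal, hide, shutdown and host death must free the slot too, otherwise
  // a dead browser would throttle every later create.
  void OnRemoved(int id) {
    for (auto it = pending_.begin(); it != pending_.end(); ++it) {
      if (*it == id) {
        pending_.erase(it);
        break;
      }
    }
    Release(id);
  }

  std::size_t in_flight() const { return in_flight_.size(); }

 private:
  void Release(int id) {
    if (in_flight_.erase(id) > 0) Drain();  // duplicate signals are no-ops
  }

  void Drain() {
    while (in_flight_.size() < window_ && !pending_.empty()) {
      const int id = pending_.front();
      pending_.pop_front();
      in_flight_.insert(id);
      start_(id);
    }
  }

  std::size_t window_;
  StartFn start_;
  std::deque<int> pending_;
  std::unordered_set<int> in_flight_;
};
```

With a window of 3, five queued creates start as 3 + 1 + 1: each first paint or removal admits exactly one more.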

FASTER CASCADE. The K window overlaps establishments (~3x faster median AND
last-tile first-paint for 20 real sites vs strict serial). Renderer-priority flags
(--disable-renderer-backgrounding + --disable-backgrounding-occluded-windows,
opt out FLUTTER_CEF_KEEP_BG_THROTTLE) keep visible OSR renderers full-priority
(OSR has no OS window, so Chromium throttles them); ~halves first-paint.
about:blank-first is opt-in (FLUTTER_CEF_BLANK_FIRST).

CRISPNESS. dpr is now plumbed through the resize path so a canvas zoom re-renders
the OSR surface at the on-screen density (was: dropped at every layer below the
widget, so a zoomed-in tile upscaled a 1x texture -> blurry). CefWebView gains a
renderScale prop + resizes on dpr change; CefWebSession.dpr is mutable and
reallocates the IOSurface at logical*dpr; opResize carries dpr; cef_host updates
slot->dpr + NotifyScreenInfoChanged. Clamped to <=8 on every layer.
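
The sizing rule above (allocate at logical*dpr, with dpr clamped to at most 8) amounts to the following arithmetic. The function names, the fallback for invalid values, and the round-up to whole pixels are assumptions made for this sketch; only the upper clamp of 8 and the logical*dpr product come from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr double kMaxDpr = 8.0;  // upper clamp stated in the commit message

struct PixelSize {
  int width;
  int height;
};

// Hypothetical helper: clamp a device pixel ratio to (0, 8]. Non-positive or
// NaN input falls back to 1.0 (an assumption, not stated in the PR).
inline double ClampDpr(double dpr) {
  if (!(dpr > 0.0)) return 1.0;
  return std::min(dpr, kMaxDpr);
}

// Physical surface size for a logical size at a given dpr. Rounding up is an
// assumption so that the surface always covers the logical area.
inline PixelSize PhysicalSize(double logical_w, double logical_h, double dpr) {
  assert(logical_w >= 0.0 && logical_h >= 0.0);
  const double scale = ClampDpr(dpr);
  return {static_cast<int>(std::ceil(logical_w * scale)),
          static_cast<int>(std::ceil(logical_h * scale))};
}
```

For example, a 300x200 logical tile at dpr 2 maps to a 600x400 surface, and a requested dpr of 16 is treated as 8.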

CLEANUP. Removed the refuted experiments (kOpRecover/DoRecover resize-recovery,
the coordinated-pump A/B + its knobs, born-hidden, gpu-mem/watchdog/verbose
switches, dead Slot fields). Per-slot pump + the diag counters (FLUTTER_CEF_DEBUG)
are kept.

TESTS. Dart: renderScale->dpr override, dpr-only change re-resizes (crispness
regression), <=8 clamp. New asserting real-host probe test/run_cascade_probe.sh:
N concurrent tiles all reach a first frame (never-blank). Pre-merge adversarial
audit done; the create-pacer slot-accounting findings (M1/M2) are fixed here.
Design notes in specs/osr-many-views.md + specs/osr-ecosystem-survey.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Conflicts:
#	packages/flutter_cef_macos/macos/Classes/CefWebSession.swift
@wenkaifan0720 wenkaifan0720 merged commit c29b93f into main Jun 25, 2026
1 check passed