fix(session-broker): keep live sessions alive across machine sleep#492
Open
abgregs wants to merge 2 commits into
Open
fix(session-broker): keep live sessions alive across machine sleep#492abgregs wants to merge 2 commits into
abgregs wants to merge 2 commits into
Conversation
- Detect a wall-clock jump larger than the TTL between sweeps as a likely daemon freeze from sleep rather than every session going silent at once - Forgive that first post-wake sweep so live sessions can heartbeat again before the next normal sweep can reap them - Keep the grace per-sweep so genuinely gone sessions are still pruned Fixes modem-dev#311
Contributor
|
PR author is not in the allowed authors list. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
After the machine sleeps (or any long freeze) past the stale-session TTL, the daemon's first post-wake sweep reaps sessions that are still alive — silently severing an in-progress review from the agent/CLI driving it via
hunk session …. The Hunk window still renders, but becomes unreachable:session listdrops it andsession navigate/commentfail with "No active Hunk sessions."Root cause
pruneStaleSessionsmeasures staleness asnow - lastSeenAtagainst the TTL, which assumesnowadvances smoothly while the process runs. On wake, the wall clock has jumped forward by the whole sleep duration while each session'slastSeenAtis frozen (they couldn't heartbeat while suspended) — so every session looks stale at once and is pruned, though nothing actually died. A wall-clock-vs-elapsed-liveness bug.The fix
Track the time of the last prune (
lastPruneAt). If more than a full TTL of wall time has elapsed since it, the gap is almost certainly a frozen daemon (sleep), not real idle time — so forgive that one sweep and let sessions heartbeat again before the next normal sweep. Genuinely gone sessions are still reaped on the following sweep. Both prune call sites (the interval sweep and the/healthmaintenance pulse) funnel through this method and share the state, so both paths are covered.Testing
pmset sleepnow): before, the session drops fromhunk session liston wake; after, it persists.bun test,bun run typecheck,bun run lint, andbun run format:checkall pass.Known limitations / follow-up
This uses a minimal per-sweep grace — it skips the first prune after a detected jump, sufficient while the recurring sweep is the only frequent pruner (sessions re-heartbeat before the next sweep). Two narrow, currently-dormant edges are documented rather than handled:
/healthbecoming a polled endpoint), a prune could land in the post-wake recovery gap and reap a live session. The clean evolution is a time-windowed grace: on a detected jump, set a privategraceUntil = now + graceWindowMsand suppress reaping until it passes, so every pruner defers until sessions re-check-in. (Preferred over refreshinglastSeenAt, which would mutate a shared, outward-facing field.)Happy to open a follow-up issue/PR for the time-windowed grace if useful.
Fixes #311