
fix(session-broker): keep live sessions alive across machine sleep #492

Open
abgregs wants to merge 2 commits into modem-dev:main from abgregs:fix/session-survive-sleep

abgregs commented Jun 28, 2026

What

After the machine sleeps (or any long freeze) past the stale-session TTL, the daemon's first post-wake sweep reaps sessions that are still alive — silently severing an in-progress review from the agent/CLI driving it via hunk session …. The Hunk window still renders, but becomes unreachable: session list drops it and session navigate/comment fail with "No active Hunk sessions."

Root cause

pruneStaleSessions measures staleness as now - lastSeenAt against the TTL, which assumes now advances smoothly while the process runs. On wake, the wall clock has jumped forward by the whole sleep duration while each session's lastSeenAt is frozen (no heartbeat could land while the machine was suspended), so every session looks stale at once and is pruned even though nothing actually died. In short, the check treats elapsed wall-clock time as proof of lost liveness.
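A minimal sketch of that failure mode, for illustration only. The `Session` shape, the `isStale` helper, and the 60-second `TTL_MS` are assumptions made for this example, not the daemon's actual code or configuration:

```typescript
// Hypothetical names and values: Session, isStale, and TTL_MS are
// illustrative stand-ins, not the daemon's real API or configured TTL.
interface Session {
  id: string;
  lastSeenAt: number; // wall-clock ms of the last heartbeat
}

const TTL_MS = 60_000; // assumed TTL for this example

// Naive check: any wall-clock gap counts as real silence.
function isStale(session: Session, now: number, ttlMs: number = TTL_MS): boolean {
  return now - session.lastSeenAt > ttlMs;
}

// The session heartbeats at t = 0, then the machine sleeps for 5 minutes.
const session: Session = { id: "review-1", lastSeenAt: 0 };

// 10 s after the heartbeat, before sleep: well inside the TTL.
const staleBeforeSleep = isStale(session, 10_000);

// First sweep after wake: the clock jump alone exceeds the TTL,
// although the session never had a chance to heartbeat.
const staleAfterWake = isStale(session, 10_000 + 5 * 60_000);

console.log({ staleBeforeSleep, staleAfterWake });
```

Under these assumptions the session is live before the sleep and flagged stale on the first post-wake sweep, without having missed a heartbeat it could have sent.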

The fix

Track the time of the last prune (lastPruneAt). If more than a full TTL of wall time has elapsed since it, the gap is almost certainly a frozen daemon (sleep), not real idle time — so forgive that one sweep and let sessions heartbeat again before the next normal sweep. Genuinely gone sessions are still reaped on the following sweep. Both prune call sites (the interval sweep and the /health maintenance pulse) funnel through this method and share the state, so both paths are covered.
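The per-sweep grace described above could look roughly like the following. This is a sketch under stated assumptions, not the PR's diff: the `SessionRegistry` class, its `heartbeat` and `has` methods, the injected `Clock`, and the 60-second TTL are hypothetical; only `pruneStaleSessions`, `lastPruneAt`, and `lastSeenAt` are names taken from the description.

```typescript
// Sketch only: SessionRegistry, heartbeat, has, and Clock are hypothetical.
type Clock = () => number;

class SessionRegistry {
  private readonly sessions = new Map<string, number>(); // id -> lastSeenAt
  private readonly ttlMs: number;
  private readonly now: Clock;
  private lastPruneAt: number;

  constructor(ttlMs: number, now: Clock = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.lastPruneAt = now();
  }

  heartbeat(id: string): void {
    this.sessions.set(id, this.now());
  }

  has(id: string): boolean {
    return this.sessions.has(id);
  }

  // Returns the ids reaped by this sweep.
  pruneStaleSessions(): string[] {
    const now = this.now();
    const gapSinceLastPrune = now - this.lastPruneAt;
    this.lastPruneAt = now;

    // More than a full TTL between sweeps: most likely the daemon was
    // frozen, so forgive this one sweep and reap nothing.
    if (gapSinceLastPrune > this.ttlMs) return [];

    const reaped: string[] = [];
    for (const [id, lastSeenAt] of this.sessions) {
      if (now - lastSeenAt > this.ttlMs) {
        this.sessions.delete(id);
        reaped.push(id);
      }
    }
    return reaped;
  }
}

// Simulated timeline with a fake clock (all times in ms).
let t = 0;
const registry = new SessionRegistry(60_000, () => t);
registry.heartbeat("review-1");

t = 15_000;
registry.pruneStaleSessions(); // normal sweep, nothing is stale yet

t += 5 * 60_000; // machine sleeps for about 5 minutes
const reapedOnWake = registry.pruneStaleSessions(); // forgiven sweep

t += 15_000; // next normal sweep; the session never heartbeated again
const reapedNext = registry.pruneStaleSessions();
```

In this timeline the first post-wake sweep reaps nothing, while the still-silent session is reaped one sweep later, matching the behavior the description claims for the grace.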

Testing

  • Unit tests with an injected clock: a live session survives a simulated ~5-min jump on the first post-wake sweep, and a still-silent session is still reaped on the next normal sweep (the grace doesn't make sessions immortal).
  • Reproduced and verified end-to-end on macOS (pmset sleepnow): before, the session drops from hunk session list on wake; after, it persists.
  • bun test, bun run typecheck, bun run lint, and bun run format:check all pass.

Known limitations / follow-up

This uses a minimal per-sweep grace: it skips only the first prune after a detected jump. That is sufficient while the recurring sweep is the only frequent pruner, because sessions re-heartbeat before the next sweep. Two narrow, currently-dormant edges are documented rather than handled:

  • Per-sweep, not time-windowed. If a second frequent pruner were added (e.g. /health becoming a polled endpoint), a prune could land in the post-wake recovery gap and reap a live session. The clean evolution is a time-windowed grace: on a detected jump, set a private graceUntil = now + graceWindowMs and suppress reaping until it passes, so every pruner defers until sessions re-check-in. (Preferred over refreshing lastSeenAt, which would mutate a shared, outward-facing field.)
  • Short-freeze boundary. A freeze just under the TTL that still pushes a session past its TTL isn't forgiven — acceptable, since real sleeps are minutes and the band is narrow and phase-dependent.
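For reference, the time-windowed grace mentioned in the first bullet might be sketched as follows. This is a possible shape for the follow-up, not code from this PR: the `WindowedGraceRegistry` class, its helper methods, and the 60-second TTL and 30-second grace window are all hypothetical; `graceUntil` and `graceWindowMs` are the names proposed above.

```typescript
// Sketch of the proposed follow-up; class name, helpers, and values are hypothetical.
type Clock = () => number;

class WindowedGraceRegistry {
  private readonly sessions = new Map<string, number>(); // id -> lastSeenAt
  private readonly ttlMs: number;
  private readonly graceWindowMs: number;
  private readonly now: Clock;
  private lastPruneAt: number;
  private graceUntil = 0; // private: lastSeenAt is never rewritten

  constructor(ttlMs: number, graceWindowMs: number, now: Clock = Date.now) {
    this.ttlMs = ttlMs;
    this.graceWindowMs = graceWindowMs;
    this.now = now;
    this.lastPruneAt = now();
  }

  heartbeat(id: string): void {
    this.sessions.set(id, this.now());
  }

  has(id: string): boolean {
    return this.sessions.has(id);
  }

  pruneStaleSessions(): string[] {
    const now = this.now();
    // A gap longer than the TTL opens a grace window instead of
    // forgiving a single sweep.
    if (now - this.lastPruneAt > this.ttlMs) {
      this.graceUntil = now + this.graceWindowMs;
    }
    this.lastPruneAt = now;

    // Every pruner defers until the window has passed.
    if (now < this.graceUntil) return [];

    const reaped: string[] = [];
    for (const [id, lastSeenAt] of this.sessions) {
      if (now - lastSeenAt > this.ttlMs) {
        this.sessions.delete(id);
        reaped.push(id);
      }
    }
    return reaped;
  }
}

// Timeline: two pruners fire back to back right after wake.
let t = 0;
const registry = new WindowedGraceRegistry(60_000, 30_000, () => t);
registry.heartbeat("live");
registry.heartbeat("gone");

t = 15_000;
registry.pruneStaleSessions(); // normal sweep

t += 5 * 60_000; // sleep
const firstPruner = registry.pruneStaleSessions(); // opens the grace window
t += 1_000;
const secondPruner = registry.pruneStaleSessions(); // still inside the window

t += 4_000;
registry.heartbeat("live"); // the live session checks in again

t += 30_000; // the window has passed
const afterWindow = registry.pruneStaleSessions();
```

Here the second pruner, which a per-sweep grace would let through, is also suppressed; after the window only the session that never checked in again is reaped.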

Happy to open a follow-up issue/PR for the time-windowed grace if useful.

Fixes #311

- Detect a wall-clock jump larger than the TTL between sweeps as a likely
  daemon freeze from sleep rather than every session going silent at once
- Forgive that first post-wake sweep so live sessions can heartbeat again
  before the next normal sweep can reap them
- Keep the grace per-sweep so genuinely gone sessions are still pruned

Fixes modem-dev#311

greptile-apps Bot commented Jun 28, 2026

Contributor

PR author is not in the allowed authors list.

Development

Successfully merging this pull request may close these issues.

Daemon loses track of live sessions after laptop sleep / lid-close
