fix(runtime): prevent streaming-bridge deadlock on client disconnect (#482) #563

Merged: tejaskash merged 1 commit into main from worktree-fix-482-streaming-bridge-deadlock on Jul 2, 2026
tejaskash commented Jul 1, 2026

Summary

Closes #482

_async_gen_to_sync_gen runs its producer on the shared worker event loop and buffers items through a bounded queue.Queue(maxsize=100) using a blocking q.put(). When an SSE consumer stops draining (client disconnect or the 15-min read timeout), the queue fills and the blocking put freezes the worker-loop thread for the whole process: every other session on the microVM is starved (invocations hang with no logs, while /ping stays healthy).
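
The failure mode can be reproduced in isolation. The following is a minimal sketch, not the runtime's code: the helper names (`start_worker_loop`, `blocking_producer`, `loop_is_responsive`, `demo`) and the tiny queue size are assumptions made for illustration. It shows that a blocking `queue.Queue.put()` inside a coroutine parks the loop thread itself, so an unrelated coroutine on the same loop gets no chance to run until someone drains the queue.

```python
import asyncio
import concurrent.futures
import queue
import threading
import time


def start_worker_loop():
    """Run an event loop on a daemon thread, standing in for the shared worker loop."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def blocking_producer(q, n):
    for i in range(n):
        # queue.Queue.put() blocks the calling thread. Once q is full, that
        # thread is the loop thread, so every coroutine on the loop stalls.
        q.put(i)


async def ping():
    return "pong"


def loop_is_responsive(loop, timeout=0.5):
    """Return True if an unrelated coroutine completes on the loop in time."""
    fut = asyncio.run_coroutine_threadsafe(ping(), loop)
    try:
        return fut.result(timeout=timeout) == "pong"
    except concurrent.futures.TimeoutError:
        return False


def demo():
    loop = start_worker_loop()
    q = queue.Queue(maxsize=2)  # hypothetical small bound to fill quickly
    asyncio.run_coroutine_threadsafe(blocking_producer(q, 10), loop)
    time.sleep(0.1)  # give the producer time to fill the queue and block

    frozen = loop_is_responsive(loop)  # nobody is draining q here

    for _ in range(10):
        q.get(timeout=2)  # play the consumer: drain so the producer can finish

    recovered = loop_is_responsive(loop)
    loop.call_soon_threadsafe(loop.stop)
    return frozen, recovered


if __name__ == "__main__":
    print(demo())
```

In this sketch the first responsiveness check is expected to fail while the queue is full and the second to succeed once the queue has been drained, which mirrors the "invocations hang, then nothing recovers unless the consumer comes back" symptom.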

Fix

  1. Producer uses put_nowait() and await asyncio.sleep() on queue.Full, so it never blocks the shared loop (happy path keeps the fast path, no added latency).
  2. A stop event set in the consumer's finally (fires on GeneratorExit at disconnect) tears the orphaned producer down and frees a slot so a parked producer wakes; the source generator is aclose()d.
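
The two changes above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the merged implementation: the function name `async_gen_to_sync_gen`, its parameters (`agen`, `loop`, `maxsize`, `poll`), the `_DONE` sentinel, and the `offer` helper are all hypothetical. One simplification is worth naming: here the parked producer notices the stop flag by polling it between sleeps, whereas the PR describes freeing a queue slot to wake it.

```python
import asyncio
import queue
import threading

_DONE = object()  # hypothetical end-of-stream sentinel


def async_gen_to_sync_gen(agen, loop, maxsize=100, poll=0.01):
    """Bridge an async generator running on `loop` into a sync generator."""
    q = queue.Queue(maxsize=maxsize)
    stop = threading.Event()

    async def offer(item):
        # Non-blocking put: on queue.Full, yield the loop instead of parking
        # its thread. Gives up as soon as the consumer has gone away.
        while not stop.is_set():
            try:
                q.put_nowait(item)
                return True
            except queue.Full:
                await asyncio.sleep(poll)
        return False

    async def producer():
        try:
            async for item in agen:
                if not await offer(item):
                    break  # consumer disconnected; stop producing
        finally:
            await agen.aclose()  # run the source generator's cleanup
            await offer(_DONE)   # no-op if the consumer is already gone

    # The body of a generator function runs on the first next(), so the
    # producer is only scheduled once a consumer actually starts iterating.
    future = asyncio.run_coroutine_threadsafe(producer(), loop)
    try:
        while True:
            item = q.get()  # blocks the consumer's thread, not the loop
            if item is _DONE:
                break
            yield item
        future.result()  # re-raise any error from the source generator
    finally:
        # Reached on normal exhaustion and on GeneratorExit (close()).
        stop.set()
```

On the happy path `put_nowait()` succeeds immediately, so the sleep is only paid when the consumer is slower than the producer. The choice of a polling interval trades wake-up latency against idle wake-ups; the value used here is arbitrary.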

Testing

  • Unit: worker loop survives an abandoned stream; a second session survives the first's disconnect.
  • Integration: real uvicorn server, mid-stream disconnect over a socket → /ping healthy and fresh session gets 200.
  • Full runtime suite: 325 passed, ruff clean. Verified against a standalone repro with the built wheel.

github-actions Bot commented Jul 1, 2026

✅ No Breaking Changes Detected

No public API breaking changes found in this PR.

@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jul 1, 2026
@agentcore-devx-automation

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jul 1, 2026
@tejaskash tejaskash force-pushed the worktree-fix-482-streaming-bridge-deadlock branch from 7b87269 to ff641f5 Compare July 1, 2026 23:07
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jul 1, 2026
@agentcore-devx-automation

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jul 1, 2026
@tejaskash tejaskash force-pushed the worktree-fix-482-streaming-bridge-deadlock branch from ff641f5 to 8681acf Compare July 1, 2026 23:09
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jul 1, 2026
@agentcore-devx-automation

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jul 1, 2026
@tejaskash tejaskash force-pushed the worktree-fix-482-streaming-bridge-deadlock branch from 1a45b73 to 8bfcb21 Compare July 1, 2026 23:15
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jul 1, 2026
@agentcore-devx-automation

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jul 1, 2026
…482)

The async->sync streaming bridge (_async_gen_to_sync_gen) ran its producer
coroutine on the single shared worker event loop and used a blocking
queue.Queue.put(). When an SSE consumer stopped draining (client disconnect
or the 15-min read timeout), the bounded queue filled and the blocking put
froze the worker-loop thread process-wide, starving every other session's
handler on the microVM. Symptoms: invocations hang with no logs while /ping
stays healthy.

Fix:
- Producer now uses a non-blocking put_nowait and yields the loop via
  asyncio.sleep when the queue is full, so it never blocks the shared loop.
- A stop event, set in the consumer's finally (fires on GeneratorExit at
  client disconnect), tears the orphaned producer down and frees a queue slot
  so a parked producer wakes and exits. The source async generator is aclosed.

Adds unit tests (worker loop survives an abandoned stream; a second session
survives the first's disconnect) and a real-server integration test that
reproduces the disconnect over a full HTTP stack.
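
The teardown described in the commit message relies on a plain Python guarantee: closing a suspended generator raises GeneratorExit at its current yield, so a `finally` around the loop runs. A minimal sketch of that mechanism follows; the names `stream` and `demo_close` are hypothetical, and a list of strings stands in for the SSE chunks.

```python
import threading


def stream(chunks, stop):
    """Sync generator whose finally block signals teardown via `stop`."""
    try:
        for chunk in chunks:
            yield chunk
    finally:
        # Runs on normal exhaustion and when close() raises GeneratorExit
        # at the suspended yield above.
        stop.set()


def demo_close():
    started_stop = threading.Event()
    gen = stream(["a", "b", "c"], started_stop)
    first = next(gen)  # advance to the first yield
    gen.close()        # simulates the server dropping the response iterator

    unstarted_stop = threading.Event()
    never_started = stream(["a"], unstarted_stop)
    never_started.close()  # body never ran, so its finally does not run either

    return first, started_stop.is_set(), unstarted_stop.is_set()
```

The second case is the reason a bridge may want to defer starting its producer until the generator body first executes: a generator that is closed before its first `next()` never reaches its `finally`.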
@tejaskash tejaskash force-pushed the worktree-fix-482-streaming-bridge-deadlock branch from 8bfcb21 to 1e9504f Compare July 1, 2026 23:36
@agentcore-devx-automation

Claude Security Review: no high-confidence findings. (run)

Development

Successfully merging this pull request may close these issues.

[Bug] _async_gen_to_sync_gen deadlocks container when SSE consumer disconnects (queue blocks worker loop thread)