Skip to content

fix: avoid retaining a completed task per protocol message on Python 3.14 (#41357)#3114

Open
yen0304 wants to merge 1 commit into
microsoft:mainfrom
yen0304:fix/route-task-leak-py314
Open

fix: avoid retaining a completed task per protocol message on Python 3.14 (#41357)#3114
yen0304 wants to merge 1 commit into
microsoft:mainfrom
yen0304:fix/route-task-leak-py314

Conversation

@yen0304

@yen0304 yen0304 commented Jun 18, 2026

Copy link
Copy Markdown

Fixes microsoft/playwright#41357.

Summary

Registering context.route(...) / page.route(...) caused the async client to retain one completed asyncio.Task per intercepted request until the Connection closed (end of async_playwright()). Closing the page, context, or browser did not release them, so memory grew roughly linearly with the number of intercepted requests on a long-lived connection.

Root cause

Every protocol send in Channel._inner_send awaited the long-lived transport on_error_future alongside its per-message callback.future:

done, _ = await asyncio.wait(
    {self._connection._transport.on_error_future, callback.future},
    return_when=asyncio.FIRST_COMPLETED,
)

on_error_future is created once per connection (Transport.__init__) and only resolves when the transport errors/closes. On Python 3.14, asyncio added an await-graph (Future._asyncio_awaited_by) that keeps strong references between a task and the futures it awaits. Because each send runs in its own task (route handlers are dispatched as tasks) and awaits the never-resolving on_error_future, every completed send task stayed registered in on_error_future._asyncio_awaited_by and was only released when that future — and the whole Connection — went away.

This is interception-agnostic (a handler that continue_()s leaks the same as one that abort()s) because the leak is in the generic send path, which route handling exercises once per request.

Reproduction

Pinned down with gc.get_referrers; the retention chain is:

completed Channel.send Task  ->  set (_asyncio_awaited_by)  ->  on_error_future (Future)  ->  PipeTransport

Minimal, browserless confirmation (distinct task per wait against a long-lived future):

retained done tasks after 300 sends
Python 3.12 1
Python 3.14 300

Full repro from the issue (80 sub-resources/page, intercept-and-abort):

done tasks after browser.close()
3.14 before fix 961
3.14 after fix 1
3.12 (before/after) 1

The fix

Forward transport errors to the pending callback.future through a done callback instead of awaiting on_error_future directly, so each send's await graph stays local to its own short-lived future and is released as soon as the send completes. Behaviour is preserved:

  • a send still raises the transport error if the transport dies mid-flight (verified by force-killing the driver process mid-send: the call raises rather than hanging, on both 3.12 and 3.14);
  • the done callback is always removed in a finally block;
  • no double-resolution of callback.future (avoids the InvalidStateError that a naive set_exception forward would race with Connection.cleanup).

Test plan

  • New regression test test_route_should_not_leak_done_tasks — intercepts 100 requests across 5 contexts and asserts retained completed tasks stay bounded. Verified it fails on unpatched 3.14 (leaked 200) and passes after the fix (leaked 0); passes on 3.12 too.
  • Existing transport-error propagation still raises (no hang) on 3.12 and 3.14.
  • black / isort / flake8 clean on the changed files.

…3.14

Each protocol send awaited the long-lived transport `on_error_future` together with its
per-message callback future via `asyncio.wait(..., FIRST_COMPLETED)`. On Python 3.14 the new
asyncio await-graph keeps strong references between a task and the futures it awaits, so every
completed send task stayed registered in `on_error_future._asyncio_awaited_by` until that future
resolved, i.e. until the whole Connection closed. Intercepting requests (context.route / page.route)
spawns one such task per intercepted request, so memory grew linearly with the number of requests
and was not released by closing the page, context, or browser.

Forward transport errors to the pending callback future via a done callback instead of awaiting
`on_error_future` directly, so each send's await graph stays local to its own short-lived future.
The transport-error behaviour is preserved: a send still raises the transport error if the
transport dies mid-flight, and the callback is removed in a finally block.

Adds a regression test that intercepts many requests and asserts completed asyncio tasks are not
retained after the contexts are closed.
@mturac

mturac commented Jun 18, 2026

Copy link
Copy Markdown

Nice fix. I tried this branch locally against the #41357 shape on Python 3.14.4 / Chromium. It removes the accumulating Channel.send tasks, but I still see the last route-handler batch retained after context.close() and browser.close():

after ctx 1: 20 main.<locals>.h + init
after ctx 2: 20 main.<locals>.h + init
after ctx 3: 20 main.<locals>.h + init
after browser close: 20 main.<locals>.h + init

I think there is one more asyncio.wait() instance on the route path: Route._race_with_page_close() waits on [fut, target_closed_future], so on Python 3.14 the route-handler task can still be recorded in the page close future's await graph. Avoiding asyncio.wait() there as well dropped my local repro to just Connection.run.<locals>.init.

It may also be worth tightening the regression to count Channel.send and route handler tasks separately; the current < 50 threshold can still pass with one retained handler batch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: context.route retains completed Tasks until Connection close (memory leak)

2 participants