Thread previous_response_id through Tzafon and OpenAI CUA providers by rgarcia · Pull Request #46 · kernel/cua

rgarcia · 2026-06-27T22:40:58Z

Summary

Adds server-side response-id threading to cua's two Responses-API computer-use providers (Tzafon, OpenAI). When a prior assistant responseId exists, the request chains via previous_response_id + store: true and sends only the delta (the latest screenshot) instead of replaying the full screenshot history every turn.

Fixes Tzafon, which resent the whole screenshot history each turn, overflowed its real ~64K window after ~4 turns, and degraded to prose (no tool call). Also cuts per-turn payload growth for OpenAI.
Threading is on by default with an opt-out: CUA_DISABLE_RESPONSE_THREADING (env) or a per-call disableResponseThreading option.
Stateless providers (Anthropic, Gemini, Yutori) are untouched — they keep full-history replay, which is the correct fallback.

What changed (all in `packages/ai`)

providers/common.ts — responseThreadingEnabled() (capability + opt-out) and responseThreadingDelta() (shared pure util: scans for the most recent assistant responseId → { previousResponseId, deltaMessages }).
providers/tzafon/provider.ts — extracted a pure buildTzafonRequestInput() and added previous_response_id + store + delta threading, gated on the flag.
providers/openai/provider.ts (new) — cua's own streamOpenAIResponses under a new api openai-cua-responses, replacing the route through pi-ai's builtin openai-responses. Calls the OpenAI SDK directly and threads previous_response_id + store: true + delta via the shared util. The previous store:true payload hook is folded into the builder.
providers.ts / models.ts — register the new api and route all OpenAI CUA models (registry-resolved and overrides) to it; pi-ai's openai-responses builtin is left registered and untouched.
Tests — tzafon-threading.test.ts + openai-threading.test.ts lock the behavior: ON sends exactly 1 delta screenshot + previous_response_id + store:true; OFF replays every screenshot with no id (the failure mode). Plus routing + runtime-spec coverage.

Verification

tsc -b + the packages/ai vitest suite (99 tests) green.
Tzafon — live, on a real Online-Mind2Web task: threading OFF stalls at turn 5 (stopReason=stop, prose, no tool call) and ends mid-task; threading ON emits a tool call at that same turn 5 and runs to a task-complete summary. Payload proof: OFF resends a growing screenshot history (input items 4→8→12→16, no previous_response_id); ON sends one delta screenshot/turn (input items flat at 2) with a real previous_response_id + store:true.
OpenAI (gpt-5.5) — live, same task: 8 turns, every turn emits a tool call with no API error, confirming the function_call_output image shape is accepted and the threading path (responseId capture → previous_response_id) works end-to-end; uncached input tokens stay flat (~3–5K/turn) rather than ballooning with replayed screenshots.

Notes

The new OpenAI provider was reviewed adversarially against pi's codex provider (openai-codex-responses.ts) for delta/store correctness. The continuity source differs by design: codex derives it from a pooled WebSocket connection; here it comes from the persisted responseId that pi-agent-core round-trips back into context.messages.
These packages are consumed elsewhere via the published @onkernel/* versions, so threading reaches a given consumer once cua-ai is republished (or the consumer resolves the workspace build).

Test plan

tsc -b + packages/ai vitest green
Tzafon live before/after on a real Online-Mind2Web task
OpenAI live smoke (threading + screenshot path)
follow-up: benchmark re-run / cost re-estimate with threading on

Add a shared response-threading capability (responseThreadingEnabled with CUA_DISABLE_RESPONSE_THREADING opt-out) and a pure responseThreadingDelta util that finds the most recent assistant responseId and returns the messages after it. Refactor the Tzafon request building into a pure buildTzafonRequestInput that, when threading is enabled and a prior responseId exists, chains via previous_response_id with store:true and sends only the delta screenshot instead of replaying the full screenshot history that overflows the window. Covered by a failure-mode test locking the off-path full-history growth and asserting the on-path delta + previous_response_id + store. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…d threading Give cua's OpenAI computer-use path its own Responses stream function instead of routing through pi-ai's builtin openai-responses adapter. streamOpenAIResponses calls the openai SDK client.responses.create directly and reuses the shared response-threading capability flag and delta util from the Tzafon phase: when a prior assistant responseId exists it chains via previous_response_id + store:true and sends only the delta screenshot, otherwise it replays the full history. Register it under OPENAI_CUA_RESPONSES_API and route every OpenAI CUA model to it in getCuaModel, including registry-resolved gpt-5.4/gpt-5.5 families that otherwise carry pi-ai's builtin api. Fold the store:true onPayload into the builder; pi-ai's openai-responses builtin is left untouched. Preserves the existing computer-use behavior: function-tool calls, pixel coordinates, system prompt, and store:true. Covered by a pure-builder threading test mirroring the Tzafon one and a routing assertion locking the api override. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

rgarcia and others added 2 commits June 27, 2026 22:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Thread previous_response_id through Tzafon and OpenAI CUA providers#46

Thread previous_response_id through Tzafon and OpenAI CUA providers#46
rgarcia wants to merge 2 commits into
mainfrom
hypeship/cua-prev-response-id

rgarcia commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

rgarcia commented Jun 27, 2026

Summary

What changed (all in packages/ai)

Verification

Notes

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

What changed (all in `packages/ai`)