Thread previous_response_id through Tzafon and OpenAI CUA providers #46
Draft
rgarcia wants to merge 2 commits into
Add a shared response-threading capability (responseThreadingEnabled with CUA_DISABLE_RESPONSE_THREADING opt-out) and a pure responseThreadingDelta util that finds the most recent assistant responseId and returns the messages after it. Refactor the Tzafon request building into a pure buildTzafonRequestInput that, when threading is enabled and a prior responseId exists, chains via previous_response_id with store:true and sends only the delta screenshot instead of replaying the full screenshot history that overflows the window. Covered by a failure-mode test locking the off-path full-history growth and asserting the on-path delta + previous_response_id + store.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…d threading

Give cua's OpenAI computer-use path its own Responses stream function instead of routing through pi-ai's builtin openai-responses adapter. streamOpenAIResponses calls the openai SDK client.responses.create directly and reuses the shared response-threading capability flag and delta util from the Tzafon phase: when a prior assistant responseId exists it chains via previous_response_id + store:true and sends only the delta screenshot, otherwise it replays the full history.

Register it under OPENAI_CUA_RESPONSES_API and route every OpenAI CUA model to it in getCuaModel, including registry-resolved gpt-5.4/gpt-5.5 families that otherwise carry pi-ai's builtin api. Fold the store:true onPayload into the builder; pi-ai's openai-responses builtin is left untouched. Preserves the existing computer-use behavior: function-tool calls, pixel coordinates, system prompt, and store:true. Covered by a pure-builder threading test mirroring the Tzafon one and a routing assertion locking the api override.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
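The two commits above both lean on the shared flag and delta util. As a rough illustration only, the behaviour they describe could look something like the sketch below. The `CuaMessage` shape, the env-as-parameter signature, and the accepted truthy values for `CUA_DISABLE_RESPONSE_THREADING` are assumptions made for this example; the PR text does not show the real signatures in `providers/common.ts`.

```typescript
// Hypothetical message shape; the real cua / pi-ai types are not shown in the PR text.
interface CuaMessage {
  role: "user" | "assistant" | "toolResult";
  content: string;
  responseId?: string; // assumed to be set on assistant messages that came back from a Responses API call
}

interface ThreadingDelta {
  previousResponseId?: string;
  deltaMessages: CuaMessage[];
}

// Assumption: threading is on by default and is switched off either per call
// or by setting the env var to "1" or "true". The env is passed in explicitly
// so the sketch stays runtime-agnostic.
function responseThreadingEnabled(
  env: Record<string, string | undefined>,
  options: { disableResponseThreading?: boolean } = {},
): boolean {
  if (options.disableResponseThreading) return false;
  const flag = env["CUA_DISABLE_RESPONSE_THREADING"];
  return !(flag === "1" || flag === "true");
}

// Scan backwards for the most recent assistant message carrying a responseId
// and return it together with every message that follows it.
function responseThreadingDelta(messages: CuaMessage[]): ThreadingDelta {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m?.role === "assistant" && m.responseId) {
      return { previousResponseId: m.responseId, deltaMessages: messages.slice(i + 1) };
    }
  }
  // No prior responseId: nothing to chain to, so the caller replays everything.
  return { deltaMessages: messages };
}
```

Keeping the util pure (messages in, delta out) is what lets both providers share it and lets the threading tests assert on the built payload without a network call.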
Summary
Adds server-side response-id threading to cua's two Responses-API computer-use providers (Tzafon, OpenAI). When a prior assistant `responseId` exists, the request chains via `previous_response_id` + `store: true` and sends only the delta (the latest screenshot) instead of replaying the full screenshot history every turn. Opt out via `CUA_DISABLE_RESPONSE_THREADING` (env) or a per-call `disableResponseThreading` option.

What changed (all in `packages/ai`)

- `providers/common.ts`: `responseThreadingEnabled()` (capability + opt-out) and `responseThreadingDelta()` (shared pure util: scans for the most recent assistant `responseId` → `{ previousResponseId, deltaMessages }`).
- `providers/tzafon/provider.ts`: extracted a pure `buildTzafonRequestInput()` and added `previous_response_id` + `store` + delta threading, gated on the flag.
- `providers/openai/provider.ts` (new): cua's own `streamOpenAIResponses` under a new api `openai-cua-responses`, replacing the route through pi-ai's builtin `openai-responses`. Calls the OpenAI SDK directly and threads `previous_response_id` + `store: true` + delta via the shared util. The previous `store:true` payload hook is folded into the builder.
- `providers.ts` / `models.ts`: register the new api and route all OpenAI CUA models (registry-resolved and overrides) to it; pi-ai's `openai-responses` builtin is left registered and untouched.
- `tzafon-threading.test.ts` + `openai-threading.test.ts` lock the behavior: ON sends exactly 1 delta screenshot + `previous_response_id` + `store:true`; OFF replays every screenshot with no id (the failure mode). Plus routing + runtime-spec coverage.

Verification

- `tsc -b` + the `packages/ai` vitest suite (99 tests) green.
- … `stopReason=stop`, prose, no tool call) and ends mid-task; threading ON emits a tool call at that same turn 5 and runs to a task-complete summary. Payload proof: OFF resends a growing screenshot history (input items 4→8→12→16, no `previous_response_id`); ON sends one delta screenshot/turn (input items flat at 2) with a real `previous_response_id` + `store:true`.
- … `function_call_output` image shape is accepted and the threading path (`responseId` capture → `previous_response_id`) works end-to-end; uncached input tokens stay flat (~3–5K/turn) rather than ballooning with replayed screenshots.

Notes

- … `openai-codex-responses.ts`) for delta/store correctness. The continuity source differs by design: codex derives it from a pooled WebSocket connection; here it comes from the persisted `responseId` that pi-agent-core round-trips back into `context.messages`.
- … `@onkernel/*` versions, so threading reaches a given consumer once cua-ai is republished (or the consumer resolves the workspace build).

Test plan

- `tsc -b` + `packages/ai` vitest green
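
To make the OFF vs ON payload difference from the Verification section concrete, here is a small self-contained simulation. It is a sketch under stated assumptions, not the PR's builder: `buildRequestInput`, the `Msg` union, and the one-screenshot-per-turn loop are hypothetical, and the counts it prints are screenshots per request in this toy model, not the input-item counts measured in the PR. Only `previous_response_id`, `store`, and `input` are taken from the description above.

```typescript
// Hypothetical, simplified message union for the simulation.
type Msg =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; responseId?: string }
  | { role: "toolResult"; screenshot: string };

interface RequestPayload {
  input: Msg[];
  store: boolean;
  previous_response_id?: string;
}

// Assumed builder: with threading on and a prior responseId available, chain to
// it and send only the messages after it; otherwise replay the whole history.
// store is true on both paths here, following the OpenAI path described above.
function buildRequestInput(messages: Msg[], threadingEnabled: boolean): RequestPayload {
  if (threadingEnabled) {
    for (let i = messages.length - 1; i >= 0; i--) {
      const m = messages[i];
      if (m?.role === "assistant" && m.responseId) {
        return { input: messages.slice(i + 1), store: true, previous_response_id: m.responseId };
      }
    }
  }
  return { input: messages, store: true };
}

function countScreenshots(payload: RequestPayload): number {
  return payload.input.filter((m) => m.role === "toolResult").length;
}

// Simulate four turns: each turn the model acts and a new screenshot comes back.
const transcript: Msg[] = [{ role: "user", text: "open settings" }];
const screenshotsOff: number[] = [];
const screenshotsOn: number[] = [];
for (let turn = 1; turn <= 4; turn++) {
  transcript.push({ role: "assistant", text: "click", responseId: `resp_${turn}` });
  transcript.push({ role: "toolResult", screenshot: `png-${turn}` });
  screenshotsOff.push(countScreenshots(buildRequestInput(transcript, false)));
  screenshotsOn.push(countScreenshots(buildRequestInput(transcript, true)));
}

console.log(screenshotsOff); // [1, 2, 3, 4]: history grows every turn
console.log(screenshotsOn);  // [1, 1, 1, 1]: one delta screenshot per turn
```

The OFF path grows linearly with the number of turns, which is the failure mode the threading tests lock in; the ON path stays flat because the server already holds the earlier turns under the chained response id.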