Thread previous_response_id through Tzafon and OpenAI CUA providers #46

Draft
rgarcia wants to merge 2 commits into main from hypeship/cua-prev-response-id


@rgarcia rgarcia commented Jun 27, 2026

Contributor

Summary

Adds server-side response-id threading to cua's two Responses-API computer-use providers (Tzafon, OpenAI). When a prior assistant responseId exists, the request chains via previous_response_id + store: true and sends only the delta (the latest screenshot) instead of replaying the full screenshot history every turn.

  • Fixes Tzafon, which resent the whole screenshot history each turn, overflowed its real ~64K window after ~4 turns, and degraded to prose (no tool call). Also cuts per-turn payload growth for OpenAI.
  • Threading is on by default with an opt-out: CUA_DISABLE_RESPONSE_THREADING (env) or a per-call disableResponseThreading option.
  • Stateless providers (Anthropic, Gemini, Yutori) are untouched — they keep full-history replay, which is the correct fallback.
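A minimal sketch of how the default-on gate with its two opt-outs could look. The signature, the `ThreadingOptions` and `Env` types, and the rule for which env values count as "disabled" are assumptions for illustration, not the code in this PR; only the names `responseThreadingEnabled`, `CUA_DISABLE_RESPONSE_THREADING`, and `disableResponseThreading` come from the description above.

```typescript
// Hypothetical shapes: the real signature in providers/common.ts may differ.
interface ThreadingOptions {
  // Per-call opt-out named in the PR description.
  disableResponseThreading?: boolean;
}

// The environment is injected so the function stays pure and testable.
type Env = Record<string, string | undefined>;

// Returns true when a threading-capable provider should chain via
// previous_response_id. Assumption: any non-empty env value other than
// "0" or "false" disables threading.
function responseThreadingEnabled(
  providerSupportsThreading: boolean,
  options: ThreadingOptions = {},
  env: Env = {},
): boolean {
  // Stateless providers always fall back to full-history replay.
  if (!providerSupportsThreading) return false;
  // The per-call option wins over the default.
  if (options.disableResponseThreading) return false;
  const flag = (env["CUA_DISABLE_RESPONSE_THREADING"] ?? "").trim().toLowerCase();
  const envDisabled = flag !== "" && flag !== "0" && flag !== "false";
  return !envDisabled;
}
```

In a Node process the caller would pass `process.env` as the third argument; keeping it a parameter avoids reading global state inside the helper.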

What changed (all in packages/ai)

  • providers/common.ts: responseThreadingEnabled() (capability + opt-out) and responseThreadingDelta() (shared pure util: scans for the most recent assistant responseId → { previousResponseId, deltaMessages }).
  • providers/tzafon/provider.ts — extracted a pure buildTzafonRequestInput() and added previous_response_id + store + delta threading, gated on the flag.
  • providers/openai/provider.ts (new) — cua's own streamOpenAIResponses under a new api openai-cua-responses, replacing the route through pi-ai's builtin openai-responses. Calls the OpenAI SDK directly and threads previous_response_id + store: true + delta via the shared util. The previous store:true payload hook is folded into the builder.
  • providers.ts / models.ts — register the new api and route all OpenAI CUA models (registry-resolved and overrides) to it; pi-ai's openai-responses builtin is left registered and untouched.
  • Tests — tzafon-threading.test.ts + openai-threading.test.ts lock the behavior: ON sends exactly 1 delta screenshot + previous_response_id + store:true; OFF replays every screenshot with no id (the failure mode). Plus routing + runtime-spec coverage.
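The delta util described above can be sketched as a backwards scan over the message list. The `Message` shape here is a simplified assumption (the real pi-agent-core messages carry more fields), and this is an illustration of the technique rather than the implementation in providers/common.ts.

```typescript
// Hypothetical, simplified message shape for illustration only.
interface Message {
  role: "user" | "assistant" | "toolResult";
  responseId?: string; // present on assistant messages that came from a stored response
  content: string;
}

interface ThreadingDelta {
  previousResponseId?: string;
  deltaMessages: Message[];
}

// Finds the most recent assistant message carrying a responseId and returns
// that id plus every message after it. With no prior id, the whole history
// is the delta, which is the full-replay fallback.
function responseThreadingDelta(messages: Message[]): ThreadingDelta {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m && m.role === "assistant" && m.responseId) {
      return {
        previousResponseId: m.responseId,
        deltaMessages: messages.slice(i + 1),
      };
    }
  }
  return { deltaMessages: messages };
}
```

Because the function is pure, a test can pin both paths: a history ending in a tool result after an assistant turn yields a one-message delta, and a history with no responseId yields the full list.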

Verification

  • tsc -b + the packages/ai vitest suite (99 tests) green.
  • Tzafon — live, on a real Online-Mind2Web task: threading OFF stalls at turn 5 (stopReason=stop, prose, no tool call) and ends mid-task; threading ON emits a tool call at that same turn 5 and runs to a task-complete summary. Payload proof: OFF resends a growing screenshot history (input items 4→8→12→16, no previous_response_id); ON sends one delta screenshot/turn (input items flat at 2) with a real previous_response_id + store:true.
  • OpenAI (gpt-5.5) — live, same task: 8 turns, every turn emits a tool call with no API error, confirming the function_call_output image shape is accepted and the threading path (responseId capture → previous_response_id) works end-to-end; uncached input tokens stay flat (~3–5K/turn) rather than ballooning with replayed screenshots.
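The payload counts in the Tzafon bullet can be reproduced with a toy model. The values 4 items per turn (OFF) and 2 items per turn (ON) are read off the reported numbers; the linear-growth assumption and the function name `inputItemsPerTurn` are illustrative and not taken from the test suite.

```typescript
// Toy model of request size per turn, not a measurement.
// OFF: every turn replays all earlier items, so the count grows linearly.
// ON: every threaded turn sends only the delta, so the count stays flat.
function inputItemsPerTurn(
  turns: number,
  itemsAddedPerTurn: number, // 4 in the reported OFF run
  deltaItems: number, // 2 in the reported ON run
  threading: boolean,
): number[] {
  const counts: number[] = [];
  for (let turn = 1; turn <= turns; turn++) {
    counts.push(threading ? deltaItems : turn * itemsAddedPerTurn);
  }
  return counts;
}
```

Under these assumptions, `inputItemsPerTurn(4, 4, 2, false)` gives `[4, 8, 12, 16]` and `inputItemsPerTurn(4, 4, 2, true)` gives `[2, 2, 2, 2]`, matching the two series quoted above.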

Notes

  • The new OpenAI provider was reviewed adversarially against pi's codex provider (openai-codex-responses.ts) for delta/store correctness. The continuity source differs by design: codex derives it from a pooled WebSocket connection; here it comes from the persisted responseId that pi-agent-core round-trips back into context.messages.
  • These packages are consumed elsewhere via the published @onkernel/* versions, so threading reaches a given consumer once cua-ai is republished (or the consumer resolves the workspace build).

Test plan

  • tsc -b + packages/ai vitest green
  • Tzafon live before/after on a real Online-Mind2Web task
  • OpenAI live smoke (threading + screenshot path)
  • follow-up: benchmark re-run / cost re-estimate with threading on

rgarcia and others added 2 commits June 27, 2026 22:08
Add a shared response-threading capability (responseThreadingEnabled with
CUA_DISABLE_RESPONSE_THREADING opt-out) and a pure responseThreadingDelta util
that finds the most recent assistant responseId and returns the messages after
it. Refactor the Tzafon request building into a pure buildTzafonRequestInput
that, when threading is enabled and a prior responseId exists, chains via
previous_response_id with store:true and sends only the delta screenshot
instead of replaying the full screenshot history that overflows the window.

Covered by a failure-mode test locking the off-path full-history growth and
asserting the on-path delta + previous_response_id + store.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…d threading

Give cua's OpenAI computer-use path its own Responses stream function instead of
routing through pi-ai's builtin openai-responses adapter. streamOpenAIResponses
calls the openai SDK client.responses.create directly and reuses the shared
response-threading capability flag and delta util from the Tzafon phase: when a
prior assistant responseId exists it chains via previous_response_id + store:true
and sends only the delta screenshot, otherwise it replays the full history.

Register it under OPENAI_CUA_RESPONSES_API and route every OpenAI CUA model to it
in getCuaModel, including registry-resolved gpt-5.4/gpt-5.5 families that
otherwise carry pi-ai's builtin api. Fold the store:true onPayload into the
builder; pi-ai's openai-responses builtin is left untouched. Preserves the
existing computer-use behavior: function-tool calls, pixel coordinates, system
prompt, and store:true.

Covered by a pure-builder threading test mirroring the Tzafon one and a routing
assertion locking the api override.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
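As a rough illustration of the builder behavior both commits describe, the sketch below chains via `previous_response_id` + `store: true` when a prior responseId exists and otherwise replays the full history. The `Turn` and `RequestSketch` types and the name `buildRequestSketch` are hypothetical; real Responses API input items have a richer shape (for example `function_call_output` items), and setting `store: true` on the first threaded turn is an assumption made so later turns have something to chain to.

```typescript
// Hypothetical, simplified turn and request shapes for illustration only.
interface Turn {
  role: "user" | "assistant" | "toolResult";
  responseId?: string; // set on assistant turns that came from a stored response
  screenshot?: string; // base64 image payload, if the turn carries one
  text?: string;
}

interface RequestSketch {
  model: string;
  input: Turn[];
  store?: boolean;
  previous_response_id?: string;
}

function buildRequestSketch(
  model: string,
  history: Turn[],
  threading: boolean,
): RequestSketch {
  if (threading) {
    // Scan backwards for the most recent assistant turn with a responseId.
    for (let i = history.length - 1; i >= 0; i--) {
      const turn = history[i];
      if (turn && turn.role === "assistant" && turn.responseId) {
        return {
          model,
          input: history.slice(i + 1), // delta only
          store: true,
          previous_response_id: turn.responseId,
        };
      }
    }
    // Threading on but nothing to chain to yet: full history, stored (assumption).
    return { model, input: history, store: true };
  }
  // Threading off: full-history replay with no id.
  return { model, input: history };
}
```

Keeping the builder free of SDK calls is what lets a test assert the exact payload: with threading on, one screenshot plus the id; with threading off, every screenshot and no id.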
