Describe the bug
Sometimes, when using Zoo Code with a local OpenAI-compatible provider (an SGLang endpoint serving a local model), prompt condensing fails with a 400 Bad Request on the chat completions endpoint.
To Reproduce
Steps to reproduce the behavior:
- Open a workspace
- Open Zoo Code and send a request that requires the AI to read a large amount of code
- The context fills up and context condensing begins
- Error 400 appears
- The agent cannot complete the task because the chat history keeps being truncated
Expected behavior
Condensing context runs successfully
Screenshots
What version of zoo are you running
Version: 3.62.0 (40660f1)
Additional context
Logs from SGLang only indicate a bad request:
[2026-06-25 10:10:33] Decode batch, #running-req: 1, #full token: 51338, full token usage: 0.66, mamba num: 2, mamba usage: 0.07, cuda graph: True, gen throughput (token/s): 35.49, #queue-req: 0
[2026-06-25 10:10:34] INFO: 10.1.0.105:50783 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Note: this started about a week ago. Before that it worked perfectly, and I did not change anything on the SGLang side.
Note 2: When this happens, starting a new chat (task) with the exact same prompt can fix the issue, and condensing runs as normal.
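Since the server log only shows the status line, the response body of the failing request would probably say more about why it was rejected. Below is a minimal sketch of how that body could be captured by replaying a request against the endpoint by hand. The helper names `extract_error_message` and `post_chat_completion` are hypothetical, the base URL and payload are placeholders, and the error-body shapes handled here (an OpenAI-style `{"error": {"message": ...}}` object or a top-level `"message"` field) are assumptions, not something confirmed from SGLang's output.

```python
import json
import urllib.error
import urllib.request


def extract_error_message(body: str) -> str:
    """Return a readable message from an error response body.

    Assumed shapes: {"error": {"message": ...}} or {"message": ...}.
    Anything else is returned as the raw (stripped) text.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return body.strip()
    if isinstance(data, dict):
        err = data.get("error")
        if isinstance(err, dict) and "message" in err:
            return str(err["message"])
        if "message" in data:
            return str(data["message"])
    return body.strip()


def post_chat_completion(base_url: str, payload: dict) -> str:
    """POST a chat completion request; surface the error body on failure."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        # The body of a 4xx response usually carries the actual reason.
        body = exc.read().decode("utf-8", errors="replace")
        raise RuntimeError(
            f"HTTP {exc.code}: {extract_error_message(body)}"
        ) from exc
```

Calling `post_chat_completion` with the server's base URL and the same messages that the condensing step sends should raise a `RuntimeError` containing whatever reason the server gives for the 400, assuming the server includes one in the body.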