Problem (one or two sentences)
On the VS Code LM (vscode-lm) provider, automatic context condensing does not trigger reliably. Even when the context gauge in the UI shows the window as effectively full, auto-condense never fires, so the conversation eventually overflows the model's real input limit.
Context (who is affected and when)
Affects users on the VS Code LM (Copilot) provider who rely on automatic context condensing. It surfaces during long conversations with large-window models (e.g., a Claude/GPT-5 family entry): the UI context gauge climbs to full, but auto-condense never kicks in.
Root cause
The vscode-lm provider reports maxTokens: -1 (unlimited) and an inflated live context window (Copilot's advertised window, far larger than the realistic usable input). Two problems follow:
- The condense gate computed contextPercent against the full context window instead of the available input space (contextWindow - reservedForOutput), so usage was under-reported and the threshold was never reached.
- A negative maxTokens (-1) was used directly as the reserved-output value, distorting the window math: -1 is truthy in JavaScript, so maxTokens || DEFAULT kept -1 instead of falling back to the default.
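The two failure modes can be reproduced in isolation. The following is a minimal sketch, not the extension's actual code: the function names, the DEFAULT_OUTPUT_RESERVE value, and all token counts are hypothetical and chosen only to make the arithmetic easy to follow.

```typescript
// Hypothetical default output reserve; the real constant may differ.
const DEFAULT_OUTPUT_RESERVE = 8_192

// Mirrors the `maxTokens || DEFAULT` pattern: -1 is truthy, so it is kept.
function buggyReserve(maxTokens: number | undefined): number {
  return maxTokens || DEFAULT_OUTPUT_RESERVE
}

// Mirrors the old gate: usage is divided by the full (inflated) window.
function buggyContextPercent(tokensInUse: number, contextWindow: number): number {
  return (100 * tokensInUse) / contextWindow
}

// Assumed numbers: a 400k advertised window, a 128k realistic input ceiling.
const advertisedWindow = 400_000
const realisticInputCeiling = 128_000
const tokensInUse = 120_000

// The "reserve" of -1 makes the available space larger than the window itself.
const distortedAvailable = advertisedWindow - buggyReserve(-1) // 400_001

// 30% against the advertised window, so a 70-80% threshold is never reached...
const reportedPercent = buggyContextPercent(tokensInUse, advertisedWindow) // 30

// ...although the same usage is 93.75% of the realistic input ceiling.
const realisticPercent = (100 * tokensInUse) / realisticInputCeiling // 93.75
```

With these assumed numbers the gate sees 30% while the conversation is within a few thousand tokens of overflowing.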
Reproduction steps
- Select the VS Code LM (Copilot) provider with a large-window model (e.g., a Claude/GPT-5 family entry).
- Enable automatic context condensing with a normal threshold (e.g., 70–80%).
- Drive a long conversation until the UI context gauge shows the window near/over full.
- Observe that auto-condense does not trigger, despite the gauge indicating the context is effectively full.
Expected result
Auto-condense should fire in line with the context gauge: usage should be measured against the usable input space, and the gate should use the model's real input ceiling (the curated maxInputTokens) rather than the inflated live window. A negative/unlimited maxTokens should fall back to a sane default reserve.
Actual result
Auto-condense never triggers; usage is measured against the inflated full window, so the threshold is never reached and the conversation eventually overflows the model's real input limit.
Variations tried
Reproduces across large-window vscode-lm models regardless of the configured condense threshold, because the fault is in the denominator (the full live window), not in the threshold.
App Version
N/A (provider-level context-management behavior; not tied to a specific release).
API Provider
VS Code Language Model API (vscode-lm / Copilot).
Model Used
Large-window vscode-lm models (e.g., a Claude/GPT-5 family entry).
Fix
Addressed by #710:
- Treat maxTokens: -1 (unlimited) as the default output reserve in willManageContext/manageContext.
- Measure contextPercent against available input space (contextWindow - reservedForOutput), with a safe fallback to the full window when the reserve is unknown.
- Add an optional getCondenseContextWindow() ApiHandler seam; vscode-lm overrides it to use the curated static maxInputTokens.
- Refresh the vscode-lm model catalog/default and add UI guards so the context bar and the gate share one source of truth.
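The shape of the fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #710: HandlerSketch is a simplified stand-in for the ApiHandler interface, the return type of getCondenseContextWindow() is assumed to be number | undefined, and DEFAULT_OUTPUT_RESERVE, shouldCondense, and all token counts are hypothetical.

```typescript
// Hypothetical default output reserve; the real constant may differ.
const DEFAULT_OUTPUT_RESERVE = 8_192

interface ModelInfoSketch {
  contextWindow: number
  maxTokens?: number
}

// Simplified stand-in for ApiHandler; only the parts the gate needs.
interface HandlerSketch {
  getModel(): { id: string; info: ModelInfoSketch }
  // Optional seam: a provider may return a curated window for the gate.
  getCondenseContextWindow?(): number | undefined
}

// Non-positive or missing maxTokens (e.g. -1 = unlimited) uses the default.
function reservedForOutput(maxTokens: number | undefined): number {
  return maxTokens !== undefined && maxTokens > 0 ? maxTokens : DEFAULT_OUTPUT_RESERVE
}

// Usage against available input space; falls back to the full window when
// the reserve is unknown or would leave no usable space.
function contextPercent(tokensInUse: number, contextWindow: number, reserve?: number): number {
  const available = reserve !== undefined ? contextWindow - reserve : contextWindow
  const denominator = available > 0 ? available : contextWindow
  return denominator > 0 ? (100 * tokensInUse) / denominator : 0
}

function shouldCondense(handler: HandlerSketch, tokensInUse: number, thresholdPercent: number): boolean {
  const { info } = handler.getModel()
  // Prefer the provider's curated window; otherwise use the live one.
  const window = handler.getCondenseContextWindow?.() ?? info.contextWindow
  return contextPercent(tokensInUse, window, reservedForOutput(info.maxTokens)) >= thresholdPercent
}

// Assumed model: 400k advertised window, unlimited maxTokens, 128k curated input.
const info: ModelInfoSketch = { contextWindow: 400_000, maxTokens: -1 }
const liveOnly: HandlerSketch = { getModel: () => ({ id: "example", info }) }
const curated: HandlerSketch = { ...liveOnly, getCondenseContextWindow: () => 128_000 }

const firesWithoutSeam = shouldCondense(liveOnly, 100_000, 75) // false
const firesWithSeam = shouldCondense(curated, 100_000, 75) // true
```

Keeping the seam optional means handlers that do not implement it continue to use their reported context window, so only vscode-lm changes behavior.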