
vscode-lm: automatic context condensing never triggers (maxTokens -1 + inflated window) #714

Description

@simurg79

Problem (one or two sentences)

On the VS Code LM (vscode-lm) provider, automatic context condensing does not trigger. Even when the context gauge in the UI shows the window as effectively full, auto-condense never fires, so the conversation eventually overflows the model's real input limit.

Context (who is affected and when)

Affects users on the VS Code LM (Copilot) provider who rely on automatic context condensing. It surfaces during long conversations with large-window models (e.g., a Claude/GPT-5 family entry): the UI context gauge climbs to full, but auto-condense never kicks in.

Root cause

The vscode-lm provider reports maxTokens: -1 (unlimited) and an inflated live context window (Copilot's advertised window, far larger than the realistic usable input). Two problems follow:

  1. The condense gate computed contextPercent against the full context window instead of the available input space (contextWindow - reservedForOutput), so usage was under-reported and the threshold was never reached.
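  2. A negative maxTokens (-1) was used directly as the reserved-output value, distorting the window math: because -1 is truthy in JavaScript, the fallback expression maxTokens || DEFAULT kept -1 instead of substituting the default reserve.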
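
The two problems above can be illustrated with a minimal TypeScript sketch. This is not the project's actual code: the function names, the DEFAULT_RESERVE value, and the token counts are all hypothetical and chosen only to make the arithmetic visible.

```typescript
// Hypothetical default output reserve; the real constant's name and value
// are not given in this issue.
const DEFAULT_RESERVE = 8_192

// Problem 2: -1 is truthy in JavaScript, so `||` never falls back to the default.
function buggyReserve(maxTokens: number | undefined): number {
  return maxTokens || DEFAULT_RESERVE
}

// Problem 1: usage measured against the full window.
function percentOfFullWindow(usedTokens: number, contextWindow: number): number {
  return (usedTokens / contextWindow) * 100
}

// Usage measured against the available input space instead.
function percentOfAvailableInput(
  usedTokens: number,
  contextWindow: number,
  reservedForOutput: number,
): number {
  return (usedTokens / (contextWindow - reservedForOutput)) * 100
}

// Illustrative numbers only.
const exampleWindow = 200_000
const exampleUsed = 150_000

console.log(buggyReserve(-1)) // -1: the "unlimited" sentinel leaks into the math
console.log(percentOfFullWindow(exampleUsed, exampleWindow)) // 75
console.log(percentOfAvailableInput(exampleUsed, exampleWindow, DEFAULT_RESERVE)) // above 75
```

With the full window as the denominator, the reported percentage is always lower than the share of usable input actually consumed, and the gap grows further when the window itself is inflated.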

Reproduction steps

  1. Select the VS Code LM (Copilot) provider with a large-window model (e.g., a Claude/GPT-5 family entry).
  2. Enable automatic context condensing with a normal threshold (e.g., 70–80%).
  3. Drive a long conversation until the UI context gauge shows the window near/over full.
  4. Observe that auto-condense does not trigger, despite the gauge indicating the context is effectively full.

Expected result

Auto-condense should fire in line with the context gauge: usage should be measured against the usable input space, and the gate should use the model's real input ceiling (the curated maxInputTokens) rather than the inflated live window. A negative/unlimited maxTokens should fall back to a sane default reserve.

Actual result

Auto-condense never triggers; usage is measured against the inflated full window, so the threshold is never reached and the conversation eventually overflows the model's real input limit.

Variations tried

Reproduces across large-window vscode-lm models regardless of the configured condense threshold, because the error is in the denominator (the full live window), not in the threshold.

App Version

N/A (provider-level context-management behavior; not tied to a specific release).

API Provider

VS Code Language Model API (vscode-lm / Copilot).

Model Used

Large-window vscode-lm models (e.g., a Claude/GPT-5 family entry).

Fix

Addressed by #710:

  • Treat maxTokens: -1 (unlimited) as the default output reserve in willManageContext/manageContext.
  • Measure contextPercent against available input space (contextWindow - reservedForOutput), with a safe fallback to the full window when the reserve is unknown.
  • Add an optional getCondenseContextWindow() ApiHandler seam; vscode-lm overrides it to use the curated static maxInputTokens.
  • Refresh the vscode-lm model catalog/default and add UI guards so the context bar and the gate share one source of truth.
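
The first three bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #710: only getCondenseContextWindow, maxTokens, contextWindow, and maxInputTokens come from this issue. The interface shapes, getModel(), the helper names, DEFAULT_OUTPUT_RESERVE, and all numbers are hypothetical.

```typescript
// Assumed default reserve; the real value is not stated in this issue.
const DEFAULT_OUTPUT_RESERVE = 8_192

// Hypothetical, simplified shapes for illustration.
interface ModelInfo {
  contextWindow: number
  maxTokens?: number // -1 means "unlimited" on vscode-lm
}

interface ApiHandler {
  getModel(): { id: string; info: ModelInfo }
  // Optional seam: a provider may return a more realistic window for the gate.
  getCondenseContextWindow?(): number | undefined
}

// Treat missing, zero, or negative maxTokens as "use the default reserve".
function resolveReservedForOutput(maxTokens: number | undefined): number {
  return maxTokens !== undefined && maxTokens > 0 ? maxTokens : DEFAULT_OUTPUT_RESERVE
}

// Measure usage against available input space, falling back to the full
// window when the reserve is unknown or would leave no input space.
function contextPercent(
  usedTokens: number,
  contextWindow: number,
  reservedForOutput: number | undefined,
): number {
  const available =
    reservedForOutput !== undefined ? contextWindow - reservedForOutput : contextWindow
  const denominator = available > 0 ? available : contextWindow
  return (usedTokens / denominator) * 100
}

function shouldCondense(handler: ApiHandler, usedTokens: number, thresholdPercent: number): boolean {
  const { info } = handler.getModel()
  const window = handler.getCondenseContextWindow?.() ?? info.contextWindow
  const reserve = resolveReservedForOutput(info.maxTokens)
  return contextPercent(usedTokens, window, reserve) >= thresholdPercent
}

// Usage: a vscode-lm-like handler whose live window is inflated.
const vscodeLmLike: ApiHandler = {
  getModel: () => ({ id: "example", info: { contextWindow: 1_000_000, maxTokens: -1 } }),
  getCondenseContextWindow: () => 128_000, // stands in for a curated maxInputTokens
}

console.log(shouldCondense(vscodeLmLike, 100_000, 75)) // true
```

Keeping the override optional means providers whose live window is already realistic need no changes; without the override, the same 100,000 tokens measured against the 1,000,000-token window would stay far below the threshold.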
