Clarification on GPT-5.5 results under the Claude Code harness in Table 1

Hello, thank you for sharing this interesting work.

I have a question about the experimental setup in Table 1, specifically the rows under the **Claude Code harness**. The table lists the model as **GPT-5.5**, while the harness is described as **Claude Code**.

My understanding is that the paper separates the **target model** from the **execution harness**:

* `model`: the frozen target model that solves the task
* `harness`: the execution environment / adapter used to run the task

Under that interpretation, the table would mean that **GPT-5.5 was evaluated inside a Claude Code-style harness**.
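
To make my reading concrete, here is a minimal Python sketch of the separation as I understand it. `RunConfig` and `describe` are hypothetical names of my own, not taken from the paper or its code; the sketch only illustrates that `model` and `harness` would be independent fields of a run, and it does not show how the harness actually talks to the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical description of one row in Table 1 (my assumption, not the paper's code)."""
    model: str    # the frozen target model that solves the task
    harness: str  # the execution environment / adapter used to run the task


def describe(cfg: RunConfig) -> str:
    """Render the row the way I am currently reading it."""
    return f"{cfg.model} evaluated inside the {cfg.harness} harness"


# The row I am asking about, under the "independent fields" reading.
row = RunConfig(model="GPT-5.5", harness="Claude Code")
print(describe(row))
```

If this reading is right, any model could in principle be paired with any harness, which is why I would like to know how the pairing was wired up in practice.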

However, Section 4 also describes the Claude Code harness as mirroring the workspace contract through the `claude` CLI. Since Claude Code is usually understood as an Anthropic/Claude-based coding agent, I was unsure how GPT-5.5 is used in this setting.

Could you clarify which interpretation is correct?

1. Does “Claude Code harness + GPT-5.5” mean that the Claude Code harness is used only as an execution interface, while the underlying target model is GPT-5.5?
2. Or is the table meant to report results from Claude Code’s native model, in which case “GPT-5.5” might be a typo?
3. If GPT-5.5 was indeed used through the Claude Code harness, could you briefly explain how the model was connected to the `claude` CLI / harness adapter?

This would be helpful for correctly interpreting the cross-harness results and for avoiding confusion between the model and the execution environment.

Thank you!


Clarification on GPT-5.5 results under the Claude Code harness in Table 1 #95
