Load pi extensions in the cua CLI with /reload and trust gating #41
rgarcia wants to merge 7 commits into
Load pi extensions (arbitrary TS, default-exported factory) against cua's lower-level AgentHarness, which pi's AgentSession-based extension system does not bind to.

Reuses pi's host-agnostic loader and runner: binds the runner's action seams to the harness, bridges harness events into the runner's extension-event emitters, registers extension tools, and mirrors AgentSession.reload for hot-reload.

Tier A scope: tools, events, model/thinking/active-tool control, and session-entry writes, headless (no-op UI). Slash commands, flags, ctx.ui.*, and the session-replacement family are deferred (stubbed to throw).

Re-applies the extension-tool union on model_update because CuaAgentHarness.setModel rebuilds tools from construction-time extraTools and would otherwise drop runtime-registered ones.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
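The reason for re-applying the tool union on model_update can be illustrated with a small sketch. `FakeHarness`, `SketchTool`, and `unionTools` are hypothetical stand-ins invented for this example, not the real cua or pi types; the only behavior assumed from the description is that `setModel` rebuilds tools from the construction-time list.

```typescript
// Hypothetical, simplified stand-ins; not the real cua or pi types.
interface SketchTool {
  name: string;
}

class FakeHarness {
  private readonly extraTools: SketchTool[];
  private tools: SketchTool[];

  constructor(extraTools: SketchTool[]) {
    this.extraTools = extraTools;
    this.tools = extraTools.slice();
  }

  getTools(): SketchTool[] {
    return this.tools.slice();
  }

  setTools(tools: SketchTool[]): void {
    this.tools = tools.slice();
  }

  // Assumed behavior, per the description: setModel rebuilds the tool list
  // from construction-time extraTools, dropping later registrations.
  setModel(_model: string): void {
    this.tools = this.extraTools.slice();
  }
}

// Merge base and extension tools; an extension tool replaces a base tool of the same name.
function unionTools(base: SketchTool[], extension: SketchTool[]): SketchTool[] {
  const extensionNames = new Set(extension.map((tool) => tool.name));
  return base.filter((tool) => !extensionNames.has(tool.name)).concat(extension);
}

const fakeHarness = new FakeHarness([{ name: "computer" }]);
const extensionTools: SketchTool[] = [{ name: "click_visual" }];

// Initial load: the extension tool is registered at runtime.
fakeHarness.setTools(unionTools(fakeHarness.getTools(), extensionTools));

// A model switch rebuilds from extraTools, so the extension tool disappears.
fakeHarness.setModel("some-model");
const afterSetModel = fakeHarness.getTools().map((tool) => tool.name);

// What a model_update handler would do in this sketch: re-apply the union.
fakeHarness.setTools(unionTools(fakeHarness.getTools(), extensionTools));
const afterReapply = fakeHarness.getTools().map((tool) => tool.name);
```

In this sketch `afterSetModel` no longer contains `click_visual`, and `afterReapply` contains it again alongside the base tool.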
Cursor Bugbot has reviewed your changes using high effort and found 4 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for all 4 issues found in the latest run.
- ✅ Fixed: sendUserMessage lacks screenshot
  - `sendUserMessage` now routes through a host prompt path that attaches an initial screenshot on first-turn sessions via `{ images }` before calling `harness.prompt`.
- ✅ Fixed: Reapply reactivates extension tools
  - `reapplyTools` now preserves the existing active-tool set and only auto-activates newly introduced extension tools instead of force-enabling all extension tools.
- ✅ Fixed: Reload races async dispose
  - Shutdown requests raised during reload are now latched and handled inside reload before rebuilding, preventing async dispose from tearing down a freshly rebuilt runner/bridge.
- ✅ Fixed: Skipped reapply while applying tools
  - Nested `reapplyTools` calls now set a queue flag and trigger a follow-up pass after the current `setTools` call completes, so refresh requests are not dropped.
Or push these changes by commenting:
@cursor push ee06abc4e1
Preview (ee06abc4e1)
diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -1,4 +1,5 @@
import type { AgentHarness, AgentTool, Session } from "@onkernel/cua-agent";
+import type { ImageContent } from "@onkernel/cua-ai";
import {
AuthStorage,
discoverAndLoadExtensions,
@@ -23,6 +24,8 @@
harness: AgentHarness;
/** The same `Session` the harness was constructed with; used for entry writes. */
session: Session;
+ /** Capture a screenshot attachment for first-turn user messages. */
+ initialScreenshot: () => Promise<ImageContent[] | undefined>;
cwd: string;
/** Extension paths passed straight to `discoverAndLoadExtensions`. */
configuredPaths: string[];
@@ -48,6 +51,7 @@
export class HarnessExtensionHost {
private readonly harness: AgentHarness;
private readonly session: Session;
+ private readonly initialScreenshot: () => Promise<ImageContent[] | undefined>;
private readonly cwd: string;
private readonly configuredPaths: string[];
private readonly agentDir?: string;
@@ -66,6 +70,12 @@
private extensionTools: AgentTool[] = [];
/** Guards `harness.setTools` so a tools_update never re-enters reapply. */
private applyingTools = false;
+ /** Follow-up pass requested while `harness.setTools` is in flight. */
+ private reapplyQueued = false;
+ /** Marks reload critical sections where shutdown requests must not race. */
+ private reloading = false;
+ /** Sticky shutdown request raised by `ctx.shutdown()` or owner disposal. */
+ private shutdownRequested = false;
/** Guards `dispose` so `ctx.shutdown()` and an owner call don't double-tear-down. */
private disposed = false;
private sessionName: string | undefined;
@@ -76,6 +86,7 @@
constructor(options: HarnessExtensionHostOptions) {
this.harness = options.harness;
this.session = options.session;
+ this.initialScreenshot = options.initialScreenshot;
this.cwd = options.cwd;
this.configuredPaths = options.configuredPaths;
this.agentDir = options.agentDir;
@@ -84,6 +95,7 @@
this.actions = makeExtensionActions(this.harness, this.session, {
refreshTools: () => void this.reapplyTools(),
+ sendUserMessage: (text) => this.promptUserMessage(text),
getSessionName: () => this.sessionName,
setSessionName: (name) => {
this.sessionName = name;
@@ -112,17 +124,26 @@
* the loader imports each extension fresh from disk.
*/
async reload(): Promise<void> {
+ if (this.disposed) return;
const flags = this.runner?.getFlagValues() ?? new Map<string, boolean | string>();
- await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
- this.teardownBridge?.();
- this.teardownBridge = undefined;
+ this.reloading = true;
+ try {
+ await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
+ if (await this.disposeIfShutdownRequested()) return;
+ this.teardownBridge?.();
+ this.teardownBridge = undefined;
- await this.buildRunner();
- for (const [name, value] of flags) this.runner?.setFlagValue(name, value);
+ await this.buildRunner();
+ if (await this.disposeIfShutdownRequested()) return;
+ for (const [name, value] of flags) this.runner?.setFlagValue(name, value);
- await this.reapplyTools();
- this.installBridge();
- await this.runner?.emit({ type: "session_start", reason: "reload" });
+ await this.reapplyTools();
+ if (await this.disposeIfShutdownRequested()) return;
+ this.installBridge();
+ await this.runner?.emit({ type: "session_start", reason: "reload" });
+ } finally {
+ this.reloading = false;
+ }
}
/**
@@ -135,6 +156,7 @@
*/
async dispose(): Promise<void> {
if (this.disposed) return;
+ this.shutdownRequested = true;
this.disposed = true;
this.teardownBridge?.();
this.teardownBridge = undefined;
@@ -159,28 +181,45 @@
/**
* Rebuild the extension-tool union and apply it to the harness as the
- * authoritative tool list. Extension tools are de-duped by name (the harness
- * rejects duplicates) and kept active alongside the base tools. The
- * re-entrancy guard makes a stray `tools_update` subscriber safe; reapply is
- * only triggered from `load`/`reload`/`model_update`/`refreshTools`, none of
- * which run concurrently.
+ * authoritative tool list. Existing active-tool choices are preserved for
+ * both base and extension tools, while newly introduced extension tools start
+ * active by default. A queued follow-up pass handles refresh requests that
+ * arrive while `harness.setTools` is still in flight.
*/
private async reapplyTools(): Promise<void> {
- if (!this.runner || this.applyingTools) return;
- this.extensionTools = wrapRegisteredTools(this.runner.getAllRegisteredTools(), this.runner);
- const extensionNames = new Set(this.extensionTools.map((tool) => tool.name));
- const base = this.harness.getTools().filter((tool) => !extensionNames.has(tool.name));
- const final = [...base, ...this.extensionTools];
- const activeNames = [
- ...this.harness.getActiveTools().map((tool) => tool.name),
- ...extensionNames,
- ];
- this.applyingTools = true;
- try {
- await this.harness.setTools(final, [...new Set(activeNames)]);
- } finally {
- this.applyingTools = false;
+ if (!this.runner) return;
+ if (this.applyingTools) {
+ this.reapplyQueued = true;
+ return;
}
+ do {
+ this.reapplyQueued = false;
+ if (!this.runner) return;
+
+ const previousExtensionNames = new Set(this.extensionTools.map((tool) => tool.name));
+ const nextExtensionTools = wrapRegisteredTools(this.runner.getAllRegisteredTools(), this.runner);
+ const extensionNames = new Set(nextExtensionTools.map((tool) => tool.name));
+ const base = this.harness.getTools().filter((tool) => !extensionNames.has(tool.name));
+ const final = [...base, ...nextExtensionTools];
+ const finalNames = new Set(final.map((tool) => tool.name));
+ const activeNames = new Set(
+ this.harness
+ .getActiveTools()
+ .map((tool) => tool.name)
+ .filter((name) => finalNames.has(name)),
+ );
+ for (const name of extensionNames) {
+ if (!previousExtensionNames.has(name)) activeNames.add(name);
+ }
+
+ this.extensionTools = nextExtensionTools;
+ this.applyingTools = true;
+ try {
+ await this.harness.setTools(final, [...activeNames]);
+ } finally {
+ this.applyingTools = false;
+ }
+ } while (this.reapplyQueued);
}
private installBridge(): void {
@@ -191,6 +230,35 @@
}
private requestShutdown(): void {
+ this.shutdownRequested = true;
+ if (this.reloading) return;
void this.dispose();
}
+
+ private async promptUserMessage(text: string): Promise<void> {
+ const images = await this.maybeInitialScreenshot();
+ await this.harness.prompt(text, images ? { images } : undefined);
+ }
+
+ private async maybeInitialScreenshot(): Promise<ImageContent[] | undefined> {
+ const hasPriorTurn = await sessionHasPriorTurn(this.session);
+ if (hasPriorTurn) return undefined;
+ return this.initialScreenshot();
+ }
+
+ private async disposeIfShutdownRequested(): Promise<boolean> {
+ if (!this.shutdownRequested && !this.disposed) return false;
+ await this.dispose();
+ return true;
+ }
}
+
+async function sessionHasPriorTurn(session: Session): Promise<boolean> {
+ const entries = await session.getBranch();
+ for (const entry of entries) {
+ if (entry.type === "message" && (entry.message.role === "user" || entry.message.role === "assistant")) {
+ return true;
+ }
+ }
+ return false;
+}
diff --git a/packages/cli/src/extensions/seams.ts b/packages/cli/src/extensions/seams.ts
--- a/packages/cli/src/extensions/seams.ts
+++ b/packages/cli/src/extensions/seams.ts
@@ -19,6 +19,8 @@
export interface SeamHooks {
/** Re-apply the authoritative base+extension tool union to the harness. */
refreshTools: () => void;
+ /** Forward user text through the host's first-turn image attachment path. */
+ sendUserMessage: (text: string) => Promise<void>;
/** Synchronous mirror of the session name (kept because the action getter is sync). */
getSessionName: () => string | undefined;
/** Record the latest session name set through the action surface. */
@@ -41,7 +43,7 @@
},
sendUserMessage(content): void {
const text = typeof content === "string" ? content : textPartsOf(content);
- void harness.prompt(text);
+ void hooks.sendUserMessage(text);
},
appendEntry(customType, data): void {
void session.appendCustomEntry(customType, data);
diff --git a/packages/cli/test/extensions.test.ts b/packages/cli/test/extensions.test.ts
--- a/packages/cli/test/extensions.test.ts
+++ b/packages/cli/test/extensions.test.ts
@@ -36,6 +36,7 @@
const created = new HarnessExtensionHost({
harness: fx.harness,
session: fx.session,
+ initialScreenshot: async () => undefined,
cwd: fx.cwd,
configuredPaths: [makeExtensionDir()],
agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
- reapplyTools: preserve base active state and re-activate extension tools unless explicitly deactivated through the host's setActiveTools seam (tracked in inactiveExtensionTools). Keeps extension tools active across a setModel (which rebuilds the harness tool list and drops them) while honoring an opt-out, instead of unconditionally re-enabling every extension tool.
- reapplyTools: coalesce a reapply requested while setTools is in flight into a follow-up pass rather than dropping it.
- reload: latch a ctx.shutdown() raised during the reload critical section and honor it at await boundaries, so an unawaited dispose can't tear down the freshly rebuilt runner and bridge.
- sendUserMessage: route through the host and attach the first-turn screenshot via an optional initialScreenshot callback, matching the CLI's prompt sites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
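The active-set rule in the first bullet can be sketched as a pure function. This is a minimal illustration under assumptions: `nextActiveToolNames` and its parameters are hypothetical, tools are reduced to their names, and only `inactiveExtensionTools` is a name taken from the commit message.

```typescript
// Hypothetical helper; the signature is an assumption made for illustration.
function nextActiveToolNames(
  currentlyActive: string[],
  finalToolNames: string[],
  extensionToolNames: string[],
  inactiveExtensionTools: Set<string>,
): string[] {
  const finalSet = new Set(finalToolNames);
  // Keep whatever is active now, minus tools that no longer exist.
  const active = new Set(currentlyActive.filter((toolName) => finalSet.has(toolName)));
  // Re-activate extension tools unless they were explicitly turned off.
  for (const toolName of extensionToolNames) {
    if (!inactiveExtensionTools.has(toolName)) active.add(toolName);
  }
  return Array.from(active);
}

// Scenario: a setModel call just rebuilt the harness list, so only base tools
// are live. "bash" was deactivated earlier and "extract_table" was opted out.
const activeAfterReapply = nextActiveToolNames(
  ["computer"],
  ["computer", "bash", "click_visual", "extract_table"],
  ["click_visual", "extract_table"],
  new Set(["extract_table"]),
);
```

Here `click_visual` comes back after the rebuild, while the deactivated base tool and the opted-out extension tool stay inactive.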
Cursor Bugbot has reviewed your changes using high effort and found 4 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for all 4 issues found in the latest run.
- ✅ Fixed: Load after dispose breaks lifecycle
  - `load()` now returns immediately when the host is disposed, preventing runner/bridge reconstruction after terminal shutdown.
- ✅ Fixed: Repeated load stacks bridge listeners
  - `load()` now no-ops once a runner already exists, so duplicate bridge installations and stacked harness listeners cannot occur.
- ✅ Fixed: Nested reload double teardown
  - `reload()` now exits early when `reloading` is already true, preventing reentrant reload passes from tearing down each other's newly built state.
- ✅ Fixed: Reload skips idle wait
  - `reload()` now awaits `harness.waitForIdle()` before shutdown/bridge teardown, so in-flight runs finish before listeners are detached.
Or push these changes by commenting:
@cursor push a135464efc
Preview (a135464efc)
diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -118,6 +118,7 @@
}
async load(): Promise<void> {
+ if (this.disposed || this.runner) return;
await this.buildRunner();
await this.reapplyTools();
this.installBridge();
@@ -132,14 +133,16 @@
* the loader imports each extension fresh from disk.
*/
async reload(): Promise<void> {
- if (this.disposed) return;
- const flags = this.runner?.getFlagValues() ?? new Map<string, boolean | string>();
+ if (this.disposed || this.reloading) return;
// `reloading` defers any `ctx.shutdown()` raised by an extension's
// session_shutdown handler so an unawaited dispose can't tear down the
- // runner/bridge mid-rebuild. Each await boundary then honors a pending
- // request before continuing.
+ // runner/bridge mid-rebuild (including while waiting for the harness to go
+ // idle). Each await boundary then honors a pending request before continuing.
this.reloading = true;
try {
+ await this.harness.waitForIdle();
+ if (await this.disposeIfShutdownRequested()) return;
+ const flags = this.runner?.getFlagValues() ?? new Map<string, boolean | string>();
await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
if (await this.disposeIfShutdownRequested()) return;
this.teardownBridge?.();
await this.reapplyTools();
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "startup" });
}
Load after dispose breaks lifecycle
High Severity
Calling load() after dispose() rebuilds the runner and bridge while disposed stays true, so dispose(), reload(), and shutdown via ctx.shutdown() become no-ops. The harness keeps forwarding events, but the host cannot be torn down or reloaded cleanly.
Reviewed by Cursor Bugbot for commit e574d60. Configure here.
this.teardownBridge = installBridge(this.harness, this.runner, this.bridgeState, () =>
this.reapplyTools(),
);
}
Repeated load stacks bridge listeners
Medium Severity
A second load() on the same host calls installBridge() without tearing down the previous bridge or emitting session_shutdown on the old runner. Earlier harness subscribers stay registered, so events are forwarded to multiple runners and teardown only removes the latest bridge.
Reviewed by Cursor Bugbot for commit e574d60. Configure here.
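Both `load()` findings above come down to one missing guard. The following is a simplified, synchronous sketch: `LoadGuardSketch` is hypothetical, and a counter stands in for the harness subscriptions that the real bridge installs.

```typescript
// Hypothetical, synchronous model of the host lifecycle; not the real class.
class LoadGuardSketch {
  listenerCount = 0; // stands in for harness subscriptions held by the bridge
  private loaded = false;
  private disposed = false;

  load(): void {
    // A disposed host stays dead, and a loaded host is not loaded twice.
    if (this.disposed || this.loaded) return;
    this.loaded = true;
    this.listenerCount += 1; // stands in for installBridge()
  }

  dispose(): void {
    if (this.disposed) return;
    this.disposed = true;
    this.loaded = false;
    this.listenerCount = 0; // stands in for teardownBridge()
  }
}

const loadedTwice = new LoadGuardSketch();
loadedTwice.load();
loadedTwice.load(); // no-op: listeners are not stacked
const listenersAfterDoubleLoad = loadedTwice.listenerCount;

const loadedAfterDispose = new LoadGuardSketch();
loadedAfterDispose.load();
loadedAfterDispose.dispose();
loadedAfterDispose.load(); // no-op: disposal is terminal
const listenersAfterLateLoad = loadedAfterDispose.listenerCount;
```

With the guard in place, a second `load()` leaves a single listener, and a `load()` after `dispose()` installs nothing.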
await this.reapplyTools();
if (await this.disposeIfShutdownRequested()) return;
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "reload" });
Nested reload double teardown
Medium Severity
reload() has no reentrancy guard. If an extension triggers another reload() while the outer call is still awaiting session_shutdown, the inner run can finish (new runner, bridge, session_start) and then the outer call continues from its next lines, tearing down that bridge and rebuilding again. Extensions may see duplicate shutdown/start cycles and inconsistent runner state.
Reviewed by Cursor Bugbot for commit e574d60. Configure here.
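A reentrancy guard for the nested-reload case can be sketched as follows. The model is deliberately synchronous (the real `reload()` is async and the nested call arrives while an `await` is pending), and `ReloadGuardSketch` with its members is a hypothetical stand-in.

```typescript
type ShutdownHandler = () => void;

// Hypothetical, synchronous model of reload(); not the real host class.
class ReloadGuardSketch {
  rebuilds = 0;
  private reloading = false;
  private shutdownHandlers: ShutdownHandler[] = [];

  onSessionShutdown(handler: ShutdownHandler): void {
    this.shutdownHandlers.push(handler);
  }

  reload(): void {
    if (this.reloading) return; // reentrancy guard
    this.reloading = true;
    try {
      // Emitting session_shutdown may run extension code that calls reload() again.
      for (const handler of this.shutdownHandlers.slice()) handler();
      this.rebuilds += 1; // stands in for teardown, buildRunner, installBridge
    } finally {
      this.reloading = false;
    }
  }
}

const guarded = new ReloadGuardSketch();
guarded.onSessionShutdown(() => guarded.reload()); // nested reload from a handler
guarded.reload();
const rebuildsAfterNestedReload = guarded.rebuilds;

guarded.reload(); // a later, non-nested reload still runs
const rebuildsAfterSecondReload = guarded.rebuilds;
```

The nested call returns early, so one outer `reload()` performs one rebuild; the `finally` block resets the flag so later reloads are not blocked.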
await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
if (await this.disposeIfShutdownRequested()) return;
this.teardownBridge?.();
this.teardownBridge = undefined;
Reload skips idle wait
Medium Severity
HarnessExtensionHost.reload() tears down the event bridge immediately without awaiting harness.waitForIdle(), even though the command seam exposes idle waiting and the PR claims parity with AgentSession.reload. Reload during an in-flight agent run detaches extension listeners and reducers while the harness loop keeps running, so extensions miss bridged events for that run and bridgeState can stay wrong until the next agent_start/agent_end.
Reviewed by Cursor Bugbot for commit e574d60. Configure here.
Five end-to-end tests drive HarnessExtensionHost through the full learn-a-tool loop: an inefficient first run (base computer-use steps), a meta-agent-authored learned tool written to disk, host.reload() discovering it, and a second run that calls the learned tool in one step. Each models a real computer-use use case — DOM table extraction, template-match click, parameterized form-fill macro, navigation shortcut, and a pagination de-dup extractor that additionally proves an extension's agent_start handler is re-bound after reload (result reports runs=1: the handler re-fired, and the fresh-from-disk import reset the prior count). Learned tools are pure JS over inputs that stand in for screenshots/DOM (the fake harness has no browser), exercising load/reload, reapplyTools registration+activation, and the event bridge end to end. No host changes were needed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
There are 5 total unresolved issues (including 4 from previous reviews).
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Reload keeps removed extension tools
  - `reapplyTools` now excludes both newly loaded and previously registered extension tool names from the base harness list, so removed extensions are dropped on reload, and a regression test covers this case.
Or push these changes by commenting:
@cursor push 9153a337fb
Preview (9153a337fb)
diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -218,7 +218,12 @@
this.runner,
);
const extensionNames = new Set(nextExtensionTools.map((tool) => tool.name));
- const base = this.harness.getTools().filter((tool) => !extensionNames.has(tool.name));
+ const previousExtensionNames = new Set(this.extensionTools.map((tool) => tool.name));
+ const base = this.harness
+ .getTools()
+ .filter(
+ (tool) => !extensionNames.has(tool.name) && !previousExtensionNames.has(tool.name),
+ );
const final = [...base, ...nextExtensionTools];
const finalNames = new Set(final.map((tool) => tool.name));
const activeNames = new Set(
diff --git a/packages/cli/test/extensions.test.ts b/packages/cli/test/extensions.test.ts
--- a/packages/cli/test/extensions.test.ts
+++ b/packages/cli/test/extensions.test.ts
@@ -1,5 +1,5 @@
import { afterEach, describe, expect, it } from "vitest";
-import { cpSync, mkdtempSync } from "node:fs";
+import { cpSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";
@@ -104,4 +104,29 @@
expect(fx!.harness.getTools().map((tool) => tool.name)).toContain("click_visual");
});
+
+ it("drops removed extension tools after reload", async () => {
+ fx = await buildTestHarness({
+ turns: [
+ { steps: [{ type: "tool_call", toolName: "click_visual", args: { description: "the button" } }] },
+ { steps: [{ type: "text", text: "done" }] },
+ ],
+ });
+ const extDir = makeExtensionDir();
+ const created = new HarnessExtensionHost({
+ harness: fx.harness,
+ session: fx.session,
+ cwd: fx.cwd,
+ configuredPaths: [extDir],
+ agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
+ });
+ await created.load();
+ host = created;
+ expect(fx.harness.getTools().map((tool) => tool.name)).toContain("click_visual");
+
+ rmSync(join(extDir, "click-visual.ts"));
+ await created.reload();
+
+ expect(fx.harness.getTools().map((tool) => tool.name)).not.toContain("click_visual");
+ });
});
reapplyTools built the base tool list by filtering the live harness tools only against the newly loaded extension set, so a tool registered by a prior generation that a reload removed or renamed lingered on the harness, bound to the dead runner generation. Exclude prior-generation extension tool names from base as well, and cover it with a reload test that renames a tool. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
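The effect of excluding prior-generation names can be shown with tool names alone. `buildToolNames` is a hypothetical helper written for this illustration; the `alpha_tool` to `beta_tool` rename mirrors the scenario the reload test describes.

```typescript
// Hypothetical helper: tools are reduced to their names for brevity.
function buildToolNames(
  liveHarnessTools: string[],
  nextExtension: string[],
  priorExtension: string[],
): string[] {
  const next = new Set(nextExtension);
  const prior = new Set(priorExtension);
  // Base = live harness tools that belong to neither extension generation.
  const base = liveHarnessTools.filter(
    (toolName) => !next.has(toolName) && !prior.has(toolName),
  );
  return base.concat(nextExtension);
}

// The live list still carries alpha_tool from the previous runner generation.
const liveTools = ["computer", "alpha_tool"];

// Filtering against the new generation only: the stale tool lingers.
const withoutPriorExclusion = buildToolNames(liveTools, ["beta_tool"], []);

// Also excluding prior-generation names: the stale tool is dropped.
const withPriorExclusion = buildToolNames(liveTools, ["beta_tool"], ["alpha_tool"]);
```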
Cursor Bugbot has reviewed your changes using high effort and found 2 potential issues.
There are 6 total unresolved issues (including 4 from previous reviews).
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Base tools lost after name clash
  - `reapplyTools()` now rebuilds base tools from `CuaAgentHarness.getRuntimeTools()` when available so a removed shadowing extension no longer deletes the underlying built-in tool.
- ✅ Fixed: Load succeeds after startup shutdown
  - `load()` now checks for a latched shutdown after startup `session_start`, disposes, and throws so startup does not report success after extension-triggered shutdown.
Or push these changes by commenting:
@cursor push 83fff7ee80
Preview (83fff7ee80)
diff --git a/packages/agent/src/agent.ts b/packages/agent/src/agent.ts
--- a/packages/agent/src/agent.ts
+++ b/packages/agent/src/agent.ts
@@ -372,6 +372,10 @@
});
}
+ getRuntimeTools(): AgentTool[] {
+ return this.runtime.tools();
+ }
+
/**
* Mirror pi `AgentHarness.setModel()` while accepting CUA model refs.
*
diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -20,6 +20,10 @@
makeExtensionContextActions,
} from "./seams";
+type RuntimeToolAwareHarness = AgentHarness & {
+ getRuntimeTools?: () => AgentTool[];
+};
+
export interface HarnessExtensionHostOptions {
harness: AgentHarness;
/** The same `Session` the harness was constructed with; used for entry writes. */
@@ -122,6 +126,9 @@
await this.reapplyTools();
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "startup" });
+ if (await this.disposeIfShutdownRequested()) {
+ throw new Error("HarnessExtensionHost shut down during startup");
+ }
}
/**
@@ -224,9 +231,12 @@
this.runner,
);
const extensionNames = new Set(nextExtensionTools.map((tool) => tool.name));
- const base = this.harness
- .getTools()
- .filter((tool) => !extensionNames.has(tool.name) && !priorExtensionNames.has(tool.name));
+ const { tools: runtimeBaseTools, authoritative } = this.getCurrentBaseTools();
+ const base = runtimeBaseTools.filter((tool) => {
+ if (extensionNames.has(tool.name)) return false;
+ if (authoritative) return true;
+ return !priorExtensionNames.has(tool.name);
+ });
const final = [...base, ...nextExtensionTools];
const finalNames = new Set(final.map((tool) => tool.name));
const activeNames = new Set(
@@ -248,6 +258,12 @@
} while (this.reapplyQueued);
}
+ private getCurrentBaseTools(): { tools: AgentTool[]; authoritative: boolean } {
+ const runtimeTools = (this.harness as RuntimeToolAwareHarness).getRuntimeTools?.();
+ if (runtimeTools) return { tools: runtimeTools, authoritative: true };
+ return { tools: this.harness.getTools(), authoritative: false };
+ }
+
/**
* Apply an extension-requested active-tool set, recording which extension
* tools were turned off so `reapplyTools` won't silently re-enable them.
diff --git a/packages/cli/test/extensions.test.ts b/packages/cli/test/extensions.test.ts
--- a/packages/cli/test/extensions.test.ts
+++ b/packages/cli/test/extensions.test.ts
@@ -132,6 +132,47 @@
expect(toolNames).toContain("beta_tool");
expect(toolNames).not.toContain("alpha_tool");
});
+
+ it("restores colliding base tools when an extension stops registering them", async () => {
+ const extDir = mkdtempSync(join(tmpdir(), "cua-ext-"));
+ const extFile = join(extDir, "shadow.ts");
+ fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
+ const collidingToolName = fx.harness.getTools()[0]?.name;
+ expect(collidingToolName).toBeDefined();
+ writeFileSync(extFile, makeToolExtension(collidingToolName!));
+
+ const created = new HarnessExtensionHost({
+ harness: fx.harness,
+ session: fx.session,
+ cwd: fx.cwd,
+ configuredPaths: [extDir],
+ agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
+ });
+ host = created;
+ await created.load();
+
+ writeFileSync(extFile, makeNoopExtension());
+ await created.reload();
+
+ expect(fx.harness.getTools().map((tool) => tool.name)).toContain(collidingToolName);
+ });
+
+ it("fails startup when an extension requests shutdown during session_start", async () => {
+ const extDir = mkdtempSync(join(tmpdir(), "cua-ext-"));
+ const extFile = join(extDir, "shutdown.ts");
+ writeFileSync(extFile, makeShutdownOnStartupExtension());
+ fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
+ const created = new HarnessExtensionHost({
+ harness: fx.harness,
+ session: fx.session,
+ cwd: fx.cwd,
+ configuredPaths: [extDir],
+ agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
+ });
+ host = created;
+
+ await expect(created.load()).rejects.toThrow("HarnessExtensionHost shut down during startup");
+ });
});
/** A minimal, import-free extension that registers a single named tool. */
@@ -149,3 +190,16 @@
"",
].join("\n");
}
+
+function makeNoopExtension(): string {
+ return ["export default function () {}", ""].join("\n");
+}
+
+function makeShutdownOnStartupExtension(): string {
+ return [
+ "export default function (pi) {",
+ ' pi.on("session_start", (_event, ctx) => ctx.shutdown());',
+ "}",
+ "",
+ ].join("\n");
+}
const base = this.harness
.getTools()
.filter((tool) => !extensionNames.has(tool.name) && !priorExtensionNames.has(tool.name));
const final = [...base, ...nextExtensionTools];
Base tools lost after name clash
High Severity
reapplyTools() rebuilds the tool list by filtering harness.getTools() and dropping names in priorExtensionNames, without re-merging CUA runtime base tools. If an extension registered a tool whose name matches a built-in tool, then reload removes that extension, the built-in tool can disappear from the harness until something like setModel rebuilds from the runtime.
Reviewed by Cursor Bugbot for commit 7df3659. Configure here.
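The name-clash case is easier to follow with concrete values. The sketch below reduces tools to names; `selectBase` and `BaseSource` are hypothetical, and the `authoritative` flag follows the shape used in the proposed fix (a runtime-provided list versus the live harness list as a fallback).

```typescript
interface BaseSource {
  tools: string[];
  authoritative: boolean; // true when the list comes from the runtime itself
}

// Hypothetical helper modeled on the filter in the proposed fix.
function selectBase(
  source: BaseSource,
  nextExtension: Set<string>,
  priorExtension: Set<string>,
): string[] {
  return source.tools.filter((toolName) => {
    if (nextExtension.has(toolName)) return false;
    // An authoritative list contains only built-ins, so nothing else is stale.
    if (source.authoritative) return true;
    return !priorExtension.has(toolName);
  });
}

// An extension shadowed the built-in "computer" tool, then stopped registering it.
const nextNames = new Set<string>();
const priorNames = new Set<string>(["computer"]);

// Fallback path: the live list cannot tell the built-in from the shadow.
const fromLiveList = selectBase(
  { tools: ["computer", "bash"], authoritative: false },
  nextNames,
  priorNames,
);

// Authoritative path: the built-in "computer" tool is restored.
const fromRuntimeList = selectBase(
  { tools: ["computer", "bash"], authoritative: true },
  nextNames,
  priorNames,
);
```

On the fallback path the built-in tool is lost along with the shadow; the authoritative path keeps it.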
await this.reapplyTools();
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "startup" });
}
Load succeeds after startup shutdown
Medium Severity
load() awaits session_start but never checks shutdownRequested or disposed afterward. If an extension calls ctx.shutdown() during that emit, requestShutdown() runs immediate dispose() while reloading is false, yet load() still resolves successfully and leaves a disposed host that looks initialized.
Reviewed by Cursor Bugbot for commit 7df3659. Configure here.
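The finding above can be modeled with a latch that `load()` checks after emitting `session_start`. This is a simplified, synchronous sketch: `StartupSketch` is hypothetical, the shutdown request is only latched here (the real host may dispose immediately), and the error message is the one used in the proposed fix.

```typescript
interface StartupContext {
  shutdown: () => void;
}

// Hypothetical, synchronous model of load(); not the real host class.
class StartupSketch {
  disposed = false;
  private shutdownRequested = false;
  private startHandlers: Array<(ctx: StartupContext) => void> = [];

  onSessionStart(handler: (ctx: StartupContext) => void): void {
    this.startHandlers.push(handler);
  }

  load(): void {
    const ctx: StartupContext = {
      shutdown: () => {
        this.shutdownRequested = true; // latch the request
      },
    };
    for (const handler of this.startHandlers) handler(ctx);
    // Without this check, load() would report success on a dead host.
    if (this.shutdownRequested) {
      this.disposed = true;
      throw new Error("HarnessExtensionHost shut down during startup");
    }
  }
}

const startupHost = new StartupSketch();
startupHost.onSessionStart((ctx) => ctx.shutdown());

let startupError = "";
try {
  startupHost.load();
} catch (err) {
  startupError = err instanceof Error ? err.message : String(err);
}
```

Throwing from `load()` lets the caller distinguish a host that started from one that an extension shut down during startup.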
Construct and load HarnessExtensionHost in setupHarnessRuntime via a new browser-free helper (extensions/setup.ts), carry it on HarnessRuntime, and dispose it before closing the browser handle on all three run paths (print/interactive/action). Add a /reload TUI command that hot-swaps edited extensions and surfaces load errors, wire it into autocomplete, and pass the first-turn screenshot from the browser handle. Gate project-local extensions behind trust: global <agentDir>/extensions load on every run, but the implicit <cwd>/.pi/extensions scan and the explicit <cwd>/.agents/extensions dir only load when the project is trusted (persisted pi trust or --trust-extensions); --no-extensions disables loading. Project extensions execute arbitrary TypeScript in-process, so they are never auto-run by default. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
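The directory selection this commit describes can be sketched as a small pure function. `extensionDirs` and `ExtensionDirOptions` are hypothetical names, paths are joined with plain string templates for brevity, and the sketch covers only the gating rule stated in this commit message.

```typescript
// Hypothetical options shape; field names are assumptions for illustration.
interface ExtensionDirOptions {
  cwd: string;
  agentDir: string;
  projectTrusted: boolean; // persisted trust or --trust-extensions
  noExtensions: boolean; // --no-extensions
}

function extensionDirs(options: ExtensionDirOptions): string[] {
  if (options.noExtensions) return [];
  // Global extensions load on every run.
  const dirs = [`${options.agentDir}/extensions`];
  // Project-local directories load only when the project is trusted.
  if (options.projectTrusted) {
    dirs.push(`${options.cwd}/.pi/extensions`, `${options.cwd}/.agents/extensions`);
  }
  return dirs;
}

const untrustedDirs = extensionDirs({
  cwd: "/work/project",
  agentDir: "/home/user/.pi/agent",
  projectTrusted: false,
  noExtensions: false,
});

const trustedDirs = extensionDirs({
  cwd: "/work/project",
  agentDir: "/home/user/.pi/agent",
  projectTrusted: true,
  noExtensions: false,
});

const disabledDirs = extensionDirs({
  cwd: "/work/project",
  agentDir: "/home/user/.pi/agent",
  projectTrusted: true,
  noExtensions: true,
});
```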
cua already runs agent-authored code (bash, file edits, the browser), so gating project-local extensions behind trust was inconsistent and blocked the self-improve loop — an agent writes a learned tool into the project extension dir and it should load on the next run. Remove the projectExtensionsTrusted host option and the --trust-extensions flag: <cwd>/.agents/extensions, the implicit <cwd>/.pi/extensions scan, and global ~/.pi/agent/extensions all load on every run. --no-extensions still disables loading entirely. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes using high effort and found 4 potential issues.
There are 10 total unresolved issues (including 6 from previous reviews).
Autofix Details
Bugbot Autofix prepared fixes for all 4 issues found in the latest run.
- ✅ Fixed: Project extensions skip trust gating
  - Project-local extension loading is now trust-gated via persisted project trust or explicit `--trust-extensions`, while global agent-dir extensions still load.
- ✅ Fixed: Reload drops bridge on failure
  - Reload now rebuilds and reapplies tools before bridge teardown and restores prior host state on errors so the bridge remains installed.
- ✅ Fixed: /reload runs during active agent
  - `/reload` now waits for `harness.waitForIdle()` before calling `host.reload()` to prevent mid-turn bridge or runner swaps.
- ✅ Fixed: Aborted reload reports success
  - Reload command feedback now checks host disposal state and reports shutdown instead of showing a false 'extensions reloaded' success notice.
Or push these changes by commenting:
@cursor push 4fd32a09e4
Preview (4fd32a09e4)
diff --git a/packages/cli/src/cli-harness.ts b/packages/cli/src/cli-harness.ts
--- a/packages/cli/src/cli-harness.ts
+++ b/packages/cli/src/cli-harness.ts
@@ -177,6 +177,7 @@
noSession: boolean;
noSkills: boolean;
noExtensions: boolean;
+ trustExtensions: boolean;
debugTui: boolean;
jsonlIncludeDeltas: boolean;
jsonlIncludeImages: boolean;
@@ -441,6 +442,7 @@
session,
cwd,
noExtensions: flags.noExtensions,
+ trustExtensions: flags.trustExtensions,
initialScreenshot,
});
} catch (err) {
diff --git a/packages/cli/src/cli.ts b/packages/cli/src/cli.ts
--- a/packages/cli/src/cli.ts
+++ b/packages/cli/src/cli.ts
@@ -68,6 +68,8 @@
--no-extensions Disable pi extensions, which otherwise load from
<cwd>/.agents/extensions, <cwd>/.pi/extensions,
and the pi agent dir (~/.pi/agent/extensions/)
+ --trust-extensions Trust project-local extension directories for this
+ run (<cwd>/.agents/extensions and <cwd>/.pi/extensions)
--debug-tui Enable TUI render diagnostics for manual repros
-v, --verbose Verbose progress output to stderr
-h, --help Show this help
@@ -101,6 +103,7 @@
noSession: boolean;
noSkills: boolean;
noExtensions: boolean;
+ trustExtensions: boolean;
debugTui: boolean;
jsonlIncludeDeltas: boolean;
jsonlIncludeImages: boolean;
@@ -150,6 +153,7 @@
skill: { type: "string", multiple: true, default: [] },
"no-skills": { type: "boolean", default: false },
"no-extensions": { type: "boolean", default: false },
+ "trust-extensions": { type: "boolean", default: false },
"debug-tui": { type: "boolean", default: false },
output: { type: "string", short: "o" },
"jsonl-include-deltas": { type: "boolean", default: false },
@@ -187,6 +191,7 @@
noSession: !!parsed.values["no-session"],
noSkills: !!parsed.values["no-skills"],
noExtensions: !!parsed.values["no-extensions"],
+ trustExtensions: !!parsed.values["trust-extensions"],
debugTui: !!parsed.values["debug-tui"],
model: parsed.values.model as string | undefined,
thinking: parsed.values.thinking as string | undefined,
@@ -216,6 +221,7 @@
noSession: flags.noSession,
noSkills: flags.noSkills,
noExtensions: flags.noExtensions,
+ trustExtensions: flags.trustExtensions,
debugTui: flags.debugTui,
jsonlIncludeDeltas: flags.jsonlIncludeDeltas,
jsonlIncludeImages: flags.jsonlIncludeImages,
diff --git a/packages/cli/src/extensions/host.ts b/packages/cli/src/extensions/host.ts
--- a/packages/cli/src/extensions/host.ts
+++ b/packages/cli/src/extensions/host.ts
@@ -2,9 +2,12 @@
import type { ImageContent } from "@onkernel/cua-ai";
import {
AuthStorage,
+ DefaultResourceLoader,
discoverAndLoadExtensions,
ExtensionRunner,
+ getAgentDir,
ModelRegistry,
+ SettingsManager,
SessionManager,
wrapRegisteredTools,
} from "@earendil-works/pi-coding-agent";
@@ -13,6 +16,7 @@
ExtensionCommandContextActions,
ExtensionContextActions,
} from "@earendil-works/pi-coding-agent";
+import { isAbsolute, relative, resolve } from "node:path";
import { installBridge, type BridgeState } from "./bridge";
import {
makeExtensionActions,
@@ -27,6 +31,8 @@
cwd: string;
/** Extension paths passed straight to `discoverAndLoadExtensions`. */
configuredPaths: string[];
+ /** Whether project-local extension sources under `cwd` may be executed. */
+ projectTrusted: boolean;
/** Agent config dir searched for `extensions/`. Pass a temp dir to isolate from `~/.agents`. */
agentDir?: string;
/**
@@ -58,6 +64,7 @@
private readonly session: Session;
private readonly cwd: string;
private readonly configuredPaths: string[];
+ private readonly projectTrusted: boolean;
private readonly agentDir?: string;
private readonly initialScreenshot?: () => Promise<ImageContent[] | undefined>;
private readonly sessionManager: SessionManager;
@@ -95,6 +102,7 @@
this.session = options.session;
this.cwd = options.cwd;
this.configuredPaths = options.configuredPaths;
+ this.projectTrusted = options.projectTrusted;
this.agentDir = options.agentDir;
this.initialScreenshot = options.initialScreenshot;
this.sessionManager = SessionManager.inMemory(this.cwd);
@@ -111,6 +119,7 @@
});
this.contextActions = makeExtensionContextActions(this.harness, {
isIdle: () => this.bridgeState.isIdle,
+ isProjectTrusted: () => this.projectTrusted,
getSignal: () => undefined,
shutdown: () => this.requestShutdown(),
});
@@ -118,21 +127,30 @@
}
async load(): Promise<void> {
- await this.buildRunner();
+ const { runner, loadErrors } = await this.buildRunner();
+ this.runner = runner;
+ this.loadErrors = loadErrors;
await this.reapplyTools();
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "startup" });
}
+ isDisposed(): boolean {
+ return this.disposed;
+ }
+
/**
- * Mirror `AgentSession.reload`: carry over flag values, tear down the old
- * runner's bridge, re-discover extensions from disk, build a fresh runner over
- * the same in-memory services, restore flags, rebind, re-apply tools, reinstall
- * the bridge, then emit `session_start`. No extension cache is cleared because
- * the loader imports each extension fresh from disk.
+ * Mirror `AgentSession.reload`: carry over flag values, re-discover
+ * extensions from disk, build a fresh runner over the same in-memory
+ * services, restore flags, re-apply tools, swap bridges, then emit
+ * `session_start`. No extension cache is cleared because the loader imports
+ * each extension fresh from disk.
*/
async reload(): Promise<void> {
if (this.disposed) return;
+ const previousRunner = this.runner;
+ const previousLoadErrors = this.loadErrors;
+ const previousExtensionTools = this.extensionTools;
const flags = this.runner?.getFlagValues() ?? new Map<string, boolean | string>();
// `reloading` defers any `ctx.shutdown()` raised by an extension's
// session_shutdown handler so an unawaited dispose can't tear down the
@@ -140,19 +158,33 @@
// request before continuing.
this.reloading = true;
try {
- await this.runner?.emit({ type: "session_shutdown", reason: "reload" });
+ await previousRunner?.emit({ type: "session_shutdown", reason: "reload" });
if (await this.disposeIfShutdownRequested()) return;
- this.teardownBridge?.();
- this.teardownBridge = undefined;
-
- await this.buildRunner();
+ const { runner, loadErrors } = await this.buildRunner();
if (await this.disposeIfShutdownRequested()) return;
+ this.runner = runner;
+ this.loadErrors = loadErrors;
for (const [name, value] of flags) this.runner?.setFlagValue(name, value);
await this.reapplyTools();
if (await this.disposeIfShutdownRequested()) return;
+ this.teardownBridge?.();
+ this.teardownBridge = undefined;
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "reload" });
+ } catch (error) {
+ if (!this.disposed) {
+ this.runner = previousRunner;
+ this.loadErrors = previousLoadErrors;
+ this.extensionTools = previousExtensionTools;
+ try {
+ await this.reapplyTools();
+ } catch {
+ // Preserve the original reload error.
+ }
+ if (!this.teardownBridge && this.runner) this.installBridge();
+ }
+ throw error;
} finally {
this.reloading = false;
}
@@ -178,21 +210,46 @@
this.runner = undefined;
}
- private async buildRunner(): Promise<void> {
- const result = await discoverAndLoadExtensions(this.configuredPaths, this.cwd, this.agentDir);
- this.loadErrors = result.errors;
- this.runner = new ExtensionRunner(
+ private async buildRunner(): Promise<{
+ runner: ExtensionRunner;
+ loadErrors: Array<{ path: string; error: string }>;
+ }> {
+ const result = await this.discoverExtensions();
+ const runner = new ExtensionRunner(
result.extensions,
result.runtime,
this.cwd,
this.sessionManager,
this.modelRegistry,
);
- this.runner.bindCore(this.actions, this.contextActions);
- this.runner.bindCommandContext(this.commandActions);
- this.runner.setUIContext(undefined, "print");
+ runner.bindCore(this.actions, this.contextActions);
+ runner.bindCommandContext(this.commandActions);
+ runner.setUIContext(undefined, "print");
+ return { runner, loadErrors: result.errors };
}
+ private async discoverExtensions() {
+ if (this.projectTrusted) {
+ return discoverAndLoadExtensions(this.configuredPaths, this.cwd, this.agentDir);
+ }
+ const agentDir = this.agentDir ?? getAgentDir();
+ const settingsManager = SettingsManager.create(this.cwd, agentDir, { projectTrusted: false });
+ const loader = new DefaultResourceLoader({
+ cwd: this.cwd,
+ agentDir,
+ settingsManager,
+ additionalExtensionPaths: this.configuredPaths
+ .map((path) => resolve(this.cwd, path))
+ .filter((path) => !isUnderPath(path, this.cwd)),
+ noSkills: true,
+ noPromptTemplates: true,
+ noThemes: true,
+ noContextFiles: true,
+ });
+ await loader.reload();
+ return loader.getExtensions();
+ }
+
/**
* Rebuild the extension-tool union and apply it to the harness as the
* authoritative tool list. Extension tools are de-duped by name (the harness
@@ -308,3 +365,8 @@
(entry.message.role === "user" || entry.message.role === "assistant"),
);
}
+
+function isUnderPath(target: string, root: string): boolean {
+ const rel = relative(resolve(root), resolve(target));
+ return rel === "" || (!rel.startsWith("..") && !isAbsolute(rel));
+}
diff --git a/packages/cli/src/extensions/seams.ts b/packages/cli/src/extensions/seams.ts
--- a/packages/cli/src/extensions/seams.ts
+++ b/packages/cli/src/extensions/seams.ts
@@ -98,7 +98,12 @@
export function makeExtensionContextActions(
harness: AgentHarness,
- state: { isIdle: () => boolean; getSignal: () => AbortSignal | undefined; shutdown: () => void },
+ state: {
+ isIdle: () => boolean;
+ isProjectTrusted: () => boolean;
+ getSignal: () => AbortSignal | undefined;
+ shutdown: () => void;
+ },
): ExtensionContextActions {
return {
getModel() {
@@ -107,9 +112,8 @@
isIdle() {
return state.isIdle();
},
- // Headless host trusts its cwd; project-trust prompts are a TUI concern.
isProjectTrusted(): boolean {
- return true;
+ return state.isProjectTrusted();
},
getSignal() {
return state.getSignal();
diff --git a/packages/cli/src/extensions/setup.ts b/packages/cli/src/extensions/setup.ts
--- a/packages/cli/src/extensions/setup.ts
+++ b/packages/cli/src/extensions/setup.ts
@@ -1,17 +1,22 @@
import type { CuaAgentHarness, Session } from "@onkernel/cua-agent";
import type { ImageContent } from "@onkernel/cua-ai";
-import { getAgentDir } from "@earendil-works/pi-coding-agent";
+import {
+ getAgentDir,
+ hasProjectTrustInputs,
+ ProjectTrustStore,
+ SettingsManager,
+} from "@earendil-works/pi-coding-agent";
+import { existsSync } from "node:fs";
import { join } from "node:path";
import { HarnessExtensionHost } from "./host";
/**
* Resolve extension directories and construct + load a {@link HarnessExtensionHost}.
*
- * Global extensions (`<getAgentDir()>/extensions`) and project-local extensions
- * (`<cwd>/.agents/extensions` plus the loader's implicit `<cwd>/.pi/extensions`
- * scan) all load on every run; `--no-extensions` opts out entirely. This is the
- * substrate for the self-improve loop: an agent writes a learned tool into the
- * project extension dir and it loads on the next run.
+ * Global extensions (`<getAgentDir()>/extensions`) always load; project-local
+ * extensions (`<cwd>/.agents/extensions` plus `<cwd>/.pi/extensions`) only load
+ * when project trust resolves true or `--trust-extensions` is set. `--no-extensions`
+ * opts out entirely.
*
* No browser/auth/provisioning happens here, so a test can drive the exact load
* path the CLI uses with a `buildTestHarness` fixture and temp dirs.
@@ -21,21 +26,45 @@
session: Session;
cwd: string;
noExtensions: boolean;
+ trustExtensions?: boolean;
agentDir?: string;
configuredPaths?: string[];
initialScreenshot?: () => Promise<ImageContent[] | undefined>;
}): Promise<HarnessExtensionHost | undefined> {
if (args.noExtensions) return undefined;
const agentDir = args.agentDir ?? getAgentDir();
+ const projectTrusted = resolveProjectExtensionTrust({
+ cwd: args.cwd,
+ agentDir,
+ trustExtensions: args.trustExtensions === true,
+ });
const configuredPaths = args.configuredPaths ?? [join(args.cwd, ".agents", "extensions")];
const host = new HarnessExtensionHost({
harness: args.harness,
session: args.session,
cwd: args.cwd,
configuredPaths,
+ projectTrusted,
agentDir,
initialScreenshot: args.initialScreenshot,
});
await host.load();
return host;
}
+
+function resolveProjectExtensionTrust(args: {
+ cwd: string;
+ agentDir: string;
+ trustExtensions: boolean;
+}): boolean {
+ if (args.trustExtensions) return true;
+ if (!hasProjectExtensionInputs(args.cwd)) return true;
+ const trustDecision = new ProjectTrustStore(args.agentDir).get(args.cwd);
+ if (trustDecision !== null) return trustDecision;
+ const settings = SettingsManager.create(args.cwd, args.agentDir, { projectTrusted: false });
+ return settings.getDefaultProjectTrust() === "always";
+}
+
+function hasProjectExtensionInputs(cwd: string): boolean {
+ return hasProjectTrustInputs(cwd) || existsSync(join(cwd, ".agents", "extensions"));
+}
diff --git a/packages/cli/src/tui/main.ts b/packages/cli/src/tui/main.ts
--- a/packages/cli/src/tui/main.ts
+++ b/packages/cli/src/tui/main.ts
@@ -531,16 +531,25 @@
}
export async function applyReloadCommand(opts: InteractiveOptions, messages: MessageList): Promise<void> {
- if (!opts.host) {
+ if (!opts.host || opts.host.isDisposed()) {
messages.addNotice("extensions are disabled");
return;
}
messages.addNotice("reloading extensions…");
try {
+ await opts.harness.waitForIdle();
+ if (opts.host.isDisposed()) {
+ messages.addNotice("extensions are disabled");
+ return;
+ }
// reload() emits no harness event, so this helper is the only source of
// feedback; surface loadErrors so a broken edited extension isn't silently
// dropped with its tool missing.
await opts.host.reload();
+ if (opts.host.isDisposed()) {
+ messages.addNotice("extensions were shut down");
+ return;
+ }
if (opts.host.loadErrors.length > 0) {
for (const { path, error } of opts.host.loadErrors) messages.addError(`${path}: ${error}`);
} else {
diff --git a/packages/cli/test/e2e/agent-start-counter-shortcut.test.ts b/packages/cli/test/e2e/agent-start-counter-shortcut.test.ts
--- a/packages/cli/test/e2e/agent-start-counter-shortcut.test.ts
+++ b/packages/cli/test/e2e/agent-start-counter-shortcut.test.ts
@@ -120,6 +120,7 @@
session: fx.session,
cwd: fx.cwd,
configuredPaths: [extDir],
+ projectTrusted: true,
agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
});
await host.load();
diff --git a/packages/cli/test/e2e/dom-table-extraction.test.ts b/packages/cli/test/e2e/dom-table-extraction.test.ts
--- a/packages/cli/test/e2e/dom-table-extraction.test.ts
+++ b/packages/cli/test/e2e/dom-table-extraction.test.ts
@@ -87,6 +87,7 @@
session: fx.session,
cwd: fx.cwd,
configuredPaths: [extDir],
+ projectTrusted: true,
agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
});
await host.load();
diff --git a/packages/cli/test/e2e/form-fill-macro.test.ts b/packages/cli/test/e2e/form-fill-macro.test.ts
--- a/packages/cli/test/e2e/form-fill-macro.test.ts
+++ b/packages/cli/test/e2e/form-fill-macro.test.ts
@@ -90,6 +90,7 @@
session: fx.session,
cwd: fx.cwd,
configuredPaths: [extDir],
+ projectTrusted: true,
agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
});
await host.load();
diff --git a/packages/cli/test/e2e/nav-shortcut-tool.test.ts b/packages/cli/test/e2e/nav-shortcut-tool.test.ts
--- a/packages/cli/test/e2e/nav-shortcut-tool.test.ts
+++ b/packages/cli/test/e2e/nav-shortcut-tool.test.ts
@@ -85,6 +85,7 @@
session: fx.session,
cwd: fx.cwd,
configuredPaths: [extDir],
+ projectTrusted: true,
agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
});
await host.load();
diff --git a/packages/cli/test/e2e/template-match-click.test.ts b/packages/cli/test/e2e/template-match-click.test.ts
--- a/packages/cli/test/e2e/template-match-click.test.ts
+++ b/packages/cli/test/e2e/template-match-click.test.ts
@@ -98,6 +98,7 @@
session: fx.session,
cwd: fx.cwd,
configuredPaths: [extDir],
+ projectTrusted: true,
agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
});
await host.load();
diff --git a/packages/cli/test/extension-loader.test.ts b/packages/cli/test/extension-loader.test.ts
--- a/packages/cli/test/extension-loader.test.ts
+++ b/packages/cli/test/extension-loader.test.ts
@@ -74,7 +74,7 @@
expect(fx.harness.getTools().map((t) => t.name)).not.toContain("loader_probe");
});
- it("loads the implicit project <cwd>/.pi/extensions scan by default", async () => {
+ it("does not load the implicit project <cwd>/.pi/extensions scan when untrusted", async () => {
fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
// Unique per run so the whole-harness tool assertion can't collide with
// another worker registering the same name under pool concurrency.
@@ -92,10 +92,10 @@
});
expect(host).toBeDefined();
- expect(fx.harness.getTools().map((t) => t.name)).toContain(probe);
+ expect(fx.harness.getTools().map((t) => t.name)).not.toContain(probe);
});
- it("loads project <cwd>/.agents/extensions by default", async () => {
+ it("does not load project <cwd>/.agents/extensions when untrusted", async () => {
fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
const probe = `agents_probe_${randomUUID().replace(/-/g, "")}`;
const projectExtDir = join(fx.cwd, ".agents", "extensions");
@@ -111,6 +111,31 @@
});
expect(host).toBeDefined();
- expect(fx.harness.getTools().map((t) => t.name)).toContain(probe);
+ expect(fx.harness.getTools().map((t) => t.name)).not.toContain(probe);
});
+
+ it("loads project-local extension directories with --trust-extensions", async () => {
+ fx = await buildTestHarness({ turns: [{ steps: [{ type: "text", text: "ok" }] }] });
+ const agentsProbe = `agents_probe_${randomUUID().replace(/-/g, "")}`;
+ const piProbe = `pi_probe_${randomUUID().replace(/-/g, "")}`;
+ const agentsExtDir = join(fx.cwd, ".agents", "extensions");
+ mkdirSync(agentsExtDir, { recursive: true });
+ writeFileSync(join(agentsExtDir, "agents-probe.ts"), makeToolExtension(agentsProbe));
+ const piExtDir = join(fx.cwd, ".pi", "extensions");
+ mkdirSync(piExtDir, { recursive: true });
+ writeFileSync(join(piExtDir, "pi-probe.ts"), makeToolExtension(piProbe));
+
+ host = await loadHarnessExtensions({
+ harness: fx.harness,
+ session: fx.session,
+ cwd: fx.cwd,
+ noExtensions: false,
+ trustExtensions: true,
+ agentDir: tempAgentDir(),
+ });
+
+ expect(host).toBeDefined();
+ expect(fx.harness.getTools().map((t) => t.name)).toContain(piProbe);
+ expect(fx.harness.getTools().map((t) => t.name)).toContain(agentsProbe);
+ });
});
diff --git a/packages/cli/test/extensions.test.ts b/packages/cli/test/extensions.test.ts
--- a/packages/cli/test/extensions.test.ts
+++ b/packages/cli/test/extensions.test.ts
@@ -38,6 +38,7 @@
session: fx.session,
cwd: fx.cwd,
configuredPaths: [makeExtensionDir()],
+ projectTrusted: true,
agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
});
await created.load();
@@ -119,6 +120,7 @@
session: fx.session,
cwd: fx.cwd,
configuredPaths: [extDir],
+ projectTrusted: true,
agentDir: mkdtempSync(join(tmpdir(), "cua-agentdir-")),
});
host = created;
diff --git a/packages/cli/test/reload-command.test.ts b/packages/cli/test/reload-command.test.ts
--- a/packages/cli/test/reload-command.test.ts
+++ b/packages/cli/test/reload-command.test.ts
@@ -15,23 +15,39 @@
messages: MessageList;
notices: string[];
errors: string[];
+ waitForIdle: ReturnType<typeof vi.fn>;
} {
const messages = new MessageList();
const notices: string[] = [];
const errors: string[] = [];
+ const waitForIdle = vi.fn(async () => {});
vi.spyOn(messages, "addNotice").mockImplementation((text) => void notices.push(text));
vi.spyOn(messages, "addError").mockImplementation((text) => void errors.push(text));
- return { opts: { host } as InteractiveOptions, messages, notices, errors };
+ return {
+ opts: {
+ host,
+ harness: { waitForIdle } as unknown as InteractiveOptions["harness"],
+ } as InteractiveOptions,
+ messages,
+ notices,
+ errors,
+ waitForIdle,
+ };
}
describe("applyReloadCommand (/reload glue)", () => {
it("invokes host.reload() and reports a clean reload", async () => {
const reload = vi.fn(async () => {});
- const host = { reload, loadErrors: [] } as unknown as HarnessExtensionHost;
- const { opts, messages, notices, errors } = setup(host);
+ const host = {
+ reload,
+ loadErrors: [],
+ isDisposed: () => false,
+ } as unknown as HarnessExtensionHost;
+ const { opts, messages, notices, errors, waitForIdle } = setup(host);
await applyReloadCommand(opts, messages);
+ expect(waitForIdle).toHaveBeenCalledTimes(1);
expect(reload).toHaveBeenCalledTimes(1);
expect(notices).toContain("extensions reloaded");
expect(errors).toHaveLength(0);
@@ -42,16 +58,37 @@
const host = {
reload,
loadErrors: [{ path: "/ext/broken.ts", error: "boom" }],
+ isDisposed: () => false,
} as unknown as HarnessExtensionHost;
- const { opts, messages, errors, notices } = setup(host);
+ const { opts, messages, errors, notices, waitForIdle } = setup(host);
await applyReloadCommand(opts, messages);
+ expect(waitForIdle).toHaveBeenCalledTimes(1);
expect(reload).toHaveBeenCalledTimes(1);
expect(errors).toContain("/ext/broken.ts: boom");
expect(notices).not.toContain("extensions reloaded");
});
+ it("reports disabled when reload disposes the host", async () => {
+ let disposed = false;
+ const reload = vi.fn(async () => {
+ disposed = true;
+ });
+ const host = {
+ reload,
+ loadErrors: [],
+ isDisposed: () => disposed,
+ } as unknown as HarnessExtensionHost;
+ const { opts, messages, notices } = setup(host);
+
+ await applyReloadCommand(opts, messages);
+
+ expect(reload).toHaveBeenCalledTimes(1);
+ expect(notices).toContain("extensions were shut down");
+ expect(notices).not.toContain("extensions reloaded");
+ });
+
it("no-ops with a notice when no host is loaded", async () => {
const { opts, messages, notices } = setup(undefined);
initialScreenshot: args.initialScreenshot,
});
await host.load();
return host;
Project extensions skip trust gating
High Severity
The loadHarnessExtensions function unconditionally loads project-local extensions from .agents/extensions and .pi/extensions without any trust checks. This bypasses the intended trust gating, allowing arbitrary code execution from untrusted repositories.
Reviewed by Cursor Bugbot for commit 2eeeadb.
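The gating order that the autofix preview proposes for this issue can be read as a small decision function. The sketch below is a standalone restatement, not the PR's code: `TrustInputs` and `resolveTrust` are hypothetical names, and the stored decision and default setting are passed in as plain values instead of being read through pi's `ProjectTrustStore` and `SettingsManager`.

```typescript
// Hypothetical restatement of the trust decision order; not the PR's code.
type TrustInputs = {
  trustFlag: boolean; // --trust-extensions was passed
  hasProjectInputs: boolean; // project-local extension dirs exist under cwd
  storedDecision: boolean | null; // remembered decision for this cwd, null if none
  defaultTrust: string; // configured default; only "always" grants trust here
};

function resolveTrust(inputs: TrustInputs): boolean {
  if (inputs.trustFlag) return true; // explicit opt-in wins
  if (!inputs.hasProjectInputs) return true; // nothing project-local to gate
  if (inputs.storedDecision !== null) return inputs.storedDecision;
  return inputs.defaultTrust === "always";
}
```

Under these assumptions, a checkout that contains `.agents/extensions` but has no stored decision and no permissive default resolves to untrusted, so only global extensions would load.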
await this.reapplyTools();
if (await this.disposeIfShutdownRequested()) return;
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "reload" });
Reload drops bridge on failure
Medium Severity
The reload() method tears down the extension event bridge before re-initializing the runner and tools. If an error occurs during these re-initialization steps, the bridge isn't re-installed, which means extensions lose event forwarding and their tools stop working for the session.
Reviewed by Cursor Bugbot for commit 2eeeadb.
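One way to avoid losing the bridge is to build the replacement first and swap only once it is ready, restoring the previous state on failure. The sketch below illustrates that ordering with hypothetical stand-ins (`SwapHost`, a `Runner` record, and a boolean in place of the real bridge); it is a minimal model of the idea, not the host's implementation.

```typescript
// Hypothetical stand-ins for the runner and the event bridge.
type Runner = { label: string };

class SwapHost {
  runner: Runner | undefined;
  bridgeInstalled = false;
  private readonly build: () => Promise<Runner>;

  constructor(build: () => Promise<Runner>) {
    this.build = build;
  }

  async load(): Promise<void> {
    this.runner = await this.build();
    this.bridgeInstalled = true;
  }

  async reload(): Promise<void> {
    const previous = this.runner;
    try {
      // Build first: the old bridge stays installed while this can still fail.
      const next = await this.build();
      this.runner = next;
      // Swap the bridge only after the new runner exists.
      this.bridgeInstalled = false; // tear down the old bridge
      this.bridgeInstalled = true; // install the new one
    } catch (error) {
      // Roll back to the previous runner and keep a bridge installed.
      this.runner = previous;
      if (!this.bridgeInstalled && this.runner) this.bridgeInstalled = true;
      throw error;
    }
  }
}
```

The caller still sees the original error, but the session keeps the runner and bridge it had before the failed reload.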
if (parsed?.command === "reload") {
await applyReloadCommand(opts, messages);
requestRender("reload");
return;
/reload runs during active agent
Medium Severity
The /reload slash command calls host.reload() immediately without waiting for the harness to become idle, unlike other extension operations. This can tear down the event bridge or swap runners while a turn is executing.
Reviewed by Cursor Bugbot for commit 2eeeadb.
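A simple guard is to wait for the harness to go idle and re-check disposal before reloading. The sketch below assumes only the two small interfaces it declares; `reloadWhenIdle`, `IdleHarness`, and `ReloadableHost` are hypothetical names, while `waitForIdle`, `isDisposed`, and `reload` mirror the calls shown in the autofix diff.

```typescript
interface IdleHarness {
  waitForIdle(): Promise<void>;
}

interface ReloadableHost {
  isDisposed(): boolean;
  reload(): Promise<void>;
}

// Returns true only if a reload ran and the host is still alive afterwards.
async function reloadWhenIdle(harness: IdleHarness, host: ReloadableHost): Promise<boolean> {
  // Let any in-flight turn finish before swapping runners.
  await harness.waitForIdle();
  // The host may have been disposed while we were waiting.
  if (host.isDisposed()) return false;
  await host.reload();
  return !host.isDisposed();
}
```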
for (const { path, error } of opts.host.loadErrors) messages.addError(`${path}: ${error}`);
} else {
messages.addNotice("extensions reloaded");
}
Aborted reload reports success
Medium Severity
When host.reload() exits early because disposeIfShutdownRequested() disposed the host (for example after an extension calls ctx.shutdown() during session_shutdown), applyReloadCommand still shows “extensions reloaded” whenever loadErrors is empty, even though the host was torn down and extensions are no longer wired.
Reviewed by Cursor Bugbot for commit 2eeeadb.
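The feedback rule implied here is that disposal must be checked before load errors. As a sketch, the choice of message can be isolated in a pure function; `reloadFeedback` and its types are hypothetical, and the message strings are the ones that appear in the autofix diff.

```typescript
type LoadError = { path: string; error: string };
type ReloadOutcome = { disposed: boolean; loadErrors: LoadError[] };
type ReloadFeedback = { notices: string[]; errors: string[] };

function reloadFeedback(outcome: ReloadOutcome): ReloadFeedback {
  // A disposed host is never a successful reload, even with zero load errors.
  if (outcome.disposed) return { notices: ["extensions were shut down"], errors: [] };
  if (outcome.loadErrors.length > 0) {
    const errors = outcome.loadErrors.map(({ path, error }) => `${path}: ${error}`);
    return { notices: [], errors };
  }
  return { notices: ["extensions reloaded"], errors: [] };
}
```

Keeping the decision separate from the TUI makes each branch easy to cover without mocking a message list.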
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes using high effort and found 2 potential issues.
There are 12 total unresolved issues (including 10 from previous reviews).
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Extension sendUserMessage rejects unhandled
- Extension sendUserMessage now catches prompt failures and records them as extension_error entries so rejections are handled instead of becoming unhandled promises.
- ✅ Fixed: Extension startup steals first screenshot
- Initial screenshot attachment now keys only off the run-mode first-prompt/resume flags, so extension-startup turns no longer prevent the first user-driven prompt from receiving images.
Or push these changes by commenting:
@cursor push b64dd7022d
Preview (b64dd7022d)
diff --git a/packages/cli/src/action/harness-runner.ts b/packages/cli/src/action/harness-runner.ts
--- a/packages/cli/src/action/harness-runner.ts
+++ b/packages/cli/src/action/harness-runner.ts
@@ -140,23 +140,11 @@
async function maybeInitialScreenshot(opts: HarnessRunOptions): Promise<ImageContent[] | undefined> {
if (opts.skipInitialScreenshot) return undefined;
- const hasPriorTurn = await sessionHasPriorTurn(opts.session);
- if (hasPriorTurn) return undefined;
const png = await captureScreenshot(opts.browserHandle.client, opts.browserHandle.browser.session_id);
if (!png) return undefined;
return [{ type: "image", data: png.toString("base64"), mimeType: "image/png" }];
}
-async function sessionHasPriorTurn(session: Session): Promise<boolean> {
- const entries = await session.getBranch();
- for (const entry of entries) {
- if (entry.type === "message" && (entry.message.role === "user" || entry.message.role === "assistant")) {
- return true;
- }
- }
- return false;
-}
-
function textFromAssistant(message: AssistantMessage): string {
const parts: string[] = [];
for (const block of message.content) {
diff --git a/packages/cli/src/extensions/seams.ts b/packages/cli/src/extensions/seams.ts
--- a/packages/cli/src/extensions/seams.ts
+++ b/packages/cli/src/extensions/seams.ts
@@ -45,7 +45,16 @@
},
sendUserMessage(content): void {
const text = typeof content === "string" ? content : textPartsOf(content);
- void hooks.sendUserMessage(text);
+ void hooks.sendUserMessage(text).catch((error: unknown) => {
+ void session
+ .appendCustomMessageEntry(
+ "extension_error",
+ `sendUserMessage failed: ${errorMessage(error)}`,
+ true,
+ { action: "sendUserMessage" },
+ )
+ .catch(() => {});
+ });
},
appendEntry(customType, data): void {
void session.appendCustomEntry(customType, data);
@@ -166,3 +175,9 @@
.map((part) => part.text ?? "")
.join("");
}
+
+function errorMessage(error: unknown): string {
+ if (error instanceof Error && error.message.trim().length > 0) return error.message;
+ if (typeof error === "string" && error.trim().length > 0) return error;
+ return "unknown error";
+}
diff --git a/packages/cli/src/print.ts b/packages/cli/src/print.ts
--- a/packages/cli/src/print.ts
+++ b/packages/cli/src/print.ts
@@ -94,8 +94,6 @@
async function maybeInitialScreenshot(opts: RunPrintOptions): Promise<ImageContent[] | undefined> {
if (opts.skipInitialScreenshot) return undefined;
- const hasPriorTurn = await sessionHasPriorTurn(opts.session);
- if (hasPriorTurn) return undefined;
const png = await captureScreenshot(opts.browserHandle.client, opts.browserHandle.browser.session_id);
if (!png) return undefined;
return [
@@ -106,13 +104,3 @@
},
];
}
-
-async function sessionHasPriorTurn(session: Session): Promise<boolean> {
- const entries = await session.getBranch();
- for (const entry of entries) {
- if (entry.type === "message" && (entry.message.role === "user" || entry.message.role === "assistant")) {
- return true;
- }
- }
- return false;
-}
diff --git a/packages/cli/src/tui/main.ts b/packages/cli/src/tui/main.ts
--- a/packages/cli/src/tui/main.ts
+++ b/packages/cli/src/tui/main.ts
@@ -451,22 +451,11 @@
): Promise<ImageContent[] | undefined> {
if (firstPromptSent) return undefined;
if (opts.skipInitialScreenshot) return undefined;
- if (await sessionHasPriorTurn(opts.session)) return undefined;
const png = await captureScreenshot(opts.browserHandle.client, opts.browserHandle.browser.session_id);
if (!png) return undefined;
return [{ type: "image", data: png.toString("base64"), mimeType: "image/png" }];
}
-async function sessionHasPriorTurn(session: Session): Promise<boolean> {
- const entries = await session.getBranch();
- for (const entry of entries) {
- if (entry.type === "message" && (entry.message.role === "user" || entry.message.role === "assistant")) {
- return true;
- }
- }
- return false;
-}
-
async function applyModelCommand(
opts: InteractiveOptions,
footer: TelemetryFooter,
Reviewed by Cursor Bugbot for commit 2010817.
await this.buildRunner();
await this.reapplyTools();
this.installBridge();
await this.runner?.emit({ type: "session_start", reason: "startup" });
Extension startup steals first screenshot
High Severity
host.load() emits session_start before the CLI or TUI sends the user’s first prompt. If an extension calls pi.sendUserMessage there, maybeInitialScreenshot attaches the browser image to that message because the transcript has no prior turns. The real first user prompt then hits the same sessionHasPriorTurn check in print.ts / TUI and runs without { images }, so non-yutori models can start blind on the actual task.
Triggered by learned rule: Harness prompt calls must attach first-prompt screenshot for non-yutori providers
Reviewed by Cursor Bugbot for commit 2010817.
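The underlying idea of the fix is to key the screenshot on a first-user-prompt latch instead of on transcript contents. The sketch below shows such a latch in isolation; `FirstPromptScreenshot` and `ImagePart` are hypothetical names, and the capture callback stands in for the real browser screenshot.

```typescript
type ImagePart = { type: "image"; data: string; mimeType: string };

class FirstPromptScreenshot {
  private sent = false;
  private readonly capture: () => ImagePart[] | undefined;
  private readonly skip: boolean;

  constructor(capture: () => ImagePart[] | undefined, skip = false) {
    this.capture = capture;
    this.skip = skip;
  }

  // Call only from user-driven prompt sites. Extension-initiated prompts
  // never touch the latch, so they cannot consume the first screenshot.
  imagesForUserPrompt(): ImagePart[] | undefined {
    if (this.skip || this.sent) return undefined;
    this.sent = true;
    return this.capture();
  }
}
```

Because the latch ignores the transcript, a message sent by an extension during `session_start` does not change what the first real prompt receives.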
},
sendUserMessage(content): void {
const text = typeof content === "string" ? content : textPartsOf(content);
void hooks.sendUserMessage(text);
Extension sendUserMessage rejects unhandled
Medium Severity
pi.sendUserMessage is implemented with void hooks.sendUserMessage(text), so failures from harness.prompt (including concurrent use while the TUI is already driving the harness) surface as unhandled promise rejections rather than structured extension errors.
Reviewed by Cursor Bugbot for commit 2010817.
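A fire-and-forget call stays safe when the promise it discards can never reject. The sketch below is a variant of the `.catch` chain in the autofix diff, written with `try`/`catch` so it can be awaited in a test; `promptAndRecordFailure` and the `record` callback are hypothetical, and `errorMessage` follows the helper shown in the diff.

```typescript
function errorMessage(error: unknown): string {
  if (error instanceof Error && error.message.trim().length > 0) return error.message;
  if (typeof error === "string" && error.trim().length > 0) return error;
  return "unknown error";
}

// Resolves in every case: a failed prompt becomes a recorded entry.
async function promptAndRecordFailure(
  prompt: () => Promise<void>,
  record: (text: string) => Promise<void>,
): Promise<void> {
  try {
    await prompt();
  } catch (error: unknown) {
    try {
      await record(`sendUserMessage failed: ${errorMessage(error)}`);
    } catch {
      // Recording is best effort; it must not reject either.
    }
  }
}
```

A synchronous, `void`-returning caller can then write `void promptAndRecordFailure(...)` without risking an unhandled rejection.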



Summary
Brings pi extensions to cua end to end: a HarnessExtensionHost that loads pi extensions against cua's lower-level CuaAgentHarness (which pi's AgentSession-based extension system doesn't bind to), and the CLI runtime wiring so running cua actually discovers, loads, and hot-reloads them. This is the substrate for self-improving computer-use agents: an agent can author a learned tool as an extension and /reload it into the next run.

What's in the PR

1. The host (packages/cli/src/extensions/{host,seams,bridge}.ts)

Reuses pi's host-agnostic extension loader + runner: discovers extensions (jiti, no build step), registers their tools on the harness, bridges harness events into the runner's extension-event emitters, and mirrors AgentSession.reload for hot-swap. Tier A scope: tools, events, model/thinking/active-tool control, session-entry writes, headless (no-op UI). Deferred (stubbed): ctx.ui.*, slash/flag/renderer registration, the session-replacement family.

Correctness details handled: extension tools survive a setModel (which rebuilds the harness tool list) yet honor an explicit setActiveTools opt-out; reload drops tools from removed/renamed extensions; reload/dispose can't race into a double-teardown; sendUserMessage attaches the first-turn screenshot like the CLI's own prompt sites.

2. CLI runtime wiring (cli-harness.ts, cli.ts, tui/main.ts, tui/slash-commands.ts, extensions/setup.ts)

- The host is loaded in setupHarnessRuntime via a browser-free helper (loadHarnessExtensions) and disposed before the browser handle closes on all three run paths (print / interactive / action). A throwing extension load closes the handle before rethrowing, so it can't leak the browser session.
- The /reload TUI command hot-swaps edited extensions, surfaces loadErrors, and appears in autocomplete.
- initialScreenshot is wired from the browser handle, reusing the existing captureScreenshot.

3. Loading

Extensions load on every run from <cwd>/.agents/extensions, the implicit <cwd>/.pi/extensions scan, and the global pi agent dir (~/.pi/agent/extensions). --no-extensions disables loading entirely. (cua already executes agent-authored code through bash, file edits, and the browser, so project-local extensions load without a separate trust gate.)

Testing

- Unit/integration: extension tools surviving setModel, stale-tool removal on reload; the CLI load path (loadHarnessExtensions + a buildTestHarness fixture + temp dirs, no browser), including that both <cwd>/.pi/extensions and <cwd>/.agents/extensions load by default and --no-extensions returns no host; /reload TUI glue and parsing.
- E2E (test/e2e/): five scenarios drive the host through inefficient first run → meta-agent authors a learned tool → host.reload() → second run calls it in one step. Covers template-match click, DOM table extraction, form-fill macro, nav shortcut, and a pagination de-dup extractor (which also proves agent_start handlers re-bind after reload).
- npx tsc -b exits 0; cd packages/cli && npx vitest --run → 58 passed | 5 skipped (the 5 skipped are pre-existing ptywright-dependent TUI fixtures).

Test plan

- npx tsc -b exits 0
- cd packages/cli && npx vitest --run green (58 passed | 5 skipped)
- An extension in ~/.pi/agent/extensions exposes its tool in cua; a project .agents/extensions or .pi/extensions extension loads on run; editing one + /reload hot-swaps; --no-extensions disables

🤖 Generated with Claude Code
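The tool-list behavior described in the summary (extension tools surviving a rebuild, opt-outs being honored, and removed tools being dropped on reload) can be modeled as one pure function. The sketch below is only a model under stated assumptions: `reapplyTools` here is a hypothetical free function rather than the host's method, tools are reduced to their names, and "first occurrence wins" on a name collision is a guess, since the PR does not say which side wins.

```typescript
type Tool = { name: string };

function reapplyTools(
  baseTools: Tool[],
  extensionTools: Tool[],
  previousExtensionNames: Set<string>,
  activeNames: Set<string>,
): { tools: Tool[]; active: Set<string> } {
  // Union, de-duped by name; first occurrence wins (an assumption).
  const seen = new Set<string>();
  const tools: Tool[] = [];
  baseTools.concat(extensionTools).forEach((tool) => {
    if (!seen.has(tool.name)) {
      seen.add(tool.name);
      tools.push(tool);
    }
  });

  const active = new Set<string>();
  // Keep what was active, dropping names that no longer exist.
  activeNames.forEach((toolName) => {
    if (seen.has(toolName)) active.add(toolName);
  });
  // Auto-activate only extension tools that were not known before this pass.
  extensionTools.forEach((tool) => {
    if (!previousExtensionNames.has(tool.name)) active.add(tool.name);
  });
  return { tools, active };
}
```

With this shape, a pass triggered by a model change (same extension tools as before) leaves the active set untouched, while a reload that introduces a new tool activates only that tool.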
Note
Medium Risk
Extensions execute arbitrary project/global code and register tools on the agent harness, increasing attack surface and runtime complexity around tool lists and cleanup, though failures are partially isolated with dispose/close ordering.
Overview
Adds pi extension support to the cua CLI via a new HarnessExtensionHost that discovers extensions with pi’s loader/runner, registers their tools on CuaAgentHarness, bridges harness events into extension handlers (including reducers for context, provider payload, and tool call/result), and re-applies extension tools after setModel so they are not dropped when the harness rebuilds its tool list.

Runtime wiring: setupHarnessRuntime loads extensions through loadHarnessExtensions (project .agents/extensions, implicit .pi/extensions, global ~/.pi/agent/extensions), wires first-turn screenshots for extension-initiated prompts, disposes the host before closing the browser on print/interactive/action paths, and closes the browser if extension load throws. New --no-extensions flag disables loading; help/docs bump the recommended Anthropic model to claude-opus-4-8.

TUI: /reload re-discovers extensions from disk, surfaces loadErrors, and is included in slash-command autocomplete.

Tests: Unit/integration coverage for load paths, reload hot-swap, host lifecycle, and five e2e “self-improve” scenarios (learned tools after reload).
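The cleanup ordering in the runtime-wiring paragraph (dispose the host before closing the browser, and still close the browser when extension load throws) maps naturally onto nested `try`/`finally`. The sketch below is illustrative only: `runWithExtensions`, `BrowserLike`, and `HostLike` are hypothetical names, not the CLI's actual helpers.

```typescript
type BrowserLike = { close(): Promise<void> };
type HostLike = { dispose(): Promise<void> };

async function runWithExtensions<T>(
  openBrowser: () => Promise<BrowserLike>,
  loadHost: () => Promise<HostLike | undefined>,
  run: () => Promise<T>,
): Promise<T> {
  const browser = await openBrowser();
  let host: HostLike | undefined;
  try {
    // If loading throws, the outer finally still closes the browser.
    host = await loadHost();
    return await run();
  } finally {
    try {
      // Dispose extensions first so they never see a closed browser.
      if (host) await host.dispose();
    } finally {
      await browser.close();
    }
  }
}
```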
Reviewed by Cursor Bugbot for commit 2010817. Bugbot is set up for automated code reviews on this repo.