
feat(plugin): add subagent delegation mode #2

Open
Olbrasoft wants to merge 1 commit into JochenYang:main from Olbrasoft:feat/subagent-delegation-mode

@Olbrasoft


Summary

Add VISION_MODE env var to support two delegation strategies for non-vision models:

  • api mode (default, original): the vision tool calls an external VLM API (VISION_API_KEY + VISION_API_URL). This is the existing behaviour — left untouched.
  • subagent mode (new): the plugin instructs the LLM to delegate image analysis to a vision-capable subagent via the Task tool. No external API key required — the subagent runs on whatever multimodal model is configured in opencode (e.g. opencode-go/minimax-m3).

Auto-fallback

When VISION_MODE is unset, the plugin auto-detects:

  • Uses api mode if VISION_API_KEY is present
  • Falls back to subagent mode otherwise

This means the plugin works out-of-the-box without any external credentials, as long as a vision-capable subagent is configured.
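The auto-fallback rules above reduce to a small resolution step. The following is a hypothetical TypeScript sketch: the names `resolveVisionMode` and `VisionMode` are illustrative and not taken from plugins/vision-helper.ts. It also assumes two details the PR does not state: an unrecognised `VISION_MODE` value is treated as unset, and an empty `VISION_API_KEY` counts as absent.

```typescript
// Hypothetical sketch of the VISION_MODE auto-fallback; not the plugin's actual code.
type VisionMode = "api" | "subagent";

function resolveVisionMode(env: Record<string, string | undefined>): VisionMode {
  // Normalise case and whitespace so "API" or " subagent " are accepted (assumption).
  const explicit = env["VISION_MODE"]?.trim().toLowerCase();
  if (explicit === "api" || explicit === "subagent") {
    // An explicit, recognised value always wins, even if an API key is present.
    return explicit;
  }
  // Unset or unrecognised VISION_MODE: choose api mode only when a key exists.
  // An empty string is falsy, so an empty VISION_API_KEY is treated as absent.
  return env["VISION_API_KEY"] ? "api" : "subagent";
}
```

In a plugin this would typically be evaluated once at load time, for example `resolveVisionMode(process.env)`, so that every later hook sees a consistent mode.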

Motivation

Many users run a powerful non-vision model (e.g. GLM-5.2) as their primary agent and already have a multimodal model available via their opencode provider (e.g. opencode-go/minimax-m3, qwen3-vl via Ollama). Requiring a separate MiniMax Group API Key with Token Plan access — or any external VLM endpoint — adds friction for users who already have a vision model at hand.

Related: #29550 (Task tool should support passing image/file attachments to subagents) — the subagent mode is a complementary workaround that works today without changes to the Task tool.

Changes

plugins/vision-helper.ts

  • Add VISION_MODE ("api" | "subagent") with auto-fallback based on VISION_API_KEY presence
  • Add VISION_SUBAGENT_NAME (default: "image-reader") to make the subagent identifier configurable
  • experimental.chat.system.transform: inject delegation instruction for non-vision models in subagent mode
  • experimental.chat.messages.transform: in subagent mode, the image path hint gains a suffix telling the model to use the @image-reader subagent via the Task tool
  • isPluginInjectedText: regex extended to also match the new hint format, so previously injected hints are still cleaned up on re-transform
  • Original api mode behaviour preserved — no breaking changes
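To make the injections and the cleanup regex concrete, here is a hypothetical sketch. Only the overall shape comes from the list above: a delegation instruction for non-vision models in subagent mode, native analysis for vision models, a path hint with a subagent suffix, and a regex that recognises both hint variants. The helper names `buildSystemInstruction` and `buildPathHint`, the `[vision-helper]` prefix, all wording, and the body of `isPluginInjectedText` are assumptions, not the plugin's real strings.

```typescript
// Hypothetical sketch of the subagent-mode text injections; names and wording are illustrative.
type VisionMode = "api" | "subagent";

const DEFAULT_SUBAGENT = "image-reader";

// System-prompt addition. Native-vision models analyze images themselves;
// non-vision models in subagent mode are told to delegate.
// Returns null when nothing should be injected (api mode keeps its own logic).
function buildSystemInstruction(
  mode: VisionMode,
  modelHasVision: boolean,
  subagent: string = DEFAULT_SUBAGENT,
): string | null {
  if (modelHasVision) {
    return "You can see attached images directly; analyze them natively.";
  }
  if (mode !== "subagent") {
    return null;
  }
  return (
    "You cannot view images. When a message references a saved image path, " +
    `call the Task tool with the @${subagent} subagent and pass it that path.`
  );
}

// Per-image hint placed next to the saved file path; the suffix is added only in subagent mode.
function buildPathHint(path: string, mode: VisionMode, subagent: string = DEFAULT_SUBAGENT): string {
  const base = `[vision-helper] image saved to ${path}`;
  return mode === "subagent" ? `${base}; use @${subagent} subagent via Task tool` : base;
}

// The optional group matches both the original hint and the suffixed variant,
// so hints injected in either mode can be stripped before re-transforming.
const INJECTED_HINT = /^\[vision-helper\] image saved to \S+(?:; use @[\w-]+ subagent via Task tool)?$/;

function isPluginInjectedText(text: string): boolean {
  return INJECTED_HINT.test(text.trim());
}
```

Making the suffix an optional group, rather than adding a second pattern, keeps one place that defines "text the plugin injected", which is what the cleanup step relies on.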

README_en.md

  • Document VISION_MODE and VISION_SUBAGENT_NAME env vars
  • Add "Delegation Modes" section with setup instructions for subagent mode
  • Document auto-fallback behaviour

Setup for subagent mode

  1. Create ~/.config/opencode/agent/image-reader.md:

     ```markdown
     ---
     description: Analyzes images and screenshots using a multimodal model. Use when the main agent cannot view images.
     mode: subagent
     model: opencode-go/minimax-m3
     permission:
       read: allow
       glob: allow
       list: allow
       bash: deny
       edit: deny
     ---

     You are a vision analyst. Read the image at the given path using the `read` tool and describe what you see.
     ```

  2. Either set VISION_MODE=subagent or leave it unset (auto-fallback selects subagent mode when VISION_API_KEY is absent).

  3. Restart opencode. The plugin will:

    • Save pasted images to /tmp/opencode-vision/image{N}/
    • Inject a system prompt instructing the non-vision model to delegate
    • Inject a path hint naming the subagent
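The first of these steps is what lets the subagent's `read` tool open the image from disk. Below is a hypothetical sketch of that step, not the plugin's code. It assumes the pasted image arrives as a base64 `data:` URL and that the file is named `image.<ext>` inside `image{N}/`; neither detail is stated in this PR. The base directory is a parameter (defaulting to /tmp/opencode-vision) so it can be redirected in tests.

```typescript
// Hypothetical sketch: persist a pasted image under <baseDir>/image{N}/.
// Input format (base64 data URL) and file name are assumptions.
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const EXT_BY_MIME: Record<string, string> = {
  "image/png": "png",
  "image/jpeg": "jpg",
  "image/webp": "webp",
  "image/gif": "gif",
};

function saveDataUrlImage(dataUrl: string, index: number, baseDir = "/tmp/opencode-vision"): string {
  // Capture the MIME type and the base64 payload.
  const match = /^data:(image\/[\w.+-]+);base64,([A-Za-z0-9+/=]+)$/.exec(dataUrl);
  if (!match) {
    throw new Error("expected a base64 image data URL");
  }
  const [, mime, payload] = match;
  // One directory per pasted image, e.g. /tmp/opencode-vision/image3/.
  const dir = join(baseDir, `image${index}`);
  mkdirSync(dir, { recursive: true });
  // Unknown image types are still saved, with a generic extension.
  const file = join(dir, `image.${EXT_BY_MIME[mime] ?? "bin"}`);
  writeFileSync(file, Buffer.from(payload, "base64"));
  // The returned absolute path is what the injected hint would point at.
  return file;
}
```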

Test plan

  • Plugin compiles (bun build clean)
  • api mode still works (no env vars changed for existing users)
  • subagent mode: image pasted → plugin saves to tmp → injects hint → GLM-5.2 delegates to @image-reader (MiniMax M3) → description returned to user
  • Auto-fallback: with no VISION_API_KEY and no VISION_MODE, plugin uses subagent mode
  • Native-vision models (e.g. MiniMax M3 as primary) skip delegation; the system prompt tells them to analyze images natively

Backwards compatibility

✅ No breaking changes. Existing users with VISION_API_KEY set get behaviour identical to v1.7.0. The new subagent mode activates only when VISION_API_KEY is absent or VISION_MODE=subagent is set explicitly.

Add VISION_MODE env var to support two delegation strategies:
- 'api' (default, original): vision tool calls external VLM API
- 'subagent' (new): LLM delegates to @image-reader subagent via Task tool

Auto-fallback: uses 'api' when VISION_API_KEY is set, otherwise 'subagent'.
This lets the plugin work out-of-the-box without external API credentials,
reusing a vision-capable model already configured in opencode (e.g.
opencode-go/minimax-m3) via a subagent definition.

New env vars:
- VISION_MODE: 'api' | 'subagent' (auto-detected if unset)
- VISION_SUBAGENT_NAME: subagent identifier (default: 'image-reader')

The original api mode is left untouched — all existing env vars and the
vision tool behave exactly as before.