feat(groomer): add durable history, audit UI, repo context, and guardrails#468

Merged
joryirving merged 2 commits into main from feat/groomer-closeout on Jun 27, 2026

@itsmiso-ai

Contributor

Closes #460.

Turns the hosted groomer MVP (#462) into a complete, operator-visible subsystem. The groomer's scope is unchanged — it still doesn't edit code, open PRs, merge, clone repositories, or run shell.

What's added

  • GroomingRun history table — dedicated audit record with labels/lane before+after, context warnings, failure stage, comment URL, prompt/model metadata, and composite indexes. AgentRun/AuditLog are still written for existing views; GroomingRun is the detailed drilldown.
  • Repository context — when DISPATCH_GROOMER_REPO_CONTEXT_ENABLED=true, the groomer gathers bounded context via GitHub REST APIs only (metadata, code search, file snippets). Hard caps on searches, files, bytes. No clone, no shell. Failures are soft and recorded as warnings.
  • Comment cooldown — skips repeat hosted-groomer comments within DISPATCH_GROOMER_COMMENT_COOLDOWN_HOURS (default 24), unless force is set.
  • History API: GET /api/groomer/runs (list with filters) and GET /api/groomer/runs/[id] (detail).
  • Operator UI: /automation/groomer shows recent runs with status, issue, lane/label diffs, model, timestamps, and JSON links. Linked from the automation overview.
  • Scheduler token — optional DISPATCH_GROOMER_TOKEN for scheduled/admin invocations alongside DISPATCH_AGENT_TOKEN.
  • Docs: README.md and docs/hosted-groomer.md updated with new env vars, history/audit, repo context, and scheduling.
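
The environment variables named above could be read roughly as follows. This is a minimal sketch, not the project's loader: `loadGroomerConfig` and `GroomerEnvConfig` are hypothetical names, and the fallback behaviour for invalid values is an assumption. Only the variable names, the 24-hour default, and the trimming of the token come from this PR.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface GroomerEnvConfig {
  repoContextEnabled: boolean;
  commentCooldownHours: number;
  groomerToken: string | undefined;
}

// Hypothetical loader: reads the env vars named in the PR description.
function loadGroomerConfig(env: Record<string, string | undefined>): GroomerEnvConfig {
  // Repository context stays off unless the variable is exactly "true".
  const repoContextEnabled = env["DISPATCH_GROOMER_REPO_CONTEXT_ENABLED"] === "true";

  // Missing, empty, negative, or non-numeric values fall back to 24 hours
  // (the fallback on invalid input is an assumption of this sketch).
  const rawHours = (env["DISPATCH_GROOMER_COMMENT_COOLDOWN_HOURS"] ?? "").trim();
  const parsedHours = rawHours === "" ? NaN : Number(rawHours);
  const commentCooldownHours =
    Number.isFinite(parsedHours) && parsedHours >= 0 ? parsedHours : 24;

  // The optional scheduler token is trimmed; an empty value counts as unset.
  const trimmedToken = (env["DISPATCH_GROOMER_TOKEN"] ?? "").trim();
  const groomerToken = trimmedToken === "" ? undefined : trimmedToken;

  return { repoContextEnabled, commentCooldownHours, groomerToken };
}
```

Treating an empty token as unset avoids a configuration where a blank secret would be compared against incoming requests.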

Verification

  • npm run lint — clean (0 warnings)
  • npm run typecheck — clean
  • npm test — 1797 tests pass, 100 files
  • Secret scan — clean (only test fixtures)

…rails

Close #460. Turns the hosted groomer MVP into a complete, operator-visible
subsystem without broadening its scope (still no code edits, PRs, merges,
clones, or shell).

- GroomingRun table: dedicated history with labels/lane before+after,
  context warnings, failure stage, comment URL, and indexes.
- Repository context: bounded GitHub REST API context (metadata, code
  search, file snippets) with hard caps on searches/files/bytes. No clone.
- Comment cooldown: skips repeat comments within a configurable window.
- History API: GET /api/groomer/runs and /api/groomer/runs/[id].
- Operator UI: /automation/groomer with status, issue, lane/label diffs,
  model, timestamps, and JSON detail links.
- Auth: optional DISPATCH_GROOMER_TOKEN for scheduled invocations.
- Docs: new env vars, history/audit, repo context, scheduling.

@its-saffron its-saffron Bot left a comment

Contributor

AI Automated Review

Full PR review.

Analysis engine: MiniMax-M2.7@https://litellm.jory.dev/v1 (anthropic) — routed smart (risk match: auth_changes)

Recommendation: Approve

This PR adds durable history, audit UI, repository context, and guardrails to the hosted groomer feature. The changes are additive, well-tested, and respect the safety constraints established by the original groomer MVP (#462) and the design in issue #460.

Change-by-Change Findings

Database Migration (prisma/schema.prisma)

  • New GroomingRun table only; no existing tables are modified, so there is no data loss risk. The table uses cuid() IDs, cascade/set-null deletes on foreign keys, and includes composite indexes for the query patterns used by the new history API. Migration is safe.

Auth Changes (src/lib/auth.ts)

  • New authorizeGroomerRequest function adds DISPATCH_GROOMER_TOKEN as an alternative bearer token for the groomer endpoint. It falls back to standard auth (agent token, basic, OIDC) if the groomer token is not set. The safeEqual comparison is used for token comparison, which is the correct timing-safe pattern. No regression in existing auth flows — authorizeRequest is unchanged.
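
The token check described above might look roughly like the sketch below. It is not the code in src/lib/auth.ts: `makeGroomerAuthorizer` and the `Authorizer` type are hypothetical, `safeEqual` here is a stand-in built on Node's `crypto.timingSafeEqual`, and falling through to standard auth when a groomer token is set but does not match is an assumption.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Stand-in for the project's safeEqual helper (its real body is not shown in this PR).
// Hashing first gives equal-length buffers, which timingSafeEqual requires.
function safeEqual(a: string, b: string): boolean {
  const digestA = createHash("sha256").update(a).digest();
  const digestB = createHash("sha256").update(b).digest();
  return timingSafeEqual(digestA, digestB);
}

// Hypothetical shape: an authorizer decides from the Authorization header alone.
type Authorizer = (authorizationHeader: string | undefined) => boolean;

// Hypothetical factory: accept the groomer bearer token if one is configured,
// otherwise (or on mismatch, an assumption) defer to the standard auth flow.
function makeGroomerAuthorizer(
  groomerToken: string | undefined,
  standardAuth: Authorizer,
): Authorizer {
  const prefix = "Bearer ";
  return (header) => {
    const token = (groomerToken ?? "").trim();
    if (token !== "" && header !== undefined && header.startsWith(prefix)) {
      if (safeEqual(header.slice(prefix.length), token)) {
        return true;
      }
    }
    // Existing auth (agent token, basic, OIDC) is untouched and still applies.
    return standardAuth(header);
  };
}
```

The empty-token guard matters: without it, an unset token would compare equal to an empty bearer value.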

Path Handling (src/lib/groomer/repository-context.ts)

  • fetchRepositoryFileText encodes each path segment individually with encodeURIComponent, which percent-encodes reserved characters such as ?, #, %, spaces, and null bytes before constructing the GitHub Contents API URL (/repos/{owner}/{repo}/contents/{path}). Note that encodeURIComponent leaves "." unencoded, so a ".." segment passes through unchanged; the main protection is that this is not a filesystem path. These are HTTP requests to GitHub's API, which resolves paths server-side.
  • isTextLike uses an allowlist of extensions (.ts, .tsx, .js, etc.); files without extensions or with non-text extensions (.png, .bin) are skipped regardless of path content.
  • Repository context is disabled by default (DISPATCH_GROOMER_REPO_CONTEXT_ENABLED=false), capped by maxSearches, maxContextFiles, maxFileBytes, and a 40% budget of maxContextBytes.
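
The per-segment encoding and the extension allowlist can be sketched as below. This is an illustration under assumptions, not the contents of repository-context.ts: `contentsApiPath` is a hypothetical helper, the allowlist contains only the three extensions the review names, and dropping "." and ".." segments is an extra precaution added in this sketch that the PR is not confirmed to take.

```typescript
// Hypothetical allowlist; the review names only .ts, .tsx, and .js explicitly.
const TEXT_EXTENSIONS = new Set([".ts", ".tsx", ".js"]);

// Sketch of an isTextLike check: decide by the final extension of the file name.
function isTextLike(path: string): boolean {
  const name = path.split("/").pop() ?? "";
  const dot = name.lastIndexOf(".");
  // No extension (or a leading-dot name) is treated as non-text in this sketch.
  if (dot <= 0) return false;
  return TEXT_EXTENSIONS.has(name.slice(dot).toLowerCase());
}

// Hypothetical helper: build the Contents API path with each segment encoded.
function contentsApiPath(owner: string, repo: string, filePath: string): string {
  const encoded = filePath
    .split("/")
    // Dropping empty and dot segments is this sketch's own precaution.
    .filter((segment) => segment !== "" && segment !== "." && segment !== "..")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}/contents/${encoded}`;
}
```

Splitting before encoding keeps the "/" separators intact while any reserved character inside a segment is percent-encoded.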

New API Endpoints

  • GET /api/groomer/runs and GET /api/groomer/runs/[id] — both protected by authorizeRequest, return 401 on unauthorized access, 404 on missing runs.
  • POST /api/groomer/run — switched to authorizeGroomerRequest, returns a generic error message on failure (intentional security improvement over leaking error details).

GroomingRun History (src/lib/groomer/history.ts)

  • createGroomingRunRecord mirrors labelsBefore → labelsAfter and laneBefore → laneAfter on creation, ensuring the "before" state is captured atomically before any mutations.
  • listGroomingRuns clamps take to [1, 200] range.
  • completeGroomingRunRecord sets completedAt automatically.
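
Two of the behaviours above, mirroring the "before" state on creation and clamping `take`, are simple enough to sketch. The names `GroomingRunSnapshot`, `initialSnapshot`, and `clampTake` are hypothetical, and the fallback page size of 50 is an assumption; only the [1, 200] range and the before/after mirroring come from the review.

```typescript
// Hypothetical subset of a GroomingRun record.
interface GroomingRunSnapshot {
  labelsBefore: string[];
  labelsAfter: string[];
  laneBefore: string | null;
  laneAfter: string | null;
}

// On creation, "after" starts as a copy of "before"; later steps overwrite
// only the "after" fields, so the original state is never lost.
function initialSnapshot(labels: string[], lane: string | null): GroomingRunSnapshot {
  return {
    labelsBefore: [...labels],
    labelsAfter: [...labels],
    laneBefore: lane,
    laneAfter: lane,
  };
}

// Clamp a requested page size into [1, 200]; the fallback of 50 is an assumption.
function clampTake(requested: number | undefined, fallback = 50): number {
  const value =
    typeof requested === "number" && Number.isFinite(requested)
      ? Math.trunc(requested)
      : fallback;
  return Math.min(200, Math.max(1, value));
}
```

Copying the label array rather than sharing it means a later mutation of the "after" labels cannot alter the recorded "before" state.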

Guardrails Preserved

  • Dry-run defaults to true.
  • Comment cooldown enforcement queries prior GroomingRun records by issueId and createdAt.
  • Structured JSON schema validation via validateGroomerOutput is applied before any side effects.
  • Leases are acquired and released around the run.
  • Repository context failures are soft (warnings recorded, groomer continues).
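
The comment cooldown guardrail could be expressed as a pure check over prior runs, sketched below. `PriorRun` and `shouldSkipComment` are hypothetical names, and using a non-null `commentUrl` to detect that a run actually commented is an assumption; the review only states that prior GroomingRun records are queried by issueId and createdAt, and that `force` bypasses the cooldown.

```typescript
// Hypothetical projection of the GroomingRun fields the check needs.
interface PriorRun {
  issueId: string;
  createdAt: Date;
  commentUrl: string | null; // assumed to be null when no comment was posted
}

// Returns true when a hosted-groomer comment on this issue falls inside the window.
function shouldSkipComment(
  priorRuns: PriorRun[],
  issueId: string,
  now: Date,
  cooldownHours = 24,
  force = false,
): boolean {
  if (force) return false; // force bypasses the cooldown entirely
  const cutoff = now.getTime() - cooldownHours * 60 * 60 * 1000;
  return priorRuns.some(
    (run) =>
      run.issueId === issueId &&
      run.commentUrl !== null &&
      run.createdAt.getTime() >= cutoff,
  );
}
```

In the real subsystem the filtering would presumably happen in the database query rather than in memory; the in-memory form just keeps the sketch self-contained.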

Standards Compliance

  • Prisma schema conventions: ✅ new model with @id, @default(cuid()), proper relations, and composite indexes.
  • API route conventions: ✅ appropriate HTTP status codes (200, 401, 404, 500), JSON responses, error instanceof Error pattern in handlers.
  • Label allowlist: ✅ enforced by validateGroomerOutput schema validation.
  • Audit trail: ✅ AuditLog and AgentRun still written; new GroomingRun is the detailed drilldown.
  • Agent workflow contract: ✅ hosted-groomer uses the same groom run type; grooming scope unchanged (no PRs, no code edits, no shell).

Linked Issue Fit (#460)

All acceptance criteria are addressed:

  • ✅ One issue per invocation with candidate selection
  • ✅ Bounded repository context via GitHub REST APIs (no clone, no shell)
  • ✅ Provider-neutral LLM adapter
  • ✅ Schema-validated output before side effects
  • ✅ Dry-run mode returns mutation plan without writes
  • ✅ Write mode updates labels, adds comment (with cooldown), updates Dispatch cache
  • ✅ GroomingRun table with audit trail
  • ✅ External worker grooming (next-task?mode=groom) unaffected
  • ✅ Unit tests cover selection, validation, dry-run, mutation planning, and failure handling

Evidence Provider Findings

No evidence providers configured for this PR.

Tool Harness Findings

No tool calls issued; reviewed corpus directly.

Required Checks

  1. review migration for data loss risk — ✅ Verified. GroomingRun is a new table added via schema delta. No existing tables are modified. Uses onDelete: Cascade for Issue relation, onDelete: SetNull for AgentRun. No data loss possible.

  2. test migration on a copy of production schema: ⚠️ Cannot verify from corpus. The migration is a straightforward CREATE TABLE with standard types. Recommend verifying in a staging environment before production deployment.

  3. review for path traversal vulnerabilities — ✅ Verified. Paths go to GitHub Contents API URL (/repos/{owner}/{repo}/contents/{encoded_path}), not a filesystem. encodeURIComponent on each segment encodes traversal sequences. Allowlist extension filter blocks binary/non-text files. Repository context disabled by default with hard budget caps.

  4. test with edge-case paths (null bytes, symlinks) — ℹ️ Not explicitly covered by unit tests. Path encoding via encodeURIComponent handles null bytes. Symlinks are not applicable since the groomer reads from GitHub's API, not a local filesystem. The existing test suite in repository-context.test.ts covers empty paths, no-extension files, duplicate paths, and missing content but not explicit null-byte injection. Acceptable given the API-layer protection.

  5. review auth flow for regression — ✅ Verified. authorizeRequest is unchanged. authorizeGroomerRequest adds a fallback path without modifying existing token validation. safeEqual is used for groomer token comparison.

  6. verify session token handling is correct — ✅ Verified. DISPATCH_GROOMER_TOKEN is read from env, trimmed, and compared using safeEqual. It is never logged or persisted. The token only authorizes the groomer endpoint, not other Dispatch APIs.

Unknowns / Needs Verification

  • Migration has not been tested on a production-schema copy — recommend a staging validation run before deploying to production.
  • The Buffer global used in repository-context.ts and github.ts assumes Node.js runtime (consistent with the Node:24 Docker image used in the Dockerfile). No browser-specific issues detected.

@joryirving joryirving enabled auto-merge (squash) June 27, 2026 20:48
@joryirving joryirving merged commit b0ec3ec into main Jun 27, 2026
6 checks passed
@joryirving joryirving deleted the feat/groomer-closeout branch June 27, 2026 20:50

Development

Successfully merging this pull request may close these issues.

Run LLM-backed issue grooming inside Dispatch