From 653f850699b863525aab840d867cf5f34d21e375 Mon Sep 17 00:00:00 2001 From: stephanie chou Date: Mon, 22 Jun 2026 17:48:18 -0700 Subject: [PATCH 1/5] Group CLI crash reports on structured error signals Replaces the single catch-all Bugsnag bucket (~1,170 unrelated GraphQL/API errors across app/theme/store/hydrogen) with grouping derived from the original typed error's HTTP status and GraphQL extensions.code, rather than regex- reparsing a stringified message. - New error-grouping.ts: structured-first decision table (THROTTLED/429->rate_limit, 401->authentication, 403/ACCESS_DENIED->permission, 5xx->server), shared keyword categorizer as fallback, and undefined for unknown errors so Bugsnag stack-trace grouping is preserved. - error-handler.ts: sets event.groupingHash only when a real category resolves, and emits error_grouping metadata (slice_name, http_status, error_code, error_class) so backend routing works regardless of CLI version. - error.ts: raw graphql-request ClientError 401/429/THROTTLED is now treated as expected and kept out of crash reporting. Shared error-categorizer.ts / storage.ts (Monorail taxonomy) are untouched. Co-Authored-By: Claude Opus 4.8 (1M context) --- .changeset/structured-error-grouping.md | 7 + .../node/analytics/error-grouping.DESIGN.md | 55 ++++++ .../node/analytics/error-grouping.test.ts | 100 ++++++++++ .../private/node/analytics/error-grouping.ts | 179 ++++++++++++++++++ .../src/public/node/error-handler.test.ts | 38 +++- .../cli-kit/src/public/node/error-handler.ts | 29 ++- .../cli-kit/src/public/node/error.test.ts | 22 +++ packages/cli-kit/src/public/node/error.ts | 34 ++++ 8 files changed, 461 insertions(+), 3 deletions(-) create mode 100644 .changeset/structured-error-grouping.md create mode 100644 packages/cli-kit/src/private/node/analytics/error-grouping.DESIGN.md create mode 100644 packages/cli-kit/src/private/node/analytics/error-grouping.test.ts create mode 100644 packages/cli-kit/src/private/node/analytics/error-grouping.ts diff --git a/.changeset/structured-error-grouping.md b/.changeset/structured-error-grouping.md new file mode 100644 index 00000000000..578ab3d8688 --- /dev/null +++ b/.changeset/structured-error-grouping.md @@ -0,0 +1,7 @@ +--- +'@shopify/cli-kit': patch +--- + +Group CLI crash reports on structured error signals instead of one catch-all bucket + +Bugsnag error grouping now derives a stable `slice:category:signature` hash from the original typed error's HTTP status and GraphQL `extensions.code` (falling back to the shared keyword categorizer, and to Bugsnag's stack-trace grouping for genuinely unknown errors). The same signals are emitted as `error_grouping` metadata so backend routing works regardless of CLI version. Known-transient API errors (HTTP 401/429 and `THROTTLED`) are now treated as expected and kept out of crash reporting. diff --git a/packages/cli-kit/src/private/node/analytics/error-grouping.DESIGN.md b/packages/cli-kit/src/private/node/analytics/error-grouping.DESIGN.md new file mode 100644 index 00000000000..27f85ab76b8 --- /dev/null +++ b/packages/cli-kit/src/private/node/analytics/error-grouping.DESIGN.md @@ -0,0 +1,55 @@ +# Design: Structured error grouping for crash reporting + +## Problem + +A single Bugsnag bucket (`groupingHash 16864652937831232783`) is a catch-all for ~1,170 +unrelated GraphQL/API errors across `app`/`theme`/`store`/`hydrogen` (issue #7891). It is +un-routable (five different owning teams), mixes expected/transient noise with genuine +regressions, and manufactures false P1s. Root cause: in `sendErrorToBugsnag` every error is +flattened to `new Error(error.message)` (losing its class/code), no `groupingHash` is set, and +stack traces are uniform — so the backend groups everything together. + +## Users + +- **Resiliency owners / on-call** routing CLI crash buckets to the right team. +- **Dashboards/alerting** (Bugsnag, Observe) that escalate severity on a bucket. + +## Success criteria + +1. Distinct failure families land in distinct buckets, split by product slice and a stable + semantic category (authentication / permission / rate_limit / server / …). +2. Categories are derived from **structured signals** (HTTP status, GraphQL `extensions.code`, + error class) read from the original typed error — not by regex-reparsing a stringified message. +3. Expected/handled and known-transient errors (AbortErrors, THROTTLED, 5xx, raw 401) do **not** + reach crash reporting. +4. Genuinely unknown errors keep Bugsnag's stack-trace grouping (we do not merge distinct bugs). +5. Backend routing works regardless of CLI version (structured metadata tags, not just the hash). + +## Constraints + +- **Reporter-only scope.** No changes to API throw sites or cross-team code. The original typed + error is already in scope in `sendErrorToBugsnag`; `GraphQLClientError` already carries + `statusCode` + `errors[]`, and graphql-request's `ClientError` carries `response.status`/`errors`. +- **Do not mutate `error-categorizer.ts`.** Its precedence quirks are pinned by + `error-categorizer.test.ts` / `storage.test.ts` because it is shared with Monorail analytics + (`storage.ts` emits `error:${category}:${signature}`). It is reused here only as a fallback. +- **No import cycles.** `error.ts → headers.ts → error.ts` would cycle, so transient suppression + in `error.ts` uses only the external `ClientError` type (the raw-401 path). + +## Key decisions + +1. **Group on the original error, structured-first.** A new `error-grouping.ts` extracts + `{httpStatus, code, errorClass}` and maps them to a category via an explicit decision table. + `403` and `ACCESS_DENIED` → `permission` (deliberately, fixing the prior 403→authentication + confusion). Keyword `categorizeError` is the fallback only. +2. **Suppress + drop transient.** Restore the `expected_error` skip in `sendErrorToBugsnag`; + extend `shouldReportErrorAsUnexpected` so raw `ClientError` 401/429/THROTTLED is expected. +3. **Hash + tags.** Set `event.groupingHash` (when a real category resolves) AND emit + `error_grouping` metadata (`slice_name`, `http_status`, `error_code`, `error_class`, `category`) + so dashboards/routing don't depend on the hash or on CLI upgrade adoption. + +## Out of scope (follow-ups) + +- At-source `category`/`code` field on the `FatalError` hierarchy + API wrappers. +- Folding status/code grouping into `error-categorizer.ts` (needs coordinated `storage.ts` update). +- Alerting on `*:unknown:*` / `cli:*` bucket growth; tying severity to SLO burn not volume. diff --git a/packages/cli-kit/src/private/node/analytics/error-grouping.test.ts b/packages/cli-kit/src/private/node/analytics/error-grouping.test.ts new file mode 100644 index 00000000000..fb7827dbced --- /dev/null +++ b/packages/cli-kit/src/private/node/analytics/error-grouping.test.ts @@ -0,0 +1,100 @@ +import {errorGroupingHash, errorGroupingSignals} from './error-grouping.js' +import {GraphQLClientError} from '../api/headers.js' +import {AbortError} from '../../../public/node/error.js' +import {ClientError} from 'graphql-request' +import {describe, expect, test} from 'vitest' + +function clientError(status: number, code?: string): ClientError { + const errors = code ? [{message: 'boom', extensions: {code}}] : undefined + return new ClientError({status, errors, headers: {}} as any, {query: 'q'} as any) +} + +describe('errorGroupingSignals', () => { + test('reads status, code, and class from a GraphQLClientError', () => { + const error = new GraphQLClientError('Forbidden', 403, [{extensions: {code: 'ACCESS_DENIED'}}]) + + expect(errorGroupingSignals(error)).toEqual({ + httpStatus: 403, + code: 'ACCESS_DENIED', + errorClass: 'GraphQLClientError', + }) + }) + + test('reads status and code from a raw graphql-request ClientError', () => { + const error = clientError(429, 'THROTTLED') + + expect(errorGroupingSignals(error)).toEqual({ + httpStatus: 429, + code: 'THROTTLED', + errorClass: 'ClientError', + }) + }) + + test('returns only the class name for a generic Error', () => { + expect(errorGroupingSignals(new Error('boom'))).toEqual({errorClass: 'Error'}) + }) + + test('returns an empty object for a non-Error value', () => { + expect(errorGroupingSignals('boom')).toEqual({}) + }) + + test('ignores a non-array errors value', () => { + const error = new GraphQLClientError('weird', 400, 'not-an-array' as any) + + expect(errorGroupingSignals(error)).toEqual({httpStatus: 400, errorClass: 'GraphQLClientError'}) + }) +}) + +describe('errorGroupingHash — structured decision table', () => { + test.each<[string, ClientError | GraphQLClientError, string]>([ + ['THROTTLED code -> rate_limit', clientError(400, 'THROTTLED'), 'theme:rate_limit:http-400-throttled'], + ['HTTP 429 -> rate_limit', clientError(429), 'theme:rate_limit:http-429'], + ['HTTP 401 -> authentication', clientError(401), 'theme:authentication:http-401'], + ['HTTP 403 -> permission', clientError(403), 'theme:permission:http-403'], + ['ACCESS_DENIED code -> permission', clientError(400, 'ACCESS_DENIED'), 'theme:permission:http-400-access-denied'], + ['HTTP 500 -> server', clientError(500), 'theme:server:http-500'], + ['HTTP 503 -> server', clientError(503), 'theme:server:http-503'], + ])('%s', (_, error, expected) => { + expect(errorGroupingHash(error, 'theme')).toEqual(expected) + }) + + test('403 wins over a 401-looking authentication category (permission, not authentication)', () => { + const error = new GraphQLClientError('Forbidden', 403, [{extensions: {code: 'ACCESS_DENIED'}}]) + + expect(errorGroupingHash(error, 'store')).toEqual('store:permission:http-403-access-denied') + }) + + test('prefixes the hash with the provided slice name', () => { + expect(errorGroupingHash(clientError(429), 'app')).toEqual('app:rate_limit:http-429') + expect(errorGroupingHash(clientError(429), 'cli')).toEqual('cli:rate_limit:http-429') + }) +}) + +describe('errorGroupingHash — message fallback', () => { + test('recovers an HTTP status flattened into an AbortError message', () => { + const error = new AbortError('The request responded unsuccessfully with the HTTP status 500') + + expect(errorGroupingHash(error, 'theme')).toEqual('theme:server:http-500') + }) + + test('recovers a graphql-request style "(Code: NNN)" status from a message', () => { + const error = new Error('GraphQL Error (Code: 401): {"response":{"status":401}}') + + expect(errorGroupingHash(error, 'store')).toEqual('store:authentication:http-401') + }) + + test('falls back to the keyword categorizer for an untyped, signal-less error', () => { + const error = new Error('connect ETIMEDOUT 1.2.3.4:443') + const hash = errorGroupingHash(error, 'cli') + + expect(hash).toMatch(/^cli:network:/) + }) + + test('returns undefined for an unknown error so stack-trace grouping is preserved', () => { + expect(errorGroupingHash(new Error('something nobody has categorized'), 'cli')).toBeUndefined() + }) + + test('returns undefined for a non-Error value', () => { + expect(errorGroupingHash(undefined, 'cli')).toBeUndefined() + }) +}) diff --git a/packages/cli-kit/src/private/node/analytics/error-grouping.ts b/packages/cli-kit/src/private/node/analytics/error-grouping.ts new file mode 100644 index 00000000000..d4252a666e3 --- /dev/null +++ b/packages/cli-kit/src/private/node/analytics/error-grouping.ts @@ -0,0 +1,179 @@ +import {categorizeError, ErrorCategory, formatErrorMessage} from './error-categorizer.js' +import {GraphQLClientError} from '../api/headers.js' +import {ClientError} from 'graphql-request' + +/** + * Structured signals extracted from an error, used to group it into a meaningful Bugsnag bucket. + * + * These are read from the *original* typed error (before it is flattened to a generic `Error` + * for reporting), so we group on facts the error already carries rather than by re-parsing a + * stringified message. + */ +export interface ErrorGroupingSignals { + httpStatus?: number + code?: string + errorClass?: string +} + +/** + * Extracts structured grouping signals from an error object. + * + * @param error - The original error (any type). + * @returns The HTTP status, GraphQL error code, and error class name when available. + */ +export function errorGroupingSignals(error: unknown): ErrorGroupingSignals { + const signals: ErrorGroupingSignals = {} + + if (error instanceof Error) { + signals.errorClass = error.constructor.name + } + + // GraphQLClientError (handleErrors: true, status < 500) preserves the status and errors array. + if (error instanceof GraphQLClientError) { + signals.httpStatus = error.statusCode + signals.code = firstExtensionCode(error.errors) + return signals + } + + // Raw graphql-request ClientError (handleErrors: false) exposes the full response. + if (error instanceof ClientError) { + signals.httpStatus = error.response?.status + signals.code = firstExtensionCode(error.response?.errors) + } + + return signals +} + +/** + * Builds a Bugsnag grouping hash of the form `${slice}:${category}:${signature}`. + * + * Categories are resolved structured-first (HTTP status / GraphQL code), falling back to the + * shared keyword categorizer only for untyped errors. When no meaningful category can be + * resolved, returns `undefined` so the caller leaves `event.groupingHash` unset and Bugsnag's + * default stack-trace grouping applies — this avoids merging genuinely distinct unknown bugs. + * + * @param error - The original error (any type). + * @param sliceName - The product slice (`app`, `theme`, `store`, `hydrogen`, or `cli`). + * @returns The grouping hash, or `undefined` to fall back to stack-trace grouping. + */ +export function errorGroupingHash(error: unknown, sliceName: string): string | undefined { + const message = errorMessage(error) + // Object signals are authoritative; message-derived signals fill gaps (e.g. 5xx errors that the + // API layer flattens into an AbortError, dropping the structured status field). + const signals: ErrorGroupingSignals = {...signalsFromMessage(message), ...errorGroupingSignals(error)} + + const structuredCategory = categoryFromSignals(signals) + if (structuredCategory) { + return `${sliceName}:${structuredCategory}:${structuredSignature(signals)}` + } + + const groupingError = new Error(stripJsonDump(message)) + const category = categorizeError(groupingError) + if (category === ErrorCategory.Unknown) return undefined + + return `${sliceName}:${category.toLowerCase()}:${formatErrorMessage(groupingError, category)}` +} + +/** + * Resolves a semantic category from structured signals using an explicit decision table. + * + * 403 and ACCESS_DENIED map to `permission` (a forbidden request is a permission problem, not an + * authentication one). Returns `undefined` when no structured signal is present. + */ +function categoryFromSignals(signals: ErrorGroupingSignals): string | undefined { + const {httpStatus, code} = signals + + if (code === 'THROTTLED' || httpStatus === 429) return 'rate_limit' + if (httpStatus === 401) return 'authentication' + if (httpStatus === 403 || code === 'ACCESS_DENIED') return 'permission' + if (httpStatus !== undefined && httpStatus >= 500) return 'server' + + return undefined +} + +/** + * Builds a stable, low-cardinality signature slug from structured signals. + */ +function structuredSignature(signals: ErrorGroupingSignals): string { + const parts: string[] = [] + if (signals.httpStatus !== undefined) parts.push(`http-${signals.httpStatus}`) + if (signals.code) parts.push(signals.code) + if (parts.length === 0 && signals.errorClass) parts.push(signals.errorClass) + + return slugify(parts.join('-')) +} + +/** + * Recovers structured signals from an error message string. Handles both error shapes seen from + * the API layer: graphql-request's `GraphQL Error (Code: NNN): {json}` and the cli-kit wrapper's + * `... responded unsuccessfully with the HTTP status NNN ...`. + */ +function signalsFromMessage(message: string): ErrorGroupingSignals { + const signals: ErrorGroupingSignals = {} + + const statusMatch = message.match(/HTTP status (\d{3})/i) ?? message.match(/\(Code: (\d{3})\)/i) + if (statusMatch?.[1]) signals.httpStatus = Number(statusMatch[1]) + + const payload = parseEmbeddedJson(message) + if (payload?.response?.status !== undefined) { + signals.httpStatus = signals.httpStatus ?? payload.response.status + } + const code = firstExtensionCode(payload?.response?.errors) + if (code) signals.code = code + + return signals +} + +interface EmbeddedGraphQLPayload { + response?: { + status?: number + errors?: unknown + } +} + +/** + * Parses the JSON blob embedded in a graphql-request ClientError message, if present. + */ +function parseEmbeddedJson(message: string): EmbeddedGraphQLPayload | undefined { + const jsonStart = message.indexOf('{') + if (jsonStart === -1) return undefined + + try { + return JSON.parse(message.slice(jsonStart)) as EmbeddedGraphQLPayload + // eslint-disable-next-line no-catch-all/no-catch-all + } catch { + return undefined + } +} + +/** + * Returns the GraphQL `extensions.code` of the first error, when the errors value is an array. + * The API can return `errors` as a string, hence the array guard. + */ +function firstExtensionCode(errors: unknown): string | undefined { + if (!Array.isArray(errors)) return undefined + const code = (errors[0] as {extensions?: {code?: unknown}} | undefined)?.extensions?.code + return typeof code === 'string' ? code : undefined +} + +/** + * Strips a trailing JSON dump (`...: {json}`) from a message so the keyword categorizer sees the + * human-readable prefix rather than the serialized request/response (which contains the literal + * `request`, mis-routing to the network category). + */ +function stripJsonDump(message: string): string { + return message.split(': {')[0] ?? message +} + +function errorMessage(error: unknown): string { + if (error instanceof Error) return error.message + return typeof error === 'string' ? error : '' +} + +function slugify(value: string): string { + return value + .toLowerCase() + .replace(/[^a-z0-9]+/g, '-') + .replace(/^-|-$/g, '') + .slice(0, 50) +} diff --git a/packages/cli-kit/src/public/node/error-handler.test.ts b/packages/cli-kit/src/public/node/error-handler.test.ts index 8f101ab55d4..5fdfabfd180 100644 --- a/packages/cli-kit/src/public/node/error-handler.test.ts +++ b/packages/cli-kit/src/public/node/error-handler.test.ts @@ -6,13 +6,14 @@ import * as error from './error.js' import {hashString} from './crypto.js' import {isLocalEnvironment} from '../../private/node/context/service.js' import {getLastSeenUserIdAfterAuth} from '../../private/node/session.js' +import {GraphQLClientError} from '../../private/node/api/headers.js' import {settings} from '@oclif/core' import {beforeEach, describe, expect, test, vi} from 'vitest' const onNotify = vi.fn() const capturedEventHandler = vi.fn() -let lastBugsnagEvent: {addMetadata: ReturnType} | undefined +let lastBugsnagEvent: {addMetadata: ReturnType; groupingHash?: string} | undefined vi.mock('process') vi.mock('@bugsnag/js', () => { @@ -303,4 +304,39 @@ describe('sends errors to Bugsnag', () => { expect(lastBugsnagEvent).toBeDefined() expect(lastBugsnagEvent!.addMetadata).toHaveBeenCalledWith('custom', {slice_name: 'cli'}) }) + + test('sets a structured groupingHash and error_grouping metadata for a typed API error', async () => { + await metadata.addSensitiveMetadata(() => ({ + commandStartOptions: {startTime: Date.now(), startCommand: 'theme dev', startArgs: []}, + })) + const apiError = new GraphQLClientError('Forbidden', 403, [{extensions: {code: 'ACCESS_DENIED'}}]) + + await sendErrorToBugsnag(apiError, 'unexpected_error') + + expect(lastBugsnagEvent).toBeDefined() + expect(lastBugsnagEvent!.groupingHash).toEqual('theme:permission:http-403-access-denied') + expect(lastBugsnagEvent!.addMetadata).toHaveBeenCalledWith('error_grouping', { + slice_name: 'theme', + http_status: 403, + error_code: 'ACCESS_DENIED', + error_class: 'GraphQLClientError', + }) + }) + + test('leaves groupingHash unset for an unknown error but still tags error_grouping metadata', async () => { + await metadata.addSensitiveMetadata(() => ({ + commandStartOptions: {startTime: Date.now(), startCommand: 'app dev', startArgs: []}, + })) + + await sendErrorToBugsnag(new Error('something nobody has categorized'), 'unexpected_error') + + expect(lastBugsnagEvent).toBeDefined() + expect(lastBugsnagEvent!.groupingHash).toBeUndefined() + expect(lastBugsnagEvent!.addMetadata).toHaveBeenCalledWith('error_grouping', { + slice_name: 'app', + http_status: undefined, + error_code: undefined, + error_class: 'Error', + }) + }) }) diff --git a/packages/cli-kit/src/public/node/error-handler.ts b/packages/cli-kit/src/public/node/error-handler.ts index d639c9d3d33..06fd96cfa54 100644 --- a/packages/cli-kit/src/public/node/error-handler.ts +++ b/packages/cli-kit/src/public/node/error-handler.ts @@ -12,6 +12,7 @@ import { } from './error.js' import {outputDebug, outputInfo} from './output.js' import {getEnvironmentData} from '../../private/node/analytics.js' +import {errorGroupingHash, errorGroupingSignals} from '../../private/node/analytics/error-grouping.js' import {isLocalEnvironment} from '../../private/node/context/service.js' import {bugsnagApiKey, reportingRateLimit} from '../../private/node/constants.js' import {CLI_KIT_VERSION} from '../common/version.js' @@ -28,6 +29,14 @@ import {realpath} from 'fs/promises' // Hardcoded list per product slices to keep analytics consistent. const ALLOWED_SLICE_NAMES = new Set(['app', 'theme', 'hydrogen', 'store']) +// Derives the product slice from the command that was running (e.g. `theme dev` -> `theme`), +// defaulting to `cli` when the command is unknown or not in the allow-list. +function sliceNameFromStartCommand(startCommand: string | undefined): string { + if (!startCommand) return 'cli' + const firstWord = startCommand.trim().split(/\s+/)[0] ?? 'cli' + return ALLOWED_SLICE_NAMES.has(firstWord) ? firstWord : 'cli' +} + export async function errorHandler( error: Error & {exitCode?: number | undefined}, config?: Interfaces.Config, @@ -147,11 +156,27 @@ export async function sendErrorToBugsnag( // Attach command metadata so we know which CLI command triggered the error const {commandStartOptions} = metadata.getAllSensitiveMetadata() const {startCommand} = commandStartOptions ?? {} + const sliceName = sliceNameFromStartCommand(startCommand) if (startCommand) { - const firstWord = startCommand.trim().split(/\s+/)[0] ?? 'cli' - const sliceName = ALLOWED_SLICE_NAMES.has(firstWord) ? firstWord : 'cli' event.addMetadata('custom', {slice_name: sliceName}) } + + // Group on structured signals from the *original* error (not the flattened + // `reportableError`), and emit those signals as metadata so backend dashboards and + // routing work without depending on the grouping hash or on CLI version adoption. + // When no meaningful category resolves we leave `groupingHash` unset so Bugsnag's + // stack-trace grouping applies and distinct unknown bugs are not merged. + const groupingHash = errorGroupingHash(error, sliceName) + if (groupingHash) { + event.groupingHash = groupingHash + } + const {httpStatus, code, errorClass} = errorGroupingSignals(error) + event.addMetadata('error_grouping', { + slice_name: sliceName, + http_status: httpStatus, + error_code: code, + error_class: errorClass, + }) } const errorHandler = (error: unknown) => { if (error) { diff --git a/packages/cli-kit/src/public/node/error.test.ts b/packages/cli-kit/src/public/node/error.test.ts index 1bba30c0633..2ad406d6e86 100644 --- a/packages/cli-kit/src/public/node/error.test.ts +++ b/packages/cli-kit/src/public/node/error.test.ts @@ -1,7 +1,13 @@ import {AbortError, BugError, handler, cleanSingleStackTracePath, shouldReportErrorAsUnexpected} from './error.js' import {renderFatalError} from './ui.js' +import {ClientError} from 'graphql-request' import {describe, expect, test, vi} from 'vitest' +function clientError(status: number, code?: string): ClientError { + const errors = code ? [{message: 'boom', extensions: {code}}] : undefined + return new ClientError({status, errors, headers: {}} as any, {query: 'q'} as any) +} + vi.mock('./ui.js') describe('handler', () => { @@ -82,4 +88,20 @@ describe('shouldReportErrorAsUnexpected helper', () => { test('returns false for unsupported platform errors', () => { expect(shouldReportErrorAsUnexpected(new Error('Unsupported platform: win32 arm64 LE'))).toBe(false) }) + + test('returns false for a raw ClientError that is rate limited (HTTP 429)', () => { + expect(shouldReportErrorAsUnexpected(clientError(429))).toBe(false) + }) + + test('returns false for a raw ClientError that is unauthenticated (HTTP 401)', () => { + expect(shouldReportErrorAsUnexpected(clientError(401))).toBe(false) + }) + + test('returns false for a raw ClientError with a THROTTLED code', () => { + expect(shouldReportErrorAsUnexpected(clientError(400, 'THROTTLED'))).toBe(false) + }) + + test('returns true for a raw ClientError that is a genuine failure (HTTP 500)', () => { + expect(shouldReportErrorAsUnexpected(clientError(500))).toBe(true) + }) }) diff --git a/packages/cli-kit/src/public/node/error.ts b/packages/cli-kit/src/public/node/error.ts index c2a7852c65c..723aa24c7a5 100644 --- a/packages/cli-kit/src/public/node/error.ts +++ b/packages/cli-kit/src/public/node/error.ts @@ -3,6 +3,7 @@ import {OutputMessage, stringifyMessage, TokenizedString} from './output.js' import {InlineToken, TokenItem, tokenItemToString} from '../../private/node/ui/components/TokenizedText.js' import {Errors} from '@oclif/core' +import {ClientError} from 'graphql-request' import type {AlertCustomSection} from './ui.js' @@ -193,6 +194,12 @@ export function shouldReportErrorAsUnexpected(error: unknown): boolean { if (!isFatal(error)) { // this means its not one of the CLI wrapped errors if (error instanceof Error) { + // Raw API errors that slip through unwrapped (e.g. the handleErrors:false path) are + // transient/environmental, not CLI bugs: rate limiting (429/THROTTLED) and unauthenticated + // requests (401). Treat them as expected so they don't pollute crash reporting. + if (isTransientApiError(error)) { + return false + } const message = error.message return !errorMessageImpliesEnvironmentIssue(message) } @@ -204,6 +211,33 @@ export function shouldReportErrorAsUnexpected(error: unknown): boolean { return false } +/** + * Detects raw graphql-request `ClientError`s that are transient/environmental rather than CLI + * bugs: HTTP 401 (unauthenticated), 429 (rate limited), or a GraphQL `THROTTLED` code. These reach + * the reporter as plain `Error`s (not `FatalError`s) and would otherwise be reported as unexpected. + * + * Scoped to the external `ClientError` type only — importing the cli-kit `GraphQLClientError` + * wrapper here would create an `error.ts → headers.ts → error.ts` import cycle. + * + * @param error - Error to be checked. + * @returns A boolean indicating if the error is a known-transient API error. + */ +function isTransientApiError(error: Error): boolean { + if (!(error instanceof ClientError)) { + return false + } + const status = error.response?.status + if (status === 401 || status === 429) { + return true + } + const errors = error.response?.errors + if (Array.isArray(errors)) { + const code = (errors[0] as {extensions?: {code?: unknown}} | undefined)?.extensions?.code + return code === 'THROTTLED' + } + return false +} + /** * Stack traces usually have file:// - we strip that and also remove the Windows drive designation. * From 09e3ba00387bca7cad106c6c10462c875cbf651b Mon Sep 17 00:00:00 2001 From: stephanie chou Date: Mon, 22 Jun 2026 18:04:28 -0700 Subject: [PATCH 2/5] Drop unused export on ErrorGroupingSignals interface MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Knip flagged the interface as an unused export — it is only used internally as the return type of errorGroupingSignals; callers destructure inline. Co-Authored-By: Claude Opus 4.8 (1M context) --- packages/cli-kit/src/private/node/analytics/error-grouping.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packages/cli-kit/src/private/node/analytics/error-grouping.ts b/packages/cli-kit/src/private/node/analytics/error-grouping.ts index d4252a666e3..fb92841faa4 100644 --- a/packages/cli-kit/src/private/node/analytics/error-grouping.ts +++ b/packages/cli-kit/src/private/node/analytics/error-grouping.ts @@ -9,7 +9,7 @@ import {ClientError} from 'graphql-request' * for reporting), so we group on facts the error already carries rather than by re-parsing a * stringified message. */ -export interface ErrorGroupingSignals { +interface ErrorGroupingSignals { httpStatus?: number code?: string errorClass?: string From 8e0dc342c96f9c1c2257ae5de50c43869de8341f Mon Sep 17 00:00:00 2001 From: stephanie chou Date: Mon, 29 Jun 2026 12:16:28 -0700 Subject: [PATCH 3/5] Address review: robust GraphQL code scanning, single grouping resolver, suppression naming Responds to Mbarak's review on #7901: - Scan every GraphQL error (not just errors[0]) for routing codes, recognizing THROTTLED/429 and ACCESS_DENIED incl. nested App Management app_errors access_denied. Extracted into a dependency-free graphql-error-codes module shared by grouping and crash-report suppression (avoids the error.ts -> headers.ts -> error.ts import cycle). - Add the string GraphQL code "429" as a rate-limit signal, matching errorsIncludeStatus429 in private/node/api.ts. - Merge object signals over message-derived signals field-by-field with `??` so an absent object code/status can no longer erase a recovered value. - Single resolveErrorGrouping() returns {hash, category, signals}; the reporter drives both event.groupingHash and the error_grouping metadata from it, so the metadata now carries category and message-derived signals. - Rename isTransientApiError -> isExpectedApiError and document why 401 (expired/invalid credentials) is expected, distinct from rate limiting. - Document the deliberate 401-over-ACCESS_DENIED precedence and pin it + the new shapes with tests. Remove the issue-specific DESIGN.md. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../node/analytics/error-grouping.DESIGN.md | 55 --------- .../node/analytics/error-grouping.test.ts | 64 ++++++++++- .../private/node/analytics/error-grouping.ts | 104 +++++++++++++----- .../analytics/graphql-error-codes.test.ts | 41 +++++++ .../node/analytics/graphql-error-codes.ts | 79 +++++++++++++ .../src/public/node/error-handler.test.ts | 22 ++++ .../cli-kit/src/public/node/error-handler.ts | 14 ++- .../cli-kit/src/public/node/error.test.ts | 13 +++ packages/cli-kit/src/public/node/error.ts | 32 +++--- 9 files changed, 320 insertions(+), 104 deletions(-) delete mode 100644 packages/cli-kit/src/private/node/analytics/error-grouping.DESIGN.md create mode 100644 packages/cli-kit/src/private/node/analytics/graphql-error-codes.test.ts create mode 100644 packages/cli-kit/src/private/node/analytics/graphql-error-codes.ts diff --git a/packages/cli-kit/src/private/node/analytics/error-grouping.DESIGN.md b/packages/cli-kit/src/private/node/analytics/error-grouping.DESIGN.md deleted file mode 100644 index 27f85ab76b8..00000000000 --- a/packages/cli-kit/src/private/node/analytics/error-grouping.DESIGN.md +++ /dev/null @@ -1,55 +0,0 @@ -# Design: Structured error grouping for crash reporting - -## Problem - -A single Bugsnag bucket (`groupingHash 16864652937831232783`) is a catch-all for ~1,170 -unrelated GraphQL/API errors across `app`/`theme`/`store`/`hydrogen` (issue #7891). It is -un-routable (five different owning teams), mixes expected/transient noise with genuine -regressions, and manufactures false P1s. Root cause: in `sendErrorToBugsnag` every error is -flattened to `new Error(error.message)` (losing its class/code), no `groupingHash` is set, and -stack traces are uniform — so the backend groups everything together. - -## Users - -- **Resiliency owners / on-call** routing CLI crash buckets to the right team. -- **Dashboards/alerting** (Bugsnag, Observe) that escalate severity on a bucket. - -## Success criteria - -1. Distinct failure families land in distinct buckets, split by product slice and a stable - semantic category (authentication / permission / rate_limit / server / …). -2. Categories are derived from **structured signals** (HTTP status, GraphQL `extensions.code`, - error class) read from the original typed error — not by regex-reparsing a stringified message. -3. Expected/handled and known-transient errors (AbortErrors, THROTTLED, 5xx, raw 401) do **not** - reach crash reporting. -4. Genuinely unknown errors keep Bugsnag's stack-trace grouping (we do not merge distinct bugs). -5. Backend routing works regardless of CLI version (structured metadata tags, not just the hash). - -## Constraints - -- **Reporter-only scope.** No changes to API throw sites or cross-team code. The original typed - error is already in scope in `sendErrorToBugsnag`; `GraphQLClientError` already carries - `statusCode` + `errors[]`, and graphql-request's `ClientError` carries `response.status`/`errors`. -- **Do not mutate `error-categorizer.ts`.** Its precedence quirks are pinned by - `error-categorizer.test.ts` / `storage.test.ts` because it is shared with Monorail analytics - (`storage.ts` emits `error:${category}:${signature}`). It is reused here only as a fallback. -- **No import cycles.** `error.ts → headers.ts → error.ts` would cycle, so transient suppression - in `error.ts` uses only the external `ClientError` type (the raw-401 path). - -## Key decisions - -1. **Group on the original error, structured-first.** A new `error-grouping.ts` extracts - `{httpStatus, code, errorClass}` and maps them to a category via an explicit decision table. - `403` and `ACCESS_DENIED` → `permission` (deliberately, fixing the prior 403→authentication - confusion). Keyword `categorizeError` is the fallback only. -2. **Suppress + drop transient.** Restore the `expected_error` skip in `sendErrorToBugsnag`; - extend `shouldReportErrorAsUnexpected` so raw `ClientError` 401/429/THROTTLED is expected. -3. **Hash + tags.** Set `event.groupingHash` (when a real category resolves) AND emit - `error_grouping` metadata (`slice_name`, `http_status`, `error_code`, `error_class`, `category`) - so dashboards/routing don't depend on the hash or on CLI upgrade adoption. - -## Out of scope (follow-ups) - -- At-source `category`/`code` field on the `FatalError` hierarchy + API wrappers. -- Folding status/code grouping into `error-categorizer.ts` (needs coordinated `storage.ts` update). -- Alerting on `*:unknown:*` / `cli:*` bucket growth; tying severity to SLO burn not volume. diff --git a/packages/cli-kit/src/private/node/analytics/error-grouping.test.ts b/packages/cli-kit/src/private/node/analytics/error-grouping.test.ts index fb7827dbced..a87c5cedf80 100644 --- a/packages/cli-kit/src/private/node/analytics/error-grouping.test.ts +++ b/packages/cli-kit/src/private/node/analytics/error-grouping.test.ts @@ -1,4 +1,4 @@ -import {errorGroupingHash, errorGroupingSignals} from './error-grouping.js' +import {errorGroupingHash, errorGroupingSignals, resolveErrorGrouping} from './error-grouping.js' import {GraphQLClientError} from '../api/headers.js' import {AbortError} from '../../../public/node/error.js' import {ClientError} from 'graphql-request' @@ -9,6 +9,10 @@ function clientError(status: number, code?: string): ClientError { return new ClientError({status, errors, headers: {}} as any, {query: 'q'} as any) } +function clientErrorWithErrors(status: number, errors: unknown[]): ClientError { + return new ClientError({status, errors, headers: {}} as any, {query: 'q'} as any) +} + describe('errorGroupingSignals', () => { test('reads status, code, and class from a GraphQLClientError', () => { const error = new GraphQLClientError('Forbidden', 403, [{extensions: {code: 'ACCESS_DENIED'}}]) @@ -98,3 +102,61 @@ describe('errorGroupingHash — message fallback', () => { expect(errorGroupingHash(undefined, 'cli')).toBeUndefined() }) }) + +describe('errorGroupingHash — multi-error & nested codes (review #3/#5)', () => { + test('recognizes the GraphQL string code "429" as rate_limit, even at HTTP 200', () => { + // Mirrors errorsIncludeStatus429 in private/node/api.ts. + expect(errorGroupingHash(clientError(200, '429'), 'theme')).toEqual('theme:rate_limit:http-200-429') + }) + + test('scans every error, not just the first, for a routable code', () => { + const error = clientErrorWithErrors(200, [{message: 'noise'}, {extensions: {code: 'THROTTLED'}}]) + + expect(errorGroupingHash(error, 'theme')).toEqual('theme:rate_limit:http-200-throttled') + }) + + test('recognizes a nested App Management app_errors access_denied as permission', () => { + const error = clientErrorWithErrors(200, [{extensions: {app_errors: {errors: [{category: 'access_denied'}]}}}]) + + expect(errorGroupingHash(error, 'app')).toEqual('app:permission:http-200-access-denied') + }) +}) + +describe('errorGroupingHash — precedence & merge (review #2/#4)', () => { + test('401 wins over a co-occurring ACCESS_DENIED code (authentication, by decision)', () => { + const error = new GraphQLClientError('Unauthenticated', 401, [{extensions: {code: 'ACCESS_DENIED'}}]) + + // HTTP 401 is authoritative; the access-denied code is still recorded in the signature. + expect(errorGroupingHash(error, 'theme')).toEqual('theme:authentication:http-401-access-denied') + }) + + test('an absent object code does not erase a code recovered from the message', () => { + const error = new GraphQLClientError( + 'GraphQL Error (Code: 500): {"response":{"errors":[{"extensions":{"code":"INTERNAL"}}]}}', + 500, + ) + + const {signals, category} = resolveErrorGrouping(error, 'theme') + expect(category).toEqual('server') + expect(signals.httpStatus).toEqual(500) + expect(signals.code).toEqual('INTERNAL') + }) +}) + +describe('resolveErrorGrouping', () => { + test('returns hash, category, and signals together for the metadata path', () => { + expect(resolveErrorGrouping(clientError(403, 'ACCESS_DENIED'), 'store')).toEqual({ + hash: 'store:permission:http-403-access-denied', + category: 'permission', + signals: {httpStatus: 403, code: 'ACCESS_DENIED', errorClass: 'ClientError'}, + }) + }) + + test('returns an undefined hash but still-useful signals for an unknown error', () => { + const result = resolveErrorGrouping(new Error('nothing categorizes this'), 'cli') + + expect(result.hash).toBeUndefined() + expect(result.category).toBeUndefined() + expect(result.signals.errorClass).toEqual('Error') + }) +}) diff --git a/packages/cli-kit/src/private/node/analytics/error-grouping.ts b/packages/cli-kit/src/private/node/analytics/error-grouping.ts index fb92841faa4..f1c3968ada8 100644 --- a/packages/cli-kit/src/private/node/analytics/error-grouping.ts +++ b/packages/cli-kit/src/private/node/analytics/error-grouping.ts @@ -1,4 +1,5 @@ import {categorizeError, ErrorCategory, formatErrorMessage} from './error-categorizer.js' +import {graphQLErrorCodes, isPermissionCode, isRateLimitCode} from './graphql-error-codes.js' import {GraphQLClientError} from '../api/headers.js' import {ClientError} from 'graphql-request' @@ -15,9 +16,24 @@ interface ErrorGroupingSignals { errorClass?: string } +/** + * The fully resolved grouping decision for an error: the Bugsnag grouping hash (or `undefined` to + * fall back to stack-trace grouping), the semantic category, and the structured signals that drove + * the decision. The reporter uses a single resolution for both `event.groupingHash` and the + * `error_grouping` metadata so they can never disagree. + */ +interface ResolvedErrorGrouping { + hash?: string + category?: string + signals: ErrorGroupingSignals +} + /** * Extracts structured grouping signals from an error object. * + * Only assigns a field when a value is actually present, so the result can be safely merged over + * message-derived signals without an explicit `undefined` erasing a recovered value. + * * @param error - The original error (any type). * @returns The HTTP status, GraphQL error code, and error class name when available. */ @@ -30,62 +46,94 @@ export function errorGroupingSignals(error: unknown): ErrorGroupingSignals { // GraphQLClientError (handleErrors: true, status < 500) preserves the status and errors array. if (error instanceof GraphQLClientError) { - signals.httpStatus = error.statusCode - signals.code = firstExtensionCode(error.errors) + if (error.statusCode !== undefined) signals.httpStatus = error.statusCode + const code = routableCode(error.errors) + if (code !== undefined) signals.code = code return signals } // Raw graphql-request ClientError (handleErrors: false) exposes the full response. if (error instanceof ClientError) { - signals.httpStatus = error.response?.status - signals.code = firstExtensionCode(error.response?.errors) + const status = error.response?.status + if (status !== undefined) signals.httpStatus = status + const code = routableCode(error.response?.errors) + if (code !== undefined) signals.code = code } return signals } /** - * Builds a Bugsnag grouping hash of the form `${slice}:${category}:${signature}`. + * Resolves the full grouping decision for an error: hash, category, and the signals behind it. * * Categories are resolved structured-first (HTTP status / GraphQL code), falling back to the * shared keyword categorizer only for untyped errors. When no meaningful category can be - * resolved, returns `undefined` so the caller leaves `event.groupingHash` unset and Bugsnag's + * resolved, `hash` is `undefined` so the caller leaves `event.groupingHash` unset and Bugsnag's * default stack-trace grouping applies — this avoids merging genuinely distinct unknown bugs. * * @param error - The original error (any type). * @param sliceName - The product slice (`app`, `theme`, `store`, `hydrogen`, or `cli`). - * @returns The grouping hash, or `undefined` to fall back to stack-trace grouping. + * @returns The resolved hash (or `undefined`), category, and structured signals. */ -export function errorGroupingHash(error: unknown, sliceName: string): string | undefined { +export function resolveErrorGrouping(error: unknown, sliceName: string): ResolvedErrorGrouping { const message = errorMessage(error) // Object signals are authoritative; message-derived signals fill gaps (e.g. 5xx errors that the - // API layer flattens into an AbortError, dropping the structured status field). - const signals: ErrorGroupingSignals = {...signalsFromMessage(message), ...errorGroupingSignals(error)} + // API layer flattens into an AbortError, dropping the structured status field). Merge field by + // field so an absent object signal never clobbers one recovered from the message. + const fromError = errorGroupingSignals(error) + const fromMessage = signalsFromMessage(message) + const signals: ErrorGroupingSignals = { + httpStatus: fromError.httpStatus ?? fromMessage.httpStatus, + code: fromError.code ?? fromMessage.code, + errorClass: fromError.errorClass ?? fromMessage.errorClass, + } const structuredCategory = categoryFromSignals(signals) if (structuredCategory) { - return `${sliceName}:${structuredCategory}:${structuredSignature(signals)}` + return {hash: `${sliceName}:${structuredCategory}:${structuredSignature(signals)}`, category: structuredCategory, signals} } const groupingError = new Error(stripJsonDump(message)) const category = categorizeError(groupingError) - if (category === ErrorCategory.Unknown) return undefined + if (category === ErrorCategory.Unknown) { + return {hash: undefined, category: undefined, signals} + } - return `${sliceName}:${category.toLowerCase()}:${formatErrorMessage(groupingError, category)}` + const categoryName = category.toLowerCase() + return {hash: `${sliceName}:${categoryName}:${formatErrorMessage(groupingError, category)}`, category: categoryName, signals} +} + +/** + * Builds a Bugsnag grouping hash of the form `${slice}:${category}:${signature}`. + * + * Thin wrapper over {@link resolveErrorGrouping} for callers that only need the hash. + * + * @param error - The original error (any type). + * @param sliceName - The product slice (`app`, `theme`, `store`, `hydrogen`, or `cli`). + * @returns The grouping hash, or `undefined` to fall back to stack-trace grouping. + */ +export function errorGroupingHash(error: unknown, sliceName: string): string | undefined { + return resolveErrorGrouping(error, sliceName).hash } /** * Resolves a semantic category from structured signals using an explicit decision table. * - * 403 and ACCESS_DENIED map to `permission` (a forbidden request is a permission problem, not an - * authentication one). Returns `undefined` when no structured signal is present. + * Precedence is deliberate: + * - rate limit (`THROTTLED`/`429` code or HTTP 429) wins first — it is the most actionable bucket. + * - HTTP 401 is authoritative for `authentication`: a request the server rejected as + * unauthenticated is an authentication problem even if a GraphQL code also reports access-denied. + * `ACCESS_DENIED` in practice pairs with HTTP 403, which is handled by the permission branch. + * - 403 / `ACCESS_DENIED` (incl. nested App Management `access_denied`) → `permission`. + * + * Returns `undefined` when no structured signal is present. */ function categoryFromSignals(signals: ErrorGroupingSignals): string | undefined { const {httpStatus, code} = signals - if (code === 'THROTTLED' || httpStatus === 429) return 'rate_limit' + if (isRateLimitCode(code) || httpStatus === 429) return 'rate_limit' if (httpStatus === 401) return 'authentication' - if (httpStatus === 403 || code === 'ACCESS_DENIED') return 'permission' + if (httpStatus === 403 || isPermissionCode(code)) return 'permission' if (httpStatus !== undefined && httpStatus >= 500) return 'server' return undefined @@ -115,11 +163,11 @@ function signalsFromMessage(message: string): ErrorGroupingSignals { if (statusMatch?.[1]) signals.httpStatus = Number(statusMatch[1]) const payload = parseEmbeddedJson(message) - if (payload?.response?.status !== undefined) { - signals.httpStatus = signals.httpStatus ?? payload.response.status + if (signals.httpStatus === undefined && payload?.response?.status !== undefined) { + signals.httpStatus = payload.response.status } - const code = firstExtensionCode(payload?.response?.errors) - if (code) signals.code = code + const code = routableCode(payload?.response?.errors) + if (code !== undefined) signals.code = code return signals } @@ -147,13 +195,15 @@ function parseEmbeddedJson(message: string): EmbeddedGraphQLPayload | undefined } /** - * Returns the GraphQL `extensions.code` of the first error, when the errors value is an array. - * The API can return `errors` as a string, hence the array guard. + * Picks the most routing-relevant code from a GraphQL `errors` value. + * + * Scans every error (not just the first) and prefers a code we route on — a rate-limit code, then + * a permission code — so a benign leading code can't mask a `THROTTLED` or `ACCESS_DENIED` further + * down the array. Falls back to the first code present for visibility. */ -function firstExtensionCode(errors: unknown): string | undefined { - if (!Array.isArray(errors)) return undefined - const code = (errors[0] as {extensions?: {code?: unknown}} | undefined)?.extensions?.code - return typeof code === 'string' ? code : undefined +function routableCode(errors: unknown): string | undefined { + const codes = graphQLErrorCodes(errors) + return codes.find(isRateLimitCode) ?? codes.find(isPermissionCode) ?? codes[0] } /** diff --git a/packages/cli-kit/src/private/node/analytics/graphql-error-codes.test.ts b/packages/cli-kit/src/private/node/analytics/graphql-error-codes.test.ts new file mode 100644 index 00000000000..445cdde70b0 --- /dev/null +++ b/packages/cli-kit/src/private/node/analytics/graphql-error-codes.test.ts @@ -0,0 +1,41 @@ +import {graphQLErrorCodes, hasRateLimitCode, isPermissionCode, isRateLimitCode} from './graphql-error-codes.js' +import {describe, expect, test} from 'vitest' + +describe('graphQLErrorCodes', () => { + test('reads top-level extensions.code from every error, in order', () => { + const errors = [{extensions: {code: 'FIRST'}}, {message: 'no code'}, {extensions: {code: 'THROTTLED'}}] + + expect(graphQLErrorCodes(errors)).toEqual(['FIRST', 'THROTTLED']) + }) + + test('reads nested App Management app_errors categories', () => { + const errors = [{extensions: {app_errors: {errors: [{category: 'access_denied'}, {category: 'other'}]}}}] + + expect(graphQLErrorCodes(errors)).toEqual(['access_denied', 'other']) + }) + + test('returns an empty array for a string errors value (some 401 responses)', () => { + expect(graphQLErrorCodes('unauthenticated')).toEqual([]) + expect(graphQLErrorCodes(undefined)).toEqual([]) + }) +}) + +describe('code predicates', () => { + test('isRateLimitCode recognizes THROTTLED and the string 429', () => { + expect(isRateLimitCode('THROTTLED')).toBe(true) + expect(isRateLimitCode('429')).toBe(true) + expect(isRateLimitCode('ACCESS_DENIED')).toBe(false) + expect(isRateLimitCode(undefined)).toBe(false) + }) + + test('isPermissionCode recognizes ACCESS_DENIED and nested access_denied', () => { + expect(isPermissionCode('ACCESS_DENIED')).toBe(true) + expect(isPermissionCode('access_denied')).toBe(true) + expect(isPermissionCode('THROTTLED')).toBe(false) + }) + + test('hasRateLimitCode scans all errors, not just the first', () => { + expect(hasRateLimitCode([{message: 'a'}, {extensions: {code: '429'}}])).toBe(true) + expect(hasRateLimitCode([{extensions: {code: 'OTHER'}}])).toBe(false) + }) +}) diff --git a/packages/cli-kit/src/private/node/analytics/graphql-error-codes.ts b/packages/cli-kit/src/private/node/analytics/graphql-error-codes.ts new file mode 100644 index 00000000000..78126c604a4 --- /dev/null +++ b/packages/cli-kit/src/private/node/analytics/graphql-error-codes.ts @@ -0,0 +1,79 @@ +/** + * Pure helpers for reading routing-relevant codes out of a GraphQL error response. + * + * This module is intentionally dependency-free: it is imported by both the grouping logic + * (`error-grouping.ts`) and the crash-report suppression logic (`../../public/node/error.ts`). + * `error.ts` cannot import `error-grouping.ts` directly — that would create an + * `error.ts → headers.ts → error.ts` import cycle — so the shared scanning logic lives here, + * where it imports nothing from cli-kit. + */ + +/** GraphQL `extensions.code` values we treat as rate limiting. */ +const RATE_LIMIT_CODES = new Set(['THROTTLED', '429']) + +/** + * GraphQL `extensions.code` values (and nested App Management `app_errors` categories) we treat as + * a permission / access-denied signal. The top-level admin code is `ACCESS_DENIED`; the App + * Management nested shape uses the lowercase category `access_denied`. + */ +const PERMISSION_CODES = new Set(['ACCESS_DENIED', 'access_denied']) + +/** + * Collects every routing-relevant code from a GraphQL `errors` value. + * + * Scans *all* errors (not just the first), reading both the top-level `extensions.code` and the + * nested App Management shape `extensions.app_errors.errors[].category`. The API can also return + * `errors` as a string (e.g. some 401s), which yields no codes. + * + * @param errors - The `errors` value from a GraphQL response (array, string, or undefined). + * @returns Every string code/category found, in document order. + */ +export function graphQLErrorCodes(errors: unknown): string[] { + if (!Array.isArray(errors)) return [] + + return errors.flatMap((entry) => { + const found: string[] = [] + const error = entry as {extensions?: {code?: unknown; app_errors?: {errors?: unknown}}} | undefined + + const code = error?.extensions?.code + if (typeof code === 'string') found.push(code) + + const appErrors = error?.extensions?.app_errors?.errors + if (Array.isArray(appErrors)) { + for (const appError of appErrors) { + const category = (appError as {category?: unknown} | undefined)?.category + if (typeof category === 'string') found.push(category) + } + } + + return found + }) +} + +/** + * Whether a single code is a rate-limit signal (`THROTTLED` or `429`). + * + * Mirrors the established shape detected by `errorsIncludeStatus429` in `private/node/api.ts`, + * where `extensions.code === '429'` signals rate limiting even at HTTP 200. + */ +export function isRateLimitCode(code: string | undefined): boolean { + return code !== undefined && RATE_LIMIT_CODES.has(code) +} + +/** + * Whether a single code/category is a permission / access-denied signal. + */ +export function isPermissionCode(code: string | undefined): boolean { + return code !== undefined && PERMISSION_CODES.has(code) +} + +/** + * Whether any GraphQL error carries a rate-limit code. Convenience for suppression logic that only + * needs a boolean over the raw `errors` value. + * + * @param errors - The `errors` value from a GraphQL response. + * @returns True when a rate-limit code is present in any error. + */ +export function hasRateLimitCode(errors: unknown): boolean { + return graphQLErrorCodes(errors).some(isRateLimitCode) +} diff --git a/packages/cli-kit/src/public/node/error-handler.test.ts b/packages/cli-kit/src/public/node/error-handler.test.ts index 5fdfabfd180..261894a1577 100644 --- a/packages/cli-kit/src/public/node/error-handler.test.ts +++ b/packages/cli-kit/src/public/node/error-handler.test.ts @@ -317,12 +317,33 @@ describe('sends errors to Bugsnag', () => { expect(lastBugsnagEvent!.groupingHash).toEqual('theme:permission:http-403-access-denied') expect(lastBugsnagEvent!.addMetadata).toHaveBeenCalledWith('error_grouping', { slice_name: 'theme', + category: 'permission', http_status: 403, error_code: 'ACCESS_DENIED', error_class: 'GraphQLClientError', }) }) + test('emits message-derived signals and category in metadata for a flattened API error', async () => { + await metadata.addSensitiveMetadata(() => ({ + commandStartOptions: {startTime: Date.now(), startCommand: 'theme dev', startArgs: []}, + })) + // A 5xx the API layer flattened into an AbortError: the structured status field is gone, but it + // is recoverable from the message and must still reach both the hash and the metadata. + const flattened = new Error('The request responded unsuccessfully with the HTTP status 503') + + await sendErrorToBugsnag(flattened, 'unexpected_error') + + expect(lastBugsnagEvent!.groupingHash).toEqual('theme:server:http-503') + expect(lastBugsnagEvent!.addMetadata).toHaveBeenCalledWith('error_grouping', { + slice_name: 'theme', + category: 'server', + http_status: 503, + error_code: undefined, + error_class: 'Error', + }) + }) + test('leaves groupingHash unset for an unknown error but still tags error_grouping metadata', async () => { await metadata.addSensitiveMetadata(() => ({ commandStartOptions: {startTime: Date.now(), startCommand: 'app dev', startArgs: []}, @@ -334,6 +355,7 @@ describe('sends errors to Bugsnag', () => { expect(lastBugsnagEvent!.groupingHash).toBeUndefined() expect(lastBugsnagEvent!.addMetadata).toHaveBeenCalledWith('error_grouping', { slice_name: 'app', + category: undefined, http_status: undefined, error_code: undefined, error_class: 'Error', diff --git a/packages/cli-kit/src/public/node/error-handler.ts b/packages/cli-kit/src/public/node/error-handler.ts index 06fd96cfa54..26419f5dd37 100644 --- a/packages/cli-kit/src/public/node/error-handler.ts +++ b/packages/cli-kit/src/public/node/error-handler.ts @@ -12,7 +12,7 @@ import { } from './error.js' import {outputDebug, outputInfo} from './output.js' import {getEnvironmentData} from '../../private/node/analytics.js' -import {errorGroupingHash, errorGroupingSignals} from '../../private/node/analytics/error-grouping.js' +import {resolveErrorGrouping} from '../../private/node/analytics/error-grouping.js' import {isLocalEnvironment} from '../../private/node/context/service.js' import {bugsnagApiKey, reportingRateLimit} from '../../private/node/constants.js' import {CLI_KIT_VERSION} from '../common/version.js' @@ -164,18 +164,20 @@ export async function sendErrorToBugsnag( // Group on structured signals from the *original* error (not the flattened // `reportableError`), and emit those signals as metadata so backend dashboards and // routing work without depending on the grouping hash or on CLI version adoption. + // A single resolution drives both the hash and the metadata so they can never disagree: + // the metadata carries the resolved category and the same (message-merged) signals. // When no meaningful category resolves we leave `groupingHash` unset so Bugsnag's // stack-trace grouping applies and distinct unknown bugs are not merged. - const groupingHash = errorGroupingHash(error, sliceName) + const {hash: groupingHash, category, signals} = resolveErrorGrouping(error, sliceName) if (groupingHash) { event.groupingHash = groupingHash } - const {httpStatus, code, errorClass} = errorGroupingSignals(error) event.addMetadata('error_grouping', { slice_name: sliceName, - http_status: httpStatus, - error_code: code, - error_class: errorClass, + category, + http_status: signals.httpStatus, + error_code: signals.code, + error_class: signals.errorClass, }) } const errorHandler = (error: unknown) => { diff --git a/packages/cli-kit/src/public/node/error.test.ts b/packages/cli-kit/src/public/node/error.test.ts index 2ad406d6e86..b80d0054526 100644 --- a/packages/cli-kit/src/public/node/error.test.ts +++ b/packages/cli-kit/src/public/node/error.test.ts @@ -101,6 +101,19 @@ describe('shouldReportErrorAsUnexpected helper', () => { expect(shouldReportErrorAsUnexpected(clientError(400, 'THROTTLED'))).toBe(false) }) + test('returns false for a raw ClientError with a GraphQL "429" code at HTTP 200', () => { + // Matches errorsIncludeStatus429 in private/node/api.ts. + expect(shouldReportErrorAsUnexpected(clientError(200, '429'))).toBe(false) + }) + + test('returns false for a rate-limit code on a later error entry, not just the first', () => { + const error = new ClientError( + {status: 200, errors: [{message: 'noise'}, {extensions: {code: 'THROTTLED'}}], headers: {}} as any, + {query: 'q'} as any, + ) + expect(shouldReportErrorAsUnexpected(error)).toBe(false) + }) + test('returns true for a raw ClientError that is a genuine failure (HTTP 500)', () => { expect(shouldReportErrorAsUnexpected(clientError(500))).toBe(true) }) diff --git a/packages/cli-kit/src/public/node/error.ts b/packages/cli-kit/src/public/node/error.ts index 723aa24c7a5..d3c891775ec 100644 --- a/packages/cli-kit/src/public/node/error.ts +++ b/packages/cli-kit/src/public/node/error.ts @@ -1,6 +1,7 @@ import {normalizePath} from './path.js' import {OutputMessage, stringifyMessage, TokenizedString} from './output.js' import {InlineToken, TokenItem, tokenItemToString} from '../../private/node/ui/components/TokenizedText.js' +import {hasRateLimitCode} from '../../private/node/analytics/graphql-error-codes.js' import {Errors} from '@oclif/core' import {ClientError} from 'graphql-request' @@ -194,10 +195,10 @@ export function shouldReportErrorAsUnexpected(error: unknown): boolean { if (!isFatal(error)) { // this means its not one of the CLI wrapped errors if (error instanceof Error) { - // Raw API errors that slip through unwrapped (e.g. the handleErrors:false path) are - // transient/environmental, not CLI bugs: rate limiting (429/THROTTLED) and unauthenticated - // requests (401). Treat them as expected so they don't pollute crash reporting. - if (isTransientApiError(error)) { + // Raw API errors that slip through unwrapped (e.g. the handleErrors:false path) are expected + // environmental conditions, not CLI bugs. Treat them as expected so they don't pollute crash + // reporting. + if (isExpectedApiError(error)) { return false } const message = error.message @@ -212,17 +213,23 @@ export function shouldReportErrorAsUnexpected(error: unknown): boolean { } /** - * Detects raw graphql-request `ClientError`s that are transient/environmental rather than CLI - * bugs: HTTP 401 (unauthenticated), 429 (rate limited), or a GraphQL `THROTTLED` code. These reach - * the reporter as plain `Error`s (not `FatalError`s) and would otherwise be reported as unexpected. + * Detects raw graphql-request `ClientError`s that are expected environmental conditions rather than + * CLI bugs. These reach the reporter as plain `Error`s (not `FatalError`s) and would otherwise be + * reported as unexpected. Two distinct cases: + * + * - **HTTP 401 (unauthenticated).** Not "transient" in the retry sense — it means the user's + * session token is expired or invalid, a credential/environment condition we never want as a + * crash report (see issue #7891). + * - **Rate limiting (HTTP 429, or a `THROTTLED`/`429` GraphQL code on any error in the response).** + * Matches the rate-limit shape detected by `errorsIncludeStatus429` in `private/node/api.ts`. * * Scoped to the external `ClientError` type only — importing the cli-kit `GraphQLClientError` * wrapper here would create an `error.ts → headers.ts → error.ts` import cycle. * * @param error - Error to be checked. - * @returns A boolean indicating if the error is a known-transient API error. + * @returns A boolean indicating if the error is a known expected API error. */ -function isTransientApiError(error: Error): boolean { +function isExpectedApiError(error: Error): boolean { if (!(error instanceof ClientError)) { return false } @@ -230,12 +237,7 @@ function isTransientApiError(error: Error): boolean { if (status === 401 || status === 429) { return true } - const errors = error.response?.errors - if (Array.isArray(errors)) { - const code = (errors[0] as {extensions?: {code?: unknown}} | undefined)?.extensions?.code - return code === 'THROTTLED' - } - return false + return hasRateLimitCode(error.response?.errors) } /** From f0fc5fdc919cbfdbf646d2d10619238719a2d6cc Mon Sep 17 00:00:00 2001 From: stephanie chou Date: Mon, 29 Jun 2026 12:34:51 -0700 Subject: [PATCH 4/5] Fix lint: prettier multi-line returns + jsdoc indentation Wrap the two resolveErrorGrouping return objects across lines (prettier/prettier) and flatten the isExpectedApiError jsdoc to remove indented continuation lines (jsdoc/check-indentation, enforced on public API files). Co-Authored-By: Claude Opus 4.8 (1M context) --- .../src/private/node/analytics/error-grouping.ts | 12 ++++++++++-- packages/cli-kit/src/public/node/error.ts | 11 +++++------ 2 files changed, 15 insertions(+), 8 deletions(-) diff --git a/packages/cli-kit/src/private/node/analytics/error-grouping.ts b/packages/cli-kit/src/private/node/analytics/error-grouping.ts index f1c3968ada8..ff66a1803d7 100644 --- a/packages/cli-kit/src/private/node/analytics/error-grouping.ts +++ b/packages/cli-kit/src/private/node/analytics/error-grouping.ts @@ -90,7 +90,11 @@ export function resolveErrorGrouping(error: unknown, sliceName: string): Resolve const structuredCategory = categoryFromSignals(signals) if (structuredCategory) { - return {hash: `${sliceName}:${structuredCategory}:${structuredSignature(signals)}`, category: structuredCategory, signals} + return { + hash: `${sliceName}:${structuredCategory}:${structuredSignature(signals)}`, + category: structuredCategory, + signals, + } } const groupingError = new Error(stripJsonDump(message)) @@ -100,7 +104,11 @@ export function resolveErrorGrouping(error: unknown, sliceName: string): Resolve } const categoryName = category.toLowerCase() - return {hash: `${sliceName}:${categoryName}:${formatErrorMessage(groupingError, category)}`, category: categoryName, signals} + return { + hash: `${sliceName}:${categoryName}:${formatErrorMessage(groupingError, category)}`, + category: categoryName, + signals, + } } /** diff --git a/packages/cli-kit/src/public/node/error.ts b/packages/cli-kit/src/public/node/error.ts index d3c891775ec..9f84774cef9 100644 --- a/packages/cli-kit/src/public/node/error.ts +++ b/packages/cli-kit/src/public/node/error.ts @@ -215,13 +215,12 @@ export function shouldReportErrorAsUnexpected(error: unknown): boolean { /** * Detects raw graphql-request `ClientError`s that are expected environmental conditions rather than * CLI bugs. These reach the reporter as plain `Error`s (not `FatalError`s) and would otherwise be - * reported as unexpected. Two distinct cases: + * reported as unexpected. Two distinct cases, both kept out of crash reporting: * - * - **HTTP 401 (unauthenticated).** Not "transient" in the retry sense — it means the user's - * session token is expired or invalid, a credential/environment condition we never want as a - * crash report (see issue #7891). - * - **Rate limiting (HTTP 429, or a `THROTTLED`/`429` GraphQL code on any error in the response).** - * Matches the rate-limit shape detected by `errorsIncludeStatus429` in `private/node/api.ts`. + * HTTP 401 (unauthenticated) is not "transient" in the retry sense — it means the user's session + * token is expired or invalid, a credential/environment condition (see issue #7891). Rate limiting + * (HTTP 429, or a `THROTTLED`/`429` GraphQL code on any error in the response) matches the shape + * detected by `errorsIncludeStatus429` in `private/node/api.ts`. * * Scoped to the external `ClientError` type only — importing the cli-kit `GraphQLClientError` * wrapper here would create an `error.ts → headers.ts → error.ts` import cycle. From 41879bfaf7f3f8fba95a453d2221089f5ab3b074 Mon Sep 17 00:00:00 2001 From: stephanie chou Date: Mon, 29 Jun 2026 13:07:05 -0700 Subject: [PATCH 5/5] Regenerate admin types: add THEME_CONTEXTUALIZATION_NOT_COMPATIBLE_WITH_CONDITION_TYPES Incidental codegen drift cleanup, not part of the error-grouping logic. The admin schema gained this OnlineStoreThemeFilesUserErrorsCode enum value upstream; the committed generated types were stale. The codegen check only runs when a PR touches graphql-pathed files, so this PR's new graphql-error-codes.ts un-skipped it and surfaced the pre-existing drift. Reproduces codegen output byte-for-byte (no local schema fetch available). Co-Authored-By: Claude Opus 4.8 (1M context) --- packages/cli-kit/src/cli/api/graphql/admin/generated/types.d.ts | 2 ++ 1 file changed, 2 insertions(+) diff --git a/packages/cli-kit/src/cli/api/graphql/admin/generated/types.d.ts b/packages/cli-kit/src/cli/api/graphql/admin/generated/types.d.ts index 2da991ce827..9ce2584b9d6 100644 --- a/packages/cli-kit/src/cli/api/graphql/admin/generated/types.d.ts +++ b/packages/cli-kit/src/cli/api/graphql/admin/generated/types.d.ts @@ -232,6 +232,8 @@ export type OnlineStoreThemeFilesUserErrorsCode = | 'LESS_THAN_OR_EQUAL_TO' /** The record with the ID used as the input value couldn't be found. */ | 'NOT_FOUND' + /** Theme contextualization and condition types are not compatible with each other. */ + | 'THEME_CONTEXTUALIZATION_NOT_COMPATIBLE_WITH_CONDITION_TYPES' /** There are theme files with conflicts. */ | 'THEME_FILES_CONFLICT' /** This action is not available on your current plan. Please upgrade to access theme editing features. */