Merged
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -48,6 +48,10 @@
- Add `failproofai audit` command (beta) — retrospectively scan past agent transcripts across all 7 CLIs and report wasteful/risky behavior via the 39 builtin policies + 8 new audit-only detectors (`redundant-cd-cwd`, `prefer-edit-over-read-cat`, `prefer-edit-over-sed-awk`, `prefer-write-over-heredoc`, `sleep-polling-loop`, `find-from-root`, `git-commit-no-verify`, `reread-after-edit`). Outputs ANSI table + markdown report; supports `--cli`, `--project`, `--since`, `--policy`, `--limit`, `--show-examples`, `--report`, `--no-report`, `--json`, `--no-cache`. Per-transcript cache at `~/.failproofai/cache/audit/` auto-invalidates on policy/detector code changes (#377).

### Fixes
- Deliver the `failproofai audit` CLI's telemetry reliably. `cli_audit_started` / `cli_audit_completed` / `cli_audit_failed` were emitted fire-and-forget (`void trackHookEvent(...)`), so the failed path (`die()` → `process.exit(1)`) and the empty-history path (`process.exit(0)`) killed the in-flight `fetch` before it landed — those events never reached PostHog. `src/audit/cli.ts` now `await`s the two exit-adjacent events before exiting (matching `bin/failproofai.mjs`'s `track()` helper); `cli_audit_started` stays fire-and-forget since the multi-second scan keeps the process alive. New `__tests__/audit/audit-cli-telemetry.test.ts` asserts each path emits its event and that the exit-adjacent events are awaited before `process.exit` (#461).
- Apply the same telemetry-delivery fix to the `failproofai auth` CLI (`src/auth/cli.ts`), which had the identical bug: `audit_cli_auth_login_completed` / `audit_otp_verified` / `audit_user_identity_linked` / `audit_cli_auth_logout_completed` / `audit_cli_auth_whoami` were emitted fire-and-forget and dropped when the process exited after the command. The terminal and error events are now awaited; the two mid-flow events (`audit_cli_auth_login_started`, the success `audit_otp_requested`) stay fire-and-forget since the interactive `email:` / `code:` prompts keep the process alive. New `__tests__/auth/auth-cli-telemetry.test.ts` (#461).
- Instrument the dashboard's server-side audit run. `POST /api/audit/run` ran `runAudit()` as a detached task and emitted **no** PostHog events — the dashboard's actual audit work and its failures were invisible, with only the client-side `audit_rerun_clicked` / `audit_rerun_failed` recorded. The route now emits `audit_run_started` / `audit_run_completed` (duration, events + sessions scanned, findings, hits, persisted) / `audit_run_failed` / `audit_run_rejected`, mirroring the CLI's `cli_audit_*` funnel; and the dashboard now emits the previously-missing `audit_rerun_succeeded` (it tracked clicks and failures but never successes) (#461).
- Close the remaining telemetry gaps the audit surfaced: track the postinstall build-missing failure (`package_install_failed`, awaited before `process.exit(1)` — previously invisible); add `keepalive: true` to `captureClientEvent` so events fired right before a navigation/unload aren't dropped; track `/api/auth/login-verify` validation-400s and add `email` + `source` to its failure events for parity with `/api/auth/login-request`; and fill property gaps (`node_version` on `package_installed`, drop the duplicate `version` on `first_install`, add `subcommand` + `exit_code` to `cli_auth_invoked`). The hook hot-path error events are intentionally left fire-and-forget to avoid adding telemetry latency to every tool call (#461).
- Fix the policies → activity table collapsing on narrow / portrait windows. Columns no longer overlap — each data cell clips with an ellipsis at its own edge and headers stay on one line — and the table holds a readable `min-width` (1280px), scrolling horizontally below that via a themed scrollbar instead of squeezing columns into each other. The badge / long-header columns (decision, event, cli, mode, duration, session) were widened so their content fits — the **mode** column in particular now holds its widest pill (`bypassPermissions`) instead of clipping it mid-word, and the mode pill truncates with an ellipsis + hover tooltip if a longer / custom mode ever appears.
- Fix three translated docs pages that failed the Mintlify deploy parse. `docs/tr/cli/audit.mdx` had a dropped closing backtick that pushed `<slug>` out of its inline-code span (parsed as an unclosed JSX tag); `docs/ja/built-in-policies.mdx` and `docs/zh/built-in-policies.mdx` carried translator-injected `{#id}` heading anchors that MDX reads as JS expressions. All three now match the other 12 locales (#455).
- Stop the failproofai server log from repeating the benign Next.js "Failed to find Server Action" deployment-skew error. A browser tab left open across a dashboard rebuild/upgrade POSTs a stale Server Action ID; the client recovers via Next's graceful 404, but the standalone server still logged a 3-line error block to stderr per stale request. The `start` launcher now pipes the server's output through a filter (`scripts/skew-log-filter.ts`) that drops just that block — all other output, and color via `FORCE_COLOR`, passes through untouched; `dev` is unchanged (#456).
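
The telemetry-delivery entries above all turn on the same ordering issue: an event sent fire-and-forget is lost if the process exits before the request lands. A minimal sketch of that difference follows. It assumes hypothetical stand-ins (`sendEvent`, `delivered`, and an injected `exit` callback) rather than the project's actual `trackHookEvent` or `track()` helpers.

```typescript
// Hypothetical stand-ins, not the project's API: `sendEvent` simulates an
// async network send, and `exit` is injected so the sketch can run without
// actually terminating the process.
type Exit = (code: number) => void;

const delivered: string[] = [];

async function sendEvent(name: string): Promise<void> {
  // Simulate network latency; the event only "lands" after the delay.
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  delivered.push(name);
}

// Fire-and-forget: the send is started, but exit runs before it can finish.
function failFireAndForget(exit: Exit): void {
  void sendEvent("cli_audit_failed");
  exit(1);
}

// Awaited: the send settles first, and only then does the process exit.
async function failAwaited(exit: Exit): Promise<void> {
  await sendEvent("cli_audit_failed");
  exit(1);
}
```

With a real `process.exit`, the first variant would terminate while the send is still pending. The awaited variant trades a short delay on the exit path for delivery, which is why it suits exit-adjacent events and not hot paths.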
@@ -98,6 +102,7 @@
- Add coverage for previously untested audit + auth modules: `__tests__/audit/archetypes.test.ts` (zero-signal → precision, broad-spread → goldfish, secondary ≥40% promotion vs authored fallback, deterministic variant picker), `__tests__/audit/findings.test.ts` (ranking, zero-hit drop, detector→policy remapping, `alsoCoveredBy`, `alreadyEnabled` enable-set + builtin-config heuristics, relative-time + missing `lastSeen` fallback), `__tests__/audit/strengths.test.ts` (clean-rate headline, credential / retry / push-to-main absence gates, 5-item cap, fallback row when too few qualify), and `__tests__/lib/auth-store.test.ts` (round-trip, mode 0600, atomic write leaves no `.tmp` siblings, shape-mismatch rejection, reminder scoping, atomic overwrite). +40 tests; full suite at 1741 passing.

### Docs
- Point every "Docs" landing link at `https://docs.befailproof.ai/introduction` (the Mintlify landing page) instead of a bare root that doesn't resolve to a page: the `failproofai --help` LINKS banner and the `dev` / `start` launch banner (were `https://befailproof.ai`), the dashboard "Reach Us" → Documentation entry (was `https://docs.befailproof.ai/`), and the README docs badge (English + 14 translations, was a bare `https://docs.befailproof.ai`). Deep page links (e.g. `https://docs.befailproof.ai/built-in-policies`) are unchanged (#461).
- Replace the community Slack invite with Discord (`https://discord.gg/2zjBZP7yQJ`) everywhere it's user-facing: the `failproofai --help` LINKS banner, the dashboard "Reach Us" dropdown, and the README community badge (English + 14 translations). The Slack *webhook notification example* (`examples/policies-notification.js`) is intentionally left as-is — it's a feature integration, not a community link.
- Reword the `/audit` invite card ("Share with friends" / "wanna know how your friends' agents score?") and grammar-pass the X/LinkedIn share templates (article/adverb/coordination/comma-splice fixes only — no behavioral or structural change).
- Document the `failproofai audit` command and `npx -y failproofai audit` usage in `docs/cli/audit.mdx`, and refresh the `docs/dashboard.mdx` Audit section to the current poster flow (#453).
2 changes: 1 addition & 1 deletion README.md
@@ -6,7 +6,7 @@
[![CI](https://img.shields.io/github/actions/workflow/status/failproofai/failproofai/ci.yml?branch=main&style=flat-square&label=CI)](https://github.com/failproofai/failproofai/actions)
[![Supply Chain](https://img.shields.io/badge/supply%20chain-secure-brightgreen?style=flat-square)](https://github.com/failproofai/failproofai/actions/workflows/osv-scanner.yml)
[![Discord](https://img.shields.io/badge/Discord-join%20us-5865F2?style=flat-square&logo=discord)](https://discord.gg/2zjBZP7yQJ)
- [![Docs](https://img.shields.io/badge/docs-befailproof.ai-002CA7?style=flat-square)](https://docs.befailproof.ai)
+ [![Docs](https://img.shields.io/badge/docs-befailproof.ai-002CA7?style=flat-square)](https://docs.befailproof.ai/introduction)
[![License](https://img.shields.io/badge/license-MIT%20%2B%20Commons%20Clause-blue?style=flat-square)](./LICENSE)

**Translations:** [简体中文](./docs/i18n/README.zh.md) · [日本語](./docs/i18n/README.ja.md) · [한국어](./docs/i18n/README.ko.md) · [Español](./docs/i18n/README.es.md) · [Português](./docs/i18n/README.pt-br.md) · [Deutsch](./docs/i18n/README.de.md) · [Français](./docs/i18n/README.fr.md) · [Русский](./docs/i18n/README.ru.md) · [हिन्दी](./docs/i18n/README.hi.md) · [Türkçe](./docs/i18n/README.tr.md) · [Tiếng Việt](./docs/i18n/README.vi.md) · [Italiano](./docs/i18n/README.it.md) · [العربية](./docs/i18n/README.ar.md) · [עברית](./docs/i18n/README.he.md)
87 changes: 63 additions & 24 deletions __tests__/api/audit-run-route.test.ts
@@ -3,13 +3,17 @@ import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import type { NextRequest } from "next/server";

// Mock the heavy audit modules so the route is exercised in isolation: runAudit
- // is replaced with a controllable promise, and the cache write is a no-op.
- const { runAuditMock, writeCacheMock } = vi.hoisted(() => ({
+ // is replaced with a controllable promise, the cache write is a no-op, and the
+ // telemetry channel is a spy so we can assert the dashboard run funnel.
+ const { runAuditMock, writeCacheMock, trackEventMock, initTelemetryMock } = vi.hoisted(() => ({
runAuditMock: vi.fn(),
writeCacheMock: vi.fn(),
trackEventMock: vi.fn(),
initTelemetryMock: vi.fn(async () => {}),
}));
vi.mock("@/src/audit", () => ({ runAudit: runAuditMock }));
vi.mock("@/src/audit/dashboard-cache", () => ({ writeDashboardCache: writeCacheMock }));
vi.mock("@/lib/telemetry", () => ({ initTelemetry: initTelemetryMock, trackEvent: trackEventMock }));

import { POST } from "@/app/api/audit/run/route";
import { getRunState, releaseRun } from "@/app/api/audit/_state";
@@ -18,15 +22,35 @@ function req(body: string): NextRequest {
return { text: async () => body } as unknown as NextRequest;
}

// A well-formed AuditResult so the route's audit_run_completed property reads
// (result.transcripts.scanned, result.totals.hits, …) don't throw.
function auditResult(over: Record<string, unknown> = {}) {
return {
eventsScanned: 1240,
transcripts: { scanned: 18, skipped: 0, errors: 0, durationMs: 0 },
projectsScanned: ["/a", "/b"],
results: [{}, {}, {}],
totals: { hits: 7, projectsWithHits: 2 },
...over,
};
}

const flush = async () => {
for (let i = 0; i < 3; i++) await Promise.resolve();
};
const trackedNames = () => trackEventMock.mock.calls.map((c) => c[0] as string);

describe("POST /api/audit/run (fire-and-forget)", () => {
beforeEach(() => {
releaseRun();
runAuditMock.mockReset();
writeCacheMock.mockReset();
trackEventMock.mockReset();
initTelemetryMock.mockClear();
});
afterEach(() => releaseRun());

- it("returns 202 immediately WITHOUT awaiting the run, and marks the lock running", async () => {
+ it("returns 202 immediately WITHOUT awaiting the run, marks the lock, and emits audit_run_started", async () => {
// runAudit never resolves during the test — if POST awaited it, this would
// hang. Reaching the assertions proves the run is detached.
runAuditMock.mockImplementation(() => new Promise<never>(() => {}));
@@ -37,64 +61,72 @@ describe("POST /api/audit/run (fire-and-forget)", () => {
await expect(res.json()).resolves.toEqual({ status: "started" });
expect(getRunState().running).toBe(true);
expect(runAuditMock).toHaveBeenCalledTimes(1);
expect(trackedNames()).toContain("audit_run_started");
});

- it("409s a second concurrent run while one is in flight", async () => {
+ it("409s a second concurrent run and tracks audit_run_rejected(already_running)", async () => {
runAuditMock.mockImplementation(() => new Promise<never>(() => {}));

const first = await POST(req("{}"));
expect(first.status).toBe(202);

const second = await POST(req("{}"));
expect(second.status).toBe(409);
// The detached first run is still the only one that ran.
expect(runAuditMock).toHaveBeenCalledTimes(1);
expect(trackEventMock).toHaveBeenCalledWith(
"audit_run_rejected",
expect.objectContaining({ reason: "already_running" }),
);
});

- it("records the error and releases the lock when the detached run throws", async () => {
+ it("tracks audit_run_failed, records the error, and releases the lock when the detached run throws", async () => {
let reject!: (e: unknown) => void;
runAuditMock.mockImplementation(() => new Promise((_res, rej) => { reject = rej; }));

const res = await POST(req("{}"));
expect(res.status).toBe(202);
expect(getRunState().running).toBe(true);

// Fail the background run and let its .catch settle.
reject(new Error("scan blew up"));
- await Promise.resolve();
- await Promise.resolve();
+ await flush();

const s = getRunState();
expect(s.running).toBe(false);
expect(s.error).toBe("scan blew up");
expect(writeCacheMock).not.toHaveBeenCalled();
expect(trackedNames()).toContain("audit_run_failed");
});

- it("writes the cache and clears the lock when the detached run succeeds", async () => {
+ it("writes the cache, tracks audit_run_completed with metrics, and clears the lock on success", async () => {
let resolveRun!: (value: unknown) => void;
- runAuditMock.mockImplementation(
- () => new Promise((res) => { resolveRun = res; }),
- );
+ runAuditMock.mockImplementation(() => new Promise((res) => { resolveRun = res; }));
writeCacheMock.mockReturnValue(true);

const res = await POST(req("{}"));
expect(res.status).toBe(202);
expect(getRunState().running).toBe(true);

// Complete the detached run and let its .then settle.
- resolveRun({ ok: true });
- await Promise.resolve();
- await Promise.resolve();
+ resolveRun(auditResult());
+ await flush();

expect(writeCacheMock).toHaveBeenCalledTimes(1);
expect(getRunState()).toMatchObject({ running: false, error: null });
expect(trackEventMock).toHaveBeenCalledWith(
"audit_run_completed",
expect.objectContaining({
source: "dashboard",
events_scanned: 1240,
sessions_scanned: 18,
findings: 3,
total_hits: 7,
persisted: true,
}),
);
});

it("reports a run error when the result cannot be persisted (cache write fails)", async () => {
let resolveRun!: (value: unknown) => void;
- runAuditMock.mockImplementation(
- () => new Promise((res) => { resolveRun = res; }),
- );
+ runAuditMock.mockImplementation(() => new Promise((res) => { resolveRun = res; }));
// writeDashboardCache swallows its own IO errors and returns false; in
// fire-and-forget the cache is the only delivery channel, so a failed
// persist must surface as a run error rather than a silent success.
@@ -103,19 +135,26 @@ describe("POST /api/audit/run (fire-and-forget)", () => {
const res = await POST(req("{}"));
expect(res.status).toBe(202);

- resolveRun({ ok: true });
- await Promise.resolve();
- await Promise.resolve();
+ resolveRun(auditResult());
+ await flush();

const s = getRunState();
expect(s.running).toBe(false);
expect(s.error).toBeTruthy();
expect(trackEventMock).toHaveBeenCalledWith(
"audit_run_completed",
expect.objectContaining({ persisted: false }),
);
});

- it("400s a non-object JSON body", async () => {
+ it("400s a non-object JSON body and tracks audit_run_rejected(non_object_body)", async () => {
const res = await POST(req("[]"));
expect(res.status).toBe(400);
expect(getRunState().running).toBe(false);
expect(runAuditMock).not.toHaveBeenCalled();
expect(trackEventMock).toHaveBeenCalledWith(
"audit_run_rejected",
expect.objectContaining({ reason: "non_object_body" }),
);
});
});
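
For readers unfamiliar with the pattern these tests exercise, here is a rough, simplified model of a detached run guarded by a lock, together with a microtask `flush` helper like the one in the test file. Everything in it (`state`, `startRun`, the returned status numbers) is a hypothetical illustration and not the route's actual implementation.

```typescript
// Hypothetical, simplified model of a lock-guarded detached run. The real
// route also parses the request body, writes a cache, and emits telemetry.
interface RunState {
  running: boolean;
  error: string | null;
}

const state: RunState = { running: false, error: null };

function startRun(run: () => Promise<unknown>): number {
  // A run already in flight rejects the new request.
  if (state.running) return 409;
  state.running = true;
  state.error = null;

  // Detached: the promise chain is deliberately not awaited, so the caller
  // gets its status back immediately.
  void run()
    .then(() => {
      state.running = false;
    })
    .catch((e: unknown) => {
      state.running = false;
      state.error = e instanceof Error ? e.message : String(e);
    });

  return 202;
}

// Yield to the microtask queue a few times so the detached .then/.catch
// handlers can settle before assertions run.
const flush = async (): Promise<void> => {
  for (let i = 0; i < 3; i++) await Promise.resolve();
};
```

The number of yields in `flush` has to cover the length of the detached promise chain, so a fixed loop is easier to adjust than repeated `await Promise.resolve()` lines when the chain gains extra steps.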