feat(insights): /usage-style profile + cost attribution; scanner perf overhaul#2

Merged
Gnonymous merged 1 commit into main from feature/usage-insights-and-attribution
Jun 23, 2026
Summary

Borrows two ideas from Claude Code's /usage and Claude Desktop's Overview, while keeping CodingBar's "100% local" promise — every new data point is derived from already-scanned RawRecords, no new I/O sources, no new network paths.

Ships in three layered slices:

Insights tab — all-time Profile

  • 2×4 stat grid: sessions, messages, total tokens, active days, current/longest streak, peak hour, favorite model.
  • GitHub-style 13-week contribution calendar (blue palette, 7 rows × 13 weeks).
  • Fun-fact callout ("~N× Harry Potter") tied to lifetime tokens.
  • Replaces the old weekday × hour heatmap on the Insights tab — the new calendar carries the same activity signal with more time depth.
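
The streak stats in the grid can be sketched as below. This is a minimal illustration, not the PR's ProfileBuilder code: `streaks(activeDays:today:)` is a hypothetical helper, days are plain integer indices (real code would presumably derive them from local-time day boundaries), and "current streak" is assumed to mean consecutive active days ending today.

```swift
import Foundation

/// Hypothetical helper, not taken from the PR.
/// `activeDays` holds one integer day index per day that had any activity.
func streaks(activeDays: Set<Int>, today: Int) -> (current: Int, longest: Int) {
    var longest = 0
    // Only days with no active predecessor can start a run.
    for day in activeDays where !activeDays.contains(day - 1) {
        var length = 1
        while activeDays.contains(day + length) { length += 1 }
        longest = max(longest, length)
    }
    // Assumption: the current streak must include today.
    var current = 0
    while activeDays.contains(today - current) { current += 1 }
    return (current, longest)
}

// Made-up sample: runs of 5, 2 and 2 days, the last one ending today.
let days: Set<Int> = [1, 2, 3, 4, 5, 8, 9, 12, 13]
let result = streaks(activeDays: days, today: 13)
print("current:", result.current, "longest:", result.longest)
```

Starting each scan only at the first day of a run keeps the whole pass linear in the number of active days.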

Cost tab — usage attribution

Two new sections, both Claude-only (Codex carries no attribution tags and its tokens are deltas of a running total, not absolute prompt size), both range-aware and metric-aware so they follow the existing right-pill + cost/tokens toggle.

  • By context size: ≤50k / 50–150k / >150k bands using the absolute prompt size from each request's usage block. Mirrors /usage's "what's contributing to your usage" lens; on this machine the >150k band came in at 53% of 30-day Claude spend.
  • Skills / Subagents / Plugins / MCP servers tables: read straight from Claude Code's own attribution* fields on each assistant line with usage, so percentages are an exact group-and-sum — not a heuristic. Honest "% of usage" denominator = Claude total in range, so shares are independent and don't sum to 100 (most turns carry no attribution at all).
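
A rough sketch of the two calculations above, under stated assumptions: `UsageRecord` and its fields are hypothetical stand-ins for the scanned records, only a single `skill` tag is modelled, and the sample costs are made up. It shows why band shares sum to 100% (bands partition the records) while attribution shares do not (the denominator includes unattributed turns).

```swift
import Foundation

// Hypothetical record shape; field names are illustrative only.
struct UsageRecord {
    let promptTokens: Int   // absolute prompt size of the request
    let cost: Double        // cost of the request
    let skill: String?      // nil when the turn carries no attribution
}

enum ContextBand: String {
    case small = "≤50k", medium = "50–150k", large = ">150k"

    init(promptTokens: Int) {
        switch promptTokens {
        case ...50_000: self = .small
        case ...150_000: self = .medium
        default: self = .large
        }
    }
}

/// Share of total cost per context band. Every record lands in exactly one band.
func bandShares(_ records: [UsageRecord]) -> [ContextBand: Double] {
    let total = records.reduce(0.0) { $0 + $1.cost }
    guard total > 0 else { return [:] }
    var sums: [ContextBand: Double] = [:]
    for r in records {
        sums[ContextBand(promptTokens: r.promptTokens), default: 0] += r.cost
    }
    return sums.mapValues { $0 / total }
}

/// Share of total cost per skill. The denominator is the total over all
/// records, attributed or not, so these shares need not sum to 1.
func skillShares(_ records: [UsageRecord]) -> [String: Double] {
    let total = records.reduce(0.0) { $0 + $1.cost }
    guard total > 0 else { return [:] }
    var sums: [String: Double] = [:]
    for r in records {
        guard let skill = r.skill else { continue }
        sums[skill, default: 0] += r.cost
    }
    return sums.mapValues { $0 / total }
}

// Made-up sample data.
let records = [
    UsageRecord(promptTokens: 30_000, cost: 1, skill: nil),
    UsageRecord(promptTokens: 120_000, cost: 2, skill: "orchestration"),
    UsageRecord(promptTokens: 200_000, cost: 4, skill: nil),
    UsageRecord(promptTokens: 180_000, cost: 1, skill: "orchestration"),
]
let bands = bandShares(records)
let skills = skillShares(records)
print(bands[.large] ?? 0, skills["orchestration"] ?? 0)
```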

Scanner perf — boot peak RSS ~1.3 GB → ~75 MB

Before this branch a fresh launch on a 711 MB / 1623-file / 51k-record corpus would spike to ~1.3 GB resident before settling back to ~190 MB. After:

| Metric | Before | After | Delta |
|---|---|---|---|
| Boot peak RSS (.app) | ~1.3 GB | ~75 MB | -94% |
| Steady-state RSS (.app) | ~190 MB | 49–56 MB | -74% |
| Warm CLI dump peak | ~190 MB | 108 MB | -43% |
| Cache file size | 18 MB JSON | 6.86 MB binary plist | -62% |

Three orthogonal fixes:

  1. Per-line autoreleasepool in both scanners. JSONSerialization returns autoreleased NSDictionary/NSString trees; in a tight scan loop running inside a detached Task there's no natural pool drain, so intermediates pile up across files. Wrapping each line's parse in its own pool releases those intermediates as soon as the line has been processed.
  2. Shared Scanner across Claude + Codex. Previously each provider instantiated its own Scanner and re-decoded the same cache file twice per Aggregator.run(). Now one Scanner is created in the aggregator and passed to both.
  3. Cache format JSON → binary property list. PropertyListDecoder reads directly into the Swift struct without the giant intermediate NSDictionary tree that JSONDecoder builds via JSONSerialization. Cache file also shrinks ~62%. cacheVersion bumped 4 → 5 (forces one full rescan on next launch; pre-release so no user impact).
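
Fixes 1 and 3 can be sketched roughly as follows. This is not the scanner's actual code: `ScanCache`, `withPool`, `totalTokens(inJSONL:)` and the `tokens` field are hypothetical names for illustration. Only `autoreleasepool`, `JSONSerialization`, `PropertyListEncoder` and `PropertyListDecoder` are real Foundation APIs. The `canImport` guard is there only so the sketch also builds off Apple platforms, where there is no autorelease pool.

```swift
import Foundation

// Hypothetical cache shape; the real struct is not shown in this PR.
struct ScanCache: Codable, Equatable {
    var cacheVersion: Int
    var tokensByFile: [String: Int]
}

let currentCacheVersion = 5

/// Runs `body` inside an autorelease pool on Apple platforms, plainly elsewhere.
func withPool<T>(_ body: () throws -> T) rethrows -> T {
    #if canImport(ObjectiveC)
    return try autoreleasepool(invoking: body)
    #else
    return try body()
    #endif
}

/// Parses JSONL text line by line; each line's Foundation temporaries are
/// released when its pool exits rather than at the end of the whole scan.
func totalTokens(inJSONL text: String) -> Int {
    var total = 0
    for line in text.split(separator: "\n") {
        withPool {
            guard let object = try? JSONSerialization.jsonObject(with: Data(line.utf8)),
                  let dict = object as? [String: Any],
                  let tokens = dict["tokens"] as? Int else { return }
            total += tokens
        }
    }
    return total
}

/// Encodes the cache as a binary property list.
func encodeCache(_ cache: ScanCache) throws -> Data {
    let encoder = PropertyListEncoder()
    encoder.outputFormat = .binary
    return try encoder.encode(cache)
}

/// Returns nil for unreadable data or a different cacheVersion;
/// the caller would treat nil as "do a full rescan".
func decodeCache(_ data: Data) -> ScanCache? {
    guard let cache = try? PropertyListDecoder().decode(ScanCache.self, from: data),
          cache.cacheVersion == currentCacheVersion else { return nil }
    return cache
}

// Made-up input: two valid lines and one malformed line that is skipped.
let jsonl = """
{"tokens": 120}
not json
{"tokens": 30}
"""
let total = totalTokens(inJSONL: jsonl)
let fresh = ScanCache(cacheVersion: currentCacheVersion, tokensByFile: ["a.jsonl": total])
let data = try! encodeCache(fresh)
let magic = String(decoding: data.prefix(8), as: UTF8.self)
let stale = try! encodeCache(ScanCache(cacheVersion: 4, tokensByFile: [:]))
print(total, magic, decodeCache(stale) == nil)
```

Treating a version mismatch the same way as a corrupt file keeps the upgrade path to a single branch: discard the cache and rescan once.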

Privacy boundary

Unchanged. Two network paths only, both pre-existing: Claude/Codex quota GETs (TTL-cached, the user's own OAuth token) and a user-initiated GitHub releases GET. No telemetry, no log uploads.

Test plan

  • swift build clean (debug + release)
  • make test (--self-test) — ALL PASS
  • --dump-json on real local corpus — profile (sessions=1398, activeDays=104, streak=32, peakHour=16, favorite=Opus 4.8), context attribution (>150k = 52/55/53% across today/7d/30d), usage attribution (workflow-subagent 14.2%, playwright MCP 10.7%, orchestration skill 4.8% on 30d) all match /usage-style numbers
  • Insights tab dark + light rasterized via --render-panel — 2×4 grid + calendar + fun fact render correctly
  • Cost tab dark/cost + light/tokens rasterized — both new sections follow range pill and metric toggle
  • .app boot peak measured at 49–75 MB resident (down from ~1.3 GB)
  • Cache file header inspected (bplist00 magic), size 6.86 MB
  • New ProfileBuilder tests + new attribution-parsing test added to SmokeTests.swift
  • CI XCTest run (only runnable in CI — needs Xcode toolchain that isn't on the local CLT)
  • Regenerate docs/assets/panel-insights.png / panel-overview.png reference shots (left to a follow-up since docs/assets/*.png already had pre-existing uncommitted changes on this working tree)

🤖 Generated with Claude Code

… overhaul

Insights tab — new all-time Profile section
- 2×4 stat grid: sessions / messages / total tokens / active days,
  current & longest streak, peak hour, favorite model.
- 13-week GitHub-style contribution calendar (blue palette).
- Fun-fact callout ("~N× Harry Potter") tied to lifetime tokens.
- Replaces the older weekday × hour heatmap on the Insights tab.

Cost tab — new attribution sections
- By context size: ≤50k / 50–150k / >150k bands. Claude-only (Codex
  records are delta-of-cumulative, not absolute prompt size).
- Usage attribution tables: Skills / Subagents / Plugins / MCP servers,
  read straight from Claude's `attribution*` fields on each assistant
  line — so percentages are an exact group-and-sum.
- Both follow the right-pill range (today / 7d / 30d) and the
  cost/tokens metric, like the existing model/project breakdowns.

Scanner perf — boot peak RSS ~1.3 GB → ~75 MB on this 711 MB corpus
- Per-line autoreleasepool in both scanners so JSONSerialization's
  NSDictionary/NSString tree drains immediately instead of piling up
  across files (Aggregator.run runs in a detached Task with no
  natural pool drain).
- Shared Scanner across Claude + Codex so the on-disk cache is
  decoded once per Aggregator.run() instead of twice.
- Cache format JSON → binary property list. PropertyListDecoder reads
  directly into the Swift struct without an intermediate NSDictionary
  tree, and the on-disk file shrinks ~62% (18 MB → 6.9 MB here).
  cacheVersion 4 → 5 (forces one full rescan on next launch).

Privacy boundary unchanged: every new data point is derived locally
from already-scanned records. No new I/O sources, no new network paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Gnonymous Gnonymous merged commit 458bd02 into main Jun 23, 2026
1 check passed