Session list performance: disk summary cache, display-name cache, and count alignment#111
Conversation
Session list and landing page re-parsed or re-read every .jsonl on each cold start and on every GET /api/projects call. Add SQLite summary cache (keyed by path, mtime, rules fingerprint) so complete rows survive restart; route get_projects and get_project_sessions through it. Cache project display names by directory max mtime. Align project card session_count with session list when no exclusion rules are active.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughThis PR adds a disk-backed session summary cache keyed by path, mtime, and rules fingerprint, plus an in-memory project display-name cache. The project listing endpoints now reuse cached summaries and the tests cover cache behavior, warm-cache reuse, display-name reuse, and count alignment. ChangesSession listing performance caching
Estimated code review effort: 3 (Moderate) | ~25 minutes Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
api/projects.py (1)
76-89: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick winExceptions are swallowed silently.
Unlike
get_project_sessions, which logs viacurrent_app.logger.exception(...)on failure, thisexcept Exceptionblock on the peek/cache path silently falls back to counting the session without any log trace. This makes cache corruption, permission errors, or SQLite lock contention on this endpoint invisible in production.🩹 Proposed fix
except Exception: + current_app.logger.exception("Failed to peek/cache summary for %s", s["id"]) titled_count += 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/projects.py` around lines 76 - 89, The exception handling in the session counting loop silently swallows failures in the _peek_or_cache_summary path, unlike get_project_sessions which logs errors. Update the except Exception block in the session processing logic to record the failure with current_app.logger.exception (or equivalent) before falling back to counting the session, so issues like cache corruption or SQLite lock contention are visible. Use the existing project session-counting code path and the _peek_or_cache_summary call site as the anchor points for the fix.
🧹 Nitpick comments (4)
utils/session_path.py (1)
48-63: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win
latest_mtimeis computed twice per project.
list_projectsalready computeslatest_mtimefor thelast_modifiedfield (lines 80-82), then_resolve_display_namerecomputes the exact same value via_project_jsonl_max_mtimefor the cache key. Pass the already-computed value through instead of re-statting every.jsonlfile a second time.♻️ Proposed fix
-def _resolve_display_name(project_dir: str, jsonl_files: list[str], fallback: str) -> str: - max_mtime = _project_jsonl_max_mtime(project_dir, jsonl_files) +def _resolve_display_name( + project_dir: str, jsonl_files: list[str], fallback: str, max_mtime: float +) -> str: with _display_name_lock:last_modified = datetime.fromtimestamp(latest_mtime, tz=timezone.utc).isoformat() - display_name = _resolve_display_name(project_dir, jsonl_files, name) + display_name = _resolve_display_name(project_dir, jsonl_files, name, latest_mtime)Also applies to: 80-86
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@utils/session_path.py` around lines 48 - 63, `_resolve_display_name` is recomputing the project max mtime even though `list_projects` already has `latest_mtime` for the same project. Update `list_projects` to pass the existing `latest_mtime` into `_resolve_display_name`, and change `_resolve_display_name` to use that value for the cache lookup instead of calling `_project_jsonl_max_mtime` again. Keep the cache behavior in `_display_name_cache` and the display-name fallback logic unchanged.api/projects.py (1)
105-125: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRedundant recomputation:
summary_from_sessionand_session_row_okboth derive the same fields fromparsed.
row = summary_from_session(parsed, ...)already computes title/models/tokens/tool_calls/timestamps, then_session_row_ok(s, parsed)recomputes the same fields a second time fromparsed. Sincesession_row_from_summary(s, row)exists precisely to map aSummaryCacheRowDictonto the API row shape, use it here instead, and drop the now-unused_session_row_ok.♻️ Proposed refactor
parsed = get_cached_session(s["path"]) excluded = is_session_excluded(rules, parsed, project_name) row = summary_from_session(parsed, is_excluded=excluded) put_summary(s["path"], s["modified"], rules_fp, row) if row["is_untitled"] or excluded: continue - result.append(_session_row_ok(s, parsed)) + result.append(session_row_from_summary(s, row))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/projects.py` around lines 105 - 125, The session parsing path in the project summary loop is doing redundant work: `summary_from_session(parsed, ...)` already builds the summary fields, but the success path then calls `_session_row_ok(s, parsed)` and recomputes them from `parsed` again. Replace that final append with `session_row_from_summary(s, row)` so the API row is derived from the cached summary object, and remove `_session_row_ok` if it is no longer used.utils/session_summary_cache.py (2)
17-17: 🚀 Performance & Scalability | 🔵 TrivialVerify
DEFAULT_MAX_ROWS=2000is adequate for large installs.
get_projects()touches a cache row for every session in every project on each landing-page request. For a large install (the scenario this PR targets) with more than ~2000 total sessions, a single landing-page load can evict rows faster than they're reused, so the cache never stays warm across projects — defeating the intended speedup for exactly the target use case. Consider making this configurable (env var) or raising the default substantially.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@utils/session_summary_cache.py` at line 17, The fixed DEFAULT_MAX_ROWS value in session_summary_cache should be revisited because get_projects() can churn through too many session rows for large installs. Update the caching limit in session_summary_cache.py by making DEFAULT_MAX_ROWS configurable (for example via an environment setting) or by increasing it substantially, and ensure the value used by get_projects() reflects the new setting so cache rows are not evicted before reuse.
117-152: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winStale rows for the same path accumulate under a different
mtime.Since the primary key includes
mtime, every time a session file's mtime changes (e.g. an in-progress conversation being appended to)put_summaryinserts a new row rather than replacing the old one for that path. Old rows for stale mtimes are never explicitly cleaned up — only removed by the global LRU sweep — so actively-growing sessions can quietly consume the row budget and cause other projects' cache entries to be evicted sooner than necessary.♻️ Proposed cleanup of stale mtime rows for the same path
conn.execute( """ INSERT INTO summary_cache (path, mtime, rules_fp, payload, accessed_at) VALUES (?, ?, ?, ?, ?) ON CONFLICT(path, mtime, rules_fp) DO UPDATE SET payload = excluded.payload, accessed_at = excluded.accessed_at """, (abspath, mtime, rules_fingerprint, payload, now), ) + conn.execute( + "DELETE FROM summary_cache WHERE path = ? AND rules_fp = ? AND mtime != ?", + (abspath, rules_fingerprint, mtime), + ) count = conn.execute("SELECT COUNT(*) FROM summary_cache").fetchone()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@utils/session_summary_cache.py` around lines 117 - 152, `put_summary` currently upserts by the full `(path, mtime, rules_fp)` key, so a changing `mtime` leaves older rows for the same session path behind. Update `put_summary` to explicitly remove stale `summary_cache` rows for the same normalized `abspath` (and, if appropriate, the same `rules_fingerprint`) before or after the insert so only the latest mtime is retained. Keep the fix localized to `put_summary` and preserve the existing LRU eviction behavior and `_ensure_connection`/`conn.execute` flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_session_summary_cache.py`:
- Around line 33-38: The cache_db fixture is annotated as returning Path even
though it is a generator fixture using yield, which triggers the mypy failure.
Update the cache_db fixture signature to use an appropriate generator/fixture
return annotation that matches its yielded Path value, and keep the
reset_connection_for_tests and clear_cache setup/teardown behavior unchanged.
In `@utils/session_summary_cache.py`:
- Around line 94-114: The get_summary cache read path currently performs an
UPDATE and commit on every cache hit, which makes a lookup behave like a write
transaction. Update get_summary to avoid synchronous per-hit writes by deferring
or batching the accessed_at touch, or by removing the touch entirely in favor of
a non-writing recency strategy; keep the fast read-only path around the SELECT
and only update via a separate periodic mechanism if needed.
- Around line 51-72: The SQLite connection setup in _ensure_connection opens the
cache with default journaling and no lock handling, which can cause contention
in shared use. Update the connection initialization to apply WAL mode and a busy
timeout on the sqlite3.Connection before creating or using the summary_cache
table, so concurrent access through _ensure_connection is more resilient and
faster.
---
Outside diff comments:
In `@api/projects.py`:
- Around line 76-89: The exception handling in the session counting loop
silently swallows failures in the _peek_or_cache_summary path, unlike
get_project_sessions which logs errors. Update the except Exception block in the
session processing logic to record the failure with current_app.logger.exception
(or equivalent) before falling back to counting the session, so issues like
cache corruption or SQLite lock contention are visible. Use the existing project
session-counting code path and the _peek_or_cache_summary call site as the
anchor points for the fix.
---
Nitpick comments:
In `@api/projects.py`:
- Around line 105-125: The session parsing path in the project summary loop is
doing redundant work: `summary_from_session(parsed, ...)` already builds the
summary fields, but the success path then calls `_session_row_ok(s, parsed)` and
recomputes them from `parsed` again. Replace that final append with
`session_row_from_summary(s, row)` so the API row is derived from the cached
summary object, and remove `_session_row_ok` if it is no longer used.
In `@utils/session_path.py`:
- Around line 48-63: `_resolve_display_name` is recomputing the project max
mtime even though `list_projects` already has `latest_mtime` for the same
project. Update `list_projects` to pass the existing `latest_mtime` into
`_resolve_display_name`, and change `_resolve_display_name` to use that value
for the cache lookup instead of calling `_project_jsonl_max_mtime` again. Keep
the cache behavior in `_display_name_cache` and the display-name fallback logic
unchanged.
In `@utils/session_summary_cache.py`:
- Line 17: The fixed DEFAULT_MAX_ROWS value in session_summary_cache should be
revisited because get_projects() can churn through too many session rows for
large installs. Update the caching limit in session_summary_cache.py by making
DEFAULT_MAX_ROWS configurable (for example via an environment setting) or by
increasing it substantially, and ensure the value used by get_projects()
reflects the new setting so cache rows are not evicted before reuse.
- Around line 117-152: `put_summary` currently upserts by the full `(path,
mtime, rules_fp)` key, so a changing `mtime` leaves older rows for the same
session path behind. Update `put_summary` to explicitly remove stale
`summary_cache` rows for the same normalized `abspath` (and, if appropriate, the
same `rules_fingerprint`) before or after the insert so only the latest mtime is
retained. Keep the fix localized to `put_summary` and preserve the existing LRU
eviction behavior and `_ensure_connection`/`conn.execute` flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c15aeee8-10a6-4929-a54b-cbe623aef66a
📒 Files selected for processing (6)
api/projects.pytests/test_api_integration.pytests/test_session_path.pytests/test_session_summary_cache.pyutils/session_path.pyutils/session_summary_cache.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@utils/session_summary_cache.py`:
- Around line 20-25: The max_cache_rows() helper in session_summary_cache.py
should not let a malformed CLAUDE_CODE_CHAT_BROWSER_SUMMARY_CACHE_MAX_ROWS value
crash put_summary() on the session-listing path. Update max_cache_rows() to
parse the env var defensively, catch invalid integer values, and fall back to
DEFAULT_MAX_ROWS (while still enforcing a minimum of 1 for valid inputs). Ensure
the change is localized to max_cache_rows() so callers like put_summary() keep
working even when the override is mistyped.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0f89fd1d-da8a-48fa-bd4b-78ae3039491f
📒 Files selected for processing (5)
api/projects.pytests/test_api_integration.pytests/test_session_summary_cache.pyutils/session_path.pyutils/session_summary_cache.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/test_api_integration.py
- tests/test_session_summary_cache.py
- api/projects.py
Catch ValueError when CLAUDE_CODE_CHAT_BROWSER_SUMMARY_CACHE_MAX_ROWS is not an integer and fall back to DEFAULT_MAX_ROWS so put_summary() does not fail on the session-list path.
Closes #109
Large installs were painful on cold start — every session list request re-parsed every
.jsonl, and the landing page reopened files on every load just to get display names and counts. The in-memory LRU from #82 helps within a run but dies on restart.This adds a SQLite summary cache (
~/.claude-code-chat-browser/session_summary_cache.sqlite) keyed on path + mtime + exclusion-rules fingerprint.get_project_sessionsskips the full parse when we already have a complete row.get_projectsgoes through the same cache for peek data instead of callingquick_session_infoon every file every time.Also cached display names in
session_path.py— keyed on the project dir and max.jsonlmtime, so repeat landing page loads don't reopen files when nothing changed.Project card
session_countnow uses the same titled-session filter as the session list when there are no exclusion rules. If exclusions are on, the card count can still diverge (same trade-off we accepted on cppa).Partial cache rows from peeks get upgraded to full rows on the first session-list visit. mtime change = cache miss.
Files:
utils/session_summary_cache.py(new),api/projects.py,utils/session_path.py, tests intest_session_summary_cache.py+ integration/path tests.To verify: run the test suite; optionally restart the server, hit a project session list twice — second request shouldn't touch
get_cached_sessionfor unchanged files. Touch one.jsonland confirm only that one re-parses.FTS search / search UX is separate — that's the Monday PR.
Summary by CodeRabbit