Session list performance: disk summary cache, display-name cache, and count alignment by clean6378-max-it · Pull Request #111 · cppalliance/claude-code-chat-browser

clean6378-max-it · 2026-07-03T00:19:28Z

Closes #109

Large installs were painful on cold start — every session list request re-parsed every .jsonl, and the landing page reopened files on every load just to get display names and counts. The in-memory LRU from #82 helps within a run but dies on restart.

This adds a SQLite summary cache (~/.claude-code-chat-browser/session_summary_cache.sqlite) keyed on path + mtime + exclusion-rules fingerprint. get_project_sessions skips the full parse when we already have a complete row. get_projects goes through the same cache for peek data instead of calling quick_session_info on every file every time.

Also cached display names in session_path.py — keyed on the project dir and max .jsonl mtime, so repeat landing page loads don't reopen files when nothing changed.

Project card session_count now uses the same titled-session filter as the session list when there are no exclusion rules. If exclusions are on, the card count can still diverge (same trade-off we accepted on cppa).

Partial cache rows from peeks get upgraded to full rows on the first session-list visit. mtime change = cache miss.

Files: utils/session_summary_cache.py (new), api/projects.py, utils/session_path.py, tests in test_session_summary_cache.py + integration/path tests.

To verify: run the test suite; optionally restart the server, hit a project session list twice — second request shouldn't touch get_cached_session for unchanged files. Touch one .jsonl and confirm only that one re-parses.

FTS search / search UX is separate — that's the Monday PR.

Summary by CodeRabbit

New Features
- Project and session listings now leverage a disk-backed session summary cache keyed by path, file mtime, and exclusion rules to speed up repeat requests.
- Added a cache for project display names to reduce repeated JSONL scanning.
Bug Fixes
- Improved session counting/filtering by aligning untitled/excluded handling with cached summary completeness.
- “Last modified” project activity is now derived from cached summary timestamps.
Tests
- Added/updated integration and cache-coverage tests, including cache reuse, invalidation, and LRU eviction behavior.

Session list and landing page re-parsed or re-read every .jsonl on each cold start and on every GET /api/projects call. Add SQLite summary cache (keyed by path, mtime, rules fingerprint) so complete rows survive restart; route get_projects and get_project_sessions through it. Cache project display names by directory max mtime. Align project card session_count with session list when no exclusion rules are active.

coderabbitai · 2026-07-03T00:19:48Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: af1aa35e-f6c2-483d-8f00-e79b4fdd6cd5

📥 Commits

Reviewing files that changed from the base of the PR and between db089cb and 79d2027.

📒 Files selected for processing (4)

api/projects.py
tests/test_api_integration.py
tests/test_session_summary_cache.py
utils/session_summary_cache.py

🚧 Files skipped from review as they are similar to previous changes (3)

tests/test_api_integration.py
tests/test_session_summary_cache.py
api/projects.py

📝 Walkthrough

Walkthrough

This PR adds a disk-backed session summary cache keyed by path, mtime, and rules fingerprint, plus an in-memory project display-name cache. The project listing endpoints now reuse cached summaries and the tests cover cache behavior, warm-cache reuse, display-name reuse, and count alignment.

Changes

Session listing performance caching

Layer / File(s)	Summary
Session summary cache module `utils/session_summary_cache.py`	Adds the SQLite cache schema, read/write/clear paths, eviction, summary derivation helpers, and session-row mapping.
API endpoints use cached summaries `api/projects.py`	Reworks project listing and session listing to read cached summary rows first, fall back to parsing when needed, and store new summaries.
Display-name cache in session_path `utils/session_path.py`, `tests/test_session_path.py`	Adds a lock-guarded display-name cache keyed by project directory and latest `.jsonl` mtime, then verifies repeated `list_projects()` calls avoid extra file reads.
Cache tests and integration coverage `tests/test_session_summary_cache.py`, `tests/test_api_integration.py`	Adds unit tests for cache behavior and eviction, plus integration checks for count alignment and warm-cache reuse.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Possibly related PRs

cppalliance/claude-code-chat-browser#6: Modifies the same project/session listing endpoints and related session-count logic.
cppalliance/claude-code-chat-browser#24: Changes the same exclusion-handling path used by the new cached summary rows.
cppalliance/claude-code-chat-browser#90: Also uses get_cached_session in api/projects.py session listing logic.

Suggested reviewers: timon0305, wpak-ai

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change set: session-list performance via summary caching, display-name caching, and count alignment.
Linked Issues check	✅ Passed	The described code changes implement the linked issue’s cache, invalidation, display-name, and count-alignment requirements with matching tests.
Out of Scope Changes check	✅ Passed	The changes appear on-scope and focused on the requested performance, caching, and API-count updates, with no unrelated additions evident.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/session-list-summary-cache

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

api/projects.py (1)
76-89: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Exceptions are swallowed silently.

Unlike get_project_sessions, which logs via current_app.logger.exception(...) on failure, this except Exception block on the peek/cache path silently falls back to counting the session without any log trace. This makes cache corruption, permission errors, or SQLite lock contention on this endpoint invisible in production.
🩹 Proposed fix
             except Exception:
+                current_app.logger.exception("Failed to peek/cache summary for %s", s["id"])
                 titled_count += 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/projects.py` around lines 76 - 89, The exception handling in the session
counting loop silently swallows failures in the _peek_or_cache_summary path,
unlike get_project_sessions which logs errors. Update the except Exception block
in the session processing logic to record the failure with
current_app.logger.exception (or equivalent) before falling back to counting the
session, so issues like cache corruption or SQLite lock contention are visible.
Use the existing project session-counting code path and the
_peek_or_cache_summary call site as the anchor points for the fix.

🧹 Nitpick comments (4)

utils/session_path.py (1)
48-63: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

latest_mtime is computed twice per project.

list_projects already computes latest_mtime for the last_modified field (lines 80-82), then _resolve_display_name recomputes the exact same value via _project_jsonl_max_mtime for the cache key. Pass the already-computed value through instead of re-statting every .jsonl file a second time.
♻️ Proposed fix
-def _resolve_display_name(project_dir: str, jsonl_files: list[str], fallback: str) -> str:
-    max_mtime = _project_jsonl_max_mtime(project_dir, jsonl_files)
+def _resolve_display_name(
+    project_dir: str, jsonl_files: list[str], fallback: str, max_mtime: float
+) -> str:
     with _display_name_lock:
             last_modified = datetime.fromtimestamp(latest_mtime, tz=timezone.utc).isoformat()
-            display_name = _resolve_display_name(project_dir, jsonl_files, name)
+            display_name = _resolve_display_name(project_dir, jsonl_files, name, latest_mtime)
Also applies to: 80-86
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/session_path.py` around lines 48 - 63, `_resolve_display_name` is
recomputing the project max mtime even though `list_projects` already has
`latest_mtime` for the same project. Update `list_projects` to pass the existing
`latest_mtime` into `_resolve_display_name`, and change `_resolve_display_name`
to use that value for the cache lookup instead of calling
`_project_jsonl_max_mtime` again. Keep the cache behavior in
`_display_name_cache` and the display-name fallback logic unchanged.
api/projects.py (1)
105-125: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Redundant recomputation: summary_from_session and _session_row_ok both derive the same fields from parsed.

row = summary_from_session(parsed, ...) already computes title/models/tokens/tool_calls/timestamps, then _session_row_ok(s, parsed) recomputes the same fields a second time from parsed. Since session_row_from_summary(s, row) exists precisely to map a SummaryCacheRowDict onto the API row shape, use it here instead, and drop the now-unused _session_row_ok.
♻️ Proposed refactor
             parsed = get_cached_session(s["path"])
             excluded = is_session_excluded(rules, parsed, project_name)
             row = summary_from_session(parsed, is_excluded=excluded)
             put_summary(s["path"], s["modified"], rules_fp, row)
             if row["is_untitled"] or excluded:
                 continue
-            result.append(_session_row_ok(s, parsed))
+            result.append(session_row_from_summary(s, row))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/projects.py` around lines 105 - 125, The session parsing path in the
project summary loop is doing redundant work: `summary_from_session(parsed,
...)` already builds the summary fields, but the success path then calls
`_session_row_ok(s, parsed)` and recomputes them from `parsed` again. Replace
that final append with `session_row_from_summary(s, row)` so the API row is
derived from the cached summary object, and remove `_session_row_ok` if it is no
longer used.
utils/session_summary_cache.py (2)
17-17: 🚀 Performance & Scalability | 🔵 Trivial

Verify DEFAULT_MAX_ROWS=2000 is adequate for large installs.

get_projects() touches a cache row for every session in every project on each landing-page request. For a large install (the scenario this PR targets) with more than ~2000 total sessions, a single landing-page load can evict rows faster than they're reused, so the cache never stays warm across projects — defeating the intended speedup for exactly the target use case. Consider making this configurable (env var) or raising the default substantially.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/session_summary_cache.py` at line 17, The fixed DEFAULT_MAX_ROWS value
in session_summary_cache should be revisited because get_projects() can churn
through too many session rows for large installs. Update the caching limit in
session_summary_cache.py by making DEFAULT_MAX_ROWS configurable (for example
via an environment setting) or by increasing it substantially, and ensure the
value used by get_projects() reflects the new setting so cache rows are not
evicted before reuse.
117-152: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Stale rows for the same path accumulate under a different mtime.

Since the primary key includes mtime, every time a session file's mtime changes (e.g. an in-progress conversation being appended to) put_summary inserts a new row rather than replacing the old one for that path. Old rows for stale mtimes are never explicitly cleaned up — only removed by the global LRU sweep — so actively-growing sessions can quietly consume the row budget and cause other projects' cache entries to be evicted sooner than necessary.
♻️ Proposed cleanup of stale mtime rows for the same path
         conn.execute(
             """
             INSERT INTO summary_cache (path, mtime, rules_fp, payload, accessed_at)
             VALUES (?, ?, ?, ?, ?)
             ON CONFLICT(path, mtime, rules_fp) DO UPDATE SET
                 payload = excluded.payload,
                 accessed_at = excluded.accessed_at
             """,
             (abspath, mtime, rules_fingerprint, payload, now),
         )
+        conn.execute(
+            "DELETE FROM summary_cache WHERE path = ? AND rules_fp = ? AND mtime != ?",
+            (abspath, rules_fingerprint, mtime),
+        )
         count = conn.execute("SELECT COUNT(*) FROM summary_cache").fetchone()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/session_summary_cache.py` around lines 117 - 152, `put_summary`
currently upserts by the full `(path, mtime, rules_fp)` key, so a changing
`mtime` leaves older rows for the same session path behind. Update `put_summary`
to explicitly remove stale `summary_cache` rows for the same normalized
`abspath` (and, if appropriate, the same `rules_fingerprint`) before or after
the insert so only the latest mtime is retained. Keep the fix localized to
`put_summary` and preserve the existing LRU eviction behavior and
`_ensure_connection`/`conn.execute` flow.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_session_summary_cache.py`:
- Around line 33-38: The cache_db fixture is annotated as returning Path even
though it is a generator fixture using yield, which triggers the mypy failure.
Update the cache_db fixture signature to use an appropriate generator/fixture
return annotation that matches its yielded Path value, and keep the
reset_connection_for_tests and clear_cache setup/teardown behavior unchanged.

In `@utils/session_summary_cache.py`:
- Around line 94-114: The get_summary cache read path currently performs an
UPDATE and commit on every cache hit, which makes a lookup behave like a write
transaction. Update get_summary to avoid synchronous per-hit writes by deferring
or batching the accessed_at touch, or by removing the touch entirely in favor of
a non-writing recency strategy; keep the fast read-only path around the SELECT
and only update via a separate periodic mechanism if needed.
- Around line 51-72: The SQLite connection setup in _ensure_connection opens the
cache with default journaling and no lock handling, which can cause contention
in shared use. Update the connection initialization to apply WAL mode and a busy
timeout on the sqlite3.Connection before creating or using the summary_cache
table, so concurrent access through _ensure_connection is more resilient and
faster.

---

Outside diff comments:
In `@api/projects.py`:
- Around line 76-89: The exception handling in the session counting loop
silently swallows failures in the _peek_or_cache_summary path, unlike
get_project_sessions which logs errors. Update the except Exception block in the
session processing logic to record the failure with current_app.logger.exception
(or equivalent) before falling back to counting the session, so issues like
cache corruption or SQLite lock contention are visible. Use the existing project
session-counting code path and the _peek_or_cache_summary call site as the
anchor points for the fix.

---

Nitpick comments:
In `@api/projects.py`:
- Around line 105-125: The session parsing path in the project summary loop is
doing redundant work: `summary_from_session(parsed, ...)` already builds the
summary fields, but the success path then calls `_session_row_ok(s, parsed)` and
recomputes them from `parsed` again. Replace that final append with
`session_row_from_summary(s, row)` so the API row is derived from the cached
summary object, and remove `_session_row_ok` if it is no longer used.

In `@utils/session_path.py`:
- Around line 48-63: `_resolve_display_name` is recomputing the project max
mtime even though `list_projects` already has `latest_mtime` for the same
project. Update `list_projects` to pass the existing `latest_mtime` into
`_resolve_display_name`, and change `_resolve_display_name` to use that value
for the cache lookup instead of calling `_project_jsonl_max_mtime` again. Keep
the cache behavior in `_display_name_cache` and the display-name fallback logic
unchanged.

In `@utils/session_summary_cache.py`:
- Line 17: The fixed DEFAULT_MAX_ROWS value in session_summary_cache should be
revisited because get_projects() can churn through too many session rows for
large installs. Update the caching limit in session_summary_cache.py by making
DEFAULT_MAX_ROWS configurable (for example via an environment setting) or by
increasing it substantially, and ensure the value used by get_projects()
reflects the new setting so cache rows are not evicted before reuse.
- Around line 117-152: `put_summary` currently upserts by the full `(path,
mtime, rules_fp)` key, so a changing `mtime` leaves older rows for the same
session path behind. Update `put_summary` to explicitly remove stale
`summary_cache` rows for the same normalized `abspath` (and, if appropriate, the
same `rules_fingerprint`) before or after the insert so only the latest mtime is
retained. Keep the fix localized to `put_summary` and preserve the existing LRU
eviction behavior and `_ensure_connection`/`conn.execute` flow.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c15aeee8-10a6-4929-a54b-cbe623aef66a

📥 Commits

Reviewing files that changed from the base of the PR and between 6465d4a and 1f79b30.

📒 Files selected for processing (6)

api/projects.py
tests/test_api_integration.py
tests/test_session_path.py
tests/test_session_summary_cache.py
utils/session_path.py
utils/session_summary_cache.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@utils/session_summary_cache.py`:
- Around line 20-25: The max_cache_rows() helper in session_summary_cache.py
should not let a malformed CLAUDE_CODE_CHAT_BROWSER_SUMMARY_CACHE_MAX_ROWS value
crash put_summary() on the session-listing path. Update max_cache_rows() to
parse the env var defensively, catch invalid integer values, and fall back to
DEFAULT_MAX_ROWS (while still enforcing a minimum of 1 for valid inputs). Ensure
the change is localized to max_cache_rows() so callers like put_summary() keep
working even when the override is mistyped.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f89fd1d-da8a-48fa-bd4b-78ae3039491f

📥 Commits

Reviewing files that changed from the base of the PR and between 1f79b30 and 90c5aa7.

📒 Files selected for processing (5)

api/projects.py
tests/test_api_integration.py
tests/test_session_summary_cache.py
utils/session_path.py
utils/session_summary_cache.py

🚧 Files skipped from review as they are similar to previous changes (3)

tests/test_api_integration.py
tests/test_session_summary_cache.py
api/projects.py

Catch ValueError when CLAUDE_CODE_CHAT_BROWSER_SUMMARY_CACHE_MAX_ROWS is not an integer and fall back to DEFAULT_MAX_ROWS so put_summary() does not fail on the session-list path.

coderabbitai Bot reviewed Jul 3, 2026

View reviewed changes

Comment thread tests/test_session_summary_cache.py

Comment thread utils/session_summary_cache.py

Comment thread utils/session_summary_cache.py

Add disk-backed session summary cache and display-name cache

90c5aa7

coderabbitai Bot reviewed Jul 3, 2026

View reviewed changes

Comment thread utils/session_summary_cache.py Outdated

Harden max_cache_rows against invalid env override

db089cb

Catch ValueError when CLAUDE_CODE_CHAT_BROWSER_SUMMARY_CACHE_MAX_ROWS is not an integer and fall back to DEFAULT_MAX_ROWS so put_summary() does not fail on the session-list path.

timon0305 requested changes Jul 3, 2026

View reviewed changes

Comment thread utils/session_summary_cache.py

Comment thread api/projects.py

Comment thread utils/session_summary_cache.py

Address feedback

79d2027

clean6378-max-it requested a review from timon0305 July 3, 2026 09:40

timon0305 approved these changes Jul 3, 2026

View reviewed changes

timon0305 requested a review from wpak-ai July 3, 2026 12:01

wpak-ai approved these changes Jul 3, 2026

View reviewed changes

wpak-ai merged commit a05fd29 into master Jul 3, 2026
16 checks passed

wpak-ai deleted the perf/session-list-summary-cache branch July 3, 2026 14:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Session list performance: disk summary cache, display-name cache, and count alignment#111

Session list performance: disk summary cache, display-name cache, and count alignment#111
wpak-ai merged 4 commits into
masterfrom
perf/session-list-summary-cache

clean6378-max-it commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

clean6378-max-it commented Jul 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clean6378-max-it commented Jul 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading