fix(delegation): serialize delegateParentAndOpenChild with atomicReadAndUpdate#691
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds an atomic read-modify-write API to ChangesAtomic delegation persistence
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install failed: private package registry requires authentication. Disable ESLint in CodeRabbit settings or use public packages. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
src/__tests__/provider-delegation.spec.ts (1)
162-200: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winMake the race test actually concurrent.
This test checks a single-call order (
createTask → atomicReadAndUpdate → child.start), so it would still pass if two same-parent delegations both created and started children. Add aPromise.allSettled/controlled-promise case for two calls with the sameparentTaskIdand assert only the serialized winner starts, or that the second call rejects after the first changes the current task.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/__tests__/provider-delegation.spec.ts` around lines 162 - 200, The test only verifies sequential execution order for a single delegateParentAndOpenChild call and would not catch race conditions if two concurrent delegations with the same parentTaskId executed simultaneously. Modify the test to use Promise.allSettled to invoke delegateParentAndOpenChild concurrently with identical parentTaskId values twice, and then assert that the concurrent calls are properly serialized by verifying either that only one child's start method is actually called or that the second concurrent invocation rejects after the first one modifies the current task state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/__tests__/delegation-concurrent.spec.ts`:
- Around line 109-114: The test is missing assertions to verify that delegated
fields are properly maintained during concurrent operations. In addition to the
existing assertions checking final.childIds and final.status, add assertions to
verify that final.delegatedToId and final.awaitingChildId are consistent and
present as expected. This ensures the test actually validates the full field set
as its name suggests, rather than only checking child IDs and status.
In `@src/__tests__/provider-delegation.spec.ts`:
- Line 34: The parent task doubles in the test file are missing the
flushPendingToolResultsToHistory method stub, which causes
delegateParentAndOpenChild to throw a TypeError during the flush step and follow
the error-handling branch instead of the intended happy path. Add the
flushPendingToolResultsToHistory stub method (using vi.fn()) to all parent task
double objects at lines 34, 101, 134, 165, and 205 so that the tests exercise
the correct delegation flow rather than the flush-failure branch.
In `@src/core/task-persistence/TaskHistoryStore.ts`:
- Around line 552-556: In the atomicReadAndUpdate method, the live cached object
is being passed directly to the updater function, which allows mutations to
affect the cache before persistence occurs. Create a deep snapshot or immutable
copy of current and pass that to the updater instead. Additionally, after the
updater returns the modified task object, validate that the returned object has
the same id as the input taskId parameter to enforce the task id invariant and
prevent writing another task's history item. Throw an error if the returned task
id does not match the expected taskId.
- Around line 551-556: The invalidate() and invalidateAll() methods in
TaskHistoryStore.ts mutate the cache directly without acquiring the lock via
withLock(), while atomicReadAndUpdate() serializes operations through
withLock(). This creates a race condition where invalidation can interleave with
file I/O in _upsertUnlocked, causing stale cache state. Wrap the cache mutations
in both invalidate() and invalidateAll() with withLock() to ensure serialization
with other operations. Additionally, update the call to invalidateAll() at
webviewMessageHandler.ts:954 to await the result since it will now be an async
operation.
In `@src/core/webview/ClineProvider.ts`:
- Around line 3499-3508: The delegation transition has a race condition where
two calls with the same parent task can both pass the current-parent check
before either completes, causing both to create and start child tasks. To fix
this, wrap the entire pre-start transition logic (the state updates with
delegatedToId, awaitingChildId, and the child creation) in a call to
runDelegationTransition(parentTaskId, ...) instead of just using the
atomicReadAndUpdate on taskHistoryStore. This function should return the paused
child task. Then keep the child.start() call after the runDelegationTransition
completes so it remains outside the TaskHistoryStore lock, ensuring the full
delegation transition is atomic and prevents both children from running
simultaneously.
---
Nitpick comments:
In `@src/__tests__/provider-delegation.spec.ts`:
- Around line 162-200: The test only verifies sequential execution order for a
single delegateParentAndOpenChild call and would not catch race conditions if
two concurrent delegations with the same parentTaskId executed simultaneously.
Modify the test to use Promise.allSettled to invoke delegateParentAndOpenChild
concurrently with identical parentTaskId values twice, and then assert that the
concurrent calls are properly serialized by verifying either that only one
child's start method is actually called or that the second concurrent invocation
rejects after the first one modifies the current task state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 767e34e4-85f6-47d5-8b54-d74c28aa55fe
📒 Files selected for processing (4)
src/__tests__/delegation-concurrent.spec.tssrc/__tests__/provider-delegation.spec.tssrc/core/task-persistence/TaskHistoryStore.tssrc/core/webview/ClineProvider.ts
| return this.withLock(async () => { | ||
| const current = this.cache.get(taskId) | ||
| if (!current) { | ||
| throw new Error(`[TaskHistoryStore] atomicReadAndUpdate: task ${taskId} not found in cache`) | ||
| } | ||
| return this._upsertUnlocked(updater(current)) |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🔴 Critical
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find invalidation call sites that may need to await a locked async invalidation API.
rg -nP -C3 '\.\s*invalidate(All)?\s*\(' --type=tsRepository: Zoo-Code-Org/Zoo-Code
Length of output: 159
🏁 Script executed:
#!/bin/bash
# First, find the TaskHistoryStore.ts file
find . -name "TaskHistoryStore.ts" -type fRepository: Zoo-Code-Org/Zoo-Code
Length of output: 207
🏁 Script executed:
#!/bin/bash
# Read the relevant sections of the file to understand the implementation
# Lines 551-556 (atomicReadAndUpdate), and lines 303-320 (invalidate/invalidateAll)
cat -n src/core/task-persistence/TaskHistoryStore.ts | sed -n '295,330p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1268
🏁 Script executed:
#!/bin/bash
# Read the atomicReadAndUpdate section
cat -n src/core/task-persistence/TaskHistoryStore.ts | sed -n '545,560p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 948
🏁 Script executed:
#!/bin/bash
# Search for invalidate method calls to find all callers
rg "invalidate\(" --type ts -A 2 -B 2Repository: Zoo-Code-Org/Zoo-Code
Length of output: 159
🏁 Script executed:
#!/bin/bash
# Search for all references to invalidate (with various patterns)
rg "invalidate" --type ts -B 1 -A 1Repository: Zoo-Code-Org/Zoo-Code
Length of output: 159
🏁 Script executed:
#!/bin/bash
# Also check what withLock does and understand the locking mechanism
cat -n src/core/task-persistence/TaskHistoryStore.ts | sed -n '1,100p' | head -50Repository: Zoo-Code-Org/Zoo-Code
Length of output: 2236
🏁 Script executed:
#!/bin/bash
# Find the withLock implementation
rg -A 10 "withLock" --type ts | head -50Repository: Zoo-Code-Org/Zoo-Code
Length of output: 159
🏁 Script executed:
#!/bin/bash
# Search for direct method calls in the file
grep -n "invalidate\|withLock" src/core/task-persistence/TaskHistoryStore.ts | head -30Repository: Zoo-Code-Org/Zoo-Code
Length of output: 592
🏁 Script executed:
#!/bin/bash
# Find withLock method definition
cat -n src/core/task-persistence/TaskHistoryStore.ts | grep -A 15 "private.*withLock\|withLock\s*("Repository: Zoo-Code-Org/Zoo-Code
Length of output: 3598
🏁 Script executed:
#!/bin/bash
# Search for any references to these methods in the entire codebase
fd -e ts -e tsx | xargs grep -l "invalidate\|TaskHistoryStore" | head -20Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1167
🏁 Script executed:
#!/bin/bash
# Get a sense of the full file structure - lines 460-500 might have withLock
cat -n src/core/task-persistence/TaskHistoryStore.ts | sed -n '460,510p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1844
🏁 Script executed:
#!/bin/bash
# Search for calls to invalidate methods across the codebase
rg "\.invalidate\(|\.invalidateAll\(" --type ts -B 2 -A 2Repository: Zoo-Code-Org/Zoo-Code
Length of output: 159
🏁 Script executed:
#!/bin/bash
# Also check the test files to understand usage patterns
cat -n src/core/task-persistence/__tests__/TaskHistoryStore.spec.ts | grep -A 3 -B 3 "invalidate"Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1217
🏁 Script executed:
#!/bin/bash
# Search with simpler patterns
grep -r "invalidate" src --include="*.ts" --include="*.tsx" | grep -v "test\|spec\|\.invalidate\(taskId\)" | head -30Repository: Zoo-Code-Org/Zoo-Code
Length of output: 620
🏁 Script executed:
#!/bin/bash
# Look for exact method calls more carefully
grep -rn "invalidate(" src --include="*.ts" --include="*.tsx" | grep -v "test" | grep -v "spec"Repository: Zoo-Code-Org/Zoo-Code
Length of output: 260
🏁 Script executed:
#!/bin/bash
# Check if there are any awaits of invalidate or invalidateAll
grep -rn "await.*invalidate\|invalidate.*await" src --include="*.ts" --include="*.tsx"Repository: Zoo-Code-Org/Zoo-Code
Length of output: 493
🏁 Script executed:
#!/bin/bash
# Search in test files more specifically for how invalidate is called
cat -n src/core/task-persistence/__tests__/TaskHistoryStore.spec.ts | sed -n '400,450p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1800
🏁 Script executed:
#!/bin/bash
# Find the invalidateAll call in webviewMessageHandler.ts
grep -n "invalidateAll" src/core/webview/webviewMessageHandler.ts -B 5 -A 5Repository: Zoo-Code-Org/Zoo-Code
Length of output: 645
🏁 Script executed:
#!/bin/bash
# Check if there are any other callers we might have missed
find src -name "*.ts" -o -name "*.tsx" | xargs grep -l "invalidate" | xargs grep -n "invalidate" | grep -v "test" | grep -v "spec" | grep -v "^\s*//"Repository: Zoo-Code-Org/Zoo-Code
Length of output: 638
🏁 Script executed:
#!/bin/bash
# Get more context around the invalidateAll call at line 954
cat -n src/core/webview/webviewMessageHandler.ts | sed -n '940,970p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1509
🏁 Script executed:
#!/bin/bash
# Check what reconcile() does - does it also use withLock?
cat -n src/core/task-persistence/TaskHistoryStore.ts | sed -n '245,290p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1798
🏁 Script executed:
#!/bin/bash
# Check if there are any concurrent operations that might demonstrate the race
cat -n src/core/task-persistence/TaskHistoryStore.ts | sed -n '540,575p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1706
🏁 Script executed:
#!/bin/bash
# Check if _upsertUnlocked awaits I/O while holding the lock
cat -n src/core/task-persistence/TaskHistoryStore.ts | sed -n '164,190p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1097
🏁 Script executed:
#!/bin/bash
# Check the scheduleIndexWrite and debounced behavior to understand cache state
cat -n src/core/task-persistence/TaskHistoryStore.ts | sed -n '380,420p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 1328
🏁 Script executed:
#!/bin/bash
# Double-check all calls to invalidateAll - search more broadly
rg "invalidateAll" --type ts --type tsx -B 2 -A 2Repository: Zoo-Code-Org/Zoo-Code
Length of output: 187
🏁 Script executed:
#!/bin/bash
# Verify the full context where invalidateAll is called - is this in an async function?
cat -n src/core/webview/webviewMessageHandler.ts | sed -n '920,970p'Repository: Zoo-Code-Org/Zoo-Code
Length of output: 2132
Wrap cache mutations in both invalidate() and invalidateAll() with withLock to prevent race conditions.
atomicReadAndUpdate serializes operations through withLock, but invalidate() and invalidateAll() mutate cache directly without the lock. If invalidation interleaves while _upsertUnlocked is awaiting file I/O (line 175), it can restore or clear stale cache state. This violates the lock's serialization invariant and can cause the debounced index to persist stale data.
Both methods must acquire the lock before any cache mutation:
Suggested fix
- async invalidate(taskId: string): Promise<void> {
- try {
- const item = await this.readTaskFile(taskId)
- if (item) {
- this.cache.set(taskId, item)
- } else {
- this.cache.delete(taskId)
- }
- } catch {
- this.cache.delete(taskId)
- }
- }
+ async invalidate(taskId: string): Promise<void> {
+ return this.withLock(async () => {
+ try {
+ const item = await this.readTaskFile(taskId)
+ if (item) {
+ this.cache.set(taskId, item)
+ } else {
+ this.cache.delete(taskId)
+ }
+ } catch {
+ this.cache.delete(taskId)
+ }
+ })
+ }
- invalidateAll(): void {
- this.cache.clear()
- }
+ async invalidateAll(): Promise<void> {
+ return this.withLock(async () => {
+ this.cache.clear()
+ })
+ }Update the call at webviewMessageHandler.ts:954 to await invalidateAll().
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/core/task-persistence/TaskHistoryStore.ts` around lines 551 - 556, The
invalidate() and invalidateAll() methods in TaskHistoryStore.ts mutate the cache
directly without acquiring the lock via withLock(), while atomicReadAndUpdate()
serializes operations through withLock(). This creates a race condition where
invalidation can interleave with file I/O in _upsertUnlocked, causing stale
cache state. Wrap the cache mutations in both invalidate() and invalidateAll()
with withLock() to ensure serialization with other operations. Additionally,
update the call to invalidateAll() at webviewMessageHandler.ts:954 to await the
result since it will now be an async operation.
There was a problem hiding this comment.
Acknowledged — this is a real defect but predates this PR and is out of scope here. Tracked as a standalone story in Epic 2: #698. The fix recommended there is an invalidateAndReconcile() method that clears and rebuilds the cache under a single lock acquisition, eliminating the empty-cache window between the two steps.
There was a problem hiding this comment.
@edelauna, understood — thanks for the context. The invalidateAndReconcile() approach (clear + rebuild under a single lock) is a cleaner fix than wrapping the two methods separately anyway, since it eliminates the empty-cache window. I'll keep this comment open and it can be resolved when #698 lands.
✏️ Learnings added
Learnt from: edelauna
Repo: Zoo-Code-Org/Zoo-Code PR: 691
File: src/core/task-persistence/TaskHistoryStore.ts:0-0
Timestamp: 2026-06-23T13:43:33.766Z
Learning: In `src/core/task-persistence/TaskHistoryStore.ts`, the `invalidate(taskId)` and `invalidateAll()` methods mutate the in-memory cache outside `withLock`, creating a race with concurrent `_upsertUnlocked` I/O. This is a known pre-existing defect tracked as issue `#698`. The planned fix is a new `invalidateAndReconcile()` method that clears the cache and rebuilds it from disk within a single `withLock` acquisition, eliminating the empty-cache window between the two steps.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| throw new Error(`[TaskHistoryStore] atomicReadAndUpdate: task ${taskId} not found in cache`) | ||
| } | ||
| // Deep-copy so a mutating updater cannot alter cached state before persistence. | ||
| const snapshot = JSON.parse(JSON.stringify(current)) as HistoryItem |
There was a problem hiding this comment.
use structuredClone() here? instead of JSON.stringify which is hacky. can also use cloneDeep which is used in UpdateTodoListTool
navedmerchant
left a comment
There was a problem hiding this comment.
One comment on deep cloning, can use loadash or clone-deep for the deep cloning, otherwise lgtm
….stringigy and parse methods
Related GitHub Issue
Closes: #364
Description
Replaces the unguarded
getTaskWithId+updateTaskHistoryread-modify-write indelegateParentAndOpenChildwith a newatomicReadAndUpdate(taskId, updater)API onTaskHistoryStorethat holds the store's mutex for the entire read-modify-write sequence.Root cause: Two concurrent delegations from the same parent could each read the parent's
HistoryItembefore either had written back, causing the second write to clobber fields set by the first (e.g.awaitingChildId,childIds).Key implementation details:
_upsertUnlocked(item)— extracted private helper containing the upsert body without acquiring the lock; safe to call from inside an existing lock holder.atomicReadAndUpdate(taskId, updater)— typed API: reads from the in-memory cache, applies a pure synchronous updater, then writes via_upsertUnlocked, all within a singlewithLockacquisition. The updater cannot callupsert/withLock(deadlock-safe by design).child.start()is intentionally kept outside the lock — it callssaveClineMessages → upsert → withLockand would deadlock if called inside.Test Procedure
New test file
src/__tests__/delegation-concurrent.spec.tscovers:childIdspresent after two concurrent delegations)status,delegatedToId,awaitingChildId, andchildIdsare consistent after concurrent writesUpdated
src/__tests__/provider-delegation.spec.tsto match the newatomicReadAndUpdatesignature and extracted a sharedmakeStoreStub()helper.Run tests with:
Pre-Submission Checklist
Additional Notes
Ticket #365 (
reopenParentFromDelegation) has a similar unguarded read-modify-write — intentionally left out of scope for that follow-up ticket.Summary by CodeRabbit
atomicReadAndUpdateto the task history store for atomic, locked read-modify-write updates.upsertto separate unlocked mutation logic from the lock acquisition.