Skip to content

fix(task-history): invalidate() and invalidateAll() bypass withLock, corrupting cache under concurrent writes #698

Description

@edelauna

Part of #357 (Epic 2 — Task Lifecycle Fixes).

Context

TaskHistoryStore uses a promise-chain mutex (withLock) to serialize all cache mutations. Every write path — upsert, delete, deleteMany, reconcile, atomicReadAndUpdate — acquires the lock before touching this.cache. However, two public methods bypass the lock entirely:

  • invalidateAll() — synchronous cache.clear(), no lock acquired
  • invalidate(taskId) — async file read + cache.set, no lock acquired

There is one production caller today (webviewMessageHandler.ts:954, Roo history import):

provider.taskHistoryStore.invalidateAll()   // sync, no lock
await provider.taskHistoryStore.reconcile() // async, acquires lock

The Race

_upsertUnlocked (called by atomicReadAndUpdate and upsert) contains an await at line 175 (await this.writeTaskFile(merged)). At that yield point the event loop is free to run other synchronous work — including invalidateAll(). The resulting interleaving:

  1. atomicReadAndUpdate holds the lock, enters _upsertUnlocked
  2. await writeTaskFile(merged) — lock is held but execution yields
  3. invalidateAll() fires: cache.clear() — entire cache wiped while lock is logically held
  4. _upsertUnlocked resumes: cache.set(merged.id, merged) — only the one item is back
  5. getAll() returns a single-item list
  6. onWrite([singleItem])scheduleGlobalStateWriteThrough() runs with partial history
  7. Lock releases; reconcile() queues and eventually rebuilds the full cache

Impact: Between steps 6 and 7 globalState contains only one history item. If VS Code crashes in this window, the downgrade-compatibility globalState history list is corrupted. invalidate() has no production callers today, but carries an equivalent stale-write risk if ever called concurrently.

Developer Notes

Why the lock doesn't protect you here

The promise-chain mutex in withLock serializes other withLock callers — it does not prevent synchronous code outside the lock from observing or mutating cache mid-await. As the matklad "On Async Mutexes" post explains: "mutual exclusion is a property of the logic itself, not of the runtime… if a certain snippet of code must be executed atomically with respect to everything else that is concurrent, then it must be annotated as such in the source code." Any public method that touches this.cache without withLock is an escape hatch that breaks the invariant the rest of the class relies on. The Trabe article on mutex-based cache sync makes the same point: the lock must own the full critical section including any cache reads that feed into the write.

Recommended fix — invalidateAndReconcile()

Since the one production caller always pairs invalidateAll() + reconcile(), the cleanest fix is a combined method that does both under a single lock acquisition, eliminating the empty-cache window:

async invalidateAndReconcile(): Promise<void> {
  return this.withLock(async () => {
    this.cache.clear()
    // inline reconcile body (or extract _reconcileUnlocked like _upsertUnlocked)
    ...
  })
}

Then webviewMessageHandler.ts:954 becomes:

await provider.taskHistoryStore.invalidateAndReconcile()
await provider.taskHistoryStore.flushIndex()
await provider.postStateToWebview()

CodeRabbit suggested wrapping each method in its own withLock call. That also fixes the corruption, but leaves a brief window between two separate lock acquisitions where reads see an empty cache. A single combined method is strictly better.

invalidate(taskId) should also be wrapped in withLock (or use an _invalidateUnlocked helper if ever needed inside an existing lock holder).

What to validate

  • The onWrite callback receives a complete history list after any concurrent invalidateAndReconcile + atomicReadAndUpdate
  • A test that fires invalidateAndReconcile() concurrently with atomicReadAndUpdate() should show the full item set in cache after both settle (analogous to delegation-concurrent.spec.ts)
  • The one call site in webviewMessageHandler.ts is updated to await invalidateAndReconcile()

Scope note

This is a pre-existing defect that became visible when atomicReadAndUpdate (Story 2.1, PR #691) made the locking contract explicit. It is independent of Stories 2.2 and 2.3.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions