Add stateful-history delta work items to the workflow worker #1777

Draft
JoshVanL wants to merge 1 commit into dapr:master from JoshVanL:stateful-history

JoshVanL commented Jul 1, 2026

The sidecar re-sends a workflow instance's entire committed history to the worker on every turn. This adds the worker half of the "stateful history" optimization so that, once a worker is warm for an instance on a work-item stream, the sidecar sends only the new committed events (the delta) and the worker reconstructs the full history from its own cache. It mirrors the Go (durabletask-go), Python, and .NET SDK implementations and is on by default.

Worker (durabletask-client):

  • WorkflowHistoryCache: a per-stream cache of each instance's committed history, bounded by a sliding TTL, an instance-count cap, and a byte budget with LRU eviction. Injectable clock for deterministic tests.
  • DurableTaskGrpcWorker: advertise WORKER_CAPABILITY_STATEFUL_HISTORY in GetWorkItemsRequest, reset the cache on every reconnect (the sidecar drops the old stream's warm set), and reclaim idle entries with a daemon janitor stopped on close.
  • OrchestratorRunner: before replay, resolve the full committed history (cached prefix + delta on a hit, or a GetInstanceHistory fetch on a miss) instead of using the request's pastEvents directly; after replay, cache the committed history, or drop it once the instance ends (a CompleteWorkflow action, covering completed/failed/terminated/continued-as-new). A TerminateWorkflow action targets a different instance and is deliberately not treated as a reset.
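
As a rough illustration of the three cache bounds described above (sliding TTL, instance-count cap, byte budget with LRU eviction, plus an injectable clock), here is a minimal sketch. It is not the PR's WorkflowHistoryCache: the class name HistoryCacheSketch, its method names, and the modeling of history events as raw byte[] are all assumptions made for the example.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the PR's WorkflowHistoryCache. Events are modeled as byte[].
final class HistoryCacheSketch {
  private static final class Entry {
    final List<byte[]> events;
    final long bytes;
    long lastAccessNanos;

    Entry(List<byte[]> events, long bytes, long lastAccessNanos) {
      this.events = events;
      this.bytes = bytes;
      this.lastAccessNanos = lastAccessNanos;
    }
  }

  // accessOrder=true: iteration runs from least to most recently used.
  private final LinkedHashMap<String, Entry> entries = new LinkedHashMap<>(16, 0.75f, true);
  private final long ttlNanos;
  private final int maxInstances;
  private final long maxBytes;
  private final LongSupplier clockNanos; // injectable clock for deterministic tests
  private long totalBytes;

  HistoryCacheSketch(Duration ttl, int maxInstances, long maxBytes, LongSupplier clockNanos) {
    this.ttlNanos = ttl.toNanos();
    this.maxInstances = maxInstances;
    this.maxBytes = maxBytes;
    this.clockNanos = clockNanos;
  }

  // Returns the cached history, or null on a miss. A hit slides the TTL forward.
  synchronized List<byte[]> get(String instanceId) {
    Entry e = entries.get(instanceId); // also marks the entry most recently used
    if (e == null) {
      return null;
    }
    long now = clockNanos.getAsLong();
    if (now - e.lastAccessNanos > ttlNanos) {
      remove(instanceId);
      return null;
    }
    e.lastAccessNanos = now;
    return e.events;
  }

  // Stores the committed history, then evicts LRU entries until both caps hold.
  synchronized void put(String instanceId, List<byte[]> events) {
    remove(instanceId);
    long size = 0;
    for (byte[] event : events) {
      size += event.length;
    }
    if (size > maxBytes) {
      return; // too large to cache; the next turn is simply a miss
    }
    entries.put(instanceId, new Entry(List.copyOf(events), size, clockNanos.getAsLong()));
    totalBytes += size;
    Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
    while (entries.size() > maxInstances || totalBytes > maxBytes) {
      totalBytes -= it.next().getValue().bytes;
      it.remove();
    }
  }

  synchronized void remove(String instanceId) {
    Entry e = entries.remove(instanceId);
    if (e != null) {
      totalBytes -= e.bytes;
    }
  }

  // What a periodic janitor would call to reclaim entries idle past the TTL.
  synchronized void evictIdle() {
    long now = clockNanos.getAsLong();
    Iterator<Entry> it = entries.values().iterator();
    while (it.hasNext()) {
      Entry e = it.next();
      if (now - e.lastAccessNanos > ttlNanos) {
        totalBytes -= e.bytes;
        it.remove();
      }
    }
  }

  // What a reconnect would call: the new stream starts cold.
  synchronized void clear() {
    entries.clear();
    totalBytes = 0;
  }

  synchronized int size() {
    return entries.size();
  }
}
```

An access-ordered LinkedHashMap keeps the sketch short; a real implementation may track recency and byte sizes differently.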

Correctness never depends on the cache: any miss (cold stream, eviction, desync) self-heals via the GetInstanceHistory fallback, so this only changes per-turn bandwidth, not results. If the fallback fetch itself fails, the worker abandons the work item so the backend redelivers it, rather than completing the turn with a partial history.
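
The per-turn flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the OrchestratorRunner code: the class StatefulHistoryTurnSketch, the Outcome enum, the String event type, and the fetchFullHistory function (standing in for a GetInstanceHistory call that is assumed to return the complete committed history, new events included) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of one workflow turn; events are modeled as Strings.
final class StatefulHistoryTurnSketch {
  enum Outcome { REPLAYED, ABANDONED }

  private final Map<String, List<String>> cache = new HashMap<>();
  // Stand-in for GetInstanceHistory; assumed to throw on failure.
  private final Function<String, List<String>> fetchFullHistory;
  // The history the last successful turn replayed, exposed for inspection.
  List<String> lastReplayed = List.of();

  StatefulHistoryTurnSketch(Function<String, List<String>> fetchFullHistory) {
    this.fetchFullHistory = fetchFullHistory;
  }

  Outcome runTurn(String instanceId, List<String> delta, boolean instanceEnds) {
    List<String> history;
    List<String> prefix = cache.get(instanceId);
    if (prefix != null) {
      // Hit: cached prefix + delta reconstructs the full committed history.
      history = new ArrayList<>(prefix);
      history.addAll(delta);
    } else {
      try {
        // Miss: fall back to fetching the full history.
        history = new ArrayList<>(fetchFullHistory.apply(instanceId));
      } catch (RuntimeException e) {
        // Never replay a partial history; leave the item for redelivery.
        return Outcome.ABANDONED;
      }
    }

    // Replay of `history` would run here.
    lastReplayed = history;

    if (instanceEnds) {
      cache.remove(instanceId); // instance finished: nothing more to replay
    } else {
      cache.put(instanceId, history);
    }
    return Outcome.REPLAYED;
  }

  boolean isWarm(String instanceId) {
    return cache.containsKey(instanceId);
  }
}
```

The point of the shape is that a miss and a hit converge on the same full history before replay, so the cache can only affect how many events travel over the wire.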

Configuration (DurableTaskGrpcWorkerBuilder):

  • disableStatefulHistory to opt out, plus historyCacheTtl, historyCacheMaxInstances, and historyCacheMaxBytes to tune the bounds.

Signed-off-by: joshvanl <me@joshvanl.dev>

codecov Bot commented Jul 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.89%. Comparing base (f42e0d2) to head (7a6842a).

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #1777   +/-   ##
=========================================
  Coverage     76.89%   76.89%           
  Complexity     2307     2307           
=========================================
  Files           244      244           
  Lines          7163     7163           
  Branches        753      753           
=========================================
  Hits           5508     5508           
  Misses         1288     1288           
  Partials        367      367           
