Skip to content

Fix legacy entity protocol: propagate call_entity failures and stop double-encoding results#167

Open
andystaples wants to merge 4 commits into
microsoft:mainfrom
andystaples:andystaples/fix-entity-error-propagation
Open

Fix legacy entity protocol: propagate call_entity failures and stop double-encoding results#167
andystaples wants to merge 4 commits into
microsoft:mainfrom
andystaples:andystaples/fix-entity-error-propagation

Conversation

@andystaples

@andystaples andystaples commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Two related fixes to OrchestrationContext.call_entity under the legacy entity protocol — the protocol the Azure Functions Durable (WebJobs) extension uses when talking to out-of-proc Python workers. The current entity protocol (entityOperationCompleted / entityOperationFailed, used by durabletask.azuremanaged / DTS) already behaved correctly; only the legacy eventRaised-based path was affected.

1. Entity operation failures were not propagated

A failed entity operation was silently completing (typically returning None) instead of raising, so orchestrations could never observe or handle the failure.

Legacy-protocol entity results arrive as an eventRaised handled by _handle_entity_event_raised, which unconditionally called entity_task.complete(...) and never inspected the response for a failure indicator. The wire contract is DurableTask.Core's ResponseMessage, where IsErrorResult => exceptionType != null || failureDetails != null (exceptionType carries the error message; failureDetails carries a structured FailureDetails).

Fix: _handle_entity_event_raised now detects a failed response and fails the task with an EntityOperationFailedException, so an awaiting call_entity raises TaskFailedError — matching the current entity protocol and the .NET SDK (CallEntityAsync throws EntityOperationFailedException when the result IsError).

2. Successful results were double-encoded

The ResponseMessage.result field holds a serialized JSON string of the return value (just like the new protocol's entityOperationCompleted.output). The success path used coerce, which applies a type to an already-parsed value and does not parse JSON strings. As a result, string/dict/list returns arrived double-encoded (e.g. a returned "done" surfaced as "\"done\""), and only numeric returns with an explicit return_type happened to work.

Fix: deserialize the result string (type-directed) instead of coercing it, mirroring the current-protocol handler.

Also addressed

  • Cleaned up the legacy eventRaised dispatch to pop() _entity_task_id_map / _entity_lock_task_id_map entries once handled (they were looked up with get() and never removed), preventing unbounded map growth in long-running orchestrations — consistent with the new-protocol handlers. (From PR review.)

Changes

  • durabletask/worker.py: failure detection + propagation in _handle_entity_event_raised; type-directed deserialization of the result; pop() map cleanup.
  • durabletask/internal/helpers.py: entity_response_failure_details(...) plus a recursive FailureDetails converter.
  • tests/durabletask/test_orchestration_executor.py: legacy-protocol tests for failure via failureDetails, failure via exceptionType, and a parametrized success round-trip across str / int / float / bool / dict / list (no return_type required).
  • CHANGELOG.md: FIXED entries for both bugs.

Validation

  • New legacy-protocol tests pass; entity + orchestration-executor suites pass.
  • flake8 and Pylance clean on changed files.
  • durabletask.azuremanaged uses the current protocol and is unaffected.

call_entity failures delivered via the legacy entity protocol (used by the Azure Functions Durable extension) were silently completing instead of raising. _handle_entity_event_raised now detects failed ResponseMessage payloads (exceptionType/failureDetails) and fails the task with an EntityOperationFailedException, matching the current entity protocol and the .NET SDK.
Copilot AI review requested due to automatic review settings July 2, 2026 19:33

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a correctness gap in the core durabletask SDK where entity operation failures were not propagated to orchestrations when using the legacy entity protocol (used by the Azure Functions Durable/WebJobs extension). It aligns legacy-protocol behavior with the current protocol (and .NET) by detecting failure indicators in legacy ResponseMessage payloads and failing the awaiting call_entity task accordingly.

Changes:

  • Added legacy ResponseMessage failure detection/conversion helpers to produce TaskFailureDetails when failureDetails or exceptionType indicate an entity operation failure.
  • Updated legacy-protocol eventRaised handling to fail call_entity tasks with EntityOperationFailedException when failures are detected.
  • Added orchestration executor tests covering legacy-protocol entity call failure/success cases, and documented the fix in CHANGELOG.md.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
tests/durabletask/test_orchestration_executor.py Adds regression tests for legacy-protocol entity call failure propagation (both failureDetails and exceptionType) and a success round-trip.
durabletask/worker.py Detects legacy-protocol entity call failures in eventRaised handling and propagates them as task failures to orchestrations.
durabletask/internal/helpers.py Adds helpers to extract/convert legacy ResponseMessage failure data into TaskFailureDetails.
CHANGELOG.md Documents the user-visible fix under ## UnreleasedFIXED.

Comment thread durabletask/worker.py
Comment thread durabletask/worker.py
The legacy entity protocol delivers the operation result as a serialized JSON string inside ResponseMessage.result. The success path used coerce, which does not parse JSON strings, so string/dict/list results arrived double-encoded and only numeric results with an explicit return_type worked. Deserialize the result string (type-directed) to match the current entity protocol. Adds parametrized legacy-protocol success round-trip tests.
Addresses PR review: _entity_task_id_map and _entity_lock_task_id_map entries were looked up with get() and never removed, so the maps could grow unbounded in long-running orchestrations. Use pop() to release each entry once its eventRaised is handled, matching the new-protocol handlers.
@andystaples andystaples changed the title Propagate entity operation failures over the legacy entity protocol Fix legacy entity protocol: propagate call_entity failures and stop double-encoding results Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants