Skip to content

Provide ability to capture continuations using the Context API#11663

Open
mcculls wants to merge 4 commits into
masterfrom
mcculls/context-continuations
Open

Provide ability to capture continuations using the Context API#11663
mcculls wants to merge 4 commits into
masterfrom
mcculls/context-continuations

Conversation

@mcculls

@mcculls mcculls commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What Does This Do

Introduces basic continuations and events to the Context API

Motivation

This is meant as a stepping stone from the old API towards a simpler foundation

Additional Notes

Benchmarking summary from Claude:

  • The two managers are throughput-equivalent but show different allocation pressure
  • ThreadLocalContextManager: nested attach() calls add zero bytes because the ContextScopeImpl objects are stack-allocated by escape analysis.
  • ContinuableScopeManager: each ContinuableScope object adds ≈32 B because they escape to the heap via the ScopeStack linked structure.

Contributor Checklist

  • Format the title according to the contribution guidelines
  • Assign the type: and (comp: or inst:) labels in addition to any other useful labels
  • Avoid using close, fix, or any linking keywords when referencing an issue
    Use solves instead, and assign the PR milestone to the issue
  • Update the CODEOWNERS file on source file addition, migration, or deletion
  • Update public documentation with any new configuration flags or behaviors
  • Add your completed PR to the merge queue by commenting /merge. You can also:
    • Customize the commit message associated with the merge with /merge --commit-message "..."
    • Remove your PR from the merge queue with /merge -c
    • Skip all merge queue checks with /merge -f --reason "reason"; please use this judiciously, as some checks do not run at the PR-level (note: the PR still needs to be mergeable, this will only skip the pre-merge build)
    • Get more information in this doc

Jira ticket: [PROJ-IDENT]

@mcculls mcculls added comp: core Tracer core tag: no release notes Changes to exclude from release notes type: refactoring labels Jun 17, 2026
@mcculls mcculls requested a review from Copilot June 17, 2026 11:28
@datadog-prod-us1-3

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the datadog.context API to support capturing/resuming execution context via a new ContextContinuation abstraction, and wires dd-trace’s scope continuation mechanism to also implement that API (so Context.current().capture() can hold traces similarly to the legacy tracer continuation).

Changes:

  • Introduces ContextContinuation + ContextListener and adds Context.capture() / ContextManager.capture(...) / listener registration APIs.
  • Implements capture(...) for both the default ThreadLocalContextManager and dd-trace’s ContinuableScopeManager, and makes AgentScope.Continuation also implement ContextContinuation.
  • Adds extensive tests for continuation semantics, noop scopes, and listener lifecycle events.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
internal-api/src/main/java/datadog/trace/bootstrap/instrumentation/api/AgentScope.java Makes AgentScope.Continuation also implement ContextContinuation and maps methods (resumeactivate, releasecancel).
dd-trace-core/src/main/java/datadog/trace/core/scopemanager/ContinuableScopeManager.java Adds ContextManager.capture(Context) implementation and an addListener(...) stub.
dd-trace-core/src/test/java/datadog/trace/core/scopemanager/ScopeManagerTest.java Adds tests verifying Context API capture/continuation behavior holds/unblocks traces and activates spans.
components/context/src/main/java/datadog/context/Context.java Adds Context.capture() default method.
components/context/src/main/java/datadog/context/ContextManager.java Adds capture + listener APIs and a static register(ContextListener) helper.
components/context/src/main/java/datadog/context/ContextContinuation.java New public API for capturing/resuming context across execution units.
components/context/src/main/java/datadog/context/ContextListener.java New public listener API for attach/detach/capture/release events.
components/context/src/main/java/datadog/context/ThreadLocalContextManager.java Implements capture/resume/release semantics and listener notification for the default manager.
components/context/src/main/java/datadog/context/NoopContextScope.java New cached noop ContextScope implementation for same-context attach/resume no-ops.
components/context/src/main/java/datadog/context/EmptyContextContinuation.java New noop continuation for root/empty context.
components/context/src/main/java/datadog/context/EmptyContext.java Makes EmptyContext constructor private (enforces singleton).
components/context/src/main/java/datadog/context/TestContextManager.java Delegates new capture and addListener APIs to the underlying manager.
components/context/src/test/java/datadog/context/ContextContinuationTest.java New comprehensive tests for continuation lifecycle, concurrency, holds, and event ordering.
components/context/src/test/java/datadog/context/ContextManagerTest.java Adds tests for noop-scope caching/collisions and listener behaviors.
components/context/src/test/java/datadog/context/ContextProvidersForkedTest.java Updates custom manager stub to implement new APIs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@dd-octo-sts

dd-octo-sts Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 13.94 s 13.97 s [-1.1%; +0.7%] (no difference)
startup:insecure-bank:tracing:Agent 12.93 s 13.00 s [-1.2%; +0.1%] (no difference)
startup:petclinic:appsec:Agent 16.80 s 16.81 s [-1.2%; +1.1%] (no difference)
startup:petclinic:iast:Agent 16.85 s 16.86 s [-0.6%; +0.5%] (no difference)
startup:petclinic:profiling:Agent 16.77 s 16.21 s [-1.0%; +7.8%] (no difference)
startup:petclinic:sca:Agent 16.86 s 16.83 s [-1.0%; +1.3%] (no difference)
startup:petclinic:tracing:Agent 16.10 s 15.94 s [+0.3%; +1.7%] (maybe worse)

Commit: 0a3d3542 · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

@PerfectSlayer PerfectSlayer self-requested a review June 17, 2026 11:55

@PerfectSlayer PerfectSlayer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early review of the API with a lot of refinement comments - trying to nail the API right before merging it. The implementation feels quite close to the current one so far. It's also nice to see the AgentSpan.Continuation nicely retrofitted to the new ContextContinuation.
Glad to see this piece of context coming! 🤝

Comment thread components/context/src/main/java/datadog/context/ContextContinuation.java Outdated
Comment thread components/context/src/main/java/datadog/context/Context.java Outdated
Comment thread components/context/src/main/java/datadog/context/ThreadLocalContextManager.java Outdated
Comment thread components/context/src/test/java/datadog/context/ContextManagerTest.java Outdated
Comment thread components/context/src/test/java/datadog/context/ContextManagerTest.java Outdated
@mcculls mcculls force-pushed the mcculls/context-continuations branch 2 times, most recently from 396e8a0 to c8de592 Compare June 18, 2026 09:08
@PerfectSlayer

PerfectSlayer commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

By the way, my comments only are early feedback. I don't expect to have them to get answer in timely manner.
Mostly reacting to you sharing your progress during our sync 🤝

@mcculls mcculls force-pushed the mcculls/context-continuations branch 6 times, most recently from 2754977 to d40fa11 Compare June 18, 2026 14:44
@mcculls mcculls marked this pull request as ready for review June 18, 2026 16:39
@mcculls mcculls requested review from a team as code owners June 18, 2026 16:39
@mcculls mcculls requested review from amarziali, dougqh, mhlidd and sarahchen6 and removed request for a team June 18, 2026 16:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d40fa11a82

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@mcculls mcculls force-pushed the mcculls/context-continuations branch from d40fa11 to c33dc26 Compare June 18, 2026 22:02
@mcculls mcculls requested a review from Copilot June 18, 2026 22:23

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.

This is meant as a stepping stone from the old API towards a simpler foundation
@mcculls mcculls force-pushed the mcculls/context-continuations branch from c33dc26 to ba13f41 Compare June 19, 2026 00:10

@dougqh dougqh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the work to move away from the old OpenTracing model towards OpenTelemetry.
I do think OpenTelemetry is genuinely a better model. ContextManagers never quite solved the hybrid async environments that automatic instrumentation encounters.

As we do the move, I do want to take care that we preserve the safety measures that were added previously.
Those were...
1 - stack depth limit
2 - unfinished continuations are tolerated
3 - scopes were allowed to be closed across threads
4 - out-of-order scope close was tolerated

In the past, I know that our instrumentations contributed to those problems, but I think we still need to be careful about these things because of outside / manual instrumentation. There have been issues in those areas as recently as last year.

Based on an initial review, it looks like we’ve lost some of these guards. I’m not sure that we need all of them, but I do think we need a depth limit and tolerance for unfinished continuations.

I'll need some time to review this more comprehensively, but I wanted to share that concern up front.

@amarziali amarziali left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good to me

@mcculls

mcculls commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

1 - stack depth limit

Previously we needed our own limit because scopes always escaped onto the heap via our separate scope stack. Now scopes naturally reside on the thread's own call stack. Given this I honestly believe we don't need the overhead of enforcing a separate stack limit and can rely on the JVM's own limits.

2 - unfinished continuations are tolerated

The new code uses the same basic continuation algorithm, so should tolerate the same scenarios.

3 - scopes were allowed to be closed across threads

This came with caveats - the first thing the old ContinuableScope.close does is fetch the current thread's scope stack. Once the scope is closed, cleanup is triggered on that scope stack - not the original thread's scope stack. This is because fundamentally only the original thread can perform cleanup of its own scope stack without having to introduce lots of extra locking.

The old code did update the scope's internal book-keeping regardless which thread closed it - but in practice this doesn't solve the issue of stale scopes left in the scope stack until the original thread happens to trigger cleanup. This is not an issue when we store scopes on the actual call stack, we naturally get cleanup of them as the call stack unwinds.

The new scopes have a minimum amount of state / book-keeping - basically just a flag to record when a close attempt was successful. This means the scope tolerates accidental extra calls to close it, firing only when the expected state matches.

A feature we now get which wasn't previously available is when we get to a known safe point, we can potentially 'swap' in (i.e. reset to) the expected context. This is possible because we don't have a separate scope stack which may contain a mix of stale and valid scopes. We just have the natural call stack and a notion of the currently attached context.

4 - out-of-order scope close was tolerated

We have tests that specifically cover out-of-order close. We do stop an out-of-order close from changing the currently attached context if it doesn't match what we expected, but that's intentional. We'll likely introduce a special mode to track out-of-order close calls back to the originating method/instrumentation like in OTel (such features are usually off by default for performance reasons.)

There are still situations where a misbehaving instrumentation could leave a context attached - this is the equivalent of an instrumentation leaving an unclosed scope on the stack in the old code, which wasn't recoverable. But now we have the ability to safely reset to the right context (at known safe points) which we don't have with the old approach.

If there is a leak then the next time that thread handles a request, a new context will attach and overwrite any stale context - from that point on (assuming the instrumentations for that request behave) the context will properly attach and detach. Only once the request unwinds completely will the stale context "re-appear". This also means that unlike the old approach we won't continually accumulate stale contexts as we don't have the separate scope stack. We'd only ever have potentially one stale context lingering per-thread, and that stale context would not impact any request handling on top of it.


Reminder that this code will no doubt evolve as we migrate away from the old approach. But fundamentally the old scope stack will be going away, as it limits how we can model more interesting async primitives. This also gives us a cleaner/simpler foundation to build on - it's not the destination, but a stepping stone.

@PerfectSlayer PerfectSlayer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! 👍
Not much important feedback, mostly about testing and mostly polish about coverage and being explicit.
I like having the API interop tested (hopefully we wont have to maintain it long).

Comment thread components/context/src/main/java/datadog/context/EmptyContext.java
Comment thread components/context/src/main/java/datadog/context/EmptyContextContinuation.java Outdated
Context context = root().with(STRING_KEY, "value");
try (ContextScope outer = context.attach()) {
// two consecutive noop scopes for the same context should be the same cached instance
ContextScope noop1 = context.attach();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 suggestion: ‏What about using try-with-resource for noop1 too? Overall, the 3 scopes (outer, noop1, noop2) can go within the same try-with-resource.

import java.util.concurrent.Future;
import org.junit.jupiter.api.Test;

class ContextContinuationTest extends ContextTestBase {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 thought: ‏What do you think about testing multiple hold calls or hold after release? According to implementation, it should not do anything but can those be interesting cases to cover? 🤷

void testListenersNotNotifiedForRootContext() {
List<String> events = new ArrayList<>();
ContextManager.register(trackingListener(events));
root().swap(); // current is already root, no events

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❔ question: ‏Should we test for Context.root().attach() too? Not only swap

Comment on lines +47 to +51
ContextContinuation[] ref = {null};
assertDoesNotThrow(() -> ref[0] = context.capture());
assertNotNull(ref[0]);
assertEquals(context, ref[0].context());
ref[0].release();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 suggestion: Alternative about readabliity:‏

Suggested change
ContextContinuation[] ref = {null};
assertDoesNotThrow(() -> ref[0] = context.capture());
assertNotNull(ref[0]);
assertEquals(context, ref[0].context());
ref[0].release();
assertDoesNotThrow(() -> {
ContextContinuation continuation = context.capture();
assertEquals(context, continuation.context());
continuation.release();
}, "Context.capture() should not throw");

Comment on lines +1137 to +1141
// context with no span uses NoopAgentTraceCollector — should not crash
ContextContinuation continuation = Context.current().capture();
assertNotNull(continuation);
assertEquals(ctx, continuation.context());
continuation.release(); // no-op on NoopAgentTraceCollector

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 thought: ‏I feel it a bit confusing. It’s not really about the 2 assert (not null + the right context), it’s about having it to run.
If so, what about using an explicit assertNotThrow with a comment why it should not throw (about the noop agent trace collector still)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this can be cleaned up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: core Tracer core tag: no release notes Changes to exclude from release notes type: refactoring

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants