Provide ability to capture continuations using the Context API#11663
Provide ability to capture continuations using the Context API#11663mcculls wants to merge 4 commits into
Conversation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR extends the datadog.context API to support capturing/resuming execution context via a new ContextContinuation abstraction, and wires dd-trace’s scope continuation mechanism to also implement that API (so Context.current().capture() can hold traces similarly to the legacy tracer continuation).
Changes:
- Introduces
ContextContinuation+ContextListenerand addsContext.capture()/ContextManager.capture(...)/ listener registration APIs. - Implements
capture(...)for both the defaultThreadLocalContextManagerand dd-trace’sContinuableScopeManager, and makesAgentScope.Continuationalso implementContextContinuation. - Adds extensive tests for continuation semantics, noop scopes, and listener lifecycle events.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| internal-api/src/main/java/datadog/trace/bootstrap/instrumentation/api/AgentScope.java | Makes AgentScope.Continuation also implement ContextContinuation and maps methods (resume→activate, release→cancel). |
| dd-trace-core/src/main/java/datadog/trace/core/scopemanager/ContinuableScopeManager.java | Adds ContextManager.capture(Context) implementation and an addListener(...) stub. |
| dd-trace-core/src/test/java/datadog/trace/core/scopemanager/ScopeManagerTest.java | Adds tests verifying Context API capture/continuation behavior holds/unblocks traces and activates spans. |
| components/context/src/main/java/datadog/context/Context.java | Adds Context.capture() default method. |
| components/context/src/main/java/datadog/context/ContextManager.java | Adds capture + listener APIs and a static register(ContextListener) helper. |
| components/context/src/main/java/datadog/context/ContextContinuation.java | New public API for capturing/resuming context across execution units. |
| components/context/src/main/java/datadog/context/ContextListener.java | New public listener API for attach/detach/capture/release events. |
| components/context/src/main/java/datadog/context/ThreadLocalContextManager.java | Implements capture/resume/release semantics and listener notification for the default manager. |
| components/context/src/main/java/datadog/context/NoopContextScope.java | New cached noop ContextScope implementation for same-context attach/resume no-ops. |
| components/context/src/main/java/datadog/context/EmptyContextContinuation.java | New noop continuation for root/empty context. |
| components/context/src/main/java/datadog/context/EmptyContext.java | Makes EmptyContext constructor private (enforces singleton). |
| components/context/src/main/java/datadog/context/TestContextManager.java | Delegates new capture and addListener APIs to the underlying manager. |
| components/context/src/test/java/datadog/context/ContextContinuationTest.java | New comprehensive tests for continuation lifecycle, concurrency, holds, and event ordering. |
| components/context/src/test/java/datadog/context/ContextManagerTest.java | Adds tests for noop-scope caching/collisions and listener behaviors. |
| components/context/src/test/java/datadog/context/ContextProvidersForkedTest.java | Updates custom manager stub to implement new APIs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
🟢 Java Benchmark SLOs — All performance SLOs passed
PR vs. master results
Commit: Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion. |
There was a problem hiding this comment.
Early review of the API with a lot of refinement comments - trying to nail the API right before merging it. The implementation feels quite close to the current one so far. It's also nice to see the AgentSpan.Continuation nicely retrofitted to the new ContextContinuation.
Glad to see this piece of context coming! 🤝
396e8a0 to
c8de592
Compare
|
By the way, my comments only are early feedback. I don't expect to have them to get answer in timely manner. |
2754977 to
d40fa11
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d40fa11a82
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
d40fa11 to
c33dc26
Compare
This is meant as a stepping stone from the old API towards a simpler foundation
c33dc26 to
ba13f41
Compare
dougqh
left a comment
There was a problem hiding this comment.
I appreciate the work to move away from the old OpenTracing model towards OpenTelemetry.
I do think OpenTelemetry is genuinely a better model. ContextManagers never quite solved the hybrid async environments that automatic instrumentation encounters.
As we do the move, I do want to take care that we preserve the safety measures that were added previously.
Those were...
1 - stack depth limit
2 - unfinished continuations are tolerated
3 - scopes were allowed to be closed across threads
4 - out-of-order scope close was tolerated
In the past, I know that our instrumentations contributed to those problems, but I think we still need to be careful about these things because of outside / manual instrumentation. There have been issues in those areas as recently as last year.
Based on an initial review, it looks like we’ve lost some of these guards. I’m not sure that we need all of them, but I do think we need a depth limit and tolerance for unfinished continuations.
I'll need some time to review this more comprehensively, but I wanted to share that concern up front.
amarziali
left a comment
There was a problem hiding this comment.
Thanks. Looks good to me
Previously we needed our own limit because scopes always escaped onto the heap via our separate scope stack. Now scopes naturally reside on the thread's own call stack. Given this I honestly believe we don't need the overhead of enforcing a separate stack limit and can rely on the JVM's own limits.
The new code uses the same basic continuation algorithm, so should tolerate the same scenarios.
This came with caveats - the first thing the old The old code did update the scope's internal book-keeping regardless which thread closed it - but in practice this doesn't solve the issue of stale scopes left in the scope stack until the original thread happens to trigger cleanup. This is not an issue when we store scopes on the actual call stack, we naturally get cleanup of them as the call stack unwinds. The new scopes have a minimum amount of state / book-keeping - basically just a flag to record when a close attempt was successful. This means the scope tolerates accidental extra calls to close it, firing only when the expected state matches. A feature we now get which wasn't previously available is when we get to a known safe point, we can potentially 'swap' in (i.e. reset to) the expected context. This is possible because we don't have a separate scope stack which may contain a mix of stale and valid scopes. We just have the natural call stack and a notion of the currently attached context.
We have tests that specifically cover out-of-order close. We do stop an out-of-order close from changing the currently attached context if it doesn't match what we expected, but that's intentional. We'll likely introduce a special mode to track out-of-order close calls back to the originating method/instrumentation like in OTel (such features are usually off by default for performance reasons.) There are still situations where a misbehaving instrumentation could leave a context attached - this is the equivalent of an instrumentation leaving an unclosed scope on the stack in the old code, which wasn't recoverable. But now we have the ability to safely reset to the right context (at known safe points) which we don't have with the old approach. If there is a leak then the next time that thread handles a request, a new context will attach and overwrite any stale context - from that point on (assuming the instrumentations for that request behave) the context will properly attach and detach. Only once the request unwinds completely will the stale context "re-appear". This also means that unlike the old approach we won't continually accumulate stale contexts as we don't have the separate scope stack. We'd only ever have potentially one stale context lingering per-thread, and that stale context would not impact any request handling on top of it. Reminder that this code will no doubt evolve as we migrate away from the old approach. But fundamentally the old scope stack will be going away, as it limits how we can model more interesting async primitives. This also gives us a cleaner/simpler foundation to build on - it's not the destination, but a stepping stone. |
PerfectSlayer
left a comment
There was a problem hiding this comment.
Looking great! 👍
Not much important feedback, mostly about testing and mostly polish about coverage and being explicit.
I like having the API interop tested (hopefully we wont have to maintain it long).
| Context context = root().with(STRING_KEY, "value"); | ||
| try (ContextScope outer = context.attach()) { | ||
| // two consecutive noop scopes for the same context should be the same cached instance | ||
| ContextScope noop1 = context.attach(); |
There was a problem hiding this comment.
🎯 suggestion: What about using try-with-resource for noop1 too? Overall, the 3 scopes (outer, noop1, noop2) can go within the same try-with-resource.
| import java.util.concurrent.Future; | ||
| import org.junit.jupiter.api.Test; | ||
|
|
||
| class ContextContinuationTest extends ContextTestBase { |
There was a problem hiding this comment.
💭 thought: What do you think about testing multiple hold calls or hold after release? According to implementation, it should not do anything but can those be interesting cases to cover? 🤷
| void testListenersNotNotifiedForRootContext() { | ||
| List<String> events = new ArrayList<>(); | ||
| ContextManager.register(trackingListener(events)); | ||
| root().swap(); // current is already root, no events |
There was a problem hiding this comment.
❔ question: Should we test for Context.root().attach() too? Not only swap
| ContextContinuation[] ref = {null}; | ||
| assertDoesNotThrow(() -> ref[0] = context.capture()); | ||
| assertNotNull(ref[0]); | ||
| assertEquals(context, ref[0].context()); | ||
| ref[0].release(); |
There was a problem hiding this comment.
🎯 suggestion: Alternative about readabliity:
| ContextContinuation[] ref = {null}; | |
| assertDoesNotThrow(() -> ref[0] = context.capture()); | |
| assertNotNull(ref[0]); | |
| assertEquals(context, ref[0].context()); | |
| ref[0].release(); | |
| assertDoesNotThrow(() -> { | |
| ContextContinuation continuation = context.capture(); | |
| assertEquals(context, continuation.context()); | |
| continuation.release(); | |
| }, "Context.capture() should not throw"); |
| // context with no span uses NoopAgentTraceCollector — should not crash | ||
| ContextContinuation continuation = Context.current().capture(); | ||
| assertNotNull(continuation); | ||
| assertEquals(ctx, continuation.context()); | ||
| continuation.release(); // no-op on NoopAgentTraceCollector |
There was a problem hiding this comment.
💭 thought: I feel it a bit confusing. It’s not really about the 2 assert (not null + the right context), it’s about having it to run.
If so, what about using an explicit assertNotThrow with a comment why it should not throw (about the noop agent trace collector still)
There was a problem hiding this comment.
yes, this can be cleaned up
What Does This Do
Introduces basic continuations and events to the Context API
Motivation
This is meant as a stepping stone from the old API towards a simpler foundation
Additional Notes
Benchmarking summary from Claude:
ThreadLocalContextManager: nested attach() calls add zero bytes because theContextScopeImplobjects are stack-allocated by escape analysis.ContinuableScopeManager: eachContinuableScopeobject adds ≈32 B because they escape to the heap via theScopeStacklinked structure.Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issue/merge. You can also:/merge --commit-message "..."/merge -c/merge -f --reason "reason"; please use this judiciously, as some checks do not run at the PR-level (note: the PR still needs to be mergeable, this will only skip the pre-merge build)Jira ticket: [PROJ-IDENT]