Skip to content

Add TRUNK_API_CLIENT_RETRY_DEADLINE_SECS to bound API retry time#1131

Merged
trunk-io[bot] merged 1 commit into
mainfrom
dylan/retry-deadline-and-signal-handling
Jun 30, 2026
Merged

Add TRUNK_API_CLIENT_RETRY_DEADLINE_SECS to bound API retry time#1131
trunk-io[bot] merged 1 commit into
mainfrom
dylan/retry-deadline-and-signal-handling

Conversation

@dfrankland

@dfrankland dfrankland commented Jun 29, 2026

Copy link
Copy Markdown
Member

Problem

The API client's retry loop (api/src/call_api.rs) only honored a retry count (TRUNK_API_CLIENT_RETRY_COUNT). Total wall-clock time was otherwise uncapped — per-request timeouts plus up to ~62s of exponential backoff (2/4/8/16/32s) across retries — so a slow/failing endpoint could spin for minutes (this is what surfaced the repeated …is taking longer than expected messages customers reported).

Change

  • New optional env var TRUNK_API_CLIENT_RETRY_DEADLINE_SECS: an overall wall-clock budget for a call's retry loop. The loop gives up once elapsed + next_backoff >= deadline, returning the most recent error. Pairs with TRUNK_API_CLIENT_RETRY_COUNT to tune how fast/often the CLI gives up.
  • Replaces tokio-retry's RetryIf with an explicit loop (behavior-identical when the deadline is unset) so the deadline can short-circuit while still returning a real error — the generic A::Error includes reqwest::Error, which has no public constructor, so a timeout()/select! wrapper couldn't synthesize one. We still use tokio-retry's ExponentialBackoff + Action.
  • Read the env vars once via LazyLock (DEFAULT_DELAY, RETRY_DEADLINE).
  • Captured TRUNK_API_CLIENT_RETRY_DEADLINE_SECS in TRUNK_ENVS_TO_CAPTURE for bundle-metadata debugging.

Notes

  • The deadline is checked between attempts, so a single in-flight request can overshoot by up to its own reqwest per-request timeout (30s API / 1s telemetry); each attempt is already bounded by that timeout. Immediate mid-request abort on CI cancellation is handled separately in Abort the CLI immediately on SIGTERM/SIGINT (CI cancellation) #1132 (signal handling).
  • Customer mitigations enabled by this: TRUNK_API_CLIENT_RETRY_COUNT=0 and/or a small TRUNK_API_CLIENT_RETRY_DEADLINE_SECS.

Testing

  • Added two deterministic unit tests (paused clock, explicit deadlines): stops_retrying_once_deadline_would_be_exceeded and retries_full_count_when_no_deadline_set.
  • All 9 call_api tests pass; existing retry-count/backoff/should_retry tests unchanged.

🤖 Generated with Claude Code

@trunk-io

trunk-io Bot commented Jun 29, 2026

Copy link
Copy Markdown

😎 Merged successfully - details.

@trunk-staging-io

trunk-staging-io Bot commented Jun 29, 2026

Copy link
Copy Markdown

Static BadgeStatic BadgeStatic BadgeStatic Badge

Failed Test Failure Summary Logs
variant_quarantine_test should be quarantined when run with variant A test expected the sum of 2 + 2 to be 5, but it was 4, indicating a failing assertion. Logs ↗︎

View Full Report ↗︎Docs

@codecov-commenter

codecov-commenter commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.79%. Comparing base (83033bf) to head (857a7eb).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1131      +/-   ##
==========================================
+ Coverage   82.48%   82.79%   +0.30%     
==========================================
  Files          69       69              
  Lines       15482    15531      +49     
==========================================
+ Hits        12771    12859      +88     
+ Misses       2711     2672      -39     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The API client's retry loop only honored a retry *count*
(TRUNK_API_CLIENT_RETRY_COUNT); the total wall-clock time was otherwise
unbounded (per-request timeouts + up to ~62s of exponential backoff
across retries), so slow/failing calls could run for minutes.

Add an optional overall deadline, read from
TRUNK_API_CLIENT_RETRY_DEADLINE_SECS. The retry loop now gives up once
the elapsed time plus the next backoff would exceed the budget,
returning the most recent error. Pairs with TRUNK_API_CLIENT_RETRY_COUNT
to tune how fast and how often the CLI gives up.

Replace tokio-retry's RetryIf with an explicit loop so the deadline can
short-circuit without needing to construct an error from nothing (the
generic A::Error includes reqwest::Error, which has no public
constructor). Behavior is identical to before when the deadline is
unset. Capture the new env var in bundle metadata for debugging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dfrankland dfrankland force-pushed the dylan/retry-deadline-and-signal-handling branch from f914224 to bcb832c Compare June 29, 2026 23:03
@trunk-io

trunk-io Bot commented Jun 29, 2026

Copy link
Copy Markdown

Static BadgeStatic BadgeStatic BadgeStatic Badge

View Full Report ↗︎Docs

@dfrankland dfrankland force-pushed the dylan/retry-deadline-and-signal-handling branch 2 times, most recently from 59dad8c to 857a7eb Compare June 29, 2026 23:33
@dfrankland dfrankland changed the title Bound API retry time and abort on CI cancellation Add TRUNK_API_CLIENT_RETRY_DEADLINE_SECS to bound API retry time Jun 29, 2026
@trunk-io trunk-io Bot merged commit e617a6b into main Jun 30, 2026
27 checks passed

@TylerJang27 TylerJang27 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants