PLEASE IGNORE: unsupervised AI pr — Repair test-report tooling by crusaderky · Pull Request #9321 · dask/distributed

crusaderky · 2026-07-03T14:18:17Z

Warning

This PR was written autonomously by an AI agent and has not been reviewed by a human yet.

The Test Report workflow (runs) has been failing ever since the CI migration to pixi (#9276) changed the job and artifact naming of tests.yaml; the conda environment it relies on (python 3.9 + unmaintained altair_saver) no longer solves either.

Changes

- New segregated test-report pixi environment (no dependencies spilled into other environments). Usage, both locally and in CI:
  ```
  pixi run test-report dask/distributed
  pixi run test-report dask/dask
  ```
  This downloads the JUnit XML artifacts from the Tests runs of the target repo (caching them into local shelve databases) and generates test_report.html (last 90 days) and test_short_report.html (last 7 days). The old test-report-environment.yml conda file is deleted.
- Adapted test_report.py parsing to the post-pixi job names (ubuntu-latest py310 test-ci not ci1) and artifact names (ubuntu-latest-py310-test-ci-notci1) of both dask/distributed and dask/dask. Workflow runs older than 2026-06-04 (the pixi migration) cannot be parsed and are skipped via a hardcoded cutoff; no backwards compatibility.
- Local databases are named after the repo (e.g. test_report.dask__distributed.db) so reports for multiple repos can be generated from the same checkout without clobbering each other.
- gh CLI fallback: when GITHUB_TOKEN is not set, the token is read from gh auth token.
- Replaced unmaintained altair_saver with altair ≥6 native HTML output.
- Fixed a regression from the f-string sweep in #9245 (Use f-strings): f"{df.file}.{df.test}" stringified the whole pandas Series, collapsing every test into a single giant chart.
- Workflow changes:
  - Everything runs through pixi; nothing is installed by hand.
  - On PRs, the workflow only runs when the test-report label is set, and it uploads the reports + databases as workflow artifacts instead of deploying to GitHub Pages. Scheduled/manual runs on the main repo deploy to GitHub Pages as before.
  - Fixed actions/cache usage: the previous static key made the cache immutable after the first save, so it never picked up new runs. Now every run saves a fresh cache and restores from the most recent one.
  - Skip in-progress runs and expired artifacts so they don't poison the cache.
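The f-string regression listed above is easy to reproduce in isolation. The following is a minimal sketch, assuming a DataFrame with string columns `file` and `test` as in the quoted expression. The sample rows and the names `broken_id` and `test_ids` are illustrative and not taken from test_report.py, and the elementwise concatenation is one possible fix, not necessarily the one applied in this PR.

```python
import pandas as pd

# Illustrative data only; the real frame is built from JUnit XML artifacts.
df = pd.DataFrame(
    {
        "file": ["test_foo", "test_bar"],
        "test": ["test_one", "test_two"],
    }
)

# Broken: the f-string calls str() on each whole Series, so the result is
# a single Python string instead of one identifier per row. If such a
# string is assigned to a column, pandas broadcasts it to every row.
broken_id = f"{df.file}.{df.test}"

# One possible fix: concatenate the two columns elementwise.
test_ids = df["file"] + "." + df["test"]
```

With the broken form every row ends up with the same identifier, which is consistent with all tests being grouped into one chart.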

Verified locally

Both commands above were run locally from a clean cache; reports were inspected:

dask/distributed: 39 flaky-test charts over the last ~2 weeks (30 runs), including the known offenders (test_RetireWorker_stress, test_chaos_rechunk, test_failure_during_worker_initialization).
dask/dask: 2 flaky-test charts (test_warn_bad_rechunking, test_bind[False], both py314t), cross-checked against the actual failed Tests runs on main — the only other job failures were setup-pixi infra failures, which correctly don't appear as test failures.
No cross-contamination between the two repos' reports.

Known limitation

The artifact databases grow unboundedly over time (~25 MB/day pre-compression for dask/distributed); entries for runs that fall out of the 90-day window are never pruned. Since GitHub evicts caches LRU at 10 GB per repo this is tolerable, but a pruning pass would be a good follow-up.
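For reference, the suggested pruning pass could look roughly like the sketch below. It is not part of this PR. It assumes, hypothetically, that each database value is a dict with an ISO 8601 `created_at` timestamp that carries a UTC offset; the actual record layout in test_report.py may differ. The function name `prune_old_entries` is made up for illustration.

```python
from collections.abc import MutableMapping
from datetime import datetime, timedelta


def prune_old_entries(
    db: MutableMapping, now: datetime, max_age_days: int = 90
) -> int:
    """Delete entries older than max_age_days; return how many were removed.

    Hypothetical layout: db[key] is a dict with an ISO 8601 "created_at"
    string that includes a UTC offset, e.g. "2026-07-03T14:18:17+00:00".
    """
    cutoff = now - timedelta(days=max_age_days)
    # Collect the keys first so the mapping is not mutated while iterating.
    stale = [
        key
        for key, entry in db.items()
        if datetime.fromisoformat(entry["created_at"]) < cutoff
    ]
    for key in stale:
        del db[key]
    return len(stale)
```

Because a `shelve.Shelf` is a `MutableMapping`, the same function could be called inside `with shelve.open(path) as db:`. Depending on the dbm backend, deleting keys may not shrink the file on disk, so a real implementation might need to rebuild the database as well.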

A follow-up PR in dask/dask migrates its test-report workflow to use this same tooling.

🤖 Generated with Claude Code

The Test Report workflow has been broken since the CI migration to pixi (dask#9276):

- Run the report generation through a new, self-contained `test-report` pixi environment: `pixi run test-report dask/distributed`. Delete the unused conda environment file.
- Adapt job/artifact name parsing in test_report.py to the names produced by the post-pixi tests.yaml, in both dask/distributed and dask/dask. Runs older than 2026-06-04 cannot be parsed and are skipped.
- Replace the unmaintained altair_saver with altair>=6 native HTML output.
- Fix regression introduced by the f-string conversion in dask#9245 which collapsed all tests into a single unreadable chart.
- Name the local artifact caches after the repo (e.g. test_report_dask__distributed) so that reports for multiple repos can be generated from the same working directory.
- Fall back to `gh auth token` when GITHUB_TOKEN is not set.
- On PRs, run the workflow only when the `test-report` label is set; upload the reports and databases as workflow artifacts instead of deploying to GitHub Pages.
- Fix actions/cache usage so that the database cache is actually updated on every run (cache keys are immutable).
- Skip in-progress and expired runs/artifacts to avoid poisoning the cache.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
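The run and artifact filtering described in this commit can be sketched as below. This is an illustration under assumptions, not the code in test_report.py: the function names are hypothetical, and the dicts are assumed to be the JSON objects returned by the GitHub REST API, where a workflow run has `status` and `created_at` fields and an artifact has an `expired` flag. The cutoff date is the one stated in the commit message.

```python
from datetime import datetime, timezone

# Cutoff from the commit message: runs before the pixi migration
# cannot be parsed.
PIXI_MIGRATION = datetime(2026, 6, 4, tzinfo=timezone.utc)


def parse_timestamp(value: str) -> datetime:
    """Parse a GitHub API timestamp such as '2026-07-03T14:18:17Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_usable_run(run: dict) -> bool:
    """Keep only completed runs created on or after the cutoff."""
    if run["status"] != "completed":
        # Queued or in-progress runs have partial results; caching them
        # would hide the final outcome from later report runs.
        return False
    return parse_timestamp(run["created_at"]) >= PIXI_MIGRATION


def is_usable_artifact(artifact: dict) -> bool:
    """Expired artifacts can no longer be downloaded, so skip them."""
    return not artifact["expired"]
```

Filtering before anything is written to the local database keeps incomplete or unavailable results out of the cache in the first place.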

The explicit permissions block removes the default actions:read scope, which is needed to list workflow runs and download artifacts. The PR run passed regardless because fork PR tokens are read-only on all scopes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
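To make the scope requirement concrete, here is a minimal sketch of the kind of authenticated request that lists workflow runs, combined with the `gh auth token` fallback from the PR description. The helper names `get_github_token` and `list_runs_request` are hypothetical and the real test_report.py may be organized differently. The request is only built here, not sent.

```python
import os
import subprocess
import urllib.request


def get_github_token() -> str:
    """Return GITHUB_TOKEN if set, otherwise ask the gh CLI for its token."""
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        return token
    # `gh auth token` prints the token of the logged-in account to stdout.
    # check=True raises if gh fails, e.g. when nobody is logged in.
    result = subprocess.run(
        ["gh", "auth", "token"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def list_runs_request(repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the request listing a repo's workflow runs."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{repo}/actions/runs",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

In CI the token comes from the workflow's GITHUB_TOKEN, so the scopes granted by the permissions block decide whether a request like this succeeds.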

github-actions · 2026-07-03T15:15:53Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

40 files ±0, 40 suites ±0, 14h 47m 26s ⏱️ +59s

| | total | ✅ passed | 💤 skipped | ❌ failed |
|---|---|---|---|---|
| tests | 4 156 ±0 | 3 978 +1 | 178 ±0 | 0 -1 |
| runs | 80 880 -1 | 76 643 -1 | 4 237 +1 | 0 -1 |

Results for commit db38c64. ± Comparison against base commit 9e706be.

♻️ This comment has been updated with latest results.

Each job gets strictly the permissions it needs: generate gets actions:read (plus contents:read for checkout), deploy gets contents:write. The report is handed over between the jobs as a workflow artifact, which also becomes always available for debugging; the databases are only uploaded on PRs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


crusaderky added the test-report label (Run the test-report workflow on this PR) Jul 3, 2026

crusaderky mentioned this pull request Jul 3, 2026

PLEASE IGNORE: unsupervised AI pr — Repair test-report workflow dask/dask#12487

Draft

crusaderky and others added 4 commits July 3, 2026 16:23

Drop redundant contents:read

ad5fbf3

Public repositories are implicitly readable by the workflow token even when the permissions block sets contents to none.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

review

e189713

Add .db extension to database file names

db38c64

e.g. test_report.dask__distributed.db. This simplifies .gitignore and the workflow glob patterns, which previously had to avoid matching test_report.html.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
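A rough sketch of the naming scheme, for illustration only: the function `database_path` and the glob `DB_GLOB` are assumptions, since the commit message only gives the example file name. It shows how a single pattern can select the databases without matching the HTML report.

```python
from fnmatch import fnmatch

# Hypothetical glob for .gitignore or workflow patterns; the patterns
# actually used in the repository are not shown in this PR page.
DB_GLOB = "test_report.*.db"


def database_path(repo: str, stem: str = "test_report") -> str:
    """Map 'owner/name' to a per-repo database file name."""
    owner, name = repo.split("/")
    return f"{stem}.{owner}__{name}.db"


def is_database_file(filename: str) -> bool:
    """True for per-repo databases, False for the generated reports."""
    return fnmatch(filename, DB_GLOB)
```

For example, `database_path("dask/distributed")` yields the file name quoted in the commit message, and `is_database_file("test_report.html")` is false because the name does not end in `.db`.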

crusaderky wants to merge 6 commits into dask:main from crusaderky:test-report
