
PLEASE IGNORE: unsupervised AI pr — Repair test-report tooling#9321

Draft
crusaderky wants to merge 6 commits into dask:main from crusaderky:test-report

crusaderky (Collaborator) commented Jul 3, 2026

Warning

This PR was written autonomously by an AI agent and has not been reviewed by a human yet.

The Test Report workflow (runs) has been failing ever since the CI migration to pixi (#9276) changed the job and artifact names in tests.yaml. In addition, the conda environment it relied on (Python 3.9 plus the unmaintained altair_saver) no longer solves.

Changes

  • New, isolated test-report pixi environment (none of its dependencies leak into other environments). Usage, both locally and in CI:
    pixi run test-report dask/distributed
    pixi run test-report dask/dask
    This downloads the JUnit XML artifacts from the Tests runs of the target repo (caching them into local shelve databases) and generates test_report.html (last 90 days) and test_short_report.html (last 7 days). The old test-report-environment.yml conda file is deleted.
  • Adapted test_report.py parsing to the post-pixi job names (e.g. ubuntu-latest py310 test-ci not ci1) and artifact names (e.g. ubuntu-latest-py310-test-ci-notci1) of both dask/distributed and dask/dask. Workflow runs older than 2026-06-04 (the pixi migration) cannot be parsed and are skipped via a hardcoded cutoff; there is no backwards compatibility with the old names.
  • Local databases are named after the repo (e.g. test_report.dask__distributed.db) so reports for multiple repos can be generated from the same checkout without clobbering each other.
  • gh CLI fallback: when GITHUB_TOKEN is not set, the token is read from gh auth token.
  • Replaced unmaintained altair_saver with altair ≥6 native HTML output.
  • Fixed a regression from the f-string sweep in #9245 (Use f-strings): f"{df.file}.{df.test}" stringified the whole pandas Series, collapsing every test into a single giant chart.
  • Workflow changes:
    • Everything runs through pixi; nothing is installed by hand.
    • On PRs, the workflow only runs when the test-report label is set, and it uploads the reports + databases as workflow artifacts instead of deploying to GitHub Pages. Scheduled/manual runs on the main repo deploy to GitHub Pages as before.
    • Fixed actions/cache usage: the previous static key made the cache immutable after the first save, so it never picked up new runs. Now every run saves a fresh cache and restores from the most recent one.
    • Skip in-progress runs and expired artifacts so they don't poison the cache.
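
For reviewers unfamiliar with the f-string regression listed above, here is a minimal sketch of the failure mode. The two-row DataFrame and the `test_id` column name are made up for illustration; the actual code in test_report.py may look different.

```python
import pandas as pd

# Hypothetical two-test frame; the real one is built from the JUnit XML artifacts.
df = pd.DataFrame(
    {"file": ["test_worker", "test_scheduler"], "test": ["test_a", "test_b"]}
)

# Buggy: the f-string calls str() on each whole Series, yielding ONE Python
# string (the printed form of both columns), not one string per row.
buggy = f"{df.file}.{df.test}"

# Assigning a scalar broadcasts it, so every row gets the same identifier.
collapsed = df.assign(test_id=buggy)

# Fixed: element-wise concatenation keeps one identifier per row.
fixed = df.assign(test_id=df.file + "." + df.test)
```

Grouping by `test_id` yields a single group in the buggy case, which is consistent with the single giant chart described above.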

Verified locally

Both commands above were run locally from a clean cache; reports were inspected:

  • dask/distributed: 39 flaky-test charts over the last ~2 weeks (30 runs), including the known offenders (test_RetireWorker_stress, test_chaos_rechunk, test_failure_during_worker_initialization).
  • dask/dask: 2 flaky-test charts (test_warn_bad_rechunking, test_bind[False], both py314t), cross-checked against the actual failed Tests runs on main — the only other job failures were setup-pixi infra failures, which correctly don't appear as test failures.
  • No cross-contamination between the two repos' reports.
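
For local runs like the ones above, the token lookup described under Changes can be sketched roughly as follows. This is an illustration only: the function name and the `environ` and `run` parameters (added to make the sketch testable) are assumptions and not necessarily what test_report.py does.

```python
import os
import subprocess


def get_github_token(environ=None, run=subprocess.run):
    """Return GITHUB_TOKEN if set, otherwise ask the gh CLI for its token."""
    environ = os.environ if environ is None else environ
    token = environ.get("GITHUB_TOKEN")
    if token:
        return token
    # `gh auth token` prints the token of the logged-in account to stdout.
    result = run(
        ["gh", "auth", "token"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

With `check=True`, a missing login surfaces as a `subprocess.CalledProcessError` rather than as an empty token.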

Known limitation

The artifact databases grow without bound over time (~25 MB/day pre-compression for dask/distributed); entries for runs that fall out of the 90-day window are never pruned. Since GitHub evicts least-recently-used caches once a repository exceeds 10 GB, this is tolerable, but a pruning pass would be a good follow-up.
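
A possible shape for such a pruning pass, as a sketch only. It assumes, hypothetically, that every shelve value is a dict with a timezone-aware `created_at` datetime; the real database layout may differ, and `prune` is not part of this PR.

```python
import shelve
from datetime import datetime, timedelta, timezone


def prune(db_path, max_age_days=90, now=None):
    """Delete entries older than max_age_days; return how many were removed.

    Assumption: each value is a dict holding a timezone-aware "created_at".
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    with shelve.open(db_path) as db:
        # Collect the keys first, then delete, to avoid mutating while iterating.
        stale = [key for key, value in db.items() if value["created_at"] < cutoff]
        for key in stale:
            del db[key]
    return len(stale)
```

Deleting keys does not necessarily shrink the file on disk; depending on the dbm backend, the database may have to be rewritten to reclaim the space.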

A follow-up PR in dask/dask migrates its test-report workflow to use this same tooling.

🤖 Generated with Claude Code

The Test Report workflow has been broken since the CI migration to pixi (dask#9276):

- Run the report generation through a new, self-contained `test-report` pixi
  environment: `pixi run test-report dask/distributed`.
  Delete the unused conda environment file.
- Adapt job/artifact name parsing in test_report.py to the names produced by
  the post-pixi tests.yaml, in both dask/distributed and dask/dask. Runs older
  than 2026-06-04 cannot be parsed and are skipped.
- Replace the unmaintained altair_saver with altair>=6 native HTML output.
- Fix regression introduced by the f-string conversion in dask#9245 which
  collapsed all tests into a single unreadable chart.
- Name the local artifact caches after the repo (e.g.
  test_report_dask__distributed) so that reports for multiple repos can be
  generated from the same working directory.
- Fall back to `gh auth token` when GITHUB_TOKEN is not set.
- On PRs, run the workflow only when the `test-report` label is set; upload
  the reports and databases as workflow artifacts instead of deploying to
  GitHub Pages.
- Fix actions/cache usage so that the database cache is actually updated on
  every run (cache keys are immutable).
- Skip in-progress and expired runs/artifacts to avoid poisoning the cache.
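
To illustrate the name parsing and skipping rules listed above, here is a minimal sketch. The regex, the function names, the midnight-UTC cutoff, and the assumption that the Python tag always looks like `py310` or `py314t` are guesses based on the single artifact name quoted in the PR description (ubuntu-latest-py310-test-ci-notci1), not the code in test_report.py. `status`, `created_at`, `name` and `expired` are fields of the run and artifact objects returned by the GitHub REST API.

```python
import re
from datetime import datetime, timezone

# Cutoff from this PR: runs before the pixi migration cannot be parsed.
# Midnight UTC is an assumption of this sketch.
PIXI_MIGRATION = datetime(2026, 6, 4, tzinfo=timezone.utc)

# Assumed shape: <os>-<python tag>-<remainder>,
# e.g. ubuntu-latest-py310-test-ci-notci1
ARTIFACT_RE = re.compile(r"^(?P<os>.+?)-(?P<python>py\d+t?)-(?P<task>.+)$")


def parse_artifact_name(name):
    """Return the os/python/task parts, or None if the name does not fit."""
    match = ARTIFACT_RE.match(name)
    return match.groupdict() if match else None


def is_usable_run(run):
    """Keep only completed runs created on or after the migration."""
    created = datetime.fromisoformat(run["created_at"].replace("Z", "+00:00"))
    return run["status"] == "completed" and created >= PIXI_MIGRATION


def is_usable_artifact(artifact):
    """Skip artifacts that have expired or whose name is not recognised."""
    return (
        not artifact["expired"]
        and parse_artifact_name(artifact["name"]) is not None
    )
```

Returning None for unrecognised names, instead of raising, lets the caller skip unrelated artifacts without aborting the whole report.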

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
crusaderky added the test-report label (Run the test-report workflow on this PR) Jul 3, 2026
The explicit permissions block removes the default actions:read scope, which
is needed to list workflow runs and download artifacts. The PR run passed
regardless because fork PR tokens are read-only on all scopes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions (Bot, Contributor) commented Jul 3, 2026

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    40 files  ±0      40 suites  ±0   14h 47m 26s ⏱️ +59s
 4 156 tests ±0   3 978 ✅ +1    178 💤 ±0  0 ❌  - 1 
80 880 runs   - 1  76 643 ✅  - 1  4 237 💤 +1  0 ❌  - 1 

Results for commit db38c64. ± Comparison against base commit 9e706be.

♻️ This comment has been updated with latest results.

crusaderky and others added 4 commits July 3, 2026 16:23
Each job gets strictly the permissions it needs: generate gets actions:read
(plus contents:read for checkout), deploy gets contents:write. The report is
handed over between the jobs as a workflow artifact, which also becomes
always available for debugging; the databases are only uploaded on PRs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Public repositories are implicitly readable by the workflow token even when
the permissions block sets contents to none.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
e.g. test_report.dask__distributed.db. This simplifies .gitignore and the
workflow glob patterns, which previously had to avoid matching
test_report.html.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Labels

test-report Run the test-report workflow on this PR

1 participant