PLEASE IGNORE: unsupervised AI pr - Repair test-report tooling #9321
Draft
crusaderky wants to merge 6 commits into
The Test Report workflow has been broken since the CI migration to pixi (dask#9276):

- Run the report generation through a new, self-contained `test-report` pixi environment: `pixi run test-report dask/distributed`. Delete the unused conda environment file.
- Adapt job/artifact name parsing in test_report.py to the names produced by the post-pixi tests.yaml, in both dask/distributed and dask/dask. Runs older than 2026-06-04 cannot be parsed and are skipped.
- Replace the unmaintained altair_saver with altair>=6 native HTML output.
- Fix a regression introduced by the f-string conversion in dask#9245, which collapsed all tests into a single unreadable chart.
- Name the local artifact caches after the repo (e.g. test_report_dask__distributed) so that reports for multiple repos can be generated from the same working directory.
- Fall back to `gh auth token` when GITHUB_TOKEN is not set.
- On PRs, run the workflow only when the `test-report` label is set; upload the reports and databases as workflow artifacts instead of deploying to GitHub Pages.
- Fix actions/cache usage so that the database cache is actually updated on every run (cache keys are immutable).
- Skip in-progress and expired runs/artifacts to avoid poisoning the cache.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
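To make the f-string regression concrete, here is a minimal pandas sketch. It assumes, as the `df.file`/`df.test` attribute access suggests, that `df` is a DataFrame with `file` and `test` columns; the data is made up, and the element-wise concatenation shown is one possible fix, not necessarily the exact code in this PR.

```python
import pandas as pd

# Toy data: hypothetical test names, not taken from a real report
df = pd.DataFrame(
    {
        "file": ["test_worker.py", "test_scheduler.py"],
        "test": ["test_a", "test_b"],
    }
)

# Regression: each placeholder is filled with str() of the ENTIRE Series,
# so the result is one multi-line string that contains every row.
broken = f"{df.file}.{df.test}"

# Element-wise alternative: vectorised string concatenation, one name per row.
fixed = df.file + "." + df.test

# Assigning a scalar string broadcasts it to every row, so anything grouped
# by this column (e.g. one chart per test) ends up with a single group.
print(df.assign(name=broken)["name"].nunique())  # 1
print(df.assign(name=fixed)["name"].nunique())  # 2
```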
The explicit permissions block removes the default actions:read scope, which is needed to list workflow runs and download artifacts. The PR run passed regardless because fork PR tokens are read-only on all scopes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
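For reference, the token lookup described in the first commit (`GITHUB_TOKEN`, falling back to `gh auth token`) can be sketched as follows. The helper name `get_github_token` and its error handling are hypothetical; the real `test_report.py` may structure this differently.

```python
import os
import shutil
import subprocess


def get_github_token() -> str:
    """Return GITHUB_TOKEN if set, else the token printed by `gh auth token`.

    Hypothetical helper, for illustration only.
    """
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        return token
    # Fail with a clear message instead of a FileNotFoundError from subprocess
    if shutil.which("gh") is None:
        raise RuntimeError("GITHUB_TOKEN is not set and the gh CLI is not installed")
    # `gh auth token` prints the token of the currently authenticated account
    result = subprocess.run(
        ["gh", "auth", "token"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

In CI the environment variable takes precedence, so the `gh` CLI is only needed for local runs.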
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

40 files ±0  40 suites ±0  14h 47m 26s ⏱️ +59s

Results for commit db38c64. ± Comparison against base commit 9e706be.

♻️ This comment has been updated with latest results.
Each job gets strictly the permissions it needs: generate gets actions:read (plus contents:read for checkout), deploy gets contents:write. The report is handed over between the jobs as a workflow artifact, which also becomes always available for debugging; the databases are only uploaded on PRs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Public repositories are implicitly readable by the workflow token even when the permissions block sets contents to none. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
e.g. test_report.dask__distributed.db. This simplifies .gitignore and the workflow glob patterns, which previously had to avoid matching test_report.html. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
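A sketch of the per-repo naming combined with a `shelve`-backed cache. It assumes, hypothetically, that entries are keyed by workflow run ID and that `fetch` downloads the data on a cache miss; `db_filename` and `get_cached` are illustrative names, not this PR's API.

```python
import shelve
from collections.abc import Callable


def db_filename(repo: str) -> str:
    """Map 'owner/name' to a per-repo database name.

    'dask/distributed' -> 'test_report.dask__distributed.db'
    """
    return f"test_report.{repo.replace('/', '__')}.db"


def get_cached(db_path: str, key: str, fetch: Callable[[], object]) -> object:
    """Return db[key], calling fetch() and storing its result only on a miss."""
    with shelve.open(db_path) as db:
        if key not in db:
            db[key] = fetch()
        return db[key]
```

Depending on which `dbm` backend `shelve` picks, the file on disk may get an extra suffix, which is worth keeping in mind when writing the workflow glob patterns.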
Warning
This PR was written autonomously by an AI agent and has not been reviewed by a human yet.
The Test Report workflow (runs) has been failing ever since the CI migration to pixi (#9276) changed the job and artifact naming of `tests.yaml`; the conda environment it relies on (Python 3.9 + the unmaintained `altair_saver`) no longer solves either.

Changes
- New `test-report` pixi environment (no dependencies spilled into other environments). Usage, both locally and in CI: `pixi run test-report dask/distributed` or `pixi run test-report dask/dask`. This fetches the `Tests` runs of the target repo (caching them into local `shelve` databases) and generates `test_report.html` (last 90 days) and `test_short_report.html` (last 7 days). The old `test-report-environment.yml` conda file is deleted.
- Adapted `test_report.py` parsing to the post-pixi job names (`ubuntu-latest py310 test-ci not ci1`) and artifact names (`ubuntu-latest-py310-test-ci-notci1`) of both dask/distributed and dask/dask. Workflow runs older than 2026-06-04 (the pixi migration) cannot be parsed and are skipped via a hardcoded cutoff; there is no backwards compatibility.
- Per-repo local databases (e.g. `test_report.dask__distributed.db`), so reports for multiple repos can be generated from the same checkout without clobbering each other.
- `gh` CLI fallback: when `GITHUB_TOKEN` is not set, the token is read from `gh auth token`.
- Replaced `altair_saver` with altair ≥6 native HTML output.
- Fixed the chart regression from the f-string conversion in #9245: `f"{df.file}.{df.test}"` stringified the whole pandas Series, collapsing every test into a single giant chart.
- On PRs, the workflow only runs when the `test-report` label is set, and it uploads the reports + databases as workflow artifacts instead of deploying to GitHub Pages. Scheduled/manual runs on the main repo deploy to GitHub Pages as before.
- Fixed `actions/cache` usage: the previous static key made the cache immutable after the first save, so it never picked up new runs. Now every run saves a fresh cache and restores from the most recent one.

Verified locally
Both commands above were run locally from a clean cache; reports were inspected:
- dask/distributed: reported failures include `test_RetireWorker_stress`, `test_chaos_rechunk` and `test_failure_during_worker_initialization`.
- dask/dask: the reported failures are `test_warn_bad_rechunking` and `test_bind[False]` (both py314t), cross-checked against the actual failed `Tests` runs on main; the only other job failures were setup-pixi infrastructure failures, which correctly don't appear as test failures.

Known limitation
The artifact databases grow without bound (~25 MB/day before compression for dask/distributed): entries for runs that fall out of the 90-day window are never pruned. Since GitHub evicts the least recently used caches once a repository exceeds 10 GB, this is tolerable, but a pruning pass would be a good follow-up.
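A possible shape for that pruning pass, as a sketch only. It assumes each cached value is a dict carrying the run's `created_at` timestamp, which is a hypothetical layout rather than what `test_report.py` actually stores; `prune_old_entries` is an illustrative name.

```python
import shelve
from datetime import datetime, timedelta, timezone
from typing import Optional


def prune_old_entries(
    db_path: str, max_age_days: int = 90, now: Optional[datetime] = None
) -> int:
    """Delete entries older than max_age_days and return how many were removed.

    Hypothetical layout: every value is a dict holding the run's ISO 8601
    'created_at' timestamp, as found in GitHub REST API workflow-run objects.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    removed = 0
    with shelve.open(db_path) as db:
        # Snapshot the keys first so the mapping isn't mutated while iterating it
        for key in list(db.keys()):
            # GitHub uses a trailing "Z"; fromisoformat() only accepts it from Python 3.11
            created = datetime.fromisoformat(
                db[key]["created_at"].replace("Z", "+00:00")
            )
            if created < cutoff:
                del db[key]
                removed += 1
    return removed
```

Depending on the `dbm` backend behind `shelve`, deleting keys may not shrink the file on disk by itself (for example, `dbm.gnu` databases need `reorganize()`), so the size reduction might only show up once the database is rewritten.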
A follow-up PR in dask/dask migrates its test-report workflow to use this same tooling.
🤖 Generated with Claude Code