Add eval CI jobs for edge-tooling#81386
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR adds discovery-based ChangesClaude agent eval discovery flow
Estimated code review effort: 3 (Moderate) | ~25 minutes 🚥 Pre-merge checks | ✅ 15✅ Passed checks (15 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml`:
- Around line 67-76: The eval job in openshift-claude-agent-eval is setting
EVAL_CHANGED_ONLY without a non-empty EVAL_CASES_DIR, so the changed-only path
becomes a no-op. Update the eval job definition for
eval-cluster-diagnostic-changed to either add EVAL_CASES_DIR alongside
EVAL_CONFIG and EVAL_MODEL in the env block, or remove EVAL_CHANGED_ONLY if the
intent is to run the full eval instead.
In
`@ci-operator/step-registry/lvms/zstream/trigger/lvms-zstream-trigger-commands.sh`:
- Around line 99-105: `get_last_digest` is currently treating any previously
recorded image digest as tested, even when the prior trigger run failed. Update
the logic around `get_last_digest` and the summary-write path that stores the
image after `jobs_failed > 0` so failed Gangway trigger attempts do not persist
the digest for reuse; only record the image when the run succeeds, or clear/skip
writing it on failure so the next cron run retries the same digest.
- Around line 283-289: The entry parsing in
trigger/lvms-zstream-trigger-commands.sh can still pass a trailing Prow JS
semicolon into jq, causing the lookup to fall back to unknown. Update the entry
assignment logic near the jq pipeline to strip the trailing “;” from prow_json
before parsing, matching the behavior already used by load_previous_summary, so
the JSON is valid and the last run state is extracted correctly.
- Around line 165-168: The snapshot extraction pipeline in extract_snapshot is
aborting under set -euo pipefail when grep finds no match, which prevents the
per-release error handler from running. Update the extraction logic around the
jq/grep/head pipeline so that a missing snapshot yields an empty
string/zero-status result instead of failing, while preserving the current
parsing behavior for valid matches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5ddda36d-ab21-494e-820e-cf2b347b59b3
⛔ Files ignored due to path filters (1)
ci-operator/jobs/openshift/lvm-operator/openshift-lvm-operator-main-periodics.yamlis excluded by!ci-operator/jobs/**
📒 Files selected for processing (7)
ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yamlci-operator/config/openshift/lvm-operator/openshift-lvm-operator-main__zstream.yamlci-operator/step-registry/lvms/zstream/OWNERSci-operator/step-registry/lvms/zstream/trigger/OWNERSci-operator/step-registry/lvms/zstream/trigger/lvms-zstream-trigger-commands.shci-operator/step-registry/lvms/zstream/trigger/lvms-zstream-trigger-ref.metadata.jsonci-operator/step-registry/lvms/zstream/trigger/lvms-zstream-trigger-ref.yaml
| get_last_digest() { | ||
| local release="$1" | ||
| if [[ -z "${PREV_SUMMARY}" ]]; then | ||
| echo "" | ||
| return | ||
| fi | ||
| jq -r --arg r "${release}" '.[$r].image // "" | split("@") | if length > 1 then .[1] else "" end' "${PREV_SUMMARY}" 2>/dev/null || echo "" |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Don’t mark a digest tested when trigger attempts failed.
get_last_digest accepts any prior image, but Line 410 writes that image even when jobs_failed > 0; the next cron run will skip the same digest instead of retrying failed Gangway triggers.
Proposed fix
- jq -r --arg r "${release}" '.[$r].image // "" | split("@") | if length > 1 then .[1] else "" end' "${PREV_SUMMARY}" 2>/dev/null || echo ""
+ jq -r --arg r "${release}" '
+ .[$r]
+ | select((.status == "triggered" and (.jobs_failed // 0) == 0) or .status == "skipped")
+ | .image // ""
+ | split("@")
+ | if length > 1 then .[1] else "" end
+ ' "${PREV_SUMMARY}" 2>/dev/null || echo ""Also applies to: 407-410
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/lvms/zstream/trigger/lvms-zstream-trigger-commands.sh`
around lines 99 - 105, `get_last_digest` is currently treating any previously
recorded image digest as tested, even when the prior trigger run failed. Update
the logic around `get_last_digest` and the summary-write path that stores the
image after `jobs_failed > 0` so failed Gangway trigger attempts do not persist
the digest for reuse; only record the image when the run succeeds, or clear/skip
writing it on failure so the next cron run retries the same digest.
| echo "${files_json}" | jq -r ' | ||
| [.[] | select(.filename | contains("catalog")) | .patch // ""] | | ||
| join("\n") | ||
| ' | grep -oP '(?<=snapshot: )\S+' | head -1 |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Make snapshot extraction return empty instead of aborting.
With set -euo pipefail, grep returning no match makes snapshot=$(extract_snapshot ...) exit before the handler at Line 342 can write the intended per-release error summary.
Proposed fix
echo "${files_json}" | jq -r '
[.[] | select(.filename | contains("catalog")) | .patch // ""] |
join("\n")
- ' | grep -oP '(?<=snapshot: )\S+' | head -1
+ ' | { grep -oP '(?<=snapshot: )\S+' || true; } | head -1
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| echo "${files_json}" | jq -r ' | |
| [.[] | select(.filename | contains("catalog")) | .patch // ""] | | |
| join("\n") | |
| ' | grep -oP '(?<=snapshot: )\S+' | head -1 | |
| echo "${files_json}" | jq -r ' | |
| [.[] | select(.filename | contains("catalog")) | .patch // ""] | | |
| join("\n") | |
| ' | { grep -oP '(?<=snapshot: )\S+' || true; } | head -1 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/lvms/zstream/trigger/lvms-zstream-trigger-commands.sh`
around lines 165 - 168, The snapshot extraction pipeline in extract_snapshot is
aborting under set -euo pipefail when grep finds no match, which prevents the
per-release error handler from running. Update the extraction logic around the
jq/grep/head pipeline so that a missing snapshot yields an empty
string/zero-status result instead of failing, while preserving the current
parsing behavior for valid matches.
| local entry | ||
| entry=$(echo "${prow_json}" | sed 's/^var allBuilds = //' | \ | ||
| jq -r --arg n "${test_name}" ' | ||
| [.items[] | select(.status.state == "success" or .status.state == "failure" or .status.state == "error" or .status.state == "aborted")] | | ||
| if length > 0 then .[0] else null end | | ||
| if . then {name: $n, state: .status.state, url: .status.url, started: .status.startTime} else {name: $n, state: "unknown"} end | ||
| ' 2>/dev/null || echo "{\"name\": \"${test_name}\", \"state\": \"unknown\"}") |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Strip the trailing Prow JS semicolon here too.
load_previous_summary already removes ;$; this path does not, so jq can fail on the trailing semicolon and report every last run as unknown.
Proposed fix
- entry=$(echo "${prow_json}" | sed 's/^var allBuilds = //' | \
+ entry=$(echo "${prow_json}" | sed 's/^var allBuilds = //; s/;$//' | \
jq -r --arg n "${test_name}" '📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| local entry | |
| entry=$(echo "${prow_json}" | sed 's/^var allBuilds = //' | \ | |
| jq -r --arg n "${test_name}" ' | |
| [.items[] | select(.status.state == "success" or .status.state == "failure" or .status.state == "error" or .status.state == "aborted")] | | |
| if length > 0 then .[0] else null end | | |
| if . then {name: $n, state: .status.state, url: .status.url, started: .status.startTime} else {name: $n, state: "unknown"} end | |
| ' 2>/dev/null || echo "{\"name\": \"${test_name}\", \"state\": \"unknown\"}") | |
| local entry | |
| entry=$(echo "${prow_json}" | sed 's/^var allBuilds = //; s/;$//' | \ | |
| jq -r --arg n "${test_name}" ' | |
| [.items[] | select(.status.state == "success" or .status.state == "failure" or .status.state == "error" or .status.state == "aborted")] | | |
| if length > 0 then .[0] else null end | | |
| if . then {name: $n, state: .status.state, url: .status.url, started: .status.startTime} else {name: $n, state: "unknown"} end | |
| ' 2>/dev/null || echo "{\"name\": \"${test_name}\", \"state\": \"unknown\"}") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/lvms/zstream/trigger/lvms-zstream-trigger-commands.sh`
around lines 283 - 289, The entry parsing in
trigger/lvms-zstream-trigger-commands.sh can still pass a trailing Prow JS
semicolon into jq, causing the lookup to fall back to unknown. Update the entry
assignment logic near the jq pipeline to strip the trailing “;” from prow_json
before parsing, matching the behavior already used by load_previous_summary, so
the JSON is valid and the last run state is extracted correctly.
007eb23 to
a870610
Compare
|
/test eval-cluster-diagnostic |
|
@kasturinarra: The specified target(s) for The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test eval-cluster-diagnostic |
|
@kasturinarra: The specified target(s) for The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
a870610 to
1c5c4dc
Compare
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1c5c4dc to
ffe9d4a
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: kasturinarra The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
When EVAL_DISCOVER is set ("true" or a glob pattern), the workflow
auto-discovers eval configs, diffs against PULL_BASE_SHA to run only
affected evals, and produces per-eval JUnit test cases. Single
EVAL_CONFIG mode is preserved for backward compatibility.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2ed5fe3 to
2ce7768
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.sh`:
- Around line 207-216: The per-config eval in
`openshift-claude-agent-eval-commands.sh` still uses a fixed `timeout 7200`
inside the serial `CONFIGS_TO_RUN` loop, which can overrun the overall step
budget and prevent the final JUnit output from being written. Update the timeout
logic around the `claude` invocation in the discovery-mode loop to use the
remaining time budget per config, or otherwise write JUnit results incrementally
so each run is preserved even if later configs hit the step limit.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 6a6cf46f-d381-4301-bf14-55a7f0802c1f
⛔ Files ignored due to path filters (1)
ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-presubmits.yamlis excluded by!ci-operator/jobs/**
📒 Files selected for processing (3)
ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yamlci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-commands.shci-operator/step-registry/openshift/claude/agent-eval/openshift-claude-agent-eval-ref.yaml
Write JUnit XML after each eval config completes so results are preserved if the step gets killed mid-loop. Add a step-level time guard (2h50m) to skip remaining configs before hitting the 3h limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
[REHEARSALNOTIFIER]
Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
@kasturinarra: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
openshift-eng/edge-tooling:eval-cluster-diagnostic— manual trigger via/test eval-cluster-diagnosticeval-cluster-diagnostic-changed— auto-triggers whenplugins/two-node/evals/.*cluster-diagnosticfiles changeeval-threat-model-tnf— manual trigger via/test eval-threat-model-tnfeval-threat-model-tnf-changed— auto-triggers whenplugins/two-node/evals/.*threat-model-tnffiles changeopenshift-claude-agent-evalworkflow,claude-opus-4-6model, parallelism 3optional: true— failures don't block mergeContext
Replaces #81166 which used on-the-fly eval generation (
EVAL_SETUP_SCRIPT). This PR points at committed eval configs from edge-tooling PR #178 instead — no dynamic generation, enabling regression detection with committed judges.Pattern follows
openshift-eng/ai-helperseval CI config exactly (per-skill manual +-changedauto-trigger pairs).Test plan
/test eval-cluster-diagnosticon an edge-tooling PR-changedauto-trigger fires when eval case files are modified🤖 Generated with Claude Code
Summary by CodeRabbit
This PR updates CI configuration in
openshift/releaseforopenshift-eng/edge-toolingby adding two new optional Claude eval presubmit entries that run theopenshift-claude-agent-evalworkflow with theclaude-opus-4-6model andEVAL_PARALLELISM: "3". The jobs support both manual and change-driven execution:eval-all(manual via/test eval-all): runs withEVAL_DISCOVER: "true"to auto-discover eval configs.eval-changed(manual via/test eval-changed, auto-triggered viarun_if_changed: ^plugins/.*/evals/|^plugins/.*/skills/): runs withEVAL_CHANGED_ONLY: "true"(andEVAL_DISCOVER: "true") to limit execution to changed evals.To support these modes, the
openshift-claude-agent-evalstep runner was enhanced to:EVAL_DISCOVERvsEVAL_CONFIGmutual exclusivity,RUN_ID),EVAL_CHANGED_ONLY=trueandPULL_BASE_SHAis set,junit_claude-eval.xml,