Skip to content

Add baremetal deploy via Ansible playbook#89

Open
fonta-rh wants to merge 18 commits into
openshift-eng:mainfrom
fonta-rh:ocpedge-2775-baremetal-deploy
Open

Add baremetal deploy via Ansible playbook#89
fonta-rh wants to merge 18 commits into
openshift-eng:mainfrom
fonta-rh:ocpedge-2775-baremetal-deploy

Conversation

@fonta-rh

@fonta-rh fonta-rh commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add deploy-baremetal.yml — Ansible playbook that deploys a TNF fencing cluster on adopted baremetal nodes using dev-scripts' agent pipeline
  • Reuses install-dev role's config.yml for all validation (CI token, pull-secret, scenario prefix) via include_role tasks_from: config — zero duplication
  • Rewrites deploy-baremetal.sh as a thin wrapper (~45 lines) that validates inventory and calls ansible-playbook, matching the deploy-cluster.sh pattern
  • Updates [provisioning_host] inventory section to standard Ansible format

Stacked on #88 (baremetal adoption) — review only the top commits. Rebase after #88 merges.

Also depends on: openshift-metal3/dev-scripts#1922 (baremetal ABI support in dev-scripts)

Deploy flow

make baremetal-adopt           # generates artifacts into role files/ dir (PR #88)
make baremetal-fencing-agent   # deploys via Ansible playbook (this PR)

The playbook targets [provisioning_host] from inventory_baremetal.ini:

  1. Pre-flight — checks adoption artifacts exist on controller
  2. Validation — reuses install-dev role's config.yml (CI token, pull-secret, scenario check)
  3. Setup — git checkout dev-scripts, copy ironic_nodes.json, set WORKING_DIR, create mirror creds
  4. Deploy — runs 6 dev-scripts make targets in sequence (requirements → agent_create_cluster)
  5. Post-deploy — fetches kubeconfig + kubeadmin-password back to controller

For local deployment, add localhost ansible_connection=local to [provisioning_host].

What changed from the shell script approach

The original deploy-baremetal.sh (448 lines) reimplemented config validation and had ~140 lines of SSH/rsync remote execution code. The Ansible playbook eliminates both:

  • Config validation reused via include_role (not reimplemented)
  • Remote execution handled by Ansible's native SSH connection layer
  • Credential fetch via fetch module instead of rsync

Test plan

  • ansible-playbook deploy-baremetal.yml --syntax-check passes
  • shellcheck deploy-baremetal.sh passes clean
  • make help shows updated description
  • End-to-end on RDU2 baremetal cluster with provisioning host
  • Verify make baremetal-fencing-agent wrapper validates inventory correctly

🤖 Generated with Claude Code

Summary by CodeRabbit

Summary

  • New Features
    • Added new baremetal make workflow commands: baremetal-adopt, baremetal-verify, baremetal-fencing-agent, and baremetal-wizard.
    • Introduced an interactive baremetal wizard to generate an adoption inventory with BMC details and optional network/provisioning-host settings.
    • Added a baremetal fencing deployment workflow that deploys a fencing cluster using the adopted-node artifacts.
  • Documentation
    • Added a sample baremetal inventory template and enhanced command help, including guidance for generated configuration.
  • Chores
    • Updated ignore rules to prevent committing generated adoption files and credential-containing inventories.

@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2026
@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fonta-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 30, 2026
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds baremetal adoption tooling for TNF fencing clusters: new make targets, inventory capture, adoption artifact generation, and deployment validation plus orchestration.

Changes

Baremetal Adoption Flow

Layer / File(s) Summary
Targets and inventory templates
deploy/Makefile, deploy/openshift-clusters/.gitignore, deploy/openshift-clusters/inventory_baremetal.ini.sample, deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_fencing_example.sh, deploy/openshift-clusters/roles/dev-scripts/install-dev/files/.gitignore
Adds baremetal make targets, updates help text, ignores generated files, and documents the baremetal inventory and fencing config inputs.
Interactive inventory wizard
deploy/openshift-clusters/scripts/baremetal-wizard.sh
Adds an interactive wizard that validates inputs, shows a summary, and writes inventory_baremetal.ini.
Adoption and artifact generation
deploy/openshift-clusters/scripts/baremetal-adopt.sh
Adds inventory parsing, BMC verification, boot metadata discovery, and generation of ironic_nodes.json and config_baremetal_fencing.sh.
Deployment wrapper and playbook
deploy/openshift-clusters/scripts/deploy-baremetal.sh, deploy/openshift-clusters/deploy-baremetal.yml
Adds the deployment wrapper and playbook that validate prerequisites, run dev-scripts, and retrieve cluster credentials.

Estimated code review effort: 4 (Complex) | ~60 minutes

Suggested labels: ready-for-human-review

Suggested reviewers: jaypoulz, slintes


Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (3 errors, 1 warning)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error baremetal-wizard/adopt print node names, BMC addresses, and SSH targets in normal output; passwords are masked, but internal hostnames/management endpoints are exposed. Redact or omit hostnames/BMC endpoints from default output, and gate any inventory detail behind a verbose/debug flag; keep passwords and other secrets masked.
No-Hardcoded-Secrets ❌ Error inventory_baremetal.ini.sample hardcodes bmc_pass=changeme for both sample nodes, which is a literal password and trips the secret check. Replace sample passwords with placeholders like <BMC_PASSWORD> or commented examples, and ensure no literal credentials remain in sample configs.
No-Injection-Vectors ❌ Error deploy-baremetal.yml embeds {{ test_cluster_name }} directly in an ansible.builtin.shell task, so a user-supplied cluster name can break out of the command. Replace that shell task with file/copy modules or validate/quote test_cluster_name before interpolation; never render it raw into shell.
Ai-Attribution ⚠️ Warning AI use is explicit, but the commit uses Co-Authored-By: Claude Opus 4.6 and no Assisted-by/Generated-by trailer was found. Replace the AI trailer with Red Hat Assisted-by: or Generated-by: attribution, and remove Co-Authored-By for the AI tool.
✅ Passed checks (7 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: adding baremetal deployment support through an Ansible playbook.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
No-Weak-Crypto ✅ Passed No MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons found in the changed files.
Container-Privileges ✅ Passed No container/K8s manifests were added, and the touched files contain no privileged, hostNetwork, hostIPC, hostPID, SYS_ADMIN, or allowPrivilegeEscalation settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 30, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/openshift-clusters/inventory_baremetal.ini.sample`:
- Around line 24-25: The sample inventory entries still use a literal bmc_pass
value, which should be replaced with a clearly non-usable placeholder. Update
the bmc_pass fields in the master-0 and master-1 example entries to an obvious
placeholder or commented/blank value, keeping the sample format intact and
avoiding any real-looking credential in inventory_baremetal.ini.sample.

In `@deploy/openshift-clusters/scripts/baremetal-adopt.sh`:
- Around line 190-193: The inventory validation in parse_inventory() is only
warning on unsupported node counts instead of stopping the TNF-only flow. Change
the NODE_NAMES count check so it fails fast when the inventory is anything other
than exactly 2 nodes, using die with a clear unsupported-topology message after
the existing zero-nodes guard.

In `@deploy/openshift-clusters/scripts/deploy-baremetal.sh`:
- Around line 190-199: The setup_dev_scripts flow currently clones
DEV_SCRIPTS_REPO at runtime, which leaves stacked-dev-scripts unpinned and can
silently pick up upstream HEAD changes. Update setup_dev_scripts to use a fixed
branch/tag/commit for the git clone, or require an already checked-out
DEV_SCRIPTS_PATH and fail if the checkout is not the expected revision. Apply
the same pinning approach anywhere the deployment relies on the remote
provisioning-host path so the behavior stays stable across runs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d6acc4b1-e856-4b79-95e5-ac473a54f6df

📥 Commits

Reviewing files that changed from the base of the PR and between 52f92a5 and 5b40836.

📒 Files selected for processing (7)
  • deploy/Makefile
  • deploy/openshift-clusters/.gitignore
  • deploy/openshift-clusters/inventory_baremetal.ini.sample
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_fencing_example.sh
  • deploy/openshift-clusters/scripts/baremetal-adopt.sh
  • deploy/openshift-clusters/scripts/baremetal-wizard.sh
  • deploy/openshift-clusters/scripts/deploy-baremetal.sh

Comment on lines +24 to +25
master-0 bmc_address=192.168.1.100 bmc_user=admin bmc_pass=changeme boot_mac=52:54:00:00:00:01 node_ip=192.168.1.10
master-1 bmc_address=192.168.1.101 bmc_user=admin bmc_pass=changeme boot_mac=52:54:00:00:00:02 node_ip=192.168.1.11

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win

Replace the sample bmc_pass values with non-credential placeholders.

bmc_pass=changeme is still a literal password in a config sample, so it is easy to copy into a real inventory unchanged. Please switch these to obviously non-usable placeholders such as <set-me> or leave the value blank/commented. As per coding guidelines, “Flag hardcoded secrets including API keys, tokens, passwords, private keys, and credentials.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/inventory_baremetal.ini.sample` around lines 24 -
25, The sample inventory entries still use a literal bmc_pass value, which
should be replaced with a clearly non-usable placeholder. Update the bmc_pass
fields in the master-0 and master-1 example entries to an obvious placeholder or
commented/blank value, keeping the sample format intact and avoiding any
real-looking credential in inventory_baremetal.ini.sample.

Source: Coding guidelines

Comment on lines +190 to +193
[[ ${#NODE_NAMES[@]} -eq 0 ]] && die "No nodes found in inventory"
if [[ ${#NODE_NAMES[@]} -ne 2 ]]; then
echo " WARNING: TNF requires exactly 2 nodes, found ${#NODE_NAMES[@]}" >&2
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Fail fast on unsupported node counts.

This flow is documented as TNF-only, but parse_inventory() still proceeds when the inventory has anything other than exactly 2 nodes. Generating artifacts for an unsupported topology here just defers the failure into a much more expensive deployment step.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/baremetal-adopt.sh` around lines 190 - 193,
The inventory validation in parse_inventory() is only warning on unsupported
node counts instead of stopping the TNF-only flow. Change the NODE_NAMES count
check so it fails fast when the inventory is anything other than exactly 2
nodes, using die with a clear unsupported-topology message after the existing
zero-nodes guard.

Comment thread deploy/openshift-clusters/scripts/deploy-baremetal.sh Outdated
@coderabbitai coderabbitai Bot removed the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jul 1, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
deploy/openshift-clusters/scripts/deploy-baremetal.sh (2)

45-63: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Validate option values before reading $2.

With set -u, --dev-scripts-repo or --dev-scripts-branch without a value aborts with an unbound-variable error instead of a clear usage failure.

Proposed fix
 parse_args() {
     while [[ $# -gt 0 ]]; do
         case $1 in
             --cluster-name)
+                [[ $# -ge 2 && -n "${2:-}" ]] || die "--cluster-name requires a value"
                 CLUSTER_NAME="$2"
                 shift 2
                 ;;
             --dev-scripts-path)
+                [[ $# -ge 2 && -n "${2:-}" ]] || die "--dev-scripts-path requires a value"
                 DEV_SCRIPTS_PATH="$2"
                 shift 2
                 ;;
             --dev-scripts-repo)
+                [[ $# -ge 2 && -n "${2:-}" ]] || die "--dev-scripts-repo requires a value"
                 DEV_SCRIPTS_REPO="$2"
                 shift 2
                 ;;
             --dev-scripts-branch)
+                [[ $# -ge 2 && -n "${2:-}" ]] || die "--dev-scripts-branch requires a value"
                 DEV_SCRIPTS_BRANCH="$2"
                 shift 2
                 ;;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/deploy-baremetal.sh` around lines 45 - 63,
The parse_args() option handling reads $2 for --dev-scripts-repo and
--dev-scripts-branch before confirming a value exists, which can trigger an
unbound-variable failure under set -u. Update parse_args() in
deploy-baremetal.sh to validate that each option has a following argument before
assigning from $2, and fail with a clear usage/error message when the value is
missing. Keep the checks in the same case branches for --dev-scripts-repo and
--dev-scripts-branch so the script exits cleanly instead of aborting
unexpectedly.

351-371: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Quote remote args and forward the effective repo/branch overrides.

In provisioning-host mode, local --dev-scripts-repo / --dev-scripts-branch values are dropped unless duplicated in inventory_baremetal.ini, and the string-built SSH command shell-splits unquoted repo/path/branch values. Build an argv array, shell-quote it for SSH, and merge the effective local/inventory overrides. As per coding guidelines, shell scripts should quote variables properly.

Proposed fix
 exec_on_remote() {
-    # shellcheck disable=SC2088 # tilde expands on the remote shell via SSH
-    local remote_script="~/${PROV_WORKING_DIR}/scripts/deploy-baremetal.sh"
-    local remote_args="--cluster-name ${CLUSTER_NAME}"
-    if [[ -n "${PROV_DEV_SCRIPTS_PATH:-}" ]]; then
-        remote_args+=" --dev-scripts-path ${PROV_DEV_SCRIPTS_PATH}"
+    local remote_script="${PROV_WORKING_DIR}/scripts/deploy-baremetal.sh"
+    local remote_args=(--cluster-name "${CLUSTER_NAME}")
+    local remote_dev_scripts_repo="${PROV_DEV_SCRIPTS_REPO:-${DEV_SCRIPTS_REPO}}"
+    local remote_dev_scripts_branch="${PROV_DEV_SCRIPTS_BRANCH:-${DEV_SCRIPTS_BRANCH}}"
+
+    if [[ -n "${PROV_DEV_SCRIPTS_PATH:-}" ]]; then
+        remote_args+=(--dev-scripts-path "${PROV_DEV_SCRIPTS_PATH}")
     fi
-    if [[ -n "${PROV_DEV_SCRIPTS_REPO:-}" ]]; then
-        remote_args+=" --dev-scripts-repo ${PROV_DEV_SCRIPTS_REPO}"
+    if [[ -n "${remote_dev_scripts_repo}" ]]; then
+        remote_args+=(--dev-scripts-repo "${remote_dev_scripts_repo}")
     fi
-    if [[ -n "${PROV_DEV_SCRIPTS_BRANCH:-}" ]]; then
-        remote_args+=" --dev-scripts-branch ${PROV_DEV_SCRIPTS_BRANCH}"
+    if [[ -n "${remote_dev_scripts_branch}" ]]; then
+        remote_args+=(--dev-scripts-branch "${remote_dev_scripts_branch}")
     fi
+
+    local remote_cmd="cd && TNT_REMOTE_EXEC=1 bash"
+    printf -v remote_cmd '%s %q' "${remote_cmd}" "${remote_script}"
+    local arg
+    for arg in "${remote_args[@]}"; do
+        printf -v remote_cmd '%s %q' "${remote_cmd}" "${arg}"
+    done
 
     info "Executing deploy on ${PROV_SSH_TARGET}..."
     info "Remote output follows:"
     echo "=========================================="
 
     # shellcheck disable=SC2029
     ssh -tt "${SSH_OPTS[@]}" "${PROV_SSH_TARGET}" \
-        "TNT_REMOTE_EXEC=1 bash ${remote_script} ${remote_args}"
+        "${remote_cmd}"
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/deploy-baremetal.sh` around lines 351 -
371, The exec_on_remote function is dropping effective dev-scripts overrides in
provisioning-host mode and is also building the SSH command from unquoted
strings, which can split repo/path/branch values incorrectly. Update
exec_on_remote to collect the final dev-scripts repo/branch values from the
local environment or inventory defaults, then build the remote invocation as an
argv-style array instead of concatenating remote_args. Ensure the ssh call
passes a properly shell-quoted command using the exec_on_remote and remote_args
logic, and quote all variable expansions used in the command construction.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@deploy/openshift-clusters/scripts/deploy-baremetal.sh`:
- Around line 45-63: The parse_args() option handling reads $2 for
--dev-scripts-repo and --dev-scripts-branch before confirming a value exists,
which can trigger an unbound-variable failure under set -u. Update parse_args()
in deploy-baremetal.sh to validate that each option has a following argument
before assigning from $2, and fail with a clear usage/error message when the
value is missing. Keep the checks in the same case branches for
--dev-scripts-repo and --dev-scripts-branch so the script exits cleanly instead
of aborting unexpectedly.
- Around line 351-371: The exec_on_remote function is dropping effective
dev-scripts overrides in provisioning-host mode and is also building the SSH
command from unquoted strings, which can split repo/path/branch values
incorrectly. Update exec_on_remote to collect the final dev-scripts repo/branch
values from the local environment or inventory defaults, then build the remote
invocation as an argv-style array instead of concatenating remote_args. Ensure
the ssh call passes a properly shell-quoted command using the exec_on_remote and
remote_args logic, and quote all variable expansions used in the command
construction.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: beb072fd-30ee-453b-b4c7-452f2890b2cf

📥 Commits

Reviewing files that changed from the base of the PR and between 5b40836 and e417b79.

📒 Files selected for processing (2)
  • deploy/openshift-clusters/inventory_baremetal.ini.sample
  • deploy/openshift-clusters/scripts/deploy-baremetal.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • deploy/openshift-clusters/inventory_baremetal.ini.sample

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 2, 2026
@fonta-rh fonta-rh force-pushed the ocpedge-2775-baremetal-deploy branch from 5a87a3f to 21dd305 Compare July 2, 2026 09:53
@fonta-rh fonta-rh changed the title Add baremetal deploy via dev-scripts ABI Add baremetal deploy via Ansible playbook Jul 2, 2026
fonta-rh and others added 12 commits July 2, 2026 11:57
Introduce adopt-baremetal.sh to onboard existing baremetal nodes into
the dev-scripts deployment workflow. Parses inventory_baremetal.ini
(template or wizard-generated), validates BMC credentials via Redfish,
and generates ironic_nodes.json + config_baremetal_fencing.sh artifacts
for NODES_PLATFORM=baremetal deployments.

OCPEDGE-2774

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split the interactive wizard out of adopt-baremetal.sh into its own
baremetal-wizard.sh script. Add input validation (IPv4, MAC format,
hostname), re-prompt on invalid input instead of dying, a summary
table with masked passwords before confirmation, and Y/n/q flow.

Rename all baremetal scripts and Make targets to use a baremetal-*
prefix for consistent grouping (baremetal-adopt, baremetal-verify,
baremetal-wizard).

OCPEDGE-2774

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The wizard only accepted IPv4 addresses for BMC endpoints, but real
environments (e.g., HPE iLO) commonly use FQDNs. Accept both IPv4
and hostnames, and update prompts/labels accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bmc_address accepts both IPs and hostnames (matching wizard change).
boot_mac is now optional — when omitted, the adopt script queries
Redfish EthernetInterfaces for an enabled NIC's MAC. Falls back to
a clear warning if discovery fails (e.g., firmware doesn't expose MACs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the EthernetInterfaces-based discovery (which returned the
wrong NIC) with BootOptions-based discovery. Walks the BIOS boot
order, finds the first PXE IPv4 entry, and extracts the MAC from
the UEFI device path. Tested against HPE iLO 5 (Edgeline e920t).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix jq null handling in Redfish system ID discovery (// empty)
- Fix script names in usage text and generated comments
- Fix missing / separator in BootOptions URL construction
- Add --inventory flag and info messages to wizard trigger
- Add warning when inventory has != 2 nodes
- Honor bmc_verify_ca in curl calls via bmc_curl wrapper
- Restrict permissions on generated credential artifacts (umask 077)

Co-Authored-By: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace heredoc JSON interpolation with jq -n --arg to prevent
  malformed output from passwords containing quotes or backslashes
- Fail hard on missing boot MAC instead of writing DISCOVERY_FAILED
  placeholder that silently breaks dev-scripts downstream
- Gate all BMC contact (discovery + verification) behind --skip-verify
  so offline users don't get silent failures baked into artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Redfish @odata.id values include a trailing slash (/redfish/v1/Systems/1/),
which produced a double-slash URL (.../Systems/1//BootOptions/) that iLO
returns null content for. Strip the trailing slash after the leading one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add declarative [baremetal_network] INI section for cluster-wide network
vars (machine_network, gateway, api_vip, ingress_vip) and per-node
node_ip field. The adoption script translates these into dev-scripts
exports (EXTERNAL_SUBNET_V4, BAREMETAL_GATEWAY, BAREMETAL_API_VIP,
BAREMETAL_INGRESS_VIP, BAREMETAL_IPS) and always emits bridge overrides
(MANAGE_BR_BRIDGE=n, MANAGE_PRO_BRIDGE=n, MANAGE_INT_BRIDGE=n).

All new fields are optional for backward compatibility. The wizard emits
skipped fields as commented placeholders so users can fill them later
without referencing the sample file. BAREMETAL_IPS is only emitted when
ALL nodes have node_ip set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The wizard now collects optional provisioning host config (ssh_target,
ssh_key, dev_scripts_path, working_dir) and writes a [provisioning_host]
section to inventory_baremetal.ini. Skipped fields are commented out.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without an explicit port in the Redfish URL, CEO's fence_redfish agent
fails to connect to the BMC during TNF auth jobs. The port (default 443)
now flows through verification, discovery, and ironic_nodes.json generation
as redfish://host:port/redfish/v1/Systems/1.

Adds bmc_port as a per-node field with a group default in
[baremetal_nodes:vars], matching the existing inheritance pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Place ironic_nodes.json and config_baremetal_fencing.sh in
roles/dev-scripts/install-dev/files/ instead of clusters/<name>/
so existing Ansible config.yml tasks can reference them by basename.
Remove --cluster-name flag (only used for the clusters/ subdirectory).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
deploy/openshift-clusters/scripts/deploy-baremetal.sh (1)

36-41: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add a comment explaining the section-parsing logic.

The sed/grep pipeline that checks for configured hosts under [provisioning_host] is non-trivial (range match to next section header, strip header line, filter comments/blanks). As per coding guidelines, **/*.sh files should "add comments for complex logic".

📝 Proposed comment addition
+# Extract lines between [provisioning_host] and the next section header,
+# drop the header line itself, and check for at least one non-comment/blank entry.
 if ! grep -qvE '^\s*(#|$|\[)' <(sed -n '/\[provisioning_host\]/,/^\[/p' "${INVENTORY}" | tail -n +2); then
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/deploy-baremetal.sh` around lines 36 - 41,
The host-check logic in deploy-baremetal.sh is complex and needs an inline
comment. Add a brief comment above the sed/grep pipeline in the
provisioning_host validation block explaining that it extracts the
[provisioning_host] section, skips the header, and filters out comments/blank
lines to detect whether any hosts are configured.

Source: Coding guidelines

deploy/openshift-clusters/deploy-baremetal.yml (1)

65-67: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Prefer ansible.builtin.file over shell for directory creation.

Using ansible.builtin.shell: mkdir -p ... with changed_when: false sacrifices idempotent change reporting for no real benefit; ansible.builtin.file with state: directory achieves the same result natively.

♻️ Proposed refactor
-    - name: Create working directory
-      ansible.builtin.shell: mkdir -p "${HOME}/dev-scripts-workdir"
-      changed_when: false
+    - name: Create working directory
+      ansible.builtin.file:
+        path: "{{ ansible_env.HOME }}/dev-scripts-workdir"
+        state: directory
+        mode: "0755"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/deploy-baremetal.yml` around lines 65 - 67, The
“Create working directory” task is using a shell command just to make a
directory, which should be replaced with Ansible’s native file handling. Update
that task in the playbook to use ansible.builtin.file with state set to
directory for the same "${HOME}/dev-scripts-workdir" path, and remove the
shell-based mkdir and changed_when suppression. Keep the task name and behavior
aligned with the existing deploy-baremetal workflow.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/openshift-clusters/deploy-baremetal.yml`:
- Around line 82-94: The ABI pipeline task uses the short name make, which may
not resolve because this play does not declare a collections scope. Update the
task in the “Run dev-scripts ABI pipeline” block to use the fully qualified
community.general.make module, or alternatively add community.general to the
play’s collections so the “dev-scripts: {{ item }}” task resolves reliably.

---

Nitpick comments:
In `@deploy/openshift-clusters/deploy-baremetal.yml`:
- Around line 65-67: The “Create working directory” task is using a shell
command just to make a directory, which should be replaced with Ansible’s native
file handling. Update that task in the playbook to use ansible.builtin.file with
state set to directory for the same "${HOME}/dev-scripts-workdir" path, and
remove the shell-based mkdir and changed_when suppression. Keep the task name
and behavior aligned with the existing deploy-baremetal workflow.

In `@deploy/openshift-clusters/scripts/deploy-baremetal.sh`:
- Around line 36-41: The host-check logic in deploy-baremetal.sh is complex and
needs an inline comment. Add a brief comment above the sed/grep pipeline in the
provisioning_host validation block explaining that it extracts the
[provisioning_host] section, skips the header, and filters out comments/blank
lines to detect whether any hosts are configured.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1a36f074-0c93-4030-9375-a952efb2ffb5

📥 Commits

Reviewing files that changed from the base of the PR and between e417b79 and 5a87a3f.

📒 Files selected for processing (6)
  • deploy/Makefile
  • deploy/openshift-clusters/deploy-baremetal.yml
  • deploy/openshift-clusters/inventory_baremetal.ini.sample
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/.gitignore
  • deploy/openshift-clusters/scripts/baremetal-adopt.sh
  • deploy/openshift-clusters/scripts/deploy-baremetal.sh
✅ Files skipped from review due to trivial changes (2)
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/.gitignore
  • deploy/openshift-clusters/inventory_baremetal.ini.sample
🚧 Files skipped from review as they are similar to previous changes (2)
  • deploy/Makefile
  • deploy/openshift-clusters/scripts/baremetal-adopt.sh

Comment on lines +82 to +94
- name: Run dev-scripts ABI pipeline
block:
- name: "dev-scripts: {{ item }}"
make:
chdir: "{{ dev_scripts_path }}"
target: "{{ item }}"
loop:
- requirements
- agent_requirements
- agent_build_installer
- agent_prepare_release
- agent_configure
- agent_create_cluster

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Is community.general collection required for the ansible make module, or is it part of ansible.builtin?

💡 Result:

The make module is part of the community.general collection, not ansible.builtin [1][2]. In modern Ansible (following the move to the collections-based architecture), you should reference it by its fully qualified collection name (FQCN): community.general.make [2]. While the module existed as a built-in module in older versions of Ansible (prior to the introduction of collections and the separation of modules into the community and core distributions) [3][4], it was migrated to the community.general collection as part of the project's modularization [2][5]. If you are using a modern version of Ansible, you must ensure the community.general collection is installed in your environment to use this module [2].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the target playbook and nearby context
git ls-files deploy/openshift-clusters/deploy-baremetal.yml
wc -l deploy/openshift-clusters/deploy-baremetal.yml
sed -n '1,180p' deploy/openshift-clusters/deploy-baremetal.yml

# Look for any collections declarations that would resolve community.general.make
rg -n '^\s*collections:|community\.general|make:' deploy/openshift-clusters -g '*.yml' -g '*.yaml'

Repository: openshift-eng/two-node-toolbox

Length of output: 6055


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check the playbook header and any relevant Ansible config/collection requirements
sed -n '1,130p' deploy/openshift-clusters/deploy-baremetal.yml

printf '\n--- collections requirements ---\n'
git ls-files 'deploy/openshift-clusters/**' 'collections/**' 'ansible.cfg' '.ansible*' | sed -n '1,200p'
printf '\n--- matches ---\n'
rg -n '^\s*collections:|community\.general|make:' deploy/openshift-clusters collections . -g '*.yml' -g '*.yaml' -g 'ansible.cfg' -g '.ansible*'

Repository: openshift-eng/two-node-toolbox

Length of output: 15874


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find any collection installation/requirements references that would make community.general available
git ls-files | rg '(^|/)(collections/requirements\.yml|ansible\.cfg|requirements\.yml|requirements\.yaml|galaxy\.yml|galaxy\.yaml)$'
for f in $(git ls-files | rg '(^|/)(collections/requirements\.yml|ansible\.cfg|requirements\.yml|requirements\.yaml|galaxy\.yml|galaxy\.yaml)$'); do
  echo "--- $f ---"
  sed -n '1,220p' "$f"
done

Repository: openshift-eng/two-node-toolbox

Length of output: 917


🌐 Web query:

Ansible short module name resolution for collection modules: can community.general.make be used as make when the collection is installed but no collections: is declared?

💡 Result:

No, you cannot use the short module name make to refer to the community.general.make module if the collections keyword is not declared in your playbook or role [1][2][3]. Ansible requires either the Fully Qualified Collection Name (FQCN), such as community.general.make, or the explicit use of the collections keyword to resolve short module names that are not part of the ansible.builtin collection [2][3]. The collections keyword creates an ordered search path that allows Ansible to resolve unqualified module names [2][3]. Without this declaration, or without explicitly using the FQCN, Ansible will attempt to find a module named make within the default search path (which includes ansible.builtin) and will fail if it cannot find it there [2][3]. The official documentation consistently recommends using the FQCN (community.general.make) to ensure clarity, avoid conflicts, and eliminate the need for the collections keyword [2][4][3].

Citations:


Use community.general.make here. make is a collection module, and this play doesn’t declare collections:, so the short name can fail to resolve; switch to the FQCN or add community.general to the play scope.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/deploy-baremetal.yml` around lines 82 - 94, The ABI
pipeline task uses the short name make, which may not resolve because this play
does not declare a collections scope. Update the task in the “Run dev-scripts
ABI pipeline” block to use the fully qualified community.general.make module, or
alternatively add community.general to the play’s collections so the
“dev-scripts: {{ item }}” task resolves reliably.

fonta-rh and others added 5 commits July 2, 2026 12:01
Add deploy-baremetal.sh — a standalone shell script that deploys a TNF
fencing cluster on adopted baremetal nodes using dev-scripts' agent
pipeline. Skips libvirt/qemu setup and drives Redfish VirtualMedia for
ISO delivery.

Supports remote execution: if [provisioning_host] is configured in
inventory_baremetal.ini, syncs artifacts via rsync and runs the deploy
on the remote host via SSH. Falls back to local execution if absent.

Changes:
- deploy-baremetal.sh: pre-flight validation, dev-scripts setup, remote
  execution via SSH with credential retrieval
- inventory_baremetal.ini.sample: add [provisioning_host] section
- baremetal-adopt.sh: append AGENT_E2E_TEST_SCENARIO to generated config
- Makefile: add baremetal-fencing-agent target

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add --dev-scripts-repo and --dev-scripts-branch CLI args so the deploy
  script can clone from a fork or switch an existing checkout to a
  specific branch (needed for dev-scripts PRs not yet merged upstream)
- Add dev_scripts_repo and dev_scripts_branch to [provisioning_host]
  inventory section, forwarded via SSH remote execution
- Remove validate_tools() — dev-scripts' make requirements now runs as
  part of the pipeline, handling all dependency installation
- Remove python wrapper and PYTHONPATH workarounds (Fedora laptop hacks,
  not needed on RHEL provisioning host)
- Normalize [[ ]] && to if/then blocks for readability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use TNF_IPV4_DHCP instead of TNF_IPV4 to skip static nmstate
generation in agent-config. Nodes get IP/DNS/gateway from the lab
DHCP server, avoiding the gateway-as-DNS bug in the static
networking template.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With DHCP networking mode, nodes get DNS from the lab DHCP server
instead of static nmstate config. The PROVISIONING_HOST_EXTERNAL_IP
variable is no longer needed for node configuration — dev-scripts
provides a harmless default via network.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the 448-line shell script with a 90-line Ansible playbook
(deploy-baremetal.yml) that reuses the install-dev role's config
validation via include_role tasks_from: config.

The shell script is gutted to a thin wrapper (~45 lines) that validates
inventory_baremetal.ini has a configured [provisioning_host] and calls
ansible-playbook, matching the deploy-cluster.sh pattern.

- Playbook targets [provisioning_host] group from inventory_baremetal.ini
- SSH/rsync remote execution replaced by Ansible's native connection layer
- Config/pull-secret validation reused from install-dev role (zero duplication)
- Dev-scripts ABI pipeline via make module loop (6 targets)
- Credentials fetched back to controller via fetch module
- Error recovery block with cleanup instructions

Also updates inventory_baremetal.ini.sample to use standard Ansible
inventory format for [provisioning_host] section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fonta-rh fonta-rh force-pushed the ocpedge-2775-baremetal-deploy branch from 21dd305 to 9f45936 Compare July 2, 2026 10:01
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 2, 2026
@fonta-rh fonta-rh marked this pull request as ready for review July 2, 2026 10:02
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 2, 2026
@openshift-ci openshift-ci Bot requested review from jaypoulz and qJkee July 2, 2026 10:02

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deploy/openshift-clusters/scripts/baremetal-adopt.sh (1)

363-389: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Write ironic_nodes.json with restrictive permissions.

This JSON contains BMC credentials, but > uses the existing file mode or process umask. Create/install it as 0600 to avoid leaking node credentials on the controller. As per coding guidelines, “Never include sensitive information” and “Handle secrets and credentials securely.”

Proposed fix
-    printf '%s\n' "${nodes[@]}" | jq -s '{nodes: .}' > "${output_file}"
+    local tmp_file
+    tmp_file="$(mktemp "${output_file}.XXXXXX")"
+    printf '%s\n' "${nodes[@]}" | jq -s '{nodes: .}' > "${tmp_file}"
+    install -m 0600 "${tmp_file}" "${output_file}"
+    rm -f "${tmp_file}"
     info "  → ${output_file}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/baremetal-adopt.sh` around lines 363 - 389,
The final write of ironic_nodes.json in baremetal-adopt.sh is too permissive for
a file containing BMC credentials. Update the logic around the nodes assembly
and the final `jq -s '{nodes: .}'` output so the file is created atomically with
restrictive permissions, using a `0600` mode (for example via a secure
create/move flow) rather than relying on `>` and the current umask. Keep the fix
local to the script section that builds `nodes` and writes `output_file`, and
ensure the resulting `ironic_nodes.json` is not readable by other users on the
controller.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/openshift-clusters/scripts/baremetal-adopt.sh`:
- Line 424: The generated config in the baremetal adopt script is setting the
wrong TNF agent scenario; update the scenario output in the script’s config
generation logic to use the requested AGENT_E2E_TEST_SCENARIO value of TNF_IPV4
instead of TNF_IPV4_DHCP. Locate the echo that emits the AGENT_E2E_TEST_SCENARIO
assignment in the baremetal-adopt.sh flow and change it so the deployment
follows the intended dev-scripts scenario.

---

Outside diff comments:
In `@deploy/openshift-clusters/scripts/baremetal-adopt.sh`:
- Around line 363-389: The final write of ironic_nodes.json in
baremetal-adopt.sh is too permissive for a file containing BMC credentials.
Update the logic around the nodes assembly and the final `jq -s '{nodes: .}'`
output so the file is created atomically with restrictive permissions, using a
`0600` mode (for example via a secure create/move flow) rather than relying on
`>` and the current umask. Keep the fix local to the script section that builds
`nodes` and writes `output_file`, and ensure the resulting `ironic_nodes.json`
is not readable by other users on the controller.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a56a7df1-dcaa-484a-a9a5-b5736ebd9bc8

📥 Commits

Reviewing files that changed from the base of the PR and between 5a87a3f and 21dd305.

📒 Files selected for processing (8)
  • deploy/Makefile
  • deploy/openshift-clusters/.gitignore
  • deploy/openshift-clusters/deploy-baremetal.yml
  • deploy/openshift-clusters/inventory_baremetal.ini.sample
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/.gitignore
  • deploy/openshift-clusters/scripts/baremetal-adopt.sh
  • deploy/openshift-clusters/scripts/baremetal-wizard.sh
  • deploy/openshift-clusters/scripts/deploy-baremetal.sh
💤 Files with no reviewable changes (1)
  • deploy/openshift-clusters/.gitignore
✅ Files skipped from review due to trivial changes (2)
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/.gitignore
  • deploy/openshift-clusters/inventory_baremetal.ini.sample
🚧 Files skipped from review as they are similar to previous changes (4)
  • deploy/openshift-clusters/scripts/deploy-baremetal.sh
  • deploy/Makefile
  • deploy/openshift-clusters/deploy-baremetal.yml
  • deploy/openshift-clusters/scripts/baremetal-wizard.sh

echo "export MANAGE_BR_BRIDGE=n"
echo "export MANAGE_PRO_BRIDGE=n"
echo "export MANAGE_INT_BRIDGE=n"
echo "export AGENT_E2E_TEST_SCENARIO=\"TNF_IPV4_DHCP\""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use the requested TNF agent scenario.

The PR objective says this generated config should append AGENT_E2E_TEST_SCENARIO="TNF_IPV4", but this emits TNF_IPV4_DHCP, which can route the deployment through the wrong dev-scripts scenario.

Proposed fix
-        echo "export AGENT_E2E_TEST_SCENARIO=\"TNF_IPV4_DHCP\""
+        echo "export AGENT_E2E_TEST_SCENARIO=\"TNF_IPV4\""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "export AGENT_E2E_TEST_SCENARIO=\"TNF_IPV4_DHCP\""
echo "export AGENT_E2E_TEST_SCENARIO=\"TNF_IPV4\""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/baremetal-adopt.sh` at line 424, The
generated config in the baremetal adopt script is setting the wrong TNF agent
scenario; update the scenario output in the script’s config generation logic to
use the requested AGENT_E2E_TEST_SCENARIO value of TNF_IPV4 instead of
TNF_IPV4_DHCP. Locate the echo that emits the AGENT_E2E_TEST_SCENARIO assignment
in the baremetal-adopt.sh flow and change it so the deployment follows the
intended dev-scripts scenario.

@coderabbitai coderabbitai Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jul 2, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
deploy/openshift-clusters/scripts/baremetal-wizard.sh (1)

462-482: 🔒 Security & Privacy | 🔵 Trivial

Consider vault-encrypting BMC credentials before persisting to disk.

bmc_pass values are written in plaintext into inventory_baremetal.ini. mktemp defaults to mode 600 so the file itself isn't world-readable, but plaintext BMC passwords on disk are still a risk if the file is copied, backed up, or committed accidentally. Consider prompting to encrypt the password field with ansible-vault or documenting that the generated inventory must be protected/gitignored.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/baremetal-wizard.sh` around lines 462 -
482, The inventory generation in baremetal-wizard.sh writes BMC credentials
directly into the temporary inventory file via the loop that builds each node’s
line, including bmc_pass in plaintext. Update this flow so WIZ_PASSES is not
persisted unencrypted: either vault-encrypt the password before writing the
inventory or replace the plaintext field with guidance/documentation that the
generated inventory must be protected and gitignored. Keep the change localized
to the inventory write logic around the mktemp/tmp_inventory generation and the
per-node line assembly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/openshift-clusters/scripts/baremetal-wizard.sh`:
- Around line 484-491: The baremetal inventory defaults to bmc_verify_ca=False
in the inventory block written by the wizard, which disables Redfish TLS
verification for everyone. Update the baremetal-wizard.sh flow that builds the
[baremetal_nodes:vars] section to make this setting configurable through the
wizard (or otherwise preserve the secure default), and ensure the generated
tmp_inventory reflects the user’s choice instead of hardcoding the insecure
value.
- Around line 389-409: The wizard reset logic is leaving WIZ_PORTS stale, which
can misalign bmc_port data after a restart. Update the array reinitialization
block in baremetal-wizard.sh to clear WIZ_PORTS along with WIZ_NAMES, WIZ_IPS,
WIZ_USERS, WIZ_PASSES, WIZ_MACS, and WIZ_NODE_IPS, so the node data collected in
the loop around prompt_bmc_port stays in sync when show_summary and
write_inventory are reached again after a continue.

---

Nitpick comments:
In `@deploy/openshift-clusters/scripts/baremetal-wizard.sh`:
- Around line 462-482: The inventory generation in baremetal-wizard.sh writes
BMC credentials directly into the temporary inventory file via the loop that
builds each node’s line, including bmc_pass in plaintext. Update this flow so
WIZ_PASSES is not persisted unencrypted: either vault-encrypt the password
before writing the inventory or replace the plaintext field with
guidance/documentation that the generated inventory must be protected and
gitignored. Keep the change localized to the inventory write logic around the
mktemp/tmp_inventory generation and the per-node line assembly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d3a43c7f-14e6-4d69-bf28-70f47cf41181

📥 Commits

Reviewing files that changed from the base of the PR and between 21dd305 and 9f45936.

📒 Files selected for processing (9)
  • deploy/Makefile
  • deploy/openshift-clusters/.gitignore
  • deploy/openshift-clusters/deploy-baremetal.yml
  • deploy/openshift-clusters/inventory_baremetal.ini.sample
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/.gitignore
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_fencing_example.sh
  • deploy/openshift-clusters/scripts/baremetal-adopt.sh
  • deploy/openshift-clusters/scripts/baremetal-wizard.sh
  • deploy/openshift-clusters/scripts/deploy-baremetal.sh
✅ Files skipped from review due to trivial changes (4)
  • deploy/openshift-clusters/.gitignore
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_fencing_example.sh
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/.gitignore
  • deploy/openshift-clusters/inventory_baremetal.ini.sample
🚧 Files skipped from review as they are similar to previous changes (4)
  • deploy/openshift-clusters/scripts/deploy-baremetal.sh
  • deploy/Makefile
  • deploy/openshift-clusters/deploy-baremetal.yml
  • deploy/openshift-clusters/scripts/baremetal-adopt.sh

Comment on lines +389 to +409
WIZ_NAMES=()
WIZ_IPS=()
WIZ_USERS=()
WIZ_PASSES=()
WIZ_MACS=()
WIZ_NODE_IPS=()

local i
for ((i = 0; i < node_count; i++)); do
local default_name="master-${i}"
echo ""
echo "--- Node $((i + 1)) of ${node_count} ---"

WIZ_NAMES+=("$(prompt_hostname "${default_name}")")
WIZ_IPS+=("$(prompt_bmc_address)")
WIZ_USERS+=("$(prompt_bmc_user)")
WIZ_PASSES+=("$(prompt_bmc_pass)")
WIZ_PORTS+=("$(prompt_bmc_port)")
WIZ_MACS+=("$(prompt_boot_mac)")
WIZ_NODE_IPS+=("$(prompt_node_ip)")
done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Stale WIZ_PORTS array causes misaligned data on wizard restart.

The reset block (Lines 389-394) omits WIZ_PORTS, but it is only ever appended to (Line 406). If the user declines the summary and restarts (Line 448 continue), old port entries persist and shift out of alignment with the freshly-reset WIZ_NAMES/WIZ_IPS/etc. arrays, causing show_summary (Line 345) and write_inventory (Line 472) to emit wrong bmc_port values for re-entered nodes.

🐛 Proposed fix
         WIZ_NAMES=()
         WIZ_IPS=()
         WIZ_USERS=()
         WIZ_PASSES=()
+        WIZ_PORTS=()
         WIZ_MACS=()
         WIZ_NODE_IPS=()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
WIZ_NAMES=()
WIZ_IPS=()
WIZ_USERS=()
WIZ_PASSES=()
WIZ_MACS=()
WIZ_NODE_IPS=()
local i
for ((i = 0; i < node_count; i++)); do
local default_name="master-${i}"
echo ""
echo "--- Node $((i + 1)) of ${node_count} ---"
WIZ_NAMES+=("$(prompt_hostname "${default_name}")")
WIZ_IPS+=("$(prompt_bmc_address)")
WIZ_USERS+=("$(prompt_bmc_user)")
WIZ_PASSES+=("$(prompt_bmc_pass)")
WIZ_PORTS+=("$(prompt_bmc_port)")
WIZ_MACS+=("$(prompt_boot_mac)")
WIZ_NODE_IPS+=("$(prompt_node_ip)")
done
WIZ_NAMES=()
WIZ_IPS=()
WIZ_USERS=()
WIZ_PASSES=()
WIZ_PORTS=()
WIZ_MACS=()
WIZ_NODE_IPS=()
local i
for ((i = 0; i < node_count; i++)); do
local default_name="master-${i}"
echo ""
echo "--- Node $((i + 1)) of ${node_count} ---"
WIZ_NAMES+=("$(prompt_hostname "${default_name}")")
WIZ_IPS+=("$(prompt_bmc_address)")
WIZ_USERS+=("$(prompt_bmc_user)")
WIZ_PASSES+=("$(prompt_bmc_pass)")
WIZ_PORTS+=("$(prompt_bmc_port)")
WIZ_MACS+=("$(prompt_boot_mac)")
WIZ_NODE_IPS+=("$(prompt_node_ip)")
done
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/baremetal-wizard.sh` around lines 389 -
409, The wizard reset logic is leaving WIZ_PORTS stale, which can misalign
bmc_port data after a restart. Update the array reinitialization block in
baremetal-wizard.sh to clear WIZ_PORTS along with WIZ_NAMES, WIZ_IPS, WIZ_USERS,
WIZ_PASSES, WIZ_MACS, and WIZ_NODE_IPS, so the node data collected in the loop
around prompt_bmc_port stays in sync when show_summary and write_inventory are
reached again after a continue.

Comment on lines +484 to +491
{
echo ""
echo "[baremetal_nodes:vars]"
echo "bmc_driver=redfish"
echo "bmc_port=443"
echo "bmc_verify_ca=False"
echo "cpu_arch=x86_64"
} >> "${tmp_inventory}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win

Hardcoded bmc_verify_ca=False disables BMC TLS cert verification by default.

This globally disables certificate validation for all Redfish BMC connections with no opt-out prompt. Worth exposing as a wizard-configurable option or at least documenting the risk, since it's a permissive default that weakens the security posture for anyone deploying with valid BMC certs.

🧰 Tools
🪛 Shellcheck (0.11.0)

[style] 484-491: Consider using { cmd1; cmd2; } >> file instead of individual redirects.

(SC2129)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/openshift-clusters/scripts/baremetal-wizard.sh` around lines 484 -
491, The baremetal inventory defaults to bmc_verify_ca=False in the inventory
block written by the wizard, which disables Redfish TLS verification for
everyone. Update the baremetal-wizard.sh flow that builds the
[baremetal_nodes:vars] section to make this setting configurable through the
wizard (or otherwise preserve the secure default), and ensure the generated
tmp_inventory reflects the user’s choice instead of hardcoding the insecure
value.

- Generate SSH keypair on provisioning host if missing, inject
  SSH_PUB_KEY into dev-scripts config so cluster nodes accept SSH
- Clean stale installer state (state.json, ISO, manifests) before
  running ABI pipeline to prevent release image SHA mismatches
- Skip host dependency install (runc, containernetworking-plugins)
  for baremetal path — dev-scripts make requirements handles its own
- Add test_cluster_name to play-level vars (include_role defaults
  don't persist to play scope with dynamic includes)
- Fetch SSH private key alongside kubeconfig for node access
- Pass method as task-level var on include_role (role vars/main.yml
  overrides play-level vars but not task-level vars)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci

openshift-ci Bot commented Jul 2, 2026

Copy link
Copy Markdown

@fonta-rh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/shellcheck 0f098dc link true /test shellcheck

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant