102 changes: 84 additions & 18 deletions content/en/docs/next/virtualization/gpu.md
@@ -100,32 +100,98 @@ Allocatable:
For example, the database entry for A10 reads `2236 GA102GL [A10]`, which results in a resource name `nvidia.com/GA102GL_A10`.
{{% /alert %}}

## 2. Update the KubeVirt Custom Resource
## 2. KubeVirt is wired automatically

Next, we will update the KubeVirt Custom Resource, as documented in the
[KubeVirt user guide](https://kubevirt.io/user-guide/virtual_machines/host-devices/#listing-permitted-devices),
so that the passthrough GPUs are permitted and can be requested by a KubeVirt VM.
When `cozystack.gpu-operator` is in `bundles.enabledPackages`, Cozystack mirrors the chosen GPU variant into the `KubeVirt` Custom Resource for you. There is no `kubectl edit kubevirt` step.

Adjust the `pciVendorSelector` and `resourceName` values to match your specific GPU model.
Setting `externalResourceProvider=true` indicates that this resource is provided by an external device plugin,
in this case the `sandbox-device-plugin` which is deployed by the Operator.
Specifically, the platform injects:

- `HostDevices` into `spec.configuration.developerConfiguration.featureGates` (current KubeVirt splits this from the `GPU` gate; the admission webhook rejects `domain.devices.hostDevices` without it).
- A starter `spec.configuration.permittedHostDevices.pciHostDevices` table, rendered for the default variant (`gpuOperatorVariant: default`, vfio-pci passthrough), that covers common NVIDIA datacenter GPUs: Hopper (H100, H200), Ada Lovelace (L4, L40, L40S), Ampere (A100 PCIe/SXM, A40, A30, A10), Turing (T4), and Volta (V100, V100S).

  PCI vendor:device pairs are stable. Each `resourceName` slug is whatever `nvidia-sandbox-device-plugin` derives mechanically from the card's PCI-IDs database name: uppercase the name, turn `/`, `.` and whitespace into `_`, then strip the surrounding `[` / `]`. The slug therefore carries every token the PCI-IDs string holds (the `GL` die suffix, the `Tesla` brand on Turing/Volta, the form factor, the memory size), not a tidy `<arch>_<model>`:

  - `TU104GL [Tesla T4]` becomes `nvidia.com/TU104GL_TESLA_T4`
  - `GA100GL [A30 PCIe]` becomes `nvidia.com/GA100GL_A30_PCIE`
  - the H200 SXM becomes `nvidia.com/GH100_H200_SXM_141GB`

  Confirm the exact strings your nodes advertise with `kubectl describe node <node> | grep nvidia.com/`. `externalResourceProvider: true` is set on every entry because the resources are advertised by the sandbox plugin, not by KubeVirt's in-tree device plugin.
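
To make the slug rule concrete, here is a minimal shell sketch of the transformation as this page describes it (uppercase, then `/`, `.` and whitespace to `_`, then strip the brackets). The function name `pci_name_to_resource` is hypothetical and this is not the plugin's actual code; treat the output as a prediction and confirm it against what your nodes advertise.

```bash
# Hypothetical helper: predicts the resourceName for a PCI-IDs device name,
# following the rule described on this page. Not taken from the plugin source.
pci_name_to_resource() {
  slug=$(printf '%s' "$1" \
    | tr '[:lower:]' '[:upper:]' \
    | sed -e 's#[/.[:space:]]#_#g' -e 's#[][]##g')
  # Step 1 (tr): uppercase the name.
  # Step 2 (first sed expression): turn each '/', '.' and whitespace character into '_'.
  # Step 3 (second sed expression): strip the '[' and ']' characters.
  printf 'nvidia.com/%s\n' "$slug"
}

# Usage:
#   pci_name_to_resource 'TU104GL [Tesla T4]'   # prints nvidia.com/TU104GL_TESLA_T4
#   pci_name_to_resource 'GA100GL [A30 PCIe]'   # prints nvidia.com/GA100GL_A30_PCIE
```

The sketch replaces each matching character individually; whether the real plugin collapses runs of whitespace is not stated here, so names with unusual spacing may differ.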

Verify the resulting CR:

```bash
kubectl edit kubevirt -n cozy-kubevirt
kubectl -n cozy-kubevirt get kubevirt kubevirt -o json \
| jq '.spec.configuration | {featureGates: .developerConfiguration.featureGates, permittedHostDevices: .permittedHostDevices}'
```
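
If you would rather script the check than read the JSON by eye, a small helper along these lines may be enough. It is a sketch: `has_feature_gate` is a hypothetical name, and it assumes the gate list arrives on stdin as whitespace-separated names (which is how a `kubectl -o jsonpath` expression with `[*]` prints an array).

```bash
# Hypothetical helper: succeeds (exit 0) when the gate named in $1 appears
# in the whitespace-separated list of gate names read from stdin.
has_feature_gate() {
  tr -s '[:space:]' '\n' | grep -qxF -- "$1"
}

# Usage against a live cluster (assumes the CR name and namespace used above):
#   kubectl -n cozy-kubevirt get kubevirt kubevirt \
#     -o jsonpath='{.spec.configuration.developerConfiguration.featureGates[*]}' \
#     | has_feature_gate HostDevices && echo "HostDevices gate is set"
```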
example config:

{{% alert color="info" %}}

**My GPU isn't in the default table — where's the old `kubectl edit kubevirt` step?** It is gone on purpose. `permittedHostDevices` is now owned by the chart template and reconciled from platform values, so any hand edit to the live CR is reverted on the next Flux/Helm reconcile. Add your card through `.gpu.permittedHostDevices` instead — see [Extending or replacing the NVIDIA defaults](#extending-or-replacing-the-nvidia-defaults) below. If you are upgrading from a release where you hand-edited the CR, follow [Upgrading from a hand-edited KubeVirt CR](#upgrading-from-a-hand-edited-kubevirt-cr) first.

{{% /alert %}}

### Extending or replacing the NVIDIA defaults

If your cluster ships a GPU not in the default table, or your `nvidia-sandbox-device-plugin` version emits a different `resourceName` (check with `kubectl describe node <node> | grep nvidia.com/`), extend the defaults via platform values:

```yaml
...
spec:
  configuration:
    permittedHostDevices:
      pciHostDevices:
      - externalResourceProvider: true
        pciVendorSelector: 10DE:2236
        resourceName: nvidia.com/GA102GL_A10
...
# Platform Package values
gpu:
  # Append (default): your entries land alongside the NVIDIA table.
  # Set to true to drop the NVIDIA table entirely (useful for non-NVIDIA-only
  # clusters or strict allowlists). With replaceDefaults: true and an empty
  # list below, the rendered CR carries no permittedHostDevices block at all
  # and the admission webhook rejects every GPU VM, so supply your own list.
  replaceDefaults: false
  permittedHostDevices:
    pciHostDevices:
      - pciVendorSelector: "10DE:2236"
        resourceName: nvidia.com/GA102GL_A10
        externalResourceProvider: true
```

To **re-point** a card already in the NVIDIA table (for example to give `10DE:1EB8` a different `resourceName`), do not append a second entry for the same `pciVendorSelector` — both entries are rendered and KubeVirt resolves the duplicated selector non-deterministically. Set `replaceDefaults: true` and supply the full list you want instead.
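
A quick way to catch an accidental duplicate is to list the selectors from the rendered CR and look for repeats. The sketch below assumes selectors should be compared case-insensitively (they are hex IDs); `find_duplicate_selectors` is a hypothetical helper name.

```bash
# Hypothetical helper: reads one pciVendorSelector per line on stdin and
# prints each selector that occurs more than once (uppercased, one per line).
find_duplicate_selectors() {
  tr '[:lower:]' '[:upper:]' | sort | uniq -d
}

# Usage against a live cluster (assumes jq is installed):
#   kubectl -n cozy-kubevirt get kubevirt kubevirt -o json \
#     | jq -r '.spec.configuration.permittedHostDevices.pciHostDevices[].pciVendorSelector' \
#     | find_duplicate_selectors
```

Empty output means no selector is listed twice.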

### Upgrading from a hand-edited KubeVirt CR

Earlier Cozystack releases left `spec.configuration.permittedHostDevices` for operators to hand-edit (`kubectl edit kubevirt`). The bundle now **owns** that field: the first reconcile after the upgrade replaces your manual entries with the rendered NVIDIA default table.

Before upgrading:

1. Dump your current entries:

   ```bash
   kubectl -n cozy-kubevirt get kubevirt kubevirt -o json \
   | jq '.spec.configuration.permittedHostDevices'
   ```

2. Move any custom entries into the Platform Package values under `.gpu.permittedHostDevices` (set `.gpu.replaceDefaults: true` if you want only your own list instead of appending to the NVIDIA defaults).

3. Verify every `resourceName` against what your nodes actually advertise. The default table carries the slug `nvidia-sandbox-device-plugin` generates from each card's PCI-IDs name (uppercased, e.g. `nvidia.com/TU104GL_TESLA_T4` for a Tesla T4), but a different plugin build or PCI-IDs snapshot can emit a different string:

   ```bash
   kubectl describe node <node> | grep nvidia.com/
   ```

   A `resourceName` mismatch is silent until a GPU VM restarts or migrates, at which point the admission webhook rejects it.
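
Because the mismatch is silent, it can help to diff the two lists mechanically before upgrading. The following is a minimal sketch, assuming you have saved the permitted `resourceName` values and the node-advertised resource names into two plain-text files, one name per line; `unadvertised_resources` is a hypothetical helper name.

```bash
# Hypothetical helper: prints every resourceName listed in the "permitted"
# file ($1) that does not appear in the "advertised" file ($2).
# Lines are compared as exact, whole-line fixed strings.
unadvertised_resources() {
  grep -Fvx -f "$2" "$1"
}

# One way to produce the two files (assumes jq is installed; <node> is your node name):
#   kubectl -n cozy-kubevirt get kubevirt kubevirt -o json \
#     | jq -r '.spec.configuration.permittedHostDevices.pciHostDevices[].resourceName' > permitted.txt
#   kubectl get node <node> -o json \
#     | jq -r '.status.allocatable | keys[] | select(startswith("nvidia.com/"))' > advertised.txt
#   unadvertised_resources permitted.txt advertised.txt
```

Any name it prints is permitted in the CR but not offered by that node. For cards the node does not carry this is expected; for a card it does carry, it points at a slug mismatch.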

### Manual Package-CR override path

If you opt out of bundle management and hand-craft a `cozystack.gpu-operator` Package CR directly (to apply overrides the bundle does not expose, such as driver settings, custom node selectors, or validator / dcgmExporter tweaks), the platform does NOT auto-wire `HostDevices` or `permittedHostDevices` into the KubeVirt CR. A Cozystack `Package` always nests component values under `spec.components.<name>.values`, never under a top-level `spec.values`. So in the manual flow, mirror the bundle behaviour by also creating a `cozystack.kubevirt` Package CR that carries `extraFeatureGates` and the matching `permittedHostDevices` block under `spec.components.kubevirt.values`:

```yaml
apiVersion: cozystack.io/v1alpha1
kind: Package
metadata:
  name: cozystack.kubevirt
spec:
  variant: default
  components:
    kubevirt:
      values:
        extraFeatureGates:
          - HostDevices
        permittedHostDevices:
          pciHostDevices:
            - pciVendorSelector: "10DE:2236"
              resourceName: nvidia.com/GA102GL_A10
              externalResourceProvider: true
```

The manual Package-CR override path takes precedence over the bundle render whenever both exist.

## 3. Create a Virtual Machine

We are now ready to create a VM.