From 0fa19605722989c6b339da88460cddc0d8b2f443 Mon Sep 17 00:00:00 2001
From: IvanHunters
Date: Thu, 25 Jun 2026 15:51:00 +0300
Subject: [PATCH 1/3] docs(operations): tenant Kubernetes cluster OIDC
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Operator-facing page covering per-tenant Keycloak realms and OIDC on
tenant kube-apiservers — companion to the existing OIDC docs which
cover the management cluster (`cozy` realm). Pairs with cozystack PR
cozystack/cozystack#3044.

Covers:

- the auto-provisioning flow (apps/tenant lookup → realm + scope →
  _namespace.oidc-realm → apps/kubernetes per-cluster client / group /
  kube-apiserver flags → in-cluster ClusterRoleBinding via Job);
- the enable-on-a-Kubernetes-CR workflow;
- creating users and granting access in the tenant realm;
- wiring kubectl with kubelogin;
- the four known limitations (orphan realm because helm-controller
  does not re-render on Helm `lookup` result changes; no caBundle /
  self-signed Keycloak; hardcoded JWT username/groups claims; CRB
  orphan on runtime oidc.enabled=true→false toggle);
- troubleshooting (401 with valid token, 403 for in-group user, stuck
  realm/scope after CR deletion).

Signed-off-by: IvanHunters
---
 .../next/operations/oidc/tenant_clusters.md   | 257 ++++++++++++++++++
 1 file changed, 257 insertions(+)
 create mode 100644 content/en/docs/next/operations/oidc/tenant_clusters.md

diff --git a/content/en/docs/next/operations/oidc/tenant_clusters.md b/content/en/docs/next/operations/oidc/tenant_clusters.md
new file mode 100644
index 00000000..ba83fd68
--- /dev/null
+++ b/content/en/docs/next/operations/oidc/tenant_clusters.md
@@ -0,0 +1,257 @@
+---
+title: "OIDC for tenant Kubernetes clusters"
+linkTitle: "Tenant Kubernetes OIDC"
+description: "Per-tenant Keycloak realms and OIDC authentication on tenant kube-apiservers"
+weight: 40
+aliases:
+  - /docs/next/oidc/tenant-kubernetes
+---
+
+This page covers OIDC authentication on the tenant Kubernetes clusters
+managed by Cozystack (the per-tenant kube-apiservers backed by Kamaji).
+It complements [Enable OIDC Server]({{< relref "enable_oidc.md" >}}),
+which covers OIDC for the **management** cluster (dashboard, kubeapps,
+mgmt kubectl) using the platform `cozy` realm.
+
+## Overview
+
+Each Cozystack tenant gets its own dedicated Keycloak realm for its
+kube-apiservers and tenant-scoped applications. The management
+identity domain (the `cozy` realm) stays separated from per-tenant
+identity. Users granted access to a tenant cluster live in that
+tenant's realm only — they cannot accidentally log into the management
+cluster or another tenant.
+
+When the `Kubernetes` CR opts into OIDC (`spec.oidc.enabled: true`):
+
+1. The `apps/tenant` chart auto-provisions a `ClusterKeycloakRealm`
+   named after the tenant and the standard `groups`
+   `KeycloakClientScope` inside it. The realm name is published as
+   `_namespace.oidc-realm` in the tenant's `cozystack-values` Secret
+   so descendants and apps pick it up.
+2. The `apps/kubernetes` chart creates a per-cluster public
+   `KeycloakClient kubernetes-` (with its own audience scope
+   so cross-cluster token replay fails inside the same realm), the
+   `KeycloakRealmGroup `, and wires the tenant kube-apiserver
+   via `KamajiControlPlane.spec.apiServer.extraArgs`.
+3. A post-install Job applies a `ClusterRoleBinding` inside the tenant
+   cluster, binding the realm group to the built-in `cluster-admin`
+   ClusterRole. Operators grant or revoke access by adding or removing
+   users from the Keycloak group.
+
+Operators do **not** need to pre-toggle `Tenant.spec.oidc.enabled` —
+the parent `apps/tenant` chart auto-provisions the realm when any
+child `Kubernetes` CR requests OIDC.
+
+## Prerequisites
+
+- Platform-level OIDC must be enabled
+  (`authentication.oidc.enabled: true` in the platform values →
+  `_cluster.oidc-enabled=true`).
+- A publicly resolvable platform DNS name with a valid TLS certificate
+  on the Keycloak ingress. The tenant apiserver validates the OIDC
+  issuer over HTTPS using its system trust store — self-signed
+  Keycloak deployments are not supported (see Limitations).
+
+## Enable OIDC on a tenant cluster
+
+```yaml
+apiVersion: apps.cozystack.io/v1alpha1
+kind: Kubernetes
+metadata:
+  name: prod-a
+  namespace: tenant-acme
+spec:
+  controlPlane:
+    replicas: 1
+  nodeGroups:
+    md0:
+      minReplicas: 0
+      maxReplicas: 3
+      instanceType: u1.medium
+      roles: [worker]
+  storageClass: replicated
+  version: "v1.32"
+  oidc:
+    enabled: true
+```
+
+Within ≤ 5 minutes:
+
+- The `apps/tenant` reconcile creates `ClusterKeycloakRealm tenant-acme`.
+- The `apps/kubernetes` reconcile picks up the realm, provisions the
+  per-cluster `KeycloakClient kubernetes-prod-a`, the realm group
+  `prod-a`, and adds OIDC flags to the kube-apiserver.
+- A post-install Job binds `Group prod-a` to `cluster-admin` inside the
+  tenant cluster.
+
+## Create a user and grant access
+
+In Keycloak (the tenant realm — e.g. `tenant-acme`):
+
+1. Create a user, set a non-temporary password, mark email verified.
+2. Add the user to the realm group named after the cluster (`prod-a`).
+   One membership = full kubectl access to that cluster.
+
+To revoke access, remove the user from the group.
+
+## Wire kubectl with kubelogin
+
+Install [kubelogin](https://github.com/int128/kubelogin):
+
+```bash
+brew install int128/kubelogin/kubelogin
+# or: kubectl krew install oidc-login
+```
+
+The chart prints a ready-to-paste kubeconfig snippet in its
+`NOTES.txt`:
+
+```bash
+helm get notes -n tenant-acme prod-a
+```
+
+Or write it by hand — pasting the cluster CA from the admin
+kubeconfig Secret:
+
+```yaml
+apiVersion: v1
+kind: Config
+clusters:
+- name: prod-a
+  cluster:
+    server: https://prod-a.acme.example.com:443
+    certificate-authority-data:
+contexts:
+- name: prod-a
+  context:
+    cluster: prod-a
+    user: oidc
+current-context: prod-a
+users:
+- name: oidc
+  user:
+    exec:
+      apiVersion: client.authentication.k8s.io/v1
+      command: kubectl
+      args:
+      - oidc-login
+      - get-token
+      - --oidc-issuer-url=https://keycloak.acme.example.com/realms/tenant-acme
+      - --oidc-client-id=kubernetes-prod-a
+```
+
+Running `kubectl get pods` opens the browser, logs the user into
+Keycloak, returns the id_token, and the apiserver authenticates based
+on the `groups` claim.
+
+## Limitations
+
+### Realm cleanup is not automatic after the last child OIDC cluster is removed
+
+The `apps/tenant` chart uses Helm's `lookup` function to discover
+whether any child `Kubernetes` CR has `spec.oidc.enabled=true`.
+Helm-controller does **not** re-render a chart when a `lookup` result
+changes — it only re-renders when the chart source artifact or the
+HelmRelease values change. Consequently, deleting the last
+`Kubernetes` CR with OIDC enabled does **not** trigger an `apps/tenant`
+re-render, and the orphan `ClusterKeycloakRealm` stays in the tenant
+namespace.
+
+To force cleanup, the operator can:
+
+- Explicitly toggle `Tenant.spec.oidc.enabled=true` and then back to
+  `false`. Each toggle changes the HelmRelease values, which triggers
+  a re-render with the up-to-date lookup result. After the second
+  toggle, the chart no longer renders the realm and Helm prunes it.
+- Or wait for the next platform upgrade that bumps any chart-affecting
+  source — the realm cleanup happens for free as a side effect.
+
+### Self-signed Keycloak is not supported
+
+The tenant apiserver validates the OIDC issuer over HTTPS using the
+system trust store inside the Kamaji apiserver pod. If the platform
+Keycloak ingress uses a private CA, the apiserver fails the TLS
+handshake and all OIDC logins return 401. The chart does not currently
+expose a `caBundle` field — public DNS with a valid certificate (e.g.
+via cert-manager + Let's Encrypt) is required. See
+[Self-signed certificates]({{< relref "self-signed-certificates.md" >}})
+for the management-cluster workaround pattern.
+
+### JWT claims are not configurable
+
+`--oidc-username-claim` is fixed to `preferred_username` and
+`--oidc-groups-claim` is fixed to `groups`. These match the Keycloak
+defaults; deployments using non-default claim mappings need a chart
+change.
+
+### Runtime toggle of `oidc.enabled` from `true` to `false`
+
+Helm hooks only fire on install / upgrade / delete, not on values
+changes. If an operator flips `Kubernetes.spec.oidc.enabled` from
+`true` to `false`, the chart stops rendering the in-cluster
+`ClusterRoleBinding` Job but the existing binding inside the tenant
+cluster is not removed. The apiserver also drops the OIDC arguments on
+the next reconcile, so the orphan binding is inert (no realm group
+matches against the now-disabled OIDC path). Manual cleanup:
+
+```bash
+kubectl --kubeconfig= delete clusterrolebinding \
+  --selector cozystack.io/oidc-cluster=
+```
+
+### CI / headless access requires manual KeycloakClient patch
+
+The chart-rendered `KeycloakClient` is public and does **not** enable
+`directAccessGrantsEnabled` (password grant). This is correct for
+browser-flow logins. For CI pipelines that need a non-interactive
+token, the cluster-admin can patch the client on the live cluster:
+
+```bash
+kubectl -n patch keycloakclient kubernetes- \
+  --type=merge --patch '{"spec":{"directAccess":true}}'
+```
+
+This is intentionally not the default — interactive flow stays
+recommended for human users.
+
+## Troubleshooting
+
+### Apiserver returns 401 with a valid token
+
+Check the apiserver flags in the Kamaji pod:
+
+```bash
+kubectl --context=mgmt -n get pod \
+  -l kamaji.clastix.io/name= \
+  -o jsonpath='{.items[0].spec.containers[?(@.name=="kube-apiserver")].args}' | \
+  tr ',' '\n' | grep oidc
+```
+
+Confirm the issuer URL matches the realm — decode the id_token and
+compare `iss` against the `--oidc-issuer-url` flag. Confirm `aud` in
+the token equals `--oidc-client-id`; mismatch is the most common cause
+when running multiple clusters in the same realm.
+
+### Apiserver returns 403 for a user that is in the right group
+
+Check the in-cluster `ClusterRoleBinding`:
+
+```bash
+kubectl --kubeconfig= get clusterrolebinding \
+  --selector cozystack.io/oidc-cluster=
+```
+
+The bootstrap Job runs as a `post-install` / `post-upgrade` hook;
+check its logs in the management cluster:
+
+```bash
+kubectl --context=mgmt -n logs \
+  job/-oidc-rbac
+```
+
+### Realm or scope objects stuck after CR deletion
+
+See [Realm cleanup is not automatic](#realm-cleanup-is-not-automatic-after-the-last-child-oidc-cluster-is-removed)
+under Limitations. Operator intervention required (toggle
+`Tenant.spec.oidc`).
From febf81df6fef2b49715f565f21113558404c66a7 Mon Sep 17 00:00:00 2001
From: IvanHunters
Date: Thu, 25 Jun 2026 15:55:24 +0300
Subject: [PATCH 2/3] docs(operations): tighten tenant Kubernetes OIDC page
 after self-review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adversarial pass on the page from the prior commit found ten issues
that would either confuse the reader or break the doc on render.

Fixes:

* Drop the `helm get notes` recipe — Cozystack uses Flux
  helm-controller, not a local helm CLI, so the command would not work
  for most operators. Replace with an explicit
  `kubectl get secret … | base64 -d` recipe that extracts the admin
  kubeconfig and dumps the cluster CA.
* Clarify the cross-link to `self-signed-certificates.md` — the
  management cluster workaround there does NOT apply to tenant
  apiservers (Kamaji owns their machine config, not the operator's
  Talos / talm flow). The prior phrasing implied a tenant workaround
  existed.
* Replace the optimistic "Within ≤ 5 minutes" with "up to ~10 minutes
  worst case" — the cascade is two sequential reconcile loops, not
  one, so 5 minutes was misleading for cold-start installs. Also
  document the `-awaiting-oidc-realm` ConfigMap beacon between the two
  reconciles.
* Rename the "Runtime toggle of `oidc.enabled` from `true` to `false`"
  heading to "Runtime oidc.enabled toggle does not clean up bindings"
  — Hugo's TOC generator does not always cope with inline `code` in
  headings, and the new wording is more declarative anyway.
* Spell out the admin-kubeconfig extraction in every troubleshooting
  and cleanup recipe instead of leaving an `` placeholder.
* Pin one set of placeholders for the whole page (tenant = `acme`,
  cluster = `prod-a`, root host = `acme.example.com`) and use them
  consistently in every example. The earlier draft mixed concrete and
  `` style across sections.
* Spell out the Job name (`kubernetes-prod-a-oidc-rbac`) rather than
  leaving `-oidc-rbac` to be derived by the reader.
* Rewrite "no realm group matches against the now-disabled OIDC path"
  to the clearer "no realm group can match it once OIDC is off".
* Add an upfront `Tenant.spec.oidc.enabled` clarification: the field
  stays at its default `false` during normal operation, and the only
  legitimate use is the realm-cleanup workaround in Limitations. The
  prior draft mentioned the flag in two contexts without flagging that
  one of them was a workaround only.
* Add a top-of-page placeholder index so the reader can map the
  example names back to their own deployment.

Signed-off-by: IvanHunters
---
 .../next/operations/oidc/tenant_clusters.md   | 159 +++++++++++-------
 1 file changed, 99 insertions(+), 60 deletions(-)

diff --git a/content/en/docs/next/operations/oidc/tenant_clusters.md b/content/en/docs/next/operations/oidc/tenant_clusters.md
index ba83fd68..2a8914ee 100644
--- a/content/en/docs/next/operations/oidc/tenant_clusters.md
+++ b/content/en/docs/next/operations/oidc/tenant_clusters.md
@@ -13,6 +13,11 @@ It complements [Enable OIDC Server]({{< relref "enable_oidc.md" >}}),
 which covers OIDC for the **management** cluster (dashboard, kubeapps,
 mgmt kubectl) using the platform `cozy` realm.
 
+In all examples below the tenant is `acme` (so its namespace is
+`tenant-acme`), the tenant Kubernetes CR is `prod-a` (so its release
+name is `kubernetes-prod-a`), and the platform root host is
+`acme.example.com`. Substitute your own names accordingly.
+
 ## Overview
 
 Each Cozystack tenant gets its own dedicated Keycloak realm for its
@@ -25,23 +30,26 @@ cluster or another tenant.
 When the `Kubernetes` CR opts into OIDC (`spec.oidc.enabled: true`):
 
 1. The `apps/tenant` chart auto-provisions a `ClusterKeycloakRealm`
-   named after the tenant and the standard `groups`
+   named after the tenant (`tenant-acme`) and the standard `groups`
    `KeycloakClientScope` inside it. The realm name is published as
    `_namespace.oidc-realm` in the tenant's `cozystack-values` Secret
    so descendants and apps pick it up.
 2. The `apps/kubernetes` chart creates a per-cluster public
-   `KeycloakClient kubernetes-` (with its own audience scope
-   so cross-cluster token replay fails inside the same realm), the
-   `KeycloakRealmGroup `, and wires the tenant kube-apiserver
+   `KeycloakClient kubernetes-prod-a` (with its own audience scope so
+   cross-cluster token replay fails inside the same realm), the
+   `KeycloakRealmGroup prod-a`, and wires the tenant kube-apiserver
    via `KamajiControlPlane.spec.apiServer.extraArgs`.
-3. A post-install Job applies a `ClusterRoleBinding` inside the tenant
-   cluster, binding the realm group to the built-in `cluster-admin`
-   ClusterRole. Operators grant or revoke access by adding or removing
-   users from the Keycloak group.
-
-Operators do **not** need to pre-toggle `Tenant.spec.oidc.enabled` —
-the parent `apps/tenant` chart auto-provisions the realm when any
-child `Kubernetes` CR requests OIDC.
+3. A post-install Job (`kubernetes-prod-a-oidc-rbac`) applies a
+   `ClusterRoleBinding` inside the tenant cluster, binding the realm
+   group `prod-a` to the built-in `cluster-admin` ClusterRole.
+   Operators grant or revoke access by adding or removing users from
+   the Keycloak group.
+
+Operators do **not** need to pre-toggle `Tenant.spec.oidc.enabled`
+during normal operation — the parent `apps/tenant` chart auto-detects
+child Kubernetes CRs with `oidc.enabled: true` and provisions the
+realm automatically. The explicit `Tenant.spec.oidc.enabled=true` is
+only useful as a manual cleanup workaround (see Limitations below).
 
 ## Prerequisites
 
@@ -76,22 +84,31 @@ spec:
     enabled: true
 ```
 
-Within ≤ 5 minutes:
+Each chart in the chain reconciles on its own loop (default interval
+5 minutes), so the full cascade takes up to ~10 minutes worst case
+from a cold start:
 
-- The `apps/tenant` reconcile creates `ClusterKeycloakRealm tenant-acme`.
-- The `apps/kubernetes` reconcile picks up the realm, provisions the
-  per-cluster `KeycloakClient kubernetes-prod-a`, the realm group
-  `prod-a`, and adds OIDC flags to the kube-apiserver.
-- A post-install Job binds `Group prod-a` to `cluster-admin` inside the
-  tenant cluster.
+1. The `apps/tenant` reconcile creates `ClusterKeycloakRealm tenant-acme`
+   and publishes `_namespace.oidc-realm=tenant-acme` to the tenant's
+   `cozystack-values` Secret.
+2. The `apps/kubernetes` reconcile picks up the realm, provisions the
+   per-cluster `KeycloakClient kubernetes-prod-a`, the realm group
+   `prod-a`, and adds OIDC flags to the kube-apiserver.
+3. The post-install Job binds `Group prod-a` to `cluster-admin` inside
+   the tenant cluster.
+
+Until step 1 completes, `apps/kubernetes` renders a
+`kubernetes-prod-a-awaiting-oidc-realm` ConfigMap beacon in the tenant
+namespace and the kube-apiserver runs without OIDC arguments — the
+client-cert (mTLS) admin kubeconfig stays usable throughout.
 
 ## Create a user and grant access
 
-In Keycloak (the tenant realm — e.g. `tenant-acme`):
+In Keycloak (the tenant realm — `tenant-acme`):
 
 1. Create a user, set a non-temporary password, mark email verified.
 2. Add the user to the realm group named after the cluster (`prod-a`).
-   One membership = full kubectl access to that cluster.
+   One membership = full `cluster-admin` access to that cluster.
 
 To revoke access, remove the user from the group.
 
@@ -104,15 +121,22 @@ brew install int128/kubelogin/kubelogin
 # or: kubectl krew install oidc-login
 ```
 
-The chart prints a ready-to-paste kubeconfig snippet in its
-`NOTES.txt`:
+Extract the cluster CA from the Kamaji admin kubeconfig Secret in the
+tenant namespace of the management cluster:
 
 ```bash
-helm get notes -n tenant-acme prod-a
+kubectl --context=mgmt -n tenant-acme \
+  get secret kubernetes-prod-a-admin-kubeconfig \
+  -o jsonpath='{.data.super-admin\.conf}' | base64 -d \
+  > /tmp/prod-a-admin.kubeconfig
+
+CA=$(awk '/certificate-authority-data/{print $2}' \
+  /tmp/prod-a-admin.kubeconfig)
 ```
 
-Or write it by hand — pasting the cluster CA from the admin
-kubeconfig Secret:
+Save the snippet below as `~/.kube/config-prod-a`, paste the value of
+`$CA` into `certificate-authority-data`, and run
+`export KUBECONFIG=~/.kube/config-prod-a`:
 
 ```yaml
 apiVersion: v1
@@ -121,7 +145,7 @@ clusters:
 - name: prod-a
   cluster:
     server: https://prod-a.acme.example.com:443
-    certificate-authority-data:
+    certificate-authority-data:
 contexts:
 - name: prod-a
   context:
@@ -154,29 +178,36 @@ whether any child `Kubernetes` CR has `spec.oidc.enabled=true`.
 Helm-controller does **not** re-render a chart when a `lookup` result
 changes — it only re-renders when the chart source artifact or the
 HelmRelease values change. Consequently, deleting the last
-`Kubernetes` CR with OIDC enabled does **not** trigger an `apps/tenant`
-re-render, and the orphan `ClusterKeycloakRealm` stays in the tenant
-namespace.
+`Kubernetes` CR with OIDC enabled does **not** trigger an
+`apps/tenant` re-render, and the orphan `ClusterKeycloakRealm` stays
+in the tenant namespace.
 
 To force cleanup, the operator can:
 
-- Explicitly toggle `Tenant.spec.oidc.enabled=true` and then back to
-  `false`. Each toggle changes the HelmRelease values, which triggers
-  a re-render with the up-to-date lookup result. After the second
+- Toggle `Tenant.spec.oidc.enabled=true` and then back to `false`.
+  Each toggle changes the HelmRelease values, which triggers a
+  re-render with the up-to-date lookup result. After the second
   toggle, the chart no longer renders the realm and Helm prunes it.
-- Or wait for the next platform upgrade that bumps any chart-affecting
-  source — the realm cleanup happens for free as a side effect.
+  This is the only legitimate use of `Tenant.spec.oidc.enabled` —
+  during normal operation the field stays at its default `false`.
+- Or wait for the next platform upgrade that bumps any
+  chart-affecting source — the realm cleanup happens for free as a
+  side effect of the new render.
 
 ### Self-signed Keycloak is not supported
 
 The tenant apiserver validates the OIDC issuer over HTTPS using the
 system trust store inside the Kamaji apiserver pod. If the platform
 Keycloak ingress uses a private CA, the apiserver fails the TLS
-handshake and all OIDC logins return 401. The chart does not currently
-expose a `caBundle` field — public DNS with a valid certificate (e.g.
-via cert-manager + Let's Encrypt) is required. See
-[Self-signed certificates]({{< relref "self-signed-certificates.md" >}})
-for the management-cluster workaround pattern.
+handshake and all OIDC logins return 401. The chart does not expose a
+`caBundle` field for the tenant apiserver — public DNS with a valid
+certificate (e.g. via cert-manager + Let's Encrypt) is required.
+
+Note: [Self-signed certificates]({{< relref "self-signed-certificates.md" >}})
+covers the workaround for the **management cluster** apiserver only.
+That workaround does **not** apply to tenant apiservers because their
+machine config is managed by Kamaji, not by the operator's Talos /
+talm flow.
 
 ### JWT claims are not configurable
 
@@ -185,21 +216,23 @@ for the management-cluster workaround pattern.
 defaults; deployments using non-default claim mappings need a chart
 change.
 
-### Runtime toggle of `oidc.enabled` from `true` to `false`
+### Runtime oidc.enabled toggle does not clean up bindings
 
 Helm hooks only fire on install / upgrade / delete, not on values
 changes. If an operator flips `Kubernetes.spec.oidc.enabled` from
 `true` to `false`, the chart stops rendering the in-cluster
 `ClusterRoleBinding` Job but the existing binding inside the tenant
-cluster is not removed. The apiserver also drops the OIDC arguments on
-the next reconcile, so the orphan binding is inert (no realm group
-matches against the now-disabled OIDC path). Manual cleanup:
+cluster is not removed. The apiserver also drops the OIDC arguments
+on the next reconcile, so the binding is inert — no realm group can
+match it once OIDC is off. Manual cleanup:
 
 ```bash
-kubectl --kubeconfig= delete clusterrolebinding \
-  --selector cozystack.io/oidc-cluster=
+KUBECONFIG=/tmp/prod-a-admin.kubeconfig kubectl delete clusterrolebinding \
+  --selector cozystack.io/oidc-cluster=prod-a
 ```
 
+(reuse the admin kubeconfig extracted in the "Wire kubectl" section).
+
 ### CI / headless access requires manual KeycloakClient patch
 
 The chart-rendered `KeycloakClient` is public and does **not** enable
@@ -208,7 +241,7 @@ browser-flow logins. For CI pipelines that need a non-interactive
 token, the cluster-admin can patch the client on the live cluster:
 
 ```bash
-kubectl -n patch keycloakclient kubernetes- \
+kubectl --context=mgmt -n tenant-acme patch keycloakclient kubernetes-prod-a \
   --type=merge --patch '{"spec":{"directAccess":true}}'
 ```
 
@@ -219,39 +252,45 @@ recommended for human users.
 
 ### Apiserver returns 401 with a valid token
 
-Check the apiserver flags in the Kamaji pod:
+Check the apiserver flags in the Kamaji pod (in the management
+cluster):
 
 ```bash
-kubectl --context=mgmt -n get pod \
-  -l kamaji.clastix.io/name= \
+kubectl --context=mgmt -n tenant-acme get pod \
+  -l kamaji.clastix.io/name=kubernetes-prod-a \
   -o jsonpath='{.items[0].spec.containers[?(@.name=="kube-apiserver")].args}' | \
   tr ',' '\n' | grep oidc
 ```
 
 Confirm the issuer URL matches the realm — decode the id_token and
 compare `iss` against the `--oidc-issuer-url` flag. Confirm `aud` in
-the token equals `--oidc-client-id`; mismatch is the most common cause
-when running multiple clusters in the same realm.
+the token equals `--oidc-client-id`; mismatch is the most common
+cause when running multiple clusters in the same realm.
 
 ### Apiserver returns 403 for a user that is in the right group
 
-Check the in-cluster `ClusterRoleBinding`:
+Extract the admin kubeconfig and check the in-cluster
+`ClusterRoleBinding`:
 
 ```bash
-kubectl --kubeconfig= get clusterrolebinding \
-  --selector cozystack.io/oidc-cluster=
+kubectl --context=mgmt -n tenant-acme \
+  get secret kubernetes-prod-a-admin-kubeconfig \
+  -o jsonpath='{.data.super-admin\.conf}' | base64 -d \
+  > /tmp/prod-a-admin.kubeconfig
+
+KUBECONFIG=/tmp/prod-a-admin.kubeconfig kubectl get clusterrolebinding \
+  --selector cozystack.io/oidc-cluster=prod-a
 ```
 
 The bootstrap Job runs as a `post-install` / `post-upgrade` hook;
 check its logs in the management cluster:
 
 ```bash
-kubectl --context=mgmt -n logs \
-  job/-oidc-rbac
+kubectl --context=mgmt -n tenant-acme logs \
+  job/kubernetes-prod-a-oidc-rbac
 ```
 
 ### Realm or scope objects stuck after CR deletion
 
-See [Realm cleanup is not automatic](#realm-cleanup-is-not-automatic-after-the-last-child-oidc-cluster-is-removed)
-under Limitations. Operator intervention required (toggle
-`Tenant.spec.oidc`).
+See "Realm cleanup is not automatic" under Limitations. Operator
+intervention required (toggle `Tenant.spec.oidc`).
From a542777087f5d79766dcb5c7660c9af136a960e6 Mon Sep 17 00:00:00 2001
From: IvanHunters
Date: Thu, 25 Jun 2026 23:41:35 +0300
Subject: [PATCH 3/3] docs(operations): rewrite tenant OIDC page for the
 tenant-module flow
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous version of this page described the inline-auto-provision
architecture where apps/tenant rendered the realm directly and used
Helm lookup to detect child Kubernetes CRs. That design is gone —
realm provisioning now lives in extra/oidc, gated by a plain
Tenant.spec.oidc bool (same shape as etcd / monitoring / ingress).

Rewrite covers:

* The tenant-module pattern (Tenant.spec.oidc=true → apps/tenant
  renders an `oidc` HR → extra/oidc provisions realm + admin user +
  keycloak-admin Secret).
* The keycloak-admin Secret in the tenant namespace (url + username +
  password + realm), surfaced through the dashboard via
  spec.secrets.include.
* Realm inheritance — descendant tenants inherit the parent's realm
  through _namespace.oidc-realm; realm-wide unique Keycloak
  identifiers (-kubernetes-) prevent sibling collisions.
* Identity-admin delegation living with the realm-owning tenant only.
* Limitation: disabling parent OIDC while descendant clusters use the
  inherited realm — eventually consistent within one helm-controller
  reconcile interval.

Pairs with cozystack#3044 commit 2e52384c1.

Signed-off-by: IvanHunters
---
 .../next/operations/oidc/tenant_clusters.md   | 225 ++++++++++++------
 1 file changed, 150 insertions(+), 75 deletions(-)

diff --git a/content/en/docs/next/operations/oidc/tenant_clusters.md b/content/en/docs/next/operations/oidc/tenant_clusters.md
index 2a8914ee..56732d9c 100644
--- a/content/en/docs/next/operations/oidc/tenant_clusters.md
+++ b/content/en/docs/next/operations/oidc/tenant_clusters.md
@@ -20,36 +20,77 @@ name is `kubernetes-prod-a`), and the platform root host is
 
 ## Overview
 
-Each Cozystack tenant gets its own dedicated Keycloak realm for its
-kube-apiservers and tenant-scoped applications. The management
-identity domain (the `cozy` realm) stays separated from per-tenant
-identity. Users granted access to a tenant cluster live in that
-tenant's realm only — they cannot accidentally log into the management
-cluster or another tenant.
-
-When the `Kubernetes` CR opts into OIDC (`spec.oidc.enabled: true`):
-
-1. The `apps/tenant` chart auto-provisions a `ClusterKeycloakRealm`
-   named after the tenant (`tenant-acme`) and the standard `groups`
-   `KeycloakClientScope` inside it. The realm name is published as
-   `_namespace.oidc-realm` in the tenant's `cozystack-values` Secret
-   so descendants and apps pick it up.
-2. The `apps/kubernetes` chart creates a per-cluster public
-   `KeycloakClient kubernetes-prod-a` (with its own audience scope so
-   cross-cluster token replay fails inside the same realm), the
-   `KeycloakRealmGroup prod-a`, and wires the tenant kube-apiserver
-   via `KamajiControlPlane.spec.apiServer.extraArgs`.
+Per-tenant identity is delivered through a tenant-module: when a
+tenant opts in via `Tenant.spec.oidc=true`, the platform provisions a
+dedicated Keycloak realm + a realm-admin user + a `keycloak-admin`
+Secret with the credentials. Tenant operators self-manage users in
+their realm; child Kubernetes CRs and tenant-scoped applications wire
+themselves to the realm declaratively through cozystack manifests.
+
+The management identity domain (the `cozy` realm) stays separated from
+per-tenant identity. Users granted access to a tenant cluster live in
+that tenant's realm only — they cannot accidentally log into the
+management cluster or another tenant.
+
+When `Tenant.spec.oidc=true` is set:
+
+1. `apps/tenant` renders an `oidc` HelmRelease in the tenant namespace
+   (same tenant-module pattern as `etcd`, `monitoring`, `ingress`).
+2. The HR resolves to the `extra/oidc` chart, which provisions
+   `ClusterKeycloakRealm tenant-acme`, the standard `groups`
+   `KeycloakClientScope`, a `KeycloakRealmUser admin` with the
+   built-in `realm-admin` client role on `realm-management`, and a
+   `Secret keycloak-admin` carrying the URL + username + password the
+   tenant operator uses to log into the realm's admin console.
+3. The realm name is published to `_namespace.oidc-realm` in the
+   tenant's `cozystack-values` Secret so descendant tenants and
+   tenant-scoped apps inherit it the same way they inherit
+   `etcd` / `monitoring` / `ingress`.
+
+When a Kubernetes CR opts into OIDC (`spec.oidc.enabled: true`):
+
+1. `apps/kubernetes` reads `_namespace.oidc-realm` (own OR inherited).
+   If empty, the chart soft-renders an
+   `-awaiting-oidc-realm` ConfigMap beacon and the
+   kube-apiserver runs without OIDC arguments — the client-cert (mTLS)
+   admin kubeconfig stays usable.
+2. Once the realm name is non-empty, the chart creates a per-cluster
+   public `KeycloakClient tenant-acme-kubernetes-prod-a` (with its own
+   audience scope so cross-cluster token replay fails inside the same
+   realm), creates `KeycloakRealmGroup tenant-acme-kubernetes-prod-a`,
+   and wires the tenant kube-apiserver via
+   `KamajiControlPlane.spec.apiServer.extraArgs`.
 3. A post-install Job (`kubernetes-prod-a-oidc-rbac`) applies a
    `ClusterRoleBinding` inside the tenant cluster, binding the realm
-   group `prod-a` to the built-in `cluster-admin` ClusterRole.
- Operators grant or revoke access by adding or removing users from - the Keycloak group. - -Operators do **not** need to pre-toggle `Tenant.spec.oidc.enabled` -during normal operation — the parent `apps/tenant` chart auto-detects -child Kubernetes CRs with `oidc.enabled: true` and provisions the -realm automatically. The explicit `Tenant.spec.oidc.enabled=true` is -only useful as a manual cleanup workaround (see Limitations below). + group `tenant-acme-kubernetes-prod-a` to the built-in + `cluster-admin` ClusterRole. Operators grant or revoke access by + adding or removing users from the Keycloak group. + +## Realm inheritance + +Descendant tenants without their own `spec.oidc=true` inherit the +nearest ancestor's realm name through the cozystack-values +propagation chain. Their `Kubernetes` CRs wire against the ancestor's +realm; the chart renders the per-cluster `KeycloakClient` and +`KeycloakRealmGroup` into the descendant's own namespace, with +`realmRef` pointing at the ancestor's cluster-scoped realm. + +| Tenant | `spec.oidc` | `_namespace.oidc-realm` | +| --- | --- | --- | +| `tenant-acme` | `true` | `tenant-acme` (owns) | +| `tenant-acme-prod` | `false` | `tenant-acme` (inherited) | +| `tenant-acme-prod-eu` | `false` | `tenant-acme` (inherited via chain) | +| `tenant-acme-staging` | `true` | `tenant-acme-staging` (owns — override) | + +Realm-wide unique names prevent collisions when two siblings under the +same parent realm each have a `Kubernetes` CR of the same +metadata.name — `tenant-acme-prod-kubernetes-dev` and +`tenant-acme-staging-kubernetes-dev` are distinct identifiers in the +shared `tenant-acme` realm. + +Identity-admin delegation lives with the realm-owning tenant only: +only that tenant gets the `keycloak-admin` Secret. Descendants +consume the realm declaratively but do not gain admin access to it. ## Prerequisites @@ -61,7 +102,43 @@ only useful as a manual cleanup workaround (see Limitations below). 
issuer over HTTPS using its system trust store — self-signed Keycloak deployments are not supported (see Limitations). -## Enable OIDC on a tenant cluster +## Enable OIDC on a tenant + +```yaml +apiVersion: apps.cozystack.io/v1alpha1 +kind: Tenant +metadata: + name: acme + namespace: tenant-root +spec: + etcd: true + ingress: true + oidc: true +``` + +On the next `apps/tenant` reconcile (default interval 5 min) the +`oidc` HR appears in `tenant-acme`. The `extra/oidc` chart then takes +1-2 minutes to provision the realm + admin user + Secret. Open the +`keycloak-admin` Secret through the cozystack dashboard or kubectl to +grab the realm admin URL + credentials: + +```bash +kubectl --context=mgmt -n tenant-acme get secret keycloak-admin -o yaml +``` + +The Secret carries: + +- `url` — the admin console URL (e.g. `https://keycloak.acme.example.com/admin/tenant-acme/console/`) +- `username` — `admin` +- `password` — random alphanumeric, stable across re-renders +- `realm` — `tenant-acme` +- `email` — `admin@tenant-acme.local` (or operator override) + +The tenant operator logs into the admin URL with these credentials and +manages users, groups, identity providers, password policies inside +their realm — independently of the platform admin. + +## Enable OIDC on a tenant Kubernetes cluster ```yaml apiVersion: apps.cozystack.io/v1alpha1 @@ -84,31 +161,29 @@ spec: enabled: true ``` -Each chart in the chain reconciles on its own loop (default interval -5 minutes), so the full cascade takes up to ~10 minutes worst case -from a cold start: - -1. The `apps/tenant` reconcile creates `ClusterKeycloakRealm tenant-acme` - and publishes `_namespace.oidc-realm=tenant-acme` to the tenant's - `cozystack-values` Secret. -2. The `apps/kubernetes` reconcile picks up the realm, provisions the - per-cluster `KeycloakClient kubernetes-prod-a`, the realm group - `prod-a`, and adds OIDC flags to the kube-apiserver. -3. 
The post-install Job binds `Group prod-a` to `cluster-admin` inside - the tenant cluster. +Each chart in the chain reconciles on its own loop (default 5 min), +so the full cascade takes up to ~10 minutes worst case from a cold +start: -Until step 1 completes, `apps/kubernetes` renders a -`kubernetes-prod-a-awaiting-oidc-realm` ConfigMap beacon in the tenant -namespace and the kube-apiserver runs without OIDC arguments — the -client-cert (mTLS) admin kubeconfig stays usable throughout. +1. `apps/tenant` reconcile creates the `oidc` HR; `extra/oidc` + provisions `ClusterKeycloakRealm tenant-acme` and publishes + `_namespace.oidc-realm=tenant-acme` to cozystack-values. +2. `apps/kubernetes` reconcile picks up the realm, provisions the + per-cluster `KeycloakClient tenant-acme-kubernetes-prod-a`, the + realm group `tenant-acme-kubernetes-prod-a`, and adds OIDC flags + to the kube-apiserver. +3. The post-install Job binds `Group tenant-acme-kubernetes-prod-a` + to `cluster-admin` inside the tenant cluster. ## Create a user and grant access In Keycloak (the tenant realm — `tenant-acme`): 1. Create a user, set a non-temporary password, mark email verified. -2. Add the user to the realm group named after the cluster (`prod-a`). - One membership = full `cluster-admin` access to that cluster. +2. Add the user to the realm group named after the cluster: the + tenant namespace and the cluster name joined by `-kubernetes-` (e.g. + `tenant-acme-kubernetes-prod-a`). One membership = full + `cluster-admin` access to that cluster. To revoke access, remove the user from the group. @@ -162,7 +237,7 @@ users: - oidc-login - get-token - --oidc-issuer-url=https://keycloak.acme.example.com/realms/tenant-acme - - --oidc-client-id=kubernetes-prod-a + - --oidc-client-id=tenant-acme-kubernetes-prod-a ``` Running `kubectl get pods` opens the browser, logs the user into @@ -171,29 +246,6 @@ on the `groups` claim.
## Limitations -### Realm cleanup is not automatic after the last child OIDC cluster is removed - -The `apps/tenant` chart uses Helm's `lookup` function to discover -whether any child `Kubernetes` CR has `spec.oidc.enabled=true`. -Helm-controller does **not** re-render a chart when a `lookup` result -changes — it only re-renders when the chart source artifact or the -HelmRelease values change. Consequently, deleting the last -`Kubernetes` CR with OIDC enabled does **not** trigger an -`apps/tenant` re-render, and the orphan `ClusterKeycloakRealm` stays -in the tenant namespace. - -To force cleanup, the operator can: - -- Toggle `Tenant.spec.oidc.enabled=true` and then back to `false`. - Each toggle changes the HelmRelease values, which triggers a - re-render with the up-to-date lookup result. After the second - toggle, the chart no longer renders the realm and Helm prunes it. - This is the only legitimate use of `Tenant.spec.oidc.enabled` — - during normal operation the field stays at its default `false`. -- Or wait for the next platform upgrade that bumps any - chart-affecting source — the realm cleanup happens for free as a - side effect of the new render. - ### Self-signed Keycloak is not supported The tenant apiserver validates the OIDC issuer over HTTPS using the @@ -241,13 +293,33 @@ browser-flow logins. For CI pipelines that need a non-interactive token, the cluster-admin can patch the client on the live cluster: ```bash -kubectl --context=mgmt -n tenant-acme patch keycloakclient kubernetes-prod-a \ +kubectl --context=mgmt -n tenant-acme patch keycloakclient tenant-acme-kubernetes-prod-a \ --type=merge --patch '{"spec":{"directAccess":true}}' ``` This is intentionally not the default — interactive flow stays recommended for human users. 
+### Disabling parent OIDC while descendant clusters use the inherited realm + +If a parent tenant flips `spec.oidc=false` while descendant tenants +still have `Kubernetes` CRs with `spec.oidc.enabled=true` referencing +the parent's realm, convergence takes up to one helm-controller +reconcile interval (default 5 min): + +1. Parent's `oidc` HR uninstalls — realm + admin user + + `keycloak-admin` Secret are removed. +2. `_namespace.oidc-realm` in descendant cozystack-values reverts to + empty on the next tenant reconcile. +3. Descendant's `apps/kubernetes` reconciles, drops the OIDC apiserver + args, and prunes the per-cluster KeycloakClient + RealmGroup CRs. + +During the window, descendant KeycloakClient / RealmGroup CRs +reference a deleted realm — the EDP Keycloak Operator logs errors but +does not damage cluster state. OIDC tokens stop working immediately; +the per-cluster client-cert admin kubeconfig remains usable as the +recovery path. + ## Troubleshooting ### Apiserver returns 401 with a valid token @@ -290,7 +362,10 @@ kubectl --context=mgmt -n tenant-acme logs \ job/kubernetes-prod-a-oidc-rbac ``` -### Realm or scope objects stuck after CR deletion +### Realm or scope objects stuck after Tenant.spec.oidc=false -See "Realm cleanup is not automatic" under Limitations. Operator -intervention required (toggle `Tenant.spec.oidc`). +Flux uninstalls the `oidc` HR on the next tenant reconcile, which +drops the realm + scope + user + Secret automatically — no orphan +workaround required (unlike the previous design where realm +provisioning was inline in `apps/tenant`). If a stale object persists, +check the Keycloak Operator's logs for reconciliation errors.
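Before reading operator logs, it can help to confirm from the management cluster which objects are actually left behind. The sketch below is illustrative only: `check_oidc_leftovers` is a hypothetical helper name (not part of Cozystack), and it assumes the HelmRelease is named `oidc`, the realm is named after the tenant namespace, and the Secret is named `keycloak-admin`, as in the examples on this page.

```bash
# Hypothetical helper: list what is left of a tenant's OIDC objects.
# Assumes the HelmRelease is named "oidc" and the realm carries the
# tenant namespace name, as in the examples on this page.
check_oidc_leftovers() {
  local ctx="$1" ns="$2"
  # HelmRelease that owns the realm, scope, admin user and Secret.
  kubectl --context="$ctx" -n "$ns" get helmrelease oidc --ignore-not-found
  # Cluster-scoped realm object.
  kubectl --context="$ctx" get clusterkeycloakrealm "$ns" --ignore-not-found
  # Admin credentials Secret published for the tenant operator.
  kubectl --context="$ctx" -n "$ns" get secret keycloak-admin --ignore-not-found
}
```

Calling `check_oidc_leftovers mgmt tenant-acme` should print nothing once cleanup has converged; any object it does list is a candidate for the operator-log check above.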