design-proposal: per-tenant Keycloak realms for tenant Kubernetes cluster OIDC#24
design-proposal: per-tenant Keycloak realms for tenant Kubernetes cluster OIDC#24IvanHunters wants to merge 3 commits into
Conversation
…ster OIDC Companion design proposal to cozystack/cozystack#3044. Central decision: per-tenant Keycloak realm (tenant-<ns>) as the identity unit for tenant Kubernetes cluster OIDC, with per-cluster public clients providing audience-bound token isolation. The platform-admin cozy realm and tenant user directories serve different populations and trust models; this proposal keeps them separate while delivering per-cluster isolation through audience binding rather than a separate directory. Covers realm provisioning via the EDP Keycloak operator, propagation through the cozystack-basics namespace-values channel, per-cluster KeycloakClient + view/admin KeycloakRealmGroups, KamajiControlPlane oidc flag wiring, an OIDC kubeconfig Secret exposed via cozyrds, plus lifecycle, rollout, security, failure modes, testing, and alternatives considered (including the flat cozy + per-cluster-client path). Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a design proposal for implementing per-tenant Keycloak realms to back OIDC authentication for tenant Kubernetes clusters in Cozystack. The feedback suggests clarifying that the bootstrap Job runs in the management cluster rather than inside the tenant cluster itself. Additionally, it warns that attempting to delete ClusterRoleBindings inside the tenant cluster during a pre-delete hook is redundant and could block Helm release deletion if the control plane becomes unreachable.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
|
||
| Tenant admin assigns Keycloak users to whichever group is appropriate through the standard Keycloak admin UI inside `tenant-<ns>`. Group membership ends up as `groups:` claim in the id_token. | ||
|
|
||
| Inside the tenant Kubernetes cluster, a bootstrap Job (Helm `post-install,post-upgrade` hook, ServiceAccount scoped to the tenant namespace, pod labeled `policy.cozystack.io/allow-to-apiserver: "true"` so Cilium permits the egress to the management apiserver) applies two `ClusterRoleBinding`s: |
There was a problem hiding this comment.
The phrasing 'Inside the tenant Kubernetes cluster, a bootstrap Job...' is misleading. Based on the context (the ServiceAccount is scoped to the tenant namespace in the management cluster, and it later writes a Secret to the management namespace), this Job actually runs in the management cluster and targets the tenant cluster's API. Clarifying this to 'A bootstrap Job running in the management cluster's tenant namespace...' would prevent confusion about where the Job is scheduled and executed.
| - **Create**: `Tenant.spec.oidc=true` → realm appears → cozystack-basics emits realm into namespace values → all `Kubernetes` CRs in the namespace re-render with OIDC wired in on next Flux interval. | ||
| - **Inherit**: child tenant with `oidc=false` reads parent's `_namespace.oidc-realm` from its own namespace values. Same wiring as create. | ||
| - **Add a cluster**: a new `Kubernetes` CR in an OIDC-enabled tenant gets its own client+groups+CRBs+Secret automatically. No tenant-admin intervention. | ||
| - **Delete a cluster**: Helm `pre-delete` hook Job removes the per-cluster CRBs inside the tenant cluster, removes the OIDC kubeconfig Secret in the management cluster, and lets Flux garbage-collect the per-cluster `KeycloakClient` and `KeycloakRealmGroup`s. |
There was a problem hiding this comment.
Attempting to delete ClusterRoleBindings inside the tenant cluster during a pre-delete hook is redundant because the entire tenant cluster is being destroyed. Furthermore, if the tenant cluster's control plane is already partially torn down or unreachable, this hook could hang or fail, blocking the deletion of the Helm release. Consider limiting the cleanup Job to removing management-cluster resources like the OIDC kubeconfig Secret.
Grafana OIDC for tenants is a separate, unrelated initiative and was incorrectly listed here as an in-flight companion. The Rollout section is shortened accordingly. Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
…hentication config The chart in cozystack#3044 was reworked to wire the apiserver via a mounted AuthenticationConfiguration (apiserver.config.k8s.io/v1beta1) rather than the legacy four --oidc-* flags. The proposal is updated to match: - Layer 5 now describes the three-piece wiring: --authentication-config flag + extraVolumeMounts on apiServer + extraVolumes on deployment, sourced from a per-cluster Secret carrying the structured config. - Goals gain a bullet covering structured (not flag-based) authn so the forward path to BYO-OIDC and private-CA issuers is explicit up front. - Context > KamajiControlPlane reflects that extraVolumes / extraVolumeMounts are no longer 'matters for the follow-up', they are used by this proposal. - Open questions > BYO-OIDC narrows to the remaining design surface (chart API for a BYO JWTAuthenticator, audience composition). - Alternatives considered gains the flag-based path entry with the rationale for rejecting it (one-issuer cap, no private-CA, future rewrite cost). Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Andrei Kvapil (kvaps)
left a comment
There was a problem hiding this comment.
Thanks for writing this up — the doc is thorough and the "Alternatives considered" section is genuinely useful as the record for the per-tenant-realm direction. The structured --authentication-config delivery mechanism (Layer 5) and its rationale (multi-issuer, private-CA, additive BYO) are the right call and match where we want to land.
My main concern is alignment with today's two OIDC syncs, which settled the phasing after this doc was written. The agreed shape was:
- Phase 1: authenticate tenant-cluster users via the existing platform
cozyrealm + per-cluster public client for audience isolation, plus the ability to pass a BYO OIDC/authentication config into the cluster. No groups, no auto-provisioning into Keycloak. API as a selector, e.g.oidc: System | CustomConfig | None. - Phase 2 (deferred, may not be needed): per-tenant Keycloak realm / "managed IdP", as a separate, fully-designed product (
oidc: ... | KeycloakRealm | ...).
As written, the proposal does the inverse of that split, so I'd ask to realign it before the #3044 implementation lands:
- Identity unit (Overview / Rollout). The doc picks per-tenant realm
tenant-<ns>as the unit to implement now. Per the syncs that's Phase 2. Phase 1 is the sharedcozyrealm + per-cluster client. - "Flat
cozyrealm + per-cluster public client" (Alternatives, L278) is rejected — but that's the option we chose for Phase 1. Note the rejection somewhat misframes it: Phase 1 does not put tenant end-users intocozy. It authenticates the platform users that already exist there, and BYO covers a company's external users. Could you re-evaluate it as the Phase-1 baseline rather than a rejected alternative? - BYO-OIDC is marked non-goal/deferred (L18, Non-goals) — it's part of Phase 1. Passing a custom OIDC/authn config into the cluster is one of the two Phase-1 deliverables, not a follow-up. Layer 5 already mounts a structured config, so this is mostly a framing/scope change.
- Groups + auto-RBAC (Layer 4). Two
KeycloakRealmGroups per cluster + autoClusterRoleBindings is exactly the group/auto-provisioning machinery the syncs put out of Phase 1 ("no groups"; ausers:map with a role is enough). Auto-creating clients/groups in the IdP at cluster-creation is the "deep integration = a product that needs serious design" we flagged — and it breaks the customer-Keycloak case where the tenant does not want us spawning groups in their realm. Suggest dropping it from Phase 1. - Billing / central-Keycloak load is unaddressed. All
tenant-<ns>realms live on the platform masterClusterKeycloak(L72). That's the load/billing blocker raised on the call. Please add a section covering it and the alternative of running Keycloak as an External App inside the tenant namespace (billing becomes trivial, it's just API + DB pods), which was the proposed mitigation.
Factual fixes about the current codebase (verified against main):
- L81 — it's
packages/system/cozystack-basics/, notpackages/core/..., and the emission lives intemplates/cozystack-values-secret.yaml; there is no_namespace.tpl. - L85 —
_namespace.dns-suffix/_namespace.kubelet-image-credential-provider-configare not propagated today. The actual_namespace.*keys arehost, etcd, ingress, gateway, monitoring, seaweedfs, schedulingClass. The real precedent ishost/ingress. - L82 — there is no tenant-hierarchy walk-up today; propagation is single-level parent lookup (
packages/apps/tenant/templates/namespace.yaml). Recursive inheritance (Layer 2) is new mechanism, not "the same channel that already exists" — worth calling out as new work. - L46, L278 — the
cozygroups are not<ns>-admins/<ns>-users. They are<tenant>-view,<tenant>-use,<tenant>-admin,<tenant>-super-admin(packages/apps/tenant/templates/keycloakgroups.yaml). This also affects the "group collision" argument. - L35 —
KamajiControlPlanein cozystack iscontrolplane.cluster.x-k8s.io/v1alpha1(packages/apps/kubernetes/templates/cluster.yaml).tcp.kamaji.clastix.iois theTenantControlPlanegroup — a different CRD. - L33 —
apps/kuberneteshas nooidcvalue at all today (not a "no-op field"). - L17 vs Layer 5 — Scope says
--oidc-*wiring, Design uses--authentication-config. Please make these consistent.
None of this is a knock on the analysis — it's solid and I'd keep it as the Phase-2 design record. The ask is to re-frame the doc around the Phase-1/Phase-2 split we agreed on, fix the codebase references, and add the billing section. Happy to pair on the Phase-1 scope if useful.
Summary
Design proposal companion to cozystack#3044.
The proposal answers the architectural decision raised in that PR's review: what is the identity unit for tenant Kubernetes cluster OIDC. This document picks per-tenant Keycloak realm (
tenant-<ns>), with per-cluster public clients providing audience-bound token isolation, and explains why that is structurally different from using the flatcozyrealm. It is intentionally written to compare both options head-on under Alternatives considered, so the trade-off is on the record before the implementation PR lands.What the proposal covers
tenant-<ns>owned by the tenant admin, separate from the platform-admincozyrealm. Provisioned declaratively viaTenant.spec.oidc=true.KeycloakClientand aKeycloakClientScopeper tenant cluster inside the tenant's realm;id_token.audmatches the per-cluster--oidc-client-id, so tokens are non-replayable across clusters.view+adminKeycloakRealmGroupper cluster, bound inside the tenant cluster to upstreamClusterRole/viewandClusterRole/cluster-adminrespectively. No blanket cluster-admin grant; the static admin kubeconfig remains as the documented break-glass path.cozystack-basics's static_namespace.oidc-realmemission, the same channel that already propagates DNS suffix and ingress class. Explicitly no Helmlookupis used.kubernetes-<cluster>-oidc-kubeconfigSecret in the management cluster's tenant namespace, exposed viacozyrdsspec.secrets.include, carrying a ready-to-use kubeconfig with akubectl oidc-loginexec auth block.--authentication-configonKamajiControlPlane, RFC 8693 token exchange via a custom credential plugin, and acozystackCLI helper. All called out as separate future proposals.cozyrealm + per-cluster client, BYO-OIDC only, EKS-style flat IAM + webhook, RFC 8693 token exchange plugin, embedding into the existing admin-kubeconfig Secret, single global client. Each with a short rationale for the rejection.Status
Draft. Looking for review on the identity-unit decision and on the open questions before the cozystack#3044 implementation lands.