Skip to content

design-proposal: per-tenant Keycloak realms for tenant Kubernetes cluster OIDC#24

Draft
IvanHunters wants to merge 3 commits into
cozystack:mainfrom
IvanHunters:design/tenant-oidc-per-tenant-realm
Draft

design-proposal: per-tenant Keycloak realms for tenant Kubernetes cluster OIDC#24
IvanHunters wants to merge 3 commits into
cozystack:mainfrom
IvanHunters:design/tenant-oidc-per-tenant-realm

Conversation

@IvanHunters

Copy link
Copy Markdown

Summary

Design proposal companion to cozystack#3044.

The proposal answers the architectural decision raised in that PR's review: what is the identity unit for tenant Kubernetes cluster OIDC. This document picks per-tenant Keycloak realm (tenant-<ns>), with per-cluster public clients providing audience-bound token isolation, and explains why that is structurally different from using the flat cozy realm. It is intentionally written to compare both options head-on under Alternatives considered, so the trade-off is on the record before the implementation PR lands.

What the proposal covers

  • Identity model. Per-tenant Keycloak realm tenant-<ns> owned by the tenant admin, separate from the platform-admin cozy realm. Provisioned declaratively via Tenant.spec.oidc=true.
  • Per-cluster isolation. A KeycloakClient and a KeycloakClientScope per tenant cluster inside the tenant's realm; id_token.aud matches the per-cluster --oidc-client-id, so tokens are non-replayable across clusters.
  • Default RBAC. Two-tier view + admin KeycloakRealmGroup per cluster, bound inside the tenant cluster to upstream ClusterRole/view and ClusterRole/cluster-admin respectively. No blanket cluster-admin grant; the static admin kubeconfig remains as the documented break-glass path.
  • Realm propagation. Through cozystack-basics's static _namespace.oidc-realm emission, the same channel that already propagates DNS suffix and ingress class. Explicitly no Helm lookup is used.
  • OIDC kubeconfig Secret. A kubernetes-<cluster>-oidc-kubeconfig Secret in the management cluster's tenant namespace, exposed via cozyrds spec.secrets.include, carrying a ready-to-use kubeconfig with a kubectl oidc-login exec auth block.
  • Lifecycle. Create, inherit (child tenant), add a cluster, delete a cluster, disable OIDC on a tenant. All declarative.
  • Open questions and follow-ups. BYO-OIDC via --authentication-config on KamajiControlPlane, RFC 8693 token exchange via a custom credential plugin, and a cozystack CLI helper. All called out as separate future proposals.
  • Alternatives considered. Flat cozy realm + per-cluster client, BYO-OIDC only, EKS-style flat IAM + webhook, RFC 8693 token exchange plugin, embedding into the existing admin-kubeconfig Secret, single global client. Each with a short rationale for the rejection.

Status

Draft. Looking for review on the identity-unit decision and on the open questions before the cozystack#3044 implementation lands.

…ster OIDC

Companion design proposal to cozystack/cozystack#3044.

Central decision: per-tenant Keycloak realm (tenant-<ns>) as the identity
unit for tenant Kubernetes cluster OIDC, with per-cluster public clients
providing audience-bound token isolation. The platform-admin cozy realm
and tenant user directories serve different populations and trust models;
this proposal keeps them separate while delivering per-cluster isolation
through audience binding rather than a separate directory.

Covers realm provisioning via the EDP Keycloak operator, propagation
through the cozystack-basics namespace-values channel, per-cluster
KeycloakClient + view/admin KeycloakRealmGroups, KamajiControlPlane
oidc flag wiring, an OIDC kubeconfig Secret exposed via cozyrds, plus
lifecycle, rollout, security, failure modes, testing, and alternatives
considered (including the flat cozy + per-cluster-client path).

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eeaa7cbc-2289-4900-9f01-0d8b054d28de

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a design proposal for implementing per-tenant Keycloak realms to back OIDC authentication for tenant Kubernetes clusters in Cozystack. The feedback suggests clarifying that the bootstrap Job runs in the management cluster rather than inside the tenant cluster itself. Additionally, it warns that attempting to delete ClusterRoleBindings inside the tenant cluster during a pre-delete hook is redundant and could block Helm release deletion if the control plane becomes unreachable.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.


Tenant admin assigns Keycloak users to whichever group is appropriate through the standard Keycloak admin UI inside `tenant-<ns>`. Group membership ends up as `groups:` claim in the id_token.

Inside the tenant Kubernetes cluster, a bootstrap Job (Helm `post-install,post-upgrade` hook, ServiceAccount scoped to the tenant namespace, pod labeled `policy.cozystack.io/allow-to-apiserver: "true"` so Cilium permits the egress to the management apiserver) applies two `ClusterRoleBinding`s:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The phrasing 'Inside the tenant Kubernetes cluster, a bootstrap Job...' is misleading. Based on the context (the ServiceAccount is scoped to the tenant namespace in the management cluster, and it later writes a Secret to the management namespace), this Job actually runs in the management cluster and targets the tenant cluster's API. Clarifying this to 'A bootstrap Job running in the management cluster's tenant namespace...' would prevent confusion about where the Job is scheduled and executed.

- **Create**: `Tenant.spec.oidc=true` → realm appears → cozystack-basics emits realm into namespace values → all `Kubernetes` CRs in the namespace re-render with OIDC wired in on next Flux interval.
- **Inherit**: child tenant with `oidc=false` reads parent's `_namespace.oidc-realm` from its own namespace values. Same wiring as create.
- **Add a cluster**: a new `Kubernetes` CR in an OIDC-enabled tenant gets its own client+groups+CRBs+Secret automatically. No tenant-admin intervention.
- **Delete a cluster**: Helm `pre-delete` hook Job removes the per-cluster CRBs inside the tenant cluster, removes the OIDC kubeconfig Secret in the management cluster, and lets Flux garbage-collect the per-cluster `KeycloakClient` and `KeycloakRealmGroup`s.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Attempting to delete ClusterRoleBindings inside the tenant cluster during a pre-delete hook is redundant because the entire tenant cluster is being destroyed. Furthermore, if the tenant cluster's control plane is already partially torn down or unreachable, this hook could hang or fail, blocking the deletion of the Helm release. Consider limiting the cleanup Job to removing management-cluster resources like the OIDC kubeconfig Secret.

Grafana OIDC for tenants is a separate, unrelated initiative and was
incorrectly listed here as an in-flight companion. The Rollout section
is shortened accordingly.

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
…hentication config

The chart in cozystack#3044 was reworked to wire the apiserver via a
mounted AuthenticationConfiguration (apiserver.config.k8s.io/v1beta1)
rather than the legacy four --oidc-* flags. The proposal is updated to
match:

- Layer 5 now describes the three-piece wiring: --authentication-config
  flag + extraVolumeMounts on apiServer + extraVolumes on deployment,
  sourced from a per-cluster Secret carrying the structured config.
- Goals gain a bullet covering structured (not flag-based) authn so the
  forward path to BYO-OIDC and private-CA issuers is explicit up front.
- Context > KamajiControlPlane reflects that extraVolumes /
  extraVolumeMounts are no longer 'matters for the follow-up', they are
  used by this proposal.
- Open questions > BYO-OIDC narrows to the remaining design surface
  (chart API for a BYO JWTAuthenticator, audience composition).
- Alternatives considered gains the flag-based path entry with the
  rationale for rejecting it (one-issuer cap, no private-CA, future
  rewrite cost).

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>

@kvaps Andrei Kvapil (kvaps) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up — the doc is thorough and the "Alternatives considered" section is genuinely useful as the record for the per-tenant-realm direction. The structured --authentication-config delivery mechanism (Layer 5) and its rationale (multi-issuer, private-CA, additive BYO) are the right call and match where we want to land.

My main concern is alignment with today's two OIDC syncs, which settled the phasing after this doc was written. The agreed shape was:

  • Phase 1: authenticate tenant-cluster users via the existing platform cozy realm + per-cluster public client for audience isolation, plus the ability to pass a BYO OIDC/authentication config into the cluster. No groups, no auto-provisioning into Keycloak. API as a selector, e.g. oidc: System | CustomConfig | None.
  • Phase 2 (deferred, may not be needed): per-tenant Keycloak realm / "managed IdP", as a separate, fully-designed product (oidc: ... | KeycloakRealm | ...).

As written, the proposal does the inverse of that split, so I'd ask to realign it before the #3044 implementation lands:

  1. Identity unit (Overview / Rollout). The doc picks per-tenant realm tenant-<ns> as the unit to implement now. Per the syncs that's Phase 2. Phase 1 is the shared cozy realm + per-cluster client.
  2. "Flat cozy realm + per-cluster public client" (Alternatives, L278) is rejected — but that's the option we chose for Phase 1. Note the rejection somewhat misframes it: Phase 1 does not put tenant end-users into cozy. It authenticates the platform users that already exist there, and BYO covers a company's external users. Could you re-evaluate it as the Phase-1 baseline rather than a rejected alternative?
  3. BYO-OIDC is marked non-goal/deferred (L18, Non-goals) — it's part of Phase 1. Passing a custom OIDC/authn config into the cluster is one of the two Phase-1 deliverables, not a follow-up. Layer 5 already mounts a structured config, so this is mostly a framing/scope change.
  4. Groups + auto-RBAC (Layer 4). Two KeycloakRealmGroups per cluster + auto ClusterRoleBindings is exactly the group/auto-provisioning machinery the syncs put out of Phase 1 ("no groups"; a users: map with a role is enough). Auto-creating clients/groups in the IdP at cluster-creation is the "deep integration = a product that needs serious design" we flagged — and it breaks the customer-Keycloak case where the tenant does not want us spawning groups in their realm. Suggest dropping it from Phase 1.
  5. Billing / central-Keycloak load is unaddressed. All tenant-<ns> realms live on the platform master ClusterKeycloak (L72). That's the load/billing blocker raised on the call. Please add a section covering it and the alternative of running Keycloak as an External App inside the tenant namespace (billing becomes trivial, it's just API + DB pods), which was the proposed mitigation.

Factual fixes about the current codebase (verified against main):

  • L81 — it's packages/system/cozystack-basics/, not packages/core/..., and the emission lives in templates/cozystack-values-secret.yaml; there is no _namespace.tpl.
  • L85_namespace.dns-suffix / _namespace.kubelet-image-credential-provider-config are not propagated today. The actual _namespace.* keys are host, etcd, ingress, gateway, monitoring, seaweedfs, schedulingClass. The real precedent is host/ingress.
  • L82 — there is no tenant-hierarchy walk-up today; propagation is single-level parent lookup (packages/apps/tenant/templates/namespace.yaml). Recursive inheritance (Layer 2) is new mechanism, not "the same channel that already exists" — worth calling out as new work.
  • L46, L278 — the cozy groups are not <ns>-admins / <ns>-users. They are <tenant>-view, <tenant>-use, <tenant>-admin, <tenant>-super-admin (packages/apps/tenant/templates/keycloakgroups.yaml). This also affects the "group collision" argument.
  • L35KamajiControlPlane in cozystack is controlplane.cluster.x-k8s.io/v1alpha1 (packages/apps/kubernetes/templates/cluster.yaml). tcp.kamaji.clastix.io is the TenantControlPlane group — a different CRD.
  • L33apps/kubernetes has no oidc value at all today (not a "no-op field").
  • L17 vs Layer 5 — Scope says --oidc-* wiring, Design uses --authentication-config. Please make these consistent.

None of this is a knock on the analysis — it's solid and I'd keep it as the Phase-2 design record. The ask is to re-frame the doc around the Phase-1/Phase-2 split we agreed on, fix the codebase references, and add the billing section. Happy to pair on the Phase-1 scope if useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants