Summary
With global.singleArgoCD: true, internal/controller/pattern_controller.go's main Reconcile calls createOrUpdateArgoCD(...) (internal/controller/argo.go) unconditionally on every reconcile pass:
// We only update the clusterwide argo instance so we can define our own 'initcontainers' section
err = createOrUpdateArgoCD(r.dynamicClient, r.fullClient, getClusterWideArgoName(), clusterWideNS, patternsOperatorConfig)
createOrUpdateArgoCD (argo.go) always builds a fresh hardcoded desired spec via newArgoCD(...) and, when the object already exists, calls a plain client.Resource(gvr).Namespace(namespace).Update(...) on it -- with no diff check against the live object beforehand:
} else { // update it
	oldArgo, oldUnstructured, errGet := getArgoCDFunc(client, name, namespace)
	...
	argo.SetResourceVersion(oldArgo.GetResourceVersion())
	...
	_, err = client.Resource(gvr).Namespace(namespace).Update(context.TODO(), newArgo, metav1.UpdateOptions{})
}
Because the operator also appears to react to changes on the ArgoCD object it just wrote (directly or transitively, e.g. via the Application/Pattern watch chain), each Update triggers another reconcile, which issues another Update. The result is a self-sustaining loop that runs far faster than the documented ReconcileLoopRequeueTime (180s) steady-state interval.
Impact (observed live on OpenShift, patterns-operator v0.0.77, gitops-operator v1.21.0)
- oc get argocd <vpArgoNamespace> -n <vpArgoNamespace> -o jsonpath='{.metadata.resourceVersion}' polled every ~10-15s shows the resourceVersion incrementing continuously with no other actor touching the object. This was confirmed via --show-managed-fields: the sole non-status managedFields entry for spec belongs to field-manager manager, operation: Update. That is the default name a plain typed/dynamic client Update receives when no explicit field manager is set (the API server derives it from the client's user agent), so the writes are not Server-Side Apply.
- Confirmed the root cause is this operator specifically (not ACM policies, not the clustergroup chart, not any local pattern chart) by scaling patterns-operator-controller-manager to 0 replicas: the ArgoCD CR's resourceVersion immediately stopped changing and stayed stable for 90+ seconds; after scaling back to 1 replica, the churn resumed within the next reconcile.
- openshift-gitops-operator's own controller reacts to each of these Update() calls by recomputing/reapplying the Deployments/StatefulSet it manages ("Updating StatefulSet ... updating volumes/container resources/container command/container env", "Updating Deployment 'vp-gitops-redis/repo-server/server/applicationset-controller' - updating volumes/...") roughly every 15-30s, which restarts the <name>-application-controller StatefulSet pod on that same cadence.
- Practical effect: Argo CD Application sync operations that are Running when the controller pod restarts get interrupted and can remain stuck in operationState.phase: Running (observed for over an hour on one Application in our environment) instead of completing, and status.sync.status never settles to Synced for several Applications even though the git repo has no outstanding diff.
- Separately (same root cause, different symptom): the spec.rbac set by newArgoCD()'s hardcoded baseline (defaultPolicy: role:readonly, a 3-line policy granting only system:cluster-admins/cluster-admins/admin -> role:admin, scopes: [groups,email]) always wins over anything a pattern's own chart tries to set on the same CR (even via ServerSideApply=true), because this operator does a plain Update on every reconcile. There is no values/Pattern-CR override hook for this baseline today. clustergroup-chart's clusterGroup.argoCD.rbac, added in v0.9.50, does not help here: it is a no-op for singleArgoCD, since clustergroup-chart's own templates/plumbing/argocd.yaml renders nothing at all when global.singleArgoCD=true.
Suggested fix
Before calling Update() in createOrUpdateArgoCD, compute a semantic diff between oldArgo.Spec (or the relevant subset this function owns) and the freshly-built argo.Spec, and skip the API call entirely when they already match. This alone should stop the self-sustaining loop, since there would be nothing left to re-trigger the watch after the first successful convergence.
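The guard described above could look roughly like the following. This is a minimal, standard-library-only sketch rather than the operator's actual code: desiredSubsetOf and needsUpdate are hypothetical helper names, and the specs are modelled as the map[string]interface{} content of an unstructured object. It treats the desired spec as the subset this function owns, so fields that exist only on the live object (for example values defaulted by another controller) do not force an Update. In the real code base, equality.Semantic.DeepDerivative from k8s.io/apimachinery may be a better fit than reflect.DeepEqual.

```go
package main

import (
	"fmt"
	"reflect"
)

// desiredSubsetOf reports whether every field set in desired is present with
// an equal value in live. Keys that exist only in live are ignored, so values
// added by other controllers do not count as drift.
func desiredSubsetOf(desired, live interface{}) bool {
	switch d := desired.(type) {
	case map[string]interface{}:
		l, ok := live.(map[string]interface{})
		if !ok {
			return false
		}
		for key, dv := range d {
			lv, found := l[key]
			if !found || !desiredSubsetOf(dv, lv) {
				return false
			}
		}
		return true
	default:
		// Scalars and slices are compared as a whole.
		return reflect.DeepEqual(desired, live)
	}
}

// needsUpdate is a hypothetical guard: call Update() only when it returns true.
func needsUpdate(desiredSpec, liveSpec map[string]interface{}) bool {
	return !desiredSubsetOf(desiredSpec, liveSpec)
}

func main() {
	desired := map[string]interface{}{
		"rbac": map[string]interface{}{"defaultPolicy": "role:readonly"},
	}
	live := map[string]interface{}{
		"rbac": map[string]interface{}{
			"defaultPolicy": "role:readonly",
			"scopes":        "[groups,email]",
		},
		// Present only on the live object; ignored by the subset check.
		"server": map[string]interface{}{"route": map[string]interface{}{"enabled": true}},
	}

	fmt.Println(needsUpdate(desired, live)) // false: already converged, skip Update

	live["rbac"].(map[string]interface{})["defaultPolicy"] = "role:admin"
	fmt.Println(needsUpdate(desired, live)) // true: an owned field drifted
}
```

A strict full-spec equality check would be simpler, but it could keep the loop alive if the live object ever carries fields that newArgoCD(...) does not set, which is why the sketch compares only the owned subset.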
As a secondary/independent improvement, consider exposing an override hook (mirroring clusterGroup.argoCD.rbac on the non-singleArgoCD path) for at least spec.rbac on the clusterwide instance, so patterns that need a local ArgoCD user with more than read-only access (e.g. an MCP-server-style integration) aren't stuck with the hardcoded 3-line baseline policy.
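One possible shape for such a hook is sketched below. Everything in it is an assumption made for illustration: the RBAC struct, baselineRBAC, and applyRBACOverride are hypothetical names, the policy text is paraphrased from the description in this report rather than copied from newArgoCD(), and the mcp-user subject is invented. The idea is only that an unset override leaves the baseline untouched, while each field that is set replaces the corresponding baseline field.

```go
package main

import "fmt"

// RBAC models only the three spec.rbac fields discussed in this report.
type RBAC struct {
	DefaultPolicy string
	Policy        string
	Scopes        string
}

// baselineRBAC approximates the hardcoded baseline described in the report.
// The exact policy lines are an assumption, not a copy of the operator source.
func baselineRBAC() RBAC {
	return RBAC{
		DefaultPolicy: "role:readonly",
		Policy: "g, system:cluster-admins, role:admin\n" +
			"g, cluster-admins, role:admin\n" +
			"g, admin, role:admin",
		Scopes: "[groups,email]",
	}
}

// applyRBACOverride returns base with every non-empty override field applied.
// A nil override keeps the baseline unchanged.
func applyRBACOverride(base RBAC, override *RBAC) RBAC {
	if override == nil {
		return base
	}
	merged := base
	if override.DefaultPolicy != "" {
		merged.DefaultPolicy = override.DefaultPolicy
	}
	if override.Policy != "" {
		merged.Policy = override.Policy
	}
	if override.Scopes != "" {
		merged.Scopes = override.Scopes
	}
	return merged
}

func main() {
	// Hypothetical override granting a local user more than read-only access.
	custom := &RBAC{Policy: "g, mcp-user, role:admin"}
	fmt.Printf("%+v\n", applyRBACOverride(baselineRBAC(), custom))
}
```

Whether an override should replace the baseline policy outright, as here, or be appended to it is a design choice the maintainers would need to make.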
Environment
- patterns-operator.v0.0.77 (catalog channel fast, currently the latest available)
- openshift-gitops-operator.v1.21.0
- global.singleArgoCD: true, single-cluster hub-only reproduction (also affects hub+spoke topologies)
- OpenShift 4.20