singleArgoCD: createOrUpdateArgoCD unconditionally Update()s the singleton ArgoCD CR every reconcile, causing a self-sustaining fast reconcile loop and application-controller restarts #749

## Summary

With `global.singleArgoCD: true`, `internal/controller/pattern_controller.go`'s main `Reconcile` calls `createOrUpdateArgoCD(...)` (internal/controller/argo.go) unconditionally on every reconcile pass:

```go
// We only update the clusterwide argo instance so we can define our own 'initcontainers' section
err = createOrUpdateArgoCD(r.dynamicClient, r.fullClient, getClusterWideArgoName(), clusterWideNS, patternsOperatorConfig)
```

`createOrUpdateArgoCD` (argo.go) always builds a fresh hardcoded desired spec via `newArgoCD(...)`. When the object already exists, it calls a plain `client.Resource(gvr).Namespace(namespace).Update(...)` on it with **no diff check** against the live object beforehand:

```go
} else { // update it
    oldArgo, oldUnstructured, errGet := getArgoCDFunc(client, name, namespace)
    ...
    argo.SetResourceVersion(oldArgo.GetResourceVersion())
    ...
    _, err = client.Resource(gvr).Namespace(namespace).Update(context.TODO(), newArgo, metav1.UpdateOptions{})
}
```

Because the operator appears to also react to changes on the `ArgoCD` object it just wrote (directly or transitively, e.g. via the Application/Pattern watch chain), each Update triggers another reconcile, which issues another Update. The result is a self-sustaining loop that runs far faster than the documented 180s steady-state interval (`ReconcileLoopRequeueTime`).

## Impact (observed live on OpenShift, patterns-operator v0.0.77, gitops-operator v1.21.0)

- `oc get argocd <vpArgoNamespace> -n <vpArgoNamespace> -o jsonpath='{.metadata.resourceVersion}'` polled every ~10-15s shows the resourceVersion incrementing continuously with **no other actor touching the object**. This was confirmed via `--show-managed-fields`: the sole non-status managedFields entry for `spec` belongs to field-manager `manager` with `operation: Update`. That is what a plain typed/dynamic client Update (not ServerSideApply) from the operator's `manager` binary looks like, since the API server derives the default field-manager name from the client's user agent when none is set.
- Confirmed the root cause is this operator specifically (not ACM policies, not the `clustergroup` chart, not any local pattern chart) by scaling `patterns-operator-controller-manager` to 0 replicas: the `ArgoCD` CR's `resourceVersion` immediately stopped changing and stayed stable for 90+ seconds; scaling back to 1 replica, the churn resumed within the next reconcile.
- `openshift-gitops-operator`'s own controller reacts to each of these Update()s by recomputing/reapplying the Deployments/StatefulSet it manages ("Updating StatefulSet ... updating volumes/container resources/container command/container env", "Updating Deployment 'vp-gitops-redis/repo-server/server/applicationset-controller' - updating volumes/...") roughly every 15-30s, which restarts the `<name>-application-controller` StatefulSet pod on that same cadence.
- Practical effect: Argo CD `Application` sync operations that are `Running` when the controller pod restarts get interrupted and can remain stuck in `operationState.phase: Running` (observed for over an hour on one Application in our environment) instead of completing, and `status.sync.status` never settles to `Synced` for several Applications even though the git repo has no outstanding diff.
- Separately (same root cause, different symptom): `spec.rbac` set by `newArgoCD()`'s hardcoded baseline (`defaultPolicy: role:readonly`, a 3-line policy granting only `system:cluster-admins`/`cluster-admins`/`admin` -> `role:admin`, `scopes: [groups,email]`) always wins over anything a pattern's own chart tries to set on the same CR (even via `ServerSideApply=true`), since this operator does a plain Update every reconcile. There is no values/Pattern-CR override hook for this baseline today (unlike `clustergroup-chart`'s `clusterGroup.argoCD.rbac`, added in v0.9.50, which is a no-op for `singleArgoCD` since `clustergroup-chart`'s own `templates/plumbing/argocd.yaml` renders nothing at all when `global.singleArgoCD=true`).

## Suggested fix

Before calling `Update()` in `createOrUpdateArgoCD`, compute a semantic diff between `oldArgo.Spec` (or the relevant subset this function owns) and the freshly-built `argo.Spec`, and skip the API call entirely when they already match. This alone should stop the self-sustaining loop, since there would be nothing left to re-trigger the watch after the first successful convergence.
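A minimal sketch of such a guard in Go follows. It assumes both objects are available as unstructured `map[string]interface{}` content (as objects read through the dynamic client are), and that both sides went through the same conversion path so numeric types match. `coversDesired` and `needsUpdate` are hypothetical names, not existing operator functions. In `createOrUpdateArgoCD`, the two inputs would presumably be `oldUnstructured` and the unstructured form of the freshly built object, with `Update()` skipped when `needsUpdate` returns false. The check passes when every field in the desired spec is present with the same value in the live spec, rather than requiring the two specs to be identical.

```go
package main

import (
	"fmt"
	"reflect"
)

// coversDesired reports whether every field set in desired is present in live
// with an equal value. Extra keys in live maps are ignored; slices must match
// element by element. Inputs are assumed to be unstructured JSON-like data
// (map[string]interface{}, []interface{}, scalars) decoded the same way.
func coversDesired(desired, live interface{}) bool {
	switch d := desired.(type) {
	case map[string]interface{}:
		l, ok := live.(map[string]interface{})
		if !ok {
			return false
		}
		for k, dv := range d {
			lv, found := l[k]
			if !found || !coversDesired(dv, lv) {
				return false
			}
		}
		return true
	case []interface{}:
		l, ok := live.([]interface{})
		if !ok || len(l) != len(d) {
			return false
		}
		for i := range d {
			if !coversDesired(d[i], l[i]) {
				return false
			}
		}
		return true
	default:
		// Scalars (strings, bools, numbers, nil) must match exactly.
		return reflect.DeepEqual(desired, live)
	}
}

// needsUpdate is a hypothetical guard for the update branch: it compares only
// the "spec" this operator builds and ignores metadata and status.
func needsUpdate(desiredObj, liveObj map[string]interface{}) bool {
	return !coversDesired(desiredObj["spec"], liveObj["spec"])
}

func main() {
	desired := map[string]interface{}{"spec": map[string]interface{}{
		"rbac": map[string]interface{}{
			"defaultPolicy": "role:readonly",
			"scopes":        "[groups,email]",
		},
	}}
	live := map[string]interface{}{"spec": map[string]interface{}{
		"rbac": map[string]interface{}{
			"defaultPolicy": "role:readonly",
			"scopes":        "[groups,email]",
		},
		"exampleDefaultedField": true, // hypothetical field set by someone else
	}}
	fmt.Println("needs update:", needsUpdate(desired, live)) // false: skip Update()

	// Simulate drift in a field the operator owns.
	liveRBAC := live["spec"].(map[string]interface{})["rbac"].(map[string]interface{})
	liveRBAC["defaultPolicy"] = "role:admin"
	fmt.Println("needs update:", needsUpdate(desired, live)) // true: live has drifted
}
```

Comparing only the fields `newArgoCD` sets means values filled in by someone else do not count as drift, whereas a whole-spec comparison such as `equality.Semantic.DeepEqual` from `k8s.io/apimachinery/pkg/api/equality` would flag them. Kubernetes generally does not persist an Update that leaves the stored object unchanged, so continuous `resourceVersion` churn suggests the built object differs from the stored one somewhere on every pass. Logging the differing path whenever the guard fires would show whether that difference falls in a field this check covers.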

As a secondary/independent improvement, consider exposing an override hook (mirroring `clusterGroup.argoCD.rbac` on the non-singleArgoCD path) for at least `spec.rbac` on the clusterwide instance, so patterns that need a local ArgoCD user with more than read-only access (e.g. an MCP-server-style integration) aren't stuck with the hardcoded 3-line baseline policy.
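An override hook could be as small as a per-field merge applied to the baseline before the desired spec is built, so that the diff check suggested above compares against the merged value instead of fighting it. The sketch is hypothetical throughout. `rbacSpec`, `baselineRBAC`, and `mergeRBAC` are illustrative names. Where the override would come from (operator config, Pattern CR, or values) is left open. The baseline policy lines only approximate the 3-line policy described above rather than reproducing the exact strings in `newArgoCD()`.

```go
package main

import (
	"fmt"
	"strings"
)

// rbacSpec mirrors the three spec.rbac fields discussed in this issue.
// Pointers distinguish "not set" from "set to an empty string".
type rbacSpec struct {
	DefaultPolicy *string
	Policy        *string
	Scopes        *string
}

func strPtr(s string) *string { return &s }

// baselineRBAC approximates the hardcoded baseline described in this issue;
// the exact strings used by newArgoCD() may differ.
func baselineRBAC() rbacSpec {
	return rbacSpec{
		DefaultPolicy: strPtr("role:readonly"),
		Policy: strPtr(strings.Join([]string{
			"g, system:cluster-admins, role:admin",
			"g, cluster-admins, role:admin",
			"g, admin, role:admin",
		}, "\n")),
		Scopes: strPtr("[groups,email]"),
	}
}

// mergeRBAC is a hypothetical override hook: each field set in override
// replaces the baseline field; unset fields keep the baseline value.
func mergeRBAC(baseline rbacSpec, override *rbacSpec) rbacSpec {
	if override == nil {
		return baseline
	}
	merged := baseline
	if override.DefaultPolicy != nil {
		merged.DefaultPolicy = override.DefaultPolicy
	}
	if override.Policy != nil {
		merged.Policy = override.Policy
	}
	if override.Scopes != nil {
		merged.Scopes = override.Scopes
	}
	return merged
}

func main() {
	// Hypothetical override: keep the baseline grants and add a local user
	// ("mcp-server" is an illustrative name) with admin access.
	override := &rbacSpec{
		Policy: strPtr(*baselineRBAC().Policy + "\ng, mcp-server, role:admin"),
	}
	merged := mergeRBAC(baselineRBAC(), override)
	fmt.Println("defaultPolicy:", *merged.DefaultPolicy)
	fmt.Println("policy:")
	fmt.Println(*merged.Policy)
	fmt.Println("scopes:", *merged.Scopes)
}
```

Per-field replacement keeps the semantics simple. An alternative would be a separate extra-policy field that is always appended to the baseline, so that the cluster-admin grants cannot be dropped by accident.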

## Environment

- `patterns-operator.v0.0.77` (catalog channel `fast`, currently the latest available)
- `openshift-gitops-operator.v1.21.0`
- `global.singleArgoCD: true`, single-cluster hub-only reproduction (also affects hub+spoke topologies)
- OpenShift 4.20
