# ADR-0001: Predictable ArgoCD behavior for stateful applications

- Date: 2026-02-26
- Status: Accepted
- Author: DevOps BLN
- JIRA: DEVOPSBLN-7015

## Context
Stateful infrastructure apps (Thanos, Kafka, Druid) run as StatefulSets with PVCs that periodically need resizing, resource adjustments, or emergency patches. If sync options are misconfigured, selfHeal can revert these changes within seconds: some changes stick, others are silently reverted, and ignore rules that look correct in the UI don't actually protect fields during sync. We ran five incident scenarios against Thanos Storegateway in dev to determine the exact behavior and identify the configuration required for predictable operations on stateful workloads.

## Lessons learned

- Enable ServerSideApply on all infra apps. With SSA, selfHeal only reverts fields defined in the Helm template; fields added via kubectl that don't exist in Git are left alone. Without SSA, ArgoCD tracks state via an annotation that is hard to inspect, and revert behavior is unpredictable. Testing confirmed this: selfHeal reverted a template-defined field (`image`) within ~10 seconds, but a patch adding a field absent from Git was not reverted.
- Always pair `ignoreDifferences` with `RespectIgnoreDifferences=true`. Without it, ArgoCD hides the diff in the UI but still overwrites the field during sync.
- Configure `ignoreApplicationDifferences` with `jsonPointers: [/spec/syncPolicy]` on ApplicationSets. Without it, temporarily disabling auto-sync via the UI or CLI does not persist: the ApplicationSet controller re-applies the template's sync policy within ~3 seconds. Both infra ApplicationSets now have this configured.
- Use `kubectl patch` as an emergency measure, but always follow up by updating `values.yaml` and re-enabling auto-sync.
- Resize PVCs with auto-sync enabled. `ignoreDifferences` covers PVC fields, so there is no need to disable sync.
- Use `--cascade=orphan` for VolumeClaimTemplate updates. The StatefulSet controller adopts the orphaned pods by name when the StatefulSet is recreated, so there is zero downtime.

## Working with ad-hoc StatefulSet changes

### PVC expansion (disk full)

When: A PVC is running out of space and needs immediate expansion.

Downtime: Yes, briefly. The pod must be restarted for the filesystem to expand (GKE `standard-rwo` requires a remount; the PVC shows `FileSystemResizePending` until then). Use this procedure only on apps that can tolerate a short outage.

Disable auto-sync: No. `ignoreDifferences` covers PVC fields.

- Patch the PVC size: `kubectl patch pvc <name> -n <ns> -p '{"spec":{"resources":{"requests":{"storage":"<new-size>"}}}}'`
- Delete the pod to trigger the filesystem expansion.
- Update `values.yaml` in Git to reflect the new size.
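
The same PVC patch can also be kept in a file instead of an inline JSON string, which is easier to review and reuse. A minimal sketch; the file name `pvc-resize-patch.yaml` is a hypothetical placeholder:

```yaml
# pvc-resize-patch.yaml (hypothetical file name)
# Patch body for the PVC: only the requested storage changes.
spec:
  resources:
    requests:
      storage: <new-size>   # must be larger than the current size
```

Apply it with `kubectl patch pvc <name> -n <ns> --patch-file pvc-resize-patch.yaml`. Kubernetes only allows a PVC to grow, and only when its StorageClass sets `allowVolumeExpansion: true`.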

### Resource change (OOMKill, CPU throttling)

When: A pod is OOMKilled or CPU-throttled and needs more resources immediately.

Downtime: Yes. Patching the StatefulSet triggers a rolling restart of its pods.

Disable auto-sync: Yes. selfHeal will revert template-defined fields.

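- Disable auto-sync via the ArgoCD UI.
- Patch the StatefulSet: `kubectl patch sts <name> -n <ns> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","resources":{"requests":{"memory":"<new>"},"limits":{"memory":"<new>"}}}]}}}}'`. Leave out `--type merge`: a JSON merge patch replaces the whole `containers` list, while the default strategic merge patch updates only the container matched by `name`.
- Wait for the rolling restart to complete.
- Update `values.yaml` in Git, merge, and re-enable auto-sync.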
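
A sketch of the same resource patch as a YAML patch file. kubectl applies it as a strategic merge patch by default, which merges the `containers` list by `name`, so only the named container's memory settings change. The file name is a hypothetical placeholder:

```yaml
# resources-patch.yaml (hypothetical file name)
# Strategic merge patch: the container is matched by name, and its other
# fields (image, env, probes, ...) are left as they are.
spec:
  template:
    spec:
      containers:
        - name: <container>
          resources:
            requests:
              memory: <new>
            limits:
              memory: <new>
```

Apply it with `kubectl patch sts <name> -n <ns> --patch-file resources-patch.yaml`. Because the pod template changes, this still triggers the rolling restart described above.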

### VolumeClaimTemplate change (immutable field)

When: The default PVC size for new replicas needs to change (e.g. when scaling up with larger disks).

Downtime: None. Existing pods keep running throughout.

Disable auto-sync: Yes.

- Disable auto-sync via the ArgoCD UI.
- Delete the StatefulSet while keeping its pods running: `kubectl delete sts <name> -n <ns> --cascade=orphan`
- Update the VCT size in `values.yaml`, then push and merge to Git.
- Re-enable auto-sync. ArgoCD recreates the StatefulSet with the new VCT.

The StatefulSet controller adopts the orphaned pods by name, so no pods are restarted.
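
A sketch of the Git-side change in the `values.yaml` step, assuming a chart that exposes the VolumeClaimTemplate size under `storegateway.persistence.size`. That key path is hypothetical; check the chart's own `values.yaml` for the real one:

```yaml
# values.yaml (excerpt; the key path is an assumption and varies by chart)
storegateway:
  persistence:
    size: <new-size>   # rendered into the storage request of the StatefulSet's volumeClaimTemplates
```

The new size applies only to PVCs created for new replicas; existing PVCs keep their current size until they are resized with the PVC expansion procedure.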

## Onboarding a new app to ArgoCD

Any app managing StatefulSets, PVCs, or CRDs needs these sync options:

```yaml
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
    - ServerSideApply=true
    - RespectIgnoreDifferences=true
```

And these ignore rules:

```yaml
ignoreDifferences:
  - group: "*"
    kind: "*"
    jsonPointers:
      - /status
  - group: apiextensions.k8s.io
    kind: CustomResourceDefinition
    jqPathExpressions:
      - '.spec.preserveUnknownFields'
  - group: ""
    kind: PersistentVolumeClaim
    jsonPointers:
      - /spec/volumeName
      - /spec/storageClassName
  - group: apps
    kind: StatefulSet
    jsonPointers:
      - /spec/volumeClaimTemplates
```

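For orientation, a minimal sketch of where both fragments sit in an Application manifest: `syncPolicy` and `ignoreDifferences` are siblings directly under `spec`. The app name, repository URL, chart path, and destination namespace are hypothetical placeholders:

```yaml
# Hypothetical Application: name, repoURL, path, and destination
# namespace are placeholders, not real values.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-stateful-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/infra.git
    path: charts/example
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: example
  # Sync options from this section
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
      - RespectIgnoreDifferences=true
  # Ignore rules sit next to syncPolicy, directly under spec.
  # Only one rule is shown here; use the full list from above.
  ignoreDifferences:
    - group: apps
      kind: StatefulSet
      jsonPointers:
        - /spec/volumeClaimTemplates
```
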
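For ApplicationSet-managed apps where DevOps needs the option to disable sync at runtime, also configure `ignoreApplicationDifferences` with `jsonPointers: [/spec/syncPolicy]` on the ApplicationSet, as described under Lessons learned.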
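
A minimal sketch of the ApplicationSet setting from Lessons learned (`ignoreApplicationDifferences` on `/spec/syncPolicy`). The ApplicationSet name and namespace are hypothetical placeholders, and the generators and template are omitted:

```yaml
# ApplicationSet excerpt (name is hypothetical; generators and template omitted)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: infra-apps
  namespace: argocd
spec:
  # Runtime edits to a generated Application's syncPolicy (e.g. disabling
  # auto-sync in the UI) survive instead of being reset from the template.
  ignoreApplicationDifferences:
    - jsonPointers:
        - /spec/syncPolicy
```
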
## Rollout expectations

The first sync after enabling SSA shows a one-time diff as ArgoCD migrates from annotation-based tracking to `managedFields`. This is a no-op that establishes field ownership without changing resource state.

Conflict errors may appear if two controllers claim the same field. This is an improvement over client-side apply, which silently overwrites the other controller's value. Resolve a conflict by adding the contested field to `ignoreDifferences`.
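
As an illustration of that fix, suppose a HorizontalPodAutoscaler also manages a StatefulSet's replica count. This scenario is hypothetical and was not one of the tested incidents. The rule added to `ignoreDifferences` would look like:

```yaml
ignoreDifferences:
  # Hypothetical example: another controller (here an HPA) owns the
  # replica count, so ArgoCD stops diffing and applying /spec/replicas.
  - group: apps
    kind: StatefulSet
    jsonPointers:
      - /spec/replicas
```

The rule only protects the field during sync when `RespectIgnoreDifferences=true` is also set, as in the onboarding sync options.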