Skip to content

fix(qdrant): clean up orphaned data PVC on delete (#3059)#3075

Open
scooby87 wants to merge 2 commits into
mainfrom
fix/qdrant-data-pvc-orphaned
Open

fix(qdrant): clean up orphaned data PVC on delete (#3059)#3075
scooby87 wants to merge 2 commits into
mainfrom
fix/qdrant-data-pvc-orphaned

Conversation

@scooby87

@scooby87 scooby87 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

What this PR does

After deleting a Qdrant app CR, the StatefulSet data PVC
(qdrant-storage-<release>-0) was left orphaned in the tenant namespace,
consuming storage permanently. StatefulSet-templated PVCs are never
garbage-collected on StatefulSet deletion by default.

The vendored upstream Qdrant chart does not support
persistentVolumeClaimRetentionPolicy (verified: its statefulset.yaml never
renders that field), so the OpenBAO-style value passthrough is not possible
here. Instead this adds a post-delete PVC cleanup hook (modeled on the MariaDB
hook) that deletes the release's PVCs by label.

The chart's volumeClaimTemplate only stamps app: qdrant (shared by every
Qdrant in a namespace), so to select PVCs safely per release this also passes
persistence.additionalLabels.app.kubernetes.io/instance: <release> through the
HelmRelease — giving the data PVC a release-unique label the hook selects on.

The cleanup Job is hardened: runs as non-root (uid 65534), read-only root fs,
all capabilities dropped, seccompProfile: RuntimeDefault; RBAC scoped to
persistentvolumeclaims. A real cleanup failure is logged (not silently
swallowed) but never blocks namespace teardown.

Warning

Deleting a Qdrant app now permanently removes its data PVC and the vector data it holds.
This is intentional and fixes the storage leak; the hook only fires on full
release deletion (not on scale-down). Back up before deleting.

Fixes #3059.

Release note

fix(qdrant): garbage-collect the StatefulSet data PVC when a Qdrant app is deleted, instead of leaving it orphaned. WARNING: deleting a Qdrant app now permanently removes its data PVC and the data it holds.

Summary by CodeRabbit

  • New Features

    • Added automatic cleanup of Qdrant persistent volume claims when a release is deleted.
    • PVCs are now labeled per release to ensure cleanup targets only the correct data.
  • Bug Fixes

    • Improved release deletion behavior to avoid leftover storage resources.
    • Added safeguards so cleanup ignores missing PVCs and reports failures clearly.

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where Qdrant data PVCs were left orphaned in the namespace after an application deletion. By introducing a post-delete Helm hook and applying unique instance labels to the PVCs, the system now ensures that storage is reclaimed correctly when a Qdrant release is removed. This change prevents long-term storage leaks while maintaining strict security practices for the cleanup process.

Highlights

  • PVC Cleanup Hook: Added a post-delete Helm hook that automatically removes orphaned persistent volume claims (PVCs) associated with a Qdrant release.
  • Labeling Strategy: Implemented a release-unique label (app.kubernetes.io/instance) on the data PVC to ensure the cleanup hook targets only the relevant resources.
  • Security Hardening: The cleanup Job is configured with non-root execution, read-only root filesystem, and minimal RBAC permissions to ensure secure operation.
  • Testing: Added comprehensive unit tests to verify the label passthrough and the correct rendering of the cleanup hook and its associated RBAC.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment Gemini (@gemini-code-assist) Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on Gemini (@gemini-code-assist) comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions Bot added area/uncategorized PR auto-labeler could not map title scope to a known area/*; please review kind/bug Categorizes issue or PR as related to a bug size/L This PR changes 100-499 lines, ignoring generated files labels Jun 25, 2026
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fa206258-c1de-4b00-821d-c8990589fac9

📥 Commits

Reviewing files that changed from the base of the PR and between 065581a and 26dc006.

📒 Files selected for processing (3)
  • packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml
  • packages/apps/qdrant/templates/qdrant.yaml
  • packages/apps/qdrant/tests/cleanup_pvc_test.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/apps/qdrant/tests/cleanup_pvc_test.yaml
  • packages/apps/qdrant/templates/qdrant.yaml

📝 Walkthrough

Walkthrough

Qdrant now labels its data PVCs with the release instance name and adds a post-delete Helm hook that deletes matching PVCs in the release namespace. A new test suite checks the label propagation, hook rendering, and RBAC.

Changes

Qdrant PVC cleanup

Layer / File(s) Summary
PVC release labeling
packages/apps/qdrant/templates/qdrant.yaml, packages/apps/qdrant/tests/cleanup_pvc_test.yaml
Adds app.kubernetes.io/instance: {{ .Release.Name }} to Qdrant PVC labels and verifies the rendered value in tests.
Post-delete cleanup hook
packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml, packages/apps/qdrant/tests/cleanup_pvc_test.yaml
Adds a post-delete Job, ServiceAccount, Role, and RoleBinding that delete PVCs matching the release instance label, and tests the hook command and RBAC.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Suggested reviewers

  • kvaps
  • sircthulhu
  • myasnikovdaniil

Poem

A rabbit hopped by with a label so bright,
On PVC burrows it shone through the night.
A cleanup job nibbled the leftovers away,
Then RBAC stood guard at the end of the day.
🐰✨ Hop, delete, and rejoice!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is concise and accurately summarizes the main change: cleaning up orphaned Qdrant PVCs on delete.
Linked Issues check ✅ Passed The PR implements the linked fix by adding a post-delete PVC cleanup hook and release-unique PVC labels, matching the issue's expected behavior.
Out of Scope Changes check ✅ Passed The changes stay focused on the PVC leak fix and its tests, with no obvious unrelated additions.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/qdrant-data-pvc-orphaned

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@dosubot dosubot Bot added the area/database Issues or PRs related to managed databases (postgres, mariadb, redis, etcd, kafka, clickhouse) label Jun 25, 2026
@scooby87 scooby87 removed the area/uncategorized PR auto-labeler could not map title scope to a known area/*; please review label Jun 25, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a post-delete cleanup hook to delete orphaned PersistentVolumeClaims (PVCs) when a Qdrant release is deleted, along with corresponding unit tests. Feedback on the changes points out that the upstream Qdrant Helm chart uses persistence.labels rather than persistence.additionalLabels to apply labels to the PVC volume claim templates. Consequently, the HelmRelease values and the associated unit tests need to be updated to use labels to ensure the PVCs are correctly labeled and subsequently cleaned up.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +42 to +43
additionalLabels:
app.kubernetes.io/instance: {{ .Release.Name }}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The upstream Qdrant Helm chart uses persistence.labels to apply labels to the PVC volume claim templates, not persistence.additionalLabels. Using additionalLabels will result in the label being ignored, meaning the PVC will not be labeled and the post-delete cleanup hook will fail to find and delete the orphaned PVC.

Please change additionalLabels to labels.

        labels:
          app.kubernetes.io/instance: {{ .Release.Name }}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This finding is incorrect — no change needed.

The vendored upstream Qdrant chart applies volumeClaimTemplates labels from persistence.additionalLabels, not persistence.labels:

https://github.com/cozystack/cozystack/blob/main/packages/system/qdrant/charts/qdrant/templates/statefulset.yaml#L264-L271

  volumeClaimTemplates:
    - metadata:
        name: {{ .Values.persistence.storageVolumeName | default "qdrant-storage" }}
        labels:
          app: {{ template "qdrant.name" . }}
          {{- with .Values.persistence.additionalLabels }}
          {{- toYaml . | nindent 10 }}
          {{- end }}

There is no persistence.labels key in this chart. Switching to labels would drop the app.kubernetes.io/instance label from the PVC and break the post-delete cleanup hook. The current additionalLabels passthrough is correct, and the helm unittest suite (cleanup_pvc_test.yaml) passes against the rendered output.

value: HelmRelease
asserts:
- equal:
path: spec.values.qdrant.persistence.additionalLabels["app.kubernetes.io/instance"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Since the field in templates/qdrant.yaml should be changed from additionalLabels to labels, this test assertion needs to be updated accordingly to verify the correct path.

          path: spec.values.qdrant.persistence.labels["app.kubernetes.io/instance"]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same root cause as the additionalLabels thread above — the upstream chart uses persistence.additionalLabels (statefulset.yaml#L264-L271), so this test asserts the correct path and needs no change.

@github-actions github-actions Bot added the area/uncategorized PR auto-labeler could not map title scope to a known area/*; please review label Jun 25, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml`:
- Around line 5-95: The cleanup PVC Helm template has unquoted Helm scalars that
strict YAML parsers treat as syntax errors. Update the qdrant cleanup manifest
so every field using template expressions like {{ .Release.Name }} is quoted,
including the resource names, serviceAccountName, roleRef.name, and
subjects.name. Use the cleanup job, ServiceAccount, Role, RoleBinding, and
related identifiers to locate all affected scalar values and keep the rendered
YAML valid.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a426d42b-fa22-407f-82ad-ba6e3a2bce8d

📥 Commits

Reviewing files that changed from the base of the PR and between 63f9926 and 065581a.

📒 Files selected for processing (4)
  • packages/apps/qdrant/Makefile
  • packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml
  • packages/apps/qdrant/templates/qdrant.yaml
  • packages/apps/qdrant/tests/cleanup_pvc_test.yaml

Comment on lines +5 to +95
name: {{ .Release.Name }}-qdrant-cleanup
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
annotations:
"helm.sh/hook": post-delete
"helm.sh/hook-weight": "10"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
template:
metadata:
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
policy.cozystack.io/allow-to-apiserver: "true"
spec:
serviceAccountName: {{ .Release.Name }}-qdrant-cleanup
restartPolicy: Never
securityContext:
runAsNonRoot: true
runAsUser: 65534
runAsGroup: 65534
seccompProfile:
type: RuntimeDefault
containers:
- name: cleanup
image: docker.io/clastix/kubectl:v1.32
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
env:
- name: HOME
value: /tmp
command:
- /bin/sh
- -c
- |
echo "Deleting orphaned PVCs for {{ .Release.Name }}..."
kubectl delete pvc -n {{ .Release.Namespace }} -l app.kubernetes.io/instance={{ .Release.Name }} --ignore-not-found \
|| echo "WARNING: PVC cleanup failed for {{ .Release.Name }}; orphaned PVCs may remain" >&2
echo "PVC cleanup complete."
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ .Release.Name }}-qdrant-cleanup
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
annotations:
"helm.sh/hook": post-delete
helm.sh/hook-weight: "0"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ .Release.Name }}-qdrant-cleanup
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
annotations:
"helm.sh/hook": post-delete
"helm.sh/hook-weight": "5"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
rules:
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ .Release.Name }}-qdrant-cleanup
labels:
app.kubernetes.io/instance: {{ .Release.Name }}
annotations:
"helm.sh/hook": post-delete
helm.sh/hook-weight: "5"
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ .Release.Name }}-qdrant-cleanup
subjects:
- kind: ServiceAccount
name: {{ .Release.Name }}-qdrant-cleanup

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

Quote templated scalar values to resolve YAML syntax errors.

yamllint fails on this file with syntax error: expected <block end> at line 5. Unquoted Helm template scalars starting with {{ are misinterpreted as YAML flow collections by strict parsers.

Quote all line items containing {{ .Release.Name }} (e.g., name, serviceAccountName, roleRef.name, subjects.name). For example:
name: {{ .Release.Name }}-qdrant-cleanupname: "{{ .Release.Name }}-qdrant-cleanup".

🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 5-5: syntax error: expected , but found ''

(syntax)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/apps/qdrant/templates/hooks/cleanup-pvc.yaml` around lines 5 - 95,
The cleanup PVC Helm template has unquoted Helm scalars that strict YAML parsers
treat as syntax errors. Update the qdrant cleanup manifest so every field using
template expressions like {{ .Release.Name }} is quoted, including the resource
names, serviceAccountName, roleRef.name, and subjects.name. Use the cleanup job,
ServiceAccount, Role, RoleBinding, and related identifiers to locate all
affected scalar values and keep the rendered YAML valid.

The StatefulSet data PVC (qdrant-storage-<release>-0) was left orphaned
after a Qdrant app CR was deleted. The vendored upstream chart does not
support persistentVolumeClaimRetentionPolicy, so add a post-delete PVC
cleanup hook (modeled on the MariaDB hook) and pass persistence.additionalLabels
through the HelmRelease so the data PVC carries a release-unique
app.kubernetes.io/instance label the hook selects on. The cleanup Job runs
non-root with a read-only root fs and dropped capabilities; failures are
logged but never block namespace teardown.

Signed-off-by: Алексей Артамонов <aleksei.artamonov@aenix.io>
@scooby87 scooby87 force-pushed the fix/qdrant-data-pvc-orphaned branch from 065581a to 178b836 Compare June 25, 2026 15:48
The .PHONY: test tweak is out of scope for the PVC cleanup fix; remove it
to keep this PR focused (flagged by review).

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Алексей Артамонов <alexeyartamonov1987@gmail.com>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOT LGTM — the PVC-label mechanism breaks reconciliation for already-deployed Qdrant instances.

Blocker — StatefulSet volumeClaimTemplates is immutable. This PR routes the release-unique label through qdrant.persistence.additionalLabels, which the upstream chart renders into volumeClaimTemplates[].metadata.labels (packages/system/qdrant/charts/qdrant/templates/statefulset.yaml ~L264, L269). For any Qdrant that already exists, this is a change to an immutable StatefulSet field, which the API server rejects (only replicas, ordinals, template, updateStrategy, persistentVolumeClaimRetentionPolicy, minReadySeconds are mutable). The <release>-system HelmRelease (retries: -1) then retry-fails indefinitely on upgrade — no data loss, but reconciliation is permanently broken, and the cleanup still can't reclaim those instances' PVCs (volumeClaimTemplate label changes don't apply retroactively to existing PVCs). This is the opposite of the openbao case, where the change worked precisely because persistentVolumeClaimRetentionPolicy is in the mutable list.

Suggested fix: drop the additionalLabels passthrough and have the hook delete by deterministic PVC name-prefix (qdrant-storage-<release>-). PVC names derive from the release, so this needs no StatefulSet change (upgrade-safe) and also reclaims PVCs from instances created before this change.

Also: hack/e2e-apps/qdrant.bats deletes the Qdrant but never asserts the data PVC is gone afterward, so the behavior this PR exists to fix isn't covered end-to-end — please add a post-delete (async-waited) not-found check on qdrant-storage-<name>-0.

Notes: the unit test can only assert the value reaches the HelmRelease, not that the subchart consumes it (I confirmed consumption separately at statefulset.yaml L269 — it is not a no-op). Consider the fix(...)! marker for the data-loss-on-delete behavior change, to match the sibling ClickHouse PR. The automated "use persistence.labels" suggestions are incorrect — the chart uses persistence.additionalLabels (no persistence.labels key exists). The security hardening on the cleanup Job is a clear improvement over the hook it is modeled on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/database Issues or PRs related to managed databases (postgres, mariadb, redis, etcd, kafka, clickhouse) area/uncategorized PR auto-labeler could not map title scope to a known area/*; please review kind/bug Categorizes issue or PR as related to a bug size/L This PR changes 100-499 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

qdrant: StatefulSet data PVC is orphaned after deletion

2 participants