12. Drop Longhorn in Favour of Static Local PVs + NFS Backups#
Status: Accepted
Supersedes: 0009 (workstation exclusion from Longhorn — no longer relevant with Longhorn removed).
Context#
The cluster previously used Longhorn as its block-storage CSI provider for
six stateful PVCs (Supabase db/storage/minio, Grafana, Prometheus,
Open WebUI), totalling about 225Gi of live data. Two compounding problems
forced a rethink:
/rebuild-clusterdestroys all Longhorn data. The decommission playbook wipes/var/lib/longhornand/var/lib/rancher, and Longhorn’s volume metadata lives in Kubernetes CRDs that are gone once the cluster is torn down. Every rebuild was therefore a total loss of Supabase, Grafana history, and Open WebUI chat state.No backup system. Longhorn’s built-in volume snapshots are node-local and vanish with the node. Longhorn-to-NFS backup targets exist but were never wired up, because the recovery story still required a Longhorn cluster on the other side.
Additionally, the RK1 nodes each have a 1TB NVMe sitting mostly unused
(OS + free space) and nuc2 has a 931GB /home disk mounted at /home,
so local high-quality storage is already paid for.
The NAS (gknas, 192.168.1.3) hosts existing NFS shares for LLM models
(rkllama/llamacpp) and Supabase DB dumps, plus unrelated personal shares
(JellyFin library, files, etc.) that must not be touched. Any NAS change
has to respect that trust boundary.
Decision#
Drop Longhorn entirely. Replace it with two orthogonal layers:
Per-node static local PVs (
additions/local-storage/). Alocal-nvmeStorageClasswithprovisioner: kubernetes.io/no-provisioner,volumeBindingMode: WaitForFirstConsumer, andreclaimPolicy: Retain. Six staticPersistentVolumeobjects — one per live Longhorn PVC — each pre-bound viaspec.claimRef: {namespace, name}to the exact chart-generated PVC name, and pinned viaspec.nodeAffinityto a specific node:PVC
Node
On-disk path
supabase-dbnuc2
/home/k8s-data/supabase-dbsupabase-storagenuc2
/home/k8s-data/supabase-storagesupabase-minionuc2
/home/k8s-data/supabase-miniostorage-grafana-prometheus-0node03
/var/lib/k8s-data/grafanaprometheus-grafana-prometheus-kube-pr-prometheus-...node02
/var/lib/k8s-data/prometheusopen-webuinode04
/var/lib/k8s-data/open-webuiThe
k8s_data_dirsAnsible role creates these directories idempotently (part of theserversplay), with owner/mode chosen to match each workload’s podsecurityContext.pb_decommission.ymlpreserves/home/k8s-dataand/var/lib/k8s-databy default; a new opt-in flag-e wipe_local_data=trueremoves them for a genuine clean-slate rebuild.Per-app backup CronJobs writing to one NFS share on the NAS (
additions/backups/). A single cluster-owned subtree/bigdisk/k8s-cluster/hosts all cluster data the cluster reads or writes over NFS — LLM models, Supabase DB dumps, and the new backup targets (backups/<app>/{,weekly}). One static NFSPV/PVCcovers the whole subtree; each CronJob mounts it with a workload-specificsubPath. Daily + weekly schedules per app; retention is enforced in-job withfind -mtime. Prometheus is deliberately excluded — metrics are reconstructible from re-scrape.
Rejected alternatives#
Keep Longhorn, add a backup pipeline#
Possible, but doesn’t fix the fundamental problem: a rebuild destroys the
primary data store and the restore path would need a working Longhorn
cluster on the other side. Longhorn is also the single biggest source of
rebuild pain (iSCSI cleanup, stuck finalizers, multiple retry loops in
pb_decommission.yml), and removing it simplifies the teardown
significantly.
local-path-provisioner instead of static PVs#
local-path-provisioner is already the default StorageClass (bundled
with k3s), so in principle we could just point charts at it. Rejected
because its PVC→hostPath bindings live in etcd — they are lost
when the cluster is rebuilt. The new cluster’s local-path-provisioner
would allocate a brand new hostPath for each PVC, not re-bind to the
existing on-disk data. Static PVs with claimRef pinning are the only
way to guarantee a PVC re-binds to the same directory after a rebuild.
Ansible-managed NFS setup on the NAS#
The NAS is a QNAP (QTS) hosting unrelated personal data. Giving Ansible
access to it would require a trust boundary we don’t want, and QTS
regenerates /etc/exports from its web UI on every change, so any
Ansible-written configuration would be fragile. Decision: NAS setup
is a documented manual runbook (docs/how-to/nas-setup.md) the user
runs by hand on the NAS. The runbook creates /bigdisk/k8s-cluster/ as
a subdirectory of the existing /bigdisk NFS export (which is already
rw to the cluster subnet), so no QTS config changes are needed at
all. Rollback = leave the repo on the old paths (which still exist
untouched on the NAS).
Tar-out / copy-in migration#
Rejected because the Longhorn data being left behind is recreatable: Supabase runs its init migrations fresh, Grafana starts with empty dashboards (we had no saved dashboards), Prometheus starts empty (fine), Open WebUI starts empty (chat history was disposable). A one-time fresh-start cost was deemed acceptable versus the effort of tar-dumping Longhorn volumes and restoring them into raw hostPath directories with the right ownership.
Consequences#
Positive#
Stateful app data now survives
/rebuild-clusterby default. A rebuild re-binds the existing local-nvme PVs to fresh chart-generated PVCs; Supabase Studio shows the same thoughts, Grafana shows the same dashboards, Open WebUI shows the same chat history.Actual backup system exists. Nightly CronJobs write to the NAS, retention is enforced automatically, restore recipes are documented in Backup and Restore.
Decommission is simpler. All Longhorn-specific teardown (volume detachment waits, finalizer stripping, CRD cleanup, iSCSI logout,
/var/lib/longhornwipe) is gone.pb_decommission.ymlis shorter and faster.open-iscsino longer required on cluster nodes.
Negative#
No replication. Losing a node loses that node’s live data until the NFS backup is restored. Mitigation: per-app pinning spreads blast radius across four nodes (prometheus→node02, grafana→node03, open-webui→node04, supabase trio→nuc2), and RPO is one day (matching the daily CronJob schedule).
New RWO
local-nvmeworkloads must choose a host explicitly. The StorageClass isWaitForFirstConsumer+ static PVs, so a new PVC with no matching PV stays Pending until someone adds one. This is captured as a hard rule inCLAUDE.md.NAS remains a single point of failure for LLM models, Supabase DB dumps, and backup targets — unchanged from the previous design.
Manual NAS runbook must be run once before the first rebuild on this plan. Documented in
docs/how-to/nas-setup.md; the cluster cannot come up without it (rkllama/llamacpp/supabase-db-data PVs would fail to mount their new paths).Supabase
db-dataNFS path changed from/bigdisk/OpenBrainto/bigdisk/k8s-cluster/supabase-dumps— ADR 0006 is still accepted (NFS for the dump store), but its path is superseded by this ADR.
References#
PR #321 — scaffolds (this ADR’s infrastructure, no cutover)
PR #NNN — cutover (this ADR, chart StorageClass flips + Longhorn removal)
ADR 0006 — Supabase DB dump storage on NFS (path updated by this ADR)
ADR 0009 — Longhorn workstation exclusion (superseded by this ADR)
docs/how-to/nas-setup.md— manual NAS runbookdocs/how-to/backup-restore.md— CronJob schedule + restore recipes