Backup and Restore#
This guide covers the backup strategy for a cluster managed by this project.
What needs backing up?#
The key insight of a GitOps-managed cluster is that the Git repository is the backup for all configuration. ArgoCD can reconstruct the declarative cluster state from the repo.
What is not in Git and needs separate backup:
| Data | Where it lives | Backup approach |
|---|---|---|
| Stateful app data | static local-nvme PVs on nuc2 / RK1 nodes | nightly CronJob → NFS NAS |
| Sealed Secrets private key | | manual export |
| Admin passwords | | re-create from password manager |
| ArgoCD initial admin secret | | regenerated on install |
Stateful app data: CronJob-to-NFS backups#
The backups ArgoCD Application (kubernetes-services/templates/backups.yaml)
deploys one daily and one weekly CronJob per stateful workload. Each CronJob
mounts a single NFS PV — /bigdisk/k8s-cluster on the NAS — with a
workload-specific subPath so it only sees its own target subdirectory.
| Workload | Backup method | NFS path |
|---|---|---|
| Supabase Postgres | `pg_dump \| gzip` | |
| Supabase Storage | | |
| Supabase MinIO | | |
| Grafana sqlite | | |
| Open WebUI sqlite | | |
Prometheus is deliberately not backed up — metrics are reconstructible via re-scrape, and the snapshot-API dance is not worth the complexity.
Retention#
- Daily CronJobs run at 02:00 and keep 7 files (find -mtime +7 -delete).
- Weekly CronJobs run Sunday 03:00 and keep 4 files (find -mtime +28 -delete).
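The retention rule can be tried out on a scratch directory before trusting it with real backups. The sketch below wraps the same find expression in a helper; prune_backups is a hypothetical name, and the CronJobs' actual scripts may differ. Note that -mtime +7 compares whole 24-hour periods, so a file is deleted only once it is at least 8 days old, and slightly more than 7 daily files can be present at any time.

```shell
# Hypothetical helper mirroring the retention rule `find -mtime +N -delete`.
# Usage: prune_backups <directory> <days>
prune_backups() {
  dir=$1
  days=$2
  # -mtime +N matches files whose age, counted in whole 24-hour periods,
  # is greater than N. Only regular files directly inside <directory> are touched.
  find "$dir" -maxdepth 1 -type f -mtime "+$days" -delete
}
```

For example, prune_backups /backup 7 would mirror the daily rule and prune_backups /backup 28 the weekly one, assuming the backup directory is mounted at /backup as in the restore examples.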
Trigger a one-off backup#
JOB=manual-$(date +%s)
kubectl create job --from=cronjob/supabase-db-daily "$JOB" -n backups
kubectl logs -n backups "job/$JOB" -f
Then verify on the NAS:
ssh nas 'ls -lh /share/CACHEDEV1_DATA/bigdisk/k8s-cluster/backups/supabase-db/'
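A directory listing shows that a file exists and how large it is, but not whether the gzip stream is intact. As a minimal sketch, the helper below checks a dump that has been copied to the local machine (or is run on a host that can read the backup directory). verify_dump is a hypothetical name, not part of this project.

```shell
# Hypothetical helper: sanity-check one gzipped dump file.
# Usage: verify_dump <path-to-file.sql.gz>
verify_dump() {
  file=$1
  # -s is true only if the file exists and is non-empty
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # gzip -t decompresses the whole stream in memory and checks its CRC
  gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file" >&2; return 1; }
  echo "ok: $file"
}
```

This only proves the archive is readable. It does not prove the SQL inside restores cleanly, so an occasional test restore is still worthwhile.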
Restore — Supabase Postgres#
Copy the gzipped dump into the running DB pod and pipe it into psql:
kubectl cp /path/to/local/YYYY-MM-DD.sql.gz \
supabase/supabase-supabase-db-0:/tmp/restore.sql.gz
kubectl exec -n supabase supabase-supabase-db-0 -- \
sh -c 'gunzip -c /tmp/restore.sql.gz | psql -U postgres'
Or restore from the NFS PV directly by mounting it into a debug pod:
kubectl run pg-restore -n backups --rm -it --restart=Never \
--image=postgres:15 \
--overrides='{
"spec": {
"containers": [{
"name": "psql",
"image": "postgres:15",
"command": ["sh", "-c", "gunzip -c /backup/$(ls -1t /backup | head -1) | psql -h supabase-supabase-db.supabase -U postgres"],
"volumeMounts": [{"name": "nfs", "mountPath": "/backup", "subPath": "backups/supabase-db"}]
}],
"volumes": [{"name": "nfs", "persistentVolumeClaim": {"claimName": "k8s-cluster-nfs"}}]
}
}'
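The debug-pod command above picks the newest file by modification time. When you need a specific day, or want to avoid relying on mtime, the choice can be made by filename instead. The sketch below assumes dumps are named YYYY-MM-DD.sql.gz as in the kubectl cp example; pick_dump is a hypothetical helper.

```shell
# Hypothetical helper: choose a dump by date, or the newest one by filename.
# Usage: pick_dump <directory> [YYYY-MM-DD]
pick_dump() {
  dir=$1
  date=${2:-}
  if [ -n "$date" ]; then
    [ -f "$dir/$date.sql.gz" ] || { echo "no dump for $date" >&2; return 1; }
    echo "$dir/$date.sql.gz"
    return 0
  fi
  # ISO dates sort chronologically as plain strings, so the last name is the newest
  latest=$(ls -1 "$dir" | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}\.sql\.gz$' | sort | tail -n 1)
  [ -n "$latest" ] || { echo "no dumps in $dir" >&2; return 1; }
  echo "$dir/$latest"
}
```

Inside the debug pod, the output of pick_dump /backup could then be passed to gunzip -c in place of the ls -1t expression.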
Restore — Supabase Storage / MinIO / Open WebUI (tar archives)#
Scale the owning workload to zero, untar over the local-PV hostPath, then
scale back up. Example for Supabase Storage (nuc2 hostPath
/home/k8s-data/supabase-storage):
# 1. Scale down
kubectl scale -n supabase deploy/supabase-supabase-storage --replicas=0
# 2. Extract on the node (NFS share is already mounted on nuc2)
ssh ansible@nuc2 '
cd /home/k8s-data/supabase-storage &&
sudo find . -mindepth 1 -delete &&
sudo tar -xzf /mnt/nfs/k8s-cluster/backups/supabase-storage/YYYY-MM-DD.tar.gz
'
# 3. Scale back
kubectl scale -n supabase deploy/supabase-supabase-storage --replicas=1
(If the NFS share is not pre-mounted on the node, run the extract in a
debug pod that mounts the k8s-cluster-nfs PVC instead.)
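Because the extract step deletes the existing data first, it helps to confirm the archive is readable before anything is removed. The following is a sketch of that ordering as a reusable function, assuming the archive was created relative to the data directory. restore_tar is a hypothetical helper, and on the node it would need to run under sudo.

```shell
# Hypothetical helper: replace the contents of a directory with a tar.gz archive.
# Usage: restore_tar <archive.tar.gz> <target-directory>
restore_tar() {
  archive=$1
  target=$2
  # Refuse to touch anything unless both the archive and the target exist
  [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
  [ -d "$target" ] || { echo "target not found: $target" >&2; return 1; }
  # Listing the archive fails on a truncated or corrupt file, before any deletion
  tar -tzf "$archive" >/dev/null || { echo "unreadable archive: $archive" >&2; return 1; }
  # Remove the old contents (dotfiles included), then extract into the target
  find "$target" -mindepth 1 -delete &&
    tar -xzf "$archive" -C "$target"
}
```

The workload should still be scaled to zero before calling the function and scaled back up afterwards, as in the numbered steps above.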
Restore — Grafana sqlite#
The Grafana daily backup uses sqlite3 .backup (a consistent live snapshot,
unlike a raw tar of the db file). To restore:
kubectl scale -n monitoring sts/grafana-prometheus --replicas=0
ssh ansible@node03 '
sudo cp /mnt/nfs/k8s-cluster/backups/grafana/YYYY-MM-DD.db \
/var/lib/k8s-data/grafana/grafana.db
'
kubectl scale -n monitoring sts/grafana-prometheus --replicas=1
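Before overwriting grafana.db, a quick check that the backup file really is a SQLite database can catch an empty or mis-copied file. The sketch below relies only on the fixed header that every SQLite 3 file starts with. is_sqlite_db is a hypothetical helper, and running PRAGMA integrity_check with the sqlite3 CLI would be a more thorough test where that tool is available.

```shell
# Hypothetical helper: succeed only if the file starts with the SQLite 3 header.
# Usage: is_sqlite_db <path-to-file.db>
is_sqlite_db() {
  # The header is the 16 bytes "SQLite format 3" followed by a NUL byte;
  # comparing the first 15 printable bytes is enough for a sanity check.
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}
```

For example, on the node one might run is_sqlite_db on the YYYY-MM-DD.db file and only proceed with the cp if it succeeds.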
Sealed Secrets key backup#
If you rebuild the cluster, the sealed-secrets controller generates a new keypair, and the existing SealedSecret YAML files in the repo can no longer be decrypted unless you restore the original key.
Export the key#
kubectl get secret -n kube-system \
-l sealedsecrets.bitnami.com/sealed-secrets-key \
-o yaml > sealed-secrets-key-backup.yaml
Warning
Store this file securely (e.g. password manager, encrypted drive) — never in Git. It can decrypt all your SealedSecrets.
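If the label selector matches nothing, kubectl still writes a file, just one containing an empty list, so it is worth checking the export before relying on it. The sketch below looks for the tls.crt and tls.key fields that the controller's key secrets carry and tightens the file permissions. check_key_export is a hypothetical helper.

```shell
# Hypothetical helper: confirm the export contains a keypair and lock down the file.
# Usage: check_key_export <sealed-secrets-key-backup.yaml>
check_key_export() {
  file=$1
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # Each sealing key is stored as a TLS secret with tls.crt and tls.key fields
  grep -q 'tls\.key:' "$file" && grep -q 'tls\.crt:' "$file" ||
    { echo "no keypair found in $file" >&2; return 1; }
  # The file contains the private key, so make it readable by the owner only
  chmod 600 "$file"
}
```

Running check_key_export sealed-secrets-key-backup.yaml right after the export gives an early warning if the backup is unusable.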
Restore after rebuild#
Before ArgoCD deploys the sealed-secrets controller on a new cluster:
kubectl apply -f sealed-secrets-key-backup.yaml
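The new controller will pick up the restored key and can decrypt existing SealedSecrets. If the controller was already running when you applied the key, restart its pod so that it reloads its keys.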
etcd backup and restore#
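K3s uses an embedded etcd (or SQLite for single-node) datastore. Backing up etcd preserves the full cluster state including all Kubernetes objects. The k3s etcd-snapshot commands below apply only when the cluster runs on the embedded etcd datastore.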
Create an etcd snapshot#
ssh node01 sudo k3s etcd-snapshot save --name manual-$(date +%Y%m%d)
Snapshots are stored at /var/lib/rancher/k3s/server/db/snapshots/ on the
control plane node.
List snapshots#
ssh node01 sudo k3s etcd-snapshot list
Configure automatic snapshots#
K3s supports automatic etcd snapshots. Add to /etc/rancher/k3s/config.yaml
on the control plane:
etcd-snapshot-schedule-cron: "0 */6 * * *" # every 6 hours
etcd-snapshot-retention: 10
Restart K3s to apply:
ssh node01 sudo systemctl restart k3s
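If the config file is managed by hand, the two settings can be added idempotently so that re-running the step does not duplicate keys. This is a minimal sketch assuming a flat key: value YAML file. ensure_setting is a hypothetical helper, and a cluster provisioned by Ansible would more naturally set these values in the playbook.

```shell
# Hypothetical helper: append "key: value" to a flat YAML file unless the key exists.
# Usage: ensure_setting <file> <key> <value>
ensure_setting() {
  file=$1 key=$2 value=$3
  # Leave the file alone if the key is already set at the top level
  grep -q "^$key:" "$file" 2>/dev/null && return 0
  printf '%s: %s\n' "$key" "$value" >> "$file"
}
```

On the control plane node this could be called, with root privileges, as ensure_setting /etc/rancher/k3s/config.yaml etcd-snapshot-retention 10, followed by the restart shown above.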
Restore from snapshot#
Warning
Restoring replaces the entire cluster state. All changes since the snapshot are lost.
ssh node01
sudo systemctl stop k3s
sudo k3s server --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-name>
sudo systemctl start k3s
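Since the restore begins by stopping K3s, a mistyped snapshot name costs downtime for nothing. A small pre-check, sketched below, resolves the name to a full path and fails early if the file is absent. snapshot_path and the SNAPSHOT_DIR variable are hypothetical names; the default directory is the one given earlier in this guide.

```shell
# Directory where K3s stores snapshots on the control plane node (overridable)
SNAPSHOT_DIR=${SNAPSHOT_DIR:-/var/lib/rancher/k3s/server/db/snapshots}

# Hypothetical helper: print the full path of a snapshot, or fail if it is missing.
# Usage: snapshot_path <snapshot-name>
snapshot_path() {
  path="$SNAPSHOT_DIR/$1"
  # -s is true only if the file exists and is non-empty
  [ -s "$path" ] || { echo "snapshot not found: $path" >&2; return 1; }
  echo "$path"
}
```

On node01 the result could be captured first, for instance path=$(snapshot_path <snapshot-name>), and K3s stopped only if that succeeds. Reading the snapshot directory may itself require root.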
Disaster recovery#
To rebuild a cluster from scratch:
1. Flash and provision nodes (see tutorials).
2. Run the one-time NAS setup (docs/how-to/nas-setup.md) so the cluster-owned /bigdisk/k8s-cluster share exists on the NAS.
3. Run ansible-playbook pb_all.yml -e do_flash=true.
4. Restore the sealed-secrets key (if backed up).
5. Re-create the admin-auth secrets (see Bootstrap the Cluster).
6. ArgoCD auto-syncs all services from Git.
7. If wipe_local_data=true was used on decommission, restore stateful data from the latest NFS backup under /bigdisk/k8s-cluster/backups/ using the restore recipes above. Otherwise, local PV data is still intact from before the rebuild and re-binds automatically.
The cluster will be fully operational within minutes, with only persistent data requiring explicit restoration.
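Whether the final restore step is needed depends on whether the local data directories survived the rebuild. One rough way to decide, sketched here, is to test on each node whether a workload's hostPath directory is missing or empty. needs_restore is a hypothetical helper, and an empty directory is only a heuristic for "was wiped".

```shell
# Hypothetical helper: succeed if the directory is missing or has no entries.
# Usage: needs_restore <hostPath-directory>
needs_restore() {
  dir=$1
  # ls -A lists every entry except . and .., so dotfiles count as content
  [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]
}
```

For example, on nuc2 a check such as needs_restore /home/k8s-data/supabase-storage would indicate that the Supabase Storage recipe above should be run.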