# Backup and Restore

This guide covers the backup strategy for a cluster managed by this project.

## What needs backing up?

The key insight of a GitOps-managed cluster is that **the Git repository is the backup for all configuration**. ArgoCD can fully reconstruct the cluster state from the repo.

What is **not** in Git and needs separate backup:

| Data | Where it lives | Backup approach |
|------|-----------------|-----------------|
| Persistent Volume data | Longhorn volumes on NVMe | Longhorn snapshots/backups |
| Sealed Secrets private key | `kube-system` namespace | Manual export |
| Admin passwords | `admin-auth` secrets (manual) | Re-create from password manager |
| ArgoCD initial admin secret | `argo-cd` namespace | Regenerated on install |

## Longhorn volume snapshots

Longhorn supports both **snapshots** (local, on the same nodes) and **backups** (to external storage such as NFS or S3).

### Create a snapshot

Via the Longhorn UI at **https://longhorn.your-domain.com**:

1. Navigate to **Volumes**.
2. Click on the volume name.
3. Click **Take Snapshot**.

Via `kubectl`, by creating a Longhorn `Snapshot` resource (replace `<volume-name>` with the name of the volume):

```bash
kubectl apply -f - <<EOF
apiVersion: longhorn.io/v1beta2
kind: Snapshot
metadata:
  name: snap-manual
  namespace: longhorn-system
spec:
  volume: <volume-name>
  createSnapshot: true
EOF
```

## Sealed Secrets key backup

The sealed-secrets controller's private key is what decrypts every SealedSecret in the repo. Export it:

```bash
kubectl get secret -n kube-system \
  -l sealedsecrets.bitnami.com/sealed-secrets-key \
  -o yaml > sealed-secrets-key-backup.yaml
```

:::{warning}
Store this file **securely** (e.g. password manager, encrypted drive) — never in Git. It can decrypt all your SealedSecrets.
:::

### Restore after rebuild

Before ArgoCD deploys the sealed-secrets controller on a new cluster:

```bash
kubectl apply -f sealed-secrets-key-backup.yaml
```

The new controller will pick up the restored key and can decrypt existing SealedSecrets.

## etcd backup and restore

K3s uses an embedded etcd datastore (or SQLite on single-node installs). Backing up etcd preserves the full cluster state, including all Kubernetes objects. Note that `etcd-snapshot` commands require the embedded etcd datastore; they are not available with SQLite.

### Create an etcd snapshot

```bash
ssh node01 sudo k3s etcd-snapshot save --name manual-$(date +%Y%m%d)
```

Snapshots are stored at `/var/lib/rancher/k3s/server/db/snapshots/` on the control plane node.
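A snapshot left on the node does not survive the loss of that node's disk, so it is worth copying snapshots off-node as well. A minimal sketch of the save-and-copy steps, assuming SSH access to `node01` and a local `~/k3s-snapshots` directory (both assumptions; `DRY_RUN` defaults to printing the commands rather than executing them):

```shell
#!/bin/sh
# Sketch: take a dated etcd snapshot, then copy it off the control plane node.
# NODE and DEST are assumptions; adjust for your environment.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
set -eu
NODE="${NODE:-node01}"
DEST="${DEST:-$HOME/k3s-snapshots}"
NAME="manual-$(date +%Y%m%d)"
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run ssh "$NODE" sudo k3s etcd-snapshot save --name "$NAME"
run mkdir -p "$DEST"
run scp "$NODE:/var/lib/rancher/k3s/server/db/snapshots/${NAME}*" "$DEST/"
```

Run with `DRY_RUN=0` once the printed commands look right for your setup.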
### List snapshots

```bash
ssh node01 sudo k3s etcd-snapshot list
```

### Configure automatic snapshots

K3s supports automatic etcd snapshots. Add to `/etc/rancher/k3s/config.yaml` on the control plane:

```yaml
etcd-snapshot-schedule-cron: "0 */6 * * *"  # every 6 hours
etcd-snapshot-retention: 10
```

Restart K3s to apply:

```bash
ssh node01 sudo systemctl restart k3s
```

### Restore from snapshot

:::{warning}
Restoring replaces the entire cluster state. All changes made since the snapshot are lost.
:::

```bash
ssh node01
sudo systemctl stop k3s
sudo k3s server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-name>
sudo systemctl start k3s
```

Note that `--cluster-reset-restore-path` must point at a specific snapshot file (use `k3s etcd-snapshot list` to find the name), not at the snapshots directory.

## Disaster recovery

To rebuild a cluster from scratch:

1. Flash and provision nodes (see tutorials).
2. Run `ansible-playbook pb_all.yml -e do_flash=true`.
3. Restore the sealed-secrets key (if backed up).
4. Re-create the `admin-auth` secrets (see {doc}`bootstrap-cluster`).
5. ArgoCD auto-syncs all services from Git.
6. Restore Longhorn volumes from NFS backups (if configured).

Once ArgoCD has finished syncing, the cluster is fully operational again; only persistent volume data (step 6) requires explicit restoration.
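K3s prunes its own snapshots according to `etcd-snapshot-retention`, but any off-node copies you keep are not pruned automatically. The same keep-newest-N idea can be sketched in plain shell, assuming the filenames sort chronologically (e.g. `manual-YYYYMMDD`) and GNU `head` is available:

```shell
#!/bin/sh
# Sketch: keep only the newest N snapshot files in a directory, delete the rest.
# Assumes filenames sort chronologically (e.g. manual-YYYYMMDD) and GNU head.
prune_snapshots() {
  dir="$1"
  retain="$2"
  # List names oldest-first, drop the newest $retain, remove what is left.
  ls -1 "$dir" | sort | head -n -"$retain" | while read -r f; do
    rm -- "$dir/$f"
  done
}
```

For example, `prune_snapshots ~/k3s-snapshots 10` mirrors the retention of 10 configured above.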