Re-flash and Rebuild

This guide covers common recovery and rebuild operations: re-flashing nodes, reinstalling K3s, and forcing service redeployment.

Force flags

The playbook is fully idempotent: it skips steps that have already completed. Use force flags to override this behaviour:

| Flag | Effect |
| --- | --- |
| -e flash_force=true | Re-flash all nodes (erases eMMC even if an OS is installed) |
| -e do_flash=true | Enable flashing, but skip nodes that already have an OS installed |
| -e k3s_force=true | Uninstall and reinstall K3s on all nodes |
| -e cluster_force=true | Force reinstall of ArgoCD and cluster services |

Re-flash and rebuild the entire cluster

ansible-playbook pb_all.yml -e flash_force=true

This flashes every node with a fresh Ubuntu image, reinstalls K3s, and redeploys all services. It is the most destructive option; use it only when you want a completely clean slate.
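Because this command erases eMMC on every node, it can be worth placing a confirmation step in front of it. The following is a minimal sketch of such a guard, not part of the playbook; the function name `confirm_full_rebuild` and the confirmation word are hypothetical.

```shell
# Hypothetical guard: succeeds only if the operator types "rebuild".
# Reads one line from stdin; the prompt goes to stderr so it never
# mixes with piped output.
confirm_full_rebuild() {
    printf 'This erases eMMC on every node. Type "rebuild" to continue: ' >&2
    read -r answer
    [ "$answer" = "rebuild" ]
}

# Usage (not executed here):
#   confirm_full_rebuild && ansible-playbook pb_all.yml -e flash_force=true
```

Any other input, including an empty line, makes the function return non-zero, so the playbook after `&&` is not started.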

Re-flash a single node

Use --limit to target specific hosts. Always include the Turing Pi BMC host as well (it is needed for the flash operation):

ansible-playbook pb_all.yml --limit turingpi,node03 -e flash_force=true
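Forgetting the BMC host in `--limit` is an easy mistake to make. As a sketch, a small helper could build the limit value so that `turingpi` is always included; the function name `flash_limit` is hypothetical, and the BMC inventory host is assumed to be named `turingpi` as in the command above.

```shell
# Hypothetical helper: builds a --limit value that always starts with the
# BMC host, followed by the nodes given as arguments.
# Assumes the BMC inventory host is called "turingpi".
flash_limit() {
    limit="turingpi"
    for node in "$@"; do
        limit="$limit,$node"
    done
    printf '%s\n' "$limit"
}

# Usage (not executed here):
#   ansible-playbook pb_all.yml --limit "$(flash_limit node03)" -e flash_force=true
```

Passing several node names, for example `flash_limit node02 node03`, produces a single comma-separated value suitable for `--limit`.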

Reinstall K3s on all nodes

ansible-playbook pb_all.yml --tags k3s,cluster -e k3s_force=true

This uninstalls K3s from every node, reinstalls it, and redeploys ArgoCD and all services. Persistent volume data on NVMe/Longhorn will survive if the underlying storage is not reformatted.

Reinstall K3s on a single worker

ansible-playbook pb_all.yml --limit node03 --tags k3s -e k3s_force=true

The worker will be removed from the cluster, K3s will be reinstalled, and it will rejoin as a worker.
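To confirm that the worker has rejoined, one option is to wait for its Ready condition with `kubectl wait`. This is a sketch under two assumptions: `kubectl` is already configured for this cluster, and the Kubernetes node name matches the inventory host name (for example `node03`). The function name `wait_for_node` is hypothetical.

```shell
# Sketch: block until a reinstalled worker reports Ready.
# $1 = node name (assumed to match the inventory host name)
# $2 = optional timeout, default 300s
wait_for_node() {
    node="$1"
    timeout="${2:-300s}"
    kubectl wait --for=condition=Ready "node/$node" --timeout="$timeout"
}

# Usage (not executed here):
#   wait_for_node node03
```

Note that `kubectl wait` fails immediately if the node object does not exist yet, so it is best run after the playbook has finished.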

Redeploy cluster services only

ansible-playbook pb_all.yml --tags cluster -e cluster_force=true

This reinstalls ArgoCD. After ArgoCD is up, it resynchronises all services from Git.
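To watch the resynchronisation, the ArgoCD Application resources can be listed with `kubectl`. The sketch below assumes ArgoCD is installed in the `argocd` namespace, which is its usual default but is not confirmed by this guide; the function name and the `ARGOCD_NAMESPACE` override variable are hypothetical.

```shell
# Sketch: list ArgoCD Application resources; the default table output
# typically includes each application's sync and health status.
# Assumes the "argocd" namespace unless ARGOCD_NAMESPACE (hypothetical
# override) is set.
argocd_app_status() {
    kubectl get applications.argoproj.io \
        --namespace "${ARGOCD_NAMESPACE:-argocd}"
}

# Usage (not executed here):
#   argocd_app_status
```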

Run a single stage

Use tags to run individual stages:

ansible-playbook pb_all.yml --tags tools       # Install CLI tools in devcontainer
ansible-playbook pb_all.yml --tags known_hosts  # Update SSH known_hosts
ansible-playbook pb_all.yml --tags servers      # OS migration + package updates
ansible-playbook pb_all.yml --tags k3s          # Install/update K3s
ansible-playbook pb_all.yml --tags cluster      # Install/update ArgoCD + services
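A mistyped tag is easy to miss, so a thin wrapper can check the stage name against the tags listed above before invoking Ansible. This is a sketch only: the function name `run_stage` and the `ANSIBLE_PLAYBOOK` override variable are hypothetical, and the accepted names are exactly the five tags shown above.

```shell
# Sketch: run one stage, rejecting names that are not among the tags
# documented above. ANSIBLE_PLAYBOOK is a hypothetical override for the
# executable, which makes a dry run possible (e.g. ANSIBLE_PLAYBOOK=echo).
run_stage() {
    case "$1" in
        tools|known_hosts|servers|k3s|cluster)
            "${ANSIBLE_PLAYBOOK:-ansible-playbook}" pb_all.yml --tags "$1"
            ;;
        *)
            echo "unknown stage: $1" >&2
            return 2
            ;;
    esac
}

# Usage (not executed here):
#   run_stage servers
```

The tags actually defined by the playbook can be listed with `ansible-playbook pb_all.yml --list-tags`.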

What happens to data during a rebuild?

| Operation | eMMC | NVMe | Longhorn volumes | ArgoCD state |
| --- | --- | --- | --- | --- |
| Re-flash | Erased | Preserved | Preserved (if on NVMe) | Redeployed from Git |
| K3s reinstall | Unchanged | Unchanged | Preserved | Redeployed from Git |
| Cluster redeploy | Unchanged | Unchanged | Preserved | Reinstalled |

Note

The bootloader for RK1 nodes always stays on eMMC. The ubuntu-rockchip-install tool (used by move_fs) copies the OS to NVMe but does not change the boot device. Re-flashing eMMC always restores the node to a bootable state.

Troubleshooting flash failures

If tpi flash fails with Error occured during flashing: "USB":

  1. Power-cycle the BMC itself, not just the nodes. The failure is caused by a USB enumeration bug in the BMC firmware.

  2. Re-run the playbook.

The BMC USB subsystem sometimes gets into a bad state that only a full power cycle resolves.
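If the playbook output is captured to a file, this specific failure can be detected automatically. The sketch below is an assumption-laden example: the function name `is_usb_flash_error` and the log file name are hypothetical, and the pattern also accepts an escaped quote (`\"USB`) in case the message appears inside JSON-formatted Ansible output.

```shell
# Sketch: return success if the given log file contains the USB flashing
# error quoted above. The spelling "occured" is matched as it appears in
# the error message. The optional backslash covers JSON-escaped quotes.
is_usb_flash_error() {
    grep -Eq 'Error occured during flashing: \\?"USB' "$1"
}

# Usage (not executed here; flash.log is a hypothetical file name):
#   ansible-playbook pb_all.yml -e flash_force=true 2>&1 | tee flash.log
#   if is_usb_flash_error flash.log; then
#       echo "Power-cycle the BMC, then re-run the playbook."
#   fi
```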