# Re-flash and Rebuild

This guide covers common recovery and rebuild operations: re-flashing nodes,
reinstalling K3s, and forcing service redeployment.

## Force flags

The playbook is fully idempotent — it skips steps that are already completed. Use
force flags to override this behaviour:

| Flag | Effect |
|------|--------|
| `-e flash_force=true` | Re-flash all nodes (erases eMMC even if OS is installed) |
| `-e do_flash=true` | Enable flashing, but skip nodes that already have an OS installed |
| `-e k3s_force=true` | Uninstall and reinstall K3s on all nodes |
| `-e cluster_force=true` | Force reinstall of ArgoCD and cluster services |

## Re-flash and rebuild the entire cluster

```bash
ansible-playbook pb_all.yml -e flash_force=true
```

This flashes every node with a fresh Ubuntu image, reinstalls K3s, and redeploys all
services. This is the nuclear option — use it when you want a completely clean slate.
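
Because this erases every node, it can help to put a confirmation step in front of the command. The following is a minimal sketch, not part of the playbook: `confirm_reflash` is a hypothetical helper name, and the prompt wording is an assumption.

```bash
# Hypothetical guard (not part of the playbook): succeed only if the user
# types the word "erase" at the prompt.
confirm_reflash() {
  local answer
  read -r -p 'Type "erase" to re-flash every node: ' answer
  [ "$answer" = "erase" ]
}

# Usage:
#   confirm_reflash && ansible-playbook pb_all.yml -e flash_force=true
```

The helper only returns a status, so the playbook command itself stays visible in your shell history.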

## Re-flash a single node

Use `--limit` to target specific hosts. Always include the Turing Pi BMC host as well
(it is needed for the flash operation):

```bash
ansible-playbook pb_all.yml --limit turingpi,node03 -e flash_force=true
```
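
To avoid forgetting the BMC host, the `--limit` pattern can be built by a small shell function. This is only a sketch: `flash_limit` is a hypothetical helper, and it assumes the BMC host is named `turingpi` as in the example above.

```bash
# Hypothetical helper: build a --limit pattern that always includes the BMC host.
# Assumes the BMC inventory name is "turingpi", as in the example above.
flash_limit() {
  local IFS=,                     # join the arguments with commas
  printf 'turingpi,%s\n' "$*"
}

flash_limit node03                # prints: turingpi,node03
flash_limit node02 node04         # prints: turingpi,node02,node04

# Usage:
#   ansible-playbook pb_all.yml --limit "$(flash_limit node03)" -e flash_force=true
```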

## Reinstall K3s on all nodes

```bash
ansible-playbook pb_all.yml --tags k3s,cluster -e k3s_force=true
```

This uninstalls K3s from every node, reinstalls it, and redeploys ArgoCD and all services.
Persistent volume data on NVMe/Longhorn will survive if the underlying storage is not
reformatted.

## Reinstall K3s on a single worker

```bash
ansible-playbook pb_all.yml --limit node03 --tags k3s -e k3s_force=true
```

The worker will be removed from the cluster, K3s will be reinstalled, and it will rejoin
as a worker.
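
One way to confirm that the worker has rejoined is to check its status in the output of `kubectl get nodes`. The sketch below assumes that `kubectl` is configured for this cluster and that the Kubernetes node name matches the inventory host name (`node03`); `node_is_ready` is a hypothetical helper, not something the playbook provides.

```bash
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and succeeds only if the named node has STATUS exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is treated as not ready.
node_is_ready() {
  awk -v node="$1" '$1 == node && $2 == "Ready" { found = 1 } END { exit !found }'
}

# Usage (assumes the node name matches the inventory host name):
#   kubectl get nodes --no-headers | node_is_ready node03 && echo "node03 has rejoined"
```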

## Redeploy cluster services only

```bash
ansible-playbook pb_all.yml --tags cluster -e cluster_force=true
```

This reinstalls ArgoCD. Once ArgoCD is running, it resynchronises all services from Git.

## Run a single stage

Use tags to run individual stages:

```bash
ansible-playbook pb_all.yml --tags tools        # Install CLI tools in devcontainer
ansible-playbook pb_all.yml --tags known_hosts  # Update SSH known_hosts
ansible-playbook pb_all.yml --tags servers      # OS migration + package updates
ansible-playbook pb_all.yml --tags k3s          # Install/update K3s
ansible-playbook pb_all.yml --tags cluster      # Install/update ArgoCD + services
```

## What happens to data during a rebuild?

| Operation | eMMC | NVMe | Longhorn volumes | ArgoCD state |
|-----------|------|------|-------------------|--------------|
| Re-flash | Erased | Preserved | Preserved (if on NVMe) | Redeployed from Git |
| K3s reinstall | Unchanged | Unchanged | Preserved | Redeployed from Git |
| Cluster redeploy | Unchanged | Unchanged | Preserved | Reinstalled |

:::{note}
eMMC always remains the boot device for RK1 nodes. The `ubuntu-rockchip-install` tool
(used by `move_fs`) copies the OS to NVMe but does not change the boot device. Re-flashing
eMMC always restores the node to a bootable state.
:::

## Troubleshooting flash failures

If `tpi flash` fails with `Error occured during flashing: "USB"`:

1. **Power-cycle the BMC** (not just the nodes). This is a BMC firmware USB enumeration bug.
2. Re-run the playbook.

The BMC USB subsystem sometimes gets into a bad state that only a full power cycle resolves.