# Download RKLLama Models

[RKLLama](https://github.com/NotPunchnox/rkllama) runs LLMs on the RK1's NPU using Rockchip's proprietary RKLLM runtime. Models must be pre-converted to `.rkllm` format for the RK3588 chip; standard GGUF/Safetensors files will not work.

## Prerequisites: configure your NFS share

RKLLama stores models on an NFS PersistentVolume so that all RK1 nodes share the same model library and models survive pod restarts. Before deploying rkllama you must point it at your own NFS server.

Edit **`kubernetes-services/values.yaml`**, which is the single place to configure NFS:

```yaml
rkllama:
  nfs:
    server: 192.168.1.3       # ← replace with your NAS / NFS server IP
    path: /bigdisk/LMModels   # ← replace with the exported path
```

ArgoCD injects these values into the rkllama Helm chart at sync time, so no other file needs changing. Commit and push the change; ArgoCD will then reconcile the PersistentVolume automatically.

## Find a compatible model on Hugging Face

1. Go to Hugging Face and search for `rk3588 rkllm`.
2. Look for repos that contain `.rkllm` files targeting `rk3588` or `rk3588s`. Common naming patterns in the filename to look out for:

   | Token | Meaning |
   |---|---|
   | `W8A8` | 8-bit weights, 8-bit activations (recommended) |
   | `W4A16` | 4-bit weights, 16-bit activations; smaller, slightly lower quality |
   | `G128` | group-size 128 quantisation variant |
   | `o0` / `o1` | optimisation level 0 / 1; prefer `o1` for speed |
   | `rk3588` | built for RK3588 / RK1 / Orange Pi 5 |

3. On the Hugging Face repo page, click **Files** and note:
   - The **repo owner** (e.g. `ahz-r3v`)
   - The **repo name** (e.g. `DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4`)
   - The **exact `.rkllm` filename** (e.g. `DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm`)

## Pull a model with `rkllama-pull`

`rkllama-pull` is a CLI tool that searches Hugging Face for compatible RKLLM models, lets you pick one interactively, and pulls it into the cluster via `kubectl exec`.
It is installed by the `tools` Ansible role into `$BIN_DIR` (default `/root/bin`).

```bash
rkllama-pull [search terms ...]
```

If you omit the search terms you will be prompted for them. An example session with search terms supplied:

```
$ rkllama-pull deepseek 7b
Searching HuggingFace for: 'deepseek 7b rk3588 rkllm' ...
Found 4 repo(s):
1. ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4
2. ...
Select repo [1-4]: 1
Fetching file list from ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4 ...
Available .rkllm files:
1. DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm
2. DeepSeek-R1-Distill-Qwen-7B_W4A16_RK3588_o0.rkllm
Select file [1-2]: 1
Resolving rkllama pod ...
Using: pod/rkllama-xyzab
Pulling: ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4/DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm
(Large models may take several minutes)
50%
100%
...
Done. Open WebUI will pick up the new model within ~30 seconds.
```

```{note}
The download goes to the `rkllama-models` PVC on the node and persists across pod restarts. Large models (~8 GB) may take several minutes depending on your connection.
```

```{note}
Open WebUI's built-in model pull dialog is **not supported**: rkllama returns plain-text progress that the WebUI cannot parse. Use `rkllama-pull` instead.
```

## Pull a model directly via kubectl

If you prefer to skip the interactive tool, use `rkllama pull` directly in the pod:

```bash
kubectl exec -n rkllama -it \
  $(kubectl get pod -n rkllama -l app=rkllama -o name | head -1) \
  -c rkllama -- rkllama pull
```

Enter the repo ID and filename when prompted:

```
Repo ID: ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4
File: DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm
```

Or pass them as a single argument:

```bash
kubectl exec -n rkllama -it \
  $(kubectl get pod -n rkllama -l app=rkllama -o name | head -1) \
  -c rkllama -- rkllama pull \
  ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4/DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm
```

## List and delete models

**List installed models:**

```bash
kubectl exec -n rkllama \
  $(kubectl get pod -n rkllama -l app=rkllama -o name | head -1) \
  -c rkllama -- rkllama list
```

**Delete a model** (use the short name shown by `list`):

```bash
kubectl exec -n rkllama \
  $(kubectl get pod -n rkllama -l app=rkllama -o name | head -1) \
  -c rkllama -- rkllama rm deepseek-r1-distill-qwen-7b
```

## Memory limits

The RK1 has 16 GB shared between CPU and NPU. Approximate model RAM usage:

| Model size | Quantisation | Approx. RAM |
|---|---|---|
| 3B | W8A8 | ~4 GB |
| 7B | W8A8 | ~9 GB |
| 8B | W8A8 | ~10 GB |
| 14B | W8A8 | ~15 GB (tight) |

Models larger than ~14B will not fit on a single RK1 node.
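
As a rough pre-flight check before pulling, the table above can be encoded in a small shell helper. This is only a sketch: `approx_ram_gb` is a hypothetical function, not part of rkllama or `rkllama-pull`, and it assumes the parameter count appears in the filename as a token such as `7B` set off by `-`, `_` or `.`. The numbers are copied from the table and cover W8A8 only; anything else is reported as `unknown`.

```bash
# Hypothetical helper (not part of rkllama): estimate RAM in GB for a
# W8A8 .rkllm file, using only the values from the table above.
# The body runs in a subshell "( ... )" so its variables stay local.
approx_ram_gb() (
  file=$1
  # Only W8A8 rows are documented, so refuse other quantisations.
  case "$file" in
    *W8A8*) ;;
    *) echo "unknown"; exit 1 ;;
  esac
  # Pull out a parameter-count token such as "7B" or "14B" that is
  # preceded by "-" or "_" and followed by "-", "_" or "."
  # (assumed naming convention).
  size=$(printf '%s\n' "$file" |
    sed -n 's/.*[-_]\([0-9][0-9]*\)[Bb][-_.].*/\1/p')
  case "$size" in
    3)  echo 4 ;;
    7)  echo 9 ;;
    8)  echo 10 ;;
    14) echo 15 ;;
    *)  echo "unknown"; exit 1 ;;
  esac
)

approx_ram_gb "DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm"   # prints 9
```

Compare the result against the 16 GB that the RK1 shares between CPU and NPU; the figures are approximate, so treat a value near the limit as a warning rather than a guarantee.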