Download RKLLama Models#
RKLLama runs LLMs on the RK1’s NPU using
Rockchip’s proprietary RKLLM runtime. Models must be pre-converted to .rkllm format
for the RK3588 chip — standard GGUF/Safetensors files will not work.
Find a compatible model on Hugging Face#
Go to https://huggingface.co/models and search for
rk3588 rkllm. Look for repos that contain
.rkllm files targeting rk3588 or rk3588s. Common naming patterns to look out for in the filename:
| Token | Meaning |
|---|---|
| `W8A8` | 8-bit weights, 8-bit activations (recommended) |
| `W4A16` | 4-bit weights, 16-bit activations (smaller, slightly lower quality) |
| `G128` | group-size 128 quantisation variant |
| `o0` / `o1` | optimisation level 0 / 1 (prefer `o1` for speed) |
| `rk3588` | built for RK3588 / RK1 / Orange Pi 5 |
On the Hugging Face repo page, click Files and note:

- The repo owner (e.g. `ahz-r3v`)
- The repo name (e.g. `DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4`)
- The exact `.rkllm` filename (e.g. `DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm`)
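Before pulling a file, it can be worth sanity-checking its name against the token table above. The helper below is a hypothetical sketch (not part of rkllama), and assumes the repo follows the common naming convention; repos are free to deviate from it.

```python
import re

# Hypothetical helper (not part of rkllama): extract the naming tokens
# described above from an .rkllm filename. These tokens are a common
# convention, not a guarantee that every repo follows it.
def parse_rkllm_filename(name: str) -> dict:
    patterns = {
        "quant": r"W\d+A\d+",        # e.g. W8A8, W4A16
        "group": r"G\d+",            # e.g. G128
        "opt":   r"(?<=_)o[01]\b",   # optimisation level o0 / o1
        "chip":  r"(?i:RK3588S?)",   # target SoC
    }
    return {key: (m.group(0) if (m := re.search(pat, name)) else None)
            for key, pat in patterns.items()}

info = parse_rkllm_filename("DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm")
print(info)  # {'quant': 'W8A8', 'group': None, 'opt': 'o1', 'chip': 'RK3588'}
```

A missing `quant` or `chip` token is a hint that the file may not be built for the RK3588 at all.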
Pull a model with rkllama-pull#
rkllama-pull is a CLI tool that searches HuggingFace for compatible RKLLM models,
lets you pick one interactively, and pulls it into the cluster via kubectl exec.
It is installed by the tools Ansible role into $BIN_DIR (default /root/bin).
rkllama-pull [search terms ...]
If you omit the search terms, the tool prompts for them. An example session:
$ rkllama-pull deepseek 7b
Searching HuggingFace for: 'deepseek 7b rk3588 rkllm' ...
Found 4 repo(s):
1. ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4
2. ...
Select repo [1-4]: 1
Fetching file list from ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4 ...
Available .rkllm files:
1. DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm
2. DeepSeek-R1-Distill-Qwen-7B_W4A16_RK3588_o0.rkllm
Select file [1-2]: 1
Resolving rkllama pod ...
Using: pod/rkllama-xyzab
Pulling: ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4/DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm
(Large models may take several minutes)
50% ... 100%
Done. Open WebUI will pick up the new model within ~30 seconds.
Note
The download goes to the rkllama-models PVC on the node and persists across pod
restarts. Large models (~8 GB) may take several minutes depending on your connection.
Note
Open WebUI’s built-in model pull dialog is not supported — rkllama returns
plain-text progress that the WebUI cannot parse. Use rkllama-pull instead.
Pull a model directly via kubectl#
If you prefer to skip the interactive tool, use rkllama pull directly in the pod:
kubectl exec -n rkllama -it \
$(kubectl get pod -n rkllama -l app=rkllama -o name | head -1) \
-c rkllama -- rkllama pull
Enter the repo ID and filename when prompted:
Repo ID: ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4
File: DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm
Or pass them as a single argument:
kubectl exec -n rkllama -it \
$(kubectl get pod -n rkllama -l app=rkllama -o name | head -1) \
-c rkllama -- rkllama pull \
ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4/DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm
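If you are scripting pulls (for example from CI or a provisioning playbook), the kubectl invocation above can be assembled programmatically. This is a minimal sketch, assuming the same `rkllama` namespace, `app=rkllama` label, and container name used throughout this page; `build_pull_cmd` and `run_pull` are hypothetical names, not rkllama APIs.

```python
import subprocess

NAMESPACE = "rkllama"

# Hypothetical wrapper around the kubectl command shown above. The pod name
# must be resolved first, e.g. via:
#   kubectl get pod -n rkllama -l app=rkllama -o name | head -1
def build_pull_cmd(pod: str, model_ref: str) -> list[str]:
    """Assemble the argv for a non-interactive model pull."""
    return [
        "kubectl", "exec", "-n", NAMESPACE, "-it", pod,
        "-c", "rkllama", "--", "rkllama", "pull", model_ref,
    ]

def run_pull(pod: str, model_ref: str) -> int:
    """Execute the pull inside the pod and return kubectl's exit code."""
    return subprocess.run(build_pull_cmd(pod, model_ref)).returncode

cmd = build_pull_cmd(
    "pod/rkllama-xyzab",
    "ahz-r3v/DeepSeek-R1-Distill-Qwen-7B-rk3588-rkllm-1.1.4/"
    "DeepSeek-R1-Distill-Qwen-7B_W8A8_RK3588_o1.rkllm",
)
print(" ".join(cmd))
```

Building the argv as a list (rather than a shell string) avoids quoting problems if a filename ever contains unusual characters.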
List and delete models#
List installed models:
kubectl exec -n rkllama \
$(kubectl get pod -n rkllama -l app=rkllama -o name | head -1) \
-c rkllama -- rkllama list
Delete a model (use the short name shown by list):
kubectl exec -n rkllama \
$(kubectl get pod -n rkllama -l app=rkllama -o name | head -1) \
-c rkllama -- rkllama rm deepseek-r1-distill-qwen-7b
Memory limits#
The RK1 has 16 GB shared between CPU and NPU. Approximate model RAM usage:
| Model size | Quantisation | Approx. RAM |
|---|---|---|
| 3B | W8A8 | ~4 GB |
| 7B | W8A8 | ~9 GB |
| 8B | W8A8 | ~10 GB |
| 14B | W8A8 | ~15 GB (tight) |
Models larger than ~14B will not fit on a single RK1 node.
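The table can be approximated by a simple back-of-the-envelope estimate. The sketch below is an assumption, not a published RKLLM formula: W8A8 stores roughly 1 byte per parameter and W4A16 roughly 0.5 bytes, plus an assumed ~1.5 GB of runtime and context overhead. It lands within about ±1 GB of the table values.

```python
# Rough rule of thumb (an assumption, not a published RKLLM formula):
# ~1 byte/param for W8A8, ~0.5 bytes/param for W4A16, plus an assumed
# ~1.5 GB of runtime and KV-cache overhead.
BYTES_PER_PARAM_GB = {"W8A8": 1.0, "W4A16": 0.5}  # GB per billion params
OVERHEAD_GB = 1.5

def approx_ram_gb(params_billion: float, quant: str = "W8A8") -> float:
    """Estimate RAM needed to run a model of the given size on the NPU."""
    return params_billion * BYTES_PER_PARAM_GB[quant] + OVERHEAD_GB

for size in (3, 7, 8, 14):
    print(f"{size}B W8A8: ~{approx_ram_gb(size):.1f} GB")
```

By this estimate a 14B W4A16 model would fit comfortably, while a 14B W8A8 model leaves almost no headroom on a 16 GB RK1.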