
license: mit
tags:
  - rockchip
  - rk3588
  - rkllm
  - text-generation-inference
pipeline_tag: text-generation

ezrkllm-collection

A collection of LLMs converted with Rockchip's rkllm-toolkit so they can run on Rockchip's chips. This repo contains the converted models for the RK3588 NPU, found in SBCs such as the Orange Pi 5, NanoPi R6 and Radxa Rock 5.

See the main repo on GitHub for installation and usage instructions: https://github.com/Pelochus/ezrknpu

Available LLMs

Before running any LLM, keep in mind that it needs roughly 1.5 to 3 times the model size in RAM (this is an estimate; I haven't done extensive testing yet).
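As a quick way to apply that rule of thumb, here is a small sketch. The function name estimate_ram is hypothetical, and it simply multiplies the model size by 1.5 and 3; the model size itself can be checked with something like 'du -h' on the downloaded model file.

```shell
# Hypothetical helper: rough RAM range for running a converted model,
# using the 1.5x-3x rule of thumb from this README (an estimate only).
# Usage: estimate_ram <model_size_in_GB>
estimate_ram() {
  awk -v size="$1" 'BEGIN { printf "%.1f-%.1f GB\n", size * 1.5, size * 3 }'
}

# Example: a model file of about 2 GB
estimate_ram 2
```

For a 2 GB model this prints "3.0-6.0 GB".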

So far, I have only converted the following models:

| LLM | Parameters | Link |
|-----|------------|------|
| Qwen Chat | 1.8B | https://huggingface.co/Pelochus/qwen-1_8B-rk3588 |
| Microsoft Phi-2 | 2.7B | https://huggingface.co/Pelochus/phi-2-rk3588 |
| Llama 2 7B | 7B | https://huggingface.co/Pelochus/llama2-chat-7b-hf-rk3588 |
| Llama 2 13B | 13B | https://huggingface.co/Pelochus/llama2-chat-13b-hf-rk3588 |
| Qwen 1.5 Chat | 4B | https://huggingface.co/Pelochus/qwen1.5-chat-4B-rk3588 |
| TinyLlama v1 (broken) | 1.1B | https://huggingface.co/Pelochus/tinyllama-v1-rk3588 |

RKLLM also supposedly supports Qwen 2. Llama 2 was converted on Azure servers. For reference, converting Phi-2 peaked at about 15 GB of RAM + 25 GB of swap (this includes the OS, which used about 2 GB at most). Converting Llama 2 7B peaked at about 32 GB of RAM + 50 GB of swap.
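If you want to convert models yourself, a rough pre-check is to compare your machine's RAM + swap against those peaks (about 40 GB for Phi-2 and about 82 GB for Llama 2 7B). The sketch below assumes Linux, where /proc/meminfo reports MemTotal and SwapTotal in kB; the function names are hypothetical, and the result is in GiB, which is close enough to the approximate GB figures above.

```shell
# Hypothetical helper: total RAM + swap in whole GiB, read from a
# meminfo-style file (default /proc/meminfo, Linux only; values in kB).
# Pass "-" to read the meminfo text from stdin instead.
total_mem_gb() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^(MemTotal|SwapTotal):/ { kb += $2 } END { print int(kb / 1048576) }' "$meminfo"
}

# Hypothetical helper: succeeds if RAM + swap is at least <needed_gb>.
# Usage: has_enough_mem <needed_gb> [meminfo_file]
has_enough_mem() {
  local needed_gb="$1" meminfo="${2:-/proc/meminfo}"
  [ "$(total_mem_gb "$meminfo")" -ge "$needed_gb" ]
}
```

For example, 'has_enough_mem 40 && echo ok' checks for roughly the Phi-2 peak. Passing the check does not guarantee a conversion will succeed; it only rules out machines that are clearly too small.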

Downloading a model

Use:

git clone LINK_FROM_PREVIOUS_TABLE_HERE

Then (this may not be necessary):

git lfs pull

If the clone gives you problems (e.g. it takes too long), you can skip downloading the large LFS files at first:

GIT_LFS_SKIP_SMUDGE=1 git clone LINK_FROM_PREVIOUS_TABLE_HERE

Then run 'git lfs pull' inside the cloned folder to download the full model.
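The two-step download can be wrapped in a small script. This is a sketch under a few assumptions: git and git-lfs are installed, and the clone lands in the default folder git derives from the URL (its last path component, without a trailing "/" or ".git"). Both function names are hypothetical.

```shell
# Hypothetical helper: folder name git clone uses by default for a URL.
repo_dir_from_url() {
  local url="${1%/}"   # drop a trailing slash, if any
  url="${url%.git}"    # drop a trailing .git, if any
  printf '%s\n' "${url##*/}"
}

# Hypothetical helper: clone without LFS files, then fetch them.
download_model() {
  local url="$1" dir
  dir="$(repo_dir_from_url "$url")"
  GIT_LFS_SKIP_SMUDGE=1 git clone "$url" || return 1
  (cd "$dir" && git lfs pull)
}
```

Usage: download_model https://huggingface.co/Pelochus/phi-2-rk3588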

RKLLM parameters used

The RK3588 only supports w8a8 quantization (8-bit weights and 8-bit activations), so that is the quantization used for ALL models. Aside from that, the RKLLM toolkit offers two optimization levels: no optimization (0) and optimization (1). All models were converted with optimization enabled (1).
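Because w8a8 stores each weight in 8 bits, a rough ballpark for a converted model's size is about one byte per parameter, versus about two bytes per parameter for a typical 16-bit (FP16/BF16) checkpoint. The sketch below is only that arithmetic (parameters × bits / 8); it ignores any tensors kept at higher precision and file metadata, so real file sizes will differ somewhat. The function name weights_gb is hypothetical.

```shell
# Hypothetical helper: approximate weight storage in GB (10^9 bytes)
# for <params_in_billions> parameters at <bits_per_weight> bits each.
weights_gb() {
  awk -v b="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", b * bits / 8 }'
}

weights_gb 7 16   # 16-bit Llama 2 7B: about 14 GB of weights
weights_gb 7 8    # w8a8 Llama 2 7B: about 7 GB of weights
```

Combined with the 1.5x to 3x rule of thumb for RAM, a w8a8 7B model would need somewhere around 10.5 to 21 GB of RAM, which is still only an estimate.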

Future additions

Converting Llama 2 70B (currently in progress, although it won't run even with 32 GB of RAM)
Converting Qwen 1.5 (0.5B to 7B; 4B is already converted)
Adding other compatible Rockchip SoCs

More info

My fork of rknn-llm: https://github.com/Pelochus/ezrknn-llm
Rockchip's original LLM repo (rknn-llm): https://github.com/airockchip/rknn-llm