3ib0n's RKLLM Guide
These models and binaries require an RK3588 board running rknpu driver version 0.9.7 or above
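Before converting anything, it is worth confirming the board's driver version; on most RK3588 Linux images it is exposed under debugfs (the exact path can vary by kernel):
# check the rknpu driver version on the board
sudo cat /sys/kernel/debug/rknpu/version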
Steps to reproduce conversion
# Download and setup miniforge3
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
# activate the base environment
source ~/miniforge3/bin/activate
# create and activate a python 3.8 environment
conda create -n rknn-llm-1.1.4 python=3.8
conda activate rknn-llm-1.1.4
# clone the latest rknn-llm toolkit
git clone https://github.com/airockchip/rknn-llm.git
# install dependencies for the toolkit
pip install transformers accelerate torchvision rknn-toolkit2==2.2.1
pip install --upgrade torch pillow
# install the rkllm toolkit wheel shipped inside the cloned repo
pip install rknn-llm/rkllm-toolkit/packages/rkllm_toolkit-1.1.4-cp38-cp38-linux_x86_64.whl
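As a quick sanity check that the toolkit installed correctly, try importing the same RKLLM class the export script below uses:
# verify the rkllm toolkit can be imported
python -c "from rkllm.api import RKLLM; print('rkllm toolkit OK')"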
# edit or create a script to export rkllm models
cd rknn-llm/examples/rkllm_multimodel_demo
nano export/export_rkllm.py # update input and output paths
python export/export_rkllm.py
Example export_rkllm.py modified from https://github.com/airockchip/rknn-llm/blob/main/examples/rkllm_multimodel_demo/export/export_rkllm.py
import os
from rkllm.api import RKLLM
from datasets import load_dataset
from transformers import AutoTokenizer
from tqdm import tqdm
import torch
from torch import nn
modelpath = "~/models/Qwen/Qwen2.5-Coder-14B-Instruct/" ## UPDATE HERE
savepath = './Qwen2.5-Coder-14B-Instruct.rkllm' ## UPDATE HERE
llm = RKLLM()
# Load model
# Use 'export CUDA_VISIBLE_DEVICES=2' to specify GPU device
ret = llm.load_huggingface(model=modelpath, device='cpu')
if ret != 0:
    print('Load model failed!')
    exit(ret)
# Build model
qparams = None
## Do not use the dataset parameter, as we are converting a pure text model rather than a multimodal one
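## Note: 'w8a8' quantizes both weights and activations to 8 bits; depending on the
## rkllm-toolkit release, other quantized_dtype values (e.g. 'w4a16') may also be
## accepted, so check the toolkit documentation for what your version supports.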
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w8a8',
                quantized_algorithm='normal', target_platform='rk3588', num_npu_core=3, extra_qparams=qparams)
if ret != 0:
    print('Build model failed!')
    exit(ret)

# Export rkllm model
ret = llm.export_rkllm(savepath)
if ret != 0:
    print('Export model failed!')
    exit(ret)
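Once the export completes, confirm the converted model was actually written before moving on (filename taken from savepath above):
# check that the exported model file exists
ls -lh Qwen2.5-Coder-14B-Instruct.rkllm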
Steps to build and run the demo
# Download the correct toolchain for working with rkllm
# Documentation here: https://github.com/airockchip/rknn-llm/blob/main/doc/Rockchip_RKLLM_SDK_EN_1.1.0.pdf
wget https://developer.arm.com/-/media/Files/downloads/gnu-a/10.2-2020.11/binrel/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz
tar -xf gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz
# point the gcc compiler path at the directory where the toolchain downloaded above was unpacked
nano deploy/build-linux.sh # update the gcc compiler path
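For reference, the line you are editing should point at the cross-compiler prefix inside the unpacked toolchain. The exact variable name may differ between releases of the script, but it will look roughly like the line below (the extraction path is an assumption; adjust it to wherever you unpacked the archive):
# example compiler path inside build-linux.sh
GCC_COMPILER_PATH=/path/to/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu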
# compile the demo app
cd deploy/
./build-linux.sh
Steps to run the app
More information and original guide: https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_multimodel_demo
# push install dir to device
adb push ./install/demo_Linux_aarch64 /data
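If /data/models does not exist on the device yet, create it first so the next push lands in a directory:
# create the models directory on the device if needed
adb shell mkdir -p /data/models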
# push model file to device
adb push Qwen2.5-Coder-14B-Instruct.rkllm /data/models
adb shell
cd /data/demo_Linux_aarch64
# export lib path
export LD_LIBRARY_PATH=./lib
# soft link models dir
ln -s /data/models .
# run the llm (pure text example)
./llm models/Qwen2.5-Coder-14B-Instruct.rkllm 128 512
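The two numeric arguments are the demo's generation limits; going by the upstream rkllm demo usage they correspond to max_new_tokens and max_context_len, so raise them if you need longer prompts or responses.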