---
license: mit
tags:
  - rockchip
  - rk3588
  - rkllm
  - text-generation-inference
pipeline_tag: text-generation
---

# ezrkllm-collection

Collection of LLMs compatible with Rockchip chips, converted using their rkllm-toolkit. This repo contains converted models for running on the RK3588 NPU found in SBCs like the Orange Pi 5, NanoPi R6 and Radxa Rock 5.

Check the main repo on GitHub for installation and usage instructions: https://github.com/Pelochus/ezrknpu

## Available LLMs

Before running any LLM, note that the required RAM is roughly 1.5-3 times the model size (this is an estimate; extensive testing has not been done yet).
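The rule of thumb above can be turned into a quick estimate (the 4 GB model size below is just an illustrative figure, not a specific model from this repo):

```python
def ram_estimate_gb(model_size_gb: float) -> tuple[float, float]:
    """Estimated RAM range: roughly 1.5x to 3x the model file size."""
    return (1.5 * model_size_gb, 3 * model_size_gb)

# Example: a hypothetical 4 GB .rkllm model
low, high = ram_estimate_gb(4)
print(f"Expect roughly {low:.1f}-{high:.1f} GB of RAM")  # roughly 6.0-12.0 GB
```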

So far, only the following models have been converted:

However, RKLLM also supposedly supports Qwen 2. Llama 2 was converted using Azure servers. For reference, converting Phi-2 peaked at about 15 GB of RAM plus 25 GB of swap (counting the OS, which used about 2 GB at most). Converting Llama 2 7B peaked at about 32 GB of RAM plus 50 GB of swap.

## Downloading a model

Use:

```shell
git clone LINK_FROM_PREVIOUS_TABLE_HERE
```

And then (may not be necessary):

```shell
git lfs pull
```

If the first clone gives you problems (for example, it takes too long), you can instead run:

```shell
GIT_LFS_SKIP_SMUDGE=1 git clone LINK_FROM_PREVIOUS_TABLE_HERE
```

And then run `git lfs pull` inside the cloned folder to download the full model.

## RKLLM parameters used

The RK3588 only supports w8a8 quantization, so that was the quantization used for ALL models. Aside from that, the RKLLM toolkit allows choosing between no optimization (0) and optimization (1); all models here were converted with optimization enabled.
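For reference, a conversion with those settings might be sketched as below. This follows the general shape of the examples in Rockchip's rknn-llm repo, but the exact module path, method names, and arguments are assumptions that may differ between toolkit versions; treat it as an outline, not a recipe.

```python
# Parameters matching the settings described above, used for all models here.
BUILD_PARAMS = {
    "quantized_dtype": "w8a8",   # RK3588 only supports w8a8 quantization
    "optimization_level": 1,     # 0 = no optimization, 1 = optimized (used here)
    "target_platform": "rk3588",
}

def convert(hf_model_dir: str, out_path: str) -> None:
    """Convert a Hugging Face model to a .rkllm file (sketch, API assumed)."""
    # Imported lazily so this file can be read without the toolkit installed.
    from rkllm.api import RKLLM  # assumption: rkllm-toolkit's Python package

    llm = RKLLM()
    if llm.load_huggingface(model=hf_model_dir) != 0:
        raise RuntimeError("failed to load the Hugging Face model")
    if llm.build(do_quantization=True, **BUILD_PARAMS) != 0:
        raise RuntimeError("build/quantization failed")
    if llm.export_rkllm(out_path) != 0:
        raise RuntimeError("export failed")
```

Note that the conversion itself is what requires the large amounts of RAM and swap mentioned above; it runs on an x86 host, not on the RK3588 board.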

## Future additions

  - Converting Llama 2 70B (currently in conversion, but that won't run even with 32 GB of RAM)
  - Converting Qwen 1.5 (from 0.5B to 7B, except the 4B variant, which is already converted)
  - Adding support for other compatible Rockchip SoCs

## More info