---
license: mit
tags:
- rockchip
- rk3588
- rkllm
- text-generation-inference
pipeline_tag: text-generation
---

# ezrkllm-collection
Collection of LLMs compatible with Rockchip's chips, converted using their rkllm-toolkit.
This repo contains the converted models for running on the RK3588 NPU found in SBCs like the Orange Pi 5, NanoPi R6, and Radxa Rock 5.

Check the main repo on GitHub for installation and usage instructions: https://github.com/Pelochus/ezrknpu

## Available LLMs
Before running any LLM, take into account that the required RAM is roughly 1.5-3 times the model size; for example, a 4 GB model may need 6-12 GB of free RAM (this is an estimate, extensive testing hasn't been done yet).

Right now, only the following models have been converted:
| LLM                   | Parameters  | Link                                                       | 
| --------------------- | ----------- | ---------------------------------------------------------- |
| Qwen Chat             | 1.8B        | https://huggingface.co./Pelochus/qwen-1_8B-rk3588           |
| Gemma                 | 2B          | https://huggingface.co./Pelochus/gemma-2b-rk3588            |
| Microsoft Phi-2       | 2.7B        | https://huggingface.co./Pelochus/phi-2-rk3588               |
| Microsoft Phi-3 Mini  | 3.8B        | https://huggingface.co./Pelochus/phi-3-mini-rk3588          |
| Llama 2 7B            | 7B          | https://huggingface.co./Pelochus/llama2-chat-7b-hf-rk3588   |
| Llama 2 13B           | 13B         | https://huggingface.co./Pelochus/llama2-chat-13b-hf-rk3588  |
| TinyLlama v1          | 1.1B        | https://huggingface.co./Pelochus/tinyllama-v1-rk3588        |
| Qwen 1.5 Chat         | 4B          | https://huggingface.co./Pelochus/qwen1.5-chat-4B-rk3588     |
| Qwen 2                | 1.5B        | https://huggingface.co./Pelochus/qwen2-1_5B-rk3588          |

Llama 2 was converted using Azure servers.
For reference, converting Phi-2 peaked at about 15 GB of RAM plus 25 GB of swap (including the OS, which used about 2 GB at most).
Converting Llama 2 7B peaked at about 32 GB of RAM plus 50 GB of swap.

## Downloading a model 
Use:

`git clone LINK_FROM_PREVIOUS_TABLE_HERE`

And then (this may not be necessary):

`git lfs pull`

If the regular clone gives you problems (for example, it takes too long), you can also run:

`GIT_LFS_SKIP_SMUDGE=1 git clone LINK_FROM_PREVIOUS_TABLE_HERE`

And then run `git lfs pull` inside the cloned folder to download the full model.
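
Alternatively, a minimal sketch for downloading with the `huggingface_hub` Python library instead of git (not required, just another way to fetch the files; the repo id shown is one model from the table above):

```python
from huggingface_hub import snapshot_download

# Downloads every file in the repo (including the large model files)
# and returns the local folder path; use any repo id from the table
local_dir = snapshot_download(repo_id="Pelochus/qwen-1_8B-rk3588")
print(f"Model downloaded to: {local_dir}")
```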

## RKLLM parameters used
The RK3588 **only supports w8a8 quantization**, so that was the quantization selected for ALL models.
Aside from that, the RKLLM toolkit allows converting with no optimization (0) or with optimization (1).
All models here were converted with optimization enabled (1).
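
For reference, below is a minimal conversion sketch using the rkllm-toolkit Python API. It follows the example scripts in the rknn-llm repo; the model path is a placeholder, and parameter names may differ between toolkit versions:

```python
from rkllm.api import RKLLM

# Placeholder: path to a local Hugging Face model folder
MODEL_PATH = "./path/to/huggingface-model"

llm = RKLLM()

# Load the original Hugging Face model
if llm.load_huggingface(model=MODEL_PATH) != 0:
    raise SystemExit("Load model failed!")

# Build with w8a8 quantization (the only mode the RK3588 supports)
# and optimization enabled (1), as used for all models in this collection
if llm.build(do_quantization=True, optimization_level=1,
             quantized_dtype="w8a8", target_platform="rk3588") != 0:
    raise SystemExit("Build model failed!")

# Export the converted model as a .rkllm file
if llm.export_rkllm("./model.rkllm") != 0:
    raise SystemExit("Export model failed!")
```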

## Future additions
- [x] Converting other compatible LLMs
- [ ] Adding other compatible Rockchip SoCs

## More info
- My fork for rknn-llm: https://github.com/Pelochus/ezrknn-llm
- Rockchip's original rknn-llm repo: https://github.com/airockchip/rknn-llm