Update README.md
Check the main repo on GitHub for how to install and use: https://github.com/Pelochus/ezrknpu

## Available LLMs
Before running any LLM, take into account that the required RAM is roughly 1.5 to 3 times the model size (this is an estimate; extensive testing hasn't been done yet).
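For a rough idea: Llama 2 7B quantized to w8a8 (8-bit weights, so about 1 byte per parameter, around 7 GB) would need somewhere between 10 and 21 GB of RAM by that estimate.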
Right now, only the following models have been converted:

| LLM                   | Parameters | Link                                                       |
| --------------------- | ---------- | ---------------------------------------------------------- |
| Qwen Chat             | 1.8B       | https://huggingface.co/Pelochus/qwen-1_8B-rk3588           |
| Microsoft Phi-2       | 2.7B       | https://huggingface.co/Pelochus/phi-2-rk3588               |
| Llama 2 7B            | 7B         | https://huggingface.co/Pelochus/llama2-chat-7b-hf-rk3588   |
| Llama 2 13B           | 13B        | https://huggingface.co/Pelochus/llama2-chat-13b-hf-rk3588  |
| TinyLlama v1 (broken) | 1.1B       | https://huggingface.co/Pelochus/tinyllama-v1-rk3588        |

However, RKLLM also supports Qwen 1.5. Llama 2 was converted using Azure servers.
For reference, converting Phi-2 peaked at about 15 GB of RAM + 25 GB of swap (counting the OS, which itself was only using about 2 GB at most).

## Downloading a model

If the first clone gives you problems (takes too long), you can also:

`GIT_LFS_SKIP_SMUDGE=1 git clone LINK_FROM_PREVIOUS_TABLE_HERE`
And then run `git lfs pull` inside the cloned folder to download the full model.
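For example, with the Phi-2 model from the table above: `GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Pelochus/phi-2-rk3588 && cd phi-2-rk3588 && git lfs pull`.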
## RKLLM parameters used
RK3588 **only supports w8a8 quantization**, so that was the selected quantization for ALL models.
Aside from that, the RKLLM toolkit allows for no optimization (0) and optimization (1).
All models are optimized.
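For reference, a conversion with these parameters looks roughly like the sketch below. This assumes the `rkllm-toolkit` Python API as it appears in Rockchip's rknn-llm examples; the model path and output filename here are placeholders:

```python
from rkllm.api import RKLLM  # Rockchip's rkllm-toolkit

llm = RKLLM()

# Load the original Hugging Face model (placeholder path)
ret = llm.load_huggingface(model='microsoft/phi-2')
assert ret == 0, 'Load failed'

# w8a8 is the only quantization the RK3588 NPU supports;
# optimization_level=1 turns on the toolkit's optimization
ret = llm.build(do_quantization=True,
                quantized_dtype='w8a8',
                optimization_level=1,
                target_platform='rk3588')
assert ret == 0, 'Build failed'

# Export the converted model (placeholder filename)
ret = llm.export_rkllm('./phi-2-rk3588.rkllm')
assert ret == 0, 'Export failed'
```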
## Future additions
- [x] Converting Llama 2 (70B is currently in conversion, but that won't run even with 32 GB of RAM)
- [ ] Converting Qwen 1.5 (from 0.5B to 7B)
- [ ] Adding other compatible Rockchip SoCs
## More info