ChatGLM-6B + ONNX

This model is exported from ChatGLM-6b with int8 quantization and optimized for ONNXRuntime inference. The export code is available in this repo.

Inference code for ONNXRuntime is uploaded with the model. Install the requirements and run streamlit run web-ui.py to start chatting. Currently the MatMulInteger (for the u8s8 data type) and DynamicQuantizeLinear operators are only supported on CPU, so inference runs on CPU only. Arm64 with NEON support (Apple M1/M2) should be reasonably fast.
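
As a rough illustration (not the repository's bundled inference code), an ONNXRuntime session for the quantized graph can be created on CPU roughly as follows. The file name chatglm-6b-int8.onnx is a placeholder; use the actual .onnx file shipped in this repo:

import onnxruntime as ort

# Placeholder file name; substitute the ONNX file from this repository.
session = ort.InferenceSession(
    "chatglm-6b-int8.onnx",
    providers=["CPUExecutionProvider"],  # u8s8 MatMulInteger / DynamicQuantizeLinear run on CPU only
)
print([inp.name for inp in session.get_inputs()])  # inspect the expected input names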

ๅฎ‰่ฃ…ไพ่ต–ๅนถ่ฟ่กŒ streamlit run web-ui.py ้ข„่งˆๆจกๅž‹ๆ•ˆๆžœใ€‚็”ฑไบŽ ONNXRuntime ็ฎ—ๅญๆ”ฏๆŒ้—ฎ้ข˜๏ผŒ็›ฎๅ‰ไป…่ƒฝๅคŸไฝฟ็”จ CPU ่ฟ›่กŒๆŽจ็†๏ผŒๅœจ Arm64 (Apple M1/M2) ไธŠๆœ‰ๅฏ่ง‚็š„้€Ÿๅบฆใ€‚ๅ…ทไฝ“็š„ ONNX ๅฏผๅ‡บไปฃ็ ๅœจ่ฟ™ไธชไป“ๅบ“ไธญใ€‚

Usage

Clone with git-lfs:

git lfs clone https://huggingface.co./K024/ChatGLM-6b-onnx-u8s8
cd ChatGLM-6b-onnx-u8s8
pip install -r requirements.txt
streamlit run web-ui.py

Or use the huggingface_hub Python client library to download a snapshot of the repo:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="K024/ChatGLM-6b-onnx-u8s8", local_dir="./ChatGLM-6b-onnx-u8s8")
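
After the snapshot is downloaded, the same pip install -r requirements.txt and streamlit run web-ui.py steps can be run from inside the local_dir directory.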

Code is released under the MIT license.

Model weights are released under the same license as ChatGLM-6b; see MODEL LICENSE.
