metadata
language:
  - zh
  - en
tags:
  - chatglm
  - glm
  - onnx
  - onnxruntime

ChatGLM-6B + ONNX

This model is exported from ChatGLM-6B with int8 quantization and optimized for inference with ONNX Runtime.
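As a rough illustration of what dynamic 8-bit quantization computes at run time, the sketch below follows the ONNX definition of the DynamicQuantizeLinear operator in plain Python. It is not the export code used for this model; the function names and the handling of all-zero input are assumptions made for this example.

```python
def dynamic_quantize_linear(x):
    """Quantize a list of floats to uint8; returns (y, scale, zero_point)."""
    qmin, qmax = 0, 255
    # The range is widened to include 0 so that 0.0 is exactly representable.
    x_min = min(0.0, min(x))
    x_max = max(0.0, max(x))
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        # All-zero input: treated as a special case (assumption of this sketch).
        return [0] * len(x), 0.0, 0

    def clamp(v):
        return max(qmin, min(qmax, v))

    # Python's round() rounds halves to even, matching the ONNX definition.
    zero_point = int(round(clamp(qmin - x_min / scale)))
    y = [int(clamp(round(v / scale) + zero_point)) for v in x]
    return y, scale, zero_point


def dequantize(y, scale, zero_point):
    """Map uint8 values back to approximate floats."""
    return [(q - zero_point) * scale for q in y]
```

Because the scale and zero point are derived from each input tensor at run time, no calibration data is needed; the round trip through `dequantize` recovers each value to within about half a quantization step.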

Inference code for ONNX Runtime is uploaded with the model. Install the requirements and run streamlit run web-ui.py to preview the model and start chatting. Because ONNX Runtime currently supports the MatMulInteger (for the u8s8 data type) and DynamicQuantizeLinear operators only on CPU, inference currently runs on CPU only.
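For readers loading the exported graph outside the bundled web UI, a minimal sketch of pinning an ONNX Runtime session to the CPU provider might look like the following. The helper names `pick_providers` and `load_session` and the `model_path` parameter are hypothetical and are not taken from the uploaded inference code; only `get_available_providers` and `InferenceSession` are ONNX Runtime calls.

```python
def pick_providers(available):
    """Return the provider list to use, given the providers ONNX Runtime reports."""
    # The quantized operators are supported on CPU only, so other providers
    # (for example CUDA) are deliberately ignored even when available.
    if "CPUExecutionProvider" not in available:
        raise RuntimeError("CPUExecutionProvider is not available")
    return ["CPUExecutionProvider"]


def load_session(model_path):
    """Create an inference session restricted to CPU (model_path is a placeholder)."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependency

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Passing the provider list explicitly keeps ONNX Runtime from selecting a GPU provider that lacks the required integer kernels.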

The code is released under the MIT license.

The model weights are released under the same license as ChatGLM-6B; see MODEL LICENSE.