---
language:
- zh
- en
pipeline_tag: text-generation
inference: false
---

See the original project at [https://huggingface.co./baichuan-inc/Baichuan-13B-Chat](https://huggingface.co./baichuan-inc/Baichuan-13B-Chat).

Changes: the original model was quantized to int8 and saved as 2 GB shards.

## Usage (int8)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# Load the tokenizer (use_fast=False is required by Baichuan's custom tokenizer).
tokenizer = AutoTokenizer.from_pretrained(
    "trillionmonster/Baichuan-13B-Chat-8bit",
    use_fast=False,
    trust_remote_code=True,
)

# Load the pre-quantized int8 weights and spread them across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "trillionmonster/Baichuan-13B-Chat-8bit",
    device_map="auto",
    trust_remote_code=True,
)
model.generation_config = GenerationConfig.from_pretrained(
    "trillionmonster/Baichuan-13B-Chat-8bit"
)

# Chat via the model's custom chat() method (provided by trust_remote_code).
messages = []
messages.append({"role": "user", "content": "世界上第二高的山峰是哪座"})
response = model.chat(tokenizer, messages)
print(response)
```

To use int4 quantization instead:

```python
model = AutoModelForCausalLM.from_pretrained(
    "trillionmonster/Baichuan-13B-Chat-8bit",
    device_map="auto",
    load_in_4bit=True,
    trust_remote_code=True,
)
```
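For readers who prefer the explicit quantization-config style of newer transformers releases, or who want to sanity-check how much memory the quantized weights actually occupy, the sketch below shows one way to do both. It is not part of the original card: it assumes a recent transformers release with bitsandbytes installed, and uses the standard `BitsAndBytesConfig` and `get_memory_footprint()` APIs.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# A minimal sketch (not from the original card): the explicit
# BitsAndBytesConfig form of the int4 load shown above.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "trillionmonster/Baichuan-13B-Chat-8bit",
    device_map="auto",
    quantization_config=quant_config,
    trust_remote_code=True,
)

# Rough sanity check of the loaded size: for a 13B model, int8 weights
# should be on the order of ~13 GB, and int4 roughly half that.
print(f"model memory footprint: {model.get_memory_footprint() / 1024**3:.1f} GB")
```

The `quantization_config` form should be equivalent to passing `load_in_4bit=True` directly; it simply keeps all quantization settings in one explicit object.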