Edit model card

Original model card

Buy me a coffee if you like this project ;)

Description

GGML Format model files for This project.

inference


import ctransformers

from ctransformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(output_dir, ggml_file,
gpu_layers=32, model_type="llama")

manual_input: str = "Tell me about your last dream, please."


llm(manual_input, 
      max_new_tokens=256, 
      temperature=0.9, 
      top_p= 0.7)

Original model card

Baichuan-7B-Instruction

介绍

Baichuan-7B-Instruction 为 Baichuan-7B 系列模型进行指令微调后的版本,预训练模型可见 Baichuan-7B

Demo

如下是一个使用 gradio 的模型 demo

import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction",trust_remote_code=True,use_fast=False)
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction",trust_remote_code=True ).half()
model.cuda()

def generate(histories,  max_new_tokens=2048, do_sample = True, top_p = 0.95, temperature = 0.35, repetition_penalty=1.1):
    prompt = ""
    for history in histories:
        history_with_identity = "\nHuman:" + history[0] + "\n\nAssistant:" + history[1]
        prompt += history_with_identity
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(
                    input_ids = input_ids,
                    max_new_tokens=max_new_tokens,
                    early_stopping=True,
                    do_sample=do_sample,
                    top_p=top_p, 
                    temperature=temperature,
                    repetition_penalty=repetition_penalty,
        )
    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    generate_text = rets[0].replace(prompt, "")
    return generate_text
    
with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("clear")

    def user(user_message, history):
        return "", history + [[user_message, ""]]

    def bot(history):
        print(history)
        bot_message = generate(history)
        history[-1][1] = bot_message
        return history

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
        bot, chatbot, chatbot
    )
    clear.click(lambda: None, None, chatbot, queue=False)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0")


量化部署

Baichuan-7B 支持 int8 和 int4 量化,用户只需在推理代码中简单修改两行即可实现。请注意,如果是为了节省显存而进行量化,应加载原始精度模型到 CPU 后再开始量化;避免在 from_pretrained 时添加 device_map='auto' 或者其它会导致把原始精度模型直接加载到 GPU 的行为的参数。

使用 int8 量化 (To use int8 quantization):

model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda() 

同样的,如需使用 int4 量化 (Similarly, to use int4 quantization):

model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-7B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()

训练详情

数据集:https://huggingface.co./datasets/shareAI/ShareGPT-Chinese-English-90k。

硬件:8*A40

测评结果

CMMLU

Model 5-shot STEM Humanities Social Sciences Others China Specific Average
Baichuan-7B 34.4 47.5 47.6 46.6 44.3 44.0
Vicuna-13B 31.8 36.2 37.6 39.5 34.3 36.3
Chinese-Alpaca-Plus-13B 29.8 33.4 33.2 37.9 32.1 33.4
Chinese-LLaMA-Plus-13B 28.1 33.1 35.4 35.1 33.5 33.0
Ziya-LLaMA-13B-Pretrain 29.0 30.7 33.8 34.4 31.9 32.1
LLaMA-13B 29.2 30.8 31.6 33.0 30.5 31.2
moss-moon-003-base (16B) 27.2 30.4 28.8 32.6 28.7 29.6
Baichuan-13B-Base 41.7 61.1 59.8 59.0 56.4 55.3
Baichuan-13B-Chat 42.8 62.6 59.7 59.0 56.1 55.8
Baichuan-13B-Instruction 44.50 61.16 59.07 58.34 55.55 55.61
Baichuan-7B-Instruction 34.68 47.38 47.13 45.11 44.51 43.57
Model zero-shot STEM Humanities Social Sciences Others China Specific Average
ChatGLM2-6B 41.28 52.85 53.37 52.24 50.58 49.95
Baichuan-7B 32.79 44.43 46.78 44.79 43.11 42.33
ChatGLM-6B 32.22 42.91 44.81 42.60 41.93 40.79
BatGPT-15B 33.72 36.53 38.07 46.94 38.32 38.51
Chinese-LLaMA-7B 26.76 26.57 27.42 28.33 26.73 27.34
MOSS-SFT-16B 25.68 26.35 27.21 27.92 26.70 26.88
Chinese-GLM-10B 25.57 25.01 26.33 25.94 25.81 25.80
Baichuan-13B 42.04 60.49 59.55 56.60 55.72 54.63
Baichuan-13B-Chat 37.32 56.24 54.79 54.07 52.23 50.48
Baichuan-13B-Instruction 42.56 62.09 60.41 58.97 56.95 55.88
Baichuan-7B-Instruction 33.94 46.31 47.73 45.84 44.88 43.53

说明:CMMLU 是一个综合性的中文评估基准,专门用于评估语言模型在中文语境下的知识和推理能力。我们直接使用其官方的评测脚本对模型进行评测。Model zero-shot 表格中 Baichuan-13B-Chat 的得分来自我们直接运行 CMMLU 官方的评测脚本得到,其他模型的的得分来自于 CMMLU 官方的评测结果.

英文能力评测

除了中文榜单的测试,我们同样测试了模型在英文榜单 MMLU 上的能力。

MMLU

MMLU 是一个包含了57种任务的英文评测数据集。 我们采用了开源的评测方案 , 评测结果如下:

Model Humanities Social Sciences STEM Other Average
LLaMA-7B2 34.0 38.3 30.5 38.1 35.1
Falcon-7B1 - - - - 35.0
mpt-7B1 - - - - 35.6
ChatGLM-6B0 35.4 41.0 31.3 40.5 36.9
BLOOM 7B0 25.0 24.4 26.5 26.4 25.5
BLOOMZ 7B0 31.3 42.1 34.4 39.0 36.1
moss-moon-003-base (16B)0 24.2 22.8 22.4 24.4 23.6
moss-moon-003-sft (16B)0 30.5 33.8 29.3 34.4 31.9
Baichuan-7B0 38.4 48.9 35.6 48.1 42.3
Baichuan-7B-Instruction(5-shot) 38.9 49.0 35.3 48.8 42.6
Baichuan-7B-Instruction(0-shot) 38.7 47.9 34.5 48.2 42.0
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.