---
language:
- zh
- en
pipeline_tag: text-generation
inference: false
---

# Baichuan-13B

## Introduction

Baichuan-13B is an open-source, commercially usable large language model with 13 billion parameters, developed by Baichuan Intelligent Technology as the successor to [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B). It achieves the best results among models of the same size on standard Chinese and English benchmarks. This release includes two versions: the pretrained model (Baichuan-13B-Base) and the aligned model (Baichuan-13B-Chat). Baichuan-13B has the following features:

1. **Open-source, commercially usable Chinese LLM at the 10B+ scale**: Baichuan-13B-Base is a free, open-source, commercially usable Chinese pretrained language model with 13 billion parameters. It has not undergone any instruction tuning or benchmark-specific optimization, so it is clean and highly customizable, filling the gap left by the lack of readily usable Chinese pretrained models above the 10-billion-parameter scale.
2. **Larger size, more data**: Building on Baichuan-7B, the parameter count is increased to 13 billion, and the model is trained on 1.4 trillion tokens of high-quality data, the largest training corpus of any open-source 13B model to date. It supports both Chinese and English, uses [ALiBi](https://arxiv.org/abs/2108.12409) position encoding, and has a context window of 4096 tokens.
3. **Both the pretrained and the aligned model are open-sourced**: The pretrained model is a "base" aimed at developers, while ordinary users mostly need an aligned model with dialogue capability. We therefore also release the aligned model, Baichuan-13B-Chat, which has strong conversational ability, works out of the box, and is simple to deploy (see the chat sketch after the quick-start examples below).
4. **More efficient inference**: To serve a broader range of users, we also release int8 and int4 quantized versions, which make it easy to deploy the model on machines with limited GPU memory at almost no loss in quality (a quantized-loading sketch follows the quick-start examples below).

## How to Get Started with the Model

The following is a 1-shot inference task using Baichuan-13B-Base: given a literary work, produce its author. The correct output is "夜雨寄北->李商隐".

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The custom model code requires trust_remote_code=True
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Base", device_map="auto", trust_remote_code=True)

# 1-shot prompt: one "work -> author" example, followed by the work to complete
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```

The following is the same 1-shot task in English, where the author's name is produced given the work, with the correct output being "One Hundred Years of Solitude->Gabriel Garcia Marquez":

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Base", device_map="auto", trust_remote_code=True)

# 1-shot prompt: one "work -> author" example, followed by the work to complete
inputs = tokenizer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```
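The examples above use the pretrained base model. The aligned model, Baichuan-13B-Chat, is described as working out of the box; the snippet below is a minimal sketch of chat-style usage based on the `chat()` helper documented in the [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) repository. The helper name, message format, and generation-config handling are assumptions taken from that repository's examples and should be verified against the model's remote code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# Load the aligned chat model; the chat() helper below is provided by the model's remote code.
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Chat", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Chat",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
# Use the default generation settings shipped with the checkpoint (assumed to be present).
model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan-13B-Chat")

# A single-turn conversation; append further {"role": ..., "content": ...} dicts for multi-turn chat.
messages = [{"role": "user", "content": "世界上第二高的山峰是哪座"}]
response = model.chat(tokenizer, messages)  # chat() is exposed via trust_remote_code (assumption)
print(response)
```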
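The introduction also mentions int8 and int4 quantized releases for low-memory deployment. Independent of those official releases, a generic way to reduce memory is 8-bit loading through bitsandbytes, sketched below under the assumption that `bitsandbytes` and `accelerate` are installed; for the official int8/int4 weights and quantization workflow, refer to the [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic 8-bit loading via bitsandbytes (requires `pip install bitsandbytes accelerate`).
# This is an illustrative alternative, not the repository's own quantization workflow.
tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-13B-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Base",
    device_map="auto",
    load_in_8bit=True,          # weights are quantized to int8 at load time
    trust_remote_code=True,
)

inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt').to(model.device)
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```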
## Model Details

### Model Description

- **Developed by:** Baichuan Intelligent Technology (百川智能)
- **Email:** opensource@baichuan-inc.com
- **Language(s) (NLP):** Chinese/English
- **License:** [Baichuan-13B License]()

### Model Sources

The overall model is based on the standard Transformer architecture, and we adopt the same model design as LLaMA:

- **Position Embedding**: [ALiBi](https://arxiv.org/abs/2108.12409), consistent with the introduction above. It injects position information as a per-head linear bias on the attention scores rather than through rotary embeddings, and extrapolates well to longer sequences.
- **Feedforward Layer**: SwiGLU, with the feedforward hidden size set to roughly 8/3 of the model dimension (13696 for d_model = 5120).
- **Layer Normalization**: Pre-normalization based on [RMSNorm](https://arxiv.org/abs/1910.07467).

The specific parameters are as follows:

| Hyperparameter  | Value       |
|-----------------|-------------|
| n_parameters    | 13264901120 |
| n_layers        | 40          |
| n_heads         | 40          |
| d_model         | 5120        |
| vocab size      | 64000       |
| sequence length | 4096        |
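Because Baichuan-13B uses ALiBi rather than rotary position embeddings, position information enters the model as a per-head linear bias added to the attention logits. The sketch below is purely illustrative (it is not the Baichuan implementation) and shows how the head-specific slopes and the distance-proportional bias are computed for the simple case where the number of heads is a power of two.

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    """Head-specific slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    The ALiBi paper uses a slightly different construction when n_heads is not a
    power of two (Baichuan-13B has 40 heads); this sketch keeps the simple form.
    """
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Bias added to the attention logits, shape (n_heads, seq_len, seq_len)."""
    pos = torch.arange(seq_len)
    # distance[i, j] = i - j for keys in the past, 0 for future positions (masked anyway)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    # Each head gets its own slope; more distant keys receive a larger negative bias.
    return -alibi_slopes(n_heads)[:, None, None] * distance

bias = alibi_bias(n_heads=8, seq_len=6)
print(bias.shape)  # torch.Size([8, 6, 6])
```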
## Uses

### Downstream Use

We have also open-sourced the training code that accompanies this model, allowing efficient finetuning for downstream tasks. For details, please refer to [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B).

### Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

Baichuan-13B can produce factually incorrect output and should not be relied on to produce factually accurate information. Baichuan-13B was trained on various public datasets. While great effort has been taken to clean the pretraining data, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.

## Training Details

For the specific training settings, please refer to [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B).

## Evaluation

We evaluated the model on the benchmarks below under the 5-shot setting, using the same methodology as in the [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B/) project. The results are as follows.

### C-Eval

| Model 5-shot            | STEM  | Social Sciences | Humanities | Others | Average |
|-------------------------|-------|-----------------|------------|--------|---------|
| ChatGLM2-6B             | 45.9  | 61.6            | 49.7       | 48.2   | 50.2    |
| InternLM-7B*            | 40.1  | 55.7            | 49.4       | 37.9   | 44.6    |
| Baichuan-7B             | 38.2  | 52.0            | 46.2       | 39.3   | 42.8    |
| Ziya-LLaMA-13B-Pretrain | 27.6  | 34.4            | 32.0       | 28.6   | 30.0    |
| LLaMA-13B               | 27.0  | 33.6            | 27.7       | 27.6   | 28.5    |
| moss-moon-003-base (16B)| 27.0  | 29.1            | 27.2       | 26.9   | 27.4    |
| vicuna-13B              | 22.8  | 24.8            | 22.3       | 18.5   | 22.2    |
| **Baichuan-13B-Base**   | **45.9** | **63.5**     | **57.2**   | **49.3** | **52.4** |
| **Baichuan-13B-Chat**   | **43.7** | **64.6**     | **56.2**   | **49.2** | **51.5** |

> *Note: the results for all models in this table were obtained with a unified evaluation pipeline. [InternLM-7B](https://huggingface.co./internlm/internlm-7b) reports a C-Eval average of 53.4 evaluated with the [OpenCompass](https://opencompass.org.cn/rank) toolkit; our own OpenCompass evaluation of InternLM-7B yields an average of 51.6.

### MMLU

| Model 5-shot            | STEM  | Social Sciences | Humanities | Others | Average |
|-------------------------|-------|-----------------|------------|--------|---------|
| LLaMA-13B               | 36.1  | 53.0            | 44.0       | 52.8   | 46.3    |
| ChatGLM2-6B             | 38.2  | 52.5            | 43.2       | 50.8   | 45.9    |
| InternLM-7B             | 38.0  | 51.1            | 39.2       | 50.2   | 44.1    |
| Ziya-LLaMA-13B-Pretrain | 35.6  | 47.6            | 40.1       | 49.4   | 42.9    |
| Baichuan-7B             | 35.6  | 48.9            | 38.4       | 48.1   | 42.3    |
| vicuna-13B              | 24.2  | 24.1            | 24.6       | 26.8   | 24.9    |
| moss-moon-003-base (16B)| 22.4  | 22.8            | 24.2       | 24.4   | 23.6    |
| **Baichuan-13B-Base**   | **41.6** | **60.9**     | **47.4**   | **58.5** | **51.6** |
| **Baichuan-13B-Chat**   | **40.9** | **60.9**     | **48.8**   | **59.0** | **52.1** |

### CMMLU

| Model 5-shot            | STEM  | Humanities | Social Sciences | Others | China Specific | Average |
|-------------------------|-------|------------|-----------------|--------|----------------|---------|
| InternLM-7B             | 41.7  | 54.4       | 56.4            | 55.4   | 53.1           | 52.1    |
| ChatGLM2-6B             | 42.5  | 51.4       | 51.4            | 50.7   | 48.4           | 49.0    |
| Baichuan-7B             | 34.4  | 47.5       | 47.6            | 46.6   | 44.3           | 44.0    |
| Ziya-LLaMA-13B-Pretrain | 29.0  | 30.7       | 33.8            | 34.4   | 31.9           | 32.1    |
| LLaMA-13B               | 29.2  | 30.8       | 31.6            | 33.0   | 30.5           | 31.2    |
| moss-moon-003-base (16B)| 27.2  | 30.4       | 28.8            | 32.6   | 28.7           | 29.6    |
| vicuna-13B              | 24.0  | 25.4       | 25.3            | 25.0   | 25.0           | 24.9    |
| **Baichuan-13B-Base**   | **41.7** | **61.1** | **59.8**        | **59.0** | **56.4**     | **55.3** |
| **Baichuan-13B-Chat**   | **42.8** | **62.6** | **59.7**        | **59.0** | **56.1**     | **55.8** |

## Our Group

![WeChat](https://github.com/baichuan-inc/baichuan-7B/blob/main/media/wechat.jpeg?raw=true)