mabaochang committed on
Commit
a86679b
1 Parent(s): 863044c

Create README.md

  1. README.md +125 -0
README.md ADDED
@@ -0,0 +1,125 @@
---
license: other
tags:
- text2text-generation
pipeline_tag: text2text-generation
language:
- zh
- en
---

Considering LLaMA's license constraints, the model is for research and learning only.
Please strictly respect LLaMA's usage policy. We are not allowed to publish the LLaMA weights, even in fine-tuned form, but there is no problem publishing the difference, as a patch that we suggest applying to the original files.
The encryption is a simple XOR between files, ensuring that only people who have access to the original weights (from completely legal sources, of course) can transform them into the fine-tuned weights.
You can find the decryption code at https://github.com/LianjiaTech/BELLE/tree/main/models.

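The XOR idea described above can be illustrated with a minimal sketch. This is not the official decryption script (use the one in the BELLE repository for real files); the function name `xor_bytes`, the cyclic repetition of the key, and the byte strings are assumptions made purely for illustration.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of `data` with `key`, repeating `key` if it is shorter.

    Repeating the key is an assumption of this sketch, not a statement
    about how the official script pairs the files.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


# Stand-ins for real files: the original weights act as the key.
original = b"bytes of the original weights"
finetuned = b"bytes of the fine-tuned weights"

encrypted = xor_bytes(finetuned, original)  # safe to publish on its own
recovered = xor_bytes(encrypted, original)  # requires the original bytes
assert recovered == finetuned
```

Because XOR is its own inverse, the same function both encrypts and decrypts; without the original bytes the published file cannot be turned back into usable weights.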
# Model Card for BELLE-LLAMA-13B-2M-enc

## Welcome
If you find this model helpful, please *like* this model and star us on https://github.com/LianjiaTech/BELLE!

## Update
A new checkpoint trained with a learning rate of 5e-6 has been uploaded.
In our evaluation, LLaMA trained with a smaller learning rate achieved better performance.

## Model description
BELLE-LLAMA-13B-2M-enc is based on LLaMA 13B and fine-tuned on 2M pieces of Chinese data combined with 50,000 pieces of English data from the open-source Stanford-Alpaca project, resulting in good Chinese instruction understanding and response generation capabilities.

The code for Chinese data generation and other details can be found in our GitHub repository: https://github.com/LianjiaTech/BELLE.


## Training hyper-parameters
| Parameter | Value |
| ------ | ------ |
| Batch size | 16 |
| Learning rate | 5e-6 |
| Epochs | 3 |
| Weight decay | 0.0 |
| Warmup rate | 0.03 |
| LR scheduler | cosine |

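To make the last two rows of the table concrete, here is a rough sketch of a learning-rate curve with linear warmup followed by cosine decay. It assumes that the warmup rate is the fraction of total training steps spent warming up and that the cosine decays to zero; the actual training code may differ. The function name `lr_at_step` and the value of `total_steps` are hypothetical.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 5e-6,
               warmup_rate: float = 0.03) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = max(1, round(total_steps * warmup_rate))
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; depends on dataset size and batch size
for step in (0, 15, 30, 500, 1000):
    print(step, lr_at_step(step, total_steps))
```

With these assumptions the rate climbs to 5e-6 over the first 3% of steps and then falls smoothly for the rest of training.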
## Download, Convert & Check
After you `git clone` this model repository, the encrypted files should have the following MD5 checksums:
```
md5sum ./*
029965adbff7a240f33d040dedca0a54 ./config.json.e366f0c901ee336cb921450f975b3e3c5e32874035d227f4263dbcb5d966b822.enc
b1cc6321ba72757b82842cc44ffadbf3 ./generation_config.json.fd7ff399e5568cc21a0a8414f43df88ef7c424995b9b97a90563165d2cf79efd.enc
0311f7aac77860f24e5d6379043a1c5e ./pytorch_model-00001-of-00003.bin.5abb160ecbd441c6a1fbe00a9eaa194ee0bd8cd75850c24f503336bd29f0dc45.enc
e1f8ffc06377eaa516c72091d49af6ec ./pytorch_model-00002-of-00003.bin.46a0e748edff9f0f82aa5f3e721e80e0f342f3d03dc47d0ec6514ea78a585320.enc
f1fd70e919041e63d7f8b104380dfcb1 ./pytorch_model-00003-of-00003.bin.ec6e4d45dc4c51f2b9abff5ea9840f06f633e065cdf574b71e96366c26a01578.enc
bf19c5b8dc64bfb19400a4b7fb3bc5b6 ./pytorch_model.bin.index.json.72e91e29282dae48ea5562fcf4d6ca0d5a9c2a30ebc8d67174a19e192552a20b.enc
1ab707fa9b0c4be294fd0b867d73e919 ./special_tokens_map.json.44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a.enc
cae7b4ee8d1ad4e4402632bb0600cc17 ./tokenizer_config.json.ef7ef410b9b909949e96f172b17cbf7c68b11761c632715fa05a6088c0c2b9ac.enc
848005d07146c31e73a10020b3a3099a ./tokenizer.model.9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347.enc
```

After you decrypt the files using the code at https://github.com/LianjiaTech/BELLE/tree/main/models, the decrypted files should have the following MD5 checksums:
```
md5sum ./*
0fa6ff8379308d40f090878593f085a9 ./config.json
2917a1cafb895cf57e746cfd7696bfe5 ./generation_config.json
1710f2d139d883d7e1e9a3f3198ee581 ./pytorch_model-00001-of-00003.bin
74b26646e31debd94c5c1092b3e39102 ./pytorch_model-00002-of-00003.bin
1c123bee82a65a43b6005b7040e20618 ./pytorch_model-00003-of-00003.bin
621720a147e0dd2a97580ab5dd0c5557 ./pytorch_model.bin.index.json
d463d8a04501fbf1d71feaa8fc1be250 ./README.md
99914b932bd37a50b983c5e7c90ae93b ./special_tokens_map.json
5526ad31f4928acb5219e295e5ff81ce ./tokenizer_config.json
eeec4125e9c7560836b4873b6f8e3025 ./tokenizer.model
```

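If `md5sum` is not available, the same check can be scripted. The following is a small sketch using only the Python standard library; the helper names `md5_of_file` and `find_mismatches` are hypothetical, and the `expected` dictionary copies just two of the checksums listed above for brevity.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(directory, expected):
    """Return the file names that are missing or whose MD5 differs from `expected`."""
    bad = []
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file() or md5_of_file(path) != want:
            bad.append(name)
    return bad


# Two of the decrypted-file checksums from the listing above.
expected = {
    "config.json": "0fa6ff8379308d40f090878593f085a9",
    "tokenizer.model": "eeec4125e9c7560836b4873b6f8e3025",
}
```

Calling `find_mismatches(".", expected)` from the directory that holds the decrypted files should return an empty list when the listed files match.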
## Use model
Please note that the input should be formatted as follows in both **training** and **inference**.
```
Human: {input} \n\nAssistant:
```

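As a small illustration of the template, a helper like the one below could wrap user input before tokenization. The name `build_prompt` is hypothetical, and the sketch assumes that `\n\n` in the template stands for two real newline characters. Note that the loading example in this README additionally puts a space after `Assistant:`.

```python
def build_prompt(user_input: str) -> str:
    """Wrap raw user input in the Human/Assistant template shown above."""
    return f"Human: {user_input} \n\nAssistant:"


prompt = build_prompt("Translate 'good morning' into Chinese.")
print(repr(prompt))
```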
To load BELLE-LLAMA-13B-2M-enc with Hugging Face transformers, please install the main branch from source, as the latest stable release did not support LLaMA as of March 26, 2023.
``` bash
pip install git+https://github.com/huggingface/transformers
```

After you decrypt the files, BELLE-LLAMA-13B-2M can be loaded with LlamaForCausalLM.
``` python
from transformers import LlamaForCausalLM, AutoTokenizer
import torch

# Directory that contains the decrypted checkpoint files
ckpt = './result/BELLE-LLAMA-13B-2M'
device = torch.device('cuda')
# device_map='auto' requires the `accelerate` package to be installed
model = LlamaForCausalLM.from_pretrained(ckpt, device_map='auto', low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
# Prompt in the required format; the Chinese text means
# "Write a Chinese song praising nature"
prompt = "Human: 写一首中文歌曲,赞美大自然 \n\nAssistant: "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
generate_ids = model.generate(
    input_ids, max_new_tokens=500, do_sample=True, top_k=30, top_p=0.85,
    temperature=0.5, repetition_penalty=1.0, eos_token_id=2, bos_token_id=1, pad_token_id=0,
)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
# Drop the echoed prompt so that only the generated reply remains
response = output[len(prompt):]
print(response)
```

## Limitations
A few issues remain in the model trained on the current base model and data:

1. The model might generate factual errors when asked to follow instructions related to facts.

2. The model occasionally generates harmful responses, since it still struggles to identify potentially harmful instructions.

3. Its reasoning and coding abilities need improvement.

Since the model still has limitations, we require that developers use the open-sourced code, data, model, and any other artifacts generated by this project for research purposes only. Commercial use and other potentially harmful uses are not allowed.


## Citation

Please cite us when using our code, data or model.

```
@misc{BELLE,
  author = {Yunjie Ji and Yong Deng and Yan Gong and Yiping Peng and Qiang Niu and Baochang Ma and Xiangang Li},
  title = {BELLE: Be Everyone's Large Language model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
```