Text Generation
Transformers
Safetensors
jamba
conversational
custom_code
Inference Endpoints
ptrdvn committed on
Commit 9a79178 • 1 Parent(s): ba4cd78

Update README.md

Files changed (1):
  1. README.md +110 -162
README.md CHANGED

---
library_name: transformers
tags: []
---

# Model Overview

This model was trained as a small-scale experiment to determine how easy it is to fine-tune [ai21labs/Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1) to work as a chatbot.

The aim of this experiment was to find out how intelligently and reliably Jamba can chat in both English and other languages when QLoRA finetuned for only a few hours.

Initial subjective testing has shown that this model can chat reasonably well in both English and Japanese, so feel free to give it a try!

```python
print(tokenizer.batch_decode([outputs[0][len(input_ids[0]):]]))
# ['汗が出ることは、運動をするときに体温が上がり、体内の熱を外部に放出するための自然なメカニズムです。汗が出ることが多いことは、一般的には、体の温度調節機能が働いていることを意味します。しかし、汗が出ることが多すぎると、不快感や汗症などの問題が発生することがあります。以下に、汗が出ることが多い場合の対策を紹介します。\n\n1. 適切な服装を選ぶ: 汗が出ることが多い場合、軽量で透湿性の高い服を選ぶことが重要です。これにより、汗が体から外部に�']
```
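
The sample output above is the model explaining, in Japanese, why we sweat during exercise and how to manage heavy sweating. For a fuller picture, a minimal chat round-trip might look like the sketch below; the repository-id placeholder, 4-bit loading, sampling settings, and example prompt are illustrative assumptions rather than the card's exact example, and it assumes the tokenizer ships the ChatML chat template used during training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "<this-model-repo-id>"  # placeholder: point this at the model repository

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # Jamba uses custom modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # Optional 4-bit loading (the same precision used for QLoRA training) to reduce memory
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
)

# The model was finetuned on ChatML-formatted chats that use this system message
messages = [
    {"role": "system", "content": "You are GPT-4, a helpful assistant."},
    {"role": "user", "content": "なぜ運動すると汗をたくさんかくのですか?"},  # illustrative prompt
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.batch_decode([outputs[0][len(input_ids[0]):]]))
```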

# Initial testing results

# Training details

The model was trained on 2 open source datasets (one multilingual) for one epoch in an A100 (80GB) x 4 environment for 3 hours.

## Training data

* [jondurbin/airoboros-3.2](https://huggingface.co/datasets/jondurbin/airoboros-3.2)

A ~59K example dataset of curated LLM tasks in English, primarily generated with GPT-4. This dataset has been used by some of the best performing open source LLMs in the world (e.g. [jondurbin/bagel-7b-v0.4](https://huggingface.co/jondurbin/bagel-7b-v0.4), [NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO)) and contains a wide variety of tasks, so we hypothesized that this would lead to a multi-talented, accurate model. For this reason, this dataset was chosen for the bulk of our training data.

Note: Each element in jondurbin/airoboros-3.2 already contains a system message.

* [openchat/openchat_sharegpt4_dataset](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset) (GPT-4 responses only)

A ~6K example dataset of multilingual multi-turn chats between users and GPT-4. While jondurbin/airoboros-3.2 has delivered good results for models previously, it sadly contains no (or seemingly very little) multilingual data. We are a Japanese AI company, so we require an LLM that can also output in Japanese. Hence we also selected a small, seemingly high quality dataset of GPT-4 responses in many languages from the ShareGPT dataset. We chose to select only the GPT-4 responses because we wanted to keep our dataset as small and high quality as possible to maximise the efficiency of our training.

Note: openchat/openchat_sharegpt4_dataset does not contain system messages, so we added 'You are GPT-4, a helpful assistant.' as our system message (see the illustrative record after this list).
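
Both sources are combined into one ShareGPT-style `conversations` dataset, which is the `type: sharegpt` input format referenced in the training config further down. A hypothetical record (values invented for illustration) looks roughly like this:

```python
# Hypothetical ShareGPT-style training record; the field and role names match the format used above
example_record = {
    "conversations": [
        {"from": "system", "value": "You are GPT-4, a helpful assistant."},
        {"from": "human", "value": "Summarise the plot of Hamlet in two sentences."},
        {"from": "gpt", "value": "Prince Hamlet feigns madness while seeking revenge on his uncle Claudius, who murdered his father and married his mother. His quest ends in a duel that leaves most of the Danish court, including Hamlet himself, dead."},
    ]
}
```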

<details>
<summary>Data preparation code</summary>

```python
import os
import pandas as pd
from datasets import load_dataset, Dataset, concatenate_datasets

os.environ['HF_HOME'] = "/workspace/hf_home"
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = "1"

# Load the English instruction-following data (each element already includes a system message)
boros_dataset = load_dataset("jondurbin/airoboros-3.2", split='train')

# Load the multilingual GPT-4 ShareGPT chats and prepend our system message to every conversation
gpt4_df = pd.read_json("https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/resolve/main/sharegpt_gpt4.json?download=true")
gpt4_df["conversations"] = gpt4_df["items"].apply(lambda x: [{'from': 'system', 'value': 'You are GPT-4, a helpful assistant.'}] + x)

gpt4_dataset = Dataset.from_pandas(gpt4_df[["conversations"]])

# Combine both sources, shuffle, and write the ShareGPT-format JSON used for training
dataset = concatenate_datasets([gpt4_dataset, boros_dataset]).shuffle()

dataset.select_columns(["conversations"]).to_json("/workspace/airoboros-3.2_plus_openchat_sharegpt4.json")
```
</details>
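
As an optional sanity check, the prepared file can be loaded back with `datasets` to confirm the record structure before training (illustrative snippet):

```python
from datasets import load_dataset

# Load the JSON-lines file written above and inspect one conversation turn
ds = load_dataset("json", data_files="/workspace/airoboros-3.2_plus_openchat_sharegpt4.json", split="train")
print(len(ds))                    # total conversations (roughly 59K + 6K)
print(ds[0]["conversations"][0])  # one turn, e.g. {'from': ..., 'value': ...}
```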

## Training
The Jamba-v0.1 base model was trained for roughly 3 hours in an A100 (80GB) x 4 environment on the Azure cloud (Standard_NC96ads_A100_v4).

Our training harness was Axolotl, with the following config as our training parameters:

<details>
<summary>Training config</summary>

```yaml
base_model: ai21labs/Jamba-v0.1
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: /workspace/airoboros-3.2_plus_openchat_sharegpt4.json
    ds_type: json
    type: sharegpt
    conversation: chatml
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./airoboros-3.2_plus_openchat_sharegpt4_one_epoch

sequence_len: 6000
sample_packing: true
pad_to_sequence_len: false
eval_sample_packing: true

use_wandb: true
wandb_project: axolotl
wandb_entity: peterd
wandb_name: airoboros-3.2_plus_openchat_sharegpt4

adapter: qlora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

low_cpu_mem_usage: true
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 5
saves_per_epoch: 5
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero2.json
weight_decay: 0.0
special_tokens:
```
</details>
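
Because the dataset is declared with `type: sharegpt` and `conversation: chatml`, each record is rendered into the ChatML prompt format during preprocessing, roughly like this (illustrative):

```
<|im_start|>system
You are GPT-4, a helpful assistant.<|im_end|>
<|im_start|>user
Summarise the plot of Hamlet in two sentences.<|im_end|>
<|im_start|>assistant
Prince Hamlet feigns madness while seeking revenge on his uncle Claudius, who murdered his father and married his mother.<|im_end|>
```

With Axolotl installed, a config like this is typically launched with something like `accelerate launch -m axolotl.cli.train config.yaml`, with multi-GPU sharding handled by the DeepSpeed ZeRO-2 config referenced in the `deepspeed` field.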

<details>
<summary>Training graphs</summary>

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/umxTIsNRHUtKS_kL81Uyf.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/mpuCoL99rxX8RCgXH1CJo.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/5FvwYNdte-bgzEvcvFO8I.png)

</details>

<br/>

# Developers

Lead developer - Peter Devine
Administrative supervisor - Shunichi Taniguchi