---
license: apache-2.0
base_model: mistralai/Mistral-Nemo-Base-2407
tags:
- generated_from_trainer
- axolotl
datasets:
- cognitivecomputations/Dolphin-2.9
- teknium/OpenHermes-2.5
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- Locutusque/function-calling-chatml
- internlm/Agent-FLAN
---

# Dolphin 2.9.3 Mistral Nemo 12b 🐬

This is the llama.cpp GGUF conversion of the original model, located here:

https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b

Curated and trained by Eric Hartford and Cognitive Computations

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/h3K4XGj2RH)
Discord: https://discord.gg/h3K4XGj2RH

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

Our appreciation for the sponsors of Dolphin 2.9.3:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xL40S node

This model is based on mistralai/Mistral-Nemo-Base-2407 and is governed by the Apache 2.0 license.

The base model has 128K context, and our fine-tuning used an 8192-token sequence length.

Dolphin 2.9.3 uses the ChatML prompt template format.

Example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

```
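
Since this repository provides GGUF files, here is a minimal sketch of applying the ChatML template at inference time with the llama-cpp-python bindings. The GGUF filename, quantization, and sampling settings below are illustrative assumptions, not part of this card.

```python
# Minimal sketch, assuming llama-cpp-python is installed (pip install llama-cpp-python)
# and that a GGUF file from this repository has been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.9.3-mistral-nemo-12b.Q4_K_M.gguf",  # hypothetical filename; use the file you downloaded
    n_ctx=8192,            # matches the fine-tuning sequence length (the base model supports up to 128K)
    chat_format="chatml",  # Dolphin 2.9.3 expects ChatML
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
        {"role": "user", "content": "Write a short haiku about dolphins."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

If you drive llama.cpp with a raw prompt instead, reproduce the template above verbatim, including the trailing `<|im_start|>assistant` line.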

Dolphin 2.9.3 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant with any requests, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.

Dolphin is licensed under the Apache 2.0 license. We grant permission for any use, including commercial. Dolphin was trained on data generated from GPT-4, among other models.

## Evals

TBD

## Training

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: /workspace/models/Mistral-Nemo-Base-2407
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
# load_in_4bit: true
strict: false

datasets:
  - path: /workspace/datasets/dolphin-2.9.3/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/SystemChat_filtered_sharegpt.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/SystemChat_multilingual_sharegpt.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml
# adapter: qlora
# lora_r: 128
# lora_alpha: 16
# lora_modules_to_save: [embed_tokens, lm_head]
# lora_dropout: 0.05
# lora_target_linear: true

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
- input_layernorm
- model.norm
- post_attention_layernorm
- self_attn.rotary_emb
# mlp.down_proj layers
- model.layers.0.mlp.down_proj
- model.layers.1.mlp.down_proj
- model.layers.4.mlp.down_proj
- model.layers.37.mlp.down_proj
- model.layers.24.mlp.down_proj
- model.layers.2.mlp.down_proj
- model.layers.38.mlp.down_proj
- model.layers.35.mlp.down_proj
- model.layers.25.mlp.down_proj
- model.layers.6.mlp.down_proj
- model.layers.22.mlp.down_proj
- model.layers.23.mlp.down_proj
- model.layers.3.mlp.down_proj
- model.layers.21.mlp.down_proj
- model.layers.5.mlp.down_proj
- model.layers.28.mlp.down_proj
- model.layers.20.mlp.down_proj
- model.layers.26.mlp.down_proj
- model.layers.19.mlp.down_proj
- model.layers.34.mlp.down_proj
# mlp.gate_proj layers
- model.layers.2.mlp.gate_proj
- model.layers.1.mlp.gate_proj
- model.layers.3.mlp.gate_proj
- model.layers.5.mlp.gate_proj
- model.layers.4.mlp.gate_proj
- model.layers.35.mlp.gate_proj
- model.layers.36.mlp.gate_proj
- model.layers.37.mlp.gate_proj
- model.layers.38.mlp.gate_proj
- model.layers.34.mlp.gate_proj
- model.layers.33.mlp.gate_proj
- model.layers.8.mlp.gate_proj
- model.layers.32.mlp.gate_proj
- model.layers.6.mlp.gate_proj
- model.layers.28.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.30.mlp.gate_proj
- model.layers.23.mlp.gate_proj
- model.layers.29.mlp.gate_proj
- model.layers.27.mlp.gate_proj
# mlp.up_proj layers
- model.layers.3.mlp.up_proj
- model.layers.4.mlp.up_proj
- model.layers.6.mlp.up_proj
- model.layers.2.mlp.up_proj
- model.layers.5.mlp.up_proj
- model.layers.8.mlp.up_proj
- model.layers.10.mlp.up_proj
- model.layers.9.mlp.up_proj
- model.layers.7.mlp.up_proj
- model.layers.0.mlp.up_proj
- model.layers.17.mlp.up_proj
- model.layers.15.mlp.up_proj
- model.layers.22.mlp.up_proj
- model.layers.18.mlp.up_proj
- model.layers.16.mlp.up_proj
- model.layers.11.mlp.up_proj
- model.layers.21.mlp.up_proj
- model.layers.23.mlp.up_proj
- model.layers.20.mlp.up_proj
- model.layers.27.mlp.up_proj
# self_attn.k_proj layers
- model.layers.30.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.33.self_attn.k_proj
- model.layers.26.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.35.self_attn.k_proj
- model.layers.39.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.21.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.36.self_attn.k_proj
- model.layers.20.self_attn.k_proj
- model.layers.37.self_attn.k_proj
- model.layers.29.self_attn.k_proj
- model.layers.32.self_attn.k_proj
- model.layers.16.self_attn.k_proj
- model.layers.18.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.7.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.9.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.27.self_attn.o_proj
- model.layers.26.self_attn.o_proj
- model.layers.4.self_attn.o_proj
- model.layers.31.self_attn.o_proj
- model.layers.8.self_attn.o_proj
- model.layers.16.self_attn.o_proj
- model.layers.3.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.18.self_attn.o_proj
- model.layers.33.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.32.self_attn.o_proj
- model.layers.30.self_attn.o_proj
- model.layers.2.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.11.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.14.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.8.self_attn.q_proj
- model.layers.18.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.19.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.20.self_attn.q_proj
- model.layers.3.self_attn.q_proj
- model.layers.26.self_attn.q_proj
- model.layers.27.self_attn.q_proj
- model.layers.28.self_attn.q_proj
- model.layers.33.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.27.self_attn.v_proj
- model.layers.20.self_attn.v_proj
- model.layers.24.self_attn.v_proj
- model.layers.25.self_attn.v_proj
- model.layers.30.self_attn.v_proj
- model.layers.2.self_attn.v_proj
- model.layers.23.self_attn.v_proj
- model.layers.22.self_attn.v_proj
- model.layers.26.self_attn.v_proj
- model.layers.33.self_attn.v_proj
- model.layers.37.self_attn.v_proj
- model.layers.7.self_attn.v_proj
- model.layers.4.self_attn.v_proj
- model.layers.18.self_attn.v_proj
- model.layers.31.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.35.self_attn.v_proj
- model.layers.32.self_attn.v_proj
- model.layers.21.self_attn.v_proj
- model.layers.3.self_attn.v_proj

dataset_prepared_path: /workspace/axolotl/dolph-2.9.3-nemo-prepared
val_set_size: 0.01
output_dir: /workspace/axolotl/dolphin-2.9.3-mistral-nemo

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9.3-Mistral-nemo
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 5e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32:

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
# evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
save_total_limit: 2
save_steps:
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<pad>"
  bos_token: "<s>"
  unk_token: "<unk>"
tokens:
  - "<|im_start|>"

# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: true
#   fsdp_offload_params: true
#   fsdp_use_orig_params: false
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_transformer_layer_cls_to_wrap: MixtralSparseMoeBlock
#   fsdp_state_dict_type: FULL_STATE_DICT
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_forward_prefetch: false
#   fsdp_backward_prefetch: BACKWARD_PRE
```

</details><br>
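
For reference, an Axolotl config of this shape is typically launched with `accelerate launch -m axolotl.cli.train <config>.yaml`; the config filename here is only illustrative.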

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/ehartford/dolphin-2.9.3-Mistral-nemo/runs/c23odyoj)
# workspace/axolotl/dolphin-2.9.3-mistral-nemo

This model was fine-tuned from mistralai/Mistral-Nemo-Base-2407 on the datasets listed above.
It achieves the following results on the evaluation set:
- Loss: 0.5605

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a batch-size check follows the list):
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
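
As a check on the figures above, the effective batch size works out as micro_batch_size × gradient_accumulation_steps × num_devices = 1 × 16 × 8 = 128, matching total_train_batch_size.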

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.5691        | 1.0162 | 983  | 0.5734          |
| 0.5335        | 2.0174 | 1968 | 0.5609          |
| 0.5297        | 2.9639 | 2901 | 0.5605          |

### Framework versions

- Transformers 4.43.0.dev0
- Pytorch 2.2.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1