Triangle104 committed
Commit 2656548 · verified · 1 Parent(s): f911891

Update README.md

Files changed (1)
  1. README.md +436 -0
README.md CHANGED
@@ -26,6 +26,442 @@ model-index:
This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) for more details on the model.

---
Model details:

An RP/storywriting specialist model: a full-parameter finetune of Qwen2.5-14B on a mixture of synthetic and natural data.

It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.

Version notes for v0.2: now using the refined dataset from 32B v0.2. Major improvements in coherence, instruction following and long-context comprehension over 14B v0.1.

Prompt format is ChatML.
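
For reference, ChatML wraps every turn in `<|im_start|>`/`<|im_end|>` markers; a minimal prompt for this model looks like the following (the system and user texts are placeholders):

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
```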

Recommended sampler values (see the example invocation below):
- Temperature: 0.8
- Min-P: 0.05
- Top-A: 0.3
- Repetition Penalty: 1.03
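
As a rough illustration (not from the original card), these samplers map onto `llama-cli` flags as shown below; the GGUF filename is a placeholder for whichever quant you downloaded, and Top-A is not available in mainline llama.cpp, so it is omitted here:

```bash
# Sketch only: adjust the model path to the quant you actually downloaded.
llama-cli -m ./eva-qwen2.5-14b-v0.2-q4_k_m.gguf -cnv \
  --temp 0.8 \
  --min-p 0.05 \
  --repeat-penalty 1.03 \
  -p "You are a creative roleplay and storywriting assistant."
```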

Recommended SillyTavern presets (via CalamitousFelicitousness):
- Context
- Instruct and System Prompt

Training data:
- Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's card for details.
- Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
- A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
- A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
- Synthstruct and SynthRP datasets by Epiculous
- A subset from Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.

Training time and hardware:
- 3 hours on 8xH100 SXM, provided by FeatherlessAI

Model was created by Kearm, Auri and Cahvay.

Special thanks:
- to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning,
- to FeatherlessAI for generously providing an 8xH100 SXM node for training of this model,
- to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data,
- and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.

See axolotl config

axolotl version: 0.4.1

```yaml
base_model: Qwen/Qwen2.5-14B

load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

# plugins:
# - axolotl.integrations.spectrum.SpectrumPlugin

# spectrum_top_fraction: 0.5
# # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
# spectrum_model_name: Qwen/Qwen2.5-32B

datasets:
  - path: datasets/Celeste_Filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/deduped_not_samantha_norefusals.jsonl
    type: sharegpt
  - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
    type: sharegpt

chat_template: chatml
shuffle_merged_datasets: true
val_set_size: 0.001
output_dir: ./EVA-Qwen2.5-14B-SFFT-v0.2

sequence_len: 10240
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

# adapter: qlora
# lora_model_dir:
# lora_r: 64
# lora_alpha: 128
# lora_dropout: 0.05
# lora_target_linear: true
# peft_use_dora: true

base_model: Qwen/Qwen2.5-14B

load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: datasets/Celeste_Filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/deduped_not_samantha_norefusals.jsonl
    type: sharegpt
  - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
    type: sharegpt

chat_template: chatml
shuffle_merged_datasets: true
val_set_size: 0.005
output_dir: ./EVA-Qwen2.5-14B-SFFT-v0.2

sequence_len: 10240
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

# adapter: qlora
# lora_model_dir:
# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: true
# peft_use_dora: true

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.1.mlp.down_proj
- model.layers.35.mlp.down_proj
- model.layers.38.mlp.down_proj
- model.layers.37.mlp.down_proj
- model.layers.36.mlp.down_proj
- model.layers.15.mlp.down_proj
- model.layers.11.mlp.down_proj
- model.layers.12.mlp.down_proj
- model.layers.34.mlp.down_proj
- model.layers.44.mlp.down_proj
- model.layers.45.mlp.down_proj
- model.layers.9.mlp.down_proj
- model.layers.41.mlp.down_proj
- model.layers.33.mlp.down_proj
- model.layers.43.mlp.down_proj
- model.layers.40.mlp.down_proj
- model.layers.13.mlp.down_proj
- model.layers.8.mlp.down_proj
- model.layers.39.mlp.down_proj
- model.layers.10.mlp.down_proj
- model.layers.14.mlp.down_proj
- model.layers.16.mlp.down_proj
- model.layers.31.mlp.down_proj
- model.layers.32.mlp.down_proj
# mlp.gate_proj layers
- model.layers.1.mlp.gate_proj
- model.layers.44.mlp.gate_proj
- model.layers.46.mlp.gate_proj
- model.layers.45.mlp.gate_proj
- model.layers.43.mlp.gate_proj
- model.layers.47.mlp.gate_proj
- model.layers.42.mlp.gate_proj
- model.layers.32.mlp.gate_proj
- model.layers.27.mlp.gate_proj
- model.layers.33.mlp.gate_proj
- model.layers.28.mlp.gate_proj
- model.layers.39.mlp.gate_proj
- model.layers.41.mlp.gate_proj
- model.layers.40.mlp.gate_proj
- model.layers.30.mlp.gate_proj
- model.layers.29.mlp.gate_proj
- model.layers.31.mlp.gate_proj
- model.layers.37.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.10.mlp.gate_proj
- model.layers.38.mlp.gate_proj
- model.layers.36.mlp.gate_proj
- model.layers.12.mlp.gate_proj
- model.layers.13.mlp.gate_proj
# mlp.up_proj layers
- model.layers.1.mlp.up_proj
- model.layers.13.mlp.up_proj
- model.layers.11.mlp.up_proj
- model.layers.14.mlp.up_proj
- model.layers.15.mlp.up_proj
- model.layers.12.mlp.up_proj
- model.layers.8.mlp.up_proj
- model.layers.16.mlp.up_proj
- model.layers.9.mlp.up_proj
- model.layers.19.mlp.up_proj
- model.layers.10.mlp.up_proj
- model.layers.7.mlp.up_proj
- model.layers.17.mlp.up_proj
- model.layers.20.mlp.up_proj
- model.layers.21.mlp.up_proj
- model.layers.18.mlp.up_proj
- model.layers.37.mlp.up_proj
- model.layers.38.mlp.up_proj
- model.layers.39.mlp.up_proj
- model.layers.42.mlp.up_proj
- model.layers.41.mlp.up_proj
- model.layers.27.mlp.up_proj
- model.layers.28.mlp.up_proj
- model.layers.36.mlp.up_proj
# self_attn.k_proj layers
- model.layers.47.self_attn.k_proj
- model.layers.39.self_attn.k_proj
- model.layers.41.self_attn.k_proj
- model.layers.37.self_attn.k_proj
- model.layers.35.self_attn.k_proj
- model.layers.44.self_attn.k_proj
- model.layers.38.self_attn.k_proj
- model.layers.14.self_attn.k_proj
- model.layers.7.self_attn.k_proj
- model.layers.12.self_attn.k_proj
- model.layers.11.self_attn.k_proj
- model.layers.32.self_attn.k_proj
- model.layers.10.self_attn.k_proj
- model.layers.8.self_attn.k_proj
- model.layers.6.self_attn.k_proj
- model.layers.9.self_attn.k_proj
- model.layers.45.self_attn.k_proj
- model.layers.42.self_attn.k_proj
- model.layers.40.self_attn.k_proj
- model.layers.5.self_attn.k_proj
- model.layers.0.self_attn.k_proj
- model.layers.33.self_attn.k_proj
- model.layers.34.self_attn.k_proj
- model.layers.13.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.12.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.14.self_attn.o_proj
- model.layers.16.self_attn.o_proj
- model.layers.20.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.4.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.19.self_attn.o_proj
- model.layers.7.self_attn.o_proj
- model.layers.18.self_attn.o_proj
- model.layers.8.self_attn.o_proj
- model.layers.38.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.9.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.28.self_attn.o_proj
- model.layers.32.self_attn.o_proj
- model.layers.35.self_attn.o_proj
- model.layers.39.self_attn.o_proj
- model.layers.3.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.1.self_attn.q_proj
- model.layers.2.self_attn.q_proj
- model.layers.3.self_attn.q_proj
- model.layers.44.self_attn.q_proj
- model.layers.29.self_attn.q_proj
- model.layers.45.self_attn.q_proj
- model.layers.43.self_attn.q_proj
- model.layers.32.self_attn.q_proj
- model.layers.38.self_attn.q_proj
- model.layers.19.self_attn.q_proj
- model.layers.42.self_attn.q_proj
- model.layers.34.self_attn.q_proj
- model.layers.36.self_attn.q_proj
- model.layers.40.self_attn.q_proj
- model.layers.26.self_attn.q_proj
- model.layers.20.self_attn.q_proj
- model.layers.28.self_attn.q_proj
- model.layers.39.self_attn.q_proj
- model.layers.41.self_attn.q_proj
- model.layers.33.self_attn.q_proj
- model.layers.35.self_attn.q_proj
- model.layers.25.self_attn.q_proj
- model.layers.30.self_attn.q_proj
- model.layers.27.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.0.self_attn.v_proj
- model.layers.7.self_attn.v_proj
- model.layers.39.self_attn.v_proj
- model.layers.31.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.10.self_attn.v_proj
- model.layers.41.self_attn.v_proj
- model.layers.32.self_attn.v_proj
- model.layers.6.self_attn.v_proj
- model.layers.33.self_attn.v_proj
- model.layers.42.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.9.self_attn.v_proj
- model.layers.14.self_attn.v_proj
- model.layers.35.self_attn.v_proj
- model.layers.38.self_attn.v_proj
- model.layers.13.self_attn.v_proj
- model.layers.30.self_attn.v_proj
- model.layers.34.self_attn.v_proj
- model.layers.5.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.37.self_attn.v_proj
- model.layers.27.self_attn.v_proj
- model.layers.11.self_attn.v_proj

wandb_project: EVA-Qwen2.5-14B-SFFT-v0.2
wandb_entity:
wandb_watch:
wandb_name: Unit-02
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 3
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 0.00005
max_grad_norm: 3

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: "unsloth"
# gradient_checkpointing_kwargs:
# use_reentrant: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 20
evals_per_epoch: 4
saves_per_epoch: 4
save_safetensors: true
hub_model_id:
hub_strategy:
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
# fsdp:
# - full_shard
# - auto_wrap
# fsdp_config:
# fsdp_limit_all_gathers: true
# fsdp_sync_module_states: false
# fsdp_offload_params: true
# fsdp_cpu_ram_efficient_loading: true
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
# fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
# fsdp_activation_checkpointing: true
# fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
# fsdp_sharding_strategy: FULL_SHARD
# fsdp_forward_prefetch: false # Added
# fsdp_backward_prefetch: "BACKWARD_PRE" # Added
# fsdp_backward_prefetch_limit: 1 # Added
# fsdp_mixed_precision: BF16 # Added
```
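
For context, a config like the one above is normally launched with the axolotl CLI. A minimal sketch, assuming the YAML is saved as `eva-qwen2.5-14b-v0.2.yaml` (a hypothetical filename, not a file shipped in this repo) and axolotl 0.4.1 with its dependencies is installed on the training node:

```bash
# Hypothetical launch command; the config filename is a placeholder.
accelerate launch -m axolotl.cli.train eva-qwen2.5-14b-v0.2.yaml
```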

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)