mav23 committed on
Commit
88b5773
1 Parent(s): 2c1e158

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +431 -0
  3. eva-qwen2.5-32b-v0.2.Q4_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ eva-qwen2.5-32b-v0.2.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,431 @@
+ ---
+ library_name: transformers
+ license: apache-2.0
+ datasets:
+ - anthracite-org/kalo-opus-instruct-22k-no-refusal
+ - Nopm/Opus_WritingStruct
+ - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
+ - Gryphe/Sonnet3.5-Charcard-Roleplay
+ - Gryphe/ChatGPT-4o-Writing-Prompts
+ - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
+ - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
+ - nothingiisreal/Reddit-Dirty-And-WritingPrompts
+ - allura-org/Celeste-1.x-data-mixture
+ - cognitivecomputations/dolphin-2.9.3
+ base_model: Qwen/Qwen2.5-32B
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: EVA-Qwen2.5-32B-SFFT-v0.1
+   results: []
+ ---
+
+ # EVA Qwen2.5-32B v0.2
+
+ <p>
+ An RP/storywriting specialist model: a full-parameter finetune of Qwen2.5-32B on a mixture of synthetic and natural data.<br>
+ It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity and "flavor" of the resulting model.<br>
+ </p>
+
+ <p>Dedicated to Nev.</p>
+
+ <p><b>Version notes for 0.2</b>: The whole dataset was reprocessed due to a severe mistake in the previously used pipeline, which had left the data poisoned with a large number of non-Unicode characters. The result is no more weird generation artifacts and better stability. Major kudos to Cahvay for his work on fixing this critical issue.</p>
+
+ <p>Prompt format is ChatML.</p><br>
+ <h3>Recommended sampler values (see the usage sketch below):</h3>
+ <ul>
+ <li>Temperature: 1</li>
+ <li>Min-P: 0.05</li>
+ <li>Top-A: 0.2</li>
+ <li>Repetition Penalty: 1.03</li>
+ </ul>
+
+ <h3>Recommended SillyTavern presets (via CalamitousFelicitousness):</h3>
+
+ - [Context](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Context.json)
+ - [Instruct and System Prompt](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Instruct.json)
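+
+ A minimal, hedged usage sketch for running the Q4_0 GGUF from this repo with `llama-cpp-python` and the sampler values above. The context size, GPU-layer count and prompts are placeholders, not recommendations from the original card; Top-A is not available among llama-cpp-python's sampling parameters, so it is omitted here.
+
+ ```python
+ # Sketch only: assumes llama-cpp-python is installed and the GGUF file sits in the working directory.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="eva-qwen2.5-32b-v0.2.Q4_0.gguf",  # file uploaded in this repo
+     n_ctx=8192,             # placeholder context window; adjust to your RAM/VRAM
+     n_gpu_layers=-1,        # offload all layers to GPU if possible
+     chat_format="chatml",   # the card's recommended prompt format
+ )
+
+ out = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "You are a creative roleplaying partner."},  # example system prompt
+         {"role": "user", "content": "Open a noir mystery scene in second person."},
+     ],
+     temperature=1.0,        # recommended Temperature
+     min_p=0.05,             # recommended Min-P
+     repeat_penalty=1.03,    # recommended Repetition Penalty (Top-A: 0.2 has no equivalent here)
+     max_tokens=512,
+ )
+ print(out["choices"][0]["message"]["content"])
+ ```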
+
+ <p>
+ <br>
+ <h3>
+ Training data:
+ </h3>
+ <ul>
+ <li>Celeste 70B 0.1 data mixture minus Opus Instruct subset. See that model's <a href=https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16>card</a> for details.</li>
+ <li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
+ <li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe</li>
+ <li>A subset (2k rows) of Sonnet3.5-Charcard-Roleplay by Gryphe</li>
+ <li>Synthstruct and SynthRP datasets by Epiculous</li>
+ <li>A subset from Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.</li>
+ </ul>
+ <h3>
+ Training time and hardware:
+ </h3>
+ <ul><li>7 hours on 8xH100 SXM, provided by <a href=https://featherless.ai/>FeatherlessAI</a></li></ul><br>
+ </p>
+ <p>The model was created by Kearm, Auri and Cahvay.</p>
+ <h4>Special thanks:</h4><ul>
+ <li><b>to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning.</b></li>
+ <li><b>to <a href=https://featherless.ai/>FeatherlessAI</a> for generously providing an 8xH100 SXM node for the training of this model</b></li>
+ <li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data</li>
+ <li>and to Allura-org for support, feedback, beta-testing and quality control of EVA models.</li></ul>
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.1`
+ ```yaml
+ base_model: Qwen/Qwen2.5-32B
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ plugins:
+ - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ # plugins:
+ # - axolotl.integrations.spectrum.SpectrumPlugin
+
+ # spectrum_top_fraction: 0.5
+ # # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
+ # spectrum_model_name: Qwen/Qwen2.5-32B
+
+ datasets:
+ - path: datasets/Celeste_Filtered_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/deduped_not_samantha_norefusals.jsonl
+   type: sharegpt
+ - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
+   type: sharegpt
+ - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
+   type: sharegpt
+ - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
+   type: sharegpt
+
+ chat_template: chatml
+ shuffle_merged_datasets: true
+ val_set_size: 0.001
+ output_dir: ./EVA-Qwen2.5-32B-SFFT-v0.1
+
+ sequence_len: 10240
+ sample_packing: true
+ eval_sample_packing: false
+ pad_to_sequence_len: true
+
+ # adapter: qlora
+ # lora_model_dir:
+ # lora_r: 64
+ # lora_alpha: 128
+ # lora_dropout: 0.05
+ # lora_target_linear: true
+ # peft_use_dora: true
+
+ unfrozen_parameters:
+ - ^lm_head.weight$
+ - ^model.embed_tokens.weight$
+ # mlp.down_proj layers
+ - model.layers.63.mlp.down_proj
+ - model.layers.49.mlp.down_proj
+ - model.layers.48.mlp.down_proj
+ - model.layers.45.mlp.down_proj
+ - model.layers.44.mlp.down_proj
+ - model.layers.47.mlp.down_proj
+ - model.layers.46.mlp.down_proj
+ - model.layers.43.mlp.down_proj
+ - model.layers.8.mlp.down_proj
+ - model.layers.11.mlp.down_proj
+ - model.layers.19.mlp.down_proj
+ - model.layers.35.mlp.down_proj
+ - model.layers.20.mlp.down_proj
+ - model.layers.52.mlp.down_proj
+ - model.layers.39.mlp.down_proj
+ - model.layers.62.mlp.down_proj
+ - model.layers.50.mlp.down_proj
+ - model.layers.29.mlp.down_proj
+ - model.layers.16.mlp.down_proj
+ - model.layers.28.mlp.down_proj
+ - model.layers.53.mlp.down_proj
+ - model.layers.30.mlp.down_proj
+ - model.layers.31.mlp.down_proj
+ - model.layers.32.mlp.down_proj
+ - model.layers.7.mlp.down_proj
+ - model.layers.36.mlp.down_proj
+ - model.layers.12.mlp.down_proj
+ - model.layers.18.mlp.down_proj
+ - model.layers.37.mlp.down_proj
+ - model.layers.38.mlp.down_proj
+ - model.layers.14.mlp.down_proj
+ - model.layers.13.mlp.down_proj
+ # mlp.gate_proj layers
+ - model.layers.43.mlp.gate_proj
+ - model.layers.61.mlp.gate_proj
+ - model.layers.60.mlp.gate_proj
+ - model.layers.44.mlp.gate_proj
+ - model.layers.62.mlp.gate_proj
+ - model.layers.28.mlp.gate_proj
+ - model.layers.29.mlp.gate_proj
+ - model.layers.45.mlp.gate_proj
+ - model.layers.37.mlp.gate_proj
+ - model.layers.35.mlp.gate_proj
+ - model.layers.59.mlp.gate_proj
+ - model.layers.36.mlp.gate_proj
+ - model.layers.30.mlp.gate_proj
+ - model.layers.48.mlp.gate_proj
+ - model.layers.38.mlp.gate_proj
+ - model.layers.27.mlp.gate_proj
+ - model.layers.31.mlp.gate_proj
+ - model.layers.34.mlp.gate_proj
+ - model.layers.58.mlp.gate_proj
+ - model.layers.33.mlp.gate_proj
+ - model.layers.39.mlp.gate_proj
+ - model.layers.26.mlp.gate_proj
+ - model.layers.32.mlp.gate_proj
+ - model.layers.46.mlp.gate_proj
+ - model.layers.42.mlp.gate_proj
+ - model.layers.49.mlp.gate_proj
+ - model.layers.57.mlp.gate_proj
+ - model.layers.50.mlp.gate_proj
+ - model.layers.47.mlp.gate_proj
+ - model.layers.56.mlp.gate_proj
+ - model.layers.63.mlp.gate_proj
+ - model.layers.55.mlp.gate_proj
+ # mlp.up_proj layers
+ - model.layers.61.mlp.up_proj
+ - model.layers.60.mlp.up_proj
+ - model.layers.32.mlp.up_proj
+ - model.layers.59.mlp.up_proj
+ - model.layers.58.mlp.up_proj
+ - model.layers.57.mlp.up_proj
+ - model.layers.44.mlp.up_proj
+ - model.layers.28.mlp.up_proj
+ - model.layers.35.mlp.up_proj
+ - model.layers.36.mlp.up_proj
+ - model.layers.29.mlp.up_proj
+ - model.layers.31.mlp.up_proj
+ - model.layers.34.mlp.up_proj
+ - model.layers.55.mlp.up_proj
+ - model.layers.49.mlp.up_proj
+ - model.layers.30.mlp.up_proj
+ - model.layers.53.mlp.up_proj
+ - model.layers.43.mlp.up_proj
+ - model.layers.56.mlp.up_proj
+ - model.layers.33.mlp.up_proj
+ - model.layers.54.mlp.up_proj
+ - model.layers.62.mlp.up_proj
+ - model.layers.27.mlp.up_proj
+ - model.layers.51.mlp.up_proj
+ - model.layers.52.mlp.up_proj
+ - model.layers.37.mlp.up_proj
+ - model.layers.45.mlp.up_proj
+ - model.layers.26.mlp.up_proj
+ - model.layers.42.mlp.up_proj
+ - model.layers.50.mlp.up_proj
+ - model.layers.48.mlp.up_proj
+ - model.layers.39.mlp.up_proj
+ # self_attn.k_proj layers
+ - model.layers.63.self_attn.k_proj
+ - model.layers.55.self_attn.k_proj
+ - model.layers.60.self_attn.k_proj
+ - model.layers.7.self_attn.k_proj
+ - model.layers.12.self_attn.k_proj
+ - model.layers.13.self_attn.k_proj
+ - model.layers.57.self_attn.k_proj
+ - model.layers.29.self_attn.k_proj
+ - model.layers.14.self_attn.k_proj
+ - model.layers.51.self_attn.k_proj
+ - model.layers.53.self_attn.k_proj
+ - model.layers.54.self_attn.k_proj
+ - model.layers.22.self_attn.k_proj
+ - model.layers.61.self_attn.k_proj
+ - model.layers.18.self_attn.k_proj
+ - model.layers.30.self_attn.k_proj
+ - model.layers.9.self_attn.k_proj
+ - model.layers.24.self_attn.k_proj
+ - model.layers.23.self_attn.k_proj
+ - model.layers.25.self_attn.k_proj
+ - model.layers.10.self_attn.k_proj
+ - model.layers.58.self_attn.k_proj
+ - model.layers.56.self_attn.k_proj
+ - model.layers.15.self_attn.k_proj
+ - model.layers.32.self_attn.k_proj
+ - model.layers.28.self_attn.k_proj
+ - model.layers.8.self_attn.k_proj
+ - model.layers.59.self_attn.k_proj
+ - model.layers.11.self_attn.k_proj
+ - model.layers.48.self_attn.k_proj
+ - model.layers.16.self_attn.k_proj
+ - model.layers.50.self_attn.k_proj
+ # self_attn.o_proj layers
+ - model.layers.15.self_attn.o_proj
+ - model.layers.23.self_attn.o_proj
+ - model.layers.31.self_attn.o_proj
+ - model.layers.30.self_attn.o_proj
+ - model.layers.18.self_attn.o_proj
+ - model.layers.24.self_attn.o_proj
+ - model.layers.17.self_attn.o_proj
+ - model.layers.28.self_attn.o_proj
+ - model.layers.34.self_attn.o_proj
+ - model.layers.33.self_attn.o_proj
+ - model.layers.25.self_attn.o_proj
+ - model.layers.12.self_attn.o_proj
+ - model.layers.14.self_attn.o_proj
+ - model.layers.29.self_attn.o_proj
+ - model.layers.16.self_attn.o_proj
+ - model.layers.26.self_attn.o_proj
+ - model.layers.22.self_attn.o_proj
+ - model.layers.27.self_attn.o_proj
+ - model.layers.35.self_attn.o_proj
+ - model.layers.20.self_attn.o_proj
+ - model.layers.13.self_attn.o_proj
+ - model.layers.36.self_attn.o_proj
+ - model.layers.19.self_attn.o_proj
+ - model.layers.37.self_attn.o_proj
+ - model.layers.21.self_attn.o_proj
+ - model.layers.11.self_attn.o_proj
+ - model.layers.54.self_attn.o_proj
+ - model.layers.5.self_attn.o_proj
+ - model.layers.38.self_attn.o_proj
+ - model.layers.6.self_attn.o_proj
+ - model.layers.8.self_attn.o_proj
+ - model.layers.9.self_attn.o_proj
+ # self_attn.q_proj layers
+ - model.layers.1.self_attn.q_proj
+ - model.layers.2.self_attn.q_proj
+ - model.layers.3.self_attn.q_proj
+ - model.layers.45.self_attn.q_proj
+ - model.layers.54.self_attn.q_proj
+ - model.layers.35.self_attn.q_proj
+ - model.layers.48.self_attn.q_proj
+ - model.layers.61.self_attn.q_proj
+ - model.layers.52.self_attn.q_proj
+ - model.layers.50.self_attn.q_proj
+ - model.layers.60.self_attn.q_proj
+ - model.layers.56.self_attn.q_proj
+ - model.layers.58.self_attn.q_proj
+ - model.layers.42.self_attn.q_proj
+ - model.layers.59.self_attn.q_proj
+ - model.layers.44.self_attn.q_proj
+ - model.layers.55.self_attn.q_proj
+ - model.layers.57.self_attn.q_proj
+ - model.layers.41.self_attn.q_proj
+ - model.layers.36.self_attn.q_proj
+ - model.layers.39.self_attn.q_proj
+ - model.layers.4.self_attn.q_proj
+ - model.layers.43.self_attn.q_proj
+ - model.layers.34.self_attn.q_proj
+ - model.layers.46.self_attn.q_proj
+ - model.layers.49.self_attn.q_proj
+ - model.layers.40.self_attn.q_proj
+ - model.layers.25.self_attn.q_proj
+ - model.layers.51.self_attn.q_proj
+ - model.layers.17.self_attn.q_proj
+ - model.layers.37.self_attn.q_proj
+ - model.layers.53.self_attn.q_proj
+ # self_attn.v_proj layers
+ - model.layers.55.self_attn.v_proj
+ - model.layers.31.self_attn.v_proj
+ - model.layers.47.self_attn.v_proj
+ - model.layers.45.self_attn.v_proj
+ - model.layers.49.self_attn.v_proj
+ - model.layers.48.self_attn.v_proj
+ - model.layers.15.self_attn.v_proj
+ - model.layers.30.self_attn.v_proj
+ - model.layers.7.self_attn.v_proj
+ - model.layers.44.self_attn.v_proj
+ - model.layers.29.self_attn.v_proj
+ - model.layers.51.self_attn.v_proj
+ - model.layers.50.self_attn.v_proj
+ - model.layers.14.self_attn.v_proj
+ - model.layers.54.self_attn.v_proj
+ - model.layers.32.self_attn.v_proj
+ - model.layers.43.self_attn.v_proj
+ - model.layers.10.self_attn.v_proj
+ - model.layers.46.self_attn.v_proj
+ - model.layers.38.self_attn.v_proj
+ - model.layers.57.self_attn.v_proj
+ - model.layers.22.self_attn.v_proj
+ - model.layers.39.self_attn.v_proj
+ - model.layers.6.self_attn.v_proj
+ - model.layers.23.self_attn.v_proj
+ - model.layers.58.self_attn.v_proj
+ - model.layers.53.self_attn.v_proj
+ - model.layers.40.self_attn.v_proj
+ - model.layers.24.self_attn.v_proj
+ - model.layers.9.self_attn.v_proj
+ - model.layers.25.self_attn.v_proj
+ - model.layers.5.self_attn.v_proj
+
+
+
+ wandb_project: EVA-Qwen2.5-32B-SFFT-v0.2
+ wandb_entity:
+ wandb_watch:
+ wandb_name: Unit-02
+ wandb_log_model:
+
+ gradient_accumulation_steps: 8
+ micro_batch_size: 1
+ num_epochs: 3
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.00005
+ max_grad_norm: 3
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: "unsloth"
+ # gradient_checkpointing_kwargs:
+ #   use_reentrant: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 20
+ evals_per_epoch: 4
+ saves_per_epoch: 4
+ save_safetensors: true
+ hub_model_id:
+ hub_strategy:
+ debug:
+ deepspeed: deepspeed_configs/zero3_bf16.json
+ weight_decay: 0.1
+ # fsdp:
+ #   - full_shard
+ #   - auto_wrap
+ # fsdp_config:
+ #   fsdp_limit_all_gathers: true
+ #   fsdp_sync_module_states: false
+ #   fsdp_offload_params: true
+ #   fsdp_cpu_ram_efficient_loading: true
+ #   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
+ #   fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
+ #   fsdp_activation_checkpointing: true
+ #   fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
+ #   fsdp_sharding_strategy: FULL_SHARD
+ #   fsdp_forward_prefetch: false # Added
+ #   fsdp_backward_prefetch: "BACKWARD_PRE" # Added
+ #   fsdp_backward_prefetch_limit: 1 # Added
+ #   fsdp_mixed_precision: BF16 # Added
+ ```
+
+ </details><br>
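+
+ A minimal download sketch for fetching the Q4_0 GGUF uploaded in this commit via `huggingface_hub`; only the filename and approximate size are taken from this upload, and the repo id below is a placeholder for wherever this quant is hosted:
+
+ ```python
+ # Sketch only: assumes huggingface_hub is installed; REPO_ID is a hypothetical placeholder.
+ from huggingface_hub import hf_hub_download
+
+ REPO_ID = "your-namespace/eva-qwen2.5-32b-v0.2-gguf"  # replace with the actual repo id
+ gguf_path = hf_hub_download(
+     repo_id=REPO_ID,
+     filename="eva-qwen2.5-32b-v0.2.Q4_0.gguf",  # file added in this commit (~18.6 GB)
+ )
+ print(gguf_path)  # local path to pass as model_path in the usage sketch above
+ ```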
eva-qwen2.5-32b-v0.2.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c3f1e7ab942183a1e69eb8b609f1b250bf3585b65801520947aa1990ba8bdfc
+ size 18640229120