---
library_name: transformers
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-14B/blob/main/LICENSE
base_model: Qwen/Qwen2.5-14B
tags:
- generated_from_trainer
model-index:
- name: 14B-Qwen2.5-Freya-x1
  results: []
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/14B-Qwen2.5-Freya-x1-GGUF
This is a quantized version of [Sao10K/14B-Qwen2.5-Freya-x1](https://huggingface.co/Sao10K/14B-Qwen2.5-Freya-x1) created using llama.cpp.
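
To try the GGUF files directly from Python, something like the following works. This is a minimal sketch assuming `llama-cpp-python` is installed (`pip install llama-cpp-python`); the quant filename is a placeholder, so pick whichever file in this repo fits your hardware.

```python
# Minimal sketch: load one of this repo's GGUF quants with llama-cpp-python.
# The filename pattern is an assumption; substitute a file that exists in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/14B-Qwen2.5-Freya-x1-GGUF",
    filename="*Q4_K_M.gguf",  # hypothetical quant choice
    n_ctx=8192,               # context window; the model was trained at 16384
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    temperature=1.0,
    min_p=0.05,  # the sampler settings recommended in the original card below
)
print(out["choices"][0]["message"]["content"])
```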

# Original Model Card

![Freya](https://huggingface.co/Sao10K/14B-Qwen2.5-Freya-x1/resolve/main/sad.png)
*Me during failed runs*

# 14B-Qwen2.5-Freya-v1

I decided to mess around with training methods again, considering the re-emergence of methods like multi-step training. Some people began doing it again, so why not? Inspired by AshhLimaRP's methodology, but done my way.

Freya-S1
- LoRA trained on ~1.1GB of literature and raw text over Qwen 2.5's base model.
- Cleaned the text and literature as best as I could; still, it may have had issues here and there.

Freya-S2
- The first LoRA was applied over Qwen 2.5 Instruct, then I trained on top of that (a rough sketch of this stacking step follows below).
- Reduced the LoRA rank because it's mainly instruct, plus other details I won't get into.
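
For the curious, the stacking step in Freya-S2 roughly corresponds to the following `peft` pattern. This is a hedged sketch: the `freya-s1-lora` path is a hypothetical placeholder for the Stage 1 adapter, not a published repo.

```python
# Hedged sketch of the Freya-S2 starting point: merge the Stage 1 LoRA
# (trained on the base model) into the Instruct model, then train Stage 2 on top.
# "freya-s1-lora" is a hypothetical local path, not a real repo.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = PeftModel.from_pretrained(base, "freya-s1-lora")
merged = model.merge_and_unload()          # bake the S1 deltas into the weights
merged.save_pretrained("qwen2.5-14b-instruct-plus-s1")  # S2 LoRA training starts here
```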

Recommended Model Settings | *Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*
```
Prompt Format: ChatML
Temperature: 1+ # I don't know, man.
min_p: 0.05
```
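
Concretely, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers; with `transformers`, the model's bundled chat template renders that for you, and recent versions of `generate` accept `min_p` directly. A rough sketch (the prompt text is illustrative):

```python
# Sketch: apply the recommended samplers via transformers (assumes a recent
# transformers version, which accepts min_p in generate()).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Sao10K/14B-Qwen2.5-Freya-x1"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

# apply_chat_template renders the ChatML format the card asks for.
inputs = tok.apply_chat_template(
    [{"role": "user", "content": "Describe a rainy harbor town."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=1.0, min_p=0.05)
print(tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```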

Training time in total was ~10 hours on an 8xH100 node, sponsored by the Government of Singapore or something. Thanks for the national service allowance, MHA.

https://sao10k.carrd.co/ for contact.

---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.6.0`
```yaml
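# NB: entries written as `- s1: ...` / `- s2: ...` below are the author's
# shorthand for per-stage values across the two runs (Freya-S1 / Freya-S2);
# each individual axolotl run takes a single value for these fields.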
base_model:
  - s1: Qwen/Qwen2.5-14B
  - s2: Qwen/Qwen2.5-14B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false
sequence_len: 16384
bf16: auto
fp16:
tf32: false
flash_attention: true
special_tokens:

adapter: lora # 16-bit
lora_r:
  - s1: 64
  - s2: 32
lora_alpha: 64
lora_dropout: 0.2
lora_fan_in_fan_out:
peft_use_rslora: true
lora_target_linear: true

# Data
dataset_prepared_path: dataset_run_freya
datasets:
  # S1 - Writing / Completion
  - path: datasets/eBooks-cleaned-75K
    type: completion
  - path: datasets/novels-clean-dedupe-10K
    type: completion
  # S2 - Instruct
  - path: datasets/10k-amoral-full-fixed-sys.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/44k-hespera-smartshuffle.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/5k_rpg_adventure_instruct-sys.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
shuffle_merged_datasets: true
warmup_ratio: 0.1

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

# Iterations
num_epochs:
  - s1: 1
  - s2: 2

# Sampling
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# Batching
gradient_accumulation_steps: 4
micro_batch_size: 2
gradient_checkpointing: unsloth

# Evaluation
val_set_size: 0.025
evals_per_epoch: 5
eval_table_size:
eval_max_new_tokens: 256
eval_sample_packing: false
eval_batch_size: 1

# Optimizer
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate:
  - s1: 0.000002
  - s2: 0.000004
weight_decay: 0.2
max_grad_norm: 10.0

# Garbage Collection
gc_steps: 10

# Misc
deepspeed: ./deepspeed_configs/zero2.json
```

</details><br>