Svak committed · verified · Commit 340d792 · 1 Parent(s): b87db4f

Create README.md

---
library_name: transformers
base_model:
- meta-llama/Llama-3.3-70B-Instruct
tags:
- generated_from_trainer
model-index:
- name: 70B-L3.3-mhnnn-x1
  results: []
license: llama3.3
---

# This quant was made for and by [Infermatic.ai](https://infermatic.ai/)

[Sao10K/70B-L3.3-mhnnn-x1](https://huggingface.co/Sao10K/70B-L3.3-mhnnn-x1)

Copy of the original card

---

![yeah](https://huggingface.co/Sao10K/70B-L3.3-mhnnn-x1/resolve/main/Huh.jpg)
*my mental state when things do not go well*

# 70B-L3.3-mhnnn-x1

I quite liked it after messing around with it. Same data composition as Freya, applied differently.

It has occasional brainfarts, which are fixed with a regen; that's the price for more creative outputs.

Recommended Model Settings | *Look, I just use these; they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*
```
Prompt Format: Llama-3-Instruct
Temperature: 1.1
min_p: 0.05
```
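
A minimal sketch of applying these settings over an OpenAI-compatible endpoint, assuming a vLLM-style server; the base URL, API key, and served model name below are placeholders, and `min_p` is passed through `extra_body` since it is not part of the official OpenAI schema:

```python
# Sketch only: the endpoint URL, key, and served model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="70B-L3.3-mhnnn-x1",  # hypothetical serving name
    messages=[
        {"role": "system", "content": "You are a creative narrator."},
        {"role": "user", "content": "Continue the scene."},
    ],
    temperature=1.1,
    # min_p is a vLLM-style extension, forwarded via extra_body.
    extra_body={"min_p": 0.05},
)
print(response.choices[0].message.content)
```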

Types of Data included within Sets
```
Completion - Novels / eBooks
Text Adventure - Include details like 'Text Adventure Narrator' in the System Prompt, give it a one-shot example, and it'll fly.
Amoral Assistant - Include the terms 'Amoral' and 'Neutral' along with the regular assistant prompt for better results.
Instruct / Assistant - The usual assistant tasks.
Roleplay - As per usual, regular sets.
```
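
As an illustration of the Text Adventure setup described above, here is a sketch that renders a Llama-3-Instruct prompt with the model's own chat template; the system prompt wording and the one-shot turn are invented for the example, not taken from the training sets:

```python
# Illustrative only: the narrator prompt and example turns are made up.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Sao10K/70B-L3.3-mhnnn-x1")

messages = [
    {"role": "system", "content": "You are a Text Adventure Narrator. "
                                  "Describe scenes vividly and end each turn with a choice."},
    # One-shot example turn, as the card recommends:
    {"role": "user", "content": "> look around"},
    {"role": "assistant", "content": "Torchlight flickers over damp stone. Passages lead NORTH and EAST."},
    {"role": "user", "content": "> go north"},
]

# Renders the Llama-3-Instruct formatted string, ready for generation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```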

Total training time was ~14 hours on an 8xH100 node; shout out to SCDF for not sponsoring this run. My funds are dry from doing random things.

https://sao10k.carrd.co/ for contact.

---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.6.0`
```yaml
adapter: lora # 16-bit
lora_r: 64
lora_alpha: 64
lora_dropout: 0.2
peft_use_rslora: true
lora_target_linear: true

# Data
dataset_prepared_path: dataset_run_freya
datasets:
  # S1 - Writing / Completion
  - path: datasets/eBooks-cleaned-75K
    type: completion
  - path: datasets/novels-clean-dedupe-10K
    type: completion
  # S2 - Instruct
  - path: datasets/10k-amoral-full-fixed-sys.json
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/44k-hespera-smartshuffle.json
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/5k_rpg_adventure_instruct-sys.json
    type: chat_template
    chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
shuffle_merged_datasets: true
warmup_ratio: 0.1

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

# Iterations
num_epochs: 1

# Sampling
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# Batching
gradient_accumulation_steps: 4
micro_batch_size: 2
gradient_checkpointing: unsloth

# Evaluation
val_set_size: 0.025
evals_per_epoch: 5
eval_table_size:
eval_max_new_tokens: 256
eval_sample_packing: false
eval_batch_size: 1

# Optimizer
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 0.00000242
weight_decay: 0.2
max_grad_norm: 10.0

# Garbage Collection
gc_steps: 10

# Misc
deepspeed: ./deepspeed_configs/zero3_bf16.json
```
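
For context on the batching above: with `gradient_accumulation_steps: 4` and `micro_batch_size: 2`, and assuming one data-parallel rank per GPU on the 8xH100 node, the effective global batch size comes out to 4 × 2 × 8 = 64 packed sequences per optimizer step.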

</details><br>