Alyosha11 committed on
Commit dd8cabd · verified · 1 Parent(s): b15ed0a

Delete README.md

Files changed (1)
  README.md +0 -229
README.md DELETED
@@ -1,229 +0,0 @@
---
library_name: transformers
base_model: Alyosha11/KS-Llama-dpo-3.1-8B-clean
tags:
- generated_from_trainer
model-index:
- name: spectrum_dir
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: Alyosha11/KS-Llama-dpo-3.1-8B-clean

tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Alyosha11/newSFTclean
    type: alpaca
    split: train
  - path: Alyosha11/newSFT
    type: alpaca
    split: data

output_dir: ./spectrum_dir

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: angel
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
eval_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_offload_params: false
#   fsdp_state_dict_type: FULL_STATE_DICT
#   fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
special_tokens:
  pad_token: <|end_of_text|>
warmup_steps: 10
auto_resume_from_checkpoints: false
# warmup_ratio: 0.5
eval_steps: 10
saves_per_epoch: 10
eval_sample_packing: false
save_total_limit: 2
debug:
deepspeed: deepspeed_configs/zero2.json

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# input_layernorm layers
- model.layers.0.input_layernorm
- model.layers.1.input_layernorm
- model.layers.2.input_layernorm
- model.layers.3.input_layernorm
- model.layers.4.input_layernorm
- model.layers.5.input_layernorm
- model.layers.6.input_layernorm
- model.layers.7.input_layernorm
# lm_head layers
# mlp.down_proj layers
- model.layers.1.mlp.down_proj
- model.layers.0.mlp.down_proj
- model.layers.30.mlp.down_proj
- model.layers.2.mlp.down_proj
- model.layers.21.mlp.down_proj
- model.layers.29.mlp.down_proj
- model.layers.22.mlp.down_proj
- model.layers.5.mlp.down_proj
# mlp.gate_proj layers
- model.layers.1.mlp.gate_proj
- model.layers.2.mlp.gate_proj
- model.layers.3.mlp.gate_proj
- model.layers.4.mlp.gate_proj
- model.layers.0.mlp.gate_proj
- model.layers.25.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.5.mlp.gate_proj
# mlp.up_proj layers
- model.layers.4.mlp.up_proj
- model.layers.3.mlp.up_proj
- model.layers.0.mlp.up_proj
- model.layers.7.mlp.up_proj
- model.layers.5.mlp.up_proj
- model.layers.6.mlp.up_proj
- model.layers.2.mlp.up_proj
- model.layers.1.mlp.up_proj
# model.embed_tokens layers
# model.norm layers
# post_attention_layernorm layers
- model.layers.0.post_attention_layernorm
- model.layers.1.post_attention_layernorm
- model.layers.2.post_attention_layernorm
- model.layers.3.post_attention_layernorm
- model.layers.4.post_attention_layernorm
- model.layers.5.post_attention_layernorm
- model.layers.6.post_attention_layernorm
- model.layers.7.post_attention_layernorm
# self_attn.k_proj layers
- model.layers.29.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.21.self_attn.k_proj
- model.layers.19.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.20.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.14.self_attn.o_proj
- model.layers.7.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.24.self_attn.o_proj
- model.layers.9.self_attn.o_proj
- model.layers.13.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.8.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.14.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.0.self_attn.q_proj
- model.layers.15.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.26.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.3.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.21.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.16.self_attn.v_proj
```

</details><br>

# spectrum_dir

This model is a fine-tuned version of [Alyosha11/KS-Llama-dpo-3.1-8B-clean](https://huggingface.co/Alyosha11/KS-Llama-dpo-3.1-8B-clean), trained on the Alyosha11/newSFTclean and Alyosha11/newSFT datasets listed in the axolotl config above.
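A minimal usage sketch with the standard `transformers` API is shown below. The repository id is a placeholder (this card does not state the Hub id under which the checkpoint is published), and bf16 plus `device_map="auto"` are convenience choices, not requirements.

```python
# Minimal inference sketch (standard transformers API).
# "your-org/your-model" is a placeholder for this checkpoint's actual Hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder: replace with this repo's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training above ran in bf16
    device_map="auto",
)

prompt = "Explain what sample packing does during fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```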
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Per the axolotl config above, training used the Alyosha11/newSFTclean (train split) and Alyosha11/newSFT (data split) datasets in alpaca format; no separate evaluation dataset is listed in that config.

## Training procedure
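Per the config above, only the parameters whose names match the `unfrozen_parameters` patterns are trained; everything else stays frozen. The sketch below shows one common way such a list is applied in PyTorch. It is illustrative only: axolotl applies the list internally, and the pattern list here is truncated.

```python
# Illustrative sketch: freeze everything, then unfreeze parameters whose names
# match any pattern from the unfrozen_parameters list in the config above.
# Not axolotl's actual implementation; patterns truncated for brevity.
import re

from transformers import AutoModelForCausalLM

unfrozen_patterns = [
    r"^lm_head.weight$",
    r"^model.embed_tokens.weight$",
    r"model.layers.0.input_layernorm",
    r"model.layers.1.mlp.down_proj",
    # ... remaining patterns from the config above
]

model = AutoModelForCausalLM.from_pretrained("Alyosha11/KS-Llama-dpo-3.1-8B-clean")

for name, param in model.named_parameters():
    # Train a parameter only if its name matches one of the patterns.
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} parameters")
```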
### Training hyperparameters

The following hyperparameters were used during training (a rough `TrainingArguments` equivalent is sketched after the list):
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: paged AdamW 8-bit with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
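For orientation only, these values map roughly onto the following `transformers.TrainingArguments`; axolotl builds its training arguments internally, so the mapping is approximate. The total train batch size follows from train_batch_size × gradient_accumulation_steps = 1 × 8 = 8.

```python
# Approximate TrainingArguments equivalent of the hyperparameters listed above.
# Sketch only: axolotl constructs its own arguments; field mapping is rough.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./spectrum_dir",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # 1 * 8 = 8 total train batch size
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    optim="paged_adamw_8bit",        # requires bitsandbytes
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    save_total_limit=2,
    seed=42,
)
```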
### Training results



### Framework versions

- Transformers 4.45.2
- PyTorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
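To compare a local environment against these versions, a quick check like the one below works (assuming the standard distribution names, e.g. `torch` for PyTorch):

```python
# Print installed versions of the packages listed above for comparison.
import importlib.metadata as md

for pkg in ("transformers", "torch", "datasets", "tokenizers"):
    print(f"{pkg}: {md.version(pkg)}")
```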