jermyn committed
Commit bf9ae8f · verified · 1 Parent(s): 09f98fa

End of training

Files changed (1): README.md (+194 -3)
README.md CHANGED
@@ -1,3 +1,194 @@
- ---
- license: apache-2.0
- ---
 
---
base_model: Qwen/CodeQwen1.5-7B-Chat
library_name: peft
license: other
tags:
- axolotl
- generated_from_trainer
model-index:
- name: CodeQwen1.5-7B-Chat-lora8-NLQ2Cypher
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
# base_model: deepseek-ai/deepseek-coder-1.3b-instruct
base_model: Qwen/CodeQwen1.5-7B-Chat
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_mistral_derived_model: false

load_in_8bit: true
load_in_4bit: false
strict: false

lora_fan_in_fan_out: false
data_seed: 49
seed: 49

datasets:
  - path: sample_data/alpaca_synth_cypher.jsonl
    type: sharegpt
    conversation: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./qlora-alpaca-codeqwen1.5-7b-chat-lora8
# output_dir: ./qlora-alpaca-out

hub_model_id: jermyn/CodeQwen1.5-7B-Chat-lora8-NLQ2Cypher
# hub_model_id: jermyn/deepseek-code-1.3b-inst-NLQ2Cypher

adapter: lora # 'qlora' or leave blank for full finetune
lora_model_dir:

sequence_len: 896
sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
# lora_target_modules:
#   - gate_proj
#   - down_proj
#   - up_proj
#   - q_proj
#   - v_proj
#   - k_proj
#   - o_proj

# If you added new tokens to the tokenizer, you may need to save some LoRA modules because they need to know the new tokens.
# For LLaMA and Mistral, you need to save `embed_tokens` and `lm_head`. It may vary for other models.
# `embed_tokens` converts tokens to embeddings, and `lm_head` converts embeddings to token probabilities.
# https://github.com/huggingface/peft/issues/334#issuecomment-1561727994
# lora_modules_to_save:
#   - embed_tokens
#   - lm_head

wandb_project: fine-tune-axolotl
wandb_entity: jermyn

gradient_accumulation_steps: 2
micro_batch_size: 8
eval_batch_size: 8
num_epochs: 6
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0005
max_grad_norm: 1.0
adam_beta2: 0.95
adam_epsilon: 0.00001

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_table_max_new_tokens: 128
# saves_per_epoch: 6
save_steps: 10
save_total_limit: 3
debug:
weight_decay: 0.0
fsdp:
fsdp_config:
# special_tokens:
#   bos_token: "<s>"
#   eos_token: "</s>"
#   unk_token: "<unk>"
save_safetensors: true

```

</details><br>

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/jermyn/fine-tune-axolotl/runs/jmysluep)
# CodeQwen1.5-7B-Chat-lora8-NLQ2Cypher

This model is a fine-tuned version of [Qwen/CodeQwen1.5-7B-Chat](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat) on the `sample_data/alpaca_synth_cypher.jsonl` dataset (see the Axolotl config above).
It achieves the following results on the evaluation set:
- Loss: 0.3720

## Model description

This repository contains a PEFT LoRA adapter (r=32, alpha=16, dropout 0.05, all linear projections targeted) for Qwen/CodeQwen1.5-7B-Chat, trained with Axolotl to translate natural-language questions (NLQ) into Cypher queries.
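
For reference, here is a minimal sketch of roughly what those adapter settings correspond to in PEFT terms. The explicit `target_modules` list is an assumption taken from the commented-out `lora_target_modules` entry in the config; with `lora_target_linear: true`, Axolotl resolves the actual module names from the model itself.

```python
from peft import LoraConfig

# Rough PEFT equivalent of the LoRA settings above (r=32, alpha=16, dropout=0.05).
# The module list is an assumption: `lora_target_linear: true` targets all linear
# projections, and the exact names depend on the model architecture.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```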
## Intended uses & limitations

The adapter is intended for generating Cypher queries from natural-language questions posed in the Alpaca instruction format used during training. It has only been evaluated via the validation loss reported below, so generated queries should be reviewed before being executed against a real database.
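
A minimal inference sketch, assuming the adapter is loaded on top of the base model with `transformers` and `peft`; the prompt template mirrors the Alpaca format from the config, and the example question is purely illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/CodeQwen1.5-7B-Chat"
adapter_id = "jermyn/CodeQwen1.5-7B-Chat-lora8-NLQ2Cypher"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Alpaca-style prompt (the config uses `conversation: alpaca`); the instruction
# wording and the example question are illustrative, not taken from the dataset.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWhich actors appeared in the movie 'Inception'?\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Alternatively, the adapter can be merged into the base weights with `model.merge_and_unload()` so that inference does not require a PEFT dependency.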
## Training and evaluation data

Training used `sample_data/alpaca_synth_cypher.jsonl`, loaded through Axolotl's `sharegpt` dataset type and rendered with the `alpaca` conversation template; 10% of the records were held out for validation (`val_set_size: 0.1`). With `train_on_inputs: false`, the prompt portion is masked from the loss, so only the Cypher responses contribute to training.
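
For illustration, a hypothetical record in the shape Axolotl's `sharegpt` loader expects; the field values below are invented, and the actual schema and contents of `alpaca_synth_cypher.jsonl` may differ.

```python
# Hypothetical example record (invented values; the real dataset may differ).
example_record = {
    "conversations": [
        {
            "from": "human",
            "value": "Which actors appeared in the movie 'Inception'?",
        },
        {
            "from": "gpt",
            "value": "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN a.name",
        },
    ]
}
```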
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the optimizer/scheduler sketch after the list):
- learning_rate: 0.0005
- train_batch_size: 8
- eval_batch_size: 8
- seed: 49
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: 8-bit AdamW (`adamw_bnb_8bit`) with betas=(0.9,0.95) and epsilon=1e-05
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 6
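
A minimal sketch of the optimizer/scheduler pairing these settings imply (8-bit AdamW from `bitsandbytes` with a cosine schedule and 10 warmup steps); the stand-in module and total step count are placeholders, not values from the actual run.

```python
import torch.nn as nn
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

model = nn.Linear(8, 8)   # placeholder for the PEFT-wrapped model
total_steps = 36          # placeholder; use the real number of optimizer steps

optimizer = bnb.optim.AdamW8bit(
    model.parameters(), lr=5e-4, betas=(0.9, 0.95), eps=1e-5, weight_decay=0.0
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=total_steps
)
```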
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.1649 | 0.1538 | 1 | 0.9270 |
| 1.1566 | 0.3077 | 2 | 0.9268 |
| 1.0746 | 0.6154 | 4 | 0.8194 |
| 0.6428 | 0.9231 | 6 | 0.4970 |
| 0.2459 | 1.2308 | 8 | 0.4760 |
| 0.3512 | 1.5385 | 10 | 0.5091 |
| 0.1654 | 1.8462 | 12 | 0.4742 |
| 0.1484 | 2.1538 | 14 | 0.4560 |
| 0.137 | 2.4615 | 16 | 0.4105 |
| 0.0746 | 2.7692 | 18 | 0.3736 |
| 0.0539 | 3.0769 | 20 | 0.3412 |
| 0.1147 | 3.3846 | 22 | 0.3307 |
| 0.056 | 3.6923 | 24 | 0.3242 |
| 0.0767 | 4.0 | 26 | 0.3524 |
| 0.0583 | 4.3077 | 28 | 0.3690 |
| 0.0666 | 4.6154 | 30 | 0.3727 |
| 0.0539 | 4.9231 | 32 | 0.3773 |
| 0.0367 | 5.2308 | 34 | 0.3796 |
| 0.0297 | 5.5385 | 36 | 0.3720 |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.3
- PyTorch 2.1.2+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1