willtensora committed
Commit 5ec28d1 · verified · 1 Parent(s): d60e24b

End of training

Files changed (3):
  1. README.md +32 -32
  2. generation_config.json +3 -3
  3. pytorch_model.bin +2 -2
README.md CHANGED
@@ -1,12 +1,12 @@
 ---
 library_name: transformers
-license: llama3.2
-base_model: NousResearch/Llama-3.2-1B
+license: apache-2.0
+base_model: Qwen/Qwen2-0.5B
 tags:
 - axolotl
 - generated_from_trainer
 model-index:
-- name: 0c2649cc-2fe7-4e88-b672-6da1fee4001f
+- name: 459779f2-cbce-4ec0-b11c-1dcdf92498d8
   results: []
 ---
 
@@ -18,20 +18,20 @@ should probably proofread and complete it, then remove this comment. -->
 
 axolotl version: `0.4.1`
 ```yaml
-base_model: NousResearch/Llama-3.2-1B
+base_model: Qwen/Qwen2-0.5B
 batch_size: 32
 bf16: true
 chat_template: tokenizer_default_fallback_alpaca
 datasets:
 - data_files:
-  - f51beb4c568b9128_train_data.json
+  - 745d2d05aaed18f4_train_data.json
   ds_type: json
   format: custom
-  path: /workspace/input_data/f51beb4c568b9128_train_data.json
+  path: /workspace/input_data/745d2d05aaed18f4_train_data.json
   type:
-    field_input: keywords
-    field_instruction: idea
-    field_output: full_response
+    field_input: pos
+    field_instruction: task
+    field_output: query
     format: '{instruction} {input}'
     no_input_format: '{instruction}'
     system_format: '{system}'
@@ -41,7 +41,7 @@ flash_attention: true
 gpu_memory_limit: 80GiB
 gradient_checkpointing: true
 group_by_length: true
-hub_model_id: willtensora/0c2649cc-2fe7-4e88-b672-6da1fee4001f
+hub_model_id: willtensora/459779f2-cbce-4ec0-b11c-1dcdf92498d8
 hub_strategy: checkpoint
 learning_rate: 0.0002
 logging_steps: 10
@@ -57,15 +57,13 @@ sample_packing: false
 save_steps: 40
 save_total_limit: 1
 sequence_len: 2048
-special_tokens:
-  pad_token: <|end_of_text|>
-tokenizer_type: PreTrainedTokenizerFast
+tokenizer_type: Qwen2TokenizerFast
 train_on_inputs: false
 trust_remote_code: true
 val_set_size: 0.1
 wandb_entity: ''
 wandb_mode: online
-wandb_name: NousResearch/Llama-3.2-1B-/workspace/input_data/f51beb4c568b9128_train_data.json
+wandb_name: Qwen/Qwen2-0.5B-/workspace/input_data/745d2d05aaed18f4_train_data.json
 wandb_project: Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: default
@@ -76,11 +74,11 @@ xformers_attention: true
 
 </details><br>
 
-# 0c2649cc-2fe7-4e88-b672-6da1fee4001f
+# 459779f2-cbce-4ec0-b11c-1dcdf92498d8
 
-This model is a fine-tuned version of [NousResearch/Llama-3.2-1B](https://huggingface.co/NousResearch/Llama-3.2-1B) on the None dataset.
+This model is a fine-tuned version of [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0849
+- Loss: 2.4560
 
 ## Model description
 
@@ -109,26 +107,28 @@ The following hyperparameters were used during training:
 - total_eval_batch_size: 32
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 12
-- training_steps: 258
+- lr_scheduler_warmup_steps: 14
+- training_steps: 291
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| No log | 0.0005 | 1 | 0.2074 |
-| 0.5472 | 0.0097 | 20 | 0.1746 |
-| 0.3199 | 0.0194 | 40 | 0.2036 |
-| 0.2013 | 0.0291 | 60 | 0.1772 |
-| 0.0903 | 0.0388 | 80 | 0.1702 |
-| 0.0875 | 0.0485 | 100 | 0.2040 |
-| 0.1425 | 0.0582 | 120 | 0.1392 |
-| 0.1982 | 0.0679 | 140 | 0.1194 |
-| 0.1372 | 0.0776 | 160 | 0.1014 |
-| 0.0278 | 0.0873 | 180 | 0.0952 |
-| 0.0248 | 0.0970 | 200 | 0.0893 |
-| 0.1051 | 0.1067 | 220 | 0.0875 |
-| 0.0649 | 0.1164 | 240 | 0.0849 |
+| No log | 0.0004 | 1 | 3.9660 |
+| 2.8207 | 0.0086 | 20 | 3.1038 |
+| 3.1247 | 0.0172 | 40 | 3.0989 |
+| 2.9411 | 0.0258 | 60 | 2.8986 |
+| 2.9915 | 0.0344 | 80 | 2.8742 |
+| 2.8038 | 0.0430 | 100 | 2.8405 |
+| 2.8518 | 0.0516 | 120 | 2.7728 |
+| 2.7079 | 0.0602 | 140 | 2.6985 |
+| 2.6076 | 0.0688 | 160 | 2.6416 |
+| 2.6172 | 0.0774 | 180 | 2.5695 |
+| 2.552 | 0.0860 | 200 | 2.5151 |
+| 2.5036 | 0.0946 | 220 | 2.4783 |
+| 2.4887 | 0.1032 | 240 | 2.4610 |
+| 2.4008 | 0.1118 | 260 | 2.4569 |
+| 2.424 | 0.1204 | 280 | 2.4560 |
 
 
 ### Framework versions
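The config change swaps the dataset field mapping (task → instruction, pos → input, query → output) along with the base model. To make the `format: '{instruction} {input}'` template concrete, here is a minimal sketch; the example record is hypothetical and the assembly logic is a simplification of axolotl's custom prompt format, not its actual implementation:

```python
# Hypothetical row shaped like 745d2d05aaed18f4_train_data.json
# (field names come from the new config; values are invented).
record = {
    "task": "Given a web search query, retrieve relevant passages.",
    "pos": "Photovoltaic cells convert sunlight directly into electricity.",
    "query": "how do solar panels work",
}

FORMAT = "{instruction} {input}"   # `format` from the config
NO_INPUT_FORMAT = "{instruction}"  # `no_input_format` from the config

instruction = record["task"]       # field_instruction: task
model_input = record.get("pos")    # field_input: pos
target = record["query"]           # field_output: query

# Fall back to no_input_format when the input field is empty or missing.
prompt = (FORMAT.format(instruction=instruction, input=model_input)
          if model_input else NO_INPUT_FORMAT.format(instruction=instruction))

print(prompt)  # fed to the model as the prompt
print(target)  # trained as the completion (train_on_inputs: false)
```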
 
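After training, the checkpoint is pushed under the `hub_model_id` from the config. A minimal usage sketch, assuming that repository is accessible; the sampling defaults (`do_sample`, `eos_token_id` 151643, `max_new_tokens` 2048) come from the generation_config.json change shown below:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# hub_model_id from the training config above.
model_id = "willtensora/459779f2-cbce-4ec0-b11c-1dcdf92498d8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Prompt follows the '{instruction} {input}' training format (values invented).
prompt = ("Given a web search query, retrieve relevant passages. "
          "Photovoltaic cells convert sunlight directly into electricity.")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)  # override the 2048 default
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```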
generation_config.json CHANGED
@@ -1,7 +1,7 @@
 {
-  "_from_model_config": true,
-  "bos_token_id": 128000,
+  "bos_token_id": 151643,
   "do_sample": true,
-  "eos_token_id": 128001,
+  "eos_token_id": 151643,
+  "max_new_tokens": 2048,
   "transformers_version": "4.46.0"
 }
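The id changes track the base-model swap: 128000 and 128001 are Llama 3's `<|begin_of_text|>`/`<|end_of_text|>` ids, while Qwen2 uses its `<|endoftext|>` token (id 151643) for both roles. A quick sanity-check sketch (assumes Hub access):

```python
from transformers import AutoTokenizer

# Confirm the new eos id matches the Qwen2 tokenizer's special token.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
print(tok.eos_token, tok.eos_token_id)  # expected: <|endoftext|> 151643
```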
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4a3488e39325dea60c21ab0cf3a2715d26192702fde06183582341380d5a328b
-size 2471678226
+oid sha256:3215986268fec825d0014ee32eded07cd52ce8556945acb19dbacdd0c12be2bb
+size 988163026
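pytorch_model.bin is tracked with Git LFS, so the diff shows only the pointer file (object hash plus byte size), not the weights themselves. The size drop, roughly 2.47 GB to 0.99 GB, is consistent with replacing ~1.24B Llama parameters with ~0.49B Qwen2 parameters at 2 bytes each (bf16). A small sketch that parses such a pointer:

```python
# Parse a git-lfs pointer file like the one shown above; the three
# "version/oid/size" lines follow the git-lfs pointer spec.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3215986268fec825d0014ee32eded07cd52ce8556945acb19dbacdd0c12be2bb
size 988163026
"""

fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
size_bytes = int(fields["size"])
print(fields["oid"])                      # sha256:3215...
print(f"{size_bytes / 1024**3:.2f} GiB")  # ~0.92 GiB
```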