End of training
- README.md +10 -9
- adapter_model.bin +1 -1
README.md CHANGED

@@ -46,13 +46,13 @@ fp16: null
 fsdp: null
 fsdp_config: null
 gradient_accumulation_steps: 4
-gradient_checkpointing:
+gradient_checkpointing: false
 group_by_length: false
 hub_model_id: cwaud/90e6600b-e8b1-40d3-a07e-162ae14eccad
 hub_repo: cwaud
 hub_strategy: checkpoint
 hub_token: null
-learning_rate: 0.
+learning_rate: 0.0001
 load_in_4bit: false
 load_in_8bit: true
 local_rank: null

@@ -64,7 +64,7 @@ lora_model_dir: null
 lora_r: 16
 lora_target_linear: true
 lr_scheduler: cosine
-max_steps:
+max_steps: 100
 micro_batch_size: 2
 mlflow_experiment_name: /tmp/3f1696d98781e372_train_data.json
 model_type: AutoModelForCausalLM
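With micro_batch_size: 2 and gradient_accumulation_steps: 4, each optimizer step sees an effective batch of 2 × 4 = 8 sequences. A minimal PyTorch sketch of that accumulation pattern (illustrative only; the tiny linear model and random data are stand-ins, not the actual training loop):

```python
import torch

ACCUM_STEPS = 4  # gradient_accumulation_steps from the config
MICRO_BATCH = 2  # micro_batch_size from the config

model = torch.nn.Linear(16, 2)  # stand-in for the LLM
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(8):  # each iteration consumes one micro-batch
    x = torch.randn(MICRO_BATCH, 16)
    loss = model(x).square().mean()
    (loss / ACCUM_STEPS).backward()    # scale so accumulated grads average out
    if (step + 1) % ACCUM_STEPS == 0:  # weight update every 4 micro-batches
        opt.step()
        opt.zero_grad()
```

The remaining hunks fill in the generated model-card text: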
@@ -101,7 +101,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.8348
 
 ## Model description
 

@@ -120,7 +120,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.
+- learning_rate: 0.0001
 - train_batch_size: 2
 - eval_batch_size: 2
 - seed: 42

@@ -129,16 +129,17 @@ The following hyperparameters were used during training:
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- training_steps:
+- training_steps: 100
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 1.0747 | 0.0024 | 1 | 1.0123 |
-| 0.
-| 0.
-| 0.
+| 0.8032 | 0.0597 | 25 | 0.8647 |
+| 0.654 | 0.1193 | 50 | 0.8412 |
+| 0.73 | 0.1790 | 75 | 0.8352 |
+| 0.9823 | 0.2387 | 100 | 0.8348 |
 
 
 ### Framework versions
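The card's schedule is a cosine decay from a peak learning rate of 0.0001 with 10 warmup steps over 100 training steps. A short sketch of that curve, assuming the standard linear-warmup plus cosine-decay shape used by transformers' get_cosine_schedule_with_warmup:

```python
import math

PEAK_LR, WARMUP, TOTAL = 1e-4, 10, 100  # values from the model card

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward zero."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for s in (1, 10, 25, 50, 75, 100):
    print(f"step {s:3d}: lr = {lr_at(s):.2e}")
```

The evaluation rows at steps 25, 50, 75, and 100 in the results table sit on this same 100-step horizon, with validation loss flattening out near 0.835 by the end of the schedule.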
adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:e553f560ef57a0e0bf20ff782c6b8f323c76838b4a6952c3340597759c6d5ec1
 size 45169354
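The adapter_model.bin entry is a Git LFS pointer: the oid is the SHA-256 of the new 45,169,354-byte LoRA adapter, not the weights themselves. A hedged sketch of loading the adapter onto its 8-bit base model with peft and transformers (assuming bitsandbytes is installed; untested against this exact repo):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "unsloth/Llama-3.2-1B-Instruct"
ADAPTER = "cwaud/90e6600b-e8b1-40d3-a07e-162ae14eccad"

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # matches load_in_8bit: true
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER)  # applies the LoRA weights (lora_r: 16)

prompt = tok("Hello, how are you?", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**prompt, max_new_tokens=20)[0], skip_special_tokens=True))
```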