cwaud committed
Commit 37fbcf2
1 Parent(s): af421d1

End of training

Files changed (2)
  1. README.md +10 -9
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -46,13 +46,13 @@ fp16: null
  fsdp: null
  fsdp_config: null
  gradient_accumulation_steps: 4
- gradient_checkpointing: true
+ gradient_checkpointing: false
  group_by_length: false
  hub_model_id: cwaud/90e6600b-e8b1-40d3-a07e-162ae14eccad
  hub_repo: cwaud
  hub_strategy: checkpoint
  hub_token: null
- learning_rate: 0.0002
+ learning_rate: 0.0001
  load_in_4bit: false
  load_in_8bit: true
  local_rank: null
@@ -64,7 +64,7 @@ lora_model_dir: null
  lora_r: 16
  lora_target_linear: true
  lr_scheduler: cosine
- max_steps: 10
+ max_steps: 100
  micro_batch_size: 2
  mlflow_experiment_name: /tmp/3f1696d98781e372_train_data.json
  model_type: AutoModelForCausalLM
@@ -101,7 +101,7 @@ xformers_attention: null

  This model is a fine-tuned version of [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.9187
+ - Loss: 0.8348

  ## Model description

@@ -120,7 +120,7 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0002
+ - learning_rate: 0.0001
  - train_batch_size: 2
  - eval_batch_size: 2
  - seed: 42
@@ -129,16 +129,17 @@ The following hyperparameters were used during training:
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
- - training_steps: 10
+ - training_steps: 100

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
  | 1.0747 | 0.0024 | 1 | 1.0123 |
- | 0.8378 | 0.0072 | 3 | 1.0095 |
- | 0.8917 | 0.0143 | 6 | 0.9692 |
- | 0.6845 | 0.0215 | 9 | 0.9187 |
+ | 0.8032 | 0.0597 | 25 | 0.8647 |
+ | 0.654 | 0.1193 | 50 | 0.8412 |
+ | 0.73 | 0.1790 | 75 | 0.8352 |
+ | 0.9823 | 0.2387 | 100 | 0.8348 |


  ### Framework versions
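
A note on the updated numbers: with micro_batch_size 2 and gradient_accumulation_steps 4, the effective batch size is 2 × 4 = 8 sequences per optimizer step, which is consistent with the epoch column (step 100 at epoch ≈ 0.2387 implies a training set on the order of 3.3k examples). The optimizer/scheduler pair from the hyperparameter list can be reconstructed outside the training config; the sketch below is a minimal, hypothetical reconstruction, assuming OptimizerNames.ADAMW_BNB resolves to bitsandbytes' 8-bit AdamW and using a stand-in module in place of the real LoRA-wrapped model.

```python
# Minimal sketch (not the actual training script): the optimizer/scheduler
# pair described in the card above. Assumes OptimizerNames.ADAMW_BNB maps to
# bitsandbytes' 8-bit AdamW and that a CUDA build of bitsandbytes is available.
import torch
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(16, 16).cuda()  # stand-in for the LoRA-wrapped model

optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=1e-4,              # learning_rate: 0.0001 (lowered from 0.0002)
    betas=(0.9, 0.999),
    eps=1e-8,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,     # lr_scheduler_warmup_steps: 10
    num_training_steps=100,  # training_steps: 100 (raised from 10)
)
```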
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c29393998c2e4bd09f0cdae30e9a44e739aece650fde573e118e969b6cd43db0
+ oid sha256:e553f560ef57a0e0bf20ff782c6b8f323c76838b4a6952c3340597759c6d5ec1
  size 45169354
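
The adapter_model.bin entry above is a Git LFS pointer file, not the weights themselves: the repo tracks only the spec version, the sha256 oid, and the size (about 45 MB), while the binary lives in LFS storage; the new oid reflects the retrained adapter. To try the adapter, a minimal usage sketch (assuming the transformers and peft packages; both repo ids are taken from the card above):

```python
# Minimal usage sketch: load the base model, then attach the LoRA adapter
# published in this repo. The prompt text is an arbitrary example.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/Llama-3.2-1B-Instruct"                  # base model named in the card
adapter_id = "cwaud/90e6600b-e8b1-40d3-a07e-162ae14eccad"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)  # loads adapter weights

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```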