---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
datasets:
  - GaetanMichelet/chat-60_ft_task-1
library_name: peft
license: llama3.1
tags:
  - alignment-handbook
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: Llama-31-8B_task-1_60-samples_config-4_full
    results: []
---

Llama-31-8B_task-1_60-samples_config-4_full

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the GaetanMichelet/chat-60_ft_task-1 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9355
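
Since this is a PEFT adapter rather than a full checkpoint, it is loaded on top of the base model. The following is a minimal usage sketch, not the card author's own inference code: the adapter repo ID is assumed to match the model name above, and the dtype and device settings are illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
# Assumed adapter repo ID (taken from the model name in this card).
adapter_id = "GaetanMichelet/Llama-31-8B_task-1_60-samples_config-4_full"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned LoRA/PEFT weights to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```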

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 150
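
As a rough illustration, the hyperparameters above correspond to a transformers TrainingArguments configuration along these lines. The output directory, optimizer name, and precision flag are assumptions; the actual training script (TRL SFT via the alignment-handbook) may differ in detail.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-31-8B_task-1_60-samples_config-4_full",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,   # effective train batch size: 1 * 16 = 16
    num_train_epochs=150,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",              # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    bf16=True,                        # assumption; precision is not stated in this card
)
```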

Training results

Training Loss Epoch Step Validation Loss
2.5391 0.6957 2 2.4168
2.5182 1.7391 5 2.4065
2.4879 2.7826 8 2.3913
2.4947 3.8261 11 2.3720
2.4335 4.8696 14 2.3479
2.4424 5.9130 17 2.3109
2.3698 6.9565 20 2.2672
2.3512 8.0 23 2.2129
2.32 8.6957 25 2.1830
2.2555 9.7391 28 2.1266
2.1681 10.7826 31 2.0537
2.0737 11.8261 34 1.9880
2.0403 12.8696 37 1.9277
1.9476 13.9130 40 1.8711
1.9204 14.9565 43 1.8155
1.8461 16.0 46 1.7615
1.8095 16.6957 48 1.7236
1.7597 17.7391 51 1.6580
1.6484 18.7826 54 1.5919
1.6443 19.8261 57 1.5262
1.5204 20.8696 60 1.4561
1.463 21.9130 63 1.3960
1.3833 22.9565 66 1.3404
1.3385 24.0 69 1.2875
1.3094 24.6957 71 1.2504
1.2303 25.7391 74 1.2007
1.1677 26.7826 77 1.1600
1.1674 27.8261 80 1.1332
1.1068 28.8696 83 1.1100
1.104 29.9130 86 1.0884
1.0617 30.9565 89 1.0717
1.0354 32.0 92 1.0577
1.0195 32.6957 94 1.0499
1.0659 33.7391 97 1.0396
1.0118 34.7826 100 1.0310
1.0009 35.8261 103 1.0247
0.9938 36.8696 106 1.0181
0.9736 37.9130 109 1.0124
0.9888 38.9565 112 1.0076
0.9637 40.0 115 1.0019
0.9769 40.6957 117 0.9987
0.936 41.7391 120 0.9939
0.9863 42.7826 123 0.9906
0.9626 43.8261 126 0.9863
0.9438 44.8696 129 0.9825
0.9034 45.9130 132 0.9804
0.9398 46.9565 135 0.9763
0.9206 48.0 138 0.9740
0.9251 48.6957 140 0.9728
0.9245 49.7391 143 0.9704
0.9332 50.7826 146 0.9671
0.9012 51.8261 149 0.9651
0.9075 52.8696 152 0.9627
0.9031 53.9130 155 0.9614
0.8969 54.9565 158 0.9592
0.9102 56.0 161 0.9583
0.8955 56.6957 163 0.9563
0.8775 57.7391 166 0.9547
0.8879 58.7826 169 0.9540
0.8805 59.8261 172 0.9510
0.8982 60.8696 175 0.9505
0.8897 61.9130 178 0.9494
0.8515 62.9565 181 0.9479
0.8637 64.0 184 0.9469
0.8719 64.6957 186 0.9471
0.8635 65.7391 189 0.9452
0.8579 66.7826 192 0.9445
0.8465 67.8261 195 0.9434
0.8588 68.8696 198 0.9436
0.868 69.9130 201 0.9421
0.8523 70.9565 204 0.9418
0.8654 72.0 207 0.9404
0.8525 72.6957 209 0.9405
0.8565 73.7391 212 0.9400
0.8424 74.7826 215 0.9407
0.8342 75.8261 218 0.9395
0.8539 76.8696 221 0.9393
0.8413 77.9130 224 0.9383
0.8488 78.9565 227 0.9382
0.8319 80.0 230 0.9395
0.8402 80.6957 232 0.9382
0.8604 81.7391 235 0.9376
0.8516 82.7826 238 0.9374
0.8195 83.8261 241 0.9378
0.8456 84.8696 244 0.9381
0.8313 85.9130 247 0.9374
0.8415 86.9565 250 0.9369
0.8318 88.0 253 0.9365
0.8271 88.6957 255 0.9370
0.8361 89.7391 258 0.9364
0.8216 90.7826 261 0.9365
0.8387 91.8261 264 0.9366
0.8457 92.8696 267 0.9366
0.8491 93.9130 270 0.9367
0.8171 94.9565 273 0.9357
0.8168 96.0 276 0.9367
0.8161 96.6957 278 0.9364
0.8442 97.7391 281 0.9356
0.8388 98.7826 284 0.9363
0.8365 99.8261 287 0.9355
0.8493 100.8696 290 0.9360
0.8267 101.9130 293 0.9355
0.8304 102.9565 296 0.9361
0.8216 104.0 299 0.9361
0.8436 104.3478 300 0.9358

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1