collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9361
  • Num Input Tokens Seen: 21319328

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
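As a sanity check, the derived quantities in the list follow from the base values. A minimal sketch (not from the card itself); the ~415 total optimizer steps are read off the results table below rather than listed explicitly, and the exact warmup rounding depends on the Trainer implementation:

```python
# Sketch: derived quantities implied by the listed hyperparameters.
train_batch_size = 4                 # per-device train batch size
gradient_accumulation_steps = 32
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 128

lr_scheduler_warmup_ratio = 0.05
total_steps = 415                    # final logged step in the results table
warmup_steps = round(total_steps * lr_scheduler_warmup_ratio)  # ~21 steps

print(total_train_batch_size, warmup_steps)
```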

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0   | 1.1282          | 0        |
| 2.9736        | 0.0120 | 5   | 1.0890          | 257960   |
| 2.9201        | 0.0240 | 10  | 1.0094          | 513580   |
| 2.7063        | 0.0360 | 15  | 0.9961          | 772336   |
| 2.7066        | 0.0479 | 20  | 0.9889          | 1027836  |
| 2.5663        | 0.0599 | 25  | 0.9868          | 1285388  |
| 2.586         | 0.0719 | 30  | 0.9896          | 1541024  |
| 2.5497        | 0.0839 | 35  | 0.9909          | 1796588  |
| 2.325         | 0.0959 | 40  | 0.9916          | 2051248  |
| 2.1303        | 0.1079 | 45  | 0.9928          | 2316512  |
| 2.1498        | 0.1198 | 50  | 0.9901          | 2575448  |
| 2.1035        | 0.1318 | 55  | 0.9887          | 2827576  |
| 2.0106        | 0.1438 | 60  | 0.9895          | 3085924  |
| 1.9861        | 0.1558 | 65  | 0.9849          | 3344592  |
| 1.8483        | 0.1678 | 70  | 0.9882          | 3587496  |
| 1.698         | 0.1798 | 75  | 0.9837          | 3845228  |
| 1.5455        | 0.1917 | 80  | 0.9820          | 4094024  |
| 1.7371        | 0.2037 | 85  | 0.9779          | 4352288  |
| 1.6068        | 0.2157 | 90  | 0.9755          | 4606816  |
| 1.6234        | 0.2277 | 95  | 0.9705          | 4865000  |
| 1.6119        | 0.2397 | 100 | 0.9710          | 5122860  |
| 1.4461        | 0.2517 | 105 | 0.9661          | 5380192  |
| 1.5323        | 0.2637 | 110 | 0.9648          | 5638952  |
| 1.48          | 0.2756 | 115 | 0.9644          | 5895124  |
| 1.5077        | 0.2876 | 120 | 0.9632          | 6150672  |
| 1.3105        | 0.2996 | 125 | 0.9605          | 6404592  |
| 1.5438        | 0.3116 | 130 | 0.9604          | 6667232  |
| 1.6025        | 0.3236 | 135 | 0.9587          | 6919444  |
| 1.5647        | 0.3356 | 140 | 0.9575          | 7171560  |
| 1.3177        | 0.3475 | 145 | 0.9598          | 7427412  |
| 1.4743        | 0.3595 | 150 | 0.9563          | 7690832  |
| 1.6544        | 0.3715 | 155 | 0.9547          | 7949984  |
| 1.397         | 0.3835 | 160 | 0.9584          | 8205800  |
| 1.3666        | 0.3955 | 165 | 0.9543          | 8464028  |
| 1.5154        | 0.4075 | 170 | 0.9527          | 8713484  |
| 1.5427        | 0.4194 | 175 | 0.9557          | 8971692  |
| 1.2568        | 0.4314 | 180 | 0.9521          | 9225284  |
| 1.3871        | 0.4434 | 185 | 0.9520          | 9479360  |
| 1.5084        | 0.4554 | 190 | 0.9521          | 9730040  |
| 1.4411        | 0.4674 | 195 | 0.9499          | 9989888  |
| 1.3642        | 0.4794 | 200 | 0.9487          | 10253880 |
| 1.2564        | 0.4913 | 205 | 0.9472          | 10506892 |
| 1.4515        | 0.5033 | 210 | 0.9496          | 10762052 |
| 1.2647        | 0.5153 | 215 | 0.9494          | 11010792 |
| 1.3365        | 0.5273 | 220 | 0.9491          | 11258360 |
| 1.4796        | 0.5393 | 225 | 0.9486          | 11509984 |
| 1.4464        | 0.5513 | 230 | 0.9468          | 11768156 |
| 1.1882        | 0.5633 | 235 | 0.9482          | 12022340 |
| 1.4812        | 0.5752 | 240 | 0.9485          | 12270644 |
| 1.3927        | 0.5872 | 245 | 0.9466          | 12529864 |
| 1.5076        | 0.5992 | 250 | 0.9475          | 12788428 |
| 1.3727        | 0.6112 | 255 | 0.9459          | 13039508 |
| 1.2361        | 0.6232 | 260 | 0.9476          | 13292956 |
| 1.3745        | 0.6352 | 265 | 0.9443          | 13548132 |
| 1.3198        | 0.6471 | 270 | 0.9442          | 13805636 |
| 1.2179        | 0.6591 | 275 | 0.9436          | 14058880 |
| 1.4035        | 0.6711 | 280 | 0.9463          | 14318400 |
| 1.2952        | 0.6831 | 285 | 0.9440          | 14568908 |
| 1.291         | 0.6951 | 290 | 0.9439          | 14823440 |
| 1.4132        | 0.7071 | 295 | 0.9436          | 15082248 |
| 1.5722        | 0.7190 | 300 | 0.9429          | 15338164 |
| 1.2473        | 0.7310 | 305 | 0.9416          | 15601888 |
| 1.2805        | 0.7430 | 310 | 0.9420          | 15855996 |
| 1.1853        | 0.7550 | 315 | 0.9401          | 16103316 |
| 1.4429        | 0.7670 | 320 | 0.9411          | 16354352 |
| 1.0744        | 0.7790 | 325 | 0.9417          | 16609264 |
| 1.2779        | 0.7910 | 330 | 0.9432          | 16869072 |
| 1.4178        | 0.8029 | 335 | 0.9407          | 17125932 |
| 1.3986        | 0.8149 | 340 | 0.9414          | 17379164 |
| 1.1471        | 0.8269 | 345 | 0.9404          | 17628696 |
| 1.1763        | 0.8389 | 350 | 0.9426          | 17884156 |
| 1.2251        | 0.8509 | 355 | 0.9389          | 18134160 |
| 1.2366        | 0.8629 | 360 | 0.9409          | 18391736 |
| 1.3086        | 0.8748 | 365 | 0.9392          | 18644984 |
| 1.2506        | 0.8868 | 370 | 0.9405          | 18902772 |
| 1.355         | 0.8988 | 375 | 0.9384          | 19165216 |
| 1.3424        | 0.9108 | 380 | 0.9400          | 19415060 |
| 1.3585        | 0.9228 | 385 | 0.9390          | 19668820 |
| 1.3487        | 0.9348 | 390 | 0.9425          | 19922732 |
| 1.4113        | 0.9467 | 395 | 0.9402          | 20187160 |
| 1.5089        | 0.9587 | 400 | 0.9377          | 20438732 |
| 1.3723        | 0.9707 | 405 | 0.9376          | 20699200 |
| 1.2797        | 0.9827 | 410 | 0.9422          | 20957600 |
| 1.3996        | 0.9947 | 415 | 0.9367          | 21217992 |
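As a consistency check on the logged token counts, the per-step throughput implied by the table can be related back to the effective batch size. A sketch under stated assumptions; the average tokens-per-sequence figure is an inference from the table, not a logged value:

```python
# Sketch: relate the table's final token count to the effective batch size.
final_tokens = 21217992      # input tokens seen at the final logged step
final_step = 415             # final logged optimizer step
tokens_per_step = final_tokens / final_step           # ~51128 tokens/step

total_train_batch_size = 128                          # from the hyperparameters
avg_tokens_per_sequence = tokens_per_step / total_train_batch_size  # ~399

print(round(tokens_per_step), round(avg_tokens_per_sequence))
```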

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model size: 27.2B params (Safetensors, BF16)
Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1

  • Base model: google/gemma-2-27b