# collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.9361
- Num Input Tokens Seen: 21319328
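
As a quick check of the checkpoint, here is a minimal inference sketch. It assumes the model is loaded from this repository (`RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1`); the dtype, device placement, and prompt are illustrative choices, not settings confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative: a 27B model generally needs bf16/fp16 or quantization
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```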
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
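
For reference, here is a sketch of how the settings above map onto `transformers.TrainingArguments`. The argument names are standard in Transformers 4.44; the output directory is a hypothetical placeholder, and anything not listed above is left at its default:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above. With
# per_device_train_batch_size=4 and gradient_accumulation_steps=32 on a
# single device, the effective (total) train batch size is 4 * 32 = 128.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```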
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.1282 | 0 |
2.9736 | 0.0120 | 5 | 1.0890 | 257960 |
2.9201 | 0.0240 | 10 | 1.0094 | 513580 |
2.7063 | 0.0360 | 15 | 0.9961 | 772336 |
2.7066 | 0.0479 | 20 | 0.9889 | 1027836 |
2.5663 | 0.0599 | 25 | 0.9868 | 1285388 |
2.586 | 0.0719 | 30 | 0.9896 | 1541024 |
2.5497 | 0.0839 | 35 | 0.9909 | 1796588 |
2.325 | 0.0959 | 40 | 0.9916 | 2051248 |
2.1303 | 0.1079 | 45 | 0.9928 | 2316512 |
2.1498 | 0.1198 | 50 | 0.9901 | 2575448 |
2.1035 | 0.1318 | 55 | 0.9887 | 2827576 |
2.0106 | 0.1438 | 60 | 0.9895 | 3085924 |
1.9861 | 0.1558 | 65 | 0.9849 | 3344592 |
1.8483 | 0.1678 | 70 | 0.9882 | 3587496 |
1.698 | 0.1798 | 75 | 0.9837 | 3845228 |
1.5455 | 0.1917 | 80 | 0.9820 | 4094024 |
1.7371 | 0.2037 | 85 | 0.9779 | 4352288 |
1.6068 | 0.2157 | 90 | 0.9755 | 4606816 |
1.6234 | 0.2277 | 95 | 0.9705 | 4865000 |
1.6119 | 0.2397 | 100 | 0.9710 | 5122860 |
1.4461 | 0.2517 | 105 | 0.9661 | 5380192 |
1.5323 | 0.2637 | 110 | 0.9648 | 5638952 |
1.48 | 0.2756 | 115 | 0.9644 | 5895124 |
1.5077 | 0.2876 | 120 | 0.9632 | 6150672 |
1.3105 | 0.2996 | 125 | 0.9605 | 6404592 |
1.5438 | 0.3116 | 130 | 0.9604 | 6667232 |
1.6025 | 0.3236 | 135 | 0.9587 | 6919444 |
1.5647 | 0.3356 | 140 | 0.9575 | 7171560 |
1.3177 | 0.3475 | 145 | 0.9598 | 7427412 |
1.4743 | 0.3595 | 150 | 0.9563 | 7690832 |
1.6544 | 0.3715 | 155 | 0.9547 | 7949984 |
1.397 | 0.3835 | 160 | 0.9584 | 8205800 |
1.3666 | 0.3955 | 165 | 0.9543 | 8464028 |
1.5154 | 0.4075 | 170 | 0.9527 | 8713484 |
1.5427 | 0.4194 | 175 | 0.9557 | 8971692 |
1.2568 | 0.4314 | 180 | 0.9521 | 9225284 |
1.3871 | 0.4434 | 185 | 0.9520 | 9479360 |
1.5084 | 0.4554 | 190 | 0.9521 | 9730040 |
1.4411 | 0.4674 | 195 | 0.9499 | 9989888 |
1.3642 | 0.4794 | 200 | 0.9487 | 10253880 |
1.2564 | 0.4913 | 205 | 0.9472 | 10506892 |
1.4515 | 0.5033 | 210 | 0.9496 | 10762052 |
1.2647 | 0.5153 | 215 | 0.9494 | 11010792 |
1.3365 | 0.5273 | 220 | 0.9491 | 11258360 |
1.4796 | 0.5393 | 225 | 0.9486 | 11509984 |
1.4464 | 0.5513 | 230 | 0.9468 | 11768156 |
1.1882 | 0.5633 | 235 | 0.9482 | 12022340 |
1.4812 | 0.5752 | 240 | 0.9485 | 12270644 |
1.3927 | 0.5872 | 245 | 0.9466 | 12529864 |
1.5076 | 0.5992 | 250 | 0.9475 | 12788428 |
1.3727 | 0.6112 | 255 | 0.9459 | 13039508 |
1.2361 | 0.6232 | 260 | 0.9476 | 13292956 |
1.3745 | 0.6352 | 265 | 0.9443 | 13548132 |
1.3198 | 0.6471 | 270 | 0.9442 | 13805636 |
1.2179 | 0.6591 | 275 | 0.9436 | 14058880 |
1.4035 | 0.6711 | 280 | 0.9463 | 14318400 |
1.2952 | 0.6831 | 285 | 0.9440 | 14568908 |
1.291 | 0.6951 | 290 | 0.9439 | 14823440 |
1.4132 | 0.7071 | 295 | 0.9436 | 15082248 |
1.5722 | 0.7190 | 300 | 0.9429 | 15338164 |
1.2473 | 0.7310 | 305 | 0.9416 | 15601888 |
1.2805 | 0.7430 | 310 | 0.9420 | 15855996 |
1.1853 | 0.7550 | 315 | 0.9401 | 16103316 |
1.4429 | 0.7670 | 320 | 0.9411 | 16354352 |
1.0744 | 0.7790 | 325 | 0.9417 | 16609264 |
1.2779 | 0.7910 | 330 | 0.9432 | 16869072 |
1.4178 | 0.8029 | 335 | 0.9407 | 17125932 |
1.3986 | 0.8149 | 340 | 0.9414 | 17379164 |
1.1471 | 0.8269 | 345 | 0.9404 | 17628696 |
1.1763 | 0.8389 | 350 | 0.9426 | 17884156 |
1.2251 | 0.8509 | 355 | 0.9389 | 18134160 |
1.2366 | 0.8629 | 360 | 0.9409 | 18391736 |
1.3086 | 0.8748 | 365 | 0.9392 | 18644984 |
1.2506 | 0.8868 | 370 | 0.9405 | 18902772 |
1.355 | 0.8988 | 375 | 0.9384 | 19165216 |
1.3424 | 0.9108 | 380 | 0.9400 | 19415060 |
1.3585 | 0.9228 | 385 | 0.9390 | 19668820 |
1.3487 | 0.9348 | 390 | 0.9425 | 19922732 |
1.4113 | 0.9467 | 395 | 0.9402 | 20187160 |
1.5089 | 0.9587 | 400 | 0.9377 | 20438732 |
1.3723 | 0.9707 | 405 | 0.9376 | 20699200 |
1.2797 | 0.9827 | 410 | 0.9422 | 20957600 |
1.3996 | 0.9947 | 415 | 0.9367 | 21217992 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
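
Pinned as a `requirements.txt`, these versions would look like the following (the `+cu121` tag on the PyTorch version denotes a CUDA 12.1 build, which comes from the PyTorch wheel index rather than a plain version pin):

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```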