# collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9477
- Num Input Tokens Seen: 24547368
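The card does not include a usage example, so here is a minimal, hedged loading sketch using the standard `transformers` API. The repo id matches this card's title; `torch_dtype=torch.bfloat16` and `device_map="auto"` are illustrative assumptions, not part of the original card (loading a 27B model this way requires `accelerate` and substantial GPU memory).

```python
# Minimal loading sketch; dtype and device placement are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```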
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch reproducing them appears after the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
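Below is a sketch mapping the listed hyperparameters onto `transformers.TrainingArguments`. The `output_dir` and the optimizer string are assumptions; the Adam betas and epsilon listed above are the library defaults, and the effective batch size of 128 follows from 4 (per device) × 32 (accumulation steps).

```python
# Hedged reconstruction of the training configuration; not the author's script.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 total train batch size
    optim="adamw_torch",              # assumed; betas/epsilon above are defaults
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```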
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.1282 | 0 |
3.1929 | 0.0104 | 5 | 1.1054 | 258672 |
3.0522 | 0.0207 | 10 | 1.0218 | 516688 |
2.8837 | 0.0311 | 15 | 0.9980 | 769464 |
2.8939 | 0.0415 | 20 | 0.9908 | 1023524 |
2.5952 | 0.0518 | 25 | 0.9880 | 1278684 |
2.8256 | 0.0622 | 30 | 0.9921 | 1525544 |
2.4783 | 0.0726 | 35 | 0.9969 | 1772664 |
2.4713 | 0.0829 | 40 | 1.0015 | 2028124 |
2.4233 | 0.0933 | 45 | 1.0009 | 2290688 |
2.1576 | 0.1036 | 50 | 1.0000 | 2541904 |
2.1476 | 0.1140 | 55 | 0.9984 | 2802668 |
2.0639 | 0.1244 | 60 | 0.9936 | 3061672 |
2.0863 | 0.1347 | 65 | 0.9958 | 3321424 |
1.7792 | 0.1451 | 70 | 0.9949 | 3567924 |
1.7329 | 0.1555 | 75 | 0.9895 | 3817752 |
1.9016 | 0.1658 | 80 | 0.9863 | 4069396 |
1.7455 | 0.1762 | 85 | 0.9823 | 4326708 |
1.4548 | 0.1866 | 90 | 0.9835 | 4578592 |
1.9925 | 0.1969 | 95 | 0.9793 | 4831136 |
1.5158 | 0.2073 | 100 | 0.9795 | 5084284 |
1.3108 | 0.2177 | 105 | 0.9788 | 5341196 |
1.6902 | 0.2280 | 110 | 0.9749 | 5606060 |
1.6011 | 0.2384 | 115 | 0.9767 | 5866064 |
1.547 | 0.2488 | 120 | 0.9708 | 6123360 |
1.3591 | 0.2591 | 125 | 0.9733 | 6381784 |
1.5019 | 0.2695 | 130 | 0.9718 | 6639256 |
1.5715 | 0.2798 | 135 | 0.9696 | 6895060 |
1.4177 | 0.2902 | 140 | 0.9694 | 7145840 |
1.467 | 0.3006 | 145 | 0.9663 | 7397572 |
1.5423 | 0.3109 | 150 | 0.9655 | 7653164 |
1.2753 | 0.3213 | 155 | 0.9656 | 7911008 |
1.5971 | 0.3317 | 160 | 0.9644 | 8163972 |
1.5416 | 0.3420 | 165 | 0.9648 | 8422184 |
1.6416 | 0.3524 | 170 | 0.9632 | 8681656 |
1.4712 | 0.3628 | 175 | 0.9615 | 8942564 |
1.6394 | 0.3731 | 180 | 0.9614 | 9199204 |
1.2702 | 0.3835 | 185 | 0.9590 | 9451368 |
1.3811 | 0.3939 | 190 | 0.9618 | 9705944 |
1.3814 | 0.4042 | 195 | 0.9619 | 9966356 |
1.5587 | 0.4146 | 200 | 0.9598 | 10223764 |
1.412 | 0.4250 | 205 | 0.9581 | 10476676 |
1.4082 | 0.4353 | 210 | 0.9587 | 10731868 |
1.517 | 0.4457 | 215 | 0.9578 | 10983192 |
1.4438 | 0.4560 | 220 | 0.9596 | 11241884 |
1.4262 | 0.4664 | 225 | 0.9566 | 11495712 |
1.2175 | 0.4768 | 230 | 0.9568 | 11746224 |
1.288 | 0.4871 | 235 | 0.9555 | 11999420 |
1.4236 | 0.4975 | 240 | 0.9561 | 12258032 |
1.2708 | 0.5079 | 245 | 0.9545 | 12509532 |
1.472 | 0.5182 | 250 | 0.9526 | 12761720 |
1.4331 | 0.5286 | 255 | 0.9551 | 13018164 |
1.2036 | 0.5390 | 260 | 0.9546 | 13280196 |
1.2403 | 0.5493 | 265 | 0.9544 | 13530496 |
1.244 | 0.5597 | 270 | 0.9533 | 13777476 |
1.5044 | 0.5701 | 275 | 0.9516 | 14036360 |
1.1313 | 0.5804 | 280 | 0.9534 | 14292528 |
1.4051 | 0.5908 | 285 | 0.9541 | 14540976 |
1.4291 | 0.6012 | 290 | 0.9500 | 14791196 |
1.3338 | 0.6115 | 295 | 0.9516 | 15045092 |
1.2997 | 0.6219 | 300 | 0.9536 | 15296648 |
1.4427 | 0.6322 | 305 | 0.9524 | 15542264 |
1.3716 | 0.6426 | 310 | 0.9491 | 15797752 |
1.4086 | 0.6530 | 315 | 0.9516 | 16051964 |
1.3173 | 0.6633 | 320 | 0.9495 | 16309408 |
1.4131 | 0.6737 | 325 | 0.9497 | 16562960 |
1.3696 | 0.6841 | 330 | 0.9478 | 16816124 |
1.3399 | 0.6944 | 335 | 0.9485 | 17068232 |
1.2074 | 0.7048 | 340 | 0.9484 | 17322548 |
1.1584 | 0.7152 | 345 | 0.9507 | 17575180 |
1.3924 | 0.7255 | 350 | 0.9476 | 17835436 |
1.3037 | 0.7359 | 355 | 0.9495 | 18084436 |
1.4192 | 0.7463 | 360 | 0.9470 | 18333840 |
1.4104 | 0.7566 | 365 | 0.9486 | 18585048 |
1.1994 | 0.7670 | 370 | 0.9462 | 18841288 |
1.4342 | 0.7774 | 375 | 0.9468 | 19098152 |
1.3477 | 0.7877 | 380 | 0.9457 | 19351632 |
1.2131 | 0.7981 | 385 | 0.9480 | 19608392 |
1.2247 | 0.8084 | 390 | 0.9453 | 19865888 |
1.2134 | 0.8188 | 395 | 0.9437 | 20121508 |
1.3281 | 0.8292 | 400 | 0.9453 | 20377088 |
1.1996 | 0.8395 | 405 | 0.9454 | 20635728 |
1.3079 | 0.8499 | 410 | 0.9455 | 20889244 |
1.1988 | 0.8603 | 415 | 0.9470 | 21145028 |
1.4831 | 0.8706 | 420 | 0.9469 | 21400236 |
1.2966 | 0.8810 | 425 | 0.9438 | 21661824 |
1.3788 | 0.8914 | 430 | 0.9466 | 21919900 |
1.2225 | 0.9017 | 435 | 0.9447 | 22173924 |
1.4329 | 0.9121 | 440 | 0.9468 | 22423548 |
1.3712 | 0.9225 | 445 | 0.9440 | 22680788 |
1.1879 | 0.9328 | 450 | 0.9441 | 22933912 |
1.1186 | 0.9432 | 455 | 0.9457 | 23183940 |
1.3402 | 0.9536 | 460 | 0.9500 | 23433908 |
1.2629 | 0.9639 | 465 | 0.9462 | 23687452 |
1.1852 | 0.9743 | 470 | 0.9449 | 23941716 |
1.2072 | 0.9846 | 475 | 0.9451 | 24191528 |
1.2218 | 0.9950 | 480 | 0.9482 | 24444424 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
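For reproducibility, the snippet below compares an installed environment against the versions listed above. It is purely illustrative and uses only the standard library.

```python
# Illustrative check of installed versions against those listed in this card.
import importlib.metadata as md

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",   # PyPI package name for Pytorch
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for pkg, want in expected.items():
    have = md.version(pkg)
    status = "OK" if have == want else f"differs (card lists {want})"
    print(f"{pkg} {have}: {status}")
```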