---
license: gemma
base_model: google/gemma-2-27b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2
  results: []
---

# collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co./google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9477
- Num Input Tokens Seen: 24547368

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
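The training script itself is not published with this card, so the following is only an illustrative sketch of how the hyperparameters above would map onto `transformers.TrainingArguments`. The `output_dir` and the eval/logging cadence (every 5 steps, inferred from the results table below) are assumptions, not documented settings:

```python
from transformers import TrainingArguments

# Rough reconstruction of the listed hyperparameters; treat as a sketch,
# not the authors' actual configuration.
args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2",  # assumed
    num_train_epochs=1,
    per_device_train_batch_size=4,    # train_batch_size: 4
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    gradient_accumulation_steps=32,   # 4 x 32 = 128 total train batch size
    learning_rate=8e-6,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    seed=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",            # evaluation cadence inferred from table
    eval_steps=5,
    logging_steps=5,
)
```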
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 3.1929        | 0.0104 | 5    | 1.1054          | 258672            |
| 3.0522        | 0.0207 | 10   | 1.0218          | 516688            |
| 2.8837        | 0.0311 | 15   | 0.9980          | 769464            |
| 2.8939        | 0.0415 | 20   | 0.9908          | 1023524           |
| 2.5952        | 0.0518 | 25   | 0.9880          | 1278684           |
| 2.8256        | 0.0622 | 30   | 0.9921          | 1525544           |
| 2.4783        | 0.0726 | 35   | 0.9969          | 1772664           |
| 2.4713        | 0.0829 | 40   | 1.0015          | 2028124           |
| 2.4233        | 0.0933 | 45   | 1.0009          | 2290688           |
| 2.1576        | 0.1036 | 50   | 1.0000          | 2541904           |
| 2.1476        | 0.1140 | 55   | 0.9984          | 2802668           |
| 2.0639        | 0.1244 | 60   | 0.9936          | 3061672           |
| 2.0863        | 0.1347 | 65   | 0.9958          | 3321424           |
| 1.7792        | 0.1451 | 70   | 0.9949          | 3567924           |
| 1.7329        | 0.1555 | 75   | 0.9895          | 3817752           |
| 1.9016        | 0.1658 | 80   | 0.9863          | 4069396           |
| 1.7455        | 0.1762 | 85   | 0.9823          | 4326708           |
| 1.4548        | 0.1866 | 90   | 0.9835          | 4578592           |
| 1.9925        | 0.1969 | 95   | 0.9793          | 4831136           |
| 1.5158        | 0.2073 | 100  | 0.9795          | 5084284           |
| 1.3108        | 0.2177 | 105  | 0.9788          | 5341196           |
| 1.6902        | 0.2280 | 110  | 0.9749          | 5606060           |
| 1.6011        | 0.2384 | 115  | 0.9767          | 5866064           |
| 1.547         | 0.2488 | 120  | 0.9708          | 6123360           |
| 1.3591        | 0.2591 | 125  | 0.9733          | 6381784           |
| 1.5019        | 0.2695 | 130  | 0.9718          | 6639256           |
| 1.5715        | 0.2798 | 135  | 0.9696          | 6895060           |
| 1.4177        | 0.2902 | 140  | 0.9694          | 7145840           |
| 1.467         | 0.3006 | 145  | 0.9663          | 7397572           |
| 1.5423        | 0.3109 | 150  | 0.9655          | 7653164           |
| 1.2753        | 0.3213 | 155  | 0.9656          | 7911008           |
| 1.5971        | 0.3317 | 160  | 0.9644          | 8163972           |
| 1.5416        | 0.3420 | 165  | 0.9648          | 8422184           |
| 1.6416        | 0.3524 | 170  | 0.9632          | 8681656           |
| 1.4712        | 0.3628 | 175  | 0.9615          | 8942564           |
| 1.6394        | 0.3731 | 180  | 0.9614          | 9199204           |
| 1.2702        | 0.3835 | 185  | 0.9590          | 9451368           |
| 1.3811        | 0.3939 | 190  | 0.9618          | 9705944           |
| 1.3814        | 0.4042 | 195  | 0.9619          | 9966356           |
| 1.5587        | 0.4146 | 200  | 0.9598          | 10223764          |
| 1.412         | 0.4250 | 205  | 0.9581          | 10476676          |
| 1.4082        | 0.4353 | 210  | 0.9587          | 10731868          |
| 1.517         | 0.4457 | 215  | 0.9578          | 10983192          |
| 1.4438        | 0.4560 | 220  | 0.9596          | 11241884          |
| 1.4262        | 0.4664 | 225  | 0.9566          | 11495712          |
| 1.2175        | 0.4768 | 230  | 0.9568          | 11746224          |
| 1.288         | 0.4871 | 235  | 0.9555          | 11999420          |
| 1.4236        | 0.4975 | 240  | 0.9561          | 12258032          |
| 1.2708        | 0.5079 | 245  | 0.9545          | 12509532          |
| 1.472         | 0.5182 | 250  | 0.9526          | 12761720          |
| 1.4331        | 0.5286 | 255  | 0.9551          | 13018164          |
| 1.2036        | 0.5390 | 260  | 0.9546          | 13280196          |
| 1.2403        | 0.5493 | 265  | 0.9544          | 13530496          |
| 1.244         | 0.5597 | 270  | 0.9533          | 13777476          |
| 1.5044        | 0.5701 | 275  | 0.9516          | 14036360          |
| 1.1313        | 0.5804 | 280  | 0.9534          | 14292528          |
| 1.4051        | 0.5908 | 285  | 0.9541          | 14540976          |
| 1.4291        | 0.6012 | 290  | 0.9500          | 14791196          |
| 1.3338        | 0.6115 | 295  | 0.9516          | 15045092          |
| 1.2997        | 0.6219 | 300  | 0.9536          | 15296648          |
| 1.4427        | 0.6322 | 305  | 0.9524          | 15542264          |
| 1.3716        | 0.6426 | 310  | 0.9491          | 15797752          |
| 1.4086        | 0.6530 | 315  | 0.9516          | 16051964          |
| 1.3173        | 0.6633 | 320  | 0.9495          | 16309408          |
| 1.4131        | 0.6737 | 325  | 0.9497          | 16562960          |
| 1.3696        | 0.6841 | 330  | 0.9478          | 16816124          |
| 1.3399        | 0.6944 | 335  | 0.9485          | 17068232          |
| 1.2074        | 0.7048 | 340  | 0.9484          | 17322548          |
| 1.1584        | 0.7152 | 345  | 0.9507          | 17575180          |
| 1.3924        | 0.7255 | 350  | 0.9476          | 17835436          |
| 1.3037        | 0.7359 | 355  | 0.9495          | 18084436          |
| 1.4192        | 0.7463 | 360  | 0.9470          | 18333840          |
| 1.4104        | 0.7566 | 365  | 0.9486          | 18585048          |
| 1.1994        | 0.7670 | 370  | 0.9462          | 18841288          |
| 1.4342        | 0.7774 | 375  | 0.9468          | 19098152          |
| 1.3477        | 0.7877 | 380  | 0.9457          | 19351632          |
| 1.2131        | 0.7981 | 385  | 0.9480          | 19608392          |
| 1.2247        | 0.8084 | 390  | 0.9453          | 19865888          |
| 1.2134        | 0.8188 | 395  | 0.9437          | 20121508          |
| 1.3281        | 0.8292 | 400  | 0.9453          | 20377088          |
| 1.1996        | 0.8395 | 405  | 0.9454          | 20635728          |
| 1.3079        | 0.8499 | 410  | 0.9455          | 20889244          |
| 1.1988        | 0.8603 | 415  | 0.9470          | 21145028          |
| 1.4831        | 0.8706 | 420  | 0.9469          | 21400236          |
| 1.2966        | 0.8810 | 425  | 0.9438          | 21661824          |
| 1.3788        | 0.8914 | 430  | 0.9466          | 21919900          |
| 1.2225        | 0.9017 | 435  | 0.9447          | 22173924          |
| 1.4329        | 0.9121 | 440  | 0.9468          | 22423548          |
| 1.3712        | 0.9225 | 445  | 0.9440          | 22680788          |
| 1.1879        | 0.9328 | 450  | 0.9441          | 22933912          |
| 1.1186        | 0.9432 | 455  | 0.9457          | 23183940          |
| 1.3402        | 0.9536 | 460  | 0.9500          | 23433908          |
| 1.2629        | 0.9639 | 465  | 0.9462          | 23687452          |
| 1.1852        | 0.9743 | 470  | 0.9449          | 23941716          |
| 1.2072        | 0.9846 | 475  | 0.9451          | 24191528          |
| 1.2218        | 0.9950 | 480  | 0.9482          | 24444424          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
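The card does not yet document usage. Assuming the checkpoint is a standard `transformers` causal-LM export (as the `base_model` and framework versions suggest), a minimal loading sketch might look like this; the repo id is assumed from the model name above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the model name above; adjust to the actual
# namespace the checkpoint is published under.
model_id = "collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 27B parameters; bf16 keeps memory manageable
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```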