collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9477
  • Num Input Tokens Seen: 24547368
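
For intuition, assuming the loss is the usual mean token cross-entropy in nats, a validation loss of 0.9477 corresponds to a perplexity of exp(0.9477) ≈ 2.58. Below is a minimal loading-and-generation sketch using the standard transformers auto classes; the prompt and generation settings are illustrative only, and running a 27B-parameter model in BF16 needs roughly 55 GB of accelerator memory.

```python
# Minimal inference sketch (assumes `transformers`, `accelerate`, and enough
# GPU memory for a 27B-parameter model in bf16; settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)  # illustrative setting
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```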

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
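
For concreteness, here is a hedged sketch of TrainingArguments matching the list above, assuming the run used the Hugging Face Trainer (the card does not state this explicitly). The output_dir and bf16 flag are assumptions; the Adam betas and epsilon listed above are the Trainer defaults, so they need no explicit arguments.

```python
# Sketch of TrainingArguments reproducing the hyperparameters above.
# The dataset, model loading, and any other settings of the actual run
# are unknown and omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```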

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:---:|:---:|:---:|:---:|:---:|
| No log | 0 | 0 | 1.1282 | 0 |
| 3.1929 | 0.0104 | 5 | 1.1054 | 258672 |
| 3.0522 | 0.0207 | 10 | 1.0218 | 516688 |
| 2.8837 | 0.0311 | 15 | 0.9980 | 769464 |
| 2.8939 | 0.0415 | 20 | 0.9908 | 1023524 |
| 2.5952 | 0.0518 | 25 | 0.9880 | 1278684 |
| 2.8256 | 0.0622 | 30 | 0.9921 | 1525544 |
| 2.4783 | 0.0726 | 35 | 0.9969 | 1772664 |
| 2.4713 | 0.0829 | 40 | 1.0015 | 2028124 |
| 2.4233 | 0.0933 | 45 | 1.0009 | 2290688 |
| 2.1576 | 0.1036 | 50 | 1.0000 | 2541904 |
| 2.1476 | 0.1140 | 55 | 0.9984 | 2802668 |
| 2.0639 | 0.1244 | 60 | 0.9936 | 3061672 |
| 2.0863 | 0.1347 | 65 | 0.9958 | 3321424 |
| 1.7792 | 0.1451 | 70 | 0.9949 | 3567924 |
| 1.7329 | 0.1555 | 75 | 0.9895 | 3817752 |
| 1.9016 | 0.1658 | 80 | 0.9863 | 4069396 |
| 1.7455 | 0.1762 | 85 | 0.9823 | 4326708 |
| 1.4548 | 0.1866 | 90 | 0.9835 | 4578592 |
| 1.9925 | 0.1969 | 95 | 0.9793 | 4831136 |
| 1.5158 | 0.2073 | 100 | 0.9795 | 5084284 |
| 1.3108 | 0.2177 | 105 | 0.9788 | 5341196 |
| 1.6902 | 0.2280 | 110 | 0.9749 | 5606060 |
| 1.6011 | 0.2384 | 115 | 0.9767 | 5866064 |
| 1.547 | 0.2488 | 120 | 0.9708 | 6123360 |
| 1.3591 | 0.2591 | 125 | 0.9733 | 6381784 |
| 1.5019 | 0.2695 | 130 | 0.9718 | 6639256 |
| 1.5715 | 0.2798 | 135 | 0.9696 | 6895060 |
| 1.4177 | 0.2902 | 140 | 0.9694 | 7145840 |
| 1.467 | 0.3006 | 145 | 0.9663 | 7397572 |
| 1.5423 | 0.3109 | 150 | 0.9655 | 7653164 |
| 1.2753 | 0.3213 | 155 | 0.9656 | 7911008 |
| 1.5971 | 0.3317 | 160 | 0.9644 | 8163972 |
| 1.5416 | 0.3420 | 165 | 0.9648 | 8422184 |
| 1.6416 | 0.3524 | 170 | 0.9632 | 8681656 |
| 1.4712 | 0.3628 | 175 | 0.9615 | 8942564 |
| 1.6394 | 0.3731 | 180 | 0.9614 | 9199204 |
| 1.2702 | 0.3835 | 185 | 0.9590 | 9451368 |
| 1.3811 | 0.3939 | 190 | 0.9618 | 9705944 |
| 1.3814 | 0.4042 | 195 | 0.9619 | 9966356 |
| 1.5587 | 0.4146 | 200 | 0.9598 | 10223764 |
| 1.412 | 0.4250 | 205 | 0.9581 | 10476676 |
| 1.4082 | 0.4353 | 210 | 0.9587 | 10731868 |
| 1.517 | 0.4457 | 215 | 0.9578 | 10983192 |
| 1.4438 | 0.4560 | 220 | 0.9596 | 11241884 |
| 1.4262 | 0.4664 | 225 | 0.9566 | 11495712 |
| 1.2175 | 0.4768 | 230 | 0.9568 | 11746224 |
| 1.288 | 0.4871 | 235 | 0.9555 | 11999420 |
| 1.4236 | 0.4975 | 240 | 0.9561 | 12258032 |
| 1.2708 | 0.5079 | 245 | 0.9545 | 12509532 |
| 1.472 | 0.5182 | 250 | 0.9526 | 12761720 |
| 1.4331 | 0.5286 | 255 | 0.9551 | 13018164 |
| 1.2036 | 0.5390 | 260 | 0.9546 | 13280196 |
| 1.2403 | 0.5493 | 265 | 0.9544 | 13530496 |
| 1.244 | 0.5597 | 270 | 0.9533 | 13777476 |
| 1.5044 | 0.5701 | 275 | 0.9516 | 14036360 |
| 1.1313 | 0.5804 | 280 | 0.9534 | 14292528 |
| 1.4051 | 0.5908 | 285 | 0.9541 | 14540976 |
| 1.4291 | 0.6012 | 290 | 0.9500 | 14791196 |
| 1.3338 | 0.6115 | 295 | 0.9516 | 15045092 |
| 1.2997 | 0.6219 | 300 | 0.9536 | 15296648 |
| 1.4427 | 0.6322 | 305 | 0.9524 | 15542264 |
| 1.3716 | 0.6426 | 310 | 0.9491 | 15797752 |
| 1.4086 | 0.6530 | 315 | 0.9516 | 16051964 |
| 1.3173 | 0.6633 | 320 | 0.9495 | 16309408 |
| 1.4131 | 0.6737 | 325 | 0.9497 | 16562960 |
| 1.3696 | 0.6841 | 330 | 0.9478 | 16816124 |
| 1.3399 | 0.6944 | 335 | 0.9485 | 17068232 |
| 1.2074 | 0.7048 | 340 | 0.9484 | 17322548 |
| 1.1584 | 0.7152 | 345 | 0.9507 | 17575180 |
| 1.3924 | 0.7255 | 350 | 0.9476 | 17835436 |
| 1.3037 | 0.7359 | 355 | 0.9495 | 18084436 |
| 1.4192 | 0.7463 | 360 | 0.9470 | 18333840 |
| 1.4104 | 0.7566 | 365 | 0.9486 | 18585048 |
| 1.1994 | 0.7670 | 370 | 0.9462 | 18841288 |
| 1.4342 | 0.7774 | 375 | 0.9468 | 19098152 |
| 1.3477 | 0.7877 | 380 | 0.9457 | 19351632 |
| 1.2131 | 0.7981 | 385 | 0.9480 | 19608392 |
| 1.2247 | 0.8084 | 390 | 0.9453 | 19865888 |
| 1.2134 | 0.8188 | 395 | 0.9437 | 20121508 |
| 1.3281 | 0.8292 | 400 | 0.9453 | 20377088 |
| 1.1996 | 0.8395 | 405 | 0.9454 | 20635728 |
| 1.3079 | 0.8499 | 410 | 0.9455 | 20889244 |
| 1.1988 | 0.8603 | 415 | 0.9470 | 21145028 |
| 1.4831 | 0.8706 | 420 | 0.9469 | 21400236 |
| 1.2966 | 0.8810 | 425 | 0.9438 | 21661824 |
| 1.3788 | 0.8914 | 430 | 0.9466 | 21919900 |
| 1.2225 | 0.9017 | 435 | 0.9447 | 22173924 |
| 1.4329 | 0.9121 | 440 | 0.9468 | 22423548 |
| 1.3712 | 0.9225 | 445 | 0.9440 | 22680788 |
| 1.1879 | 0.9328 | 450 | 0.9441 | 22933912 |
| 1.1186 | 0.9432 | 455 | 0.9457 | 23183940 |
| 1.3402 | 0.9536 | 460 | 0.9500 | 23433908 |
| 1.2629 | 0.9639 | 465 | 0.9462 | 23687452 |
| 1.1852 | 0.9743 | 470 | 0.9449 | 23941716 |
| 1.2072 | 0.9846 | 475 | 0.9451 | 24191528 |
| 1.2218 | 0.9950 | 480 | 0.9482 | 24444424 |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
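
For reproducibility, here is a small sketch that checks an installed environment against the versions listed above; it is pure bookkeeping, and nothing in it is specific to this model.

```python
# Compare locally installed package versions against the versions this
# card reports (expected strings are taken from the list above).
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"differs (card: {want})"
    print(f"{name}: {have} {status}")
```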

Model size

  • 27.2B parameters (Safetensors, BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd2

  • Base model: google/gemma-2-27b