collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9431
  • Num Input Tokens Seen: 24113104

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
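
For reference, this is roughly how the settings above would map onto a Hugging Face `TrainingArguments` object. This is an illustrative sketch only: the output directory, the BF16 flag, and the `include_num_input_tokens_seen` option are assumptions not stated in the hyperparameter list, and the actual training script may differ.

```python
# Illustrative sketch only: maps the hyperparameters listed above onto
# transformers.TrainingArguments. output_dir, bf16, and
# include_num_input_tokens_seen are assumptions, not taken from this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1",  # assumed
    learning_rate=8e-06,
    per_device_train_batch_size=4,       # train_batch_size above
    per_device_eval_batch_size=16,       # eval_batch_size above
    seed=1,
    gradient_accumulation_steps=32,      # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                           # assumption, based on the BF16 tensor type noted later
    include_num_input_tokens_seen=True,  # assumption; would produce the token counts logged below
)
```

Note that 4 (per-device batch size) × 32 (gradient accumulation steps) = 128, which matches the reported total train batch size and is consistent with training on a single device.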

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 3.2254 | 0.0108 | 5 | 1.0961 | 268092 |
| 2.9876 | 0.0215 | 10 | 1.0148 | 525364 |
| 3.0862 | 0.0323 | 15 | 1.0007 | 783936 |
| 2.8353 | 0.0431 | 20 | 0.9916 | 1042564 |
| 2.8331 | 0.0539 | 25 | 0.9929 | 1303404 |
| 2.8261 | 0.0646 | 30 | 0.9939 | 1563492 |
| 2.3827 | 0.0754 | 35 | 1.0007 | 1820252 |
| 2.4513 | 0.0862 | 40 | 0.9970 | 2078400 |
| 2.487 | 0.0970 | 45 | 0.9878 | 2336672 |
| 2.4506 | 0.1077 | 50 | 0.9944 | 2596612 |
| 2.4163 | 0.1185 | 55 | 0.9918 | 2850740 |
| 1.9844 | 0.1293 | 60 | 0.9928 | 3111820 |
| 2.1404 | 0.1400 | 65 | 0.9893 | 3372896 |
| 2.1749 | 0.1508 | 70 | 0.9831 | 3630380 |
| 1.9777 | 0.1616 | 75 | 0.9831 | 3893304 |
| 1.8699 | 0.1724 | 80 | 0.9804 | 4157048 |
| 1.8005 | 0.1831 | 85 | 0.9768 | 4414040 |
| 1.9942 | 0.1939 | 90 | 0.9743 | 4679364 |
| 1.6692 | 0.2047 | 95 | 0.9746 | 4942204 |
| 2.025 | 0.2154 | 100 | 0.9723 | 5202156 |
| 1.7424 | 0.2262 | 105 | 0.9720 | 5462920 |
| 1.7119 | 0.2370 | 110 | 0.9691 | 5722880 |
| 1.6313 | 0.2478 | 115 | 0.9701 | 5983564 |
| 1.8441 | 0.2585 | 120 | 0.9693 | 6239988 |
| 1.8205 | 0.2693 | 125 | 0.9658 | 6503448 |
| 1.7461 | 0.2801 | 130 | 0.9661 | 6765768 |
| 1.9212 | 0.2909 | 135 | 0.9648 | 7026576 |
| 1.7037 | 0.3016 | 140 | 0.9630 | 7278368 |
| 1.8345 | 0.3124 | 145 | 0.9629 | 7543060 |
| 1.6023 | 0.3232 | 150 | 0.9593 | 7798308 |
| 1.5075 | 0.3339 | 155 | 0.9614 | 8064352 |
| 1.7006 | 0.3447 | 160 | 0.9565 | 8324852 |
| 1.6318 | 0.3555 | 165 | 0.9573 | 8577584 |
| 1.7273 | 0.3663 | 170 | 0.9564 | 8840932 |
| 1.5702 | 0.3770 | 175 | 0.9559 | 9102784 |
| 1.4506 | 0.3878 | 180 | 0.9566 | 9361972 |
| 1.7453 | 0.3986 | 185 | 0.9535 | 9617376 |
| 1.6234 | 0.4093 | 190 | 0.9561 | 9873880 |
| 1.7928 | 0.4201 | 195 | 0.9554 | 10136916 |
| 1.7004 | 0.4309 | 200 | 0.9518 | 10397756 |
| 1.4518 | 0.4417 | 205 | 0.9541 | 10650948 |
| 1.4489 | 0.4524 | 210 | 0.9528 | 10911356 |
| 1.6097 | 0.4632 | 215 | 0.9547 | 11164796 |
| 1.7534 | 0.4740 | 220 | 0.9519 | 11425528 |
| 1.7858 | 0.4848 | 225 | 0.9500 | 11683928 |
| 1.7461 | 0.4955 | 230 | 0.9497 | 11934068 |
| 1.5409 | 0.5063 | 235 | 0.9506 | 12195672 |
| 1.5174 | 0.5171 | 240 | 0.9503 | 12449748 |
| 1.667 | 0.5278 | 245 | 0.9486 | 12711732 |
| 1.7769 | 0.5386 | 250 | 0.9474 | 12969368 |
| 1.6375 | 0.5494 | 255 | 0.9475 | 13229192 |
| 1.6864 | 0.5602 | 260 | 0.9481 | 13492212 |
| 1.5925 | 0.5709 | 265 | 0.9483 | 13748524 |
| 1.5207 | 0.5817 | 270 | 0.9468 | 14010960 |
| 1.8265 | 0.5925 | 275 | 0.9454 | 14273300 |
| 1.8 | 0.6032 | 280 | 0.9454 | 14533716 |
| 1.6235 | 0.6140 | 285 | 0.9487 | 14794608 |
| 1.4906 | 0.6248 | 290 | 0.9464 | 15062712 |
| 1.415 | 0.6356 | 295 | 0.9462 | 15320116 |
| 1.6232 | 0.6463 | 300 | 0.9459 | 15576780 |
| 1.6258 | 0.6571 | 305 | 0.9470 | 15843180 |
| 1.4792 | 0.6679 | 310 | 0.9449 | 16103364 |
| 1.4777 | 0.6787 | 315 | 0.9449 | 16362160 |
| 1.6598 | 0.6894 | 320 | 0.9445 | 16630092 |
| 1.564 | 0.7002 | 325 | 0.9441 | 16893764 |
| 1.2829 | 0.7110 | 330 | 0.9448 | 17150196 |
| 1.5467 | 0.7217 | 335 | 0.9439 | 17413432 |
| 1.2518 | 0.7325 | 340 | 0.9450 | 17670408 |
| 1.5517 | 0.7433 | 345 | 0.9462 | 17931036 |
| 1.4484 | 0.7541 | 350 | 0.9444 | 18192332 |
| 1.3657 | 0.7648 | 355 | 0.9439 | 18454740 |
| 1.5337 | 0.7756 | 360 | 0.9456 | 18720156 |
| 1.5494 | 0.7864 | 365 | 0.9460 | 18984560 |
| 1.3332 | 0.7971 | 370 | 0.9442 | 19249668 |
| 1.4732 | 0.8079 | 375 | 0.9424 | 19505204 |
| 1.4731 | 0.8187 | 380 | 0.9433 | 19759544 |
| 1.3394 | 0.8295 | 385 | 0.9462 | 20018204 |
| 1.501 | 0.8402 | 390 | 0.9437 | 20276620 |
| 1.38 | 0.8510 | 395 | 0.9437 | 20532420 |
| 1.4448 | 0.8618 | 400 | 0.9439 | 20791532 |
| 1.5206 | 0.8726 | 405 | 0.9446 | 21047984 |
| 1.4492 | 0.8833 | 410 | 0.9440 | 21305632 |
| 1.4171 | 0.8941 | 415 | 0.9431 | 21568260 |
| 1.5316 | 0.9049 | 420 | 0.9447 | 21824664 |
| 1.4805 | 0.9156 | 425 | 0.9425 | 22084792 |
| 1.7193 | 0.9264 | 430 | 0.9407 | 22342320 |
| 1.7271 | 0.9372 | 435 | 0.9431 | 22600956 |
| 1.6102 | 0.9480 | 440 | 0.9432 | 22861708 |
| 1.6152 | 0.9587 | 445 | 0.9429 | 23118340 |
| 1.362 | 0.9695 | 450 | 0.9406 | 23379296 |
| 1.51 | 0.9803 | 455 | 0.9446 | 23638084 |
| 1.3728 | 0.9910 | 460 | 0.9421 | 23906316 |
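
The table above mirrors the Trainer's evaluation log. If the run's trainer_state.json were available (it is not part of this card), the same columns could be pulled out with a short script such as the one below; the file path and the num_input_tokens_seen key are assumptions.

```python
# Illustrative sketch: extract the evaluation rows of the table above from a
# Hugging Face Trainer checkpoint's trainer_state.json (file path assumed).
import json

with open("trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    # Evaluation entries carry "eval_loss"; plain training-log entries carry "loss".
    if "eval_loss" in entry:
        print(entry["step"], round(entry["epoch"], 4), entry["eval_loss"],
              entry.get("num_input_tokens_seen", "n/a"))
```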

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
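
With the framework versions listed above, the checkpoint can be loaded in the standard Transformers way. A minimal usage sketch follows; the dtype, device placement, and prompt are illustrative choices, not taken from this card.

```python
# Minimal loading sketch; torch_dtype and device_map are assumptions, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16 per the card
    device_map="auto",           # requires accelerate; assumption
)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```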

Safetensors

  • Model size: 27.2B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1

  • Base model: google/gemma-2-27b
  • Finetuned from the base model: this model