# collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9431
- Num Input Tokens Seen: 24113104
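
A minimal usage sketch follows, assuming the checkpoint is hosted under the repo id shown in this card's title and loads with the standard `transformers` causal-LM classes; the dtype and device choices here are illustrative, not prescribed by the card:

```python
# Sketch: load the fine-tuned checkpoint with transformers.
# The repo id is taken from this card; bfloat16 and device_map are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # 27B parameters; fp32 weights alone would need ~108 GB
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```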
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
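
For reference, here is a sketch of how these values map onto `transformers.TrainingArguments`, assuming the standard `Trainer` API was used; the dataset, model wrapping, and any other settings are not documented on this card:

```python
# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
# output_dir and everything not listed on the card are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 x 32 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam settings below are the library defaults and match the values listed above.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```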
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:---:|:---:|:---:|:---:|:---:|
No log | 0 | 0 | 1.1282 | 0 |
3.2254 | 0.0108 | 5 | 1.0961 | 268092 |
2.9876 | 0.0215 | 10 | 1.0148 | 525364 |
3.0862 | 0.0323 | 15 | 1.0007 | 783936 |
2.8353 | 0.0431 | 20 | 0.9916 | 1042564 |
2.8331 | 0.0539 | 25 | 0.9929 | 1303404 |
2.8261 | 0.0646 | 30 | 0.9939 | 1563492 |
2.3827 | 0.0754 | 35 | 1.0007 | 1820252 |
2.4513 | 0.0862 | 40 | 0.9970 | 2078400 |
2.487 | 0.0970 | 45 | 0.9878 | 2336672 |
2.4506 | 0.1077 | 50 | 0.9944 | 2596612 |
2.4163 | 0.1185 | 55 | 0.9918 | 2850740 |
1.9844 | 0.1293 | 60 | 0.9928 | 3111820 |
2.1404 | 0.1400 | 65 | 0.9893 | 3372896 |
2.1749 | 0.1508 | 70 | 0.9831 | 3630380 |
1.9777 | 0.1616 | 75 | 0.9831 | 3893304 |
1.8699 | 0.1724 | 80 | 0.9804 | 4157048 |
1.8005 | 0.1831 | 85 | 0.9768 | 4414040 |
1.9942 | 0.1939 | 90 | 0.9743 | 4679364 |
1.6692 | 0.2047 | 95 | 0.9746 | 4942204 |
2.025 | 0.2154 | 100 | 0.9723 | 5202156 |
1.7424 | 0.2262 | 105 | 0.9720 | 5462920 |
1.7119 | 0.2370 | 110 | 0.9691 | 5722880 |
1.6313 | 0.2478 | 115 | 0.9701 | 5983564 |
1.8441 | 0.2585 | 120 | 0.9693 | 6239988 |
1.8205 | 0.2693 | 125 | 0.9658 | 6503448 |
1.7461 | 0.2801 | 130 | 0.9661 | 6765768 |
1.9212 | 0.2909 | 135 | 0.9648 | 7026576 |
1.7037 | 0.3016 | 140 | 0.9630 | 7278368 |
1.8345 | 0.3124 | 145 | 0.9629 | 7543060 |
1.6023 | 0.3232 | 150 | 0.9593 | 7798308 |
1.5075 | 0.3339 | 155 | 0.9614 | 8064352 |
1.7006 | 0.3447 | 160 | 0.9565 | 8324852 |
1.6318 | 0.3555 | 165 | 0.9573 | 8577584 |
1.7273 | 0.3663 | 170 | 0.9564 | 8840932 |
1.5702 | 0.3770 | 175 | 0.9559 | 9102784 |
1.4506 | 0.3878 | 180 | 0.9566 | 9361972 |
1.7453 | 0.3986 | 185 | 0.9535 | 9617376 |
1.6234 | 0.4093 | 190 | 0.9561 | 9873880 |
1.7928 | 0.4201 | 195 | 0.9554 | 10136916 |
1.7004 | 0.4309 | 200 | 0.9518 | 10397756 |
1.4518 | 0.4417 | 205 | 0.9541 | 10650948 |
1.4489 | 0.4524 | 210 | 0.9528 | 10911356 |
1.6097 | 0.4632 | 215 | 0.9547 | 11164796 |
1.7534 | 0.4740 | 220 | 0.9519 | 11425528 |
1.7858 | 0.4848 | 225 | 0.9500 | 11683928 |
1.7461 | 0.4955 | 230 | 0.9497 | 11934068 |
1.5409 | 0.5063 | 235 | 0.9506 | 12195672 |
1.5174 | 0.5171 | 240 | 0.9503 | 12449748 |
1.667 | 0.5278 | 245 | 0.9486 | 12711732 |
1.7769 | 0.5386 | 250 | 0.9474 | 12969368 |
1.6375 | 0.5494 | 255 | 0.9475 | 13229192 |
1.6864 | 0.5602 | 260 | 0.9481 | 13492212 |
1.5925 | 0.5709 | 265 | 0.9483 | 13748524 |
1.5207 | 0.5817 | 270 | 0.9468 | 14010960 |
1.8265 | 0.5925 | 275 | 0.9454 | 14273300 |
1.8 | 0.6032 | 280 | 0.9454 | 14533716 |
1.6235 | 0.6140 | 285 | 0.9487 | 14794608 |
1.4906 | 0.6248 | 290 | 0.9464 | 15062712 |
1.415 | 0.6356 | 295 | 0.9462 | 15320116 |
1.6232 | 0.6463 | 300 | 0.9459 | 15576780 |
1.6258 | 0.6571 | 305 | 0.9470 | 15843180 |
1.4792 | 0.6679 | 310 | 0.9449 | 16103364 |
1.4777 | 0.6787 | 315 | 0.9449 | 16362160 |
1.6598 | 0.6894 | 320 | 0.9445 | 16630092 |
1.564 | 0.7002 | 325 | 0.9441 | 16893764 |
1.2829 | 0.7110 | 330 | 0.9448 | 17150196 |
1.5467 | 0.7217 | 335 | 0.9439 | 17413432 |
1.2518 | 0.7325 | 340 | 0.9450 | 17670408 |
1.5517 | 0.7433 | 345 | 0.9462 | 17931036 |
1.4484 | 0.7541 | 350 | 0.9444 | 18192332 |
1.3657 | 0.7648 | 355 | 0.9439 | 18454740 |
1.5337 | 0.7756 | 360 | 0.9456 | 18720156 |
1.5494 | 0.7864 | 365 | 0.9460 | 18984560 |
1.3332 | 0.7971 | 370 | 0.9442 | 19249668 |
1.4732 | 0.8079 | 375 | 0.9424 | 19505204 |
1.4731 | 0.8187 | 380 | 0.9433 | 19759544 |
1.3394 | 0.8295 | 385 | 0.9462 | 20018204 |
1.501 | 0.8402 | 390 | 0.9437 | 20276620 |
1.38 | 0.8510 | 395 | 0.9437 | 20532420 |
1.4448 | 0.8618 | 400 | 0.9439 | 20791532 |
1.5206 | 0.8726 | 405 | 0.9446 | 21047984 |
1.4492 | 0.8833 | 410 | 0.9440 | 21305632 |
1.4171 | 0.8941 | 415 | 0.9431 | 21568260 |
1.5316 | 0.9049 | 420 | 0.9447 | 21824664 |
1.4805 | 0.9156 | 425 | 0.9425 | 22084792 |
1.7193 | 0.9264 | 430 | 0.9407 | 22342320 |
1.7271 | 0.9372 | 435 | 0.9431 | 22600956 |
1.6102 | 0.9480 | 440 | 0.9432 | 22861708 |
1.6152 | 0.9587 | 445 | 0.9429 | 23118340 |
1.362 | 0.9695 | 450 | 0.9406 | 23379296 |
1.51 | 0.9803 | 455 | 0.9446 | 23638084 |
1.3728 | 0.9910 | 460 | 0.9421 | 23906316 |
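
Assuming the reported validation loss is the usual mean per-token cross-entropy (the `transformers` default for causal LMs), the final eval loss of 0.9431 corresponds to a token-level perplexity of roughly exp(0.9431) ≈ 2.57:

```python
# Assumes the loss is mean per-token cross-entropy in nats.
import math
print(math.exp(0.9431))  # ~2.568, the corresponding perplexity
```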
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
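
A quick way to check a local environment against these versions (a sketch; the attributes below are the standard version strings exposed by each package):

```python
# Compare installed versions against those listed above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.4.0+cu121
print(datasets.__version__)      # expected: 2.20.0
print(tokenizers.__version__)    # expected: 0.19.1
```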