collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd0

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9437
  • Num Input Tokens Seen: 24429136
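
A minimal inference sketch, assuming the Transformers/PyTorch versions listed under "Framework versions" below and the repo ID from the model tree at the end of this card; this is not an official recipe from the model author:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The checkpoint is stored in BF16; load it in that dtype to avoid an upcast.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `accelerate`; a 27B model needs substantial GPU memory
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```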

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of the corresponding TrainingArguments call follows the list:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
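
A sketch of how these settings map onto transformers.TrainingArguments. The original training script is not part of this card, so the output_dir and the single-device assumption are hypothetical:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# total_train_batch_size = 128 follows from per-device batch size (4) x
# gradient_accumulation_steps (32), assuming a single device (world size is not stated).
args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd0",  # hypothetical path
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```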

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 3.2167        | 0.0105 | 5    | 1.0957          | 260288            |
| 2.852         | 0.0211 | 10   | 1.0143          | 515120            |
| 3.1183        | 0.0316 | 15   | 1.0029          | 771156            |
| 3.0936        | 0.0421 | 20   | 0.9878          | 1024572           |
| 2.7326        | 0.0527 | 25   | 0.9873          | 1286740           |
| 2.8651        | 0.0632 | 30   | 0.9935          | 1544608           |
| 2.4543        | 0.0738 | 35   | 0.9912          | 1797588           |
| 2.3316        | 0.0843 | 40   | 0.9939          | 2059220           |
| 2.2969        | 0.0948 | 45   | 0.9962          | 2316228           |
| 2.209         | 0.1054 | 50   | 0.9957          | 2573616           |
| 2.2011        | 0.1159 | 55   | 0.9901          | 2828304           |
| 2.249         | 0.1264 | 60   | 0.9885          | 3089580           |
| 1.9603        | 0.1370 | 65   | 0.9874          | 3344980           |
| 1.8445        | 0.1475 | 70   | 0.9853          | 3594824           |
| 2.0684        | 0.1580 | 75   | 0.9810          | 3851312           |
| 2.126         | 0.1686 | 80   | 0.9800          | 4107272           |
| 1.985         | 0.1791 | 85   | 0.9843          | 4370976           |
| 1.8983        | 0.1896 | 90   | 0.9811          | 4618784           |
| 1.9013        | 0.2002 | 95   | 0.9777          | 4879132           |
| 1.698         | 0.2107 | 100  | 0.9769          | 5133180           |
| 1.8366        | 0.2213 | 105  | 0.9777          | 5386652           |
| 1.6669        | 0.2318 | 110  | 0.9781          | 5651192           |
| 1.6158        | 0.2423 | 115  | 0.9741          | 5916924           |
| 1.7143        | 0.2529 | 120  | 0.9717          | 6175092           |
| 1.8355        | 0.2634 | 125  | 0.9722          | 6425136           |
| 1.5842        | 0.2739 | 130  | 0.9702          | 6678384           |
| 1.6173        | 0.2845 | 135  | 0.9698          | 6939592           |
| 1.6637        | 0.2950 | 140  | 0.9697          | 7196320           |
| 1.7279        | 0.3055 | 145  | 0.9669          | 7458004           |
| 1.7614        | 0.3161 | 150  | 0.9673          | 7716756           |
| 1.5623        | 0.3266 | 155  | 0.9649          | 7977856           |
| 1.4405        | 0.3372 | 160  | 0.9619          | 8235564           |
| 1.6714        | 0.3477 | 165  | 0.9663          | 8484732           |
| 1.7076        | 0.3582 | 170  | 0.9628          | 8745024           |
| 1.6164        | 0.3688 | 175  | 0.9610          | 8997416           |
| 1.7585        | 0.3793 | 180  | 0.9595          | 9257192           |
| 1.4447        | 0.3898 | 185  | 0.9606          | 9512832           |
| 1.5863        | 0.4004 | 190  | 0.9586          | 9768872           |
| 1.5235        | 0.4109 | 195  | 0.9593          | 10028040          |
| 1.5822        | 0.4214 | 200  | 0.9581          | 10285048          |
| 1.5285        | 0.4320 | 205  | 0.9548          | 10542956          |
| 1.5484        | 0.4425 | 210  | 0.9568          | 10812508          |
| 1.4607        | 0.4530 | 215  | 0.9546          | 11069192          |
| 1.4989        | 0.4636 | 220  | 0.9549          | 11316832          |
| 1.5499        | 0.4741 | 225  | 0.9533          | 11581384          |
| 1.3848        | 0.4847 | 230  | 0.9544          | 11838708          |
| 1.3471        | 0.4952 | 235  | 0.9543          | 12091016          |
| 1.2328        | 0.5057 | 240  | 0.9527          | 12347520          |
| 1.3087        | 0.5163 | 245  | 0.9532          | 12604588          |
| 1.3999        | 0.5268 | 250  | 0.9542          | 12855508          |
| 1.5176        | 0.5373 | 255  | 0.9548          | 13115696          |
| 1.3977        | 0.5479 | 260  | 0.9521          | 13369976          |
| 1.14          | 0.5584 | 265  | 0.9528          | 13629368          |
| 1.4824        | 0.5689 | 270  | 0.9539          | 13889796          |
| 1.2656        | 0.5795 | 275  | 0.9525          | 14149180          |
| 1.6385        | 0.5900 | 280  | 0.9504          | 14410972          |
| 1.6261        | 0.6006 | 285  | 0.9521          | 14667904          |
| 1.3793        | 0.6111 | 290  | 0.9497          | 14930112          |
| 1.4541        | 0.6216 | 295  | 0.9507          | 15189508          |
| 1.3924        | 0.6322 | 300  | 0.9490          | 15444108          |
| 1.5557        | 0.6427 | 305  | 0.9510          | 15706864          |
| 1.0083        | 0.6532 | 310  | 0.9509          | 15961260          |
| 1.3866        | 0.6638 | 315  | 0.9494          | 16222564          |
| 1.1968        | 0.6743 | 320  | 0.9510          | 16479072          |
| 1.4406        | 0.6848 | 325  | 0.9499          | 16744040          |
| 1.5105        | 0.6954 | 330  | 0.9499          | 17008220          |
| 1.6523        | 0.7059 | 335  | 0.9473          | 17259972          |
| 1.5404        | 0.7164 | 340  | 0.9504          | 17518816          |
| 1.3327        | 0.7270 | 345  | 0.9472          | 17775008          |
| 1.513         | 0.7375 | 350  | 0.9492          | 18032752          |
| 1.2379        | 0.7481 | 355  | 0.9465          | 18285632          |
| 1.3717        | 0.7586 | 360  | 0.9490          | 18539636          |
| 1.3086        | 0.7691 | 365  | 0.9472          | 18799412          |
| 1.3041        | 0.7797 | 370  | 0.9514          | 19059540          |
| 1.3024        | 0.7902 | 375  | 0.9434          | 19316932          |
| 1.5247        | 0.8007 | 380  | 0.9489          | 19569092          |
| 1.2434        | 0.8113 | 385  | 0.9464          | 19826932          |
| 1.2871        | 0.8218 | 390  | 0.9474          | 20085368          |
| 1.2104        | 0.8323 | 395  | 0.9458          | 20341548          |
| 1.4294        | 0.8429 | 400  | 0.9471          | 20601432          |
| 1.7209        | 0.8534 | 405  | 0.9453          | 20862684          |
| 1.4075        | 0.8640 | 410  | 0.9487          | 21111520          |
| 1.2768        | 0.8745 | 415  | 0.9468          | 21367704          |
| 1.3763        | 0.8850 | 420  | 0.9483          | 21630016          |
| 1.7273        | 0.8956 | 425  | 0.9465          | 21885988          |
| 1.1818        | 0.9061 | 430  | 0.9453          | 22144724          |
| 1.3393        | 0.9166 | 435  | 0.9423          | 22401468          |
| 1.2411        | 0.9272 | 440  | 0.9450          | 22657412          |
| 1.6222        | 0.9377 | 445  | 0.9431          | 22912156          |
| 1.3318        | 0.9482 | 450  | 0.9449          | 23172860          |
| 1.4488        | 0.9588 | 455  | 0.9428          | 23440376          |
| 1.2637        | 0.9693 | 460  | 0.9425          | 23699824          |
| 1.1678        | 0.9798 | 465  | 0.9441          | 23960164          |
| 1.6635        | 0.9904 | 470  | 0.9436          | 24220104          |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
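
For reproducibility, the versions above can be pinned in a requirements file. This is a reconstruction from the list; the PyTorch entry does not pin the CUDA build (the original run used 2.4.0+cu121):

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```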

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter6_sftsd0

  • Base model: google/gemma-2-27b