collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9401
  • Num Input Tokens Seen: 21285160
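
The card does not include usage instructions, so here is a minimal loading sketch. It assumes the checkpoint is hosted on the Hugging Face Hub under the repo id matching this card's name, and that the weights are in bfloat16 (the usual dtype for Gemma-2 checkpoints); device_map="auto" additionally requires the accelerate package.

```python
# Minimal loading sketch; repo id and BF16 dtype are assumptions noted above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed: BF16 weights
    device_map="auto",           # requires `accelerate`; shards the 27B model across available GPUs
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```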

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
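
As a rough illustration only, the hyperparameters above map onto transformers.TrainingArguments as sketched below. The actual training script and dataset are not documented on this card, so the output_dir name, the 5-step logging/eval cadence (inferred from the results table), and the bf16 flag are assumptions rather than the author's code.

```python
# Sketch, not the author's script: reconstructs the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd2",  # hypothetical
    learning_rate=8e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,   # 4 per device x 32 accumulation steps = 128 effective
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999), epsilon=1e-08 as listed
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    eval_strategy="steps",            # inferred: the results table evaluates every 5 steps
    eval_steps=5,
    logging_steps=5,
    bf16=True,                        # assumption, matching the BF16 checkpoint
)
```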

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 3.2221 | 0.0117 | 5 | 1.0926 | 251756 |
| 2.8365 | 0.0234 | 10 | 1.0155 | 507320 |
| 2.7913 | 0.0350 | 15 | 0.9995 | 754196 |
| 2.5411 | 0.0467 | 20 | 0.9870 | 998220 |
| 2.3748 | 0.0584 | 25 | 0.9889 | 1249252 |
| 2.5183 | 0.0701 | 30 | 0.9940 | 1494160 |
| 2.2618 | 0.0818 | 35 | 0.9941 | 1742232 |
| 2.2768 | 0.0935 | 40 | 0.9943 | 1984428 |
| 1.9749 | 0.1051 | 45 | 0.9904 | 2230660 |
| 1.8335 | 0.1168 | 50 | 0.9954 | 2480560 |
| 2.0343 | 0.1285 | 55 | 0.9924 | 2738660 |
| 1.9019 | 0.1402 | 60 | 0.9870 | 2984676 |
| 1.7913 | 0.1519 | 65 | 0.9850 | 3232276 |
| 1.7679 | 0.1635 | 70 | 0.9835 | 3481768 |
| 1.5726 | 0.1752 | 75 | 0.9809 | 3725896 |
| 1.3122 | 0.1869 | 80 | 0.9820 | 3976404 |
| 1.2818 | 0.1986 | 85 | 0.9803 | 4222020 |
| 1.2534 | 0.2103 | 90 | 0.9755 | 4467096 |
| 1.3957 | 0.2219 | 95 | 0.9715 | 4712856 |
| 1.4468 | 0.2336 | 100 | 0.9735 | 4966776 |
| 1.2346 | 0.2453 | 105 | 0.9688 | 5219188 |
| 1.375 | 0.2570 | 110 | 0.9661 | 5470400 |
| 1.2864 | 0.2687 | 115 | 0.9675 | 5718348 |
| 1.2863 | 0.2804 | 120 | 0.9653 | 5962556 |
| 1.2904 | 0.2920 | 125 | 0.9654 | 6212032 |
| 1.292 | 0.3037 | 130 | 0.9624 | 6457492 |
| 1.2084 | 0.3154 | 135 | 0.9630 | 6706428 |
| 1.2862 | 0.3271 | 140 | 0.9621 | 6958064 |
| 1.2497 | 0.3388 | 145 | 0.9612 | 7208000 |
| 1.0042 | 0.3504 | 150 | 0.9585 | 7455840 |
| 1.2159 | 0.3621 | 155 | 0.9577 | 7709904 |
| 1.2636 | 0.3738 | 160 | 0.9569 | 7958904 |
| 1.1413 | 0.3855 | 165 | 0.9598 | 8201696 |
| 1.232 | 0.3972 | 170 | 0.9544 | 8459644 |
| 1.2286 | 0.4088 | 175 | 0.9553 | 8707480 |
| 1.2674 | 0.4205 | 180 | 0.9535 | 8955328 |
| 1.203 | 0.4322 | 185 | 0.9543 | 9198592 |
| 1.1438 | 0.4439 | 190 | 0.9509 | 9453684 |
| 1.3743 | 0.4556 | 195 | 0.9539 | 9703056 |
| 1.4924 | 0.4673 | 200 | 0.9497 | 9951200 |
| 1.2615 | 0.4789 | 205 | 0.9529 | 10190064 |
| 1.1522 | 0.4906 | 210 | 0.9509 | 10442280 |
| 1.1088 | 0.5023 | 215 | 0.9542 | 10691252 |
| 1.1145 | 0.5140 | 220 | 0.9497 | 10932600 |
| 1.1479 | 0.5257 | 225 | 0.9498 | 11177348 |
| 1.1476 | 0.5373 | 230 | 0.9497 | 11426060 |
| 1.3338 | 0.5490 | 235 | 0.9495 | 11679624 |
| 1.1771 | 0.5607 | 240 | 0.9504 | 11928780 |
| 0.9654 | 0.5724 | 245 | 0.9466 | 12180408 |
| 1.1334 | 0.5841 | 250 | 0.9489 | 12427700 |
| 1.1846 | 0.5958 | 255 | 0.9483 | 12685832 |
| 1.2411 | 0.6074 | 260 | 0.9477 | 12932524 |
| 1.2086 | 0.6191 | 265 | 0.9479 | 13179472 |
| 1.1832 | 0.6308 | 270 | 0.9467 | 13433160 |
| 1.1775 | 0.6425 | 275 | 0.9466 | 13682108 |
| 1.2339 | 0.6542 | 280 | 0.9456 | 13931324 |
| 1.2441 | 0.6658 | 285 | 0.9469 | 14185032 |
| 1.0774 | 0.6775 | 290 | 0.9439 | 14433164 |
| 1.1275 | 0.6892 | 295 | 0.9443 | 14674964 |
| 1.0283 | 0.7009 | 300 | 0.9427 | 14922084 |
| 1.0613 | 0.7126 | 305 | 0.9453 | 15175408 |
| 0.8593 | 0.7242 | 310 | 0.9424 | 15427908 |
| 1.1358 | 0.7359 | 315 | 0.9448 | 15675160 |
| 1.1232 | 0.7476 | 320 | 0.9434 | 15927548 |
| 1.1183 | 0.7593 | 325 | 0.9437 | 16174772 |
| 1.1012 | 0.7710 | 330 | 0.9442 | 16433432 |
| 1.1579 | 0.7827 | 335 | 0.9423 | 16680244 |
| 0.8979 | 0.7943 | 340 | 0.9453 | 16924192 |
| 1.1912 | 0.8060 | 345 | 0.9409 | 17168560 |
| 1.0824 | 0.8177 | 350 | 0.9446 | 17422176 |
| 1.1499 | 0.8294 | 355 | 0.9414 | 17672716 |
| 0.8825 | 0.8411 | 360 | 0.9413 | 17908784 |
| 1.0893 | 0.8527 | 365 | 0.9433 | 18156560 |
| 0.9911 | 0.8644 | 370 | 0.9458 | 18401204 |
| 1.0546 | 0.8761 | 375 | 0.9444 | 18658068 |
| 1.0192 | 0.8878 | 380 | 0.9412 | 18899828 |
| 0.9538 | 0.8995 | 385 | 0.9448 | 19146148 |
| 1.2351 | 0.9111 | 390 | 0.9429 | 19398392 |
| 1.1466 | 0.9228 | 395 | 0.9434 | 19649472 |
| 0.886 | 0.9345 | 400 | 0.9437 | 19896600 |
| 0.95 | 0.9462 | 405 | 0.9446 | 20149368 |
| 1.0627 | 0.9579 | 410 | 0.9423 | 20397044 |
| 1.0381 | 0.9696 | 415 | 0.9434 | 20643720 |
| 1.035 | 0.9812 | 420 | 0.9432 | 20895920 |
| 0.8919 | 0.9929 | 425 | 0.9423 | 21138916 |
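
The validation losses above are mean per-token cross-entropies in nats (the standard causal-LM loss reported by the Hugging Face Trainer), so they convert to perplexity by exponentiation. A quick check on the final evaluation loss reported at the top of this card:

```python
import math

eval_loss = 0.9401                  # final evaluation loss from the card header
perplexity = math.exp(eval_loss)    # loss in nats -> perplexity
print(f"perplexity = {perplexity:.2f}")  # ~2.56
```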

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1