collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9537
  • Num Input Tokens Seen: 28613096
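
Since the card does not yet include a usage example, here is a minimal loading sketch. It assumes the checkpoint is the one hosted at RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1 and that it loads through the standard transformers causal-LM API; the bfloat16 dtype and the prompt are illustrative assumptions, and device_map="auto" additionally requires the accelerate package.

```python
# Hedged sketch: load the fine-tuned checkpoint and run a short generation.
# The repo id comes from this card; the dtype and prompt are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: matches the published BF16 weights
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```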

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
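
For reference only, here is a hypothetical sketch of how the values above could be expressed with transformers.TrainingArguments, assuming the standard Trainer API was used; the card does not document the actual training script, and the output_dir name is illustrative. Adam's betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults, so they need not be set explicitly.

```python
# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1",  # illustrative name
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = 128, matching total_train_batch_size above
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption: consistent with the BF16 weights
)
```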

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.3359        | 0.0088 | 5    | 1.2046          | 248636            |
| 1.1839        | 0.0175 | 10   | 1.0988          | 491672            |
| 1.0402        | 0.0263 | 15   | 1.0538          | 747280            |
| 0.8977        | 0.0351 | 20   | 1.0290          | 998768            |
| 0.6695        | 0.0438 | 25   | 1.0319          | 1250796           |
| 0.5374        | 0.0526 | 30   | 1.0345          | 1499784           |
| 0.4154        | 0.0614 | 35   | 1.0360          | 1748764           |
| 0.3295        | 0.0701 | 40   | 1.0337          | 2005896           |
| 0.2901        | 0.0789 | 45   | 1.0210          | 2250148           |
| 0.3106        | 0.0877 | 50   | 1.0183          | 2501568           |
| 0.2671        | 0.0965 | 55   | 1.0131          | 2757456           |
| 0.328         | 0.1052 | 60   | 1.0060          | 3015288           |
| 0.2448        | 0.1140 | 65   | 1.0001          | 3271092           |
| 0.2246        | 0.1228 | 70   | 0.9964          | 3521248           |
| 0.2362        | 0.1315 | 75   | 0.9959          | 3779680           |
| 0.2733        | 0.1403 | 80   | 0.9912          | 4023972           |
| 0.2402        | 0.1491 | 85   | 0.9902          | 4268036           |
| 0.2512        | 0.1578 | 90   | 0.9892          | 4519392           |
| 0.2321        | 0.1666 | 95   | 0.9864          | 4778620           |
| 0.2273        | 0.1754 | 100  | 0.9851          | 5025692           |
| 0.2323        | 0.1841 | 105  | 0.9838          | 5280468           |
| 0.3053        | 0.1929 | 110  | 0.9803          | 5530924           |
| 0.2245        | 0.2017 | 115  | 0.9804          | 5785864           |
| 0.2883        | 0.2104 | 120  | 0.9804          | 6036860           |
| 0.2026        | 0.2192 | 125  | 0.9785          | 6287784           |
| 0.2469        | 0.2280 | 130  | 0.9777          | 6545160           |
| 0.2019        | 0.2368 | 135  | 0.9773          | 6792968           |
| 0.2678        | 0.2455 | 140  | 0.9750          | 7040816           |
| 0.218         | 0.2543 | 145  | 0.9757          | 7293752           |
| 0.2226        | 0.2631 | 150  | 0.9774          | 7538784           |
| 0.2404        | 0.2718 | 155  | 0.9744          | 7790920           |
| 0.232         | 0.2806 | 160  | 0.9735          | 8039788           |
| 0.2165        | 0.2894 | 165  | 0.9744          | 8288388           |
| 0.2242        | 0.2981 | 170  | 0.9733          | 8538288           |
| 0.2149        | 0.3069 | 175  | 0.9725          | 8791120           |
| 0.1776        | 0.3157 | 180  | 0.9731          | 9036600           |
| 0.1956        | 0.3244 | 185  | 0.9720          | 9284368           |
| 0.2323        | 0.3332 | 190  | 0.9710          | 9530516           |
| 0.2396        | 0.3420 | 195  | 0.9705          | 9780556           |
| 0.1707        | 0.3507 | 200  | 0.9700          | 10030712          |
| 0.1939        | 0.3595 | 205  | 0.9692          | 10276572          |
| 0.2423        | 0.3683 | 210  | 0.9689          | 10527840          |
| 0.2288        | 0.3770 | 215  | 0.9701          | 10768656          |
| 0.1921        | 0.3858 | 220  | 0.9686          | 11021400          |
| 0.266         | 0.3946 | 225  | 0.9682          | 11275360          |
| 0.2229        | 0.4034 | 230  | 0.9660          | 11523228          |
| 0.2259        | 0.4121 | 235  | 0.9684          | 11777976          |
| 0.1639        | 0.4209 | 240  | 0.9676          | 12036000          |
| 0.221         | 0.4297 | 245  | 0.9663          | 12285220          |
| 0.3291        | 0.4384 | 250  | 0.9663          | 12536112          |
| 0.2452        | 0.4472 | 255  | 0.9657          | 12795356          |
| 0.1445        | 0.4560 | 260  | 0.9648          | 13047616          |
| 0.2716        | 0.4647 | 265  | 0.9629          | 13299196          |
| 0.1989        | 0.4735 | 270  | 0.9626          | 13542628          |
| 0.2011        | 0.4823 | 275  | 0.9654          | 13784996          |
| 0.1976        | 0.4910 | 280  | 0.9659          | 14044796          |
| 0.2436        | 0.4998 | 285  | 0.9621          | 14296308          |
| 0.1861        | 0.5086 | 290  | 0.9625          | 14547836          |
| 0.2246        | 0.5173 | 295  | 0.9645          | 14795728          |
| 0.2134        | 0.5261 | 300  | 0.9622          | 15042044          |
| 0.2016        | 0.5349 | 305  | 0.9615          | 15292508          |
| 0.191         | 0.5437 | 310  | 0.9627          | 15549036          |
| 0.1852        | 0.5524 | 315  | 0.9612          | 15800828          |
| 0.2197        | 0.5612 | 320  | 0.9603          | 16057208          |
| 0.1979        | 0.5700 | 325  | 0.9613          | 16313496          |
| 0.2359        | 0.5787 | 330  | 0.9612          | 16568676          |
| 0.1795        | 0.5875 | 335  | 0.9593          | 16821224          |
| 0.1896        | 0.5963 | 340  | 0.9601          | 17076824          |
| 0.2183        | 0.6050 | 345  | 0.9606          | 17329076          |
| 0.2005        | 0.6138 | 350  | 0.9587          | 17582136          |
| 0.2036        | 0.6226 | 355  | 0.9581          | 17829840          |
| 0.2329        | 0.6313 | 360  | 0.9602          | 18083936          |
| 0.1998        | 0.6401 | 365  | 0.9586          | 18336516          |
| 0.2645        | 0.6489 | 370  | 0.9577          | 18585836          |
| 0.1798        | 0.6576 | 375  | 0.9593          | 18835272          |
| 0.2039        | 0.6664 | 380  | 0.9580          | 19092844          |
| 0.2022        | 0.6752 | 385  | 0.9582          | 19347236          |
| 0.1866        | 0.6839 | 390  | 0.9589          | 19595964          |
| 0.2512        | 0.6927 | 395  | 0.9596          | 19845060          |
| 0.1757        | 0.7015 | 400  | 0.9581          | 20098528          |
| 0.1955        | 0.7103 | 405  | 0.9566          | 20351356          |
| 0.2391        | 0.7190 | 410  | 0.9565          | 20603916          |
| 0.2249        | 0.7278 | 415  | 0.9565          | 20859824          |
| 0.2613        | 0.7366 | 420  | 0.9563          | 21110008          |
| 0.2307        | 0.7453 | 425  | 0.9552          | 21361900          |
| 0.2076        | 0.7541 | 430  | 0.9565          | 21605480          |
| 0.1599        | 0.7629 | 435  | 0.9566          | 21853968          |
| 0.2783        | 0.7716 | 440  | 0.9552          | 22102344          |
| 0.2174        | 0.7804 | 445  | 0.9546          | 22348724          |
| 0.1421        | 0.7892 | 450  | 0.9552          | 22603820          |
| 0.2744        | 0.7979 | 455  | 0.9554          | 22851568          |
| 0.1836        | 0.8067 | 460  | 0.9555          | 23098412          |
| 0.1509        | 0.8155 | 465  | 0.9576          | 23345100          |
| 0.1987        | 0.8242 | 470  | 0.9560          | 23593736          |
| 0.223         | 0.8330 | 475  | 0.9559          | 23837832          |
| 0.1652        | 0.8418 | 480  | 0.9568          | 24091040          |
| 0.2086        | 0.8506 | 485  | 0.9562          | 24348968          |
| 0.1957        | 0.8593 | 490  | 0.9564          | 24598404          |
| 0.2455        | 0.8681 | 495  | 0.9556          | 24852524          |
| 0.1507        | 0.8769 | 500  | 0.9561          | 25099900          |
| 0.2063        | 0.8856 | 505  | 0.9572          | 25345184          |
| 0.2191        | 0.8944 | 510  | 0.9555          | 25597644          |
| 0.2405        | 0.9032 | 515  | 0.9561          | 25848912          |
| 0.2649        | 0.9119 | 520  | 0.9575          | 26096932          |
| 0.1879        | 0.9207 | 525  | 0.9549          | 26345864          |
| 0.2198        | 0.9295 | 530  | 0.9536          | 26597760          |
| 0.2613        | 0.9382 | 535  | 0.9541          | 26851052          |
| 0.2039        | 0.9470 | 540  | 0.9539          | 27105996          |
| 0.2108        | 0.9558 | 545  | 0.9553          | 27355996          |
| 0.1767        | 0.9645 | 550  | 0.9564          | 27605896          |
| 0.2146        | 0.9733 | 555  | 0.9564          | 27859848          |
| 0.2219        | 0.9821 | 560  | 0.9561          | 28118420          |
| 0.1854        | 0.9908 | 565  | 0.9530          | 28363872          |
| 0.1964        | 0.9996 | 570  | 0.9537          | 28613096          |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1