collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9569
  • Num Input Tokens Seen: 28770280
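The checkpoint can be loaded like any other causal LM from the Hub. A minimal loading sketch, assuming the Hub repo id RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2; the prompt and generation settings are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id of this fine-tuned checkpoint.
model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```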

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a minimal reproduction sketch in code follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
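These settings map directly onto the Hugging Face Trainer configuration. A minimal sketch of the corresponding TrainingArguments, assuming a single-GPU setup (so 4 × 32 = 128 effective train batch size); the output directory is a placeholder and all unlisted options keep their defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer
    # defaults, so the optimizer does not need to be set explicitly.
)
```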

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
No log 0 0 1.2335 0
1.4397 0.0088 5 1.2035 252296
1.3086 0.0175 10 1.0915 506500
1.1494 0.0263 15 1.0445 760592
1.0067 0.0351 20 1.0258 1004332
0.7695 0.0438 25 1.0296 1254744
0.5357 0.0526 30 1.0276 1508608
0.3396 0.0614 35 1.0433 1755340
0.3616 0.0701 40 1.0337 2011180
0.3358 0.0789 45 1.0260 2258492
0.2944 0.0877 50 1.0259 2516564
0.2253 0.0964 55 1.0194 2766940
0.3132 0.1052 60 1.0127 3021564
0.3384 0.1140 65 1.0111 3277484
0.255 0.1227 70 1.0058 3527464
0.2462 0.1315 75 1.0030 3776276
0.3055 0.1403 80 0.9994 4030452
0.2831 0.1490 85 0.9933 4273320
0.2466 0.1578 90 0.9936 4525792
0.2042 0.1665 95 0.9905 4776548
0.2649 0.1753 100 0.9880 5031264
0.1964 0.1841 105 0.9880 5281260
0.2198 0.1928 110 0.9848 5527728
0.2088 0.2016 115 0.9826 5785504
0.2486 0.2104 120 0.9819 6039476
0.3276 0.2191 125 0.9812 6296036
0.2672 0.2279 130 0.9793 6551664
0.2834 0.2367 135 0.9796 6805564
0.2273 0.2454 140 0.9765 7060732
0.2008 0.2542 145 0.9756 7309668
0.2399 0.2630 150 0.9746 7567368
0.2562 0.2717 155 0.9754 7825264
0.1821 0.2805 160 0.9747 8074164
0.2303 0.2893 165 0.9745 8321340
0.1777 0.2980 170 0.9745 8577896
0.2103 0.3068 175 0.9737 8822656
0.2213 0.3156 180 0.9733 9070192
0.2599 0.3243 185 0.9723 9324820
0.1466 0.3331 190 0.9734 9576412
0.2797 0.3419 195 0.9722 9818756
0.1883 0.3506 200 0.9709 10071484
0.2503 0.3594 205 0.9747 10320876
0.1916 0.3682 210 0.9726 10571940
0.2219 0.3769 215 0.9713 10824124
0.1512 0.3857 220 0.9710 11073656
0.2753 0.3945 225 0.9711 11328296
0.2557 0.4032 230 0.9699 11579460
0.2299 0.4120 235 0.9688 11830412
0.2205 0.4208 240 0.9680 12080820
0.1929 0.4295 245 0.9689 12333420
0.2273 0.4383 250 0.9678 12583240
0.2647 0.4470 255 0.9680 12834684
0.2449 0.4558 260 0.9694 13089408
0.2749 0.4646 265 0.9651 13340368
0.2647 0.4733 270 0.9642 13597720
0.234 0.4821 275 0.9656 13851564
0.2691 0.4909 280 0.9660 14104972
0.3277 0.4996 285 0.9658 14352908
0.2189 0.5084 290 0.9645 14605176
0.1958 0.5172 295 0.9635 14851964
0.2484 0.5259 300 0.9657 15106352
0.2042 0.5347 305 0.9664 15362776
0.2009 0.5435 310 0.9649 15612868
0.2497 0.5522 315 0.9633 15864520
0.2446 0.5610 320 0.9637 16117664
0.1779 0.5698 325 0.9640 16371944
0.2356 0.5785 330 0.9656 16623560
0.2123 0.5873 335 0.9627 16872720
0.2159 0.5961 340 0.9623 17124308
0.2092 0.6048 345 0.9625 17372836
0.2281 0.6136 350 0.9645 17628728
0.2634 0.6224 355 0.9659 17879612
0.2312 0.6311 360 0.9631 18135028
0.2888 0.6399 365 0.9607 18390692
0.2695 0.6487 370 0.9595 18648440
0.1614 0.6574 375 0.9628 18901900
0.2464 0.6662 380 0.9650 19156540
0.277 0.6750 385 0.9602 19410216
0.1922 0.6837 390 0.9606 19666028
0.1204 0.6925 395 0.9616 19925752
0.1864 0.7013 400 0.9615 20172712
0.1827 0.7100 405 0.9629 20430744
0.2452 0.7188 410 0.9617 20684588
0.1543 0.7276 415 0.9593 20937816
0.1891 0.7363 420 0.9594 21188956
0.2248 0.7451 425 0.9609 21450228
0.2304 0.7538 430 0.9638 21695608
0.1279 0.7626 435 0.9627 21947316
0.1945 0.7714 440 0.9588 22199944
0.3 0.7801 445 0.9581 22457408
0.2061 0.7889 450 0.9576 22707864
0.1922 0.7977 455 0.9582 22959288
0.2542 0.8064 460 0.9590 23208268
0.2286 0.8152 465 0.9589 23457504
0.2163 0.8240 470 0.9549 23708000
0.2045 0.8327 475 0.9545 23965028
0.1752 0.8415 480 0.9557 24210004
0.1393 0.8503 485 0.9574 24465820
0.173 0.8590 490 0.9568 24715180
0.216 0.8678 495 0.9551 24970732
0.2171 0.8766 500 0.9550 25226344
0.2616 0.8853 505 0.9574 25478572
0.1634 0.8941 510 0.9542 25734644
0.2093 0.9029 515 0.9542 25984972
0.1975 0.9116 520 0.9552 26237876
0.2017 0.9204 525 0.9557 26490584
0.2441 0.9292 530 0.9551 26744188
0.2348 0.9379 535 0.9551 26994276
0.1623 0.9467 540 0.9546 27245784
0.2043 0.9555 545 0.9559 27498260
0.2488 0.9642 550 0.9581 27757760
0.2076 0.9730 555 0.9563 28012680
0.2058 0.9818 560 0.9562 28269220
0.1892 0.9905 565 0.9563 28519236
0.2218 0.9993 570 0.9569 28770280
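Validation loss falls quickly over the first ~20 steps and then declines slowly, levelling off around 0.95-0.96. A minimal plotting sketch using a handful of representative rows copied from the table above (matplotlib is assumed to be available; the full run evaluates every 5 steps):

```python
import matplotlib.pyplot as plt

# Subset of (step, validation loss) pairs taken from the table above.
steps = [0, 20, 100, 200, 300, 400, 500, 570]
val_loss = [1.2335, 1.0258, 0.9880, 0.9709, 0.9657, 0.9615, 0.9550, 0.9569]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2")
plt.show()
```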

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
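A small sketch for checking that a local environment matches these versions before re-using the training setup (package names as published on PyPI; the +cu121 build suffix of the PyTorch wheel is ignored):

```python
from importlib.metadata import version

# Versions this checkpoint was produced with (see the list above).
expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

for package, wanted in expected.items():
    installed = version(package)
    status = "OK" if installed.startswith(wanted) else "MISMATCH"
    print(f"{package}: installed {installed}, expected {wanted} -> {status}")
```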
