# collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9569
- Num Input Tokens Seen: 28770280
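
The snippet below sketches one way to load this checkpoint with the standard `transformers` API. The repo id is taken from this card; the dtype, device placement, and prompt are illustrative assumptions, not part of the original training setup.

```python
# Hedged sketch: loading the checkpoint with the standard transformers API.
# The repo id comes from this card; dtype/device choices are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to fit a 9B model in memory
    device_map="auto",           # assumption: let accelerate place the weights
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```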
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a sketch of an equivalent `TrainingArguments` setup appears after the list:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
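
As a reference, here is a minimal sketch of how these values could be expressed as a `transformers` `TrainingArguments` object. The `output_dir` is hypothetical, and the Adam betas and epsilon listed above match the library defaults, so they are left implicit; only the listed values are taken from this card.

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# output_dir is hypothetical; Adam betas=(0.9, 0.999) and epsilon=1e-08 are
# the transformers defaults, so no explicit arguments are needed for them.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 per device * 32 steps = 128 total batch
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```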

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.2335 | 0 |
1.4397 | 0.0088 | 5 | 1.2035 | 252296 |
1.3086 | 0.0175 | 10 | 1.0915 | 506500 |
1.1494 | 0.0263 | 15 | 1.0445 | 760592 |
1.0067 | 0.0351 | 20 | 1.0258 | 1004332 |
0.7695 | 0.0438 | 25 | 1.0296 | 1254744 |
0.5357 | 0.0526 | 30 | 1.0276 | 1508608 |
0.3396 | 0.0614 | 35 | 1.0433 | 1755340 |
0.3616 | 0.0701 | 40 | 1.0337 | 2011180 |
0.3358 | 0.0789 | 45 | 1.0260 | 2258492 |
0.2944 | 0.0877 | 50 | 1.0259 | 2516564 |
0.2253 | 0.0964 | 55 | 1.0194 | 2766940 |
0.3132 | 0.1052 | 60 | 1.0127 | 3021564 |
0.3384 | 0.1140 | 65 | 1.0111 | 3277484 |
0.255 | 0.1227 | 70 | 1.0058 | 3527464 |
0.2462 | 0.1315 | 75 | 1.0030 | 3776276 |
0.3055 | 0.1403 | 80 | 0.9994 | 4030452 |
0.2831 | 0.1490 | 85 | 0.9933 | 4273320 |
0.2466 | 0.1578 | 90 | 0.9936 | 4525792 |
0.2042 | 0.1665 | 95 | 0.9905 | 4776548 |
0.2649 | 0.1753 | 100 | 0.9880 | 5031264 |
0.1964 | 0.1841 | 105 | 0.9880 | 5281260 |
0.2198 | 0.1928 | 110 | 0.9848 | 5527728 |
0.2088 | 0.2016 | 115 | 0.9826 | 5785504 |
0.2486 | 0.2104 | 120 | 0.9819 | 6039476 |
0.3276 | 0.2191 | 125 | 0.9812 | 6296036 |
0.2672 | 0.2279 | 130 | 0.9793 | 6551664 |
0.2834 | 0.2367 | 135 | 0.9796 | 6805564 |
0.2273 | 0.2454 | 140 | 0.9765 | 7060732 |
0.2008 | 0.2542 | 145 | 0.9756 | 7309668 |
0.2399 | 0.2630 | 150 | 0.9746 | 7567368 |
0.2562 | 0.2717 | 155 | 0.9754 | 7825264 |
0.1821 | 0.2805 | 160 | 0.9747 | 8074164 |
0.2303 | 0.2893 | 165 | 0.9745 | 8321340 |
0.1777 | 0.2980 | 170 | 0.9745 | 8577896 |
0.2103 | 0.3068 | 175 | 0.9737 | 8822656 |
0.2213 | 0.3156 | 180 | 0.9733 | 9070192 |
0.2599 | 0.3243 | 185 | 0.9723 | 9324820 |
0.1466 | 0.3331 | 190 | 0.9734 | 9576412 |
0.2797 | 0.3419 | 195 | 0.9722 | 9818756 |
0.1883 | 0.3506 | 200 | 0.9709 | 10071484 |
0.2503 | 0.3594 | 205 | 0.9747 | 10320876 |
0.1916 | 0.3682 | 210 | 0.9726 | 10571940 |
0.2219 | 0.3769 | 215 | 0.9713 | 10824124 |
0.1512 | 0.3857 | 220 | 0.9710 | 11073656 |
0.2753 | 0.3945 | 225 | 0.9711 | 11328296 |
0.2557 | 0.4032 | 230 | 0.9699 | 11579460 |
0.2299 | 0.4120 | 235 | 0.9688 | 11830412 |
0.2205 | 0.4208 | 240 | 0.9680 | 12080820 |
0.1929 | 0.4295 | 245 | 0.9689 | 12333420 |
0.2273 | 0.4383 | 250 | 0.9678 | 12583240 |
0.2647 | 0.4470 | 255 | 0.9680 | 12834684 |
0.2449 | 0.4558 | 260 | 0.9694 | 13089408 |
0.2749 | 0.4646 | 265 | 0.9651 | 13340368 |
0.2647 | 0.4733 | 270 | 0.9642 | 13597720 |
0.234 | 0.4821 | 275 | 0.9656 | 13851564 |
0.2691 | 0.4909 | 280 | 0.9660 | 14104972 |
0.3277 | 0.4996 | 285 | 0.9658 | 14352908 |
0.2189 | 0.5084 | 290 | 0.9645 | 14605176 |
0.1958 | 0.5172 | 295 | 0.9635 | 14851964 |
0.2484 | 0.5259 | 300 | 0.9657 | 15106352 |
0.2042 | 0.5347 | 305 | 0.9664 | 15362776 |
0.2009 | 0.5435 | 310 | 0.9649 | 15612868 |
0.2497 | 0.5522 | 315 | 0.9633 | 15864520 |
0.2446 | 0.5610 | 320 | 0.9637 | 16117664 |
0.1779 | 0.5698 | 325 | 0.9640 | 16371944 |
0.2356 | 0.5785 | 330 | 0.9656 | 16623560 |
0.2123 | 0.5873 | 335 | 0.9627 | 16872720 |
0.2159 | 0.5961 | 340 | 0.9623 | 17124308 |
0.2092 | 0.6048 | 345 | 0.9625 | 17372836 |
0.2281 | 0.6136 | 350 | 0.9645 | 17628728 |
0.2634 | 0.6224 | 355 | 0.9659 | 17879612 |
0.2312 | 0.6311 | 360 | 0.9631 | 18135028 |
0.2888 | 0.6399 | 365 | 0.9607 | 18390692 |
0.2695 | 0.6487 | 370 | 0.9595 | 18648440 |
0.1614 | 0.6574 | 375 | 0.9628 | 18901900 |
0.2464 | 0.6662 | 380 | 0.9650 | 19156540 |
0.277 | 0.6750 | 385 | 0.9602 | 19410216 |
0.1922 | 0.6837 | 390 | 0.9606 | 19666028 |
0.1204 | 0.6925 | 395 | 0.9616 | 19925752 |
0.1864 | 0.7013 | 400 | 0.9615 | 20172712 |
0.1827 | 0.7100 | 405 | 0.9629 | 20430744 |
0.2452 | 0.7188 | 410 | 0.9617 | 20684588 |
0.1543 | 0.7276 | 415 | 0.9593 | 20937816 |
0.1891 | 0.7363 | 420 | 0.9594 | 21188956 |
0.2248 | 0.7451 | 425 | 0.9609 | 21450228 |
0.2304 | 0.7538 | 430 | 0.9638 | 21695608 |
0.1279 | 0.7626 | 435 | 0.9627 | 21947316 |
0.1945 | 0.7714 | 440 | 0.9588 | 22199944 |
0.3 | 0.7801 | 445 | 0.9581 | 22457408 |
0.2061 | 0.7889 | 450 | 0.9576 | 22707864 |
0.1922 | 0.7977 | 455 | 0.9582 | 22959288 |
0.2542 | 0.8064 | 460 | 0.9590 | 23208268 |
0.2286 | 0.8152 | 465 | 0.9589 | 23457504 |
0.2163 | 0.8240 | 470 | 0.9549 | 23708000 |
0.2045 | 0.8327 | 475 | 0.9545 | 23965028 |
0.1752 | 0.8415 | 480 | 0.9557 | 24210004 |
0.1393 | 0.8503 | 485 | 0.9574 | 24465820 |
0.173 | 0.8590 | 490 | 0.9568 | 24715180 |
0.216 | 0.8678 | 495 | 0.9551 | 24970732 |
0.2171 | 0.8766 | 500 | 0.9550 | 25226344 |
0.2616 | 0.8853 | 505 | 0.9574 | 25478572 |
0.1634 | 0.8941 | 510 | 0.9542 | 25734644 |
0.2093 | 0.9029 | 515 | 0.9542 | 25984972 |
0.1975 | 0.9116 | 520 | 0.9552 | 26237876 |
0.2017 | 0.9204 | 525 | 0.9557 | 26490584 |
0.2441 | 0.9292 | 530 | 0.9551 | 26744188 |
0.2348 | 0.9379 | 535 | 0.9551 | 26994276 |
0.1623 | 0.9467 | 540 | 0.9546 | 27245784 |
0.2043 | 0.9555 | 545 | 0.9559 | 27498260 |
0.2488 | 0.9642 | 550 | 0.9581 | 27757760 |
0.2076 | 0.9730 | 555 | 0.9563 | 28012680 |
0.2058 | 0.9818 | 560 | 0.9562 | 28269220 |
0.1892 | 0.9905 | 565 | 0.9563 | 28519236 |
0.2218 | 0.9993 | 570 | 0.9569 | 28770280 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1