# collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9537
- Num Input Tokens Seen: 28613096
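
The card does not include a usage example; the sketch below shows one way to load the checkpoint for inference. It assumes the model is hosted on the Hub at `RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1` and loads like any standard Gemma-2 causal LM.

```python
# Minimal inference sketch (not from the original card): assumes the checkpoint
# is hosted at RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1
# and behaves like a standard Gemma-2 causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 9B model in fp32 is ~36 GB; bf16 halves that
    device_map="auto",           # requires `accelerate` to be installed
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```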
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128 (train_batch_size 4 × gradient_accumulation_steps 32)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
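
For reference, these settings map roughly onto the `TrainingArguments` below. This is a sketch only: the original training script is not published, so the output directory is a placeholder, and dataset loading, model loading, and any distributed setup are omitted.

```python
# Hedged sketch of TrainingArguments matching the listed hyperparameters;
# "output_dir" is a placeholder, not the original path.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter6_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 x 32 = effective batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```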
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.2335 | 0 |
1.3359 | 0.0088 | 5 | 1.2046 | 248636 |
1.1839 | 0.0175 | 10 | 1.0988 | 491672 |
1.0402 | 0.0263 | 15 | 1.0538 | 747280 |
0.8977 | 0.0351 | 20 | 1.0290 | 998768 |
0.6695 | 0.0438 | 25 | 1.0319 | 1250796 |
0.5374 | 0.0526 | 30 | 1.0345 | 1499784 |
0.4154 | 0.0614 | 35 | 1.0360 | 1748764 |
0.3295 | 0.0701 | 40 | 1.0337 | 2005896 |
0.2901 | 0.0789 | 45 | 1.0210 | 2250148 |
0.3106 | 0.0877 | 50 | 1.0183 | 2501568 |
0.2671 | 0.0965 | 55 | 1.0131 | 2757456 |
0.328 | 0.1052 | 60 | 1.0060 | 3015288 |
0.2448 | 0.1140 | 65 | 1.0001 | 3271092 |
0.2246 | 0.1228 | 70 | 0.9964 | 3521248 |
0.2362 | 0.1315 | 75 | 0.9959 | 3779680 |
0.2733 | 0.1403 | 80 | 0.9912 | 4023972 |
0.2402 | 0.1491 | 85 | 0.9902 | 4268036 |
0.2512 | 0.1578 | 90 | 0.9892 | 4519392 |
0.2321 | 0.1666 | 95 | 0.9864 | 4778620 |
0.2273 | 0.1754 | 100 | 0.9851 | 5025692 |
0.2323 | 0.1841 | 105 | 0.9838 | 5280468 |
0.3053 | 0.1929 | 110 | 0.9803 | 5530924 |
0.2245 | 0.2017 | 115 | 0.9804 | 5785864 |
0.2883 | 0.2104 | 120 | 0.9804 | 6036860 |
0.2026 | 0.2192 | 125 | 0.9785 | 6287784 |
0.2469 | 0.2280 | 130 | 0.9777 | 6545160 |
0.2019 | 0.2368 | 135 | 0.9773 | 6792968 |
0.2678 | 0.2455 | 140 | 0.9750 | 7040816 |
0.218 | 0.2543 | 145 | 0.9757 | 7293752 |
0.2226 | 0.2631 | 150 | 0.9774 | 7538784 |
0.2404 | 0.2718 | 155 | 0.9744 | 7790920 |
0.232 | 0.2806 | 160 | 0.9735 | 8039788 |
0.2165 | 0.2894 | 165 | 0.9744 | 8288388 |
0.2242 | 0.2981 | 170 | 0.9733 | 8538288 |
0.2149 | 0.3069 | 175 | 0.9725 | 8791120 |
0.1776 | 0.3157 | 180 | 0.9731 | 9036600 |
0.1956 | 0.3244 | 185 | 0.9720 | 9284368 |
0.2323 | 0.3332 | 190 | 0.9710 | 9530516 |
0.2396 | 0.3420 | 195 | 0.9705 | 9780556 |
0.1707 | 0.3507 | 200 | 0.9700 | 10030712 |
0.1939 | 0.3595 | 205 | 0.9692 | 10276572 |
0.2423 | 0.3683 | 210 | 0.9689 | 10527840 |
0.2288 | 0.3770 | 215 | 0.9701 | 10768656 |
0.1921 | 0.3858 | 220 | 0.9686 | 11021400 |
0.266 | 0.3946 | 225 | 0.9682 | 11275360 |
0.2229 | 0.4034 | 230 | 0.9660 | 11523228 |
0.2259 | 0.4121 | 235 | 0.9684 | 11777976 |
0.1639 | 0.4209 | 240 | 0.9676 | 12036000 |
0.221 | 0.4297 | 245 | 0.9663 | 12285220 |
0.3291 | 0.4384 | 250 | 0.9663 | 12536112 |
0.2452 | 0.4472 | 255 | 0.9657 | 12795356 |
0.1445 | 0.4560 | 260 | 0.9648 | 13047616 |
0.2716 | 0.4647 | 265 | 0.9629 | 13299196 |
0.1989 | 0.4735 | 270 | 0.9626 | 13542628 |
0.2011 | 0.4823 | 275 | 0.9654 | 13784996 |
0.1976 | 0.4910 | 280 | 0.9659 | 14044796 |
0.2436 | 0.4998 | 285 | 0.9621 | 14296308 |
0.1861 | 0.5086 | 290 | 0.9625 | 14547836 |
0.2246 | 0.5173 | 295 | 0.9645 | 14795728 |
0.2134 | 0.5261 | 300 | 0.9622 | 15042044 |
0.2016 | 0.5349 | 305 | 0.9615 | 15292508 |
0.191 | 0.5437 | 310 | 0.9627 | 15549036 |
0.1852 | 0.5524 | 315 | 0.9612 | 15800828 |
0.2197 | 0.5612 | 320 | 0.9603 | 16057208 |
0.1979 | 0.5700 | 325 | 0.9613 | 16313496 |
0.2359 | 0.5787 | 330 | 0.9612 | 16568676 |
0.1795 | 0.5875 | 335 | 0.9593 | 16821224 |
0.1896 | 0.5963 | 340 | 0.9601 | 17076824 |
0.2183 | 0.6050 | 345 | 0.9606 | 17329076 |
0.2005 | 0.6138 | 350 | 0.9587 | 17582136 |
0.2036 | 0.6226 | 355 | 0.9581 | 17829840 |
0.2329 | 0.6313 | 360 | 0.9602 | 18083936 |
0.1998 | 0.6401 | 365 | 0.9586 | 18336516 |
0.2645 | 0.6489 | 370 | 0.9577 | 18585836 |
0.1798 | 0.6576 | 375 | 0.9593 | 18835272 |
0.2039 | 0.6664 | 380 | 0.9580 | 19092844 |
0.2022 | 0.6752 | 385 | 0.9582 | 19347236 |
0.1866 | 0.6839 | 390 | 0.9589 | 19595964 |
0.2512 | 0.6927 | 395 | 0.9596 | 19845060 |
0.1757 | 0.7015 | 400 | 0.9581 | 20098528 |
0.1955 | 0.7103 | 405 | 0.9566 | 20351356 |
0.2391 | 0.7190 | 410 | 0.9565 | 20603916 |
0.2249 | 0.7278 | 415 | 0.9565 | 20859824 |
0.2613 | 0.7366 | 420 | 0.9563 | 21110008 |
0.2307 | 0.7453 | 425 | 0.9552 | 21361900 |
0.2076 | 0.7541 | 430 | 0.9565 | 21605480 |
0.1599 | 0.7629 | 435 | 0.9566 | 21853968 |
0.2783 | 0.7716 | 440 | 0.9552 | 22102344 |
0.2174 | 0.7804 | 445 | 0.9546 | 22348724 |
0.1421 | 0.7892 | 450 | 0.9552 | 22603820 |
0.2744 | 0.7979 | 455 | 0.9554 | 22851568 |
0.1836 | 0.8067 | 460 | 0.9555 | 23098412 |
0.1509 | 0.8155 | 465 | 0.9576 | 23345100 |
0.1987 | 0.8242 | 470 | 0.9560 | 23593736 |
0.223 | 0.8330 | 475 | 0.9559 | 23837832 |
0.1652 | 0.8418 | 480 | 0.9568 | 24091040 |
0.2086 | 0.8506 | 485 | 0.9562 | 24348968 |
0.1957 | 0.8593 | 490 | 0.9564 | 24598404 |
0.2455 | 0.8681 | 495 | 0.9556 | 24852524 |
0.1507 | 0.8769 | 500 | 0.9561 | 25099900 |
0.2063 | 0.8856 | 505 | 0.9572 | 25345184 |
0.2191 | 0.8944 | 510 | 0.9555 | 25597644 |
0.2405 | 0.9032 | 515 | 0.9561 | 25848912 |
0.2649 | 0.9119 | 520 | 0.9575 | 26096932 |
0.1879 | 0.9207 | 525 | 0.9549 | 26345864 |
0.2198 | 0.9295 | 530 | 0.9536 | 26597760 |
0.2613 | 0.9382 | 535 | 0.9541 | 26851052 |
0.2039 | 0.9470 | 540 | 0.9539 | 27105996 |
0.2108 | 0.9558 | 545 | 0.9553 | 27355996 |
0.1767 | 0.9645 | 550 | 0.9564 | 27605896 |
0.2146 | 0.9733 | 555 | 0.9564 | 27859848 |
0.2219 | 0.9821 | 560 | 0.9561 | 28118420 |
0.1854 | 0.9908 | 565 | 0.9530 | 28363872 |
0.1964 | 0.9996 | 570 | 0.9537 | 28613096 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
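
One way to pin a matching environment is a requirements file with the versions above. This is an assumption about packaging, not the original setup; in particular, the `+cu121` PyTorch build may need to come from PyTorch's wheel index rather than PyPI, depending on your platform.

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```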