collapse_gemma-2-2b_hs2_accumulate_iter18_sftsd2
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1054
- Num Input Tokens Seen: 93253824
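The card does not say what the loss measures. Assume it is the usual mean per-token cross-entropy in nats, which is the default for Transformers causal-LM training. Under that assumption, perplexity is just its exponential. The sketch below also converts the step-0 validation loss from the results table, taken before any update.

```python
import math

# Evaluation loss reported above. Assumption: mean per-token cross-entropy
# in nats (natural log); the card itself does not state this.
EVAL_LOSS = 1.1054
INITIAL_EVAL_LOSS = 1.3909  # validation loss at step 0 in the results table


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity is exp of the mean negative log-likelihood per token."""
    return math.exp(mean_nll_nats)


final_ppl = perplexity(EVAL_LOSS)
initial_ppl = perplexity(INITIAL_EVAL_LOSS)
print(f"perplexity at step 0:        {initial_ppl:.2f}")
print(f"perplexity after fine-tuning: {final_ppl:.2f}")
```

Under that assumption, perplexity on the evaluation set falls from roughly 4.02 to roughly 3.02.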
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
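Several values above depend on one another. The total batch size of 128 equals 8 × 16, which is consistent with training on a single device. The length of the warmup depends on the total number of optimizer steps.

The sketch below reproduces that arithmetic and is written to follow the formulas in Transformers' `get_constant_schedule_with_warmup` and `TrainingArguments.get_warmup_steps`. It makes two assumptions. First, the device count is 1, which the card does not state. Second, the run has about 1730 optimizer steps, the last step logged in the results table. Using 1731 steps would give the same warmup length.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8   # per device
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05
NUM_DEVICES = 1        # assumption: the card does not state the device count
TOTAL_STEPS = 1730     # assumption: last logged step in the results table


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of sequences contributing to one optimizer update."""
    return per_device * accum * devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Round up, so any non-zero ratio yields at least one warmup step."""
    return math.ceil(total_steps * ratio)


def constant_with_warmup_lr(step: int, base_lr: float, warmup: int) -> float:
    """Ramp linearly from 0 to base_lr over `warmup` steps, then hold constant."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr


warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
print("effective batch size:",
      effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
print("warmup steps:", warmup)
for step in (0, warmup // 2, warmup, TOTAL_STEPS):
    print(step, constant_with_warmup_lr(step, LEARNING_RATE, warmup))
```

With these assumptions, the learning rate reaches 8e-06 after 87 steps and stays there for the rest of the single epoch. Unlike a linear or cosine schedule, it never decays.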
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.5589 | 0.0029 | 5 | 1.3902 | 274000 |
1.5832 | 0.0058 | 10 | 1.3831 | 545296 |
1.5537 | 0.0087 | 15 | 1.3605 | 813056 |
1.5406 | 0.0116 | 20 | 1.3293 | 1088872 |
1.3959 | 0.0144 | 25 | 1.2857 | 1358904 |
1.3795 | 0.0173 | 30 | 1.2476 | 1633864 |
1.2653 | 0.0202 | 35 | 1.2241 | 1897512 |
1.156 | 0.0231 | 40 | 1.1976 | 2168848 |
1.1898 | 0.0260 | 45 | 1.1947 | 2443712 |
1.0116 | 0.0289 | 50 | 1.2279 | 2708784 |
0.9395 | 0.0318 | 55 | 1.2510 | 2977400 |
0.8139 | 0.0347 | 60 | 1.2637 | 3247864 |
0.6104 | 0.0376 | 65 | 1.3022 | 3516912 |
0.509 | 0.0405 | 70 | 1.2904 | 3787472 |
0.5054 | 0.0433 | 75 | 1.2904 | 4054688 |
0.5146 | 0.0462 | 80 | 1.2758 | 4326016 |
0.4133 | 0.0491 | 85 | 1.2778 | 4599264 |
0.3995 | 0.0520 | 90 | 1.2543 | 4867936 |
0.356 | 0.0549 | 95 | 1.2531 | 5137432 |
0.3111 | 0.0578 | 100 | 1.2612 | 5405440 |
0.2319 | 0.0607 | 105 | 1.2501 | 5682304 |
0.2996 | 0.0636 | 110 | 1.2211 | 5948384 |
0.2453 | 0.0665 | 115 | 1.2293 | 6212944 |
0.2344 | 0.0693 | 120 | 1.2176 | 6477552 |
0.2524 | 0.0722 | 125 | 1.2154 | 6752976 |
0.2025 | 0.0751 | 130 | 1.2221 | 7026224 |
0.2632 | 0.0780 | 135 | 1.2143 | 7290064 |
0.1875 | 0.0809 | 140 | 1.2105 | 7556688 |
0.0915 | 0.0838 | 145 | 1.2113 | 7827360 |
0.1715 | 0.0867 | 150 | 1.2169 | 8096992 |
0.2125 | 0.0896 | 155 | 1.2112 | 8364872 |
0.2488 | 0.0925 | 160 | 1.1999 | 8634344 |
0.2766 | 0.0954 | 165 | 1.2039 | 8904448 |
0.1718 | 0.0982 | 170 | 1.1953 | 9179064 |
0.203 | 0.1011 | 175 | 1.1997 | 9451224 |
0.1646 | 0.1040 | 180 | 1.1933 | 9720888 |
0.1598 | 0.1069 | 185 | 1.2043 | 9986096 |
0.1671 | 0.1098 | 190 | 1.2018 | 10252848 |
0.2159 | 0.1127 | 195 | 1.1887 | 10521360 |
0.1564 | 0.1156 | 200 | 1.1965 | 10791752 |
0.2083 | 0.1185 | 205 | 1.1926 | 11056816 |
0.1854 | 0.1214 | 210 | 1.1865 | 11325384 |
0.1973 | 0.1242 | 215 | 1.1897 | 11597248 |
0.1134 | 0.1271 | 220 | 1.1923 | 11869880 |
0.1822 | 0.1300 | 225 | 1.1894 | 12138592 |
0.1995 | 0.1329 | 230 | 1.1894 | 12411800 |
0.2013 | 0.1358 | 235 | 1.1840 | 12686960 |
0.209 | 0.1387 | 240 | 1.1870 | 12963480 |
0.1465 | 0.1416 | 245 | 1.1816 | 13237872 |
0.2081 | 0.1445 | 250 | 1.1797 | 13505888 |
0.1153 | 0.1474 | 255 | 1.1780 | 13776544 |
0.1555 | 0.1503 | 260 | 1.1813 | 14047784 |
0.1875 | 0.1531 | 265 | 1.1764 | 14320240 |
0.1218 | 0.1560 | 270 | 1.1783 | 14590488 |
0.2 | 0.1589 | 275 | 1.1730 | 14850520 |
0.1165 | 0.1618 | 280 | 1.1775 | 15117536 |
0.1662 | 0.1647 | 285 | 1.1825 | 15389328 |
0.1559 | 0.1676 | 290 | 1.1740 | 15655432 |
0.1177 | 0.1705 | 295 | 1.1759 | 15926896 |
0.1021 | 0.1734 | 300 | 1.1785 | 16203424 |
0.119 | 0.1763 | 305 | 1.1767 | 16475904 |
0.1396 | 0.1791 | 310 | 1.1765 | 16746040 |
0.1534 | 0.1820 | 315 | 1.1771 | 17019176 |
0.1497 | 0.1849 | 320 | 1.1724 | 17287504 |
0.2243 | 0.1878 | 325 | 1.1725 | 17554096 |
0.1548 | 0.1907 | 330 | 1.1712 | 17826504 |
0.1259 | 0.1936 | 335 | 1.1698 | 18094840 |
0.135 | 0.1965 | 340 | 1.1731 | 18366576 |
0.1599 | 0.1994 | 345 | 1.1679 | 18628832 |
0.1622 | 0.2023 | 350 | 1.1633 | 18903856 |
0.1286 | 0.2052 | 355 | 1.1688 | 19177200 |
0.1682 | 0.2080 | 360 | 1.1669 | 19442496 |
0.1071 | 0.2109 | 365 | 1.1637 | 19711680 |
0.1478 | 0.2138 | 370 | 1.1632 | 19983504 |
0.1369 | 0.2167 | 375 | 1.1661 | 20248768 |
0.1423 | 0.2196 | 380 | 1.1638 | 20517072 |
0.2272 | 0.2225 | 385 | 1.1619 | 20790944 |
0.1969 | 0.2254 | 390 | 1.1628 | 21057168 |
0.1786 | 0.2283 | 395 | 1.1594 | 21331368 |
0.1267 | 0.2312 | 400 | 1.1646 | 21592208 |
0.1398 | 0.2340 | 405 | 1.1605 | 21865576 |
0.1205 | 0.2369 | 410 | 1.1583 | 22135264 |
0.1751 | 0.2398 | 415 | 1.1619 | 22410736 |
0.124 | 0.2427 | 420 | 1.1603 | 22680456 |
0.142 | 0.2456 | 425 | 1.1649 | 22948424 |
0.1839 | 0.2485 | 430 | 1.1570 | 23214400 |
0.1541 | 0.2514 | 435 | 1.1562 | 23486864 |
0.0948 | 0.2543 | 440 | 1.1594 | 23746952 |
0.1285 | 0.2572 | 445 | 1.1572 | 24015568 |
0.1433 | 0.2600 | 450 | 1.1528 | 24284792 |
0.2033 | 0.2629 | 455 | 1.1589 | 24561288 |
0.1147 | 0.2658 | 460 | 1.1542 | 24832496 |
0.173 | 0.2687 | 465 | 1.1497 | 25100680 |
0.2365 | 0.2716 | 470 | 1.1531 | 25373344 |
0.1564 | 0.2745 | 475 | 1.1545 | 25629416 |
0.1459 | 0.2774 | 480 | 1.1537 | 25904952 |
0.1871 | 0.2803 | 485 | 1.1507 | 26175216 |
0.1891 | 0.2832 | 490 | 1.1499 | 26447200 |
0.1656 | 0.2861 | 495 | 1.1496 | 26719080 |
0.1092 | 0.2889 | 500 | 1.1482 | 26988784 |
0.0956 | 0.2918 | 505 | 1.1488 | 27252480 |
0.1371 | 0.2947 | 510 | 1.1471 | 27519760 |
0.1419 | 0.2976 | 515 | 1.1501 | 27784472 |
0.1071 | 0.3005 | 520 | 1.1518 | 28057736 |
0.1213 | 0.3034 | 525 | 1.1462 | 28331792 |
0.1374 | 0.3063 | 530 | 1.1470 | 28601872 |
0.1052 | 0.3092 | 535 | 1.1514 | 28876696 |
0.1027 | 0.3121 | 540 | 1.1480 | 29153200 |
0.114 | 0.3149 | 545 | 1.1456 | 29419136 |
0.1467 | 0.3178 | 550 | 1.1462 | 29690448 |
0.1545 | 0.3207 | 555 | 1.1433 | 29954832 |
0.1217 | 0.3236 | 560 | 1.1460 | 30230016 |
0.1076 | 0.3265 | 565 | 1.1451 | 30495272 |
0.1805 | 0.3294 | 570 | 1.1423 | 30754952 |
0.1839 | 0.3323 | 575 | 1.1404 | 31027976 |
0.1233 | 0.3352 | 580 | 1.1440 | 31305520 |
0.1022 | 0.3381 | 585 | 1.1456 | 31575856 |
0.0971 | 0.3410 | 590 | 1.1408 | 31850424 |
0.1356 | 0.3438 | 595 | 1.1426 | 32122800 |
0.1305 | 0.3467 | 600 | 1.1413 | 32395192 |
0.0849 | 0.3496 | 605 | 1.1413 | 32654824 |
0.1264 | 0.3525 | 610 | 1.1421 | 32918256 |
0.1419 | 0.3554 | 615 | 1.1414 | 33189352 |
0.104 | 0.3583 | 620 | 1.1369 | 33451696 |
0.1433 | 0.3612 | 625 | 1.1398 | 33722800 |
0.1505 | 0.3641 | 630 | 1.1439 | 33994944 |
0.0703 | 0.3670 | 635 | 1.1405 | 34267472 |
0.134 | 0.3698 | 640 | 1.1383 | 34536184 |
0.1174 | 0.3727 | 645 | 1.1398 | 34804600 |
0.1284 | 0.3756 | 650 | 1.1438 | 35074848 |
0.1429 | 0.3785 | 655 | 1.1401 | 35346456 |
0.1208 | 0.3814 | 660 | 1.1360 | 35619720 |
0.14 | 0.3843 | 665 | 1.1369 | 35891784 |
0.1607 | 0.3872 | 670 | 1.1438 | 36163960 |
0.1193 | 0.3901 | 675 | 1.1411 | 36433528 |
0.1593 | 0.3930 | 680 | 1.1339 | 36697904 |
0.1269 | 0.3959 | 685 | 1.1365 | 36970264 |
0.0766 | 0.3987 | 690 | 1.1388 | 37240080 |
0.1259 | 0.4016 | 695 | 1.1355 | 37512432 |
0.0942 | 0.4045 | 700 | 1.1348 | 37778016 |
0.131 | 0.4074 | 705 | 1.1376 | 38047800 |
0.1021 | 0.4103 | 710 | 1.1363 | 38319168 |
0.1372 | 0.4132 | 715 | 1.1373 | 38586624 |
0.1279 | 0.4161 | 720 | 1.1369 | 38862568 |
0.1177 | 0.4190 | 725 | 1.1333 | 39134768 |
0.1456 | 0.4219 | 730 | 1.1351 | 39403496 |
0.1071 | 0.4247 | 735 | 1.1381 | 39677728 |
0.124 | 0.4276 | 740 | 1.1315 | 39946528 |
0.1441 | 0.4305 | 745 | 1.1322 | 40221840 |
0.1645 | 0.4334 | 750 | 1.1377 | 40489184 |
0.153 | 0.4363 | 755 | 1.1347 | 40755656 |
0.174 | 0.4392 | 760 | 1.1333 | 41033360 |
0.1139 | 0.4421 | 765 | 1.1313 | 41305968 |
0.14 | 0.4450 | 770 | 1.1321 | 41578640 |
0.1803 | 0.4479 | 775 | 1.1310 | 41844968 |
0.1338 | 0.4508 | 780 | 1.1311 | 42113000 |
0.0806 | 0.4536 | 785 | 1.1312 | 42388904 |
0.1372 | 0.4565 | 790 | 1.1321 | 42664456 |
0.1154 | 0.4594 | 795 | 1.1332 | 42935688 |
0.1679 | 0.4623 | 800 | 1.1297 | 43208576 |
0.0879 | 0.4652 | 805 | 1.1279 | 43477656 |
0.0878 | 0.4681 | 810 | 1.1299 | 43746712 |
0.1751 | 0.4710 | 815 | 1.1301 | 44014432 |
0.134 | 0.4739 | 820 | 1.1269 | 44278352 |
0.0995 | 0.4768 | 825 | 1.1287 | 44545480 |
0.1473 | 0.4796 | 830 | 1.1289 | 44815368 |
0.1423 | 0.4825 | 835 | 1.1293 | 45083832 |
0.1379 | 0.4854 | 840 | 1.1316 | 45345832 |
0.0897 | 0.4883 | 845 | 1.1291 | 45609992 |
0.1085 | 0.4912 | 850 | 1.1283 | 45879408 |
0.1562 | 0.4941 | 855 | 1.1289 | 46147064 |
0.0833 | 0.4970 | 860 | 1.1285 | 46417608 |
0.1298 | 0.4999 | 865 | 1.1287 | 46689040 |
0.1749 | 0.5028 | 870 | 1.1293 | 46957408 |
0.1225 | 0.5057 | 875 | 1.1277 | 47224016 |
0.1683 | 0.5085 | 880 | 1.1267 | 47499592 |
0.1476 | 0.5114 | 885 | 1.1264 | 47770312 |
0.1367 | 0.5143 | 890 | 1.1280 | 48044608 |
0.0969 | 0.5172 | 895 | 1.1273 | 48317168 |
0.1579 | 0.5201 | 900 | 1.1260 | 48589712 |
0.1245 | 0.5230 | 905 | 1.1276 | 48862032 |
0.1124 | 0.5259 | 910 | 1.1267 | 49125680 |
0.1337 | 0.5288 | 915 | 1.1248 | 49401928 |
0.1126 | 0.5317 | 920 | 1.1301 | 49672856 |
0.1107 | 0.5345 | 925 | 1.1296 | 49933336 |
0.1176 | 0.5374 | 930 | 1.1268 | 50201496 |
0.0879 | 0.5403 | 935 | 1.1253 | 50475384 |
0.13 | 0.5432 | 940 | 1.1246 | 50736208 |
0.1853 | 0.5461 | 945 | 1.1259 | 51002040 |
0.1328 | 0.5490 | 950 | 1.1260 | 51265808 |
0.1891 | 0.5519 | 955 | 1.1242 | 51532000 |
0.1025 | 0.5548 | 960 | 1.1241 | 51804400 |
0.0983 | 0.5577 | 965 | 1.1270 | 52072064 |
0.1533 | 0.5606 | 970 | 1.1247 | 52343984 |
0.1387 | 0.5634 | 975 | 1.1218 | 52616648 |
0.0784 | 0.5663 | 980 | 1.1243 | 52883288 |
0.1404 | 0.5692 | 985 | 1.1244 | 53150032 |
0.1305 | 0.5721 | 990 | 1.1228 | 53424920 |
0.1637 | 0.5750 | 995 | 1.1241 | 53691184 |
0.0984 | 0.5779 | 1000 | 1.1233 | 53963336 |
0.124 | 0.5808 | 1005 | 1.1214 | 54235888 |
0.0857 | 0.5837 | 1010 | 1.1216 | 54510640 |
0.0896 | 0.5866 | 1015 | 1.1243 | 54788712 |
0.0709 | 0.5894 | 1020 | 1.1253 | 55055992 |
0.1305 | 0.5923 | 1025 | 1.1215 | 55326760 |
0.1741 | 0.5952 | 1030 | 1.1221 | 55594440 |
0.1327 | 0.5981 | 1035 | 1.1266 | 55859064 |
0.1567 | 0.6010 | 1040 | 1.1234 | 56126464 |
0.0897 | 0.6039 | 1045 | 1.1208 | 56388680 |
0.111 | 0.6068 | 1050 | 1.1235 | 56660632 |
0.1504 | 0.6097 | 1055 | 1.1234 | 56930880 |
0.1027 | 0.6126 | 1060 | 1.1202 | 57200120 |
0.1293 | 0.6155 | 1065 | 1.1230 | 57475672 |
0.07 | 0.6183 | 1070 | 1.1241 | 57750016 |
0.1054 | 0.6212 | 1075 | 1.1223 | 58009080 |
0.1256 | 0.6241 | 1080 | 1.1227 | 58275760 |
0.1456 | 0.6270 | 1085 | 1.1227 | 58546040 |
0.1854 | 0.6299 | 1090 | 1.1216 | 58814248 |
0.0928 | 0.6328 | 1095 | 1.1230 | 59074632 |
0.1196 | 0.6357 | 1100 | 1.1223 | 59345160 |
0.0722 | 0.6386 | 1105 | 1.1198 | 59620304 |
0.1418 | 0.6415 | 1110 | 1.1207 | 59886696 |
0.0948 | 0.6443 | 1115 | 1.1215 | 60152368 |
0.1127 | 0.6472 | 1120 | 1.1198 | 60423872 |
0.0763 | 0.6501 | 1125 | 1.1206 | 60683264 |
0.0965 | 0.6530 | 1130 | 1.1228 | 60954240 |
0.0782 | 0.6559 | 1135 | 1.1204 | 61226416 |
0.0636 | 0.6588 | 1140 | 1.1200 | 61491968 |
0.1603 | 0.6617 | 1145 | 1.1199 | 61762264 |
0.1672 | 0.6646 | 1150 | 1.1200 | 62034104 |
0.198 | 0.6675 | 1155 | 1.1184 | 62297544 |
0.0747 | 0.6704 | 1160 | 1.1183 | 62571152 |
0.0814 | 0.6732 | 1165 | 1.1192 | 62839080 |
0.147 | 0.6761 | 1170 | 1.1183 | 63112104 |
0.1748 | 0.6790 | 1175 | 1.1169 | 63381600 |
0.1412 | 0.6819 | 1180 | 1.1187 | 63650992 |
0.1344 | 0.6848 | 1185 | 1.1212 | 63926488 |
0.1112 | 0.6877 | 1190 | 1.1194 | 64191648 |
0.1186 | 0.6906 | 1195 | 1.1151 | 64454704 |
0.0859 | 0.6935 | 1200 | 1.1165 | 64722088 |
0.1359 | 0.6964 | 1205 | 1.1201 | 64993408 |
0.1185 | 0.6992 | 1210 | 1.1185 | 65261904 |
0.1285 | 0.7021 | 1215 | 1.1163 | 65531800 |
0.1617 | 0.7050 | 1220 | 1.1173 | 65800328 |
0.1886 | 0.7079 | 1225 | 1.1181 | 66066872 |
0.1623 | 0.7108 | 1230 | 1.1182 | 66340072 |
0.0973 | 0.7137 | 1235 | 1.1164 | 66609424 |
0.0896 | 0.7166 | 1240 | 1.1164 | 66875888 |
0.1043 | 0.7195 | 1245 | 1.1180 | 67147448 |
0.1796 | 0.7224 | 1250 | 1.1210 | 67416800 |
0.1422 | 0.7253 | 1255 | 1.1195 | 67681760 |
0.0819 | 0.7281 | 1260 | 1.1173 | 67953712 |
0.1168 | 0.7310 | 1265 | 1.1167 | 68223112 |
0.1783 | 0.7339 | 1270 | 1.1164 | 68497488 |
0.1004 | 0.7368 | 1275 | 1.1173 | 68762448 |
0.0932 | 0.7397 | 1280 | 1.1153 | 69034304 |
0.1222 | 0.7426 | 1285 | 1.1150 | 69300448 |
0.1537 | 0.7455 | 1290 | 1.1162 | 69562968 |
0.0953 | 0.7484 | 1295 | 1.1155 | 69832144 |
0.1476 | 0.7513 | 1300 | 1.1155 | 70098160 |
0.1287 | 0.7541 | 1305 | 1.1146 | 70363392 |
0.1314 | 0.7570 | 1310 | 1.1144 | 70631400 |
0.1352 | 0.7599 | 1315 | 1.1144 | 70902008 |
0.1231 | 0.7628 | 1320 | 1.1147 | 71172048 |
0.1088 | 0.7657 | 1325 | 1.1153 | 71434136 |
0.1473 | 0.7686 | 1330 | 1.1157 | 71701928 |
0.0929 | 0.7715 | 1335 | 1.1136 | 71969192 |
0.1138 | 0.7744 | 1340 | 1.1141 | 72242080 |
0.1344 | 0.7773 | 1345 | 1.1145 | 72511776 |
0.0779 | 0.7801 | 1350 | 1.1154 | 72774984 |
0.1322 | 0.7830 | 1355 | 1.1147 | 73039320 |
0.0853 | 0.7859 | 1360 | 1.1133 | 73304440 |
0.1302 | 0.7888 | 1365 | 1.1143 | 73572360 |
0.1115 | 0.7917 | 1370 | 1.1139 | 73842520 |
0.19 | 0.7946 | 1375 | 1.1129 | 74117776 |
0.1397 | 0.7975 | 1380 | 1.1125 | 74385568 |
0.1569 | 0.8004 | 1385 | 1.1132 | 74653080 |
0.0757 | 0.8033 | 1390 | 1.1133 | 74918640 |
0.1331 | 0.8062 | 1395 | 1.1145 | 75184976 |
0.1925 | 0.8090 | 1400 | 1.1119 | 75453696 |
0.1216 | 0.8119 | 1405 | 1.1108 | 75722760 |
0.108 | 0.8148 | 1410 | 1.1132 | 75995712 |
0.1057 | 0.8177 | 1415 | 1.1141 | 76266880 |
0.0918 | 0.8206 | 1420 | 1.1124 | 76539392 |
0.0889 | 0.8235 | 1425 | 1.1109 | 76805504 |
0.1425 | 0.8264 | 1430 | 1.1107 | 77075968 |
0.1161 | 0.8293 | 1435 | 1.1108 | 77346224 |
0.1434 | 0.8322 | 1440 | 1.1108 | 77612424 |
0.1273 | 0.8350 | 1445 | 1.1120 | 77880520 |
0.1539 | 0.8379 | 1450 | 1.1112 | 78148680 |
0.1179 | 0.8408 | 1455 | 1.1103 | 78413896 |
0.101 | 0.8437 | 1460 | 1.1120 | 78685208 |
0.1742 | 0.8466 | 1465 | 1.1120 | 78954704 |
0.096 | 0.8495 | 1470 | 1.1100 | 79229736 |
0.1489 | 0.8524 | 1475 | 1.1119 | 79493800 |
0.0916 | 0.8553 | 1480 | 1.1138 | 79767456 |
0.1456 | 0.8582 | 1485 | 1.1128 | 80032248 |
0.1606 | 0.8611 | 1490 | 1.1112 | 80307344 |
0.1173 | 0.8639 | 1495 | 1.1117 | 80577208 |
0.134 | 0.8668 | 1500 | 1.1106 | 80843944 |
0.0848 | 0.8697 | 1505 | 1.1096 | 81106024 |
0.1171 | 0.8726 | 1510 | 1.1111 | 81372520 |
0.1185 | 0.8755 | 1515 | 1.1116 | 81641328 |
0.1599 | 0.8784 | 1520 | 1.1120 | 81913752 |
0.1662 | 0.8813 | 1525 | 1.1118 | 82185840 |
0.1314 | 0.8842 | 1530 | 1.1125 | 82455216 |
0.1127 | 0.8871 | 1535 | 1.1102 | 82726184 |
0.1113 | 0.8899 | 1540 | 1.1110 | 82999432 |
0.1489 | 0.8928 | 1545 | 1.1090 | 83265168 |
0.0848 | 0.8957 | 1550 | 1.1088 | 83534296 |
0.0834 | 0.8986 | 1555 | 1.1121 | 83805400 |
0.1456 | 0.9015 | 1560 | 1.1120 | 84078160 |
0.1075 | 0.9044 | 1565 | 1.1095 | 84351552 |
0.0953 | 0.9073 | 1570 | 1.1085 | 84618336 |
0.1834 | 0.9102 | 1575 | 1.1081 | 84883592 |
0.1319 | 0.9131 | 1580 | 1.1102 | 85150008 |
0.1442 | 0.9160 | 1585 | 1.1108 | 85419616 |
0.1153 | 0.9188 | 1590 | 1.1093 | 85687680 |
0.0904 | 0.9217 | 1595 | 1.1099 | 85958648 |
0.0972 | 0.9246 | 1600 | 1.1107 | 86229976 |
0.1172 | 0.9275 | 1605 | 1.1113 | 86493616 |
0.0723 | 0.9304 | 1610 | 1.1101 | 86764208 |
0.1279 | 0.9333 | 1615 | 1.1091 | 87036296 |
0.099 | 0.9362 | 1620 | 1.1073 | 87308112 |
0.0814 | 0.9391 | 1625 | 1.1098 | 87575512 |
0.1072 | 0.9420 | 1630 | 1.1139 | 87842176 |
0.1261 | 0.9448 | 1635 | 1.1119 | 88115352 |
0.1281 | 0.9477 | 1640 | 1.1061 | 88379536 |
0.1616 | 0.9506 | 1645 | 1.1066 | 88645496 |
0.1611 | 0.9535 | 1650 | 1.1112 | 88915824 |
0.0981 | 0.9564 | 1655 | 1.1113 | 89189432 |
0.1089 | 0.9593 | 1660 | 1.1091 | 89461720 |
0.1117 | 0.9622 | 1665 | 1.1058 | 89741240 |
0.1199 | 0.9651 | 1670 | 1.1069 | 90009008 |
0.1241 | 0.9680 | 1675 | 1.1102 | 90282224 |
0.0795 | 0.9709 | 1680 | 1.1094 | 90541312 |
0.1547 | 0.9737 | 1685 | 1.1080 | 90817720 |
0.1424 | 0.9766 | 1690 | 1.1076 | 91089088 |
0.1278 | 0.9795 | 1695 | 1.1082 | 91358464 |
0.084 | 0.9824 | 1700 | 1.1078 | 91630048 |
0.103 | 0.9853 | 1705 | 1.1065 | 91901792 |
0.1038 | 0.9882 | 1710 | 1.1066 | 92170960 |
0.154 | 0.9911 | 1715 | 1.1070 | 92450184 |
0.0585 | 0.9940 | 1720 | 1.1050 | 92716376 |
0.0879 | 0.9969 | 1725 | 1.1047 | 92985760 |
0.1127 | 0.9997 | 1730 | 1.1054 | 93253824 |
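The table is easier to inspect programmatically. It shows training loss dropping far below validation loss (0.1127 vs 1.1054 at the final step). It also shows validation loss rising from 1.1947 at step 45 to 1.3022 at step 65 before slowly declining.

Below is a minimal parsing sketch. The `LogRow` type and `parse_row` helper are hypothetical names introduced here, and only a few rows are copied verbatim as sample input. Note that the "Input Tokens Seen" count may include padding tokens, so the per-step average is only an upper bound on real text tokens.

```python
from typing import NamedTuple, Optional


class LogRow(NamedTuple):
    train_loss: Optional[float]  # None where the table says "No log"
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


def parse_row(line: str) -> LogRow:
    """Parse one 'a | b | c | d | e |' row of the results table."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    train, epoch, step, val, tokens = cells
    return LogRow(
        train_loss=None if train == "No log" else float(train),
        epoch=float(epoch),
        step=int(step),
        val_loss=float(val),
        tokens_seen=int(tokens),
    )


# A handful of rows copied verbatim from the table above; paste the full
# table here to analyze the whole run.
SAMPLE = """\
No log | 0 | 0 | 1.3909 | 0 |
1.1898 | 0.0260 | 45 | 1.1947 | 2443712 |
0.6104 | 0.0376 | 65 | 1.3022 | 3516912 |
0.0879 | 0.9969 | 1725 | 1.1047 | 92985760 |
0.1127 | 0.9997 | 1730 | 1.1054 | 93253824 |
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
best = min(rows, key=lambda r: r.val_loss)
last = rows[-1]
print(f"lowest validation loss {best.val_loss} at step {best.step}")
print(f"mean input tokens per optimizer step: {last.tokens_seen / last.step:.0f}")
```

In the full table, the lowest validation loss (1.1047) occurs at step 1725, just before the final evaluation.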
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter18_sftsd2
- Base model: google/gemma-2-2b