diff --git "a/main.log" "b/main.log" --- "a/main.log" +++ "b/main.log" @@ -1,5 +1,5 @@ -[2023-12-07 10:23:28,033][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. -[2023-12-07 10:23:28,034][Main][INFO] - Distributed environment: NO +[2024-01-02 07:29:30,393][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. +[2024-01-02 07:29:30,395][Main][INFO] - Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 @@ -7,886 +7,919 @@ Device: cuda Mixed precision type: bf16 -[2023-12-07 10:23:28,035][Main][INFO] - Working directory is /home/jovyan/nanoT5/logs/2023-12-07/10-23-27- -[2023-12-07 10:23:34,494][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00228-of-00512.json.gz -[2023-12-07 10:23:34,494][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00122-of-00512.json.gz -[2023-12-07 10:26:52,223][Main][INFO] - [train] Step 100 out of 65536 | Loss --> 52.126 | Grad_l2 --> 99.505 | Weights_l2 --> 7039.195 | Lr --> 0.010 | Seconds_per_step --> 1.989 | -[2023-12-07 10:29:35,713][Main][INFO] - [train] Step 200 out of 65536 | Loss --> 10.545 | Grad_l2 --> 10.624 | Weights_l2 --> 7038.472 | Lr --> 0.010 | Seconds_per_step --> 1.635 | -[2023-12-07 10:32:18,359][Main][INFO] - [train] Step 300 out of 65536 | Loss --> 9.970 | Grad_l2 --> 11.171 | Weights_l2 --> 7038.803 | Lr --> 0.010 | Seconds_per_step --> 1.626 | -[2023-12-07 10:35:01,852][Main][INFO] - [train] Step 400 out of 65536 | Loss --> 7.047 | Grad_l2 --> 2.897 | Weights_l2 --> 7039.086 | Lr --> 0.010 | Seconds_per_step --> 1.635 | -[2023-12-07 10:37:51,523][Main][INFO] - [train] Step 500 out of 65536 | Loss --> 6.628 | Grad_l2 --> 1.774 | Weights_l2 --> 7039.514 | Lr --> 0.011 | Seconds_per_step --> 1.697 | -[2023-12-07 10:40:34,317][Main][INFO] - [train] Step 600 out of 65536 | Loss --> 6.396 | Grad_l2 --> 1.342 | Weights_l2 --> 7040.790 | Lr --> 0.011 | Seconds_per_step --> 1.628 | -[2023-12-07 10:40:39,527][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00274-of-00512.json.gz -[2023-12-07 10:41:07,486][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00185-of-00512.json.gz -[2023-12-07 10:43:19,864][Main][INFO] - [train] Step 700 out of 65536 | Loss --> 6.233 | Grad_l2 --> 1.181 | Weights_l2 --> 7043.264 | Lr --> 0.011 | Seconds_per_step --> 1.655 | -[2023-12-07 10:46:05,571][Main][INFO] - [train] Step 800 out of 65536 | Loss --> 6.100 | Grad_l2 --> 0.966 | Weights_l2 --> 7046.947 | Lr --> 0.011 | Seconds_per_step --> 1.657 | -[2023-12-07 10:48:49,373][Main][INFO] - [train] Step 900 out of 65536 | Loss --> 5.995 | Grad_l2 --> 0.857 | Weights_l2 --> 7051.087 | Lr --> 0.011 | Seconds_per_step --> 1.638 | -[2023-12-07 10:51:36,625][Main][INFO] - [train] Step 1000 out of 65536 | Loss --> 5.887 | Grad_l2 --> 0.898 | Weights_l2 --> 7055.274 | Lr --> 0.011 | Seconds_per_step --> 1.673 | -[2023-12-07 10:54:20,070][Main][INFO] - [train] Step 1100 out of 65536 | Loss --> 5.800 | Grad_l2 --> 0.856 | Weights_l2 --> 7059.892 | Lr --> 0.011 | Seconds_per_step --> 1.634 | -[2023-12-07 10:57:04,948][Main][INFO] - [train] Step 1200 out of 65536 | Loss --> 5.737 | Grad_l2 --> 0.752 | Weights_l2 --> 7065.135 | Lr --> 0.011 | Seconds_per_step --> 1.649 | -[2023-12-07 10:59:49,423][Main][INFO] - [train] Step 1300 out of 65536 | Loss --> 5.658 | Grad_l2 --> 0.790 | Weights_l2 --> 7071.026 | Lr --> 0.011 | Seconds_per_step --> 1.645 | -[2023-12-07 11:02:32,611][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00430-of-00512.json.gz -[2023-12-07 11:02:33,041][Main][INFO] - [train] Step 1400 out of 65536 | Loss --> 5.592 | Grad_l2 --> 0.745 | Weights_l2 --> 7077.585 | Lr --> 0.011 | Seconds_per_step --> 1.636 | -[2023-12-07 11:02:44,414][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00437-of-00512.json.gz -[2023-12-07 11:05:25,357][Main][INFO] - [train] Step 1500 out of 65536 | Loss --> 5.513 | Grad_l2 --> 0.711 | Weights_l2 --> 7084.555 | Lr --> 0.012 | Seconds_per_step --> 1.723 | -[2023-12-07 11:08:10,226][Main][INFO] - [train] Step 1600 out of 65536 | Loss --> 5.454 | Grad_l2 --> 0.700 | Weights_l2 --> 7091.847 | Lr --> 0.012 | Seconds_per_step --> 1.649 | -[2023-12-07 11:10:53,547][Main][INFO] - [train] Step 1700 out of 65536 | Loss --> 5.409 | Grad_l2 --> 0.676 | Weights_l2 --> 7099.487 | Lr --> 0.012 | Seconds_per_step --> 1.633 | -[2023-12-07 11:13:37,408][Main][INFO] - [train] Step 1800 out of 65536 | Loss --> 5.356 | Grad_l2 --> 0.654 | Weights_l2 --> 7107.333 | Lr --> 0.012 | Seconds_per_step --> 1.639 | -[2023-12-07 11:16:22,556][Main][INFO] - [train] Step 1900 out of 65536 | Loss --> 5.341 | Grad_l2 --> 0.912 | Weights_l2 --> 7115.152 | Lr --> 0.012 | Seconds_per_step --> 1.651 | -[2023-12-07 11:19:09,122][Main][INFO] - [train] Step 2000 out of 65536 | Loss --> 5.288 | Grad_l2 --> 0.642 | Weights_l2 --> 7123.527 | Lr --> 0.012 | Seconds_per_step --> 1.666 | -[2023-12-07 11:21:53,737][Main][INFO] - [train] Step 2100 out of 65536 | Loss --> 5.251 | Grad_l2 --> 0.617 | Weights_l2 --> 7132.043 | Lr --> 0.012 | Seconds_per_step --> 1.646 | -[2023-12-07 11:24:08,911][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00113-of-00512.json.gz -[2023-12-07 11:24:16,457][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00087-of-00512.json.gz -[2023-12-07 11:24:41,243][Main][INFO] - [train] Step 2200 out of 65536 | Loss --> 5.210 | Grad_l2 --> 0.602 | Weights_l2 --> 7140.854 | Lr --> 0.012 | Seconds_per_step --> 1.675 | -[2023-12-07 11:27:24,961][Main][INFO] - [train] Step 2300 out of 65536 | Loss --> 5.178 | Grad_l2 --> 0.596 | Weights_l2 --> 7150.111 | Lr --> 0.012 | Seconds_per_step --> 1.637 | -[2023-12-07 11:30:08,150][Main][INFO] - [train] Step 2400 out of 65536 | Loss --> 5.135 | Grad_l2 --> 0.594 | Weights_l2 --> 7159.698 | Lr --> 0.012 | Seconds_per_step --> 1.632 | -[2023-12-07 11:32:51,698][Main][INFO] - [train] Step 2500 out of 65536 | Loss --> 5.061 | Grad_l2 --> 0.622 | Weights_l2 --> 7169.855 | Lr --> 0.013 | Seconds_per_step --> 1.635 | -[2023-12-07 11:35:33,890][Main][INFO] - [train] Step 2600 out of 65536 | Loss --> 4.909 | Grad_l2 --> 0.616 | Weights_l2 --> 7181.470 | Lr --> 0.013 | Seconds_per_step --> 1.622 | -[2023-12-07 11:38:22,968][Main][INFO] - [train] Step 2700 out of 65536 | Loss --> 4.681 | Grad_l2 --> 0.605 | Weights_l2 --> 7194.923 | Lr --> 0.013 | Seconds_per_step --> 1.691 | -[2023-12-07 11:41:09,256][Main][INFO] - [train] Step 2800 out of 65536 | Loss --> 4.491 | Grad_l2 --> 0.600 | Weights_l2 --> 7210.022 | Lr --> 0.013 | Seconds_per_step --> 1.663 | -[2023-12-07 11:43:52,967][Main][INFO] - [train] Step 2900 out of 65536 | Loss --> 4.293 | Grad_l2 --> 0.601 | Weights_l2 --> 7227.237 | Lr --> 0.013 | Seconds_per_step --> 1.637 | -[2023-12-07 11:45:28,337][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00389-of-00512.json.gz -[2023-12-07 11:45:59,095][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00483-of-00512.json.gz -[2023-12-07 11:46:41,166][Main][INFO] - [train] Step 3000 out of 65536 | Loss --> 4.138 | Grad_l2 --> 0.589 | Weights_l2 --> 7245.406 | Lr --> 0.013 | Seconds_per_step --> 1.682 | -[2023-12-07 11:49:24,746][Main][INFO] - [train] Step 3100 out of 65536 | Loss --> 4.009 | Grad_l2 --> 0.567 | Weights_l2 --> 7263.675 | Lr --> 0.013 | Seconds_per_step --> 1.636 | -[2023-12-07 11:52:07,871][Main][INFO] - [train] Step 3200 out of 65536 | Loss --> 3.909 | Grad_l2 --> 0.571 | Weights_l2 --> 7281.685 | Lr --> 0.013 | Seconds_per_step --> 1.631 | -[2023-12-07 11:54:51,528][Main][INFO] - [train] Step 3300 out of 65536 | Loss --> 3.816 | Grad_l2 --> 0.576 | Weights_l2 --> 7299.196 | Lr --> 0.013 | Seconds_per_step --> 1.637 | -[2023-12-07 11:57:37,824][Main][INFO] - [train] Step 3400 out of 65536 | Loss --> 3.745 | Grad_l2 --> 0.566 | Weights_l2 --> 7316.550 | Lr --> 0.013 | Seconds_per_step --> 1.663 | -[2023-12-07 12:00:21,564][Main][INFO] - [train] Step 3500 out of 65536 | Loss --> 3.675 | Grad_l2 --> 0.561 | Weights_l2 --> 7333.831 | Lr --> 0.014 | Seconds_per_step --> 1.637 | -[2023-12-07 12:03:05,470][Main][INFO] - [train] Step 3600 out of 65536 | Loss --> 3.619 | Grad_l2 --> 0.533 | Weights_l2 --> 7350.971 | Lr --> 0.014 | Seconds_per_step --> 1.639 | -[2023-12-07 12:05:48,932][Main][INFO] - [train] Step 3700 out of 65536 | Loss --> 3.570 | Grad_l2 --> 0.537 | Weights_l2 --> 7368.034 | Lr --> 0.014 | Seconds_per_step --> 1.635 | -[2023-12-07 12:07:23,360][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00174-of-00512.json.gz -[2023-12-07 12:07:47,404][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00440-of-00512.json.gz -[2023-12-07 12:08:35,939][Main][INFO] - [train] Step 3800 out of 65536 | Loss --> 3.516 | Grad_l2 --> 0.541 | Weights_l2 --> 7385.155 | Lr --> 0.014 | Seconds_per_step --> 1.670 | -[2023-12-07 12:11:21,828][Main][INFO] - [train] Step 3900 out of 65536 | Loss --> 3.471 | Grad_l2 --> 0.523 | Weights_l2 --> 7402.046 | Lr --> 0.014 | Seconds_per_step --> 1.659 | -[2023-12-07 12:14:07,620][Main][INFO] - [train] Step 4000 out of 65536 | Loss --> 3.452 | Grad_l2 --> 0.523 | Weights_l2 --> 7418.893 | Lr --> 0.014 | Seconds_per_step --> 1.658 | -[2023-12-07 12:16:56,141][Main][INFO] - [train] Step 4100 out of 65536 | Loss --> 3.399 | Grad_l2 --> 0.648 | Weights_l2 --> 7435.877 | Lr --> 0.014 | Seconds_per_step --> 1.685 | -[2023-12-07 12:19:39,316][Main][INFO] - [train] Step 4200 out of 65536 | Loss --> 3.413 | Grad_l2 --> 0.918 | Weights_l2 --> 7451.295 | Lr --> 0.014 | Seconds_per_step --> 1.632 | -[2023-12-07 12:22:25,213][Main][INFO] - [train] Step 4300 out of 65536 | Loss --> 3.346 | Grad_l2 --> 0.519 | Weights_l2 --> 7469.101 | Lr --> 0.014 | Seconds_per_step --> 1.659 | -[2023-12-07 12:25:06,960][Main][INFO] - [train] Step 4400 out of 65536 | Loss --> 3.305 | Grad_l2 --> 0.511 | Weights_l2 --> 7486.642 | Lr --> 0.014 | Seconds_per_step --> 1.617 | -[2023-12-07 12:27:52,490][Main][INFO] - [train] Step 4500 out of 65536 | Loss --> 3.260 | Grad_l2 --> 0.508 | Weights_l2 --> 7504.121 | Lr --> 0.015 | Seconds_per_step --> 1.655 | -[2023-12-07 12:28:53,656][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00133-of-00512.json.gz -[2023-12-07 12:29:02,279][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00364-of-00512.json.gz -[2023-12-07 12:30:38,147][Main][INFO] - [train] Step 4600 out of 65536 | Loss --> 3.224 | Grad_l2 --> 0.503 | Weights_l2 --> 7521.369 | Lr --> 0.015 | Seconds_per_step --> 1.657 | -[2023-12-07 12:33:21,106][Main][INFO] - [train] Step 4700 out of 65536 | Loss --> 3.187 | Grad_l2 --> 0.495 | Weights_l2 --> 7538.803 | Lr --> 0.015 | Seconds_per_step --> 1.630 | -[2023-12-07 12:36:10,061][Main][INFO] - [train] Step 4800 out of 65536 | Loss --> 3.169 | Grad_l2 --> 0.486 | Weights_l2 --> 7556.290 | Lr --> 0.015 | Seconds_per_step --> 1.690 | -[2023-12-07 12:38:52,975][Main][INFO] - [train] Step 4900 out of 65536 | Loss --> 3.153 | Grad_l2 --> 0.497 | Weights_l2 --> 7573.844 | Lr --> 0.015 | Seconds_per_step --> 1.629 | -[2023-12-07 12:41:36,376][Main][INFO] - [train] Step 5000 out of 65536 | Loss --> 3.120 | Grad_l2 --> 0.487 | Weights_l2 --> 7591.507 | Lr --> 0.015 | Seconds_per_step --> 1.634 | -[2023-12-07 12:44:19,678][Main][INFO] - [train] Step 5100 out of 65536 | Loss --> 3.101 | Grad_l2 --> 0.473 | Weights_l2 --> 7609.343 | Lr --> 0.015 | Seconds_per_step --> 1.633 | -[2023-12-07 12:47:03,093][Main][INFO] - [train] Step 5200 out of 65536 | Loss --> 3.069 | Grad_l2 --> 0.474 | Weights_l2 --> 7627.107 | Lr --> 0.015 | Seconds_per_step --> 1.634 | -[2023-12-07 12:49:49,291][Main][INFO] - [train] Step 5300 out of 65536 | Loss --> 3.051 | Grad_l2 --> 0.466 | Weights_l2 --> 7644.872 | Lr --> 0.015 | Seconds_per_step --> 1.662 | -[2023-12-07 12:50:23,930][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00482-of-00512.json.gz -[2023-12-07 12:50:42,413][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00177-of-00512.json.gz -[2023-12-07 12:52:35,837][Main][INFO] - [train] Step 5400 out of 65536 | Loss --> 3.033 | Grad_l2 --> 0.461 | Weights_l2 --> 7662.800 | Lr --> 0.015 | Seconds_per_step --> 1.665 | -[2023-12-07 12:55:21,543][Main][INFO] - [train] Step 5500 out of 65536 | Loss --> 3.036 | Grad_l2 --> 0.457 | Weights_l2 --> 7680.992 | Lr --> 0.016 | Seconds_per_step --> 1.657 | -[2023-12-07 12:58:05,030][Main][INFO] - [train] Step 5600 out of 65536 | Loss --> 3.006 | Grad_l2 --> 0.461 | Weights_l2 --> 7699.308 | Lr --> 0.016 | Seconds_per_step --> 1.635 | -[2023-12-07 13:00:50,031][Main][INFO] - [train] Step 5700 out of 65536 | Loss --> 2.985 | Grad_l2 --> 0.447 | Weights_l2 --> 7717.695 | Lr --> 0.016 | Seconds_per_step --> 1.650 | -[2023-12-07 13:03:34,890][Main][INFO] - [train] Step 5800 out of 65536 | Loss --> 2.971 | Grad_l2 --> 0.452 | Weights_l2 --> 7736.224 | Lr --> 0.016 | Seconds_per_step --> 1.648 | -[2023-12-07 13:06:21,329][Main][INFO] - [train] Step 5900 out of 65536 | Loss --> 2.953 | Grad_l2 --> 0.449 | Weights_l2 --> 7755.170 | Lr --> 0.016 | Seconds_per_step --> 1.664 | -[2023-12-07 13:09:05,784][Main][INFO] - [train] Step 6000 out of 65536 | Loss --> 2.926 | Grad_l2 --> 0.446 | Weights_l2 --> 7774.076 | Lr --> 0.016 | Seconds_per_step --> 1.645 | -[2023-12-07 13:11:50,660][Main][INFO] - [train] Step 6100 out of 65536 | Loss --> 2.912 | Grad_l2 --> 0.446 | Weights_l2 --> 7793.126 | Lr --> 0.016 | Seconds_per_step --> 1.649 | -[2023-12-07 13:12:43,999][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00005-of-00512.json.gz -[2023-12-07 13:12:55,931][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00279-of-00512.json.gz -[2023-12-07 13:14:35,240][Main][INFO] - [train] Step 6200 out of 65536 | Loss --> 2.911 | Grad_l2 --> 0.438 | Weights_l2 --> 7812.422 | Lr --> 0.016 | Seconds_per_step --> 1.646 | -[2023-12-07 13:17:20,468][Main][INFO] - [train] Step 6300 out of 65536 | Loss --> 2.907 | Grad_l2 --> 0.432 | Weights_l2 --> 7832.023 | Lr --> 0.016 | Seconds_per_step --> 1.652 | -[2023-12-07 13:20:03,838][Main][INFO] - [train] Step 6400 out of 65536 | Loss --> 2.895 | Grad_l2 --> 0.425 | Weights_l2 --> 7851.646 | Lr --> 0.016 | Seconds_per_step --> 1.634 | -[2023-12-07 13:22:46,883][Main][INFO] - [train] Step 6500 out of 65536 | Loss --> 2.876 | Grad_l2 --> 0.428 | Weights_l2 --> 7871.571 | Lr --> 0.017 | Seconds_per_step --> 1.630 | -[2023-12-07 13:25:33,551][Main][INFO] - [train] Step 6600 out of 65536 | Loss --> 2.871 | Grad_l2 --> 0.422 | Weights_l2 --> 7891.751 | Lr --> 0.017 | Seconds_per_step --> 1.667 | -[2023-12-07 13:28:18,581][Main][INFO] - [train] Step 6700 out of 65536 | Loss --> 2.864 | Grad_l2 --> 0.424 | Weights_l2 --> 7912.017 | Lr --> 0.017 | Seconds_per_step --> 1.650 | -[2023-12-07 13:31:03,667][Main][INFO] - [train] Step 6800 out of 65536 | Loss --> 2.854 | Grad_l2 --> 0.411 | Weights_l2 --> 7932.649 | Lr --> 0.017 | Seconds_per_step --> 1.651 | -[2023-12-07 13:33:35,435][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00281-of-00512.json.gz -[2023-12-07 13:33:47,432][Main][INFO] - [train] Step 6900 out of 65536 | Loss --> 2.841 | Grad_l2 --> 0.410 | Weights_l2 --> 7953.490 | Lr --> 0.017 | Seconds_per_step --> 1.638 | -[2023-12-07 13:34:00,850][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00139-of-00512.json.gz -[2023-12-07 13:36:33,485][Main][INFO] - [train] Step 7000 out of 65536 | Loss --> 2.830 | Grad_l2 --> 0.407 | Weights_l2 --> 7974.751 | Lr --> 0.017 | Seconds_per_step --> 1.661 | -[2023-12-07 13:39:16,927][Main][INFO] - [train] Step 7100 out of 65536 | Loss --> 2.821 | Grad_l2 --> 0.407 | Weights_l2 --> 7996.017 | Lr --> 0.017 | Seconds_per_step --> 1.634 | -[2023-12-07 13:42:00,404][Main][INFO] - [train] Step 7200 out of 65536 | Loss --> 2.819 | Grad_l2 --> 0.394 | Weights_l2 --> 8017.339 | Lr --> 0.017 | Seconds_per_step --> 1.635 | -[2023-12-07 13:44:42,321][Main][INFO] - [train] Step 7300 out of 65536 | Loss --> 2.805 | Grad_l2 --> 0.395 | Weights_l2 --> 8038.960 | Lr --> 0.017 | Seconds_per_step --> 1.619 | -[2023-12-07 13:47:27,105][Main][INFO] - [train] Step 7400 out of 65536 | Loss --> 2.804 | Grad_l2 --> 0.406 | Weights_l2 --> 8060.828 | Lr --> 0.017 | Seconds_per_step --> 1.648 | -[2023-12-07 13:50:12,896][Main][INFO] - [train] Step 7500 out of 65536 | Loss --> 2.803 | Grad_l2 --> 0.515 | Weights_l2 --> 8083.113 | Lr --> 0.018 | Seconds_per_step --> 1.658 | -[2023-12-07 13:52:58,928][Main][INFO] - [train] Step 7600 out of 65536 | Loss --> 2.763 | Grad_l2 --> 0.393 | Weights_l2 --> 8105.723 | Lr --> 0.018 | Seconds_per_step --> 1.660 | -[2023-12-07 13:55:29,680][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00407-of-00512.json.gz -[2023-12-07 13:55:44,108][Main][INFO] - [train] Step 7700 out of 65536 | Loss --> 2.758 | Grad_l2 --> 0.382 | Weights_l2 --> 8128.239 | Lr --> 0.018 | Seconds_per_step --> 1.652 | -[2023-12-07 13:56:00,709][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00058-of-00512.json.gz -[2023-12-07 13:58:30,139][Main][INFO] - [train] Step 7800 out of 65536 | Loss --> 2.728 | Grad_l2 --> 0.384 | Weights_l2 --> 8151.077 | Lr --> 0.018 | Seconds_per_step --> 1.660 | -[2023-12-07 14:01:12,534][Main][INFO] - [train] Step 7900 out of 65536 | Loss --> 2.749 | Grad_l2 --> 0.378 | Weights_l2 --> 8174.121 | Lr --> 0.018 | Seconds_per_step --> 1.624 | -[2023-12-07 14:03:56,599][Main][INFO] - [train] Step 8000 out of 65536 | Loss --> 2.745 | Grad_l2 --> 0.369 | Weights_l2 --> 8197.295 | Lr --> 0.018 | Seconds_per_step --> 1.641 | -[2023-12-07 14:06:41,273][Main][INFO] - [train] Step 8100 out of 65536 | Loss --> 2.731 | Grad_l2 --> 0.368 | Weights_l2 --> 8220.798 | Lr --> 0.018 | Seconds_per_step --> 1.647 | -[2023-12-07 14:09:25,496][Main][INFO] - [train] Step 8200 out of 65536 | Loss --> 2.703 | Grad_l2 --> 0.366 | Weights_l2 --> 8244.246 | Lr --> 0.018 | Seconds_per_step --> 1.642 | -[2023-12-07 14:12:11,736][Main][INFO] - [train] Step 8300 out of 65536 | Loss --> 2.709 | Grad_l2 --> 0.358 | Weights_l2 --> 8268.152 | Lr --> 0.018 | Seconds_per_step --> 1.662 | -[2023-12-07 14:14:55,363][Main][INFO] - [train] Step 8400 out of 65536 | Loss --> 2.702 | Grad_l2 --> 0.362 | Weights_l2 --> 8292.280 | Lr --> 0.018 | Seconds_per_step --> 1.636 | -[2023-12-07 14:16:45,703][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00020-of-00512.json.gz -[2023-12-07 14:17:04,913][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00182-of-00512.json.gz -[2023-12-07 14:17:42,061][Main][INFO] - [train] Step 8500 out of 65536 | Loss --> 2.688 | Grad_l2 --> 0.361 | Weights_l2 --> 8316.665 | Lr --> 0.019 | Seconds_per_step --> 1.667 | -[2023-12-07 14:20:24,907][Main][INFO] - [train] Step 8600 out of 65536 | Loss --> 2.682 | Grad_l2 --> 0.354 | Weights_l2 --> 8341.256 | Lr --> 0.019 | Seconds_per_step --> 1.628 | -[2023-12-07 14:23:08,097][Main][INFO] - [train] Step 8700 out of 65536 | Loss --> 2.682 | Grad_l2 --> 0.352 | Weights_l2 --> 8365.983 | Lr --> 0.019 | Seconds_per_step --> 1.632 | -[2023-12-07 14:25:55,571][Main][INFO] - [train] Step 8800 out of 65536 | Loss --> 2.648 | Grad_l2 --> 0.344 | Weights_l2 --> 8391.130 | Lr --> 0.019 | Seconds_per_step --> 1.675 | -[2023-12-07 14:28:39,180][Main][INFO] - [train] Step 8900 out of 65536 | Loss --> 2.646 | Grad_l2 --> 0.349 | Weights_l2 --> 8416.312 | Lr --> 0.019 | Seconds_per_step --> 1.636 | -[2023-12-07 14:31:25,574][Main][INFO] - [train] Step 9000 out of 65536 | Loss --> 2.662 | Grad_l2 --> 0.345 | Weights_l2 --> 8441.909 | Lr --> 0.019 | Seconds_per_step --> 1.664 | -[2023-12-07 14:34:09,090][Main][INFO] - [train] Step 9100 out of 65536 | Loss --> 2.648 | Grad_l2 --> 0.344 | Weights_l2 --> 8467.707 | Lr --> 0.019 | Seconds_per_step --> 1.635 | -[2023-12-07 14:36:52,541][Main][INFO] - [train] Step 9200 out of 65536 | Loss --> 2.651 | Grad_l2 --> 0.341 | Weights_l2 --> 8493.979 | Lr --> 0.019 | Seconds_per_step --> 1.635 | -[2023-12-07 14:38:18,433][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00321-of-00512.json.gz -[2023-12-07 14:38:51,664][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00323-of-00512.json.gz -[2023-12-07 14:39:38,870][Main][INFO] - [train] Step 9300 out of 65536 | Loss --> 2.642 | Grad_l2 --> 0.334 | Weights_l2 --> 8520.284 | Lr --> 0.019 | Seconds_per_step --> 1.663 | -[2023-12-07 14:42:23,018][Main][INFO] - [train] Step 9400 out of 65536 | Loss --> 2.621 | Grad_l2 --> 0.334 | Weights_l2 --> 8546.966 | Lr --> 0.019 | Seconds_per_step --> 1.641 | -[2023-12-07 14:45:11,837][Main][INFO] - [train] Step 9500 out of 65536 | Loss --> 2.626 | Grad_l2 --> 0.329 | Weights_l2 --> 8573.770 | Lr --> 0.020 | Seconds_per_step --> 1.688 | -[2023-12-07 14:47:57,016][Main][INFO] - [train] Step 9600 out of 65536 | Loss --> 2.635 | Grad_l2 --> 0.326 | Weights_l2 --> 8600.857 | Lr --> 0.020 | Seconds_per_step --> 1.652 | -[2023-12-07 14:50:38,951][Main][INFO] - [train] Step 9700 out of 65536 | Loss --> 2.665 | Grad_l2 --> 0.437 | Weights_l2 --> 8629.295 | Lr --> 0.020 | Seconds_per_step --> 1.619 | -[2023-12-07 14:53:22,686][Main][INFO] - [train] Step 9800 out of 65536 | Loss --> 2.642 | Grad_l2 --> 0.325 | Weights_l2 --> 8657.007 | Lr --> 0.020 | Seconds_per_step --> 1.637 | -[2023-12-07 14:56:05,935][Main][INFO] - [train] Step 9900 out of 65536 | Loss --> 2.634 | Grad_l2 --> 0.313 | Weights_l2 --> 8684.909 | Lr --> 0.020 | Seconds_per_step --> 1.632 | -[2023-12-07 14:58:50,766][Main][INFO] - [train] Step 10000 out of 65536 | Loss --> 2.612 | Grad_l2 --> 0.311 | Weights_l2 --> 8712.799 | Lr --> 0.020 | Seconds_per_step --> 1.648 | -[2023-12-07 14:58:50,768][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-10000 -[2023-12-07 14:58:50,771][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading -[2023-12-07 14:58:52,713][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-10000/model.safetensors -[2023-12-07 14:58:56,099][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-10000/optimizer.bin -[2023-12-07 14:58:56,100][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-10000/scheduler.bin -[2023-12-07 14:58:56,100][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-10000/sampler.bin -[2023-12-07 14:58:56,100][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-10000/sampler_1.bin -[2023-12-07 14:58:56,102][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-10000/random_states_0.pkl -[2023-12-07 15:00:02,928][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00217-of-00512.json.gz -[2023-12-07 15:00:53,263][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00511-of-00512.json.gz -[2023-12-07 15:01:46,333][Main][INFO] - [train] Step 10100 out of 65536 | Loss --> 2.594 | Grad_l2 --> 0.316 | Weights_l2 --> 8740.908 | Lr --> 0.020 | Seconds_per_step --> 1.756 | -[2023-12-07 15:04:30,852][Main][INFO] - [train] Step 10200 out of 65536 | Loss --> 2.604 | Grad_l2 --> 0.304 | Weights_l2 --> 8768.914 | Lr --> 0.020 | Seconds_per_step --> 1.645 | -[2023-12-07 15:07:20,206][Main][INFO] - [train] Step 10300 out of 65536 | Loss --> 2.601 | Grad_l2 --> 0.311 | Weights_l2 --> 8796.842 | Lr --> 0.020 | Seconds_per_step --> 1.694 | -[2023-12-07 15:10:04,757][Main][INFO] - [train] Step 10400 out of 65536 | Loss --> 2.577 | Grad_l2 --> 0.307 | Weights_l2 --> 8824.727 | Lr --> 0.020 | Seconds_per_step --> 1.646 | -[2023-12-07 15:12:47,781][Main][INFO] - [train] Step 10500 out of 65536 | Loss --> 2.583 | Grad_l2 --> 0.303 | Weights_l2 --> 8852.810 | Lr --> 0.020 | Seconds_per_step --> 1.630 | -[2023-12-07 15:15:34,987][Main][INFO] - [train] Step 10600 out of 65536 | Loss --> 2.586 | Grad_l2 --> 0.305 | Weights_l2 --> 8880.808 | Lr --> 0.020 | Seconds_per_step --> 1.672 | -[2023-12-07 15:18:20,464][Main][INFO] - [train] Step 10700 out of 65536 | Loss --> 2.581 | Grad_l2 --> 0.299 | Weights_l2 --> 8908.768 | Lr --> 0.020 | Seconds_per_step --> 1.655 | -[2023-12-07 15:21:05,621][Main][INFO] - [train] Step 10800 out of 65536 | Loss --> 2.567 | Grad_l2 --> 0.303 | Weights_l2 --> 8936.775 | Lr --> 0.020 | Seconds_per_step --> 1.652 | -[2023-12-07 15:21:25,794][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00277-of-00512.json.gz -[2023-12-07 15:22:14,521][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00142-of-00512.json.gz -[2023-12-07 15:23:49,576][Main][INFO] - [train] Step 10900 out of 65536 | Loss --> 2.552 | Grad_l2 --> 0.291 | Weights_l2 --> 8964.806 | Lr --> 0.020 | Seconds_per_step --> 1.640 | -[2023-12-07 15:26:36,293][Main][INFO] - [train] Step 11000 out of 65536 | Loss --> 2.545 | Grad_l2 --> 0.295 | Weights_l2 --> 8992.954 | Lr --> 0.020 | Seconds_per_step --> 1.667 | -[2023-12-07 15:29:22,045][Main][INFO] - [train] Step 11100 out of 65536 | Loss --> 2.547 | Grad_l2 --> 0.299 | Weights_l2 --> 9020.810 | Lr --> 0.020 | Seconds_per_step --> 1.658 | -[2023-12-07 15:32:05,624][Main][INFO] - [train] Step 11200 out of 65536 | Loss --> 2.539 | Grad_l2 --> 0.290 | Weights_l2 --> 9048.768 | Lr --> 0.020 | Seconds_per_step --> 1.636 | -[2023-12-07 15:34:49,905][Main][INFO] - [train] Step 11300 out of 65536 | Loss --> 2.536 | Grad_l2 --> 0.294 | Weights_l2 --> 9076.638 | Lr --> 0.020 | Seconds_per_step --> 1.643 | -[2023-12-07 15:37:33,553][Main][INFO] - [train] Step 11400 out of 65536 | Loss --> 2.519 | Grad_l2 --> 0.296 | Weights_l2 --> 9104.713 | Lr --> 0.020 | Seconds_per_step --> 1.636 | -[2023-12-07 15:40:18,444][Main][INFO] - [train] Step 11500 out of 65536 | Loss --> 2.527 | Grad_l2 --> 0.282 | Weights_l2 --> 9132.718 | Lr --> 0.020 | Seconds_per_step --> 1.649 | -[2023-12-07 15:43:02,553][Main][INFO] - [train] Step 11600 out of 65536 | Loss --> 2.536 | Grad_l2 --> 0.290 | Weights_l2 --> 9160.736 | Lr --> 0.020 | Seconds_per_step --> 1.641 | -[2023-12-07 15:43:07,724][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00491-of-00512.json.gz -[2023-12-07 15:44:13,110][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00439-of-00512.json.gz -[2023-12-07 15:45:47,706][Main][INFO] - [train] Step 11700 out of 65536 | Loss --> 2.515 | Grad_l2 --> 0.290 | Weights_l2 --> 9188.762 | Lr --> 0.020 | Seconds_per_step --> 1.652 | -[2023-12-07 15:48:30,463][Main][INFO] - [train] Step 11800 out of 65536 | Loss --> 2.510 | Grad_l2 --> 0.279 | Weights_l2 --> 9216.709 | Lr --> 0.020 | Seconds_per_step --> 1.628 | -[2023-12-07 15:51:19,062][Main][INFO] - [train] Step 11900 out of 65536 | Loss --> 2.522 | Grad_l2 --> 0.276 | Weights_l2 --> 9244.797 | Lr --> 0.020 | Seconds_per_step --> 1.686 | -[2023-12-07 15:54:02,853][Main][INFO] - [train] Step 12000 out of 65536 | Loss --> 2.506 | Grad_l2 --> 0.280 | Weights_l2 --> 9272.830 | Lr --> 0.020 | Seconds_per_step --> 1.638 | -[2023-12-07 15:56:47,814][Main][INFO] - [train] Step 12100 out of 65536 | Loss --> 2.508 | Grad_l2 --> 0.276 | Weights_l2 --> 9300.795 | Lr --> 0.020 | Seconds_per_step --> 1.650 | -[2023-12-07 15:59:30,324][Main][INFO] - [train] Step 12200 out of 65536 | Loss --> 2.483 | Grad_l2 --> 0.293 | Weights_l2 --> 9328.601 | Lr --> 0.020 | Seconds_per_step --> 1.625 | -[2023-12-07 16:02:13,359][Main][INFO] - [train] Step 12300 out of 65536 | Loss --> 2.479 | Grad_l2 --> 0.287 | Weights_l2 --> 9356.875 | Lr --> 0.020 | Seconds_per_step --> 1.630 | -[2023-12-07 16:04:41,425][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00224-of-00512.json.gz -[2023-12-07 16:05:02,061][Main][INFO] - [train] Step 12400 out of 65536 | Loss --> 2.464 | Grad_l2 --> 0.284 | Weights_l2 --> 9385.079 | Lr --> 0.020 | Seconds_per_step --> 1.687 | -[2023-12-07 16:06:11,833][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00119-of-00512.json.gz -[2023-12-07 16:07:45,996][Main][INFO] - [train] Step 12500 out of 65536 | Loss --> 2.480 | Grad_l2 --> 0.282 | Weights_l2 --> 9413.199 | Lr --> 0.020 | Seconds_per_step --> 1.639 | -[2023-12-07 16:10:29,960][Main][INFO] - [train] Step 12600 out of 65536 | Loss --> 2.469 | Grad_l2 --> 0.269 | Weights_l2 --> 9441.196 | Lr --> 0.020 | Seconds_per_step --> 1.640 | -[2023-12-07 16:13:14,587][Main][INFO] - [train] Step 12700 out of 65536 | Loss --> 2.474 | Grad_l2 --> 0.276 | Weights_l2 --> 9469.353 | Lr --> 0.020 | Seconds_per_step --> 1.646 | -[2023-12-07 16:15:58,035][Main][INFO] - [train] Step 12800 out of 65536 | Loss --> 2.479 | Grad_l2 --> 0.270 | Weights_l2 --> 9497.355 | Lr --> 0.020 | Seconds_per_step --> 1.634 | -[2023-12-07 16:18:41,617][Main][INFO] - [train] Step 12900 out of 65536 | Loss --> 2.453 | Grad_l2 --> 0.275 | Weights_l2 --> 9525.279 | Lr --> 0.020 | Seconds_per_step --> 1.636 | -[2023-12-07 16:21:25,155][Main][INFO] - [train] Step 13000 out of 65536 | Loss --> 2.451 | Grad_l2 --> 0.268 | Weights_l2 --> 9553.459 | Lr --> 0.020 | Seconds_per_step --> 1.635 | -[2023-12-07 16:24:09,311][Main][INFO] - [train] Step 13100 out of 65536 | Loss --> 2.467 | Grad_l2 --> 0.278 | Weights_l2 --> 9581.751 | Lr --> 0.020 | Seconds_per_step --> 1.642 | -[2023-12-07 16:25:29,717][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00333-of-00512.json.gz -[2023-12-07 16:27:00,960][Main][INFO] - [train] Step 13200 out of 65536 | Loss --> 2.429 | Grad_l2 --> 0.275 | Weights_l2 --> 9609.667 | Lr --> 0.020 | Seconds_per_step --> 1.716 | -[2023-12-07 16:27:12,849][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00348-of-00512.json.gz -[2023-12-07 16:29:52,936][Main][INFO] - [train] Step 13300 out of 65536 | Loss --> 2.433 | Grad_l2 --> 0.268 | Weights_l2 --> 9637.712 | Lr --> 0.020 | Seconds_per_step --> 1.720 | -[2023-12-07 16:32:36,295][Main][INFO] - [train] Step 13400 out of 65536 | Loss --> 2.451 | Grad_l2 --> 0.263 | Weights_l2 --> 9665.697 | Lr --> 0.020 | Seconds_per_step --> 1.634 | -[2023-12-07 16:35:22,277][Main][INFO] - [train] Step 13500 out of 65536 | Loss --> 2.451 | Grad_l2 --> 0.276 | Weights_l2 --> 9693.319 | Lr --> 0.020 | Seconds_per_step --> 1.660 | -[2023-12-07 16:38:05,727][Main][INFO] - [train] Step 13600 out of 65536 | Loss --> 2.443 | Grad_l2 --> 0.267 | Weights_l2 --> 9720.969 | Lr --> 0.020 | Seconds_per_step --> 1.634 | -[2023-12-07 16:40:48,775][Main][INFO] - [train] Step 13700 out of 65536 | Loss --> 2.437 | Grad_l2 --> 0.266 | Weights_l2 --> 9749.050 | Lr --> 0.020 | Seconds_per_step --> 1.630 | -[2023-12-07 16:43:34,303][Main][INFO] - [train] Step 13800 out of 65536 | Loss --> 2.432 | Grad_l2 --> 0.265 | Weights_l2 --> 9776.968 | Lr --> 0.020 | Seconds_per_step --> 1.655 | -[2023-12-07 16:46:23,087][Main][INFO] - [train] Step 13900 out of 65536 | Loss --> 2.425 | Grad_l2 --> 0.267 | Weights_l2 --> 9804.793 | Lr --> 0.020 | Seconds_per_step --> 1.688 | -[2023-12-07 16:47:54,552][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00335-of-00512.json.gz -[2023-12-07 16:49:06,464][Main][INFO] - [train] Step 14000 out of 65536 | Loss --> 2.402 | Grad_l2 --> 0.263 | Weights_l2 --> 9832.773 | Lr --> 0.020 | Seconds_per_step --> 1.634 | -[2023-12-07 16:49:15,660][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00001-of-00512.json.gz -[2023-12-07 16:51:52,761][Main][INFO] - [train] Step 14100 out of 65536 | Loss --> 2.421 | Grad_l2 --> 0.263 | Weights_l2 --> 9860.695 | Lr --> 0.020 | Seconds_per_step --> 1.663 | -[2023-12-07 16:54:42,084][Main][INFO] - [train] Step 14200 out of 65536 | Loss --> 2.433 | Grad_l2 --> 0.268 | Weights_l2 --> 9888.457 | Lr --> 0.020 | Seconds_per_step --> 1.693 | -[2023-12-07 16:57:25,172][Main][INFO] - [train] Step 14300 out of 65536 | Loss --> 2.414 | Grad_l2 --> 0.264 | Weights_l2 --> 9916.213 | Lr --> 0.020 | Seconds_per_step --> 1.631 | -[2023-12-07 17:00:08,344][Main][INFO] - [train] Step 14400 out of 65536 | Loss --> 2.399 | Grad_l2 --> 0.255 | Weights_l2 --> 9943.847 | Lr --> 0.020 | Seconds_per_step --> 1.632 | -[2023-12-07 17:02:50,884][Main][INFO] - [train] Step 14500 out of 65536 | Loss --> 2.397 | Grad_l2 --> 0.263 | Weights_l2 --> 9971.712 | Lr --> 0.020 | Seconds_per_step --> 1.625 | -[2023-12-07 17:05:35,452][Main][INFO] - [train] Step 14600 out of 65536 | Loss --> 2.394 | Grad_l2 --> 0.259 | Weights_l2 --> 9999.560 | Lr --> 0.020 | Seconds_per_step --> 1.646 | -[2023-12-07 17:08:19,241][Main][INFO] - [train] Step 14700 out of 65536 | Loss --> 2.394 | Grad_l2 --> 0.261 | Weights_l2 --> 10027.189 | Lr --> 0.020 | Seconds_per_step --> 1.638 | -[2023-12-07 17:09:06,557][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00212-of-00512.json.gz -[2023-12-07 17:10:26,618][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00486-of-00512.json.gz -[2023-12-07 17:11:05,390][Main][INFO] - [train] Step 14800 out of 65536 | Loss --> 2.375 | Grad_l2 --> 0.255 | Weights_l2 --> 10054.910 | Lr --> 0.020 | Seconds_per_step --> 1.661 | -[2023-12-07 17:13:48,959][Main][INFO] - [train] Step 14900 out of 65536 | Loss --> 2.375 | Grad_l2 --> 0.263 | Weights_l2 --> 10082.499 | Lr --> 0.020 | Seconds_per_step --> 1.636 | -[2023-12-07 17:16:34,007][Main][INFO] - [train] Step 15000 out of 65536 | Loss --> 2.370 | Grad_l2 --> 0.256 | Weights_l2 --> 10110.061 | Lr --> 0.020 | Seconds_per_step --> 1.650 | -[2023-12-07 17:19:17,539][Main][INFO] - [train] Step 15100 out of 65536 | Loss --> 2.361 | Grad_l2 --> 0.255 | Weights_l2 --> 10137.511 | Lr --> 0.020 | Seconds_per_step --> 1.635 | -[2023-12-07 17:22:01,138][Main][INFO] - [train] Step 15200 out of 65536 | Loss --> 2.356 | Grad_l2 --> 0.246 | Weights_l2 --> 10165.257 | Lr --> 0.020 | Seconds_per_step --> 1.636 | -[2023-12-07 17:24:46,595][Main][INFO] - [train] Step 15300 out of 65536 | Loss --> 2.379 | Grad_l2 --> 0.311 | Weights_l2 --> 10195.037 | Lr --> 0.020 | Seconds_per_step --> 1.655 | -[2023-12-07 17:27:36,404][Main][INFO] - [train] Step 15400 out of 65536 | Loss --> 2.364 | Grad_l2 --> 0.255 | Weights_l2 --> 10223.204 | Lr --> 0.020 | Seconds_per_step --> 1.698 | -[2023-12-07 17:30:22,226][Main][INFO] - [train] Step 15500 out of 65536 | Loss --> 2.340 | Grad_l2 --> 0.250 | Weights_l2 --> 10250.658 | Lr --> 0.020 | Seconds_per_step --> 1.658 | -[2023-12-07 17:31:22,312][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00110-of-00512.json.gz -[2023-12-07 17:32:20,486][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00064-of-00512.json.gz -[2023-12-07 17:33:07,832][Main][INFO] - [train] Step 15600 out of 65536 | Loss --> 2.363 | Grad_l2 --> 0.247 | Weights_l2 --> 10278.145 | Lr --> 0.020 | Seconds_per_step --> 1.656 | -[2023-12-07 17:35:53,883][Main][INFO] - [train] Step 15700 out of 65536 | Loss --> 2.341 | Grad_l2 --> 0.251 | Weights_l2 --> 10305.312 | Lr --> 0.019 | Seconds_per_step --> 1.660 | -[2023-12-07 17:38:37,477][Main][INFO] - [train] Step 15800 out of 65536 | Loss --> 2.353 | Grad_l2 --> 0.252 | Weights_l2 --> 10332.346 | Lr --> 0.019 | Seconds_per_step --> 1.636 | -[2023-12-07 17:41:20,813][Main][INFO] - [train] Step 15900 out of 65536 | Loss --> 2.348 | Grad_l2 --> 0.247 | Weights_l2 --> 10359.602 | Lr --> 0.019 | Seconds_per_step --> 1.633 | -[2023-12-07 17:44:06,420][Main][INFO] - [train] Step 16000 out of 65536 | Loss --> 2.348 | Grad_l2 --> 0.260 | Weights_l2 --> 10386.853 | Lr --> 0.019 | Seconds_per_step --> 1.656 | -[2023-12-07 17:46:48,475][Main][INFO] - [train] Step 16100 out of 65536 | Loss --> 2.338 | Grad_l2 --> 0.254 | Weights_l2 --> 10414.156 | Lr --> 0.019 | Seconds_per_step --> 1.621 | -[2023-12-07 17:49:35,391][Main][INFO] - [train] Step 16200 out of 65536 | Loss --> 2.343 | Grad_l2 --> 0.250 | Weights_l2 --> 10441.502 | Lr --> 0.019 | Seconds_per_step --> 1.669 | -[2023-12-07 17:52:21,862][Main][INFO] - [train] Step 16300 out of 65536 | Loss --> 2.342 | Grad_l2 --> 0.260 | Weights_l2 --> 10468.715 | Lr --> 0.019 | Seconds_per_step --> 1.665 | -[2023-12-07 17:53:20,211][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00307-of-00512.json.gz -[2023-12-07 17:54:13,661][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00022-of-00512.json.gz -[2023-12-07 17:55:06,565][Main][INFO] - [train] Step 16400 out of 65536 | Loss --> 2.332 | Grad_l2 --> 0.256 | Weights_l2 --> 10495.637 | Lr --> 0.019 | Seconds_per_step --> 1.647 | -[2023-12-07 17:57:56,257][Main][INFO] - [train] Step 16500 out of 65536 | Loss --> 2.337 | Grad_l2 --> 0.256 | Weights_l2 --> 10522.869 | Lr --> 0.019 | Seconds_per_step --> 1.697 | -[2023-12-07 18:00:42,781][Main][INFO] - [train] Step 16600 out of 65536 | Loss --> 2.341 | Grad_l2 --> 0.249 | Weights_l2 --> 10550.043 | Lr --> 0.019 | Seconds_per_step --> 1.665 | -[2023-12-07 18:03:28,417][Main][INFO] - [train] Step 16700 out of 65536 | Loss --> 2.338 | Grad_l2 --> 0.252 | Weights_l2 --> 10577.119 | Lr --> 0.019 | Seconds_per_step --> 1.656 | -[2023-12-07 18:06:11,553][Main][INFO] - [train] Step 16800 out of 65536 | Loss --> 2.345 | Grad_l2 --> 0.244 | Weights_l2 --> 10604.182 | Lr --> 0.019 | Seconds_per_step --> 1.631 | -[2023-12-07 18:08:54,908][Main][INFO] - [train] Step 16900 out of 65536 | Loss --> 2.346 | Grad_l2 --> 0.242 | Weights_l2 --> 10631.226 | Lr --> 0.019 | Seconds_per_step --> 1.634 | -[2023-12-07 18:11:40,054][Main][INFO] - [train] Step 17000 out of 65536 | Loss --> 2.331 | Grad_l2 --> 0.244 | Weights_l2 --> 10658.006 | Lr --> 0.019 | Seconds_per_step --> 1.651 | -[2023-12-07 18:14:26,848][Main][INFO] - [train] Step 17100 out of 65536 | Loss --> 2.318 | Grad_l2 --> 0.252 | Weights_l2 --> 10684.868 | Lr --> 0.019 | Seconds_per_step --> 1.668 | -[2023-12-07 18:14:44,716][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00375-of-00512.json.gz -[2023-12-07 18:15:30,390][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00366-of-00512.json.gz -[2023-12-07 18:17:12,328][Main][INFO] - [train] Step 17200 out of 65536 | Loss --> 2.333 | Grad_l2 --> 0.260 | Weights_l2 --> 10711.491 | Lr --> 0.019 | Seconds_per_step --> 1.655 | -[2023-12-07 18:19:57,529][Main][INFO] - [train] Step 17300 out of 65536 | Loss --> 2.337 | Grad_l2 --> 0.243 | Weights_l2 --> 10738.522 | Lr --> 0.019 | Seconds_per_step --> 1.652 | -[2023-12-07 18:22:42,363][Main][INFO] - [train] Step 17400 out of 65536 | Loss --> 2.332 | Grad_l2 --> 0.248 | Weights_l2 --> 10765.553 | Lr --> 0.019 | Seconds_per_step --> 1.648 | -[2023-12-07 18:25:26,748][Main][INFO] - [train] Step 17500 out of 65536 | Loss --> 2.322 | Grad_l2 --> 0.236 | Weights_l2 --> 10792.616 | Lr --> 0.019 | Seconds_per_step --> 1.644 | -[2023-12-07 18:28:10,607][Main][INFO] - [train] Step 17600 out of 65536 | Loss --> 2.321 | Grad_l2 --> 0.244 | Weights_l2 --> 10819.467 | Lr --> 0.019 | Seconds_per_step --> 1.639 | -[2023-12-07 18:30:55,450][Main][INFO] - [train] Step 17700 out of 65536 | Loss --> 2.319 | Grad_l2 --> 0.253 | Weights_l2 --> 10846.597 | Lr --> 0.019 | Seconds_per_step --> 1.648 | -[2023-12-07 18:33:40,221][Main][INFO] - [train] Step 17800 out of 65536 | Loss --> 2.303 | Grad_l2 --> 0.244 | Weights_l2 --> 10873.485 | Lr --> 0.019 | Seconds_per_step --> 1.648 | -[2023-12-07 18:36:21,607][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00240-of-00512.json.gz -[2023-12-07 18:36:39,360][Main][INFO] - [train] Step 17900 out of 65536 | Loss --> 2.296 | Grad_l2 --> 0.242 | Weights_l2 --> 10900.446 | Lr --> 0.019 | Seconds_per_step --> 1.791 | -[2023-12-07 18:37:38,984][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00463-of-00512.json.gz -[2023-12-07 18:39:31,017][Main][INFO] - [train] Step 18000 out of 65536 | Loss --> 2.290 | Grad_l2 --> 0.242 | Weights_l2 --> 10927.105 | Lr --> 0.019 | Seconds_per_step --> 1.717 | -[2023-12-07 18:42:18,256][Main][INFO] - [train] Step 18100 out of 65536 | Loss --> 2.283 | Grad_l2 --> 0.244 | Weights_l2 --> 10953.904 | Lr --> 0.019 | Seconds_per_step --> 1.672 | -[2023-12-07 18:45:03,383][Main][INFO] - [train] Step 18200 out of 65536 | Loss --> 2.277 | Grad_l2 --> 0.241 | Weights_l2 --> 10980.524 | Lr --> 0.019 | Seconds_per_step --> 1.651 | -[2023-12-07 18:47:47,414][Main][INFO] - [train] Step 18300 out of 65536 | Loss --> 2.285 | Grad_l2 --> 0.240 | Weights_l2 --> 11006.922 | Lr --> 0.019 | Seconds_per_step --> 1.640 | -[2023-12-07 18:50:33,008][Main][INFO] - [train] Step 18400 out of 65536 | Loss --> 2.292 | Grad_l2 --> 0.239 | Weights_l2 --> 11033.427 | Lr --> 0.019 | Seconds_per_step --> 1.656 | -[2023-12-07 18:53:16,160][Main][INFO] - [train] Step 18500 out of 65536 | Loss --> 2.274 | Grad_l2 --> 0.238 | Weights_l2 --> 11060.012 | Lr --> 0.019 | Seconds_per_step --> 1.632 | -[2023-12-07 18:56:00,447][Main][INFO] - [train] Step 18600 out of 65536 | Loss --> 2.270 | Grad_l2 --> 0.234 | Weights_l2 --> 11086.420 | Lr --> 0.019 | Seconds_per_step --> 1.643 | -[2023-12-07 18:58:46,487][Main][INFO] - [train] Step 18700 out of 65536 | Loss --> 2.273 | Grad_l2 --> 0.239 | Weights_l2 --> 11112.857 | Lr --> 0.019 | Seconds_per_step --> 1.660 | -[2023-12-07 18:59:09,804][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00473-of-00512.json.gz -[2023-12-07 18:59:49,106][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00160-of-00512.json.gz -[2023-12-07 19:01:31,602][Main][INFO] - [train] Step 18800 out of 65536 | Loss --> 2.281 | Grad_l2 --> 0.240 | Weights_l2 --> 11139.419 | Lr --> 0.019 | Seconds_per_step --> 1.651 | -[2023-12-07 19:04:18,623][Main][INFO] - [train] Step 18900 out of 65536 | Loss --> 2.277 | Grad_l2 --> 0.252 | Weights_l2 --> 11165.724 | Lr --> 0.019 | Seconds_per_step --> 1.670 | -[2023-12-07 19:07:01,332][Main][INFO] - [train] Step 19000 out of 65536 | Loss --> 2.271 | Grad_l2 --> 0.253 | Weights_l2 --> 11191.982 | Lr --> 0.019 | Seconds_per_step --> 1.627 | -[2023-12-07 19:09:45,002][Main][INFO] - [train] Step 19100 out of 65536 | Loss --> 2.253 | Grad_l2 --> 0.238 | Weights_l2 --> 11218.354 | Lr --> 0.019 | Seconds_per_step --> 1.637 | -[2023-12-07 19:12:34,394][Main][INFO] - [train] Step 19200 out of 65536 | Loss --> 2.274 | Grad_l2 --> 0.252 | Weights_l2 --> 11244.747 | Lr --> 0.019 | Seconds_per_step --> 1.694 | -[2023-12-07 19:15:21,714][Main][INFO] - [train] Step 19300 out of 65536 | Loss --> 2.279 | Grad_l2 --> 0.248 | Weights_l2 --> 11271.531 | Lr --> 0.019 | Seconds_per_step --> 1.673 | -[2023-12-07 19:18:05,741][Main][INFO] - [train] Step 19400 out of 65536 | Loss --> 2.268 | Grad_l2 --> 0.248 | Weights_l2 --> 11297.770 | Lr --> 0.019 | Seconds_per_step --> 1.640 | -[2023-12-07 19:20:49,740][Main][INFO] - [train] Step 19500 out of 65536 | Loss --> 2.249 | Grad_l2 --> 0.248 | Weights_l2 --> 11323.624 | Lr --> 0.019 | Seconds_per_step --> 1.640 | -[2023-12-07 19:20:50,201][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00190-of-00512.json.gz -[2023-12-07 19:21:01,053][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00406-of-00512.json.gz -[2023-12-07 19:23:44,266][Main][INFO] - [train] Step 19600 out of 65536 | Loss --> 2.255 | Grad_l2 --> 0.248 | Weights_l2 --> 11349.514 | Lr --> 0.019 | Seconds_per_step --> 1.745 | -[2023-12-07 19:26:27,348][Main][INFO] - [train] Step 19700 out of 65536 | Loss --> 2.251 | Grad_l2 --> 0.250 | Weights_l2 --> 11375.584 | Lr --> 0.019 | Seconds_per_step --> 1.631 | -[2023-12-07 19:29:12,557][Main][INFO] - [train] Step 19800 out of 65536 | Loss --> 2.259 | Grad_l2 --> 0.245 | Weights_l2 --> 11401.649 | Lr --> 0.019 | Seconds_per_step --> 1.652 | -[2023-12-07 19:31:55,115][Main][INFO] - [train] Step 19900 out of 65536 | Loss --> 2.247 | Grad_l2 --> 0.243 | Weights_l2 --> 11427.471 | Lr --> 0.018 | Seconds_per_step --> 1.626 | -[2023-12-07 19:34:39,944][Main][INFO] - [train] Step 20000 out of 65536 | Loss --> 2.256 | Grad_l2 --> 0.236 | Weights_l2 --> 11453.381 | Lr --> 0.018 | Seconds_per_step --> 1.648 | -[2023-12-07 19:34:39,946][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000 -[2023-12-07 19:34:39,950][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading -[2023-12-07 19:34:42,537][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors -[2023-12-07 19:34:46,400][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin -[2023-12-07 19:34:46,401][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin -[2023-12-07 19:34:46,401][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin -[2023-12-07 19:34:46,402][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin -[2023-12-07 19:34:46,403][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl -[2023-12-07 19:37:31,396][Main][INFO] - [train] Step 20100 out of 65536 | Loss --> 2.243 | Grad_l2 --> 0.243 | Weights_l2 --> 11479.115 | Lr --> 0.018 | Seconds_per_step --> 1.715 | -[2023-12-07 19:40:15,358][Main][INFO] - [train] Step 20200 out of 65536 | Loss --> 2.240 | Grad_l2 --> 0.241 | Weights_l2 --> 11504.728 | Lr --> 0.018 | Seconds_per_step --> 1.640 | -[2023-12-07 19:42:38,485][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00214-of-00512.json.gz -[2023-12-07 19:43:04,504][Main][INFO] - [train] Step 20300 out of 65536 | Loss --> 2.247 | Grad_l2 --> 0.238 | Weights_l2 --> 11530.569 | Lr --> 0.018 | Seconds_per_step --> 1.691 | -[2023-12-07 19:43:19,710][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00492-of-00512.json.gz -[2023-12-07 19:45:50,797][Main][INFO] - [train] Step 20400 out of 65536 | Loss --> 2.240 | Grad_l2 --> 0.247 | Weights_l2 --> 11556.611 | Lr --> 0.018 | Seconds_per_step --> 1.663 | -[2023-12-07 19:48:35,032][Main][INFO] - [train] Step 20500 out of 65536 | Loss --> 2.239 | Grad_l2 --> 0.267 | Weights_l2 --> 11583.333 | Lr --> 0.018 | Seconds_per_step --> 1.642 | -[2023-12-07 19:51:18,968][Main][INFO] - [train] Step 20600 out of 65536 | Loss --> 2.237 | Grad_l2 --> 0.242 | Weights_l2 --> 11608.871 | Lr --> 0.018 | Seconds_per_step --> 1.639 | -[2023-12-07 19:54:06,135][Main][INFO] - [train] Step 20700 out of 65536 | Loss --> 2.233 | Grad_l2 --> 0.252 | Weights_l2 --> 11634.683 | Lr --> 0.018 | Seconds_per_step --> 1.672 | -[2023-12-07 19:56:51,417][Main][INFO] - [train] Step 20800 out of 65536 | Loss --> 2.227 | Grad_l2 --> 0.251 | Weights_l2 --> 11660.024 | Lr --> 0.018 | Seconds_per_step --> 1.653 | -[2023-12-07 19:59:34,828][Main][INFO] - [train] Step 20900 out of 65536 | Loss --> 2.245 | Grad_l2 --> 0.249 | Weights_l2 --> 11685.395 | Lr --> 0.018 | Seconds_per_step --> 1.634 | -[2023-12-07 20:02:18,770][Main][INFO] - [train] Step 21000 out of 65536 | Loss --> 2.234 | Grad_l2 --> 0.254 | Weights_l2 --> 11710.957 | Lr --> 0.018 | Seconds_per_step --> 1.639 | -[2023-12-07 20:04:03,572][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00082-of-00512.json.gz -[2023-12-07 20:04:19,826][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00143-of-00512.json.gz -[2023-12-07 20:05:06,776][Main][INFO] - [train] Step 21100 out of 65536 | Loss --> 2.212 | Grad_l2 --> 0.257 | Weights_l2 --> 11736.330 | Lr --> 0.018 | Seconds_per_step --> 1.680 | -[2023-12-07 20:07:51,226][Main][INFO] - [train] Step 21200 out of 65536 | Loss --> 2.223 | Grad_l2 --> 0.256 | Weights_l2 --> 11761.768 | Lr --> 0.018 | Seconds_per_step --> 1.645 | -[2023-12-07 20:10:36,492][Main][INFO] - [train] Step 21300 out of 65536 | Loss --> 2.213 | Grad_l2 --> 0.252 | Weights_l2 --> 11787.074 | Lr --> 0.018 | Seconds_per_step --> 1.653 | -[2023-12-07 20:13:21,676][Main][INFO] - [train] Step 21400 out of 65536 | Loss --> 2.211 | Grad_l2 --> 0.249 | Weights_l2 --> 11812.074 | Lr --> 0.018 | Seconds_per_step --> 1.652 | -[2023-12-07 20:16:08,234][Main][INFO] - [train] Step 21500 out of 65536 | Loss --> 2.212 | Grad_l2 --> 0.245 | Weights_l2 --> 11837.126 | Lr --> 0.018 | Seconds_per_step --> 1.666 | -[2023-12-07 20:18:52,774][Main][INFO] - [train] Step 21600 out of 65536 | Loss --> 2.205 | Grad_l2 --> 0.256 | Weights_l2 --> 11862.231 | Lr --> 0.018 | Seconds_per_step --> 1.645 | -[2023-12-07 20:21:38,150][Main][INFO] - [train] Step 21700 out of 65536 | Loss --> 2.235 | Grad_l2 --> 0.256 | Weights_l2 --> 11887.675 | Lr --> 0.018 | Seconds_per_step --> 1.654 | -[2023-12-07 20:24:22,215][Main][INFO] - [train] Step 21800 out of 65536 | Loss --> 2.213 | Grad_l2 --> 0.259 | Weights_l2 --> 11912.829 | Lr --> 0.018 | Seconds_per_step --> 1.641 | -[2023-12-07 20:25:47,896][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00264-of-00512.json.gz -[2023-12-07 20:25:55,123][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00033-of-00512.json.gz -[2023-12-07 20:27:08,223][Main][INFO] - [train] Step 21900 out of 65536 | Loss --> 2.212 | Grad_l2 --> 0.257 | Weights_l2 --> 11937.840 | Lr --> 0.018 | Seconds_per_step --> 1.660 | -[2023-12-07 20:29:51,526][Main][INFO] - [train] Step 22000 out of 65536 | Loss --> 2.199 | Grad_l2 --> 0.252 | Weights_l2 --> 11962.602 | Lr --> 0.018 | Seconds_per_step --> 1.633 | -[2023-12-07 20:32:36,241][Main][INFO] - [train] Step 22100 out of 65536 | Loss --> 2.196 | Grad_l2 --> 0.260 | Weights_l2 --> 11987.382 | Lr --> 0.018 | Seconds_per_step --> 1.647 | -[2023-12-07 20:35:20,671][Main][INFO] - [train] Step 22200 out of 65536 | Loss --> 2.200 | Grad_l2 --> 0.250 | Weights_l2 --> 12012.021 | Lr --> 0.018 | Seconds_per_step --> 1.644 | -[2023-12-07 20:38:03,608][Main][INFO] - [train] Step 22300 out of 65536 | Loss --> 2.209 | Grad_l2 --> 0.241 | Weights_l2 --> 12036.547 | Lr --> 0.018 | Seconds_per_step --> 1.629 | -[2023-12-07 20:40:47,553][Main][INFO] - [train] Step 22400 out of 65536 | Loss --> 2.196 | Grad_l2 --> 0.248 | Weights_l2 --> 12061.126 | Lr --> 0.018 | Seconds_per_step --> 1.639 | -[2023-12-07 20:43:29,993][Main][INFO] - [train] Step 22500 out of 65536 | Loss --> 2.193 | Grad_l2 --> 0.249 | Weights_l2 --> 12085.553 | Lr --> 0.018 | Seconds_per_step --> 1.624 | -[2023-12-07 20:46:18,422][Main][INFO] - [train] Step 22600 out of 65536 | Loss --> 2.188 | Grad_l2 --> 0.249 | Weights_l2 --> 12109.933 | Lr --> 0.018 | Seconds_per_step --> 1.684 | -[2023-12-07 20:47:19,768][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00421-of-00512.json.gz -[2023-12-07 20:47:26,431][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00418-of-00512.json.gz -[2023-12-07 20:49:02,553][Main][INFO] - [train] Step 22700 out of 65536 | Loss --> 2.176 | Grad_l2 --> 0.244 | Weights_l2 --> 12134.091 | Lr --> 0.018 | Seconds_per_step --> 1.641 | -[2023-12-07 20:51:48,880][Main][INFO] - [train] Step 22800 out of 65536 | Loss --> 2.183 | Grad_l2 --> 0.242 | Weights_l2 --> 12158.413 | Lr --> 0.017 | Seconds_per_step --> 1.663 | -[2023-12-07 20:54:32,640][Main][INFO] - [train] Step 22900 out of 65536 | Loss --> 2.191 | Grad_l2 --> 0.250 | Weights_l2 --> 12182.608 | Lr --> 0.017 | Seconds_per_step --> 1.638 | -[2023-12-07 20:57:16,487][Main][INFO] - [train] Step 23000 out of 65536 | Loss --> 2.171 | Grad_l2 --> 0.243 | Weights_l2 --> 12206.659 | Lr --> 0.017 | Seconds_per_step --> 1.638 | -[2023-12-07 21:00:00,932][Main][INFO] - [train] Step 23100 out of 65536 | Loss --> 2.173 | Grad_l2 --> 0.245 | Weights_l2 --> 12230.703 | Lr --> 0.017 | Seconds_per_step --> 1.644 | -[2023-12-07 21:02:47,829][Main][INFO] - [train] Step 23200 out of 65536 | Loss --> 2.164 | Grad_l2 --> 0.243 | Weights_l2 --> 12254.878 | Lr --> 0.017 | Seconds_per_step --> 1.669 | -[2023-12-07 21:05:34,611][Main][INFO] - [train] Step 23300 out of 65536 | Loss --> 2.170 | Grad_l2 --> 0.262 | Weights_l2 --> 12278.986 | Lr --> 0.017 | Seconds_per_step --> 1.668 | -[2023-12-07 21:08:17,436][Main][INFO] - [train] Step 23400 out of 65536 | Loss --> 2.166 | Grad_l2 --> 0.247 | Weights_l2 --> 12302.752 | Lr --> 0.017 | Seconds_per_step --> 1.628 | -[2023-12-07 21:08:28,583][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00505-of-00512.json.gz -[2023-12-07 21:08:42,230][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00209-of-00512.json.gz -[2023-12-07 21:11:04,015][Main][INFO] - [train] Step 23500 out of 65536 | Loss --> 2.177 | Grad_l2 --> 0.266 | Weights_l2 --> 12326.617 | Lr --> 0.017 | Seconds_per_step --> 1.666 | -[2023-12-07 21:13:50,250][Main][INFO] - [train] Step 23600 out of 65536 | Loss --> 2.184 | Grad_l2 --> 0.271 | Weights_l2 --> 12350.329 | Lr --> 0.017 | Seconds_per_step --> 1.662 | -[2023-12-07 21:16:33,708][Main][INFO] - [train] Step 23700 out of 65536 | Loss --> 2.173 | Grad_l2 --> 0.259 | Weights_l2 --> 12374.323 | Lr --> 0.017 | Seconds_per_step --> 1.635 | -[2023-12-07 21:19:18,613][Main][INFO] - [train] Step 23800 out of 65536 | Loss --> 2.169 | Grad_l2 --> 0.259 | Weights_l2 --> 12398.096 | Lr --> 0.017 | Seconds_per_step --> 1.649 | -[2023-12-07 21:22:02,798][Main][INFO] - [train] Step 23900 out of 65536 | Loss --> 2.160 | Grad_l2 --> 0.250 | Weights_l2 --> 12421.738 | Lr --> 0.017 | Seconds_per_step --> 1.642 | -[2023-12-07 21:24:48,001][Main][INFO] - [train] Step 24000 out of 65536 | Loss --> 2.141 | Grad_l2 --> 0.254 | Weights_l2 --> 12445.214 | Lr --> 0.017 | Seconds_per_step --> 1.652 | -[2023-12-07 21:27:35,138][Main][INFO] - [train] Step 24100 out of 65536 | Loss --> 2.149 | Grad_l2 --> 0.257 | Weights_l2 --> 12468.572 | Lr --> 0.017 | Seconds_per_step --> 1.671 | -[2023-12-07 21:30:17,859][Main][INFO] - [train] Step 24200 out of 65536 | Loss --> 2.169 | Grad_l2 --> 0.258 | Weights_l2 --> 12492.072 | Lr --> 0.017 | Seconds_per_step --> 1.627 | -[2023-12-07 21:30:33,957][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00360-of-00512.json.gz -[2023-12-07 21:30:56,106][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00441-of-00512.json.gz -[2023-12-07 21:33:04,299][Main][INFO] - [train] Step 24300 out of 65536 | Loss --> 2.171 | Grad_l2 --> 0.260 | Weights_l2 --> 12515.455 | Lr --> 0.017 | Seconds_per_step --> 1.664 | -[2023-12-07 21:35:51,570][Main][INFO] - [train] Step 24400 out of 65536 | Loss --> 2.138 | Grad_l2 --> 0.251 | Weights_l2 --> 12538.729 | Lr --> 0.017 | Seconds_per_step --> 1.673 | -[2023-12-07 21:38:36,323][Main][INFO] - [train] Step 24500 out of 65536 | Loss --> 2.134 | Grad_l2 --> 0.252 | Weights_l2 --> 12561.726 | Lr --> 0.017 | Seconds_per_step --> 1.648 | -[2023-12-07 21:41:23,867][Main][INFO] - [train] Step 24600 out of 65536 | Loss --> 2.136 | Grad_l2 --> 0.251 | Weights_l2 --> 12584.892 | Lr --> 0.017 | Seconds_per_step --> 1.675 | -[2023-12-07 21:44:06,511][Main][INFO] - [train] Step 24700 out of 65536 | Loss --> 2.150 | Grad_l2 --> 0.261 | Weights_l2 --> 12608.105 | Lr --> 0.017 | Seconds_per_step --> 1.626 | -[2023-12-07 21:46:52,499][Main][INFO] - [train] Step 24800 out of 65536 | Loss --> 2.161 | Grad_l2 --> 0.261 | Weights_l2 --> 12631.147 | Lr --> 0.017 | Seconds_per_step --> 1.660 | -[2023-12-07 21:49:35,569][Main][INFO] - [train] Step 24900 out of 65536 | Loss --> 2.155 | Grad_l2 --> 0.261 | Weights_l2 --> 12654.073 | Lr --> 0.017 | Seconds_per_step --> 1.631 | -[2023-12-07 21:52:18,186][Main][INFO] - [train] Step 25000 out of 65536 | Loss --> 2.154 | Grad_l2 --> 0.281 | Weights_l2 --> 12677.639 | Lr --> 0.017 | Seconds_per_step --> 1.626 | -[2023-12-07 21:52:22,947][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00227-of-00512.json.gz -[2023-12-07 21:52:46,741][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00309-of-00512.json.gz -[2023-12-07 21:55:06,336][Main][INFO] - [train] Step 25100 out of 65536 | Loss --> 2.151 | Grad_l2 --> 0.273 | Weights_l2 --> 12700.694 | Lr --> 0.017 | Seconds_per_step --> 1.681 | -[2023-12-07 21:57:49,806][Main][INFO] - [train] Step 25200 out of 65536 | Loss --> 2.139 | Grad_l2 --> 0.267 | Weights_l2 --> 12723.426 | Lr --> 0.017 | Seconds_per_step --> 1.635 | -[2023-12-07 22:00:32,914][Main][INFO] - [train] Step 25300 out of 65536 | Loss --> 2.137 | Grad_l2 --> 0.260 | Weights_l2 --> 12745.875 | Lr --> 0.016 | Seconds_per_step --> 1.631 | -[2023-12-07 22:03:17,184][Main][INFO] - [train] Step 25400 out of 65536 | Loss --> 2.138 | Grad_l2 --> 0.263 | Weights_l2 --> 12768.278 | Lr --> 0.016 | Seconds_per_step --> 1.643 | -[2023-12-07 22:06:04,048][Main][INFO] - [train] Step 25500 out of 65536 | Loss --> 2.146 | Grad_l2 --> 0.268 | Weights_l2 --> 12790.544 | Lr --> 0.016 | Seconds_per_step --> 1.669 | -[2023-12-07 22:08:46,665][Main][INFO] - [train] Step 25600 out of 65536 | Loss --> 2.136 | Grad_l2 --> 0.254 | Weights_l2 --> 12812.800 | Lr --> 0.016 | Seconds_per_step --> 1.626 | -[2023-12-07 22:11:31,764][Main][INFO] - [train] Step 25700 out of 65536 | Loss --> 2.129 | Grad_l2 --> 0.275 | Weights_l2 --> 12835.011 | Lr --> 0.016 | Seconds_per_step --> 1.651 | -[2023-12-07 22:13:49,158][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00265-of-00512.json.gz -[2023-12-07 22:13:49,428][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00167-of-00512.json.gz -[2023-12-07 22:14:17,132][Main][INFO] - [train] Step 25800 out of 65536 | Loss --> 2.131 | Grad_l2 --> 0.269 | Weights_l2 --> 12857.354 | Lr --> 0.016 | Seconds_per_step --> 1.654 | -[2023-12-07 22:17:02,716][Main][INFO] - [train] Step 25900 out of 65536 | Loss --> 2.139 | Grad_l2 --> 0.267 | Weights_l2 --> 12879.443 | Lr --> 0.016 | Seconds_per_step --> 1.656 | -[2023-12-07 22:19:45,637][Main][INFO] - [train] Step 26000 out of 65536 | Loss --> 2.136 | Grad_l2 --> 0.264 | Weights_l2 --> 12901.501 | Lr --> 0.016 | Seconds_per_step --> 1.629 | -[2023-12-07 22:22:27,851][Main][INFO] - [train] Step 26100 out of 65536 | Loss --> 2.117 | Grad_l2 --> 0.263 | Weights_l2 --> 12923.354 | Lr --> 0.016 | Seconds_per_step --> 1.622 | -[2023-12-07 22:25:14,259][Main][INFO] - [train] Step 26200 out of 65536 | Loss --> 2.109 | Grad_l2 --> 0.264 | Weights_l2 --> 12944.976 | Lr --> 0.016 | Seconds_per_step --> 1.664 | -[2023-12-07 22:27:56,897][Main][INFO] - [train] Step 26300 out of 65536 | Loss --> 2.113 | Grad_l2 --> 0.264 | Weights_l2 --> 12966.618 | Lr --> 0.016 | Seconds_per_step --> 1.626 | -[2023-12-07 22:30:41,435][Main][INFO] - [train] Step 26400 out of 65536 | Loss --> 2.136 | Grad_l2 --> 0.271 | Weights_l2 --> 12988.145 | Lr --> 0.016 | Seconds_per_step --> 1.645 | -[2023-12-07 22:33:24,655][Main][INFO] - [train] Step 26500 out of 65536 | Loss --> 2.142 | Grad_l2 --> 0.259 | Weights_l2 --> 13009.688 | Lr --> 0.016 | Seconds_per_step --> 1.632 | -[2023-12-07 22:35:34,662][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00117-of-00512.json.gz -[2023-12-07 22:35:39,761][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00153-of-00512.json.gz -[2023-12-07 22:36:12,806][Main][INFO] - [train] Step 26600 out of 65536 | Loss --> 2.126 | Grad_l2 --> 0.260 | Weights_l2 --> 13031.330 | Lr --> 0.016 | Seconds_per_step --> 1.682 | -[2023-12-07 22:38:56,682][Main][INFO] - [train] Step 26700 out of 65536 | Loss --> 2.122 | Grad_l2 --> 0.267 | Weights_l2 --> 13052.716 | Lr --> 0.016 | Seconds_per_step --> 1.639 | -[2023-12-07 22:41:41,702][Main][INFO] - [train] Step 26800 out of 65536 | Loss --> 2.121 | Grad_l2 --> 0.270 | Weights_l2 --> 13074.102 | Lr --> 0.016 | Seconds_per_step --> 1.650 | -[2023-12-07 22:44:26,676][Main][INFO] - [train] Step 26900 out of 65536 | Loss --> 2.102 | Grad_l2 --> 0.272 | Weights_l2 --> 13095.423 | Lr --> 0.016 | Seconds_per_step --> 1.650 | -[2023-12-07 22:47:14,900][Main][INFO] - [train] Step 27000 out of 65536 | Loss --> 2.108 | Grad_l2 --> 0.263 | Weights_l2 --> 13116.465 | Lr --> 0.016 | Seconds_per_step --> 1.682 | -[2023-12-07 22:49:58,038][Main][INFO] - [train] Step 27100 out of 65536 | Loss --> 2.103 | Grad_l2 --> 0.266 | Weights_l2 --> 13137.403 | Lr --> 0.016 | Seconds_per_step --> 1.631 | -[2023-12-07 22:52:41,415][Main][INFO] - [train] Step 27200 out of 65536 | Loss --> 2.102 | Grad_l2 --> 0.271 | Weights_l2 --> 13158.407 | Lr --> 0.016 | Seconds_per_step --> 1.634 | -[2023-12-07 22:55:24,976][Main][INFO] - [train] Step 27300 out of 65536 | Loss --> 2.111 | Grad_l2 --> 0.264 | Weights_l2 --> 13179.183 | Lr --> 0.016 | Seconds_per_step --> 1.636 | -[2023-12-07 22:56:55,153][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00263-of-00512.json.gz -[2023-12-07 22:57:10,967][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00453-of-00512.json.gz -[2023-12-07 22:58:13,579][Main][INFO] - [train] Step 27400 out of 65536 | Loss --> 2.080 | Grad_l2 --> 0.269 | Weights_l2 --> 13199.877 | Lr --> 0.016 | Seconds_per_step --> 1.686 | -[2023-12-07 23:00:57,304][Main][INFO] - [train] Step 27500 out of 65536 | Loss --> 2.095 | Grad_l2 --> 0.261 | Weights_l2 --> 13220.473 | Lr --> 0.015 | Seconds_per_step --> 1.637 | -[2023-12-07 23:03:42,223][Main][INFO] - [train] Step 27600 out of 65536 | Loss --> 2.087 | Grad_l2 --> 0.273 | Weights_l2 --> 13240.948 | Lr --> 0.015 | Seconds_per_step --> 1.649 | -[2023-12-07 23:06:25,334][Main][INFO] - [train] Step 27700 out of 65536 | Loss --> 2.093 | Grad_l2 --> 0.267 | Weights_l2 --> 13261.338 | Lr --> 0.015 | Seconds_per_step --> 1.631 | -[2023-12-07 23:09:10,726][Main][INFO] - [train] Step 27800 out of 65536 | Loss --> 2.084 | Grad_l2 --> 0.271 | Weights_l2 --> 13281.808 | Lr --> 0.015 | Seconds_per_step --> 1.654 | -[2023-12-07 23:11:54,225][Main][INFO] - [train] Step 27900 out of 65536 | Loss --> 2.084 | Grad_l2 --> 0.264 | Weights_l2 --> 13302.115 | Lr --> 0.015 | Seconds_per_step --> 1.635 | -[2023-12-07 23:14:37,321][Main][INFO] - [train] Step 28000 out of 65536 | Loss --> 2.077 | Grad_l2 --> 0.267 | Weights_l2 --> 13322.378 | Lr --> 0.015 | Seconds_per_step --> 1.631 | -[2023-12-07 23:17:25,405][Main][INFO] - [train] Step 28100 out of 65536 | Loss --> 2.074 | Grad_l2 --> 0.272 | Weights_l2 --> 13342.526 | Lr --> 0.015 | Seconds_per_step --> 1.681 | -[2023-12-07 23:18:32,118][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00431-of-00512.json.gz -[2023-12-07 23:18:58,179][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00219-of-00512.json.gz -[2023-12-07 23:20:13,995][Main][INFO] - [train] Step 28200 out of 65536 | Loss --> 2.072 | Grad_l2 --> 0.269 | Weights_l2 --> 13362.565 | Lr --> 0.015 | Seconds_per_step --> 1.686 | -[2023-12-07 23:22:57,832][Main][INFO] - [train] Step 28300 out of 65536 | Loss --> 2.068 | Grad_l2 --> 0.270 | Weights_l2 --> 13382.573 | Lr --> 0.015 | Seconds_per_step --> 1.638 | -[2023-12-07 23:25:42,801][Main][INFO] - [train] Step 28400 out of 65536 | Loss --> 2.078 | Grad_l2 --> 0.270 | Weights_l2 --> 13402.311 | Lr --> 0.015 | Seconds_per_step --> 1.650 | -[2023-12-07 23:28:28,850][Main][INFO] - [train] Step 28500 out of 65536 | Loss --> 2.073 | Grad_l2 --> 0.270 | Weights_l2 --> 13421.837 | Lr --> 0.015 | Seconds_per_step --> 1.660 | -[2023-12-07 23:31:12,530][Main][INFO] - [train] Step 28600 out of 65536 | Loss --> 2.085 | Grad_l2 --> 0.274 | Weights_l2 --> 13441.598 | Lr --> 0.015 | Seconds_per_step --> 1.637 | -[2023-12-07 23:33:59,894][Main][INFO] - [train] Step 28700 out of 65536 | Loss --> 2.072 | Grad_l2 --> 0.277 | Weights_l2 --> 13461.216 | Lr --> 0.015 | Seconds_per_step --> 1.674 | -[2023-12-07 23:36:43,977][Main][INFO] - [train] Step 28800 out of 65536 | Loss --> 2.079 | Grad_l2 --> 0.279 | Weights_l2 --> 13480.720 | Lr --> 0.015 | Seconds_per_step --> 1.641 | -[2023-12-07 23:39:29,453][Main][INFO] - [train] Step 28900 out of 65536 | Loss --> 2.083 | Grad_l2 --> 0.280 | Weights_l2 --> 13500.163 | Lr --> 0.015 | Seconds_per_step --> 1.655 | -[2023-12-07 23:40:32,939][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00014-of-00512.json.gz -[2023-12-07 23:40:42,493][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00257-of-00512.json.gz -[2023-12-07 23:42:14,271][Main][INFO] - [train] Step 29000 out of 65536 | Loss --> 2.078 | Grad_l2 --> 0.271 | Weights_l2 --> 13519.359 | Lr --> 0.015 | Seconds_per_step --> 1.648 | -[2023-12-07 23:44:58,502][Main][INFO] - [train] Step 29100 out of 65536 | Loss --> 2.074 | Grad_l2 --> 0.266 | Weights_l2 --> 13538.450 | Lr --> 0.015 | Seconds_per_step --> 1.642 | -[2023-12-07 23:47:44,536][Main][INFO] - [train] Step 29200 out of 65536 | Loss --> 2.075 | Grad_l2 --> 0.270 | Weights_l2 --> 13557.560 | Lr --> 0.015 | Seconds_per_step --> 1.660 | -[2023-12-07 23:50:33,045][Main][INFO] - [train] Step 29300 out of 65536 | Loss --> 2.075 | Grad_l2 --> 0.278 | Weights_l2 --> 13576.645 | Lr --> 0.015 | Seconds_per_step --> 1.685 | -[2023-12-07 23:53:16,077][Main][INFO] - [train] Step 29400 out of 65536 | Loss --> 2.080 | Grad_l2 --> 0.268 | Weights_l2 --> 13595.462 | Lr --> 0.015 | Seconds_per_step --> 1.630 | -[2023-12-07 23:56:01,151][Main][INFO] - [train] Step 29500 out of 65536 | Loss --> 2.067 | Grad_l2 --> 0.271 | Weights_l2 --> 13614.070 | Lr --> 0.015 | Seconds_per_step --> 1.651 | -[2023-12-07 23:58:47,732][Main][INFO] - [train] Step 29600 out of 65536 | Loss --> 2.065 | Grad_l2 --> 0.275 | Weights_l2 --> 13632.672 | Lr --> 0.014 | Seconds_per_step --> 1.666 | -[2023-12-08 00:01:31,608][Main][INFO] - [train] Step 29700 out of 65536 | Loss --> 2.085 | Grad_l2 --> 0.275 | Weights_l2 --> 13651.230 | Lr --> 0.014 | Seconds_per_step --> 1.639 | -[2023-12-08 00:01:45,951][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00060-of-00512.json.gz -[2023-12-08 00:02:32,794][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00510-of-00512.json.gz -[2023-12-08 00:04:16,953][Main][INFO] - [train] Step 29800 out of 65536 | Loss --> 2.072 | Grad_l2 --> 0.276 | Weights_l2 --> 13669.623 | Lr --> 0.014 | Seconds_per_step --> 1.653 | -[2023-12-08 00:07:07,562][Main][INFO] - [train] Step 29900 out of 65536 | Loss --> 2.072 | Grad_l2 --> 0.275 | Weights_l2 --> 13687.885 | Lr --> 0.014 | Seconds_per_step --> 1.706 | -[2023-12-08 00:09:56,097][Main][INFO] - [train] Step 30000 out of 65536 | Loss --> 2.061 | Grad_l2 --> 0.282 | Weights_l2 --> 13706.187 | Lr --> 0.014 | Seconds_per_step --> 1.685 | -[2023-12-08 00:09:56,099][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-30000 -[2023-12-08 00:09:56,101][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading -[2023-12-08 00:09:58,749][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-30000/model.safetensors -[2023-12-08 00:10:02,169][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-30000/optimizer.bin -[2023-12-08 00:10:02,170][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-30000/scheduler.bin -[2023-12-08 00:10:02,171][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-30000/sampler.bin -[2023-12-08 00:10:02,171][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-30000/sampler_1.bin -[2023-12-08 00:10:02,172][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-30000/random_states_0.pkl -[2023-12-08 00:12:44,918][Main][INFO] - [train] Step 30100 out of 65536 | Loss --> 2.068 | Grad_l2 --> 0.296 | Weights_l2 --> 13724.762 | Lr --> 0.014 | Seconds_per_step --> 1.688 | -[2023-12-08 00:15:31,875][Main][INFO] - [train] Step 30200 out of 65536 | Loss --> 2.048 | Grad_l2 --> 0.287 | Weights_l2 --> 13743.313 | Lr --> 0.014 | Seconds_per_step --> 1.670 | -[2023-12-08 00:18:15,215][Main][INFO] - [train] Step 30300 out of 65536 | Loss --> 2.032 | Grad_l2 --> 0.285 | Weights_l2 --> 13761.057 | Lr --> 0.014 | Seconds_per_step --> 1.633 | -[2023-12-08 00:21:00,769][Main][INFO] - [train] Step 30400 out of 65536 | Loss --> 2.038 | Grad_l2 --> 0.282 | Weights_l2 --> 13778.798 | Lr --> 0.014 | Seconds_per_step --> 1.656 | -[2023-12-08 00:23:44,474][Main][INFO] - [train] Step 30500 out of 65536 | Loss --> 2.040 | Grad_l2 --> 0.280 | Weights_l2 --> 13796.510 | Lr --> 0.014 | Seconds_per_step --> 1.637 | -[2023-12-08 00:23:50,846][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00353-of-00512.json.gz -[2023-12-08 00:24:22,264][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00434-of-00512.json.gz -[2023-12-08 00:26:32,393][Main][INFO] - [train] Step 30600 out of 65536 | Loss --> 2.045 | Grad_l2 --> 0.280 | Weights_l2 --> 13814.104 | Lr --> 0.014 | Seconds_per_step --> 1.679 | -[2023-12-08 00:29:22,100][Main][INFO] - [train] Step 30700 out of 65536 | Loss --> 2.041 | Grad_l2 --> 0.279 | Weights_l2 --> 13831.599 | Lr --> 0.014 | Seconds_per_step --> 1.697 | -[2023-12-08 00:32:06,810][Main][INFO] - [train] Step 30800 out of 65536 | Loss --> 2.058 | Grad_l2 --> 0.282 | Weights_l2 --> 13848.863 | Lr --> 0.014 | Seconds_per_step --> 1.647 | -[2023-12-08 00:34:51,097][Main][INFO] - [train] Step 30900 out of 65536 | Loss --> 2.060 | Grad_l2 --> 0.276 | Weights_l2 --> 13866.056 | Lr --> 0.014 | Seconds_per_step --> 1.643 | -[2023-12-08 00:37:37,520][Main][INFO] - [train] Step 31000 out of 65536 | Loss --> 2.043 | Grad_l2 --> 0.272 | Weights_l2 --> 13883.203 | Lr --> 0.014 | Seconds_per_step --> 1.664 | -[2023-12-08 00:40:21,904][Main][INFO] - [train] Step 31100 out of 65536 | Loss --> 2.028 | Grad_l2 --> 0.284 | Weights_l2 --> 13900.181 | Lr --> 0.014 | Seconds_per_step --> 1.644 | -[2023-12-08 00:43:06,157][Main][INFO] - [train] Step 31200 out of 65536 | Loss --> 2.000 | Grad_l2 --> 0.279 | Weights_l2 --> 13917.088 | Lr --> 0.014 | Seconds_per_step --> 1.643 | -[2023-12-08 00:45:49,003][Main][INFO] - [train] Step 31300 out of 65536 | Loss --> 2.023 | Grad_l2 --> 0.291 | Weights_l2 --> 13933.890 | Lr --> 0.014 | Seconds_per_step --> 1.628 | -[2023-12-08 00:45:50,602][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00157-of-00512.json.gz -[2023-12-08 00:46:09,798][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00051-of-00512.json.gz -[2023-12-08 00:48:40,840][Main][INFO] - [train] Step 31400 out of 65536 | Loss --> 2.022 | Grad_l2 --> 0.279 | Weights_l2 --> 13950.565 | Lr --> 0.014 | Seconds_per_step --> 1.718 | -[2023-12-08 00:51:27,894][Main][INFO] - [train] Step 31500 out of 65536 | Loss --> 2.012 | Grad_l2 --> 0.275 | Weights_l2 --> 13967.043 | Lr --> 0.013 | Seconds_per_step --> 1.671 | -[2023-12-08 00:54:13,165][Main][INFO] - [train] Step 31600 out of 65536 | Loss --> 2.031 | Grad_l2 --> 0.285 | Weights_l2 --> 13983.498 | Lr --> 0.013 | Seconds_per_step --> 1.653 | -[2023-12-08 00:56:56,037][Main][INFO] - [train] Step 31700 out of 65536 | Loss --> 2.031 | Grad_l2 --> 0.285 | Weights_l2 --> 13999.777 | Lr --> 0.013 | Seconds_per_step --> 1.629 | -[2023-12-08 00:59:43,650][Main][INFO] - [train] Step 31800 out of 65536 | Loss --> 2.016 | Grad_l2 --> 0.285 | Weights_l2 --> 14016.070 | Lr --> 0.013 | Seconds_per_step --> 1.676 | -[2023-12-08 01:02:29,180][Main][INFO] - [train] Step 31900 out of 65536 | Loss --> 2.002 | Grad_l2 --> 0.284 | Weights_l2 --> 14032.187 | Lr --> 0.013 | Seconds_per_step --> 1.655 | -[2023-12-08 01:05:13,051][Main][INFO] - [train] Step 32000 out of 65536 | Loss --> 2.002 | Grad_l2 --> 0.285 | Weights_l2 --> 14048.226 | Lr --> 0.013 | Seconds_per_step --> 1.639 | -[2023-12-08 01:07:19,202][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00068-of-00512.json.gz -[2023-12-08 01:07:27,431][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00347-of-00512.json.gz -[2023-12-08 01:08:04,291][Main][INFO] - [train] Step 32100 out of 65536 | Loss --> 2.015 | Grad_l2 --> 0.274 | Weights_l2 --> 14064.211 | Lr --> 0.013 | Seconds_per_step --> 1.712 | -[2023-12-08 01:10:48,948][Main][INFO] - [train] Step 32200 out of 65536 | Loss --> 2.008 | Grad_l2 --> 0.287 | Weights_l2 --> 14079.922 | Lr --> 0.013 | Seconds_per_step --> 1.647 | -[2023-12-08 01:13:32,204][Main][INFO] - [train] Step 32300 out of 65536 | Loss --> 2.002 | Grad_l2 --> 0.280 | Weights_l2 --> 14095.869 | Lr --> 0.013 | Seconds_per_step --> 1.633 | -[2023-12-08 01:16:15,159][Main][INFO] - [train] Step 32400 out of 65536 | Loss --> 2.019 | Grad_l2 --> 0.279 | Weights_l2 --> 14111.563 | Lr --> 0.013 | Seconds_per_step --> 1.630 | -[2023-12-08 01:18:58,871][Main][INFO] - [train] Step 32500 out of 65536 | Loss --> 2.014 | Grad_l2 --> 0.287 | Weights_l2 --> 14126.992 | Lr --> 0.013 | Seconds_per_step --> 1.637 | -[2023-12-08 01:21:46,444][Main][INFO] - [train] Step 32600 out of 65536 | Loss --> 1.993 | Grad_l2 --> 0.288 | Weights_l2 --> 14142.429 | Lr --> 0.013 | Seconds_per_step --> 1.676 | -[2023-12-08 01:24:33,905][Main][INFO] - [train] Step 32700 out of 65536 | Loss --> 2.012 | Grad_l2 --> 0.288 | Weights_l2 --> 14157.793 | Lr --> 0.013 | Seconds_per_step --> 1.675 | -[2023-12-08 01:27:18,066][Main][INFO] - [train] Step 32800 out of 65536 | Loss --> 2.007 | Grad_l2 --> 0.284 | Weights_l2 --> 14172.970 | Lr --> 0.013 | Seconds_per_step --> 1.642 | -[2023-12-08 01:29:13,919][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00137-of-00512.json.gz -[2023-12-08 01:29:28,461][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00179-of-00512.json.gz -[2023-12-08 01:30:04,923][Main][INFO] - [train] Step 32900 out of 65536 | Loss --> 1.998 | Grad_l2 --> 0.287 | Weights_l2 --> 14188.192 | Lr --> 0.013 | Seconds_per_step --> 1.669 | -[2023-12-08 01:32:48,850][Main][INFO] - [train] Step 33000 out of 65536 | Loss --> 2.004 | Grad_l2 --> 0.282 | Weights_l2 --> 14203.205 | Lr --> 0.013 | Seconds_per_step --> 1.639 | -[2023-12-08 01:35:32,122][Main][INFO] - [train] Step 33100 out of 65536 | Loss --> 1.995 | Grad_l2 --> 0.283 | Weights_l2 --> 14218.073 | Lr --> 0.013 | Seconds_per_step --> 1.633 | -[2023-12-08 01:38:16,506][Main][INFO] - [train] Step 33200 out of 65536 | Loss --> 2.001 | Grad_l2 --> 0.279 | Weights_l2 --> 14232.822 | Lr --> 0.013 | Seconds_per_step --> 1.644 | -[2023-12-08 01:41:01,346][Main][INFO] - [train] Step 33300 out of 65536 | Loss --> 1.995 | Grad_l2 --> 0.289 | Weights_l2 --> 14247.442 | Lr --> 0.013 | Seconds_per_step --> 1.648 | -[2023-12-08 01:43:50,403][Main][INFO] - [train] Step 33400 out of 65536 | Loss --> 1.991 | Grad_l2 --> 0.289 | Weights_l2 --> 14261.962 | Lr --> 0.012 | Seconds_per_step --> 1.691 | -[2023-12-08 01:46:33,978][Main][INFO] - [train] Step 33500 out of 65536 | Loss --> 1.983 | Grad_l2 --> 0.289 | Weights_l2 --> 14276.351 | Lr --> 0.012 | Seconds_per_step --> 1.636 | -[2023-12-08 01:49:18,431][Main][INFO] - [train] Step 33600 out of 65536 | Loss --> 1.993 | Grad_l2 --> 0.287 | Weights_l2 --> 14290.694 | Lr --> 0.012 | Seconds_per_step --> 1.645 | -[2023-12-08 01:51:11,544][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00008-of-00512.json.gz -[2023-12-08 01:51:46,197][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00121-of-00512.json.gz -[2023-12-08 01:52:05,126][Main][INFO] - [train] Step 33700 out of 65536 | Loss --> 1.995 | Grad_l2 --> 0.291 | Weights_l2 --> 14304.905 | Lr --> 0.012 | Seconds_per_step --> 1.667 | -[2023-12-08 01:54:49,034][Main][INFO] - [train] Step 33800 out of 65536 | Loss --> 1.985 | Grad_l2 --> 0.292 | Weights_l2 --> 14319.042 | Lr --> 0.012 | Seconds_per_step --> 1.639 | -[2023-12-08 01:57:33,301][Main][INFO] - [train] Step 33900 out of 65536 | Loss --> 1.983 | Grad_l2 --> 0.290 | Weights_l2 --> 14333.013 | Lr --> 0.012 | Seconds_per_step --> 1.643 | -[2023-12-08 02:00:18,033][Main][INFO] - [train] Step 34000 out of 65536 | Loss --> 1.984 | Grad_l2 --> 0.289 | Weights_l2 --> 14346.893 | Lr --> 0.012 | Seconds_per_step --> 1.647 | -[2023-12-08 02:03:00,980][Main][INFO] - [train] Step 34100 out of 65536 | Loss --> 1.992 | Grad_l2 --> 0.291 | Weights_l2 --> 14360.678 | Lr --> 0.012 | Seconds_per_step --> 1.629 | -[2023-12-08 02:05:46,436][Main][INFO] - [train] Step 34200 out of 65536 | Loss --> 1.995 | Grad_l2 --> 0.290 | Weights_l2 --> 14374.403 | Lr --> 0.012 | Seconds_per_step --> 1.655 | -[2023-12-08 02:08:30,960][Main][INFO] - [train] Step 34300 out of 65536 | Loss --> 1.962 | Grad_l2 --> 0.290 | Weights_l2 --> 14387.954 | Lr --> 0.012 | Seconds_per_step --> 1.645 | -[2023-12-08 02:11:16,303][Main][INFO] - [train] Step 34400 out of 65536 | Loss --> 1.957 | Grad_l2 --> 0.292 | Weights_l2 --> 14401.404 | Lr --> 0.012 | Seconds_per_step --> 1.653 | -[2023-12-08 02:12:23,584][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00477-of-00512.json.gz -[2023-12-08 02:12:51,186][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00230-of-00512.json.gz -[2023-12-08 02:14:02,110][Main][INFO] - [train] Step 34500 out of 65536 | Loss --> 1.956 | Grad_l2 --> 0.286 | Weights_l2 --> 14414.726 | Lr --> 0.012 | Seconds_per_step --> 1.658 | -[2023-12-08 02:16:45,774][Main][INFO] - [train] Step 34600 out of 65536 | Loss --> 1.953 | Grad_l2 --> 0.289 | Weights_l2 --> 14427.909 | Lr --> 0.012 | Seconds_per_step --> 1.637 | -[2023-12-08 02:19:30,541][Main][INFO] - [train] Step 34700 out of 65536 | Loss --> 1.959 | Grad_l2 --> 0.291 | Weights_l2 --> 14441.073 | Lr --> 0.012 | Seconds_per_step --> 1.648 | -[2023-12-08 02:22:14,218][Main][INFO] - [train] Step 34800 out of 65536 | Loss --> 1.969 | Grad_l2 --> 0.291 | Weights_l2 --> 14454.192 | Lr --> 0.012 | Seconds_per_step --> 1.637 | -[2023-12-08 02:25:00,024][Main][INFO] - [train] Step 34900 out of 65536 | Loss --> 1.951 | Grad_l2 --> 0.286 | Weights_l2 --> 14467.165 | Lr --> 0.012 | Seconds_per_step --> 1.658 | -[2023-12-08 02:27:43,800][Main][INFO] - [train] Step 35000 out of 65536 | Loss --> 1.982 | Grad_l2 --> 0.292 | Weights_l2 --> 14479.916 | Lr --> 0.012 | Seconds_per_step --> 1.638 | -[2023-12-08 02:30:28,054][Main][INFO] - [train] Step 35100 out of 65536 | Loss --> 1.976 | Grad_l2 --> 0.285 | Weights_l2 --> 14492.698 | Lr --> 0.012 | Seconds_per_step --> 1.643 | -[2023-12-08 02:33:12,228][Main][INFO] - [train] Step 35200 out of 65536 | Loss --> 1.976 | Grad_l2 --> 0.290 | Weights_l2 --> 14505.187 | Lr --> 0.011 | Seconds_per_step --> 1.642 | -[2023-12-08 02:34:15,012][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00273-of-00512.json.gz -[2023-12-08 02:34:24,172][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00249-of-00512.json.gz -[2023-12-08 02:35:57,538][Main][INFO] - [train] Step 35300 out of 65536 | Loss --> 1.973 | Grad_l2 --> 0.292 | Weights_l2 --> 14517.613 | Lr --> 0.011 | Seconds_per_step --> 1.653 | -[2023-12-08 02:38:45,072][Main][INFO] - [train] Step 35400 out of 65536 | Loss --> 1.983 | Grad_l2 --> 0.295 | Weights_l2 --> 14529.966 | Lr --> 0.011 | Seconds_per_step --> 1.675 | -[2023-12-08 02:41:32,288][Main][INFO] - [train] Step 35500 out of 65536 | Loss --> 1.965 | Grad_l2 --> 0.293 | Weights_l2 --> 14542.375 | Lr --> 0.011 | Seconds_per_step --> 1.672 | -[2023-12-08 02:44:19,689][Main][INFO] - [train] Step 35600 out of 65536 | Loss --> 1.965 | Grad_l2 --> 0.288 | Weights_l2 --> 14554.458 | Lr --> 0.011 | Seconds_per_step --> 1.674 | -[2023-12-08 02:47:04,676][Main][INFO] - [train] Step 35700 out of 65536 | Loss --> 1.958 | Grad_l2 --> 0.295 | Weights_l2 --> 14566.586 | Lr --> 0.011 | Seconds_per_step --> 1.650 | -[2023-12-08 02:49:48,170][Main][INFO] - [train] Step 35800 out of 65536 | Loss --> 1.967 | Grad_l2 --> 0.293 | Weights_l2 --> 14578.456 | Lr --> 0.011 | Seconds_per_step --> 1.635 | -[2023-12-08 02:52:35,214][Main][INFO] - [train] Step 35900 out of 65536 | Loss --> 1.975 | Grad_l2 --> 0.288 | Weights_l2 --> 14590.374 | Lr --> 0.011 | Seconds_per_step --> 1.670 | -[2023-12-08 02:55:17,699][Main][INFO] - [train] Step 36000 out of 65536 | Loss --> 1.962 | Grad_l2 --> 0.288 | Weights_l2 --> 14602.165 | Lr --> 0.011 | Seconds_per_step --> 1.625 | -[2023-12-08 02:55:26,076][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00500-of-00512.json.gz -[2023-12-08 02:55:31,264][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00000-of-00512.json.gz -[2023-12-08 02:58:04,772][Main][INFO] - [train] Step 36100 out of 65536 | Loss --> 1.944 | Grad_l2 --> 0.296 | Weights_l2 --> 14613.631 | Lr --> 0.011 | Seconds_per_step --> 1.671 | -[2023-12-08 03:00:47,882][Main][INFO] - [train] Step 36200 out of 65536 | Loss --> 1.949 | Grad_l2 --> 0.283 | Weights_l2 --> 14625.178 | Lr --> 0.011 | Seconds_per_step --> 1.631 | -[2023-12-08 03:03:33,228][Main][INFO] - [train] Step 36300 out of 65536 | Loss --> 1.961 | Grad_l2 --> 0.290 | Weights_l2 --> 14636.603 | Lr --> 0.011 | Seconds_per_step --> 1.653 | -[2023-12-08 03:06:16,671][Main][INFO] - [train] Step 36400 out of 65536 | Loss --> 1.946 | Grad_l2 --> 0.292 | Weights_l2 --> 14647.978 | Lr --> 0.011 | Seconds_per_step --> 1.634 | -[2023-12-08 03:09:02,751][Main][INFO] - [train] Step 36500 out of 65536 | Loss --> 1.953 | Grad_l2 --> 0.295 | Weights_l2 --> 14659.166 | Lr --> 0.011 | Seconds_per_step --> 1.661 | -[2023-12-08 03:11:45,092][Main][INFO] - [train] Step 36600 out of 65536 | Loss --> 1.949 | Grad_l2 --> 0.290 | Weights_l2 --> 14670.295 | Lr --> 0.011 | Seconds_per_step --> 1.623 | -[2023-12-08 03:14:30,610][Main][INFO] - [train] Step 36700 out of 65536 | Loss --> 1.954 | Grad_l2 --> 0.293 | Weights_l2 --> 14681.347 | Lr --> 0.011 | Seconds_per_step --> 1.655 | -[2023-12-08 03:16:53,639][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00507-of-00512.json.gz -[2023-12-08 03:17:15,245][Main][INFO] - [train] Step 36800 out of 65536 | Loss --> 1.946 | Grad_l2 --> 0.304 | Weights_l2 --> 14692.165 | Lr --> 0.011 | Seconds_per_step --> 1.646 | -[2023-12-08 03:17:32,288][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00352-of-00512.json.gz -[2023-12-08 03:20:00,182][Main][INFO] - [train] Step 36900 out of 65536 | Loss --> 1.924 | Grad_l2 --> 0.288 | Weights_l2 --> 14702.963 | Lr --> 0.010 | Seconds_per_step --> 1.649 | -[2023-12-08 03:22:45,572][Main][INFO] - [train] Step 37000 out of 65536 | Loss --> 1.939 | Grad_l2 --> 0.291 | Weights_l2 --> 14713.643 | Lr --> 0.010 | Seconds_per_step --> 1.654 | -[2023-12-08 03:25:28,865][Main][INFO] - [train] Step 37100 out of 65536 | Loss --> 1.938 | Grad_l2 --> 0.292 | Weights_l2 --> 14724.191 | Lr --> 0.010 | Seconds_per_step --> 1.633 | -[2023-12-08 03:28:16,227][Main][INFO] - [train] Step 37200 out of 65536 | Loss --> 1.933 | Grad_l2 --> 0.290 | Weights_l2 --> 14734.634 | Lr --> 0.010 | Seconds_per_step --> 1.674 | -[2023-12-08 03:31:00,008][Main][INFO] - [train] Step 37300 out of 65536 | Loss --> 1.930 | Grad_l2 --> 0.295 | Weights_l2 --> 14745.069 | Lr --> 0.010 | Seconds_per_step --> 1.638 | -[2023-12-08 03:33:50,096][Main][INFO] - [train] Step 37400 out of 65536 | Loss --> 1.925 | Grad_l2 --> 0.290 | Weights_l2 --> 14755.340 | Lr --> 0.010 | Seconds_per_step --> 1.701 | -[2023-12-08 03:36:32,856][Main][INFO] - [train] Step 37500 out of 65536 | Loss --> 1.936 | Grad_l2 --> 0.292 | Weights_l2 --> 14765.526 | Lr --> 0.010 | Seconds_per_step --> 1.628 | -[2023-12-08 03:39:03,244][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00140-of-00512.json.gz -[2023-12-08 03:39:08,837][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00178-of-00512.json.gz -[2023-12-08 03:39:20,140][Main][INFO] - [train] Step 37600 out of 65536 | Loss --> 1.924 | Grad_l2 --> 0.295 | Weights_l2 --> 14775.611 | Lr --> 0.010 | Seconds_per_step --> 1.673 | -[2023-12-08 03:42:04,055][Main][INFO] - [train] Step 37700 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.298 | Weights_l2 --> 14785.485 | Lr --> 0.010 | Seconds_per_step --> 1.639 | -[2023-12-08 03:44:47,695][Main][INFO] - [train] Step 37800 out of 65536 | Loss --> 1.913 | Grad_l2 --> 0.295 | Weights_l2 --> 14795.332 | Lr --> 0.010 | Seconds_per_step --> 1.636 | -[2023-12-08 03:47:35,616][Main][INFO] - [train] Step 37900 out of 65536 | Loss --> 1.921 | Grad_l2 --> 0.297 | Weights_l2 --> 14805.222 | Lr --> 0.010 | Seconds_per_step --> 1.679 | -[2023-12-08 03:50:19,652][Main][INFO] - [train] Step 38000 out of 65536 | Loss --> 1.908 | Grad_l2 --> 0.295 | Weights_l2 --> 14814.857 | Lr --> 0.010 | Seconds_per_step --> 1.640 | -[2023-12-08 03:53:06,429][Main][INFO] - [train] Step 38100 out of 65536 | Loss --> 1.893 | Grad_l2 --> 0.292 | Weights_l2 --> 14824.240 | Lr --> 0.010 | Seconds_per_step --> 1.668 | -[2023-12-08 03:55:49,461][Main][INFO] - [train] Step 38200 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.294 | Weights_l2 --> 14833.649 | Lr --> 0.010 | Seconds_per_step --> 1.630 | -[2023-12-08 03:58:34,119][Main][INFO] - [train] Step 38300 out of 65536 | Loss --> 1.907 | Grad_l2 --> 0.292 | Weights_l2 --> 14842.884 | Lr --> 0.010 | Seconds_per_step --> 1.647 | -[2023-12-08 04:00:29,569][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00105-of-00512.json.gz -[2023-12-08 04:00:51,814][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00165-of-00512.json.gz -[2023-12-08 04:01:24,630][Main][INFO] - [train] Step 38400 out of 65536 | Loss --> 1.915 | Grad_l2 --> 0.297 | Weights_l2 --> 14852.127 | Lr --> 0.010 | Seconds_per_step --> 1.705 | -[2023-12-08 04:04:08,419][Main][INFO] - [train] Step 38500 out of 65536 | Loss --> 1.918 | Grad_l2 --> 0.290 | Weights_l2 --> 14861.241 | Lr --> 0.010 | Seconds_per_step --> 1.638 | -[2023-12-08 04:06:52,002][Main][INFO] - [train] Step 38600 out of 65536 | Loss --> 1.915 | Grad_l2 --> 0.291 | Weights_l2 --> 14870.389 | Lr --> 0.010 | Seconds_per_step --> 1.636 | -[2023-12-08 04:09:37,313][Main][INFO] - [train] Step 38700 out of 65536 | Loss --> 1.897 | Grad_l2 --> 0.290 | Weights_l2 --> 14879.372 | Lr --> 0.009 | Seconds_per_step --> 1.653 | -[2023-12-08 04:12:23,736][Main][INFO] - [train] Step 38800 out of 65536 | Loss --> 1.900 | Grad_l2 --> 0.291 | Weights_l2 --> 14888.190 | Lr --> 0.009 | Seconds_per_step --> 1.664 | -[2023-12-08 04:15:06,932][Main][INFO] - [train] Step 38900 out of 65536 | Loss --> 1.900 | Grad_l2 --> 0.295 | Weights_l2 --> 14896.943 | Lr --> 0.009 | Seconds_per_step --> 1.632 | -[2023-12-08 04:17:50,338][Main][INFO] - [train] Step 39000 out of 65536 | Loss --> 1.906 | Grad_l2 --> 0.291 | Weights_l2 --> 14905.547 | Lr --> 0.009 | Seconds_per_step --> 1.634 | -[2023-12-08 04:20:34,604][Main][INFO] - [train] Step 39100 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.288 | Weights_l2 --> 14914.122 | Lr --> 0.009 | Seconds_per_step --> 1.643 | -[2023-12-08 04:22:14,526][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00378-of-00512.json.gz -[2023-12-08 04:23:04,197][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00464-of-00512.json.gz -[2023-12-08 04:23:23,022][Main][INFO] - [train] Step 39200 out of 65536 | Loss --> 1.904 | Grad_l2 --> 0.294 | Weights_l2 --> 14922.521 | Lr --> 0.009 | Seconds_per_step --> 1.684 | -[2023-12-08 04:26:08,713][Main][INFO] - [train] Step 39300 out of 65536 | Loss --> 1.888 | Grad_l2 --> 0.294 | Weights_l2 --> 14930.833 | Lr --> 0.009 | Seconds_per_step --> 1.657 | -[2023-12-08 04:28:51,845][Main][INFO] - [train] Step 39400 out of 65536 | Loss --> 1.899 | Grad_l2 --> 0.290 | Weights_l2 --> 14939.047 | Lr --> 0.009 | Seconds_per_step --> 1.631 | -[2023-12-08 04:31:37,336][Main][INFO] - [train] Step 39500 out of 65536 | Loss --> 1.919 | Grad_l2 --> 0.294 | Weights_l2 --> 14947.263 | Lr --> 0.009 | Seconds_per_step --> 1.655 | -[2023-12-08 04:34:25,619][Main][INFO] - [train] Step 39600 out of 65536 | Loss --> 1.915 | Grad_l2 --> 0.295 | Weights_l2 --> 14955.395 | Lr --> 0.009 | Seconds_per_step --> 1.683 | -[2023-12-08 04:37:08,312][Main][INFO] - [train] Step 39700 out of 65536 | Loss --> 1.906 | Grad_l2 --> 0.289 | Weights_l2 --> 14963.386 | Lr --> 0.009 | Seconds_per_step --> 1.627 | -[2023-12-08 04:39:52,028][Main][INFO] - [train] Step 39800 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.291 | Weights_l2 --> 14971.238 | Lr --> 0.009 | Seconds_per_step --> 1.637 | -[2023-12-08 04:42:38,498][Main][INFO] - [train] Step 39900 out of 65536 | Loss --> 1.886 | Grad_l2 --> 0.296 | Weights_l2 --> 14978.996 | Lr --> 0.009 | Seconds_per_step --> 1.665 | -[2023-12-08 04:43:45,803][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00010-of-00512.json.gz -[2023-12-08 04:44:54,181][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00019-of-00512.json.gz -[2023-12-08 04:45:25,694][Main][INFO] - [train] Step 40000 out of 65536 | Loss --> 1.899 | Grad_l2 --> 0.292 | Weights_l2 --> 14986.662 | Lr --> 0.009 | Seconds_per_step --> 1.672 | -[2023-12-08 04:45:25,695][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-40000 -[2023-12-08 04:45:25,698][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading -[2023-12-08 04:45:28,399][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-40000/model.safetensors -[2023-12-08 04:45:31,890][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-40000/optimizer.bin -[2023-12-08 04:45:31,891][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-40000/scheduler.bin -[2023-12-08 04:45:31,891][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-40000/sampler.bin -[2023-12-08 04:45:31,892][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-40000/sampler_1.bin -[2023-12-08 04:45:31,893][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-40000/random_states_0.pkl -[2023-12-08 04:48:15,937][Main][INFO] - [train] Step 40100 out of 65536 | Loss --> 1.913 | Grad_l2 --> 0.294 | Weights_l2 --> 14994.290 | Lr --> 0.009 | Seconds_per_step --> 1.702 | -[2023-12-08 04:51:00,619][Main][INFO] - [train] Step 40200 out of 65536 | Loss --> 1.893 | Grad_l2 --> 0.294 | Weights_l2 --> 15001.828 | Lr --> 0.009 | Seconds_per_step --> 1.647 | -[2023-12-08 04:53:46,410][Main][INFO] - [train] Step 40300 out of 65536 | Loss --> 1.892 | Grad_l2 --> 0.290 | Weights_l2 --> 15009.287 | Lr --> 0.009 | Seconds_per_step --> 1.658 | -[2023-12-08 04:56:30,183][Main][INFO] - [train] Step 40400 out of 65536 | Loss --> 1.895 | Grad_l2 --> 0.296 | Weights_l2 --> 15016.612 | Lr --> 0.009 | Seconds_per_step --> 1.638 | -[2023-12-08 04:59:14,207][Main][INFO] - [train] Step 40500 out of 65536 | Loss --> 1.893 | Grad_l2 --> 0.293 | Weights_l2 --> 15023.836 | Lr --> 0.008 | Seconds_per_step --> 1.640 | -[2023-12-08 05:01:59,921][Main][INFO] - [train] Step 40600 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.291 | Weights_l2 --> 15031.009 | Lr --> 0.008 | Seconds_per_step --> 1.657 | -[2023-12-08 05:04:44,032][Main][INFO] - [train] Step 40700 out of 65536 | Loss --> 1.875 | Grad_l2 --> 0.291 | Weights_l2 --> 15038.103 | Lr --> 0.008 | Seconds_per_step --> 1.641 | -[2023-12-08 05:05:17,154][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00471-of-00512.json.gz -[2023-12-08 05:06:25,266][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00424-of-00512.json.gz -[2023-12-08 05:07:29,324][Main][INFO] - [train] Step 40800 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.296 | Weights_l2 --> 15045.169 | Lr --> 0.008 | Seconds_per_step --> 1.653 | -[2023-12-08 05:10:15,262][Main][INFO] - [train] Step 40900 out of 65536 | Loss --> 1.881 | Grad_l2 --> 0.293 | Weights_l2 --> 15052.101 | Lr --> 0.008 | Seconds_per_step --> 1.659 | -[2023-12-08 05:12:59,310][Main][INFO] - [train] Step 41000 out of 65536 | Loss --> 1.879 | Grad_l2 --> 0.296 | Weights_l2 --> 15058.904 | Lr --> 0.008 | Seconds_per_step --> 1.640 | -[2023-12-08 05:15:45,075][Main][INFO] - [train] Step 41100 out of 65536 | Loss --> 1.873 | Grad_l2 --> 0.294 | Weights_l2 --> 15065.612 | Lr --> 0.008 | Seconds_per_step --> 1.658 | -[2023-12-08 05:18:32,712][Main][INFO] - [train] Step 41200 out of 65536 | Loss --> 1.884 | Grad_l2 --> 0.290 | Weights_l2 --> 15072.321 | Lr --> 0.008 | Seconds_per_step --> 1.676 | -[2023-12-08 05:21:17,038][Main][INFO] - [train] Step 41300 out of 65536 | Loss --> 1.892 | Grad_l2 --> 0.294 | Weights_l2 --> 15078.897 | Lr --> 0.008 | Seconds_per_step --> 1.643 | -[2023-12-08 05:24:02,982][Main][INFO] - [train] Step 41400 out of 65536 | Loss --> 1.882 | Grad_l2 --> 0.290 | Weights_l2 --> 15085.311 | Lr --> 0.008 | Seconds_per_step --> 1.659 | -[2023-12-08 05:26:52,288][Main][INFO] - [train] Step 41500 out of 65536 | Loss --> 1.878 | Grad_l2 --> 0.290 | Weights_l2 --> 15091.695 | Lr --> 0.008 | Seconds_per_step --> 1.693 | -[2023-12-08 05:27:09,261][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00053-of-00512.json.gz -[2023-12-08 05:28:15,872][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00063-of-00512.json.gz -[2023-12-08 05:29:39,142][Main][INFO] - [train] Step 41600 out of 65536 | Loss --> 1.856 | Grad_l2 --> 0.293 | Weights_l2 --> 15098.005 | Lr --> 0.008 | Seconds_per_step --> 1.669 | -[2023-12-08 05:32:22,875][Main][INFO] - [train] Step 41700 out of 65536 | Loss --> 1.857 | Grad_l2 --> 0.294 | Weights_l2 --> 15104.243 | Lr --> 0.008 | Seconds_per_step --> 1.637 | -[2023-12-08 05:35:13,602][Main][INFO] - [train] Step 41800 out of 65536 | Loss --> 1.859 | Grad_l2 --> 0.293 | Weights_l2 --> 15110.402 | Lr --> 0.008 | Seconds_per_step --> 1.707 | -[2023-12-08 05:37:56,484][Main][INFO] - [train] Step 41900 out of 65536 | Loss --> 1.858 | Grad_l2 --> 0.292 | Weights_l2 --> 15116.379 | Lr --> 0.008 | Seconds_per_step --> 1.629 | -[2023-12-08 05:40:40,183][Main][INFO] - [train] Step 42000 out of 65536 | Loss --> 1.861 | Grad_l2 --> 0.297 | Weights_l2 --> 15122.308 | Lr --> 0.008 | Seconds_per_step --> 1.637 | -[2023-12-08 05:43:25,177][Main][INFO] - [train] Step 42100 out of 65536 | Loss --> 1.860 | Grad_l2 --> 0.295 | Weights_l2 --> 15128.164 | Lr --> 0.008 | Seconds_per_step --> 1.650 | -[2023-12-08 05:46:10,029][Main][INFO] - [train] Step 42200 out of 65536 | Loss --> 1.849 | Grad_l2 --> 0.292 | Weights_l2 --> 15133.949 | Lr --> 0.008 | Seconds_per_step --> 1.649 | -[2023-12-08 05:48:27,068][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00200-of-00512.json.gz -[2023-12-08 05:48:55,001][Main][INFO] - [train] Step 42300 out of 65536 | Loss --> 1.874 | Grad_l2 --> 0.290 | Weights_l2 --> 15139.665 | Lr --> 0.007 | Seconds_per_step --> 1.650 | -[2023-12-08 05:49:17,236][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00266-of-00512.json.gz -[2023-12-08 05:51:39,010][Main][INFO] - [train] Step 42400 out of 65536 | Loss --> 1.868 | Grad_l2 --> 0.290 | Weights_l2 --> 15145.309 | Lr --> 0.007 | Seconds_per_step --> 1.640 | -[2023-12-08 05:54:24,491][Main][INFO] - [train] Step 42500 out of 65536 | Loss --> 1.878 | Grad_l2 --> 0.289 | Weights_l2 --> 15150.920 | Lr --> 0.007 | Seconds_per_step --> 1.655 | -[2023-12-08 05:57:08,952][Main][INFO] - [train] Step 42600 out of 65536 | Loss --> 1.863 | Grad_l2 --> 0.288 | Weights_l2 --> 15156.354 | Lr --> 0.007 | Seconds_per_step --> 1.645 | -[2023-12-08 05:59:51,577][Main][INFO] - [train] Step 42700 out of 65536 | Loss --> 1.855 | Grad_l2 --> 0.290 | Weights_l2 --> 15161.813 | Lr --> 0.007 | Seconds_per_step --> 1.626 | -[2023-12-08 06:02:37,600][Main][INFO] - [train] Step 42800 out of 65536 | Loss --> 1.846 | Grad_l2 --> 0.296 | Weights_l2 --> 15167.237 | Lr --> 0.007 | Seconds_per_step --> 1.660 | -[2023-12-08 06:05:22,789][Main][INFO] - [train] Step 42900 out of 65536 | Loss --> 1.834 | Grad_l2 --> 0.296 | Weights_l2 --> 15172.469 | Lr --> 0.007 | Seconds_per_step --> 1.652 | -[2023-12-08 06:08:07,377][Main][INFO] - [train] Step 43000 out of 65536 | Loss --> 1.826 | Grad_l2 --> 0.291 | Weights_l2 --> 15177.712 | Lr --> 0.007 | Seconds_per_step --> 1.646 | -[2023-12-08 06:10:34,452][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00144-of-00512.json.gz -[2023-12-08 06:10:52,180][Main][INFO] - [train] Step 43100 out of 65536 | Loss --> 1.834 | Grad_l2 --> 0.294 | Weights_l2 --> 15182.833 | Lr --> 0.007 | Seconds_per_step --> 1.648 | -[2023-12-08 06:10:54,965][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00386-of-00512.json.gz -[2023-12-08 06:13:36,601][Main][INFO] - [train] Step 43200 out of 65536 | Loss --> 1.839 | Grad_l2 --> 0.288 | Weights_l2 --> 15187.904 | Lr --> 0.007 | Seconds_per_step --> 1.644 | -[2023-12-08 06:16:20,014][Main][INFO] - [train] Step 43300 out of 65536 | Loss --> 1.851 | Grad_l2 --> 0.292 | Weights_l2 --> 15192.823 | Lr --> 0.007 | Seconds_per_step --> 1.634 | -[2023-12-08 06:19:04,359][Main][INFO] - [train] Step 43400 out of 65536 | Loss --> 1.851 | Grad_l2 --> 0.293 | Weights_l2 --> 15197.741 | Lr --> 0.007 | Seconds_per_step --> 1.643 | -[2023-12-08 06:21:49,783][Main][INFO] - [train] Step 43500 out of 65536 | Loss --> 1.849 | Grad_l2 --> 0.290 | Weights_l2 --> 15202.597 | Lr --> 0.007 | Seconds_per_step --> 1.654 | -[2023-12-08 06:24:33,703][Main][INFO] - [train] Step 43600 out of 65536 | Loss --> 1.841 | Grad_l2 --> 0.288 | Weights_l2 --> 15207.298 | Lr --> 0.007 | Seconds_per_step --> 1.639 | -[2023-12-08 06:27:17,028][Main][INFO] - [train] Step 43700 out of 65536 | Loss --> 1.847 | Grad_l2 --> 0.290 | Weights_l2 --> 15211.960 | Lr --> 0.007 | Seconds_per_step --> 1.633 | -[2023-12-08 06:30:04,262][Main][INFO] - [train] Step 43800 out of 65536 | Loss --> 1.844 | Grad_l2 --> 0.290 | Weights_l2 --> 15216.597 | Lr --> 0.007 | Seconds_per_step --> 1.672 | -[2023-12-08 06:32:11,514][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00302-of-00512.json.gz -[2023-12-08 06:32:23,251][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00057-of-00512.json.gz -[2023-12-08 06:32:50,310][Main][INFO] - [train] Step 43900 out of 65536 | Loss --> 1.837 | Grad_l2 --> 0.289 | Weights_l2 --> 15221.108 | Lr --> 0.007 | Seconds_per_step --> 1.660 | -[2023-12-08 06:35:37,436][Main][INFO] - [train] Step 44000 out of 65536 | Loss --> 1.839 | Grad_l2 --> 0.295 | Weights_l2 --> 15225.620 | Lr --> 0.007 | Seconds_per_step --> 1.671 | -[2023-12-08 06:38:20,398][Main][INFO] - [train] Step 44100 out of 65536 | Loss --> 1.818 | Grad_l2 --> 0.287 | Weights_l2 --> 15230.007 | Lr --> 0.007 | Seconds_per_step --> 1.630 | -[2023-12-08 06:41:02,785][Main][INFO] - [train] Step 44200 out of 65536 | Loss --> 1.819 | Grad_l2 --> 0.287 | Weights_l2 --> 15234.267 | Lr --> 0.006 | Seconds_per_step --> 1.624 | -[2023-12-08 06:43:47,146][Main][INFO] - [train] Step 44300 out of 65536 | Loss --> 1.809 | Grad_l2 --> 0.289 | Weights_l2 --> 15238.496 | Lr --> 0.006 | Seconds_per_step --> 1.644 | -[2023-12-08 06:46:32,021][Main][INFO] - [train] Step 44400 out of 65536 | Loss --> 1.820 | Grad_l2 --> 0.287 | Weights_l2 --> 15242.681 | Lr --> 0.006 | Seconds_per_step --> 1.649 | -[2023-12-08 06:49:15,450][Main][INFO] - [train] Step 44500 out of 65536 | Loss --> 1.819 | Grad_l2 --> 0.290 | Weights_l2 --> 15246.810 | Lr --> 0.006 | Seconds_per_step --> 1.634 | -[2023-12-08 06:51:58,011][Main][INFO] - [train] Step 44600 out of 65536 | Loss --> 1.827 | Grad_l2 --> 0.293 | Weights_l2 --> 15250.909 | Lr --> 0.006 | Seconds_per_step --> 1.626 | -[2023-12-08 06:53:20,092][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00346-of-00512.json.gz -[2023-12-08 06:53:31,307][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00329-of-00512.json.gz -[2023-12-08 06:54:44,869][Main][INFO] - [train] Step 44700 out of 65536 | Loss --> 1.828 | Grad_l2 --> 0.297 | Weights_l2 --> 15254.867 | Lr --> 0.006 | Seconds_per_step --> 1.669 | -[2023-12-08 06:57:28,247][Main][INFO] - [train] Step 44800 out of 65536 | Loss --> 1.825 | Grad_l2 --> 0.292 | Weights_l2 --> 15258.797 | Lr --> 0.006 | Seconds_per_step --> 1.634 | -[2023-12-08 07:00:11,336][Main][INFO] - [train] Step 44900 out of 65536 | Loss --> 1.833 | Grad_l2 --> 0.292 | Weights_l2 --> 15262.632 | Lr --> 0.006 | Seconds_per_step --> 1.631 | -[2023-12-08 07:02:57,365][Main][INFO] - [train] Step 45000 out of 65536 | Loss --> 1.822 | Grad_l2 --> 0.287 | Weights_l2 --> 15266.418 | Lr --> 0.006 | Seconds_per_step --> 1.660 | -[2023-12-08 07:05:40,065][Main][INFO] - [train] Step 45100 out of 65536 | Loss --> 1.829 | Grad_l2 --> 0.293 | Weights_l2 --> 15270.140 | Lr --> 0.006 | Seconds_per_step --> 1.627 | -[2023-12-08 07:08:23,251][Main][INFO] - [train] Step 45200 out of 65536 | Loss --> 1.828 | Grad_l2 --> 0.290 | Weights_l2 --> 15273.780 | Lr --> 0.006 | Seconds_per_step --> 1.632 | -[2023-12-08 07:11:06,761][Main][INFO] - [train] Step 45300 out of 65536 | Loss --> 1.811 | Grad_l2 --> 0.288 | Weights_l2 --> 15277.407 | Lr --> 0.006 | Seconds_per_step --> 1.635 | -[2023-12-08 07:13:52,038][Main][INFO] - [train] Step 45400 out of 65536 | Loss --> 1.806 | Grad_l2 --> 0.288 | Weights_l2 --> 15280.945 | Lr --> 0.006 | Seconds_per_step --> 1.653 | -[2023-12-08 07:15:01,341][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00481-of-00512.json.gz -[2023-12-08 07:15:29,736][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00158-of-00512.json.gz -[2023-12-08 07:16:38,151][Main][INFO] - [train] Step 45500 out of 65536 | Loss --> 1.804 | Grad_l2 --> 0.288 | Weights_l2 --> 15284.428 | Lr --> 0.006 | Seconds_per_step --> 1.661 | -[2023-12-08 07:19:20,481][Main][INFO] - [train] Step 45600 out of 65536 | Loss --> 1.815 | Grad_l2 --> 0.295 | Weights_l2 --> 15287.821 | Lr --> 0.006 | Seconds_per_step --> 1.623 | -[2023-12-08 07:22:05,372][Main][INFO] - [train] Step 45700 out of 65536 | Loss --> 1.830 | Grad_l2 --> 0.290 | Weights_l2 --> 15291.193 | Lr --> 0.006 | Seconds_per_step --> 1.649 | -[2023-12-08 07:24:52,021][Main][INFO] - [train] Step 45800 out of 65536 | Loss --> 1.818 | Grad_l2 --> 0.291 | Weights_l2 --> 15294.520 | Lr --> 0.006 | Seconds_per_step --> 1.666 | -[2023-12-08 07:27:35,554][Main][INFO] - [train] Step 45900 out of 65536 | Loss --> 1.836 | Grad_l2 --> 0.292 | Weights_l2 --> 15297.761 | Lr --> 0.006 | Seconds_per_step --> 1.635 | -[2023-12-08 07:30:18,506][Main][INFO] - [train] Step 46000 out of 65536 | Loss --> 1.815 | Grad_l2 --> 0.289 | Weights_l2 --> 15300.932 | Lr --> 0.006 | Seconds_per_step --> 1.630 | -[2023-12-08 07:32:59,331][Main][INFO] - [train] Step 46100 out of 65536 | Loss --> 1.819 | Grad_l2 --> 0.294 | Weights_l2 --> 15304.096 | Lr --> 0.005 | Seconds_per_step --> 1.608 | -[2023-12-08 07:35:44,708][Main][INFO] - [train] Step 46200 out of 65536 | Loss --> 1.810 | Grad_l2 --> 0.290 | Weights_l2 --> 15307.201 | Lr --> 0.005 | Seconds_per_step --> 1.654 | -[2023-12-08 07:36:23,831][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00075-of-00512.json.gz -[2023-12-08 07:37:00,314][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00072-of-00512.json.gz -[2023-12-08 07:38:35,519][Main][INFO] - [train] Step 46300 out of 65536 | Loss --> 1.802 | Grad_l2 --> 0.292 | Weights_l2 --> 15310.243 | Lr --> 0.005 | Seconds_per_step --> 1.708 | -[2023-12-08 07:41:19,252][Main][INFO] - [train] Step 46400 out of 65536 | Loss --> 1.801 | Grad_l2 --> 0.288 | Weights_l2 --> 15313.252 | Lr --> 0.005 | Seconds_per_step --> 1.637 | -[2023-12-08 07:44:03,892][Main][INFO] - [train] Step 46500 out of 65536 | Loss --> 1.803 | Grad_l2 --> 0.289 | Weights_l2 --> 15316.148 | Lr --> 0.005 | Seconds_per_step --> 1.646 | -[2023-12-08 07:46:48,523][Main][INFO] - [train] Step 46600 out of 65536 | Loss --> 1.808 | Grad_l2 --> 0.285 | Weights_l2 --> 15319.003 | Lr --> 0.005 | Seconds_per_step --> 1.646 | -[2023-12-08 07:49:31,265][Main][INFO] - [train] Step 46700 out of 65536 | Loss --> 1.802 | Grad_l2 --> 0.291 | Weights_l2 --> 15321.777 | Lr --> 0.005 | Seconds_per_step --> 1.627 | -[2023-12-08 07:52:14,908][Main][INFO] - [train] Step 46800 out of 65536 | Loss --> 1.815 | Grad_l2 --> 0.285 | Weights_l2 --> 15324.560 | Lr --> 0.005 | Seconds_per_step --> 1.636 | -[2023-12-08 07:54:59,523][Main][INFO] - [train] Step 46900 out of 65536 | Loss --> 1.801 | Grad_l2 --> 0.289 | Weights_l2 --> 15327.312 | Lr --> 0.005 | Seconds_per_step --> 1.646 | -[2023-12-08 07:57:41,235][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00285-of-00512.json.gz -[2023-12-08 07:57:41,602][Main][INFO] - [train] Step 47000 out of 65536 | Loss --> 1.806 | Grad_l2 --> 0.293 | Weights_l2 --> 15329.980 | Lr --> 0.005 | Seconds_per_step --> 1.621 | -[2023-12-08 07:57:55,456][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00062-of-00512.json.gz -[2023-12-08 08:00:29,619][Main][INFO] - [train] Step 47100 out of 65536 | Loss --> 1.783 | Grad_l2 --> 0.288 | Weights_l2 --> 15332.556 | Lr --> 0.005 | Seconds_per_step --> 1.680 | -[2023-12-08 08:03:13,379][Main][INFO] - [train] Step 47200 out of 65536 | Loss --> 1.783 | Grad_l2 --> 0.287 | Weights_l2 --> 15335.148 | Lr --> 0.005 | Seconds_per_step --> 1.638 | -[2023-12-08 08:05:57,405][Main][INFO] - [train] Step 47300 out of 65536 | Loss --> 1.772 | Grad_l2 --> 0.290 | Weights_l2 --> 15337.617 | Lr --> 0.005 | Seconds_per_step --> 1.640 | -[2023-12-08 08:08:45,111][Main][INFO] - [train] Step 47400 out of 65536 | Loss --> 1.784 | Grad_l2 --> 0.291 | Weights_l2 --> 15340.116 | Lr --> 0.005 | Seconds_per_step --> 1.677 | -[2023-12-08 08:11:30,355][Main][INFO] - [train] Step 47500 out of 65536 | Loss --> 1.787 | Grad_l2 --> 0.289 | Weights_l2 --> 15342.507 | Lr --> 0.005 | Seconds_per_step --> 1.652 | -[2023-12-08 08:14:16,723][Main][INFO] - [train] Step 47600 out of 65536 | Loss --> 1.786 | Grad_l2 --> 0.292 | Weights_l2 --> 15344.884 | Lr --> 0.005 | Seconds_per_step --> 1.664 | -[2023-12-08 08:17:01,705][Main][INFO] - [train] Step 47700 out of 65536 | Loss --> 1.783 | Grad_l2 --> 0.294 | Weights_l2 --> 15347.263 | Lr --> 0.005 | Seconds_per_step --> 1.650 | -[2023-12-08 08:19:41,049][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00039-of-00512.json.gz -[2023-12-08 08:19:46,132][Main][INFO] - [train] Step 47800 out of 65536 | Loss --> 1.785 | Grad_l2 --> 0.292 | Weights_l2 --> 15349.536 | Lr --> 0.005 | Seconds_per_step --> 1.644 | -[2023-12-08 08:19:58,320][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00154-of-00512.json.gz -[2023-12-08 08:22:31,518][Main][INFO] - [train] Step 47900 out of 65536 | Loss --> 1.787 | Grad_l2 --> 0.286 | Weights_l2 --> 15351.752 | Lr --> 0.005 | Seconds_per_step --> 1.654 | -[2023-12-08 08:25:14,620][Main][INFO] - [train] Step 48000 out of 65536 | Loss --> 1.795 | Grad_l2 --> 0.288 | Weights_l2 --> 15353.948 | Lr --> 0.005 | Seconds_per_step --> 1.631 | -[2023-12-08 08:27:57,451][Main][INFO] - [train] Step 48100 out of 65536 | Loss --> 1.792 | Grad_l2 --> 0.286 | Weights_l2 --> 15356.050 | Lr --> 0.004 | Seconds_per_step --> 1.628 | -[2023-12-08 08:30:40,095][Main][INFO] - [train] Step 48200 out of 65536 | Loss --> 1.777 | Grad_l2 --> 0.284 | Weights_l2 --> 15358.173 | Lr --> 0.004 | Seconds_per_step --> 1.626 | -[2023-12-08 08:33:24,328][Main][INFO] - [train] Step 48300 out of 65536 | Loss --> 1.784 | Grad_l2 --> 0.286 | Weights_l2 --> 15360.176 | Lr --> 0.004 | Seconds_per_step --> 1.642 | -[2023-12-08 08:36:13,903][Main][INFO] - [train] Step 48400 out of 65536 | Loss --> 1.787 | Grad_l2 --> 0.293 | Weights_l2 --> 15362.165 | Lr --> 0.004 | Seconds_per_step --> 1.696 | -[2023-12-08 08:38:57,713][Main][INFO] - [train] Step 48500 out of 65536 | Loss --> 1.782 | Grad_l2 --> 0.289 | Weights_l2 --> 15364.140 | Lr --> 0.004 | Seconds_per_step --> 1.638 | -[2023-12-08 08:40:39,684][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00286-of-00512.json.gz -[2023-12-08 08:41:01,831][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00343-of-00512.json.gz -[2023-12-08 08:41:45,772][Main][INFO] - [train] Step 48600 out of 65536 | Loss --> 1.783 | Grad_l2 --> 0.287 | Weights_l2 --> 15366.056 | Lr --> 0.004 | Seconds_per_step --> 1.681 | -[2023-12-08 08:44:31,506][Main][INFO] - [train] Step 48700 out of 65536 | Loss --> 1.780 | Grad_l2 --> 0.287 | Weights_l2 --> 15367.945 | Lr --> 0.004 | Seconds_per_step --> 1.657 | -[2023-12-08 08:47:14,990][Main][INFO] - [train] Step 48800 out of 65536 | Loss --> 1.780 | Grad_l2 --> 0.285 | Weights_l2 --> 15369.763 | Lr --> 0.004 | Seconds_per_step --> 1.635 | -[2023-12-08 08:50:02,328][Main][INFO] - [train] Step 48900 out of 65536 | Loss --> 1.783 | Grad_l2 --> 0.287 | Weights_l2 --> 15371.608 | Lr --> 0.004 | Seconds_per_step --> 1.673 | -[2023-12-08 08:52:46,768][Main][INFO] - [train] Step 49000 out of 65536 | Loss --> 1.781 | Grad_l2 --> 0.288 | Weights_l2 --> 15373.368 | Lr --> 0.004 | Seconds_per_step --> 1.644 | -[2023-12-08 08:55:35,158][Main][INFO] - [train] Step 49100 out of 65536 | Loss --> 1.759 | Grad_l2 --> 0.287 | Weights_l2 --> 15375.101 | Lr --> 0.004 | Seconds_per_step --> 1.684 | -[2023-12-08 08:58:18,881][Main][INFO] - [train] Step 49200 out of 65536 | Loss --> 1.764 | Grad_l2 --> 0.288 | Weights_l2 --> 15376.809 | Lr --> 0.004 | Seconds_per_step --> 1.637 | -[2023-12-08 09:01:02,424][Main][INFO] - [train] Step 49300 out of 65536 | Loss --> 1.766 | Grad_l2 --> 0.284 | Weights_l2 --> 15378.426 | Lr --> 0.004 | Seconds_per_step --> 1.635 | -[2023-12-08 09:02:30,767][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00304-of-00512.json.gz -[2023-12-08 09:02:47,725][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00253-of-00512.json.gz -[2023-12-08 09:03:51,365][Main][INFO] - [train] Step 49400 out of 65536 | Loss --> 1.763 | Grad_l2 --> 0.290 | Weights_l2 --> 15380.053 | Lr --> 0.004 | Seconds_per_step --> 1.689 | -[2023-12-08 09:06:34,503][Main][INFO] - [train] Step 49500 out of 65536 | Loss --> 1.753 | Grad_l2 --> 0.283 | Weights_l2 --> 15381.625 | Lr --> 0.004 | Seconds_per_step --> 1.631 | -[2023-12-08 09:09:18,184][Main][INFO] - [train] Step 49600 out of 65536 | Loss --> 1.773 | Grad_l2 --> 0.287 | Weights_l2 --> 15383.168 | Lr --> 0.004 | Seconds_per_step --> 1.637 | -[2023-12-08 09:12:04,244][Main][INFO] - [train] Step 49700 out of 65536 | Loss --> 1.768 | Grad_l2 --> 0.283 | Weights_l2 --> 15384.647 | Lr --> 0.004 | Seconds_per_step --> 1.661 | -[2023-12-08 09:14:46,543][Main][INFO] - [train] Step 49800 out of 65536 | Loss --> 1.759 | Grad_l2 --> 0.285 | Weights_l2 --> 15386.115 | Lr --> 0.004 | Seconds_per_step --> 1.623 | -[2023-12-08 09:17:30,709][Main][INFO] - [train] Step 49900 out of 65536 | Loss --> 1.765 | Grad_l2 --> 0.286 | Weights_l2 --> 15387.577 | Lr --> 0.004 | Seconds_per_step --> 1.642 | -[2023-12-08 09:20:14,027][Main][INFO] - [train] Step 50000 out of 65536 | Loss --> 1.751 | Grad_l2 --> 0.284 | Weights_l2 --> 15388.964 | Lr --> 0.004 | Seconds_per_step --> 1.633 | -[2023-12-08 09:20:14,028][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-50000 -[2023-12-08 09:20:14,031][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading -[2023-12-08 09:20:16,697][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-50000/model.safetensors -[2023-12-08 09:20:20,217][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-50000/optimizer.bin -[2023-12-08 09:20:20,218][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-50000/scheduler.bin -[2023-12-08 09:20:20,218][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-50000/sampler.bin -[2023-12-08 09:20:20,218][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-50000/sampler_1.bin -[2023-12-08 09:20:20,219][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-50000/random_states_0.pkl -[2023-12-08 09:23:04,564][Main][INFO] - [train] Step 50100 out of 65536 | Loss --> 1.763 | Grad_l2 --> 0.286 | Weights_l2 --> 15390.308 | Lr --> 0.004 | Seconds_per_step --> 1.705 | -[2023-12-08 09:24:10,928][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00254-of-00512.json.gz -[2023-12-08 09:24:25,740][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00262-of-00512.json.gz -[2023-12-08 09:25:51,431][Main][INFO] - [train] Step 50200 out of 65536 | Loss --> 1.756 | Grad_l2 --> 0.285 | Weights_l2 --> 15391.666 | Lr --> 0.004 | Seconds_per_step --> 1.669 | -[2023-12-08 09:28:35,433][Main][INFO] - [train] Step 50300 out of 65536 | Loss --> 1.763 | Grad_l2 --> 0.286 | Weights_l2 --> 15392.952 | Lr --> 0.003 | Seconds_per_step --> 1.640 | -[2023-12-08 09:31:25,310][Main][INFO] - [train] Step 50400 out of 65536 | Loss --> 1.759 | Grad_l2 --> 0.292 | Weights_l2 --> 15394.228 | Lr --> 0.003 | Seconds_per_step --> 1.699 | -[2023-12-08 09:34:08,252][Main][INFO] - [train] Step 50500 out of 65536 | Loss --> 1.755 | Grad_l2 --> 0.287 | Weights_l2 --> 15395.489 | Lr --> 0.003 | Seconds_per_step --> 1.629 | -[2023-12-08 09:36:53,506][Main][INFO] - [train] Step 50600 out of 65536 | Loss --> 1.755 | Grad_l2 --> 0.286 | Weights_l2 --> 15396.700 | Lr --> 0.003 | Seconds_per_step --> 1.653 | -[2023-12-08 09:39:36,014][Main][INFO] - [train] Step 50700 out of 65536 | Loss --> 1.747 | Grad_l2 --> 0.286 | Weights_l2 --> 15397.908 | Lr --> 0.003 | Seconds_per_step --> 1.625 | -[2023-12-08 09:42:18,549][Main][INFO] - [train] Step 50800 out of 65536 | Loss --> 1.751 | Grad_l2 --> 0.287 | Weights_l2 --> 15399.066 | Lr --> 0.003 | Seconds_per_step --> 1.625 | -[2023-12-08 09:45:05,361][Main][INFO] - [train] Step 50900 out of 65536 | Loss --> 1.748 | Grad_l2 --> 0.283 | Weights_l2 --> 15400.164 | Lr --> 0.003 | Seconds_per_step --> 1.668 | -[2023-12-08 09:45:21,682][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00313-of-00512.json.gz -[2023-12-08 09:45:33,260][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00118-of-00512.json.gz -[2023-12-08 09:47:50,909][Main][INFO] - [train] Step 51000 out of 65536 | Loss --> 1.746 | Grad_l2 --> 0.291 | Weights_l2 --> 15401.269 | Lr --> 0.003 | Seconds_per_step --> 1.655 | -[2023-12-08 09:50:38,032][Main][INFO] - [train] Step 51100 out of 65536 | Loss --> 1.757 | Grad_l2 --> 0.286 | Weights_l2 --> 15402.332 | Lr --> 0.003 | Seconds_per_step --> 1.671 | -[2023-12-08 09:53:22,692][Main][INFO] - [train] Step 51200 out of 65536 | Loss --> 1.744 | Grad_l2 --> 0.285 | Weights_l2 --> 15403.377 | Lr --> 0.003 | Seconds_per_step --> 1.647 | -[2023-12-08 09:56:05,810][Main][INFO] - [train] Step 51300 out of 65536 | Loss --> 1.752 | Grad_l2 --> 0.289 | Weights_l2 --> 15404.393 | Lr --> 0.003 | Seconds_per_step --> 1.631 | -[2023-12-08 09:58:48,711][Main][INFO] - [train] Step 51400 out of 65536 | Loss --> 1.739 | Grad_l2 --> 0.285 | Weights_l2 --> 15405.353 | Lr --> 0.003 | Seconds_per_step --> 1.629 | -[2023-12-08 10:01:33,520][Main][INFO] - [train] Step 51500 out of 65536 | Loss --> 1.747 | Grad_l2 --> 0.286 | Weights_l2 --> 15406.338 | Lr --> 0.003 | Seconds_per_step --> 1.648 | -[2023-12-08 10:04:18,761][Main][INFO] - [train] Step 51600 out of 65536 | Loss --> 1.745 | Grad_l2 --> 0.281 | Weights_l2 --> 15407.291 | Lr --> 0.003 | Seconds_per_step --> 1.652 | -[2023-12-08 10:07:03,383][Main][INFO] - [train] Step 51700 out of 65536 | Loss --> 1.751 | Grad_l2 --> 0.284 | Weights_l2 --> 15408.200 | Lr --> 0.003 | Seconds_per_step --> 1.646 | -[2023-12-08 10:07:29,814][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00300-of-00512.json.gz -[2023-12-08 10:07:35,489][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00433-of-00512.json.gz -[2023-12-08 10:09:48,503][Main][INFO] - [train] Step 51800 out of 65536 | Loss --> 1.733 | Grad_l2 --> 0.285 | Weights_l2 --> 15409.099 | Lr --> 0.003 | Seconds_per_step --> 1.651 | -[2023-12-08 10:12:33,410][Main][INFO] - [train] Step 51900 out of 65536 | Loss --> 1.745 | Grad_l2 --> 0.287 | Weights_l2 --> 15409.965 | Lr --> 0.003 | Seconds_per_step --> 1.649 | -[2023-12-08 10:15:17,067][Main][INFO] - [train] Step 52000 out of 65536 | Loss --> 1.750 | Grad_l2 --> 0.284 | Weights_l2 --> 15410.798 | Lr --> 0.003 | Seconds_per_step --> 1.637 | -[2023-12-08 10:18:03,330][Main][INFO] - [train] Step 52100 out of 65536 | Loss --> 1.740 | Grad_l2 --> 0.283 | Weights_l2 --> 15411.598 | Lr --> 0.003 | Seconds_per_step --> 1.663 | -[2023-12-08 10:20:48,047][Main][INFO] - [train] Step 52200 out of 65536 | Loss --> 1.728 | Grad_l2 --> 0.285 | Weights_l2 --> 15412.392 | Lr --> 0.003 | Seconds_per_step --> 1.647 | -[2023-12-08 10:23:35,716][Main][INFO] - [train] Step 52300 out of 65536 | Loss --> 1.732 | Grad_l2 --> 0.285 | Weights_l2 --> 15413.165 | Lr --> 0.003 | Seconds_per_step --> 1.677 | -[2023-12-08 10:26:17,443][Main][INFO] - [train] Step 52400 out of 65536 | Loss --> 1.718 | Grad_l2 --> 0.282 | Weights_l2 --> 15413.929 | Lr --> 0.003 | Seconds_per_step --> 1.617 | -[2023-12-08 10:28:59,726][Main][INFO] - [train] Step 52500 out of 65536 | Loss --> 1.719 | Grad_l2 --> 0.283 | Weights_l2 --> 15414.639 | Lr --> 0.003 | Seconds_per_step --> 1.623 | -[2023-12-08 10:29:19,069][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00379-of-00512.json.gz -[2023-12-08 10:29:31,353][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00015-of-00512.json.gz -[2023-12-08 10:31:50,933][Main][INFO] - [train] Step 52600 out of 65536 | Loss --> 1.737 | Grad_l2 --> 0.286 | Weights_l2 --> 15415.364 | Lr --> 0.003 | Seconds_per_step --> 1.712 | -[2023-12-08 10:34:37,214][Main][INFO] - [train] Step 52700 out of 65536 | Loss --> 1.734 | Grad_l2 --> 0.281 | Weights_l2 --> 15416.042 | Lr --> 0.003 | Seconds_per_step --> 1.663 | -[2023-12-08 10:37:19,498][Main][INFO] - [train] Step 52800 out of 65536 | Loss --> 1.739 | Grad_l2 --> 0.286 | Weights_l2 --> 15416.699 | Lr --> 0.002 | Seconds_per_step --> 1.623 | -[2023-12-08 10:40:03,853][Main][INFO] - [train] Step 52900 out of 65536 | Loss --> 1.738 | Grad_l2 --> 0.284 | Weights_l2 --> 15417.362 | Lr --> 0.002 | Seconds_per_step --> 1.644 | -[2023-12-08 10:42:50,575][Main][INFO] - [train] Step 53000 out of 65536 | Loss --> 1.727 | Grad_l2 --> 0.281 | Weights_l2 --> 15417.969 | Lr --> 0.002 | Seconds_per_step --> 1.667 | -[2023-12-08 10:45:34,875][Main][INFO] - [train] Step 53100 out of 65536 | Loss --> 1.729 | Grad_l2 --> 0.282 | Weights_l2 --> 15418.561 | Lr --> 0.002 | Seconds_per_step --> 1.643 | -[2023-12-08 10:48:18,526][Main][INFO] - [train] Step 53200 out of 65536 | Loss --> 1.723 | Grad_l2 --> 0.284 | Weights_l2 --> 15419.163 | Lr --> 0.002 | Seconds_per_step --> 1.637 | -[2023-12-08 10:50:42,786][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00128-of-00512.json.gz -[2023-12-08 10:50:48,302][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00276-of-00512.json.gz -[2023-12-08 10:51:04,580][Main][INFO] - [train] Step 53300 out of 65536 | Loss --> 1.722 | Grad_l2 --> 0.281 | Weights_l2 --> 15419.721 | Lr --> 0.002 | Seconds_per_step --> 1.661 | -[2023-12-08 10:53:47,441][Main][INFO] - [train] Step 53400 out of 65536 | Loss --> 1.717 | Grad_l2 --> 0.281 | Weights_l2 --> 15420.250 | Lr --> 0.002 | Seconds_per_step --> 1.629 | -[2023-12-08 10:56:30,839][Main][INFO] - [train] Step 53500 out of 65536 | Loss --> 1.728 | Grad_l2 --> 0.282 | Weights_l2 --> 15420.792 | Lr --> 0.002 | Seconds_per_step --> 1.634 | -[2023-12-08 10:59:14,421][Main][INFO] - [train] Step 53600 out of 65536 | Loss --> 1.715 | Grad_l2 --> 0.281 | Weights_l2 --> 15421.324 | Lr --> 0.002 | Seconds_per_step --> 1.636 | -[2023-12-08 11:01:58,272][Main][INFO] - [train] Step 53700 out of 65536 | Loss --> 1.726 | Grad_l2 --> 0.281 | Weights_l2 --> 15421.802 | Lr --> 0.002 | Seconds_per_step --> 1.639 | -[2023-12-08 11:04:44,518][Main][INFO] - [train] Step 53800 out of 65536 | Loss --> 1.723 | Grad_l2 --> 0.282 | Weights_l2 --> 15422.283 | Lr --> 0.002 | Seconds_per_step --> 1.662 | -[2023-12-08 11:07:27,834][Main][INFO] - [train] Step 53900 out of 65536 | Loss --> 1.724 | Grad_l2 --> 0.284 | Weights_l2 --> 15422.757 | Lr --> 0.002 | Seconds_per_step --> 1.633 | -[2023-12-08 11:10:11,765][Main][INFO] - [train] Step 54000 out of 65536 | Loss --> 1.728 | Grad_l2 --> 0.282 | Weights_l2 --> 15423.201 | Lr --> 0.002 | Seconds_per_step --> 1.639 | -[2023-12-08 11:12:23,251][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00148-of-00512.json.gz -[2023-12-08 11:12:30,483][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00275-of-00512.json.gz -[2023-12-08 11:12:58,188][Main][INFO] - [train] Step 54100 out of 65536 | Loss --> 1.737 | Grad_l2 --> 0.285 | Weights_l2 --> 15423.657 | Lr --> 0.002 | Seconds_per_step --> 1.664 | -[2023-12-08 11:15:42,699][Main][INFO] - [train] Step 54200 out of 65536 | Loss --> 1.745 | Grad_l2 --> 0.284 | Weights_l2 --> 15424.064 | Lr --> 0.002 | Seconds_per_step --> 1.645 | -[2023-12-08 11:18:25,994][Main][INFO] - [train] Step 54300 out of 65536 | Loss --> 1.732 | Grad_l2 --> 0.283 | Weights_l2 --> 15424.495 | Lr --> 0.002 | Seconds_per_step --> 1.633 | -[2023-12-08 11:21:10,592][Main][INFO] - [train] Step 54400 out of 65536 | Loss --> 1.734 | Grad_l2 --> 0.287 | Weights_l2 --> 15424.894 | Lr --> 0.002 | Seconds_per_step --> 1.646 | -[2023-12-08 11:23:58,197][Main][INFO] - [train] Step 54500 out of 65536 | Loss --> 1.707 | Grad_l2 --> 0.284 | Weights_l2 --> 15425.274 | Lr --> 0.002 | Seconds_per_step --> 1.676 | -[2023-12-08 11:26:42,068][Main][INFO] - [train] Step 54600 out of 65536 | Loss --> 1.716 | Grad_l2 --> 0.283 | Weights_l2 --> 15425.659 | Lr --> 0.002 | Seconds_per_step --> 1.639 | -[2023-12-08 11:29:24,761][Main][INFO] - [train] Step 54700 out of 65536 | Loss --> 1.722 | Grad_l2 --> 0.280 | Weights_l2 --> 15425.995 | Lr --> 0.002 | Seconds_per_step --> 1.627 | -[2023-12-08 11:32:08,700][Main][INFO] - [train] Step 54800 out of 65536 | Loss --> 1.719 | Grad_l2 --> 0.280 | Weights_l2 --> 15426.316 | Lr --> 0.002 | Seconds_per_step --> 1.639 | -[2023-12-08 11:33:30,641][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00318-of-00512.json.gz -[2023-12-08 11:33:46,852][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00445-of-00512.json.gz -[2023-12-08 11:34:55,159][Main][INFO] - [train] Step 54900 out of 65536 | Loss --> 1.705 | Grad_l2 --> 0.281 | Weights_l2 --> 15426.666 | Lr --> 0.002 | Seconds_per_step --> 1.665 | -[2023-12-08 11:37:40,752][Main][INFO] - [train] Step 55000 out of 65536 | Loss --> 1.716 | Grad_l2 --> 0.282 | Weights_l2 --> 15426.970 | Lr --> 0.002 | Seconds_per_step --> 1.656 | -[2023-12-08 11:40:25,865][Main][INFO] - [train] Step 55100 out of 65536 | Loss --> 1.705 | Grad_l2 --> 0.282 | Weights_l2 --> 15427.272 | Lr --> 0.002 | Seconds_per_step --> 1.651 | -[2023-12-08 11:43:09,660][Main][INFO] - [train] Step 55200 out of 65536 | Loss --> 1.710 | Grad_l2 --> 0.283 | Weights_l2 --> 15427.563 | Lr --> 0.002 | Seconds_per_step --> 1.638 | -[2023-12-08 11:45:54,642][Main][INFO] - [train] Step 55300 out of 65536 | Loss --> 1.709 | Grad_l2 --> 0.281 | Weights_l2 --> 15427.829 | Lr --> 0.002 | Seconds_per_step --> 1.650 | -[2023-12-08 11:48:37,924][Main][INFO] - [train] Step 55400 out of 65536 | Loss --> 1.708 | Grad_l2 --> 0.283 | Weights_l2 --> 15428.112 | Lr --> 0.002 | Seconds_per_step --> 1.633 | -[2023-12-08 11:51:19,802][Main][INFO] - [train] Step 55500 out of 65536 | Loss --> 1.706 | Grad_l2 --> 0.283 | Weights_l2 --> 15428.385 | Lr --> 0.002 | Seconds_per_step --> 1.619 | -[2023-12-08 11:54:08,733][Main][INFO] - [train] Step 55600 out of 65536 | Loss --> 1.709 | Grad_l2 --> 0.281 | Weights_l2 --> 15428.656 | Lr --> 0.002 | Seconds_per_step --> 1.689 | -[2023-12-08 11:55:13,016][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00358-of-00512.json.gz -[2023-12-08 11:55:44,871][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00077-of-00512.json.gz -[2023-12-08 11:56:53,572][Main][INFO] - [train] Step 55700 out of 65536 | Loss --> 1.710 | Grad_l2 --> 0.281 | Weights_l2 --> 15428.889 | Lr --> 0.002 | Seconds_per_step --> 1.648 | -[2023-12-08 11:59:38,153][Main][INFO] - [train] Step 55800 out of 65536 | Loss --> 1.702 | Grad_l2 --> 0.280 | Weights_l2 --> 15429.122 | Lr --> 0.001 | Seconds_per_step --> 1.646 | -[2023-12-08 12:02:23,329][Main][INFO] - [train] Step 55900 out of 65536 | Loss --> 1.697 | Grad_l2 --> 0.284 | Weights_l2 --> 15429.337 | Lr --> 0.001 | Seconds_per_step --> 1.652 | -[2023-12-08 12:05:08,120][Main][INFO] - [train] Step 56000 out of 65536 | Loss --> 1.695 | Grad_l2 --> 0.280 | Weights_l2 --> 15429.544 | Lr --> 0.001 | Seconds_per_step --> 1.648 | -[2023-12-08 12:07:55,368][Main][INFO] - [train] Step 56100 out of 65536 | Loss --> 1.710 | Grad_l2 --> 0.282 | Weights_l2 --> 15429.753 | Lr --> 0.001 | Seconds_per_step --> 1.672 | -[2023-12-08 12:10:43,478][Main][INFO] - [train] Step 56200 out of 65536 | Loss --> 1.705 | Grad_l2 --> 0.279 | Weights_l2 --> 15429.958 | Lr --> 0.001 | Seconds_per_step --> 1.681 | -[2023-12-08 12:13:27,540][Main][INFO] - [train] Step 56300 out of 65536 | Loss --> 1.705 | Grad_l2 --> 0.279 | Weights_l2 --> 15430.141 | Lr --> 0.001 | Seconds_per_step --> 1.641 | -[2023-12-08 12:16:11,783][Main][INFO] - [train] Step 56400 out of 65536 | Loss --> 1.714 | Grad_l2 --> 0.278 | Weights_l2 --> 15430.317 | Lr --> 0.001 | Seconds_per_step --> 1.642 | -[2023-12-08 12:17:15,535][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00035-of-00512.json.gz -[2023-12-08 12:17:44,105][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00490-of-00512.json.gz -[2023-12-08 12:18:57,109][Main][INFO] - [train] Step 56500 out of 65536 | Loss --> 1.710 | Grad_l2 --> 0.280 | Weights_l2 --> 15430.484 | Lr --> 0.001 | Seconds_per_step --> 1.653 | -[2023-12-08 12:21:41,369][Main][INFO] - [train] Step 56600 out of 65536 | Loss --> 1.708 | Grad_l2 --> 0.282 | Weights_l2 --> 15430.646 | Lr --> 0.001 | Seconds_per_step --> 1.643 | -[2023-12-08 12:24:28,223][Main][INFO] - [train] Step 56700 out of 65536 | Loss --> 1.721 | Grad_l2 --> 0.284 | Weights_l2 --> 15430.806 | Lr --> 0.001 | Seconds_per_step --> 1.669 | -[2023-12-08 12:27:15,929][Main][INFO] - [train] Step 56800 out of 65536 | Loss --> 1.712 | Grad_l2 --> 0.282 | Weights_l2 --> 15430.970 | Lr --> 0.001 | Seconds_per_step --> 1.677 | -[2023-12-08 12:29:59,203][Main][INFO] - [train] Step 56900 out of 65536 | Loss --> 1.713 | Grad_l2 --> 0.281 | Weights_l2 --> 15431.114 | Lr --> 0.001 | Seconds_per_step --> 1.633 | -[2023-12-08 12:32:42,561][Main][INFO] - [train] Step 57000 out of 65536 | Loss --> 1.703 | Grad_l2 --> 0.281 | Weights_l2 --> 15431.260 | Lr --> 0.001 | Seconds_per_step --> 1.634 | -[2023-12-08 12:35:27,175][Main][INFO] - [train] Step 57100 out of 65536 | Loss --> 1.706 | Grad_l2 --> 0.282 | Weights_l2 --> 15431.389 | Lr --> 0.001 | Seconds_per_step --> 1.646 | -[2023-12-08 12:38:09,660][Main][INFO] - [train] Step 57200 out of 65536 | Loss --> 1.700 | Grad_l2 --> 0.283 | Weights_l2 --> 15431.512 | Lr --> 0.001 | Seconds_per_step --> 1.625 | -[2023-12-08 12:38:25,702][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00272-of-00512.json.gz -[2023-12-08 12:38:50,694][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00278-of-00512.json.gz -[2023-12-08 12:40:55,941][Main][INFO] - [train] Step 57300 out of 65536 | Loss --> 1.699 | Grad_l2 --> 0.282 | Weights_l2 --> 15431.620 | Lr --> 0.001 | Seconds_per_step --> 1.663 | -[2023-12-08 12:43:39,972][Main][INFO] - [train] Step 57400 out of 65536 | Loss --> 1.703 | Grad_l2 --> 0.281 | Weights_l2 --> 15431.723 | Lr --> 0.001 | Seconds_per_step --> 1.640 | -[2023-12-08 12:46:23,921][Main][INFO] - [train] Step 57500 out of 65536 | Loss --> 1.711 | Grad_l2 --> 0.281 | Weights_l2 --> 15431.840 | Lr --> 0.001 | Seconds_per_step --> 1.639 | -[2023-12-08 12:49:07,123][Main][INFO] - [train] Step 57600 out of 65536 | Loss --> 1.706 | Grad_l2 --> 0.279 | Weights_l2 --> 15431.940 | Lr --> 0.001 | Seconds_per_step --> 1.632 | -[2023-12-08 12:51:50,574][Main][INFO] - [train] Step 57700 out of 65536 | Loss --> 1.687 | Grad_l2 --> 0.286 | Weights_l2 --> 15432.029 | Lr --> 0.001 | Seconds_per_step --> 1.635 | -[2023-12-08 12:54:35,407][Main][INFO] - [train] Step 57800 out of 65536 | Loss --> 1.684 | Grad_l2 --> 0.277 | Weights_l2 --> 15432.129 | Lr --> 0.001 | Seconds_per_step --> 1.648 | -[2023-12-08 12:57:18,089][Main][INFO] - [train] Step 57900 out of 65536 | Loss --> 1.699 | Grad_l2 --> 0.281 | Weights_l2 --> 15432.215 | Lr --> 0.001 | Seconds_per_step --> 1.627 | -[2023-12-08 12:59:55,784][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00324-of-00512.json.gz -[2023-12-08 13:00:04,052][Main][INFO] - [train] Step 58000 out of 65536 | Loss --> 1.693 | Grad_l2 --> 0.279 | Weights_l2 --> 15432.306 | Lr --> 0.001 | Seconds_per_step --> 1.660 | -[2023-12-08 13:00:17,916][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00146-of-00512.json.gz -[2023-12-08 13:02:49,579][Main][INFO] - [train] Step 58100 out of 65536 | Loss --> 1.705 | Grad_l2 --> 0.282 | Weights_l2 --> 15432.378 | Lr --> 0.001 | Seconds_per_step --> 1.655 | -[2023-12-08 13:05:37,015][Main][INFO] - [train] Step 58200 out of 65536 | Loss --> 1.688 | Grad_l2 --> 0.277 | Weights_l2 --> 15432.451 | Lr --> 0.001 | Seconds_per_step --> 1.674 | -[2023-12-08 13:08:22,055][Main][INFO] - [train] Step 58300 out of 65536 | Loss --> 1.678 | Grad_l2 --> 0.279 | Weights_l2 --> 15432.519 | Lr --> 0.001 | Seconds_per_step --> 1.650 | -[2023-12-08 13:11:04,820][Main][INFO] - [train] Step 58400 out of 65536 | Loss --> 1.692 | Grad_l2 --> 0.278 | Weights_l2 --> 15432.581 | Lr --> 0.001 | Seconds_per_step --> 1.628 | -[2023-12-08 13:13:49,927][Main][INFO] - [train] Step 58500 out of 65536 | Loss --> 1.701 | Grad_l2 --> 0.280 | Weights_l2 --> 15432.643 | Lr --> 0.001 | Seconds_per_step --> 1.651 | -[2023-12-08 13:16:35,770][Main][INFO] - [train] Step 58600 out of 65536 | Loss --> 1.685 | Grad_l2 --> 0.277 | Weights_l2 --> 15432.697 | Lr --> 0.001 | Seconds_per_step --> 1.658 | -[2023-12-08 13:19:18,601][Main][INFO] - [train] Step 58700 out of 65536 | Loss --> 1.691 | Grad_l2 --> 0.279 | Weights_l2 --> 15432.744 | Lr --> 0.001 | Seconds_per_step --> 1.628 | -[2023-12-08 13:21:58,583][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00173-of-00512.json.gz -[2023-12-08 13:22:05,798][Main][INFO] - [train] Step 58800 out of 65536 | Loss --> 1.695 | Grad_l2 --> 0.279 | Weights_l2 --> 15432.792 | Lr --> 0.001 | Seconds_per_step --> 1.672 | -[2023-12-08 13:22:06,952][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00387-of-00512.json.gz -[2023-12-08 13:24:54,832][Main][INFO] - [train] Step 58900 out of 65536 | Loss --> 1.721 | Grad_l2 --> 0.290 | Weights_l2 --> 15432.842 | Lr --> 0.001 | Seconds_per_step --> 1.690 | -[2023-12-08 13:27:41,806][Main][INFO] - [train] Step 59000 out of 65536 | Loss --> 1.717 | Grad_l2 --> 0.279 | Weights_l2 --> 15432.895 | Lr --> 0.001 | Seconds_per_step --> 1.670 | -[2023-12-08 13:30:25,050][Main][INFO] - [train] Step 59100 out of 65536 | Loss --> 1.703 | Grad_l2 --> 0.279 | Weights_l2 --> 15432.944 | Lr --> 0.001 | Seconds_per_step --> 1.632 | -[2023-12-08 13:33:09,122][Main][INFO] - [train] Step 59200 out of 65536 | Loss --> 1.703 | Grad_l2 --> 0.277 | Weights_l2 --> 15432.982 | Lr --> 0.001 | Seconds_per_step --> 1.641 | -[2023-12-08 13:35:54,967][Main][INFO] - [train] Step 59300 out of 65536 | Loss --> 1.697 | Grad_l2 --> 0.280 | Weights_l2 --> 15433.012 | Lr --> 0.001 | Seconds_per_step --> 1.658 | -[2023-12-08 13:38:41,308][Main][INFO] - [train] Step 59400 out of 65536 | Loss --> 1.695 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.050 | Lr --> 0.001 | Seconds_per_step --> 1.663 | -[2023-12-08 13:41:25,206][Main][INFO] - [train] Step 59500 out of 65536 | Loss --> 1.692 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.091 | Lr --> 0.001 | Seconds_per_step --> 1.639 | -[2023-12-08 13:43:23,790][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00479-of-00512.json.gz -[2023-12-08 13:43:24,171][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00180-of-00512.json.gz -[2023-12-08 13:44:11,382][Main][INFO] - [train] Step 59600 out of 65536 | Loss --> 1.714 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.119 | Lr --> 0.001 | Seconds_per_step --> 1.662 | -[2023-12-08 13:46:54,720][Main][INFO] - [train] Step 59700 out of 65536 | Loss --> 1.693 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.142 | Lr --> 0.001 | Seconds_per_step --> 1.633 | -[2023-12-08 13:49:38,230][Main][INFO] - [train] Step 59800 out of 65536 | Loss --> 1.696 | Grad_l2 --> 0.281 | Weights_l2 --> 15433.163 | Lr --> 0.001 | Seconds_per_step --> 1.635 | -[2023-12-08 13:52:20,943][Main][INFO] - [train] Step 59900 out of 65536 | Loss --> 1.687 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.194 | Lr --> 0.001 | Seconds_per_step --> 1.627 | -[2023-12-08 13:55:11,135][Main][INFO] - [train] Step 60000 out of 65536 | Loss --> 1.678 | Grad_l2 --> 0.280 | Weights_l2 --> 15433.212 | Lr --> 0.000 | Seconds_per_step --> 1.702 | -[2023-12-08 13:55:11,136][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-60000 -[2023-12-08 13:55:11,139][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading -[2023-12-08 13:55:13,847][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-60000/model.safetensors -[2023-12-08 13:55:17,342][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-60000/optimizer.bin -[2023-12-08 13:55:17,344][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-60000/scheduler.bin -[2023-12-08 13:55:17,344][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-60000/sampler.bin -[2023-12-08 13:55:17,344][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-60000/sampler_1.bin -[2023-12-08 13:55:17,346][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-60000/random_states_0.pkl -[2023-12-08 13:57:59,843][Main][INFO] - [train] Step 60100 out of 65536 | Loss --> 1.691 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.231 | Lr --> 0.000 | Seconds_per_step --> 1.687 | -[2023-12-08 14:00:42,572][Main][INFO] - [train] Step 60200 out of 65536 | Loss --> 1.683 | Grad_l2 --> 0.276 | Weights_l2 --> 15433.247 | Lr --> 0.000 | Seconds_per_step --> 1.627 | -[2023-12-08 14:03:26,191][Main][INFO] - [train] Step 60300 out of 65536 | Loss --> 1.695 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.265 | Lr --> 0.000 | Seconds_per_step --> 1.636 | -[2023-12-08 14:04:55,373][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00374-of-00512.json.gz -[2023-12-08 14:05:11,847][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00363-of-00512.json.gz -[2023-12-08 14:06:14,336][Main][INFO] - [train] Step 60400 out of 65536 | Loss --> 1.701 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.275 | Lr --> 0.000 | Seconds_per_step --> 1.681 | -[2023-12-08 14:08:58,494][Main][INFO] - [train] Step 60500 out of 65536 | Loss --> 1.690 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.293 | Lr --> 0.000 | Seconds_per_step --> 1.642 | -[2023-12-08 14:11:43,614][Main][INFO] - [train] Step 60600 out of 65536 | Loss --> 1.685 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.305 | Lr --> 0.000 | Seconds_per_step --> 1.651 | -[2023-12-08 14:14:28,288][Main][INFO] - [train] Step 60700 out of 65536 | Loss --> 1.665 | Grad_l2 --> 0.280 | Weights_l2 --> 15433.321 | Lr --> 0.000 | Seconds_per_step --> 1.647 | -[2023-12-08 14:17:14,490][Main][INFO] - [train] Step 60800 out of 65536 | Loss --> 1.679 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.328 | Lr --> 0.000 | Seconds_per_step --> 1.662 | -[2023-12-08 14:19:57,687][Main][INFO] - [train] Step 60900 out of 65536 | Loss --> 1.695 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.341 | Lr --> 0.000 | Seconds_per_step --> 1.632 | -[2023-12-08 14:22:41,385][Main][INFO] - [train] Step 61000 out of 65536 | Loss --> 1.688 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.349 | Lr --> 0.000 | Seconds_per_step --> 1.637 | -[2023-12-08 14:25:28,159][Main][INFO] - [train] Step 61100 out of 65536 | Loss --> 1.688 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.356 | Lr --> 0.000 | Seconds_per_step --> 1.668 | -[2023-12-08 14:26:56,619][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00393-of-00512.json.gz -[2023-12-08 14:27:13,268][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00081-of-00512.json.gz -[2023-12-08 14:28:12,495][Main][INFO] - [train] Step 61200 out of 65536 | Loss --> 1.686 | Grad_l2 --> 0.280 | Weights_l2 --> 15433.364 | Lr --> 0.000 | Seconds_per_step --> 1.643 | -[2023-12-08 14:30:56,161][Main][INFO] - [train] Step 61300 out of 65536 | Loss --> 1.685 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.362 | Lr --> 0.000 | Seconds_per_step --> 1.637 | -[2023-12-08 14:33:46,326][Main][INFO] - [train] Step 61400 out of 65536 | Loss --> 1.693 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.371 | Lr --> 0.000 | Seconds_per_step --> 1.702 | -[2023-12-08 14:36:30,055][Main][INFO] - [train] Step 61500 out of 65536 | Loss --> 1.687 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.376 | Lr --> 0.000 | Seconds_per_step --> 1.637 | -[2023-12-08 14:39:13,604][Main][INFO] - [train] Step 61600 out of 65536 | Loss --> 1.682 | Grad_l2 --> 0.280 | Weights_l2 --> 15433.378 | Lr --> 0.000 | Seconds_per_step --> 1.635 | -[2023-12-08 14:41:57,576][Main][INFO] - [train] Step 61700 out of 65536 | Loss --> 1.692 | Grad_l2 --> 0.276 | Weights_l2 --> 15433.383 | Lr --> 0.000 | Seconds_per_step --> 1.640 | -[2023-12-08 14:44:41,315][Main][INFO] - [train] Step 61800 out of 65536 | Loss --> 1.691 | Grad_l2 --> 0.281 | Weights_l2 --> 15433.383 | Lr --> 0.000 | Seconds_per_step --> 1.637 | -[2023-12-08 14:47:24,723][Main][INFO] - [train] Step 61900 out of 65536 | Loss --> 1.689 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.384 | Lr --> 0.000 | Seconds_per_step --> 1.634 | -[2023-12-08 14:47:49,718][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00258-of-00512.json.gz -[2023-12-08 14:48:12,946][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00013-of-00512.json.gz -[2023-12-08 14:50:10,060][Main][INFO] - [train] Step 62000 out of 65536 | Loss --> 1.694 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.382 | Lr --> 0.000 | Seconds_per_step --> 1.653 | -[2023-12-08 14:52:54,526][Main][INFO] - [train] Step 62100 out of 65536 | Loss --> 1.699 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.385 | Lr --> 0.000 | Seconds_per_step --> 1.645 | -[2023-12-08 14:55:41,524][Main][INFO] - [train] Step 62200 out of 65536 | Loss --> 1.700 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.385 | Lr --> 0.000 | Seconds_per_step --> 1.670 | -[2023-12-08 14:58:24,944][Main][INFO] - [train] Step 62300 out of 65536 | Loss --> 1.700 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.384 | Lr --> 0.000 | Seconds_per_step --> 1.634 | -[2023-12-08 15:01:08,345][Main][INFO] - [train] Step 62400 out of 65536 | Loss --> 1.704 | Grad_l2 --> 0.280 | Weights_l2 --> 15433.385 | Lr --> 0.000 | Seconds_per_step --> 1.634 | -[2023-12-08 15:03:51,093][Main][INFO] - [train] Step 62500 out of 65536 | Loss --> 1.692 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.386 | Lr --> 0.000 | Seconds_per_step --> 1.627 | -[2023-12-08 15:06:37,743][Main][INFO] - [train] Step 62600 out of 65536 | Loss --> 1.681 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.386 | Lr --> 0.000 | Seconds_per_step --> 1.666 | -[2023-12-08 15:09:20,288][Main][INFO] - [train] Step 62700 out of 65536 | Loss --> 1.701 | Grad_l2 --> 0.281 | Weights_l2 --> 15433.382 | Lr --> 0.000 | Seconds_per_step --> 1.625 | -[2023-12-08 15:09:29,086][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00371-of-00512.json.gz -[2023-12-08 15:09:46,816][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00181-of-00512.json.gz -[2023-12-08 15:12:07,597][Main][INFO] - [train] Step 62800 out of 65536 | Loss --> 1.705 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.384 | Lr --> 0.000 | Seconds_per_step --> 1.673 | -[2023-12-08 15:14:49,825][Main][INFO] - [train] Step 62900 out of 65536 | Loss --> 1.692 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.383 | Lr --> 0.000 | Seconds_per_step --> 1.622 | -[2023-12-08 15:17:33,303][Main][INFO] - [train] Step 63000 out of 65536 | Loss --> 1.704 | Grad_l2 --> 0.282 | Weights_l2 --> 15433.381 | Lr --> 0.000 | Seconds_per_step --> 1.635 | -[2023-12-08 15:20:16,259][Main][INFO] - [train] Step 63100 out of 65536 | Loss --> 1.686 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.381 | Lr --> 0.000 | Seconds_per_step --> 1.630 | -[2023-12-08 15:23:00,950][Main][INFO] - [train] Step 63200 out of 65536 | Loss --> 1.682 | Grad_l2 --> 0.276 | Weights_l2 --> 15433.379 | Lr --> 0.000 | Seconds_per_step --> 1.647 | -[2023-12-08 15:25:45,654][Main][INFO] - [train] Step 63300 out of 65536 | Loss --> 1.676 | Grad_l2 --> 0.275 | Weights_l2 --> 15433.378 | Lr --> 0.000 | Seconds_per_step --> 1.647 | -[2023-12-08 15:28:29,308][Main][INFO] - [train] Step 63400 out of 65536 | Loss --> 1.673 | Grad_l2 --> 0.272 | Weights_l2 --> 15433.376 | Lr --> 0.000 | Seconds_per_step --> 1.637 | -[2023-12-08 15:30:41,215][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00208-of-00512.json.gz -[2023-12-08 15:30:45,784][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00103-of-00512.json.gz -[2023-12-08 15:31:17,454][Main][INFO] - [train] Step 63500 out of 65536 | Loss --> 1.665 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.376 | Lr --> 0.000 | Seconds_per_step --> 1.681 | -[2023-12-08 15:34:01,784][Main][INFO] - [train] Step 63600 out of 65536 | Loss --> 1.677 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.376 | Lr --> 0.000 | Seconds_per_step --> 1.643 | -[2023-12-08 15:36:49,376][Main][INFO] - [train] Step 63700 out of 65536 | Loss --> 1.696 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.375 | Lr --> 0.000 | Seconds_per_step --> 1.676 | -[2023-12-08 15:39:32,627][Main][INFO] - [train] Step 63800 out of 65536 | Loss --> 1.690 | Grad_l2 --> 0.274 | Weights_l2 --> 15433.375 | Lr --> 0.000 | Seconds_per_step --> 1.633 | -[2023-12-08 15:42:18,422][Main][INFO] - [train] Step 63900 out of 65536 | Loss --> 1.676 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.374 | Lr --> 0.000 | Seconds_per_step --> 1.658 | -[2023-12-08 15:45:01,664][Main][INFO] - [train] Step 64000 out of 65536 | Loss --> 1.696 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.373 | Lr --> 0.000 | Seconds_per_step --> 1.632 | -[2023-12-08 15:47:45,414][Main][INFO] - [train] Step 64100 out of 65536 | Loss --> 1.702 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.372 | Lr --> 0.000 | Seconds_per_step --> 1.637 | -[2023-12-08 15:50:28,686][Main][INFO] - [train] Step 64200 out of 65536 | Loss --> 1.686 | Grad_l2 --> 0.275 | Weights_l2 --> 15433.371 | Lr --> 0.000 | Seconds_per_step --> 1.633 | -[2023-12-08 15:52:25,759][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00282-of-00512.json.gz -[2023-12-08 15:52:32,500][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00498-of-00512.json.gz -[2023-12-08 15:53:17,099][Main][INFO] - [train] Step 64300 out of 65536 | Loss --> 1.690 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.372 | Lr --> 0.000 | Seconds_per_step --> 1.684 | -[2023-12-08 15:55:59,214][Main][INFO] - [train] Step 64400 out of 65536 | Loss --> 1.689 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.370 | Lr --> 0.000 | Seconds_per_step --> 1.621 | -[2023-12-08 15:58:46,401][Main][INFO] - [train] Step 64500 out of 65536 | Loss --> 1.684 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.370 | Lr --> 0.000 | Seconds_per_step --> 1.672 | -[2023-12-08 16:01:31,443][Main][INFO] - [train] Step 64600 out of 65536 | Loss --> 1.666 | Grad_l2 --> 0.276 | Weights_l2 --> 15433.369 | Lr --> 0.000 | Seconds_per_step --> 1.650 | -[2023-12-08 16:04:15,924][Main][INFO] - [train] Step 64700 out of 65536 | Loss --> 1.675 | Grad_l2 --> 0.275 | Weights_l2 --> 15433.369 | Lr --> 0.000 | Seconds_per_step --> 1.645 | -[2023-12-08 16:06:59,459][Main][INFO] - [train] Step 64800 out of 65536 | Loss --> 1.691 | Grad_l2 --> 0.276 | Weights_l2 --> 15433.369 | Lr --> 0.000 | Seconds_per_step --> 1.635 | -[2023-12-08 16:09:44,164][Main][INFO] - [train] Step 64900 out of 65536 | Loss --> 1.707 | Grad_l2 --> 0.277 | Weights_l2 --> 15433.369 | Lr --> 0.000 | Seconds_per_step --> 1.647 | -[2023-12-08 16:12:30,337][Main][INFO] - [train] Step 65000 out of 65536 | Loss --> 1.701 | Grad_l2 --> 0.280 | Weights_l2 --> 15433.369 | Lr --> 0.000 | Seconds_per_step --> 1.662 | -[2023-12-08 16:13:57,714][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00301-of-00512.json.gz -[2023-12-08 16:14:38,582][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00201-of-00512.json.gz -[2023-12-08 16:15:16,165][Main][INFO] - [train] Step 65100 out of 65536 | Loss --> 1.706 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.369 | Lr --> 0.000 | Seconds_per_step --> 1.658 | -[2023-12-08 16:18:00,919][Main][INFO] - [train] Step 65200 out of 65536 | Loss --> 1.685 | Grad_l2 --> 0.278 | Weights_l2 --> 15433.369 | Lr --> 0.000 | Seconds_per_step --> 1.648 | -[2023-12-08 16:20:46,581][Main][INFO] - [train] Step 65300 out of 65536 | Loss --> 1.662 | Grad_l2 --> 0.281 | Weights_l2 --> 15433.369 | Lr --> 0.000 | Seconds_per_step --> 1.657 | -[2023-12-08 16:23:31,147][Main][INFO] - [train] Step 65400 out of 65536 | Loss --> 1.670 | Grad_l2 --> 0.279 | Weights_l2 --> 15433.369 | Lr --> 0.000 | Seconds_per_step --> 1.646 | -[2023-12-08 16:26:15,663][Main][INFO] - [train] Step 65500 out of 65536 | Loss --> 1.688 | Grad_l2 --> 0.275 | Weights_l2 --> 15433.368 | Lr --> 0.000 | Seconds_per_step --> 1.645 | -[2023-12-08 16:27:13,534][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. -[2023-12-08 16:27:13,534][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz -[2023-12-08 16:29:37,549][Main][INFO] - [eval] Step 65537 out of 65536 | Loss --> 1.697 | Accuracy --> 0.675 | Time --> 144.076 | -[2023-12-08 16:29:37,552][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-65537 -[2023-12-08 16:29:37,555][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading -[2023-12-08 16:29:39,435][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-65537/model.safetensors -[2023-12-08 16:29:42,830][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-65537/optimizer.bin -[2023-12-08 16:29:42,832][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-65537/scheduler.bin -[2023-12-08 16:29:42,832][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-65537/sampler.bin -[2023-12-08 16:29:42,832][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-65537/sampler_1.bin -[2023-12-08 16:29:42,833][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-65537/random_states_0.pkl +[2024-01-02 07:29:30,395][Main][INFO] - Working directory is /home/jovyan/nanoT5/logs/2024-01-02/07-29-30- +[2024-01-02 07:29:37,889][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00228-of-00512.json.gz +[2024-01-02 07:29:37,893][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00122-of-00512.json.gz +[2024-01-02 07:32:55,673][Main][INFO] - [train] Step 100 out of 65536 | Loss --> 56.793 | Grad_l2 --> 106.708 | Weights_l2 --> 9934.760 | Lr --> 0.010 | Seconds_per_step --> 1.985 | +[2024-01-02 07:35:45,962][Main][INFO] - [train] Step 200 out of 65536 | Loss --> 11.768 | Grad_l2 --> 10.600 | Weights_l2 --> 9933.323 | Lr --> 0.010 | Seconds_per_step --> 1.703 | +[2024-01-02 07:38:35,708][Main][INFO] - [train] Step 300 out of 65536 | Loss --> 8.268 | Grad_l2 --> 7.176 | Weights_l2 --> 9932.510 | Lr --> 0.010 | Seconds_per_step --> 1.697 | +[2024-01-02 07:41:23,498][Main][INFO] - [train] Step 400 out of 65536 | Loss --> 8.143 | Grad_l2 --> 54.690 | Weights_l2 --> 9932.596 | Lr --> 0.010 | Seconds_per_step --> 1.678 | +[2024-01-02 07:44:16,348][Main][INFO] - [train] Step 500 out of 65536 | Loss --> 6.963 | Grad_l2 --> 1.569 | Weights_l2 --> 9933.899 | Lr --> 0.011 | Seconds_per_step --> 1.728 | +[2024-01-02 07:45:06,122][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00274-of-00512.json.gz +[2024-01-02 07:45:35,466][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00185-of-00512.json.gz +[2024-01-02 07:47:07,477][Main][INFO] - [train] Step 600 out of 65536 | Loss --> 6.746 | Grad_l2 --> 1.401 | Weights_l2 --> 9935.268 | Lr --> 0.011 | Seconds_per_step --> 1.711 | +[2024-01-02 07:49:58,409][Main][INFO] - [train] Step 700 out of 65536 | Loss --> 6.592 | Grad_l2 --> 1.098 | Weights_l2 --> 9937.069 | Lr --> 0.011 | Seconds_per_step --> 1.709 | +[2024-01-02 07:52:46,792][Main][INFO] - [train] Step 800 out of 65536 | Loss --> 6.464 | Grad_l2 --> 1.049 | Weights_l2 --> 9939.345 | Lr --> 0.011 | Seconds_per_step --> 1.684 | +[2024-01-02 07:55:38,626][Main][INFO] - [train] Step 900 out of 65536 | Loss --> 6.357 | Grad_l2 --> 0.894 | Weights_l2 --> 9942.497 | Lr --> 0.011 | Seconds_per_step --> 1.718 | +[2024-01-02 07:58:29,981][Main][INFO] - [train] Step 1000 out of 65536 | Loss --> 6.257 | Grad_l2 --> 0.859 | Weights_l2 --> 9946.163 | Lr --> 0.011 | Seconds_per_step --> 1.714 | +[2024-01-02 08:01:17,269][Main][INFO] - [train] Step 1100 out of 65536 | Loss --> 6.184 | Grad_l2 --> 0.828 | Weights_l2 --> 9950.285 | Lr --> 0.011 | Seconds_per_step --> 1.673 | +[2024-01-02 08:04:06,727][Main][INFO] - [train] Step 1200 out of 65536 | Loss --> 6.094 | Grad_l2 --> 0.828 | Weights_l2 --> 9954.874 | Lr --> 0.011 | Seconds_per_step --> 1.695 | +[2024-01-02 08:05:30,527][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00430-of-00512.json.gz +[2024-01-02 08:05:43,438][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00437-of-00512.json.gz +[2024-01-02 08:06:59,680][Main][INFO] - [train] Step 1300 out of 65536 | Loss --> 6.032 | Grad_l2 --> 0.794 | Weights_l2 --> 9959.952 | Lr --> 0.011 | Seconds_per_step --> 1.730 | +[2024-01-02 08:09:50,806][Main][INFO] - [train] Step 1400 out of 65536 | Loss --> 5.947 | Grad_l2 --> 0.731 | Weights_l2 --> 9965.442 | Lr --> 0.011 | Seconds_per_step --> 1.711 | +[2024-01-02 08:12:39,958][Main][INFO] - [train] Step 1500 out of 65536 | Loss --> 5.881 | Grad_l2 --> 0.723 | Weights_l2 --> 9971.498 | Lr --> 0.012 | Seconds_per_step --> 1.692 | +[2024-01-02 08:15:28,523][Main][INFO] - [train] Step 1600 out of 65536 | Loss --> 5.838 | Grad_l2 --> 0.712 | Weights_l2 --> 9977.942 | Lr --> 0.012 | Seconds_per_step --> 1.686 | +[2024-01-02 08:18:21,350][Main][INFO] - [train] Step 1700 out of 65536 | Loss --> 5.781 | Grad_l2 --> 0.673 | Weights_l2 --> 9984.949 | Lr --> 0.012 | Seconds_per_step --> 1.728 | +[2024-01-02 08:21:10,863][Main][INFO] - [train] Step 1800 out of 65536 | Loss --> 5.738 | Grad_l2 --> 0.648 | Weights_l2 --> 9992.259 | Lr --> 0.012 | Seconds_per_step --> 1.695 | +[2024-01-02 08:23:59,970][Main][INFO] - [train] Step 1900 out of 65536 | Loss --> 5.695 | Grad_l2 --> 0.614 | Weights_l2 --> 9999.916 | Lr --> 0.012 | Seconds_per_step --> 1.691 | +[2024-01-02 08:25:36,522][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00113-of-00512.json.gz +[2024-01-02 08:25:43,294][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00087-of-00512.json.gz +[2024-01-02 08:26:52,451][Main][INFO] - [train] Step 2000 out of 65536 | Loss --> 5.631 | Grad_l2 --> 0.601 | Weights_l2 --> 10008.046 | Lr --> 0.012 | Seconds_per_step --> 1.725 | +[2024-01-02 08:29:40,070][Main][INFO] - [train] Step 2100 out of 65536 | Loss --> 5.600 | Grad_l2 --> 0.597 | Weights_l2 --> 10016.459 | Lr --> 0.012 | Seconds_per_step --> 1.676 | +[2024-01-02 08:32:31,002][Main][INFO] - [train] Step 2200 out of 65536 | Loss --> 5.535 | Grad_l2 --> 0.584 | Weights_l2 --> 10025.548 | Lr --> 0.012 | Seconds_per_step --> 1.709 | +[2024-01-02 08:35:18,394][Main][INFO] - [train] Step 2300 out of 65536 | Loss --> 5.398 | Grad_l2 --> 0.600 | Weights_l2 --> 10035.698 | Lr --> 0.012 | Seconds_per_step --> 1.674 | +[2024-01-02 08:38:06,607][Main][INFO] - [train] Step 2400 out of 65536 | Loss --> 5.176 | Grad_l2 --> 0.599 | Weights_l2 --> 10047.897 | Lr --> 0.012 | Seconds_per_step --> 1.682 | +[2024-01-02 08:40:57,015][Main][INFO] - [train] Step 2500 out of 65536 | Loss --> 5.037 | Grad_l2 --> 0.615 | Weights_l2 --> 10061.134 | Lr --> 0.013 | Seconds_per_step --> 1.704 | +[2024-01-02 08:43:48,930][Main][INFO] - [train] Step 2600 out of 65536 | Loss --> 4.849 | Grad_l2 --> 0.589 | Weights_l2 --> 10075.957 | Lr --> 0.013 | Seconds_per_step --> 1.719 | +[2024-01-02 08:45:27,146][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00389-of-00512.json.gz +[2024-01-02 08:45:53,717][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00483-of-00512.json.gz +[2024-01-02 08:46:41,027][Main][INFO] - [train] Step 2700 out of 65536 | Loss --> 4.661 | Grad_l2 --> 0.582 | Weights_l2 --> 10092.508 | Lr --> 0.013 | Seconds_per_step --> 1.721 | +[2024-01-02 08:49:34,467][Main][INFO] - [train] Step 2800 out of 65536 | Loss --> 4.526 | Grad_l2 --> 0.579 | Weights_l2 --> 10109.752 | Lr --> 0.013 | Seconds_per_step --> 1.734 | +[2024-01-02 08:52:22,299][Main][INFO] - [train] Step 2900 out of 65536 | Loss --> 4.396 | Grad_l2 --> 0.546 | Weights_l2 --> 10127.107 | Lr --> 0.013 | Seconds_per_step --> 1.678 | +[2024-01-02 08:55:17,616][Main][INFO] - [train] Step 3000 out of 65536 | Loss --> 4.290 | Grad_l2 --> 0.565 | Weights_l2 --> 10144.398 | Lr --> 0.013 | Seconds_per_step --> 1.753 | +[2024-01-02 08:58:05,967][Main][INFO] - [train] Step 3100 out of 65536 | Loss --> 4.208 | Grad_l2 --> 0.546 | Weights_l2 --> 10161.719 | Lr --> 0.013 | Seconds_per_step --> 1.684 | +[2024-01-02 09:00:59,521][Main][INFO] - [train] Step 3200 out of 65536 | Loss --> 4.134 | Grad_l2 --> 0.559 | Weights_l2 --> 10179.034 | Lr --> 0.013 | Seconds_per_step --> 1.736 | +[2024-01-02 09:03:48,488][Main][INFO] - [train] Step 3300 out of 65536 | Loss --> 4.060 | Grad_l2 --> 0.535 | Weights_l2 --> 10196.335 | Lr --> 0.013 | Seconds_per_step --> 1.690 | +[2024-01-02 09:06:06,162][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00174-of-00512.json.gz +[2024-01-02 09:06:26,824][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00440-of-00512.json.gz +[2024-01-02 09:06:43,296][Main][INFO] - [train] Step 3400 out of 65536 | Loss --> 3.970 | Grad_l2 --> 0.530 | Weights_l2 --> 10213.569 | Lr --> 0.013 | Seconds_per_step --> 1.748 | +[2024-01-02 09:09:32,439][Main][INFO] - [train] Step 3500 out of 65536 | Loss --> 3.919 | Grad_l2 --> 0.525 | Weights_l2 --> 10230.917 | Lr --> 0.014 | Seconds_per_step --> 1.691 | +[2024-01-02 09:12:23,187][Main][INFO] - [train] Step 3600 out of 65536 | Loss --> 3.882 | Grad_l2 --> 0.522 | Weights_l2 --> 10248.395 | Lr --> 0.014 | Seconds_per_step --> 1.707 | +[2024-01-02 09:15:13,458][Main][INFO] - [train] Step 3700 out of 65536 | Loss --> 3.835 | Grad_l2 --> 0.532 | Weights_l2 --> 10266.030 | Lr --> 0.014 | Seconds_per_step --> 1.703 | +[2024-01-02 09:18:01,193][Main][INFO] - [train] Step 3800 out of 65536 | Loss --> 3.798 | Grad_l2 --> 0.514 | Weights_l2 --> 10283.571 | Lr --> 0.014 | Seconds_per_step --> 1.677 | +[2024-01-02 09:20:49,325][Main][INFO] - [train] Step 3900 out of 65536 | Loss --> 3.769 | Grad_l2 --> 0.657 | Weights_l2 --> 10300.982 | Lr --> 0.014 | Seconds_per_step --> 1.681 | +[2024-01-02 09:23:39,240][Main][INFO] - [train] Step 4000 out of 65536 | Loss --> 3.709 | Grad_l2 --> 0.521 | Weights_l2 --> 10318.798 | Lr --> 0.014 | Seconds_per_step --> 1.699 | +[2024-01-02 09:26:01,799][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00133-of-00512.json.gz +[2024-01-02 09:26:10,569][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00364-of-00512.json.gz +[2024-01-02 09:26:33,661][Main][INFO] - [train] Step 4100 out of 65536 | Loss --> 3.669 | Grad_l2 --> 0.521 | Weights_l2 --> 10336.524 | Lr --> 0.014 | Seconds_per_step --> 1.744 | +[2024-01-02 09:29:21,760][Main][INFO] - [train] Step 4200 out of 65536 | Loss --> 3.621 | Grad_l2 --> 0.500 | Weights_l2 --> 10354.133 | Lr --> 0.014 | Seconds_per_step --> 1.681 | +[2024-01-02 09:32:09,396][Main][INFO] - [train] Step 4300 out of 65536 | Loss --> 3.582 | Grad_l2 --> 0.500 | Weights_l2 --> 10371.901 | Lr --> 0.014 | Seconds_per_step --> 1.676 | +[2024-01-02 09:35:01,777][Main][INFO] - [train] Step 4400 out of 65536 | Loss --> 3.561 | Grad_l2 --> 0.493 | Weights_l2 --> 10389.639 | Lr --> 0.014 | Seconds_per_step --> 1.724 | +[2024-01-02 09:37:50,814][Main][INFO] - [train] Step 4500 out of 65536 | Loss --> 3.526 | Grad_l2 --> 0.486 | Weights_l2 --> 10407.441 | Lr --> 0.015 | Seconds_per_step --> 1.690 | +[2024-01-02 09:40:39,274][Main][INFO] - [train] Step 4600 out of 65536 | Loss --> 3.497 | Grad_l2 --> 0.494 | Weights_l2 --> 10425.366 | Lr --> 0.015 | Seconds_per_step --> 1.685 | +[2024-01-02 09:43:29,238][Main][INFO] - [train] Step 4700 out of 65536 | Loss --> 3.462 | Grad_l2 --> 0.468 | Weights_l2 --> 10443.308 | Lr --> 0.015 | Seconds_per_step --> 1.700 | +[2024-01-02 09:46:04,669][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00482-of-00512.json.gz +[2024-01-02 09:46:20,505][Main][INFO] - [train] Step 4800 out of 65536 | Loss --> 3.441 | Grad_l2 --> 0.478 | Weights_l2 --> 10461.273 | Lr --> 0.015 | Seconds_per_step --> 1.713 | +[2024-01-02 09:46:23,790][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00177-of-00512.json.gz +[2024-01-02 09:49:11,581][Main][INFO] - [train] Step 4900 out of 65536 | Loss --> 3.430 | Grad_l2 --> 0.460 | Weights_l2 --> 10479.178 | Lr --> 0.015 | Seconds_per_step --> 1.711 | +[2024-01-02 09:52:00,911][Main][INFO] - [train] Step 5000 out of 65536 | Loss --> 3.393 | Grad_l2 --> 0.472 | Weights_l2 --> 10497.438 | Lr --> 0.015 | Seconds_per_step --> 1.693 | +[2024-01-02 09:52:00,967][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-02 09:52:00,968][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-02 09:54:12,141][Main][INFO] - [eval] Step 5000 out of 65536 | Loss --> 3.462 | Accuracy --> 0.473 | Time --> 131.226 | +[2024-01-02 09:57:07,848][Main][INFO] - [train] Step 5100 out of 65536 | Loss --> 3.369 | Grad_l2 --> 0.478 | Weights_l2 --> 10515.939 | Lr --> 0.015 | Seconds_per_step --> 1.757 | +[2024-01-02 10:00:01,320][Main][INFO] - [train] Step 5200 out of 65536 | Loss --> 3.358 | Grad_l2 --> 0.457 | Weights_l2 --> 10534.272 | Lr --> 0.015 | Seconds_per_step --> 1.735 | +[2024-01-02 10:02:49,171][Main][INFO] - [train] Step 5300 out of 65536 | Loss --> 3.328 | Grad_l2 --> 0.458 | Weights_l2 --> 10552.935 | Lr --> 0.015 | Seconds_per_step --> 1.678 | +[2024-01-02 10:05:39,781][Main][INFO] - [train] Step 5400 out of 65536 | Loss --> 3.299 | Grad_l2 --> 0.462 | Weights_l2 --> 10571.676 | Lr --> 0.015 | Seconds_per_step --> 1.706 | +[2024-01-02 10:08:29,241][Main][INFO] - [train] Step 5500 out of 65536 | Loss --> 3.290 | Grad_l2 --> 0.448 | Weights_l2 --> 10590.597 | Lr --> 0.016 | Seconds_per_step --> 1.695 | +[2024-01-02 10:09:11,204][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00005-of-00512.json.gz +[2024-01-02 10:09:23,048][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00279-of-00512.json.gz +[2024-01-02 10:11:22,404][Main][INFO] - [train] Step 5600 out of 65536 | Loss --> 3.283 | Grad_l2 --> 0.447 | Weights_l2 --> 10609.454 | Lr --> 0.016 | Seconds_per_step --> 1.732 | +[2024-01-02 10:14:11,041][Main][INFO] - [train] Step 5700 out of 65536 | Loss --> 3.261 | Grad_l2 --> 0.438 | Weights_l2 --> 10628.584 | Lr --> 0.016 | Seconds_per_step --> 1.686 | +[2024-01-02 10:17:05,214][Main][INFO] - [train] Step 5800 out of 65536 | Loss --> 3.264 | Grad_l2 --> 0.441 | Weights_l2 --> 10647.740 | Lr --> 0.016 | Seconds_per_step --> 1.742 | +[2024-01-02 10:20:03,650][Main][INFO] - [train] Step 5900 out of 65536 | Loss --> 3.247 | Grad_l2 --> 0.433 | Weights_l2 --> 10667.126 | Lr --> 0.016 | Seconds_per_step --> 1.784 | +[2024-01-02 10:22:52,332][Main][INFO] - [train] Step 6000 out of 65536 | Loss --> 3.231 | Grad_l2 --> 0.444 | Weights_l2 --> 10686.671 | Lr --> 0.016 | Seconds_per_step --> 1.687 | +[2024-01-02 10:25:42,400][Main][INFO] - [train] Step 6100 out of 65536 | Loss --> 3.222 | Grad_l2 --> 0.429 | Weights_l2 --> 10706.525 | Lr --> 0.016 | Seconds_per_step --> 1.701 | +[2024-01-02 10:28:31,589][Main][INFO] - [train] Step 6200 out of 65536 | Loss --> 3.200 | Grad_l2 --> 0.429 | Weights_l2 --> 10726.433 | Lr --> 0.016 | Seconds_per_step --> 1.692 | +[2024-01-02 10:28:51,953][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00281-of-00512.json.gz +[2024-01-02 10:29:16,283][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00139-of-00512.json.gz +[2024-01-02 10:31:22,732][Main][INFO] - [train] Step 6300 out of 65536 | Loss --> 3.207 | Grad_l2 --> 0.421 | Weights_l2 --> 10746.566 | Lr --> 0.016 | Seconds_per_step --> 1.711 | +[2024-01-02 10:34:12,886][Main][INFO] - [train] Step 6400 out of 65536 | Loss --> 3.182 | Grad_l2 --> 0.438 | Weights_l2 --> 10767.034 | Lr --> 0.016 | Seconds_per_step --> 1.702 | +[2024-01-02 10:37:02,088][Main][INFO] - [train] Step 6500 out of 65536 | Loss --> 3.196 | Grad_l2 --> 0.554 | Weights_l2 --> 10788.199 | Lr --> 0.017 | Seconds_per_step --> 1.692 | +[2024-01-02 10:39:50,971][Main][INFO] - [train] Step 6600 out of 65536 | Loss --> 3.179 | Grad_l2 --> 0.609 | Weights_l2 --> 10809.654 | Lr --> 0.017 | Seconds_per_step --> 1.689 | +[2024-01-02 10:42:40,192][Main][INFO] - [train] Step 6700 out of 65536 | Loss --> 3.176 | Grad_l2 --> 0.462 | Weights_l2 --> 10831.206 | Lr --> 0.017 | Seconds_per_step --> 1.692 | +[2024-01-02 10:45:30,678][Main][INFO] - [train] Step 6800 out of 65536 | Loss --> 3.143 | Grad_l2 --> 0.411 | Weights_l2 --> 10852.539 | Lr --> 0.017 | Seconds_per_step --> 1.705 | +[2024-01-02 10:48:20,529][Main][INFO] - [train] Step 6900 out of 65536 | Loss --> 3.107 | Grad_l2 --> 0.405 | Weights_l2 --> 10873.854 | Lr --> 0.017 | Seconds_per_step --> 1.699 | +[2024-01-02 10:49:17,778][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00407-of-00512.json.gz +[2024-01-02 10:49:48,125][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00058-of-00512.json.gz +[2024-01-02 10:51:14,139][Main][INFO] - [train] Step 7000 out of 65536 | Loss --> 3.095 | Grad_l2 --> 0.391 | Weights_l2 --> 10895.238 | Lr --> 0.017 | Seconds_per_step --> 1.736 | +[2024-01-02 10:54:03,555][Main][INFO] - [train] Step 7100 out of 65536 | Loss --> 3.087 | Grad_l2 --> 0.406 | Weights_l2 --> 10916.873 | Lr --> 0.017 | Seconds_per_step --> 1.694 | +[2024-01-02 10:56:52,316][Main][INFO] - [train] Step 7200 out of 65536 | Loss --> 3.078 | Grad_l2 --> 0.389 | Weights_l2 --> 10938.506 | Lr --> 0.017 | Seconds_per_step --> 1.688 | +[2024-01-02 10:59:41,908][Main][INFO] - [train] Step 7300 out of 65536 | Loss --> 3.074 | Grad_l2 --> 0.389 | Weights_l2 --> 10960.402 | Lr --> 0.017 | Seconds_per_step --> 1.696 | +[2024-01-02 11:02:32,725][Main][INFO] - [train] Step 7400 out of 65536 | Loss --> 3.059 | Grad_l2 --> 0.392 | Weights_l2 --> 10982.456 | Lr --> 0.017 | Seconds_per_step --> 1.708 | +[2024-01-02 11:05:21,789][Main][INFO] - [train] Step 7500 out of 65536 | Loss --> 3.048 | Grad_l2 --> 0.388 | Weights_l2 --> 11005.019 | Lr --> 0.018 | Seconds_per_step --> 1.690 | +[2024-01-02 11:08:11,208][Main][INFO] - [train] Step 7600 out of 65536 | Loss --> 3.044 | Grad_l2 --> 0.373 | Weights_l2 --> 11027.512 | Lr --> 0.018 | Seconds_per_step --> 1.694 | +[2024-01-02 11:09:07,989][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00020-of-00512.json.gz +[2024-01-02 11:09:26,816][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00182-of-00512.json.gz +[2024-01-02 11:11:01,891][Main][INFO] - [train] Step 7700 out of 65536 | Loss --> 3.035 | Grad_l2 --> 0.377 | Weights_l2 --> 11050.158 | Lr --> 0.018 | Seconds_per_step --> 1.707 | +[2024-01-02 11:13:50,790][Main][INFO] - [train] Step 7800 out of 65536 | Loss --> 3.027 | Grad_l2 --> 0.373 | Weights_l2 --> 11073.020 | Lr --> 0.018 | Seconds_per_step --> 1.689 | +[2024-01-02 11:16:40,206][Main][INFO] - [train] Step 7900 out of 65536 | Loss --> 2.995 | Grad_l2 --> 0.367 | Weights_l2 --> 11096.362 | Lr --> 0.018 | Seconds_per_step --> 1.694 | +[2024-01-02 11:19:28,364][Main][INFO] - [train] Step 8000 out of 65536 | Loss --> 2.985 | Grad_l2 --> 0.360 | Weights_l2 --> 11119.787 | Lr --> 0.018 | Seconds_per_step --> 1.682 | +[2024-01-02 11:22:18,250][Main][INFO] - [train] Step 8100 out of 65536 | Loss --> 2.994 | Grad_l2 --> 0.361 | Weights_l2 --> 11143.291 | Lr --> 0.018 | Seconds_per_step --> 1.699 | +[2024-01-02 11:25:06,896][Main][INFO] - [train] Step 8200 out of 65536 | Loss --> 2.983 | Grad_l2 --> 0.360 | Weights_l2 --> 11167.133 | Lr --> 0.018 | Seconds_per_step --> 1.686 | +[2024-01-02 11:27:53,897][Main][INFO] - [train] Step 8300 out of 65536 | Loss --> 2.998 | Grad_l2 --> 0.466 | Weights_l2 --> 11191.464 | Lr --> 0.018 | Seconds_per_step --> 1.670 | +[2024-01-02 11:29:06,279][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00321-of-00512.json.gz +[2024-01-02 11:29:39,792][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00323-of-00512.json.gz +[2024-01-02 11:30:43,928][Main][INFO] - [train] Step 8400 out of 65536 | Loss --> 3.009 | Grad_l2 --> 0.563 | Weights_l2 --> 11216.839 | Lr --> 0.018 | Seconds_per_step --> 1.700 | +[2024-01-02 11:33:38,321][Main][INFO] - [train] Step 8500 out of 65536 | Loss --> 2.972 | Grad_l2 --> 0.352 | Weights_l2 --> 11241.715 | Lr --> 0.019 | Seconds_per_step --> 1.744 | +[2024-01-02 11:36:31,514][Main][INFO] - [train] Step 8600 out of 65536 | Loss --> 2.965 | Grad_l2 --> 0.360 | Weights_l2 --> 11266.211 | Lr --> 0.019 | Seconds_per_step --> 1.732 | +[2024-01-02 11:39:29,291][Main][INFO] - [train] Step 8700 out of 65536 | Loss --> 2.965 | Grad_l2 --> 0.384 | Weights_l2 --> 11291.393 | Lr --> 0.019 | Seconds_per_step --> 1.778 | +[2024-01-02 11:42:19,746][Main][INFO] - [train] Step 8800 out of 65536 | Loss --> 2.973 | Grad_l2 --> 0.359 | Weights_l2 --> 11316.864 | Lr --> 0.019 | Seconds_per_step --> 1.705 | +[2024-01-02 11:45:08,757][Main][INFO] - [train] Step 8900 out of 65536 | Loss --> 2.961 | Grad_l2 --> 0.344 | Weights_l2 --> 11342.222 | Lr --> 0.019 | Seconds_per_step --> 1.690 | +[2024-01-02 11:47:57,776][Main][INFO] - [train] Step 9000 out of 65536 | Loss --> 2.952 | Grad_l2 --> 0.353 | Weights_l2 --> 11367.698 | Lr --> 0.019 | Seconds_per_step --> 1.690 | +[2024-01-02 11:49:26,201][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00217-of-00512.json.gz +[2024-01-02 11:50:16,445][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00511-of-00512.json.gz +[2024-01-02 11:50:49,398][Main][INFO] - [train] Step 9100 out of 65536 | Loss --> 2.924 | Grad_l2 --> 0.334 | Weights_l2 --> 11393.301 | Lr --> 0.019 | Seconds_per_step --> 1.716 | +[2024-01-02 11:53:40,378][Main][INFO] - [train] Step 9200 out of 65536 | Loss --> 2.928 | Grad_l2 --> 0.339 | Weights_l2 --> 11419.266 | Lr --> 0.019 | Seconds_per_step --> 1.710 | +[2024-01-02 11:56:29,788][Main][INFO] - [train] Step 9300 out of 65536 | Loss --> 2.932 | Grad_l2 --> 0.341 | Weights_l2 --> 11445.354 | Lr --> 0.019 | Seconds_per_step --> 1.694 | +[2024-01-02 11:59:22,581][Main][INFO] - [train] Step 9400 out of 65536 | Loss --> 2.909 | Grad_l2 --> 0.337 | Weights_l2 --> 11471.712 | Lr --> 0.019 | Seconds_per_step --> 1.728 | +[2024-01-02 12:02:11,908][Main][INFO] - [train] Step 9500 out of 65536 | Loss --> 2.911 | Grad_l2 --> 0.328 | Weights_l2 --> 11498.399 | Lr --> 0.020 | Seconds_per_step --> 1.693 | +[2024-01-02 12:05:02,332][Main][INFO] - [train] Step 9600 out of 65536 | Loss --> 2.905 | Grad_l2 --> 0.323 | Weights_l2 --> 11525.278 | Lr --> 0.020 | Seconds_per_step --> 1.704 | +[2024-01-02 12:07:51,872][Main][INFO] - [train] Step 9700 out of 65536 | Loss --> 2.902 | Grad_l2 --> 0.325 | Weights_l2 --> 11552.402 | Lr --> 0.020 | Seconds_per_step --> 1.695 | +[2024-01-02 12:09:20,031][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00277-of-00512.json.gz +[2024-01-02 12:10:10,095][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00142-of-00512.json.gz +[2024-01-02 12:10:44,454][Main][INFO] - [train] Step 9800 out of 65536 | Loss --> 2.893 | Grad_l2 --> 0.318 | Weights_l2 --> 11579.686 | Lr --> 0.020 | Seconds_per_step --> 1.726 | +[2024-01-02 12:13:35,163][Main][INFO] - [train] Step 9900 out of 65536 | Loss --> 2.881 | Grad_l2 --> 0.314 | Weights_l2 --> 11607.262 | Lr --> 0.020 | Seconds_per_step --> 1.707 | +[2024-01-02 12:16:23,928][Main][INFO] - [train] Step 10000 out of 65536 | Loss --> 2.882 | Grad_l2 --> 0.317 | Weights_l2 --> 11634.963 | Lr --> 0.020 | Seconds_per_step --> 1.688 | +[2024-01-02 12:16:23,976][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-02 12:16:23,976][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-02 12:18:35,496][Main][INFO] - [eval] Step 10000 out of 65536 | Loss --> 2.918 | Accuracy --> 0.526 | Time --> 131.565 | +[2024-01-02 12:21:26,879][Main][INFO] - [train] Step 10100 out of 65536 | Loss --> 2.862 | Grad_l2 --> 0.322 | Weights_l2 --> 11662.898 | Lr --> 0.020 | Seconds_per_step --> 1.714 | +[2024-01-02 12:24:15,728][Main][INFO] - [train] Step 10200 out of 65536 | Loss --> 2.873 | Grad_l2 --> 0.316 | Weights_l2 --> 11690.667 | Lr --> 0.020 | Seconds_per_step --> 1.688 | +[2024-01-02 12:27:14,037][Main][INFO] - [train] Step 10300 out of 65536 | Loss --> 2.850 | Grad_l2 --> 0.308 | Weights_l2 --> 11718.467 | Lr --> 0.020 | Seconds_per_step --> 1.783 | +[2024-01-02 12:30:01,949][Main][INFO] - [train] Step 10400 out of 65536 | Loss --> 2.860 | Grad_l2 --> 0.315 | Weights_l2 --> 11746.420 | Lr --> 0.020 | Seconds_per_step --> 1.679 | +[2024-01-02 12:31:53,885][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00491-of-00512.json.gz +[2024-01-02 12:32:51,503][Main][INFO] - [train] Step 10500 out of 65536 | Loss --> 2.859 | Grad_l2 --> 0.301 | Weights_l2 --> 11774.193 | Lr --> 0.020 | Seconds_per_step --> 1.696 | +[2024-01-02 12:32:57,174][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00439-of-00512.json.gz +[2024-01-02 12:35:42,573][Main][INFO] - [train] Step 10600 out of 65536 | Loss --> 2.853 | Grad_l2 --> 0.316 | Weights_l2 --> 11802.011 | Lr --> 0.020 | Seconds_per_step --> 1.711 | +[2024-01-02 12:38:32,566][Main][INFO] - [train] Step 10700 out of 65536 | Loss --> 2.840 | Grad_l2 --> 0.307 | Weights_l2 --> 11829.644 | Lr --> 0.020 | Seconds_per_step --> 1.700 | +[2024-01-02 12:41:22,684][Main][INFO] - [train] Step 10800 out of 65536 | Loss --> 2.837 | Grad_l2 --> 0.304 | Weights_l2 --> 11857.178 | Lr --> 0.020 | Seconds_per_step --> 1.701 | +[2024-01-02 12:44:10,983][Main][INFO] - [train] Step 10900 out of 65536 | Loss --> 2.812 | Grad_l2 --> 0.310 | Weights_l2 --> 11884.734 | Lr --> 0.020 | Seconds_per_step --> 1.683 | +[2024-01-02 12:46:59,989][Main][INFO] - [train] Step 11000 out of 65536 | Loss --> 2.818 | Grad_l2 --> 0.301 | Weights_l2 --> 11912.166 | Lr --> 0.020 | Seconds_per_step --> 1.690 | +[2024-01-02 12:49:48,092][Main][INFO] - [train] Step 11100 out of 65536 | Loss --> 2.774 | Grad_l2 --> 0.296 | Weights_l2 --> 11939.586 | Lr --> 0.020 | Seconds_per_step --> 1.681 | +[2024-01-02 12:51:56,755][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00224-of-00512.json.gz +[2024-01-02 12:52:42,224][Main][INFO] - [train] Step 11200 out of 65536 | Loss --> 2.792 | Grad_l2 --> 0.302 | Weights_l2 --> 11967.110 | Lr --> 0.020 | Seconds_per_step --> 1.741 | +[2024-01-02 12:53:26,814][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00119-of-00512.json.gz +[2024-01-02 12:55:32,300][Main][INFO] - [train] Step 11300 out of 65536 | Loss --> 2.784 | Grad_l2 --> 0.301 | Weights_l2 --> 11994.543 | Lr --> 0.020 | Seconds_per_step --> 1.701 | +[2024-01-02 12:58:21,753][Main][INFO] - [train] Step 11400 out of 65536 | Loss --> 2.792 | Grad_l2 --> 0.300 | Weights_l2 --> 12022.087 | Lr --> 0.020 | Seconds_per_step --> 1.695 | +[2024-01-02 13:01:09,907][Main][INFO] - [train] Step 11500 out of 65536 | Loss --> 2.808 | Grad_l2 --> 0.309 | Weights_l2 --> 12049.762 | Lr --> 0.020 | Seconds_per_step --> 1.682 | +[2024-01-02 13:04:00,541][Main][INFO] - [train] Step 11600 out of 65536 | Loss --> 2.776 | Grad_l2 --> 0.283 | Weights_l2 --> 12077.197 | Lr --> 0.020 | Seconds_per_step --> 1.706 | +[2024-01-02 13:06:50,871][Main][INFO] - [train] Step 11700 out of 65536 | Loss --> 2.756 | Grad_l2 --> 0.293 | Weights_l2 --> 12104.389 | Lr --> 0.020 | Seconds_per_step --> 1.703 | +[2024-01-02 13:09:39,441][Main][INFO] - [train] Step 11800 out of 65536 | Loss --> 2.770 | Grad_l2 --> 0.288 | Weights_l2 --> 12131.692 | Lr --> 0.020 | Seconds_per_step --> 1.686 | +[2024-01-02 13:11:26,109][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00333-of-00512.json.gz +[2024-01-02 13:12:30,746][Main][INFO] - [train] Step 11900 out of 65536 | Loss --> 2.758 | Grad_l2 --> 0.280 | Weights_l2 --> 12158.996 | Lr --> 0.020 | Seconds_per_step --> 1.713 | +[2024-01-02 13:12:59,159][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00348-of-00512.json.gz +[2024-01-02 13:15:26,476][Main][INFO] - [train] Step 12000 out of 65536 | Loss --> 2.747 | Grad_l2 --> 0.277 | Weights_l2 --> 12186.245 | Lr --> 0.020 | Seconds_per_step --> 1.757 | +[2024-01-02 13:18:15,914][Main][INFO] - [train] Step 12100 out of 65536 | Loss --> 2.754 | Grad_l2 --> 0.280 | Weights_l2 --> 12213.318 | Lr --> 0.020 | Seconds_per_step --> 1.694 | +[2024-01-02 13:21:05,540][Main][INFO] - [train] Step 12200 out of 65536 | Loss --> 2.783 | Grad_l2 --> 0.475 | Weights_l2 --> 12242.980 | Lr --> 0.020 | Seconds_per_step --> 1.696 | +[2024-01-02 13:23:57,573][Main][INFO] - [train] Step 12300 out of 65536 | Loss --> 2.835 | Grad_l2 --> 0.455 | Weights_l2 --> 12275.630 | Lr --> 0.020 | Seconds_per_step --> 1.720 | +[2024-01-02 13:26:51,637][Main][INFO] - [train] Step 12400 out of 65536 | Loss --> 2.767 | Grad_l2 --> 0.316 | Weights_l2 --> 12303.800 | Lr --> 0.020 | Seconds_per_step --> 1.741 | +[2024-01-02 13:29:40,588][Main][INFO] - [train] Step 12500 out of 65536 | Loss --> 2.752 | Grad_l2 --> 0.285 | Weights_l2 --> 12331.423 | Lr --> 0.020 | Seconds_per_step --> 1.690 | +[2024-01-02 13:32:16,990][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00335-of-00512.json.gz +[2024-01-02 13:32:33,602][Main][INFO] - [train] Step 12600 out of 65536 | Loss --> 2.727 | Grad_l2 --> 0.300 | Weights_l2 --> 12358.503 | Lr --> 0.020 | Seconds_per_step --> 1.730 | +[2024-01-02 13:33:37,256][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00001-of-00512.json.gz +[2024-01-02 13:35:23,429][Main][INFO] - [train] Step 12700 out of 65536 | Loss --> 2.717 | Grad_l2 --> 0.290 | Weights_l2 --> 12385.276 | Lr --> 0.020 | Seconds_per_step --> 1.698 | +[2024-01-02 13:38:12,912][Main][INFO] - [train] Step 12800 out of 65536 | Loss --> 2.734 | Grad_l2 --> 0.276 | Weights_l2 --> 12412.097 | Lr --> 0.020 | Seconds_per_step --> 1.695 | +[2024-01-02 13:41:01,744][Main][INFO] - [train] Step 12900 out of 65536 | Loss --> 2.732 | Grad_l2 --> 0.288 | Weights_l2 --> 12438.755 | Lr --> 0.020 | Seconds_per_step --> 1.688 | +[2024-01-02 13:43:49,130][Main][INFO] - [train] Step 13000 out of 65536 | Loss --> 2.699 | Grad_l2 --> 0.277 | Weights_l2 --> 12465.346 | Lr --> 0.020 | Seconds_per_step --> 1.674 | +[2024-01-02 13:46:37,658][Main][INFO] - [train] Step 13100 out of 65536 | Loss --> 2.708 | Grad_l2 --> 0.282 | Weights_l2 --> 12492.101 | Lr --> 0.020 | Seconds_per_step --> 1.685 | +[2024-01-02 13:49:30,095][Main][INFO] - [train] Step 13200 out of 65536 | Loss --> 2.687 | Grad_l2 --> 0.287 | Weights_l2 --> 12518.668 | Lr --> 0.020 | Seconds_per_step --> 1.724 | +[2024-01-02 13:52:01,251][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00212-of-00512.json.gz +[2024-01-02 13:52:21,442][Main][INFO] - [train] Step 13300 out of 65536 | Loss --> 2.681 | Grad_l2 --> 0.284 | Weights_l2 --> 12545.094 | Lr --> 0.020 | Seconds_per_step --> 1.713 | +[2024-01-02 13:53:20,889][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00486-of-00512.json.gz +[2024-01-02 13:55:10,174][Main][INFO] - [train] Step 13400 out of 65536 | Loss --> 2.676 | Grad_l2 --> 0.274 | Weights_l2 --> 12571.636 | Lr --> 0.020 | Seconds_per_step --> 1.687 | +[2024-01-02 13:57:58,686][Main][INFO] - [train] Step 13500 out of 65536 | Loss --> 2.672 | Grad_l2 --> 0.286 | Weights_l2 --> 12598.222 | Lr --> 0.020 | Seconds_per_step --> 1.685 | +[2024-01-02 14:00:49,257][Main][INFO] - [train] Step 13600 out of 65536 | Loss --> 2.657 | Grad_l2 --> 0.276 | Weights_l2 --> 12624.682 | Lr --> 0.020 | Seconds_per_step --> 1.706 | +[2024-01-02 14:03:47,429][Main][INFO] - [train] Step 13700 out of 65536 | Loss --> 2.654 | Grad_l2 --> 0.274 | Weights_l2 --> 12651.288 | Lr --> 0.020 | Seconds_per_step --> 1.782 | +[2024-01-02 14:06:37,118][Main][INFO] - [train] Step 13800 out of 65536 | Loss --> 2.637 | Grad_l2 --> 0.277 | Weights_l2 --> 12677.832 | Lr --> 0.020 | Seconds_per_step --> 1.697 | +[2024-01-02 14:09:28,744][Main][INFO] - [train] Step 13900 out of 65536 | Loss --> 2.647 | Grad_l2 --> 0.272 | Weights_l2 --> 12704.341 | Lr --> 0.020 | Seconds_per_step --> 1.716 | +[2024-01-02 14:12:18,044][Main][INFO] - [train] Step 14000 out of 65536 | Loss --> 2.648 | Grad_l2 --> 0.269 | Weights_l2 --> 12730.539 | Lr --> 0.020 | Seconds_per_step --> 1.693 | +[2024-01-02 14:12:51,526][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00110-of-00512.json.gz +[2024-01-02 14:13:53,017][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00064-of-00512.json.gz +[2024-01-02 14:15:11,988][Main][INFO] - [train] Step 14100 out of 65536 | Loss --> 2.645 | Grad_l2 --> 0.282 | Weights_l2 --> 12757.044 | Lr --> 0.020 | Seconds_per_step --> 1.739 | +[2024-01-02 14:18:04,342][Main][INFO] - [train] Step 14200 out of 65536 | Loss --> 2.633 | Grad_l2 --> 0.283 | Weights_l2 --> 12783.438 | Lr --> 0.020 | Seconds_per_step --> 1.724 | +[2024-01-02 14:20:53,113][Main][INFO] - [train] Step 14300 out of 65536 | Loss --> 2.645 | Grad_l2 --> 0.267 | Weights_l2 --> 12809.697 | Lr --> 0.020 | Seconds_per_step --> 1.688 | +[2024-01-02 14:23:42,569][Main][INFO] - [train] Step 14400 out of 65536 | Loss --> 2.649 | Grad_l2 --> 0.256 | Weights_l2 --> 12836.004 | Lr --> 0.020 | Seconds_per_step --> 1.695 | +[2024-01-02 14:26:31,922][Main][INFO] - [train] Step 14500 out of 65536 | Loss --> 2.639 | Grad_l2 --> 0.270 | Weights_l2 --> 12862.382 | Lr --> 0.020 | Seconds_per_step --> 1.694 | +[2024-01-02 14:29:21,289][Main][INFO] - [train] Step 14600 out of 65536 | Loss --> 2.633 | Grad_l2 --> 0.277 | Weights_l2 --> 12888.502 | Lr --> 0.020 | Seconds_per_step --> 1.694 | +[2024-01-02 14:32:09,575][Main][INFO] - [train] Step 14700 out of 65536 | Loss --> 2.642 | Grad_l2 --> 0.256 | Weights_l2 --> 12914.793 | Lr --> 0.020 | Seconds_per_step --> 1.683 | +[2024-01-02 14:33:21,571][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00307-of-00512.json.gz +[2024-01-02 14:34:14,677][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00022-of-00512.json.gz +[2024-01-02 14:35:01,259][Main][INFO] - [train] Step 14800 out of 65536 | Loss --> 2.635 | Grad_l2 --> 0.306 | Weights_l2 --> 12942.193 | Lr --> 0.020 | Seconds_per_step --> 1.717 | +[2024-01-02 14:37:52,050][Main][INFO] - [train] Step 14900 out of 65536 | Loss --> 2.660 | Grad_l2 --> 0.338 | Weights_l2 --> 12970.937 | Lr --> 0.020 | Seconds_per_step --> 1.708 | +[2024-01-02 14:40:42,335][Main][INFO] - [train] Step 15000 out of 65536 | Loss --> 2.644 | Grad_l2 --> 0.279 | Weights_l2 --> 12997.952 | Lr --> 0.020 | Seconds_per_step --> 1.703 | +[2024-01-02 14:40:42,381][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-02 14:40:42,382][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-02 14:42:52,869][Main][INFO] - [eval] Step 15000 out of 65536 | Loss --> 2.684 | Accuracy --> 0.551 | Time --> 130.532 | +[2024-01-02 14:45:44,453][Main][INFO] - [train] Step 15100 out of 65536 | Loss --> 2.645 | Grad_l2 --> 0.262 | Weights_l2 --> 13024.324 | Lr --> 0.020 | Seconds_per_step --> 1.716 | +[2024-01-02 14:48:33,921][Main][INFO] - [train] Step 15200 out of 65536 | Loss --> 2.644 | Grad_l2 --> 0.254 | Weights_l2 --> 13050.215 | Lr --> 0.020 | Seconds_per_step --> 1.695 | +[2024-01-02 14:51:24,408][Main][INFO] - [train] Step 15300 out of 65536 | Loss --> 2.625 | Grad_l2 --> 0.256 | Weights_l2 --> 13075.952 | Lr --> 0.020 | Seconds_per_step --> 1.705 | +[2024-01-02 14:54:12,203][Main][INFO] - [train] Step 15400 out of 65536 | Loss --> 2.615 | Grad_l2 --> 0.267 | Weights_l2 --> 13101.787 | Lr --> 0.020 | Seconds_per_step --> 1.678 | +[2024-01-02 14:55:24,863][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00375-of-00512.json.gz +[2024-01-02 14:56:10,541][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00366-of-00512.json.gz +[2024-01-02 14:57:03,488][Main][INFO] - [train] Step 15500 out of 65536 | Loss --> 2.639 | Grad_l2 --> 0.283 | Weights_l2 --> 13127.899 | Lr --> 0.020 | Seconds_per_step --> 1.713 | +[2024-01-02 14:59:53,686][Main][INFO] - [train] Step 15600 out of 65536 | Loss --> 2.629 | Grad_l2 --> 0.257 | Weights_l2 --> 13153.681 | Lr --> 0.020 | Seconds_per_step --> 1.702 | +[2024-01-02 15:02:43,780][Main][INFO] - [train] Step 15700 out of 65536 | Loss --> 2.628 | Grad_l2 --> 0.259 | Weights_l2 --> 13179.533 | Lr --> 0.019 | Seconds_per_step --> 1.701 | +[2024-01-02 15:05:33,256][Main][INFO] - [train] Step 15800 out of 65536 | Loss --> 2.604 | Grad_l2 --> 0.252 | Weights_l2 --> 13205.296 | Lr --> 0.019 | Seconds_per_step --> 1.695 | +[2024-01-02 15:08:23,806][Main][INFO] - [train] Step 15900 out of 65536 | Loss --> 2.616 | Grad_l2 --> 0.250 | Weights_l2 --> 13231.060 | Lr --> 0.019 | Seconds_per_step --> 1.705 | +[2024-01-02 15:11:21,443][Main][INFO] - [train] Step 16000 out of 65536 | Loss --> 2.595 | Grad_l2 --> 0.262 | Weights_l2 --> 13256.698 | Lr --> 0.019 | Seconds_per_step --> 1.776 | +[2024-01-02 15:14:11,257][Main][INFO] - [train] Step 16100 out of 65536 | Loss --> 2.571 | Grad_l2 --> 0.261 | Weights_l2 --> 13282.462 | Lr --> 0.019 | Seconds_per_step --> 1.698 | +[2024-01-02 15:15:39,662][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00240-of-00512.json.gz +[2024-01-02 15:16:44,845][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00463-of-00512.json.gz +[2024-01-02 15:17:02,208][Main][INFO] - [train] Step 16200 out of 65536 | Loss --> 2.573 | Grad_l2 --> 0.271 | Weights_l2 --> 13307.963 | Lr --> 0.019 | Seconds_per_step --> 1.710 | +[2024-01-02 15:19:53,136][Main][INFO] - [train] Step 16300 out of 65536 | Loss --> 2.587 | Grad_l2 --> 0.254 | Weights_l2 --> 13333.494 | Lr --> 0.019 | Seconds_per_step --> 1.709 | +[2024-01-02 15:22:47,503][Main][INFO] - [train] Step 16400 out of 65536 | Loss --> 2.557 | Grad_l2 --> 0.259 | Weights_l2 --> 13358.960 | Lr --> 0.019 | Seconds_per_step --> 1.744 | +[2024-01-02 15:25:41,500][Main][INFO] - [train] Step 16500 out of 65536 | Loss --> 2.561 | Grad_l2 --> 0.246 | Weights_l2 --> 13384.289 | Lr --> 0.019 | Seconds_per_step --> 1.740 | +[2024-01-02 15:28:32,008][Main][INFO] - [train] Step 16600 out of 65536 | Loss --> 2.570 | Grad_l2 --> 0.258 | Weights_l2 --> 13409.559 | Lr --> 0.019 | Seconds_per_step --> 1.705 | +[2024-01-02 15:31:24,041][Main][INFO] - [train] Step 16700 out of 65536 | Loss --> 2.546 | Grad_l2 --> 0.260 | Weights_l2 --> 13434.801 | Lr --> 0.019 | Seconds_per_step --> 1.720 | +[2024-01-02 15:34:14,641][Main][INFO] - [train] Step 16800 out of 65536 | Loss --> 2.580 | Grad_l2 --> 0.288 | Weights_l2 --> 13461.565 | Lr --> 0.019 | Seconds_per_step --> 1.706 | +[2024-01-02 15:36:47,720][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00473-of-00512.json.gz +[2024-01-02 15:37:05,420][Main][INFO] - [train] Step 16900 out of 65536 | Loss --> 2.563 | Grad_l2 --> 0.261 | Weights_l2 --> 13487.370 | Lr --> 0.019 | Seconds_per_step --> 1.708 | +[2024-01-02 15:37:28,842][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00160-of-00512.json.gz +[2024-01-02 15:39:55,579][Main][INFO] - [train] Step 17000 out of 65536 | Loss --> 2.566 | Grad_l2 --> 0.249 | Weights_l2 --> 13513.010 | Lr --> 0.019 | Seconds_per_step --> 1.702 | +[2024-01-02 15:42:44,434][Main][INFO] - [train] Step 17100 out of 65536 | Loss --> 2.561 | Grad_l2 --> 0.249 | Weights_l2 --> 13538.317 | Lr --> 0.019 | Seconds_per_step --> 1.689 | +[2024-01-02 15:45:37,029][Main][INFO] - [train] Step 17200 out of 65536 | Loss --> 2.543 | Grad_l2 --> 0.253 | Weights_l2 --> 13563.557 | Lr --> 0.019 | Seconds_per_step --> 1.726 | +[2024-01-02 15:48:25,541][Main][INFO] - [train] Step 17300 out of 65536 | Loss --> 2.550 | Grad_l2 --> 0.263 | Weights_l2 --> 13589.125 | Lr --> 0.019 | Seconds_per_step --> 1.685 | +[2024-01-02 15:51:14,269][Main][INFO] - [train] Step 17400 out of 65536 | Loss --> 2.549 | Grad_l2 --> 0.263 | Weights_l2 --> 13614.287 | Lr --> 0.019 | Seconds_per_step --> 1.687 | +[2024-01-02 15:54:05,860][Main][INFO] - [train] Step 17500 out of 65536 | Loss --> 2.545 | Grad_l2 --> 0.258 | Weights_l2 --> 13639.329 | Lr --> 0.019 | Seconds_per_step --> 1.716 | +[2024-01-02 15:56:56,106][Main][INFO] - [train] Step 17600 out of 65536 | Loss --> 2.536 | Grad_l2 --> 0.260 | Weights_l2 --> 13664.259 | Lr --> 0.019 | Seconds_per_step --> 1.702 | +[2024-01-02 15:56:58,221][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00190-of-00512.json.gz +[2024-01-02 15:57:11,457][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00406-of-00512.json.gz +[2024-01-02 15:59:51,233][Main][INFO] - [train] Step 17700 out of 65536 | Loss --> 2.535 | Grad_l2 --> 0.254 | Weights_l2 --> 13689.024 | Lr --> 0.019 | Seconds_per_step --> 1.751 | +[2024-01-02 16:02:39,927][Main][INFO] - [train] Step 17800 out of 65536 | Loss --> 2.540 | Grad_l2 --> 0.250 | Weights_l2 --> 13713.710 | Lr --> 0.019 | Seconds_per_step --> 1.687 | +[2024-01-02 16:05:29,164][Main][INFO] - [train] Step 17900 out of 65536 | Loss --> 2.536 | Grad_l2 --> 0.258 | Weights_l2 --> 13738.538 | Lr --> 0.019 | Seconds_per_step --> 1.692 | +[2024-01-02 16:08:25,730][Main][INFO] - [train] Step 18000 out of 65536 | Loss --> 2.530 | Grad_l2 --> 0.255 | Weights_l2 --> 13763.452 | Lr --> 0.019 | Seconds_per_step --> 1.766 | +[2024-01-02 16:11:17,117][Main][INFO] - [train] Step 18100 out of 65536 | Loss --> 2.532 | Grad_l2 --> 0.252 | Weights_l2 --> 13788.193 | Lr --> 0.019 | Seconds_per_step --> 1.714 | +[2024-01-02 16:14:13,193][Main][INFO] - [train] Step 18200 out of 65536 | Loss --> 2.523 | Grad_l2 --> 0.258 | Weights_l2 --> 13813.035 | Lr --> 0.019 | Seconds_per_step --> 1.761 | +[2024-01-02 16:17:02,862][Main][INFO] - [train] Step 18300 out of 65536 | Loss --> 2.518 | Grad_l2 --> 0.246 | Weights_l2 --> 13837.904 | Lr --> 0.019 | Seconds_per_step --> 1.697 | +[2024-01-02 16:17:19,431][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00214-of-00512.json.gz +[2024-01-02 16:18:00,973][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00492-of-00512.json.gz +[2024-01-02 16:19:55,259][Main][INFO] - [train] Step 18400 out of 65536 | Loss --> 2.515 | Grad_l2 --> 0.253 | Weights_l2 --> 13862.541 | Lr --> 0.019 | Seconds_per_step --> 1.724 | +[2024-01-02 16:22:45,207][Main][INFO] - [train] Step 18500 out of 65536 | Loss --> 2.503 | Grad_l2 --> 0.255 | Weights_l2 --> 13886.997 | Lr --> 0.019 | Seconds_per_step --> 1.699 | +[2024-01-02 16:25:36,207][Main][INFO] - [train] Step 18600 out of 65536 | Loss --> 2.506 | Grad_l2 --> 0.245 | Weights_l2 --> 13911.428 | Lr --> 0.019 | Seconds_per_step --> 1.710 | +[2024-01-02 16:28:28,646][Main][INFO] - [train] Step 18700 out of 65536 | Loss --> 2.511 | Grad_l2 --> 0.250 | Weights_l2 --> 13935.919 | Lr --> 0.019 | Seconds_per_step --> 1.724 | +[2024-01-02 16:31:17,078][Main][INFO] - [train] Step 18800 out of 65536 | Loss --> 2.514 | Grad_l2 --> 0.249 | Weights_l2 --> 13960.271 | Lr --> 0.019 | Seconds_per_step --> 1.684 | +[2024-01-02 16:34:05,518][Main][INFO] - [train] Step 18900 out of 65536 | Loss --> 2.506 | Grad_l2 --> 0.249 | Weights_l2 --> 13984.804 | Lr --> 0.019 | Seconds_per_step --> 1.684 | +[2024-01-02 16:36:55,156][Main][INFO] - [train] Step 19000 out of 65536 | Loss --> 2.516 | Grad_l2 --> 0.251 | Weights_l2 --> 14009.308 | Lr --> 0.019 | Seconds_per_step --> 1.696 | +[2024-01-02 16:37:19,031][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00082-of-00512.json.gz +[2024-01-02 16:37:41,016][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00143-of-00512.json.gz +[2024-01-02 16:39:53,980][Main][INFO] - [train] Step 19100 out of 65536 | Loss --> 2.501 | Grad_l2 --> 0.243 | Weights_l2 --> 14033.724 | Lr --> 0.019 | Seconds_per_step --> 1.788 | +[2024-01-02 16:42:43,335][Main][INFO] - [train] Step 19200 out of 65536 | Loss --> 2.523 | Grad_l2 --> 0.294 | Weights_l2 --> 14059.729 | Lr --> 0.019 | Seconds_per_step --> 1.694 | +[2024-01-02 16:45:31,492][Main][INFO] - [train] Step 19300 out of 65536 | Loss --> 2.505 | Grad_l2 --> 0.312 | Weights_l2 --> 14085.744 | Lr --> 0.019 | Seconds_per_step --> 1.682 | +[2024-01-02 16:48:19,609][Main][INFO] - [train] Step 19400 out of 65536 | Loss --> 2.514 | Grad_l2 --> 0.278 | Weights_l2 --> 14111.722 | Lr --> 0.019 | Seconds_per_step --> 1.681 | +[2024-01-02 16:51:14,710][Main][INFO] - [train] Step 19500 out of 65536 | Loss --> 2.492 | Grad_l2 --> 0.262 | Weights_l2 --> 14136.135 | Lr --> 0.019 | Seconds_per_step --> 1.751 | +[2024-01-02 16:54:06,545][Main][INFO] - [train] Step 19600 out of 65536 | Loss --> 2.512 | Grad_l2 --> 0.260 | Weights_l2 --> 14160.398 | Lr --> 0.019 | Seconds_per_step --> 1.718 | +[2024-01-02 16:56:57,751][Main][INFO] - [train] Step 19700 out of 65536 | Loss --> 2.507 | Grad_l2 --> 0.265 | Weights_l2 --> 14184.460 | Lr --> 0.019 | Seconds_per_step --> 1.712 | +[2024-01-02 16:57:40,622][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00264-of-00512.json.gz +[2024-01-02 16:57:50,685][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00033-of-00512.json.gz +[2024-01-02 16:59:46,604][Main][INFO] - [train] Step 19800 out of 65536 | Loss --> 2.460 | Grad_l2 --> 0.254 | Weights_l2 --> 14208.169 | Lr --> 0.019 | Seconds_per_step --> 1.689 | +[2024-01-02 17:02:38,157][Main][INFO] - [train] Step 19900 out of 65536 | Loss --> 2.491 | Grad_l2 --> 0.264 | Weights_l2 --> 14232.110 | Lr --> 0.018 | Seconds_per_step --> 1.716 | +[2024-01-02 17:05:29,590][Main][INFO] - [train] Step 20000 out of 65536 | Loss --> 2.487 | Grad_l2 --> 0.264 | Weights_l2 --> 14255.952 | Lr --> 0.018 | Seconds_per_step --> 1.714 | +[2024-01-02 17:05:29,637][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-02 17:05:29,639][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-02 17:07:40,969][Main][INFO] - [eval] Step 20000 out of 65536 | Loss --> 2.527 | Accuracy --> 0.569 | Time --> 131.377 | +[2024-01-02 17:07:40,972][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000 +[2024-01-02 17:07:40,975][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-01-02 17:07:44,022][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors +[2024-01-02 17:07:48,439][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin +[2024-01-02 17:07:48,441][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin +[2024-01-02 17:07:48,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin +[2024-01-02 17:07:48,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin +[2024-01-02 17:07:48,443][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl +[2024-01-02 17:10:37,572][Main][INFO] - [train] Step 20100 out of 65536 | Loss --> 2.489 | Grad_l2 --> 0.255 | Weights_l2 --> 14279.763 | Lr --> 0.018 | Seconds_per_step --> 1.766 | +[2024-01-02 17:13:31,127][Main][INFO] - [train] Step 20200 out of 65536 | Loss --> 2.473 | Grad_l2 --> 0.252 | Weights_l2 --> 14303.187 | Lr --> 0.018 | Seconds_per_step --> 1.736 | +[2024-01-02 17:16:19,446][Main][INFO] - [train] Step 20300 out of 65536 | Loss --> 2.458 | Grad_l2 --> 0.242 | Weights_l2 --> 14326.537 | Lr --> 0.018 | Seconds_per_step --> 1.683 | +[2024-01-02 17:19:09,966][Main][INFO] - [train] Step 20400 out of 65536 | Loss --> 2.451 | Grad_l2 --> 0.254 | Weights_l2 --> 14350.033 | Lr --> 0.018 | Seconds_per_step --> 1.705 | +[2024-01-02 17:20:08,209][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00421-of-00512.json.gz +[2024-01-02 17:20:18,985][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00418-of-00512.json.gz +[2024-01-02 17:22:00,590][Main][INFO] - [train] Step 20500 out of 65536 | Loss --> 2.458 | Grad_l2 --> 0.253 | Weights_l2 --> 14373.380 | Lr --> 0.018 | Seconds_per_step --> 1.706 | +[2024-01-02 17:24:52,665][Main][INFO] - [train] Step 20600 out of 65536 | Loss --> 2.453 | Grad_l2 --> 0.250 | Weights_l2 --> 14396.528 | Lr --> 0.018 | Seconds_per_step --> 1.721 | +[2024-01-02 17:27:42,251][Main][INFO] - [train] Step 20700 out of 65536 | Loss --> 2.478 | Grad_l2 --> 0.254 | Weights_l2 --> 14419.779 | Lr --> 0.018 | Seconds_per_step --> 1.696 | +[2024-01-02 17:30:31,301][Main][INFO] - [train] Step 20800 out of 65536 | Loss --> 2.450 | Grad_l2 --> 0.254 | Weights_l2 --> 14442.813 | Lr --> 0.018 | Seconds_per_step --> 1.690 | +[2024-01-02 17:33:25,345][Main][INFO] - [train] Step 20900 out of 65536 | Loss --> 2.444 | Grad_l2 --> 0.258 | Weights_l2 --> 14465.939 | Lr --> 0.018 | Seconds_per_step --> 1.740 | +[2024-01-02 17:36:13,070][Main][INFO] - [train] Step 21000 out of 65536 | Loss --> 2.440 | Grad_l2 --> 0.244 | Weights_l2 --> 14489.154 | Lr --> 0.018 | Seconds_per_step --> 1.677 | +[2024-01-02 17:39:04,055][Main][INFO] - [train] Step 21100 out of 65536 | Loss --> 2.451 | Grad_l2 --> 0.250 | Weights_l2 --> 14512.288 | Lr --> 0.018 | Seconds_per_step --> 1.710 | +[2024-01-02 17:39:59,025][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00505-of-00512.json.gz +[2024-01-02 17:40:06,313][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00209-of-00512.json.gz +[2024-01-02 17:41:55,609][Main][INFO] - [train] Step 21200 out of 65536 | Loss --> 2.445 | Grad_l2 --> 0.250 | Weights_l2 --> 14535.390 | Lr --> 0.018 | Seconds_per_step --> 1.716 | +[2024-01-02 17:44:48,184][Main][INFO] - [train] Step 21300 out of 65536 | Loss --> 2.460 | Grad_l2 --> 0.260 | Weights_l2 --> 14558.524 | Lr --> 0.018 | Seconds_per_step --> 1.726 | +[2024-01-02 17:47:36,660][Main][INFO] - [train] Step 21400 out of 65536 | Loss --> 2.447 | Grad_l2 --> 0.252 | Weights_l2 --> 14581.355 | Lr --> 0.018 | Seconds_per_step --> 1.685 | +[2024-01-02 17:50:26,460][Main][INFO] - [train] Step 21500 out of 65536 | Loss --> 2.433 | Grad_l2 --> 0.249 | Weights_l2 --> 14604.116 | Lr --> 0.018 | Seconds_per_step --> 1.698 | +[2024-01-02 17:53:22,788][Main][INFO] - [train] Step 21600 out of 65536 | Loss --> 2.416 | Grad_l2 --> 0.247 | Weights_l2 --> 14626.949 | Lr --> 0.018 | Seconds_per_step --> 1.763 | +[2024-01-02 17:56:14,559][Main][INFO] - [train] Step 21700 out of 65536 | Loss --> 2.427 | Grad_l2 --> 0.254 | Weights_l2 --> 14649.558 | Lr --> 0.018 | Seconds_per_step --> 1.718 | +[2024-01-02 17:59:03,310][Main][INFO] - [train] Step 21800 out of 65536 | Loss --> 2.425 | Grad_l2 --> 0.257 | Weights_l2 --> 14672.719 | Lr --> 0.018 | Seconds_per_step --> 1.687 | +[2024-01-02 18:00:41,296][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00360-of-00512.json.gz +[2024-01-02 18:00:57,421][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00441-of-00512.json.gz +[2024-01-02 18:01:55,092][Main][INFO] - [train] Step 21900 out of 65536 | Loss --> 2.448 | Grad_l2 --> 0.239 | Weights_l2 --> 14695.458 | Lr --> 0.018 | Seconds_per_step --> 1.718 | +[2024-01-02 18:04:45,128][Main][INFO] - [train] Step 22000 out of 65536 | Loss --> 2.415 | Grad_l2 --> 0.243 | Weights_l2 --> 14717.980 | Lr --> 0.018 | Seconds_per_step --> 1.700 | +[2024-01-02 18:07:42,566][Main][INFO] - [train] Step 22100 out of 65536 | Loss --> 2.404 | Grad_l2 --> 0.254 | Weights_l2 --> 14740.518 | Lr --> 0.018 | Seconds_per_step --> 1.774 | +[2024-01-02 18:10:37,165][Main][INFO] - [train] Step 22200 out of 65536 | Loss --> 2.405 | Grad_l2 --> 0.256 | Weights_l2 --> 14762.969 | Lr --> 0.018 | Seconds_per_step --> 1.746 | +[2024-01-02 18:13:29,157][Main][INFO] - [train] Step 22300 out of 65536 | Loss --> 2.423 | Grad_l2 --> 0.261 | Weights_l2 --> 14785.481 | Lr --> 0.018 | Seconds_per_step --> 1.720 | +[2024-01-02 18:16:24,606][Main][INFO] - [train] Step 22400 out of 65536 | Loss --> 2.447 | Grad_l2 --> 0.284 | Weights_l2 --> 14809.074 | Lr --> 0.018 | Seconds_per_step --> 1.754 | +[2024-01-02 18:19:13,301][Main][INFO] - [train] Step 22500 out of 65536 | Loss --> 2.443 | Grad_l2 --> 0.263 | Weights_l2 --> 14832.093 | Lr --> 0.018 | Seconds_per_step --> 1.687 | +[2024-01-02 18:21:18,524][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00227-of-00512.json.gz +[2024-01-02 18:21:36,325][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00309-of-00512.json.gz +[2024-01-02 18:22:04,772][Main][INFO] - [train] Step 22600 out of 65536 | Loss --> 2.423 | Grad_l2 --> 0.253 | Weights_l2 --> 14854.775 | Lr --> 0.018 | Seconds_per_step --> 1.715 | +[2024-01-02 18:24:58,962][Main][INFO] - [train] Step 22700 out of 65536 | Loss --> 2.416 | Grad_l2 --> 0.268 | Weights_l2 --> 14877.335 | Lr --> 0.018 | Seconds_per_step --> 1.742 | +[2024-01-02 18:27:53,693][Main][INFO] - [train] Step 22800 out of 65536 | Loss --> 2.414 | Grad_l2 --> 0.253 | Weights_l2 --> 14899.739 | Lr --> 0.017 | Seconds_per_step --> 1.747 | +[2024-01-02 18:30:43,772][Main][INFO] - [train] Step 22900 out of 65536 | Loss --> 2.403 | Grad_l2 --> 0.257 | Weights_l2 --> 14921.510 | Lr --> 0.017 | Seconds_per_step --> 1.701 | +[2024-01-02 18:33:33,099][Main][INFO] - [train] Step 23000 out of 65536 | Loss --> 2.430 | Grad_l2 --> 0.251 | Weights_l2 --> 14943.319 | Lr --> 0.017 | Seconds_per_step --> 1.693 | +[2024-01-02 18:36:22,961][Main][INFO] - [train] Step 23100 out of 65536 | Loss --> 2.397 | Grad_l2 --> 0.252 | Weights_l2 --> 14965.151 | Lr --> 0.017 | Seconds_per_step --> 1.699 | +[2024-01-02 18:39:11,811][Main][INFO] - [train] Step 23200 out of 65536 | Loss --> 2.408 | Grad_l2 --> 0.257 | Weights_l2 --> 14986.698 | Lr --> 0.017 | Seconds_per_step --> 1.688 | +[2024-01-02 18:41:21,814][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00265-of-00512.json.gz +[2024-01-02 18:41:28,135][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00167-of-00512.json.gz +[2024-01-02 18:42:02,830][Main][INFO] - [train] Step 23300 out of 65536 | Loss --> 2.407 | Grad_l2 --> 0.251 | Weights_l2 --> 15008.506 | Lr --> 0.017 | Seconds_per_step --> 1.710 | +[2024-01-02 18:44:54,942][Main][INFO] - [train] Step 23400 out of 65536 | Loss --> 2.406 | Grad_l2 --> 0.263 | Weights_l2 --> 15030.706 | Lr --> 0.017 | Seconds_per_step --> 1.721 | +[2024-01-02 18:47:44,626][Main][INFO] - [train] Step 23500 out of 65536 | Loss --> 2.407 | Grad_l2 --> 0.268 | Weights_l2 --> 15052.366 | Lr --> 0.017 | Seconds_per_step --> 1.697 | +[2024-01-02 18:50:36,690][Main][INFO] - [train] Step 23600 out of 65536 | Loss --> 2.393 | Grad_l2 --> 0.265 | Weights_l2 --> 15074.223 | Lr --> 0.017 | Seconds_per_step --> 1.721 | +[2024-01-02 18:53:25,775][Main][INFO] - [train] Step 23700 out of 65536 | Loss --> 2.390 | Grad_l2 --> 0.249 | Weights_l2 --> 15095.673 | Lr --> 0.017 | Seconds_per_step --> 1.691 | +[2024-01-02 18:56:14,325][Main][INFO] - [train] Step 23800 out of 65536 | Loss --> 2.394 | Grad_l2 --> 0.264 | Weights_l2 --> 15117.037 | Lr --> 0.017 | Seconds_per_step --> 1.685 | +[2024-01-02 18:59:06,316][Main][INFO] - [train] Step 23900 out of 65536 | Loss --> 2.406 | Grad_l2 --> 0.255 | Weights_l2 --> 15138.512 | Lr --> 0.017 | Seconds_per_step --> 1.720 | +[2024-01-02 19:01:52,496][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00153-of-00512.json.gz +[2024-01-02 19:01:52,883][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00117-of-00512.json.gz +[2024-01-02 19:02:01,057][Main][INFO] - [train] Step 24000 out of 65536 | Loss --> 2.398 | Grad_l2 --> 0.256 | Weights_l2 --> 15159.799 | Lr --> 0.017 | Seconds_per_step --> 1.747 | +[2024-01-02 19:04:49,943][Main][INFO] - [train] Step 24100 out of 65536 | Loss --> 2.393 | Grad_l2 --> 0.251 | Weights_l2 --> 15181.146 | Lr --> 0.017 | Seconds_per_step --> 1.689 | +[2024-01-02 19:07:39,191][Main][INFO] - [train] Step 24200 out of 65536 | Loss --> 2.393 | Grad_l2 --> 0.278 | Weights_l2 --> 15202.209 | Lr --> 0.017 | Seconds_per_step --> 1.692 | +[2024-01-02 19:10:32,575][Main][INFO] - [train] Step 24300 out of 65536 | Loss --> 2.375 | Grad_l2 --> 0.260 | Weights_l2 --> 15223.079 | Lr --> 0.017 | Seconds_per_step --> 1.734 | +[2024-01-02 19:13:21,840][Main][INFO] - [train] Step 24400 out of 65536 | Loss --> 2.372 | Grad_l2 --> 0.248 | Weights_l2 --> 15243.978 | Lr --> 0.017 | Seconds_per_step --> 1.693 | +[2024-01-02 19:16:16,666][Main][INFO] - [train] Step 24500 out of 65536 | Loss --> 2.371 | Grad_l2 --> 0.251 | Weights_l2 --> 15264.771 | Lr --> 0.017 | Seconds_per_step --> 1.748 | +[2024-01-02 19:19:08,833][Main][INFO] - [train] Step 24600 out of 65536 | Loss --> 2.375 | Grad_l2 --> 0.267 | Weights_l2 --> 15285.543 | Lr --> 0.017 | Seconds_per_step --> 1.722 | +[2024-01-02 19:21:55,363][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00263-of-00512.json.gz +[2024-01-02 19:21:59,897][Main][INFO] - [train] Step 24700 out of 65536 | Loss --> 2.372 | Grad_l2 --> 0.263 | Weights_l2 --> 15306.152 | Lr --> 0.017 | Seconds_per_step --> 1.711 | +[2024-01-02 19:22:05,257][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00453-of-00512.json.gz +[2024-01-02 19:24:51,720][Main][INFO] - [train] Step 24800 out of 65536 | Loss --> 2.358 | Grad_l2 --> 0.256 | Weights_l2 --> 15326.725 | Lr --> 0.017 | Seconds_per_step --> 1.718 | +[2024-01-02 19:27:39,622][Main][INFO] - [train] Step 24900 out of 65536 | Loss --> 2.359 | Grad_l2 --> 0.249 | Weights_l2 --> 15347.256 | Lr --> 0.017 | Seconds_per_step --> 1.679 | +[2024-01-02 19:30:27,697][Main][INFO] - [train] Step 25000 out of 65536 | Loss --> 2.359 | Grad_l2 --> 0.261 | Weights_l2 --> 15367.830 | Lr --> 0.017 | Seconds_per_step --> 1.681 | +[2024-01-02 19:30:27,748][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-02 19:30:27,750][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-02 19:32:39,671][Main][INFO] - [eval] Step 25000 out of 65536 | Loss --> 2.407 | Accuracy --> 0.583 | Time --> 131.972 | +[2024-01-02 19:35:33,075][Main][INFO] - [train] Step 25100 out of 65536 | Loss --> 2.360 | Grad_l2 --> 0.262 | Weights_l2 --> 15388.418 | Lr --> 0.017 | Seconds_per_step --> 1.734 | +[2024-01-02 19:38:27,107][Main][INFO] - [train] Step 25200 out of 65536 | Loss --> 2.352 | Grad_l2 --> 0.255 | Weights_l2 --> 15408.697 | Lr --> 0.017 | Seconds_per_step --> 1.740 | +[2024-01-02 19:41:17,526][Main][INFO] - [train] Step 25300 out of 65536 | Loss --> 2.354 | Grad_l2 --> 0.258 | Weights_l2 --> 15428.927 | Lr --> 0.016 | Seconds_per_step --> 1.704 | +[2024-01-02 19:44:07,303][Main][INFO] - [train] Step 25400 out of 65536 | Loss --> 2.337 | Grad_l2 --> 0.266 | Weights_l2 --> 15449.144 | Lr --> 0.016 | Seconds_per_step --> 1.698 | +[2024-01-02 19:44:20,518][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00431-of-00512.json.gz +[2024-01-02 19:44:40,085][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00219-of-00512.json.gz +[2024-01-02 19:46:58,868][Main][INFO] - [train] Step 25500 out of 65536 | Loss --> 2.331 | Grad_l2 --> 0.254 | Weights_l2 --> 15469.032 | Lr --> 0.016 | Seconds_per_step --> 1.716 | +[2024-01-02 19:49:49,258][Main][INFO] - [train] Step 25600 out of 65536 | Loss --> 2.345 | Grad_l2 --> 0.266 | Weights_l2 --> 15489.093 | Lr --> 0.016 | Seconds_per_step --> 1.704 | +[2024-01-02 19:52:38,249][Main][INFO] - [train] Step 25700 out of 65536 | Loss --> 2.347 | Grad_l2 --> 0.262 | Weights_l2 --> 15509.004 | Lr --> 0.016 | Seconds_per_step --> 1.690 | +[2024-01-02 19:55:30,533][Main][INFO] - [train] Step 25800 out of 65536 | Loss --> 2.351 | Grad_l2 --> 0.262 | Weights_l2 --> 15528.766 | Lr --> 0.016 | Seconds_per_step --> 1.723 | +[2024-01-02 19:58:18,977][Main][INFO] - [train] Step 25900 out of 65536 | Loss --> 2.344 | Grad_l2 --> 0.274 | Weights_l2 --> 15548.873 | Lr --> 0.016 | Seconds_per_step --> 1.684 | +[2024-01-02 20:01:07,096][Main][INFO] - [train] Step 26000 out of 65536 | Loss --> 2.359 | Grad_l2 --> 0.260 | Weights_l2 --> 15568.736 | Lr --> 0.016 | Seconds_per_step --> 1.681 | +[2024-01-02 20:03:59,599][Main][INFO] - [train] Step 26100 out of 65536 | Loss --> 2.348 | Grad_l2 --> 0.262 | Weights_l2 --> 15588.379 | Lr --> 0.016 | Seconds_per_step --> 1.725 | +[2024-01-02 20:04:48,626][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00014-of-00512.json.gz +[2024-01-02 20:04:55,213][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00257-of-00512.json.gz +[2024-01-02 20:06:52,474][Main][INFO] - [train] Step 26200 out of 65536 | Loss --> 2.358 | Grad_l2 --> 0.266 | Weights_l2 --> 15608.121 | Lr --> 0.016 | Seconds_per_step --> 1.729 | +[2024-01-02 20:09:40,812][Main][INFO] - [train] Step 26300 out of 65536 | Loss --> 2.334 | Grad_l2 --> 0.262 | Weights_l2 --> 15627.567 | Lr --> 0.016 | Seconds_per_step --> 1.683 | +[2024-01-02 20:12:32,449][Main][INFO] - [train] Step 26400 out of 65536 | Loss --> 2.347 | Grad_l2 --> 0.274 | Weights_l2 --> 15647.006 | Lr --> 0.016 | Seconds_per_step --> 1.716 | +[2024-01-02 20:15:25,730][Main][INFO] - [train] Step 26500 out of 65536 | Loss --> 2.344 | Grad_l2 --> 0.265 | Weights_l2 --> 15666.068 | Lr --> 0.016 | Seconds_per_step --> 1.733 | +[2024-01-02 20:18:12,959][Main][INFO] - [train] Step 26600 out of 65536 | Loss --> 2.356 | Grad_l2 --> 0.260 | Weights_l2 --> 15685.294 | Lr --> 0.016 | Seconds_per_step --> 1.672 | +[2024-01-02 20:21:09,422][Main][INFO] - [train] Step 26700 out of 65536 | Loss --> 2.334 | Grad_l2 --> 0.264 | Weights_l2 --> 15704.444 | Lr --> 0.016 | Seconds_per_step --> 1.765 | +[2024-01-02 20:24:00,053][Main][INFO] - [train] Step 26800 out of 65536 | Loss --> 2.364 | Grad_l2 --> 0.282 | Weights_l2 --> 15724.034 | Lr --> 0.016 | Seconds_per_step --> 1.706 | +[2024-01-02 20:24:42,424][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00060-of-00512.json.gz +[2024-01-02 20:25:22,899][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00510-of-00512.json.gz +[2024-01-02 20:26:52,609][Main][INFO] - [train] Step 26900 out of 65536 | Loss --> 2.354 | Grad_l2 --> 0.265 | Weights_l2 --> 15743.290 | Lr --> 0.016 | Seconds_per_step --> 1.726 | +[2024-01-02 20:29:43,646][Main][INFO] - [train] Step 27000 out of 65536 | Loss --> 2.339 | Grad_l2 --> 0.258 | Weights_l2 --> 15762.223 | Lr --> 0.016 | Seconds_per_step --> 1.710 | +[2024-01-02 20:32:33,625][Main][INFO] - [train] Step 27100 out of 65536 | Loss --> 2.345 | Grad_l2 --> 0.282 | Weights_l2 --> 15781.998 | Lr --> 0.016 | Seconds_per_step --> 1.700 | +[2024-01-02 20:35:23,353][Main][INFO] - [train] Step 27200 out of 65536 | Loss --> 2.344 | Grad_l2 --> 0.279 | Weights_l2 --> 15800.944 | Lr --> 0.016 | Seconds_per_step --> 1.697 | +[2024-01-02 20:38:15,119][Main][INFO] - [train] Step 27300 out of 65536 | Loss --> 2.311 | Grad_l2 --> 0.264 | Weights_l2 --> 15819.695 | Lr --> 0.016 | Seconds_per_step --> 1.718 | +[2024-01-02 20:41:03,550][Main][INFO] - [train] Step 27400 out of 65536 | Loss --> 2.317 | Grad_l2 --> 0.272 | Weights_l2 --> 15838.170 | Lr --> 0.016 | Seconds_per_step --> 1.684 | +[2024-01-02 20:44:02,635][Main][INFO] - [train] Step 27500 out of 65536 | Loss --> 2.319 | Grad_l2 --> 0.266 | Weights_l2 --> 15856.759 | Lr --> 0.015 | Seconds_per_step --> 1.791 | +[2024-01-02 20:45:15,436][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00353-of-00512.json.gz +[2024-01-02 20:45:41,056][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00434-of-00512.json.gz +[2024-01-02 20:46:56,360][Main][INFO] - [train] Step 27600 out of 65536 | Loss --> 2.316 | Grad_l2 --> 0.261 | Weights_l2 --> 15875.160 | Lr --> 0.015 | Seconds_per_step --> 1.737 | +[2024-01-02 20:49:51,501][Main][INFO] - [train] Step 27700 out of 65536 | Loss --> 2.301 | Grad_l2 --> 0.261 | Weights_l2 --> 15893.388 | Lr --> 0.015 | Seconds_per_step --> 1.751 | +[2024-01-02 20:52:42,203][Main][INFO] - [train] Step 27800 out of 65536 | Loss --> 2.335 | Grad_l2 --> 0.275 | Weights_l2 --> 15911.735 | Lr --> 0.015 | Seconds_per_step --> 1.707 | +[2024-01-02 20:55:30,530][Main][INFO] - [train] Step 27900 out of 65536 | Loss --> 2.321 | Grad_l2 --> 0.263 | Weights_l2 --> 15929.715 | Lr --> 0.015 | Seconds_per_step --> 1.683 | +[2024-01-02 20:58:24,401][Main][INFO] - [train] Step 28000 out of 65536 | Loss --> 2.314 | Grad_l2 --> 0.261 | Weights_l2 --> 15947.652 | Lr --> 0.015 | Seconds_per_step --> 1.739 | +[2024-01-02 21:01:14,614][Main][INFO] - [train] Step 28100 out of 65536 | Loss --> 2.291 | Grad_l2 --> 0.272 | Weights_l2 --> 15965.616 | Lr --> 0.015 | Seconds_per_step --> 1.702 | +[2024-01-02 21:04:03,228][Main][INFO] - [train] Step 28200 out of 65536 | Loss --> 2.279 | Grad_l2 --> 0.277 | Weights_l2 --> 15983.342 | Lr --> 0.015 | Seconds_per_step --> 1.686 | +[2024-01-02 21:05:48,936][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00157-of-00512.json.gz +[2024-01-02 21:06:04,397][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00051-of-00512.json.gz +[2024-01-02 21:06:54,532][Main][INFO] - [train] Step 28300 out of 65536 | Loss --> 2.295 | Grad_l2 --> 0.274 | Weights_l2 --> 16001.161 | Lr --> 0.015 | Seconds_per_step --> 1.713 | +[2024-01-02 21:09:43,822][Main][INFO] - [train] Step 28400 out of 65536 | Loss --> 2.285 | Grad_l2 --> 0.265 | Weights_l2 --> 16018.729 | Lr --> 0.015 | Seconds_per_step --> 1.693 | +[2024-01-02 21:12:36,433][Main][INFO] - [train] Step 28500 out of 65536 | Loss --> 2.305 | Grad_l2 --> 0.258 | Weights_l2 --> 16036.217 | Lr --> 0.015 | Seconds_per_step --> 1.726 | +[2024-01-02 21:15:24,119][Main][INFO] - [train] Step 28600 out of 65536 | Loss --> 2.294 | Grad_l2 --> 0.275 | Weights_l2 --> 16053.729 | Lr --> 0.015 | Seconds_per_step --> 1.677 | +[2024-01-02 21:18:15,217][Main][INFO] - [train] Step 28700 out of 65536 | Loss --> 2.297 | Grad_l2 --> 0.280 | Weights_l2 --> 16070.909 | Lr --> 0.015 | Seconds_per_step --> 1.711 | +[2024-01-02 21:21:11,627][Main][INFO] - [train] Step 28800 out of 65536 | Loss --> 2.270 | Grad_l2 --> 0.270 | Weights_l2 --> 16088.154 | Lr --> 0.015 | Seconds_per_step --> 1.764 | +[2024-01-02 21:24:03,333][Main][INFO] - [train] Step 28900 out of 65536 | Loss --> 2.281 | Grad_l2 --> 0.325 | Weights_l2 --> 16106.178 | Lr --> 0.015 | Seconds_per_step --> 1.717 | +[2024-01-02 21:25:47,428][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00068-of-00512.json.gz +[2024-01-02 21:25:57,460][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00347-of-00512.json.gz +[2024-01-02 21:26:59,069][Main][INFO] - [train] Step 29000 out of 65536 | Loss --> 2.296 | Grad_l2 --> 0.275 | Weights_l2 --> 16123.948 | Lr --> 0.015 | Seconds_per_step --> 1.757 | +[2024-01-02 21:29:48,297][Main][INFO] - [train] Step 29100 out of 65536 | Loss --> 2.274 | Grad_l2 --> 0.272 | Weights_l2 --> 16141.083 | Lr --> 0.015 | Seconds_per_step --> 1.692 | +[2024-01-02 21:32:39,491][Main][INFO] - [train] Step 29200 out of 65536 | Loss --> 2.289 | Grad_l2 --> 0.280 | Weights_l2 --> 16158.398 | Lr --> 0.015 | Seconds_per_step --> 1.712 | +[2024-01-02 21:35:29,787][Main][INFO] - [train] Step 29300 out of 65536 | Loss --> 2.285 | Grad_l2 --> 0.276 | Weights_l2 --> 16175.486 | Lr --> 0.015 | Seconds_per_step --> 1.703 | +[2024-01-02 21:38:18,294][Main][INFO] - [train] Step 29400 out of 65536 | Loss --> 2.285 | Grad_l2 --> 0.285 | Weights_l2 --> 16192.404 | Lr --> 0.015 | Seconds_per_step --> 1.685 | +[2024-01-02 21:41:09,096][Main][INFO] - [train] Step 29500 out of 65536 | Loss --> 2.269 | Grad_l2 --> 0.283 | Weights_l2 --> 16209.102 | Lr --> 0.015 | Seconds_per_step --> 1.708 | +[2024-01-02 21:44:01,334][Main][INFO] - [train] Step 29600 out of 65536 | Loss --> 2.268 | Grad_l2 --> 0.277 | Weights_l2 --> 16225.583 | Lr --> 0.014 | Seconds_per_step --> 1.722 | +[2024-01-02 21:46:14,451][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00137-of-00512.json.gz +[2024-01-02 21:46:32,415][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00179-of-00512.json.gz +[2024-01-02 21:46:53,459][Main][INFO] - [train] Step 29700 out of 65536 | Loss --> 2.261 | Grad_l2 --> 0.288 | Weights_l2 --> 16241.978 | Lr --> 0.014 | Seconds_per_step --> 1.721 | +[2024-01-02 21:49:46,300][Main][INFO] - [train] Step 29800 out of 65536 | Loss --> 2.263 | Grad_l2 --> 0.275 | Weights_l2 --> 16258.343 | Lr --> 0.014 | Seconds_per_step --> 1.728 | +[2024-01-02 21:52:34,658][Main][INFO] - [train] Step 29900 out of 65536 | Loss --> 2.263 | Grad_l2 --> 0.272 | Weights_l2 --> 16274.619 | Lr --> 0.014 | Seconds_per_step --> 1.684 | +[2024-01-02 21:55:28,505][Main][INFO] - [train] Step 30000 out of 65536 | Loss --> 2.274 | Grad_l2 --> 0.270 | Weights_l2 --> 16290.690 | Lr --> 0.014 | Seconds_per_step --> 1.738 | +[2024-01-02 21:55:28,561][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-02 21:55:28,562][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-02 21:57:39,177][Main][INFO] - [eval] Step 30000 out of 65536 | Loss --> 2.305 | Accuracy --> 0.595 | Time --> 130.669 | +[2024-01-02 22:00:28,493][Main][INFO] - [train] Step 30100 out of 65536 | Loss --> 2.266 | Grad_l2 --> 0.281 | Weights_l2 --> 16306.693 | Lr --> 0.014 | Seconds_per_step --> 1.693 | +[2024-01-02 22:03:21,488][Main][INFO] - [train] Step 30200 out of 65536 | Loss --> 2.247 | Grad_l2 --> 0.270 | Weights_l2 --> 16322.589 | Lr --> 0.014 | Seconds_per_step --> 1.730 | +[2024-01-02 22:06:11,410][Main][INFO] - [train] Step 30300 out of 65536 | Loss --> 2.270 | Grad_l2 --> 0.284 | Weights_l2 --> 16338.783 | Lr --> 0.014 | Seconds_per_step --> 1.699 | +[2024-01-02 22:08:58,658][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00008-of-00512.json.gz +[2024-01-02 22:08:59,121][Main][INFO] - [train] Step 30400 out of 65536 | Loss --> 2.268 | Grad_l2 --> 0.280 | Weights_l2 --> 16354.834 | Lr --> 0.014 | Seconds_per_step --> 1.677 | +[2024-01-02 22:09:38,033][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00121-of-00512.json.gz +[2024-01-02 22:11:57,819][Main][INFO] - [train] Step 30500 out of 65536 | Loss --> 2.259 | Grad_l2 --> 0.281 | Weights_l2 --> 16370.619 | Lr --> 0.014 | Seconds_per_step --> 1.787 | +[2024-01-02 22:14:45,639][Main][INFO] - [train] Step 30600 out of 65536 | Loss --> 2.248 | Grad_l2 --> 0.271 | Weights_l2 --> 16385.988 | Lr --> 0.014 | Seconds_per_step --> 1.678 | +[2024-01-02 22:17:34,343][Main][INFO] - [train] Step 30700 out of 65536 | Loss --> 2.254 | Grad_l2 --> 0.270 | Weights_l2 --> 16401.478 | Lr --> 0.014 | Seconds_per_step --> 1.687 | +[2024-01-02 22:20:24,583][Main][INFO] - [train] Step 30800 out of 65536 | Loss --> 2.268 | Grad_l2 --> 0.285 | Weights_l2 --> 16416.938 | Lr --> 0.014 | Seconds_per_step --> 1.702 | +[2024-01-02 22:23:14,533][Main][INFO] - [train] Step 30900 out of 65536 | Loss --> 2.262 | Grad_l2 --> 0.278 | Weights_l2 --> 16432.423 | Lr --> 0.014 | Seconds_per_step --> 1.699 | +[2024-01-02 22:26:05,144][Main][INFO] - [train] Step 31000 out of 65536 | Loss --> 2.236 | Grad_l2 --> 0.270 | Weights_l2 --> 16447.583 | Lr --> 0.014 | Seconds_per_step --> 1.706 | +[2024-01-02 22:28:51,617][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00477-of-00512.json.gz +[2024-01-02 22:28:56,393][Main][INFO] - [train] Step 31100 out of 65536 | Loss --> 2.236 | Grad_l2 --> 0.278 | Weights_l2 --> 16462.689 | Lr --> 0.014 | Seconds_per_step --> 1.712 | +[2024-01-02 22:29:20,705][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00230-of-00512.json.gz +[2024-01-02 22:31:46,231][Main][INFO] - [train] Step 31200 out of 65536 | Loss --> 2.219 | Grad_l2 --> 0.272 | Weights_l2 --> 16477.594 | Lr --> 0.014 | Seconds_per_step --> 1.698 | +[2024-01-02 22:34:34,666][Main][INFO] - [train] Step 31300 out of 65536 | Loss --> 2.218 | Grad_l2 --> 0.273 | Weights_l2 --> 16492.417 | Lr --> 0.014 | Seconds_per_step --> 1.684 | +[2024-01-02 22:37:23,187][Main][INFO] - [train] Step 31400 out of 65536 | Loss --> 2.242 | Grad_l2 --> 0.280 | Weights_l2 --> 16507.318 | Lr --> 0.014 | Seconds_per_step --> 1.685 | +[2024-01-02 22:40:16,332][Main][INFO] - [train] Step 31500 out of 65536 | Loss --> 2.227 | Grad_l2 --> 0.276 | Weights_l2 --> 16521.998 | Lr --> 0.013 | Seconds_per_step --> 1.731 | +[2024-01-02 22:43:06,205][Main][INFO] - [train] Step 31600 out of 65536 | Loss --> 2.238 | Grad_l2 --> 0.275 | Weights_l2 --> 16536.536 | Lr --> 0.013 | Seconds_per_step --> 1.699 | +[2024-01-02 22:45:56,381][Main][INFO] - [train] Step 31700 out of 65536 | Loss --> 2.246 | Grad_l2 --> 0.275 | Weights_l2 --> 16550.986 | Lr --> 0.013 | Seconds_per_step --> 1.702 | +[2024-01-02 22:48:45,105][Main][INFO] - [train] Step 31800 out of 65536 | Loss --> 2.239 | Grad_l2 --> 0.282 | Weights_l2 --> 16565.364 | Lr --> 0.013 | Seconds_per_step --> 1.687 | +[2024-01-02 22:49:15,781][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00273-of-00512.json.gz +[2024-01-02 22:49:28,221][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00249-of-00512.json.gz +[2024-01-02 22:51:35,461][Main][INFO] - [train] Step 31900 out of 65536 | Loss --> 2.246 | Grad_l2 --> 0.279 | Weights_l2 --> 16579.675 | Lr --> 0.013 | Seconds_per_step --> 1.704 | +[2024-01-02 22:54:28,150][Main][INFO] - [train] Step 32000 out of 65536 | Loss --> 2.247 | Grad_l2 --> 0.280 | Weights_l2 --> 16593.872 | Lr --> 0.013 | Seconds_per_step --> 1.727 | +[2024-01-02 22:57:17,376][Main][INFO] - [train] Step 32100 out of 65536 | Loss --> 2.230 | Grad_l2 --> 0.295 | Weights_l2 --> 16608.055 | Lr --> 0.013 | Seconds_per_step --> 1.692 | +[2024-01-02 23:00:14,088][Main][INFO] - [train] Step 32200 out of 65536 | Loss --> 2.236 | Grad_l2 --> 0.291 | Weights_l2 --> 16622.165 | Lr --> 0.013 | Seconds_per_step --> 1.767 | +[2024-01-02 23:03:01,993][Main][INFO] - [train] Step 32300 out of 65536 | Loss --> 2.230 | Grad_l2 --> 0.279 | Weights_l2 --> 16636.046 | Lr --> 0.013 | Seconds_per_step --> 1.679 | +[2024-01-02 23:05:51,292][Main][INFO] - [train] Step 32400 out of 65536 | Loss --> 2.244 | Grad_l2 --> 0.280 | Weights_l2 --> 16649.850 | Lr --> 0.013 | Seconds_per_step --> 1.693 | +[2024-01-02 23:08:41,925][Main][INFO] - [train] Step 32500 out of 65536 | Loss --> 2.234 | Grad_l2 --> 0.277 | Weights_l2 --> 16663.578 | Lr --> 0.013 | Seconds_per_step --> 1.706 | +[2024-01-02 23:09:00,196][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00500-of-00512.json.gz +[2024-01-02 23:09:08,888][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00000-of-00512.json.gz +[2024-01-02 23:11:34,392][Main][INFO] - [train] Step 32600 out of 65536 | Loss --> 2.210 | Grad_l2 --> 0.285 | Weights_l2 --> 16677.233 | Lr --> 0.013 | Seconds_per_step --> 1.725 | +[2024-01-02 23:14:23,458][Main][INFO] - [train] Step 32700 out of 65536 | Loss --> 2.229 | Grad_l2 --> 0.289 | Weights_l2 --> 16690.735 | Lr --> 0.013 | Seconds_per_step --> 1.691 | +[2024-01-02 23:17:13,334][Main][INFO] - [train] Step 32800 out of 65536 | Loss --> 2.220 | Grad_l2 --> 0.278 | Weights_l2 --> 16704.177 | Lr --> 0.013 | Seconds_per_step --> 1.699 | +[2024-01-02 23:20:12,953][Main][INFO] - [train] Step 32900 out of 65536 | Loss --> 2.220 | Grad_l2 --> 0.274 | Weights_l2 --> 16717.512 | Lr --> 0.013 | Seconds_per_step --> 1.796 | +[2024-01-02 23:23:04,208][Main][INFO] - [train] Step 33000 out of 65536 | Loss --> 2.225 | Grad_l2 --> 0.284 | Weights_l2 --> 16730.809 | Lr --> 0.013 | Seconds_per_step --> 1.713 | +[2024-01-02 23:25:52,391][Main][INFO] - [train] Step 33100 out of 65536 | Loss --> 2.223 | Grad_l2 --> 0.283 | Weights_l2 --> 16744.049 | Lr --> 0.013 | Seconds_per_step --> 1.682 | +[2024-01-02 23:28:46,134][Main][INFO] - [train] Step 33200 out of 65536 | Loss --> 2.212 | Grad_l2 --> 0.278 | Weights_l2 --> 16757.175 | Lr --> 0.013 | Seconds_per_step --> 1.737 | +[2024-01-02 23:29:20,478][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00507-of-00512.json.gz +[2024-01-02 23:29:51,602][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00352-of-00512.json.gz +[2024-01-02 23:31:35,551][Main][INFO] - [train] Step 33300 out of 65536 | Loss --> 2.206 | Grad_l2 --> 0.283 | Weights_l2 --> 16770.140 | Lr --> 0.013 | Seconds_per_step --> 1.694 | +[2024-01-02 23:34:26,651][Main][INFO] - [train] Step 33400 out of 65536 | Loss --> 2.195 | Grad_l2 --> 0.289 | Weights_l2 --> 16783.056 | Lr --> 0.012 | Seconds_per_step --> 1.711 | +[2024-01-02 23:37:15,247][Main][INFO] - [train] Step 33500 out of 65536 | Loss --> 2.201 | Grad_l2 --> 0.283 | Weights_l2 --> 16795.989 | Lr --> 0.012 | Seconds_per_step --> 1.686 | +[2024-01-02 23:40:04,555][Main][INFO] - [train] Step 33600 out of 65536 | Loss --> 2.198 | Grad_l2 --> 0.283 | Weights_l2 --> 16808.647 | Lr --> 0.012 | Seconds_per_step --> 1.693 | +[2024-01-02 23:42:55,093][Main][INFO] - [train] Step 33700 out of 65536 | Loss --> 2.205 | Grad_l2 --> 0.280 | Weights_l2 --> 16821.247 | Lr --> 0.012 | Seconds_per_step --> 1.705 | +[2024-01-02 23:45:46,415][Main][INFO] - [train] Step 33800 out of 65536 | Loss --> 2.196 | Grad_l2 --> 0.285 | Weights_l2 --> 16833.817 | Lr --> 0.012 | Seconds_per_step --> 1.713 | +[2024-01-02 23:48:36,020][Main][INFO] - [train] Step 33900 out of 65536 | Loss --> 2.204 | Grad_l2 --> 0.280 | Weights_l2 --> 16846.275 | Lr --> 0.012 | Seconds_per_step --> 1.696 | +[2024-01-02 23:49:55,848][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00178-of-00512.json.gz +[2024-01-02 23:50:00,390][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00140-of-00512.json.gz +[2024-01-02 23:51:29,982][Main][INFO] - [train] Step 34000 out of 65536 | Loss --> 2.182 | Grad_l2 --> 0.289 | Weights_l2 --> 16858.585 | Lr --> 0.012 | Seconds_per_step --> 1.740 | +[2024-01-02 23:54:25,627][Main][INFO] - [train] Step 34100 out of 65536 | Loss --> 2.186 | Grad_l2 --> 0.281 | Weights_l2 --> 16870.781 | Lr --> 0.012 | Seconds_per_step --> 1.756 | +[2024-01-02 23:57:13,266][Main][INFO] - [train] Step 34200 out of 65536 | Loss --> 2.186 | Grad_l2 --> 0.286 | Weights_l2 --> 16882.840 | Lr --> 0.012 | Seconds_per_step --> 1.676 | +[2024-01-03 00:00:02,388][Main][INFO] - [train] Step 34300 out of 65536 | Loss --> 2.182 | Grad_l2 --> 0.283 | Weights_l2 --> 16894.808 | Lr --> 0.012 | Seconds_per_step --> 1.691 | +[2024-01-03 00:02:54,384][Main][INFO] - [train] Step 34400 out of 65536 | Loss --> 2.181 | Grad_l2 --> 0.303 | Weights_l2 --> 16907.026 | Lr --> 0.012 | Seconds_per_step --> 1.720 | +[2024-01-03 00:05:41,895][Main][INFO] - [train] Step 34500 out of 65536 | Loss --> 2.157 | Grad_l2 --> 0.281 | Weights_l2 --> 16918.696 | Lr --> 0.012 | Seconds_per_step --> 1.675 | +[2024-01-03 00:08:32,195][Main][INFO] - [train] Step 34600 out of 65536 | Loss --> 2.181 | Grad_l2 --> 0.282 | Weights_l2 --> 16930.414 | Lr --> 0.012 | Seconds_per_step --> 1.703 | +[2024-01-03 00:09:53,632][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00105-of-00512.json.gz +[2024-01-03 00:10:10,003][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00165-of-00512.json.gz +[2024-01-03 00:11:23,789][Main][INFO] - [train] Step 34700 out of 65536 | Loss --> 2.196 | Grad_l2 --> 0.288 | Weights_l2 --> 16941.943 | Lr --> 0.012 | Seconds_per_step --> 1.716 | +[2024-01-03 00:14:13,982][Main][INFO] - [train] Step 34800 out of 65536 | Loss --> 2.182 | Grad_l2 --> 0.280 | Weights_l2 --> 16953.472 | Lr --> 0.012 | Seconds_per_step --> 1.702 | +[2024-01-03 00:17:05,745][Main][INFO] - [train] Step 34900 out of 65536 | Loss --> 2.168 | Grad_l2 --> 0.289 | Weights_l2 --> 16964.750 | Lr --> 0.012 | Seconds_per_step --> 1.718 | +[2024-01-03 00:19:55,498][Main][INFO] - [train] Step 35000 out of 65536 | Loss --> 2.166 | Grad_l2 --> 0.281 | Weights_l2 --> 16976.026 | Lr --> 0.012 | Seconds_per_step --> 1.698 | +[2024-01-03 00:19:55,547][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-03 00:19:55,548][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-03 00:22:06,691][Main][INFO] - [eval] Step 35000 out of 65536 | Loss --> 2.212 | Accuracy --> 0.608 | Time --> 131.191 | +[2024-01-03 00:24:55,807][Main][INFO] - [train] Step 35100 out of 65536 | Loss --> 2.165 | Grad_l2 --> 0.282 | Weights_l2 --> 16987.239 | Lr --> 0.012 | Seconds_per_step --> 1.691 | +[2024-01-03 00:27:47,004][Main][INFO] - [train] Step 35200 out of 65536 | Loss --> 2.169 | Grad_l2 --> 0.283 | Weights_l2 --> 16998.328 | Lr --> 0.011 | Seconds_per_step --> 1.712 | +[2024-01-03 00:30:36,343][Main][INFO] - [train] Step 35300 out of 65536 | Loss --> 2.163 | Grad_l2 --> 0.287 | Weights_l2 --> 17009.469 | Lr --> 0.011 | Seconds_per_step --> 1.693 | +[2024-01-03 00:32:23,771][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00378-of-00512.json.gz +[2024-01-03 00:33:08,170][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00464-of-00512.json.gz +[2024-01-03 00:33:30,575][Main][INFO] - [train] Step 35400 out of 65536 | Loss --> 2.176 | Grad_l2 --> 0.285 | Weights_l2 --> 17020.410 | Lr --> 0.011 | Seconds_per_step --> 1.742 | +[2024-01-03 00:36:18,391][Main][INFO] - [train] Step 35500 out of 65536 | Loss --> 2.160 | Grad_l2 --> 0.293 | Weights_l2 --> 17031.153 | Lr --> 0.011 | Seconds_per_step --> 1.678 | +[2024-01-03 00:39:11,785][Main][INFO] - [train] Step 35600 out of 65536 | Loss --> 2.173 | Grad_l2 --> 0.295 | Weights_l2 --> 17041.871 | Lr --> 0.011 | Seconds_per_step --> 1.734 | +[2024-01-03 00:42:01,666][Main][INFO] - [train] Step 35700 out of 65536 | Loss --> 2.189 | Grad_l2 --> 0.289 | Weights_l2 --> 17052.524 | Lr --> 0.011 | Seconds_per_step --> 1.699 | +[2024-01-03 00:44:50,615][Main][INFO] - [train] Step 35800 out of 65536 | Loss --> 2.182 | Grad_l2 --> 0.285 | Weights_l2 --> 17062.917 | Lr --> 0.011 | Seconds_per_step --> 1.689 | +[2024-01-03 00:47:44,325][Main][INFO] - [train] Step 35900 out of 65536 | Loss --> 2.173 | Grad_l2 --> 0.292 | Weights_l2 --> 17073.414 | Lr --> 0.011 | Seconds_per_step --> 1.737 | +[2024-01-03 00:50:32,752][Main][INFO] - [train] Step 36000 out of 65536 | Loss --> 2.158 | Grad_l2 --> 0.288 | Weights_l2 --> 17083.724 | Lr --> 0.011 | Seconds_per_step --> 1.684 | +[2024-01-03 00:52:30,653][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00010-of-00512.json.gz +[2024-01-03 00:53:24,107][Main][INFO] - [train] Step 36100 out of 65536 | Loss --> 2.166 | Grad_l2 --> 0.287 | Weights_l2 --> 17093.965 | Lr --> 0.011 | Seconds_per_step --> 1.714 | +[2024-01-03 00:53:29,335][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00019-of-00512.json.gz +[2024-01-03 00:56:15,503][Main][INFO] - [train] Step 36200 out of 65536 | Loss --> 2.176 | Grad_l2 --> 0.315 | Weights_l2 --> 17104.358 | Lr --> 0.011 | Seconds_per_step --> 1.714 | +[2024-01-03 00:59:04,351][Main][INFO] - [train] Step 36300 out of 65536 | Loss --> 2.170 | Grad_l2 --> 0.299 | Weights_l2 --> 17114.695 | Lr --> 0.011 | Seconds_per_step --> 1.688 | +[2024-01-03 01:01:54,068][Main][INFO] - [train] Step 36400 out of 65536 | Loss --> 2.173 | Grad_l2 --> 0.294 | Weights_l2 --> 17124.837 | Lr --> 0.011 | Seconds_per_step --> 1.697 | +[2024-01-03 01:04:43,248][Main][INFO] - [train] Step 36500 out of 65536 | Loss --> 2.161 | Grad_l2 --> 0.294 | Weights_l2 --> 17134.826 | Lr --> 0.011 | Seconds_per_step --> 1.692 | +[2024-01-03 01:07:34,614][Main][INFO] - [train] Step 36600 out of 65536 | Loss --> 2.170 | Grad_l2 --> 0.291 | Weights_l2 --> 17144.589 | Lr --> 0.011 | Seconds_per_step --> 1.714 | +[2024-01-03 01:10:25,235][Main][INFO] - [train] Step 36700 out of 65536 | Loss --> 2.153 | Grad_l2 --> 0.282 | Weights_l2 --> 17154.303 | Lr --> 0.011 | Seconds_per_step --> 1.706 | +[2024-01-03 01:12:28,124][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00471-of-00512.json.gz +[2024-01-03 01:13:15,391][Main][INFO] - [train] Step 36800 out of 65536 | Loss --> 2.157 | Grad_l2 --> 0.295 | Weights_l2 --> 17163.970 | Lr --> 0.011 | Seconds_per_step --> 1.702 | +[2024-01-03 01:13:28,789][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00424-of-00512.json.gz +[2024-01-03 01:16:06,386][Main][INFO] - [train] Step 36900 out of 65536 | Loss --> 2.152 | Grad_l2 --> 0.284 | Weights_l2 --> 17173.483 | Lr --> 0.010 | Seconds_per_step --> 1.710 | +[2024-01-03 01:18:57,140][Main][INFO] - [train] Step 37000 out of 65536 | Loss --> 2.146 | Grad_l2 --> 0.288 | Weights_l2 --> 17182.887 | Lr --> 0.010 | Seconds_per_step --> 1.708 | +[2024-01-03 01:21:45,249][Main][INFO] - [train] Step 37100 out of 65536 | Loss --> 2.149 | Grad_l2 --> 0.287 | Weights_l2 --> 17192.265 | Lr --> 0.010 | Seconds_per_step --> 1.681 | +[2024-01-03 01:24:35,753][Main][INFO] - [train] Step 37200 out of 65536 | Loss --> 2.153 | Grad_l2 --> 0.291 | Weights_l2 --> 17201.602 | Lr --> 0.010 | Seconds_per_step --> 1.705 | +[2024-01-03 01:27:24,424][Main][INFO] - [train] Step 37300 out of 65536 | Loss --> 2.158 | Grad_l2 --> 0.286 | Weights_l2 --> 17210.711 | Lr --> 0.010 | Seconds_per_step --> 1.687 | +[2024-01-03 01:30:15,540][Main][INFO] - [train] Step 37400 out of 65536 | Loss --> 2.150 | Grad_l2 --> 0.287 | Weights_l2 --> 17219.733 | Lr --> 0.010 | Seconds_per_step --> 1.711 | +[2024-01-03 01:32:46,451][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00053-of-00512.json.gz +[2024-01-03 01:33:11,357][Main][INFO] - [train] Step 37500 out of 65536 | Loss --> 2.145 | Grad_l2 --> 0.302 | Weights_l2 --> 17228.918 | Lr --> 0.010 | Seconds_per_step --> 1.758 | +[2024-01-03 01:33:44,213][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00063-of-00512.json.gz +[2024-01-03 01:36:02,645][Main][INFO] - [train] Step 37600 out of 65536 | Loss --> 2.122 | Grad_l2 --> 0.289 | Weights_l2 --> 17237.709 | Lr --> 0.010 | Seconds_per_step --> 1.713 | +[2024-01-03 01:38:51,472][Main][INFO] - [train] Step 37700 out of 65536 | Loss --> 2.108 | Grad_l2 --> 0.297 | Weights_l2 --> 17246.470 | Lr --> 0.010 | Seconds_per_step --> 1.688 | +[2024-01-03 01:41:41,257][Main][INFO] - [train] Step 37800 out of 65536 | Loss --> 2.139 | Grad_l2 --> 0.297 | Weights_l2 --> 17255.070 | Lr --> 0.010 | Seconds_per_step --> 1.698 | +[2024-01-03 01:44:30,334][Main][INFO] - [train] Step 37900 out of 65536 | Loss --> 2.123 | Grad_l2 --> 0.292 | Weights_l2 --> 17263.616 | Lr --> 0.010 | Seconds_per_step --> 1.691 | +[2024-01-03 01:47:24,146][Main][INFO] - [train] Step 38000 out of 65536 | Loss --> 2.133 | Grad_l2 --> 0.285 | Weights_l2 --> 17272.094 | Lr --> 0.010 | Seconds_per_step --> 1.738 | +[2024-01-03 01:50:11,700][Main][INFO] - [train] Step 38100 out of 65536 | Loss --> 2.138 | Grad_l2 --> 0.291 | Weights_l2 --> 17280.525 | Lr --> 0.010 | Seconds_per_step --> 1.676 | +[2024-01-03 01:52:35,521][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00200-of-00512.json.gz +[2024-01-03 01:53:03,146][Main][INFO] - [train] Step 38200 out of 65536 | Loss --> 2.134 | Grad_l2 --> 0.293 | Weights_l2 --> 17288.890 | Lr --> 0.010 | Seconds_per_step --> 1.714 | +[2024-01-03 01:53:19,543][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00266-of-00512.json.gz +[2024-01-03 01:55:53,439][Main][INFO] - [train] Step 38300 out of 65536 | Loss --> 2.138 | Grad_l2 --> 0.282 | Weights_l2 --> 17297.026 | Lr --> 0.010 | Seconds_per_step --> 1.703 | +[2024-01-03 01:58:41,726][Main][INFO] - [train] Step 38400 out of 65536 | Loss --> 2.140 | Grad_l2 --> 0.288 | Weights_l2 --> 17305.150 | Lr --> 0.010 | Seconds_per_step --> 1.683 | +[2024-01-03 02:01:31,764][Main][INFO] - [train] Step 38500 out of 65536 | Loss --> 2.126 | Grad_l2 --> 0.294 | Weights_l2 --> 17313.261 | Lr --> 0.010 | Seconds_per_step --> 1.700 | +[2024-01-03 02:04:21,253][Main][INFO] - [train] Step 38600 out of 65536 | Loss --> 2.123 | Grad_l2 --> 0.294 | Weights_l2 --> 17321.197 | Lr --> 0.010 | Seconds_per_step --> 1.695 | +[2024-01-03 02:07:11,587][Main][INFO] - [train] Step 38700 out of 65536 | Loss --> 2.101 | Grad_l2 --> 0.294 | Weights_l2 --> 17329.156 | Lr --> 0.009 | Seconds_per_step --> 1.703 | +[2024-01-03 02:10:03,544][Main][INFO] - [train] Step 38800 out of 65536 | Loss --> 2.103 | Grad_l2 --> 0.284 | Weights_l2 --> 17336.994 | Lr --> 0.009 | Seconds_per_step --> 1.720 | +[2024-01-03 02:12:56,742][Main][INFO] - [train] Step 38900 out of 65536 | Loss --> 2.099 | Grad_l2 --> 0.293 | Weights_l2 --> 17344.697 | Lr --> 0.009 | Seconds_per_step --> 1.732 | +[2024-01-03 02:13:17,424][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00144-of-00512.json.gz +[2024-01-03 02:13:33,791][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00386-of-00512.json.gz +[2024-01-03 02:15:46,711][Main][INFO] - [train] Step 39000 out of 65536 | Loss --> 2.110 | Grad_l2 --> 0.285 | Weights_l2 --> 17352.333 | Lr --> 0.009 | Seconds_per_step --> 1.700 | +[2024-01-03 02:18:37,160][Main][INFO] - [train] Step 39100 out of 65536 | Loss --> 2.110 | Grad_l2 --> 0.283 | Weights_l2 --> 17359.887 | Lr --> 0.009 | Seconds_per_step --> 1.704 | +[2024-01-03 02:21:26,524][Main][INFO] - [train] Step 39200 out of 65536 | Loss --> 2.117 | Grad_l2 --> 0.287 | Weights_l2 --> 17367.262 | Lr --> 0.009 | Seconds_per_step --> 1.694 | +[2024-01-03 02:24:16,638][Main][INFO] - [train] Step 39300 out of 65536 | Loss --> 2.115 | Grad_l2 --> 0.291 | Weights_l2 --> 17374.575 | Lr --> 0.009 | Seconds_per_step --> 1.701 | +[2024-01-03 02:27:04,394][Main][INFO] - [train] Step 39400 out of 65536 | Loss --> 2.117 | Grad_l2 --> 0.285 | Weights_l2 --> 17381.807 | Lr --> 0.009 | Seconds_per_step --> 1.678 | +[2024-01-03 02:29:53,317][Main][INFO] - [train] Step 39500 out of 65536 | Loss --> 2.113 | Grad_l2 --> 0.289 | Weights_l2 --> 17388.990 | Lr --> 0.009 | Seconds_per_step --> 1.689 | +[2024-01-03 02:32:44,181][Main][INFO] - [train] Step 39600 out of 65536 | Loss --> 2.105 | Grad_l2 --> 0.285 | Weights_l2 --> 17396.028 | Lr --> 0.009 | Seconds_per_step --> 1.709 | +[2024-01-03 02:33:25,310][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00302-of-00512.json.gz +[2024-01-03 02:33:32,914][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00057-of-00512.json.gz +[2024-01-03 02:35:35,864][Main][INFO] - [train] Step 39700 out of 65536 | Loss --> 2.105 | Grad_l2 --> 0.286 | Weights_l2 --> 17402.992 | Lr --> 0.009 | Seconds_per_step --> 1.717 | +[2024-01-03 02:38:24,656][Main][INFO] - [train] Step 39800 out of 65536 | Loss --> 2.093 | Grad_l2 --> 0.294 | Weights_l2 --> 17409.980 | Lr --> 0.009 | Seconds_per_step --> 1.688 | +[2024-01-03 02:41:13,447][Main][INFO] - [train] Step 39900 out of 65536 | Loss --> 2.089 | Grad_l2 --> 0.290 | Weights_l2 --> 17416.890 | Lr --> 0.009 | Seconds_per_step --> 1.688 | +[2024-01-03 02:44:03,925][Main][INFO] - [train] Step 40000 out of 65536 | Loss --> 2.081 | Grad_l2 --> 0.293 | Weights_l2 --> 17423.653 | Lr --> 0.009 | Seconds_per_step --> 1.705 | +[2024-01-03 02:44:03,973][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-03 02:44:03,974][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-03 02:46:15,123][Main][INFO] - [eval] Step 40000 out of 65536 | Loss --> 2.132 | Accuracy --> 0.618 | Time --> 131.196 | +[2024-01-03 02:46:15,127][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-40000 +[2024-01-03 02:46:15,130][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-01-03 02:46:17,851][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-40000/model.safetensors +[2024-01-03 02:46:22,202][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-40000/optimizer.bin +[2024-01-03 02:46:22,203][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-40000/scheduler.bin +[2024-01-03 02:46:22,203][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-40000/sampler.bin +[2024-01-03 02:46:22,203][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-40000/sampler_1.bin +[2024-01-03 02:46:22,205][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-40000/random_states_0.pkl +[2024-01-03 02:49:13,051][Main][INFO] - [train] Step 40100 out of 65536 | Loss --> 2.088 | Grad_l2 --> 0.291 | Weights_l2 --> 17430.295 | Lr --> 0.009 | Seconds_per_step --> 1.779 | +[2024-01-03 02:52:01,035][Main][INFO] - [train] Step 40200 out of 65536 | Loss --> 2.083 | Grad_l2 --> 0.283 | Weights_l2 --> 17436.882 | Lr --> 0.009 | Seconds_per_step --> 1.680 | +[2024-01-03 02:54:50,316][Main][INFO] - [train] Step 40300 out of 65536 | Loss --> 2.085 | Grad_l2 --> 0.292 | Weights_l2 --> 17443.444 | Lr --> 0.009 | Seconds_per_step --> 1.693 | +[2024-01-03 02:55:27,847][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00346-of-00512.json.gz +[2024-01-03 02:55:33,883][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00329-of-00512.json.gz +[2024-01-03 02:57:40,855][Main][INFO] - [train] Step 40400 out of 65536 | Loss --> 2.086 | Grad_l2 --> 0.289 | Weights_l2 --> 17449.908 | Lr --> 0.009 | Seconds_per_step --> 1.705 | +[2024-01-03 03:00:32,757][Main][INFO] - [train] Step 40500 out of 65536 | Loss --> 2.090 | Grad_l2 --> 0.293 | Weights_l2 --> 17456.234 | Lr --> 0.008 | Seconds_per_step --> 1.719 | +[2024-01-03 03:03:23,942][Main][INFO] - [train] Step 40600 out of 65536 | Loss --> 2.107 | Grad_l2 --> 0.289 | Weights_l2 --> 17462.468 | Lr --> 0.008 | Seconds_per_step --> 1.712 | +[2024-01-03 03:06:14,293][Main][INFO] - [train] Step 40700 out of 65536 | Loss --> 2.083 | Grad_l2 --> 0.289 | Weights_l2 --> 17468.697 | Lr --> 0.008 | Seconds_per_step --> 1.704 | +[2024-01-03 03:09:03,839][Main][INFO] - [train] Step 40800 out of 65536 | Loss --> 2.094 | Grad_l2 --> 0.291 | Weights_l2 --> 17474.825 | Lr --> 0.008 | Seconds_per_step --> 1.695 | +[2024-01-03 03:11:52,679][Main][INFO] - [train] Step 40900 out of 65536 | Loss --> 2.081 | Grad_l2 --> 0.288 | Weights_l2 --> 17480.865 | Lr --> 0.008 | Seconds_per_step --> 1.688 | +[2024-01-03 03:14:45,179][Main][INFO] - [train] Step 41000 out of 65536 | Loss --> 2.078 | Grad_l2 --> 0.290 | Weights_l2 --> 17486.805 | Lr --> 0.008 | Seconds_per_step --> 1.725 | +[2024-01-03 03:15:44,044][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00481-of-00512.json.gz +[2024-01-03 03:16:16,067][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00158-of-00512.json.gz +[2024-01-03 03:17:36,448][Main][INFO] - [train] Step 41100 out of 65536 | Loss --> 2.070 | Grad_l2 --> 0.288 | Weights_l2 --> 17492.712 | Lr --> 0.008 | Seconds_per_step --> 1.713 | +[2024-01-03 03:20:27,922][Main][INFO] - [train] Step 41200 out of 65536 | Loss --> 2.083 | Grad_l2 --> 0.286 | Weights_l2 --> 17498.521 | Lr --> 0.008 | Seconds_per_step --> 1.715 | +[2024-01-03 03:23:17,312][Main][INFO] - [train] Step 41300 out of 65536 | Loss --> 2.088 | Grad_l2 --> 0.292 | Weights_l2 --> 17504.282 | Lr --> 0.008 | Seconds_per_step --> 1.694 | +[2024-01-03 03:26:04,569][Main][INFO] - [train] Step 41400 out of 65536 | Loss --> 2.085 | Grad_l2 --> 0.297 | Weights_l2 --> 17510.014 | Lr --> 0.008 | Seconds_per_step --> 1.673 | +[2024-01-03 03:28:54,767][Main][INFO] - [train] Step 41500 out of 65536 | Loss --> 2.095 | Grad_l2 --> 0.289 | Weights_l2 --> 17515.610 | Lr --> 0.008 | Seconds_per_step --> 1.702 | +[2024-01-03 03:31:44,389][Main][INFO] - [train] Step 41600 out of 65536 | Loss --> 2.074 | Grad_l2 --> 0.294 | Weights_l2 --> 17521.144 | Lr --> 0.008 | Seconds_per_step --> 1.696 | +[2024-01-03 03:34:34,298][Main][INFO] - [train] Step 41700 out of 65536 | Loss --> 2.097 | Grad_l2 --> 0.292 | Weights_l2 --> 17526.615 | Lr --> 0.008 | Seconds_per_step --> 1.699 | +[2024-01-03 03:35:42,352][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00075-of-00512.json.gz +[2024-01-03 03:36:20,206][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00072-of-00512.json.gz +[2024-01-03 03:37:27,082][Main][INFO] - [train] Step 41800 out of 65536 | Loss --> 2.070 | Grad_l2 --> 0.284 | Weights_l2 --> 17531.987 | Lr --> 0.008 | Seconds_per_step --> 1.728 | +[2024-01-03 03:40:18,101][Main][INFO] - [train] Step 41900 out of 65536 | Loss --> 2.077 | Grad_l2 --> 0.288 | Weights_l2 --> 17537.343 | Lr --> 0.008 | Seconds_per_step --> 1.710 | +[2024-01-03 03:43:08,557][Main][INFO] - [train] Step 42000 out of 65536 | Loss --> 2.068 | Grad_l2 --> 0.289 | Weights_l2 --> 17542.586 | Lr --> 0.008 | Seconds_per_step --> 1.705 | +[2024-01-03 03:45:57,998][Main][INFO] - [train] Step 42100 out of 65536 | Loss --> 2.068 | Grad_l2 --> 0.293 | Weights_l2 --> 17547.789 | Lr --> 0.008 | Seconds_per_step --> 1.694 | +[2024-01-03 03:48:51,875][Main][INFO] - [train] Step 42200 out of 65536 | Loss --> 2.062 | Grad_l2 --> 0.286 | Weights_l2 --> 17552.852 | Lr --> 0.008 | Seconds_per_step --> 1.739 | +[2024-01-03 03:51:40,516][Main][INFO] - [train] Step 42300 out of 65536 | Loss --> 2.075 | Grad_l2 --> 0.289 | Weights_l2 --> 17557.897 | Lr --> 0.007 | Seconds_per_step --> 1.686 | +[2024-01-03 03:54:28,632][Main][INFO] - [train] Step 42400 out of 65536 | Loss --> 2.073 | Grad_l2 --> 0.287 | Weights_l2 --> 17562.835 | Lr --> 0.007 | Seconds_per_step --> 1.681 | +[2024-01-03 03:55:39,567][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00285-of-00512.json.gz +[2024-01-03 03:55:58,159][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00062-of-00512.json.gz +[2024-01-03 03:57:20,119][Main][INFO] - [train] Step 42500 out of 65536 | Loss --> 2.070 | Grad_l2 --> 0.290 | Weights_l2 --> 17567.733 | Lr --> 0.007 | Seconds_per_step --> 1.715 | +[2024-01-03 04:00:11,674][Main][INFO] - [train] Step 42600 out of 65536 | Loss --> 2.039 | Grad_l2 --> 0.285 | Weights_l2 --> 17572.547 | Lr --> 0.007 | Seconds_per_step --> 1.716 | +[2024-01-03 04:03:02,269][Main][INFO] - [train] Step 42700 out of 65536 | Loss --> 2.050 | Grad_l2 --> 0.289 | Weights_l2 --> 17577.315 | Lr --> 0.007 | Seconds_per_step --> 1.706 | +[2024-01-03 04:05:51,733][Main][INFO] - [train] Step 42800 out of 65536 | Loss --> 2.030 | Grad_l2 --> 0.291 | Weights_l2 --> 17581.978 | Lr --> 0.007 | Seconds_per_step --> 1.694 | +[2024-01-03 04:08:39,804][Main][INFO] - [train] Step 42900 out of 65536 | Loss --> 2.043 | Grad_l2 --> 0.293 | Weights_l2 --> 17586.586 | Lr --> 0.007 | Seconds_per_step --> 1.681 | +[2024-01-03 04:11:33,564][Main][INFO] - [train] Step 43000 out of 65536 | Loss --> 2.049 | Grad_l2 --> 0.289 | Weights_l2 --> 17591.143 | Lr --> 0.007 | Seconds_per_step --> 1.738 | +[2024-01-03 04:14:23,750][Main][INFO] - [train] Step 43100 out of 65536 | Loss --> 2.039 | Grad_l2 --> 0.289 | Weights_l2 --> 17595.617 | Lr --> 0.007 | Seconds_per_step --> 1.702 | +[2024-01-03 04:16:15,288][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00039-of-00512.json.gz +[2024-01-03 04:16:27,714][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00154-of-00512.json.gz +[2024-01-03 04:17:19,029][Main][INFO] - [train] Step 43200 out of 65536 | Loss --> 2.052 | Grad_l2 --> 0.289 | Weights_l2 --> 17599.983 | Lr --> 0.007 | Seconds_per_step --> 1.753 | +[2024-01-03 04:20:07,567][Main][INFO] - [train] Step 43300 out of 65536 | Loss --> 2.054 | Grad_l2 --> 0.289 | Weights_l2 --> 17604.349 | Lr --> 0.007 | Seconds_per_step --> 1.685 | +[2024-01-03 04:22:58,386][Main][INFO] - [train] Step 43400 out of 65536 | Loss --> 2.054 | Grad_l2 --> 0.288 | Weights_l2 --> 17608.604 | Lr --> 0.007 | Seconds_per_step --> 1.708 | +[2024-01-03 04:25:46,523][Main][INFO] - [train] Step 43500 out of 65536 | Loss --> 2.042 | Grad_l2 --> 0.287 | Weights_l2 --> 17612.800 | Lr --> 0.007 | Seconds_per_step --> 1.681 | +[2024-01-03 04:28:33,755][Main][INFO] - [train] Step 43600 out of 65536 | Loss --> 2.040 | Grad_l2 --> 0.291 | Weights_l2 --> 17616.925 | Lr --> 0.007 | Seconds_per_step --> 1.672 | +[2024-01-03 04:31:26,272][Main][INFO] - [train] Step 43700 out of 65536 | Loss --> 2.051 | Grad_l2 --> 0.284 | Weights_l2 --> 17621.023 | Lr --> 0.007 | Seconds_per_step --> 1.725 | +[2024-01-03 04:34:15,461][Main][INFO] - [train] Step 43800 out of 65536 | Loss --> 2.047 | Grad_l2 --> 0.285 | Weights_l2 --> 17625.027 | Lr --> 0.007 | Seconds_per_step --> 1.692 | +[2024-01-03 04:35:48,811][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00286-of-00512.json.gz +[2024-01-03 04:36:05,765][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00343-of-00512.json.gz +[2024-01-03 04:37:05,710][Main][INFO] - [train] Step 43900 out of 65536 | Loss --> 2.051 | Grad_l2 --> 0.288 | Weights_l2 --> 17629.009 | Lr --> 0.007 | Seconds_per_step --> 1.702 | +[2024-01-03 04:39:55,076][Main][INFO] - [train] Step 44000 out of 65536 | Loss --> 2.042 | Grad_l2 --> 0.290 | Weights_l2 --> 17632.911 | Lr --> 0.007 | Seconds_per_step --> 1.694 | +[2024-01-03 04:42:46,854][Main][INFO] - [train] Step 44100 out of 65536 | Loss --> 2.046 | Grad_l2 --> 0.287 | Weights_l2 --> 17636.705 | Lr --> 0.007 | Seconds_per_step --> 1.718 | +[2024-01-03 04:45:34,449][Main][INFO] - [train] Step 44200 out of 65536 | Loss --> 2.037 | Grad_l2 --> 0.321 | Weights_l2 --> 17640.721 | Lr --> 0.006 | Seconds_per_step --> 1.676 | +[2024-01-03 04:48:23,375][Main][INFO] - [train] Step 44300 out of 65536 | Loss --> 2.034 | Grad_l2 --> 0.289 | Weights_l2 --> 17644.460 | Lr --> 0.006 | Seconds_per_step --> 1.689 | +[2024-01-03 04:51:14,841][Main][INFO] - [train] Step 44400 out of 65536 | Loss --> 2.024 | Grad_l2 --> 0.289 | Weights_l2 --> 17648.087 | Lr --> 0.006 | Seconds_per_step --> 1.715 | +[2024-01-03 04:54:05,409][Main][INFO] - [train] Step 44500 out of 65536 | Loss --> 2.030 | Grad_l2 --> 0.285 | Weights_l2 --> 17651.628 | Lr --> 0.006 | Seconds_per_step --> 1.706 | +[2024-01-03 04:56:06,856][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00304-of-00512.json.gz +[2024-01-03 04:56:18,130][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00253-of-00512.json.gz +[2024-01-03 04:56:58,490][Main][INFO] - [train] Step 44600 out of 65536 | Loss --> 2.021 | Grad_l2 --> 0.283 | Weights_l2 --> 17655.141 | Lr --> 0.006 | Seconds_per_step --> 1.731 | +[2024-01-03 04:59:46,362][Main][INFO] - [train] Step 44700 out of 65536 | Loss --> 2.020 | Grad_l2 --> 0.289 | Weights_l2 --> 17658.629 | Lr --> 0.006 | Seconds_per_step --> 1.679 | +[2024-01-03 05:02:39,535][Main][INFO] - [train] Step 44800 out of 65536 | Loss --> 2.040 | Grad_l2 --> 0.288 | Weights_l2 --> 17661.997 | Lr --> 0.006 | Seconds_per_step --> 1.732 | +[2024-01-03 05:05:29,989][Main][INFO] - [train] Step 44900 out of 65536 | Loss --> 2.030 | Grad_l2 --> 0.297 | Weights_l2 --> 17665.418 | Lr --> 0.006 | Seconds_per_step --> 1.704 | +[2024-01-03 05:08:17,468][Main][INFO] - [train] Step 45000 out of 65536 | Loss --> 2.016 | Grad_l2 --> 0.285 | Weights_l2 --> 17668.698 | Lr --> 0.006 | Seconds_per_step --> 1.675 | +[2024-01-03 05:08:17,541][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-03 05:08:17,542][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-03 05:10:28,442][Main][INFO] - [eval] Step 45000 out of 65536 | Loss --> 2.057 | Accuracy --> 0.628 | Time --> 130.950 | +[2024-01-03 05:13:19,885][Main][INFO] - [train] Step 45100 out of 65536 | Loss --> 2.029 | Grad_l2 --> 0.294 | Weights_l2 --> 17671.989 | Lr --> 0.006 | Seconds_per_step --> 1.714 | +[2024-01-03 05:16:07,939][Main][INFO] - [train] Step 45200 out of 65536 | Loss --> 2.008 | Grad_l2 --> 0.294 | Weights_l2 --> 17675.197 | Lr --> 0.006 | Seconds_per_step --> 1.681 | +[2024-01-03 05:18:26,781][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00254-of-00512.json.gz +[2024-01-03 05:18:46,937][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00262-of-00512.json.gz +[2024-01-03 05:19:05,189][Main][INFO] - [train] Step 45300 out of 65536 | Loss --> 2.013 | Grad_l2 --> 0.287 | Weights_l2 --> 17678.299 | Lr --> 0.006 | Seconds_per_step --> 1.773 | +[2024-01-03 05:21:54,220][Main][INFO] - [train] Step 45400 out of 65536 | Loss --> 2.026 | Grad_l2 --> 0.285 | Weights_l2 --> 17681.375 | Lr --> 0.006 | Seconds_per_step --> 1.690 | +[2024-01-03 05:24:44,189][Main][INFO] - [train] Step 45500 out of 65536 | Loss --> 2.020 | Grad_l2 --> 0.287 | Weights_l2 --> 17684.421 | Lr --> 0.006 | Seconds_per_step --> 1.700 | +[2024-01-03 05:27:33,522][Main][INFO] - [train] Step 45600 out of 65536 | Loss --> 2.011 | Grad_l2 --> 0.286 | Weights_l2 --> 17687.395 | Lr --> 0.006 | Seconds_per_step --> 1.693 | +[2024-01-03 05:30:23,633][Main][INFO] - [train] Step 45700 out of 65536 | Loss --> 2.011 | Grad_l2 --> 0.289 | Weights_l2 --> 17690.283 | Lr --> 0.006 | Seconds_per_step --> 1.701 | +[2024-01-03 05:33:15,236][Main][INFO] - [train] Step 45800 out of 65536 | Loss --> 2.005 | Grad_l2 --> 0.292 | Weights_l2 --> 17693.136 | Lr --> 0.006 | Seconds_per_step --> 1.716 | +[2024-01-03 05:36:04,189][Main][INFO] - [train] Step 45900 out of 65536 | Loss --> 2.032 | Grad_l2 --> 0.293 | Weights_l2 --> 17695.938 | Lr --> 0.006 | Seconds_per_step --> 1.690 | +[2024-01-03 05:38:11,105][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00313-of-00512.json.gz +[2024-01-03 05:38:31,551][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00118-of-00512.json.gz +[2024-01-03 05:38:59,839][Main][INFO] - [train] Step 46000 out of 65536 | Loss --> 2.000 | Grad_l2 --> 0.287 | Weights_l2 --> 17698.712 | Lr --> 0.006 | Seconds_per_step --> 1.756 | +[2024-01-03 05:41:50,228][Main][INFO] - [train] Step 46100 out of 65536 | Loss --> 2.010 | Grad_l2 --> 0.287 | Weights_l2 --> 17701.450 | Lr --> 0.005 | Seconds_per_step --> 1.704 | +[2024-01-03 05:44:42,152][Main][INFO] - [train] Step 46200 out of 65536 | Loss --> 2.011 | Grad_l2 --> 0.294 | Weights_l2 --> 17704.118 | Lr --> 0.005 | Seconds_per_step --> 1.719 | +[2024-01-03 05:47:32,102][Main][INFO] - [train] Step 46300 out of 65536 | Loss --> 1.998 | Grad_l2 --> 0.292 | Weights_l2 --> 17706.793 | Lr --> 0.005 | Seconds_per_step --> 1.699 | +[2024-01-03 05:50:22,715][Main][INFO] - [train] Step 46400 out of 65536 | Loss --> 2.001 | Grad_l2 --> 0.291 | Weights_l2 --> 17709.329 | Lr --> 0.005 | Seconds_per_step --> 1.706 | +[2024-01-03 05:53:13,369][Main][INFO] - [train] Step 46500 out of 65536 | Loss --> 2.000 | Grad_l2 --> 0.287 | Weights_l2 --> 17711.866 | Lr --> 0.005 | Seconds_per_step --> 1.707 | +[2024-01-03 05:56:03,716][Main][INFO] - [train] Step 46600 out of 65536 | Loss --> 2.001 | Grad_l2 --> 0.287 | Weights_l2 --> 17714.399 | Lr --> 0.005 | Seconds_per_step --> 1.703 | +[2024-01-03 05:58:52,062][Main][INFO] - [train] Step 46700 out of 65536 | Loss --> 2.008 | Grad_l2 --> 0.286 | Weights_l2 --> 17716.786 | Lr --> 0.005 | Seconds_per_step --> 1.683 | +[2024-01-03 05:58:58,949][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00300-of-00512.json.gz +[2024-01-03 05:59:09,467][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00433-of-00512.json.gz +[2024-01-03 06:01:47,301][Main][INFO] - [train] Step 46800 out of 65536 | Loss --> 1.989 | Grad_l2 --> 0.296 | Weights_l2 --> 17719.210 | Lr --> 0.005 | Seconds_per_step --> 1.752 | +[2024-01-03 06:04:36,050][Main][INFO] - [train] Step 46900 out of 65536 | Loss --> 2.002 | Grad_l2 --> 0.293 | Weights_l2 --> 17721.550 | Lr --> 0.005 | Seconds_per_step --> 1.687 | +[2024-01-03 06:07:26,427][Main][INFO] - [train] Step 47000 out of 65536 | Loss --> 1.992 | Grad_l2 --> 0.288 | Weights_l2 --> 17723.868 | Lr --> 0.005 | Seconds_per_step --> 1.704 | +[2024-01-03 06:10:16,331][Main][INFO] - [train] Step 47100 out of 65536 | Loss --> 1.989 | Grad_l2 --> 0.290 | Weights_l2 --> 17726.145 | Lr --> 0.005 | Seconds_per_step --> 1.699 | +[2024-01-03 06:13:07,198][Main][INFO] - [train] Step 47200 out of 65536 | Loss --> 1.982 | Grad_l2 --> 0.283 | Weights_l2 --> 17728.351 | Lr --> 0.005 | Seconds_per_step --> 1.709 | +[2024-01-03 06:15:56,648][Main][INFO] - [train] Step 47300 out of 65536 | Loss --> 1.984 | Grad_l2 --> 0.287 | Weights_l2 --> 17730.499 | Lr --> 0.005 | Seconds_per_step --> 1.694 | +[2024-01-03 06:18:45,811][Main][INFO] - [train] Step 47400 out of 65536 | Loss --> 1.985 | Grad_l2 --> 0.286 | Weights_l2 --> 17732.626 | Lr --> 0.005 | Seconds_per_step --> 1.692 | +[2024-01-03 06:19:24,189][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00379-of-00512.json.gz +[2024-01-03 06:19:39,877][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00015-of-00512.json.gz +[2024-01-03 06:21:38,536][Main][INFO] - [train] Step 47500 out of 65536 | Loss --> 1.983 | Grad_l2 --> 0.291 | Weights_l2 --> 17734.723 | Lr --> 0.005 | Seconds_per_step --> 1.727 | +[2024-01-03 06:24:27,628][Main][INFO] - [train] Step 47600 out of 65536 | Loss --> 1.988 | Grad_l2 --> 0.290 | Weights_l2 --> 17736.780 | Lr --> 0.005 | Seconds_per_step --> 1.691 | +[2024-01-03 06:27:15,767][Main][INFO] - [train] Step 47700 out of 65536 | Loss --> 1.986 | Grad_l2 --> 0.288 | Weights_l2 --> 17738.760 | Lr --> 0.005 | Seconds_per_step --> 1.681 | +[2024-01-03 06:30:05,529][Main][INFO] - [train] Step 47800 out of 65536 | Loss --> 1.980 | Grad_l2 --> 0.286 | Weights_l2 --> 17740.690 | Lr --> 0.005 | Seconds_per_step --> 1.698 | +[2024-01-03 06:32:56,001][Main][INFO] - [train] Step 47900 out of 65536 | Loss --> 1.994 | Grad_l2 --> 0.289 | Weights_l2 --> 17742.631 | Lr --> 0.005 | Seconds_per_step --> 1.705 | +[2024-01-03 06:35:44,297][Main][INFO] - [train] Step 48000 out of 65536 | Loss --> 1.974 | Grad_l2 --> 0.283 | Weights_l2 --> 17744.513 | Lr --> 0.005 | Seconds_per_step --> 1.683 | +[2024-01-03 06:38:35,335][Main][INFO] - [train] Step 48100 out of 65536 | Loss --> 1.981 | Grad_l2 --> 0.291 | Weights_l2 --> 17746.355 | Lr --> 0.004 | Seconds_per_step --> 1.710 | +[2024-01-03 06:39:14,259][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00128-of-00512.json.gz +[2024-01-03 06:39:26,563][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00276-of-00512.json.gz +[2024-01-03 06:41:24,140][Main][INFO] - [train] Step 48200 out of 65536 | Loss --> 1.968 | Grad_l2 --> 0.287 | Weights_l2 --> 17748.151 | Lr --> 0.004 | Seconds_per_step --> 1.688 | +[2024-01-03 06:44:15,224][Main][INFO] - [train] Step 48300 out of 65536 | Loss --> 1.985 | Grad_l2 --> 0.283 | Weights_l2 --> 17749.913 | Lr --> 0.004 | Seconds_per_step --> 1.711 | +[2024-01-03 06:47:07,918][Main][INFO] - [train] Step 48400 out of 65536 | Loss --> 1.961 | Grad_l2 --> 0.286 | Weights_l2 --> 17751.654 | Lr --> 0.004 | Seconds_per_step --> 1.727 | +[2024-01-03 06:49:56,414][Main][INFO] - [train] Step 48500 out of 65536 | Loss --> 1.973 | Grad_l2 --> 0.285 | Weights_l2 --> 17753.328 | Lr --> 0.004 | Seconds_per_step --> 1.685 | +[2024-01-03 06:52:47,869][Main][INFO] - [train] Step 48600 out of 65536 | Loss --> 1.980 | Grad_l2 --> 0.283 | Weights_l2 --> 17754.972 | Lr --> 0.004 | Seconds_per_step --> 1.715 | +[2024-01-03 06:55:37,206][Main][INFO] - [train] Step 48700 out of 65536 | Loss --> 1.986 | Grad_l2 --> 0.285 | Weights_l2 --> 17756.556 | Lr --> 0.004 | Seconds_per_step --> 1.693 | +[2024-01-03 06:58:26,075][Main][INFO] - [train] Step 48800 out of 65536 | Loss --> 1.982 | Grad_l2 --> 0.286 | Weights_l2 --> 17758.144 | Lr --> 0.004 | Seconds_per_step --> 1.689 | +[2024-01-03 06:59:36,491][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00148-of-00512.json.gz +[2024-01-03 06:59:49,115][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00275-of-00512.json.gz +[2024-01-03 07:01:21,837][Main][INFO] - [train] Step 48900 out of 65536 | Loss --> 1.986 | Grad_l2 --> 0.287 | Weights_l2 --> 17759.704 | Lr --> 0.004 | Seconds_per_step --> 1.758 | +[2024-01-03 07:04:12,556][Main][INFO] - [train] Step 49000 out of 65536 | Loss --> 1.996 | Grad_l2 --> 0.285 | Weights_l2 --> 17761.236 | Lr --> 0.004 | Seconds_per_step --> 1.707 | +[2024-01-03 07:07:03,578][Main][INFO] - [train] Step 49100 out of 65536 | Loss --> 1.980 | Grad_l2 --> 0.286 | Weights_l2 --> 17762.724 | Lr --> 0.004 | Seconds_per_step --> 1.710 | +[2024-01-03 07:09:54,031][Main][INFO] - [train] Step 49200 out of 65536 | Loss --> 1.963 | Grad_l2 --> 0.283 | Weights_l2 --> 17764.181 | Lr --> 0.004 | Seconds_per_step --> 1.705 | +[2024-01-03 07:12:44,444][Main][INFO] - [train] Step 49300 out of 65536 | Loss --> 1.966 | Grad_l2 --> 0.279 | Weights_l2 --> 17765.586 | Lr --> 0.004 | Seconds_per_step --> 1.704 | +[2024-01-03 07:15:33,649][Main][INFO] - [train] Step 49400 out of 65536 | Loss --> 1.978 | Grad_l2 --> 0.281 | Weights_l2 --> 17766.950 | Lr --> 0.004 | Seconds_per_step --> 1.692 | +[2024-01-03 07:18:22,511][Main][INFO] - [train] Step 49500 out of 65536 | Loss --> 1.967 | Grad_l2 --> 0.285 | Weights_l2 --> 17768.285 | Lr --> 0.004 | Seconds_per_step --> 1.689 | +[2024-01-03 07:19:25,709][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00318-of-00512.json.gz +[2024-01-03 07:19:36,424][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00445-of-00512.json.gz +[2024-01-03 07:21:13,513][Main][INFO] - [train] Step 49600 out of 65536 | Loss --> 1.958 | Grad_l2 --> 0.285 | Weights_l2 --> 17769.626 | Lr --> 0.004 | Seconds_per_step --> 1.710 | +[2024-01-03 07:24:03,348][Main][INFO] - [train] Step 49700 out of 65536 | Loss --> 1.962 | Grad_l2 --> 0.282 | Weights_l2 --> 17770.912 | Lr --> 0.004 | Seconds_per_step --> 1.698 | +[2024-01-03 07:26:54,995][Main][INFO] - [train] Step 49800 out of 65536 | Loss --> 1.955 | Grad_l2 --> 0.280 | Weights_l2 --> 17772.156 | Lr --> 0.004 | Seconds_per_step --> 1.716 | +[2024-01-03 07:29:42,922][Main][INFO] - [train] Step 49900 out of 65536 | Loss --> 1.963 | Grad_l2 --> 0.285 | Weights_l2 --> 17773.425 | Lr --> 0.004 | Seconds_per_step --> 1.679 | +[2024-01-03 07:32:32,227][Main][INFO] - [train] Step 50000 out of 65536 | Loss --> 1.945 | Grad_l2 --> 0.285 | Weights_l2 --> 17774.612 | Lr --> 0.004 | Seconds_per_step --> 1.693 | +[2024-01-03 07:32:32,274][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-03 07:32:32,275][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-03 07:34:45,805][Main][INFO] - [eval] Step 50000 out of 65536 | Loss --> 1.993 | Accuracy --> 0.637 | Time --> 133.576 | +[2024-01-03 07:37:36,452][Main][INFO] - [train] Step 50100 out of 65536 | Loss --> 1.960 | Grad_l2 --> 0.284 | Weights_l2 --> 17775.790 | Lr --> 0.004 | Seconds_per_step --> 1.706 | +[2024-01-03 07:40:24,868][Main][INFO] - [train] Step 50200 out of 65536 | Loss --> 1.947 | Grad_l2 --> 0.284 | Weights_l2 --> 17776.924 | Lr --> 0.004 | Seconds_per_step --> 1.684 | +[2024-01-03 07:41:50,457][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00358-of-00512.json.gz +[2024-01-03 07:42:15,965][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00077-of-00512.json.gz +[2024-01-03 07:43:16,085][Main][INFO] - [train] Step 50300 out of 65536 | Loss --> 1.959 | Grad_l2 --> 0.285 | Weights_l2 --> 17778.076 | Lr --> 0.003 | Seconds_per_step --> 1.712 | +[2024-01-03 07:46:07,748][Main][INFO] - [train] Step 50400 out of 65536 | Loss --> 1.947 | Grad_l2 --> 0.285 | Weights_l2 --> 17779.151 | Lr --> 0.003 | Seconds_per_step --> 1.717 | +[2024-01-03 07:49:05,278][Main][INFO] - [train] Step 50500 out of 65536 | Loss --> 1.943 | Grad_l2 --> 0.287 | Weights_l2 --> 17780.195 | Lr --> 0.003 | Seconds_per_step --> 1.775 | +[2024-01-03 07:51:57,442][Main][INFO] - [train] Step 50600 out of 65536 | Loss --> 1.947 | Grad_l2 --> 0.283 | Weights_l2 --> 17781.220 | Lr --> 0.003 | Seconds_per_step --> 1.722 | +[2024-01-03 07:54:46,848][Main][INFO] - [train] Step 50700 out of 65536 | Loss --> 1.954 | Grad_l2 --> 0.285 | Weights_l2 --> 17782.226 | Lr --> 0.003 | Seconds_per_step --> 1.694 | +[2024-01-03 07:57:40,847][Main][INFO] - [train] Step 50800 out of 65536 | Loss --> 1.952 | Grad_l2 --> 0.290 | Weights_l2 --> 17783.180 | Lr --> 0.003 | Seconds_per_step --> 1.740 | +[2024-01-03 08:00:29,173][Main][INFO] - [train] Step 50900 out of 65536 | Loss --> 1.941 | Grad_l2 --> 0.283 | Weights_l2 --> 17784.137 | Lr --> 0.003 | Seconds_per_step --> 1.683 | +[2024-01-03 08:02:32,766][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00035-of-00512.json.gz +[2024-01-03 08:02:55,068][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00490-of-00512.json.gz +[2024-01-03 08:03:22,441][Main][INFO] - [train] Step 51000 out of 65536 | Loss --> 1.963 | Grad_l2 --> 0.280 | Weights_l2 --> 17785.068 | Lr --> 0.003 | Seconds_per_step --> 1.733 | +[2024-01-03 08:06:11,340][Main][INFO] - [train] Step 51100 out of 65536 | Loss --> 1.949 | Grad_l2 --> 0.287 | Weights_l2 --> 17785.981 | Lr --> 0.003 | Seconds_per_step --> 1.689 | +[2024-01-03 08:09:00,162][Main][INFO] - [train] Step 51200 out of 65536 | Loss --> 1.956 | Grad_l2 --> 0.282 | Weights_l2 --> 17786.851 | Lr --> 0.003 | Seconds_per_step --> 1.688 | +[2024-01-03 08:11:51,687][Main][INFO] - [train] Step 51300 out of 65536 | Loss --> 1.943 | Grad_l2 --> 0.281 | Weights_l2 --> 17787.678 | Lr --> 0.003 | Seconds_per_step --> 1.715 | +[2024-01-03 08:14:40,610][Main][INFO] - [train] Step 51400 out of 65536 | Loss --> 1.951 | Grad_l2 --> 0.283 | Weights_l2 --> 17788.532 | Lr --> 0.003 | Seconds_per_step --> 1.689 | +[2024-01-03 08:17:39,618][Main][INFO] - [train] Step 51500 out of 65536 | Loss --> 1.948 | Grad_l2 --> 0.280 | Weights_l2 --> 17789.336 | Lr --> 0.003 | Seconds_per_step --> 1.790 | +[2024-01-03 08:20:28,318][Main][INFO] - [train] Step 51600 out of 65536 | Loss --> 1.940 | Grad_l2 --> 0.283 | Weights_l2 --> 17790.135 | Lr --> 0.003 | Seconds_per_step --> 1.687 | +[2024-01-03 08:22:22,677][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00272-of-00512.json.gz +[2024-01-03 08:22:41,878][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00278-of-00512.json.gz +[2024-01-03 08:23:19,793][Main][INFO] - [train] Step 51700 out of 65536 | Loss --> 1.939 | Grad_l2 --> 0.282 | Weights_l2 --> 17790.870 | Lr --> 0.003 | Seconds_per_step --> 1.715 | +[2024-01-03 08:26:11,959][Main][INFO] - [train] Step 51800 out of 65536 | Loss --> 1.926 | Grad_l2 --> 0.281 | Weights_l2 --> 17791.613 | Lr --> 0.003 | Seconds_per_step --> 1.722 | +[2024-01-03 08:29:04,895][Main][INFO] - [train] Step 51900 out of 65536 | Loss --> 1.942 | Grad_l2 --> 0.284 | Weights_l2 --> 17792.345 | Lr --> 0.003 | Seconds_per_step --> 1.729 | +[2024-01-03 08:31:55,552][Main][INFO] - [train] Step 52000 out of 65536 | Loss --> 1.936 | Grad_l2 --> 0.286 | Weights_l2 --> 17793.061 | Lr --> 0.003 | Seconds_per_step --> 1.707 | +[2024-01-03 08:34:44,174][Main][INFO] - [train] Step 52100 out of 65536 | Loss --> 1.926 | Grad_l2 --> 0.284 | Weights_l2 --> 17793.750 | Lr --> 0.003 | Seconds_per_step --> 1.686 | +[2024-01-03 08:37:32,397][Main][INFO] - [train] Step 52200 out of 65536 | Loss --> 1.941 | Grad_l2 --> 0.285 | Weights_l2 --> 17794.421 | Lr --> 0.003 | Seconds_per_step --> 1.682 | +[2024-01-03 08:40:23,128][Main][INFO] - [train] Step 52300 out of 65536 | Loss --> 1.930 | Grad_l2 --> 0.282 | Weights_l2 --> 17795.039 | Lr --> 0.003 | Seconds_per_step --> 1.707 | +[2024-01-03 08:42:36,230][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00324-of-00512.json.gz +[2024-01-03 08:42:52,397][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00146-of-00512.json.gz +[2024-01-03 08:43:14,802][Main][INFO] - [train] Step 52400 out of 65536 | Loss --> 1.941 | Grad_l2 --> 0.276 | Weights_l2 --> 17795.653 | Lr --> 0.003 | Seconds_per_step --> 1.717 | +[2024-01-03 08:46:02,851][Main][INFO] - [train] Step 52500 out of 65536 | Loss --> 1.930 | Grad_l2 --> 0.284 | Weights_l2 --> 17796.251 | Lr --> 0.003 | Seconds_per_step --> 1.680 | +[2024-01-03 08:48:51,490][Main][INFO] - [train] Step 52600 out of 65536 | Loss --> 1.912 | Grad_l2 --> 0.282 | Weights_l2 --> 17796.824 | Lr --> 0.003 | Seconds_per_step --> 1.686 | +[2024-01-03 08:51:44,971][Main][INFO] - [train] Step 52700 out of 65536 | Loss --> 1.916 | Grad_l2 --> 0.279 | Weights_l2 --> 17797.375 | Lr --> 0.003 | Seconds_per_step --> 1.735 | +[2024-01-03 08:54:35,671][Main][INFO] - [train] Step 52800 out of 65536 | Loss --> 1.941 | Grad_l2 --> 0.285 | Weights_l2 --> 17797.922 | Lr --> 0.002 | Seconds_per_step --> 1.707 | +[2024-01-03 08:57:24,488][Main][INFO] - [train] Step 52900 out of 65536 | Loss --> 1.936 | Grad_l2 --> 0.278 | Weights_l2 --> 17798.438 | Lr --> 0.002 | Seconds_per_step --> 1.688 | +[2024-01-03 09:00:13,896][Main][INFO] - [train] Step 53000 out of 65536 | Loss --> 1.921 | Grad_l2 --> 0.284 | Weights_l2 --> 17798.966 | Lr --> 0.002 | Seconds_per_step --> 1.694 | +[2024-01-03 09:03:01,180][Main][INFO] - [train] Step 53100 out of 65536 | Loss --> 1.930 | Grad_l2 --> 0.285 | Weights_l2 --> 17799.468 | Lr --> 0.002 | Seconds_per_step --> 1.673 | +[2024-01-03 09:03:06,043][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00173-of-00512.json.gz +[2024-01-03 09:03:07,200][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00387-of-00512.json.gz +[2024-01-03 09:05:50,561][Main][INFO] - [train] Step 53200 out of 65536 | Loss --> 1.946 | Grad_l2 --> 0.290 | Weights_l2 --> 17799.944 | Lr --> 0.002 | Seconds_per_step --> 1.694 | +[2024-01-03 09:08:42,539][Main][INFO] - [train] Step 53300 out of 65536 | Loss --> 1.946 | Grad_l2 --> 0.282 | Weights_l2 --> 17800.414 | Lr --> 0.002 | Seconds_per_step --> 1.720 | +[2024-01-03 09:11:34,312][Main][INFO] - [train] Step 53400 out of 65536 | Loss --> 1.933 | Grad_l2 --> 0.281 | Weights_l2 --> 17800.876 | Lr --> 0.002 | Seconds_per_step --> 1.718 | +[2024-01-03 09:14:21,663][Main][INFO] - [train] Step 53500 out of 65536 | Loss --> 1.924 | Grad_l2 --> 0.284 | Weights_l2 --> 17801.311 | Lr --> 0.002 | Seconds_per_step --> 1.674 | +[2024-01-03 09:17:09,495][Main][INFO] - [train] Step 53600 out of 65536 | Loss --> 1.927 | Grad_l2 --> 0.279 | Weights_l2 --> 17801.746 | Lr --> 0.002 | Seconds_per_step --> 1.678 | +[2024-01-03 09:20:01,016][Main][INFO] - [train] Step 53700 out of 65536 | Loss --> 1.934 | Grad_l2 --> 0.279 | Weights_l2 --> 17802.152 | Lr --> 0.002 | Seconds_per_step --> 1.715 | +[2024-01-03 09:22:49,068][Main][INFO] - [train] Step 53800 out of 65536 | Loss --> 1.943 | Grad_l2 --> 0.282 | Weights_l2 --> 17802.556 | Lr --> 0.002 | Seconds_per_step --> 1.681 | +[2024-01-03 09:22:50,328][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00479-of-00512.json.gz +[2024-01-03 09:22:56,830][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00180-of-00512.json.gz +[2024-01-03 09:25:42,060][Main][INFO] - [train] Step 53900 out of 65536 | Loss --> 1.930 | Grad_l2 --> 0.279 | Weights_l2 --> 17802.936 | Lr --> 0.002 | Seconds_per_step --> 1.730 | +[2024-01-03 09:28:31,432][Main][INFO] - [train] Step 54000 out of 65536 | Loss --> 1.932 | Grad_l2 --> 0.284 | Weights_l2 --> 17803.290 | Lr --> 0.002 | Seconds_per_step --> 1.694 | +[2024-01-03 09:31:19,349][Main][INFO] - [train] Step 54100 out of 65536 | Loss --> 1.928 | Grad_l2 --> 0.277 | Weights_l2 --> 17803.649 | Lr --> 0.002 | Seconds_per_step --> 1.679 | +[2024-01-03 09:34:17,347][Main][INFO] - [train] Step 54200 out of 65536 | Loss --> 1.910 | Grad_l2 --> 0.279 | Weights_l2 --> 17803.992 | Lr --> 0.002 | Seconds_per_step --> 1.780 | +[2024-01-03 09:37:05,412][Main][INFO] - [train] Step 54300 out of 65536 | Loss --> 1.918 | Grad_l2 --> 0.280 | Weights_l2 --> 17804.320 | Lr --> 0.002 | Seconds_per_step --> 1.681 | +[2024-01-03 09:39:54,989][Main][INFO] - [train] Step 54400 out of 65536 | Loss --> 1.922 | Grad_l2 --> 0.284 | Weights_l2 --> 17804.625 | Lr --> 0.002 | Seconds_per_step --> 1.696 | +[2024-01-03 09:42:52,153][Main][INFO] - [train] Step 54500 out of 65536 | Loss --> 1.935 | Grad_l2 --> 0.283 | Weights_l2 --> 17804.941 | Lr --> 0.002 | Seconds_per_step --> 1.772 | +[2024-01-03 09:43:01,447][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00374-of-00512.json.gz +[2024-01-03 09:43:21,998][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00363-of-00512.json.gz +[2024-01-03 09:45:46,923][Main][INFO] - [train] Step 54600 out of 65536 | Loss --> 1.929 | Grad_l2 --> 0.283 | Weights_l2 --> 17805.238 | Lr --> 0.002 | Seconds_per_step --> 1.748 | +[2024-01-03 09:48:35,393][Main][INFO] - [train] Step 54700 out of 65536 | Loss --> 1.908 | Grad_l2 --> 0.283 | Weights_l2 --> 17805.515 | Lr --> 0.002 | Seconds_per_step --> 1.685 | +[2024-01-03 09:51:23,795][Main][INFO] - [train] Step 54800 out of 65536 | Loss --> 1.905 | Grad_l2 --> 0.283 | Weights_l2 --> 17805.780 | Lr --> 0.002 | Seconds_per_step --> 1.684 | +[2024-01-03 09:54:14,571][Main][INFO] - [train] Step 54900 out of 65536 | Loss --> 1.907 | Grad_l2 --> 0.281 | Weights_l2 --> 17806.019 | Lr --> 0.002 | Seconds_per_step --> 1.708 | +[2024-01-03 09:57:04,000][Main][INFO] - [train] Step 55000 out of 65536 | Loss --> 1.910 | Grad_l2 --> 0.279 | Weights_l2 --> 17806.272 | Lr --> 0.002 | Seconds_per_step --> 1.694 | +[2024-01-03 09:57:04,063][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-03 09:57:04,064][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-03 09:59:15,506][Main][INFO] - [eval] Step 55000 out of 65536 | Loss --> 1.946 | Accuracy --> 0.644 | Time --> 131.503 | +[2024-01-03 10:02:06,812][Main][INFO] - [train] Step 55100 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.278 | Weights_l2 --> 17806.518 | Lr --> 0.002 | Seconds_per_step --> 1.713 | +[2024-01-03 10:04:57,090][Main][INFO] - [train] Step 55200 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.281 | Weights_l2 --> 17806.736 | Lr --> 0.002 | Seconds_per_step --> 1.703 | +[2024-01-03 10:05:46,552][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00393-of-00512.json.gz +[2024-01-03 10:06:12,249][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00081-of-00512.json.gz +[2024-01-03 10:07:51,292][Main][INFO] - [train] Step 55300 out of 65536 | Loss --> 1.918 | Grad_l2 --> 0.278 | Weights_l2 --> 17806.957 | Lr --> 0.002 | Seconds_per_step --> 1.742 | +[2024-01-03 10:10:41,650][Main][INFO] - [train] Step 55400 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.280 | Weights_l2 --> 17807.162 | Lr --> 0.002 | Seconds_per_step --> 1.704 | +[2024-01-03 10:13:30,115][Main][INFO] - [train] Step 55500 out of 65536 | Loss --> 1.909 | Grad_l2 --> 0.285 | Weights_l2 --> 17807.366 | Lr --> 0.002 | Seconds_per_step --> 1.685 | +[2024-01-03 10:16:21,613][Main][INFO] - [train] Step 55600 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.281 | Weights_l2 --> 17807.573 | Lr --> 0.002 | Seconds_per_step --> 1.715 | +[2024-01-03 10:19:13,472][Main][INFO] - [train] Step 55700 out of 65536 | Loss --> 1.899 | Grad_l2 --> 0.280 | Weights_l2 --> 17807.748 | Lr --> 0.002 | Seconds_per_step --> 1.719 | +[2024-01-03 10:22:04,836][Main][INFO] - [train] Step 55800 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.283 | Weights_l2 --> 17807.928 | Lr --> 0.001 | Seconds_per_step --> 1.714 | +[2024-01-03 10:24:53,434][Main][INFO] - [train] Step 55900 out of 65536 | Loss --> 1.901 | Grad_l2 --> 0.282 | Weights_l2 --> 17808.096 | Lr --> 0.001 | Seconds_per_step --> 1.686 | +[2024-01-03 10:25:28,577][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00258-of-00512.json.gz +[2024-01-03 10:25:43,285][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00013-of-00512.json.gz +[2024-01-03 10:27:44,412][Main][INFO] - [train] Step 56000 out of 65536 | Loss --> 1.928 | Grad_l2 --> 0.278 | Weights_l2 --> 17808.247 | Lr --> 0.001 | Seconds_per_step --> 1.710 | +[2024-01-03 10:30:35,033][Main][INFO] - [train] Step 56100 out of 65536 | Loss --> 1.917 | Grad_l2 --> 0.276 | Weights_l2 --> 17808.405 | Lr --> 0.001 | Seconds_per_step --> 1.706 | +[2024-01-03 10:33:21,664][Main][INFO] - [train] Step 56200 out of 65536 | Loss --> 1.921 | Grad_l2 --> 0.279 | Weights_l2 --> 17808.551 | Lr --> 0.001 | Seconds_per_step --> 1.666 | +[2024-01-03 10:36:10,744][Main][INFO] - [train] Step 56300 out of 65536 | Loss --> 1.926 | Grad_l2 --> 0.284 | Weights_l2 --> 17808.693 | Lr --> 0.001 | Seconds_per_step --> 1.691 | +[2024-01-03 10:39:05,510][Main][INFO] - [train] Step 56400 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.281 | Weights_l2 --> 17808.825 | Lr --> 0.001 | Seconds_per_step --> 1.748 | +[2024-01-03 10:41:54,171][Main][INFO] - [train] Step 56500 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.284 | Weights_l2 --> 17808.956 | Lr --> 0.001 | Seconds_per_step --> 1.687 | +[2024-01-03 10:44:43,932][Main][INFO] - [train] Step 56600 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.278 | Weights_l2 --> 17809.086 | Lr --> 0.001 | Seconds_per_step --> 1.698 | +[2024-01-03 10:45:40,890][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00371-of-00512.json.gz +[2024-01-03 10:45:51,655][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00181-of-00512.json.gz +[2024-01-03 10:47:35,088][Main][INFO] - [train] Step 56700 out of 65536 | Loss --> 1.920 | Grad_l2 --> 0.280 | Weights_l2 --> 17809.191 | Lr --> 0.001 | Seconds_per_step --> 1.712 | +[2024-01-03 10:50:26,992][Main][INFO] - [train] Step 56800 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.278 | Weights_l2 --> 17809.289 | Lr --> 0.001 | Seconds_per_step --> 1.719 | +[2024-01-03 10:53:14,768][Main][INFO] - [train] Step 56900 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.279 | Weights_l2 --> 17809.384 | Lr --> 0.001 | Seconds_per_step --> 1.678 | +[2024-01-03 10:56:06,561][Main][INFO] - [train] Step 57000 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.280 | Weights_l2 --> 17809.482 | Lr --> 0.001 | Seconds_per_step --> 1.718 | +[2024-01-03 10:58:58,519][Main][INFO] - [train] Step 57100 out of 65536 | Loss --> 1.905 | Grad_l2 --> 0.281 | Weights_l2 --> 17809.573 | Lr --> 0.001 | Seconds_per_step --> 1.720 | +[2024-01-03 11:01:47,005][Main][INFO] - [train] Step 57200 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.277 | Weights_l2 --> 17809.665 | Lr --> 0.001 | Seconds_per_step --> 1.685 | +[2024-01-03 11:04:36,561][Main][INFO] - [train] Step 57300 out of 65536 | Loss --> 1.894 | Grad_l2 --> 0.279 | Weights_l2 --> 17809.746 | Lr --> 0.001 | Seconds_per_step --> 1.696 | +[2024-01-03 11:05:30,132][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00103-of-00512.json.gz +[2024-01-03 11:05:35,523][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00208-of-00512.json.gz +[2024-01-03 11:07:29,362][Main][INFO] - [train] Step 57400 out of 65536 | Loss --> 1.889 | Grad_l2 --> 0.278 | Weights_l2 --> 17809.834 | Lr --> 0.001 | Seconds_per_step --> 1.728 | +[2024-01-03 11:10:18,968][Main][INFO] - [train] Step 57500 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.279 | Weights_l2 --> 17809.900 | Lr --> 0.001 | Seconds_per_step --> 1.696 | +[2024-01-03 11:13:09,063][Main][INFO] - [train] Step 57600 out of 65536 | Loss --> 1.910 | Grad_l2 --> 0.279 | Weights_l2 --> 17809.962 | Lr --> 0.001 | Seconds_per_step --> 1.701 | +[2024-01-03 11:15:57,325][Main][INFO] - [train] Step 57700 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.026 | Lr --> 0.001 | Seconds_per_step --> 1.683 | +[2024-01-03 11:18:48,091][Main][INFO] - [train] Step 57800 out of 65536 | Loss --> 1.898 | Grad_l2 --> 0.287 | Weights_l2 --> 17810.089 | Lr --> 0.001 | Seconds_per_step --> 1.708 | +[2024-01-03 11:21:36,734][Main][INFO] - [train] Step 57900 out of 65536 | Loss --> 1.922 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.153 | Lr --> 0.001 | Seconds_per_step --> 1.686 | +[2024-01-03 11:24:24,907][Main][INFO] - [train] Step 58000 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.201 | Lr --> 0.001 | Seconds_per_step --> 1.682 | +[2024-01-03 11:25:45,135][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00498-of-00512.json.gz +[2024-01-03 11:25:46,341][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00282-of-00512.json.gz +[2024-01-03 11:27:15,200][Main][INFO] - [train] Step 58100 out of 65536 | Loss --> 1.900 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.245 | Lr --> 0.001 | Seconds_per_step --> 1.703 | +[2024-01-03 11:30:05,335][Main][INFO] - [train] Step 58200 out of 65536 | Loss --> 1.893 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.289 | Lr --> 0.001 | Seconds_per_step --> 1.701 | +[2024-01-03 11:32:56,437][Main][INFO] - [train] Step 58300 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.331 | Lr --> 0.001 | Seconds_per_step --> 1.711 | +[2024-01-03 11:35:43,398][Main][INFO] - [train] Step 58400 out of 65536 | Loss --> 1.881 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.369 | Lr --> 0.001 | Seconds_per_step --> 1.670 | +[2024-01-03 11:38:33,866][Main][INFO] - [train] Step 58500 out of 65536 | Loss --> 1.904 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.404 | Lr --> 0.001 | Seconds_per_step --> 1.705 | +[2024-01-03 11:41:24,447][Main][INFO] - [train] Step 58600 out of 65536 | Loss --> 1.907 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.439 | Lr --> 0.001 | Seconds_per_step --> 1.706 | +[2024-01-03 11:44:13,254][Main][INFO] - [train] Step 58700 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.474 | Lr --> 0.001 | Seconds_per_step --> 1.688 | +[2024-01-03 11:45:45,715][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00301-of-00512.json.gz +[2024-01-03 11:46:18,193][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00201-of-00512.json.gz +[2024-01-03 11:47:05,567][Main][INFO] - [train] Step 58800 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.493 | Lr --> 0.001 | Seconds_per_step --> 1.723 | +[2024-01-03 11:49:54,285][Main][INFO] - [train] Step 58900 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.523 | Lr --> 0.001 | Seconds_per_step --> 1.687 | +[2024-01-03 11:52:46,307][Main][INFO] - [train] Step 59000 out of 65536 | Loss --> 1.868 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.540 | Lr --> 0.001 | Seconds_per_step --> 1.720 | +[2024-01-03 11:55:34,450][Main][INFO] - [train] Step 59100 out of 65536 | Loss --> 1.888 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.564 | Lr --> 0.001 | Seconds_per_step --> 1.681 | +[2024-01-03 11:58:24,845][Main][INFO] - [train] Step 59200 out of 65536 | Loss --> 1.893 | Grad_l2 --> 0.282 | Weights_l2 --> 17810.584 | Lr --> 0.001 | Seconds_per_step --> 1.704 | +[2024-01-03 12:01:15,842][Main][INFO] - [train] Step 59300 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.598 | Lr --> 0.001 | Seconds_per_step --> 1.710 | +[2024-01-03 12:04:08,755][Main][INFO] - [train] Step 59400 out of 65536 | Loss --> 1.895 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.613 | Lr --> 0.001 | Seconds_per_step --> 1.729 | +[2024-01-03 12:06:00,519][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00420-of-00512.json.gz +[2024-01-03 12:06:19,674][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00159-of-00512.json.gz +[2024-01-03 12:07:04,236][Main][INFO] - [train] Step 59500 out of 65536 | Loss --> 1.886 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.628 | Lr --> 0.001 | Seconds_per_step --> 1.755 | +[2024-01-03 12:09:52,438][Main][INFO] - [train] Step 59600 out of 65536 | Loss --> 1.905 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.640 | Lr --> 0.001 | Seconds_per_step --> 1.682 | +[2024-01-03 12:12:42,133][Main][INFO] - [train] Step 59700 out of 65536 | Loss --> 1.880 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.644 | Lr --> 0.001 | Seconds_per_step --> 1.697 | +[2024-01-03 12:15:31,972][Main][INFO] - [train] Step 59800 out of 65536 | Loss --> 1.884 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.654 | Lr --> 0.001 | Seconds_per_step --> 1.698 | +[2024-01-03 12:18:20,211][Main][INFO] - [train] Step 59900 out of 65536 | Loss --> 1.895 | Grad_l2 --> 0.283 | Weights_l2 --> 17810.661 | Lr --> 0.001 | Seconds_per_step --> 1.682 | +[2024-01-03 12:21:13,723][Main][INFO] - [train] Step 60000 out of 65536 | Loss --> 1.898 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.665 | Lr --> 0.000 | Seconds_per_step --> 1.735 | +[2024-01-03 12:21:13,771][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-03 12:21:13,771][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-03 12:23:24,062][Main][INFO] - [eval] Step 60000 out of 65536 | Loss --> 1.922 | Accuracy --> 0.647 | Time --> 130.336 | +[2024-01-03 12:23:24,065][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-60000 +[2024-01-03 12:23:24,069][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-01-03 12:23:27,125][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-60000/model.safetensors +[2024-01-03 12:23:31,336][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-60000/optimizer.bin +[2024-01-03 12:23:31,337][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-60000/scheduler.bin +[2024-01-03 12:23:31,337][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-60000/sampler.bin +[2024-01-03 12:23:31,337][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-60000/sampler_1.bin +[2024-01-03 12:23:31,339][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-60000/random_states_0.pkl +[2024-01-03 12:26:22,312][Main][INFO] - [train] Step 60100 out of 65536 | Loss --> 1.905 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.670 | Lr --> 0.000 | Seconds_per_step --> 1.782 | +[2024-01-03 12:28:41,060][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00050-of-00512.json.gz +[2024-01-03 12:29:15,482][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00193-of-00512.json.gz +[2024-01-03 12:29:22,204][Main][INFO] - [train] Step 60200 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.677 | Lr --> 0.000 | Seconds_per_step --> 1.799 | +[2024-01-03 12:32:12,472][Main][INFO] - [train] Step 60300 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.679 | Lr --> 0.000 | Seconds_per_step --> 1.703 | +[2024-01-03 12:35:01,473][Main][INFO] - [train] Step 60400 out of 65536 | Loss --> 1.885 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.682 | Lr --> 0.000 | Seconds_per_step --> 1.690 | +[2024-01-03 12:37:53,655][Main][INFO] - [train] Step 60500 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.683 | Lr --> 0.000 | Seconds_per_step --> 1.722 | +[2024-01-03 12:40:43,615][Main][INFO] - [train] Step 60600 out of 65536 | Loss --> 1.896 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.679 | Lr --> 0.000 | Seconds_per_step --> 1.700 | +[2024-01-03 12:43:36,004][Main][INFO] - [train] Step 60700 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.680 | Lr --> 0.000 | Seconds_per_step --> 1.724 | +[2024-01-03 12:46:28,090][Main][INFO] - [train] Step 60800 out of 65536 | Loss --> 1.885 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.681 | Lr --> 0.000 | Seconds_per_step --> 1.721 | +[2024-01-03 12:49:18,432][Main][INFO] - [train] Step 60900 out of 65536 | Loss --> 1.873 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.677 | Lr --> 0.000 | Seconds_per_step --> 1.703 | +[2024-01-03 12:49:22,445][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00017-of-00512.json.gz +[2024-01-03 12:49:29,336][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00090-of-00512.json.gz +[2024-01-03 12:52:11,823][Main][INFO] - [train] Step 61000 out of 65536 | Loss --> 1.886 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.676 | Lr --> 0.000 | Seconds_per_step --> 1.734 | +[2024-01-03 12:55:00,485][Main][INFO] - [train] Step 61100 out of 65536 | Loss --> 1.896 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.670 | Lr --> 0.000 | Seconds_per_step --> 1.687 | +[2024-01-03 12:57:50,045][Main][INFO] - [train] Step 61200 out of 65536 | Loss --> 1.873 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.671 | Lr --> 0.000 | Seconds_per_step --> 1.696 | +[2024-01-03 13:00:41,726][Main][INFO] - [train] Step 61300 out of 65536 | Loss --> 1.875 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.662 | Lr --> 0.000 | Seconds_per_step --> 1.717 | +[2024-01-03 13:03:30,567][Main][INFO] - [train] Step 61400 out of 65536 | Loss --> 1.866 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.662 | Lr --> 0.000 | Seconds_per_step --> 1.688 | +[2024-01-03 13:07:09,967][Main][INFO] - [train] Step 61500 out of 65536 | Loss --> 1.872 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.657 | Lr --> 0.000 | Seconds_per_step --> 2.194 | +[2024-01-03 13:09:26,281][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00271-of-00512.json.gz +[2024-01-03 13:10:00,007][Main][INFO] - [train] Step 61600 out of 65536 | Loss --> 1.884 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.654 | Lr --> 0.000 | Seconds_per_step --> 1.700 | +[2024-01-03 13:10:35,628][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00402-of-00512.json.gz +[2024-01-03 13:12:48,268][Main][INFO] - [train] Step 61700 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.649 | Lr --> 0.000 | Seconds_per_step --> 1.683 | +[2024-01-03 13:15:38,728][Main][INFO] - [train] Step 61800 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.645 | Lr --> 0.000 | Seconds_per_step --> 1.704 | +[2024-01-03 13:18:32,609][Main][INFO] - [train] Step 61900 out of 65536 | Loss --> 1.879 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.639 | Lr --> 0.000 | Seconds_per_step --> 1.739 | +[2024-01-03 13:21:22,433][Main][INFO] - [train] Step 62000 out of 65536 | Loss --> 1.881 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.640 | Lr --> 0.000 | Seconds_per_step --> 1.698 | +[2024-01-03 13:24:13,244][Main][INFO] - [train] Step 62100 out of 65536 | Loss --> 1.888 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.633 | Lr --> 0.000 | Seconds_per_step --> 1.708 | +[2024-01-03 13:27:02,989][Main][INFO] - [train] Step 62200 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.628 | Lr --> 0.000 | Seconds_per_step --> 1.697 | +[2024-01-03 13:29:43,155][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00261-of-00512.json.gz +[2024-01-03 13:29:53,152][Main][INFO] - [train] Step 62300 out of 65536 | Loss --> 1.876 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.625 | Lr --> 0.000 | Seconds_per_step --> 1.702 | +[2024-01-03 13:31:14,753][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00459-of-00512.json.gz +[2024-01-03 13:32:48,177][Main][INFO] - [train] Step 62400 out of 65536 | Loss --> 1.894 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.621 | Lr --> 0.000 | Seconds_per_step --> 1.750 | +[2024-01-03 13:35:40,454][Main][INFO] - [train] Step 62500 out of 65536 | Loss --> 1.894 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.617 | Lr --> 0.000 | Seconds_per_step --> 1.723 | +[2024-01-03 13:38:30,070][Main][INFO] - [train] Step 62600 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.610 | Lr --> 0.000 | Seconds_per_step --> 1.696 | +[2024-01-03 13:41:18,539][Main][INFO] - [train] Step 62700 out of 65536 | Loss --> 1.884 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.608 | Lr --> 0.000 | Seconds_per_step --> 1.685 | +[2024-01-03 13:44:08,823][Main][INFO] - [train] Step 62800 out of 65536 | Loss --> 1.888 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.604 | Lr --> 0.000 | Seconds_per_step --> 1.703 | +[2024-01-03 13:46:57,439][Main][INFO] - [train] Step 62900 out of 65536 | Loss --> 1.895 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.601 | Lr --> 0.000 | Seconds_per_step --> 1.686 | +[2024-01-03 13:49:23,728][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00083-of-00512.json.gz +[2024-01-03 13:49:48,796][Main][INFO] - [train] Step 63000 out of 65536 | Loss --> 1.900 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.597 | Lr --> 0.000 | Seconds_per_step --> 1.714 | +[2024-01-03 13:50:58,701][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00111-of-00512.json.gz +[2024-01-03 13:52:39,016][Main][INFO] - [train] Step 63100 out of 65536 | Loss --> 1.908 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.594 | Lr --> 0.000 | Seconds_per_step --> 1.702 | +[2024-01-03 13:55:28,423][Main][INFO] - [train] Step 63200 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.592 | Lr --> 0.000 | Seconds_per_step --> 1.694 | +[2024-01-03 13:58:20,100][Main][INFO] - [train] Step 63300 out of 65536 | Loss --> 1.897 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.587 | Lr --> 0.000 | Seconds_per_step --> 1.717 | +[2024-01-03 14:01:08,081][Main][INFO] - [train] Step 63400 out of 65536 | Loss --> 1.896 | Grad_l2 --> 0.273 | Weights_l2 --> 17810.585 | Lr --> 0.000 | Seconds_per_step --> 1.680 | +[2024-01-03 14:04:01,026][Main][INFO] - [train] Step 63500 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.582 | Lr --> 0.000 | Seconds_per_step --> 1.729 | +[2024-01-03 14:06:54,379][Main][INFO] - [train] Step 63600 out of 65536 | Loss --> 1.870 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.579 | Lr --> 0.000 | Seconds_per_step --> 1.734 | +[2024-01-03 14:09:38,592][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00114-of-00512.json.gz +[2024-01-03 14:09:44,055][Main][INFO] - [train] Step 63700 out of 65536 | Loss --> 1.883 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.577 | Lr --> 0.000 | Seconds_per_step --> 1.697 | +[2024-01-03 14:11:18,021][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00078-of-00512.json.gz +[2024-01-03 14:12:34,997][Main][INFO] - [train] Step 63800 out of 65536 | Loss --> 1.889 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.574 | Lr --> 0.000 | Seconds_per_step --> 1.709 | +[2024-01-03 14:15:29,525][Main][INFO] - [train] Step 63900 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.570 | Lr --> 0.000 | Seconds_per_step --> 1.745 | +[2024-01-03 14:18:21,644][Main][INFO] - [train] Step 64000 out of 65536 | Loss --> 1.872 | Grad_l2 --> 0.273 | Weights_l2 --> 17810.569 | Lr --> 0.000 | Seconds_per_step --> 1.721 | +[2024-01-03 14:21:15,252][Main][INFO] - [train] Step 64100 out of 65536 | Loss --> 1.879 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.566 | Lr --> 0.000 | Seconds_per_step --> 1.736 | +[2024-01-03 14:24:04,635][Main][INFO] - [train] Step 64200 out of 65536 | Loss --> 1.865 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.566 | Lr --> 0.000 | Seconds_per_step --> 1.694 | +[2024-01-03 14:26:55,637][Main][INFO] - [train] Step 64300 out of 65536 | Loss --> 1.874 | Grad_l2 --> 0.288 | Weights_l2 --> 17810.564 | Lr --> 0.000 | Seconds_per_step --> 1.710 | +[2024-01-03 14:29:48,764][Main][INFO] - [train] Step 64400 out of 65536 | Loss --> 1.863 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.563 | Lr --> 0.000 | Seconds_per_step --> 1.731 | +[2024-01-03 14:30:35,636][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00414-of-00512.json.gz +[2024-01-03 14:31:58,777][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00024-of-00512.json.gz +[2024-01-03 14:33:14,044][Main][INFO] - [train] Step 64500 out of 65536 | Loss --> 1.874 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.562 | Lr --> 0.000 | Seconds_per_step --> 2.053 | +[2024-01-03 14:36:02,591][Main][INFO] - [train] Step 64600 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.561 | Lr --> 0.000 | Seconds_per_step --> 1.685 | +[2024-01-03 14:38:53,830][Main][INFO] - [train] Step 64700 out of 65536 | Loss --> 1.877 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.560 | Lr --> 0.000 | Seconds_per_step --> 1.712 | +[2024-01-03 14:41:41,988][Main][INFO] - [train] Step 64800 out of 65536 | Loss --> 1.883 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.559 | Lr --> 0.000 | Seconds_per_step --> 1.682 | +[2024-01-03 14:44:33,468][Main][INFO] - [train] Step 64900 out of 65536 | Loss --> 1.884 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.558 | Lr --> 0.000 | Seconds_per_step --> 1.715 | +[2024-01-03 14:47:23,816][Main][INFO] - [train] Step 65000 out of 65536 | Loss --> 1.886 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.559 | Lr --> 0.000 | Seconds_per_step --> 1.703 | +[2024-01-03 14:47:23,874][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-03 14:47:23,875][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-03 14:49:35,360][Main][INFO] - [eval] Step 65000 out of 65536 | Loss --> 1.915 | Accuracy --> 0.648 | Time --> 131.542 | +[2024-01-03 14:52:24,549][Main][INFO] - [train] Step 65100 out of 65536 | Loss --> 1.892 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.558 | Lr --> 0.000 | Seconds_per_step --> 1.692 | +[2024-01-03 14:53:07,539][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00432-of-00512.json.gz +[2024-01-03 14:54:40,328][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00100-of-00512.json.gz +[2024-01-03 14:55:17,473][Main][INFO] - [train] Step 65200 out of 65536 | Loss --> 1.897 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.557 | Lr --> 0.000 | Seconds_per_step --> 1.729 | +[2024-01-03 14:58:08,882][Main][INFO] - [train] Step 65300 out of 65536 | Loss --> 1.880 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.557 | Lr --> 0.000 | Seconds_per_step --> 1.714 | +[2024-01-03 15:00:58,309][Main][INFO] - [train] Step 65400 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.558 | Lr --> 0.000 | Seconds_per_step --> 1.694 | +[2024-01-03 15:03:47,325][Main][INFO] - [train] Step 65500 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.558 | Lr --> 0.000 | Seconds_per_step --> 1.690 | +[2024-01-03 15:04:48,926][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. +[2024-01-03 15:04:48,927][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz +[2024-01-03 15:07:00,390][Main][INFO] - [eval] Step 65537 out of 65536 | Loss --> 1.914 | Accuracy --> 0.648 | Time --> 131.512 | +[2024-01-03 15:07:00,394][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-65537 +[2024-01-03 15:07:00,397][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading +[2024-01-03 15:07:02,671][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-65537/model.safetensors +[2024-01-03 15:07:07,200][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-65537/optimizer.bin +[2024-01-03 15:07:07,201][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-65537/scheduler.bin +[2024-01-03 15:07:07,202][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-65537/sampler.bin +[2024-01-03 15:07:07,202][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-65537/sampler_1.bin +[2024-01-03 15:07:07,203][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-65537/random_states_0.pkl