[2024-01-02 07:29:30,393][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[2024-01-02 07:29:30,395][Main][INFO] - Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Mixed precision type: bf16
[2024-01-02 07:29:30,395][Main][INFO] - Working directory is /home/jovyan/nanoT5/logs/2024-01-02/07-29-30-
[2024-01-02 07:29:37,889][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00228-of-00512.json.gz
[2024-01-02 07:29:37,893][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00122-of-00512.json.gz
[2024-01-02 07:32:55,673][Main][INFO] - [train] Step 100 out of 65536 | Loss --> 56.793 | Grad_l2 --> 106.708 | Weights_l2 --> 9934.760 | Lr --> 0.010 | Seconds_per_step --> 1.985 |
[2024-01-02 07:35:45,962][Main][INFO] - [train] Step 200 out of 65536 | Loss --> 11.768 | Grad_l2 --> 10.600 | Weights_l2 --> 9933.323 | Lr --> 0.010 | Seconds_per_step --> 1.703 |
[2024-01-02 07:38:35,708][Main][INFO] - [train] Step 300 out of 65536 | Loss --> 8.268 | Grad_l2 --> 7.176 | Weights_l2 --> 9932.510 | Lr --> 0.010 | Seconds_per_step --> 1.697 |
[2024-01-02 07:41:23,498][Main][INFO] - [train] Step 400 out of 65536 | Loss --> 8.143 | Grad_l2 --> 54.690 | Weights_l2 --> 9932.596 | Lr --> 0.010 | Seconds_per_step --> 1.678 |
[2024-01-02 07:44:16,348][Main][INFO] - [train] Step 500 out of 65536 | Loss --> 6.963 | Grad_l2 --> 1.569 | Weights_l2 --> 9933.899 | Lr --> 0.011 | Seconds_per_step --> 1.728 |
[2024-01-02 07:45:06,122][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00274-of-00512.json.gz
[2024-01-02 07:45:35,466][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00185-of-00512.json.gz
[2024-01-02 07:47:07,477][Main][INFO] - [train] Step 600 out of 65536 | Loss --> 6.746 | Grad_l2 --> 1.401 | Weights_l2 --> 9935.268 | Lr --> 0.011 | Seconds_per_step --> 1.711 |
[2024-01-02 07:49:58,409][Main][INFO] - [train] Step 700 out of 65536 | Loss --> 6.592 | Grad_l2 --> 1.098 | Weights_l2 --> 9937.069 | Lr --> 0.011 | Seconds_per_step --> 1.709 |
[2024-01-02 07:52:46,792][Main][INFO] - [train] Step 800 out of 65536 | Loss --> 6.464 | Grad_l2 --> 1.049 | Weights_l2 --> 9939.345 | Lr --> 0.011 | Seconds_per_step --> 1.684 |
[2024-01-02 07:55:38,626][Main][INFO] - [train] Step 900 out of 65536 | Loss --> 6.357 | Grad_l2 --> 0.894 | Weights_l2 --> 9942.497 | Lr --> 0.011 | Seconds_per_step --> 1.718 |
[2024-01-02 07:58:29,981][Main][INFO] - [train] Step 1000 out of 65536 | Loss --> 6.257 | Grad_l2 --> 0.859 | Weights_l2 --> 9946.163 | Lr --> 0.011 | Seconds_per_step --> 1.714 |
[2024-01-02 08:01:17,269][Main][INFO] - [train] Step 1100 out of 65536 | Loss --> 6.184 | Grad_l2 --> 0.828 | Weights_l2 --> 9950.285 | Lr --> 0.011 | Seconds_per_step --> 1.673 |
[2024-01-02 08:04:06,727][Main][INFO] - [train] Step 1200 out of 65536 | Loss --> 6.094 | Grad_l2 --> 0.828 | Weights_l2 --> 9954.874 | Lr --> 0.011 | Seconds_per_step --> 1.695 |
[2024-01-02 08:05:30,527][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00430-of-00512.json.gz
[2024-01-02 08:05:43,438][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00437-of-00512.json.gz
[2024-01-02 08:06:59,680][Main][INFO] - [train] Step 1300 out of 65536 | Loss --> 6.032 | Grad_l2 --> 0.794 | Weights_l2 --> 9959.952 | Lr --> 0.011 | Seconds_per_step --> 1.730 |
[2024-01-02 08:09:50,806][Main][INFO] - [train] Step 1400 out of 65536 | Loss --> 5.947 | Grad_l2 --> 0.731 | Weights_l2 --> 9965.442 | Lr --> 0.011 | Seconds_per_step --> 1.711 |
[2024-01-02 08:12:39,958][Main][INFO] - [train] Step 1500 out of 65536 | Loss --> 5.881 | Grad_l2 --> 0.723 | Weights_l2 --> 9971.498 | Lr --> 0.012 | Seconds_per_step --> 1.692 |
[2024-01-02 08:15:28,523][Main][INFO] - [train] Step 1600 out of 65536 | Loss --> 5.838 | Grad_l2 --> 0.712 | Weights_l2 --> 9977.942 | Lr --> 0.012 | Seconds_per_step --> 1.686 |
[2024-01-02 08:18:21,350][Main][INFO] - [train] Step 1700 out of 65536 | Loss --> 5.781 | Grad_l2 --> 0.673 | Weights_l2 --> 9984.949 | Lr --> 0.012 | Seconds_per_step --> 1.728 |
[2024-01-02 08:21:10,863][Main][INFO] - [train] Step 1800 out of 65536 | Loss --> 5.738 | Grad_l2 --> 0.648 | Weights_l2 --> 9992.259 | Lr --> 0.012 | Seconds_per_step --> 1.695 |
[2024-01-02 08:23:59,970][Main][INFO] - [train] Step 1900 out of 65536 | Loss --> 5.695 | Grad_l2 --> 0.614 | Weights_l2 --> 9999.916 | Lr --> 0.012 | Seconds_per_step --> 1.691 |
[2024-01-02 08:25:36,522][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00113-of-00512.json.gz
[2024-01-02 08:25:43,294][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00087-of-00512.json.gz
[2024-01-02 08:26:52,451][Main][INFO] - [train] Step 2000 out of 65536 | Loss --> 5.631 | Grad_l2 --> 0.601 | Weights_l2 --> 10008.046 | Lr --> 0.012 | Seconds_per_step --> 1.725 |
[2024-01-02 08:29:40,070][Main][INFO] - [train] Step 2100 out of 65536 | Loss --> 5.600 | Grad_l2 --> 0.597 | Weights_l2 --> 10016.459 | Lr --> 0.012 | Seconds_per_step --> 1.676 |
[2024-01-02 08:32:31,002][Main][INFO] - [train] Step 2200 out of 65536 | Loss --> 5.535 | Grad_l2 --> 0.584 | Weights_l2 --> 10025.548 | Lr --> 0.012 | Seconds_per_step --> 1.709 |
[2024-01-02 08:35:18,394][Main][INFO] - [train] Step 2300 out of 65536 | Loss --> 5.398 | Grad_l2 --> 0.600 | Weights_l2 --> 10035.698 | Lr --> 0.012 | Seconds_per_step --> 1.674 |
[2024-01-02 08:38:06,607][Main][INFO] - [train] Step 2400 out of 65536 | Loss --> 5.176 | Grad_l2 --> 0.599 | Weights_l2 --> 10047.897 | Lr --> 0.012 | Seconds_per_step --> 1.682 |
[2024-01-02 08:40:57,015][Main][INFO] - [train] Step 2500 out of 65536 | Loss --> 5.037 | Grad_l2 --> 0.615 | Weights_l2 --> 10061.134 | Lr --> 0.013 | Seconds_per_step --> 1.704 |
[2024-01-02 08:43:48,930][Main][INFO] - [train] Step 2600 out of 65536 | Loss --> 4.849 | Grad_l2 --> 0.589 | Weights_l2 --> 10075.957 | Lr --> 0.013 | Seconds_per_step --> 1.719 |
[2024-01-02 08:45:27,146][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00389-of-00512.json.gz
[2024-01-02 08:45:53,717][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00483-of-00512.json.gz
[2024-01-02 08:46:41,027][Main][INFO] - [train] Step 2700 out of 65536 | Loss --> 4.661 | Grad_l2 --> 0.582 | Weights_l2 --> 10092.508 | Lr --> 0.013 | Seconds_per_step --> 1.721 |
[2024-01-02 08:49:34,467][Main][INFO] - [train] Step 2800 out of 65536 | Loss --> 4.526 | Grad_l2 --> 0.579 | Weights_l2 --> 10109.752 | Lr --> 0.013 | Seconds_per_step --> 1.734 |
[2024-01-02 08:52:22,299][Main][INFO] - [train] Step 2900 out of 65536 | Loss --> 4.396 | Grad_l2 --> 0.546 | Weights_l2 --> 10127.107 | Lr --> 0.013 | Seconds_per_step --> 1.678 |
[2024-01-02 08:55:17,616][Main][INFO] - [train] Step 3000 out of 65536 | Loss --> 4.290 | Grad_l2 --> 0.565 | Weights_l2 --> 10144.398 | Lr --> 0.013 | Seconds_per_step --> 1.753 |
[2024-01-02 08:58:05,967][Main][INFO] - [train] Step 3100 out of 65536 | Loss --> 4.208 | Grad_l2 --> 0.546 | Weights_l2 --> 10161.719 | Lr --> 0.013 | Seconds_per_step --> 1.684 |
[2024-01-02 09:00:59,521][Main][INFO] - [train] Step 3200 out of 65536 | Loss --> 4.134 | Grad_l2 --> 0.559 | Weights_l2 --> 10179.034 | Lr --> 0.013 | Seconds_per_step --> 1.736 |
[2024-01-02 09:03:48,488][Main][INFO] - [train] Step 3300 out of 65536 | Loss --> 4.060 | Grad_l2 --> 0.535 | Weights_l2 --> 10196.335 | Lr --> 0.013 | Seconds_per_step --> 1.690 |
[2024-01-02 09:06:06,162][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00174-of-00512.json.gz
[2024-01-02 09:06:26,824][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00440-of-00512.json.gz
[2024-01-02 09:06:43,296][Main][INFO] - [train] Step 3400 out of 65536 | Loss --> 3.970 | Grad_l2 --> 0.530 | Weights_l2 --> 10213.569 | Lr --> 0.013 | Seconds_per_step --> 1.748 |
[2024-01-02 09:09:32,439][Main][INFO] - [train] Step 3500 out of 65536 | Loss --> 3.919 | Grad_l2 --> 0.525 | Weights_l2 --> 10230.917 | Lr --> 0.014 | Seconds_per_step --> 1.691 |
[2024-01-02 09:12:23,187][Main][INFO] - [train] Step 3600 out of 65536 | Loss --> 3.882 | Grad_l2 --> 0.522 | Weights_l2 --> 10248.395 | Lr --> 0.014 | Seconds_per_step --> 1.707 |
[2024-01-02 09:15:13,458][Main][INFO] - [train] Step 3700 out of 65536 | Loss --> 3.835 | Grad_l2 --> 0.532 | Weights_l2 --> 10266.030 | Lr --> 0.014 | Seconds_per_step --> 1.703 |
[2024-01-02 09:18:01,193][Main][INFO] - [train] Step 3800 out of 65536 | Loss --> 3.798 | Grad_l2 --> 0.514 | Weights_l2 --> 10283.571 | Lr --> 0.014 | Seconds_per_step --> 1.677 |
[2024-01-02 09:20:49,325][Main][INFO] - [train] Step 3900 out of 65536 | Loss --> 3.769 | Grad_l2 --> 0.657 | Weights_l2 --> 10300.982 | Lr --> 0.014 | Seconds_per_step --> 1.681 |
[2024-01-02 09:23:39,240][Main][INFO] - [train] Step 4000 out of 65536 | Loss --> 3.709 | Grad_l2 --> 0.521 | Weights_l2 --> 10318.798 | Lr --> 0.014 | Seconds_per_step --> 1.699 |
[2024-01-02 09:26:01,799][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00133-of-00512.json.gz
[2024-01-02 09:26:10,569][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00364-of-00512.json.gz
[2024-01-02 09:26:33,661][Main][INFO] - [train] Step 4100 out of 65536 | Loss --> 3.669 | Grad_l2 --> 0.521 | Weights_l2 --> 10336.524 | Lr --> 0.014 | Seconds_per_step --> 1.744 |
[2024-01-02 09:29:21,760][Main][INFO] - [train] Step 4200 out of 65536 | Loss --> 3.621 | Grad_l2 --> 0.500 | Weights_l2 --> 10354.133 | Lr --> 0.014 | Seconds_per_step --> 1.681 |
[2024-01-02 09:32:09,396][Main][INFO] - [train] Step 4300 out of 65536 | Loss --> 3.582 | Grad_l2 --> 0.500 | Weights_l2 --> 10371.901 | Lr --> 0.014 | Seconds_per_step --> 1.676 |
[2024-01-02 09:35:01,777][Main][INFO] - [train] Step 4400 out of 65536 | Loss --> 3.561 | Grad_l2 --> 0.493 | Weights_l2 --> 10389.639 | Lr --> 0.014 | Seconds_per_step --> 1.724 |
[2024-01-02 09:37:50,814][Main][INFO] - [train] Step 4500 out of 65536 | Loss --> 3.526 | Grad_l2 --> 0.486 | Weights_l2 --> 10407.441 | Lr --> 0.015 | Seconds_per_step --> 1.690 |
[2024-01-02 09:40:39,274][Main][INFO] - [train] Step 4600 out of 65536 | Loss --> 3.497 | Grad_l2 --> 0.494 | Weights_l2 --> 10425.366 | Lr --> 0.015 | Seconds_per_step --> 1.685 |
[2024-01-02 09:43:29,238][Main][INFO] - [train] Step 4700 out of 65536 | Loss --> 3.462 | Grad_l2 --> 0.468 | Weights_l2 --> 10443.308 | Lr --> 0.015 | Seconds_per_step --> 1.700 |
[2024-01-02 09:46:04,669][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00482-of-00512.json.gz
[2024-01-02 09:46:20,505][Main][INFO] - [train] Step 4800 out of 65536 | Loss --> 3.441 | Grad_l2 --> 0.478 | Weights_l2 --> 10461.273 | Lr --> 0.015 | Seconds_per_step --> 1.713 |
[2024-01-02 09:46:23,790][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00177-of-00512.json.gz
[2024-01-02 09:49:11,581][Main][INFO] - [train] Step 4900 out of 65536 | Loss --> 3.430 | Grad_l2 --> 0.460 | Weights_l2 --> 10479.178 | Lr --> 0.015 | Seconds_per_step --> 1.711 |
[2024-01-02 09:52:00,911][Main][INFO] - [train] Step 5000 out of 65536 | Loss --> 3.393 | Grad_l2 --> 0.472 | Weights_l2 --> 10497.438 | Lr --> 0.015 | Seconds_per_step --> 1.693 |
[2024-01-02 09:52:00,967][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers.
[2024-01-02 09:52:00,968][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz
[2024-01-02 09:54:12,141][Main][INFO] - [eval] Step 5000 out of 65536 | Loss --> 3.462 | Accuracy --> 0.473 | Time --> 131.226 |
[2024-01-02 09:57:07,848][Main][INFO] - [train] Step 5100 out of 65536 | Loss --> 3.369 | Grad_l2 --> 0.478 | Weights_l2 --> 10515.939 | Lr --> 0.015 | Seconds_per_step --> 1.757 |
[2024-01-02 10:00:01,320][Main][INFO] - [train] Step 5200 out of 65536 | Loss --> 3.358 | Grad_l2 --> 0.457 | Weights_l2 --> 10534.272 | Lr --> 0.015 | Seconds_per_step --> 1.735 |
[2024-01-02 10:02:49,171][Main][INFO] - [train] Step 5300 out of 65536 | Loss --> 3.328 | Grad_l2 --> 0.458 | Weights_l2 --> 10552.935 | Lr --> 0.015 | Seconds_per_step --> 1.678 |
[2024-01-02 10:05:39,781][Main][INFO] - [train] Step 5400 out of 65536 | Loss --> 3.299 | Grad_l2 --> 0.462 | Weights_l2 --> 10571.676 | Lr --> 0.015 | Seconds_per_step --> 1.706 |
[2024-01-02 10:08:29,241][Main][INFO] - [train] Step 5500 out of 65536 | Loss --> 3.290 | Grad_l2 --> 0.448 | Weights_l2 --> 10590.597 | Lr --> 0.016 | Seconds_per_step --> 1.695 |
[2024-01-02 10:09:11,204][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00005-of-00512.json.gz
[2024-01-02 10:09:23,048][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00279-of-00512.json.gz
[2024-01-02 10:11:22,404][Main][INFO] - [train] Step 5600 out of 65536 | Loss --> 3.283 | Grad_l2 --> 0.447 | Weights_l2 --> 10609.454 | Lr --> 0.016 | Seconds_per_step --> 1.732 |
[2024-01-02 10:14:11,041][Main][INFO] - [train] Step 5700 out of 65536 | Loss --> 3.261 | Grad_l2 --> 0.438 | Weights_l2 --> 10628.584 | Lr --> 0.016 | Seconds_per_step --> 1.686 |
[2024-01-02 10:17:05,214][Main][INFO] - [train] Step 5800 out of 65536 | Loss --> 3.264 | Grad_l2 --> 0.441 | Weights_l2 --> 10647.740 | Lr --> 0.016 | Seconds_per_step --> 1.742 |
[2024-01-02 10:20:03,650][Main][INFO] - [train] Step 5900 out of 65536 | Loss --> 3.247 | Grad_l2 --> 0.433 | Weights_l2 --> 10667.126 | Lr --> 0.016 | Seconds_per_step --> 1.784 |
[2024-01-02 10:22:52,332][Main][INFO] - [train] Step 6000 out of 65536 | Loss --> 3.231 | Grad_l2 --> 0.444 | Weights_l2 --> 10686.671 | Lr --> 0.016 | Seconds_per_step --> 1.687 |
[2024-01-02 10:25:42,400][Main][INFO] - [train] Step 6100 out of 65536 | Loss --> 3.222 | Grad_l2 --> 0.429 | Weights_l2 --> 10706.525 | Lr --> 0.016 | Seconds_per_step --> 1.701 |
[2024-01-02 10:28:31,589][Main][INFO] - [train] Step 6200 out of 65536 | Loss --> 3.200 | Grad_l2 --> 0.429 | Weights_l2 --> 10726.433 | Lr --> 0.016 | Seconds_per_step --> 1.692 |
[2024-01-02 10:28:51,953][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00281-of-00512.json.gz
[2024-01-02 10:29:16,283][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00139-of-00512.json.gz
[2024-01-02 10:31:22,732][Main][INFO] - [train] Step 6300 out of 65536 | Loss --> 3.207 | Grad_l2 --> 0.421 | Weights_l2 --> 10746.566 | Lr --> 0.016 | Seconds_per_step --> 1.711 |
[2024-01-02 10:34:12,886][Main][INFO] - [train] Step 6400 out of 65536 | Loss --> 3.182 | Grad_l2 --> 0.438 | Weights_l2 --> 10767.034 | Lr --> 0.016 | Seconds_per_step --> 1.702 |
[2024-01-02 10:37:02,088][Main][INFO] - [train] Step 6500 out of 65536 | Loss --> 3.196 | Grad_l2 --> 0.554 | Weights_l2 --> 10788.199 | Lr --> 0.017 | Seconds_per_step --> 1.692 |
[2024-01-02 10:39:50,971][Main][INFO] - [train] Step 6600 out of 65536 | Loss --> 3.179 | Grad_l2 --> 0.609 | Weights_l2 --> 10809.654 | Lr --> 0.017 | Seconds_per_step --> 1.689 |
[2024-01-02 10:42:40,192][Main][INFO] - [train] Step 6700 out of 65536 | Loss --> 3.176 | Grad_l2 --> 0.462 | Weights_l2 --> 10831.206 | Lr --> 0.017 | Seconds_per_step --> 1.692 |
[2024-01-02 10:45:30,678][Main][INFO] - [train] Step 6800 out of 65536 | Loss --> 3.143 | Grad_l2 --> 0.411 | Weights_l2 --> 10852.539 | Lr --> 0.017 | Seconds_per_step --> 1.705 |
[2024-01-02 10:48:20,529][Main][INFO] - [train] Step 6900 out of 65536 | Loss --> 3.107 | Grad_l2 --> 0.405 | Weights_l2 --> 10873.854 | Lr --> 0.017 | Seconds_per_step --> 1.699 |
[2024-01-02 10:49:17,778][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00407-of-00512.json.gz
[2024-01-02 10:49:48,125][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00058-of-00512.json.gz
[2024-01-02 10:51:14,139][Main][INFO] - [train] Step 7000 out of 65536 | Loss --> 3.095 | Grad_l2 --> 0.391 | Weights_l2 --> 10895.238 | Lr --> 0.017 | Seconds_per_step --> 1.736 |
[2024-01-02 10:54:03,555][Main][INFO] - [train] Step 7100 out of 65536 | Loss --> 3.087 | Grad_l2 --> 0.406 | Weights_l2 --> 10916.873 | Lr --> 0.017 | Seconds_per_step --> 1.694 |
[2024-01-02 10:56:52,316][Main][INFO] - [train] Step 7200 out of 65536 | Loss --> 3.078 | Grad_l2 --> 0.389 | Weights_l2 --> 10938.506 | Lr --> 0.017 | Seconds_per_step --> 1.688 |
[2024-01-02 10:59:41,908][Main][INFO] - [train] Step 7300 out of 65536 | Loss --> 3.074 | Grad_l2 --> 0.389 | Weights_l2 --> 10960.402 | Lr --> 0.017 | Seconds_per_step --> 1.696 |
[2024-01-02 11:02:32,725][Main][INFO] - [train] Step 7400 out of 65536 | Loss --> 3.059 | Grad_l2 --> 0.392 | Weights_l2 --> 10982.456 | Lr --> 0.017 | Seconds_per_step --> 1.708 |
[2024-01-02 11:05:21,789][Main][INFO] - [train] Step 7500 out of 65536 | Loss --> 3.048 | Grad_l2 --> 0.388 | Weights_l2 --> 11005.019 | Lr --> 0.018 | Seconds_per_step --> 1.690 |
[2024-01-02 11:08:11,208][Main][INFO] - [train] Step 7600 out of 65536 | Loss --> 3.044 | Grad_l2 --> 0.373 | Weights_l2 --> 11027.512 | Lr --> 0.018 | Seconds_per_step --> 1.694 |
[2024-01-02 11:09:07,989][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00020-of-00512.json.gz
[2024-01-02 11:09:26,816][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00182-of-00512.json.gz
[2024-01-02 11:11:01,891][Main][INFO] - [train] Step 7700 out of 65536 | Loss --> 3.035 | Grad_l2 --> 0.377 | Weights_l2 --> 11050.158 | Lr --> 0.018 | Seconds_per_step --> 1.707 |
[2024-01-02 11:13:50,790][Main][INFO] - [train] Step 7800 out of 65536 | Loss --> 3.027 | Grad_l2 --> 0.373 | Weights_l2 --> 11073.020 | Lr --> 0.018 | Seconds_per_step --> 1.689 |
[2024-01-02 11:16:40,206][Main][INFO] - [train] Step 7900 out of 65536 | Loss --> 2.995 | Grad_l2 --> 0.367 | Weights_l2 --> 11096.362 | Lr --> 0.018 | Seconds_per_step --> 1.694 |
[2024-01-02 11:19:28,364][Main][INFO] - [train] Step 8000 out of 65536 | Loss --> 2.985 | Grad_l2 --> 0.360 | Weights_l2 --> 11119.787 | Lr --> 0.018 | Seconds_per_step --> 1.682 |
[2024-01-02 11:22:18,250][Main][INFO] - [train] Step 8100 out of 65536 | Loss --> 2.994 | Grad_l2 --> 0.361 | Weights_l2 --> 11143.291 | Lr --> 0.018 | Seconds_per_step --> 1.699 |
[2024-01-02 11:25:06,896][Main][INFO] - [train] Step 8200 out of 65536 | Loss --> 2.983 | Grad_l2 --> 0.360 | Weights_l2 --> 11167.133 | Lr --> 0.018 | Seconds_per_step --> 1.686 |
[2024-01-02 11:27:53,897][Main][INFO] - [train] Step 8300 out of 65536 | Loss --> 2.998 | Grad_l2 --> 0.466 | Weights_l2 --> 11191.464 | Lr --> 0.018 | Seconds_per_step --> 1.670 |
[2024-01-02 11:29:06,279][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00321-of-00512.json.gz
[2024-01-02 11:29:39,792][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00323-of-00512.json.gz
[2024-01-02 11:30:43,928][Main][INFO] - [train] Step 8400 out of 65536 | Loss --> 3.009 | Grad_l2 --> 0.563 | Weights_l2 --> 11216.839 | Lr --> 0.018 | Seconds_per_step --> 1.700 |
[2024-01-02 11:33:38,321][Main][INFO] - [train] Step 8500 out of 65536 | Loss --> 2.972 | Grad_l2 --> 0.352 | Weights_l2 --> 11241.715 | Lr --> 0.019 | Seconds_per_step --> 1.744 |
[2024-01-02 11:36:31,514][Main][INFO] - [train] Step 8600 out of 65536 | Loss --> 2.965 | Grad_l2 --> 0.360 | Weights_l2 --> 11266.211 | Lr --> 0.019 | Seconds_per_step --> 1.732 |
[2024-01-02 11:39:29,291][Main][INFO] - [train] Step 8700 out of 65536 | Loss --> 2.965 | Grad_l2 --> 0.384 | Weights_l2 --> 11291.393 | Lr --> 0.019 | Seconds_per_step --> 1.778 |
[2024-01-02 11:42:19,746][Main][INFO] - [train] Step 8800 out of 65536 | Loss --> 2.973 | Grad_l2 --> 0.359 | Weights_l2 --> 11316.864 | Lr --> 0.019 | Seconds_per_step --> 1.705 |
[2024-01-02 11:45:08,757][Main][INFO] - [train] Step 8900 out of 65536 | Loss --> 2.961 | Grad_l2 --> 0.344 | Weights_l2 --> 11342.222 | Lr --> 0.019 | Seconds_per_step --> 1.690 |
[2024-01-02 11:47:57,776][Main][INFO] - [train] Step 9000 out of 65536 | Loss --> 2.952 | Grad_l2 --> 0.353 | Weights_l2 --> 11367.698 | Lr --> 0.019 | Seconds_per_step --> 1.690 |
[2024-01-02 11:49:26,201][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00217-of-00512.json.gz
[2024-01-02 11:50:16,445][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00511-of-00512.json.gz
[2024-01-02 11:50:49,398][Main][INFO] - [train] Step 9100 out of 65536 | Loss --> 2.924 | Grad_l2 --> 0.334 | Weights_l2 --> 11393.301 | Lr --> 0.019 | Seconds_per_step --> 1.716 |
[2024-01-02 11:53:40,378][Main][INFO] - [train] Step 9200 out of 65536 | Loss --> 2.928 | Grad_l2 --> 0.339 | Weights_l2 --> 11419.266 | Lr --> 0.019 | Seconds_per_step --> 1.710 |
[2024-01-02 11:56:29,788][Main][INFO] - [train] Step 9300 out of 65536 | Loss --> 2.932 | Grad_l2 --> 0.341 | Weights_l2 --> 11445.354 | Lr --> 0.019 | Seconds_per_step --> 1.694 |
[2024-01-02 11:59:22,581][Main][INFO] - [train] Step 9400 out of 65536 | Loss --> 2.909 | Grad_l2 --> 0.337 | Weights_l2 --> 11471.712 | Lr --> 0.019 | Seconds_per_step --> 1.728 |
[2024-01-02 12:02:11,908][Main][INFO] - [train] Step 9500 out of 65536 | Loss --> 2.911 | Grad_l2 --> 0.328 | Weights_l2 --> 11498.399 | Lr --> 0.020 | Seconds_per_step --> 1.693 |
[2024-01-02 12:05:02,332][Main][INFO] - [train] Step 9600 out of 65536 | Loss --> 2.905 | Grad_l2 --> 0.323 | Weights_l2 --> 11525.278 | Lr --> 0.020 | Seconds_per_step --> 1.704 |
[2024-01-02 12:07:51,872][Main][INFO] - [train] Step 9700 out of 65536 | Loss --> 2.902 | Grad_l2 --> 0.325 | Weights_l2 --> 11552.402 | Lr --> 0.020 | Seconds_per_step --> 1.695 |
[2024-01-02 12:09:20,031][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00277-of-00512.json.gz
[2024-01-02 12:10:10,095][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00142-of-00512.json.gz
[2024-01-02 12:10:44,454][Main][INFO] - [train] Step 9800 out of 65536 | Loss --> 2.893 | Grad_l2 --> 0.318 | Weights_l2 --> 11579.686 | Lr --> 0.020 | Seconds_per_step --> 1.726 |
[2024-01-02 12:13:35,163][Main][INFO] - [train] Step 9900 out of 65536 | Loss --> 2.881 | Grad_l2 --> 0.314 | Weights_l2 --> 11607.262 | Lr --> 0.020 | Seconds_per_step --> 1.707 |
[2024-01-02 12:16:23,928][Main][INFO] - [train] Step 10000 out of 65536 | Loss --> 2.882 | Grad_l2 --> 0.317 | Weights_l2 --> 11634.963 | Lr --> 0.020 | Seconds_per_step --> 1.688 |
[2024-01-02 12:16:23,976][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers.
[2024-01-02 12:16:23,976][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz
[2024-01-02 12:18:35,496][Main][INFO] - [eval] Step 10000 out of 65536 | Loss --> 2.918 | Accuracy --> 0.526 | Time --> 131.565 |
[2024-01-02 12:21:26,879][Main][INFO] - [train] Step 10100 out of 65536 | Loss --> 2.862 | Grad_l2 --> 0.322 | Weights_l2 --> 11662.898 | Lr --> 0.020 | Seconds_per_step --> 1.714 |
[2024-01-02 12:24:15,728][Main][INFO] - [train] Step 10200 out of 65536 | Loss --> 2.873 | Grad_l2 --> 0.316 | Weights_l2 --> 11690.667 | Lr --> 0.020 | Seconds_per_step --> 1.688 |
[2024-01-02 12:27:14,037][Main][INFO] - [train] Step 10300 out of 65536 | Loss --> 2.850 | Grad_l2 --> 0.308 | Weights_l2 --> 11718.467 | Lr --> 0.020 | Seconds_per_step --> 1.783 |
[2024-01-02 12:30:01,949][Main][INFO] - [train] Step 10400 out of 65536 | Loss --> 2.860 | Grad_l2 --> 0.315 | Weights_l2 --> 11746.420 | Lr --> 0.020 | Seconds_per_step --> 1.679 |
[2024-01-02 12:31:53,885][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00491-of-00512.json.gz
[2024-01-02 12:32:51,503][Main][INFO] - [train] Step 10500 out of 65536 | Loss --> 2.859 | Grad_l2 --> 0.301 | Weights_l2 --> 11774.193 | Lr --> 0.020 | Seconds_per_step --> 1.696 |
[2024-01-02 12:32:57,174][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00439-of-00512.json.gz
[2024-01-02 12:35:42,573][Main][INFO] - [train] Step 10600 out of 65536 | Loss --> 2.853 | Grad_l2 --> 0.316 | Weights_l2 --> 11802.011 | Lr --> 0.020 | Seconds_per_step --> 1.711 |
[2024-01-02 12:38:32,566][Main][INFO] - [train] Step 10700 out of 65536 | Loss --> 2.840 | Grad_l2 --> 0.307 | Weights_l2 --> 11829.644 | Lr --> 0.020 | Seconds_per_step --> 1.700 |
[2024-01-02 12:41:22,684][Main][INFO] - [train] Step 10800 out of 65536 | Loss --> 2.837 | Grad_l2 --> 0.304 | Weights_l2 --> 11857.178 | Lr --> 0.020 | Seconds_per_step --> 1.701 |
[2024-01-02 12:44:10,983][Main][INFO] - [train] Step 10900 out of 65536 | Loss --> 2.812 | Grad_l2 --> 0.310 | Weights_l2 --> 11884.734 | Lr --> 0.020 | Seconds_per_step --> 1.683 |
[2024-01-02 12:46:59,989][Main][INFO] - [train] Step 11000 out of 65536 | Loss --> 2.818 | Grad_l2 --> 0.301 | Weights_l2 --> 11912.166 | Lr --> 0.020 | Seconds_per_step --> 1.690 |
[2024-01-02 12:49:48,092][Main][INFO] - [train] Step 11100 out of 65536 | Loss --> 2.774 | Grad_l2 --> 0.296 | Weights_l2 --> 11939.586 | Lr --> 0.020 | Seconds_per_step --> 1.681 |
[2024-01-02 12:51:56,755][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00224-of-00512.json.gz
[2024-01-02 12:52:42,224][Main][INFO] - [train] Step 11200 out of 65536 | Loss --> 2.792 | Grad_l2 --> 0.302 | Weights_l2 --> 11967.110 | Lr --> 0.020 | Seconds_per_step --> 1.741 |
[2024-01-02 12:53:26,814][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00119-of-00512.json.gz
[2024-01-02 12:55:32,300][Main][INFO] - [train] Step 11300 out of 65536 | Loss --> 2.784 | Grad_l2 --> 0.301 | Weights_l2 --> 11994.543 | Lr --> 0.020 | Seconds_per_step --> 1.701 |
[2024-01-02 12:58:21,753][Main][INFO] - [train] Step 11400 out of 65536 | Loss --> 2.792 | Grad_l2 --> 0.300 | Weights_l2 --> 12022.087 | Lr --> 0.020 | Seconds_per_step --> 1.695 |
[2024-01-02 13:01:09,907][Main][INFO] - [train] Step 11500 out of 65536 | Loss --> 2.808 | Grad_l2 --> 0.309 | Weights_l2 --> 12049.762 | Lr --> 0.020 | Seconds_per_step --> 1.682 |
[2024-01-02 13:04:00,541][Main][INFO] - [train] Step 11600 out of 65536 | Loss --> 2.776 | Grad_l2 --> 0.283 | Weights_l2 --> 12077.197 | Lr --> 0.020 | Seconds_per_step --> 1.706 |
[2024-01-02 13:06:50,871][Main][INFO] - [train] Step 11700 out of 65536 | Loss --> 2.756 | Grad_l2 --> 0.293 | Weights_l2 --> 12104.389 | Lr --> 0.020 | Seconds_per_step --> 1.703 |
[2024-01-02 13:09:39,441][Main][INFO] - [train] Step 11800 out of 65536 | Loss --> 2.770 | Grad_l2 --> 0.288 | Weights_l2 --> 12131.692 | Lr --> 0.020 | Seconds_per_step --> 1.686 |
[2024-01-02 13:11:26,109][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00333-of-00512.json.gz
[2024-01-02 13:12:30,746][Main][INFO] - [train] Step 11900 out of 65536 | Loss --> 2.758 | Grad_l2 --> 0.280 | Weights_l2 --> 12158.996 | Lr --> 0.020 | Seconds_per_step --> 1.713 |
[2024-01-02 13:12:59,159][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00348-of-00512.json.gz
[2024-01-02 13:15:26,476][Main][INFO] - [train] Step 12000 out of 65536 | Loss --> 2.747 | Grad_l2 --> 0.277 | Weights_l2 --> 12186.245 | Lr --> 0.020 | Seconds_per_step --> 1.757 |
[2024-01-02 13:18:15,914][Main][INFO] - [train] Step 12100
out of 65536 | Loss --> 2.754 | Grad_l2 --> 0.280 | Weights_l2 --> 12213.318 | Lr --> 0.020 | Seconds_per_step --> 1.694 | [2024-01-02 13:21:05,540][Main][INFO] - [train] Step 12200 out of 65536 | Loss --> 2.783 | Grad_l2 --> 0.475 | Weights_l2 --> 12242.980 | Lr --> 0.020 | Seconds_per_step --> 1.696 | [2024-01-02 13:23:57,573][Main][INFO] - [train] Step 12300 out of 65536 | Loss --> 2.835 | Grad_l2 --> 0.455 | Weights_l2 --> 12275.630 | Lr --> 0.020 | Seconds_per_step --> 1.720 | [2024-01-02 13:26:51,637][Main][INFO] - [train] Step 12400 out of 65536 | Loss --> 2.767 | Grad_l2 --> 0.316 | Weights_l2 --> 12303.800 | Lr --> 0.020 | Seconds_per_step --> 1.741 | [2024-01-02 13:29:40,588][Main][INFO] - [train] Step 12500 out of 65536 | Loss --> 2.752 | Grad_l2 --> 0.285 | Weights_l2 --> 12331.423 | Lr --> 0.020 | Seconds_per_step --> 1.690 | [2024-01-02 13:32:16,990][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00335-of-00512.json.gz [2024-01-02 13:32:33,602][Main][INFO] - [train] Step 12600 out of 65536 | Loss --> 2.727 | Grad_l2 --> 0.300 | Weights_l2 --> 12358.503 | Lr --> 0.020 | Seconds_per_step --> 1.730 | [2024-01-02 13:33:37,256][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00001-of-00512.json.gz [2024-01-02 13:35:23,429][Main][INFO] - [train] Step 12700 out of 65536 | Loss --> 2.717 | Grad_l2 --> 0.290 | Weights_l2 --> 12385.276 | Lr --> 0.020 | Seconds_per_step --> 1.698 | [2024-01-02 13:38:12,912][Main][INFO] - [train] Step 12800 out of 65536 | Loss --> 2.734 | Grad_l2 --> 0.276 | Weights_l2 --> 12412.097 | Lr --> 0.020 | Seconds_per_step --> 
1.695 | [2024-01-02 13:41:01,744][Main][INFO] - [train] Step 12900 out of 65536 | Loss --> 2.732 | Grad_l2 --> 0.288 | Weights_l2 --> 12438.755 | Lr --> 0.020 | Seconds_per_step --> 1.688 | [2024-01-02 13:43:49,130][Main][INFO] - [train] Step 13000 out of 65536 | Loss --> 2.699 | Grad_l2 --> 0.277 | Weights_l2 --> 12465.346 | Lr --> 0.020 | Seconds_per_step --> 1.674 | [2024-01-02 13:46:37,658][Main][INFO] - [train] Step 13100 out of 65536 | Loss --> 2.708 | Grad_l2 --> 0.282 | Weights_l2 --> 12492.101 | Lr --> 0.020 | Seconds_per_step --> 1.685 | [2024-01-02 13:49:30,095][Main][INFO] - [train] Step 13200 out of 65536 | Loss --> 2.687 | Grad_l2 --> 0.287 | Weights_l2 --> 12518.668 | Lr --> 0.020 | Seconds_per_step --> 1.724 | [2024-01-02 13:52:01,251][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00212-of-00512.json.gz [2024-01-02 13:52:21,442][Main][INFO] - [train] Step 13300 out of 65536 | Loss --> 2.681 | Grad_l2 --> 0.284 | Weights_l2 --> 12545.094 | Lr --> 0.020 | Seconds_per_step --> 1.713 | [2024-01-02 13:53:20,889][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00486-of-00512.json.gz [2024-01-02 13:55:10,174][Main][INFO] - [train] Step 13400 out of 65536 | Loss --> 2.676 | Grad_l2 --> 0.274 | Weights_l2 --> 12571.636 | Lr --> 0.020 | Seconds_per_step --> 1.687 | [2024-01-02 13:57:58,686][Main][INFO] - [train] Step 13500 out of 65536 | Loss --> 2.672 | Grad_l2 --> 0.286 | Weights_l2 --> 12598.222 | Lr --> 0.020 | Seconds_per_step --> 1.685 | [2024-01-02 14:00:49,257][Main][INFO] - [train] Step 13600 out of 65536 | Loss --> 2.657 | Grad_l2 --> 
0.276 | Weights_l2 --> 12624.682 | Lr --> 0.020 | Seconds_per_step --> 1.706 | [2024-01-02 14:03:47,429][Main][INFO] - [train] Step 13700 out of 65536 | Loss --> 2.654 | Grad_l2 --> 0.274 | Weights_l2 --> 12651.288 | Lr --> 0.020 | Seconds_per_step --> 1.782 | [2024-01-02 14:06:37,118][Main][INFO] - [train] Step 13800 out of 65536 | Loss --> 2.637 | Grad_l2 --> 0.277 | Weights_l2 --> 12677.832 | Lr --> 0.020 | Seconds_per_step --> 1.697 | [2024-01-02 14:09:28,744][Main][INFO] - [train] Step 13900 out of 65536 | Loss --> 2.647 | Grad_l2 --> 0.272 | Weights_l2 --> 12704.341 | Lr --> 0.020 | Seconds_per_step --> 1.716 | [2024-01-02 14:12:18,044][Main][INFO] - [train] Step 14000 out of 65536 | Loss --> 2.648 | Grad_l2 --> 0.269 | Weights_l2 --> 12730.539 | Lr --> 0.020 | Seconds_per_step --> 1.693 | [2024-01-02 14:12:51,526][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00110-of-00512.json.gz [2024-01-02 14:13:53,017][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00064-of-00512.json.gz [2024-01-02 14:15:11,988][Main][INFO] - [train] Step 14100 out of 65536 | Loss --> 2.645 | Grad_l2 --> 0.282 | Weights_l2 --> 12757.044 | Lr --> 0.020 | Seconds_per_step --> 1.739 | [2024-01-02 14:18:04,342][Main][INFO] - [train] Step 14200 out of 65536 | Loss --> 2.633 | Grad_l2 --> 0.283 | Weights_l2 --> 12783.438 | Lr --> 0.020 | Seconds_per_step --> 1.724 | [2024-01-02 14:20:53,113][Main][INFO] - [train] Step 14300 out of 65536 | Loss --> 2.645 | Grad_l2 --> 0.267 | Weights_l2 --> 12809.697 | Lr --> 0.020 | Seconds_per_step --> 1.688 | [2024-01-02 
14:23:42,569][Main][INFO] - [train] Step 14400 out of 65536 | Loss --> 2.649 | Grad_l2 --> 0.256 | Weights_l2 --> 12836.004 | Lr --> 0.020 | Seconds_per_step --> 1.695 | [2024-01-02 14:26:31,922][Main][INFO] - [train] Step 14500 out of 65536 | Loss --> 2.639 | Grad_l2 --> 0.270 | Weights_l2 --> 12862.382 | Lr --> 0.020 | Seconds_per_step --> 1.694 | [2024-01-02 14:29:21,289][Main][INFO] - [train] Step 14600 out of 65536 | Loss --> 2.633 | Grad_l2 --> 0.277 | Weights_l2 --> 12888.502 | Lr --> 0.020 | Seconds_per_step --> 1.694 | [2024-01-02 14:32:09,575][Main][INFO] - [train] Step 14700 out of 65536 | Loss --> 2.642 | Grad_l2 --> 0.256 | Weights_l2 --> 12914.793 | Lr --> 0.020 | Seconds_per_step --> 1.683 | [2024-01-02 14:33:21,571][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00307-of-00512.json.gz [2024-01-02 14:34:14,677][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00022-of-00512.json.gz [2024-01-02 14:35:01,259][Main][INFO] - [train] Step 14800 out of 65536 | Loss --> 2.635 | Grad_l2 --> 0.306 | Weights_l2 --> 12942.193 | Lr --> 0.020 | Seconds_per_step --> 1.717 | [2024-01-02 14:37:52,050][Main][INFO] - [train] Step 14900 out of 65536 | Loss --> 2.660 | Grad_l2 --> 0.338 | Weights_l2 --> 12970.937 | Lr --> 0.020 | Seconds_per_step --> 1.708 | [2024-01-02 14:40:42,335][Main][INFO] - [train] Step 15000 out of 65536 | Loss --> 2.644 | Grad_l2 --> 0.279 | Weights_l2 --> 12997.952 | Lr --> 0.020 | Seconds_per_step --> 1.703 | [2024-01-02 14:40:42,381][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). 
Stopping 1 dataloader workers. [2024-01-02 14:40:42,382][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-02 14:42:52,869][Main][INFO] - [eval] Step 15000 out of 65536 | Loss --> 2.684 | Accuracy --> 0.551 | Time --> 130.532 | [2024-01-02 14:45:44,453][Main][INFO] - [train] Step 15100 out of 65536 | Loss --> 2.645 | Grad_l2 --> 0.262 | Weights_l2 --> 13024.324 | Lr --> 0.020 | Seconds_per_step --> 1.716 | [2024-01-02 14:48:33,921][Main][INFO] - [train] Step 15200 out of 65536 | Loss --> 2.644 | Grad_l2 --> 0.254 | Weights_l2 --> 13050.215 | Lr --> 0.020 | Seconds_per_step --> 1.695 | [2024-01-02 14:51:24,408][Main][INFO] - [train] Step 15300 out of 65536 | Loss --> 2.625 | Grad_l2 --> 0.256 | Weights_l2 --> 13075.952 | Lr --> 0.020 | Seconds_per_step --> 1.705 | [2024-01-02 14:54:12,203][Main][INFO] - [train] Step 15400 out of 65536 | Loss --> 2.615 | Grad_l2 --> 0.267 | Weights_l2 --> 13101.787 | Lr --> 0.020 | Seconds_per_step --> 1.678 | [2024-01-02 14:55:24,863][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00375-of-00512.json.gz [2024-01-02 14:56:10,541][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00366-of-00512.json.gz [2024-01-02 14:57:03,488][Main][INFO] - [train] Step 15500 out of 65536 | Loss --> 2.639 | Grad_l2 --> 0.283 | Weights_l2 --> 13127.899 | Lr --> 0.020 | Seconds_per_step --> 1.713 | 
[2024-01-02 14:59:53,686][Main][INFO] - [train] Step 15600 out of 65536 | Loss --> 2.629 | Grad_l2 --> 0.257 | Weights_l2 --> 13153.681 | Lr --> 0.020 | Seconds_per_step --> 1.702 | [2024-01-02 15:02:43,780][Main][INFO] - [train] Step 15700 out of 65536 | Loss --> 2.628 | Grad_l2 --> 0.259 | Weights_l2 --> 13179.533 | Lr --> 0.019 | Seconds_per_step --> 1.701 | [2024-01-02 15:05:33,256][Main][INFO] - [train] Step 15800 out of 65536 | Loss --> 2.604 | Grad_l2 --> 0.252 | Weights_l2 --> 13205.296 | Lr --> 0.019 | Seconds_per_step --> 1.695 | [2024-01-02 15:08:23,806][Main][INFO] - [train] Step 15900 out of 65536 | Loss --> 2.616 | Grad_l2 --> 0.250 | Weights_l2 --> 13231.060 | Lr --> 0.019 | Seconds_per_step --> 1.705 | [2024-01-02 15:11:21,443][Main][INFO] - [train] Step 16000 out of 65536 | Loss --> 2.595 | Grad_l2 --> 0.262 | Weights_l2 --> 13256.698 | Lr --> 0.019 | Seconds_per_step --> 1.776 | [2024-01-02 15:14:11,257][Main][INFO] - [train] Step 16100 out of 65536 | Loss --> 2.571 | Grad_l2 --> 0.261 | Weights_l2 --> 13282.462 | Lr --> 0.019 | Seconds_per_step --> 1.698 | [2024-01-02 15:15:39,662][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00240-of-00512.json.gz [2024-01-02 15:16:44,845][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00463-of-00512.json.gz [2024-01-02 15:17:02,208][Main][INFO] - [train] Step 16200 out of 65536 | Loss --> 2.573 | Grad_l2 --> 0.271 | Weights_l2 --> 13307.963 | Lr --> 0.019 | Seconds_per_step --> 1.710 | [2024-01-02 15:19:53,136][Main][INFO] - [train] Step 16300 out of 65536 | Loss --> 2.587 | Grad_l2 --> 0.254 | 
Weights_l2 --> 13333.494 | Lr --> 0.019 | Seconds_per_step --> 1.709 | [2024-01-02 15:22:47,503][Main][INFO] - [train] Step 16400 out of 65536 | Loss --> 2.557 | Grad_l2 --> 0.259 | Weights_l2 --> 13358.960 | Lr --> 0.019 | Seconds_per_step --> 1.744 | [2024-01-02 15:25:41,500][Main][INFO] - [train] Step 16500 out of 65536 | Loss --> 2.561 | Grad_l2 --> 0.246 | Weights_l2 --> 13384.289 | Lr --> 0.019 | Seconds_per_step --> 1.740 | [2024-01-02 15:28:32,008][Main][INFO] - [train] Step 16600 out of 65536 | Loss --> 2.570 | Grad_l2 --> 0.258 | Weights_l2 --> 13409.559 | Lr --> 0.019 | Seconds_per_step --> 1.705 | [2024-01-02 15:31:24,041][Main][INFO] - [train] Step 16700 out of 65536 | Loss --> 2.546 | Grad_l2 --> 0.260 | Weights_l2 --> 13434.801 | Lr --> 0.019 | Seconds_per_step --> 1.720 | [2024-01-02 15:34:14,641][Main][INFO] - [train] Step 16800 out of 65536 | Loss --> 2.580 | Grad_l2 --> 0.288 | Weights_l2 --> 13461.565 | Lr --> 0.019 | Seconds_per_step --> 1.706 | [2024-01-02 15:36:47,720][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00473-of-00512.json.gz [2024-01-02 15:37:05,420][Main][INFO] - [train] Step 16900 out of 65536 | Loss --> 2.563 | Grad_l2 --> 0.261 | Weights_l2 --> 13487.370 | Lr --> 0.019 | Seconds_per_step --> 1.708 | [2024-01-02 15:37:28,842][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00160-of-00512.json.gz [2024-01-02 15:39:55,579][Main][INFO] - [train] Step 17000 out of 65536 | Loss --> 2.566 | Grad_l2 --> 0.249 | Weights_l2 --> 13513.010 | Lr --> 0.019 | Seconds_per_step --> 1.702 | [2024-01-02 15:42:44,434][Main][INFO] - 
[train] Step 17100 out of 65536 | Loss --> 2.561 | Grad_l2 --> 0.249 | Weights_l2 --> 13538.317 | Lr --> 0.019 | Seconds_per_step --> 1.689 | [2024-01-02 15:45:37,029][Main][INFO] - [train] Step 17200 out of 65536 | Loss --> 2.543 | Grad_l2 --> 0.253 | Weights_l2 --> 13563.557 | Lr --> 0.019 | Seconds_per_step --> 1.726 | [2024-01-02 15:48:25,541][Main][INFO] - [train] Step 17300 out of 65536 | Loss --> 2.550 | Grad_l2 --> 0.263 | Weights_l2 --> 13589.125 | Lr --> 0.019 | Seconds_per_step --> 1.685 | [2024-01-02 15:51:14,269][Main][INFO] - [train] Step 17400 out of 65536 | Loss --> 2.549 | Grad_l2 --> 0.263 | Weights_l2 --> 13614.287 | Lr --> 0.019 | Seconds_per_step --> 1.687 | [2024-01-02 15:54:05,860][Main][INFO] - [train] Step 17500 out of 65536 | Loss --> 2.545 | Grad_l2 --> 0.258 | Weights_l2 --> 13639.329 | Lr --> 0.019 | Seconds_per_step --> 1.716 | [2024-01-02 15:56:56,106][Main][INFO] - [train] Step 17600 out of 65536 | Loss --> 2.536 | Grad_l2 --> 0.260 | Weights_l2 --> 13664.259 | Lr --> 0.019 | Seconds_per_step --> 1.702 | [2024-01-02 15:56:58,221][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00190-of-00512.json.gz [2024-01-02 15:57:11,457][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00406-of-00512.json.gz [2024-01-02 15:59:51,233][Main][INFO] - [train] Step 17700 out of 65536 | Loss --> 2.535 | Grad_l2 --> 0.254 | Weights_l2 --> 13689.024 | Lr --> 0.019 | Seconds_per_step --> 1.751 | [2024-01-02 16:02:39,927][Main][INFO] - [train] Step 17800 out of 65536 | Loss --> 2.540 | Grad_l2 --> 0.250 | Weights_l2 --> 13713.710 | Lr --> 0.019 | 
Seconds_per_step --> 1.687 | [2024-01-02 16:05:29,164][Main][INFO] - [train] Step 17900 out of 65536 | Loss --> 2.536 | Grad_l2 --> 0.258 | Weights_l2 --> 13738.538 | Lr --> 0.019 | Seconds_per_step --> 1.692 | [2024-01-02 16:08:25,730][Main][INFO] - [train] Step 18000 out of 65536 | Loss --> 2.530 | Grad_l2 --> 0.255 | Weights_l2 --> 13763.452 | Lr --> 0.019 | Seconds_per_step --> 1.766 | [2024-01-02 16:11:17,117][Main][INFO] - [train] Step 18100 out of 65536 | Loss --> 2.532 | Grad_l2 --> 0.252 | Weights_l2 --> 13788.193 | Lr --> 0.019 | Seconds_per_step --> 1.714 | [2024-01-02 16:14:13,193][Main][INFO] - [train] Step 18200 out of 65536 | Loss --> 2.523 | Grad_l2 --> 0.258 | Weights_l2 --> 13813.035 | Lr --> 0.019 | Seconds_per_step --> 1.761 | [2024-01-02 16:17:02,862][Main][INFO] - [train] Step 18300 out of 65536 | Loss --> 2.518 | Grad_l2 --> 0.246 | Weights_l2 --> 13837.904 | Lr --> 0.019 | Seconds_per_step --> 1.697 | [2024-01-02 16:17:19,431][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00214-of-00512.json.gz [2024-01-02 16:18:00,973][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00492-of-00512.json.gz [2024-01-02 16:19:55,259][Main][INFO] - [train] Step 18400 out of 65536 | Loss --> 2.515 | Grad_l2 --> 0.253 | Weights_l2 --> 13862.541 | Lr --> 0.019 | Seconds_per_step --> 1.724 | [2024-01-02 16:22:45,207][Main][INFO] - [train] Step 18500 out of 65536 | Loss --> 2.503 | Grad_l2 --> 0.255 | Weights_l2 --> 13886.997 | Lr --> 0.019 | Seconds_per_step --> 1.699 | [2024-01-02 16:25:36,207][Main][INFO] - [train] Step 18600 out of 65536 | Loss --> 
2.506 | Grad_l2 --> 0.245 | Weights_l2 --> 13911.428 | Lr --> 0.019 | Seconds_per_step --> 1.710 | [2024-01-02 16:28:28,646][Main][INFO] - [train] Step 18700 out of 65536 | Loss --> 2.511 | Grad_l2 --> 0.250 | Weights_l2 --> 13935.919 | Lr --> 0.019 | Seconds_per_step --> 1.724 | [2024-01-02 16:31:17,078][Main][INFO] - [train] Step 18800 out of 65536 | Loss --> 2.514 | Grad_l2 --> 0.249 | Weights_l2 --> 13960.271 | Lr --> 0.019 | Seconds_per_step --> 1.684 | [2024-01-02 16:34:05,518][Main][INFO] - [train] Step 18900 out of 65536 | Loss --> 2.506 | Grad_l2 --> 0.249 | Weights_l2 --> 13984.804 | Lr --> 0.019 | Seconds_per_step --> 1.684 | [2024-01-02 16:36:55,156][Main][INFO] - [train] Step 19000 out of 65536 | Loss --> 2.516 | Grad_l2 --> 0.251 | Weights_l2 --> 14009.308 | Lr --> 0.019 | Seconds_per_step --> 1.696 | [2024-01-02 16:37:19,031][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00082-of-00512.json.gz [2024-01-02 16:37:41,016][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00143-of-00512.json.gz [2024-01-02 16:39:53,980][Main][INFO] - [train] Step 19100 out of 65536 | Loss --> 2.501 | Grad_l2 --> 0.243 | Weights_l2 --> 14033.724 | Lr --> 0.019 | Seconds_per_step --> 1.788 | [2024-01-02 16:42:43,335][Main][INFO] - [train] Step 19200 out of 65536 | Loss --> 2.523 | Grad_l2 --> 0.294 | Weights_l2 --> 14059.729 | Lr --> 0.019 | Seconds_per_step --> 1.694 | [2024-01-02 16:45:31,492][Main][INFO] - [train] Step 19300 out of 65536 | Loss --> 2.505 | Grad_l2 --> 0.312 | Weights_l2 --> 14085.744 | Lr --> 0.019 | Seconds_per_step --> 1.682 | [2024-01-02 
16:48:19,609][Main][INFO] - [train] Step 19400 out of 65536 | Loss --> 2.514 | Grad_l2 --> 0.278 | Weights_l2 --> 14111.722 | Lr --> 0.019 | Seconds_per_step --> 1.681 | [2024-01-02 16:51:14,710][Main][INFO] - [train] Step 19500 out of 65536 | Loss --> 2.492 | Grad_l2 --> 0.262 | Weights_l2 --> 14136.135 | Lr --> 0.019 | Seconds_per_step --> 1.751 | [2024-01-02 16:54:06,545][Main][INFO] - [train] Step 19600 out of 65536 | Loss --> 2.512 | Grad_l2 --> 0.260 | Weights_l2 --> 14160.398 | Lr --> 0.019 | Seconds_per_step --> 1.718 | [2024-01-02 16:56:57,751][Main][INFO] - [train] Step 19700 out of 65536 | Loss --> 2.507 | Grad_l2 --> 0.265 | Weights_l2 --> 14184.460 | Lr --> 0.019 | Seconds_per_step --> 1.712 | [2024-01-02 16:57:40,622][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00264-of-00512.json.gz [2024-01-02 16:57:50,685][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00033-of-00512.json.gz [2024-01-02 16:59:46,604][Main][INFO] - [train] Step 19800 out of 65536 | Loss --> 2.460 | Grad_l2 --> 0.254 | Weights_l2 --> 14208.169 | Lr --> 0.019 | Seconds_per_step --> 1.689 | [2024-01-02 17:02:38,157][Main][INFO] - [train] Step 19900 out of 65536 | Loss --> 2.491 | Grad_l2 --> 0.264 | Weights_l2 --> 14232.110 | Lr --> 0.018 | Seconds_per_step --> 1.716 | [2024-01-02 17:05:29,590][Main][INFO] - [train] Step 20000 out of 65536 | Loss --> 2.487 | Grad_l2 --> 0.264 | Weights_l2 --> 14255.952 | Lr --> 0.018 | Seconds_per_step --> 1.714 | [2024-01-02 17:05:29,637][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). 
Stopping 1 dataloader workers. [2024-01-02 17:05:29,639][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-02 17:07:40,969][Main][INFO] - [eval] Step 20000 out of 65536 | Loss --> 2.527 | Accuracy --> 0.569 | Time --> 131.377 | [2024-01-02 17:07:40,972][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-20000 [2024-01-02 17:07:40,975][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading [2024-01-02 17:07:44,022][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-20000/model.safetensors [2024-01-02 17:07:48,439][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-20000/optimizer.bin [2024-01-02 17:07:48,441][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-20000/scheduler.bin [2024-01-02 17:07:48,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-20000/sampler.bin [2024-01-02 17:07:48,441][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-20000/sampler_1.bin [2024-01-02 17:07:48,443][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-20000/random_states_0.pkl [2024-01-02 17:10:37,572][Main][INFO] - [train] Step 20100 out of 65536 | Loss --> 2.489 | Grad_l2 --> 0.255 | Weights_l2 --> 14279.763 | Lr --> 0.018 | Seconds_per_step --> 1.766 | [2024-01-02 17:13:31,127][Main][INFO] - [train] Step 20200 out of 65536 | Loss --> 2.473 | Grad_l2 --> 0.252 | Weights_l2 --> 14303.187 | Lr --> 0.018 | Seconds_per_step --> 1.736 | [2024-01-02 17:16:19,446][Main][INFO] - [train] Step 
20300 out of 65536 | Loss --> 2.458 | Grad_l2 --> 0.242 | Weights_l2 --> 14326.537 | Lr --> 0.018 | Seconds_per_step --> 1.683 | [2024-01-02 17:19:09,966][Main][INFO] - [train] Step 20400 out of 65536 | Loss --> 2.451 | Grad_l2 --> 0.254 | Weights_l2 --> 14350.033 | Lr --> 0.018 | Seconds_per_step --> 1.705 | [2024-01-02 17:20:08,209][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00421-of-00512.json.gz [2024-01-02 17:20:18,985][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00418-of-00512.json.gz [2024-01-02 17:22:00,590][Main][INFO] - [train] Step 20500 out of 65536 | Loss --> 2.458 | Grad_l2 --> 0.253 | Weights_l2 --> 14373.380 | Lr --> 0.018 | Seconds_per_step --> 1.706 | [2024-01-02 17:24:52,665][Main][INFO] - [train] Step 20600 out of 65536 | Loss --> 2.453 | Grad_l2 --> 0.250 | Weights_l2 --> 14396.528 | Lr --> 0.018 | Seconds_per_step --> 1.721 | [2024-01-02 17:27:42,251][Main][INFO] - [train] Step 20700 out of 65536 | Loss --> 2.478 | Grad_l2 --> 0.254 | Weights_l2 --> 14419.779 | Lr --> 0.018 | Seconds_per_step --> 1.696 | [2024-01-02 17:30:31,301][Main][INFO] - [train] Step 20800 out of 65536 | Loss --> 2.450 | Grad_l2 --> 0.254 | Weights_l2 --> 14442.813 | Lr --> 0.018 | Seconds_per_step --> 1.690 | [2024-01-02 17:33:25,345][Main][INFO] - [train] Step 20900 out of 65536 | Loss --> 2.444 | Grad_l2 --> 0.258 | Weights_l2 --> 14465.939 | Lr --> 0.018 | Seconds_per_step --> 1.740 | [2024-01-02 17:36:13,070][Main][INFO] - [train] Step 21000 out of 65536 | Loss --> 2.440 | Grad_l2 --> 0.244 | Weights_l2 --> 14489.154 | Lr --> 0.018 | 
Seconds_per_step --> 1.677 | [2024-01-02 17:39:04,055][Main][INFO] - [train] Step 21100 out of 65536 | Loss --> 2.451 | Grad_l2 --> 0.250 | Weights_l2 --> 14512.288 | Lr --> 0.018 | Seconds_per_step --> 1.710 | [2024-01-02 17:39:59,025][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00505-of-00512.json.gz [2024-01-02 17:40:06,313][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00209-of-00512.json.gz [2024-01-02 17:41:55,609][Main][INFO] - [train] Step 21200 out of 65536 | Loss --> 2.445 | Grad_l2 --> 0.250 | Weights_l2 --> 14535.390 | Lr --> 0.018 | Seconds_per_step --> 1.716 | [2024-01-02 17:44:48,184][Main][INFO] - [train] Step 21300 out of 65536 | Loss --> 2.460 | Grad_l2 --> 0.260 | Weights_l2 --> 14558.524 | Lr --> 0.018 | Seconds_per_step --> 1.726 | [2024-01-02 17:47:36,660][Main][INFO] - [train] Step 21400 out of 65536 | Loss --> 2.447 | Grad_l2 --> 0.252 | Weights_l2 --> 14581.355 | Lr --> 0.018 | Seconds_per_step --> 1.685 | [2024-01-02 17:50:26,460][Main][INFO] - [train] Step 21500 out of 65536 | Loss --> 2.433 | Grad_l2 --> 0.249 | Weights_l2 --> 14604.116 | Lr --> 0.018 | Seconds_per_step --> 1.698 | [2024-01-02 17:53:22,788][Main][INFO] - [train] Step 21600 out of 65536 | Loss --> 2.416 | Grad_l2 --> 0.247 | Weights_l2 --> 14626.949 | Lr --> 0.018 | Seconds_per_step --> 1.763 | [2024-01-02 17:56:14,559][Main][INFO] - [train] Step 21700 out of 65536 | Loss --> 2.427 | Grad_l2 --> 0.254 | Weights_l2 --> 14649.558 | Lr --> 0.018 | Seconds_per_step --> 1.718 | [2024-01-02 17:59:03,310][Main][INFO] - [train] Step 21800 out of 65536 | Loss --> 
2.425 | Grad_l2 --> 0.257 | Weights_l2 --> 14672.719 | Lr --> 0.018 | Seconds_per_step --> 1.687 | [2024-01-02 18:00:41,296][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00360-of-00512.json.gz [2024-01-02 18:00:57,421][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00441-of-00512.json.gz [2024-01-02 18:01:55,092][Main][INFO] - [train] Step 21900 out of 65536 | Loss --> 2.448 | Grad_l2 --> 0.239 | Weights_l2 --> 14695.458 | Lr --> 0.018 | Seconds_per_step --> 1.718 | [2024-01-02 18:04:45,128][Main][INFO] - [train] Step 22000 out of 65536 | Loss --> 2.415 | Grad_l2 --> 0.243 | Weights_l2 --> 14717.980 | Lr --> 0.018 | Seconds_per_step --> 1.700 | [2024-01-02 18:07:42,566][Main][INFO] - [train] Step 22100 out of 65536 | Loss --> 2.404 | Grad_l2 --> 0.254 | Weights_l2 --> 14740.518 | Lr --> 0.018 | Seconds_per_step --> 1.774 | [2024-01-02 18:10:37,165][Main][INFO] - [train] Step 22200 out of 65536 | Loss --> 2.405 | Grad_l2 --> 0.256 | Weights_l2 --> 14762.969 | Lr --> 0.018 | Seconds_per_step --> 1.746 | [2024-01-02 18:13:29,157][Main][INFO] - [train] Step 22300 out of 65536 | Loss --> 2.423 | Grad_l2 --> 0.261 | Weights_l2 --> 14785.481 | Lr --> 0.018 | Seconds_per_step --> 1.720 | [2024-01-02 18:16:24,606][Main][INFO] - [train] Step 22400 out of 65536 | Loss --> 2.447 | Grad_l2 --> 0.284 | Weights_l2 --> 14809.074 | Lr --> 0.018 | Seconds_per_step --> 1.754 | [2024-01-02 18:19:13,301][Main][INFO] - [train] Step 22500 out of 65536 | Loss --> 2.443 | Grad_l2 --> 0.263 | Weights_l2 --> 14832.093 | Lr --> 0.018 | Seconds_per_step --> 1.687 | [2024-01-02 
18:21:18,524][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00227-of-00512.json.gz [2024-01-02 18:21:36,325][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00309-of-00512.json.gz [2024-01-02 18:22:04,772][Main][INFO] - [train] Step 22600 out of 65536 | Loss --> 2.423 | Grad_l2 --> 0.253 | Weights_l2 --> 14854.775 | Lr --> 0.018 | Seconds_per_step --> 1.715 | [2024-01-02 18:24:58,962][Main][INFO] - [train] Step 22700 out of 65536 | Loss --> 2.416 | Grad_l2 --> 0.268 | Weights_l2 --> 14877.335 | Lr --> 0.018 | Seconds_per_step --> 1.742 | [2024-01-02 18:27:53,693][Main][INFO] - [train] Step 22800 out of 65536 | Loss --> 2.414 | Grad_l2 --> 0.253 | Weights_l2 --> 14899.739 | Lr --> 0.017 | Seconds_per_step --> 1.747 | [2024-01-02 18:30:43,772][Main][INFO] - [train] Step 22900 out of 65536 | Loss --> 2.403 | Grad_l2 --> 0.257 | Weights_l2 --> 14921.510 | Lr --> 0.017 | Seconds_per_step --> 1.701 | [2024-01-02 18:33:33,099][Main][INFO] - [train] Step 23000 out of 65536 | Loss --> 2.430 | Grad_l2 --> 0.251 | Weights_l2 --> 14943.319 | Lr --> 0.017 | Seconds_per_step --> 1.693 | [2024-01-02 18:36:22,961][Main][INFO] - [train] Step 23100 out of 65536 | Loss --> 2.397 | Grad_l2 --> 0.252 | Weights_l2 --> 14965.151 | Lr --> 0.017 | Seconds_per_step --> 1.699 | [2024-01-02 18:39:11,811][Main][INFO] - [train] Step 23200 out of 65536 | Loss --> 2.408 | Grad_l2 --> 0.257 | Weights_l2 --> 14986.698 | Lr --> 0.017 | Seconds_per_step --> 1.688 | [2024-01-02 
18:41:21,814][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00265-of-00512.json.gz [2024-01-02 18:41:28,135][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00167-of-00512.json.gz [2024-01-02 18:42:02,830][Main][INFO] - [train] Step 23300 out of 65536 | Loss --> 2.407 | Grad_l2 --> 0.251 | Weights_l2 --> 15008.506 | Lr --> 0.017 | Seconds_per_step --> 1.710 | [2024-01-02 18:44:54,942][Main][INFO] - [train] Step 23400 out of 65536 | Loss --> 2.406 | Grad_l2 --> 0.263 | Weights_l2 --> 15030.706 | Lr --> 0.017 | Seconds_per_step --> 1.721 | [2024-01-02 18:47:44,626][Main][INFO] - [train] Step 23500 out of 65536 | Loss --> 2.407 | Grad_l2 --> 0.268 | Weights_l2 --> 15052.366 | Lr --> 0.017 | Seconds_per_step --> 1.697 | [2024-01-02 18:50:36,690][Main][INFO] - [train] Step 23600 out of 65536 | Loss --> 2.393 | Grad_l2 --> 0.265 | Weights_l2 --> 15074.223 | Lr --> 0.017 | Seconds_per_step --> 1.721 | [2024-01-02 18:53:25,775][Main][INFO] - [train] Step 23700 out of 65536 | Loss --> 2.390 | Grad_l2 --> 0.249 | Weights_l2 --> 15095.673 | Lr --> 0.017 | Seconds_per_step --> 1.691 | [2024-01-02 18:56:14,325][Main][INFO] - [train] Step 23800 out of 65536 | Loss --> 2.394 | Grad_l2 --> 0.264 | Weights_l2 --> 15117.037 | Lr --> 0.017 | Seconds_per_step --> 1.685 | [2024-01-02 18:59:06,316][Main][INFO] - [train] Step 23900 out of 65536 | Loss --> 2.406 | Grad_l2 --> 0.255 | Weights_l2 --> 15138.512 | Lr --> 0.017 | Seconds_per_step --> 1.720 | [2024-01-02 
19:01:52,496][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00153-of-00512.json.gz [2024-01-02 19:01:52,883][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00117-of-00512.json.gz [2024-01-02 19:02:01,057][Main][INFO] - [train] Step 24000 out of 65536 | Loss --> 2.398 | Grad_l2 --> 0.256 | Weights_l2 --> 15159.799 | Lr --> 0.017 | Seconds_per_step --> 1.747 | [2024-01-02 19:04:49,943][Main][INFO] - [train] Step 24100 out of 65536 | Loss --> 2.393 | Grad_l2 --> 0.251 | Weights_l2 --> 15181.146 | Lr --> 0.017 | Seconds_per_step --> 1.689 | [2024-01-02 19:07:39,191][Main][INFO] - [train] Step 24200 out of 65536 | Loss --> 2.393 | Grad_l2 --> 0.278 | Weights_l2 --> 15202.209 | Lr --> 0.017 | Seconds_per_step --> 1.692 | [2024-01-02 19:10:32,575][Main][INFO] - [train] Step 24300 out of 65536 | Loss --> 2.375 | Grad_l2 --> 0.260 | Weights_l2 --> 15223.079 | Lr --> 0.017 | Seconds_per_step --> 1.734 | [2024-01-02 19:13:21,840][Main][INFO] - [train] Step 24400 out of 65536 | Loss --> 2.372 | Grad_l2 --> 0.248 | Weights_l2 --> 15243.978 | Lr --> 0.017 | Seconds_per_step --> 1.693 | [2024-01-02 19:16:16,666][Main][INFO] - [train] Step 24500 out of 65536 | Loss --> 2.371 | Grad_l2 --> 0.251 | Weights_l2 --> 15264.771 | Lr --> 0.017 | Seconds_per_step --> 1.748 | [2024-01-02 19:19:08,833][Main][INFO] - [train] Step 24600 out of 65536 | Loss --> 2.375 | Grad_l2 --> 0.267 | Weights_l2 --> 15285.543 | Lr --> 0.017 | Seconds_per_step --> 1.722 | [2024-01-02 
19:21:55,363][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00263-of-00512.json.gz [2024-01-02 19:21:59,897][Main][INFO] - [train] Step 24700 out of 65536 | Loss --> 2.372 | Grad_l2 --> 0.263 | Weights_l2 --> 15306.152 | Lr --> 0.017 | Seconds_per_step --> 1.711 | [2024-01-02 19:22:05,257][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00453-of-00512.json.gz [2024-01-02 19:24:51,720][Main][INFO] - [train] Step 24800 out of 65536 | Loss --> 2.358 | Grad_l2 --> 0.256 | Weights_l2 --> 15326.725 | Lr --> 0.017 | Seconds_per_step --> 1.718 | [2024-01-02 19:27:39,622][Main][INFO] - [train] Step 24900 out of 65536 | Loss --> 2.359 | Grad_l2 --> 0.249 | Weights_l2 --> 15347.256 | Lr --> 0.017 | Seconds_per_step --> 1.679 | [2024-01-02 19:30:27,697][Main][INFO] - [train] Step 25000 out of 65536 | Loss --> 2.359 | Grad_l2 --> 0.261 | Weights_l2 --> 15367.830 | Lr --> 0.017 | Seconds_per_step --> 1.681 | [2024-01-02 19:30:27,748][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. 
[2024-01-02 19:30:27,750][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-02 19:32:39,671][Main][INFO] - [eval] Step 25000 out of 65536 | Loss --> 2.407 | Accuracy --> 0.583 | Time --> 131.972 | [2024-01-02 19:35:33,075][Main][INFO] - [train] Step 25100 out of 65536 | Loss --> 2.360 | Grad_l2 --> 0.262 | Weights_l2 --> 15388.418 | Lr --> 0.017 | Seconds_per_step --> 1.734 | [2024-01-02 19:38:27,107][Main][INFO] - [train] Step 25200 out of 65536 | Loss --> 2.352 | Grad_l2 --> 0.255 | Weights_l2 --> 15408.697 | Lr --> 0.017 | Seconds_per_step --> 1.740 | [2024-01-02 19:41:17,526][Main][INFO] - [train] Step 25300 out of 65536 | Loss --> 2.354 | Grad_l2 --> 0.258 | Weights_l2 --> 15428.927 | Lr --> 0.016 | Seconds_per_step --> 1.704 | [2024-01-02 19:44:07,303][Main][INFO] - [train] Step 25400 out of 65536 | Loss --> 2.337 | Grad_l2 --> 0.266 | Weights_l2 --> 15449.144 | Lr --> 0.016 | Seconds_per_step --> 1.698 | [2024-01-02 19:44:20,518][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00431-of-00512.json.gz [2024-01-02 19:44:40,085][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00219-of-00512.json.gz [2024-01-02 19:46:58,868][Main][INFO] - [train] Step 25500 out of 65536 | Loss --> 2.331 | Grad_l2 --> 0.254 | Weights_l2 --> 15469.032 | Lr --> 0.016 | Seconds_per_step --> 1.716 | [2024-01-02 
19:49:49,258][Main][INFO] - [train] Step 25600 out of 65536 | Loss --> 2.345 | Grad_l2 --> 0.266 | Weights_l2 --> 15489.093 | Lr --> 0.016 | Seconds_per_step --> 1.704 | [2024-01-02 19:52:38,249][Main][INFO] - [train] Step 25700 out of 65536 | Loss --> 2.347 | Grad_l2 --> 0.262 | Weights_l2 --> 15509.004 | Lr --> 0.016 | Seconds_per_step --> 1.690 | [2024-01-02 19:55:30,533][Main][INFO] - [train] Step 25800 out of 65536 | Loss --> 2.351 | Grad_l2 --> 0.262 | Weights_l2 --> 15528.766 | Lr --> 0.016 | Seconds_per_step --> 1.723 | [2024-01-02 19:58:18,977][Main][INFO] - [train] Step 25900 out of 65536 | Loss --> 2.344 | Grad_l2 --> 0.274 | Weights_l2 --> 15548.873 | Lr --> 0.016 | Seconds_per_step --> 1.684 | [2024-01-02 20:01:07,096][Main][INFO] - [train] Step 26000 out of 65536 | Loss --> 2.359 | Grad_l2 --> 0.260 | Weights_l2 --> 15568.736 | Lr --> 0.016 | Seconds_per_step --> 1.681 | [2024-01-02 20:03:59,599][Main][INFO] - [train] Step 26100 out of 65536 | Loss --> 2.348 | Grad_l2 --> 0.262 | Weights_l2 --> 15588.379 | Lr --> 0.016 | Seconds_per_step --> 1.725 | [2024-01-02 20:04:48,626][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00014-of-00512.json.gz [2024-01-02 20:04:55,213][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00257-of-00512.json.gz [2024-01-02 20:06:52,474][Main][INFO] - [train] Step 26200 out of 65536 | Loss --> 2.358 | Grad_l2 --> 0.266 | Weights_l2 --> 15608.121 | Lr --> 0.016 | Seconds_per_step --> 1.729 | [2024-01-02 20:09:40,812][Main][INFO] - [train] Step 26300 out of 65536 | Loss --> 2.334 | Grad_l2 --> 0.262 | Weights_l2 --> 
15627.567 | Lr --> 0.016 | Seconds_per_step --> 1.683 | [2024-01-02 20:12:32,449][Main][INFO] - [train] Step 26400 out of 65536 | Loss --> 2.347 | Grad_l2 --> 0.274 | Weights_l2 --> 15647.006 | Lr --> 0.016 | Seconds_per_step --> 1.716 | [2024-01-02 20:15:25,730][Main][INFO] - [train] Step 26500 out of 65536 | Loss --> 2.344 | Grad_l2 --> 0.265 | Weights_l2 --> 15666.068 | Lr --> 0.016 | Seconds_per_step --> 1.733 | [2024-01-02 20:18:12,959][Main][INFO] - [train] Step 26600 out of 65536 | Loss --> 2.356 | Grad_l2 --> 0.260 | Weights_l2 --> 15685.294 | Lr --> 0.016 | Seconds_per_step --> 1.672 | [2024-01-02 20:21:09,422][Main][INFO] - [train] Step 26700 out of 65536 | Loss --> 2.334 | Grad_l2 --> 0.264 | Weights_l2 --> 15704.444 | Lr --> 0.016 | Seconds_per_step --> 1.765 | [2024-01-02 20:24:00,053][Main][INFO] - [train] Step 26800 out of 65536 | Loss --> 2.364 | Grad_l2 --> 0.282 | Weights_l2 --> 15724.034 | Lr --> 0.016 | Seconds_per_step --> 1.706 | [2024-01-02 20:24:42,424][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00060-of-00512.json.gz [2024-01-02 20:25:22,899][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00510-of-00512.json.gz [2024-01-02 20:26:52,609][Main][INFO] - [train] Step 26900 out of 65536 | Loss --> 2.354 | Grad_l2 --> 0.265 | Weights_l2 --> 15743.290 | Lr --> 0.016 | Seconds_per_step --> 1.726 | [2024-01-02 20:29:43,646][Main][INFO] - [train] Step 27000 out of 65536 | Loss --> 2.339 | Grad_l2 --> 0.258 | Weights_l2 --> 15762.223 | Lr --> 0.016 | Seconds_per_step --> 1.710 | [2024-01-02 20:32:33,625][Main][INFO] - [train] Step 27100 
out of 65536 | Loss --> 2.345 | Grad_l2 --> 0.282 | Weights_l2 --> 15781.998 | Lr --> 0.016 | Seconds_per_step --> 1.700 | [2024-01-02 20:35:23,353][Main][INFO] - [train] Step 27200 out of 65536 | Loss --> 2.344 | Grad_l2 --> 0.279 | Weights_l2 --> 15800.944 | Lr --> 0.016 | Seconds_per_step --> 1.697 | [2024-01-02 20:38:15,119][Main][INFO] - [train] Step 27300 out of 65536 | Loss --> 2.311 | Grad_l2 --> 0.264 | Weights_l2 --> 15819.695 | Lr --> 0.016 | Seconds_per_step --> 1.718 | [2024-01-02 20:41:03,550][Main][INFO] - [train] Step 27400 out of 65536 | Loss --> 2.317 | Grad_l2 --> 0.272 | Weights_l2 --> 15838.170 | Lr --> 0.016 | Seconds_per_step --> 1.684 | [2024-01-02 20:44:02,635][Main][INFO] - [train] Step 27500 out of 65536 | Loss --> 2.319 | Grad_l2 --> 0.266 | Weights_l2 --> 15856.759 | Lr --> 0.015 | Seconds_per_step --> 1.791 | [2024-01-02 20:45:15,436][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00353-of-00512.json.gz [2024-01-02 20:45:41,056][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00434-of-00512.json.gz [2024-01-02 20:46:56,360][Main][INFO] - [train] Step 27600 out of 65536 | Loss --> 2.316 | Grad_l2 --> 0.261 | Weights_l2 --> 15875.160 | Lr --> 0.015 | Seconds_per_step --> 1.737 | [2024-01-02 20:49:51,501][Main][INFO] - [train] Step 27700 out of 65536 | Loss --> 2.301 | Grad_l2 --> 0.261 | Weights_l2 --> 15893.388 | Lr --> 0.015 | Seconds_per_step --> 1.751 | [2024-01-02 20:52:42,203][Main][INFO] - [train] Step 27800 out of 65536 | Loss --> 2.335 | Grad_l2 --> 0.275 | Weights_l2 --> 15911.735 | Lr --> 0.015 | Seconds_per_step --> 
1.707 | [2024-01-02 20:55:30,530][Main][INFO] - [train] Step 27900 out of 65536 | Loss --> 2.321 | Grad_l2 --> 0.263 | Weights_l2 --> 15929.715 | Lr --> 0.015 | Seconds_per_step --> 1.683 | [2024-01-02 20:58:24,401][Main][INFO] - [train] Step 28000 out of 65536 | Loss --> 2.314 | Grad_l2 --> 0.261 | Weights_l2 --> 15947.652 | Lr --> 0.015 | Seconds_per_step --> 1.739 | [2024-01-02 21:01:14,614][Main][INFO] - [train] Step 28100 out of 65536 | Loss --> 2.291 | Grad_l2 --> 0.272 | Weights_l2 --> 15965.616 | Lr --> 0.015 | Seconds_per_step --> 1.702 | [2024-01-02 21:04:03,228][Main][INFO] - [train] Step 28200 out of 65536 | Loss --> 2.279 | Grad_l2 --> 0.277 | Weights_l2 --> 15983.342 | Lr --> 0.015 | Seconds_per_step --> 1.686 | [2024-01-02 21:05:48,936][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00157-of-00512.json.gz [2024-01-02 21:06:04,397][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00051-of-00512.json.gz [2024-01-02 21:06:54,532][Main][INFO] - [train] Step 28300 out of 65536 | Loss --> 2.295 | Grad_l2 --> 0.274 | Weights_l2 --> 16001.161 | Lr --> 0.015 | Seconds_per_step --> 1.713 | [2024-01-02 21:09:43,822][Main][INFO] - [train] Step 28400 out of 65536 | Loss --> 2.285 | Grad_l2 --> 0.265 | Weights_l2 --> 16018.729 | Lr --> 0.015 | Seconds_per_step --> 1.693 | [2024-01-02 21:12:36,433][Main][INFO] - [train] Step 28500 out of 65536 | Loss --> 2.305 | Grad_l2 --> 0.258 | Weights_l2 --> 16036.217 | Lr --> 0.015 | Seconds_per_step --> 1.726 | [2024-01-02 21:15:24,119][Main][INFO] - [train] Step 28600 out of 65536 | Loss --> 2.294 | Grad_l2 --> 
0.275 | Weights_l2 --> 16053.729 | Lr --> 0.015 | Seconds_per_step --> 1.677 | [2024-01-02 21:18:15,217][Main][INFO] - [train] Step 28700 out of 65536 | Loss --> 2.297 | Grad_l2 --> 0.280 | Weights_l2 --> 16070.909 | Lr --> 0.015 | Seconds_per_step --> 1.711 | [2024-01-02 21:21:11,627][Main][INFO] - [train] Step 28800 out of 65536 | Loss --> 2.270 | Grad_l2 --> 0.270 | Weights_l2 --> 16088.154 | Lr --> 0.015 | Seconds_per_step --> 1.764 | [2024-01-02 21:24:03,333][Main][INFO] - [train] Step 28900 out of 65536 | Loss --> 2.281 | Grad_l2 --> 0.325 | Weights_l2 --> 16106.178 | Lr --> 0.015 | Seconds_per_step --> 1.717 | [2024-01-02 21:25:47,428][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00068-of-00512.json.gz [2024-01-02 21:25:57,460][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00347-of-00512.json.gz [2024-01-02 21:26:59,069][Main][INFO] - [train] Step 29000 out of 65536 | Loss --> 2.296 | Grad_l2 --> 0.275 | Weights_l2 --> 16123.948 | Lr --> 0.015 | Seconds_per_step --> 1.757 | [2024-01-02 21:29:48,297][Main][INFO] - [train] Step 29100 out of 65536 | Loss --> 2.274 | Grad_l2 --> 0.272 | Weights_l2 --> 16141.083 | Lr --> 0.015 | Seconds_per_step --> 1.692 | [2024-01-02 21:32:39,491][Main][INFO] - [train] Step 29200 out of 65536 | Loss --> 2.289 | Grad_l2 --> 0.280 | Weights_l2 --> 16158.398 | Lr --> 0.015 | Seconds_per_step --> 1.712 | [2024-01-02 21:35:29,787][Main][INFO] - [train] Step 29300 out of 65536 | Loss --> 2.285 | Grad_l2 --> 0.276 | Weights_l2 --> 16175.486 | Lr --> 0.015 | Seconds_per_step --> 1.703 | [2024-01-02 
21:38:18,294][Main][INFO] - [train] Step 29400 out of 65536 | Loss --> 2.285 | Grad_l2 --> 0.285 | Weights_l2 --> 16192.404 | Lr --> 0.015 | Seconds_per_step --> 1.685 | [2024-01-02 21:41:09,096][Main][INFO] - [train] Step 29500 out of 65536 | Loss --> 2.269 | Grad_l2 --> 0.283 | Weights_l2 --> 16209.102 | Lr --> 0.015 | Seconds_per_step --> 1.708 | [2024-01-02 21:44:01,334][Main][INFO] - [train] Step 29600 out of 65536 | Loss --> 2.268 | Grad_l2 --> 0.277 | Weights_l2 --> 16225.583 | Lr --> 0.014 | Seconds_per_step --> 1.722 | [2024-01-02 21:46:14,451][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00137-of-00512.json.gz [2024-01-02 21:46:32,415][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00179-of-00512.json.gz [2024-01-02 21:46:53,459][Main][INFO] - [train] Step 29700 out of 65536 | Loss --> 2.261 | Grad_l2 --> 0.288 | Weights_l2 --> 16241.978 | Lr --> 0.014 | Seconds_per_step --> 1.721 | [2024-01-02 21:49:46,300][Main][INFO] - [train] Step 29800 out of 65536 | Loss --> 2.263 | Grad_l2 --> 0.275 | Weights_l2 --> 16258.343 | Lr --> 0.014 | Seconds_per_step --> 1.728 | [2024-01-02 21:52:34,658][Main][INFO] - [train] Step 29900 out of 65536 | Loss --> 2.263 | Grad_l2 --> 0.272 | Weights_l2 --> 16274.619 | Lr --> 0.014 | Seconds_per_step --> 1.684 | [2024-01-02 21:55:28,505][Main][INFO] - [train] Step 30000 out of 65536 | Loss --> 2.274 | Grad_l2 --> 0.270 | Weights_l2 --> 16290.690 | Lr --> 0.014 | Seconds_per_step --> 1.738 | [2024-01-02 21:55:28,561][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). 
Stopping 1 dataloader workers. [2024-01-02 21:55:28,562][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-02 21:57:39,177][Main][INFO] - [eval] Step 30000 out of 65536 | Loss --> 2.305 | Accuracy --> 0.595 | Time --> 130.669 | [2024-01-02 22:00:28,493][Main][INFO] - [train] Step 30100 out of 65536 | Loss --> 2.266 | Grad_l2 --> 0.281 | Weights_l2 --> 16306.693 | Lr --> 0.014 | Seconds_per_step --> 1.693 | [2024-01-02 22:03:21,488][Main][INFO] - [train] Step 30200 out of 65536 | Loss --> 2.247 | Grad_l2 --> 0.270 | Weights_l2 --> 16322.589 | Lr --> 0.014 | Seconds_per_step --> 1.730 | [2024-01-02 22:06:11,410][Main][INFO] - [train] Step 30300 out of 65536 | Loss --> 2.270 | Grad_l2 --> 0.284 | Weights_l2 --> 16338.783 | Lr --> 0.014 | Seconds_per_step --> 1.699 | [2024-01-02 22:08:58,658][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00008-of-00512.json.gz [2024-01-02 22:08:59,121][Main][INFO] - [train] Step 30400 out of 65536 | Loss --> 2.268 | Grad_l2 --> 0.280 | Weights_l2 --> 16354.834 | Lr --> 0.014 | Seconds_per_step --> 1.677 | [2024-01-02 22:09:38,033][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00121-of-00512.json.gz [2024-01-02 22:11:57,819][Main][INFO] - [train] Step 30500 out of 65536 | Loss --> 2.259 | Grad_l2 --> 0.281 | Weights_l2 --> 16370.619 | Lr --> 0.014 | Seconds_per_step --> 1.787 | 
[2024-01-02 22:14:45,639][Main][INFO] - [train] Step 30600 out of 65536 | Loss --> 2.248 | Grad_l2 --> 0.271 | Weights_l2 --> 16385.988 | Lr --> 0.014 | Seconds_per_step --> 1.678 | [2024-01-02 22:17:34,343][Main][INFO] - [train] Step 30700 out of 65536 | Loss --> 2.254 | Grad_l2 --> 0.270 | Weights_l2 --> 16401.478 | Lr --> 0.014 | Seconds_per_step --> 1.687 | [2024-01-02 22:20:24,583][Main][INFO] - [train] Step 30800 out of 65536 | Loss --> 2.268 | Grad_l2 --> 0.285 | Weights_l2 --> 16416.938 | Lr --> 0.014 | Seconds_per_step --> 1.702 | [2024-01-02 22:23:14,533][Main][INFO] - [train] Step 30900 out of 65536 | Loss --> 2.262 | Grad_l2 --> 0.278 | Weights_l2 --> 16432.423 | Lr --> 0.014 | Seconds_per_step --> 1.699 | [2024-01-02 22:26:05,144][Main][INFO] - [train] Step 31000 out of 65536 | Loss --> 2.236 | Grad_l2 --> 0.270 | Weights_l2 --> 16447.583 | Lr --> 0.014 | Seconds_per_step --> 1.706 | [2024-01-02 22:28:51,617][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00477-of-00512.json.gz [2024-01-02 22:28:56,393][Main][INFO] - [train] Step 31100 out of 65536 | Loss --> 2.236 | Grad_l2 --> 0.278 | Weights_l2 --> 16462.689 | Lr --> 0.014 | Seconds_per_step --> 1.712 | [2024-01-02 22:29:20,705][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00230-of-00512.json.gz [2024-01-02 22:31:46,231][Main][INFO] - [train] Step 31200 out of 65536 | Loss --> 2.219 | Grad_l2 --> 0.272 | Weights_l2 --> 16477.594 | Lr --> 0.014 | Seconds_per_step --> 1.698 | [2024-01-02 22:34:34,666][Main][INFO] - [train] Step 31300 out of 65536 | Loss --> 2.218 | Grad_l2 --> 0.273 | 
Weights_l2 --> 16492.417 | Lr --> 0.014 | Seconds_per_step --> 1.684 | [2024-01-02 22:37:23,187][Main][INFO] - [train] Step 31400 out of 65536 | Loss --> 2.242 | Grad_l2 --> 0.280 | Weights_l2 --> 16507.318 | Lr --> 0.014 | Seconds_per_step --> 1.685 | [2024-01-02 22:40:16,332][Main][INFO] - [train] Step 31500 out of 65536 | Loss --> 2.227 | Grad_l2 --> 0.276 | Weights_l2 --> 16521.998 | Lr --> 0.013 | Seconds_per_step --> 1.731 | [2024-01-02 22:43:06,205][Main][INFO] - [train] Step 31600 out of 65536 | Loss --> 2.238 | Grad_l2 --> 0.275 | Weights_l2 --> 16536.536 | Lr --> 0.013 | Seconds_per_step --> 1.699 | [2024-01-02 22:45:56,381][Main][INFO] - [train] Step 31700 out of 65536 | Loss --> 2.246 | Grad_l2 --> 0.275 | Weights_l2 --> 16550.986 | Lr --> 0.013 | Seconds_per_step --> 1.702 | [2024-01-02 22:48:45,105][Main][INFO] - [train] Step 31800 out of 65536 | Loss --> 2.239 | Grad_l2 --> 0.282 | Weights_l2 --> 16565.364 | Lr --> 0.013 | Seconds_per_step --> 1.687 | [2024-01-02 22:49:15,781][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00273-of-00512.json.gz [2024-01-02 22:49:28,221][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00249-of-00512.json.gz [2024-01-02 22:51:35,461][Main][INFO] - [train] Step 31900 out of 65536 | Loss --> 2.246 | Grad_l2 --> 0.279 | Weights_l2 --> 16579.675 | Lr --> 0.013 | Seconds_per_step --> 1.704 | [2024-01-02 22:54:28,150][Main][INFO] - [train] Step 32000 out of 65536 | Loss --> 2.247 | Grad_l2 --> 0.280 | Weights_l2 --> 16593.872 | Lr --> 0.013 | Seconds_per_step --> 1.727 | [2024-01-02 22:57:17,376][Main][INFO] - 
[train] Step 32100 out of 65536 | Loss --> 2.230 | Grad_l2 --> 0.295 | Weights_l2 --> 16608.055 | Lr --> 0.013 | Seconds_per_step --> 1.692 | [2024-01-02 23:00:14,088][Main][INFO] - [train] Step 32200 out of 65536 | Loss --> 2.236 | Grad_l2 --> 0.291 | Weights_l2 --> 16622.165 | Lr --> 0.013 | Seconds_per_step --> 1.767 | [2024-01-02 23:03:01,993][Main][INFO] - [train] Step 32300 out of 65536 | Loss --> 2.230 | Grad_l2 --> 0.279 | Weights_l2 --> 16636.046 | Lr --> 0.013 | Seconds_per_step --> 1.679 | [2024-01-02 23:05:51,292][Main][INFO] - [train] Step 32400 out of 65536 | Loss --> 2.244 | Grad_l2 --> 0.280 | Weights_l2 --> 16649.850 | Lr --> 0.013 | Seconds_per_step --> 1.693 | [2024-01-02 23:08:41,925][Main][INFO] - [train] Step 32500 out of 65536 | Loss --> 2.234 | Grad_l2 --> 0.277 | Weights_l2 --> 16663.578 | Lr --> 0.013 | Seconds_per_step --> 1.706 | [2024-01-02 23:09:00,196][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00500-of-00512.json.gz [2024-01-02 23:09:08,888][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00000-of-00512.json.gz [2024-01-02 23:11:34,392][Main][INFO] - [train] Step 32600 out of 65536 | Loss --> 2.210 | Grad_l2 --> 0.285 | Weights_l2 --> 16677.233 | Lr --> 0.013 | Seconds_per_step --> 1.725 | [2024-01-02 23:14:23,458][Main][INFO] - [train] Step 32700 out of 65536 | Loss --> 2.229 | Grad_l2 --> 0.289 | Weights_l2 --> 16690.735 | Lr --> 0.013 | Seconds_per_step --> 1.691 | [2024-01-02 23:17:13,334][Main][INFO] - [train] Step 32800 out of 65536 | Loss --> 2.220 | Grad_l2 --> 0.278 | Weights_l2 --> 16704.177 | Lr --> 0.013 | 
Seconds_per_step --> 1.699 | [2024-01-02 23:20:12,953][Main][INFO] - [train] Step 32900 out of 65536 | Loss --> 2.220 | Grad_l2 --> 0.274 | Weights_l2 --> 16717.512 | Lr --> 0.013 | Seconds_per_step --> 1.796 | [2024-01-02 23:23:04,208][Main][INFO] - [train] Step 33000 out of 65536 | Loss --> 2.225 | Grad_l2 --> 0.284 | Weights_l2 --> 16730.809 | Lr --> 0.013 | Seconds_per_step --> 1.713 | [2024-01-02 23:25:52,391][Main][INFO] - [train] Step 33100 out of 65536 | Loss --> 2.223 | Grad_l2 --> 0.283 | Weights_l2 --> 16744.049 | Lr --> 0.013 | Seconds_per_step --> 1.682 | [2024-01-02 23:28:46,134][Main][INFO] - [train] Step 33200 out of 65536 | Loss --> 2.212 | Grad_l2 --> 0.278 | Weights_l2 --> 16757.175 | Lr --> 0.013 | Seconds_per_step --> 1.737 | [2024-01-02 23:29:20,478][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00507-of-00512.json.gz [2024-01-02 23:29:51,602][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00352-of-00512.json.gz [2024-01-02 23:31:35,551][Main][INFO] - [train] Step 33300 out of 65536 | Loss --> 2.206 | Grad_l2 --> 0.283 | Weights_l2 --> 16770.140 | Lr --> 0.013 | Seconds_per_step --> 1.694 | [2024-01-02 23:34:26,651][Main][INFO] - [train] Step 33400 out of 65536 | Loss --> 2.195 | Grad_l2 --> 0.289 | Weights_l2 --> 16783.056 | Lr --> 0.012 | Seconds_per_step --> 1.711 | [2024-01-02 23:37:15,247][Main][INFO] - [train] Step 33500 out of 65536 | Loss --> 2.201 | Grad_l2 --> 0.283 | Weights_l2 --> 16795.989 | Lr --> 0.012 | Seconds_per_step --> 1.686 | [2024-01-02 23:40:04,555][Main][INFO] - [train] Step 33600 out of 65536 | Loss --> 
2.198 | Grad_l2 --> 0.283 | Weights_l2 --> 16808.647 | Lr --> 0.012 | Seconds_per_step --> 1.693 | [2024-01-02 23:42:55,093][Main][INFO] - [train] Step 33700 out of 65536 | Loss --> 2.205 | Grad_l2 --> 0.280 | Weights_l2 --> 16821.247 | Lr --> 0.012 | Seconds_per_step --> 1.705 | [2024-01-02 23:45:46,415][Main][INFO] - [train] Step 33800 out of 65536 | Loss --> 2.196 | Grad_l2 --> 0.285 | Weights_l2 --> 16833.817 | Lr --> 0.012 | Seconds_per_step --> 1.713 | [2024-01-02 23:48:36,020][Main][INFO] - [train] Step 33900 out of 65536 | Loss --> 2.204 | Grad_l2 --> 0.280 | Weights_l2 --> 16846.275 | Lr --> 0.012 | Seconds_per_step --> 1.696 | [2024-01-02 23:49:55,848][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00178-of-00512.json.gz [2024-01-02 23:50:00,390][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00140-of-00512.json.gz [2024-01-02 23:51:29,982][Main][INFO] - [train] Step 34000 out of 65536 | Loss --> 2.182 | Grad_l2 --> 0.289 | Weights_l2 --> 16858.585 | Lr --> 0.012 | Seconds_per_step --> 1.740 | [2024-01-02 23:54:25,627][Main][INFO] - [train] Step 34100 out of 65536 | Loss --> 2.186 | Grad_l2 --> 0.281 | Weights_l2 --> 16870.781 | Lr --> 0.012 | Seconds_per_step --> 1.756 | [2024-01-02 23:57:13,266][Main][INFO] - [train] Step 34200 out of 65536 | Loss --> 2.186 | Grad_l2 --> 0.286 | Weights_l2 --> 16882.840 | Lr --> 0.012 | Seconds_per_step --> 1.676 | [2024-01-03 00:00:02,388][Main][INFO] - [train] Step 34300 out of 65536 | Loss --> 2.182 | Grad_l2 --> 0.283 | Weights_l2 --> 16894.808 | Lr --> 0.012 | Seconds_per_step --> 1.691 | [2024-01-03 
00:02:54,384][Main][INFO] - [train] Step 34400 out of 65536 | Loss --> 2.181 | Grad_l2 --> 0.303 | Weights_l2 --> 16907.026 | Lr --> 0.012 | Seconds_per_step --> 1.720 | [2024-01-03 00:05:41,895][Main][INFO] - [train] Step 34500 out of 65536 | Loss --> 2.157 | Grad_l2 --> 0.281 | Weights_l2 --> 16918.696 | Lr --> 0.012 | Seconds_per_step --> 1.675 | [2024-01-03 00:08:32,195][Main][INFO] - [train] Step 34600 out of 65536 | Loss --> 2.181 | Grad_l2 --> 0.282 | Weights_l2 --> 16930.414 | Lr --> 0.012 | Seconds_per_step --> 1.703 | [2024-01-03 00:09:53,632][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00105-of-00512.json.gz [2024-01-03 00:10:10,003][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00165-of-00512.json.gz [2024-01-03 00:11:23,789][Main][INFO] - [train] Step 34700 out of 65536 | Loss --> 2.196 | Grad_l2 --> 0.288 | Weights_l2 --> 16941.943 | Lr --> 0.012 | Seconds_per_step --> 1.716 | [2024-01-03 00:14:13,982][Main][INFO] - [train] Step 34800 out of 65536 | Loss --> 2.182 | Grad_l2 --> 0.280 | Weights_l2 --> 16953.472 | Lr --> 0.012 | Seconds_per_step --> 1.702 | [2024-01-03 00:17:05,745][Main][INFO] - [train] Step 34900 out of 65536 | Loss --> 2.168 | Grad_l2 --> 0.289 | Weights_l2 --> 16964.750 | Lr --> 0.012 | Seconds_per_step --> 1.718 | [2024-01-03 00:19:55,498][Main][INFO] - [train] Step 35000 out of 65536 | Loss --> 2.166 | Grad_l2 --> 0.281 | Weights_l2 --> 16976.026 | Lr --> 0.012 | Seconds_per_step --> 1.698 | [2024-01-03 00:19:55,547][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). 
Stopping 1 dataloader workers. [2024-01-03 00:19:55,548][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-03 00:22:06,691][Main][INFO] - [eval] Step 35000 out of 65536 | Loss --> 2.212 | Accuracy --> 0.608 | Time --> 131.191 | [2024-01-03 00:24:55,807][Main][INFO] - [train] Step 35100 out of 65536 | Loss --> 2.165 | Grad_l2 --> 0.282 | Weights_l2 --> 16987.239 | Lr --> 0.012 | Seconds_per_step --> 1.691 | [2024-01-03 00:27:47,004][Main][INFO] - [train] Step 35200 out of 65536 | Loss --> 2.169 | Grad_l2 --> 0.283 | Weights_l2 --> 16998.328 | Lr --> 0.011 | Seconds_per_step --> 1.712 | [2024-01-03 00:30:36,343][Main][INFO] - [train] Step 35300 out of 65536 | Loss --> 2.163 | Grad_l2 --> 0.287 | Weights_l2 --> 17009.469 | Lr --> 0.011 | Seconds_per_step --> 1.693 | [2024-01-03 00:32:23,771][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00378-of-00512.json.gz [2024-01-03 00:33:08,170][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00464-of-00512.json.gz [2024-01-03 00:33:30,575][Main][INFO] - [train] Step 35400 out of 65536 | Loss --> 2.176 | Grad_l2 --> 0.285 | Weights_l2 --> 17020.410 | Lr --> 0.011 | Seconds_per_step --> 1.742 | [2024-01-03 00:36:18,391][Main][INFO] - [train] Step 35500 out of 65536 | Loss --> 2.160 | Grad_l2 --> 0.293 | Weights_l2 --> 17031.153 | Lr --> 0.011 | Seconds_per_step --> 1.678 | 
[2024-01-03 00:39:11,785][Main][INFO] - [train] Step 35600 out of 65536 | Loss --> 2.173 | Grad_l2 --> 0.295 | Weights_l2 --> 17041.871 | Lr --> 0.011 | Seconds_per_step --> 1.734 | [2024-01-03 00:42:01,666][Main][INFO] - [train] Step 35700 out of 65536 | Loss --> 2.189 | Grad_l2 --> 0.289 | Weights_l2 --> 17052.524 | Lr --> 0.011 | Seconds_per_step --> 1.699 | [2024-01-03 00:44:50,615][Main][INFO] - [train] Step 35800 out of 65536 | Loss --> 2.182 | Grad_l2 --> 0.285 | Weights_l2 --> 17062.917 | Lr --> 0.011 | Seconds_per_step --> 1.689 | [2024-01-03 00:47:44,325][Main][INFO] - [train] Step 35900 out of 65536 | Loss --> 2.173 | Grad_l2 --> 0.292 | Weights_l2 --> 17073.414 | Lr --> 0.011 | Seconds_per_step --> 1.737 | [2024-01-03 00:50:32,752][Main][INFO] - [train] Step 36000 out of 65536 | Loss --> 2.158 | Grad_l2 --> 0.288 | Weights_l2 --> 17083.724 | Lr --> 0.011 | Seconds_per_step --> 1.684 | [2024-01-03 00:52:30,653][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00010-of-00512.json.gz [2024-01-03 00:53:24,107][Main][INFO] - [train] Step 36100 out of 65536 | Loss --> 2.166 | Grad_l2 --> 0.287 | Weights_l2 --> 17093.965 | Lr --> 0.011 | Seconds_per_step --> 1.714 | [2024-01-03 00:53:29,335][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00019-of-00512.json.gz [2024-01-03 00:56:15,503][Main][INFO] - [train] Step 36200 out of 65536 | Loss --> 2.176 | Grad_l2 --> 0.315 | Weights_l2 --> 17104.358 | Lr --> 0.011 | Seconds_per_step --> 1.714 | [2024-01-03 00:59:04,351][Main][INFO] - [train] Step 36300 out of 65536 | Loss --> 2.170 | Grad_l2 --> 0.299 | 
Weights_l2 --> 17114.695 | Lr --> 0.011 | Seconds_per_step --> 1.688 | [2024-01-03 01:01:54,068][Main][INFO] - [train] Step 36400 out of 65536 | Loss --> 2.173 | Grad_l2 --> 0.294 | Weights_l2 --> 17124.837 | Lr --> 0.011 | Seconds_per_step --> 1.697 | [2024-01-03 01:04:43,248][Main][INFO] - [train] Step 36500 out of 65536 | Loss --> 2.161 | Grad_l2 --> 0.294 | Weights_l2 --> 17134.826 | Lr --> 0.011 | Seconds_per_step --> 1.692 | [2024-01-03 01:07:34,614][Main][INFO] - [train] Step 36600 out of 65536 | Loss --> 2.170 | Grad_l2 --> 0.291 | Weights_l2 --> 17144.589 | Lr --> 0.011 | Seconds_per_step --> 1.714 | [2024-01-03 01:10:25,235][Main][INFO] - [train] Step 36700 out of 65536 | Loss --> 2.153 | Grad_l2 --> 0.282 | Weights_l2 --> 17154.303 | Lr --> 0.011 | Seconds_per_step --> 1.706 | [2024-01-03 01:12:28,124][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00471-of-00512.json.gz [2024-01-03 01:13:15,391][Main][INFO] - [train] Step 36800 out of 65536 | Loss --> 2.157 | Grad_l2 --> 0.295 | Weights_l2 --> 17163.970 | Lr --> 0.011 | Seconds_per_step --> 1.702 | [2024-01-03 01:13:28,789][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00424-of-00512.json.gz [2024-01-03 01:16:06,386][Main][INFO] - [train] Step 36900 out of 65536 | Loss --> 2.152 | Grad_l2 --> 0.284 | Weights_l2 --> 17173.483 | Lr --> 0.010 | Seconds_per_step --> 1.710 | [2024-01-03 01:18:57,140][Main][INFO] - [train] Step 37000 out of 65536 | Loss --> 2.146 | Grad_l2 --> 0.288 | Weights_l2 --> 17182.887 | Lr --> 0.010 | Seconds_per_step --> 1.708 | [2024-01-03 01:21:45,249][Main][INFO] - 
[train] Step 37100 out of 65536 | Loss --> 2.149 | Grad_l2 --> 0.287 | Weights_l2 --> 17192.265 | Lr --> 0.010 | Seconds_per_step --> 1.681 | [2024-01-03 01:24:35,753][Main][INFO] - [train] Step 37200 out of 65536 | Loss --> 2.153 | Grad_l2 --> 0.291 | Weights_l2 --> 17201.602 | Lr --> 0.010 | Seconds_per_step --> 1.705 | [2024-01-03 01:27:24,424][Main][INFO] - [train] Step 37300 out of 65536 | Loss --> 2.158 | Grad_l2 --> 0.286 | Weights_l2 --> 17210.711 | Lr --> 0.010 | Seconds_per_step --> 1.687 | [2024-01-03 01:30:15,540][Main][INFO] - [train] Step 37400 out of 65536 | Loss --> 2.150 | Grad_l2 --> 0.287 | Weights_l2 --> 17219.733 | Lr --> 0.010 | Seconds_per_step --> 1.711 | [2024-01-03 01:32:46,451][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00053-of-00512.json.gz [2024-01-03 01:33:11,357][Main][INFO] - [train] Step 37500 out of 65536 | Loss --> 2.145 | Grad_l2 --> 0.302 | Weights_l2 --> 17228.918 | Lr --> 0.010 | Seconds_per_step --> 1.758 | [2024-01-03 01:33:44,213][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00063-of-00512.json.gz [2024-01-03 01:36:02,645][Main][INFO] - [train] Step 37600 out of 65536 | Loss --> 2.122 | Grad_l2 --> 0.289 | Weights_l2 --> 17237.709 | Lr --> 0.010 | Seconds_per_step --> 1.713 | [2024-01-03 01:38:51,472][Main][INFO] - [train] Step 37700 out of 65536 | Loss --> 2.108 | Grad_l2 --> 0.297 | Weights_l2 --> 17246.470 | Lr --> 0.010 | Seconds_per_step --> 1.688 | [2024-01-03 01:41:41,257][Main][INFO] - [train] Step 37800 out of 65536 | Loss --> 2.139 | Grad_l2 --> 0.297 | Weights_l2 --> 17255.070 | Lr --> 0.010 | 
Seconds_per_step --> 1.698 | [2024-01-03 01:44:30,334][Main][INFO] - [train] Step 37900 out of 65536 | Loss --> 2.123 | Grad_l2 --> 0.292 | Weights_l2 --> 17263.616 | Lr --> 0.010 | Seconds_per_step --> 1.691 | [2024-01-03 01:47:24,146][Main][INFO] - [train] Step 38000 out of 65536 | Loss --> 2.133 | Grad_l2 --> 0.285 | Weights_l2 --> 17272.094 | Lr --> 0.010 | Seconds_per_step --> 1.738 | [2024-01-03 01:50:11,700][Main][INFO] - [train] Step 38100 out of 65536 | Loss --> 2.138 | Grad_l2 --> 0.291 | Weights_l2 --> 17280.525 | Lr --> 0.010 | Seconds_per_step --> 1.676 | [2024-01-03 01:52:35,521][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00200-of-00512.json.gz [2024-01-03 01:53:03,146][Main][INFO] - [train] Step 38200 out of 65536 | Loss --> 2.134 | Grad_l2 --> 0.293 | Weights_l2 --> 17288.890 | Lr --> 0.010 | Seconds_per_step --> 1.714 | [2024-01-03 01:53:19,543][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00266-of-00512.json.gz [2024-01-03 01:55:53,439][Main][INFO] - [train] Step 38300 out of 65536 | Loss --> 2.138 | Grad_l2 --> 0.282 | Weights_l2 --> 17297.026 | Lr --> 0.010 | Seconds_per_step --> 1.703 | [2024-01-03 01:58:41,726][Main][INFO] - [train] Step 38400 out of 65536 | Loss --> 2.140 | Grad_l2 --> 0.288 | Weights_l2 --> 17305.150 | Lr --> 0.010 | Seconds_per_step --> 1.683 | [2024-01-03 02:01:31,764][Main][INFO] - [train] Step 38500 out of 65536 | Loss --> 2.126 | Grad_l2 --> 0.294 | Weights_l2 --> 17313.261 | Lr --> 0.010 | Seconds_per_step --> 1.700 | [2024-01-03 02:04:21,253][Main][INFO] - [train] Step 38600 out of 65536 | Loss --> 
2.123 | Grad_l2 --> 0.294 | Weights_l2 --> 17321.197 | Lr --> 0.010 | Seconds_per_step --> 1.695 | [2024-01-03 02:07:11,587][Main][INFO] - [train] Step 38700 out of 65536 | Loss --> 2.101 | Grad_l2 --> 0.294 | Weights_l2 --> 17329.156 | Lr --> 0.009 | Seconds_per_step --> 1.703 | [2024-01-03 02:10:03,544][Main][INFO] - [train] Step 38800 out of 65536 | Loss --> 2.103 | Grad_l2 --> 0.284 | Weights_l2 --> 17336.994 | Lr --> 0.009 | Seconds_per_step --> 1.720 | [2024-01-03 02:12:56,742][Main][INFO] - [train] Step 38900 out of 65536 | Loss --> 2.099 | Grad_l2 --> 0.293 | Weights_l2 --> 17344.697 | Lr --> 0.009 | Seconds_per_step --> 1.732 | [2024-01-03 02:13:17,424][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00144-of-00512.json.gz [2024-01-03 02:13:33,791][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00386-of-00512.json.gz [2024-01-03 02:15:46,711][Main][INFO] - [train] Step 39000 out of 65536 | Loss --> 2.110 | Grad_l2 --> 0.285 | Weights_l2 --> 17352.333 | Lr --> 0.009 | Seconds_per_step --> 1.700 | [2024-01-03 02:18:37,160][Main][INFO] - [train] Step 39100 out of 65536 | Loss --> 2.110 | Grad_l2 --> 0.283 | Weights_l2 --> 17359.887 | Lr --> 0.009 | Seconds_per_step --> 1.704 | [2024-01-03 02:21:26,524][Main][INFO] - [train] Step 39200 out of 65536 | Loss --> 2.117 | Grad_l2 --> 0.287 | Weights_l2 --> 17367.262 | Lr --> 0.009 | Seconds_per_step --> 1.694 | [2024-01-03 02:24:16,638][Main][INFO] - [train] Step 39300 out of 65536 | Loss --> 2.115 | Grad_l2 --> 0.291 | Weights_l2 --> 17374.575 | Lr --> 0.009 | Seconds_per_step --> 1.701 | [2024-01-03 
02:27:04,394][Main][INFO] - [train] Step 39400 out of 65536 | Loss --> 2.117 | Grad_l2 --> 0.285 | Weights_l2 --> 17381.807 | Lr --> 0.009 | Seconds_per_step --> 1.678 | [2024-01-03 02:29:53,317][Main][INFO] - [train] Step 39500 out of 65536 | Loss --> 2.113 | Grad_l2 --> 0.289 | Weights_l2 --> 17388.990 | Lr --> 0.009 | Seconds_per_step --> 1.689 | [2024-01-03 02:32:44,181][Main][INFO] - [train] Step 39600 out of 65536 | Loss --> 2.105 | Grad_l2 --> 0.285 | Weights_l2 --> 17396.028 | Lr --> 0.009 | Seconds_per_step --> 1.709 | [2024-01-03 02:33:25,310][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00302-of-00512.json.gz [2024-01-03 02:33:32,914][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00057-of-00512.json.gz [2024-01-03 02:35:35,864][Main][INFO] - [train] Step 39700 out of 65536 | Loss --> 2.105 | Grad_l2 --> 0.286 | Weights_l2 --> 17402.992 | Lr --> 0.009 | Seconds_per_step --> 1.717 | [2024-01-03 02:38:24,656][Main][INFO] - [train] Step 39800 out of 65536 | Loss --> 2.093 | Grad_l2 --> 0.294 | Weights_l2 --> 17409.980 | Lr --> 0.009 | Seconds_per_step --> 1.688 | [2024-01-03 02:41:13,447][Main][INFO] - [train] Step 39900 out of 65536 | Loss --> 2.089 | Grad_l2 --> 0.290 | Weights_l2 --> 17416.890 | Lr --> 0.009 | Seconds_per_step --> 1.688 | [2024-01-03 02:44:03,925][Main][INFO] - [train] Step 40000 out of 65536 | Loss --> 2.081 | Grad_l2 --> 0.293 | Weights_l2 --> 17423.653 | Lr --> 0.009 | Seconds_per_step --> 1.705 | [2024-01-03 02:44:03,973][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). 
Stopping 1 dataloader workers. [2024-01-03 02:44:03,974][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-03 02:46:15,123][Main][INFO] - [eval] Step 40000 out of 65536 | Loss --> 2.132 | Accuracy --> 0.618 | Time --> 131.196 | [2024-01-03 02:46:15,127][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-40000 [2024-01-03 02:46:15,130][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading [2024-01-03 02:46:17,851][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-40000/model.safetensors [2024-01-03 02:46:22,202][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-40000/optimizer.bin [2024-01-03 02:46:22,203][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-40000/scheduler.bin [2024-01-03 02:46:22,203][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-40000/sampler.bin [2024-01-03 02:46:22,203][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-40000/sampler_1.bin [2024-01-03 02:46:22,205][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-40000/random_states_0.pkl [2024-01-03 02:49:13,051][Main][INFO] - [train] Step 40100 out of 65536 | Loss --> 2.088 | Grad_l2 --> 0.291 | Weights_l2 --> 17430.295 | Lr --> 0.009 | Seconds_per_step --> 1.779 | [2024-01-03 02:52:01,035][Main][INFO] - [train] Step 40200 out of 65536 | Loss --> 2.083 | Grad_l2 --> 0.283 | Weights_l2 --> 17436.882 | Lr --> 0.009 | Seconds_per_step --> 1.680 | [2024-01-03 02:54:50,316][Main][INFO] - [train] Step 
40300 out of 65536 | Loss --> 2.085 | Grad_l2 --> 0.292 | Weights_l2 --> 17443.444 | Lr --> 0.009 | Seconds_per_step --> 1.693 | [2024-01-03 02:55:27,847][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00346-of-00512.json.gz [2024-01-03 02:55:33,883][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00329-of-00512.json.gz [2024-01-03 02:57:40,855][Main][INFO] - [train] Step 40400 out of 65536 | Loss --> 2.086 | Grad_l2 --> 0.289 | Weights_l2 --> 17449.908 | Lr --> 0.009 | Seconds_per_step --> 1.705 | [2024-01-03 03:00:32,757][Main][INFO] - [train] Step 40500 out of 65536 | Loss --> 2.090 | Grad_l2 --> 0.293 | Weights_l2 --> 17456.234 | Lr --> 0.008 | Seconds_per_step --> 1.719 | [2024-01-03 03:03:23,942][Main][INFO] - [train] Step 40600 out of 65536 | Loss --> 2.107 | Grad_l2 --> 0.289 | Weights_l2 --> 17462.468 | Lr --> 0.008 | Seconds_per_step --> 1.712 | [2024-01-03 03:06:14,293][Main][INFO] - [train] Step 40700 out of 65536 | Loss --> 2.083 | Grad_l2 --> 0.289 | Weights_l2 --> 17468.697 | Lr --> 0.008 | Seconds_per_step --> 1.704 | [2024-01-03 03:09:03,839][Main][INFO] - [train] Step 40800 out of 65536 | Loss --> 2.094 | Grad_l2 --> 0.291 | Weights_l2 --> 17474.825 | Lr --> 0.008 | Seconds_per_step --> 1.695 | [2024-01-03 03:11:52,679][Main][INFO] - [train] Step 40900 out of 65536 | Loss --> 2.081 | Grad_l2 --> 0.288 | Weights_l2 --> 17480.865 | Lr --> 0.008 | Seconds_per_step --> 1.688 | [2024-01-03 03:14:45,179][Main][INFO] - [train] Step 41000 out of 65536 | Loss --> 2.078 | Grad_l2 --> 0.290 | Weights_l2 --> 17486.805 | Lr --> 0.008 | 
Seconds_per_step --> 1.725 | [2024-01-03 03:15:44,044][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00481-of-00512.json.gz [2024-01-03 03:16:16,067][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00158-of-00512.json.gz [2024-01-03 03:17:36,448][Main][INFO] - [train] Step 41100 out of 65536 | Loss --> 2.070 | Grad_l2 --> 0.288 | Weights_l2 --> 17492.712 | Lr --> 0.008 | Seconds_per_step --> 1.713 | [2024-01-03 03:20:27,922][Main][INFO] - [train] Step 41200 out of 65536 | Loss --> 2.083 | Grad_l2 --> 0.286 | Weights_l2 --> 17498.521 | Lr --> 0.008 | Seconds_per_step --> 1.715 | [2024-01-03 03:23:17,312][Main][INFO] - [train] Step 41300 out of 65536 | Loss --> 2.088 | Grad_l2 --> 0.292 | Weights_l2 --> 17504.282 | Lr --> 0.008 | Seconds_per_step --> 1.694 | [2024-01-03 03:26:04,569][Main][INFO] - [train] Step 41400 out of 65536 | Loss --> 2.085 | Grad_l2 --> 0.297 | Weights_l2 --> 17510.014 | Lr --> 0.008 | Seconds_per_step --> 1.673 | [2024-01-03 03:28:54,767][Main][INFO] - [train] Step 41500 out of 65536 | Loss --> 2.095 | Grad_l2 --> 0.289 | Weights_l2 --> 17515.610 | Lr --> 0.008 | Seconds_per_step --> 1.702 | [2024-01-03 03:31:44,389][Main][INFO] - [train] Step 41600 out of 65536 | Loss --> 2.074 | Grad_l2 --> 0.294 | Weights_l2 --> 17521.144 | Lr --> 0.008 | Seconds_per_step --> 1.696 | [2024-01-03 03:34:34,298][Main][INFO] - [train] Step 41700 out of 65536 | Loss --> 2.097 | Grad_l2 --> 0.292 | Weights_l2 --> 17526.615 | Lr --> 0.008 | Seconds_per_step --> 1.699 | [2024-01-03 
03:35:42,352][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00075-of-00512.json.gz [2024-01-03 03:36:20,206][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00072-of-00512.json.gz [2024-01-03 03:37:27,082][Main][INFO] - [train] Step 41800 out of 65536 | Loss --> 2.070 | Grad_l2 --> 0.284 | Weights_l2 --> 17531.987 | Lr --> 0.008 | Seconds_per_step --> 1.728 | [2024-01-03 03:40:18,101][Main][INFO] - [train] Step 41900 out of 65536 | Loss --> 2.077 | Grad_l2 --> 0.288 | Weights_l2 --> 17537.343 | Lr --> 0.008 | Seconds_per_step --> 1.710 | [2024-01-03 03:43:08,557][Main][INFO] - [train] Step 42000 out of 65536 | Loss --> 2.068 | Grad_l2 --> 0.289 | Weights_l2 --> 17542.586 | Lr --> 0.008 | Seconds_per_step --> 1.705 | [2024-01-03 03:45:57,998][Main][INFO] - [train] Step 42100 out of 65536 | Loss --> 2.068 | Grad_l2 --> 0.293 | Weights_l2 --> 17547.789 | Lr --> 0.008 | Seconds_per_step --> 1.694 | [2024-01-03 03:48:51,875][Main][INFO] - [train] Step 42200 out of 65536 | Loss --> 2.062 | Grad_l2 --> 0.286 | Weights_l2 --> 17552.852 | Lr --> 0.008 | Seconds_per_step --> 1.739 | [2024-01-03 03:51:40,516][Main][INFO] - [train] Step 42300 out of 65536 | Loss --> 2.075 | Grad_l2 --> 0.289 | Weights_l2 --> 17557.897 | Lr --> 0.007 | Seconds_per_step --> 1.686 | [2024-01-03 03:54:28,632][Main][INFO] - [train] Step 42400 out of 65536 | Loss --> 2.073 | Grad_l2 --> 0.287 | Weights_l2 --> 17562.835 | Lr --> 0.007 | Seconds_per_step --> 1.681 | [2024-01-03 
03:55:39,567][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00285-of-00512.json.gz [2024-01-03 03:55:58,159][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00062-of-00512.json.gz [2024-01-03 03:57:20,119][Main][INFO] - [train] Step 42500 out of 65536 | Loss --> 2.070 | Grad_l2 --> 0.290 | Weights_l2 --> 17567.733 | Lr --> 0.007 | Seconds_per_step --> 1.715 | [2024-01-03 04:00:11,674][Main][INFO] - [train] Step 42600 out of 65536 | Loss --> 2.039 | Grad_l2 --> 0.285 | Weights_l2 --> 17572.547 | Lr --> 0.007 | Seconds_per_step --> 1.716 | [2024-01-03 04:03:02,269][Main][INFO] - [train] Step 42700 out of 65536 | Loss --> 2.050 | Grad_l2 --> 0.289 | Weights_l2 --> 17577.315 | Lr --> 0.007 | Seconds_per_step --> 1.706 | [2024-01-03 04:05:51,733][Main][INFO] - [train] Step 42800 out of 65536 | Loss --> 2.030 | Grad_l2 --> 0.291 | Weights_l2 --> 17581.978 | Lr --> 0.007 | Seconds_per_step --> 1.694 | [2024-01-03 04:08:39,804][Main][INFO] - [train] Step 42900 out of 65536 | Loss --> 2.043 | Grad_l2 --> 0.293 | Weights_l2 --> 17586.586 | Lr --> 0.007 | Seconds_per_step --> 1.681 | [2024-01-03 04:11:33,564][Main][INFO] - [train] Step 43000 out of 65536 | Loss --> 2.049 | Grad_l2 --> 0.289 | Weights_l2 --> 17591.143 | Lr --> 0.007 | Seconds_per_step --> 1.738 | [2024-01-03 04:14:23,750][Main][INFO] - [train] Step 43100 out of 65536 | Loss --> 2.039 | Grad_l2 --> 0.289 | Weights_l2 --> 17595.617 | Lr --> 0.007 | Seconds_per_step --> 1.702 | [2024-01-03 
04:16:15,288][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00039-of-00512.json.gz [2024-01-03 04:16:27,714][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00154-of-00512.json.gz [2024-01-03 04:17:19,029][Main][INFO] - [train] Step 43200 out of 65536 | Loss --> 2.052 | Grad_l2 --> 0.289 | Weights_l2 --> 17599.983 | Lr --> 0.007 | Seconds_per_step --> 1.753 | [2024-01-03 04:20:07,567][Main][INFO] - [train] Step 43300 out of 65536 | Loss --> 2.054 | Grad_l2 --> 0.289 | Weights_l2 --> 17604.349 | Lr --> 0.007 | Seconds_per_step --> 1.685 | [2024-01-03 04:22:58,386][Main][INFO] - [train] Step 43400 out of 65536 | Loss --> 2.054 | Grad_l2 --> 0.288 | Weights_l2 --> 17608.604 | Lr --> 0.007 | Seconds_per_step --> 1.708 | [2024-01-03 04:25:46,523][Main][INFO] - [train] Step 43500 out of 65536 | Loss --> 2.042 | Grad_l2 --> 0.287 | Weights_l2 --> 17612.800 | Lr --> 0.007 | Seconds_per_step --> 1.681 | [2024-01-03 04:28:33,755][Main][INFO] - [train] Step 43600 out of 65536 | Loss --> 2.040 | Grad_l2 --> 0.291 | Weights_l2 --> 17616.925 | Lr --> 0.007 | Seconds_per_step --> 1.672 | [2024-01-03 04:31:26,272][Main][INFO] - [train] Step 43700 out of 65536 | Loss --> 2.051 | Grad_l2 --> 0.284 | Weights_l2 --> 17621.023 | Lr --> 0.007 | Seconds_per_step --> 1.725 | [2024-01-03 04:34:15,461][Main][INFO] - [train] Step 43800 out of 65536 | Loss --> 2.047 | Grad_l2 --> 0.285 | Weights_l2 --> 17625.027 | Lr --> 0.007 | Seconds_per_step --> 1.692 | [2024-01-03 
04:35:48,811][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00286-of-00512.json.gz [2024-01-03 04:36:05,765][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00343-of-00512.json.gz [2024-01-03 04:37:05,710][Main][INFO] - [train] Step 43900 out of 65536 | Loss --> 2.051 | Grad_l2 --> 0.288 | Weights_l2 --> 17629.009 | Lr --> 0.007 | Seconds_per_step --> 1.702 | [2024-01-03 04:39:55,076][Main][INFO] - [train] Step 44000 out of 65536 | Loss --> 2.042 | Grad_l2 --> 0.290 | Weights_l2 --> 17632.911 | Lr --> 0.007 | Seconds_per_step --> 1.694 | [2024-01-03 04:42:46,854][Main][INFO] - [train] Step 44100 out of 65536 | Loss --> 2.046 | Grad_l2 --> 0.287 | Weights_l2 --> 17636.705 | Lr --> 0.007 | Seconds_per_step --> 1.718 | [2024-01-03 04:45:34,449][Main][INFO] - [train] Step 44200 out of 65536 | Loss --> 2.037 | Grad_l2 --> 0.321 | Weights_l2 --> 17640.721 | Lr --> 0.006 | Seconds_per_step --> 1.676 | [2024-01-03 04:48:23,375][Main][INFO] - [train] Step 44300 out of 65536 | Loss --> 2.034 | Grad_l2 --> 0.289 | Weights_l2 --> 17644.460 | Lr --> 0.006 | Seconds_per_step --> 1.689 | [2024-01-03 04:51:14,841][Main][INFO] - [train] Step 44400 out of 65536 | Loss --> 2.024 | Grad_l2 --> 0.289 | Weights_l2 --> 17648.087 | Lr --> 0.006 | Seconds_per_step --> 1.715 | [2024-01-03 04:54:05,409][Main][INFO] - [train] Step 44500 out of 65536 | Loss --> 2.030 | Grad_l2 --> 0.285 | Weights_l2 --> 17651.628 | Lr --> 0.006 | Seconds_per_step --> 1.706 | [2024-01-03 
04:56:06,856][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00304-of-00512.json.gz [2024-01-03 04:56:18,130][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00253-of-00512.json.gz [2024-01-03 04:56:58,490][Main][INFO] - [train] Step 44600 out of 65536 | Loss --> 2.021 | Grad_l2 --> 0.283 | Weights_l2 --> 17655.141 | Lr --> 0.006 | Seconds_per_step --> 1.731 | [2024-01-03 04:59:46,362][Main][INFO] - [train] Step 44700 out of 65536 | Loss --> 2.020 | Grad_l2 --> 0.289 | Weights_l2 --> 17658.629 | Lr --> 0.006 | Seconds_per_step --> 1.679 | [2024-01-03 05:02:39,535][Main][INFO] - [train] Step 44800 out of 65536 | Loss --> 2.040 | Grad_l2 --> 0.288 | Weights_l2 --> 17661.997 | Lr --> 0.006 | Seconds_per_step --> 1.732 | [2024-01-03 05:05:29,989][Main][INFO] - [train] Step 44900 out of 65536 | Loss --> 2.030 | Grad_l2 --> 0.297 | Weights_l2 --> 17665.418 | Lr --> 0.006 | Seconds_per_step --> 1.704 | [2024-01-03 05:08:17,468][Main][INFO] - [train] Step 45000 out of 65536 | Loss --> 2.016 | Grad_l2 --> 0.285 | Weights_l2 --> 17668.698 | Lr --> 0.006 | Seconds_per_step --> 1.675 | [2024-01-03 05:08:17,541][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. 
[2024-01-03 05:08:17,542][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-03 05:10:28,442][Main][INFO] - [eval] Step 45000 out of 65536 | Loss --> 2.057 | Accuracy --> 0.628 | Time --> 130.950 | [2024-01-03 05:13:19,885][Main][INFO] - [train] Step 45100 out of 65536 | Loss --> 2.029 | Grad_l2 --> 0.294 | Weights_l2 --> 17671.989 | Lr --> 0.006 | Seconds_per_step --> 1.714 | [2024-01-03 05:16:07,939][Main][INFO] - [train] Step 45200 out of 65536 | Loss --> 2.008 | Grad_l2 --> 0.294 | Weights_l2 --> 17675.197 | Lr --> 0.006 | Seconds_per_step --> 1.681 | [2024-01-03 05:18:26,781][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00254-of-00512.json.gz [2024-01-03 05:18:46,937][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00262-of-00512.json.gz [2024-01-03 05:19:05,189][Main][INFO] - [train] Step 45300 out of 65536 | Loss --> 2.013 | Grad_l2 --> 0.287 | Weights_l2 --> 17678.299 | Lr --> 0.006 | Seconds_per_step --> 1.773 | [2024-01-03 05:21:54,220][Main][INFO] - [train] Step 45400 out of 65536 | Loss --> 2.026 | Grad_l2 --> 0.285 | Weights_l2 --> 17681.375 | Lr --> 0.006 | Seconds_per_step --> 1.690 | [2024-01-03 05:24:44,189][Main][INFO] - [train] Step 45500 out of 65536 | Loss --> 2.020 | Grad_l2 --> 0.287 | Weights_l2 --> 17684.421 | Lr --> 0.006 | Seconds_per_step --> 1.700 | [2024-01-03 
05:27:33,522][Main][INFO] - [train] Step 45600 out of 65536 | Loss --> 2.011 | Grad_l2 --> 0.286 | Weights_l2 --> 17687.395 | Lr --> 0.006 | Seconds_per_step --> 1.693 | [2024-01-03 05:30:23,633][Main][INFO] - [train] Step 45700 out of 65536 | Loss --> 2.011 | Grad_l2 --> 0.289 | Weights_l2 --> 17690.283 | Lr --> 0.006 | Seconds_per_step --> 1.701 | [2024-01-03 05:33:15,236][Main][INFO] - [train] Step 45800 out of 65536 | Loss --> 2.005 | Grad_l2 --> 0.292 | Weights_l2 --> 17693.136 | Lr --> 0.006 | Seconds_per_step --> 1.716 | [2024-01-03 05:36:04,189][Main][INFO] - [train] Step 45900 out of 65536 | Loss --> 2.032 | Grad_l2 --> 0.293 | Weights_l2 --> 17695.938 | Lr --> 0.006 | Seconds_per_step --> 1.690 | [2024-01-03 05:38:11,105][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00313-of-00512.json.gz [2024-01-03 05:38:31,551][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00118-of-00512.json.gz [2024-01-03 05:38:59,839][Main][INFO] - [train] Step 46000 out of 65536 | Loss --> 2.000 | Grad_l2 --> 0.287 | Weights_l2 --> 17698.712 | Lr --> 0.006 | Seconds_per_step --> 1.756 | [2024-01-03 05:41:50,228][Main][INFO] - [train] Step 46100 out of 65536 | Loss --> 2.010 | Grad_l2 --> 0.287 | Weights_l2 --> 17701.450 | Lr --> 0.005 | Seconds_per_step --> 1.704 | [2024-01-03 05:44:42,152][Main][INFO] - [train] Step 46200 out of 65536 | Loss --> 2.011 | Grad_l2 --> 0.294 | Weights_l2 --> 17704.118 | Lr --> 0.005 | Seconds_per_step --> 1.719 | [2024-01-03 05:47:32,102][Main][INFO] - [train] Step 46300 out of 65536 | Loss --> 1.998 | Grad_l2 --> 0.292 | Weights_l2 --> 
17706.793 | Lr --> 0.005 | Seconds_per_step --> 1.699 | [2024-01-03 05:50:22,715][Main][INFO] - [train] Step 46400 out of 65536 | Loss --> 2.001 | Grad_l2 --> 0.291 | Weights_l2 --> 17709.329 | Lr --> 0.005 | Seconds_per_step --> 1.706 | [2024-01-03 05:53:13,369][Main][INFO] - [train] Step 46500 out of 65536 | Loss --> 2.000 | Grad_l2 --> 0.287 | Weights_l2 --> 17711.866 | Lr --> 0.005 | Seconds_per_step --> 1.707 | [2024-01-03 05:56:03,716][Main][INFO] - [train] Step 46600 out of 65536 | Loss --> 2.001 | Grad_l2 --> 0.287 | Weights_l2 --> 17714.399 | Lr --> 0.005 | Seconds_per_step --> 1.703 | [2024-01-03 05:58:52,062][Main][INFO] - [train] Step 46700 out of 65536 | Loss --> 2.008 | Grad_l2 --> 0.286 | Weights_l2 --> 17716.786 | Lr --> 0.005 | Seconds_per_step --> 1.683 | [2024-01-03 05:58:58,949][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00300-of-00512.json.gz [2024-01-03 05:59:09,467][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00433-of-00512.json.gz [2024-01-03 06:01:47,301][Main][INFO] - [train] Step 46800 out of 65536 | Loss --> 1.989 | Grad_l2 --> 0.296 | Weights_l2 --> 17719.210 | Lr --> 0.005 | Seconds_per_step --> 1.752 | [2024-01-03 06:04:36,050][Main][INFO] - [train] Step 46900 out of 65536 | Loss --> 2.002 | Grad_l2 --> 0.293 | Weights_l2 --> 17721.550 | Lr --> 0.005 | Seconds_per_step --> 1.687 | [2024-01-03 06:07:26,427][Main][INFO] - [train] Step 47000 out of 65536 | Loss --> 1.992 | Grad_l2 --> 0.288 | Weights_l2 --> 17723.868 | Lr --> 0.005 | Seconds_per_step --> 1.704 | [2024-01-03 06:10:16,331][Main][INFO] - [train] Step 47100 
out of 65536 | Loss --> 1.989 | Grad_l2 --> 0.290 | Weights_l2 --> 17726.145 | Lr --> 0.005 | Seconds_per_step --> 1.699 | [2024-01-03 06:13:07,198][Main][INFO] - [train] Step 47200 out of 65536 | Loss --> 1.982 | Grad_l2 --> 0.283 | Weights_l2 --> 17728.351 | Lr --> 0.005 | Seconds_per_step --> 1.709 | [2024-01-03 06:15:56,648][Main][INFO] - [train] Step 47300 out of 65536 | Loss --> 1.984 | Grad_l2 --> 0.287 | Weights_l2 --> 17730.499 | Lr --> 0.005 | Seconds_per_step --> 1.694 | [2024-01-03 06:18:45,811][Main][INFO] - [train] Step 47400 out of 65536 | Loss --> 1.985 | Grad_l2 --> 0.286 | Weights_l2 --> 17732.626 | Lr --> 0.005 | Seconds_per_step --> 1.692 | [2024-01-03 06:19:24,189][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00379-of-00512.json.gz [2024-01-03 06:19:39,877][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00015-of-00512.json.gz [2024-01-03 06:21:38,536][Main][INFO] - [train] Step 47500 out of 65536 | Loss --> 1.983 | Grad_l2 --> 0.291 | Weights_l2 --> 17734.723 | Lr --> 0.005 | Seconds_per_step --> 1.727 | [2024-01-03 06:24:27,628][Main][INFO] - [train] Step 47600 out of 65536 | Loss --> 1.988 | Grad_l2 --> 0.290 | Weights_l2 --> 17736.780 | Lr --> 0.005 | Seconds_per_step --> 1.691 | [2024-01-03 06:27:15,767][Main][INFO] - [train] Step 47700 out of 65536 | Loss --> 1.986 | Grad_l2 --> 0.288 | Weights_l2 --> 17738.760 | Lr --> 0.005 | Seconds_per_step --> 1.681 | [2024-01-03 06:30:05,529][Main][INFO] - [train] Step 47800 out of 65536 | Loss --> 1.980 | Grad_l2 --> 0.286 | Weights_l2 --> 17740.690 | Lr --> 0.005 | Seconds_per_step --> 
1.698 | [2024-01-03 06:32:56,001][Main][INFO] - [train] Step 47900 out of 65536 | Loss --> 1.994 | Grad_l2 --> 0.289 | Weights_l2 --> 17742.631 | Lr --> 0.005 | Seconds_per_step --> 1.705 | [2024-01-03 06:35:44,297][Main][INFO] - [train] Step 48000 out of 65536 | Loss --> 1.974 | Grad_l2 --> 0.283 | Weights_l2 --> 17744.513 | Lr --> 0.005 | Seconds_per_step --> 1.683 | [2024-01-03 06:38:35,335][Main][INFO] - [train] Step 48100 out of 65536 | Loss --> 1.981 | Grad_l2 --> 0.291 | Weights_l2 --> 17746.355 | Lr --> 0.004 | Seconds_per_step --> 1.710 | [2024-01-03 06:39:14,259][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00128-of-00512.json.gz [2024-01-03 06:39:26,563][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00276-of-00512.json.gz [2024-01-03 06:41:24,140][Main][INFO] - [train] Step 48200 out of 65536 | Loss --> 1.968 | Grad_l2 --> 0.287 | Weights_l2 --> 17748.151 | Lr --> 0.004 | Seconds_per_step --> 1.688 | [2024-01-03 06:44:15,224][Main][INFO] - [train] Step 48300 out of 65536 | Loss --> 1.985 | Grad_l2 --> 0.283 | Weights_l2 --> 17749.913 | Lr --> 0.004 | Seconds_per_step --> 1.711 | [2024-01-03 06:47:07,918][Main][INFO] - [train] Step 48400 out of 65536 | Loss --> 1.961 | Grad_l2 --> 0.286 | Weights_l2 --> 17751.654 | Lr --> 0.004 | Seconds_per_step --> 1.727 | [2024-01-03 06:49:56,414][Main][INFO] - [train] Step 48500 out of 65536 | Loss --> 1.973 | Grad_l2 --> 0.285 | Weights_l2 --> 17753.328 | Lr --> 0.004 | Seconds_per_step --> 1.685 | [2024-01-03 06:52:47,869][Main][INFO] - [train] Step 48600 out of 65536 | Loss --> 1.980 | Grad_l2 --> 
0.283 | Weights_l2 --> 17754.972 | Lr --> 0.004 | Seconds_per_step --> 1.715 | [2024-01-03 06:55:37,206][Main][INFO] - [train] Step 48700 out of 65536 | Loss --> 1.986 | Grad_l2 --> 0.285 | Weights_l2 --> 17756.556 | Lr --> 0.004 | Seconds_per_step --> 1.693 | [2024-01-03 06:58:26,075][Main][INFO] - [train] Step 48800 out of 65536 | Loss --> 1.982 | Grad_l2 --> 0.286 | Weights_l2 --> 17758.144 | Lr --> 0.004 | Seconds_per_step --> 1.689 | [2024-01-03 06:59:36,491][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00148-of-00512.json.gz [2024-01-03 06:59:49,115][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00275-of-00512.json.gz [2024-01-03 07:01:21,837][Main][INFO] - [train] Step 48900 out of 65536 | Loss --> 1.986 | Grad_l2 --> 0.287 | Weights_l2 --> 17759.704 | Lr --> 0.004 | Seconds_per_step --> 1.758 | [2024-01-03 07:04:12,556][Main][INFO] - [train] Step 49000 out of 65536 | Loss --> 1.996 | Grad_l2 --> 0.285 | Weights_l2 --> 17761.236 | Lr --> 0.004 | Seconds_per_step --> 1.707 | [2024-01-03 07:07:03,578][Main][INFO] - [train] Step 49100 out of 65536 | Loss --> 1.980 | Grad_l2 --> 0.286 | Weights_l2 --> 17762.724 | Lr --> 0.004 | Seconds_per_step --> 1.710 | [2024-01-03 07:09:54,031][Main][INFO] - [train] Step 49200 out of 65536 | Loss --> 1.963 | Grad_l2 --> 0.283 | Weights_l2 --> 17764.181 | Lr --> 0.004 | Seconds_per_step --> 1.705 | [2024-01-03 07:12:44,444][Main][INFO] - [train] Step 49300 out of 65536 | Loss --> 1.966 | Grad_l2 --> 0.279 | Weights_l2 --> 17765.586 | Lr --> 0.004 | Seconds_per_step --> 1.704 | [2024-01-03 
07:15:33,649][Main][INFO] - [train] Step 49400 out of 65536 | Loss --> 1.978 | Grad_l2 --> 0.281 | Weights_l2 --> 17766.950 | Lr --> 0.004 | Seconds_per_step --> 1.692 | [2024-01-03 07:18:22,511][Main][INFO] - [train] Step 49500 out of 65536 | Loss --> 1.967 | Grad_l2 --> 0.285 | Weights_l2 --> 17768.285 | Lr --> 0.004 | Seconds_per_step --> 1.689 | [2024-01-03 07:19:25,709][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00318-of-00512.json.gz [2024-01-03 07:19:36,424][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00445-of-00512.json.gz [2024-01-03 07:21:13,513][Main][INFO] - [train] Step 49600 out of 65536 | Loss --> 1.958 | Grad_l2 --> 0.285 | Weights_l2 --> 17769.626 | Lr --> 0.004 | Seconds_per_step --> 1.710 | [2024-01-03 07:24:03,348][Main][INFO] - [train] Step 49700 out of 65536 | Loss --> 1.962 | Grad_l2 --> 0.282 | Weights_l2 --> 17770.912 | Lr --> 0.004 | Seconds_per_step --> 1.698 | [2024-01-03 07:26:54,995][Main][INFO] - [train] Step 49800 out of 65536 | Loss --> 1.955 | Grad_l2 --> 0.280 | Weights_l2 --> 17772.156 | Lr --> 0.004 | Seconds_per_step --> 1.716 | [2024-01-03 07:29:42,922][Main][INFO] - [train] Step 49900 out of 65536 | Loss --> 1.963 | Grad_l2 --> 0.285 | Weights_l2 --> 17773.425 | Lr --> 0.004 | Seconds_per_step --> 1.679 | [2024-01-03 07:32:32,227][Main][INFO] - [train] Step 50000 out of 65536 | Loss --> 1.945 | Grad_l2 --> 0.285 | Weights_l2 --> 17774.612 | Lr --> 0.004 | Seconds_per_step --> 1.693 | [2024-01-03 07:32:32,274][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). 
Stopping 1 dataloader workers. [2024-01-03 07:32:32,275][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-03 07:34:45,805][Main][INFO] - [eval] Step 50000 out of 65536 | Loss --> 1.993 | Accuracy --> 0.637 | Time --> 133.576 | [2024-01-03 07:37:36,452][Main][INFO] - [train] Step 50100 out of 65536 | Loss --> 1.960 | Grad_l2 --> 0.284 | Weights_l2 --> 17775.790 | Lr --> 0.004 | Seconds_per_step --> 1.706 | [2024-01-03 07:40:24,868][Main][INFO] - [train] Step 50200 out of 65536 | Loss --> 1.947 | Grad_l2 --> 0.284 | Weights_l2 --> 17776.924 | Lr --> 0.004 | Seconds_per_step --> 1.684 | [2024-01-03 07:41:50,457][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00358-of-00512.json.gz [2024-01-03 07:42:15,965][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00077-of-00512.json.gz [2024-01-03 07:43:16,085][Main][INFO] - [train] Step 50300 out of 65536 | Loss --> 1.959 | Grad_l2 --> 0.285 | Weights_l2 --> 17778.076 | Lr --> 0.003 | Seconds_per_step --> 1.712 | [2024-01-03 07:46:07,748][Main][INFO] - [train] Step 50400 out of 65536 | Loss --> 1.947 | Grad_l2 --> 0.285 | Weights_l2 --> 17779.151 | Lr --> 0.003 | Seconds_per_step --> 1.717 | [2024-01-03 07:49:05,278][Main][INFO] - [train] Step 50500 out of 65536 | Loss --> 1.943 | Grad_l2 --> 0.287 | Weights_l2 --> 17780.195 | Lr --> 0.003 | Seconds_per_step --> 1.775 | 
[2024-01-03 07:51:57,442][Main][INFO] - [train] Step 50600 out of 65536 | Loss --> 1.947 | Grad_l2 --> 0.283 | Weights_l2 --> 17781.220 | Lr --> 0.003 | Seconds_per_step --> 1.722 | [2024-01-03 07:54:46,848][Main][INFO] - [train] Step 50700 out of 65536 | Loss --> 1.954 | Grad_l2 --> 0.285 | Weights_l2 --> 17782.226 | Lr --> 0.003 | Seconds_per_step --> 1.694 | [2024-01-03 07:57:40,847][Main][INFO] - [train] Step 50800 out of 65536 | Loss --> 1.952 | Grad_l2 --> 0.290 | Weights_l2 --> 17783.180 | Lr --> 0.003 | Seconds_per_step --> 1.740 | [2024-01-03 08:00:29,173][Main][INFO] - [train] Step 50900 out of 65536 | Loss --> 1.941 | Grad_l2 --> 0.283 | Weights_l2 --> 17784.137 | Lr --> 0.003 | Seconds_per_step --> 1.683 | [2024-01-03 08:02:32,766][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00035-of-00512.json.gz [2024-01-03 08:02:55,068][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00490-of-00512.json.gz [2024-01-03 08:03:22,441][Main][INFO] - [train] Step 51000 out of 65536 | Loss --> 1.963 | Grad_l2 --> 0.280 | Weights_l2 --> 17785.068 | Lr --> 0.003 | Seconds_per_step --> 1.733 | [2024-01-03 08:06:11,340][Main][INFO] - [train] Step 51100 out of 65536 | Loss --> 1.949 | Grad_l2 --> 0.287 | Weights_l2 --> 17785.981 | Lr --> 0.003 | Seconds_per_step --> 1.689 | [2024-01-03 08:09:00,162][Main][INFO] - [train] Step 51200 out of 65536 | Loss --> 1.956 | Grad_l2 --> 0.282 | Weights_l2 --> 17786.851 | Lr --> 0.003 | Seconds_per_step --> 1.688 | [2024-01-03 08:11:51,687][Main][INFO] - [train] Step 51300 out of 65536 | Loss --> 1.943 | Grad_l2 --> 0.281 | 
Weights_l2 --> 17787.678 | Lr --> 0.003 | Seconds_per_step --> 1.715 | [2024-01-03 08:14:40,610][Main][INFO] - [train] Step 51400 out of 65536 | Loss --> 1.951 | Grad_l2 --> 0.283 | Weights_l2 --> 17788.532 | Lr --> 0.003 | Seconds_per_step --> 1.689 | [2024-01-03 08:17:39,618][Main][INFO] - [train] Step 51500 out of 65536 | Loss --> 1.948 | Grad_l2 --> 0.280 | Weights_l2 --> 17789.336 | Lr --> 0.003 | Seconds_per_step --> 1.790 | [2024-01-03 08:20:28,318][Main][INFO] - [train] Step 51600 out of 65536 | Loss --> 1.940 | Grad_l2 --> 0.283 | Weights_l2 --> 17790.135 | Lr --> 0.003 | Seconds_per_step --> 1.687 | [2024-01-03 08:22:22,677][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00272-of-00512.json.gz [2024-01-03 08:22:41,878][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00278-of-00512.json.gz [2024-01-03 08:23:19,793][Main][INFO] - [train] Step 51700 out of 65536 | Loss --> 1.939 | Grad_l2 --> 0.282 | Weights_l2 --> 17790.870 | Lr --> 0.003 | Seconds_per_step --> 1.715 | [2024-01-03 08:26:11,959][Main][INFO] - [train] Step 51800 out of 65536 | Loss --> 1.926 | Grad_l2 --> 0.281 | Weights_l2 --> 17791.613 | Lr --> 0.003 | Seconds_per_step --> 1.722 | [2024-01-03 08:29:04,895][Main][INFO] - [train] Step 51900 out of 65536 | Loss --> 1.942 | Grad_l2 --> 0.284 | Weights_l2 --> 17792.345 | Lr --> 0.003 | Seconds_per_step --> 1.729 | [2024-01-03 08:31:55,552][Main][INFO] - [train] Step 52000 out of 65536 | Loss --> 1.936 | Grad_l2 --> 0.286 | Weights_l2 --> 17793.061 | Lr --> 0.003 | Seconds_per_step --> 1.707 | [2024-01-03 08:34:44,174][Main][INFO] - 
[train] Step 52100 out of 65536 | Loss --> 1.926 | Grad_l2 --> 0.284 | Weights_l2 --> 17793.750 | Lr --> 0.003 | Seconds_per_step --> 1.686 | [2024-01-03 08:37:32,397][Main][INFO] - [train] Step 52200 out of 65536 | Loss --> 1.941 | Grad_l2 --> 0.285 | Weights_l2 --> 17794.421 | Lr --> 0.003 | Seconds_per_step --> 1.682 | [2024-01-03 08:40:23,128][Main][INFO] - [train] Step 52300 out of 65536 | Loss --> 1.930 | Grad_l2 --> 0.282 | Weights_l2 --> 17795.039 | Lr --> 0.003 | Seconds_per_step --> 1.707 | [2024-01-03 08:42:36,230][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00324-of-00512.json.gz [2024-01-03 08:42:52,397][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00146-of-00512.json.gz [2024-01-03 08:43:14,802][Main][INFO] - [train] Step 52400 out of 65536 | Loss --> 1.941 | Grad_l2 --> 0.276 | Weights_l2 --> 17795.653 | Lr --> 0.003 | Seconds_per_step --> 1.717 | [2024-01-03 08:46:02,851][Main][INFO] - [train] Step 52500 out of 65536 | Loss --> 1.930 | Grad_l2 --> 0.284 | Weights_l2 --> 17796.251 | Lr --> 0.003 | Seconds_per_step --> 1.680 | [2024-01-03 08:48:51,490][Main][INFO] - [train] Step 52600 out of 65536 | Loss --> 1.912 | Grad_l2 --> 0.282 | Weights_l2 --> 17796.824 | Lr --> 0.003 | Seconds_per_step --> 1.686 | [2024-01-03 08:51:44,971][Main][INFO] - [train] Step 52700 out of 65536 | Loss --> 1.916 | Grad_l2 --> 0.279 | Weights_l2 --> 17797.375 | Lr --> 0.003 | Seconds_per_step --> 1.735 | [2024-01-03 08:54:35,671][Main][INFO] - [train] Step 52800 out of 65536 | Loss --> 1.941 | Grad_l2 --> 0.285 | Weights_l2 --> 17797.922 | Lr --> 0.002 | 
Seconds_per_step --> 1.707 | [2024-01-03 08:57:24,488][Main][INFO] - [train] Step 52900 out of 65536 | Loss --> 1.936 | Grad_l2 --> 0.278 | Weights_l2 --> 17798.438 | Lr --> 0.002 | Seconds_per_step --> 1.688 | [2024-01-03 09:00:13,896][Main][INFO] - [train] Step 53000 out of 65536 | Loss --> 1.921 | Grad_l2 --> 0.284 | Weights_l2 --> 17798.966 | Lr --> 0.002 | Seconds_per_step --> 1.694 | [2024-01-03 09:03:01,180][Main][INFO] - [train] Step 53100 out of 65536 | Loss --> 1.930 | Grad_l2 --> 0.285 | Weights_l2 --> 17799.468 | Lr --> 0.002 | Seconds_per_step --> 1.673 | [2024-01-03 09:03:06,043][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00173-of-00512.json.gz [2024-01-03 09:03:07,200][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00387-of-00512.json.gz [2024-01-03 09:05:50,561][Main][INFO] - [train] Step 53200 out of 65536 | Loss --> 1.946 | Grad_l2 --> 0.290 | Weights_l2 --> 17799.944 | Lr --> 0.002 | Seconds_per_step --> 1.694 | [2024-01-03 09:08:42,539][Main][INFO] - [train] Step 53300 out of 65536 | Loss --> 1.946 | Grad_l2 --> 0.282 | Weights_l2 --> 17800.414 | Lr --> 0.002 | Seconds_per_step --> 1.720 | [2024-01-03 09:11:34,312][Main][INFO] - [train] Step 53400 out of 65536 | Loss --> 1.933 | Grad_l2 --> 0.281 | Weights_l2 --> 17800.876 | Lr --> 0.002 | Seconds_per_step --> 1.718 | [2024-01-03 09:14:21,663][Main][INFO] - [train] Step 53500 out of 65536 | Loss --> 1.924 | Grad_l2 --> 0.284 | Weights_l2 --> 17801.311 | Lr --> 0.002 | Seconds_per_step --> 1.674 | [2024-01-03 09:17:09,495][Main][INFO] - [train] Step 53600 out of 65536 | Loss --> 
1.927 | Grad_l2 --> 0.279 | Weights_l2 --> 17801.746 | Lr --> 0.002 | Seconds_per_step --> 1.678 | [2024-01-03 09:20:01,016][Main][INFO] - [train] Step 53700 out of 65536 | Loss --> 1.934 | Grad_l2 --> 0.279 | Weights_l2 --> 17802.152 | Lr --> 0.002 | Seconds_per_step --> 1.715 | [2024-01-03 09:22:49,068][Main][INFO] - [train] Step 53800 out of 65536 | Loss --> 1.943 | Grad_l2 --> 0.282 | Weights_l2 --> 17802.556 | Lr --> 0.002 | Seconds_per_step --> 1.681 | [2024-01-03 09:22:50,328][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00479-of-00512.json.gz [2024-01-03 09:22:56,830][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00180-of-00512.json.gz [2024-01-03 09:25:42,060][Main][INFO] - [train] Step 53900 out of 65536 | Loss --> 1.930 | Grad_l2 --> 0.279 | Weights_l2 --> 17802.936 | Lr --> 0.002 | Seconds_per_step --> 1.730 | [2024-01-03 09:28:31,432][Main][INFO] - [train] Step 54000 out of 65536 | Loss --> 1.932 | Grad_l2 --> 0.284 | Weights_l2 --> 17803.290 | Lr --> 0.002 | Seconds_per_step --> 1.694 | [2024-01-03 09:31:19,349][Main][INFO] - [train] Step 54100 out of 65536 | Loss --> 1.928 | Grad_l2 --> 0.277 | Weights_l2 --> 17803.649 | Lr --> 0.002 | Seconds_per_step --> 1.679 | [2024-01-03 09:34:17,347][Main][INFO] - [train] Step 54200 out of 65536 | Loss --> 1.910 | Grad_l2 --> 0.279 | Weights_l2 --> 17803.992 | Lr --> 0.002 | Seconds_per_step --> 1.780 | [2024-01-03 09:37:05,412][Main][INFO] - [train] Step 54300 out of 65536 | Loss --> 1.918 | Grad_l2 --> 0.280 | Weights_l2 --> 17804.320 | Lr --> 0.002 | Seconds_per_step --> 1.681 | [2024-01-03 
09:39:54,989][Main][INFO] - [train] Step 54400 out of 65536 | Loss --> 1.922 | Grad_l2 --> 0.284 | Weights_l2 --> 17804.625 | Lr --> 0.002 | Seconds_per_step --> 1.696 | [2024-01-03 09:42:52,153][Main][INFO] - [train] Step 54500 out of 65536 | Loss --> 1.935 | Grad_l2 --> 0.283 | Weights_l2 --> 17804.941 | Lr --> 0.002 | Seconds_per_step --> 1.772 | [2024-01-03 09:43:01,447][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00374-of-00512.json.gz [2024-01-03 09:43:21,998][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00363-of-00512.json.gz [2024-01-03 09:45:46,923][Main][INFO] - [train] Step 54600 out of 65536 | Loss --> 1.929 | Grad_l2 --> 0.283 | Weights_l2 --> 17805.238 | Lr --> 0.002 | Seconds_per_step --> 1.748 | [2024-01-03 09:48:35,393][Main][INFO] - [train] Step 54700 out of 65536 | Loss --> 1.908 | Grad_l2 --> 0.283 | Weights_l2 --> 17805.515 | Lr --> 0.002 | Seconds_per_step --> 1.685 | [2024-01-03 09:51:23,795][Main][INFO] - [train] Step 54800 out of 65536 | Loss --> 1.905 | Grad_l2 --> 0.283 | Weights_l2 --> 17805.780 | Lr --> 0.002 | Seconds_per_step --> 1.684 | [2024-01-03 09:54:14,571][Main][INFO] - [train] Step 54900 out of 65536 | Loss --> 1.907 | Grad_l2 --> 0.281 | Weights_l2 --> 17806.019 | Lr --> 0.002 | Seconds_per_step --> 1.708 | [2024-01-03 09:57:04,000][Main][INFO] - [train] Step 55000 out of 65536 | Loss --> 1.910 | Grad_l2 --> 0.279 | Weights_l2 --> 17806.272 | Lr --> 0.002 | Seconds_per_step --> 1.694 | [2024-01-03 09:57:04,063][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). 
Stopping 1 dataloader workers. [2024-01-03 09:57:04,064][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-03 09:59:15,506][Main][INFO] - [eval] Step 55000 out of 65536 | Loss --> 1.946 | Accuracy --> 0.644 | Time --> 131.503 | [2024-01-03 10:02:06,812][Main][INFO] - [train] Step 55100 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.278 | Weights_l2 --> 17806.518 | Lr --> 0.002 | Seconds_per_step --> 1.713 | [2024-01-03 10:04:57,090][Main][INFO] - [train] Step 55200 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.281 | Weights_l2 --> 17806.736 | Lr --> 0.002 | Seconds_per_step --> 1.703 | [2024-01-03 10:05:46,552][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00393-of-00512.json.gz [2024-01-03 10:06:12,249][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00081-of-00512.json.gz [2024-01-03 10:07:51,292][Main][INFO] - [train] Step 55300 out of 65536 | Loss --> 1.918 | Grad_l2 --> 0.278 | Weights_l2 --> 17806.957 | Lr --> 0.002 | Seconds_per_step --> 1.742 | [2024-01-03 10:10:41,650][Main][INFO] - [train] Step 55400 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.280 | Weights_l2 --> 17807.162 | Lr --> 0.002 | Seconds_per_step --> 1.704 | [2024-01-03 10:13:30,115][Main][INFO] - [train] Step 55500 out of 65536 | Loss --> 1.909 | Grad_l2 --> 0.285 | Weights_l2 --> 17807.366 | Lr --> 0.002 | Seconds_per_step --> 1.685 | 
[2024-01-03 10:16:21,613][Main][INFO] - [train] Step 55600 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.281 | Weights_l2 --> 17807.573 | Lr --> 0.002 | Seconds_per_step --> 1.715 | [2024-01-03 10:19:13,472][Main][INFO] - [train] Step 55700 out of 65536 | Loss --> 1.899 | Grad_l2 --> 0.280 | Weights_l2 --> 17807.748 | Lr --> 0.002 | Seconds_per_step --> 1.719 | [2024-01-03 10:22:04,836][Main][INFO] - [train] Step 55800 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.283 | Weights_l2 --> 17807.928 | Lr --> 0.001 | Seconds_per_step --> 1.714 | [2024-01-03 10:24:53,434][Main][INFO] - [train] Step 55900 out of 65536 | Loss --> 1.901 | Grad_l2 --> 0.282 | Weights_l2 --> 17808.096 | Lr --> 0.001 | Seconds_per_step --> 1.686 | [2024-01-03 10:25:28,577][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00258-of-00512.json.gz [2024-01-03 10:25:43,285][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00013-of-00512.json.gz [2024-01-03 10:27:44,412][Main][INFO] - [train] Step 56000 out of 65536 | Loss --> 1.928 | Grad_l2 --> 0.278 | Weights_l2 --> 17808.247 | Lr --> 0.001 | Seconds_per_step --> 1.710 | [2024-01-03 10:30:35,033][Main][INFO] - [train] Step 56100 out of 65536 | Loss --> 1.917 | Grad_l2 --> 0.276 | Weights_l2 --> 17808.405 | Lr --> 0.001 | Seconds_per_step --> 1.706 | [2024-01-03 10:33:21,664][Main][INFO] - [train] Step 56200 out of 65536 | Loss --> 1.921 | Grad_l2 --> 0.279 | Weights_l2 --> 17808.551 | Lr --> 0.001 | Seconds_per_step --> 1.666 | [2024-01-03 10:36:10,744][Main][INFO] - [train] Step 56300 out of 65536 | Loss --> 1.926 | Grad_l2 --> 0.284 | 
Weights_l2 --> 17808.693 | Lr --> 0.001 | Seconds_per_step --> 1.691 | [2024-01-03 10:39:05,510][Main][INFO] - [train] Step 56400 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.281 | Weights_l2 --> 17808.825 | Lr --> 0.001 | Seconds_per_step --> 1.748 | [2024-01-03 10:41:54,171][Main][INFO] - [train] Step 56500 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.284 | Weights_l2 --> 17808.956 | Lr --> 0.001 | Seconds_per_step --> 1.687 | [2024-01-03 10:44:43,932][Main][INFO] - [train] Step 56600 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.278 | Weights_l2 --> 17809.086 | Lr --> 0.001 | Seconds_per_step --> 1.698 | [2024-01-03 10:45:40,890][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00371-of-00512.json.gz [2024-01-03 10:45:51,655][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00181-of-00512.json.gz [2024-01-03 10:47:35,088][Main][INFO] - [train] Step 56700 out of 65536 | Loss --> 1.920 | Grad_l2 --> 0.280 | Weights_l2 --> 17809.191 | Lr --> 0.001 | Seconds_per_step --> 1.712 | [2024-01-03 10:50:26,992][Main][INFO] - [train] Step 56800 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.278 | Weights_l2 --> 17809.289 | Lr --> 0.001 | Seconds_per_step --> 1.719 | [2024-01-03 10:53:14,768][Main][INFO] - [train] Step 56900 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.279 | Weights_l2 --> 17809.384 | Lr --> 0.001 | Seconds_per_step --> 1.678 | [2024-01-03 10:56:06,561][Main][INFO] - [train] Step 57000 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.280 | Weights_l2 --> 17809.482 | Lr --> 0.001 | Seconds_per_step --> 1.718 | [2024-01-03 10:58:58,519][Main][INFO] - 
[train] Step 57100 out of 65536 | Loss --> 1.905 | Grad_l2 --> 0.281 | Weights_l2 --> 17809.573 | Lr --> 0.001 | Seconds_per_step --> 1.720 | [2024-01-03 11:01:47,005][Main][INFO] - [train] Step 57200 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.277 | Weights_l2 --> 17809.665 | Lr --> 0.001 | Seconds_per_step --> 1.685 | [2024-01-03 11:04:36,561][Main][INFO] - [train] Step 57300 out of 65536 | Loss --> 1.894 | Grad_l2 --> 0.279 | Weights_l2 --> 17809.746 | Lr --> 0.001 | Seconds_per_step --> 1.696 | [2024-01-03 11:05:30,132][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00103-of-00512.json.gz [2024-01-03 11:05:35,523][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00208-of-00512.json.gz [2024-01-03 11:07:29,362][Main][INFO] - [train] Step 57400 out of 65536 | Loss --> 1.889 | Grad_l2 --> 0.278 | Weights_l2 --> 17809.834 | Lr --> 0.001 | Seconds_per_step --> 1.728 | [2024-01-03 11:10:18,968][Main][INFO] - [train] Step 57500 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.279 | Weights_l2 --> 17809.900 | Lr --> 0.001 | Seconds_per_step --> 1.696 | [2024-01-03 11:13:09,063][Main][INFO] - [train] Step 57600 out of 65536 | Loss --> 1.910 | Grad_l2 --> 0.279 | Weights_l2 --> 17809.962 | Lr --> 0.001 | Seconds_per_step --> 1.701 | [2024-01-03 11:15:57,325][Main][INFO] - [train] Step 57700 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.026 | Lr --> 0.001 | Seconds_per_step --> 1.683 | [2024-01-03 11:18:48,091][Main][INFO] - [train] Step 57800 out of 65536 | Loss --> 1.898 | Grad_l2 --> 0.287 | Weights_l2 --> 17810.089 | Lr --> 0.001 | 
Seconds_per_step --> 1.708 | [2024-01-03 11:21:36,734][Main][INFO] - [train] Step 57900 out of 65536 | Loss --> 1.922 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.153 | Lr --> 0.001 | Seconds_per_step --> 1.686 | [2024-01-03 11:24:24,907][Main][INFO] - [train] Step 58000 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.201 | Lr --> 0.001 | Seconds_per_step --> 1.682 | [2024-01-03 11:25:45,135][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00498-of-00512.json.gz [2024-01-03 11:25:46,341][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00282-of-00512.json.gz [2024-01-03 11:27:15,200][Main][INFO] - [train] Step 58100 out of 65536 | Loss --> 1.900 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.245 | Lr --> 0.001 | Seconds_per_step --> 1.703 | [2024-01-03 11:30:05,335][Main][INFO] - [train] Step 58200 out of 65536 | Loss --> 1.893 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.289 | Lr --> 0.001 | Seconds_per_step --> 1.701 | [2024-01-03 11:32:56,437][Main][INFO] - [train] Step 58300 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.331 | Lr --> 0.001 | Seconds_per_step --> 1.711 | [2024-01-03 11:35:43,398][Main][INFO] - [train] Step 58400 out of 65536 | Loss --> 1.881 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.369 | Lr --> 0.001 | Seconds_per_step --> 1.670 | [2024-01-03 11:38:33,866][Main][INFO] - [train] Step 58500 out of 65536 | Loss --> 1.904 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.404 | Lr --> 0.001 | Seconds_per_step --> 1.705 | [2024-01-03 11:41:24,447][Main][INFO] - [train] Step 58600 out of 65536 | Loss --> 
1.907 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.439 | Lr --> 0.001 | Seconds_per_step --> 1.706 | [2024-01-03 11:44:13,254][Main][INFO] - [train] Step 58700 out of 65536 | Loss --> 1.914 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.474 | Lr --> 0.001 | Seconds_per_step --> 1.688 | [2024-01-03 11:45:45,715][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00301-of-00512.json.gz [2024-01-03 11:46:18,193][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00201-of-00512.json.gz [2024-01-03 11:47:05,567][Main][INFO] - [train] Step 58800 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.493 | Lr --> 0.001 | Seconds_per_step --> 1.723 | [2024-01-03 11:49:54,285][Main][INFO] - [train] Step 58900 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.523 | Lr --> 0.001 | Seconds_per_step --> 1.687 | [2024-01-03 11:52:46,307][Main][INFO] - [train] Step 59000 out of 65536 | Loss --> 1.868 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.540 | Lr --> 0.001 | Seconds_per_step --> 1.720 | [2024-01-03 11:55:34,450][Main][INFO] - [train] Step 59100 out of 65536 | Loss --> 1.888 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.564 | Lr --> 0.001 | Seconds_per_step --> 1.681 | [2024-01-03 11:58:24,845][Main][INFO] - [train] Step 59200 out of 65536 | Loss --> 1.893 | Grad_l2 --> 0.282 | Weights_l2 --> 17810.584 | Lr --> 0.001 | Seconds_per_step --> 1.704 | [2024-01-03 12:01:15,842][Main][INFO] - [train] Step 59300 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.598 | Lr --> 0.001 | Seconds_per_step --> 1.710 | [2024-01-03 
12:04:08,755][Main][INFO] - [train] Step 59400 out of 65536 | Loss --> 1.895 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.613 | Lr --> 0.001 | Seconds_per_step --> 1.729 | [2024-01-03 12:06:00,519][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00420-of-00512.json.gz [2024-01-03 12:06:19,674][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00159-of-00512.json.gz [2024-01-03 12:07:04,236][Main][INFO] - [train] Step 59500 out of 65536 | Loss --> 1.886 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.628 | Lr --> 0.001 | Seconds_per_step --> 1.755 | [2024-01-03 12:09:52,438][Main][INFO] - [train] Step 59600 out of 65536 | Loss --> 1.905 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.640 | Lr --> 0.001 | Seconds_per_step --> 1.682 | [2024-01-03 12:12:42,133][Main][INFO] - [train] Step 59700 out of 65536 | Loss --> 1.880 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.644 | Lr --> 0.001 | Seconds_per_step --> 1.697 | [2024-01-03 12:15:31,972][Main][INFO] - [train] Step 59800 out of 65536 | Loss --> 1.884 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.654 | Lr --> 0.001 | Seconds_per_step --> 1.698 | [2024-01-03 12:18:20,211][Main][INFO] - [train] Step 59900 out of 65536 | Loss --> 1.895 | Grad_l2 --> 0.283 | Weights_l2 --> 17810.661 | Lr --> 0.001 | Seconds_per_step --> 1.682 | [2024-01-03 12:21:13,723][Main][INFO] - [train] Step 60000 out of 65536 | Loss --> 1.898 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.665 | Lr --> 0.000 | Seconds_per_step --> 1.735 | [2024-01-03 12:21:13,771][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). 
Stopping 1 dataloader workers. [2024-01-03 12:21:13,771][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-03 12:23:24,062][Main][INFO] - [eval] Step 60000 out of 65536 | Loss --> 1.922 | Accuracy --> 0.647 | Time --> 130.336 | [2024-01-03 12:23:24,065][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-60000 [2024-01-03 12:23:24,069][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading [2024-01-03 12:23:27,125][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-60000/model.safetensors [2024-01-03 12:23:31,336][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-60000/optimizer.bin [2024-01-03 12:23:31,337][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-60000/scheduler.bin [2024-01-03 12:23:31,337][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-60000/sampler.bin [2024-01-03 12:23:31,337][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-60000/sampler_1.bin [2024-01-03 12:23:31,339][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-60000/random_states_0.pkl [2024-01-03 12:26:22,312][Main][INFO] - [train] Step 60100 out of 65536 | Loss --> 1.905 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.670 | Lr --> 0.000 | Seconds_per_step --> 1.782 | [2024-01-03 12:28:41,060][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = 
https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00050-of-00512.json.gz [2024-01-03 12:29:15,482][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00193-of-00512.json.gz [2024-01-03 12:29:22,204][Main][INFO] - [train] Step 60200 out of 65536 | Loss --> 1.911 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.677 | Lr --> 0.000 | Seconds_per_step --> 1.799 | [2024-01-03 12:32:12,472][Main][INFO] - [train] Step 60300 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.679 | Lr --> 0.000 | Seconds_per_step --> 1.703 | [2024-01-03 12:35:01,473][Main][INFO] - [train] Step 60400 out of 65536 | Loss --> 1.885 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.682 | Lr --> 0.000 | Seconds_per_step --> 1.690 | [2024-01-03 12:37:53,655][Main][INFO] - [train] Step 60500 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.683 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-01-03 12:40:43,615][Main][INFO] - [train] Step 60600 out of 65536 | Loss --> 1.896 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.679 | Lr --> 0.000 | Seconds_per_step --> 1.700 | [2024-01-03 12:43:36,004][Main][INFO] - [train] Step 60700 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.680 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-01-03 12:46:28,090][Main][INFO] - [train] Step 60800 out of 65536 | Loss --> 1.885 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.681 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-01-03 12:49:18,432][Main][INFO] - [train] Step 60900 out of 65536 | Loss --> 1.873 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.677 | Lr --> 0.000 | Seconds_per_step --> 1.703 | [2024-01-03 
12:49:22,445][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00017-of-00512.json.gz [2024-01-03 12:49:29,336][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00090-of-00512.json.gz [2024-01-03 12:52:11,823][Main][INFO] - [train] Step 61000 out of 65536 | Loss --> 1.886 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.676 | Lr --> 0.000 | Seconds_per_step --> 1.734 | [2024-01-03 12:55:00,485][Main][INFO] - [train] Step 61100 out of 65536 | Loss --> 1.896 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.670 | Lr --> 0.000 | Seconds_per_step --> 1.687 | [2024-01-03 12:57:50,045][Main][INFO] - [train] Step 61200 out of 65536 | Loss --> 1.873 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.671 | Lr --> 0.000 | Seconds_per_step --> 1.696 | [2024-01-03 13:00:41,726][Main][INFO] - [train] Step 61300 out of 65536 | Loss --> 1.875 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.662 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-01-03 13:03:30,567][Main][INFO] - [train] Step 61400 out of 65536 | Loss --> 1.866 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.662 | Lr --> 0.000 | Seconds_per_step --> 1.688 | [2024-01-03 13:07:09,967][Main][INFO] - [train] Step 61500 out of 65536 | Loss --> 1.872 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.657 | Lr --> 0.000 | Seconds_per_step --> 2.194 | [2024-01-03 13:09:26,281][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00271-of-00512.json.gz 
[2024-01-03 13:10:00,007][Main][INFO] - [train] Step 61600 out of 65536 | Loss --> 1.884 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.654 | Lr --> 0.000 | Seconds_per_step --> 1.700 | [2024-01-03 13:10:35,628][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00402-of-00512.json.gz [2024-01-03 13:12:48,268][Main][INFO] - [train] Step 61700 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.649 | Lr --> 0.000 | Seconds_per_step --> 1.683 | [2024-01-03 13:15:38,728][Main][INFO] - [train] Step 61800 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.645 | Lr --> 0.000 | Seconds_per_step --> 1.704 | [2024-01-03 13:18:32,609][Main][INFO] - [train] Step 61900 out of 65536 | Loss --> 1.879 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.639 | Lr --> 0.000 | Seconds_per_step --> 1.739 | [2024-01-03 13:21:22,433][Main][INFO] - [train] Step 62000 out of 65536 | Loss --> 1.881 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.640 | Lr --> 0.000 | Seconds_per_step --> 1.698 | [2024-01-03 13:24:13,244][Main][INFO] - [train] Step 62100 out of 65536 | Loss --> 1.888 | Grad_l2 --> 0.275 | Weights_l2 --> 17810.633 | Lr --> 0.000 | Seconds_per_step --> 1.708 | [2024-01-03 13:27:02,989][Main][INFO] - [train] Step 62200 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.628 | Lr --> 0.000 | Seconds_per_step --> 1.697 | [2024-01-03 13:29:43,155][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00261-of-00512.json.gz [2024-01-03 13:29:53,152][Main][INFO] - [train] Step 62300 out of 65536 | Loss --> 1.876 | Grad_l2 --> 0.279 | 
Weights_l2 --> 17810.625 | Lr --> 0.000 | Seconds_per_step --> 1.702 | [2024-01-03 13:31:14,753][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00459-of-00512.json.gz [2024-01-03 13:32:48,177][Main][INFO] - [train] Step 62400 out of 65536 | Loss --> 1.894 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.621 | Lr --> 0.000 | Seconds_per_step --> 1.750 | [2024-01-03 13:35:40,454][Main][INFO] - [train] Step 62500 out of 65536 | Loss --> 1.894 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.617 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-01-03 13:38:30,070][Main][INFO] - [train] Step 62600 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.610 | Lr --> 0.000 | Seconds_per_step --> 1.696 | [2024-01-03 13:41:18,539][Main][INFO] - [train] Step 62700 out of 65536 | Loss --> 1.884 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.608 | Lr --> 0.000 | Seconds_per_step --> 1.685 | [2024-01-03 13:44:08,823][Main][INFO] - [train] Step 62800 out of 65536 | Loss --> 1.888 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.604 | Lr --> 0.000 | Seconds_per_step --> 1.703 | [2024-01-03 13:46:57,439][Main][INFO] - [train] Step 62900 out of 65536 | Loss --> 1.895 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.601 | Lr --> 0.000 | Seconds_per_step --> 1.686 | [2024-01-03 13:49:23,728][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00083-of-00512.json.gz [2024-01-03 13:49:48,796][Main][INFO] - [train] Step 63000 out of 65536 | Loss --> 1.900 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.597 | Lr --> 0.000 | Seconds_per_step --> 1.714 | [2024-01-03 
13:50:58,701][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00111-of-00512.json.gz [2024-01-03 13:52:39,016][Main][INFO] - [train] Step 63100 out of 65536 | Loss --> 1.908 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.594 | Lr --> 0.000 | Seconds_per_step --> 1.702 | [2024-01-03 13:55:28,423][Main][INFO] - [train] Step 63200 out of 65536 | Loss --> 1.891 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.592 | Lr --> 0.000 | Seconds_per_step --> 1.694 | [2024-01-03 13:58:20,100][Main][INFO] - [train] Step 63300 out of 65536 | Loss --> 1.897 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.587 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-01-03 14:01:08,081][Main][INFO] - [train] Step 63400 out of 65536 | Loss --> 1.896 | Grad_l2 --> 0.273 | Weights_l2 --> 17810.585 | Lr --> 0.000 | Seconds_per_step --> 1.680 | [2024-01-03 14:04:01,026][Main][INFO] - [train] Step 63500 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.582 | Lr --> 0.000 | Seconds_per_step --> 1.729 | [2024-01-03 14:06:54,379][Main][INFO] - [train] Step 63600 out of 65536 | Loss --> 1.870 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.579 | Lr --> 0.000 | Seconds_per_step --> 1.734 | [2024-01-03 14:09:38,592][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00114-of-00512.json.gz [2024-01-03 14:09:44,055][Main][INFO] - [train] Step 63700 out of 65536 | Loss --> 1.883 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.577 | Lr --> 0.000 | Seconds_per_step --> 1.697 | [2024-01-03 
14:11:18,021][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00078-of-00512.json.gz [2024-01-03 14:12:34,997][Main][INFO] - [train] Step 63800 out of 65536 | Loss --> 1.889 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.574 | Lr --> 0.000 | Seconds_per_step --> 1.709 | [2024-01-03 14:15:29,525][Main][INFO] - [train] Step 63900 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.570 | Lr --> 0.000 | Seconds_per_step --> 1.745 | [2024-01-03 14:18:21,644][Main][INFO] - [train] Step 64000 out of 65536 | Loss --> 1.872 | Grad_l2 --> 0.273 | Weights_l2 --> 17810.569 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-01-03 14:21:15,252][Main][INFO] - [train] Step 64100 out of 65536 | Loss --> 1.879 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.566 | Lr --> 0.000 | Seconds_per_step --> 1.736 | [2024-01-03 14:24:04,635][Main][INFO] - [train] Step 64200 out of 65536 | Loss --> 1.865 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.566 | Lr --> 0.000 | Seconds_per_step --> 1.694 | [2024-01-03 14:26:55,637][Main][INFO] - [train] Step 64300 out of 65536 | Loss --> 1.874 | Grad_l2 --> 0.288 | Weights_l2 --> 17810.564 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-01-03 14:29:48,764][Main][INFO] - [train] Step 64400 out of 65536 | Loss --> 1.863 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.563 | Lr --> 0.000 | Seconds_per_step --> 1.731 | [2024-01-03 14:30:35,636][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00414-of-00512.json.gz [2024-01-03 
14:31:58,777][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00024-of-00512.json.gz [2024-01-03 14:33:14,044][Main][INFO] - [train] Step 64500 out of 65536 | Loss --> 1.874 | Grad_l2 --> 0.281 | Weights_l2 --> 17810.562 | Lr --> 0.000 | Seconds_per_step --> 2.053 | [2024-01-03 14:36:02,591][Main][INFO] - [train] Step 64600 out of 65536 | Loss --> 1.887 | Grad_l2 --> 0.276 | Weights_l2 --> 17810.561 | Lr --> 0.000 | Seconds_per_step --> 1.685 | [2024-01-03 14:38:53,830][Main][INFO] - [train] Step 64700 out of 65536 | Loss --> 1.877 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.560 | Lr --> 0.000 | Seconds_per_step --> 1.712 | [2024-01-03 14:41:41,988][Main][INFO] - [train] Step 64800 out of 65536 | Loss --> 1.883 | Grad_l2 --> 0.278 | Weights_l2 --> 17810.559 | Lr --> 0.000 | Seconds_per_step --> 1.682 | [2024-01-03 14:44:33,468][Main][INFO] - [train] Step 64900 out of 65536 | Loss --> 1.884 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.558 | Lr --> 0.000 | Seconds_per_step --> 1.715 | [2024-01-03 14:47:23,816][Main][INFO] - [train] Step 65000 out of 65536 | Loss --> 1.886 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.559 | Lr --> 0.000 | Seconds_per_step --> 1.703 | [2024-01-03 14:47:23,874][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. 
[2024-01-03 14:47:23,875][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-03 14:49:35,360][Main][INFO] - [eval] Step 65000 out of 65536 | Loss --> 1.915 | Accuracy --> 0.648 | Time --> 131.542 | [2024-01-03 14:52:24,549][Main][INFO] - [train] Step 65100 out of 65536 | Loss --> 1.892 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.558 | Lr --> 0.000 | Seconds_per_step --> 1.692 | [2024-01-03 14:53:07,539][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00432-of-00512.json.gz [2024-01-03 14:54:40,328][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk.tfrecord-00100-of-00512.json.gz [2024-01-03 14:55:17,473][Main][INFO] - [train] Step 65200 out of 65536 | Loss --> 1.897 | Grad_l2 --> 0.274 | Weights_l2 --> 17810.557 | Lr --> 0.000 | Seconds_per_step --> 1.729 | [2024-01-03 14:58:08,882][Main][INFO] - [train] Step 65300 out of 65536 | Loss --> 1.880 | Grad_l2 --> 0.280 | Weights_l2 --> 17810.557 | Lr --> 0.000 | Seconds_per_step --> 1.714 | [2024-01-03 15:00:58,309][Main][INFO] - [train] Step 65400 out of 65536 | Loss --> 1.890 | Grad_l2 --> 0.279 | Weights_l2 --> 17810.558 | Lr --> 0.000 | Seconds_per_step --> 1.694 | [2024-01-03 15:03:47,325][Main][INFO] - [train] Step 65500 out of 65536 | Loss --> 1.903 | Grad_l2 --> 0.277 | Weights_l2 --> 17810.558 | Lr --> 0.000 | Seconds_per_step --> 1.690 | [2024-01-03 
15:04:48,926][datasets.iterable_dataset][WARNING] - Too many dataloader workers: 2 (max is dataset.n_shards=1). Stopping 1 dataloader workers. [2024-01-03 15:04:48,927][datasets_modules.datasets.mc4.99acea4a740b4cc36e4a93a238c7de11b0ce341d65b7d37168b3b90fd64721d2.mc4][INFO] - generating examples from = https://huggingface.co./datasets/allenai/c4/resolve/1ddc917116b730e1859edef32896ec5c16be51d0/multilingual/c4-sk-validation.tfrecord-00000-of-00001.json.gz [2024-01-03 15:07:00,390][Main][INFO] - [eval] Step 65537 out of 65536 | Loss --> 1.914 | Accuracy --> 0.648 | Time --> 131.512 | [2024-01-03 15:07:00,394][accelerate.accelerator][INFO] - Saving current state to checkpoint-pt-65537 [2024-01-03 15:07:00,397][accelerate.utils.other][WARNING] - Removed shared tensor {'decoder.embed_tokens.weight', 'encoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading [2024-01-03 15:07:02,671][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-pt-65537/model.safetensors [2024-01-03 15:07:07,200][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-pt-65537/optimizer.bin [2024-01-03 15:07:07,201][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-pt-65537/scheduler.bin [2024-01-03 15:07:07,202][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-pt-65537/sampler.bin [2024-01-03 15:07:07,202][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-pt-65537/sampler_1.bin [2024-01-03 15:07:07,203][accelerate.checkpointing][INFO] - Random states saved in checkpoint-pt-65537/random_states_0.pkl