/usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn( [2024-03-10 11:11:23,156] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-10 11:11:23,396] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-10 11:11:23,422] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-10 11:11:23,477] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-10 11:11:23,477] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. [2024-03-10 11:11:23,532] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-10 11:11:23,718] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-10 11:11:23,742] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-10 11:11:23,745] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-10 11:11:23,848] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-10 11:11:23,853] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-10 11:11:23,865] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-10 11:11:23,969] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-10 11:11:24,046] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-10 11:11:24,153] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-10 11:11:24,187] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-10 11:11:24,302] [INFO] [comm.py:637:init_distributed] cdb=None Loading checkpoint shards: 0%| | 0/4 [00:00 2024-03-10 11:12:22.635 n193-018-074:2301448:2301448 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: 
cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.654 n193-018-074:2301449:2301449 [1] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.654 n193-018-074:2301449:2301449 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.654 n193-018-074:2301449:2301449 [1] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.661 n193-018-074:2301451:2301451 [3] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.661 n193-018-074:2301451:2301451 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.662 n193-018-074:2301451:2301451 [3] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.668 n193-018-074:2301452:2301452 [4] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.669 n193-018-074:2301452:2301452 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.669 n193-018-074:2301452:2301452 [4] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.671 n193-018-074:2301455:2301455 [7] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.671 n193-018-074:2301455:2301455 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.671 n193-018-074:2301455:2301455 [7] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.673 n193-018-074:2301449:2301449 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.677 n193-018-074:2301448:2301448 [0] NCCL INFO cudaDriverVersion 12010 NCCL version 2.19.3+cuda12.1 2024-03-10 11:12:22.680 n193-018-074:2301451:2301451 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.685 n193-018-074:2301452:2301452 [4] NCCL 
INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.689 n193-018-074:2301455:2301455 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.700 n193-018-074:2301449:2302318 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.709 n193-018-074:2301448:2302319 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.709 n193-018-074:2301451:2302320 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.713 n193-018-074:2301452:2302321 [4] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.716 n193-018-074:2301455:2302322 [7] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.744 n193-018-074:2301449:2302318 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.745 n193-018-074:2301449:2302318 [1] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.758 n193-018-074:2301449:2302318 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.758 n193-018-074:2301449:2302318 [1] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.758 n193-018-074:2301449:2302318 [1] NCCL INFO Using network IB 2024-03-10 11:12:22.770 n193-018-074:2301455:2302322 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.772 n193-018-074:2301455:2302322 [7] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.776 n193-018-074:2301451:2302320 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.776 n193-018-074:2301451:2302320 [3] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.779 n193-018-074:2301448:2302319 [0] NCCL INFO NCCL_SOCKET_IFNAME 
set by environment to eth0 2024-03-10 11:12:22.780 n193-018-074:2301448:2302319 [0] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.787 n193-018-074:2301452:2302321 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.788 n193-018-074:2301452:2302321 [4] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.791 n193-018-074:2301451:2302320 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.791 n193-018-074:2301455:2302322 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.791 n193-018-074:2301451:2302320 [3] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.791 n193-018-074:2301451:2302320 [3] NCCL INFO Using network IB 2024-03-10 11:12:22.791 n193-018-074:2301455:2302322 [7] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.791 n193-018-074:2301455:2302322 [7] NCCL INFO Using network IB 2024-03-10 11:12:22.792 n193-018-074:2301448:2302319 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.792 n193-018-074:2301448:2302319 [0] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.792 n193-018-074:2301448:2302319 [0] NCCL INFO Using network IB 2024-03-10 11:12:22.800 n193-018-074:2301452:2302321 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.800 n193-018-074:2301452:2302321 [4] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.800 n193-018-074:2301452:2302321 [4] NCCL INFO Using network IB 2024-03-10 11:12:22.880 n193-018-074:2301453:2301453 [5] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.880 n193-018-074:2301453:2301453 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment 
to eth0 2024-03-10 11:12:22.881 n193-018-074:2301453:2301453 [5] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.887 n193-018-074:2301453:2301453 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.908 n193-018-074:2301453:2302344 [5] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.927 n193-018-074:2301453:2302344 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.927 n193-018-074:2301453:2302344 [5] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:23.068 n193-018-074:2301453:2302344 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:23.068 n193-018-074:2301453:2302344 [5] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:23.068 n193-018-074:2301453:2302344 [5] NCCL INFO Using network IB 2024-03-10 11:12:23.392 n193-018-074:2301450:2301450 [2] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:23.392 n193-018-074:2301450:2301450 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:23.392 n193-018-074:2301450:2301450 [2] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:23.402 n193-018-074:2301450:2301450 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:23.417 n193-018-074:2301454:2301454 [6] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:23.417 n193-018-074:2301454:2301454 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:23.417 n193-018-074:2301454:2301454 [6] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:23.422 n193-018-074:2301450:2302383 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 
2024-03-10 11:12:23.427 n193-018-074:2301454:2301454 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:23.439 n193-018-074:2301454:2302384 [6] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:23.447 n193-018-074:2301450:2302383 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:23.448 n193-018-074:2301450:2302383 [2] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:23.461 n193-018-074:2301450:2302383 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:23.461 n193-018-074:2301450:2302383 [2] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:23.461 n193-018-074:2301450:2302383 [2] NCCL INFO Using network IB 2024-03-10 11:12:23.462 n193-018-074:2301454:2302384 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:23.462 n193-018-074:2301454:2302384 [6] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:23.475 n193-018-074:2301454:2302384 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:23.475 n193-018-074:2301454:2302384 [6] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:23.475 n193-018-074:2301454:2302384 [6] NCCL INFO Using network IB 2024-03-10 11:12:23.515 n193-018-074:2301454:2302384 [6] NCCL INFO comm 0xb8a29e30 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301450:2302383 [2] NCCL INFO comm 0x186781460 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301455:2302322 [7] NCCL INFO comm 0xb983c130 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0x2fa19338ddb25d3f - Init START 
2024-03-10 11:12:23.515 n193-018-074:2301453:2302344 [5] NCCL INFO comm 0x6f639cc0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301452:2302321 [4] NCCL INFO comm 0x185b03250 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301449:2302318 [1] NCCL INFO comm 0xa4fb2560 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301448:2302319 [0] NCCL INFO comm 0x198e24fa0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301451:2302320 [3] NCCL INFO comm 0x6fd77d40 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:25.574 n193-018-074:2301448:2302319 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,00000000,ffffffff 2024-03-10 11:12:25.574 n193-018-074:2301448:2302319 [0] NCCL INFO NVLS multicast support is not available on dev 0 2024-03-10 11:12:25.600 n193-018-074:2301454:2302384 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:25.600 n193-018-074:2301454:2302384 [6] NCCL INFO NVLS multicast support is not available on dev 6 2024-03-10 11:12:25.608 n193-018-074:2301450:2302383 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,00000000,ffffffff 2024-03-10 11:12:25.608 n193-018-074:2301450:2302383 [2] NCCL INFO NVLS multicast support is not available on dev 2 2024-03-10 11:12:25.611 n193-018-074:2301455:2302322 [7] NCCL INFO Setting affinity for GPU 7 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:25.611 n193-018-074:2301455:2302322 [7] NCCL INFO NVLS multicast support is not available on dev 7 2024-03-10 11:12:25.613 n193-018-074:2301452:2302321 [4] NCCL INFO Setting affinity for GPU 4 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:25.613 
n193-018-074:2301452:2302321 [4] NCCL INFO NVLS multicast support is not available on dev 4 2024-03-10 11:12:25.614 n193-018-074:2301451:2302320 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,00000000,ffffffff 2024-03-10 11:12:25.614 n193-018-074:2301451:2302320 [3] NCCL INFO NVLS multicast support is not available on dev 3 2024-03-10 11:12:25.617 n193-018-074:2301449:2302318 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff,00000000,ffffffff 2024-03-10 11:12:25.618 n193-018-074:2301449:2302318 [1] NCCL INFO NVLS multicast support is not available on dev 1 2024-03-10 11:12:25.618 n193-018-074:2301453:2302344 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:25.618 n193-018-074:2301453:2302344 [5] NCCL INFO NVLS multicast support is not available on dev 5 2024-03-10 11:12:25.619 n193-018-074:2301453:2302344 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301454:2302384 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 2024-03-10 11:12:25.619 
n193-018-074:2301451:2302320 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 2024-03-10 11:12:25.619 n193-018-074:2301455:2302322 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 2024-03-10 11:12:25.619 n193-018-074:2301449:2302318 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301454:2302384 [6] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301453:2302344 [5] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301452:2302321 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 
5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301450:2302383 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 2024-03-10 11:12:25.619 n193-018-074:2301449:2302318 [1] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301455:2302322 [7] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301451:2302320 [3] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301452:2302321 [4] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301450:2302383 [2] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 
n193-018-074:2301448:2302319 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 
1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:26.064 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 02/0 
: 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.072 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.072 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.075 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.075 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.076 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.077 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.077 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.077 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.077 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.078 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.078 
n193-018-074:2301453:2302344 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.078 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.079 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.080 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.080 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.080 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.081 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.081 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.081 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.081 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.082 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.083 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 06/0 : 
2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.086 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.089 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.091 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.092 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.093 
n193-018-074:2301448:2302319 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.093 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.093 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.093 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.093 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.094 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.094 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.095 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.099 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 11/0 : 
7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.102 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.106 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.108 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.108 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.108 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.108 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.109 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.109 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.109 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.110 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.113 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.113 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.114 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.114 
n193-018-074:2301451:2302320 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.114 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.114 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.114 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.116 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.117 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.117 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.117 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.117 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.118 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.118 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.118 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.119 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.120 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.120 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.120 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.121 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.121 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 13/0 : 
5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.121 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.122 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.122 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.123 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.123 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.123 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.124 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.133 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.133 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.134 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.135 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.137 
n193-018-074:2301451:2302320 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.138 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.140 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.141 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.143 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.144 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 19/0 : 
4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.146 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.147 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.149 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.150 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.151 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.151 
n193-018-074:2301455:2302322 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.151 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.151 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.151 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.152 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.152 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.153 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.155 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.155 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.156 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.157 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.157 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.159 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 23/0 : 
0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.159 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.753 n193-018-074:2301450:2302383 [2] NCCL INFO Connected all rings 2024-03-10 11:12:26.784 n193-018-074:2301449:2302318 [1] NCCL INFO Connected all rings 2024-03-10 11:12:26.784 n193-018-074:2301448:2302319 [0] NCCL INFO Connected all rings 2024-03-10 11:12:26.786 n193-018-074:2301452:2302321 [4] NCCL INFO Connected all rings 2024-03-10 11:12:26.786 n193-018-074:2301451:2302320 [3] NCCL INFO Connected all rings 2024-03-10 11:12:26.794 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.796 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.798 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.800 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.801 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.803 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.805 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.808 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.809 n193-018-074:2301455:2302322 [7] NCCL INFO Connected all rings 2024-03-10 11:12:26.809 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.809 n193-018-074:2301453:2302344 [5] NCCL INFO Connected all rings 2024-03-10 11:12:26.809 n193-018-074:2301454:2302384 [6] NCCL INFO Connected all rings 2024-03-10 11:12:26.810 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 
11:12:26.812 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.813 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.815 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.816 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.818 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.819 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.820 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.822 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.823 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.824 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.827 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.829 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.829 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.832 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.832 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.835 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.836 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.838 n193-018-074:2301450:2302383 [2] NCCL INFO 
Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.839 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.841 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.842 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.843 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.844 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.844 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.846 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.846 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.847 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.847 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.849 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.849 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.849 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.850 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.851 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.852 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.852 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 
11:12:26.852 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.854 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.854 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.855 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.856 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.857 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.857 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.858 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.858 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.859 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.859 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.860 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.861 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.861 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.862 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.863 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.863 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.864 n193-018-074:2301455:2302322 [7] NCCL INFO 
Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.864 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.865 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.865 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.866 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.866 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.868 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.868 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.868 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.869 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.870 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.870 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.871 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.871 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.872 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.872 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.872 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.873 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 
11:12:26.873 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.874 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.874 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.874 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.874 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.876 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.876 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.876 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.876 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.877 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.878 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.878 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.878 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.878 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.879 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301454:2302384 [6] NCCL INFO 
Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.881 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.882 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.882 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.882 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.882 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.883 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.884 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.884 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.884 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.884 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.885 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.886 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.886 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.886 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.886 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 
11:12:26.887 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.888 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.888 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.888 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.888 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.889 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.890 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.890 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.890 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.890 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.891 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.892 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.892 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.892 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.892 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.894 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.894 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.894 n193-018-074:2301453:2302344 [5] NCCL INFO 
Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.894 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.895 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.896 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.896 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.896 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.897 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.897 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.898 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.898 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.898 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.899 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.900 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.900 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.902 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.902 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.902 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.904 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 
11:12:26.904 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.904 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.906 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.906 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.907 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.911 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.911 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.913 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.913 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.914 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.915 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.916 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.917 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.918 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.918 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.920 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.920 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.923 n193-018-074:2301453:2302344 [5] NCCL INFO 
Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:27.388 n193-018-074:2301448:2302319 [0] NCCL INFO Connected all trees 2024-03-10 11:12:27.388 n193-018-074:2301448:2302319 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.388 n193-018-074:2301448:2302319 [0] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.471 n193-018-074:2301449:2302318 [1] NCCL INFO Connected all trees 2024-03-10 11:12:27.471 n193-018-074:2301449:2302318 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.471 n193-018-074:2301449:2302318 [1] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.520 n193-018-074:2301450:2302383 [2] NCCL INFO Connected all trees 2024-03-10 11:12:27.520 n193-018-074:2301450:2302383 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.520 n193-018-074:2301450:2302383 [2] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.528 n193-018-074:2301451:2302320 [3] NCCL INFO Connected all trees 2024-03-10 11:12:27.528 n193-018-074:2301451:2302320 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.528 n193-018-074:2301451:2302320 [3] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.533 n193-018-074:2301455:2302322 [7] NCCL INFO Connected all trees 2024-03-10 11:12:27.533 n193-018-074:2301455:2302322 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.533 n193-018-074:2301455:2302322 [7] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.535 n193-018-074:2301452:2302321 [4] NCCL INFO Connected all trees 2024-03-10 11:12:27.535 n193-018-074:2301452:2302321 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.535 
n193-018-074:2301452:2302321 [4] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.535 n193-018-074:2301454:2302384 [6] NCCL INFO Connected all trees 2024-03-10 11:12:27.535 n193-018-074:2301454:2302384 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.535 n193-018-074:2301454:2302384 [6] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.535 n193-018-074:2301453:2302344 [5] NCCL INFO Connected all trees 2024-03-10 11:12:27.535 n193-018-074:2301453:2302344 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.536 n193-018-074:2301453:2302344 [5] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.605 n193-018-074:2301455:2302322 [7] NCCL INFO comm 0xb983c130 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301452:2302321 [4] NCCL INFO comm 0x185b03250 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301454:2302384 [6] NCCL INFO comm 0xb8a29e30 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301450:2302383 [2] NCCL INFO comm 0x186781460 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301449:2302318 [1] NCCL INFO comm 0xa4fb2560 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301451:2302320 [3] NCCL INFO comm 0x6fd77d40 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301448:2302319 [0] NCCL INFO comm 0x198e24fa0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 
0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301453:2302344 [5] NCCL INFO comm 0x6f639cc0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0x2fa19338ddb25d3f - Init COMPLETE /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") 0%| | 0/2774 [00:00 [...] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 2024-03-10 11:12:46.455 n193-018-074:2301450:2302658 [2] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301449:2302654 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 2024-03-10 11:12:46.455 n193-018-074:2301453:2302656 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13]
6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301454:2302657 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 2024-03-10 11:12:46.455 n193-018-074:2301453:2302656 [5] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301449:2302654 [1] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301455:2302659 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301454:2302657 [6] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301455:2302659 [7] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301452:2302653 [4] 
NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 2024-03-10 11:12:46.455 n193-018-074:2301451:2302655 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301452:2302653 [4] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301451:2302655 [3] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 
n193-018-074:2301448:2302652 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.773 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.774 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 00/0 : 
5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.774 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.774 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.774 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.775 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.775 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.775 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.776 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.776 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.777 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.777 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.777 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.778 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.778 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.778 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.778 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.779 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.779 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.779 
n193-018-074:2301454:2302657 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.780 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.780 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.780 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.781 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.781 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.781 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.782 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.782 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.782 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.783 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.783 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.784 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.784 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.785 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.785 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.785 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.786 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 04/0 : 
3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.786 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.786 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.787 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.787 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.788 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.788 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.788 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.789 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.789 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.790 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.790 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.790 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.791 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.791 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.791 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.792 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.792 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.792 
n193-018-074:2301450:2302658 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.793 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.793 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.794 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.795 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.795 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.796 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.796 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.796 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.796 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.797 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.799 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.800 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.800 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.800 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.801 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.801 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.801 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 09/0 : 
1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.801 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.803 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.803 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.803 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.804 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.805 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.805 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.805 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.806 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.807 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.808 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.808 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.808 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.809 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.809 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.809 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.809 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.810 
n193-018-074:2301455:2302659 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.810 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.810 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.811 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.811 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.811 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.812 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.812 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.813 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.813 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.813 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.813 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.814 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.814 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.814 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.814 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.815 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.815 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 13/0 : 
6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.815 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.816 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.816 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.817 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.817 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.817 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.818 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.818 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.818 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.819 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.819 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.819 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.819 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.820 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.820 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.820 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.820 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.821 
n193-018-074:2301451:2302655 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.821 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.822 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.822 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.822 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.823 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.823 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.823 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.824 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.824 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.824 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.824 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.825 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.825 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.825 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.825 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.826 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.827 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 17/0 : 
0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.827 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.827 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.827 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.828 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.828 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.828 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.828 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.829 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.829 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.829 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.829 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.830 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.830 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.830 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.831 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.832 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.832 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.832 
n193-018-074:2301449:2302654 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.832 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.833 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.833 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.833 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.834 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.834 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.834 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.834 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.835 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.835 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.835 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.835 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.836 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.837 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.837 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.837 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.837 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 22/0 : 
5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.838 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.838 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.838 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.839 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.839 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.840 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.840 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.840 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.841 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.841 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.841 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.842 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.842 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.842 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.844 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.408 n193-018-074:2301449:2302654 [1] NCCL INFO Connected all rings 2024-03-10 11:12:47.416 n193-018-074:2301450:2302658 [2] NCCL INFO Connected all rings 2024-03-10 11:12:47.420 n193-018-074:2301448:2302652 [0] NCCL INFO Connected all 
rings 2024-03-10 11:12:47.445 n193-018-074:2301451:2302655 [3] NCCL INFO Connected all rings 2024-03-10 11:12:47.450 n193-018-074:2301452:2302653 [4] NCCL INFO Connected all rings 2024-03-10 11:12:47.455 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.457 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.459 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.461 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.463 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.465 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.465 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.466 n193-018-074:2301455:2302659 [7] NCCL INFO Connected all rings 2024-03-10 11:12:47.466 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.466 n193-018-074:2301453:2302656 [5] NCCL INFO Connected all rings 2024-03-10 11:12:47.466 n193-018-074:2301454:2302657 [6] NCCL INFO Connected all rings 2024-03-10 11:12:47.467 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.467 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.469 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.469 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.470 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.471 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 
02/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.472 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.473 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.473 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.473 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.474 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.475 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.475 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.476 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.477 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.477 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.478 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.479 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.479 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.480 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.480 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.481 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.481 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 
11:12:47.482 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.483 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.483 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.484 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.484 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.485 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.486 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.487 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.487 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.488 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.488 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.489 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.490 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.490 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.491 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.491 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.492 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.492 n193-018-074:2301450:2302658 [2] NCCL INFO 
Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.493 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.493 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.494 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.495 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.496 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.496 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.497 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.497 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.497 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.498 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.498 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.498 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.499 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.499 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.500 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.500 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.500 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 
11:12:47.501 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.502 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.502 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.502 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.502 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.503 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.504 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.504 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.504 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.505 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.505 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.506 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.506 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.508 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.508 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.508 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.509 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.509 n193-018-074:2301452:2302653 [4] NCCL INFO 
Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.510 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.510 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.510 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.510 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.511 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.512 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.512 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.512 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.512 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.513 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.514 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.514 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.514 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.514 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.515 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.515 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.515 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 
11:12:47.515 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.516 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.516 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.516 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.516 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.517 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.517 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.517 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.517 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.518 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.518 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.518 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.518 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.519 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.519 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.519 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.519 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.520 n193-018-074:2301451:2302655 [3] NCCL INFO 
Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.520 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.520 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.521 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.521 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.521 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.522 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.522 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.523 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.523 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.524 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.524 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.525 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.525 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.525 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.526 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.527 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.527 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 
11:12:47.527 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.527 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.528 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.528 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.529 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.529 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.529 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.530 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.530 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.530 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.531 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.531 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.531 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.532 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.533 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.533 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.534 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.535 n193-018-074:2301454:2302657 [6] NCCL INFO 
Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.535 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.536 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.537 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.538 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.538 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.539 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.540 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.540 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.542 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.543 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.544 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.544 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.545 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.547 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:48.003 n193-018-074:2301448:2302652 [0] NCCL INFO Connected all trees 2024-03-10 11:12:48.003 n193-018-074:2301448:2302652 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.003 n193-018-074:2301448:2302652 [0] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 
11:12:48.065 n193-018-074:2301449:2302654 [1] NCCL INFO Connected all trees 2024-03-10 11:12:48.065 n193-018-074:2301449:2302654 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.065 n193-018-074:2301449:2302654 [1] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.071 n193-018-074:2301450:2302658 [2] NCCL INFO Connected all trees 2024-03-10 11:12:48.071 n193-018-074:2301450:2302658 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.071 n193-018-074:2301450:2302658 [2] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.090 n193-018-074:2301455:2302659 [7] NCCL INFO Connected all trees 2024-03-10 11:12:48.090 n193-018-074:2301455:2302659 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.090 n193-018-074:2301455:2302659 [7] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.091 n193-018-074:2301454:2302657 [6] NCCL INFO Connected all trees 2024-03-10 11:12:48.091 n193-018-074:2301454:2302657 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.091 n193-018-074:2301454:2302657 [6] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.092 n193-018-074:2301451:2302655 [3] NCCL INFO Connected all trees 2024-03-10 11:12:48.092 n193-018-074:2301451:2302655 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.092 n193-018-074:2301451:2302655 [3] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.092 n193-018-074:2301453:2302656 [5] NCCL INFO Connected all trees 2024-03-10 11:12:48.092 n193-018-074:2301452:2302653 [4] NCCL INFO Connected all trees 2024-03-10 11:12:48.092 n193-018-074:2301453:2302656 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 
2024-03-10 11:12:48.092 n193-018-074:2301452:2302653 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.092 n193-018-074:2301453:2302656 [5] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.092 n193-018-074:2301452:2302653 [4] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.116 n193-018-074:2301448:2302652 [0] NCCL INFO comm 0x1985b4bb0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.116 n193-018-074:2301453:2302656 [5] NCCL INFO comm 0x1862cf940 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.116 n193-018-074:2301454:2302657 [6] NCCL INFO comm 0xb7182140 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.116 n193-018-074:2301452:2302653 [4] NCCL INFO comm 0x185a77340 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.116 n193-018-074:2301450:2302658 [2] NCCL INFO comm 0xb858a750 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.117 n193-018-074:2301455:2302659 [7] NCCL INFO comm 0x1872d08c0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.117 n193-018-074:2301451:2302655 [3] NCCL INFO comm 0xb6209bc0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.117 n193-018-074:2301449:2302654 [1] NCCL INFO comm 0x1862a5d40 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0x699b1860b4474e85 - Init COMPLETE /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) 
will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). 
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. 
To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). 
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. 
To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). 
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. 
To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). 
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. 
To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). 
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. 
To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True).
warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
/usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.)
total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)])
0%| | 1/2774 [00:24<18:30:55, 24.04s/it] {'loss': 1.1294, 'learning_rate': 5.952380952380953e-08, 'epoch': 0.0}
0%| | 2/2774 [00:35<12:43:02, 16.52s/it] {'loss': 1.1387, 'learning_rate': 1.1904761904761906e-07, 'epoch': 0.0}
0%| | 3/2774 [00:46<11:00:19, 14.30s/it] {'loss': 1.1772, 'learning_rate': 1.7857142857142858e-07, 'epoch': 0.0}
0%| | 4/2774 [00:58<10:02:47, 13.06s/it] {'loss': 1.1333, 'learning_rate': 2.3809523809523811e-07, 'epoch': 0.0}
0%| | 5/2774 [01:09<9:34:00, 12.44s/it] {'loss': 1.1484, 'learning_rate': 2.9761904761904765e-07, 'epoch': 0.0}
0%| | 6/2774 [01:22<9:42:47, 12.63s/it] {'loss': 1.1582, 'learning_rate': 3.5714285714285716e-07, 'epoch': 0.0}
0%| | 7/2774 [01:34<9:36:10, 12.49s/it] {'loss': 1.0933, 'learning_rate': 4.1666666666666667e-07, 'epoch': 0.0}
0%| | 8/2774 [01:45<9:16:32, 12.07s/it] {'loss': 1.1504, 'learning_rate': 4.7619047619047623e-07, 'epoch': 0.0}
0%| | 9/2774 [01:58<9:19:28, 12.14s/it] {'loss': 1.1104, 'learning_rate': 5.357142857142857e-07, 'epoch': 0.0}
0%| | 10/2774 [02:09<9:10:17, 11.95s/it] {'loss': 1.1162, 'learning_rate': 5.952380952380953e-07, 'epoch': 0.0}
0%| | 11/2774 [02:21<9:04:01, 11.81s/it] {'loss': 1.1558, 'learning_rate': 6.547619047619048e-07, 'epoch': 0.0}
0%| | 12/2774 [02:33<9:06:58, 11.88s/it] {'loss': 1.165, 'learning_rate': 7.142857142857143e-07, 'epoch': 0.0}
0%| | 13/2774 [02:44<9:01:05, 11.76s/it] {'loss': 1.1729, 'learning_rate': 7.738095238095239e-07, 'epoch': 0.0}
1%| | 14/2774 [02:55<8:54:09, 11.61s/it] {'loss': 1.1426, 'learning_rate': 8.333333333333333e-07, 'epoch': 0.01}
1%| | 15/2774 [03:09<9:19:45, 12.17s/it] {'loss': 1.0425, 'learning_rate': 8.928571428571429e-07, 'epoch': 0.01}
1%| | 16/2774 [03:20<9:07:34, 11.91s/it] {'loss': 1.1709, 'learning_rate': 9.523809523809525e-07, 'epoch': 0.01}
1%| | 17/2774 [03:32<9:06:06, 11.88s/it] {'loss': 1.1748, 'learning_rate': 1.011904761904762e-06, 'epoch': 0.01}
1%| | 18/2774 [03:44<9:01:43, 11.79s/it] {'loss': 1.2041, 'learning_rate': 1.0714285714285714e-06, 'epoch': 0.01}
1%| | 19/2774 [03:55<9:00:46, 11.78s/it] {'loss': 1.1641, 'learning_rate': 1.130952380952381e-06, 'epoch': 0.01}
1%| | 20/2774 [04:07<9:02:40, 11.82s/it] {'loss': 1.0566, 'learning_rate': 1.1904761904761906e-06, 'epoch': 0.01}
1%| | 21/2774 [04:19<9:03:47, 11.85s/it] {'loss': 1.1689, 'learning_rate': 1.25e-06, 'epoch': 0.01}
1%| | 22/2774 [04:31<8:57:40, 11.72s/it] {'loss': 1.1099, 'learning_rate': 1.3095238095238096e-06, 'epoch': 0.01}
1%| | 23/2774 [04:42<8:49:14, 11.54s/it] {'loss': 1.104, 'learning_rate': 1.3690476190476193e-06, 'epoch': 0.01}
1%| | 24/2774 [04:53<8:45:14, 11.46s/it] {'loss': 1.126, 'learning_rate': 1.4285714285714286e-06, 'epoch': 0.01}
1%| | 25/2774 [05:05<8:47:18, 11.51s/it] {'loss': 1.0576, 'learning_rate': 1.4880952380952381e-06, 'epoch': 0.01}
1%| | 26/2774 [05:17<8:54:40, 11.67s/it] {'loss': 1.0488, 'learning_rate': 1.5476190476190479e-06, 'epoch': 0.01}
1%| | 27/2774 [05:28<8:53:58, 11.66s/it] {'loss': 1.1729, 'learning_rate': 1.6071428571428574e-06, 'epoch': 0.01}
1%| | 28/2774 [05:40<8:56:26, 11.72s/it] {'loss': 1.1636, 'learning_rate': 1.6666666666666667e-06, 'epoch': 0.01}
1%| | 29/2774 [05:53<9:06:07, 11.94s/it] {'loss': 1.0781, 'learning_rate': 1.7261904761904764e-06, 'epoch': 0.01}
1%| | 30/2774 [06:04<8:59:55, 11.81s/it] {'loss': 1.0566, 'learning_rate': 1.7857142857142859e-06, 'epoch': 0.01}
1%| | 31/2774 [06:16<8:54:05, 11.68s/it] {'loss': 1.082, 'learning_rate': 1.8452380952380954e-06, 'epoch': 0.01}
1%| | 32/2774 [06:28<9:00:40, 11.83s/it] {'loss': 1.042, 'learning_rate': 1.904761904761905e-06, 'epoch': 0.01}
1%| | 33/2774 [06:41<9:26:14, 12.39s/it] {'loss': 1.0249, 'learning_rate': 1.9642857142857144e-06, 'epoch': 0.01}
1%| | 34/2774 [06:53<9:10:32, 12.06s/it] {'loss': 1.0791, 'learning_rate': 2.023809523809524e-06, 'epoch': 0.01}
1%|▏ | 35/2774 [07:04<9:05:50, 11.96s/it] {'loss': 1.0693, 'learning_rate': 2.0833333333333334e-06, 'epoch': 0.01}
1%|▏ | 36/2774 [07:16<9:04:14, 11.93s/it] {'loss': 1.0415, 'learning_rate': 2.1428571428571427e-06, 'epoch': 0.01}
1%|▏ | 37/2774 [07:30<9:30:08, 12.50s/it] {'loss': 1.0752, 'learning_rate': 2.2023809523809525e-06, 'epoch': 0.01}
1%|▏ | 38/2774 [07:42<9:17:09, 12.22s/it] {'loss': 1.0649, 'learning_rate': 2.261904761904762e-06, 'epoch': 0.01}
1%|▏ | 39/2774 [07:55<9:31:16, 12.53s/it] {'loss': 1.0972, 'learning_rate': 2.321428571428572e-06, 'epoch': 0.01}
1%|▏ | 40/2774 [08:06<9:14:58, 12.18s/it] {'loss': 1.0835, 'learning_rate': 2.380952380952381e-06, 'epoch': 0.01}
1%|▏ | 41/2774 [08:19<9:28:01, 12.47s/it] {'loss': 1.0435, 'learning_rate': 2.4404761904761905e-06, 'epoch': 0.01}
2%|▏ | 42/2774 [08:32<9:34:32, 12.62s/it] {'loss': 1.105, 'learning_rate': 2.5e-06, 'epoch': 0.02}
2%|▏ | 43/2774 [08:44<9:21:49, 12.34s/it] {'loss': 1.1113, 'learning_rate': 2.5595238095238095e-06, 'epoch': 0.02}
2%|▏ | 44/2774 [08:56<9:10:51, 12.11s/it] {'loss': 1.002, 'learning_rate': 2.6190476190476192e-06, 'epoch': 0.02}
2%|▏ | 45/2774 [09:07<8:59:35, 11.86s/it] {'loss': 1.0391, 'learning_rate': 2.6785714285714285e-06, 'epoch': 0.02}
2%|▏ | 46/2774 [09:18<8:52:02, 11.70s/it] {'loss': 1.0635, 'learning_rate': 2.7380952380952387e-06, 'epoch': 0.02}
2%|▏ | 47/2774 [09:30<8:51:18, 11.69s/it] {'loss': 1.0591, 'learning_rate': 2.797619047619048e-06, 'epoch': 0.02}
2%|▏ | 48/2774 [09:41<8:46:33, 11.59s/it] {'loss': 1.083, 'learning_rate': 2.8571428571428573e-06, 'epoch': 0.02}
2%|▏ | 49/2774 [09:53<8:51:43, 11.71s/it] {'loss': 1.0903, 'learning_rate': 2.916666666666667e-06, 'epoch': 0.02}
2%|▏ | 50/2774 [10:05<8:49:51, 11.67s/it] {'loss': 1.0776, 'learning_rate': 2.9761904761904763e-06, 'epoch': 0.02}
2%|▏ | 51/2774 [10:16<8:46:36, 11.60s/it] {'loss': 0.9985, 'learning_rate': 3.0357142857142856e-06, 'epoch': 0.02}
2%|▏ | 52/2774 [10:29<9:02:45, 11.96s/it] {'loss': 1.002, 'learning_rate': 3.0952380952380957e-06, 'epoch': 0.02}
2%|▏ | 53/2774 [10:41<9:00:23, 11.92s/it] {'loss': 1.0967, 'learning_rate': 3.154761904761905e-06, 'epoch': 0.02}
2%|▏ | 54/2774 [10:52<8:50:58, 11.71s/it] {'loss': 1.0996, 'learning_rate': 3.2142857142857147e-06, 'epoch': 0.02}
2%|▏ | 55/2774 [11:04<8:46:49, 11.63s/it] {'loss': 1.0845, 'learning_rate': 3.273809523809524e-06, 'epoch': 0.02}
2%|▏ | 56/2774 [11:15<8:45:14, 11.59s/it] {'loss': 1.0469, 'learning_rate': 3.3333333333333333e-06, 'epoch': 0.02}
2%|▏ | 57/2774 [11:27<8:43:57, 11.57s/it] {'loss': 1.0381, 'learning_rate': 3.3928571428571435e-06, 'epoch': 0.02}
2%|▏ | 58/2774 [11:38<8:45:27, 11.61s/it] {'loss': 1.0674, 'learning_rate': 3.4523809523809528e-06, 'epoch': 0.02}
2%|▏ | 59/2774 [11:51<8:56:43, 11.86s/it] {'loss': 1.0181, 'learning_rate': 3.511904761904762e-06, 'epoch': 0.02}
2%|▏ | 60/2774 [12:03<9:06:03, 12.07s/it] {'loss': 1.0278, 'learning_rate': 3.5714285714285718e-06, 'epoch': 0.02}
2%|▏ | 61/2774 [12:17<9:25:33, 12.51s/it] {'loss': 0.9951, 'learning_rate': 3.630952380952381e-06, 'epoch': 0.02}
2%|▏ | 62/2774 [12:29<9:14:41, 12.27s/it] {'loss': 1.0386, 'learning_rate': 3.690476190476191e-06, 'epoch': 0.02}
2%|▏ | 63/2774 [12:42<9:33:38, 12.70s/it] {'loss': 1.0269, 'learning_rate': 3.7500000000000005e-06, 'epoch': 0.02}
2%|▏ | 64/2774 [12:54<9:16:43, 12.33s/it] {'loss': 1.0552, 'learning_rate': 3.80952380952381e-06, 'epoch': 0.02}
2%|▏ | 65/2774 [13:05<9:03:49, 12.04s/it] {'loss': 1.063, 'learning_rate': 3.869047619047619e-06, 'epoch': 0.02}
2%|▏ | 66/2774 [13:18<9:15:38, 12.31s/it] {'loss': 1.0244, 'learning_rate': 3.928571428571429e-06, 'epoch': 0.02}
2%|▏ | 67/2774 [13:29<9:02:32, 12.03s/it] {'loss': 1.105, 'learning_rate': 3.9880952380952386e-06, 'epoch': 0.02}
2%|▏ | 68/2774 [13:41<8:55:36, 11.88s/it] {'loss': 1.043, 'learning_rate': 4.047619047619048e-06, 'epoch': 0.02}
2%|▏ | 69/2774 [13:53<8:54:22, 11.85s/it] {'loss': 1.0854, 'learning_rate': 4.107142857142857e-06, 'epoch': 0.02}
3%|▎ | 70/2774 [14:04<8:45:55, 11.67s/it] {'loss': 0.9614, 'learning_rate': 4.166666666666667e-06, 'epoch': 0.03}
3%|▎ | 71/2774 [14:16<8:44:02, 11.63s/it] {'loss': 1.0498, 'learning_rate': 4.226190476190477e-06, 'epoch': 0.03}
3%|▎ | 72/2774 [14:27<8:41:28, 11.58s/it] {'loss': 1.0669, 'learning_rate': 4.2857142857142855e-06, 'epoch': 0.03}
3%|▎ | 73/2774 [14:38<8:38:21, 11.51s/it] {'loss': 1.042, 'learning_rate': 4.345238095238096e-06, 'epoch': 0.03}
3%|▎ | 74/2774 [14:50<8:36:57, 11.49s/it] {'loss': 1.083, 'learning_rate': 4.404761904761905e-06, 'epoch': 0.03}
3%|▎ | 75/2774 [15:01<8:37:22, 11.50s/it] {'loss': 1.0146, 'learning_rate': 4.464285714285715e-06, 'epoch': 0.03}
3%|▎ | 76/2774 [15:12<8:32:22, 11.39s/it] {'loss': 1.0522, 'learning_rate': 4.523809523809524e-06, 'epoch': 0.03}
3%|▎ | 77/2774 [15:24<8:36:05, 11.48s/it] {'loss': 1.0, 'learning_rate': 4.583333333333333e-06, 'epoch': 0.03}
3%|▎ | 78/2774 [15:35<8:33:54, 11.44s/it] {'loss': 1.0752, 'learning_rate': 4.642857142857144e-06, 'epoch': 0.03}
3%|▎ | 79/2774 [15:47<8:30:43, 11.37s/it] {'loss': 1.0649, 'learning_rate': 4.702380952380953e-06, 'epoch': 0.03}
3%|▎ | 80/2774 [15:58<8:30:48, 11.38s/it] {'loss': 1.0405, 'learning_rate': 4.761904761904762e-06, 'epoch': 0.03}
3%|▎ | 81/2774 [16:09<8:31:16, 11.39s/it] {'loss': 1.0566, 'learning_rate': 4.821428571428572e-06, 'epoch': 0.03}
3%|▎ | 82/2774 [16:21<8:31:21, 11.40s/it] {'loss': 1.0557, 'learning_rate': 4.880952380952381e-06, 'epoch': 0.03}
3%|▎ | 83/2774 [16:33<8:41:01, 11.62s/it] {'loss': 0.9883, 'learning_rate': 4.940476190476191e-06, 'epoch': 0.03}
3%|▎ | 84/2774 [16:44<8:35:23, 11.50s/it] {'loss': 0.9775, 'learning_rate': 5e-06, 'epoch': 0.03}
3%|▎ | 85/2774 [16:56<8:42:45, 11.66s/it] {'loss': 0.9976, 'learning_rate': 4.999998295075511e-06, 'epoch': 0.03}
3%|▎ | 86/2774 [17:08<8:43:13, 11.68s/it] {'loss': 1.0503, 'learning_rate': 4.9999931803043675e-06, 'epoch': 0.03}
3%|▎ | 87/2774 [17:19<8:39:57, 11.61s/it] {'loss': 1.0117, 'learning_rate': 4.999984655693547e-06, 'epoch': 0.03}
3%|▎ | 88/2774 [17:31<8:37:36, 11.56s/it] {'loss': 1.0293, 'learning_rate': 4.999972721254676e-06, 'epoch': 0.03}
3%|▎ | 89/2774 [17:43<8:38:24, 11.58s/it] {'loss': 1.0674, 'learning_rate': 4.999957377004031e-06, 'epoch': 0.03}
3%|▎ | 90/2774 [17:54<8:33:31, 11.48s/it] {'loss': 1.0532, 'learning_rate': 4.9999386229625436e-06, 'epoch': 0.03}
3%|▎ | 91/2774 [18:05<8:31:35, 11.44s/it] {'loss': 0.9653, 'learning_rate': 4.999916459155791e-06, 'epoch': 0.03}
3%|▎ | 92/2774 [18:17<8:32:24, 11.46s/it] {'loss': 1.0649, 'learning_rate': 4.999890885614004e-06, 'epoch': 0.03}
3%|▎ | 93/2774 [18:28<8:27:06, 11.35s/it] {'loss': 1.0786, 'learning_rate': 4.999861902372063e-06, 'epoch': 0.03}
3%|▎ | 94/2774 [18:40<8:32:49, 11.48s/it] {'loss': 1.0527, 'learning_rate': 4.9998295094694995e-06, 'epoch': 0.03}
3%|▎ | 95/2774 [18:53<8:56:49, 12.02s/it] {'loss': 0.979, 'learning_rate': 4.999793706950496e-06, 'epoch': 0.03}
3%|▎ | 96/2774 [19:04<8:44:37, 11.75s/it] {'loss': 1.0317, 'learning_rate': 4.999754494863884e-06, 'epoch': 0.03}
3%|▎ | 97/2774 [19:15<8:41:24, 11.69s/it] {'loss': 1.0347, 'learning_rate': 4.999711873263148e-06, 'epoch': 0.03}
4%|▎ | 98/2774 [19:27<8:37:23, 11.60s/it] {'loss': 1.0322, 'learning_rate': 4.9996658422064195e-06, 'epoch': 0.04}
4%|▎ | 99/2774 [19:39<8:38:23, 11.63s/it] {'loss': 1.0366, 'learning_rate': 4.9996164017564835e-06, 'epoch': 0.04}
4%|▎ | 100/2774 [19:50<8:35:40, 11.57s/it] {'loss': 1.0151, 'learning_rate': 4.999563551980773e-06, 'epoch': 0.04}
4%|▎ | 101/2774 [20:03<8:55:10, 12.01s/it] {'loss': 0.9834, 'learning_rate': 4.999507292951371e-06, 'epoch': 0.04}
4%|▎ | 102/2774 [20:14<8:43:27, 11.75s/it] {'loss': 1.0244, 'learning_rate': 4.9994476247450145e-06, 'epoch': 0.04}
4%|▎ | 103/2774 [20:26<8:48:57, 11.88s/it] {'loss': 0.9766, 'learning_rate': 4.999384547443084e-06, 'epoch': 0.04}
4%|▎ | 104/2774 [20:38<8:43:11, 11.76s/it] {'loss': 1.0312, 'learning_rate': 4.999318061131614e-06, 'epoch': 0.04}
4%|▍ | 105/2774 [20:49<8:37:39, 11.64s/it] {'loss': 1.0693, 'learning_rate': 4.999248165901289e-06, 'epoch': 0.04}
4%|▍ | 106/2774 [21:01<8:39:37, 11.69s/it] {'loss': 1.0796, 'learning_rate': 4.999174861847441e-06, 'epoch': 0.04}
4%|▍ | 107/2774 [21:13<8:38:49, 11.67s/it] {'loss': 1.0425, 'learning_rate': 4.999098149070052e-06, 'epoch': 0.04}
4%|▍ | 108/2774 [21:26<9:07:31, 12.32s/it] {'loss': 1.0151, 'learning_rate': 4.999018027673754e-06, 'epoch': 0.04}
4%|▍ | 109/2774 [21:38<9:02:47, 12.22s/it] {'loss': 1.0215, 'learning_rate': 4.998934497767829e-06, 'epoch': 0.04}
4%|▍ | 110/2774 [21:50<8:55:07, 12.05s/it] {'loss': 1.0576, 'learning_rate': 4.998847559466204e-06, 'epoch': 0.04}
4%|▍ | 111/2774 [22:03<9:06:03, 12.30s/it] {'loss': 1.0854, 'learning_rate': 4.99875721288746e-06, 'epoch': 0.04}
4%|▍ | 112/2774 [22:14<8:50:11, 11.95s/it] {'loss': 1.042, 'learning_rate': 4.9986634581548235e-06, 'epoch': 0.04}
4%|▍ | 113/2774 [22:26<8:43:53, 11.81s/it] {'loss': 1.0664, 'learning_rate': 4.998566295396169e-06, 'epoch': 0.04}
4%|▍ | 114/2774 [22:37<8:40:05, 11.73s/it] {'loss': 1.0088, 'learning_rate': 4.998465724744023e-06, 'epoch': 0.04}
4%|▍ | 115/2774 [22:48<8:31:47, 11.55s/it] {'loss': 1.0615, 'learning_rate': 4.998361746335556e-06, 'epoch': 0.04}
4%|▍ | 116/2774 [23:01<8:48:11, 11.92s/it] {'loss': 1.042, 'learning_rate': 4.998254360312589e-06, 'epoch': 0.04}
4%|▍ | 117/2774 [23:13<8:50:49, 11.99s/it] {'loss': 1.0137, 'learning_rate': 4.998143566821589e-06, 'epoch': 0.04}
4%|▍ | 118/2774 [23:25<8:45:34, 11.87s/it] {'loss': 1.0347, 'learning_rate': 4.998029366013674e-06, 'epoch': 0.04}
4%|▍ | 119/2774 [23:36<8:42:12, 11.80s/it] {'loss': 1.0532, 'learning_rate': 4.997911758044605e-06, 'epoch': 0.04}
4%|▍ | 120/2774 [23:48<8:37:05, 11.69s/it] {'loss': 1.0752, 'learning_rate': 4.997790743074793e-06, 'epoch': 0.04}
4%|▍ | 121/2774 [23:59<8:31:51, 11.58s/it] {'loss': 0.9868, 'learning_rate': 4.997666321269294e-06, 'epoch': 0.04}
4%|▍ | 122/2774 [24:11<8:31:34, 11.57s/it] {'loss': 1.0439, 'learning_rate': 4.997538492797813e-06, 'epoch': 0.04}
4%|▍ | 123/2774 [24:22<8:28:32, 11.51s/it] {'loss': 0.9834, 'learning_rate': 4.9974072578347e-06, 'epoch': 0.04}
4%|▍ | 124/2774 [24:34<8:27:25, 11.49s/it] {'loss': 1.106, 'learning_rate': 4.997272616558952e-06, 'epoch': 0.04}
5%|▍ | 125/2774 [24:46<8:46:10, 11.92s/it] {'loss': 1.0312, 'learning_rate': 4.99713456915421e-06, 'epoch': 0.05}
5%|▍ | 126/2774 [24:58<8:40:16, 11.79s/it] {'loss': 0.9814, 'learning_rate': 4.996993115808765e-06, 'epoch': 0.05}
5%|▍ | 127/2774 [25:10<8:38:26, 11.75s/it] {'loss': 1.0161, 'learning_rate': 4.996848256715547e-06, 'epoch': 0.05}
5%|▍ | 128/2774 [25:22<8:41:46, 11.83s/it] {'loss': 1.0122, 'learning_rate': 4.996699992072139e-06, 'epoch': 0.05}
5%|▍ | 129/2774 [25:34<8:53:12, 12.10s/it] {'loss': 1.0459, 'learning_rate': 4.996548322080763e-06, 'epoch': 0.05}
5%|▍ | 130/2774 [25:46<8:46:33, 11.95s/it] {'loss': 1.0337, 'learning_rate': 4.996393246948288e-06, 'epoch': 0.05}
5%|▍ | 131/2774 [25:57<8:39:34, 11.80s/it] {'loss': 1.0166, 'learning_rate': 4.996234766886227e-06, 'epoch': 0.05}
5%|▍ | 132/2774 [26:09<8:32:49, 11.65s/it] {'loss': 1.0342, 'learning_rate': 4.996072882110737e-06, 'epoch': 0.05}
5%|▍ | 133/2774 [26:20<8:33:21, 11.66s/it] {'loss': 1.0039, 'learning_rate': 4.995907592842619e-06, 'epoch': 0.05}
5%|▍ | 134/2774 [26:32<8:26:23, 11.51s/it] {'loss': 1.0557, 'learning_rate': 4.995738899307319e-06, 'epoch': 0.05}
5%|▍ | 135/2774 [26:43<8:27:52, 11.55s/it] {'loss': 1.0161, 'learning_rate': 4.995566801734923e-06, 'epoch': 0.05}
5%|▍ | 136/2774 [26:55<8:31:51, 11.64s/it] {'loss': 1.0039, 'learning_rate': 4.9953913003601625e-06, 'epoch': 0.05}
5%|▍ | 137/2774 [27:06<8:27:50, 11.56s/it] {'loss': 1.0215, 'learning_rate': 4.995212395422412e-06, 'epoch': 0.05}
5%|▍ | 138/2774 [27:18<8:25:54, 11.52s/it] {'loss': 1.0527, 'learning_rate': 4.995030087165684e-06, 'epoch': 0.05}
5%|▌ | 139/2774 [27:30<8:32:48, 11.68s/it] {'loss': 0.9849, 'learning_rate': 4.994844375838639e-06, 'epoch': 0.05}
5%|▌ | 140/2774 [27:41<8:29:43, 11.61s/it] {'loss': 1.0498, 'learning_rate': 4.994655261694575e-06, 'epoch': 0.05}
5%|▌ | 141/2774 [27:53<8:29:45, 11.62s/it] {'loss': 1.0322,
'learning_rate': 4.994462744991431e-06, 'epoch': 0.05} 5%|▌ | 141/2774 [27:53<8:29:45, 11.62s/it] 5%|▌ | 142/2774 [28:05<8:29:28, 11.61s/it] {'loss': 1.0396, 'learning_rate': 4.994266825991788e-06, 'epoch': 0.05} 5%|▌ | 142/2774 [28:05<8:29:28, 11.61s/it] 5%|▌ | 143/2774 [28:16<8:28:00, 11.58s/it] {'loss': 1.0088, 'learning_rate': 4.9940675049628715e-06, 'epoch': 0.05} 5%|▌ | 143/2774 [28:16<8:28:00, 11.58s/it] 5%|▌ | 144/2774 [28:28<8:25:43, 11.54s/it] {'loss': 1.0078, 'learning_rate': 4.993864782176539e-06, 'epoch': 0.05} 5%|▌ | 144/2774 [28:28<8:25:43, 11.54s/it] 5%|▌ | 145/2774 [28:40<8:36:47, 11.79s/it] {'loss': 0.9629, 'learning_rate': 4.993658657909294e-06, 'epoch': 0.05} 5%|▌ | 145/2774 [28:40<8:36:47, 11.79s/it] 5%|▌ | 146/2774 [28:53<8:48:59, 12.08s/it] {'loss': 0.9785, 'learning_rate': 4.993449132442278e-06, 'epoch': 0.05} 5%|▌ | 146/2774 [28:53<8:48:59, 12.08s/it] 5%|▌ | 147/2774 [29:04<8:45:38, 12.01s/it] {'loss': 1.0596, 'learning_rate': 4.9932362060612694e-06, 'epoch': 0.05} 5%|▌ | 147/2774 [29:04<8:45:38, 12.01s/it] 5%|▌ | 148/2774 [29:16<8:44:39, 11.99s/it] {'loss': 1.0303, 'learning_rate': 4.993019879056689e-06, 'epoch': 0.05} 5%|▌ | 148/2774 [29:16<8:44:39, 11.99s/it] 5%|▌ | 149/2774 [29:28<8:35:59, 11.79s/it] {'loss': 1.0684, 'learning_rate': 4.992800151723592e-06, 'epoch': 0.05} 5%|▌ | 149/2774 [29:28<8:35:59, 11.79s/it] 5%|▌ | 150/2774 [29:39<8:30:06, 11.66s/it] {'loss': 0.9956, 'learning_rate': 4.9925770243616745e-06, 'epoch': 0.05} 5%|▌ | 150/2774 [29:39<8:30:06, 11.66s/it] 5%|▌ | 151/2774 [29:50<8:24:15, 11.53s/it] {'loss': 1.0322, 'learning_rate': 4.992350497275268e-06, 'epoch': 0.05} 5%|▌ | 151/2774 [29:50<8:24:15, 11.53s/it] 5%|▌ | 152/2774 [30:02<8:24:34, 11.55s/it] {'loss': 1.0742, 'learning_rate': 4.992120570773342e-06, 'epoch': 0.05} 5%|▌ | 152/2774 [30:02<8:24:34, 11.55s/it] 6%|▌ | 153/2774 [30:14<8:30:18, 11.68s/it] {'loss': 1.0566, 'learning_rate': 4.991887245169502e-06, 'epoch': 0.06} 6%|▌ | 153/2774 [30:14<8:30:18, 11.68s/it] 
6%|▌ | 154/2774 [30:26<8:29:25, 11.67s/it] {'loss': 1.0605, 'learning_rate': 4.99165052078199e-06, 'epoch': 0.06} 6%|▌ | 154/2774 [30:26<8:29:25, 11.67s/it] 6%|▌ | 155/2774 [30:38<8:39:45, 11.91s/it] {'loss': 1.0112, 'learning_rate': 4.991410397933685e-06, 'epoch': 0.06} 6%|▌ | 155/2774 [30:38<8:39:45, 11.91s/it] 6%|▌ | 156/2774 [30:49<8:32:35, 11.75s/it] {'loss': 1.0498, 'learning_rate': 4.991166876952098e-06, 'epoch': 0.06} 6%|▌ | 156/2774 [30:49<8:32:35, 11.75s/it] 6%|▌ | 157/2774 [31:01<8:32:24, 11.75s/it] {'loss': 1.0659, 'learning_rate': 4.990919958169379e-06, 'epoch': 0.06} 6%|▌ | 157/2774 [31:01<8:32:24, 11.75s/it] 6%|▌ | 158/2774 [31:14<8:50:46, 12.17s/it] {'loss': 0.9556, 'learning_rate': 4.99066964192231e-06, 'epoch': 0.06} 6%|▌ | 158/2774 [31:14<8:50:46, 12.17s/it] 6%|▌ | 159/2774 [31:27<8:53:02, 12.23s/it] {'loss': 1.0527, 'learning_rate': 4.990415928552306e-06, 'epoch': 0.06} 6%|▌ | 159/2774 [31:27<8:53:02, 12.23s/it] 6%|▌ | 160/2774 [31:40<9:05:10, 12.51s/it] {'loss': 1.1377, 'learning_rate': 4.990158818405417e-06, 'epoch': 0.06} 6%|▌ | 160/2774 [31:40<9:05:10, 12.51s/it] 6%|▌ | 161/2774 [31:52<8:57:01, 12.33s/it] {'loss': 1.0562, 'learning_rate': 4.9898983118323265e-06, 'epoch': 0.06} 6%|▌ | 161/2774 [31:52<8:57:01, 12.33s/it] 6%|▌ | 162/2774 [32:03<8:43:30, 12.03s/it] {'loss': 1.0586, 'learning_rate': 4.989634409188349e-06, 'epoch': 0.06} 6%|▌ | 162/2774 [32:03<8:43:30, 12.03s/it] 6%|▌ | 163/2774 [32:15<8:36:25, 11.87s/it] {'loss': 1.0205, 'learning_rate': 4.989367110833432e-06, 'epoch': 0.06} 6%|▌ | 163/2774 [32:15<8:36:25, 11.87s/it] 6%|▌ | 164/2774 [32:27<8:46:49, 12.11s/it] {'loss': 1.042, 'learning_rate': 4.9890964171321535e-06, 'epoch': 0.06} 6%|▌ | 164/2774 [32:27<8:46:49, 12.11s/it] 6%|▌ | 165/2774 [32:39<8:35:48, 11.86s/it] {'loss': 1.0288, 'learning_rate': 4.988822328453725e-06, 'epoch': 0.06} 6%|▌ | 165/2774 [32:39<8:35:48, 11.86s/it] 6%|▌ | 166/2774 [32:50<8:35:04, 11.85s/it] {'loss': 1.0459, 'learning_rate': 4.988544845171986e-06, 
'epoch': 0.06} 6%|▌ | 166/2774 [32:50<8:35:04, 11.85s/it] 6%|▌ | 167/2774 [33:02<8:31:44, 11.78s/it] {'loss': 1.0356, 'learning_rate': 4.9882639676654075e-06, 'epoch': 0.06} 6%|▌ | 167/2774 [33:02<8:31:44, 11.78s/it] 6%|▌ | 168/2774 [33:13<8:26:07, 11.65s/it] {'loss': 1.0381, 'learning_rate': 4.987979696317088e-06, 'epoch': 0.06} 6%|▌ | 168/2774 [33:13<8:26:07, 11.65s/it] 6%|▌ | 169/2774 [33:25<8:25:22, 11.64s/it] {'loss': 1.0044, 'learning_rate': 4.987692031514758e-06, 'epoch': 0.06} 6%|▌ | 169/2774 [33:25<8:25:22, 11.64s/it] 6%|▌ | 170/2774 [33:36<8:22:40, 11.58s/it] {'loss': 1.0757, 'learning_rate': 4.9874009736507745e-06, 'epoch': 0.06} 6%|▌ | 170/2774 [33:36<8:22:40, 11.58s/it] 6%|▌ | 171/2774 [33:48<8:21:08, 11.55s/it] {'loss': 1.0215, 'learning_rate': 4.987106523122122e-06, 'epoch': 0.06} 6%|▌ | 171/2774 [33:48<8:21:08, 11.55s/it] 6%|▌ | 172/2774 [34:01<8:35:13, 11.88s/it] {'loss': 1.0396, 'learning_rate': 4.986808680330415e-06, 'epoch': 0.06} 6%|▌ | 172/2774 [34:01<8:35:13, 11.88s/it] 6%|▌ | 173/2774 [34:12<8:29:56, 11.76s/it] {'loss': 1.0537, 'learning_rate': 4.9865074456818906e-06, 'epoch': 0.06} 6%|▌ | 173/2774 [34:12<8:29:56, 11.76s/it] 6%|▋ | 174/2774 [34:26<8:57:15, 12.40s/it] {'loss': 1.0645, 'learning_rate': 4.9862028195874165e-06, 'epoch': 0.06} 6%|▋ | 174/2774 [34:26<8:57:15, 12.40s/it] 6%|▋ | 175/2774 [34:38<8:54:03, 12.33s/it] {'loss': 0.999, 'learning_rate': 4.985894802462485e-06, 'epoch': 0.06} 6%|▋ | 175/2774 [34:38<8:54:03, 12.33s/it] 6%|▋ | 176/2774 [34:51<9:02:01, 12.52s/it] {'loss': 1.0571, 'learning_rate': 4.985583394727211e-06, 'epoch': 0.06} 6%|▋ | 176/2774 [34:51<9:02:01, 12.52s/it] 6%|▋ | 177/2774 [35:04<9:11:46, 12.75s/it] {'loss': 0.9673, 'learning_rate': 4.985268596806336e-06, 'epoch': 0.06} 6%|▋ | 177/2774 [35:04<9:11:46, 12.75s/it] 6%|▋ | 178/2774 [35:15<8:48:26, 12.21s/it] {'loss': 1.0605, 'learning_rate': 4.9849504091292264e-06, 'epoch': 0.06} 6%|▋ | 178/2774 [35:15<8:48:26, 12.21s/it] 6%|▋ | 179/2774 [35:26<8:34:29, 
11.90s/it] {'loss': 1.0771, 'learning_rate': 4.98462883212987e-06, 'epoch': 0.06} 6%|▋ | 179/2774 [35:26<8:34:29, 11.90s/it] 6%|▋ | 180/2774 [35:38<8:29:28, 11.78s/it] {'loss': 1.1079, 'learning_rate': 4.984303866246879e-06, 'epoch': 0.06} 6%|▋ | 180/2774 [35:38<8:29:28, 11.78s/it] 7%|▋ | 181/2774 [35:50<8:35:52, 11.94s/it] {'loss': 0.9976, 'learning_rate': 4.983975511923488e-06, 'epoch': 0.07} 7%|▋ | 181/2774 [35:50<8:35:52, 11.94s/it] 7%|▋ | 182/2774 [36:04<8:56:25, 12.42s/it] {'loss': 0.9878, 'learning_rate': 4.98364376960755e-06, 'epoch': 0.07} 7%|▋ | 182/2774 [36:04<8:56:25, 12.42s/it] 7%|▋ | 183/2774 [36:16<8:47:49, 12.22s/it] {'loss': 1.0107, 'learning_rate': 4.983308639751544e-06, 'epoch': 0.07} 7%|▋ | 183/2774 [36:16<8:47:49, 12.22s/it] 7%|▋ | 184/2774 [36:27<8:41:05, 12.07s/it] {'loss': 1.0322, 'learning_rate': 4.982970122812566e-06, 'epoch': 0.07} 7%|▋ | 184/2774 [36:27<8:41:05, 12.07s/it] 7%|▋ | 185/2774 [36:39<8:40:08, 12.05s/it] {'loss': 0.9829, 'learning_rate': 4.9826282192523315e-06, 'epoch': 0.07} 7%|▋ | 185/2774 [36:39<8:40:08, 12.05s/it] 7%|▋ | 186/2774 [36:51<8:31:24, 11.86s/it] {'loss': 1.0029, 'learning_rate': 4.982282929537179e-06, 'epoch': 0.07} 7%|▋ | 186/2774 [36:51<8:31:24, 11.86s/it] 7%|▋ | 187/2774 [37:02<8:24:13, 11.69s/it] {'loss': 1.0386, 'learning_rate': 4.98193425413806e-06, 'epoch': 0.07} 7%|▋ | 187/2774 [37:02<8:24:13, 11.69s/it] 7%|▋ | 188/2774 [37:14<8:22:51, 11.67s/it] {'loss': 1.0215, 'learning_rate': 4.9815821935305475e-06, 'epoch': 0.07} 7%|▋ | 188/2774 [37:14<8:22:51, 11.67s/it] 7%|▋ | 189/2774 [37:25<8:18:20, 11.57s/it] {'loss': 1.0249, 'learning_rate': 4.981226748194833e-06, 'epoch': 0.07} 7%|▋ | 189/2774 [37:25<8:18:20, 11.57s/it] 7%|▋ | 190/2774 [37:38<8:34:08, 11.94s/it] {'loss': 1.0068, 'learning_rate': 4.980867918615719e-06, 'epoch': 0.07} 7%|▋ | 190/2774 [37:38<8:34:08, 11.94s/it] 7%|▋ | 191/2774 [37:49<8:28:04, 11.80s/it] {'loss': 1.0366, 'learning_rate': 4.980505705282629e-06, 'epoch': 0.07} 7%|▋ | 191/2774 
[37:49<8:28:04, 11.80s/it] 7%|▋ | 192/2774 [38:01<8:26:49, 11.78s/it] {'loss': 1.0464, 'learning_rate': 4.980140108689602e-06, 'epoch': 0.07} 7%|▋ | 192/2774 [38:01<8:26:49, 11.78s/it] 7%|▋ | 193/2774 [38:14<8:41:40, 12.13s/it] {'loss': 0.9741, 'learning_rate': 4.979771129335286e-06, 'epoch': 0.07} 7%|▋ | 193/2774 [38:14<8:41:40, 12.13s/it] 7%|▋ | 194/2774 [38:25<8:31:53, 11.90s/it] {'loss': 1.041, 'learning_rate': 4.979398767722949e-06, 'epoch': 0.07} 7%|▋ | 194/2774 [38:25<8:31:53, 11.90s/it] 7%|▋ | 195/2774 [38:36<8:22:28, 11.69s/it] {'loss': 0.9717, 'learning_rate': 4.97902302436047e-06, 'epoch': 0.07} 7%|▋ | 195/2774 [38:36<8:22:28, 11.69s/it] 7%|▋ | 196/2774 [38:48<8:18:11, 11.59s/it] {'loss': 1.0186, 'learning_rate': 4.9786438997603385e-06, 'epoch': 0.07} 7%|▋ | 196/2774 [38:48<8:18:11, 11.59s/it] 7%|▋ | 197/2774 [38:59<8:18:24, 11.60s/it] {'loss': 0.9966, 'learning_rate': 4.978261394439658e-06, 'epoch': 0.07} 7%|▋ | 197/2774 [38:59<8:18:24, 11.60s/it] 7%|▋ | 198/2774 [39:11<8:16:31, 11.56s/it] {'loss': 1.0659, 'learning_rate': 4.9778755089201445e-06, 'epoch': 0.07} 7%|▋ | 198/2774 [39:11<8:16:31, 11.56s/it] 7%|▋ | 199/2774 [39:24<8:35:59, 12.02s/it] {'loss': 1.0732, 'learning_rate': 4.97748624372812e-06, 'epoch': 0.07} 7%|▋ | 199/2774 [39:24<8:35:59, 12.02s/it] 7%|▋ | 200/2774 [39:36<8:28:51, 11.86s/it] {'loss': 1.0776, 'learning_rate': 4.97709359939452e-06, 'epoch': 0.07} 7%|▋ | 200/2774 [39:36<8:28:51, 11.86s/it] 7%|▋ | 201/2774 [39:47<8:24:03, 11.75s/it] {'loss': 1.166, 'learning_rate': 4.976697576454889e-06, 'epoch': 0.07} 7%|▋ | 201/2774 [39:47<8:24:03, 11.75s/it] 7%|▋ | 202/2774 [39:59<8:20:38, 11.68s/it] {'loss': 1.0479, 'learning_rate': 4.9762981754493755e-06, 'epoch': 0.07} 7%|▋ | 202/2774 [39:59<8:20:38, 11.68s/it] 7%|▋ | 203/2774 [40:10<8:17:45, 11.62s/it] {'loss': 1.0161, 'learning_rate': 4.97589539692274e-06, 'epoch': 0.07} 7%|▋ | 203/2774 [40:10<8:17:45, 11.62s/it] 7%|▋ | 204/2774 [40:23<8:29:58, 11.91s/it] {'loss': 1.0459, 'learning_rate': 
4.975489241424347e-06, 'epoch': 0.07} 7%|▋ | 204/2774 [40:23<8:29:58, 11.91s/it] 7%|▋ | 205/2774 [40:34<8:22:34, 11.74s/it] {'loss': 1.0908, 'learning_rate': 4.975079709508171e-06, 'epoch': 0.07} 7%|▋ | 205/2774 [40:34<8:22:34, 11.74s/it] 7%|▋ | 206/2774 [40:45<8:16:36, 11.60s/it] {'loss': 1.0581, 'learning_rate': 4.9746668017327845e-06, 'epoch': 0.07} 7%|▋ | 206/2774 [40:45<8:16:36, 11.60s/it] 7%|▋ | 207/2774 [40:58<8:33:43, 12.01s/it] {'loss': 0.9844, 'learning_rate': 4.974250518661371e-06, 'epoch': 0.07} 7%|▋ | 207/2774 [40:58<8:33:43, 12.01s/it] 7%|▋ | 208/2774 [41:09<8:24:14, 11.79s/it] {'loss': 0.999, 'learning_rate': 4.973830860861717e-06, 'epoch': 0.07} 7%|▋ | 208/2774 [41:09<8:24:14, 11.79s/it] 8%|▊ | 209/2774 [41:21<8:17:14, 11.63s/it] {'loss': 1.0444, 'learning_rate': 4.973407828906208e-06, 'epoch': 0.08} 8%|▊ | 209/2774 [41:21<8:17:14, 11.63s/it] 8%|▊ | 210/2774 [41:33<8:26:28, 11.85s/it] {'loss': 1.084, 'learning_rate': 4.9729814233718345e-06, 'epoch': 0.08} 8%|▊ | 210/2774 [41:33<8:26:28, 11.85s/it] 8%|▊ | 211/2774 [41:44<8:20:25, 11.72s/it] {'loss': 0.9941, 'learning_rate': 4.972551644840188e-06, 'epoch': 0.08} 8%|▊ | 211/2774 [41:44<8:20:25, 11.72s/it] 8%|▊ | 212/2774 [41:57<8:33:28, 12.03s/it] {'loss': 1.0229, 'learning_rate': 4.972118493897461e-06, 'epoch': 0.08} 8%|▊ | 212/2774 [41:57<8:33:28, 12.03s/it] 8%|▊ | 213/2774 [42:08<8:22:07, 11.76s/it] {'loss': 1.0664, 'learning_rate': 4.9716819711344446e-06, 'epoch': 0.08} 8%|▊ | 213/2774 [42:08<8:22:07, 11.76s/it] 8%|▊ | 214/2774 [42:20<8:15:18, 11.61s/it] {'loss': 1.0444, 'learning_rate': 4.97124207714653e-06, 'epoch': 0.08} 8%|▊ | 214/2774 [42:20<8:15:18, 11.61s/it] 8%|▊ | 215/2774 [42:31<8:17:22, 11.66s/it] {'loss': 0.9995, 'learning_rate': 4.9707988125337056e-06, 'epoch': 0.08} 8%|▊ | 215/2774 [42:31<8:17:22, 11.66s/it] 8%|▊ | 216/2774 [42:44<8:33:38, 12.05s/it] {'loss': 1.0269, 'learning_rate': 4.970352177900558e-06, 'epoch': 0.08} 8%|▊ | 216/2774 [42:44<8:33:38, 12.05s/it] 8%|▊ | 217/2774 
[42:56<8:23:16, 11.81s/it] {'loss': 1.0503, 'learning_rate': 4.9699021738562705e-06, 'epoch': 0.08} 8%|▊ | 217/2774 [42:56<8:23:16, 11.81s/it] 8%|▊ | 218/2774 [43:09<8:48:59, 12.42s/it] {'loss': 1.043, 'learning_rate': 4.9694488010146195e-06, 'epoch': 0.08} 8%|▊ | 218/2774 [43:09<8:48:59, 12.42s/it] 8%|▊ | 219/2774 [43:21<8:35:13, 12.10s/it] {'loss': 1.0435, 'learning_rate': 4.968992059993979e-06, 'epoch': 0.08} 8%|▊ | 219/2774 [43:21<8:35:13, 12.10s/it] 8%|▊ | 220/2774 [43:32<8:25:51, 11.88s/it] {'loss': 0.9829, 'learning_rate': 4.9685319514173165e-06, 'epoch': 0.08} 8%|▊ | 220/2774 [43:32<8:25:51, 11.88s/it] 8%|▊ | 221/2774 [43:44<8:25:08, 11.87s/it] {'loss': 1.0205, 'learning_rate': 4.968068475912192e-06, 'epoch': 0.08} 8%|▊ | 221/2774 [43:44<8:25:08, 11.87s/it] 8%|▊ | 222/2774 [43:55<8:16:14, 11.67s/it] {'loss': 1.0195, 'learning_rate': 4.967601634110758e-06, 'epoch': 0.08} 8%|▊ | 222/2774 [43:55<8:16:14, 11.67s/it] 8%|▊ | 223/2774 [44:07<8:14:52, 11.64s/it] {'loss': 1.02, 'learning_rate': 4.9671314266497595e-06, 'epoch': 0.08} 8%|▊ | 223/2774 [44:07<8:14:52, 11.64s/it] 8%|▊ | 224/2774 [44:18<8:11:57, 11.58s/it] {'loss': 1.0938, 'learning_rate': 4.96665785417053e-06, 'epoch': 0.08} 8%|▊ | 224/2774 [44:18<8:11:57, 11.58s/it] 8%|▊ | 225/2774 [44:30<8:08:52, 11.51s/it] {'loss': 0.9912, 'learning_rate': 4.966180917318994e-06, 'epoch': 0.08} 8%|▊ | 225/2774 [44:30<8:08:52, 11.51s/it] 8%|▊ | 226/2774 [44:41<8:05:17, 11.43s/it] {'loss': 1.0371, 'learning_rate': 4.965700616745665e-06, 'epoch': 0.08} 8%|▊ | 226/2774 [44:41<8:05:17, 11.43s/it] 8%|▊ | 227/2774 [44:53<8:14:42, 11.65s/it] {'loss': 1.0537, 'learning_rate': 4.965216953105644e-06, 'epoch': 0.08} 8%|▊ | 227/2774 [44:53<8:14:42, 11.65s/it] 8%|▊ | 228/2774 [45:04<8:08:18, 11.51s/it] {'loss': 1.0176, 'learning_rate': 4.964729927058618e-06, 'epoch': 0.08} 8%|▊ | 228/2774 [45:04<8:08:18, 11.51s/it] 8%|▊ | 229/2774 [45:16<8:10:58, 11.58s/it] {'loss': 1.0005, 'learning_rate': 4.964239539268861e-06, 'epoch': 0.08} 8%|▊ 
| 229/2774 [45:16<8:10:58, 11.58s/it] 8%|▊ | 230/2774 [45:27<8:10:18, 11.56s/it] {'loss': 0.9961, 'learning_rate': 4.963745790405234e-06, 'epoch': 0.08} 8%|▊ | 230/2774 [45:27<8:10:18, 11.56s/it] 8%|▊ | 231/2774 [45:39<8:08:31, 11.53s/it] {'loss': 1.02, 'learning_rate': 4.963248681141179e-06, 'epoch': 0.08} 8%|▊ | 231/2774 [45:39<8:08:31, 11.53s/it] 8%|▊ | 232/2774 [45:50<8:06:49, 11.49s/it] {'loss': 1.0684, 'learning_rate': 4.962748212154724e-06, 'epoch': 0.08} 8%|▊ | 232/2774 [45:50<8:06:49, 11.49s/it] 8%|▊ | 233/2774 [46:02<8:04:26, 11.44s/it] {'loss': 1.0229, 'learning_rate': 4.9622443841284786e-06, 'epoch': 0.08} 8%|▊ | 233/2774 [46:02<8:04:26, 11.44s/it] 8%|▊ | 234/2774 [46:13<8:05:56, 11.48s/it] {'loss': 1.0518, 'learning_rate': 4.961737197749633e-06, 'epoch': 0.08} 8%|▊ | 234/2774 [46:13<8:05:56, 11.48s/it] 8%|▊ | 235/2774 [46:24<8:01:43, 11.38s/it] {'loss': 1.0249, 'learning_rate': 4.961226653709959e-06, 'epoch': 0.08} 8%|▊ | 235/2774 [46:24<8:01:43, 11.38s/it] 9%|▊ | 236/2774 [46:36<8:02:21, 11.40s/it] {'loss': 1.0688, 'learning_rate': 4.960712752705808e-06, 'epoch': 0.09} 9%|▊ | 236/2774 [46:36<8:02:21, 11.40s/it] 9%|▊ | 237/2774 [46:47<8:05:20, 11.48s/it] {'loss': 1.0439, 'learning_rate': 4.96019549543811e-06, 'epoch': 0.09} 9%|▊ | 237/2774 [46:47<8:05:20, 11.48s/it] 9%|▊ | 238/2774 [46:59<8:07:21, 11.53s/it] {'loss': 1.0122, 'learning_rate': 4.959674882612372e-06, 'epoch': 0.09} 9%|▊ | 238/2774 [46:59<8:07:21, 11.53s/it] 9%|▊ | 239/2774 [47:11<8:08:16, 11.56s/it] {'loss': 1.0776, 'learning_rate': 4.95915091493868e-06, 'epoch': 0.09} 9%|▊ | 239/2774 [47:11<8:08:16, 11.56s/it] 9%|▊ | 240/2774 [47:22<8:10:33, 11.62s/it] {'loss': 1.0278, 'learning_rate': 4.958623593131691e-06, 'epoch': 0.09} 9%|▊ | 240/2774 [47:22<8:10:33, 11.62s/it] 9%|▊ | 241/2774 [47:34<8:07:43, 11.55s/it] {'loss': 0.9966, 'learning_rate': 4.958092917910646e-06, 'epoch': 0.09} 9%|▊ | 241/2774 [47:34<8:07:43, 11.55s/it] 9%|▊ | 242/2774 [47:45<8:06:11, 11.52s/it] {'loss': 1.0532, 
'learning_rate': 4.9575588899993464e-06, 'epoch': 0.09} 9%|▊ | 242/2774 [47:45<8:06:11, 11.52s/it] 9%|▉ | 243/2774 [47:57<8:04:53, 11.49s/it] {'loss': 1.0562, 'learning_rate': 4.9570215101261796e-06, 'epoch': 0.09} 9%|▉ | 243/2774 [47:57<8:04:53, 11.49s/it] 9%|▉ | 244/2774 [48:08<8:04:13, 11.48s/it] {'loss': 1.0918, 'learning_rate': 4.956480779024098e-06, 'epoch': 0.09} 9%|▉ | 244/2774 [48:08<8:04:13, 11.48s/it] 9%|▉ | 245/2774 [48:20<8:03:05, 11.46s/it] {'loss': 1.0913, 'learning_rate': 4.955936697430625e-06, 'epoch': 0.09} 9%|▉ | 245/2774 [48:20<8:03:05, 11.46s/it] 9%|▉ | 246/2774 [48:31<8:04:51, 11.51s/it] {'loss': 1.0386, 'learning_rate': 4.955389266087856e-06, 'epoch': 0.09} 9%|▉ | 246/2774 [48:31<8:04:51, 11.51s/it] 9%|▉ | 247/2774 [48:43<8:04:52, 11.51s/it] {'loss': 1.0195, 'learning_rate': 4.954838485742453e-06, 'epoch': 0.09} 9%|▉ | 247/2774 [48:43<8:04:52, 11.51s/it] 9%|▉ | 248/2774 [48:54<8:07:41, 11.58s/it] {'loss': 1.0044, 'learning_rate': 4.95428435714565e-06, 'epoch': 0.09} 9%|▉ | 248/2774 [48:54<8:07:41, 11.58s/it] 9%|▉ | 249/2774 [49:06<8:11:13, 11.67s/it] {'loss': 0.9722, 'learning_rate': 4.953726881053242e-06, 'epoch': 0.09} 9%|▉ | 249/2774 [49:06<8:11:13, 11.67s/it] 9%|▉ | 250/2774 [49:18<8:05:39, 11.54s/it] {'loss': 1.0186, 'learning_rate': 4.9531660582255934e-06, 'epoch': 0.09} 9%|▉ | 250/2774 [49:18<8:05:39, 11.54s/it] 9%|▉ | 251/2774 [49:29<8:03:57, 11.51s/it] {'loss': 1.0244, 'learning_rate': 4.952601889427634e-06, 'epoch': 0.09} 9%|▉ | 251/2774 [49:29<8:03:57, 11.51s/it] 9%|▉ | 252/2774 [49:41<8:06:41, 11.58s/it] {'loss': 1.0239, 'learning_rate': 4.9520343754288545e-06, 'epoch': 0.09} 9%|▉ | 252/2774 [49:41<8:06:41, 11.58s/it] 9%|▉ | 253/2774 [49:53<8:08:59, 11.64s/it] {'loss': 1.0142, 'learning_rate': 4.951463517003311e-06, 'epoch': 0.09} 9%|▉ | 253/2774 [49:53<8:08:59, 11.64s/it] 9%|▉ | 254/2774 [50:04<8:02:39, 11.49s/it] {'loss': 1.0381, 'learning_rate': 4.950889314929618e-06, 'epoch': 0.09} 9%|▉ | 254/2774 [50:04<8:02:39, 11.49s/it] 
9%|▉ | 255/2774 [50:15<7:57:48, 11.38s/it] {'loss': 1.0464, 'learning_rate': 4.9503117699909545e-06, 'epoch': 0.09} 9%|▉ | 255/2774 [50:15<7:57:48, 11.38s/it] 9%|▉ | 256/2774 [50:28<8:17:34, 11.86s/it] {'loss': 1.0361, 'learning_rate': 4.949730882975055e-06, 'epoch': 0.09} 9%|▉ | 256/2774 [50:28<8:17:34, 11.86s/it] 9%|▉ | 257/2774 [50:39<8:11:12, 11.71s/it] {'loss': 1.1025, 'learning_rate': 4.949146654674216e-06, 'epoch': 0.09} 9%|▉ | 257/2774 [50:39<8:11:12, 11.71s/it] 9%|▉ | 258/2774 [50:51<8:08:20, 11.65s/it] {'loss': 1.0479, 'learning_rate': 4.948559085885288e-06, 'epoch': 0.09} 9%|▉ | 258/2774 [50:51<8:08:20, 11.65s/it] 9%|▉ | 259/2774 [51:02<8:05:11, 11.58s/it] {'loss': 1.0273, 'learning_rate': 4.947968177409681e-06, 'epoch': 0.09} 9%|▉ | 259/2774 [51:02<8:05:11, 11.58s/it] 9%|▉ | 260/2774 [51:14<8:05:25, 11.59s/it] {'loss': 1.02, 'learning_rate': 4.9473739300533575e-06, 'epoch': 0.09} 9%|▉ | 260/2774 [51:14<8:05:25, 11.59s/it] 9%|▉ | 261/2774 [51:27<8:23:42, 12.03s/it] {'loss': 1.022, 'learning_rate': 4.946776344626834e-06, 'epoch': 0.09} 9%|▉ | 261/2774 [51:27<8:23:42, 12.03s/it] 9%|▉ | 262/2774 [51:40<8:44:11, 12.52s/it] {'loss': 0.936, 'learning_rate': 4.9461754219451844e-06, 'epoch': 0.09} 9%|▉ | 262/2774 [51:40<8:44:11, 12.52s/it] 9%|▉ | 263/2774 [51:52<8:32:28, 12.25s/it] {'loss': 1.0244, 'learning_rate': 4.945571162828027e-06, 'epoch': 0.09} 9%|▉ | 263/2774 [51:52<8:32:28, 12.25s/it] 10%|▉ | 264/2774 [52:03<8:20:11, 11.96s/it] {'loss': 1.0488, 'learning_rate': 4.9449635680995375e-06, 'epoch': 0.1} 10%|▉ | 264/2774 [52:03<8:20:11, 11.96s/it] 10%|▉ | 265/2774 [52:15<8:11:54, 11.76s/it] {'loss': 1.0425, 'learning_rate': 4.944352638588436e-06, 'epoch': 0.1} 10%|▉ | 265/2774 [52:15<8:11:54, 11.76s/it] 10%|▉ | 266/2774 [52:26<8:05:23, 11.61s/it] {'loss': 0.9795, 'learning_rate': 4.943738375127996e-06, 'epoch': 0.1} 10%|▉ | 266/2774 [52:26<8:05:23, 11.61s/it] 10%|▉ | 267/2774 [52:38<8:11:32, 11.76s/it] {'loss': 1.0034, 'learning_rate': 4.943120778556034e-06, 
'epoch': 0.1} 10%|▉ | 267/2774 [52:38<8:11:32, 11.76s/it] 10%|▉ | 268/2774 [52:50<8:15:01, 11.85s/it] {'loss': 1.0171, 'learning_rate': 4.942499849714915e-06, 'epoch': 0.1} 10%|▉ | 268/2774 [52:50<8:15:01, 11.85s/it] 10%|▉ | 269/2774 [53:02<8:21:09, 12.00s/it] {'loss': 1.002, 'learning_rate': 4.941875589451548e-06, 'epoch': 0.1} 10%|▉ | 269/2774 [53:02<8:21:09, 12.00s/it] 10%|▉ | 270/2774 [53:14<8:19:03, 11.96s/it] {'loss': 1.0405, 'learning_rate': 4.9412479986173854e-06, 'epoch': 0.1} 10%|▉ | 270/2774 [53:14<8:19:03, 11.96s/it] 10%|▉ | 271/2774 [53:26<8:10:31, 11.76s/it] {'loss': 1.0332, 'learning_rate': 4.940617078068426e-06, 'epoch': 0.1} 10%|▉ | 271/2774 [53:26<8:10:31, 11.76s/it] 10%|▉ | 272/2774 [53:37<8:02:05, 11.56s/it] {'loss': 1.0444, 'learning_rate': 4.9399828286652056e-06, 'epoch': 0.1} 10%|▉ | 272/2774 [53:37<8:02:05, 11.56s/it] 10%|▉ | 273/2774 [53:50<8:29:42, 12.23s/it] {'loss': 1.0015, 'learning_rate': 4.939345251272802e-06, 'epoch': 0.1} 10%|▉ | 273/2774 [53:50<8:29:42, 12.23s/it] 10%|▉ | 274/2774 [54:02<8:18:13, 11.96s/it] {'loss': 0.9541, 'learning_rate': 4.938704346760832e-06, 'epoch': 0.1} 10%|▉ | 274/2774 [54:02<8:18:13, 11.96s/it] 10%|▉ | 275/2774 [54:13<8:12:37, 11.83s/it] {'loss': 1.0674, 'learning_rate': 4.938060116003452e-06, 'epoch': 0.1} 10%|▉ | 275/2774 [54:13<8:12:37, 11.83s/it] 10%|▉ | 276/2774 [54:25<8:05:35, 11.66s/it] {'loss': 1.0449, 'learning_rate': 4.937412559879352e-06, 'epoch': 0.1} 10%|▉ | 276/2774 [54:25<8:05:35, 11.66s/it] 10%|▉ | 277/2774 [54:37<8:13:51, 11.87s/it] {'loss': 1.0322, 'learning_rate': 4.936761679271761e-06, 'epoch': 0.1} 10%|▉ | 277/2774 [54:37<8:13:51, 11.87s/it] 10%|█ | 278/2774 [54:48<8:07:55, 11.73s/it] {'loss': 1.0488, 'learning_rate': 4.9361074750684404e-06, 'epoch': 0.1} 10%|█ | 278/2774 [54:48<8:07:55, 11.73s/it] 10%|█ | 279/2774 [55:00<8:03:12, 11.62s/it] {'loss': 1.0098, 'learning_rate': 4.935449948161684e-06, 'epoch': 0.1} 10%|█ | 279/2774 [55:00<8:03:12, 11.62s/it] 10%|█ | 280/2774 
[55:13<8:22:10, 12.08s/it] {'loss': 1.0303, 'learning_rate': 4.93478909944832e-06, 'epoch': 0.1} 10%|█ | 280/2774 [55:13<8:22:10, 12.08s/it] 10%|█ | 281/2774 [55:24<8:15:17, 11.92s/it] {'loss': 1.0347, 'learning_rate': 4.934124929829706e-06, 'epoch': 0.1} 10%|█ | 281/2774 [55:24<8:15:17, 11.92s/it] 10%|█ | 282/2774 [55:36<8:12:33, 11.86s/it] {'loss': 1.0425, 'learning_rate': 4.9334574402117295e-06, 'epoch': 0.1} 10%|█ | 282/2774 [55:36<8:12:33, 11.86s/it] 10%|█ | 283/2774 [55:47<8:06:31, 11.72s/it] {'loss': 0.9795, 'learning_rate': 4.932786631504805e-06, 'epoch': 0.1} 10%|█ | 283/2774 [55:47<8:06:31, 11.72s/it] 10%|█ | 284/2774 [55:59<7:57:55, 11.52s/it] {'loss': 1.0762, 'learning_rate': 4.932112504623876e-06, 'epoch': 0.1} 10%|█ | 284/2774 [55:59<7:57:55, 11.52s/it] 10%|█ | 285/2774 [56:10<7:57:46, 11.52s/it] {'loss': 1.0469, 'learning_rate': 4.931435060488411e-06, 'epoch': 0.1} 10%|█ | 285/2774 [56:10<7:57:46, 11.52s/it] 10%|█ | 286/2774 [56:22<7:59:12, 11.56s/it] {'loss': 0.9966, 'learning_rate': 4.9307543000224024e-06, 'epoch': 0.1} 10%|█ | 286/2774 [56:22<7:59:12, 11.56s/it] 10%|█ | 287/2774 [56:33<7:56:36, 11.50s/it] {'loss': 1.0791, 'learning_rate': 4.930070224154366e-06, 'epoch': 0.1} 10%|█ | 287/2774 [56:33<7:56:36, 11.50s/it] 10%|█ | 288/2774 [56:45<7:59:24, 11.57s/it] {'loss': 1.0449, 'learning_rate': 4.92938283381734e-06, 'epoch': 0.1} 10%|█ | 288/2774 [56:45<7:59:24, 11.57s/it] 10%|█ | 289/2774 [56:57<8:02:19, 11.65s/it] {'loss': 1.1064, 'learning_rate': 4.928692129948884e-06, 'epoch': 0.1} 10%|█ | 289/2774 [56:57<8:02:19, 11.65s/it] 10%|█ | 290/2774 [57:09<8:13:24, 11.92s/it] {'loss': 0.9844, 'learning_rate': 4.927998113491076e-06, 'epoch': 0.1} 10%|█ | 290/2774 [57:09<8:13:24, 11.92s/it] 10%|█ | 291/2774 [57:21<8:10:34, 11.85s/it] {'loss': 1.0459, 'learning_rate': 4.927300785390513e-06, 'epoch': 0.1} 10%|█ | 291/2774 [57:21<8:10:34, 11.85s/it] 11%|█ | 292/2774 [57:32<8:05:03, 11.73s/it] {'loss': 1.0132, 'learning_rate': 4.926600146598307e-06, 
'epoch': 0.11} 11%|█ | 292/2774 [57:32<8:05:03, 11.73s/it] 11%|█ | 293/2774 [57:44<8:04:56, 11.73s/it] {'loss': 1.0474, 'learning_rate': 4.925896198070088e-06, 'epoch': 0.11} 11%|█ | 293/2774 [57:44<8:04:56, 11.73s/it] 11%|█ | 294/2774 [57:56<8:09:43, 11.85s/it] {'loss': 1.0391, 'learning_rate': 4.925188940766e-06, 'epoch': 0.11} 11%|█ | 294/2774 [57:56<8:09:43, 11.85s/it] 11%|█ | 295/2774 [58:07<8:02:02, 11.67s/it] {'loss': 1.0317, 'learning_rate': 4.9244783756506975e-06, 'epoch': 0.11} 11%|█ | 295/2774 [58:07<8:02:02, 11.67s/it] 11%|█ | 296/2774 [58:19<7:59:14, 11.60s/it] {'loss': 1.0244, 'learning_rate': 4.9237645036933505e-06, 'epoch': 0.11} 11%|█ | 296/2774 [58:19<7:59:14, 11.60s/it] 11%|█ | 297/2774 [58:32<8:13:42, 11.96s/it] {'loss': 0.9746, 'learning_rate': 4.923047325867635e-06, 'epoch': 0.11} 11%|█ | 297/2774 [58:32<8:13:42, 11.96s/it] 11%|█ | 298/2774 [58:43<8:11:13, 11.90s/it] {'loss': 1.0181, 'learning_rate': 4.922326843151739e-06, 'epoch': 0.11} 11%|█ | 298/2774 [58:43<8:11:13, 11.90s/it] 11%|█ | 299/2774 [58:55<8:03:23, 11.72s/it] {'loss': 1.0024, 'learning_rate': 4.921603056528358e-06, 'epoch': 0.11} 11%|█ | 299/2774 [58:55<8:03:23, 11.72s/it] 11%|█ | 300/2774 [59:06<8:03:42, 11.73s/it] {'loss': 1.0146, 'learning_rate': 4.920875966984693e-06, 'epoch': 0.11} 11%|█ | 300/2774 [59:06<8:03:42, 11.73s/it] 11%|█ | 301/2774 [59:18<8:03:59, 11.74s/it] {'loss': 1.0181, 'learning_rate': 4.92014557551245e-06, 'epoch': 0.11} 11%|█ | 301/2774 [59:18<8:03:59, 11.74s/it] 11%|█ | 302/2774 [59:30<8:03:29, 11.74s/it] {'loss': 1.0601, 'learning_rate': 4.91941188310784e-06, 'epoch': 0.11} 11%|█ | 302/2774 [59:30<8:03:29, 11.74s/it] 11%|█ | 303/2774 [59:43<8:13:47, 11.99s/it] {'loss': 1.0024, 'learning_rate': 4.918674890771573e-06, 'epoch': 0.11} 11%|█ | 303/2774 [59:43<8:13:47, 11.99s/it] 11%|█ | 304/2774 [59:54<8:09:48, 11.90s/it] {'loss': 1.0527, 'learning_rate': 4.9179345995088625e-06, 'epoch': 0.11} 11%|█ | 304/2774 [59:54<8:09:48, 11.90s/it] 11%|█ | 305/2774 
[1:00:06<8:01:59, 11.71s/it] {'loss': 1.0332, 'learning_rate': 4.917191010329423e-06, 'epoch': 0.11} 11%|█ | 305/2774 [1:00:06<8:01:59, 11.71s/it] 11%|█ | 306/2774 [1:00:17<8:00:32, 11.68s/it] {'loss': 1.0356, 'learning_rate': 4.916444124247463e-06, 'epoch': 0.11} 11%|█ | 306/2774 [1:00:17<8:00:32, 11.68s/it] 11%|█ | 307/2774 [1:00:29<8:06:14, 11.83s/it] {'loss': 1.064, 'learning_rate': 4.915693942281691e-06, 'epoch': 0.11} 11%|█ | 307/2774 [1:00:29<8:06:14, 11.83s/it] 11%|█ | 308/2774 [1:00:41<8:05:03, 11.80s/it] {'loss': 1.0547, 'learning_rate': 4.91494046545531e-06, 'epoch': 0.11} 11%|█ | 308/2774 [1:00:41<8:05:03, 11.80s/it] 11%|█ | 309/2774 [1:00:54<8:21:20, 12.20s/it] {'loss': 1.002, 'learning_rate': 4.914183694796017e-06, 'epoch': 0.11} 11%|█ | 309/2774 [1:00:54<8:21:20, 12.20s/it] 11%|█ | 310/2774 [1:01:07<8:34:30, 12.53s/it] {'loss': 1.0356, 'learning_rate': 4.913423631336e-06, 'epoch': 0.11} 11%|█ | 310/2774 [1:01:07<8:34:30, 12.53s/it] 11%|█ | 311/2774 [1:01:20<8:34:35, 12.54s/it] {'loss': 1.0054, 'learning_rate': 4.912660276111941e-06, 'epoch': 0.11} 11%|█ | 311/2774 [1:01:20<8:34:35, 12.54s/it] 11%|█ | 312/2774 [1:01:32<8:27:07, 12.36s/it] {'loss': 1.0439, 'learning_rate': 4.911893630165011e-06, 'epoch': 0.11} 11%|█ | 312/2774 [1:01:32<8:27:07, 12.36s/it] 11%|█▏ | 313/2774 [1:01:43<8:12:58, 12.02s/it] {'loss': 1.0205, 'learning_rate': 4.911123694540868e-06, 'epoch': 0.11} 11%|█▏ | 313/2774 [1:01:43<8:12:58, 12.02s/it] 11%|█▏ | 314/2774 [1:01:54<8:02:37, 11.77s/it] {'loss': 1.0732, 'learning_rate': 4.910350470289656e-06, 'epoch': 0.11} 11%|█▏ | 314/2774 [1:01:54<8:02:37, 11.77s/it] 11%|█▏ | 315/2774 [1:02:06<7:55:02, 11.59s/it] {'loss': 1.019, 'learning_rate': 4.90957395846601e-06, 'epoch': 0.11} 11%|█▏ | 315/2774 [1:02:06<7:55:02, 11.59s/it] 11%|█▏ | 316/2774 [1:02:17<7:49:51, 11.47s/it] {'loss': 1.0181, 'learning_rate': 4.9087941601290416e-06, 'epoch': 0.11} 11%|█▏ | 316/2774 [1:02:17<7:49:51, 11.47s/it] 11%|█▏ | 317/2774 [1:02:28<7:48:07, 11.43s/it] 
{'loss': 1.0522, 'learning_rate': 4.90801107634235e-06, 'epoch': 0.11} 11%|█▏ | 317/2774 [1:02:28<7:48:07, 11.43s/it] 11%|█▏ | 318/2774 [1:02:39<7:47:19, 11.42s/it] {'loss': 1.0625, 'learning_rate': 4.907224708174014e-06, 'epoch': 0.11} 11%|█▏ | 318/2774 [1:02:39<7:47:19, 11.42s/it] 11%|█▏ | 319/2774 [1:02:52<8:03:08, 11.81s/it] {'loss': 0.9829, 'learning_rate': 4.9064350566965925e-06, 'epoch': 0.11} 11%|█▏ | 319/2774 [1:02:52<8:03:08, 11.81s/it] 12%|█▏ | 320/2774 [1:03:04<8:05:34, 11.87s/it] {'loss': 1.042, 'learning_rate': 4.905642122987123e-06, 'epoch': 0.12} 12%|█▏ | 320/2774 [1:03:04<8:05:34, 11.87s/it] 12%|█▏ | 321/2774 [1:03:16<7:59:01, 11.72s/it] {'loss': 1.019, 'learning_rate': 4.904845908127119e-06, 'epoch': 0.12} 12%|█▏ | 321/2774 [1:03:16<7:59:01, 11.72s/it] 12%|█▏ | 322/2774 [1:03:27<7:57:52, 11.69s/it] {'loss': 1.0566, 'learning_rate': 4.904046413202568e-06, 'epoch': 0.12} 12%|█▏ | 322/2774 [1:03:27<7:57:52, 11.69s/it] 12%|█▏ | 323/2774 [1:03:38<7:52:06, 11.56s/it] {'loss': 1.0234, 'learning_rate': 4.903243639303934e-06, 'epoch': 0.12} 12%|█▏ | 323/2774 [1:03:38<7:52:06, 11.56s/it] 12%|█▏ | 324/2774 [1:03:50<7:51:53, 11.56s/it] {'loss': 1.0107, 'learning_rate': 4.902437587526152e-06, 'epoch': 0.12} 12%|█▏ | 324/2774 [1:03:50<7:51:53, 11.56s/it] 12%|█▏ | 325/2774 [1:04:01<7:47:29, 11.45s/it] {'loss': 1.0151, 'learning_rate': 4.901628258968628e-06, 'epoch': 0.12} 12%|█▏ | 325/2774 [1:04:01<7:47:29, 11.45s/it] 12%|█▏ | 326/2774 [1:04:13<7:48:54, 11.49s/it] {'loss': 1.0229, 'learning_rate': 4.900815654735237e-06, 'epoch': 0.12} 12%|█▏ | 326/2774 [1:04:13<7:48:54, 11.49s/it] 12%|█▏ | 327/2774 [1:04:25<7:58:22, 11.73s/it] {'loss': 0.9951, 'learning_rate': 4.8999997759343225e-06, 'epoch': 0.12} 12%|█▏ | 327/2774 [1:04:25<7:58:22, 11.73s/it] 12%|█▏ | 328/2774 [1:04:38<8:09:53, 12.02s/it] {'loss': 1.0283, 'learning_rate': 4.899180623678693e-06, 'epoch': 0.12} 12%|█▏ | 328/2774 [1:04:38<8:09:53, 12.02s/it] 12%|█▏ | 329/2774 [1:04:49<8:00:31, 11.79s/it] {'loss': 
1.0195, 'learning_rate': 4.898358199085624e-06, 'epoch': 0.12} 12%|█▏ | 329/2774 [1:04:49<8:00:31, 11.79s/it] 12%|█▏ | 330/2774 [1:05:00<7:54:48, 11.66s/it] {'loss': 1.0605, 'learning_rate': 4.897532503276852e-06, 'epoch': 0.12} 12%|█▏ | 330/2774 [1:05:00<7:54:48, 11.66s/it] 12%|█▏ | 331/2774 [1:05:12<7:53:12, 11.62s/it] {'loss': 1.0337, 'learning_rate': 4.896703537378577e-06, 'epoch': 0.12} 12%|█▏ | 331/2774 [1:05:12<7:53:12, 11.62s/it] 12%|█▏ | 332/2774 [1:05:23<7:48:48, 11.52s/it] {'loss': 1.0493, 'learning_rate': 4.895871302521457e-06, 'epoch': 0.12} 12%|█▏ | 332/2774 [1:05:23<7:48:48, 11.52s/it] 12%|█▏ | 333/2774 [1:05:34<7:46:01, 11.45s/it] {'loss': 1.0229, 'learning_rate': 4.89503579984061e-06, 'epoch': 0.12} 12%|█▏ | 333/2774 [1:05:34<7:46:01, 11.45s/it] 12%|█▏ | 334/2774 [1:05:46<7:46:49, 11.48s/it] {'loss': 0.9893, 'learning_rate': 4.894197030475614e-06, 'epoch': 0.12} 12%|█▏ | 334/2774 [1:05:46<7:46:49, 11.48s/it] 12%|█▏ | 335/2774 [1:05:57<7:44:38, 11.43s/it] {'loss': 1.0786, 'learning_rate': 4.893354995570497e-06, 'epoch': 0.12} 12%|█▏ | 335/2774 [1:05:57<7:44:38, 11.43s/it] 12%|█▏ | 336/2774 [1:06:09<7:46:43, 11.49s/it] {'loss': 1.0552, 'learning_rate': 4.892509696273745e-06, 'epoch': 0.12} 12%|█▏ | 336/2774 [1:06:09<7:46:43, 11.49s/it] 12%|█▏ | 337/2774 [1:06:20<7:42:45, 11.39s/it] {'loss': 1.0103, 'learning_rate': 4.891661133738295e-06, 'epoch': 0.12} 12%|█▏ | 337/2774 [1:06:20<7:42:45, 11.39s/it] 12%|█▏ | 338/2774 [1:06:31<7:39:49, 11.33s/it] {'loss': 1.0234, 'learning_rate': 4.8908093091215344e-06, 'epoch': 0.12} 12%|█▏ | 338/2774 [1:06:31<7:39:49, 11.33s/it] 12%|█▏ | 339/2774 [1:06:45<8:06:33, 11.99s/it] {'loss': 1.021, 'learning_rate': 4.889954223585301e-06, 'epoch': 0.12} 12%|█▏ | 339/2774 [1:06:45<8:06:33, 11.99s/it] 12%|█▏ | 340/2774 [1:06:56<8:02:19, 11.89s/it] {'loss': 1.1162, 'learning_rate': 4.88909587829588e-06, 'epoch': 0.12} 12%|█▏ | 340/2774 [1:06:56<8:02:19, 11.89s/it] 12%|█▏ | 341/2774 [1:07:08<7:54:35, 11.70s/it] {'loss': 1.0635, 
'learning_rate': 4.8882342744240015e-06, 'epoch': 0.12} 12%|█▏ | 341/2774 [1:07:08<7:54:35, 11.70s/it] 12%|█▏ | 342/2774 [1:07:19<7:49:17, 11.58s/it] {'loss': 0.978, 'learning_rate': 4.8873694131448425e-06, 'epoch': 0.12} 12%|█▏ | 342/2774 [1:07:19<7:49:17, 11.58s/it] 12%|█▏ | 343/2774 [1:07:30<7:46:14, 11.51s/it] {'loss': 1.0059, 'learning_rate': 4.886501295638021e-06, 'epoch': 0.12} 12%|█▏ | 343/2774 [1:07:30<7:46:14, 11.51s/it] 12%|█▏ | 344/2774 [1:07:42<7:46:09, 11.51s/it] {'loss': 1.0684, 'learning_rate': 4.885629923087597e-06, 'epoch': 0.12} 12%|█▏ | 344/2774 [1:07:42<7:46:09, 11.51s/it] 12%|█▏ | 345/2774 [1:07:54<7:46:59, 11.54s/it] {'loss': 0.9941, 'learning_rate': 4.88475529668207e-06, 'epoch': 0.12} 12%|█▏ | 345/2774 [1:07:54<7:46:59, 11.54s/it] 12%|█▏ | 346/2774 [1:08:05<7:45:31, 11.50s/it] {'loss': 1.0059, 'learning_rate': 4.883877417614376e-06, 'epoch': 0.12} 12%|█▏ | 346/2774 [1:08:05<7:45:31, 11.50s/it] 13%|█▎ | 347/2774 [1:08:16<7:42:02, 11.42s/it] {'loss': 1.0825, 'learning_rate': 4.882996287081892e-06, 'epoch': 0.13} 13%|█▎ | 347/2774 [1:08:16<7:42:02, 11.42s/it] 13%|█▎ | 348/2774 [1:08:28<7:42:25, 11.44s/it] {'loss': 1.0459, 'learning_rate': 4.882111906286425e-06, 'epoch': 0.13} 13%|█▎ | 348/2774 [1:08:28<7:42:25, 11.44s/it] 13%|█▎ | 349/2774 [1:08:39<7:39:40, 11.37s/it] {'loss': 1.002, 'learning_rate': 4.8812242764342165e-06, 'epoch': 0.13} 13%|█▎ | 349/2774 [1:08:39<7:39:40, 11.37s/it] 13%|█▎ | 350/2774 [1:08:50<7:41:25, 11.42s/it] {'loss': 0.998, 'learning_rate': 4.880333398735941e-06, 'epoch': 0.13} 13%|█▎ | 350/2774 [1:08:50<7:41:25, 11.42s/it] 13%|█▎ | 351/2774 [1:09:02<7:38:04, 11.34s/it] {'loss': 1.062, 'learning_rate': 4.879439274406702e-06, 'epoch': 0.13} 13%|█▎ | 351/2774 [1:09:02<7:38:04, 11.34s/it] 13%|█▎ | 352/2774 [1:09:13<7:40:46, 11.41s/it] {'loss': 0.9668, 'learning_rate': 4.87854190466603e-06, 'epoch': 0.13} 13%|█▎ | 352/2774 [1:09:13<7:40:46, 11.41s/it] 13%|█▎ | 353/2774 [1:09:24<7:39:19, 11.38s/it] {'loss': 1.0552, 
'learning_rate': 4.8776412907378845e-06, 'epoch': 0.13} 13%|█▎ | 353/2774 [1:09:24<7:39:19, 11.38s/it] 13%|█▎ | 354/2774 [1:09:36<7:37:47, 11.35s/it] {'loss': 1.0029, 'learning_rate': 4.876737433850647e-06, 'epoch': 0.13} 13%|█▎ | 354/2774 [1:09:36<7:37:47, 11.35s/it] 13%|█▎ | 355/2774 [1:09:47<7:38:12, 11.37s/it] {'loss': 1.0596, 'learning_rate': 4.875830335237125e-06, 'epoch': 0.13} 13%|█▎ | 355/2774 [1:09:47<7:38:12, 11.37s/it] 13%|█▎ | 356/2774 [1:09:59<7:40:32, 11.43s/it] {'loss': 1.001, 'learning_rate': 4.874919996134546e-06, 'epoch': 0.13} 13%|█▎ | 356/2774 [1:09:59<7:40:32, 11.43s/it] 13%|█▎ | 357/2774 [1:10:10<7:38:07, 11.37s/it] {'loss': 1.0957, 'learning_rate': 4.874006417784557e-06, 'epoch': 0.13} 13%|█▎ | 357/2774 [1:10:10<7:38:07, 11.37s/it] 13%|█▎ | 358/2774 [1:10:22<7:40:57, 11.45s/it] {'loss': 1.0806, 'learning_rate': 4.873089601433223e-06, 'epoch': 0.13} 13%|█▎ | 358/2774 [1:10:22<7:40:57, 11.45s/it] 13%|█▎ | 359/2774 [1:10:33<7:40:00, 11.43s/it] {'loss': 1.0415, 'learning_rate': 4.872169548331028e-06, 'epoch': 0.13} 13%|█▎ | 359/2774 [1:10:33<7:40:00, 11.43s/it] 13%|█▎ | 360/2774 [1:10:44<7:38:43, 11.40s/it] {'loss': 1.0732, 'learning_rate': 4.871246259732867e-06, 'epoch': 0.13} 13%|█▎ | 360/2774 [1:10:44<7:38:43, 11.40s/it] 13%|█▎ | 361/2774 [1:10:55<7:35:52, 11.34s/it] {'loss': 1.0498, 'learning_rate': 4.870319736898052e-06, 'epoch': 0.13} 13%|█▎ | 361/2774 [1:10:55<7:35:52, 11.34s/it] 13%|█▎ | 362/2774 [1:11:08<7:44:09, 11.55s/it] {'loss': 1.0068, 'learning_rate': 4.869389981090302e-06, 'epoch': 0.13} 13%|█▎ | 362/2774 [1:11:08<7:44:09, 11.55s/it] 13%|█▎ | 363/2774 [1:11:19<7:45:41, 11.59s/it] {'loss': 1.0596, 'learning_rate': 4.868456993577749e-06, 'epoch': 0.13} 13%|█▎ | 363/2774 [1:11:19<7:45:41, 11.59s/it] 13%|█▎ | 364/2774 [1:11:30<7:41:57, 11.50s/it] {'loss': 1.0474, 'learning_rate': 4.867520775632931e-06, 'epoch': 0.13} 13%|█▎ | 364/2774 [1:11:30<7:41:57, 11.50s/it] 13%|█▎ | 365/2774 [1:11:42<7:38:31, 11.42s/it] {'loss': 1.0449, 
'learning_rate': 4.866581328532793e-06, 'epoch': 0.13} 13%|█▎ | 365/2774 [1:11:42<7:38:31, 11.42s/it] 13%|█▎ | 366/2774 [1:11:55<8:04:17, 12.07s/it] {'loss': 1.0161, 'learning_rate': 4.865638653558684e-06, 'epoch': 0.13} 13%|█▎ | 366/2774 [1:11:55<8:04:17, 12.07s/it] 13%|█▎ | 367/2774 [1:12:07<7:56:18, 11.87s/it] {'loss': 1.0024, 'learning_rate': 4.864692751996356e-06, 'epoch': 0.13} 13%|█▎ | 367/2774 [1:12:07<7:56:18, 11.87s/it] 13%|█▎ | 368/2774 [1:12:19<7:55:38, 11.86s/it] {'loss': 1.0278, 'learning_rate': 4.863743625135962e-06, 'epoch': 0.13} 13%|█▎ | 368/2774 [1:12:19<7:55:38, 11.86s/it] 13%|█▎ | 369/2774 [1:12:30<7:46:08, 11.63s/it] {'loss': 1.0132, 'learning_rate': 4.862791274272053e-06, 'epoch': 0.13} 13%|█▎ | 369/2774 [1:12:30<7:46:08, 11.63s/it] 13%|█▎ | 370/2774 [1:12:43<8:10:15, 12.24s/it] {'loss': 1.0083, 'learning_rate': 4.861835700703578e-06, 'epoch': 0.13} 13%|█▎ | 370/2774 [1:12:43<8:10:15, 12.24s/it] 13%|█▎ | 371/2774 [1:12:55<8:02:39, 12.05s/it] {'loss': 1.02, 'learning_rate': 4.860876905733881e-06, 'epoch': 0.13} 13%|█▎ | 371/2774 [1:12:55<8:02:39, 12.05s/it] 13%|█▎ | 372/2774 [1:13:07<7:57:03, 11.92s/it] {'loss': 0.9839, 'learning_rate': 4.859914890670701e-06, 'epoch': 0.13} 13%|█▎ | 372/2774 [1:13:07<7:57:03, 11.92s/it] 13%|█▎ | 373/2774 [1:13:18<7:50:47, 11.76s/it] {'loss': 1.043, 'learning_rate': 4.85894965682617e-06, 'epoch': 0.13} 13%|█▎ | 373/2774 [1:13:18<7:50:47, 11.76s/it] 13%|█▎ | 374/2774 [1:13:29<7:45:01, 11.63s/it] {'loss': 1.0449, 'learning_rate': 4.857981205516807e-06, 'epoch': 0.13} 13%|█▎ | 374/2774 [1:13:29<7:45:01, 11.63s/it] 14%|█▎ | 375/2774 [1:13:41<7:40:52, 11.53s/it] {'loss': 1.0703, 'learning_rate': 4.8570095380635215e-06, 'epoch': 0.14} 14%|█▎ | 375/2774 [1:13:41<7:40:52, 11.53s/it] 14%|█▎ | 376/2774 [1:13:52<7:40:15, 11.52s/it] {'loss': 0.9351, 'learning_rate': 4.856034655791608e-06, 'epoch': 0.14} 14%|█▎ | 376/2774 [1:13:52<7:40:15, 11.52s/it] 14%|█▎ | 377/2774 [1:14:04<7:41:19, 11.55s/it] {'loss': 1.0044, 
'learning_rate': 4.85505656003075e-06, 'epoch': 0.14} 14%|█▎ | 377/2774 [1:14:04<7:41:19, 11.55s/it] 14%|█▎ | 378/2774 [1:14:15<7:41:13, 11.55s/it] {'loss': 1.0781, 'learning_rate': 4.854075252115007e-06, 'epoch': 0.14} 14%|█▎ | 378/2774 [1:14:15<7:41:13, 11.55s/it] 14%|█▎ | 379/2774 [1:14:27<7:39:40, 11.52s/it] {'loss': 1.0605, 'learning_rate': 4.853090733382827e-06, 'epoch': 0.14} 14%|█▎ | 379/2774 [1:14:27<7:39:40, 11.52s/it] 14%|█▎ | 380/2774 [1:14:38<7:36:36, 11.44s/it] {'loss': 1.0381, 'learning_rate': 4.852103005177033e-06, 'epoch': 0.14} 14%|█▎ | 380/2774 [1:14:38<7:36:36, 11.44s/it] 14%|█▎ | 381/2774 [1:14:49<7:35:26, 11.42s/it] {'loss': 1.0171, 'learning_rate': 4.851112068844827e-06, 'epoch': 0.14} 14%|█▎ | 381/2774 [1:14:49<7:35:26, 11.42s/it] 14%|█▍ | 382/2774 [1:15:01<7:39:02, 11.51s/it] {'loss': 1.0132, 'learning_rate': 4.850117925737784e-06, 'epoch': 0.14} 14%|█▍ | 382/2774 [1:15:01<7:39:02, 11.51s/it] 14%|█▍ | 383/2774 [1:15:14<7:55:38, 11.94s/it] {'loss': 1.0059, 'learning_rate': 4.8491205772118585e-06, 'epoch': 0.14} 14%|█▍ | 383/2774 [1:15:14<7:55:38, 11.94s/it] 14%|█▍ | 384/2774 [1:15:26<7:54:07, 11.90s/it] {'loss': 1.0347, 'learning_rate': 4.848120024627372e-06, 'epoch': 0.14} 14%|█▍ | 384/2774 [1:15:26<7:54:07, 11.90s/it] 14%|█▍ | 385/2774 [1:15:39<8:06:32, 12.22s/it] {'loss': 1.0317, 'learning_rate': 4.847116269349018e-06, 'epoch': 0.14} 14%|█▍ | 385/2774 [1:15:39<8:06:32, 12.22s/it] 14%|█▍ | 386/2774 [1:15:50<7:54:55, 11.93s/it] {'loss': 0.9907, 'learning_rate': 4.846109312745857e-06, 'epoch': 0.14} 14%|█▍ | 386/2774 [1:15:50<7:54:55, 11.93s/it] 14%|█▍ | 387/2774 [1:16:03<8:05:19, 12.20s/it] {'loss': 1.0088, 'learning_rate': 4.845099156191319e-06, 'epoch': 0.14} 14%|█▍ | 387/2774 [1:16:03<8:05:19, 12.20s/it] 14%|█▍ | 388/2774 [1:16:14<7:58:02, 12.02s/it] {'loss': 1.0156, 'learning_rate': 4.844085801063195e-06, 'epoch': 0.14} 14%|█▍ | 388/2774 [1:16:14<7:58:02, 12.02s/it] 14%|█▍ | 389/2774 [1:16:26<7:50:20, 11.83s/it] {'loss': 1.0845, 
'learning_rate': 4.843069248743641e-06, 'epoch': 0.14} 14%|█▍ | 389/2774 [1:16:26<7:50:20, 11.83s/it] 14%|█▍ | 390/2774 [1:16:37<7:48:06, 11.78s/it] {'loss': 1.0801, 'learning_rate': 4.842049500619173e-06, 'epoch': 0.14} 14%|█▍ | 390/2774 [1:16:37<7:48:06, 11.78s/it] 14%|█▍ | 391/2774 [1:16:49<7:41:48, 11.63s/it] {'loss': 1.0278, 'learning_rate': 4.8410265580806645e-06, 'epoch': 0.14} 14%|█▍ | 391/2774 [1:16:49<7:41:48, 11.63s/it] 14%|█▍ | 392/2774 [1:17:00<7:37:29, 11.52s/it] {'loss': 1.0737, 'learning_rate': 4.840000422523348e-06, 'epoch': 0.14} 14%|█▍ | 392/2774 [1:17:00<7:37:29, 11.52s/it] 14%|█▍ | 393/2774 [1:17:12<7:41:08, 11.62s/it] {'loss': 1.0405, 'learning_rate': 4.838971095346811e-06, 'epoch': 0.14} 14%|█▍ | 393/2774 [1:17:12<7:41:08, 11.62s/it] 14%|█▍ | 394/2774 [1:17:23<7:38:34, 11.56s/it] {'loss': 1.021, 'learning_rate': 4.8379385779549944e-06, 'epoch': 0.14} 14%|█▍ | 394/2774 [1:17:23<7:38:34, 11.56s/it] 14%|█▍ | 395/2774 [1:17:37<7:59:20, 12.09s/it] {'loss': 1.0415, 'learning_rate': 4.836902871756187e-06, 'epoch': 0.14} 14%|█▍ | 395/2774 [1:17:37<7:59:20, 12.09s/it] 14%|█▍ | 396/2774 [1:17:48<7:49:30, 11.85s/it] {'loss': 1.0488, 'learning_rate': 4.835863978163032e-06, 'epoch': 0.14} 14%|█▍ | 396/2774 [1:17:48<7:49:30, 11.85s/it] 14%|█▍ | 397/2774 [1:18:01<8:07:33, 12.31s/it] {'loss': 1.0195, 'learning_rate': 4.834821898592516e-06, 'epoch': 0.14} 14%|█▍ | 397/2774 [1:18:01<8:07:33, 12.31s/it] 14%|█▍ | 398/2774 [1:18:12<7:54:20, 11.98s/it] {'loss': 1.0327, 'learning_rate': 4.833776634465973e-06, 'epoch': 0.14} 14%|█▍ | 398/2774 [1:18:12<7:54:20, 11.98s/it] 14%|█▍ | 399/2774 [1:18:24<7:45:57, 11.77s/it] {'loss': 1.0293, 'learning_rate': 4.83272818720908e-06, 'epoch': 0.14} 14%|█▍ | 399/2774 [1:18:24<7:45:57, 11.77s/it] 14%|█▍ | 400/2774 [1:18:38<8:10:45, 12.40s/it] {'loss': 1.0093, 'learning_rate': 4.8316765582518565e-06, 'epoch': 0.14} 14%|█▍ | 400/2774 [1:18:38<8:10:45, 12.40s/it]/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: 
UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None warnings.warn( 14%|█▍ | 401/2774 [1:19:18<13:46:00, 20.89s/it] {'loss': 0.9956, 'learning_rate': 4.830621749028659e-06, 'epoch': 0.14} 14%|█▍ | 401/2774 [1:19:18<13:46:00, 20.89s/it] 14%|█▍ | 402/2774 [1:19:30<11:52:42, 18.03s/it] {'loss': 1.0186, 'learning_rate': 4.829563760978186e-06, 'epoch': 0.14} 14%|█▍ | 402/2774 [1:19:30<11:52:42, 18.03s/it] 15%|█▍ | 403/2774 [1:19:41<10:32:12, 16.00s/it] {'loss': 1.0161, 'learning_rate': 4.828502595543467e-06, 'epoch': 0.15} 15%|█▍ | 403/2774 [1:19:41<10:32:12, 16.00s/it] 15%|█▍ | 404/2774 [1:19:53<9:40:04, 14.69s/it] {'loss': 1.0298, 'learning_rate': 4.8274382541718695e-06, 'epoch': 0.15} 15%|█▍ | 404/2774 [1:19:53<9:40:04, 14.69s/it] 15%|█▍ | 405/2774 [1:20:06<9:22:00, 14.23s/it] {'loss': 0.9888, 'learning_rate': 4.82637073831509e-06, 'epoch': 0.15} 15%|█▍ | 405/2774 [1:20:06<9:22:00, 14.23s/it] 15%|█▍ | 406/2774 [1:20:17<8:49:03, 13.41s/it] {'loss': 1.0068, 'learning_rate': 4.825300049429155e-06, 'epoch': 0.15} 15%|█▍ | 406/2774 [1:20:17<8:49:03, 13.41s/it] 15%|█▍ | 407/2774 [1:20:29<8:24:32, 12.79s/it] {'loss': 0.9551, 'learning_rate': 4.82422618897442e-06, 'epoch': 0.15} 15%|█▍ | 407/2774 [1:20:29<8:24:32, 12.79s/it] 15%|█▍ | 408/2774 [1:20:40<8:06:24, 12.33s/it] {'loss': 1.0566, 'learning_rate': 4.8231491584155665e-06, 'epoch': 0.15} 15%|█▍ | 408/2774 [1:20:40<8:06:24, 12.33s/it] 15%|█▍ | 409/2774 [1:20:51<7:53:57, 12.02s/it] {'loss': 1.0229, 'learning_rate': 4.822068959221599e-06, 'epoch': 0.15} 
15%|█▍ | 409/2774 [1:20:51<7:53:57, 12.02s/it] 15%|█▍ | 410/2774 [1:21:03<7:46:58, 11.85s/it] {'loss': 1.0591, 'learning_rate': 4.8209855928658425e-06, 'epoch': 0.15} 15%|█▍ | 410/2774 [1:21:03<7:46:58, 11.85s/it] 15%|█▍ | 411/2774 [1:21:14<7:43:28, 11.77s/it] {'loss': 1.0337, 'learning_rate': 4.819899060825943e-06, 'epoch': 0.15} 15%|█▍ | 411/2774 [1:21:14<7:43:28, 11.77s/it] 15%|█▍ | 412/2774 [1:21:25<7:37:31, 11.62s/it] {'loss': 1.0273, 'learning_rate': 4.8188093645838674e-06, 'epoch': 0.15} 15%|█▍ | 412/2774 [1:21:25<7:37:31, 11.62s/it] 15%|█▍ | 413/2774 [1:21:38<7:42:42, 11.76s/it] {'loss': 1.0576, 'learning_rate': 4.817716505625894e-06, 'epoch': 0.15} 15%|█▍ | 413/2774 [1:21:38<7:42:42, 11.76s/it] 15%|█▍ | 414/2774 [1:21:50<7:48:54, 11.92s/it] {'loss': 0.9683, 'learning_rate': 4.816620485442616e-06, 'epoch': 0.15} 15%|█▍ | 414/2774 [1:21:50<7:48:54, 11.92s/it] 15%|█▍ | 415/2774 [1:22:01<7:45:20, 11.84s/it] {'loss': 1.041, 'learning_rate': 4.815521305528939e-06, 'epoch': 0.15} 15%|█▍ | 415/2774 [1:22:01<7:45:20, 11.84s/it] 15%|█▍ | 416/2774 [1:22:13<7:41:47, 11.75s/it] {'loss': 1.0493, 'learning_rate': 4.814418967384078e-06, 'epoch': 0.15} 15%|█▍ | 416/2774 [1:22:13<7:41:47, 11.75s/it] 15%|█▌ | 417/2774 [1:22:25<7:46:33, 11.88s/it] {'loss': 1.0293, 'learning_rate': 4.813313472511555e-06, 'epoch': 0.15} 15%|█▌ | 417/2774 [1:22:25<7:46:33, 11.88s/it] 15%|█▌ | 418/2774 [1:22:37<7:48:51, 11.94s/it] {'loss': 0.9956, 'learning_rate': 4.812204822419199e-06, 'epoch': 0.15} 15%|█▌ | 418/2774 [1:22:37<7:48:51, 11.94s/it] 15%|█▌ | 419/2774 [1:22:49<7:44:27, 11.83s/it] {'loss': 1.0215, 'learning_rate': 4.811093018619143e-06, 'epoch': 0.15} 15%|█▌ | 419/2774 [1:22:49<7:44:27, 11.83s/it] 15%|█▌ | 420/2774 [1:23:00<7:38:28, 11.69s/it] {'loss': 1.0117, 'learning_rate': 4.809978062627818e-06, 'epoch': 0.15} 15%|█▌ | 420/2774 [1:23:00<7:38:28, 11.69s/it] 15%|█▌ | 421/2774 [1:23:12<7:34:22, 11.59s/it] {'loss': 1.0, 'learning_rate': 4.808859955965957e-06, 'epoch': 0.15} 15%|█▌ | 
421/2774 [1:23:12<7:34:22, 11.59s/it] 15%|█▌ | 422/2774 [1:23:23<7:27:52, 11.43s/it] {'loss': 1.0322, 'learning_rate': 4.807738700158592e-06, 'epoch': 0.15} 15%|█▌ | 422/2774 [1:23:23<7:27:52, 11.43s/it] 15%|█▌ | 423/2774 [1:23:34<7:32:29, 11.55s/it] {'loss': 1.0566, 'learning_rate': 4.806614296735045e-06, 'epoch': 0.15} 15%|█▌ | 423/2774 [1:23:34<7:32:29, 11.55s/it] 15%|█▌ | 424/2774 [1:23:46<7:31:24, 11.53s/it] {'loss': 1.0752, 'learning_rate': 4.805486747228936e-06, 'epoch': 0.15} 15%|█▌ | 424/2774 [1:23:46<7:31:24, 11.53s/it] 15%|█▌ | 425/2774 [1:23:57<7:25:58, 11.39s/it] {'loss': 1.0317, 'learning_rate': 4.804356053178175e-06, 'epoch': 0.15} 15%|█▌ | 425/2774 [1:23:57<7:25:58, 11.39s/it] 15%|█▌ | 426/2774 [1:24:09<7:33:48, 11.60s/it] {'loss': 1.0098, 'learning_rate': 4.8032222161249595e-06, 'epoch': 0.15} 15%|█▌ | 426/2774 [1:24:09<7:33:48, 11.60s/it] 15%|█▌ | 427/2774 [1:24:21<7:32:04, 11.56s/it] {'loss': 0.9946, 'learning_rate': 4.802085237615776e-06, 'epoch': 0.15} 15%|█▌ | 427/2774 [1:24:21<7:32:04, 11.56s/it] 15%|█▌ | 428/2774 [1:24:32<7:33:16, 11.59s/it] {'loss': 1.001, 'learning_rate': 4.800945119201392e-06, 'epoch': 0.15} 15%|█▌ | 428/2774 [1:24:32<7:33:16, 11.59s/it] 15%|█▌ | 429/2774 [1:24:45<7:44:28, 11.88s/it] {'loss': 0.9922, 'learning_rate': 4.799801862436863e-06, 'epoch': 0.15} 15%|█▌ | 429/2774 [1:24:45<7:44:28, 11.88s/it] 16%|█▌ | 430/2774 [1:24:58<7:55:07, 12.16s/it] {'loss': 1.0137, 'learning_rate': 4.798655468881519e-06, 'epoch': 0.16} 16%|█▌ | 430/2774 [1:24:58<7:55:07, 12.16s/it] 16%|█▌ | 431/2774 [1:25:09<7:50:09, 12.04s/it] {'loss': 1.0474, 'learning_rate': 4.797505940098975e-06, 'epoch': 0.16} 16%|█▌ | 431/2774 [1:25:09<7:50:09, 12.04s/it] 16%|█▌ | 432/2774 [1:25:21<7:43:07, 11.86s/it] {'loss': 1.1255, 'learning_rate': 4.796353277657117e-06, 'epoch': 0.16} 16%|█▌ | 432/2774 [1:25:21<7:43:07, 11.86s/it] 16%|█▌ | 433/2774 [1:25:33<7:49:16, 12.03s/it] {'loss': 1.0562, 'learning_rate': 4.795197483128107e-06, 'epoch': 0.16} 16%|█▌ | 
433/2774 [1:25:33<7:49:16, 12.03s/it] 16%|█▌ | 434/2774 [1:25:44<7:38:33, 11.76s/it] {'loss': 1.0557, 'learning_rate': 4.794038558088378e-06, 'epoch': 0.16} 16%|█▌ | 434/2774 [1:25:44<7:38:33, 11.76s/it] 16%|█▌ | 435/2774 [1:25:56<7:33:08, 11.62s/it] {'loss': 1.0801, 'learning_rate': 4.792876504118636e-06, 'epoch': 0.16} 16%|█▌ | 435/2774 [1:25:56<7:33:08, 11.62s/it] 16%|█▌ | 436/2774 [1:26:07<7:27:24, 11.48s/it] {'loss': 1.041, 'learning_rate': 4.791711322803852e-06, 'epoch': 0.16} 16%|█▌ | 436/2774 [1:26:07<7:27:24, 11.48s/it] 16%|█▌ | 437/2774 [1:26:18<7:24:36, 11.41s/it] {'loss': 1.0317, 'learning_rate': 4.79054301573326e-06, 'epoch': 0.16} 16%|█▌ | 437/2774 [1:26:18<7:24:36, 11.41s/it] 16%|█▌ | 438/2774 [1:26:30<7:25:43, 11.45s/it] {'loss': 1.062, 'learning_rate': 4.789371584500364e-06, 'epoch': 0.16} 16%|█▌ | 438/2774 [1:26:30<7:25:43, 11.45s/it] 16%|█▌ | 439/2774 [1:26:41<7:24:47, 11.43s/it] {'loss': 1.0542, 'learning_rate': 4.788197030702924e-06, 'epoch': 0.16} 16%|█▌ | 439/2774 [1:26:41<7:24:47, 11.43s/it] 16%|█▌ | 440/2774 [1:26:52<7:24:26, 11.43s/it] {'loss': 1.0288, 'learning_rate': 4.787019355942959e-06, 'epoch': 0.16} 16%|█▌ | 440/2774 [1:26:52<7:24:26, 11.43s/it] 16%|█▌ | 441/2774 [1:27:04<7:21:26, 11.35s/it] {'loss': 1.0215, 'learning_rate': 4.785838561826749e-06, 'epoch': 0.16} 16%|█▌ | 441/2774 [1:27:04<7:21:26, 11.35s/it] 16%|█▌ | 442/2774 [1:27:18<7:57:48, 12.29s/it] {'loss': 1.0317, 'learning_rate': 4.7846546499648224e-06, 'epoch': 0.16} 16%|█▌ | 442/2774 [1:27:18<7:57:48, 12.29s/it] 16%|█▌ | 443/2774 [1:27:31<8:08:19, 12.57s/it] {'loss': 0.9839, 'learning_rate': 4.783467621971966e-06, 'epoch': 0.16} 16%|█▌ | 443/2774 [1:27:31<8:08:19, 12.57s/it] 16%|█▌ | 444/2774 [1:27:43<8:04:23, 12.47s/it] {'loss': 1.0156, 'learning_rate': 4.782277479467216e-06, 'epoch': 0.16} 16%|█▌ | 444/2774 [1:27:43<8:04:23, 12.47s/it] 16%|█▌ | 445/2774 [1:27:55<7:53:51, 12.21s/it] {'loss': 1.0405, 'learning_rate': 4.78108422407385e-06, 'epoch': 0.16} 16%|█▌ | 445/2774 
[1:27:55<7:53:51, 12.21s/it] 16%|█▌ | 446/2774 [1:28:07<7:46:16, 12.02s/it] {'loss': 1.0566, 'learning_rate': 4.7798878574194e-06, 'epoch': 0.16} 16%|█▌ | 446/2774 [1:28:07<7:46:16, 12.02s/it] 16%|█▌ | 447/2774 [1:28:18<7:37:58, 11.81s/it] {'loss': 1.0425, 'learning_rate': 4.778688381135636e-06, 'epoch': 0.16} 16%|█▌ | 447/2774 [1:28:18<7:37:58, 11.81s/it] 16%|█▌ | 448/2774 [1:28:31<7:46:25, 12.03s/it] {'loss': 1.0542, 'learning_rate': 4.777485796858572e-06, 'epoch': 0.16} 16%|█▌ | 448/2774 [1:28:31<7:46:25, 12.03s/it] 16%|█▌ | 449/2774 [1:28:44<7:57:26, 12.32s/it] {'loss': 1.0361, 'learning_rate': 4.77628010622846e-06, 'epoch': 0.16} 16%|█▌ | 449/2774 [1:28:44<7:57:26, 12.32s/it] 16%|█▌ | 450/2774 [1:28:55<7:45:27, 12.02s/it] {'loss': 0.9922, 'learning_rate': 4.775071310889791e-06, 'epoch': 0.16} 16%|█▌ | 450/2774 [1:28:55<7:45:27, 12.02s/it] 16%|█▋ | 451/2774 [1:29:07<7:43:17, 11.97s/it] {'loss': 1.0073, 'learning_rate': 4.773859412491285e-06, 'epoch': 0.16} 16%|█▋ | 451/2774 [1:29:07<7:43:17, 11.97s/it] 16%|█▋ | 452/2774 [1:29:18<7:34:40, 11.75s/it] {'loss': 1.0298, 'learning_rate': 4.7726444126859015e-06, 'epoch': 0.16} 16%|█▋ | 452/2774 [1:29:18<7:34:40, 11.75s/it] 16%|█▋ | 453/2774 [1:29:30<7:37:06, 11.82s/it] {'loss': 1.0654, 'learning_rate': 4.771426313130826e-06, 'epoch': 0.16} 16%|█▋ | 453/2774 [1:29:30<7:37:06, 11.82s/it] 16%|█▋ | 454/2774 [1:29:41<7:33:26, 11.73s/it] {'loss': 1.0249, 'learning_rate': 4.770205115487471e-06, 'epoch': 0.16} 16%|█▋ | 454/2774 [1:29:41<7:33:26, 11.73s/it] 16%|█▋ | 455/2774 [1:29:53<7:29:22, 11.63s/it] {'loss': 0.9624, 'learning_rate': 4.76898082142148e-06, 'epoch': 0.16} 16%|█▋ | 455/2774 [1:29:53<7:29:22, 11.63s/it] 16%|█▋ | 456/2774 [1:30:04<7:26:27, 11.56s/it] {'loss': 1.001, 'learning_rate': 4.767753432602713e-06, 'epoch': 0.16} 16%|█▋ | 456/2774 [1:30:04<7:26:27, 11.56s/it] 16%|█▋ | 457/2774 [1:30:15<7:22:13, 11.45s/it] {'loss': 1.0176, 'learning_rate': 4.7665229507052545e-06, 'epoch': 0.16} 16%|█▋ | 457/2774 
[1:30:15<7:22:13, 11.45s/it] 17%|█▋ | 458/2774 [1:30:27<7:21:47, 11.45s/it] {'loss': 1.0342, 'learning_rate': 4.765289377407409e-06, 'epoch': 0.17} 17%|█▋ | 458/2774 [1:30:27<7:21:47, 11.45s/it] 17%|█▋ | 459/2774 [1:30:39<7:27:38, 11.60s/it] {'loss': 1.0298, 'learning_rate': 4.764052714391695e-06, 'epoch': 0.17} 17%|█▋ | 459/2774 [1:30:39<7:27:38, 11.60s/it] 17%|█▋ | 460/2774 [1:30:50<7:28:27, 11.63s/it] {'loss': 1.0415, 'learning_rate': 4.762812963344845e-06, 'epoch': 0.17} 17%|█▋ | 460/2774 [1:30:50<7:28:27, 11.63s/it] 17%|█▋ | 461/2774 [1:31:02<7:26:51, 11.59s/it] {'loss': 1.0098, 'learning_rate': 4.7615701259578065e-06, 'epoch': 0.17} 17%|█▋ | 461/2774 [1:31:02<7:26:51, 11.59s/it] 17%|█▋ | 462/2774 [1:31:13<7:21:33, 11.46s/it] {'loss': 1.0547, 'learning_rate': 4.760324203925735e-06, 'epoch': 0.17} 17%|█▋ | 462/2774 [1:31:13<7:21:33, 11.46s/it] 17%|█▋ | 463/2774 [1:31:25<7:29:36, 11.67s/it] {'loss': 1.0513, 'learning_rate': 4.75907519894799e-06, 'epoch': 0.17} 17%|█▋ | 463/2774 [1:31:25<7:29:36, 11.67s/it] 17%|█▋ | 464/2774 [1:31:37<7:25:42, 11.58s/it] {'loss': 1.0586, 'learning_rate': 4.757823112728141e-06, 'epoch': 0.17} 17%|█▋ | 464/2774 [1:31:37<7:25:42, 11.58s/it] 17%|█▋ | 465/2774 [1:31:48<7:25:50, 11.59s/it] {'loss': 1.0977, 'learning_rate': 4.756567946973958e-06, 'epoch': 0.17} 17%|█▋ | 465/2774 [1:31:48<7:25:50, 11.59s/it] 17%|█▋ | 466/2774 [1:32:00<7:21:52, 11.49s/it] {'loss': 0.9995, 'learning_rate': 4.75530970339741e-06, 'epoch': 0.17} 17%|█▋ | 466/2774 [1:32:00<7:21:52, 11.49s/it] 17%|█▋ | 467/2774 [1:32:12<7:38:43, 11.93s/it] {'loss': 1.0361, 'learning_rate': 4.7540483837146675e-06, 'epoch': 0.17} 17%|█▋ | 467/2774 [1:32:13<7:38:43, 11.93s/it] 17%|█▋ | 468/2774 [1:32:24<7:33:16, 11.79s/it] {'loss': 1.0293, 'learning_rate': 4.752783989646092e-06, 'epoch': 0.17} 17%|█▋ | 468/2774 [1:32:24<7:33:16, 11.79s/it] 17%|█▋ | 469/2774 [1:32:35<7:29:47, 11.71s/it] {'loss': 1.0107, 'learning_rate': 4.751516522916242e-06, 'epoch': 0.17} 17%|█▋ | 469/2774 
[1:32:35<7:29:47, 11.71s/it] 17%|█▋ | 470/2774 [1:32:47<7:27:41, 11.66s/it] {'loss': 1.0586, 'learning_rate': 4.750245985253864e-06, 'epoch': 0.17} 17%|█▋ | 470/2774 [1:32:47<7:27:41, 11.66s/it] 17%|█▋ | 471/2774 [1:33:00<7:37:19, 11.91s/it] {'loss': 1.0527, 'learning_rate': 4.748972378391897e-06, 'epoch': 0.17} 17%|█▋ | 471/2774 [1:33:00<7:37:19, 11.91s/it] 17%|█▋ | 472/2774 [1:33:11<7:33:15, 11.81s/it] {'loss': 1.0381, 'learning_rate': 4.747695704067462e-06, 'epoch': 0.17} 17%|█▋ | 472/2774 [1:33:11<7:33:15, 11.81s/it] 17%|█▋ | 473/2774 [1:33:22<7:26:11, 11.63s/it] {'loss': 1.0132, 'learning_rate': 4.746415964021866e-06, 'epoch': 0.17} 17%|█▋ | 473/2774 [1:33:22<7:26:11, 11.63s/it] 17%|█▋ | 474/2774 [1:33:36<7:49:14, 12.24s/it] {'loss': 1.0107, 'learning_rate': 4.745133160000598e-06, 'epoch': 0.17} 17%|█▋ | 474/2774 [1:33:36<7:49:14, 12.24s/it] 17%|█▋ | 475/2774 [1:33:47<7:39:39, 12.00s/it] {'loss': 1.0391, 'learning_rate': 4.743847293753323e-06, 'epoch': 0.17} 17%|█▋ | 475/2774 [1:33:47<7:39:39, 12.00s/it] 17%|█▋ | 476/2774 [1:33:59<7:36:00, 11.91s/it] {'loss': 0.9688, 'learning_rate': 4.7425583670338885e-06, 'epoch': 0.17} 17%|█▋ | 476/2774 [1:33:59<7:36:00, 11.91s/it] 17%|█▋ | 477/2774 [1:34:10<7:29:24, 11.74s/it] {'loss': 1.0645, 'learning_rate': 4.741266381600309e-06, 'epoch': 0.17} 17%|█▋ | 477/2774 [1:34:10<7:29:24, 11.74s/it] 17%|█▋ | 478/2774 [1:34:22<7:25:28, 11.64s/it] {'loss': 1.0747, 'learning_rate': 4.739971339214776e-06, 'epoch': 0.17} 17%|█▋ | 478/2774 [1:34:22<7:25:28, 11.64s/it] 17%|█▋ | 479/2774 [1:34:34<7:31:42, 11.81s/it] {'loss': 1.0225, 'learning_rate': 4.73867324164365e-06, 'epoch': 0.17} 17%|█▋ | 479/2774 [1:34:34<7:31:42, 11.81s/it] 17%|█▋ | 480/2774 [1:34:46<7:30:27, 11.78s/it] {'loss': 1.0137, 'learning_rate': 4.737372090657458e-06, 'epoch': 0.17} 17%|█▋ | 480/2774 [1:34:46<7:30:27, 11.78s/it] 17%|█▋ | 481/2774 [1:34:59<7:47:09, 12.22s/it] {'loss': 1.0283, 'learning_rate': 4.736067888030888e-06, 'epoch': 0.17} 17%|█▋ | 481/2774 
[1:34:59<7:47:09, 12.22s/it] 17%|█▋ | 482/2774 [1:35:10<7:34:52, 11.91s/it] {'loss': 1.0547, 'learning_rate': 4.734760635542797e-06, 'epoch': 0.17} 17%|█▋ | 482/2774 [1:35:10<7:34:52, 11.91s/it] 17%|█▋ | 483/2774 [1:35:26<8:20:39, 13.11s/it] {'loss': 1.0171, 'learning_rate': 4.733450334976197e-06, 'epoch': 0.17} 17%|█▋ | 483/2774 [1:35:26<8:20:39, 13.11s/it] 17%|█▋ | 484/2774 [1:35:38<8:00:50, 12.60s/it] {'loss': 1.0459, 'learning_rate': 4.732136988118259e-06, 'epoch': 0.17} 17%|█▋ | 484/2774 [1:35:38<8:00:50, 12.60s/it] 17%|█▋ | 485/2774 [1:35:50<7:59:32, 12.57s/it] {'loss': 0.9922, 'learning_rate': 4.73082059676031e-06, 'epoch': 0.17} 17%|█▋ | 485/2774 [1:35:50<7:59:32, 12.57s/it] 18%|█▊ | 486/2774 [1:36:01<7:46:02, 12.22s/it] {'loss': 1.021, 'learning_rate': 4.7295011626978255e-06, 'epoch': 0.18} 18%|█▊ | 486/2774 [1:36:01<7:46:02, 12.22s/it] 18%|█▊ | 487/2774 [1:36:13<7:35:10, 11.94s/it] {'loss': 1.02, 'learning_rate': 4.728178687730436e-06, 'epoch': 0.18} 18%|█▊ | 487/2774 [1:36:13<7:35:10, 11.94s/it] 18%|█▊ | 488/2774 [1:36:24<7:30:07, 11.81s/it] {'loss': 1.0688, 'learning_rate': 4.726853173661917e-06, 'epoch': 0.18} 18%|█▊ | 488/2774 [1:36:24<7:30:07, 11.81s/it] 18%|█▊ | 489/2774 [1:36:36<7:26:51, 11.73s/it] {'loss': 1.0483, 'learning_rate': 4.725524622300191e-06, 'epoch': 0.18} 18%|█▊ | 489/2774 [1:36:36<7:26:51, 11.73s/it] 18%|█▊ | 490/2774 [1:36:47<7:23:31, 11.65s/it] {'loss': 1.063, 'learning_rate': 4.724193035457319e-06, 'epoch': 0.18} 18%|█▊ | 490/2774 [1:36:47<7:23:31, 11.65s/it] 18%|█▊ | 491/2774 [1:36:59<7:20:11, 11.57s/it] {'loss': 1.0371, 'learning_rate': 4.722858414949506e-06, 'epoch': 0.18} 18%|█▊ | 491/2774 [1:36:59<7:20:11, 11.57s/it] 18%|█▊ | 492/2774 [1:37:12<7:43:03, 12.18s/it] {'loss': 1.0298, 'learning_rate': 4.721520762597095e-06, 'epoch': 0.18} 18%|█▊ | 492/2774 [1:37:12<7:43:03, 12.18s/it] 18%|█▊ | 493/2774 [1:37:24<7:40:53, 12.12s/it] {'loss': 1.0337, 'learning_rate': 4.720180080224563e-06, 'epoch': 0.18} 18%|█▊ | 493/2774 
[1:37:24<7:40:53, 12.12s/it] 18%|█▊ | 494/2774 [1:37:36<7:34:51, 11.97s/it] {'loss': 1.0405, 'learning_rate': 4.718836369660517e-06, 'epoch': 0.18} 18%|█▊ | 494/2774 [1:37:36<7:34:51, 11.97s/it] 18%|█▊ | 495/2774 [1:37:48<7:35:49, 12.00s/it] {'loss': 1.0986, 'learning_rate': 4.7174896327377e-06, 'epoch': 0.18} 18%|█▊ | 495/2774 [1:37:48<7:35:49, 12.00s/it] 18%|█▊ | 496/2774 [1:37:59<7:27:17, 11.78s/it] {'loss': 1.0879, 'learning_rate': 4.7161398712929785e-06, 'epoch': 0.18} 18%|█▊ | 496/2774 [1:37:59<7:27:17, 11.78s/it] 18%|█▊ | 497/2774 [1:38:11<7:27:23, 11.79s/it] {'loss': 1.0732, 'learning_rate': 4.714787087167346e-06, 'epoch': 0.18} 18%|█▊ | 497/2774 [1:38:11<7:27:23, 11.79s/it] 18%|█▊ | 498/2774 [1:38:22<7:21:05, 11.63s/it] {'loss': 1.0303, 'learning_rate': 4.713431282205919e-06, 'epoch': 0.18} 18%|█▊ | 498/2774 [1:38:22<7:21:05, 11.63s/it] 18%|█▊ | 499/2774 [1:38:34<7:17:36, 11.54s/it] {'loss': 1.0791, 'learning_rate': 4.712072458257932e-06, 'epoch': 0.18} 18%|█▊ | 499/2774 [1:38:34<7:17:36, 11.54s/it] 18%|█▊ | 500/2774 [1:38:47<7:38:37, 12.10s/it] {'loss': 1.0293, 'learning_rate': 4.710710617176739e-06, 'epoch': 0.18} 18%|█▊ | 500/2774 [1:38:47<7:38:37, 12.10s/it] 18%|█▊ | 501/2774 [1:38:58<7:29:25, 11.86s/it] {'loss': 1.0601, 'learning_rate': 4.70934576081981e-06, 'epoch': 0.18} 18%|█▊ | 501/2774 [1:38:58<7:29:25, 11.86s/it] 18%|█▊ | 502/2774 [1:39:12<7:44:27, 12.27s/it] {'loss': 0.9668, 'learning_rate': 4.7079778910487264e-06, 'epoch': 0.18} 18%|█▊ | 502/2774 [1:39:12<7:44:27, 12.27s/it] 18%|█▊ | 503/2774 [1:39:23<7:34:47, 12.02s/it] {'loss': 1.0669, 'learning_rate': 4.70660700972918e-06, 'epoch': 0.18} 18%|█▊ | 503/2774 [1:39:23<7:34:47, 12.02s/it] 18%|█▊ | 504/2774 [1:39:34<7:28:42, 11.86s/it] {'loss': 1.1055, 'learning_rate': 4.705233118730969e-06, 'epoch': 0.18} 18%|█▊ | 504/2774 [1:39:34<7:28:42, 11.86s/it] 18%|█▊ | 505/2774 [1:39:47<7:33:08, 11.98s/it] {'loss': 1.0156, 'learning_rate': 4.703856219927999e-06, 'epoch': 0.18} 18%|█▊ | 505/2774 
[1:39:47<7:33:08, 11.98s/it] 18%|█▊ | 506/2774 [1:39:59<7:40:03, 12.17s/it] {'loss': 1.0615, 'learning_rate': 4.702476315198275e-06, 'epoch': 0.18} 18%|█▊ | 506/2774 [1:39:59<7:40:03, 12.17s/it] 18%|█▊ | 507/2774 [1:40:11<7:29:04, 11.89s/it] {'loss': 1.0332, 'learning_rate': 4.701093406423907e-06, 'epoch': 0.18} 18%|█▊ | 507/2774 [1:40:11<7:29:04, 11.89s/it] 18%|█▊ | 508/2774 [1:40:22<7:20:35, 11.67s/it] {'loss': 1.1064, 'learning_rate': 4.699707495491096e-06, 'epoch': 0.18} 18%|█▊ | 508/2774 [1:40:22<7:20:35, 11.67s/it] 18%|█▊ | 509/2774 [1:40:33<7:17:56, 11.60s/it] {'loss': 1.0078, 'learning_rate': 4.698318584290141e-06, 'epoch': 0.18} 18%|█▊ | 509/2774 [1:40:33<7:17:56, 11.60s/it] 18%|█▊ | 510/2774 [1:40:45<7:15:52, 11.55s/it] {'loss': 1.0317, 'learning_rate': 4.696926674715435e-06, 'epoch': 0.18} 18%|█▊ | 510/2774 [1:40:45<7:15:52, 11.55s/it] 18%|█▊ | 511/2774 [1:40:56<7:13:09, 11.48s/it] {'loss': 1.0947, 'learning_rate': 4.695531768665456e-06, 'epoch': 0.18} 18%|█▊ | 511/2774 [1:40:56<7:13:09, 11.48s/it] 18%|█▊ | 512/2774 [1:41:07<7:09:14, 11.39s/it] {'loss': 1.0269, 'learning_rate': 4.694133868042775e-06, 'epoch': 0.18} 18%|█▊ | 512/2774 [1:41:07<7:09:14, 11.39s/it] 18%|█▊ | 513/2774 [1:41:19<7:11:19, 11.45s/it] {'loss': 0.9688, 'learning_rate': 4.692732974754041e-06, 'epoch': 0.18} 18%|█▊ | 513/2774 [1:41:19<7:11:19, 11.45s/it] 19%|█▊ | 514/2774 [1:41:30<7:11:38, 11.46s/it] {'loss': 1.0244, 'learning_rate': 4.691329090709989e-06, 'epoch': 0.19} 19%|█▊ | 514/2774 [1:41:30<7:11:38, 11.46s/it] 19%|█▊ | 515/2774 [1:41:41<7:08:56, 11.39s/it] {'loss': 1.0576, 'learning_rate': 4.689922217825431e-06, 'epoch': 0.19} 19%|█▊ | 515/2774 [1:41:41<7:08:56, 11.39s/it] 19%|█▊ | 516/2774 [1:41:53<7:10:09, 11.43s/it] {'loss': 1.0073, 'learning_rate': 4.6885123580192575e-06, 'epoch': 0.19} 19%|█▊ | 516/2774 [1:41:53<7:10:09, 11.43s/it] 19%|█▊ | 517/2774 [1:42:04<7:10:47, 11.45s/it] {'loss': 1.1006, 'learning_rate': 4.687099513214433e-06, 'epoch': 0.19} 19%|█▊ | 517/2774 
[1:42:04<7:10:47, 11.45s/it] 19%|█▊ | 518/2774 [1:42:16<7:14:30, 11.56s/it] {'loss': 1.0708, 'learning_rate': 4.685683685337991e-06, 'epoch': 0.19} 19%|█▊ | 518/2774 [1:42:16<7:14:30, 11.56s/it] 19%|█▊ | 519/2774 [1:42:28<7:14:18, 11.56s/it] {'loss': 1.061, 'learning_rate': 4.684264876321035e-06, 'epoch': 0.19} 19%|█▊ | 519/2774 [1:42:28<7:14:18, 11.56s/it] 19%|█▊ | 520/2774 [1:42:41<7:36:35, 12.15s/it] {'loss': 0.9976, 'learning_rate': 4.682843088098736e-06, 'epoch': 0.19} 19%|█▊ | 520/2774 [1:42:41<7:36:35, 12.15s/it] 19%|█▉ | 521/2774 [1:42:53<7:34:52, 12.11s/it] {'loss': 1.0098, 'learning_rate': 4.681418322610327e-06, 'epoch': 0.19} 19%|█▉ | 521/2774 [1:42:53<7:34:52, 12.11s/it] 19%|█▉ | 522/2774 [1:43:04<7:22:43, 11.80s/it] {'loss': 1.0615, 'learning_rate': 4.679990581799102e-06, 'epoch': 0.19} 19%|█▉ | 522/2774 [1:43:04<7:22:43, 11.80s/it] 19%|█▉ | 523/2774 [1:43:16<7:23:23, 11.82s/it] {'loss': 1.0679, 'learning_rate': 4.678559867612412e-06, 'epoch': 0.19} 19%|█▉ | 523/2774 [1:43:16<7:23:23, 11.82s/it] 19%|█▉ | 524/2774 [1:43:28<7:22:58, 11.81s/it] {'loss': 1.0898, 'learning_rate': 4.677126182001667e-06, 'epoch': 0.19} 19%|█▉ | 524/2774 [1:43:28<7:22:58, 11.81s/it] 19%|█▉ | 525/2774 [1:43:40<7:20:57, 11.76s/it] {'loss': 1.0562, 'learning_rate': 4.675689526922324e-06, 'epoch': 0.19} 19%|█▉ | 525/2774 [1:43:40<7:20:57, 11.76s/it] 19%|█▉ | 526/2774 [1:43:51<7:14:51, 11.61s/it] {'loss': 1.0024, 'learning_rate': 4.6742499043338985e-06, 'epoch': 0.19} 19%|█▉ | 526/2774 [1:43:51<7:14:51, 11.61s/it] 19%|█▉ | 527/2774 [1:44:02<7:09:30, 11.47s/it] {'loss': 0.9907, 'learning_rate': 4.672807316199946e-06, 'epoch': 0.19} 19%|█▉ | 527/2774 [1:44:02<7:09:30, 11.47s/it] 19%|█▉ | 528/2774 [1:44:16<7:32:46, 12.10s/it] {'loss': 1.0269, 'learning_rate': 4.671361764488069e-06, 'epoch': 0.19} 19%|█▉ | 528/2774 [1:44:16<7:32:46, 12.10s/it] 19%|█▉ | 529/2774 [1:44:27<7:23:31, 11.85s/it] {'loss': 1.085, 'learning_rate': 4.669913251169914e-06, 'epoch': 0.19} 19%|█▉ | 529/2774 
[1:44:27<7:23:31, 11.85s/it] 19%|█▉ | 530/2774 [1:44:40<7:36:23, 12.20s/it] {'loss': 1.0078, 'learning_rate': 4.668461778221165e-06, 'epoch': 0.19} 19%|█▉ | 530/2774 [1:44:40<7:36:23, 12.20s/it] 19%|█▉ | 531/2774 [1:44:51<7:24:15, 11.88s/it] {'loss': 0.9917, 'learning_rate': 4.6670073476215435e-06, 'epoch': 0.19} 19%|█▉ | 531/2774 [1:44:51<7:24:15, 11.88s/it] 19%|█▉ | 532/2774 [1:45:04<7:32:46, 12.12s/it] {'loss': 1.0391, 'learning_rate': 4.665549961354806e-06, 'epoch': 0.19} 19%|█▉ | 532/2774 [1:45:04<7:32:46, 12.12s/it] 19%|█▉ | 533/2774 [1:45:15<7:25:52, 11.94s/it] {'loss': 1.0791, 'learning_rate': 4.664089621408738e-06, 'epoch': 0.19} 19%|█▉ | 533/2774 [1:45:15<7:25:52, 11.94s/it] 19%|█▉ | 534/2774 [1:45:28<7:33:37, 12.15s/it] {'loss': 1.0063, 'learning_rate': 4.662626329775155e-06, 'epoch': 0.19} 19%|█▉ | 534/2774 [1:45:28<7:33:37, 12.15s/it] 19%|█▉ | 535/2774 [1:45:40<7:27:17, 11.99s/it] {'loss': 1.0649, 'learning_rate': 4.6611600884498994e-06, 'epoch': 0.19} 19%|█▉ | 535/2774 [1:45:40<7:27:17, 11.99s/it] 19%|█▉ | 536/2774 [1:45:51<7:22:50, 11.87s/it] {'loss': 1.0093, 'learning_rate': 4.659690899432835e-06, 'epoch': 0.19} 19%|█▉ | 536/2774 [1:45:51<7:22:50, 11.87s/it] 19%|█▉ | 537/2774 [1:46:03<7:17:33, 11.74s/it] {'loss': 1.0449, 'learning_rate': 4.658218764727847e-06, 'epoch': 0.19} 19%|█▉ | 537/2774 [1:46:03<7:17:33, 11.74s/it] 19%|█▉ | 538/2774 [1:46:14<7:11:48, 11.59s/it] {'loss': 1.0474, 'learning_rate': 4.656743686342838e-06, 'epoch': 0.19} 19%|█▉ | 538/2774 [1:46:14<7:11:48, 11.59s/it] 19%|█▉ | 539/2774 [1:46:25<7:10:31, 11.56s/it] {'loss': 1.061, 'learning_rate': 4.655265666289727e-06, 'epoch': 0.19} 19%|█▉ | 539/2774 [1:46:25<7:10:31, 11.56s/it] 19%|█▉ | 540/2774 [1:46:37<7:11:38, 11.59s/it] {'loss': 1.0562, 'learning_rate': 4.653784706584443e-06, 'epoch': 0.19} 19%|█▉ | 540/2774 [1:46:37<7:11:38, 11.59s/it] 20%|█▉ | 541/2774 [1:46:50<7:26:44, 12.00s/it] {'loss': 1.0645, 'learning_rate': 4.6523008092469255e-06, 'epoch': 0.2} 20%|█▉ | 541/2774 
[1:46:50<7:26:44, 12.00s/it] 20%|█▉ | 542/2774 [1:47:01<7:20:49, 11.85s/it] {'loss': 1.0762, 'learning_rate': 4.6508139763011205e-06, 'epoch': 0.2} 20%|█▉ | 542/2774 [1:47:01<7:20:49, 11.85s/it] 20%|█▉ | 543/2774 [1:47:13<7:17:18, 11.76s/it] {'loss': 1.0303, 'learning_rate': 4.649324209774979e-06, 'epoch': 0.2} 20%|█▉ | 543/2774 [1:47:13<7:17:18, 11.76s/it] 20%|█▉ | 544/2774 [1:47:25<7:19:13, 11.82s/it] {'loss': 1.0508, 'learning_rate': 4.647831511700453e-06, 'epoch': 0.2} 20%|█▉ | 544/2774 [1:47:25<7:19:13, 11.82s/it] 20%|█▉ | 545/2774 [1:47:36<7:13:47, 11.68s/it] {'loss': 1.0303, 'learning_rate': 4.646335884113492e-06, 'epoch': 0.2} 20%|█▉ | 545/2774 [1:47:36<7:13:47, 11.68s/it] 20%|█▉ | 546/2774 [1:47:48<7:14:11, 11.69s/it] {'loss': 1.0537, 'learning_rate': 4.644837329054042e-06, 'epoch': 0.2} 20%|█▉ | 546/2774 [1:47:48<7:14:11, 11.69s/it] 20%|█▉ | 547/2774 [1:48:00<7:13:40, 11.68s/it] {'loss': 1.0654, 'learning_rate': 4.6433358485660405e-06, 'epoch': 0.2} 20%|█▉ | 547/2774 [1:48:00<7:13:40, 11.68s/it] 20%|█▉ | 548/2774 [1:48:11<7:12:18, 11.65s/it] {'loss': 1.0566, 'learning_rate': 4.641831444697417e-06, 'epoch': 0.2} 20%|█▉ | 548/2774 [1:48:11<7:12:18, 11.65s/it] 20%|█▉ | 549/2774 [1:48:23<7:09:52, 11.59s/it] {'loss': 1.0347, 'learning_rate': 4.640324119500087e-06, 'epoch': 0.2} 20%|█▉ | 549/2774 [1:48:23<7:09:52, 11.59s/it] 20%|█▉ | 550/2774 [1:48:34<7:10:23, 11.61s/it] {'loss': 1.0337, 'learning_rate': 4.638813875029952e-06, 'epoch': 0.2} 20%|█▉ | 550/2774 [1:48:34<7:10:23, 11.61s/it] 20%|█▉ | 551/2774 [1:48:47<7:25:42, 12.03s/it] {'loss': 0.9717, 'learning_rate': 4.637300713346894e-06, 'epoch': 0.2} 20%|█▉ | 551/2774 [1:48:47<7:25:42, 12.03s/it] 20%|█▉ | 552/2774 [1:48:59<7:20:58, 11.91s/it] {'loss': 1.0137, 'learning_rate': 4.635784636514773e-06, 'epoch': 0.2} 20%|█▉ | 552/2774 [1:48:59<7:20:58, 11.91s/it] 20%|█▉ | 553/2774 [1:49:10<7:12:42, 11.69s/it] {'loss': 1.0098, 'learning_rate': 4.634265646601427e-06, 'epoch': 0.2} 20%|█▉ | 553/2774 [1:49:10<7:12:42, 
11.69s/it] 20%|█▉ | 554/2774 [1:49:23<7:24:04, 12.00s/it] {'loss': 1.0166, 'learning_rate': 4.632743745678667e-06, 'epoch': 0.2} 20%|█▉ | 554/2774 [1:49:23<7:24:04, 12.00s/it] 20%|██ | 555/2774 [1:49:35<7:21:17, 11.93s/it] {'loss': 1.0649, 'learning_rate': 4.631218935822273e-06, 'epoch': 0.2} 20%|██ | 555/2774 [1:49:35<7:21:17, 11.93s/it] 20%|██ | 556/2774 [1:49:46<7:14:27, 11.75s/it] {'loss': 1.0801, 'learning_rate': 4.629691219111993e-06, 'epoch': 0.2} 20%|██ | 556/2774 [1:49:46<7:14:27, 11.75s/it] 20%|██ | 557/2774 [1:49:57<7:09:51, 11.63s/it] {'loss': 1.0073, 'learning_rate': 4.628160597631543e-06, 'epoch': 0.2} 20%|██ | 557/2774 [1:49:57<7:09:51, 11.63s/it] 20%|██ | 558/2774 [1:50:10<7:19:05, 11.89s/it] {'loss': 1.0127, 'learning_rate': 4.626627073468596e-06, 'epoch': 0.2} 20%|██ | 558/2774 [1:50:10<7:19:05, 11.89s/it] 20%|██ | 559/2774 [1:50:21<7:15:29, 11.80s/it] {'loss': 1.061, 'learning_rate': 4.6250906487147865e-06, 'epoch': 0.2} 20%|██ | 559/2774 [1:50:21<7:15:29, 11.80s/it] 20%|██ | 560/2774 [1:50:33<7:14:11, 11.77s/it] {'loss': 1.0293, 'learning_rate': 4.623551325465705e-06, 'epoch': 0.2} 20%|██ | 560/2774 [1:50:33<7:14:11, 11.77s/it] 20%|██ | 561/2774 [1:50:45<7:10:10, 11.66s/it] {'loss': 0.9629, 'learning_rate': 4.622009105820896e-06, 'epoch': 0.2} 20%|██ | 561/2774 [1:50:45<7:10:10, 11.66s/it] 20%|██ | 562/2774 [1:50:56<7:07:16, 11.59s/it] {'loss': 1.0488, 'learning_rate': 4.620463991883853e-06, 'epoch': 0.2} 20%|██ | 562/2774 [1:50:56<7:07:16, 11.59s/it] 20%|██ | 563/2774 [1:51:08<7:07:20, 11.60s/it] {'loss': 1.0161, 'learning_rate': 4.6189159857620194e-06, 'epoch': 0.2} 20%|██ | 563/2774 [1:51:08<7:07:20, 11.60s/it] 20%|██ | 564/2774 [1:51:21<7:23:16, 12.03s/it] {'loss': 0.9761, 'learning_rate': 4.617365089566782e-06, 'epoch': 0.2} 20%|██ | 564/2774 [1:51:21<7:23:16, 12.03s/it] 20%|██ | 565/2774 [1:51:32<7:16:14, 11.85s/it] {'loss': 1.0459, 'learning_rate': 4.615811305413468e-06, 'epoch': 0.2} 20%|██ | 565/2774 [1:51:32<7:16:14, 11.85s/it] 20%|██ 
| 566/2774 [1:51:46<7:37:38, 12.44s/it] {'loss': 1.0386, 'learning_rate': 4.614254635421347e-06, 'epoch': 0.2} 20%|██ | 566/2774 [1:51:46<7:37:38, 12.44s/it] 20%|██ | 567/2774 [1:51:58<7:40:02, 12.51s/it] {'loss': 1.0571, 'learning_rate': 4.61269508171362e-06, 'epoch': 0.2} 20%|██ | 567/2774 [1:51:58<7:40:02, 12.51s/it] 20%|██ | 568/2774 [1:52:12<7:49:16, 12.76s/it] {'loss': 0.9976, 'learning_rate': 4.611132646417428e-06, 'epoch': 0.2} 20%|██ | 568/2774 [1:52:12<7:49:16, 12.76s/it] 21%|██ | 569/2774 [1:52:24<7:38:54, 12.49s/it] {'loss': 0.9829, 'learning_rate': 4.609567331663836e-06, 'epoch': 0.21} 21%|██ | 569/2774 [1:52:24<7:38:54, 12.49s/it] 21%|██ | 570/2774 [1:52:35<7:29:21, 12.23s/it] {'loss': 1.0737, 'learning_rate': 4.607999139587838e-06, 'epoch': 0.21} 21%|██ | 570/2774 [1:52:35<7:29:21, 12.23s/it] 21%|██ | 571/2774 [1:52:47<7:18:46, 11.95s/it] {'loss': 1.0352, 'learning_rate': 4.606428072328355e-06, 'epoch': 0.21} 21%|██ | 571/2774 [1:52:47<7:18:46, 11.95s/it] 21%|██ | 572/2774 [1:52:58<7:12:05, 11.77s/it] {'loss': 1.0537, 'learning_rate': 4.604854132028227e-06, 'epoch': 0.21} 21%|██ | 572/2774 [1:52:58<7:12:05, 11.77s/it] 21%|██ | 573/2774 [1:53:09<7:06:12, 11.62s/it] {'loss': 1.0244, 'learning_rate': 4.603277320834213e-06, 'epoch': 0.21} 21%|██ | 573/2774 [1:53:09<7:06:12, 11.62s/it] 21%|██ | 574/2774 [1:53:21<7:06:13, 11.62s/it] {'loss': 1.0391, 'learning_rate': 4.6016976408969895e-06, 'epoch': 0.21} 21%|██ | 574/2774 [1:53:21<7:06:13, 11.62s/it] 21%|██ | 575/2774 [1:53:33<7:11:26, 11.77s/it] {'loss': 0.9937, 'learning_rate': 4.600115094371144e-06, 'epoch': 0.21} 21%|██ | 575/2774 [1:53:33<7:11:26, 11.77s/it] 21%|██ | 576/2774 [1:53:44<7:05:11, 11.61s/it] {'loss': 1.0244, 'learning_rate': 4.5985296834151735e-06, 'epoch': 0.21} 21%|██ | 576/2774 [1:53:44<7:05:11, 11.61s/it] 21%|██ | 577/2774 [1:53:56<7:02:27, 11.54s/it] {'loss': 1.0186, 'learning_rate': 4.5969414101914846e-06, 'epoch': 0.21} 21%|██ | 577/2774 [1:53:56<7:02:27, 11.54s/it] 21%|██ | 
578/2774 [1:54:07<7:03:56, 11.58s/it] {'loss': 1.0601, 'learning_rate': 4.595350276866384e-06, 'epoch': 0.21} 21%|██ | 578/2774 [1:54:07<7:03:56, 11.58s/it] 21%|██ | 579/2774 [1:54:19<7:01:27, 11.52s/it] {'loss': 1.0034, 'learning_rate': 4.593756285610083e-06, 'epoch': 0.21} 21%|██ | 579/2774 [1:54:19<7:01:27, 11.52s/it] 21%|██ | 580/2774 [1:54:30<7:03:08, 11.57s/it] {'loss': 1.0093, 'learning_rate': 4.592159438596688e-06, 'epoch': 0.21} 21%|██ | 580/2774 [1:54:30<7:03:08, 11.57s/it] 21%|██ | 581/2774 [1:54:42<7:03:39, 11.59s/it] {'loss': 1.0259, 'learning_rate': 4.590559738004203e-06, 'epoch': 0.21} 21%|██ | 581/2774 [1:54:42<7:03:39, 11.59s/it] 21%|██ | 582/2774 [1:54:54<7:04:50, 11.63s/it] {'loss': 1.0225, 'learning_rate': 4.588957186014523e-06, 'epoch': 0.21} 21%|██ | 582/2774 [1:54:54<7:04:50, 11.63s/it] 21%|██ | 583/2774 [1:55:05<7:04:32, 11.63s/it] {'loss': 1.0181, 'learning_rate': 4.587351784813431e-06, 'epoch': 0.21} 21%|██ | 583/2774 [1:55:05<7:04:32, 11.63s/it] 21%|██ | 584/2774 [1:55:16<6:58:51, 11.48s/it] {'loss': 1.0469, 'learning_rate': 4.585743536590599e-06, 'epoch': 0.21} 21%|██ | 584/2774 [1:55:16<6:58:51, 11.48s/it] 21%|██ | 585/2774 [1:55:28<7:00:05, 11.51s/it] {'loss': 1.0615, 'learning_rate': 4.5841324435395785e-06, 'epoch': 0.21} 21%|██ | 585/2774 [1:55:28<7:00:05, 11.51s/it] 21%|██ | 586/2774 [1:55:40<6:59:46, 11.51s/it] {'loss': 1.0869, 'learning_rate': 4.582518507857804e-06, 'epoch': 0.21} 21%|██ | 586/2774 [1:55:40<6:59:46, 11.51s/it] 21%|██ | 587/2774 [1:55:51<6:57:22, 11.45s/it] {'loss': 0.9849, 'learning_rate': 4.580901731746587e-06, 'epoch': 0.21} 21%|██ | 587/2774 [1:55:51<6:57:22, 11.45s/it] 21%|██ | 588/2774 [1:56:04<7:11:44, 11.85s/it] {'loss': 1.0464, 'learning_rate': 4.579282117411111e-06, 'epoch': 0.21} 21%|██ | 588/2774 [1:56:04<7:11:44, 11.85s/it] 21%|██ | 589/2774 [1:56:16<7:21:08, 12.11s/it] {'loss': 1.0444, 'learning_rate': 4.577659667060432e-06, 'epoch': 0.21} 21%|██ | 589/2774 [1:56:16<7:21:08, 12.11s/it] 21%|██▏ | 
590/2774 [1:56:30<7:39:38, 12.63s/it] {'loss': 1.0205, 'learning_rate': 4.576034382907476e-06, 'epoch': 0.21} 21%|██▏ | 590/2774 [1:56:30<7:39:38, 12.63s/it] 21%|██▏ | 591/2774 [1:56:42<7:29:24, 12.35s/it] {'loss': 1.0474, 'learning_rate': 4.574406267169031e-06, 'epoch': 0.21} 21%|██▏ | 591/2774 [1:56:42<7:29:24, 12.35s/it] 21%|██▏ | 592/2774 [1:56:53<7:20:44, 12.12s/it] {'loss': 1.0615, 'learning_rate': 4.57277532206575e-06, 'epoch': 0.21} 21%|██▏ | 592/2774 [1:56:53<7:20:44, 12.12s/it] 21%|██▏ | 593/2774 [1:57:05<7:19:13, 12.08s/it] {'loss': 1.0107, 'learning_rate': 4.571141549822142e-06, 'epoch': 0.21} 21%|██▏ | 593/2774 [1:57:05<7:19:13, 12.08s/it] 21%|██▏ | 594/2774 [1:57:18<7:28:21, 12.34s/it] {'loss': 1.0454, 'learning_rate': 4.569504952666574e-06, 'epoch': 0.21} 21%|██▏ | 594/2774 [1:57:18<7:28:21, 12.34s/it] 21%|██▏ | 595/2774 [1:57:30<7:17:01, 12.03s/it] {'loss': 1.0732, 'learning_rate': 4.567865532831266e-06, 'epoch': 0.21} 21%|██▏ | 595/2774 [1:57:30<7:17:01, 12.03s/it] 21%|██▏ | 596/2774 [1:57:41<7:08:56, 11.82s/it] {'loss': 1.0122, 'learning_rate': 4.566223292552287e-06, 'epoch': 0.21} 21%|██▏ | 596/2774 [1:57:41<7:08:56, 11.82s/it] 22%|██▏ | 597/2774 [1:57:53<7:08:07, 11.80s/it] {'loss': 1.0415, 'learning_rate': 4.564578234069556e-06, 'epoch': 0.22} 22%|██▏ | 597/2774 [1:57:53<7:08:07, 11.80s/it] 22%|██▏ | 598/2774 [1:58:04<7:05:19, 11.73s/it] {'loss': 1.0215, 'learning_rate': 4.5629303596268295e-06, 'epoch': 0.22} 22%|██▏ | 598/2774 [1:58:04<7:05:19, 11.73s/it] 22%|██▏ | 599/2774 [1:58:16<7:03:57, 11.70s/it] {'loss': 1.0142, 'learning_rate': 4.561279671471711e-06, 'epoch': 0.22} 22%|██▏ | 599/2774 [1:58:16<7:03:57, 11.70s/it] 22%|██▏ | 600/2774 [1:58:28<7:02:09, 11.65s/it] {'loss': 1.043, 'learning_rate': 4.55962617185564e-06, 'epoch': 0.22} 22%|██▏ | 600/2774 [1:58:28<7:02:09, 11.65s/it] 22%|██▏ | 601/2774 [1:58:39<7:02:08, 11.66s/it] {'loss': 1.0596, 'learning_rate': 4.557969863033889e-06, 'epoch': 0.22} 22%|██▏ | 601/2774 [1:58:39<7:02:08, 
11.66s/it] 22%|██▏ | 602/2774 [1:58:51<7:02:37, 11.67s/it] {'loss': 1.0308, 'learning_rate': 4.556310747265562e-06, 'epoch': 0.22} 22%|██▏ | 602/2774 [1:58:51<7:02:37, 11.67s/it] 22%|██▏ | 603/2774 [1:59:02<7:00:08, 11.61s/it] {'loss': 1.0361, 'learning_rate': 4.554648826813595e-06, 'epoch': 0.22} 22%|██▏ | 603/2774 [1:59:02<7:00:08, 11.61s/it] 22%|██▏ | 604/2774 [1:59:14<6:59:12, 11.59s/it] {'loss': 0.9897, 'learning_rate': 4.5529841039447466e-06, 'epoch': 0.22} 22%|██▏ | 604/2774 [1:59:14<6:59:12, 11.59s/it] 22%|██▏ | 605/2774 [1:59:26<6:58:52, 11.59s/it] {'loss': 1.0264, 'learning_rate': 4.551316580929597e-06, 'epoch': 0.22} 22%|██▏ | 605/2774 [1:59:26<6:58:52, 11.59s/it] 22%|██▏ | 606/2774 [1:59:37<6:55:49, 11.51s/it] {'loss': 1.0728, 'learning_rate': 4.5496462600425474e-06, 'epoch': 0.22} 22%|██▏ | 606/2774 [1:59:37<6:55:49, 11.51s/it] 22%|██▏ | 607/2774 [1:59:49<6:58:33, 11.59s/it] {'loss': 1.0391, 'learning_rate': 4.547973143561816e-06, 'epoch': 0.22} 22%|██▏ | 607/2774 [1:59:49<6:58:33, 11.59s/it] 22%|██▏ | 608/2774 [2:00:00<6:57:38, 11.57s/it] {'loss': 1.04, 'learning_rate': 4.54629723376943e-06, 'epoch': 0.22} 22%|██▏ | 608/2774 [2:00:00<6:57:38, 11.57s/it] 22%|██▏ | 609/2774 [2:00:14<7:17:27, 12.12s/it] {'loss': 1.0068, 'learning_rate': 4.544618532951232e-06, 'epoch': 0.22} 22%|██▏ | 609/2774 [2:00:14<7:17:27, 12.12s/it] 22%|██▏ | 610/2774 [2:00:25<7:09:50, 11.92s/it] {'loss': 1.0098, 'learning_rate': 4.542937043396865e-06, 'epoch': 0.22} 22%|██▏ | 610/2774 [2:00:25<7:09:50, 11.92s/it] 22%|██▏ | 611/2774 [2:00:36<7:04:47, 11.78s/it] {'loss': 1.0566, 'learning_rate': 4.541252767399783e-06, 'epoch': 0.22} 22%|██▏ | 611/2774 [2:00:36<7:04:47, 11.78s/it] 22%|██▏ | 612/2774 [2:00:51<7:31:59, 12.54s/it] {'loss': 1.0215, 'learning_rate': 4.5395657072572345e-06, 'epoch': 0.22} 22%|██▏ | 612/2774 [2:00:51<7:31:59, 12.54s/it] 22%|██▏ | 613/2774 [2:01:03<7:25:47, 12.38s/it] {'loss': 1.0396, 'learning_rate': 4.537875865270267e-06, 'epoch': 0.22} 22%|██▏ | 613/2774 
[2:01:03<7:25:47, 12.38s/it] 22%|██▏ | 614/2774 [2:01:14<7:13:32, 12.04s/it] {'loss': 1.084, 'learning_rate': 4.536183243743726e-06, 'epoch': 0.22} 22%|██▏ | 614/2774 [2:01:14<7:13:32, 12.04s/it] 22%|██▏ | 615/2774 [2:01:26<7:17:49, 12.17s/it] {'loss': 1.0376, 'learning_rate': 4.534487844986241e-06, 'epoch': 0.22} 22%|██▏ | 615/2774 [2:01:26<7:17:49, 12.17s/it] 22%|██▏ | 616/2774 [2:01:39<7:25:51, 12.40s/it] {'loss': 0.9961, 'learning_rate': 4.532789671310236e-06, 'epoch': 0.22} 22%|██▏ | 616/2774 [2:01:39<7:25:51, 12.40s/it] 22%|██▏ | 617/2774 [2:01:51<7:12:04, 12.02s/it] {'loss': 1.0186, 'learning_rate': 4.531088725031917e-06, 'epoch': 0.22} 22%|██▏ | 617/2774 [2:01:51<7:12:04, 12.02s/it] 22%|██▏ | 618/2774 [2:02:02<7:09:40, 11.96s/it] {'loss': 1.0024, 'learning_rate': 4.529385008471272e-06, 'epoch': 0.22} 22%|██▏ | 618/2774 [2:02:02<7:09:40, 11.96s/it] 22%|██▏ | 619/2774 [2:02:14<7:06:49, 11.88s/it] {'loss': 1.0552, 'learning_rate': 4.527678523952067e-06, 'epoch': 0.22} 22%|██▏ | 619/2774 [2:02:14<7:06:49, 11.88s/it] 22%|██▏ | 620/2774 [2:02:26<7:04:53, 11.84s/it] {'loss': 1.0347, 'learning_rate': 4.525969273801845e-06, 'epoch': 0.22} 22%|██▏ | 620/2774 [2:02:26<7:04:53, 11.84s/it] 22%|██▏ | 621/2774 [2:02:38<7:08:16, 11.94s/it] {'loss': 1.0371, 'learning_rate': 4.524257260351917e-06, 'epoch': 0.22} 22%|██▏ | 621/2774 [2:02:38<7:08:16, 11.94s/it] 22%|██▏ | 622/2774 [2:02:49<7:03:21, 11.80s/it] {'loss': 1.0161, 'learning_rate': 4.522542485937369e-06, 'epoch': 0.22} 22%|██▏ | 622/2774 [2:02:49<7:03:21, 11.80s/it] 22%|██▏ | 623/2774 [2:03:01<6:56:47, 11.63s/it] {'loss': 1.0449, 'learning_rate': 4.520824952897048e-06, 'epoch': 0.22} 22%|██▏ | 623/2774 [2:03:01<6:56:47, 11.63s/it] 22%|██▏ | 624/2774 [2:03:12<6:53:34, 11.54s/it] {'loss': 1.085, 'learning_rate': 4.519104663573567e-06, 'epoch': 0.22} 22%|██▏ | 624/2774 [2:03:12<6:53:34, 11.54s/it] 23%|██▎ | 625/2774 [2:03:25<7:10:10, 12.01s/it] {'loss': 1.0317, 'learning_rate': 4.517381620313295e-06, 'epoch': 0.23} 
23%|██▎ | 625/2774 [2:03:25<7:10:10, 12.01s/it] 23%|██▎ | 626/2774 [2:03:36<7:01:49, 11.78s/it] {'loss': 1.0098, 'learning_rate': 4.515655825466359e-06, 'epoch': 0.23} 23%|██▎ | 626/2774 [2:03:36<7:01:49, 11.78s/it] 23%|██▎ | 627/2774 [2:03:48<6:59:19, 11.72s/it] {'loss': 1.0015, 'learning_rate': 4.5139272813866395e-06, 'epoch': 0.23} 23%|██▎ | 627/2774 [2:03:48<6:59:19, 11.72s/it] 23%|██▎ | 628/2774 [2:04:00<6:58:11, 11.69s/it] {'loss': 1.0483, 'learning_rate': 4.512195990431767e-06, 'epoch': 0.23} 23%|██▎ | 628/2774 [2:04:00<6:58:11, 11.69s/it] 23%|██▎ | 629/2774 [2:04:12<7:00:37, 11.77s/it] {'loss': 1.0479, 'learning_rate': 4.510461954963116e-06, 'epoch': 0.23} 23%|██▎ | 629/2774 [2:04:12<7:00:37, 11.77s/it] 23%|██▎ | 630/2774 [2:04:23<6:59:02, 11.73s/it] {'loss': 1.0059, 'learning_rate': 4.508725177345809e-06, 'epoch': 0.23} 23%|██▎ | 630/2774 [2:04:23<6:59:02, 11.73s/it] 23%|██▎ | 631/2774 [2:04:35<6:56:12, 11.65s/it] {'loss': 1.0029, 'learning_rate': 4.5069856599487014e-06, 'epoch': 0.23} 23%|██▎ | 631/2774 [2:04:35<6:56:12, 11.65s/it] 23%|██▎ | 632/2774 [2:04:46<6:52:48, 11.56s/it] {'loss': 1.0659, 'learning_rate': 4.505243405144394e-06, 'epoch': 0.23} 23%|██▎ | 632/2774 [2:04:46<6:52:48, 11.56s/it] 23%|██▎ | 633/2774 [2:04:58<6:54:22, 11.61s/it] {'loss': 1.0967, 'learning_rate': 4.5034984153092145e-06, 'epoch': 0.23} 23%|██▎ | 633/2774 [2:04:58<6:54:22, 11.61s/it] 23%|██▎ | 634/2774 [2:05:11<7:10:46, 12.08s/it] {'loss': 0.999, 'learning_rate': 4.501750692823225e-06, 'epoch': 0.23} 23%|██▎ | 634/2774 [2:05:11<7:10:46, 12.08s/it] 23%|██▎ | 635/2774 [2:05:23<7:06:00, 11.95s/it] {'loss': 1.0054, 'learning_rate': 4.500000240070212e-06, 'epoch': 0.23} 23%|██▎ | 635/2774 [2:05:23<7:06:00, 11.95s/it] 23%|██▎ | 636/2774 [2:05:35<7:11:20, 12.11s/it] {'loss': 1.0859, 'learning_rate': 4.498247059437689e-06, 'epoch': 0.23} 23%|██▎ | 636/2774 [2:05:35<7:11:20, 12.11s/it] 23%|██▎ | 637/2774 [2:05:46<7:03:22, 11.89s/it] {'loss': 1.0498, 'learning_rate': 
4.496491153316887e-06, 'epoch': 0.23} 23%|██▎ | 637/2774 [2:05:46<7:03:22, 11.89s/it] 23%|██▎ | 638/2774 [2:05:58<7:00:55, 11.82s/it] {'loss': 1.0977, 'learning_rate': 4.494732524102757e-06, 'epoch': 0.23} 23%|██▎ | 638/2774 [2:05:58<7:00:55, 11.82s/it] 23%|██▎ | 639/2774 [2:06:10<7:03:16, 11.90s/it] {'loss': 1.022, 'learning_rate': 4.492971174193963e-06, 'epoch': 0.23} 23%|██▎ | 639/2774 [2:06:10<7:03:16, 11.90s/it] 23%|██▎ | 640/2774 [2:06:21<6:57:31, 11.74s/it] {'loss': 1.0728, 'learning_rate': 4.4912071059928794e-06, 'epoch': 0.23} 23%|██▎ | 640/2774 [2:06:21<6:57:31, 11.74s/it] 23%|██▎ | 641/2774 [2:06:33<6:52:41, 11.61s/it] {'loss': 1.0312, 'learning_rate': 4.489440321905588e-06, 'epoch': 0.23} 23%|██▎ | 641/2774 [2:06:33<6:52:41, 11.61s/it] 23%|██▎ | 642/2774 [2:06:44<6:47:56, 11.48s/it] {'loss': 1.04, 'learning_rate': 4.487670824341877e-06, 'epoch': 0.23} 23%|██▎ | 642/2774 [2:06:44<6:47:56, 11.48s/it] 23%|██▎ | 643/2774 [2:06:55<6:45:41, 11.42s/it] {'loss': 1.0278, 'learning_rate': 4.485898615715233e-06, 'epoch': 0.23} 23%|██▎ | 643/2774 [2:06:55<6:45:41, 11.42s/it] 23%|██▎ | 644/2774 [2:07:07<6:45:05, 11.41s/it] {'loss': 1.083, 'learning_rate': 4.4841236984428426e-06, 'epoch': 0.23} 23%|██▎ | 644/2774 [2:07:07<6:45:05, 11.41s/it] 23%|██▎ | 645/2774 [2:07:18<6:44:05, 11.39s/it] {'loss': 1.0161, 'learning_rate': 4.482346074945585e-06, 'epoch': 0.23} 23%|██▎ | 645/2774 [2:07:18<6:44:05, 11.39s/it] 23%|██▎ | 646/2774 [2:07:29<6:42:57, 11.36s/it] {'loss': 1.0615, 'learning_rate': 4.4805657476480305e-06, 'epoch': 0.23} 23%|██▎ | 646/2774 [2:07:29<6:42:57, 11.36s/it] 23%|██▎ | 647/2774 [2:07:41<6:47:26, 11.49s/it] {'loss': 1.0811, 'learning_rate': 4.4787827189784395e-06, 'epoch': 0.23} 23%|██▎ | 647/2774 [2:07:41<6:47:26, 11.49s/it] 23%|██▎ | 648/2774 [2:07:53<6:50:10, 11.58s/it] {'loss': 1.0283, 'learning_rate': 4.476996991368755e-06, 'epoch': 0.23} 23%|██▎ | 648/2774 [2:07:53<6:50:10, 11.58s/it] 23%|██▎ | 649/2774 [2:08:06<7:02:45, 11.94s/it] {'loss': 1.0137, 
'learning_rate': 4.4752085672546005e-06, 'epoch': 0.23} 23%|██▎ | 649/2774 [2:08:06<7:02:45, 11.94s/it] 23%|██▎ | 650/2774 [2:08:17<6:55:43, 11.74s/it] {'loss': 1.0586, 'learning_rate': 4.47341744907528e-06, 'epoch': 0.23} 23%|██▎ | 650/2774 [2:08:17<6:55:43, 11.74s/it] 23%|██▎ | 651/2774 [2:08:29<6:56:12, 11.76s/it] {'loss': 1.0518, 'learning_rate': 4.47162363927377e-06, 'epoch': 0.23} 23%|██▎ | 651/2774 [2:08:29<6:56:12, 11.76s/it] 24%|██▎ | 652/2774 [2:08:42<7:08:37, 12.12s/it] {'loss': 1.0103, 'learning_rate': 4.469827140296719e-06, 'epoch': 0.24} 24%|██▎ | 652/2774 [2:08:42<7:08:37, 12.12s/it] 24%|██▎ | 653/2774 [2:08:53<7:00:43, 11.90s/it] {'loss': 1.0269, 'learning_rate': 4.468027954594442e-06, 'epoch': 0.24} 24%|██▎ | 653/2774 [2:08:53<7:00:43, 11.90s/it] 24%|██▎ | 654/2774 [2:09:05<6:57:21, 11.81s/it] {'loss': 0.998, 'learning_rate': 4.466226084620919e-06, 'epoch': 0.24} 24%|██▎ | 654/2774 [2:09:05<6:57:21, 11.81s/it] 24%|██▎ | 655/2774 [2:09:16<6:50:28, 11.62s/it] {'loss': 1.02, 'learning_rate': 4.464421532833794e-06, 'epoch': 0.24} 24%|██▎ | 655/2774 [2:09:16<6:50:28, 11.62s/it] 24%|██▎ | 656/2774 [2:09:27<6:49:11, 11.59s/it] {'loss': 1.0112, 'learning_rate': 4.462614301694367e-06, 'epoch': 0.24} 24%|██▎ | 656/2774 [2:09:27<6:49:11, 11.59s/it] 24%|██▎ | 657/2774 [2:09:39<6:48:59, 11.59s/it] {'loss': 1.0928, 'learning_rate': 4.460804393667589e-06, 'epoch': 0.24} 24%|██▎ | 657/2774 [2:09:39<6:48:59, 11.59s/it] 24%|██▎ | 658/2774 [2:09:51<6:50:05, 11.63s/it] {'loss': 1.0337, 'learning_rate': 4.458991811222067e-06, 'epoch': 0.24} 24%|██▎ | 658/2774 [2:09:51<6:50:05, 11.63s/it] 24%|██▍ | 659/2774 [2:10:02<6:46:45, 11.54s/it] {'loss': 1.04, 'learning_rate': 4.457176556830054e-06, 'epoch': 0.24} 24%|██▍ | 659/2774 [2:10:02<6:46:45, 11.54s/it] 24%|██▍ | 660/2774 [2:10:13<6:44:41, 11.49s/it] {'loss': 1.063, 'learning_rate': 4.4553586329674484e-06, 'epoch': 0.24} 24%|██▍ | 660/2774 [2:10:13<6:44:41, 11.49s/it] 24%|██▍ | 661/2774 [2:10:25<6:44:07, 11.48s/it] 
{'loss': 1.0332, 'learning_rate': 4.4535380421137865e-06, 'epoch': 0.24} 24%|██▍ | 661/2774 [2:10:25<6:44:07, 11.48s/it] 24%|██▍ | 662/2774 [2:10:37<6:46:25, 11.55s/it] {'loss': 1.0156, 'learning_rate': 4.451714786752245e-06, 'epoch': 0.24} 24%|██▍ | 662/2774 [2:10:37<6:46:25, 11.55s/it] 24%|██▍ | 663/2774 [2:10:48<6:44:46, 11.50s/it] {'loss': 1.0312, 'learning_rate': 4.449888869369634e-06, 'epoch': 0.24} 24%|██▍ | 663/2774 [2:10:48<6:44:46, 11.50s/it] 24%|██▍ | 664/2774 [2:10:59<6:43:30, 11.47s/it] {'loss': 1.083, 'learning_rate': 4.448060292456395e-06, 'epoch': 0.24} 24%|██▍ | 664/2774 [2:10:59<6:43:30, 11.47s/it] 24%|██▍ | 665/2774 [2:11:11<6:41:01, 11.41s/it] {'loss': 1.0181, 'learning_rate': 4.446229058506596e-06, 'epoch': 0.24} 24%|██▍ | 665/2774 [2:11:11<6:41:01, 11.41s/it] 24%|██▍ | 666/2774 [2:11:22<6:43:55, 11.50s/it] {'loss': 1.0503, 'learning_rate': 4.44439517001793e-06, 'epoch': 0.24} 24%|██▍ | 666/2774 [2:11:22<6:43:55, 11.50s/it] 24%|██▍ | 667/2774 [2:11:36<7:02:27, 12.03s/it] {'loss': 0.9688, 'learning_rate': 4.44255862949171e-06, 'epoch': 0.24} 24%|██▍ | 667/2774 [2:11:36<7:02:27, 12.03s/it] 24%|██▍ | 668/2774 [2:11:47<6:57:46, 11.90s/it] {'loss': 1.0264, 'learning_rate': 4.440719439432866e-06, 'epoch': 0.24} 24%|██▍ | 668/2774 [2:11:47<6:57:46, 11.90s/it] 24%|██▍ | 669/2774 [2:11:58<6:49:59, 11.69s/it] {'loss': 1.0029, 'learning_rate': 4.438877602349941e-06, 'epoch': 0.24} 24%|██▍ | 669/2774 [2:11:58<6:49:59, 11.69s/it] 24%|██▍ | 670/2774 [2:12:12<7:07:08, 12.18s/it] {'loss': 1.0894, 'learning_rate': 4.437033120755092e-06, 'epoch': 0.24} 24%|██▍ | 670/2774 [2:12:12<7:07:08, 12.18s/it] 24%|██▍ | 671/2774 [2:12:23<6:59:52, 11.98s/it] {'loss': 1.0142, 'learning_rate': 4.435185997164079e-06, 'epoch': 0.24} 24%|██▍ | 671/2774 [2:12:23<6:59:52, 11.98s/it] 24%|██▍ | 672/2774 [2:12:35<6:54:24, 11.83s/it] {'loss': 1.0151, 'learning_rate': 4.433336234096267e-06, 'epoch': 0.24} 24%|██▍ | 672/2774 [2:12:35<6:54:24, 11.83s/it] 24%|██▍ | 673/2774 
[2:12:46<6:51:31, 11.75s/it] {'loss': 1.0249, 'learning_rate': 4.431483834074621e-06, 'epoch': 0.24} 24%|██▍ | 673/2774 [2:12:46<6:51:31, 11.75s/it] 24%|██▍ | 674/2774 [2:13:00<7:07:31, 12.22s/it] {'loss': 0.9697, 'learning_rate': 4.429628799625704e-06, 'epoch': 0.24} 24%|██▍ | 674/2774 [2:13:00<7:07:31, 12.22s/it] 24%|██▍ | 675/2774 [2:13:11<6:57:36, 11.94s/it] {'loss': 1.0215, 'learning_rate': 4.4277711332796695e-06, 'epoch': 0.24} 24%|██▍ | 675/2774 [2:13:11<6:57:36, 11.94s/it] 24%|██▍ | 676/2774 [2:13:23<6:56:53, 11.92s/it] {'loss': 0.9863, 'learning_rate': 4.425910837570263e-06, 'epoch': 0.24} 24%|██▍ | 676/2774 [2:13:23<6:56:53, 11.92s/it] 24%|██▍ | 677/2774 [2:13:34<6:51:33, 11.78s/it] {'loss': 1.0327, 'learning_rate': 4.4240479150348145e-06, 'epoch': 0.24} 24%|██▍ | 677/2774 [2:13:34<6:51:33, 11.78s/it] 24%|██▍ | 678/2774 [2:13:46<6:49:44, 11.73s/it] {'loss': 1.04, 'learning_rate': 4.4221823682142385e-06, 'epoch': 0.24} 24%|██▍ | 678/2774 [2:13:46<6:49:44, 11.73s/it] 24%|██▍ | 679/2774 [2:13:57<6:43:29, 11.56s/it] {'loss': 1.0503, 'learning_rate': 4.420314199653028e-06, 'epoch': 0.24} 24%|██▍ | 679/2774 [2:13:57<6:43:29, 11.56s/it] 25%|██▍ | 680/2774 [2:14:08<6:40:39, 11.48s/it] {'loss': 1.0229, 'learning_rate': 4.4184434118992525e-06, 'epoch': 0.25} 25%|██▍ | 680/2774 [2:14:08<6:40:39, 11.48s/it] 25%|██▍ | 681/2774 [2:14:20<6:41:52, 11.52s/it] {'loss': 0.9966, 'learning_rate': 4.4165700075045525e-06, 'epoch': 0.25} 25%|██▍ | 681/2774 [2:14:20<6:41:52, 11.52s/it] 25%|██▍ | 682/2774 [2:14:32<6:44:27, 11.60s/it] {'loss': 1.0381, 'learning_rate': 4.41469398902414e-06, 'epoch': 0.25} 25%|██▍ | 682/2774 [2:14:32<6:44:27, 11.60s/it] 25%|██▍ | 683/2774 [2:14:43<6:39:45, 11.47s/it] {'loss': 1.0498, 'learning_rate': 4.412815359016789e-06, 'epoch': 0.25} 25%|██▍ | 683/2774 [2:14:43<6:39:45, 11.47s/it] 25%|██▍ | 684/2774 [2:14:54<6:37:44, 11.42s/it] {'loss': 1.0483, 'learning_rate': 4.410934120044838e-06, 'epoch': 0.25} 25%|██▍ | 684/2774 [2:14:54<6:37:44, 11.42s/it] 
25%|██▍ | 685/2774 [2:15:06<6:37:22, 11.41s/it] {'loss': 1.0933, 'learning_rate': 4.4090502746741845e-06, 'epoch': 0.25} 25%|██▍ | 685/2774 [2:15:06<6:37:22, 11.41s/it] 25%|██▍ | 686/2774 [2:15:17<6:36:33, 11.40s/it] {'loss': 0.9897, 'learning_rate': 4.4071638254742795e-06, 'epoch': 0.25} 25%|██▍ | 686/2774 [2:15:17<6:36:33, 11.40s/it] 25%|██▍ | 687/2774 [2:15:28<6:35:39, 11.37s/it] {'loss': 1.0376, 'learning_rate': 4.4052747750181245e-06, 'epoch': 0.25} 25%|██▍ | 687/2774 [2:15:28<6:35:39, 11.37s/it] 25%|██▍ | 688/2774 [2:15:40<6:39:03, 11.48s/it] {'loss': 1.0547, 'learning_rate': 4.40338312588227e-06, 'epoch': 0.25} 25%|██▍ | 688/2774 [2:15:40<6:39:03, 11.48s/it] 25%|██▍ | 689/2774 [2:15:52<6:40:13, 11.52s/it] {'loss': 1.0352, 'learning_rate': 4.401488880646813e-06, 'epoch': 0.25} 25%|██▍ | 689/2774 [2:15:52<6:40:13, 11.52s/it] 25%|██▍ | 690/2774 [2:16:03<6:41:23, 11.56s/it] {'loss': 1.0547, 'learning_rate': 4.399592041895389e-06, 'epoch': 0.25} 25%|██▍ | 690/2774 [2:16:03<6:41:23, 11.56s/it] 25%|██▍ | 691/2774 [2:16:14<6:37:48, 11.46s/it] {'loss': 1.0288, 'learning_rate': 4.397692612215169e-06, 'epoch': 0.25} 25%|██▍ | 691/2774 [2:16:14<6:37:48, 11.46s/it] 25%|██▍ | 692/2774 [2:16:26<6:36:45, 11.43s/it] {'loss': 1.0493, 'learning_rate': 4.395790594196864e-06, 'epoch': 0.25} 25%|██▍ | 692/2774 [2:16:26<6:36:45, 11.43s/it] 25%|██▍ | 693/2774 [2:16:37<6:34:02, 11.36s/it] {'loss': 1.0122, 'learning_rate': 4.39388599043471e-06, 'epoch': 0.25} 25%|██▍ | 693/2774 [2:16:37<6:34:02, 11.36s/it] 25%|██▌ | 694/2774 [2:16:50<6:46:19, 11.72s/it] {'loss': 1.0723, 'learning_rate': 4.391978803526471e-06, 'epoch': 0.25} 25%|██▌ | 694/2774 [2:16:50<6:46:19, 11.72s/it] 25%|██▌ | 695/2774 [2:17:02<6:50:19, 11.84s/it] {'loss': 1.0811, 'learning_rate': 4.390069036073436e-06, 'epoch': 0.25} 25%|██▌ | 695/2774 [2:17:02<6:50:19, 11.84s/it] 25%|██▌ | 696/2774 [2:17:14<6:52:14, 11.90s/it] {'loss': 1.0112, 'learning_rate': 4.3881566906804105e-06, 'epoch': 0.25} 25%|██▌ | 696/2774 
[2:17:14<6:52:14, 11.90s/it] 25%|██▌ | 697/2774 [2:17:25<6:50:00, 11.84s/it] {'loss': 1.0562, 'learning_rate': 4.386241769955721e-06, 'epoch': 0.25} 25%|██▌ | 697/2774 [2:17:25<6:50:00, 11.84s/it] 25%|██▌ | 698/2774 [2:17:37<6:43:26, 11.66s/it] {'loss': 1.084, 'learning_rate': 4.3843242765112006e-06, 'epoch': 0.25} 25%|██▌ | 698/2774 [2:17:37<6:43:26, 11.66s/it] 25%|██▌ | 699/2774 [2:17:49<6:54:59, 12.00s/it] {'loss': 0.9971, 'learning_rate': 4.382404212962196e-06, 'epoch': 0.25} 25%|██▌ | 699/2774 [2:17:49<6:54:59, 12.00s/it] 25%|██▌ | 700/2774 [2:18:01<6:48:56, 11.83s/it] {'loss': 0.9771, 'learning_rate': 4.3804815819275585e-06, 'epoch': 0.25} 25%|██▌ | 700/2774 [2:18:01<6:48:56, 11.83s/it] 25%|██▌ | 701/2774 [2:18:12<6:44:53, 11.72s/it] {'loss': 1.0542, 'learning_rate': 4.378556386029638e-06, 'epoch': 0.25} 25%|██▌ | 701/2774 [2:18:12<6:44:53, 11.72s/it] 25%|██▌ | 702/2774 [2:18:23<6:38:47, 11.55s/it] {'loss': 1.0195, 'learning_rate': 4.37662862789429e-06, 'epoch': 0.25} 25%|██▌ | 702/2774 [2:18:23<6:38:47, 11.55s/it] 25%|██▌ | 703/2774 [2:18:35<6:36:09, 11.48s/it] {'loss': 1.0898, 'learning_rate': 4.374698310150856e-06, 'epoch': 0.25} 25%|██▌ | 703/2774 [2:18:35<6:36:09, 11.48s/it] 25%|██▌ | 704/2774 [2:18:46<6:38:18, 11.55s/it] {'loss': 1.0264, 'learning_rate': 4.372765435432176e-06, 'epoch': 0.25} 25%|██▌ | 704/2774 [2:18:46<6:38:18, 11.55s/it] 25%|██▌ | 705/2774 [2:18:59<6:48:57, 11.86s/it] {'loss': 1.0366, 'learning_rate': 4.370830006374571e-06, 'epoch': 0.25} 25%|██▌ | 705/2774 [2:18:59<6:48:57, 11.86s/it] 25%|██▌ | 706/2774 [2:19:12<7:02:01, 12.24s/it] {'loss': 1.0386, 'learning_rate': 4.368892025617852e-06, 'epoch': 0.25} 25%|██▌ | 706/2774 [2:19:12<7:02:01, 12.24s/it] 25%|██▌ | 707/2774 [2:19:24<6:53:58, 12.02s/it] {'loss': 1.0352, 'learning_rate': 4.366951495805306e-06, 'epoch': 0.25} 25%|██▌ | 707/2774 [2:19:24<6:53:58, 12.02s/it] 26%|██▌ | 708/2774 [2:19:35<6:48:18, 11.86s/it] {'loss': 1.0571, 'learning_rate': 4.3650084195837e-06, 'epoch': 0.26} 
26%|██▌ | 708/2774 [2:19:35<6:48:18, 11.86s/it] 26%|██▌ | 709/2774 [2:19:47<6:48:50, 11.88s/it] {'loss': 1.0244, 'learning_rate': 4.363062799603271e-06, 'epoch': 0.26} 26%|██▌ | 709/2774 [2:19:47<6:48:50, 11.88s/it] 26%|██▌ | 710/2774 [2:19:59<6:43:51, 11.74s/it] {'loss': 1.0942, 'learning_rate': 4.361114638517728e-06, 'epoch': 0.26} 26%|██▌ | 710/2774 [2:19:59<6:43:51, 11.74s/it] 26%|██▌ | 711/2774 [2:20:10<6:39:28, 11.62s/it] {'loss': 1.0469, 'learning_rate': 4.359163938984245e-06, 'epoch': 0.26} 26%|██▌ | 711/2774 [2:20:10<6:39:28, 11.62s/it] 26%|██▌ | 712/2774 [2:20:21<6:38:23, 11.59s/it] {'loss': 1.0942, 'learning_rate': 4.357210703663458e-06, 'epoch': 0.26} 26%|██▌ | 712/2774 [2:20:21<6:38:23, 11.59s/it] 26%|██▌ | 713/2774 [2:20:33<6:36:14, 11.54s/it] {'loss': 1.0527, 'learning_rate': 4.355254935219462e-06, 'epoch': 0.26} 26%|██▌ | 713/2774 [2:20:33<6:36:14, 11.54s/it] 26%|██▌ | 714/2774 [2:20:49<7:22:05, 12.88s/it] {'loss': 1.0215, 'learning_rate': 4.353296636319808e-06, 'epoch': 0.26} 26%|██▌ | 714/2774 [2:20:49<7:22:05, 12.88s/it] 26%|██▌ | 715/2774 [2:21:00<7:05:03, 12.39s/it] {'loss': 1.0835, 'learning_rate': 4.3513358096354966e-06, 'epoch': 0.26} 26%|██▌ | 715/2774 [2:21:00<7:05:03, 12.39s/it] 26%|██▌ | 716/2774 [2:21:11<6:54:57, 12.10s/it] {'loss': 1.0234, 'learning_rate': 4.3493724578409756e-06, 'epoch': 0.26} 26%|██▌ | 716/2774 [2:21:11<6:54:57, 12.10s/it] 26%|██▌ | 717/2774 [2:21:23<6:47:22, 11.88s/it] {'loss': 1.0161, 'learning_rate': 4.347406583614141e-06, 'epoch': 0.26} 26%|██▌ | 717/2774 [2:21:23<6:47:22, 11.88s/it] 26%|██▌ | 718/2774 [2:21:35<6:45:29, 11.83s/it] {'loss': 1.04, 'learning_rate': 4.3454381896363245e-06, 'epoch': 0.26} 26%|██▌ | 718/2774 [2:21:35<6:45:29, 11.83s/it] 26%|██▌ | 719/2774 [2:21:46<6:45:16, 11.83s/it] {'loss': 1.0562, 'learning_rate': 4.343467278592297e-06, 'epoch': 0.26} 26%|██▌ | 719/2774 [2:21:46<6:45:16, 11.83s/it] 26%|██▌ | 720/2774 [2:21:58<6:40:57, 11.71s/it] {'loss': 1.0371, 'learning_rate': 
4.341493853170263e-06, 'epoch': 0.26} 26%|██▌ | 720/2774 [2:21:58<6:40:57, 11.71s/it] 26%|██▌ | 721/2774 [2:22:09<6:37:57, 11.63s/it] {'loss': 1.0205, 'learning_rate': 4.3395179160618545e-06, 'epoch': 0.26} 26%|██▌ | 721/2774 [2:22:09<6:37:57, 11.63s/it] 26%|██▌ | 722/2774 [2:22:21<6:35:44, 11.57s/it] {'loss': 1.0317, 'learning_rate': 4.337539469962131e-06, 'epoch': 0.26} 26%|██▌ | 722/2774 [2:22:21<6:35:44, 11.57s/it] 26%|██▌ | 723/2774 [2:22:32<6:34:17, 11.53s/it] {'loss': 1.022, 'learning_rate': 4.335558517569573e-06, 'epoch': 0.26} 26%|██▌ | 723/2774 [2:22:32<6:34:17, 11.53s/it] 26%|██▌ | 724/2774 [2:22:45<6:46:52, 11.91s/it] {'loss': 1.0137, 'learning_rate': 4.333575061586079e-06, 'epoch': 0.26} 26%|██▌ | 724/2774 [2:22:45<6:46:52, 11.91s/it] 26%|██▌ | 725/2774 [2:22:56<6:41:42, 11.76s/it] {'loss': 1.1064, 'learning_rate': 4.331589104716965e-06, 'epoch': 0.26} 26%|██▌ | 725/2774 [2:22:56<6:41:42, 11.76s/it] 26%|██▌ | 726/2774 [2:23:08<6:40:08, 11.72s/it] {'loss': 1.0703, 'learning_rate': 4.329600649670955e-06, 'epoch': 0.26} 26%|██▌ | 726/2774 [2:23:08<6:40:08, 11.72s/it] 26%|██▌ | 727/2774 [2:23:19<6:37:14, 11.64s/it] {'loss': 1.083, 'learning_rate': 4.327609699160183e-06, 'epoch': 0.26} 26%|██▌ | 727/2774 [2:23:19<6:37:14, 11.64s/it] 26%|██▌ | 728/2774 [2:23:31<6:32:48, 11.52s/it] {'loss': 1.042, 'learning_rate': 4.325616255900183e-06, 'epoch': 0.26} 26%|██▌ | 728/2774 [2:23:31<6:32:48, 11.52s/it] 26%|██▋ | 729/2774 [2:23:44<6:49:19, 12.01s/it] {'loss': 0.9995, 'learning_rate': 4.323620322609894e-06, 'epoch': 0.26} 26%|██▋ | 729/2774 [2:23:44<6:49:19, 12.01s/it] 26%|██▋ | 730/2774 [2:23:55<6:42:33, 11.82s/it] {'loss': 1.0562, 'learning_rate': 4.321621902011645e-06, 'epoch': 0.26} 26%|██▋ | 730/2774 [2:23:55<6:42:33, 11.82s/it] 26%|██▋ | 731/2774 [2:24:06<6:36:56, 11.66s/it] {'loss': 0.9985, 'learning_rate': 4.319620996831164e-06, 'epoch': 0.26} 26%|██▋ | 731/2774 [2:24:06<6:36:56, 11.66s/it] 26%|██▋ | 732/2774 [2:24:20<6:51:27, 12.09s/it] {'loss': 0.9839, 
'learning_rate': 4.3176176097975635e-06, 'epoch': 0.26} 26%|██▋ | 732/2774 [2:24:20<6:51:27, 12.09s/it] 26%|██▋ | 733/2774 [2:24:31<6:41:32, 11.80s/it] {'loss': 1.0054, 'learning_rate': 4.315611743643342e-06, 'epoch': 0.26} 26%|██▋ | 733/2774 [2:24:31<6:41:32, 11.80s/it] 26%|██▋ | 734/2774 [2:24:42<6:36:28, 11.66s/it] {'loss': 1.0654, 'learning_rate': 4.31360340110438e-06, 'epoch': 0.26} 26%|██▋ | 734/2774 [2:24:42<6:36:28, 11.66s/it] 26%|██▋ | 735/2774 [2:24:53<6:33:54, 11.59s/it] {'loss': 1.0303, 'learning_rate': 4.311592584919936e-06, 'epoch': 0.26} 26%|██▋ | 735/2774 [2:24:53<6:33:54, 11.59s/it] 27%|██▋ | 736/2774 [2:25:05<6:31:01, 11.51s/it] {'loss': 1.0601, 'learning_rate': 4.309579297832642e-06, 'epoch': 0.27} 27%|██▋ | 736/2774 [2:25:05<6:31:01, 11.51s/it] 27%|██▋ | 737/2774 [2:25:16<6:28:49, 11.45s/it] {'loss': 1.0293, 'learning_rate': 4.307563542588498e-06, 'epoch': 0.27} 27%|██▋ | 737/2774 [2:25:16<6:28:49, 11.45s/it] 27%|██▋ | 738/2774 [2:25:27<6:26:54, 11.40s/it] {'loss': 1.0166, 'learning_rate': 4.305545321936875e-06, 'epoch': 0.27} 27%|██▋ | 738/2774 [2:25:27<6:26:54, 11.40s/it] 27%|██▋ | 739/2774 [2:25:39<6:26:48, 11.40s/it] {'loss': 0.9912, 'learning_rate': 4.303524638630503e-06, 'epoch': 0.27} 27%|██▋ | 739/2774 [2:25:39<6:26:48, 11.40s/it] 27%|██▋ | 740/2774 [2:25:51<6:39:09, 11.77s/it] {'loss': 1.0156, 'learning_rate': 4.301501495425472e-06, 'epoch': 0.27} 27%|██▋ | 740/2774 [2:25:51<6:39:09, 11.77s/it] 27%|██▋ | 741/2774 [2:26:03<6:39:12, 11.78s/it] {'loss': 1.0273, 'learning_rate': 4.299475895081226e-06, 'epoch': 0.27} 27%|██▋ | 741/2774 [2:26:03<6:39:12, 11.78s/it] 27%|██▋ | 742/2774 [2:26:16<6:51:11, 12.14s/it] {'loss': 1.0278, 'learning_rate': 4.297447840360562e-06, 'epoch': 0.27} 27%|██▋ | 742/2774 [2:26:16<6:51:11, 12.14s/it] 27%|██▋ | 743/2774 [2:26:29<6:59:54, 12.40s/it] {'loss': 0.9575, 'learning_rate': 4.295417334029626e-06, 'epoch': 0.27} 27%|██▋ | 743/2774 [2:26:29<6:59:54, 12.40s/it] 27%|██▋ | 744/2774 [2:26:43<7:11:17, 12.75s/it] 
{'loss': 0.9702, 'learning_rate': 4.293384378857903e-06, 'epoch': 0.27} 27%|██▋ | 744/2774 [2:26:43<7:11:17, 12.75s/it] 27%|██▋ | 745/2774 [2:26:55<7:06:41, 12.62s/it] {'loss': 1.0752, 'learning_rate': 4.291348977618224e-06, 'epoch': 0.27} 27%|██▋ | 745/2774 [2:26:55<7:06:41, 12.62s/it] 27%|██▋ | 746/2774 [2:27:07<6:54:02, 12.25s/it] {'loss': 1.1016, 'learning_rate': 4.289311133086751e-06, 'epoch': 0.27} 27%|██▋ | 746/2774 [2:27:07<6:54:02, 12.25s/it] 27%|██▋ | 747/2774 [2:27:18<6:42:49, 11.92s/it] {'loss': 1.0044, 'learning_rate': 4.287270848042982e-06, 'epoch': 0.27} 27%|██▋ | 747/2774 [2:27:18<6:42:49, 11.92s/it] 27%|██▋ | 748/2774 [2:27:31<6:57:27, 12.36s/it] {'loss': 0.9775, 'learning_rate': 4.285228125269742e-06, 'epoch': 0.27} 27%|██▋ | 748/2774 [2:27:31<6:57:27, 12.36s/it] 27%|██▋ | 749/2774 [2:27:43<6:50:23, 12.16s/it] {'loss': 1.0493, 'learning_rate': 4.283182967553183e-06, 'epoch': 0.27} 27%|██▋ | 749/2774 [2:27:43<6:50:23, 12.16s/it] 27%|██▋ | 750/2774 [2:27:54<6:43:54, 11.97s/it] {'loss': 1.02, 'learning_rate': 4.281135377682775e-06, 'epoch': 0.27} 27%|██▋ | 750/2774 [2:27:54<6:43:54, 11.97s/it] 27%|██▋ | 751/2774 [2:28:06<6:36:54, 11.77s/it] {'loss': 1.084, 'learning_rate': 4.279085358451307e-06, 'epoch': 0.27} 27%|██▋ | 751/2774 [2:28:06<6:36:54, 11.77s/it] 27%|██▋ | 752/2774 [2:28:17<6:34:21, 11.70s/it] {'loss': 1.041, 'learning_rate': 4.277032912654881e-06, 'epoch': 0.27} 27%|██▋ | 752/2774 [2:28:17<6:34:21, 11.70s/it] 27%|██▋ | 753/2774 [2:28:28<6:30:02, 11.58s/it] {'loss': 1.0649, 'learning_rate': 4.27497804309291e-06, 'epoch': 0.27} 27%|██▋ | 753/2774 [2:28:28<6:30:02, 11.58s/it] 27%|██▋ | 754/2774 [2:28:40<6:27:11, 11.50s/it] {'loss': 1.043, 'learning_rate': 4.272920752568112e-06, 'epoch': 0.27} 27%|██▋ | 754/2774 [2:28:40<6:27:11, 11.50s/it] 27%|██▋ | 755/2774 [2:28:52<6:36:08, 11.77s/it] {'loss': 1.0254, 'learning_rate': 4.270861043886506e-06, 'epoch': 0.27} 27%|██▋ | 755/2774 [2:28:52<6:36:08, 11.77s/it] 27%|██▋ | 756/2774 [2:29:03<6:30:48, 
11.62s/it] {'loss': 1.0215, 'learning_rate': 4.268798919857412e-06, 'epoch': 0.27} 27%|██▋ | 756/2774 [2:29:03<6:30:48, 11.62s/it] 27%|██▋ | 757/2774 [2:29:16<6:39:41, 11.89s/it] {'loss': 1.0444, 'learning_rate': 4.266734383293441e-06, 'epoch': 0.27} 27%|██▋ | 757/2774 [2:29:16<6:39:41, 11.89s/it] 27%|██▋ | 758/2774 [2:29:30<6:56:36, 12.40s/it] {'loss': 1.0283, 'learning_rate': 4.264667437010497e-06, 'epoch': 0.27} 27%|██▋ | 758/2774 [2:29:30<6:56:36, 12.40s/it] 27%|██▋ | 759/2774 [2:29:41<6:44:40, 12.05s/it] {'loss': 1.0664, 'learning_rate': 4.262598083827769e-06, 'epoch': 0.27} 27%|██▋ | 759/2774 [2:29:41<6:44:40, 12.05s/it] 27%|██▋ | 760/2774 [2:29:52<6:37:13, 11.83s/it] {'loss': 1.0234, 'learning_rate': 4.26052632656773e-06, 'epoch': 0.27} 27%|██▋ | 760/2774 [2:29:52<6:37:13, 11.83s/it] 27%|██▋ | 761/2774 [2:30:03<6:30:31, 11.64s/it] {'loss': 1.0239, 'learning_rate': 4.258452168056132e-06, 'epoch': 0.27} 27%|██▋ | 761/2774 [2:30:03<6:30:31, 11.64s/it] 27%|██▋ | 762/2774 [2:30:15<6:26:38, 11.53s/it] {'loss': 1.0439, 'learning_rate': 4.256375611122003e-06, 'epoch': 0.27} 27%|██▋ | 762/2774 [2:30:15<6:26:38, 11.53s/it] 28%|██▊ | 763/2774 [2:30:26<6:25:45, 11.51s/it] {'loss': 1.0244, 'learning_rate': 4.25429665859764e-06, 'epoch': 0.28} 28%|██▊ | 763/2774 [2:30:26<6:25:45, 11.51s/it] 28%|██▊ | 764/2774 [2:30:38<6:26:20, 11.53s/it] {'loss': 1.0332, 'learning_rate': 4.252215313318608e-06, 'epoch': 0.28} 28%|██▊ | 764/2774 [2:30:38<6:26:20, 11.53s/it] 28%|██▊ | 765/2774 [2:30:49<6:25:23, 11.51s/it] {'loss': 1.0469, 'learning_rate': 4.250131578123737e-06, 'epoch': 0.28} 28%|██▊ | 765/2774 [2:30:49<6:25:23, 11.51s/it] 28%|██▊ | 766/2774 [2:31:00<6:22:58, 11.44s/it] {'loss': 1.0312, 'learning_rate': 4.248045455855116e-06, 'epoch': 0.28} 28%|██▊ | 766/2774 [2:31:00<6:22:58, 11.44s/it] 28%|██▊ | 767/2774 [2:31:12<6:22:17, 11.43s/it] {'loss': 1.0728, 'learning_rate': 4.24595694935809e-06, 'epoch': 0.28} 28%|██▊ | 767/2774 [2:31:12<6:22:17, 11.43s/it] 28%|██▊ | 768/2774 
[2:31:23<6:19:39, 11.36s/it] {'loss': 1.0049, 'learning_rate': 4.243866061481256e-06, 'epoch': 0.28} 28%|██▊ | 768/2774 [2:31:23<6:19:39, 11.36s/it] 28%|██▊ | 769/2774 [2:31:35<6:23:16, 11.47s/it] {'loss': 1.0488, 'learning_rate': 4.241772795076458e-06, 'epoch': 0.28} 28%|██▊ | 769/2774 [2:31:35<6:23:16, 11.47s/it] 28%|██▊ | 770/2774 [2:31:46<6:26:05, 11.56s/it] {'loss': 1.0186, 'learning_rate': 4.239677152998784e-06, 'epoch': 0.28} 28%|██▊ | 770/2774 [2:31:46<6:26:05, 11.56s/it] 28%|██▊ | 771/2774 [2:31:59<6:33:33, 11.79s/it] {'loss': 0.9648, 'learning_rate': 4.2375791381065654e-06, 'epoch': 0.28} 28%|██▊ | 771/2774 [2:31:59<6:33:33, 11.79s/it] 28%|██▊ | 772/2774 [2:32:10<6:32:24, 11.76s/it] {'loss': 1.0234, 'learning_rate': 4.235478753261366e-06, 'epoch': 0.28} 28%|██▊ | 772/2774 [2:32:10<6:32:24, 11.76s/it] 28%|██▊ | 773/2774 [2:32:22<6:27:04, 11.61s/it] {'loss': 0.9854, 'learning_rate': 4.233376001327984e-06, 'epoch': 0.28} 28%|██▊ | 773/2774 [2:32:22<6:27:04, 11.61s/it] 28%|██▊ | 774/2774 [2:32:33<6:25:22, 11.56s/it] {'loss': 1.0298, 'learning_rate': 4.231270885174448e-06, 'epoch': 0.28} 28%|██▊ | 774/2774 [2:32:33<6:25:22, 11.56s/it] 28%|██▊ | 775/2774 [2:32:44<6:21:30, 11.45s/it] {'loss': 1.042, 'learning_rate': 4.229163407672007e-06, 'epoch': 0.28} 28%|██▊ | 775/2774 [2:32:44<6:21:30, 11.45s/it] 28%|██▊ | 776/2774 [2:32:57<6:30:20, 11.72s/it] {'loss': 1.0093, 'learning_rate': 4.2270535716951345e-06, 'epoch': 0.28} 28%|██▊ | 776/2774 [2:32:57<6:30:20, 11.72s/it] 28%|██▊ | 777/2774 [2:33:08<6:26:31, 11.61s/it] {'loss': 1.0083, 'learning_rate': 4.224941380121518e-06, 'epoch': 0.28} 28%|██▊ | 777/2774 [2:33:08<6:26:31, 11.61s/it] 28%|██▊ | 778/2774 [2:33:20<6:26:56, 11.63s/it] {'loss': 1.0776, 'learning_rate': 4.2228268358320605e-06, 'epoch': 0.28} 28%|██▊ | 778/2774 [2:33:20<6:26:56, 11.63s/it] 28%|██▊ | 779/2774 [2:33:31<6:21:41, 11.48s/it] {'loss': 1.0464, 'learning_rate': 4.220709941710871e-06, 'epoch': 0.28} 28%|██▊ | 779/2774 [2:33:31<6:21:41, 11.48s/it] 
28%|██▊ | 780/2774 [2:33:42<6:21:10, 11.47s/it] {'loss': 1.0151, 'learning_rate': 4.218590700645267e-06, 'epoch': 0.28} 28%|██▊ | 780/2774 [2:33:42<6:21:10, 11.47s/it] 28%|██▊ | 781/2774 [2:33:54<6:20:57, 11.47s/it] {'loss': 1.0562, 'learning_rate': 4.216469115525763e-06, 'epoch': 0.28} 28%|██▊ | 781/2774 [2:33:54<6:20:57, 11.47s/it] 28%|██▊ | 782/2774 [2:34:05<6:19:40, 11.44s/it] {'loss': 1.0498, 'learning_rate': 4.214345189246077e-06, 'epoch': 0.28} 28%|██▊ | 782/2774 [2:34:05<6:19:40, 11.44s/it] 28%|██▊ | 783/2774 [2:34:20<6:58:43, 12.62s/it] {'loss': 1.0576, 'learning_rate': 4.212218924703111e-06, 'epoch': 0.28} 28%|██▊ | 783/2774 [2:34:20<6:58:43, 12.62s/it] 28%|██▊ | 784/2774 [2:34:32<6:49:44, 12.35s/it] {'loss': 1.0464, 'learning_rate': 4.210090324796965e-06, 'epoch': 0.28} 28%|██▊ | 784/2774 [2:34:32<6:49:44, 12.35s/it] 28%|██▊ | 785/2774 [2:34:44<6:39:55, 12.06s/it] {'loss': 1.04, 'learning_rate': 4.207959392430921e-06, 'epoch': 0.28} 28%|██▊ | 785/2774 [2:34:44<6:39:55, 12.06s/it] 28%|██▊ | 786/2774 [2:34:55<6:37:16, 11.99s/it] {'loss': 1.0273, 'learning_rate': 4.205826130511439e-06, 'epoch': 0.28} 28%|██▊ | 786/2774 [2:34:55<6:37:16, 11.99s/it] 28%|██▊ | 787/2774 [2:35:07<6:30:35, 11.79s/it] {'loss': 1.064, 'learning_rate': 4.2036905419481615e-06, 'epoch': 0.28} 28%|██▊ | 787/2774 [2:35:07<6:30:35, 11.79s/it] 28%|██▊ | 788/2774 [2:35:18<6:28:17, 11.73s/it] {'loss': 1.0757, 'learning_rate': 4.201552629653902e-06, 'epoch': 0.28} 28%|██▊ | 788/2774 [2:35:18<6:28:17, 11.73s/it] 28%|██▊ | 789/2774 [2:35:30<6:24:30, 11.62s/it] {'loss': 0.9976, 'learning_rate': 4.1994123965446435e-06, 'epoch': 0.28} 28%|██▊ | 789/2774 [2:35:30<6:24:30, 11.62s/it] 28%|██▊ | 790/2774 [2:35:43<6:35:54, 11.97s/it] {'loss': 0.9971, 'learning_rate': 4.197269845539535e-06, 'epoch': 0.28} 28%|██▊ | 790/2774 [2:35:43<6:35:54, 11.97s/it] 29%|██▊ | 791/2774 [2:35:54<6:30:00, 11.80s/it] {'loss': 1.0527, 'learning_rate': 4.1951249795608865e-06, 'epoch': 0.29} 29%|██▊ | 791/2774 
[2:35:54<6:30:00, 11.80s/it] 29%|██▊ | 792/2774 [2:36:05<6:26:18, 11.69s/it] {'loss': 1.0518, 'learning_rate': 4.192977801534165e-06, 'epoch': 0.29} 29%|██▊ | 792/2774 [2:36:05<6:26:18, 11.69s/it] 29%|██▊ | 793/2774 [2:36:17<6:23:55, 11.63s/it] {'loss': 1.063, 'learning_rate': 4.1908283143879925e-06, 'epoch': 0.29} 29%|██▊ | 793/2774 [2:36:17<6:23:55, 11.63s/it] 29%|██▊ | 794/2774 [2:36:29<6:26:43, 11.72s/it] {'loss': 0.9517, 'learning_rate': 4.188676521054139e-06, 'epoch': 0.29} 29%|██▊ | 794/2774 [2:36:29<6:26:43, 11.72s/it] 29%|██▊ | 795/2774 [2:36:40<6:25:00, 11.67s/it] {'loss': 1.0679, 'learning_rate': 4.186522424467522e-06, 'epoch': 0.29} 29%|██▊ | 795/2774 [2:36:40<6:25:00, 11.67s/it] 29%|██▊ | 796/2774 [2:36:52<6:21:45, 11.58s/it] {'loss': 1.0723, 'learning_rate': 4.1843660275661964e-06, 'epoch': 0.29} 29%|██▊ | 796/2774 [2:36:52<6:21:45, 11.58s/it] 29%|██▊ | 797/2774 [2:37:03<6:23:06, 11.63s/it] {'loss': 1.0864, 'learning_rate': 4.1822073332913605e-06, 'epoch': 0.29} 29%|██▊ | 797/2774 [2:37:03<6:23:06, 11.63s/it] 29%|██▉ | 798/2774 [2:37:15<6:22:30, 11.61s/it] {'loss': 1.0532, 'learning_rate': 4.1800463445873405e-06, 'epoch': 0.29} 29%|██▉ | 798/2774 [2:37:15<6:22:30, 11.61s/it] 29%|██▉ | 799/2774 [2:37:28<6:32:21, 11.92s/it] {'loss': 0.998, 'learning_rate': 4.177883064401596e-06, 'epoch': 0.29} 29%|██▉ | 799/2774 [2:37:28<6:32:21, 11.92s/it] 29%|██▉ | 800/2774 [2:37:41<6:45:51, 12.34s/it] {'loss': 0.9844, 'learning_rate': 4.175717495684709e-06, 'epoch': 0.29} 29%|██▉ | 800/2774 [2:37:41<6:45:51, 12.34s/it]
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
 29%|██▉ | 801/2774 [2:38:20<11:09:33, 20.36s/it] {'loss': 1.0322, 'learning_rate': 4.1735496413903855e-06, 'epoch': 0.29} 29%|██▉ | 801/2774 [2:38:20<11:09:33, 20.36s/it] 29%|██▉ | 802/2774 [2:38:32<9:42:41, 17.73s/it] {'loss': 1.0605, 'learning_rate': 4.171379504475447e-06, 'epoch': 0.29} 29%|██▉ | 802/2774 [2:38:32<9:42:41, 17.73s/it] 29%|██▉ | 803/2774 [2:38:43<8:40:47, 15.85s/it] {'loss': 1.041, 'learning_rate': 4.169207087899829e-06, 'epoch': 0.29} 29%|██▉ | 803/2774 [2:38:43<8:40:47, 15.85s/it] 29%|██▉ | 804/2774 [2:38:54<7:54:39, 14.46s/it] {'loss': 1.0391, 'learning_rate': 4.167032394626579e-06, 'epoch': 0.29} 29%|██▉ | 804/2774 [2:38:54<7:54:39, 14.46s/it] 29%|██▉ | 805/2774 [2:39:06<7:31:41, 13.76s/it] {'loss': 1.0249, 'learning_rate': 4.164855427621844e-06, 'epoch': 0.29} 29%|██▉ | 805/2774 [2:39:06<7:31:41, 13.76s/it] 29%|██▉ | 806/2774 [2:39:18<7:07:35, 13.04s/it] {'loss': 1.0601, 'learning_rate': 4.162676189854877e-06, 'epoch': 0.29} 29%|██▉ | 806/2774 [2:39:18<7:07:35, 13.04s/it] 29%|██▉ | 807/2774 [2:39:29<6:52:33, 12.58s/it] {'loss': 1.085, 'learning_rate': 4.160494684298027e-06, 'epoch': 0.29} 29%|██▉ | 807/2774 [2:39:29<6:52:33, 12.58s/it] 29%|██▉ | 808/2774 [2:39:41<6:42:00, 12.27s/it] {'loss': 1.0444, 'learning_rate': 4.158310913926735e-06, 'epoch': 0.29} 29%|██▉ | 808/2774 [2:39:41<6:42:00, 12.27s/it] 29%|██▉ | 809/2774 [2:39:53<6:41:19, 12.25s/it] {'loss': 0.9932, 'learning_rate': 4.156124881719533e-06, 'epoch': 0.29} 29%|██▉ | 809/2774 [2:39:53<6:41:19, 12.25s/it] 29%|██▉ | 810/2774 [2:40:05<6:36:39, 12.12s/it] {'loss': 1.0425, 'learning_rate': 4.1539365906580354e-06, 'epoch': 0.29} 29%|██▉ | 810/2774 [2:40:05<6:36:39, 12.12s/it] 29%|██▉ | 811/2774 [2:40:16<6:31:21, 11.96s/it] {'loss': 1.0381, 'learning_rate': 4.15174604372694e-06, 'epoch': 0.29} 29%|██▉ | 811/2774 
[2:40:16<6:31:21, 11.96s/it] 29%|██▉ | 812/2774 [2:40:28<6:24:49, 11.77s/it] {'loss': 1.02, 'learning_rate': 4.14955324391402e-06, 'epoch': 0.29} 29%|██▉ | 812/2774 [2:40:28<6:24:49, 11.77s/it] 29%|██▉ | 813/2774 [2:40:40<6:24:07, 11.75s/it] {'loss': 1.019, 'learning_rate': 4.147358194210122e-06, 'epoch': 0.29} 29%|██▉ | 813/2774 [2:40:40<6:24:07, 11.75s/it] 29%|██▉ | 814/2774 [2:40:52<6:33:59, 12.06s/it] {'loss': 0.9517, 'learning_rate': 4.14516089760916e-06, 'epoch': 0.29} 29%|██▉ | 814/2774 [2:40:52<6:33:59, 12.06s/it] 29%|██▉ | 815/2774 [2:41:04<6:31:49, 12.00s/it] {'loss': 1.0571, 'learning_rate': 4.1429613571081164e-06, 'epoch': 0.29} 29%|██▉ | 815/2774 [2:41:04<6:31:49, 12.00s/it] 29%|██▉ | 816/2774 [2:41:16<6:30:26, 11.96s/it] {'loss': 1.0913, 'learning_rate': 4.140759575707031e-06, 'epoch': 0.29} 29%|██▉ | 816/2774 [2:41:16<6:30:26, 11.96s/it] 29%|██▉ | 817/2774 [2:41:27<6:23:57, 11.77s/it] {'loss': 1.0903, 'learning_rate': 4.138555556408998e-06, 'epoch': 0.29} 29%|██▉ | 817/2774 [2:41:27<6:23:57, 11.77s/it] 29%|██▉ | 818/2774 [2:41:41<6:37:56, 12.21s/it] {'loss': 1.0503, 'learning_rate': 4.13634930222017e-06, 'epoch': 0.29} 29%|██▉ | 818/2774 [2:41:41<6:37:56, 12.21s/it] 30%|██▉ | 819/2774 [2:41:52<6:26:57, 11.88s/it] {'loss': 1.0332, 'learning_rate': 4.134140816149742e-06, 'epoch': 0.3} 30%|██▉ | 819/2774 [2:41:52<6:26:57, 11.88s/it] 30%|██▉ | 820/2774 [2:42:03<6:25:37, 11.84s/it] {'loss': 1.0962, 'learning_rate': 4.1319301012099575e-06, 'epoch': 0.3} 30%|██▉ | 820/2774 [2:42:03<6:25:37, 11.84s/it] 30%|██▉ | 821/2774 [2:42:15<6:22:16, 11.74s/it] {'loss': 1.0664, 'learning_rate': 4.1297171604160965e-06, 'epoch': 0.3} 30%|██▉ | 821/2774 [2:42:15<6:22:16, 11.74s/it] 30%|██▉ | 822/2774 [2:42:27<6:25:09, 11.84s/it] {'loss': 1.1011, 'learning_rate': 4.127501996786477e-06, 'epoch': 0.3} 30%|██▉ | 822/2774 [2:42:27<6:25:09, 11.84s/it] 30%|██▉ | 823/2774 [2:42:39<6:22:18, 11.76s/it] {'loss': 0.9912, 'learning_rate': 4.125284613342449e-06, 'epoch': 0.3} 30%|██▉ | 
823/2774 [2:42:39<6:22:18, 11.76s/it] 30%|██▉ | 824/2774 [2:42:50<6:22:26, 11.77s/it] {'loss': 1.0684, 'learning_rate': 4.1230650131083884e-06, 'epoch': 0.3} 30%|██▉ | 824/2774 [2:42:50<6:22:26, 11.77s/it] 30%|██▉ | 825/2774 [2:43:02<6:16:50, 11.60s/it] {'loss': 1.0308, 'learning_rate': 4.120843199111697e-06, 'epoch': 0.3} 30%|██▉ | 825/2774 [2:43:02<6:16:50, 11.60s/it] 30%|██▉ | 826/2774 [2:43:13<6:14:04, 11.52s/it] {'loss': 1.0405, 'learning_rate': 4.118619174382794e-06, 'epoch': 0.3} 30%|██▉ | 826/2774 [2:43:13<6:14:04, 11.52s/it] 30%|██▉ | 827/2774 [2:43:24<6:13:15, 11.50s/it] {'loss': 1.0684, 'learning_rate': 4.116392941955117e-06, 'epoch': 0.3} 30%|██▉ | 827/2774 [2:43:24<6:13:15, 11.50s/it] 30%|██▉ | 828/2774 [2:43:36<6:13:08, 11.50s/it] {'loss': 0.9971, 'learning_rate': 4.114164504865108e-06, 'epoch': 0.3} 30%|██▉ | 828/2774 [2:43:36<6:13:08, 11.50s/it] 30%|██▉ | 829/2774 [2:43:47<6:11:23, 11.46s/it] {'loss': 1.0957, 'learning_rate': 4.111933866152225e-06, 'epoch': 0.3} 30%|██▉ | 829/2774 [2:43:47<6:11:23, 11.46s/it] 30%|██▉ | 830/2774 [2:43:59<6:13:11, 11.52s/it] {'loss': 1.1035, 'learning_rate': 4.1097010288589225e-06, 'epoch': 0.3} 30%|██▉ | 830/2774 [2:43:59<6:13:11, 11.52s/it] 30%|██▉ | 831/2774 [2:44:10<6:10:36, 11.44s/it] {'loss': 1.0396, 'learning_rate': 4.107465996030657e-06, 'epoch': 0.3} 30%|██▉ | 831/2774 [2:44:10<6:10:36, 11.44s/it] 30%|██▉ | 832/2774 [2:44:22<6:12:21, 11.50s/it] {'loss': 1.061, 'learning_rate': 4.105228770715876e-06, 'epoch': 0.3} 30%|██▉ | 832/2774 [2:44:22<6:12:21, 11.50s/it] 30%|███ | 833/2774 [2:44:33<6:13:52, 11.56s/it] {'loss': 1.0474, 'learning_rate': 4.102989355966021e-06, 'epoch': 0.3} 30%|███ | 833/2774 [2:44:33<6:13:52, 11.56s/it] 30%|███ | 834/2774 [2:44:45<6:12:24, 11.52s/it] {'loss': 1.0752, 'learning_rate': 4.100747754835518e-06, 'epoch': 0.3} 30%|███ | 834/2774 [2:44:45<6:12:24, 11.52s/it] 30%|███ | 835/2774 [2:44:58<6:30:46, 12.09s/it] {'loss': 1.0083, 'learning_rate': 4.098503970381777e-06, 'epoch': 0.3} 
30%|███ | 835/2774 [2:44:58<6:30:46, 12.09s/it] 30%|███ | 836/2774 [2:45:10<6:26:56, 11.98s/it] {'loss': 0.9746, 'learning_rate': 4.0962580056651835e-06, 'epoch': 0.3} 30%|███ | 836/2774 [2:45:10<6:26:56, 11.98s/it] 30%|███ | 837/2774 [2:45:23<6:33:24, 12.19s/it] {'loss': 0.9878, 'learning_rate': 4.0940098637490964e-06, 'epoch': 0.3} 30%|███ | 837/2774 [2:45:23<6:33:24, 12.19s/it] 30%|███ | 838/2774 [2:45:35<6:32:04, 12.15s/it] {'loss': 1.063, 'learning_rate': 4.091759547699848e-06, 'epoch': 0.3} 30%|███ | 838/2774 [2:45:35<6:32:04, 12.15s/it] 30%|███ | 839/2774 [2:45:46<6:25:46, 11.96s/it] {'loss': 1.0576, 'learning_rate': 4.089507060586731e-06, 'epoch': 0.3} 30%|███ | 839/2774 [2:45:46<6:25:46, 11.96s/it] 30%|███ | 840/2774 [2:45:59<6:34:22, 12.24s/it] {'loss': 1.0542, 'learning_rate': 4.087252405482002e-06, 'epoch': 0.3} 30%|███ | 840/2774 [2:45:59<6:34:22, 12.24s/it] 30%|███ | 841/2774 [2:46:11<6:25:59, 11.98s/it] {'loss': 1.0767, 'learning_rate': 4.084995585460877e-06, 'epoch': 0.3} 30%|███ | 841/2774 [2:46:11<6:25:59, 11.98s/it] 30%|███ | 842/2774 [2:46:22<6:21:41, 11.85s/it] {'loss': 1.0278, 'learning_rate': 4.082736603601519e-06, 'epoch': 0.3} 30%|███ | 842/2774 [2:46:22<6:21:41, 11.85s/it] 30%|███ | 843/2774 [2:46:34<6:18:35, 11.76s/it] {'loss': 1.0537, 'learning_rate': 4.0804754629850445e-06, 'epoch': 0.3} 30%|███ | 843/2774 [2:46:34<6:18:35, 11.76s/it] 30%|███ | 844/2774 [2:46:45<6:13:51, 11.62s/it] {'loss': 1.0283, 'learning_rate': 4.078212166695513e-06, 'epoch': 0.3} 30%|███ | 844/2774 [2:46:45<6:13:51, 11.62s/it] 30%|███ | 845/2774 [2:46:57<6:15:41, 11.69s/it] {'loss': 1.0361, 'learning_rate': 4.075946717819923e-06, 'epoch': 0.3} 30%|███ | 845/2774 [2:46:57<6:15:41, 11.69s/it] 30%|███ | 846/2774 [2:47:08<6:13:44, 11.63s/it] {'loss': 1.0493, 'learning_rate': 4.07367911944821e-06, 'epoch': 0.3} 30%|███ | 846/2774 [2:47:08<6:13:44, 11.63s/it] 31%|███ | 847/2774 [2:47:19<6:08:40, 11.48s/it] {'loss': 1.0215, 'learning_rate': 4.071409374673241e-06, 'epoch': 
0.31} 31%|███ | 847/2774 [2:47:19<6:08:40, 11.48s/it] 31%|███ | 848/2774 [2:47:32<6:19:41, 11.83s/it] {'loss': 0.9629, 'learning_rate': 4.069137486590812e-06, 'epoch': 0.31} 31%|███ | 848/2774 [2:47:32<6:19:41, 11.83s/it] 31%|███ | 849/2774 [2:47:43<6:12:58, 11.63s/it] {'loss': 1.0264, 'learning_rate': 4.06686345829964e-06, 'epoch': 0.31} 31%|███ | 849/2774 [2:47:43<6:12:58, 11.63s/it] 31%|███ | 850/2774 [2:47:55<6:14:44, 11.69s/it] {'loss': 1.0, 'learning_rate': 4.0645872929013626e-06, 'epoch': 0.31} 31%|███ | 850/2774 [2:47:55<6:14:44, 11.69s/it] 31%|███ | 851/2774 [2:48:06<6:11:23, 11.59s/it] {'loss': 1.0107, 'learning_rate': 4.062308993500531e-06, 'epoch': 0.31} 31%|███ | 851/2774 [2:48:06<6:11:23, 11.59s/it] 31%|███ | 852/2774 [2:48:18<6:08:01, 11.49s/it] {'loss': 1.0898, 'learning_rate': 4.06002856320461e-06, 'epoch': 0.31} 31%|███ | 852/2774 [2:48:18<6:08:01, 11.49s/it] 31%|███ | 853/2774 [2:48:29<6:07:11, 11.47s/it] {'loss': 1.0215, 'learning_rate': 4.057746005123966e-06, 'epoch': 0.31} 31%|███ | 853/2774 [2:48:29<6:07:11, 11.47s/it] 31%|███ | 854/2774 [2:48:41<6:07:14, 11.48s/it] {'loss': 1.0684, 'learning_rate': 4.055461322371873e-06, 'epoch': 0.31} 31%|███ | 854/2774 [2:48:41<6:07:14, 11.48s/it] 31%|███ | 855/2774 [2:48:52<6:05:49, 11.44s/it] {'loss': 0.9888, 'learning_rate': 4.053174518064499e-06, 'epoch': 0.31} 31%|███ | 855/2774 [2:48:52<6:05:49, 11.44s/it] 31%|███ | 856/2774 [2:49:05<6:22:33, 11.97s/it] {'loss': 1.0635, 'learning_rate': 4.050885595320906e-06, 'epoch': 0.31} 31%|███ | 856/2774 [2:49:05<6:22:33, 11.97s/it] 31%|███ | 857/2774 [2:49:17<6:18:16, 11.84s/it] {'loss': 1.0342, 'learning_rate': 4.048594557263049e-06, 'epoch': 0.31} 31%|███ | 857/2774 [2:49:17<6:18:16, 11.84s/it] 31%|███ | 858/2774 [2:49:29<6:22:13, 11.97s/it] {'loss': 0.9648, 'learning_rate': 4.046301407015763e-06, 'epoch': 0.31} 31%|███ | 858/2774 [2:49:29<6:22:13, 11.97s/it] 31%|███ | 859/2774 [2:49:40<6:16:49, 11.81s/it] {'loss': 1.063, 'learning_rate': 
4.044006147706768e-06, 'epoch': 0.31} 31%|███ | 859/2774 [2:49:40<6:16:49, 11.81s/it] 31%|███ | 860/2774 [2:49:52<6:15:39, 11.78s/it] {'loss': 1.085, 'learning_rate': 4.041708782466657e-06, 'epoch': 0.31} 31%|███ | 860/2774 [2:49:52<6:15:39, 11.78s/it] 31%|███ | 861/2774 [2:50:04<6:14:00, 11.73s/it] {'loss': 1.0498, 'learning_rate': 4.039409314428899e-06, 'epoch': 0.31} 31%|███ | 861/2774 [2:50:04<6:14:00, 11.73s/it] 31%|███ | 862/2774 [2:50:15<6:10:54, 11.64s/it] {'loss': 1.0103, 'learning_rate': 4.0371077467298305e-06, 'epoch': 0.31} 31%|███ | 862/2774 [2:50:15<6:10:54, 11.64s/it] 31%|███ | 863/2774 [2:50:27<6:09:42, 11.61s/it] {'loss': 1.0767, 'learning_rate': 4.0348040825086485e-06, 'epoch': 0.31} 31%|███ | 863/2774 [2:50:27<6:09:42, 11.61s/it] 31%|███ | 864/2774 [2:50:38<6:08:23, 11.57s/it] {'loss': 0.9951, 'learning_rate': 4.032498324907413e-06, 'epoch': 0.31} 31%|███ | 864/2774 [2:50:38<6:08:23, 11.57s/it] 31%|███ | 865/2774 [2:50:50<6:06:07, 11.51s/it] {'loss': 1.0034, 'learning_rate': 4.030190477071039e-06, 'epoch': 0.31} 31%|███ | 865/2774 [2:50:50<6:06:07, 11.51s/it] 31%|███ | 866/2774 [2:51:01<6:06:12, 11.52s/it] {'loss': 1.0073, 'learning_rate': 4.02788054214729e-06, 'epoch': 0.31} 31%|███ | 866/2774 [2:51:01<6:06:12, 11.52s/it] 31%|███▏ | 867/2774 [2:51:12<6:02:25, 11.40s/it] {'loss': 0.9995, 'learning_rate': 4.025568523286778e-06, 'epoch': 0.31} 31%|███▏ | 867/2774 [2:51:12<6:02:25, 11.40s/it] 31%|███▏ | 868/2774 [2:51:24<6:08:10, 11.59s/it] {'loss': 1.0288, 'learning_rate': 4.023254423642957e-06, 'epoch': 0.31} 31%|███▏ | 868/2774 [2:51:24<6:08:10, 11.59s/it] 31%|███▏ | 869/2774 [2:51:36<6:06:46, 11.55s/it] {'loss': 1.02, 'learning_rate': 4.020938246372119e-06, 'epoch': 0.31} 31%|███▏ | 869/2774 [2:51:36<6:06:46, 11.55s/it] 31%|███▏ | 870/2774 [2:51:48<6:13:43, 11.78s/it] {'loss': 1.0273, 'learning_rate': 4.018619994633391e-06, 'epoch': 0.31} 31%|███▏ | 870/2774 [2:51:48<6:13:43, 11.78s/it] 31%|███▏ | 871/2774 [2:51:59<6:07:54, 11.60s/it] {'loss': 
0.9927, 'learning_rate': 4.016299671588728e-06, 'epoch': 0.31} 31%|███▏ | 871/2774 [2:51:59<6:07:54, 11.60s/it] 31%|███▏ | 872/2774 [2:52:10<6:04:32, 11.50s/it] {'loss': 1.0015, 'learning_rate': 4.01397728040291e-06, 'epoch': 0.31} 31%|███▏ | 872/2774 [2:52:10<6:04:32, 11.50s/it] 31%|███▏ | 873/2774 [2:52:22<6:00:31, 11.38s/it] {'loss': 1.0078, 'learning_rate': 4.011652824243538e-06, 'epoch': 0.31} 31%|███▏ | 873/2774 [2:52:22<6:00:31, 11.38s/it] 32%|███▏ | 874/2774 [2:52:33<6:00:07, 11.37s/it] {'loss': 1.0513, 'learning_rate': 4.0093263062810305e-06, 'epoch': 0.32} 32%|███▏ | 874/2774 [2:52:33<6:00:07, 11.37s/it] 32%|███▏ | 875/2774 [2:52:44<5:59:24, 11.36s/it] {'loss': 1.0601, 'learning_rate': 4.006997729688616e-06, 'epoch': 0.32} 32%|███▏ | 875/2774 [2:52:44<5:59:24, 11.36s/it] 32%|███▏ | 876/2774 [2:52:55<5:58:04, 11.32s/it] {'loss': 1.0576, 'learning_rate': 4.004667097642334e-06, 'epoch': 0.32} 32%|███▏ | 876/2774 [2:52:55<5:58:04, 11.32s/it] 32%|███▏ | 877/2774 [2:53:07<5:58:39, 11.34s/it] {'loss': 1.0464, 'learning_rate': 4.002334413321026e-06, 'epoch': 0.32} 32%|███▏ | 877/2774 [2:53:07<5:58:39, 11.34s/it] 32%|███▏ | 878/2774 [2:53:18<6:00:52, 11.42s/it] {'loss': 1.0293, 'learning_rate': 3.999999679906331e-06, 'epoch': 0.32} 32%|███▏ | 878/2774 [2:53:18<6:00:52, 11.42s/it] 32%|███▏ | 879/2774 [2:53:30<5:59:43, 11.39s/it] {'loss': 1.0732, 'learning_rate': 3.997662900582685e-06, 'epoch': 0.32} 32%|███▏ | 879/2774 [2:53:30<5:59:43, 11.39s/it] 32%|███▏ | 880/2774 [2:53:41<6:00:37, 11.42s/it] {'loss': 1.0327, 'learning_rate': 3.9953240785373145e-06, 'epoch': 0.32} 32%|███▏ | 880/2774 [2:53:41<6:00:37, 11.42s/it] 32%|███▏ | 881/2774 [2:53:53<5:59:42, 11.40s/it] {'loss': 1.002, 'learning_rate': 3.992983216960231e-06, 'epoch': 0.32} 32%|███▏ | 881/2774 [2:53:53<5:59:42, 11.40s/it] 32%|███▏ | 882/2774 [2:54:06<6:15:28, 11.91s/it] {'loss': 0.9893, 'learning_rate': 3.990640319044228e-06, 'epoch': 0.32} 32%|███▏ | 882/2774 [2:54:06<6:15:28, 11.91s/it] 32%|███▏ | 
883/2774 [2:54:17<6:11:05, 11.77s/it] {'loss': 0.9893, 'learning_rate': 3.9882953879848764e-06, 'epoch': 0.32} 32%|███▏ | 883/2774 [2:54:17<6:11:05, 11.77s/it] 32%|███▏ | 884/2774 [2:54:29<6:09:54, 11.74s/it] {'loss': 1.061, 'learning_rate': 3.9859484269805215e-06, 'epoch': 0.32} 32%|███▏ | 884/2774 [2:54:29<6:09:54, 11.74s/it] 32%|███▏ | 885/2774 [2:54:40<6:04:38, 11.58s/it] {'loss': 1.0249, 'learning_rate': 3.9835994392322755e-06, 'epoch': 0.32} 32%|███▏ | 885/2774 [2:54:40<6:04:38, 11.58s/it] 32%|███▏ | 886/2774 [2:54:53<6:18:40, 12.03s/it] {'loss': 0.9839, 'learning_rate': 3.981248427944017e-06, 'epoch': 0.32} 32%|███▏ | 886/2774 [2:54:53<6:18:40, 12.03s/it] 32%|███▏ | 887/2774 [2:55:05<6:21:20, 12.13s/it] {'loss': 1.0376, 'learning_rate': 3.9788953963223815e-06, 'epoch': 0.32} 32%|███▏ | 887/2774 [2:55:05<6:21:20, 12.13s/it] 32%|███▏ | 888/2774 [2:55:17<6:15:24, 11.94s/it] {'loss': 1.0083, 'learning_rate': 3.976540347576763e-06, 'epoch': 0.32} 32%|███▏ | 888/2774 [2:55:17<6:15:24, 11.94s/it] 32%|███▏ | 889/2774 [2:55:28<6:09:04, 11.75s/it] {'loss': 1.0679, 'learning_rate': 3.974183284919306e-06, 'epoch': 0.32} 32%|███▏ | 889/2774 [2:55:28<6:09:04, 11.75s/it] 32%|███▏ | 890/2774 [2:55:40<6:07:06, 11.69s/it] {'loss': 1.0908, 'learning_rate': 3.971824211564902e-06, 'epoch': 0.32} 32%|███▏ | 890/2774 [2:55:40<6:07:06, 11.69s/it] 32%|███▏ | 891/2774 [2:55:51<6:05:49, 11.66s/it] {'loss': 1.0347, 'learning_rate': 3.969463130731183e-06, 'epoch': 0.32} 32%|███▏ | 891/2774 [2:55:51<6:05:49, 11.66s/it] 32%|███▏ | 892/2774 [2:56:03<6:04:13, 11.61s/it] {'loss': 1.0439, 'learning_rate': 3.967100045638522e-06, 'epoch': 0.32} 32%|███▏ | 892/2774 [2:56:03<6:04:13, 11.61s/it] 32%|███▏ | 893/2774 [2:56:16<6:15:10, 11.97s/it] {'loss': 0.9761, 'learning_rate': 3.964734959510024e-06, 'epoch': 0.32} 32%|███▏ | 893/2774 [2:56:16<6:15:10, 11.97s/it] 32%|███▏ | 894/2774 [2:56:28<6:13:36, 11.92s/it] {'loss': 1.0459, 'learning_rate': 3.962367875571522e-06, 'epoch': 0.32} 32%|███▏ | 
894/2774 [2:56:28<6:13:36, 11.92s/it] 32%|███▏ | 895/2774 [2:56:39<6:11:25, 11.86s/it] {'loss': 1.0366, 'learning_rate': 3.959998797051578e-06, 'epoch': 0.32} 32%|███▏ | 895/2774 [2:56:39<6:11:25, 11.86s/it] 32%|███▏ | 896/2774 [2:56:51<6:06:44, 11.72s/it] {'loss': 1.0112, 'learning_rate': 3.957627727181472e-06, 'epoch': 0.32} 32%|███▏ | 896/2774 [2:56:51<6:06:44, 11.72s/it] 32%|███▏ | 897/2774 [2:57:02<6:03:22, 11.62s/it] {'loss': 0.9858, 'learning_rate': 3.955254669195198e-06, 'epoch': 0.32} 32%|███▏ | 897/2774 [2:57:02<6:03:22, 11.62s/it] 32%|███▏ | 898/2774 [2:57:14<6:01:50, 11.57s/it] {'loss': 1.02, 'learning_rate': 3.952879626329464e-06, 'epoch': 0.32} 32%|███▏ | 898/2774 [2:57:14<6:01:50, 11.57s/it] 32%|███▏ | 899/2774 [2:57:26<6:06:42, 11.73s/it] {'loss': 1.0513, 'learning_rate': 3.950502601823686e-06, 'epoch': 0.32} 32%|███▏ | 899/2774 [2:57:26<6:06:42, 11.73s/it] 32%|███▏ | 900/2774 [2:57:38<6:09:45, 11.84s/it] {'loss': 1.0151, 'learning_rate': 3.948123598919982e-06, 'epoch': 0.32} 32%|███▏ | 900/2774 [2:57:38<6:09:45, 11.84s/it] 32%|███▏ | 901/2774 [2:57:49<6:06:59, 11.76s/it] {'loss': 1.0107, 'learning_rate': 3.9457426208631674e-06, 'epoch': 0.32} 32%|███▏ | 901/2774 [2:57:49<6:06:59, 11.76s/it] 33%|███▎ | 902/2774 [2:58:01<6:07:56, 11.79s/it] {'loss': 1.0273, 'learning_rate': 3.943359670900753e-06, 'epoch': 0.33} 33%|███▎ | 902/2774 [2:58:01<6:07:56, 11.79s/it] 33%|███▎ | 903/2774 [2:58:13<6:04:13, 11.68s/it] {'loss': 1.0801, 'learning_rate': 3.940974752282939e-06, 'epoch': 0.33} 33%|███▎ | 903/2774 [2:58:13<6:04:13, 11.68s/it] 33%|███▎ | 904/2774 [2:58:24<6:05:46, 11.74s/it] {'loss': 0.9902, 'learning_rate': 3.9385878682626085e-06, 'epoch': 0.33} 33%|███▎ | 904/2774 [2:58:24<6:05:46, 11.74s/it] 33%|███▎ | 905/2774 [2:58:36<6:02:26, 11.64s/it] {'loss': 0.9917, 'learning_rate': 3.93619902209533e-06, 'epoch': 0.33} 33%|███▎ | 905/2774 [2:58:36<6:02:26, 11.64s/it] 33%|███▎ | 906/2774 [2:58:51<6:30:42, 12.55s/it] {'loss': 0.9932, 'learning_rate': 
3.933808217039343e-06, 'epoch': 0.33} 33%|███▎ | 906/2774 [2:58:51<6:30:42, 12.55s/it] 33%|███▎ | 907/2774 [2:59:03<6:31:12, 12.57s/it] {'loss': 1.0381, 'learning_rate': 3.931415456355562e-06, 'epoch': 0.33} 33%|███▎ | 907/2774 [2:59:03<6:31:12, 12.57s/it] 33%|███▎ | 908/2774 [2:59:15<6:23:49, 12.34s/it] {'loss': 1.084, 'learning_rate': 3.929020743307567e-06, 'epoch': 0.33} 33%|███▎ | 908/2774 [2:59:15<6:23:49, 12.34s/it] 33%|███▎ | 909/2774 [2:59:27<6:24:52, 12.38s/it] {'loss': 1.0977, 'learning_rate': 3.926624081161604e-06, 'epoch': 0.33} 33%|███▎ | 909/2774 [2:59:27<6:24:52, 12.38s/it] 33%|███▎ | 910/2774 [2:59:39<6:16:52, 12.13s/it] {'loss': 0.9834, 'learning_rate': 3.9242254731865734e-06, 'epoch': 0.33} 33%|███▎ | 910/2774 [2:59:39<6:16:52, 12.13s/it] 33%|███▎ | 911/2774 [2:59:50<6:09:36, 11.90s/it] {'loss': 1.04, 'learning_rate': 3.921824922654033e-06, 'epoch': 0.33} 33%|███▎ | 911/2774 [2:59:50<6:09:36, 11.90s/it] 33%|███▎ | 912/2774 [3:00:02<6:03:26, 11.71s/it] {'loss': 1.0034, 'learning_rate': 3.919422432838188e-06, 'epoch': 0.33} 33%|███▎ | 912/2774 [3:00:02<6:03:26, 11.71s/it] 33%|███▎ | 913/2774 [3:00:14<6:05:47, 11.79s/it] {'loss': 1.0654, 'learning_rate': 3.91701800701589e-06, 'epoch': 0.33} 33%|███▎ | 913/2774 [3:00:14<6:05:47, 11.79s/it] 33%|███▎ | 914/2774 [3:00:25<6:00:56, 11.64s/it] {'loss': 1.0283, 'learning_rate': 3.914611648466629e-06, 'epoch': 0.33} 33%|███▎ | 914/2774 [3:00:25<6:00:56, 11.64s/it] 33%|███▎ | 915/2774 [3:00:36<5:59:26, 11.60s/it] {'loss': 1.0371, 'learning_rate': 3.912203360472535e-06, 'epoch': 0.33} 33%|███▎ | 915/2774 [3:00:36<5:59:26, 11.60s/it] 33%|███▎ | 916/2774 [3:00:49<6:04:14, 11.76s/it] {'loss': 1.0063, 'learning_rate': 3.909793146318366e-06, 'epoch': 0.33} 33%|███▎ | 916/2774 [3:00:49<6:04:14, 11.76s/it] 33%|███▎ | 917/2774 [3:01:00<6:02:14, 11.70s/it] {'loss': 1.0693, 'learning_rate': 3.907381009291509e-06, 'epoch': 0.33} 33%|███▎ | 917/2774 [3:01:00<6:02:14, 11.70s/it] 33%|███▎ | 918/2774 [3:01:11<5:59:21, 
11.62s/it] {'loss': 1.0557, 'learning_rate': 3.904966952681972e-06, 'epoch': 0.33} 33%|███▎ | 918/2774 [3:01:11<5:59:21, 11.62s/it] 33%|███▎ | 919/2774 [3:01:24<6:04:27, 11.79s/it] {'loss': 1.0449, 'learning_rate': 3.902550979782384e-06, 'epoch': 0.33} 33%|███▎ | 919/2774 [3:01:24<6:04:27, 11.79s/it] 33%|███▎ | 920/2774 [3:01:36<6:04:59, 11.81s/it] {'loss': 1.0718, 'learning_rate': 3.900133093887984e-06, 'epoch': 0.33} 33%|███▎ | 920/2774 [3:01:36<6:04:59, 11.81s/it] 33%|███▎ | 921/2774 [3:01:47<5:59:45, 11.65s/it] {'loss': 1.0659, 'learning_rate': 3.897713298296625e-06, 'epoch': 0.33} 33%|███▎ | 921/2774 [3:01:47<5:59:45, 11.65s/it] 33%|███▎ | 922/2774 [3:01:58<5:56:58, 11.57s/it] {'loss': 1.0146, 'learning_rate': 3.89529159630876e-06, 'epoch': 0.33} 33%|███▎ | 922/2774 [3:01:58<5:56:58, 11.57s/it] 33%|███▎ | 923/2774 [3:02:10<5:56:46, 11.56s/it] {'loss': 1.0591, 'learning_rate': 3.892867991227445e-06, 'epoch': 0.33} 33%|███▎ | 923/2774 [3:02:10<5:56:46, 11.56s/it] 33%|███▎ | 924/2774 [3:02:21<5:55:28, 11.53s/it] {'loss': 1.0298, 'learning_rate': 3.890442486358332e-06, 'epoch': 0.33} 33%|███▎ | 924/2774 [3:02:21<5:55:28, 11.53s/it] 33%|███▎ | 925/2774 [3:02:33<5:57:33, 11.60s/it] {'loss': 1.125, 'learning_rate': 3.88801508500966e-06, 'epoch': 0.33} 33%|███▎ | 925/2774 [3:02:33<5:57:33, 11.60s/it] 33%|███▎ | 926/2774 [3:02:44<5:56:15, 11.57s/it] {'loss': 1.0981, 'learning_rate': 3.88558579049226e-06, 'epoch': 0.33} 33%|███▎ | 926/2774 [3:02:44<5:56:15, 11.57s/it] 33%|███▎ | 927/2774 [3:02:56<5:51:52, 11.43s/it] {'loss': 1.0635, 'learning_rate': 3.883154606119544e-06, 'epoch': 0.33} 33%|███▎ | 927/2774 [3:02:56<5:51:52, 11.43s/it] 33%|███▎ | 928/2774 [3:03:08<5:59:51, 11.70s/it] {'loss': 1.04, 'learning_rate': 3.8807215352074975e-06, 'epoch': 0.33} 33%|███▎ | 928/2774 [3:03:08<5:59:51, 11.70s/it] 33%|███▎ | 929/2774 [3:03:19<5:54:58, 11.54s/it] {'loss': 1.1113, 'learning_rate': 3.878286581074685e-06, 'epoch': 0.33} 33%|███▎ | 929/2774 [3:03:19<5:54:58, 11.54s/it] 
34%|███▎ | 930/2774 [3:03:31<5:54:33, 11.54s/it] {'loss': 1.0601, 'learning_rate': 3.875849747042236e-06, 'epoch': 0.34} 34%|███▎ | 930/2774 [3:03:31<5:54:33, 11.54s/it] 34%|███▎ | 931/2774 [3:03:42<5:54:06, 11.53s/it] {'loss': 1.0752, 'learning_rate': 3.873411036433845e-06, 'epoch': 0.34} 34%|███▎ | 931/2774 [3:03:42<5:54:06, 11.53s/it] 34%|███▎ | 932/2774 [3:03:54<5:56:48, 11.62s/it] {'loss': 1.0366, 'learning_rate': 3.870970452575765e-06, 'epoch': 0.34} 34%|███▎ | 932/2774 [3:03:54<5:56:48, 11.62s/it] 34%|███▎ | 933/2774 [3:04:05<5:54:47, 11.56s/it] {'loss': 1.0059, 'learning_rate': 3.868527998796807e-06, 'epoch': 0.34} 34%|███▎ | 933/2774 [3:04:05<5:54:47, 11.56s/it] 34%|███▎ | 934/2774 [3:04:17<5:57:50, 11.67s/it] {'loss': 1.0298, 'learning_rate': 3.866083678428328e-06, 'epoch': 0.34} 34%|███▎ | 934/2774 [3:04:17<5:57:50, 11.67s/it] 34%|███▎ | 935/2774 [3:04:29<5:53:55, 11.55s/it] {'loss': 1.0698, 'learning_rate': 3.863637494804235e-06, 'epoch': 0.34} 34%|███▎ | 935/2774 [3:04:29<5:53:55, 11.55s/it] 34%|███▎ | 936/2774 [3:04:40<5:56:39, 11.64s/it] {'loss': 1.0244, 'learning_rate': 3.861189451260974e-06, 'epoch': 0.34} 34%|███▎ | 936/2774 [3:04:40<5:56:39, 11.64s/it] 34%|███▍ | 937/2774 [3:04:52<5:56:47, 11.65s/it] {'loss': 1.0205, 'learning_rate': 3.858739551137528e-06, 'epoch': 0.34} 34%|███▍ | 937/2774 [3:04:52<5:56:47, 11.65s/it] 34%|███▍ | 938/2774 [3:05:05<6:08:59, 12.06s/it] {'loss': 0.9458, 'learning_rate': 3.856287797775414e-06, 'epoch': 0.34} 34%|███▍ | 938/2774 [3:05:05<6:08:59, 12.06s/it] 34%|███▍ | 939/2774 [3:05:16<6:00:52, 11.80s/it] {'loss': 1.0527, 'learning_rate': 3.853834194518675e-06, 'epoch': 0.34} 34%|███▍ | 939/2774 [3:05:16<6:00:52, 11.80s/it] 34%|███▍ | 940/2774 [3:05:27<5:55:03, 11.62s/it] {'loss': 1.0571, 'learning_rate': 3.851378744713879e-06, 'epoch': 0.34} 34%|███▍ | 940/2774 [3:05:27<5:55:03, 11.62s/it] 34%|███▍ | 941/2774 [3:05:40<5:59:38, 11.77s/it] {'loss': 1.0156, 'learning_rate': 3.848921451710107e-06, 'epoch': 0.34} 34%|███▍ 
| 941/2774 [3:05:40<5:59:38, 11.77s/it] 34%|███▍ | 942/2774 [3:05:51<5:58:10, 11.73s/it] {'loss': 1.0229, 'learning_rate': 3.846462318858962e-06, 'epoch': 0.34} 34%|███▍ | 942/2774 [3:05:51<5:58:10, 11.73s/it] 34%|███▍ | 943/2774 [3:06:03<5:55:28, 11.65s/it] {'loss': 1.0952, 'learning_rate': 3.844001349514553e-06, 'epoch': 0.34} 34%|███▍ | 943/2774 [3:06:03<5:55:28, 11.65s/it] 34%|███▍ | 944/2774 [3:06:14<5:52:24, 11.55s/it] {'loss': 1.0186, 'learning_rate': 3.8415385470334906e-06, 'epoch': 0.34} 34%|███▍ | 944/2774 [3:06:14<5:52:24, 11.55s/it] 34%|███▍ | 945/2774 [3:06:26<5:54:02, 11.61s/it] {'loss': 1.0308, 'learning_rate': 3.83907391477489e-06, 'epoch': 0.34} 34%|███▍ | 945/2774 [3:06:26<5:54:02, 11.61s/it] 34%|███▍ | 946/2774 [3:06:37<5:51:07, 11.52s/it] {'loss': 1.0142, 'learning_rate': 3.836607456100362e-06, 'epoch': 0.34} 34%|███▍ | 946/2774 [3:06:37<5:51:07, 11.52s/it] 34%|███▍ | 947/2774 [3:06:48<5:49:21, 11.47s/it] {'loss': 1.0366, 'learning_rate': 3.834139174374005e-06, 'epoch': 0.34} 34%|███▍ | 947/2774 [3:06:48<5:49:21, 11.47s/it] 34%|███▍ | 948/2774 [3:07:00<5:49:38, 11.49s/it] {'loss': 1.0117, 'learning_rate': 3.831669072962408e-06, 'epoch': 0.34} 34%|███▍ | 948/2774 [3:07:00<5:49:38, 11.49s/it] 34%|███▍ | 949/2774 [3:07:12<5:50:56, 11.54s/it] {'loss': 0.9795, 'learning_rate': 3.82919715523464e-06, 'epoch': 0.34} 34%|███▍ | 949/2774 [3:07:12<5:50:56, 11.54s/it] 34%|███▍ | 950/2774 [3:07:24<5:59:37, 11.83s/it] {'loss': 1.0293, 'learning_rate': 3.826723424562246e-06, 'epoch': 0.34} 34%|███▍ | 950/2774 [3:07:24<5:59:37, 11.83s/it] 34%|███▍ | 951/2774 [3:07:36<5:55:25, 11.70s/it] {'loss': 1.0405, 'learning_rate': 3.824247884319245e-06, 'epoch': 0.34} 34%|███▍ | 951/2774 [3:07:36<5:55:25, 11.70s/it] 34%|███▍ | 952/2774 [3:07:47<5:52:07, 11.60s/it] {'loss': 0.9946, 'learning_rate': 3.821770537882126e-06, 'epoch': 0.34} 34%|███▍ | 952/2774 [3:07:47<5:52:07, 11.60s/it] 34%|███▍ | 953/2774 [3:07:58<5:50:13, 11.54s/it] {'loss': 0.9727, 'learning_rate': 
3.81929138862984e-06, 'epoch': 0.34} 34%|███▍ | 953/2774 [3:07:58<5:50:13, 11.54s/it] 34%|███▍ | 954/2774 [3:08:10<5:51:34, 11.59s/it] {'loss': 1.019, 'learning_rate': 3.816810439943795e-06, 'epoch': 0.34} 34%|███▍ | 954/2774 [3:08:10<5:51:34, 11.59s/it] 34%|███▍ | 955/2774 [3:08:21<5:49:40, 11.53s/it] {'loss': 1.0215, 'learning_rate': 3.814327695207858e-06, 'epoch': 0.34} 34%|███▍ | 955/2774 [3:08:21<5:49:40, 11.53s/it] 34%|███▍ | 956/2774 [3:08:33<5:50:18, 11.56s/it] {'loss': 1.0723, 'learning_rate': 3.81184315780834e-06, 'epoch': 0.34} 34%|███▍ | 956/2774 [3:08:33<5:50:18, 11.56s/it] 34%|███▍ | 957/2774 [3:08:44<5:48:11, 11.50s/it] {'loss': 1.0044, 'learning_rate': 3.8093568311340007e-06, 'epoch': 0.34} 34%|███▍ | 957/2774 [3:08:44<5:48:11, 11.50s/it] 35%|███▍ | 958/2774 [3:08:56<5:46:10, 11.44s/it] {'loss': 1.0259, 'learning_rate': 3.8068687185760406e-06, 'epoch': 0.35} 35%|███▍ | 958/2774 [3:08:56<5:46:10, 11.44s/it] 35%|███▍ | 959/2774 [3:09:07<5:43:24, 11.35s/it] {'loss': 1.0493, 'learning_rate': 3.804378823528093e-06, 'epoch': 0.35} 35%|███▍ | 959/2774 [3:09:07<5:43:24, 11.35s/it] 35%|███▍ | 960/2774 [3:09:18<5:42:05, 11.32s/it] {'loss': 1.0259, 'learning_rate': 3.8018871493862265e-06, 'epoch': 0.35} 35%|███▍ | 960/2774 [3:09:18<5:42:05, 11.32s/it] 35%|███▍ | 961/2774 [3:09:32<6:01:07, 11.95s/it] {'loss': 1.0288, 'learning_rate': 3.799393699548932e-06, 'epoch': 0.35} 35%|███▍ | 961/2774 [3:09:32<6:01:07, 11.95s/it] 35%|███▍ | 962/2774 [3:09:43<5:55:38, 11.78s/it] {'loss': 1.0537, 'learning_rate': 3.7968984774171268e-06, 'epoch': 0.35} 35%|███▍ | 962/2774 [3:09:43<5:55:38, 11.78s/it] 35%|███▍ | 963/2774 [3:09:54<5:52:03, 11.66s/it] {'loss': 0.9907, 'learning_rate': 3.7944014863941415e-06, 'epoch': 0.35} 35%|███▍ | 963/2774 [3:09:54<5:52:03, 11.66s/it] 35%|███▍ | 964/2774 [3:10:06<5:49:29, 11.59s/it] {'loss': 1.0645, 'learning_rate': 3.7919027298857213e-06, 'epoch': 0.35} 35%|███▍ | 964/2774 [3:10:06<5:49:29, 11.59s/it] 35%|███▍ | 965/2774 [3:10:17<5:49:39, 
11.60s/it] {'loss': 1.0532, 'learning_rate': 3.7894022113000184e-06, 'epoch': 0.35} 35%|███▍ | 965/2774 [3:10:17<5:49:39, 11.60s/it] 35%|███▍ | 966/2774 [3:10:29<5:49:58, 11.61s/it] {'loss': 1.002, 'learning_rate': 3.786899934047591e-06, 'epoch': 0.35} 35%|███▍ | 966/2774 [3:10:29<5:49:58, 11.61s/it] 35%|███▍ | 967/2774 [3:10:41<5:49:10, 11.59s/it] {'loss': 1.0059, 'learning_rate': 3.784395901541393e-06, 'epoch': 0.35} 35%|███▍ | 967/2774 [3:10:41<5:49:10, 11.59s/it] 35%|███▍ | 968/2774 [3:10:52<5:46:54, 11.52s/it] {'loss': 1.0615, 'learning_rate': 3.7818901171967727e-06, 'epoch': 0.35} 35%|███▍ | 968/2774 [3:10:52<5:46:54, 11.52s/it] 35%|███▍ | 969/2774 [3:11:03<5:42:59, 11.40s/it] {'loss': 0.9937, 'learning_rate': 3.7793825844314702e-06, 'epoch': 0.35} 35%|███▍ | 969/2774 [3:11:03<5:42:59, 11.40s/it] 35%|███▍ | 970/2774 [3:11:15<5:45:06, 11.48s/it] {'loss': 1.0366, 'learning_rate': 3.7768733066656075e-06, 'epoch': 0.35} 35%|███▍ | 970/2774 [3:11:15<5:45:06, 11.48s/it] 35%|███▌ | 971/2774 [3:11:26<5:44:33, 11.47s/it] {'loss': 1.0361, 'learning_rate': 3.7743622873216886e-06, 'epoch': 0.35} 35%|███▌ | 971/2774 [3:11:26<5:44:33, 11.47s/it] 35%|███▌ | 972/2774 [3:11:38<5:44:20, 11.47s/it] {'loss': 1.0063, 'learning_rate': 3.7718495298245917e-06, 'epoch': 0.35} 35%|███▌ | 972/2774 [3:11:38<5:44:20, 11.47s/it] 35%|███▌ | 973/2774 [3:11:49<5:46:01, 11.53s/it] {'loss': 1.0449, 'learning_rate': 3.769335037601566e-06, 'epoch': 0.35} 35%|███▌ | 973/2774 [3:11:49<5:46:01, 11.53s/it] 35%|███▌ | 974/2774 [3:12:01<5:49:55, 11.66s/it] {'loss': 1.0537, 'learning_rate': 3.766818814082228e-06, 'epoch': 0.35} 35%|███▌ | 974/2774 [3:12:01<5:49:55, 11.66s/it] 35%|███▌ | 975/2774 [3:12:15<6:07:13, 12.25s/it] {'loss': 1.021, 'learning_rate': 3.7643008626985532e-06, 'epoch': 0.35} 35%|███▌ | 975/2774 [3:12:15<6:07:13, 12.25s/it] 35%|███▌ | 976/2774 [3:12:27<6:02:29, 12.10s/it] {'loss': 1.0664, 'learning_rate': 3.7617811868848774e-06, 'epoch': 0.35} 35%|███▌ | 976/2774 [3:12:27<6:02:29, 
12.10s/it] 35%|███▌ | 977/2774 [3:12:40<6:12:14, 12.43s/it] {'loss': 1.0269, 'learning_rate': 3.7592597900778836e-06, 'epoch': 0.35} 35%|███▌ | 977/2774 [3:12:40<6:12:14, 12.43s/it] 35%|███▌ | 978/2774 [3:12:54<6:27:49, 12.96s/it] {'loss': 0.9629, 'learning_rate': 3.756736675716606e-06, 'epoch': 0.35} 35%|███▌ | 978/2774 [3:12:54<6:27:49, 12.96s/it] 35%|███▌ | 979/2774 [3:13:06<6:17:48, 12.63s/it] {'loss': 1.0464, 'learning_rate': 3.7542118472424207e-06, 'epoch': 0.35} 35%|███▌ | 979/2774 [3:13:06<6:17:48, 12.63s/it] 35%|███▌ | 980/2774 [3:13:17<6:08:14, 12.32s/it] {'loss': 1.0469, 'learning_rate': 3.7516853080990403e-06, 'epoch': 0.35} 35%|███▌ | 980/2774 [3:13:17<6:08:14, 12.32s/it] 35%|███▌ | 981/2774 [3:13:28<5:56:31, 11.93s/it] {'loss': 1.002, 'learning_rate': 3.749157061732511e-06, 'epoch': 0.35} 35%|███▌ | 981/2774 [3:13:28<5:56:31, 11.93s/it] 35%|███▌ | 982/2774 [3:13:40<5:54:02, 11.85s/it] {'loss': 1.0425, 'learning_rate': 3.74662711159121e-06, 'epoch': 0.35} 35%|███▌ | 982/2774 [3:13:40<5:54:02, 11.85s/it] 35%|███▌ | 983/2774 [3:13:52<5:58:39, 12.02s/it] {'loss': 1.0137, 'learning_rate': 3.744095461125835e-06, 'epoch': 0.35} 35%|███▌ | 983/2774 [3:13:52<5:58:39, 12.02s/it] 35%|███▌ | 984/2774 [3:14:04<5:55:05, 11.90s/it] {'loss': 1.0259, 'learning_rate': 3.7415621137894055e-06, 'epoch': 0.35} 35%|███▌ | 984/2774 [3:14:04<5:55:05, 11.90s/it] 36%|███▌ | 985/2774 [3:14:15<5:47:28, 11.65s/it] {'loss': 1.0522, 'learning_rate': 3.739027073037253e-06, 'epoch': 0.36} 36%|███▌ | 985/2774 [3:14:15<5:47:28, 11.65s/it] 36%|███▌ | 986/2774 [3:14:29<6:03:34, 12.20s/it] {'loss': 1.0464, 'learning_rate': 3.7364903423270204e-06, 'epoch': 0.36} 36%|███▌ | 986/2774 [3:14:29<6:03:34, 12.20s/it] 36%|███▌ | 987/2774 [3:14:41<6:00:00, 12.09s/it] {'loss': 1.0625, 'learning_rate': 3.733951925118655e-06, 'epoch': 0.36} 36%|███▌ | 987/2774 [3:14:41<6:00:00, 12.09s/it] 36%|███▌ | 988/2774 [3:14:52<5:56:26, 11.97s/it] {'loss': 1.0444, 'learning_rate': 3.7314118248744045e-06, 'epoch': 
0.36} 36%|███▌ | 988/2774 [3:14:52<5:56:26, 11.97s/it] 36%|███▌ | 989/2774 [3:15:04<5:53:13, 11.87s/it] {'loss': 1.0332, 'learning_rate': 3.7288700450588134e-06, 'epoch': 0.36} 36%|███▌ | 989/2774 [3:15:04<5:53:13, 11.87s/it] 36%|███▌ | 990/2774 [3:15:15<5:49:19, 11.75s/it] {'loss': 1.106, 'learning_rate': 3.726326589138714e-06, 'epoch': 0.36} 36%|███▌ | 990/2774 [3:15:15<5:49:19, 11.75s/it] 36%|███▌ | 991/2774 [3:15:27<5:48:18, 11.72s/it] {'loss': 1.0371, 'learning_rate': 3.723781460583228e-06, 'epoch': 0.36} 36%|███▌ | 991/2774 [3:15:27<5:48:18, 11.72s/it] 36%|███▌ | 992/2774 [3:15:38<5:44:06, 11.59s/it] {'loss': 1.0503, 'learning_rate': 3.7212346628637557e-06, 'epoch': 0.36} 36%|███▌ | 992/2774 [3:15:38<5:44:06, 11.59s/it] 36%|███▌ | 993/2774 [3:15:50<5:42:11, 11.53s/it] {'loss': 1.042, 'learning_rate': 3.7186861994539763e-06, 'epoch': 0.36} 36%|███▌ | 993/2774 [3:15:50<5:42:11, 11.53s/it] 36%|███▌ | 994/2774 [3:16:02<5:48:00, 11.73s/it] {'loss': 1.0264, 'learning_rate': 3.716136073829839e-06, 'epoch': 0.36} 36%|███▌ | 994/2774 [3:16:02<5:48:00, 11.73s/it] 36%|███▌ | 995/2774 [3:16:14<5:55:14, 11.98s/it] {'loss': 1.0088, 'learning_rate': 3.713584289469563e-06, 'epoch': 0.36} 36%|███▌ | 995/2774 [3:16:14<5:55:14, 11.98s/it] 36%|███▌ | 996/2774 [3:16:26<5:51:37, 11.87s/it] {'loss': 1.0127, 'learning_rate': 3.7110308498536264e-06, 'epoch': 0.36} 36%|███▌ | 996/2774 [3:16:26<5:51:37, 11.87s/it] 36%|███▌ | 997/2774 [3:16:37<5:46:27, 11.70s/it] {'loss': 1.0225, 'learning_rate': 3.7084757584647662e-06, 'epoch': 0.36} 36%|███▌ | 997/2774 [3:16:37<5:46:27, 11.70s/it] 36%|███▌ | 998/2774 [3:16:49<5:46:26, 11.70s/it] {'loss': 1.1157, 'learning_rate': 3.705919018787974e-06, 'epoch': 0.36} 36%|███▌ | 998/2774 [3:16:49<5:46:26, 11.70s/it] 36%|███▌ | 999/2774 [3:17:00<5:44:02, 11.63s/it] {'loss': 1.0571, 'learning_rate': 3.7033606343104877e-06, 'epoch': 0.36} 36%|███▌ | 999/2774 [3:17:00<5:44:02, 11.63s/it] 36%|███▌ | 1000/2774 [3:17:12<5:40:08, 11.50s/it] {'loss': 1.0698, 
'learning_rate': 3.700800608521789e-06, 'epoch': 0.36} 36%|███▌ | 1000/2774 [3:17:12<5:40:08, 11.50s/it] 36%|███▌ | 1001/2774 [3:17:23<5:38:05, 11.44s/it] {'loss': 1.0522, 'learning_rate': 3.6982389449135986e-06, 'epoch': 0.36} 36%|███▌ | 1001/2774 [3:17:23<5:38:05, 11.44s/it] 36%|███▌ | 1002/2774 [3:17:35<5:41:53, 11.58s/it] {'loss': 1.0659, 'learning_rate': 3.695675646979871e-06, 'epoch': 0.36} 36%|███▌ | 1002/2774 [3:17:35<5:41:53, 11.58s/it] 36%|███▌ | 1003/2774 [3:17:47<5:42:52, 11.62s/it] {'loss': 1.0371, 'learning_rate': 3.6931107182167904e-06, 'epoch': 0.36} 36%|███▌ | 1003/2774 [3:17:47<5:42:52, 11.62s/it] 36%|███▌ | 1004/2774 [3:17:58<5:44:56, 11.69s/it] {'loss': 1.0786, 'learning_rate': 3.690544162122763e-06, 'epoch': 0.36} 36%|███▌ | 1004/2774 [3:17:58<5:44:56, 11.69s/it] 36%|███▌ | 1005/2774 [3:18:10<5:46:25, 11.75s/it] {'loss': 1.0322, 'learning_rate': 3.6879759821984175e-06, 'epoch': 0.36} 36%|███▌ | 1005/2774 [3:18:10<5:46:25, 11.75s/it] 36%|███▋ | 1006/2774 [3:18:22<5:48:25, 11.82s/it] {'loss': 1.0098, 'learning_rate': 3.685406181946596e-06, 'epoch': 0.36} 36%|███▋ | 1006/2774 [3:18:22<5:48:25, 11.82s/it] 36%|███▋ | 1007/2774 [3:18:34<5:43:41, 11.67s/it] {'loss': 1.04, 'learning_rate': 3.682834764872351e-06, 'epoch': 0.36} 36%|███▋ | 1007/2774 [3:18:34<5:43:41, 11.67s/it] 36%|███▋ | 1008/2774 [3:18:45<5:40:23, 11.57s/it] {'loss': 1.0801, 'learning_rate': 3.6802617344829393e-06, 'epoch': 0.36} 36%|███▋ | 1008/2774 [3:18:45<5:40:23, 11.57s/it] 36%|███▋ | 1009/2774 [3:18:56<5:37:06, 11.46s/it] {'loss': 0.9507, 'learning_rate': 3.6776870942878196e-06, 'epoch': 0.36} 36%|███▋ | 1009/2774 [3:18:56<5:37:06, 11.46s/it] 36%|███▋ | 1010/2774 [3:19:09<5:50:14, 11.91s/it] {'loss': 0.9746, 'learning_rate': 3.675110847798645e-06, 'epoch': 0.36} 36%|███▋ | 1010/2774 [3:19:09<5:50:14, 11.91s/it] 36%|███▋ | 1011/2774 [3:19:21<5:49:30, 11.89s/it] {'loss': 1.0527, 'learning_rate': 3.6725329985292614e-06, 'epoch': 0.36} 36%|███▋ | 1011/2774 [3:19:21<5:49:30, 
11.89s/it] 36%|███▋ | 1012/2774 [3:19:33<5:52:26, 12.00s/it] {'loss': 0.9746, 'learning_rate': 3.669953549995698e-06, 'epoch': 0.36} 36%|███▋ | 1012/2774 [3:19:33<5:52:26, 12.00s/it] 37%|███▋ | 1013/2774 [3:19:45<5:47:25, 11.84s/it] {'loss': 1.0317, 'learning_rate': 3.6673725057161676e-06, 'epoch': 0.37} 37%|███▋ | 1013/2774 [3:19:45<5:47:25, 11.84s/it] 37%|███▋ | 1014/2774 [3:19:57<5:47:17, 11.84s/it] {'loss': 1.0029, 'learning_rate': 3.6647898692110578e-06, 'epoch': 0.37} 37%|███▋ | 1014/2774 [3:19:57<5:47:17, 11.84s/it] 37%|███▋ | 1015/2774 [3:20:08<5:43:11, 11.71s/it] {'loss': 1.0073, 'learning_rate': 3.6622056440029303e-06, 'epoch': 0.37} 37%|███▋ | 1015/2774 [3:20:08<5:43:11, 11.71s/it] 37%|███▋ | 1016/2774 [3:20:19<5:41:22, 11.65s/it] {'loss': 1.0317, 'learning_rate': 3.6596198336165107e-06, 'epoch': 0.37} 37%|███▋ | 1016/2774 [3:20:19<5:41:22, 11.65s/it] 37%|███▋ | 1017/2774 [3:20:31<5:38:46, 11.57s/it] {'loss': 1.0498, 'learning_rate': 3.657032441578689e-06, 'epoch': 0.37} 37%|███▋ | 1017/2774 [3:20:31<5:38:46, 11.57s/it] 37%|███▋ | 1018/2774 [3:20:44<5:48:59, 11.92s/it] {'loss': 0.9985, 'learning_rate': 3.6544434714185117e-06, 'epoch': 0.37} 37%|███▋ | 1018/2774 [3:20:44<5:48:59, 11.92s/it] 37%|███▋ | 1019/2774 [3:20:55<5:45:39, 11.82s/it] {'loss': 1.0889, 'learning_rate': 3.6518529266671764e-06, 'epoch': 0.37} 37%|███▋ | 1019/2774 [3:20:55<5:45:39, 11.82s/it] 37%|███▋ | 1020/2774 [3:21:09<6:05:26, 12.50s/it] {'loss': 1.0239, 'learning_rate': 3.649260810858031e-06, 'epoch': 0.37} 37%|███▋ | 1020/2774 [3:21:09<6:05:26, 12.50s/it] 37%|███▋ | 1021/2774 [3:21:22<6:05:27, 12.51s/it] {'loss': 1.0254, 'learning_rate': 3.6466671275265653e-06, 'epoch': 0.37} 37%|███▋ | 1021/2774 [3:21:22<6:05:27, 12.51s/it] 37%|███▋ | 1022/2774 [3:21:35<6:14:38, 12.83s/it] {'loss': 1.0371, 'learning_rate': 3.644071880210405e-06, 'epoch': 0.37} 37%|███▋ | 1022/2774 [3:21:35<6:14:38, 12.83s/it] 37%|███▋ | 1023/2774 [3:21:47<6:01:38, 12.39s/it] {'loss': 1.0396, 'learning_rate': 
3.641475072449312e-06, 'epoch': 0.37} 37%|███▋ | 1023/2774 [3:21:47<6:01:38, 12.39s/it] 37%|███▋ | 1024/2774 [3:21:58<5:52:25, 12.08s/it] {'loss': 1.0586, 'learning_rate': 3.6388767077851745e-06, 'epoch': 0.37} 37%|███▋ | 1024/2774 [3:21:58<5:52:25, 12.08s/it] 37%|███▋ | 1025/2774 [3:22:10<5:49:04, 11.97s/it] {'loss': 1.0269, 'learning_rate': 3.6362767897620054e-06, 'epoch': 0.37} 37%|███▋ | 1025/2774 [3:22:10<5:49:04, 11.97s/it] 37%|███▋ | 1026/2774 [3:22:22<5:46:38, 11.90s/it] {'loss': 1.0015, 'learning_rate': 3.633675321925936e-06, 'epoch': 0.37} 37%|███▋ | 1026/2774 [3:22:22<5:46:38, 11.90s/it] 37%|███▋ | 1027/2774 [3:22:33<5:41:06, 11.72s/it] {'loss': 1.103, 'learning_rate': 3.6310723078252103e-06, 'epoch': 0.37} 37%|███▋ | 1027/2774 [3:22:33<5:41:06, 11.72s/it] 37%|███▋ | 1028/2774 [3:22:44<5:39:22, 11.66s/it] {'loss': 1.0278, 'learning_rate': 3.6284677510101827e-06, 'epoch': 0.37} 37%|███▋ | 1028/2774 [3:22:44<5:39:22, 11.66s/it] 37%|███▋ | 1029/2774 [3:22:57<5:47:25, 11.95s/it] {'loss': 1.04, 'learning_rate': 3.6258616550333128e-06, 'epoch': 0.37} 37%|███▋ | 1029/2774 [3:22:57<5:47:25, 11.95s/it] 37%|███▋ | 1030/2774 [3:23:08<5:41:31, 11.75s/it] {'loss': 1.042, 'learning_rate': 3.623254023449156e-06, 'epoch': 0.37} 37%|███▋ | 1030/2774 [3:23:08<5:41:31, 11.75s/it] 37%|███▋ | 1031/2774 [3:23:20<5:39:36, 11.69s/it] {'loss': 1.061, 'learning_rate': 3.620644859814365e-06, 'epoch': 0.37} 37%|███▋ | 1031/2774 [3:23:20<5:39:36, 11.69s/it] 37%|███▋ | 1032/2774 [3:23:31<5:33:35, 11.49s/it] {'loss': 1.0278, 'learning_rate': 3.6180341676876818e-06, 'epoch': 0.37} 37%|███▋ | 1032/2774 [3:23:31<5:33:35, 11.49s/it] 37%|███▋ | 1033/2774 [3:23:43<5:38:03, 11.65s/it] {'loss': 1.0215, 'learning_rate': 3.615421950629932e-06, 'epoch': 0.37} 37%|███▋ | 1033/2774 [3:23:43<5:38:03, 11.65s/it] 37%|███▋ | 1034/2774 [3:23:56<5:49:39, 12.06s/it] {'loss': 1.0264, 'learning_rate': 3.6128082122040224e-06, 'epoch': 0.37} 37%|███▋ | 1034/2774 [3:23:56<5:49:39, 12.06s/it] 37%|███▋ | 
1035/2774 [3:24:07<5:41:18, 11.78s/it] {'loss': 0.9805, 'learning_rate': 3.610192955974935e-06, 'epoch': 0.37} 37%|███▋ | 1035/2774 [3:24:07<5:41:18, 11.78s/it] 37%|███▋ | 1036/2774 [3:24:18<5:38:32, 11.69s/it] {'loss': 1.0176, 'learning_rate': 3.60757618550972e-06, 'epoch': 0.37} 37%|███▋ | 1036/2774 [3:24:18<5:38:32, 11.69s/it] 37%|███▋ | 1037/2774 [3:24:30<5:34:56, 11.57s/it] {'loss': 1.0298, 'learning_rate': 3.6049579043774946e-06, 'epoch': 0.37} 37%|███▋ | 1037/2774 [3:24:30<5:34:56, 11.57s/it] 37%|███▋ | 1038/2774 [3:24:41<5:33:58, 11.54s/it] {'loss': 1.0312, 'learning_rate': 3.602338116149437e-06, 'epoch': 0.37} 37%|███▋ | 1038/2774 [3:24:41<5:33:58, 11.54s/it] 37%|███▋ | 1039/2774 [3:24:52<5:30:31, 11.43s/it] {'loss': 1.0151, 'learning_rate': 3.599716824398779e-06, 'epoch': 0.37} 37%|███▋ | 1039/2774 [3:24:52<5:30:31, 11.43s/it] 37%|███▋ | 1040/2774 [3:25:04<5:28:57, 11.38s/it] {'loss': 1.0146, 'learning_rate': 3.5970940327008043e-06, 'epoch': 0.37} 37%|███▋ | 1040/2774 [3:25:04<5:28:57, 11.38s/it] 38%|███▊ | 1041/2774 [3:25:15<5:28:37, 11.38s/it] {'loss': 1.0918, 'learning_rate': 3.594469744632843e-06, 'epoch': 0.38} 38%|███▊ | 1041/2774 [3:25:15<5:28:37, 11.38s/it] 38%|███▊ | 1042/2774 [3:25:26<5:28:18, 11.37s/it] {'loss': 1.0337, 'learning_rate': 3.5918439637742648e-06, 'epoch': 0.38} 38%|███▊ | 1042/2774 [3:25:26<5:28:18, 11.37s/it] 38%|███▊ | 1043/2774 [3:25:38<5:31:47, 11.50s/it] {'loss': 1.04, 'learning_rate': 3.5892166937064765e-06, 'epoch': 0.38} 38%|███▊ | 1043/2774 [3:25:38<5:31:47, 11.50s/it] 38%|███▊ | 1044/2774 [3:25:50<5:33:17, 11.56s/it] {'loss': 1.0474, 'learning_rate': 3.5865879380129157e-06, 'epoch': 0.38} 38%|███▊ | 1044/2774 [3:25:50<5:33:17, 11.56s/it] 38%|███▊ | 1045/2774 [3:26:02<5:34:19, 11.60s/it] {'loss': 1.0444, 'learning_rate': 3.583957700279047e-06, 'epoch': 0.38} 38%|███▊ | 1045/2774 [3:26:02<5:34:19, 11.60s/it] 38%|███▊ | 1046/2774 [3:26:14<5:40:41, 11.83s/it] {'loss': 1.0015, 'learning_rate': 3.5813259840923543e-06, 'epoch': 
0.38} 38%|███▊ | 1046/2774 [3:26:14<5:40:41, 11.83s/it] 38%|███▊ | 1047/2774 [3:26:25<5:37:40, 11.73s/it] {'loss': 1.0195, 'learning_rate': 3.5786927930423408e-06, 'epoch': 0.38} 38%|███▊ | 1047/2774 [3:26:25<5:37:40, 11.73s/it] 38%|███▊ | 1048/2774 [3:26:37<5:35:49, 11.67s/it] {'loss': 1.0327, 'learning_rate': 3.57605813072052e-06, 'epoch': 0.38} 38%|███▊ | 1048/2774 [3:26:37<5:35:49, 11.67s/it] 38%|███▊ | 1049/2774 [3:26:49<5:37:25, 11.74s/it] {'loss': 1.0444, 'learning_rate': 3.5734220007204114e-06, 'epoch': 0.38} 38%|███▊ | 1049/2774 [3:26:49<5:37:25, 11.74s/it] 38%|███▊ | 1050/2774 [3:27:01<5:40:02, 11.83s/it] {'loss': 1.0054, 'learning_rate': 3.5707844066375373e-06, 'epoch': 0.38} 38%|███▊ | 1050/2774 [3:27:01<5:40:02, 11.83s/it] 38%|███▊ | 1051/2774 [3:27:13<5:45:15, 12.02s/it] {'loss': 1.0435, 'learning_rate': 3.5681453520694164e-06, 'epoch': 0.38} 38%|███▊ | 1051/2774 [3:27:13<5:45:15, 12.02s/it] 38%|███▊ | 1052/2774 [3:27:25<5:38:11, 11.78s/it] {'loss': 1.02, 'learning_rate': 3.565504840615561e-06, 'epoch': 0.38} 38%|███▊ | 1052/2774 [3:27:25<5:38:11, 11.78s/it] 38%|███▊ | 1053/2774 [3:27:36<5:35:54, 11.71s/it] {'loss': 1.1133, 'learning_rate': 3.5628628758774685e-06, 'epoch': 0.38} 38%|███▊ | 1053/2774 [3:27:36<5:35:54, 11.71s/it] 38%|███▊ | 1054/2774 [3:27:48<5:33:40, 11.64s/it] {'loss': 1.0444, 'learning_rate': 3.5602194614586184e-06, 'epoch': 0.38} 38%|███▊ | 1054/2774 [3:27:48<5:33:40, 11.64s/it] 38%|███▊ | 1055/2774 [3:28:01<5:50:20, 12.23s/it] {'loss': 0.9902, 'learning_rate': 3.5575746009644696e-06, 'epoch': 0.38} 38%|███▊ | 1055/2774 [3:28:01<5:50:20, 12.23s/it] 38%|███▊ | 1056/2774 [3:28:13<5:46:25, 12.10s/it] {'loss': 0.9507, 'learning_rate': 3.554928298002451e-06, 'epoch': 0.38} 38%|███▊ | 1056/2774 [3:28:13<5:46:25, 12.10s/it] 38%|███▊ | 1057/2774 [3:28:26<5:51:55, 12.30s/it] {'loss': 1.0088, 'learning_rate': 3.5522805561819605e-06, 'epoch': 0.38} 38%|███▊ | 1057/2774 [3:28:26<5:51:55, 12.30s/it] 38%|███▊ | 1058/2774 [3:28:37<5:42:57, 
11.99s/it] {'loss': 1.0659, 'learning_rate': 3.5496313791143578e-06, 'epoch': 0.38} 38%|███▊ | 1058/2774 [3:28:37<5:42:57, 11.99s/it] 38%|███▊ | 1059/2774 [3:28:49<5:38:38, 11.85s/it] {'loss': 1.0249, 'learning_rate': 3.54698077041296e-06, 'epoch': 0.38} 38%|███▊ | 1059/2774 [3:28:49<5:38:38, 11.85s/it] 38%|███▊ | 1060/2774 [3:29:00<5:33:26, 11.67s/it] {'loss': 0.9556, 'learning_rate': 3.544328733693038e-06, 'epoch': 0.38} 38%|███▊ | 1060/2774 [3:29:00<5:33:26, 11.67s/it] 38%|███▊ | 1061/2774 [3:29:11<5:30:49, 11.59s/it] {'loss': 1.0688, 'learning_rate': 3.54167527257181e-06, 'epoch': 0.38} 38%|███▊ | 1061/2774 [3:29:11<5:30:49, 11.59s/it] 38%|███▊ | 1062/2774 [3:29:24<5:38:37, 11.87s/it] {'loss': 0.979, 'learning_rate': 3.5390203906684356e-06, 'epoch': 0.38} 38%|███▊ | 1062/2774 [3:29:24<5:38:37, 11.87s/it] 38%|███▊ | 1063/2774 [3:29:35<5:35:54, 11.78s/it] {'loss': 1.0298, 'learning_rate': 3.5363640916040137e-06, 'epoch': 0.38} 38%|███▊ | 1063/2774 [3:29:35<5:35:54, 11.78s/it] 38%|███▊ | 1064/2774 [3:29:47<5:32:46, 11.68s/it] {'loss': 1.0249, 'learning_rate': 3.533706379001577e-06, 'epoch': 0.38} 38%|███▊ | 1064/2774 [3:29:47<5:32:46, 11.68s/it] 38%|███▊ | 1065/2774 [3:29:58<5:32:35, 11.68s/it] {'loss': 1.0166, 'learning_rate': 3.531047256486082e-06, 'epoch': 0.38} 38%|███▊ | 1065/2774 [3:29:58<5:32:35, 11.68s/it] 38%|███▊ | 1066/2774 [3:30:10<5:29:20, 11.57s/it] {'loss': 1.0186, 'learning_rate': 3.5283867276844147e-06, 'epoch': 0.38} 38%|███▊ | 1066/2774 [3:30:10<5:29:20, 11.57s/it] 38%|███▊ | 1067/2774 [3:30:22<5:30:57, 11.63s/it] {'loss': 1.0537, 'learning_rate': 3.5257247962253727e-06, 'epoch': 0.38} 38%|███▊ | 1067/2774 [3:30:22<5:30:57, 11.63s/it] 39%|███▊ | 1068/2774 [3:30:33<5:30:24, 11.62s/it] {'loss': 1.0591, 'learning_rate': 3.523061465739671e-06, 'epoch': 0.39} 39%|███▊ | 1068/2774 [3:30:33<5:30:24, 11.62s/it] 39%|███▊ | 1069/2774 [3:30:45<5:29:59, 11.61s/it] {'loss': 0.9956, 'learning_rate': 3.520396739859932e-06, 'epoch': 0.39} 39%|███▊ | 1069/2774 
[3:30:45<5:29:59, 11.61s/it] 39%|███▊ | 1070/2774 [3:30:59<5:48:07, 12.26s/it] {'loss': 1.0176, 'learning_rate': 3.5177306222206797e-06, 'epoch': 0.39} 39%|███▊ | 1070/2774 [3:30:59<5:48:07, 12.26s/it] 39%|███▊ | 1071/2774 [3:31:10<5:45:15, 12.16s/it] {'loss': 0.9819, 'learning_rate': 3.5150631164583393e-06, 'epoch': 0.39} 39%|███▊ | 1071/2774 [3:31:10<5:45:15, 12.16s/it] 39%|███▊ | 1072/2774 [3:31:22<5:38:43, 11.94s/it] {'loss': 1.0278, 'learning_rate': 3.5123942262112255e-06, 'epoch': 0.39} 39%|███▊ | 1072/2774 [3:31:22<5:38:43, 11.94s/it] 39%|███▊ | 1073/2774 [3:31:34<5:41:35, 12.05s/it] {'loss': 1.0381, 'learning_rate': 3.509723955119544e-06, 'epoch': 0.39} 39%|███▊ | 1073/2774 [3:31:34<5:41:35, 12.05s/it] 39%|███▊ | 1074/2774 [3:31:46<5:38:29, 11.95s/it] {'loss': 1.0088, 'learning_rate': 3.5070523068253835e-06, 'epoch': 0.39} 39%|███▊ | 1074/2774 [3:31:46<5:38:29, 11.95s/it] 39%|███▉ | 1075/2774 [3:31:58<5:36:06, 11.87s/it] {'loss': 1.042, 'learning_rate': 3.5043792849727116e-06, 'epoch': 0.39} 39%|███▉ | 1075/2774 [3:31:58<5:36:06, 11.87s/it] 39%|███▉ | 1076/2774 [3:32:09<5:31:43, 11.72s/it] {'loss': 1.0908, 'learning_rate': 3.5017048932073674e-06, 'epoch': 0.39} 39%|███▉ | 1076/2774 [3:32:09<5:31:43, 11.72s/it] 39%|███▉ | 1077/2774 [3:32:24<5:55:37, 12.57s/it] {'loss': 1.0137, 'learning_rate': 3.49902913517706e-06, 'epoch': 0.39} 39%|███▉ | 1077/2774 [3:32:24<5:55:37, 12.57s/it] 39%|███▉ | 1078/2774 [3:32:35<5:42:48, 12.13s/it] {'loss': 1.083, 'learning_rate': 3.496352014531361e-06, 'epoch': 0.39} 39%|███▉ | 1078/2774 [3:32:35<5:42:48, 12.13s/it] 39%|███▉ | 1079/2774 [3:32:46<5:34:12, 11.83s/it] {'loss': 1.0103, 'learning_rate': 3.493673534921703e-06, 'epoch': 0.39} 39%|███▉ | 1079/2774 [3:32:46<5:34:12, 11.83s/it] 39%|███▉ | 1080/2774 [3:32:57<5:29:39, 11.68s/it] {'loss': 1.04, 'learning_rate': 3.4909937000013706e-06, 'epoch': 0.39} 39%|███▉ | 1080/2774 [3:32:57<5:29:39, 11.68s/it] 39%|███▉ | 1081/2774 [3:33:08<5:25:56, 11.55s/it] {'loss': 0.9487, 
'learning_rate': 3.488312513425495e-06, 'epoch': 0.39} 39%|███▉ | 1081/2774 [3:33:08<5:25:56, 11.55s/it] 39%|███▉ | 1082/2774 [3:33:20<5:25:35, 11.55s/it] {'loss': 1.0898, 'learning_rate': 3.485629978851053e-06, 'epoch': 0.39} 39%|███▉ | 1082/2774 [3:33:20<5:25:35, 11.55s/it] 39%|███▉ | 1083/2774 [3:33:32<5:34:17, 11.86s/it] {'loss': 1.0166, 'learning_rate': 3.4829460999368597e-06, 'epoch': 0.39} 39%|███▉ | 1083/2774 [3:33:32<5:34:17, 11.86s/it] 39%|███▉ | 1084/2774 [3:33:49<6:11:28, 13.19s/it] {'loss': 1.0083, 'learning_rate': 3.480260880343565e-06, 'epoch': 0.39} 39%|███▉ | 1084/2774 [3:33:49<6:11:28, 13.19s/it] 39%|███▉ | 1085/2774 [3:34:00<5:56:42, 12.67s/it] {'loss': 1.0151, 'learning_rate': 3.477574323733645e-06, 'epoch': 0.39} 39%|███▉ | 1085/2774 [3:34:00<5:56:42, 12.67s/it] 39%|███▉ | 1086/2774 [3:34:11<5:43:50, 12.22s/it] {'loss': 1.0444, 'learning_rate': 3.474886433771401e-06, 'epoch': 0.39} 39%|███▉ | 1086/2774 [3:34:11<5:43:50, 12.22s/it] 39%|███▉ | 1087/2774 [3:34:23<5:42:41, 12.19s/it] {'loss': 1.043, 'learning_rate': 3.472197214122953e-06, 'epoch': 0.39} 39%|███▉ | 1087/2774 [3:34:23<5:42:41, 12.19s/it] 39%|███▉ | 1088/2774 [3:34:35<5:37:13, 12.00s/it] {'loss': 1.0225, 'learning_rate': 3.469506668456234e-06, 'epoch': 0.39} 39%|███▉ | 1088/2774 [3:34:35<5:37:13, 12.00s/it] 39%|███▉ | 1089/2774 [3:34:48<5:46:40, 12.34s/it] {'loss': 0.9492, 'learning_rate': 3.466814800440985e-06, 'epoch': 0.39} 39%|███▉ | 1089/2774 [3:34:48<5:46:40, 12.34s/it] 39%|███▉ | 1090/2774 [3:35:00<5:44:19, 12.27s/it] {'loss': 1.0264, 'learning_rate': 3.464121613748752e-06, 'epoch': 0.39} 39%|███▉ | 1090/2774 [3:35:00<5:44:19, 12.27s/it] 39%|███▉ | 1091/2774 [3:35:12<5:37:59, 12.05s/it] {'loss': 1.041, 'learning_rate': 3.4614271120528787e-06, 'epoch': 0.39} 39%|███▉ | 1091/2774 [3:35:12<5:37:59, 12.05s/it] 39%|███▉ | 1092/2774 [3:35:23<5:31:24, 11.82s/it] {'loss': 1.022, 'learning_rate': 3.458731299028503e-06, 'epoch': 0.39} 39%|███▉ | 1092/2774 [3:35:23<5:31:24, 11.82s/it] 
39%|███▉ | 1093/2774 [3:35:34<5:26:34, 11.66s/it] {'loss': 1.0337, 'learning_rate': 3.456034178352551e-06, 'epoch': 0.39} 39%|███▉ | 1093/2774 [3:35:34<5:26:34, 11.66s/it] 39%|███▉ | 1094/2774 [3:35:46<5:25:46, 11.63s/it] {'loss': 1.0239, 'learning_rate': 3.4533357537037315e-06, 'epoch': 0.39} 39%|███▉ | 1094/2774 [3:35:46<5:25:46, 11.63s/it] 39%|███▉ | 1095/2774 [3:35:57<5:21:37, 11.49s/it] {'loss': 1.0645, 'learning_rate': 3.4506360287625337e-06, 'epoch': 0.39} 39%|███▉ | 1095/2774 [3:35:57<5:21:37, 11.49s/it] 40%|███▉ | 1096/2774 [3:36:08<5:19:41, 11.43s/it] {'loss': 1.0762, 'learning_rate': 3.4479350072112183e-06, 'epoch': 0.4} 40%|███▉ | 1096/2774 [3:36:08<5:19:41, 11.43s/it] 40%|███▉ | 1097/2774 [3:36:20<5:22:38, 11.54s/it] {'loss': 1.0298, 'learning_rate': 3.445232692733817e-06, 'epoch': 0.4} 40%|███▉ | 1097/2774 [3:36:20<5:22:38, 11.54s/it] 40%|███▉ | 1098/2774 [3:36:32<5:20:36, 11.48s/it] {'loss': 1.0229, 'learning_rate': 3.442529089016123e-06, 'epoch': 0.4} 40%|███▉ | 1098/2774 [3:36:32<5:20:36, 11.48s/it] 40%|███▉ | 1099/2774 [3:36:43<5:19:40, 11.45s/it] {'loss': 0.9966, 'learning_rate': 3.439824199745688e-06, 'epoch': 0.4} 40%|███▉ | 1099/2774 [3:36:43<5:19:40, 11.45s/it] 40%|███▉ | 1100/2774 [3:36:54<5:17:08, 11.37s/it] {'loss': 1.0884, 'learning_rate': 3.4371180286118172e-06, 'epoch': 0.4} 40%|███▉ | 1100/2774 [3:36:54<5:17:08, 11.37s/it] 40%|███▉ | 1101/2774 [3:37:06<5:20:37, 11.50s/it] {'loss': 1.021, 'learning_rate': 3.434410579305565e-06, 'epoch': 0.4} 40%|███▉ | 1101/2774 [3:37:06<5:20:37, 11.50s/it] 40%|███▉ | 1102/2774 [3:37:17<5:20:17, 11.49s/it] {'loss': 1.0591, 'learning_rate': 3.4317018555197303e-06, 'epoch': 0.4} 40%|███▉ | 1102/2774 [3:37:17<5:20:17, 11.49s/it] 40%|███▉ | 1103/2774 [3:37:29<5:24:06, 11.64s/it] {'loss': 1.0356, 'learning_rate': 3.4289918609488453e-06, 'epoch': 0.4} 40%|███▉ | 1103/2774 [3:37:29<5:24:06, 11.64s/it] 40%|███▉ | 1104/2774 [3:37:41<5:21:41, 11.56s/it] {'loss': 1.0093, 'learning_rate': 3.426280599289182e-06, 
'epoch': 0.4} 40%|███▉ | 1104/2774 [3:37:41<5:21:41, 11.56s/it] 40%|███▉ | 1105/2774 [3:37:52<5:22:02, 11.58s/it] {'loss': 1.064, 'learning_rate': 3.4235680742387355e-06, 'epoch': 0.4} 40%|███▉ | 1105/2774 [3:37:52<5:22:02, 11.58s/it] 40%|███▉ | 1106/2774 [3:38:03<5:17:48, 11.43s/it] {'loss': 0.9858, 'learning_rate': 3.4208542894972272e-06, 'epoch': 0.4} 40%|███▉ | 1106/2774 [3:38:03<5:17:48, 11.43s/it] 40%|███▉ | 1107/2774 [3:38:15<5:16:16, 11.38s/it] {'loss': 1.0391, 'learning_rate': 3.4181392487660964e-06, 'epoch': 0.4} 40%|███▉ | 1107/2774 [3:38:15<5:16:16, 11.38s/it] 40%|███▉ | 1108/2774 [3:38:26<5:17:26, 11.43s/it] {'loss': 1.0908, 'learning_rate': 3.4154229557484924e-06, 'epoch': 0.4} 40%|███▉ | 1108/2774 [3:38:26<5:17:26, 11.43s/it] 40%|███▉ | 1109/2774 [3:38:38<5:19:32, 11.52s/it] {'loss': 1.0366, 'learning_rate': 3.412705414149276e-06, 'epoch': 0.4} 40%|███▉ | 1109/2774 [3:38:38<5:19:32, 11.52s/it] 40%|████ | 1110/2774 [3:38:50<5:19:49, 11.53s/it] {'loss': 1.0205, 'learning_rate': 3.4099866276750106e-06, 'epoch': 0.4} 40%|████ | 1110/2774 [3:38:50<5:19:49, 11.53s/it] 40%|████ | 1111/2774 [3:39:02<5:29:19, 11.88s/it] {'loss': 1.0029, 'learning_rate': 3.407266600033955e-06, 'epoch': 0.4} 40%|████ | 1111/2774 [3:39:02<5:29:19, 11.88s/it] 40%|████ | 1112/2774 [3:39:13<5:22:49, 11.65s/it] {'loss': 1.0264, 'learning_rate': 3.4045453349360643e-06, 'epoch': 0.4} 40%|████ | 1112/2774 [3:39:13<5:22:49, 11.65s/it] 40%|████ | 1113/2774 [3:39:25<5:20:51, 11.59s/it] {'loss': 1.0835, 'learning_rate': 3.401822836092977e-06, 'epoch': 0.4} 40%|████ | 1113/2774 [3:39:25<5:20:51, 11.59s/it] 40%|████ | 1114/2774 [3:39:37<5:23:13, 11.68s/it] {'loss': 0.9907, 'learning_rate': 3.39909910721802e-06, 'epoch': 0.4} 40%|████ | 1114/2774 [3:39:37<5:23:13, 11.68s/it] 40%|████ | 1115/2774 [3:39:49<5:26:00, 11.79s/it] {'loss': 0.9399, 'learning_rate': 3.396374152026194e-06, 'epoch': 0.4} 40%|████ | 1115/2774 [3:39:49<5:26:00, 11.79s/it] 40%|████ | 1116/2774 [3:40:00<5:23:55, 11.72s/it] 
{'loss': 1.0205, 'learning_rate': 3.3936479742341734e-06, 'epoch': 0.4} 40%|████ | 1116/2774 [3:40:00<5:23:55, 11.72s/it] 40%|████ | 1117/2774 [3:40:12<5:20:29, 11.61s/it] {'loss': 1.0361, 'learning_rate': 3.390920577560299e-06, 'epoch': 0.4} 40%|████ | 1117/2774 [3:40:12<5:20:29, 11.61s/it] 40%|████ | 1118/2774 [3:40:23<5:20:37, 11.62s/it] {'loss': 1.0571, 'learning_rate': 3.388191965724576e-06, 'epoch': 0.4} 40%|████ | 1118/2774 [3:40:23<5:20:37, 11.62s/it] 40%|████ | 1119/2774 [3:40:34<5:16:37, 11.48s/it] {'loss': 1.02, 'learning_rate': 3.3854621424486663e-06, 'epoch': 0.4} 40%|████ | 1119/2774 [3:40:34<5:16:37, 11.48s/it] 40%|████ | 1120/2774 [3:40:46<5:17:35, 11.52s/it] {'loss': 1.0649, 'learning_rate': 3.3827311114558834e-06, 'epoch': 0.4} 40%|████ | 1120/2774 [3:40:46<5:17:35, 11.52s/it] 40%|████ | 1121/2774 [3:40:58<5:23:30, 11.74s/it] {'loss': 0.9761, 'learning_rate': 3.3799988764711883e-06, 'epoch': 0.4} 40%|████ | 1121/2774 [3:40:58<5:23:30, 11.74s/it] 40%|████ | 1122/2774 [3:41:10<5:21:07, 11.66s/it] {'loss': 1.0967, 'learning_rate': 3.3772654412211854e-06, 'epoch': 0.4} 40%|████ | 1122/2774 [3:41:10<5:21:07, 11.66s/it] 40%|████ | 1123/2774 [3:41:22<5:21:36, 11.69s/it] {'loss': 1.0664, 'learning_rate': 3.3745308094341144e-06, 'epoch': 0.4} 40%|████ | 1123/2774 [3:41:22<5:21:36, 11.69s/it] 41%|████ | 1124/2774 [3:41:33<5:21:54, 11.71s/it] {'loss': 1.0107, 'learning_rate': 3.3717949848398485e-06, 'epoch': 0.41} 41%|████ | 1124/2774 [3:41:33<5:21:54, 11.71s/it] 41%|████ | 1125/2774 [3:41:45<5:19:18, 11.62s/it] {'loss': 0.9727, 'learning_rate': 3.369057971169888e-06, 'epoch': 0.41} 41%|████ | 1125/2774 [3:41:45<5:19:18, 11.62s/it] 41%|████ | 1126/2774 [3:41:58<5:31:24, 12.07s/it] {'loss': 1.0039, 'learning_rate': 3.3663197721573516e-06, 'epoch': 0.41} 41%|████ | 1126/2774 [3:41:58<5:31:24, 12.07s/it] 41%|████ | 1127/2774 [3:42:09<5:25:58, 11.88s/it] {'loss': 1.0557, 'learning_rate': 3.3635803915369795e-06, 'epoch': 0.41} 41%|████ | 1127/2774 
[3:42:09<5:25:58, 11.88s/it] 41%|████ | 1128/2774 [3:42:21<5:27:46, 11.95s/it] {'loss': 1.0635, 'learning_rate': 3.3608398330451206e-06, 'epoch': 0.41} 41%|████ | 1128/2774 [3:42:21<5:27:46, 11.95s/it] 41%|████ | 1129/2774 [3:42:33<5:25:33, 11.87s/it] {'loss': 1.0112, 'learning_rate': 3.3580981004197323e-06, 'epoch': 0.41} 41%|████ | 1129/2774 [3:42:33<5:25:33, 11.87s/it] 41%|████ | 1130/2774 [3:42:44<5:21:18, 11.73s/it] {'loss': 1.0054, 'learning_rate': 3.35535519740037e-06, 'epoch': 0.41} 41%|████ | 1130/2774 [3:42:44<5:21:18, 11.73s/it] 41%|████ | 1131/2774 [3:42:56<5:19:04, 11.65s/it] {'loss': 1.0371, 'learning_rate': 3.3526111277281897e-06, 'epoch': 0.41} 41%|████ | 1131/2774 [3:42:56<5:19:04, 11.65s/it] 41%|████ | 1132/2774 [3:43:08<5:18:26, 11.64s/it] {'loss': 1.0132, 'learning_rate': 3.3498658951459357e-06, 'epoch': 0.41} 41%|████ | 1132/2774 [3:43:08<5:18:26, 11.64s/it] 41%|████ | 1133/2774 [3:43:19<5:14:37, 11.50s/it] {'loss': 1.0361, 'learning_rate': 3.3471195033979405e-06, 'epoch': 0.41} 41%|████ | 1133/2774 [3:43:19<5:14:37, 11.50s/it] 41%|████ | 1134/2774 [3:43:31<5:17:18, 11.61s/it] {'loss': 1.0649, 'learning_rate': 3.3443719562301147e-06, 'epoch': 0.41} 41%|████ | 1134/2774 [3:43:31<5:17:18, 11.61s/it] 41%|████ | 1135/2774 [3:43:42<5:14:22, 11.51s/it] {'loss': 1.0249, 'learning_rate': 3.341623257389949e-06, 'epoch': 0.41} 41%|████ | 1135/2774 [3:43:42<5:14:22, 11.51s/it] 41%|████ | 1136/2774 [3:43:53<5:13:53, 11.50s/it] {'loss': 1.002, 'learning_rate': 3.3388734106264997e-06, 'epoch': 0.41} 41%|████ | 1136/2774 [3:43:53<5:13:53, 11.50s/it] 41%|████ | 1137/2774 [3:44:05<5:14:31, 11.53s/it] {'loss': 1.0347, 'learning_rate': 3.336122419690394e-06, 'epoch': 0.41} 41%|████ | 1137/2774 [3:44:05<5:14:31, 11.53s/it] 41%|████ | 1138/2774 [3:44:17<5:15:48, 11.58s/it] {'loss': 1.0264, 'learning_rate': 3.333370288333817e-06, 'epoch': 0.41} 41%|████ | 1138/2774 [3:44:17<5:15:48, 11.58s/it] 41%|████ | 1139/2774 [3:44:31<5:34:12, 12.26s/it] {'loss': 1.0127, 
'learning_rate': 3.3306170203105086e-06, 'epoch': 0.41} 41%|████ | 1139/2774 [3:44:31<5:34:12, 12.26s/it] 41%|████ | 1140/2774 [3:44:42<5:25:26, 11.95s/it] {'loss': 1.0225, 'learning_rate': 3.3278626193757607e-06, 'epoch': 0.41} 41%|████ | 1140/2774 [3:44:42<5:25:26, 11.95s/it] 41%|████ | 1141/2774 [3:44:54<5:25:18, 11.95s/it] {'loss': 1.063, 'learning_rate': 3.3251070892864097e-06, 'epoch': 0.41} 41%|████ | 1141/2774 [3:44:54<5:25:18, 11.95s/it] 41%|████ | 1142/2774 [3:45:05<5:19:39, 11.75s/it] {'loss': 0.9995, 'learning_rate': 3.322350433800832e-06, 'epoch': 0.41} 41%|████ | 1142/2774 [3:45:05<5:19:39, 11.75s/it] 41%|████ | 1143/2774 [3:45:17<5:19:02, 11.74s/it] {'loss': 1.0444, 'learning_rate': 3.3195926566789405e-06, 'epoch': 0.41} 41%|████ | 1143/2774 [3:45:17<5:19:02, 11.74s/it] 41%|████ | 1144/2774 [3:45:28<5:14:00, 11.56s/it] {'loss': 1.0435, 'learning_rate': 3.316833761682175e-06, 'epoch': 0.41} 41%|████ | 1144/2774 [3:45:28<5:14:00, 11.56s/it] 41%|████▏ | 1145/2774 [3:45:40<5:15:48, 11.63s/it] {'loss': 1.0327, 'learning_rate': 3.3140737525735017e-06, 'epoch': 0.41} 41%|████▏ | 1145/2774 [3:45:40<5:15:48, 11.63s/it] 41%|████▏ | 1146/2774 [3:45:51<5:13:40, 11.56s/it] {'loss': 1.0522, 'learning_rate': 3.311312633117407e-06, 'epoch': 0.41} 41%|████▏ | 1146/2774 [3:45:51<5:13:40, 11.56s/it] 41%|████▏ | 1147/2774 [3:46:02<5:12:50, 11.54s/it] {'loss': 1.0029, 'learning_rate': 3.3085504070798915e-06, 'epoch': 0.41} 41%|████▏ | 1147/2774 [3:46:02<5:12:50, 11.54s/it] 41%|████▏ | 1148/2774 [3:46:14<5:12:55, 11.55s/it] {'loss': 1.0493, 'learning_rate': 3.305787078228463e-06, 'epoch': 0.41} 41%|████▏ | 1148/2774 [3:46:14<5:12:55, 11.55s/it] 41%|████▏ | 1149/2774 [3:46:28<5:28:35, 12.13s/it] {'loss': 1.0, 'learning_rate': 3.303022650332136e-06, 'epoch': 0.41} 41%|████▏ | 1149/2774 [3:46:28<5:28:35, 12.13s/it] 41%|████▏ | 1150/2774 [3:46:39<5:23:38, 11.96s/it] {'loss': 1.0249, 'learning_rate': 3.3002571271614233e-06, 'epoch': 0.41} 41%|████▏ | 1150/2774 
[3:46:39<5:23:38, 11.96s/it] 41%|████▏ | 1151/2774 [3:46:51<5:20:22, 11.84s/it] {'loss': 1.0166, 'learning_rate': 3.2974905124883315e-06, 'epoch': 0.41} 41%|████▏ | 1151/2774 [3:46:51<5:20:22, 11.84s/it] 42%|████▏ | 1152/2774 [3:47:02<5:17:48, 11.76s/it] {'loss': 1.1074, 'learning_rate': 3.2947228100863558e-06, 'epoch': 0.42} 42%|████▏ | 1152/2774 [3:47:02<5:17:48, 11.76s/it] 42%|████▏ | 1153/2774 [3:47:14<5:17:53, 11.77s/it] {'loss': 1.0884, 'learning_rate': 3.2919540237304746e-06, 'epoch': 0.42} 42%|████▏ | 1153/2774 [3:47:14<5:17:53, 11.77s/it] 42%|████▏ | 1154/2774 [3:47:26<5:17:13, 11.75s/it] {'loss': 0.9922, 'learning_rate': 3.2891841571971463e-06, 'epoch': 0.42} 42%|████▏ | 1154/2774 [3:47:26<5:17:13, 11.75s/it] 42%|████▏ | 1155/2774 [3:47:39<5:30:38, 12.25s/it] {'loss': 0.9536, 'learning_rate': 3.2864132142643e-06, 'epoch': 0.42} 42%|████▏ | 1155/2774 [3:47:39<5:30:38, 12.25s/it] 42%|████▏ | 1156/2774 [3:47:53<5:40:12, 12.62s/it] {'loss': 1.0347, 'learning_rate': 3.283641198711337e-06, 'epoch': 0.42} 42%|████▏ | 1156/2774 [3:47:53<5:40:12, 12.62s/it] 42%|████▏ | 1157/2774 [3:48:04<5:28:54, 12.20s/it] {'loss': 1.0542, 'learning_rate': 3.2808681143191162e-06, 'epoch': 0.42} 42%|████▏ | 1157/2774 [3:48:04<5:28:54, 12.20s/it] 42%|████▏ | 1158/2774 [3:48:15<5:22:01, 11.96s/it] {'loss': 1.0859, 'learning_rate': 3.278093964869959e-06, 'epoch': 0.42} 42%|████▏ | 1158/2774 [3:48:15<5:22:01, 11.96s/it] 42%|████▏ | 1159/2774 [3:48:26<5:15:26, 11.72s/it] {'loss': 1.0015, 'learning_rate': 3.275318754147636e-06, 'epoch': 0.42} 42%|████▏ | 1159/2774 [3:48:26<5:15:26, 11.72s/it] 42%|████▏ | 1160/2774 [3:48:40<5:30:16, 12.28s/it] {'loss': 1.0317, 'learning_rate': 3.272542485937369e-06, 'epoch': 0.42} 42%|████▏ | 1160/2774 [3:48:40<5:30:16, 12.28s/it] 42%|████▏ | 1161/2774 [3:48:52<5:29:21, 12.25s/it] {'loss': 1.0337, 'learning_rate': 3.2697651640258195e-06, 'epoch': 0.42} 42%|████▏ | 1161/2774 [3:48:52<5:29:21, 12.25s/it] 42%|████▏ | 1162/2774 [3:49:03<5:20:47, 11.94s/it] 
{'loss': 0.9932, 'learning_rate': 3.266986792201086e-06, 'epoch': 0.42} 42%|████▏ | 1162/2774 [3:49:03<5:20:47, 11.94s/it] 42%|████▏ | 1163/2774 [3:49:16<5:23:41, 12.06s/it] {'loss': 1.0225, 'learning_rate': 3.2642073742527e-06, 'epoch': 0.42} 42%|████▏ | 1163/2774 [3:49:16<5:23:41, 12.06s/it] 42%|████▏ | 1164/2774 [3:49:27<5:16:35, 11.80s/it] {'loss': 1.0181, 'learning_rate': 3.26142691397162e-06, 'epoch': 0.42} 42%|████▏ | 1164/2774 [3:49:27<5:16:35, 11.80s/it] 42%|████▏ | 1165/2774 [3:49:38<5:11:12, 11.61s/it] {'loss': 1.0171, 'learning_rate': 3.258645415150226e-06, 'epoch': 0.42} 42%|████▏ | 1165/2774 [3:49:38<5:11:12, 11.61s/it] 42%|████▏ | 1166/2774 [3:49:49<5:09:27, 11.55s/it] {'loss': 1.0332, 'learning_rate': 3.2558628815823144e-06, 'epoch': 0.42} 42%|████▏ | 1166/2774 [3:49:49<5:09:27, 11.55s/it] 42%|████▏ | 1167/2774 [3:50:03<5:22:13, 12.03s/it] {'loss': 1.0835, 'learning_rate': 3.2530793170630926e-06, 'epoch': 0.42} 42%|████▏ | 1167/2774 [3:50:03<5:22:13, 12.03s/it] 42%|████▏ | 1168/2774 [3:50:14<5:18:01, 11.88s/it] {'loss': 1.0527, 'learning_rate': 3.2502947253891742e-06, 'epoch': 0.42} 42%|████▏ | 1168/2774 [3:50:14<5:18:01, 11.88s/it] 42%|████▏ | 1169/2774 [3:50:27<5:27:00, 12.22s/it] {'loss': 0.9717, 'learning_rate': 3.247509110358575e-06, 'epoch': 0.42} 42%|████▏ | 1169/2774 [3:50:27<5:27:00, 12.22s/it] 42%|████▏ | 1170/2774 [3:50:40<5:29:11, 12.31s/it] {'loss': 1.0449, 'learning_rate': 3.244722475770705e-06, 'epoch': 0.42} 42%|████▏ | 1170/2774 [3:50:40<5:29:11, 12.31s/it] 42%|████▏ | 1171/2774 [3:50:54<5:43:37, 12.86s/it] {'loss': 0.978, 'learning_rate': 3.2419348254263653e-06, 'epoch': 0.42} 42%|████▏ | 1171/2774 [3:50:54<5:43:37, 12.86s/it] 42%|████▏ | 1172/2774 [3:51:05<5:29:35, 12.34s/it] {'loss': 0.9688, 'learning_rate': 3.239146163127743e-06, 'epoch': 0.42} 42%|████▏ | 1172/2774 [3:51:05<5:29:35, 12.34s/it] 42%|████▏ | 1173/2774 [3:51:16<5:20:32, 12.01s/it] {'loss': 1.0488, 'learning_rate': 3.236356492678404e-06, 'epoch': 0.42} 42%|████▏ | 
1173/2774 [3:51:16<5:20:32, 12.01s/it] 42%|████▏ | 1174/2774 [3:51:29<5:25:39, 12.21s/it] {'loss': 1.0083, 'learning_rate': 3.2335658178832926e-06, 'epoch': 0.42} 42%|████▏ | 1174/2774 [3:51:29<5:25:39, 12.21s/it] 42%|████▏ | 1175/2774 [3:51:40<5:18:18, 11.94s/it] {'loss': 1.0234, 'learning_rate': 3.230774142548718e-06, 'epoch': 0.42} 42%|████▏ | 1175/2774 [3:51:40<5:18:18, 11.94s/it] 42%|████▏ | 1176/2774 [3:51:53<5:24:46, 12.19s/it] {'loss': 0.98, 'learning_rate': 3.2279814704823575e-06, 'epoch': 0.42} 42%|████▏ | 1176/2774 [3:51:53<5:24:46, 12.19s/it] 42%|████▏ | 1177/2774 [3:52:05<5:22:16, 12.11s/it] {'loss': 1.0581, 'learning_rate': 3.2251878054932482e-06, 'epoch': 0.42} 42%|████▏ | 1177/2774 [3:52:05<5:22:16, 12.11s/it] 42%|████▏ | 1178/2774 [3:52:17<5:17:54, 11.95s/it] {'loss': 1.0107, 'learning_rate': 3.222393151391779e-06, 'epoch': 0.42} 42%|████▏ | 1178/2774 [3:52:17<5:17:54, 11.95s/it] 43%|████▎ | 1179/2774 [3:52:28<5:13:06, 11.78s/it] {'loss': 1.0454, 'learning_rate': 3.2195975119896907e-06, 'epoch': 0.43} 43%|████▎ | 1179/2774 [3:52:28<5:13:06, 11.78s/it] 43%|████▎ | 1180/2774 [3:52:42<5:28:02, 12.35s/it] {'loss': 1.0195, 'learning_rate': 3.216800891100065e-06, 'epoch': 0.43} 43%|████▎ | 1180/2774 [3:52:42<5:28:02, 12.35s/it] 43%|████▎ | 1181/2774 [3:52:54<5:32:05, 12.51s/it] {'loss': 1.0405, 'learning_rate': 3.214003292537325e-06, 'epoch': 0.43} 43%|████▎ | 1181/2774 [3:52:54<5:32:05, 12.51s/it] 43%|████▎ | 1182/2774 [3:53:07<5:28:54, 12.40s/it] {'loss': 1.0732, 'learning_rate': 3.211204720117225e-06, 'epoch': 0.43} 43%|████▎ | 1182/2774 [3:53:07<5:28:54, 12.40s/it] 43%|████▎ | 1183/2774 [3:53:18<5:21:41, 12.13s/it] {'loss': 1.0366, 'learning_rate': 3.2084051776568504e-06, 'epoch': 0.43} 43%|████▎ | 1183/2774 [3:53:18<5:21:41, 12.13s/it] 43%|████▎ | 1184/2774 [3:53:29<5:13:48, 11.84s/it] {'loss': 1.0669, 'learning_rate': 3.205604668974607e-06, 'epoch': 0.43} 43%|████▎ | 1184/2774 [3:53:29<5:13:48, 11.84s/it] 43%|████▎ | 1185/2774 [3:53:41<5:10:12, 
11.71s/it] {'loss': 1.0674, 'learning_rate': 3.2028031978902186e-06, 'epoch': 0.43} 43%|████▎ | 1185/2774 [3:53:41<5:10:12, 11.71s/it] 43%|████▎ | 1186/2774 [3:53:53<5:13:38, 11.85s/it] {'loss': 1.0845, 'learning_rate': 3.2000007682247243e-06, 'epoch': 0.43} 43%|████▎ | 1186/2774 [3:53:53<5:13:38, 11.85s/it] 43%|████▎ | 1187/2774 [3:54:05<5:12:21, 11.81s/it] {'loss': 1.0503, 'learning_rate': 3.1971973838004673e-06, 'epoch': 0.43} 43%|████▎ | 1187/2774 [3:54:05<5:12:21, 11.81s/it] 43%|████▎ | 1188/2774 [3:54:16<5:11:44, 11.79s/it] {'loss': 1.0073, 'learning_rate': 3.1943930484410963e-06, 'epoch': 0.43} 43%|████▎ | 1188/2774 [3:54:16<5:11:44, 11.79s/it] 43%|████▎ | 1189/2774 [3:54:29<5:16:04, 11.96s/it] {'loss': 1.0361, 'learning_rate': 3.191587765971553e-06, 'epoch': 0.43} 43%|████▎ | 1189/2774 [3:54:29<5:16:04, 11.96s/it] 43%|████▎ | 1190/2774 [3:54:42<5:28:50, 12.46s/it] {'loss': 1.0171, 'learning_rate': 3.1887815402180756e-06, 'epoch': 0.43} 43%|████▎ | 1190/2774 [3:54:42<5:28:50, 12.46s/it] 43%|████▎ | 1191/2774 [3:54:54<5:19:51, 12.12s/it] {'loss': 1.0322, 'learning_rate': 3.1859743750081853e-06, 'epoch': 0.43} 43%|████▎ | 1191/2774 [3:54:54<5:19:51, 12.12s/it] 43%|████▎ | 1192/2774 [3:55:05<5:14:23, 11.92s/it] {'loss': 1.0059, 'learning_rate': 3.1831662741706853e-06, 'epoch': 0.43} 43%|████▎ | 1192/2774 [3:55:05<5:14:23, 11.92s/it] 43%|████▎ | 1193/2774 [3:55:16<5:09:10, 11.73s/it] {'loss': 1.0601, 'learning_rate': 3.1803572415356576e-06, 'epoch': 0.43} 43%|████▎ | 1193/2774 [3:55:16<5:09:10, 11.73s/it] 43%|████▎ | 1194/2774 [3:55:27<5:03:15, 11.52s/it] {'loss': 1.0273, 'learning_rate': 3.177547280934451e-06, 'epoch': 0.43} 43%|████▎ | 1194/2774 [3:55:27<5:03:15, 11.52s/it] 43%|████▎ | 1195/2774 [3:55:39<5:03:28, 11.53s/it] {'loss': 0.9937, 'learning_rate': 3.1747363961996823e-06, 'epoch': 0.43} 43%|████▎ | 1195/2774 [3:55:39<5:03:28, 11.53s/it] 43%|████▎ | 1196/2774 [3:55:51<5:04:06, 11.56s/it] {'loss': 1.0278, 'learning_rate': 3.171924591165229e-06, 'epoch': 
0.43} 43%|████▎ | 1196/2774 [3:55:51<5:04:06, 11.56s/it] 43%|████▎ | 1197/2774 [3:56:02<5:06:07, 11.65s/it] {'loss': 1.0942, 'learning_rate': 3.1691118696662245e-06, 'epoch': 0.43} 43%|████▎ | 1197/2774 [3:56:02<5:06:07, 11.65s/it] 43%|████▎ | 1198/2774 [3:56:14<5:02:38, 11.52s/it] {'loss': 1.0977, 'learning_rate': 3.166298235539048e-06, 'epoch': 0.43} 43%|████▎ | 1198/2774 [3:56:14<5:02:38, 11.52s/it] 43%|████▎ | 1199/2774 [3:56:26<5:10:37, 11.83s/it] {'loss': 1.0049, 'learning_rate': 3.1634836926213287e-06, 'epoch': 0.43} 43%|████▎ | 1199/2774 [3:56:26<5:10:37, 11.83s/it] 43%|████▎ | 1200/2774 [3:56:37<5:05:58, 11.66s/it] {'loss': 1.0566, 'learning_rate': 3.1606682447519333e-06, 'epoch': 0.43} 43%|████▎ | 1200/2774 [3:56:37<5:05:58, 11.66s/it]/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. 
Gradients will be None warnings.warn( 43%|████▎ | 1201/2774 [3:57:15<8:32:41, 19.56s/it] {'loss': 1.0459, 'learning_rate': 3.157851895770961e-06, 'epoch': 0.43} 43%|████▎ | 1201/2774 [3:57:16<8:32:41, 19.56s/it] 43%|████▎ | 1202/2774 [3:57:29<7:41:41, 17.62s/it] {'loss': 1.0112, 'learning_rate': 3.1550346495197433e-06, 'epoch': 0.43} 43%|████▎ | 1202/2774 [3:57:29<7:41:41, 17.62s/it] 43%|████▎ | 1203/2774 [3:57:40<6:52:07, 15.74s/it] {'loss': 1.0371, 'learning_rate': 3.1522165098408332e-06, 'epoch': 0.43} 43%|████▎ | 1203/2774 [3:57:40<6:52:07, 15.74s/it] 43%|████▎ | 1204/2774 [3:57:51<6:18:28, 14.46s/it] {'loss': 1.1455, 'learning_rate': 3.149397480578002e-06, 'epoch': 0.43} 43%|████▎ | 1204/2774 [3:57:51<6:18:28, 14.46s/it] 43%|████▎ | 1205/2774 [3:58:03<5:53:45, 13.53s/it] {'loss': 1.0488, 'learning_rate': 3.1465775655762377e-06, 'epoch': 0.43} 43%|████▎ | 1205/2774 [3:58:03<5:53:45, 13.53s/it] 43%|████▎ | 1206/2774 [3:58:14<5:36:35, 12.88s/it] {'loss': 1.0068, 'learning_rate': 3.1437567686817317e-06, 'epoch': 0.43} 43%|████▎ | 1206/2774 [3:58:14<5:36:35, 12.88s/it] 44%|████▎ | 1207/2774 [3:58:25<5:24:29, 12.42s/it] {'loss': 1.04, 'learning_rate': 3.140935093741882e-06, 'epoch': 0.44} 44%|████▎ | 1207/2774 [3:58:25<5:24:29, 12.42s/it] 44%|████▎ | 1208/2774 [3:58:37<5:16:06, 12.11s/it] {'loss': 1.0581, 'learning_rate': 3.138112544605282e-06, 'epoch': 0.44} 44%|████▎ | 1208/2774 [3:58:37<5:16:06, 12.11s/it] 44%|████▎ | 1209/2774 [3:58:49<5:17:56, 12.19s/it] {'loss': 1.0122, 'learning_rate': 3.1352891251217183e-06, 'epoch': 0.44} 44%|████▎ | 1209/2774 [3:58:49<5:17:56, 12.19s/it] 44%|████▎ | 1210/2774 [3:59:02<5:22:09, 12.36s/it] {'loss': 0.9976, 'learning_rate': 3.132464839142165e-06, 'epoch': 0.44} 44%|████▎ | 1210/2774 [3:59:02<5:22:09, 12.36s/it] 44%|████▎ | 1211/2774 [3:59:13<5:15:00, 12.09s/it] {'loss': 0.9946, 'learning_rate': 3.129639690518777e-06, 'epoch': 0.44} 44%|████▎ | 1211/2774 [3:59:13<5:15:00, 12.09s/it] 44%|████▎ | 1212/2774 [3:59:25<5:10:48, 
11.94s/it] {'loss': 1.0156, 'learning_rate': 3.126813683104887e-06, 'epoch': 0.44} 44%|████▎ | 1212/2774 [3:59:25<5:10:48, 11.94s/it] 44%|████▎ | 1213/2774 [3:59:37<5:08:15, 11.85s/it] {'loss': 1.0679, 'learning_rate': 3.1239868207549974e-06, 'epoch': 0.44} 44%|████▎ | 1213/2774 [3:59:37<5:08:15, 11.85s/it] 44%|████▍ | 1214/2774 [3:59:48<5:05:14, 11.74s/it] {'loss': 1.1006, 'learning_rate': 3.121159107324778e-06, 'epoch': 0.44} 44%|████▍ | 1214/2774 [3:59:48<5:05:14, 11.74s/it] 44%|████▍ | 1215/2774 [4:00:01<5:10:53, 11.96s/it] {'loss': 1.0308, 'learning_rate': 3.1183305466710605e-06, 'epoch': 0.44} 44%|████▍ | 1215/2774 [4:00:01<5:10:53, 11.96s/it] 44%|████▍ | 1216/2774 [4:00:12<5:05:03, 11.75s/it] {'loss': 1.0269, 'learning_rate': 3.115501142651829e-06, 'epoch': 0.44} 44%|████▍ | 1216/2774 [4:00:12<5:05:03, 11.75s/it] 44%|████▍ | 1217/2774 [4:00:23<5:03:12, 11.68s/it] {'loss': 1.0254, 'learning_rate': 3.1126708991262205e-06, 'epoch': 0.44} 44%|████▍ | 1217/2774 [4:00:23<5:03:12, 11.68s/it] 44%|████▍ | 1218/2774 [4:00:35<5:04:53, 11.76s/it] {'loss': 1.0059, 'learning_rate': 3.109839819954516e-06, 'epoch': 0.44} 44%|████▍ | 1218/2774 [4:00:35<5:04:53, 11.76s/it] 44%|████▍ | 1219/2774 [4:00:47<5:04:41, 11.76s/it] {'loss': 1.019, 'learning_rate': 3.1070079089981364e-06, 'epoch': 0.44} 44%|████▍ | 1219/2774 [4:00:47<5:04:41, 11.76s/it] 44%|████▍ | 1220/2774 [4:01:01<5:17:26, 12.26s/it] {'loss': 1.0669, 'learning_rate': 3.1041751701196377e-06, 'epoch': 0.44} 44%|████▍ | 1220/2774 [4:01:01<5:17:26, 12.26s/it] 44%|████▍ | 1221/2774 [4:01:14<5:22:50, 12.47s/it] {'loss': 0.9658, 'learning_rate': 3.1013416071827034e-06, 'epoch': 0.44} 44%|████▍ | 1221/2774 [4:01:14<5:22:50, 12.47s/it] 44%|████▍ | 1222/2774 [4:01:25<5:16:26, 12.23s/it] {'loss': 1.0435, 'learning_rate': 3.0985072240521434e-06, 'epoch': 0.44} 44%|████▍ | 1222/2774 [4:01:25<5:16:26, 12.23s/it] 44%|████▍ | 1223/2774 [4:01:39<5:26:46, 12.64s/it] {'loss': 0.9946, 'learning_rate': 3.0956720245938845e-06, 'epoch': 
0.44} 44%|████▍ | 1223/2774 [4:01:39<5:26:46, 12.64s/it] 44%|████▍ | 1224/2774 [4:01:50<5:16:44, 12.26s/it] {'loss': 1.0352, 'learning_rate': 3.092836012674968e-06, 'epoch': 0.44} 44%|████▍ | 1224/2774 [4:01:50<5:16:44, 12.26s/it] 44%|████▍ | 1225/2774 [4:02:02<5:17:05, 12.28s/it] {'loss': 1.1304, 'learning_rate': 3.089999192163542e-06, 'epoch': 0.44} 44%|████▍ | 1225/2774 [4:02:02<5:17:05, 12.28s/it] 44%|████▍ | 1226/2774 [4:02:14<5:12:31, 12.11s/it] {'loss': 0.9897, 'learning_rate': 3.0871615669288584e-06, 'epoch': 0.44} 44%|████▍ | 1226/2774 [4:02:14<5:12:31, 12.11s/it] 44%|████▍ | 1227/2774 [4:02:27<5:16:49, 12.29s/it] {'loss': 1.0098, 'learning_rate': 3.0843231408412675e-06, 'epoch': 0.44} 44%|████▍ | 1227/2774 [4:02:27<5:16:49, 12.29s/it] 44%|████▍ | 1228/2774 [4:02:38<5:08:28, 11.97s/it] {'loss': 1.0762, 'learning_rate': 3.0814839177722108e-06, 'epoch': 0.44} 44%|████▍ | 1228/2774 [4:02:38<5:08:28, 11.97s/it] 44%|████▍ | 1229/2774 [4:02:50<5:11:08, 12.08s/it] {'loss': 0.9956, 'learning_rate': 3.078643901594216e-06, 'epoch': 0.44} 44%|████▍ | 1229/2774 [4:02:50<5:11:08, 12.08s/it] 44%|████▍ | 1230/2774 [4:03:02<5:03:50, 11.81s/it] {'loss': 1.0835, 'learning_rate': 3.0758030961808954e-06, 'epoch': 0.44} 44%|████▍ | 1230/2774 [4:03:02<5:03:50, 11.81s/it] 44%|████▍ | 1231/2774 [4:03:13<5:01:24, 11.72s/it] {'loss': 1.0317, 'learning_rate': 3.0729615054069338e-06, 'epoch': 0.44} 44%|████▍ | 1231/2774 [4:03:13<5:01:24, 11.72s/it] 44%|████▍ | 1232/2774 [4:03:25<4:59:07, 11.64s/it] {'loss': 1.0166, 'learning_rate': 3.0701191331480905e-06, 'epoch': 0.44} 44%|████▍ | 1232/2774 [4:03:25<4:59:07, 11.64s/it] 44%|████▍ | 1233/2774 [4:03:36<4:55:31, 11.51s/it] {'loss': 1.0029, 'learning_rate': 3.0672759832811904e-06, 'epoch': 0.44} 44%|████▍ | 1233/2774 [4:03:36<4:55:31, 11.51s/it] 44%|████▍ | 1234/2774 [4:03:47<4:53:43, 11.44s/it] {'loss': 1.0884, 'learning_rate': 3.064432059684117e-06, 'epoch': 0.44} 44%|████▍ | 1234/2774 [4:03:47<4:53:43, 11.44s/it] 45%|████▍ | 1235/2774 
[4:03:59<4:55:01, 11.50s/it] {'loss': 1.0776, 'learning_rate': 3.06158736623581e-06, 'epoch': 0.45} 45%|████▍ | 1235/2774 [4:03:59<4:55:01, 11.50s/it] 45%|████▍ | 1236/2774 [4:04:11<4:57:25, 11.60s/it] {'loss': 1.0776, 'learning_rate': 3.0587419068162605e-06, 'epoch': 0.45} 45%|████▍ | 1236/2774 [4:04:11<4:57:25, 11.60s/it] 45%|████▍ | 1237/2774 [4:04:22<4:55:17, 11.53s/it] {'loss': 1.0825, 'learning_rate': 3.0558956853065024e-06, 'epoch': 0.45} 45%|████▍ | 1237/2774 [4:04:22<4:55:17, 11.53s/it] 45%|████▍ | 1238/2774 [4:04:33<4:54:43, 11.51s/it] {'loss': 1.0132, 'learning_rate': 3.053048705588611e-06, 'epoch': 0.45} 45%|████▍ | 1238/2774 [4:04:33<4:54:43, 11.51s/it] 45%|████▍ | 1239/2774 [4:04:45<4:53:39, 11.48s/it] {'loss': 1.0645, 'learning_rate': 3.050200971545693e-06, 'epoch': 0.45} 45%|████▍ | 1239/2774 [4:04:45<4:53:39, 11.48s/it] 45%|████▍ | 1240/2774 [4:04:56<4:53:32, 11.48s/it] {'loss': 0.9956, 'learning_rate': 3.047352487061887e-06, 'epoch': 0.45} 45%|████▍ | 1240/2774 [4:04:56<4:53:32, 11.48s/it] 45%|████▍ | 1241/2774 [4:05:08<4:52:07, 11.43s/it] {'loss': 1.1069, 'learning_rate': 3.044503256022353e-06, 'epoch': 0.45} 45%|████▍ | 1241/2774 [4:05:08<4:52:07, 11.43s/it] 45%|████▍ | 1242/2774 [4:05:19<4:52:14, 11.45s/it] {'loss': 1.0874, 'learning_rate': 3.041653282313271e-06, 'epoch': 0.45} 45%|████▍ | 1242/2774 [4:05:19<4:52:14, 11.45s/it] 45%|████▍ | 1243/2774 [4:05:31<4:53:50, 11.52s/it] {'loss': 1.0337, 'learning_rate': 3.0388025698218315e-06, 'epoch': 0.45} 45%|████▍ | 1243/2774 [4:05:31<4:53:50, 11.52s/it] 45%|████▍ | 1244/2774 [4:05:42<4:53:06, 11.49s/it] {'loss': 0.981, 'learning_rate': 3.0359511224362353e-06, 'epoch': 0.45} 45%|████▍ | 1244/2774 [4:05:42<4:53:06, 11.49s/it] 45%|████▍ | 1245/2774 [4:05:54<4:54:05, 11.54s/it] {'loss': 1.0, 'learning_rate': 3.0330989440456837e-06, 'epoch': 0.45} 45%|████▍ | 1245/2774 [4:05:54<4:54:05, 11.54s/it] 45%|████▍ | 1246/2774 [4:06:07<5:09:14, 12.14s/it] {'loss': 0.9878, 'learning_rate': 
3.0302460385403763e-06, 'epoch': 0.45} 45%|████▍ | 1246/2774 [4:06:07<5:09:14, 12.14s/it] 45%|████▍ | 1247/2774 [4:06:20<5:08:58, 12.14s/it] {'loss': 1.0156, 'learning_rate': 3.0273924098115045e-06, 'epoch': 0.45} 45%|████▍ | 1247/2774 [4:06:20<5:08:58, 12.14s/it] 45%|████▍ | 1248/2774 [4:06:31<5:01:43, 11.86s/it] {'loss': 1.0747, 'learning_rate': 3.024538061751243e-06, 'epoch': 0.45} 45%|████▍ | 1248/2774 [4:06:31<5:01:43, 11.86s/it] 45%|████▌ | 1249/2774 [4:06:44<5:11:30, 12.26s/it] {'loss': 0.9688, 'learning_rate': 3.021682998252753e-06, 'epoch': 0.45} 45%|████▌ | 1249/2774 [4:06:44<5:11:30, 12.26s/it] 45%|████▌ | 1250/2774 [4:06:55<5:03:55, 11.97s/it] {'loss': 1.1143, 'learning_rate': 3.0188272232101666e-06, 'epoch': 0.45} 45%|████▌ | 1250/2774 [4:06:55<5:03:55, 11.97s/it] 45%|████▌ | 1251/2774 [4:07:06<4:58:15, 11.75s/it] {'loss': 1.0503, 'learning_rate': 3.01597074051859e-06, 'epoch': 0.45} 45%|████▌ | 1251/2774 [4:07:06<4:58:15, 11.75s/it] 45%|████▌ | 1252/2774 [4:07:18<4:56:05, 11.67s/it] {'loss': 0.9985, 'learning_rate': 3.0131135540740915e-06, 'epoch': 0.45} 45%|████▌ | 1252/2774 [4:07:18<4:56:05, 11.67s/it] 45%|████▌ | 1253/2774 [4:07:30<5:01:13, 11.88s/it] {'loss': 0.9858, 'learning_rate': 3.0102556677737024e-06, 'epoch': 0.45} 45%|████▌ | 1253/2774 [4:07:30<5:01:13, 11.88s/it] 45%|████▌ | 1254/2774 [4:07:42<4:57:16, 11.73s/it] {'loss': 1.0024, 'learning_rate': 3.0073970855154057e-06, 'epoch': 0.45} 45%|████▌ | 1254/2774 [4:07:42<4:57:16, 11.73s/it] 45%|████▌ | 1255/2774 [4:07:54<5:04:23, 12.02s/it] {'loss': 0.9883, 'learning_rate': 3.004537811198135e-06, 'epoch': 0.45} 45%|████▌ | 1255/2774 [4:07:54<5:04:23, 12.02s/it] 45%|████▌ | 1256/2774 [4:08:06<5:00:16, 11.87s/it] {'loss': 1.0713, 'learning_rate': 3.0016778487217683e-06, 'epoch': 0.45} 45%|████▌ | 1256/2774 [4:08:06<5:00:16, 11.87s/it] 45%|████▌ | 1257/2774 [4:08:18<4:59:35, 11.85s/it] {'loss': 0.9922, 'learning_rate': 2.9988172019871216e-06, 'epoch': 0.45} 45%|████▌ | 1257/2774 [4:08:18<4:59:35, 
11.85s/it] 45%|████▌ | 1258/2774 [4:08:29<4:54:34, 11.66s/it] {'loss': 1.0132, 'learning_rate': 2.995955874895944e-06, 'epoch': 0.45} 45%|████▌ | 1258/2774 [4:08:29<4:54:34, 11.66s/it] 45%|████▌ | 1259/2774 [4:08:41<4:53:57, 11.64s/it] {'loss': 1.0488, 'learning_rate': 2.9930938713509127e-06, 'epoch': 0.45} 45%|████▌ | 1259/2774 [4:08:41<4:53:57, 11.64s/it] 45%|████▌ | 1260/2774 [4:08:52<4:53:54, 11.65s/it] {'loss': 1.0405, 'learning_rate': 2.9902311952556286e-06, 'epoch': 0.45} 45%|████▌ | 1260/2774 [4:08:52<4:53:54, 11.65s/it] 45%|████▌ | 1261/2774 [4:09:05<5:02:27, 11.99s/it] {'loss': 0.979, 'learning_rate': 2.9873678505146077e-06, 'epoch': 0.45} 45%|████▌ | 1261/2774 [4:09:05<5:02:27, 11.99s/it] 45%|████▌ | 1262/2774 [4:09:18<5:10:33, 12.32s/it] {'loss': 1.061, 'learning_rate': 2.9845038410332793e-06, 'epoch': 0.45} 45%|████▌ | 1262/2774 [4:09:18<5:10:33, 12.32s/it] 46%|████▌ | 1263/2774 [4:09:29<5:02:45, 12.02s/it] {'loss': 1.0059, 'learning_rate': 2.9816391707179802e-06, 'epoch': 0.46} 46%|████▌ | 1263/2774 [4:09:29<5:02:45, 12.02s/it] 46%|████▌ | 1264/2774 [4:09:41<4:59:43, 11.91s/it] {'loss': 1.0088, 'learning_rate': 2.9787738434759472e-06, 'epoch': 0.46} 46%|████▌ | 1264/2774 [4:09:41<4:59:43, 11.91s/it] 46%|████▌ | 1265/2774 [4:09:53<4:59:13, 11.90s/it] {'loss': 1.0557, 'learning_rate': 2.9759078632153145e-06, 'epoch': 0.46} 46%|████▌ | 1265/2774 [4:09:53<4:59:13, 11.90s/it] 46%|████▌ | 1266/2774 [4:10:04<4:55:09, 11.74s/it] {'loss': 1.0107, 'learning_rate': 2.9730412338451044e-06, 'epoch': 0.46} 46%|████▌ | 1266/2774 [4:10:04<4:55:09, 11.74s/it] 46%|████▌ | 1267/2774 [4:10:16<4:54:18, 11.72s/it] {'loss': 1.0137, 'learning_rate': 2.9701739592752265e-06, 'epoch': 0.46} 46%|████▌ | 1267/2774 [4:10:16<4:54:18, 11.72s/it] 46%|████▌ | 1268/2774 [4:10:28<4:57:31, 11.85s/it] {'loss': 1.0244, 'learning_rate': 2.9673060434164712e-06, 'epoch': 0.46} 46%|████▌ | 1268/2774 [4:10:28<4:57:31, 11.85s/it] 46%|████▌ | 1269/2774 [4:10:39<4:53:03, 11.68s/it] {'loss': 
1.0547, 'learning_rate': 2.9644374901805025e-06, 'epoch': 0.46} 46%|████▌ | 1269/2774 [4:10:39<4:53:03, 11.68s/it] 46%|████▌ | 1270/2774 [4:10:53<5:05:04, 12.17s/it] {'loss': 0.9863, 'learning_rate': 2.9615683034798514e-06, 'epoch': 0.46} 46%|████▌ | 1270/2774 [4:10:53<5:05:04, 12.17s/it] 46%|████▌ | 1271/2774 [4:11:05<5:03:30, 12.12s/it] {'loss': 0.9966, 'learning_rate': 2.9586984872279178e-06, 'epoch': 0.46} 46%|████▌ | 1271/2774 [4:11:05<5:03:30, 12.12s/it] 46%|████▌ | 1272/2774 [4:11:18<5:08:22, 12.32s/it] {'loss': 1.0317, 'learning_rate': 2.955828045338957e-06, 'epoch': 0.46} 46%|████▌ | 1272/2774 [4:11:18<5:08:22, 12.32s/it] 46%|████▌ | 1273/2774 [4:11:29<5:03:43, 12.14s/it] {'loss': 1.0034, 'learning_rate': 2.952956981728078e-06, 'epoch': 0.46} 46%|████▌ | 1273/2774 [4:11:29<5:03:43, 12.14s/it] 46%|████▌ | 1274/2774 [4:11:42<5:05:01, 12.20s/it] {'loss': 1.0479, 'learning_rate': 2.9500853003112384e-06, 'epoch': 0.46} 46%|████▌ | 1274/2774 [4:11:42<5:05:01, 12.20s/it] 46%|████▌ | 1275/2774 [4:11:53<4:58:29, 11.95s/it] {'loss': 1.0254, 'learning_rate': 2.9472130050052385e-06, 'epoch': 0.46} 46%|████▌ | 1275/2774 [4:11:53<4:58:29, 11.95s/it] 46%|████▌ | 1276/2774 [4:12:06<5:08:15, 12.35s/it] {'loss': 1.0239, 'learning_rate': 2.944340099727715e-06, 'epoch': 0.46} 46%|████▌ | 1276/2774 [4:12:06<5:08:15, 12.35s/it] 46%|████▌ | 1277/2774 [4:12:18<5:01:24, 12.08s/it] {'loss': 1.0601, 'learning_rate': 2.9414665883971365e-06, 'epoch': 0.46} 46%|████▌ | 1277/2774 [4:12:18<5:01:24, 12.08s/it] 46%|████▌ | 1278/2774 [4:12:29<4:54:42, 11.82s/it] {'loss': 1.0093, 'learning_rate': 2.938592474932801e-06, 'epoch': 0.46} 46%|████▌ | 1278/2774 [4:12:29<4:54:42, 11.82s/it] 46%|████▌ | 1279/2774 [4:12:40<4:50:59, 11.68s/it] {'loss': 1.0312, 'learning_rate': 2.9357177632548234e-06, 'epoch': 0.46} 46%|████▌ | 1279/2774 [4:12:40<4:50:59, 11.68s/it] 46%|████▌ | 1280/2774 [4:12:52<4:50:45, 11.68s/it] {'loss': 1.042, 'learning_rate': 2.9328424572841375e-06, 'epoch': 0.46} 46%|████▌ | 
1280/2774 [4:12:52<4:50:45, 11.68s/it] 46%|████▌ | 1281/2774 [4:13:03<4:48:27, 11.59s/it] {'loss': 1.0854, 'learning_rate': 2.929966560942487e-06, 'epoch': 0.46} 46%|████▌ | 1281/2774 [4:13:03<4:48:27, 11.59s/it] 46%|████▌ | 1282/2774 [4:13:14<4:44:34, 11.44s/it] {'loss': 1.0332, 'learning_rate': 2.9270900781524216e-06, 'epoch': 0.46} 46%|████▌ | 1282/2774 [4:13:14<4:44:34, 11.44s/it] 46%|████▋ | 1283/2774 [4:13:26<4:43:47, 11.42s/it] {'loss': 1.0566, 'learning_rate': 2.9242130128372896e-06, 'epoch': 0.46} 46%|████▋ | 1283/2774 [4:13:26<4:43:47, 11.42s/it] 46%|████▋ | 1284/2774 [4:13:37<4:45:20, 11.49s/it] {'loss': 1.0054, 'learning_rate': 2.9213353689212337e-06, 'epoch': 0.46} 46%|████▋ | 1284/2774 [4:13:37<4:45:20, 11.49s/it] 46%|████▋ | 1285/2774 [4:13:49<4:47:00, 11.57s/it] {'loss': 1.0156, 'learning_rate': 2.9184571503291865e-06, 'epoch': 0.46} 46%|████▋ | 1285/2774 [4:13:49<4:47:00, 11.57s/it] 46%|████▋ | 1286/2774 [4:14:00<4:43:58, 11.45s/it] {'loss': 1.1016, 'learning_rate': 2.915578360986865e-06, 'epoch': 0.46} 46%|████▋ | 1286/2774 [4:14:00<4:43:58, 11.45s/it] 46%|████▋ | 1287/2774 [4:14:12<4:42:54, 11.41s/it] {'loss': 0.9805, 'learning_rate': 2.9126990048207633e-06, 'epoch': 0.46} 46%|████▋ | 1287/2774 [4:14:12<4:42:54, 11.41s/it] 46%|████▋ | 1288/2774 [4:14:23<4:41:42, 11.37s/it] {'loss': 1.0142, 'learning_rate': 2.9098190857581493e-06, 'epoch': 0.46} 46%|████▋ | 1288/2774 [4:14:23<4:41:42, 11.37s/it] 46%|████▋ | 1289/2774 [4:14:35<4:43:22, 11.45s/it] {'loss': 1.0479, 'learning_rate': 2.906938607727059e-06, 'epoch': 0.46} 46%|████▋ | 1289/2774 [4:14:35<4:43:22, 11.45s/it] 47%|████▋ | 1290/2774 [4:14:46<4:45:42, 11.55s/it] {'loss': 1.0278, 'learning_rate': 2.90405757465629e-06, 'epoch': 0.47} 47%|████▋ | 1290/2774 [4:14:46<4:45:42, 11.55s/it] 47%|████▋ | 1291/2774 [4:14:59<4:50:52, 11.77s/it] {'loss': 1.0518, 'learning_rate': 2.901175990475398e-06, 'epoch': 0.47} 47%|████▋ | 1291/2774 [4:14:59<4:50:52, 11.77s/it] 47%|████▋ | 1292/2774 [4:15:11<4:51:57, 
11.82s/it] {'loss': 1.0542, 'learning_rate': 2.8982938591146892e-06, 'epoch': 0.47} 47%|████▋ | 1292/2774 [4:15:11<4:51:57, 11.82s/it] 47%|████▋ | 1293/2774 [4:15:22<4:49:18, 11.72s/it] {'loss': 1.0151, 'learning_rate': 2.895411184505217e-06, 'epoch': 0.47} 47%|████▋ | 1293/2774 [4:15:22<4:49:18, 11.72s/it] 47%|████▋ | 1294/2774 [4:15:36<5:04:57, 12.36s/it] {'loss': 0.9927, 'learning_rate': 2.892527970578775e-06, 'epoch': 0.47} 47%|████▋ | 1294/2774 [4:15:36<5:04:57, 12.36s/it] 47%|████▋ | 1295/2774 [4:15:48<4:58:49, 12.12s/it] {'loss': 1.0298, 'learning_rate': 2.8896442212678933e-06, 'epoch': 0.47} 47%|████▋ | 1295/2774 [4:15:48<4:58:49, 12.12s/it] 47%|████▋ | 1296/2774 [4:15:59<4:54:32, 11.96s/it] {'loss': 0.9819, 'learning_rate': 2.8867599405058315e-06, 'epoch': 0.47} 47%|████▋ | 1296/2774 [4:15:59<4:54:32, 11.96s/it] 47%|████▋ | 1297/2774 [4:16:11<4:51:38, 11.85s/it] {'loss': 1.0771, 'learning_rate': 2.8838751322265746e-06, 'epoch': 0.47} 47%|████▋ | 1297/2774 [4:16:11<4:51:38, 11.85s/it] 47%|████▋ | 1298/2774 [4:16:22<4:49:55, 11.79s/it] {'loss': 1.0503, 'learning_rate': 2.880989800364826e-06, 'epoch': 0.47} 47%|████▋ | 1298/2774 [4:16:22<4:49:55, 11.79s/it] 47%|████▋ | 1299/2774 [4:16:34<4:46:13, 11.64s/it] {'loss': 1.0527, 'learning_rate': 2.8781039488560055e-06, 'epoch': 0.47} 47%|████▋ | 1299/2774 [4:16:34<4:46:13, 11.64s/it] 47%|████▋ | 1300/2774 [4:16:45<4:45:35, 11.63s/it] {'loss': 1.0547, 'learning_rate': 2.8752175816362384e-06, 'epoch': 0.47} 47%|████▋ | 1300/2774 [4:16:45<4:45:35, 11.63s/it] 47%|████▋ | 1301/2774 [4:16:57<4:43:52, 11.56s/it] {'loss': 1.0244, 'learning_rate': 2.8723307026423565e-06, 'epoch': 0.47} 47%|████▋ | 1301/2774 [4:16:57<4:43:52, 11.56s/it] 47%|████▋ | 1302/2774 [4:17:08<4:41:28, 11.47s/it] {'loss': 1.0107, 'learning_rate': 2.869443315811889e-06, 'epoch': 0.47} 47%|████▋ | 1302/2774 [4:17:08<4:41:28, 11.47s/it] 47%|████▋ | 1303/2774 [4:17:20<4:42:27, 11.52s/it] {'loss': 1.0742, 'learning_rate': 2.866555425083055e-06, 'epoch': 
0.47} 47%|████▋ | 1303/2774 [4:17:20<4:42:27, 11.52s/it] 47%|████▋ | 1304/2774 [4:17:31<4:41:38, 11.50s/it] {'loss': 1.0879, 'learning_rate': 2.8636670343947646e-06, 'epoch': 0.47} 47%|████▋ | 1304/2774 [4:17:31<4:41:38, 11.50s/it] 47%|████▋ | 1305/2774 [4:17:43<4:46:10, 11.69s/it] {'loss': 1.0054, 'learning_rate': 2.860778147686608e-06, 'epoch': 0.47} 47%|████▋ | 1305/2774 [4:17:43<4:46:10, 11.69s/it] 47%|████▋ | 1306/2774 [4:17:54<4:43:03, 11.57s/it] {'loss': 1.0205, 'learning_rate': 2.857888768898852e-06, 'epoch': 0.47} 47%|████▋ | 1306/2774 [4:17:54<4:43:03, 11.57s/it] 47%|████▋ | 1307/2774 [4:18:06<4:41:42, 11.52s/it] {'loss': 1.062, 'learning_rate': 2.8549989019724344e-06, 'epoch': 0.47} 47%|████▋ | 1307/2774 [4:18:06<4:41:42, 11.52s/it] 47%|████▋ | 1308/2774 [4:18:17<4:40:04, 11.46s/it] {'loss': 1.0127, 'learning_rate': 2.852108550848959e-06, 'epoch': 0.47} 47%|████▋ | 1308/2774 [4:18:17<4:40:04, 11.46s/it] 47%|████▋ | 1309/2774 [4:18:29<4:40:13, 11.48s/it] {'loss': 1.1143, 'learning_rate': 2.849217719470691e-06, 'epoch': 0.47} 47%|████▋ | 1309/2774 [4:18:29<4:40:13, 11.48s/it] 47%|████▋ | 1310/2774 [4:18:40<4:39:55, 11.47s/it] {'loss': 1.04, 'learning_rate': 2.84632641178055e-06, 'epoch': 0.47} 47%|████▋ | 1310/2774 [4:18:40<4:39:55, 11.47s/it] 47%|████▋ | 1311/2774 [4:18:51<4:38:32, 11.42s/it] {'loss': 1.0347, 'learning_rate': 2.8434346317221033e-06, 'epoch': 0.47} 47%|████▋ | 1311/2774 [4:18:51<4:38:32, 11.42s/it] 47%|████▋ | 1312/2774 [4:19:03<4:39:14, 11.46s/it] {'loss': 0.9863, 'learning_rate': 2.840542383239565e-06, 'epoch': 0.47} 47%|████▋ | 1312/2774 [4:19:03<4:39:14, 11.46s/it] 47%|████▋ | 1313/2774 [4:19:15<4:45:24, 11.72s/it] {'loss': 1.0469, 'learning_rate': 2.8376496702777884e-06, 'epoch': 0.47} 47%|████▋ | 1313/2774 [4:19:15<4:45:24, 11.72s/it] 47%|████▋ | 1314/2774 [4:19:27<4:43:33, 11.65s/it] {'loss': 1.0361, 'learning_rate': 2.8347564967822583e-06, 'epoch': 0.47} 47%|████▋ | 1314/2774 [4:19:27<4:43:33, 11.65s/it] 47%|████▋ | 1315/2774 
[4:19:38<4:41:50, 11.59s/it] {'loss': 1.0122, 'learning_rate': 2.831862866699089e-06, 'epoch': 0.47} 47%|████▋ | 1315/2774 [4:19:38<4:41:50, 11.59s/it] 47%|████▋ | 1316/2774 [4:19:50<4:41:59, 11.60s/it] {'loss': 1.0396, 'learning_rate': 2.8289687839750157e-06, 'epoch': 0.47} 47%|████▋ | 1316/2774 [4:19:50<4:41:59, 11.60s/it] 47%|████▋ | 1317/2774 [4:20:01<4:41:52, 11.61s/it] {'loss': 1.02, 'learning_rate': 2.8260742525573944e-06, 'epoch': 0.47} 47%|████▋ | 1317/2774 [4:20:01<4:41:52, 11.61s/it] 48%|████▊ | 1318/2774 [4:20:13<4:39:48, 11.53s/it] {'loss': 0.9785, 'learning_rate': 2.8231792763941894e-06, 'epoch': 0.48} 48%|████▊ | 1318/2774 [4:20:13<4:39:48, 11.53s/it] 48%|████▊ | 1319/2774 [4:20:26<4:49:21, 11.93s/it] {'loss': 1.0073, 'learning_rate': 2.8202838594339756e-06, 'epoch': 0.48} 48%|████▊ | 1319/2774 [4:20:26<4:49:21, 11.93s/it] 48%|████▊ | 1320/2774 [4:20:37<4:47:31, 11.86s/it] {'loss': 1.0801, 'learning_rate': 2.817388005625924e-06, 'epoch': 0.48} 48%|████▊ | 1320/2774 [4:20:37<4:47:31, 11.86s/it] 48%|████▊ | 1321/2774 [4:20:49<4:47:58, 11.89s/it] {'loss': 0.9878, 'learning_rate': 2.8144917189198055e-06, 'epoch': 0.48} 48%|████▊ | 1321/2774 [4:20:49<4:47:58, 11.89s/it] 48%|████▊ | 1322/2774 [4:21:01<4:44:08, 11.74s/it] {'loss': 1.0576, 'learning_rate': 2.811595003265981e-06, 'epoch': 0.48} 48%|████▊ | 1322/2774 [4:21:01<4:44:08, 11.74s/it] 48%|████▊ | 1323/2774 [4:21:12<4:40:49, 11.61s/it] {'loss': 1.0518, 'learning_rate': 2.808697862615395e-06, 'epoch': 0.48} 48%|████▊ | 1323/2774 [4:21:12<4:40:49, 11.61s/it] 48%|████▊ | 1324/2774 [4:21:24<4:40:27, 11.61s/it] {'loss': 1.0015, 'learning_rate': 2.805800300919572e-06, 'epoch': 0.48} 48%|████▊ | 1324/2774 [4:21:24<4:40:27, 11.61s/it] 48%|████▊ | 1325/2774 [4:21:35<4:36:51, 11.46s/it] {'loss': 0.9849, 'learning_rate': 2.8029023221306117e-06, 'epoch': 0.48} 48%|████▊ | 1325/2774 [4:21:35<4:36:51, 11.46s/it] 48%|████▊ | 1326/2774 [4:21:49<4:57:57, 12.35s/it] {'loss': 1.0156, 'learning_rate': 
2.8000039302011817e-06, 'epoch': 0.48} 48%|████▊ | 1326/2774 [4:21:49<4:57:57, 12.35s/it] 48%|████▊ | 1327/2774 [4:22:01<4:51:31, 12.09s/it] {'loss': 1.0195, 'learning_rate': 2.7971051290845137e-06, 'epoch': 0.48} 48%|████▊ | 1327/2774 [4:22:01<4:51:31, 12.09s/it] 48%|████▊ | 1328/2774 [4:22:12<4:46:31, 11.89s/it] {'loss': 1.0244, 'learning_rate': 2.7942059227343974e-06, 'epoch': 0.48} 48%|████▊ | 1328/2774 [4:22:12<4:46:31, 11.89s/it] 48%|████▊ | 1329/2774 [4:22:24<4:43:16, 11.76s/it] {'loss': 1.021, 'learning_rate': 2.7913063151051744e-06, 'epoch': 0.48} 48%|████▊ | 1329/2774 [4:22:24<4:43:16, 11.76s/it] 48%|████▊ | 1330/2774 [4:22:35<4:41:28, 11.70s/it] {'loss': 1.0317, 'learning_rate': 2.7884063101517354e-06, 'epoch': 0.48} 48%|████▊ | 1330/2774 [4:22:35<4:41:28, 11.70s/it] 48%|████▊ | 1331/2774 [4:22:47<4:43:06, 11.77s/it] {'loss': 1.0361, 'learning_rate': 2.7855059118295114e-06, 'epoch': 0.48} 48%|████▊ | 1331/2774 [4:22:47<4:43:06, 11.77s/it] 48%|████▊ | 1332/2774 [4:22:59<4:41:59, 11.73s/it] {'loss': 1.0029, 'learning_rate': 2.7826051240944706e-06, 'epoch': 0.48} 48%|████▊ | 1332/2774 [4:22:59<4:41:59, 11.73s/it] 48%|████▊ | 1333/2774 [4:23:10<4:40:35, 11.68s/it] {'loss': 1.083, 'learning_rate': 2.779703950903112e-06, 'epoch': 0.48} 48%|████▊ | 1333/2774 [4:23:10<4:40:35, 11.68s/it] 48%|████▊ | 1334/2774 [4:23:23<4:47:03, 11.96s/it] {'loss': 1.0205, 'learning_rate': 2.7768023962124613e-06, 'epoch': 0.48} 48%|████▊ | 1334/2774 [4:23:23<4:47:03, 11.96s/it] 48%|████▊ | 1335/2774 [4:23:34<4:42:36, 11.78s/it] {'loss': 1.0757, 'learning_rate': 2.7739004639800628e-06, 'epoch': 0.48} 48%|████▊ | 1335/2774 [4:23:34<4:42:36, 11.78s/it] 48%|████▊ | 1336/2774 [4:23:46<4:42:15, 11.78s/it] {'loss': 0.9976, 'learning_rate': 2.7709981581639772e-06, 'epoch': 0.48} 48%|████▊ | 1336/2774 [4:23:46<4:42:15, 11.78s/it] 48%|████▊ | 1337/2774 [4:23:58<4:40:48, 11.72s/it] {'loss': 1.0337, 'learning_rate': 2.768095482722775e-06, 'epoch': 0.48} 48%|████▊ | 1337/2774 [4:23:58<4:40:48, 
11.72s/it] 48%|████▊ | 1338/2774 [4:24:11<4:54:47, 12.32s/it] {'loss': 1.0156, 'learning_rate': 2.7651924416155298e-06, 'epoch': 0.48} 48%|████▊ | 1338/2774 [4:24:11<4:54:47, 12.32s/it] 48%|████▊ | 1339/2774 [4:24:23<4:48:41, 12.07s/it] {'loss': 1.0645, 'learning_rate': 2.7622890388018133e-06, 'epoch': 0.48} 48%|████▊ | 1339/2774 [4:24:23<4:48:41, 12.07s/it] 48%|████▊ | 1340/2774 [4:24:35<4:46:22, 11.98s/it] {'loss': 1.061, 'learning_rate': 2.7593852782416923e-06, 'epoch': 0.48} 48%|████▊ | 1340/2774 [4:24:35<4:46:22, 11.98s/it] 48%|████▊ | 1341/2774 [4:24:46<4:40:55, 11.76s/it] {'loss': 1.0259, 'learning_rate': 2.756481163895722e-06, 'epoch': 0.48} 48%|████▊ | 1341/2774 [4:24:46<4:40:55, 11.76s/it] 48%|████▊ | 1342/2774 [4:24:59<4:49:05, 12.11s/it] {'loss': 1.0225, 'learning_rate': 2.753576699724936e-06, 'epoch': 0.48} 48%|████▊ | 1342/2774 [4:24:59<4:49:05, 12.11s/it] 48%|████▊ | 1343/2774 [4:25:10<4:43:27, 11.89s/it] {'loss': 1.0308, 'learning_rate': 2.750671889690851e-06, 'epoch': 0.48} 48%|████▊ | 1343/2774 [4:25:10<4:43:27, 11.89s/it] 48%|████▊ | 1344/2774 [4:25:21<4:37:44, 11.65s/it] {'loss': 1.0288, 'learning_rate': 2.7477667377554506e-06, 'epoch': 0.48} 48%|████▊ | 1344/2774 [4:25:21<4:37:44, 11.65s/it] 48%|████▊ | 1345/2774 [4:25:32<4:34:44, 11.54s/it] {'loss': 1.0298, 'learning_rate': 2.7448612478811878e-06, 'epoch': 0.48} 48%|████▊ | 1345/2774 [4:25:32<4:34:44, 11.54s/it] 49%|████▊ | 1346/2774 [4:25:44<4:34:09, 11.52s/it] {'loss': 1.0244, 'learning_rate': 2.7419554240309737e-06, 'epoch': 0.49} 49%|████▊ | 1346/2774 [4:25:44<4:34:09, 11.52s/it] 49%|████▊ | 1347/2774 [4:25:58<4:51:56, 12.28s/it] {'loss': 0.9937, 'learning_rate': 2.739049270168177e-06, 'epoch': 0.49} 49%|████▊ | 1347/2774 [4:25:58<4:51:56, 12.28s/it] 49%|████▊ | 1348/2774 [4:26:12<5:01:23, 12.68s/it] {'loss': 0.9971, 'learning_rate': 2.7361427902566175e-06, 'epoch': 0.49} 49%|████▊ | 1348/2774 [4:26:12<5:01:23, 12.68s/it] 49%|████▊ | 1349/2774 [4:26:23<4:51:40, 12.28s/it] {'loss': 1.0059, 
'learning_rate': 2.7332359882605563e-06, 'epoch': 0.49} 49%|████▊ | 1349/2774 [4:26:23<4:51:40, 12.28s/it] 49%|████▊ | 1350/2774 [4:26:34<4:45:14, 12.02s/it] {'loss': 1.0342, 'learning_rate': 2.7303288681446966e-06, 'epoch': 0.49} 49%|████▊ | 1350/2774 [4:26:34<4:45:14, 12.02s/it] 49%|████▊ | 1351/2774 [4:26:46<4:42:35, 11.92s/it] {'loss': 1.0073, 'learning_rate': 2.727421433874175e-06, 'epoch': 0.49} 49%|████▊ | 1351/2774 [4:26:46<4:42:35, 11.92s/it] 49%|████▊ | 1352/2774 [4:26:57<4:38:18, 11.74s/it] {'loss': 1.0786, 'learning_rate': 2.7245136894145556e-06, 'epoch': 0.49} 49%|████▊ | 1352/2774 [4:26:57<4:38:18, 11.74s/it] 49%|████▉ | 1353/2774 [4:27:09<4:35:44, 11.64s/it] {'loss': 1.0142, 'learning_rate': 2.7216056387318257e-06, 'epoch': 0.49} 49%|████▉ | 1353/2774 [4:27:09<4:35:44, 11.64s/it] 49%|████▉ | 1354/2774 [4:27:20<4:33:37, 11.56s/it] {'loss': 1.0845, 'learning_rate': 2.7186972857923922e-06, 'epoch': 0.49} 49%|████▉ | 1354/2774 [4:27:20<4:33:37, 11.56s/it] 49%|████▉ | 1355/2774 [4:27:32<4:32:27, 11.52s/it] {'loss': 1.0879, 'learning_rate': 2.715788634563072e-06, 'epoch': 0.49} 49%|████▉ | 1355/2774 [4:27:32<4:32:27, 11.52s/it] 49%|████▉ | 1356/2774 [4:27:44<4:35:34, 11.66s/it] {'loss': 1.0654, 'learning_rate': 2.712879689011089e-06, 'epoch': 0.49} 49%|████▉ | 1356/2774 [4:27:44<4:35:34, 11.66s/it] 49%|████▉ | 1357/2774 [4:27:55<4:32:56, 11.56s/it] {'loss': 1.0576, 'learning_rate': 2.70997045310407e-06, 'epoch': 0.49} 49%|████▉ | 1357/2774 [4:27:55<4:32:56, 11.56s/it] 49%|████▉ | 1358/2774 [4:28:06<4:30:59, 11.48s/it] {'loss': 1.0181, 'learning_rate': 2.707060930810037e-06, 'epoch': 0.49} 49%|████▉ | 1358/2774 [4:28:06<4:30:59, 11.48s/it] 49%|████▉ | 1359/2774 [4:28:18<4:31:49, 11.53s/it] {'loss': 1.0396, 'learning_rate': 2.704151126097403e-06, 'epoch': 0.49} 49%|████▉ | 1359/2774 [4:28:18<4:31:49, 11.53s/it] 49%|████▉ | 1360/2774 [4:28:29<4:32:10, 11.55s/it] {'loss': 0.9819, 'learning_rate': 2.7012410429349656e-06, 'epoch': 0.49} 49%|████▉ | 1360/2774 
[4:28:29<4:32:10, 11.55s/it] 49%|████▉ | 1361/2774 [4:28:41<4:29:42, 11.45s/it] {'loss': 1.0513, 'learning_rate': 2.698330685291902e-06, 'epoch': 0.49} 49%|████▉ | 1361/2774 [4:28:41<4:29:42, 11.45s/it] 49%|████▉ | 1362/2774 [4:28:52<4:29:57, 11.47s/it] {'loss': 1.0181, 'learning_rate': 2.695420057137764e-06, 'epoch': 0.49} 49%|████▉ | 1362/2774 [4:28:52<4:29:57, 11.47s/it] 49%|████▉ | 1363/2774 [4:29:04<4:30:24, 11.50s/it] {'loss': 1.0244, 'learning_rate': 2.692509162442473e-06, 'epoch': 0.49} 49%|████▉ | 1363/2774 [4:29:04<4:30:24, 11.50s/it] 49%|████▉ | 1364/2774 [4:29:15<4:28:15, 11.41s/it] {'loss': 1.0537, 'learning_rate': 2.6895980051763145e-06, 'epoch': 0.49} 49%|████▉ | 1364/2774 [4:29:15<4:28:15, 11.41s/it] 49%|████▉ | 1365/2774 [4:29:26<4:27:49, 11.40s/it] {'loss': 1.0776, 'learning_rate': 2.6866865893099298e-06, 'epoch': 0.49} 49%|████▉ | 1365/2774 [4:29:26<4:27:49, 11.40s/it] 49%|████▉ | 1366/2774 [4:29:38<4:26:26, 11.35s/it] {'loss': 0.9785, 'learning_rate': 2.683774918814314e-06, 'epoch': 0.49} 49%|████▉ | 1366/2774 [4:29:38<4:26:26, 11.35s/it] 49%|████▉ | 1367/2774 [4:29:49<4:29:48, 11.51s/it] {'loss': 1.0386, 'learning_rate': 2.6808629976608114e-06, 'epoch': 0.49} 49%|████▉ | 1367/2774 [4:29:49<4:29:48, 11.51s/it] 49%|████▉ | 1368/2774 [4:30:01<4:30:15, 11.53s/it] {'loss': 1.0732, 'learning_rate': 2.6779508298211055e-06, 'epoch': 0.49} 49%|████▉ | 1368/2774 [4:30:01<4:30:15, 11.53s/it] 49%|████▉ | 1369/2774 [4:30:12<4:28:29, 11.47s/it] {'loss': 1.0068, 'learning_rate': 2.6750384192672172e-06, 'epoch': 0.49} 49%|████▉ | 1369/2774 [4:30:12<4:28:29, 11.47s/it] 49%|████▉ | 1370/2774 [4:30:24<4:26:58, 11.41s/it] {'loss': 0.9795, 'learning_rate': 2.6721257699714985e-06, 'epoch': 0.49} 49%|████▉ | 1370/2774 [4:30:24<4:26:58, 11.41s/it] 49%|████▉ | 1371/2774 [4:30:35<4:27:01, 11.42s/it] {'loss': 1.0039, 'learning_rate': 2.6692128859066283e-06, 'epoch': 0.49} 49%|████▉ | 1371/2774 [4:30:35<4:27:01, 11.42s/it] 49%|████▉ | 1372/2774 [4:30:47<4:27:05, 
11.43s/it] {'loss': 1.0806, 'learning_rate': 2.666299771045603e-06, 'epoch': 0.49} 49%|████▉ | 1372/2774 [4:30:47<4:27:05, 11.43s/it] 49%|████▉ | 1373/2774 [4:31:00<4:38:43, 11.94s/it] {'loss': 0.9761, 'learning_rate': 2.663386429361736e-06, 'epoch': 0.49} 49%|████▉ | 1373/2774 [4:31:00<4:38:43, 11.94s/it] 50%|████▉ | 1374/2774 [4:31:12<4:41:19, 12.06s/it] {'loss': 1.0244, 'learning_rate': 2.6604728648286494e-06, 'epoch': 0.5} 50%|████▉ | 1374/2774 [4:31:12<4:41:19, 12.06s/it] 50%|████▉ | 1375/2774 [4:31:24<4:37:21, 11.90s/it] {'loss': 1.0557, 'learning_rate': 2.657559081420269e-06, 'epoch': 0.5} 50%|████▉ | 1375/2774 [4:31:24<4:37:21, 11.90s/it] 50%|████▉ | 1376/2774 [4:31:37<4:47:41, 12.35s/it] {'loss': 0.9741, 'learning_rate': 2.6546450831108187e-06, 'epoch': 0.5} 50%|████▉ | 1376/2774 [4:31:37<4:47:41, 12.35s/it] 50%|████▉ | 1377/2774 [4:31:50<4:51:09, 12.50s/it] {'loss': 1.0015, 'learning_rate': 2.6517308738748178e-06, 'epoch': 0.5} 50%|████▉ | 1377/2774 [4:31:50<4:51:09, 12.50s/it] 50%|████▉ | 1378/2774 [4:32:04<5:03:54, 13.06s/it] {'loss': 0.9722, 'learning_rate': 2.6488164576870706e-06, 'epoch': 0.5} 50%|████▉ | 1378/2774 [4:32:04<5:03:54, 13.06s/it] 50%|████▉ | 1379/2774 [4:32:15<4:50:50, 12.51s/it] {'loss': 0.9961, 'learning_rate': 2.6459018385226643e-06, 'epoch': 0.5} 50%|████▉ | 1379/2774 [4:32:15<4:50:50, 12.51s/it] 50%|████▉ | 1380/2774 [4:32:27<4:42:52, 12.18s/it] {'loss': 1.0605, 'learning_rate': 2.642987020356964e-06, 'epoch': 0.5} 50%|████▉ | 1380/2774 [4:32:27<4:42:52, 12.18s/it] 50%|████▉ | 1381/2774 [4:32:39<4:39:57, 12.06s/it] {'loss': 1.0288, 'learning_rate': 2.640072007165606e-06, 'epoch': 0.5} 50%|████▉ | 1381/2774 [4:32:39<4:39:57, 12.06s/it] 50%|████▉ | 1382/2774 [4:32:50<4:35:50, 11.89s/it] {'loss': 0.9858, 'learning_rate': 2.637156802924492e-06, 'epoch': 0.5} 50%|████▉ | 1382/2774 [4:32:50<4:35:50, 11.89s/it] 50%|████▉ | 1383/2774 [4:33:01<4:31:44, 11.72s/it] {'loss': 1.019, 'learning_rate': 2.6342414116097838e-06, 'epoch': 0.5} 
50%|████▉ | 1383/2774 [4:33:01<4:31:44, 11.72s/it] 50%|████▉ | 1384/2774 [4:33:13<4:29:12, 11.62s/it] {'loss': 1.0024, 'learning_rate': 2.6313258371978996e-06, 'epoch': 0.5} 50%|████▉ | 1384/2774 [4:33:13<4:29:12, 11.62s/it] 50%|████▉ | 1385/2774 [4:33:31<5:14:40, 13.59s/it] {'loss': 1.0571, 'learning_rate': 2.628410083665506e-06, 'epoch': 0.5} 50%|████▉ | 1385/2774 [4:33:31<5:14:40, 13.59s/it] 50%|████▉ | 1386/2774 [4:33:44<5:11:16, 13.46s/it] {'loss': 1.0059, 'learning_rate': 2.6254941549895156e-06, 'epoch': 0.5} 50%|████▉ | 1386/2774 [4:33:44<5:11:16, 13.46s/it] 50%|█████ | 1387/2774 [4:33:56<5:00:28, 13.00s/it] {'loss': 1.0396, 'learning_rate': 2.62257805514708e-06, 'epoch': 0.5} 50%|█████ | 1387/2774 [4:33:56<5:00:28, 13.00s/it] 50%|█████ | 1388/2774 [4:34:09<5:03:16, 13.13s/it] {'loss': 1.0405, 'learning_rate': 2.61966178811558e-06, 'epoch': 0.5} 50%|█████ | 1388/2774 [4:34:09<5:03:16, 13.13s/it] 50%|█████ | 1389/2774 [4:34:23<5:03:21, 13.14s/it] {'loss': 1.0308, 'learning_rate': 2.6167453578726303e-06, 'epoch': 0.5} 50%|█████ | 1389/2774 [4:34:23<5:03:21, 13.14s/it] 50%|█████ | 1390/2774 [4:34:35<4:55:24, 12.81s/it] {'loss': 1.0508, 'learning_rate': 2.613828768396065e-06, 'epoch': 0.5} 50%|█████ | 1390/2774 [4:34:35<4:55:24, 12.81s/it] 50%|█████ | 1391/2774 [4:34:46<4:44:45, 12.35s/it] {'loss': 1.0996, 'learning_rate': 2.610912023663936e-06, 'epoch': 0.5} 50%|█████ | 1391/2774 [4:34:46<4:44:45, 12.35s/it] 50%|█████ | 1392/2774 [4:35:00<4:53:21, 12.74s/it] {'loss': 1.0029, 'learning_rate': 2.6079951276545067e-06, 'epoch': 0.5} 50%|█████ | 1392/2774 [4:35:00<4:53:21, 12.74s/it] 50%|█████ | 1393/2774 [4:35:12<4:49:43, 12.59s/it] {'loss': 0.9922, 'learning_rate': 2.605078084346247e-06, 'epoch': 0.5} 50%|█████ | 1393/2774 [4:35:12<4:49:43, 12.59s/it] 50%|█████ | 1394/2774 [4:35:26<4:58:47, 12.99s/it] {'loss': 0.9883, 'learning_rate': 2.602160897717828e-06, 'epoch': 0.5} 50%|█████ | 1394/2774 [4:35:26<4:58:47, 12.99s/it] 50%|█████ | 1395/2774 [4:35:37<4:47:16, 
12.50s/it] {'loss': 1.0234, 'learning_rate': 2.599243571748116e-06, 'epoch': 0.5} 50%|█████ | 1395/2774 [4:35:37<4:47:16, 12.50s/it] 50%|█████ | 1396/2774 [4:35:48<4:39:24, 12.17s/it] {'loss': 1.0361, 'learning_rate': 2.596326110416167e-06, 'epoch': 0.5} 50%|█████ | 1396/2774 [4:35:48<4:39:24, 12.17s/it] 50%|█████ | 1397/2774 [4:36:01<4:42:00, 12.29s/it] {'loss': 0.9829, 'learning_rate': 2.593408517701222e-06, 'epoch': 0.5} 50%|█████ | 1397/2774 [4:36:01<4:42:00, 12.29s/it] 50%|█████ | 1398/2774 [4:36:13<4:36:22, 12.05s/it] {'loss': 1.0044, 'learning_rate': 2.5904907975827015e-06, 'epoch': 0.5} 50%|█████ | 1398/2774 [4:36:13<4:36:22, 12.05s/it] 50%|█████ | 1399/2774 [4:36:24<4:34:26, 11.98s/it] {'loss': 1.04, 'learning_rate': 2.5875729540401993e-06, 'epoch': 0.5} 50%|█████ | 1399/2774 [4:36:24<4:34:26, 11.98s/it] 50%|█████ | 1400/2774 [4:36:36<4:29:58, 11.79s/it] {'loss': 1.0386, 'learning_rate': 2.584654991053479e-06, 'epoch': 0.5} 50%|█████ | 1400/2774 [4:36:36<4:29:58, 11.79s/it] 51%|█████ | 1401/2774 [4:36:47<4:26:24, 11.64s/it] {'loss': 1.0483, 'learning_rate': 2.581736912602464e-06, 'epoch': 0.51} 51%|█████ | 1401/2774 [4:36:47<4:26:24, 11.64s/it] 51%|█████ | 1402/2774 [4:36:58<4:23:52, 11.54s/it] {'loss': 1.0571, 'learning_rate': 2.578818722667238e-06, 'epoch': 0.51} 51%|█████ | 1402/2774 [4:36:58<4:23:52, 11.54s/it] 51%|█████ | 1403/2774 [4:37:10<4:22:29, 11.49s/it] {'loss': 1.083, 'learning_rate': 2.575900425228035e-06, 'epoch': 0.51} 51%|█████ | 1403/2774 [4:37:10<4:22:29, 11.49s/it] 51%|█████ | 1404/2774 [4:37:21<4:22:05, 11.48s/it] {'loss': 1.0425, 'learning_rate': 2.5729820242652376e-06, 'epoch': 0.51} 51%|█████ | 1404/2774 [4:37:21<4:22:05, 11.48s/it] 51%|█████ | 1405/2774 [4:37:35<4:37:33, 12.16s/it] {'loss': 0.9502, 'learning_rate': 2.570063523759368e-06, 'epoch': 0.51} 51%|█████ | 1405/2774 [4:37:35<4:37:33, 12.16s/it] 51%|█████ | 1406/2774 [4:37:47<4:34:43, 12.05s/it] {'loss': 1.0723, 'learning_rate': 2.5671449276910836e-06, 'epoch': 0.51} 
51%|█████ | 1406/2774 [4:37:47<4:34:43, 12.05s/it] 51%|█████ | 1407/2774 [4:37:58<4:30:09, 11.86s/it] {'loss': 1.0703, 'learning_rate': 2.5642262400411745e-06, 'epoch': 0.51} 51%|█████ | 1407/2774 [4:37:58<4:30:09, 11.86s/it] 51%|█████ | 1408/2774 [4:38:09<4:25:09, 11.65s/it] {'loss': 1.0239, 'learning_rate': 2.561307464790554e-06, 'epoch': 0.51} 51%|█████ | 1408/2774 [4:38:09<4:25:09, 11.65s/it] 51%|█████ | 1409/2774 [4:38:21<4:24:03, 11.61s/it] {'loss': 0.9731, 'learning_rate': 2.558388605920255e-06, 'epoch': 0.51} 51%|█████ | 1409/2774 [4:38:21<4:24:03, 11.61s/it] 51%|█████ | 1410/2774 [4:38:32<4:22:19, 11.54s/it] {'loss': 0.979, 'learning_rate': 2.5554696674114243e-06, 'epoch': 0.51} 51%|█████ | 1410/2774 [4:38:32<4:22:19, 11.54s/it] 51%|█████ | 1411/2774 [4:38:44<4:22:04, 11.54s/it] {'loss': 0.9814, 'learning_rate': 2.552550653245318e-06, 'epoch': 0.51} 51%|█████ | 1411/2774 [4:38:44<4:22:04, 11.54s/it] 51%|█████ | 1412/2774 [4:38:55<4:23:08, 11.59s/it] {'loss': 1.0835, 'learning_rate': 2.5496315674032952e-06, 'epoch': 0.51} 51%|█████ | 1412/2774 [4:38:55<4:23:08, 11.59s/it] 51%|█████ | 1413/2774 [4:39:07<4:22:12, 11.56s/it] {'loss': 1.0508, 'learning_rate': 2.5467124138668126e-06, 'epoch': 0.51} 51%|█████ | 1413/2774 [4:39:07<4:22:12, 11.56s/it] 51%|█████ | 1414/2774 [4:39:18<4:21:21, 11.53s/it] {'loss': 1.0312, 'learning_rate': 2.54379319661742e-06, 'epoch': 0.51} 51%|█████ | 1414/2774 [4:39:18<4:21:21, 11.53s/it] 51%|█████ | 1415/2774 [4:39:30<4:22:28, 11.59s/it] {'loss': 1.0566, 'learning_rate': 2.540873919636752e-06, 'epoch': 0.51} 51%|█████ | 1415/2774 [4:39:30<4:22:28, 11.59s/it] 51%|█████ | 1416/2774 [4:39:41<4:21:06, 11.54s/it] {'loss': 1.0889, 'learning_rate': 2.537954586906527e-06, 'epoch': 0.51} 51%|█████ | 1416/2774 [4:39:41<4:21:06, 11.54s/it] 51%|█████ | 1417/2774 [4:39:53<4:19:35, 11.48s/it] {'loss': 1.0518, 'learning_rate': 2.5350352024085383e-06, 'epoch': 0.51} 51%|█████ | 1417/2774 [4:39:53<4:19:35, 11.48s/it] 51%|█████ | 1418/2774 
[4:40:04<4:18:01, 11.42s/it] {'loss': 1.0479, 'learning_rate': 2.5321157701246503e-06, 'epoch': 0.51} 51%|█████ | 1418/2774 [4:40:04<4:18:01, 11.42s/it] 51%|█████ | 1419/2774 [4:40:15<4:16:36, 11.36s/it] {'loss': 1.0264, 'learning_rate': 2.5291962940367915e-06, 'epoch': 0.51} 51%|█████ | 1419/2774 [4:40:15<4:16:36, 11.36s/it] 51%|█████ | 1420/2774 [4:40:27<4:16:40, 11.37s/it] {'loss': 1.0513, 'learning_rate': 2.526276778126951e-06, 'epoch': 0.51} 51%|█████ | 1420/2774 [4:40:27<4:16:40, 11.37s/it] 51%|█████ | 1421/2774 [4:40:38<4:15:54, 11.35s/it] {'loss': 1.0171, 'learning_rate': 2.5233572263771727e-06, 'epoch': 0.51} 51%|█████ | 1421/2774 [4:40:38<4:15:54, 11.35s/it] 51%|█████▏ | 1422/2774 [4:40:52<4:35:28, 12.22s/it] {'loss': 1.0981, 'learning_rate': 2.520437642769549e-06, 'epoch': 0.51} 51%|█████▏ | 1422/2774 [4:40:52<4:35:28, 12.22s/it] 51%|█████▏ | 1423/2774 [4:41:04<4:30:12, 12.00s/it] {'loss': 1.0522, 'learning_rate': 2.5175180312862145e-06, 'epoch': 0.51} 51%|█████▏ | 1423/2774 [4:41:04<4:30:12, 12.00s/it] 51%|█████▏ | 1424/2774 [4:41:16<4:30:00, 12.00s/it] {'loss': 1.0879, 'learning_rate': 2.514598395909344e-06, 'epoch': 0.51} 51%|█████▏ | 1424/2774 [4:41:16<4:30:00, 12.00s/it] 51%|█████▏ | 1425/2774 [4:41:28<4:28:20, 11.93s/it] {'loss': 1.0459, 'learning_rate': 2.511678740621143e-06, 'epoch': 0.51} 51%|█████▏ | 1425/2774 [4:41:28<4:28:20, 11.93s/it] 51%|█████▏ | 1426/2774 [4:41:41<4:36:28, 12.31s/it] {'loss': 1.0586, 'learning_rate': 2.5087590694038455e-06, 'epoch': 0.51} 51%|█████▏ | 1426/2774 [4:41:41<4:36:28, 12.31s/it] 51%|█████▏ | 1427/2774 [4:41:52<4:31:20, 12.09s/it] {'loss': 1.0493, 'learning_rate': 2.5058393862397067e-06, 'epoch': 0.51} 51%|█████▏ | 1427/2774 [4:41:52<4:31:20, 12.09s/it] 51%|█████▏ | 1428/2774 [4:42:04<4:29:29, 12.01s/it] {'loss': 1.0288, 'learning_rate': 2.5029196951109975e-06, 'epoch': 0.51} 51%|█████▏ | 1428/2774 [4:42:04<4:29:29, 12.01s/it] 52%|█████▏ | 1429/2774 [4:42:17<4:31:43, 12.12s/it] {'loss': 0.9868, 'learning_rate': 
2.5e-06, 'epoch': 0.52} 52%|█████▏ | 1429/2774 [4:42:17<4:31:43, 12.12s/it] 52%|█████▏ | 1430/2774 [4:42:28<4:28:22, 11.98s/it] {'loss': 1.0098, 'learning_rate': 2.4970803048890033e-06, 'epoch': 0.52} 52%|█████▏ | 1430/2774 [4:42:28<4:28:22, 11.98s/it] 52%|█████▏ | 1431/2774 [4:42:39<4:23:33, 11.77s/it] {'loss': 1.0249, 'learning_rate': 2.4941606137602937e-06, 'epoch': 0.52} 52%|█████▏ | 1431/2774 [4:42:39<4:23:33, 11.77s/it] 52%|█████▏ | 1432/2774 [4:42:51<4:21:42, 11.70s/it] {'loss': 1.0, 'learning_rate': 2.491240930596155e-06, 'epoch': 0.52} 52%|█████▏ | 1432/2774 [4:42:51<4:21:42, 11.70s/it] 52%|█████▏ | 1433/2774 [4:43:02<4:20:17, 11.65s/it] {'loss': 1.0854, 'learning_rate': 2.488321259378857e-06, 'epoch': 0.52} 52%|█████▏ | 1433/2774 [4:43:03<4:20:17, 11.65s/it] 52%|█████▏ | 1434/2774 [4:43:14<4:17:49, 11.54s/it] {'loss': 1.0322, 'learning_rate': 2.4854016040906574e-06, 'epoch': 0.52} 52%|█████▏ | 1434/2774 [4:43:14<4:17:49, 11.54s/it] 52%|█████▏ | 1435/2774 [4:43:25<4:18:03, 11.56s/it] {'loss': 1.0215, 'learning_rate': 2.482481968713787e-06, 'epoch': 0.52} 52%|█████▏ | 1435/2774 [4:43:25<4:18:03, 11.56s/it] 52%|█████▏ | 1436/2774 [4:43:37<4:17:33, 11.55s/it] {'loss': 1.0977, 'learning_rate': 2.4795623572304523e-06, 'epoch': 0.52} 52%|█████▏ | 1436/2774 [4:43:37<4:17:33, 11.55s/it] 52%|█████▏ | 1437/2774 [4:43:48<4:16:49, 11.53s/it] {'loss': 1.0264, 'learning_rate': 2.4766427736228277e-06, 'epoch': 0.52} 52%|█████▏ | 1437/2774 [4:43:48<4:16:49, 11.53s/it] 52%|█████▏ | 1438/2774 [4:44:00<4:15:16, 11.46s/it] {'loss': 1.0391, 'learning_rate': 2.4737232218730495e-06, 'epoch': 0.52} 52%|█████▏ | 1438/2774 [4:44:00<4:15:16, 11.46s/it] 52%|█████▏ | 1439/2774 [4:44:11<4:15:31, 11.48s/it] {'loss': 1.1123, 'learning_rate': 2.4708037059632094e-06, 'epoch': 0.52} 52%|█████▏ | 1439/2774 [4:44:11<4:15:31, 11.48s/it] 52%|█████▏ | 1440/2774 [4:44:23<4:14:36, 11.45s/it] {'loss': 1.0186, 'learning_rate': 2.467884229875351e-06, 'epoch': 0.52} 52%|█████▏ | 1440/2774 
[4:44:23<4:14:36, 11.45s/it] 52%|█████▏ | 1441/2774 [4:44:37<4:30:58, 12.20s/it] {'loss': 0.9585, 'learning_rate': 2.464964797591462e-06, 'epoch': 0.52} 52%|█████▏ | 1441/2774 [4:44:37<4:30:58, 12.20s/it] 52%|█████▏ | 1442/2774 [4:44:48<4:25:34, 11.96s/it] {'loss': 1.1006, 'learning_rate': 2.4620454130934732e-06, 'epoch': 0.52} 52%|█████▏ | 1442/2774 [4:44:48<4:25:34, 11.96s/it] 52%|█████▏ | 1443/2774 [4:45:00<4:22:34, 11.84s/it] {'loss': 1.0444, 'learning_rate': 2.4591260803632484e-06, 'epoch': 0.52} 52%|█████▏ | 1443/2774 [4:45:00<4:22:34, 11.84s/it] 52%|█████▏ | 1444/2774 [4:45:11<4:18:17, 11.65s/it] {'loss': 1.0112, 'learning_rate': 2.4562068033825807e-06, 'epoch': 0.52} 52%|█████▏ | 1444/2774 [4:45:11<4:18:17, 11.65s/it] 52%|█████▏ | 1445/2774 [4:45:22<4:16:54, 11.60s/it] {'loss': 1.0176, 'learning_rate': 2.453287586133188e-06, 'epoch': 0.52} 52%|█████▏ | 1445/2774 [4:45:22<4:16:54, 11.60s/it] 52%|█████▏ | 1446/2774 [4:45:34<4:15:19, 11.54s/it] {'loss': 1.0479, 'learning_rate': 2.450368432596705e-06, 'epoch': 0.52} 52%|█████▏ | 1446/2774 [4:45:34<4:15:19, 11.54s/it] 52%|█████▏ | 1447/2774 [4:45:46<4:21:05, 11.81s/it] {'loss': 1.105, 'learning_rate': 2.4474493467546828e-06, 'epoch': 0.52} 52%|█████▏ | 1447/2774 [4:45:46<4:21:05, 11.81s/it] 52%|█████▏ | 1448/2774 [4:45:58<4:21:31, 11.83s/it] {'loss': 1.0073, 'learning_rate': 2.4445303325885765e-06, 'epoch': 0.52} 52%|█████▏ | 1448/2774 [4:45:58<4:21:31, 11.83s/it] 52%|█████▏ | 1449/2774 [4:46:09<4:19:07, 11.73s/it] {'loss': 1.0127, 'learning_rate': 2.4416113940797457e-06, 'epoch': 0.52} 52%|█████▏ | 1449/2774 [4:46:09<4:19:07, 11.73s/it] 52%|█████▏ | 1450/2774 [4:46:21<4:18:41, 11.72s/it] {'loss': 1.0103, 'learning_rate': 2.4386925352094464e-06, 'epoch': 0.52} 52%|█████▏ | 1450/2774 [4:46:21<4:18:41, 11.72s/it] 52%|█████▏ | 1451/2774 [4:46:33<4:16:24, 11.63s/it] {'loss': 1.0479, 'learning_rate': 2.4357737599588255e-06, 'epoch': 0.52} 52%|█████▏ | 1451/2774 [4:46:33<4:16:24, 11.63s/it] 52%|█████▏ | 1452/2774 
[4:46:44<4:14:31, 11.55s/it] {'loss': 1.0073, 'learning_rate': 2.4328550723089173e-06, 'epoch': 0.52} 52%|█████▏ | 1452/2774 [4:46:44<4:14:31, 11.55s/it] 52%|█████▏ | 1453/2774 [4:46:55<4:14:03, 11.54s/it] {'loss': 1.0278, 'learning_rate': 2.429936476240633e-06, 'epoch': 0.52} 52%|█████▏ | 1453/2774 [4:46:55<4:14:03, 11.54s/it] 52%|█████▏ | 1454/2774 [4:47:07<4:12:53, 11.50s/it] {'loss': 1.0059, 'learning_rate': 2.4270179757347633e-06, 'epoch': 0.52} 52%|█████▏ | 1454/2774 [4:47:07<4:12:53, 11.50s/it] 52%|█████▏ | 1455/2774 [4:47:18<4:13:01, 11.51s/it] {'loss': 1.0088, 'learning_rate': 2.4240995747719657e-06, 'epoch': 0.52} 52%|█████▏ | 1455/2774 [4:47:18<4:13:01, 11.51s/it] 52%|█████▏ | 1456/2774 [4:47:30<4:11:54, 11.47s/it] {'loss': 1.0039, 'learning_rate': 2.421181277332763e-06, 'epoch': 0.52} 52%|█████▏ | 1456/2774 [4:47:30<4:11:54, 11.47s/it] 53%|█████▎ | 1457/2774 [4:47:42<4:18:30, 11.78s/it] {'loss': 1.0498, 'learning_rate': 2.418263087397537e-06, 'epoch': 0.53} 53%|█████▎ | 1457/2774 [4:47:42<4:18:30, 11.78s/it] 53%|█████▎ | 1458/2774 [4:47:56<4:31:43, 12.39s/it] {'loss': 1.0488, 'learning_rate': 2.415345008946522e-06, 'epoch': 0.53} 53%|█████▎ | 1458/2774 [4:47:56<4:31:43, 12.39s/it] 53%|█████▎ | 1459/2774 [4:48:08<4:27:50, 12.22s/it] {'loss': 1.0903, 'learning_rate': 2.4124270459598007e-06, 'epoch': 0.53} 53%|█████▎ | 1459/2774 [4:48:08<4:27:50, 12.22s/it] 53%|█████▎ | 1460/2774 [4:48:19<4:22:30, 11.99s/it] {'loss': 1.0298, 'learning_rate': 2.4095092024172994e-06, 'epoch': 0.53} 53%|█████▎ | 1460/2774 [4:48:19<4:22:30, 11.99s/it] 53%|█████▎ | 1461/2774 [4:48:32<4:27:16, 12.21s/it] {'loss': 1.0024, 'learning_rate': 2.406591482298779e-06, 'epoch': 0.53} 53%|█████▎ | 1461/2774 [4:48:32<4:27:16, 12.21s/it] 53%|█████▎ | 1462/2774 [4:48:44<4:22:12, 11.99s/it] {'loss': 1.0576, 'learning_rate': 2.403673889583835e-06, 'epoch': 0.53} 53%|█████▎ | 1462/2774 [4:48:44<4:22:12, 11.99s/it] 53%|█████▎ | 1463/2774 [4:48:55<4:17:52, 11.80s/it] {'loss': 1.0254, 
'learning_rate': 2.4007564282518854e-06, 'epoch': 0.53} 53%|█████▎ | 1463/2774 [4:48:55<4:17:52, 11.80s/it] 53%|█████▎ | 1464/2774 [4:49:06<4:15:17, 11.69s/it] {'loss': 1.0337, 'learning_rate': 2.397839102282173e-06, 'epoch': 0.53} 53%|█████▎ | 1464/2774 [4:49:06<4:15:17, 11.69s/it] 53%|█████▎ | 1465/2774 [4:49:19<4:18:46, 11.86s/it] {'loss': 1.04, 'learning_rate': 2.394921915653754e-06, 'epoch': 0.53} 53%|█████▎ | 1465/2774 [4:49:19<4:18:46, 11.86s/it] 53%|█████▎ | 1466/2774 [4:49:30<4:17:13, 11.80s/it] {'loss': 0.9746, 'learning_rate': 2.3920048723454938e-06, 'epoch': 0.53} 53%|█████▎ | 1466/2774 [4:49:30<4:17:13, 11.80s/it] 53%|█████▎ | 1467/2774 [4:49:41<4:13:17, 11.63s/it] {'loss': 0.999, 'learning_rate': 2.3890879763360643e-06, 'epoch': 0.53} 53%|█████▎ | 1467/2774 [4:49:41<4:13:17, 11.63s/it] 53%|█████▎ | 1468/2774 [4:49:53<4:10:47, 11.52s/it] {'loss': 1.0801, 'learning_rate': 2.386171231603935e-06, 'epoch': 0.53} 53%|█████▎ | 1468/2774 [4:49:53<4:10:47, 11.52s/it] 53%|█████▎ | 1469/2774 [4:50:04<4:08:39, 11.43s/it] {'loss': 0.9653, 'learning_rate': 2.3832546421273693e-06, 'epoch': 0.53} 53%|█████▎ | 1469/2774 [4:50:04<4:08:39, 11.43s/it] 53%|█████▎ | 1470/2774 [4:50:16<4:09:10, 11.46s/it] {'loss': 1.0659, 'learning_rate': 2.380338211884421e-06, 'epoch': 0.53} 53%|█████▎ | 1470/2774 [4:50:16<4:09:10, 11.46s/it] 53%|█████▎ | 1471/2774 [4:50:29<4:24:41, 12.19s/it] {'loss': 1.0488, 'learning_rate': 2.377421944852922e-06, 'epoch': 0.53} 53%|█████▎ | 1471/2774 [4:50:29<4:24:41, 12.19s/it] 53%|█████▎ | 1472/2774 [4:50:41<4:21:50, 12.07s/it] {'loss': 1.0283, 'learning_rate': 2.374505845010485e-06, 'epoch': 0.53} 53%|█████▎ | 1472/2774 [4:50:41<4:21:50, 12.07s/it] 53%|█████▎ | 1473/2774 [4:50:53<4:18:58, 11.94s/it] {'loss': 1.0571, 'learning_rate': 2.3715899163344947e-06, 'epoch': 0.53} 53%|█████▎ | 1473/2774 [4:50:53<4:18:58, 11.94s/it] 53%|█████▎ | 1474/2774 [4:51:04<4:15:48, 11.81s/it] {'loss': 1.0449, 'learning_rate': 2.3686741628021016e-06, 'epoch': 0.53} 
53%|█████▎ | 1474/2774 [4:51:04<4:15:48, 11.81s/it] 53%|█████▎ | 1475/2774 [4:51:16<4:12:17, 11.65s/it] {'loss': 1.0811, 'learning_rate': 2.365758588390217e-06, 'epoch': 0.53} 53%|█████▎ | 1475/2774 [4:51:16<4:12:17, 11.65s/it] 53%|█████▎ | 1476/2774 [4:51:28<4:18:26, 11.95s/it] {'loss': 0.9927, 'learning_rate': 2.3628431970755087e-06, 'epoch': 0.53} 53%|█████▎ | 1476/2774 [4:51:28<4:18:26, 11.95s/it] 53%|█████▎ | 1477/2774 [4:51:40<4:18:36, 11.96s/it] {'loss': 1.0483, 'learning_rate': 2.359927992834394e-06, 'epoch': 0.53} 53%|█████▎ | 1477/2774 [4:51:40<4:18:36, 11.96s/it] 53%|█████▎ | 1478/2774 [4:51:52<4:18:01, 11.95s/it] {'loss': 0.9756, 'learning_rate': 2.357012979643036e-06, 'epoch': 0.53} 53%|█████▎ | 1478/2774 [4:51:52<4:18:01, 11.95s/it] 53%|█████▎ | 1479/2774 [4:52:04<4:15:41, 11.85s/it] {'loss': 1.022, 'learning_rate': 2.3540981614773366e-06, 'epoch': 0.53} 53%|█████▎ | 1479/2774 [4:52:04<4:15:41, 11.85s/it] 53%|█████▎ | 1480/2774 [4:52:16<4:16:11, 11.88s/it] {'loss': 1.0703, 'learning_rate': 2.3511835423129307e-06, 'epoch': 0.53} 53%|█████▎ | 1480/2774 [4:52:16<4:16:11, 11.88s/it] 53%|█████▎ | 1481/2774 [4:52:27<4:12:10, 11.70s/it] {'loss': 0.9927, 'learning_rate': 2.3482691261251835e-06, 'epoch': 0.53} 53%|█████▎ | 1481/2774 [4:52:27<4:12:10, 11.70s/it] 53%|█████▎ | 1482/2774 [4:52:38<4:10:16, 11.62s/it] {'loss': 1.0234, 'learning_rate': 2.3453549168891817e-06, 'epoch': 0.53} 53%|█████▎ | 1482/2774 [4:52:38<4:10:16, 11.62s/it] 53%|█████▎ | 1483/2774 [4:52:50<4:07:49, 11.52s/it] {'loss': 1.0459, 'learning_rate': 2.342440918579732e-06, 'epoch': 0.53} 53%|█████▎ | 1483/2774 [4:52:50<4:07:49, 11.52s/it] 53%|█████▎ | 1484/2774 [4:53:01<4:06:57, 11.49s/it] {'loss': 0.9639, 'learning_rate': 2.3395271351713515e-06, 'epoch': 0.53} 53%|█████▎ | 1484/2774 [4:53:01<4:06:57, 11.49s/it] 54%|█████▎ | 1485/2774 [4:53:13<4:08:20, 11.56s/it] {'loss': 1.0854, 'learning_rate': 2.3366135706382644e-06, 'epoch': 0.54} 54%|█████▎ | 1485/2774 [4:53:13<4:08:20, 11.56s/it] 
54%|█████▎ | 1486/2774 [4:53:25<4:09:40, 11.63s/it] {'loss': 1.083, 'learning_rate': 2.333700228954398e-06, 'epoch': 0.54} 54%|█████▎ | 1486/2774 [4:53:25<4:09:40, 11.63s/it] 54%|█████▎ | 1487/2774 [4:53:36<4:07:08, 11.52s/it] {'loss': 1.0493, 'learning_rate': 2.3307871140933725e-06, 'epoch': 0.54} 54%|█████▎ | 1487/2774 [4:53:36<4:07:08, 11.52s/it] 54%|█████▎ | 1488/2774 [4:53:47<4:06:36, 11.51s/it] {'loss': 1.0029, 'learning_rate': 2.327874230028502e-06, 'epoch': 0.54} 54%|█████▎ | 1488/2774 [4:53:47<4:06:36, 11.51s/it] 54%|█████▎ | 1489/2774 [4:53:59<4:07:17, 11.55s/it] {'loss': 1.1045, 'learning_rate': 2.3249615807327836e-06, 'epoch': 0.54} 54%|█████▎ | 1489/2774 [4:53:59<4:07:17, 11.55s/it] 54%|█████▎ | 1490/2774 [4:54:11<4:06:36, 11.52s/it] {'loss': 1.085, 'learning_rate': 2.3220491701788953e-06, 'epoch': 0.54} 54%|█████▎ | 1490/2774 [4:54:11<4:06:36, 11.52s/it] 54%|█████▎ | 1491/2774 [4:54:23<4:10:12, 11.70s/it] {'loss': 1.0215, 'learning_rate': 2.3191370023391894e-06, 'epoch': 0.54} 54%|█████▎ | 1491/2774 [4:54:23<4:10:12, 11.70s/it] 54%|█████▍ | 1492/2774 [4:54:34<4:08:44, 11.64s/it] {'loss': 1.0093, 'learning_rate': 2.3162250811856863e-06, 'epoch': 0.54} 54%|█████▍ | 1492/2774 [4:54:34<4:08:44, 11.64s/it] 54%|█████▍ | 1493/2774 [4:54:45<4:05:11, 11.48s/it] {'loss': 1.0586, 'learning_rate': 2.313313410690071e-06, 'epoch': 0.54} 54%|█████▍ | 1493/2774 [4:54:45<4:05:11, 11.48s/it] 54%|█████▍ | 1494/2774 [4:54:57<4:04:55, 11.48s/it] {'loss': 1.1016, 'learning_rate': 2.3104019948236864e-06, 'epoch': 0.54} 54%|█████▍ | 1494/2774 [4:54:57<4:04:55, 11.48s/it] 54%|█████▍ | 1495/2774 [4:55:08<4:04:04, 11.45s/it] {'loss': 1.0366, 'learning_rate': 2.3074908375575273e-06, 'epoch': 0.54} 54%|█████▍ | 1495/2774 [4:55:08<4:04:04, 11.45s/it] 54%|█████▍ | 1496/2774 [4:55:19<4:02:23, 11.38s/it] {'loss': 1.0449, 'learning_rate': 2.3045799428622366e-06, 'epoch': 0.54} 54%|█████▍ | 1496/2774 [4:55:19<4:02:23, 11.38s/it] 54%|█████▍ | 1497/2774 [4:55:31<4:02:58, 11.42s/it] 
{'loss': 1.0093, 'learning_rate': 2.3016693147081e-06, 'epoch': 0.54} 54%|█████▍ | 1497/2774 [4:55:31<4:02:58, 11.42s/it] 54%|█████▍ | 1498/2774 [4:55:44<4:14:47, 11.98s/it] {'loss': 0.9878, 'learning_rate': 2.298758957065036e-06, 'epoch': 0.54} 54%|█████▍ | 1498/2774 [4:55:44<4:14:47, 11.98s/it] 54%|█████▍ | 1499/2774 [4:55:56<4:14:02, 11.95s/it] {'loss': 0.9653, 'learning_rate': 2.295848873902598e-06, 'epoch': 0.54} 54%|█████▍ | 1499/2774 [4:55:56<4:14:02, 11.95s/it] 54%|█████▍ | 1500/2774 [4:56:08<4:11:00, 11.82s/it] {'loss': 1.0483, 'learning_rate': 2.2929390691899635e-06, 'epoch': 0.54} 54%|█████▍ | 1500/2774 [4:56:08<4:11:00, 11.82s/it] 54%|█████▍ | 1501/2774 [4:56:19<4:07:08, 11.65s/it] {'loss': 1.0049, 'learning_rate': 2.2900295468959304e-06, 'epoch': 0.54} 54%|█████▍ | 1501/2774 [4:56:19<4:07:08, 11.65s/it] 54%|█████▍ | 1502/2774 [4:56:31<4:07:32, 11.68s/it] {'loss': 0.9829, 'learning_rate': 2.2871203109889117e-06, 'epoch': 0.54} 54%|█████▍ | 1502/2774 [4:56:31<4:07:32, 11.68s/it] 54%|█████▍ | 1503/2774 [4:56:42<4:07:01, 11.66s/it] {'loss': 1.0122, 'learning_rate': 2.284211365436929e-06, 'epoch': 0.54} 54%|█████▍ | 1503/2774 [4:56:42<4:07:01, 11.66s/it] 54%|█████▍ | 1504/2774 [4:56:54<4:09:00, 11.76s/it] {'loss': 1.019, 'learning_rate': 2.281302714207608e-06, 'epoch': 0.54} 54%|█████▍ | 1504/2774 [4:56:54<4:09:00, 11.76s/it] 54%|█████▍ | 1505/2774 [4:57:08<4:19:31, 12.27s/it] {'loss': 1.0117, 'learning_rate': 2.2783943612681743e-06, 'epoch': 0.54} 54%|█████▍ | 1505/2774 [4:57:08<4:19:31, 12.27s/it] 54%|█████▍ | 1506/2774 [4:57:19<4:14:15, 12.03s/it] {'loss': 0.9893, 'learning_rate': 2.2754863105854456e-06, 'epoch': 0.54} 54%|█████▍ | 1506/2774 [4:57:19<4:14:15, 12.03s/it] 54%|█████▍ | 1507/2774 [4:57:30<4:08:30, 11.77s/it] {'loss': 1.0283, 'learning_rate': 2.272578566125826e-06, 'epoch': 0.54} 54%|█████▍ | 1507/2774 [4:57:30<4:08:30, 11.77s/it] 54%|█████▍ | 1508/2774 [4:57:42<4:06:47, 11.70s/it] {'loss': 1.0322, 'learning_rate': 2.269671131855304e-06, 
'epoch': 0.54} 54%|█████▍ | 1508/2774 [4:57:42<4:06:47, 11.70s/it] 54%|█████▍ | 1509/2774 [4:57:53<4:06:38, 11.70s/it] {'loss': 1.0518, 'learning_rate': 2.266764011739444e-06, 'epoch': 0.54} 54%|█████▍ | 1509/2774 [4:57:53<4:06:38, 11.70s/it] 54%|█████▍ | 1510/2774 [4:58:05<4:04:55, 11.63s/it] {'loss': 1.0142, 'learning_rate': 2.263857209743383e-06, 'epoch': 0.54} 54%|█████▍ | 1510/2774 [4:58:05<4:04:55, 11.63s/it] 54%|█████▍ | 1511/2774 [4:58:17<4:06:29, 11.71s/it] {'loss': 1.0156, 'learning_rate': 2.2609507298318235e-06, 'epoch': 0.54} 54%|█████▍ | 1511/2774 [4:58:17<4:06:29, 11.71s/it] 55%|█████▍ | 1512/2774 [4:58:28<4:05:13, 11.66s/it] {'loss': 1.0151, 'learning_rate': 2.258044575969027e-06, 'epoch': 0.55} 55%|█████▍ | 1512/2774 [4:58:28<4:05:13, 11.66s/it] 55%|█████▍ | 1513/2774 [4:58:40<4:04:06, 11.61s/it] {'loss': 1.0464, 'learning_rate': 2.2551387521188135e-06, 'epoch': 0.55} 55%|█████▍ | 1513/2774 [4:58:40<4:04:06, 11.61s/it] 55%|█████▍ | 1514/2774 [4:58:51<4:00:35, 11.46s/it] {'loss': 1.0225, 'learning_rate': 2.25223326224455e-06, 'epoch': 0.55} 55%|█████▍ | 1514/2774 [4:58:51<4:00:35, 11.46s/it] 55%|█████▍ | 1515/2774 [4:59:02<3:59:23, 11.41s/it] {'loss': 1.0718, 'learning_rate': 2.24932811030915e-06, 'epoch': 0.55} 55%|█████▍ | 1515/2774 [4:59:02<3:59:23, 11.41s/it] 55%|█████▍ | 1516/2774 [4:59:14<3:59:01, 11.40s/it] {'loss': 1.0127, 'learning_rate': 2.246423300275065e-06, 'epoch': 0.55} 55%|█████▍ | 1516/2774 [4:59:14<3:59:01, 11.40s/it] 55%|█████▍ | 1517/2774 [4:59:25<3:57:30, 11.34s/it] {'loss': 1.0371, 'learning_rate': 2.2435188361042794e-06, 'epoch': 0.55} 55%|█████▍ | 1517/2774 [4:59:25<3:57:30, 11.34s/it] 55%|█████▍ | 1518/2774 [4:59:38<4:10:56, 11.99s/it] {'loss': 0.9985, 'learning_rate': 2.240614721758308e-06, 'epoch': 0.55} 55%|█████▍ | 1518/2774 [4:59:38<4:10:56, 11.99s/it] 55%|█████▍ | 1519/2774 [4:59:50<4:09:21, 11.92s/it] {'loss': 1.0176, 'learning_rate': 2.2377109611981875e-06, 'epoch': 0.55} 55%|█████▍ | 1519/2774 [4:59:50<4:09:21, 
11.92s/it] 55%|█████▍ | 1520/2774 [5:00:02<4:09:46, 11.95s/it] {'loss': 1.0322, 'learning_rate': 2.234807558384471e-06, 'epoch': 0.55} 55%|█████▍ | 1520/2774 [5:00:02<4:09:46, 11.95s/it] 55%|█████▍ | 1521/2774 [5:00:15<4:14:19, 12.18s/it] {'loss': 1.0449, 'learning_rate': 2.2319045172772254e-06, 'epoch': 0.55} 55%|█████▍ | 1521/2774 [5:00:15<4:14:19, 12.18s/it] 55%|█████▍ | 1522/2774 [5:00:28<4:20:02, 12.46s/it] {'loss': 0.9907, 'learning_rate': 2.2290018418360228e-06, 'epoch': 0.55} 55%|█████▍ | 1522/2774 [5:00:28<4:20:02, 12.46s/it] 55%|█████▍ | 1523/2774 [5:00:41<4:21:24, 12.54s/it] {'loss': 0.9746, 'learning_rate': 2.2260995360199376e-06, 'epoch': 0.55} 55%|█████▍ | 1523/2774 [5:00:41<4:21:24, 12.54s/it] 55%|█████▍ | 1524/2774 [5:00:52<4:13:33, 12.17s/it] {'loss': 1.0635, 'learning_rate': 2.2231976037875404e-06, 'epoch': 0.55} 55%|█████▍ | 1524/2774 [5:00:52<4:13:33, 12.17s/it] 55%|█████▍ | 1525/2774 [5:01:04<4:11:05, 12.06s/it] {'loss': 1.0654, 'learning_rate': 2.220296049096889e-06, 'epoch': 0.55} 55%|█████▍ | 1525/2774 [5:01:04<4:11:05, 12.06s/it] 55%|█████▌ | 1526/2774 [5:01:15<4:05:43, 11.81s/it] {'loss': 1.0605, 'learning_rate': 2.2173948759055306e-06, 'epoch': 0.55} 55%|█████▌ | 1526/2774 [5:01:15<4:05:43, 11.81s/it] 55%|█████▌ | 1527/2774 [5:01:27<4:04:30, 11.76s/it] {'loss': 1.021, 'learning_rate': 2.21449408817049e-06, 'epoch': 0.55} 55%|█████▌ | 1527/2774 [5:01:27<4:04:30, 11.76s/it] 55%|█████▌ | 1528/2774 [5:01:38<4:01:37, 11.64s/it] {'loss': 0.9937, 'learning_rate': 2.2115936898482654e-06, 'epoch': 0.55} 55%|█████▌ | 1528/2774 [5:01:38<4:01:37, 11.64s/it] 55%|█████▌ | 1529/2774 [5:01:49<3:58:50, 11.51s/it] {'loss': 1.0864, 'learning_rate': 2.208693684894826e-06, 'epoch': 0.55} 55%|█████▌ | 1529/2774 [5:01:49<3:58:50, 11.51s/it] 55%|█████▌ | 1530/2774 [5:02:01<3:59:43, 11.56s/it] {'loss': 1.0703, 'learning_rate': 2.2057940772656034e-06, 'epoch': 0.55} 55%|█████▌ | 1530/2774 [5:02:01<3:59:43, 11.56s/it] 55%|█████▌ | 1531/2774 [5:02:12<3:59:08, 
11.54s/it] {'loss': 1.0186, 'learning_rate': 2.2028948709154867e-06, 'epoch': 0.55} 55%|█████▌ | 1531/2774 [5:02:12<3:59:08, 11.54s/it] 55%|█████▌ | 1532/2774 [5:02:25<4:05:53, 11.88s/it] {'loss': 1.061, 'learning_rate': 2.199996069798819e-06, 'epoch': 0.55} 55%|█████▌ | 1532/2774 [5:02:25<4:05:53, 11.88s/it] 55%|█████▌ | 1533/2774 [5:02:36<4:00:47, 11.64s/it] {'loss': 1.0508, 'learning_rate': 2.197097677869389e-06, 'epoch': 0.55} 55%|█████▌ | 1533/2774 [5:02:36<4:00:47, 11.64s/it] 55%|█████▌ | 1534/2774 [5:02:50<4:12:20, 12.21s/it] {'loss': 1.0029, 'learning_rate': 2.1941996990804287e-06, 'epoch': 0.55} 55%|█████▌ | 1534/2774 [5:02:50<4:12:20, 12.21s/it] 55%|█████▌ | 1535/2774 [5:03:01<4:06:35, 11.94s/it] {'loss': 1.0044, 'learning_rate': 2.1913021373846056e-06, 'epoch': 0.55} 55%|█████▌ | 1535/2774 [5:03:01<4:06:35, 11.94s/it] 55%|█████▌ | 1536/2774 [5:03:15<4:17:21, 12.47s/it] {'loss': 0.9658, 'learning_rate': 2.1884049967340193e-06, 'epoch': 0.55} 55%|█████▌ | 1536/2774 [5:03:15<4:17:21, 12.47s/it] 55%|█████▌ | 1537/2774 [5:03:26<4:10:55, 12.17s/it] {'loss': 1.0859, 'learning_rate': 2.185508281080195e-06, 'epoch': 0.55} 55%|█████▌ | 1537/2774 [5:03:26<4:10:55, 12.17s/it] 55%|█████▌ | 1538/2774 [5:03:38<4:06:56, 11.99s/it] {'loss': 1.0078, 'learning_rate': 2.182611994374077e-06, 'epoch': 0.55} 55%|█████▌ | 1538/2774 [5:03:38<4:06:56, 11.99s/it] 55%|█████▌ | 1539/2774 [5:03:49<4:02:50, 11.80s/it] {'loss': 1.0024, 'learning_rate': 2.1797161405660257e-06, 'epoch': 0.55} 55%|█████▌ | 1539/2774 [5:03:49<4:02:50, 11.80s/it] 56%|█████▌ | 1540/2774 [5:04:01<4:00:41, 11.70s/it] {'loss': 1.0522, 'learning_rate': 2.1768207236058106e-06, 'epoch': 0.56} 56%|█████▌ | 1540/2774 [5:04:01<4:00:41, 11.70s/it] 56%|█████▌ | 1541/2774 [5:04:12<3:59:04, 11.63s/it] {'loss': 0.9976, 'learning_rate': 2.173925747442606e-06, 'epoch': 0.56} 56%|█████▌ | 1541/2774 [5:04:12<3:59:04, 11.63s/it] 56%|█████▌ | 1542/2774 [5:04:23<3:57:04, 11.55s/it] {'loss': 1.085, 'learning_rate': 
2.1710312160249856e-06, 'epoch': 0.56} 56%|█████▌ | 1542/2774 [5:04:23<3:57:04, 11.55s/it] 56%|█████▌ | 1543/2774 [5:04:36<4:04:57, 11.94s/it] {'loss': 0.9932, 'learning_rate': 2.1681371333009127e-06, 'epoch': 0.56} 56%|█████▌ | 1543/2774 [5:04:36<4:04:57, 11.94s/it] 56%|█████▌ | 1544/2774 [5:04:48<4:02:13, 11.82s/it] {'loss': 1.0488, 'learning_rate': 2.1652435032177425e-06, 'epoch': 0.56} 56%|█████▌ | 1544/2774 [5:04:48<4:02:13, 11.82s/it] 56%|█████▌ | 1545/2774 [5:04:59<3:59:04, 11.67s/it] {'loss': 1.0444, 'learning_rate': 2.1623503297222124e-06, 'epoch': 0.56} 56%|█████▌ | 1545/2774 [5:04:59<3:59:04, 11.67s/it] 56%|█████▌ | 1546/2774 [5:05:11<3:58:29, 11.65s/it] {'loss': 1.0547, 'learning_rate': 2.1594576167604355e-06, 'epoch': 0.56} 56%|█████▌ | 1546/2774 [5:05:11<3:58:29, 11.65s/it] 56%|█████▌ | 1547/2774 [5:05:22<3:58:17, 11.65s/it] {'loss': 1.0273, 'learning_rate': 2.1565653682778975e-06, 'epoch': 0.56} 56%|█████▌ | 1547/2774 [5:05:22<3:58:17, 11.65s/it] 56%|█████▌ | 1548/2774 [5:05:34<3:55:48, 11.54s/it] {'loss': 1.0239, 'learning_rate': 2.153673588219451e-06, 'epoch': 0.56} 56%|█████▌ | 1548/2774 [5:05:34<3:55:48, 11.54s/it] 56%|█████▌ | 1549/2774 [5:05:45<3:57:15, 11.62s/it] {'loss': 1.0684, 'learning_rate': 2.150782280529309e-06, 'epoch': 0.56} 56%|█████▌ | 1549/2774 [5:05:45<3:57:15, 11.62s/it] 56%|█████▌ | 1550/2774 [5:05:57<3:55:34, 11.55s/it] {'loss': 1.0542, 'learning_rate': 2.1478914491510412e-06, 'epoch': 0.56} 56%|█████▌ | 1550/2774 [5:05:57<3:55:34, 11.55s/it] 56%|█████▌ | 1551/2774 [5:06:09<3:57:15, 11.64s/it] {'loss': 0.9854, 'learning_rate': 2.145001098027567e-06, 'epoch': 0.56} 56%|█████▌ | 1551/2774 [5:06:09<3:57:15, 11.64s/it] 56%|█████▌ | 1552/2774 [5:06:20<3:56:25, 11.61s/it] {'loss': 0.9917, 'learning_rate': 2.1421112311011493e-06, 'epoch': 0.56} 56%|█████▌ | 1552/2774 [5:06:20<3:56:25, 11.61s/it] 56%|█████▌ | 1553/2774 [5:06:32<3:54:46, 11.54s/it] {'loss': 1.0078, 'learning_rate': 2.1392218523133927e-06, 'epoch': 0.56} 56%|█████▌ | 
1553/2774 [5:06:32<3:54:46, 11.54s/it] 56%|█████▌ | 1554/2774 [5:06:43<3:55:53, 11.60s/it] {'loss': 1.0273, 'learning_rate': 2.136332965605236e-06, 'epoch': 0.56} 56%|█████▌ | 1554/2774 [5:06:43<3:55:53, 11.60s/it] 56%|█████▌ | 1555/2774 [5:06:55<3:55:55, 11.61s/it] {'loss': 1.0586, 'learning_rate': 2.1334445749169457e-06, 'epoch': 0.56} 56%|█████▌ | 1555/2774 [5:06:55<3:55:55, 11.61s/it] 56%|█████▌ | 1556/2774 [5:07:06<3:54:18, 11.54s/it] {'loss': 1.0327, 'learning_rate': 2.130556684188112e-06, 'epoch': 0.56} 56%|█████▌ | 1556/2774 [5:07:06<3:54:18, 11.54s/it] 56%|█████▌ | 1557/2774 [5:07:18<3:54:26, 11.56s/it] {'loss': 1.0215, 'learning_rate': 2.127669297357644e-06, 'epoch': 0.56} 56%|█████▌ | 1557/2774 [5:07:18<3:54:26, 11.56s/it] 56%|█████▌ | 1558/2774 [5:07:29<3:54:04, 11.55s/it] {'loss': 1.019, 'learning_rate': 2.124782418363762e-06, 'epoch': 0.56} 56%|█████▌ | 1558/2774 [5:07:29<3:54:04, 11.55s/it] 56%|█████▌ | 1559/2774 [5:07:41<3:53:42, 11.54s/it] {'loss': 1.0552, 'learning_rate': 2.1218960511439953e-06, 'epoch': 0.56} 56%|█████▌ | 1559/2774 [5:07:41<3:53:42, 11.54s/it] 56%|█████▌ | 1560/2774 [5:07:53<3:53:16, 11.53s/it] {'loss': 0.9746, 'learning_rate': 2.1190101996351745e-06, 'epoch': 0.56} 56%|█████▌ | 1560/2774 [5:07:53<3:53:16, 11.53s/it] 56%|█████▋ | 1561/2774 [5:08:04<3:52:42, 11.51s/it] {'loss': 1.0146, 'learning_rate': 2.1161248677734263e-06, 'epoch': 0.56} 56%|█████▋ | 1561/2774 [5:08:04<3:52:42, 11.51s/it] 56%|█████▋ | 1562/2774 [5:08:17<3:59:18, 11.85s/it] {'loss': 0.9775, 'learning_rate': 2.1132400594941697e-06, 'epoch': 0.56} 56%|█████▋ | 1562/2774 [5:08:17<3:59:18, 11.85s/it] 56%|█████▋ | 1563/2774 [5:08:28<3:55:53, 11.69s/it] {'loss': 1.0454, 'learning_rate': 2.1103557787321076e-06, 'epoch': 0.56} 56%|█████▋ | 1563/2774 [5:08:28<3:55:53, 11.69s/it] 56%|█████▋ | 1564/2774 [5:08:40<3:55:16, 11.67s/it] {'loss': 1.04, 'learning_rate': 2.107472029421226e-06, 'epoch': 0.56} 56%|█████▋ | 1564/2774 [5:08:40<3:55:16, 11.67s/it] 56%|█████▋ | 
1565/2774 [5:08:51<3:54:47, 11.65s/it] {'loss': 1.0537, 'learning_rate': 2.104588815494784e-06, 'epoch': 0.56} 56%|█████▋ | 1565/2774 [5:08:51<3:54:47, 11.65s/it] 56%|█████▋ | 1566/2774 [5:09:03<3:54:09, 11.63s/it] {'loss': 1.041, 'learning_rate': 2.101706140885311e-06, 'epoch': 0.56} 56%|█████▋ | 1566/2774 [5:09:03<3:54:09, 11.63s/it] 56%|█████▋ | 1567/2774 [5:09:14<3:53:14, 11.59s/it] {'loss': 1.0352, 'learning_rate': 2.0988240095246025e-06, 'epoch': 0.56} 56%|█████▋ | 1567/2774 [5:09:14<3:53:14, 11.59s/it] 57%|█████▋ | 1568/2774 [5:09:25<3:50:44, 11.48s/it] {'loss': 1.0142, 'learning_rate': 2.09594242534371e-06, 'epoch': 0.57} 57%|█████▋ | 1568/2774 [5:09:25<3:50:44, 11.48s/it] 57%|█████▋ | 1569/2774 [5:09:37<3:48:25, 11.37s/it] {'loss': 0.9736, 'learning_rate': 2.0930613922729424e-06, 'epoch': 0.57} 57%|█████▋ | 1569/2774 [5:09:37<3:48:25, 11.37s/it] 57%|█████▋ | 1570/2774 [5:09:48<3:48:11, 11.37s/it] {'loss': 1.0083, 'learning_rate': 2.090180914241852e-06, 'epoch': 0.57} 57%|█████▋ | 1570/2774 [5:09:48<3:48:11, 11.37s/it] 57%|█████▋ | 1571/2774 [5:09:59<3:47:19, 11.34s/it] {'loss': 1.04, 'learning_rate': 2.087300995179238e-06, 'epoch': 0.57} 57%|█████▋ | 1571/2774 [5:09:59<3:47:19, 11.34s/it] 57%|█████▋ | 1572/2774 [5:10:10<3:46:30, 11.31s/it] {'loss': 1.0029, 'learning_rate': 2.084421639013136e-06, 'epoch': 0.57} 57%|█████▋ | 1572/2774 [5:10:10<3:46:30, 11.31s/it] 57%|█████▋ | 1573/2774 [5:10:22<3:47:28, 11.36s/it] {'loss': 1.0522, 'learning_rate': 2.0815428496708143e-06, 'epoch': 0.57} 57%|█████▋ | 1573/2774 [5:10:22<3:47:28, 11.36s/it] 57%|█████▋ | 1574/2774 [5:10:33<3:47:11, 11.36s/it] {'loss': 1.0327, 'learning_rate': 2.078664631078767e-06, 'epoch': 0.57} 57%|█████▋ | 1574/2774 [5:10:33<3:47:11, 11.36s/it] 57%|█████▋ | 1575/2774 [5:10:47<3:59:48, 12.00s/it] {'loss': 0.9668, 'learning_rate': 2.0757869871627112e-06, 'epoch': 0.57} 57%|█████▋ | 1575/2774 [5:10:47<3:59:48, 12.00s/it] 57%|█████▋ | 1576/2774 [5:10:58<3:56:42, 11.86s/it] {'loss': 1.0615, 
'learning_rate': 2.0729099218475784e-06, 'epoch': 0.57} 57%|█████▋ | 1576/2774 [5:10:58<3:56:42, 11.86s/it] 57%|█████▋ | 1577/2774 [5:11:09<3:52:20, 11.65s/it] {'loss': 1.0366, 'learning_rate': 2.0700334390575126e-06, 'epoch': 0.57} 57%|█████▋ | 1577/2774 [5:11:09<3:52:20, 11.65s/it] 57%|█████▋ | 1578/2774 [5:11:21<3:52:35, 11.67s/it] {'loss': 0.9565, 'learning_rate': 2.0671575427158638e-06, 'epoch': 0.57} 57%|█████▋ | 1578/2774 [5:11:21<3:52:35, 11.67s/it] 57%|█████▋ | 1579/2774 [5:11:33<3:54:01, 11.75s/it] {'loss': 1.0215, 'learning_rate': 2.0642822367451783e-06, 'epoch': 0.57} 57%|█████▋ | 1579/2774 [5:11:33<3:54:01, 11.75s/it] 57%|█████▋ | 1580/2774 [5:11:46<3:59:13, 12.02s/it] {'loss': 1.0239, 'learning_rate': 2.0614075250672006e-06, 'epoch': 0.57} 57%|█████▋ | 1580/2774 [5:11:46<3:59:13, 12.02s/it] 57%|█████▋ | 1581/2774 [5:11:57<3:56:48, 11.91s/it] {'loss': 1.0464, 'learning_rate': 2.058533411602864e-06, 'epoch': 0.57} 57%|█████▋ | 1581/2774 [5:11:57<3:56:48, 11.91s/it] 57%|█████▋ | 1582/2774 [5:12:09<3:53:56, 11.78s/it] {'loss': 1.0146, 'learning_rate': 2.055659900272286e-06, 'epoch': 0.57} 57%|█████▋ | 1582/2774 [5:12:09<3:53:56, 11.78s/it] 57%|█████▋ | 1583/2774 [5:12:21<3:53:55, 11.78s/it] {'loss': 1.0396, 'learning_rate': 2.052786994994763e-06, 'epoch': 0.57} 57%|█████▋ | 1583/2774 [5:12:21<3:53:55, 11.78s/it] 57%|█████▋ | 1584/2774 [5:12:35<4:07:47, 12.49s/it] {'loss': 0.9941, 'learning_rate': 2.049914699688762e-06, 'epoch': 0.57} 57%|█████▋ | 1584/2774 [5:12:35<4:07:47, 12.49s/it] 57%|█████▋ | 1585/2774 [5:12:46<4:00:58, 12.16s/it] {'loss': 1.0786, 'learning_rate': 2.047043018271922e-06, 'epoch': 0.57} 57%|█████▋ | 1585/2774 [5:12:46<4:00:58, 12.16s/it] 57%|█████▋ | 1586/2774 [5:12:57<3:54:30, 11.84s/it] {'loss': 0.9897, 'learning_rate': 2.044171954661043e-06, 'epoch': 0.57} 57%|█████▋ | 1586/2774 [5:12:57<3:54:30, 11.84s/it] 57%|█████▋ | 1587/2774 [5:13:09<3:55:17, 11.89s/it] {'loss': 1.0459, 'learning_rate': 2.0413015127720826e-06, 'epoch': 0.57} 
57%|█████▋ | 1587/2774 [5:13:09<3:55:17, 11.89s/it] 57%|█████▋ | 1588/2774 [5:13:21<3:51:46, 11.73s/it] {'loss': 1.0244, 'learning_rate': 2.0384316965201494e-06, 'epoch': 0.57} 57%|█████▋ | 1588/2774 [5:13:21<3:51:46, 11.73s/it] 57%|█████▋ | 1589/2774 [5:13:33<3:53:51, 11.84s/it] {'loss': 1.0049, 'learning_rate': 2.035562509819499e-06, 'epoch': 0.57} 57%|█████▋ | 1589/2774 [5:13:33<3:53:51, 11.84s/it] 57%|█████▋ | 1590/2774 [5:13:44<3:51:04, 11.71s/it] {'loss': 1.0215, 'learning_rate': 2.0326939565835296e-06, 'epoch': 0.57} 57%|█████▋ | 1590/2774 [5:13:44<3:51:04, 11.71s/it] 57%|█████▋ | 1591/2774 [5:13:56<3:51:58, 11.77s/it] {'loss': 1.0376, 'learning_rate': 2.029826040724774e-06, 'epoch': 0.57} 57%|█████▋ | 1591/2774 [5:13:56<3:51:58, 11.77s/it] 57%|█████▋ | 1592/2774 [5:14:08<3:51:56, 11.77s/it] {'loss': 1.0303, 'learning_rate': 2.0269587661548964e-06, 'epoch': 0.57} 57%|█████▋ | 1592/2774 [5:14:08<3:51:56, 11.77s/it] 57%|█████▋ | 1593/2774 [5:14:19<3:50:23, 11.70s/it] {'loss': 1.0366, 'learning_rate': 2.0240921367846863e-06, 'epoch': 0.57} 57%|█████▋ | 1593/2774 [5:14:19<3:50:23, 11.70s/it] 57%|█████▋ | 1594/2774 [5:14:31<3:46:48, 11.53s/it] {'loss': 1.0078, 'learning_rate': 2.0212261565240528e-06, 'epoch': 0.57} 57%|█████▋ | 1594/2774 [5:14:31<3:46:48, 11.53s/it] 57%|█████▋ | 1595/2774 [5:14:43<3:50:10, 11.71s/it] {'loss': 1.0356, 'learning_rate': 2.0183608292820197e-06, 'epoch': 0.57} 57%|█████▋ | 1595/2774 [5:14:43<3:50:10, 11.71s/it] 58%|█████▊ | 1596/2774 [5:14:54<3:48:06, 11.62s/it] {'loss': 1.019, 'learning_rate': 2.015496158966722e-06, 'epoch': 0.58} 58%|█████▊ | 1596/2774 [5:14:54<3:48:06, 11.62s/it] 58%|█████▊ | 1597/2774 [5:15:06<3:47:36, 11.60s/it] {'loss': 1.082, 'learning_rate': 2.0126321494853936e-06, 'epoch': 0.58} 58%|█████▊ | 1597/2774 [5:15:06<3:47:36, 11.60s/it] 58%|█████▊ | 1598/2774 [5:15:17<3:45:24, 11.50s/it] {'loss': 1.085, 'learning_rate': 2.0097688047443727e-06, 'epoch': 0.58} 58%|█████▊ | 1598/2774 [5:15:17<3:45:24, 11.50s/it] 
58%|█████▊ | 1599/2774 [5:15:29<3:46:28, 11.57s/it] {'loss': 1.0591, 'learning_rate': 2.0069061286490877e-06, 'epoch': 0.58} 58%|█████▊ | 1599/2774 [5:15:29<3:46:28, 11.57s/it] 58%|█████▊ | 1600/2774 [5:15:40<3:45:14, 11.51s/it] {'loss': 1.0796, 'learning_rate': 2.004044125104057e-06, 'epoch': 0.58} 58%|█████▊ | 1600/2774 [5:15:40<3:45:14, 11.51s/it]
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None warnings.warn(
58%|█████▊ | 1601/2774 [5:16:18<6:21:54, 19.54s/it] {'loss': 1.0679, 'learning_rate': 2.0011827980128788e-06, 'epoch': 0.58} 58%|█████▊ | 1601/2774 [5:16:18<6:21:54, 19.54s/it] 58%|█████▊ | 1602/2774 [5:16:30<5:36:10, 17.21s/it] {'loss': 1.0166, 'learning_rate': 1.998322151278232e-06, 'epoch': 0.58} 58%|█████▊ | 1602/2774 [5:16:30<5:36:10, 17.21s/it] 58%|█████▊ | 1603/2774 [5:16:41<5:01:00, 15.42s/it] {'loss': 1.0493, 'learning_rate': 1.9954621888018656e-06, 'epoch': 0.58} 58%|█████▊ | 1603/2774 [5:16:41<5:01:00, 15.42s/it] 58%|█████▊ | 1604/2774 [5:16:53<4:40:13, 14.37s/it] {'loss': 1.0366, 'learning_rate': 1.9926029144845956e-06, 'epoch': 0.58} 58%|█████▊ | 1604/2774 [5:16:53<4:40:13, 14.37s/it] 58%|█████▊ | 1605/2774 [5:17:05<4:26:40, 13.69s/it] {'loss': 1.0444, 'learning_rate': 1.9897443322262985e-06, 'epoch': 0.58} 58%|█████▊ | 1605/2774 [5:17:05<4:26:40, 13.69s/it] 58%|█████▊ | 1606/2774 [5:17:17<4:13:51, 13.04s/it] {'loss': 1.0464, 'learning_rate': 1.986886445925909e-06, 'epoch': 0.58} 58%|█████▊ | 1606/2774
[5:17:17<4:13:51, 13.04s/it] 58%|█████▊ | 1607/2774 [5:17:29<4:06:27, 12.67s/it] {'loss': 1.0278, 'learning_rate': 1.984029259481411e-06, 'epoch': 0.58} 58%|█████▊ | 1607/2774 [5:17:29<4:06:27, 12.67s/it] 58%|█████▊ | 1608/2774 [5:17:40<3:58:35, 12.28s/it] {'loss': 1.0576, 'learning_rate': 1.981172776789834e-06, 'epoch': 0.58} 58%|█████▊ | 1608/2774 [5:17:40<3:58:35, 12.28s/it] 58%|█████▊ | 1609/2774 [5:17:52<3:58:56, 12.31s/it] {'loss': 1.0083, 'learning_rate': 1.978317001747248e-06, 'epoch': 0.58} 58%|█████▊ | 1609/2774 [5:17:52<3:58:56, 12.31s/it] 58%|█████▊ | 1610/2774 [5:18:04<3:54:30, 12.09s/it] {'loss': 0.9878, 'learning_rate': 1.9754619382487572e-06, 'epoch': 0.58} 58%|█████▊ | 1610/2774 [5:18:04<3:54:30, 12.09s/it] 58%|█████▊ | 1611/2774 [5:18:16<3:55:43, 12.16s/it] {'loss': 1.0107, 'learning_rate': 1.9726075901884964e-06, 'epoch': 0.58} 58%|█████▊ | 1611/2774 [5:18:16<3:55:43, 12.16s/it] 58%|█████▊ | 1612/2774 [5:18:28<3:52:22, 12.00s/it] {'loss': 1.0063, 'learning_rate': 1.9697539614596237e-06, 'epoch': 0.58} 58%|█████▊ | 1612/2774 [5:18:28<3:52:22, 12.00s/it] 58%|█████▊ | 1613/2774 [5:18:39<3:48:56, 11.83s/it] {'loss': 1.0801, 'learning_rate': 1.9669010559543163e-06, 'epoch': 0.58} 58%|█████▊ | 1613/2774 [5:18:39<3:48:56, 11.83s/it] 58%|█████▊ | 1614/2774 [5:18:51<3:48:19, 11.81s/it] {'loss': 1.0244, 'learning_rate': 1.9640488775637647e-06, 'epoch': 0.58} 58%|█████▊ | 1614/2774 [5:18:51<3:48:19, 11.81s/it] 58%|█████▊ | 1615/2774 [5:19:02<3:44:57, 11.65s/it] {'loss': 1.0552, 'learning_rate': 1.9611974301781693e-06, 'epoch': 0.58} 58%|█████▊ | 1615/2774 [5:19:02<3:44:57, 11.65s/it] 58%|█████▊ | 1616/2774 [5:19:14<3:44:04, 11.61s/it] {'loss': 0.9683, 'learning_rate': 1.95834671768673e-06, 'epoch': 0.58} 58%|█████▊ | 1616/2774 [5:19:14<3:44:04, 11.61s/it] 58%|█████▊ | 1617/2774 [5:19:25<3:43:35, 11.60s/it] {'loss': 1.0337, 'learning_rate': 1.9554967439776474e-06, 'epoch': 0.58} 58%|█████▊ | 1617/2774 [5:19:25<3:43:35, 11.60s/it] 58%|█████▊ | 1618/2774 
[5:19:37<3:44:01, 11.63s/it] {'loss': 1.0786, 'learning_rate': 1.952647512938114e-06, 'epoch': 0.58} 58%|█████▊ | 1618/2774 [5:19:37<3:44:01, 11.63s/it] 58%|█████▊ | 1619/2774 [5:19:48<3:41:13, 11.49s/it] {'loss': 1.0259, 'learning_rate': 1.9497990284543076e-06, 'epoch': 0.58} 58%|█████▊ | 1619/2774 [5:19:48<3:41:13, 11.49s/it] 58%|█████▊ | 1620/2774 [5:20:00<3:40:19, 11.46s/it] {'loss': 0.9878, 'learning_rate': 1.94695129441139e-06, 'epoch': 0.58} 58%|█████▊ | 1620/2774 [5:20:00<3:40:19, 11.46s/it] 58%|█████▊ | 1621/2774 [5:20:11<3:41:13, 11.51s/it] {'loss': 1.0674, 'learning_rate': 1.944104314693498e-06, 'epoch': 0.58} 58%|█████▊ | 1621/2774 [5:20:11<3:41:13, 11.51s/it] 58%|█████▊ | 1622/2774 [5:20:23<3:41:40, 11.55s/it] {'loss': 1.0474, 'learning_rate': 1.94125809318374e-06, 'epoch': 0.58} 58%|█████▊ | 1622/2774 [5:20:23<3:41:40, 11.55s/it] 59%|█████▊ | 1623/2774 [5:20:36<3:51:37, 12.07s/it] {'loss': 1.0312, 'learning_rate': 1.93841263376419e-06, 'epoch': 0.59} 59%|█████▊ | 1623/2774 [5:20:36<3:51:37, 12.07s/it] 59%|█████▊ | 1624/2774 [5:20:48<3:46:49, 11.83s/it] {'loss': 0.9824, 'learning_rate': 1.9355679403158843e-06, 'epoch': 0.59} 59%|█████▊ | 1624/2774 [5:20:48<3:46:49, 11.83s/it] 59%|█████▊ | 1625/2774 [5:20:59<3:46:16, 11.82s/it] {'loss': 1.0137, 'learning_rate': 1.932724016718811e-06, 'epoch': 0.59} 59%|█████▊ | 1625/2774 [5:20:59<3:46:16, 11.82s/it] 59%|█████▊ | 1626/2774 [5:21:11<3:45:13, 11.77s/it] {'loss': 1.0581, 'learning_rate': 1.92988086685191e-06, 'epoch': 0.59} 59%|█████▊ | 1626/2774 [5:21:11<3:45:13, 11.77s/it] 59%|█████▊ | 1627/2774 [5:21:22<3:41:23, 11.58s/it] {'loss': 1.0234, 'learning_rate': 1.9270384945930667e-06, 'epoch': 0.59} 59%|█████▊ | 1627/2774 [5:21:22<3:41:23, 11.58s/it] 59%|█████▊ | 1628/2774 [5:21:35<3:50:47, 12.08s/it] {'loss': 1.0142, 'learning_rate': 1.9241969038191055e-06, 'epoch': 0.59} 59%|█████▊ | 1628/2774 [5:21:35<3:50:47, 12.08s/it] 59%|█████▊ | 1629/2774 [5:21:48<3:51:33, 12.13s/it] {'loss': 1.0376, 'learning_rate': 
1.9213560984057844e-06, 'epoch': 0.59} 59%|█████▊ | 1629/2774 [5:21:48<3:51:33, 12.13s/it] 59%|█████▉ | 1630/2774 [5:21:59<3:48:34, 11.99s/it] {'loss': 1.0186, 'learning_rate': 1.9185160822277896e-06, 'epoch': 0.59} 59%|█████▉ | 1630/2774 [5:21:59<3:48:34, 11.99s/it] 59%|█████▉ | 1631/2774 [5:22:11<3:47:51, 11.96s/it] {'loss': 0.9902, 'learning_rate': 1.915676859158733e-06, 'epoch': 0.59} 59%|█████▉ | 1631/2774 [5:22:11<3:47:51, 11.96s/it] 59%|█████▉ | 1632/2774 [5:22:23<3:46:01, 11.88s/it] {'loss': 1.043, 'learning_rate': 1.9128384330711416e-06, 'epoch': 0.59} 59%|█████▉ | 1632/2774 [5:22:23<3:46:01, 11.88s/it] 59%|█████▉ | 1633/2774 [5:22:34<3:43:05, 11.73s/it] {'loss': 1.02, 'learning_rate': 1.9100008078364586e-06, 'epoch': 0.59} 59%|█████▉ | 1633/2774 [5:22:34<3:43:05, 11.73s/it] 59%|█████▉ | 1634/2774 [5:22:48<3:52:27, 12.23s/it] {'loss': 1.0117, 'learning_rate': 1.9071639873250333e-06, 'epoch': 0.59} 59%|█████▉ | 1634/2774 [5:22:48<3:52:27, 12.23s/it] 59%|█████▉ | 1635/2774 [5:22:59<3:47:22, 11.98s/it] {'loss': 1.0488, 'learning_rate': 1.9043279754061164e-06, 'epoch': 0.59} 59%|█████▉ | 1635/2774 [5:22:59<3:47:22, 11.98s/it] 59%|█████▉ | 1636/2774 [5:23:11<3:44:19, 11.83s/it] {'loss': 1.0405, 'learning_rate': 1.9014927759478575e-06, 'epoch': 0.59} 59%|█████▉ | 1636/2774 [5:23:11<3:44:19, 11.83s/it] 59%|█████▉ | 1637/2774 [5:23:22<3:40:59, 11.66s/it] {'loss': 1.001, 'learning_rate': 1.8986583928172972e-06, 'epoch': 0.59} 59%|█████▉ | 1637/2774 [5:23:22<3:40:59, 11.66s/it] 59%|█████▉ | 1638/2774 [5:23:33<3:40:08, 11.63s/it] {'loss': 1.0308, 'learning_rate': 1.8958248298803634e-06, 'epoch': 0.59} 59%|█████▉ | 1638/2774 [5:23:33<3:40:08, 11.63s/it] 59%|█████▉ | 1639/2774 [5:23:45<3:37:39, 11.51s/it] {'loss': 0.98, 'learning_rate': 1.892992091001864e-06, 'epoch': 0.59} 59%|█████▉ | 1639/2774 [5:23:45<3:37:39, 11.51s/it] 59%|█████▉ | 1640/2774 [5:23:56<3:37:39, 11.52s/it] {'loss': 1.0264, 'learning_rate': 1.8901601800454845e-06, 'epoch': 0.59} 59%|█████▉ | 
1640/2774 [5:23:56<3:37:39, 11.52s/it] 59%|█████▉ | 1641/2774 [5:24:07<3:35:55, 11.43s/it] {'loss': 1.0312, 'learning_rate': 1.8873291008737795e-06, 'epoch': 0.59} 59%|█████▉ | 1641/2774 [5:24:07<3:35:55, 11.43s/it] 59%|█████▉ | 1642/2774 [5:24:19<3:35:14, 11.41s/it] {'loss': 1.062, 'learning_rate': 1.8844988573481722e-06, 'epoch': 0.59} 59%|█████▉ | 1642/2774 [5:24:19<3:35:14, 11.41s/it] 59%|█████▉ | 1643/2774 [5:24:30<3:35:51, 11.45s/it] {'loss': 0.9956, 'learning_rate': 1.8816694533289405e-06, 'epoch': 0.59} 59%|█████▉ | 1643/2774 [5:24:30<3:35:51, 11.45s/it] 59%|█████▉ | 1644/2774 [5:24:42<3:38:28, 11.60s/it] {'loss': 1.0181, 'learning_rate': 1.8788408926752225e-06, 'epoch': 0.59} 59%|█████▉ | 1644/2774 [5:24:42<3:38:28, 11.60s/it] 59%|█████▉ | 1645/2774 [5:24:54<3:36:38, 11.51s/it] {'loss': 1.0698, 'learning_rate': 1.8760131792450034e-06, 'epoch': 0.59} 59%|█████▉ | 1645/2774 [5:24:54<3:36:38, 11.51s/it] 59%|█████▉ | 1646/2774 [5:25:05<3:37:28, 11.57s/it] {'loss': 1.0098, 'learning_rate': 1.8731863168951142e-06, 'epoch': 0.59} 59%|█████▉ | 1646/2774 [5:25:05<3:37:28, 11.57s/it] 59%|█████▉ | 1647/2774 [5:25:17<3:35:59, 11.50s/it] {'loss': 0.9658, 'learning_rate': 1.8703603094812236e-06, 'epoch': 0.59} 59%|█████▉ | 1647/2774 [5:25:17<3:35:59, 11.50s/it] 59%|█████▉ | 1648/2774 [5:25:28<3:34:41, 11.44s/it] {'loss': 1.0034, 'learning_rate': 1.8675351608578358e-06, 'epoch': 0.59} 59%|█████▉ | 1648/2774 [5:25:28<3:34:41, 11.44s/it] 59%|█████▉ | 1649/2774 [5:25:39<3:33:23, 11.38s/it] {'loss': 1.0171, 'learning_rate': 1.864710874878282e-06, 'epoch': 0.59} 59%|█████▉ | 1649/2774 [5:25:39<3:33:23, 11.38s/it] 59%|█████▉ | 1650/2774 [5:25:51<3:35:11, 11.49s/it] {'loss': 1.0508, 'learning_rate': 1.8618874553947189e-06, 'epoch': 0.59} 59%|█████▉ | 1650/2774 [5:25:51<3:35:11, 11.49s/it] 60%|█████▉ | 1651/2774 [5:26:05<3:49:00, 12.24s/it] {'loss': 1.0151, 'learning_rate': 1.8590649062581192e-06, 'epoch': 0.6} 60%|█████▉ | 1651/2774 [5:26:05<3:49:00, 12.24s/it] 60%|█████▉ | 
1652/2774 [5:26:16<3:43:43, 11.96s/it] {'loss': 0.9712, 'learning_rate': 1.8562432313182692e-06, 'epoch': 0.6} 60%|█████▉ | 1652/2774 [5:26:16<3:43:43, 11.96s/it] 60%|█████▉ | 1653/2774 [5:26:27<3:39:47, 11.76s/it] {'loss': 1.0508, 'learning_rate': 1.8534224344237634e-06, 'epoch': 0.6} 60%|█████▉ | 1653/2774 [5:26:27<3:39:47, 11.76s/it] 60%|█████▉ | 1654/2774 [5:26:39<3:39:49, 11.78s/it] {'loss': 1.0146, 'learning_rate': 1.8506025194219984e-06, 'epoch': 0.6} 60%|█████▉ | 1654/2774 [5:26:39<3:39:49, 11.78s/it] 60%|█████▉ | 1655/2774 [5:26:50<3:36:34, 11.61s/it] {'loss': 1.0083, 'learning_rate': 1.8477834901591678e-06, 'epoch': 0.6} 60%|█████▉ | 1655/2774 [5:26:50<3:36:34, 11.61s/it] 60%|█████▉ | 1656/2774 [5:27:03<3:39:01, 11.75s/it] {'loss': 0.9971, 'learning_rate': 1.8449653504802573e-06, 'epoch': 0.6} 60%|█████▉ | 1656/2774 [5:27:03<3:39:01, 11.75s/it] 60%|█████▉ | 1657/2774 [5:27:14<3:38:35, 11.74s/it] {'loss': 1.0532, 'learning_rate': 1.8421481042290393e-06, 'epoch': 0.6} 60%|█████▉ | 1657/2774 [5:27:14<3:38:35, 11.74s/it] 60%|█████▉ | 1658/2774 [5:27:26<3:36:49, 11.66s/it] {'loss': 1.0298, 'learning_rate': 1.8393317552480672e-06, 'epoch': 0.6} 60%|█████▉ | 1658/2774 [5:27:26<3:36:49, 11.66s/it] 60%|█████▉ | 1659/2774 [5:27:37<3:36:36, 11.66s/it] {'loss': 1.0259, 'learning_rate': 1.8365163073786712e-06, 'epoch': 0.6} 60%|█████▉ | 1659/2774 [5:27:37<3:36:36, 11.66s/it] 60%|█████▉ | 1660/2774 [5:27:49<3:34:14, 11.54s/it] {'loss': 0.9697, 'learning_rate': 1.8337017644609532e-06, 'epoch': 0.6} 60%|█████▉ | 1660/2774 [5:27:49<3:34:14, 11.54s/it] 60%|█████▉ | 1661/2774 [5:28:02<3:46:08, 12.19s/it] {'loss': 1.0239, 'learning_rate': 1.8308881303337772e-06, 'epoch': 0.6} 60%|█████▉ | 1661/2774 [5:28:02<3:46:08, 12.19s/it] 60%|█████▉ | 1662/2774 [5:28:15<3:49:00, 12.36s/it] {'loss': 1.0366, 'learning_rate': 1.8280754088347714e-06, 'epoch': 0.6} 60%|█████▉ | 1662/2774 [5:28:15<3:49:00, 12.36s/it] 60%|█████▉ | 1663/2774 [5:28:26<3:43:24, 12.06s/it] {'loss': 1.0098, 
'learning_rate': 1.8252636038003181e-06, 'epoch': 0.6} 60%|█████▉ | 1663/2774 [5:28:26<3:43:24, 12.06s/it] 60%|█████▉ | 1664/2774 [5:28:38<3:39:53, 11.89s/it] {'loss': 1.0171, 'learning_rate': 1.82245271906555e-06, 'epoch': 0.6} 60%|█████▉ | 1664/2774 [5:28:38<3:39:53, 11.89s/it] 60%|██████ | 1665/2774 [5:28:49<3:37:18, 11.76s/it] {'loss': 1.0806, 'learning_rate': 1.8196427584643433e-06, 'epoch': 0.6} 60%|██████ | 1665/2774 [5:28:49<3:37:18, 11.76s/it] 60%|██████ | 1666/2774 [5:29:01<3:36:24, 11.72s/it] {'loss': 1.0269, 'learning_rate': 1.8168337258293145e-06, 'epoch': 0.6} 60%|██████ | 1666/2774 [5:29:01<3:36:24, 11.72s/it] 60%|██████ | 1667/2774 [5:29:13<3:34:56, 11.65s/it] {'loss': 1.0005, 'learning_rate': 1.8140256249918153e-06, 'epoch': 0.6} 60%|██████ | 1667/2774 [5:29:13<3:34:56, 11.65s/it] 60%|██████ | 1668/2774 [5:29:24<3:35:41, 11.70s/it] {'loss': 0.9678, 'learning_rate': 1.8112184597819246e-06, 'epoch': 0.6} 60%|██████ | 1668/2774 [5:29:24<3:35:41, 11.70s/it] 60%|██████ | 1669/2774 [5:29:36<3:35:30, 11.70s/it] {'loss': 1.0684, 'learning_rate': 1.808412234028448e-06, 'epoch': 0.6} 60%|██████ | 1669/2774 [5:29:36<3:35:30, 11.70s/it] 60%|██████ | 1670/2774 [5:29:47<3:31:45, 11.51s/it] {'loss': 0.9517, 'learning_rate': 1.8056069515589054e-06, 'epoch': 0.6} 60%|██████ | 1670/2774 [5:29:47<3:31:45, 11.51s/it] 60%|██████ | 1671/2774 [5:29:59<3:31:19, 11.50s/it] {'loss': 1.0537, 'learning_rate': 1.8028026161995335e-06, 'epoch': 0.6} 60%|██████ | 1671/2774 [5:29:59<3:31:19, 11.50s/it] 60%|██████ | 1672/2774 [5:30:10<3:30:26, 11.46s/it] {'loss': 1.0259, 'learning_rate': 1.7999992317752768e-06, 'epoch': 0.6} 60%|██████ | 1672/2774 [5:30:10<3:30:26, 11.46s/it] 60%|██████ | 1673/2774 [5:30:21<3:30:33, 11.47s/it] {'loss': 0.9946, 'learning_rate': 1.7971968021097818e-06, 'epoch': 0.6} 60%|██████ | 1673/2774 [5:30:21<3:30:33, 11.47s/it] 60%|██████ | 1674/2774 [5:30:33<3:31:46, 11.55s/it] {'loss': 1.0249, 'learning_rate': 1.7943953310253939e-06, 'epoch': 0.6} 60%|██████ 
| 1674/2774 [5:30:33<3:31:46, 11.55s/it] 60%|██████ | 1675/2774 [5:30:45<3:32:27, 11.60s/it] {'loss': 0.9741, 'learning_rate': 1.7915948223431506e-06, 'epoch': 0.6} 60%|██████ | 1675/2774 [5:30:45<3:32:27, 11.60s/it] 60%|██████ | 1676/2774 [5:30:56<3:31:27, 11.56s/it] {'loss': 1.0264, 'learning_rate': 1.788795279882775e-06, 'epoch': 0.6} 60%|██████ | 1676/2774 [5:30:56<3:31:27, 11.56s/it] 60%|██████ | 1677/2774 [5:31:08<3:30:39, 11.52s/it] {'loss': 1.0107, 'learning_rate': 1.7859967074626756e-06, 'epoch': 0.6} 60%|██████ | 1677/2774 [5:31:08<3:30:39, 11.52s/it] 60%|██████ | 1678/2774 [5:31:19<3:29:13, 11.45s/it] {'loss': 1.0093, 'learning_rate': 1.7831991088999357e-06, 'epoch': 0.6} 60%|██████ | 1678/2774 [5:31:19<3:29:13, 11.45s/it] 61%|██████ | 1679/2774 [5:31:30<3:27:46, 11.38s/it] {'loss': 1.0088, 'learning_rate': 1.7804024880103101e-06, 'epoch': 0.61} 61%|██████ | 1679/2774 [5:31:30<3:27:46, 11.38s/it] 61%|██████ | 1680/2774 [5:31:43<3:35:59, 11.85s/it] {'loss': 1.0386, 'learning_rate': 1.7776068486082215e-06, 'epoch': 0.61} 61%|██████ | 1680/2774 [5:31:43<3:35:59, 11.85s/it] 61%|██████ | 1681/2774 [5:31:55<3:34:32, 11.78s/it] {'loss': 1.0269, 'learning_rate': 1.7748121945067526e-06, 'epoch': 0.61} 61%|██████ | 1681/2774 [5:31:55<3:34:32, 11.78s/it] 61%|██████ | 1682/2774 [5:32:06<3:33:04, 11.71s/it] {'loss': 1.0269, 'learning_rate': 1.772018529517643e-06, 'epoch': 0.61} 61%|██████ | 1682/2774 [5:32:06<3:33:04, 11.71s/it] 61%|██████ | 1683/2774 [5:32:18<3:31:53, 11.65s/it] {'loss': 1.0674, 'learning_rate': 1.7692258574512827e-06, 'epoch': 0.61} 61%|██████ | 1683/2774 [5:32:18<3:31:53, 11.65s/it] 61%|██████ | 1684/2774 [5:32:29<3:29:56, 11.56s/it] {'loss': 1.0, 'learning_rate': 1.766434182116708e-06, 'epoch': 0.61} 61%|██████ | 1684/2774 [5:32:29<3:29:56, 11.56s/it] 61%|██████ | 1685/2774 [5:32:41<3:28:28, 11.49s/it] {'loss': 1.0254, 'learning_rate': 1.7636435073215956e-06, 'epoch': 0.61} 61%|██████ | 1685/2774 [5:32:41<3:28:28, 11.49s/it] 61%|██████ | 
1686/2774 [5:32:52<3:27:13, 11.43s/it] {'loss': 1.0176, 'learning_rate': 1.7608538368722572e-06, 'epoch': 0.61} 61%|██████ | 1686/2774 [5:32:52<3:27:13, 11.43s/it] 61%|██████ | 1687/2774 [5:33:04<3:29:25, 11.56s/it] {'loss': 1.0669, 'learning_rate': 1.7580651745736357e-06, 'epoch': 0.61} 61%|██████ | 1687/2774 [5:33:04<3:29:25, 11.56s/it] 61%|██████ | 1688/2774 [5:33:15<3:26:59, 11.44s/it] {'loss': 0.9888, 'learning_rate': 1.755277524229296e-06, 'epoch': 0.61} 61%|██████ | 1688/2774 [5:33:15<3:26:59, 11.44s/it] 61%|██████ | 1689/2774 [5:33:26<3:26:43, 11.43s/it] {'loss': 1.0381, 'learning_rate': 1.752490889641426e-06, 'epoch': 0.61} 61%|██████ | 1689/2774 [5:33:26<3:26:43, 11.43s/it] 61%|██████ | 1690/2774 [5:33:38<3:27:50, 11.50s/it] {'loss': 0.9927, 'learning_rate': 1.7497052746108262e-06, 'epoch': 0.61} 61%|██████ | 1690/2774 [5:33:38<3:27:50, 11.50s/it] 61%|██████ | 1691/2774 [5:33:50<3:27:53, 11.52s/it] {'loss': 1.0376, 'learning_rate': 1.7469206829369085e-06, 'epoch': 0.61} 61%|██████ | 1691/2774 [5:33:50<3:27:53, 11.52s/it] 61%|██████ | 1692/2774 [5:34:01<3:28:52, 11.58s/it] {'loss': 0.9658, 'learning_rate': 1.7441371184176865e-06, 'epoch': 0.61} 61%|██████ | 1692/2774 [5:34:01<3:28:52, 11.58s/it] 61%|██████ | 1693/2774 [5:34:12<3:26:41, 11.47s/it] {'loss': 1.0371, 'learning_rate': 1.7413545848497745e-06, 'epoch': 0.61} 61%|██████ | 1693/2774 [5:34:12<3:26:41, 11.47s/it] 61%|██████ | 1694/2774 [5:34:24<3:25:03, 11.39s/it] {'loss': 1.0171, 'learning_rate': 1.7385730860283806e-06, 'epoch': 0.61} 61%|██████ | 1694/2774 [5:34:24<3:25:03, 11.39s/it] 61%|██████ | 1695/2774 [5:34:35<3:26:09, 11.46s/it] {'loss': 1.0537, 'learning_rate': 1.7357926257473007e-06, 'epoch': 0.61} 61%|██████ | 1695/2774 [5:34:35<3:26:09, 11.46s/it] 61%|██████ | 1696/2774 [5:34:47<3:25:46, 11.45s/it] {'loss': 1.0767, 'learning_rate': 1.7330132077989159e-06, 'epoch': 0.61} 61%|██████ | 1696/2774 [5:34:47<3:25:46, 11.45s/it] 61%|██████ | 1697/2774 [5:34:59<3:29:01, 11.64s/it] {'loss': 
1.0366, 'learning_rate': 1.7302348359741821e-06, 'epoch': 0.61} 61%|██████ | 1697/2774 [5:34:59<3:29:01, 11.64s/it] 61%|██████ | 1698/2774 [5:35:10<3:27:58, 11.60s/it] {'loss': 1.0205, 'learning_rate': 1.7274575140626318e-06, 'epoch': 0.61} 61%|██████ | 1698/2774 [5:35:10<3:27:58, 11.60s/it] 61%|██████ | 1699/2774 [5:35:22<3:26:53, 11.55s/it] {'loss': 1.0396, 'learning_rate': 1.7246812458523642e-06, 'epoch': 0.61} 61%|██████ | 1699/2774 [5:35:22<3:26:53, 11.55s/it] 61%|██████▏ | 1700/2774 [5:35:33<3:25:02, 11.45s/it] {'loss': 1.0181, 'learning_rate': 1.7219060351300417e-06, 'epoch': 0.61} 61%|██████▏ | 1700/2774 [5:35:33<3:25:02, 11.45s/it] 61%|██████▏ | 1701/2774 [5:35:44<3:23:42, 11.39s/it] {'loss': 1.0186, 'learning_rate': 1.7191318856808848e-06, 'epoch': 0.61} 61%|██████▏ | 1701/2774 [5:35:44<3:23:42, 11.39s/it] 61%|██████▏ | 1702/2774 [5:35:56<3:23:08, 11.37s/it] {'loss': 1.0146, 'learning_rate': 1.716358801288664e-06, 'epoch': 0.61} 61%|██████▏ | 1702/2774 [5:35:56<3:23:08, 11.37s/it] 61%|██████▏ | 1703/2774 [5:36:07<3:24:57, 11.48s/it] {'loss': 1.0576, 'learning_rate': 1.7135867857356998e-06, 'epoch': 0.61} 61%|██████▏ | 1703/2774 [5:36:07<3:24:57, 11.48s/it] 61%|██████▏ | 1704/2774 [5:36:19<3:24:07, 11.45s/it] {'loss': 1.0127, 'learning_rate': 1.7108158428028537e-06, 'epoch': 0.61} 61%|██████▏ | 1704/2774 [5:36:19<3:24:07, 11.45s/it] 61%|██████▏ | 1705/2774 [5:36:30<3:22:37, 11.37s/it] {'loss': 1.0298, 'learning_rate': 1.7080459762695262e-06, 'epoch': 0.61} 61%|██████▏ | 1705/2774 [5:36:30<3:22:37, 11.37s/it] 61%|██████▏ | 1706/2774 [5:36:41<3:22:08, 11.36s/it] {'loss': 0.9614, 'learning_rate': 1.7052771899136453e-06, 'epoch': 0.61} 61%|██████▏ | 1706/2774 [5:36:41<3:22:08, 11.36s/it] 62%|██████▏ | 1707/2774 [5:36:52<3:20:53, 11.30s/it] {'loss': 1.0728, 'learning_rate': 1.7025094875116693e-06, 'epoch': 0.62} 62%|██████▏ | 1707/2774 [5:36:52<3:20:53, 11.30s/it] 62%|██████▏ | 1708/2774 [5:37:04<3:22:21, 11.39s/it] {'loss': 1.0068, 'learning_rate': 
1.6997428728385773e-06, 'epoch': 0.62} 62%|██████▏ | 1708/2774 [5:37:04<3:22:21, 11.39s/it] 62%|██████▏ | 1709/2774 [5:37:16<3:23:16, 11.45s/it] {'loss': 1.061, 'learning_rate': 1.6969773496678648e-06, 'epoch': 0.62} 62%|██████▏ | 1709/2774 [5:37:16<3:23:16, 11.45s/it] 62%|██████▏ | 1710/2774 [5:37:28<3:27:00, 11.67s/it] {'loss': 1.0859, 'learning_rate': 1.6942129217715382e-06, 'epoch': 0.62} 62%|██████▏ | 1710/2774 [5:37:28<3:27:00, 11.67s/it] 62%|██████▏ | 1711/2774 [5:37:39<3:26:33, 11.66s/it] {'loss': 1.0635, 'learning_rate': 1.6914495929201098e-06, 'epoch': 0.62} 62%|██████▏ | 1711/2774 [5:37:39<3:26:33, 11.66s/it] 62%|██████▏ | 1712/2774 [5:37:51<3:23:57, 11.52s/it] {'loss': 1.0327, 'learning_rate': 1.6886873668825932e-06, 'epoch': 0.62} 62%|██████▏ | 1712/2774 [5:37:51<3:23:57, 11.52s/it] 62%|██████▏ | 1713/2774 [5:38:02<3:24:26, 11.56s/it] {'loss': 1.0698, 'learning_rate': 1.6859262474264985e-06, 'epoch': 0.62} 62%|██████▏ | 1713/2774 [5:38:02<3:24:26, 11.56s/it] 62%|██████▏ | 1714/2774 [5:38:14<3:24:06, 11.55s/it] {'loss': 1.0859, 'learning_rate': 1.6831662383178262e-06, 'epoch': 0.62} 62%|██████▏ | 1714/2774 [5:38:14<3:24:06, 11.55s/it] 62%|██████▏ | 1715/2774 [5:38:25<3:23:31, 11.53s/it] {'loss': 1.0762, 'learning_rate': 1.6804073433210605e-06, 'epoch': 0.62} 62%|██████▏ | 1715/2774 [5:38:25<3:23:31, 11.53s/it] 62%|██████▏ | 1716/2774 [5:38:36<3:21:15, 11.41s/it] {'loss': 1.0142, 'learning_rate': 1.6776495661991682e-06, 'epoch': 0.62} 62%|██████▏ | 1716/2774 [5:38:36<3:21:15, 11.41s/it] 62%|██████▏ | 1717/2774 [5:38:48<3:19:57, 11.35s/it] {'loss': 1.0317, 'learning_rate': 1.674892910713591e-06, 'epoch': 0.62} 62%|██████▏ | 1717/2774 [5:38:48<3:19:57, 11.35s/it] 62%|██████▏ | 1718/2774 [5:38:59<3:20:04, 11.37s/it] {'loss': 1.0332, 'learning_rate': 1.67213738062424e-06, 'epoch': 0.62} 62%|██████▏ | 1718/2774 [5:38:59<3:20:04, 11.37s/it] 62%|██████▏ | 1719/2774 [5:39:10<3:19:55, 11.37s/it] {'loss': 1.0479, 'learning_rate': 1.6693829796894923e-06, 'epoch': 
0.62} 62%|██████▏ | 1719/2774 [5:39:10<3:19:55, 11.37s/it] 62%|██████▏ | 1720/2774 [5:39:23<3:25:10, 11.68s/it] {'loss': 1.0342, 'learning_rate': 1.666629711666184e-06, 'epoch': 0.62} 62%|██████▏ | 1720/2774 [5:39:23<3:25:10, 11.68s/it] 62%|██████▏ | 1721/2774 [5:39:34<3:22:29, 11.54s/it] {'loss': 0.9551, 'learning_rate': 1.663877580309607e-06, 'epoch': 0.62} 62%|██████▏ | 1721/2774 [5:39:34<3:22:29, 11.54s/it] 62%|██████▏ | 1722/2774 [5:39:46<3:25:41, 11.73s/it] {'loss': 1.0151, 'learning_rate': 1.6611265893735007e-06, 'epoch': 0.62} 62%|██████▏ | 1722/2774 [5:39:46<3:25:41, 11.73s/it] 62%|██████▏ | 1723/2774 [5:39:58<3:24:37, 11.68s/it] {'loss': 1.084, 'learning_rate': 1.6583767426100528e-06, 'epoch': 0.62} 62%|██████▏ | 1723/2774 [5:39:58<3:24:37, 11.68s/it] 62%|██████▏ | 1724/2774 [5:40:09<3:22:01, 11.54s/it] {'loss': 0.998, 'learning_rate': 1.6556280437698857e-06, 'epoch': 0.62} 62%|██████▏ | 1724/2774 [5:40:09<3:22:01, 11.54s/it] 62%|██████▏ | 1725/2774 [5:40:22<3:27:08, 11.85s/it] {'loss': 1.0239, 'learning_rate': 1.6528804966020603e-06, 'epoch': 0.62} 62%|██████▏ | 1725/2774 [5:40:22<3:27:08, 11.85s/it] 62%|██████▏ | 1726/2774 [5:40:33<3:24:00, 11.68s/it] {'loss': 1.0386, 'learning_rate': 1.6501341048540647e-06, 'epoch': 0.62} 62%|██████▏ | 1726/2774 [5:40:33<3:24:00, 11.68s/it] 62%|██████▏ | 1727/2774 [5:40:44<3:23:19, 11.65s/it] {'loss': 1.0283, 'learning_rate': 1.647388872271811e-06, 'epoch': 0.62} 62%|██████▏ | 1727/2774 [5:40:44<3:23:19, 11.65s/it] 62%|██████▏ | 1728/2774 [5:40:56<3:21:44, 11.57s/it] {'loss': 1.0229, 'learning_rate': 1.6446448025996303e-06, 'epoch': 0.62} 62%|██████▏ | 1728/2774 [5:40:56<3:21:44, 11.57s/it] 62%|██████▏ | 1729/2774 [5:41:07<3:21:50, 11.59s/it] {'loss': 0.9937, 'learning_rate': 1.6419018995802685e-06, 'epoch': 0.62} 62%|██████▏ | 1729/2774 [5:41:07<3:21:50, 11.59s/it] 62%|██████▏ | 1730/2774 [5:41:19<3:21:44, 11.59s/it] {'loss': 1.0156, 'learning_rate': 1.6391601669548796e-06, 'epoch': 0.62} 62%|██████▏ | 1730/2774 
[5:41:19<3:21:44, 11.59s/it] 62%|██████▏ | 1731/2774 [5:41:30<3:20:43, 11.55s/it] {'loss': 1.0571, 'learning_rate': 1.6364196084630207e-06, 'epoch': 0.62} 62%|██████▏ | 1731/2774 [5:41:30<3:20:43, 11.55s/it] 62%|██████▏ | 1732/2774 [5:41:43<3:24:30, 11.78s/it] {'loss': 1.0527, 'learning_rate': 1.6336802278426494e-06, 'epoch': 0.62} 62%|██████▏ | 1732/2774 [5:41:43<3:24:30, 11.78s/it] 62%|██████▏ | 1733/2774 [5:41:55<3:28:19, 12.01s/it] {'loss': 1.0278, 'learning_rate': 1.6309420288301136e-06, 'epoch': 0.62} 62%|██████▏ | 1733/2774 [5:41:55<3:28:19, 12.01s/it] 63%|██████▎ | 1734/2774 [5:42:08<3:30:02, 12.12s/it] {'loss': 0.9946, 'learning_rate': 1.628205015160152e-06, 'epoch': 0.63} 63%|██████▎ | 1734/2774 [5:42:08<3:30:02, 12.12s/it] 63%|██████▎ | 1735/2774 [5:42:20<3:28:54, 12.06s/it] {'loss': 1.0776, 'learning_rate': 1.625469190565886e-06, 'epoch': 0.63} 63%|██████▎ | 1735/2774 [5:42:20<3:28:54, 12.06s/it] 63%|██████▎ | 1736/2774 [5:42:31<3:26:07, 11.91s/it] {'loss': 1.0142, 'learning_rate': 1.6227345587788152e-06, 'epoch': 0.63} 63%|██████▎ | 1736/2774 [5:42:31<3:26:07, 11.91s/it] 63%|██████▎ | 1737/2774 [5:42:43<3:23:59, 11.80s/it] {'loss': 0.9854, 'learning_rate': 1.620001123528812e-06, 'epoch': 0.63} 63%|██████▎ | 1737/2774 [5:42:43<3:23:59, 11.80s/it] 63%|██████▎ | 1738/2774 [5:42:56<3:30:57, 12.22s/it] {'loss': 0.9805, 'learning_rate': 1.6172688885441174e-06, 'epoch': 0.63} 63%|██████▎ | 1738/2774 [5:42:56<3:30:57, 12.22s/it] 63%|██████▎ | 1739/2774 [5:43:08<3:30:44, 12.22s/it] {'loss': 1.0596, 'learning_rate': 1.6145378575513343e-06, 'epoch': 0.63} 63%|██████▎ | 1739/2774 [5:43:08<3:30:44, 12.22s/it] 63%|██████▎ | 1740/2774 [5:43:20<3:27:51, 12.06s/it] {'loss': 1.062, 'learning_rate': 1.611808034275424e-06, 'epoch': 0.63} 63%|██████▎ | 1740/2774 [5:43:20<3:27:51, 12.06s/it] 63%|██████▎ | 1741/2774 [5:43:31<3:23:29, 11.82s/it] {'loss': 1.063, 'learning_rate': 1.609079422439702e-06, 'epoch': 0.63} 63%|██████▎ | 1741/2774 [5:43:31<3:23:29, 11.82s/it] 
63%|██████▎ | 1742/2774 [5:43:42<3:20:02, 11.63s/it] {'loss': 1.0278, 'learning_rate': 1.6063520257658278e-06, 'epoch': 0.63} 63%|██████▎ | 1742/2774 [5:43:42<3:20:02, 11.63s/it] 63%|██████▎ | 1743/2774 [5:43:53<3:17:21, 11.49s/it] {'loss': 1.0264, 'learning_rate': 1.6036258479738065e-06, 'epoch': 0.63} 63%|██████▎ | 1743/2774 [5:43:53<3:17:21, 11.49s/it] 63%|██████▎ | 1744/2774 [5:44:05<3:16:40, 11.46s/it] {'loss': 1.0161, 'learning_rate': 1.6009008927819802e-06, 'epoch': 0.63} 63%|██████▎ | 1744/2774 [5:44:05<3:16:40, 11.46s/it] 63%|██████▎ | 1745/2774 [5:44:16<3:17:17, 11.50s/it] {'loss': 1.0464, 'learning_rate': 1.598177163907023e-06, 'epoch': 0.63} 63%|██████▎ | 1745/2774 [5:44:16<3:17:17, 11.50s/it] 63%|██████▎ | 1746/2774 [5:44:30<3:26:08, 12.03s/it] {'loss': 0.9849, 'learning_rate': 1.5954546650639368e-06, 'epoch': 0.63} 63%|██████▎ | 1746/2774 [5:44:30<3:26:08, 12.03s/it] 63%|██████▎ | 1747/2774 [5:44:43<3:33:20, 12.46s/it] {'loss': 0.9863, 'learning_rate': 1.5927333999660457e-06, 'epoch': 0.63} 63%|██████▎ | 1747/2774 [5:44:43<3:33:20, 12.46s/it] 63%|██████▎ | 1748/2774 [5:44:55<3:27:52, 12.16s/it] {'loss': 1.0112, 'learning_rate': 1.59001337232499e-06, 'epoch': 0.63} 63%|██████▎ | 1748/2774 [5:44:55<3:27:52, 12.16s/it] 63%|██████▎ | 1749/2774 [5:45:08<3:33:30, 12.50s/it] {'loss': 0.9531, 'learning_rate': 1.5872945858507239e-06, 'epoch': 0.63} 63%|██████▎ | 1749/2774 [5:45:08<3:33:30, 12.50s/it] 63%|██████▎ | 1750/2774 [5:45:20<3:29:25, 12.27s/it] {'loss': 1.0688, 'learning_rate': 1.5845770442515082e-06, 'epoch': 0.63} 63%|██████▎ | 1750/2774 [5:45:20<3:29:25, 12.27s/it] 63%|██████▎ | 1751/2774 [5:45:31<3:24:25, 11.99s/it] {'loss': 1.0796, 'learning_rate': 1.5818607512339048e-06, 'epoch': 0.63} 63%|██████▎ | 1751/2774 [5:45:31<3:24:25, 11.99s/it] 63%|██████▎ | 1752/2774 [5:45:43<3:22:37, 11.90s/it] {'loss': 1.0728, 'learning_rate': 1.579145710502773e-06, 'epoch': 0.63} 63%|██████▎ | 1752/2774 [5:45:43<3:22:37, 11.90s/it] 63%|██████▎ | 1753/2774 
[5:45:54<3:20:23, 11.78s/it] {'loss': 1.0781, 'learning_rate': 1.5764319257612649e-06, 'epoch': 0.63} 63%|██████▎ | 1753/2774 [5:45:54<3:20:23, 11.78s/it] 63%|██████▎ | 1754/2774 [5:46:05<3:17:53, 11.64s/it] {'loss': 1.0649, 'learning_rate': 1.573719400710819e-06, 'epoch': 0.63} 63%|██████▎ | 1754/2774 [5:46:05<3:17:53, 11.64s/it] 63%|██████▎ | 1755/2774 [5:46:17<3:15:45, 11.53s/it] {'loss': 1.0737, 'learning_rate': 1.571008139051155e-06, 'epoch': 0.63} 63%|██████▎ | 1755/2774 [5:46:17<3:15:45, 11.53s/it] 63%|██████▎ | 1756/2774 [5:46:28<3:15:14, 11.51s/it] {'loss': 1.0762, 'learning_rate': 1.5682981444802708e-06, 'epoch': 0.63} 63%|██████▎ | 1756/2774 [5:46:28<3:15:14, 11.51s/it] 63%|██████▎ | 1757/2774 [5:46:39<3:14:08, 11.45s/it] {'loss': 1.0669, 'learning_rate': 1.565589420694435e-06, 'epoch': 0.63} 63%|██████▎ | 1757/2774 [5:46:40<3:14:08, 11.45s/it] 63%|██████▎ | 1758/2774 [5:46:51<3:13:33, 11.43s/it] {'loss': 1.0376, 'learning_rate': 1.5628819713881832e-06, 'epoch': 0.63} 63%|██████▎ | 1758/2774 [5:46:51<3:13:33, 11.43s/it] 63%|██████▎ | 1759/2774 [5:47:03<3:19:04, 11.77s/it] {'loss': 1.0342, 'learning_rate': 1.5601758002543138e-06, 'epoch': 0.63} 63%|██████▎ | 1759/2774 [5:47:03<3:19:04, 11.77s/it] 63%|██████▎ | 1760/2774 [5:47:15<3:16:30, 11.63s/it] {'loss': 0.9917, 'learning_rate': 1.5574709109838782e-06, 'epoch': 0.63} 63%|██████▎ | 1760/2774 [5:47:15<3:16:30, 11.63s/it] 63%|██████▎ | 1761/2774 [5:47:26<3:15:44, 11.59s/it] {'loss': 1.0225, 'learning_rate': 1.5547673072661837e-06, 'epoch': 0.63} 63%|██████▎ | 1761/2774 [5:47:26<3:15:44, 11.59s/it] 64%|██████▎ | 1762/2774 [5:47:38<3:14:19, 11.52s/it] {'loss': 0.9888, 'learning_rate': 1.552064992788782e-06, 'epoch': 0.64} 64%|██████▎ | 1762/2774 [5:47:38<3:14:19, 11.52s/it] 64%|██████▎ | 1763/2774 [5:47:50<3:20:15, 11.88s/it] {'loss': 0.959, 'learning_rate': 1.5493639712374672e-06, 'epoch': 0.64} 64%|██████▎ | 1763/2774 [5:47:50<3:20:15, 11.88s/it] 64%|██████▎ | 1764/2774 [5:48:02<3:20:30, 11.91s/it] 
{'loss': 0.9824, 'learning_rate': 1.5466642462962695e-06, 'epoch': 0.64} 64%|██████▎ | 1764/2774 [5:48:02<3:20:30, 11.91s/it] 64%|██████▎ | 1765/2774 [5:48:14<3:19:38, 11.87s/it] {'loss': 1.042, 'learning_rate': 1.54396582164745e-06, 'epoch': 0.64} 64%|██████▎ | 1765/2774 [5:48:14<3:19:38, 11.87s/it] 64%|██████▎ | 1766/2774 [5:48:26<3:17:13, 11.74s/it] {'loss': 0.9873, 'learning_rate': 1.5412687009714974e-06, 'epoch': 0.64} 64%|██████▎ | 1766/2774 [5:48:26<3:17:13, 11.74s/it] 64%|██████▎ | 1767/2774 [5:48:37<3:14:59, 11.62s/it] {'loss': 1.0161, 'learning_rate': 1.5385728879471217e-06, 'epoch': 0.64} 64%|██████▎ | 1767/2774 [5:48:37<3:14:59, 11.62s/it] 64%|██████▎ | 1768/2774 [5:48:48<3:14:50, 11.62s/it] {'loss': 0.9897, 'learning_rate': 1.535878386251249e-06, 'epoch': 0.64} 64%|██████▎ | 1768/2774 [5:48:48<3:14:50, 11.62s/it] 64%|██████▍ | 1769/2774 [5:49:00<3:13:57, 11.58s/it] {'loss': 1.0239, 'learning_rate': 1.5331851995590159e-06, 'epoch': 0.64} 64%|██████▍ | 1769/2774 [5:49:00<3:13:57, 11.58s/it] 64%|██████▍ | 1770/2774 [5:49:13<3:19:07, 11.90s/it] {'loss': 1.0444, 'learning_rate': 1.530493331543767e-06, 'epoch': 0.64} 64%|██████▍ | 1770/2774 [5:49:13<3:19:07, 11.90s/it] 64%|██████▍ | 1771/2774 [5:49:24<3:15:30, 11.70s/it] {'loss': 0.9868, 'learning_rate': 1.5278027858770472e-06, 'epoch': 0.64} 64%|██████▍ | 1771/2774 [5:49:24<3:15:30, 11.70s/it] 64%|██████▍ | 1772/2774 [5:49:36<3:16:19, 11.76s/it] {'loss': 1.0654, 'learning_rate': 1.5251135662285993e-06, 'epoch': 0.64} 64%|██████▍ | 1772/2774 [5:49:36<3:16:19, 11.76s/it] 64%|██████▍ | 1773/2774 [5:49:48<3:20:57, 12.05s/it] {'loss': 1.0332, 'learning_rate': 1.5224256762663556e-06, 'epoch': 0.64} 64%|██████▍ | 1773/2774 [5:49:48<3:20:57, 12.05s/it] 64%|██████▍ | 1774/2774 [5:50:00<3:18:20, 11.90s/it] {'loss': 0.9829, 'learning_rate': 1.5197391196564357e-06, 'epoch': 0.64} 64%|██████▍ | 1774/2774 [5:50:00<3:18:20, 11.90s/it] 64%|██████▍ | 1775/2774 [5:50:11<3:15:26, 11.74s/it] {'loss': 1.019, 'learning_rate': 
1.5170539000631407e-06, 'epoch': 0.64} 64%|██████▍ | 1775/2774 [5:50:11<3:15:26, 11.74s/it] 64%|██████▍ | 1776/2774 [5:50:23<3:12:30, 11.57s/it] {'loss': 1.0684, 'learning_rate': 1.5143700211489476e-06, 'epoch': 0.64} 64%|██████▍ | 1776/2774 [5:50:23<3:12:30, 11.57s/it] 64%|██████▍ | 1777/2774 [5:50:34<3:14:01, 11.68s/it] {'loss': 1.0308, 'learning_rate': 1.5116874865745069e-06, 'epoch': 0.64} 64%|██████▍ | 1777/2774 [5:50:34<3:14:01, 11.68s/it] 64%|██████▍ | 1778/2774 [5:50:46<3:12:15, 11.58s/it] {'loss': 1.0015, 'learning_rate': 1.5090062999986304e-06, 'epoch': 0.64} 64%|██████▍ | 1778/2774 [5:50:46<3:12:15, 11.58s/it] 64%|██████▍ | 1779/2774 [5:50:57<3:10:38, 11.50s/it] {'loss': 1.0645, 'learning_rate': 1.5063264650782972e-06, 'epoch': 0.64} 64%|██████▍ | 1779/2774 [5:50:57<3:10:38, 11.50s/it] 64%|██████▍ | 1780/2774 [5:51:09<3:10:17, 11.49s/it] {'loss': 1.0039, 'learning_rate': 1.5036479854686392e-06, 'epoch': 0.64} 64%|██████▍ | 1780/2774 [5:51:09<3:10:17, 11.49s/it] 64%|██████▍ | 1781/2774 [5:51:22<3:18:13, 11.98s/it] {'loss': 1.0532, 'learning_rate': 1.5009708648229409e-06, 'epoch': 0.64} 64%|██████▍ | 1781/2774 [5:51:22<3:18:13, 11.98s/it] 64%|██████▍ | 1782/2774 [5:51:33<3:14:27, 11.76s/it] {'loss': 1.0522, 'learning_rate': 1.4982951067926335e-06, 'epoch': 0.64} 64%|██████▍ | 1782/2774 [5:51:33<3:14:27, 11.76s/it] 64%|██████▍ | 1783/2774 [5:51:44<3:11:37, 11.60s/it] {'loss': 0.999, 'learning_rate': 1.495620715027289e-06, 'epoch': 0.64} 64%|██████▍ | 1783/2774 [5:51:44<3:11:37, 11.60s/it] 64%|██████▍ | 1784/2774 [5:51:56<3:10:56, 11.57s/it] {'loss': 1.0737, 'learning_rate': 1.4929476931746167e-06, 'epoch': 0.64} 64%|██████▍ | 1784/2774 [5:51:56<3:10:56, 11.57s/it] 64%|██████▍ | 1785/2774 [5:52:07<3:09:26, 11.49s/it] {'loss': 1.0142, 'learning_rate': 1.4902760448804559e-06, 'epoch': 0.64} 64%|██████▍ | 1785/2774 [5:52:07<3:09:26, 11.49s/it] 64%|██████▍ | 1786/2774 [5:52:18<3:08:12, 11.43s/it] {'loss': 1.0449, 'learning_rate': 1.4876057737887755e-06, 'epoch': 
0.64} 64%|██████▍ | 1786/2774 [5:52:18<3:08:12, 11.43s/it] 64%|██████▍ | 1787/2774 [5:52:30<3:07:29, 11.40s/it] {'loss': 1.0527, 'learning_rate': 1.484936883541662e-06, 'epoch': 0.64} 64%|██████▍ | 1787/2774 [5:52:30<3:07:29, 11.40s/it] 64%|██████▍ | 1788/2774 [5:52:41<3:07:47, 11.43s/it] {'loss': 0.998, 'learning_rate': 1.4822693777793207e-06, 'epoch': 0.64} 64%|██████▍ | 1788/2774 [5:52:41<3:07:47, 11.43s/it] 64%|██████▍ | 1789/2774 [5:52:53<3:07:31, 11.42s/it] {'loss': 1.0063, 'learning_rate': 1.4796032601400687e-06, 'epoch': 0.64} 64%|██████▍ | 1789/2774 [5:52:53<3:07:31, 11.42s/it] 65%|██████▍ | 1790/2774 [5:53:06<3:19:00, 12.13s/it] {'loss': 0.9961, 'learning_rate': 1.4769385342603292e-06, 'epoch': 0.65} 65%|██████▍ | 1790/2774 [5:53:06<3:19:00, 12.13s/it] 65%|██████▍ | 1791/2774 [5:53:18<3:17:18, 12.04s/it] {'loss': 0.957, 'learning_rate': 1.4742752037746277e-06, 'epoch': 0.65} 65%|██████▍ | 1791/2774 [5:53:18<3:17:18, 12.04s/it] 65%|██████▍ | 1792/2774 [5:53:29<3:13:12, 11.81s/it] {'loss': 1.0806, 'learning_rate': 1.4716132723155864e-06, 'epoch': 0.65} 65%|██████▍ | 1792/2774 [5:53:29<3:13:12, 11.81s/it] 65%|██████▍ | 1793/2774 [5:53:41<3:11:11, 11.69s/it] {'loss': 1.0098, 'learning_rate': 1.4689527435139184e-06, 'epoch': 0.65} 65%|██████▍ | 1793/2774 [5:53:41<3:11:11, 11.69s/it] 65%|██████▍ | 1794/2774 [5:53:53<3:11:10, 11.71s/it] {'loss': 1.0205, 'learning_rate': 1.4662936209984242e-06, 'epoch': 0.65} 65%|██████▍ | 1794/2774 [5:53:53<3:11:10, 11.71s/it] 65%|██████▍ | 1795/2774 [5:54:04<3:10:57, 11.70s/it] {'loss': 1.0142, 'learning_rate': 1.4636359083959867e-06, 'epoch': 0.65} 65%|██████▍ | 1795/2774 [5:54:04<3:10:57, 11.70s/it] 65%|██████▍ | 1796/2774 [5:54:16<3:08:33, 11.57s/it] {'loss': 0.9731, 'learning_rate': 1.460979609331565e-06, 'epoch': 0.65} 65%|██████▍ | 1796/2774 [5:54:16<3:08:33, 11.57s/it] 65%|██████▍ | 1797/2774 [5:54:27<3:08:31, 11.58s/it] {'loss': 1.0439, 'learning_rate': 1.458324727428191e-06, 'epoch': 0.65} 65%|██████▍ | 1797/2774 
[5:54:27<3:08:31, 11.58s/it] 65%|██████▍ | 1798/2774 [5:54:40<3:15:45, 12.03s/it] {'loss': 0.9717, 'learning_rate': 1.4556712663069622e-06, 'epoch': 0.65} 65%|██████▍ | 1798/2774 [5:54:40<3:15:45, 12.03s/it] 65%|██████▍ | 1799/2774 [5:54:52<3:12:47, 11.86s/it] {'loss': 0.9766, 'learning_rate': 1.4530192295870405e-06, 'epoch': 0.65} 65%|██████▍ | 1799/2774 [5:54:52<3:12:47, 11.86s/it] 65%|██████▍ | 1800/2774 [5:55:03<3:10:51, 11.76s/it] {'loss': 1.0596, 'learning_rate': 1.4503686208856426e-06, 'epoch': 0.65} 65%|██████▍ | 1800/2774 [5:55:03<3:10:51, 11.76s/it] 65%|██████▍ | 1801/2774 [5:55:14<3:07:33, 11.57s/it] {'loss': 1.0288, 'learning_rate': 1.4477194438180403e-06, 'epoch': 0.65} 65%|██████▍ | 1801/2774 [5:55:14<3:07:33, 11.57s/it] 65%|██████▍ | 1802/2774 [5:55:26<3:06:30, 11.51s/it] {'loss': 1.0347, 'learning_rate': 1.445071701997549e-06, 'epoch': 0.65} 65%|██████▍ | 1802/2774 [5:55:26<3:06:30, 11.51s/it] 65%|██████▍ | 1803/2774 [5:55:39<3:12:56, 11.92s/it] {'loss': 1.0371, 'learning_rate': 1.4424253990355308e-06, 'epoch': 0.65} 65%|██████▍ | 1803/2774 [5:55:39<3:12:56, 11.92s/it] 65%|██████▌ | 1804/2774 [5:55:51<3:14:49, 12.05s/it] {'loss': 1.021, 'learning_rate': 1.439780538541382e-06, 'epoch': 0.65} 65%|██████▌ | 1804/2774 [5:55:51<3:14:49, 12.05s/it] 65%|██████▌ | 1805/2774 [5:56:03<3:13:21, 11.97s/it] {'loss': 1.0654, 'learning_rate': 1.4371371241225326e-06, 'epoch': 0.65} 65%|██████▌ | 1805/2774 [5:56:03<3:13:21, 11.97s/it] 65%|██████▌ | 1806/2774 [5:56:14<3:10:14, 11.79s/it] {'loss': 1.0342, 'learning_rate': 1.4344951593844391e-06, 'epoch': 0.65} 65%|██████▌ | 1806/2774 [5:56:14<3:10:14, 11.79s/it] 65%|██████▌ | 1807/2774 [5:56:26<3:09:47, 11.78s/it] {'loss': 0.9966, 'learning_rate': 1.431854647930584e-06, 'epoch': 0.65} 65%|██████▌ | 1807/2774 [5:56:26<3:09:47, 11.78s/it] 65%|██████▌ | 1808/2774 [5:56:37<3:07:25, 11.64s/it] {'loss': 1.0718, 'learning_rate': 1.4292155933624642e-06, 'epoch': 0.65} 65%|██████▌ | 1808/2774 [5:56:37<3:07:25, 11.64s/it] 
65%|██████▌ | 1809/2774 [5:56:48<3:05:43, 11.55s/it] {'loss': 1.0454, 'learning_rate': 1.4265779992795894e-06, 'epoch': 0.65} 65%|██████▌ | 1809/2774 [5:56:48<3:05:43, 11.55s/it] 65%|██████▌ | 1810/2774 [5:57:00<3:04:50, 11.50s/it] {'loss': 1.1113, 'learning_rate': 1.4239418692794813e-06, 'epoch': 0.65} 65%|██████▌ | 1810/2774 [5:57:00<3:04:50, 11.50s/it] 65%|██████▌ | 1811/2774 [5:57:11<3:04:26, 11.49s/it] {'loss': 0.9775, 'learning_rate': 1.4213072069576594e-06, 'epoch': 0.65} 65%|██████▌ | 1811/2774 [5:57:11<3:04:26, 11.49s/it] 65%|██████▌ | 1812/2774 [5:57:23<3:04:23, 11.50s/it] {'loss': 0.9922, 'learning_rate': 1.4186740159076461e-06, 'epoch': 0.65} 65%|██████▌ | 1812/2774 [5:57:23<3:04:23, 11.50s/it] 65%|██████▌ | 1813/2774 [5:57:34<3:04:25, 11.51s/it] {'loss': 1.0044, 'learning_rate': 1.4160422997209543e-06, 'epoch': 0.65} 65%|██████▌ | 1813/2774 [5:57:34<3:04:25, 11.51s/it] 65%|██████▌ | 1814/2774 [5:57:46<3:03:19, 11.46s/it] {'loss': 1.0215, 'learning_rate': 1.4134120619870855e-06, 'epoch': 0.65} 65%|██████▌ | 1814/2774 [5:57:46<3:03:19, 11.46s/it] 65%|██████▌ | 1815/2774 [5:57:59<3:13:34, 12.11s/it] {'loss': 0.9731, 'learning_rate': 1.4107833062935244e-06, 'epoch': 0.65} 65%|██████▌ | 1815/2774 [5:57:59<3:13:34, 12.11s/it] 65%|██████▌ | 1816/2774 [5:58:11<3:09:47, 11.89s/it] {'loss': 1.0439, 'learning_rate': 1.4081560362257365e-06, 'epoch': 0.65} 65%|██████▌ | 1816/2774 [5:58:11<3:09:47, 11.89s/it] 66%|██████▌ | 1817/2774 [5:58:22<3:06:42, 11.71s/it] {'loss': 0.9956, 'learning_rate': 1.405530255367158e-06, 'epoch': 0.66} 66%|██████▌ | 1817/2774 [5:58:22<3:06:42, 11.71s/it] 66%|██████▌ | 1818/2774 [5:58:34<3:06:08, 11.68s/it] {'loss': 1.0713, 'learning_rate': 1.402905967299197e-06, 'epoch': 0.66} 66%|██████▌ | 1818/2774 [5:58:34<3:06:08, 11.68s/it] 66%|██████▌ | 1819/2774 [5:58:47<3:12:11, 12.07s/it] {'loss': 0.9941, 'learning_rate': 1.4002831756012215e-06, 'epoch': 0.66} 66%|██████▌ | 1819/2774 [5:58:47<3:12:11, 12.07s/it] 66%|██████▌ | 1820/2774 
[5:58:58<3:09:44, 11.93s/it] {'loss': 0.9834, 'learning_rate': 1.3976618838505637e-06, 'epoch': 0.66} 66%|██████▌ | 1820/2774 [5:58:58<3:09:44, 11.93s/it] 66%|██████▌ | 1821/2774 [5:59:10<3:07:09, 11.78s/it] {'loss': 0.9604, 'learning_rate': 1.3950420956225052e-06, 'epoch': 0.66} 66%|██████▌ | 1821/2774 [5:59:10<3:07:09, 11.78s/it] 66%|██████▌ | 1822/2774 [5:59:21<3:05:02, 11.66s/it] {'loss': 1.0059, 'learning_rate': 1.3924238144902813e-06, 'epoch': 0.66} 66%|██████▌ | 1822/2774 [5:59:21<3:05:02, 11.66s/it] 66%|██████▌ | 1823/2774 [5:59:32<3:02:54, 11.54s/it] {'loss': 1.0239, 'learning_rate': 1.3898070440250656e-06, 'epoch': 0.66} 66%|██████▌ | 1823/2774 [5:59:32<3:02:54, 11.54s/it] 66%|██████▌ | 1824/2774 [5:59:44<3:01:41, 11.48s/it] {'loss': 1.0474, 'learning_rate': 1.387191787795978e-06, 'epoch': 0.66} 66%|██████▌ | 1824/2774 [5:59:44<3:01:41, 11.48s/it] 66%|██████▌ | 1825/2774 [5:59:55<3:00:27, 11.41s/it] {'loss': 0.9966, 'learning_rate': 1.3845780493700684e-06, 'epoch': 0.66} 66%|██████▌ | 1825/2774 [5:59:55<3:00:27, 11.41s/it] 66%|██████▌ | 1826/2774 [6:00:08<3:07:44, 11.88s/it] {'loss': 1.0156, 'learning_rate': 1.3819658323123193e-06, 'epoch': 0.66} 66%|██████▌ | 1826/2774 [6:00:08<3:07:44, 11.88s/it] 66%|██████▌ | 1827/2774 [6:00:19<3:05:47, 11.77s/it] {'loss': 0.9736, 'learning_rate': 1.3793551401856353e-06, 'epoch': 0.66} 66%|██████▌ | 1827/2774 [6:00:19<3:05:47, 11.77s/it] 66%|██████▌ | 1828/2774 [6:00:31<3:04:55, 11.73s/it] {'loss': 1.0249, 'learning_rate': 1.3767459765508448e-06, 'epoch': 0.66} 66%|██████▌ | 1828/2774 [6:00:31<3:04:55, 11.73s/it] 66%|██████▌ | 1829/2774 [6:00:45<3:13:25, 12.28s/it] {'loss': 0.9917, 'learning_rate': 1.3741383449666885e-06, 'epoch': 0.66} 66%|██████▌ | 1829/2774 [6:00:45<3:13:25, 12.28s/it] 66%|██████▌ | 1830/2774 [6:00:56<3:09:13, 12.03s/it] {'loss': 0.98, 'learning_rate': 1.3715322489898169e-06, 'epoch': 0.66} 66%|██████▌ | 1830/2774 [6:00:56<3:09:13, 12.03s/it] 66%|██████▌ | 1831/2774 [6:01:07<3:06:12, 11.85s/it] 
{'loss': 1.0195, 'learning_rate': 1.3689276921747901e-06, 'epoch': 0.66} 66%|██████▌ | 1831/2774 [6:01:07<3:06:12, 11.85s/it] 66%|██████▌ | 1832/2774 [6:01:19<3:05:13, 11.80s/it] {'loss': 1.0381, 'learning_rate': 1.3663246780740653e-06, 'epoch': 0.66} 66%|██████▌ | 1832/2774 [6:01:19<3:05:13, 11.80s/it] 66%|██████▌ | 1833/2774 [6:01:31<3:03:22, 11.69s/it] {'loss': 1.0317, 'learning_rate': 1.363723210237996e-06, 'epoch': 0.66} 66%|██████▌ | 1833/2774 [6:01:31<3:03:22, 11.69s/it] 66%|██████▌ | 1834/2774 [6:01:42<3:02:41, 11.66s/it] {'loss': 1.0469, 'learning_rate': 1.361123292214826e-06, 'epoch': 0.66} 66%|██████▌ | 1834/2774 [6:01:42<3:02:41, 11.66s/it] 66%|██████▌ | 1835/2774 [6:01:55<3:06:14, 11.90s/it] {'loss': 1.0088, 'learning_rate': 1.358524927550689e-06, 'epoch': 0.66} 66%|██████▌ | 1835/2774 [6:01:55<3:06:14, 11.90s/it] 66%|██████▌ | 1836/2774 [6:02:06<3:04:02, 11.77s/it] {'loss': 1.0225, 'learning_rate': 1.3559281197895955e-06, 'epoch': 0.66} 66%|██████▌ | 1836/2774 [6:02:06<3:04:02, 11.77s/it] 66%|██████▌ | 1837/2774 [6:02:19<3:09:07, 12.11s/it] {'loss': 0.9814, 'learning_rate': 1.3533328724734358e-06, 'epoch': 0.66} 66%|██████▌ | 1837/2774 [6:02:19<3:09:07, 12.11s/it] 66%|██████▋ | 1838/2774 [6:02:31<3:07:47, 12.04s/it] {'loss': 0.9575, 'learning_rate': 1.3507391891419689e-06, 'epoch': 0.66} 66%|██████▋ | 1838/2774 [6:02:31<3:07:47, 12.04s/it] 66%|██████▋ | 1839/2774 [6:02:43<3:05:59, 11.94s/it] {'loss': 1.0278, 'learning_rate': 1.3481470733328238e-06, 'epoch': 0.66} 66%|██████▋ | 1839/2774 [6:02:43<3:05:59, 11.94s/it] 66%|██████▋ | 1840/2774 [6:02:54<3:03:36, 11.80s/it] {'loss': 1.041, 'learning_rate': 1.3455565285814898e-06, 'epoch': 0.66} 66%|██████▋ | 1840/2774 [6:02:54<3:03:36, 11.80s/it] 66%|██████▋ | 1841/2774 [6:03:06<3:02:02, 11.71s/it] {'loss': 1.0508, 'learning_rate': 1.3429675584213122e-06, 'epoch': 0.66} 66%|██████▋ | 1841/2774 [6:03:06<3:02:02, 11.71s/it] 66%|██████▋ | 1842/2774 [6:03:17<3:01:07, 11.66s/it] {'loss': 1.0044, 'learning_rate': 
1.3403801663834897e-06, 'epoch': 0.66} 66%|██████▋ | 1842/2774 [6:03:17<3:01:07, 11.66s/it] 66%|██████▋ | 1843/2774 [6:03:29<3:01:20, 11.69s/it] {'loss': 0.979, 'learning_rate': 1.3377943559970707e-06, 'epoch': 0.66} 66%|██████▋ | 1843/2774 [6:03:29<3:01:20, 11.69s/it] 66%|██████▋ | 1844/2774 [6:03:40<3:00:43, 11.66s/it] {'loss': 1.0791, 'learning_rate': 1.3352101307889422e-06, 'epoch': 0.66} 66%|██████▋ | 1844/2774 [6:03:40<3:00:43, 11.66s/it] 67%|██████▋ | 1845/2774 [6:03:52<2:59:37, 11.60s/it] {'loss': 1.0098, 'learning_rate': 1.3326274942838333e-06, 'epoch': 0.67} 67%|██████▋ | 1845/2774 [6:03:52<2:59:37, 11.60s/it] 67%|██████▋ | 1846/2774 [6:04:04<3:01:10, 11.71s/it] {'loss': 1.0166, 'learning_rate': 1.330046450004302e-06, 'epoch': 0.67} 67%|██████▋ | 1846/2774 [6:04:04<3:01:10, 11.71s/it] 67%|██████▋ | 1847/2774 [6:04:18<3:10:23, 12.32s/it] {'loss': 0.9639, 'learning_rate': 1.3274670014707392e-06, 'epoch': 0.67} 67%|██████▋ | 1847/2774 [6:04:18<3:10:23, 12.32s/it] 67%|██████▋ | 1848/2774 [6:04:29<3:05:50, 12.04s/it] {'loss': 1.002, 'learning_rate': 1.3248891522013546e-06, 'epoch': 0.67} 67%|██████▋ | 1848/2774 [6:04:29<3:05:50, 12.04s/it] 67%|██████▋ | 1849/2774 [6:04:40<3:01:53, 11.80s/it] {'loss': 1.0405, 'learning_rate': 1.3223129057121816e-06, 'epoch': 0.67} 67%|██████▋ | 1849/2774 [6:04:40<3:01:53, 11.80s/it] 67%|██████▋ | 1850/2774 [6:04:52<3:00:09, 11.70s/it] {'loss': 1.0806, 'learning_rate': 1.3197382655170616e-06, 'epoch': 0.67} 67%|██████▋ | 1850/2774 [6:04:52<3:00:09, 11.70s/it] 67%|██████▋ | 1851/2774 [6:05:06<3:09:51, 12.34s/it] {'loss': 0.9961, 'learning_rate': 1.3171652351276505e-06, 'epoch': 0.67} 67%|██████▋ | 1851/2774 [6:05:06<3:09:51, 12.34s/it] 67%|██████▋ | 1852/2774 [6:05:18<3:08:26, 12.26s/it] {'loss': 1.0498, 'learning_rate': 1.3145938180534045e-06, 'epoch': 0.67} 67%|██████▋ | 1852/2774 [6:05:18<3:08:26, 12.26s/it] 67%|██████▋ | 1853/2774 [6:05:29<3:03:24, 11.95s/it] {'loss': 1.0029, 'learning_rate': 1.3120240178015834e-06, 'epoch': 
0.67} 67%|██████▋ | 1853/2774 [6:05:29<3:03:24, 11.95s/it] 67%|██████▋ | 1854/2774 [6:05:40<3:00:21, 11.76s/it] {'loss': 1.0596, 'learning_rate': 1.3094558378772383e-06, 'epoch': 0.67} 67%|██████▋ | 1854/2774 [6:05:40<3:00:21, 11.76s/it] 67%|██████▋ | 1855/2774 [6:05:53<3:03:48, 12.00s/it] {'loss': 1.0264, 'learning_rate': 1.3068892817832108e-06, 'epoch': 0.67} 67%|██████▋ | 1855/2774 [6:05:53<3:03:48, 12.00s/it] 67%|██████▋ | 1856/2774 [6:06:04<3:02:24, 11.92s/it] {'loss': 1.0283, 'learning_rate': 1.30432435302013e-06, 'epoch': 0.67} 67%|██████▋ | 1856/2774 [6:06:04<3:02:24, 11.92s/it] 67%|██████▋ | 1857/2774 [6:06:16<2:58:50, 11.70s/it] {'loss': 1.0142, 'learning_rate': 1.3017610550864019e-06, 'epoch': 0.67} 67%|██████▋ | 1857/2774 [6:06:16<2:58:50, 11.70s/it] 67%|██████▋ | 1858/2774 [6:06:27<2:56:55, 11.59s/it] {'loss': 1.0156, 'learning_rate': 1.299199391478212e-06, 'epoch': 0.67} 67%|██████▋ | 1858/2774 [6:06:27<2:56:55, 11.59s/it] 67%|██████▋ | 1859/2774 [6:06:39<2:56:30, 11.57s/it] {'loss': 1.0464, 'learning_rate': 1.2966393656895136e-06, 'epoch': 0.67} 67%|██████▋ | 1859/2774 [6:06:39<2:56:30, 11.57s/it] 67%|██████▋ | 1860/2774 [6:06:50<2:55:41, 11.53s/it] {'loss': 1.0332, 'learning_rate': 1.2940809812120276e-06, 'epoch': 0.67} 67%|██████▋ | 1860/2774 [6:06:50<2:55:41, 11.53s/it] 67%|██████▋ | 1861/2774 [6:07:01<2:55:14, 11.52s/it] {'loss': 0.9932, 'learning_rate': 1.2915242415352346e-06, 'epoch': 0.67} 67%|██████▋ | 1861/2774 [6:07:01<2:55:14, 11.52s/it] 67%|██████▋ | 1862/2774 [6:07:13<2:54:01, 11.45s/it] {'loss': 1.0288, 'learning_rate': 1.2889691501463753e-06, 'epoch': 0.67} 67%|██████▋ | 1862/2774 [6:07:13<2:54:01, 11.45s/it] 67%|██████▋ | 1863/2774 [6:07:24<2:53:55, 11.45s/it] {'loss': 1.0151, 'learning_rate': 1.2864157105304376e-06, 'epoch': 0.67} 67%|██████▋ | 1863/2774 [6:07:24<2:53:55, 11.45s/it] 67%|██████▋ | 1864/2774 [6:07:36<2:56:16, 11.62s/it] {'loss': 1.0317, 'learning_rate': 1.2838639261701614e-06, 'epoch': 0.67} 67%|██████▋ | 1864/2774 
[6:07:36<2:56:16, 11.62s/it] 67%|██████▋ | 1865/2774 [6:07:49<3:03:04, 12.08s/it] {'loss': 0.9688, 'learning_rate': 1.2813138005460241e-06, 'epoch': 0.67} 67%|██████▋ | 1865/2774 [6:07:49<3:03:04, 12.08s/it] 67%|██████▋ | 1866/2774 [6:08:01<2:59:20, 11.85s/it] {'loss': 1.0659, 'learning_rate': 1.278765337136245e-06, 'epoch': 0.67} 67%|██████▋ | 1866/2774 [6:08:01<2:59:20, 11.85s/it] 67%|██████▋ | 1867/2774 [6:08:12<2:57:26, 11.74s/it] {'loss': 1.0635, 'learning_rate': 1.276218539416773e-06, 'epoch': 0.67} 67%|██████▋ | 1867/2774 [6:08:12<2:57:26, 11.74s/it] 67%|██████▋ | 1868/2774 [6:08:25<3:01:35, 12.03s/it] {'loss': 0.9834, 'learning_rate': 1.273673410861287e-06, 'epoch': 0.67} 67%|██████▋ | 1868/2774 [6:08:25<3:01:35, 12.03s/it] 67%|██████▋ | 1869/2774 [6:08:36<2:59:05, 11.87s/it] {'loss': 1.0117, 'learning_rate': 1.271129954941187e-06, 'epoch': 0.67} 67%|██████▋ | 1869/2774 [6:08:36<2:59:05, 11.87s/it] 67%|██████▋ | 1870/2774 [6:08:48<2:55:51, 11.67s/it] {'loss': 1.0557, 'learning_rate': 1.2685881751255957e-06, 'epoch': 0.67} 67%|██████▋ | 1870/2774 [6:08:48<2:55:51, 11.67s/it] 67%|██████▋ | 1871/2774 [6:08:59<2:52:42, 11.48s/it] {'loss': 0.9854, 'learning_rate': 1.2660480748813453e-06, 'epoch': 0.67} 67%|██████▋ | 1871/2774 [6:08:59<2:52:42, 11.48s/it] 67%|██████▋ | 1872/2774 [6:09:10<2:51:53, 11.43s/it] {'loss': 1.0508, 'learning_rate': 1.2635096576729804e-06, 'epoch': 0.67} 67%|██████▋ | 1872/2774 [6:09:10<2:51:53, 11.43s/it] 68%|██████▊ | 1873/2774 [6:09:21<2:52:15, 11.47s/it] {'loss': 1.0557, 'learning_rate': 1.260972926962747e-06, 'epoch': 0.68} 68%|██████▊ | 1873/2774 [6:09:21<2:52:15, 11.47s/it] 68%|██████▊ | 1874/2774 [6:09:33<2:51:21, 11.42s/it] {'loss': 1.0615, 'learning_rate': 1.258437886210595e-06, 'epoch': 0.68} 68%|██████▊ | 1874/2774 [6:09:33<2:51:21, 11.42s/it] 68%|██████▊ | 1875/2774 [6:09:44<2:50:15, 11.36s/it] {'loss': 1.0059, 'learning_rate': 1.2559045388741654e-06, 'epoch': 0.68} 68%|██████▊ | 1875/2774 [6:09:44<2:50:15, 11.36s/it] 
68%|██████▊ | 1876/2774 [6:09:56<2:50:45, 11.41s/it] {'loss': 1.064, 'learning_rate': 1.2533728884087909e-06, 'epoch': 0.68} 68%|██████▊ | 1876/2774 [6:09:56<2:50:45, 11.41s/it] 68%|██████▊ | 1877/2774 [6:10:07<2:50:54, 11.43s/it] {'loss': 1.0615, 'learning_rate': 1.250842938267489e-06, 'epoch': 0.68} 68%|██████▊ | 1877/2774 [6:10:07<2:50:54, 11.43s/it] 68%|██████▊ | 1878/2774 [6:10:19<2:51:02, 11.45s/it] {'loss': 0.9751, 'learning_rate': 1.2483146919009608e-06, 'epoch': 0.68} 68%|██████▊ | 1878/2774 [6:10:19<2:51:02, 11.45s/it] 68%|██████▊ | 1879/2774 [6:10:30<2:52:32, 11.57s/it] {'loss': 1.0312, 'learning_rate': 1.2457881527575808e-06, 'epoch': 0.68} 68%|██████▊ | 1879/2774 [6:10:30<2:52:32, 11.57s/it] 68%|██████▊ | 1880/2774 [6:10:42<2:51:21, 11.50s/it] {'loss': 1.0439, 'learning_rate': 1.2432633242833943e-06, 'epoch': 0.68} 68%|██████▊ | 1880/2774 [6:10:42<2:51:21, 11.50s/it] 68%|██████▊ | 1881/2774 [6:10:53<2:50:26, 11.45s/it] {'loss': 1.0547, 'learning_rate': 1.2407402099221174e-06, 'epoch': 0.68} 68%|██████▊ | 1881/2774 [6:10:53<2:50:26, 11.45s/it] 68%|██████▊ | 1882/2774 [6:11:04<2:49:21, 11.39s/it] {'loss': 1.0283, 'learning_rate': 1.2382188131151234e-06, 'epoch': 0.68} 68%|██████▊ | 1882/2774 [6:11:04<2:49:21, 11.39s/it] 68%|██████▊ | 1883/2774 [6:11:16<2:49:56, 11.44s/it] {'loss': 1.0449, 'learning_rate': 1.235699137301447e-06, 'epoch': 0.68} 68%|██████▊ | 1883/2774 [6:11:16<2:49:56, 11.44s/it] 68%|██████▊ | 1884/2774 [6:11:27<2:49:48, 11.45s/it] {'loss': 1.126, 'learning_rate': 1.2331811859177722e-06, 'epoch': 0.68} 68%|██████▊ | 1884/2774 [6:11:27<2:49:48, 11.45s/it] 68%|██████▊ | 1885/2774 [6:11:39<2:50:40, 11.52s/it] {'loss': 1.0024, 'learning_rate': 1.2306649623984355e-06, 'epoch': 0.68} 68%|██████▊ | 1885/2774 [6:11:39<2:50:40, 11.52s/it] 68%|██████▊ | 1886/2774 [6:11:50<2:49:17, 11.44s/it] {'loss': 1.0249, 'learning_rate': 1.2281504701754094e-06, 'epoch': 0.68} 68%|██████▊ | 1886/2774 [6:11:50<2:49:17, 11.44s/it] 68%|██████▊ | 1887/2774 
[6:12:02<2:50:47, 11.55s/it] {'loss': 1.0093, 'learning_rate': 1.2256377126783128e-06, 'epoch': 0.68} 68%|██████▊ | 1887/2774 [6:12:02<2:50:47, 11.55s/it] 68%|██████▊ | 1888/2774 [6:12:13<2:49:18, 11.47s/it] {'loss': 1.0488, 'learning_rate': 1.223126693334393e-06, 'epoch': 0.68} 68%|██████▊ | 1888/2774 [6:12:13<2:49:18, 11.47s/it] 68%|██████▊ | 1889/2774 [6:12:26<2:55:41, 11.91s/it] {'loss': 0.9512, 'learning_rate': 1.2206174155685308e-06, 'epoch': 0.68} 68%|██████▊ | 1889/2774 [6:12:26<2:55:41, 11.91s/it] 68%|██████▊ | 1890/2774 [6:12:38<2:54:04, 11.81s/it] {'loss': 1.0112, 'learning_rate': 1.2181098828032273e-06, 'epoch': 0.68} 68%|██████▊ | 1890/2774 [6:12:38<2:54:04, 11.81s/it] 68%|██████▊ | 1891/2774 [6:12:49<2:52:59, 11.75s/it] {'loss': 1.0454, 'learning_rate': 1.2156040984586079e-06, 'epoch': 0.68} 68%|██████▊ | 1891/2774 [6:12:49<2:52:59, 11.75s/it] 68%|██████▊ | 1892/2774 [6:13:02<2:54:08, 11.85s/it] {'loss': 1.0679, 'learning_rate': 1.213100065952409e-06, 'epoch': 0.68} 68%|██████▊ | 1892/2774 [6:13:02<2:54:08, 11.85s/it] 68%|██████▊ | 1893/2774 [6:13:13<2:52:15, 11.73s/it] {'loss': 1.082, 'learning_rate': 1.2105977886999814e-06, 'epoch': 0.68} 68%|██████▊ | 1893/2774 [6:13:13<2:52:15, 11.73s/it] 68%|██████▊ | 1894/2774 [6:13:24<2:49:27, 11.55s/it] {'loss': 1.0444, 'learning_rate': 1.2080972701142795e-06, 'epoch': 0.68} 68%|██████▊ | 1894/2774 [6:13:24<2:49:27, 11.55s/it] 68%|██████▊ | 1895/2774 [6:13:36<2:48:40, 11.51s/it] {'loss': 1.0151, 'learning_rate': 1.2055985136058595e-06, 'epoch': 0.68} 68%|██████▊ | 1895/2774 [6:13:36<2:48:40, 11.51s/it] 68%|██████▊ | 1896/2774 [6:13:47<2:48:41, 11.53s/it] {'loss': 1.0059, 'learning_rate': 1.2031015225828734e-06, 'epoch': 0.68} 68%|██████▊ | 1896/2774 [6:13:47<2:48:41, 11.53s/it] 68%|██████▊ | 1897/2774 [6:14:00<2:52:33, 11.81s/it] {'loss': 0.9873, 'learning_rate': 1.200606300451068e-06, 'epoch': 0.68} 68%|██████▊ | 1897/2774 [6:14:00<2:52:33, 11.81s/it] 68%|██████▊ | 1898/2774 [6:14:11<2:50:39, 11.69s/it] 
{'loss': 1.0366, 'learning_rate': 1.1981128506137737e-06, 'epoch': 0.68} 68%|██████▊ | 1898/2774 [6:14:11<2:50:39, 11.69s/it] 68%|██████▊ | 1899/2774 [6:14:22<2:48:55, 11.58s/it] {'loss': 0.9766, 'learning_rate': 1.1956211764719072e-06, 'epoch': 0.68} 68%|██████▊ | 1899/2774 [6:14:22<2:48:55, 11.58s/it] 68%|██████▊ | 1900/2774 [6:14:35<2:55:27, 12.05s/it] {'loss': 0.9917, 'learning_rate': 1.1931312814239607e-06, 'epoch': 0.68} 68%|██████▊ | 1900/2774 [6:14:35<2:55:27, 12.05s/it] 69%|██████▊ | 1901/2774 [6:14:47<2:51:32, 11.79s/it] {'loss': 1.0361, 'learning_rate': 1.1906431688659995e-06, 'epoch': 0.69} 69%|██████▊ | 1901/2774 [6:14:47<2:51:32, 11.79s/it] 69%|██████▊ | 1902/2774 [6:14:58<2:49:40, 11.67s/it] {'loss': 1.0063, 'learning_rate': 1.188156842191661e-06, 'epoch': 0.69} 69%|██████▊ | 1902/2774 [6:14:58<2:49:40, 11.67s/it] 69%|██████▊ | 1903/2774 [6:15:09<2:47:27, 11.54s/it] {'loss': 1.0659, 'learning_rate': 1.1856723047921434e-06, 'epoch': 0.69} 69%|██████▊ | 1903/2774 [6:15:09<2:47:27, 11.54s/it] 69%|██████▊ | 1904/2774 [6:15:21<2:47:08, 11.53s/it] {'loss': 1.0317, 'learning_rate': 1.1831895600562046e-06, 'epoch': 0.69} 69%|██████▊ | 1904/2774 [6:15:21<2:47:08, 11.53s/it] 69%|██████▊ | 1905/2774 [6:15:32<2:45:05, 11.40s/it] {'loss': 1.0171, 'learning_rate': 1.1807086113701608e-06, 'epoch': 0.69} 69%|██████▊ | 1905/2774 [6:15:32<2:45:05, 11.40s/it] 69%|██████▊ | 1906/2774 [6:15:43<2:44:44, 11.39s/it] {'loss': 1.0205, 'learning_rate': 1.178229462117875e-06, 'epoch': 0.69} 69%|██████▊ | 1906/2774 [6:15:43<2:44:44, 11.39s/it] 69%|██████▊ | 1907/2774 [6:15:55<2:47:06, 11.56s/it] {'loss': 1.0215, 'learning_rate': 1.1757521156807556e-06, 'epoch': 0.69} 69%|██████▊ | 1907/2774 [6:15:55<2:47:06, 11.56s/it]/usr/local/lib/python3.9/dist-packages/PIL/TiffImagePlugin.py:850: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 2. 
warnings.warn(str(msg)) 69%|██████▉ | 1908/2774 [6:16:07<2:45:50, 11.49s/it] {'loss': 1.0469, 'learning_rate': 1.1732765754377558e-06, 'epoch': 0.69} 69%|██████▉ | 1908/2774 [6:16:07<2:45:50, 11.49s/it] 69%|██████▉ | 1909/2774 [6:16:18<2:47:11, 11.60s/it] {'loss': 1.064, 'learning_rate': 1.1708028447653614e-06, 'epoch': 0.69} 69%|██████▉ | 1909/2774 [6:16:18<2:47:11, 11.60s/it] 69%|██████▉ | 1910/2774 [6:16:30<2:46:05, 11.53s/it] {'loss': 1.0127, 'learning_rate': 1.1683309270375928e-06, 'epoch': 0.69} 69%|██████▉ | 1910/2774 [6:16:30<2:46:05, 11.53s/it] 69%|██████▉ | 1911/2774 [6:16:42<2:47:57, 11.68s/it] {'loss': 1.0303, 'learning_rate': 1.165860825625995e-06, 'epoch': 0.69} 69%|██████▉ | 1911/2774 [6:16:42<2:47:57, 11.68s/it] 69%|██████▉ | 1912/2774 [6:16:53<2:47:07, 11.63s/it] {'loss': 1.0195, 'learning_rate': 1.16339254389964e-06, 'epoch': 0.69} 69%|██████▉ | 1912/2774 [6:16:53<2:47:07, 11.63s/it] 69%|██████▉ | 1913/2774 [6:17:05<2:45:09, 11.51s/it] {'loss': 1.0151, 'learning_rate': 1.1609260852251105e-06, 'epoch': 0.69} 69%|██████▉ | 1913/2774 [6:17:05<2:45:09, 11.51s/it] 69%|██████▉ | 1914/2774 [6:17:16<2:44:19, 11.46s/it] {'loss': 1.0171, 'learning_rate': 1.158461452966511e-06, 'epoch': 0.69} 69%|██████▉ | 1914/2774 [6:17:16<2:44:19, 11.46s/it] 69%|██████▉ | 1915/2774 [6:17:27<2:44:15, 11.47s/it] {'loss': 1.0654, 'learning_rate': 1.1559986504854481e-06, 'epoch': 0.69} 69%|██████▉ | 1915/2774 [6:17:27<2:44:15, 11.47s/it] 69%|██████▉ | 1916/2774 [6:17:39<2:45:10, 11.55s/it] {'loss': 1.0532, 'learning_rate': 1.1535376811410384e-06, 'epoch': 0.69} 69%|██████▉ | 1916/2774 [6:17:39<2:45:10, 11.55s/it] 69%|██████▉ | 1917/2774 [6:17:51<2:47:10, 11.70s/it] {'loss': 1.0601, 'learning_rate': 1.1510785482898928e-06, 'epoch': 0.69} 69%|██████▉ | 1917/2774 [6:17:51<2:47:10, 11.70s/it] 69%|██████▉ | 1918/2774 [6:18:03<2:46:38, 11.68s/it] {'loss': 1.0205, 'learning_rate': 1.1486212552861225e-06, 'epoch': 0.69} 69%|██████▉ | 1918/2774 [6:18:03<2:46:38, 11.68s/it] 69%|██████▉ 
| 1919/2774 [6:18:15<2:47:00, 11.72s/it] {'loss': 1.0308, 'learning_rate': 1.1461658054813244e-06, 'epoch': 0.69} 69%|██████▉ | 1919/2774 [6:18:15<2:47:00, 11.72s/it] 69%|██████▉ | 1920/2774 [6:18:26<2:46:35, 11.70s/it] {'loss': 1.085, 'learning_rate': 1.1437122022245859e-06, 'epoch': 0.69} 69%|██████▉ | 1920/2774 [6:18:26<2:46:35, 11.70s/it] 69%|██████▉ | 1921/2774 [6:18:38<2:44:45, 11.59s/it] {'loss': 1.04, 'learning_rate': 1.1412604488624721e-06, 'epoch': 0.69} 69%|██████▉ | 1921/2774 [6:18:38<2:44:45, 11.59s/it] 69%|██████▉ | 1922/2774 [6:18:51<2:51:17, 12.06s/it] {'loss': 0.9932, 'learning_rate': 1.1388105487390273e-06, 'epoch': 0.69} 69%|██████▉ | 1922/2774 [6:18:51<2:51:17, 12.06s/it]/usr/local/lib/python3.9/dist-packages/PIL/TiffImagePlugin.py:850: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. warnings.warn(str(msg)) 69%|██████▉ | 1923/2774 [6:19:02<2:48:02, 11.85s/it] {'loss': 0.999, 'learning_rate': 1.1363625051957655e-06, 'epoch': 0.69} 69%|██████▉ | 1923/2774 [6:19:02<2:48:02, 11.85s/it] 69%|██████▉ | 1924/2774 [6:19:14<2:46:03, 11.72s/it] {'loss': 1.0474, 'learning_rate': 1.1339163215716728e-06, 'epoch': 0.69} 69%|██████▉ | 1924/2774 [6:19:14<2:46:03, 11.72s/it] 69%|██████▉ | 1925/2774 [6:19:25<2:43:56, 11.59s/it] {'loss': 1.0659, 'learning_rate': 1.1314720012031935e-06, 'epoch': 0.69} 69%|██████▉ | 1925/2774 [6:19:25<2:43:56, 11.59s/it] 69%|██████▉ | 1926/2774 [6:19:37<2:44:49, 11.66s/it] {'loss': 0.9482, 'learning_rate': 1.129029547424235e-06, 'epoch': 0.69} 69%|██████▉ | 1926/2774 [6:19:37<2:44:49, 11.66s/it] 69%|██████▉ | 1927/2774 [6:19:48<2:43:11, 11.56s/it] {'loss': 1.0518, 'learning_rate': 1.1265889635661558e-06, 'epoch': 0.69} 69%|██████▉ | 1927/2774 [6:19:48<2:43:11, 11.56s/it] 70%|██████▉ | 1928/2774 [6:20:00<2:43:37, 11.61s/it] {'loss': 1.0073, 'learning_rate': 1.1241502529577642e-06, 'epoch': 0.7} 70%|██████▉ | 1928/2774 [6:20:00<2:43:37, 11.61s/it] 70%|██████▉ | 1929/2774 [6:20:11<2:42:44, 11.56s/it] {'loss': 
1.0137, 'learning_rate': 1.1217134189253155e-06, 'epoch': 0.7} 70%|██████▉ | 1929/2774 [6:20:11<2:42:44, 11.56s/it] 70%|██████▉ | 1930/2774 [6:20:24<2:49:12, 12.03s/it] {'loss': 1.0093, 'learning_rate': 1.1192784647925031e-06, 'epoch': 0.7} 70%|██████▉ | 1930/2774 [6:20:24<2:49:12, 12.03s/it] 70%|██████▉ | 1931/2774 [6:20:36<2:46:57, 11.88s/it] {'loss': 0.998, 'learning_rate': 1.116845393880458e-06, 'epoch': 0.7} 70%|██████▉ | 1931/2774 [6:20:36<2:46:57, 11.88s/it] 70%|██████▉ | 1932/2774 [6:20:48<2:46:03, 11.83s/it] {'loss': 1.0015, 'learning_rate': 1.1144142095077406e-06, 'epoch': 0.7} 70%|██████▉ | 1932/2774 [6:20:48<2:46:03, 11.83s/it] 70%|██████▉ | 1933/2774 [6:21:01<2:51:58, 12.27s/it] {'loss': 0.9766, 'learning_rate': 1.1119849149903414e-06, 'epoch': 0.7} 70%|██████▉ | 1933/2774 [6:21:01<2:51:58, 12.27s/it] 70%|██████▉ | 1934/2774 [6:21:12<2:47:35, 11.97s/it] {'loss': 1.0454, 'learning_rate': 1.1095575136416695e-06, 'epoch': 0.7} 70%|██████▉ | 1934/2774 [6:21:12<2:47:35, 11.97s/it] 70%|██████▉ | 1935/2774 [6:21:24<2:45:30, 11.84s/it] {'loss': 1.0303, 'learning_rate': 1.1071320087725557e-06, 'epoch': 0.7} 70%|██████▉ | 1935/2774 [6:21:24<2:45:30, 11.84s/it] 70%|██████▉ | 1936/2774 [6:21:35<2:43:34, 11.71s/it] {'loss': 1.0181, 'learning_rate': 1.10470840369124e-06, 'epoch': 0.7} 70%|██████▉ | 1936/2774 [6:21:35<2:43:34, 11.71s/it] 70%|██████▉ | 1937/2774 [6:21:47<2:43:10, 11.70s/it] {'loss': 1.0171, 'learning_rate': 1.1022867017033757e-06, 'epoch': 0.7} 70%|██████▉ | 1937/2774 [6:21:47<2:43:10, 11.70s/it] 70%|██████▉ | 1938/2774 [6:21:58<2:40:58, 11.55s/it] {'loss': 1.0244, 'learning_rate': 1.0998669061120157e-06, 'epoch': 0.7} 70%|██████▉ | 1938/2774 [6:21:58<2:40:58, 11.55s/it] 70%|██████▉ | 1939/2774 [6:22:11<2:46:39, 11.98s/it] {'loss': 1.0327, 'learning_rate': 1.097449020217617e-06, 'epoch': 0.7} 70%|██████▉ | 1939/2774 [6:22:11<2:46:39, 11.98s/it] 70%|██████▉ | 1940/2774 [6:22:23<2:49:07, 12.17s/it] {'loss': 0.9478, 'learning_rate': 
1.0950330473180287e-06, 'epoch': 0.7} 70%|██████▉ | 1940/2774 [6:22:23<2:49:07, 12.17s/it] 70%|██████▉ | 1941/2774 [6:22:35<2:45:32, 11.92s/it] {'loss': 1.0537, 'learning_rate': 1.0926189907084922e-06, 'epoch': 0.7} 70%|██████▉ | 1941/2774 [6:22:35<2:45:32, 11.92s/it] 70%|███████ | 1942/2774 [6:22:46<2:42:31, 11.72s/it] {'loss': 1.0225, 'learning_rate': 1.090206853681634e-06, 'epoch': 0.7} 70%|███████ | 1942/2774 [6:22:46<2:42:31, 11.72s/it] 70%|███████ | 1943/2774 [6:22:59<2:47:52, 12.12s/it] {'loss': 0.999, 'learning_rate': 1.0877966395274654e-06, 'epoch': 0.7} 70%|███████ | 1943/2774 [6:22:59<2:47:52, 12.12s/it] 70%|███████ | 1944/2774 [6:23:11<2:45:13, 11.94s/it] {'loss': 1.0176, 'learning_rate': 1.08538835153337e-06, 'epoch': 0.7} 70%|███████ | 1944/2774 [6:23:11<2:45:13, 11.94s/it] 70%|███████ | 1945/2774 [6:23:23<2:44:50, 11.93s/it] {'loss': 1.0073, 'learning_rate': 1.0829819929841104e-06, 'epoch': 0.7} 70%|███████ | 1945/2774 [6:23:23<2:44:50, 11.93s/it] 70%|███████ | 1946/2774 [6:23:34<2:41:56, 11.73s/it] {'loss': 1.0464, 'learning_rate': 1.0805775671618124e-06, 'epoch': 0.7} 70%|███████ | 1946/2774 [6:23:34<2:41:56, 11.73s/it] 70%|███████ | 1947/2774 [6:23:47<2:45:52, 12.03s/it] {'loss': 1.0552, 'learning_rate': 1.078175077345967e-06, 'epoch': 0.7} 70%|███████ | 1947/2774 [6:23:47<2:45:52, 12.03s/it] 70%|███████ | 1948/2774 [6:23:58<2:43:28, 11.87s/it] {'loss': 1.021, 'learning_rate': 1.075774526813427e-06, 'epoch': 0.7} 70%|███████ | 1948/2774 [6:23:58<2:43:28, 11.87s/it] 70%|███████ | 1949/2774 [6:24:09<2:40:36, 11.68s/it] {'loss': 1.0107, 'learning_rate': 1.073375918838397e-06, 'epoch': 0.7} 70%|███████ | 1949/2774 [6:24:09<2:40:36, 11.68s/it] 70%|███████ | 1950/2774 [6:24:20<2:38:11, 11.52s/it] {'loss': 1.04, 'learning_rate': 1.0709792566924333e-06, 'epoch': 0.7} 70%|███████ | 1950/2774 [6:24:20<2:38:11, 11.52s/it] 70%|███████ | 1951/2774 [6:24:32<2:38:09, 11.53s/it] {'loss': 1.0396, 'learning_rate': 1.0685845436444391e-06, 'epoch': 0.7} 70%|███████ | 
1951/2774 [6:24:32<2:38:09, 11.53s/it] 70%|███████ | 1952/2774 [6:24:43<2:37:16, 11.48s/it] {'loss': 0.9595, 'learning_rate': 1.0661917829606585e-06, 'epoch': 0.7} 70%|███████ | 1952/2774 [6:24:43<2:37:16, 11.48s/it] 70%|███████ | 1953/2774 [6:24:55<2:36:10, 11.41s/it] {'loss': 1.0322, 'learning_rate': 1.0638009779046707e-06, 'epoch': 0.7} 70%|███████ | 1953/2774 [6:24:55<2:36:10, 11.41s/it] 70%|███████ | 1954/2774 [6:25:06<2:35:57, 11.41s/it] {'loss': 1.0132, 'learning_rate': 1.061412131737392e-06, 'epoch': 0.7} 70%|███████ | 1954/2774 [6:25:06<2:35:57, 11.41s/it] 70%|███████ | 1955/2774 [6:25:18<2:36:31, 11.47s/it] {'loss': 1.0742, 'learning_rate': 1.0590252477170614e-06, 'epoch': 0.7} 70%|███████ | 1955/2774 [6:25:18<2:36:31, 11.47s/it] 71%|███████ | 1956/2774 [6:25:30<2:38:45, 11.65s/it] {'loss': 1.0288, 'learning_rate': 1.0566403290992471e-06, 'epoch': 0.71} 71%|███████ | 1956/2774 [6:25:30<2:38:45, 11.65s/it] 71%|███████ | 1957/2774 [6:25:41<2:37:20, 11.55s/it] {'loss': 1.1035, 'learning_rate': 1.0542573791368323e-06, 'epoch': 0.71} 71%|███████ | 1957/2774 [6:25:41<2:37:20, 11.55s/it] 71%|███████ | 1958/2774 [6:25:53<2:38:23, 11.65s/it] {'loss': 1.0181, 'learning_rate': 1.0518764010800193e-06, 'epoch': 0.71} 71%|███████ | 1958/2774 [6:25:53<2:38:23, 11.65s/it] 71%|███████ | 1959/2774 [6:26:04<2:37:41, 11.61s/it] {'loss': 1.0112, 'learning_rate': 1.0494973981763145e-06, 'epoch': 0.71} 71%|███████ | 1959/2774 [6:26:04<2:37:41, 11.61s/it] 71%|███████ | 1960/2774 [6:26:16<2:38:06, 11.65s/it] {'loss': 0.9917, 'learning_rate': 1.0471203736705371e-06, 'epoch': 0.71} 71%|███████ | 1960/2774 [6:26:16<2:38:06, 11.65s/it] 71%|███████ | 1961/2774 [6:26:27<2:36:05, 11.52s/it] {'loss': 1.0576, 'learning_rate': 1.044745330804803e-06, 'epoch': 0.71} 71%|███████ | 1961/2774 [6:26:27<2:36:05, 11.52s/it] 71%|███████ | 1962/2774 [6:26:40<2:41:29, 11.93s/it] {'loss': 1.0181, 'learning_rate': 1.0423722728185292e-06, 'epoch': 0.71} 71%|███████ | 1962/2774 [6:26:40<2:41:29, 
11.93s/it] 71%|███████ | 1963/2774 [6:26:52<2:39:44, 11.82s/it] {'loss': 0.998, 'learning_rate': 1.0400012029484216e-06, 'epoch': 0.71} 71%|███████ | 1963/2774 [6:26:52<2:39:44, 11.82s/it] 71%|███████ | 1964/2774 [6:27:03<2:38:02, 11.71s/it] {'loss': 1.0181, 'learning_rate': 1.0376321244284778e-06, 'epoch': 0.71} 71%|███████ | 1964/2774 [6:27:03<2:38:02, 11.71s/it] 71%|███████ | 1965/2774 [6:27:15<2:36:43, 11.62s/it] {'loss': 0.9922, 'learning_rate': 1.0352650404899765e-06, 'epoch': 0.71} 71%|███████ | 1965/2774 [6:27:15<2:36:43, 11.62s/it] 71%|███████ | 1966/2774 [6:27:27<2:37:22, 11.69s/it] {'loss': 1.0283, 'learning_rate': 1.0328999543614782e-06, 'epoch': 0.71} 71%|███████ | 1966/2774 [6:27:27<2:37:22, 11.69s/it] 71%|███████ | 1967/2774 [6:27:40<2:42:30, 12.08s/it] {'loss': 1.061, 'learning_rate': 1.0305368692688175e-06, 'epoch': 0.71} 71%|███████ | 1967/2774 [6:27:40<2:42:30, 12.08s/it] 71%|███████ | 1968/2774 [6:27:51<2:39:21, 11.86s/it] {'loss': 1.0615, 'learning_rate': 1.028175788435099e-06, 'epoch': 0.71} 71%|███████ | 1968/2774 [6:27:51<2:39:21, 11.86s/it] 71%|███████ | 1969/2774 [6:28:02<2:37:16, 11.72s/it] {'loss': 1.0083, 'learning_rate': 1.0258167150806938e-06, 'epoch': 0.71} 71%|███████ | 1969/2774 [6:28:02<2:37:16, 11.72s/it] 71%|███████ | 1970/2774 [6:28:13<2:34:57, 11.56s/it] {'loss': 1.0039, 'learning_rate': 1.0234596524232374e-06, 'epoch': 0.71} 71%|███████ | 1970/2774 [6:28:13<2:34:57, 11.56s/it] 71%|███████ | 1971/2774 [6:28:25<2:34:38, 11.56s/it] {'loss': 1.0176, 'learning_rate': 1.0211046036776187e-06, 'epoch': 0.71} 71%|███████ | 1971/2774 [6:28:25<2:34:38, 11.56s/it] 71%|███████ | 1972/2774 [6:28:37<2:34:25, 11.55s/it] {'loss': 1.0649, 'learning_rate': 1.018751572055984e-06, 'epoch': 0.71} 71%|███████ | 1972/2774 [6:28:37<2:34:25, 11.55s/it] 71%|███████ | 1973/2774 [6:28:48<2:33:12, 11.48s/it] {'loss': 1.0034, 'learning_rate': 1.0164005607677253e-06, 'epoch': 0.71} 71%|███████ | 1973/2774 [6:28:48<2:33:12, 11.48s/it] 71%|███████ | 1974/2774 
[6:29:00<2:35:59, 11.70s/it] {'loss': 0.9907, 'learning_rate': 1.014051573019479e-06, 'epoch': 0.71} 71%|███████ | 1974/2774 [6:29:00<2:35:59, 11.70s/it] 71%|███████ | 1975/2774 [6:29:12<2:35:28, 11.68s/it] {'loss': 1.0874, 'learning_rate': 1.0117046120151242e-06, 'epoch': 0.71} 71%|███████ | 1975/2774 [6:29:12<2:35:28, 11.68s/it] 71%|███████ | 1976/2774 [6:29:23<2:34:06, 11.59s/it] {'loss': 1.0483, 'learning_rate': 1.0093596809557732e-06, 'epoch': 0.71} 71%|███████ | 1976/2774 [6:29:23<2:34:06, 11.59s/it] 71%|███████▏ | 1977/2774 [6:29:35<2:33:21, 11.54s/it] {'loss': 1.02, 'learning_rate': 1.0070167830397702e-06, 'epoch': 0.71} 71%|███████▏ | 1977/2774 [6:29:35<2:33:21, 11.54s/it] 71%|███████▏ | 1978/2774 [6:29:46<2:31:46, 11.44s/it] {'loss': 1.0254, 'learning_rate': 1.004675921462686e-06, 'epoch': 0.71} 71%|███████▏ | 1978/2774 [6:29:46<2:31:46, 11.44s/it] 71%|███████▏ | 1979/2774 [6:29:57<2:31:16, 11.42s/it] {'loss': 0.9995, 'learning_rate': 1.0023370994173155e-06, 'epoch': 0.71} 71%|███████▏ | 1979/2774 [6:29:57<2:31:16, 11.42s/it] 71%|███████▏ | 1980/2774 [6:30:09<2:31:38, 11.46s/it] {'loss': 1.0137, 'learning_rate': 1.000000320093669e-06, 'epoch': 0.71} 71%|███████▏ | 1980/2774 [6:30:09<2:31:38, 11.46s/it] 71%|███████▏ | 1981/2774 [6:30:20<2:31:16, 11.45s/it] {'loss': 0.9404, 'learning_rate': 9.976655866789745e-07, 'epoch': 0.71} 71%|███████▏ | 1981/2774 [6:30:20<2:31:16, 11.45s/it] 71%|███████▏ | 1982/2774 [6:30:31<2:31:01, 11.44s/it] {'loss': 0.9761, 'learning_rate': 9.953329023576655e-07, 'epoch': 0.71} 71%|███████▏ | 1982/2774 [6:30:31<2:31:01, 11.44s/it] 71%|███████▏ | 1983/2774 [6:30:43<2:31:22, 11.48s/it] {'loss': 1.0913, 'learning_rate': 9.93002270311384e-07, 'epoch': 0.71} 71%|███████▏ | 1983/2774 [6:30:43<2:31:22, 11.48s/it] 72%|███████▏ | 1984/2774 [6:30:55<2:31:03, 11.47s/it] {'loss': 0.9878, 'learning_rate': 9.9067369371897e-07, 'epoch': 0.72} 72%|███████▏ | 1984/2774 [6:30:55<2:31:03, 11.47s/it] 72%|███████▏ | 1985/2774 [6:31:07<2:33:21, 
11.66s/it] {'loss': 1.019, 'learning_rate': 9.883471757564634e-07, 'epoch': 0.72} 72%|███████▏ | 1985/2774 [6:31:07<2:33:21, 11.66s/it] 72%|███████▏ | 1986/2774 [6:31:18<2:32:01, 11.58s/it] {'loss': 1.0146, 'learning_rate': 9.860227195970906e-07, 'epoch': 0.72} 72%|███████▏ | 1986/2774 [6:31:18<2:32:01, 11.58s/it] 72%|███████▏ | 1987/2774 [6:31:29<2:30:12, 11.45s/it] {'loss': 0.9927, 'learning_rate': 9.837003284112727e-07, 'epoch': 0.72} 72%|███████▏ | 1987/2774 [6:31:29<2:30:12, 11.45s/it] 72%|███████▏ | 1988/2774 [6:31:40<2:29:06, 11.38s/it] {'loss': 1.0522, 'learning_rate': 9.813800053666086e-07, 'epoch': 0.72} 72%|███████▏ | 1988/2774 [6:31:40<2:29:06, 11.38s/it] 72%|███████▏ | 1989/2774 [6:31:52<2:28:58, 11.39s/it] {'loss': 1.0073, 'learning_rate': 9.790617536278809e-07, 'epoch': 0.72} 72%|███████▏ | 1989/2774 [6:31:52<2:28:58, 11.39s/it] 72%|███████▏ | 1990/2774 [6:32:03<2:29:50, 11.47s/it] {'loss': 1.0352, 'learning_rate': 9.767455763570433e-07, 'epoch': 0.72} 72%|███████▏ | 1990/2774 [6:32:03<2:29:50, 11.47s/it] 72%|███████▏ | 1991/2774 [6:32:17<2:37:19, 12.06s/it] {'loss': 1.0088, 'learning_rate': 9.74431476713223e-07, 'epoch': 0.72} 72%|███████▏ | 1991/2774 [6:32:17<2:37:19, 12.06s/it] 72%|███████▏ | 1992/2774 [6:32:28<2:34:16, 11.84s/it] {'loss': 0.9961, 'learning_rate': 9.721194578527112e-07, 'epoch': 0.72} 72%|███████▏ | 1992/2774 [6:32:28<2:34:16, 11.84s/it] 72%|███████▏ | 1993/2774 [6:32:39<2:31:34, 11.64s/it] {'loss': 0.9985, 'learning_rate': 9.698095229289614e-07, 'epoch': 0.72} 72%|███████▏ | 1993/2774 [6:32:39<2:31:34, 11.64s/it] 72%|███████▏ | 1994/2774 [6:32:51<2:32:10, 11.71s/it] {'loss': 0.9868, 'learning_rate': 9.67501675092587e-07, 'epoch': 0.72} 72%|███████▏ | 1994/2774 [6:32:51<2:32:10, 11.71s/it] 72%|███████▏ | 1995/2774 [6:33:03<2:30:20, 11.58s/it] {'loss': 1.0405, 'learning_rate': 9.65195917491352e-07, 'epoch': 0.72} 72%|███████▏ | 1995/2774 [6:33:03<2:30:20, 11.58s/it] 72%|███████▏ | 1996/2774 [6:33:14<2:31:25, 11.68s/it] {'loss': 
1.0435, 'learning_rate': 9.6289225327017e-07, 'epoch': 0.72} 72%|███████▏ | 1996/2774 [6:33:14<2:31:25, 11.68s/it] 72%|███████▏ | 1997/2774 [6:33:27<2:35:53, 12.04s/it] {'loss': 0.9395, 'learning_rate': 9.605906855711011e-07, 'epoch': 0.72} 72%|███████▏ | 1997/2774 [6:33:27<2:35:53, 12.04s/it] 72%|███████▏ | 1998/2774 [6:33:39<2:35:05, 11.99s/it] {'loss': 1.0776, 'learning_rate': 9.582912175333438e-07, 'epoch': 0.72} 72%|███████▏ | 1998/2774 [6:33:39<2:35:05, 11.99s/it] 72%|███████▏ | 1999/2774 [6:33:51<2:34:06, 11.93s/it] {'loss': 1.0024, 'learning_rate': 9.55993852293233e-07, 'epoch': 0.72} 72%|███████▏ | 1999/2774 [6:33:51<2:34:06, 11.93s/it] 72%|███████▏ | 2000/2774 [6:34:03<2:32:34, 11.83s/it] {'loss': 1.04, 'learning_rate': 9.53698592984238e-07, 'epoch': 0.72} 72%|███████▏ | 2000/2774 [6:34:03<2:32:34, 11.83s/it]/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. 
Gradients will be None warnings.warn( 72%|███████▏ | 2001/2774 [6:34:40<4:13:02, 19.64s/it] {'loss': 1.0435, 'learning_rate': 9.514054427369515e-07, 'epoch': 0.72} 72%|███████▏ | 2001/2774 [6:34:40<4:13:02, 19.64s/it] 72%|███████▏ | 2002/2774 [6:34:53<3:47:07, 17.65s/it] {'loss': 1.0259, 'learning_rate': 9.491144046790939e-07, 'epoch': 0.72} 72%|███████▏ | 2002/2774 [6:34:53<3:47:07, 17.65s/it] 72%|███████▏ | 2003/2774 [6:35:06<3:25:26, 15.99s/it] {'loss': 1.0303, 'learning_rate': 9.468254819355019e-07, 'epoch': 0.72} 72%|███████▏ | 2003/2774 [6:35:06<3:25:26, 15.99s/it] 72%|███████▏ | 2004/2774 [6:35:17<3:07:05, 14.58s/it] {'loss': 1.0024, 'learning_rate': 9.445386776281282e-07, 'epoch': 0.72} 72%|███████▏ | 2004/2774 [6:35:17<3:07:05, 14.58s/it] 72%|███████▏ | 2005/2774 [6:35:29<2:59:05, 13.97s/it] {'loss': 1.0171, 'learning_rate': 9.422539948760342e-07, 'epoch': 0.72} 72%|███████▏ | 2005/2774 [6:35:29<2:59:05, 13.97s/it] 72%|███████▏ | 2006/2774 [6:35:43<2:56:25, 13.78s/it] {'loss': 1.0029, 'learning_rate': 9.399714367953913e-07, 'epoch': 0.72} 72%|███████▏ | 2006/2774 [6:35:43<2:56:25, 13.78s/it] 72%|███████▏ | 2007/2774 [6:35:54<2:47:59, 13.14s/it] {'loss': 1.021, 'learning_rate': 9.37691006499469e-07, 'epoch': 0.72} 72%|███████▏ | 2007/2774 [6:35:54<2:47:59, 13.14s/it] 72%|███████▏ | 2008/2774 [6:36:06<2:41:09, 12.62s/it] {'loss': 1.0415, 'learning_rate': 9.354127070986385e-07, 'epoch': 0.72} 72%|███████▏ | 2008/2774 [6:36:06<2:41:09, 12.62s/it] 72%|███████▏ | 2009/2774 [6:36:17<2:35:38, 12.21s/it] {'loss': 1.0073, 'learning_rate': 9.331365417003602e-07, 'epoch': 0.72} 72%|███████▏ | 2009/2774 [6:36:17<2:35:38, 12.21s/it] 72%|███████▏ | 2010/2774 [6:36:28<2:31:47, 11.92s/it] {'loss': 1.0308, 'learning_rate': 9.308625134091886e-07, 'epoch': 0.72} 72%|███████▏ | 2010/2774 [6:36:28<2:31:47, 11.92s/it] 72%|███████▏ | 2011/2774 [6:36:40<2:29:02, 11.72s/it] {'loss': 0.9873, 'learning_rate': 9.285906253267587e-07, 'epoch': 0.72} 72%|███████▏ | 2011/2774 
[6:36:40<2:29:02, 11.72s/it] 73%|███████▎ | 2012/2774 [6:36:51<2:28:12, 11.67s/it] {'loss': 0.9902, 'learning_rate': 9.263208805517912e-07, 'epoch': 0.73} 73%|███████▎ | 2012/2774 [6:36:51<2:28:12, 11.67s/it] 73%|███████▎ | 2013/2774 [6:37:02<2:26:27, 11.55s/it] {'loss': 1.0034, 'learning_rate': 9.24053282180078e-07, 'epoch': 0.73} 73%|███████▎ | 2013/2774 [6:37:02<2:26:27, 11.55s/it] 73%|███████▎ | 2014/2774 [6:37:14<2:25:32, 11.49s/it] {'loss': 1.103, 'learning_rate': 9.21787833304488e-07, 'epoch': 0.73} 73%|███████▎ | 2014/2774 [6:37:14<2:25:32, 11.49s/it] 73%|███████▎ | 2015/2774 [6:37:25<2:24:26, 11.42s/it] {'loss': 1.0088, 'learning_rate': 9.195245370149555e-07, 'epoch': 0.73} 73%|███████▎ | 2015/2774 [6:37:25<2:24:26, 11.42s/it] 73%|███████▎ | 2016/2774 [6:37:36<2:23:57, 11.39s/it] {'loss': 1.0767, 'learning_rate': 9.172633963984818e-07, 'epoch': 0.73} 73%|███████▎ | 2016/2774 [6:37:36<2:23:57, 11.39s/it] 73%|███████▎ | 2017/2774 [6:37:50<2:31:53, 12.04s/it] {'loss': 0.9736, 'learning_rate': 9.150044145391237e-07, 'epoch': 0.73} 73%|███████▎ | 2017/2774 [6:37:50<2:31:53, 12.04s/it] 73%|███████▎ | 2018/2774 [6:38:01<2:29:18, 11.85s/it] {'loss': 1.0132, 'learning_rate': 9.127475945179982e-07, 'epoch': 0.73} 73%|███████▎ | 2018/2774 [6:38:01<2:29:18, 11.85s/it] 73%|███████▎ | 2019/2774 [6:38:15<2:34:56, 12.31s/it] {'loss': 0.958, 'learning_rate': 9.104929394132706e-07, 'epoch': 0.73} 73%|███████▎ | 2019/2774 [6:38:15<2:34:56, 12.31s/it] 73%|███████▎ | 2020/2774 [6:38:26<2:30:27, 11.97s/it] {'loss': 0.9673, 'learning_rate': 9.082404523001531e-07, 'epoch': 0.73} 73%|███████▎ | 2020/2774 [6:38:26<2:30:27, 11.97s/it] 73%|███████▎ | 2021/2774 [6:38:38<2:29:35, 11.92s/it] {'loss': 1.0376, 'learning_rate': 9.059901362509044e-07, 'epoch': 0.73} 73%|███████▎ | 2021/2774 [6:38:38<2:29:35, 11.92s/it] 73%|███████▎ | 2022/2774 [6:38:49<2:27:21, 11.76s/it] {'loss': 1.0054, 'learning_rate': 9.03741994334818e-07, 'epoch': 0.73} 73%|███████▎ | 2022/2774 [6:38:49<2:27:21, 
11.76s/it] 73%|███████▎ | 2023/2774 [6:39:00<2:25:41, 11.64s/it] {'loss': 0.9878, 'learning_rate': 9.01496029618224e-07, 'epoch': 0.73} 73%|███████▎ | 2023/2774 [6:39:00<2:25:41, 11.64s/it] 73%|███████▎ | 2024/2774 [6:39:12<2:26:01, 11.68s/it] {'loss': 1.021, 'learning_rate': 8.992522451644823e-07, 'epoch': 0.73} 73%|███████▎ | 2024/2774 [6:39:12<2:26:01, 11.68s/it] 73%|███████▎ | 2025/2774 [6:39:24<2:26:05, 11.70s/it] {'loss': 1.0391, 'learning_rate': 8.970106440339801e-07, 'epoch': 0.73} 73%|███████▎ | 2025/2774 [6:39:24<2:26:05, 11.70s/it] 73%|███████▎ | 2026/2774 [6:39:36<2:25:32, 11.67s/it] {'loss': 1.0024, 'learning_rate': 8.947712292841248e-07, 'epoch': 0.73} 73%|███████▎ | 2026/2774 [6:39:36<2:25:32, 11.67s/it] 73%|███████▎ | 2027/2774 [6:39:47<2:25:45, 11.71s/it] {'loss': 1.0337, 'learning_rate': 8.925340039693444e-07, 'epoch': 0.73} 73%|███████▎ | 2027/2774 [6:39:47<2:25:45, 11.71s/it] 73%|███████▎ | 2028/2774 [6:40:00<2:29:11, 12.00s/it] {'loss': 0.999, 'learning_rate': 8.902989711410773e-07, 'epoch': 0.73} 73%|███████▎ | 2028/2774 [6:40:00<2:29:11, 12.00s/it] 73%|███████▎ | 2029/2774 [6:40:12<2:28:53, 11.99s/it] {'loss': 1.0078, 'learning_rate': 8.880661338477753e-07, 'epoch': 0.73} 73%|███████▎ | 2029/2774 [6:40:12<2:28:53, 11.99s/it] 73%|███████▎ | 2030/2774 [6:40:24<2:30:14, 12.12s/it] {'loss': 0.9888, 'learning_rate': 8.858354951348924e-07, 'epoch': 0.73} 73%|███████▎ | 2030/2774 [6:40:24<2:30:14, 12.12s/it] 73%|███████▎ | 2031/2774 [6:40:36<2:27:21, 11.90s/it] {'loss': 0.9946, 'learning_rate': 8.83607058044885e-07, 'epoch': 0.73} 73%|███████▎ | 2031/2774 [6:40:36<2:27:21, 11.90s/it] 73%|███████▎ | 2032/2774 [6:40:49<2:33:20, 12.40s/it] {'loss': 0.9937, 'learning_rate': 8.813808256172063e-07, 'epoch': 0.73} 73%|███████▎ | 2032/2774 [6:40:49<2:33:20, 12.40s/it] 73%|███████▎ | 2033/2774 [6:41:01<2:30:48, 12.21s/it] {'loss': 1.0059, 'learning_rate': 8.791568008883039e-07, 'epoch': 0.73} 73%|███████▎ | 2033/2774 [6:41:01<2:30:48, 12.21s/it] 73%|███████▎ 
| 2034/2774 [6:41:13<2:30:34, 12.21s/it] {'loss': 1.022, 'learning_rate': 8.769349868916119e-07, 'epoch': 0.73} 73%|███████▎ | 2034/2774 [6:41:13<2:30:34, 12.21s/it] 73%|███████▎ | 2035/2774 [6:41:25<2:27:10, 11.95s/it] {'loss': 1.0469, 'learning_rate': 8.747153866575522e-07, 'epoch': 0.73} 73%|███████▎ | 2035/2774 [6:41:25<2:27:10, 11.95s/it] 73%|███████▎ | 2036/2774 [6:41:37<2:30:17, 12.22s/it] {'loss': 0.9766, 'learning_rate': 8.724980032135233e-07, 'epoch': 0.73} 73%|███████▎ | 2036/2774 [6:41:37<2:30:17, 12.22s/it] 73%|███████▎ | 2037/2774 [6:41:49<2:26:28, 11.93s/it] {'loss': 1.0771, 'learning_rate': 8.702828395839044e-07, 'epoch': 0.73} 73%|███████▎ | 2037/2774 [6:41:49<2:26:28, 11.93s/it] 73%|███████▎ | 2038/2774 [6:42:01<2:28:11, 12.08s/it] {'loss': 1.042, 'learning_rate': 8.680698987900435e-07, 'epoch': 0.73} 73%|███████▎ | 2038/2774 [6:42:01<2:28:11, 12.08s/it] 74%|███████▎ | 2039/2774 [6:42:13<2:26:32, 11.96s/it] {'loss': 1.0332, 'learning_rate': 8.658591838502587e-07, 'epoch': 0.74} 74%|███████▎ | 2039/2774 [6:42:13<2:26:32, 11.96s/it] 74%|███████▎ | 2040/2774 [6:42:25<2:25:39, 11.91s/it] {'loss': 1.0391, 'learning_rate': 8.636506977798306e-07, 'epoch': 0.74} 74%|███████▎ | 2040/2774 [6:42:25<2:25:39, 11.91s/it] 74%|███████▎ | 2041/2774 [6:42:36<2:23:49, 11.77s/it] {'loss': 1.0659, 'learning_rate': 8.614444435910024e-07, 'epoch': 0.74} 74%|███████▎ | 2041/2774 [6:42:36<2:23:49, 11.77s/it] 74%|███████▎ | 2042/2774 [6:42:48<2:25:13, 11.90s/it] {'loss': 1.0288, 'learning_rate': 8.592404242929697e-07, 'epoch': 0.74} 74%|███████▎ | 2042/2774 [6:42:48<2:25:13, 11.90s/it] 74%|███████▎ | 2043/2774 [6:43:01<2:26:58, 12.06s/it] {'loss': 0.9707, 'learning_rate': 8.57038642891884e-07, 'epoch': 0.74} 74%|███████▎ | 2043/2774 [6:43:01<2:26:58, 12.06s/it] 74%|███████▎ | 2044/2774 [6:43:12<2:25:19, 11.94s/it] {'loss': 1.0522, 'learning_rate': 8.548391023908403e-07, 'epoch': 0.74} 74%|███████▎ | 2044/2774 [6:43:12<2:25:19, 11.94s/it] 74%|███████▎ | 2045/2774 
[6:43:24<2:23:08, 11.78s/it] {'loss': 1.0239, 'learning_rate': 8.526418057898791e-07, 'epoch': 0.74} 74%|███████▎ | 2045/2774 [6:43:24<2:23:08, 11.78s/it] 74%|███████▍ | 2046/2774 [6:43:35<2:21:28, 11.66s/it] {'loss': 1.02, 'learning_rate': 8.504467560859814e-07, 'epoch': 0.74} 74%|███████▍ | 2046/2774 [6:43:35<2:21:28, 11.66s/it] 74%|███████▍ | 2047/2774 [6:43:47<2:20:38, 11.61s/it] {'loss': 1.04, 'learning_rate': 8.482539562730607e-07, 'epoch': 0.74} 74%|███████▍ | 2047/2774 [6:43:47<2:20:38, 11.61s/it] 74%|███████▍ | 2048/2774 [6:43:59<2:21:22, 11.68s/it] {'loss': 1.0156, 'learning_rate': 8.460634093419662e-07, 'epoch': 0.74} 74%|███████▍ | 2048/2774 [6:43:59<2:21:22, 11.68s/it] 74%|███████▍ | 2049/2774 [6:44:11<2:25:22, 12.03s/it] {'loss': 1.1001, 'learning_rate': 8.43875118280468e-07, 'epoch': 0.74} 74%|███████▍ | 2049/2774 [6:44:11<2:25:22, 12.03s/it] 74%|███████▍ | 2050/2774 [6:44:23<2:22:43, 11.83s/it] {'loss': 0.9956, 'learning_rate': 8.416890860732657e-07, 'epoch': 0.74} 74%|███████▍ | 2050/2774 [6:44:23<2:22:43, 11.83s/it] 74%|███████▍ | 2051/2774 [6:44:35<2:22:30, 11.83s/it] {'loss': 0.9927, 'learning_rate': 8.395053157019733e-07, 'epoch': 0.74} 74%|███████▍ | 2051/2774 [6:44:35<2:22:30, 11.83s/it] 74%|███████▍ | 2052/2774 [6:44:46<2:21:11, 11.73s/it] {'loss': 1.0181, 'learning_rate': 8.373238101451234e-07, 'epoch': 0.74} 74%|███████▍ | 2052/2774 [6:44:46<2:21:11, 11.73s/it] 74%|███████▍ | 2053/2774 [6:44:59<2:24:50, 12.05s/it] {'loss': 1.0181, 'learning_rate': 8.351445723781562e-07, 'epoch': 0.74} 74%|███████▍ | 2053/2774 [6:44:59<2:24:50, 12.05s/it] 74%|███████▍ | 2054/2774 [6:45:10<2:22:19, 11.86s/it] {'loss': 0.9766, 'learning_rate': 8.32967605373422e-07, 'epoch': 0.74} 74%|███████▍ | 2054/2774 [6:45:10<2:22:19, 11.86s/it] 74%|███████▍ | 2055/2774 [6:45:21<2:19:24, 11.63s/it] {'loss': 0.9927, 'learning_rate': 8.307929121001704e-07, 'epoch': 0.74} 74%|███████▍ | 2055/2774 [6:45:21<2:19:24, 11.63s/it] 74%|███████▍ | 2056/2774 [6:45:33<2:18:00, 
11.53s/it] {'loss': 1.0151, 'learning_rate': 8.286204955245535e-07, 'epoch': 0.74} 74%|███████▍ | 2056/2774 [6:45:33<2:18:00, 11.53s/it] 74%|███████▍ | 2057/2774 [6:45:44<2:18:00, 11.55s/it] {'loss': 1.0312, 'learning_rate': 8.264503586096159e-07, 'epoch': 0.74} 74%|███████▍ | 2057/2774 [6:45:44<2:18:00, 11.55s/it] 74%|███████▍ | 2058/2774 [6:45:56<2:17:54, 11.56s/it] {'loss': 1.0542, 'learning_rate': 8.242825043152924e-07, 'epoch': 0.74} 74%|███████▍ | 2058/2774 [6:45:56<2:17:54, 11.56s/it] 74%|███████▍ | 2059/2774 [6:46:07<2:17:50, 11.57s/it] {'loss': 1.0605, 'learning_rate': 8.221169355984051e-07, 'epoch': 0.74} 74%|███████▍ | 2059/2774 [6:46:07<2:17:50, 11.57s/it] 74%|███████▍ | 2060/2774 [6:46:20<2:22:36, 11.98s/it] {'loss': 1.0186, 'learning_rate': 8.199536554126603e-07, 'epoch': 0.74} 74%|███████▍ | 2060/2774 [6:46:20<2:22:36, 11.98s/it] 74%|███████▍ | 2061/2774 [6:46:32<2:19:36, 11.75s/it] {'loss': 1.042, 'learning_rate': 8.177926667086399e-07, 'epoch': 0.74} 74%|███████▍ | 2061/2774 [6:46:32<2:19:36, 11.75s/it] 74%|███████▍ | 2062/2774 [6:46:43<2:18:04, 11.64s/it] {'loss': 1.0693, 'learning_rate': 8.156339724338036e-07, 'epoch': 0.74} 74%|███████▍ | 2062/2774 [6:46:43<2:18:04, 11.64s/it] 74%|███████▍ | 2063/2774 [6:46:55<2:17:48, 11.63s/it] {'loss': 0.9844, 'learning_rate': 8.134775755324784e-07, 'epoch': 0.74} 74%|███████▍ | 2063/2774 [6:46:55<2:17:48, 11.63s/it] 74%|███████▍ | 2064/2774 [6:47:06<2:16:51, 11.57s/it] {'loss': 1.0176, 'learning_rate': 8.11323478945861e-07, 'epoch': 0.74} 74%|███████▍ | 2064/2774 [6:47:06<2:16:51, 11.57s/it] 74%|███████▍ | 2065/2774 [6:47:17<2:15:13, 11.44s/it] {'loss': 0.9912, 'learning_rate': 8.09171685612008e-07, 'epoch': 0.74} 74%|███████▍ | 2065/2774 [6:47:17<2:15:13, 11.44s/it] 74%|███████▍ | 2066/2774 [6:47:29<2:15:20, 11.47s/it] {'loss': 1.0278, 'learning_rate': 8.070221984658358e-07, 'epoch': 0.74} 74%|███████▍ | 2066/2774 [6:47:29<2:15:20, 11.47s/it] 75%|███████▍ | 2067/2774 [6:47:41<2:17:39, 11.68s/it] {'loss': 
1.04, 'learning_rate': 8.048750204391143e-07, 'epoch': 0.75} 75%|███████▍ | 2067/2774 [6:47:41<2:17:39, 11.68s/it] 75%|███████▍ | 2068/2774 [6:47:52<2:16:21, 11.59s/it] {'loss': 1.0083, 'learning_rate': 8.027301544604657e-07, 'epoch': 0.75} 75%|███████▍ | 2068/2774 [6:47:52<2:16:21, 11.59s/it] 75%|███████▍ | 2069/2774 [6:48:04<2:15:27, 11.53s/it] {'loss': 1.0181, 'learning_rate': 8.005876034553575e-07, 'epoch': 0.75} 75%|███████▍ | 2069/2774 [6:48:04<2:15:27, 11.53s/it] 75%|███████▍ | 2070/2774 [6:48:15<2:14:35, 11.47s/it] {'loss': 0.9946, 'learning_rate': 7.984473703460985e-07, 'epoch': 0.75} 75%|███████▍ | 2070/2774 [6:48:15<2:14:35, 11.47s/it] 75%|███████▍ | 2071/2774 [6:48:26<2:14:29, 11.48s/it] {'loss': 1.0322, 'learning_rate': 7.963094580518394e-07, 'epoch': 0.75} 75%|███████▍ | 2071/2774 [6:48:26<2:14:29, 11.48s/it] 75%|███████▍ | 2072/2774 [6:48:40<2:22:33, 12.18s/it] {'loss': 0.9697, 'learning_rate': 7.941738694885614e-07, 'epoch': 0.75} 75%|███████▍ | 2072/2774 [6:48:40<2:22:33, 12.18s/it] 75%|███████▍ | 2073/2774 [6:48:52<2:20:24, 12.02s/it] {'loss': 1.0049, 'learning_rate': 7.920406075690804e-07, 'epoch': 0.75} 75%|███████▍ | 2073/2774 [6:48:52<2:20:24, 12.02s/it] 75%|███████▍ | 2074/2774 [6:49:03<2:17:27, 11.78s/it] {'loss': 1.0376, 'learning_rate': 7.899096752030346e-07, 'epoch': 0.75} 75%|███████▍ | 2074/2774 [6:49:03<2:17:27, 11.78s/it] 75%|███████▍ | 2075/2774 [6:49:14<2:15:33, 11.64s/it] {'loss': 1.02, 'learning_rate': 7.877810752968901e-07, 'epoch': 0.75} 75%|███████▍ | 2075/2774 [6:49:14<2:15:33, 11.64s/it] 75%|███████▍ | 2076/2774 [6:49:26<2:13:43, 11.50s/it] {'loss': 1.0474, 'learning_rate': 7.856548107539247e-07, 'epoch': 0.75} 75%|███████▍ | 2076/2774 [6:49:26<2:13:43, 11.50s/it] 75%|███████▍ | 2077/2774 [6:49:37<2:14:46, 11.60s/it] {'loss': 1.0615, 'learning_rate': 7.835308844742376e-07, 'epoch': 0.75} 75%|███████▍ | 2077/2774 [6:49:37<2:14:46, 11.60s/it] 75%|███████▍ | 2078/2774 [6:49:49<2:14:49, 11.62s/it] {'loss': 1.0791, 
'learning_rate': 7.814092993547342e-07, 'epoch': 0.75} 75%|███████▍ | 2078/2774 [6:49:49<2:14:49, 11.62s/it] 75%|███████▍ | 2079/2774 [6:50:01<2:14:09, 11.58s/it] {'loss': 1.0361, 'learning_rate': 7.792900582891303e-07, 'epoch': 0.75} 75%|███████▍ | 2079/2774 [6:50:01<2:14:09, 11.58s/it] 75%|███████▍ | 2080/2774 [6:50:12<2:13:23, 11.53s/it] {'loss': 1.0083, 'learning_rate': 7.771731641679406e-07, 'epoch': 0.75} 75%|███████▍ | 2080/2774 [6:50:12<2:13:23, 11.53s/it] 75%|███████▌ | 2081/2774 [6:50:24<2:14:16, 11.63s/it] {'loss': 1.0181, 'learning_rate': 7.750586198784829e-07, 'epoch': 0.75} 75%|███████▌ | 2081/2774 [6:50:24<2:14:16, 11.63s/it] 75%|███████▌ | 2082/2774 [6:50:39<2:24:55, 12.57s/it] {'loss': 1.0103, 'learning_rate': 7.72946428304866e-07, 'epoch': 0.75} 75%|███████▌ | 2082/2774 [6:50:39<2:24:55, 12.57s/it] 75%|███████▌ | 2083/2774 [6:50:50<2:21:10, 12.26s/it] {'loss': 0.9775, 'learning_rate': 7.708365923279931e-07, 'epoch': 0.75} 75%|███████▌ | 2083/2774 [6:50:50<2:21:10, 12.26s/it] 75%|███████▌ | 2084/2774 [6:51:02<2:19:54, 12.17s/it] {'loss': 1.0142, 'learning_rate': 7.687291148255527e-07, 'epoch': 0.75} 75%|███████▌ | 2084/2774 [6:51:02<2:19:54, 12.17s/it] 75%|███████▌ | 2085/2774 [6:51:14<2:19:23, 12.14s/it] {'loss': 1.0474, 'learning_rate': 7.666239986720162e-07, 'epoch': 0.75} 75%|███████▌ | 2085/2774 [6:51:14<2:19:23, 12.14s/it] 75%|███████▌ | 2086/2774 [6:51:26<2:16:38, 11.92s/it] {'loss': 1.0283, 'learning_rate': 7.645212467386346e-07, 'epoch': 0.75} 75%|███████▌ | 2086/2774 [6:51:26<2:16:38, 11.92s/it] 75%|███████▌ | 2087/2774 [6:51:37<2:13:41, 11.68s/it] {'loss': 1.0396, 'learning_rate': 7.624208618934356e-07, 'epoch': 0.75} 75%|███████▌ | 2087/2774 [6:51:37<2:13:41, 11.68s/it] 75%|███████▌ | 2088/2774 [6:51:48<2:12:05, 11.55s/it] {'loss': 0.9951, 'learning_rate': 7.603228470012162e-07, 'epoch': 0.75} 75%|███████▌ | 2088/2774 [6:51:48<2:12:05, 11.55s/it] 75%|███████▌ | 2089/2774 [6:52:00<2:11:55, 11.56s/it] {'loss': 1.0215, 'learning_rate': 
7.582272049235431e-07, 'epoch': 0.75} 75%|███████▌ | 2089/2774 [6:52:00<2:11:55, 11.56s/it] 75%|███████▌ | 2090/2774 [6:52:13<2:17:09, 12.03s/it] {'loss': 1.0024, 'learning_rate': 7.561339385187449e-07, 'epoch': 0.75} 75%|███████▌ | 2090/2774 [6:52:13<2:17:09, 12.03s/it] 75%|███████▌ | 2091/2774 [6:52:25<2:19:38, 12.27s/it] {'loss': 0.9609, 'learning_rate': 7.540430506419099e-07, 'epoch': 0.75} 75%|███████▌ | 2091/2774 [6:52:25<2:19:38, 12.27s/it] 75%|███████▌ | 2092/2774 [6:52:37<2:17:10, 12.07s/it] {'loss': 1.0498, 'learning_rate': 7.519545441448842e-07, 'epoch': 0.75} 75%|███████▌ | 2092/2774 [6:52:37<2:17:10, 12.07s/it] 75%|███████▌ | 2093/2774 [6:52:50<2:19:03, 12.25s/it] {'loss': 1.0283, 'learning_rate': 7.498684218762639e-07, 'epoch': 0.75} 75%|███████▌ | 2093/2774 [6:52:50<2:19:03, 12.25s/it] 75%|███████▌ | 2094/2774 [6:53:01<2:16:17, 12.03s/it] {'loss': 1.0894, 'learning_rate': 7.477846866813934e-07, 'epoch': 0.75} 75%|███████▌ | 2094/2774 [6:53:01<2:16:17, 12.03s/it] 76%|███████▌ | 2095/2774 [6:53:12<2:12:48, 11.74s/it] {'loss': 1.0615, 'learning_rate': 7.457033414023613e-07, 'epoch': 0.76} 76%|███████▌ | 2095/2774 [6:53:12<2:12:48, 11.74s/it] 76%|███████▌ | 2096/2774 [6:53:25<2:15:29, 11.99s/it] {'loss': 1.0386, 'learning_rate': 7.436243888779982e-07, 'epoch': 0.76} 76%|███████▌ | 2096/2774 [6:53:25<2:15:29, 11.99s/it] 76%|███████▌ | 2097/2774 [6:53:36<2:13:37, 11.84s/it] {'loss': 1.0293, 'learning_rate': 7.41547831943868e-07, 'epoch': 0.76} 76%|███████▌ | 2097/2774 [6:53:36<2:13:37, 11.84s/it] 76%|███████▌ | 2098/2774 [6:53:50<2:18:07, 12.26s/it] {'loss': 0.9868, 'learning_rate': 7.394736734322705e-07, 'epoch': 0.76} 76%|███████▌ | 2098/2774 [6:53:50<2:18:07, 12.26s/it] 76%|███████▌ | 2099/2774 [6:54:01<2:14:52, 11.99s/it] {'loss': 1.0449, 'learning_rate': 7.374019161722315e-07, 'epoch': 0.76} 76%|███████▌ | 2099/2774 [6:54:01<2:14:52, 11.99s/it] 76%|███████▌ | 2100/2774 [6:54:14<2:19:16, 12.40s/it] {'loss': 1.0029, 'learning_rate': 
7.353325629895039e-07, 'epoch': 0.76} 76%|███████▌ | 2100/2774 [6:54:14<2:19:16, 12.40s/it] 76%|███████▌ | 2101/2774 [6:54:26<2:15:11, 12.05s/it] {'loss': 0.9878, 'learning_rate': 7.332656167065591e-07, 'epoch': 0.76} 76%|███████▌ | 2101/2774 [6:54:26<2:15:11, 12.05s/it] 76%|███████▌ | 2102/2774 [6:54:37<2:13:56, 11.96s/it] {'loss': 1.0098, 'learning_rate': 7.312010801425892e-07, 'epoch': 0.76} 76%|███████▌ | 2102/2774 [6:54:37<2:13:56, 11.96s/it] 76%|███████▌ | 2103/2774 [6:54:49<2:13:07, 11.90s/it] {'loss': 1.0288, 'learning_rate': 7.29138956113494e-07, 'epoch': 0.76} 76%|███████▌ | 2103/2774 [6:54:49<2:13:07, 11.90s/it] 76%|███████▌ | 2104/2774 [6:55:01<2:11:11, 11.75s/it] {'loss': 0.9541, 'learning_rate': 7.270792474318889e-07, 'epoch': 0.76} 76%|███████▌ | 2104/2774 [6:55:01<2:11:11, 11.75s/it] 76%|███████▌ | 2105/2774 [6:55:14<2:15:48, 12.18s/it] {'loss': 0.9346, 'learning_rate': 7.250219569070904e-07, 'epoch': 0.76} 76%|███████▌ | 2105/2774 [6:55:14<2:15:48, 12.18s/it] 76%|███████▌ | 2106/2774 [6:55:25<2:12:57, 11.94s/it] {'loss': 1.0762, 'learning_rate': 7.229670873451197e-07, 'epoch': 0.76} 76%|███████▌ | 2106/2774 [6:55:25<2:12:57, 11.94s/it] 76%|███████▌ | 2107/2774 [6:55:38<2:14:51, 12.13s/it] {'loss': 0.9526, 'learning_rate': 7.20914641548694e-07, 'epoch': 0.76} 76%|███████▌ | 2107/2774 [6:55:38<2:14:51, 12.13s/it] 76%|███████▌ | 2108/2774 [6:55:49<2:12:22, 11.93s/it] {'loss': 1.0376, 'learning_rate': 7.18864622317226e-07, 'epoch': 0.76} 76%|███████▌ | 2108/2774 [6:55:49<2:12:22, 11.93s/it] 76%|███████▌ | 2109/2774 [6:56:01<2:11:02, 11.82s/it] {'loss': 1.0493, 'learning_rate': 7.168170324468171e-07, 'epoch': 0.76} 76%|███████▌ | 2109/2774 [6:56:01<2:11:02, 11.82s/it] 76%|███████▌ | 2110/2774 [6:56:13<2:11:10, 11.85s/it] {'loss': 0.9941, 'learning_rate': 7.147718747302577e-07, 'epoch': 0.76} 76%|███████▌ | 2110/2774 [6:56:13<2:11:10, 11.85s/it] 76%|███████▌ | 2111/2774 [6:56:24<2:09:56, 11.76s/it] {'loss': 1.0103, 'learning_rate': 7.127291519570184e-07, 
'epoch': 0.76} 76%|███████▌ | 2111/2774 [6:56:24<2:09:56, 11.76s/it] 76%|███████▌ | 2112/2774 [6:56:36<2:08:44, 11.67s/it] {'loss': 0.978, 'learning_rate': 7.106888669132497e-07, 'epoch': 0.76} 76%|███████▌ | 2112/2774 [6:56:36<2:08:44, 11.67s/it] 76%|███████▌ | 2113/2774 [6:56:47<2:08:19, 11.65s/it] {'loss': 1.063, 'learning_rate': 7.086510223817766e-07, 'epoch': 0.76} 76%|███████▌ | 2113/2774 [6:56:47<2:08:19, 11.65s/it] 76%|███████▌ | 2114/2774 [6:56:59<2:07:10, 11.56s/it] {'loss': 0.9873, 'learning_rate': 7.066156211420975e-07, 'epoch': 0.76} 76%|███████▌ | 2114/2774 [6:56:59<2:07:10, 11.56s/it] 76%|███████▌ | 2115/2774 [6:57:11<2:08:29, 11.70s/it] {'loss': 1.0122, 'learning_rate': 7.045826659703756e-07, 'epoch': 0.76} 76%|███████▌ | 2115/2774 [6:57:11<2:08:29, 11.70s/it] 76%|███████▋ | 2116/2774 [6:57:22<2:08:00, 11.67s/it] {'loss': 1.0859, 'learning_rate': 7.025521596394382e-07, 'epoch': 0.76} 76%|███████▋ | 2116/2774 [6:57:22<2:08:00, 11.67s/it] 76%|███████▋ | 2117/2774 [6:57:34<2:07:05, 11.61s/it] {'loss': 1.0078, 'learning_rate': 7.005241049187752e-07, 'epoch': 0.76} 76%|███████▋ | 2117/2774 [6:57:34<2:07:05, 11.61s/it] 76%|███████▋ | 2118/2774 [6:57:47<2:12:26, 12.11s/it] {'loss': 0.9414, 'learning_rate': 6.98498504574529e-07, 'epoch': 0.76} 76%|███████▋ | 2118/2774 [6:57:47<2:12:26, 12.11s/it] 76%|███████▋ | 2119/2774 [6:57:58<2:10:16, 11.93s/it] {'loss': 1.0469, 'learning_rate': 6.964753613694977e-07, 'epoch': 0.76} 76%|███████▋ | 2119/2774 [6:57:58<2:10:16, 11.93s/it] 76%|███████▋ | 2120/2774 [6:58:10<2:09:19, 11.87s/it] {'loss': 1.0254, 'learning_rate': 6.944546780631256e-07, 'epoch': 0.76} 76%|███████▋ | 2120/2774 [6:58:10<2:09:19, 11.87s/it] 76%|███████▋ | 2121/2774 [6:58:22<2:08:10, 11.78s/it] {'loss': 1.002, 'learning_rate': 6.924364574115025e-07, 'epoch': 0.76} 76%|███████▋ | 2121/2774 [6:58:22<2:08:10, 11.78s/it] 76%|███████▋ | 2122/2774 [6:58:33<2:05:59, 11.59s/it] {'loss': 1.042, 'learning_rate': 6.90420702167359e-07, 'epoch': 0.76} 
76%|███████▋ | 2122/2774 [6:58:33<2:05:59, 11.59s/it] 77%|███████▋ | 2123/2774 [6:58:44<2:04:43, 11.50s/it] {'loss': 1.0161, 'learning_rate': 6.884074150800649e-07, 'epoch': 0.77} 77%|███████▋ | 2123/2774 [6:58:44<2:04:43, 11.50s/it] 77%|███████▋ | 2124/2774 [6:58:56<2:04:11, 11.46s/it] {'loss': 1.0181, 'learning_rate': 6.863965988956203e-07, 'epoch': 0.77} 77%|███████▋ | 2124/2774 [6:58:56<2:04:11, 11.46s/it] 77%|███████▋ | 2125/2774 [6:59:07<2:03:25, 11.41s/it] {'loss': 0.998, 'learning_rate': 6.843882563566589e-07, 'epoch': 0.77} 77%|███████▋ | 2125/2774 [6:59:07<2:03:25, 11.41s/it] 77%|███████▋ | 2126/2774 [6:59:18<2:03:35, 11.44s/it] {'loss': 1.0073, 'learning_rate': 6.82382390202437e-07, 'epoch': 0.77} 77%|███████▋ | 2126/2774 [6:59:18<2:03:35, 11.44s/it] 77%|███████▋ | 2127/2774 [6:59:30<2:04:22, 11.53s/it] {'loss': 0.9951, 'learning_rate': 6.803790031688365e-07, 'epoch': 0.77} 77%|███████▋ | 2127/2774 [6:59:30<2:04:22, 11.53s/it] 77%|███████▋ | 2128/2774 [6:59:42<2:03:55, 11.51s/it] {'loss': 1.0215, 'learning_rate': 6.783780979883548e-07, 'epoch': 0.77} 77%|███████▋ | 2128/2774 [6:59:42<2:03:55, 11.51s/it] 77%|███████▋ | 2129/2774 [6:59:53<2:03:03, 11.45s/it] {'loss': 1.0542, 'learning_rate': 6.763796773901074e-07, 'epoch': 0.77} 77%|███████▋ | 2129/2774 [6:59:53<2:03:03, 11.45s/it] 77%|███████▋ | 2130/2774 [7:00:04<2:02:31, 11.41s/it] {'loss': 1.0273, 'learning_rate': 6.743837440998169e-07, 'epoch': 0.77} 77%|███████▋ | 2130/2774 [7:00:04<2:02:31, 11.41s/it] 77%|███████▋ | 2131/2774 [7:00:16<2:03:53, 11.56s/it] {'loss': 1.0713, 'learning_rate': 6.723903008398178e-07, 'epoch': 0.77} 77%|███████▋ | 2131/2774 [7:00:16<2:03:53, 11.56s/it] 77%|███████▋ | 2132/2774 [7:00:27<2:02:20, 11.43s/it] {'loss': 1.0767, 'learning_rate': 6.703993503290448e-07, 'epoch': 0.77} 77%|███████▋ | 2132/2774 [7:00:27<2:02:20, 11.43s/it] 77%|███████▋ | 2133/2774 [7:00:40<2:04:51, 11.69s/it] {'loss': 1.064, 'learning_rate': 6.684108952830354e-07, 'epoch': 0.77} 77%|███████▋ | 
2133/2774 [7:00:40<2:04:51, 11.69s/it] 77%|███████▋ | 2134/2774 [7:00:51<2:04:44, 11.69s/it] {'loss': 1.0327, 'learning_rate': 6.66424938413921e-07, 'epoch': 0.77} 77%|███████▋ | 2134/2774 [7:00:51<2:04:44, 11.69s/it] 77%|███████▋ | 2135/2774 [7:01:03<2:04:26, 11.68s/it] {'loss': 1.0454, 'learning_rate': 6.644414824304282e-07, 'epoch': 0.77} 77%|███████▋ | 2135/2774 [7:01:03<2:04:26, 11.68s/it] 77%|███████▋ | 2136/2774 [7:01:15<2:04:14, 11.68s/it] {'loss': 1.0835, 'learning_rate': 6.624605300378703e-07, 'epoch': 0.77} 77%|███████▋ | 2136/2774 [7:01:15<2:04:14, 11.68s/it] 77%|███████▋ | 2137/2774 [7:01:27<2:04:53, 11.76s/it] {'loss': 0.9961, 'learning_rate': 6.604820839381459e-07, 'epoch': 0.77} 77%|███████▋ | 2137/2774 [7:01:27<2:04:53, 11.76s/it] 77%|███████▋ | 2138/2774 [7:01:38<2:03:36, 11.66s/it] {'loss': 1.002, 'learning_rate': 6.585061468297377e-07, 'epoch': 0.77} 77%|███████▋ | 2138/2774 [7:01:38<2:03:36, 11.66s/it] 77%|███████▋ | 2139/2774 [7:01:49<2:02:34, 11.58s/it] {'loss': 0.9863, 'learning_rate': 6.565327214077033e-07, 'epoch': 0.77} 77%|███████▋ | 2139/2774 [7:01:49<2:02:34, 11.58s/it] 77%|███████▋ | 2140/2774 [7:02:00<2:00:29, 11.40s/it] {'loss': 1.0127, 'learning_rate': 6.545618103636764e-07, 'epoch': 0.77} 77%|███████▋ | 2140/2774 [7:02:00<2:00:29, 11.40s/it] 77%|███████▋ | 2141/2774 [7:02:12<2:00:37, 11.43s/it] {'loss': 1.0117, 'learning_rate': 6.525934163858597e-07, 'epoch': 0.77} 77%|███████▋ | 2141/2774 [7:02:12<2:00:37, 11.43s/it] 77%|███████▋ | 2142/2774 [7:02:23<2:00:29, 11.44s/it] {'loss': 1.0298, 'learning_rate': 6.50627542159025e-07, 'epoch': 0.77} 77%|███████▋ | 2142/2774 [7:02:23<2:00:29, 11.44s/it] 77%|███████▋ | 2143/2774 [7:02:35<2:00:22, 11.45s/it] {'loss': 1.0098, 'learning_rate': 6.486641903645044e-07, 'epoch': 0.77} 77%|███████▋ | 2143/2774 [7:02:35<2:00:22, 11.45s/it] 77%|███████▋ | 2144/2774 [7:02:46<2:00:36, 11.49s/it] {'loss': 1.0469, 'learning_rate': 6.467033636801928e-07, 'epoch': 0.77} 77%|███████▋ | 2144/2774 
[7:02:46<2:00:36, 11.49s/it] 77%|███████▋ | 2145/2774 [7:02:58<2:00:54, 11.53s/it] {'loss': 1.0576, 'learning_rate': 6.447450647805378e-07, 'epoch': 0.77} 77%|███████▋ | 2145/2774 [7:02:58<2:00:54, 11.53s/it] 77%|███████▋ | 2146/2774 [7:03:10<2:03:02, 11.76s/it] {'loss': 1.0161, 'learning_rate': 6.427892963365425e-07, 'epoch': 0.77} 77%|███████▋ | 2146/2774 [7:03:10<2:03:02, 11.76s/it] 77%|███████▋ | 2147/2774 [7:03:22<2:02:26, 11.72s/it] {'loss': 1.043, 'learning_rate': 6.40836061015756e-07, 'epoch': 0.77} 77%|███████▋ | 2147/2774 [7:03:22<2:02:26, 11.72s/it] 77%|███████▋ | 2148/2774 [7:03:33<2:01:35, 11.65s/it] {'loss': 1.022, 'learning_rate': 6.388853614822732e-07, 'epoch': 0.77} 77%|███████▋ | 2148/2774 [7:03:33<2:01:35, 11.65s/it] 77%|███████▋ | 2149/2774 [7:03:45<2:00:13, 11.54s/it] {'loss': 1.0615, 'learning_rate': 6.369372003967297e-07, 'epoch': 0.77} 77%|███████▋ | 2149/2774 [7:03:45<2:00:13, 11.54s/it] 78%|███████▊ | 2150/2774 [7:03:57<2:01:00, 11.64s/it] {'loss': 1.0132, 'learning_rate': 6.349915804163012e-07, 'epoch': 0.78} 78%|███████▊ | 2150/2774 [7:03:57<2:01:00, 11.64s/it] 78%|███████▊ | 2151/2774 [7:04:08<1:59:35, 11.52s/it] {'loss': 0.9966, 'learning_rate': 6.330485041946943e-07, 'epoch': 0.78} 78%|███████▊ | 2151/2774 [7:04:08<1:59:35, 11.52s/it] 78%|███████▊ | 2152/2774 [7:04:19<1:59:58, 11.57s/it] {'loss': 1.0664, 'learning_rate': 6.311079743821489e-07, 'epoch': 0.78} 78%|███████▊ | 2152/2774 [7:04:19<1:59:58, 11.57s/it] 78%|███████▊ | 2153/2774 [7:04:32<2:03:06, 11.89s/it] {'loss': 1.0845, 'learning_rate': 6.29169993625429e-07, 'epoch': 0.78} 78%|███████▊ | 2153/2774 [7:04:32<2:03:06, 11.89s/it] 78%|███████▊ | 2154/2774 [7:04:43<2:00:43, 11.68s/it] {'loss': 1.0146, 'learning_rate': 6.272345645678249e-07, 'epoch': 0.78} 78%|███████▊ | 2154/2774 [7:04:43<2:00:43, 11.68s/it] 78%|███████▊ | 2155/2774 [7:04:55<1:59:40, 11.60s/it] {'loss': 1.0488, 'learning_rate': 6.253016898491435e-07, 'epoch': 0.78} 78%|███████▊ | 2155/2774 [7:04:55<1:59:40, 
11.60s/it] 78%|███████▊ | 2156/2774 [7:05:06<2:00:00, 11.65s/it] {'loss': 1.0776, 'learning_rate': 6.233713721057108e-07, 'epoch': 0.78} 78%|███████▊ | 2156/2774 [7:05:06<2:00:00, 11.65s/it] 78%|███████▊ | 2157/2774 [7:05:18<1:59:13, 11.59s/it] {'loss': 1.0864, 'learning_rate': 6.214436139703614e-07, 'epoch': 0.78} 78%|███████▊ | 2157/2774 [7:05:18<1:59:13, 11.59s/it] 78%|███████▊ | 2158/2774 [7:05:30<2:00:01, 11.69s/it] {'loss': 1.0542, 'learning_rate': 6.195184180724429e-07, 'epoch': 0.78} 78%|███████▊ | 2158/2774 [7:05:30<2:00:01, 11.69s/it] 78%|███████▊ | 2159/2774 [7:05:41<1:59:40, 11.67s/it] {'loss': 1.0366, 'learning_rate': 6.175957870378043e-07, 'epoch': 0.78} 78%|███████▊ | 2159/2774 [7:05:41<1:59:40, 11.67s/it] 78%|███████▊ | 2160/2774 [7:05:54<2:02:36, 11.98s/it] {'loss': 1.0381, 'learning_rate': 6.156757234888006e-07, 'epoch': 0.78} 78%|███████▊ | 2160/2774 [7:05:54<2:02:36, 11.98s/it] 78%|███████▊ | 2161/2774 [7:06:06<2:01:26, 11.89s/it] {'loss': 1.0674, 'learning_rate': 6.137582300442807e-07, 'epoch': 0.78} 78%|███████▊ | 2161/2774 [7:06:06<2:01:26, 11.89s/it] 78%|███████▊ | 2162/2774 [7:06:19<2:04:39, 12.22s/it] {'loss': 1.0059, 'learning_rate': 6.118433093195897e-07, 'epoch': 0.78} 78%|███████▊ | 2162/2774 [7:06:19<2:04:39, 12.22s/it] 78%|███████▊ | 2163/2774 [7:06:30<2:01:44, 11.95s/it] {'loss': 1.0752, 'learning_rate': 6.099309639265652e-07, 'epoch': 0.78} 78%|███████▊ | 2163/2774 [7:06:30<2:01:44, 11.95s/it] 78%|███████▊ | 2164/2774 [7:06:42<2:00:36, 11.86s/it] {'loss': 1.0649, 'learning_rate': 6.080211964735292e-07, 'epoch': 0.78} 78%|███████▊ | 2164/2774 [7:06:42<2:00:36, 11.86s/it] 78%|███████▊ | 2165/2774 [7:06:53<1:59:12, 11.75s/it] {'loss': 1.0635, 'learning_rate': 6.061140095652906e-07, 'epoch': 0.78} 78%|███████▊ | 2165/2774 [7:06:53<1:59:12, 11.75s/it] 78%|███████▊ | 2166/2774 [7:07:05<1:58:14, 11.67s/it] {'loss': 1.0542, 'learning_rate': 6.042094058031367e-07, 'epoch': 0.78} 78%|███████▊ | 2166/2774 [7:07:05<1:58:14, 11.67s/it] 
78%|███████▊ | 2167/2774 [7:07:17<2:00:14, 11.88s/it] {'loss': 1.0269, 'learning_rate': 6.023073877848314e-07, 'epoch': 0.78} 78%|███████▊ | 2167/2774 [7:07:17<2:00:14, 11.88s/it] 78%|███████▊ | 2168/2774 [7:07:29<1:58:19, 11.72s/it] {'loss': 1.0161, 'learning_rate': 6.004079581046123e-07, 'epoch': 0.78} 78%|███████▊ | 2168/2774 [7:07:29<1:58:19, 11.72s/it] 78%|███████▊ | 2169/2774 [7:07:41<1:59:08, 11.82s/it] {'loss': 1.0229, 'learning_rate': 5.985111193531878e-07, 'epoch': 0.78} 78%|███████▊ | 2169/2774 [7:07:41<1:59:08, 11.82s/it] 78%|███████▊ | 2170/2774 [7:07:52<1:58:15, 11.75s/it] {'loss': 0.9775, 'learning_rate': 5.9661687411773e-07, 'epoch': 0.78} 78%|███████▊ | 2170/2774 [7:07:52<1:58:15, 11.75s/it] 78%|███████▊ | 2171/2774 [7:08:05<2:00:43, 12.01s/it] {'loss': 1.0293, 'learning_rate': 5.947252249818764e-07, 'epoch': 0.78} 78%|███████▊ | 2171/2774 [7:08:05<2:00:43, 12.01s/it] 78%|███████▊ | 2172/2774 [7:08:16<1:59:05, 11.87s/it] {'loss': 1.04, 'learning_rate': 5.928361745257207e-07, 'epoch': 0.78} 78%|███████▊ | 2172/2774 [7:08:16<1:59:05, 11.87s/it] 78%|███████▊ | 2173/2774 [7:08:29<2:01:09, 12.10s/it] {'loss': 0.9697, 'learning_rate': 5.909497253258153e-07, 'epoch': 0.78} 78%|███████▊ | 2173/2774 [7:08:29<2:01:09, 12.10s/it] 78%|███████▊ | 2174/2774 [7:08:40<1:58:33, 11.86s/it] {'loss': 1.0015, 'learning_rate': 5.890658799551619e-07, 'epoch': 0.78} 78%|███████▊ | 2174/2774 [7:08:40<1:58:33, 11.86s/it] 78%|███████▊ | 2175/2774 [7:08:52<1:58:54, 11.91s/it] {'loss': 1.0601, 'learning_rate': 5.871846409832119e-07, 'epoch': 0.78} 78%|███████▊ | 2175/2774 [7:08:52<1:58:54, 11.91s/it] 78%|███████▊ | 2176/2774 [7:09:04<1:57:14, 11.76s/it] {'loss': 1.0078, 'learning_rate': 5.853060109758608e-07, 'epoch': 0.78} 78%|███████▊ | 2176/2774 [7:09:04<1:57:14, 11.76s/it] 78%|███████▊ | 2177/2774 [7:09:15<1:56:02, 11.66s/it] {'loss': 1.0303, 'learning_rate': 5.834299924954482e-07, 'epoch': 0.78} 78%|███████▊ | 2177/2774 [7:09:15<1:56:02, 11.66s/it] 79%|███████▊ | 
2178/2774 [7:09:28<1:59:42, 12.05s/it] {'loss': 0.9976, 'learning_rate': 5.815565881007481e-07, 'epoch': 0.79} 79%|███████▊ | 2178/2774 [7:09:28<1:59:42, 12.05s/it] 79%|███████▊ | 2179/2774 [7:09:40<1:58:27, 11.94s/it] {'loss': 1.0254, 'learning_rate': 5.796858003469727e-07, 'epoch': 0.79} 79%|███████▊ | 2179/2774 [7:09:40<1:58:27, 11.94s/it] 79%|███████▊ | 2180/2774 [7:09:51<1:56:41, 11.79s/it] {'loss': 1.0244, 'learning_rate': 5.778176317857618e-07, 'epoch': 0.79} 79%|███████▊ | 2180/2774 [7:09:51<1:56:41, 11.79s/it] 79%|███████▊ | 2181/2774 [7:10:04<1:59:45, 12.12s/it] {'loss': 0.9849, 'learning_rate': 5.759520849651862e-07, 'epoch': 0.79} 79%|███████▊ | 2181/2774 [7:10:04<1:59:45, 12.12s/it] 79%|███████▊ | 2182/2774 [7:10:16<1:57:55, 11.95s/it] {'loss': 1.0073, 'learning_rate': 5.740891624297381e-07, 'epoch': 0.79} 79%|███████▊ | 2182/2774 [7:10:16<1:57:55, 11.95s/it] 79%|███████▊ | 2183/2774 [7:10:27<1:55:28, 11.72s/it] {'loss': 1.0234, 'learning_rate': 5.722288667203315e-07, 'epoch': 0.79} 79%|███████▊ | 2183/2774 [7:10:27<1:55:28, 11.72s/it] 79%|███████▊ | 2184/2774 [7:10:39<1:55:35, 11.76s/it] {'loss': 1.0645, 'learning_rate': 5.703712003742965e-07, 'epoch': 0.79} 79%|███████▊ | 2184/2774 [7:10:39<1:55:35, 11.76s/it] 79%|███████▉ | 2185/2774 [7:10:50<1:53:24, 11.55s/it] {'loss': 1.0874, 'learning_rate': 5.685161659253791e-07, 'epoch': 0.79} 79%|███████▉ | 2185/2774 [7:10:50<1:53:24, 11.55s/it] 79%|███████▉ | 2186/2774 [7:11:01<1:52:49, 11.51s/it] {'loss': 0.9956, 'learning_rate': 5.666637659037338e-07, 'epoch': 0.79} 79%|███████▉ | 2186/2774 [7:11:01<1:52:49, 11.51s/it] 79%|███████▉ | 2187/2774 [7:11:13<1:52:36, 11.51s/it] {'loss': 1.0483, 'learning_rate': 5.648140028359214e-07, 'epoch': 0.79} 79%|███████▉ | 2187/2774 [7:11:13<1:52:36, 11.51s/it] 79%|███████▉ | 2188/2774 [7:11:24<1:52:25, 11.51s/it] {'loss': 1.0186, 'learning_rate': 5.629668792449086e-07, 'epoch': 0.79} 79%|███████▉ | 2188/2774 [7:11:24<1:52:25, 11.51s/it] 79%|███████▉ | 2189/2774 
[7:11:36<1:53:17, 11.62s/it] {'loss': 0.998, 'learning_rate': 5.611223976500591e-07, 'epoch': 0.79} 79%|███████▉ | 2189/2774 [7:11:36<1:53:17, 11.62s/it] 79%|███████▉ | 2190/2774 [7:11:48<1:52:39, 11.57s/it] {'loss': 1.0479, 'learning_rate': 5.59280560567135e-07, 'epoch': 0.79} 79%|███████▉ | 2190/2774 [7:11:48<1:52:39, 11.57s/it] 79%|███████▉ | 2191/2774 [7:12:00<1:53:52, 11.72s/it] {'loss': 0.9741, 'learning_rate': 5.574413705082904e-07, 'epoch': 0.79} 79%|███████▉ | 2191/2774 [7:12:00<1:53:52, 11.72s/it] 79%|███████▉ | 2192/2774 [7:12:11<1:52:26, 11.59s/it] {'loss': 1.1064, 'learning_rate': 5.55604829982071e-07, 'epoch': 0.79} 79%|███████▉ | 2192/2774 [7:12:11<1:52:26, 11.59s/it] 79%|███████▉ | 2193/2774 [7:12:22<1:52:01, 11.57s/it] {'loss': 1.0767, 'learning_rate': 5.537709414934045e-07, 'epoch': 0.79} 79%|███████▉ | 2193/2774 [7:12:22<1:52:01, 11.57s/it] 79%|███████▉ | 2194/2774 [7:12:34<1:51:37, 11.55s/it] {'loss': 1.0308, 'learning_rate': 5.519397075436058e-07, 'epoch': 0.79} 79%|███████▉ | 2194/2774 [7:12:34<1:51:37, 11.55s/it] 79%|███████▉ | 2195/2774 [7:12:45<1:51:16, 11.53s/it] {'loss': 1.0386, 'learning_rate': 5.501111306303666e-07, 'epoch': 0.79} 79%|███████▉ | 2195/2774 [7:12:45<1:51:16, 11.53s/it] 79%|███████▉ | 2196/2774 [7:12:58<1:54:13, 11.86s/it] {'loss': 1.0103, 'learning_rate': 5.482852132477562e-07, 'epoch': 0.79} 79%|███████▉ | 2196/2774 [7:12:58<1:54:13, 11.86s/it] 79%|███████▉ | 2197/2774 [7:13:10<1:53:23, 11.79s/it] {'loss': 1.0752, 'learning_rate': 5.464619578862143e-07, 'epoch': 0.79} 79%|███████▉ | 2197/2774 [7:13:10<1:53:23, 11.79s/it] 79%|███████▉ | 2198/2774 [7:13:21<1:51:24, 11.61s/it] {'loss': 0.98, 'learning_rate': 5.446413670325529e-07, 'epoch': 0.79} 79%|███████▉ | 2198/2774 [7:13:21<1:51:24, 11.61s/it] 79%|███████▉ | 2199/2774 [7:13:32<1:51:06, 11.59s/it] {'loss': 0.999, 'learning_rate': 5.428234431699459e-07, 'epoch': 0.79} 79%|███████▉ | 2199/2774 [7:13:32<1:51:06, 11.59s/it] 79%|███████▉ | 2200/2774 [7:13:45<1:53:15, 
11.84s/it] {'loss': 1.0127, 'learning_rate': 5.410081887779334e-07, 'epoch': 0.79} 79%|███████▉ | 2200/2774 [7:13:45<1:53:15, 11.84s/it] 79%|███████▉ | 2201/2774 [7:13:56<1:50:58, 11.62s/it] {'loss': 0.9937, 'learning_rate': 5.391956063324122e-07, 'epoch': 0.79} 79%|███████▉ | 2201/2774 [7:13:56<1:50:58, 11.62s/it] 79%|███████▉ | 2202/2774 [7:14:07<1:50:28, 11.59s/it] {'loss': 1.0317, 'learning_rate': 5.373856983056347e-07, 'epoch': 0.79} 79%|███████▉ | 2202/2774 [7:14:07<1:50:28, 11.59s/it] 79%|███████▉ | 2203/2774 [7:14:19<1:51:18, 11.70s/it] {'loss': 1.0205, 'learning_rate': 5.355784671662059e-07, 'epoch': 0.79} 79%|███████▉ | 2203/2774 [7:14:19<1:51:18, 11.70s/it] 79%|███████▉ | 2204/2774 [7:14:30<1:49:29, 11.53s/it] {'loss': 1.0713, 'learning_rate': 5.337739153790813e-07, 'epoch': 0.79} 79%|███████▉ | 2204/2774 [7:14:30<1:49:29, 11.53s/it] 79%|███████▉ | 2205/2774 [7:14:42<1:50:19, 11.63s/it] {'loss': 1.0532, 'learning_rate': 5.31972045405559e-07, 'epoch': 0.79} 79%|███████▉ | 2205/2774 [7:14:42<1:50:19, 11.63s/it] 80%|███████▉ | 2206/2774 [7:14:56<1:54:50, 12.13s/it] {'loss': 0.9907, 'learning_rate': 5.301728597032821e-07, 'epoch': 0.8} 80%|███████▉ | 2206/2774 [7:14:56<1:54:50, 12.13s/it] 80%|███████▉ | 2207/2774 [7:15:07<1:53:12, 11.98s/it] {'loss': 1.0498, 'learning_rate': 5.283763607262305e-07, 'epoch': 0.8} 80%|███████▉ | 2207/2774 [7:15:07<1:53:12, 11.98s/it] 80%|███████▉ | 2208/2774 [7:15:19<1:50:58, 11.76s/it] {'loss': 0.9775, 'learning_rate': 5.265825509247199e-07, 'epoch': 0.8} 80%|███████▉ | 2208/2774 [7:15:19<1:50:58, 11.76s/it] 80%|███████▉ | 2209/2774 [7:15:30<1:49:52, 11.67s/it] {'loss': 0.9893, 'learning_rate': 5.247914327453996e-07, 'epoch': 0.8} 80%|███████▉ | 2209/2774 [7:15:30<1:49:52, 11.67s/it] 80%|███████▉ | 2210/2774 [7:15:42<1:49:52, 11.69s/it] {'loss': 1.0137, 'learning_rate': 5.23003008631246e-07, 'epoch': 0.8} 80%|███████▉ | 2210/2774 [7:15:42<1:49:52, 11.69s/it] 80%|███████▉ | 2211/2774 [7:15:53<1:49:32, 11.67s/it] {'loss': 
0.9878, 'learning_rate': 5.212172810215607e-07, 'epoch': 0.8} 80%|███████▉ | 2211/2774 [7:15:53<1:49:32, 11.67s/it] 80%|███████▉ | 2212/2774 [7:16:05<1:48:34, 11.59s/it] {'loss': 1.0649, 'learning_rate': 5.194342523519699e-07, 'epoch': 0.8} 80%|███████▉ | 2212/2774 [7:16:05<1:48:34, 11.59s/it] 80%|███████▉ | 2213/2774 [7:16:16<1:47:13, 11.47s/it] {'loss': 1.0488, 'learning_rate': 5.176539250544163e-07, 'epoch': 0.8} 80%|███████▉ | 2213/2774 [7:16:16<1:47:13, 11.47s/it] 80%|███████▉ | 2214/2774 [7:16:28<1:48:52, 11.67s/it] {'loss': 0.9814, 'learning_rate': 5.158763015571581e-07, 'epoch': 0.8} 80%|███████▉ | 2214/2774 [7:16:28<1:48:52, 11.67s/it] 80%|███████▉ | 2215/2774 [7:16:39<1:47:54, 11.58s/it] {'loss': 1.0552, 'learning_rate': 5.141013842847672e-07, 'epoch': 0.8} 80%|███████▉ | 2215/2774 [7:16:39<1:47:54, 11.58s/it] 80%|███████▉ | 2216/2774 [7:16:51<1:47:44, 11.58s/it] {'loss': 1.0396, 'learning_rate': 5.123291756581231e-07, 'epoch': 0.8} 80%|███████▉ | 2216/2774 [7:16:51<1:47:44, 11.58s/it] 80%|███████▉ | 2217/2774 [7:17:02<1:47:02, 11.53s/it] {'loss': 1.0317, 'learning_rate': 5.105596780944122e-07, 'epoch': 0.8} 80%|███████▉ | 2217/2774 [7:17:02<1:47:02, 11.53s/it] 80%|███████▉ | 2218/2774 [7:17:14<1:45:47, 11.42s/it] {'loss': 1.0396, 'learning_rate': 5.087928940071207e-07, 'epoch': 0.8} 80%|███████▉ | 2218/2774 [7:17:14<1:45:47, 11.42s/it] 80%|███████▉ | 2219/2774 [7:17:25<1:45:33, 11.41s/it] {'loss': 0.9702, 'learning_rate': 5.07028825806038e-07, 'epoch': 0.8} 80%|███████▉ | 2219/2774 [7:17:25<1:45:33, 11.41s/it] 80%|████████ | 2220/2774 [7:17:36<1:45:18, 11.40s/it] {'loss': 0.981, 'learning_rate': 5.052674758972431e-07, 'epoch': 0.8} 80%|████████ | 2220/2774 [7:17:36<1:45:18, 11.40s/it] 80%|████████ | 2221/2774 [7:17:50<1:49:54, 11.92s/it] {'loss': 0.9531, 'learning_rate': 5.035088466831134e-07, 'epoch': 0.8} 80%|████████ | 2221/2774 [7:17:50<1:49:54, 11.92s/it] 80%|████████ | 2222/2774 [7:18:01<1:48:34, 11.80s/it] {'loss': 1.0352, 'learning_rate': 
5.017529405623115e-07, 'epoch': 0.8} 80%|████████ | 2222/2774 [7:18:01<1:48:34, 11.80s/it] 80%|████████ | 2223/2774 [7:18:12<1:46:51, 11.64s/it] {'loss': 1.0737, 'learning_rate': 4.999997599297888e-07, 'epoch': 0.8} 80%|████████ | 2223/2774 [7:18:12<1:46:51, 11.64s/it] 80%|████████ | 2224/2774 [7:18:24<1:46:12, 11.59s/it] {'loss': 0.9834, 'learning_rate': 4.982493071767758e-07, 'epoch': 0.8} 80%|████████ | 2224/2774 [7:18:24<1:46:12, 11.59s/it] 80%|████████ | 2225/2774 [7:18:35<1:45:43, 11.55s/it] {'loss': 1.0239, 'learning_rate': 4.965015846907865e-07, 'epoch': 0.8} 80%|████████ | 2225/2774 [7:18:35<1:45:43, 11.55s/it] 80%|████████ | 2226/2774 [7:18:47<1:45:43, 11.58s/it] {'loss': 1.0044, 'learning_rate': 4.947565948556066e-07, 'epoch': 0.8} 80%|████████ | 2226/2774 [7:18:47<1:45:43, 11.58s/it] 80%|████████ | 2227/2774 [7:18:59<1:47:11, 11.76s/it] {'loss': 1.0078, 'learning_rate': 4.930143400512988e-07, 'epoch': 0.8} 80%|████████ | 2227/2774 [7:18:59<1:47:11, 11.76s/it] 80%|████████ | 2228/2774 [7:19:10<1:45:34, 11.60s/it] {'loss': 1.0547, 'learning_rate': 4.912748226541924e-07, 'epoch': 0.8} 80%|████████ | 2228/2774 [7:19:10<1:45:34, 11.60s/it] 80%|████████ | 2229/2774 [7:19:22<1:45:07, 11.57s/it] {'loss': 1.0571, 'learning_rate': 4.895380450368841e-07, 'epoch': 0.8} 80%|████████ | 2229/2774 [7:19:22<1:45:07, 11.57s/it] 80%|████████ | 2230/2774 [7:19:33<1:44:15, 11.50s/it] {'loss': 1.0903, 'learning_rate': 4.878040095682335e-07, 'epoch': 0.8} 80%|████████ | 2230/2774 [7:19:33<1:44:15, 11.50s/it] 80%|████████ | 2231/2774 [7:19:47<1:49:38, 12.11s/it] {'loss': 1.0127, 'learning_rate': 4.860727186133607e-07, 'epoch': 0.8} 80%|████████ | 2231/2774 [7:19:47<1:49:38, 12.11s/it] 80%|████████ | 2232/2774 [7:19:58<1:48:33, 12.02s/it] {'loss': 1.0435, 'learning_rate': 4.843441745336419e-07, 'epoch': 0.8} 80%|████████ | 2232/2774 [7:19:58<1:48:33, 12.02s/it] 80%|████████ | 2233/2774 [7:20:10<1:46:06, 11.77s/it] {'loss': 1.0547, 'learning_rate': 4.826183796867059e-07, 
'epoch': 0.8} 80%|████████ | 2233/2774 [7:20:10<1:46:06, 11.77s/it] 81%|████████ | 2234/2774 [7:20:22<1:48:19, 12.04s/it] {'loss': 1.0034, 'learning_rate': 4.80895336426434e-07, 'epoch': 0.81} 81%|████████ | 2234/2774 [7:20:22<1:48:19, 12.04s/it] 81%|████████ | 2235/2774 [7:20:33<1:45:36, 11.76s/it] {'loss': 1.0024, 'learning_rate': 4.791750471029519e-07, 'epoch': 0.81} 81%|████████ | 2235/2774 [7:20:33<1:45:36, 11.76s/it] 81%|████████ | 2236/2774 [7:20:45<1:44:30, 11.66s/it] {'loss': 1.0684, 'learning_rate': 4.774575140626317e-07, 'epoch': 0.81} 81%|████████ | 2236/2774 [7:20:45<1:44:30, 11.66s/it] 81%|████████ | 2237/2774 [7:20:56<1:43:46, 11.59s/it] {'loss': 1.0527, 'learning_rate': 4.757427396480838e-07, 'epoch': 0.81} 81%|████████ | 2237/2774 [7:20:56<1:43:46, 11.59s/it] 81%|████████ | 2238/2774 [7:21:08<1:43:09, 11.55s/it] {'loss': 1.04, 'learning_rate': 4.7403072619815696e-07, 'epoch': 0.81} 81%|████████ | 2238/2774 [7:21:08<1:43:09, 11.55s/it] 81%|████████ | 2239/2774 [7:21:19<1:42:33, 11.50s/it] {'loss': 1.0562, 'learning_rate': 4.723214760479333e-07, 'epoch': 0.81} 81%|████████ | 2239/2774 [7:21:19<1:42:33, 11.50s/it] 81%|████████ | 2240/2774 [7:21:31<1:43:03, 11.58s/it] {'loss': 1.0356, 'learning_rate': 4.7061499152872866e-07, 'epoch': 0.81} 81%|████████ | 2240/2774 [7:21:31<1:43:03, 11.58s/it] 81%|████████ | 2241/2774 [7:21:44<1:46:09, 11.95s/it] {'loss': 1.0186, 'learning_rate': 4.6891127496808295e-07, 'epoch': 0.81} 81%|████████ | 2241/2774 [7:21:44<1:46:09, 11.95s/it] 81%|████████ | 2242/2774 [7:21:55<1:45:08, 11.86s/it] {'loss': 1.0117, 'learning_rate': 4.6721032868976417e-07, 'epoch': 0.81} 81%|████████ | 2242/2774 [7:21:55<1:45:08, 11.86s/it] 81%|████████ | 2243/2774 [7:22:07<1:43:23, 11.68s/it] {'loss': 1.083, 'learning_rate': 4.6551215501375896e-07, 'epoch': 0.81} 81%|████████ | 2243/2774 [7:22:07<1:43:23, 11.68s/it] 81%|████████ | 2244/2774 [7:22:18<1:42:10, 11.57s/it] {'loss': 1.022, 'learning_rate': 4.638167562562751e-07, 'epoch': 0.81} 
81%|████████ | 2244/2774 [7:22:18<1:42:10, 11.57s/it] 81%|████████ | 2245/2774 [7:22:29<1:41:22, 11.50s/it] {'loss': 1.0049, 'learning_rate': 4.6212413472973257e-07, 'epoch': 0.81} 81%|████████ | 2245/2774 [7:22:29<1:41:22, 11.50s/it] 81%|████████ | 2246/2774 [7:22:40<1:40:10, 11.38s/it] {'loss': 0.9932, 'learning_rate': 4.6043429274276685e-07, 'epoch': 0.81} 81%|████████ | 2246/2774 [7:22:40<1:40:10, 11.38s/it] 81%|████████ | 2247/2774 [7:22:52<1:40:53, 11.49s/it] {'loss': 1.041, 'learning_rate': 4.5874723260021794e-07, 'epoch': 0.81} 81%|████████ | 2247/2774 [7:22:52<1:40:53, 11.49s/it] 81%|████████ | 2248/2774 [7:23:05<1:45:05, 11.99s/it] {'loss': 1.0034, 'learning_rate': 4.570629566031354e-07, 'epoch': 0.81} 81%|████████ | 2248/2774 [7:23:05<1:45:05, 11.99s/it] 81%|████████ | 2249/2774 [7:23:17<1:43:10, 11.79s/it] {'loss': 1.0229, 'learning_rate': 4.553814670487694e-07, 'epoch': 0.81} 81%|████████ | 2249/2774 [7:23:17<1:43:10, 11.79s/it] 81%|████████ | 2250/2774 [7:23:28<1:42:39, 11.75s/it] {'loss': 0.9512, 'learning_rate': 4.537027662305707e-07, 'epoch': 0.81} 81%|████████ | 2250/2774 [7:23:28<1:42:39, 11.75s/it] 81%|████████ | 2251/2774 [7:23:40<1:41:36, 11.66s/it] {'loss': 1.0342, 'learning_rate': 4.5202685643818495e-07, 'epoch': 0.81} 81%|████████ | 2251/2774 [7:23:40<1:41:36, 11.66s/it] 81%|████████ | 2252/2774 [7:23:51<1:40:47, 11.59s/it] {'loss': 1.0464, 'learning_rate': 4.5035373995745287e-07, 'epoch': 0.81} 81%|████████ | 2252/2774 [7:23:51<1:40:47, 11.59s/it] 81%|████████ | 2253/2774 [7:24:02<1:39:24, 11.45s/it] {'loss': 1.0439, 'learning_rate': 4.48683419070404e-07, 'epoch': 0.81} 81%|████████ | 2253/2774 [7:24:02<1:39:24, 11.45s/it] 81%|████████▏ | 2254/2774 [7:24:14<1:38:50, 11.40s/it] {'loss': 1.0693, 'learning_rate': 4.4701589605525427e-07, 'epoch': 0.81} 81%|████████▏ | 2254/2774 [7:24:14<1:38:50, 11.40s/it] 81%|████████▏ | 2255/2774 [7:24:25<1:39:26, 11.50s/it] {'loss': 1.0518, 'learning_rate': 4.4535117318640545e-07, 'epoch': 0.81} 
81%|████████▏ | 2255/2774 [7:24:25<1:39:26, 11.50s/it] 81%|████████▏ | 2256/2774 [7:24:37<1:39:12, 11.49s/it] {'loss': 1.0317, 'learning_rate': 4.4368925273443856e-07, 'epoch': 0.81} 81%|████████▏ | 2256/2774 [7:24:37<1:39:12, 11.49s/it] 81%|████████▏ | 2257/2774 [7:24:48<1:38:55, 11.48s/it] {'loss': 1.0293, 'learning_rate': 4.4203013696611203e-07, 'epoch': 0.81} 81%|████████▏ | 2257/2774 [7:24:48<1:38:55, 11.48s/it] 81%|████████▏ | 2258/2774 [7:25:00<1:38:59, 11.51s/it] {'loss': 1.0068, 'learning_rate': 4.403738281443609e-07, 'epoch': 0.81} 81%|████████▏ | 2258/2774 [7:25:00<1:38:59, 11.51s/it] 81%|████████▏ | 2259/2774 [7:25:11<1:38:57, 11.53s/it] {'loss': 1.0376, 'learning_rate': 4.3872032852828955e-07, 'epoch': 0.81} 81%|████████▏ | 2259/2774 [7:25:11<1:38:57, 11.53s/it] 81%|████████▏ | 2260/2774 [7:25:23<1:38:29, 11.50s/it] {'loss': 1.0215, 'learning_rate': 4.3706964037317085e-07, 'epoch': 0.81} 81%|████████▏ | 2260/2774 [7:25:23<1:38:29, 11.50s/it] 82%|████████▏ | 2261/2774 [7:25:35<1:41:07, 11.83s/it] {'loss': 1.0762, 'learning_rate': 4.354217659304452e-07, 'epoch': 0.82} 82%|████████▏ | 2261/2774 [7:25:35<1:41:07, 11.83s/it] 82%|████████▏ | 2262/2774 [7:25:47<1:41:30, 11.89s/it] {'loss': 1.0068, 'learning_rate': 4.3377670744771253e-07, 'epoch': 0.82} 82%|████████▏ | 2262/2774 [7:25:47<1:41:30, 11.89s/it] 82%|████████▏ | 2263/2774 [7:25:59<1:41:02, 11.86s/it] {'loss': 0.9595, 'learning_rate': 4.321344671687344e-07, 'epoch': 0.82} 82%|████████▏ | 2263/2774 [7:25:59<1:41:02, 11.86s/it] 82%|████████▏ | 2264/2774 [7:26:13<1:46:37, 12.54s/it] {'loss': 0.9985, 'learning_rate': 4.304950473334268e-07, 'epoch': 0.82} 82%|████████▏ | 2264/2774 [7:26:13<1:46:37, 12.54s/it] 82%|████████▏ | 2265/2774 [7:26:25<1:45:11, 12.40s/it] {'loss': 1.0142, 'learning_rate': 4.288584501778592e-07, 'epoch': 0.82} 82%|████████▏ | 2265/2774 [7:26:25<1:45:11, 12.40s/it] 82%|████████▏ | 2266/2774 [7:26:37<1:42:39, 12.12s/it] {'loss': 1.0469, 'learning_rate': 4.2722467793425093e-07, 
'epoch': 0.82} 82%|████████▏ | 2266/2774 [7:26:37<1:42:39, 12.12s/it] 82%|████████▏ | 2267/2774 [7:26:49<1:41:47, 12.05s/it] {'loss': 1.0059, 'learning_rate': 4.255937328309695e-07, 'epoch': 0.82} 82%|████████▏ | 2267/2774 [7:26:49<1:41:47, 12.05s/it] 82%|████████▏ | 2268/2774 [7:27:00<1:39:39, 11.82s/it] {'loss': 0.9692, 'learning_rate': 4.2396561709252436e-07, 'epoch': 0.82} 82%|████████▏ | 2268/2774 [7:27:00<1:39:39, 11.82s/it] 82%|████████▏ | 2269/2774 [7:27:12<1:38:54, 11.75s/it] {'loss': 1.0625, 'learning_rate': 4.2234033293956865e-07, 'epoch': 0.82} 82%|████████▏ | 2269/2774 [7:27:12<1:38:54, 11.75s/it] 82%|████████▏ | 2270/2774 [7:27:23<1:38:01, 11.67s/it] {'loss': 1.019, 'learning_rate': 4.2071788258889025e-07, 'epoch': 0.82} 82%|████████▏ | 2270/2774 [7:27:23<1:38:01, 11.67s/it] 82%|████████▏ | 2271/2774 [7:27:34<1:36:37, 11.53s/it] {'loss': 1.021, 'learning_rate': 4.190982682534145e-07, 'epoch': 0.82} 82%|████████▏ | 2271/2774 [7:27:34<1:36:37, 11.53s/it] 82%|████████▏ | 2272/2774 [7:27:46<1:36:22, 11.52s/it] {'loss': 0.9858, 'learning_rate': 4.174814921421963e-07, 'epoch': 0.82} 82%|████████▏ | 2272/2774 [7:27:46<1:36:22, 11.52s/it] 82%|████████▏ | 2273/2774 [7:27:58<1:38:21, 11.78s/it] {'loss': 1.0645, 'learning_rate': 4.158675564604223e-07, 'epoch': 0.82} 82%|████████▏ | 2273/2774 [7:27:58<1:38:21, 11.78s/it] 82%|████████▏ | 2274/2774 [7:28:10<1:37:18, 11.68s/it] {'loss': 1.0137, 'learning_rate': 4.142564634094021e-07, 'epoch': 0.82} 82%|████████▏ | 2274/2774 [7:28:10<1:37:18, 11.68s/it] 82%|████████▏ | 2275/2774 [7:28:23<1:40:49, 12.12s/it] {'loss': 0.9854, 'learning_rate': 4.126482151865696e-07, 'epoch': 0.82} 82%|████████▏ | 2275/2774 [7:28:23<1:40:49, 12.12s/it] 82%|████████▏ | 2276/2774 [7:28:34<1:38:22, 11.85s/it] {'loss': 1.0342, 'learning_rate': 4.1104281398547746e-07, 'epoch': 0.82} 82%|████████▏ | 2276/2774 [7:28:34<1:38:22, 11.85s/it] 82%|████████▏ | 2277/2774 [7:28:45<1:36:58, 11.71s/it] {'loss': 0.9956, 'learning_rate': 
4.094402619957974e-07, 'epoch': 0.82} 82%|████████▏ | 2277/2774 [7:28:45<1:36:58, 11.71s/it] 82%|████████▏ | 2278/2774 [7:28:57<1:36:50, 11.72s/it] {'loss': 1.0142, 'learning_rate': 4.078405614033126e-07, 'epoch': 0.82} 82%|████████▏ | 2278/2774 [7:28:57<1:36:50, 11.72s/it] 82%|████████▏ | 2279/2774 [7:29:09<1:36:15, 11.67s/it] {'loss': 0.9961, 'learning_rate': 4.062437143899176e-07, 'epoch': 0.82} 82%|████████▏ | 2279/2774 [7:29:09<1:36:15, 11.67s/it] 82%|████████▏ | 2280/2774 [7:29:20<1:35:05, 11.55s/it] {'loss': 1.0923, 'learning_rate': 4.046497231336166e-07, 'epoch': 0.82} 82%|████████▏ | 2280/2774 [7:29:20<1:35:05, 11.55s/it] 82%|████████▏ | 2281/2774 [7:29:32<1:36:02, 11.69s/it] {'loss': 1.0625, 'learning_rate': 4.0305858980851595e-07, 'epoch': 0.82} 82%|████████▏ | 2281/2774 [7:29:32<1:36:02, 11.69s/it] 82%|████████▏ | 2282/2774 [7:29:45<1:39:16, 12.11s/it] {'loss': 1.0444, 'learning_rate': 4.014703165848266e-07, 'epoch': 0.82} 82%|████████▏ | 2282/2774 [7:29:45<1:39:16, 12.11s/it] 82%|████████▏ | 2283/2774 [7:29:58<1:41:58, 12.46s/it] {'loss': 0.9663, 'learning_rate': 3.9988490562885675e-07, 'epoch': 0.82} 82%|████████▏ | 2283/2774 [7:29:58<1:41:58, 12.46s/it] 82%|████████▏ | 2284/2774 [7:30:10<1:39:19, 12.16s/it] {'loss': 0.9785, 'learning_rate': 3.983023591030113e-07, 'epoch': 0.82} 82%|████████▏ | 2284/2774 [7:30:10<1:39:19, 12.16s/it] 82%|████████▏ | 2285/2774 [7:30:22<1:39:50, 12.25s/it] {'loss': 1.0137, 'learning_rate': 3.9672267916578743e-07, 'epoch': 0.82} 82%|████████▏ | 2285/2774 [7:30:22<1:39:50, 12.25s/it] 82%|████████▏ | 2286/2774 [7:30:34<1:37:40, 12.01s/it] {'loss': 1.0454, 'learning_rate': 3.951458679717743e-07, 'epoch': 0.82} 82%|████████▏ | 2286/2774 [7:30:34<1:37:40, 12.01s/it] 82%|████████▏ | 2287/2774 [7:30:45<1:36:44, 11.92s/it] {'loss': 0.999, 'learning_rate': 3.935719276716457e-07, 'epoch': 0.82} 82%|████████▏ | 2287/2774 [7:30:45<1:36:44, 11.92s/it] 82%|████████▏ | 2288/2774 [7:30:57<1:35:42, 11.82s/it] {'loss': 1.0127, 
'learning_rate': 3.920008604121628e-07, 'epoch': 0.82} 82%|████████▏ | 2288/2774 [7:30:57<1:35:42, 11.82s/it] 83%|████████▎ | 2289/2774 [7:31:08<1:34:09, 11.65s/it] {'loss': 1.0488, 'learning_rate': 3.904326683361648e-07, 'epoch': 0.83} 83%|████████▎ | 2289/2774 [7:31:08<1:34:09, 11.65s/it] 83%|████████▎ | 2290/2774 [7:31:20<1:34:47, 11.75s/it] {'loss': 1.0327, 'learning_rate': 3.888673535825727e-07, 'epoch': 0.83} 83%|████████▎ | 2290/2774 [7:31:20<1:34:47, 11.75s/it] 83%|████████▎ | 2291/2774 [7:31:33<1:37:47, 12.15s/it] {'loss': 1.0322, 'learning_rate': 3.8730491828637944e-07, 'epoch': 0.83} 83%|████████▎ | 2291/2774 [7:31:33<1:37:47, 12.15s/it] 83%|████████▎ | 2292/2774 [7:31:45<1:36:47, 12.05s/it] {'loss': 1.0415, 'learning_rate': 3.8574536457865436e-07, 'epoch': 0.83} 83%|████████▎ | 2292/2774 [7:31:45<1:36:47, 12.05s/it] 83%|████████▎ | 2293/2774 [7:31:56<1:34:58, 11.85s/it] {'loss': 1.0576, 'learning_rate': 3.841886945865325e-07, 'epoch': 0.83} 83%|████████▎ | 2293/2774 [7:31:56<1:34:58, 11.85s/it] 83%|████████▎ | 2294/2774 [7:32:08<1:34:04, 11.76s/it] {'loss': 1.0234, 'learning_rate': 3.8263491043321887e-07, 'epoch': 0.83} 83%|████████▎ | 2294/2774 [7:32:08<1:34:04, 11.76s/it] 83%|████████▎ | 2295/2774 [7:32:20<1:33:27, 11.71s/it] {'loss': 1.0195, 'learning_rate': 3.810840142379807e-07, 'epoch': 0.83} 83%|████████▎ | 2295/2774 [7:32:20<1:33:27, 11.71s/it] 83%|████████▎ | 2296/2774 [7:32:31<1:33:09, 11.69s/it] {'loss': 1.0356, 'learning_rate': 3.7953600811614727e-07, 'epoch': 0.83} 83%|████████▎ | 2296/2774 [7:32:31<1:33:09, 11.69s/it] 83%|████████▎ | 2297/2774 [7:32:43<1:33:05, 11.71s/it] {'loss': 1.0522, 'learning_rate': 3.7799089417910467e-07, 'epoch': 0.83} 83%|████████▎ | 2297/2774 [7:32:43<1:33:05, 11.71s/it] 83%|████████▎ | 2298/2774 [7:32:55<1:32:28, 11.66s/it] {'loss': 1.0293, 'learning_rate': 3.7644867453429575e-07, 'epoch': 0.83} 83%|████████▎ | 2298/2774 [7:32:55<1:32:28, 11.66s/it] 83%|████████▎ | 2299/2774 [7:33:06<1:32:02, 11.63s/it] {'loss': 
1.0166, 'learning_rate': 3.749093512852148e-07, 'epoch': 0.83} 83%|████████▎ | 2299/2774 [7:33:06<1:32:02, 11.63s/it] 83%|████████▎ | 2300/2774 [7:33:18<1:31:50, 11.63s/it] {'loss': 1.0107, 'learning_rate': 3.7337292653140485e-07, 'epoch': 0.83} 83%|████████▎ | 2300/2774 [7:33:18<1:31:50, 11.63s/it] 83%|████████▎ | 2301/2774 [7:33:29<1:31:31, 11.61s/it] {'loss': 1.0073, 'learning_rate': 3.7183940236845767e-07, 'epoch': 0.83} 83%|████████▎ | 2301/2774 [7:33:29<1:31:31, 11.61s/it] 83%|████████▎ | 2302/2774 [7:33:41<1:32:01, 11.70s/it] {'loss': 0.9937, 'learning_rate': 3.703087808880071e-07, 'epoch': 0.83} 83%|████████▎ | 2302/2774 [7:33:41<1:32:01, 11.70s/it] 83%|████████▎ | 2303/2774 [7:33:53<1:31:13, 11.62s/it] {'loss': 1.0513, 'learning_rate': 3.6878106417772757e-07, 'epoch': 0.83} 83%|████████▎ | 2303/2774 [7:33:53<1:31:13, 11.62s/it] 83%|████████▎ | 2304/2774 [7:34:06<1:35:15, 12.16s/it] {'loss': 0.9731, 'learning_rate': 3.6725625432133374e-07, 'epoch': 0.83} 83%|████████▎ | 2304/2774 [7:34:06<1:35:15, 12.16s/it] 83%|████████▎ | 2305/2774 [7:34:18<1:34:24, 12.08s/it] {'loss': 0.9956, 'learning_rate': 3.6573435339857384e-07, 'epoch': 0.83} 83%|████████▎ | 2305/2774 [7:34:18<1:34:24, 12.08s/it] 83%|████████▎ | 2306/2774 [7:34:30<1:32:56, 11.92s/it] {'loss': 1.0107, 'learning_rate': 3.6421536348522746e-07, 'epoch': 0.83} 83%|████████▎ | 2306/2774 [7:34:30<1:32:56, 11.92s/it] 83%|████████▎ | 2307/2774 [7:34:41<1:31:41, 11.78s/it] {'loss': 1.0029, 'learning_rate': 3.6269928665310707e-07, 'epoch': 0.83} 83%|████████▎ | 2307/2774 [7:34:41<1:31:41, 11.78s/it] 83%|████████▎ | 2308/2774 [7:34:53<1:32:30, 11.91s/it] {'loss': 1.0669, 'learning_rate': 3.611861249700482e-07, 'epoch': 0.83} 83%|████████▎ | 2308/2774 [7:34:53<1:32:30, 11.91s/it] 83%|████████▎ | 2309/2774 [7:35:05<1:31:07, 11.76s/it] {'loss': 1.0469, 'learning_rate': 3.5967588049991317e-07, 'epoch': 0.83} 83%|████████▎ | 2309/2774 [7:35:05<1:31:07, 11.76s/it] 83%|████████▎ | 2310/2774 [7:35:16<1:30:21, 
11.68s/it] {'loss': 1.0176, 'learning_rate': 3.5816855530258376e-07, 'epoch': 0.83} 83%|████████▎ | 2310/2774 [7:35:16<1:30:21, 11.68s/it] 83%|████████▎ | 2311/2774 [7:35:27<1:29:24, 11.59s/it] {'loss': 1.0029, 'learning_rate': 3.5666415143396054e-07, 'epoch': 0.83} 83%|████████▎ | 2311/2774 [7:35:27<1:29:24, 11.59s/it] 83%|████████▎ | 2312/2774 [7:35:39<1:29:58, 11.69s/it] {'loss': 1.0469, 'learning_rate': 3.551626709459588e-07, 'epoch': 0.83} 83%|████████▎ | 2312/2774 [7:35:39<1:29:58, 11.69s/it] 83%|████████▎ | 2313/2774 [7:35:52<1:32:23, 12.03s/it] {'loss': 1.0259, 'learning_rate': 3.5366411588650866e-07, 'epoch': 0.83} 83%|████████▎ | 2313/2774 [7:35:52<1:32:23, 12.03s/it] 83%|████████▎ | 2314/2774 [7:36:04<1:32:13, 12.03s/it] {'loss': 0.9956, 'learning_rate': 3.5216848829954714e-07, 'epoch': 0.83} 83%|████████▎ | 2314/2774 [7:36:04<1:32:13, 12.03s/it] 83%|████████▎ | 2315/2774 [7:36:16<1:30:23, 11.82s/it] {'loss': 1.0425, 'learning_rate': 3.50675790225021e-07, 'epoch': 0.83} 83%|████████▎ | 2315/2774 [7:36:16<1:30:23, 11.82s/it] 83%|████████▎ | 2316/2774 [7:36:27<1:28:51, 11.64s/it] {'loss': 1.0049, 'learning_rate': 3.491860236988798e-07, 'epoch': 0.83} 83%|████████▎ | 2316/2774 [7:36:27<1:28:51, 11.64s/it] 84%|████████▎ | 2317/2774 [7:36:41<1:35:16, 12.51s/it] {'loss': 0.9854, 'learning_rate': 3.476991907530755e-07, 'epoch': 0.84} 84%|████████▎ | 2317/2774 [7:36:41<1:35:16, 12.51s/it] 84%|████████▎ | 2318/2774 [7:36:53<1:33:43, 12.33s/it] {'loss': 1.0923, 'learning_rate': 3.4621529341555745e-07, 'epoch': 0.84} 84%|████████▎ | 2318/2774 [7:36:53<1:33:43, 12.33s/it] 84%|████████▎ | 2319/2774 [7:37:05<1:31:44, 12.10s/it] {'loss': 1.021, 'learning_rate': 3.4473433371027406e-07, 'epoch': 0.84} 84%|████████▎ | 2319/2774 [7:37:05<1:31:44, 12.10s/it] 84%|████████▎ | 2320/2774 [7:37:16<1:29:44, 11.86s/it] {'loss': 1.0493, 'learning_rate': 3.432563136571621e-07, 'epoch': 0.84} 84%|████████▎ | 2320/2774 [7:37:16<1:29:44, 11.86s/it] 84%|████████▎ | 2321/2774 
[7:37:28<1:28:40, 11.74s/it] {'loss': 0.98, 'learning_rate': 3.417812352721536e-07, 'epoch': 0.84} 84%|████████▎ | 2321/2774 [7:37:28<1:28:40, 11.74s/it] 84%|████████▎ | 2322/2774 [7:37:39<1:28:44, 11.78s/it] {'loss': 0.9312, 'learning_rate': 3.403091005671655e-07, 'epoch': 0.84} 84%|████████▎ | 2322/2774 [7:37:39<1:28:44, 11.78s/it] 84%|████████▎ | 2323/2774 [7:37:52<1:29:25, 11.90s/it] {'loss': 1.0015, 'learning_rate': 3.388399115501012e-07, 'epoch': 0.84} 84%|████████▎ | 2323/2774 [7:37:52<1:29:25, 11.90s/it] 84%|████████▍ | 2324/2774 [7:38:03<1:27:37, 11.68s/it] {'loss': 1.0752, 'learning_rate': 3.373736702248451e-07, 'epoch': 0.84} 84%|████████▍ | 2324/2774 [7:38:03<1:27:37, 11.68s/it] 84%|████████▍ | 2325/2774 [7:38:15<1:27:53, 11.75s/it] {'loss': 0.9497, 'learning_rate': 3.3591037859126266e-07, 'epoch': 0.84} 84%|████████▍ | 2325/2774 [7:38:15<1:27:53, 11.75s/it] 84%|████████▍ | 2326/2774 [7:38:26<1:27:18, 11.69s/it] {'loss': 1.062, 'learning_rate': 3.3445003864519486e-07, 'epoch': 0.84} 84%|████████▍ | 2326/2774 [7:38:26<1:27:18, 11.69s/it] 84%|████████▍ | 2327/2774 [7:38:40<1:32:34, 12.43s/it] {'loss': 1.019, 'learning_rate': 3.329926523784563e-07, 'epoch': 0.84} 84%|████████▍ | 2327/2774 [7:38:40<1:32:34, 12.43s/it] 84%|████████▍ | 2328/2774 [7:38:52<1:30:15, 12.14s/it] {'loss': 1.0464, 'learning_rate': 3.3153822177883543e-07, 'epoch': 0.84} 84%|████████▍ | 2328/2774 [7:38:52<1:30:15, 12.14s/it] 84%|████████▍ | 2329/2774 [7:39:03<1:28:42, 11.96s/it] {'loss': 1.022, 'learning_rate': 3.3008674883008686e-07, 'epoch': 0.84} 84%|████████▍ | 2329/2774 [7:39:03<1:28:42, 11.96s/it] 84%|████████▍ | 2330/2774 [7:39:15<1:27:03, 11.77s/it] {'loss': 1.0703, 'learning_rate': 3.286382355119319e-07, 'epoch': 0.84} 84%|████████▍ | 2330/2774 [7:39:15<1:27:03, 11.77s/it] 84%|████████▍ | 2331/2774 [7:39:26<1:26:14, 11.68s/it] {'loss': 1.0474, 'learning_rate': 3.2719268380005496e-07, 'epoch': 0.84} 84%|████████▍ | 2331/2774 [7:39:26<1:26:14, 11.68s/it] 84%|████████▍ | 
2332/2774 [7:39:38<1:25:18, 11.58s/it] {'loss': 1.0332, 'learning_rate': 3.2575009566610193e-07, 'epoch': 0.84} 84%|████████▍ | 2332/2774 [7:39:38<1:25:18, 11.58s/it] 84%|████████▍ | 2333/2774 [7:39:49<1:24:53, 11.55s/it] {'loss': 1.0713, 'learning_rate': 3.243104730776753e-07, 'epoch': 0.84} 84%|████████▍ | 2333/2774 [7:39:49<1:24:53, 11.55s/it] 84%|████████▍ | 2334/2774 [7:40:00<1:23:33, 11.39s/it] {'loss': 1.0474, 'learning_rate': 3.2287381799833427e-07, 'epoch': 0.84} 84%|████████▍ | 2334/2774 [7:40:00<1:23:33, 11.39s/it] 84%|████████▍ | 2335/2774 [7:40:12<1:25:16, 11.65s/it] {'loss': 1.0483, 'learning_rate': 3.214401323875882e-07, 'epoch': 0.84} 84%|████████▍ | 2335/2774 [7:40:12<1:25:16, 11.65s/it] 84%|████████▍ | 2336/2774 [7:40:25<1:27:56, 12.05s/it] {'loss': 1.0532, 'learning_rate': 3.2000941820089893e-07, 'epoch': 0.84} 84%|████████▍ | 2336/2774 [7:40:25<1:27:56, 12.05s/it] 84%|████████▍ | 2337/2774 [7:40:36<1:25:49, 11.78s/it] {'loss': 1.0405, 'learning_rate': 3.1858167738967383e-07, 'epoch': 0.84} 84%|████████▍ | 2337/2774 [7:40:36<1:25:49, 11.78s/it] 84%|████████▍ | 2338/2774 [7:40:48<1:24:50, 11.68s/it] {'loss': 1.0117, 'learning_rate': 3.171569119012649e-07, 'epoch': 0.84} 84%|████████▍ | 2338/2774 [7:40:48<1:24:50, 11.68s/it] 84%|████████▍ | 2339/2774 [7:40:59<1:24:19, 11.63s/it] {'loss': 1.0142, 'learning_rate': 3.1573512367896545e-07, 'epoch': 0.84} 84%|████████▍ | 2339/2774 [7:40:59<1:24:19, 11.63s/it] 84%|████████▍ | 2340/2774 [7:41:11<1:24:41, 11.71s/it] {'loss': 1.0063, 'learning_rate': 3.143163146620104e-07, 'epoch': 0.84} 84%|████████▍ | 2340/2774 [7:41:11<1:24:41, 11.71s/it] 84%|████████▍ | 2341/2774 [7:41:25<1:28:09, 12.22s/it] {'loss': 1.0674, 'learning_rate': 3.1290048678556786e-07, 'epoch': 0.84} 84%|████████▍ | 2341/2774 [7:41:25<1:28:09, 12.22s/it] 84%|████████▍ | 2342/2774 [7:41:36<1:26:14, 11.98s/it] {'loss': 1.0239, 'learning_rate': 3.1148764198074304e-07, 'epoch': 0.84} 84%|████████▍ | 2342/2774 [7:41:36<1:26:14, 11.98s/it] 
84%|████████▍ | 2343/2774 [7:41:47<1:24:04, 11.70s/it] {'loss': 1.0122, 'learning_rate': 3.1007778217456956e-07, 'epoch': 0.84} 84%|████████▍ | 2343/2774 [7:41:47<1:24:04, 11.70s/it] 84%|████████▍ | 2344/2774 [7:41:59<1:23:13, 11.61s/it] {'loss': 1.0352, 'learning_rate': 3.08670909290012e-07, 'epoch': 0.84} 84%|████████▍ | 2344/2774 [7:41:59<1:23:13, 11.61s/it] 85%|████████▍ | 2345/2774 [7:42:10<1:22:22, 11.52s/it] {'loss': 0.9897, 'learning_rate': 3.0726702524596003e-07, 'epoch': 0.85} 85%|████████▍ | 2345/2774 [7:42:10<1:22:22, 11.52s/it] 85%|████████▍ | 2346/2774 [7:42:21<1:21:44, 11.46s/it] {'loss': 0.9932, 'learning_rate': 3.058661319572259e-07, 'epoch': 0.85} 85%|████████▍ | 2346/2774 [7:42:21<1:21:44, 11.46s/it] 85%|████████▍ | 2347/2774 [7:42:35<1:26:27, 12.15s/it] {'loss': 0.9722, 'learning_rate': 3.0446823133454346e-07, 'epoch': 0.85} 85%|████████▍ | 2347/2774 [7:42:35<1:26:27, 12.15s/it] 85%|████████▍ | 2348/2774 [7:42:46<1:24:08, 11.85s/it] {'loss': 1.0273, 'learning_rate': 3.0307332528456577e-07, 'epoch': 0.85} 85%|████████▍ | 2348/2774 [7:42:46<1:24:08, 11.85s/it] 85%|████████▍ | 2349/2774 [7:42:57<1:22:55, 11.71s/it] {'loss': 1.0522, 'learning_rate': 3.016814157098588e-07, 'epoch': 0.85} 85%|████████▍ | 2349/2774 [7:42:57<1:22:55, 11.71s/it] 85%|████████▍ | 2350/2774 [7:43:09<1:22:34, 11.68s/it] {'loss': 1.0474, 'learning_rate': 3.00292504508905e-07, 'epoch': 0.85} 85%|████████▍ | 2350/2774 [7:43:09<1:22:34, 11.68s/it] 85%|████████▍ | 2351/2774 [7:43:21<1:22:03, 11.64s/it] {'loss': 1.001, 'learning_rate': 2.989065935760943e-07, 'epoch': 0.85} 85%|████████▍ | 2351/2774 [7:43:21<1:22:03, 11.64s/it] 85%|████████▍ | 2352/2774 [7:43:32<1:21:28, 11.58s/it] {'loss': 1.0259, 'learning_rate': 2.975236848017249e-07, 'epoch': 0.85} 85%|████████▍ | 2352/2774 [7:43:32<1:21:28, 11.58s/it] 85%|████████▍ | 2353/2774 [7:43:44<1:21:18, 11.59s/it] {'loss': 1.0107, 'learning_rate': 2.961437800720021e-07, 'epoch': 0.85} 85%|████████▍ | 2353/2774 [7:43:44<1:21:18, 
11.59s/it] 85%|████████▍ | 2354/2774 [7:43:55<1:20:30, 11.50s/it] {'loss': 1.0146, 'learning_rate': 2.947668812690316e-07, 'epoch': 0.85} 85%|████████▍ | 2354/2774 [7:43:55<1:20:30, 11.50s/it] 85%|████████▍ | 2355/2774 [7:44:07<1:21:51, 11.72s/it] {'loss': 1.0132, 'learning_rate': 2.933929902708213e-07, 'epoch': 0.85} 85%|████████▍ | 2355/2774 [7:44:07<1:21:51, 11.72s/it] 85%|████████▍ | 2356/2774 [7:44:18<1:20:29, 11.55s/it] {'loss': 0.979, 'learning_rate': 2.9202210895127424e-07, 'epoch': 0.85} 85%|████████▍ | 2356/2774 [7:44:18<1:20:29, 11.55s/it] 85%|████████▍ | 2357/2774 [7:44:30<1:19:25, 11.43s/it] {'loss': 1.0161, 'learning_rate': 2.906542391801906e-07, 'epoch': 0.85} 85%|████████▍ | 2357/2774 [7:44:30<1:19:25, 11.43s/it] 85%|████████▌ | 2358/2774 [7:44:41<1:18:38, 11.34s/it] {'loss': 1.0283, 'learning_rate': 2.8928938282326123e-07, 'epoch': 0.85} 85%|████████▌ | 2358/2774 [7:44:41<1:18:38, 11.34s/it] 85%|████████▌ | 2359/2774 [7:44:52<1:19:02, 11.43s/it] {'loss': 1.0537, 'learning_rate': 2.8792754174206903e-07, 'epoch': 0.85} 85%|████████▌ | 2359/2774 [7:44:52<1:19:02, 11.43s/it] 85%|████████▌ | 2360/2774 [7:45:04<1:19:14, 11.48s/it] {'loss': 0.9868, 'learning_rate': 2.865687177940818e-07, 'epoch': 0.85} 85%|████████▌ | 2360/2774 [7:45:04<1:19:14, 11.48s/it] 85%|████████▌ | 2361/2774 [7:45:16<1:19:20, 11.53s/it] {'loss': 1.0591, 'learning_rate': 2.8521291283265417e-07, 'epoch': 0.85} 85%|████████▌ | 2361/2774 [7:45:16<1:19:20, 11.53s/it] 85%|████████▌ | 2362/2774 [7:45:27<1:19:33, 11.59s/it] {'loss': 1.0723, 'learning_rate': 2.838601287070214e-07, 'epoch': 0.85} 85%|████████▌ | 2362/2774 [7:45:27<1:19:33, 11.59s/it] 85%|████████▌ | 2363/2774 [7:45:38<1:18:30, 11.46s/it] {'loss': 1.0576, 'learning_rate': 2.825103672623003e-07, 'epoch': 0.85} 85%|████████▌ | 2363/2774 [7:45:38<1:18:30, 11.46s/it] 85%|████████▌ | 2364/2774 [7:45:50<1:17:45, 11.38s/it] {'loss': 1.0005, 'learning_rate': 2.811636303394835e-07, 'epoch': 0.85} 85%|████████▌ | 2364/2774 
[7:45:50<1:17:45, 11.38s/it] 85%|████████▌ | 2365/2774 [7:46:01<1:18:19, 11.49s/it] {'loss': 1.0786, 'learning_rate': 2.7981991977543865e-07, 'epoch': 0.85} 85%|████████▌ | 2365/2774 [7:46:01<1:18:19, 11.49s/it] 85%|████████▌ | 2366/2774 [7:46:13<1:17:57, 11.46s/it] {'loss': 1.0723, 'learning_rate': 2.784792374029055e-07, 'epoch': 0.85} 85%|████████▌ | 2366/2774 [7:46:13<1:17:57, 11.46s/it] 85%|████████▌ | 2367/2774 [7:46:27<1:24:02, 12.39s/it] {'loss': 0.9805, 'learning_rate': 2.7714158505049437e-07, 'epoch': 0.85} 85%|████████▌ | 2367/2774 [7:46:27<1:24:02, 12.39s/it] 85%|████████▌ | 2368/2774 [7:46:39<1:21:42, 12.07s/it] {'loss': 0.9829, 'learning_rate': 2.758069645426817e-07, 'epoch': 0.85} 85%|████████▌ | 2368/2774 [7:46:39<1:21:42, 12.07s/it] 85%|████████▌ | 2369/2774 [7:46:50<1:20:05, 11.87s/it] {'loss': 0.9932, 'learning_rate': 2.744753776998102e-07, 'epoch': 0.85} 85%|████████▌ | 2369/2774 [7:46:50<1:20:05, 11.87s/it] 85%|████████▌ | 2370/2774 [7:47:01<1:18:39, 11.68s/it] {'loss': 1.0312, 'learning_rate': 2.731468263380827e-07, 'epoch': 0.85} 85%|████████▌ | 2370/2774 [7:47:01<1:18:39, 11.68s/it] 85%|████████▌ | 2371/2774 [7:47:13<1:17:38, 11.56s/it] {'loss': 1.042, 'learning_rate': 2.7182131226956427e-07, 'epoch': 0.85} 85%|████████▌ | 2371/2774 [7:47:13<1:17:38, 11.56s/it] 86%|████████▌ | 2372/2774 [7:47:24<1:17:45, 11.61s/it] {'loss': 1.0874, 'learning_rate': 2.7049883730217526e-07, 'epoch': 0.86} 86%|████████▌ | 2372/2774 [7:47:24<1:17:45, 11.61s/it] 86%|████████▌ | 2373/2774 [7:47:36<1:17:15, 11.56s/it] {'loss': 1.0752, 'learning_rate': 2.691794032396916e-07, 'epoch': 0.86} 86%|████████▌ | 2373/2774 [7:47:36<1:17:15, 11.56s/it] 86%|████████▌ | 2374/2774 [7:47:48<1:17:56, 11.69s/it] {'loss': 0.9507, 'learning_rate': 2.678630118817413e-07, 'epoch': 0.86} 86%|████████▌ | 2374/2774 [7:47:48<1:17:56, 11.69s/it] 86%|████████▌ | 2375/2774 [7:47:59<1:17:16, 11.62s/it] {'loss': 1.0298, 'learning_rate': 2.6654966502380365e-07, 'epoch': 0.86} 86%|████████▌ | 
2375/2774 [7:47:59<1:17:16, 11.62s/it] 86%|████████▌ | 2376/2774 [7:48:11<1:16:45, 11.57s/it] {'loss': 1.0508, 'learning_rate': 2.6523936445720407e-07, 'epoch': 0.86} 86%|████████▌ | 2376/2774 [7:48:11<1:16:45, 11.57s/it] 86%|████████▌ | 2377/2774 [7:48:25<1:22:00, 12.39s/it] {'loss': 0.9697, 'learning_rate': 2.6393211196911267e-07, 'epoch': 0.86} 86%|████████▌ | 2377/2774 [7:48:25<1:22:00, 12.39s/it] 86%|████████▌ | 2378/2774 [7:48:36<1:19:39, 12.07s/it] {'loss': 0.9995, 'learning_rate': 2.626279093425438e-07, 'epoch': 0.86} 86%|████████▌ | 2378/2774 [7:48:36<1:19:39, 12.07s/it] 86%|████████▌ | 2379/2774 [7:48:48<1:18:04, 11.86s/it] {'loss': 1.0083, 'learning_rate': 2.6132675835635e-07, 'epoch': 0.86} 86%|████████▌ | 2379/2774 [7:48:48<1:18:04, 11.86s/it] 86%|████████▌ | 2380/2774 [7:48:59<1:16:50, 11.70s/it] {'loss': 1.0146, 'learning_rate': 2.6002866078522425e-07, 'epoch': 0.86} 86%|████████▌ | 2380/2774 [7:48:59<1:16:50, 11.70s/it] 86%|████████▌ | 2381/2774 [7:49:10<1:16:01, 11.61s/it] {'loss': 1.0605, 'learning_rate': 2.587336183996914e-07, 'epoch': 0.86} 86%|████████▌ | 2381/2774 [7:49:10<1:16:01, 11.61s/it] 86%|████████▌ | 2382/2774 [7:49:22<1:15:20, 11.53s/it] {'loss': 1.0439, 'learning_rate': 2.5744163296611307e-07, 'epoch': 0.86} 86%|████████▌ | 2382/2774 [7:49:22<1:15:20, 11.53s/it] 86%|████████▌ | 2383/2774 [7:49:33<1:15:18, 11.56s/it] {'loss': 1.0088, 'learning_rate': 2.5615270624667706e-07, 'epoch': 0.86} 86%|████████▌ | 2383/2774 [7:49:33<1:15:18, 11.56s/it] 86%|████████▌ | 2384/2774 [7:49:45<1:15:29, 11.61s/it] {'loss': 1.0249, 'learning_rate': 2.5486683999940335e-07, 'epoch': 0.86} 86%|████████▌ | 2384/2774 [7:49:45<1:15:29, 11.61s/it] 86%|████████▌ | 2385/2774 [7:49:57<1:15:59, 11.72s/it] {'loss': 1.02, 'learning_rate': 2.5358403597813443e-07, 'epoch': 0.86} 86%|████████▌ | 2385/2774 [7:49:57<1:15:59, 11.72s/it] 86%|████████▌ | 2386/2774 [7:50:08<1:15:12, 11.63s/it] {'loss': 1.0566, 'learning_rate': 2.5230429593253893e-07, 'epoch': 0.86} 
86%|████████▌ | 2386/2774 [7:50:08<1:15:12, 11.63s/it] 86%|████████▌ | 2387/2774 [7:50:22<1:18:02, 12.10s/it] {'loss': 1.0205, 'learning_rate': 2.510276216081037e-07, 'epoch': 0.86} 86%|████████▌ | 2387/2774 [7:50:22<1:18:02, 12.10s/it] 86%|████████▌ | 2388/2774 [7:50:33<1:16:27, 11.88s/it] {'loss': 1.0513, 'learning_rate': 2.497540147461361e-07, 'epoch': 0.86} 86%|████████▌ | 2388/2774 [7:50:33<1:16:27, 11.88s/it] 86%|████████▌ | 2389/2774 [7:50:44<1:15:05, 11.70s/it] {'loss': 0.9673, 'learning_rate': 2.484834770837585e-07, 'epoch': 0.86} 86%|████████▌ | 2389/2774 [7:50:44<1:15:05, 11.70s/it] 86%|████████▌ | 2390/2774 [7:50:57<1:16:29, 11.95s/it] {'loss': 0.9937, 'learning_rate': 2.472160103539084e-07, 'epoch': 0.86} 86%|████████▌ | 2390/2774 [7:50:57<1:16:29, 11.95s/it] 86%|████████▌ | 2391/2774 [7:51:10<1:18:05, 12.23s/it] {'loss': 1.0112, 'learning_rate': 2.4595161628533315e-07, 'epoch': 0.86} 86%|████████▌ | 2391/2774 [7:51:10<1:18:05, 12.23s/it] 86%|████████▌ | 2392/2774 [7:51:22<1:17:55, 12.24s/it] {'loss': 0.9502, 'learning_rate': 2.446902966025902e-07, 'epoch': 0.86} 86%|████████▌ | 2392/2774 [7:51:22<1:17:55, 12.24s/it] 86%|████████▋ | 2393/2774 [7:51:34<1:16:39, 12.07s/it] {'loss': 1.085, 'learning_rate': 2.4343205302604254e-07, 'epoch': 0.86} 86%|████████▋ | 2393/2774 [7:51:34<1:16:39, 12.07s/it] 86%|████████▋ | 2394/2774 [7:51:45<1:15:25, 11.91s/it] {'loss': 1.0283, 'learning_rate': 2.421768872718594e-07, 'epoch': 0.86} 86%|████████▋ | 2394/2774 [7:51:45<1:15:25, 11.91s/it] 86%|████████▋ | 2395/2774 [7:51:56<1:13:59, 11.71s/it] {'loss': 1.0137, 'learning_rate': 2.4092480105201043e-07, 'epoch': 0.86} 86%|████████▋ | 2395/2774 [7:51:56<1:13:59, 11.71s/it] 86%|████████▋ | 2396/2774 [7:52:08<1:12:58, 11.58s/it] {'loss': 1.0186, 'learning_rate': 2.396757960742663e-07, 'epoch': 0.86} 86%|████████▋ | 2396/2774 [7:52:08<1:12:58, 11.58s/it] 86%|████████▋ | 2397/2774 [7:52:19<1:13:02, 11.63s/it] {'loss': 1.0522, 'learning_rate': 2.384298740421939e-07, 'epoch': 
0.86} 86%|████████▋ | 2397/2774 [7:52:19<1:13:02, 11.63s/it] 86%|████████▋ | 2398/2774 [7:52:31<1:13:09, 11.67s/it] {'loss': 1.0171, 'learning_rate': 2.3718703665515515e-07, 'epoch': 0.86} 86%|████████▋ | 2398/2774 [7:52:31<1:13:09, 11.67s/it] 86%|████████▋ | 2399/2774 [7:52:42<1:11:59, 11.52s/it] {'loss': 1.0117, 'learning_rate': 2.3594728560830615e-07, 'epoch': 0.86} 86%|████████▋ | 2399/2774 [7:52:42<1:11:59, 11.52s/it] 87%|████████▋ | 2400/2774 [7:52:54<1:11:20, 11.45s/it] {'loss': 1.0171, 'learning_rate': 2.3471062259259187e-07, 'epoch': 0.87} 87%|████████▋ | 2400/2774 [7:52:54<1:11:20, 11.45s/it]/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. 
Gradients will be None warnings.warn( 87%|████████▋ | 2401/2774 [7:53:32<2:00:44, 19.42s/it] {'loss': 0.979, 'learning_rate': 2.3347704929474606e-07, 'epoch': 0.87} 87%|████████▋ | 2401/2774 [7:53:32<2:00:44, 19.42s/it] 87%|████████▋ | 2402/2774 [7:53:43<1:45:52, 17.08s/it] {'loss': 0.9883, 'learning_rate': 2.3224656739728761e-07, 'epoch': 0.87} 87%|████████▋ | 2402/2774 [7:53:43<1:45:52, 17.08s/it] 87%|████████▋ | 2403/2774 [7:53:55<1:35:49, 15.50s/it] {'loss': 1.0498, 'learning_rate': 2.310191785785207e-07, 'epoch': 0.87} 87%|████████▋ | 2403/2774 [7:53:55<1:35:49, 15.50s/it] 87%|████████▋ | 2404/2774 [7:54:08<1:29:57, 14.59s/it] {'loss': 0.9873, 'learning_rate': 2.2979488451252834e-07, 'epoch': 0.87} 87%|████████▋ | 2404/2774 [7:54:08<1:29:57, 14.59s/it] 87%|████████▋ | 2405/2774 [7:54:19<1:24:01, 13.66s/it] {'loss': 0.9927, 'learning_rate': 2.285736868691746e-07, 'epoch': 0.87} 87%|████████▋ | 2405/2774 [7:54:19<1:24:01, 13.66s/it] 87%|████████▋ | 2406/2774 [7:54:31<1:19:38, 12.99s/it] {'loss': 1.0112, 'learning_rate': 2.273555873140984e-07, 'epoch': 0.87} 87%|████████▋ | 2406/2774 [7:54:31<1:19:38, 12.99s/it] 87%|████████▋ | 2407/2774 [7:54:42<1:16:30, 12.51s/it] {'loss': 1.0156, 'learning_rate': 2.26140587508715e-07, 'epoch': 0.87} 87%|████████▋ | 2407/2774 [7:54:42<1:16:30, 12.51s/it] 87%|████████▋ | 2408/2774 [7:54:53<1:14:31, 12.22s/it] {'loss': 0.9893, 'learning_rate': 2.249286891102098e-07, 'epoch': 0.87} 87%|████████▋ | 2408/2774 [7:54:53<1:14:31, 12.22s/it] 87%|████████▋ | 2409/2774 [7:55:05<1:12:40, 11.95s/it] {'loss': 0.9448, 'learning_rate': 2.2371989377154013e-07, 'epoch': 0.87} 87%|████████▋ | 2409/2774 [7:55:05<1:12:40, 11.95s/it] 87%|████████▋ | 2410/2774 [7:55:16<1:11:20, 11.76s/it] {'loss': 1.0483, 'learning_rate': 2.2251420314142845e-07, 'epoch': 0.87} 87%|████████▋ | 2410/2774 [7:55:16<1:11:20, 11.76s/it] 87%|████████▋ | 2411/2774 [7:55:28<1:10:53, 11.72s/it] {'loss': 1.0332, 'learning_rate': 2.2131161886436465e-07, 'epoch': 0.87} 
87%|████████▋ | 2411/2774 [7:55:28<1:10:53, 11.72s/it] 87%|████████▋ | 2412/2774 [7:55:39<1:09:47, 11.57s/it] {'loss': 0.9751, 'learning_rate': 2.2011214258060076e-07, 'epoch': 0.87} 87%|████████▋ | 2412/2774 [7:55:39<1:09:47, 11.57s/it] 87%|████████▋ | 2413/2774 [7:55:51<1:09:38, 11.58s/it] {'loss': 1.0034, 'learning_rate': 2.1891577592615065e-07, 'epoch': 0.87} 87%|████████▋ | 2413/2774 [7:55:51<1:09:38, 11.58s/it] 87%|████████▋ | 2414/2774 [7:56:02<1:09:35, 11.60s/it] {'loss': 1.0811, 'learning_rate': 2.1772252053278513e-07, 'epoch': 0.87} 87%|████████▋ | 2414/2774 [7:56:02<1:09:35, 11.60s/it] 87%|████████▋ | 2415/2774 [7:56:14<1:09:08, 11.56s/it] {'loss': 1.0391, 'learning_rate': 2.1653237802803345e-07, 'epoch': 0.87} 87%|████████▋ | 2415/2774 [7:56:14<1:09:08, 11.56s/it] 87%|████████▋ | 2416/2774 [7:56:25<1:08:33, 11.49s/it] {'loss': 0.9873, 'learning_rate': 2.1534535003517736e-07, 'epoch': 0.87} 87%|████████▋ | 2416/2774 [7:56:25<1:08:33, 11.49s/it] 87%|████████▋ | 2417/2774 [7:56:37<1:08:27, 11.51s/it] {'loss': 1.0005, 'learning_rate': 2.1416143817325207e-07, 'epoch': 0.87} 87%|████████▋ | 2417/2774 [7:56:37<1:08:27, 11.51s/it] 87%|████████▋ | 2418/2774 [7:56:49<1:09:36, 11.73s/it] {'loss': 1.0801, 'learning_rate': 2.1298064405704145e-07, 'epoch': 0.87} 87%|████████▋ | 2418/2774 [7:56:49<1:09:36, 11.73s/it] 87%|████████▋ | 2419/2774 [7:57:01<1:10:48, 11.97s/it] {'loss': 1.0039, 'learning_rate': 2.1180296929707717e-07, 'epoch': 0.87} 87%|████████▋ | 2419/2774 [7:57:01<1:10:48, 11.97s/it] 87%|████████▋ | 2420/2774 [7:57:12<1:09:05, 11.71s/it] {'loss': 1.0288, 'learning_rate': 2.106284154996363e-07, 'epoch': 0.87} 87%|████████▋ | 2420/2774 [7:57:12<1:09:05, 11.71s/it] 87%|████████▋ | 2421/2774 [7:57:25<1:10:38, 12.01s/it] {'loss': 1.0098, 'learning_rate': 2.0945698426673988e-07, 'epoch': 0.87} 87%|████████▋ | 2421/2774 [7:57:25<1:10:38, 12.01s/it] 87%|████████▋ | 2422/2774 [7:57:36<1:09:22, 11.82s/it] {'loss': 1.0049, 'learning_rate': 2.0828867719614926e-07, 
'epoch': 0.87} 87%|████████▋ | 2422/2774 [7:57:36<1:09:22, 11.82s/it] 87%|████████▋ | 2423/2774 [7:57:48<1:08:47, 11.76s/it] {'loss': 0.9868, 'learning_rate': 2.0712349588136392e-07, 'epoch': 0.87} 87%|████████▋ | 2423/2774 [7:57:48<1:08:47, 11.76s/it] 87%|████████▋ | 2424/2774 [7:58:00<1:08:02, 11.67s/it] {'loss': 1.0093, 'learning_rate': 2.0596144191162182e-07, 'epoch': 0.87} 87%|████████▋ | 2424/2774 [7:58:00<1:08:02, 11.67s/it] 87%|████████▋ | 2425/2774 [7:58:11<1:07:22, 11.58s/it] {'loss': 1.0298, 'learning_rate': 2.048025168718934e-07, 'epoch': 0.87} 87%|████████▋ | 2425/2774 [7:58:11<1:07:22, 11.58s/it] 87%|████████▋ | 2426/2774 [7:58:23<1:07:12, 11.59s/it] {'loss': 1.0811, 'learning_rate': 2.036467223428834e-07, 'epoch': 0.87} 87%|████████▋ | 2426/2774 [7:58:23<1:07:12, 11.59s/it] 87%|████████▋ | 2427/2774 [7:58:36<1:10:12, 12.14s/it] {'loss': 0.9634, 'learning_rate': 2.0249405990102554e-07, 'epoch': 0.87} 87%|████████▋ | 2427/2774 [7:58:36<1:10:12, 12.14s/it] 88%|████████▊ | 2428/2774 [7:58:47<1:08:24, 11.86s/it] {'loss': 1.0093, 'learning_rate': 2.0134453111848112e-07, 'epoch': 0.88} 88%|████████▊ | 2428/2774 [7:58:47<1:08:24, 11.86s/it] 88%|████████▊ | 2429/2774 [7:58:59<1:07:17, 11.70s/it] {'loss': 1.0107, 'learning_rate': 2.0019813756313815e-07, 'epoch': 0.88} 88%|████████▊ | 2429/2774 [7:58:59<1:07:17, 11.70s/it] 88%|████████▊ | 2430/2774 [7:59:10<1:06:55, 11.67s/it] {'loss': 0.9697, 'learning_rate': 1.990548807986084e-07, 'epoch': 0.88} 88%|████████▊ | 2430/2774 [7:59:10<1:06:55, 11.67s/it] 88%|████████▊ | 2431/2774 [7:59:21<1:06:10, 11.58s/it] {'loss': 1.0566, 'learning_rate': 1.979147623842248e-07, 'epoch': 0.88} 88%|████████▊ | 2431/2774 [7:59:21<1:06:10, 11.58s/it] 88%|████████▊ | 2432/2774 [7:59:34<1:07:30, 11.84s/it] {'loss': 1.0483, 'learning_rate': 1.9677778387504064e-07, 'epoch': 0.88} 88%|████████▊ | 2432/2774 [7:59:34<1:07:30, 11.84s/it] 88%|████████▊ | 2433/2774 [7:59:46<1:06:52, 11.77s/it] {'loss': 1.0356, 'learning_rate': 
1.9564394682182518e-07, 'epoch': 0.88} 88%|████████▊ | 2433/2774 [7:59:46<1:06:52, 11.77s/it] 88%|████████▊ | 2434/2774 [7:59:57<1:06:02, 11.65s/it] {'loss': 0.979, 'learning_rate': 1.9451325277106415e-07, 'epoch': 0.88} 88%|████████▊ | 2434/2774 [7:59:57<1:06:02, 11.65s/it] 88%|████████▊ | 2435/2774 [8:00:09<1:05:49, 11.65s/it] {'loss': 1.0063, 'learning_rate': 1.9338570326495555e-07, 'epoch': 0.88} 88%|████████▊ | 2435/2774 [8:00:09<1:05:49, 11.65s/it] 88%|████████▊ | 2436/2774 [8:00:20<1:05:52, 11.69s/it] {'loss': 1.0327, 'learning_rate': 1.9226129984140945e-07, 'epoch': 0.88} 88%|████████▊ | 2436/2774 [8:00:20<1:05:52, 11.69s/it] 88%|████████▊ | 2437/2774 [8:00:34<1:08:37, 12.22s/it] {'loss': 1.084, 'learning_rate': 1.911400440340433e-07, 'epoch': 0.88} 88%|████████▊ | 2437/2774 [8:00:34<1:08:37, 12.22s/it] 88%|████████▊ | 2438/2774 [8:00:45<1:06:47, 11.93s/it] {'loss': 1.0742, 'learning_rate': 1.9002193737218288e-07, 'epoch': 0.88} 88%|████████▊ | 2438/2774 [8:00:45<1:06:47, 11.93s/it] 88%|████████▊ | 2439/2774 [8:00:56<1:05:31, 11.74s/it] {'loss': 1.0225, 'learning_rate': 1.889069813808575e-07, 'epoch': 0.88} 88%|████████▊ | 2439/2774 [8:00:56<1:05:31, 11.74s/it] 88%|████████▊ | 2440/2774 [8:01:07<1:04:22, 11.56s/it] {'loss': 1.0371, 'learning_rate': 1.8779517758080096e-07, 'epoch': 0.88} 88%|████████▊ | 2440/2774 [8:01:07<1:04:22, 11.56s/it] 88%|████████▊ | 2441/2774 [8:01:20<1:05:53, 11.87s/it] {'loss': 1.0156, 'learning_rate': 1.8668652748844524e-07, 'epoch': 0.88} 88%|████████▊ | 2441/2774 [8:01:20<1:05:53, 11.87s/it] 88%|████████▊ | 2442/2774 [8:01:33<1:07:27, 12.19s/it] {'loss': 0.9971, 'learning_rate': 1.8558103261592298e-07, 'epoch': 0.88} 88%|████████▊ | 2442/2774 [8:01:33<1:07:27, 12.19s/it] 88%|████████▊ | 2443/2774 [8:01:47<1:10:00, 12.69s/it] {'loss': 0.9946, 'learning_rate': 1.8447869447106194e-07, 'epoch': 0.88} 88%|████████▊ | 2443/2774 [8:01:47<1:10:00, 12.69s/it] 88%|████████▊ | 2444/2774 [8:01:58<1:07:24, 12.26s/it] {'loss': 1.061, 
'learning_rate': 1.8337951455738469e-07, 'epoch': 0.88} 88%|████████▊ | 2444/2774 [8:01:58<1:07:24, 12.26s/it] 88%|████████▊ | 2445/2774 [8:02:10<1:06:24, 12.11s/it] {'loss': 1.002, 'learning_rate': 1.822834943741067e-07, 'epoch': 0.88} 88%|████████▊ | 2445/2774 [8:02:10<1:06:24, 12.11s/it] 88%|████████▊ | 2446/2774 [8:02:21<1:05:20, 11.95s/it] {'loss': 1.0156, 'learning_rate': 1.8119063541613302e-07, 'epoch': 0.88} 88%|████████▊ | 2446/2774 [8:02:21<1:05:20, 11.95s/it] 88%|████████▊ | 2447/2774 [8:02:33<1:03:48, 11.71s/it] {'loss': 0.9839, 'learning_rate': 1.8010093917405714e-07, 'epoch': 0.88} 88%|████████▊ | 2447/2774 [8:02:33<1:03:48, 11.71s/it] 88%|████████▊ | 2448/2774 [8:02:44<1:03:27, 11.68s/it] {'loss': 1.061, 'learning_rate': 1.7901440713415873e-07, 'epoch': 0.88} 88%|████████▊ | 2448/2774 [8:02:44<1:03:27, 11.68s/it] 88%|████████▊ | 2449/2774 [8:02:58<1:06:01, 12.19s/it] {'loss': 1.0303, 'learning_rate': 1.779310407784024e-07, 'epoch': 0.88} 88%|████████▊ | 2449/2774 [8:02:58<1:06:01, 12.19s/it] 88%|████████▊ | 2450/2774 [8:03:09<1:04:49, 12.00s/it] {'loss': 1.022, 'learning_rate': 1.768508415844339e-07, 'epoch': 0.88} 88%|████████▊ | 2450/2774 [8:03:09<1:04:49, 12.00s/it] 88%|████████▊ | 2451/2774 [8:03:20<1:03:15, 11.75s/it] {'loss': 1.0259, 'learning_rate': 1.757738110255802e-07, 'epoch': 0.88} 88%|████████▊ | 2451/2774 [8:03:20<1:03:15, 11.75s/it] 88%|████████▊ | 2452/2774 [8:03:34<1:05:33, 12.22s/it] {'loss': 1.0674, 'learning_rate': 1.746999505708452e-07, 'epoch': 0.88} 88%|████████▊ | 2452/2774 [8:03:34<1:05:33, 12.22s/it] 88%|████████▊ | 2453/2774 [8:03:45<1:04:26, 12.05s/it] {'loss': 1.0098, 'learning_rate': 1.7362926168491057e-07, 'epoch': 0.88} 88%|████████▊ | 2453/2774 [8:03:45<1:04:26, 12.05s/it] 88%|████████▊ | 2454/2774 [8:03:57<1:02:58, 11.81s/it] {'loss': 1.02, 'learning_rate': 1.725617458281309e-07, 'epoch': 0.88} 88%|████████▊ | 2454/2774 [8:03:57<1:02:58, 11.81s/it] 89%|████████▊ | 2455/2774 [8:04:10<1:04:49, 12.19s/it] {'loss': 
1.0654, 'learning_rate': 1.7149740445653345e-07, 'epoch': 0.89} 89%|████████▊ | 2455/2774 [8:04:10<1:04:49, 12.19s/it] 89%|████████▊ | 2456/2774 [8:04:21<1:03:48, 12.04s/it] {'loss': 1.0288, 'learning_rate': 1.7043623902181478e-07, 'epoch': 0.89} 89%|████████▊ | 2456/2774 [8:04:21<1:03:48, 12.04s/it] 89%|████████▊ | 2457/2774 [8:04:33<1:02:34, 11.84s/it] {'loss': 1.0459, 'learning_rate': 1.6937825097134126e-07, 'epoch': 0.89} 89%|████████▊ | 2457/2774 [8:04:33<1:02:34, 11.84s/it] 89%|████████▊ | 2458/2774 [8:04:44<1:01:41, 11.71s/it] {'loss': 0.998, 'learning_rate': 1.6832344174814413e-07, 'epoch': 0.89} 89%|████████▊ | 2458/2774 [8:04:44<1:01:41, 11.71s/it] 89%|████████▊ | 2459/2774 [8:04:57<1:03:45, 12.14s/it] {'loss': 1.0039, 'learning_rate': 1.6727181279092037e-07, 'epoch': 0.89} 89%|████████▊ | 2459/2774 [8:04:57<1:03:45, 12.14s/it] 89%|████████▊ | 2460/2774 [8:05:09<1:02:41, 11.98s/it] {'loss': 1.0693, 'learning_rate': 1.662233655340273e-07, 'epoch': 0.89} 89%|████████▊ | 2460/2774 [8:05:09<1:02:41, 11.98s/it] 89%|████████▊ | 2461/2774 [8:05:20<1:01:37, 11.81s/it] {'loss': 1.0024, 'learning_rate': 1.6517810140748436e-07, 'epoch': 0.89} 89%|████████▊ | 2461/2774 [8:05:20<1:01:37, 11.81s/it] 89%|████████▉ | 2462/2774 [8:05:31<1:00:29, 11.63s/it] {'loss': 1.0796, 'learning_rate': 1.6413602183696808e-07, 'epoch': 0.89} 89%|████████▉ | 2462/2774 [8:05:31<1:00:29, 11.63s/it] 89%|████████▉ | 2463/2774 [8:05:44<1:00:55, 11.75s/it] {'loss': 1.0527, 'learning_rate': 1.6309712824381318e-07, 'epoch': 0.89} 89%|████████▉ | 2463/2774 [8:05:44<1:00:55, 11.75s/it] 89%|████████▉ | 2464/2774 [8:05:55<1:00:18, 11.67s/it] {'loss': 1.042, 'learning_rate': 1.620614220450062e-07, 'epoch': 0.89} 89%|████████▉ | 2464/2774 [8:05:55<1:00:18, 11.67s/it] 89%|████████▉ | 2465/2774 [8:06:06<59:30, 11.55s/it] {'loss': 1.0562, 'learning_rate': 1.6102890465318904e-07, 'epoch': 0.89} 89%|████████▉ | 2465/2774 [8:06:06<59:30, 11.55s/it] 89%|████████▉ | 2466/2774 [8:06:18<59:45, 11.64s/it] 
{'loss': 0.9609, 'learning_rate': 1.5999957747665191e-07, 'epoch': 0.89} 89%|████████▉ | 2466/2774 [8:06:18<59:45, 11.64s/it] 89%|████████▉ | 2467/2774 [8:06:29<58:52, 11.51s/it] {'loss': 1.0127, 'learning_rate': 1.5897344191933617e-07, 'epoch': 0.89} 89%|████████▉ | 2467/2774 [8:06:29<58:52, 11.51s/it] 89%|████████▉ | 2468/2774 [8:06:42<1:01:00, 11.96s/it] {'loss': 0.9526, 'learning_rate': 1.5795049938082841e-07, 'epoch': 0.89} 89%|████████▉ | 2468/2774 [8:06:42<1:01:00, 11.96s/it] 89%|████████▉ | 2469/2774 [8:06:54<59:46, 11.76s/it] {'loss': 1.0396, 'learning_rate': 1.5693075125635949e-07, 'epoch': 0.89} 89%|████████▉ | 2469/2774 [8:06:54<59:46, 11.76s/it] 89%|████████▉ | 2470/2774 [8:07:05<59:12, 11.69s/it] {'loss': 1.062, 'learning_rate': 1.5591419893680542e-07, 'epoch': 0.89} 89%|████████▉ | 2470/2774 [8:07:05<59:12, 11.69s/it] 89%|████████▉ | 2471/2774 [8:07:16<58:29, 11.58s/it] {'loss': 0.9961, 'learning_rate': 1.549008438086813e-07, 'epoch': 0.89} 89%|████████▉ | 2471/2774 [8:07:16<58:29, 11.58s/it] 89%|████████▉ | 2472/2774 [8:07:28<58:32, 11.63s/it] {'loss': 1.0503, 'learning_rate': 1.5389068725414346e-07, 'epoch': 0.89} 89%|████████▉ | 2472/2774 [8:07:28<58:32, 11.63s/it] 89%|████████▉ | 2473/2774 [8:07:40<59:17, 11.82s/it] {'loss': 1.0786, 'learning_rate': 1.5288373065098284e-07, 'epoch': 0.89} 89%|████████▉ | 2473/2774 [8:07:40<59:17, 11.82s/it] 89%|████████▉ | 2474/2774 [8:07:52<58:12, 11.64s/it] {'loss': 1.0371, 'learning_rate': 1.5187997537262882e-07, 'epoch': 0.89} 89%|████████▉ | 2474/2774 [8:07:52<58:12, 11.64s/it] 89%|████████▉ | 2475/2774 [8:08:03<57:57, 11.63s/it] {'loss': 0.9902, 'learning_rate': 1.5087942278814188e-07, 'epoch': 0.89} 89%|████████▉ | 2475/2774 [8:08:03<57:57, 11.63s/it] 89%|████████▉ | 2476/2774 [8:08:15<57:31, 11.58s/it] {'loss': 0.9819, 'learning_rate': 1.4988207426221617e-07, 'epoch': 0.89} 89%|████████▉ | 2476/2774 [8:08:15<57:31, 11.58s/it] 89%|████████▉ | 2477/2774 [8:08:26<57:26, 11.60s/it] {'loss': 0.9756, 
'learning_rate': 1.4888793115517412e-07, 'epoch': 0.89} 89%|████████▉ | 2477/2774 [8:08:26<57:26, 11.60s/it] 89%|████████▉ | 2478/2774 [8:08:40<59:55, 12.15s/it] {'loss': 1.0034, 'learning_rate': 1.478969948229675e-07, 'epoch': 0.89} 89%|████████▉ | 2478/2774 [8:08:40<59:55, 12.15s/it] 89%|████████▉ | 2479/2774 [8:08:51<58:15, 11.85s/it] {'loss': 1.0073, 'learning_rate': 1.4690926661717313e-07, 'epoch': 0.89} 89%|████████▉ | 2479/2774 [8:08:51<58:15, 11.85s/it] 89%|████████▉ | 2480/2774 [8:09:03<57:36, 11.76s/it] {'loss': 1.0615, 'learning_rate': 1.4592474788499317e-07, 'epoch': 0.89} 89%|████████▉ | 2480/2774 [8:09:03<57:36, 11.76s/it] 89%|████████▉ | 2481/2774 [8:09:15<58:49, 12.05s/it] {'loss': 0.979, 'learning_rate': 1.449434399692512e-07, 'epoch': 0.89} 89%|████████▉ | 2481/2774 [8:09:15<58:49, 12.05s/it] 89%|████████▉ | 2482/2774 [8:09:27<58:02, 11.93s/it] {'loss': 1.0068, 'learning_rate': 1.4396534420839214e-07, 'epoch': 0.89} 89%|████████▉ | 2482/2774 [8:09:27<58:02, 11.93s/it] 90%|████████▉ | 2483/2774 [8:09:39<58:22, 12.04s/it] {'loss': 1.0342, 'learning_rate': 1.4299046193647914e-07, 'epoch': 0.9} 90%|████████▉ | 2483/2774 [8:09:39<58:22, 12.04s/it] 90%|████████▉ | 2484/2774 [8:09:51<57:42, 11.94s/it] {'loss': 1.0298, 'learning_rate': 1.4201879448319356e-07, 'epoch': 0.9} 90%|████████▉ | 2484/2774 [8:09:51<57:42, 11.94s/it] 90%|████████▉ | 2485/2774 [8:10:02<56:23, 11.71s/it] {'loss': 1.0645, 'learning_rate': 1.4105034317383e-07, 'epoch': 0.9} 90%|████████▉ | 2485/2774 [8:10:02<56:23, 11.71s/it] 90%|████████▉ | 2486/2774 [8:10:13<55:35, 11.58s/it] {'loss': 0.9995, 'learning_rate': 1.4008510932929848e-07, 'epoch': 0.9} 90%|████████▉ | 2486/2774 [8:10:13<55:35, 11.58s/it] 90%|████████▉ | 2487/2774 [8:10:26<56:22, 11.78s/it] {'loss': 1.0591, 'learning_rate': 1.3912309426611924e-07, 'epoch': 0.9} 90%|████████▉ | 2487/2774 [8:10:26<56:22, 11.78s/it] 90%|████████▉ | 2488/2774 [8:10:37<55:24, 11.63s/it] {'loss': 0.9956, 'learning_rate': 1.38164299296423e-07, 
'epoch': 0.9} 90%|████████▉ | 2488/2774 [8:10:37<55:24, 11.63s/it] 90%|████████▉ | 2489/2774 [8:10:49<55:23, 11.66s/it] {'loss': 0.9932, 'learning_rate': 1.372087257279478e-07, 'epoch': 0.9} 90%|████████▉ | 2489/2774 [8:10:49<55:23, 11.66s/it] 90%|████████▉ | 2490/2774 [8:11:01<56:38, 11.97s/it] {'loss': 1.0444, 'learning_rate': 1.362563748640386e-07, 'epoch': 0.9} 90%|████████▉ | 2490/2774 [8:11:01<56:38, 11.97s/it] 90%|████████▉ | 2491/2774 [8:11:12<55:16, 11.72s/it] {'loss': 1.0303, 'learning_rate': 1.353072480036438e-07, 'epoch': 0.9} 90%|████████▉ | 2491/2774 [8:11:12<55:16, 11.72s/it] 90%|████████▉ | 2492/2774 [8:11:24<54:37, 11.62s/it] {'loss': 1.0664, 'learning_rate': 1.3436134644131627e-07, 'epoch': 0.9} 90%|████████▉ | 2492/2774 [8:11:24<54:37, 11.62s/it] 90%|████████▉ | 2493/2774 [8:11:35<53:59, 11.53s/it] {'loss': 1.0273, 'learning_rate': 1.3341867146720755e-07, 'epoch': 0.9} 90%|████████▉ | 2493/2774 [8:11:35<53:59, 11.53s/it] 90%|████████▉ | 2494/2774 [8:11:47<53:39, 11.50s/it] {'loss': 1.0142, 'learning_rate': 1.3247922436706972e-07, 'epoch': 0.9} 90%|████████▉ | 2494/2774 [8:11:47<53:39, 11.50s/it] 90%|████████▉ | 2495/2774 [8:11:58<53:15, 11.45s/it] {'loss': 1.0796, 'learning_rate': 1.3154300642225198e-07, 'epoch': 0.9} 90%|████████▉ | 2495/2774 [8:11:58<53:15, 11.45s/it] 90%|████████▉ | 2496/2774 [8:12:09<52:52, 11.41s/it] {'loss': 1.0122, 'learning_rate': 1.306100189096987e-07, 'epoch': 0.9} 90%|████████▉ | 2496/2774 [8:12:09<52:52, 11.41s/it] 90%|█████████ | 2497/2774 [8:12:22<53:52, 11.67s/it] {'loss': 0.9663, 'learning_rate': 1.2968026310194892e-07, 'epoch': 0.9} 90%|█████████ | 2497/2774 [8:12:22<53:52, 11.67s/it] 90%|█████████ | 2498/2774 [8:12:33<53:24, 11.61s/it] {'loss': 1.0693, 'learning_rate': 1.2875374026713294e-07, 'epoch': 0.9} 90%|█████████ | 2498/2774 [8:12:33<53:24, 11.61s/it] 90%|█████████ | 2499/2774 [8:12:45<53:28, 11.67s/it] {'loss': 1.0288, 'learning_rate': 1.2783045166897296e-07, 'epoch': 0.9} 90%|█████████ | 2499/2774 
[8:12:45<53:28, 11.67s/it] 90%|█████████ | 2500/2774 [8:12:57<54:42, 11.98s/it] {'loss': 1.0083, 'learning_rate': 1.2691039856677744e-07, 'epoch': 0.9} 90%|█████████ | 2500/2774 [8:12:58<54:42, 11.98s/it] 90%|█████████ | 2501/2774 [8:13:09<53:29, 11.76s/it] {'loss': 1.0503, 'learning_rate': 1.259935822154443e-07, 'epoch': 0.9} 90%|█████████ | 2501/2774 [8:13:09<53:29, 11.76s/it] 90%|█████████ | 2502/2774 [8:13:20<53:12, 11.74s/it] {'loss': 0.9917, 'learning_rate': 1.2508000386545482e-07, 'epoch': 0.9} 90%|█████████ | 2502/2774 [8:13:20<53:12, 11.74s/it] 90%|█████████ | 2503/2774 [8:13:32<52:34, 11.64s/it] {'loss': 1.0308, 'learning_rate': 1.2416966476287538e-07, 'epoch': 0.9} 90%|█████████ | 2503/2774 [8:13:32<52:34, 11.64s/it] 90%|█████████ | 2504/2774 [8:13:43<51:53, 11.53s/it] {'loss': 1.0386, 'learning_rate': 1.2326256614935306e-07, 'epoch': 0.9} 90%|█████████ | 2504/2774 [8:13:43<51:53, 11.53s/it] 90%|█████████ | 2505/2774 [8:13:55<51:42, 11.53s/it] {'loss': 1.0693, 'learning_rate': 1.223587092621162e-07, 'epoch': 0.9} 90%|█████████ | 2505/2774 [8:13:55<51:42, 11.53s/it] 90%|█████████ | 2506/2774 [8:14:06<51:34, 11.55s/it] {'loss': 1.0361, 'learning_rate': 1.2145809533397e-07, 'epoch': 0.9} 90%|█████████ | 2506/2774 [8:14:06<51:34, 11.55s/it] 90%|█████████ | 2507/2774 [8:14:19<53:18, 11.98s/it] {'loss': 1.019, 'learning_rate': 1.2056072559329861e-07, 'epoch': 0.9} 90%|█████████ | 2507/2774 [8:14:19<53:18, 11.98s/it] 90%|█████████ | 2508/2774 [8:14:31<53:15, 12.01s/it] {'loss': 1.0361, 'learning_rate': 1.1966660126405934e-07, 'epoch': 0.9} 90%|█████████ | 2508/2774 [8:14:31<53:15, 12.01s/it] 90%|█████████ | 2509/2774 [8:14:43<52:42, 11.93s/it] {'loss': 1.0371, 'learning_rate': 1.1877572356578409e-07, 'epoch': 0.9} 90%|█████████ | 2509/2774 [8:14:43<52:42, 11.93s/it] 90%|█████████ | 2510/2774 [8:14:55<52:35, 11.95s/it] {'loss': 0.9746, 'learning_rate': 1.1788809371357568e-07, 'epoch': 0.9} 90%|█████████ | 2510/2774 [8:14:55<52:35, 11.95s/it] 91%|█████████ | 
2511/2774 [8:15:06<51:40, 11.79s/it] {'loss': 1.0137, 'learning_rate': 1.1700371291810842e-07, 'epoch': 0.91} 91%|█████████ | 2511/2774 [8:15:06<51:40, 11.79s/it] 91%|█████████ | 2512/2774 [8:15:18<51:09, 11.72s/it] {'loss': 1.0518, 'learning_rate': 1.161225823856238e-07, 'epoch': 0.91} 91%|█████████ | 2512/2774 [8:15:18<51:09, 11.72s/it] 91%|█████████ | 2513/2774 [8:15:30<50:48, 11.68s/it] {'loss': 1.0591, 'learning_rate': 1.1524470331793075e-07, 'epoch': 0.91} 91%|█████████ | 2513/2774 [8:15:30<50:48, 11.68s/it] 91%|█████████ | 2514/2774 [8:15:41<50:07, 11.57s/it] {'loss': 0.9868, 'learning_rate': 1.143700769124037e-07, 'epoch': 0.91} 91%|█████████ | 2514/2774 [8:15:41<50:07, 11.57s/it] 91%|█████████ | 2515/2774 [8:15:53<50:32, 11.71s/it] {'loss': 1.042, 'learning_rate': 1.1349870436197924e-07, 'epoch': 0.91} 91%|█████████ | 2515/2774 [8:15:53<50:32, 11.71s/it] 91%|█████████ | 2516/2774 [8:16:04<49:54, 11.61s/it] {'loss': 1.0444, 'learning_rate': 1.1263058685515776e-07, 'epoch': 0.91} 91%|█████████ | 2516/2774 [8:16:04<49:54, 11.61s/it] 91%|█████████ | 2517/2774 [8:16:16<49:44, 11.61s/it] {'loss': 1.0273, 'learning_rate': 1.117657255759988e-07, 'epoch': 0.91} 91%|█████████ | 2517/2774 [8:16:16<49:44, 11.61s/it] 91%|█████████ | 2518/2774 [8:16:28<49:29, 11.60s/it] {'loss': 1.0342, 'learning_rate': 1.1090412170412068e-07, 'epoch': 0.91} 91%|█████████ | 2518/2774 [8:16:28<49:29, 11.60s/it] 91%|█████████ | 2519/2774 [8:16:39<49:00, 11.53s/it] {'loss': 1.0806, 'learning_rate': 1.100457764146995e-07, 'epoch': 0.91} 91%|█████████ | 2519/2774 [8:16:39<49:00, 11.53s/it] 91%|█████████ | 2520/2774 [8:16:50<48:44, 11.52s/it] {'loss': 1.0332, 'learning_rate': 1.0919069087846624e-07, 'epoch': 0.91} 91%|█████████ | 2520/2774 [8:16:50<48:44, 11.52s/it] 91%|█████████ | 2521/2774 [8:17:02<48:18, 11.46s/it] {'loss': 1.0171, 'learning_rate': 1.0833886626170547e-07, 'epoch': 0.91} 91%|█████████ | 2521/2774 [8:17:02<48:18, 11.46s/it] 91%|█████████ | 2522/2774 [8:17:13<48:09, 
11.47s/it] {'loss': 1.062, 'learning_rate': 1.0749030372625535e-07, 'epoch': 0.91} 91%|█████████ | 2522/2774 [8:17:13<48:09, 11.47s/it] 91%|█████████ | 2523/2774 [8:17:25<48:39, 11.63s/it] {'loss': 1.0229, 'learning_rate': 1.0664500442950281e-07, 'epoch': 0.91} 91%|█████████ | 2523/2774 [8:17:25<48:39, 11.63s/it] 91%|█████████ | 2524/2774 [8:17:37<48:18, 11.60s/it] {'loss': 1.0083, 'learning_rate': 1.0580296952438618e-07, 'epoch': 0.91} 91%|█████████ | 2524/2774 [8:17:37<48:18, 11.60s/it] 91%|█████████ | 2525/2774 [8:17:48<47:40, 11.49s/it] {'loss': 0.9702, 'learning_rate': 1.0496420015938924e-07, 'epoch': 0.91} 91%|█████████ | 2525/2774 [8:17:48<47:40, 11.49s/it] 91%|█████████ | 2526/2774 [8:18:00<47:34, 11.51s/it] {'loss': 0.9692, 'learning_rate': 1.0412869747854409e-07, 'epoch': 0.91} 91%|█████████ | 2526/2774 [8:18:00<47:34, 11.51s/it] 91%|█████████ | 2527/2774 [8:18:11<47:23, 11.51s/it] {'loss': 1.0415, 'learning_rate': 1.032964626214239e-07, 'epoch': 0.91} 91%|█████████ | 2527/2774 [8:18:11<47:23, 11.51s/it] 91%|█████████ | 2528/2774 [8:18:25<50:12, 12.25s/it] {'loss': 1.0146, 'learning_rate': 1.0246749672314844e-07, 'epoch': 0.91} 91%|█████████ | 2528/2774 [8:18:25<50:12, 12.25s/it] 91%|█████████ | 2529/2774 [8:18:37<49:48, 12.20s/it] {'loss': 1.0605, 'learning_rate': 1.0164180091437631e-07, 'epoch': 0.91} 91%|█████████ | 2529/2774 [8:18:37<49:48, 12.20s/it] 91%|█████████ | 2530/2774 [8:18:51<52:08, 12.82s/it] {'loss': 0.9883, 'learning_rate': 1.0081937632130695e-07, 'epoch': 0.91} 91%|█████████ | 2530/2774 [8:18:51<52:08, 12.82s/it] 91%|█████████ | 2531/2774 [8:19:04<51:29, 12.71s/it] {'loss': 0.9946, 'learning_rate': 1.0000022406567777e-07, 'epoch': 0.91} 91%|█████████ | 2531/2774 [8:19:04<51:29, 12.71s/it] 91%|█████████▏| 2532/2774 [8:19:17<52:11, 12.94s/it] {'loss': 1.0737, 'learning_rate': 9.918434526476311e-08, 'epoch': 0.91} 91%|█████████▏| 2532/2774 [8:19:17<52:11, 12.94s/it] 91%|█████████▏| 2533/2774 [8:19:29<49:58, 12.44s/it] {'loss': 1.085, 
'learning_rate': 9.837174103137199e-08, 'epoch': 0.91} 91%|█████████▏| 2533/2774 [8:19:29<49:58, 12.44s/it] 91%|█████████▏| 2534/2774 [8:19:40<48:20, 12.09s/it] {'loss': 1.0244, 'learning_rate': 9.756241247384807e-08, 'epoch': 0.91} 91%|█████████▏| 2534/2774 [8:19:40<48:20, 12.09s/it] 91%|█████████▏| 2535/2774 [8:19:52<48:31, 12.18s/it] {'loss': 0.9917, 'learning_rate': 9.675636069606642e-08, 'epoch': 0.91} 91%|█████████▏| 2535/2774 [8:19:52<48:31, 12.18s/it]/usr/local/lib/python3.9/dist-packages/PIL/TiffImagePlugin.py:850: UserWarning: Corrupt EXIF data. Expecting to read 2 bytes but only got 0. warnings.warn(str(msg)) 91%|█████████▏| 2536/2774 [8:20:04<47:40, 12.02s/it] {'loss': 1.0112, 'learning_rate': 9.595358679743261e-08, 'epoch': 0.91} 91%|█████████▏| 2536/2774 [8:20:04<47:40, 12.02s/it] 91%|█████████▏| 2537/2774 [8:20:15<46:44, 11.83s/it] {'loss': 0.9985, 'learning_rate': 9.515409187288188e-08, 'epoch': 0.91} 91%|█████████▏| 2537/2774 [8:20:15<46:44, 11.83s/it] 91%|█████████▏| 2538/2774 [8:20:27<45:55, 11.68s/it] {'loss': 1.0415, 'learning_rate': 9.435787701287724e-08, 'epoch': 0.91} 91%|█████████▏| 2538/2774 [8:20:27<45:55, 11.68s/it] 92%|█████████▏| 2539/2774 [8:20:38<45:03, 11.51s/it] {'loss': 1.0122, 'learning_rate': 9.356494330340749e-08, 'epoch': 0.92} 92%|█████████▏| 2539/2774 [8:20:38<45:03, 11.51s/it] 92%|█████████▏| 2540/2774 [8:20:49<44:41, 11.46s/it] {'loss': 1.0435, 'learning_rate': 9.277529182598638e-08, 'epoch': 0.92} 92%|█████████▏| 2540/2774 [8:20:49<44:41, 11.46s/it] 92%|█████████▏| 2541/2774 [8:21:01<44:52, 11.56s/it] {'loss': 0.9966, 'learning_rate': 9.198892365765072e-08, 'epoch': 0.92} 92%|█████████▏| 2541/2774 [8:21:01<44:52, 11.56s/it] 92%|█████████▏| 2542/2774 [8:21:12<44:32, 11.52s/it] {'loss': 1.0322, 'learning_rate': 9.120583987095921e-08, 'epoch': 0.92} 92%|█████████▏| 2542/2774 [8:21:12<44:32, 11.52s/it] 92%|█████████▏| 2543/2774 [8:21:24<44:10, 11.47s/it] {'loss': 1.0215, 'learning_rate': 9.04260415339911e-08, 'epoch': 0.92} 
92%|█████████▏| 2543/2774 [8:21:24<44:10, 11.47s/it] 92%|█████████▏| 2544/2774 [8:21:35<44:19, 11.56s/it] {'loss': 1.0488, 'learning_rate': 8.964952971034418e-08, 'epoch': 0.92} 92%|█████████▏| 2544/2774 [8:21:35<44:19, 11.56s/it] 92%|█████████▏| 2545/2774 [8:21:47<44:04, 11.55s/it] {'loss': 0.98, 'learning_rate': 8.887630545913323e-08, 'epoch': 0.92} 92%|█████████▏| 2545/2774 [8:21:47<44:04, 11.55s/it] 92%|█████████▏| 2546/2774 [8:21:59<44:09, 11.62s/it] {'loss': 1.0576, 'learning_rate': 8.81063698349896e-08, 'epoch': 0.92} 92%|█████████▏| 2546/2774 [8:21:59<44:09, 11.62s/it] 92%|█████████▏| 2547/2774 [8:22:10<43:37, 11.53s/it] {'loss': 0.9971, 'learning_rate': 8.733972388805911e-08, 'epoch': 0.92} 92%|█████████▏| 2547/2774 [8:22:10<43:37, 11.53s/it] 92%|█████████▏| 2548/2774 [8:22:22<43:28, 11.54s/it] {'loss': 1.0005, 'learning_rate': 8.657636866400032e-08, 'epoch': 0.92} 92%|█████████▏| 2548/2774 [8:22:22<43:28, 11.54s/it] 92%|█████████▏| 2549/2774 [8:22:33<42:59, 11.47s/it] {'loss': 1.0708, 'learning_rate': 8.581630520398398e-08, 'epoch': 0.92} 92%|█████████▏| 2549/2774 [8:22:33<42:59, 11.47s/it] 92%|█████████▏| 2550/2774 [8:22:47<45:47, 12.27s/it] {'loss': 0.9536, 'learning_rate': 8.505953454469057e-08, 'epoch': 0.92} 92%|█████████▏| 2550/2774 [8:22:47<45:47, 12.27s/it] 92%|█████████▏| 2551/2774 [8:22:59<44:46, 12.05s/it] {'loss': 1.0186, 'learning_rate': 8.430605771830941e-08, 'epoch': 0.92} 92%|█████████▏| 2551/2774 [8:22:59<44:46, 12.05s/it] 92%|█████████▏| 2552/2774 [8:23:11<45:11, 12.21s/it] {'loss': 0.9834, 'learning_rate': 8.355587575253732e-08, 'epoch': 0.92} 92%|█████████▏| 2552/2774 [8:23:11<45:11, 12.21s/it] 92%|█████████▏| 2553/2774 [8:23:22<43:59, 11.94s/it] {'loss': 0.9927, 'learning_rate': 8.280898967057805e-08, 'epoch': 0.92} 92%|█████████▏| 2553/2774 [8:23:22<43:59, 11.94s/it] 92%|█████████▏| 2554/2774 [8:23:34<43:41, 11.92s/it] {'loss': 1.0493, 'learning_rate': 8.206540049113781e-08, 'epoch': 0.92} 92%|█████████▏| 2554/2774 [8:23:34<43:41, 
11.92s/it] 92%|█████████▏| 2555/2774 [8:23:46<43:29, 11.92s/it] {'loss': 0.9897, 'learning_rate': 8.132510922842812e-08, 'epoch': 0.92} 92%|█████████▏| 2555/2774 [8:23:46<43:29, 11.92s/it] 92%|█████████▏| 2556/2774 [8:23:58<42:40, 11.74s/it] {'loss': 1.0137, 'learning_rate': 8.0588116892161e-08, 'epoch': 0.92} 92%|█████████▏| 2556/2774 [8:23:58<42:40, 11.74s/it] 92%|█████████▏| 2557/2774 [8:24:11<44:45, 12.37s/it] {'loss': 1.02, 'learning_rate': 7.985442448755015e-08, 'epoch': 0.92} 92%|█████████▏| 2557/2774 [8:24:11<44:45, 12.37s/it] 92%|█████████▏| 2558/2774 [8:24:23<43:12, 12.00s/it] {'loss': 1.0171, 'learning_rate': 7.912403301530703e-08, 'epoch': 0.92} 92%|█████████▏| 2558/2774 [8:24:23<43:12, 12.00s/it] 92%|█████████▏| 2559/2774 [8:24:34<42:31, 11.87s/it] {'loss': 1.0571, 'learning_rate': 7.839694347164223e-08, 'epoch': 0.92} 92%|█████████▏| 2559/2774 [8:24:34<42:31, 11.87s/it] 92%|█████████▏| 2560/2774 [8:24:46<42:38, 11.95s/it] {'loss': 1.0679, 'learning_rate': 7.767315684826138e-08, 'epoch': 0.92} 92%|█████████▏| 2560/2774 [8:24:46<42:38, 11.95s/it] 92%|█████████▏| 2561/2774 [8:24:58<42:11, 11.88s/it] {'loss': 1.0547, 'learning_rate': 7.695267413236562e-08, 'epoch': 0.92} 92%|█████████▏| 2561/2774 [8:24:58<42:11, 11.88s/it] 92%|█████████▏| 2562/2774 [8:25:10<42:29, 12.03s/it] {'loss': 1.0122, 'learning_rate': 7.623549630665056e-08, 'epoch': 0.92} 92%|█████████▏| 2562/2774 [8:25:10<42:29, 12.03s/it] 92%|█████████▏| 2563/2774 [8:25:25<44:52, 12.76s/it] {'loss': 0.9458, 'learning_rate': 7.552162434930288e-08, 'epoch': 0.92} 92%|█████████▏| 2563/2774 [8:25:25<44:52, 12.76s/it] 92%|█████████▏| 2564/2774 [8:25:36<43:09, 12.33s/it] {'loss': 1.0293, 'learning_rate': 7.481105923400039e-08, 'epoch': 0.92} 92%|█████████▏| 2564/2774 [8:25:36<43:09, 12.33s/it] 92%|█████████▏| 2565/2774 [8:25:48<42:03, 12.08s/it] {'loss': 0.9995, 'learning_rate': 7.410380192991202e-08, 'epoch': 0.92} 92%|█████████▏| 2565/2774 [8:25:48<42:03, 12.08s/it] 93%|█████████▎| 2566/2774 
[8:25:59<41:31, 11.98s/it] {'loss': 1.0171, 'learning_rate': 7.339985340169359e-08, 'epoch': 0.93} 93%|█████████▎| 2566/2774 [8:25:59<41:31, 11.98s/it] 93%|█████████▎| 2567/2774 [8:26:14<43:41, 12.67s/it] {'loss': 1.0171, 'learning_rate': 7.269921460948764e-08, 'epoch': 0.93} 93%|█████████▎| 2567/2774 [8:26:14<43:41, 12.67s/it] 93%|█████████▎| 2568/2774 [8:26:25<42:32, 12.39s/it] {'loss': 1.0117, 'learning_rate': 7.200188650892448e-08, 'epoch': 0.93} 93%|█████████▎| 2568/2774 [8:26:25<42:32, 12.39s/it] 93%|█████████▎| 2569/2774 [8:26:37<41:28, 12.14s/it] {'loss': 1.0059, 'learning_rate': 7.130787005111605e-08, 'epoch': 0.93} 93%|█████████▎| 2569/2774 [8:26:37<41:28, 12.14s/it] 93%|█████████▎| 2570/2774 [8:26:49<40:43, 11.98s/it] {'loss': 1.0488, 'learning_rate': 7.061716618266018e-08, 'epoch': 0.93} 93%|█████████▎| 2570/2774 [8:26:49<40:43, 11.98s/it] 93%|█████████▎| 2571/2774 [8:27:00<39:50, 11.78s/it] {'loss': 0.9976, 'learning_rate': 6.992977584563465e-08, 'epoch': 0.93} 93%|█████████▎| 2571/2774 [8:27:00<39:50, 11.78s/it] 93%|█████████▎| 2572/2774 [8:27:13<41:12, 12.24s/it] {'loss': 0.9805, 'learning_rate': 6.92456999775984e-08, 'epoch': 0.93} 93%|█████████▎| 2572/2774 [8:27:13<41:12, 12.24s/it] 93%|█████████▎| 2573/2774 [8:27:25<40:26, 12.07s/it] {'loss': 1.1016, 'learning_rate': 6.856493951158949e-08, 'epoch': 0.93} 93%|█████████▎| 2573/2774 [8:27:25<40:26, 12.07s/it] 93%|█████████▎| 2574/2774 [8:27:36<39:36, 11.88s/it] {'loss': 1.0166, 'learning_rate': 6.788749537612411e-08, 'epoch': 0.93} 93%|█████████▎| 2574/2774 [8:27:36<39:36, 11.88s/it] 93%|█████████▎| 2575/2774 [8:27:48<39:02, 11.77s/it] {'loss': 1.0098, 'learning_rate': 6.721336849519505e-08, 'epoch': 0.93} 93%|█████████▎| 2575/2774 [8:27:48<39:02, 11.77s/it] 93%|█████████▎| 2576/2774 [8:27:59<38:31, 11.68s/it] {'loss': 1.0347, 'learning_rate': 6.654255978827101e-08, 'epoch': 0.93} 93%|█████████▎| 2576/2774 [8:27:59<38:31, 11.68s/it] 93%|█████████▎| 2577/2774 [8:28:11<38:22, 11.69s/it] {'loss': 
0.9604, 'learning_rate': 6.587507017029427e-08, 'epoch': 0.93} 93%|█████████▎| 2577/2774 [8:28:11<38:22, 11.69s/it] 93%|█████████▎| 2578/2774 [8:28:23<38:06, 11.67s/it] {'loss': 0.9731, 'learning_rate': 6.521090055168044e-08, 'epoch': 0.93} 93%|█████████▎| 2578/2774 [8:28:23<38:06, 11.67s/it] 93%|█████████▎| 2579/2774 [8:28:35<39:09, 12.05s/it] {'loss': 1.0361, 'learning_rate': 6.455005183831659e-08, 'epoch': 0.93} 93%|█████████▎| 2579/2774 [8:28:35<39:09, 12.05s/it] 93%|█████████▎| 2580/2774 [8:28:47<38:14, 11.83s/it] {'loss': 1.0073, 'learning_rate': 6.389252493156084e-08, 'epoch': 0.93} 93%|█████████▎| 2580/2774 [8:28:47<38:14, 11.83s/it] 93%|█████████▎| 2581/2774 [8:28:58<37:35, 11.69s/it] {'loss': 1.0244, 'learning_rate': 6.323832072823971e-08, 'epoch': 0.93} 93%|█████████▎| 2581/2774 [8:28:58<37:35, 11.69s/it] 93%|█████████▎| 2582/2774 [8:29:10<37:27, 11.71s/it] {'loss': 0.9639, 'learning_rate': 6.258744012064833e-08, 'epoch': 0.93} 93%|█████████▎| 2582/2774 [8:29:10<37:27, 11.71s/it] 93%|█████████▎| 2583/2774 [8:29:21<37:07, 11.66s/it] {'loss': 1.0171, 'learning_rate': 6.193988399654849e-08, 'epoch': 0.93} 93%|█████████▎| 2583/2774 [8:29:21<37:07, 11.66s/it] 93%|█████████▎| 2584/2774 [8:29:33<37:05, 11.71s/it] {'loss': 0.9917, 'learning_rate': 6.129565323916814e-08, 'epoch': 0.93} 93%|█████████▎| 2584/2774 [8:29:33<37:05, 11.71s/it] 93%|█████████▎| 2585/2774 [8:29:45<37:01, 11.75s/it] {'loss': 1.103, 'learning_rate': 6.065474872719856e-08, 'epoch': 0.93} 93%|█████████▎| 2585/2774 [8:29:45<37:01, 11.75s/it] 93%|█████████▎| 2586/2774 [8:29:56<36:16, 11.58s/it] {'loss': 1.0117, 'learning_rate': 6.001717133479496e-08, 'epoch': 0.93} 93%|█████████▎| 2586/2774 [8:29:56<36:16, 11.58s/it] 93%|█████████▎| 2587/2774 [8:30:08<35:44, 11.47s/it] {'loss': 1.021, 'learning_rate': 5.938292193157419e-08, 'epoch': 0.93} 93%|█████████▎| 2587/2774 [8:30:08<35:44, 11.47s/it] 93%|█████████▎| 2588/2774 [8:30:19<35:25, 11.43s/it] {'loss': 1.0669, 'learning_rate': 
5.875200138261428e-08, 'epoch': 0.93} 93%|█████████▎| 2588/2774 [8:30:19<35:25, 11.43s/it] 93%|█████████▎| 2589/2774 [8:30:30<34:59, 11.35s/it] {'loss': 1.0122, 'learning_rate': 5.812441054845325e-08, 'epoch': 0.93} 93%|█████████▎| 2589/2774 [8:30:30<34:59, 11.35s/it] 93%|█████████▎| 2590/2774 [8:30:41<34:49, 11.35s/it] {'loss': 1.0249, 'learning_rate': 5.7500150285086376e-08, 'epoch': 0.93} 93%|█████████▎| 2590/2774 [8:30:41<34:49, 11.35s/it] 93%|█████████▎| 2591/2774 [8:30:53<34:32, 11.32s/it] {'loss': 0.979, 'learning_rate': 5.6879221443967016e-08, 'epoch': 0.93} 93%|█████████▎| 2591/2774 [8:30:53<34:32, 11.32s/it] 93%|█████████▎| 2592/2774 [8:31:04<34:27, 11.36s/it] {'loss': 0.9678, 'learning_rate': 5.626162487200465e-08, 'epoch': 0.93} 93%|█████████▎| 2592/2774 [8:31:04<34:27, 11.36s/it] 93%|█████████▎| 2593/2774 [8:31:16<34:34, 11.46s/it] {'loss': 0.9839, 'learning_rate': 5.564736141156407e-08, 'epoch': 0.93} 93%|█████████▎| 2593/2774 [8:31:16<34:34, 11.46s/it] 94%|█████████▎| 2594/2774 [8:31:27<34:10, 11.39s/it] {'loss': 1.0205, 'learning_rate': 5.503643190046315e-08, 'epoch': 0.94} 94%|█████████▎| 2594/2774 [8:31:27<34:10, 11.39s/it] 94%|█████████▎| 2595/2774 [8:31:38<34:00, 11.40s/it] {'loss': 1.042, 'learning_rate': 5.4428837171973114e-08, 'epoch': 0.94} 94%|█████████▎| 2595/2774 [8:31:38<34:00, 11.40s/it] 94%|█████████▎| 2596/2774 [8:31:50<34:07, 11.50s/it] {'loss': 0.9409, 'learning_rate': 5.382457805481606e-08, 'epoch': 0.94} 94%|█████████▎| 2596/2774 [8:31:50<34:07, 11.50s/it] 94%|█████████▎| 2597/2774 [8:32:02<34:00, 11.53s/it] {'loss': 1.0264, 'learning_rate': 5.322365537316549e-08, 'epoch': 0.94} 94%|█████████▎| 2597/2774 [8:32:02<34:00, 11.53s/it] 94%|█████████▎| 2598/2774 [8:32:13<33:56, 11.57s/it] {'loss': 1.0366, 'learning_rate': 5.2626069946643264e-08, 'epoch': 0.94} 94%|█████████▎| 2598/2774 [8:32:13<33:56, 11.57s/it] 94%|█████████▎| 2599/2774 [8:32:25<33:37, 11.53s/it] {'loss': 1.0552, 'learning_rate': 5.2031822590319636e-08, 'epoch': 0.94} 
94%|█████████▎| 2599/2774 [8:32:25<33:37, 11.53s/it] 94%|█████████▎| 2600/2774 [8:32:36<33:08, 11.43s/it] {'loss': 1.0391, 'learning_rate': 5.144091411471236e-08, 'epoch': 0.94} 94%|█████████▎| 2600/2774 [8:32:36<33:08, 11.43s/it] 94%|█████████▍| 2601/2774 [8:32:47<32:55, 11.42s/it] {'loss': 1.0376, 'learning_rate': 5.0853345325785064e-08, 'epoch': 0.94} 94%|█████████▍| 2601/2774 [8:32:47<32:55, 11.42s/it] 94%|█████████▍| 2602/2774 [8:33:00<33:17, 11.61s/it] {'loss': 1.0283, 'learning_rate': 5.026911702494558e-08, 'epoch': 0.94} 94%|█████████▍| 2602/2774 [8:33:00<33:17, 11.61s/it] 94%|█████████▍| 2603/2774 [8:33:11<33:15, 11.67s/it] {'loss': 0.9492, 'learning_rate': 4.968823000904649e-08, 'epoch': 0.94} 94%|█████████▍| 2603/2774 [8:33:11<33:15, 11.67s/it] 94%|█████████▍| 2604/2774 [8:33:23<33:12, 11.72s/it] {'loss': 1.0137, 'learning_rate': 4.911068507038236e-08, 'epoch': 0.94} 94%|█████████▍| 2604/2774 [8:33:23<33:12, 11.72s/it] 94%|█████████▍| 2605/2774 [8:33:35<32:49, 11.66s/it] {'loss': 1.0508, 'learning_rate': 4.8536482996690004e-08, 'epoch': 0.94} 94%|█████████▍| 2605/2774 [8:33:35<32:49, 11.66s/it] 94%|█████████▍| 2606/2774 [8:33:46<32:33, 11.63s/it] {'loss': 1.0205, 'learning_rate': 4.796562457114573e-08, 'epoch': 0.94} 94%|█████████▍| 2606/2774 [8:33:46<32:33, 11.63s/it] 94%|█████████▍| 2607/2774 [8:34:00<33:49, 12.15s/it] {'loss': 1.0347, 'learning_rate': 4.739811057236615e-08, 'epoch': 0.94} 94%|█████████▍| 2607/2774 [8:34:00<33:49, 12.15s/it] 94%|█████████▍| 2608/2774 [8:34:12<33:29, 12.11s/it] {'loss': 1.0107, 'learning_rate': 4.6833941774406535e-08, 'epoch': 0.94} 94%|█████████▍| 2608/2774 [8:34:12<33:29, 12.11s/it] 94%|█████████▍| 2609/2774 [8:34:23<32:34, 11.85s/it] {'loss': 1.0088, 'learning_rate': 4.627311894675857e-08, 'epoch': 0.94} 94%|█████████▍| 2609/2774 [8:34:23<32:34, 11.85s/it] 94%|█████████▍| 2610/2774 [8:34:35<32:15, 11.80s/it] {'loss': 0.9668, 'learning_rate': 4.5715642854350374e-08, 'epoch': 0.94} 94%|█████████▍| 2610/2774 
[8:34:35<32:15, 11.80s/it] 94%|█████████▍| 2611/2774 [8:34:46<31:48, 11.71s/it] {'loss': 1.0352, 'learning_rate': 4.5161514257546504e-08, 'epoch': 0.94} 94%|█████████▍| 2611/2774 [8:34:46<31:48, 11.71s/it] 94%|█████████▍| 2612/2774 [8:34:57<31:16, 11.59s/it] {'loss': 1.0474, 'learning_rate': 4.46107339121446e-08, 'epoch': 0.94} 94%|█████████▍| 2612/2774 [8:34:57<31:16, 11.59s/it] 94%|█████████▍| 2613/2774 [8:35:09<30:53, 11.51s/it] {'loss': 1.0366, 'learning_rate': 4.406330256937541e-08, 'epoch': 0.94} 94%|█████████▍| 2613/2774 [8:35:09<30:53, 11.51s/it] 94%|█████████▍| 2614/2774 [8:35:20<30:51, 11.57s/it] {'loss': 1.0381, 'learning_rate': 4.3519220975902775e-08, 'epoch': 0.94} 94%|█████████▍| 2614/2774 [8:35:20<30:51, 11.57s/it] 94%|█████████▍| 2615/2774 [8:35:32<30:55, 11.67s/it] {'loss': 1.0034, 'learning_rate': 4.297848987382031e-08, 'epoch': 0.94} 94%|█████████▍| 2615/2774 [8:35:32<30:55, 11.67s/it] 94%|█████████▍| 2616/2774 [8:35:45<31:21, 11.91s/it] {'loss': 1.0098, 'learning_rate': 4.2441110000653596e-08, 'epoch': 0.94} 94%|█████████▍| 2616/2774 [8:35:45<31:21, 11.91s/it] 94%|█████████▍| 2617/2774 [8:35:56<30:40, 11.72s/it] {'loss': 1.0288, 'learning_rate': 4.190708208935579e-08, 'epoch': 0.94} 94%|█████████▍| 2617/2774 [8:35:56<30:40, 11.72s/it] 94%|█████████▍| 2618/2774 [8:36:08<30:23, 11.69s/it] {'loss': 1.0146, 'learning_rate': 4.1376406868308684e-08, 'epoch': 0.94} 94%|█████████▍| 2618/2774 [8:36:08<30:23, 11.69s/it] 94%|█████████▍| 2619/2774 [8:36:21<31:40, 12.26s/it] {'loss': 0.9351, 'learning_rate': 4.084908506132107e-08, 'epoch': 0.94} 94%|█████████▍| 2619/2774 [8:36:21<31:40, 12.26s/it] 94%|█████████▍| 2620/2774 [8:36:33<30:55, 12.05s/it] {'loss': 0.9795, 'learning_rate': 4.0325117387628455e-08, 'epoch': 0.94} 94%|█████████▍| 2620/2774 [8:36:33<30:55, 12.05s/it] 94%|█████████▍| 2621/2774 [8:36:44<30:09, 11.82s/it] {'loss': 1.0723, 'learning_rate': 3.9804504561890554e-08, 'epoch': 0.94} 94%|█████████▍| 2621/2774 [8:36:44<30:09, 11.82s/it] 
95%|█████████▍| 2622/2774 [8:36:55<29:35, 11.68s/it] {'loss': 1.0889, 'learning_rate': 3.928724729419242e-08, 'epoch': 0.95} 95%|█████████▍| 2622/2774 [8:36:55<29:35, 11.68s/it] 95%|█████████▍| 2623/2774 [8:37:07<29:11, 11.60s/it] {'loss': 1.0122, 'learning_rate': 3.877334629004109e-08, 'epoch': 0.95} 95%|█████████▍| 2623/2774 [8:37:07<29:11, 11.60s/it] 95%|█████████▍| 2624/2774 [8:37:18<28:47, 11.52s/it] {'loss': 1.0381, 'learning_rate': 3.826280225036727e-08, 'epoch': 0.95} 95%|█████████▍| 2624/2774 [8:37:18<28:47, 11.52s/it] 95%|█████████▍| 2625/2774 [8:37:30<28:54, 11.64s/it] {'loss': 0.98, 'learning_rate': 3.7755615871521434e-08, 'epoch': 0.95} 95%|█████████▍| 2625/2774 [8:37:30<28:54, 11.64s/it] 95%|█████████▍| 2626/2774 [8:37:42<28:37, 11.60s/it] {'loss': 1.0737, 'learning_rate': 3.725178784527578e-08, 'epoch': 0.95} 95%|█████████▍| 2626/2774 [8:37:42<28:37, 11.60s/it] 95%|█████████▍| 2627/2774 [8:37:53<28:33, 11.65s/it] {'loss': 1.0532, 'learning_rate': 3.6751318858820885e-08, 'epoch': 0.95} 95%|█████████▍| 2627/2774 [8:37:53<28:33, 11.65s/it] 95%|█████████▍| 2628/2774 [8:38:05<28:23, 11.67s/it] {'loss': 1.0146, 'learning_rate': 3.625420959476628e-08, 'epoch': 0.95} 95%|█████████▍| 2628/2774 [8:38:05<28:23, 11.67s/it] 95%|█████████▍| 2629/2774 [8:38:16<27:51, 11.53s/it] {'loss': 1.0215, 'learning_rate': 3.576046073113903e-08, 'epoch': 0.95} 95%|█████████▍| 2629/2774 [8:38:16<27:51, 11.53s/it] 95%|█████████▍| 2630/2774 [8:38:28<27:44, 11.56s/it] {'loss': 1.1143, 'learning_rate': 3.5270072941382684e-08, 'epoch': 0.95} 95%|█████████▍| 2630/2774 [8:38:28<27:44, 11.56s/it] 95%|█████████▍| 2631/2774 [8:38:40<27:33, 11.56s/it] {'loss': 1.0278, 'learning_rate': 3.4783046894356906e-08, 'epoch': 0.95} 95%|█████████▍| 2631/2774 [8:38:40<27:33, 11.56s/it] 95%|█████████▍| 2632/2774 [8:38:51<27:11, 11.49s/it] {'loss': 1.0132, 'learning_rate': 3.429938325433507e-08, 'epoch': 0.95} 95%|█████████▍| 2632/2774 [8:38:51<27:11, 11.49s/it] 95%|█████████▍| 2633/2774 
[8:39:02<27:05, 11.53s/it] {'loss': 1.0474, 'learning_rate': 3.3819082681006145e-08, 'epoch': 0.95} 95%|█████████▍| 2633/2774 [8:39:02<27:05, 11.53s/it] 95%|█████████▍| 2634/2774 [8:39:14<27:00, 11.57s/it] {'loss': 1.0225, 'learning_rate': 3.334214582946998e-08, 'epoch': 0.95} 95%|█████████▍| 2634/2774 [8:39:14<27:00, 11.57s/it] 95%|█████████▍| 2635/2774 [8:39:25<26:40, 11.51s/it] {'loss': 1.0254, 'learning_rate': 3.2868573350240687e-08, 'epoch': 0.95} 95%|█████████▍| 2635/2774 [8:39:25<26:40, 11.51s/it] 95%|█████████▌| 2636/2774 [8:39:37<26:27, 11.50s/it] {'loss': 1.0649, 'learning_rate': 3.239836588924211e-08, 'epoch': 0.95} 95%|█████████▌| 2636/2774 [8:39:37<26:27, 11.50s/it] 95%|█████████▌| 2637/2774 [8:39:49<26:22, 11.55s/it] {'loss': 1.0508, 'learning_rate': 3.1931524087808476e-08, 'epoch': 0.95} 95%|█████████▌| 2637/2774 [8:39:49<26:22, 11.55s/it] 95%|█████████▌| 2638/2774 [8:40:01<26:31, 11.70s/it] {'loss': 0.9888, 'learning_rate': 3.146804858268404e-08, 'epoch': 0.95} 95%|█████████▌| 2638/2774 [8:40:01<26:31, 11.70s/it] 95%|█████████▌| 2639/2774 [8:40:12<26:06, 11.60s/it] {'loss': 1.0625, 'learning_rate': 3.100794000602175e-08, 'epoch': 0.95} 95%|█████████▌| 2639/2774 [8:40:12<26:06, 11.60s/it] 95%|█████████▌| 2640/2774 [8:40:25<26:59, 12.08s/it] {'loss': 1.0308, 'learning_rate': 3.0551198985381284e-08, 'epoch': 0.95} 95%|█████████▌| 2640/2774 [8:40:25<26:59, 12.08s/it] 95%|█████████▌| 2641/2774 [8:40:37<26:19, 11.87s/it] {'loss': 1.0068, 'learning_rate': 3.0097826143730414e-08, 'epoch': 0.95} 95%|█████████▌| 2641/2774 [8:40:37<26:19, 11.87s/it] 95%|█████████▌| 2642/2774 [8:40:48<26:00, 11.82s/it] {'loss': 1.043, 'learning_rate': 2.9647822099442004e-08, 'epoch': 0.95} 95%|█████████▌| 2642/2774 [8:40:48<26:00, 11.82s/it] 95%|█████████▌| 2643/2774 [8:41:00<25:39, 11.75s/it] {'loss': 1.0122, 'learning_rate': 2.9201187466294246e-08, 'epoch': 0.95} 95%|█████████▌| 2643/2774 [8:41:00<25:39, 11.75s/it] 95%|█████████▌| 2644/2774 [8:41:11<25:19, 11.69s/it] {'loss': 
1.0146, 'learning_rate': 2.8757922853470123e-08, 'epoch': 0.95} 95%|█████████▌| 2644/2774 [8:41:11<25:19, 11.69s/it] 95%|█████████▌| 2645/2774 [8:41:23<24:53, 11.58s/it] {'loss': 0.9844, 'learning_rate': 2.8318028865555736e-08, 'epoch': 0.95} 95%|█████████▌| 2645/2774 [8:41:23<24:53, 11.58s/it] 95%|█████████▌| 2646/2774 [8:41:34<24:36, 11.53s/it] {'loss': 1.0308, 'learning_rate': 2.788150610253948e-08, 'epoch': 0.95} 95%|█████████▌| 2646/2774 [8:41:34<24:36, 11.53s/it] 95%|█████████▌| 2647/2774 [8:41:46<24:25, 11.54s/it] {'loss': 1.0068, 'learning_rate': 2.7448355159812313e-08, 'epoch': 0.95} 95%|█████████▌| 2647/2774 [8:41:46<24:25, 11.54s/it] 95%|█████████▌| 2648/2774 [8:41:57<24:02, 11.45s/it] {'loss': 1.0127, 'learning_rate': 2.7018576628166095e-08, 'epoch': 0.95} 95%|█████████▌| 2648/2774 [8:41:57<24:02, 11.45s/it] 95%|█████████▌| 2649/2774 [8:42:08<23:50, 11.44s/it] {'loss': 0.9707, 'learning_rate': 2.659217109379275e-08, 'epoch': 0.95} 95%|█████████▌| 2649/2774 [8:42:08<23:50, 11.44s/it] 96%|█████████▌| 2650/2774 [8:42:20<23:38, 11.44s/it] {'loss': 1.0376, 'learning_rate': 2.616913913828373e-08, 'epoch': 0.96} 96%|█████████▌| 2650/2774 [8:42:20<23:38, 11.44s/it] 96%|█████████▌| 2651/2774 [8:42:31<23:21, 11.39s/it] {'loss': 1.0474, 'learning_rate': 2.574948133862887e-08, 'epoch': 0.96} 96%|█████████▌| 2651/2774 [8:42:31<23:21, 11.39s/it] 96%|█████████▌| 2652/2774 [8:42:42<23:07, 11.38s/it] {'loss': 0.9883, 'learning_rate': 2.5333198267215862e-08, 'epoch': 0.96} 96%|█████████▌| 2652/2774 [8:42:42<23:07, 11.38s/it] 96%|█████████▌| 2653/2774 [8:42:54<23:02, 11.42s/it] {'loss': 1.0703, 'learning_rate': 2.4920290491830257e-08, 'epoch': 0.96} 96%|█████████▌| 2653/2774 [8:42:54<23:02, 11.42s/it] 96%|█████████▌| 2654/2774 [8:43:06<22:58, 11.49s/it] {'loss': 1.0127, 'learning_rate': 2.4510758575652937e-08, 'epoch': 0.96} 96%|█████████▌| 2654/2774 [8:43:06<22:58, 11.49s/it] 96%|█████████▌| 2655/2774 [8:43:18<23:32, 11.87s/it] {'loss': 1.0249, 'learning_rate': 
2.4104603077260703e-08, 'epoch': 0.96} 96%|█████████▌| 2655/2774 [8:43:18<23:32, 11.87s/it] 96%|█████████▌| 2656/2774 [8:43:30<23:25, 11.91s/it] {'loss': 1.0234, 'learning_rate': 2.3701824550624864e-08, 'epoch': 0.96} 96%|█████████▌| 2656/2774 [8:43:30<23:25, 11.91s/it] 96%|█████████▌| 2657/2774 [8:43:44<24:22, 12.50s/it] {'loss': 1.0039, 'learning_rate': 2.3302423545111807e-08, 'epoch': 0.96} 96%|█████████▌| 2657/2774 [8:43:44<24:22, 12.50s/it] 96%|█████████▌| 2658/2774 [8:43:56<23:33, 12.19s/it] {'loss': 1.0308, 'learning_rate': 2.2906400605479663e-08, 'epoch': 0.96} 96%|█████████▌| 2658/2774 [8:43:56<23:33, 12.19s/it] 96%|█████████▌| 2659/2774 [8:44:07<22:58, 11.98s/it] {'loss': 1.0884, 'learning_rate': 2.251375627187996e-08, 'epoch': 0.96} 96%|█████████▌| 2659/2774 [8:44:07<22:58, 11.98s/it] 96%|█████████▌| 2660/2774 [8:44:18<22:19, 11.75s/it] {'loss': 1.0508, 'learning_rate': 2.212449107985598e-08, 'epoch': 0.96} 96%|█████████▌| 2660/2774 [8:44:18<22:19, 11.75s/it] 96%|█████████▌| 2661/2774 [8:44:31<22:33, 11.98s/it] {'loss': 0.9922, 'learning_rate': 2.173860556034163e-08, 'epoch': 0.96} 96%|█████████▌| 2661/2774 [8:44:31<22:33, 11.98s/it] 96%|█████████▌| 2662/2774 [8:44:42<21:55, 11.74s/it] {'loss': 1.0791, 'learning_rate': 2.1356100239662002e-08, 'epoch': 0.96} 96%|█████████▌| 2662/2774 [8:44:42<21:55, 11.74s/it] 96%|█████████▌| 2663/2774 [8:44:54<21:47, 11.78s/it] {'loss': 0.9834, 'learning_rate': 2.0976975639530606e-08, 'epoch': 0.96} 96%|█████████▌| 2663/2774 [8:44:54<21:47, 11.78s/it] 96%|█████████▌| 2664/2774 [8:45:06<21:26, 11.70s/it] {'loss': 0.9858, 'learning_rate': 2.060123227705102e-08, 'epoch': 0.96} 96%|█████████▌| 2664/2774 [8:45:06<21:26, 11.70s/it] 96%|█████████▌| 2665/2774 [8:45:17<21:02, 11.59s/it] {'loss': 1.0151, 'learning_rate': 2.0228870664714128e-08, 'epoch': 0.96} 96%|█████████▌| 2665/2774 [8:45:17<21:02, 11.59s/it] 96%|█████████▌| 2666/2774 [8:45:28<20:41, 11.49s/it] {'loss': 1.04, 'learning_rate': 1.9859891310398948e-08, 'epoch': 
0.96} 96%|█████████▌| 2666/2774 [8:45:28<20:41, 11.49s/it] 96%|█████████▌| 2667/2774 [8:45:40<20:26, 11.46s/it] {'loss': 1.0615, 'learning_rate': 1.9494294717370964e-08, 'epoch': 0.96} 96%|█████████▌| 2667/2774 [8:45:40<20:26, 11.46s/it] 96%|█████████▌| 2668/2774 [8:45:51<20:18, 11.50s/it] {'loss': 1.0225, 'learning_rate': 1.9132081384281575e-08, 'epoch': 0.96} 96%|█████████▌| 2668/2774 [8:45:51<20:18, 11.50s/it] 96%|█████████▌| 2669/2774 [8:46:03<20:07, 11.50s/it] {'loss': 1.0356, 'learning_rate': 1.8773251805168092e-08, 'epoch': 0.96} 96%|█████████▌| 2669/2774 [8:46:03<20:07, 11.50s/it] 96%|█████████▋| 2670/2774 [8:46:14<19:55, 11.50s/it] {'loss': 1.0586, 'learning_rate': 1.8417806469452626e-08, 'epoch': 0.96} 96%|█████████▋| 2670/2774 [8:46:14<19:55, 11.50s/it] 96%|█████████▋| 2671/2774 [8:46:26<19:41, 11.47s/it] {'loss': 1.0811, 'learning_rate': 1.806574586194071e-08, 'epoch': 0.96} 96%|█████████▋| 2671/2774 [8:46:26<19:41, 11.47s/it] 96%|█████████▋| 2672/2774 [8:46:38<19:58, 11.75s/it] {'loss': 0.9658, 'learning_rate': 1.7717070462822116e-08, 'epoch': 0.96} 96%|█████████▋| 2672/2774 [8:46:38<19:58, 11.75s/it] 96%|█████████▋| 2673/2774 [8:46:50<19:41, 11.70s/it] {'loss': 1.0117, 'learning_rate': 1.7371780747668655e-08, 'epoch': 0.96} 96%|█████████▋| 2673/2774 [8:46:50<19:41, 11.70s/it] 96%|█████████▋| 2674/2774 [8:47:01<19:31, 11.72s/it] {'loss': 1.1084, 'learning_rate': 1.7029877187434986e-08, 'epoch': 0.96} 96%|█████████▋| 2674/2774 [8:47:01<19:31, 11.72s/it] 96%|█████████▋| 2675/2774 [8:47:13<19:11, 11.63s/it] {'loss': 1.0889, 'learning_rate': 1.6691360248456412e-08, 'epoch': 0.96} 96%|█████████▋| 2675/2774 [8:47:13<19:11, 11.63s/it] 96%|█████████▋| 2676/2774 [8:47:26<19:57, 12.22s/it] {'loss': 0.9692, 'learning_rate': 1.6356230392450268e-08, 'epoch': 0.96} 96%|█████████▋| 2676/2774 [8:47:26<19:57, 12.22s/it] 97%|█████████▋| 2677/2774 [8:47:37<19:14, 11.90s/it] {'loss': 1.022, 'learning_rate': 1.6024488076512855e-08, 'epoch': 0.97} 97%|█████████▋| 2677/2774 
[8:47:37<19:14, 11.90s/it] 97%|█████████▋| 2678/2774 [8:47:49<18:42, 11.70s/it] {'loss': 0.9805, 'learning_rate': 1.5696133753121124e-08, 'epoch': 0.97} 97%|█████████▋| 2678/2774 [8:47:49<18:42, 11.70s/it] 97%|█████████▋| 2679/2774 [8:48:00<18:20, 11.59s/it] {'loss': 1.0249, 'learning_rate': 1.5371167870130433e-08, 'epoch': 0.97} 97%|█████████▋| 2679/2774 [8:48:00<18:20, 11.59s/it] 97%|█████████▋| 2680/2774 [8:48:11<18:06, 11.56s/it] {'loss': 1.0605, 'learning_rate': 1.504959087077429e-08, 'epoch': 0.97} 97%|█████████▋| 2680/2774 [8:48:11<18:06, 11.56s/it] 97%|█████████▋| 2681/2774 [8:48:23<17:59, 11.61s/it] {'loss': 1.0854, 'learning_rate': 1.473140319366434e-08, 'epoch': 0.97} 97%|█████████▋| 2681/2774 [8:48:23<17:59, 11.61s/it] 97%|█████████▋| 2682/2774 [8:48:35<17:39, 11.52s/it] {'loss': 0.998, 'learning_rate': 1.4416605272789819e-08, 'epoch': 0.97} 97%|█████████▋| 2682/2774 [8:48:35<17:39, 11.52s/it] 97%|█████████▋| 2683/2774 [8:48:46<17:38, 11.63s/it] {'loss': 1.0264, 'learning_rate': 1.4105197537515602e-08, 'epoch': 0.97} 97%|█████████▋| 2683/2774 [8:48:46<17:38, 11.63s/it] 97%|█████████▋| 2684/2774 [8:48:58<17:17, 11.53s/it] {'loss': 0.9487, 'learning_rate': 1.3797180412583322e-08, 'epoch': 0.97} 97%|█████████▋| 2684/2774 [8:48:58<17:17, 11.53s/it] 97%|█████████▋| 2685/2774 [8:49:11<18:01, 12.15s/it] {'loss': 1.0137, 'learning_rate': 1.349255431810942e-08, 'epoch': 0.97} 97%|█████████▋| 2685/2774 [8:49:11<18:01, 12.15s/it] 97%|█████████▋| 2686/2774 [8:49:24<18:11, 12.40s/it] {'loss': 0.9824, 'learning_rate': 1.31913196695857e-08, 'epoch': 0.97} 97%|█████████▋| 2686/2774 [8:49:24<18:11, 12.40s/it] 97%|█████████▋| 2687/2774 [8:49:36<17:33, 12.11s/it] {'loss': 1.0454, 'learning_rate': 1.289347687787823e-08, 'epoch': 0.97} 97%|█████████▋| 2687/2774 [8:49:36<17:33, 12.11s/it] 97%|█████████▋| 2688/2774 [8:49:47<16:57, 11.84s/it] {'loss': 1.0156, 'learning_rate': 1.259902634922594e-08, 'epoch': 0.97} 97%|█████████▋| 2688/2774 [8:49:47<16:57, 11.84s/it] 
97%|█████████▋| 2689/2774 [8:49:59<16:40, 11.77s/it] {'loss': 1.0571, 'learning_rate': 1.2307968485242572e-08, 'epoch': 0.97} 97%|█████████▋| 2689/2774 [8:49:59<16:40, 11.77s/it] 97%|█████████▋| 2690/2774 [8:50:10<16:14, 11.60s/it] {'loss': 0.9727, 'learning_rate': 1.2020303682912237e-08, 'epoch': 0.97} 97%|█████████▋| 2690/2774 [8:50:10<16:14, 11.60s/it] 97%|█████████▋| 2691/2774 [8:50:22<16:09, 11.68s/it] {'loss': 1.0508, 'learning_rate': 1.1736032334593306e-08, 'epoch': 0.97} 97%|█████████▋| 2691/2774 [8:50:22<16:09, 11.68s/it] 97%|█████████▋| 2692/2774 [8:50:35<16:37, 12.17s/it] {'loss': 1.0239, 'learning_rate': 1.1455154828014515e-08, 'epoch': 0.97} 97%|█████████▋| 2692/2774 [8:50:35<16:37, 12.17s/it] 97%|█████████▋| 2693/2774 [8:50:46<16:05, 11.92s/it] {'loss': 0.979, 'learning_rate': 1.1177671546275526e-08, 'epoch': 0.97} 97%|█████████▋| 2693/2774 [8:50:46<16:05, 11.92s/it] 97%|█████████▋| 2694/2774 [8:50:58<15:44, 11.80s/it] {'loss': 1.0034, 'learning_rate': 1.090358286784693e-08, 'epoch': 0.97} 97%|█████████▋| 2694/2774 [8:50:58<15:44, 11.80s/it] 97%|█████████▋| 2695/2774 [8:51:10<15:34, 11.83s/it] {'loss': 1.1143, 'learning_rate': 1.0632889166569128e-08, 'epoch': 0.97} 97%|█████████▋| 2695/2774 [8:51:10<15:34, 11.83s/it] 97%|█████████▋| 2696/2774 [8:51:21<15:16, 11.76s/it] {'loss': 0.9917, 'learning_rate': 1.036559081165206e-08, 'epoch': 0.97} 97%|█████████▋| 2696/2774 [8:51:21<15:16, 11.76s/it] 97%|█████████▋| 2697/2774 [8:51:34<15:38, 12.19s/it] {'loss': 0.9907, 'learning_rate': 1.0101688167674372e-08, 'epoch': 0.97} 97%|█████████▋| 2697/2774 [8:51:34<15:38, 12.19s/it] 97%|█████████▋| 2698/2774 [8:51:46<15:11, 11.99s/it] {'loss': 1.0381, 'learning_rate': 9.841181594583693e-09, 'epoch': 0.97} 97%|█████████▋| 2698/2774 [8:51:46<15:11, 11.99s/it] 97%|█████████▋| 2699/2774 [8:51:57<14:41, 11.75s/it] {'loss': 1.0747, 'learning_rate': 9.584071447694688e-09, 'epoch': 0.97} 97%|█████████▋| 2699/2774 [8:51:57<14:41, 11.75s/it] 97%|█████████▋| 2700/2774 
[8:52:10<14:59, 12.16s/it] {'loss': 0.9819, 'learning_rate': 9.330358077690449e-09, 'epoch': 0.97} 97%|█████████▋| 2700/2774 [8:52:10<14:59, 12.16s/it] 97%|█████████▋| 2701/2774 [8:52:22<14:28, 11.90s/it] {'loss': 1.0171, 'learning_rate': 9.080041830620834e-09, 'epoch': 0.97} 97%|█████████▋| 2701/2774 [8:52:22<14:28, 11.90s/it] 97%|█████████▋| 2702/2774 [8:52:34<14:32, 12.12s/it] {'loss': 1.0112, 'learning_rate': 8.833123047901626e-09, 'epoch': 0.97} 97%|█████████▋| 2702/2774 [8:52:34<14:32, 12.12s/it] 97%|█████████▋| 2703/2774 [8:52:46<14:06, 11.93s/it] {'loss': 1.0132, 'learning_rate': 8.589602066315372e-09, 'epoch': 0.97} 97%|█████████▋| 2703/2774 [8:52:46<14:06, 11.93s/it] 97%|█████████▋| 2704/2774 [8:52:57<13:47, 11.82s/it] {'loss': 1.0112, 'learning_rate': 8.349479218009993e-09, 'epoch': 0.97} 97%|█████████▋| 2704/2774 [8:52:57<13:47, 11.82s/it] 98%|█████████▊| 2705/2774 [8:53:09<13:43, 11.94s/it] {'loss': 1.0278, 'learning_rate': 8.112754830498504e-09, 'epoch': 0.98} 98%|█████████▊| 2705/2774 [8:53:09<13:43, 11.94s/it] 98%|█████████▊| 2706/2774 [8:53:22<13:35, 12.00s/it] {'loss': 1.0117, 'learning_rate': 7.879429226658741e-09, 'epoch': 0.98} 98%|█████████▊| 2706/2774 [8:53:22<13:35, 12.00s/it] 98%|█████████▊| 2707/2774 [8:53:34<13:40, 12.25s/it] {'loss': 1.0044, 'learning_rate': 7.649502724732528e-09, 'epoch': 0.98} 98%|█████████▊| 2707/2774 [8:53:34<13:40, 12.25s/it] 98%|█████████▊| 2708/2774 [8:53:46<13:15, 12.05s/it] {'loss': 1.0103, 'learning_rate': 7.4229756383259465e-09, 'epoch': 0.98} 98%|█████████▊| 2708/2774 [8:53:46<13:15, 12.05s/it] 98%|█████████▊| 2709/2774 [8:53:58<12:52, 11.88s/it] {'loss': 1.0322, 'learning_rate': 7.1998482764082386e-09, 'epoch': 0.98} 98%|█████████▊| 2709/2774 [8:53:58<12:52, 11.88s/it] 98%|█████████▊| 2710/2774 [8:54:09<12:28, 11.70s/it] {'loss': 1.063, 'learning_rate': 6.980120943311519e-09, 'epoch': 0.98} 98%|█████████▊| 2710/2774 [8:54:09<12:28, 11.70s/it] 98%|█████████▊| 2711/2774 [8:54:21<12:22, 11.79s/it] {'loss': 
1.0371, 'learning_rate': 6.763793938730778e-09, 'epoch': 0.98} 98%|█████████▊| 2711/2774 [8:54:21<12:22, 11.79s/it] 98%|█████████▊| 2712/2774 [8:54:32<11:59, 11.61s/it] {'loss': 1.0015, 'learning_rate': 6.5508675577227735e-09, 'epoch': 0.98} 98%|█████████▊| 2712/2774 [8:54:32<11:59, 11.61s/it] 98%|█████████▊| 2713/2774 [8:54:46<12:26, 12.24s/it] {'loss': 0.9917, 'learning_rate': 6.341342090706304e-09, 'epoch': 0.98} 98%|█████████▊| 2713/2774 [8:54:46<12:26, 12.24s/it] 98%|█████████▊| 2714/2774 [8:54:57<12:03, 12.06s/it] {'loss': 0.9702, 'learning_rate': 6.1352178234613816e-09, 'epoch': 0.98} 98%|█████████▊| 2714/2774 [8:54:57<12:03, 12.06s/it] 98%|█████████▊| 2715/2774 [8:55:09<11:40, 11.87s/it] {'loss': 1.0659, 'learning_rate': 5.9324950371292264e-09, 'epoch': 0.98} 98%|█████████▊| 2715/2774 [8:55:09<11:40, 11.87s/it] 98%|█████████▊| 2716/2774 [8:55:20<11:18, 11.69s/it] {'loss': 1.0391, 'learning_rate': 5.733174008211717e-09, 'epoch': 0.98} 98%|█████████▊| 2716/2774 [8:55:20<11:18, 11.69s/it] 98%|█████████▊| 2717/2774 [8:55:31<10:58, 11.56s/it] {'loss': 0.9941, 'learning_rate': 5.537255008569997e-09, 'epoch': 0.98} 98%|█████████▊| 2717/2774 [8:55:31<10:58, 11.56s/it] 98%|█████████▊| 2718/2774 [8:55:45<11:19, 12.13s/it] {'loss': 1.0166, 'learning_rate': 5.3447383054261445e-09, 'epoch': 0.98} 98%|█████████▊| 2718/2774 [8:55:45<11:19, 12.13s/it] 98%|█████████▊| 2719/2774 [8:55:56<10:57, 11.95s/it] {'loss': 0.9854, 'learning_rate': 5.155624161361505e-09, 'epoch': 0.98} 98%|█████████▊| 2719/2774 [8:55:56<10:57, 11.95s/it] 98%|█████████▊| 2720/2774 [8:56:08<10:44, 11.94s/it] {'loss': 0.9702, 'learning_rate': 4.96991283431586e-09, 'epoch': 0.98} 98%|█████████▊| 2720/2774 [8:56:08<10:44, 11.94s/it] 98%|█████████▊| 2721/2774 [8:56:20<10:23, 11.76s/it] {'loss': 1.0581, 'learning_rate': 4.787604577588534e-09, 'epoch': 0.98} 98%|█████████▊| 2721/2774 [8:56:20<10:23, 11.76s/it] 98%|█████████▊| 2722/2774 [8:56:34<10:47, 12.45s/it] {'loss': 0.9575, 'learning_rate': 
4.608699639837288e-09, 'epoch': 0.98} 98%|█████████▊| 2722/2774 [8:56:34<10:47, 12.45s/it] 98%|█████████▊| 2723/2774 [8:56:45<10:17, 12.11s/it] {'loss': 0.9707, 'learning_rate': 4.433198265076932e-09, 'epoch': 0.98} 98%|█████████▊| 2723/2774 [8:56:45<10:17, 12.11s/it] 98%|█████████▊| 2724/2774 [8:56:57<09:58, 11.96s/it] {'loss': 1.0518, 'learning_rate': 4.261100692681264e-09, 'epoch': 0.98} 98%|█████████▊| 2724/2774 [8:56:57<09:58, 11.96s/it] 98%|█████████▊| 2725/2774 [8:57:08<09:38, 11.81s/it] {'loss': 1.0269, 'learning_rate': 4.092407157380851e-09, 'epoch': 0.98} 98%|█████████▊| 2725/2774 [8:57:08<09:38, 11.81s/it] 98%|█████████▊| 2726/2774 [8:57:19<09:22, 11.72s/it] {'loss': 0.9761, 'learning_rate': 3.9271178892635875e-09, 'epoch': 0.98} 98%|█████████▊| 2726/2774 [8:57:19<09:22, 11.72s/it] 98%|█████████▊| 2727/2774 [8:57:32<09:21, 11.94s/it] {'loss': 1.0869, 'learning_rate': 3.765233113773858e-09, 'epoch': 0.98} 98%|█████████▊| 2727/2774 [8:57:32<09:21, 11.94s/it] 98%|█████████▊| 2728/2774 [8:57:43<08:59, 11.73s/it] {'loss': 0.9951, 'learning_rate': 3.6067530517128192e-09, 'epoch': 0.98} 98%|█████████▊| 2728/2774 [8:57:43<08:59, 11.73s/it] 98%|█████████▊| 2729/2774 [8:57:54<08:40, 11.57s/it] {'loss': 1.0156, 'learning_rate': 3.4516779192375616e-09, 'epoch': 0.98} 98%|█████████▊| 2729/2774 [8:57:54<08:40, 11.57s/it] 98%|█████████▊| 2730/2774 [8:58:06<08:26, 11.51s/it] {'loss': 0.9619, 'learning_rate': 3.3000079278611154e-09, 'epoch': 0.98} 98%|█████████▊| 2730/2774 [8:58:06<08:26, 11.51s/it] 98%|█████████▊| 2731/2774 [8:58:17<08:17, 11.56s/it] {'loss': 1.0068, 'learning_rate': 3.151743284452724e-09, 'epoch': 0.98} 98%|█████████▊| 2731/2774 [8:58:17<08:17, 11.56s/it] 98%|█████████▊| 2732/2774 [8:58:29<08:05, 11.56s/it] {'loss': 1.0352, 'learning_rate': 3.0068841912359035e-09, 'epoch': 0.98} 98%|█████████▊| 2732/2774 [8:58:29<08:05, 11.56s/it] 99%|█████████▊| 2733/2774 [8:58:40<07:52, 11.53s/it] {'loss': 1.0337, 'learning_rate': 2.865430845790107e-09, 'epoch': 
0.99} 99%|█████████▊| 2733/2774 [8:58:40<07:52, 11.53s/it] 99%|█████████▊| 2734/2774 [8:58:52<07:36, 11.41s/it] {'loss': 0.9585, 'learning_rate': 2.7273834410485033e-09, 'epoch': 0.99} 99%|█████████▊| 2734/2774 [8:58:52<07:36, 11.41s/it] 99%|█████████▊| 2735/2774 [8:59:04<07:33, 11.63s/it] {'loss': 1.0376, 'learning_rate': 2.5927421653001995e-09, 'epoch': 0.99} 99%|█████████▊| 2735/2774 [8:59:04<07:33, 11.63s/it] 99%|█████████▊| 2736/2774 [8:59:15<07:20, 11.59s/it] {'loss': 1.0566, 'learning_rate': 2.4615072021871855e-09, 'epoch': 0.99} 99%|█████████▊| 2736/2774 [8:59:15<07:20, 11.59s/it] 99%|█████████▊| 2737/2774 [8:59:27<07:07, 11.55s/it] {'loss': 1.0405, 'learning_rate': 2.333678730706279e-09, 'epoch': 0.99} 99%|█████████▊| 2737/2774 [8:59:27<07:07, 11.55s/it] 99%|█████████▊| 2738/2774 [8:59:38<06:52, 11.47s/it] {'loss': 1.0356, 'learning_rate': 2.2092569252077366e-09, 'epoch': 0.99} 99%|█████████▊| 2738/2774 [8:59:38<06:52, 11.47s/it] 99%|█████████▊| 2739/2774 [8:59:51<06:57, 11.94s/it] {'loss': 1.0327, 'learning_rate': 2.0882419553952537e-09, 'epoch': 0.99} 99%|█████████▊| 2739/2774 [8:59:51<06:57, 11.94s/it] 99%|█████████▉| 2740/2774 [9:00:03<06:41, 11.82s/it] {'loss': 1.0479, 'learning_rate': 1.9706339863262424e-09, 'epoch': 0.99} 99%|█████████▉| 2740/2774 [9:00:03<06:41, 11.82s/it] 99%|█████████▉| 2741/2774 [9:00:14<06:29, 11.79s/it] {'loss': 1.0444, 'learning_rate': 1.8564331784107214e-09, 'epoch': 0.99} 99%|█████████▉| 2741/2774 [9:00:14<06:29, 11.79s/it] 99%|█████████▉| 2742/2774 [9:00:28<06:33, 12.29s/it] {'loss': 0.9517, 'learning_rate': 1.7456396874115933e-09, 'epoch': 0.99} 99%|█████████▉| 2742/2774 [9:00:28<06:33, 12.29s/it] 99%|█████████▉| 2743/2774 [9:00:39<06:12, 12.02s/it] {'loss': 1.0547, 'learning_rate': 1.6382536644446445e-09, 'epoch': 0.99} 99%|█████████▉| 2743/2774 [9:00:39<06:12, 12.02s/it] 99%|█████████▉| 2744/2774 [9:00:51<05:55, 11.84s/it] {'loss': 1.0366, 'learning_rate': 1.534275255977713e-09, 'epoch': 0.99} 99%|█████████▉| 2744/2774 
[9:00:51<05:55, 11.84s/it] 99%|█████████▉| 2745/2774 [9:01:02<05:39, 11.70s/it] {'loss': 1.0293, 'learning_rate': 1.433704603831243e-09, 'epoch': 0.99} 99%|█████████▉| 2745/2774 [9:01:02<05:39, 11.70s/it] 99%|█████████▉| 2746/2774 [9:01:14<05:30, 11.79s/it] {'loss': 1.0122, 'learning_rate': 1.3365418451774526e-09, 'epoch': 0.99} 99%|█████████▉| 2746/2774 [9:01:14<05:30, 11.79s/it] 99%|█████████▉| 2747/2774 [9:01:25<05:16, 11.73s/it] {'loss': 1.1206, 'learning_rate': 1.2427871125403334e-09, 'epoch': 0.99} 99%|█████████▉| 2747/2774 [9:01:25<05:16, 11.73s/it] 99%|█████████▉| 2748/2774 [9:01:37<05:01, 11.58s/it] {'loss': 1.0068, 'learning_rate': 1.1524405337962063e-09, 'epoch': 0.99} 99%|█████████▉| 2748/2774 [9:01:37<05:01, 11.58s/it] 99%|█████████▉| 2749/2774 [9:01:48<04:49, 11.58s/it] {'loss': 1.0703, 'learning_rate': 1.065502232171778e-09, 'epoch': 0.99} 99%|█████████▉| 2749/2774 [9:01:48<04:49, 11.58s/it] 99%|█████████▉| 2750/2774 [9:02:00<04:35, 11.50s/it] {'loss': 1.0166, 'learning_rate': 9.819723262458057e-10, 'epoch': 0.99} 99%|█████████▉| 2750/2774 [9:02:00<04:35, 11.50s/it] 99%|█████████▉| 2751/2774 [9:02:11<04:23, 11.45s/it] {'loss': 1.0249, 'learning_rate': 9.018509299482669e-10, 'epoch': 0.99} 99%|█████████▉| 2751/2774 [9:02:11<04:23, 11.45s/it] 99%|█████████▉| 2752/2774 [9:02:22<04:12, 11.48s/it] {'loss': 1.0063, 'learning_rate': 8.251381525595237e-10, 'epoch': 0.99} 99%|█████████▉| 2752/2774 [9:02:22<04:12, 11.48s/it] 99%|█████████▉| 2753/2774 [9:02:34<04:00, 11.46s/it] {'loss': 1.0864, 'learning_rate': 7.518340987114347e-10, 'epoch': 0.99} 99%|█████████▉| 2753/2774 [9:02:34<04:00, 11.46s/it] 99%|█████████▉| 2754/2774 [9:02:45<03:47, 11.39s/it] {'loss': 0.9976, 'learning_rate': 6.819388683862449e-10, 'epoch': 0.99} 99%|█████████▉| 2754/2774 [9:02:45<03:47, 11.39s/it] 99%|█████████▉| 2755/2774 [9:02:56<03:36, 11.38s/it] {'loss': 1.0342, 'learning_rate': 6.154525569168623e-10, 'epoch': 0.99} 99%|█████████▉| 2755/2774 [9:02:56<03:36, 11.38s/it] 
99%|█████████▉| 2756/2774 [9:03:08<03:26, 11.49s/it] {'loss': 1.0415, 'learning_rate': 5.523752549863037e-10, 'epoch': 0.99} 99%|█████████▉| 2756/2774 [9:03:08<03:26, 11.49s/it] 99%|█████████▉| 2757/2774 [9:03:20<03:15, 11.50s/it] {'loss': 1.0205, 'learning_rate': 4.927070486288043e-10, 'epoch': 0.99} 99%|█████████▉| 2757/2774 [9:03:20<03:15, 11.50s/it] 99%|█████████▉| 2758/2774 [9:03:31<03:03, 11.50s/it] {'loss': 1.0024, 'learning_rate': 4.364480192275977e-10, 'epoch': 0.99} 99%|█████████▉| 2758/2774 [9:03:31<03:03, 11.50s/it] 99%|█████████▉| 2759/2774 [9:03:43<02:54, 11.65s/it] {'loss': 1.0308, 'learning_rate': 3.835982435168584e-10, 'epoch': 0.99} 99%|█████████▉| 2759/2774 [9:03:43<02:54, 11.65s/it] 99%|█████████▉| 2760/2774 [9:03:54<02:40, 11.50s/it] {'loss': 0.9834, 'learning_rate': 3.3415779358059174e-10, 'epoch': 0.99} 99%|█████████▉| 2760/2774 [9:03:54<02:40, 11.50s/it] 100%|█████████▉| 2761/2774 [9:04:07<02:35, 11.96s/it] {'loss': 0.9233, 'learning_rate': 2.8812673685235657e-10, 'epoch': 1.0} 100%|█████████▉| 2761/2774 [9:04:07<02:35, 11.96s/it] 100%|█████████▉| 2762/2774 [9:04:20<02:26, 12.20s/it] {'loss': 0.9932, 'learning_rate': 2.4550513611582007e-10, 'epoch': 1.0} 100%|█████████▉| 2762/2774 [9:04:20<02:26, 12.20s/it] 100%|█████████▉| 2763/2774 [9:04:33<02:16, 12.42s/it] {'loss': 1.0405, 'learning_rate': 2.0629304950420258e-10, 'epoch': 1.0} 100%|█████████▉| 2763/2774 [9:04:33<02:16, 12.42s/it] 100%|█████████▉| 2764/2774 [9:04:45<02:01, 12.12s/it] {'loss': 1.0146, 'learning_rate': 1.7049053050083308e-10, 'epoch': 1.0} 100%|█████████▉| 2764/2774 [9:04:45<02:01, 12.12s/it] 100%|█████████▉| 2765/2774 [9:04:56<01:47, 11.95s/it] {'loss': 1.0205, 'learning_rate': 1.380976279374835e-10, 'epoch': 1.0} 100%|█████████▉| 2765/2774 [9:04:56<01:47, 11.95s/it] 100%|█████████▉| 2766/2774 [9:05:08<01:34, 11.85s/it] {'loss': 1.02, 'learning_rate': 1.0911438599686686e-10, 'epoch': 1.0} 100%|█████████▉| 2766/2774 [9:05:08<01:34, 11.85s/it] 100%|█████████▉| 2767/2774 
[9:05:19<01:21, 11.61s/it] {'loss': 1.0254, 'learning_rate': 8.35408442095842e-11, 'epoch': 1.0} 100%|█████████▉| 2767/2774 [9:05:19<01:21, 11.61s/it] 100%|█████████▉| 2768/2774 [9:05:30<01:09, 11.55s/it] {'loss': 0.9971, 'learning_rate': 6.137703745717761e-11, 'epoch': 1.0} 100%|█████████▉| 2768/2774 [9:05:30<01:09, 11.55s/it] 100%|█████████▉| 2769/2774 [9:05:42<00:57, 11.53s/it] {'loss': 1.0059, 'learning_rate': 4.262299596907715e-11, 'epoch': 1.0} 100%|█████████▉| 2769/2774 [9:05:42<00:57, 11.53s/it] 100%|█████████▉| 2770/2774 [9:05:53<00:46, 11.58s/it] {'loss': 1.0249, 'learning_rate': 2.7278745325098887e-11, 'epoch': 1.0} 100%|█████████▉| 2770/2774 [9:05:53<00:46, 11.58s/it] 100%|█████████▉| 2771/2774 [9:06:05<00:34, 11.50s/it] {'loss': 1.0352, 'learning_rate': 1.534430645377949e-11, 'epoch': 1.0} 100%|█████████▉| 2771/2774 [9:06:05<00:34, 11.50s/it] 100%|█████████▉| 2772/2774 [9:06:16<00:22, 11.47s/it] {'loss': 0.998, 'learning_rate': 6.819695632931389e-12, 'epoch': 1.0} 100%|█████████▉| 2772/2774 [9:06:16<00:22, 11.47s/it] 100%|█████████▉| 2773/2774 [9:06:27<00:11, 11.42s/it] {'loss': 1.0547, 'learning_rate': 1.7049244896427675e-12, 'epoch': 1.0} 100%|█████████▉| 2773/2774 [9:06:27<00:11, 11.42s/it] 100%|██████████| 2774/2774 [9:06:40<00:00, 11.69s/it] {'loss': 1.0215, 'learning_rate': 0.0, 'epoch': 1.0} 100%|██████████| 2774/2774 [9:06:40<00:00, 11.69s/it] {'train_runtime': 32801.9753, 'train_samples_per_second': 10.823, 'train_steps_per_second': 0.085, 'train_loss': 1.030239465516853, 'epoch': 1.0} 100%|██████████| 2774/2774 [9:06:40<00:00, 11.69s/it] 100%|██████████| 2774/2774 [9:06:40<00:00, 11.82s/it] 2024-03-10 20:19:34.180 n193-018-074:2301448:2302681 [0] NCCL INFO [Service thread] Connection closed by localRank 1 2024-03-10 20:19:34.180 n193-018-074:2301449:2302675 [1] NCCL INFO [Service thread] Connection closed by localRank 1 2024-03-10 20:19:34.180 n193-018-074:2301450:2302682 [2] NCCL INFO [Service thread] Connection closed by localRank 1 
2024-03-10 20:19:34.382 n193-018-074:2301451:2302679 [3] NCCL INFO [Service thread] Connection closed by localRank 4
2024-03-10 20:19:34.382 n193-018-074:2301453:2302676 [5] NCCL INFO [Service thread] Connection closed by localRank 4
2024-03-10 20:19:34.382 n193-018-074:2301452:2302680 [4] NCCL INFO [Service thread] Connection closed by localRank 4
2024-03-10 20:19:34.418 n193-018-074:2301451:2302679 [3] NCCL INFO [Service thread] Connection closed by localRank 3
2024-03-10 20:19:34.418 n193-018-074:2301450:2302682 [2] NCCL INFO [Service thread] Connection closed by localRank 3
2024-03-10 20:19:34.418 n193-018-074:2301452:2302680 [4] NCCL INFO [Service thread] Connection closed by localRank 3
2024-03-10 20:19:34.437 n193-018-074:2301448:2302681 [0] NCCL INFO [Service thread] Connection closed by localRank 7
2024-03-10 20:19:34.437 n193-018-074:2301454:2302678 [6] NCCL INFO [Service thread] Connection closed by localRank 7
2024-03-10 20:19:34.437 n193-018-074:2301455:2302677 [7] NCCL INFO [Service thread] Connection closed by localRank 7
2024-03-10 20:19:34.547 n193-018-074:2301453:2302676 [5] NCCL INFO [Service thread] Connection closed by localRank 6
2024-03-10 20:19:34.547 n193-018-074:2301454:2302678 [6] NCCL INFO [Service thread] Connection closed by localRank 6
2024-03-10 20:19:34.547 n193-018-074:2301455:2302677 [7] NCCL INFO [Service thread] Connection closed by localRank 6
2024-03-10 20:19:34.621 n193-018-074:2301452:2302680 [4] NCCL INFO [Service thread] Connection closed by localRank 5
2024-03-10 20:19:34.621 n193-018-074:2301453:2302676 [5] NCCL INFO [Service thread] Connection closed by localRank 5
2024-03-10 20:19:34.621 n193-018-074:2301454:2302678 [6] NCCL INFO [Service thread] Connection closed by localRank 5
2024-03-10 20:19:34.666 n193-018-074:2301449:2302675 [1] NCCL INFO [Service thread] Connection closed by localRank 2
2024-03-10 20:19:34.666 n193-018-074:2301451:2302679 [3] NCCL INFO [Service thread] Connection closed by localRank 2
2024-03-10 20:19:34.666 n193-018-074:2301450:2302682 [2] NCCL INFO [Service thread] Connection closed by localRank 2
2024-03-10 20:19:34.788 n193-018-074:2301454:2301454 [6] NCCL INFO comm 0xb7182140 rank 6 nranks 8 cudaDev 6 busId c5000 - Abort COMPLETE
2024-03-10 20:19:34.791 n193-018-074:2301452:2301452 [4] NCCL INFO comm 0x185a77340 rank 4 nranks 8 cudaDev 4 busId 89000 - Abort COMPLETE
2024-03-10 20:19:34.830 n193-018-074:2301451:2301451 [3] NCCL INFO comm 0xb6209bc0 rank 3 nranks 8 cudaDev 3 busId 4e000 - Abort COMPLETE
2024-03-10 20:19:35.225 n193-018-074:2301453:2301453 [5] NCCL INFO comm 0x1862cf940 rank 5 nranks 8 cudaDev 5 busId 8e000 - Abort COMPLETE
2024-03-10 20:19:35.269 n193-018-074:2301450:2301450 [2] NCCL INFO comm 0xb858a750 rank 2 nranks 8 cudaDev 2 busId 4a000 - Abort COMPLETE
2024-03-10 20:19:37.061 n193-018-074:2301455:2302419 [7] NCCL INFO [Service thread] Connection closed by localRank 6
2024-03-10 20:19:37.656 n193-018-074:2301449:2302422 [1] NCCL INFO [Service thread] Connection closed by localRank 2
2024-03-10 20:19:40.518 n193-018-074:2301455:2301455 [7] NCCL INFO comm 0x1872d08c0 rank 7 nranks 8 cudaDev 7 busId c9000 - Abort COMPLETE
2024-03-10 20:19:40.520 n193-018-074:2301449:2301449 [1] NCCL INFO comm 0x1862a5d40 rank 1 nranks 8 cudaDev 1 busId 16000 - Abort COMPLETE
2024-03-10 20:19:42.723 n193-018-074:2301448:2302425 [0] NCCL INFO [Service thread] Connection closed by localRank 1
2024-03-10 20:19:42.742 n193-018-074:2301448:2302425 [0] NCCL INFO [Service thread] Connection closed by localRank 7
2024-03-10 20:19:52.571 n193-018-074:2301448:2302681 [0] NCCL INFO [Service thread] Connection closed by localRank 0
2024-03-10 20:19:53.416 n193-018-074:2301448:2301448 [0] NCCL INFO comm 0x1985b4bb0 rank 0 nranks 8 cudaDev 0 busId 10000 - Abort COMPLETE