2023-03-07 10:14:36,176 INFO [train2.py:879] (2/4) Training started
2023-03-07 10:14:36,176 INFO [train2.py:880] (2/4) {'frame_shift_ms': 10.0, 'allowed_excess_duration_ratio': 0.1, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.3', 'k2-build-type': 'Debug', 'k2-with-cuda': True, 'k2-git-sha1': '3b81ac9686aee539d447bb2085b2cdfc131c7c91', 'k2-git-date': 'Thu Jan 26 20:40:25 2023', 'lhotse-version': '1.9.0.dev+git.97bf4b0.dirty', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'surt', 'icefall-git-sha1': 'e9931b7-dirty', 'icefall-git-date': 'Fri Mar 3 16:27:17 2023', 'icefall-path': '/exp/draj/mini_scale_2022/icefall', 'k2-path': '/exp/draj/mini_scale_2022/k2/k2/python/k2/__init__.py', 'lhotse-path': '/exp/draj/mini_scale_2022/lhotse/lhotse/__init__.py', 'hostname': 'r8n04', 'IP address': '10.1.8.4'}, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'world_size': 4, 'master_port': 12368, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer_ctc_att/exp/v0'), 'lang_dir': PosixPath('data/lang_bpe_500'), 'base_lr': 0.05, 'lr_batches': 5000, 'lr_epochs': 3.5, 'att_rate': 0.8, 'num_decoder_layers': 6, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 2000, 'keep_last_k': 10, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'full_libri': True, 'manifest_dir': PosixPath('data/manifests'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2023-03-07 10:14:36,468 INFO [lexicon.py:168] (2/4) Loading pre-compiled data/lang_bpe_500/Linv.pt
2023-03-07 10:14:37,050 INFO [train2.py:902] (2/4) About to create model
2023-03-07 10:14:37,531 INFO [zipformer.py:178] (2/4) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
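Two relations can be read off the numbers reported later in this log. First, the reported loss values are consistent with the usual CTC/attention interpolation using 'att_rate' from the config above: loss = (1 - att_rate) * ctc_loss + att_rate * att_loss (e.g. 0.2 * 5.366 + 0.8 * 1.252 ≈ 2.075 for epoch 1, batch 0). Second, the threshold reported in the optim.py entries equals Clipping_scale (2.0) times the middle of the five "grad-norm quartiles" values (e.g. 2.0 * 2.257e+02 = 4.514e+02). The minimal Python sketch below reproduces both; it is inferred from the logged values only, not copied from train2.py or optim.py.

# Sketch only: relations inferred from this log, not taken from the icefall source.
# The default att_rate and clipping_scale come from the logged config ('att_rate': 0.8)
# and the "Clipping_scale=2.0" messages.

def combined_loss(ctc_loss: float, att_loss: float, att_rate: float = 0.8) -> float:
    # Interpolation that matches the logged `loss` values.
    return (1.0 - att_rate) * ctc_loss + att_rate * att_loss

def clip_threshold(median_grad_norm: float, clipping_scale: float = 2.0) -> float:
    # Matches the logged clipping threshold (clipping_scale times the median grad norm).
    return clipping_scale * median_grad_norm

if __name__ == "__main__":
    print(combined_loss(5.366, 1.252))   # ~2.075, as logged for epoch 1, batch 0
    print(combined_loss(5.278, 1.419))   # ~2.191, as logged for the first validation pass
    print(clip_threshold(2.257e+02))     # 451.4, i.e. the first logged threshold=4.514e+02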
2023-03-07 10:14:37,592 INFO [train2.py:906] (2/4) Number of model parameters: 86083707
2023-03-07 10:14:42,450 INFO [train2.py:921] (2/4) Using DDP
2023-03-07 10:14:42,777 INFO [asr_datamodule.py:420] (2/4) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts
2023-03-07 10:14:42,878 INFO [asr_datamodule.py:224] (2/4) Enable MUSAN
2023-03-07 10:14:42,878 INFO [asr_datamodule.py:225] (2/4) About to get Musan cuts
2023-03-07 10:14:44,369 INFO [asr_datamodule.py:249] (2/4) Enable SpecAugment
2023-03-07 10:14:44,369 INFO [asr_datamodule.py:250] (2/4) Time warp factor: 80
2023-03-07 10:14:44,369 INFO [asr_datamodule.py:260] (2/4) Num frame mask: 10
2023-03-07 10:14:44,369 INFO [asr_datamodule.py:273] (2/4) About to create train dataset
2023-03-07 10:14:44,369 INFO [asr_datamodule.py:300] (2/4) Using DynamicBucketingSampler.
2023-03-07 10:14:46,917 INFO [asr_datamodule.py:316] (2/4) About to create train dataloader
2023-03-07 10:14:46,918 INFO [asr_datamodule.py:440] (2/4) About to get dev-clean cuts
2023-03-07 10:14:46,919 INFO [asr_datamodule.py:447] (2/4) About to get dev-other cuts
2023-03-07 10:14:46,920 INFO [asr_datamodule.py:347] (2/4) About to create dev dataset
2023-03-07 10:14:47,193 INFO [asr_datamodule.py:364] (2/4) About to create dev dataloader
2023-03-07 10:15:00,128 INFO [train2.py:809] (2/4) Epoch 1, batch 0, loss[ctc_loss=5.366, att_loss=1.252, loss=2.075, over 15880.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009596, over 39.00 utterances.], tot_loss[ctc_loss=5.366, att_loss=1.252, loss=2.075, over 15880.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009596, over 39.00 utterances.], batch size: 39, lr: 2.50e-02, grad_scale: 2.0
2023-03-07 10:15:00,128 INFO [train2.py:834] (2/4) Computing validation loss
2023-03-07 10:15:12,519 INFO [train2.py:843] (2/4) Epoch 1, validation: ctc_loss=5.278, att_loss=1.419, loss=2.191, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances.
2023-03-07 10:15:12,520 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 12523MB
2023-03-07 10:15:18,051 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5.0, num_to_drop=2, layers_to_drop={1, 2}
2023-03-07 10:15:43,082 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23.0, num_to_drop=1, layers_to_drop={1}
2023-03-07 10:16:22,265 INFO [train2.py:809] (2/4) Epoch 1, batch 50, loss[ctc_loss=1.264, att_loss=1.068, loss=1.107, over 17066.00 frames. utt_duration=691.3 frames, utt_pad_proportion=0.1304, over 99.00 utterances.], tot_loss[ctc_loss=2.23, att_loss=1.122, loss=1.344, over 736258.53 frames. utt_duration=1231 frames, utt_pad_proportion=0.05482, over 2395.68 utterances.], batch size: 99, lr: 2.75e-02, grad_scale: 2.0
2023-03-07 10:16:35,834 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=60.98 vs.
limit=5.0 2023-03-07 10:17:06,715 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:17:21,426 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2920, 5.2754, 5.3567, 5.6278, 5.1704, 5.2022, 4.9418, 5.6320], device='cuda:2'), covar=tensor([0.0339, 0.0049, 0.0269, 0.0171, 0.0184, 0.0176, 0.0046, 0.0104], device='cuda:2'), in_proj_covar=tensor([0.0009, 0.0008, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0008], device='cuda:2'), out_proj_covar=tensor([8.6161e-06, 8.5751e-06, 8.5283e-06, 8.7755e-06, 8.5823e-06, 8.5795e-06, 8.5126e-06, 8.6684e-06], device='cuda:2') 2023-03-07 10:17:31,214 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.328e+01 1.076e+02 2.257e+02 4.349e+02 2.620e+03, threshold=4.514e+02, percent-clipped=0.0 2023-03-07 10:17:31,258 INFO [train2.py:809] (2/4) Epoch 1, batch 100, loss[ctc_loss=1.171, att_loss=0.9804, loss=1.019, over 16124.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.00631, over 42.00 utterances.], tot_loss[ctc_loss=1.649, att_loss=1.051, loss=1.17, over 1292492.18 frames. utt_duration=1179 frames, utt_pad_proportion=0.07506, over 4389.15 utterances.], batch size: 42, lr: 3.00e-02, grad_scale: 2.0 2023-03-07 10:18:30,915 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8798, 4.8992, 4.9193, 4.8264, 4.9248, 4.8037, 4.9226, 4.9257], device='cuda:2'), covar=tensor([0.0032, 0.0056, 0.0082, 0.0134, 0.0085, 0.0028, 0.0078, 0.0073], device='cuda:2'), in_proj_covar=tensor([0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008], device='cuda:2'), out_proj_covar=tensor([8.0732e-06, 8.2406e-06, 8.1385e-06, 8.1695e-06, 8.3061e-06, 8.2893e-06, 8.3052e-06, 8.1690e-06], device='cuda:2') 2023-03-07 10:18:33,474 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=144.0, num_to_drop=2, layers_to_drop={0, 1} 2023-03-07 10:18:42,513 INFO [train2.py:809] (2/4) Epoch 1, batch 150, loss[ctc_loss=1.208, att_loss=1.005, loss=1.046, over 16637.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.004841, over 47.00 utterances.], tot_loss[ctc_loss=1.453, att_loss=1.022, loss=1.108, over 1733840.79 frames. utt_duration=1193 frames, utt_pad_proportion=0.06789, over 5821.49 utterances.], batch size: 47, lr: 3.25e-02, grad_scale: 2.0 2023-03-07 10:19:35,300 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=25.62 vs. limit=5.0 2023-03-07 10:19:36,489 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2023-03-07 10:19:37,168 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8836, 4.9040, 4.9211, 4.9107, 4.9249, 4.8149, 4.9172, 4.9171], device='cuda:2'), covar=tensor([0.0025, 0.0027, 0.0043, 0.0044, 0.0049, 0.0019, 0.0026, 0.0021], device='cuda:2'), in_proj_covar=tensor([0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008, 0.0008], device='cuda:2'), out_proj_covar=tensor([7.9402e-06, 8.1083e-06, 7.9997e-06, 8.0226e-06, 8.1677e-06, 8.1780e-06, 8.1879e-06, 8.0160e-06], device='cuda:2') 2023-03-07 10:19:48,621 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.064e+01 7.732e+01 1.287e+02 1.925e+02 9.711e+02, threshold=2.575e+02, percent-clipped=2.0 2023-03-07 10:19:48,671 INFO [train2.py:809] (2/4) Epoch 1, batch 200, loss[ctc_loss=1.107, att_loss=0.9072, loss=0.947, over 15958.00 frames. 
utt_duration=1558 frames, utt_pad_proportion=0.006855, over 41.00 utterances.], tot_loss[ctc_loss=1.35, att_loss=0.9962, loss=1.067, over 2057322.51 frames. utt_duration=1219 frames, utt_pad_proportion=0.0669, over 6759.56 utterances.], batch size: 41, lr: 3.50e-02, grad_scale: 2.0 2023-03-07 10:20:29,547 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=7.26 vs. limit=2.0 2023-03-07 10:20:54,634 INFO [train2.py:809] (2/4) Epoch 1, batch 250, loss[ctc_loss=1.178, att_loss=0.9494, loss=0.9952, over 16414.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.006755, over 44.00 utterances.], tot_loss[ctc_loss=1.294, att_loss=0.9834, loss=1.046, over 2323642.45 frames. utt_duration=1198 frames, utt_pad_proportion=0.07294, over 7765.53 utterances.], batch size: 44, lr: 3.75e-02, grad_scale: 2.0 2023-03-07 10:21:36,440 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=11.67 vs. limit=5.0 2023-03-07 10:21:54,687 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=296.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:21:59,462 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=300.0, num_to_drop=2, layers_to_drop={1, 3} 2023-03-07 10:22:00,457 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.252e+01 6.827e+01 1.043e+02 1.996e+02 1.331e+03, threshold=2.085e+02, percent-clipped=17.0 2023-03-07 10:22:00,503 INFO [train2.py:809] (2/4) Epoch 1, batch 300, loss[ctc_loss=1.227, att_loss=1.005, loss=1.05, over 16959.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007743, over 50.00 utterances.], tot_loss[ctc_loss=1.256, att_loss=0.9718, loss=1.029, over 2531975.38 frames. utt_duration=1213 frames, utt_pad_proportion=0.06871, over 8361.80 utterances.], batch size: 50, lr: 4.00e-02, grad_scale: 2.0 2023-03-07 10:23:06,528 INFO [train2.py:809] (2/4) Epoch 1, batch 350, loss[ctc_loss=1.18, att_loss=0.9713, loss=1.013, over 14208.00 frames. utt_duration=393.5 frames, utt_pad_proportion=0.3156, over 145.00 utterances.], tot_loss[ctc_loss=1.234, att_loss=0.9632, loss=1.017, over 2692314.96 frames. utt_duration=1197 frames, utt_pad_proportion=0.07146, over 9005.23 utterances.], batch size: 145, lr: 4.25e-02, grad_scale: 2.0 2023-03-07 10:23:14,277 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=357.0, num_to_drop=2, layers_to_drop={0, 2} 2023-03-07 10:23:54,037 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=387.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 10:24:12,775 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.983e+01 8.126e+01 1.923e+02 2.981e+02 6.249e+02, threshold=3.846e+02, percent-clipped=46.0 2023-03-07 10:24:12,823 INFO [train2.py:809] (2/4) Epoch 1, batch 400, loss[ctc_loss=1.021, att_loss=0.812, loss=0.8538, over 15495.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.00814, over 36.00 utterances.], tot_loss[ctc_loss=1.215, att_loss=0.9551, loss=1.007, over 2826688.32 frames. 
utt_duration=1221 frames, utt_pad_proportion=0.06325, over 9274.38 utterances.], batch size: 36, lr: 4.50e-02, grad_scale: 4.0 2023-03-07 10:25:02,034 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=439.0, num_to_drop=2, layers_to_drop={1, 2} 2023-03-07 10:25:13,990 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=448.0, num_to_drop=2, layers_to_drop={2, 3} 2023-03-07 10:25:17,464 INFO [train2.py:809] (2/4) Epoch 1, batch 450, loss[ctc_loss=1.014, att_loss=0.8029, loss=0.8452, over 15503.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008071, over 36.00 utterances.], tot_loss[ctc_loss=1.195, att_loss=0.9423, loss=0.9928, over 2931838.47 frames. utt_duration=1257 frames, utt_pad_proportion=0.05283, over 9337.15 utterances.], batch size: 36, lr: 4.75e-02, grad_scale: 4.0 2023-03-07 10:26:22,551 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.753e+01 1.641e+02 2.817e+02 4.953e+02 1.195e+03, threshold=5.634e+02, percent-clipped=34.0 2023-03-07 10:26:22,597 INFO [train2.py:809] (2/4) Epoch 1, batch 500, loss[ctc_loss=1.169, att_loss=0.9516, loss=0.9951, over 17320.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.0107, over 55.00 utterances.], tot_loss[ctc_loss=1.174, att_loss=0.9295, loss=0.9785, over 3011697.98 frames. utt_duration=1223 frames, utt_pad_proportion=0.05936, over 9859.81 utterances.], batch size: 55, lr: 4.99e-02, grad_scale: 4.0 2023-03-07 10:26:31,723 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8743, 3.6986, 3.5682, 3.4709, 1.6001, 3.2103, 3.2388, 2.5212], device='cuda:2'), covar=tensor([0.0115, 0.0134, 0.0359, 0.0387, 0.2687, 0.0229, 0.0225, 0.0288], device='cuda:2'), in_proj_covar=tensor([0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0010, 0.0009], device='cuda:2'), out_proj_covar=tensor([8.5082e-06, 8.5560e-06, 9.3622e-06, 8.8985e-06, 9.4916e-06, 9.0931e-06, 9.0908e-06, 8.8058e-06], device='cuda:2') 2023-03-07 10:27:27,824 INFO [train2.py:809] (2/4) Epoch 1, batch 550, loss[ctc_loss=1.029, att_loss=0.8567, loss=0.8912, over 16130.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.00611, over 42.00 utterances.], tot_loss[ctc_loss=1.153, att_loss=0.9169, loss=0.9642, over 3074578.37 frames. utt_duration=1216 frames, utt_pad_proportion=0.05956, over 10130.00 utterances.], batch size: 42, lr: 4.98e-02, grad_scale: 4.0 2023-03-07 10:27:31,677 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5575, 4.2748, 4.4142, 4.7637, 4.5159, 4.7211, 3.7577, 4.4872], device='cuda:2'), covar=tensor([0.0253, 0.0152, 0.0325, 0.0077, 0.0109, 0.0069, 0.0846, 0.0205], device='cuda:2'), in_proj_covar=tensor([0.0010, 0.0009, 0.0010, 0.0009, 0.0009, 0.0009, 0.0010, 0.0009], device='cuda:2'), out_proj_covar=tensor([9.1331e-06, 8.6472e-06, 9.3267e-06, 8.1277e-06, 8.3744e-06, 8.1598e-06, 9.6369e-06, 8.5521e-06], device='cuda:2') 2023-03-07 10:27:41,722 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=562.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 10:27:49,953 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=568.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:27:50,360 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=8.58 vs. limit=5.0 2023-03-07 10:28:10,616 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=6.09 vs. 
limit=2.0 2023-03-07 10:28:19,193 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=590.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:28:29,539 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.80 vs. limit=5.0 2023-03-07 10:28:32,096 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=600.0, num_to_drop=2, layers_to_drop={0, 1} 2023-03-07 10:28:33,064 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.073e+01 2.172e+02 3.350e+02 5.891e+02 2.324e+03, threshold=6.699e+02, percent-clipped=26.0 2023-03-07 10:28:33,107 INFO [train2.py:809] (2/4) Epoch 1, batch 600, loss[ctc_loss=0.9292, att_loss=0.7604, loss=0.7942, over 15492.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009249, over 36.00 utterances.], tot_loss[ctc_loss=1.128, att_loss=0.9024, loss=0.9475, over 3119548.58 frames. utt_duration=1229 frames, utt_pad_proportion=0.05719, over 10166.22 utterances.], batch size: 36, lr: 4.98e-02, grad_scale: 4.0 2023-03-07 10:28:37,991 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.51 vs. limit=2.0 2023-03-07 10:28:42,088 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.9095, 4.0193, 2.8759, 2.2639, 2.6290, 1.9806, 3.3043, 2.5642], device='cuda:2'), covar=tensor([0.1730, 0.1237, 0.0755, 0.1849, 0.1217, 0.1604, 0.1370, 0.1781], device='cuda:2'), in_proj_covar=tensor([0.0009, 0.0010, 0.0008, 0.0010, 0.0009, 0.0009, 0.0009, 0.0009], device='cuda:2'), out_proj_covar=tensor([7.4657e-06, 8.9275e-06, 7.2353e-06, 8.3002e-06, 8.4106e-06, 8.6840e-06, 9.2036e-06, 8.1840e-06], device='cuda:2') 2023-03-07 10:28:44,386 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.2047, 6.4800, 6.0962, 6.4655, 6.0615, 6.1291, 6.5128, 6.1713], device='cuda:2'), covar=tensor([0.0913, 0.0366, 0.2021, 0.0390, 0.1447, 0.1130, 0.0242, 0.1139], device='cuda:2'), in_proj_covar=tensor([0.0017, 0.0018, 0.0017, 0.0017, 0.0017, 0.0015, 0.0016, 0.0018], device='cuda:2'), out_proj_covar=tensor([1.7849e-05, 1.7934e-05, 1.7573e-05, 1.7172e-05, 1.7658e-05, 1.5888e-05, 1.5992e-05, 1.7903e-05], device='cuda:2') 2023-03-07 10:28:56,687 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.75 vs. limit=2.0 2023-03-07 10:29:01,192 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=623.0, num_to_drop=2, layers_to_drop={0, 3} 2023-03-07 10:29:09,918 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=629.0, num_to_drop=2, layers_to_drop={1, 3} 2023-03-07 10:29:33,582 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=648.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:29:37,164 INFO [train2.py:809] (2/4) Epoch 1, batch 650, loss[ctc_loss=0.8852, att_loss=0.7695, loss=0.7926, over 15642.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008988, over 37.00 utterances.], tot_loss[ctc_loss=1.092, att_loss=0.8887, loss=0.9294, over 3161253.06 frames. 
utt_duration=1253 frames, utt_pad_proportion=0.04969, over 10103.69 utterances.], batch size: 37, lr: 4.98e-02, grad_scale: 4.0 2023-03-07 10:29:37,434 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=651.0, num_to_drop=2, layers_to_drop={1, 3} 2023-03-07 10:29:38,588 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=652.0, num_to_drop=2, layers_to_drop={1, 3} 2023-03-07 10:30:42,275 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 2.720e+02 3.813e+02 5.930e+02 1.074e+03, threshold=7.626e+02, percent-clipped=15.0 2023-03-07 10:30:42,318 INFO [train2.py:809] (2/4) Epoch 1, batch 700, loss[ctc_loss=0.8768, att_loss=0.8463, loss=0.8524, over 16478.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006654, over 46.00 utterances.], tot_loss[ctc_loss=1.049, att_loss=0.8774, loss=0.9116, over 3189325.20 frames. utt_duration=1235 frames, utt_pad_proportion=0.05456, over 10340.79 utterances.], batch size: 46, lr: 4.98e-02, grad_scale: 4.0 2023-03-07 10:31:32,119 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=739.0, num_to_drop=2, layers_to_drop={0, 2} 2023-03-07 10:31:37,692 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=743.0, num_to_drop=2, layers_to_drop={2, 3} 2023-03-07 10:31:47,678 INFO [train2.py:809] (2/4) Epoch 1, batch 750, loss[ctc_loss=0.8141, att_loss=0.8057, loss=0.8074, over 15963.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005989, over 41.00 utterances.], tot_loss[ctc_loss=0.9979, att_loss=0.8634, loss=0.8903, over 3199845.85 frames. utt_duration=1249 frames, utt_pad_proportion=0.05463, over 10257.64 utterances.], batch size: 41, lr: 4.97e-02, grad_scale: 4.0 2023-03-07 10:32:34,578 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=787.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:32:52,870 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.221e+02 2.689e+02 3.178e+02 4.558e+02 8.298e+02, threshold=6.355e+02, percent-clipped=3.0 2023-03-07 10:32:52,913 INFO [train2.py:809] (2/4) Epoch 1, batch 800, loss[ctc_loss=0.794, att_loss=0.8241, loss=0.8181, over 17357.00 frames. utt_duration=1008 frames, utt_pad_proportion=0.04945, over 69.00 utterances.], tot_loss[ctc_loss=0.95, att_loss=0.8527, loss=0.8721, over 3216114.43 frames. utt_duration=1254 frames, utt_pad_proportion=0.05439, over 10273.33 utterances.], batch size: 69, lr: 4.97e-02, grad_scale: 8.0 2023-03-07 10:33:31,303 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.20 vs. limit=2.0 2023-03-07 10:33:52,596 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=847.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:33:57,208 INFO [train2.py:809] (2/4) Epoch 1, batch 850, loss[ctc_loss=0.6555, att_loss=0.6901, loss=0.6832, over 15373.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.00991, over 35.00 utterances.], tot_loss[ctc_loss=0.9069, att_loss=0.8391, loss=0.8526, over 3226481.27 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.05414, over 10263.33 utterances.], batch size: 35, lr: 4.96e-02, grad_scale: 8.0 2023-03-07 10:35:01,440 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.412e+02 3.625e+02 4.631e+02 5.738e+02 1.582e+03, threshold=9.262e+02, percent-clipped=18.0 2023-03-07 10:35:01,482 INFO [train2.py:809] (2/4) Epoch 1, batch 900, loss[ctc_loss=0.6959, att_loss=0.719, loss=0.7144, over 16331.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006141, over 45.00 utterances.], tot_loss[ctc_loss=0.8695, att_loss=0.8208, loss=0.8305, over 3240642.18 frames. utt_duration=1244 frames, utt_pad_proportion=0.05586, over 10428.57 utterances.], batch size: 45, lr: 4.96e-02, grad_scale: 8.0 2023-03-07 10:35:10,236 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=908.0, num_to_drop=2, layers_to_drop={0, 2} 2023-03-07 10:35:23,154 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=918.0, num_to_drop=2, layers_to_drop={0, 1} 2023-03-07 10:35:31,236 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=924.0, num_to_drop=2, layers_to_drop={0, 1} 2023-03-07 10:35:59,518 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=946.0, num_to_drop=2, layers_to_drop={0, 2} 2023-03-07 10:36:05,673 INFO [train2.py:809] (2/4) Epoch 1, batch 950, loss[ctc_loss=0.8292, att_loss=0.7628, loss=0.776, over 17290.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01248, over 55.00 utterances.], tot_loss[ctc_loss=0.8361, att_loss=0.7939, loss=0.8024, over 3255521.64 frames. utt_duration=1244 frames, utt_pad_proportion=0.05324, over 10482.91 utterances.], batch size: 55, lr: 4.96e-02, grad_scale: 8.0 2023-03-07 10:36:07,100 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=952.0, num_to_drop=2, layers_to_drop={0, 1} 2023-03-07 10:37:08,812 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1000.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 10:37:09,958 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.067e+02 5.280e+02 6.177e+02 7.418e+02 1.446e+03, threshold=1.235e+03, percent-clipped=9.0 2023-03-07 10:37:10,000 INFO [train2.py:809] (2/4) Epoch 1, batch 1000, loss[ctc_loss=0.6891, att_loss=0.6472, loss=0.6556, over 16979.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.006111, over 50.00 utterances.], tot_loss[ctc_loss=0.8045, att_loss=0.7608, loss=0.7695, over 3257301.61 frames. utt_duration=1250 frames, utt_pad_proportion=0.05211, over 10437.23 utterances.], batch size: 50, lr: 4.95e-02, grad_scale: 8.0 2023-03-07 10:37:29,589 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.21 vs. limit=2.0 2023-03-07 10:38:04,797 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1043.0, num_to_drop=2, layers_to_drop={0, 1} 2023-03-07 10:38:15,413 INFO [train2.py:809] (2/4) Epoch 1, batch 1050, loss[ctc_loss=0.5842, att_loss=0.5241, loss=0.5361, over 10950.00 frames. utt_duration=1827 frames, utt_pad_proportion=0.2189, over 24.00 utterances.], tot_loss[ctc_loss=0.7727, att_loss=0.7244, loss=0.734, over 3243404.65 frames. 
utt_duration=1254 frames, utt_pad_proportion=0.05444, over 10355.87 utterances.], batch size: 24, lr: 4.95e-02, grad_scale: 8.0 2023-03-07 10:39:07,964 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1091.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:39:21,060 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.042e+02 5.471e+02 6.537e+02 8.714e+02 1.508e+03, threshold=1.307e+03, percent-clipped=3.0 2023-03-07 10:39:21,103 INFO [train2.py:809] (2/4) Epoch 1, batch 1100, loss[ctc_loss=0.6762, att_loss=0.5581, loss=0.5818, over 17360.00 frames. utt_duration=1103 frames, utt_pad_proportion=0.03373, over 63.00 utterances.], tot_loss[ctc_loss=0.7419, att_loss=0.6873, loss=0.6982, over 3249415.11 frames. utt_duration=1261 frames, utt_pad_proportion=0.05263, over 10316.19 utterances.], batch size: 63, lr: 4.94e-02, grad_scale: 8.0 2023-03-07 10:40:27,179 INFO [train2.py:809] (2/4) Epoch 1, batch 1150, loss[ctc_loss=0.5579, att_loss=0.496, loss=0.5084, over 16120.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006052, over 42.00 utterances.], tot_loss[ctc_loss=0.7166, att_loss=0.6558, loss=0.6679, over 3251546.08 frames. utt_duration=1256 frames, utt_pad_proportion=0.05472, over 10367.62 utterances.], batch size: 42, lr: 4.94e-02, grad_scale: 8.0 2023-03-07 10:40:30,064 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1153.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:41:32,899 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.510e+02 5.540e+02 7.016e+02 8.798e+02 2.623e+03, threshold=1.403e+03, percent-clipped=4.0 2023-03-07 10:41:32,942 INFO [train2.py:809] (2/4) Epoch 1, batch 1200, loss[ctc_loss=0.5416, att_loss=0.4726, loss=0.4864, over 15658.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.008118, over 37.00 utterances.], tot_loss[ctc_loss=0.6936, att_loss=0.6281, loss=0.6412, over 3246326.61 frames. utt_duration=1240 frames, utt_pad_proportion=0.06044, over 10482.52 utterances.], batch size: 37, lr: 4.93e-02, grad_scale: 8.0 2023-03-07 10:41:35,570 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1203.0, num_to_drop=2, layers_to_drop={1, 3} 2023-03-07 10:41:49,212 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1214.0, num_to_drop=2, layers_to_drop={1, 2} 2023-03-07 10:41:54,857 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1218.0, num_to_drop=2, layers_to_drop={1, 2} 2023-03-07 10:42:03,502 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1224.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 10:42:18,653 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.14 vs. limit=2.0 2023-03-07 10:42:31,504 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1246.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:42:37,638 INFO [train2.py:809] (2/4) Epoch 1, batch 1250, loss[ctc_loss=0.497, att_loss=0.4398, loss=0.4512, over 15763.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.009013, over 38.00 utterances.], tot_loss[ctc_loss=0.6725, att_loss=0.6022, loss=0.6162, over 3237223.91 frames. 
utt_duration=1246 frames, utt_pad_proportion=0.06049, over 10406.48 utterances.], batch size: 38, lr: 4.92e-02, grad_scale: 8.0 2023-03-07 10:42:53,981 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1976, 3.2269, 3.5820, 3.7472, 4.2467, 3.7576, 3.8459, 3.2696], device='cuda:2'), covar=tensor([0.6146, 0.3146, 0.4085, 0.2282, 0.1291, 0.4780, 0.4097, 0.8176], device='cuda:2'), in_proj_covar=tensor([0.0015, 0.0016, 0.0015, 0.0017, 0.0017, 0.0016, 0.0016, 0.0018], device='cuda:2'), out_proj_covar=tensor([9.4279e-06, 9.5733e-06, 9.8002e-06, 9.5342e-06, 1.0042e-05, 1.0448e-05, 1.0742e-05, 1.2657e-05], device='cuda:2') 2023-03-07 10:42:56,390 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1266.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:43:05,039 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1272.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 10:43:34,639 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1294.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 10:43:43,367 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.330e+02 5.046e+02 6.291e+02 8.119e+02 1.875e+03, threshold=1.258e+03, percent-clipped=2.0 2023-03-07 10:43:43,410 INFO [train2.py:809] (2/4) Epoch 1, batch 1300, loss[ctc_loss=0.5739, att_loss=0.4861, loss=0.5036, over 17382.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03406, over 63.00 utterances.], tot_loss[ctc_loss=0.653, att_loss=0.5792, loss=0.594, over 3246838.02 frames. utt_duration=1246 frames, utt_pad_proportion=0.05801, over 10438.28 utterances.], batch size: 63, lr: 4.92e-02, grad_scale: 8.0 2023-03-07 10:44:50,634 INFO [train2.py:809] (2/4) Epoch 1, batch 1350, loss[ctc_loss=0.523, att_loss=0.4436, loss=0.4595, over 15526.00 frames. utt_duration=1727 frames, utt_pad_proportion=0.006964, over 36.00 utterances.], tot_loss[ctc_loss=0.6337, att_loss=0.5581, loss=0.5732, over 3250605.90 frames. utt_duration=1267 frames, utt_pad_proportion=0.05326, over 10278.29 utterances.], batch size: 36, lr: 4.91e-02, grad_scale: 8.0 2023-03-07 10:45:19,251 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.16 vs. limit=2.0 2023-03-07 10:45:32,667 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4547, 2.1555, 3.8027, 3.8979, 3.5595, 3.6871, 2.7701, 3.7449], device='cuda:2'), covar=tensor([0.1088, 0.4863, 0.0721, 0.0916, 0.1154, 0.1037, 0.4535, 0.0766], device='cuda:2'), in_proj_covar=tensor([0.0044, 0.0048, 0.0059, 0.0079, 0.0072, 0.0084, 0.0052, 0.0076], device='cuda:2'), out_proj_covar=tensor([3.7515e-05, 4.1197e-05, 4.6889e-05, 5.6594e-05, 5.1845e-05, 6.2945e-05, 4.4512e-05, 5.5488e-05], device='cuda:2') 2023-03-07 10:45:39,830 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2023-03-07 10:45:59,934 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+02 4.980e+02 5.949e+02 7.400e+02 1.342e+03, threshold=1.190e+03, percent-clipped=3.0 2023-03-07 10:45:59,977 INFO [train2.py:809] (2/4) Epoch 1, batch 1400, loss[ctc_loss=0.546, att_loss=0.4604, loss=0.4775, over 17029.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007971, over 51.00 utterances.], tot_loss[ctc_loss=0.6126, att_loss=0.5369, loss=0.552, over 3261972.04 frames. 
utt_duration=1287 frames, utt_pad_proportion=0.04618, over 10147.23 utterances.], batch size: 51, lr: 4.91e-02, grad_scale: 8.0 2023-03-07 10:46:45,479 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1434.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 10:47:08,136 INFO [train2.py:809] (2/4) Epoch 1, batch 1450, loss[ctc_loss=0.6275, att_loss=0.5183, loss=0.5401, over 17149.00 frames. utt_duration=869.8 frames, utt_pad_proportion=0.08916, over 79.00 utterances.], tot_loss[ctc_loss=0.6005, att_loss=0.5237, loss=0.5391, over 3272233.62 frames. utt_duration=1268 frames, utt_pad_proportion=0.04908, over 10332.63 utterances.], batch size: 79, lr: 4.90e-02, grad_scale: 8.0 2023-03-07 10:48:08,690 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1495.0, num_to_drop=2, layers_to_drop={0, 3} 2023-03-07 10:48:16,250 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.693e+02 5.276e+02 6.895e+02 9.021e+02 2.982e+03, threshold=1.379e+03, percent-clipped=6.0 2023-03-07 10:48:16,293 INFO [train2.py:809] (2/4) Epoch 1, batch 1500, loss[ctc_loss=0.5115, att_loss=0.4648, loss=0.4741, over 16539.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006227, over 45.00 utterances.], tot_loss[ctc_loss=0.5846, att_loss=0.5088, loss=0.524, over 3272933.30 frames. utt_duration=1293 frames, utt_pad_proportion=0.0431, over 10140.26 utterances.], batch size: 45, lr: 4.89e-02, grad_scale: 8.0 2023-03-07 10:48:19,336 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1503.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 10:48:27,265 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1509.0, num_to_drop=2, layers_to_drop={1, 2} 2023-03-07 10:48:33,173 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2023-03-07 10:48:41,086 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.64 vs. limit=5.0 2023-03-07 10:49:09,161 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9310, 5.6656, 5.0731, 5.7542, 5.4178, 5.2205, 5.5369, 5.2866], device='cuda:2'), covar=tensor([0.0414, 0.0513, 0.0793, 0.0413, 0.0670, 0.0872, 0.0655, 0.0709], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0089, 0.0101, 0.0085, 0.0089, 0.0110, 0.0088, 0.0108], device='cuda:2'), out_proj_covar=tensor([7.8095e-05, 9.2237e-05, 1.0503e-04, 8.6178e-05, 8.7344e-05, 1.2717e-04, 8.9263e-05, 1.1871e-04], device='cuda:2') 2023-03-07 10:49:09,861 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.30 vs. limit=5.0 2023-03-07 10:49:26,086 INFO [train2.py:809] (2/4) Epoch 1, batch 1550, loss[ctc_loss=0.5402, att_loss=0.4816, loss=0.4934, over 16480.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.00568, over 46.00 utterances.], tot_loss[ctc_loss=0.5691, att_loss=0.4952, loss=0.51, over 3276213.86 frames. 
utt_duration=1320 frames, utt_pad_proportion=0.0367, over 9941.30 utterances.], batch size: 46, lr: 4.89e-02, grad_scale: 8.0 2023-03-07 10:49:26,178 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1551.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 10:49:46,970 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1566.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:50:15,161 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9759, 4.0924, 4.3700, 4.4273, 4.2507, 2.0622, 4.9563, 3.7546], device='cuda:2'), covar=tensor([0.1548, 0.4440, 0.0832, 0.0945, 0.2144, 0.3199, 0.1019, 0.8148], device='cuda:2'), in_proj_covar=tensor([0.0023, 0.0028, 0.0025, 0.0025, 0.0032, 0.0028, 0.0027, 0.0025], device='cuda:2'), out_proj_covar=tensor([1.2262e-05, 1.9870e-05, 1.2089e-05, 1.4073e-05, 1.9491e-05, 1.7133e-05, 1.4779e-05, 1.9510e-05], device='cuda:2') 2023-03-07 10:50:36,307 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.585e+02 5.591e+02 7.426e+02 9.323e+02 2.170e+03, threshold=1.485e+03, percent-clipped=5.0 2023-03-07 10:50:36,350 INFO [train2.py:809] (2/4) Epoch 1, batch 1600, loss[ctc_loss=0.4584, att_loss=0.4097, loss=0.4194, over 15877.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009658, over 39.00 utterances.], tot_loss[ctc_loss=0.5573, att_loss=0.4853, loss=0.4997, over 3281724.31 frames. utt_duration=1308 frames, utt_pad_proportion=0.03784, over 10048.40 utterances.], batch size: 39, lr: 4.88e-02, grad_scale: 8.0 2023-03-07 10:50:50,107 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1611.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:51:10,592 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6018, 1.8187, 2.8860, 2.6308, 2.7183, 2.5368, 2.3406, 2.5651], device='cuda:2'), covar=tensor([0.1762, 0.1675, 0.0546, 0.1132, 0.0950, 0.1454, 0.1504, 0.1340], device='cuda:2'), in_proj_covar=tensor([0.0049, 0.0048, 0.0059, 0.0082, 0.0076, 0.0090, 0.0049, 0.0081], device='cuda:2'), out_proj_covar=tensor([3.9529e-05, 4.0146e-05, 4.4207e-05, 5.9955e-05, 5.3372e-05, 6.8262e-05, 4.1695e-05, 5.9779e-05], device='cuda:2') 2023-03-07 10:51:13,091 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1627.0, num_to_drop=2, layers_to_drop={0, 1} 2023-03-07 10:51:28,716 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9379, 4.0095, 4.0497, 4.2466, 4.2636, 3.7141, 3.8489, 3.7562], device='cuda:2'), covar=tensor([0.0534, 0.0593, 0.0419, 0.0313, 0.0404, 0.0790, 0.0561, 0.0673], device='cuda:2'), in_proj_covar=tensor([0.0033, 0.0035, 0.0037, 0.0036, 0.0033, 0.0035, 0.0038, 0.0037], device='cuda:2'), out_proj_covar=tensor([2.5822e-05, 2.7852e-05, 3.0176e-05, 2.8326e-05, 2.5809e-05, 2.8835e-05, 3.0568e-05, 2.8941e-05], device='cuda:2') 2023-03-07 10:51:44,856 INFO [train2.py:809] (2/4) Epoch 1, batch 1650, loss[ctc_loss=0.5465, att_loss=0.4756, loss=0.4898, over 17068.00 frames. utt_duration=1315 frames, utt_pad_proportion=0.007852, over 52.00 utterances.], tot_loss[ctc_loss=0.5479, att_loss=0.4773, loss=0.4915, over 3276752.60 frames. 
utt_duration=1265 frames, utt_pad_proportion=0.05122, over 10371.00 utterances.], batch size: 52, lr: 4.87e-02, grad_scale: 8.0 2023-03-07 10:52:15,879 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1672.0, num_to_drop=2, layers_to_drop={1, 3} 2023-03-07 10:52:55,825 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.825e+02 4.712e+02 6.880e+02 8.240e+02 2.356e+03, threshold=1.376e+03, percent-clipped=2.0 2023-03-07 10:52:55,868 INFO [train2.py:809] (2/4) Epoch 1, batch 1700, loss[ctc_loss=0.5211, att_loss=0.4593, loss=0.4717, over 17288.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01282, over 55.00 utterances.], tot_loss[ctc_loss=0.5356, att_loss=0.4678, loss=0.4814, over 3275687.01 frames. utt_duration=1248 frames, utt_pad_proportion=0.05529, over 10507.61 utterances.], batch size: 55, lr: 4.86e-02, grad_scale: 8.0 2023-03-07 10:54:07,065 INFO [train2.py:809] (2/4) Epoch 1, batch 1750, loss[ctc_loss=0.4605, att_loss=0.4337, loss=0.4391, over 16981.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.006501, over 50.00 utterances.], tot_loss[ctc_loss=0.5219, att_loss=0.4584, loss=0.4711, over 3264664.81 frames. utt_duration=1241 frames, utt_pad_proportion=0.05942, over 10539.29 utterances.], batch size: 50, lr: 4.86e-02, grad_scale: 8.0 2023-03-07 10:55:02,402 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1790.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 10:55:17,468 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.104e+02 4.727e+02 5.545e+02 7.358e+02 1.291e+03, threshold=1.109e+03, percent-clipped=0.0 2023-03-07 10:55:17,511 INFO [train2.py:809] (2/4) Epoch 1, batch 1800, loss[ctc_loss=0.4837, att_loss=0.4395, loss=0.4483, over 16329.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005567, over 45.00 utterances.], tot_loss[ctc_loss=0.511, att_loss=0.451, loss=0.463, over 3268538.20 frames. utt_duration=1224 frames, utt_pad_proportion=0.06191, over 10698.64 utterances.], batch size: 45, lr: 4.85e-02, grad_scale: 8.0 2023-03-07 10:55:28,555 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1809.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 10:56:09,995 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1838.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:56:27,794 INFO [train2.py:809] (2/4) Epoch 1, batch 1850, loss[ctc_loss=0.4783, att_loss=0.4531, loss=0.4582, over 16965.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.00767, over 50.00 utterances.], tot_loss[ctc_loss=0.502, att_loss=0.4461, loss=0.4573, over 3275798.30 frames. utt_duration=1238 frames, utt_pad_proportion=0.05534, over 10596.96 utterances.], batch size: 50, lr: 4.84e-02, grad_scale: 8.0 2023-03-07 10:56:36,581 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1857.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 10:56:36,774 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1857.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 10:56:40,258 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.71 vs. 
limit=2.0 2023-03-07 10:56:56,769 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1870.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 10:57:37,105 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1899.0, num_to_drop=2, layers_to_drop={0, 1} 2023-03-07 10:57:39,636 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.994e+02 5.025e+02 6.383e+02 8.643e+02 1.619e+03, threshold=1.277e+03, percent-clipped=10.0 2023-03-07 10:57:39,682 INFO [train2.py:809] (2/4) Epoch 1, batch 1900, loss[ctc_loss=0.444, att_loss=0.4007, loss=0.4093, over 17443.00 frames. utt_duration=1013 frames, utt_pad_proportion=0.04351, over 69.00 utterances.], tot_loss[ctc_loss=0.4906, att_loss=0.4392, loss=0.4495, over 3279514.60 frames. utt_duration=1223 frames, utt_pad_proportion=0.05615, over 10739.11 utterances.], batch size: 69, lr: 4.83e-02, grad_scale: 8.0 2023-03-07 10:58:05,209 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1918.0, num_to_drop=2, layers_to_drop={0, 2} 2023-03-07 10:58:10,272 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1922.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 10:58:22,741 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1931.0, num_to_drop=2, layers_to_drop={1, 2} 2023-03-07 10:58:50,731 INFO [train2.py:809] (2/4) Epoch 1, batch 1950, loss[ctc_loss=0.5679, att_loss=0.4943, loss=0.509, over 14032.00 frames. utt_duration=388.3 frames, utt_pad_proportion=0.3247, over 145.00 utterances.], tot_loss[ctc_loss=0.481, att_loss=0.4335, loss=0.443, over 3269215.98 frames. utt_duration=1220 frames, utt_pad_proportion=0.06043, over 10735.89 utterances.], batch size: 145, lr: 4.83e-02, grad_scale: 8.0 2023-03-07 10:59:14,929 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1967.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:00:06,965 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.041e+02 5.008e+02 6.125e+02 7.778e+02 1.503e+03, threshold=1.225e+03, percent-clipped=3.0 2023-03-07 11:00:07,008 INFO [train2.py:809] (2/4) Epoch 1, batch 2000, loss[ctc_loss=0.3892, att_loss=0.3857, loss=0.3864, over 17109.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.0159, over 56.00 utterances.], tot_loss[ctc_loss=0.4715, att_loss=0.4279, loss=0.4366, over 3265663.75 frames. utt_duration=1220 frames, utt_pad_proportion=0.06226, over 10718.38 utterances.], batch size: 56, lr: 4.82e-02, grad_scale: 16.0 2023-03-07 11:01:23,774 INFO [train2.py:809] (2/4) Epoch 1, batch 2050, loss[ctc_loss=0.337, att_loss=0.3405, loss=0.3398, over 15898.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.007284, over 39.00 utterances.], tot_loss[ctc_loss=0.4554, att_loss=0.4194, loss=0.4266, over 3271004.69 frames. utt_duration=1231 frames, utt_pad_proportion=0.05859, over 10641.66 utterances.], batch size: 39, lr: 4.81e-02, grad_scale: 16.0 2023-03-07 11:02:23,679 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2090.0, num_to_drop=2, layers_to_drop={1, 2} 2023-03-07 11:02:39,886 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+02 3.921e+02 4.697e+02 6.442e+02 1.106e+03, threshold=9.394e+02, percent-clipped=0.0 2023-03-07 11:02:39,928 INFO [train2.py:809] (2/4) Epoch 1, batch 2100, loss[ctc_loss=0.4087, att_loss=0.4017, loss=0.4031, over 16408.00 frames. 
utt_duration=1493 frames, utt_pad_proportion=0.006336, over 44.00 utterances.], tot_loss[ctc_loss=0.4392, att_loss=0.4114, loss=0.417, over 3267026.10 frames. utt_duration=1266 frames, utt_pad_proportion=0.0506, over 10330.87 utterances.], batch size: 44, lr: 4.80e-02, grad_scale: 16.0 2023-03-07 11:03:35,226 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2138.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:03:55,158 INFO [train2.py:809] (2/4) Epoch 1, batch 2150, loss[ctc_loss=0.3697, att_loss=0.3701, loss=0.3701, over 15621.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.009642, over 37.00 utterances.], tot_loss[ctc_loss=0.4274, att_loss=0.4062, loss=0.4105, over 3273387.21 frames. utt_duration=1273 frames, utt_pad_proportion=0.04773, over 10298.63 utterances.], batch size: 37, lr: 4.79e-02, grad_scale: 16.0 2023-03-07 11:05:00,213 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=2194.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:05:10,565 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.636e+02 4.045e+02 5.653e+02 7.128e+02 1.127e+03, threshold=1.131e+03, percent-clipped=5.0 2023-03-07 11:05:10,609 INFO [train2.py:809] (2/4) Epoch 1, batch 2200, loss[ctc_loss=0.3455, att_loss=0.3744, loss=0.3686, over 17313.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.01119, over 55.00 utterances.], tot_loss[ctc_loss=0.4167, att_loss=0.4023, loss=0.4052, over 3285818.01 frames. utt_duration=1262 frames, utt_pad_proportion=0.04731, over 10431.17 utterances.], batch size: 55, lr: 4.78e-02, grad_scale: 16.0 2023-03-07 11:05:30,183 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=2213.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:05:43,456 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2222.0, num_to_drop=2, layers_to_drop={0, 1} 2023-03-07 11:05:49,154 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=2226.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:06:26,397 INFO [train2.py:809] (2/4) Epoch 1, batch 2250, loss[ctc_loss=0.3327, att_loss=0.3593, loss=0.354, over 11765.00 frames. utt_duration=1811 frames, utt_pad_proportion=0.03337, over 26.00 utterances.], tot_loss[ctc_loss=0.4068, att_loss=0.3975, loss=0.3994, over 3271832.75 frames. 
utt_duration=1245 frames, utt_pad_proportion=0.05421, over 10520.72 utterances.], batch size: 26, lr: 4.77e-02, grad_scale: 16.0 2023-03-07 11:06:52,770 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2267.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:06:57,125 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2270.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:07:24,839 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0336, 5.5866, 5.7121, 5.9065, 5.5777, 5.7965, 5.5094, 5.8281], device='cuda:2'), covar=tensor([0.0191, 0.0241, 0.0211, 0.0224, 0.0340, 0.0343, 0.0275, 0.0266], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0107, 0.0088, 0.0098, 0.0112, 0.0105, 0.0095, 0.0114], device='cuda:2'), out_proj_covar=tensor([8.7911e-05, 9.3584e-05, 8.4483e-05, 9.3024e-05, 1.0452e-04, 1.0829e-04, 9.4514e-05, 1.1289e-04], device='cuda:2') 2023-03-07 11:07:43,371 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.224e+02 3.835e+02 4.889e+02 6.269e+02 9.585e+02, threshold=9.779e+02, percent-clipped=0.0 2023-03-07 11:07:43,415 INFO [train2.py:809] (2/4) Epoch 1, batch 2300, loss[ctc_loss=0.3176, att_loss=0.3317, loss=0.3289, over 15510.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.007576, over 36.00 utterances.], tot_loss[ctc_loss=0.3964, att_loss=0.393, loss=0.3937, over 3272718.41 frames. utt_duration=1260 frames, utt_pad_proportion=0.04891, over 10402.09 utterances.], batch size: 36, lr: 4.77e-02, grad_scale: 16.0 2023-03-07 11:08:00,629 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1819, 5.0128, 4.9890, 5.0832, 4.7250, 5.1336, 5.1320, 5.0338], device='cuda:2'), covar=tensor([0.0338, 0.0294, 0.0314, 0.0325, 0.0539, 0.0395, 0.0358, 0.0552], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0111, 0.0092, 0.0103, 0.0118, 0.0109, 0.0099, 0.0119], device='cuda:2'), out_proj_covar=tensor([9.2334e-05, 9.7356e-05, 8.8748e-05, 9.7794e-05, 1.0978e-04, 1.1241e-04, 9.9009e-05, 1.1885e-04], device='cuda:2') 2023-03-07 11:08:06,519 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2315.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:08:43,520 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=2340.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:08:59,800 INFO [train2.py:809] (2/4) Epoch 1, batch 2350, loss[ctc_loss=0.3774, att_loss=0.383, loss=0.3819, over 17404.00 frames. utt_duration=1106 frames, utt_pad_proportion=0.03295, over 63.00 utterances.], tot_loss[ctc_loss=0.389, att_loss=0.3908, loss=0.3904, over 3276110.15 frames. utt_duration=1242 frames, utt_pad_proportion=0.05191, over 10567.08 utterances.], batch size: 63, lr: 4.76e-02, grad_scale: 16.0 2023-03-07 11:09:24,747 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.97 vs. limit=2.0 2023-03-07 11:10:15,552 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.741e+02 4.246e+02 5.457e+02 7.092e+02 1.687e+03, threshold=1.091e+03, percent-clipped=6.0 2023-03-07 11:10:15,595 INFO [train2.py:809] (2/4) Epoch 1, batch 2400, loss[ctc_loss=0.3281, att_loss=0.3654, loss=0.358, over 16550.00 frames. utt_duration=1473 frames, utt_pad_proportion=0.005045, over 45.00 utterances.], tot_loss[ctc_loss=0.3827, att_loss=0.388, loss=0.387, over 3284864.88 frames. 
utt_duration=1229 frames, utt_pad_proportion=0.05208, over 10708.57 utterances.], batch size: 45, lr: 4.75e-02, grad_scale: 16.0 2023-03-07 11:10:15,986 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=2401.0, num_to_drop=2, layers_to_drop={2, 3} 2023-03-07 11:10:46,045 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.97 vs. limit=2.0 2023-03-07 11:11:30,932 INFO [train2.py:809] (2/4) Epoch 1, batch 2450, loss[ctc_loss=0.4217, att_loss=0.4163, loss=0.4173, over 16945.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.007359, over 50.00 utterances.], tot_loss[ctc_loss=0.3778, att_loss=0.3864, loss=0.3847, over 3296675.91 frames. utt_duration=1231 frames, utt_pad_proportion=0.04828, over 10725.07 utterances.], batch size: 50, lr: 4.74e-02, grad_scale: 16.0 2023-03-07 11:11:45,010 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1268, 4.7686, 4.1778, 5.2485, 3.0849, 4.3609, 5.4215, 4.1096], device='cuda:2'), covar=tensor([0.1747, 0.1644, 0.1187, 0.0450, 1.0031, 0.0742, 0.0427, 0.4186], device='cuda:2'), in_proj_covar=tensor([0.0050, 0.0043, 0.0046, 0.0050, 0.0120, 0.0058, 0.0055, 0.0050], device='cuda:2'), out_proj_covar=tensor([2.5556e-05, 2.8440e-05, 1.8139e-05, 2.4917e-05, 7.5816e-05, 3.1614e-05, 2.4561e-05, 3.4319e-05], device='cuda:2') 2023-03-07 11:12:34,776 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-07 11:12:36,933 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2494.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 11:12:47,383 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.845e+02 4.360e+02 5.388e+02 6.966e+02 1.787e+03, threshold=1.078e+03, percent-clipped=1.0 2023-03-07 11:12:47,426 INFO [train2.py:809] (2/4) Epoch 1, batch 2500, loss[ctc_loss=0.3726, att_loss=0.3919, loss=0.388, over 17012.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008326, over 51.00 utterances.], tot_loss[ctc_loss=0.3691, att_loss=0.3825, loss=0.3798, over 3293391.34 frames. utt_duration=1254 frames, utt_pad_proportion=0.04508, over 10514.67 utterances.], batch size: 51, lr: 4.73e-02, grad_scale: 16.0 2023-03-07 11:13:06,028 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4260, 4.9608, 4.6914, 4.4805, 5.1694, 4.8621, 4.8153, 4.6606], device='cuda:2'), covar=tensor([0.0319, 0.0178, 0.0250, 0.0361, 0.0168, 0.0179, 0.0219, 0.0350], device='cuda:2'), in_proj_covar=tensor([0.0082, 0.0069, 0.0067, 0.0056, 0.0074, 0.0093, 0.0066, 0.0081], device='cuda:2'), out_proj_covar=tensor([8.7419e-05, 6.6180e-05, 6.6788e-05, 5.8575e-05, 7.7398e-05, 1.0288e-04, 6.4260e-05, 8.8860e-05], device='cuda:2') 2023-03-07 11:13:06,119 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2513.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:13:19,834 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.35 vs. limit=2.0 2023-03-07 11:13:25,379 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2526.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:13:49,475 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2542.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:14:02,690 INFO [train2.py:809] (2/4) Epoch 1, batch 2550, loss[ctc_loss=0.2718, att_loss=0.3155, loss=0.3068, over 15776.00 frames. 
utt_duration=1662 frames, utt_pad_proportion=0.008338, over 38.00 utterances.], tot_loss[ctc_loss=0.3632, att_loss=0.3798, loss=0.3765, over 3290676.69 frames. utt_duration=1260 frames, utt_pad_proportion=0.04463, over 10462.06 utterances.], batch size: 38, lr: 4.72e-02, grad_scale: 16.0 2023-03-07 11:14:19,961 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2561.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:14:39,272 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2574.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:15:19,989 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.773e+02 4.230e+02 5.272e+02 6.534e+02 1.107e+03, threshold=1.054e+03, percent-clipped=1.0 2023-03-07 11:15:20,033 INFO [train2.py:809] (2/4) Epoch 1, batch 2600, loss[ctc_loss=0.3707, att_loss=0.3955, loss=0.3905, over 17108.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01484, over 56.00 utterances.], tot_loss[ctc_loss=0.3568, att_loss=0.3766, loss=0.3727, over 3280716.51 frames. utt_duration=1263 frames, utt_pad_proportion=0.04546, over 10400.73 utterances.], batch size: 56, lr: 4.71e-02, grad_scale: 16.0 2023-03-07 11:16:26,839 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.15 vs. limit=2.0 2023-03-07 11:16:37,076 INFO [train2.py:809] (2/4) Epoch 1, batch 2650, loss[ctc_loss=0.334, att_loss=0.3655, loss=0.3592, over 16124.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006403, over 42.00 utterances.], tot_loss[ctc_loss=0.3504, att_loss=0.3736, loss=0.3689, over 3282882.38 frames. utt_duration=1261 frames, utt_pad_proportion=0.04564, over 10428.13 utterances.], batch size: 42, lr: 4.70e-02, grad_scale: 16.0 2023-03-07 11:17:29,894 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=2685.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:17:36,246 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2023-03-07 11:17:45,991 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=2696.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 11:17:53,968 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+02 4.149e+02 5.114e+02 6.615e+02 1.308e+03, threshold=1.023e+03, percent-clipped=1.0 2023-03-07 11:17:54,010 INFO [train2.py:809] (2/4) Epoch 1, batch 2700, loss[ctc_loss=0.4069, att_loss=0.4006, loss=0.4019, over 16959.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007933, over 50.00 utterances.], tot_loss[ctc_loss=0.3467, att_loss=0.372, loss=0.3669, over 3265215.80 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.05864, over 10591.97 utterances.], batch size: 50, lr: 4.69e-02, grad_scale: 16.0 2023-03-07 11:18:28,346 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0937, 3.6519, 3.9309, 3.9818, 4.3096, 3.8831, 2.9879, 3.6551], device='cuda:2'), covar=tensor([0.0192, 0.0273, 0.0207, 0.0268, 0.0164, 0.0295, 0.0850, 0.0382], device='cuda:2'), in_proj_covar=tensor([0.0026, 0.0026, 0.0027, 0.0027, 0.0025, 0.0025, 0.0038, 0.0031], device='cuda:2'), out_proj_covar=tensor([1.9385e-05, 1.9522e-05, 1.8362e-05, 1.8598e-05, 1.8548e-05, 1.7995e-05, 3.0285e-05, 2.2740e-05], device='cuda:2') 2023-03-07 11:19:03,803 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=2746.0, num_to_drop=2, layers_to_drop={0, 2} 2023-03-07 11:19:10,751 INFO [train2.py:809] (2/4) Epoch 1, batch 2750, loss[ctc_loss=0.3536, att_loss=0.3965, loss=0.3879, over 17146.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.01353, over 56.00 utterances.], tot_loss[ctc_loss=0.3425, att_loss=0.3706, loss=0.365, over 3267759.70 frames. utt_duration=1235 frames, utt_pad_proportion=0.05856, over 10595.72 utterances.], batch size: 56, lr: 4.68e-02, grad_scale: 16.0 2023-03-07 11:19:30,472 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2023-03-07 11:19:31,245 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9393, 5.9614, 5.4210, 5.9894, 5.6588, 5.6086, 5.5524, 5.4418], device='cuda:2'), covar=tensor([0.0592, 0.0559, 0.0609, 0.0377, 0.0487, 0.0703, 0.0937, 0.1110], device='cuda:2'), in_proj_covar=tensor([0.0156, 0.0182, 0.0155, 0.0132, 0.0131, 0.0184, 0.0175, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 11:20:27,836 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 4.978e+02 5.756e+02 7.064e+02 1.353e+03, threshold=1.151e+03, percent-clipped=3.0 2023-03-07 11:20:27,883 INFO [train2.py:809] (2/4) Epoch 1, batch 2800, loss[ctc_loss=0.3155, att_loss=0.3524, loss=0.345, over 16110.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.007234, over 42.00 utterances.], tot_loss[ctc_loss=0.3386, att_loss=0.369, loss=0.3629, over 3268740.89 frames. utt_duration=1245 frames, utt_pad_proportion=0.05542, over 10511.57 utterances.], batch size: 42, lr: 4.67e-02, grad_scale: 16.0 2023-03-07 11:21:04,285 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=2824.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:21:37,618 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.15 vs. limit=2.0 2023-03-07 11:21:44,973 INFO [train2.py:809] (2/4) Epoch 1, batch 2850, loss[ctc_loss=0.3784, att_loss=0.3962, loss=0.3926, over 17376.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03338, over 63.00 utterances.], tot_loss[ctc_loss=0.3353, att_loss=0.3678, loss=0.3613, over 3261108.24 frames. 
utt_duration=1274 frames, utt_pad_proportion=0.05126, over 10249.06 utterances.], batch size: 63, lr: 4.66e-02, grad_scale: 16.0 2023-03-07 11:22:36,843 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=2885.0, num_to_drop=2, layers_to_drop={1, 2} 2023-03-07 11:23:01,455 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.575e+02 4.414e+02 5.747e+02 7.042e+02 1.597e+03, threshold=1.149e+03, percent-clipped=5.0 2023-03-07 11:23:01,498 INFO [train2.py:809] (2/4) Epoch 1, batch 2900, loss[ctc_loss=0.3193, att_loss=0.3771, loss=0.3655, over 17443.00 frames. utt_duration=1013 frames, utt_pad_proportion=0.04286, over 69.00 utterances.], tot_loss[ctc_loss=0.332, att_loss=0.3671, loss=0.36, over 3276353.61 frames. utt_duration=1276 frames, utt_pad_proportion=0.04612, over 10282.24 utterances.], batch size: 69, lr: 4.65e-02, grad_scale: 16.0 2023-03-07 11:23:57,975 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=2937.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 11:24:18,939 INFO [train2.py:809] (2/4) Epoch 1, batch 2950, loss[ctc_loss=0.3332, att_loss=0.3758, loss=0.3672, over 17392.00 frames. utt_duration=1106 frames, utt_pad_proportion=0.03327, over 63.00 utterances.], tot_loss[ctc_loss=0.3273, att_loss=0.3645, loss=0.3571, over 3263625.19 frames. utt_duration=1261 frames, utt_pad_proportion=0.05483, over 10365.93 utterances.], batch size: 63, lr: 4.64e-02, grad_scale: 16.0 2023-03-07 11:24:41,789 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.04 vs. limit=2.0 2023-03-07 11:25:29,064 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2996.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 11:25:32,186 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=2998.0, num_to_drop=2, layers_to_drop={2, 3} 2023-03-07 11:25:36,468 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.432e+02 4.332e+02 5.294e+02 6.689e+02 1.698e+03, threshold=1.059e+03, percent-clipped=3.0 2023-03-07 11:25:36,511 INFO [train2.py:809] (2/4) Epoch 1, batch 3000, loss[ctc_loss=0.3191, att_loss=0.3708, loss=0.3604, over 16278.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.007447, over 43.00 utterances.], tot_loss[ctc_loss=0.3252, att_loss=0.3632, loss=0.3556, over 3253448.06 frames. utt_duration=1240 frames, utt_pad_proportion=0.06271, over 10505.44 utterances.], batch size: 43, lr: 4.63e-02, grad_scale: 16.0 2023-03-07 11:25:36,511 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 11:25:51,183 INFO [train2.py:843] (2/4) Epoch 1, validation: ctc_loss=0.2388, att_loss=0.3154, loss=0.3001, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 11:25:51,184 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 14667MB 2023-03-07 11:26:18,678 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3019.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:26:51,055 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3041.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:26:55,339 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3044.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:27:07,499 INFO [train2.py:809] (2/4) Epoch 1, batch 3050, loss[ctc_loss=0.2995, att_loss=0.3449, loss=0.3358, over 16275.00 frames. 
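The "Computing validation loss" / "Maximum memory allocated" pair above corresponds to a periodic pass over the dev set followed by a CUDA memory report. A minimal sketch of that step (valid_dl and compute_loss are assumed helpers; torch.cuda.max_memory_allocated is the standard API behind the MB figure):

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_dl, compute_loss, device):
        """Average (ctc_loss, att_loss, loss) over the dev set without gradients."""
        model.eval()
        totals = torch.zeros(3, device=device)
        num_batches = 0
        for batch in valid_dl:
            ctc_loss, att_loss, loss = compute_loss(model, batch)  # assumed helper
            totals += torch.stack([ctc_loss, att_loss, loss]).detach()
            num_batches += 1
        model.train()
        return totals / max(num_batches, 1)

    # The memory line is the running high-water mark, reported in MB:
    # torch.cuda.max_memory_allocated(device) // (1024 * 1024)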
utt_duration=1516 frames, utt_pad_proportion=0.007508, over 43.00 utterances.], tot_loss[ctc_loss=0.3227, att_loss=0.3626, loss=0.3547, over 3258815.35 frames. utt_duration=1218 frames, utt_pad_proportion=0.06493, over 10715.09 utterances.], batch size: 43, lr: 4.62e-02, grad_scale: 16.0 2023-03-07 11:27:43,179 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.08 vs. limit=2.0 2023-03-07 11:27:52,678 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3080.0, num_to_drop=2, layers_to_drop={0, 3} 2023-03-07 11:28:24,852 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 4.335e+02 5.357e+02 6.400e+02 1.084e+03, threshold=1.071e+03, percent-clipped=0.0 2023-03-07 11:28:24,897 INFO [train2.py:809] (2/4) Epoch 1, batch 3100, loss[ctc_loss=0.4524, att_loss=0.4369, loss=0.44, over 16888.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006524, over 49.00 utterances.], tot_loss[ctc_loss=0.319, att_loss=0.3609, loss=0.3525, over 3258699.12 frames. utt_duration=1219 frames, utt_pad_proportion=0.06484, over 10704.19 utterances.], batch size: 49, lr: 4.61e-02, grad_scale: 16.0 2023-03-07 11:29:41,930 INFO [train2.py:809] (2/4) Epoch 1, batch 3150, loss[ctc_loss=0.2833, att_loss=0.3379, loss=0.327, over 15879.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009627, over 39.00 utterances.], tot_loss[ctc_loss=0.3183, att_loss=0.3612, loss=0.3526, over 3254823.40 frames. utt_duration=1197 frames, utt_pad_proportion=0.071, over 10893.29 utterances.], batch size: 39, lr: 4.60e-02, grad_scale: 16.0 2023-03-07 11:29:43,860 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1054, 4.4717, 4.9720, 4.2622, 3.1748, 4.5188, 4.4403, 4.9254], device='cuda:2'), covar=tensor([0.1020, 0.0478, 0.0146, 0.0465, 0.3588, 0.0303, 0.0250, 0.0129], device='cuda:2'), in_proj_covar=tensor([0.0054, 0.0059, 0.0068, 0.0101, 0.0163, 0.0056, 0.0085, 0.0083], device='cuda:2'), out_proj_covar=tensor([4.3078e-05, 4.0391e-05, 3.7709e-05, 6.2005e-05, 1.1326e-04, 3.4286e-05, 4.5171e-05, 3.8955e-05], device='cuda:2') 2023-03-07 11:29:44,531 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.79 vs. limit=5.0 2023-03-07 11:30:11,693 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2023-03-07 11:30:17,010 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4537, 5.1359, 5.1190, 5.1923, 4.6907, 4.9428, 5.1324, 5.1264], device='cuda:2'), covar=tensor([0.0221, 0.0119, 0.0152, 0.0128, 0.0237, 0.0089, 0.0319, 0.0110], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0056, 0.0063, 0.0051, 0.0068, 0.0059, 0.0069, 0.0052], device='cuda:2'), out_proj_covar=tensor([8.8280e-05, 6.0006e-05, 6.4516e-05, 5.3119e-05, 7.6123e-05, 6.9109e-05, 7.8202e-05, 4.9489e-05], device='cuda:2') 2023-03-07 11:30:25,835 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3180.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:30:58,387 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.970e+02 4.582e+02 5.665e+02 7.020e+02 1.937e+03, threshold=1.133e+03, percent-clipped=6.0 2023-03-07 11:30:58,430 INFO [train2.py:809] (2/4) Epoch 1, batch 3200, loss[ctc_loss=0.2939, att_loss=0.35, loss=0.3388, over 16538.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006377, over 45.00 utterances.], tot_loss[ctc_loss=0.3168, att_loss=0.3607, loss=0.3519, over 3267150.54 frames. 
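In the optim.py lines above, the five grad-norm quartile values read as the min/25%/median/75%/max of recently observed gradient norms, and the threshold equals Clipping_scale times the median (2.0 * 5.357e+02 ~ 1.071e+03 in the entry above); percent-clipped is the share of recent batches whose norm exceeded that threshold. A sketch of that bookkeeping under those assumptions (not the actual optimizer code):

    import torch
    from collections import deque

    class GradNormClipper:
        """Track recent gradient norms and clip to clipping_scale * median."""

        def __init__(self, clipping_scale: float = 2.0, history: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)
            self.was_clipped = deque(maxlen=history)

        def clip_(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            quartiles = torch.quantile(torch.tensor(list(self.norms)),
                                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * quartiles[2].item()  # 2 x median
            clipped = norm > threshold
            self.was_clipped.append(clipped)
            if clipped:
                for p in params:  # scale gradients down onto the threshold
                    p.grad.mul_(threshold / norm)
            # `quartiles` and 100 * mean(was_clipped) mirror the logged quantities.
            return norm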
utt_duration=1218 frames, utt_pad_proportion=0.06339, over 10745.56 utterances.], batch size: 45, lr: 4.59e-02, grad_scale: 16.0 2023-03-07 11:30:58,851 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1954, 4.6310, 4.0780, 4.6350, 4.7917, 4.9211, 4.5114, 2.7409], device='cuda:2'), covar=tensor([0.1802, 0.1254, 0.2641, 0.0893, 0.0798, 0.0944, 0.2489, 1.1577], device='cuda:2'), in_proj_covar=tensor([0.0101, 0.0076, 0.0047, 0.0084, 0.0076, 0.0087, 0.0065, 0.0159], device='cuda:2'), out_proj_covar=tensor([6.3795e-05, 3.3410e-05, 3.4904e-05, 3.3399e-05, 4.0705e-05, 4.3903e-05, 4.1099e-05, 9.6985e-05], device='cuda:2') 2023-03-07 11:31:21,117 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3119, 3.7020, 3.0357, 1.5880, 3.3331, 3.7665, 3.1105, 2.3914], device='cuda:2'), covar=tensor([0.0391, 0.0270, 0.0432, 0.1379, 0.0283, 0.0218, 0.0623, 0.1032], device='cuda:2'), in_proj_covar=tensor([0.0035, 0.0034, 0.0033, 0.0047, 0.0031, 0.0026, 0.0032, 0.0044], device='cuda:2'), out_proj_covar=tensor([2.6231e-05, 2.3794e-05, 2.5398e-05, 3.9581e-05, 2.2976e-05, 2.0394e-05, 2.4639e-05, 3.7761e-05], device='cuda:2') 2023-03-07 11:32:13,829 INFO [train2.py:809] (2/4) Epoch 1, batch 3250, loss[ctc_loss=0.3865, att_loss=0.4116, loss=0.4065, over 14548.00 frames. utt_duration=400.1 frames, utt_pad_proportion=0.303, over 146.00 utterances.], tot_loss[ctc_loss=0.3155, att_loss=0.3594, loss=0.3506, over 3259475.12 frames. utt_duration=1209 frames, utt_pad_proportion=0.06821, over 10793.25 utterances.], batch size: 146, lr: 4.58e-02, grad_scale: 16.0 2023-03-07 11:33:07,663 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3528, 4.4925, 4.4252, 4.5375, 4.7401, 4.6683, 4.6897, 4.8200], device='cuda:2'), covar=tensor([0.0128, 0.0192, 0.0131, 0.0139, 0.0134, 0.0117, 0.0135, 0.0101], device='cuda:2'), in_proj_covar=tensor([0.0034, 0.0034, 0.0034, 0.0030, 0.0030, 0.0031, 0.0030, 0.0030], device='cuda:2'), out_proj_covar=tensor([3.9276e-05, 3.6623e-05, 3.7313e-05, 3.0433e-05, 2.9206e-05, 3.4398e-05, 3.0163e-05, 2.9692e-05], device='cuda:2') 2023-03-07 11:33:09,808 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3288.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:33:17,865 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3293.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:33:29,892 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.939e+02 4.361e+02 5.805e+02 6.999e+02 1.268e+03, threshold=1.161e+03, percent-clipped=3.0 2023-03-07 11:33:29,935 INFO [train2.py:809] (2/4) Epoch 1, batch 3300, loss[ctc_loss=0.2821, att_loss=0.3557, loss=0.341, over 16689.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006379, over 46.00 utterances.], tot_loss[ctc_loss=0.3124, att_loss=0.3579, loss=0.3488, over 3258621.03 frames. 
utt_duration=1195 frames, utt_pad_proportion=0.07009, over 10920.95 utterances.], batch size: 46, lr: 4.57e-02, grad_scale: 16.0 2023-03-07 11:34:30,978 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3341.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:34:37,468 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6687, 5.7958, 5.3114, 5.8195, 5.3843, 5.3322, 5.2728, 5.2477], device='cuda:2'), covar=tensor([0.1282, 0.0694, 0.0571, 0.0495, 0.0667, 0.0998, 0.1672, 0.1224], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0219, 0.0175, 0.0154, 0.0143, 0.0209, 0.0215, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 11:34:38,307 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.33 vs. limit=5.0 2023-03-07 11:34:43,751 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3349.0, num_to_drop=2, layers_to_drop={2, 3} 2023-03-07 11:34:46,402 INFO [train2.py:809] (2/4) Epoch 1, batch 3350, loss[ctc_loss=0.3332, att_loss=0.3616, loss=0.3559, over 17068.00 frames. utt_duration=691.1 frames, utt_pad_proportion=0.1307, over 99.00 utterances.], tot_loss[ctc_loss=0.3079, att_loss=0.3558, loss=0.3462, over 3260956.98 frames. utt_duration=1219 frames, utt_pad_proportion=0.06349, over 10710.66 utterances.], batch size: 99, lr: 4.56e-02, grad_scale: 16.0 2023-03-07 11:34:55,280 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2023-03-07 11:35:23,032 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3375.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:35:44,662 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3389.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:36:00,676 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2023-03-07 11:36:04,117 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.598e+02 4.455e+02 5.465e+02 6.622e+02 1.273e+03, threshold=1.093e+03, percent-clipped=2.0 2023-03-07 11:36:04,159 INFO [train2.py:809] (2/4) Epoch 1, batch 3400, loss[ctc_loss=0.2597, att_loss=0.3283, loss=0.3145, over 14967.00 frames. utt_duration=1816 frames, utt_pad_proportion=0.02683, over 33.00 utterances.], tot_loss[ctc_loss=0.3048, att_loss=0.3551, loss=0.345, over 3271297.85 frames. utt_duration=1228 frames, utt_pad_proportion=0.05899, over 10670.03 utterances.], batch size: 33, lr: 4.55e-02, grad_scale: 16.0 2023-03-07 11:36:10,881 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.07 vs. limit=2.0 2023-03-07 11:37:05,154 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6187, 3.9564, 3.8546, 3.6469, 3.9944, 3.9081, 3.8995, 3.7936], device='cuda:2'), covar=tensor([0.0494, 0.0270, 0.0261, 0.0388, 0.0279, 0.0256, 0.0184, 0.0315], device='cuda:2'), in_proj_covar=tensor([0.0119, 0.0096, 0.0079, 0.0074, 0.0103, 0.0120, 0.0084, 0.0102], device='cuda:2'), out_proj_covar=tensor([1.4483e-04, 1.1211e-04, 9.3604e-05, 9.6096e-05, 1.2758e-04, 1.5911e-04, 9.4323e-05, 1.3138e-04], device='cuda:2') 2023-03-07 11:37:19,749 INFO [train2.py:809] (2/4) Epoch 1, batch 3450, loss[ctc_loss=0.3134, att_loss=0.358, loss=0.3491, over 16314.00 frames. utt_duration=1451 frames, utt_pad_proportion=0.007205, over 45.00 utterances.], tot_loss[ctc_loss=0.3049, att_loss=0.3547, loss=0.3448, over 3264575.06 frames. 
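The zipformer.py warmup lines above (num_to_drop, layers_to_drop) record whole encoder layers being skipped at random while the batch counter is still inside a per-stack warmup window, and nothing being dropped once training has progressed far enough. A hedged sketch of one way to choose such a set (the probability schedule is an assumption, not the actual zipformer.py logic):

    import random

    def choose_layers_to_drop(batch_count: float,
                              warmup_begin: float,
                              warmup_end: float,
                              num_layers: int,
                              max_to_drop: int = 2) -> set:
        """Randomly pick encoder layers to skip for this batch."""
        if batch_count >= warmup_end:
            return set()  # past the warmup window: keep every layer
        # Drop more aggressively early on, annealing to zero at warmup_end.
        frac = min(1.0, max(0.0, (warmup_end - batch_count)
                            / (warmup_end - warmup_begin)))
        num_to_drop = sum(random.random() < 0.5 * frac for _ in range(max_to_drop))
        return set(random.sample(range(num_layers), k=num_to_drop))

    # e.g. choose_layers_to_drop(3349.0, 3333.3, 4000.0, num_layers=4)
    # might return set(), {2} or {2, 3}, much like the entries above.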
utt_duration=1226 frames, utt_pad_proportion=0.0622, over 10662.47 utterances.], batch size: 45, lr: 4.54e-02, grad_scale: 16.0 2023-03-07 11:38:03,915 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3480.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:38:35,897 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+02 4.628e+02 6.013e+02 7.728e+02 1.949e+03, threshold=1.203e+03, percent-clipped=6.0 2023-03-07 11:38:35,939 INFO [train2.py:809] (2/4) Epoch 1, batch 3500, loss[ctc_loss=0.2367, att_loss=0.2936, loss=0.2822, over 15356.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01146, over 35.00 utterances.], tot_loss[ctc_loss=0.304, att_loss=0.3542, loss=0.3441, over 3273059.45 frames. utt_duration=1245 frames, utt_pad_proportion=0.05479, over 10524.75 utterances.], batch size: 35, lr: 4.53e-02, grad_scale: 16.0 2023-03-07 11:39:03,367 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.79 vs. limit=2.0 2023-03-07 11:39:16,982 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3528.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:39:45,698 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3546.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:39:52,691 INFO [train2.py:809] (2/4) Epoch 1, batch 3550, loss[ctc_loss=0.2803, att_loss=0.3313, loss=0.3211, over 15891.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009066, over 39.00 utterances.], tot_loss[ctc_loss=0.3043, att_loss=0.3541, loss=0.3442, over 3271957.36 frames. utt_duration=1269 frames, utt_pad_proportion=0.04924, over 10327.89 utterances.], batch size: 39, lr: 4.51e-02, grad_scale: 16.0 2023-03-07 11:40:20,797 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7802, 5.8283, 5.3525, 5.9310, 5.5320, 5.5151, 5.2208, 5.3435], device='cuda:2'), covar=tensor([0.0830, 0.0582, 0.0578, 0.0374, 0.0530, 0.0740, 0.1651, 0.1169], device='cuda:2'), in_proj_covar=tensor([0.0174, 0.0211, 0.0175, 0.0151, 0.0141, 0.0207, 0.0213, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 11:40:58,964 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3593.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:41:00,544 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1781, 4.5741, 4.4312, 4.5889, 4.7478, 4.4835, 4.4668, 4.6806], device='cuda:2'), covar=tensor([0.0168, 0.0233, 0.0123, 0.0115, 0.0109, 0.0124, 0.0157, 0.0125], device='cuda:2'), in_proj_covar=tensor([0.0034, 0.0034, 0.0034, 0.0029, 0.0028, 0.0031, 0.0031, 0.0030], device='cuda:2'), out_proj_covar=tensor([4.0713e-05, 3.9521e-05, 4.0206e-05, 3.2795e-05, 3.1099e-05, 3.7628e-05, 3.3774e-05, 3.2534e-05], device='cuda:2') 2023-03-07 11:41:10,752 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.449e+02 6.503e+02 7.868e+02 1.011e+03 2.300e+03, threshold=1.574e+03, percent-clipped=15.0 2023-03-07 11:41:10,799 INFO [train2.py:809] (2/4) Epoch 1, batch 3600, loss[ctc_loss=0.2694, att_loss=0.313, loss=0.3043, over 15770.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.009368, over 38.00 utterances.], tot_loss[ctc_loss=0.3023, att_loss=0.353, loss=0.3428, over 3272362.91 frames. 
utt_duration=1281 frames, utt_pad_proportion=0.04682, over 10229.67 utterances.], batch size: 38, lr: 4.50e-02, grad_scale: 16.0 2023-03-07 11:41:20,363 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3607.0, num_to_drop=2, layers_to_drop={0, 3} 2023-03-07 11:42:01,848 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.09 vs. limit=2.0 2023-03-07 11:42:12,269 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3641.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:42:17,630 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3644.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 11:42:27,536 INFO [train2.py:809] (2/4) Epoch 1, batch 3650, loss[ctc_loss=0.3249, att_loss=0.3669, loss=0.3585, over 17048.00 frames. utt_duration=690.2 frames, utt_pad_proportion=0.1318, over 99.00 utterances.], tot_loss[ctc_loss=0.3004, att_loss=0.3521, loss=0.3418, over 3269795.70 frames. utt_duration=1292 frames, utt_pad_proportion=0.04378, over 10132.59 utterances.], batch size: 99, lr: 4.49e-02, grad_scale: 16.0 2023-03-07 11:43:05,267 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3675.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:43:06,908 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3676.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 11:43:27,357 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.04 vs. limit=2.0 2023-03-07 11:43:47,093 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.430e+02 5.749e+02 7.107e+02 9.009e+02 1.799e+03, threshold=1.421e+03, percent-clipped=2.0 2023-03-07 11:43:47,136 INFO [train2.py:809] (2/4) Epoch 1, batch 3700, loss[ctc_loss=0.339, att_loss=0.382, loss=0.3734, over 16863.00 frames. utt_duration=682.8 frames, utt_pad_proportion=0.1422, over 99.00 utterances.], tot_loss[ctc_loss=0.2982, att_loss=0.3509, loss=0.3404, over 3264874.57 frames. utt_duration=1280 frames, utt_pad_proportion=0.04732, over 10215.11 utterances.], batch size: 99, lr: 4.48e-02, grad_scale: 16.0 2023-03-07 11:44:08,492 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.64 vs. 
limit=5.0 2023-03-07 11:44:10,942 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7112, 3.9864, 3.5327, 2.1140, 3.6395, 4.4630, 3.7393, 3.0326], device='cuda:2'), covar=tensor([0.0290, 0.0224, 0.0318, 0.1508, 0.0265, 0.0070, 0.0499, 0.0903], device='cuda:2'), in_proj_covar=tensor([0.0047, 0.0046, 0.0044, 0.0065, 0.0045, 0.0031, 0.0038, 0.0062], device='cuda:2'), out_proj_covar=tensor([3.5316e-05, 3.3532e-05, 3.8887e-05, 5.7342e-05, 3.4222e-05, 2.4529e-05, 3.3676e-05, 5.4182e-05], device='cuda:2') 2023-03-07 11:44:21,463 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3723.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:44:45,062 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5054, 4.8967, 4.3792, 4.6282, 4.9449, 4.8879, 4.7202, 4.8789], device='cuda:2'), covar=tensor([0.0131, 0.0193, 0.0136, 0.0132, 0.0125, 0.0085, 0.0128, 0.0125], device='cuda:2'), in_proj_covar=tensor([0.0033, 0.0034, 0.0034, 0.0029, 0.0027, 0.0030, 0.0031, 0.0030], device='cuda:2'), out_proj_covar=tensor([3.9823e-05, 4.0514e-05, 4.1493e-05, 3.3665e-05, 2.9687e-05, 3.6327e-05, 3.4206e-05, 3.3286e-05], device='cuda:2') 2023-03-07 11:44:45,185 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3737.0, num_to_drop=2, layers_to_drop={2, 3} 2023-03-07 11:44:48,764 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.61 vs. limit=2.0 2023-03-07 11:45:05,049 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8891, 3.8368, 3.3070, 1.8224, 3.4148, 4.3386, 3.6350, 2.5910], device='cuda:2'), covar=tensor([0.0198, 0.0211, 0.0373, 0.1303, 0.0334, 0.0065, 0.0329, 0.0877], device='cuda:2'), in_proj_covar=tensor([0.0048, 0.0047, 0.0047, 0.0067, 0.0047, 0.0032, 0.0039, 0.0065], device='cuda:2'), out_proj_covar=tensor([3.6437e-05, 3.4175e-05, 4.1306e-05, 5.8957e-05, 3.5847e-05, 2.5159e-05, 3.5361e-05, 5.6758e-05], device='cuda:2') 2023-03-07 11:45:06,157 INFO [train2.py:809] (2/4) Epoch 1, batch 3750, loss[ctc_loss=0.3549, att_loss=0.3956, loss=0.3875, over 16620.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005588, over 47.00 utterances.], tot_loss[ctc_loss=0.2959, att_loss=0.35, loss=0.3392, over 3257972.57 frames. utt_duration=1234 frames, utt_pad_proportion=0.06077, over 10572.96 utterances.], batch size: 47, lr: 4.47e-02, grad_scale: 16.0 2023-03-07 11:45:44,721 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0 2023-03-07 11:46:23,998 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.021e+02 4.717e+02 6.819e+02 8.989e+02 1.986e+03, threshold=1.364e+03, percent-clipped=6.0 2023-03-07 11:46:24,041 INFO [train2.py:809] (2/4) Epoch 1, batch 3800, loss[ctc_loss=0.362, att_loss=0.3805, loss=0.3768, over 17344.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.02115, over 59.00 utterances.], tot_loss[ctc_loss=0.2943, att_loss=0.3487, loss=0.3378, over 3259900.99 frames. 
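The scaling.py "Whitening" lines compare a per-module whiteness statistic against a limit (metric=2.61 vs. limit=2.0 above). One way to define such a metric, equal to 1.0 when the per-group channel covariance is proportional to the identity and growing as the activations become less white, is sketched below; the real scaling.py may differ in its details:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        """x: (num_frames, num_channels); channels split into num_groups groups."""
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups                    # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)                 # centre each group
        covar = torch.matmul(x.transpose(1, 2), x)          # (groups, cpg, cpg)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_sq_diag = (covar ** 2).sum() / (num_groups * cpg)
        return mean_sq_diag / (mean_diag ** 2 + 1.0e-20)    # >= 1.0; 1.0 if white

    # Illustrative check: i.i.d. Gaussian activations give a value close to 1.0,
    # well under the logged limits of 2.0 and 5.0.
    print(whitening_metric(torch.randn(1000, 96), num_groups=8))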
utt_duration=1249 frames, utt_pad_proportion=0.05693, over 10449.60 utterances.], batch size: 59, lr: 4.46e-02, grad_scale: 16.0 2023-03-07 11:46:25,957 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8273, 1.6348, 2.3033, 3.1265, 2.2861, 2.1678, 2.2481, 2.7162], device='cuda:2'), covar=tensor([0.0351, 0.0895, 0.0594, 0.0277, 0.0543, 0.1028, 0.0675, 0.0294], device='cuda:2'), in_proj_covar=tensor([0.0052, 0.0044, 0.0040, 0.0050, 0.0046, 0.0061, 0.0054, 0.0052], device='cuda:2'), out_proj_covar=tensor([3.7570e-05, 3.6340e-05, 3.6637e-05, 3.3589e-05, 3.8117e-05, 6.5132e-05, 5.1368e-05, 3.5624e-05], device='cuda:2') 2023-03-07 11:46:58,877 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0447, 6.0481, 5.4257, 6.0409, 5.6496, 5.6776, 5.6125, 5.3903], device='cuda:2'), covar=tensor([0.0791, 0.0621, 0.0586, 0.0526, 0.0497, 0.0700, 0.1637, 0.1659], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0219, 0.0184, 0.0166, 0.0144, 0.0222, 0.0233, 0.0210], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:2') 2023-03-07 11:47:17,183 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0740, 4.6066, 4.4601, 4.4003, 4.7898, 4.6783, 4.5858, 4.4652], device='cuda:2'), covar=tensor([0.0543, 0.0198, 0.0237, 0.0359, 0.0259, 0.0211, 0.0188, 0.0291], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0103, 0.0083, 0.0081, 0.0113, 0.0129, 0.0092, 0.0110], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-07 11:47:43,588 INFO [train2.py:809] (2/4) Epoch 1, batch 3850, loss[ctc_loss=0.2327, att_loss=0.3023, loss=0.2884, over 15640.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008439, over 37.00 utterances.], tot_loss[ctc_loss=0.2924, att_loss=0.3484, loss=0.3372, over 3263599.16 frames. utt_duration=1243 frames, utt_pad_proportion=0.05766, over 10512.79 utterances.], batch size: 37, lr: 4.45e-02, grad_scale: 16.0 2023-03-07 11:48:07,175 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.76 vs. limit=2.0 2023-03-07 11:49:01,029 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.357e+02 5.503e+02 6.521e+02 8.081e+02 1.434e+03, threshold=1.304e+03, percent-clipped=2.0 2023-03-07 11:49:01,073 INFO [train2.py:809] (2/4) Epoch 1, batch 3900, loss[ctc_loss=0.2669, att_loss=0.3534, loss=0.3361, over 17120.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01498, over 56.00 utterances.], tot_loss[ctc_loss=0.291, att_loss=0.3478, loss=0.3364, over 3256417.92 frames. utt_duration=1230 frames, utt_pad_proportion=0.06227, over 10598.95 utterances.], batch size: 56, lr: 4.44e-02, grad_scale: 16.0 2023-03-07 11:49:02,691 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3902.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 11:49:08,010 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-07 11:49:39,856 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3926.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:50:08,114 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3944.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:50:18,519 INFO [train2.py:809] (2/4) Epoch 1, batch 3950, loss[ctc_loss=0.2607, att_loss=0.3269, loss=0.3136, over 14102.00 frames. 
utt_duration=1821 frames, utt_pad_proportion=0.0509, over 31.00 utterances.], tot_loss[ctc_loss=0.2886, att_loss=0.3466, loss=0.335, over 3254038.61 frames. utt_duration=1242 frames, utt_pad_proportion=0.0606, over 10494.00 utterances.], batch size: 31, lr: 4.43e-02, grad_scale: 16.0 2023-03-07 11:50:30,406 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.65 vs. limit=5.0 2023-03-07 11:51:37,720 INFO [train2.py:809] (2/4) Epoch 2, batch 0, loss[ctc_loss=0.2437, att_loss=0.299, loss=0.288, over 15499.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008929, over 36.00 utterances.], tot_loss[ctc_loss=0.2437, att_loss=0.299, loss=0.288, over 15499.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008929, over 36.00 utterances.], batch size: 36, lr: 4.34e-02, grad_scale: 8.0 2023-03-07 11:51:37,720 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 11:51:49,514 INFO [train2.py:843] (2/4) Epoch 2, validation: ctc_loss=0.1604, att_loss=0.2954, loss=0.2684, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 11:51:49,516 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 11:51:52,986 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3987.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:52:01,279 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3992.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:52:08,115 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2023-03-07 11:52:20,458 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.599e+02 5.286e+02 6.545e+02 8.035e+02 1.268e+03, threshold=1.309e+03, percent-clipped=0.0 2023-03-07 11:53:10,084 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=4032.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 11:53:14,648 INFO [train2.py:809] (2/4) Epoch 2, batch 50, loss[ctc_loss=0.2944, att_loss=0.3545, loss=0.3425, over 16142.00 frames. utt_duration=1539 frames, utt_pad_proportion=0.005371, over 42.00 utterances.], tot_loss[ctc_loss=0.2879, att_loss=0.3516, loss=0.3388, over 732752.03 frames. utt_duration=1181 frames, utt_pad_proportion=0.08061, over 2485.79 utterances.], batch size: 42, lr: 4.33e-02, grad_scale: 8.0 2023-03-07 11:54:37,347 INFO [train2.py:809] (2/4) Epoch 2, batch 100, loss[ctc_loss=0.3019, att_loss=0.3604, loss=0.3487, over 17295.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02475, over 59.00 utterances.], tot_loss[ctc_loss=0.2821, att_loss=0.3438, loss=0.3314, over 1275013.87 frames. utt_duration=1208 frames, utt_pad_proportion=0.07913, over 4226.88 utterances.], batch size: 59, lr: 4.31e-02, grad_scale: 8.0 2023-03-07 11:55:05,385 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.894e+02 5.347e+02 6.194e+02 8.279e+02 2.356e+03, threshold=1.239e+03, percent-clipped=4.0 2023-03-07 11:56:00,899 INFO [train2.py:809] (2/4) Epoch 2, batch 150, loss[ctc_loss=0.3011, att_loss=0.358, loss=0.3466, over 17384.00 frames. utt_duration=1009 frames, utt_pad_proportion=0.04683, over 69.00 utterances.], tot_loss[ctc_loss=0.2834, att_loss=0.3447, loss=0.3324, over 1719186.13 frames. 
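The learning rate above decays smoothly with the global batch counter during epoch 1 (4.72e-02 down to 4.43e-02) and drops again at the epoch-2 restart (4.34e-02). Those values are consistent with an Eden-style schedule lr = base_lr * ((step^2 + lr_batches^2) / lr_batches^2)^(-1/4) * ((epoch^2 + lr_epochs^2) / lr_epochs^2)^(-1/4); a sketch assuming base_lr=0.05, lr_batches=5000, lr_epochs=3.5 and an epoch counter that starts at 0:

    def eden_lr(step: int, epoch: int,
                base_lr: float = 0.05,
                lr_batches: float = 5000.0,
                lr_epochs: float = 3.5) -> float:
        """Assumed Eden-style decay in both the batch and the epoch count."""
        f_batch = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        f_epoch = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * f_batch * f_epoch

    # Consistent with the log: eden_lr(2600, 0) ~ 4.71e-02,
    # eden_lr(3600, 0) ~ 4.50e-02, and eden_lr(3970, 1) ~ 4.34e-02
    # (taking the global step at the epoch-2 restart to be a little past 3950).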
utt_duration=1197 frames, utt_pad_proportion=0.07322, over 5752.41 utterances.], batch size: 69, lr: 4.30e-02, grad_scale: 8.0 2023-03-07 11:56:11,618 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=4141.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:56:32,620 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5458, 4.6953, 4.7644, 4.4802, 4.6671, 3.9691, 4.4830, 3.5198], device='cuda:2'), covar=tensor([0.0109, 0.0055, 0.0089, 0.0100, 0.0046, 0.0108, 0.0090, 0.0287], device='cuda:2'), in_proj_covar=tensor([0.0018, 0.0016, 0.0018, 0.0021, 0.0017, 0.0019, 0.0020, 0.0028], device='cuda:2'), out_proj_covar=tensor([2.2447e-05, 2.2835e-05, 2.7041e-05, 2.5235e-05, 2.1147e-05, 2.8298e-05, 2.4466e-05, 3.5787e-05], device='cuda:2') 2023-03-07 11:57:24,047 INFO [train2.py:809] (2/4) Epoch 2, batch 200, loss[ctc_loss=0.2607, att_loss=0.3542, loss=0.3355, over 16763.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.005923, over 48.00 utterances.], tot_loss[ctc_loss=0.2816, att_loss=0.3441, loss=0.3316, over 2073842.80 frames. utt_duration=1225 frames, utt_pad_proportion=0.05876, over 6781.38 utterances.], batch size: 48, lr: 4.29e-02, grad_scale: 8.0 2023-03-07 11:57:52,227 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.168e+02 5.150e+02 6.618e+02 8.460e+02 1.655e+03, threshold=1.324e+03, percent-clipped=5.0 2023-03-07 11:57:52,572 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4202.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 11:57:52,643 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=4202.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 11:58:40,953 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9892, 4.4490, 5.0598, 3.8106, 3.9802, 4.1050, 5.0646, 4.6500], device='cuda:2'), covar=tensor([0.1161, 0.1017, 0.0431, 0.2933, 0.3536, 0.2650, 0.0365, 0.0801], device='cuda:2'), in_proj_covar=tensor([0.0061, 0.0108, 0.0073, 0.0112, 0.0135, 0.0086, 0.0054, 0.0052], device='cuda:2'), out_proj_covar=tensor([2.8712e-05, 4.9312e-05, 3.0561e-05, 6.6925e-05, 7.9495e-05, 5.3821e-05, 2.9373e-05, 2.8321e-05], device='cuda:2') 2023-03-07 11:58:46,861 INFO [train2.py:809] (2/4) Epoch 2, batch 250, loss[ctc_loss=0.3264, att_loss=0.3746, loss=0.365, over 16339.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005548, over 45.00 utterances.], tot_loss[ctc_loss=0.2758, att_loss=0.3397, loss=0.3269, over 2334297.42 frames. utt_duration=1275 frames, utt_pad_proportion=0.05083, over 7334.07 utterances.], batch size: 45, lr: 4.28e-02, grad_scale: 8.0 2023-03-07 11:59:11,789 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4250.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 11:59:38,008 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3314, 4.5685, 4.9151, 5.0886, 1.8353, 3.5061, 5.1149, 3.8044], device='cuda:2'), covar=tensor([0.1539, 0.1344, 0.0388, 0.0520, 2.4048, 0.2724, 0.0498, 0.6968], device='cuda:2'), in_proj_covar=tensor([0.0095, 0.0055, 0.0073, 0.0077, 0.0233, 0.0120, 0.0084, 0.0077], device='cuda:2'), out_proj_covar=tensor([4.6068e-05, 2.6597e-05, 2.8164e-05, 2.9820e-05, 1.2767e-04, 5.5031e-05, 3.1596e-05, 4.5585e-05], device='cuda:2') 2023-03-07 11:59:50,187 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. 
limit=2.0 2023-03-07 12:00:04,008 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=4282.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:00:08,476 INFO [train2.py:809] (2/4) Epoch 2, batch 300, loss[ctc_loss=0.3026, att_loss=0.366, loss=0.3534, over 17404.00 frames. utt_duration=1107 frames, utt_pad_proportion=0.03182, over 63.00 utterances.], tot_loss[ctc_loss=0.2769, att_loss=0.3418, loss=0.3288, over 2552661.89 frames. utt_duration=1267 frames, utt_pad_proportion=0.0488, over 8070.75 utterances.], batch size: 63, lr: 4.27e-02, grad_scale: 8.0 2023-03-07 12:00:14,398 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=4288.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:00:36,610 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.675e+02 5.336e+02 6.631e+02 8.677e+02 1.956e+03, threshold=1.326e+03, percent-clipped=4.0 2023-03-07 12:01:11,211 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.73 vs. limit=2.0 2023-03-07 12:01:25,979 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4332.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 12:01:30,248 INFO [train2.py:809] (2/4) Epoch 2, batch 350, loss[ctc_loss=0.2564, att_loss=0.3333, loss=0.318, over 16638.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.004587, over 47.00 utterances.], tot_loss[ctc_loss=0.2753, att_loss=0.3402, loss=0.3272, over 2715078.73 frames. utt_duration=1265 frames, utt_pad_proportion=0.04845, over 8593.77 utterances.], batch size: 47, lr: 4.26e-02, grad_scale: 8.0 2023-03-07 12:01:52,911 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=4349.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:02:21,780 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7899, 5.9547, 5.2133, 5.9707, 5.5464, 5.4111, 5.2876, 5.4041], device='cuda:2'), covar=tensor([0.0815, 0.0627, 0.0737, 0.0541, 0.0605, 0.0903, 0.1899, 0.1393], device='cuda:2'), in_proj_covar=tensor([0.0191, 0.0232, 0.0194, 0.0177, 0.0156, 0.0238, 0.0257, 0.0221], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 12:02:42,478 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4380.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 12:02:46,224 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2023-03-07 12:02:49,847 INFO [train2.py:809] (2/4) Epoch 2, batch 400, loss[ctc_loss=0.325, att_loss=0.3873, loss=0.3749, over 17272.00 frames. utt_duration=1098 frames, utt_pad_proportion=0.03822, over 63.00 utterances.], tot_loss[ctc_loss=0.2745, att_loss=0.34, loss=0.3269, over 2833665.31 frames. 
utt_duration=1260 frames, utt_pad_proportion=0.05305, over 9004.23 utterances.], batch size: 63, lr: 4.25e-02, grad_scale: 8.0 2023-03-07 12:03:17,594 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.419e+02 5.648e+02 6.669e+02 8.746e+02 3.020e+03, threshold=1.334e+03, percent-clipped=4.0 2023-03-07 12:03:57,250 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2766, 3.8512, 4.4377, 4.2314, 4.3256, 3.6039, 4.0398, 3.4515], device='cuda:2'), covar=tensor([0.0088, 0.0135, 0.0117, 0.0093, 0.0083, 0.0154, 0.0105, 0.0296], device='cuda:2'), in_proj_covar=tensor([0.0020, 0.0018, 0.0019, 0.0022, 0.0019, 0.0020, 0.0021, 0.0031], device='cuda:2'), out_proj_covar=tensor([2.6557e-05, 2.6780e-05, 3.1031e-05, 2.7940e-05, 2.5147e-05, 3.1580e-05, 2.7599e-05, 4.2231e-05], device='cuda:2') 2023-03-07 12:04:10,866 INFO [train2.py:809] (2/4) Epoch 2, batch 450, loss[ctc_loss=0.2431, att_loss=0.3193, loss=0.3041, over 16139.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005571, over 42.00 utterances.], tot_loss[ctc_loss=0.2752, att_loss=0.3411, loss=0.3279, over 2930845.39 frames. utt_duration=1279 frames, utt_pad_proportion=0.0488, over 9176.14 utterances.], batch size: 42, lr: 4.24e-02, grad_scale: 8.0 2023-03-07 12:05:11,193 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5046, 4.1818, 4.4807, 4.2938, 2.4924, 4.7909, 4.3467, 4.7104], device='cuda:2'), covar=tensor([0.0154, 0.0129, 0.0207, 0.0353, 0.3405, 0.0143, 0.0354, 0.0139], device='cuda:2'), in_proj_covar=tensor([0.0061, 0.0064, 0.0097, 0.0124, 0.0199, 0.0063, 0.0116, 0.0100], device='cuda:2'), out_proj_covar=tensor([4.7742e-05, 4.7350e-05, 6.3101e-05, 8.1796e-05, 1.3150e-04, 4.5198e-05, 7.3504e-05, 5.7850e-05], device='cuda:2') 2023-03-07 12:05:33,256 INFO [train2.py:809] (2/4) Epoch 2, batch 500, loss[ctc_loss=0.2933, att_loss=0.3495, loss=0.3383, over 17427.00 frames. utt_duration=1012 frames, utt_pad_proportion=0.04547, over 69.00 utterances.], tot_loss[ctc_loss=0.2742, att_loss=0.3405, loss=0.3272, over 3003432.37 frames. utt_duration=1245 frames, utt_pad_proportion=0.05689, over 9664.18 utterances.], batch size: 69, lr: 4.23e-02, grad_scale: 8.0 2023-03-07 12:05:52,582 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=4497.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:06:00,027 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.132e+02 4.968e+02 5.977e+02 8.490e+02 1.955e+03, threshold=1.195e+03, percent-clipped=9.0 2023-03-07 12:06:53,452 INFO [train2.py:809] (2/4) Epoch 2, batch 550, loss[ctc_loss=0.2221, att_loss=0.2991, loss=0.2837, over 16179.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006412, over 41.00 utterances.], tot_loss[ctc_loss=0.2734, att_loss=0.3401, loss=0.3267, over 3065766.12 frames. 
utt_duration=1228 frames, utt_pad_proportion=0.05933, over 9998.07 utterances.], batch size: 41, lr: 4.22e-02, grad_scale: 8.0 2023-03-07 12:07:55,437 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9844, 2.6824, 3.2387, 2.9795, 2.9905, 2.8666, 2.7145, 3.9503], device='cuda:2'), covar=tensor([0.0134, 0.0292, 0.0476, 0.0424, 0.0254, 0.0512, 0.0581, 0.0070], device='cuda:2'), in_proj_covar=tensor([0.0036, 0.0027, 0.0050, 0.0041, 0.0030, 0.0049, 0.0048, 0.0026], device='cuda:2'), out_proj_covar=tensor([3.4942e-05, 3.5897e-05, 6.0552e-05, 4.1040e-05, 3.4322e-05, 5.3880e-05, 4.8958e-05, 2.9882e-05], device='cuda:2') 2023-03-07 12:08:08,825 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4582.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:08:13,844 INFO [train2.py:809] (2/4) Epoch 2, batch 600, loss[ctc_loss=0.2563, att_loss=0.336, loss=0.3201, over 16117.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006618, over 42.00 utterances.], tot_loss[ctc_loss=0.2756, att_loss=0.3413, loss=0.3282, over 3117537.39 frames. utt_duration=1228 frames, utt_pad_proportion=0.05805, over 10164.37 utterances.], batch size: 42, lr: 4.21e-02, grad_scale: 8.0 2023-03-07 12:08:40,957 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.676e+02 4.778e+02 6.448e+02 8.266e+02 2.224e+03, threshold=1.290e+03, percent-clipped=6.0 2023-03-07 12:09:26,952 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4630.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:09:35,421 INFO [train2.py:809] (2/4) Epoch 2, batch 650, loss[ctc_loss=0.2538, att_loss=0.3366, loss=0.32, over 16459.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.007066, over 46.00 utterances.], tot_loss[ctc_loss=0.2742, att_loss=0.3408, loss=0.3275, over 3154914.54 frames. utt_duration=1228 frames, utt_pad_proportion=0.05672, over 10287.78 utterances.], batch size: 46, lr: 4.20e-02, grad_scale: 8.0 2023-03-07 12:09:50,186 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=4644.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:10:56,697 INFO [train2.py:809] (2/4) Epoch 2, batch 700, loss[ctc_loss=0.2418, att_loss=0.3235, loss=0.3072, over 16476.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006056, over 46.00 utterances.], tot_loss[ctc_loss=0.2724, att_loss=0.3397, loss=0.3262, over 3182186.83 frames. utt_duration=1226 frames, utt_pad_proportion=0.05727, over 10397.13 utterances.], batch size: 46, lr: 4.19e-02, grad_scale: 8.0 2023-03-07 12:11:23,312 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+02 4.876e+02 6.372e+02 7.940e+02 1.497e+03, threshold=1.274e+03, percent-clipped=2.0 2023-03-07 12:12:18,128 INFO [train2.py:809] (2/4) Epoch 2, batch 750, loss[ctc_loss=0.2582, att_loss=0.3216, loss=0.3089, over 14484.00 frames. utt_duration=1812 frames, utt_pad_proportion=0.04593, over 32.00 utterances.], tot_loss[ctc_loss=0.272, att_loss=0.3395, loss=0.326, over 3198495.35 frames. utt_duration=1208 frames, utt_pad_proportion=0.06386, over 10605.83 utterances.], batch size: 32, lr: 4.18e-02, grad_scale: 8.0 2023-03-07 12:13:38,640 INFO [train2.py:809] (2/4) Epoch 2, batch 800, loss[ctc_loss=0.3386, att_loss=0.375, loss=0.3677, over 16682.00 frames. utt_duration=682.3 frames, utt_pad_proportion=0.1449, over 98.00 utterances.], tot_loss[ctc_loss=0.2723, att_loss=0.3402, loss=0.3266, over 3220183.94 frames. 
utt_duration=1215 frames, utt_pad_proportion=0.06006, over 10613.19 utterances.], batch size: 98, lr: 4.17e-02, grad_scale: 8.0 2023-03-07 12:13:55,869 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9359, 5.1727, 5.4843, 5.5964, 4.9799, 5.7499, 5.2160, 5.7713], device='cuda:2'), covar=tensor([0.0349, 0.0483, 0.0295, 0.0349, 0.1686, 0.0520, 0.0340, 0.0406], device='cuda:2'), in_proj_covar=tensor([0.0222, 0.0184, 0.0154, 0.0179, 0.0280, 0.0184, 0.0146, 0.0197], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 12:13:57,572 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4797.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:14:05,888 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.406e+02 5.666e+02 6.834e+02 8.456e+02 1.435e+03, threshold=1.367e+03, percent-clipped=1.0 2023-03-07 12:14:58,934 INFO [train2.py:809] (2/4) Epoch 2, batch 850, loss[ctc_loss=0.2629, att_loss=0.3359, loss=0.3213, over 16621.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005603, over 47.00 utterances.], tot_loss[ctc_loss=0.2715, att_loss=0.3399, loss=0.3262, over 3237147.09 frames. utt_duration=1230 frames, utt_pad_proportion=0.05652, over 10541.34 utterances.], batch size: 47, lr: 4.16e-02, grad_scale: 8.0 2023-03-07 12:15:14,885 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4845.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:16:19,353 INFO [train2.py:809] (2/4) Epoch 2, batch 900, loss[ctc_loss=0.2044, att_loss=0.2733, loss=0.2596, over 15649.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008576, over 37.00 utterances.], tot_loss[ctc_loss=0.2684, att_loss=0.3378, loss=0.324, over 3246081.62 frames. 
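The grad_scale value attached to every loss line (16.0 in the epoch-1 batches shown above, 8.0 after the epoch-2 restart) is the dynamic loss-scaling factor used for mixed-precision training. A minimal sketch of the standard PyTorch pattern that maintains such a factor (model, optimizer and compute_loss are placeholders; only the GradScaler/autocast calls are standard API):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0)  # initial scale is an assumption

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            ctc_loss, att_loss, loss = compute_loss(model, batch)  # assumed helper
        scaler.scale(loss).backward()   # backprop through the scaled loss
        scaler.step(optimizer)          # unscales grads, skips the step if non-finite
        scaler.update()                 # grows or shrinks the scale dynamically
        return float(scaler.get_scale())  # comparable to the logged grad_scale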
utt_duration=1250 frames, utt_pad_proportion=0.05131, over 10396.78 utterances.], batch size: 37, lr: 4.15e-02, grad_scale: 8.0 2023-03-07 12:16:46,357 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.888e+02 4.698e+02 5.859e+02 7.531e+02 1.383e+03, threshold=1.172e+03, percent-clipped=2.0 2023-03-07 12:16:50,556 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1512, 4.3386, 4.0450, 4.5392, 4.5673, 4.1521, 4.0040, 4.3796], device='cuda:2'), covar=tensor([0.0104, 0.0194, 0.0134, 0.0149, 0.0088, 0.0125, 0.0201, 0.0127], device='cuda:2'), in_proj_covar=tensor([0.0035, 0.0037, 0.0038, 0.0030, 0.0027, 0.0031, 0.0038, 0.0036], device='cuda:2'), out_proj_covar=tensor([4.9923e-05, 5.3617e-05, 5.9563e-05, 4.3593e-05, 3.6185e-05, 4.7351e-05, 5.2062e-05, 5.1175e-05], device='cuda:2') 2023-03-07 12:16:53,604 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2819, 5.4912, 4.7787, 5.5301, 5.0943, 5.0339, 4.6256, 4.7866], device='cuda:2'), covar=tensor([0.0980, 0.0735, 0.0877, 0.0552, 0.0553, 0.1067, 0.2177, 0.1514], device='cuda:2'), in_proj_covar=tensor([0.0198, 0.0229, 0.0203, 0.0173, 0.0163, 0.0241, 0.0266, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 12:17:23,188 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9878, 6.0388, 5.4629, 6.1833, 5.8785, 5.6629, 5.4973, 5.5337], device='cuda:2'), covar=tensor([0.1426, 0.0784, 0.0942, 0.0631, 0.0662, 0.1104, 0.2265, 0.1879], device='cuda:2'), in_proj_covar=tensor([0.0199, 0.0234, 0.0205, 0.0178, 0.0164, 0.0244, 0.0264, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 12:17:40,694 INFO [train2.py:809] (2/4) Epoch 2, batch 950, loss[ctc_loss=0.2537, att_loss=0.3063, loss=0.2958, over 15343.00 frames. utt_duration=1755 frames, utt_pad_proportion=0.01248, over 35.00 utterances.], tot_loss[ctc_loss=0.2664, att_loss=0.337, loss=0.3229, over 3246976.49 frames. utt_duration=1264 frames, utt_pad_proportion=0.04699, over 10286.28 utterances.], batch size: 35, lr: 4.14e-02, grad_scale: 8.0 2023-03-07 12:17:48,815 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4728, 4.5690, 4.2126, 4.9205, 4.7718, 4.4853, 4.3581, 4.7566], device='cuda:2'), covar=tensor([0.0131, 0.0229, 0.0171, 0.0120, 0.0106, 0.0157, 0.0246, 0.0132], device='cuda:2'), in_proj_covar=tensor([0.0034, 0.0037, 0.0037, 0.0029, 0.0026, 0.0030, 0.0037, 0.0035], device='cuda:2'), out_proj_covar=tensor([4.8662e-05, 5.3253e-05, 5.8471e-05, 4.3010e-05, 3.5927e-05, 4.6761e-05, 5.1782e-05, 4.9823e-05], device='cuda:2') 2023-03-07 12:17:55,073 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4944.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:18:23,403 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.33 vs. limit=2.0 2023-03-07 12:19:01,004 INFO [train2.py:809] (2/4) Epoch 2, batch 1000, loss[ctc_loss=0.2022, att_loss=0.2961, loss=0.2774, over 15481.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.00943, over 36.00 utterances.], tot_loss[ctc_loss=0.2646, att_loss=0.3359, loss=0.3217, over 3258473.08 frames. utt_duration=1252 frames, utt_pad_proportion=0.04823, over 10424.05 utterances.], batch size: 36, lr: 4.13e-02, grad_scale: 8.0 2023-03-07 12:19:05,037 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.10 vs. 
limit=2.0 2023-03-07 12:19:12,034 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4992.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:19:27,689 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.851e+02 4.989e+02 6.201e+02 7.841e+02 1.460e+03, threshold=1.240e+03, percent-clipped=6.0 2023-03-07 12:20:21,707 INFO [train2.py:809] (2/4) Epoch 2, batch 1050, loss[ctc_loss=0.2221, att_loss=0.3019, loss=0.2859, over 15763.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009405, over 38.00 utterances.], tot_loss[ctc_loss=0.2623, att_loss=0.3342, loss=0.3199, over 3256659.96 frames. utt_duration=1281 frames, utt_pad_proportion=0.04466, over 10181.48 utterances.], batch size: 38, lr: 4.12e-02, grad_scale: 8.0 2023-03-07 12:21:03,606 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7197, 3.9722, 3.4768, 1.7050, 3.3271, 4.0732, 3.5651, 2.5923], device='cuda:2'), covar=tensor([0.0308, 0.0223, 0.0415, 0.1607, 0.0388, 0.0127, 0.0380, 0.1053], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0068, 0.0071, 0.0100, 0.0085, 0.0048, 0.0054, 0.0099], device='cuda:2'), out_proj_covar=tensor([6.2928e-05, 5.8395e-05, 7.4659e-05, 8.8969e-05, 7.5262e-05, 4.2650e-05, 5.8610e-05, 8.8318e-05], device='cuda:2') 2023-03-07 12:21:05,637 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.19 vs. limit=2.0 2023-03-07 12:21:42,518 INFO [train2.py:809] (2/4) Epoch 2, batch 1100, loss[ctc_loss=0.2865, att_loss=0.3529, loss=0.3396, over 17318.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02357, over 59.00 utterances.], tot_loss[ctc_loss=0.2631, att_loss=0.3351, loss=0.3207, over 3267213.33 frames. utt_duration=1272 frames, utt_pad_proportion=0.04475, over 10287.52 utterances.], batch size: 59, lr: 4.11e-02, grad_scale: 8.0 2023-03-07 12:22:09,661 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.609e+02 5.009e+02 6.335e+02 7.458e+02 1.646e+03, threshold=1.267e+03, percent-clipped=3.0 2023-03-07 12:22:19,071 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.88 vs. limit=5.0 2023-03-07 12:22:33,005 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0287, 5.2439, 4.6824, 5.2658, 4.8693, 4.7950, 4.5136, 4.7492], device='cuda:2'), covar=tensor([0.1191, 0.0821, 0.0828, 0.0553, 0.0685, 0.1098, 0.2271, 0.1610], device='cuda:2'), in_proj_covar=tensor([0.0203, 0.0237, 0.0209, 0.0181, 0.0166, 0.0255, 0.0267, 0.0238], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 12:23:03,364 INFO [train2.py:809] (2/4) Epoch 2, batch 1150, loss[ctc_loss=0.214, att_loss=0.3117, loss=0.2921, over 16394.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007485, over 44.00 utterances.], tot_loss[ctc_loss=0.2626, att_loss=0.3348, loss=0.3204, over 3271769.55 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.0481, over 10391.39 utterances.], batch size: 44, lr: 4.10e-02, grad_scale: 8.0 2023-03-07 12:23:21,247 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7910, 5.2275, 5.3647, 5.6085, 4.9611, 5.6673, 5.1298, 5.6294], device='cuda:2'), covar=tensor([0.0363, 0.0376, 0.0345, 0.0266, 0.1526, 0.0483, 0.0401, 0.0517], device='cuda:2'), in_proj_covar=tensor([0.0239, 0.0193, 0.0165, 0.0191, 0.0298, 0.0193, 0.0155, 0.0210], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 12:23:30,000 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.7283, 1.6657, 1.8697, 1.5462, 1.7802, 1.7983, 1.1292, 2.5234], device='cuda:2'), covar=tensor([0.0760, 0.0968, 0.0930, 0.1088, 0.0915, 0.0979, 0.0963, 0.0499], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0060, 0.0051, 0.0065, 0.0060, 0.0065, 0.0059, 0.0070], device='cuda:2'), out_proj_covar=tensor([4.8795e-05, 5.1204e-05, 4.8539e-05, 4.5658e-05, 4.8197e-05, 7.0328e-05, 5.8440e-05, 4.6024e-05], device='cuda:2') 2023-03-07 12:23:39,924 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2775, 4.9702, 5.1266, 5.1701, 4.3608, 4.9278, 5.1801, 4.9106], device='cuda:2'), covar=tensor([0.0356, 0.0160, 0.0161, 0.0116, 0.0349, 0.0108, 0.0266, 0.0146], device='cuda:2'), in_proj_covar=tensor([0.0100, 0.0082, 0.0089, 0.0068, 0.0098, 0.0079, 0.0090, 0.0071], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-07 12:24:24,937 INFO [train2.py:809] (2/4) Epoch 2, batch 1200, loss[ctc_loss=0.2425, att_loss=0.312, loss=0.2981, over 15875.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.00997, over 39.00 utterances.], tot_loss[ctc_loss=0.2618, att_loss=0.3348, loss=0.3202, over 3272355.08 frames. utt_duration=1265 frames, utt_pad_proportion=0.04736, over 10361.11 utterances.], batch size: 39, lr: 4.08e-02, grad_scale: 8.0 2023-03-07 12:24:53,170 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.294e+02 5.285e+02 6.236e+02 7.599e+02 1.317e+03, threshold=1.247e+03, percent-clipped=1.0 2023-03-07 12:25:45,339 INFO [train2.py:809] (2/4) Epoch 2, batch 1250, loss[ctc_loss=0.2185, att_loss=0.2926, loss=0.2778, over 14068.00 frames. utt_duration=1816 frames, utt_pad_proportion=0.05979, over 31.00 utterances.], tot_loss[ctc_loss=0.2626, att_loss=0.3351, loss=0.3206, over 3271075.45 frames. utt_duration=1227 frames, utt_pad_proportion=0.05737, over 10674.91 utterances.], batch size: 31, lr: 4.07e-02, grad_scale: 8.0 2023-03-07 12:26:09,849 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2003, 4.2680, 5.1679, 4.1664, 3.8632, 4.1464, 4.4697, 4.3492], device='cuda:2'), covar=tensor([0.0634, 0.1297, 0.0317, 0.1645, 0.2921, 0.1678, 0.0488, 0.0784], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0153, 0.0095, 0.0173, 0.0197, 0.0109, 0.0070, 0.0076], device='cuda:2'), out_proj_covar=tensor([4.3483e-05, 8.4857e-05, 4.9586e-05, 1.1139e-04, 1.2191e-04, 7.4672e-05, 4.3519e-05, 4.6519e-05], device='cuda:2') 2023-03-07 12:27:05,707 INFO [train2.py:809] (2/4) Epoch 2, batch 1300, loss[ctc_loss=0.3026, att_loss=0.3532, loss=0.3431, over 17343.00 frames. utt_duration=1177 frames, utt_pad_proportion=0.02224, over 59.00 utterances.], tot_loss[ctc_loss=0.2611, att_loss=0.3333, loss=0.3189, over 3265783.27 frames. 
utt_duration=1223 frames, utt_pad_proportion=0.05961, over 10695.93 utterances.], batch size: 59, lr: 4.06e-02, grad_scale: 8.0 2023-03-07 12:27:33,502 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.402e+02 4.574e+02 5.700e+02 7.129e+02 3.450e+03, threshold=1.140e+03, percent-clipped=2.0 2023-03-07 12:28:24,881 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5215, 5.4661, 5.1579, 5.5932, 5.1289, 5.0889, 5.0008, 5.0744], device='cuda:2'), covar=tensor([0.0873, 0.0880, 0.0716, 0.0545, 0.0627, 0.1207, 0.2010, 0.1676], device='cuda:2'), in_proj_covar=tensor([0.0201, 0.0243, 0.0211, 0.0182, 0.0166, 0.0256, 0.0267, 0.0238], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 12:28:26,297 INFO [train2.py:809] (2/4) Epoch 2, batch 1350, loss[ctc_loss=0.2651, att_loss=0.3537, loss=0.336, over 17569.00 frames. utt_duration=1020 frames, utt_pad_proportion=0.0378, over 69.00 utterances.], tot_loss[ctc_loss=0.262, att_loss=0.3346, loss=0.3201, over 3276734.87 frames. utt_duration=1216 frames, utt_pad_proportion=0.05919, over 10795.78 utterances.], batch size: 69, lr: 4.05e-02, grad_scale: 8.0 2023-03-07 12:28:26,757 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3916, 3.9753, 5.0906, 3.7914, 4.0741, 4.1146, 4.6864, 4.2122], device='cuda:2'), covar=tensor([0.0359, 0.1569, 0.0272, 0.2252, 0.1979, 0.1646, 0.0368, 0.0859], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0157, 0.0095, 0.0178, 0.0196, 0.0112, 0.0072, 0.0077], device='cuda:2'), out_proj_covar=tensor([4.3549e-05, 8.8057e-05, 5.1030e-05, 1.1511e-04, 1.2250e-04, 7.6840e-05, 4.5553e-05, 4.8782e-05], device='cuda:2') 2023-03-07 12:29:08,038 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0820, 3.8125, 5.1697, 3.9598, 3.8569, 4.0173, 4.2661, 4.3035], device='cuda:2'), covar=tensor([0.0486, 0.1830, 0.0278, 0.2132, 0.2807, 0.1594, 0.0524, 0.0752], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0159, 0.0095, 0.0180, 0.0198, 0.0114, 0.0074, 0.0078], device='cuda:2'), out_proj_covar=tensor([4.4043e-05, 8.9058e-05, 5.1037e-05, 1.1689e-04, 1.2363e-04, 7.7687e-05, 4.7068e-05, 4.9644e-05], device='cuda:2') 2023-03-07 12:29:47,499 INFO [train2.py:809] (2/4) Epoch 2, batch 1400, loss[ctc_loss=0.2409, att_loss=0.3429, loss=0.3225, over 16461.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.006885, over 46.00 utterances.], tot_loss[ctc_loss=0.2606, att_loss=0.3339, loss=0.3193, over 3274025.58 frames. utt_duration=1227 frames, utt_pad_proportion=0.05786, over 10686.95 utterances.], batch size: 46, lr: 4.04e-02, grad_scale: 8.0 2023-03-07 12:30:15,419 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.881e+02 4.556e+02 5.757e+02 7.116e+02 2.241e+03, threshold=1.151e+03, percent-clipped=2.0 2023-03-07 12:30:28,920 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5104, 3.7531, 5.0044, 3.9733, 3.7070, 4.1216, 4.0714, 4.2949], device='cuda:2'), covar=tensor([0.0245, 0.1494, 0.0241, 0.1856, 0.2950, 0.1708, 0.0552, 0.0972], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0160, 0.0095, 0.0181, 0.0203, 0.0113, 0.0076, 0.0079], device='cuda:2'), out_proj_covar=tensor([4.4636e-05, 9.0668e-05, 5.2126e-05, 1.1793e-04, 1.2651e-04, 7.7907e-05, 4.8226e-05, 5.0577e-05], device='cuda:2') 2023-03-07 12:31:08,625 INFO [train2.py:809] (2/4) Epoch 2, batch 1450, loss[ctc_loss=0.2429, att_loss=0.3195, loss=0.3042, over 16268.00 frames. 
utt_duration=1515 frames, utt_pad_proportion=0.008011, over 43.00 utterances.], tot_loss[ctc_loss=0.2585, att_loss=0.3333, loss=0.3183, over 3278858.37 frames. utt_duration=1238 frames, utt_pad_proportion=0.05419, over 10605.41 utterances.], batch size: 43, lr: 4.03e-02, grad_scale: 8.0 2023-03-07 12:32:29,328 INFO [train2.py:809] (2/4) Epoch 2, batch 1500, loss[ctc_loss=0.2991, att_loss=0.3593, loss=0.3472, over 17402.00 frames. utt_duration=1106 frames, utt_pad_proportion=0.03286, over 63.00 utterances.], tot_loss[ctc_loss=0.2576, att_loss=0.333, loss=0.318, over 3281481.03 frames. utt_duration=1220 frames, utt_pad_proportion=0.05824, over 10773.51 utterances.], batch size: 63, lr: 4.02e-02, grad_scale: 8.0 2023-03-07 12:32:57,036 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 4.697e+02 5.525e+02 6.949e+02 3.924e+03, threshold=1.105e+03, percent-clipped=4.0 2023-03-07 12:33:49,178 INFO [train2.py:809] (2/4) Epoch 2, batch 1550, loss[ctc_loss=0.2921, att_loss=0.3731, loss=0.3569, over 17322.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02254, over 59.00 utterances.], tot_loss[ctc_loss=0.2583, att_loss=0.3337, loss=0.3186, over 3289353.30 frames. utt_duration=1240 frames, utt_pad_proportion=0.05122, over 10620.87 utterances.], batch size: 59, lr: 4.01e-02, grad_scale: 8.0 2023-03-07 12:35:10,349 INFO [train2.py:809] (2/4) Epoch 2, batch 1600, loss[ctc_loss=0.1941, att_loss=0.2875, loss=0.2688, over 15741.00 frames. utt_duration=1658 frames, utt_pad_proportion=0.008793, over 38.00 utterances.], tot_loss[ctc_loss=0.2569, att_loss=0.3332, loss=0.3179, over 3284754.64 frames. utt_duration=1233 frames, utt_pad_proportion=0.05524, over 10668.87 utterances.], batch size: 38, lr: 4.00e-02, grad_scale: 8.0 2023-03-07 12:35:36,478 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0 2023-03-07 12:35:38,630 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.176e+02 4.466e+02 5.617e+02 6.515e+02 1.895e+03, threshold=1.123e+03, percent-clipped=5.0 2023-03-07 12:36:31,275 INFO [train2.py:809] (2/4) Epoch 2, batch 1650, loss[ctc_loss=0.2823, att_loss=0.3492, loss=0.3358, over 17183.00 frames. utt_duration=871.6 frames, utt_pad_proportion=0.08354, over 79.00 utterances.], tot_loss[ctc_loss=0.2567, att_loss=0.3338, loss=0.3184, over 3293812.13 frames. utt_duration=1240 frames, utt_pad_proportion=0.05095, over 10635.50 utterances.], batch size: 79, lr: 3.99e-02, grad_scale: 8.0 2023-03-07 12:36:55,836 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=5650.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 12:37:51,239 INFO [train2.py:809] (2/4) Epoch 2, batch 1700, loss[ctc_loss=0.2549, att_loss=0.3589, loss=0.3381, over 16870.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.008149, over 49.00 utterances.], tot_loss[ctc_loss=0.2553, att_loss=0.3326, loss=0.3171, over 3277085.35 frames. 
utt_duration=1224 frames, utt_pad_proportion=0.05833, over 10726.99 utterances.], batch size: 49, lr: 3.98e-02, grad_scale: 8.0 2023-03-07 12:38:18,536 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.296e+02 4.646e+02 5.757e+02 7.379e+02 2.321e+03, threshold=1.151e+03, percent-clipped=8.0 2023-03-07 12:38:34,114 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5711.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 12:39:10,816 INFO [train2.py:809] (2/4) Epoch 2, batch 1750, loss[ctc_loss=0.1956, att_loss=0.2787, loss=0.2621, over 15635.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.009384, over 37.00 utterances.], tot_loss[ctc_loss=0.2529, att_loss=0.3306, loss=0.315, over 3276595.79 frames. utt_duration=1254 frames, utt_pad_proportion=0.05155, over 10467.63 utterances.], batch size: 37, lr: 3.97e-02, grad_scale: 8.0 2023-03-07 12:39:44,205 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=5755.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:40:31,802 INFO [train2.py:809] (2/4) Epoch 2, batch 1800, loss[ctc_loss=0.234, att_loss=0.3216, loss=0.3041, over 17037.00 frames. utt_duration=1338 frames, utt_pad_proportion=0.006652, over 51.00 utterances.], tot_loss[ctc_loss=0.2513, att_loss=0.3298, loss=0.3141, over 3281693.22 frames. utt_duration=1274 frames, utt_pad_proportion=0.04557, over 10318.42 utterances.], batch size: 51, lr: 3.96e-02, grad_scale: 8.0 2023-03-07 12:40:34,459 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0352, 4.2615, 4.1442, 4.5581, 4.6836, 4.2160, 4.2812, 2.5791], device='cuda:2'), covar=tensor([0.0477, 0.0326, 0.0291, 0.0129, 0.0714, 0.0310, 0.0329, 0.3600], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0087, 0.0081, 0.0088, 0.0133, 0.0107, 0.0068, 0.0204], device='cuda:2'), out_proj_covar=tensor([8.7228e-05, 5.5678e-05, 5.7468e-05, 5.5040e-05, 1.0587e-04, 6.9091e-05, 5.1286e-05, 1.3845e-04], device='cuda:2') 2023-03-07 12:40:59,738 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.935e+02 4.735e+02 5.722e+02 7.362e+02 1.615e+03, threshold=1.144e+03, percent-clipped=5.0 2023-03-07 12:41:23,375 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5816.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:41:53,264 INFO [train2.py:809] (2/4) Epoch 2, batch 1850, loss[ctc_loss=0.2338, att_loss=0.3133, loss=0.2974, over 16287.00 frames. utt_duration=1517 frames, utt_pad_proportion=0.006853, over 43.00 utterances.], tot_loss[ctc_loss=0.2526, att_loss=0.3307, loss=0.3151, over 3284308.18 frames. utt_duration=1242 frames, utt_pad_proportion=0.05292, over 10587.62 utterances.], batch size: 43, lr: 3.95e-02, grad_scale: 8.0 2023-03-07 12:41:55,415 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7159, 3.8983, 3.9881, 3.9257, 2.2068, 2.6997, 4.0005, 3.5513], device='cuda:2'), covar=tensor([0.1419, 0.0411, 0.0312, 0.0573, 1.2948, 0.2888, 0.0351, 0.3719], device='cuda:2'), in_proj_covar=tensor([0.0171, 0.0098, 0.0127, 0.0139, 0.0360, 0.0222, 0.0139, 0.0151], device='cuda:2'), out_proj_covar=tensor([8.6780e-05, 4.5817e-05, 5.2127e-05, 6.2395e-05, 1.8012e-04, 1.0568e-04, 5.8795e-05, 8.6380e-05], device='cuda:2') 2023-03-07 12:42:30,746 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=5858.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:43:13,669 INFO [train2.py:809] (2/4) Epoch 2, batch 1900, loss[ctc_loss=0.271, att_loss=0.3548, loss=0.338, over 17346.00 frames. 
utt_duration=1178 frames, utt_pad_proportion=0.02116, over 59.00 utterances.], tot_loss[ctc_loss=0.2541, att_loss=0.3323, loss=0.3167, over 3285633.17 frames. utt_duration=1210 frames, utt_pad_proportion=0.06129, over 10873.65 utterances.], batch size: 59, lr: 3.95e-02, grad_scale: 8.0 2023-03-07 12:43:41,189 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.885e+02 5.042e+02 5.993e+02 7.245e+02 1.405e+03, threshold=1.199e+03, percent-clipped=1.0 2023-03-07 12:44:09,805 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5919.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:44:35,112 INFO [train2.py:809] (2/4) Epoch 2, batch 1950, loss[ctc_loss=0.2544, att_loss=0.3287, loss=0.3138, over 16169.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.00765, over 41.00 utterances.], tot_loss[ctc_loss=0.2532, att_loss=0.3312, loss=0.3156, over 3272603.20 frames. utt_duration=1218 frames, utt_pad_proportion=0.06321, over 10760.06 utterances.], batch size: 41, lr: 3.94e-02, grad_scale: 8.0 2023-03-07 12:45:55,275 INFO [train2.py:809] (2/4) Epoch 2, batch 2000, loss[ctc_loss=0.2319, att_loss=0.3253, loss=0.3066, over 16111.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006407, over 42.00 utterances.], tot_loss[ctc_loss=0.2525, att_loss=0.3301, loss=0.3146, over 3262406.45 frames. utt_duration=1218 frames, utt_pad_proportion=0.06397, over 10727.64 utterances.], batch size: 42, lr: 3.93e-02, grad_scale: 16.0 2023-03-07 12:46:04,123 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=5990.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:46:23,729 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.2590, 1.9700, 1.4753, 1.1710, 1.0958, 1.9711, 2.0579, 2.0541], device='cuda:2'), covar=tensor([0.0398, 0.0599, 0.0738, 0.0810, 0.0498, 0.0733, 0.0470, 0.0329], device='cuda:2'), in_proj_covar=tensor([0.0065, 0.0054, 0.0056, 0.0064, 0.0055, 0.0071, 0.0053, 0.0074], device='cuda:2'), out_proj_covar=tensor([4.4107e-05, 4.5002e-05, 4.7726e-05, 4.9450e-05, 3.9682e-05, 6.5439e-05, 4.8814e-05, 4.3019e-05], device='cuda:2') 2023-03-07 12:46:26,496 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.443e+02 4.806e+02 6.010e+02 7.878e+02 2.247e+03, threshold=1.202e+03, percent-clipped=5.0 2023-03-07 12:46:33,526 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6006.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 12:47:08,291 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.0464, 1.7702, 1.4283, 1.2821, 1.2178, 2.2395, 1.5730, 1.9978], device='cuda:2'), covar=tensor([0.0565, 0.0614, 0.0789, 0.0931, 0.0514, 0.0560, 0.0847, 0.0364], device='cuda:2'), in_proj_covar=tensor([0.0071, 0.0058, 0.0062, 0.0067, 0.0058, 0.0073, 0.0059, 0.0079], device='cuda:2'), out_proj_covar=tensor([4.7928e-05, 4.7330e-05, 5.2215e-05, 5.2341e-05, 4.0928e-05, 6.8248e-05, 5.3145e-05, 4.5134e-05], device='cuda:2') 2023-03-07 12:47:19,301 INFO [train2.py:809] (2/4) Epoch 2, batch 2050, loss[ctc_loss=0.2113, att_loss=0.2815, loss=0.2674, over 15777.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008086, over 38.00 utterances.], tot_loss[ctc_loss=0.2534, att_loss=0.3304, loss=0.315, over 3260415.31 frames. 
utt_duration=1228 frames, utt_pad_proportion=0.06073, over 10632.35 utterances.], batch size: 38, lr: 3.92e-02, grad_scale: 8.0 2023-03-07 12:47:33,243 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6043.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:47:45,313 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6051.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:48:20,779 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5014, 4.4235, 4.3518, 4.8477, 4.7550, 4.5665, 4.2907, 4.7651], device='cuda:2'), covar=tensor([0.0103, 0.0327, 0.0134, 0.0103, 0.0106, 0.0097, 0.0268, 0.0147], device='cuda:2'), in_proj_covar=tensor([0.0037, 0.0041, 0.0040, 0.0030, 0.0028, 0.0032, 0.0046, 0.0039], device='cuda:2'), out_proj_covar=tensor([6.2853e-05, 7.0842e-05, 7.5203e-05, 5.4071e-05, 4.5097e-05, 5.7685e-05, 7.7794e-05, 6.8468e-05], device='cuda:2') 2023-03-07 12:48:40,075 INFO [train2.py:809] (2/4) Epoch 2, batch 2100, loss[ctc_loss=0.2261, att_loss=0.3354, loss=0.3135, over 17022.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007875, over 51.00 utterances.], tot_loss[ctc_loss=0.2498, att_loss=0.3285, loss=0.3127, over 3262952.50 frames. utt_duration=1263 frames, utt_pad_proportion=0.05236, over 10347.08 utterances.], batch size: 51, lr: 3.91e-02, grad_scale: 8.0 2023-03-07 12:49:08,982 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.112e+02 4.379e+02 5.307e+02 7.391e+02 2.758e+03, threshold=1.061e+03, percent-clipped=4.0 2023-03-07 12:49:10,909 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6104.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:49:23,167 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6111.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:49:35,961 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2771, 4.4793, 4.2433, 4.5034, 4.6124, 4.2459, 4.2381, 4.5640], device='cuda:2'), covar=tensor([0.0096, 0.0189, 0.0128, 0.0124, 0.0095, 0.0103, 0.0262, 0.0134], device='cuda:2'), in_proj_covar=tensor([0.0036, 0.0040, 0.0040, 0.0030, 0.0028, 0.0031, 0.0045, 0.0038], device='cuda:2'), out_proj_covar=tensor([6.3332e-05, 6.9282e-05, 7.6220e-05, 5.3589e-05, 4.6112e-05, 5.6499e-05, 7.7109e-05, 6.7904e-05], device='cuda:2') 2023-03-07 12:50:01,354 INFO [train2.py:809] (2/4) Epoch 2, batch 2150, loss[ctc_loss=0.2858, att_loss=0.3533, loss=0.3398, over 16546.00 frames. utt_duration=670.1 frames, utt_pad_proportion=0.1561, over 99.00 utterances.], tot_loss[ctc_loss=0.2502, att_loss=0.3296, loss=0.3137, over 3275765.90 frames. 
utt_duration=1254 frames, utt_pad_proportion=0.05095, over 10462.63 utterances.], batch size: 99, lr: 3.90e-02, grad_scale: 8.0 2023-03-07 12:50:01,719 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6135.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:50:37,529 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1766, 4.7639, 4.9701, 4.9685, 1.7419, 2.9712, 4.8813, 3.9617], device='cuda:2'), covar=tensor([0.1085, 0.0312, 0.0175, 0.0377, 1.8810, 0.3348, 0.0284, 0.3822], device='cuda:2'), in_proj_covar=tensor([0.0179, 0.0098, 0.0129, 0.0143, 0.0359, 0.0223, 0.0136, 0.0155], device='cuda:2'), out_proj_covar=tensor([9.0669e-05, 4.7703e-05, 5.6973e-05, 6.4691e-05, 1.7715e-04, 1.0624e-04, 5.9556e-05, 8.8382e-05], device='cuda:2') 2023-03-07 12:50:57,748 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6170.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:51:21,401 INFO [train2.py:809] (2/4) Epoch 2, batch 2200, loss[ctc_loss=0.2427, att_loss=0.3106, loss=0.297, over 16397.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007228, over 44.00 utterances.], tot_loss[ctc_loss=0.2513, att_loss=0.3301, loss=0.3143, over 3275803.40 frames. utt_duration=1267 frames, utt_pad_proportion=0.04797, over 10351.68 utterances.], batch size: 44, lr: 3.89e-02, grad_scale: 8.0 2023-03-07 12:51:39,666 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6196.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:51:50,201 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.181e+02 4.728e+02 5.597e+02 7.112e+02 1.691e+03, threshold=1.119e+03, percent-clipped=4.0 2023-03-07 12:51:50,633 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6203.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:52:04,348 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6211.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:52:08,958 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6214.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:52:16,939 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3465, 1.9044, 2.6927, 3.4338, 2.5808, 2.6179, 3.0990, 1.5133], device='cuda:2'), covar=tensor([0.0896, 0.1348, 0.0373, 0.0348, 0.0508, 0.0483, 0.0473, 0.1742], device='cuda:2'), in_proj_covar=tensor([0.0078, 0.0060, 0.0051, 0.0086, 0.0084, 0.0069, 0.0071, 0.0099], device='cuda:2'), out_proj_covar=tensor([4.9751e-05, 4.5550e-05, 4.1311e-05, 4.8890e-05, 5.1706e-05, 4.6567e-05, 4.7053e-05, 7.4897e-05], device='cuda:2') 2023-03-07 12:52:36,019 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4928, 4.6650, 4.4167, 4.7336, 4.8900, 4.4671, 4.4240, 4.6766], device='cuda:2'), covar=tensor([0.0102, 0.0262, 0.0152, 0.0127, 0.0107, 0.0120, 0.0240, 0.0150], device='cuda:2'), in_proj_covar=tensor([0.0038, 0.0043, 0.0041, 0.0031, 0.0028, 0.0032, 0.0046, 0.0039], device='cuda:2'), out_proj_covar=tensor([6.6271e-05, 7.4951e-05, 8.1604e-05, 5.6461e-05, 4.7677e-05, 5.8632e-05, 7.9131e-05, 7.0781e-05], device='cuda:2') 2023-03-07 12:52:36,143 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6231.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:52:42,165 INFO [train2.py:809] (2/4) Epoch 2, batch 2250, loss[ctc_loss=0.2322, att_loss=0.3224, loss=0.3044, over 16282.00 frames. 
utt_duration=1516 frames, utt_pad_proportion=0.006446, over 43.00 utterances.], tot_loss[ctc_loss=0.2475, att_loss=0.328, loss=0.3119, over 3273181.95 frames. utt_duration=1291 frames, utt_pad_proportion=0.04278, over 10151.93 utterances.], batch size: 43, lr: 3.88e-02, grad_scale: 8.0 2023-03-07 12:53:08,371 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1199, 2.1026, 2.7740, 3.9570, 4.3167, 4.2780, 2.5894, 1.7233], device='cuda:2'), covar=tensor([0.0228, 0.1953, 0.1045, 0.0381, 0.0174, 0.0128, 0.1582, 0.2207], device='cuda:2'), in_proj_covar=tensor([0.0082, 0.0122, 0.0108, 0.0075, 0.0056, 0.0069, 0.0126, 0.0122], device='cuda:2'), out_proj_covar=tensor([7.1863e-05, 1.0991e-04, 9.8660e-05, 8.1602e-05, 5.9078e-05, 5.8664e-05, 1.1827e-04, 1.0926e-04], device='cuda:2') 2023-03-07 12:53:29,783 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6264.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:53:32,828 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6266.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:53:42,157 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6272.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:54:02,555 INFO [train2.py:809] (2/4) Epoch 2, batch 2300, loss[ctc_loss=0.2352, att_loss=0.3354, loss=0.3154, over 16454.00 frames. utt_duration=1432 frames, utt_pad_proportion=0.005959, over 46.00 utterances.], tot_loss[ctc_loss=0.2484, att_loss=0.3286, loss=0.3126, over 3274056.61 frames. utt_duration=1268 frames, utt_pad_proportion=0.04883, over 10340.09 utterances.], batch size: 46, lr: 3.87e-02, grad_scale: 8.0 2023-03-07 12:54:30,722 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.781e+02 4.301e+02 5.622e+02 7.152e+02 1.270e+03, threshold=1.124e+03, percent-clipped=3.0 2023-03-07 12:54:36,255 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6306.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 12:55:09,232 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6327.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:55:19,425 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2741, 4.6806, 4.9589, 5.0083, 1.7562, 2.3657, 5.0604, 3.6643], device='cuda:2'), covar=tensor([0.1051, 0.0365, 0.0177, 0.0373, 1.8056, 0.4708, 0.0224, 0.4437], device='cuda:2'), in_proj_covar=tensor([0.0187, 0.0104, 0.0134, 0.0152, 0.0371, 0.0235, 0.0141, 0.0173], device='cuda:2'), out_proj_covar=tensor([9.5057e-05, 5.2096e-05, 6.0436e-05, 6.9963e-05, 1.8224e-04, 1.1251e-04, 6.2502e-05, 9.6006e-05], device='cuda:2') 2023-03-07 12:55:21,933 INFO [train2.py:809] (2/4) Epoch 2, batch 2350, loss[ctc_loss=0.309, att_loss=0.3575, loss=0.3478, over 17154.00 frames. utt_duration=870.2 frames, utt_pad_proportion=0.08692, over 79.00 utterances.], tot_loss[ctc_loss=0.25, att_loss=0.3297, loss=0.3138, over 3281960.41 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05063, over 10504.53 utterances.], batch size: 79, lr: 3.86e-02, grad_scale: 8.0 2023-03-07 12:55:39,730 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6346.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:55:52,498 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6354.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 12:56:00,626 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4516, 5.6099, 5.0005, 5.7254, 5.1225, 5.0650, 4.9716, 4.9710], device='cuda:2'), covar=tensor([0.1185, 0.0934, 0.0926, 0.0622, 0.0622, 0.1362, 0.3177, 0.2301], device='cuda:2'), in_proj_covar=tensor([0.0204, 0.0256, 0.0215, 0.0185, 0.0165, 0.0260, 0.0282, 0.0255], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 12:56:41,288 INFO [train2.py:809] (2/4) Epoch 2, batch 2400, loss[ctc_loss=0.2708, att_loss=0.3455, loss=0.3306, over 17056.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.009528, over 53.00 utterances.], tot_loss[ctc_loss=0.2478, att_loss=0.3277, loss=0.3118, over 3274795.26 frames. utt_duration=1278 frames, utt_pad_proportion=0.0454, over 10258.36 utterances.], batch size: 53, lr: 3.85e-02, grad_scale: 8.0 2023-03-07 12:57:03,925 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6399.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:57:07,415 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0431, 5.1273, 5.4420, 5.8198, 5.2039, 5.8880, 5.2218, 5.9672], device='cuda:2'), covar=tensor([0.0382, 0.0575, 0.0420, 0.0379, 0.1849, 0.0671, 0.0339, 0.0467], device='cuda:2'), in_proj_covar=tensor([0.0285, 0.0221, 0.0197, 0.0218, 0.0360, 0.0207, 0.0175, 0.0244], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:2') 2023-03-07 12:57:10,896 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.496e+02 4.794e+02 5.680e+02 6.939e+02 1.517e+03, threshold=1.136e+03, percent-clipped=5.0 2023-03-07 12:57:24,368 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6411.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:58:03,299 INFO [train2.py:809] (2/4) Epoch 2, batch 2450, loss[ctc_loss=0.2289, att_loss=0.3303, loss=0.31, over 17245.00 frames. utt_duration=1171 frames, utt_pad_proportion=0.02587, over 59.00 utterances.], tot_loss[ctc_loss=0.2452, att_loss=0.326, loss=0.3098, over 3270894.54 frames. 
utt_duration=1283 frames, utt_pad_proportion=0.04515, over 10205.69 utterances.], batch size: 59, lr: 3.84e-02, grad_scale: 8.0 2023-03-07 12:58:25,524 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5116, 4.3067, 4.5816, 4.4675, 2.0252, 4.4041, 2.0872, 4.7322], device='cuda:2'), covar=tensor([0.0170, 0.0251, 0.0482, 0.0250, 0.4079, 0.0157, 0.1457, 0.0303], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0071, 0.0138, 0.0122, 0.0213, 0.0077, 0.0139, 0.0122], device='cuda:2'), out_proj_covar=tensor([6.3851e-05, 6.0616e-05, 1.0433e-04, 8.8505e-05, 1.4706e-04, 6.1503e-05, 1.0098e-04, 8.6804e-05], device='cuda:2') 2023-03-07 12:58:42,227 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6459.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:58:47,190 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6462.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:58:54,674 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6467.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:59:23,398 INFO [train2.py:809] (2/4) Epoch 2, batch 2500, loss[ctc_loss=0.2653, att_loss=0.351, loss=0.3339, over 16630.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.00514, over 47.00 utterances.], tot_loss[ctc_loss=0.2454, att_loss=0.3258, loss=0.3097, over 3269127.97 frames. utt_duration=1261 frames, utt_pad_proportion=0.05191, over 10386.49 utterances.], batch size: 47, lr: 3.83e-02, grad_scale: 8.0 2023-03-07 12:59:28,664 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2965, 5.1421, 5.1452, 4.9165, 2.2246, 3.3382, 5.2777, 4.0339], device='cuda:2'), covar=tensor([0.1127, 0.0261, 0.0236, 0.0558, 1.3456, 0.2407, 0.0270, 0.3801], device='cuda:2'), in_proj_covar=tensor([0.0191, 0.0106, 0.0134, 0.0152, 0.0371, 0.0232, 0.0140, 0.0175], device='cuda:2'), out_proj_covar=tensor([9.7535e-05, 5.3487e-05, 6.1707e-05, 7.0990e-05, 1.8150e-04, 1.1178e-04, 6.3338e-05, 9.6723e-05], device='cuda:2') 2023-03-07 12:59:33,172 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6491.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 12:59:51,986 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.651e+02 4.550e+02 5.536e+02 6.688e+02 2.770e+03, threshold=1.107e+03, percent-clipped=3.0 2023-03-07 13:00:09,852 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6514.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:00:18,441 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-07 13:00:24,376 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6523.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 13:00:29,505 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6526.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:00:32,862 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6528.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:00:44,095 INFO [train2.py:809] (2/4) Epoch 2, batch 2550, loss[ctc_loss=0.2033, att_loss=0.2837, loss=0.2677, over 15781.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.008039, over 38.00 utterances.], tot_loss[ctc_loss=0.2445, att_loss=0.325, loss=0.3089, over 3271186.31 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.0506, over 10389.15 utterances.], batch size: 38, lr: 3.82e-02, grad_scale: 8.0 2023-03-07 13:01:23,800 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6559.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:01:28,551 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6562.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:01:36,570 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6567.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:02:05,597 INFO [train2.py:809] (2/4) Epoch 2, batch 2600, loss[ctc_loss=0.2146, att_loss=0.3111, loss=0.2918, over 16113.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.007049, over 42.00 utterances.], tot_loss[ctc_loss=0.2421, att_loss=0.3238, loss=0.3074, over 3274101.10 frames. utt_duration=1287 frames, utt_pad_proportion=0.04411, over 10185.90 utterances.], batch size: 42, lr: 3.81e-02, grad_scale: 8.0 2023-03-07 13:02:35,426 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 4.654e+02 5.537e+02 6.841e+02 1.492e+03, threshold=1.107e+03, percent-clipped=3.0 2023-03-07 13:03:05,295 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6622.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:03:16,828 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6629.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:03:26,460 INFO [train2.py:809] (2/4) Epoch 2, batch 2650, loss[ctc_loss=0.1863, att_loss=0.2811, loss=0.2621, over 15629.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.009937, over 37.00 utterances.], tot_loss[ctc_loss=0.2415, att_loss=0.3237, loss=0.3073, over 3273437.51 frames. utt_duration=1268 frames, utt_pad_proportion=0.04849, over 10337.25 utterances.], batch size: 37, lr: 3.80e-02, grad_scale: 8.0 2023-03-07 13:03:44,202 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6646.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:04:45,709 INFO [train2.py:809] (2/4) Epoch 2, batch 2700, loss[ctc_loss=0.1805, att_loss=0.2907, loss=0.2687, over 16322.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006566, over 45.00 utterances.], tot_loss[ctc_loss=0.2425, att_loss=0.3244, loss=0.308, over 3274872.63 frames. 
utt_duration=1268 frames, utt_pad_proportion=0.04949, over 10347.09 utterances.], batch size: 45, lr: 3.79e-02, grad_scale: 8.0 2023-03-07 13:04:54,029 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6690.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:04:55,659 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([0.8037, 1.8070, 2.1991, 1.5887, 1.3819, 2.3894, 1.7477, 2.2996], device='cuda:2'), covar=tensor([0.1210, 0.0654, 0.0615, 0.0845, 0.0729, 0.0635, 0.0753, 0.0528], device='cuda:2'), in_proj_covar=tensor([0.0074, 0.0066, 0.0064, 0.0067, 0.0067, 0.0082, 0.0069, 0.0092], device='cuda:2'), out_proj_covar=tensor([4.9405e-05, 4.9111e-05, 4.9534e-05, 5.4678e-05, 4.1612e-05, 6.5709e-05, 5.3951e-05, 4.7657e-05], device='cuda:2') 2023-03-07 13:05:00,065 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6694.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:05:09,219 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6699.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:05:15,303 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.955e+02 4.204e+02 5.612e+02 7.162e+02 1.326e+03, threshold=1.122e+03, percent-clipped=2.0 2023-03-07 13:05:30,591 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.11 vs. limit=2.0 2023-03-07 13:06:06,238 INFO [train2.py:809] (2/4) Epoch 2, batch 2750, loss[ctc_loss=0.3104, att_loss=0.351, loss=0.3429, over 16329.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006065, over 45.00 utterances.], tot_loss[ctc_loss=0.2432, att_loss=0.3243, loss=0.3081, over 3256552.05 frames. utt_duration=1213 frames, utt_pad_proportion=0.06883, over 10749.62 utterances.], batch size: 45, lr: 3.79e-02, grad_scale: 8.0 2023-03-07 13:06:26,267 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6747.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:07:25,949 INFO [train2.py:809] (2/4) Epoch 2, batch 2800, loss[ctc_loss=0.1981, att_loss=0.2981, loss=0.2781, over 16406.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006472, over 44.00 utterances.], tot_loss[ctc_loss=0.2412, att_loss=0.3232, loss=0.3068, over 3265736.27 frames. utt_duration=1239 frames, utt_pad_proportion=0.05962, over 10559.14 utterances.], batch size: 44, lr: 3.78e-02, grad_scale: 8.0 2023-03-07 13:07:36,247 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6791.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:07:55,847 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+02 4.477e+02 5.364e+02 6.587e+02 1.505e+03, threshold=1.073e+03, percent-clipped=3.0 2023-03-07 13:08:20,319 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6818.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 13:08:28,520 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6823.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:08:33,295 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6826.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:08:46,711 INFO [train2.py:809] (2/4) Epoch 2, batch 2850, loss[ctc_loss=0.2088, att_loss=0.3033, loss=0.2844, over 16277.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.007371, over 43.00 utterances.], tot_loss[ctc_loss=0.2405, att_loss=0.3231, loss=0.3066, over 3268433.86 frames. 
utt_duration=1240 frames, utt_pad_proportion=0.0582, over 10553.11 utterances.], batch size: 43, lr: 3.77e-02, grad_scale: 8.0 2023-03-07 13:08:53,173 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6839.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:09:25,283 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6859.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:09:38,413 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6867.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:09:49,553 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6874.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:09:56,253 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6878.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 13:10:06,817 INFO [train2.py:809] (2/4) Epoch 2, batch 2900, loss[ctc_loss=0.2342, att_loss=0.3261, loss=0.3077, over 16320.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006734, over 45.00 utterances.], tot_loss[ctc_loss=0.2399, att_loss=0.3228, loss=0.3062, over 3258632.86 frames. utt_duration=1227 frames, utt_pad_proportion=0.06311, over 10634.05 utterances.], batch size: 45, lr: 3.76e-02, grad_scale: 8.0 2023-03-07 13:10:36,472 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+02 4.377e+02 5.489e+02 7.218e+02 1.392e+03, threshold=1.098e+03, percent-clipped=5.0 2023-03-07 13:10:42,896 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6907.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:10:46,797 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6909.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:10:55,832 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6915.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:11:02,758 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8070, 5.9503, 5.3655, 6.0081, 5.5565, 5.5029, 5.3708, 5.3654], device='cuda:2'), covar=tensor([0.0744, 0.0626, 0.0617, 0.0513, 0.0415, 0.0941, 0.1565, 0.1371], device='cuda:2'), in_proj_covar=tensor([0.0218, 0.0270, 0.0225, 0.0202, 0.0173, 0.0270, 0.0284, 0.0272], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 13:11:07,554 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6922.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:11:27,220 INFO [train2.py:809] (2/4) Epoch 2, batch 2950, loss[ctc_loss=0.1891, att_loss=0.2814, loss=0.2629, over 14572.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.03193, over 32.00 utterances.], tot_loss[ctc_loss=0.2379, att_loss=0.3213, loss=0.3046, over 3253739.43 frames. 
utt_duration=1249 frames, utt_pad_proportion=0.05891, over 10432.13 utterances.], batch size: 32, lr: 3.75e-02, grad_scale: 8.0 2023-03-07 13:11:35,188 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6939.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 13:12:24,515 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6970.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:12:24,796 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6970.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:12:47,618 INFO [train2.py:809] (2/4) Epoch 2, batch 3000, loss[ctc_loss=0.1912, att_loss=0.3115, loss=0.2875, over 16767.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006481, over 48.00 utterances.], tot_loss[ctc_loss=0.2359, att_loss=0.3208, loss=0.3039, over 3267353.30 frames. utt_duration=1265 frames, utt_pad_proportion=0.05137, over 10340.97 utterances.], batch size: 48, lr: 3.74e-02, grad_scale: 8.0 2023-03-07 13:12:47,619 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 13:13:01,224 INFO [train2.py:843] (2/4) Epoch 2, validation: ctc_loss=0.1245, att_loss=0.2759, loss=0.2456, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 13:13:01,225 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 13:13:01,447 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6985.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:13:30,571 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.232e+02 4.393e+02 5.286e+02 7.029e+02 1.971e+03, threshold=1.057e+03, percent-clipped=9.0 2023-03-07 13:14:02,525 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.9417, 3.6449, 3.1662, 3.1333, 3.6257, 3.4978, 1.7407, 3.9665], device='cuda:2'), covar=tensor([0.1140, 0.0197, 0.0816, 0.0610, 0.0265, 0.0661, 0.1324, 0.0101], device='cuda:2'), in_proj_covar=tensor([0.0082, 0.0048, 0.0093, 0.0076, 0.0055, 0.0098, 0.0086, 0.0043], device='cuda:2'), out_proj_covar=tensor([1.0498e-04, 7.5629e-05, 1.3426e-04, 9.8648e-05, 8.4520e-05, 1.3746e-04, 1.0838e-04, 6.4501e-05], device='cuda:2') 2023-03-07 13:14:20,770 INFO [train2.py:809] (2/4) Epoch 2, batch 3050, loss[ctc_loss=0.2585, att_loss=0.3102, loss=0.2999, over 15487.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.009616, over 36.00 utterances.], tot_loss[ctc_loss=0.2365, att_loss=0.3205, loss=0.3037, over 3255963.65 frames. utt_duration=1237 frames, utt_pad_proportion=0.06109, over 10545.27 utterances.], batch size: 36, lr: 3.73e-02, grad_scale: 8.0 2023-03-07 13:14:42,946 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7048.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 13:14:44,462 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4368, 4.9555, 5.0959, 5.0242, 1.9641, 2.9912, 5.1344, 3.6723], device='cuda:2'), covar=tensor([0.0688, 0.0245, 0.0174, 0.0383, 1.3631, 0.2943, 0.0227, 0.3610], device='cuda:2'), in_proj_covar=tensor([0.0209, 0.0116, 0.0137, 0.0156, 0.0387, 0.0252, 0.0137, 0.0192], device='cuda:2'), out_proj_covar=tensor([1.0920e-04, 6.0976e-05, 6.6201e-05, 7.5649e-05, 1.8702e-04, 1.2387e-04, 6.7208e-05, 1.0670e-04], device='cuda:2') 2023-03-07 13:14:55,971 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.78 vs. 
limit=2.0 2023-03-07 13:15:41,471 INFO [train2.py:809] (2/4) Epoch 2, batch 3100, loss[ctc_loss=0.2479, att_loss=0.3242, loss=0.309, over 16768.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006496, over 48.00 utterances.], tot_loss[ctc_loss=0.2359, att_loss=0.3209, loss=0.3039, over 3262996.83 frames. utt_duration=1246 frames, utt_pad_proportion=0.05667, over 10490.24 utterances.], batch size: 48, lr: 3.72e-02, grad_scale: 8.0 2023-03-07 13:16:10,552 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+02 4.796e+02 5.430e+02 6.688e+02 1.495e+03, threshold=1.086e+03, percent-clipped=6.0 2023-03-07 13:16:20,419 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7109.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 13:16:35,028 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7118.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 13:16:42,323 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7123.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:16:55,407 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8487, 4.5995, 4.8567, 4.3396, 4.8278, 3.6640, 4.2900, 2.1642], device='cuda:2'), covar=tensor([0.0156, 0.0089, 0.0135, 0.0202, 0.0067, 0.0171, 0.0166, 0.1237], device='cuda:2'), in_proj_covar=tensor([0.0030, 0.0028, 0.0027, 0.0039, 0.0030, 0.0033, 0.0038, 0.0071], device='cuda:2'), out_proj_covar=tensor([6.1055e-05, 6.7774e-05, 7.3404e-05, 7.7779e-05, 6.3062e-05, 8.2118e-05, 7.6398e-05, 1.3631e-04], device='cuda:2') 2023-03-07 13:17:01,198 INFO [train2.py:809] (2/4) Epoch 2, batch 3150, loss[ctc_loss=0.1877, att_loss=0.2792, loss=0.2609, over 15853.00 frames. utt_duration=1627 frames, utt_pad_proportion=0.009535, over 39.00 utterances.], tot_loss[ctc_loss=0.2378, att_loss=0.3222, loss=0.3053, over 3266785.29 frames. utt_duration=1224 frames, utt_pad_proportion=0.06136, over 10687.27 utterances.], batch size: 39, lr: 3.71e-02, grad_scale: 8.0 2023-03-07 13:17:47,353 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7163.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:17:51,642 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7166.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:17:59,113 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7171.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:18:22,230 INFO [train2.py:809] (2/4) Epoch 2, batch 3200, loss[ctc_loss=0.2132, att_loss=0.3086, loss=0.2895, over 16410.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.007117, over 44.00 utterances.], tot_loss[ctc_loss=0.238, att_loss=0.3222, loss=0.3054, over 3258523.09 frames. 
utt_duration=1211 frames, utt_pad_proportion=0.06683, over 10772.05 utterances.], batch size: 44, lr: 3.71e-02, grad_scale: 8.0 2023-03-07 13:18:50,783 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 4.512e+02 5.517e+02 7.088e+02 1.562e+03, threshold=1.103e+03, percent-clipped=2.0 2023-03-07 13:19:25,279 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7224.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 13:19:32,837 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4722, 1.9357, 2.4586, 3.5602, 3.8673, 3.7568, 2.4238, 1.9998], device='cuda:2'), covar=tensor([0.0312, 0.2221, 0.1257, 0.0361, 0.0124, 0.0158, 0.1767, 0.2133], device='cuda:2'), in_proj_covar=tensor([0.0092, 0.0142, 0.0129, 0.0084, 0.0064, 0.0076, 0.0143, 0.0137], device='cuda:2'), out_proj_covar=tensor([8.3716e-05, 1.2851e-04, 1.2064e-04, 9.5349e-05, 6.8250e-05, 6.8539e-05, 1.3605e-04, 1.2373e-04], device='cuda:2') 2023-03-07 13:19:41,626 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7234.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 13:19:42,924 INFO [train2.py:809] (2/4) Epoch 2, batch 3250, loss[ctc_loss=0.1879, att_loss=0.2682, loss=0.2522, over 15759.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.00939, over 38.00 utterances.], tot_loss[ctc_loss=0.239, att_loss=0.323, loss=0.3062, over 3268600.34 frames. utt_duration=1212 frames, utt_pad_proportion=0.06469, over 10805.31 utterances.], batch size: 38, lr: 3.70e-02, grad_scale: 8.0 2023-03-07 13:20:29,939 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7265.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:20:45,982 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7275.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:21:02,604 INFO [train2.py:809] (2/4) Epoch 2, batch 3300, loss[ctc_loss=0.2197, att_loss=0.3023, loss=0.2858, over 15881.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.008297, over 39.00 utterances.], tot_loss[ctc_loss=0.2404, att_loss=0.324, loss=0.3072, over 3272888.02 frames. utt_duration=1182 frames, utt_pad_proportion=0.06839, over 11092.78 utterances.], batch size: 39, lr: 3.69e-02, grad_scale: 8.0 2023-03-07 13:21:02,952 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7285.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:21:06,649 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2023-03-07 13:21:27,824 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3354, 2.0321, 3.2556, 4.2780, 4.5904, 4.4699, 2.8693, 2.0704], device='cuda:2'), covar=tensor([0.0186, 0.2288, 0.1033, 0.0467, 0.0090, 0.0121, 0.1743, 0.2273], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0137, 0.0127, 0.0085, 0.0064, 0.0076, 0.0141, 0.0135], device='cuda:2'), out_proj_covar=tensor([8.2161e-05, 1.2482e-04, 1.1969e-04, 9.5579e-05, 6.8323e-05, 6.8061e-05, 1.3345e-04, 1.2273e-04], device='cuda:2') 2023-03-07 13:21:28,252 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. 
limit=2.0 2023-03-07 13:21:30,732 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+02 4.714e+02 6.021e+02 7.352e+02 2.285e+03, threshold=1.204e+03, percent-clipped=4.0 2023-03-07 13:22:20,347 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7333.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:22:23,595 INFO [train2.py:809] (2/4) Epoch 2, batch 3350, loss[ctc_loss=0.2133, att_loss=0.3155, loss=0.2951, over 16428.00 frames. utt_duration=1495 frames, utt_pad_proportion=0.006014, over 44.00 utterances.], tot_loss[ctc_loss=0.2367, att_loss=0.3219, loss=0.3048, over 3269675.81 frames. utt_duration=1202 frames, utt_pad_proportion=0.06531, over 10895.67 utterances.], batch size: 44, lr: 3.68e-02, grad_scale: 8.0 2023-03-07 13:22:25,543 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7336.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:23:31,321 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4976, 3.2490, 4.9666, 4.0294, 3.4062, 4.0936, 4.4379, 4.4761], device='cuda:2'), covar=tensor([0.0128, 0.1262, 0.0116, 0.1176, 0.2756, 0.0804, 0.0288, 0.0350], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0194, 0.0107, 0.0237, 0.0281, 0.0138, 0.0090, 0.0099], device='cuda:2'), out_proj_covar=tensor([7.0140e-05, 1.3207e-04, 7.5737e-05, 1.7412e-04, 1.9101e-04, 1.0912e-04, 6.8311e-05, 7.6814e-05], device='cuda:2') 2023-03-07 13:23:43,566 INFO [train2.py:809] (2/4) Epoch 2, batch 3400, loss[ctc_loss=0.2317, att_loss=0.2954, loss=0.2826, over 15519.00 frames. utt_duration=1726 frames, utt_pad_proportion=0.007508, over 36.00 utterances.], tot_loss[ctc_loss=0.2361, att_loss=0.3212, loss=0.3042, over 3266793.99 frames. utt_duration=1204 frames, utt_pad_proportion=0.06669, over 10869.35 utterances.], batch size: 36, lr: 3.67e-02, grad_scale: 8.0 2023-03-07 13:24:08,302 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.74 vs. limit=2.0 2023-03-07 13:24:12,389 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+02 4.124e+02 5.654e+02 7.446e+02 2.114e+03, threshold=1.131e+03, percent-clipped=4.0 2023-03-07 13:24:14,039 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7404.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 13:24:18,713 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7407.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:25:03,869 INFO [train2.py:809] (2/4) Epoch 2, batch 3450, loss[ctc_loss=0.216, att_loss=0.3191, loss=0.2985, over 16762.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006866, over 48.00 utterances.], tot_loss[ctc_loss=0.2339, att_loss=0.3194, loss=0.3023, over 3256447.31 frames. utt_duration=1203 frames, utt_pad_proportion=0.06899, over 10841.83 utterances.], batch size: 48, lr: 3.66e-02, grad_scale: 8.0 2023-03-07 13:25:56,123 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7468.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:26:07,607 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. limit=2.0 2023-03-07 13:26:22,848 INFO [train2.py:809] (2/4) Epoch 2, batch 3500, loss[ctc_loss=0.1939, att_loss=0.2948, loss=0.2747, over 16178.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006335, over 41.00 utterances.], tot_loss[ctc_loss=0.2323, att_loss=0.3183, loss=0.3011, over 3259420.77 frames. 
utt_duration=1227 frames, utt_pad_proportion=0.06264, over 10642.52 utterances.], batch size: 41, lr: 3.65e-02, grad_scale: 8.0 2023-03-07 13:26:50,319 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.396e+02 4.119e+02 4.861e+02 5.923e+02 1.212e+03, threshold=9.723e+02, percent-clipped=1.0 2023-03-07 13:27:16,189 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7519.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 13:27:39,889 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7534.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 13:27:41,015 INFO [train2.py:809] (2/4) Epoch 2, batch 3550, loss[ctc_loss=0.2751, att_loss=0.3507, loss=0.3356, over 17090.00 frames. utt_duration=1222 frames, utt_pad_proportion=0.01671, over 56.00 utterances.], tot_loss[ctc_loss=0.2322, att_loss=0.3179, loss=0.3007, over 3257046.81 frames. utt_duration=1241 frames, utt_pad_proportion=0.061, over 10510.09 utterances.], batch size: 56, lr: 3.65e-02, grad_scale: 8.0 2023-03-07 13:28:23,390 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3339, 4.3521, 4.4837, 4.6909, 4.8532, 4.7052, 4.2698, 2.2175], device='cuda:2'), covar=tensor([0.0314, 0.0410, 0.0279, 0.0192, 0.0853, 0.0201, 0.0464, 0.3920], device='cuda:2'), in_proj_covar=tensor([0.0124, 0.0105, 0.0100, 0.0100, 0.0181, 0.0115, 0.0086, 0.0236], device='cuda:2'), out_proj_covar=tensor([9.6095e-05, 7.4743e-05, 7.6577e-05, 7.5114e-05, 1.5364e-04, 8.2760e-05, 6.9506e-05, 1.7583e-04], device='cuda:2') 2023-03-07 13:28:27,995 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7565.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:28:39,076 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3492, 4.7343, 4.7618, 4.9043, 4.3111, 4.6710, 4.9256, 4.8025], device='cuda:2'), covar=tensor([0.0325, 0.0224, 0.0254, 0.0143, 0.0350, 0.0147, 0.0358, 0.0126], device='cuda:2'), in_proj_covar=tensor([0.0108, 0.0095, 0.0108, 0.0074, 0.0112, 0.0086, 0.0100, 0.0082], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 13:28:55,956 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7582.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 13:29:00,313 INFO [train2.py:809] (2/4) Epoch 2, batch 3600, loss[ctc_loss=0.2601, att_loss=0.342, loss=0.3256, over 16533.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006767, over 45.00 utterances.], tot_loss[ctc_loss=0.2341, att_loss=0.3189, loss=0.3019, over 3264190.42 frames. utt_duration=1253 frames, utt_pad_proportion=0.05649, over 10436.14 utterances.], batch size: 45, lr: 3.64e-02, grad_scale: 8.0 2023-03-07 13:29:01,366 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. 
limit=2.0 2023-03-07 13:29:28,532 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.614e+02 5.143e+02 6.522e+02 9.491e+02 2.921e+03, threshold=1.304e+03, percent-clipped=20.0 2023-03-07 13:29:44,795 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7613.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:30:03,080 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7624.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:30:14,832 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7631.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:30:20,987 INFO [train2.py:809] (2/4) Epoch 2, batch 3650, loss[ctc_loss=0.2343, att_loss=0.3246, loss=0.3065, over 17059.00 frames. utt_duration=690.7 frames, utt_pad_proportion=0.1333, over 99.00 utterances.], tot_loss[ctc_loss=0.2332, att_loss=0.319, loss=0.3019, over 3276604.21 frames. utt_duration=1250 frames, utt_pad_proportion=0.05364, over 10498.13 utterances.], batch size: 99, lr: 3.63e-02, grad_scale: 8.0 2023-03-07 13:30:35,665 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.12 vs. limit=2.0 2023-03-07 13:30:38,277 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.1903, 1.8809, 2.8564, 2.8192, 3.4549, 1.9519, 2.2827, 2.0855], device='cuda:2'), covar=tensor([0.1057, 0.1073, 0.0294, 0.0416, 0.0175, 0.0709, 0.0759, 0.1377], device='cuda:2'), in_proj_covar=tensor([0.0067, 0.0057, 0.0048, 0.0069, 0.0054, 0.0052, 0.0066, 0.0081], device='cuda:2'), out_proj_covar=tensor([4.0171e-05, 3.6654e-05, 3.2811e-05, 3.9212e-05, 3.2933e-05, 3.6210e-05, 3.8807e-05, 5.3766e-05], device='cuda:2') 2023-03-07 13:31:41,707 INFO [train2.py:809] (2/4) Epoch 2, batch 3700, loss[ctc_loss=0.2141, att_loss=0.2766, loss=0.2641, over 15525.00 frames. utt_duration=1727 frames, utt_pad_proportion=0.006964, over 36.00 utterances.], tot_loss[ctc_loss=0.2341, att_loss=0.319, loss=0.3021, over 3275763.30 frames. utt_duration=1224 frames, utt_pad_proportion=0.05957, over 10719.29 utterances.], batch size: 36, lr: 3.62e-02, grad_scale: 8.0 2023-03-07 13:31:42,160 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7685.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:31:57,684 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1270, 5.2994, 5.5582, 5.8849, 5.2614, 6.1135, 5.2950, 6.0858], device='cuda:2'), covar=tensor([0.0331, 0.0431, 0.0378, 0.0360, 0.1312, 0.0338, 0.0306, 0.0387], device='cuda:2'), in_proj_covar=tensor([0.0301, 0.0228, 0.0211, 0.0237, 0.0390, 0.0213, 0.0182, 0.0251], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:2') 2023-03-07 13:32:10,451 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 4.093e+02 5.003e+02 6.020e+02 1.181e+03, threshold=1.001e+03, percent-clipped=0.0 2023-03-07 13:32:12,240 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7704.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 13:33:01,945 INFO [train2.py:809] (2/4) Epoch 2, batch 3750, loss[ctc_loss=0.224, att_loss=0.3301, loss=0.3089, over 17378.00 frames. utt_duration=1180 frames, utt_pad_proportion=0.01941, over 59.00 utterances.], tot_loss[ctc_loss=0.2326, att_loss=0.3184, loss=0.3012, over 3269734.60 frames. 
utt_duration=1226 frames, utt_pad_proportion=0.05963, over 10677.23 utterances.], batch size: 59, lr: 3.61e-02, grad_scale: 8.0 2023-03-07 13:33:14,928 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7743.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:33:29,475 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7752.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 13:33:46,950 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7763.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:34:18,379 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.82 vs. limit=2.0 2023-03-07 13:34:21,942 INFO [train2.py:809] (2/4) Epoch 2, batch 3800, loss[ctc_loss=0.2225, att_loss=0.3183, loss=0.2992, over 17017.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008684, over 51.00 utterances.], tot_loss[ctc_loss=0.2336, att_loss=0.3191, loss=0.302, over 3271201.17 frames. utt_duration=1231 frames, utt_pad_proportion=0.05822, over 10646.66 utterances.], batch size: 51, lr: 3.60e-02, grad_scale: 8.0 2023-03-07 13:34:49,620 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7802.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:34:49,755 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2671, 4.8731, 4.7013, 4.7645, 2.0846, 2.6493, 4.7941, 3.8723], device='cuda:2'), covar=tensor([0.0784, 0.0211, 0.0178, 0.0329, 0.9902, 0.2649, 0.0213, 0.2535], device='cuda:2'), in_proj_covar=tensor([0.0222, 0.0133, 0.0144, 0.0163, 0.0391, 0.0266, 0.0141, 0.0210], device='cuda:2'), out_proj_covar=tensor([1.1889e-04, 6.9657e-05, 7.2117e-05, 7.9808e-05, 1.9106e-04, 1.3127e-04, 7.1262e-05, 1.1754e-04], device='cuda:2') 2023-03-07 13:34:50,772 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+02 4.201e+02 5.009e+02 6.863e+02 2.448e+03, threshold=1.002e+03, percent-clipped=6.0 2023-03-07 13:34:52,731 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7804.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:35:11,204 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.18 vs. limit=2.0 2023-03-07 13:35:14,354 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.04 vs. limit=2.0 2023-03-07 13:35:16,914 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7819.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 13:35:42,001 INFO [train2.py:809] (2/4) Epoch 2, batch 3850, loss[ctc_loss=0.2389, att_loss=0.3177, loss=0.3019, over 16178.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006914, over 41.00 utterances.], tot_loss[ctc_loss=0.2321, att_loss=0.3184, loss=0.3012, over 3270507.15 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.0534, over 10537.84 utterances.], batch size: 41, lr: 3.60e-02, grad_scale: 8.0 2023-03-07 13:35:46,685 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3821, 4.7518, 5.0312, 5.1750, 4.6838, 5.3648, 4.9940, 5.3793], device='cuda:2'), covar=tensor([0.0512, 0.0563, 0.0447, 0.0414, 0.1808, 0.0655, 0.0490, 0.0512], device='cuda:2'), in_proj_covar=tensor([0.0309, 0.0229, 0.0212, 0.0238, 0.0392, 0.0215, 0.0181, 0.0256], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:2') 2023-03-07 13:36:25,216 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7863.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:36:31,177 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7867.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:36:31,942 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.80 vs. limit=2.0 2023-03-07 13:36:59,754 INFO [train2.py:809] (2/4) Epoch 2, batch 3900, loss[ctc_loss=0.2193, att_loss=0.323, loss=0.3022, over 17310.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01139, over 55.00 utterances.], tot_loss[ctc_loss=0.2323, att_loss=0.3191, loss=0.3018, over 3276176.72 frames. utt_duration=1243 frames, utt_pad_proportion=0.05224, over 10558.67 utterances.], batch size: 55, lr: 3.59e-02, grad_scale: 8.0 2023-03-07 13:37:27,828 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+02 4.818e+02 5.738e+02 6.928e+02 1.942e+03, threshold=1.148e+03, percent-clipped=11.0 2023-03-07 13:37:29,629 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4781, 4.9655, 4.9468, 5.0840, 4.4854, 4.7982, 5.3423, 5.0318], device='cuda:2'), covar=tensor([0.0298, 0.0187, 0.0258, 0.0119, 0.0327, 0.0132, 0.0163, 0.0117], device='cuda:2'), in_proj_covar=tensor([0.0114, 0.0101, 0.0114, 0.0078, 0.0114, 0.0088, 0.0104, 0.0086], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 13:38:11,509 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7931.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:38:17,768 INFO [train2.py:809] (2/4) Epoch 2, batch 3950, loss[ctc_loss=0.2485, att_loss=0.3323, loss=0.3156, over 17311.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01082, over 55.00 utterances.], tot_loss[ctc_loss=0.231, att_loss=0.3176, loss=0.3003, over 3264426.87 frames. utt_duration=1251 frames, utt_pad_proportion=0.05199, over 10452.27 utterances.], batch size: 55, lr: 3.58e-02, grad_scale: 8.0 2023-03-07 13:39:31,703 INFO [train2.py:809] (2/4) Epoch 3, batch 0, loss[ctc_loss=0.2498, att_loss=0.325, loss=0.31, over 16986.00 frames. utt_duration=687.6 frames, utt_pad_proportion=0.134, over 99.00 utterances.], tot_loss[ctc_loss=0.2498, att_loss=0.325, loss=0.31, over 16986.00 frames. utt_duration=687.6 frames, utt_pad_proportion=0.134, over 99.00 utterances.], batch size: 99, lr: 3.40e-02, grad_scale: 8.0 2023-03-07 13:39:31,703 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 13:39:44,016 INFO [train2.py:843] (2/4) Epoch 3, validation: ctc_loss=0.122, att_loss=0.2748, loss=0.2442, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
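The combined loss values reported throughout this log are consistent with a fixed 0.8/0.2 interpolation of the attention and CTC objectives; for the Epoch 3 validation entry just above, 0.8 * 0.2748 + 0.2 * 0.122 = 0.2442, matching the logged loss. The short Python sketch below reproduces that arithmetic; the 0.8 weight and the helper name are inferred from the logged numbers, not taken from train2.py.

# Minimal sketch, assuming the logged `loss` is a fixed interpolation of the two
# objectives; the 0.8 weight is inferred from the reported numbers and the helper
# name is illustrative, not the actual train2.py API.
def combine_losses(ctc_loss: float, att_loss: float, att_weight: float = 0.8) -> float:
    # Weight the attention loss by att_weight and the CTC loss by the remainder.
    return att_weight * att_loss + (1.0 - att_weight) * ctc_loss

# Epoch 3 validation entry above: ctc_loss=0.122, att_loss=0.2748, loss=0.2442.
print(round(combine_losses(0.122, 0.2748), 4))  # -> 0.2442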
2023-03-07 13:39:44,017 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 13:40:00,309 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7979.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:40:01,921 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7980.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:40:42,653 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.716e+02 4.111e+02 5.021e+02 6.373e+02 1.388e+03, threshold=1.004e+03, percent-clipped=3.0 2023-03-07 13:41:08,148 INFO [train2.py:809] (2/4) Epoch 3, batch 50, loss[ctc_loss=0.2405, att_loss=0.344, loss=0.3233, over 17043.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.01022, over 53.00 utterances.], tot_loss[ctc_loss=0.225, att_loss=0.3149, loss=0.2969, over 739334.64 frames. utt_duration=1280 frames, utt_pad_proportion=0.04329, over 2313.01 utterances.], batch size: 53, lr: 3.39e-02, grad_scale: 16.0 2023-03-07 13:41:08,504 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2023, 2.6508, 3.6497, 2.3142, 3.2199, 4.2618, 3.9709, 3.0290], device='cuda:2'), covar=tensor([0.0328, 0.1123, 0.0704, 0.1524, 0.1017, 0.0160, 0.0581, 0.1394], device='cuda:2'), in_proj_covar=tensor([0.0117, 0.0128, 0.0109, 0.0137, 0.0140, 0.0076, 0.0089, 0.0142], device='cuda:2'), out_proj_covar=tensor([1.1786e-04, 1.2875e-04, 1.3146e-04, 1.3807e-04, 1.4643e-04, 8.9131e-05, 1.0862e-04, 1.4346e-04], device='cuda:2') 2023-03-07 13:42:17,939 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8063.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:42:27,410 INFO [train2.py:809] (2/4) Epoch 3, batch 100, loss[ctc_loss=0.2493, att_loss=0.3295, loss=0.3134, over 16892.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.005514, over 49.00 utterances.], tot_loss[ctc_loss=0.2312, att_loss=0.3188, loss=0.3013, over 1310159.71 frames. utt_duration=1171 frames, utt_pad_proportion=0.06432, over 4479.90 utterances.], batch size: 49, lr: 3.38e-02, grad_scale: 16.0 2023-03-07 13:43:13,265 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-07 13:43:15,557 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=8099.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:43:21,306 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.756e+02 4.455e+02 5.328e+02 6.355e+02 1.061e+03, threshold=1.066e+03, percent-clipped=2.0 2023-03-07 13:43:33,800 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8111.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:43:40,911 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3883, 5.5826, 4.9653, 5.6071, 5.2421, 5.1867, 5.0017, 5.0193], device='cuda:2'), covar=tensor([0.1061, 0.0806, 0.0748, 0.0588, 0.0572, 0.0923, 0.1864, 0.1730], device='cuda:2'), in_proj_covar=tensor([0.0234, 0.0283, 0.0235, 0.0213, 0.0189, 0.0288, 0.0305, 0.0283], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 13:43:47,461 INFO [train2.py:809] (2/4) Epoch 3, batch 150, loss[ctc_loss=0.237, att_loss=0.3162, loss=0.3004, over 17110.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01559, over 56.00 utterances.], tot_loss[ctc_loss=0.2256, att_loss=0.3159, loss=0.2978, over 1749730.23 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.04656, over 5602.46 utterances.], batch size: 56, lr: 3.37e-02, grad_scale: 16.0 2023-03-07 13:44:41,428 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.06 vs. limit=2.0 2023-03-07 13:44:50,046 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=8158.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:45:07,247 INFO [train2.py:809] (2/4) Epoch 3, batch 200, loss[ctc_loss=0.1673, att_loss=0.2725, loss=0.2514, over 16003.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007988, over 40.00 utterances.], tot_loss[ctc_loss=0.2214, att_loss=0.3131, loss=0.2948, over 2082802.31 frames. utt_duration=1265 frames, utt_pad_proportion=0.04812, over 6592.34 utterances.], batch size: 40, lr: 3.37e-02, grad_scale: 16.0 2023-03-07 13:46:02,073 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+02 3.842e+02 5.004e+02 6.502e+02 2.003e+03, threshold=1.001e+03, percent-clipped=2.0 2023-03-07 13:46:13,753 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9252, 4.2171, 4.2598, 4.3416, 1.7398, 4.2506, 2.1193, 3.0990], device='cuda:2'), covar=tensor([0.0250, 0.0236, 0.0613, 0.0305, 0.4369, 0.0158, 0.1475, 0.0831], device='cuda:2'), in_proj_covar=tensor([0.0090, 0.0077, 0.0167, 0.0110, 0.0219, 0.0086, 0.0158, 0.0145], device='cuda:2'), out_proj_covar=tensor([7.6014e-05, 6.7675e-05, 1.3390e-04, 8.3548e-05, 1.5987e-04, 7.3548e-05, 1.2297e-04, 1.1247e-04], device='cuda:2') 2023-03-07 13:46:28,355 INFO [train2.py:809] (2/4) Epoch 3, batch 250, loss[ctc_loss=0.2057, att_loss=0.2741, loss=0.2604, over 15488.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.008412, over 36.00 utterances.], tot_loss[ctc_loss=0.2229, att_loss=0.3148, loss=0.2964, over 2354639.03 frames. utt_duration=1253 frames, utt_pad_proportion=0.04738, over 7523.47 utterances.], batch size: 36, lr: 3.36e-02, grad_scale: 16.0 2023-03-07 13:47:07,546 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6305, 2.0638, 2.6829, 3.8352, 3.9818, 3.5450, 2.5113, 1.4825], device='cuda:2'), covar=tensor([0.0287, 0.2463, 0.1291, 0.0500, 0.0259, 0.0313, 0.1858, 0.2679], device='cuda:2'), in_proj_covar=tensor([0.0096, 0.0155, 0.0143, 0.0096, 0.0074, 0.0084, 0.0154, 0.0144], device='cuda:2'), out_proj_covar=tensor([9.0137e-05, 1.4270e-04, 1.3593e-04, 1.0816e-04, 7.6536e-05, 7.7334e-05, 1.4804e-04, 1.3243e-04], device='cuda:2') 2023-03-07 13:47:08,960 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6048, 5.2157, 5.0008, 4.7963, 5.3242, 5.1543, 4.9799, 4.8502], device='cuda:2'), covar=tensor([0.0891, 0.0350, 0.0223, 0.0478, 0.0289, 0.0294, 0.0233, 0.0318], device='cuda:2'), in_proj_covar=tensor([0.0255, 0.0165, 0.0112, 0.0132, 0.0179, 0.0192, 0.0152, 0.0168], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 13:47:11,296 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. 
limit=2.0 2023-03-07 13:47:24,630 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([0.8115, 0.9818, 1.6811, 0.7633, 1.6436, 2.0928, 2.1012, 2.3674], device='cuda:2'), covar=tensor([0.1613, 0.1460, 0.1141, 0.1274, 0.0741, 0.0480, 0.1276, 0.0513], device='cuda:2'), in_proj_covar=tensor([0.0069, 0.0073, 0.0066, 0.0068, 0.0071, 0.0074, 0.0077, 0.0086], device='cuda:2'), out_proj_covar=tensor([4.6762e-05, 5.3690e-05, 4.7219e-05, 4.4390e-05, 4.2555e-05, 4.8454e-05, 4.5688e-05, 4.3101e-05], device='cuda:2') 2023-03-07 13:47:40,622 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=8265.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:47:46,792 INFO [train2.py:809] (2/4) Epoch 3, batch 300, loss[ctc_loss=0.1883, att_loss=0.2603, loss=0.2459, over 15754.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.00883, over 38.00 utterances.], tot_loss[ctc_loss=0.2234, att_loss=0.3144, loss=0.2962, over 2561485.64 frames. utt_duration=1254 frames, utt_pad_proportion=0.0487, over 8181.62 utterances.], batch size: 38, lr: 3.35e-02, grad_scale: 16.0 2023-03-07 13:48:03,758 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8280.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:48:41,921 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.817e+02 4.123e+02 5.343e+02 7.099e+02 1.515e+03, threshold=1.069e+03, percent-clipped=5.0 2023-03-07 13:49:05,807 INFO [train2.py:809] (2/4) Epoch 3, batch 350, loss[ctc_loss=0.1801, att_loss=0.3055, loss=0.2804, over 16960.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007208, over 50.00 utterances.], tot_loss[ctc_loss=0.2237, att_loss=0.3143, loss=0.2962, over 2713713.83 frames. utt_duration=1242 frames, utt_pad_proportion=0.05339, over 8752.21 utterances.], batch size: 50, lr: 3.34e-02, grad_scale: 8.0 2023-03-07 13:49:16,792 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=8326.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:49:19,540 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8328.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:49:56,740 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.42 vs. limit=5.0 2023-03-07 13:50:25,020 INFO [train2.py:809] (2/4) Epoch 3, batch 400, loss[ctc_loss=0.2042, att_loss=0.2818, loss=0.2663, over 15766.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.009107, over 38.00 utterances.], tot_loss[ctc_loss=0.224, att_loss=0.3146, loss=0.2965, over 2836905.62 frames. utt_duration=1236 frames, utt_pad_proportion=0.05699, over 9193.80 utterances.], batch size: 38, lr: 3.34e-02, grad_scale: 8.0 2023-03-07 13:50:42,709 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.92 vs. limit=5.0 2023-03-07 13:51:12,615 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8399.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:51:20,126 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.563e+02 4.039e+02 5.098e+02 6.528e+02 1.596e+03, threshold=1.020e+03, percent-clipped=2.0 2023-03-07 13:51:44,894 INFO [train2.py:809] (2/4) Epoch 3, batch 450, loss[ctc_loss=0.2404, att_loss=0.3159, loss=0.3008, over 16184.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006136, over 41.00 utterances.], tot_loss[ctc_loss=0.2237, att_loss=0.3148, loss=0.2966, over 2938534.72 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.0558, over 9501.03 utterances.], batch size: 41, lr: 3.33e-02, grad_scale: 8.0 2023-03-07 13:51:59,375 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4175, 5.0510, 5.1622, 5.0107, 2.0877, 2.8035, 5.1117, 3.8243], device='cuda:2'), covar=tensor([0.0606, 0.0253, 0.0142, 0.0341, 1.0532, 0.2702, 0.0182, 0.2857], device='cuda:2'), in_proj_covar=tensor([0.0227, 0.0137, 0.0146, 0.0169, 0.0406, 0.0278, 0.0143, 0.0219], device='cuda:2'), out_proj_covar=tensor([1.2235e-04, 7.0583e-05, 7.3983e-05, 8.4082e-05, 1.9844e-04, 1.3915e-04, 7.2215e-05, 1.2481e-04], device='cuda:2') 2023-03-07 13:52:14,466 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=3.25 vs. limit=2.0 2023-03-07 13:52:30,022 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8447.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:52:47,947 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8458.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:53:05,346 INFO [train2.py:809] (2/4) Epoch 3, batch 500, loss[ctc_loss=0.1985, att_loss=0.2841, loss=0.267, over 15965.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005506, over 41.00 utterances.], tot_loss[ctc_loss=0.2225, att_loss=0.3139, loss=0.2956, over 3013458.91 frames. utt_duration=1242 frames, utt_pad_proportion=0.05507, over 9717.50 utterances.], batch size: 41, lr: 3.32e-02, grad_scale: 8.0 2023-03-07 13:53:08,868 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.0135, 1.2669, 1.6897, 2.5270, 0.7486, 2.9911, 1.7123, 2.2218], device='cuda:2'), covar=tensor([0.0602, 0.1260, 0.0870, 0.0908, 0.1982, 0.0358, 0.1821, 0.1563], device='cuda:2'), in_proj_covar=tensor([0.0065, 0.0058, 0.0055, 0.0078, 0.0065, 0.0053, 0.0069, 0.0086], device='cuda:2'), out_proj_covar=tensor([3.7874e-05, 3.9109e-05, 3.8284e-05, 4.3643e-05, 3.6323e-05, 3.2776e-05, 4.1047e-05, 5.4741e-05], device='cuda:2') 2023-03-07 13:54:00,436 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+02 4.509e+02 5.776e+02 7.013e+02 1.879e+03, threshold=1.155e+03, percent-clipped=5.0 2023-03-07 13:54:04,282 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8506.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:54:08,311 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2723, 4.7267, 4.9004, 4.8497, 2.1194, 2.7465, 4.6842, 3.7759], device='cuda:2'), covar=tensor([0.0533, 0.0252, 0.0124, 0.0259, 0.8823, 0.2333, 0.0182, 0.2601], device='cuda:2'), in_proj_covar=tensor([0.0231, 0.0137, 0.0144, 0.0169, 0.0405, 0.0278, 0.0141, 0.0223], device='cuda:2'), out_proj_covar=tensor([1.2459e-04, 7.0136e-05, 7.3396e-05, 8.4009e-05, 1.9764e-04, 1.3954e-04, 7.1721e-05, 1.2678e-04], device='cuda:2') 2023-03-07 13:54:24,638 INFO [train2.py:809] (2/4) Epoch 3, batch 550, loss[ctc_loss=0.2263, att_loss=0.32, loss=0.3013, over 17474.00 frames. utt_duration=1014 frames, utt_pad_proportion=0.04397, over 69.00 utterances.], tot_loss[ctc_loss=0.2229, att_loss=0.3137, loss=0.2955, over 3072925.60 frames. utt_duration=1205 frames, utt_pad_proportion=0.06264, over 10212.38 utterances.], batch size: 69, lr: 3.31e-02, grad_scale: 8.0 2023-03-07 13:55:45,135 INFO [train2.py:809] (2/4) Epoch 3, batch 600, loss[ctc_loss=0.201, att_loss=0.3098, loss=0.2881, over 17255.00 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.01379, over 55.00 utterances.], tot_loss[ctc_loss=0.223, att_loss=0.3143, loss=0.296, over 3126365.49 frames. utt_duration=1199 frames, utt_pad_proportion=0.06161, over 10440.16 utterances.], batch size: 55, lr: 3.31e-02, grad_scale: 8.0 2023-03-07 13:56:42,021 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 4.122e+02 4.877e+02 6.146e+02 9.874e+02, threshold=9.754e+02, percent-clipped=0.0 2023-03-07 13:56:47,699 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([0.6357, 1.0158, 1.5392, 1.4834, 1.0780, 1.4111, 1.2077, 1.4892], device='cuda:2'), covar=tensor([0.2566, 0.1290, 0.0855, 0.0880, 0.1097, 0.0944, 0.0966, 0.0867], device='cuda:2'), in_proj_covar=tensor([0.0062, 0.0072, 0.0066, 0.0066, 0.0072, 0.0077, 0.0077, 0.0086], device='cuda:2'), out_proj_covar=tensor([4.1461e-05, 5.3104e-05, 4.6062e-05, 4.1798e-05, 4.2663e-05, 4.8348e-05, 4.4731e-05, 4.3282e-05], device='cuda:2') 2023-03-07 13:57:05,727 INFO [train2.py:809] (2/4) Epoch 3, batch 650, loss[ctc_loss=0.2205, att_loss=0.2927, loss=0.2782, over 15624.00 frames. utt_duration=1690 frames, utt_pad_proportion=0.01032, over 37.00 utterances.], tot_loss[ctc_loss=0.2203, att_loss=0.3122, loss=0.2938, over 3155009.43 frames. utt_duration=1223 frames, utt_pad_proportion=0.058, over 10332.27 utterances.], batch size: 37, lr: 3.30e-02, grad_scale: 8.0 2023-03-07 13:57:09,595 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=8621.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 13:57:20,620 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.8652, 3.1905, 2.9604, 2.8326, 3.2913, 3.0337, 1.6386, 3.6810], device='cuda:2'), covar=tensor([0.1262, 0.0253, 0.0757, 0.0573, 0.0384, 0.0651, 0.1161, 0.0090], device='cuda:2'), in_proj_covar=tensor([0.0104, 0.0066, 0.0113, 0.0093, 0.0077, 0.0113, 0.0100, 0.0051], device='cuda:2'), out_proj_covar=tensor([1.4481e-04, 1.1059e-04, 1.7002e-04, 1.2925e-04, 1.3089e-04, 1.6901e-04, 1.3669e-04, 8.0800e-05], device='cuda:2') 2023-03-07 13:57:26,681 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6678, 4.9633, 4.9771, 4.9710, 4.5588, 4.8546, 5.3467, 4.9995], device='cuda:2'), covar=tensor([0.0311, 0.0246, 0.0319, 0.0172, 0.0338, 0.0157, 0.0282, 0.0154], device='cuda:2'), in_proj_covar=tensor([0.0124, 0.0106, 0.0122, 0.0080, 0.0120, 0.0092, 0.0109, 0.0091], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-07 13:58:25,322 INFO [train2.py:809] (2/4) Epoch 3, batch 700, loss[ctc_loss=0.2364, att_loss=0.3257, loss=0.3078, over 17110.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01539, over 56.00 utterances.], tot_loss[ctc_loss=0.2203, att_loss=0.3116, loss=0.2933, over 3168785.05 frames. 
utt_duration=1226 frames, utt_pad_proportion=0.05981, over 10349.36 utterances.], batch size: 56, lr: 3.29e-02, grad_scale: 8.0 2023-03-07 13:58:41,659 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.6791, 3.7325, 3.3096, 3.4676, 3.9451, 3.3873, 1.8404, 4.2990], device='cuda:2'), covar=tensor([0.1551, 0.0258, 0.1056, 0.0611, 0.0324, 0.0889, 0.1342, 0.0069], device='cuda:2'), in_proj_covar=tensor([0.0106, 0.0067, 0.0116, 0.0095, 0.0078, 0.0116, 0.0100, 0.0052], device='cuda:2'), out_proj_covar=tensor([1.4849e-04, 1.1274e-04, 1.7550e-04, 1.3260e-04, 1.3281e-04, 1.7347e-04, 1.3686e-04, 8.2575e-05], device='cuda:2') 2023-03-07 13:59:21,218 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.427e+02 4.229e+02 5.521e+02 7.710e+02 1.687e+03, threshold=1.104e+03, percent-clipped=9.0 2023-03-07 13:59:44,322 INFO [train2.py:809] (2/4) Epoch 3, batch 750, loss[ctc_loss=0.2326, att_loss=0.3052, loss=0.2906, over 15938.00 frames. utt_duration=1556 frames, utt_pad_proportion=0.007482, over 41.00 utterances.], tot_loss[ctc_loss=0.2197, att_loss=0.3113, loss=0.293, over 3188707.13 frames. utt_duration=1228 frames, utt_pad_proportion=0.06014, over 10395.33 utterances.], batch size: 41, lr: 3.29e-02, grad_scale: 8.0 2023-03-07 14:00:51,000 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=8760.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:01:04,404 INFO [train2.py:809] (2/4) Epoch 3, batch 800, loss[ctc_loss=0.184, att_loss=0.302, loss=0.2784, over 16532.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006917, over 45.00 utterances.], tot_loss[ctc_loss=0.2191, att_loss=0.3109, loss=0.2926, over 3203129.75 frames. utt_duration=1217 frames, utt_pad_proportion=0.0628, over 10539.72 utterances.], batch size: 45, lr: 3.28e-02, grad_scale: 8.0 2023-03-07 14:02:01,731 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.534e+02 3.936e+02 4.764e+02 5.876e+02 1.316e+03, threshold=9.528e+02, percent-clipped=1.0 2023-03-07 14:02:17,390 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1836, 4.7954, 4.9502, 4.7342, 2.4866, 4.5157, 3.5232, 3.5488], device='cuda:2'), covar=tensor([0.0206, 0.0111, 0.0553, 0.0283, 0.3344, 0.0181, 0.0903, 0.0844], device='cuda:2'), in_proj_covar=tensor([0.0090, 0.0077, 0.0177, 0.0112, 0.0220, 0.0086, 0.0162, 0.0153], device='cuda:2'), out_proj_covar=tensor([7.9024e-05, 6.8840e-05, 1.4215e-04, 8.6490e-05, 1.6388e-04, 7.3774e-05, 1.2782e-04, 1.2062e-04], device='cuda:2') 2023-03-07 14:02:24,971 INFO [train2.py:809] (2/4) Epoch 3, batch 850, loss[ctc_loss=0.2412, att_loss=0.3375, loss=0.3183, over 17424.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.03086, over 63.00 utterances.], tot_loss[ctc_loss=0.2192, att_loss=0.3113, loss=0.2929, over 3209975.89 frames. 
utt_duration=1207 frames, utt_pad_proportion=0.06767, over 10655.06 utterances.], batch size: 63, lr: 3.27e-02, grad_scale: 8.0 2023-03-07 14:02:29,336 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=8821.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:03:33,440 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6592, 4.8599, 4.7615, 4.6933, 5.3598, 4.9828, 4.6584, 2.9106], device='cuda:2'), covar=tensor([0.0279, 0.0306, 0.0202, 0.0273, 0.0806, 0.0196, 0.0339, 0.3428], device='cuda:2'), in_proj_covar=tensor([0.0125, 0.0109, 0.0102, 0.0110, 0.0193, 0.0129, 0.0094, 0.0245], device='cuda:2'), out_proj_covar=tensor([1.0045e-04, 8.0420e-05, 8.1330e-05, 8.6625e-05, 1.7079e-04, 9.7793e-05, 7.9526e-05, 1.9118e-04], device='cuda:2') 2023-03-07 14:03:43,842 INFO [train2.py:809] (2/4) Epoch 3, batch 900, loss[ctc_loss=0.2268, att_loss=0.3222, loss=0.3031, over 16977.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.006793, over 50.00 utterances.], tot_loss[ctc_loss=0.2189, att_loss=0.3111, loss=0.2926, over 3213185.49 frames. utt_duration=1214 frames, utt_pad_proportion=0.06763, over 10603.74 utterances.], batch size: 50, lr: 3.26e-02, grad_scale: 8.0 2023-03-07 14:03:56,958 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.7470, 3.6913, 3.2584, 3.4967, 3.6845, 3.2763, 1.8543, 4.3143], device='cuda:2'), covar=tensor([0.1532, 0.0245, 0.0917, 0.0545, 0.0407, 0.0710, 0.1280, 0.0083], device='cuda:2'), in_proj_covar=tensor([0.0103, 0.0068, 0.0115, 0.0093, 0.0080, 0.0112, 0.0100, 0.0053], device='cuda:2'), out_proj_covar=tensor([1.4486e-04, 1.1559e-04, 1.7460e-04, 1.3216e-04, 1.3594e-04, 1.6923e-04, 1.3687e-04, 8.5128e-05], device='cuda:2') 2023-03-07 14:04:32,218 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.1894, 1.1106, 2.3805, 1.1573, 1.5506, 2.7794, 1.5121, 0.9436], device='cuda:2'), covar=tensor([0.1212, 0.1071, 0.0507, 0.1511, 0.1029, 0.0233, 0.0882, 0.2564], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0056, 0.0051, 0.0076, 0.0060, 0.0052, 0.0064, 0.0086], device='cuda:2'), out_proj_covar=tensor([3.7954e-05, 3.7815e-05, 3.6129e-05, 4.3466e-05, 3.6282e-05, 3.0566e-05, 3.8655e-05, 5.4342e-05], device='cuda:2') 2023-03-07 14:04:39,898 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.583e+02 4.192e+02 5.104e+02 6.098e+02 1.636e+03, threshold=1.021e+03, percent-clipped=5.0 2023-03-07 14:04:47,197 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-03-07 14:05:03,066 INFO [train2.py:809] (2/4) Epoch 3, batch 950, loss[ctc_loss=0.2493, att_loss=0.3201, loss=0.3059, over 15933.00 frames. utt_duration=1555 frames, utt_pad_proportion=0.007347, over 41.00 utterances.], tot_loss[ctc_loss=0.2199, att_loss=0.3118, loss=0.2934, over 3225909.21 frames. utt_duration=1190 frames, utt_pad_proportion=0.07279, over 10859.60 utterances.], batch size: 41, lr: 3.26e-02, grad_scale: 8.0 2023-03-07 14:05:05,151 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. 
limit=2.0 2023-03-07 14:05:07,424 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8921.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:05:16,821 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=8927.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:05:42,463 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2879, 4.4362, 4.5687, 4.4277, 4.8250, 4.2940, 4.2798, 2.1352], device='cuda:2'), covar=tensor([0.0267, 0.0364, 0.0240, 0.0161, 0.0824, 0.0282, 0.0419, 0.3799], device='cuda:2'), in_proj_covar=tensor([0.0124, 0.0108, 0.0100, 0.0109, 0.0196, 0.0124, 0.0093, 0.0238], device='cuda:2'), out_proj_covar=tensor([9.9880e-05, 8.0684e-05, 8.1183e-05, 8.5599e-05, 1.7247e-04, 9.5325e-05, 7.8950e-05, 1.8742e-04], device='cuda:2') 2023-03-07 14:06:03,035 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.5557, 1.0966, 2.2645, 1.1757, 1.4917, 3.1849, 2.0393, 1.2491], device='cuda:2'), covar=tensor([0.0944, 0.1811, 0.0849, 0.2394, 0.1952, 0.0513, 0.1416, 0.2888], device='cuda:2'), in_proj_covar=tensor([0.0068, 0.0059, 0.0053, 0.0079, 0.0062, 0.0054, 0.0067, 0.0089], device='cuda:2'), out_proj_covar=tensor([3.8695e-05, 4.0051e-05, 3.7147e-05, 4.5339e-05, 3.8244e-05, 3.2013e-05, 4.0165e-05, 5.6145e-05], device='cuda:2') 2023-03-07 14:06:18,715 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9833, 2.1588, 3.2917, 4.4514, 4.3093, 4.5081, 2.8739, 2.0640], device='cuda:2'), covar=tensor([0.0376, 0.2744, 0.1275, 0.0450, 0.0136, 0.0168, 0.2021, 0.2640], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0149, 0.0143, 0.0097, 0.0073, 0.0086, 0.0154, 0.0140], device='cuda:2'), out_proj_covar=tensor([9.1789e-05, 1.3855e-04, 1.3804e-04, 1.0955e-04, 7.6212e-05, 8.2514e-05, 1.4990e-04, 1.3024e-04], device='cuda:2') 2023-03-07 14:06:23,048 INFO [train2.py:809] (2/4) Epoch 3, batch 1000, loss[ctc_loss=0.2297, att_loss=0.3357, loss=0.3145, over 16762.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006718, over 48.00 utterances.], tot_loss[ctc_loss=0.2184, att_loss=0.3115, loss=0.2929, over 3240149.17 frames. utt_duration=1200 frames, utt_pad_proportion=0.06882, over 10817.34 utterances.], batch size: 48, lr: 3.25e-02, grad_scale: 8.0 2023-03-07 14:06:23,866 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8969.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:06:50,831 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=8986.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:06:53,999 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=8988.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:07:19,491 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.658e+02 4.100e+02 4.728e+02 5.740e+02 1.456e+03, threshold=9.457e+02, percent-clipped=2.0 2023-03-07 14:07:34,360 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9013.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:07:43,209 INFO [train2.py:809] (2/4) Epoch 3, batch 1050, loss[ctc_loss=0.2611, att_loss=0.3397, loss=0.324, over 16695.00 frames. utt_duration=676.1 frames, utt_pad_proportion=0.1506, over 99.00 utterances.], tot_loss[ctc_loss=0.2166, att_loss=0.3094, loss=0.2908, over 3241789.21 frames. 
utt_duration=1213 frames, utt_pad_proportion=0.06567, over 10699.72 utterances.], batch size: 99, lr: 3.24e-02, grad_scale: 8.0 2023-03-07 14:07:47,183 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8976, 6.0277, 5.4855, 6.0943, 5.8528, 5.7114, 5.5444, 5.4464], device='cuda:2'), covar=tensor([0.1033, 0.0710, 0.0640, 0.0540, 0.0500, 0.0851, 0.1723, 0.1470], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0290, 0.0240, 0.0227, 0.0203, 0.0297, 0.0311, 0.0297], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 14:08:27,895 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9047.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:09:02,451 INFO [train2.py:809] (2/4) Epoch 3, batch 1100, loss[ctc_loss=0.1895, att_loss=0.2878, loss=0.2682, over 16169.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.006872, over 41.00 utterances.], tot_loss[ctc_loss=0.2161, att_loss=0.3096, loss=0.2909, over 3246294.03 frames. utt_duration=1235 frames, utt_pad_proportion=0.06057, over 10531.14 utterances.], batch size: 41, lr: 3.24e-02, grad_scale: 8.0 2023-03-07 14:09:11,155 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9074.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:09:58,361 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2023-03-07 14:09:58,861 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.857e+02 5.076e+02 6.311e+02 2.003e+03, threshold=1.015e+03, percent-clipped=7.0 2023-03-07 14:10:18,407 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9116.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:10:22,914 INFO [train2.py:809] (2/4) Epoch 3, batch 1150, loss[ctc_loss=0.194, att_loss=0.2846, loss=0.2665, over 15774.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008275, over 38.00 utterances.], tot_loss[ctc_loss=0.2149, att_loss=0.3085, loss=0.2898, over 3246990.34 frames. utt_duration=1231 frames, utt_pad_proportion=0.06295, over 10560.51 utterances.], batch size: 38, lr: 3.23e-02, grad_scale: 8.0 2023-03-07 14:10:23,218 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9639, 4.8052, 4.7726, 3.6479, 4.8243, 3.9366, 4.6039, 2.5290], device='cuda:2'), covar=tensor([0.0146, 0.0084, 0.0313, 0.0428, 0.0109, 0.0180, 0.0160, 0.1479], device='cuda:2'), in_proj_covar=tensor([0.0036, 0.0038, 0.0034, 0.0055, 0.0037, 0.0043, 0.0049, 0.0087], device='cuda:2'), out_proj_covar=tensor([8.7460e-05, 1.0774e-04, 1.0333e-04, 1.3397e-04, 9.7308e-05, 1.3016e-04, 1.2053e-04, 1.9768e-04], device='cuda:2') 2023-03-07 14:10:55,669 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2023-03-07 14:11:31,637 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.74 vs. limit=2.0 2023-03-07 14:11:42,362 INFO [train2.py:809] (2/4) Epoch 3, batch 1200, loss[ctc_loss=0.3131, att_loss=0.3533, loss=0.3453, over 16963.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007699, over 50.00 utterances.], tot_loss[ctc_loss=0.2147, att_loss=0.308, loss=0.2893, over 3248294.83 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.05699, over 10330.82 utterances.], batch size: 50, lr: 3.22e-02, grad_scale: 8.0 2023-03-07 14:12:01,839 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2129, 2.0694, 2.8066, 4.2865, 4.4056, 4.2721, 2.6936, 1.7875], device='cuda:2'), covar=tensor([0.0225, 0.2711, 0.1229, 0.0363, 0.0136, 0.0209, 0.1854, 0.2590], device='cuda:2'), in_proj_covar=tensor([0.0102, 0.0163, 0.0151, 0.0102, 0.0079, 0.0094, 0.0160, 0.0148], device='cuda:2'), out_proj_covar=tensor([9.7068e-05, 1.5176e-04, 1.4676e-04, 1.1603e-04, 8.1716e-05, 9.0664e-05, 1.5687e-04, 1.3867e-04], device='cuda:2') 2023-03-07 14:12:29,353 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.1886, 1.2102, 2.3092, 1.8721, 1.9501, 2.3581, 1.5072, 1.4583], device='cuda:2'), covar=tensor([0.0486, 0.1722, 0.0366, 0.1305, 0.0659, 0.0232, 0.1063, 0.2157], device='cuda:2'), in_proj_covar=tensor([0.0060, 0.0052, 0.0046, 0.0069, 0.0057, 0.0048, 0.0066, 0.0081], device='cuda:2'), out_proj_covar=tensor([3.4753e-05, 3.6484e-05, 3.3027e-05, 4.0656e-05, 3.5162e-05, 2.8579e-05, 3.8668e-05, 5.1473e-05], device='cuda:2') 2023-03-07 14:12:41,117 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.934e+02 4.184e+02 5.248e+02 6.761e+02 1.156e+03, threshold=1.050e+03, percent-clipped=4.0 2023-03-07 14:13:03,116 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4958, 1.3439, 2.5119, 2.4969, 1.5572, 3.1204, 1.8027, 1.4058], device='cuda:2'), covar=tensor([0.0537, 0.1580, 0.0573, 0.1461, 0.1296, 0.0270, 0.1900, 0.2986], device='cuda:2'), in_proj_covar=tensor([0.0060, 0.0051, 0.0046, 0.0069, 0.0057, 0.0048, 0.0064, 0.0082], device='cuda:2'), out_proj_covar=tensor([3.4686e-05, 3.5763e-05, 3.2804e-05, 4.0409e-05, 3.4805e-05, 2.8318e-05, 3.8190e-05, 5.1491e-05], device='cuda:2') 2023-03-07 14:13:06,017 INFO [train2.py:809] (2/4) Epoch 3, batch 1250, loss[ctc_loss=0.1853, att_loss=0.2672, loss=0.2508, over 15374.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.01037, over 35.00 utterances.], tot_loss[ctc_loss=0.2168, att_loss=0.3096, loss=0.291, over 3256345.87 frames. utt_duration=1241 frames, utt_pad_proportion=0.06021, over 10505.82 utterances.], batch size: 35, lr: 3.22e-02, grad_scale: 8.0 2023-03-07 14:13:45,646 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0708, 4.7361, 4.4716, 4.7293, 4.8598, 4.2974, 4.2059, 4.4641], device='cuda:2'), covar=tensor([0.0168, 0.0295, 0.0104, 0.0125, 0.0125, 0.0156, 0.0354, 0.0254], device='cuda:2'), in_proj_covar=tensor([0.0043, 0.0043, 0.0043, 0.0032, 0.0032, 0.0039, 0.0057, 0.0049], device='cuda:2'), out_proj_covar=tensor([1.0236e-04, 1.0384e-04, 1.1704e-04, 8.3083e-05, 7.5753e-05, 1.0223e-04, 1.3845e-04, 1.2367e-04], device='cuda:2') 2023-03-07 14:14:30,176 INFO [train2.py:809] (2/4) Epoch 3, batch 1300, loss[ctc_loss=0.2022, att_loss=0.3116, loss=0.2897, over 17039.00 frames. utt_duration=1338 frames, utt_pad_proportion=0.006842, over 51.00 utterances.], tot_loss[ctc_loss=0.2149, att_loss=0.3085, loss=0.2898, over 3255192.62 frames. utt_duration=1236 frames, utt_pad_proportion=0.06248, over 10548.48 utterances.], batch size: 51, lr: 3.21e-02, grad_scale: 8.0 2023-03-07 14:14:52,978 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9283.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:15:18,847 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.81 vs. 
limit=2.0 2023-03-07 14:15:29,121 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.817e+02 3.943e+02 4.874e+02 6.147e+02 1.582e+03, threshold=9.748e+02, percent-clipped=2.0 2023-03-07 14:15:31,192 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3879, 4.6024, 4.3177, 4.5313, 4.9076, 4.5199, 4.1323, 1.9883], device='cuda:2'), covar=tensor([0.0257, 0.0366, 0.0406, 0.0235, 0.1017, 0.0270, 0.0459, 0.4327], device='cuda:2'), in_proj_covar=tensor([0.0118, 0.0107, 0.0103, 0.0106, 0.0195, 0.0125, 0.0092, 0.0246], device='cuda:2'), out_proj_covar=tensor([9.7500e-05, 8.2995e-05, 8.5148e-05, 8.7295e-05, 1.7426e-04, 9.7880e-05, 8.1050e-05, 1.9426e-04], device='cuda:2') 2023-03-07 14:15:54,114 INFO [train2.py:809] (2/4) Epoch 3, batch 1350, loss[ctc_loss=0.176, att_loss=0.2928, loss=0.2694, over 16540.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.005661, over 45.00 utterances.], tot_loss[ctc_loss=0.2139, att_loss=0.3085, loss=0.2896, over 3266062.16 frames. utt_duration=1259 frames, utt_pad_proportion=0.05367, over 10386.44 utterances.], batch size: 45, lr: 3.20e-02, grad_scale: 8.0 2023-03-07 14:16:08,251 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.19 vs. limit=5.0 2023-03-07 14:16:32,194 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9342.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:17:17,853 INFO [train2.py:809] (2/4) Epoch 3, batch 1400, loss[ctc_loss=0.1957, att_loss=0.2846, loss=0.2668, over 15765.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.008856, over 38.00 utterances.], tot_loss[ctc_loss=0.2126, att_loss=0.3077, loss=0.2887, over 3271273.38 frames. utt_duration=1239 frames, utt_pad_proportion=0.05639, over 10570.51 utterances.], batch size: 38, lr: 3.20e-02, grad_scale: 8.0 2023-03-07 14:17:18,106 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9369.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:17:19,331 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.59 vs. limit=2.0 2023-03-07 14:18:16,801 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 3.704e+02 4.754e+02 6.457e+02 1.261e+03, threshold=9.507e+02, percent-clipped=3.0 2023-03-07 14:18:33,981 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9414.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:18:37,115 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=9416.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:18:41,731 INFO [train2.py:809] (2/4) Epoch 3, batch 1450, loss[ctc_loss=0.2335, att_loss=0.3264, loss=0.3079, over 17259.00 frames. utt_duration=875.3 frames, utt_pad_proportion=0.08149, over 79.00 utterances.], tot_loss[ctc_loss=0.2134, att_loss=0.3084, loss=0.2894, over 3264180.12 frames. 
utt_duration=1209 frames, utt_pad_proportion=0.06529, over 10813.82 utterances.], batch size: 79, lr: 3.19e-02, grad_scale: 8.0 2023-03-07 14:19:11,502 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6386, 4.9510, 4.8444, 5.0123, 4.4526, 4.6993, 5.1599, 5.0699], device='cuda:2'), covar=tensor([0.0244, 0.0214, 0.0351, 0.0148, 0.0332, 0.0205, 0.0269, 0.0122], device='cuda:2'), in_proj_covar=tensor([0.0125, 0.0112, 0.0131, 0.0086, 0.0125, 0.0100, 0.0112, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:2') 2023-03-07 14:19:58,099 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=9464.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:20:06,084 INFO [train2.py:809] (2/4) Epoch 3, batch 1500, loss[ctc_loss=0.1896, att_loss=0.2804, loss=0.2623, over 15784.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007961, over 38.00 utterances.], tot_loss[ctc_loss=0.212, att_loss=0.3081, loss=0.2889, over 3271259.50 frames. utt_duration=1227 frames, utt_pad_proportion=0.06005, over 10676.97 utterances.], batch size: 38, lr: 3.18e-02, grad_scale: 8.0 2023-03-07 14:20:15,990 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9475.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:21:03,452 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.770e+02 4.105e+02 5.004e+02 6.164e+02 1.572e+03, threshold=1.001e+03, percent-clipped=6.0 2023-03-07 14:21:27,817 INFO [train2.py:809] (2/4) Epoch 3, batch 1550, loss[ctc_loss=0.2569, att_loss=0.3388, loss=0.3224, over 17290.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01249, over 55.00 utterances.], tot_loss[ctc_loss=0.2123, att_loss=0.3082, loss=0.2891, over 3274144.11 frames. utt_duration=1250 frames, utt_pad_proportion=0.05489, over 10491.53 utterances.], batch size: 55, lr: 3.18e-02, grad_scale: 8.0 2023-03-07 14:21:28,132 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0799, 2.8017, 3.4148, 2.3275, 3.3974, 4.3977, 4.1651, 3.2443], device='cuda:2'), covar=tensor([0.0434, 0.1201, 0.0867, 0.1404, 0.0843, 0.0130, 0.0329, 0.1020], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0155, 0.0132, 0.0150, 0.0166, 0.0093, 0.0107, 0.0158], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 14:22:01,985 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.9941, 1.1838, 1.6505, 2.0648, 1.1153, 2.4916, 1.3081, 1.8292], device='cuda:2'), covar=tensor([0.0400, 0.1364, 0.0872, 0.0522, 0.0870, 0.0333, 0.0981, 0.0741], device='cuda:2'), in_proj_covar=tensor([0.0061, 0.0070, 0.0071, 0.0070, 0.0067, 0.0063, 0.0077, 0.0084], device='cuda:2'), out_proj_covar=tensor([3.4249e-05, 4.7147e-05, 4.5186e-05, 4.0689e-05, 3.9342e-05, 3.7393e-05, 4.4368e-05, 4.4548e-05], device='cuda:2') 2023-03-07 14:22:50,219 INFO [train2.py:809] (2/4) Epoch 3, batch 1600, loss[ctc_loss=0.1963, att_loss=0.3139, loss=0.2904, over 16625.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005289, over 47.00 utterances.], tot_loss[ctc_loss=0.2108, att_loss=0.3075, loss=0.2882, over 3282587.05 frames. utt_duration=1271 frames, utt_pad_proportion=0.04748, over 10338.90 utterances.], batch size: 47, lr: 3.17e-02, grad_scale: 8.0 2023-03-07 14:23:04,058 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.13 vs. 
limit=2.0 2023-03-07 14:23:12,996 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=9583.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:23:14,642 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9584.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:23:21,902 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2023-03-07 14:23:37,533 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2073, 4.2219, 4.7415, 4.2471, 2.0125, 4.3036, 2.3003, 2.9119], device='cuda:2'), covar=tensor([0.0185, 0.0256, 0.0418, 0.0322, 0.3856, 0.0206, 0.1736, 0.1054], device='cuda:2'), in_proj_covar=tensor([0.0089, 0.0082, 0.0182, 0.0112, 0.0220, 0.0092, 0.0175, 0.0157], device='cuda:2'), out_proj_covar=tensor([7.7858e-05, 7.4113e-05, 1.4799e-04, 8.6869e-05, 1.6680e-04, 7.9325e-05, 1.3886e-04, 1.2568e-04], device='cuda:2') 2023-03-07 14:23:49,431 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.439e+02 4.014e+02 4.915e+02 6.528e+02 1.408e+03, threshold=9.829e+02, percent-clipped=8.0 2023-03-07 14:24:13,620 INFO [train2.py:809] (2/4) Epoch 3, batch 1650, loss[ctc_loss=0.2017, att_loss=0.3074, loss=0.2862, over 17187.00 frames. utt_duration=871.7 frames, utt_pad_proportion=0.08334, over 79.00 utterances.], tot_loss[ctc_loss=0.2118, att_loss=0.3086, loss=0.2892, over 3286441.56 frames. utt_duration=1239 frames, utt_pad_proportion=0.05411, over 10626.73 utterances.], batch size: 79, lr: 3.16e-02, grad_scale: 8.0 2023-03-07 14:24:33,194 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=9631.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:24:42,216 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-07 14:24:52,637 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=9642.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:24:52,655 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3807, 2.0499, 2.9150, 4.0558, 4.1009, 4.2109, 2.6422, 2.0397], device='cuda:2'), covar=tensor([0.0502, 0.2730, 0.1225, 0.0582, 0.0321, 0.0189, 0.2167, 0.2761], device='cuda:2'), in_proj_covar=tensor([0.0106, 0.0164, 0.0154, 0.0105, 0.0079, 0.0091, 0.0163, 0.0151], device='cuda:2'), out_proj_covar=tensor([1.0223e-04, 1.5416e-04, 1.5008e-04, 1.1895e-04, 8.3145e-05, 8.8699e-05, 1.5976e-04, 1.4217e-04], device='cuda:2') 2023-03-07 14:24:57,517 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9645.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:25:18,004 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9658.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:25:35,209 INFO [train2.py:809] (2/4) Epoch 3, batch 1700, loss[ctc_loss=0.1704, att_loss=0.2688, loss=0.2491, over 15364.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01162, over 35.00 utterances.], tot_loss[ctc_loss=0.2097, att_loss=0.3068, loss=0.2874, over 3286323.08 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.04841, over 10433.62 utterances.], batch size: 35, lr: 3.16e-02, grad_scale: 8.0 2023-03-07 14:25:35,547 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=9669.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:25:51,549 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9205, 4.8521, 4.6324, 3.8155, 4.7771, 4.1417, 4.0116, 2.5072], device='cuda:2'), covar=tensor([0.0115, 0.0079, 0.0188, 0.0354, 0.0085, 0.0151, 0.0240, 0.1325], device='cuda:2'), in_proj_covar=tensor([0.0038, 0.0039, 0.0035, 0.0059, 0.0038, 0.0045, 0.0052, 0.0088], device='cuda:2'), out_proj_covar=tensor([9.5294e-05, 1.1565e-04, 1.1059e-04, 1.4843e-04, 1.0313e-04, 1.4003e-04, 1.3355e-04, 2.0936e-04], device='cuda:2') 2023-03-07 14:26:08,340 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.28 vs. limit=5.0 2023-03-07 14:26:09,530 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=9690.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:26:33,372 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.300e+02 3.813e+02 4.618e+02 5.851e+02 1.256e+03, threshold=9.236e+02, percent-clipped=5.0 2023-03-07 14:26:33,790 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3343, 3.1981, 3.8126, 2.5649, 3.5387, 4.4922, 4.2410, 3.0590], device='cuda:2'), covar=tensor([0.0382, 0.1253, 0.0721, 0.1791, 0.1046, 0.0229, 0.0595, 0.1552], device='cuda:2'), in_proj_covar=tensor([0.0142, 0.0155, 0.0131, 0.0151, 0.0167, 0.0093, 0.0107, 0.0157], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 14:26:54,263 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=9717.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:26:57,504 INFO [train2.py:809] (2/4) Epoch 3, batch 1750, loss[ctc_loss=0.199, att_loss=0.2947, loss=0.2756, over 16410.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.006921, over 44.00 utterances.], tot_loss[ctc_loss=0.2096, att_loss=0.3072, loss=0.2877, over 3282720.88 frames. utt_duration=1271 frames, utt_pad_proportion=0.04619, over 10343.26 utterances.], batch size: 44, lr: 3.15e-02, grad_scale: 8.0 2023-03-07 14:26:57,939 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9719.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:27:37,583 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9743.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:28:19,374 INFO [train2.py:809] (2/4) Epoch 3, batch 1800, loss[ctc_loss=0.255, att_loss=0.3507, loss=0.3316, over 17425.00 frames. utt_duration=1012 frames, utt_pad_proportion=0.04478, over 69.00 utterances.], tot_loss[ctc_loss=0.2092, att_loss=0.307, loss=0.2874, over 3277001.89 frames. 
utt_duration=1268 frames, utt_pad_proportion=0.04857, over 10349.02 utterances.], batch size: 69, lr: 3.14e-02, grad_scale: 8.0 2023-03-07 14:28:21,132 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9770.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:29:00,493 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9334, 4.1170, 4.0280, 4.2068, 4.2926, 4.1115, 3.8509, 4.0035], device='cuda:2'), covar=tensor([0.0113, 0.0189, 0.0107, 0.0123, 0.0077, 0.0100, 0.0298, 0.0221], device='cuda:2'), in_proj_covar=tensor([0.0043, 0.0042, 0.0043, 0.0032, 0.0031, 0.0037, 0.0057, 0.0050], device='cuda:2'), out_proj_covar=tensor([1.0882e-04, 1.0694e-04, 1.2490e-04, 8.5640e-05, 7.8349e-05, 1.0247e-04, 1.4557e-04, 1.3215e-04], device='cuda:2') 2023-03-07 14:29:13,201 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6227, 5.0153, 4.5985, 4.9840, 5.1177, 4.7902, 4.4406, 4.8687], device='cuda:2'), covar=tensor([0.0085, 0.0118, 0.0081, 0.0072, 0.0056, 0.0079, 0.0244, 0.0140], device='cuda:2'), in_proj_covar=tensor([0.0043, 0.0042, 0.0044, 0.0032, 0.0031, 0.0037, 0.0058, 0.0050], device='cuda:2'), out_proj_covar=tensor([1.0913e-04, 1.0666e-04, 1.2536e-04, 8.5488e-05, 7.8301e-05, 1.0293e-04, 1.4621e-04, 1.3206e-04], device='cuda:2') 2023-03-07 14:29:17,432 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.661e+02 4.029e+02 5.153e+02 6.488e+02 1.407e+03, threshold=1.031e+03, percent-clipped=9.0 2023-03-07 14:29:17,929 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9804.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:29:41,461 INFO [train2.py:809] (2/4) Epoch 3, batch 1850, loss[ctc_loss=0.2247, att_loss=0.3254, loss=0.3053, over 16619.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005498, over 47.00 utterances.], tot_loss[ctc_loss=0.2118, att_loss=0.3084, loss=0.2891, over 3265670.28 frames. utt_duration=1218 frames, utt_pad_proportion=0.0631, over 10735.15 utterances.], batch size: 47, lr: 3.14e-02, grad_scale: 8.0 2023-03-07 14:30:38,289 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9240, 2.7341, 3.4499, 2.2553, 3.2979, 3.9575, 3.7638, 2.8745], device='cuda:2'), covar=tensor([0.0305, 0.1151, 0.0614, 0.1326, 0.0795, 0.0289, 0.0451, 0.1199], device='cuda:2'), in_proj_covar=tensor([0.0143, 0.0159, 0.0133, 0.0156, 0.0170, 0.0098, 0.0112, 0.0160], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 14:31:02,969 INFO [train2.py:809] (2/4) Epoch 3, batch 1900, loss[ctc_loss=0.1872, att_loss=0.286, loss=0.2663, over 15893.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.008864, over 39.00 utterances.], tot_loss[ctc_loss=0.2124, att_loss=0.309, loss=0.2897, over 3278629.82 frames. utt_duration=1222 frames, utt_pad_proportion=0.05888, over 10748.00 utterances.], batch size: 39, lr: 3.13e-02, grad_scale: 8.0 2023-03-07 14:32:00,231 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.657e+02 4.209e+02 5.043e+02 6.488e+02 1.113e+03, threshold=1.009e+03, percent-clipped=3.0 2023-03-07 14:32:23,701 INFO [train2.py:809] (2/4) Epoch 3, batch 1950, loss[ctc_loss=0.1532, att_loss=0.2627, loss=0.2408, over 15847.00 frames. utt_duration=1627 frames, utt_pad_proportion=0.01045, over 39.00 utterances.], tot_loss[ctc_loss=0.2113, att_loss=0.3091, loss=0.2896, over 3285185.85 frames. 
utt_duration=1211 frames, utt_pad_proportion=0.06027, over 10861.00 utterances.], batch size: 39, lr: 3.13e-02, grad_scale: 8.0 2023-03-07 14:32:34,350 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.82 vs. limit=2.0 2023-03-07 14:32:58,027 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9940.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:33:44,090 INFO [train2.py:809] (2/4) Epoch 3, batch 2000, loss[ctc_loss=0.2091, att_loss=0.3178, loss=0.296, over 16620.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005633, over 47.00 utterances.], tot_loss[ctc_loss=0.2104, att_loss=0.309, loss=0.2893, over 3288592.80 frames. utt_duration=1218 frames, utt_pad_proportion=0.0587, over 10815.57 utterances.], batch size: 47, lr: 3.12e-02, grad_scale: 8.0 2023-03-07 14:34:20,736 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3621, 2.1703, 3.0152, 4.6074, 4.5192, 4.8449, 3.1573, 2.0597], device='cuda:2'), covar=tensor([0.0185, 0.2717, 0.1676, 0.0416, 0.0209, 0.0064, 0.1377, 0.2372], device='cuda:2'), in_proj_covar=tensor([0.0111, 0.0173, 0.0162, 0.0110, 0.0084, 0.0090, 0.0165, 0.0156], device='cuda:2'), out_proj_covar=tensor([1.0784e-04, 1.6347e-04, 1.5798e-04, 1.2370e-04, 8.8752e-05, 8.7673e-05, 1.6352e-04, 1.4839e-04], device='cuda:2') 2023-03-07 14:34:46,870 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+02 4.346e+02 5.150e+02 6.489e+02 1.727e+03, threshold=1.030e+03, percent-clipped=5.0 2023-03-07 14:35:03,023 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10014.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:35:10,649 INFO [train2.py:809] (2/4) Epoch 3, batch 2050, loss[ctc_loss=0.1996, att_loss=0.29, loss=0.2719, over 15357.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.01105, over 35.00 utterances.], tot_loss[ctc_loss=0.2103, att_loss=0.3088, loss=0.2891, over 3289169.62 frames. utt_duration=1237 frames, utt_pad_proportion=0.05411, over 10651.75 utterances.], batch size: 35, lr: 3.11e-02, grad_scale: 8.0 2023-03-07 14:35:43,978 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.8962, 3.8281, 2.9976, 3.1284, 3.6906, 3.4074, 2.3241, 4.4114], device='cuda:2'), covar=tensor([0.1594, 0.0292, 0.1119, 0.0803, 0.0426, 0.0777, 0.1195, 0.0144], device='cuda:2'), in_proj_covar=tensor([0.0119, 0.0081, 0.0133, 0.0108, 0.0091, 0.0129, 0.0114, 0.0068], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:2') 2023-03-07 14:35:43,987 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10039.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:36:21,329 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10062.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 14:36:32,065 INFO [train2.py:809] (2/4) Epoch 3, batch 2100, loss[ctc_loss=0.2231, att_loss=0.3135, loss=0.2954, over 16119.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005975, over 42.00 utterances.], tot_loss[ctc_loss=0.2097, att_loss=0.3081, loss=0.2884, over 3280048.85 frames. 
utt_duration=1238 frames, utt_pad_proportion=0.05559, over 10613.97 utterances.], batch size: 42, lr: 3.11e-02, grad_scale: 8.0 2023-03-07 14:36:33,980 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10070.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:36:47,618 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4493, 5.8921, 5.2656, 5.9169, 5.4441, 5.4030, 5.3218, 5.2942], device='cuda:2'), covar=tensor([0.1740, 0.0842, 0.0677, 0.0580, 0.0725, 0.1153, 0.2263, 0.1931], device='cuda:2'), in_proj_covar=tensor([0.0275, 0.0307, 0.0250, 0.0236, 0.0218, 0.0314, 0.0333, 0.0318], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 14:36:53,099 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.6297, 1.3219, 1.6793, 1.6484, 1.0988, 2.1811, 1.1051, 1.1918], device='cuda:2'), covar=tensor([0.0896, 0.1042, 0.1182, 0.1302, 0.1680, 0.0584, 0.1323, 0.3170], device='cuda:2'), in_proj_covar=tensor([0.0062, 0.0055, 0.0054, 0.0071, 0.0058, 0.0057, 0.0064, 0.0087], device='cuda:2'), out_proj_covar=tensor([3.6737e-05, 3.4979e-05, 3.5627e-05, 4.3242e-05, 3.8824e-05, 3.0690e-05, 3.9326e-05, 5.6945e-05], device='cuda:2') 2023-03-07 14:37:21,237 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10099.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:37:22,987 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10100.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 14:37:28,709 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.690e+02 4.150e+02 4.955e+02 6.007e+02 1.097e+03, threshold=9.911e+02, percent-clipped=1.0 2023-03-07 14:37:50,923 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10118.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:37:52,384 INFO [train2.py:809] (2/4) Epoch 3, batch 2150, loss[ctc_loss=0.2008, att_loss=0.3166, loss=0.2934, over 16884.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.007224, over 49.00 utterances.], tot_loss[ctc_loss=0.2106, att_loss=0.3085, loss=0.2889, over 3279154.57 frames. 
utt_duration=1210 frames, utt_pad_proportion=0.06256, over 10849.59 utterances.], batch size: 49, lr: 3.10e-02, grad_scale: 8.0 2023-03-07 14:37:55,565 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4057, 5.1112, 4.8698, 4.7793, 5.0940, 5.1218, 4.9812, 4.8477], device='cuda:2'), covar=tensor([0.1653, 0.0459, 0.0278, 0.0703, 0.0445, 0.0280, 0.0290, 0.0332], device='cuda:2'), in_proj_covar=tensor([0.0296, 0.0193, 0.0125, 0.0155, 0.0202, 0.0215, 0.0169, 0.0183], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 14:37:58,707 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10123.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 14:38:04,861 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5387, 4.4545, 4.3983, 4.1116, 1.7375, 4.7302, 2.2576, 2.7980], device='cuda:2'), covar=tensor([0.0139, 0.0213, 0.0689, 0.0373, 0.3827, 0.0132, 0.1759, 0.1119], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0085, 0.0191, 0.0109, 0.0221, 0.0088, 0.0178, 0.0163], device='cuda:2'), out_proj_covar=tensor([8.1023e-05, 7.7305e-05, 1.5694e-04, 8.5823e-05, 1.7000e-04, 7.8448e-05, 1.4092e-04, 1.3134e-04], device='cuda:2') 2023-03-07 14:38:18,629 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4999, 4.9620, 4.8088, 5.0937, 4.5540, 4.7477, 5.2821, 5.0176], device='cuda:2'), covar=tensor([0.0287, 0.0214, 0.0348, 0.0116, 0.0264, 0.0182, 0.0201, 0.0126], device='cuda:2'), in_proj_covar=tensor([0.0131, 0.0116, 0.0137, 0.0085, 0.0129, 0.0098, 0.0113, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:2') 2023-03-07 14:39:12,830 INFO [train2.py:809] (2/4) Epoch 3, batch 2200, loss[ctc_loss=0.2056, att_loss=0.3167, loss=0.2944, over 16404.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006593, over 44.00 utterances.], tot_loss[ctc_loss=0.2104, att_loss=0.3077, loss=0.2883, over 3265850.83 frames. utt_duration=1194 frames, utt_pad_proportion=0.06962, over 10957.26 utterances.], batch size: 44, lr: 3.09e-02, grad_scale: 8.0 2023-03-07 14:39:39,704 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.12 vs. limit=2.0 2023-03-07 14:40:09,925 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 4.147e+02 5.296e+02 6.795e+02 2.410e+03, threshold=1.059e+03, percent-clipped=6.0 2023-03-07 14:40:24,100 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6975, 5.0882, 4.8414, 5.2074, 4.5735, 4.9572, 5.3278, 5.0962], device='cuda:2'), covar=tensor([0.0254, 0.0196, 0.0383, 0.0116, 0.0298, 0.0131, 0.0249, 0.0159], device='cuda:2'), in_proj_covar=tensor([0.0132, 0.0118, 0.0140, 0.0086, 0.0130, 0.0099, 0.0116, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:2') 2023-03-07 14:40:33,272 INFO [train2.py:809] (2/4) Epoch 3, batch 2250, loss[ctc_loss=0.198, att_loss=0.2766, loss=0.2609, over 15744.00 frames. utt_duration=1659 frames, utt_pad_proportion=0.01005, over 38.00 utterances.], tot_loss[ctc_loss=0.2095, att_loss=0.3074, loss=0.2878, over 3268716.01 frames. 
utt_duration=1216 frames, utt_pad_proportion=0.06289, over 10761.60 utterances.], batch size: 38, lr: 3.09e-02, grad_scale: 8.0 2023-03-07 14:41:07,810 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10240.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:41:41,233 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-07 14:41:50,171 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6408, 2.6855, 5.0762, 3.6290, 3.2638, 4.4597, 4.7838, 4.4768], device='cuda:2'), covar=tensor([0.0137, 0.1519, 0.0139, 0.1507, 0.2571, 0.0440, 0.0218, 0.0426], device='cuda:2'), in_proj_covar=tensor([0.0109, 0.0213, 0.0116, 0.0266, 0.0317, 0.0159, 0.0101, 0.0112], device='cuda:2'), out_proj_covar=tensor([8.8679e-05, 1.5793e-04, 9.0323e-05, 2.0756e-04, 2.3138e-04, 1.2829e-04, 8.1645e-05, 9.2783e-05], device='cuda:2') 2023-03-07 14:41:52,776 INFO [train2.py:809] (2/4) Epoch 3, batch 2300, loss[ctc_loss=0.1677, att_loss=0.279, loss=0.2567, over 15941.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007249, over 41.00 utterances.], tot_loss[ctc_loss=0.2077, att_loss=0.3061, loss=0.2864, over 3268954.27 frames. utt_duration=1236 frames, utt_pad_proportion=0.05672, over 10589.32 utterances.], batch size: 41, lr: 3.08e-02, grad_scale: 8.0 2023-03-07 14:42:09,302 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7892, 4.2095, 4.0141, 3.9944, 1.9756, 4.3739, 2.4523, 2.4632], device='cuda:2'), covar=tensor([0.0218, 0.0183, 0.0641, 0.0273, 0.3449, 0.0113, 0.1460, 0.1157], device='cuda:2'), in_proj_covar=tensor([0.0092, 0.0083, 0.0192, 0.0110, 0.0222, 0.0087, 0.0178, 0.0163], device='cuda:2'), out_proj_covar=tensor([8.1318e-05, 7.6828e-05, 1.5790e-04, 8.6610e-05, 1.7123e-04, 7.8017e-05, 1.4145e-04, 1.3122e-04], device='cuda:2') 2023-03-07 14:42:18,985 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8835, 4.6737, 4.9713, 3.2192, 4.8384, 3.9131, 4.2559, 2.5208], device='cuda:2'), covar=tensor([0.0103, 0.0082, 0.0164, 0.0604, 0.0082, 0.0220, 0.0240, 0.1467], device='cuda:2'), in_proj_covar=tensor([0.0042, 0.0042, 0.0038, 0.0066, 0.0041, 0.0049, 0.0057, 0.0092], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 14:42:23,467 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10288.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:42:49,025 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 3.999e+02 4.937e+02 6.652e+02 1.498e+03, threshold=9.874e+02, percent-clipped=4.0 2023-03-07 14:43:04,923 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10314.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:43:12,497 INFO [train2.py:809] (2/4) Epoch 3, batch 2350, loss[ctc_loss=0.2186, att_loss=0.3304, loss=0.308, over 17031.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.0101, over 52.00 utterances.], tot_loss[ctc_loss=0.2077, att_loss=0.306, loss=0.2863, over 3266290.84 frames. utt_duration=1245 frames, utt_pad_proportion=0.05683, over 10508.38 utterances.], batch size: 52, lr: 3.08e-02, grad_scale: 16.0 2023-03-07 14:44:13,016 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.03 vs. 
limit=5.0 2023-03-07 14:44:21,298 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10362.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:44:32,349 INFO [train2.py:809] (2/4) Epoch 3, batch 2400, loss[ctc_loss=0.1933, att_loss=0.3072, loss=0.2844, over 17423.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.03172, over 63.00 utterances.], tot_loss[ctc_loss=0.2066, att_loss=0.3048, loss=0.2852, over 3262459.98 frames. utt_duration=1256 frames, utt_pad_proportion=0.05447, over 10404.46 utterances.], batch size: 63, lr: 3.07e-02, grad_scale: 16.0 2023-03-07 14:45:14,388 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10395.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 14:45:20,843 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10399.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:45:28,355 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+02 3.987e+02 4.935e+02 6.057e+02 1.238e+03, threshold=9.870e+02, percent-clipped=1.0 2023-03-07 14:45:34,867 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7308, 5.1275, 4.8296, 4.7933, 5.2276, 5.0714, 4.7907, 4.7638], device='cuda:2'), covar=tensor([0.1066, 0.0402, 0.0241, 0.0575, 0.0273, 0.0248, 0.0304, 0.0294], device='cuda:2'), in_proj_covar=tensor([0.0294, 0.0182, 0.0125, 0.0153, 0.0198, 0.0217, 0.0169, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 14:45:36,611 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10409.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:45:50,450 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10418.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 14:45:51,944 INFO [train2.py:809] (2/4) Epoch 3, batch 2450, loss[ctc_loss=0.2139, att_loss=0.3039, loss=0.2859, over 17052.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008192, over 52.00 utterances.], tot_loss[ctc_loss=0.2055, att_loss=0.3042, loss=0.2845, over 3265056.94 frames. utt_duration=1254 frames, utt_pad_proportion=0.05369, over 10430.48 utterances.], batch size: 52, lr: 3.06e-02, grad_scale: 16.0 2023-03-07 14:46:38,268 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10447.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:46:49,298 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10454.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:47:03,936 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.02 vs. limit=2.0 2023-03-07 14:47:12,260 INFO [train2.py:809] (2/4) Epoch 3, batch 2500, loss[ctc_loss=0.2051, att_loss=0.2868, loss=0.2704, over 15348.00 frames. utt_duration=1755 frames, utt_pad_proportion=0.01266, over 35.00 utterances.], tot_loss[ctc_loss=0.2063, att_loss=0.3044, loss=0.2848, over 3250511.42 frames. 
utt_duration=1216 frames, utt_pad_proportion=0.06825, over 10702.98 utterances.], batch size: 35, lr: 3.06e-02, grad_scale: 16.0 2023-03-07 14:47:14,177 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10470.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:47:33,798 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6421, 5.8567, 5.2297, 5.8344, 5.4427, 5.4030, 5.3770, 5.1647], device='cuda:2'), covar=tensor([0.1132, 0.0743, 0.0733, 0.0541, 0.0584, 0.1153, 0.2083, 0.1987], device='cuda:2'), in_proj_covar=tensor([0.0269, 0.0312, 0.0253, 0.0240, 0.0219, 0.0305, 0.0335, 0.0320], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 14:47:55,870 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2369, 4.5130, 4.0572, 4.6413, 5.0376, 4.3928, 4.1445, 1.9944], device='cuda:2'), covar=tensor([0.0389, 0.0389, 0.0410, 0.0245, 0.0881, 0.0379, 0.0477, 0.4075], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0114, 0.0111, 0.0109, 0.0214, 0.0129, 0.0102, 0.0254], device='cuda:2'), out_proj_covar=tensor([1.1099e-04, 9.3065e-05, 9.3551e-05, 9.6110e-05, 1.9516e-04, 1.0560e-04, 9.1218e-05, 2.0934e-04], device='cuda:2') 2023-03-07 14:48:09,695 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.331e+02 4.171e+02 5.202e+02 6.320e+02 1.110e+03, threshold=1.040e+03, percent-clipped=6.0 2023-03-07 14:48:11,762 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.5995, 2.1782, 2.2009, 1.3361, 1.0630, 3.0117, 0.9772, 1.2819], device='cuda:2'), covar=tensor([0.0821, 0.0772, 0.0766, 0.5190, 0.2868, 0.0397, 0.2594, 0.4007], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0054, 0.0054, 0.0074, 0.0057, 0.0058, 0.0068, 0.0087], device='cuda:2'), out_proj_covar=tensor([3.9216e-05, 3.4049e-05, 3.5643e-05, 4.5169e-05, 3.8557e-05, 3.2539e-05, 4.2336e-05, 5.7287e-05], device='cuda:2') 2023-03-07 14:48:17,275 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.15 vs. limit=2.0 2023-03-07 14:48:27,703 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10515.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 14:48:33,423 INFO [train2.py:809] (2/4) Epoch 3, batch 2550, loss[ctc_loss=0.2027, att_loss=0.2747, loss=0.2603, over 15610.00 frames. utt_duration=1689 frames, utt_pad_proportion=0.009321, over 37.00 utterances.], tot_loss[ctc_loss=0.2054, att_loss=0.3042, loss=0.2844, over 3268225.01 frames. utt_duration=1231 frames, utt_pad_proportion=0.05944, over 10634.45 utterances.], batch size: 37, lr: 3.05e-02, grad_scale: 16.0 2023-03-07 14:48:50,955 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3645, 2.8027, 3.7522, 2.2023, 3.4106, 4.4476, 4.2024, 3.0561], device='cuda:2'), covar=tensor([0.0323, 0.1390, 0.0604, 0.1623, 0.0829, 0.0194, 0.0439, 0.1360], device='cuda:2'), in_proj_covar=tensor([0.0150, 0.0159, 0.0134, 0.0155, 0.0165, 0.0098, 0.0113, 0.0163], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 14:49:33,406 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.74 vs. limit=2.0 2023-03-07 14:49:54,160 INFO [train2.py:809] (2/4) Epoch 3, batch 2600, loss[ctc_loss=0.1903, att_loss=0.3082, loss=0.2846, over 16883.00 frames. 
utt_duration=1380 frames, utt_pad_proportion=0.006685, over 49.00 utterances.], tot_loss[ctc_loss=0.2036, att_loss=0.3029, loss=0.283, over 3267059.44 frames. utt_duration=1237 frames, utt_pad_proportion=0.05886, over 10576.24 utterances.], batch size: 49, lr: 3.05e-02, grad_scale: 16.0 2023-03-07 14:50:50,041 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 3.926e+02 4.748e+02 6.116e+02 1.074e+03, threshold=9.496e+02, percent-clipped=2.0 2023-03-07 14:51:13,367 INFO [train2.py:809] (2/4) Epoch 3, batch 2650, loss[ctc_loss=0.1585, att_loss=0.2728, loss=0.2499, over 16256.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.007986, over 43.00 utterances.], tot_loss[ctc_loss=0.2032, att_loss=0.3033, loss=0.2833, over 3272448.31 frames. utt_duration=1248 frames, utt_pad_proportion=0.05469, over 10498.03 utterances.], batch size: 43, lr: 3.04e-02, grad_scale: 16.0 2023-03-07 14:52:32,125 INFO [train2.py:809] (2/4) Epoch 3, batch 2700, loss[ctc_loss=0.1845, att_loss=0.2911, loss=0.2697, over 16134.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005879, over 42.00 utterances.], tot_loss[ctc_loss=0.2054, att_loss=0.3051, loss=0.2851, over 3279213.55 frames. utt_duration=1234 frames, utt_pad_proportion=0.05645, over 10645.61 utterances.], batch size: 42, lr: 3.03e-02, grad_scale: 16.0 2023-03-07 14:52:47,895 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=6.09 vs. limit=5.0 2023-03-07 14:53:12,161 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2023-03-07 14:53:14,772 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10695.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:53:28,794 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.793e+02 4.160e+02 5.458e+02 6.683e+02 1.246e+03, threshold=1.092e+03, percent-clipped=5.0 2023-03-07 14:53:51,043 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10718.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 14:53:52,194 INFO [train2.py:809] (2/4) Epoch 3, batch 2750, loss[ctc_loss=0.2105, att_loss=0.2944, loss=0.2776, over 16144.00 frames. utt_duration=1539 frames, utt_pad_proportion=0.005156, over 42.00 utterances.], tot_loss[ctc_loss=0.2067, att_loss=0.3061, loss=0.2862, over 3286853.59 frames. 
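
In each of the optim.py lines above, the reported threshold equals Clipping_scale times the middle of the five grad-norm statistics (2.0 * 4.748e+02 = 9.496e+02 in the 14:50:50 entry), so the clipping threshold appears to track the median gradient norm of recent batches, and percent-clipped is the percentage of recent batches whose norm exceeded that threshold. The class below is a hypothetical reconstruction of that bookkeeping, not the optimizer's actual code.

```python
from collections import deque
import torch

class GradNormClipper:
    """Clip gradients at a multiple of the running median gradient norm.

    Illustrative only: the real optimizer logs min/25%/50%/75%/max of
    recent norms and the logged threshold equals
    clipping_scale * median, which is the relationship visible in the
    quartile/threshold numbers above.
    """

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0
        self.num_seen = 0

    def __call__(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2
        ).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        self.num_seen += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.detach().mul_(threshold / norm)
        return norm

    def percent_clipped(self) -> float:
        # Corresponds to the percent-clipped figure in the log lines.
        return 100.0 * self.num_clipped / max(1, self.num_seen)
```
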
utt_duration=1219 frames, utt_pad_proportion=0.05738, over 10798.15 utterances.], batch size: 42, lr: 3.03e-02, grad_scale: 16.0 2023-03-07 14:54:32,773 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10743.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:54:34,614 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10744.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:54:36,102 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4299, 4.8525, 4.3289, 4.8830, 4.7806, 4.5231, 4.3300, 4.5040], device='cuda:2'), covar=tensor([0.0108, 0.0189, 0.0115, 0.0072, 0.0107, 0.0098, 0.0296, 0.0217], device='cuda:2'), in_proj_covar=tensor([0.0043, 0.0043, 0.0044, 0.0032, 0.0032, 0.0039, 0.0059, 0.0053], device='cuda:2'), out_proj_covar=tensor([1.1525e-04, 1.1558e-04, 1.3733e-04, 8.9467e-05, 8.7616e-05, 1.0994e-04, 1.6154e-04, 1.4686e-04], device='cuda:2') 2023-03-07 14:55:06,777 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10765.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:55:08,222 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10766.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 14:55:12,963 INFO [train2.py:809] (2/4) Epoch 3, batch 2800, loss[ctc_loss=0.2171, att_loss=0.3038, loss=0.2865, over 15956.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.00644, over 41.00 utterances.], tot_loss[ctc_loss=0.2064, att_loss=0.3057, loss=0.2859, over 3287936.71 frames. utt_duration=1211 frames, utt_pad_proportion=0.05869, over 10871.13 utterances.], batch size: 41, lr: 3.02e-02, grad_scale: 16.0 2023-03-07 14:55:21,927 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10774.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:56:10,550 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.287e+02 4.175e+02 5.006e+02 5.883e+02 1.707e+03, threshold=1.001e+03, percent-clipped=3.0 2023-03-07 14:56:12,591 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10805.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:56:16,481 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.31 vs. limit=5.0 2023-03-07 14:56:20,291 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10810.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 14:56:28,564 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.98 vs. limit=5.0 2023-03-07 14:56:33,908 INFO [train2.py:809] (2/4) Epoch 3, batch 2850, loss[ctc_loss=0.1965, att_loss=0.3108, loss=0.2879, over 16616.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.006036, over 47.00 utterances.], tot_loss[ctc_loss=0.2049, att_loss=0.305, loss=0.2849, over 3286432.00 frames. utt_duration=1208 frames, utt_pad_proportion=0.05933, over 10895.40 utterances.], batch size: 47, lr: 3.02e-02, grad_scale: 16.0 2023-03-07 14:57:00,139 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10835.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 14:57:00,711 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. 
limit=2.0 2023-03-07 14:57:39,236 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1162, 2.2086, 3.0127, 4.3943, 4.4619, 4.3039, 2.9299, 2.0356], device='cuda:2'), covar=tensor([0.0334, 0.2819, 0.1507, 0.0669, 0.0161, 0.0142, 0.1823, 0.2827], device='cuda:2'), in_proj_covar=tensor([0.0116, 0.0172, 0.0164, 0.0113, 0.0091, 0.0096, 0.0170, 0.0158], device='cuda:2'), out_proj_covar=tensor([1.1515e-04, 1.6424e-04, 1.6233e-04, 1.2788e-04, 9.5288e-05, 9.5572e-05, 1.6821e-04, 1.5282e-04], device='cuda:2') 2023-03-07 14:57:54,306 INFO [train2.py:809] (2/4) Epoch 3, batch 2900, loss[ctc_loss=0.177, att_loss=0.3048, loss=0.2792, over 16771.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006333, over 48.00 utterances.], tot_loss[ctc_loss=0.2042, att_loss=0.3041, loss=0.2841, over 3284915.89 frames. utt_duration=1229 frames, utt_pad_proportion=0.05513, over 10708.04 utterances.], batch size: 48, lr: 3.01e-02, grad_scale: 4.0 2023-03-07 14:58:05,985 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2023-03-07 14:58:21,002 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10885.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 14:58:54,574 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.371e+02 4.021e+02 4.974e+02 6.236e+02 1.292e+03, threshold=9.948e+02, percent-clipped=6.0 2023-03-07 14:59:14,719 INFO [train2.py:809] (2/4) Epoch 3, batch 2950, loss[ctc_loss=0.191, att_loss=0.2965, loss=0.2754, over 16620.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005498, over 47.00 utterances.], tot_loss[ctc_loss=0.2037, att_loss=0.3038, loss=0.2838, over 3280541.74 frames. utt_duration=1215 frames, utt_pad_proportion=0.05983, over 10816.33 utterances.], batch size: 47, lr: 3.01e-02, grad_scale: 4.0 2023-03-07 14:59:53,542 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.20 vs. limit=5.0 2023-03-07 14:59:59,150 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10946.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 15:00:34,061 INFO [train2.py:809] (2/4) Epoch 3, batch 3000, loss[ctc_loss=0.1601, att_loss=0.2665, loss=0.2452, over 15491.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009472, over 36.00 utterances.], tot_loss[ctc_loss=0.2048, att_loss=0.3043, loss=0.2844, over 3268245.76 frames. utt_duration=1191 frames, utt_pad_proportion=0.06941, over 10991.13 utterances.], batch size: 36, lr: 3.00e-02, grad_scale: 4.0 2023-03-07 15:00:34,061 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 15:00:47,742 INFO [train2.py:843] (2/4) Epoch 3, validation: ctc_loss=0.1004, att_loss=0.2657, loss=0.2327, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 15:00:47,743 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 15:01:48,348 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.705e+02 4.292e+02 5.307e+02 6.516e+02 1.303e+03, threshold=1.061e+03, percent-clipped=4.0 2023-03-07 15:02:09,267 INFO [train2.py:809] (2/4) Epoch 3, batch 3050, loss[ctc_loss=0.1843, att_loss=0.2979, loss=0.2752, over 17130.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01434, over 56.00 utterances.], tot_loss[ctc_loss=0.2035, att_loss=0.3035, loss=0.2835, over 3269518.75 frames. 
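
The grad_scale field reflects dynamic loss scaling for fp16 training: the scale grows while optimizer steps succeed and is cut back when an overflow is detected, which is why it climbs to 16.0 and then falls to 4.0 around batch 2900 above. A minimal sketch using the standard torch.cuda.amp API; the loop structure and names are illustrative, and grad_scale is presumably the value returned by GradScaler.get_scale().

```python
import torch

scaler = torch.cuda.amp.GradScaler(enabled=True)  # dynamic loss scale for fp16

def train_step(model, optimizer, features, targets, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=True):
        loss = loss_fn(model(features), targets)
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()                 # grows the scale, or shrinks it after an overflow
    return loss.detach(), scaler.get_scale()  # second value: the quantity logged as grad_scale
```
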
utt_duration=1186 frames, utt_pad_proportion=0.07035, over 11040.40 utterances.], batch size: 56, lr: 2.99e-02, grad_scale: 4.0 2023-03-07 15:02:16,448 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.7929, 1.9248, 2.1101, 1.5471, 1.9618, 2.7802, 1.9408, 1.3535], device='cuda:2'), covar=tensor([0.0665, 0.0832, 0.0698, 0.1977, 0.1015, 0.0600, 0.1279, 0.3969], device='cuda:2'), in_proj_covar=tensor([0.0060, 0.0048, 0.0051, 0.0063, 0.0047, 0.0054, 0.0056, 0.0076], device='cuda:2'), out_proj_covar=tensor([3.4970e-05, 3.0729e-05, 3.3042e-05, 3.9445e-05, 3.2410e-05, 3.1048e-05, 3.5941e-05, 5.0959e-05], device='cuda:2') 2023-03-07 15:03:23,381 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11065.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:03:29,045 INFO [train2.py:809] (2/4) Epoch 3, batch 3100, loss[ctc_loss=0.18, att_loss=0.2881, loss=0.2665, over 16271.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007071, over 43.00 utterances.], tot_loss[ctc_loss=0.2024, att_loss=0.3028, loss=0.2827, over 3267873.94 frames. utt_duration=1217 frames, utt_pad_proportion=0.06225, over 10753.00 utterances.], batch size: 43, lr: 2.99e-02, grad_scale: 4.0 2023-03-07 15:03:47,766 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6195, 5.1482, 4.9610, 4.8656, 5.3321, 5.1511, 4.9539, 4.8093], device='cuda:2'), covar=tensor([0.1193, 0.0393, 0.0231, 0.0463, 0.0236, 0.0245, 0.0259, 0.0261], device='cuda:2'), in_proj_covar=tensor([0.0315, 0.0190, 0.0132, 0.0162, 0.0212, 0.0229, 0.0181, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 15:04:20,583 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11100.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:04:29,504 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.231e+02 3.861e+02 4.979e+02 6.138e+02 1.576e+03, threshold=9.958e+02, percent-clipped=5.0 2023-03-07 15:04:36,000 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11110.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 15:04:40,487 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11113.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:04:50,177 INFO [train2.py:809] (2/4) Epoch 3, batch 3150, loss[ctc_loss=0.2484, att_loss=0.3273, loss=0.3115, over 17000.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.008996, over 51.00 utterances.], tot_loss[ctc_loss=0.2028, att_loss=0.3029, loss=0.2829, over 3258035.86 frames. 
utt_duration=1192 frames, utt_pad_proportion=0.07328, over 10949.71 utterances.], batch size: 51, lr: 2.98e-02, grad_scale: 4.0 2023-03-07 15:05:08,077 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11130.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:05:37,951 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4110, 2.7454, 3.4813, 2.4499, 3.2505, 4.4582, 4.0498, 3.1599], device='cuda:2'), covar=tensor([0.0309, 0.1713, 0.1094, 0.1565, 0.1051, 0.0265, 0.0630, 0.1365], device='cuda:2'), in_proj_covar=tensor([0.0154, 0.0170, 0.0147, 0.0157, 0.0172, 0.0110, 0.0119, 0.0166], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 15:05:53,256 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11158.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:06:11,072 INFO [train2.py:809] (2/4) Epoch 3, batch 3200, loss[ctc_loss=0.1391, att_loss=0.2588, loss=0.2349, over 14136.00 frames. utt_duration=1825 frames, utt_pad_proportion=0.0566, over 31.00 utterances.], tot_loss[ctc_loss=0.2014, att_loss=0.3023, loss=0.2821, over 3265070.04 frames. utt_duration=1211 frames, utt_pad_proportion=0.06627, over 10794.70 utterances.], batch size: 31, lr: 2.98e-02, grad_scale: 8.0 2023-03-07 15:06:19,902 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11174.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:07:10,781 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+02 3.760e+02 4.677e+02 5.867e+02 1.602e+03, threshold=9.354e+02, percent-clipped=2.0 2023-03-07 15:07:14,333 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6170, 4.7460, 4.6781, 4.7587, 5.0653, 4.9470, 4.6420, 2.0085], device='cuda:2'), covar=tensor([0.0233, 0.0262, 0.0184, 0.0279, 0.1135, 0.0198, 0.0293, 0.3653], device='cuda:2'), in_proj_covar=tensor([0.0123, 0.0112, 0.0105, 0.0108, 0.0213, 0.0127, 0.0102, 0.0237], device='cuda:2'), out_proj_covar=tensor([1.0806e-04, 9.2218e-05, 8.7652e-05, 9.4190e-05, 1.9486e-04, 1.0650e-04, 8.9678e-05, 2.0059e-04], device='cuda:2') 2023-03-07 15:07:31,877 INFO [train2.py:809] (2/4) Epoch 3, batch 3250, loss[ctc_loss=0.1985, att_loss=0.3176, loss=0.2938, over 17139.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.0131, over 56.00 utterances.], tot_loss[ctc_loss=0.2003, att_loss=0.302, loss=0.2816, over 3270616.94 frames. 
utt_duration=1231 frames, utt_pad_proportion=0.05984, over 10640.58 utterances.], batch size: 56, lr: 2.97e-02, grad_scale: 8.0 2023-03-07 15:07:42,164 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2682, 2.2536, 2.9861, 4.6071, 4.4151, 4.4868, 2.9180, 2.1281], device='cuda:2'), covar=tensor([0.0357, 0.2794, 0.1607, 0.0479, 0.0270, 0.0125, 0.1789, 0.2770], device='cuda:2'), in_proj_covar=tensor([0.0121, 0.0172, 0.0168, 0.0116, 0.0095, 0.0098, 0.0170, 0.0161], device='cuda:2'), out_proj_covar=tensor([1.2168e-04, 1.6466e-04, 1.6575e-04, 1.3008e-04, 1.0198e-04, 9.7870e-05, 1.6923e-04, 1.5594e-04], device='cuda:2') 2023-03-07 15:07:58,392 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11235.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:07:58,440 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5063, 4.3415, 4.2239, 4.6624, 4.7767, 4.6957, 4.0848, 1.8598], device='cuda:2'), covar=tensor([0.0236, 0.0400, 0.0395, 0.0079, 0.0930, 0.0173, 0.0512, 0.3799], device='cuda:2'), in_proj_covar=tensor([0.0124, 0.0113, 0.0107, 0.0108, 0.0220, 0.0127, 0.0103, 0.0239], device='cuda:2'), out_proj_covar=tensor([1.0900e-04, 9.3440e-05, 8.9008e-05, 9.4838e-05, 1.9928e-04, 1.0714e-04, 9.1075e-05, 2.0312e-04], device='cuda:2') 2023-03-07 15:08:07,927 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11241.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 15:08:52,863 INFO [train2.py:809] (2/4) Epoch 3, batch 3300, loss[ctc_loss=0.2173, att_loss=0.3221, loss=0.3012, over 17368.00 frames. utt_duration=1008 frames, utt_pad_proportion=0.04865, over 69.00 utterances.], tot_loss[ctc_loss=0.1993, att_loss=0.301, loss=0.2807, over 3269047.73 frames. utt_duration=1227 frames, utt_pad_proportion=0.06015, over 10670.85 utterances.], batch size: 69, lr: 2.97e-02, grad_scale: 8.0 2023-03-07 15:09:48,013 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5463, 5.0437, 4.7377, 5.0912, 4.4743, 4.9877, 5.3926, 5.1075], device='cuda:2'), covar=tensor([0.0333, 0.0239, 0.0437, 0.0186, 0.0399, 0.0123, 0.0159, 0.0123], device='cuda:2'), in_proj_covar=tensor([0.0138, 0.0124, 0.0147, 0.0094, 0.0138, 0.0101, 0.0118, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 15:09:52,441 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+02 3.797e+02 4.904e+02 6.059e+02 1.982e+03, threshold=9.808e+02, percent-clipped=6.0 2023-03-07 15:09:58,448 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2023-03-07 15:10:12,978 INFO [train2.py:809] (2/4) Epoch 3, batch 3350, loss[ctc_loss=0.2036, att_loss=0.3142, loss=0.2921, over 17378.00 frames. utt_duration=1009 frames, utt_pad_proportion=0.04908, over 69.00 utterances.], tot_loss[ctc_loss=0.1984, att_loss=0.3009, loss=0.2804, over 3272921.81 frames. 
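
The scaling.py lines report a whitening diagnostic for groups of encoder channels (for example metric=1.98 vs. limit=2.0 at 15:09:58 above). A natural reading is that the metric measures how far the within-group channel covariance is from a scaled identity: it equals 1.0 for perfectly decorrelated, equally scaled channels and grows otherwise, with limit the target value. The function below is a plausible reconstruction of such a metric and is not claimed to be the exact code in scaling.py.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Measure how far the channel covariance of x is from a scaled identity.

    Returns ~1.0 for 'white' features and larger values as channels become
    correlated or unevenly scaled.  This is an assumed reconstruction of the
    quantity logged as `metric`; x has shape (num_frames, num_channels).
    """
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups                              # channels per group
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)    # (groups, T, cpg)
    covar = torch.matmul(x.transpose(1, 2), x) / num_frames       # (groups, cpg, cpg)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (covar ** 2).sum() / (num_groups * cpg)
    return mean_sq / (mean_diag ** 2 + 1e-20)

# e.g. whitening_metric(torch.randn(1000, 192), num_groups=8) is close to 1.0;
# strongly correlated channels push it toward the logged limits (2.0 or 5.0).
```
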
utt_duration=1245 frames, utt_pad_proportion=0.05532, over 10531.69 utterances.], batch size: 69, lr: 2.96e-02, grad_scale: 8.0 2023-03-07 15:11:00,581 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1732, 4.5765, 4.4039, 4.6997, 4.0828, 4.3924, 4.7879, 4.6290], device='cuda:2'), covar=tensor([0.0359, 0.0230, 0.0357, 0.0129, 0.0339, 0.0220, 0.0245, 0.0136], device='cuda:2'), in_proj_covar=tensor([0.0143, 0.0128, 0.0151, 0.0095, 0.0142, 0.0107, 0.0125, 0.0111], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 15:11:06,983 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2360, 5.2765, 5.3958, 3.5853, 5.0462, 4.2385, 4.8015, 2.1435], device='cuda:2'), covar=tensor([0.0156, 0.0052, 0.0139, 0.0699, 0.0089, 0.0139, 0.0180, 0.2026], device='cuda:2'), in_proj_covar=tensor([0.0043, 0.0043, 0.0038, 0.0072, 0.0043, 0.0052, 0.0060, 0.0093], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 15:11:34,093 INFO [train2.py:809] (2/4) Epoch 3, batch 3400, loss[ctc_loss=0.1922, att_loss=0.3145, loss=0.29, over 16889.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006377, over 49.00 utterances.], tot_loss[ctc_loss=0.1977, att_loss=0.3003, loss=0.2798, over 3271159.64 frames. utt_duration=1262 frames, utt_pad_proportion=0.05218, over 10382.55 utterances.], batch size: 49, lr: 2.96e-02, grad_scale: 8.0 2023-03-07 15:12:25,247 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11400.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:12:34,184 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.436e+02 3.676e+02 4.769e+02 6.355e+02 1.428e+03, threshold=9.539e+02, percent-clipped=8.0 2023-03-07 15:12:54,111 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2300, 4.9659, 5.0269, 3.6033, 4.8755, 4.1429, 4.4799, 2.5294], device='cuda:2'), covar=tensor([0.0068, 0.0072, 0.0238, 0.0523, 0.0086, 0.0182, 0.0189, 0.1404], device='cuda:2'), in_proj_covar=tensor([0.0043, 0.0044, 0.0039, 0.0072, 0.0043, 0.0052, 0.0060, 0.0094], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 15:12:55,341 INFO [train2.py:809] (2/4) Epoch 3, batch 3450, loss[ctc_loss=0.1877, att_loss=0.291, loss=0.2704, over 16177.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006427, over 41.00 utterances.], tot_loss[ctc_loss=0.1986, att_loss=0.3016, loss=0.281, over 3278192.12 frames. 
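
The attn_weights_entropy dumps appear to report one entropy value per attention head, averaged over positions: sharply focused heads give values near 0 and diffuse heads approach log(sequence length), which matches the 4-5 nat entries in the tensors above. A sketch of that computation, with the per-head reduction assumed rather than taken from zipformer.py:

```python
import torch

def attention_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """Mean entropy (in nats) of attention distributions, one value per head.

    attn_weights: (num_heads, query_len, key_len), rows summing to 1.
    Low entropy means sharply peaked attention; high entropy means the head
    attends almost uniformly.  The exact reduction used by the diagnostics
    above is assumed here.
    """
    p = attn_weights.clamp(min=1e-20)
    ent = -(p * p.log()).sum(dim=-1)   # (num_heads, query_len)
    return ent.mean(dim=-1)            # average over query positions

# Uniform attention over 100 keys gives entropy log(100) ~ 4.6 per head,
# the same order as the larger values in the dumped tensors.
attn = torch.full((8, 50, 100), 1.0 / 100)
print(attention_entropy(attn))  # ~4.605 for every head
```
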
utt_duration=1243 frames, utt_pad_proportion=0.056, over 10562.72 utterances.], batch size: 41, lr: 2.95e-02, grad_scale: 8.0 2023-03-07 15:13:12,837 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11430.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:13:41,672 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11448.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:13:46,484 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6335, 5.2420, 4.9560, 5.1327, 4.6852, 5.0168, 5.4280, 5.2285], device='cuda:2'), covar=tensor([0.0318, 0.0212, 0.0342, 0.0136, 0.0328, 0.0139, 0.0160, 0.0113], device='cuda:2'), in_proj_covar=tensor([0.0141, 0.0126, 0.0151, 0.0095, 0.0139, 0.0105, 0.0121, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 15:14:15,454 INFO [train2.py:809] (2/4) Epoch 3, batch 3500, loss[ctc_loss=0.1914, att_loss=0.2989, loss=0.2774, over 16966.00 frames. utt_duration=686.9 frames, utt_pad_proportion=0.1392, over 99.00 utterances.], tot_loss[ctc_loss=0.1996, att_loss=0.3021, loss=0.2816, over 3277532.21 frames. utt_duration=1209 frames, utt_pad_proportion=0.06266, over 10858.35 utterances.], batch size: 99, lr: 2.95e-02, grad_scale: 8.0 2023-03-07 15:14:29,627 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11478.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:15:12,618 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-07 15:15:14,771 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 3.561e+02 4.473e+02 5.619e+02 1.545e+03, threshold=8.947e+02, percent-clipped=3.0 2023-03-07 15:15:36,471 INFO [train2.py:809] (2/4) Epoch 3, batch 3550, loss[ctc_loss=0.2132, att_loss=0.316, loss=0.2955, over 17500.00 frames. utt_duration=887.6 frames, utt_pad_proportion=0.07055, over 79.00 utterances.], tot_loss[ctc_loss=0.198, att_loss=0.3013, loss=0.2807, over 3279992.33 frames. utt_duration=1238 frames, utt_pad_proportion=0.05473, over 10607.26 utterances.], batch size: 79, lr: 2.94e-02, grad_scale: 8.0 2023-03-07 15:15:53,521 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11530.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:15:56,837 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11532.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:15:57,216 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. limit=2.0 2023-03-07 15:16:12,274 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11541.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 15:16:56,921 INFO [train2.py:809] (2/4) Epoch 3, batch 3600, loss[ctc_loss=0.179, att_loss=0.2698, loss=0.2517, over 15787.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007615, over 38.00 utterances.], tot_loss[ctc_loss=0.1987, att_loss=0.3017, loss=0.2811, over 3279180.35 frames. 
utt_duration=1246 frames, utt_pad_proportion=0.05406, over 10541.73 utterances.], batch size: 38, lr: 2.93e-02, grad_scale: 8.0 2023-03-07 15:17:29,070 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11589.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 15:17:36,073 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11593.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:17:56,221 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+02 4.035e+02 5.369e+02 6.421e+02 1.595e+03, threshold=1.074e+03, percent-clipped=5.0 2023-03-07 15:18:18,104 INFO [train2.py:809] (2/4) Epoch 3, batch 3650, loss[ctc_loss=0.1896, att_loss=0.2832, loss=0.2645, over 15372.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01112, over 35.00 utterances.], tot_loss[ctc_loss=0.1983, att_loss=0.3014, loss=0.2808, over 3273450.05 frames. utt_duration=1251 frames, utt_pad_proportion=0.0556, over 10482.40 utterances.], batch size: 35, lr: 2.93e-02, grad_scale: 8.0 2023-03-07 15:18:43,929 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11635.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:19:38,108 INFO [train2.py:809] (2/4) Epoch 3, batch 3700, loss[ctc_loss=0.2242, att_loss=0.3066, loss=0.2901, over 16316.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.007068, over 45.00 utterances.], tot_loss[ctc_loss=0.1983, att_loss=0.3012, loss=0.2806, over 3263399.93 frames. utt_duration=1221 frames, utt_pad_proportion=0.06572, over 10700.87 utterances.], batch size: 45, lr: 2.92e-02, grad_scale: 8.0 2023-03-07 15:20:14,935 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2543, 4.9026, 4.9272, 4.7908, 1.8409, 2.6613, 5.0795, 3.7528], device='cuda:2'), covar=tensor([0.0528, 0.0139, 0.0131, 0.0235, 0.8018, 0.2409, 0.0131, 0.1906], device='cuda:2'), in_proj_covar=tensor([0.0238, 0.0140, 0.0155, 0.0178, 0.0388, 0.0290, 0.0147, 0.0257], device='cuda:2'), out_proj_covar=tensor([1.3566e-04, 7.3738e-05, 8.4001e-05, 8.6292e-05, 1.9420e-04, 1.4525e-04, 7.7410e-05, 1.4444e-04], device='cuda:2') 2023-03-07 15:20:22,401 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11696.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:20:27,148 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11699.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:20:37,360 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.856e+02 3.737e+02 4.615e+02 5.715e+02 1.108e+03, threshold=9.229e+02, percent-clipped=1.0 2023-03-07 15:20:58,426 INFO [train2.py:809] (2/4) Epoch 3, batch 3750, loss[ctc_loss=0.1705, att_loss=0.2902, loss=0.2663, over 16319.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.00684, over 45.00 utterances.], tot_loss[ctc_loss=0.1976, att_loss=0.3011, loss=0.2804, over 3270228.72 frames. 
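
The zipformer.py:625 lines show per-stack warmup windows (666.7-1333.3, 1333.3-2000.0, and so on) together with how many encoder layers were skipped for the current batch; even at batch_count around 11600 a single layer is still occasionally dropped (layers_to_drop={1} at 15:17:29 above). A rough sketch of such a schedule is given below; the drop probabilities are assumptions, and only the warmup-window bookkeeping mirrors the log.

```python
import random

def choose_layers_to_drop(batch_count: float,
                          warmup_begin: float,
                          warmup_end: float,
                          num_layers: int,
                          post_warmup_prob: float = 0.025) -> set:
    """Pick encoder layers to skip for this batch.

    Assumed behaviour: inside the [warmup_begin, warmup_end) window layers
    are dropped aggressively to stabilise early training, and afterwards an
    occasional layer is still dropped with a small probability, which is why
    num_to_drop is usually 0 but sometimes 1 in the lines above.
    """
    if batch_count < warmup_begin:
        drop_prob = 0.5
    elif batch_count < warmup_end:
        # Linearly anneal the drop probability across the warmup window.
        frac = (batch_count - warmup_begin) / (warmup_end - warmup_begin)
        drop_prob = 0.5 * (1.0 - frac)
    else:
        drop_prob = post_warmup_prob
    return {i for i in range(num_layers) if random.random() < drop_prob}

# e.g. choose_layers_to_drop(11589.0, 666.7, 1333.3, num_layers=4)
# usually returns set(), occasionally {1} or similar, as in the log lines.
```
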
utt_duration=1244 frames, utt_pad_proportion=0.05736, over 10525.88 utterances.], batch size: 45, lr: 2.92e-02, grad_scale: 8.0 2023-03-07 15:21:33,041 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6058, 2.3283, 4.7618, 3.7136, 3.2228, 4.3865, 4.4289, 4.6050], device='cuda:2'), covar=tensor([0.0140, 0.1841, 0.0223, 0.1383, 0.2450, 0.0355, 0.0227, 0.0273], device='cuda:2'), in_proj_covar=tensor([0.0116, 0.0216, 0.0115, 0.0267, 0.0317, 0.0166, 0.0108, 0.0111], device='cuda:2'), out_proj_covar=tensor([9.5321e-05, 1.6400e-04, 9.3220e-05, 2.1201e-04, 2.3690e-04, 1.3454e-04, 9.0781e-05, 9.4462e-05], device='cuda:2') 2023-03-07 15:22:04,571 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11760.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:22:18,742 INFO [train2.py:809] (2/4) Epoch 3, batch 3800, loss[ctc_loss=0.1774, att_loss=0.2726, loss=0.2536, over 15642.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008534, over 37.00 utterances.], tot_loss[ctc_loss=0.1978, att_loss=0.3008, loss=0.2802, over 3258514.25 frames. utt_duration=1200 frames, utt_pad_proportion=0.07022, over 10877.16 utterances.], batch size: 37, lr: 2.91e-02, grad_scale: 8.0 2023-03-07 15:23:15,558 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1440, 4.9208, 4.9740, 3.3386, 4.8764, 4.1209, 4.2298, 2.7026], device='cuda:2'), covar=tensor([0.0095, 0.0069, 0.0170, 0.0616, 0.0112, 0.0157, 0.0239, 0.1289], device='cuda:2'), in_proj_covar=tensor([0.0042, 0.0044, 0.0039, 0.0074, 0.0044, 0.0053, 0.0060, 0.0091], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 15:23:18,124 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+02 4.239e+02 5.484e+02 6.646e+02 1.603e+03, threshold=1.097e+03, percent-clipped=8.0 2023-03-07 15:23:33,642 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11816.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:23:37,903 INFO [train2.py:809] (2/4) Epoch 3, batch 3850, loss[ctc_loss=0.1777, att_loss=0.2856, loss=0.2641, over 16411.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.007087, over 44.00 utterances.], tot_loss[ctc_loss=0.1981, att_loss=0.3003, loss=0.2799, over 3258123.06 frames. 
utt_duration=1189 frames, utt_pad_proportion=0.07254, over 10974.88 utterances.], batch size: 44, lr: 2.91e-02, grad_scale: 8.0 2023-03-07 15:23:55,911 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11830.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:24:07,841 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6799, 5.8929, 5.2707, 5.8374, 5.5741, 5.2195, 5.1860, 5.0888], device='cuda:2'), covar=tensor([0.1212, 0.0817, 0.0702, 0.0649, 0.0666, 0.1337, 0.2788, 0.2507], device='cuda:2'), in_proj_covar=tensor([0.0281, 0.0313, 0.0260, 0.0253, 0.0235, 0.0325, 0.0350, 0.0342], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 15:24:20,864 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11846.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:24:40,884 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3966, 4.5855, 4.5864, 4.6432, 4.9662, 4.3219, 4.3510, 1.9844], device='cuda:2'), covar=tensor([0.0304, 0.0368, 0.0260, 0.0274, 0.0971, 0.0328, 0.0474, 0.3840], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0119, 0.0110, 0.0114, 0.0229, 0.0135, 0.0105, 0.0255], device='cuda:2'), out_proj_covar=tensor([1.1546e-04, 1.0012e-04, 9.5235e-05, 1.0266e-04, 2.1004e-04, 1.1571e-04, 9.6102e-05, 2.1797e-04], device='cuda:2') 2023-03-07 15:24:49,386 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2023-03-07 15:24:55,936 INFO [train2.py:809] (2/4) Epoch 3, batch 3900, loss[ctc_loss=0.2228, att_loss=0.2913, loss=0.2776, over 12302.00 frames. utt_duration=1824 frames, utt_pad_proportion=0.1441, over 27.00 utterances.], tot_loss[ctc_loss=0.1976, att_loss=0.3, loss=0.2795, over 3257679.59 frames. utt_duration=1197 frames, utt_pad_proportion=0.06918, over 10901.10 utterances.], batch size: 27, lr: 2.90e-02, grad_scale: 8.0 2023-03-07 15:25:08,650 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11877.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:25:09,893 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11878.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:25:16,888 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.02 vs. 
limit=2.0 2023-03-07 15:25:25,497 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11888.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:25:53,040 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 3.699e+02 4.427e+02 5.534e+02 1.749e+03, threshold=8.855e+02, percent-clipped=3.0 2023-03-07 15:25:53,326 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5782, 5.0757, 4.6125, 4.9703, 5.1860, 4.9683, 4.7417, 4.5652], device='cuda:2'), covar=tensor([0.1127, 0.0441, 0.0368, 0.0445, 0.0297, 0.0356, 0.0310, 0.0367], device='cuda:2'), in_proj_covar=tensor([0.0313, 0.0188, 0.0134, 0.0161, 0.0205, 0.0227, 0.0179, 0.0190], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 15:25:54,896 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11907.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:25:56,462 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7108, 2.5208, 4.8742, 3.8539, 3.0402, 4.3843, 4.4063, 4.6481], device='cuda:2'), covar=tensor([0.0110, 0.1807, 0.0105, 0.1242, 0.2550, 0.0283, 0.0222, 0.0237], device='cuda:2'), in_proj_covar=tensor([0.0117, 0.0217, 0.0117, 0.0267, 0.0313, 0.0165, 0.0107, 0.0111], device='cuda:2'), out_proj_covar=tensor([9.5962e-05, 1.6520e-04, 9.4500e-05, 2.1154e-04, 2.3555e-04, 1.3381e-04, 8.9938e-05, 9.4715e-05], device='cuda:2') 2023-03-07 15:26:12,005 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3890, 4.4198, 4.2495, 4.5391, 4.7424, 4.3493, 4.1791, 1.9184], device='cuda:2'), covar=tensor([0.0245, 0.0447, 0.0296, 0.0108, 0.1117, 0.0253, 0.0501, 0.3691], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0117, 0.0106, 0.0110, 0.0218, 0.0132, 0.0103, 0.0247], device='cuda:2'), out_proj_covar=tensor([1.1270e-04, 9.8562e-05, 9.1975e-05, 9.9174e-05, 2.0159e-04, 1.1386e-04, 9.3483e-05, 2.1189e-04], device='cuda:2') 2023-03-07 15:26:13,112 INFO [train2.py:809] (2/4) Epoch 3, batch 3950, loss[ctc_loss=0.2158, att_loss=0.3303, loss=0.3074, over 17311.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02313, over 59.00 utterances.], tot_loss[ctc_loss=0.1957, att_loss=0.299, loss=0.2783, over 3257563.09 frames. utt_duration=1233 frames, utt_pad_proportion=0.06046, over 10583.50 utterances.], batch size: 59, lr: 2.90e-02, grad_scale: 8.0 2023-03-07 15:26:14,892 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11920.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:26:57,709 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11948.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:26:57,722 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.6232, 1.3309, 3.2202, 1.6057, 2.7514, 2.7151, 1.9092, 1.4198], device='cuda:2'), covar=tensor([0.1281, 0.1950, 0.0540, 0.2750, 0.1271, 0.0960, 0.1054, 0.4179], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0057, 0.0054, 0.0067, 0.0049, 0.0062, 0.0055, 0.0079], device='cuda:2'), out_proj_covar=tensor([3.7775e-05, 3.4226e-05, 3.3163e-05, 4.2939e-05, 3.4925e-05, 3.7525e-05, 3.5327e-05, 5.5230e-05], device='cuda:2') 2023-03-07 15:27:26,997 INFO [train2.py:809] (2/4) Epoch 4, batch 0, loss[ctc_loss=0.2036, att_loss=0.3032, loss=0.2832, over 16540.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006212, over 45.00 utterances.], tot_loss[ctc_loss=0.2036, att_loss=0.3032, loss=0.2832, over 16540.00 frames. 
utt_duration=1472 frames, utt_pad_proportion=0.006212, over 45.00 utterances.], batch size: 45, lr: 2.71e-02, grad_scale: 8.0 2023-03-07 15:27:26,997 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 15:27:39,231 INFO [train2.py:843] (2/4) Epoch 4, validation: ctc_loss=0.101, att_loss=0.2624, loss=0.2301, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 15:27:39,231 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 15:28:24,357 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11981.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:28:28,835 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11984.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:28:39,329 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11991.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:29:03,779 INFO [train2.py:809] (2/4) Epoch 4, batch 50, loss[ctc_loss=0.1741, att_loss=0.2897, loss=0.2665, over 16767.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.00579, over 48.00 utterances.], tot_loss[ctc_loss=0.1938, att_loss=0.2953, loss=0.275, over 736944.75 frames. utt_duration=1252 frames, utt_pad_proportion=0.0569, over 2356.41 utterances.], batch size: 48, lr: 2.70e-02, grad_scale: 8.0 2023-03-07 15:29:08,345 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.811e+02 4.015e+02 4.845e+02 5.994e+02 1.313e+03, threshold=9.689e+02, percent-clipped=6.0 2023-03-07 15:29:13,801 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12009.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:30:10,858 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12045.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:30:15,600 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6086, 2.1485, 4.9168, 3.7050, 3.1016, 4.2580, 4.4885, 4.5810], device='cuda:2'), covar=tensor([0.0104, 0.1749, 0.0108, 0.1331, 0.2369, 0.0335, 0.0198, 0.0230], device='cuda:2'), in_proj_covar=tensor([0.0116, 0.0218, 0.0115, 0.0270, 0.0319, 0.0164, 0.0106, 0.0114], device='cuda:2'), out_proj_covar=tensor([9.5238e-05, 1.6536e-04, 9.3245e-05, 2.1337e-04, 2.3938e-04, 1.3262e-04, 8.9178e-05, 9.7131e-05], device='cuda:2') 2023-03-07 15:30:24,238 INFO [train2.py:809] (2/4) Epoch 4, batch 100, loss[ctc_loss=0.1698, att_loss=0.2637, loss=0.2449, over 15368.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01115, over 35.00 utterances.], tot_loss[ctc_loss=0.1951, att_loss=0.2992, loss=0.2784, over 1304466.48 frames. utt_duration=1231 frames, utt_pad_proportion=0.05378, over 4245.52 utterances.], batch size: 35, lr: 2.70e-02, grad_scale: 8.0 2023-03-07 15:30:27,572 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12055.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:31:01,989 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=12076.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:31:44,681 INFO [train2.py:809] (2/4) Epoch 4, batch 150, loss[ctc_loss=0.1784, att_loss=0.3021, loss=0.2774, over 16885.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.007238, over 49.00 utterances.], tot_loss[ctc_loss=0.1938, att_loss=0.298, loss=0.2772, over 1734721.50 frames. 
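
The "Computing validation loss" / "validation:" pair above evaluates the dev set with gradients disabled and reports frame-weighted averages (ctc_loss=0.101, att_loss=0.2624, loss=0.2301 over 944034 frames), followed by the peak GPU memory seen so far. A simplified sketch, with illustrative helper names and the assumption that the memory figure comes from torch.cuda.max_memory_allocated():

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, loss_fn, device="cuda:2"):
    """Run the dev set once and report frame-weighted average losses.

    Illustrative reconstruction of what the 'Computing validation loss'
    lines correspond to; loss_fn is assumed to return per-batch loss sums
    plus the number of frames they cover.
    """
    model.eval()
    tot = {"ctc_loss": 0.0, "att_loss": 0.0, "loss": 0.0, "frames": 0.0}
    for batch in valid_loader:
        ctc_loss, att_loss, loss, num_frames = loss_fn(model, batch)
        tot["ctc_loss"] += ctc_loss.item()
        tot["att_loss"] += att_loss.item()
        tot["loss"] += loss.item()
        tot["frames"] += num_frames
    model.train()
    avg = {k: v / tot["frames"] for k, v in tot.items() if k != "frames"}
    # "Maximum memory allocated so far" presumably comes from this counter:
    max_mem_mb = torch.cuda.max_memory_allocated(device) // 1_000_000
    return avg, max_mem_mb
```
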
utt_duration=1213 frames, utt_pad_proportion=0.06196, over 5726.56 utterances.], batch size: 49, lr: 2.69e-02, grad_scale: 8.0 2023-03-07 15:31:49,492 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.353e+02 4.093e+02 5.170e+02 6.831e+02 1.205e+03, threshold=1.034e+03, percent-clipped=5.0 2023-03-07 15:32:38,619 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12137.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:32:51,522 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.92 vs. limit=2.0 2023-03-07 15:33:04,150 INFO [train2.py:809] (2/4) Epoch 4, batch 200, loss[ctc_loss=0.195, att_loss=0.2952, loss=0.2751, over 16177.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.00696, over 41.00 utterances.], tot_loss[ctc_loss=0.1915, att_loss=0.2972, loss=0.2761, over 2073054.56 frames. utt_duration=1231 frames, utt_pad_proportion=0.05954, over 6743.17 utterances.], batch size: 41, lr: 2.69e-02, grad_scale: 8.0 2023-03-07 15:33:34,676 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12172.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:33:59,121 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12188.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:34:21,566 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12202.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:34:23,023 INFO [train2.py:809] (2/4) Epoch 4, batch 250, loss[ctc_loss=0.1716, att_loss=0.3001, loss=0.2744, over 16946.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.007939, over 50.00 utterances.], tot_loss[ctc_loss=0.1907, att_loss=0.2973, loss=0.276, over 2347796.39 frames. utt_duration=1259 frames, utt_pad_proportion=0.0489, over 7466.93 utterances.], batch size: 50, lr: 2.68e-02, grad_scale: 8.0 2023-03-07 15:34:28,452 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 3.783e+02 4.490e+02 5.232e+02 8.659e+02, threshold=8.979e+02, percent-clipped=0.0 2023-03-07 15:35:15,672 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12236.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:35:17,462 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3004, 2.4199, 3.0515, 2.4228, 2.9125, 3.3233, 3.1749, 2.6979], device='cuda:2'), covar=tensor([0.0352, 0.1339, 0.0883, 0.1182, 0.0758, 0.0427, 0.0471, 0.1052], device='cuda:2'), in_proj_covar=tensor([0.0168, 0.0179, 0.0161, 0.0168, 0.0179, 0.0126, 0.0125, 0.0173], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 15:35:42,721 INFO [train2.py:809] (2/4) Epoch 4, batch 300, loss[ctc_loss=0.1909, att_loss=0.3007, loss=0.2787, over 17333.00 frames. utt_duration=1177 frames, utt_pad_proportion=0.02264, over 59.00 utterances.], tot_loss[ctc_loss=0.1907, att_loss=0.2973, loss=0.2759, over 2555985.84 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.04971, over 8120.05 utterances.], batch size: 59, lr: 2.68e-02, grad_scale: 8.0 2023-03-07 15:36:19,393 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12276.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:36:42,531 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12291.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:37:01,514 INFO [train2.py:809] (2/4) Epoch 4, batch 350, loss[ctc_loss=0.1902, att_loss=0.265, loss=0.25, over 14594.00 frames. utt_duration=1826 frames, utt_pad_proportion=0.04064, over 32.00 utterances.], tot_loss[ctc_loss=0.1901, att_loss=0.2972, loss=0.2758, over 2713702.87 frames. utt_duration=1259 frames, utt_pad_proportion=0.05093, over 8630.19 utterances.], batch size: 32, lr: 2.67e-02, grad_scale: 8.0 2023-03-07 15:37:03,270 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12304.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:37:06,084 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+02 4.027e+02 4.741e+02 6.835e+02 1.897e+03, threshold=9.483e+02, percent-clipped=6.0 2023-03-07 15:37:58,251 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12339.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:37:59,859 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12340.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:38:20,460 INFO [train2.py:809] (2/4) Epoch 4, batch 400, loss[ctc_loss=0.1781, att_loss=0.2866, loss=0.2649, over 16116.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.00671, over 42.00 utterances.], tot_loss[ctc_loss=0.1906, att_loss=0.2975, loss=0.2761, over 2836832.89 frames. utt_duration=1227 frames, utt_pad_proportion=0.06054, over 9255.96 utterances.], batch size: 42, lr: 2.67e-02, grad_scale: 8.0 2023-03-07 15:38:23,729 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12355.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:39:39,638 INFO [train2.py:809] (2/4) Epoch 4, batch 450, loss[ctc_loss=0.197, att_loss=0.3025, loss=0.2814, over 16750.00 frames. utt_duration=1397 frames, utt_pad_proportion=0.007561, over 48.00 utterances.], tot_loss[ctc_loss=0.191, att_loss=0.2975, loss=0.2762, over 2937634.16 frames. 
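
The per-batch loss[...] fields carry integer frame and utterance counts, while the running tot_loss[...] fields show fractional ones (for example 2713702.87 frames over 8630.19 utterances above), which suggests the running totals are decayed slightly each batch before the new statistics are added, so that older batches fade out of the average. A small sketch of such a tracker; the decay constant is an assumption.

```python
class RunningLoss:
    """Decayed running totals of the kind shown in the tot_loss[...] fields.

    Each update scales the existing totals down by a constant factor and
    then adds the current batch, which produces fractional frame and
    utterance counts like those logged above.  The decay value here is an
    assumption, not taken from the training script.
    """

    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.stats = {"ctc_loss": 0.0, "att_loss": 0.0, "loss": 0.0,
                      "frames": 0.0, "utterances": 0.0}

    def update(self, batch_stats: dict) -> None:
        for k in self.stats:
            self.stats[k] = self.stats[k] * self.decay + batch_stats.get(k, 0.0)

    def summary(self) -> dict:
        # Frame-normalised averages, analogous to the tot_loss numbers.
        frames = max(self.stats["frames"], 1.0)
        return {k: self.stats[k] / frames
                for k in ("ctc_loss", "att_loss", "loss")}
```
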
utt_duration=1211 frames, utt_pad_proportion=0.06378, over 9714.60 utterances.], batch size: 48, lr: 2.66e-02, grad_scale: 8.0 2023-03-07 15:39:39,755 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12403.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:39:44,237 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 3.731e+02 4.738e+02 5.685e+02 9.673e+02, threshold=9.477e+02, percent-clipped=1.0 2023-03-07 15:40:25,793 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12432.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:40:47,581 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=12446.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:40:51,053 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9060, 3.8707, 3.8632, 2.9243, 3.8504, 3.5740, 3.4900, 2.6702], device='cuda:2'), covar=tensor([0.0090, 0.0091, 0.0160, 0.0527, 0.0083, 0.0255, 0.0228, 0.1036], device='cuda:2'), in_proj_covar=tensor([0.0044, 0.0049, 0.0042, 0.0080, 0.0047, 0.0058, 0.0066, 0.0097], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0003], device='cuda:2') 2023-03-07 15:40:58,414 INFO [train2.py:809] (2/4) Epoch 4, batch 500, loss[ctc_loss=0.1746, att_loss=0.3021, loss=0.2766, over 17036.00 frames. utt_duration=1287 frames, utt_pad_proportion=0.01059, over 53.00 utterances.], tot_loss[ctc_loss=0.1912, att_loss=0.2968, loss=0.2757, over 2997550.90 frames. utt_duration=1189 frames, utt_pad_proportion=0.07539, over 10098.63 utterances.], batch size: 53, lr: 2.66e-02, grad_scale: 8.0 2023-03-07 15:41:11,511 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6942, 3.8096, 3.3463, 3.4606, 3.9808, 3.5303, 2.5468, 4.6521], device='cuda:2'), covar=tensor([0.1079, 0.0396, 0.1213, 0.0694, 0.0501, 0.0797, 0.1029, 0.0180], device='cuda:2'), in_proj_covar=tensor([0.0132, 0.0100, 0.0153, 0.0124, 0.0119, 0.0146, 0.0128, 0.0090], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 15:41:13,522 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.07 vs. limit=2.0 2023-03-07 15:41:28,694 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12472.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:42:15,273 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12502.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:42:16,489 INFO [train2.py:809] (2/4) Epoch 4, batch 550, loss[ctc_loss=0.1761, att_loss=0.2848, loss=0.263, over 15624.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.01016, over 37.00 utterances.], tot_loss[ctc_loss=0.1911, att_loss=0.2963, loss=0.2752, over 3063023.15 frames. 
utt_duration=1221 frames, utt_pad_proportion=0.0648, over 10049.41 utterances.], batch size: 37, lr: 2.65e-02, grad_scale: 8.0 2023-03-07 15:42:21,768 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.239e+02 3.520e+02 4.552e+02 5.513e+02 1.151e+03, threshold=9.104e+02, percent-clipped=4.0 2023-03-07 15:42:23,565 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12507.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:42:31,199 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2856, 4.9513, 4.9639, 3.9474, 1.9811, 2.5996, 5.0498, 3.7373], device='cuda:2'), covar=tensor([0.0449, 0.0129, 0.0136, 0.0839, 0.6820, 0.2312, 0.0141, 0.2014], device='cuda:2'), in_proj_covar=tensor([0.0247, 0.0145, 0.0163, 0.0192, 0.0398, 0.0311, 0.0154, 0.0275], device='cuda:2'), out_proj_covar=tensor([1.4088e-04, 7.4311e-05, 8.6617e-05, 9.3745e-05, 2.0111e-04, 1.5521e-04, 8.0547e-05, 1.5241e-04], device='cuda:2') 2023-03-07 15:42:43,559 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12520.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:42:48,552 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2404, 4.8855, 4.9809, 4.2327, 1.8813, 2.6929, 5.0589, 3.8071], device='cuda:2'), covar=tensor([0.0503, 0.0129, 0.0125, 0.0552, 0.7522, 0.2432, 0.0130, 0.1759], device='cuda:2'), in_proj_covar=tensor([0.0250, 0.0146, 0.0164, 0.0192, 0.0400, 0.0312, 0.0156, 0.0276], device='cuda:2'), out_proj_covar=tensor([1.4195e-04, 7.4095e-05, 8.6478e-05, 9.4224e-05, 2.0193e-04, 1.5545e-04, 8.0749e-05, 1.5297e-04], device='cuda:2') 2023-03-07 15:42:56,636 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.11 vs. limit=2.0 2023-03-07 15:43:31,171 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12550.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:43:35,779 INFO [train2.py:809] (2/4) Epoch 4, batch 600, loss[ctc_loss=0.2781, att_loss=0.3446, loss=0.3313, over 13626.00 frames. utt_duration=374.7 frames, utt_pad_proportion=0.3473, over 146.00 utterances.], tot_loss[ctc_loss=0.1903, att_loss=0.2961, loss=0.2749, over 3111660.64 frames. utt_duration=1200 frames, utt_pad_proportion=0.06805, over 10386.89 utterances.], batch size: 146, lr: 2.65e-02, grad_scale: 8.0 2023-03-07 15:44:13,047 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12576.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:44:27,283 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-03-07 15:44:51,700 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.48 vs. limit=2.0 2023-03-07 15:44:55,221 INFO [train2.py:809] (2/4) Epoch 4, batch 650, loss[ctc_loss=0.1655, att_loss=0.2829, loss=0.2594, over 16403.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.00685, over 44.00 utterances.], tot_loss[ctc_loss=0.1877, att_loss=0.2949, loss=0.2734, over 3148763.02 frames. 
utt_duration=1204 frames, utt_pad_proportion=0.06593, over 10472.53 utterances.], batch size: 44, lr: 2.65e-02, grad_scale: 8.0 2023-03-07 15:44:57,704 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12604.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:45:00,415 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 3.647e+02 4.227e+02 5.409e+02 1.351e+03, threshold=8.454e+02, percent-clipped=4.0 2023-03-07 15:45:29,071 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12624.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:45:53,198 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12640.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:46:13,060 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12652.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:46:14,654 INFO [train2.py:809] (2/4) Epoch 4, batch 700, loss[ctc_loss=0.1966, att_loss=0.3049, loss=0.2833, over 17381.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03398, over 63.00 utterances.], tot_loss[ctc_loss=0.1883, att_loss=0.2956, loss=0.2741, over 3183808.38 frames. utt_duration=1199 frames, utt_pad_proportion=0.06498, over 10633.57 utterances.], batch size: 63, lr: 2.64e-02, grad_scale: 8.0 2023-03-07 15:47:09,144 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12688.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:47:34,090 INFO [train2.py:809] (2/4) Epoch 4, batch 750, loss[ctc_loss=0.1763, att_loss=0.2738, loss=0.2543, over 15917.00 frames. utt_duration=1634 frames, utt_pad_proportion=0.007197, over 39.00 utterances.], tot_loss[ctc_loss=0.1882, att_loss=0.2954, loss=0.274, over 3200835.31 frames. utt_duration=1213 frames, utt_pad_proportion=0.06302, over 10571.64 utterances.], batch size: 39, lr: 2.64e-02, grad_scale: 8.0 2023-03-07 15:47:38,634 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 3.712e+02 4.325e+02 6.002e+02 1.306e+03, threshold=8.650e+02, percent-clipped=6.0 2023-03-07 15:47:42,007 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.5525, 0.7805, 2.3384, 1.7691, 1.4961, 2.3999, 1.2078, 0.7779], device='cuda:2'), covar=tensor([0.1267, 0.3479, 0.1270, 0.2541, 0.2336, 0.2200, 0.1489, 0.7771], device='cuda:2'), in_proj_covar=tensor([0.0067, 0.0060, 0.0059, 0.0067, 0.0056, 0.0064, 0.0060, 0.0087], device='cuda:2'), out_proj_covar=tensor([3.9572e-05, 3.8257e-05, 3.5758e-05, 4.3582e-05, 3.7858e-05, 4.0706e-05, 3.8702e-05, 6.1371e-05], device='cuda:2') 2023-03-07 15:48:10,458 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2023-03-07 15:48:19,075 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12732.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:48:43,129 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0510, 6.2120, 5.6398, 6.1971, 5.9621, 5.6750, 5.7009, 5.5414], device='cuda:2'), covar=tensor([0.0750, 0.0642, 0.0611, 0.0567, 0.0431, 0.1073, 0.1797, 0.1615], device='cuda:2'), in_proj_covar=tensor([0.0293, 0.0326, 0.0264, 0.0264, 0.0240, 0.0329, 0.0357, 0.0344], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 15:48:52,752 INFO [train2.py:809] (2/4) Epoch 4, batch 800, loss[ctc_loss=0.1925, att_loss=0.303, loss=0.2809, over 17297.00 frames. 
utt_duration=877.1 frames, utt_pad_proportion=0.08258, over 79.00 utterances.], tot_loss[ctc_loss=0.1883, att_loss=0.2953, loss=0.2739, over 3213787.16 frames. utt_duration=1211 frames, utt_pad_proportion=0.06427, over 10624.50 utterances.], batch size: 79, lr: 2.63e-02, grad_scale: 8.0 2023-03-07 15:49:08,752 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=12763.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:49:35,445 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12780.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:49:44,687 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3214, 2.4795, 3.6337, 2.1488, 3.6029, 4.5183, 4.3583, 3.0389], device='cuda:2'), covar=tensor([0.0426, 0.1777, 0.0890, 0.1706, 0.0844, 0.0317, 0.0322, 0.1583], device='cuda:2'), in_proj_covar=tensor([0.0167, 0.0176, 0.0162, 0.0165, 0.0179, 0.0132, 0.0123, 0.0176], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 15:50:11,384 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12802.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:50:12,736 INFO [train2.py:809] (2/4) Epoch 4, batch 850, loss[ctc_loss=0.1846, att_loss=0.2688, loss=0.2519, over 15487.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009376, over 36.00 utterances.], tot_loss[ctc_loss=0.19, att_loss=0.2958, loss=0.2747, over 3225710.77 frames. utt_duration=1188 frames, utt_pad_proportion=0.0702, over 10871.83 utterances.], batch size: 36, lr: 2.63e-02, grad_scale: 8.0 2023-03-07 15:50:17,379 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+02 3.893e+02 4.938e+02 6.078e+02 1.799e+03, threshold=9.877e+02, percent-clipped=7.0 2023-03-07 15:50:46,144 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12824.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 15:51:07,314 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.84 vs. limit=2.0 2023-03-07 15:51:32,714 INFO [train2.py:809] (2/4) Epoch 4, batch 900, loss[ctc_loss=0.1851, att_loss=0.2835, loss=0.2638, over 15954.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007104, over 41.00 utterances.], tot_loss[ctc_loss=0.1907, att_loss=0.2963, loss=0.2752, over 3235155.33 frames. utt_duration=1174 frames, utt_pad_proportion=0.07398, over 11037.86 utterances.], batch size: 41, lr: 2.62e-02, grad_scale: 16.0 2023-03-07 15:52:53,765 INFO [train2.py:809] (2/4) Epoch 4, batch 950, loss[ctc_loss=0.188, att_loss=0.3138, loss=0.2887, over 17315.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.01137, over 55.00 utterances.], tot_loss[ctc_loss=0.1893, att_loss=0.2962, loss=0.2748, over 3242200.46 frames. utt_duration=1203 frames, utt_pad_proportion=0.06654, over 10798.28 utterances.], batch size: 55, lr: 2.62e-02, grad_scale: 16.0 2023-03-07 15:52:58,563 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 3.388e+02 4.239e+02 5.070e+02 1.281e+03, threshold=8.479e+02, percent-clipped=6.0 2023-03-07 15:54:15,089 INFO [train2.py:809] (2/4) Epoch 4, batch 1000, loss[ctc_loss=0.1562, att_loss=0.2575, loss=0.2373, over 15753.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009766, over 38.00 utterances.], tot_loss[ctc_loss=0.1887, att_loss=0.2956, loss=0.2742, over 3245162.20 frames. 
utt_duration=1202 frames, utt_pad_proportion=0.0668, over 10812.67 utterances.], batch size: 38, lr: 2.61e-02, grad_scale: 8.0 2023-03-07 15:54:31,868 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4738, 5.6505, 5.0335, 5.6453, 5.3803, 5.1262, 5.0902, 4.9922], device='cuda:2'), covar=tensor([0.1096, 0.0878, 0.0764, 0.0710, 0.0572, 0.1371, 0.2257, 0.2107], device='cuda:2'), in_proj_covar=tensor([0.0298, 0.0332, 0.0271, 0.0270, 0.0245, 0.0333, 0.0370, 0.0350], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 15:54:49,524 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8215, 5.3015, 5.1350, 5.0828, 5.2932, 5.1749, 5.1223, 4.8501], device='cuda:2'), covar=tensor([0.1048, 0.0311, 0.0182, 0.0419, 0.0271, 0.0249, 0.0188, 0.0259], device='cuda:2'), in_proj_covar=tensor([0.0329, 0.0200, 0.0140, 0.0172, 0.0221, 0.0238, 0.0182, 0.0209], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0003, 0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 15:55:00,661 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6892, 3.0682, 3.9240, 2.5858, 3.9332, 4.6184, 4.6235, 3.3758], device='cuda:2'), covar=tensor([0.0318, 0.1521, 0.0811, 0.1624, 0.0701, 0.0512, 0.0392, 0.1378], device='cuda:2'), in_proj_covar=tensor([0.0174, 0.0183, 0.0168, 0.0170, 0.0183, 0.0140, 0.0131, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 15:55:37,218 INFO [train2.py:809] (2/4) Epoch 4, batch 1050, loss[ctc_loss=0.238, att_loss=0.331, loss=0.3124, over 17357.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.0205, over 59.00 utterances.], tot_loss[ctc_loss=0.1873, att_loss=0.2941, loss=0.2727, over 3246133.19 frames. utt_duration=1218 frames, utt_pad_proportion=0.06327, over 10676.97 utterances.], batch size: 59, lr: 2.61e-02, grad_scale: 8.0 2023-03-07 15:55:39,200 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.2878, 2.8411, 2.3279, 2.3381, 2.2299, 2.7716, 1.6198, 1.5004], device='cuda:2'), covar=tensor([0.1614, 0.0440, 0.1009, 0.1735, 0.1184, 0.1599, 0.1140, 0.5544], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0054, 0.0059, 0.0067, 0.0054, 0.0065, 0.0061, 0.0088], device='cuda:2'), out_proj_covar=tensor([3.9193e-05, 3.5221e-05, 3.5719e-05, 4.4057e-05, 3.6519e-05, 4.2101e-05, 3.9428e-05, 6.1899e-05], device='cuda:2') 2023-03-07 15:55:43,248 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.497e+02 3.995e+02 4.964e+02 6.656e+02 1.651e+03, threshold=9.928e+02, percent-clipped=9.0 2023-03-07 15:56:24,834 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9846, 5.2677, 5.5478, 5.6719, 5.1123, 5.8893, 5.1650, 6.0357], device='cuda:2'), covar=tensor([0.0488, 0.0544, 0.0395, 0.0522, 0.1794, 0.0634, 0.0442, 0.0329], device='cuda:2'), in_proj_covar=tensor([0.0416, 0.0275, 0.0275, 0.0325, 0.0481, 0.0277, 0.0238, 0.0293], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0002, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 15:56:55,990 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.80 vs. limit=2.0 2023-03-07 15:56:58,343 INFO [train2.py:809] (2/4) Epoch 4, batch 1100, loss[ctc_loss=0.1442, att_loss=0.2642, loss=0.2402, over 16277.00 frames. 
utt_duration=1515 frames, utt_pad_proportion=0.007132, over 43.00 utterances.], tot_loss[ctc_loss=0.1847, att_loss=0.292, loss=0.2705, over 3251344.36 frames. utt_duration=1250 frames, utt_pad_proportion=0.05512, over 10414.58 utterances.], batch size: 43, lr: 2.61e-02, grad_scale: 8.0 2023-03-07 15:58:14,878 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2023-03-07 15:58:18,903 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=13102.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:58:20,185 INFO [train2.py:809] (2/4) Epoch 4, batch 1150, loss[ctc_loss=0.2113, att_loss=0.3172, loss=0.296, over 16746.00 frames. utt_duration=678.2 frames, utt_pad_proportion=0.149, over 99.00 utterances.], tot_loss[ctc_loss=0.1835, att_loss=0.2913, loss=0.2698, over 3258792.21 frames. utt_duration=1275 frames, utt_pad_proportion=0.04857, over 10231.99 utterances.], batch size: 99, lr: 2.60e-02, grad_scale: 8.0 2023-03-07 15:58:26,390 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 3.646e+02 4.698e+02 5.581e+02 1.233e+03, threshold=9.396e+02, percent-clipped=4.0 2023-03-07 15:58:46,219 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=13119.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 15:59:08,306 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1525, 4.9527, 4.8988, 4.0929, 1.9458, 2.6427, 5.0020, 3.5240], device='cuda:2'), covar=tensor([0.0472, 0.0130, 0.0140, 0.0693, 0.7399, 0.2608, 0.0151, 0.2162], device='cuda:2'), in_proj_covar=tensor([0.0247, 0.0146, 0.0165, 0.0185, 0.0386, 0.0310, 0.0152, 0.0280], device='cuda:2'), out_proj_covar=tensor([1.3989e-04, 7.4416e-05, 8.7554e-05, 8.9749e-05, 1.9606e-04, 1.5426e-04, 7.8012e-05, 1.5293e-04], device='cuda:2') 2023-03-07 15:59:09,875 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3766, 5.1827, 5.1094, 4.0383, 1.9516, 2.7084, 5.2977, 3.7188], device='cuda:2'), covar=tensor([0.0405, 0.0151, 0.0169, 0.0952, 0.7704, 0.2652, 0.0225, 0.2137], device='cuda:2'), in_proj_covar=tensor([0.0246, 0.0145, 0.0165, 0.0185, 0.0386, 0.0309, 0.0152, 0.0279], device='cuda:2'), out_proj_covar=tensor([1.3962e-04, 7.4285e-05, 8.7394e-05, 8.9639e-05, 1.9569e-04, 1.5404e-04, 7.7865e-05, 1.5261e-04], device='cuda:2') 2023-03-07 15:59:35,982 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=13150.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 15:59:40,594 INFO [train2.py:809] (2/4) Epoch 4, batch 1200, loss[ctc_loss=0.167, att_loss=0.2717, loss=0.2507, over 15871.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.01009, over 39.00 utterances.], tot_loss[ctc_loss=0.184, att_loss=0.2918, loss=0.2702, over 3267926.82 frames. utt_duration=1295 frames, utt_pad_proportion=0.04243, over 10106.22 utterances.], batch size: 39, lr: 2.60e-02, grad_scale: 8.0 2023-03-07 16:01:01,001 INFO [train2.py:809] (2/4) Epoch 4, batch 1250, loss[ctc_loss=0.1932, att_loss=0.31, loss=0.2867, over 17299.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01136, over 55.00 utterances.], tot_loss[ctc_loss=0.1847, att_loss=0.2923, loss=0.2707, over 3269579.52 frames. 
utt_duration=1264 frames, utt_pad_proportion=0.05121, over 10362.22 utterances.], batch size: 55, lr: 2.59e-02, grad_scale: 8.0 2023-03-07 16:01:07,179 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+02 3.747e+02 4.896e+02 5.785e+02 2.345e+03, threshold=9.791e+02, percent-clipped=4.0 2023-03-07 16:02:21,490 INFO [train2.py:809] (2/4) Epoch 4, batch 1300, loss[ctc_loss=0.1909, att_loss=0.3131, loss=0.2887, over 17403.00 frames. utt_duration=1106 frames, utt_pad_proportion=0.03284, over 63.00 utterances.], tot_loss[ctc_loss=0.1835, att_loss=0.2919, loss=0.2702, over 3266448.89 frames. utt_duration=1268 frames, utt_pad_proportion=0.05037, over 10319.17 utterances.], batch size: 63, lr: 2.59e-02, grad_scale: 8.0 2023-03-07 16:03:26,581 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3107, 4.2197, 4.3763, 2.9449, 4.0760, 3.7170, 3.8564, 2.3985], device='cuda:2'), covar=tensor([0.0116, 0.0116, 0.0119, 0.0687, 0.0115, 0.0231, 0.0257, 0.1450], device='cuda:2'), in_proj_covar=tensor([0.0047, 0.0051, 0.0045, 0.0082, 0.0050, 0.0059, 0.0069, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:2') 2023-03-07 16:03:40,653 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1819, 4.7801, 4.6177, 4.8125, 4.6444, 4.5319, 4.2938, 4.6372], device='cuda:2'), covar=tensor([0.0102, 0.0112, 0.0077, 0.0090, 0.0082, 0.0088, 0.0292, 0.0173], device='cuda:2'), in_proj_covar=tensor([0.0049, 0.0049, 0.0048, 0.0036, 0.0036, 0.0044, 0.0063, 0.0059], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 16:03:41,842 INFO [train2.py:809] (2/4) Epoch 4, batch 1350, loss[ctc_loss=0.2388, att_loss=0.3258, loss=0.3084, over 17289.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01173, over 55.00 utterances.], tot_loss[ctc_loss=0.184, att_loss=0.2923, loss=0.2706, over 3273862.91 frames. utt_duration=1251 frames, utt_pad_proportion=0.05258, over 10477.54 utterances.], batch size: 55, lr: 2.58e-02, grad_scale: 8.0 2023-03-07 16:03:48,889 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 3.582e+02 4.373e+02 5.653e+02 1.802e+03, threshold=8.745e+02, percent-clipped=3.0 2023-03-07 16:04:26,687 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7554, 4.1796, 3.9319, 4.3686, 4.2269, 4.1274, 3.8312, 4.0709], device='cuda:2'), covar=tensor([0.0127, 0.0156, 0.0132, 0.0080, 0.0095, 0.0099, 0.0297, 0.0216], device='cuda:2'), in_proj_covar=tensor([0.0049, 0.0049, 0.0048, 0.0036, 0.0036, 0.0044, 0.0063, 0.0060], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 16:05:02,319 INFO [train2.py:809] (2/4) Epoch 4, batch 1400, loss[ctc_loss=0.1691, att_loss=0.3012, loss=0.2748, over 17060.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.007887, over 52.00 utterances.], tot_loss[ctc_loss=0.1835, att_loss=0.2921, loss=0.2704, over 3270480.17 frames. utt_duration=1263 frames, utt_pad_proportion=0.05047, over 10370.01 utterances.], batch size: 52, lr: 2.58e-02, grad_scale: 8.0 2023-03-07 16:06:22,102 INFO [train2.py:809] (2/4) Epoch 4, batch 1450, loss[ctc_loss=0.1475, att_loss=0.2556, loss=0.234, over 15755.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009578, over 38.00 utterances.], tot_loss[ctc_loss=0.1842, att_loss=0.293, loss=0.2712, over 3281073.89 frames. 
utt_duration=1268 frames, utt_pad_proportion=0.04704, over 10363.46 utterances.], batch size: 38, lr: 2.58e-02, grad_scale: 8.0 2023-03-07 16:06:28,369 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.701e+02 3.999e+02 4.753e+02 5.960e+02 1.214e+03, threshold=9.507e+02, percent-clipped=6.0 2023-03-07 16:06:47,412 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=13419.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:07:42,519 INFO [train2.py:809] (2/4) Epoch 4, batch 1500, loss[ctc_loss=0.1679, att_loss=0.2795, loss=0.2571, over 16689.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005714, over 46.00 utterances.], tot_loss[ctc_loss=0.1827, att_loss=0.2919, loss=0.27, over 3278770.60 frames. utt_duration=1277 frames, utt_pad_proportion=0.04553, over 10280.52 utterances.], batch size: 46, lr: 2.57e-02, grad_scale: 8.0 2023-03-07 16:08:04,754 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=13467.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:08:47,824 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.55 vs. limit=2.0 2023-03-07 16:09:03,078 INFO [train2.py:809] (2/4) Epoch 4, batch 1550, loss[ctc_loss=0.1389, att_loss=0.2528, loss=0.23, over 15879.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009658, over 39.00 utterances.], tot_loss[ctc_loss=0.1831, att_loss=0.2923, loss=0.2704, over 3276811.33 frames. utt_duration=1284 frames, utt_pad_proportion=0.04621, over 10221.39 utterances.], batch size: 39, lr: 2.57e-02, grad_scale: 8.0 2023-03-07 16:09:09,260 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.195e+02 3.814e+02 4.608e+02 5.940e+02 1.228e+03, threshold=9.215e+02, percent-clipped=5.0 2023-03-07 16:10:19,112 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0526, 4.9658, 4.9674, 2.5895, 4.8994, 4.3698, 4.1704, 2.4025], device='cuda:2'), covar=tensor([0.0098, 0.0077, 0.0179, 0.0926, 0.0082, 0.0141, 0.0294, 0.1533], device='cuda:2'), in_proj_covar=tensor([0.0047, 0.0050, 0.0042, 0.0082, 0.0048, 0.0059, 0.0068, 0.0096], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0003], device='cuda:2') 2023-03-07 16:10:23,309 INFO [train2.py:809] (2/4) Epoch 4, batch 1600, loss[ctc_loss=0.2632, att_loss=0.3362, loss=0.3216, over 17241.00 frames. utt_duration=874.5 frames, utt_pad_proportion=0.08331, over 79.00 utterances.], tot_loss[ctc_loss=0.1834, att_loss=0.2928, loss=0.2709, over 3278289.00 frames. utt_duration=1284 frames, utt_pad_proportion=0.0453, over 10222.58 utterances.], batch size: 79, lr: 2.56e-02, grad_scale: 8.0 2023-03-07 16:10:53,060 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6372, 2.3760, 5.0014, 3.7466, 3.1354, 4.3113, 4.5080, 4.5674], device='cuda:2'), covar=tensor([0.0115, 0.1853, 0.0079, 0.1219, 0.2369, 0.0314, 0.0193, 0.0271], device='cuda:2'), in_proj_covar=tensor([0.0117, 0.0227, 0.0118, 0.0278, 0.0322, 0.0175, 0.0105, 0.0123], device='cuda:2'), out_proj_covar=tensor([9.7620e-05, 1.7496e-04, 9.7620e-05, 2.2032e-04, 2.4693e-04, 1.4447e-04, 8.8734e-05, 1.0413e-04], device='cuda:2') 2023-03-07 16:11:42,751 INFO [train2.py:809] (2/4) Epoch 4, batch 1650, loss[ctc_loss=0.1887, att_loss=0.2973, loss=0.2756, over 16867.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007435, over 49.00 utterances.], tot_loss[ctc_loss=0.1856, att_loss=0.294, loss=0.2723, over 3274932.97 frames. 
utt_duration=1233 frames, utt_pad_proportion=0.05944, over 10635.80 utterances.], batch size: 49, lr: 2.56e-02, grad_scale: 8.0 2023-03-07 16:11:48,974 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+02 3.919e+02 5.311e+02 6.584e+02 2.912e+03, threshold=1.062e+03, percent-clipped=12.0 2023-03-07 16:12:13,038 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2023-03-07 16:13:01,503 INFO [train2.py:809] (2/4) Epoch 4, batch 1700, loss[ctc_loss=0.1517, att_loss=0.2724, loss=0.2482, over 16339.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005609, over 45.00 utterances.], tot_loss[ctc_loss=0.1852, att_loss=0.2937, loss=0.272, over 3272372.97 frames. utt_duration=1243 frames, utt_pad_proportion=0.05868, over 10546.02 utterances.], batch size: 45, lr: 2.55e-02, grad_scale: 8.0 2023-03-07 16:14:05,302 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3396, 3.7229, 2.9677, 3.0291, 3.8701, 3.5951, 2.5118, 4.3860], device='cuda:2'), covar=tensor([0.1267, 0.0332, 0.1052, 0.0770, 0.0401, 0.0533, 0.0925, 0.0212], device='cuda:2'), in_proj_covar=tensor([0.0138, 0.0110, 0.0162, 0.0129, 0.0126, 0.0151, 0.0133, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 16:14:20,718 INFO [train2.py:809] (2/4) Epoch 4, batch 1750, loss[ctc_loss=0.1687, att_loss=0.2752, loss=0.2539, over 16024.00 frames. utt_duration=1604 frames, utt_pad_proportion=0.006227, over 40.00 utterances.], tot_loss[ctc_loss=0.185, att_loss=0.2933, loss=0.2717, over 3265834.72 frames. utt_duration=1254 frames, utt_pad_proportion=0.05735, over 10428.41 utterances.], batch size: 40, lr: 2.55e-02, grad_scale: 8.0 2023-03-07 16:14:27,018 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.465e+02 3.709e+02 4.336e+02 5.273e+02 1.516e+03, threshold=8.672e+02, percent-clipped=2.0 2023-03-07 16:15:15,166 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=13737.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:15:24,037 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3476, 2.1635, 2.8140, 3.8953, 3.7112, 3.8941, 2.6625, 1.7104], device='cuda:2'), covar=tensor([0.0513, 0.2686, 0.1287, 0.0581, 0.0433, 0.0227, 0.1907, 0.2926], device='cuda:2'), in_proj_covar=tensor([0.0132, 0.0181, 0.0174, 0.0130, 0.0114, 0.0102, 0.0176, 0.0167], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 16:15:39,465 INFO [train2.py:809] (2/4) Epoch 4, batch 1800, loss[ctc_loss=0.1789, att_loss=0.302, loss=0.2773, over 17052.00 frames. utt_duration=1339 frames, utt_pad_proportion=0.006749, over 51.00 utterances.], tot_loss[ctc_loss=0.1852, att_loss=0.294, loss=0.2722, over 3276104.32 frames. utt_duration=1261 frames, utt_pad_proportion=0.05153, over 10408.23 utterances.], batch size: 51, lr: 2.55e-02, grad_scale: 8.0 2023-03-07 16:16:15,874 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2023-03-07 16:16:51,931 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=13798.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 16:16:59,424 INFO [train2.py:809] (2/4) Epoch 4, batch 1850, loss[ctc_loss=0.2249, att_loss=0.3167, loss=0.2983, over 17222.00 frames. 
utt_duration=690.4 frames, utt_pad_proportion=0.1272, over 100.00 utterances.], tot_loss[ctc_loss=0.1852, att_loss=0.2937, loss=0.272, over 3273063.76 frames. utt_duration=1219 frames, utt_pad_proportion=0.06123, over 10751.94 utterances.], batch size: 100, lr: 2.54e-02, grad_scale: 8.0 2023-03-07 16:17:05,503 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.641e+02 3.804e+02 4.653e+02 6.004e+02 1.107e+03, threshold=9.305e+02, percent-clipped=7.0 2023-03-07 16:18:19,881 INFO [train2.py:809] (2/4) Epoch 4, batch 1900, loss[ctc_loss=0.2311, att_loss=0.3165, loss=0.2994, over 17035.00 frames. utt_duration=1338 frames, utt_pad_proportion=0.006871, over 51.00 utterances.], tot_loss[ctc_loss=0.184, att_loss=0.2928, loss=0.271, over 3275012.18 frames. utt_duration=1212 frames, utt_pad_proportion=0.06157, over 10824.54 utterances.], batch size: 51, lr: 2.54e-02, grad_scale: 8.0 2023-03-07 16:19:39,437 INFO [train2.py:809] (2/4) Epoch 4, batch 1950, loss[ctc_loss=0.2028, att_loss=0.2895, loss=0.2722, over 16129.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005698, over 42.00 utterances.], tot_loss[ctc_loss=0.1835, att_loss=0.2921, loss=0.2704, over 3271328.21 frames. utt_duration=1232 frames, utt_pad_proportion=0.05747, over 10638.30 utterances.], batch size: 42, lr: 2.53e-02, grad_scale: 8.0 2023-03-07 16:19:45,526 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+02 3.636e+02 4.382e+02 5.668e+02 1.213e+03, threshold=8.764e+02, percent-clipped=2.0 2023-03-07 16:20:59,405 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.21 vs. limit=2.0 2023-03-07 16:21:00,026 INFO [train2.py:809] (2/4) Epoch 4, batch 2000, loss[ctc_loss=0.1744, att_loss=0.3039, loss=0.278, over 17055.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008723, over 52.00 utterances.], tot_loss[ctc_loss=0.1849, att_loss=0.2931, loss=0.2715, over 3270722.42 frames. utt_duration=1190 frames, utt_pad_proportion=0.06826, over 11012.09 utterances.], batch size: 52, lr: 2.53e-02, grad_scale: 8.0 2023-03-07 16:21:11,922 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.56 vs. limit=2.0 2023-03-07 16:21:32,380 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5007, 2.9108, 3.6042, 2.8306, 3.5738, 4.4492, 4.3274, 3.2320], device='cuda:2'), covar=tensor([0.0377, 0.1657, 0.1055, 0.1464, 0.1094, 0.0428, 0.0570, 0.1339], device='cuda:2'), in_proj_covar=tensor([0.0186, 0.0192, 0.0176, 0.0174, 0.0197, 0.0152, 0.0143, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 16:22:11,234 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.70 vs. limit=5.0 2023-03-07 16:22:23,888 INFO [train2.py:809] (2/4) Epoch 4, batch 2050, loss[ctc_loss=0.1637, att_loss=0.2866, loss=0.262, over 16886.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.007326, over 49.00 utterances.], tot_loss[ctc_loss=0.1836, att_loss=0.2926, loss=0.2708, over 3267544.32 frames. utt_duration=1191 frames, utt_pad_proportion=0.06927, over 10986.36 utterances.], batch size: 49, lr: 2.53e-02, grad_scale: 8.0 2023-03-07 16:22:30,196 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.271e+02 3.689e+02 4.518e+02 6.015e+02 1.676e+03, threshold=9.035e+02, percent-clipped=8.0 2023-03-07 16:22:32,531 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.55 vs. 
limit=5.0 2023-03-07 16:22:44,293 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-07 16:22:48,796 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2023-03-07 16:23:43,381 INFO [train2.py:809] (2/4) Epoch 4, batch 2100, loss[ctc_loss=0.1863, att_loss=0.3077, loss=0.2834, over 17281.00 frames. utt_duration=1099 frames, utt_pad_proportion=0.0397, over 63.00 utterances.], tot_loss[ctc_loss=0.1837, att_loss=0.2925, loss=0.2707, over 3271592.10 frames. utt_duration=1193 frames, utt_pad_proportion=0.06739, over 10987.51 utterances.], batch size: 63, lr: 2.52e-02, grad_scale: 8.0 2023-03-07 16:24:14,252 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14072.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:24:47,827 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14093.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 16:25:03,072 INFO [train2.py:809] (2/4) Epoch 4, batch 2150, loss[ctc_loss=0.1848, att_loss=0.3033, loss=0.2796, over 17031.00 frames. utt_duration=1287 frames, utt_pad_proportion=0.01085, over 53.00 utterances.], tot_loss[ctc_loss=0.1844, att_loss=0.2937, loss=0.2718, over 3283042.38 frames. utt_duration=1182 frames, utt_pad_proportion=0.06658, over 11122.13 utterances.], batch size: 53, lr: 2.52e-02, grad_scale: 8.0 2023-03-07 16:25:03,437 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14103.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:25:09,250 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 3.723e+02 4.436e+02 6.346e+02 3.042e+03, threshold=8.872e+02, percent-clipped=8.0 2023-03-07 16:25:30,145 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14120.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 16:25:47,590 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-07 16:25:51,828 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14133.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:26:23,576 INFO [train2.py:809] (2/4) Epoch 4, batch 2200, loss[ctc_loss=0.183, att_loss=0.3028, loss=0.2789, over 17147.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.01359, over 56.00 utterances.], tot_loss[ctc_loss=0.1837, att_loss=0.2934, loss=0.2714, over 3280228.35 frames. utt_duration=1197 frames, utt_pad_proportion=0.06287, over 10971.78 utterances.], batch size: 56, lr: 2.51e-02, grad_scale: 8.0 2023-03-07 16:26:41,868 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14164.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:26:42,235 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.66 vs. 
limit=5.0 2023-03-07 16:26:53,303 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9278, 5.3373, 4.9688, 5.0842, 5.4229, 5.3317, 5.1521, 4.9686], device='cuda:2'), covar=tensor([0.0808, 0.0336, 0.0278, 0.0382, 0.0177, 0.0207, 0.0189, 0.0224], device='cuda:2'), in_proj_covar=tensor([0.0339, 0.0200, 0.0138, 0.0177, 0.0218, 0.0239, 0.0186, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 16:27:09,483 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14181.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 16:27:44,783 INFO [train2.py:809] (2/4) Epoch 4, batch 2250, loss[ctc_loss=0.1647, att_loss=0.2815, loss=0.2581, over 16274.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.00763, over 43.00 utterances.], tot_loss[ctc_loss=0.1824, att_loss=0.2927, loss=0.2706, over 3282314.40 frames. utt_duration=1224 frames, utt_pad_proportion=0.05668, over 10741.07 utterances.], batch size: 43, lr: 2.51e-02, grad_scale: 8.0 2023-03-07 16:27:50,860 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.200e+02 3.354e+02 4.208e+02 5.126e+02 1.091e+03, threshold=8.417e+02, percent-clipped=2.0 2023-03-07 16:29:04,195 INFO [train2.py:809] (2/4) Epoch 4, batch 2300, loss[ctc_loss=0.1779, att_loss=0.3003, loss=0.2758, over 16614.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005857, over 47.00 utterances.], tot_loss[ctc_loss=0.1844, att_loss=0.2938, loss=0.2719, over 3279324.63 frames. utt_duration=1183 frames, utt_pad_proportion=0.0681, over 11105.18 utterances.], batch size: 47, lr: 2.51e-02, grad_scale: 8.0 2023-03-07 16:29:08,139 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2023-03-07 16:29:09,333 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9127, 4.6359, 4.5740, 4.4878, 4.9838, 4.8552, 4.2444, 2.0189], device='cuda:2'), covar=tensor([0.0155, 0.0251, 0.0218, 0.0172, 0.1194, 0.0152, 0.0445, 0.3384], device='cuda:2'), in_proj_covar=tensor([0.0133, 0.0116, 0.0111, 0.0117, 0.0249, 0.0127, 0.0104, 0.0249], device='cuda:2'), out_proj_covar=tensor([1.2244e-04, 1.0135e-04, 9.8882e-05, 1.0933e-04, 2.3087e-04, 1.1731e-04, 9.9004e-05, 2.2429e-04], device='cuda:2') 2023-03-07 16:30:24,649 INFO [train2.py:809] (2/4) Epoch 4, batch 2350, loss[ctc_loss=0.1557, att_loss=0.2701, loss=0.2472, over 16009.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.005924, over 40.00 utterances.], tot_loss[ctc_loss=0.1826, att_loss=0.2931, loss=0.271, over 3277228.52 frames. utt_duration=1198 frames, utt_pad_proportion=0.06511, over 10952.05 utterances.], batch size: 40, lr: 2.50e-02, grad_scale: 8.0 2023-03-07 16:30:31,148 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.697e+02 3.860e+02 4.546e+02 5.797e+02 1.491e+03, threshold=9.092e+02, percent-clipped=6.0 2023-03-07 16:31:07,376 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-07 16:31:44,317 INFO [train2.py:809] (2/4) Epoch 4, batch 2400, loss[ctc_loss=0.1581, att_loss=0.2823, loss=0.2574, over 16520.00 frames. utt_duration=1470 frames, utt_pad_proportion=0.007277, over 45.00 utterances.], tot_loss[ctc_loss=0.1828, att_loss=0.293, loss=0.271, over 3279250.83 frames. 
utt_duration=1196 frames, utt_pad_proportion=0.06427, over 10984.37 utterances.], batch size: 45, lr: 2.50e-02, grad_scale: 8.0 2023-03-07 16:32:48,050 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=14393.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:33:03,728 INFO [train2.py:809] (2/4) Epoch 4, batch 2450, loss[ctc_loss=0.1777, att_loss=0.3048, loss=0.2793, over 16885.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.007297, over 49.00 utterances.], tot_loss[ctc_loss=0.1819, att_loss=0.2925, loss=0.2704, over 3285985.68 frames. utt_duration=1207 frames, utt_pad_proportion=0.05956, over 10901.94 utterances.], batch size: 49, lr: 2.49e-02, grad_scale: 8.0 2023-03-07 16:33:09,772 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.088e+02 3.936e+02 4.992e+02 6.745e+02 1.784e+03, threshold=9.985e+02, percent-clipped=4.0 2023-03-07 16:33:20,195 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14413.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:33:26,290 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.92 vs. limit=5.0 2023-03-07 16:33:35,261 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2023-03-07 16:33:44,307 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14428.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:34:04,967 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=14441.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:34:23,280 INFO [train2.py:809] (2/4) Epoch 4, batch 2500, loss[ctc_loss=0.1585, att_loss=0.2759, loss=0.2524, over 17027.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007322, over 51.00 utterances.], tot_loss[ctc_loss=0.1806, att_loss=0.2908, loss=0.2688, over 3279920.06 frames. 
utt_duration=1216 frames, utt_pad_proportion=0.05976, over 10804.84 utterances.], batch size: 51, lr: 2.49e-02, grad_scale: 8.0 2023-03-07 16:34:27,079 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2231, 4.5749, 4.3809, 4.4603, 4.6323, 4.5694, 4.3726, 4.2464], device='cuda:2'), covar=tensor([0.0982, 0.0404, 0.0256, 0.0393, 0.0281, 0.0260, 0.0232, 0.0258], device='cuda:2'), in_proj_covar=tensor([0.0354, 0.0207, 0.0143, 0.0185, 0.0232, 0.0251, 0.0193, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 16:34:33,337 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14459.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:34:57,598 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14474.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:35:00,414 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14476.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 16:35:02,048 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0254, 4.5066, 4.6858, 4.7672, 4.9396, 4.9185, 4.5768, 2.1522], device='cuda:2'), covar=tensor([0.0192, 0.0348, 0.0204, 0.0180, 0.1197, 0.0189, 0.0276, 0.3211], device='cuda:2'), in_proj_covar=tensor([0.0132, 0.0117, 0.0113, 0.0114, 0.0254, 0.0129, 0.0104, 0.0252], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 16:35:17,969 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14487.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:35:29,717 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.53 vs. limit=2.0 2023-03-07 16:35:42,394 INFO [train2.py:809] (2/4) Epoch 4, batch 2550, loss[ctc_loss=0.1872, att_loss=0.2982, loss=0.276, over 16774.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006081, over 48.00 utterances.], tot_loss[ctc_loss=0.1811, att_loss=0.2916, loss=0.2695, over 3285418.92 frames. 
utt_duration=1216 frames, utt_pad_proportion=0.0585, over 10824.20 utterances.], batch size: 48, lr: 2.49e-02, grad_scale: 8.0 2023-03-07 16:35:48,913 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.665e+02 3.493e+02 4.541e+02 5.557e+02 1.503e+03, threshold=9.083e+02, percent-clipped=2.0 2023-03-07 16:35:53,640 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.3486, 2.9234, 2.2601, 2.6581, 2.8284, 2.5821, 1.6226, 1.2815], device='cuda:2'), covar=tensor([0.1289, 0.0784, 0.0932, 0.1600, 0.0770, 0.2391, 0.1449, 0.9355], device='cuda:2'), in_proj_covar=tensor([0.0065, 0.0058, 0.0060, 0.0071, 0.0056, 0.0072, 0.0061, 0.0094], device='cuda:2'), out_proj_covar=tensor([4.1226e-05, 3.6470e-05, 3.7302e-05, 4.8068e-05, 3.7119e-05, 4.9903e-05, 4.1225e-05, 6.7704e-05], device='cuda:2') 2023-03-07 16:36:22,723 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0412, 4.8967, 4.8270, 2.8738, 4.9597, 4.3098, 4.2723, 2.7444], device='cuda:2'), covar=tensor([0.0077, 0.0081, 0.0204, 0.0915, 0.0065, 0.0137, 0.0252, 0.1363], device='cuda:2'), in_proj_covar=tensor([0.0048, 0.0054, 0.0044, 0.0086, 0.0050, 0.0062, 0.0072, 0.0097], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:2') 2023-03-07 16:36:54,439 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14548.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:37:02,526 INFO [train2.py:809] (2/4) Epoch 4, batch 2600, loss[ctc_loss=0.1542, att_loss=0.273, loss=0.2492, over 15946.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007353, over 41.00 utterances.], tot_loss[ctc_loss=0.1819, att_loss=0.2922, loss=0.2701, over 3281795.80 frames. utt_duration=1204 frames, utt_pad_proportion=0.06296, over 10917.48 utterances.], batch size: 41, lr: 2.48e-02, grad_scale: 8.0 2023-03-07 16:38:02,299 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-07 16:38:21,812 INFO [train2.py:809] (2/4) Epoch 4, batch 2650, loss[ctc_loss=0.1605, att_loss=0.2841, loss=0.2594, over 16539.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006302, over 45.00 utterances.], tot_loss[ctc_loss=0.1808, att_loss=0.2911, loss=0.2691, over 3276228.70 frames. 
utt_duration=1217 frames, utt_pad_proportion=0.06039, over 10779.99 utterances.], batch size: 45, lr: 2.48e-02, grad_scale: 8.0 2023-03-07 16:38:28,424 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.304e+02 3.793e+02 4.783e+02 6.009e+02 1.165e+03, threshold=9.567e+02, percent-clipped=2.0 2023-03-07 16:38:41,708 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9242, 6.1047, 5.6357, 6.0890, 5.7477, 5.5401, 5.5173, 5.4114], device='cuda:2'), covar=tensor([0.1032, 0.0864, 0.0642, 0.0645, 0.0505, 0.1182, 0.2456, 0.2309], device='cuda:2'), in_proj_covar=tensor([0.0312, 0.0346, 0.0278, 0.0266, 0.0254, 0.0351, 0.0385, 0.0357], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 16:38:43,262 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5529, 5.8464, 5.3329, 5.8404, 5.4375, 5.2450, 5.2312, 5.2237], device='cuda:2'), covar=tensor([0.1371, 0.0854, 0.0805, 0.0563, 0.0623, 0.1229, 0.2763, 0.2397], device='cuda:2'), in_proj_covar=tensor([0.0313, 0.0346, 0.0279, 0.0267, 0.0255, 0.0351, 0.0386, 0.0358], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 16:39:01,281 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7016, 2.2451, 2.7881, 4.1230, 4.1479, 4.4073, 2.5431, 1.5859], device='cuda:2'), covar=tensor([0.0534, 0.2741, 0.1591, 0.0727, 0.0563, 0.0180, 0.2014, 0.3308], device='cuda:2'), in_proj_covar=tensor([0.0141, 0.0182, 0.0179, 0.0136, 0.0120, 0.0107, 0.0181, 0.0171], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 16:39:41,960 INFO [train2.py:809] (2/4) Epoch 4, batch 2700, loss[ctc_loss=0.2047, att_loss=0.3197, loss=0.2967, over 17363.00 frames. utt_duration=1179 frames, utt_pad_proportion=0.02027, over 59.00 utterances.], tot_loss[ctc_loss=0.1818, att_loss=0.2921, loss=0.2701, over 3276308.19 frames. utt_duration=1187 frames, utt_pad_proportion=0.06887, over 11056.51 utterances.], batch size: 59, lr: 2.48e-02, grad_scale: 8.0 2023-03-07 16:39:45,254 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5524, 5.7737, 5.2605, 5.7271, 5.4084, 5.2011, 5.2361, 5.0532], device='cuda:2'), covar=tensor([0.1098, 0.0698, 0.0706, 0.0567, 0.0603, 0.1294, 0.1851, 0.1822], device='cuda:2'), in_proj_covar=tensor([0.0310, 0.0338, 0.0272, 0.0268, 0.0253, 0.0346, 0.0379, 0.0351], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 16:41:01,175 INFO [train2.py:809] (2/4) Epoch 4, batch 2750, loss[ctc_loss=0.1517, att_loss=0.2672, loss=0.2441, over 15635.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.00893, over 37.00 utterances.], tot_loss[ctc_loss=0.1814, att_loss=0.2917, loss=0.2697, over 3280970.37 frames. 
utt_duration=1194 frames, utt_pad_proportion=0.06614, over 11009.59 utterances.], batch size: 37, lr: 2.47e-02, grad_scale: 8.0 2023-03-07 16:41:07,358 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.333e+02 3.898e+02 4.978e+02 5.848e+02 1.042e+03, threshold=9.955e+02, percent-clipped=2.0 2023-03-07 16:41:42,290 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=14728.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:42:02,330 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14740.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:42:23,120 INFO [train2.py:809] (2/4) Epoch 4, batch 2800, loss[ctc_loss=0.1461, att_loss=0.2673, loss=0.243, over 15954.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007089, over 41.00 utterances.], tot_loss[ctc_loss=0.1805, att_loss=0.2908, loss=0.2688, over 3272557.26 frames. utt_duration=1189 frames, utt_pad_proportion=0.06885, over 11023.64 utterances.], batch size: 41, lr: 2.47e-02, grad_scale: 8.0 2023-03-07 16:42:24,943 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3763, 4.7463, 4.5995, 4.6528, 4.8068, 4.7258, 4.5141, 4.3954], device='cuda:2'), covar=tensor([0.0941, 0.0377, 0.0201, 0.0373, 0.0243, 0.0288, 0.0242, 0.0275], device='cuda:2'), in_proj_covar=tensor([0.0347, 0.0202, 0.0143, 0.0183, 0.0229, 0.0251, 0.0191, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0003, 0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 16:42:32,809 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=14759.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:42:48,939 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14769.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:43:00,440 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=14776.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:43:00,646 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=14776.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 16:43:41,691 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14801.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:43:44,565 INFO [train2.py:809] (2/4) Epoch 4, batch 2850, loss[ctc_loss=0.2791, att_loss=0.3374, loss=0.3258, over 14462.00 frames. utt_duration=397.7 frames, utt_pad_proportion=0.306, over 146.00 utterances.], tot_loss[ctc_loss=0.1806, att_loss=0.2911, loss=0.269, over 3267010.31 frames. 
utt_duration=1187 frames, utt_pad_proportion=0.06991, over 11021.46 utterances.], batch size: 146, lr: 2.46e-02, grad_scale: 8.0 2023-03-07 16:43:50,719 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 3.258e+02 4.280e+02 5.701e+02 1.148e+03, threshold=8.559e+02, percent-clipped=5.0 2023-03-07 16:43:50,890 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=14807.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:44:18,601 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=14824.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 16:44:24,854 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.5546, 2.0728, 2.7201, 2.3579, 2.5421, 2.7819, 1.8295, 1.6520], device='cuda:2'), covar=tensor([0.0750, 0.1514, 0.0603, 0.2266, 0.1480, 0.1028, 0.1214, 0.6652], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0062, 0.0062, 0.0078, 0.0062, 0.0073, 0.0064, 0.0099], device='cuda:2'), out_proj_covar=tensor([4.1811e-05, 3.8792e-05, 3.8405e-05, 5.2902e-05, 4.0319e-05, 5.1860e-05, 4.3007e-05, 7.1873e-05], device='cuda:2') 2023-03-07 16:44:48,694 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14843.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:45:04,280 INFO [train2.py:809] (2/4) Epoch 4, batch 2900, loss[ctc_loss=0.1476, att_loss=0.2735, loss=0.2483, over 15972.00 frames. utt_duration=1560 frames, utt_pad_proportion=0.005829, over 41.00 utterances.], tot_loss[ctc_loss=0.1811, att_loss=0.2917, loss=0.2696, over 3269496.47 frames. utt_duration=1205 frames, utt_pad_proportion=0.06475, over 10864.36 utterances.], batch size: 41, lr: 2.46e-02, grad_scale: 8.0 2023-03-07 16:46:25,730 INFO [train2.py:809] (2/4) Epoch 4, batch 2950, loss[ctc_loss=0.1376, att_loss=0.2462, loss=0.2245, over 15625.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.01017, over 37.00 utterances.], tot_loss[ctc_loss=0.1792, att_loss=0.2899, loss=0.2678, over 3268625.25 frames. utt_duration=1229 frames, utt_pad_proportion=0.05945, over 10651.26 utterances.], batch size: 37, lr: 2.46e-02, grad_scale: 8.0 2023-03-07 16:46:32,078 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+02 3.902e+02 4.882e+02 6.670e+02 1.203e+03, threshold=9.763e+02, percent-clipped=9.0 2023-03-07 16:46:49,653 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-03-07 16:47:14,091 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1465, 4.6097, 4.1675, 4.7351, 4.1962, 4.4660, 4.8541, 4.5800], device='cuda:2'), covar=tensor([0.0413, 0.0277, 0.0753, 0.0165, 0.0454, 0.0271, 0.0238, 0.0179], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0155, 0.0198, 0.0121, 0.0172, 0.0119, 0.0146, 0.0139], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 16:47:46,081 INFO [train2.py:809] (2/4) Epoch 4, batch 3000, loss[ctc_loss=0.1366, att_loss=0.2567, loss=0.2327, over 16169.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.007148, over 41.00 utterances.], tot_loss[ctc_loss=0.18, att_loss=0.2903, loss=0.2682, over 3261312.50 frames. 
utt_duration=1182 frames, utt_pad_proportion=0.07355, over 11048.89 utterances.], batch size: 41, lr: 2.45e-02, grad_scale: 16.0 2023-03-07 16:47:46,082 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 16:48:00,418 INFO [train2.py:843] (2/4) Epoch 4, validation: ctc_loss=0.08782, att_loss=0.2564, loss=0.2227, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 16:48:00,418 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 16:48:19,353 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6836, 3.8880, 3.9417, 3.8728, 4.0598, 4.0065, 3.8657, 3.8101], device='cuda:2'), covar=tensor([0.1187, 0.0670, 0.0222, 0.0520, 0.0326, 0.0329, 0.0267, 0.0320], device='cuda:2'), in_proj_covar=tensor([0.0352, 0.0210, 0.0144, 0.0190, 0.0234, 0.0256, 0.0196, 0.0219], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0003, 0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 16:49:20,420 INFO [train2.py:809] (2/4) Epoch 4, batch 3050, loss[ctc_loss=0.1299, att_loss=0.2551, loss=0.2301, over 16173.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.006124, over 41.00 utterances.], tot_loss[ctc_loss=0.1798, att_loss=0.2898, loss=0.2678, over 3261515.86 frames. utt_duration=1170 frames, utt_pad_proportion=0.07698, over 11168.48 utterances.], batch size: 41, lr: 2.45e-02, grad_scale: 16.0 2023-03-07 16:49:26,540 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+02 3.805e+02 4.579e+02 6.197e+02 1.095e+03, threshold=9.158e+02, percent-clipped=2.0 2023-03-07 16:49:29,217 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0 2023-03-07 16:49:31,103 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.03 vs. limit=5.0 2023-03-07 16:49:36,542 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5827, 3.9078, 3.6836, 3.9354, 3.8967, 3.7730, 3.3044, 3.7465], device='cuda:2'), covar=tensor([0.0106, 0.0105, 0.0103, 0.0087, 0.0082, 0.0094, 0.0373, 0.0239], device='cuda:2'), in_proj_covar=tensor([0.0051, 0.0051, 0.0053, 0.0039, 0.0039, 0.0048, 0.0071, 0.0067], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 16:50:24,267 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4150, 3.6252, 3.0083, 3.3219, 3.7166, 3.4189, 2.5148, 4.2080], device='cuda:2'), covar=tensor([0.1266, 0.0507, 0.1284, 0.0664, 0.0563, 0.0746, 0.1099, 0.0393], device='cuda:2'), in_proj_covar=tensor([0.0148, 0.0116, 0.0166, 0.0139, 0.0140, 0.0161, 0.0145, 0.0116], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 16:50:25,744 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5227, 5.1162, 4.6299, 4.9680, 4.9074, 4.7130, 4.3563, 4.9313], device='cuda:2'), covar=tensor([0.0109, 0.0109, 0.0094, 0.0090, 0.0120, 0.0091, 0.0338, 0.0175], device='cuda:2'), in_proj_covar=tensor([0.0050, 0.0050, 0.0051, 0.0038, 0.0038, 0.0047, 0.0069, 0.0065], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 16:50:40,668 INFO [train2.py:809] (2/4) Epoch 4, batch 3100, loss[ctc_loss=0.1662, att_loss=0.2889, loss=0.2644, over 17011.00 frames. 
utt_duration=1336 frames, utt_pad_proportion=0.00837, over 51.00 utterances.], tot_loss[ctc_loss=0.1804, att_loss=0.2906, loss=0.2686, over 3267581.26 frames. utt_duration=1161 frames, utt_pad_proportion=0.07704, over 11268.40 utterances.], batch size: 51, lr: 2.45e-02, grad_scale: 16.0 2023-03-07 16:50:41,079 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5144, 1.6530, 1.8663, 0.7676, 2.9959, 1.3206, 1.8452, 2.3679], device='cuda:2'), covar=tensor([0.0325, 0.1443, 0.1677, 0.1954, 0.0509, 0.1288, 0.1127, 0.0804], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0090, 0.0092, 0.0082, 0.0076, 0.0076, 0.0092, 0.0089], device='cuda:2'), out_proj_covar=tensor([3.9497e-05, 4.6048e-05, 4.7450e-05, 4.4498e-05, 3.9179e-05, 4.1250e-05, 4.3869e-05, 4.4709e-05], device='cuda:2') 2023-03-07 16:51:06,265 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=15069.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:51:50,995 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=15096.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:51:55,899 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.4975, 2.6698, 2.6190, 2.3549, 2.8700, 2.6307, 2.4336, 1.3857], device='cuda:2'), covar=tensor([0.2309, 0.1427, 0.0989, 0.2020, 0.1022, 0.2183, 0.2051, 0.8289], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0065, 0.0067, 0.0079, 0.0066, 0.0078, 0.0068, 0.0102], device='cuda:2'), out_proj_covar=tensor([4.5885e-05, 4.0500e-05, 4.0650e-05, 5.4363e-05, 4.2715e-05, 5.4833e-05, 4.4965e-05, 7.5682e-05], device='cuda:2') 2023-03-07 16:52:01,854 INFO [train2.py:809] (2/4) Epoch 4, batch 3150, loss[ctc_loss=0.1677, att_loss=0.2957, loss=0.2701, over 17289.00 frames. utt_duration=876.7 frames, utt_pad_proportion=0.08102, over 79.00 utterances.], tot_loss[ctc_loss=0.1792, att_loss=0.2904, loss=0.2682, over 3270017.43 frames. utt_duration=1158 frames, utt_pad_proportion=0.07732, over 11310.72 utterances.], batch size: 79, lr: 2.44e-02, grad_scale: 16.0 2023-03-07 16:52:07,991 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 3.829e+02 4.318e+02 5.478e+02 1.412e+03, threshold=8.637e+02, percent-clipped=4.0 2023-03-07 16:52:23,556 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=15117.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:53:05,048 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=15143.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:53:21,396 INFO [train2.py:809] (2/4) Epoch 4, batch 3200, loss[ctc_loss=0.1995, att_loss=0.307, loss=0.2855, over 17108.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01557, over 56.00 utterances.], tot_loss[ctc_loss=0.1777, att_loss=0.2897, loss=0.2673, over 3275869.17 frames. 
utt_duration=1181 frames, utt_pad_proportion=0.07027, over 11106.66 utterances.], batch size: 56, lr: 2.44e-02, grad_scale: 16.0 2023-03-07 16:53:40,025 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.9151, 2.6334, 2.9209, 2.6297, 3.0123, 3.0255, 2.6750, 1.5717], device='cuda:2'), covar=tensor([0.1875, 0.1524, 0.0917, 0.2460, 0.1362, 0.4186, 0.1003, 1.0634], device='cuda:2'), in_proj_covar=tensor([0.0074, 0.0067, 0.0069, 0.0081, 0.0068, 0.0083, 0.0068, 0.0107], device='cuda:2'), out_proj_covar=tensor([4.6971e-05, 4.1462e-05, 4.2134e-05, 5.6104e-05, 4.4060e-05, 5.8003e-05, 4.5177e-05, 7.8169e-05], device='cuda:2') 2023-03-07 16:54:21,481 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=15191.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:54:28,827 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=15195.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 16:54:42,165 INFO [train2.py:809] (2/4) Epoch 4, batch 3250, loss[ctc_loss=0.1524, att_loss=0.2632, loss=0.241, over 15511.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.008083, over 36.00 utterances.], tot_loss[ctc_loss=0.1771, att_loss=0.2895, loss=0.267, over 3280493.76 frames. utt_duration=1197 frames, utt_pad_proportion=0.06497, over 10977.67 utterances.], batch size: 36, lr: 2.44e-02, grad_scale: 16.0 2023-03-07 16:54:48,522 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.575e+02 4.506e+02 5.643e+02 1.593e+03, threshold=9.012e+02, percent-clipped=6.0 2023-03-07 16:55:00,239 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-03-07 16:56:01,881 INFO [train2.py:809] (2/4) Epoch 4, batch 3300, loss[ctc_loss=0.1546, att_loss=0.2873, loss=0.2608, over 16630.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.005005, over 47.00 utterances.], tot_loss[ctc_loss=0.177, att_loss=0.2892, loss=0.2667, over 3279625.04 frames. utt_duration=1216 frames, utt_pad_proportion=0.06049, over 10798.08 utterances.], batch size: 47, lr: 2.43e-02, grad_scale: 16.0 2023-03-07 16:56:06,920 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=15256.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 16:57:22,460 INFO [train2.py:809] (2/4) Epoch 4, batch 3350, loss[ctc_loss=0.1642, att_loss=0.2938, loss=0.2679, over 17038.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.00886, over 52.00 utterances.], tot_loss[ctc_loss=0.1768, att_loss=0.289, loss=0.2665, over 3283901.96 frames. utt_duration=1230 frames, utt_pad_proportion=0.0562, over 10693.91 utterances.], batch size: 52, lr: 2.43e-02, grad_scale: 16.0 2023-03-07 16:57:23,257 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.77 vs. limit=2.0 2023-03-07 16:57:24,816 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-07 16:57:28,536 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.491e+02 3.835e+02 4.773e+02 5.805e+02 1.964e+03, threshold=9.546e+02, percent-clipped=8.0 2023-03-07 16:58:42,386 INFO [train2.py:809] (2/4) Epoch 4, batch 3400, loss[ctc_loss=0.1668, att_loss=0.2696, loss=0.249, over 14071.00 frames. utt_duration=1817 frames, utt_pad_proportion=0.05008, over 31.00 utterances.], tot_loss[ctc_loss=0.1767, att_loss=0.289, loss=0.2666, over 3284844.12 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.05337, over 10614.13 utterances.], batch size: 31, lr: 2.42e-02, grad_scale: 16.0 2023-03-07 16:59:51,080 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=15396.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:00:02,643 INFO [train2.py:809] (2/4) Epoch 4, batch 3450, loss[ctc_loss=0.1088, att_loss=0.2262, loss=0.2028, over 15495.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009025, over 36.00 utterances.], tot_loss[ctc_loss=0.1753, att_loss=0.2881, loss=0.2655, over 3279925.52 frames. utt_duration=1251 frames, utt_pad_proportion=0.05139, over 10496.24 utterances.], batch size: 36, lr: 2.42e-02, grad_scale: 8.0 2023-03-07 17:00:10,263 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+02 3.236e+02 3.892e+02 5.378e+02 1.574e+03, threshold=7.784e+02, percent-clipped=3.0 2023-03-07 17:01:07,805 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=15444.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:01:22,735 INFO [train2.py:809] (2/4) Epoch 4, batch 3500, loss[ctc_loss=0.151, att_loss=0.2804, loss=0.2545, over 16279.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.007356, over 43.00 utterances.], tot_loss[ctc_loss=0.1754, att_loss=0.2884, loss=0.2658, over 3281511.86 frames. utt_duration=1277 frames, utt_pad_proportion=0.04454, over 10290.72 utterances.], batch size: 43, lr: 2.42e-02, grad_scale: 8.0 2023-03-07 17:02:43,134 INFO [train2.py:809] (2/4) Epoch 4, batch 3550, loss[ctc_loss=0.1612, att_loss=0.2674, loss=0.2461, over 15645.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.00836, over 37.00 utterances.], tot_loss[ctc_loss=0.1727, att_loss=0.2866, loss=0.2638, over 3274861.07 frames. utt_duration=1304 frames, utt_pad_proportion=0.03974, over 10057.77 utterances.], batch size: 37, lr: 2.41e-02, grad_scale: 8.0 2023-03-07 17:02:47,903 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5948, 5.8773, 4.9614, 5.7851, 5.4436, 5.2218, 5.2436, 5.1337], device='cuda:2'), covar=tensor([0.1247, 0.0786, 0.0847, 0.0615, 0.0704, 0.1332, 0.2099, 0.1757], device='cuda:2'), in_proj_covar=tensor([0.0312, 0.0348, 0.0273, 0.0276, 0.0253, 0.0345, 0.0375, 0.0351], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 17:02:50,733 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.812e+02 4.413e+02 5.065e+02 1.151e+03, threshold=8.826e+02, percent-clipped=3.0 2023-03-07 17:04:00,623 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=15551.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 17:04:03,537 INFO [train2.py:809] (2/4) Epoch 4, batch 3600, loss[ctc_loss=0.1693, att_loss=0.2855, loss=0.2622, over 16691.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006245, over 46.00 utterances.], tot_loss[ctc_loss=0.1734, att_loss=0.2866, loss=0.264, over 3270534.90 frames. utt_duration=1286 frames, utt_pad_proportion=0.04474, over 10185.63 utterances.], batch size: 46, lr: 2.41e-02, grad_scale: 8.0 2023-03-07 17:04:28,214 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.58 vs. limit=2.0 2023-03-07 17:05:24,927 INFO [train2.py:809] (2/4) Epoch 4, batch 3650, loss[ctc_loss=0.1583, att_loss=0.2685, loss=0.2465, over 15955.00 frames. 
utt_duration=1558 frames, utt_pad_proportion=0.006409, over 41.00 utterances.], tot_loss[ctc_loss=0.1729, att_loss=0.2867, loss=0.2639, over 3278567.94 frames. utt_duration=1271 frames, utt_pad_proportion=0.04565, over 10331.56 utterances.], batch size: 41, lr: 2.41e-02, grad_scale: 8.0 2023-03-07 17:05:32,893 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.822e+02 4.406e+02 5.174e+02 1.169e+03, threshold=8.811e+02, percent-clipped=2.0 2023-03-07 17:06:37,177 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7404, 5.1180, 5.4947, 5.3141, 4.4945, 5.4267, 4.9040, 5.5703], device='cuda:2'), covar=tensor([0.0998, 0.1008, 0.0717, 0.1249, 0.4090, 0.1477, 0.0845, 0.0831], device='cuda:2'), in_proj_covar=tensor([0.0457, 0.0300, 0.0308, 0.0362, 0.0519, 0.0308, 0.0254, 0.0312], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 17:06:44,836 INFO [train2.py:809] (2/4) Epoch 4, batch 3700, loss[ctc_loss=0.1187, att_loss=0.2589, loss=0.2309, over 16122.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006418, over 42.00 utterances.], tot_loss[ctc_loss=0.1734, att_loss=0.2868, loss=0.2641, over 3270585.80 frames. utt_duration=1248 frames, utt_pad_proportion=0.05507, over 10497.92 utterances.], batch size: 42, lr: 2.40e-02, grad_scale: 8.0 2023-03-07 17:07:40,541 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2023-03-07 17:08:05,617 INFO [train2.py:809] (2/4) Epoch 4, batch 3750, loss[ctc_loss=0.1924, att_loss=0.3025, loss=0.2805, over 17127.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01393, over 56.00 utterances.], tot_loss[ctc_loss=0.173, att_loss=0.2871, loss=0.2643, over 3279423.06 frames. utt_duration=1235 frames, utt_pad_proportion=0.05619, over 10630.56 utterances.], batch size: 56, lr: 2.40e-02, grad_scale: 8.0 2023-03-07 17:08:13,129 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+02 3.564e+02 4.340e+02 5.643e+02 1.443e+03, threshold=8.680e+02, percent-clipped=5.0 2023-03-07 17:09:23,008 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2023-03-07 17:09:25,255 INFO [train2.py:809] (2/4) Epoch 4, batch 3800, loss[ctc_loss=0.1813, att_loss=0.3019, loss=0.2778, over 17071.00 frames. utt_duration=1221 frames, utt_pad_proportion=0.01767, over 56.00 utterances.], tot_loss[ctc_loss=0.1728, att_loss=0.2873, loss=0.2644, over 3283899.30 frames. utt_duration=1244 frames, utt_pad_proportion=0.0524, over 10568.52 utterances.], batch size: 56, lr: 2.40e-02, grad_scale: 8.0 2023-03-07 17:09:49,064 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.54 vs. limit=2.0 2023-03-07 17:10:46,093 INFO [train2.py:809] (2/4) Epoch 4, batch 3850, loss[ctc_loss=0.1986, att_loss=0.3025, loss=0.2817, over 16615.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005857, over 47.00 utterances.], tot_loss[ctc_loss=0.1732, att_loss=0.2874, loss=0.2645, over 3281487.70 frames. 
utt_duration=1230 frames, utt_pad_proportion=0.05694, over 10680.81 utterances.], batch size: 47, lr: 2.39e-02, grad_scale: 8.0 2023-03-07 17:10:53,737 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.413e+02 3.550e+02 4.549e+02 5.409e+02 1.209e+03, threshold=9.099e+02, percent-clipped=5.0 2023-03-07 17:11:48,096 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5341, 3.7508, 2.9268, 3.4687, 3.7928, 3.5825, 2.6977, 4.4814], device='cuda:2'), covar=tensor([0.1261, 0.0372, 0.1330, 0.0701, 0.0584, 0.0709, 0.1035, 0.0294], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0119, 0.0171, 0.0139, 0.0146, 0.0157, 0.0146, 0.0121], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 17:11:55,722 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2153, 4.7268, 4.6400, 3.2677, 2.1149, 2.5710, 4.6603, 3.7086], device='cuda:2'), covar=tensor([0.0396, 0.0172, 0.0170, 0.1321, 0.5739, 0.2449, 0.0145, 0.1574], device='cuda:2'), in_proj_covar=tensor([0.0257, 0.0161, 0.0185, 0.0175, 0.0367, 0.0314, 0.0163, 0.0292], device='cuda:2'), out_proj_covar=tensor([1.3952e-04, 7.6394e-05, 8.9276e-05, 8.2301e-05, 1.8585e-04, 1.5079e-04, 7.6063e-05, 1.5217e-04], device='cuda:2') 2023-03-07 17:12:00,193 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=15851.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:12:02,974 INFO [train2.py:809] (2/4) Epoch 4, batch 3900, loss[ctc_loss=0.156, att_loss=0.2754, loss=0.2515, over 16115.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.00671, over 42.00 utterances.], tot_loss[ctc_loss=0.1741, att_loss=0.288, loss=0.2652, over 3282176.66 frames. utt_duration=1236 frames, utt_pad_proportion=0.05592, over 10633.00 utterances.], batch size: 42, lr: 2.39e-02, grad_scale: 8.0 2023-03-07 17:12:49,855 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=15883.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:13:13,708 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=15899.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:13:19,702 INFO [train2.py:809] (2/4) Epoch 4, batch 3950, loss[ctc_loss=0.1541, att_loss=0.264, loss=0.242, over 15865.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.01061, over 39.00 utterances.], tot_loss[ctc_loss=0.1744, att_loss=0.2883, loss=0.2655, over 3290108.13 frames. utt_duration=1241 frames, utt_pad_proportion=0.05165, over 10614.28 utterances.], batch size: 39, lr: 2.39e-02, grad_scale: 8.0 2023-03-07 17:13:27,824 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+02 3.629e+02 4.356e+02 5.723e+02 1.612e+03, threshold=8.711e+02, percent-clipped=3.0 2023-03-07 17:14:39,237 INFO [train2.py:809] (2/4) Epoch 5, batch 0, loss[ctc_loss=0.184, att_loss=0.3014, loss=0.2779, over 17478.00 frames. utt_duration=1015 frames, utt_pad_proportion=0.04371, over 69.00 utterances.], tot_loss[ctc_loss=0.184, att_loss=0.3014, loss=0.2779, over 17478.00 frames. utt_duration=1015 frames, utt_pad_proportion=0.04371, over 69.00 utterances.], batch size: 69, lr: 2.22e-02, grad_scale: 8.0 2023-03-07 17:14:39,238 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 17:14:51,959 INFO [train2.py:843] (2/4) Epoch 5, validation: ctc_loss=0.08303, att_loss=0.2543, loss=0.22, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
2023-03-07 17:14:51,960 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 17:15:06,132 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=15944.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:16:13,042 INFO [train2.py:809] (2/4) Epoch 5, batch 50, loss[ctc_loss=0.1487, att_loss=0.2771, loss=0.2515, over 16394.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.006854, over 44.00 utterances.], tot_loss[ctc_loss=0.1708, att_loss=0.2877, loss=0.2643, over 744997.17 frames. utt_duration=1223 frames, utt_pad_proportion=0.05508, over 2439.96 utterances.], batch size: 44, lr: 2.22e-02, grad_scale: 8.0 2023-03-07 17:16:41,392 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16001.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:16:51,838 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 3.417e+02 4.421e+02 5.634e+02 1.181e+03, threshold=8.842e+02, percent-clipped=4.0 2023-03-07 17:17:29,680 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.74 vs. limit=2.0 2023-03-07 17:17:36,496 INFO [train2.py:809] (2/4) Epoch 5, batch 100, loss[ctc_loss=0.1704, att_loss=0.2688, loss=0.2491, over 15363.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.0116, over 35.00 utterances.], tot_loss[ctc_loss=0.1696, att_loss=0.2846, loss=0.2616, over 1304449.37 frames. utt_duration=1239 frames, utt_pad_proportion=0.05259, over 4215.75 utterances.], batch size: 35, lr: 2.21e-02, grad_scale: 8.0 2023-03-07 17:18:18,165 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16062.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:18:55,109 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16085.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:18:56,277 INFO [train2.py:809] (2/4) Epoch 5, batch 150, loss[ctc_loss=0.1474, att_loss=0.277, loss=0.2511, over 16119.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.00671, over 42.00 utterances.], tot_loss[ctc_loss=0.1671, att_loss=0.2839, loss=0.2605, over 1741749.80 frames. utt_duration=1271 frames, utt_pad_proportion=0.04604, over 5487.08 utterances.], batch size: 42, lr: 2.21e-02, grad_scale: 8.0 2023-03-07 17:19:28,480 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.2373, 3.6667, 3.1298, 3.3701, 3.9193, 3.4932, 2.2500, 4.4794], device='cuda:2'), covar=tensor([0.1306, 0.0429, 0.1071, 0.0683, 0.0498, 0.0727, 0.1269, 0.0291], device='cuda:2'), in_proj_covar=tensor([0.0144, 0.0119, 0.0171, 0.0138, 0.0144, 0.0161, 0.0143, 0.0123], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 17:19:31,235 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 3.158e+02 4.095e+02 5.546e+02 1.058e+03, threshold=8.189e+02, percent-clipped=5.0 2023-03-07 17:20:16,134 INFO [train2.py:809] (2/4) Epoch 5, batch 200, loss[ctc_loss=0.1688, att_loss=0.2843, loss=0.2612, over 16481.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006579, over 46.00 utterances.], tot_loss[ctc_loss=0.1653, att_loss=0.2819, loss=0.2586, over 2081492.01 frames. 
utt_duration=1299 frames, utt_pad_proportion=0.03969, over 6416.61 utterances.], batch size: 46, lr: 2.21e-02, grad_scale: 8.0 2023-03-07 17:20:32,685 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16146.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:20:37,367 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16149.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:21:35,475 INFO [train2.py:809] (2/4) Epoch 5, batch 250, loss[ctc_loss=0.1622, att_loss=0.2933, loss=0.2671, over 17264.00 frames. utt_duration=1257 frames, utt_pad_proportion=0.01392, over 55.00 utterances.], tot_loss[ctc_loss=0.1638, att_loss=0.2812, loss=0.2577, over 2347930.21 frames. utt_duration=1284 frames, utt_pad_proportion=0.04213, over 7323.74 utterances.], batch size: 55, lr: 2.20e-02, grad_scale: 8.0 2023-03-07 17:22:11,995 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 3.409e+02 3.964e+02 4.827e+02 7.562e+02, threshold=7.928e+02, percent-clipped=0.0 2023-03-07 17:22:15,447 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16210.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:22:57,401 INFO [train2.py:809] (2/4) Epoch 5, batch 300, loss[ctc_loss=0.1492, att_loss=0.268, loss=0.2442, over 16270.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007798, over 43.00 utterances.], tot_loss[ctc_loss=0.1629, att_loss=0.2806, loss=0.257, over 2550235.35 frames. utt_duration=1251 frames, utt_pad_proportion=0.05219, over 8162.06 utterances.], batch size: 43, lr: 2.20e-02, grad_scale: 8.0 2023-03-07 17:23:02,214 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16239.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:24:18,574 INFO [train2.py:809] (2/4) Epoch 5, batch 350, loss[ctc_loss=0.1451, att_loss=0.2718, loss=0.2465, over 16481.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006609, over 46.00 utterances.], tot_loss[ctc_loss=0.1637, att_loss=0.2815, loss=0.2579, over 2715240.21 frames. utt_duration=1258 frames, utt_pad_proportion=0.05106, over 8645.75 utterances.], batch size: 46, lr: 2.20e-02, grad_scale: 8.0 2023-03-07 17:24:31,636 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8097, 5.1867, 4.6747, 5.3420, 4.5958, 5.0925, 5.4621, 5.1714], device='cuda:2'), covar=tensor([0.0338, 0.0252, 0.0782, 0.0167, 0.0446, 0.0128, 0.0208, 0.0172], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0161, 0.0211, 0.0136, 0.0180, 0.0130, 0.0155, 0.0151], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-07 17:24:56,573 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 3.423e+02 4.365e+02 5.343e+02 1.409e+03, threshold=8.730e+02, percent-clipped=9.0 2023-03-07 17:25:24,176 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5694, 5.2910, 5.1142, 5.1862, 5.2226, 5.3333, 5.0683, 5.0144], device='cuda:2'), covar=tensor([0.1476, 0.0504, 0.0257, 0.0412, 0.0474, 0.0346, 0.0271, 0.0288], device='cuda:2'), in_proj_covar=tensor([0.0369, 0.0214, 0.0152, 0.0193, 0.0238, 0.0261, 0.0201, 0.0233], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 17:25:41,307 INFO [train2.py:809] (2/4) Epoch 5, batch 400, loss[ctc_loss=0.1902, att_loss=0.3128, loss=0.2883, over 17061.00 frames. 
utt_duration=1314 frames, utt_pad_proportion=0.008389, over 52.00 utterances.], tot_loss[ctc_loss=0.1648, att_loss=0.2819, loss=0.2585, over 2837269.24 frames. utt_duration=1242 frames, utt_pad_proportion=0.05525, over 9148.80 utterances.], batch size: 52, lr: 2.20e-02, grad_scale: 8.0 2023-03-07 17:26:15,254 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16357.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:26:37,591 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16371.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:26:38,299 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2023-03-07 17:27:02,333 INFO [train2.py:809] (2/4) Epoch 5, batch 450, loss[ctc_loss=0.1585, att_loss=0.2824, loss=0.2576, over 17017.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008006, over 51.00 utterances.], tot_loss[ctc_loss=0.1642, att_loss=0.2814, loss=0.2579, over 2935396.96 frames. utt_duration=1275 frames, utt_pad_proportion=0.0475, over 9219.24 utterances.], batch size: 51, lr: 2.19e-02, grad_scale: 8.0 2023-03-07 17:27:40,288 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.521e+02 3.844e+02 4.538e+02 5.770e+02 9.247e+02, threshold=9.077e+02, percent-clipped=1.0 2023-03-07 17:28:17,578 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16432.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:28:24,059 INFO [train2.py:809] (2/4) Epoch 5, batch 500, loss[ctc_loss=0.1447, att_loss=0.2534, loss=0.2317, over 15500.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008817, over 36.00 utterances.], tot_loss[ctc_loss=0.1657, att_loss=0.2822, loss=0.2589, over 3011351.71 frames. utt_duration=1269 frames, utt_pad_proportion=0.04977, over 9505.60 utterances.], batch size: 36, lr: 2.19e-02, grad_scale: 8.0 2023-03-07 17:28:32,531 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16441.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:29:13,964 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3186, 4.7427, 4.4175, 4.7034, 4.7775, 4.6157, 4.1268, 4.7828], device='cuda:2'), covar=tensor([0.0119, 0.0154, 0.0122, 0.0113, 0.0128, 0.0095, 0.0344, 0.0222], device='cuda:2'), in_proj_covar=tensor([0.0050, 0.0051, 0.0055, 0.0039, 0.0038, 0.0048, 0.0071, 0.0066], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 17:29:45,244 INFO [train2.py:809] (2/4) Epoch 5, batch 550, loss[ctc_loss=0.1581, att_loss=0.2836, loss=0.2585, over 16480.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.006489, over 46.00 utterances.], tot_loss[ctc_loss=0.166, att_loss=0.282, loss=0.2588, over 3068430.60 frames. utt_duration=1267 frames, utt_pad_proportion=0.05086, over 9700.44 utterances.], batch size: 46, lr: 2.19e-02, grad_scale: 8.0 2023-03-07 17:30:10,106 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.05 vs. limit=5.0 2023-03-07 17:30:16,800 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16505.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:30:22,784 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+02 3.300e+02 4.154e+02 5.468e+02 1.131e+03, threshold=8.309e+02, percent-clipped=7.0 2023-03-07 17:30:25,785 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. 
limit=2.0 2023-03-07 17:31:03,533 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8127, 0.9890, 1.5947, 2.1879, 2.1967, 1.5262, 1.4024, 2.3183], device='cuda:2'), covar=tensor([0.0838, 0.2550, 0.2135, 0.1105, 0.0402, 0.1934, 0.1875, 0.0921], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0096, 0.0101, 0.0080, 0.0082, 0.0086, 0.0100, 0.0086], device='cuda:2'), out_proj_covar=tensor([4.1625e-05, 5.1405e-05, 5.1865e-05, 4.3622e-05, 3.8865e-05, 4.6270e-05, 4.8998e-05, 4.4401e-05], device='cuda:2') 2023-03-07 17:31:06,242 INFO [train2.py:809] (2/4) Epoch 5, batch 600, loss[ctc_loss=0.1538, att_loss=0.2551, loss=0.2348, over 14522.00 frames. utt_duration=1817 frames, utt_pad_proportion=0.03627, over 32.00 utterances.], tot_loss[ctc_loss=0.1673, att_loss=0.2828, loss=0.2597, over 3111542.92 frames. utt_duration=1242 frames, utt_pad_proportion=0.05653, over 10030.04 utterances.], batch size: 32, lr: 2.18e-02, grad_scale: 8.0 2023-03-07 17:31:12,941 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=16539.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:31:41,818 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2146, 4.3821, 4.2813, 4.5686, 2.1918, 4.8092, 2.3547, 2.0477], device='cuda:2'), covar=tensor([0.0202, 0.0149, 0.0974, 0.0245, 0.2756, 0.0159, 0.2082, 0.1919], device='cuda:2'), in_proj_covar=tensor([0.0099, 0.0095, 0.0238, 0.0109, 0.0224, 0.0093, 0.0216, 0.0199], device='cuda:2'), out_proj_covar=tensor([9.5103e-05, 9.3053e-05, 2.0668e-04, 9.6008e-05, 1.9446e-04, 8.7451e-05, 1.8535e-04, 1.7128e-04], device='cuda:2') 2023-03-07 17:32:27,514 INFO [train2.py:809] (2/4) Epoch 5, batch 650, loss[ctc_loss=0.1875, att_loss=0.2984, loss=0.2762, over 17337.00 frames. utt_duration=1102 frames, utt_pad_proportion=0.03648, over 63.00 utterances.], tot_loss[ctc_loss=0.1661, att_loss=0.2827, loss=0.2594, over 3151280.67 frames. utt_duration=1243 frames, utt_pad_proportion=0.05613, over 10152.63 utterances.], batch size: 63, lr: 2.18e-02, grad_scale: 8.0 2023-03-07 17:32:29,904 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=16587.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:33:07,232 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 3.372e+02 4.098e+02 5.133e+02 8.733e+02, threshold=8.196e+02, percent-clipped=2.0 2023-03-07 17:33:51,405 INFO [train2.py:809] (2/4) Epoch 5, batch 700, loss[ctc_loss=0.1574, att_loss=0.2766, loss=0.2528, over 15951.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.005681, over 41.00 utterances.], tot_loss[ctc_loss=0.1652, att_loss=0.2831, loss=0.2595, over 3188818.60 frames. 
utt_duration=1241 frames, utt_pad_proportion=0.05343, over 10293.60 utterances.], batch size: 41, lr: 2.18e-02, grad_scale: 8.0 2023-03-07 17:34:26,836 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=16657.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:34:26,929 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9032, 1.0957, 2.1344, 2.1236, 2.7483, 1.3305, 1.7661, 1.9334], device='cuda:2'), covar=tensor([0.0504, 0.2133, 0.1890, 0.0902, 0.0512, 0.1942, 0.1369, 0.1230], device='cuda:2'), in_proj_covar=tensor([0.0095, 0.0095, 0.0102, 0.0080, 0.0082, 0.0085, 0.0100, 0.0088], device='cuda:2'), out_proj_covar=tensor([4.0944e-05, 5.0625e-05, 5.2412e-05, 4.2920e-05, 3.8923e-05, 4.5532e-05, 4.8999e-05, 4.5452e-05], device='cuda:2') 2023-03-07 17:34:28,921 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.87 vs. limit=5.0 2023-03-07 17:35:14,450 INFO [train2.py:809] (2/4) Epoch 5, batch 750, loss[ctc_loss=0.1746, att_loss=0.2812, loss=0.2599, over 16142.00 frames. utt_duration=1539 frames, utt_pad_proportion=0.004682, over 42.00 utterances.], tot_loss[ctc_loss=0.164, att_loss=0.282, loss=0.2584, over 3205365.63 frames. utt_duration=1260 frames, utt_pad_proportion=0.04957, over 10184.58 utterances.], batch size: 42, lr: 2.17e-02, grad_scale: 8.0 2023-03-07 17:35:45,800 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=16705.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:35:51,950 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+02 3.355e+02 3.924e+02 5.306e+02 2.112e+03, threshold=7.847e+02, percent-clipped=6.0 2023-03-07 17:36:21,556 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16727.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:36:36,909 INFO [train2.py:809] (2/4) Epoch 5, batch 800, loss[ctc_loss=0.1618, att_loss=0.2693, loss=0.2478, over 13207.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.09623, over 29.00 utterances.], tot_loss[ctc_loss=0.1665, att_loss=0.2837, loss=0.2603, over 3214954.77 frames. utt_duration=1226 frames, utt_pad_proportion=0.05933, over 10505.72 utterances.], batch size: 29, lr: 2.17e-02, grad_scale: 8.0 2023-03-07 17:36:45,935 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=16741.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:37:59,754 INFO [train2.py:809] (2/4) Epoch 5, batch 850, loss[ctc_loss=0.1491, att_loss=0.2801, loss=0.2539, over 17031.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007913, over 51.00 utterances.], tot_loss[ctc_loss=0.1672, att_loss=0.2841, loss=0.2607, over 3227409.63 frames. 
utt_duration=1220 frames, utt_pad_proportion=0.06143, over 10594.50 utterances.], batch size: 51, lr: 2.17e-02, grad_scale: 8.0 2023-03-07 17:38:04,497 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=16789.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:38:30,532 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=16805.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:38:36,314 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 3.471e+02 4.444e+02 5.932e+02 1.311e+03, threshold=8.889e+02, percent-clipped=13.0 2023-03-07 17:39:18,090 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5379, 2.2193, 3.1603, 4.2371, 3.8922, 4.1046, 2.7792, 1.7611], device='cuda:2'), covar=tensor([0.0542, 0.3094, 0.1248, 0.0717, 0.0493, 0.0251, 0.1922, 0.3091], device='cuda:2'), in_proj_covar=tensor([0.0141, 0.0190, 0.0187, 0.0140, 0.0128, 0.0114, 0.0181, 0.0171], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 17:39:22,339 INFO [train2.py:809] (2/4) Epoch 5, batch 900, loss[ctc_loss=0.1882, att_loss=0.2844, loss=0.2652, over 16968.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.006608, over 50.00 utterances.], tot_loss[ctc_loss=0.167, att_loss=0.284, loss=0.2606, over 3238127.46 frames. utt_duration=1217 frames, utt_pad_proportion=0.06275, over 10658.78 utterances.], batch size: 50, lr: 2.16e-02, grad_scale: 8.0 2023-03-07 17:39:23,056 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.51 vs. limit=5.0 2023-03-07 17:39:48,828 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=16853.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:40:13,297 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16868.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 17:40:25,265 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2023-03-07 17:40:42,555 INFO [train2.py:809] (2/4) Epoch 5, batch 950, loss[ctc_loss=0.1866, att_loss=0.3028, loss=0.2796, over 16966.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007641, over 50.00 utterances.], tot_loss[ctc_loss=0.1677, att_loss=0.2844, loss=0.2611, over 3248544.75 frames. 
utt_duration=1225 frames, utt_pad_proportion=0.06104, over 10623.74 utterances.], batch size: 50, lr: 2.16e-02, grad_scale: 8.0 2023-03-07 17:40:49,145 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8195, 6.0075, 5.3103, 5.8484, 5.6632, 5.3812, 5.4578, 5.3005], device='cuda:2'), covar=tensor([0.1119, 0.0858, 0.0890, 0.0681, 0.0577, 0.1226, 0.2198, 0.2038], device='cuda:2'), in_proj_covar=tensor([0.0322, 0.0361, 0.0294, 0.0282, 0.0265, 0.0345, 0.0388, 0.0370], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 17:41:14,643 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16906.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:41:18,939 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+02 3.605e+02 4.438e+02 5.826e+02 1.823e+03, threshold=8.877e+02, percent-clipped=6.0 2023-03-07 17:41:52,650 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16929.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 17:42:04,887 INFO [train2.py:809] (2/4) Epoch 5, batch 1000, loss[ctc_loss=0.1356, att_loss=0.2588, loss=0.2342, over 16121.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006603, over 42.00 utterances.], tot_loss[ctc_loss=0.1657, att_loss=0.2835, loss=0.26, over 3261195.81 frames. utt_duration=1248 frames, utt_pad_proportion=0.05296, over 10469.03 utterances.], batch size: 42, lr: 2.16e-02, grad_scale: 8.0 2023-03-07 17:42:54,677 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16967.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 17:43:25,865 INFO [train2.py:809] (2/4) Epoch 5, batch 1050, loss[ctc_loss=0.1706, att_loss=0.2904, loss=0.2664, over 17055.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.00881, over 53.00 utterances.], tot_loss[ctc_loss=0.1657, att_loss=0.2829, loss=0.2595, over 3261898.29 frames. utt_duration=1245 frames, utt_pad_proportion=0.05427, over 10491.23 utterances.], batch size: 53, lr: 2.16e-02, grad_scale: 8.0 2023-03-07 17:44:02,523 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.326e+02 3.274e+02 4.192e+02 5.406e+02 1.067e+03, threshold=8.384e+02, percent-clipped=5.0 2023-03-07 17:44:12,888 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6558, 5.8427, 5.2409, 5.8225, 5.5491, 5.2264, 5.2939, 5.0550], device='cuda:2'), covar=tensor([0.1307, 0.0775, 0.0745, 0.0562, 0.0567, 0.1214, 0.2180, 0.2003], device='cuda:2'), in_proj_covar=tensor([0.0327, 0.0362, 0.0293, 0.0283, 0.0269, 0.0348, 0.0390, 0.0372], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 17:44:32,628 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=17027.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:44:47,446 INFO [train2.py:809] (2/4) Epoch 5, batch 1100, loss[ctc_loss=0.1758, att_loss=0.2949, loss=0.2711, over 17034.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.008962, over 52.00 utterances.], tot_loss[ctc_loss=0.1654, att_loss=0.2825, loss=0.2591, over 3266989.30 frames. 
utt_duration=1249 frames, utt_pad_proportion=0.05397, over 10477.41 utterances.], batch size: 52, lr: 2.15e-02, grad_scale: 8.0 2023-03-07 17:45:50,935 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=17075.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:46:09,758 INFO [train2.py:809] (2/4) Epoch 5, batch 1150, loss[ctc_loss=0.1269, att_loss=0.2496, loss=0.2251, over 15499.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008849, over 36.00 utterances.], tot_loss[ctc_loss=0.1646, att_loss=0.2812, loss=0.2579, over 3249740.96 frames. utt_duration=1253 frames, utt_pad_proportion=0.05804, over 10383.07 utterances.], batch size: 36, lr: 2.15e-02, grad_scale: 8.0 2023-03-07 17:46:45,383 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.659e+02 3.410e+02 4.345e+02 5.323e+02 1.056e+03, threshold=8.690e+02, percent-clipped=4.0 2023-03-07 17:47:22,583 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2023-03-07 17:47:31,219 INFO [train2.py:809] (2/4) Epoch 5, batch 1200, loss[ctc_loss=0.1603, att_loss=0.2593, loss=0.2395, over 15464.00 frames. utt_duration=1720 frames, utt_pad_proportion=0.01102, over 36.00 utterances.], tot_loss[ctc_loss=0.165, att_loss=0.2815, loss=0.2582, over 3256441.83 frames. utt_duration=1273 frames, utt_pad_proportion=0.05145, over 10245.35 utterances.], batch size: 36, lr: 2.15e-02, grad_scale: 8.0 2023-03-07 17:47:49,071 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4559, 2.4294, 3.5438, 2.9896, 3.2051, 4.5406, 4.2368, 3.1407], device='cuda:2'), covar=tensor([0.0358, 0.2014, 0.1175, 0.1345, 0.1215, 0.0499, 0.0571, 0.1488], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0196, 0.0192, 0.0186, 0.0201, 0.0187, 0.0154, 0.0193], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 17:48:32,281 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.63 vs. limit=5.0 2023-03-07 17:48:37,219 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8704, 1.4804, 1.9555, 1.7928, 2.6678, 0.7301, 1.8871, 2.0418], device='cuda:2'), covar=tensor([0.0571, 0.1891, 0.1853, 0.0839, 0.0336, 0.2070, 0.1305, 0.1262], device='cuda:2'), in_proj_covar=tensor([0.0092, 0.0099, 0.0100, 0.0078, 0.0084, 0.0081, 0.0097, 0.0090], device='cuda:2'), out_proj_covar=tensor([4.1290e-05, 5.1489e-05, 5.1918e-05, 4.1557e-05, 3.9435e-05, 4.3910e-05, 4.8237e-05, 4.7583e-05], device='cuda:2') 2023-03-07 17:48:52,437 INFO [train2.py:809] (2/4) Epoch 5, batch 1250, loss[ctc_loss=0.1725, att_loss=0.2759, loss=0.2552, over 16319.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006703, over 45.00 utterances.], tot_loss[ctc_loss=0.1667, att_loss=0.2826, loss=0.2595, over 3263064.29 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05541, over 10521.51 utterances.], batch size: 45, lr: 2.14e-02, grad_scale: 8.0 2023-03-07 17:48:57,269 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17189.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:49:21,220 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6473, 4.2408, 4.2477, 4.2899, 2.3488, 4.0340, 2.3692, 2.2169], device='cuda:2'), covar=tensor([0.0349, 0.0130, 0.0562, 0.0185, 0.2276, 0.0240, 0.1597, 0.1432], device='cuda:2'), in_proj_covar=tensor([0.0104, 0.0096, 0.0232, 0.0112, 0.0227, 0.0097, 0.0214, 0.0200], device='cuda:2'), out_proj_covar=tensor([9.9356e-05, 9.4643e-05, 2.0229e-04, 9.9034e-05, 1.9729e-04, 9.1547e-05, 1.8380e-04, 1.7244e-04], device='cuda:2') 2023-03-07 17:49:29,491 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 3.900e+02 4.861e+02 6.296e+02 1.076e+03, threshold=9.721e+02, percent-clipped=3.0 2023-03-07 17:49:55,882 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17224.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 17:50:11,082 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.57 vs. limit=2.0 2023-03-07 17:50:14,607 INFO [train2.py:809] (2/4) Epoch 5, batch 1300, loss[ctc_loss=0.1521, att_loss=0.2473, loss=0.2282, over 15865.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.01047, over 39.00 utterances.], tot_loss[ctc_loss=0.1657, att_loss=0.2818, loss=0.2586, over 3266564.82 frames. utt_duration=1250 frames, utt_pad_proportion=0.05354, over 10469.09 utterances.], batch size: 39, lr: 2.14e-02, grad_scale: 8.0 2023-03-07 17:50:36,826 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17250.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:50:56,143 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17262.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 17:51:35,789 INFO [train2.py:809] (2/4) Epoch 5, batch 1350, loss[ctc_loss=0.1673, att_loss=0.2931, loss=0.268, over 17000.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.009098, over 51.00 utterances.], tot_loss[ctc_loss=0.1653, att_loss=0.2822, loss=0.2588, over 3276211.93 frames. utt_duration=1254 frames, utt_pad_proportion=0.05068, over 10465.64 utterances.], batch size: 51, lr: 2.14e-02, grad_scale: 8.0 2023-03-07 17:52:12,508 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 3.130e+02 4.012e+02 5.410e+02 1.134e+03, threshold=8.024e+02, percent-clipped=3.0 2023-03-07 17:52:57,026 INFO [train2.py:809] (2/4) Epoch 5, batch 1400, loss[ctc_loss=0.1593, att_loss=0.293, loss=0.2662, over 17021.00 frames. utt_duration=1286 frames, utt_pad_proportion=0.01153, over 53.00 utterances.], tot_loss[ctc_loss=0.1656, att_loss=0.2829, loss=0.2595, over 3282488.46 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.05355, over 10640.95 utterances.], batch size: 53, lr: 2.14e-02, grad_scale: 8.0 2023-03-07 17:52:59,001 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9815, 1.2649, 2.2412, 2.3321, 3.0169, 1.2899, 1.3544, 2.2306], device='cuda:2'), covar=tensor([0.0474, 0.2306, 0.2233, 0.0876, 0.0435, 0.1774, 0.1881, 0.1096], device='cuda:2'), in_proj_covar=tensor([0.0090, 0.0097, 0.0100, 0.0078, 0.0085, 0.0082, 0.0100, 0.0089], device='cuda:2'), out_proj_covar=tensor([4.0860e-05, 5.1327e-05, 5.2004e-05, 4.1556e-05, 3.9949e-05, 4.4756e-05, 4.9709e-05, 4.7364e-05], device='cuda:2') 2023-03-07 17:53:12,946 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9444, 3.8032, 3.8022, 2.7091, 3.7315, 3.6004, 3.5332, 2.5398], device='cuda:2'), covar=tensor([0.0102, 0.0120, 0.0154, 0.0763, 0.0096, 0.0333, 0.0257, 0.1204], device='cuda:2'), in_proj_covar=tensor([0.0050, 0.0058, 0.0047, 0.0093, 0.0054, 0.0068, 0.0075, 0.0098], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 17:53:38,719 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17362.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:54:13,815 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17383.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 17:54:18,018 INFO [train2.py:809] (2/4) Epoch 5, batch 1450, loss[ctc_loss=0.1472, att_loss=0.2721, loss=0.2471, over 16295.00 frames. utt_duration=1517 frames, utt_pad_proportion=0.006427, over 43.00 utterances.], tot_loss[ctc_loss=0.1649, att_loss=0.2823, loss=0.2588, over 3279832.54 frames. utt_duration=1248 frames, utt_pad_proportion=0.05153, over 10525.08 utterances.], batch size: 43, lr: 2.13e-02, grad_scale: 8.0 2023-03-07 17:54:55,330 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+02 3.556e+02 4.184e+02 5.156e+02 1.348e+03, threshold=8.369e+02, percent-clipped=7.0 2023-03-07 17:55:19,586 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17423.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:55:31,671 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.61 vs. limit=2.0 2023-03-07 17:55:39,518 INFO [train2.py:809] (2/4) Epoch 5, batch 1500, loss[ctc_loss=0.1611, att_loss=0.2556, loss=0.2367, over 15762.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009264, over 38.00 utterances.], tot_loss[ctc_loss=0.1632, att_loss=0.2807, loss=0.2572, over 3261937.48 frames. utt_duration=1249 frames, utt_pad_proportion=0.05626, over 10455.74 utterances.], batch size: 38, lr: 2.13e-02, grad_scale: 8.0 2023-03-07 17:55:52,454 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17444.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 17:56:21,617 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9369, 4.5358, 4.7833, 4.5604, 5.3315, 5.1729, 4.8181, 2.2836], device='cuda:2'), covar=tensor([0.0257, 0.0601, 0.0253, 0.0359, 0.1009, 0.0142, 0.0290, 0.3010], device='cuda:2'), in_proj_covar=tensor([0.0131, 0.0122, 0.0113, 0.0125, 0.0276, 0.0128, 0.0112, 0.0244], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 17:56:57,709 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. 
limit=2.0 2023-03-07 17:56:59,733 INFO [train2.py:809] (2/4) Epoch 5, batch 1550, loss[ctc_loss=0.1809, att_loss=0.2838, loss=0.2632, over 16389.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007591, over 44.00 utterances.], tot_loss[ctc_loss=0.1615, att_loss=0.2794, loss=0.2558, over 3261483.87 frames. utt_duration=1272 frames, utt_pad_proportion=0.05048, over 10267.61 utterances.], batch size: 44, lr: 2.13e-02, grad_scale: 8.0 2023-03-07 17:57:36,008 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 3.523e+02 4.297e+02 5.219e+02 1.408e+03, threshold=8.593e+02, percent-clipped=2.0 2023-03-07 17:58:01,433 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=17524.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 17:58:11,546 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2204, 3.7348, 3.4710, 3.6680, 2.3989, 3.6733, 2.5316, 2.3632], device='cuda:2'), covar=tensor([0.0365, 0.0165, 0.0811, 0.0275, 0.2360, 0.0237, 0.1603, 0.1583], device='cuda:2'), in_proj_covar=tensor([0.0106, 0.0098, 0.0237, 0.0115, 0.0230, 0.0098, 0.0219, 0.0205], device='cuda:2'), out_proj_covar=tensor([1.0094e-04, 9.6429e-05, 2.0740e-04, 1.0063e-04, 2.0041e-04, 9.2389e-05, 1.8719e-04, 1.7672e-04], device='cuda:2') 2023-03-07 17:58:20,672 INFO [train2.py:809] (2/4) Epoch 5, batch 1600, loss[ctc_loss=0.1715, att_loss=0.2905, loss=0.2667, over 16862.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.007772, over 49.00 utterances.], tot_loss[ctc_loss=0.1636, att_loss=0.2813, loss=0.2578, over 3269958.83 frames. utt_duration=1243 frames, utt_pad_proportion=0.05579, over 10534.97 utterances.], batch size: 49, lr: 2.12e-02, grad_scale: 8.0 2023-03-07 17:58:34,854 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17545.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 17:59:03,257 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=17562.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 17:59:20,713 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=17572.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 17:59:41,750 INFO [train2.py:809] (2/4) Epoch 5, batch 1650, loss[ctc_loss=0.2044, att_loss=0.3014, loss=0.282, over 16393.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007167, over 44.00 utterances.], tot_loss[ctc_loss=0.1639, att_loss=0.2816, loss=0.2581, over 3270708.53 frames. utt_duration=1223 frames, utt_pad_proportion=0.06067, over 10707.75 utterances.], batch size: 44, lr: 2.12e-02, grad_scale: 8.0 2023-03-07 17:59:42,752 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.97 vs. limit=2.0 2023-03-07 18:00:19,756 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 3.498e+02 4.244e+02 5.016e+02 1.623e+03, threshold=8.488e+02, percent-clipped=3.0 2023-03-07 18:00:22,203 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=17610.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:01:02,641 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.30 vs. limit=2.0 2023-03-07 18:01:04,844 INFO [train2.py:809] (2/4) Epoch 5, batch 1700, loss[ctc_loss=0.1521, att_loss=0.2606, loss=0.2389, over 15783.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007835, over 38.00 utterances.], tot_loss[ctc_loss=0.1636, att_loss=0.2814, loss=0.2579, over 3268937.64 frames. 
utt_duration=1221 frames, utt_pad_proportion=0.06301, over 10720.04 utterances.], batch size: 38, lr: 2.12e-02, grad_scale: 8.0 2023-03-07 18:01:24,790 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2023-03-07 18:01:38,731 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3663, 4.5882, 4.6835, 4.6019, 4.8302, 4.7274, 4.4735, 4.3391], device='cuda:2'), covar=tensor([0.1213, 0.0788, 0.0216, 0.0470, 0.0254, 0.0307, 0.0290, 0.0363], device='cuda:2'), in_proj_covar=tensor([0.0378, 0.0231, 0.0157, 0.0203, 0.0245, 0.0271, 0.0212, 0.0240], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 18:01:59,114 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4344, 3.8272, 3.2390, 3.4173, 3.8548, 3.6937, 2.4696, 4.4694], device='cuda:2'), covar=tensor([0.1345, 0.0477, 0.1036, 0.0650, 0.0642, 0.0670, 0.1153, 0.0356], device='cuda:2'), in_proj_covar=tensor([0.0156, 0.0128, 0.0178, 0.0148, 0.0158, 0.0171, 0.0152, 0.0143], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 18:02:26,017 INFO [train2.py:809] (2/4) Epoch 5, batch 1750, loss[ctc_loss=0.1266, att_loss=0.2485, loss=0.2241, over 16171.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.007435, over 41.00 utterances.], tot_loss[ctc_loss=0.162, att_loss=0.2804, loss=0.2567, over 3264444.54 frames. utt_duration=1262 frames, utt_pad_proportion=0.05415, over 10362.61 utterances.], batch size: 41, lr: 2.12e-02, grad_scale: 8.0 2023-03-07 18:02:47,893 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17700.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:03:02,977 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+02 3.443e+02 4.314e+02 5.475e+02 1.397e+03, threshold=8.628e+02, percent-clipped=3.0 2023-03-07 18:03:17,848 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17718.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:03:37,962 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17730.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:03:38,774 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.73 vs. limit=5.0 2023-03-07 18:03:42,608 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3789, 1.5136, 2.0648, 2.1694, 2.8940, 2.3258, 2.1379, 3.2571], device='cuda:2'), covar=tensor([0.0342, 0.2414, 0.2203, 0.1361, 0.0587, 0.0983, 0.1815, 0.0681], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0103, 0.0107, 0.0091, 0.0088, 0.0083, 0.0105, 0.0090], device='cuda:2'), out_proj_covar=tensor([4.0644e-05, 5.4908e-05, 5.5043e-05, 4.5801e-05, 4.1002e-05, 4.5776e-05, 5.2401e-05, 4.8381e-05], device='cuda:2') 2023-03-07 18:03:46,844 INFO [train2.py:809] (2/4) Epoch 5, batch 1800, loss[ctc_loss=0.1138, att_loss=0.2332, loss=0.2093, over 15391.00 frames. utt_duration=1761 frames, utt_pad_proportion=0.009834, over 35.00 utterances.], tot_loss[ctc_loss=0.1608, att_loss=0.279, loss=0.2554, over 3255296.80 frames. 
utt_duration=1253 frames, utt_pad_proportion=0.05836, over 10403.29 utterances.], batch size: 35, lr: 2.11e-02, grad_scale: 8.0 2023-03-07 18:03:51,602 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17739.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 18:04:16,828 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6448, 5.8915, 5.1891, 5.8453, 5.5981, 5.3108, 5.3329, 5.1178], device='cuda:2'), covar=tensor([0.1258, 0.0779, 0.0821, 0.0587, 0.0605, 0.1277, 0.1920, 0.1901], device='cuda:2'), in_proj_covar=tensor([0.0330, 0.0368, 0.0288, 0.0294, 0.0274, 0.0355, 0.0399, 0.0366], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 18:04:27,121 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17761.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:04:34,823 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.82 vs. limit=5.0 2023-03-07 18:05:07,324 INFO [train2.py:809] (2/4) Epoch 5, batch 1850, loss[ctc_loss=0.1644, att_loss=0.3037, loss=0.2758, over 16758.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006086, over 48.00 utterances.], tot_loss[ctc_loss=0.1615, att_loss=0.2796, loss=0.256, over 3256421.61 frames. utt_duration=1225 frames, utt_pad_proportion=0.0656, over 10646.74 utterances.], batch size: 48, lr: 2.11e-02, grad_scale: 8.0 2023-03-07 18:05:09,087 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1265, 5.4410, 5.3416, 5.3452, 5.5791, 5.4810, 5.2230, 5.0696], device='cuda:2'), covar=tensor([0.0787, 0.0381, 0.0186, 0.0365, 0.0204, 0.0206, 0.0248, 0.0235], device='cuda:2'), in_proj_covar=tensor([0.0379, 0.0229, 0.0159, 0.0202, 0.0247, 0.0274, 0.0212, 0.0239], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 18:05:15,555 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17791.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:05:45,053 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 3.254e+02 3.960e+02 5.162e+02 1.019e+03, threshold=7.921e+02, percent-clipped=3.0 2023-03-07 18:06:28,241 INFO [train2.py:809] (2/4) Epoch 5, batch 1900, loss[ctc_loss=0.1631, att_loss=0.2932, loss=0.2672, over 17304.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01174, over 55.00 utterances.], tot_loss[ctc_loss=0.162, att_loss=0.2796, loss=0.2561, over 3255819.25 frames. utt_duration=1224 frames, utt_pad_proportion=0.06539, over 10648.83 utterances.], batch size: 55, lr: 2.11e-02, grad_scale: 8.0 2023-03-07 18:06:42,550 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=17845.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:07:49,564 INFO [train2.py:809] (2/4) Epoch 5, batch 1950, loss[ctc_loss=0.14, att_loss=0.25, loss=0.228, over 15785.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.008333, over 38.00 utterances.], tot_loss[ctc_loss=0.1633, att_loss=0.2802, loss=0.2568, over 3246152.89 frames. 
utt_duration=1214 frames, utt_pad_proportion=0.07043, over 10704.65 utterances.], batch size: 38, lr: 2.11e-02, grad_scale: 8.0 2023-03-07 18:08:00,640 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=17893.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:08:14,089 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4417, 2.9686, 3.6523, 2.5332, 3.2791, 4.5168, 4.3486, 3.2441], device='cuda:2'), covar=tensor([0.0413, 0.1643, 0.1015, 0.1570, 0.1225, 0.0500, 0.0509, 0.1377], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0204, 0.0201, 0.0186, 0.0207, 0.0190, 0.0161, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 18:08:27,507 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 3.619e+02 4.276e+02 5.445e+02 1.197e+03, threshold=8.551e+02, percent-clipped=4.0 2023-03-07 18:09:09,709 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6161, 2.3250, 4.9042, 3.8149, 3.0397, 4.0811, 4.8255, 4.7110], device='cuda:2'), covar=tensor([0.0159, 0.2070, 0.0156, 0.1271, 0.2364, 0.0286, 0.0119, 0.0214], device='cuda:2'), in_proj_covar=tensor([0.0134, 0.0243, 0.0121, 0.0299, 0.0316, 0.0184, 0.0109, 0.0134], device='cuda:2'), out_proj_covar=tensor([1.1449e-04, 1.9259e-04, 1.0360e-04, 2.3855e-04, 2.5411e-04, 1.5537e-04, 9.4408e-05, 1.1766e-04], device='cuda:2') 2023-03-07 18:09:10,811 INFO [train2.py:809] (2/4) Epoch 5, batch 2000, loss[ctc_loss=0.126, att_loss=0.2568, loss=0.2306, over 16162.00 frames. utt_duration=1578 frames, utt_pad_proportion=0.007987, over 41.00 utterances.], tot_loss[ctc_loss=0.1622, att_loss=0.2805, loss=0.2568, over 3264965.12 frames. utt_duration=1237 frames, utt_pad_proportion=0.05988, over 10569.73 utterances.], batch size: 41, lr: 2.10e-02, grad_scale: 8.0 2023-03-07 18:09:15,892 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8924, 4.9186, 4.6837, 2.7694, 4.7262, 4.0270, 4.0309, 2.2880], device='cuda:2'), covar=tensor([0.0166, 0.0079, 0.0280, 0.1032, 0.0086, 0.0249, 0.0346, 0.1594], device='cuda:2'), in_proj_covar=tensor([0.0051, 0.0059, 0.0050, 0.0095, 0.0057, 0.0069, 0.0077, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 18:09:50,809 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17960.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:10:22,054 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. 
limit=2.0 2023-03-07 18:10:24,658 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6387, 1.5246, 2.1675, 1.8587, 2.7044, 2.3881, 2.0207, 3.0723], device='cuda:2'), covar=tensor([0.0913, 0.2658, 0.2258, 0.1557, 0.0791, 0.1142, 0.1749, 0.0700], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0098, 0.0101, 0.0088, 0.0083, 0.0082, 0.0102, 0.0087], device='cuda:2'), out_proj_covar=tensor([4.0319e-05, 5.2939e-05, 5.3204e-05, 4.4542e-05, 3.8963e-05, 4.4723e-05, 5.1670e-05, 4.6963e-05], device='cuda:2') 2023-03-07 18:10:30,710 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7640, 5.2114, 5.0829, 5.1607, 5.3173, 5.1391, 4.9103, 4.7829], device='cuda:2'), covar=tensor([0.1153, 0.0413, 0.0197, 0.0327, 0.0250, 0.0267, 0.0301, 0.0312], device='cuda:2'), in_proj_covar=tensor([0.0377, 0.0225, 0.0158, 0.0197, 0.0244, 0.0272, 0.0208, 0.0239], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 18:10:32,018 INFO [train2.py:809] (2/4) Epoch 5, batch 2050, loss[ctc_loss=0.1489, att_loss=0.2734, loss=0.2485, over 16375.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.008271, over 44.00 utterances.], tot_loss[ctc_loss=0.162, att_loss=0.2805, loss=0.2568, over 3252342.35 frames. utt_duration=1227 frames, utt_pad_proportion=0.06257, over 10613.45 utterances.], batch size: 44, lr: 2.10e-02, grad_scale: 8.0 2023-03-07 18:11:14,244 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.268e+02 3.678e+02 4.390e+02 5.847e+02 1.003e+03, threshold=8.779e+02, percent-clipped=6.0 2023-03-07 18:11:29,686 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18018.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:11:34,414 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18021.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:11:57,536 INFO [train2.py:809] (2/4) Epoch 5, batch 2100, loss[ctc_loss=0.1441, att_loss=0.2831, loss=0.2553, over 16743.00 frames. utt_duration=1397 frames, utt_pad_proportion=0.007946, over 48.00 utterances.], tot_loss[ctc_loss=0.1626, att_loss=0.2807, loss=0.2571, over 3250094.79 frames. 
utt_duration=1203 frames, utt_pad_proportion=0.07056, over 10818.79 utterances.], batch size: 48, lr: 2.10e-02, grad_scale: 8.0 2023-03-07 18:12:02,537 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18039.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 18:12:15,511 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1748, 5.3337, 5.7311, 5.6915, 5.4402, 6.0384, 5.0777, 6.1448], device='cuda:2'), covar=tensor([0.0438, 0.0430, 0.0364, 0.0635, 0.1776, 0.0627, 0.0408, 0.0394], device='cuda:2'), in_proj_covar=tensor([0.0497, 0.0318, 0.0325, 0.0393, 0.0546, 0.0331, 0.0284, 0.0351], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 18:12:30,804 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18056.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:12:36,873 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7171, 5.1656, 4.9571, 4.9724, 5.2141, 5.0063, 4.8981, 4.7122], device='cuda:2'), covar=tensor([0.1153, 0.0338, 0.0250, 0.0517, 0.0236, 0.0308, 0.0250, 0.0311], device='cuda:2'), in_proj_covar=tensor([0.0375, 0.0224, 0.0156, 0.0200, 0.0243, 0.0269, 0.0206, 0.0238], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 18:12:46,838 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18066.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:13:17,728 INFO [train2.py:809] (2/4) Epoch 5, batch 2150, loss[ctc_loss=0.1626, att_loss=0.2934, loss=0.2673, over 17324.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.01061, over 55.00 utterances.], tot_loss[ctc_loss=0.164, att_loss=0.2818, loss=0.2582, over 3258565.00 frames. 
utt_duration=1189 frames, utt_pad_proportion=0.07175, over 10980.71 utterances.], batch size: 55, lr: 2.09e-02, grad_scale: 8.0 2023-03-07 18:13:17,962 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18086.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:13:19,446 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18087.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 18:13:21,095 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4885, 4.9511, 4.6592, 4.9602, 5.0289, 4.7462, 4.0325, 4.8850], device='cuda:2'), covar=tensor([0.0089, 0.0107, 0.0100, 0.0068, 0.0072, 0.0082, 0.0383, 0.0150], device='cuda:2'), in_proj_covar=tensor([0.0052, 0.0053, 0.0056, 0.0038, 0.0038, 0.0048, 0.0071, 0.0067], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 18:13:40,799 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0437, 4.2795, 4.5878, 4.6914, 2.0258, 4.5804, 2.1597, 1.9443], device='cuda:2'), covar=tensor([0.0307, 0.0181, 0.0703, 0.0166, 0.3200, 0.0181, 0.2315, 0.2021], device='cuda:2'), in_proj_covar=tensor([0.0110, 0.0104, 0.0243, 0.0112, 0.0233, 0.0097, 0.0220, 0.0205], device='cuda:2'), out_proj_covar=tensor([1.0416e-04, 1.0138e-04, 2.1242e-04, 9.8043e-05, 2.0409e-04, 9.1072e-05, 1.8985e-04, 1.7831e-04], device='cuda:2') 2023-03-07 18:13:54,648 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.249e+02 3.704e+02 4.287e+02 5.437e+02 1.158e+03, threshold=8.574e+02, percent-clipped=4.0 2023-03-07 18:13:59,118 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.1959, 2.9108, 2.5802, 2.7348, 3.1754, 2.9168, 2.2808, 3.2360], device='cuda:2'), covar=tensor([0.1026, 0.0333, 0.0844, 0.0603, 0.0446, 0.0605, 0.0843, 0.0389], device='cuda:2'), in_proj_covar=tensor([0.0153, 0.0130, 0.0178, 0.0143, 0.0157, 0.0170, 0.0151, 0.0144], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 18:14:38,377 INFO [train2.py:809] (2/4) Epoch 5, batch 2200, loss[ctc_loss=0.1916, att_loss=0.2781, loss=0.2608, over 16389.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.007712, over 44.00 utterances.], tot_loss[ctc_loss=0.1634, att_loss=0.2812, loss=0.2577, over 3261800.45 frames. utt_duration=1191 frames, utt_pad_proportion=0.07114, over 10968.73 utterances.], batch size: 44, lr: 2.09e-02, grad_scale: 8.0 2023-03-07 18:14:39,668 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2023-03-07 18:15:41,246 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18174.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:15:42,900 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18175.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:16:00,105 INFO [train2.py:809] (2/4) Epoch 5, batch 2250, loss[ctc_loss=0.1428, att_loss=0.2879, loss=0.2589, over 16618.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005752, over 47.00 utterances.], tot_loss[ctc_loss=0.1623, att_loss=0.2807, loss=0.257, over 3267001.51 frames. 
utt_duration=1210 frames, utt_pad_proportion=0.06554, over 10815.86 utterances.], batch size: 47, lr: 2.09e-02, grad_scale: 8.0 2023-03-07 18:16:38,767 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.354e+02 3.428e+02 4.207e+02 5.393e+02 1.191e+03, threshold=8.413e+02, percent-clipped=8.0 2023-03-07 18:17:20,591 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18235.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:17:21,717 INFO [train2.py:809] (2/4) Epoch 5, batch 2300, loss[ctc_loss=0.1472, att_loss=0.2775, loss=0.2515, over 16704.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.004881, over 46.00 utterances.], tot_loss[ctc_loss=0.1606, att_loss=0.2804, loss=0.2565, over 3269270.42 frames. utt_duration=1232 frames, utt_pad_proportion=0.05987, over 10623.80 utterances.], batch size: 46, lr: 2.09e-02, grad_scale: 8.0 2023-03-07 18:17:22,131 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18236.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:18:43,283 INFO [train2.py:809] (2/4) Epoch 5, batch 2350, loss[ctc_loss=0.1843, att_loss=0.2996, loss=0.2766, over 16878.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007052, over 49.00 utterances.], tot_loss[ctc_loss=0.1599, att_loss=0.2796, loss=0.2557, over 3264185.80 frames. utt_duration=1246 frames, utt_pad_proportion=0.05824, over 10493.57 utterances.], batch size: 49, lr: 2.08e-02, grad_scale: 8.0 2023-03-07 18:19:21,026 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.258e+02 3.468e+02 4.143e+02 4.951e+02 1.211e+03, threshold=8.286e+02, percent-clipped=3.0 2023-03-07 18:19:32,550 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18316.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:19:32,625 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5108, 4.9654, 4.7624, 4.9240, 4.8757, 4.6424, 3.9717, 4.9119], device='cuda:2'), covar=tensor([0.0089, 0.0150, 0.0079, 0.0066, 0.0071, 0.0084, 0.0409, 0.0180], device='cuda:2'), in_proj_covar=tensor([0.0053, 0.0054, 0.0056, 0.0038, 0.0038, 0.0047, 0.0071, 0.0067], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 18:20:04,252 INFO [train2.py:809] (2/4) Epoch 5, batch 2400, loss[ctc_loss=0.139, att_loss=0.2725, loss=0.2458, over 16419.00 frames. utt_duration=1495 frames, utt_pad_proportion=0.006301, over 44.00 utterances.], tot_loss[ctc_loss=0.1614, att_loss=0.2807, loss=0.2568, over 3266154.79 frames. utt_duration=1209 frames, utt_pad_proportion=0.06648, over 10823.89 utterances.], batch size: 44, lr: 2.08e-02, grad_scale: 16.0 2023-03-07 18:20:38,652 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18356.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:21:26,060 INFO [train2.py:809] (2/4) Epoch 5, batch 2450, loss[ctc_loss=0.1861, att_loss=0.3011, loss=0.2781, over 16774.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006214, over 48.00 utterances.], tot_loss[ctc_loss=0.1618, att_loss=0.2812, loss=0.2573, over 3273452.36 frames. 
utt_duration=1231 frames, utt_pad_proportion=0.05985, over 10650.08 utterances.], batch size: 48, lr: 2.08e-02, grad_scale: 16.0 2023-03-07 18:21:26,379 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18386.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:21:56,556 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18404.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:22:04,374 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 3.376e+02 4.245e+02 5.566e+02 1.055e+03, threshold=8.490e+02, percent-clipped=7.0 2023-03-07 18:22:44,603 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18434.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:22:47,528 INFO [train2.py:809] (2/4) Epoch 5, batch 2500, loss[ctc_loss=0.1136, att_loss=0.2419, loss=0.2163, over 15373.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01109, over 35.00 utterances.], tot_loss[ctc_loss=0.1606, att_loss=0.2803, loss=0.2564, over 3271531.95 frames. utt_duration=1242 frames, utt_pad_proportion=0.0577, over 10548.71 utterances.], batch size: 35, lr: 2.08e-02, grad_scale: 16.0 2023-03-07 18:22:59,488 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8139, 4.8887, 4.7452, 2.7491, 4.6283, 4.3454, 3.9987, 2.3568], device='cuda:2'), covar=tensor([0.0162, 0.0084, 0.0206, 0.0963, 0.0100, 0.0171, 0.0315, 0.1574], device='cuda:2'), in_proj_covar=tensor([0.0052, 0.0060, 0.0048, 0.0094, 0.0057, 0.0068, 0.0078, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 18:24:09,077 INFO [train2.py:809] (2/4) Epoch 5, batch 2550, loss[ctc_loss=0.1278, att_loss=0.2664, loss=0.2386, over 16336.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005761, over 45.00 utterances.], tot_loss[ctc_loss=0.1612, att_loss=0.2809, loss=0.257, over 3272442.07 frames. utt_duration=1221 frames, utt_pad_proportion=0.0618, over 10733.82 utterances.], batch size: 45, lr: 2.07e-02, grad_scale: 16.0 2023-03-07 18:24:15,588 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18490.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:24:44,672 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18508.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 18:24:45,767 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 3.434e+02 4.162e+02 5.436e+02 1.075e+03, threshold=8.324e+02, percent-clipped=2.0 2023-03-07 18:25:20,155 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18530.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:25:21,721 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18531.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:25:29,428 INFO [train2.py:809] (2/4) Epoch 5, batch 2600, loss[ctc_loss=0.1439, att_loss=0.2646, loss=0.2405, over 15954.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007042, over 41.00 utterances.], tot_loss[ctc_loss=0.1605, att_loss=0.2802, loss=0.2563, over 3271558.12 frames. 
utt_duration=1217 frames, utt_pad_proportion=0.06256, over 10765.01 utterances.], batch size: 41, lr: 2.07e-02, grad_scale: 16.0 2023-03-07 18:25:54,417 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18551.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:26:22,967 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18569.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 18:26:50,086 INFO [train2.py:809] (2/4) Epoch 5, batch 2650, loss[ctc_loss=0.1709, att_loss=0.2822, loss=0.2599, over 16331.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.00605, over 45.00 utterances.], tot_loss[ctc_loss=0.1624, att_loss=0.2813, loss=0.2576, over 3279191.25 frames. utt_duration=1218 frames, utt_pad_proportion=0.06066, over 10780.81 utterances.], batch size: 45, lr: 2.07e-02, grad_scale: 16.0 2023-03-07 18:27:27,353 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 3.747e+02 4.549e+02 5.816e+02 1.798e+03, threshold=9.097e+02, percent-clipped=9.0 2023-03-07 18:27:38,436 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18616.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:28:10,692 INFO [train2.py:809] (2/4) Epoch 5, batch 2700, loss[ctc_loss=0.1251, att_loss=0.248, loss=0.2234, over 15474.00 frames. utt_duration=1721 frames, utt_pad_proportion=0.01049, over 36.00 utterances.], tot_loss[ctc_loss=0.1615, att_loss=0.2803, loss=0.2565, over 3274868.28 frames. utt_duration=1241 frames, utt_pad_proportion=0.05628, over 10572.37 utterances.], batch size: 36, lr: 2.07e-02, grad_scale: 16.0 2023-03-07 18:28:56,342 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18664.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:29:13,378 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18674.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:29:28,302 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-07 18:29:31,909 INFO [train2.py:809] (2/4) Epoch 5, batch 2750, loss[ctc_loss=0.1536, att_loss=0.2702, loss=0.2469, over 16259.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.007803, over 43.00 utterances.], tot_loss[ctc_loss=0.1625, att_loss=0.2811, loss=0.2574, over 3281794.52 frames. utt_duration=1224 frames, utt_pad_proportion=0.05784, over 10734.02 utterances.], batch size: 43, lr: 2.06e-02, grad_scale: 16.0 2023-03-07 18:30:10,085 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. 
limit=2.0 2023-03-07 18:30:10,721 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.386e+02 3.574e+02 4.434e+02 5.832e+02 1.352e+03, threshold=8.867e+02, percent-clipped=7.0 2023-03-07 18:30:26,516 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7153, 4.0791, 3.1008, 3.8148, 3.9849, 3.9050, 2.9920, 4.6221], device='cuda:2'), covar=tensor([0.1134, 0.0281, 0.1228, 0.0471, 0.0488, 0.0549, 0.0851, 0.0326], device='cuda:2'), in_proj_covar=tensor([0.0154, 0.0130, 0.0180, 0.0146, 0.0158, 0.0170, 0.0154, 0.0149], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 18:30:41,961 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7382, 5.0767, 4.7181, 5.1891, 4.6282, 4.8757, 5.4078, 5.1581], device='cuda:2'), covar=tensor([0.0353, 0.0278, 0.0623, 0.0169, 0.0416, 0.0164, 0.0180, 0.0122], device='cuda:2'), in_proj_covar=tensor([0.0204, 0.0170, 0.0219, 0.0142, 0.0186, 0.0138, 0.0159, 0.0153], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-07 18:30:51,277 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18735.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:30:52,495 INFO [train2.py:809] (2/4) Epoch 5, batch 2800, loss[ctc_loss=0.1212, att_loss=0.2701, loss=0.2403, over 16332.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.00608, over 45.00 utterances.], tot_loss[ctc_loss=0.1618, att_loss=0.2802, loss=0.2565, over 3266325.28 frames. utt_duration=1232 frames, utt_pad_proportion=0.06056, over 10614.98 utterances.], batch size: 45, lr: 2.06e-02, grad_scale: 8.0 2023-03-07 18:31:03,247 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2408, 4.8201, 4.8478, 2.8588, 1.9464, 2.4685, 4.6437, 3.7763], device='cuda:2'), covar=tensor([0.0428, 0.0192, 0.0173, 0.2403, 0.6215, 0.2886, 0.0183, 0.1536], device='cuda:2'), in_proj_covar=tensor([0.0277, 0.0174, 0.0201, 0.0184, 0.0365, 0.0329, 0.0180, 0.0317], device='cuda:2'), out_proj_covar=tensor([1.4423e-04, 7.7521e-05, 9.1104e-05, 8.6189e-05, 1.8015e-04, 1.5216e-04, 7.9765e-05, 1.5591e-04], device='cuda:2') 2023-03-07 18:32:14,583 INFO [train2.py:809] (2/4) Epoch 5, batch 2850, loss[ctc_loss=0.1231, att_loss=0.2421, loss=0.2183, over 13275.00 frames. utt_duration=1833 frames, utt_pad_proportion=0.1033, over 29.00 utterances.], tot_loss[ctc_loss=0.161, att_loss=0.2795, loss=0.2558, over 3264755.90 frames. utt_duration=1214 frames, utt_pad_proportion=0.06428, over 10771.53 utterances.], batch size: 29, lr: 2.06e-02, grad_scale: 8.0 2023-03-07 18:32:54,817 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 3.458e+02 4.600e+02 5.611e+02 1.235e+03, threshold=9.200e+02, percent-clipped=3.0 2023-03-07 18:33:13,935 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. 
limit=2.0 2023-03-07 18:33:24,768 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3651, 3.9298, 3.0679, 3.3793, 4.1110, 3.7761, 2.2237, 4.4959], device='cuda:2'), covar=tensor([0.1162, 0.0367, 0.1063, 0.0577, 0.0435, 0.0484, 0.1234, 0.0281], device='cuda:2'), in_proj_covar=tensor([0.0153, 0.0131, 0.0180, 0.0146, 0.0159, 0.0170, 0.0155, 0.0150], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 18:33:27,976 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18830.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:33:29,499 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18831.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:33:37,336 INFO [train2.py:809] (2/4) Epoch 5, batch 2900, loss[ctc_loss=0.1608, att_loss=0.2841, loss=0.2594, over 16983.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.006618, over 50.00 utterances.], tot_loss[ctc_loss=0.1602, att_loss=0.2797, loss=0.2558, over 3279518.09 frames. utt_duration=1231 frames, utt_pad_proportion=0.05606, over 10665.16 utterances.], batch size: 50, lr: 2.06e-02, grad_scale: 8.0 2023-03-07 18:33:53,704 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18846.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:33:58,308 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6276, 4.9189, 4.7125, 4.9454, 4.9023, 4.7046, 3.8373, 4.8978], device='cuda:2'), covar=tensor([0.0075, 0.0128, 0.0091, 0.0068, 0.0094, 0.0094, 0.0414, 0.0152], device='cuda:2'), in_proj_covar=tensor([0.0054, 0.0054, 0.0058, 0.0040, 0.0040, 0.0049, 0.0073, 0.0069], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-07 18:34:21,713 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18864.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 18:34:44,645 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18878.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:34:46,083 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18879.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:34:56,885 INFO [train2.py:809] (2/4) Epoch 5, batch 2950, loss[ctc_loss=0.1399, att_loss=0.2888, loss=0.259, over 16768.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006407, over 48.00 utterances.], tot_loss[ctc_loss=0.1591, att_loss=0.279, loss=0.255, over 3279617.60 frames. utt_duration=1263 frames, utt_pad_proportion=0.04861, over 10395.95 utterances.], batch size: 48, lr: 2.05e-02, grad_scale: 8.0 2023-03-07 18:35:34,921 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.226e+02 3.285e+02 4.050e+02 4.764e+02 8.688e+02, threshold=8.101e+02, percent-clipped=0.0 2023-03-07 18:36:07,994 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18930.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:36:17,499 INFO [train2.py:809] (2/4) Epoch 5, batch 3000, loss[ctc_loss=0.1253, att_loss=0.2607, loss=0.2336, over 16335.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.00567, over 45.00 utterances.], tot_loss[ctc_loss=0.1585, att_loss=0.279, loss=0.2549, over 3281689.06 frames. 
utt_duration=1288 frames, utt_pad_proportion=0.04232, over 10202.45 utterances.], batch size: 45, lr: 2.05e-02, grad_scale: 8.0 2023-03-07 18:36:17,500 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 18:36:31,697 INFO [train2.py:843] (2/4) Epoch 5, validation: ctc_loss=0.07531, att_loss=0.2513, loss=0.2161, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 18:36:31,698 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 18:37:52,824 INFO [train2.py:809] (2/4) Epoch 5, batch 3050, loss[ctc_loss=0.1347, att_loss=0.2563, loss=0.232, over 14596.00 frames. utt_duration=1826 frames, utt_pad_proportion=0.03805, over 32.00 utterances.], tot_loss[ctc_loss=0.1579, att_loss=0.2786, loss=0.2544, over 3271784.07 frames. utt_duration=1269 frames, utt_pad_proportion=0.05181, over 10328.80 utterances.], batch size: 32, lr: 2.05e-02, grad_scale: 8.0 2023-03-07 18:38:01,030 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18991.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:38:31,921 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.659e+02 4.471e+02 5.591e+02 1.197e+03, threshold=8.941e+02, percent-clipped=8.0 2023-03-07 18:38:47,368 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-03-07 18:38:52,819 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0620, 2.1945, 3.4737, 2.4538, 3.0465, 4.3268, 4.3076, 2.7022], device='cuda:2'), covar=tensor([0.0585, 0.2723, 0.0995, 0.2034, 0.1221, 0.0758, 0.0533, 0.2224], device='cuda:2'), in_proj_covar=tensor([0.0202, 0.0208, 0.0205, 0.0192, 0.0210, 0.0204, 0.0170, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 18:39:04,997 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19030.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:39:06,716 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4729, 2.7059, 3.5693, 2.6867, 3.3702, 4.5837, 4.3511, 3.3251], device='cuda:2'), covar=tensor([0.0344, 0.1813, 0.0911, 0.1569, 0.1014, 0.0535, 0.0526, 0.1299], device='cuda:2'), in_proj_covar=tensor([0.0203, 0.0209, 0.0206, 0.0194, 0.0212, 0.0206, 0.0171, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 18:39:13,981 INFO [train2.py:809] (2/4) Epoch 5, batch 3100, loss[ctc_loss=0.2418, att_loss=0.3221, loss=0.306, over 17289.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02429, over 59.00 utterances.], tot_loss[ctc_loss=0.1588, att_loss=0.2794, loss=0.2553, over 3274074.34 frames. 
utt_duration=1258 frames, utt_pad_proportion=0.05303, over 10424.09 utterances.], batch size: 59, lr: 2.05e-02, grad_scale: 8.0 2023-03-07 18:39:54,477 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.8388, 2.4790, 2.8129, 2.1177, 2.5918, 2.3013, 2.5808, 1.6706], device='cuda:2'), covar=tensor([0.1236, 0.0950, 0.2926, 0.4313, 0.1414, 0.4996, 0.0753, 0.7420], device='cuda:2'), in_proj_covar=tensor([0.0069, 0.0069, 0.0069, 0.0092, 0.0067, 0.0089, 0.0062, 0.0108], device='cuda:2'), out_proj_covar=tensor([5.0542e-05, 4.6471e-05, 4.9133e-05, 6.6688e-05, 4.7872e-05, 6.7151e-05, 4.5442e-05, 8.1609e-05], device='cuda:2') 2023-03-07 18:40:36,263 INFO [train2.py:809] (2/4) Epoch 5, batch 3150, loss[ctc_loss=0.1223, att_loss=0.2735, loss=0.2433, over 16873.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.00806, over 49.00 utterances.], tot_loss[ctc_loss=0.1581, att_loss=0.2789, loss=0.2547, over 3277469.04 frames. utt_duration=1252 frames, utt_pad_proportion=0.05231, over 10487.61 utterances.], batch size: 49, lr: 2.04e-02, grad_scale: 8.0 2023-03-07 18:41:14,955 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.126e+02 3.661e+02 4.363e+02 5.729e+02 1.353e+03, threshold=8.726e+02, percent-clipped=4.0 2023-03-07 18:41:30,746 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3028, 4.7984, 4.5440, 4.8845, 4.8286, 4.4435, 3.9187, 4.7535], device='cuda:2'), covar=tensor([0.0123, 0.0139, 0.0112, 0.0092, 0.0101, 0.0146, 0.0399, 0.0236], device='cuda:2'), in_proj_covar=tensor([0.0055, 0.0055, 0.0059, 0.0040, 0.0041, 0.0051, 0.0073, 0.0070], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-07 18:41:46,864 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=19129.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 18:41:57,274 INFO [train2.py:809] (2/4) Epoch 5, batch 3200, loss[ctc_loss=0.1439, att_loss=0.2604, loss=0.2371, over 15945.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007093, over 41.00 utterances.], tot_loss[ctc_loss=0.1572, att_loss=0.2778, loss=0.2537, over 3271846.16 frames. utt_duration=1255 frames, utt_pad_proportion=0.05221, over 10440.39 utterances.], batch size: 41, lr: 2.04e-02, grad_scale: 8.0 2023-03-07 18:42:14,104 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19146.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:42:43,206 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19164.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 18:43:19,482 INFO [train2.py:809] (2/4) Epoch 5, batch 3250, loss[ctc_loss=0.2261, att_loss=0.315, loss=0.2972, over 14231.00 frames. utt_duration=391.5 frames, utt_pad_proportion=0.318, over 146.00 utterances.], tot_loss[ctc_loss=0.1566, att_loss=0.2773, loss=0.2532, over 3265858.47 frames. utt_duration=1231 frames, utt_pad_proportion=0.06059, over 10627.51 utterances.], batch size: 146, lr: 2.04e-02, grad_scale: 8.0 2023-03-07 18:43:26,721 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=19190.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 18:43:32,678 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19194.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:43:41,860 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. 
limit=2.0 2023-03-07 18:43:42,885 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2608, 5.0658, 4.9368, 2.5395, 1.7877, 2.8597, 4.9758, 3.7672], device='cuda:2'), covar=tensor([0.0417, 0.0148, 0.0198, 0.2865, 0.6722, 0.2283, 0.0156, 0.1844], device='cuda:2'), in_proj_covar=tensor([0.0274, 0.0171, 0.0200, 0.0185, 0.0364, 0.0327, 0.0185, 0.0316], device='cuda:2'), out_proj_covar=tensor([1.4147e-04, 7.6178e-05, 9.0105e-05, 8.6397e-05, 1.7884e-04, 1.5050e-04, 8.1684e-05, 1.5449e-04], device='cuda:2') 2023-03-07 18:43:49,203 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7582, 5.1411, 4.9802, 4.9915, 5.1751, 5.1661, 4.8092, 4.6183], device='cuda:2'), covar=tensor([0.1045, 0.0437, 0.0200, 0.0488, 0.0265, 0.0247, 0.0300, 0.0364], device='cuda:2'), in_proj_covar=tensor([0.0388, 0.0232, 0.0163, 0.0205, 0.0254, 0.0283, 0.0216, 0.0247], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 18:43:58,427 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 3.305e+02 4.293e+02 5.572e+02 8.816e+02, threshold=8.587e+02, percent-clipped=1.0 2023-03-07 18:44:01,770 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19212.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 18:44:28,416 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9459, 5.1533, 5.6241, 5.3121, 4.7221, 5.6881, 5.1867, 5.7539], device='cuda:2'), covar=tensor([0.0939, 0.0979, 0.0706, 0.1636, 0.3896, 0.1301, 0.0746, 0.0943], device='cuda:2'), in_proj_covar=tensor([0.0512, 0.0328, 0.0345, 0.0411, 0.0568, 0.0345, 0.0291, 0.0364], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 18:44:40,690 INFO [train2.py:809] (2/4) Epoch 5, batch 3300, loss[ctc_loss=0.1517, att_loss=0.2799, loss=0.2543, over 17398.00 frames. utt_duration=882.4 frames, utt_pad_proportion=0.07599, over 79.00 utterances.], tot_loss[ctc_loss=0.1564, att_loss=0.2779, loss=0.2536, over 3271924.38 frames. utt_duration=1242 frames, utt_pad_proportion=0.05572, over 10551.28 utterances.], batch size: 79, lr: 2.04e-02, grad_scale: 8.0 2023-03-07 18:46:01,529 INFO [train2.py:809] (2/4) Epoch 5, batch 3350, loss[ctc_loss=0.1474, att_loss=0.2809, loss=0.2542, over 17056.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008853, over 52.00 utterances.], tot_loss[ctc_loss=0.1568, att_loss=0.278, loss=0.2538, over 3272445.98 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.0558, over 10545.18 utterances.], batch size: 52, lr: 2.03e-02, grad_scale: 8.0 2023-03-07 18:46:01,752 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19286.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:46:39,663 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.326e+02 4.027e+02 5.244e+02 1.066e+03, threshold=8.053e+02, percent-clipped=2.0 2023-03-07 18:46:48,047 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7133, 1.9048, 1.9679, 2.1786, 3.7904, 1.3999, 1.7966, 1.8247], device='cuda:2'), covar=tensor([0.0327, 0.2012, 0.2557, 0.1184, 0.0311, 0.1887, 0.2654, 0.1712], device='cuda:2'), in_proj_covar=tensor([0.0095, 0.0104, 0.0100, 0.0090, 0.0084, 0.0090, 0.0107, 0.0089], device='cuda:2'), out_proj_covar=tensor([4.1555e-05, 5.5590e-05, 5.4046e-05, 4.4649e-05, 3.9760e-05, 4.9734e-05, 5.4638e-05, 4.8926e-05], device='cuda:2') 2023-03-07 18:47:13,639 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19330.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:47:22,653 INFO [train2.py:809] (2/4) Epoch 5, batch 3400, loss[ctc_loss=0.1542, att_loss=0.297, loss=0.2685, over 17090.00 frames. utt_duration=1222 frames, utt_pad_proportion=0.01668, over 56.00 utterances.], tot_loss[ctc_loss=0.1569, att_loss=0.2782, loss=0.2539, over 3269109.44 frames. utt_duration=1249 frames, utt_pad_proportion=0.05601, over 10482.44 utterances.], batch size: 56, lr: 2.03e-02, grad_scale: 8.0 2023-03-07 18:48:16,228 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2023-03-07 18:48:30,577 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19378.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:48:38,730 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6373, 2.4934, 5.0552, 3.8797, 2.8733, 4.3714, 4.6490, 4.7715], device='cuda:2'), covar=tensor([0.0192, 0.1846, 0.0132, 0.1263, 0.2546, 0.0292, 0.0155, 0.0232], device='cuda:2'), in_proj_covar=tensor([0.0135, 0.0248, 0.0125, 0.0306, 0.0319, 0.0192, 0.0113, 0.0140], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0003, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-07 18:48:42,958 INFO [train2.py:809] (2/4) Epoch 5, batch 3450, loss[ctc_loss=0.2206, att_loss=0.3182, loss=0.2987, over 16827.00 frames. utt_duration=681.3 frames, utt_pad_proportion=0.1441, over 99.00 utterances.], tot_loss[ctc_loss=0.1569, att_loss=0.2785, loss=0.2542, over 3272328.55 frames. utt_duration=1252 frames, utt_pad_proportion=0.05409, over 10468.76 utterances.], batch size: 99, lr: 2.03e-02, grad_scale: 8.0 2023-03-07 18:48:56,580 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=19394.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:49:21,716 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+02 3.490e+02 4.079e+02 5.311e+02 1.453e+03, threshold=8.158e+02, percent-clipped=3.0 2023-03-07 18:50:03,983 INFO [train2.py:809] (2/4) Epoch 5, batch 3500, loss[ctc_loss=0.2357, att_loss=0.3257, loss=0.3077, over 14209.00 frames. utt_duration=390.9 frames, utt_pad_proportion=0.3179, over 146.00 utterances.], tot_loss[ctc_loss=0.1568, att_loss=0.2788, loss=0.2544, over 3276564.72 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.05554, over 10557.87 utterances.], batch size: 146, lr: 2.03e-02, grad_scale: 8.0 2023-03-07 18:50:34,977 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=19455.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:51:24,372 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19485.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 18:51:25,713 INFO [train2.py:809] (2/4) Epoch 5, batch 3550, loss[ctc_loss=0.1563, att_loss=0.277, loss=0.2529, over 16685.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006543, over 46.00 utterances.], tot_loss[ctc_loss=0.156, att_loss=0.2783, loss=0.2539, over 3274773.64 frames. utt_duration=1249 frames, utt_pad_proportion=0.05325, over 10498.38 utterances.], batch size: 46, lr: 2.02e-02, grad_scale: 8.0 2023-03-07 18:52:02,970 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.286e+02 3.371e+02 4.237e+02 4.987e+02 1.005e+03, threshold=8.474e+02, percent-clipped=3.0 2023-03-07 18:52:15,272 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2269, 4.2081, 4.0529, 4.2346, 2.1248, 4.2036, 2.4119, 1.8720], device='cuda:2'), covar=tensor([0.0204, 0.0142, 0.0779, 0.0198, 0.2712, 0.0276, 0.1911, 0.1932], device='cuda:2'), in_proj_covar=tensor([0.0109, 0.0100, 0.0245, 0.0111, 0.0226, 0.0101, 0.0223, 0.0206], device='cuda:2'), out_proj_covar=tensor([1.0582e-04, 9.8751e-05, 2.1647e-04, 9.9677e-05, 2.0137e-04, 9.7033e-05, 1.9377e-04, 1.8115e-04], device='cuda:2') 2023-03-07 18:52:46,423 INFO [train2.py:809] (2/4) Epoch 5, batch 3600, loss[ctc_loss=0.1022, att_loss=0.234, loss=0.2076, over 15758.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.008253, over 38.00 utterances.], tot_loss[ctc_loss=0.156, att_loss=0.278, loss=0.2536, over 3275545.61 frames. utt_duration=1251 frames, utt_pad_proportion=0.05182, over 10483.43 utterances.], batch size: 38, lr: 2.02e-02, grad_scale: 8.0 2023-03-07 18:53:32,003 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.51 vs. limit=5.0 2023-03-07 18:54:07,430 INFO [train2.py:809] (2/4) Epoch 5, batch 3650, loss[ctc_loss=0.142, att_loss=0.2786, loss=0.2513, over 17333.00 frames. utt_duration=1262 frames, utt_pad_proportion=0.0103, over 55.00 utterances.], tot_loss[ctc_loss=0.1571, att_loss=0.2786, loss=0.2543, over 3282243.01 frames. utt_duration=1252 frames, utt_pad_proportion=0.0502, over 10499.43 utterances.], batch size: 55, lr: 2.02e-02, grad_scale: 8.0 2023-03-07 18:54:07,771 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19586.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:54:44,973 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.264e+02 3.543e+02 4.665e+02 6.112e+02 1.723e+03, threshold=9.329e+02, percent-clipped=10.0 2023-03-07 18:54:59,275 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=19618.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:55:25,675 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19634.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:55:28,659 INFO [train2.py:809] (2/4) Epoch 5, batch 3700, loss[ctc_loss=0.1437, att_loss=0.2571, loss=0.2344, over 15659.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.007426, over 37.00 utterances.], tot_loss[ctc_loss=0.1547, att_loss=0.2765, loss=0.2522, over 3277044.30 frames. 
utt_duration=1284 frames, utt_pad_proportion=0.0441, over 10218.67 utterances.], batch size: 37, lr: 2.02e-02, grad_scale: 8.0 2023-03-07 18:56:05,485 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2753, 4.7950, 4.7491, 4.6666, 5.2953, 5.0506, 4.5434, 2.1433], device='cuda:2'), covar=tensor([0.0121, 0.0303, 0.0215, 0.0270, 0.1135, 0.0145, 0.0267, 0.3165], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0122, 0.0120, 0.0123, 0.0283, 0.0130, 0.0116, 0.0239], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 18:56:39,232 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=19679.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:56:49,596 INFO [train2.py:809] (2/4) Epoch 5, batch 3750, loss[ctc_loss=0.1363, att_loss=0.2648, loss=0.2391, over 16877.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007811, over 49.00 utterances.], tot_loss[ctc_loss=0.1551, att_loss=0.2765, loss=0.2522, over 3268860.63 frames. utt_duration=1267 frames, utt_pad_proportion=0.04995, over 10331.73 utterances.], batch size: 49, lr: 2.01e-02, grad_scale: 8.0 2023-03-07 18:56:59,421 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0363, 5.0518, 4.8885, 2.7715, 4.8409, 4.3779, 4.3993, 2.2963], device='cuda:2'), covar=tensor([0.0103, 0.0064, 0.0194, 0.1066, 0.0079, 0.0162, 0.0249, 0.1592], device='cuda:2'), in_proj_covar=tensor([0.0054, 0.0062, 0.0052, 0.0098, 0.0058, 0.0070, 0.0080, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 18:57:19,943 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0769, 4.9752, 5.0778, 3.1325, 4.9730, 4.4423, 4.4047, 2.5949], device='cuda:2'), covar=tensor([0.0107, 0.0075, 0.0118, 0.0920, 0.0076, 0.0154, 0.0262, 0.1360], device='cuda:2'), in_proj_covar=tensor([0.0054, 0.0062, 0.0051, 0.0098, 0.0057, 0.0070, 0.0080, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 18:57:27,474 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+02 3.451e+02 3.922e+02 5.029e+02 1.290e+03, threshold=7.843e+02, percent-clipped=3.0 2023-03-07 18:58:11,651 INFO [train2.py:809] (2/4) Epoch 5, batch 3800, loss[ctc_loss=0.1298, att_loss=0.2469, loss=0.2235, over 15498.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.007852, over 36.00 utterances.], tot_loss[ctc_loss=0.1542, att_loss=0.2756, loss=0.2513, over 3262721.08 frames. 
utt_duration=1286 frames, utt_pad_proportion=0.0474, over 10158.74 utterances.], batch size: 36, lr: 2.01e-02, grad_scale: 8.0 2023-03-07 18:58:15,011 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5067, 2.5215, 3.3647, 4.3003, 4.1451, 4.1892, 2.9955, 2.1472], device='cuda:2'), covar=tensor([0.0498, 0.2368, 0.1013, 0.0624, 0.0376, 0.0229, 0.1374, 0.2477], device='cuda:2'), in_proj_covar=tensor([0.0144, 0.0188, 0.0184, 0.0149, 0.0136, 0.0117, 0.0177, 0.0171], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 18:58:16,677 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.1689, 1.7103, 2.0903, 1.8871, 2.0787, 1.8323, 2.0019, 1.6648], device='cuda:2'), covar=tensor([0.1141, 0.2787, 0.2314, 0.1101, 0.1449, 0.2147, 0.2004, 0.1898], device='cuda:2'), in_proj_covar=tensor([0.0099, 0.0106, 0.0100, 0.0092, 0.0090, 0.0091, 0.0105, 0.0089], device='cuda:2'), out_proj_covar=tensor([4.3859e-05, 5.7152e-05, 5.4975e-05, 4.6030e-05, 4.3012e-05, 5.1881e-05, 5.4652e-05, 4.9639e-05], device='cuda:2') 2023-03-07 18:58:33,859 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19750.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 18:58:38,989 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4391, 5.0118, 5.0188, 3.1619, 1.9671, 2.6255, 5.0946, 3.7673], device='cuda:2'), covar=tensor([0.0503, 0.0297, 0.0278, 0.2324, 0.6868, 0.2942, 0.0260, 0.2022], device='cuda:2'), in_proj_covar=tensor([0.0276, 0.0174, 0.0199, 0.0185, 0.0363, 0.0328, 0.0186, 0.0322], device='cuda:2'), out_proj_covar=tensor([1.4232e-04, 7.6768e-05, 9.0134e-05, 8.4450e-05, 1.7746e-04, 1.5024e-04, 8.1527e-05, 1.5576e-04], device='cuda:2') 2023-03-07 18:59:24,241 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9722, 5.2753, 5.5741, 5.5246, 5.2279, 5.8444, 5.0502, 5.8758], device='cuda:2'), covar=tensor([0.0522, 0.0671, 0.0581, 0.0844, 0.1884, 0.0861, 0.0558, 0.0580], device='cuda:2'), in_proj_covar=tensor([0.0504, 0.0326, 0.0353, 0.0412, 0.0570, 0.0351, 0.0291, 0.0361], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 18:59:30,532 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19785.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 18:59:31,766 INFO [train2.py:809] (2/4) Epoch 5, batch 3850, loss[ctc_loss=0.1684, att_loss=0.277, loss=0.2553, over 15941.00 frames. utt_duration=1556 frames, utt_pad_proportion=0.007342, over 41.00 utterances.], tot_loss[ctc_loss=0.1545, att_loss=0.2759, loss=0.2516, over 3265516.46 frames. utt_duration=1276 frames, utt_pad_proportion=0.05022, over 10249.50 utterances.], batch size: 41, lr: 2.01e-02, grad_scale: 8.0 2023-03-07 19:00:09,271 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.273e+02 3.504e+02 4.097e+02 4.952e+02 1.319e+03, threshold=8.195e+02, percent-clipped=3.0 2023-03-07 19:00:44,709 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19833.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 19:00:48,873 INFO [train2.py:809] (2/4) Epoch 5, batch 3900, loss[ctc_loss=0.1303, att_loss=0.2708, loss=0.2427, over 17314.00 frames. utt_duration=1101 frames, utt_pad_proportion=0.03702, over 63.00 utterances.], tot_loss[ctc_loss=0.1555, att_loss=0.2765, loss=0.2523, over 3264981.49 frames. 
utt_duration=1292 frames, utt_pad_proportion=0.04555, over 10123.80 utterances.], batch size: 63, lr: 2.01e-02, grad_scale: 8.0 2023-03-07 19:01:50,519 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.3205, 2.6026, 3.1834, 2.2951, 2.7590, 2.7196, 2.7527, 1.3033], device='cuda:2'), covar=tensor([0.1275, 0.0733, 0.1778, 0.4582, 0.1857, 0.1844, 0.0991, 0.9262], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0069, 0.0069, 0.0097, 0.0067, 0.0087, 0.0063, 0.0110], device='cuda:2'), out_proj_covar=tensor([5.0302e-05, 4.6680e-05, 5.0863e-05, 7.0257e-05, 4.8293e-05, 6.7522e-05, 4.6764e-05, 8.3623e-05], device='cuda:2') 2023-03-07 19:02:05,453 INFO [train2.py:809] (2/4) Epoch 5, batch 3950, loss[ctc_loss=0.1464, att_loss=0.2698, loss=0.2451, over 16278.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.007447, over 43.00 utterances.], tot_loss[ctc_loss=0.1561, att_loss=0.277, loss=0.2528, over 3276112.27 frames. utt_duration=1300 frames, utt_pad_proportion=0.04076, over 10095.56 utterances.], batch size: 43, lr: 2.00e-02, grad_scale: 8.0 2023-03-07 19:02:43,149 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.585e+02 3.417e+02 4.181e+02 5.217e+02 1.635e+03, threshold=8.363e+02, percent-clipped=3.0 2023-03-07 19:03:26,307 INFO [train2.py:809] (2/4) Epoch 6, batch 0, loss[ctc_loss=0.1269, att_loss=0.2613, loss=0.2344, over 15983.00 frames. utt_duration=1600 frames, utt_pad_proportion=0.008277, over 40.00 utterances.], tot_loss[ctc_loss=0.1269, att_loss=0.2613, loss=0.2344, over 15983.00 frames. utt_duration=1600 frames, utt_pad_proportion=0.008277, over 40.00 utterances.], batch size: 40, lr: 1.87e-02, grad_scale: 8.0 2023-03-07 19:03:26,307 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 19:03:38,951 INFO [train2.py:843] (2/4) Epoch 6, validation: ctc_loss=0.07525, att_loss=0.2502, loss=0.2152, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 19:03:38,952 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 19:04:58,946 INFO [train2.py:809] (2/4) Epoch 6, batch 50, loss[ctc_loss=0.1341, att_loss=0.2818, loss=0.2523, over 17085.00 frames. utt_duration=1222 frames, utt_pad_proportion=0.01689, over 56.00 utterances.], tot_loss[ctc_loss=0.154, att_loss=0.2774, loss=0.2528, over 734221.53 frames. utt_duration=1151 frames, utt_pad_proportion=0.08574, over 2555.36 utterances.], batch size: 56, lr: 1.87e-02, grad_scale: 8.0 2023-03-07 19:05:05,356 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19974.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:05:56,648 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5982, 1.4675, 1.7420, 1.5763, 2.9392, 1.9616, 1.1891, 1.3529], device='cuda:2'), covar=tensor([0.0389, 0.2632, 0.2118, 0.1520, 0.0563, 0.1600, 0.2585, 0.2141], device='cuda:2'), in_proj_covar=tensor([0.0098, 0.0103, 0.0099, 0.0093, 0.0088, 0.0087, 0.0106, 0.0087], device='cuda:2'), out_proj_covar=tensor([4.2727e-05, 5.5617e-05, 5.4141e-05, 4.5832e-05, 4.1338e-05, 4.9346e-05, 5.4951e-05, 4.8506e-05], device='cuda:2') 2023-03-07 19:05:59,250 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.64 vs. 
limit=2.0 2023-03-07 19:06:05,489 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 3.240e+02 3.986e+02 5.094e+02 1.358e+03, threshold=7.972e+02, percent-clipped=5.0 2023-03-07 19:06:22,311 INFO [train2.py:809] (2/4) Epoch 6, batch 100, loss[ctc_loss=0.1501, att_loss=0.2834, loss=0.2568, over 16944.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.00788, over 50.00 utterances.], tot_loss[ctc_loss=0.1529, att_loss=0.2772, loss=0.2524, over 1304123.10 frames. utt_duration=1234 frames, utt_pad_proportion=0.05887, over 4232.91 utterances.], batch size: 50, lr: 1.86e-02, grad_scale: 8.0 2023-03-07 19:06:22,677 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20020.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:07:09,800 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20050.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:07:40,788 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0075, 4.4127, 3.9776, 4.3678, 2.3246, 4.0246, 2.2900, 2.3844], device='cuda:2'), covar=tensor([0.0269, 0.0148, 0.0751, 0.0172, 0.2300, 0.0269, 0.1873, 0.1472], device='cuda:2'), in_proj_covar=tensor([0.0105, 0.0096, 0.0250, 0.0110, 0.0225, 0.0107, 0.0220, 0.0203], device='cuda:2'), out_proj_covar=tensor([1.0375e-04, 9.6211e-05, 2.1986e-04, 9.8372e-05, 2.0113e-04, 1.0022e-04, 1.9293e-04, 1.7942e-04], device='cuda:2') 2023-03-07 19:07:40,881 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0879, 4.8324, 4.7756, 2.4416, 2.0256, 2.7668, 4.4515, 3.7090], device='cuda:2'), covar=tensor([0.0499, 0.0147, 0.0154, 0.2844, 0.5615, 0.2276, 0.0251, 0.1629], device='cuda:2'), in_proj_covar=tensor([0.0281, 0.0181, 0.0204, 0.0187, 0.0365, 0.0332, 0.0186, 0.0328], device='cuda:2'), out_proj_covar=tensor([1.4501e-04, 7.8575e-05, 9.1873e-05, 8.5828e-05, 1.7847e-04, 1.5147e-04, 8.0896e-05, 1.5804e-04], device='cuda:2') 2023-03-07 19:07:41,940 INFO [train2.py:809] (2/4) Epoch 6, batch 150, loss[ctc_loss=0.1853, att_loss=0.303, loss=0.2794, over 17311.00 frames. utt_duration=877.8 frames, utt_pad_proportion=0.07984, over 79.00 utterances.], tot_loss[ctc_loss=0.1518, att_loss=0.2753, loss=0.2506, over 1732905.42 frames. utt_duration=1238 frames, utt_pad_proportion=0.0632, over 5604.12 utterances.], batch size: 79, lr: 1.86e-02, grad_scale: 8.0 2023-03-07 19:07:52,549 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-07 19:07:53,976 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.73 vs. limit=2.0 2023-03-07 19:07:58,122 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2023-03-07 19:08:00,438 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20081.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:08:12,683 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20089.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:08:27,264 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20098.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:08:42,060 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.48 vs. 
limit=5.0 2023-03-07 19:08:45,956 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+02 3.260e+02 3.980e+02 4.904e+02 1.174e+03, threshold=7.960e+02, percent-clipped=3.0 2023-03-07 19:08:49,525 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20112.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:09:00,152 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.0058, 3.3238, 3.7100, 2.2975, 3.5903, 3.2589, 3.3156, 2.4854], device='cuda:2'), covar=tensor([0.1362, 0.0662, 0.1128, 0.4531, 0.0817, 0.1850, 0.0839, 0.6410], device='cuda:2'), in_proj_covar=tensor([0.0067, 0.0071, 0.0072, 0.0100, 0.0068, 0.0091, 0.0066, 0.0113], device='cuda:2'), out_proj_covar=tensor([5.1137e-05, 4.8353e-05, 5.2891e-05, 7.2844e-05, 4.9540e-05, 7.0416e-05, 4.8390e-05, 8.6418e-05], device='cuda:2') 2023-03-07 19:09:02,826 INFO [train2.py:809] (2/4) Epoch 6, batch 200, loss[ctc_loss=0.1612, att_loss=0.289, loss=0.2634, over 17026.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.008204, over 51.00 utterances.], tot_loss[ctc_loss=0.1503, att_loss=0.2746, loss=0.2498, over 2072307.26 frames. utt_duration=1260 frames, utt_pad_proportion=0.05566, over 6585.92 utterances.], batch size: 51, lr: 1.86e-02, grad_scale: 8.0 2023-03-07 19:09:15,343 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8157, 5.9699, 5.2605, 5.9204, 5.6685, 5.2951, 5.4137, 5.2992], device='cuda:2'), covar=tensor([0.1100, 0.0785, 0.0804, 0.0603, 0.0583, 0.1296, 0.1947, 0.2236], device='cuda:2'), in_proj_covar=tensor([0.0347, 0.0383, 0.0299, 0.0309, 0.0275, 0.0367, 0.0406, 0.0384], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 19:09:16,942 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4980, 5.7051, 4.9811, 5.6409, 5.3428, 5.0452, 5.1600, 4.9918], device='cuda:2'), covar=tensor([0.1117, 0.0817, 0.0802, 0.0595, 0.0635, 0.1299, 0.1980, 0.2268], device='cuda:2'), in_proj_covar=tensor([0.0348, 0.0383, 0.0300, 0.0309, 0.0275, 0.0368, 0.0407, 0.0385], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 19:09:51,033 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20150.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:10:11,010 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. limit=2.0 2023-03-07 19:10:23,258 INFO [train2.py:809] (2/4) Epoch 6, batch 250, loss[ctc_loss=0.1853, att_loss=0.3041, loss=0.2804, over 16858.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.008022, over 49.00 utterances.], tot_loss[ctc_loss=0.1515, att_loss=0.2758, loss=0.2509, over 2340351.53 frames. 
utt_duration=1230 frames, utt_pad_proportion=0.0595, over 7620.79 utterances.], batch size: 49, lr: 1.86e-02, grad_scale: 8.0 2023-03-07 19:10:28,341 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20173.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:11:07,916 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6361, 3.9730, 3.7518, 3.9612, 3.9425, 3.7258, 3.1343, 3.8560], device='cuda:2'), covar=tensor([0.0102, 0.0100, 0.0106, 0.0067, 0.0082, 0.0112, 0.0437, 0.0185], device='cuda:2'), in_proj_covar=tensor([0.0058, 0.0056, 0.0061, 0.0042, 0.0042, 0.0053, 0.0077, 0.0072], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-07 19:11:17,515 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20204.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:11:27,954 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.978e+02 3.813e+02 4.681e+02 1.334e+03, threshold=7.626e+02, percent-clipped=7.0 2023-03-07 19:11:43,058 INFO [train2.py:809] (2/4) Epoch 6, batch 300, loss[ctc_loss=0.1478, att_loss=0.285, loss=0.2576, over 16782.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005785, over 48.00 utterances.], tot_loss[ctc_loss=0.1527, att_loss=0.2768, loss=0.252, over 2554370.28 frames. utt_duration=1214 frames, utt_pad_proportion=0.06159, over 8428.72 utterances.], batch size: 48, lr: 1.86e-02, grad_scale: 4.0 2023-03-07 19:12:54,593 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20265.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:13:01,863 INFO [train2.py:809] (2/4) Epoch 6, batch 350, loss[ctc_loss=0.1305, att_loss=0.2769, loss=0.2476, over 16764.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006747, over 48.00 utterances.], tot_loss[ctc_loss=0.1524, att_loss=0.2754, loss=0.2508, over 2712842.93 frames. utt_duration=1232 frames, utt_pad_proportion=0.05761, over 8819.20 utterances.], batch size: 48, lr: 1.85e-02, grad_scale: 4.0 2023-03-07 19:13:08,354 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20274.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:13:12,002 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7189, 5.0774, 4.9343, 4.9721, 5.1720, 5.1509, 4.8006, 4.6710], device='cuda:2'), covar=tensor([0.0981, 0.0492, 0.0277, 0.0453, 0.0281, 0.0282, 0.0286, 0.0347], device='cuda:2'), in_proj_covar=tensor([0.0388, 0.0228, 0.0170, 0.0204, 0.0261, 0.0289, 0.0217, 0.0248], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 19:14:07,909 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 3.150e+02 3.613e+02 4.726e+02 1.759e+03, threshold=7.227e+02, percent-clipped=5.0 2023-03-07 19:14:22,591 INFO [train2.py:809] (2/4) Epoch 6, batch 400, loss[ctc_loss=0.1796, att_loss=0.273, loss=0.2543, over 16616.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005947, over 47.00 utterances.], tot_loss[ctc_loss=0.1517, att_loss=0.2754, loss=0.2507, over 2839956.64 frames. 
utt_duration=1249 frames, utt_pad_proportion=0.05456, over 9107.94 utterances.], batch size: 47, lr: 1.85e-02, grad_scale: 8.0 2023-03-07 19:14:25,775 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20322.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:15:10,044 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.49 vs. limit=5.0 2023-03-07 19:15:42,859 INFO [train2.py:809] (2/4) Epoch 6, batch 450, loss[ctc_loss=0.1321, att_loss=0.2449, loss=0.2223, over 15759.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.008814, over 38.00 utterances.], tot_loss[ctc_loss=0.1524, att_loss=0.2762, loss=0.2514, over 2945575.38 frames. utt_duration=1234 frames, utt_pad_proportion=0.05474, over 9561.40 utterances.], batch size: 38, lr: 1.85e-02, grad_scale: 8.0 2023-03-07 19:15:53,045 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20376.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:16:12,235 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20388.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:16:48,765 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.120e+02 3.385e+02 4.079e+02 4.609e+02 1.123e+03, threshold=8.157e+02, percent-clipped=3.0 2023-03-07 19:16:51,231 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2023-03-07 19:17:00,092 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20418.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:17:02,922 INFO [train2.py:809] (2/4) Epoch 6, batch 500, loss[ctc_loss=0.1188, att_loss=0.2296, loss=0.2074, over 15647.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008703, over 37.00 utterances.], tot_loss[ctc_loss=0.1502, att_loss=0.2756, loss=0.2505, over 3027962.71 frames. utt_duration=1259 frames, utt_pad_proportion=0.04701, over 9634.19 utterances.], batch size: 37, lr: 1.85e-02, grad_scale: 8.0 2023-03-07 19:17:04,943 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3435, 4.6087, 4.3139, 4.7629, 2.2118, 4.6175, 2.7515, 1.8986], device='cuda:2'), covar=tensor([0.0157, 0.0170, 0.0862, 0.0198, 0.2672, 0.0233, 0.1770, 0.2172], device='cuda:2'), in_proj_covar=tensor([0.0108, 0.0101, 0.0258, 0.0113, 0.0233, 0.0107, 0.0229, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 19:17:42,536 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20445.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:17:48,952 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20449.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:18:19,473 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20468.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:18:22,351 INFO [train2.py:809] (2/4) Epoch 6, batch 550, loss[ctc_loss=0.1355, att_loss=0.2539, loss=0.2302, over 15965.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005864, over 41.00 utterances.], tot_loss[ctc_loss=0.1492, att_loss=0.2747, loss=0.2496, over 3085594.17 frames. 
utt_duration=1275 frames, utt_pad_proportion=0.04396, over 9690.33 utterances.], batch size: 41, lr: 1.84e-02, grad_scale: 8.0 2023-03-07 19:18:37,163 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20479.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:19:28,294 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 3.263e+02 3.939e+02 5.001e+02 1.057e+03, threshold=7.878e+02, percent-clipped=3.0 2023-03-07 19:19:43,048 INFO [train2.py:809] (2/4) Epoch 6, batch 600, loss[ctc_loss=0.1905, att_loss=0.306, loss=0.2829, over 17347.00 frames. utt_duration=1007 frames, utt_pad_proportion=0.04971, over 69.00 utterances.], tot_loss[ctc_loss=0.1491, att_loss=0.2744, loss=0.2493, over 3119831.54 frames. utt_duration=1249 frames, utt_pad_proportion=0.05242, over 10004.02 utterances.], batch size: 69, lr: 1.84e-02, grad_scale: 8.0 2023-03-07 19:20:04,884 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8389, 4.8879, 4.9655, 2.9869, 4.6991, 4.1612, 4.2408, 2.1112], device='cuda:2'), covar=tensor([0.0184, 0.0112, 0.0164, 0.1199, 0.0104, 0.0231, 0.0341, 0.1964], device='cuda:2'), in_proj_covar=tensor([0.0053, 0.0066, 0.0053, 0.0099, 0.0058, 0.0072, 0.0081, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 19:20:44,096 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3238, 4.7335, 4.5082, 4.6640, 4.7476, 4.3600, 3.8233, 4.6698], device='cuda:2'), covar=tensor([0.0115, 0.0139, 0.0103, 0.0101, 0.0106, 0.0122, 0.0452, 0.0179], device='cuda:2'), in_proj_covar=tensor([0.0058, 0.0057, 0.0061, 0.0043, 0.0042, 0.0053, 0.0076, 0.0071], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-07 19:20:47,281 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20560.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:21:03,190 INFO [train2.py:809] (2/4) Epoch 6, batch 650, loss[ctc_loss=0.1611, att_loss=0.2734, loss=0.2509, over 17473.00 frames. utt_duration=1111 frames, utt_pad_proportion=0.02696, over 63.00 utterances.], tot_loss[ctc_loss=0.1498, att_loss=0.2753, loss=0.2502, over 3162356.10 frames. utt_duration=1259 frames, utt_pad_proportion=0.04716, over 10057.19 utterances.], batch size: 63, lr: 1.84e-02, grad_scale: 8.0 2023-03-07 19:21:32,149 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20588.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:22:08,669 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+02 2.894e+02 3.503e+02 4.491e+02 1.231e+03, threshold=7.006e+02, percent-clipped=2.0 2023-03-07 19:22:23,250 INFO [train2.py:809] (2/4) Epoch 6, batch 700, loss[ctc_loss=0.1721, att_loss=0.3032, loss=0.277, over 16783.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005667, over 48.00 utterances.], tot_loss[ctc_loss=0.1498, att_loss=0.2754, loss=0.2503, over 3195352.04 frames. utt_duration=1267 frames, utt_pad_proportion=0.04425, over 10102.42 utterances.], batch size: 48, lr: 1.84e-02, grad_scale: 8.0 2023-03-07 19:23:09,513 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20649.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:23:20,105 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.48 vs. 
limit=2.0 2023-03-07 19:23:42,154 INFO [train2.py:809] (2/4) Epoch 6, batch 750, loss[ctc_loss=0.15, att_loss=0.2713, loss=0.2471, over 16265.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.007015, over 43.00 utterances.], tot_loss[ctc_loss=0.151, att_loss=0.2757, loss=0.2508, over 3213882.78 frames. utt_duration=1238 frames, utt_pad_proportion=0.05239, over 10398.49 utterances.], batch size: 43, lr: 1.84e-02, grad_scale: 8.0 2023-03-07 19:23:51,714 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20676.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:24:44,870 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-03-07 19:24:47,041 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+02 3.454e+02 4.265e+02 5.138e+02 1.250e+03, threshold=8.530e+02, percent-clipped=7.0 2023-03-07 19:25:01,488 INFO [train2.py:809] (2/4) Epoch 6, batch 800, loss[ctc_loss=0.1935, att_loss=0.2987, loss=0.2777, over 16863.00 frames. utt_duration=682.9 frames, utt_pad_proportion=0.1442, over 99.00 utterances.], tot_loss[ctc_loss=0.1512, att_loss=0.2759, loss=0.2509, over 3239201.24 frames. utt_duration=1241 frames, utt_pad_proportion=0.04976, over 10449.25 utterances.], batch size: 99, lr: 1.83e-02, grad_scale: 8.0 2023-03-07 19:25:07,755 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20724.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:25:39,415 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20744.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:25:41,062 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20745.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:26:02,202 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5073, 5.1428, 4.8337, 5.1698, 5.0491, 4.8341, 3.8838, 5.0082], device='cuda:2'), covar=tensor([0.0116, 0.0136, 0.0094, 0.0079, 0.0102, 0.0098, 0.0523, 0.0242], device='cuda:2'), in_proj_covar=tensor([0.0058, 0.0054, 0.0060, 0.0043, 0.0041, 0.0053, 0.0075, 0.0071], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-07 19:26:18,423 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20768.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:26:21,151 INFO [train2.py:809] (2/4) Epoch 6, batch 850, loss[ctc_loss=0.1392, att_loss=0.2799, loss=0.2517, over 16627.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005409, over 47.00 utterances.], tot_loss[ctc_loss=0.1508, att_loss=0.2754, loss=0.2505, over 3248365.02 frames. utt_duration=1255 frames, utt_pad_proportion=0.04801, over 10367.45 utterances.], batch size: 47, lr: 1.83e-02, grad_scale: 8.0 2023-03-07 19:26:27,689 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20774.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:26:57,839 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20793.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:27:19,853 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. 
limit=2.0 2023-03-07 19:27:26,712 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 3.098e+02 3.888e+02 4.985e+02 1.005e+03, threshold=7.777e+02, percent-clipped=2.0 2023-03-07 19:27:28,898 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2086, 4.9876, 4.8324, 2.8213, 1.9686, 2.8656, 4.3643, 3.6702], device='cuda:2'), covar=tensor([0.0469, 0.0132, 0.0177, 0.2443, 0.5914, 0.2382, 0.0289, 0.1757], device='cuda:2'), in_proj_covar=tensor([0.0285, 0.0178, 0.0208, 0.0179, 0.0357, 0.0333, 0.0186, 0.0325], device='cuda:2'), out_proj_covar=tensor([1.4466e-04, 7.6682e-05, 9.3604e-05, 8.1811e-05, 1.7321e-04, 1.5024e-04, 8.0119e-05, 1.5490e-04], device='cuda:2') 2023-03-07 19:27:35,415 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20816.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:27:41,726 INFO [train2.py:809] (2/4) Epoch 6, batch 900, loss[ctc_loss=0.1507, att_loss=0.2856, loss=0.2586, over 17320.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02254, over 59.00 utterances.], tot_loss[ctc_loss=0.149, att_loss=0.274, loss=0.249, over 3258196.68 frames. utt_duration=1271 frames, utt_pad_proportion=0.04357, over 10263.86 utterances.], batch size: 59, lr: 1.83e-02, grad_scale: 8.0 2023-03-07 19:28:05,780 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6901, 5.9327, 5.3076, 5.8514, 5.6284, 5.1822, 5.3497, 5.1974], device='cuda:2'), covar=tensor([0.1508, 0.0917, 0.0825, 0.0740, 0.0684, 0.1363, 0.2345, 0.2444], device='cuda:2'), in_proj_covar=tensor([0.0349, 0.0385, 0.0300, 0.0312, 0.0285, 0.0371, 0.0423, 0.0392], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 19:28:45,611 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20860.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:29:01,654 INFO [train2.py:809] (2/4) Epoch 6, batch 950, loss[ctc_loss=0.1431, att_loss=0.2514, loss=0.2297, over 15765.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.009044, over 38.00 utterances.], tot_loss[ctc_loss=0.1481, att_loss=0.2733, loss=0.2482, over 3264950.57 frames. utt_duration=1291 frames, utt_pad_proportion=0.0389, over 10124.68 utterances.], batch size: 38, lr: 1.83e-02, grad_scale: 8.0 2023-03-07 19:29:50,502 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.13 vs. limit=5.0 2023-03-07 19:30:01,779 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20908.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:30:06,282 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+02 3.096e+02 4.099e+02 5.928e+02 9.906e+02, threshold=8.198e+02, percent-clipped=5.0 2023-03-07 19:30:21,096 INFO [train2.py:809] (2/4) Epoch 6, batch 1000, loss[ctc_loss=0.1628, att_loss=0.2851, loss=0.2607, over 16472.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006237, over 46.00 utterances.], tot_loss[ctc_loss=0.1489, att_loss=0.2738, loss=0.2488, over 3268328.21 frames. utt_duration=1289 frames, utt_pad_proportion=0.03814, over 10151.87 utterances.], batch size: 46, lr: 1.83e-02, grad_scale: 8.0 2023-03-07 19:30:59,396 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20944.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:31:40,994 INFO [train2.py:809] (2/4) Epoch 6, batch 1050, loss[ctc_loss=0.1352, att_loss=0.2467, loss=0.2244, over 15480.00 frames. 
utt_duration=1722 frames, utt_pad_proportion=0.009935, over 36.00 utterances.], tot_loss[ctc_loss=0.1481, att_loss=0.2729, loss=0.248, over 3257784.31 frames. utt_duration=1277 frames, utt_pad_proportion=0.04577, over 10212.82 utterances.], batch size: 36, lr: 1.82e-02, grad_scale: 8.0 2023-03-07 19:31:55,663 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3985, 4.3911, 4.5184, 4.3636, 2.2600, 4.6473, 2.5468, 2.2318], device='cuda:2'), covar=tensor([0.0182, 0.0205, 0.0599, 0.0290, 0.2368, 0.0217, 0.1779, 0.1683], device='cuda:2'), in_proj_covar=tensor([0.0107, 0.0099, 0.0254, 0.0113, 0.0231, 0.0109, 0.0226, 0.0210], device='cuda:2'), out_proj_covar=tensor([1.0678e-04, 9.9324e-05, 2.2485e-04, 1.0305e-04, 2.0758e-04, 1.0233e-04, 1.9918e-04, 1.8569e-04], device='cuda:2') 2023-03-07 19:32:47,825 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+02 3.142e+02 3.611e+02 4.263e+02 9.074e+02, threshold=7.223e+02, percent-clipped=1.0 2023-03-07 19:33:02,162 INFO [train2.py:809] (2/4) Epoch 6, batch 1100, loss[ctc_loss=0.1302, att_loss=0.253, loss=0.2284, over 16016.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.006707, over 40.00 utterances.], tot_loss[ctc_loss=0.1472, att_loss=0.2717, loss=0.2468, over 3255475.31 frames. utt_duration=1289 frames, utt_pad_proportion=0.04487, over 10111.50 utterances.], batch size: 40, lr: 1.82e-02, grad_scale: 8.0 2023-03-07 19:33:12,061 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0529, 5.3762, 4.7830, 5.3078, 4.9109, 4.6183, 4.7944, 4.7594], device='cuda:2'), covar=tensor([0.1375, 0.0956, 0.0885, 0.0682, 0.0776, 0.1549, 0.2324, 0.2267], device='cuda:2'), in_proj_covar=tensor([0.0342, 0.0388, 0.0301, 0.0312, 0.0287, 0.0363, 0.0415, 0.0381], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 19:33:29,581 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2985, 5.1576, 5.0959, 3.3575, 5.0610, 4.4956, 4.6113, 2.9175], device='cuda:2'), covar=tensor([0.0109, 0.0075, 0.0185, 0.0812, 0.0081, 0.0155, 0.0233, 0.1294], device='cuda:2'), in_proj_covar=tensor([0.0053, 0.0066, 0.0054, 0.0098, 0.0059, 0.0074, 0.0081, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 19:33:40,398 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21044.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:33:52,752 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21052.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:34:22,327 INFO [train2.py:809] (2/4) Epoch 6, batch 1150, loss[ctc_loss=0.1518, att_loss=0.2772, loss=0.2521, over 16609.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.005742, over 47.00 utterances.], tot_loss[ctc_loss=0.1485, att_loss=0.2727, loss=0.2479, over 3266169.64 frames. 
utt_duration=1302 frames, utt_pad_proportion=0.03964, over 10042.21 utterances.], batch size: 47, lr: 1.82e-02, grad_scale: 8.0 2023-03-07 19:34:28,581 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21074.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:34:57,178 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21092.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:35:19,146 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1752, 5.0609, 4.9026, 2.6725, 1.9549, 2.7260, 4.9499, 3.8380], device='cuda:2'), covar=tensor([0.0504, 0.0150, 0.0233, 0.2790, 0.6098, 0.2484, 0.0177, 0.1766], device='cuda:2'), in_proj_covar=tensor([0.0283, 0.0177, 0.0204, 0.0179, 0.0358, 0.0333, 0.0186, 0.0326], device='cuda:2'), out_proj_covar=tensor([1.4387e-04, 7.6809e-05, 9.2539e-05, 8.1164e-05, 1.7315e-04, 1.5035e-04, 7.9397e-05, 1.5497e-04], device='cuda:2') 2023-03-07 19:35:27,187 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 3.236e+02 4.040e+02 5.073e+02 1.159e+03, threshold=8.079e+02, percent-clipped=5.0 2023-03-07 19:35:30,953 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21113.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:35:39,367 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21118.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:35:41,977 INFO [train2.py:809] (2/4) Epoch 6, batch 1200, loss[ctc_loss=0.1536, att_loss=0.2784, loss=0.2534, over 16465.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.007362, over 46.00 utterances.], tot_loss[ctc_loss=0.1483, att_loss=0.2729, loss=0.248, over 3269764.69 frames. utt_duration=1298 frames, utt_pad_proportion=0.03924, over 10086.04 utterances.], batch size: 46, lr: 1.82e-02, grad_scale: 8.0 2023-03-07 19:35:45,287 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21122.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:35:47,179 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1420, 4.3727, 4.1644, 4.3313, 4.4166, 4.1882, 4.0352, 2.1813], device='cuda:2'), covar=tensor([0.0282, 0.0240, 0.0230, 0.0149, 0.1059, 0.0239, 0.0327, 0.2597], device='cuda:2'), in_proj_covar=tensor([0.0132, 0.0124, 0.0120, 0.0125, 0.0294, 0.0130, 0.0117, 0.0237], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 19:36:18,596 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7258, 5.0621, 5.0160, 4.8836, 5.1407, 5.0973, 4.7970, 4.5787], device='cuda:2'), covar=tensor([0.1039, 0.0467, 0.0220, 0.0479, 0.0299, 0.0279, 0.0314, 0.0327], device='cuda:2'), in_proj_covar=tensor([0.0390, 0.0227, 0.0166, 0.0206, 0.0260, 0.0291, 0.0222, 0.0250], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 19:37:02,263 INFO [train2.py:809] (2/4) Epoch 6, batch 1250, loss[ctc_loss=0.1469, att_loss=0.2559, loss=0.2341, over 15759.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009248, over 38.00 utterances.], tot_loss[ctc_loss=0.1477, att_loss=0.2724, loss=0.2474, over 3269867.98 frames. 
utt_duration=1295 frames, utt_pad_proportion=0.0404, over 10109.10 utterances.], batch size: 38, lr: 1.82e-02, grad_scale: 8.0 2023-03-07 19:37:17,148 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21179.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:38:08,270 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 3.230e+02 3.853e+02 4.858e+02 1.560e+03, threshold=7.707e+02, percent-clipped=4.0 2023-03-07 19:38:22,119 INFO [train2.py:809] (2/4) Epoch 6, batch 1300, loss[ctc_loss=0.1231, att_loss=0.247, loss=0.2222, over 15640.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008645, over 37.00 utterances.], tot_loss[ctc_loss=0.1483, att_loss=0.2729, loss=0.248, over 3275514.63 frames. utt_duration=1307 frames, utt_pad_proportion=0.03659, over 10036.97 utterances.], batch size: 37, lr: 1.81e-02, grad_scale: 8.0 2023-03-07 19:38:59,934 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21244.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:39:10,463 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.61 vs. limit=2.0 2023-03-07 19:39:42,381 INFO [train2.py:809] (2/4) Epoch 6, batch 1350, loss[ctc_loss=0.1738, att_loss=0.2933, loss=0.2694, over 17363.00 frames. utt_duration=1104 frames, utt_pad_proportion=0.03255, over 63.00 utterances.], tot_loss[ctc_loss=0.1489, att_loss=0.2741, loss=0.249, over 3273069.03 frames. utt_duration=1260 frames, utt_pad_proportion=0.04669, over 10404.98 utterances.], batch size: 63, lr: 1.81e-02, grad_scale: 8.0 2023-03-07 19:40:17,788 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21292.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:40:48,695 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.207e+02 3.220e+02 3.890e+02 4.663e+02 1.481e+03, threshold=7.781e+02, percent-clipped=4.0 2023-03-07 19:40:50,620 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21312.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:41:02,794 INFO [train2.py:809] (2/4) Epoch 6, batch 1400, loss[ctc_loss=0.1154, att_loss=0.2482, loss=0.2216, over 16125.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006279, over 42.00 utterances.], tot_loss[ctc_loss=0.1489, att_loss=0.2735, loss=0.2485, over 3271791.20 frames. utt_duration=1258 frames, utt_pad_proportion=0.04905, over 10418.29 utterances.], batch size: 42, lr: 1.81e-02, grad_scale: 8.0 2023-03-07 19:41:16,040 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21328.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:41:19,165 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5919, 4.9421, 4.8376, 4.8595, 5.0612, 4.9993, 4.7341, 4.4812], device='cuda:2'), covar=tensor([0.1174, 0.0477, 0.0277, 0.0456, 0.0271, 0.0276, 0.0326, 0.0326], device='cuda:2'), in_proj_covar=tensor([0.0395, 0.0228, 0.0168, 0.0207, 0.0263, 0.0294, 0.0224, 0.0254], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 19:42:23,223 INFO [train2.py:809] (2/4) Epoch 6, batch 1450, loss[ctc_loss=0.1435, att_loss=0.2476, loss=0.2268, over 15372.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01052, over 35.00 utterances.], tot_loss[ctc_loss=0.1489, att_loss=0.2739, loss=0.2489, over 3276592.09 frames. 
utt_duration=1254 frames, utt_pad_proportion=0.05009, over 10464.83 utterances.], batch size: 35, lr: 1.81e-02, grad_scale: 8.0 2023-03-07 19:42:28,311 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21373.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:42:33,293 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.71 vs. limit=5.0 2023-03-07 19:42:54,061 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21389.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:43:16,655 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.99 vs. limit=5.0 2023-03-07 19:43:24,934 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21408.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:43:29,213 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.143e+02 3.096e+02 3.926e+02 4.783e+02 1.162e+03, threshold=7.851e+02, percent-clipped=3.0 2023-03-07 19:43:43,053 INFO [train2.py:809] (2/4) Epoch 6, batch 1500, loss[ctc_loss=0.129, att_loss=0.2375, loss=0.2158, over 15363.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.0116, over 35.00 utterances.], tot_loss[ctc_loss=0.1479, att_loss=0.273, loss=0.248, over 3272740.77 frames. utt_duration=1268 frames, utt_pad_proportion=0.04954, over 10335.41 utterances.], batch size: 35, lr: 1.81e-02, grad_scale: 8.0 2023-03-07 19:44:48,186 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9329, 4.6837, 4.8509, 4.7145, 5.3174, 4.8146, 4.6051, 2.1892], device='cuda:2'), covar=tensor([0.0324, 0.0366, 0.0185, 0.0364, 0.1038, 0.0288, 0.0331, 0.2572], device='cuda:2'), in_proj_covar=tensor([0.0130, 0.0123, 0.0119, 0.0123, 0.0296, 0.0128, 0.0115, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 19:45:01,845 INFO [train2.py:809] (2/4) Epoch 6, batch 1550, loss[ctc_loss=0.1153, att_loss=0.2435, loss=0.2178, over 16000.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007822, over 40.00 utterances.], tot_loss[ctc_loss=0.1478, att_loss=0.2733, loss=0.2482, over 3273976.92 frames. utt_duration=1247 frames, utt_pad_proportion=0.05507, over 10513.00 utterances.], batch size: 40, lr: 1.80e-02, grad_scale: 8.0 2023-03-07 19:45:08,975 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21474.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:45:17,698 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2023-03-07 19:46:07,490 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 3.352e+02 3.791e+02 4.457e+02 9.611e+02, threshold=7.583e+02, percent-clipped=1.0 2023-03-07 19:46:21,675 INFO [train2.py:809] (2/4) Epoch 6, batch 1600, loss[ctc_loss=0.1584, att_loss=0.2917, loss=0.265, over 17297.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02325, over 59.00 utterances.], tot_loss[ctc_loss=0.1473, att_loss=0.2733, loss=0.2481, over 3272644.26 frames. utt_duration=1247 frames, utt_pad_proportion=0.05557, over 10510.86 utterances.], batch size: 59, lr: 1.80e-02, grad_scale: 8.0 2023-03-07 19:46:50,969 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-07 19:47:07,345 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.83 vs. 
limit=2.0 2023-03-07 19:47:19,222 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7262, 1.3778, 2.0569, 1.4596, 2.4985, 1.7574, 1.8322, 2.0275], device='cuda:2'), covar=tensor([0.0375, 0.3815, 0.2798, 0.1606, 0.0923, 0.2161, 0.1853, 0.1413], device='cuda:2'), in_proj_covar=tensor([0.0092, 0.0102, 0.0101, 0.0087, 0.0087, 0.0086, 0.0096, 0.0077], device='cuda:2'), out_proj_covar=tensor([4.1246e-05, 5.5142e-05, 5.4150e-05, 4.4771e-05, 4.1316e-05, 4.8753e-05, 5.2075e-05, 4.4643e-05], device='cuda:2') 2023-03-07 19:47:41,020 INFO [train2.py:809] (2/4) Epoch 6, batch 1650, loss[ctc_loss=0.1623, att_loss=0.2697, loss=0.2482, over 16784.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005637, over 48.00 utterances.], tot_loss[ctc_loss=0.1474, att_loss=0.2729, loss=0.2478, over 3268872.61 frames. utt_duration=1264 frames, utt_pad_proportion=0.05253, over 10356.52 utterances.], batch size: 48, lr: 1.80e-02, grad_scale: 8.0 2023-03-07 19:47:42,794 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0449, 3.9685, 3.3071, 3.8221, 4.2859, 3.8599, 2.7457, 4.7739], device='cuda:2'), covar=tensor([0.0955, 0.0387, 0.1001, 0.0562, 0.0440, 0.0616, 0.1058, 0.0258], device='cuda:2'), in_proj_covar=tensor([0.0162, 0.0141, 0.0185, 0.0151, 0.0173, 0.0179, 0.0157, 0.0169], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 19:48:20,305 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21595.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:48:26,489 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0166, 4.8416, 4.8594, 2.8838, 4.7126, 4.1737, 4.2649, 3.1443], device='cuda:2'), covar=tensor([0.0100, 0.0090, 0.0168, 0.0971, 0.0080, 0.0204, 0.0237, 0.1103], device='cuda:2'), in_proj_covar=tensor([0.0053, 0.0066, 0.0053, 0.0097, 0.0058, 0.0074, 0.0080, 0.0100], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 19:48:47,370 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+02 3.087e+02 4.072e+02 5.275e+02 1.721e+03, threshold=8.145e+02, percent-clipped=7.0 2023-03-07 19:48:50,755 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21613.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:49:01,645 INFO [train2.py:809] (2/4) Epoch 6, batch 1700, loss[ctc_loss=0.1063, att_loss=0.2564, loss=0.2264, over 16377.00 frames. utt_duration=1490 frames, utt_pad_proportion=0.008407, over 44.00 utterances.], tot_loss[ctc_loss=0.1469, att_loss=0.2718, loss=0.2468, over 3259759.94 frames. 
utt_duration=1266 frames, utt_pad_proportion=0.05487, over 10313.63 utterances.], batch size: 44, lr: 1.80e-02, grad_scale: 8.0 2023-03-07 19:49:54,563 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5020, 2.3401, 4.9467, 3.6935, 2.8786, 4.3983, 4.5918, 4.6469], device='cuda:2'), covar=tensor([0.0182, 0.2003, 0.0126, 0.1260, 0.2177, 0.0219, 0.0105, 0.0204], device='cuda:2'), in_proj_covar=tensor([0.0131, 0.0240, 0.0120, 0.0300, 0.0296, 0.0181, 0.0107, 0.0135], device='cuda:2'), out_proj_covar=tensor([1.1638e-04, 1.9621e-04, 1.0655e-04, 2.4236e-04, 2.4765e-04, 1.5692e-04, 9.6252e-05, 1.2013e-04], device='cuda:2') 2023-03-07 19:50:00,441 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21656.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:50:08,283 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3597, 3.8264, 2.9956, 3.3738, 3.7700, 3.3750, 2.3229, 4.3784], device='cuda:2'), covar=tensor([0.1378, 0.0350, 0.1164, 0.0788, 0.0588, 0.0817, 0.1284, 0.0372], device='cuda:2'), in_proj_covar=tensor([0.0163, 0.0141, 0.0186, 0.0153, 0.0175, 0.0182, 0.0158, 0.0171], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 19:50:19,140 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21668.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:50:22,100 INFO [train2.py:809] (2/4) Epoch 6, batch 1750, loss[ctc_loss=0.1346, att_loss=0.2681, loss=0.2414, over 16393.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.006779, over 44.00 utterances.], tot_loss[ctc_loss=0.147, att_loss=0.2728, loss=0.2477, over 3270795.50 frames. utt_duration=1223 frames, utt_pad_proportion=0.06095, over 10712.88 utterances.], batch size: 44, lr: 1.80e-02, grad_scale: 8.0 2023-03-07 19:50:29,303 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21674.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:50:44,675 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21684.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:51:23,942 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21708.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:51:28,243 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+02 3.408e+02 4.076e+02 5.243e+02 8.515e+02, threshold=8.152e+02, percent-clipped=2.0 2023-03-07 19:51:42,607 INFO [train2.py:809] (2/4) Epoch 6, batch 1800, loss[ctc_loss=0.1619, att_loss=0.2849, loss=0.2603, over 17114.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01534, over 56.00 utterances.], tot_loss[ctc_loss=0.1478, att_loss=0.2736, loss=0.2484, over 3274288.75 frames. 
utt_duration=1209 frames, utt_pad_proportion=0.06254, over 10847.28 utterances.], batch size: 56, lr: 1.79e-02, grad_scale: 8.0 2023-03-07 19:51:58,407 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8648, 3.8657, 3.1742, 3.4538, 3.8696, 3.6262, 2.5051, 4.5357], device='cuda:2'), covar=tensor([0.1038, 0.0383, 0.1074, 0.0626, 0.0533, 0.0713, 0.1144, 0.0252], device='cuda:2'), in_proj_covar=tensor([0.0159, 0.0140, 0.0183, 0.0151, 0.0172, 0.0179, 0.0157, 0.0170], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 19:52:18,919 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5755, 2.6194, 4.9166, 3.8435, 2.8835, 4.3726, 4.6274, 4.7555], device='cuda:2'), covar=tensor([0.0158, 0.1849, 0.0141, 0.1181, 0.2313, 0.0263, 0.0100, 0.0176], device='cuda:2'), in_proj_covar=tensor([0.0133, 0.0241, 0.0121, 0.0300, 0.0298, 0.0185, 0.0108, 0.0136], device='cuda:2'), out_proj_covar=tensor([1.1833e-04, 1.9669e-04, 1.0736e-04, 2.4281e-04, 2.4969e-04, 1.5989e-04, 9.7108e-05, 1.2168e-04], device='cuda:2') 2023-03-07 19:52:39,969 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21756.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:52:51,281 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21763.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:53:02,484 INFO [train2.py:809] (2/4) Epoch 6, batch 1850, loss[ctc_loss=0.1187, att_loss=0.2647, loss=0.2355, over 16525.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.004962, over 45.00 utterances.], tot_loss[ctc_loss=0.1482, att_loss=0.2743, loss=0.2491, over 3273118.12 frames. utt_duration=1195 frames, utt_pad_proportion=0.06561, over 10968.29 utterances.], batch size: 45, lr: 1.79e-02, grad_scale: 8.0 2023-03-07 19:53:09,232 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21774.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:54:09,694 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.389e+02 3.296e+02 3.946e+02 5.007e+02 1.050e+03, threshold=7.891e+02, percent-clipped=3.0 2023-03-07 19:54:24,826 INFO [train2.py:809] (2/4) Epoch 6, batch 1900, loss[ctc_loss=0.1355, att_loss=0.2742, loss=0.2465, over 17191.00 frames. utt_duration=871.8 frames, utt_pad_proportion=0.08619, over 79.00 utterances.], tot_loss[ctc_loss=0.1468, att_loss=0.2733, loss=0.248, over 3277266.99 frames. utt_duration=1221 frames, utt_pad_proportion=0.05816, over 10753.58 utterances.], batch size: 79, lr: 1.79e-02, grad_scale: 8.0 2023-03-07 19:54:28,010 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21822.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:54:31,372 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21824.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:55:14,669 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21850.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:55:41,336 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0 2023-03-07 19:55:47,727 INFO [train2.py:809] (2/4) Epoch 6, batch 1950, loss[ctc_loss=0.1197, att_loss=0.2377, loss=0.2141, over 15504.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008562, over 36.00 utterances.], tot_loss[ctc_loss=0.1464, att_loss=0.2727, loss=0.2474, over 3271827.24 frames. 
utt_duration=1218 frames, utt_pad_proportion=0.06128, over 10761.59 utterances.], batch size: 36, lr: 1.79e-02, grad_scale: 8.0 2023-03-07 19:56:30,692 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-07 19:56:54,343 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 3.406e+02 4.124e+02 5.166e+02 8.214e+02, threshold=8.249e+02, percent-clipped=4.0 2023-03-07 19:56:54,798 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21911.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:57:09,087 INFO [train2.py:809] (2/4) Epoch 6, batch 2000, loss[ctc_loss=0.1587, att_loss=0.2831, loss=0.2582, over 17304.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01096, over 55.00 utterances.], tot_loss[ctc_loss=0.1485, att_loss=0.2745, loss=0.2493, over 3278432.31 frames. utt_duration=1193 frames, utt_pad_proportion=0.06477, over 11010.51 utterances.], batch size: 55, lr: 1.79e-02, grad_scale: 8.0 2023-03-07 19:57:48,367 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8025, 4.8255, 4.7919, 2.6394, 4.6345, 4.3293, 3.9774, 2.2222], device='cuda:2'), covar=tensor([0.0109, 0.0100, 0.0116, 0.1084, 0.0085, 0.0185, 0.0334, 0.1744], device='cuda:2'), in_proj_covar=tensor([0.0054, 0.0066, 0.0054, 0.0100, 0.0059, 0.0076, 0.0082, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0004, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 19:57:59,926 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21951.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:58:27,729 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21968.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:58:29,213 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21969.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:58:30,578 INFO [train2.py:809] (2/4) Epoch 6, batch 2050, loss[ctc_loss=0.1434, att_loss=0.2839, loss=0.2558, over 17310.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02404, over 59.00 utterances.], tot_loss[ctc_loss=0.1477, att_loss=0.274, loss=0.2487, over 3276368.61 frames. utt_duration=1205 frames, utt_pad_proportion=0.06216, over 10891.12 utterances.], batch size: 59, lr: 1.78e-02, grad_scale: 8.0 2023-03-07 19:58:30,983 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21970.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:58:53,164 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21984.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:59:42,652 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 3.148e+02 3.962e+02 4.824e+02 9.666e+02, threshold=7.923e+02, percent-clipped=1.0 2023-03-07 19:59:51,301 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22016.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 19:59:57,618 INFO [train2.py:809] (2/4) Epoch 6, batch 2100, loss[ctc_loss=0.1651, att_loss=0.2939, loss=0.2682, over 17382.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03409, over 63.00 utterances.], tot_loss[ctc_loss=0.147, att_loss=0.274, loss=0.2486, over 3270763.95 frames. 
utt_duration=1199 frames, utt_pad_proportion=0.06535, over 10925.97 utterances.], batch size: 63, lr: 1.78e-02, grad_scale: 8.0 2023-03-07 20:00:15,240 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22031.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:00:16,530 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22032.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:01:20,718 INFO [train2.py:809] (2/4) Epoch 6, batch 2150, loss[ctc_loss=0.1817, att_loss=0.2942, loss=0.2717, over 16698.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005933, over 46.00 utterances.], tot_loss[ctc_loss=0.1464, att_loss=0.273, loss=0.2477, over 3270251.53 frames. utt_duration=1226 frames, utt_pad_proportion=0.05833, over 10679.30 utterances.], batch size: 46, lr: 1.78e-02, grad_scale: 8.0 2023-03-07 20:02:28,004 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-03-07 20:02:28,353 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.159e+02 3.198e+02 3.917e+02 4.536e+02 8.271e+02, threshold=7.834e+02, percent-clipped=2.0 2023-03-07 20:02:42,295 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22119.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:02:43,575 INFO [train2.py:809] (2/4) Epoch 6, batch 2200, loss[ctc_loss=0.1538, att_loss=0.2907, loss=0.2633, over 17327.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02215, over 59.00 utterances.], tot_loss[ctc_loss=0.145, att_loss=0.272, loss=0.2466, over 3268600.30 frames. utt_duration=1254 frames, utt_pad_proportion=0.05282, over 10437.41 utterances.], batch size: 59, lr: 1.78e-02, grad_scale: 8.0 2023-03-07 20:02:59,627 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22130.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:03:48,127 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2023-03-07 20:03:58,696 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.48 vs. limit=2.0 2023-03-07 20:04:05,345 INFO [train2.py:809] (2/4) Epoch 6, batch 2250, loss[ctc_loss=0.1117, att_loss=0.2528, loss=0.2246, over 15646.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008782, over 37.00 utterances.], tot_loss[ctc_loss=0.1439, att_loss=0.2715, loss=0.2459, over 3274901.62 frames. utt_duration=1270 frames, utt_pad_proportion=0.04714, over 10327.08 utterances.], batch size: 37, lr: 1.78e-02, grad_scale: 8.0 2023-03-07 20:04:14,635 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22176.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:04:38,885 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22191.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:05:03,104 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22206.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:05:11,251 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+02 2.999e+02 3.625e+02 4.758e+02 1.652e+03, threshold=7.250e+02, percent-clipped=4.0 2023-03-07 20:05:17,967 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22215.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:05:25,454 INFO [train2.py:809] (2/4) Epoch 6, batch 2300, loss[ctc_loss=0.1475, att_loss=0.2686, loss=0.2444, over 16957.00 frames. 
utt_duration=1358 frames, utt_pad_proportion=0.00786, over 50.00 utterances.], tot_loss[ctc_loss=0.145, att_loss=0.2719, loss=0.2466, over 3271300.05 frames. utt_duration=1261 frames, utt_pad_proportion=0.05056, over 10389.45 utterances.], batch size: 50, lr: 1.77e-02, grad_scale: 16.0 2023-03-07 20:05:31,941 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9466, 5.3277, 4.7866, 5.3730, 4.7216, 5.0786, 5.5039, 5.3354], device='cuda:2'), covar=tensor([0.0392, 0.0231, 0.0716, 0.0157, 0.0435, 0.0179, 0.0177, 0.0136], device='cuda:2'), in_proj_covar=tensor([0.0238, 0.0186, 0.0245, 0.0162, 0.0202, 0.0158, 0.0177, 0.0174], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-07 20:05:49,271 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5134, 5.8415, 5.2387, 5.7313, 5.4284, 5.1530, 5.1708, 5.0766], device='cuda:2'), covar=tensor([0.1242, 0.0828, 0.0866, 0.0683, 0.0683, 0.1316, 0.2272, 0.2039], device='cuda:2'), in_proj_covar=tensor([0.0349, 0.0397, 0.0305, 0.0324, 0.0286, 0.0373, 0.0430, 0.0391], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 20:05:53,160 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22237.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:05:53,189 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22237.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:06:15,983 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22251.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:06:44,820 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22269.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:06:46,010 INFO [train2.py:809] (2/4) Epoch 6, batch 2350, loss[ctc_loss=0.1735, att_loss=0.2815, loss=0.2599, over 17019.00 frames. utt_duration=689.1 frames, utt_pad_proportion=0.1343, over 99.00 utterances.], tot_loss[ctc_loss=0.1448, att_loss=0.2718, loss=0.2464, over 3269204.42 frames. 
utt_duration=1237 frames, utt_pad_proportion=0.05685, over 10585.53 utterances.], batch size: 99, lr: 1.77e-02, grad_scale: 16.0 2023-03-07 20:06:51,374 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7379, 4.5786, 4.5922, 4.6325, 5.0642, 4.9236, 4.6332, 2.1569], device='cuda:2'), covar=tensor([0.0245, 0.0294, 0.0212, 0.0145, 0.1082, 0.0176, 0.0230, 0.2980], device='cuda:2'), in_proj_covar=tensor([0.0131, 0.0124, 0.0124, 0.0124, 0.0299, 0.0128, 0.0116, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 20:06:56,071 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22276.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:07:30,972 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22298.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:07:32,241 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22299.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:07:52,173 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 3.128e+02 3.825e+02 4.605e+02 9.373e+02, threshold=7.650e+02, percent-clipped=4.0 2023-03-07 20:08:01,632 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22317.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:08:06,064 INFO [train2.py:809] (2/4) Epoch 6, batch 2400, loss[ctc_loss=0.1154, att_loss=0.2535, loss=0.2259, over 16287.00 frames. utt_duration=1517 frames, utt_pad_proportion=0.006111, over 43.00 utterances.], tot_loss[ctc_loss=0.1455, att_loss=0.2723, loss=0.2469, over 3272021.05 frames. utt_duration=1225 frames, utt_pad_proportion=0.05847, over 10696.85 utterances.], batch size: 43, lr: 1.77e-02, grad_scale: 16.0 2023-03-07 20:08:15,573 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22326.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:08:18,686 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2725, 4.8490, 4.4890, 4.7341, 4.7750, 4.5028, 3.6717, 4.6763], device='cuda:2'), covar=tensor([0.0122, 0.0105, 0.0103, 0.0098, 0.0082, 0.0113, 0.0519, 0.0193], device='cuda:2'), in_proj_covar=tensor([0.0058, 0.0056, 0.0064, 0.0044, 0.0044, 0.0053, 0.0077, 0.0074], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 20:09:26,765 INFO [train2.py:809] (2/4) Epoch 6, batch 2450, loss[ctc_loss=0.1215, att_loss=0.2552, loss=0.2284, over 16116.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006987, over 42.00 utterances.], tot_loss[ctc_loss=0.1452, att_loss=0.272, loss=0.2467, over 3273305.63 frames. utt_duration=1252 frames, utt_pad_proportion=0.05136, over 10473.48 utterances.], batch size: 42, lr: 1.77e-02, grad_scale: 16.0 2023-03-07 20:09:51,342 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22385.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:10:34,026 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 3.249e+02 3.944e+02 4.920e+02 7.510e+02, threshold=7.889e+02, percent-clipped=0.0 2023-03-07 20:10:47,145 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22419.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:10:48,451 INFO [train2.py:809] (2/4) Epoch 6, batch 2500, loss[ctc_loss=0.1688, att_loss=0.2615, loss=0.2429, over 15770.00 frames. 
utt_duration=1661 frames, utt_pad_proportion=0.008683, over 38.00 utterances.], tot_loss[ctc_loss=0.1448, att_loss=0.272, loss=0.2466, over 3279173.24 frames. utt_duration=1273 frames, utt_pad_proportion=0.04508, over 10319.04 utterances.], batch size: 38, lr: 1.77e-02, grad_scale: 16.0 2023-03-07 20:11:09,040 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6656, 2.9950, 3.7129, 2.8691, 3.7644, 4.6441, 4.4405, 3.2750], device='cuda:2'), covar=tensor([0.0318, 0.1449, 0.0939, 0.1284, 0.0882, 0.0564, 0.0439, 0.1325], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0211, 0.0213, 0.0194, 0.0217, 0.0225, 0.0174, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 20:11:15,036 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.61 vs. limit=2.0 2023-03-07 20:11:30,417 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22446.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:11:55,090 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.1372, 3.0678, 3.5537, 2.5382, 3.3396, 3.5398, 3.1814, 1.8202], device='cuda:2'), covar=tensor([0.1773, 0.1137, 0.1929, 0.6891, 0.2351, 0.1780, 0.0868, 1.0588], device='cuda:2'), in_proj_covar=tensor([0.0071, 0.0074, 0.0079, 0.0111, 0.0070, 0.0107, 0.0066, 0.0125], device='cuda:2'), out_proj_covar=tensor([5.8012e-05, 5.4198e-05, 6.1411e-05, 8.3618e-05, 5.5884e-05, 8.2085e-05, 5.0685e-05, 9.5393e-05], device='cuda:2') 2023-03-07 20:12:04,149 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22467.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:12:08,537 INFO [train2.py:809] (2/4) Epoch 6, batch 2550, loss[ctc_loss=0.1504, att_loss=0.2857, loss=0.2586, over 17132.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01421, over 56.00 utterances.], tot_loss[ctc_loss=0.1444, att_loss=0.2713, loss=0.2459, over 3280601.65 frames. utt_duration=1286 frames, utt_pad_proportion=0.04112, over 10218.78 utterances.], batch size: 56, lr: 1.76e-02, grad_scale: 16.0 2023-03-07 20:12:34,601 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22486.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:12:43,119 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.3529, 2.7998, 3.0516, 2.2904, 2.9219, 2.9454, 2.9235, 1.6362], device='cuda:2'), covar=tensor([0.1787, 0.1258, 0.2424, 0.6295, 0.3229, 0.4804, 0.0778, 1.1521], device='cuda:2'), in_proj_covar=tensor([0.0071, 0.0074, 0.0081, 0.0112, 0.0070, 0.0109, 0.0066, 0.0126], device='cuda:2'), out_proj_covar=tensor([5.8251e-05, 5.4516e-05, 6.2219e-05, 8.4196e-05, 5.6056e-05, 8.3216e-05, 5.1320e-05, 9.6198e-05], device='cuda:2') 2023-03-07 20:13:07,428 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22506.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:13:07,887 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2023-03-07 20:13:14,826 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 3.216e+02 3.975e+02 4.594e+02 1.058e+03, threshold=7.951e+02, percent-clipped=4.0 2023-03-07 20:13:28,849 INFO [train2.py:809] (2/4) Epoch 6, batch 2600, loss[ctc_loss=0.1846, att_loss=0.3043, loss=0.2803, over 17124.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01467, over 56.00 utterances.], tot_loss[ctc_loss=0.144, att_loss=0.2714, loss=0.2459, over 3279861.49 frames. 
utt_duration=1273 frames, utt_pad_proportion=0.04482, over 10317.61 utterances.], batch size: 56, lr: 1.76e-02, grad_scale: 16.0 2023-03-07 20:13:43,362 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3312, 1.4614, 1.9055, 1.5540, 3.2896, 2.0978, 1.6300, 2.0641], device='cuda:2'), covar=tensor([0.1490, 0.3497, 0.2971, 0.1963, 0.0523, 0.1741, 0.2478, 0.1603], device='cuda:2'), in_proj_covar=tensor([0.0087, 0.0096, 0.0096, 0.0084, 0.0080, 0.0079, 0.0094, 0.0074], device='cuda:2'), out_proj_covar=tensor([3.9800e-05, 5.3389e-05, 5.3080e-05, 4.4598e-05, 3.9547e-05, 4.5902e-05, 5.2301e-05, 4.3800e-05], device='cuda:2') 2023-03-07 20:13:44,195 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.78 vs. limit=5.0 2023-03-07 20:13:48,534 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22532.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:14:25,015 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22554.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:14:51,059 INFO [train2.py:809] (2/4) Epoch 6, batch 2650, loss[ctc_loss=0.1304, att_loss=0.2753, loss=0.2463, over 17320.00 frames. utt_duration=878.6 frames, utt_pad_proportion=0.07999, over 79.00 utterances.], tot_loss[ctc_loss=0.1428, att_loss=0.2705, loss=0.245, over 3277367.06 frames. utt_duration=1271 frames, utt_pad_proportion=0.04574, over 10325.43 utterances.], batch size: 79, lr: 1.76e-02, grad_scale: 16.0 2023-03-07 20:14:52,734 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22571.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:15:28,352 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22593.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:15:59,450 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 3.146e+02 3.715e+02 4.789e+02 1.567e+03, threshold=7.430e+02, percent-clipped=3.0 2023-03-07 20:16:11,939 INFO [train2.py:809] (2/4) Epoch 6, batch 2700, loss[ctc_loss=0.1234, att_loss=0.2748, loss=0.2445, over 17029.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.00816, over 51.00 utterances.], tot_loss[ctc_loss=0.144, att_loss=0.2716, loss=0.2461, over 3279496.53 frames. utt_duration=1273 frames, utt_pad_proportion=0.04454, over 10320.15 utterances.], batch size: 51, lr: 1.76e-02, grad_scale: 8.0 2023-03-07 20:16:21,398 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22626.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:16:23,848 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. 
limit=2.0 2023-03-07 20:16:41,834 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22639.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:17:15,426 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8536, 5.3587, 4.2842, 5.4503, 4.6640, 5.2809, 5.3384, 5.3213], device='cuda:2'), covar=tensor([0.0380, 0.0228, 0.1038, 0.0131, 0.0385, 0.0112, 0.0293, 0.0144], device='cuda:2'), in_proj_covar=tensor([0.0247, 0.0193, 0.0254, 0.0170, 0.0211, 0.0164, 0.0187, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 20:17:26,631 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7542, 5.1829, 4.9947, 5.0044, 5.1783, 5.2152, 4.9840, 4.6898], device='cuda:2'), covar=tensor([0.1076, 0.0380, 0.0224, 0.0427, 0.0251, 0.0231, 0.0252, 0.0294], device='cuda:2'), in_proj_covar=tensor([0.0405, 0.0232, 0.0175, 0.0219, 0.0278, 0.0303, 0.0230, 0.0262], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 20:17:32,383 INFO [train2.py:809] (2/4) Epoch 6, batch 2750, loss[ctc_loss=0.1468, att_loss=0.2697, loss=0.2451, over 16532.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006662, over 45.00 utterances.], tot_loss[ctc_loss=0.1434, att_loss=0.2709, loss=0.2454, over 3281432.33 frames. utt_duration=1290 frames, utt_pad_proportion=0.04009, over 10184.48 utterances.], batch size: 45, lr: 1.76e-02, grad_scale: 8.0 2023-03-07 20:17:38,674 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22674.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:18:04,604 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5496, 4.9088, 4.3369, 4.9039, 4.1938, 4.6864, 4.9650, 4.8316], device='cuda:2'), covar=tensor([0.0415, 0.0270, 0.0812, 0.0194, 0.0554, 0.0245, 0.0267, 0.0153], device='cuda:2'), in_proj_covar=tensor([0.0244, 0.0191, 0.0252, 0.0169, 0.0211, 0.0162, 0.0186, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 20:18:21,033 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22700.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 20:18:35,606 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1201, 4.8567, 4.8356, 4.7428, 4.8637, 4.9504, 4.7086, 4.5361], device='cuda:2'), covar=tensor([0.2071, 0.0648, 0.0309, 0.0481, 0.0505, 0.0387, 0.0338, 0.0347], device='cuda:2'), in_proj_covar=tensor([0.0403, 0.0232, 0.0174, 0.0218, 0.0277, 0.0303, 0.0227, 0.0260], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 20:18:40,006 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+02 3.125e+02 3.816e+02 4.658e+02 9.758e+02, threshold=7.632e+02, percent-clipped=3.0 2023-03-07 20:18:52,612 INFO [train2.py:809] (2/4) Epoch 6, batch 2800, loss[ctc_loss=0.1368, att_loss=0.2769, loss=0.2489, over 17067.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.007939, over 52.00 utterances.], tot_loss[ctc_loss=0.1433, att_loss=0.2713, loss=0.2457, over 3293235.64 frames. 
utt_duration=1293 frames, utt_pad_proportion=0.03632, over 10200.42 utterances.], batch size: 52, lr: 1.76e-02, grad_scale: 8.0 2023-03-07 20:19:26,388 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22741.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:19:32,012 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-03-07 20:20:11,907 INFO [train2.py:809] (2/4) Epoch 6, batch 2850, loss[ctc_loss=0.1333, att_loss=0.2518, loss=0.2281, over 15629.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.009827, over 37.00 utterances.], tot_loss[ctc_loss=0.1443, att_loss=0.2715, loss=0.246, over 3278757.91 frames. utt_duration=1266 frames, utt_pad_proportion=0.04704, over 10370.47 utterances.], batch size: 37, lr: 1.75e-02, grad_scale: 8.0 2023-03-07 20:20:38,191 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22786.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:20:51,865 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22794.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:21:20,768 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+02 3.039e+02 3.742e+02 4.818e+02 9.718e+02, threshold=7.483e+02, percent-clipped=5.0 2023-03-07 20:21:28,485 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6642, 5.9097, 5.1730, 5.7909, 5.4178, 5.2096, 5.2547, 5.1893], device='cuda:2'), covar=tensor([0.1184, 0.0731, 0.0802, 0.0680, 0.0744, 0.1143, 0.2211, 0.2060], device='cuda:2'), in_proj_covar=tensor([0.0360, 0.0403, 0.0306, 0.0323, 0.0292, 0.0370, 0.0438, 0.0390], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 20:21:33,094 INFO [train2.py:809] (2/4) Epoch 6, batch 2900, loss[ctc_loss=0.1337, att_loss=0.2799, loss=0.2506, over 17046.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.00841, over 52.00 utterances.], tot_loss[ctc_loss=0.1434, att_loss=0.2712, loss=0.2456, over 3277653.08 frames. utt_duration=1255 frames, utt_pad_proportion=0.05178, over 10462.35 utterances.], batch size: 52, lr: 1.75e-02, grad_scale: 8.0 2023-03-07 20:21:41,151 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22825.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:21:52,568 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22832.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:21:55,482 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22834.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:22:18,888 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6034, 4.9162, 4.7767, 4.7853, 4.9563, 4.9633, 4.6951, 4.4286], device='cuda:2'), covar=tensor([0.1027, 0.0479, 0.0259, 0.0514, 0.0331, 0.0282, 0.0331, 0.0362], device='cuda:2'), in_proj_covar=tensor([0.0403, 0.0229, 0.0174, 0.0217, 0.0274, 0.0298, 0.0229, 0.0257], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 20:22:30,796 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22855.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:22:53,783 INFO [train2.py:809] (2/4) Epoch 6, batch 2950, loss[ctc_loss=0.1464, att_loss=0.272, loss=0.2469, over 15948.00 frames. 
utt_duration=1558 frames, utt_pad_proportion=0.006658, over 41.00 utterances.], tot_loss[ctc_loss=0.1445, att_loss=0.2725, loss=0.2469, over 3283869.05 frames. utt_duration=1218 frames, utt_pad_proportion=0.05968, over 10798.85 utterances.], batch size: 41, lr: 1.75e-02, grad_scale: 8.0 2023-03-07 20:22:55,657 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22871.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:23:02,132 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.9409, 3.2547, 3.4564, 2.8017, 3.2837, 3.4644, 3.3836, 1.8060], device='cuda:2'), covar=tensor([0.1545, 0.1064, 0.2054, 0.5598, 0.1794, 0.3884, 0.0615, 1.0968], device='cuda:2'), in_proj_covar=tensor([0.0071, 0.0076, 0.0082, 0.0115, 0.0070, 0.0107, 0.0067, 0.0125], device='cuda:2'), out_proj_covar=tensor([5.8372e-05, 5.5915e-05, 6.4147e-05, 8.5934e-05, 5.6128e-05, 8.2888e-05, 5.2333e-05, 9.6136e-05], device='cuda:2') 2023-03-07 20:23:10,192 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22880.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:23:20,746 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22886.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:23:23,833 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5186, 2.3636, 3.0817, 4.2105, 3.9446, 4.2635, 2.6877, 1.8970], device='cuda:2'), covar=tensor([0.0584, 0.2612, 0.1263, 0.0579, 0.0452, 0.0170, 0.1709, 0.2827], device='cuda:2'), in_proj_covar=tensor([0.0157, 0.0197, 0.0193, 0.0163, 0.0145, 0.0121, 0.0189, 0.0178], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 20:23:32,178 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22893.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:23:33,614 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3164, 5.1651, 5.3751, 2.8690, 5.1429, 4.5012, 4.5440, 2.4034], device='cuda:2'), covar=tensor([0.0126, 0.0090, 0.0088, 0.1199, 0.0068, 0.0185, 0.0257, 0.1794], device='cuda:2'), in_proj_covar=tensor([0.0055, 0.0067, 0.0055, 0.0101, 0.0060, 0.0077, 0.0082, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0004, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 20:24:02,049 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 3.285e+02 3.915e+02 4.917e+02 1.190e+03, threshold=7.831e+02, percent-clipped=3.0 2023-03-07 20:24:12,925 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22919.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:24:14,358 INFO [train2.py:809] (2/4) Epoch 6, batch 3000, loss[ctc_loss=0.1518, att_loss=0.2611, loss=0.2393, over 15927.00 frames. utt_duration=1556 frames, utt_pad_proportion=0.007933, over 41.00 utterances.], tot_loss[ctc_loss=0.1445, att_loss=0.272, loss=0.2465, over 3274438.69 frames. utt_duration=1218 frames, utt_pad_proportion=0.06305, over 10764.73 utterances.], batch size: 41, lr: 1.75e-02, grad_scale: 8.0 2023-03-07 20:24:14,358 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 20:24:28,228 INFO [train2.py:843] (2/4) Epoch 6, validation: ctc_loss=0.06806, att_loss=0.2473, loss=0.2115, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
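A quick consistency check on the figures above (a minimal sketch added alongside the log, not output produced by train2.py or optim.py): assuming the reported loss is the att_rate-weighted combination of the attention and CTC losses (att_rate=0.8 in the run config), that the grad-norm clipping threshold is clipping_scale times the median of the logged quartiles (clipping_scale=2.0), and that the learning rate follows an Eden-style schedule with base_lr=0.05, lr_batches=5000, lr_epochs=3.5, the logged values can be reproduced to within rounding:

# Assumed relations (they match the printed numbers above but are not asserted by the log itself):
#   loss      = att_rate * att_loss + (1 - att_rate) * ctc_loss          (att_rate = 0.8)
#   threshold = clipping_scale * median grad-norm quartile               (clipping_scale = 2.0)
#   lr        = base_lr * ((b^2 + B^2)/B^2)^-0.25 * ((e^2 + E^2)/E^2)^-0.25   (Eden-style schedule)

att_rate = 0.8

def combined_loss(ctc_loss, att_loss):
    # weighted sum of attention and CTC losses, as assumed from att_rate=0.8
    return att_rate * att_loss + (1.0 - att_rate) * ctc_loss

# validation entry above: ctc_loss=0.06806, att_loss=0.2473, loss=0.2115
print(round(combined_loss(0.06806, 0.2473), 4))   # ~0.2115, matching the logged validation loss

# clipping entry near batch 20816: quartiles 2.013e+02 ... 1.005e+03, threshold=7.777e+02
quartiles = [2.013e2, 3.098e2, 3.888e2, 4.985e2, 1.005e3]
print(2.0 * quartiles[2])                         # 777.6, close to the logged threshold 7.777e+02

# lr near batch 20816 in epoch 6 (scheduler epoch assumed to be 5 at this point): logged 1.83e-02
base_lr, lr_batches, lr_epochs = 0.05, 5000, 3.5

def eden_lr(batch, epoch):
    # hypothetical Eden-style decay in both batch and epoch counts
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

print(round(eden_lr(20816, 5), 5))                # ~0.0183, matching the logged lr

Applying the same loss check to the training entries reproduces the logged tot_loss values as well; small last-digit discrepancies are expected because the logged components are themselves rounded to four significant figures.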
2023-03-07 20:24:28,228 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 20:25:01,617 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22941.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:25:47,902 INFO [train2.py:809] (2/4) Epoch 6, batch 3050, loss[ctc_loss=0.1734, att_loss=0.3005, loss=0.2751, over 17130.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01451, over 56.00 utterances.], tot_loss[ctc_loss=0.1455, att_loss=0.2723, loss=0.247, over 3273257.76 frames. utt_duration=1217 frames, utt_pad_proportion=0.06251, over 10771.44 utterances.], batch size: 56, lr: 1.75e-02, grad_scale: 8.0 2023-03-07 20:26:27,909 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22995.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 20:26:32,225 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0 2023-03-07 20:26:55,431 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 3.380e+02 4.169e+02 5.152e+02 1.227e+03, threshold=8.338e+02, percent-clipped=8.0 2023-03-07 20:27:07,731 INFO [train2.py:809] (2/4) Epoch 6, batch 3100, loss[ctc_loss=0.1591, att_loss=0.2765, loss=0.253, over 16779.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005652, over 48.00 utterances.], tot_loss[ctc_loss=0.1456, att_loss=0.2716, loss=0.2464, over 3266215.76 frames. utt_duration=1220 frames, utt_pad_proportion=0.06305, over 10718.17 utterances.], batch size: 48, lr: 1.74e-02, grad_scale: 8.0 2023-03-07 20:27:08,919 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-03-07 20:27:42,230 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23041.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:28:16,265 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5739, 2.8401, 3.6541, 2.8815, 3.5923, 4.6535, 4.3259, 3.1748], device='cuda:2'), covar=tensor([0.0395, 0.1694, 0.0996, 0.1511, 0.0988, 0.0617, 0.0640, 0.1505], device='cuda:2'), in_proj_covar=tensor([0.0212, 0.0208, 0.0214, 0.0197, 0.0218, 0.0224, 0.0179, 0.0208], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 20:28:28,022 INFO [train2.py:809] (2/4) Epoch 6, batch 3150, loss[ctc_loss=0.116, att_loss=0.2758, loss=0.2438, over 17285.00 frames. utt_duration=1173 frames, utt_pad_proportion=0.02562, over 59.00 utterances.], tot_loss[ctc_loss=0.1458, att_loss=0.2719, loss=0.2466, over 3273519.94 frames. utt_duration=1228 frames, utt_pad_proportion=0.05896, over 10673.52 utterances.], batch size: 59, lr: 1.74e-02, grad_scale: 8.0 2023-03-07 20:28:59,003 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23089.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:29:05,498 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9583, 5.2450, 4.6808, 5.2694, 4.6185, 5.0239, 5.3986, 5.1908], device='cuda:2'), covar=tensor([0.0401, 0.0196, 0.0734, 0.0173, 0.0429, 0.0180, 0.0217, 0.0166], device='cuda:2'), in_proj_covar=tensor([0.0249, 0.0191, 0.0250, 0.0171, 0.0212, 0.0163, 0.0184, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 20:29:22,292 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.20 vs. 
limit=5.0 2023-03-07 20:29:30,217 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=23108.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:29:36,092 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.237e+02 3.296e+02 3.877e+02 4.841e+02 1.237e+03, threshold=7.753e+02, percent-clipped=8.0 2023-03-07 20:29:48,527 INFO [train2.py:809] (2/4) Epoch 6, batch 3200, loss[ctc_loss=0.1488, att_loss=0.2776, loss=0.2518, over 16759.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006925, over 48.00 utterances.], tot_loss[ctc_loss=0.1445, att_loss=0.2716, loss=0.2462, over 3281065.48 frames. utt_duration=1238 frames, utt_pad_proportion=0.05512, over 10616.00 utterances.], batch size: 48, lr: 1.74e-02, grad_scale: 8.0 2023-03-07 20:30:36,880 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23150.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:30:44,878 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1855, 4.9846, 5.0087, 2.8690, 4.8654, 4.4403, 4.3564, 2.6576], device='cuda:2'), covar=tensor([0.0085, 0.0091, 0.0157, 0.1008, 0.0087, 0.0157, 0.0270, 0.1359], device='cuda:2'), in_proj_covar=tensor([0.0053, 0.0067, 0.0056, 0.0098, 0.0059, 0.0075, 0.0080, 0.0098], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 20:30:59,662 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6600, 2.8096, 3.7030, 2.8766, 3.7312, 4.7327, 4.4056, 3.4211], device='cuda:2'), covar=tensor([0.0281, 0.1671, 0.0960, 0.1411, 0.0761, 0.0438, 0.0495, 0.1123], device='cuda:2'), in_proj_covar=tensor([0.0211, 0.0208, 0.0212, 0.0195, 0.0216, 0.0224, 0.0180, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 20:31:07,700 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=23169.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:31:08,932 INFO [train2.py:809] (2/4) Epoch 6, batch 3250, loss[ctc_loss=0.1894, att_loss=0.306, loss=0.2826, over 17286.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01124, over 55.00 utterances.], tot_loss[ctc_loss=0.1444, att_loss=0.2719, loss=0.2464, over 3284070.69 frames. 
utt_duration=1250 frames, utt_pad_proportion=0.05109, over 10518.23 utterances.], batch size: 55, lr: 1.74e-02, grad_scale: 8.0 2023-03-07 20:31:26,771 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23181.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:32:18,013 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 3.273e+02 3.918e+02 4.726e+02 1.280e+03, threshold=7.837e+02, percent-clipped=1.0 2023-03-07 20:32:24,720 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8397, 3.8156, 3.2373, 3.5239, 4.0137, 3.6750, 2.6464, 4.5179], device='cuda:2'), covar=tensor([0.0856, 0.0311, 0.0826, 0.0498, 0.0377, 0.0459, 0.0898, 0.0295], device='cuda:2'), in_proj_covar=tensor([0.0163, 0.0142, 0.0187, 0.0153, 0.0175, 0.0185, 0.0159, 0.0181], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 20:32:26,201 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=23217.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:32:30,551 INFO [train2.py:809] (2/4) Epoch 6, batch 3300, loss[ctc_loss=0.1509, att_loss=0.2697, loss=0.2459, over 16519.00 frames. utt_duration=1470 frames, utt_pad_proportion=0.007352, over 45.00 utterances.], tot_loss[ctc_loss=0.1434, att_loss=0.271, loss=0.2455, over 3280055.31 frames. utt_duration=1255 frames, utt_pad_proportion=0.05137, over 10465.58 utterances.], batch size: 45, lr: 1.74e-02, grad_scale: 8.0 2023-03-07 20:32:45,801 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=23229.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:32:53,583 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7397, 4.5107, 4.6753, 4.6145, 5.0133, 4.8108, 4.4982, 2.1343], device='cuda:2'), covar=tensor([0.0199, 0.0307, 0.0182, 0.0131, 0.1041, 0.0177, 0.0259, 0.2665], device='cuda:2'), in_proj_covar=tensor([0.0133, 0.0124, 0.0117, 0.0120, 0.0297, 0.0129, 0.0113, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 20:33:51,764 INFO [train2.py:809] (2/4) Epoch 6, batch 3350, loss[ctc_loss=0.1324, att_loss=0.2589, loss=0.2336, over 15968.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.006109, over 41.00 utterances.], tot_loss[ctc_loss=0.1433, att_loss=0.2707, loss=0.2452, over 3280990.09 frames. utt_duration=1254 frames, utt_pad_proportion=0.05052, over 10475.41 utterances.], batch size: 41, lr: 1.74e-02, grad_scale: 8.0 2023-03-07 20:34:05,053 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=23278.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:34:24,917 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=23290.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:34:32,805 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23295.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:34:59,023 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 3.226e+02 3.955e+02 4.943e+02 9.794e+02, threshold=7.910e+02, percent-clipped=2.0 2023-03-07 20:35:11,506 INFO [train2.py:809] (2/4) Epoch 6, batch 3400, loss[ctc_loss=0.09161, att_loss=0.2311, loss=0.2032, over 14508.00 frames. utt_duration=1815 frames, utt_pad_proportion=0.03934, over 32.00 utterances.], tot_loss[ctc_loss=0.1431, att_loss=0.2705, loss=0.2451, over 3273076.54 frames. 
utt_duration=1260 frames, utt_pad_proportion=0.05146, over 10400.50 utterances.], batch size: 32, lr: 1.73e-02, grad_scale: 8.0 2023-03-07 20:35:13,451 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7459, 2.4063, 3.2649, 4.4151, 4.0283, 4.0830, 3.0428, 1.9891], device='cuda:2'), covar=tensor([0.0537, 0.2755, 0.1273, 0.0549, 0.0655, 0.0469, 0.1758, 0.2843], device='cuda:2'), in_proj_covar=tensor([0.0158, 0.0197, 0.0194, 0.0166, 0.0149, 0.0126, 0.0192, 0.0181], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 20:35:49,218 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23343.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:36:31,954 INFO [train2.py:809] (2/4) Epoch 6, batch 3450, loss[ctc_loss=0.1391, att_loss=0.2819, loss=0.2534, over 17058.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.009195, over 53.00 utterances.], tot_loss[ctc_loss=0.1433, att_loss=0.271, loss=0.2454, over 3277807.93 frames. utt_duration=1285 frames, utt_pad_proportion=0.0447, over 10217.07 utterances.], batch size: 53, lr: 1.73e-02, grad_scale: 8.0 2023-03-07 20:37:37,226 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.71 vs. limit=2.0 2023-03-07 20:37:39,554 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.288e+02 3.391e+02 4.040e+02 5.091e+02 1.147e+03, threshold=8.079e+02, percent-clipped=5.0 2023-03-07 20:37:52,193 INFO [train2.py:809] (2/4) Epoch 6, batch 3500, loss[ctc_loss=0.1298, att_loss=0.2671, loss=0.2397, over 16549.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.005807, over 45.00 utterances.], tot_loss[ctc_loss=0.1435, att_loss=0.2709, loss=0.2454, over 3282685.18 frames. utt_duration=1282 frames, utt_pad_proportion=0.04414, over 10250.87 utterances.], batch size: 45, lr: 1.73e-02, grad_scale: 8.0 2023-03-07 20:38:40,754 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23450.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:39:03,021 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23464.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:39:13,017 INFO [train2.py:809] (2/4) Epoch 6, batch 3550, loss[ctc_loss=0.1691, att_loss=0.2997, loss=0.2736, over 17309.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01151, over 55.00 utterances.], tot_loss[ctc_loss=0.1438, att_loss=0.2717, loss=0.2461, over 3289831.00 frames. utt_duration=1265 frames, utt_pad_proportion=0.04581, over 10417.26 utterances.], batch size: 55, lr: 1.73e-02, grad_scale: 8.0 2023-03-07 20:39:30,686 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23481.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:39:36,983 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7578, 5.1399, 5.0470, 4.9928, 5.1586, 5.1527, 4.8052, 4.5675], device='cuda:2'), covar=tensor([0.0948, 0.0364, 0.0181, 0.0400, 0.0260, 0.0251, 0.0334, 0.0309], device='cuda:2'), in_proj_covar=tensor([0.0396, 0.0229, 0.0173, 0.0212, 0.0273, 0.0300, 0.0226, 0.0254], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 20:39:42,265 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=23488.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:39:52,402 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. 
limit=2.0 2023-03-07 20:39:57,395 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23498.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:40:04,535 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6739, 2.5995, 5.1476, 3.9670, 2.9479, 4.4705, 4.9085, 4.7554], device='cuda:2'), covar=tensor([0.0188, 0.1792, 0.0174, 0.1129, 0.2327, 0.0277, 0.0104, 0.0193], device='cuda:2'), in_proj_covar=tensor([0.0139, 0.0235, 0.0119, 0.0295, 0.0289, 0.0180, 0.0103, 0.0140], device='cuda:2'), out_proj_covar=tensor([1.2536e-04, 1.9192e-04, 1.0696e-04, 2.3891e-04, 2.4442e-04, 1.5763e-04, 9.3257e-05, 1.2567e-04], device='cuda:2') 2023-03-07 20:40:10,473 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.2920, 3.6869, 3.0171, 3.3772, 3.7674, 3.6039, 2.7057, 4.3008], device='cuda:2'), covar=tensor([0.1234, 0.0410, 0.1117, 0.0592, 0.0493, 0.0600, 0.0905, 0.0409], device='cuda:2'), in_proj_covar=tensor([0.0164, 0.0144, 0.0189, 0.0154, 0.0177, 0.0188, 0.0159, 0.0185], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 20:40:19,336 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+02 3.330e+02 3.992e+02 4.756e+02 1.054e+03, threshold=7.983e+02, percent-clipped=2.0 2023-03-07 20:40:31,494 INFO [train2.py:809] (2/4) Epoch 6, batch 3600, loss[ctc_loss=0.1369, att_loss=0.2757, loss=0.248, over 16964.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007655, over 50.00 utterances.], tot_loss[ctc_loss=0.1428, att_loss=0.2704, loss=0.2449, over 3284884.41 frames. utt_duration=1289 frames, utt_pad_proportion=0.04136, over 10203.33 utterances.], batch size: 50, lr: 1.73e-02, grad_scale: 8.0 2023-03-07 20:40:46,277 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23529.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:41:15,672 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4433, 4.1055, 3.1895, 3.4935, 4.2395, 3.7079, 2.6419, 4.5684], device='cuda:2'), covar=tensor([0.1238, 0.0311, 0.1055, 0.0619, 0.0461, 0.0638, 0.1053, 0.0481], device='cuda:2'), in_proj_covar=tensor([0.0164, 0.0145, 0.0191, 0.0153, 0.0177, 0.0188, 0.0159, 0.0186], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 20:41:18,610 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=23549.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 20:41:26,020 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.16 vs. limit=2.0 2023-03-07 20:41:51,589 INFO [train2.py:809] (2/4) Epoch 6, batch 3650, loss[ctc_loss=0.1346, att_loss=0.2575, loss=0.2329, over 16131.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005987, over 42.00 utterances.], tot_loss[ctc_loss=0.1427, att_loss=0.2698, loss=0.2443, over 3272735.36 frames. 
utt_duration=1286 frames, utt_pad_proportion=0.04572, over 10194.45 utterances.], batch size: 42, lr: 1.72e-02, grad_scale: 8.0 2023-03-07 20:41:57,520 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23573.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:42:12,582 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1011, 4.4482, 4.1208, 4.5254, 2.4252, 4.4992, 2.5875, 1.8138], device='cuda:2'), covar=tensor([0.0331, 0.0155, 0.0940, 0.0325, 0.2370, 0.0188, 0.1865, 0.2216], device='cuda:2'), in_proj_covar=tensor([0.0116, 0.0103, 0.0251, 0.0110, 0.0223, 0.0099, 0.0221, 0.0209], device='cuda:2'), out_proj_covar=tensor([1.1445e-04, 1.0198e-04, 2.2424e-04, 1.0128e-04, 2.0564e-04, 9.5931e-05, 1.9657e-04, 1.8708e-04], device='cuda:2') 2023-03-07 20:42:16,982 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23585.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:42:59,618 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 3.073e+02 3.816e+02 5.187e+02 1.095e+03, threshold=7.631e+02, percent-clipped=5.0 2023-03-07 20:43:12,936 INFO [train2.py:809] (2/4) Epoch 6, batch 3700, loss[ctc_loss=0.1228, att_loss=0.237, loss=0.2142, over 15751.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009238, over 38.00 utterances.], tot_loss[ctc_loss=0.1439, att_loss=0.2709, loss=0.2455, over 3278993.35 frames. utt_duration=1274 frames, utt_pad_proportion=0.04778, over 10303.80 utterances.], batch size: 38, lr: 1.72e-02, grad_scale: 8.0 2023-03-07 20:44:32,073 INFO [train2.py:809] (2/4) Epoch 6, batch 3750, loss[ctc_loss=0.126, att_loss=0.2653, loss=0.2375, over 16482.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.00568, over 46.00 utterances.], tot_loss[ctc_loss=0.1437, att_loss=0.2704, loss=0.2451, over 3263873.75 frames. utt_duration=1275 frames, utt_pad_proportion=0.05253, over 10248.07 utterances.], batch size: 46, lr: 1.72e-02, grad_scale: 8.0 2023-03-07 20:45:25,983 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.43 vs. limit=5.0 2023-03-07 20:45:38,748 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 3.181e+02 3.944e+02 5.174e+02 1.956e+03, threshold=7.887e+02, percent-clipped=9.0 2023-03-07 20:45:52,119 INFO [train2.py:809] (2/4) Epoch 6, batch 3800, loss[ctc_loss=0.1344, att_loss=0.2474, loss=0.2248, over 15997.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007331, over 40.00 utterances.], tot_loss[ctc_loss=0.1437, att_loss=0.2701, loss=0.2448, over 3258365.02 frames. utt_duration=1234 frames, utt_pad_proportion=0.06401, over 10573.48 utterances.], batch size: 40, lr: 1.72e-02, grad_scale: 8.0 2023-03-07 20:47:02,229 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23764.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:47:10,104 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4214, 2.1775, 4.9494, 3.6286, 2.9168, 4.3129, 4.6433, 4.6087], device='cuda:2'), covar=tensor([0.0219, 0.1951, 0.0109, 0.1299, 0.2163, 0.0277, 0.0113, 0.0223], device='cuda:2'), in_proj_covar=tensor([0.0143, 0.0240, 0.0122, 0.0304, 0.0298, 0.0185, 0.0106, 0.0144], device='cuda:2'), out_proj_covar=tensor([1.2908e-04, 1.9678e-04, 1.1009e-04, 2.4636e-04, 2.5182e-04, 1.6176e-04, 9.4742e-05, 1.2934e-04], device='cuda:2') 2023-03-07 20:47:11,174 INFO [train2.py:809] (2/4) Epoch 6, batch 3850, loss[ctc_loss=0.1477, att_loss=0.2831, loss=0.2561, over 17334.00 frames. 
utt_duration=1176 frames, utt_pad_proportion=0.02296, over 59.00 utterances.], tot_loss[ctc_loss=0.1438, att_loss=0.2706, loss=0.2452, over 3268448.27 frames. utt_duration=1227 frames, utt_pad_proportion=0.06279, over 10665.54 utterances.], batch size: 59, lr: 1.72e-02, grad_scale: 8.0 2023-03-07 20:47:51,837 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.5761, 3.3121, 3.0422, 2.6176, 3.0443, 2.8324, 3.3286, 1.9201], device='cuda:2'), covar=tensor([0.1688, 0.1034, 0.4197, 0.5649, 0.1803, 0.4689, 0.0742, 0.7670], device='cuda:2'), in_proj_covar=tensor([0.0077, 0.0079, 0.0084, 0.0123, 0.0075, 0.0111, 0.0070, 0.0127], device='cuda:2'), out_proj_covar=tensor([6.2809e-05, 5.8143e-05, 6.6792e-05, 9.2536e-05, 5.9303e-05, 8.6883e-05, 5.3724e-05, 9.9366e-05], device='cuda:2') 2023-03-07 20:48:18,384 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+02 3.226e+02 4.048e+02 5.028e+02 1.314e+03, threshold=8.096e+02, percent-clipped=6.0 2023-03-07 20:48:18,526 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23812.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:48:31,092 INFO [train2.py:809] (2/4) Epoch 6, batch 3900, loss[ctc_loss=0.1336, att_loss=0.2649, loss=0.2386, over 16950.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.008488, over 50.00 utterances.], tot_loss[ctc_loss=0.1426, att_loss=0.2697, loss=0.2443, over 3260638.68 frames. utt_duration=1224 frames, utt_pad_proportion=0.06485, over 10671.36 utterances.], batch size: 50, lr: 1.72e-02, grad_scale: 8.0 2023-03-07 20:49:08,486 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23844.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 20:49:21,374 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-07 20:49:48,973 INFO [train2.py:809] (2/4) Epoch 6, batch 3950, loss[ctc_loss=0.1232, att_loss=0.2408, loss=0.2173, over 15761.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.009233, over 38.00 utterances.], tot_loss[ctc_loss=0.1425, att_loss=0.2691, loss=0.2438, over 3253182.99 frames. utt_duration=1224 frames, utt_pad_proportion=0.06716, over 10643.73 utterances.], batch size: 38, lr: 1.71e-02, grad_scale: 8.0 2023-03-07 20:49:54,051 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23873.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:50:12,163 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23885.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:51:06,492 INFO [train2.py:809] (2/4) Epoch 7, batch 0, loss[ctc_loss=0.1105, att_loss=0.2484, loss=0.2208, over 15646.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.009004, over 37.00 utterances.], tot_loss[ctc_loss=0.1105, att_loss=0.2484, loss=0.2208, over 15646.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.009004, over 37.00 utterances.], batch size: 37, lr: 1.61e-02, grad_scale: 8.0 2023-03-07 20:51:06,492 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 20:51:19,279 INFO [train2.py:843] (2/4) Epoch 7, validation: ctc_loss=0.06772, att_loss=0.2471, loss=0.2112, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
2023-03-07 20:51:19,280 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 20:51:34,367 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 3.227e+02 4.296e+02 5.385e+02 1.552e+03, threshold=8.591e+02, percent-clipped=5.0 2023-03-07 20:51:48,667 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23921.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:52:07,302 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23933.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:52:30,537 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2189, 5.1512, 5.0812, 2.5591, 1.8379, 2.5943, 4.5231, 3.7382], device='cuda:2'), covar=tensor([0.0514, 0.0183, 0.0217, 0.3122, 0.6189, 0.2822, 0.0431, 0.1957], device='cuda:2'), in_proj_covar=tensor([0.0296, 0.0191, 0.0214, 0.0177, 0.0350, 0.0333, 0.0202, 0.0329], device='cuda:2'), out_proj_covar=tensor([1.4531e-04, 7.7508e-05, 9.4048e-05, 8.1019e-05, 1.6523e-04, 1.4631e-04, 8.4685e-05, 1.5242e-04], device='cuda:2') 2023-03-07 20:52:38,300 INFO [train2.py:809] (2/4) Epoch 7, batch 50, loss[ctc_loss=0.102, att_loss=0.2257, loss=0.201, over 15882.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009269, over 39.00 utterances.], tot_loss[ctc_loss=0.1384, att_loss=0.2675, loss=0.2417, over 735984.53 frames. utt_duration=1271 frames, utt_pad_proportion=0.0468, over 2318.80 utterances.], batch size: 39, lr: 1.60e-02, grad_scale: 8.0 2023-03-07 20:52:38,510 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9944, 5.2650, 5.5093, 5.4112, 5.3853, 5.9353, 5.1290, 6.0990], device='cuda:2'), covar=tensor([0.0543, 0.0564, 0.0539, 0.0802, 0.1648, 0.0690, 0.0526, 0.0411], device='cuda:2'), in_proj_covar=tensor([0.0561, 0.0350, 0.0375, 0.0446, 0.0609, 0.0384, 0.0314, 0.0390], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 20:54:02,452 INFO [train2.py:809] (2/4) Epoch 7, batch 100, loss[ctc_loss=0.101, att_loss=0.2395, loss=0.2118, over 15897.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.007903, over 39.00 utterances.], tot_loss[ctc_loss=0.1385, att_loss=0.2677, loss=0.2418, over 1298898.84 frames. utt_duration=1255 frames, utt_pad_proportion=0.05254, over 4145.28 utterances.], batch size: 39, lr: 1.60e-02, grad_scale: 8.0 2023-03-07 20:54:18,138 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 3.045e+02 3.704e+02 4.782e+02 7.393e+02, threshold=7.408e+02, percent-clipped=0.0 2023-03-07 20:54:57,397 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24037.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:55:23,567 INFO [train2.py:809] (2/4) Epoch 7, batch 150, loss[ctc_loss=0.127, att_loss=0.2656, loss=0.2379, over 16549.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.005912, over 45.00 utterances.], tot_loss[ctc_loss=0.14, att_loss=0.2688, loss=0.2431, over 1739575.77 frames. utt_duration=1241 frames, utt_pad_proportion=0.05406, over 5611.66 utterances.], batch size: 45, lr: 1.60e-02, grad_scale: 8.0 2023-03-07 20:56:34,189 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24098.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:56:42,642 INFO [train2.py:809] (2/4) Epoch 7, batch 200, loss[ctc_loss=0.2108, att_loss=0.3093, loss=0.2896, over 14456.00 frames. 
utt_duration=395 frames, utt_pad_proportion=0.3083, over 147.00 utterances.], tot_loss[ctc_loss=0.14, att_loss=0.2689, loss=0.2431, over 2077135.55 frames. utt_duration=1251 frames, utt_pad_proportion=0.05473, over 6651.84 utterances.], batch size: 147, lr: 1.60e-02, grad_scale: 8.0 2023-03-07 20:56:56,758 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.953e+02 3.642e+02 4.338e+02 9.354e+02, threshold=7.285e+02, percent-clipped=1.0 2023-03-07 20:57:29,696 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4843, 4.7578, 5.0683, 4.9269, 4.8457, 5.4084, 4.8920, 5.5219], device='cuda:2'), covar=tensor([0.0611, 0.0724, 0.0622, 0.0975, 0.1829, 0.0821, 0.0720, 0.0557], device='cuda:2'), in_proj_covar=tensor([0.0572, 0.0360, 0.0382, 0.0456, 0.0618, 0.0393, 0.0316, 0.0395], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 20:57:47,336 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=24144.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 20:58:02,081 INFO [train2.py:809] (2/4) Epoch 7, batch 250, loss[ctc_loss=0.1126, att_loss=0.2515, loss=0.2237, over 16122.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006587, over 42.00 utterances.], tot_loss[ctc_loss=0.1411, att_loss=0.2698, loss=0.2441, over 2335816.05 frames. utt_duration=1210 frames, utt_pad_proportion=0.06556, over 7731.47 utterances.], batch size: 42, lr: 1.60e-02, grad_scale: 8.0 2023-03-07 20:58:02,361 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9887, 5.2904, 4.7899, 5.2492, 4.7770, 5.0429, 5.4103, 5.2480], device='cuda:2'), covar=tensor([0.0380, 0.0324, 0.0619, 0.0250, 0.0356, 0.0185, 0.0188, 0.0143], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0199, 0.0253, 0.0178, 0.0214, 0.0160, 0.0187, 0.0184], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 20:59:02,759 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=24192.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 20:59:22,695 INFO [train2.py:809] (2/4) Epoch 7, batch 300, loss[ctc_loss=0.1281, att_loss=0.243, loss=0.22, over 15647.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008392, over 37.00 utterances.], tot_loss[ctc_loss=0.1398, att_loss=0.2692, loss=0.2433, over 2550173.05 frames. utt_duration=1245 frames, utt_pad_proportion=0.05446, over 8203.12 utterances.], batch size: 37, lr: 1.60e-02, grad_scale: 8.0 2023-03-07 20:59:36,411 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.956e+02 3.932e+02 4.844e+02 1.089e+03, threshold=7.865e+02, percent-clipped=8.0 2023-03-07 21:00:42,742 INFO [train2.py:809] (2/4) Epoch 7, batch 350, loss[ctc_loss=0.1359, att_loss=0.2759, loss=0.2479, over 16639.00 frames. utt_duration=1418 frames, utt_pad_proportion=0.004527, over 47.00 utterances.], tot_loss[ctc_loss=0.139, att_loss=0.2696, loss=0.2435, over 2719736.51 frames. utt_duration=1264 frames, utt_pad_proportion=0.04666, over 8616.18 utterances.], batch size: 47, lr: 1.59e-02, grad_scale: 8.0 2023-03-07 21:01:06,650 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0 2023-03-07 21:01:07,228 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.80 vs. 
limit=2.0 2023-03-07 21:01:20,068 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5818, 3.5101, 3.6612, 2.6422, 3.5068, 3.3574, 3.5116, 2.0010], device='cuda:2'), covar=tensor([0.1228, 0.1240, 0.1596, 0.7257, 0.0997, 0.7363, 0.0737, 1.0196], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0078, 0.0080, 0.0124, 0.0072, 0.0113, 0.0070, 0.0125], device='cuda:2'), out_proj_covar=tensor([6.1672e-05, 5.7949e-05, 6.4946e-05, 9.3538e-05, 5.7819e-05, 8.7785e-05, 5.3261e-05, 9.8086e-05], device='cuda:2') 2023-03-07 21:02:03,070 INFO [train2.py:809] (2/4) Epoch 7, batch 400, loss[ctc_loss=0.09628, att_loss=0.235, loss=0.2072, over 15649.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008671, over 37.00 utterances.], tot_loss[ctc_loss=0.1375, att_loss=0.268, loss=0.2419, over 2834426.85 frames. utt_duration=1267 frames, utt_pad_proportion=0.04996, over 8962.27 utterances.], batch size: 37, lr: 1.59e-02, grad_scale: 8.0 2023-03-07 21:02:16,570 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.699e+02 2.848e+02 3.496e+02 4.281e+02 6.983e+02, threshold=6.993e+02, percent-clipped=0.0 2023-03-07 21:03:00,615 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3287, 2.6397, 3.1300, 4.1734, 3.8850, 3.9943, 2.8437, 1.7626], device='cuda:2'), covar=tensor([0.0708, 0.2281, 0.1440, 0.0647, 0.0554, 0.0378, 0.1634, 0.2989], device='cuda:2'), in_proj_covar=tensor([0.0159, 0.0195, 0.0193, 0.0169, 0.0147, 0.0129, 0.0186, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 21:03:22,777 INFO [train2.py:809] (2/4) Epoch 7, batch 450, loss[ctc_loss=0.1333, att_loss=0.26, loss=0.2347, over 16138.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005113, over 42.00 utterances.], tot_loss[ctc_loss=0.1383, att_loss=0.2683, loss=0.2423, over 2933598.64 frames. utt_duration=1254 frames, utt_pad_proportion=0.05228, over 9365.99 utterances.], batch size: 42, lr: 1.59e-02, grad_scale: 8.0 2023-03-07 21:03:35,526 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4675, 2.1720, 5.0018, 3.8400, 2.8143, 4.2604, 4.7703, 4.6857], device='cuda:2'), covar=tensor([0.0233, 0.2297, 0.0132, 0.1118, 0.2320, 0.0269, 0.0112, 0.0195], device='cuda:2'), in_proj_covar=tensor([0.0142, 0.0239, 0.0123, 0.0297, 0.0288, 0.0182, 0.0107, 0.0140], device='cuda:2'), out_proj_covar=tensor([1.2909e-04, 1.9630e-04, 1.0914e-04, 2.4272e-04, 2.4637e-04, 1.5922e-04, 9.5195e-05, 1.2608e-04], device='cuda:2') 2023-03-07 21:03:58,084 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2023-03-07 21:04:26,024 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=24393.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:04:42,797 INFO [train2.py:809] (2/4) Epoch 7, batch 500, loss[ctc_loss=0.1045, att_loss=0.2464, loss=0.218, over 16252.00 frames. utt_duration=1513 frames, utt_pad_proportion=0.008245, over 43.00 utterances.], tot_loss[ctc_loss=0.1385, att_loss=0.2685, loss=0.2425, over 3001339.39 frames. 
utt_duration=1250 frames, utt_pad_proportion=0.05649, over 9617.07 utterances.], batch size: 43, lr: 1.59e-02, grad_scale: 8.0 2023-03-07 21:04:56,600 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+02 3.293e+02 3.980e+02 5.188e+02 9.157e+02, threshold=7.960e+02, percent-clipped=6.0 2023-03-07 21:06:02,369 INFO [train2.py:809] (2/4) Epoch 7, batch 550, loss[ctc_loss=0.1039, att_loss=0.2334, loss=0.2075, over 15478.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.01006, over 36.00 utterances.], tot_loss[ctc_loss=0.1388, att_loss=0.2688, loss=0.2428, over 3063570.40 frames. utt_duration=1228 frames, utt_pad_proportion=0.06001, over 9993.83 utterances.], batch size: 36, lr: 1.59e-02, grad_scale: 8.0 2023-03-07 21:07:23,383 INFO [train2.py:809] (2/4) Epoch 7, batch 600, loss[ctc_loss=0.1523, att_loss=0.2596, loss=0.2381, over 16176.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006535, over 41.00 utterances.], tot_loss[ctc_loss=0.1391, att_loss=0.2693, loss=0.2432, over 3111118.94 frames. utt_duration=1216 frames, utt_pad_proportion=0.06272, over 10245.86 utterances.], batch size: 41, lr: 1.59e-02, grad_scale: 8.0 2023-03-07 21:07:37,152 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 3.070e+02 3.787e+02 4.448e+02 1.341e+03, threshold=7.574e+02, percent-clipped=2.0 2023-03-07 21:08:33,832 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24547.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:08:43,729 INFO [train2.py:809] (2/4) Epoch 7, batch 650, loss[ctc_loss=0.1485, att_loss=0.2909, loss=0.2624, over 17063.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.009035, over 53.00 utterances.], tot_loss[ctc_loss=0.1375, att_loss=0.2672, loss=0.2413, over 3138961.52 frames. utt_duration=1236 frames, utt_pad_proportion=0.0599, over 10171.05 utterances.], batch size: 53, lr: 1.59e-02, grad_scale: 8.0 2023-03-07 21:08:46,685 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.59 vs. limit=2.0 2023-03-07 21:09:08,918 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24569.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:09:16,573 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5271, 4.6410, 4.7349, 5.2545, 2.3121, 4.8787, 2.7835, 1.9404], device='cuda:2'), covar=tensor([0.0193, 0.0198, 0.0707, 0.0070, 0.2477, 0.0115, 0.1697, 0.1919], device='cuda:2'), in_proj_covar=tensor([0.0118, 0.0107, 0.0260, 0.0112, 0.0231, 0.0104, 0.0229, 0.0212], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 21:09:21,093 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3786, 4.8324, 4.6007, 4.7169, 4.8945, 4.5656, 3.7918, 4.7321], device='cuda:2'), covar=tensor([0.0104, 0.0092, 0.0109, 0.0101, 0.0093, 0.0098, 0.0439, 0.0178], device='cuda:2'), in_proj_covar=tensor([0.0060, 0.0058, 0.0067, 0.0045, 0.0047, 0.0055, 0.0078, 0.0075], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 21:10:04,226 INFO [train2.py:809] (2/4) Epoch 7, batch 700, loss[ctc_loss=0.1417, att_loss=0.264, loss=0.2396, over 16112.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006422, over 42.00 utterances.], tot_loss[ctc_loss=0.1366, att_loss=0.2672, loss=0.2411, over 3175136.08 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05436, over 10164.41 utterances.], batch size: 42, lr: 1.58e-02, grad_scale: 8.0 2023-03-07 21:10:12,545 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24608.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:10:18,244 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+02 2.863e+02 3.503e+02 4.304e+02 9.533e+02, threshold=7.005e+02, percent-clipped=2.0 2023-03-07 21:10:40,601 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2023-03-07 21:10:46,432 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24630.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:11:23,262 INFO [train2.py:809] (2/4) Epoch 7, batch 750, loss[ctc_loss=0.1908, att_loss=0.2827, loss=0.2643, over 15938.00 frames. utt_duration=1556 frames, utt_pad_proportion=0.008099, over 41.00 utterances.], tot_loss[ctc_loss=0.1378, att_loss=0.2675, loss=0.2416, over 3192590.01 frames. utt_duration=1277 frames, utt_pad_proportion=0.04957, over 10015.12 utterances.], batch size: 41, lr: 1.58e-02, grad_scale: 16.0 2023-03-07 21:12:06,361 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-07 21:12:27,411 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=24693.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:12:40,333 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2962, 1.0345, 2.0019, 2.2589, 3.4830, 1.9681, 1.6288, 2.4332], device='cuda:2'), covar=tensor([0.0482, 0.3994, 0.2625, 0.1215, 0.0614, 0.1816, 0.2960, 0.1540], device='cuda:2'), in_proj_covar=tensor([0.0081, 0.0088, 0.0089, 0.0079, 0.0078, 0.0076, 0.0087, 0.0072], device='cuda:2'), out_proj_covar=tensor([3.8215e-05, 5.1872e-05, 5.1044e-05, 4.3250e-05, 3.8885e-05, 4.5011e-05, 5.0061e-05, 4.3646e-05], device='cuda:2') 2023-03-07 21:12:43,250 INFO [train2.py:809] (2/4) Epoch 7, batch 800, loss[ctc_loss=0.1054, att_loss=0.2489, loss=0.2202, over 16100.00 frames. utt_duration=1535 frames, utt_pad_proportion=0.006642, over 42.00 utterances.], tot_loss[ctc_loss=0.1373, att_loss=0.2671, loss=0.2411, over 3211813.49 frames. utt_duration=1287 frames, utt_pad_proportion=0.04677, over 9997.31 utterances.], batch size: 42, lr: 1.58e-02, grad_scale: 16.0 2023-03-07 21:12:57,269 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+02 2.992e+02 3.996e+02 4.981e+02 1.398e+03, threshold=7.991e+02, percent-clipped=9.0 2023-03-07 21:13:43,783 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=24741.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:13:53,999 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0910, 3.8998, 3.3029, 3.4280, 3.9282, 3.5807, 2.6607, 4.4926], device='cuda:2'), covar=tensor([0.0865, 0.0366, 0.1027, 0.0627, 0.0511, 0.0631, 0.0938, 0.0356], device='cuda:2'), in_proj_covar=tensor([0.0163, 0.0147, 0.0186, 0.0153, 0.0180, 0.0184, 0.0159, 0.0186], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 21:14:03,648 INFO [train2.py:809] (2/4) Epoch 7, batch 850, loss[ctc_loss=0.145, att_loss=0.2559, loss=0.2337, over 15964.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.006529, over 41.00 utterances.], tot_loss[ctc_loss=0.1365, att_loss=0.2658, loss=0.2399, over 3215257.53 frames. 
utt_duration=1299 frames, utt_pad_proportion=0.04716, over 9909.77 utterances.], batch size: 41, lr: 1.58e-02, grad_scale: 16.0 2023-03-07 21:14:19,614 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3574, 2.4451, 2.9679, 4.1955, 3.9674, 4.0258, 2.6824, 1.7746], device='cuda:2'), covar=tensor([0.0619, 0.2543, 0.1309, 0.0606, 0.0498, 0.0293, 0.1751, 0.2725], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0203, 0.0191, 0.0168, 0.0149, 0.0131, 0.0192, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 21:14:22,662 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24765.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:14:31,713 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0344, 5.3282, 5.2862, 5.1824, 5.4529, 5.3954, 5.1359, 4.9046], device='cuda:2'), covar=tensor([0.1032, 0.0338, 0.0178, 0.0461, 0.0208, 0.0261, 0.0297, 0.0261], device='cuda:2'), in_proj_covar=tensor([0.0402, 0.0239, 0.0183, 0.0222, 0.0284, 0.0313, 0.0231, 0.0264], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 21:15:24,359 INFO [train2.py:809] (2/4) Epoch 7, batch 900, loss[ctc_loss=0.1319, att_loss=0.245, loss=0.2224, over 15751.00 frames. utt_duration=1659 frames, utt_pad_proportion=0.01005, over 38.00 utterances.], tot_loss[ctc_loss=0.1358, att_loss=0.2658, loss=0.2398, over 3232067.57 frames. utt_duration=1305 frames, utt_pad_proportion=0.04481, over 9919.33 utterances.], batch size: 38, lr: 1.58e-02, grad_scale: 16.0 2023-03-07 21:15:38,324 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.848e+02 3.280e+02 4.214e+02 1.122e+03, threshold=6.561e+02, percent-clipped=2.0 2023-03-07 21:16:00,340 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24826.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:16:36,248 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24848.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:16:44,178 INFO [train2.py:809] (2/4) Epoch 7, batch 950, loss[ctc_loss=0.1419, att_loss=0.2772, loss=0.2501, over 16876.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007023, over 49.00 utterances.], tot_loss[ctc_loss=0.1353, att_loss=0.2659, loss=0.2398, over 3244289.39 frames. utt_duration=1301 frames, utt_pad_proportion=0.0435, over 9986.01 utterances.], batch size: 49, lr: 1.58e-02, grad_scale: 16.0 2023-03-07 21:17:50,355 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4092, 2.5683, 2.9951, 4.2501, 3.9742, 4.0453, 2.6482, 1.8365], device='cuda:2'), covar=tensor([0.0733, 0.2600, 0.1516, 0.0652, 0.0561, 0.0299, 0.2134, 0.3017], device='cuda:2'), in_proj_covar=tensor([0.0160, 0.0200, 0.0190, 0.0167, 0.0149, 0.0131, 0.0189, 0.0177], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 21:18:03,790 INFO [train2.py:809] (2/4) Epoch 7, batch 1000, loss[ctc_loss=0.1109, att_loss=0.2469, loss=0.2197, over 9666.00 frames. utt_duration=1842 frames, utt_pad_proportion=0.2624, over 21.00 utterances.], tot_loss[ctc_loss=0.1364, att_loss=0.2668, loss=0.2407, over 3246169.39 frames. 
utt_duration=1277 frames, utt_pad_proportion=0.0486, over 10181.64 utterances.], batch size: 21, lr: 1.57e-02, grad_scale: 16.0 2023-03-07 21:18:04,021 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=24903.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:18:13,385 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24909.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:18:17,632 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+02 2.930e+02 3.688e+02 4.379e+02 1.333e+03, threshold=7.376e+02, percent-clipped=8.0 2023-03-07 21:18:20,355 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.61 vs. limit=5.0 2023-03-07 21:18:25,811 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0911, 2.2931, 2.7164, 3.7260, 3.5984, 3.7293, 2.5315, 1.7851], device='cuda:2'), covar=tensor([0.0559, 0.2296, 0.1267, 0.0442, 0.0463, 0.0240, 0.1504, 0.2446], device='cuda:2'), in_proj_covar=tensor([0.0162, 0.0203, 0.0192, 0.0169, 0.0150, 0.0133, 0.0191, 0.0181], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 21:18:28,238 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.84 vs. limit=5.0 2023-03-07 21:18:37,658 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=24925.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:19:22,729 INFO [train2.py:809] (2/4) Epoch 7, batch 1050, loss[ctc_loss=0.1157, att_loss=0.2445, loss=0.2187, over 14125.00 frames. utt_duration=1824 frames, utt_pad_proportion=0.04947, over 31.00 utterances.], tot_loss[ctc_loss=0.1372, att_loss=0.2673, loss=0.2413, over 3251592.66 frames. utt_duration=1253 frames, utt_pad_proportion=0.05476, over 10392.11 utterances.], batch size: 31, lr: 1.57e-02, grad_scale: 16.0 2023-03-07 21:19:58,537 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8921, 5.1528, 5.4985, 5.3298, 5.1891, 5.8600, 5.1347, 5.9440], device='cuda:2'), covar=tensor([0.0587, 0.0658, 0.0549, 0.0868, 0.1777, 0.0766, 0.0539, 0.0584], device='cuda:2'), in_proj_covar=tensor([0.0583, 0.0364, 0.0393, 0.0457, 0.0635, 0.0404, 0.0328, 0.0413], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 21:20:29,715 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24995.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:20:43,053 INFO [train2.py:809] (2/4) Epoch 7, batch 1100, loss[ctc_loss=0.1861, att_loss=0.296, loss=0.274, over 16968.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.00655, over 50.00 utterances.], tot_loss[ctc_loss=0.1375, att_loss=0.2677, loss=0.2417, over 3251255.50 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.05862, over 10507.10 utterances.], batch size: 50, lr: 1.57e-02, grad_scale: 16.0 2023-03-07 21:20:57,053 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+02 2.910e+02 3.574e+02 4.609e+02 1.124e+03, threshold=7.148e+02, percent-clipped=6.0 2023-03-07 21:21:14,591 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.0481, 3.4600, 3.3843, 2.6515, 3.0811, 3.2876, 3.2452, 1.9216], device='cuda:2'), covar=tensor([0.1712, 0.1025, 0.2276, 0.6413, 0.3668, 0.4505, 0.0886, 1.0065], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0079, 0.0081, 0.0129, 0.0073, 0.0118, 0.0072, 0.0130], device='cuda:2'), out_proj_covar=tensor([6.3610e-05, 5.9984e-05, 6.6714e-05, 9.8032e-05, 5.9700e-05, 9.2471e-05, 5.5766e-05, 1.0205e-04], device='cuda:2') 2023-03-07 21:21:24,084 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3360, 4.5669, 4.5997, 5.0489, 2.6349, 4.5617, 2.9365, 2.3088], device='cuda:2'), covar=tensor([0.0230, 0.0253, 0.0678, 0.0134, 0.2188, 0.0202, 0.1532, 0.1544], device='cuda:2'), in_proj_covar=tensor([0.0119, 0.0106, 0.0256, 0.0111, 0.0229, 0.0104, 0.0231, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 21:21:25,493 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8208, 5.1546, 4.5940, 5.1887, 4.6108, 4.9059, 5.3350, 5.1118], device='cuda:2'), covar=tensor([0.0388, 0.0211, 0.0696, 0.0171, 0.0400, 0.0220, 0.0192, 0.0137], device='cuda:2'), in_proj_covar=tensor([0.0258, 0.0199, 0.0256, 0.0175, 0.0213, 0.0162, 0.0186, 0.0183], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 21:22:03,893 INFO [train2.py:809] (2/4) Epoch 7, batch 1150, loss[ctc_loss=0.1159, att_loss=0.2388, loss=0.2142, over 15520.00 frames. utt_duration=1726 frames, utt_pad_proportion=0.007508, over 36.00 utterances.], tot_loss[ctc_loss=0.138, att_loss=0.268, loss=0.242, over 3257775.15 frames. utt_duration=1240 frames, utt_pad_proportion=0.05675, over 10521.82 utterances.], batch size: 36, lr: 1.57e-02, grad_scale: 16.0 2023-03-07 21:22:04,266 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6667, 3.8089, 2.8788, 3.4203, 3.8697, 3.5110, 2.4341, 4.2408], device='cuda:2'), covar=tensor([0.1089, 0.0400, 0.1172, 0.0569, 0.0535, 0.0572, 0.1042, 0.0420], device='cuda:2'), in_proj_covar=tensor([0.0168, 0.0154, 0.0191, 0.0159, 0.0188, 0.0189, 0.0166, 0.0194], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 21:22:08,919 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=25056.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 21:22:39,842 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8794, 5.0878, 5.4503, 5.4326, 5.2095, 5.7751, 5.1454, 5.8951], device='cuda:2'), covar=tensor([0.0626, 0.0691, 0.0634, 0.0798, 0.1954, 0.0916, 0.0558, 0.0642], device='cuda:2'), in_proj_covar=tensor([0.0584, 0.0367, 0.0398, 0.0463, 0.0635, 0.0407, 0.0335, 0.0414], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 21:22:48,433 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. 
limit=2.0 2023-03-07 21:23:24,556 INFO [train2.py:809] (2/4) Epoch 7, batch 1200, loss[ctc_loss=0.1446, att_loss=0.263, loss=0.2393, over 16000.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007729, over 40.00 utterances.], tot_loss[ctc_loss=0.1372, att_loss=0.2677, loss=0.2416, over 3266546.55 frames. utt_duration=1254 frames, utt_pad_proportion=0.05258, over 10436.16 utterances.], batch size: 40, lr: 1.57e-02, grad_scale: 16.0 2023-03-07 21:23:38,667 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.863e+02 3.570e+02 4.681e+02 1.368e+03, threshold=7.141e+02, percent-clipped=7.0 2023-03-07 21:23:52,679 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=25121.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:24:21,382 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.93 vs. limit=5.0 2023-03-07 21:24:44,727 INFO [train2.py:809] (2/4) Epoch 7, batch 1250, loss[ctc_loss=0.112, att_loss=0.263, loss=0.2328, over 17361.00 frames. utt_duration=1104 frames, utt_pad_proportion=0.03426, over 63.00 utterances.], tot_loss[ctc_loss=0.1379, att_loss=0.268, loss=0.242, over 3268122.84 frames. utt_duration=1244 frames, utt_pad_proportion=0.05601, over 10517.36 utterances.], batch size: 63, lr: 1.57e-02, grad_scale: 16.0 2023-03-07 21:26:05,286 INFO [train2.py:809] (2/4) Epoch 7, batch 1300, loss[ctc_loss=0.1394, att_loss=0.2647, loss=0.2397, over 16017.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.006846, over 40.00 utterances.], tot_loss[ctc_loss=0.1368, att_loss=0.267, loss=0.241, over 3254563.83 frames. utt_duration=1264 frames, utt_pad_proportion=0.05389, over 10312.97 utterances.], batch size: 40, lr: 1.57e-02, grad_scale: 16.0 2023-03-07 21:26:05,543 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25203.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:26:06,849 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=25204.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:26:19,176 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.854e+02 3.559e+02 4.625e+02 8.622e+02, threshold=7.119e+02, percent-clipped=5.0 2023-03-07 21:26:36,718 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9476, 1.8050, 2.1347, 1.4926, 3.4087, 2.2171, 1.7947, 2.3108], device='cuda:2'), covar=tensor([0.0418, 0.2825, 0.2541, 0.1406, 0.0442, 0.1095, 0.2504, 0.1168], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0089, 0.0090, 0.0080, 0.0078, 0.0076, 0.0091, 0.0070], device='cuda:2'), out_proj_covar=tensor([3.8632e-05, 5.2776e-05, 5.2154e-05, 4.4243e-05, 3.9356e-05, 4.5099e-05, 5.2216e-05, 4.3171e-05], device='cuda:2') 2023-03-07 21:26:39,783 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25225.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:27:22,555 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25251.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:27:25,585 INFO [train2.py:809] (2/4) Epoch 7, batch 1350, loss[ctc_loss=0.1142, att_loss=0.2659, loss=0.2355, over 16955.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008167, over 50.00 utterances.], tot_loss[ctc_loss=0.1358, att_loss=0.2666, loss=0.2405, over 3260565.43 frames. 
utt_duration=1283 frames, utt_pad_proportion=0.04805, over 10181.12 utterances.], batch size: 50, lr: 1.56e-02, grad_scale: 16.0 2023-03-07 21:27:56,371 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25273.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:28:06,740 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4798, 2.9044, 3.6378, 2.8685, 3.4891, 4.5773, 4.4049, 3.2234], device='cuda:2'), covar=tensor([0.0324, 0.1666, 0.0935, 0.1503, 0.0990, 0.0583, 0.0484, 0.1316], device='cuda:2'), in_proj_covar=tensor([0.0208, 0.0207, 0.0214, 0.0194, 0.0211, 0.0237, 0.0183, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 21:28:30,245 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.82 vs. limit=5.0 2023-03-07 21:28:45,909 INFO [train2.py:809] (2/4) Epoch 7, batch 1400, loss[ctc_loss=0.1553, att_loss=0.2816, loss=0.2563, over 16524.00 frames. utt_duration=1470 frames, utt_pad_proportion=0.006712, over 45.00 utterances.], tot_loss[ctc_loss=0.1344, att_loss=0.2655, loss=0.2393, over 3260578.60 frames. utt_duration=1306 frames, utt_pad_proportion=0.04425, over 9997.36 utterances.], batch size: 45, lr: 1.56e-02, grad_scale: 16.0 2023-03-07 21:29:01,607 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.558e+02 2.920e+02 3.478e+02 4.487e+02 1.591e+03, threshold=6.957e+02, percent-clipped=5.0 2023-03-07 21:29:54,068 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.49 vs. limit=5.0 2023-03-07 21:30:03,022 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=25351.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 21:30:05,931 INFO [train2.py:809] (2/4) Epoch 7, batch 1450, loss[ctc_loss=0.1252, att_loss=0.2756, loss=0.2455, over 16974.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.0071, over 50.00 utterances.], tot_loss[ctc_loss=0.1348, att_loss=0.266, loss=0.2397, over 3259702.31 frames. utt_duration=1289 frames, utt_pad_proportion=0.04912, over 10124.04 utterances.], batch size: 50, lr: 1.56e-02, grad_scale: 8.0 2023-03-07 21:30:43,032 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2240, 3.9634, 3.3383, 3.8431, 4.1451, 3.8636, 3.1964, 4.6920], device='cuda:2'), covar=tensor([0.0856, 0.0402, 0.0913, 0.0493, 0.0532, 0.0500, 0.0734, 0.0392], device='cuda:2'), in_proj_covar=tensor([0.0169, 0.0155, 0.0190, 0.0159, 0.0185, 0.0187, 0.0164, 0.0194], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 21:30:49,759 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8119, 1.5892, 2.1147, 1.2755, 3.0806, 1.8877, 1.5610, 1.5647], device='cuda:2'), covar=tensor([0.0663, 0.4198, 0.2053, 0.2369, 0.0651, 0.2159, 0.3297, 0.2682], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0090, 0.0087, 0.0079, 0.0078, 0.0075, 0.0090, 0.0071], device='cuda:2'), out_proj_covar=tensor([3.8484e-05, 5.2792e-05, 5.0845e-05, 4.4132e-05, 3.9377e-05, 4.4813e-05, 5.2207e-05, 4.3648e-05], device='cuda:2') 2023-03-07 21:30:56,418 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-07 21:31:25,456 INFO [train2.py:809] (2/4) Epoch 7, batch 1500, loss[ctc_loss=0.1204, att_loss=0.2602, loss=0.2322, over 17060.00 frames. 
utt_duration=1340 frames, utt_pad_proportion=0.006109, over 51.00 utterances.], tot_loss[ctc_loss=0.1342, att_loss=0.2661, loss=0.2397, over 3266126.88 frames. utt_duration=1297 frames, utt_pad_proportion=0.04613, over 10085.52 utterances.], batch size: 51, lr: 1.56e-02, grad_scale: 8.0 2023-03-07 21:31:40,025 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2023-03-07 21:31:40,547 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.966e+02 3.623e+02 4.720e+02 8.485e+02, threshold=7.246e+02, percent-clipped=4.0 2023-03-07 21:31:52,972 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25421.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:32:03,875 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=25428.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 21:32:44,994 INFO [train2.py:809] (2/4) Epoch 7, batch 1550, loss[ctc_loss=0.1454, att_loss=0.2861, loss=0.258, over 16878.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.006979, over 49.00 utterances.], tot_loss[ctc_loss=0.135, att_loss=0.2667, loss=0.2403, over 3272635.04 frames. utt_duration=1286 frames, utt_pad_proportion=0.04649, over 10187.25 utterances.], batch size: 49, lr: 1.56e-02, grad_scale: 8.0 2023-03-07 21:33:10,082 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25469.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:33:42,991 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=25489.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 21:34:04,342 INFO [train2.py:809] (2/4) Epoch 7, batch 1600, loss[ctc_loss=0.1353, att_loss=0.2509, loss=0.2278, over 15956.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006284, over 41.00 utterances.], tot_loss[ctc_loss=0.1352, att_loss=0.2666, loss=0.2403, over 3272305.21 frames. utt_duration=1299 frames, utt_pad_proportion=0.04281, over 10091.22 utterances.], batch size: 41, lr: 1.56e-02, grad_scale: 8.0 2023-03-07 21:34:06,181 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25504.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:34:10,711 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3164, 4.5617, 4.0790, 4.6148, 4.0974, 4.2740, 4.7047, 4.4811], device='cuda:2'), covar=tensor([0.0440, 0.0240, 0.0812, 0.0197, 0.0495, 0.0362, 0.0183, 0.0189], device='cuda:2'), in_proj_covar=tensor([0.0260, 0.0203, 0.0261, 0.0179, 0.0217, 0.0165, 0.0186, 0.0188], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 21:34:19,607 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.911e+02 3.522e+02 4.081e+02 9.255e+02, threshold=7.044e+02, percent-clipped=3.0 2023-03-07 21:35:09,480 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2023-03-07 21:35:22,695 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25552.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:35:24,118 INFO [train2.py:809] (2/4) Epoch 7, batch 1650, loss[ctc_loss=0.1431, att_loss=0.2831, loss=0.2551, over 16464.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.006644, over 46.00 utterances.], tot_loss[ctc_loss=0.1364, att_loss=0.2674, loss=0.2412, over 3271274.57 frames. 
utt_duration=1245 frames, utt_pad_proportion=0.05608, over 10520.99 utterances.], batch size: 46, lr: 1.56e-02, grad_scale: 8.0 2023-03-07 21:36:44,067 INFO [train2.py:809] (2/4) Epoch 7, batch 1700, loss[ctc_loss=0.1256, att_loss=0.2319, loss=0.2106, over 15498.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009105, over 36.00 utterances.], tot_loss[ctc_loss=0.136, att_loss=0.2666, loss=0.2405, over 3260698.54 frames. utt_duration=1254 frames, utt_pad_proportion=0.05544, over 10412.45 utterances.], batch size: 36, lr: 1.55e-02, grad_scale: 8.0 2023-03-07 21:36:59,366 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.859e+02 3.554e+02 4.579e+02 8.259e+02, threshold=7.107e+02, percent-clipped=6.0 2023-03-07 21:37:31,940 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9823, 3.8244, 3.2021, 3.3161, 3.8937, 3.4072, 2.2552, 4.3649], device='cuda:2'), covar=tensor([0.0936, 0.0365, 0.0945, 0.0684, 0.0529, 0.0714, 0.1132, 0.0338], device='cuda:2'), in_proj_covar=tensor([0.0169, 0.0154, 0.0192, 0.0160, 0.0190, 0.0191, 0.0164, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 21:38:00,703 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25651.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 21:38:03,532 INFO [train2.py:809] (2/4) Epoch 7, batch 1750, loss[ctc_loss=0.1369, att_loss=0.2685, loss=0.2422, over 16936.00 frames. utt_duration=692.7 frames, utt_pad_proportion=0.1319, over 98.00 utterances.], tot_loss[ctc_loss=0.1348, att_loss=0.2661, loss=0.2399, over 3261679.72 frames. utt_duration=1265 frames, utt_pad_proportion=0.05227, over 10324.26 utterances.], batch size: 98, lr: 1.55e-02, grad_scale: 8.0 2023-03-07 21:38:13,774 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.59 vs. limit=2.0 2023-03-07 21:39:18,791 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25699.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:39:25,097 INFO [train2.py:809] (2/4) Epoch 7, batch 1800, loss[ctc_loss=0.1305, att_loss=0.2815, loss=0.2513, over 17301.00 frames. utt_duration=1100 frames, utt_pad_proportion=0.0394, over 63.00 utterances.], tot_loss[ctc_loss=0.1347, att_loss=0.2669, loss=0.2405, over 3264296.37 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.05949, over 10550.56 utterances.], batch size: 63, lr: 1.55e-02, grad_scale: 8.0 2023-03-07 21:39:39,898 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0838, 4.7658, 4.7767, 2.4079, 2.1589, 2.5663, 4.3316, 3.6130], device='cuda:2'), covar=tensor([0.0548, 0.0168, 0.0187, 0.3187, 0.5679, 0.2703, 0.0354, 0.1803], device='cuda:2'), in_proj_covar=tensor([0.0307, 0.0195, 0.0217, 0.0182, 0.0362, 0.0339, 0.0205, 0.0340], device='cuda:2'), out_proj_covar=tensor([1.4940e-04, 7.7503e-05, 9.6008e-05, 8.4168e-05, 1.6808e-04, 1.4742e-04, 8.3400e-05, 1.5540e-04], device='cuda:2') 2023-03-07 21:39:40,904 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+02 2.893e+02 3.531e+02 4.524e+02 7.224e+02, threshold=7.063e+02, percent-clipped=1.0 2023-03-07 21:40:27,524 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2646, 5.2680, 5.0991, 2.7073, 2.0720, 3.0405, 5.0441, 3.8393], device='cuda:2'), covar=tensor([0.0542, 0.0181, 0.0236, 0.3263, 0.6192, 0.2258, 0.0250, 0.1821], device='cuda:2'), in_proj_covar=tensor([0.0308, 0.0196, 0.0217, 0.0183, 0.0364, 0.0340, 0.0207, 0.0342], device='cuda:2'), out_proj_covar=tensor([1.5009e-04, 7.8084e-05, 9.6032e-05, 8.4546e-05, 1.6886e-04, 1.4779e-04, 8.4371e-05, 1.5587e-04], device='cuda:2') 2023-03-07 21:40:46,309 INFO [train2.py:809] (2/4) Epoch 7, batch 1850, loss[ctc_loss=0.1161, att_loss=0.2681, loss=0.2377, over 17381.00 frames. utt_duration=881.7 frames, utt_pad_proportion=0.07583, over 79.00 utterances.], tot_loss[ctc_loss=0.133, att_loss=0.2658, loss=0.2393, over 3266516.56 frames. utt_duration=1237 frames, utt_pad_proportion=0.05832, over 10574.03 utterances.], batch size: 79, lr: 1.55e-02, grad_scale: 8.0 2023-03-07 21:41:37,050 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=25784.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 21:42:06,983 INFO [train2.py:809] (2/4) Epoch 7, batch 1900, loss[ctc_loss=0.187, att_loss=0.3125, loss=0.2874, over 17301.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01201, over 55.00 utterances.], tot_loss[ctc_loss=0.133, att_loss=0.2657, loss=0.2391, over 3265982.93 frames. utt_duration=1258 frames, utt_pad_proportion=0.05192, over 10400.40 utterances.], batch size: 55, lr: 1.55e-02, grad_scale: 8.0 2023-03-07 21:42:21,037 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8593, 6.1112, 5.4108, 6.0496, 5.8116, 5.4319, 5.5681, 5.3592], device='cuda:2'), covar=tensor([0.1205, 0.0901, 0.0868, 0.0689, 0.0671, 0.1512, 0.2193, 0.2341], device='cuda:2'), in_proj_covar=tensor([0.0372, 0.0423, 0.0325, 0.0341, 0.0306, 0.0387, 0.0445, 0.0418], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 21:42:22,403 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.923e+02 3.414e+02 4.315e+02 6.616e+02, threshold=6.827e+02, percent-clipped=0.0 2023-03-07 21:43:27,373 INFO [train2.py:809] (2/4) Epoch 7, batch 1950, loss[ctc_loss=0.1548, att_loss=0.3022, loss=0.2727, over 17084.00 frames. utt_duration=1222 frames, utt_pad_proportion=0.01717, over 56.00 utterances.], tot_loss[ctc_loss=0.1325, att_loss=0.2653, loss=0.2388, over 3259169.41 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.05439, over 10387.87 utterances.], batch size: 56, lr: 1.55e-02, grad_scale: 8.0 2023-03-07 21:43:35,777 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=25858.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 21:43:41,705 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7352, 5.9564, 5.3026, 5.8658, 5.5272, 5.1696, 5.3641, 5.0998], device='cuda:2'), covar=tensor([0.1241, 0.0924, 0.0847, 0.0684, 0.0786, 0.1634, 0.2148, 0.2287], device='cuda:2'), in_proj_covar=tensor([0.0372, 0.0424, 0.0322, 0.0338, 0.0306, 0.0388, 0.0444, 0.0412], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 21:43:48,121 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7114, 2.5240, 5.0553, 4.1432, 3.0105, 4.7105, 4.9013, 4.9080], device='cuda:2'), covar=tensor([0.0195, 0.1809, 0.0184, 0.0996, 0.2089, 0.0181, 0.0117, 0.0170], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0240, 0.0126, 0.0301, 0.0293, 0.0182, 0.0109, 0.0140], device='cuda:2'), out_proj_covar=tensor([1.3207e-04, 1.9762e-04, 1.1268e-04, 2.4594e-04, 2.5086e-04, 1.6018e-04, 9.7639e-05, 1.2727e-04], device='cuda:2') 2023-03-07 21:44:26,715 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-07 21:44:48,782 INFO [train2.py:809] (2/4) Epoch 7, batch 2000, loss[ctc_loss=0.1186, att_loss=0.2643, loss=0.2351, over 16871.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007537, over 49.00 utterances.], tot_loss[ctc_loss=0.1313, att_loss=0.2645, loss=0.2379, over 3264962.45 frames. utt_duration=1277 frames, utt_pad_proportion=0.04811, over 10237.04 utterances.], batch size: 49, lr: 1.55e-02, grad_scale: 8.0 2023-03-07 21:45:04,086 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+02 2.979e+02 3.606e+02 4.355e+02 1.257e+03, threshold=7.211e+02, percent-clipped=2.0 2023-03-07 21:45:14,771 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=25919.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 21:45:56,024 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1649, 5.1574, 5.0351, 2.5698, 1.8006, 2.7225, 4.5003, 3.6730], device='cuda:2'), covar=tensor([0.0531, 0.0165, 0.0185, 0.3192, 0.6265, 0.2526, 0.0453, 0.2025], device='cuda:2'), in_proj_covar=tensor([0.0303, 0.0192, 0.0213, 0.0179, 0.0354, 0.0331, 0.0206, 0.0332], device='cuda:2'), out_proj_covar=tensor([1.4760e-04, 7.6801e-05, 9.4471e-05, 8.2978e-05, 1.6431e-04, 1.4325e-04, 8.4964e-05, 1.5190e-04], device='cuda:2') 2023-03-07 21:46:05,309 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3106, 4.6423, 4.6024, 5.0020, 2.3593, 4.7864, 2.8039, 1.8656], device='cuda:2'), covar=tensor([0.0225, 0.0156, 0.0626, 0.0097, 0.2330, 0.0126, 0.1633, 0.2039], device='cuda:2'), in_proj_covar=tensor([0.0119, 0.0105, 0.0249, 0.0107, 0.0220, 0.0103, 0.0225, 0.0203], device='cuda:2'), out_proj_covar=tensor([1.1842e-04, 1.0514e-04, 2.2451e-04, 9.8892e-05, 2.0425e-04, 1.0070e-04, 2.0223e-04, 1.8411e-04], device='cuda:2') 2023-03-07 21:46:09,507 INFO [train2.py:809] (2/4) Epoch 7, batch 2050, loss[ctc_loss=0.109, att_loss=0.2389, loss=0.2129, over 15862.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.01062, over 39.00 utterances.], tot_loss[ctc_loss=0.1306, att_loss=0.2643, loss=0.2375, over 3267711.85 frames. 
utt_duration=1286 frames, utt_pad_proportion=0.04615, over 10175.92 utterances.], batch size: 39, lr: 1.54e-02, grad_scale: 8.0 2023-03-07 21:47:33,894 INFO [train2.py:809] (2/4) Epoch 7, batch 2100, loss[ctc_loss=0.1242, att_loss=0.2229, loss=0.2032, over 15491.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.00944, over 36.00 utterances.], tot_loss[ctc_loss=0.1311, att_loss=0.2645, loss=0.2379, over 3264298.16 frames. utt_duration=1276 frames, utt_pad_proportion=0.04849, over 10245.98 utterances.], batch size: 36, lr: 1.54e-02, grad_scale: 8.0 2023-03-07 21:47:48,451 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4229, 2.7495, 3.3363, 4.3493, 4.0460, 4.1000, 2.7446, 2.0986], device='cuda:2'), covar=tensor([0.0586, 0.2273, 0.1090, 0.0684, 0.0487, 0.0307, 0.1734, 0.2672], device='cuda:2'), in_proj_covar=tensor([0.0158, 0.0200, 0.0190, 0.0173, 0.0155, 0.0129, 0.0189, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 21:47:49,610 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+02 2.961e+02 3.539e+02 4.278e+02 1.039e+03, threshold=7.077e+02, percent-clipped=2.0 2023-03-07 21:48:38,156 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6507, 2.7431, 5.0150, 3.9046, 2.9935, 4.6107, 4.7133, 4.6567], device='cuda:2'), covar=tensor([0.0249, 0.1654, 0.0211, 0.1091, 0.2116, 0.0202, 0.0129, 0.0271], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0241, 0.0127, 0.0302, 0.0293, 0.0182, 0.0111, 0.0140], device='cuda:2'), out_proj_covar=tensor([1.3242e-04, 1.9907e-04, 1.1369e-04, 2.4681e-04, 2.5132e-04, 1.6079e-04, 9.9098e-05, 1.2799e-04], device='cuda:2') 2023-03-07 21:48:53,241 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4156, 2.6109, 3.3677, 4.3922, 4.0714, 4.0860, 2.7830, 2.2384], device='cuda:2'), covar=tensor([0.0664, 0.2643, 0.1226, 0.0544, 0.0450, 0.0363, 0.1738, 0.2656], device='cuda:2'), in_proj_covar=tensor([0.0159, 0.0201, 0.0190, 0.0174, 0.0154, 0.0127, 0.0188, 0.0178], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 21:48:54,478 INFO [train2.py:809] (2/4) Epoch 7, batch 2150, loss[ctc_loss=0.1192, att_loss=0.2666, loss=0.2371, over 16260.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.007635, over 43.00 utterances.], tot_loss[ctc_loss=0.1316, att_loss=0.2652, loss=0.2385, over 3271637.95 frames. utt_duration=1296 frames, utt_pad_proportion=0.04201, over 10111.39 utterances.], batch size: 43, lr: 1.54e-02, grad_scale: 8.0 2023-03-07 21:49:44,674 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26084.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 21:49:49,614 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2023-03-07 21:50:13,400 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26102.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:50:14,584 INFO [train2.py:809] (2/4) Epoch 7, batch 2200, loss[ctc_loss=0.1166, att_loss=0.2638, loss=0.2343, over 16881.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.0067, over 49.00 utterances.], tot_loss[ctc_loss=0.1322, att_loss=0.2655, loss=0.2388, over 3270938.18 frames. 
utt_duration=1276 frames, utt_pad_proportion=0.04857, over 10263.61 utterances.], batch size: 49, lr: 1.54e-02, grad_scale: 8.0 2023-03-07 21:50:30,120 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.379e+02 2.999e+02 3.731e+02 5.039e+02 1.380e+03, threshold=7.462e+02, percent-clipped=10.0 2023-03-07 21:51:01,663 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26132.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 21:51:34,424 INFO [train2.py:809] (2/4) Epoch 7, batch 2250, loss[ctc_loss=0.1393, att_loss=0.271, loss=0.2447, over 16850.00 frames. utt_duration=1377 frames, utt_pad_proportion=0.008492, over 49.00 utterances.], tot_loss[ctc_loss=0.1325, att_loss=0.2657, loss=0.2391, over 3277586.97 frames. utt_duration=1278 frames, utt_pad_proportion=0.04539, over 10269.46 utterances.], batch size: 49, lr: 1.54e-02, grad_scale: 8.0 2023-03-07 21:51:50,772 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26163.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:51:59,366 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26168.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:52:55,167 INFO [train2.py:809] (2/4) Epoch 7, batch 2300, loss[ctc_loss=0.1407, att_loss=0.2702, loss=0.2443, over 17292.00 frames. utt_duration=876.9 frames, utt_pad_proportion=0.07985, over 79.00 utterances.], tot_loss[ctc_loss=0.1326, att_loss=0.2658, loss=0.2391, over 3271076.06 frames. utt_duration=1268 frames, utt_pad_proportion=0.0489, over 10332.23 utterances.], batch size: 79, lr: 1.54e-02, grad_scale: 8.0 2023-03-07 21:53:07,145 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2023-03-07 21:53:10,516 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.696e+02 2.838e+02 3.640e+02 4.798e+02 9.404e+02, threshold=7.281e+02, percent-clipped=6.0 2023-03-07 21:53:12,391 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26214.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 21:53:30,138 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1362, 5.1183, 5.0148, 2.6671, 4.7492, 4.3326, 4.1938, 2.7095], device='cuda:2'), covar=tensor([0.0122, 0.0075, 0.0181, 0.1136, 0.0092, 0.0212, 0.0316, 0.1361], device='cuda:2'), in_proj_covar=tensor([0.0054, 0.0071, 0.0058, 0.0099, 0.0062, 0.0081, 0.0084, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0004, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 21:53:37,112 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26229.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:54:08,513 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.87 vs. limit=2.0 2023-03-07 21:54:11,203 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26250.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:54:15,538 INFO [train2.py:809] (2/4) Epoch 7, batch 2350, loss[ctc_loss=0.08909, att_loss=0.2299, loss=0.2017, over 15764.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.007782, over 38.00 utterances.], tot_loss[ctc_loss=0.1325, att_loss=0.2654, loss=0.2388, over 3261580.22 frames. 
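The optim.py lines print five grad-norm statistics (min, 25%, median, 75%, max over a recent window) plus a clipping threshold and the share of clipped batches. In these records the threshold equals Clipping_scale times the median (e.g. 2.0 * 3.731e+02 = 7.462e+02 in the batch-2200 record above); a small sketch under that reading, with the window and reduction details treated as assumptions:

```python
import numpy as np

# Sketch of how the "grad-norm quartiles ... threshold ... percent-clipped" line can be read.
# Assumptions: threshold = clipping_scale * median of recently observed gradient norms, and
# percent-clipped is the fraction of those norms that exceed the threshold.
def clipping_summary(grad_norms, clipping_scale: float = 2.0):
    norms = np.asarray(grad_norms, dtype=float)
    quartiles = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])  # the five printed values
    threshold = clipping_scale * quartiles[2]
    percent_clipped = 100.0 * float(np.mean(norms > threshold))
    return quartiles, threshold, percent_clipped
```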
utt_duration=1267 frames, utt_pad_proportion=0.05124, over 10312.00 utterances.], batch size: 38, lr: 1.54e-02, grad_scale: 8.0 2023-03-07 21:54:36,882 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9061, 5.2435, 4.6380, 5.2816, 4.6272, 4.9678, 5.3312, 5.1886], device='cuda:2'), covar=tensor([0.0389, 0.0212, 0.0684, 0.0163, 0.0386, 0.0177, 0.0186, 0.0145], device='cuda:2'), in_proj_covar=tensor([0.0264, 0.0206, 0.0258, 0.0185, 0.0217, 0.0165, 0.0190, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 21:54:45,237 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-07 21:55:36,003 INFO [train2.py:809] (2/4) Epoch 7, batch 2400, loss[ctc_loss=0.1271, att_loss=0.2462, loss=0.2224, over 14100.00 frames. utt_duration=1821 frames, utt_pad_proportion=0.05955, over 31.00 utterances.], tot_loss[ctc_loss=0.1343, att_loss=0.2667, loss=0.2402, over 3258465.16 frames. utt_duration=1232 frames, utt_pad_proportion=0.06115, over 10589.27 utterances.], batch size: 31, lr: 1.53e-02, grad_scale: 8.0 2023-03-07 21:55:49,035 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26311.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:55:52,369 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+02 3.013e+02 3.656e+02 4.621e+02 8.184e+02, threshold=7.313e+02, percent-clipped=1.0 2023-03-07 21:56:27,678 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. limit=2.0 2023-03-07 21:56:34,135 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9591, 4.9388, 4.6660, 4.8214, 5.1147, 4.9971, 4.8333, 2.2105], device='cuda:2'), covar=tensor([0.0220, 0.0184, 0.0209, 0.0197, 0.1406, 0.0178, 0.0188, 0.2448], device='cuda:2'), in_proj_covar=tensor([0.0133, 0.0123, 0.0125, 0.0121, 0.0303, 0.0126, 0.0114, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 21:56:56,978 INFO [train2.py:809] (2/4) Epoch 7, batch 2450, loss[ctc_loss=0.1708, att_loss=0.299, loss=0.2734, over 17383.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03387, over 63.00 utterances.], tot_loss[ctc_loss=0.1332, att_loss=0.267, loss=0.2402, over 3269775.90 frames. utt_duration=1244 frames, utt_pad_proportion=0.05624, over 10527.58 utterances.], batch size: 63, lr: 1.53e-02, grad_scale: 8.0 2023-03-07 21:58:18,162 INFO [train2.py:809] (2/4) Epoch 7, batch 2500, loss[ctc_loss=0.1224, att_loss=0.2687, loss=0.2395, over 17024.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.00754, over 51.00 utterances.], tot_loss[ctc_loss=0.1328, att_loss=0.2666, loss=0.2398, over 3273523.57 frames. utt_duration=1265 frames, utt_pad_proportion=0.04959, over 10366.14 utterances.], batch size: 51, lr: 1.53e-02, grad_scale: 8.0 2023-03-07 21:58:33,370 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.816e+02 3.513e+02 4.106e+02 7.996e+02, threshold=7.027e+02, percent-clipped=1.0 2023-03-07 21:58:36,092 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26414.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 21:59:38,006 INFO [train2.py:809] (2/4) Epoch 7, batch 2550, loss[ctc_loss=0.1401, att_loss=0.2839, loss=0.2552, over 17349.00 frames. 
utt_duration=1178 frames, utt_pad_proportion=0.01943, over 59.00 utterances.], tot_loss[ctc_loss=0.1319, att_loss=0.2658, loss=0.239, over 3281483.87 frames. utt_duration=1285 frames, utt_pad_proportion=0.04295, over 10226.04 utterances.], batch size: 59, lr: 1.53e-02, grad_scale: 8.0 2023-03-07 21:59:45,998 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26458.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:00:14,063 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26475.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:00:58,231 INFO [train2.py:809] (2/4) Epoch 7, batch 2600, loss[ctc_loss=0.1239, att_loss=0.2405, loss=0.2172, over 15494.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.007536, over 36.00 utterances.], tot_loss[ctc_loss=0.1312, att_loss=0.2646, loss=0.2379, over 3274472.41 frames. utt_duration=1271 frames, utt_pad_proportion=0.0475, over 10315.69 utterances.], batch size: 36, lr: 1.53e-02, grad_scale: 8.0 2023-03-07 22:01:14,579 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.671e+02 3.263e+02 3.835e+02 5.237e+02 1.388e+03, threshold=7.671e+02, percent-clipped=11.0 2023-03-07 22:01:16,497 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26514.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:01:31,413 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26524.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:01:44,642 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0297, 5.2077, 5.5481, 5.4519, 5.3943, 5.9090, 5.0410, 6.0723], device='cuda:2'), covar=tensor([0.0579, 0.0547, 0.0543, 0.0801, 0.1641, 0.0755, 0.0533, 0.0493], device='cuda:2'), in_proj_covar=tensor([0.0604, 0.0360, 0.0408, 0.0464, 0.0634, 0.0412, 0.0331, 0.0417], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 22:02:18,268 INFO [train2.py:809] (2/4) Epoch 7, batch 2650, loss[ctc_loss=0.1255, att_loss=0.2474, loss=0.223, over 15636.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008867, over 37.00 utterances.], tot_loss[ctc_loss=0.1316, att_loss=0.2647, loss=0.238, over 3267688.20 frames. utt_duration=1266 frames, utt_pad_proportion=0.05159, over 10339.95 utterances.], batch size: 37, lr: 1.53e-02, grad_scale: 8.0 2023-03-07 22:02:32,521 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26562.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:03:04,933 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1625, 2.1658, 2.9948, 4.1256, 3.7300, 3.8941, 2.5774, 2.0124], device='cuda:2'), covar=tensor([0.0683, 0.2802, 0.1199, 0.0547, 0.0713, 0.0368, 0.1820, 0.2522], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0202, 0.0191, 0.0173, 0.0159, 0.0128, 0.0188, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:03:37,576 INFO [train2.py:809] (2/4) Epoch 7, batch 2700, loss[ctc_loss=0.1482, att_loss=0.2762, loss=0.2506, over 16337.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005822, over 45.00 utterances.], tot_loss[ctc_loss=0.1322, att_loss=0.265, loss=0.2384, over 3272853.63 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.05087, over 10409.50 utterances.], batch size: 45, lr: 1.53e-02, grad_scale: 8.0 2023-03-07 22:03:42,444 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26606.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:03:53,739 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+02 3.023e+02 3.591e+02 4.363e+02 1.241e+03, threshold=7.183e+02, percent-clipped=4.0 2023-03-07 22:04:09,001 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9959, 6.1375, 5.4977, 5.9865, 5.7359, 5.4463, 5.5245, 5.4385], device='cuda:2'), covar=tensor([0.0854, 0.0859, 0.0758, 0.0692, 0.0660, 0.1263, 0.1967, 0.1930], device='cuda:2'), in_proj_covar=tensor([0.0371, 0.0434, 0.0322, 0.0339, 0.0309, 0.0389, 0.0443, 0.0408], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 22:04:56,685 INFO [train2.py:809] (2/4) Epoch 7, batch 2750, loss[ctc_loss=0.139, att_loss=0.2738, loss=0.2468, over 16953.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008356, over 50.00 utterances.], tot_loss[ctc_loss=0.1321, att_loss=0.265, loss=0.2384, over 3276592.63 frames. utt_duration=1267 frames, utt_pad_proportion=0.0483, over 10353.94 utterances.], batch size: 50, lr: 1.52e-02, grad_scale: 8.0 2023-03-07 22:05:27,999 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9370, 5.1840, 5.4824, 5.3933, 5.2201, 5.8707, 5.1031, 5.9405], device='cuda:2'), covar=tensor([0.0592, 0.0577, 0.0556, 0.0928, 0.1661, 0.0802, 0.0550, 0.0591], device='cuda:2'), in_proj_covar=tensor([0.0590, 0.0354, 0.0399, 0.0460, 0.0624, 0.0410, 0.0324, 0.0412], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 22:06:15,390 INFO [train2.py:809] (2/4) Epoch 7, batch 2800, loss[ctc_loss=0.1566, att_loss=0.282, loss=0.2569, over 17399.00 frames. utt_duration=882.4 frames, utt_pad_proportion=0.0751, over 79.00 utterances.], tot_loss[ctc_loss=0.1329, att_loss=0.2655, loss=0.2389, over 3277177.10 frames. utt_duration=1254 frames, utt_pad_proportion=0.0512, over 10462.79 utterances.], batch size: 79, lr: 1.52e-02, grad_scale: 8.0 2023-03-07 22:06:31,174 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 3.082e+02 3.761e+02 4.456e+02 1.465e+03, threshold=7.521e+02, percent-clipped=2.0 2023-03-07 22:07:34,544 INFO [train2.py:809] (2/4) Epoch 7, batch 2850, loss[ctc_loss=0.1205, att_loss=0.2435, loss=0.2189, over 15368.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01149, over 35.00 utterances.], tot_loss[ctc_loss=0.1329, att_loss=0.2651, loss=0.2387, over 3273688.14 frames. utt_duration=1262 frames, utt_pad_proportion=0.05009, over 10387.49 utterances.], batch size: 35, lr: 1.52e-02, grad_scale: 8.0 2023-03-07 22:07:42,510 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26758.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:08:01,759 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26770.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:08:54,865 INFO [train2.py:809] (2/4) Epoch 7, batch 2900, loss[ctc_loss=0.08881, att_loss=0.2213, loss=0.1948, over 10647.00 frames. utt_duration=1853 frames, utt_pad_proportion=0.2266, over 23.00 utterances.], tot_loss[ctc_loss=0.1328, att_loss=0.265, loss=0.2386, over 3267197.26 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05685, over 10538.80 utterances.], batch size: 23, lr: 1.52e-02, grad_scale: 8.0 2023-03-07 22:08:59,633 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26806.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:09:11,159 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 3.082e+02 3.745e+02 4.484e+02 1.236e+03, threshold=7.491e+02, percent-clipped=3.0 2023-03-07 22:09:30,286 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26824.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:10:16,153 INFO [train2.py:809] (2/4) Epoch 7, batch 2950, loss[ctc_loss=0.1349, att_loss=0.2544, loss=0.2305, over 16538.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006437, over 45.00 utterances.], tot_loss[ctc_loss=0.1326, att_loss=0.2648, loss=0.2384, over 3276010.45 frames. utt_duration=1256 frames, utt_pad_proportion=0.05095, over 10448.39 utterances.], batch size: 45, lr: 1.52e-02, grad_scale: 8.0 2023-03-07 22:10:35,055 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.48 vs. limit=5.0 2023-03-07 22:10:39,192 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9097, 4.9010, 4.9515, 2.5142, 4.7122, 4.3031, 4.2650, 2.2796], device='cuda:2'), covar=tensor([0.0155, 0.0090, 0.0138, 0.1146, 0.0098, 0.0214, 0.0302, 0.1644], device='cuda:2'), in_proj_covar=tensor([0.0053, 0.0072, 0.0059, 0.0098, 0.0062, 0.0082, 0.0084, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0004, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 22:10:47,882 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26872.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:11:29,471 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.70 vs. limit=5.0 2023-03-07 22:11:36,462 INFO [train2.py:809] (2/4) Epoch 7, batch 3000, loss[ctc_loss=0.1651, att_loss=0.2782, loss=0.2556, over 16829.00 frames. utt_duration=681.4 frames, utt_pad_proportion=0.1439, over 99.00 utterances.], tot_loss[ctc_loss=0.1333, att_loss=0.2658, loss=0.2393, over 3276719.62 frames. utt_duration=1208 frames, utt_pad_proportion=0.06259, over 10861.35 utterances.], batch size: 99, lr: 1.52e-02, grad_scale: 8.0 2023-03-07 22:11:36,463 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 22:11:50,142 INFO [train2.py:843] (2/4) Epoch 7, validation: ctc_loss=0.06224, att_loss=0.2434, loss=0.2072, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 22:11:50,142 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 22:11:52,064 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26904.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:11:55,130 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26906.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:12:06,321 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.662e+02 2.833e+02 3.331e+02 4.099e+02 1.001e+03, threshold=6.663e+02, percent-clipped=1.0 2023-03-07 22:12:11,679 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-07 22:12:21,074 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.02 vs. 
limit=2.0 2023-03-07 22:13:05,215 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26950.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:13:09,498 INFO [train2.py:809] (2/4) Epoch 7, batch 3050, loss[ctc_loss=0.09199, att_loss=0.2208, loss=0.195, over 15494.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008519, over 36.00 utterances.], tot_loss[ctc_loss=0.1322, att_loss=0.2654, loss=0.2387, over 3275195.44 frames. utt_duration=1220 frames, utt_pad_proportion=0.05863, over 10747.37 utterances.], batch size: 36, lr: 1.52e-02, grad_scale: 8.0 2023-03-07 22:13:09,853 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5045, 2.7571, 3.5998, 3.0115, 3.4462, 4.7685, 4.5181, 3.4266], device='cuda:2'), covar=tensor([0.0487, 0.1930, 0.1051, 0.1436, 0.1103, 0.0465, 0.0405, 0.1199], device='cuda:2'), in_proj_covar=tensor([0.0215, 0.0217, 0.0217, 0.0196, 0.0221, 0.0248, 0.0188, 0.0207], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:13:10,448 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2023-03-07 22:13:11,199 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26954.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:13:29,168 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26965.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:13:31,067 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.04 vs. limit=2.0 2023-03-07 22:14:29,789 INFO [train2.py:809] (2/4) Epoch 7, batch 3100, loss[ctc_loss=0.1218, att_loss=0.2492, loss=0.2237, over 16346.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.00532, over 45.00 utterances.], tot_loss[ctc_loss=0.1319, att_loss=0.2652, loss=0.2386, over 3278682.15 frames. utt_duration=1229 frames, utt_pad_proportion=0.05556, over 10688.14 utterances.], batch size: 45, lr: 1.51e-02, grad_scale: 8.0 2023-03-07 22:14:43,252 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=27011.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:14:45,950 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 3.057e+02 3.764e+02 4.844e+02 1.301e+03, threshold=7.527e+02, percent-clipped=9.0 2023-03-07 22:15:49,017 INFO [train2.py:809] (2/4) Epoch 7, batch 3150, loss[ctc_loss=0.1389, att_loss=0.2674, loss=0.2417, over 17247.00 frames. utt_duration=875 frames, utt_pad_proportion=0.08381, over 79.00 utterances.], tot_loss[ctc_loss=0.1333, att_loss=0.2659, loss=0.2394, over 3273520.45 frames. utt_duration=1197 frames, utt_pad_proportion=0.06615, over 10954.75 utterances.], batch size: 79, lr: 1.51e-02, grad_scale: 8.0 2023-03-07 22:16:17,126 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=27070.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:17:09,177 INFO [train2.py:809] (2/4) Epoch 7, batch 3200, loss[ctc_loss=0.1322, att_loss=0.2768, loss=0.2479, over 16615.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.006006, over 47.00 utterances.], tot_loss[ctc_loss=0.1317, att_loss=0.2654, loss=0.2387, over 3282777.42 frames. 
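The validation blocks above also report the peak GPU memory seen so far (e.g. "Maximum memory allocated so far is 16150MB"). That figure corresponds to the per-device peak allocation PyTorch tracks; a sketch, with the MB rounding an assumption:

```python
import torch

# Sketch: peak allocation on the current CUDA device, reported in MB as in the log above.
# The exact rounding/formatting used by train2.py is an assumption.
def max_memory_mb(device=None) -> int:
    return int(torch.cuda.max_memory_allocated(device) // (1024 * 1024))
```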
utt_duration=1228 frames, utt_pad_proportion=0.05673, over 10702.25 utterances.], batch size: 47, lr: 1.51e-02, grad_scale: 8.0 2023-03-07 22:17:20,900 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9592, 6.1679, 5.5104, 6.0865, 5.8225, 5.4506, 5.6461, 5.3937], device='cuda:2'), covar=tensor([0.1260, 0.0917, 0.0794, 0.0757, 0.0639, 0.1419, 0.2087, 0.2224], device='cuda:2'), in_proj_covar=tensor([0.0375, 0.0432, 0.0320, 0.0334, 0.0308, 0.0380, 0.0447, 0.0404], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 22:17:25,181 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.918e+02 3.542e+02 4.352e+02 6.869e+02, threshold=7.084e+02, percent-clipped=0.0 2023-03-07 22:17:30,169 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1005, 1.7497, 1.9059, 1.0082, 3.1968, 2.2551, 1.4152, 1.2144], device='cuda:2'), covar=tensor([0.0790, 0.4692, 0.3722, 0.3073, 0.0844, 0.1961, 0.4231, 0.3667], device='cuda:2'), in_proj_covar=tensor([0.0083, 0.0091, 0.0089, 0.0080, 0.0076, 0.0078, 0.0088, 0.0070], device='cuda:2'), out_proj_covar=tensor([4.1008e-05, 5.5349e-05, 5.2193e-05, 4.4832e-05, 3.9191e-05, 4.6568e-05, 5.2942e-05, 4.5106e-05], device='cuda:2') 2023-03-07 22:17:32,944 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=27118.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:18:21,591 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=27148.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:18:29,006 INFO [train2.py:809] (2/4) Epoch 7, batch 3250, loss[ctc_loss=0.1239, att_loss=0.2634, loss=0.2355, over 16782.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005697, over 48.00 utterances.], tot_loss[ctc_loss=0.1316, att_loss=0.265, loss=0.2383, over 3279595.79 frames. utt_duration=1231 frames, utt_pad_proportion=0.05702, over 10672.21 utterances.], batch size: 48, lr: 1.51e-02, grad_scale: 8.0 2023-03-07 22:18:51,380 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1643, 4.6727, 4.4663, 4.6892, 4.8308, 4.4026, 3.4898, 4.5813], device='cuda:2'), covar=tensor([0.0129, 0.0144, 0.0125, 0.0091, 0.0103, 0.0112, 0.0569, 0.0240], device='cuda:2'), in_proj_covar=tensor([0.0063, 0.0058, 0.0070, 0.0045, 0.0046, 0.0057, 0.0081, 0.0075], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 22:19:17,302 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1904, 4.8471, 4.9519, 4.9366, 2.5578, 5.1784, 2.7363, 2.4678], device='cuda:2'), covar=tensor([0.0237, 0.0138, 0.0476, 0.0104, 0.2020, 0.0081, 0.1554, 0.1576], device='cuda:2'), in_proj_covar=tensor([0.0120, 0.0100, 0.0249, 0.0112, 0.0220, 0.0101, 0.0223, 0.0205], device='cuda:2'), out_proj_covar=tensor([1.1952e-04, 1.0233e-04, 2.2520e-04, 1.0524e-04, 2.0568e-04, 9.9365e-05, 2.0162e-04, 1.8613e-04], device='cuda:2') 2023-03-07 22:19:48,584 INFO [train2.py:809] (2/4) Epoch 7, batch 3300, loss[ctc_loss=0.1315, att_loss=0.2634, loss=0.237, over 16465.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.007407, over 46.00 utterances.], tot_loss[ctc_loss=0.1318, att_loss=0.2653, loss=0.2386, over 3287313.34 frames. 
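The zipformer.py:1447 dumps above are attention-module diagnostics: the first tensor appears to hold one entropy value per attention head (nhead=8 in every stack), with covariance-related statistics alongside; low entropy suggests sharply peaked attention, high entropy diffuse attention. A sketch of such an entropy diagnostic, where the exact axes and reduction used in zipformer.py are assumptions:

```python
import torch

# Sketch of a per-head attention-entropy diagnostic of the kind dumped above.
# Assumption: Shannon entropy of each head's attention distribution, averaged over query positions.
def attn_weights_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn_weights: (num_heads, query_len, key_len); each row sums to 1 after softmax
    ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)  # (num_heads, query_len)
    return ent.mean(dim=-1)  # one value per head, as in the logged tensors
```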
utt_duration=1245 frames, utt_pad_proportion=0.05115, over 10574.83 utterances.], batch size: 46, lr: 1.51e-02, grad_scale: 8.0 2023-03-07 22:19:59,205 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=27209.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:20:05,128 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.634e+02 2.966e+02 3.461e+02 4.502e+02 1.067e+03, threshold=6.921e+02, percent-clipped=5.0 2023-03-07 22:20:33,725 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0301, 5.3048, 5.3085, 5.2262, 5.4072, 5.4093, 5.1582, 4.7484], device='cuda:2'), covar=tensor([0.0934, 0.0404, 0.0179, 0.0437, 0.0257, 0.0220, 0.0188, 0.0312], device='cuda:2'), in_proj_covar=tensor([0.0413, 0.0246, 0.0190, 0.0234, 0.0288, 0.0304, 0.0236, 0.0270], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-07 22:20:54,103 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0086, 2.1731, 3.2765, 2.5470, 3.2025, 4.1185, 3.8918, 3.0019], device='cuda:2'), covar=tensor([0.0478, 0.2283, 0.1138, 0.1520, 0.1055, 0.0809, 0.0691, 0.1311], device='cuda:2'), in_proj_covar=tensor([0.0219, 0.0218, 0.0216, 0.0199, 0.0222, 0.0251, 0.0192, 0.0208], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:21:08,401 INFO [train2.py:809] (2/4) Epoch 7, batch 3350, loss[ctc_loss=0.1213, att_loss=0.2636, loss=0.2352, over 16477.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.00688, over 46.00 utterances.], tot_loss[ctc_loss=0.1317, att_loss=0.2653, loss=0.2386, over 3283492.07 frames. utt_duration=1248 frames, utt_pad_proportion=0.05176, over 10535.99 utterances.], batch size: 46, lr: 1.51e-02, grad_scale: 8.0 2023-03-07 22:21:20,235 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=27260.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:21:58,844 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1609, 4.7407, 4.2500, 4.4719, 2.5974, 4.8908, 2.1708, 2.0722], device='cuda:2'), covar=tensor([0.0244, 0.0129, 0.0823, 0.0216, 0.1950, 0.0142, 0.1863, 0.1763], device='cuda:2'), in_proj_covar=tensor([0.0119, 0.0100, 0.0252, 0.0113, 0.0220, 0.0102, 0.0222, 0.0204], device='cuda:2'), out_proj_covar=tensor([1.1926e-04, 1.0263e-04, 2.2730e-04, 1.0632e-04, 2.0586e-04, 9.9744e-05, 2.0061e-04, 1.8563e-04], device='cuda:2') 2023-03-07 22:22:28,310 INFO [train2.py:809] (2/4) Epoch 7, batch 3400, loss[ctc_loss=0.1508, att_loss=0.2862, loss=0.2591, over 16795.00 frames. utt_duration=680 frames, utt_pad_proportion=0.1468, over 99.00 utterances.], tot_loss[ctc_loss=0.1323, att_loss=0.2663, loss=0.2395, over 3286431.05 frames. 
utt_duration=1240 frames, utt_pad_proportion=0.05181, over 10618.11 utterances.], batch size: 99, lr: 1.51e-02, grad_scale: 8.0 2023-03-07 22:22:33,598 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=27306.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:22:43,940 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.932e+02 3.516e+02 4.279e+02 1.181e+03, threshold=7.032e+02, percent-clipped=5.0 2023-03-07 22:23:17,941 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3355, 4.7539, 4.6099, 4.9918, 4.9468, 4.5999, 3.4765, 4.6397], device='cuda:2'), covar=tensor([0.0104, 0.0096, 0.0092, 0.0056, 0.0066, 0.0088, 0.0549, 0.0169], device='cuda:2'), in_proj_covar=tensor([0.0063, 0.0058, 0.0069, 0.0044, 0.0047, 0.0057, 0.0080, 0.0076], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 22:23:45,932 INFO [train2.py:809] (2/4) Epoch 7, batch 3450, loss[ctc_loss=0.1395, att_loss=0.2773, loss=0.2498, over 17030.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007293, over 51.00 utterances.], tot_loss[ctc_loss=0.1326, att_loss=0.2658, loss=0.2392, over 3288667.91 frames. utt_duration=1262 frames, utt_pad_proportion=0.04605, over 10432.37 utterances.], batch size: 51, lr: 1.51e-02, grad_scale: 16.0 2023-03-07 22:24:43,943 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2860, 4.8981, 5.1044, 5.0421, 4.9992, 5.1388, 4.9833, 4.6912], device='cuda:2'), covar=tensor([0.2031, 0.0989, 0.0288, 0.0548, 0.0709, 0.0408, 0.0321, 0.0413], device='cuda:2'), in_proj_covar=tensor([0.0419, 0.0254, 0.0195, 0.0238, 0.0293, 0.0315, 0.0242, 0.0279], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 22:25:06,083 INFO [train2.py:809] (2/4) Epoch 7, batch 3500, loss[ctc_loss=0.1056, att_loss=0.2331, loss=0.2076, over 14518.00 frames. utt_duration=1816 frames, utt_pad_proportion=0.03485, over 32.00 utterances.], tot_loss[ctc_loss=0.1343, att_loss=0.2666, loss=0.2401, over 3273694.42 frames. utt_duration=1225 frames, utt_pad_proportion=0.06009, over 10700.55 utterances.], batch size: 32, lr: 1.50e-02, grad_scale: 16.0 2023-03-07 22:25:22,143 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 3.129e+02 3.937e+02 5.112e+02 1.151e+03, threshold=7.873e+02, percent-clipped=8.0 2023-03-07 22:25:48,695 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1125, 5.3962, 4.8604, 5.4915, 4.9164, 5.0642, 5.5293, 5.3804], device='cuda:2'), covar=tensor([0.0339, 0.0207, 0.0717, 0.0175, 0.0338, 0.0174, 0.0176, 0.0118], device='cuda:2'), in_proj_covar=tensor([0.0270, 0.0212, 0.0269, 0.0194, 0.0218, 0.0166, 0.0197, 0.0191], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 22:26:26,697 INFO [train2.py:809] (2/4) Epoch 7, batch 3550, loss[ctc_loss=0.1065, att_loss=0.2382, loss=0.2119, over 15632.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008977, over 37.00 utterances.], tot_loss[ctc_loss=0.1342, att_loss=0.2667, loss=0.2402, over 3267254.25 frames. utt_duration=1218 frames, utt_pad_proportion=0.0633, over 10740.06 utterances.], batch size: 37, lr: 1.50e-02, grad_scale: 16.0 2023-03-07 22:27:34,163 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. 
limit=2.0 2023-03-07 22:27:46,246 INFO [train2.py:809] (2/4) Epoch 7, batch 3600, loss[ctc_loss=0.1285, att_loss=0.2633, loss=0.2364, over 16118.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.00688, over 42.00 utterances.], tot_loss[ctc_loss=0.1327, att_loss=0.2659, loss=0.2392, over 3269676.74 frames. utt_duration=1221 frames, utt_pad_proportion=0.06226, over 10728.10 utterances.], batch size: 42, lr: 1.50e-02, grad_scale: 16.0 2023-03-07 22:27:48,095 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=27504.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:27:51,383 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1646, 2.7622, 3.6131, 2.6637, 3.2043, 4.4203, 4.3030, 2.9453], device='cuda:2'), covar=tensor([0.0470, 0.1628, 0.0959, 0.1454, 0.1150, 0.0654, 0.0414, 0.1479], device='cuda:2'), in_proj_covar=tensor([0.0216, 0.0214, 0.0216, 0.0196, 0.0222, 0.0251, 0.0187, 0.0207], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:28:02,320 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.874e+02 3.597e+02 4.562e+02 1.127e+03, threshold=7.194e+02, percent-clipped=4.0 2023-03-07 22:29:00,854 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2023-03-07 22:29:03,306 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9493, 5.2039, 5.1990, 4.8578, 5.2991, 5.3350, 4.9242, 4.7098], device='cuda:2'), covar=tensor([0.0958, 0.0468, 0.0224, 0.0743, 0.0265, 0.0238, 0.0311, 0.0319], device='cuda:2'), in_proj_covar=tensor([0.0416, 0.0254, 0.0195, 0.0237, 0.0293, 0.0316, 0.0244, 0.0276], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 22:29:06,208 INFO [train2.py:809] (2/4) Epoch 7, batch 3650, loss[ctc_loss=0.1098, att_loss=0.2524, loss=0.2239, over 16407.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006517, over 44.00 utterances.], tot_loss[ctc_loss=0.1334, att_loss=0.2661, loss=0.2396, over 3267135.93 frames. utt_duration=1194 frames, utt_pad_proportion=0.06932, over 10954.86 utterances.], batch size: 44, lr: 1.50e-02, grad_scale: 16.0 2023-03-07 22:29:18,193 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=27560.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:29:49,528 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2023-03-07 22:30:27,723 INFO [train2.py:809] (2/4) Epoch 7, batch 3700, loss[ctc_loss=0.1051, att_loss=0.2443, loss=0.2165, over 16538.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.005856, over 45.00 utterances.], tot_loss[ctc_loss=0.1322, att_loss=0.2659, loss=0.2392, over 3276497.22 frames. 
utt_duration=1207 frames, utt_pad_proportion=0.06407, over 10869.49 utterances.], batch size: 45, lr: 1.50e-02, grad_scale: 16.0 2023-03-07 22:30:33,229 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=27606.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:30:36,306 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=27608.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:30:44,496 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+02 3.024e+02 4.004e+02 4.895e+02 1.157e+03, threshold=8.007e+02, percent-clipped=6.0 2023-03-07 22:31:47,385 INFO [train2.py:809] (2/4) Epoch 7, batch 3750, loss[ctc_loss=0.1131, att_loss=0.2558, loss=0.2273, over 16130.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005359, over 42.00 utterances.], tot_loss[ctc_loss=0.1322, att_loss=0.2655, loss=0.2389, over 3268967.17 frames. utt_duration=1198 frames, utt_pad_proportion=0.0687, over 10929.95 utterances.], batch size: 42, lr: 1.50e-02, grad_scale: 16.0 2023-03-07 22:31:47,706 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6262, 2.5038, 3.1153, 4.3426, 4.0266, 4.0694, 2.8764, 1.9399], device='cuda:2'), covar=tensor([0.0626, 0.2700, 0.1270, 0.0794, 0.0638, 0.0342, 0.1674, 0.2798], device='cuda:2'), in_proj_covar=tensor([0.0166, 0.0203, 0.0195, 0.0180, 0.0161, 0.0132, 0.0189, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:31:49,601 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=27654.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:33:07,745 INFO [train2.py:809] (2/4) Epoch 7, batch 3800, loss[ctc_loss=0.1251, att_loss=0.2488, loss=0.224, over 15952.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007291, over 41.00 utterances.], tot_loss[ctc_loss=0.1313, att_loss=0.2647, loss=0.238, over 3268264.67 frames. utt_duration=1213 frames, utt_pad_proportion=0.06548, over 10790.46 utterances.], batch size: 41, lr: 1.50e-02, grad_scale: 16.0 2023-03-07 22:33:25,169 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 3.032e+02 3.559e+02 4.389e+02 6.627e+02, threshold=7.117e+02, percent-clipped=0.0 2023-03-07 22:33:27,193 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=27714.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 22:34:28,360 INFO [train2.py:809] (2/4) Epoch 7, batch 3850, loss[ctc_loss=0.1394, att_loss=0.2808, loss=0.2525, over 16475.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006222, over 46.00 utterances.], tot_loss[ctc_loss=0.1313, att_loss=0.2646, loss=0.238, over 3261864.29 frames. utt_duration=1219 frames, utt_pad_proportion=0.0663, over 10716.74 utterances.], batch size: 46, lr: 1.49e-02, grad_scale: 16.0 2023-03-07 22:35:03,380 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=27775.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 22:35:46,405 INFO [train2.py:809] (2/4) Epoch 7, batch 3900, loss[ctc_loss=0.117, att_loss=0.2354, loss=0.2117, over 15367.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01131, over 35.00 utterances.], tot_loss[ctc_loss=0.1304, att_loss=0.264, loss=0.2373, over 3260349.83 frames. 
utt_duration=1220 frames, utt_pad_proportion=0.06695, over 10704.14 utterances.], batch size: 35, lr: 1.49e-02, grad_scale: 16.0 2023-03-07 22:35:48,249 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=27804.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:36:01,932 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+02 3.007e+02 3.697e+02 4.485e+02 8.676e+02, threshold=7.394e+02, percent-clipped=2.0 2023-03-07 22:37:02,137 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=27852.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:37:03,599 INFO [train2.py:809] (2/4) Epoch 7, batch 3950, loss[ctc_loss=0.1354, att_loss=0.2772, loss=0.2489, over 17337.00 frames. utt_duration=1177 frames, utt_pad_proportion=0.01848, over 59.00 utterances.], tot_loss[ctc_loss=0.1298, att_loss=0.264, loss=0.2371, over 3265019.01 frames. utt_duration=1240 frames, utt_pad_proportion=0.06079, over 10541.24 utterances.], batch size: 59, lr: 1.49e-02, grad_scale: 16.0 2023-03-07 22:37:36,163 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8698, 1.4738, 1.9621, 1.3091, 3.4797, 2.6017, 1.0893, 1.5017], device='cuda:2'), covar=tensor([0.0525, 0.4265, 0.3597, 0.2943, 0.0337, 0.1133, 0.4241, 0.2110], device='cuda:2'), in_proj_covar=tensor([0.0083, 0.0088, 0.0090, 0.0080, 0.0075, 0.0075, 0.0090, 0.0067], device='cuda:2'), out_proj_covar=tensor([4.1496e-05, 5.3975e-05, 5.2535e-05, 4.5049e-05, 3.7945e-05, 4.5292e-05, 5.4039e-05, 4.3151e-05], device='cuda:2') 2023-03-07 22:38:22,296 INFO [train2.py:809] (2/4) Epoch 8, batch 0, loss[ctc_loss=0.1428, att_loss=0.2921, loss=0.2623, over 16862.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.007963, over 49.00 utterances.], tot_loss[ctc_loss=0.1428, att_loss=0.2921, loss=0.2623, over 16862.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.007963, over 49.00 utterances.], batch size: 49, lr: 1.40e-02, grad_scale: 8.0 2023-03-07 22:38:22,297 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 22:38:34,610 INFO [train2.py:843] (2/4) Epoch 8, validation: ctc_loss=0.06098, att_loss=0.2435, loss=0.207, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 22:38:34,611 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 22:39:19,199 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+02 2.808e+02 3.476e+02 4.336e+02 8.495e+02, threshold=6.952e+02, percent-clipped=4.0 2023-03-07 22:39:49,796 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0932, 4.6821, 4.3312, 4.5285, 2.2320, 4.5758, 2.1513, 1.8657], device='cuda:2'), covar=tensor([0.0248, 0.0092, 0.0744, 0.0188, 0.2396, 0.0161, 0.1874, 0.1841], device='cuda:2'), in_proj_covar=tensor([0.0119, 0.0100, 0.0254, 0.0114, 0.0224, 0.0101, 0.0228, 0.0206], device='cuda:2'), out_proj_covar=tensor([1.2005e-04, 1.0292e-04, 2.3046e-04, 1.0715e-04, 2.1040e-04, 9.9420e-05, 2.0616e-04, 1.8771e-04], device='cuda:2') 2023-03-07 22:39:54,580 INFO [train2.py:809] (2/4) Epoch 8, batch 50, loss[ctc_loss=0.09578, att_loss=0.2408, loss=0.2118, over 16263.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.00647, over 43.00 utterances.], tot_loss[ctc_loss=0.1306, att_loss=0.2666, loss=0.2394, over 746529.69 frames. 
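Across these records the learning rate decays smoothly through epoch 7 (1.55e-02 down to 1.49e-02) and steps to 1.40e-02 at the epoch-8 boundary. Those values are consistent with an Eden-style schedule driven by the run's base_lr=0.05, lr_batches=5000 and lr_epochs=3.5; a sketch, where the exact batch and epoch counters passed to the scheduler are assumptions:

```python
# Sketch of an Eden-style schedule consistent with the lr values logged above.
# Assumption: lr = base_lr * ((batch^2 + B^2)/B^2)^-0.25 * ((epoch^2 + E^2)/E^2)^-0.25,
# with the batch counter taken from the nearby batch_count values and epoch = completed epochs.
def eden_lr(batch: float, epoch: float,
            base_lr: float = 0.05, lr_batches: float = 5000.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(batch=25800, epoch=6):.2e}")  # ~1.55e-02, cf. the epoch-7 records above
print(f"{eden_lr(batch=27950, epoch=7):.2e}")  # ~1.40e-02, cf. Epoch 8, batch 0
```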
utt_duration=1277 frames, utt_pad_proportion=0.04076, over 2341.72 utterances.], batch size: 43, lr: 1.40e-02, grad_scale: 4.0 2023-03-07 22:41:14,415 INFO [train2.py:809] (2/4) Epoch 8, batch 100, loss[ctc_loss=0.1163, att_loss=0.2707, loss=0.2398, over 16321.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006749, over 45.00 utterances.], tot_loss[ctc_loss=0.1282, att_loss=0.2637, loss=0.2366, over 1300144.76 frames. utt_duration=1267 frames, utt_pad_proportion=0.0501, over 4110.25 utterances.], batch size: 45, lr: 1.40e-02, grad_scale: 4.0 2023-03-07 22:41:17,748 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.08 vs. limit=5.0 2023-03-07 22:42:02,736 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.962e+02 3.730e+02 4.479e+02 1.131e+03, threshold=7.459e+02, percent-clipped=8.0 2023-03-07 22:42:13,356 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. limit=2.0 2023-03-07 22:42:39,316 INFO [train2.py:809] (2/4) Epoch 8, batch 150, loss[ctc_loss=0.1061, att_loss=0.2342, loss=0.2086, over 15351.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.01134, over 35.00 utterances.], tot_loss[ctc_loss=0.1286, att_loss=0.2631, loss=0.2362, over 1740018.77 frames. utt_duration=1260 frames, utt_pad_proportion=0.05217, over 5529.71 utterances.], batch size: 35, lr: 1.40e-02, grad_scale: 4.0 2023-03-07 22:43:28,915 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3410, 2.7585, 3.5982, 2.5515, 3.4684, 4.5224, 4.3085, 2.9500], device='cuda:2'), covar=tensor([0.0473, 0.1839, 0.1116, 0.1506, 0.0915, 0.0566, 0.0487, 0.1551], device='cuda:2'), in_proj_covar=tensor([0.0216, 0.0212, 0.0219, 0.0196, 0.0222, 0.0252, 0.0194, 0.0209], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:43:31,924 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28070.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:44:00,289 INFO [train2.py:809] (2/4) Epoch 8, batch 200, loss[ctc_loss=0.08675, att_loss=0.2201, loss=0.1934, over 14565.00 frames. utt_duration=1822 frames, utt_pad_proportion=0.03489, over 32.00 utterances.], tot_loss[ctc_loss=0.1266, att_loss=0.262, loss=0.2349, over 2082277.97 frames. utt_duration=1258 frames, utt_pad_proportion=0.05123, over 6628.01 utterances.], batch size: 32, lr: 1.40e-02, grad_scale: 4.0 2023-03-07 22:44:44,919 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+02 2.763e+02 3.553e+02 4.276e+02 6.955e+02, threshold=7.106e+02, percent-clipped=0.0 2023-03-07 22:44:52,126 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2023-03-07 22:45:21,336 INFO [train2.py:809] (2/4) Epoch 8, batch 250, loss[ctc_loss=0.1172, att_loss=0.2599, loss=0.2314, over 16470.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.007046, over 46.00 utterances.], tot_loss[ctc_loss=0.1249, att_loss=0.2611, loss=0.2338, over 2348350.03 frames. utt_duration=1253 frames, utt_pad_proportion=0.05323, over 7505.00 utterances.], batch size: 46, lr: 1.40e-02, grad_scale: 4.0 2023-03-07 22:46:35,933 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.02 vs. limit=5.0 2023-03-07 22:46:41,309 INFO [train2.py:809] (2/4) Epoch 8, batch 300, loss[ctc_loss=0.09789, att_loss=0.229, loss=0.2028, over 15757.00 frames. 
utt_duration=1660 frames, utt_pad_proportion=0.009515, over 38.00 utterances.], tot_loss[ctc_loss=0.1258, att_loss=0.2621, loss=0.2348, over 2551602.58 frames. utt_duration=1247 frames, utt_pad_proportion=0.05539, over 8191.77 utterances.], batch size: 38, lr: 1.40e-02, grad_scale: 4.0 2023-03-07 22:46:48,905 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6763, 2.9463, 3.6908, 2.5830, 3.3271, 4.6594, 4.5059, 3.3601], device='cuda:2'), covar=tensor([0.0400, 0.1704, 0.1166, 0.1647, 0.1183, 0.0694, 0.0528, 0.1347], device='cuda:2'), in_proj_covar=tensor([0.0217, 0.0216, 0.0223, 0.0199, 0.0223, 0.0254, 0.0196, 0.0208], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:47:13,705 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.52 vs. limit=5.0 2023-03-07 22:47:24,410 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5474, 4.5556, 4.4601, 4.5603, 5.0296, 4.7688, 4.4326, 2.1031], device='cuda:2'), covar=tensor([0.0217, 0.0336, 0.0282, 0.0183, 0.0979, 0.0191, 0.0299, 0.2530], device='cuda:2'), in_proj_covar=tensor([0.0128, 0.0123, 0.0122, 0.0125, 0.0300, 0.0124, 0.0112, 0.0228], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 22:47:27,067 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.948e+02 3.357e+02 4.007e+02 1.001e+03, threshold=6.714e+02, percent-clipped=3.0 2023-03-07 22:47:51,072 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28229.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:48:03,891 INFO [train2.py:809] (2/4) Epoch 8, batch 350, loss[ctc_loss=0.116, att_loss=0.2706, loss=0.2397, over 16879.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.006146, over 49.00 utterances.], tot_loss[ctc_loss=0.1267, att_loss=0.2631, loss=0.2358, over 2719594.05 frames. utt_duration=1224 frames, utt_pad_proportion=0.0572, over 8896.49 utterances.], batch size: 49, lr: 1.40e-02, grad_scale: 4.0 2023-03-07 22:48:14,498 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8678, 6.0645, 5.4059, 5.9797, 5.7765, 5.3103, 5.4899, 5.4283], device='cuda:2'), covar=tensor([0.1008, 0.0961, 0.0768, 0.0645, 0.0716, 0.1360, 0.2164, 0.2229], device='cuda:2'), in_proj_covar=tensor([0.0385, 0.0433, 0.0333, 0.0342, 0.0318, 0.0392, 0.0453, 0.0414], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 22:48:25,664 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7285, 5.0369, 4.9193, 4.9206, 5.0743, 5.0802, 4.7352, 4.5666], device='cuda:2'), covar=tensor([0.1071, 0.0493, 0.0280, 0.0550, 0.0308, 0.0319, 0.0310, 0.0365], device='cuda:2'), in_proj_covar=tensor([0.0429, 0.0257, 0.0203, 0.0246, 0.0301, 0.0328, 0.0253, 0.0284], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 22:48:33,676 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28255.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 22:49:25,169 INFO [train2.py:809] (2/4) Epoch 8, batch 400, loss[ctc_loss=0.1096, att_loss=0.2539, loss=0.2251, over 16479.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006684, over 46.00 utterances.], tot_loss[ctc_loss=0.1271, att_loss=0.2635, loss=0.2362, over 2846891.35 frames. 
utt_duration=1214 frames, utt_pad_proportion=0.0587, over 9392.52 utterances.], batch size: 46, lr: 1.40e-02, grad_scale: 8.0 2023-03-07 22:49:25,542 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28287.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:49:30,704 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28290.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:49:35,283 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9987, 4.9225, 4.9073, 2.2764, 1.8719, 2.5768, 3.6974, 3.5508], device='cuda:2'), covar=tensor([0.0656, 0.0140, 0.0165, 0.3716, 0.5885, 0.2853, 0.0920, 0.2040], device='cuda:2'), in_proj_covar=tensor([0.0319, 0.0195, 0.0221, 0.0188, 0.0357, 0.0348, 0.0217, 0.0348], device='cuda:2'), out_proj_covar=tensor([1.5348e-04, 7.7407e-05, 9.7042e-05, 8.6974e-05, 1.6408e-04, 1.4822e-04, 8.7514e-05, 1.5660e-04], device='cuda:2') 2023-03-07 22:49:49,546 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5576, 3.5384, 2.9697, 3.3045, 3.7643, 3.3173, 2.2818, 4.0399], device='cuda:2'), covar=tensor([0.1047, 0.0532, 0.0952, 0.0601, 0.0493, 0.0657, 0.1117, 0.0344], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0162, 0.0191, 0.0164, 0.0197, 0.0196, 0.0169, 0.0208], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 22:50:09,511 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.601e+02 3.135e+02 4.187e+02 9.825e+02, threshold=6.270e+02, percent-clipped=5.0 2023-03-07 22:50:11,521 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28316.0, num_to_drop=1, layers_to_drop={1} 2023-03-07 22:50:45,311 INFO [train2.py:809] (2/4) Epoch 8, batch 450, loss[ctc_loss=0.1627, att_loss=0.2802, loss=0.2567, over 16538.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006587, over 45.00 utterances.], tot_loss[ctc_loss=0.1276, att_loss=0.263, loss=0.2359, over 2941571.42 frames. utt_duration=1232 frames, utt_pad_proportion=0.05479, over 9562.22 utterances.], batch size: 45, lr: 1.39e-02, grad_scale: 8.0 2023-03-07 22:51:03,327 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28348.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:51:07,819 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4336, 2.3487, 3.1416, 4.4745, 4.0919, 4.0199, 2.7279, 2.0035], device='cuda:2'), covar=tensor([0.0720, 0.2850, 0.1495, 0.0593, 0.0638, 0.0440, 0.1865, 0.2789], device='cuda:2'), in_proj_covar=tensor([0.0165, 0.0200, 0.0192, 0.0175, 0.0164, 0.0136, 0.0190, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:51:37,061 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=28370.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:52:04,602 INFO [train2.py:809] (2/4) Epoch 8, batch 500, loss[ctc_loss=0.1166, att_loss=0.2353, loss=0.2115, over 15739.00 frames. utt_duration=1659 frames, utt_pad_proportion=0.009788, over 38.00 utterances.], tot_loss[ctc_loss=0.1276, att_loss=0.2628, loss=0.2358, over 3011470.55 frames. 
utt_duration=1217 frames, utt_pad_proportion=0.05997, over 9908.13 utterances.], batch size: 38, lr: 1.39e-02, grad_scale: 8.0 2023-03-07 22:52:08,784 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6061, 4.7807, 4.6577, 4.7711, 5.1213, 4.8836, 4.5000, 2.0913], device='cuda:2'), covar=tensor([0.0277, 0.0218, 0.0181, 0.0166, 0.0996, 0.0219, 0.0250, 0.2597], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0121, 0.0120, 0.0123, 0.0296, 0.0122, 0.0112, 0.0224], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 22:52:40,753 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.12 vs. limit=5.0 2023-03-07 22:52:49,182 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+02 2.823e+02 3.661e+02 4.475e+02 7.657e+02, threshold=7.322e+02, percent-clipped=5.0 2023-03-07 22:52:54,253 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=28418.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 22:53:25,911 INFO [train2.py:809] (2/4) Epoch 8, batch 550, loss[ctc_loss=0.09281, att_loss=0.247, loss=0.2161, over 16265.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.007681, over 43.00 utterances.], tot_loss[ctc_loss=0.1255, att_loss=0.2611, loss=0.234, over 3062274.96 frames. utt_duration=1245 frames, utt_pad_proportion=0.05699, over 9852.65 utterances.], batch size: 43, lr: 1.39e-02, grad_scale: 8.0 2023-03-07 22:53:36,289 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-07 22:54:43,136 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28486.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:54:44,264 INFO [train2.py:809] (2/4) Epoch 8, batch 600, loss[ctc_loss=0.1473, att_loss=0.2852, loss=0.2576, over 16465.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.0069, over 46.00 utterances.], tot_loss[ctc_loss=0.1254, att_loss=0.2614, loss=0.2342, over 3110089.88 frames. utt_duration=1231 frames, utt_pad_proportion=0.06032, over 10120.30 utterances.], batch size: 46, lr: 1.39e-02, grad_scale: 8.0 2023-03-07 22:54:52,849 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28492.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:55:27,684 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.636e+02 3.439e+02 4.247e+02 1.142e+03, threshold=6.879e+02, percent-clipped=2.0 2023-03-07 22:55:57,585 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9741, 5.3647, 4.8210, 5.4490, 4.8007, 5.1222, 5.4793, 5.2811], device='cuda:2'), covar=tensor([0.0429, 0.0195, 0.0732, 0.0167, 0.0346, 0.0137, 0.0185, 0.0145], device='cuda:2'), in_proj_covar=tensor([0.0282, 0.0216, 0.0280, 0.0203, 0.0226, 0.0174, 0.0203, 0.0199], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 22:56:03,439 INFO [train2.py:809] (2/4) Epoch 8, batch 650, loss[ctc_loss=0.1116, att_loss=0.2492, loss=0.2217, over 15960.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.006004, over 41.00 utterances.], tot_loss[ctc_loss=0.1251, att_loss=0.2605, loss=0.2334, over 3142771.28 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.05425, over 10018.99 utterances.], batch size: 41, lr: 1.39e-02, grad_scale: 8.0 2023-03-07 22:56:19,713 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28547.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:56:21,112 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2594, 4.5153, 4.0566, 4.5936, 4.0834, 4.2368, 4.6155, 4.4973], device='cuda:2'), covar=tensor([0.0525, 0.0380, 0.0881, 0.0258, 0.0490, 0.0361, 0.0277, 0.0212], device='cuda:2'), in_proj_covar=tensor([0.0283, 0.0217, 0.0282, 0.0204, 0.0228, 0.0175, 0.0203, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 22:56:29,236 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28553.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:56:55,815 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9345, 6.1754, 5.6017, 5.9764, 5.7783, 5.4973, 5.6461, 5.2827], device='cuda:2'), covar=tensor([0.1031, 0.0730, 0.0644, 0.0610, 0.0669, 0.1231, 0.1661, 0.2023], device='cuda:2'), in_proj_covar=tensor([0.0390, 0.0441, 0.0333, 0.0347, 0.0326, 0.0395, 0.0467, 0.0416], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 22:57:08,370 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1914, 4.7448, 4.7000, 4.3612, 2.3599, 4.2948, 2.6081, 2.1542], device='cuda:2'), covar=tensor([0.0229, 0.0121, 0.0640, 0.0239, 0.2448, 0.0174, 0.1799, 0.1881], device='cuda:2'), in_proj_covar=tensor([0.0123, 0.0101, 0.0259, 0.0115, 0.0225, 0.0102, 0.0231, 0.0208], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:57:21,187 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28585.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:57:24,657 INFO [train2.py:809] (2/4) Epoch 8, batch 700, loss[ctc_loss=0.1229, att_loss=0.2655, loss=0.237, over 16625.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.00514, over 47.00 utterances.], tot_loss[ctc_loss=0.1251, att_loss=0.2598, loss=0.2329, over 3156121.39 frames. utt_duration=1229 frames, utt_pad_proportion=0.06388, over 10287.28 utterances.], batch size: 47, lr: 1.39e-02, grad_scale: 8.0 2023-03-07 22:57:39,093 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3025, 2.4920, 3.1138, 4.3797, 3.9152, 4.0764, 2.7850, 1.9897], device='cuda:2'), covar=tensor([0.0695, 0.2289, 0.1259, 0.0465, 0.0578, 0.0293, 0.1746, 0.2733], device='cuda:2'), in_proj_covar=tensor([0.0166, 0.0197, 0.0190, 0.0175, 0.0165, 0.0134, 0.0190, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 22:58:02,998 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28611.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 22:58:08,934 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+02 2.921e+02 3.565e+02 4.512e+02 9.961e+02, threshold=7.130e+02, percent-clipped=4.0 2023-03-07 22:58:44,785 INFO [train2.py:809] (2/4) Epoch 8, batch 750, loss[ctc_loss=0.1231, att_loss=0.2532, loss=0.2272, over 16398.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007903, over 44.00 utterances.], tot_loss[ctc_loss=0.1261, att_loss=0.2609, loss=0.234, over 3191344.34 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05487, over 10212.38 utterances.], batch size: 44, lr: 1.39e-02, grad_scale: 8.0 2023-03-07 22:58:55,039 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28643.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 22:59:19,048 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.04 vs. limit=2.0 2023-03-07 23:00:04,392 INFO [train2.py:809] (2/4) Epoch 8, batch 800, loss[ctc_loss=0.1089, att_loss=0.2351, loss=0.2098, over 14601.00 frames. utt_duration=1827 frames, utt_pad_proportion=0.0385, over 32.00 utterances.], tot_loss[ctc_loss=0.1273, att_loss=0.2624, loss=0.2354, over 3208641.18 frames. utt_duration=1209 frames, utt_pad_proportion=0.06587, over 10624.82 utterances.], batch size: 32, lr: 1.39e-02, grad_scale: 8.0 2023-03-07 23:00:30,665 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.64 vs. limit=2.0 2023-03-07 23:00:43,601 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7902, 5.0859, 4.9432, 4.9319, 5.1351, 5.1175, 4.8661, 4.5763], device='cuda:2'), covar=tensor([0.0924, 0.0441, 0.0287, 0.0467, 0.0239, 0.0292, 0.0272, 0.0356], device='cuda:2'), in_proj_covar=tensor([0.0418, 0.0257, 0.0201, 0.0241, 0.0297, 0.0326, 0.0245, 0.0280], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 23:00:47,881 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+02 2.816e+02 3.750e+02 4.467e+02 1.351e+03, threshold=7.500e+02, percent-clipped=6.0 2023-03-07 23:00:48,283 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2527, 2.3272, 2.9900, 4.1894, 3.9321, 4.0712, 2.7554, 1.7109], device='cuda:2'), covar=tensor([0.0652, 0.2453, 0.1197, 0.0440, 0.0516, 0.0249, 0.1728, 0.2582], device='cuda:2'), in_proj_covar=tensor([0.0166, 0.0198, 0.0189, 0.0174, 0.0165, 0.0135, 0.0192, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 23:01:24,439 INFO [train2.py:809] (2/4) Epoch 8, batch 850, loss[ctc_loss=0.1272, att_loss=0.2668, loss=0.2389, over 17025.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007497, over 51.00 utterances.], tot_loss[ctc_loss=0.1257, att_loss=0.2613, loss=0.2341, over 3226715.18 frames. utt_duration=1241 frames, utt_pad_proportion=0.05717, over 10415.85 utterances.], batch size: 51, lr: 1.38e-02, grad_scale: 8.0 2023-03-07 23:02:43,217 INFO [train2.py:809] (2/4) Epoch 8, batch 900, loss[ctc_loss=0.1056, att_loss=0.243, loss=0.2156, over 16392.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.007021, over 44.00 utterances.], tot_loss[ctc_loss=0.1266, att_loss=0.2624, loss=0.2353, over 3236481.53 frames. 
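In the "Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=..." records, the reported threshold is consistently twice the middle of the five logged grad-norm statistics (e.g. 2 * 3.750e+02 = 7.500e+02 above), i.e. clipping_scale times the median gradient norm over the recent window, with percent-clipped reporting how many recent batches exceeded that threshold. A minimal sketch of that bookkeeping follows; the function name, the assumption that the five statistics are min/25%/median/75%/max, and the choice of reporting window are all illustrative rather than taken from the recipe.

import torch

def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five-point summary (min, 25%, median, 75%, max) of recent gradient norms.
    quartiles = torch.quantile(recent_grad_norms,
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    # Threshold is clipping_scale times the median, matching the logged values.
    threshold = clipping_scale * quartiles[2]
    # Share of recent batches whose gradient norm exceeded the threshold.
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
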
utt_duration=1244 frames, utt_pad_proportion=0.05625, over 10415.89 utterances.], batch size: 44, lr: 1.38e-02, grad_scale: 8.0 2023-03-07 23:03:27,130 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+02 2.976e+02 3.721e+02 4.391e+02 9.491e+02, threshold=7.443e+02, percent-clipped=5.0 2023-03-07 23:03:35,178 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7459, 4.1629, 4.5041, 4.3570, 4.2235, 4.6679, 4.3593, 4.7484], device='cuda:2'), covar=tensor([0.0817, 0.0780, 0.0606, 0.0934, 0.1813, 0.0789, 0.1794, 0.0646], device='cuda:2'), in_proj_covar=tensor([0.0616, 0.0373, 0.0423, 0.0490, 0.0647, 0.0426, 0.0350, 0.0429], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 23:03:59,437 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6338, 2.6323, 3.3884, 4.3563, 4.0732, 4.0852, 2.7900, 1.7717], device='cuda:2'), covar=tensor([0.0520, 0.2216, 0.1054, 0.0615, 0.0572, 0.0357, 0.1769, 0.2628], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0193, 0.0184, 0.0175, 0.0163, 0.0134, 0.0188, 0.0175], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 23:04:03,685 INFO [train2.py:809] (2/4) Epoch 8, batch 950, loss[ctc_loss=0.1714, att_loss=0.2934, loss=0.269, over 13442.00 frames. utt_duration=372.2 frames, utt_pad_proportion=0.3526, over 145.00 utterances.], tot_loss[ctc_loss=0.1266, att_loss=0.2627, loss=0.2355, over 3238528.67 frames. utt_duration=1224 frames, utt_pad_proportion=0.06272, over 10596.19 utterances.], batch size: 145, lr: 1.38e-02, grad_scale: 8.0 2023-03-07 23:04:08,801 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1777, 4.5278, 4.6054, 4.3846, 2.0724, 4.3203, 2.5652, 1.6629], device='cuda:2'), covar=tensor([0.0285, 0.0128, 0.0602, 0.0210, 0.2428, 0.0158, 0.1683, 0.1946], device='cuda:2'), in_proj_covar=tensor([0.0120, 0.0100, 0.0251, 0.0115, 0.0221, 0.0102, 0.0229, 0.0203], device='cuda:2'), out_proj_covar=tensor([1.2094e-04, 1.0330e-04, 2.2837e-04, 1.0871e-04, 2.0914e-04, 9.9779e-05, 2.0694e-04, 1.8620e-04], device='cuda:2') 2023-03-07 23:04:11,996 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28842.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:04:21,435 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28848.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:04:42,042 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28861.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:04:54,440 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2669, 4.4139, 4.3035, 4.2613, 2.1867, 4.1907, 2.5083, 1.6941], device='cuda:2'), covar=tensor([0.0281, 0.0133, 0.0745, 0.0246, 0.2331, 0.0196, 0.1684, 0.1922], device='cuda:2'), in_proj_covar=tensor([0.0119, 0.0100, 0.0251, 0.0115, 0.0220, 0.0102, 0.0228, 0.0202], device='cuda:2'), out_proj_covar=tensor([1.2055e-04, 1.0333e-04, 2.2858e-04, 1.0859e-04, 2.0816e-04, 9.9939e-05, 2.0668e-04, 1.8497e-04], device='cuda:2') 2023-03-07 23:05:21,503 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=28885.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:05:23,031 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9962, 5.3631, 4.8126, 5.4659, 4.7077, 5.1157, 5.4803, 5.2715], device='cuda:2'), covar=tensor([0.0444, 0.0216, 0.0720, 0.0142, 0.0484, 0.0148, 0.0180, 
0.0133], device='cuda:2'), in_proj_covar=tensor([0.0283, 0.0214, 0.0279, 0.0202, 0.0228, 0.0176, 0.0200, 0.0197], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 23:05:24,289 INFO [train2.py:809] (2/4) Epoch 8, batch 1000, loss[ctc_loss=0.1133, att_loss=0.2467, loss=0.22, over 15883.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009269, over 39.00 utterances.], tot_loss[ctc_loss=0.1265, att_loss=0.263, loss=0.2357, over 3253601.63 frames. utt_duration=1246 frames, utt_pad_proportion=0.05411, over 10458.02 utterances.], batch size: 39, lr: 1.38e-02, grad_scale: 8.0 2023-03-07 23:05:26,151 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6510, 5.2044, 4.8244, 5.3477, 5.3609, 4.7777, 3.7871, 5.1403], device='cuda:2'), covar=tensor([0.0095, 0.0086, 0.0120, 0.0070, 0.0068, 0.0094, 0.0467, 0.0247], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0060, 0.0072, 0.0047, 0.0048, 0.0059, 0.0083, 0.0081], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 23:05:56,837 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4318, 5.1151, 4.7586, 5.0915, 5.1025, 4.6979, 3.5450, 5.0084], device='cuda:2'), covar=tensor([0.0119, 0.0098, 0.0114, 0.0084, 0.0094, 0.0107, 0.0591, 0.0222], device='cuda:2'), in_proj_covar=tensor([0.0067, 0.0061, 0.0073, 0.0047, 0.0048, 0.0059, 0.0083, 0.0082], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 23:06:01,481 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=28911.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 23:06:07,295 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+02 2.634e+02 3.393e+02 4.335e+02 1.819e+03, threshold=6.785e+02, percent-clipped=6.0 2023-03-07 23:06:20,111 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28922.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:06:26,921 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2299, 2.4960, 4.5686, 3.7066, 2.9863, 4.1389, 4.0907, 4.2051], device='cuda:2'), covar=tensor([0.0156, 0.1571, 0.0084, 0.0962, 0.1674, 0.0259, 0.0104, 0.0197], device='cuda:2'), in_proj_covar=tensor([0.0143, 0.0237, 0.0120, 0.0297, 0.0274, 0.0179, 0.0103, 0.0138], device='cuda:2'), out_proj_covar=tensor([1.3264e-04, 1.9721e-04, 1.0775e-04, 2.4373e-04, 2.4136e-04, 1.5871e-04, 9.3381e-05, 1.2830e-04], device='cuda:2') 2023-03-07 23:06:29,903 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.0856, 3.4746, 3.4279, 2.7896, 3.5039, 3.3395, 3.3596, 2.1004], device='cuda:2'), covar=tensor([0.1279, 0.1167, 0.2006, 0.7045, 0.2407, 0.3490, 0.0834, 1.0357], device='cuda:2'), in_proj_covar=tensor([0.0077, 0.0089, 0.0093, 0.0147, 0.0084, 0.0138, 0.0081, 0.0141], device='cuda:2'), out_proj_covar=tensor([6.9305e-05, 7.0748e-05, 7.9134e-05, 1.1443e-04, 7.1405e-05, 1.0933e-04, 6.6700e-05, 1.1270e-04], device='cuda:2') 2023-03-07 23:06:37,061 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=28933.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:06:43,111 INFO [train2.py:809] (2/4) Epoch 8, batch 1050, loss[ctc_loss=0.09804, att_loss=0.2282, loss=0.2022, over 15377.00 frames. 
utt_duration=1759 frames, utt_pad_proportion=0.01065, over 35.00 utterances.], tot_loss[ctc_loss=0.1274, att_loss=0.2634, loss=0.2362, over 3252965.32 frames. utt_duration=1208 frames, utt_pad_proportion=0.06543, over 10781.65 utterances.], batch size: 35, lr: 1.38e-02, grad_scale: 8.0 2023-03-07 23:06:53,378 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=28943.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:07:01,114 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9870, 5.3683, 4.7294, 5.4019, 4.6657, 5.0080, 5.4867, 5.2700], device='cuda:2'), covar=tensor([0.0466, 0.0224, 0.0801, 0.0165, 0.0421, 0.0219, 0.0184, 0.0155], device='cuda:2'), in_proj_covar=tensor([0.0278, 0.0211, 0.0274, 0.0198, 0.0225, 0.0174, 0.0199, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-07 23:07:18,035 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=28959.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 23:07:25,956 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8396, 2.6250, 3.3393, 2.5368, 3.1605, 3.9837, 3.8329, 3.0134], device='cuda:2'), covar=tensor([0.0416, 0.1632, 0.0975, 0.1411, 0.0960, 0.0752, 0.0563, 0.1347], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0213, 0.0221, 0.0196, 0.0222, 0.0254, 0.0193, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-07 23:08:03,736 INFO [train2.py:809] (2/4) Epoch 8, batch 1100, loss[ctc_loss=0.1068, att_loss=0.2568, loss=0.2268, over 16782.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005623, over 48.00 utterances.], tot_loss[ctc_loss=0.1255, att_loss=0.2621, loss=0.2348, over 3249111.78 frames. utt_duration=1249 frames, utt_pad_proportion=0.05669, over 10420.48 utterances.], batch size: 48, lr: 1.38e-02, grad_scale: 8.0 2023-03-07 23:08:10,620 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=28991.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:08:25,566 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6321, 2.4332, 5.1079, 3.9634, 3.0958, 4.6380, 4.8968, 4.8840], device='cuda:2'), covar=tensor([0.0210, 0.1971, 0.0156, 0.0992, 0.1840, 0.0201, 0.0089, 0.0162], device='cuda:2'), in_proj_covar=tensor([0.0148, 0.0245, 0.0124, 0.0305, 0.0282, 0.0183, 0.0106, 0.0142], device='cuda:2'), out_proj_covar=tensor([1.3679e-04, 2.0395e-04, 1.1120e-04, 2.5065e-04, 2.4812e-04, 1.6289e-04, 9.6066e-05, 1.3166e-04], device='cuda:2') 2023-03-07 23:08:28,654 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29002.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:08:48,761 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.591e+02 2.718e+02 3.264e+02 3.994e+02 7.593e+02, threshold=6.528e+02, percent-clipped=2.0 2023-03-07 23:09:25,646 INFO [train2.py:809] (2/4) Epoch 8, batch 1150, loss[ctc_loss=0.1406, att_loss=0.2807, loss=0.2527, over 16555.00 frames. utt_duration=677.4 frames, utt_pad_proportion=0.1511, over 98.00 utterances.], tot_loss[ctc_loss=0.1256, att_loss=0.2622, loss=0.2349, over 3250835.68 frames. 
utt_duration=1227 frames, utt_pad_proportion=0.06188, over 10608.19 utterances.], batch size: 98, lr: 1.38e-02, grad_scale: 8.0 2023-03-07 23:10:06,629 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29063.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:10:46,212 INFO [train2.py:809] (2/4) Epoch 8, batch 1200, loss[ctc_loss=0.1445, att_loss=0.2779, loss=0.2512, over 17437.00 frames. utt_duration=884.4 frames, utt_pad_proportion=0.07397, over 79.00 utterances.], tot_loss[ctc_loss=0.1268, att_loss=0.2633, loss=0.236, over 3259811.19 frames. utt_duration=1217 frames, utt_pad_proportion=0.06313, over 10725.38 utterances.], batch size: 79, lr: 1.38e-02, grad_scale: 8.0 2023-03-07 23:11:20,563 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9599, 5.2970, 5.2228, 5.2893, 5.4206, 5.3477, 5.0602, 4.8335], device='cuda:2'), covar=tensor([0.1072, 0.0561, 0.0228, 0.0419, 0.0289, 0.0300, 0.0240, 0.0329], device='cuda:2'), in_proj_covar=tensor([0.0431, 0.0264, 0.0203, 0.0242, 0.0305, 0.0334, 0.0252, 0.0285], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0003, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-07 23:11:30,218 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.598e+02 2.794e+02 3.105e+02 3.921e+02 8.857e+02, threshold=6.210e+02, percent-clipped=2.0 2023-03-07 23:12:06,185 INFO [train2.py:809] (2/4) Epoch 8, batch 1250, loss[ctc_loss=0.1514, att_loss=0.279, loss=0.2535, over 16737.00 frames. utt_duration=677.9 frames, utt_pad_proportion=0.1494, over 99.00 utterances.], tot_loss[ctc_loss=0.127, att_loss=0.2636, loss=0.2363, over 3272927.70 frames. utt_duration=1213 frames, utt_pad_proportion=0.06015, over 10807.67 utterances.], batch size: 99, lr: 1.38e-02, grad_scale: 8.0 2023-03-07 23:12:14,249 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29142.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:12:23,510 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29148.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:12:28,291 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6848, 4.8136, 4.5898, 4.4881, 5.2401, 5.0786, 4.4122, 2.2261], device='cuda:2'), covar=tensor([0.0236, 0.0276, 0.0297, 0.0315, 0.0859, 0.0168, 0.0363, 0.2332], device='cuda:2'), in_proj_covar=tensor([0.0131, 0.0123, 0.0129, 0.0127, 0.0308, 0.0125, 0.0115, 0.0228], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 23:13:04,881 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2023-03-07 23:13:25,934 INFO [train2.py:809] (2/4) Epoch 8, batch 1300, loss[ctc_loss=0.1517, att_loss=0.2681, loss=0.2448, over 17027.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.006614, over 51.00 utterances.], tot_loss[ctc_loss=0.1254, att_loss=0.2624, loss=0.235, over 3272195.78 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05445, over 10549.48 utterances.], batch size: 51, lr: 1.37e-02, grad_scale: 8.0 2023-03-07 23:13:30,547 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29190.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:13:39,856 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29196.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:14:09,582 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+02 2.738e+02 3.321e+02 4.360e+02 1.126e+03, threshold=6.642e+02, percent-clipped=8.0 2023-03-07 23:14:13,594 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=29217.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:14:44,873 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29236.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:14:46,059 INFO [train2.py:809] (2/4) Epoch 8, batch 1350, loss[ctc_loss=0.1211, att_loss=0.2703, loss=0.2404, over 17060.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.009296, over 53.00 utterances.], tot_loss[ctc_loss=0.1255, att_loss=0.2623, loss=0.2349, over 3271631.51 frames. utt_duration=1259 frames, utt_pad_proportion=0.05141, over 10408.11 utterances.], batch size: 53, lr: 1.37e-02, grad_scale: 8.0 2023-03-07 23:16:06,783 INFO [train2.py:809] (2/4) Epoch 8, batch 1400, loss[ctc_loss=0.1259, att_loss=0.2732, loss=0.2437, over 16637.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.004662, over 47.00 utterances.], tot_loss[ctc_loss=0.1253, att_loss=0.2623, loss=0.2349, over 3281166.35 frames. utt_duration=1264 frames, utt_pad_proportion=0.04698, over 10395.36 utterances.], batch size: 47, lr: 1.37e-02, grad_scale: 8.0 2023-03-07 23:16:23,185 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29297.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:16:42,384 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1020, 5.1880, 5.0625, 2.3890, 1.9063, 3.0352, 4.2235, 3.7779], device='cuda:2'), covar=tensor([0.0565, 0.0193, 0.0190, 0.3494, 0.6248, 0.2187, 0.0743, 0.1943], device='cuda:2'), in_proj_covar=tensor([0.0307, 0.0194, 0.0217, 0.0179, 0.0347, 0.0332, 0.0212, 0.0335], device='cuda:2'), out_proj_covar=tensor([1.4688e-04, 7.6664e-05, 9.6020e-05, 8.2449e-05, 1.5788e-04, 1.4124e-04, 8.6000e-05, 1.4993e-04], device='cuda:2') 2023-03-07 23:16:52,001 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.467e+02 2.810e+02 3.377e+02 4.369e+02 8.474e+02, threshold=6.754e+02, percent-clipped=4.0 2023-03-07 23:17:27,567 INFO [train2.py:809] (2/4) Epoch 8, batch 1450, loss[ctc_loss=0.1266, att_loss=0.2718, loss=0.2428, over 17382.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03234, over 63.00 utterances.], tot_loss[ctc_loss=0.1237, att_loss=0.2611, loss=0.2337, over 3283318.09 frames. utt_duration=1291 frames, utt_pad_proportion=0.04084, over 10185.61 utterances.], batch size: 63, lr: 1.37e-02, grad_scale: 8.0 2023-03-07 23:17:32,023 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. 
limit=2.0 2023-03-07 23:17:41,985 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3396, 2.3663, 3.3433, 4.3092, 4.0349, 4.0655, 2.8193, 1.8902], device='cuda:2'), covar=tensor([0.0823, 0.2749, 0.1110, 0.0683, 0.0514, 0.0379, 0.1804, 0.2828], device='cuda:2'), in_proj_covar=tensor([0.0160, 0.0192, 0.0183, 0.0171, 0.0163, 0.0134, 0.0184, 0.0175], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 23:17:45,127 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5845, 1.7701, 1.8980, 1.6818, 2.7610, 2.3277, 1.6634, 2.3336], device='cuda:2'), covar=tensor([0.0509, 0.3382, 0.2708, 0.1601, 0.0572, 0.1137, 0.2305, 0.1020], device='cuda:2'), in_proj_covar=tensor([0.0077, 0.0082, 0.0083, 0.0076, 0.0072, 0.0070, 0.0079, 0.0064], device='cuda:2'), out_proj_covar=tensor([4.0502e-05, 5.1091e-05, 5.0189e-05, 4.3563e-05, 3.8373e-05, 4.3566e-05, 4.9241e-05, 4.1152e-05], device='cuda:2') 2023-03-07 23:18:00,628 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=29358.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:18:47,772 INFO [train2.py:809] (2/4) Epoch 8, batch 1500, loss[ctc_loss=0.09064, att_loss=0.2557, loss=0.2227, over 16335.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005791, over 45.00 utterances.], tot_loss[ctc_loss=0.1234, att_loss=0.2607, loss=0.2333, over 3286829.36 frames. utt_duration=1300 frames, utt_pad_proportion=0.03772, over 10126.89 utterances.], batch size: 45, lr: 1.37e-02, grad_scale: 8.0 2023-03-07 23:19:30,629 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.613e+02 2.592e+02 3.615e+02 4.521e+02 1.886e+03, threshold=7.229e+02, percent-clipped=4.0 2023-03-07 23:20:07,838 INFO [train2.py:809] (2/4) Epoch 8, batch 1550, loss[ctc_loss=0.2009, att_loss=0.2958, loss=0.2768, over 16955.00 frames. utt_duration=686.6 frames, utt_pad_proportion=0.1364, over 99.00 utterances.], tot_loss[ctc_loss=0.1245, att_loss=0.2612, loss=0.2339, over 3285855.66 frames. utt_duration=1279 frames, utt_pad_proportion=0.04358, over 10288.39 utterances.], batch size: 99, lr: 1.37e-02, grad_scale: 8.0 2023-03-07 23:20:31,582 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7952, 5.0567, 5.0184, 4.9712, 5.2072, 5.1530, 4.9151, 4.5817], device='cuda:2'), covar=tensor([0.0911, 0.0545, 0.0247, 0.0515, 0.0253, 0.0270, 0.0272, 0.0327], device='cuda:2'), in_proj_covar=tensor([0.0427, 0.0264, 0.0204, 0.0244, 0.0304, 0.0329, 0.0252, 0.0283], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-07 23:20:51,828 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6157, 4.4828, 4.5605, 4.5862, 5.1388, 4.9510, 4.3461, 2.0113], device='cuda:2'), covar=tensor([0.0248, 0.0471, 0.0240, 0.0211, 0.0943, 0.0195, 0.0396, 0.2837], device='cuda:2'), in_proj_covar=tensor([0.0132, 0.0125, 0.0131, 0.0128, 0.0313, 0.0124, 0.0118, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 23:21:28,331 INFO [train2.py:809] (2/4) Epoch 8, batch 1600, loss[ctc_loss=0.1871, att_loss=0.3016, loss=0.2787, over 13982.00 frames. utt_duration=384.4 frames, utt_pad_proportion=0.328, over 146.00 utterances.], tot_loss[ctc_loss=0.1241, att_loss=0.2611, loss=0.2337, over 3284598.34 frames. 
utt_duration=1279 frames, utt_pad_proportion=0.04379, over 10280.42 utterances.], batch size: 146, lr: 1.37e-02, grad_scale: 8.0 2023-03-07 23:22:11,635 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.666e+02 2.839e+02 3.566e+02 4.248e+02 8.470e+02, threshold=7.133e+02, percent-clipped=2.0 2023-03-07 23:22:16,304 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29517.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:22:35,620 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1000, 4.9091, 4.5650, 4.5157, 2.7133, 4.6933, 2.9515, 1.7416], device='cuda:2'), covar=tensor([0.0360, 0.0128, 0.0699, 0.0206, 0.1966, 0.0128, 0.1476, 0.1957], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0100, 0.0254, 0.0114, 0.0224, 0.0104, 0.0229, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-07 23:22:48,447 INFO [train2.py:809] (2/4) Epoch 8, batch 1650, loss[ctc_loss=0.151, att_loss=0.2808, loss=0.2548, over 17182.00 frames. utt_duration=695.9 frames, utt_pad_proportion=0.1214, over 99.00 utterances.], tot_loss[ctc_loss=0.1255, att_loss=0.2616, loss=0.2343, over 3276060.83 frames. utt_duration=1228 frames, utt_pad_proportion=0.05744, over 10680.20 utterances.], batch size: 99, lr: 1.37e-02, grad_scale: 8.0 2023-03-07 23:23:10,628 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29551.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:23:33,456 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29565.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:23:53,678 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0302, 1.5178, 1.7138, 1.9880, 2.9993, 2.2139, 1.6020, 2.5897], device='cuda:2'), covar=tensor([0.0595, 0.4978, 0.4092, 0.1988, 0.0927, 0.2023, 0.3727, 0.1315], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0082, 0.0086, 0.0076, 0.0073, 0.0070, 0.0081, 0.0064], device='cuda:2'), out_proj_covar=tensor([4.0419e-05, 5.1673e-05, 5.1416e-05, 4.3862e-05, 3.9240e-05, 4.3845e-05, 5.0345e-05, 4.1509e-05], device='cuda:2') 2023-03-07 23:24:08,553 INFO [train2.py:809] (2/4) Epoch 8, batch 1700, loss[ctc_loss=0.1105, att_loss=0.2522, loss=0.2238, over 16547.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.005807, over 45.00 utterances.], tot_loss[ctc_loss=0.1243, att_loss=0.2614, loss=0.234, over 3284041.23 frames. utt_duration=1244 frames, utt_pad_proportion=0.0515, over 10576.42 utterances.], batch size: 45, lr: 1.36e-02, grad_scale: 8.0 2023-03-07 23:24:16,298 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=29592.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:24:47,207 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29612.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:24:52,833 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+02 2.662e+02 3.188e+02 4.272e+02 1.218e+03, threshold=6.376e+02, percent-clipped=1.0 2023-03-07 23:25:28,584 INFO [train2.py:809] (2/4) Epoch 8, batch 1750, loss[ctc_loss=0.09606, att_loss=0.2227, loss=0.1973, over 11783.00 frames. utt_duration=1814 frames, utt_pad_proportion=0.02513, over 26.00 utterances.], tot_loss[ctc_loss=0.1248, att_loss=0.2612, loss=0.234, over 3277120.42 frames. 
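The constant grad_scale value reported with every loss record is consistent with mixed-precision training, where the loss is scaled before backward and the scale is adjusted dynamically. The sketch below is generic torch.cuda.amp usage rather than this recipe's exact wiring; model, batch, and optimizer are placeholders, and the logged grad_scale is assumed to correspond to the scaler's current scale.

import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()  # scaler.get_scale() is the kind of value logged as grad_scale

def train_step(model, batch, optimizer):
    optimizer.zero_grad()
    with autocast():
        loss = model(batch)          # forward pass in reduced precision where safe
    scaler.scale(loss).backward()    # backward on the scaled loss
    scaler.step(optimizer)           # unscales gradients, skips the step on inf/nan
    scaler.update()                  # grows or shrinks the scale dynamically
    return loss.detach()
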
utt_duration=1240 frames, utt_pad_proportion=0.05322, over 10584.21 utterances.], batch size: 26, lr: 1.36e-02, grad_scale: 8.0 2023-03-07 23:26:01,610 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29658.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:26:07,779 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29662.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:26:48,815 INFO [train2.py:809] (2/4) Epoch 8, batch 1800, loss[ctc_loss=0.139, att_loss=0.2755, loss=0.2482, over 17018.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007875, over 51.00 utterances.], tot_loss[ctc_loss=0.1253, att_loss=0.2619, loss=0.2345, over 3276550.69 frames. utt_duration=1228 frames, utt_pad_proportion=0.05823, over 10689.43 utterances.], batch size: 51, lr: 1.36e-02, grad_scale: 8.0 2023-03-07 23:27:19,037 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29706.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:27:33,868 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+02 3.006e+02 3.671e+02 4.359e+02 7.069e+02, threshold=7.341e+02, percent-clipped=3.0 2023-03-07 23:27:47,120 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29723.0, num_to_drop=1, layers_to_drop={0} 2023-03-07 23:28:01,597 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9737, 2.0532, 1.4790, 1.8978, 2.6170, 2.2451, 1.4218, 2.6658], device='cuda:2'), covar=tensor([0.0767, 0.3124, 0.4712, 0.2158, 0.0962, 0.1628, 0.3205, 0.1245], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0082, 0.0087, 0.0076, 0.0073, 0.0071, 0.0082, 0.0064], device='cuda:2'), out_proj_covar=tensor([4.0930e-05, 5.1527e-05, 5.2293e-05, 4.4245e-05, 3.9484e-05, 4.4478e-05, 5.1056e-05, 4.1697e-05], device='cuda:2') 2023-03-07 23:28:08,904 INFO [train2.py:809] (2/4) Epoch 8, batch 1850, loss[ctc_loss=0.1434, att_loss=0.2862, loss=0.2576, over 17055.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.008795, over 53.00 utterances.], tot_loss[ctc_loss=0.1246, att_loss=0.2613, loss=0.234, over 3275992.16 frames. utt_duration=1239 frames, utt_pad_proportion=0.05548, over 10590.50 utterances.], batch size: 53, lr: 1.36e-02, grad_scale: 8.0 2023-03-07 23:29:29,190 INFO [train2.py:809] (2/4) Epoch 8, batch 1900, loss[ctc_loss=0.112, att_loss=0.2534, loss=0.2252, over 15945.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.006875, over 41.00 utterances.], tot_loss[ctc_loss=0.1248, att_loss=0.2622, loss=0.2347, over 3280914.63 frames. 
utt_duration=1226 frames, utt_pad_proportion=0.05803, over 10718.89 utterances.], batch size: 41, lr: 1.36e-02, grad_scale: 8.0 2023-03-07 23:29:53,385 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29802.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:30:14,331 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+02 2.848e+02 3.489e+02 4.183e+02 9.529e+02, threshold=6.977e+02, percent-clipped=3.0 2023-03-07 23:30:23,400 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6835, 4.6221, 4.6975, 4.7009, 5.0162, 5.0064, 4.4852, 2.1931], device='cuda:2'), covar=tensor([0.0243, 0.0395, 0.0259, 0.0202, 0.1404, 0.0176, 0.0314, 0.2732], device='cuda:2'), in_proj_covar=tensor([0.0128, 0.0123, 0.0128, 0.0127, 0.0311, 0.0122, 0.0116, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 23:30:49,810 INFO [train2.py:809] (2/4) Epoch 8, batch 1950, loss[ctc_loss=0.1461, att_loss=0.274, loss=0.2484, over 16401.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.006986, over 44.00 utterances.], tot_loss[ctc_loss=0.1257, att_loss=0.2632, loss=0.2357, over 3285330.49 frames. utt_duration=1214 frames, utt_pad_proportion=0.06021, over 10833.90 utterances.], batch size: 44, lr: 1.36e-02, grad_scale: 8.0 2023-03-07 23:31:32,175 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29863.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:32:10,086 INFO [train2.py:809] (2/4) Epoch 8, batch 2000, loss[ctc_loss=0.1603, att_loss=0.294, loss=0.2673, over 17295.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01242, over 55.00 utterances.], tot_loss[ctc_loss=0.1259, att_loss=0.2632, loss=0.2358, over 3281479.67 frames. utt_duration=1199 frames, utt_pad_proportion=0.06435, over 10960.81 utterances.], batch size: 55, lr: 1.36e-02, grad_scale: 8.0 2023-03-07 23:32:18,041 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29892.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:32:41,935 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=29907.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:32:54,605 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+02 2.836e+02 3.233e+02 3.942e+02 8.276e+02, threshold=6.465e+02, percent-clipped=1.0 2023-03-07 23:33:30,286 INFO [train2.py:809] (2/4) Epoch 8, batch 2050, loss[ctc_loss=0.09699, att_loss=0.2505, loss=0.2198, over 16124.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005852, over 42.00 utterances.], tot_loss[ctc_loss=0.1244, att_loss=0.262, loss=0.2345, over 3282580.65 frames. 
utt_duration=1229 frames, utt_pad_proportion=0.05659, over 10698.01 utterances.], batch size: 42, lr: 1.36e-02, grad_scale: 8.0 2023-03-07 23:33:34,861 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29940.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:34:03,369 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1970, 4.6911, 4.5905, 4.7141, 4.6370, 4.4531, 3.5330, 4.5794], device='cuda:2'), covar=tensor([0.0115, 0.0112, 0.0097, 0.0095, 0.0108, 0.0103, 0.0499, 0.0223], device='cuda:2'), in_proj_covar=tensor([0.0065, 0.0061, 0.0071, 0.0046, 0.0049, 0.0058, 0.0083, 0.0082], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 23:34:16,638 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0952, 4.9529, 4.7472, 4.6775, 2.4206, 5.0133, 2.6423, 1.7742], device='cuda:2'), covar=tensor([0.0304, 0.0121, 0.0618, 0.0219, 0.2358, 0.0118, 0.1783, 0.1975], device='cuda:2'), in_proj_covar=tensor([0.0123, 0.0098, 0.0250, 0.0110, 0.0220, 0.0099, 0.0226, 0.0198], device='cuda:2'), out_proj_covar=tensor([1.2230e-04, 1.0188e-04, 2.2800e-04, 1.0500e-04, 2.0845e-04, 9.7805e-05, 2.0514e-04, 1.8132e-04], device='cuda:2') 2023-03-07 23:34:51,162 INFO [train2.py:809] (2/4) Epoch 8, batch 2100, loss[ctc_loss=0.1179, att_loss=0.2615, loss=0.2328, over 16328.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006217, over 45.00 utterances.], tot_loss[ctc_loss=0.1229, att_loss=0.261, loss=0.2334, over 3273118.60 frames. utt_duration=1240 frames, utt_pad_proportion=0.05673, over 10573.99 utterances.], batch size: 45, lr: 1.36e-02, grad_scale: 8.0 2023-03-07 23:35:16,322 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2830, 5.3100, 5.1238, 2.7776, 2.2593, 3.0497, 4.5015, 3.9336], device='cuda:2'), covar=tensor([0.0553, 0.0193, 0.0216, 0.3808, 0.5327, 0.2377, 0.0610, 0.1804], device='cuda:2'), in_proj_covar=tensor([0.0320, 0.0201, 0.0228, 0.0189, 0.0357, 0.0341, 0.0217, 0.0347], device='cuda:2'), out_proj_covar=tensor([1.5272e-04, 7.9979e-05, 1.0178e-04, 8.8282e-05, 1.6224e-04, 1.4480e-04, 8.7309e-05, 1.5554e-04], device='cuda:2') 2023-03-07 23:35:43,066 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.671e+02 3.420e+02 4.365e+02 1.034e+03, threshold=6.841e+02, percent-clipped=5.0 2023-03-07 23:35:47,053 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30018.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 23:36:16,101 INFO [train2.py:809] (2/4) Epoch 8, batch 2150, loss[ctc_loss=0.1223, att_loss=0.2527, loss=0.2266, over 15883.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.008282, over 39.00 utterances.], tot_loss[ctc_loss=0.1233, att_loss=0.2611, loss=0.2335, over 3270418.79 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.058, over 10609.35 utterances.], batch size: 39, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:36:24,556 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8126, 2.4333, 3.7844, 3.4890, 2.8656, 3.8279, 3.8118, 3.8361], device='cuda:2'), covar=tensor([0.0197, 0.1365, 0.0127, 0.0864, 0.1672, 0.0263, 0.0121, 0.0243], device='cuda:2'), in_proj_covar=tensor([0.0151, 0.0245, 0.0126, 0.0305, 0.0287, 0.0185, 0.0110, 0.0147], device='cuda:2'), out_proj_covar=tensor([1.3936e-04, 2.0410e-04, 1.1222e-04, 2.5183e-04, 2.5282e-04, 1.6419e-04, 9.9554e-05, 1.3627e-04], device='cuda:2') 2023-03-07 23:37:36,073 INFO [train2.py:809] (2/4) Epoch 8, batch 2200, loss[ctc_loss=0.122, att_loss=0.2447, loss=0.2201, over 16119.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006895, over 42.00 utterances.], tot_loss[ctc_loss=0.1235, att_loss=0.2611, loss=0.2336, over 3272792.36 frames. utt_duration=1241 frames, utt_pad_proportion=0.055, over 10563.40 utterances.], batch size: 42, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:38:22,431 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.675e+02 3.232e+02 4.079e+02 9.910e+02, threshold=6.464e+02, percent-clipped=1.0 2023-03-07 23:38:37,653 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30125.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:38:55,634 INFO [train2.py:809] (2/4) Epoch 8, batch 2250, loss[ctc_loss=0.1112, att_loss=0.2472, loss=0.22, over 16113.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.005871, over 42.00 utterances.], tot_loss[ctc_loss=0.1236, att_loss=0.2614, loss=0.2338, over 3275897.35 frames. utt_duration=1238 frames, utt_pad_proportion=0.05479, over 10594.11 utterances.], batch size: 42, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:39:28,578 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30158.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:39:50,166 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1370, 5.0925, 5.0287, 2.5332, 1.9138, 2.4940, 3.4826, 3.7145], device='cuda:2'), covar=tensor([0.0597, 0.0140, 0.0195, 0.3141, 0.6229, 0.2912, 0.1175, 0.1841], device='cuda:2'), in_proj_covar=tensor([0.0310, 0.0193, 0.0219, 0.0183, 0.0344, 0.0329, 0.0209, 0.0334], device='cuda:2'), out_proj_covar=tensor([1.4821e-04, 7.6182e-05, 9.7187e-05, 8.4840e-05, 1.5632e-04, 1.3980e-04, 8.4448e-05, 1.4968e-04], device='cuda:2') 2023-03-07 23:40:13,307 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30186.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:40:14,502 INFO [train2.py:809] (2/4) Epoch 8, batch 2300, loss[ctc_loss=0.1184, att_loss=0.2569, loss=0.2292, over 16494.00 frames. utt_duration=1436 frames, utt_pad_proportion=0.004911, over 46.00 utterances.], tot_loss[ctc_loss=0.1224, att_loss=0.2603, loss=0.2327, over 3268290.25 frames. 
utt_duration=1267 frames, utt_pad_proportion=0.05028, over 10327.20 utterances.], batch size: 46, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:40:31,783 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8521, 3.7414, 2.9003, 3.5619, 3.8520, 3.5230, 2.8045, 4.2793], device='cuda:2'), covar=tensor([0.0973, 0.0479, 0.1224, 0.0569, 0.0613, 0.0622, 0.0914, 0.0416], device='cuda:2'), in_proj_covar=tensor([0.0175, 0.0164, 0.0195, 0.0165, 0.0208, 0.0194, 0.0169, 0.0219], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-07 23:40:32,686 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-03-07 23:40:46,680 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30207.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:40:50,456 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30209.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:41:00,845 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.528e+02 2.649e+02 3.337e+02 4.141e+02 1.009e+03, threshold=6.675e+02, percent-clipped=1.0 2023-03-07 23:41:34,433 INFO [train2.py:809] (2/4) Epoch 8, batch 2350, loss[ctc_loss=0.08781, att_loss=0.2309, loss=0.2022, over 15873.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009861, over 39.00 utterances.], tot_loss[ctc_loss=0.1235, att_loss=0.2608, loss=0.2334, over 3266669.77 frames. utt_duration=1244 frames, utt_pad_proportion=0.05755, over 10514.76 utterances.], batch size: 39, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:41:43,892 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9707, 4.9807, 4.9008, 4.9281, 5.3781, 5.2599, 4.6619, 2.5265], device='cuda:2'), covar=tensor([0.0177, 0.0206, 0.0167, 0.0199, 0.0831, 0.0129, 0.0231, 0.2236], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0120, 0.0125, 0.0125, 0.0303, 0.0121, 0.0113, 0.0224], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 23:42:02,899 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30255.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:42:28,180 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30270.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:42:54,201 INFO [train2.py:809] (2/4) Epoch 8, batch 2400, loss[ctc_loss=0.09958, att_loss=0.2387, loss=0.2109, over 16398.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007843, over 44.00 utterances.], tot_loss[ctc_loss=0.1231, att_loss=0.2607, loss=0.2332, over 3262435.86 frames. utt_duration=1235 frames, utt_pad_proportion=0.06111, over 10580.23 utterances.], batch size: 44, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:43:40,600 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.833e+02 3.441e+02 4.475e+02 1.318e+03, threshold=6.883e+02, percent-clipped=7.0 2023-03-07 23:43:44,653 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30318.0, num_to_drop=1, layers_to_drop={2} 2023-03-07 23:44:13,412 INFO [train2.py:809] (2/4) Epoch 8, batch 2450, loss[ctc_loss=0.1624, att_loss=0.2772, loss=0.2542, over 17005.00 frames. utt_duration=688.5 frames, utt_pad_proportion=0.1361, over 99.00 utterances.], tot_loss[ctc_loss=0.1225, att_loss=0.2605, loss=0.2329, over 3267563.84 frames. 
utt_duration=1244 frames, utt_pad_proportion=0.0577, over 10516.76 utterances.], batch size: 99, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:44:27,689 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4263, 4.9327, 4.8199, 4.9722, 4.9426, 4.7471, 3.3877, 4.7290], device='cuda:2'), covar=tensor([0.0099, 0.0097, 0.0073, 0.0072, 0.0069, 0.0075, 0.0582, 0.0183], device='cuda:2'), in_proj_covar=tensor([0.0067, 0.0063, 0.0073, 0.0047, 0.0050, 0.0059, 0.0085, 0.0084], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 23:44:44,833 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2023-03-07 23:44:50,960 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30360.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:45:00,782 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30366.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:45:34,121 INFO [train2.py:809] (2/4) Epoch 8, batch 2500, loss[ctc_loss=0.1012, att_loss=0.2372, loss=0.21, over 15618.00 frames. utt_duration=1690 frames, utt_pad_proportion=0.009363, over 37.00 utterances.], tot_loss[ctc_loss=0.1231, att_loss=0.2607, loss=0.2332, over 3265092.91 frames. utt_duration=1216 frames, utt_pad_proportion=0.06596, over 10752.87 utterances.], batch size: 37, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:45:59,278 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-03-07 23:46:05,487 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6385, 4.8400, 4.2717, 4.7577, 4.5065, 4.2407, 4.3969, 4.2513], device='cuda:2'), covar=tensor([0.1163, 0.1170, 0.0997, 0.0864, 0.1073, 0.1442, 0.2457, 0.2341], device='cuda:2'), in_proj_covar=tensor([0.0393, 0.0455, 0.0348, 0.0358, 0.0329, 0.0398, 0.0478, 0.0426], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 23:46:11,874 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3647, 4.8069, 4.7138, 4.7411, 4.8426, 4.6000, 3.3894, 4.6296], device='cuda:2'), covar=tensor([0.0120, 0.0115, 0.0120, 0.0112, 0.0100, 0.0102, 0.0696, 0.0257], device='cuda:2'), in_proj_covar=tensor([0.0067, 0.0063, 0.0073, 0.0048, 0.0050, 0.0059, 0.0086, 0.0084], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 23:46:21,434 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.650e+02 3.302e+02 4.215e+02 1.043e+03, threshold=6.603e+02, percent-clipped=3.0 2023-03-07 23:46:30,141 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30421.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:46:54,581 INFO [train2.py:809] (2/4) Epoch 8, batch 2550, loss[ctc_loss=0.09527, att_loss=0.2432, loss=0.2136, over 16537.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.005781, over 45.00 utterances.], tot_loss[ctc_loss=0.1231, att_loss=0.2608, loss=0.2333, over 3273704.15 frames. 
utt_duration=1223 frames, utt_pad_proportion=0.06137, over 10723.16 utterances.], batch size: 45, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:47:28,445 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30458.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:48:04,585 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30481.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:48:13,631 INFO [train2.py:809] (2/4) Epoch 8, batch 2600, loss[ctc_loss=0.1198, att_loss=0.2574, loss=0.2299, over 15964.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.006436, over 41.00 utterances.], tot_loss[ctc_loss=0.1228, att_loss=0.2607, loss=0.2331, over 3275152.40 frames. utt_duration=1225 frames, utt_pad_proportion=0.06168, over 10707.68 utterances.], batch size: 41, lr: 1.35e-02, grad_scale: 8.0 2023-03-07 23:48:42,716 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4796, 4.9562, 4.7841, 4.8239, 5.0536, 4.7788, 3.6666, 4.9085], device='cuda:2'), covar=tensor([0.0114, 0.0106, 0.0091, 0.0084, 0.0083, 0.0084, 0.0556, 0.0222], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0061, 0.0072, 0.0047, 0.0049, 0.0058, 0.0084, 0.0082], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-07 23:48:44,082 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30506.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:49:00,819 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 2.736e+02 3.358e+02 3.851e+02 8.673e+02, threshold=6.715e+02, percent-clipped=1.0 2023-03-07 23:49:07,845 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1624, 5.2375, 5.6157, 5.5854, 5.4445, 6.0661, 5.2457, 6.1790], device='cuda:2'), covar=tensor([0.0611, 0.0639, 0.0678, 0.0940, 0.1856, 0.0814, 0.0460, 0.0500], device='cuda:2'), in_proj_covar=tensor([0.0628, 0.0375, 0.0433, 0.0496, 0.0674, 0.0441, 0.0352, 0.0429], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-07 23:49:33,591 INFO [train2.py:809] (2/4) Epoch 8, batch 2650, loss[ctc_loss=0.1233, att_loss=0.2623, loss=0.2345, over 16766.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006747, over 48.00 utterances.], tot_loss[ctc_loss=0.1232, att_loss=0.2608, loss=0.2332, over 3281916.86 frames. utt_duration=1239 frames, utt_pad_proportion=0.05672, over 10610.53 utterances.], batch size: 48, lr: 1.34e-02, grad_scale: 8.0 2023-03-07 23:49:59,152 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8858, 4.7321, 4.5084, 4.9260, 5.2500, 5.0420, 4.6762, 2.2909], device='cuda:2'), covar=tensor([0.0202, 0.0318, 0.0289, 0.0188, 0.0866, 0.0178, 0.0286, 0.2448], device='cuda:2'), in_proj_covar=tensor([0.0130, 0.0121, 0.0126, 0.0126, 0.0307, 0.0122, 0.0113, 0.0224], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-07 23:50:18,955 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30565.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:50:52,910 INFO [train2.py:809] (2/4) Epoch 8, batch 2700, loss[ctc_loss=0.123, att_loss=0.2376, loss=0.2147, over 15395.00 frames. utt_duration=1761 frames, utt_pad_proportion=0.009626, over 35.00 utterances.], tot_loss[ctc_loss=0.1231, att_loss=0.2604, loss=0.2329, over 3277111.99 frames. 
utt_duration=1244 frames, utt_pad_proportion=0.05371, over 10546.57 utterances.], batch size: 35, lr: 1.34e-02, grad_scale: 8.0 2023-03-07 23:51:25,688 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9222, 5.0788, 5.4298, 5.4039, 5.2486, 5.8844, 5.0171, 5.9695], device='cuda:2'), covar=tensor([0.0533, 0.0640, 0.0593, 0.0847, 0.1791, 0.0655, 0.0551, 0.0492], device='cuda:2'), in_proj_covar=tensor([0.0616, 0.0373, 0.0424, 0.0483, 0.0660, 0.0430, 0.0344, 0.0424], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-07 23:51:39,698 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.974e+02 3.669e+02 4.665e+02 1.299e+03, threshold=7.338e+02, percent-clipped=5.0 2023-03-07 23:52:12,707 INFO [train2.py:809] (2/4) Epoch 8, batch 2750, loss[ctc_loss=0.1486, att_loss=0.2812, loss=0.2547, over 16779.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.006037, over 48.00 utterances.], tot_loss[ctc_loss=0.1245, att_loss=0.2613, loss=0.234, over 3273376.72 frames. utt_duration=1231 frames, utt_pad_proportion=0.05779, over 10653.49 utterances.], batch size: 48, lr: 1.34e-02, grad_scale: 8.0 2023-03-07 23:52:42,328 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30655.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:52:59,240 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.84 vs. limit=2.0 2023-03-07 23:53:33,187 INFO [train2.py:809] (2/4) Epoch 8, batch 2800, loss[ctc_loss=0.1085, att_loss=0.2265, loss=0.2029, over 15510.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.00813, over 36.00 utterances.], tot_loss[ctc_loss=0.1237, att_loss=0.2608, loss=0.2334, over 3274663.43 frames. utt_duration=1224 frames, utt_pad_proportion=0.05879, over 10710.41 utterances.], batch size: 36, lr: 1.34e-02, grad_scale: 8.0 2023-03-07 23:53:56,535 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6295, 2.4565, 3.3768, 4.4894, 4.1554, 4.0449, 2.8178, 1.6696], device='cuda:2'), covar=tensor([0.0537, 0.2327, 0.1076, 0.0475, 0.0545, 0.0319, 0.1649, 0.2926], device='cuda:2'), in_proj_covar=tensor([0.0166, 0.0201, 0.0190, 0.0184, 0.0174, 0.0140, 0.0193, 0.0187], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 23:54:21,208 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+02 2.729e+02 3.286e+02 4.018e+02 8.527e+02, threshold=6.573e+02, percent-clipped=3.0 2023-03-07 23:54:21,499 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30716.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:54:21,696 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30716.0, num_to_drop=1, layers_to_drop={3} 2023-03-07 23:54:53,843 INFO [train2.py:809] (2/4) Epoch 8, batch 2850, loss[ctc_loss=0.1206, att_loss=0.2412, loss=0.2171, over 15359.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01191, over 35.00 utterances.], tot_loss[ctc_loss=0.1235, att_loss=0.2609, loss=0.2334, over 3274171.23 frames. 
utt_duration=1225 frames, utt_pad_proportion=0.06016, over 10701.34 utterances.], batch size: 35, lr: 1.34e-02, grad_scale: 8.0 2023-03-07 23:55:54,735 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8417, 6.0994, 5.3591, 5.9301, 5.7028, 5.4108, 5.5463, 5.4161], device='cuda:2'), covar=tensor([0.1062, 0.0826, 0.0870, 0.0754, 0.0779, 0.1310, 0.1842, 0.2124], device='cuda:2'), in_proj_covar=tensor([0.0398, 0.0451, 0.0350, 0.0361, 0.0334, 0.0397, 0.0475, 0.0424], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 23:56:04,122 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30781.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:56:08,495 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7468, 6.0537, 5.3102, 5.9090, 5.6703, 5.2917, 5.4389, 5.3360], device='cuda:2'), covar=tensor([0.1153, 0.0787, 0.0856, 0.0636, 0.0755, 0.1306, 0.2079, 0.1933], device='cuda:2'), in_proj_covar=tensor([0.0397, 0.0450, 0.0349, 0.0360, 0.0333, 0.0397, 0.0473, 0.0423], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-07 23:56:13,176 INFO [train2.py:809] (2/4) Epoch 8, batch 2900, loss[ctc_loss=0.08547, att_loss=0.2211, loss=0.194, over 15395.00 frames. utt_duration=1761 frames, utt_pad_proportion=0.009497, over 35.00 utterances.], tot_loss[ctc_loss=0.1233, att_loss=0.2609, loss=0.2333, over 3272619.64 frames. utt_duration=1213 frames, utt_pad_proportion=0.06375, over 10806.73 utterances.], batch size: 35, lr: 1.34e-02, grad_scale: 8.0 2023-03-07 23:56:50,342 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.61 vs. limit=2.0 2023-03-07 23:56:53,914 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2920, 2.6738, 3.0295, 4.3441, 4.0269, 4.0612, 2.7646, 1.8324], device='cuda:2'), covar=tensor([0.0682, 0.2053, 0.1195, 0.0469, 0.0547, 0.0316, 0.1537, 0.2700], device='cuda:2'), in_proj_covar=tensor([0.0165, 0.0201, 0.0189, 0.0180, 0.0174, 0.0141, 0.0193, 0.0187], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-07 23:56:59,560 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+02 2.826e+02 3.398e+02 4.214e+02 7.798e+02, threshold=6.796e+02, percent-clipped=3.0 2023-03-07 23:57:20,142 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30829.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:57:32,272 INFO [train2.py:809] (2/4) Epoch 8, batch 2950, loss[ctc_loss=0.1018, att_loss=0.2445, loss=0.2159, over 16280.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.00731, over 43.00 utterances.], tot_loss[ctc_loss=0.1233, att_loss=0.2611, loss=0.2335, over 3275906.59 frames. utt_duration=1223 frames, utt_pad_proportion=0.06119, over 10728.91 utterances.], batch size: 43, lr: 1.34e-02, grad_scale: 8.0 2023-03-07 23:58:17,850 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30865.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:58:52,465 INFO [train2.py:809] (2/4) Epoch 8, batch 3000, loss[ctc_loss=0.108, att_loss=0.2583, loss=0.2282, over 16951.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008356, over 50.00 utterances.], tot_loss[ctc_loss=0.124, att_loss=0.2618, loss=0.2343, over 3268051.97 frames. 
utt_duration=1189 frames, utt_pad_proportion=0.07161, over 11008.33 utterances.], batch size: 50, lr: 1.34e-02, grad_scale: 8.0 2023-03-07 23:58:52,465 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-07 23:59:06,434 INFO [train2.py:843] (2/4) Epoch 8, validation: ctc_loss=0.05812, att_loss=0.2422, loss=0.2054, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-07 23:59:06,434 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-07 23:59:32,923 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30903.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:59:48,143 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30913.0, num_to_drop=0, layers_to_drop=set() 2023-03-07 23:59:53,217 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.876e+02 3.540e+02 4.400e+02 7.249e+02, threshold=7.079e+02, percent-clipped=3.0 2023-03-08 00:00:07,481 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30925.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:00:21,858 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30934.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:00:26,211 INFO [train2.py:809] (2/4) Epoch 8, batch 3050, loss[ctc_loss=0.1176, att_loss=0.2775, loss=0.2455, over 17124.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01485, over 56.00 utterances.], tot_loss[ctc_loss=0.1227, att_loss=0.2609, loss=0.2332, over 3268853.92 frames. utt_duration=1208 frames, utt_pad_proportion=0.06713, over 10837.25 utterances.], batch size: 56, lr: 1.34e-02, grad_scale: 8.0 2023-03-08 00:01:10,382 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30964.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:01:44,896 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30986.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:01:46,066 INFO [train2.py:809] (2/4) Epoch 8, batch 3100, loss[ctc_loss=0.1036, att_loss=0.26, loss=0.2287, over 16979.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.006082, over 50.00 utterances.], tot_loss[ctc_loss=0.1218, att_loss=0.2604, loss=0.2327, over 3274700.68 frames. utt_duration=1237 frames, utt_pad_proportion=0.0581, over 10600.81 utterances.], batch size: 50, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:01:59,077 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30995.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:02:24,729 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31011.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 00:02:24,869 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31011.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 00:02:32,583 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+02 2.541e+02 3.093e+02 3.936e+02 7.180e+02, threshold=6.186e+02, percent-clipped=1.0 2023-03-08 00:02:32,935 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31016.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:03:05,881 INFO [train2.py:809] (2/4) Epoch 8, batch 3150, loss[ctc_loss=0.1947, att_loss=0.3058, loss=0.2835, over 13616.00 frames. 
utt_duration=374.6 frames, utt_pad_proportion=0.3486, over 146.00 utterances.], tot_loss[ctc_loss=0.1232, att_loss=0.2612, loss=0.2336, over 3267449.92 frames. utt_duration=1207 frames, utt_pad_proportion=0.06705, over 10839.15 utterances.], batch size: 146, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:03:49,186 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31064.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:04:01,722 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31072.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 00:04:25,397 INFO [train2.py:809] (2/4) Epoch 8, batch 3200, loss[ctc_loss=0.101, att_loss=0.2676, loss=0.2342, over 17061.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.008462, over 52.00 utterances.], tot_loss[ctc_loss=0.1223, att_loss=0.2607, loss=0.233, over 3273483.20 frames. utt_duration=1226 frames, utt_pad_proportion=0.06082, over 10689.87 utterances.], batch size: 52, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:04:59,378 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2023-03-08 00:05:11,436 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.783e+02 3.320e+02 3.859e+02 7.988e+02, threshold=6.640e+02, percent-clipped=5.0 2023-03-08 00:05:32,382 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9758, 6.2040, 5.6244, 6.0116, 5.8660, 5.4264, 5.6402, 5.5049], device='cuda:2'), covar=tensor([0.1072, 0.0662, 0.0754, 0.0625, 0.0580, 0.1360, 0.2020, 0.1980], device='cuda:2'), in_proj_covar=tensor([0.0395, 0.0446, 0.0345, 0.0356, 0.0328, 0.0396, 0.0470, 0.0422], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 00:05:44,556 INFO [train2.py:809] (2/4) Epoch 8, batch 3250, loss[ctc_loss=0.11, att_loss=0.2539, loss=0.2252, over 17028.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.00816, over 51.00 utterances.], tot_loss[ctc_loss=0.1209, att_loss=0.2594, loss=0.2317, over 3273480.51 frames. utt_duration=1264 frames, utt_pad_proportion=0.0521, over 10369.04 utterances.], batch size: 51, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:06:15,191 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9091, 3.9800, 3.2544, 3.7368, 3.9306, 3.6615, 2.6427, 4.4938], device='cuda:2'), covar=tensor([0.1146, 0.0466, 0.1253, 0.0648, 0.0739, 0.0758, 0.1158, 0.0575], device='cuda:2'), in_proj_covar=tensor([0.0175, 0.0166, 0.0194, 0.0165, 0.0211, 0.0199, 0.0173, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 00:06:44,220 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.96 vs. limit=2.0 2023-03-08 00:07:00,813 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8232, 6.1593, 5.5578, 5.9143, 5.7167, 5.4265, 5.6097, 5.5186], device='cuda:2'), covar=tensor([0.1193, 0.0779, 0.0740, 0.0690, 0.0701, 0.1152, 0.2187, 0.2203], device='cuda:2'), in_proj_covar=tensor([0.0402, 0.0457, 0.0353, 0.0365, 0.0336, 0.0403, 0.0480, 0.0433], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 00:07:03,810 INFO [train2.py:809] (2/4) Epoch 8, batch 3300, loss[ctc_loss=0.1341, att_loss=0.2765, loss=0.248, over 16749.00 frames. 
utt_duration=1398 frames, utt_pad_proportion=0.006722, over 48.00 utterances.], tot_loss[ctc_loss=0.12, att_loss=0.2589, loss=0.2311, over 3275249.62 frames. utt_duration=1268 frames, utt_pad_proportion=0.05045, over 10342.01 utterances.], batch size: 48, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:07:51,306 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+02 2.610e+02 3.147e+02 4.010e+02 7.285e+02, threshold=6.294e+02, percent-clipped=3.0 2023-03-08 00:08:24,613 INFO [train2.py:809] (2/4) Epoch 8, batch 3350, loss[ctc_loss=0.1149, att_loss=0.2677, loss=0.2371, over 16525.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.007082, over 45.00 utterances.], tot_loss[ctc_loss=0.1192, att_loss=0.2587, loss=0.2308, over 3271250.59 frames. utt_duration=1259 frames, utt_pad_proportion=0.05279, over 10408.75 utterances.], batch size: 45, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:08:40,166 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3191, 3.0684, 2.5942, 2.8025, 3.1533, 3.0149, 2.3275, 3.1635], device='cuda:2'), covar=tensor([0.1002, 0.0393, 0.0901, 0.0612, 0.0660, 0.0592, 0.0907, 0.0517], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0168, 0.0198, 0.0168, 0.0213, 0.0202, 0.0175, 0.0225], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 00:09:00,708 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31259.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:09:35,253 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31281.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:09:43,086 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6881, 4.4036, 4.5660, 4.5948, 4.9813, 4.6789, 4.4221, 1.9977], device='cuda:2'), covar=tensor([0.0204, 0.0317, 0.0225, 0.0131, 0.1131, 0.0175, 0.0284, 0.2545], device='cuda:2'), in_proj_covar=tensor([0.0132, 0.0125, 0.0127, 0.0129, 0.0315, 0.0124, 0.0114, 0.0225], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 00:09:44,168 INFO [train2.py:809] (2/4) Epoch 8, batch 3400, loss[ctc_loss=0.1211, att_loss=0.264, loss=0.2354, over 17096.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.01626, over 56.00 utterances.], tot_loss[ctc_loss=0.119, att_loss=0.2579, loss=0.2301, over 3266889.47 frames. 
utt_duration=1262 frames, utt_pad_proportion=0.05323, over 10366.86 utterances.], batch size: 56, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:09:49,557 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31290.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:10:21,693 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1797, 5.2133, 5.1137, 2.5680, 2.1152, 2.9897, 4.4669, 3.7626], device='cuda:2'), covar=tensor([0.0556, 0.0205, 0.0238, 0.3878, 0.5947, 0.2509, 0.0672, 0.2030], device='cuda:2'), in_proj_covar=tensor([0.0323, 0.0205, 0.0226, 0.0189, 0.0356, 0.0342, 0.0217, 0.0348], device='cuda:2'), out_proj_covar=tensor([1.5354e-04, 8.0871e-05, 9.9852e-05, 8.6765e-05, 1.6099e-04, 1.4490e-04, 8.6828e-05, 1.5364e-04], device='cuda:2') 2023-03-08 00:10:23,080 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31311.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:10:30,543 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+02 2.871e+02 3.632e+02 4.520e+02 1.169e+03, threshold=7.263e+02, percent-clipped=5.0 2023-03-08 00:11:04,308 INFO [train2.py:809] (2/4) Epoch 8, batch 3450, loss[ctc_loss=0.09283, att_loss=0.2271, loss=0.2003, over 15752.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009719, over 38.00 utterances.], tot_loss[ctc_loss=0.1206, att_loss=0.2586, loss=0.231, over 3272300.44 frames. utt_duration=1245 frames, utt_pad_proportion=0.0555, over 10529.85 utterances.], batch size: 38, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:11:40,440 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31359.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:11:52,647 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31367.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 00:12:25,039 INFO [train2.py:809] (2/4) Epoch 8, batch 3500, loss[ctc_loss=0.1043, att_loss=0.2364, loss=0.21, over 15650.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.007917, over 37.00 utterances.], tot_loss[ctc_loss=0.1216, att_loss=0.2599, loss=0.2322, over 3280264.91 frames. utt_duration=1228 frames, utt_pad_proportion=0.05738, over 10697.06 utterances.], batch size: 37, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:13:02,229 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31409.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:13:12,733 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.623e+02 3.276e+02 4.275e+02 9.794e+02, threshold=6.553e+02, percent-clipped=4.0 2023-03-08 00:13:47,412 INFO [train2.py:809] (2/4) Epoch 8, batch 3550, loss[ctc_loss=0.1198, att_loss=0.2655, loss=0.2363, over 16616.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005932, over 47.00 utterances.], tot_loss[ctc_loss=0.1225, att_loss=0.2609, loss=0.2333, over 3274300.03 frames. 
utt_duration=1207 frames, utt_pad_proportion=0.06592, over 10860.77 utterances.], batch size: 47, lr: 1.33e-02, grad_scale: 8.0 2023-03-08 00:13:57,880 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1351, 5.3984, 5.6532, 5.5478, 5.5568, 6.1196, 5.2721, 6.1149], device='cuda:2'), covar=tensor([0.0575, 0.0550, 0.0567, 0.0854, 0.1584, 0.0686, 0.0508, 0.0600], device='cuda:2'), in_proj_covar=tensor([0.0634, 0.0380, 0.0437, 0.0500, 0.0669, 0.0430, 0.0351, 0.0430], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002, 0.0003], device='cuda:2') 2023-03-08 00:13:58,424 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2023-03-08 00:14:40,195 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31470.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:14:58,558 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6263, 2.3324, 3.1019, 4.3346, 4.0837, 4.1110, 2.7750, 2.0806], device='cuda:2'), covar=tensor([0.0588, 0.2596, 0.1336, 0.0566, 0.0554, 0.0281, 0.1478, 0.2569], device='cuda:2'), in_proj_covar=tensor([0.0165, 0.0200, 0.0184, 0.0178, 0.0169, 0.0136, 0.0185, 0.0178], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 00:15:08,418 INFO [train2.py:809] (2/4) Epoch 8, batch 3600, loss[ctc_loss=0.1244, att_loss=0.2747, loss=0.2446, over 17163.00 frames. utt_duration=1227 frames, utt_pad_proportion=0.01248, over 56.00 utterances.], tot_loss[ctc_loss=0.1216, att_loss=0.2599, loss=0.2322, over 3270040.87 frames. utt_duration=1228 frames, utt_pad_proportion=0.06289, over 10668.91 utterances.], batch size: 56, lr: 1.32e-02, grad_scale: 8.0 2023-03-08 00:15:15,112 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31491.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 00:15:54,465 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.411e+02 2.609e+02 2.998e+02 4.010e+02 6.869e+02, threshold=5.995e+02, percent-clipped=2.0 2023-03-08 00:15:54,829 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2884, 5.1597, 5.0916, 2.9446, 4.9688, 4.6707, 4.4810, 2.8952], device='cuda:2'), covar=tensor([0.0126, 0.0081, 0.0187, 0.0970, 0.0096, 0.0126, 0.0251, 0.1200], device='cuda:2'), in_proj_covar=tensor([0.0056, 0.0075, 0.0063, 0.0100, 0.0065, 0.0085, 0.0087, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 00:16:29,588 INFO [train2.py:809] (2/4) Epoch 8, batch 3650, loss[ctc_loss=0.1043, att_loss=0.2656, loss=0.2333, over 16449.00 frames. utt_duration=1432 frames, utt_pad_proportion=0.007593, over 46.00 utterances.], tot_loss[ctc_loss=0.1215, att_loss=0.2602, loss=0.2324, over 3272648.18 frames. 
utt_duration=1216 frames, utt_pad_proportion=0.06536, over 10777.90 utterances.], batch size: 46, lr: 1.32e-02, grad_scale: 8.0 2023-03-08 00:16:47,159 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5070, 4.9141, 4.7672, 4.8664, 4.8915, 4.6106, 3.7030, 4.8332], device='cuda:2'), covar=tensor([0.0101, 0.0091, 0.0105, 0.0081, 0.0098, 0.0109, 0.0532, 0.0178], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0061, 0.0072, 0.0046, 0.0050, 0.0059, 0.0082, 0.0081], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 00:16:54,310 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31552.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 00:17:04,961 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31559.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:17:40,260 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31581.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:17:50,162 INFO [train2.py:809] (2/4) Epoch 8, batch 3700, loss[ctc_loss=0.1425, att_loss=0.2817, loss=0.2538, over 17295.00 frames. utt_duration=877.1 frames, utt_pad_proportion=0.08162, over 79.00 utterances.], tot_loss[ctc_loss=0.1218, att_loss=0.2603, loss=0.2326, over 3277986.39 frames. utt_duration=1214 frames, utt_pad_proportion=0.06374, over 10813.12 utterances.], batch size: 79, lr: 1.32e-02, grad_scale: 8.0 2023-03-08 00:17:55,138 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31590.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:18:22,264 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31607.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:18:35,888 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.776e+02 3.358e+02 4.277e+02 8.903e+02, threshold=6.717e+02, percent-clipped=4.0 2023-03-08 00:18:57,110 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31629.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:19:10,115 INFO [train2.py:809] (2/4) Epoch 8, batch 3750, loss[ctc_loss=0.1125, att_loss=0.268, loss=0.2369, over 16975.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.006969, over 50.00 utterances.], tot_loss[ctc_loss=0.1219, att_loss=0.2603, loss=0.2327, over 3268142.34 frames. utt_duration=1218 frames, utt_pad_proportion=0.06493, over 10744.53 utterances.], batch size: 50, lr: 1.32e-02, grad_scale: 8.0 2023-03-08 00:19:11,680 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31638.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:19:57,359 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31667.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 00:20:29,573 INFO [train2.py:809] (2/4) Epoch 8, batch 3800, loss[ctc_loss=0.09153, att_loss=0.2426, loss=0.2124, over 16691.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006245, over 46.00 utterances.], tot_loss[ctc_loss=0.122, att_loss=0.2604, loss=0.2327, over 3277273.70 frames. 
utt_duration=1223 frames, utt_pad_proportion=0.06069, over 10731.88 utterances.], batch size: 46, lr: 1.32e-02, grad_scale: 8.0 2023-03-08 00:20:56,818 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31704.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:21:13,677 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31715.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 00:21:14,931 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.603e+02 2.740e+02 3.298e+02 4.235e+02 1.019e+03, threshold=6.595e+02, percent-clipped=5.0 2023-03-08 00:21:49,313 INFO [train2.py:809] (2/4) Epoch 8, batch 3850, loss[ctc_loss=0.142, att_loss=0.2841, loss=0.2556, over 17319.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.01003, over 55.00 utterances.], tot_loss[ctc_loss=0.1217, att_loss=0.2609, loss=0.2331, over 3284337.37 frames. utt_duration=1235 frames, utt_pad_proportion=0.0558, over 10651.38 utterances.], batch size: 55, lr: 1.32e-02, grad_scale: 8.0 2023-03-08 00:22:32,493 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31765.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:22:32,698 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31765.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:23:05,946 INFO [train2.py:809] (2/4) Epoch 8, batch 3900, loss[ctc_loss=0.1174, att_loss=0.25, loss=0.2235, over 16389.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.00813, over 44.00 utterances.], tot_loss[ctc_loss=0.1214, att_loss=0.2603, loss=0.2325, over 3280414.41 frames. utt_duration=1241 frames, utt_pad_proportion=0.05566, over 10588.44 utterances.], batch size: 44, lr: 1.32e-02, grad_scale: 8.0 2023-03-08 00:23:07,817 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31788.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:23:24,919 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5343, 5.8323, 5.1852, 5.6412, 5.4049, 5.1057, 5.1753, 5.0257], device='cuda:2'), covar=tensor([0.1102, 0.0790, 0.0848, 0.0657, 0.0808, 0.1404, 0.1998, 0.2228], device='cuda:2'), in_proj_covar=tensor([0.0398, 0.0452, 0.0347, 0.0359, 0.0330, 0.0393, 0.0465, 0.0421], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 00:23:50,995 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+02 2.672e+02 3.345e+02 4.002e+02 8.198e+02, threshold=6.690e+02, percent-clipped=4.0 2023-03-08 00:24:01,934 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31823.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:24:22,888 INFO [train2.py:809] (2/4) Epoch 8, batch 3950, loss[ctc_loss=0.09056, att_loss=0.217, loss=0.1917, over 15478.00 frames. utt_duration=1721 frames, utt_pad_proportion=0.01024, over 36.00 utterances.], tot_loss[ctc_loss=0.1211, att_loss=0.2597, loss=0.232, over 3274610.90 frames. 
utt_duration=1238 frames, utt_pad_proportion=0.05677, over 10590.06 utterances.], batch size: 36, lr: 1.32e-02, grad_scale: 8.0 2023-03-08 00:24:38,345 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31847.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 00:24:41,473 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31849.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:25:07,969 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9190, 3.8291, 3.0979, 3.4797, 3.7418, 3.4328, 2.7385, 4.3647], device='cuda:2'), covar=tensor([0.1057, 0.0447, 0.1080, 0.0617, 0.0639, 0.0710, 0.0962, 0.0409], device='cuda:2'), in_proj_covar=tensor([0.0175, 0.0169, 0.0198, 0.0164, 0.0211, 0.0199, 0.0172, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 00:25:39,120 INFO [train2.py:809] (2/4) Epoch 9, batch 0, loss[ctc_loss=0.1262, att_loss=0.2753, loss=0.2455, over 16893.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006215, over 49.00 utterances.], tot_loss[ctc_loss=0.1262, att_loss=0.2753, loss=0.2455, over 16893.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006215, over 49.00 utterances.], batch size: 49, lr: 1.25e-02, grad_scale: 8.0 2023-03-08 00:25:39,120 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 00:25:51,858 INFO [train2.py:843] (2/4) Epoch 9, validation: ctc_loss=0.05701, att_loss=0.2418, loss=0.2048, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 00:25:51,859 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 00:26:05,419 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31878.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:26:15,010 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31884.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:26:43,330 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1375, 5.0026, 4.8639, 4.9626, 5.4275, 5.0913, 4.8358, 2.2922], device='cuda:2'), covar=tensor([0.0125, 0.0156, 0.0134, 0.0184, 0.1113, 0.0125, 0.0163, 0.2344], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0123, 0.0126, 0.0130, 0.0311, 0.0124, 0.0113, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 00:27:04,483 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31915.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:27:05,620 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.515e+02 2.740e+02 3.193e+02 4.001e+02 8.553e+02, threshold=6.386e+02, percent-clipped=3.0 2023-03-08 00:27:12,471 INFO [train2.py:809] (2/4) Epoch 9, batch 50, loss[ctc_loss=0.1394, att_loss=0.2805, loss=0.2523, over 16996.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.009964, over 51.00 utterances.], tot_loss[ctc_loss=0.1198, att_loss=0.2629, loss=0.2343, over 745429.57 frames. 
utt_duration=1175 frames, utt_pad_proportion=0.06665, over 2540.98 utterances.], batch size: 51, lr: 1.25e-02, grad_scale: 16.0 2023-03-08 00:27:43,178 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31939.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:28:00,024 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31950.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:28:08,450 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5395, 2.4869, 4.9463, 3.9049, 2.8796, 4.5338, 4.8655, 4.7091], device='cuda:2'), covar=tensor([0.0236, 0.1843, 0.0159, 0.1125, 0.2164, 0.0211, 0.0100, 0.0214], device='cuda:2'), in_proj_covar=tensor([0.0147, 0.0244, 0.0129, 0.0307, 0.0286, 0.0185, 0.0110, 0.0147], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0003, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 00:28:12,842 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4988, 2.4267, 4.8779, 3.8459, 2.8062, 4.3972, 4.6220, 4.6069], device='cuda:2'), covar=tensor([0.0209, 0.1786, 0.0117, 0.1034, 0.2198, 0.0234, 0.0117, 0.0239], device='cuda:2'), in_proj_covar=tensor([0.0147, 0.0243, 0.0129, 0.0306, 0.0285, 0.0185, 0.0110, 0.0146], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0003, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 00:28:31,527 INFO [train2.py:809] (2/4) Epoch 9, batch 100, loss[ctc_loss=0.1329, att_loss=0.2716, loss=0.2439, over 17398.00 frames. utt_duration=1010 frames, utt_pad_proportion=0.04718, over 69.00 utterances.], tot_loss[ctc_loss=0.1178, att_loss=0.2592, loss=0.2309, over 1307828.85 frames. utt_duration=1299 frames, utt_pad_proportion=0.03962, over 4030.72 utterances.], batch size: 69, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:28:41,655 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31976.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:29:37,912 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([1.8896, 3.2821, 3.2743, 2.6967, 3.1101, 3.3176, 3.1814, 1.9131], device='cuda:2'), covar=tensor([0.1774, 0.2179, 0.4544, 0.7772, 0.4853, 0.5167, 0.1530, 1.1675], device='cuda:2'), in_proj_covar=tensor([0.0086, 0.0098, 0.0103, 0.0166, 0.0089, 0.0152, 0.0086, 0.0154], device='cuda:2'), out_proj_covar=tensor([7.9464e-05, 8.1687e-05, 8.9835e-05, 1.3160e-04, 7.8582e-05, 1.2191e-04, 7.3431e-05, 1.2346e-04], device='cuda:2') 2023-03-08 00:29:41,557 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=32011.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:29:48,731 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 3.075e+02 3.496e+02 4.231e+02 7.573e+02, threshold=6.993e+02, percent-clipped=4.0 2023-03-08 00:29:54,998 INFO [train2.py:809] (2/4) Epoch 9, batch 150, loss[ctc_loss=0.09315, att_loss=0.2384, loss=0.2094, over 16003.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007388, over 40.00 utterances.], tot_loss[ctc_loss=0.1176, att_loss=0.2586, loss=0.2304, over 1744978.04 frames. utt_duration=1275 frames, utt_pad_proportion=0.044, over 5479.01 utterances.], batch size: 40, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:30:43,806 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.00 vs. 
limit=5.0 2023-03-08 00:30:59,451 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32060.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:31:07,695 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32065.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:31:15,199 INFO [train2.py:809] (2/4) Epoch 9, batch 200, loss[ctc_loss=0.0988, att_loss=0.253, loss=0.2221, over 16694.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005386, over 46.00 utterances.], tot_loss[ctc_loss=0.1182, att_loss=0.259, loss=0.2308, over 2082066.73 frames. utt_duration=1253 frames, utt_pad_proportion=0.05214, over 6656.99 utterances.], batch size: 46, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:32:23,973 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32113.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:32:28,422 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.737e+02 3.348e+02 3.979e+02 1.144e+03, threshold=6.695e+02, percent-clipped=2.0 2023-03-08 00:32:34,684 INFO [train2.py:809] (2/4) Epoch 9, batch 250, loss[ctc_loss=0.1803, att_loss=0.3052, loss=0.2802, over 17319.00 frames. utt_duration=1101 frames, utt_pad_proportion=0.03669, over 63.00 utterances.], tot_loss[ctc_loss=0.1182, att_loss=0.2589, loss=0.2308, over 2345837.44 frames. utt_duration=1276 frames, utt_pad_proportion=0.04735, over 7360.06 utterances.], batch size: 63, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:33:13,091 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32144.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:33:17,806 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32147.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 00:33:54,281 INFO [train2.py:809] (2/4) Epoch 9, batch 300, loss[ctc_loss=0.1383, att_loss=0.2821, loss=0.2534, over 17044.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.008686, over 52.00 utterances.], tot_loss[ctc_loss=0.1189, att_loss=0.2597, loss=0.2316, over 2551894.77 frames. utt_duration=1263 frames, utt_pad_proportion=0.05041, over 8093.96 utterances.], batch size: 52, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:33:55,835 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.97 vs. limit=2.0 2023-03-08 00:34:09,526 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32179.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:34:34,216 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32195.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 00:34:55,329 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4537, 4.9536, 4.7129, 5.0839, 5.0551, 4.6919, 3.3984, 4.8151], device='cuda:2'), covar=tensor([0.0111, 0.0108, 0.0087, 0.0069, 0.0066, 0.0091, 0.0650, 0.0211], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0063, 0.0073, 0.0047, 0.0049, 0.0060, 0.0084, 0.0082], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 00:35:08,279 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.478e+02 2.524e+02 3.099e+02 4.059e+02 8.225e+02, threshold=6.197e+02, percent-clipped=2.0 2023-03-08 00:35:14,582 INFO [train2.py:809] (2/4) Epoch 9, batch 350, loss[ctc_loss=0.122, att_loss=0.2458, loss=0.221, over 15996.00 frames. 
utt_duration=1601 frames, utt_pad_proportion=0.008132, over 40.00 utterances.], tot_loss[ctc_loss=0.1174, att_loss=0.2584, loss=0.2302, over 2711247.56 frames. utt_duration=1272 frames, utt_pad_proportion=0.04974, over 8535.92 utterances.], batch size: 40, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:35:37,881 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32234.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:35:42,780 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5871, 3.9013, 3.7582, 3.1067, 3.5759, 3.7908, 3.6668, 2.5316], device='cuda:2'), covar=tensor([0.1333, 0.0891, 0.2014, 0.6690, 0.1799, 0.4087, 0.0948, 1.0143], device='cuda:2'), in_proj_covar=tensor([0.0086, 0.0096, 0.0102, 0.0167, 0.0088, 0.0152, 0.0087, 0.0153], device='cuda:2'), out_proj_covar=tensor([7.9445e-05, 8.0112e-05, 8.9655e-05, 1.3258e-04, 7.8253e-05, 1.2210e-04, 7.3862e-05, 1.2287e-04], device='cuda:2') 2023-03-08 00:36:35,203 INFO [train2.py:809] (2/4) Epoch 9, batch 400, loss[ctc_loss=0.09505, att_loss=0.2358, loss=0.2076, over 15886.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.007429, over 39.00 utterances.], tot_loss[ctc_loss=0.117, att_loss=0.2576, loss=0.2295, over 2828187.87 frames. utt_duration=1274 frames, utt_pad_proportion=0.0493, over 8889.44 utterances.], batch size: 39, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:36:36,902 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32271.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:36:49,944 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3699, 5.7089, 5.0110, 5.5844, 5.3510, 5.0049, 5.0487, 4.9864], device='cuda:2'), covar=tensor([0.1321, 0.0926, 0.0855, 0.0725, 0.0757, 0.1384, 0.2187, 0.2052], device='cuda:2'), in_proj_covar=tensor([0.0390, 0.0446, 0.0341, 0.0353, 0.0327, 0.0386, 0.0461, 0.0418], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 00:37:13,885 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.31 vs. limit=5.0 2023-03-08 00:37:32,240 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32306.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:37:48,588 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+02 2.567e+02 3.123e+02 3.914e+02 5.946e+02, threshold=6.245e+02, percent-clipped=0.0 2023-03-08 00:37:54,626 INFO [train2.py:809] (2/4) Epoch 9, batch 450, loss[ctc_loss=0.1138, att_loss=0.2697, loss=0.2385, over 17314.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.01959, over 59.00 utterances.], tot_loss[ctc_loss=0.1168, att_loss=0.2574, loss=0.2293, over 2932095.04 frames. 
utt_duration=1271 frames, utt_pad_proportion=0.04778, over 9237.81 utterances.], batch size: 59, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:38:52,194 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5139, 4.5255, 4.4139, 4.9134, 2.5111, 4.5544, 2.9586, 2.6337], device='cuda:2'), covar=tensor([0.0217, 0.0184, 0.0845, 0.0157, 0.1999, 0.0214, 0.1461, 0.1471], device='cuda:2'), in_proj_covar=tensor([0.0124, 0.0103, 0.0252, 0.0107, 0.0218, 0.0102, 0.0228, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 00:38:59,121 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32360.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:39:14,595 INFO [train2.py:809] (2/4) Epoch 9, batch 500, loss[ctc_loss=0.09543, att_loss=0.2302, loss=0.2032, over 14504.00 frames. utt_duration=1815 frames, utt_pad_proportion=0.03686, over 32.00 utterances.], tot_loss[ctc_loss=0.117, att_loss=0.258, loss=0.2298, over 3007093.56 frames. utt_duration=1253 frames, utt_pad_proportion=0.05279, over 9608.07 utterances.], batch size: 32, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:40:15,779 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32408.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:40:28,735 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.692e+02 3.196e+02 3.958e+02 9.983e+02, threshold=6.392e+02, percent-clipped=1.0 2023-03-08 00:40:35,580 INFO [train2.py:809] (2/4) Epoch 9, batch 550, loss[ctc_loss=0.1317, att_loss=0.2441, loss=0.2216, over 15654.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.008165, over 37.00 utterances.], tot_loss[ctc_loss=0.1173, att_loss=0.2582, loss=0.23, over 3065988.66 frames. utt_duration=1233 frames, utt_pad_proportion=0.05776, over 9962.49 utterances.], batch size: 37, lr: 1.24e-02, grad_scale: 16.0 2023-03-08 00:41:13,831 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32444.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:41:26,783 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9241, 6.2071, 5.7074, 6.0895, 5.8890, 5.5637, 5.6547, 5.6198], device='cuda:2'), covar=tensor([0.1182, 0.0913, 0.0729, 0.0625, 0.0708, 0.1381, 0.2264, 0.1910], device='cuda:2'), in_proj_covar=tensor([0.0398, 0.0453, 0.0344, 0.0358, 0.0331, 0.0391, 0.0471, 0.0420], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 00:41:55,747 INFO [train2.py:809] (2/4) Epoch 9, batch 600, loss[ctc_loss=0.1113, att_loss=0.252, loss=0.2238, over 16312.00 frames. utt_duration=1519 frames, utt_pad_proportion=0.005209, over 43.00 utterances.], tot_loss[ctc_loss=0.1179, att_loss=0.2584, loss=0.2303, over 3097470.66 frames. 
utt_duration=1210 frames, utt_pad_proportion=0.06787, over 10255.37 utterances.], batch size: 43, lr: 1.23e-02, grad_scale: 16.0 2023-03-08 00:42:10,453 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32479.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:42:29,840 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32492.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:43:07,804 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.381e+02 2.723e+02 3.556e+02 4.400e+02 8.502e+02, threshold=7.112e+02, percent-clipped=9.0 2023-03-08 00:43:14,508 INFO [train2.py:809] (2/4) Epoch 9, batch 650, loss[ctc_loss=0.1551, att_loss=0.2862, loss=0.26, over 16999.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.008952, over 51.00 utterances.], tot_loss[ctc_loss=0.1179, att_loss=0.2582, loss=0.2302, over 3138815.83 frames. utt_duration=1216 frames, utt_pad_proportion=0.06473, over 10336.64 utterances.], batch size: 51, lr: 1.23e-02, grad_scale: 16.0 2023-03-08 00:43:25,743 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32527.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:43:36,645 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32534.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:44:13,298 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3458, 1.9654, 2.2566, 1.9379, 2.6054, 2.2046, 1.7721, 2.2727], device='cuda:2'), covar=tensor([0.0815, 0.3791, 0.3566, 0.1949, 0.0887, 0.1662, 0.3465, 0.1484], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0085, 0.0085, 0.0076, 0.0073, 0.0071, 0.0083, 0.0064], device='cuda:2'), out_proj_covar=tensor([4.3144e-05, 5.4712e-05, 5.3305e-05, 4.5994e-05, 4.1292e-05, 4.5949e-05, 5.2242e-05, 4.3182e-05], device='cuda:2') 2023-03-08 00:44:14,576 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0242, 6.2845, 5.7911, 6.0730, 6.0160, 5.6487, 5.8092, 5.6479], device='cuda:2'), covar=tensor([0.1042, 0.0725, 0.0627, 0.0619, 0.0598, 0.1069, 0.1923, 0.1928], device='cuda:2'), in_proj_covar=tensor([0.0398, 0.0449, 0.0345, 0.0360, 0.0333, 0.0393, 0.0475, 0.0422], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 00:44:33,114 INFO [train2.py:809] (2/4) Epoch 9, batch 700, loss[ctc_loss=0.1646, att_loss=0.284, loss=0.2601, over 16872.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007405, over 49.00 utterances.], tot_loss[ctc_loss=0.119, att_loss=0.2585, loss=0.2306, over 3159571.17 frames. 
utt_duration=1203 frames, utt_pad_proportion=0.0696, over 10515.74 utterances.], batch size: 49, lr: 1.23e-02, grad_scale: 16.0 2023-03-08 00:44:35,660 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32571.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:44:52,290 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32582.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:45:15,248 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1095, 4.6457, 4.3031, 4.4570, 2.4588, 4.3506, 2.5244, 1.6181], device='cuda:2'), covar=tensor([0.0280, 0.0099, 0.0716, 0.0147, 0.2059, 0.0143, 0.1621, 0.1788], device='cuda:2'), in_proj_covar=tensor([0.0125, 0.0103, 0.0249, 0.0105, 0.0219, 0.0101, 0.0226, 0.0195], device='cuda:2'), out_proj_covar=tensor([1.2673e-04, 1.0686e-04, 2.2902e-04, 1.0196e-04, 2.0840e-04, 9.9956e-05, 2.0726e-04, 1.8105e-04], device='cuda:2') 2023-03-08 00:45:31,403 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32606.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:45:47,001 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+02 2.817e+02 3.384e+02 3.932e+02 1.069e+03, threshold=6.768e+02, percent-clipped=2.0 2023-03-08 00:45:51,659 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32619.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:45:53,049 INFO [train2.py:809] (2/4) Epoch 9, batch 750, loss[ctc_loss=0.1245, att_loss=0.2548, loss=0.2287, over 15959.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006918, over 41.00 utterances.], tot_loss[ctc_loss=0.119, att_loss=0.2588, loss=0.2309, over 3192401.53 frames. utt_duration=1215 frames, utt_pad_proportion=0.06238, over 10519.72 utterances.], batch size: 41, lr: 1.23e-02, grad_scale: 16.0 2023-03-08 00:45:55,711 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6756, 2.2302, 5.1292, 4.0969, 3.1277, 4.6572, 4.9202, 4.8400], device='cuda:2'), covar=tensor([0.0247, 0.1822, 0.0152, 0.0933, 0.1857, 0.0182, 0.0097, 0.0195], device='cuda:2'), in_proj_covar=tensor([0.0143, 0.0239, 0.0125, 0.0297, 0.0277, 0.0179, 0.0108, 0.0143], device='cuda:2'), out_proj_covar=tensor([1.3335e-04, 2.0090e-04, 1.1206e-04, 2.4719e-04, 2.4553e-04, 1.6011e-04, 9.9329e-05, 1.3624e-04], device='cuda:2') 2023-03-08 00:46:47,265 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32654.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:47:12,888 INFO [train2.py:809] (2/4) Epoch 9, batch 800, loss[ctc_loss=0.0908, att_loss=0.2296, loss=0.2018, over 16006.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007234, over 40.00 utterances.], tot_loss[ctc_loss=0.1183, att_loss=0.2583, loss=0.2303, over 3216014.00 frames. utt_duration=1233 frames, utt_pad_proportion=0.05724, over 10444.51 utterances.], batch size: 40, lr: 1.23e-02, grad_scale: 16.0 2023-03-08 00:48:28,948 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.641e+02 2.761e+02 3.783e+02 4.545e+02 8.141e+02, threshold=7.567e+02, percent-clipped=8.0 2023-03-08 00:48:33,645 INFO [train2.py:809] (2/4) Epoch 9, batch 850, loss[ctc_loss=0.1596, att_loss=0.2827, loss=0.2581, over 16992.00 frames. utt_duration=688.1 frames, utt_pad_proportion=0.1367, over 99.00 utterances.], tot_loss[ctc_loss=0.1182, att_loss=0.2589, loss=0.2308, over 3235305.26 frames. 
utt_duration=1220 frames, utt_pad_proportion=0.05894, over 10617.76 utterances.], batch size: 99, lr: 1.23e-02, grad_scale: 8.0 2023-03-08 00:48:49,945 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9893, 4.6605, 4.2477, 4.6127, 2.2760, 4.3908, 2.4252, 1.6948], device='cuda:2'), covar=tensor([0.0323, 0.0116, 0.0776, 0.0133, 0.2243, 0.0171, 0.1879, 0.1913], device='cuda:2'), in_proj_covar=tensor([0.0128, 0.0104, 0.0254, 0.0107, 0.0221, 0.0103, 0.0231, 0.0199], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 00:49:10,124 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5528, 2.8099, 3.7890, 2.8748, 3.5358, 4.7211, 4.4959, 3.5747], device='cuda:2'), covar=tensor([0.0332, 0.1659, 0.0904, 0.1397, 0.1040, 0.0759, 0.0486, 0.1171], device='cuda:2'), in_proj_covar=tensor([0.0224, 0.0219, 0.0232, 0.0201, 0.0229, 0.0271, 0.0205, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 00:49:54,786 INFO [train2.py:809] (2/4) Epoch 9, batch 900, loss[ctc_loss=0.1365, att_loss=0.2567, loss=0.2327, over 15767.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.008531, over 38.00 utterances.], tot_loss[ctc_loss=0.1171, att_loss=0.2587, loss=0.2304, over 3255785.86 frames. utt_duration=1246 frames, utt_pad_proportion=0.05049, over 10467.05 utterances.], batch size: 38, lr: 1.23e-02, grad_scale: 8.0 2023-03-08 00:51:11,046 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+02 2.630e+02 3.149e+02 3.930e+02 7.487e+02, threshold=6.299e+02, percent-clipped=0.0 2023-03-08 00:51:15,679 INFO [train2.py:809] (2/4) Epoch 9, batch 950, loss[ctc_loss=0.1254, att_loss=0.2704, loss=0.2414, over 17018.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008094, over 51.00 utterances.], tot_loss[ctc_loss=0.1175, att_loss=0.2587, loss=0.2305, over 3251554.33 frames. utt_duration=1254 frames, utt_pad_proportion=0.05134, over 10384.57 utterances.], batch size: 51, lr: 1.23e-02, grad_scale: 8.0 2023-03-08 00:52:08,550 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-08 00:52:35,537 INFO [train2.py:809] (2/4) Epoch 9, batch 1000, loss[ctc_loss=0.1126, att_loss=0.2634, loss=0.2332, over 17352.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.01929, over 59.00 utterances.], tot_loss[ctc_loss=0.1168, att_loss=0.2578, loss=0.2296, over 3243365.82 frames. utt_duration=1267 frames, utt_pad_proportion=0.05155, over 10251.44 utterances.], batch size: 59, lr: 1.23e-02, grad_scale: 8.0 2023-03-08 00:53:50,525 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.538e+02 2.656e+02 3.223e+02 3.962e+02 9.648e+02, threshold=6.446e+02, percent-clipped=5.0 2023-03-08 00:53:55,333 INFO [train2.py:809] (2/4) Epoch 9, batch 1050, loss[ctc_loss=0.09344, att_loss=0.2359, loss=0.2074, over 15884.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009409, over 39.00 utterances.], tot_loss[ctc_loss=0.1162, att_loss=0.2571, loss=0.2289, over 3249283.84 frames. utt_duration=1278 frames, utt_pad_proportion=0.04925, over 10179.14 utterances.], batch size: 39, lr: 1.23e-02, grad_scale: 8.0 2023-03-08 00:54:14,881 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.54 vs. 
limit=2.0 2023-03-08 00:54:56,729 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6640, 6.0178, 5.4351, 5.7777, 5.6484, 5.2915, 5.3726, 5.3012], device='cuda:2'), covar=tensor([0.1456, 0.0757, 0.0778, 0.0747, 0.0762, 0.1240, 0.2126, 0.1980], device='cuda:2'), in_proj_covar=tensor([0.0401, 0.0447, 0.0351, 0.0365, 0.0334, 0.0401, 0.0478, 0.0426], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 00:55:10,119 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=32966.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:55:15,883 INFO [train2.py:809] (2/4) Epoch 9, batch 1100, loss[ctc_loss=0.1877, att_loss=0.3078, loss=0.2838, over 14061.00 frames. utt_duration=386.9 frames, utt_pad_proportion=0.326, over 146.00 utterances.], tot_loss[ctc_loss=0.1162, att_loss=0.2568, loss=0.2286, over 3247276.49 frames. utt_duration=1251 frames, utt_pad_proportion=0.05843, over 10394.66 utterances.], batch size: 146, lr: 1.23e-02, grad_scale: 8.0 2023-03-08 00:55:55,641 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4414, 3.6644, 3.0121, 3.4567, 3.9099, 3.5056, 2.6477, 4.1685], device='cuda:2'), covar=tensor([0.1264, 0.0478, 0.1112, 0.0523, 0.0539, 0.0640, 0.0925, 0.0476], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0168, 0.0196, 0.0167, 0.0213, 0.0202, 0.0171, 0.0225], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 00:56:31,655 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 3.040e+02 3.595e+02 4.615e+02 9.742e+02, threshold=7.190e+02, percent-clipped=7.0 2023-03-08 00:56:36,389 INFO [train2.py:809] (2/4) Epoch 9, batch 1150, loss[ctc_loss=0.08989, att_loss=0.2304, loss=0.2023, over 15496.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008519, over 36.00 utterances.], tot_loss[ctc_loss=0.1175, att_loss=0.2579, loss=0.2298, over 3255973.31 frames. utt_duration=1232 frames, utt_pad_proportion=0.06106, over 10587.53 utterances.], batch size: 36, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 00:56:47,523 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33027.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:57:54,933 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33069.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 00:57:56,111 INFO [train2.py:809] (2/4) Epoch 9, batch 1200, loss[ctc_loss=0.1136, att_loss=0.255, loss=0.2267, over 17074.00 frames. utt_duration=1290 frames, utt_pad_proportion=0.008252, over 53.00 utterances.], tot_loss[ctc_loss=0.1172, att_loss=0.2575, loss=0.2295, over 3259353.23 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.05886, over 10566.49 utterances.], batch size: 53, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 00:58:28,653 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1246, 4.7336, 4.4226, 4.6013, 2.4698, 4.5155, 2.9268, 1.7642], device='cuda:2'), covar=tensor([0.0264, 0.0109, 0.0803, 0.0205, 0.2205, 0.0206, 0.1698, 0.1976], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0102, 0.0253, 0.0109, 0.0220, 0.0105, 0.0228, 0.0198], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 00:59:12,584 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+02 2.465e+02 3.116e+02 3.703e+02 8.772e+02, threshold=6.232e+02, percent-clipped=2.0 2023-03-08 00:59:17,166 INFO [train2.py:809] (2/4) Epoch 9, batch 1250, loss[ctc_loss=0.104, att_loss=0.2566, loss=0.2261, over 16876.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007649, over 49.00 utterances.], tot_loss[ctc_loss=0.1171, att_loss=0.2583, loss=0.2301, over 3272406.06 frames. utt_duration=1232 frames, utt_pad_proportion=0.0562, over 10641.23 utterances.], batch size: 49, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 00:59:32,940 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33130.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:00:14,104 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33155.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:00:36,809 INFO [train2.py:809] (2/4) Epoch 9, batch 1300, loss[ctc_loss=0.1043, att_loss=0.233, loss=0.2072, over 15615.00 frames. utt_duration=1690 frames, utt_pad_proportion=0.01062, over 37.00 utterances.], tot_loss[ctc_loss=0.1175, att_loss=0.2586, loss=0.2304, over 3266696.97 frames. utt_duration=1219 frames, utt_pad_proportion=0.05921, over 10735.21 utterances.], batch size: 37, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 01:01:51,141 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33216.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:01:52,243 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+02 2.474e+02 3.066e+02 3.872e+02 7.426e+02, threshold=6.132e+02, percent-clipped=4.0 2023-03-08 01:01:56,795 INFO [train2.py:809] (2/4) Epoch 9, batch 1350, loss[ctc_loss=0.1346, att_loss=0.2717, loss=0.2443, over 16325.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006536, over 45.00 utterances.], tot_loss[ctc_loss=0.1174, att_loss=0.2583, loss=0.2302, over 3264236.88 frames. 
utt_duration=1236 frames, utt_pad_proportion=0.05767, over 10580.62 utterances.], batch size: 45, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 01:02:14,666 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8833, 5.2193, 4.6840, 5.3322, 4.6806, 4.9685, 5.3316, 5.1873], device='cuda:2'), covar=tensor([0.0487, 0.0246, 0.0770, 0.0188, 0.0394, 0.0228, 0.0240, 0.0139], device='cuda:2'), in_proj_covar=tensor([0.0301, 0.0233, 0.0290, 0.0222, 0.0239, 0.0188, 0.0218, 0.0209], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-08 01:03:09,369 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.1404, 3.6376, 3.3792, 2.8525, 3.4827, 3.4583, 3.3787, 2.1011], device='cuda:2'), covar=tensor([0.1470, 0.1627, 0.3231, 0.7689, 0.3134, 0.5980, 0.0943, 1.1057], device='cuda:2'), in_proj_covar=tensor([0.0087, 0.0100, 0.0105, 0.0171, 0.0090, 0.0155, 0.0090, 0.0156], device='cuda:2'), out_proj_covar=tensor([8.1095e-05, 8.3759e-05, 9.2954e-05, 1.3586e-04, 8.0620e-05, 1.2561e-04, 7.7090e-05, 1.2557e-04], device='cuda:2') 2023-03-08 01:03:16,723 INFO [train2.py:809] (2/4) Epoch 9, batch 1400, loss[ctc_loss=0.1099, att_loss=0.2356, loss=0.2105, over 16129.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.006079, over 42.00 utterances.], tot_loss[ctc_loss=0.1165, att_loss=0.2574, loss=0.2292, over 3265858.77 frames. utt_duration=1231 frames, utt_pad_proportion=0.05811, over 10620.89 utterances.], batch size: 42, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 01:04:17,301 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2023-03-08 01:04:32,133 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.539e+02 2.521e+02 2.872e+02 3.748e+02 7.311e+02, threshold=5.745e+02, percent-clipped=4.0 2023-03-08 01:04:37,049 INFO [train2.py:809] (2/4) Epoch 9, batch 1450, loss[ctc_loss=0.1176, att_loss=0.2659, loss=0.2362, over 16971.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.006623, over 50.00 utterances.], tot_loss[ctc_loss=0.1174, att_loss=0.2585, loss=0.2303, over 3268001.29 frames. utt_duration=1206 frames, utt_pad_proportion=0.06314, over 10854.55 utterances.], batch size: 50, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 01:04:40,487 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=33322.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:05:56,599 INFO [train2.py:809] (2/4) Epoch 9, batch 1500, loss[ctc_loss=0.09964, att_loss=0.2207, loss=0.1965, over 15352.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.01121, over 35.00 utterances.], tot_loss[ctc_loss=0.1172, att_loss=0.2586, loss=0.2303, over 3275414.25 frames. utt_duration=1221 frames, utt_pad_proportion=0.05788, over 10746.53 utterances.], batch size: 35, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 01:06:46,770 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2023-03-08 01:07:11,752 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.661e+02 2.556e+02 3.254e+02 4.022e+02 8.037e+02, threshold=6.507e+02, percent-clipped=3.0 2023-03-08 01:07:16,451 INFO [train2.py:809] (2/4) Epoch 9, batch 1550, loss[ctc_loss=0.09805, att_loss=0.2386, loss=0.2105, over 16273.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007584, over 43.00 utterances.], tot_loss[ctc_loss=0.1168, att_loss=0.2581, loss=0.2298, over 3277646.16 frames. 
utt_duration=1231 frames, utt_pad_proportion=0.05524, over 10659.58 utterances.], batch size: 43, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 01:07:24,577 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=33425.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:08:37,546 INFO [train2.py:809] (2/4) Epoch 9, batch 1600, loss[ctc_loss=0.126, att_loss=0.2638, loss=0.2362, over 16485.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.006127, over 46.00 utterances.], tot_loss[ctc_loss=0.1163, att_loss=0.2579, loss=0.2296, over 3276878.85 frames. utt_duration=1243 frames, utt_pad_proportion=0.05266, over 10556.52 utterances.], batch size: 46, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 01:08:40,898 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33472.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:09:43,902 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=33511.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:09:53,211 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+02 2.746e+02 3.273e+02 4.611e+02 8.460e+02, threshold=6.545e+02, percent-clipped=8.0 2023-03-08 01:09:57,951 INFO [train2.py:809] (2/4) Epoch 9, batch 1650, loss[ctc_loss=0.09623, att_loss=0.2242, loss=0.1986, over 15370.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.009577, over 35.00 utterances.], tot_loss[ctc_loss=0.1171, att_loss=0.258, loss=0.2298, over 3272185.43 frames. utt_duration=1230 frames, utt_pad_proportion=0.0591, over 10657.00 utterances.], batch size: 35, lr: 1.22e-02, grad_scale: 8.0 2023-03-08 01:10:18,995 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33533.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:11:18,165 INFO [train2.py:809] (2/4) Epoch 9, batch 1700, loss[ctc_loss=0.1551, att_loss=0.2796, loss=0.2547, over 13837.00 frames. utt_duration=383.1 frames, utt_pad_proportion=0.3349, over 145.00 utterances.], tot_loss[ctc_loss=0.1164, att_loss=0.2575, loss=0.2293, over 3273664.94 frames. utt_duration=1213 frames, utt_pad_proportion=0.06204, over 10812.01 utterances.], batch size: 145, lr: 1.21e-02, grad_scale: 8.0 2023-03-08 01:12:35,922 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.622e+02 2.664e+02 3.328e+02 4.191e+02 7.435e+02, threshold=6.656e+02, percent-clipped=3.0 2023-03-08 01:12:39,104 INFO [train2.py:809] (2/4) Epoch 9, batch 1750, loss[ctc_loss=0.1115, att_loss=0.2535, loss=0.2251, over 16273.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007752, over 43.00 utterances.], tot_loss[ctc_loss=0.1165, att_loss=0.257, loss=0.2289, over 3268987.65 frames. 
utt_duration=1199 frames, utt_pad_proportion=0.06763, over 10917.51 utterances.], batch size: 43, lr: 1.21e-02, grad_scale: 4.0 2023-03-08 01:12:42,501 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=33622.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:13:49,250 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9649, 5.3142, 4.6350, 5.4521, 4.6891, 5.0903, 5.4193, 5.1617], device='cuda:2'), covar=tensor([0.0424, 0.0246, 0.0856, 0.0145, 0.0430, 0.0150, 0.0203, 0.0173], device='cuda:2'), in_proj_covar=tensor([0.0305, 0.0235, 0.0298, 0.0226, 0.0245, 0.0193, 0.0220, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 01:13:58,120 INFO [train2.py:809] (2/4) Epoch 9, batch 1800, loss[ctc_loss=0.1271, att_loss=0.2715, loss=0.2427, over 17260.00 frames. utt_duration=875.2 frames, utt_pad_proportion=0.08357, over 79.00 utterances.], tot_loss[ctc_loss=0.1166, att_loss=0.2567, loss=0.2287, over 3265399.84 frames. utt_duration=1212 frames, utt_pad_proportion=0.06616, over 10793.39 utterances.], batch size: 79, lr: 1.21e-02, grad_scale: 4.0 2023-03-08 01:13:58,225 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=33670.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:14:09,667 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8853, 6.0817, 5.4921, 5.8661, 5.7761, 5.3603, 5.5082, 5.3194], device='cuda:2'), covar=tensor([0.1008, 0.0846, 0.0928, 0.0792, 0.0670, 0.1301, 0.2225, 0.2362], device='cuda:2'), in_proj_covar=tensor([0.0406, 0.0459, 0.0354, 0.0365, 0.0331, 0.0402, 0.0481, 0.0427], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 01:14:09,874 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33677.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:15:14,682 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.479e+02 2.773e+02 3.319e+02 4.110e+02 8.879e+02, threshold=6.639e+02, percent-clipped=6.0 2023-03-08 01:15:17,759 INFO [train2.py:809] (2/4) Epoch 9, batch 1850, loss[ctc_loss=0.1163, att_loss=0.2612, loss=0.2323, over 17113.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01539, over 56.00 utterances.], tot_loss[ctc_loss=0.1171, att_loss=0.2574, loss=0.2293, over 3270506.88 frames. utt_duration=1198 frames, utt_pad_proportion=0.06727, over 10930.11 utterances.], batch size: 56, lr: 1.21e-02, grad_scale: 4.0 2023-03-08 01:15:26,856 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=33725.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:15:28,953 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. 
limit=2.0 2023-03-08 01:15:47,516 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33738.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 01:15:54,353 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5024, 2.5371, 4.9113, 3.6456, 2.8775, 4.3272, 4.7285, 4.6052], device='cuda:2'), covar=tensor([0.0272, 0.1856, 0.0172, 0.1218, 0.2209, 0.0254, 0.0110, 0.0227], device='cuda:2'), in_proj_covar=tensor([0.0148, 0.0250, 0.0130, 0.0310, 0.0288, 0.0184, 0.0113, 0.0148], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0003, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 01:16:18,352 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.51 vs. limit=5.0 2023-03-08 01:16:37,510 INFO [train2.py:809] (2/4) Epoch 9, batch 1900, loss[ctc_loss=0.09351, att_loss=0.2451, loss=0.2147, over 16119.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005975, over 42.00 utterances.], tot_loss[ctc_loss=0.1163, att_loss=0.2565, loss=0.2285, over 3266249.93 frames. utt_duration=1229 frames, utt_pad_proportion=0.06093, over 10642.56 utterances.], batch size: 42, lr: 1.21e-02, grad_scale: 4.0 2023-03-08 01:16:42,309 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=33773.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:16:42,525 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0475, 5.0545, 4.9492, 2.6663, 4.8006, 4.5630, 4.1604, 2.3933], device='cuda:2'), covar=tensor([0.0110, 0.0071, 0.0176, 0.1027, 0.0089, 0.0175, 0.0312, 0.1405], device='cuda:2'), in_proj_covar=tensor([0.0057, 0.0076, 0.0066, 0.0101, 0.0066, 0.0088, 0.0086, 0.0097], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 01:17:44,861 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=33811.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:17:55,253 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+02 2.711e+02 3.222e+02 4.216e+02 1.081e+03, threshold=6.444e+02, percent-clipped=7.0 2023-03-08 01:17:58,266 INFO [train2.py:809] (2/4) Epoch 9, batch 1950, loss[ctc_loss=0.1141, att_loss=0.2704, loss=0.2391, over 17306.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01168, over 55.00 utterances.], tot_loss[ctc_loss=0.116, att_loss=0.2568, loss=0.2286, over 3270653.43 frames. utt_duration=1233 frames, utt_pad_proportion=0.05813, over 10623.80 utterances.], batch size: 55, lr: 1.21e-02, grad_scale: 4.0 2023-03-08 01:18:08,809 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4012, 2.4861, 3.2778, 4.4232, 4.0876, 4.0659, 2.7755, 2.1233], device='cuda:2'), covar=tensor([0.0676, 0.2396, 0.1144, 0.0473, 0.0558, 0.0293, 0.1557, 0.2431], device='cuda:2'), in_proj_covar=tensor([0.0165, 0.0200, 0.0188, 0.0182, 0.0171, 0.0138, 0.0189, 0.0178], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 01:18:11,658 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=33828.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:19:01,160 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=33859.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:19:17,458 INFO [train2.py:809] (2/4) Epoch 9, batch 2000, loss[ctc_loss=0.1159, att_loss=0.2502, loss=0.2234, over 16536.00 frames. 
utt_duration=1471 frames, utt_pad_proportion=0.006677, over 45.00 utterances.], tot_loss[ctc_loss=0.1169, att_loss=0.2572, loss=0.2291, over 3274961.03 frames. utt_duration=1242 frames, utt_pad_proportion=0.05487, over 10558.16 utterances.], batch size: 45, lr: 1.21e-02, grad_scale: 8.0 2023-03-08 01:20:34,522 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.651e+02 2.627e+02 3.033e+02 3.756e+02 1.034e+03, threshold=6.065e+02, percent-clipped=4.0 2023-03-08 01:20:37,627 INFO [train2.py:809] (2/4) Epoch 9, batch 2050, loss[ctc_loss=0.1098, att_loss=0.2502, loss=0.2222, over 16554.00 frames. utt_duration=1473 frames, utt_pad_proportion=0.005507, over 45.00 utterances.], tot_loss[ctc_loss=0.1166, att_loss=0.2566, loss=0.2286, over 3259713.20 frames. utt_duration=1220 frames, utt_pad_proportion=0.06506, over 10698.82 utterances.], batch size: 45, lr: 1.21e-02, grad_scale: 8.0 2023-03-08 01:20:45,608 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2023-03-08 01:21:08,878 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33939.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:21:28,864 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4695, 2.1292, 4.7548, 3.7586, 3.0764, 4.2184, 4.3431, 4.4170], device='cuda:2'), covar=tensor([0.0160, 0.1944, 0.0096, 0.1030, 0.1754, 0.0238, 0.0137, 0.0204], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0248, 0.0129, 0.0306, 0.0283, 0.0182, 0.0113, 0.0147], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0003, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 01:21:58,316 INFO [train2.py:809] (2/4) Epoch 9, batch 2100, loss[ctc_loss=0.2017, att_loss=0.3103, loss=0.2886, over 14247.00 frames. utt_duration=391.8 frames, utt_pad_proportion=0.3174, over 146.00 utterances.], tot_loss[ctc_loss=0.1184, att_loss=0.2585, loss=0.2304, over 3258629.80 frames. utt_duration=1188 frames, utt_pad_proportion=0.0728, over 10988.49 utterances.], batch size: 146, lr: 1.21e-02, grad_scale: 8.0 2023-03-08 01:22:50,637 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34000.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:23:18,328 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+02 2.606e+02 3.129e+02 4.184e+02 1.323e+03, threshold=6.258e+02, percent-clipped=9.0 2023-03-08 01:23:22,290 INFO [train2.py:809] (2/4) Epoch 9, batch 2150, loss[ctc_loss=0.1382, att_loss=0.2715, loss=0.2449, over 16981.00 frames. utt_duration=687.6 frames, utt_pad_proportion=0.1373, over 99.00 utterances.], tot_loss[ctc_loss=0.1182, att_loss=0.2585, loss=0.2305, over 3263883.56 frames. utt_duration=1177 frames, utt_pad_proportion=0.07367, over 11110.90 utterances.], batch size: 99, lr: 1.21e-02, grad_scale: 8.0 2023-03-08 01:23:42,988 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34033.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 01:24:02,281 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34044.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:24:35,683 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.77 vs. limit=5.0 2023-03-08 01:24:42,989 INFO [train2.py:809] (2/4) Epoch 9, batch 2200, loss[ctc_loss=0.1138, att_loss=0.2645, loss=0.2344, over 16890.00 frames. 
utt_duration=1380 frames, utt_pad_proportion=0.00718, over 49.00 utterances.], tot_loss[ctc_loss=0.1176, att_loss=0.2582, loss=0.2301, over 3262938.43 frames. utt_duration=1191 frames, utt_pad_proportion=0.07151, over 10968.40 utterances.], batch size: 49, lr: 1.21e-02, grad_scale: 8.0 2023-03-08 01:25:18,277 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4804, 4.4878, 4.4280, 4.5328, 4.9228, 4.5022, 4.3963, 2.1886], device='cuda:2'), covar=tensor([0.0230, 0.0312, 0.0251, 0.0192, 0.0939, 0.0222, 0.0287, 0.2230], device='cuda:2'), in_proj_covar=tensor([0.0128, 0.0125, 0.0124, 0.0132, 0.0316, 0.0124, 0.0116, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 01:25:38,274 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34105.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:25:58,668 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+02 2.581e+02 3.160e+02 3.967e+02 1.179e+03, threshold=6.319e+02, percent-clipped=5.0 2023-03-08 01:26:01,793 INFO [train2.py:809] (2/4) Epoch 9, batch 2250, loss[ctc_loss=0.09408, att_loss=0.2312, loss=0.2038, over 15495.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008471, over 36.00 utterances.], tot_loss[ctc_loss=0.1161, att_loss=0.2571, loss=0.2289, over 3259937.21 frames. utt_duration=1211 frames, utt_pad_proportion=0.0676, over 10779.42 utterances.], batch size: 36, lr: 1.21e-02, grad_scale: 8.0 2023-03-08 01:26:14,043 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34128.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:26:42,338 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34145.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:27:21,333 INFO [train2.py:809] (2/4) Epoch 9, batch 2300, loss[ctc_loss=0.1232, att_loss=0.2781, loss=0.2471, over 17289.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01206, over 55.00 utterances.], tot_loss[ctc_loss=0.1165, att_loss=0.2578, loss=0.2295, over 3270250.61 frames. utt_duration=1217 frames, utt_pad_proportion=0.06354, over 10758.64 utterances.], batch size: 55, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:27:30,966 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34176.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:28:20,174 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34206.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:28:39,051 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.559e+02 3.119e+02 3.946e+02 1.160e+03, threshold=6.239e+02, percent-clipped=6.0 2023-03-08 01:28:42,211 INFO [train2.py:809] (2/4) Epoch 9, batch 2350, loss[ctc_loss=0.1263, att_loss=0.2628, loss=0.2355, over 15947.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.006953, over 41.00 utterances.], tot_loss[ctc_loss=0.1167, att_loss=0.2582, loss=0.2299, over 3273258.76 frames. utt_duration=1214 frames, utt_pad_proportion=0.06255, over 10795.77 utterances.], batch size: 41, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:30:03,532 INFO [train2.py:809] (2/4) Epoch 9, batch 2400, loss[ctc_loss=0.1584, att_loss=0.2794, loss=0.2552, over 16976.00 frames. utt_duration=687.4 frames, utt_pad_proportion=0.1365, over 99.00 utterances.], tot_loss[ctc_loss=0.1163, att_loss=0.2578, loss=0.2295, over 3279154.91 frames. 
utt_duration=1218 frames, utt_pad_proportion=0.05994, over 10786.58 utterances.], batch size: 99, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:30:14,638 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1012, 5.5147, 4.9595, 5.5185, 4.9156, 5.1026, 5.5942, 5.3968], device='cuda:2'), covar=tensor([0.0453, 0.0218, 0.0796, 0.0205, 0.0386, 0.0155, 0.0191, 0.0139], device='cuda:2'), in_proj_covar=tensor([0.0300, 0.0234, 0.0297, 0.0227, 0.0241, 0.0189, 0.0223, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-08 01:30:45,151 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34295.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:31:21,640 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+02 2.539e+02 3.254e+02 3.979e+02 8.947e+02, threshold=6.508e+02, percent-clipped=3.0 2023-03-08 01:31:24,807 INFO [train2.py:809] (2/4) Epoch 9, batch 2450, loss[ctc_loss=0.1258, att_loss=0.2699, loss=0.2411, over 17070.00 frames. utt_duration=1315 frames, utt_pad_proportion=0.007074, over 52.00 utterances.], tot_loss[ctc_loss=0.1155, att_loss=0.257, loss=0.2287, over 3275607.65 frames. utt_duration=1240 frames, utt_pad_proportion=0.05551, over 10575.54 utterances.], batch size: 52, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:31:47,054 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34333.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:32:45,731 INFO [train2.py:809] (2/4) Epoch 9, batch 2500, loss[ctc_loss=0.1054, att_loss=0.2538, loss=0.2242, over 16623.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005454, over 47.00 utterances.], tot_loss[ctc_loss=0.1154, att_loss=0.2568, loss=0.2286, over 3271754.81 frames. utt_duration=1250 frames, utt_pad_proportion=0.05466, over 10483.35 utterances.], batch size: 47, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:32:53,558 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8457, 5.2006, 4.6968, 5.3005, 4.6430, 4.9558, 5.4058, 5.1548], device='cuda:2'), covar=tensor([0.0506, 0.0223, 0.0802, 0.0183, 0.0435, 0.0187, 0.0180, 0.0150], device='cuda:2'), in_proj_covar=tensor([0.0300, 0.0233, 0.0296, 0.0227, 0.0240, 0.0188, 0.0221, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-08 01:33:03,129 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34381.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:33:35,233 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34400.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:34:03,745 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+02 2.575e+02 3.136e+02 3.976e+02 7.316e+02, threshold=6.272e+02, percent-clipped=1.0 2023-03-08 01:34:06,908 INFO [train2.py:809] (2/4) Epoch 9, batch 2550, loss[ctc_loss=0.08921, att_loss=0.2204, loss=0.1941, over 15771.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.008887, over 38.00 utterances.], tot_loss[ctc_loss=0.1156, att_loss=0.2569, loss=0.2287, over 3274036.69 frames. utt_duration=1258 frames, utt_pad_proportion=0.05199, over 10421.53 utterances.], batch size: 38, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:35:28,486 INFO [train2.py:809] (2/4) Epoch 9, batch 2600, loss[ctc_loss=0.0815, att_loss=0.241, loss=0.2091, over 16188.00 frames. 
utt_duration=1581 frames, utt_pad_proportion=0.005752, over 41.00 utterances.], tot_loss[ctc_loss=0.1146, att_loss=0.2565, loss=0.2281, over 3278629.26 frames. utt_duration=1268 frames, utt_pad_proportion=0.04823, over 10355.81 utterances.], batch size: 41, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:36:19,408 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34501.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:36:46,603 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.469e+02 2.544e+02 3.202e+02 3.743e+02 1.159e+03, threshold=6.403e+02, percent-clipped=3.0 2023-03-08 01:36:49,759 INFO [train2.py:809] (2/4) Epoch 9, batch 2650, loss[ctc_loss=0.1976, att_loss=0.2985, loss=0.2783, over 14427.00 frames. utt_duration=394.2 frames, utt_pad_proportion=0.3097, over 147.00 utterances.], tot_loss[ctc_loss=0.1155, att_loss=0.2573, loss=0.229, over 3275079.96 frames. utt_duration=1229 frames, utt_pad_proportion=0.06008, over 10675.64 utterances.], batch size: 147, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:38:10,644 INFO [train2.py:809] (2/4) Epoch 9, batch 2700, loss[ctc_loss=0.1666, att_loss=0.2918, loss=0.2667, over 17174.00 frames. utt_duration=695.5 frames, utt_pad_proportion=0.1284, over 99.00 utterances.], tot_loss[ctc_loss=0.1156, att_loss=0.2571, loss=0.2288, over 3274856.84 frames. utt_duration=1234 frames, utt_pad_proportion=0.05889, over 10625.21 utterances.], batch size: 99, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:38:37,197 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5766, 2.1670, 4.9085, 3.8456, 3.0808, 4.3618, 4.4239, 4.6561], device='cuda:2'), covar=tensor([0.0123, 0.1625, 0.0101, 0.0808, 0.1672, 0.0171, 0.0151, 0.0144], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0245, 0.0127, 0.0302, 0.0281, 0.0180, 0.0110, 0.0145], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 01:38:47,938 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34593.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:38:51,081 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34595.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:39:27,955 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.571e+02 3.320e+02 3.912e+02 1.043e+03, threshold=6.640e+02, percent-clipped=3.0 2023-03-08 01:39:30,983 INFO [train2.py:809] (2/4) Epoch 9, batch 2750, loss[ctc_loss=0.1152, att_loss=0.2349, loss=0.211, over 14560.00 frames. utt_duration=1822 frames, utt_pad_proportion=0.03463, over 32.00 utterances.], tot_loss[ctc_loss=0.1156, att_loss=0.257, loss=0.2287, over 3266392.92 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05542, over 10457.82 utterances.], batch size: 32, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:39:34,456 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34622.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:39:41,372 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34626.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:40:08,111 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34643.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:40:25,687 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34654.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:40:50,936 INFO [train2.py:809] (2/4) Epoch 9, batch 2800, loss[ctc_loss=0.1056, att_loss=0.23, loss=0.2051, over 15635.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008977, over 37.00 utterances.], tot_loss[ctc_loss=0.116, att_loss=0.2573, loss=0.229, over 3265459.74 frames. utt_duration=1241 frames, utt_pad_proportion=0.05949, over 10537.51 utterances.], batch size: 37, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:40:55,304 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-08 01:41:13,013 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34683.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 01:41:19,764 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34687.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:41:40,071 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34700.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:41:55,871 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34710.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 01:42:07,902 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+02 2.689e+02 3.286e+02 4.031e+02 7.048e+02, threshold=6.572e+02, percent-clipped=5.0 2023-03-08 01:42:11,101 INFO [train2.py:809] (2/4) Epoch 9, batch 2850, loss[ctc_loss=0.1304, att_loss=0.275, loss=0.2461, over 17362.00 frames. utt_duration=1104 frames, utt_pad_proportion=0.03432, over 63.00 utterances.], tot_loss[ctc_loss=0.1159, att_loss=0.2571, loss=0.2289, over 3263165.59 frames. 
utt_duration=1246 frames, utt_pad_proportion=0.06033, over 10487.52 utterances.], batch size: 63, lr: 1.20e-02, grad_scale: 8.0 2023-03-08 01:42:16,232 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1582, 5.2132, 5.0004, 2.7043, 2.0413, 2.9416, 3.8384, 3.8586], device='cuda:2'), covar=tensor([0.0620, 0.0233, 0.0272, 0.3418, 0.5991, 0.2455, 0.1248, 0.1883], device='cuda:2'), in_proj_covar=tensor([0.0322, 0.0209, 0.0225, 0.0191, 0.0349, 0.0335, 0.0222, 0.0347], device='cuda:2'), out_proj_covar=tensor([1.5126e-04, 8.0335e-05, 9.8893e-05, 8.5686e-05, 1.5541e-04, 1.3942e-04, 8.7977e-05, 1.5050e-04], device='cuda:2') 2023-03-08 01:42:44,371 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5349, 4.5537, 4.5008, 4.4642, 4.8605, 4.4914, 4.3727, 2.0789], device='cuda:2'), covar=tensor([0.0216, 0.0298, 0.0251, 0.0203, 0.1041, 0.0210, 0.0273, 0.2502], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0124, 0.0126, 0.0130, 0.0315, 0.0124, 0.0114, 0.0217], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 01:42:56,380 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34748.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:43:31,241 INFO [train2.py:809] (2/4) Epoch 9, batch 2900, loss[ctc_loss=0.08555, att_loss=0.2219, loss=0.1946, over 14541.00 frames. utt_duration=1819 frames, utt_pad_proportion=0.04056, over 32.00 utterances.], tot_loss[ctc_loss=0.1155, att_loss=0.2567, loss=0.2285, over 3267961.95 frames. utt_duration=1264 frames, utt_pad_proportion=0.05477, over 10350.52 utterances.], batch size: 32, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:43:33,241 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34771.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 01:44:16,878 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1885, 5.4514, 4.8102, 5.2821, 5.0330, 4.6762, 4.9060, 4.6934], device='cuda:2'), covar=tensor([0.1377, 0.0871, 0.0849, 0.0711, 0.0787, 0.1498, 0.2311, 0.2270], device='cuda:2'), in_proj_covar=tensor([0.0422, 0.0477, 0.0369, 0.0375, 0.0347, 0.0412, 0.0501, 0.0446], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 01:44:22,090 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34801.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:44:48,783 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.507e+02 2.823e+02 3.310e+02 4.171e+02 9.231e+02, threshold=6.619e+02, percent-clipped=4.0 2023-03-08 01:44:51,951 INFO [train2.py:809] (2/4) Epoch 9, batch 2950, loss[ctc_loss=0.1299, att_loss=0.269, loss=0.2412, over 17418.00 frames. utt_duration=1011 frames, utt_pad_proportion=0.04512, over 69.00 utterances.], tot_loss[ctc_loss=0.1154, att_loss=0.2568, loss=0.2285, over 3267392.41 frames. utt_duration=1279 frames, utt_pad_proportion=0.05129, over 10232.36 utterances.], batch size: 69, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:45:40,216 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34849.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:46:13,928 INFO [train2.py:809] (2/4) Epoch 9, batch 3000, loss[ctc_loss=0.1269, att_loss=0.2493, loss=0.2248, over 16168.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.00617, over 41.00 utterances.], tot_loss[ctc_loss=0.1157, att_loss=0.2568, loss=0.2286, over 3271439.02 frames. 
utt_duration=1269 frames, utt_pad_proportion=0.05094, over 10322.86 utterances.], batch size: 41, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:46:13,929 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 01:46:32,377 INFO [train2.py:843] (2/4) Epoch 9, validation: ctc_loss=0.05401, att_loss=0.2408, loss=0.2035, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 01:46:32,377 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 01:46:41,747 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.77 vs. limit=5.0 2023-03-08 01:47:49,369 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.566e+02 2.760e+02 3.196e+02 3.997e+02 8.726e+02, threshold=6.392e+02, percent-clipped=4.0 2023-03-08 01:47:52,532 INFO [train2.py:809] (2/4) Epoch 9, batch 3050, loss[ctc_loss=0.1063, att_loss=0.2578, loss=0.2275, over 16181.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006653, over 41.00 utterances.], tot_loss[ctc_loss=0.1156, att_loss=0.2566, loss=0.2284, over 3266771.73 frames. utt_duration=1255 frames, utt_pad_proportion=0.05427, over 10427.60 utterances.], batch size: 41, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:48:15,953 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1536, 4.6859, 4.8236, 4.4524, 2.3130, 4.9809, 3.0586, 2.2803], device='cuda:2'), covar=tensor([0.0326, 0.0178, 0.0590, 0.0195, 0.2201, 0.0084, 0.1501, 0.1654], device='cuda:2'), in_proj_covar=tensor([0.0130, 0.0105, 0.0252, 0.0106, 0.0217, 0.0101, 0.0224, 0.0201], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 01:48:40,025 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34949.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:49:12,668 INFO [train2.py:809] (2/4) Epoch 9, batch 3100, loss[ctc_loss=0.0921, att_loss=0.2575, loss=0.2244, over 16620.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005663, over 47.00 utterances.], tot_loss[ctc_loss=0.1149, att_loss=0.256, loss=0.2278, over 3271319.07 frames. utt_duration=1265 frames, utt_pad_proportion=0.04987, over 10356.22 utterances.], batch size: 47, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:49:25,040 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34978.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 01:49:32,542 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34982.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:50:30,869 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+02 2.700e+02 3.373e+02 4.220e+02 7.587e+02, threshold=6.745e+02, percent-clipped=2.0 2023-03-08 01:50:34,019 INFO [train2.py:809] (2/4) Epoch 9, batch 3150, loss[ctc_loss=0.1192, att_loss=0.2778, loss=0.2461, over 17121.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01475, over 56.00 utterances.], tot_loss[ctc_loss=0.1147, att_loss=0.2563, loss=0.228, over 3271627.49 frames. utt_duration=1241 frames, utt_pad_proportion=0.05653, over 10562.08 utterances.], batch size: 56, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:51:07,536 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.33 vs. 
limit=5.0 2023-03-08 01:51:48,853 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35066.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 01:51:54,781 INFO [train2.py:809] (2/4) Epoch 9, batch 3200, loss[ctc_loss=0.12, att_loss=0.2532, loss=0.2265, over 16459.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.00702, over 46.00 utterances.], tot_loss[ctc_loss=0.1147, att_loss=0.2568, loss=0.2284, over 3267697.72 frames. utt_duration=1227 frames, utt_pad_proportion=0.05912, over 10667.49 utterances.], batch size: 46, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:51:57,450 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2023-03-08 01:52:22,189 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1891, 2.3138, 3.1174, 4.1112, 3.6831, 3.8141, 2.7700, 1.7402], device='cuda:2'), covar=tensor([0.0749, 0.2621, 0.1078, 0.0666, 0.0706, 0.0410, 0.1621, 0.2821], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0199, 0.0187, 0.0177, 0.0172, 0.0140, 0.0187, 0.0177], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 01:52:31,355 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6665, 2.6517, 3.3829, 4.5572, 4.1920, 4.1054, 3.1701, 1.9702], device='cuda:2'), covar=tensor([0.0503, 0.2169, 0.1015, 0.0390, 0.0505, 0.0284, 0.1202, 0.2350], device='cuda:2'), in_proj_covar=tensor([0.0160, 0.0198, 0.0186, 0.0177, 0.0171, 0.0139, 0.0187, 0.0176], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 01:52:31,404 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35092.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:53:13,428 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.527e+02 2.431e+02 2.913e+02 3.671e+02 9.523e+02, threshold=5.826e+02, percent-clipped=2.0 2023-03-08 01:53:16,527 INFO [train2.py:809] (2/4) Epoch 9, batch 3250, loss[ctc_loss=0.1173, att_loss=0.2741, loss=0.2427, over 16970.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007319, over 50.00 utterances.], tot_loss[ctc_loss=0.1139, att_loss=0.2559, loss=0.2275, over 3267351.29 frames. 
utt_duration=1252 frames, utt_pad_proportion=0.05497, over 10452.19 utterances.], batch size: 50, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:53:23,421 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4253, 4.9401, 4.7076, 4.9535, 4.9547, 4.5888, 3.3324, 4.8666], device='cuda:2'), covar=tensor([0.0108, 0.0121, 0.0120, 0.0080, 0.0133, 0.0099, 0.0673, 0.0278], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0065, 0.0077, 0.0048, 0.0052, 0.0062, 0.0085, 0.0083], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 01:53:59,540 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9269, 3.6828, 3.0888, 3.3428, 3.9103, 3.5480, 2.8968, 4.1932], device='cuda:2'), covar=tensor([0.1029, 0.0424, 0.1128, 0.0703, 0.0688, 0.0650, 0.0857, 0.0534], device='cuda:2'), in_proj_covar=tensor([0.0185, 0.0178, 0.0205, 0.0171, 0.0223, 0.0208, 0.0179, 0.0242], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 01:54:10,647 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35153.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:54:20,534 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-08 01:54:37,149 INFO [train2.py:809] (2/4) Epoch 9, batch 3300, loss[ctc_loss=0.1224, att_loss=0.2521, loss=0.2261, over 15884.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.0093, over 39.00 utterances.], tot_loss[ctc_loss=0.1141, att_loss=0.2564, loss=0.2279, over 3272996.38 frames. utt_duration=1254 frames, utt_pad_proportion=0.0538, over 10454.22 utterances.], batch size: 39, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:55:55,872 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.545e+02 2.550e+02 3.164e+02 3.869e+02 8.354e+02, threshold=6.328e+02, percent-clipped=5.0 2023-03-08 01:55:58,995 INFO [train2.py:809] (2/4) Epoch 9, batch 3350, loss[ctc_loss=0.1023, att_loss=0.2517, loss=0.2218, over 16112.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.007064, over 42.00 utterances.], tot_loss[ctc_loss=0.1141, att_loss=0.2567, loss=0.2282, over 3275566.44 frames. utt_duration=1247 frames, utt_pad_proportion=0.05508, over 10519.07 utterances.], batch size: 42, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:56:39,672 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6574, 5.1597, 4.9295, 5.2170, 5.2706, 4.8854, 3.8367, 5.1951], device='cuda:2'), covar=tensor([0.0101, 0.0088, 0.0121, 0.0065, 0.0100, 0.0082, 0.0522, 0.0206], device='cuda:2'), in_proj_covar=tensor([0.0068, 0.0067, 0.0079, 0.0049, 0.0054, 0.0064, 0.0087, 0.0085], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 01:56:46,170 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35249.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:57:18,720 INFO [train2.py:809] (2/4) Epoch 9, batch 3400, loss[ctc_loss=0.1095, att_loss=0.2314, loss=0.207, over 15488.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.008065, over 36.00 utterances.], tot_loss[ctc_loss=0.1161, att_loss=0.2579, loss=0.2296, over 3267996.67 frames. 
utt_duration=1231 frames, utt_pad_proportion=0.06137, over 10631.79 utterances.], batch size: 36, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:57:32,780 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35278.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 01:57:38,891 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35282.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:57:42,508 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.88 vs. limit=2.0 2023-03-08 01:58:03,248 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35297.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:58:36,339 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.421e+02 2.749e+02 3.298e+02 4.168e+02 1.046e+03, threshold=6.595e+02, percent-clipped=7.0 2023-03-08 01:58:39,566 INFO [train2.py:809] (2/4) Epoch 9, batch 3450, loss[ctc_loss=0.1144, att_loss=0.2653, loss=0.2351, over 17395.00 frames. utt_duration=1010 frames, utt_pad_proportion=0.04471, over 69.00 utterances.], tot_loss[ctc_loss=0.1151, att_loss=0.2574, loss=0.229, over 3279017.02 frames. utt_duration=1247 frames, utt_pad_proportion=0.05393, over 10532.70 utterances.], batch size: 69, lr: 1.19e-02, grad_scale: 8.0 2023-03-08 01:58:50,424 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35326.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:58:56,844 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35330.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 01:59:06,539 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4466, 2.4721, 3.0593, 4.2370, 3.8591, 3.8497, 2.7943, 1.7772], device='cuda:2'), covar=tensor([0.0671, 0.2561, 0.1218, 0.0521, 0.0608, 0.0382, 0.1809, 0.2977], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0203, 0.0188, 0.0179, 0.0172, 0.0141, 0.0189, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 01:59:21,341 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35345.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 01:59:46,980 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8514, 5.1353, 4.5720, 5.2173, 4.6358, 4.8861, 5.3001, 4.9947], device='cuda:2'), covar=tensor([0.0522, 0.0281, 0.0894, 0.0209, 0.0422, 0.0264, 0.0194, 0.0185], device='cuda:2'), in_proj_covar=tensor([0.0310, 0.0242, 0.0305, 0.0233, 0.0244, 0.0194, 0.0223, 0.0216], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 01:59:55,015 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35366.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 02:00:01,144 INFO [train2.py:809] (2/4) Epoch 9, batch 3500, loss[ctc_loss=0.09155, att_loss=0.2338, loss=0.2054, over 15950.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.006798, over 41.00 utterances.], tot_loss[ctc_loss=0.1154, att_loss=0.2575, loss=0.2291, over 3278699.60 frames. 
utt_duration=1253 frames, utt_pad_proportion=0.05172, over 10476.97 utterances.], batch size: 41, lr: 1.18e-02, grad_scale: 8.0 2023-03-08 02:00:09,027 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35374.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:00:52,449 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-08 02:00:59,474 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35406.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 02:01:12,463 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35414.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 02:01:18,496 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.648e+02 2.480e+02 3.195e+02 3.962e+02 8.803e+02, threshold=6.389e+02, percent-clipped=2.0 2023-03-08 02:01:21,707 INFO [train2.py:809] (2/4) Epoch 9, batch 3550, loss[ctc_loss=0.1002, att_loss=0.2614, loss=0.2291, over 17006.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.008778, over 51.00 utterances.], tot_loss[ctc_loss=0.1157, att_loss=0.2573, loss=0.229, over 3282504.32 frames. utt_duration=1285 frames, utt_pad_proportion=0.04384, over 10233.19 utterances.], batch size: 51, lr: 1.18e-02, grad_scale: 8.0 2023-03-08 02:01:34,077 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4263, 2.9141, 2.6318, 2.9030, 3.1871, 3.0726, 2.3824, 3.1887], device='cuda:2'), covar=tensor([0.0897, 0.0422, 0.0807, 0.0521, 0.0583, 0.0510, 0.0809, 0.0528], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0173, 0.0200, 0.0166, 0.0217, 0.0201, 0.0175, 0.0237], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 02:01:46,292 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35435.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:02:06,775 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35448.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:02:41,172 INFO [train2.py:809] (2/4) Epoch 9, batch 3600, loss[ctc_loss=0.117, att_loss=0.2524, loss=0.2253, over 15937.00 frames. utt_duration=1556 frames, utt_pad_proportion=0.007342, over 41.00 utterances.], tot_loss[ctc_loss=0.1152, att_loss=0.2559, loss=0.2278, over 3271663.07 frames. utt_duration=1286 frames, utt_pad_proportion=0.04672, over 10188.56 utterances.], batch size: 41, lr: 1.18e-02, grad_scale: 8.0 2023-03-08 02:02:52,416 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6572, 4.3759, 4.4086, 4.4569, 4.9555, 4.6873, 4.2102, 2.0754], device='cuda:2'), covar=tensor([0.0231, 0.0408, 0.0301, 0.0199, 0.0917, 0.0240, 0.0280, 0.2495], device='cuda:2'), in_proj_covar=tensor([0.0127, 0.0126, 0.0128, 0.0132, 0.0322, 0.0124, 0.0117, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 02:03:58,102 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+02 2.714e+02 3.545e+02 4.370e+02 8.238e+02, threshold=7.091e+02, percent-clipped=8.0 2023-03-08 02:04:01,307 INFO [train2.py:809] (2/4) Epoch 9, batch 3650, loss[ctc_loss=0.1108, att_loss=0.2584, loss=0.2289, over 17202.00 frames. utt_duration=689.5 frames, utt_pad_proportion=0.1261, over 100.00 utterances.], tot_loss[ctc_loss=0.1158, att_loss=0.2565, loss=0.2283, over 3267819.03 frames. 
utt_duration=1274 frames, utt_pad_proportion=0.04924, over 10274.11 utterances.], batch size: 100, lr: 1.18e-02, grad_scale: 8.0 2023-03-08 02:04:09,205 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4728, 2.3979, 5.0157, 3.7851, 2.8540, 4.3608, 4.8347, 4.5330], device='cuda:2'), covar=tensor([0.0239, 0.1901, 0.0151, 0.1139, 0.2161, 0.0250, 0.0097, 0.0233], device='cuda:2'), in_proj_covar=tensor([0.0144, 0.0245, 0.0130, 0.0303, 0.0280, 0.0183, 0.0111, 0.0147], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 02:04:18,684 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35530.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:05:22,723 INFO [train2.py:809] (2/4) Epoch 9, batch 3700, loss[ctc_loss=0.1075, att_loss=0.25, loss=0.2215, over 16405.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006895, over 44.00 utterances.], tot_loss[ctc_loss=0.1158, att_loss=0.2565, loss=0.2284, over 3276339.34 frames. utt_duration=1269 frames, utt_pad_proportion=0.04701, over 10336.76 utterances.], batch size: 44, lr: 1.18e-02, grad_scale: 8.0 2023-03-08 02:05:57,194 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35591.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:06:21,269 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5201, 2.8169, 3.5929, 2.8059, 3.6133, 4.6785, 4.3732, 3.3501], device='cuda:2'), covar=tensor([0.0346, 0.1709, 0.1259, 0.1340, 0.0885, 0.0475, 0.0537, 0.1131], device='cuda:2'), in_proj_covar=tensor([0.0216, 0.0220, 0.0235, 0.0196, 0.0226, 0.0276, 0.0207, 0.0210], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:2') 2023-03-08 02:06:40,037 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.486e+02 3.166e+02 4.039e+02 7.024e+02, threshold=6.332e+02, percent-clipped=0.0 2023-03-08 02:06:43,374 INFO [train2.py:809] (2/4) Epoch 9, batch 3750, loss[ctc_loss=0.1283, att_loss=0.2684, loss=0.2404, over 16472.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006237, over 46.00 utterances.], tot_loss[ctc_loss=0.1164, att_loss=0.2569, loss=0.2288, over 3276208.26 frames. utt_duration=1250 frames, utt_pad_proportion=0.05242, over 10497.65 utterances.], batch size: 46, lr: 1.18e-02, grad_scale: 16.0 2023-03-08 02:08:04,810 INFO [train2.py:809] (2/4) Epoch 9, batch 3800, loss[ctc_loss=0.1029, att_loss=0.2466, loss=0.2178, over 16280.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.006416, over 43.00 utterances.], tot_loss[ctc_loss=0.1147, att_loss=0.2559, loss=0.2277, over 3265743.61 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.05089, over 10375.19 utterances.], batch size: 43, lr: 1.18e-02, grad_scale: 16.0 2023-03-08 02:08:09,895 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9314, 5.3230, 5.1473, 5.2593, 5.3536, 5.3190, 5.0101, 4.7897], device='cuda:2'), covar=tensor([0.1095, 0.0534, 0.0249, 0.0398, 0.0267, 0.0291, 0.0283, 0.0310], device='cuda:2'), in_proj_covar=tensor([0.0433, 0.0278, 0.0226, 0.0258, 0.0327, 0.0346, 0.0270, 0.0300], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 02:08:56,070 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35701.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 02:09:23,353 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.337e+02 2.475e+02 3.126e+02 3.794e+02 8.902e+02, threshold=6.253e+02, percent-clipped=4.0 2023-03-08 02:09:26,479 INFO [train2.py:809] (2/4) Epoch 9, batch 3850, loss[ctc_loss=0.08726, att_loss=0.2483, loss=0.2161, over 16400.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007016, over 44.00 utterances.], tot_loss[ctc_loss=0.1137, att_loss=0.2554, loss=0.2271, over 3277505.92 frames. utt_duration=1268 frames, utt_pad_proportion=0.0459, over 10350.93 utterances.], batch size: 44, lr: 1.18e-02, grad_scale: 16.0 2023-03-08 02:09:42,278 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35730.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:10:10,758 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35748.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:10:45,237 INFO [train2.py:809] (2/4) Epoch 9, batch 3900, loss[ctc_loss=0.2178, att_loss=0.3207, loss=0.3001, over 13803.00 frames. utt_duration=379.7 frames, utt_pad_proportion=0.3362, over 146.00 utterances.], tot_loss[ctc_loss=0.1145, att_loss=0.256, loss=0.2277, over 3267021.86 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.05636, over 10529.21 utterances.], batch size: 146, lr: 1.18e-02, grad_scale: 16.0 2023-03-08 02:10:47,114 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35771.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:11:06,392 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0815, 3.9209, 3.1488, 3.7339, 4.0296, 3.8070, 2.8738, 4.4166], device='cuda:2'), covar=tensor([0.0955, 0.0419, 0.1112, 0.0525, 0.0547, 0.0564, 0.0871, 0.0512], device='cuda:2'), in_proj_covar=tensor([0.0185, 0.0176, 0.0207, 0.0170, 0.0221, 0.0206, 0.0179, 0.0242], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 02:11:26,174 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35796.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:11:47,436 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5855, 2.3628, 5.1133, 3.9524, 3.0073, 4.5821, 4.9036, 4.6712], device='cuda:2'), covar=tensor([0.0211, 0.1702, 0.0163, 0.0985, 0.1932, 0.0217, 0.0101, 0.0206], device='cuda:2'), in_proj_covar=tensor([0.0140, 0.0238, 0.0128, 0.0294, 0.0272, 0.0177, 0.0109, 0.0142], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 02:11:50,465 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35811.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:12:00,558 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.615e+02 2.688e+02 3.135e+02 3.909e+02 9.741e+02, threshold=6.269e+02, percent-clipped=3.0 2023-03-08 02:12:03,700 INFO [train2.py:809] (2/4) Epoch 9, batch 3950, loss[ctc_loss=0.1024, att_loss=0.2342, loss=0.2079, over 15661.00 frames. utt_duration=1695 frames, utt_pad_proportion=0.00788, over 37.00 utterances.], tot_loss[ctc_loss=0.1142, att_loss=0.256, loss=0.2276, over 3268829.43 frames. utt_duration=1257 frames, utt_pad_proportion=0.05256, over 10412.32 utterances.], batch size: 37, lr: 1.18e-02, grad_scale: 16.0 2023-03-08 02:12:22,924 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35832.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:13:22,885 INFO [train2.py:809] (2/4) Epoch 10, batch 0, loss[ctc_loss=0.1219, att_loss=0.2688, loss=0.2394, over 16957.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008079, over 50.00 utterances.], tot_loss[ctc_loss=0.1219, att_loss=0.2688, loss=0.2394, over 16957.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008079, over 50.00 utterances.], batch size: 50, lr: 1.12e-02, grad_scale: 16.0 2023-03-08 02:13:22,886 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 02:13:35,372 INFO [train2.py:843] (2/4) Epoch 10, validation: ctc_loss=0.0538, att_loss=0.2416, loss=0.204, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
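The loss fields in these entries combine the CTC and attention objectives: with att_rate=0.8 from the configuration dumped at the top of this log, every reported loss is consistent with (1 - att_rate) * ctc_loss + att_rate * att_loss, and the optim.py lines report a clipping threshold that tracks roughly Clipping_scale times the median of the recent grad-norm quartiles. A minimal sketch that reproduces both relationships from the logged numbers follows; the helper names (`combined_loss`, `clipping_report`) are illustrative only and are not functions from the training code, and the threshold rule is a reading of the logged values rather than the optimizer's actual implementation.

```python
import numpy as np

ATT_RATE = 0.8  # 'att_rate' from the config dump at the start of this log


def combined_loss(ctc_loss, att_loss, att_rate=ATT_RATE):
    # Reading of the logged numbers: loss = (1 - att_rate)*ctc_loss + att_rate*att_loss.
    return (1.0 - att_rate) * ctc_loss + att_rate * att_loss


# Epoch 10 validation entry above: ctc_loss=0.0538, att_loss=0.2416 -> loss=0.204
print(round(combined_loss(0.0538, 0.2416), 3))   # 0.204
# Epoch 9, batch 3950 entry above: ctc_loss=0.1142, att_loss=0.256 -> loss=0.2276
print(round(combined_loss(0.1142, 0.256), 4))    # 0.2276


def clipping_report(grad_norms, clipping_scale=2.0):
    # Guess at how the optim.py line relates its fields: quartiles of recent
    # gradient norms, a threshold of clipping_scale * median, and the share of
    # batches whose gradient norm exceeded that threshold ("percent-clipped").
    q = np.quantile(np.asarray(grad_norms), [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * q[2]
    percent_clipped = 100.0 * float(np.mean(np.asarray(grad_norms) > threshold))
    return q, threshold, percent_clipped
```

This matches the entry above, where the quartiles 1.615e+02 2.688e+02 3.135e+02 3.909e+02 9.741e+02 are logged with threshold=6.269e+02, i.e. about 2.0 times the median 3.135e+02.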
2023-03-08 02:13:35,373 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 02:14:04,781 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35872.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:14:25,941 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35886.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:14:44,590 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1085, 4.6170, 4.5296, 4.6125, 2.5352, 4.7969, 2.7371, 1.7412], device='cuda:2'), covar=tensor([0.0321, 0.0130, 0.0660, 0.0137, 0.1950, 0.0106, 0.1543, 0.1786], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0108, 0.0253, 0.0108, 0.0221, 0.0101, 0.0224, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 02:14:54,644 INFO [train2.py:809] (2/4) Epoch 10, batch 50, loss[ctc_loss=0.1175, att_loss=0.2631, loss=0.234, over 17204.00 frames. utt_duration=872.6 frames, utt_pad_proportion=0.0853, over 79.00 utterances.], tot_loss[ctc_loss=0.114, att_loss=0.2544, loss=0.2263, over 736322.49 frames. utt_duration=1264 frames, utt_pad_proportion=0.04734, over 2333.79 utterances.], batch size: 79, lr: 1.12e-02, grad_scale: 16.0 2023-03-08 02:15:18,101 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.508e+02 2.998e+02 4.058e+02 7.580e+02, threshold=5.995e+02, percent-clipped=4.0 2023-03-08 02:16:09,520 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.2002, 5.3864, 5.6410, 5.6460, 5.5899, 6.0942, 5.2911, 6.1883], device='cuda:2'), covar=tensor([0.0489, 0.0607, 0.0598, 0.0780, 0.1533, 0.0731, 0.0523, 0.0522], device='cuda:2'), in_proj_covar=tensor([0.0670, 0.0405, 0.0463, 0.0527, 0.0702, 0.0470, 0.0385, 0.0460], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 02:16:15,461 INFO [train2.py:809] (2/4) Epoch 10, batch 100, loss[ctc_loss=0.1686, att_loss=0.2874, loss=0.2636, over 16336.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005822, over 45.00 utterances.], tot_loss[ctc_loss=0.1143, att_loss=0.2554, loss=0.2272, over 1300603.02 frames. utt_duration=1279 frames, utt_pad_proportion=0.043, over 4071.92 utterances.], batch size: 45, lr: 1.12e-02, grad_scale: 16.0 2023-03-08 02:17:36,803 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36001.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 02:17:41,620 INFO [train2.py:809] (2/4) Epoch 10, batch 150, loss[ctc_loss=0.09427, att_loss=0.2499, loss=0.2188, over 16531.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006051, over 45.00 utterances.], tot_loss[ctc_loss=0.1141, att_loss=0.2558, loss=0.2274, over 1733368.05 frames. 
utt_duration=1275 frames, utt_pad_proportion=0.04648, over 5444.63 utterances.], batch size: 45, lr: 1.12e-02, grad_scale: 16.0 2023-03-08 02:17:56,767 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5243, 2.5561, 3.4840, 4.3784, 4.0874, 3.9289, 2.9015, 2.1338], device='cuda:2'), covar=tensor([0.0635, 0.2687, 0.0994, 0.0582, 0.0631, 0.0432, 0.1570, 0.2578], device='cuda:2'), in_proj_covar=tensor([0.0159, 0.0200, 0.0182, 0.0178, 0.0169, 0.0140, 0.0187, 0.0177], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 02:18:03,932 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.622e+02 2.601e+02 3.072e+02 3.945e+02 7.612e+02, threshold=6.144e+02, percent-clipped=5.0 2023-03-08 02:18:14,724 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3040, 5.0719, 5.1207, 5.1976, 5.0438, 5.2069, 5.0115, 4.7499], device='cuda:2'), covar=tensor([0.1841, 0.0764, 0.0332, 0.0510, 0.0769, 0.0385, 0.0319, 0.0419], device='cuda:2'), in_proj_covar=tensor([0.0432, 0.0274, 0.0221, 0.0260, 0.0324, 0.0340, 0.0266, 0.0299], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 02:18:22,450 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36030.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:18:52,123 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36049.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 02:19:00,393 INFO [train2.py:809] (2/4) Epoch 10, batch 200, loss[ctc_loss=0.1313, att_loss=0.2748, loss=0.2461, over 17016.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007963, over 51.00 utterances.], tot_loss[ctc_loss=0.1154, att_loss=0.2564, loss=0.2282, over 2070315.21 frames. utt_duration=1293 frames, utt_pad_proportion=0.04335, over 6412.62 utterances.], batch size: 51, lr: 1.12e-02, grad_scale: 16.0 2023-03-08 02:19:38,213 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36078.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:19:52,310 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8704, 3.7840, 3.1260, 3.4508, 3.8900, 3.6220, 2.6538, 4.3240], device='cuda:2'), covar=tensor([0.1121, 0.0493, 0.1134, 0.0738, 0.0669, 0.0692, 0.1084, 0.0526], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0174, 0.0204, 0.0170, 0.0222, 0.0206, 0.0178, 0.0241], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 02:20:11,926 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9514, 6.1657, 5.5617, 5.9669, 5.7769, 5.4895, 5.6422, 5.3659], device='cuda:2'), covar=tensor([0.1138, 0.0940, 0.0738, 0.0743, 0.0771, 0.1260, 0.2350, 0.2456], device='cuda:2'), in_proj_covar=tensor([0.0419, 0.0490, 0.0362, 0.0374, 0.0355, 0.0414, 0.0499, 0.0448], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 02:20:19,271 INFO [train2.py:809] (2/4) Epoch 10, batch 250, loss[ctc_loss=0.09591, att_loss=0.2307, loss=0.2037, over 14559.00 frames. utt_duration=1821 frames, utt_pad_proportion=0.04096, over 32.00 utterances.], tot_loss[ctc_loss=0.1148, att_loss=0.2563, loss=0.228, over 2337153.04 frames. 
utt_duration=1285 frames, utt_pad_proportion=0.04628, over 7282.39 utterances.], batch size: 32, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:20:42,219 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.582e+02 2.693e+02 3.162e+02 4.102e+02 9.052e+02, threshold=6.324e+02, percent-clipped=7.0 2023-03-08 02:20:53,330 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3321, 4.8607, 4.5979, 4.9135, 4.9407, 4.5376, 3.3501, 4.7717], device='cuda:2'), covar=tensor([0.0109, 0.0102, 0.0110, 0.0066, 0.0088, 0.0106, 0.0598, 0.0171], device='cuda:2'), in_proj_covar=tensor([0.0069, 0.0068, 0.0079, 0.0050, 0.0054, 0.0063, 0.0086, 0.0086], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 02:20:56,473 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36127.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:21:39,367 INFO [train2.py:809] (2/4) Epoch 10, batch 300, loss[ctc_loss=0.1198, att_loss=0.2665, loss=0.2371, over 16962.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.00748, over 50.00 utterances.], tot_loss[ctc_loss=0.1149, att_loss=0.2569, loss=0.2285, over 2551023.46 frames. utt_duration=1264 frames, utt_pad_proportion=0.04934, over 8085.16 utterances.], batch size: 50, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:22:01,540 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36167.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:22:23,359 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36181.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:22:31,056 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36186.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:23:01,510 INFO [train2.py:809] (2/4) Epoch 10, batch 350, loss[ctc_loss=0.1737, att_loss=0.291, loss=0.2675, over 14205.00 frames. utt_duration=390.6 frames, utt_pad_proportion=0.3171, over 146.00 utterances.], tot_loss[ctc_loss=0.1151, att_loss=0.2578, loss=0.2293, over 2722119.51 frames. utt_duration=1229 frames, utt_pad_proportion=0.05355, over 8872.08 utterances.], batch size: 146, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:23:24,556 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 2.621e+02 3.186e+02 4.132e+02 7.508e+02, threshold=6.371e+02, percent-clipped=3.0 2023-03-08 02:23:50,210 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36234.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:24:03,319 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36242.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:24:04,823 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3360, 3.8748, 3.2891, 3.7362, 4.1530, 3.8108, 3.3080, 4.5582], device='cuda:2'), covar=tensor([0.0758, 0.0435, 0.1029, 0.0528, 0.0554, 0.0589, 0.0661, 0.0349], device='cuda:2'), in_proj_covar=tensor([0.0179, 0.0173, 0.0202, 0.0169, 0.0220, 0.0203, 0.0177, 0.0237], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 02:24:24,495 INFO [train2.py:809] (2/4) Epoch 10, batch 400, loss[ctc_loss=0.1148, att_loss=0.2497, loss=0.2228, over 15953.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007151, over 41.00 utterances.], tot_loss[ctc_loss=0.1126, att_loss=0.2567, loss=0.2279, over 2845012.95 frames. 
utt_duration=1241 frames, utt_pad_proportion=0.05154, over 9181.28 utterances.], batch size: 41, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:25:45,292 INFO [train2.py:809] (2/4) Epoch 10, batch 450, loss[ctc_loss=0.1116, att_loss=0.2427, loss=0.2165, over 15852.00 frames. utt_duration=1627 frames, utt_pad_proportion=0.01139, over 39.00 utterances.], tot_loss[ctc_loss=0.1121, att_loss=0.2563, loss=0.2275, over 2945864.58 frames. utt_duration=1258 frames, utt_pad_proportion=0.04754, over 9377.09 utterances.], batch size: 39, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:26:06,978 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.425e+02 2.409e+02 2.979e+02 3.960e+02 1.144e+03, threshold=5.958e+02, percent-clipped=2.0 2023-03-08 02:26:31,713 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0466, 4.8950, 4.9258, 2.3209, 1.9043, 2.4849, 2.8453, 3.7211], device='cuda:2'), covar=tensor([0.0663, 0.0189, 0.0203, 0.3603, 0.5741, 0.3012, 0.1949, 0.1847], device='cuda:2'), in_proj_covar=tensor([0.0327, 0.0215, 0.0233, 0.0196, 0.0354, 0.0337, 0.0229, 0.0349], device='cuda:2'), out_proj_covar=tensor([1.5192e-04, 8.1517e-05, 1.0060e-04, 8.8729e-05, 1.5649e-04, 1.3898e-04, 9.0721e-05, 1.5063e-04], device='cuda:2') 2023-03-08 02:27:03,699 INFO [train2.py:809] (2/4) Epoch 10, batch 500, loss[ctc_loss=0.1003, att_loss=0.2312, loss=0.205, over 15372.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01101, over 35.00 utterances.], tot_loss[ctc_loss=0.1123, att_loss=0.2558, loss=0.2271, over 3008930.94 frames. utt_duration=1240 frames, utt_pad_proportion=0.05507, over 9721.58 utterances.], batch size: 35, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:27:37,659 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.82 vs. limit=2.0 2023-03-08 02:28:22,575 INFO [train2.py:809] (2/4) Epoch 10, batch 550, loss[ctc_loss=0.127, att_loss=0.2777, loss=0.2476, over 17039.00 frames. utt_duration=1287 frames, utt_pad_proportion=0.01053, over 53.00 utterances.], tot_loss[ctc_loss=0.1116, att_loss=0.2546, loss=0.226, over 3064823.55 frames. utt_duration=1268 frames, utt_pad_proportion=0.0486, over 9677.34 utterances.], batch size: 53, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:28:45,406 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+02 2.390e+02 3.015e+02 3.600e+02 8.483e+02, threshold=6.031e+02, percent-clipped=3.0 2023-03-08 02:28:56,589 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9656, 3.8056, 3.1452, 3.5237, 3.9659, 3.6359, 2.8355, 4.3063], device='cuda:2'), covar=tensor([0.0917, 0.0394, 0.0984, 0.0609, 0.0541, 0.0629, 0.0904, 0.0431], device='cuda:2'), in_proj_covar=tensor([0.0179, 0.0172, 0.0200, 0.0168, 0.0219, 0.0203, 0.0177, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 02:28:59,598 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36427.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:29:41,953 INFO [train2.py:809] (2/4) Epoch 10, batch 600, loss[ctc_loss=0.09637, att_loss=0.265, loss=0.2313, over 16882.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006773, over 49.00 utterances.], tot_loss[ctc_loss=0.1106, att_loss=0.2538, loss=0.2251, over 3112918.25 frames. 
utt_duration=1274 frames, utt_pad_proportion=0.04694, over 9782.65 utterances.], batch size: 49, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:30:01,899 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1506, 5.4287, 5.5706, 5.5819, 5.5503, 6.1211, 5.0993, 6.1609], device='cuda:2'), covar=tensor([0.0756, 0.0617, 0.0669, 0.1051, 0.1944, 0.0676, 0.0602, 0.0608], device='cuda:2'), in_proj_covar=tensor([0.0678, 0.0409, 0.0475, 0.0536, 0.0711, 0.0472, 0.0388, 0.0468], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 02:30:03,587 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36467.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:30:15,930 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36475.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:31:02,502 INFO [train2.py:809] (2/4) Epoch 10, batch 650, loss[ctc_loss=0.1148, att_loss=0.2373, loss=0.2128, over 15509.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008338, over 36.00 utterances.], tot_loss[ctc_loss=0.11, att_loss=0.253, loss=0.2244, over 3148767.49 frames. utt_duration=1292 frames, utt_pad_proportion=0.04383, over 9763.07 utterances.], batch size: 36, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:31:20,059 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36515.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:31:21,718 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3649, 4.8561, 4.6503, 4.8054, 4.9494, 4.5015, 3.0154, 4.6254], device='cuda:2'), covar=tensor([0.0129, 0.0130, 0.0115, 0.0094, 0.0115, 0.0121, 0.0854, 0.0306], device='cuda:2'), in_proj_covar=tensor([0.0070, 0.0069, 0.0080, 0.0051, 0.0055, 0.0065, 0.0088, 0.0088], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 02:31:24,437 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+02 2.550e+02 3.093e+02 3.802e+02 8.175e+02, threshold=6.187e+02, percent-clipped=1.0 2023-03-08 02:31:36,261 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 02:31:53,773 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36537.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:31:57,772 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-03-08 02:32:03,228 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2379, 4.5492, 4.4360, 4.8571, 2.7915, 4.5093, 2.7872, 1.9454], device='cuda:2'), covar=tensor([0.0251, 0.0121, 0.0661, 0.0120, 0.1674, 0.0163, 0.1575, 0.1845], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0106, 0.0250, 0.0107, 0.0219, 0.0102, 0.0225, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 02:32:21,152 INFO [train2.py:809] (2/4) Epoch 10, batch 700, loss[ctc_loss=0.1417, att_loss=0.2517, loss=0.2297, over 15363.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01049, over 35.00 utterances.], tot_loss[ctc_loss=0.1116, att_loss=0.2546, loss=0.226, over 3174256.12 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.05418, over 10124.33 utterances.], batch size: 35, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:33:09,174 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6301, 4.7806, 4.2391, 4.7124, 4.4040, 3.9847, 4.3238, 4.0285], device='cuda:2'), covar=tensor([0.1273, 0.1356, 0.1140, 0.0974, 0.1211, 0.1740, 0.2609, 0.2817], device='cuda:2'), in_proj_covar=tensor([0.0420, 0.0490, 0.0364, 0.0378, 0.0350, 0.0409, 0.0502, 0.0442], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 02:33:38,868 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.72 vs. limit=2.0 2023-03-08 02:33:41,082 INFO [train2.py:809] (2/4) Epoch 10, batch 750, loss[ctc_loss=0.09705, att_loss=0.255, loss=0.2234, over 17273.00 frames. utt_duration=875.9 frames, utt_pad_proportion=0.08283, over 79.00 utterances.], tot_loss[ctc_loss=0.1121, att_loss=0.255, loss=0.2265, over 3195797.24 frames. utt_duration=1237 frames, utt_pad_proportion=0.0571, over 10346.73 utterances.], batch size: 79, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:33:43,691 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36605.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:33:48,471 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36608.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:33:55,262 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36612.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:34:04,096 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+02 2.573e+02 3.133e+02 3.667e+02 8.255e+02, threshold=6.267e+02, percent-clipped=5.0 2023-03-08 02:35:01,780 INFO [train2.py:809] (2/4) Epoch 10, batch 800, loss[ctc_loss=0.1239, att_loss=0.2779, loss=0.2471, over 16624.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005513, over 47.00 utterances.], tot_loss[ctc_loss=0.1124, att_loss=0.2551, loss=0.2265, over 3211637.37 frames. utt_duration=1226 frames, utt_pad_proportion=0.06125, over 10494.94 utterances.], batch size: 47, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:35:10,479 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4206, 2.5242, 5.0700, 3.9375, 2.9681, 4.4790, 4.9765, 4.7463], device='cuda:2'), covar=tensor([0.0226, 0.1660, 0.0134, 0.0895, 0.1849, 0.0205, 0.0093, 0.0185], device='cuda:2'), in_proj_covar=tensor([0.0139, 0.0235, 0.0127, 0.0293, 0.0269, 0.0178, 0.0109, 0.0141], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 02:35:16,189 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.82 vs. 
limit=5.0 2023-03-08 02:35:21,879 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36666.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:35:26,442 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36669.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:35:32,503 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36673.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:36:15,405 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36700.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 02:36:21,097 INFO [train2.py:809] (2/4) Epoch 10, batch 850, loss[ctc_loss=0.1188, att_loss=0.2759, loss=0.2445, over 17379.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03429, over 63.00 utterances.], tot_loss[ctc_loss=0.1126, att_loss=0.255, loss=0.2265, over 3229453.80 frames. utt_duration=1229 frames, utt_pad_proportion=0.05952, over 10521.21 utterances.], batch size: 63, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:36:23,796 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1942, 5.2128, 5.0335, 2.4934, 1.9377, 2.9860, 3.4413, 3.8490], device='cuda:2'), covar=tensor([0.0605, 0.0198, 0.0233, 0.4055, 0.5807, 0.2393, 0.1676, 0.1886], device='cuda:2'), in_proj_covar=tensor([0.0327, 0.0216, 0.0231, 0.0198, 0.0353, 0.0336, 0.0229, 0.0349], device='cuda:2'), out_proj_covar=tensor([1.5176e-04, 8.2141e-05, 9.9714e-05, 8.9822e-05, 1.5629e-04, 1.3859e-04, 9.0647e-05, 1.5035e-04], device='cuda:2') 2023-03-08 02:36:43,996 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.594e+02 3.302e+02 4.019e+02 1.127e+03, threshold=6.604e+02, percent-clipped=9.0 2023-03-08 02:37:34,268 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.74 vs. limit=5.0 2023-03-08 02:37:41,457 INFO [train2.py:809] (2/4) Epoch 10, batch 900, loss[ctc_loss=0.1047, att_loss=0.2583, loss=0.2276, over 17057.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.008563, over 52.00 utterances.], tot_loss[ctc_loss=0.1118, att_loss=0.2539, loss=0.2254, over 3239961.47 frames. utt_duration=1263 frames, utt_pad_proportion=0.05134, over 10274.70 utterances.], batch size: 52, lr: 1.11e-02, grad_scale: 16.0 2023-03-08 02:37:53,273 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36761.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 02:38:04,145 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6540, 5.8761, 5.2658, 5.7254, 5.5749, 5.1249, 5.3490, 5.1433], device='cuda:2'), covar=tensor([0.1188, 0.0915, 0.0932, 0.0788, 0.0743, 0.1503, 0.2311, 0.2327], device='cuda:2'), in_proj_covar=tensor([0.0424, 0.0495, 0.0369, 0.0381, 0.0353, 0.0412, 0.0505, 0.0449], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 02:38:21,602 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7531, 3.6491, 3.0036, 3.2581, 3.8663, 3.5236, 2.4935, 4.1478], device='cuda:2'), covar=tensor([0.1083, 0.0502, 0.1168, 0.0781, 0.0614, 0.0724, 0.1118, 0.0460], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0173, 0.0201, 0.0170, 0.0222, 0.0206, 0.0179, 0.0240], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 02:39:01,055 INFO [train2.py:809] (2/4) Epoch 10, batch 950, loss[ctc_loss=0.08549, att_loss=0.2342, loss=0.2044, over 16285.00 frames. 
utt_duration=1516 frames, utt_pad_proportion=0.006945, over 43.00 utterances.], tot_loss[ctc_loss=0.1117, att_loss=0.2539, loss=0.2255, over 3246123.18 frames. utt_duration=1258 frames, utt_pad_proportion=0.05263, over 10333.64 utterances.], batch size: 43, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:39:03,645 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36805.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 02:39:23,926 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.661e+02 3.552e+02 4.499e+02 1.125e+03, threshold=7.103e+02, percent-clipped=5.0 2023-03-08 02:39:54,077 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36837.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:40:22,150 INFO [train2.py:809] (2/4) Epoch 10, batch 1000, loss[ctc_loss=0.1342, att_loss=0.2669, loss=0.2404, over 16752.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.007235, over 48.00 utterances.], tot_loss[ctc_loss=0.1117, att_loss=0.2543, loss=0.2258, over 3261528.52 frames. utt_duration=1233 frames, utt_pad_proportion=0.05437, over 10591.63 utterances.], batch size: 48, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:40:41,835 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36866.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 02:41:11,558 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36885.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:41:43,593 INFO [train2.py:809] (2/4) Epoch 10, batch 1050, loss[ctc_loss=0.1056, att_loss=0.2624, loss=0.2311, over 16879.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.006979, over 49.00 utterances.], tot_loss[ctc_loss=0.1119, att_loss=0.2543, loss=0.2258, over 3265412.49 frames. utt_duration=1227 frames, utt_pad_proportion=0.05534, over 10658.95 utterances.], batch size: 49, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:41:55,896 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36911.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:42:06,548 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.560e+02 2.544e+02 2.981e+02 3.460e+02 1.238e+03, threshold=5.962e+02, percent-clipped=1.0 2023-03-08 02:43:05,653 INFO [train2.py:809] (2/4) Epoch 10, batch 1100, loss[ctc_loss=0.08117, att_loss=0.2278, loss=0.1984, over 16118.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006726, over 42.00 utterances.], tot_loss[ctc_loss=0.1118, att_loss=0.2549, loss=0.2263, over 3272006.42 frames. 
utt_duration=1214 frames, utt_pad_proportion=0.05651, over 10791.57 utterances.], batch size: 42, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:43:12,944 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36958.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:43:17,541 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36961.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:43:22,061 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36964.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:43:28,260 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36968.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:43:29,712 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6199, 5.9409, 5.3400, 5.7553, 5.6064, 5.2188, 5.2439, 5.1344], device='cuda:2'), covar=tensor([0.1432, 0.0910, 0.0928, 0.0770, 0.0858, 0.1473, 0.2684, 0.2605], device='cuda:2'), in_proj_covar=tensor([0.0419, 0.0485, 0.0365, 0.0376, 0.0350, 0.0405, 0.0498, 0.0447], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 02:43:34,572 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36972.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:44:16,941 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4205, 4.8931, 4.6252, 4.6991, 4.8673, 4.5446, 3.0794, 4.7016], device='cuda:2'), covar=tensor([0.0104, 0.0119, 0.0106, 0.0089, 0.0082, 0.0103, 0.0734, 0.0201], device='cuda:2'), in_proj_covar=tensor([0.0069, 0.0069, 0.0081, 0.0050, 0.0055, 0.0065, 0.0087, 0.0088], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 02:44:26,441 INFO [train2.py:809] (2/4) Epoch 10, batch 1150, loss[ctc_loss=0.1253, att_loss=0.2798, loss=0.2489, over 17087.00 frames. utt_duration=1222 frames, utt_pad_proportion=0.01689, over 56.00 utterances.], tot_loss[ctc_loss=0.1111, att_loss=0.2542, loss=0.2256, over 3273257.01 frames. utt_duration=1232 frames, utt_pad_proportion=0.05327, over 10644.23 utterances.], batch size: 56, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:44:26,844 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=37004.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:44:48,827 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.581e+02 2.521e+02 2.891e+02 3.394e+02 5.463e+02, threshold=5.783e+02, percent-clipped=0.0 2023-03-08 02:44:50,898 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=37019.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:44:55,658 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1179, 5.1716, 4.9324, 2.6001, 2.0164, 2.8527, 3.4700, 3.7617], device='cuda:2'), covar=tensor([0.0650, 0.0207, 0.0270, 0.3911, 0.5854, 0.2626, 0.1557, 0.1874], device='cuda:2'), in_proj_covar=tensor([0.0329, 0.0216, 0.0233, 0.0199, 0.0351, 0.0336, 0.0226, 0.0349], device='cuda:2'), out_proj_covar=tensor([1.5197e-04, 8.2543e-05, 1.0092e-04, 9.0335e-05, 1.5559e-04, 1.3844e-04, 8.9003e-05, 1.5003e-04], device='cuda:2') 2023-03-08 02:45:46,918 INFO [train2.py:809] (2/4) Epoch 10, batch 1200, loss[ctc_loss=0.09583, att_loss=0.2661, loss=0.2321, over 16952.00 frames. 
utt_duration=1358 frames, utt_pad_proportion=0.006774, over 50.00 utterances.], tot_loss[ctc_loss=0.1114, att_loss=0.255, loss=0.2263, over 3280829.92 frames. utt_duration=1233 frames, utt_pad_proportion=0.05159, over 10652.91 utterances.], batch size: 50, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:45:50,261 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37056.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 02:46:04,809 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=37065.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:46:33,022 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4175, 2.5738, 5.0549, 3.9748, 2.9402, 4.4158, 4.9028, 4.7766], device='cuda:2'), covar=tensor([0.0249, 0.1709, 0.0166, 0.0988, 0.1957, 0.0227, 0.0105, 0.0191], device='cuda:2'), in_proj_covar=tensor([0.0141, 0.0243, 0.0130, 0.0298, 0.0274, 0.0181, 0.0111, 0.0144], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 02:46:52,541 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8260, 3.7203, 3.6221, 3.0041, 3.6995, 3.6949, 3.7161, 2.2252], device='cuda:2'), covar=tensor([0.1042, 0.2208, 0.3464, 0.7342, 0.1174, 0.7381, 0.0929, 0.9944], device='cuda:2'), in_proj_covar=tensor([0.0090, 0.0113, 0.0116, 0.0184, 0.0096, 0.0170, 0.0098, 0.0168], device='cuda:2'), out_proj_covar=tensor([8.5846e-05, 9.7298e-05, 1.0391e-04, 1.4789e-04, 8.8072e-05, 1.3959e-04, 8.5353e-05, 1.3583e-04], device='cuda:2') 2023-03-08 02:47:01,693 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-08 02:47:06,739 INFO [train2.py:809] (2/4) Epoch 10, batch 1250, loss[ctc_loss=0.113, att_loss=0.2692, loss=0.238, over 16953.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007675, over 50.00 utterances.], tot_loss[ctc_loss=0.1112, att_loss=0.2544, loss=0.2258, over 3275755.50 frames. utt_duration=1259 frames, utt_pad_proportion=0.04666, over 10420.06 utterances.], batch size: 50, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:47:29,342 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.464e+02 2.541e+02 3.066e+02 3.675e+02 8.173e+02, threshold=6.132e+02, percent-clipped=8.0 2023-03-08 02:47:35,147 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.41 vs. limit=5.0 2023-03-08 02:48:28,615 INFO [train2.py:809] (2/4) Epoch 10, batch 1300, loss[ctc_loss=0.092, att_loss=0.2248, loss=0.1982, over 15378.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.0107, over 35.00 utterances.], tot_loss[ctc_loss=0.1108, att_loss=0.2545, loss=0.2257, over 3282829.15 frames. utt_duration=1251 frames, utt_pad_proportion=0.04854, over 10511.39 utterances.], batch size: 35, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:48:40,425 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37161.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 02:49:15,564 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.47 vs. limit=5.0 2023-03-08 02:49:49,924 INFO [train2.py:809] (2/4) Epoch 10, batch 1350, loss[ctc_loss=0.1161, att_loss=0.267, loss=0.2368, over 17545.00 frames. utt_duration=1019 frames, utt_pad_proportion=0.03899, over 69.00 utterances.], tot_loss[ctc_loss=0.1101, att_loss=0.2539, loss=0.2251, over 3282801.68 frames. 
utt_duration=1254 frames, utt_pad_proportion=0.04734, over 10481.12 utterances.], batch size: 69, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:50:12,587 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.658e+02 2.454e+02 3.108e+02 4.121e+02 7.457e+02, threshold=6.215e+02, percent-clipped=5.0 2023-03-08 02:50:37,934 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8876, 4.6373, 4.4558, 4.6936, 5.1766, 4.8727, 4.6088, 2.4000], device='cuda:2'), covar=tensor([0.0147, 0.0313, 0.0280, 0.0245, 0.1058, 0.0129, 0.0245, 0.2078], device='cuda:2'), in_proj_covar=tensor([0.0125, 0.0123, 0.0130, 0.0132, 0.0319, 0.0121, 0.0112, 0.0215], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 02:50:43,293 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6924, 2.7027, 3.4676, 4.5910, 3.9655, 4.1186, 2.9903, 2.1291], device='cuda:2'), covar=tensor([0.0612, 0.2295, 0.1083, 0.0514, 0.0795, 0.0346, 0.1649, 0.2572], device='cuda:2'), in_proj_covar=tensor([0.0164, 0.0204, 0.0191, 0.0190, 0.0179, 0.0144, 0.0191, 0.0184], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 02:51:11,276 INFO [train2.py:809] (2/4) Epoch 10, batch 1400, loss[ctc_loss=0.1065, att_loss=0.2532, loss=0.2239, over 17232.00 frames. utt_duration=874.1 frames, utt_pad_proportion=0.08379, over 79.00 utterances.], tot_loss[ctc_loss=0.1103, att_loss=0.2539, loss=0.2252, over 3276237.23 frames. utt_duration=1232 frames, utt_pad_proportion=0.05445, over 10649.35 utterances.], batch size: 79, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:51:23,182 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37261.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:51:27,875 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37264.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:51:32,391 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37267.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:51:33,965 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37268.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:51:34,029 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=37268.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:52:27,539 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5331, 4.4695, 4.2973, 4.5210, 4.9495, 4.5883, 4.4443, 2.0490], device='cuda:2'), covar=tensor([0.0249, 0.0282, 0.0311, 0.0208, 0.0962, 0.0201, 0.0251, 0.2472], device='cuda:2'), in_proj_covar=tensor([0.0125, 0.0122, 0.0129, 0.0132, 0.0318, 0.0121, 0.0112, 0.0215], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 02:52:31,832 INFO [train2.py:809] (2/4) Epoch 10, batch 1450, loss[ctc_loss=0.1233, att_loss=0.2556, loss=0.2292, over 16287.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.006233, over 43.00 utterances.], tot_loss[ctc_loss=0.1109, att_loss=0.2541, loss=0.2254, over 3271324.22 frames. 
utt_duration=1228 frames, utt_pad_proportion=0.05789, over 10668.28 utterances.], batch size: 43, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:52:39,561 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37309.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:52:44,934 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37312.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:52:48,257 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37314.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:52:51,391 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37316.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:52:54,387 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+02 2.516e+02 3.005e+02 3.782e+02 1.128e+03, threshold=6.010e+02, percent-clipped=1.0 2023-03-08 02:53:11,959 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=37329.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:53:22,638 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6312, 2.6121, 3.4483, 4.5207, 3.9967, 4.0953, 3.0262, 2.0269], device='cuda:2'), covar=tensor([0.0545, 0.2352, 0.0975, 0.0487, 0.0580, 0.0282, 0.1562, 0.2523], device='cuda:2'), in_proj_covar=tensor([0.0162, 0.0203, 0.0187, 0.0188, 0.0179, 0.0142, 0.0189, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 02:53:45,922 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4998, 1.4059, 1.8178, 2.2521, 2.7857, 1.8663, 1.4852, 2.1418], device='cuda:2'), covar=tensor([0.1701, 0.5712, 0.4797, 0.1878, 0.1188, 0.2029, 0.4212, 0.1716], device='cuda:2'), in_proj_covar=tensor([0.0077, 0.0084, 0.0091, 0.0076, 0.0075, 0.0070, 0.0083, 0.0066], device='cuda:2'), out_proj_covar=tensor([4.7564e-05, 5.6428e-05, 5.8957e-05, 4.9109e-05, 4.6139e-05, 4.8052e-05, 5.5049e-05, 4.5927e-05], device='cuda:2') 2023-03-08 02:53:51,710 INFO [train2.py:809] (2/4) Epoch 10, batch 1500, loss[ctc_loss=0.1195, att_loss=0.2655, loss=0.2363, over 17323.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02257, over 59.00 utterances.], tot_loss[ctc_loss=0.1114, att_loss=0.2541, loss=0.2256, over 3265226.30 frames. utt_duration=1224 frames, utt_pad_proportion=0.05976, over 10686.10 utterances.], batch size: 59, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:53:55,195 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37356.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 02:54:01,998 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37360.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:54:08,301 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=37364.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:55:12,309 INFO [train2.py:809] (2/4) Epoch 10, batch 1550, loss[ctc_loss=0.1014, att_loss=0.2493, loss=0.2197, over 16010.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.007048, over 40.00 utterances.], tot_loss[ctc_loss=0.1113, att_loss=0.2545, loss=0.2259, over 3277495.83 frames. 
utt_duration=1248 frames, utt_pad_proportion=0.05148, over 10515.42 utterances.], batch size: 40, lr: 1.10e-02, grad_scale: 16.0 2023-03-08 02:55:12,437 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37404.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 02:55:34,599 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+02 2.658e+02 3.084e+02 3.993e+02 7.629e+02, threshold=6.167e+02, percent-clipped=4.0 2023-03-08 02:55:35,054 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6893, 1.6812, 2.2464, 2.0964, 2.4451, 2.1484, 1.7273, 2.2925], device='cuda:2'), covar=tensor([0.1964, 0.4235, 0.3963, 0.2271, 0.2098, 0.2142, 0.4179, 0.1810], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0083, 0.0088, 0.0075, 0.0074, 0.0069, 0.0082, 0.0065], device='cuda:2'), out_proj_covar=tensor([4.6796e-05, 5.5606e-05, 5.7811e-05, 4.8653e-05, 4.5679e-05, 4.7214e-05, 5.4458e-05, 4.5436e-05], device='cuda:2') 2023-03-08 02:55:45,942 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=37425.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 02:56:32,395 INFO [train2.py:809] (2/4) Epoch 10, batch 1600, loss[ctc_loss=0.1057, att_loss=0.2453, loss=0.2174, over 16697.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005208, over 46.00 utterances.], tot_loss[ctc_loss=0.1117, att_loss=0.2544, loss=0.2258, over 3271656.95 frames. utt_duration=1243 frames, utt_pad_proportion=0.05591, over 10543.69 utterances.], batch size: 46, lr: 1.09e-02, grad_scale: 16.0 2023-03-08 02:56:43,950 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37461.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 02:56:48,631 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6011, 2.5075, 5.1698, 3.8590, 3.1295, 4.5657, 4.9228, 4.7587], device='cuda:2'), covar=tensor([0.0250, 0.1938, 0.0187, 0.1112, 0.1992, 0.0231, 0.0110, 0.0227], device='cuda:2'), in_proj_covar=tensor([0.0144, 0.0245, 0.0132, 0.0303, 0.0278, 0.0185, 0.0113, 0.0149], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 02:57:10,459 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1813, 5.2177, 5.1290, 2.7391, 4.9856, 4.6138, 4.5959, 2.6499], device='cuda:2'), covar=tensor([0.0130, 0.0068, 0.0174, 0.1040, 0.0076, 0.0172, 0.0253, 0.1334], device='cuda:2'), in_proj_covar=tensor([0.0059, 0.0079, 0.0069, 0.0101, 0.0067, 0.0090, 0.0089, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 02:57:53,649 INFO [train2.py:809] (2/4) Epoch 10, batch 1650, loss[ctc_loss=0.1592, att_loss=0.2799, loss=0.2558, over 16673.00 frames. utt_duration=675.3 frames, utt_pad_proportion=0.1527, over 99.00 utterances.], tot_loss[ctc_loss=0.1116, att_loss=0.254, loss=0.2255, over 3256413.56 frames. utt_duration=1191 frames, utt_pad_proportion=0.07245, over 10952.95 utterances.], batch size: 99, lr: 1.09e-02, grad_scale: 16.0 2023-03-08 02:58:01,287 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37509.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 02:58:14,780 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.538e+02 2.486e+02 2.891e+02 3.571e+02 1.016e+03, threshold=5.782e+02, percent-clipped=2.0 2023-03-08 02:58:30,052 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.70 vs. 
limit=5.0 2023-03-08 02:58:51,291 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7581, 1.9333, 2.4199, 2.2291, 2.5199, 2.1173, 1.8519, 2.6390], device='cuda:2'), covar=tensor([0.1133, 0.3272, 0.2467, 0.1811, 0.1355, 0.1846, 0.3341, 0.1224], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0082, 0.0085, 0.0072, 0.0072, 0.0066, 0.0079, 0.0063], device='cuda:2'), out_proj_covar=tensor([4.5350e-05, 5.4262e-05, 5.5899e-05, 4.7123e-05, 4.4428e-05, 4.5744e-05, 5.2891e-05, 4.3873e-05], device='cuda:2') 2023-03-08 02:59:12,748 INFO [train2.py:809] (2/4) Epoch 10, batch 1700, loss[ctc_loss=0.1111, att_loss=0.2473, loss=0.2201, over 15774.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.0084, over 38.00 utterances.], tot_loss[ctc_loss=0.1113, att_loss=0.2538, loss=0.2253, over 3250115.49 frames. utt_duration=1181 frames, utt_pad_proportion=0.07677, over 11023.76 utterances.], batch size: 38, lr: 1.09e-02, grad_scale: 16.0 2023-03-08 02:59:33,339 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37567.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:00:33,880 INFO [train2.py:809] (2/4) Epoch 10, batch 1750, loss[ctc_loss=0.09432, att_loss=0.2307, loss=0.2034, over 15773.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.007604, over 38.00 utterances.], tot_loss[ctc_loss=0.1094, att_loss=0.2524, loss=0.2238, over 3253398.37 frames. utt_duration=1221 frames, utt_pad_proportion=0.06634, over 10672.36 utterances.], batch size: 38, lr: 1.09e-02, grad_scale: 16.0 2023-03-08 03:00:49,724 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37614.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:00:51,060 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37615.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:00:53,445 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. limit=2.0 2023-03-08 03:00:55,646 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.443e+02 2.432e+02 2.940e+02 3.870e+02 6.231e+02, threshold=5.879e+02, percent-clipped=2.0 2023-03-08 03:01:05,302 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37624.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:01:53,789 INFO [train2.py:809] (2/4) Epoch 10, batch 1800, loss[ctc_loss=0.1115, att_loss=0.2641, loss=0.2336, over 17117.00 frames. utt_duration=693 frames, utt_pad_proportion=0.1305, over 99.00 utterances.], tot_loss[ctc_loss=0.1088, att_loss=0.2524, loss=0.2237, over 3261154.06 frames. utt_duration=1246 frames, utt_pad_proportion=0.05787, over 10479.92 utterances.], batch size: 99, lr: 1.09e-02, grad_scale: 32.0 2023-03-08 03:02:03,015 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37660.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:02:05,721 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37662.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:03:13,561 INFO [train2.py:809] (2/4) Epoch 10, batch 1850, loss[ctc_loss=0.1058, att_loss=0.2408, loss=0.2138, over 16156.00 frames. utt_duration=1578 frames, utt_pad_proportion=0.00834, over 41.00 utterances.], tot_loss[ctc_loss=0.1084, att_loss=0.2522, loss=0.2234, over 3253901.03 frames. 
utt_duration=1250 frames, utt_pad_proportion=0.05807, over 10425.23 utterances.], batch size: 41, lr: 1.09e-02, grad_scale: 16.0 2023-03-08 03:03:19,721 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37708.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:03:21,532 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4263, 1.7536, 2.3334, 2.3619, 2.8604, 2.1170, 1.9622, 3.0976], device='cuda:2'), covar=tensor([0.0992, 0.4588, 0.3416, 0.1950, 0.1616, 0.1859, 0.3849, 0.1270], device='cuda:2'), in_proj_covar=tensor([0.0074, 0.0084, 0.0088, 0.0073, 0.0072, 0.0068, 0.0080, 0.0063], device='cuda:2'), out_proj_covar=tensor([4.5981e-05, 5.5599e-05, 5.7473e-05, 4.7918e-05, 4.5062e-05, 4.6865e-05, 5.3808e-05, 4.3934e-05], device='cuda:2') 2023-03-08 03:03:28,449 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.80 vs. limit=5.0 2023-03-08 03:03:36,155 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.525e+02 2.414e+02 3.142e+02 4.365e+02 1.152e+03, threshold=6.284e+02, percent-clipped=6.0 2023-03-08 03:03:37,925 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37720.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:03:59,951 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8527, 2.5445, 3.3970, 4.7000, 4.1679, 4.1857, 3.1086, 2.0506], device='cuda:2'), covar=tensor([0.0452, 0.2412, 0.1201, 0.0460, 0.0787, 0.0325, 0.1201, 0.2227], device='cuda:2'), in_proj_covar=tensor([0.0163, 0.0206, 0.0187, 0.0185, 0.0181, 0.0143, 0.0187, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 03:04:15,516 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8793, 5.2223, 4.7419, 5.3223, 4.6713, 4.9668, 5.3997, 5.1913], device='cuda:2'), covar=tensor([0.0553, 0.0281, 0.0790, 0.0218, 0.0434, 0.0204, 0.0185, 0.0154], device='cuda:2'), in_proj_covar=tensor([0.0315, 0.0249, 0.0310, 0.0238, 0.0253, 0.0196, 0.0230, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 03:04:33,113 INFO [train2.py:809] (2/4) Epoch 10, batch 1900, loss[ctc_loss=0.11, att_loss=0.2635, loss=0.2328, over 17226.00 frames. utt_duration=873.7 frames, utt_pad_proportion=0.08221, over 79.00 utterances.], tot_loss[ctc_loss=0.1089, att_loss=0.2527, loss=0.2239, over 3251530.82 frames. 
utt_duration=1224 frames, utt_pad_proportion=0.06573, over 10639.85 utterances.], batch size: 79, lr: 1.09e-02, grad_scale: 16.0 2023-03-08 03:04:34,872 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5757, 5.0815, 4.7766, 4.9949, 5.0476, 4.7530, 3.6605, 5.0056], device='cuda:2'), covar=tensor([0.0105, 0.0106, 0.0098, 0.0086, 0.0095, 0.0093, 0.0572, 0.0186], device='cuda:2'), in_proj_covar=tensor([0.0068, 0.0068, 0.0079, 0.0049, 0.0054, 0.0064, 0.0085, 0.0085], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:05:23,544 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3348, 5.2949, 5.2830, 3.1970, 5.0776, 4.8006, 4.6953, 2.8873], device='cuda:2'), covar=tensor([0.0101, 0.0059, 0.0110, 0.0809, 0.0072, 0.0147, 0.0205, 0.1239], device='cuda:2'), in_proj_covar=tensor([0.0058, 0.0079, 0.0068, 0.0100, 0.0067, 0.0089, 0.0088, 0.0098], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 03:05:53,430 INFO [train2.py:809] (2/4) Epoch 10, batch 1950, loss[ctc_loss=0.132, att_loss=0.2735, loss=0.2452, over 17065.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.00889, over 53.00 utterances.], tot_loss[ctc_loss=0.1092, att_loss=0.2534, loss=0.2245, over 3254727.04 frames. utt_duration=1235 frames, utt_pad_proportion=0.063, over 10553.67 utterances.], batch size: 53, lr: 1.09e-02, grad_scale: 16.0 2023-03-08 03:06:16,463 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+02 2.508e+02 3.059e+02 3.528e+02 6.330e+02, threshold=6.117e+02, percent-clipped=1.0 2023-03-08 03:06:31,780 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6305, 3.5066, 3.0149, 3.3489, 3.8012, 3.3867, 2.6629, 4.0039], device='cuda:2'), covar=tensor([0.1055, 0.0461, 0.1066, 0.0606, 0.0576, 0.0634, 0.0846, 0.0496], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0178, 0.0203, 0.0169, 0.0226, 0.0207, 0.0176, 0.0244], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 03:06:39,555 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.89 vs. limit=5.0 2023-03-08 03:07:12,866 INFO [train2.py:809] (2/4) Epoch 10, batch 2000, loss[ctc_loss=0.1156, att_loss=0.2485, loss=0.2219, over 15760.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.00883, over 38.00 utterances.], tot_loss[ctc_loss=0.1091, att_loss=0.2531, loss=0.2243, over 3262595.52 frames. 
utt_duration=1240 frames, utt_pad_proportion=0.05971, over 10540.47 utterances.], batch size: 38, lr: 1.09e-02, grad_scale: 16.0 2023-03-08 03:07:25,288 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0897, 6.2709, 5.6247, 6.0683, 6.0011, 5.5478, 5.7797, 5.4667], device='cuda:2'), covar=tensor([0.1102, 0.0879, 0.0806, 0.0796, 0.0677, 0.1360, 0.2470, 0.2879], device='cuda:2'), in_proj_covar=tensor([0.0419, 0.0489, 0.0370, 0.0378, 0.0349, 0.0411, 0.0501, 0.0457], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 03:07:46,403 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6209, 4.8413, 4.7734, 4.7669, 5.4039, 4.5555, 4.8121, 2.1775], device='cuda:2'), covar=tensor([0.0196, 0.0160, 0.0136, 0.0135, 0.0672, 0.0198, 0.0141, 0.2108], device='cuda:2'), in_proj_covar=tensor([0.0124, 0.0125, 0.0130, 0.0133, 0.0319, 0.0121, 0.0112, 0.0218], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 03:08:19,907 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9056, 5.1627, 4.6614, 5.3271, 4.5355, 4.9699, 5.3442, 5.1356], device='cuda:2'), covar=tensor([0.0500, 0.0283, 0.0911, 0.0200, 0.0515, 0.0210, 0.0204, 0.0158], device='cuda:2'), in_proj_covar=tensor([0.0319, 0.0252, 0.0317, 0.0241, 0.0258, 0.0198, 0.0235, 0.0228], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 03:08:32,707 INFO [train2.py:809] (2/4) Epoch 10, batch 2050, loss[ctc_loss=0.1123, att_loss=0.2611, loss=0.2313, over 17068.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.007997, over 52.00 utterances.], tot_loss[ctc_loss=0.1079, att_loss=0.2524, loss=0.2235, over 3257443.27 frames. utt_duration=1280 frames, utt_pad_proportion=0.05136, over 10188.11 utterances.], batch size: 52, lr: 1.09e-02, grad_scale: 16.0 2023-03-08 03:08:46,079 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-03-08 03:08:55,900 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.379e+02 2.394e+02 2.856e+02 3.386e+02 7.118e+02, threshold=5.711e+02, percent-clipped=3.0 2023-03-08 03:09:04,088 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37924.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:09:30,823 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2283, 4.5914, 4.3898, 4.0593, 2.4697, 4.4263, 2.3893, 1.6210], device='cuda:2'), covar=tensor([0.0261, 0.0124, 0.0531, 0.0267, 0.1776, 0.0178, 0.1594, 0.1757], device='cuda:2'), in_proj_covar=tensor([0.0128, 0.0103, 0.0250, 0.0108, 0.0216, 0.0102, 0.0223, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 03:09:32,312 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1196, 2.3621, 2.6678, 3.8294, 3.5070, 3.7027, 2.6046, 1.9298], device='cuda:2'), covar=tensor([0.0666, 0.2372, 0.1294, 0.0619, 0.0825, 0.0353, 0.1617, 0.2279], device='cuda:2'), in_proj_covar=tensor([0.0163, 0.0204, 0.0187, 0.0185, 0.0180, 0.0143, 0.0187, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 03:09:52,781 INFO [train2.py:809] (2/4) Epoch 10, batch 2100, loss[ctc_loss=0.09243, att_loss=0.2503, loss=0.2187, over 16522.00 frames. 
utt_duration=1470 frames, utt_pad_proportion=0.006622, over 45.00 utterances.], tot_loss[ctc_loss=0.1084, att_loss=0.2528, loss=0.2239, over 3264392.22 frames. utt_duration=1287 frames, utt_pad_proportion=0.04697, over 10154.08 utterances.], batch size: 45, lr: 1.09e-02, grad_scale: 8.0 2023-03-08 03:10:21,721 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37972.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:10:33,457 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=37979.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:11:18,142 INFO [train2.py:809] (2/4) Epoch 10, batch 2150, loss[ctc_loss=0.147, att_loss=0.2842, loss=0.2568, over 17292.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01248, over 55.00 utterances.], tot_loss[ctc_loss=0.1085, att_loss=0.2525, loss=0.2237, over 3260728.38 frames. utt_duration=1302 frames, utt_pad_proportion=0.04491, over 10032.54 utterances.], batch size: 55, lr: 1.09e-02, grad_scale: 8.0 2023-03-08 03:11:42,431 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+02 2.860e+02 3.437e+02 3.901e+02 7.219e+02, threshold=6.874e+02, percent-clipped=5.0 2023-03-08 03:11:42,755 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=38020.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:12:05,960 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8750, 5.1394, 5.1791, 5.0904, 5.2241, 5.1863, 4.9228, 4.6294], device='cuda:2'), covar=tensor([0.1027, 0.0563, 0.0227, 0.0462, 0.0296, 0.0319, 0.0325, 0.0364], device='cuda:2'), in_proj_covar=tensor([0.0441, 0.0283, 0.0232, 0.0268, 0.0332, 0.0353, 0.0278, 0.0311], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 03:12:15,463 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=38040.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:12:37,089 INFO [train2.py:809] (2/4) Epoch 10, batch 2200, loss[ctc_loss=0.1495, att_loss=0.2762, loss=0.2509, over 16481.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.005815, over 46.00 utterances.], tot_loss[ctc_loss=0.1081, att_loss=0.2517, loss=0.223, over 3259714.27 frames. 
utt_duration=1307 frames, utt_pad_proportion=0.04408, over 9990.52 utterances.], batch size: 46, lr: 1.09e-02, grad_scale: 8.0 2023-03-08 03:12:58,886 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=38068.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:13:15,986 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6085, 4.8245, 4.6545, 4.6347, 5.1095, 4.7345, 4.6900, 2.1888], device='cuda:2'), covar=tensor([0.0213, 0.0254, 0.0227, 0.0254, 0.1174, 0.0183, 0.0245, 0.2555], device='cuda:2'), in_proj_covar=tensor([0.0123, 0.0124, 0.0130, 0.0132, 0.0319, 0.0118, 0.0113, 0.0216], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 03:13:23,545 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1010, 5.3654, 5.6751, 5.5792, 5.6201, 6.0621, 5.1738, 6.1912], device='cuda:2'), covar=tensor([0.0621, 0.0643, 0.0640, 0.0849, 0.1583, 0.0828, 0.0584, 0.0541], device='cuda:2'), in_proj_covar=tensor([0.0672, 0.0408, 0.0476, 0.0528, 0.0703, 0.0466, 0.0386, 0.0468], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:13:56,131 INFO [train2.py:809] (2/4) Epoch 10, batch 2250, loss[ctc_loss=0.1098, att_loss=0.2597, loss=0.2297, over 17017.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007831, over 51.00 utterances.], tot_loss[ctc_loss=0.1087, att_loss=0.2521, loss=0.2234, over 3256313.65 frames. utt_duration=1294 frames, utt_pad_proportion=0.04724, over 10080.38 utterances.], batch size: 51, lr: 1.09e-02, grad_scale: 8.0 2023-03-08 03:14:21,121 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.523e+02 2.484e+02 3.268e+02 3.999e+02 7.414e+02, threshold=6.535e+02, percent-clipped=1.0 2023-03-08 03:15:14,534 INFO [train2.py:809] (2/4) Epoch 10, batch 2300, loss[ctc_loss=0.1241, att_loss=0.2473, loss=0.2226, over 15946.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007462, over 41.00 utterances.], tot_loss[ctc_loss=0.109, att_loss=0.2525, loss=0.2238, over 3262309.89 frames. utt_duration=1285 frames, utt_pad_proportion=0.04723, over 10167.52 utterances.], batch size: 41, lr: 1.09e-02, grad_scale: 8.0 2023-03-08 03:16:11,179 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3309, 3.5094, 3.5637, 2.7024, 3.6483, 3.5399, 3.5564, 1.8690], device='cuda:2'), covar=tensor([0.1374, 0.1636, 0.2296, 1.2958, 0.1176, 0.6104, 0.0881, 1.8108], device='cuda:2'), in_proj_covar=tensor([0.0094, 0.0114, 0.0119, 0.0189, 0.0098, 0.0174, 0.0099, 0.0174], device='cuda:2'), out_proj_covar=tensor([8.9797e-05, 9.9827e-05, 1.0675e-04, 1.5267e-04, 9.0533e-05, 1.4416e-04, 8.7832e-05, 1.4110e-04], device='cuda:2') 2023-03-08 03:16:33,799 INFO [train2.py:809] (2/4) Epoch 10, batch 2350, loss[ctc_loss=0.08458, att_loss=0.2201, loss=0.193, over 15398.00 frames. utt_duration=1761 frames, utt_pad_proportion=0.00969, over 35.00 utterances.], tot_loss[ctc_loss=0.1104, att_loss=0.2536, loss=0.225, over 3262270.95 frames. 
utt_duration=1233 frames, utt_pad_proportion=0.06066, over 10596.98 utterances.], batch size: 35, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:16:54,990 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6460, 2.2840, 4.9678, 3.8509, 3.1395, 4.3901, 4.8353, 4.6677], device='cuda:2'), covar=tensor([0.0166, 0.2168, 0.0159, 0.1088, 0.1854, 0.0223, 0.0085, 0.0172], device='cuda:2'), in_proj_covar=tensor([0.0148, 0.0248, 0.0131, 0.0306, 0.0278, 0.0185, 0.0115, 0.0150], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0003, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 03:16:59,114 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+02 2.404e+02 3.254e+02 4.140e+02 7.234e+02, threshold=6.508e+02, percent-clipped=3.0 2023-03-08 03:17:27,192 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5355, 2.6233, 3.5552, 3.0297, 3.4772, 4.6769, 4.4625, 3.2261], device='cuda:2'), covar=tensor([0.0417, 0.2208, 0.1184, 0.1468, 0.1139, 0.0676, 0.0459, 0.1500], device='cuda:2'), in_proj_covar=tensor([0.0229, 0.0229, 0.0244, 0.0208, 0.0239, 0.0297, 0.0213, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:17:39,632 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0097, 4.9890, 4.9091, 2.1721, 1.9145, 2.6256, 2.9110, 3.6222], device='cuda:2'), covar=tensor([0.0637, 0.0182, 0.0171, 0.4024, 0.6439, 0.2821, 0.2115, 0.2043], device='cuda:2'), in_proj_covar=tensor([0.0330, 0.0216, 0.0230, 0.0198, 0.0353, 0.0333, 0.0227, 0.0349], device='cuda:2'), out_proj_covar=tensor([1.5143e-04, 8.2554e-05, 9.9502e-05, 8.9406e-05, 1.5557e-04, 1.3637e-04, 8.9472e-05, 1.4924e-04], device='cuda:2') 2023-03-08 03:17:43,322 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7060, 3.0286, 3.6433, 3.2859, 3.6680, 4.7918, 4.5484, 3.4789], device='cuda:2'), covar=tensor([0.0334, 0.1664, 0.1117, 0.1300, 0.1051, 0.0629, 0.0549, 0.1188], device='cuda:2'), in_proj_covar=tensor([0.0229, 0.0229, 0.0244, 0.0208, 0.0239, 0.0297, 0.0213, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:17:53,625 INFO [train2.py:809] (2/4) Epoch 10, batch 2400, loss[ctc_loss=0.1336, att_loss=0.2756, loss=0.2472, over 17352.00 frames. utt_duration=1103 frames, utt_pad_proportion=0.03571, over 63.00 utterances.], tot_loss[ctc_loss=0.11, att_loss=0.2538, loss=0.225, over 3272831.88 frames. 
utt_duration=1252 frames, utt_pad_proportion=0.05289, over 10470.74 utterances.], batch size: 63, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:18:22,231 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7763, 2.3422, 2.3816, 3.4065, 3.1292, 3.3699, 2.5203, 2.0824], device='cuda:2'), covar=tensor([0.0645, 0.2031, 0.1308, 0.0555, 0.0785, 0.0383, 0.1355, 0.2025], device='cuda:2'), in_proj_covar=tensor([0.0164, 0.0208, 0.0189, 0.0188, 0.0183, 0.0146, 0.0190, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 03:19:00,526 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7833, 1.4421, 1.9146, 1.9093, 2.3404, 1.5798, 1.4409, 2.7026], device='cuda:2'), covar=tensor([0.1099, 0.5059, 0.4466, 0.2218, 0.1435, 0.2241, 0.3802, 0.1188], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0084, 0.0090, 0.0073, 0.0072, 0.0068, 0.0082, 0.0062], device='cuda:2'), out_proj_covar=tensor([4.5867e-05, 5.6304e-05, 5.8473e-05, 4.8372e-05, 4.5360e-05, 4.6682e-05, 5.4339e-05, 4.3731e-05], device='cuda:2') 2023-03-08 03:19:14,665 INFO [train2.py:809] (2/4) Epoch 10, batch 2450, loss[ctc_loss=0.1217, att_loss=0.2683, loss=0.239, over 16991.00 frames. utt_duration=694.9 frames, utt_pad_proportion=0.1292, over 98.00 utterances.], tot_loss[ctc_loss=0.1096, att_loss=0.2541, loss=0.2252, over 3277009.67 frames. utt_duration=1241 frames, utt_pad_proportion=0.05514, over 10571.55 utterances.], batch size: 98, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:19:40,306 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.402e+02 2.470e+02 3.021e+02 3.879e+02 8.501e+02, threshold=6.042e+02, percent-clipped=3.0 2023-03-08 03:20:05,101 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=38335.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:20:34,413 INFO [train2.py:809] (2/4) Epoch 10, batch 2500, loss[ctc_loss=0.09895, att_loss=0.2577, loss=0.226, over 16480.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.006519, over 46.00 utterances.], tot_loss[ctc_loss=0.1092, att_loss=0.254, loss=0.2251, over 3273180.43 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.05196, over 10413.48 utterances.], batch size: 46, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:20:39,321 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5451, 2.9267, 3.6144, 2.8750, 3.6017, 4.7665, 4.4591, 3.4092], device='cuda:2'), covar=tensor([0.0343, 0.1487, 0.1016, 0.1361, 0.0993, 0.0509, 0.0511, 0.1120], device='cuda:2'), in_proj_covar=tensor([0.0223, 0.0224, 0.0236, 0.0202, 0.0235, 0.0291, 0.0210, 0.0217], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:20:43,872 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=38360.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:21:17,611 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3251, 2.7227, 3.5529, 2.8311, 3.4281, 4.5742, 4.3152, 3.4173], device='cuda:2'), covar=tensor([0.0418, 0.1668, 0.0948, 0.1327, 0.1041, 0.0695, 0.0504, 0.1099], device='cuda:2'), in_proj_covar=tensor([0.0223, 0.0225, 0.0237, 0.0203, 0.0235, 0.0292, 0.0211, 0.0217], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:21:53,307 INFO [train2.py:809] (2/4) Epoch 10, batch 2550, loss[ctc_loss=0.07429, att_loss=0.2166, loss=0.1882, over 15501.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.009323, over 36.00 utterances.], tot_loss[ctc_loss=0.1114, att_loss=0.2551, loss=0.2264, over 3261492.26 frames. utt_duration=1236 frames, utt_pad_proportion=0.06036, over 10567.66 utterances.], batch size: 36, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:22:18,521 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.579e+02 2.719e+02 3.198e+02 3.977e+02 1.146e+03, threshold=6.397e+02, percent-clipped=5.0 2023-03-08 03:22:21,717 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=38421.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:23:12,917 INFO [train2.py:809] (2/4) Epoch 10, batch 2600, loss[ctc_loss=0.09029, att_loss=0.2557, loss=0.2227, over 17011.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.008574, over 51.00 utterances.], tot_loss[ctc_loss=0.1116, att_loss=0.2559, loss=0.227, over 3269420.59 frames. utt_duration=1204 frames, utt_pad_proportion=0.06493, over 10875.85 utterances.], batch size: 51, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:23:26,109 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6068, 5.1196, 4.9192, 4.9412, 5.0469, 4.7103, 3.3767, 4.9666], device='cuda:2'), covar=tensor([0.0098, 0.0112, 0.0089, 0.0072, 0.0094, 0.0107, 0.0693, 0.0210], device='cuda:2'), in_proj_covar=tensor([0.0072, 0.0071, 0.0083, 0.0051, 0.0056, 0.0067, 0.0090, 0.0090], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:24:31,650 INFO [train2.py:809] (2/4) Epoch 10, batch 2650, loss[ctc_loss=0.1001, att_loss=0.2516, loss=0.2213, over 17195.00 frames. utt_duration=872.4 frames, utt_pad_proportion=0.08558, over 79.00 utterances.], tot_loss[ctc_loss=0.1104, att_loss=0.2557, loss=0.2266, over 3273522.03 frames. utt_duration=1215 frames, utt_pad_proportion=0.06075, over 10786.23 utterances.], batch size: 79, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:24:37,474 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.73 vs. 
limit=5.0 2023-03-08 03:24:58,451 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.587e+02 2.727e+02 3.206e+02 4.393e+02 9.975e+02, threshold=6.412e+02, percent-clipped=4.0 2023-03-08 03:25:51,263 INFO [train2.py:809] (2/4) Epoch 10, batch 2700, loss[ctc_loss=0.1079, att_loss=0.2476, loss=0.2197, over 15949.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.006766, over 41.00 utterances.], tot_loss[ctc_loss=0.11, att_loss=0.255, loss=0.226, over 3277277.58 frames. utt_duration=1248 frames, utt_pad_proportion=0.0526, over 10519.99 utterances.], batch size: 41, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:26:01,436 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7944, 5.0607, 5.2976, 5.2696, 5.2518, 5.7484, 5.1199, 5.8507], device='cuda:2'), covar=tensor([0.0765, 0.0698, 0.0810, 0.1084, 0.2030, 0.0980, 0.0662, 0.0772], device='cuda:2'), in_proj_covar=tensor([0.0687, 0.0423, 0.0488, 0.0551, 0.0726, 0.0481, 0.0395, 0.0481], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:27:11,538 INFO [train2.py:809] (2/4) Epoch 10, batch 2750, loss[ctc_loss=0.09441, att_loss=0.2295, loss=0.2024, over 13598.00 frames. utt_duration=1815 frames, utt_pad_proportion=0.08581, over 30.00 utterances.], tot_loss[ctc_loss=0.1083, att_loss=0.253, loss=0.2241, over 3269345.43 frames. utt_duration=1280 frames, utt_pad_proportion=0.04685, over 10226.16 utterances.], batch size: 30, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:27:38,145 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.499e+02 2.648e+02 3.075e+02 3.913e+02 9.612e+02, threshold=6.151e+02, percent-clipped=3.0 2023-03-08 03:28:01,964 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=38635.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:28:31,010 INFO [train2.py:809] (2/4) Epoch 10, batch 2800, loss[ctc_loss=0.1088, att_loss=0.2505, loss=0.2222, over 16280.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.00669, over 43.00 utterances.], tot_loss[ctc_loss=0.1087, att_loss=0.2533, loss=0.2244, over 3272441.67 frames. utt_duration=1268 frames, utt_pad_proportion=0.04943, over 10337.87 utterances.], batch size: 43, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:29:18,056 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=38683.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:29:51,023 INFO [train2.py:809] (2/4) Epoch 10, batch 2850, loss[ctc_loss=0.108, att_loss=0.2618, loss=0.231, over 17322.00 frames. utt_duration=1101 frames, utt_pad_proportion=0.03731, over 63.00 utterances.], tot_loss[ctc_loss=0.1097, att_loss=0.2542, loss=0.2253, over 3274296.17 frames. 
utt_duration=1229 frames, utt_pad_proportion=0.0586, over 10669.54 utterances.], batch size: 63, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:30:11,778 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=38716.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:30:17,874 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.492e+02 2.597e+02 3.110e+02 4.075e+02 8.061e+02, threshold=6.220e+02, percent-clipped=4.0 2023-03-08 03:30:36,661 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5496, 2.3167, 4.9974, 3.7537, 2.9868, 4.2467, 4.7285, 4.6492], device='cuda:2'), covar=tensor([0.0240, 0.1933, 0.0148, 0.1200, 0.2077, 0.0261, 0.0097, 0.0238], device='cuda:2'), in_proj_covar=tensor([0.0150, 0.0246, 0.0132, 0.0309, 0.0279, 0.0187, 0.0115, 0.0150], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0003, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 03:30:57,383 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5031, 3.6016, 3.4670, 2.8484, 3.3945, 3.5965, 3.5655, 2.1774], device='cuda:2'), covar=tensor([0.1396, 0.2347, 0.4326, 0.8641, 0.3975, 0.8103, 0.1252, 1.0846], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0113, 0.0120, 0.0186, 0.0098, 0.0173, 0.0098, 0.0172], device='cuda:2'), out_proj_covar=tensor([8.9485e-05, 9.8761e-05, 1.0787e-04, 1.5062e-04, 9.0704e-05, 1.4308e-04, 8.7223e-05, 1.3971e-04], device='cuda:2') 2023-03-08 03:31:10,912 INFO [train2.py:809] (2/4) Epoch 10, batch 2900, loss[ctc_loss=0.09168, att_loss=0.2439, loss=0.2134, over 16533.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006051, over 45.00 utterances.], tot_loss[ctc_loss=0.1087, att_loss=0.2536, loss=0.2246, over 3271405.22 frames. utt_duration=1235 frames, utt_pad_proportion=0.05793, over 10608.80 utterances.], batch size: 45, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:32:30,748 INFO [train2.py:809] (2/4) Epoch 10, batch 2950, loss[ctc_loss=0.1108, att_loss=0.2614, loss=0.2312, over 17486.00 frames. utt_duration=887 frames, utt_pad_proportion=0.07119, over 79.00 utterances.], tot_loss[ctc_loss=0.1103, att_loss=0.2546, loss=0.2257, over 3267984.89 frames. utt_duration=1200 frames, utt_pad_proportion=0.06816, over 10907.06 utterances.], batch size: 79, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:32:47,235 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.30 vs. limit=5.0 2023-03-08 03:32:57,342 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+02 2.729e+02 3.322e+02 4.124e+02 1.749e+03, threshold=6.643e+02, percent-clipped=6.0 2023-03-08 03:33:02,377 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=38823.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:33:50,303 INFO [train2.py:809] (2/4) Epoch 10, batch 3000, loss[ctc_loss=0.1307, att_loss=0.2709, loss=0.2429, over 17287.00 frames. utt_duration=876.7 frames, utt_pad_proportion=0.08102, over 79.00 utterances.], tot_loss[ctc_loss=0.1106, att_loss=0.2548, loss=0.226, over 3268902.93 frames. utt_duration=1185 frames, utt_pad_proportion=0.07179, over 11046.26 utterances.], batch size: 79, lr: 1.08e-02, grad_scale: 8.0 2023-03-08 03:33:50,304 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 03:34:06,505 INFO [train2.py:843] (2/4) Epoch 10, validation: ctc_loss=0.0536, att_loss=0.2406, loss=0.2032, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
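Note on reading the loss fields in the records above and below: the logged `loss` values are consistent with a linear interpolation of `att_loss` and `ctc_loss` weighted by the configured att_rate of 0.8 (e.g. the Epoch 10 validation record just above: 0.8 * 0.2406 + 0.2 * 0.0536 = 0.2032). A minimal sketch of that combination follows, assuming this is how the logged quantities relate; it is an illustration of the arithmetic, not the actual train2.py code, whose reduction and normalization details may differ.

# Sketch: reproduce the logged `loss` from the logged `ctc_loss` and
# `att_loss` fields, assuming a simple att_rate-weighted interpolation.
ATT_RATE = 0.8  # att_rate from the training configuration printed at startup

def combined_loss(ctc_loss: float, att_loss: float, att_rate: float = ATT_RATE) -> float:
    """Interpolate the (already normalized) CTC and attention losses."""
    return att_rate * att_loss + (1.0 - att_rate) * ctc_loss

# Example: Epoch 10 validation above reports ctc_loss=0.0536, att_loss=0.2406, loss=0.2032
print(round(combined_loss(0.0536, 0.2406), 4))  # -> 0.2032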
2023-03-08 03:34:06,506 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 03:34:31,140 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-03-08 03:34:54,885 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=38884.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:35:26,805 INFO [train2.py:809] (2/4) Epoch 10, batch 3050, loss[ctc_loss=0.1173, att_loss=0.2714, loss=0.2406, over 17138.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.01326, over 56.00 utterances.], tot_loss[ctc_loss=0.1108, att_loss=0.2553, loss=0.2264, over 3274575.82 frames. utt_duration=1184 frames, utt_pad_proportion=0.06968, over 11077.95 utterances.], batch size: 56, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:35:37,402 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0699, 4.4874, 4.3364, 4.4765, 4.9733, 4.6408, 4.5210, 2.2325], device='cuda:2'), covar=tensor([0.0304, 0.0320, 0.0336, 0.0218, 0.0928, 0.0183, 0.0264, 0.2183], device='cuda:2'), in_proj_covar=tensor([0.0123, 0.0127, 0.0132, 0.0135, 0.0325, 0.0119, 0.0116, 0.0217], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 03:35:53,210 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.607e+02 2.391e+02 2.901e+02 3.646e+02 9.557e+02, threshold=5.803e+02, percent-clipped=2.0 2023-03-08 03:36:05,724 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9211, 5.2862, 4.7093, 5.3170, 4.6658, 4.9215, 5.4417, 5.1379], device='cuda:2'), covar=tensor([0.0535, 0.0284, 0.0875, 0.0248, 0.0470, 0.0218, 0.0212, 0.0191], device='cuda:2'), in_proj_covar=tensor([0.0326, 0.0254, 0.0314, 0.0246, 0.0261, 0.0200, 0.0239, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 03:36:27,904 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5751, 5.8650, 5.1261, 5.7050, 5.5056, 5.0369, 5.2308, 5.0494], device='cuda:2'), covar=tensor([0.1396, 0.0895, 0.0894, 0.0817, 0.0787, 0.1507, 0.2278, 0.2727], device='cuda:2'), in_proj_covar=tensor([0.0421, 0.0490, 0.0362, 0.0375, 0.0349, 0.0410, 0.0503, 0.0446], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 03:36:47,176 INFO [train2.py:809] (2/4) Epoch 10, batch 3100, loss[ctc_loss=0.1016, att_loss=0.2629, loss=0.2307, over 17031.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007307, over 51.00 utterances.], tot_loss[ctc_loss=0.1104, att_loss=0.2549, loss=0.226, over 3281731.34 frames. utt_duration=1215 frames, utt_pad_proportion=0.0601, over 10817.00 utterances.], batch size: 51, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:38:08,102 INFO [train2.py:809] (2/4) Epoch 10, batch 3150, loss[ctc_loss=0.1348, att_loss=0.2634, loss=0.2377, over 16762.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006792, over 48.00 utterances.], tot_loss[ctc_loss=0.1095, att_loss=0.2545, loss=0.2255, over 3284202.48 frames. 
utt_duration=1219 frames, utt_pad_proportion=0.05931, over 10792.18 utterances.], batch size: 48, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:38:28,464 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=39016.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:38:34,153 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+02 2.629e+02 3.270e+02 4.032e+02 9.981e+02, threshold=6.541e+02, percent-clipped=6.0 2023-03-08 03:39:27,544 INFO [train2.py:809] (2/4) Epoch 10, batch 3200, loss[ctc_loss=0.1327, att_loss=0.2644, loss=0.2381, over 17180.00 frames. utt_duration=871.4 frames, utt_pad_proportion=0.08755, over 79.00 utterances.], tot_loss[ctc_loss=0.1091, att_loss=0.254, loss=0.225, over 3277269.82 frames. utt_duration=1225 frames, utt_pad_proportion=0.05819, over 10715.96 utterances.], batch size: 79, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:39:44,558 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=39064.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:40:48,182 INFO [train2.py:809] (2/4) Epoch 10, batch 3250, loss[ctc_loss=0.09499, att_loss=0.2364, loss=0.2081, over 16532.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006111, over 45.00 utterances.], tot_loss[ctc_loss=0.1093, att_loss=0.2542, loss=0.2252, over 3282554.62 frames. utt_duration=1209 frames, utt_pad_proportion=0.05996, over 10873.72 utterances.], batch size: 45, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:41:13,728 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+02 2.584e+02 3.081e+02 4.035e+02 1.100e+03, threshold=6.162e+02, percent-clipped=4.0 2023-03-08 03:42:08,486 INFO [train2.py:809] (2/4) Epoch 10, batch 3300, loss[ctc_loss=0.0635, att_loss=0.2125, loss=0.1827, over 15788.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.0076, over 38.00 utterances.], tot_loss[ctc_loss=0.109, att_loss=0.2541, loss=0.2251, over 3282244.86 frames. utt_duration=1209 frames, utt_pad_proportion=0.06046, over 10873.84 utterances.], batch size: 38, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:42:48,944 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39179.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:42:58,415 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39185.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:43:02,648 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 03:43:12,532 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.82 vs. limit=2.0 2023-03-08 03:43:29,381 INFO [train2.py:809] (2/4) Epoch 10, batch 3350, loss[ctc_loss=0.1362, att_loss=0.2614, loss=0.2364, over 16200.00 frames. utt_duration=1582 frames, utt_pad_proportion=0.005641, over 41.00 utterances.], tot_loss[ctc_loss=0.1087, att_loss=0.2533, loss=0.2243, over 3273214.60 frames. utt_duration=1230 frames, utt_pad_proportion=0.05898, over 10653.59 utterances.], batch size: 41, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:43:54,948 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+02 2.443e+02 2.910e+02 3.809e+02 9.404e+02, threshold=5.819e+02, percent-clipped=2.0 2023-03-08 03:44:37,723 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39246.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:44:38,535 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. 
limit=2.0 2023-03-08 03:44:47,908 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.86 vs. limit=5.0 2023-03-08 03:44:49,741 INFO [train2.py:809] (2/4) Epoch 10, batch 3400, loss[ctc_loss=0.1048, att_loss=0.2293, loss=0.2044, over 15370.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01064, over 35.00 utterances.], tot_loss[ctc_loss=0.1091, att_loss=0.2536, loss=0.2247, over 3273397.31 frames. utt_duration=1234 frames, utt_pad_proportion=0.05805, over 10627.83 utterances.], batch size: 35, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:45:11,219 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0803, 5.3920, 4.8760, 5.4692, 4.7528, 5.1087, 5.5586, 5.3243], device='cuda:2'), covar=tensor([0.0479, 0.0292, 0.0758, 0.0224, 0.0452, 0.0186, 0.0171, 0.0160], device='cuda:2'), in_proj_covar=tensor([0.0321, 0.0252, 0.0314, 0.0244, 0.0261, 0.0200, 0.0237, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 03:45:32,565 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2023-03-08 03:45:37,911 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39284.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:45:49,293 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0600, 5.3323, 5.3361, 5.3423, 5.4446, 5.3869, 5.0478, 4.9593], device='cuda:2'), covar=tensor([0.1036, 0.0521, 0.0271, 0.0498, 0.0265, 0.0295, 0.0312, 0.0268], device='cuda:2'), in_proj_covar=tensor([0.0446, 0.0285, 0.0241, 0.0271, 0.0336, 0.0362, 0.0283, 0.0316], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 03:45:56,332 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2023-03-08 03:46:11,201 INFO [train2.py:809] (2/4) Epoch 10, batch 3450, loss[ctc_loss=0.1127, att_loss=0.2539, loss=0.2257, over 16323.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006779, over 45.00 utterances.], tot_loss[ctc_loss=0.1076, att_loss=0.2522, loss=0.2232, over 3268917.38 frames. utt_duration=1246 frames, utt_pad_proportion=0.05703, over 10506.06 utterances.], batch size: 45, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:46:36,009 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+02 2.326e+02 2.969e+02 3.835e+02 9.960e+02, threshold=5.938e+02, percent-clipped=3.0 2023-03-08 03:47:16,370 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39345.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:47:31,579 INFO [train2.py:809] (2/4) Epoch 10, batch 3500, loss[ctc_loss=0.1463, att_loss=0.2698, loss=0.2451, over 17009.00 frames. utt_duration=688.6 frames, utt_pad_proportion=0.1327, over 99.00 utterances.], tot_loss[ctc_loss=0.1069, att_loss=0.2517, loss=0.2227, over 3265976.93 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.05392, over 10371.52 utterances.], batch size: 99, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:48:31,849 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3402, 2.5834, 3.5344, 2.6709, 3.2646, 4.4128, 4.2129, 3.1546], device='cuda:2'), covar=tensor([0.0445, 0.2212, 0.1209, 0.1704, 0.1450, 0.1004, 0.0675, 0.1596], device='cuda:2'), in_proj_covar=tensor([0.0223, 0.0225, 0.0240, 0.0204, 0.0237, 0.0295, 0.0213, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:48:52,045 INFO [train2.py:809] (2/4) Epoch 10, batch 3550, loss[ctc_loss=0.1077, att_loss=0.2634, loss=0.2323, over 16949.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.008444, over 50.00 utterances.], tot_loss[ctc_loss=0.1074, att_loss=0.2522, loss=0.2233, over 3258028.48 frames. utt_duration=1263 frames, utt_pad_proportion=0.05476, over 10331.32 utterances.], batch size: 50, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:49:17,423 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.387e+02 2.370e+02 2.861e+02 3.451e+02 5.590e+02, threshold=5.722e+02, percent-clipped=0.0 2023-03-08 03:49:23,919 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5685, 2.7715, 3.3092, 4.3412, 3.8499, 3.9459, 2.8978, 2.3912], device='cuda:2'), covar=tensor([0.0563, 0.2201, 0.1013, 0.0568, 0.0868, 0.0432, 0.1753, 0.2297], device='cuda:2'), in_proj_covar=tensor([0.0162, 0.0204, 0.0183, 0.0186, 0.0182, 0.0147, 0.0190, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 03:49:54,099 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0860, 4.5835, 4.4008, 4.6967, 2.5613, 4.4795, 2.8260, 1.5444], device='cuda:2'), covar=tensor([0.0362, 0.0131, 0.0590, 0.0138, 0.1867, 0.0173, 0.1428, 0.1873], device='cuda:2'), in_proj_covar=tensor([0.0131, 0.0104, 0.0247, 0.0105, 0.0215, 0.0104, 0.0218, 0.0198], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 03:50:11,858 INFO [train2.py:809] (2/4) Epoch 10, batch 3600, loss[ctc_loss=0.1126, att_loss=0.2618, loss=0.232, over 16292.00 frames. utt_duration=1517 frames, utt_pad_proportion=0.006442, over 43.00 utterances.], tot_loss[ctc_loss=0.1078, att_loss=0.2524, loss=0.2235, over 3261013.26 frames. utt_duration=1263 frames, utt_pad_proportion=0.05338, over 10342.29 utterances.], batch size: 43, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:50:27,166 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39463.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:50:40,621 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.56 vs. limit=2.0 2023-03-08 03:50:41,871 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. 
limit=2.0 2023-03-08 03:50:52,065 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=39479.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:51:17,805 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6766, 3.6241, 2.9785, 3.3691, 3.7788, 3.3393, 2.5358, 4.0834], device='cuda:2'), covar=tensor([0.1074, 0.0441, 0.1069, 0.0623, 0.0644, 0.0672, 0.1008, 0.0508], device='cuda:2'), in_proj_covar=tensor([0.0183, 0.0180, 0.0205, 0.0173, 0.0229, 0.0208, 0.0180, 0.0247], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 03:51:32,922 INFO [train2.py:809] (2/4) Epoch 10, batch 3650, loss[ctc_loss=0.09952, att_loss=0.2568, loss=0.2254, over 16457.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.007798, over 46.00 utterances.], tot_loss[ctc_loss=0.109, att_loss=0.2536, loss=0.2247, over 3260224.31 frames. utt_duration=1198 frames, utt_pad_proportion=0.06974, over 10897.21 utterances.], batch size: 46, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:51:33,437 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6202, 4.5921, 4.5199, 4.6114, 5.0519, 4.5329, 4.5194, 2.1185], device='cuda:2'), covar=tensor([0.0227, 0.0197, 0.0241, 0.0160, 0.0870, 0.0233, 0.0218, 0.2363], device='cuda:2'), in_proj_covar=tensor([0.0123, 0.0125, 0.0132, 0.0134, 0.0325, 0.0121, 0.0117, 0.0219], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 03:51:57,851 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.423e+02 2.407e+02 3.049e+02 3.792e+02 1.207e+03, threshold=6.099e+02, percent-clipped=6.0 2023-03-08 03:52:04,638 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39524.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:52:09,224 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=39527.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:52:31,751 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39541.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:52:53,646 INFO [train2.py:809] (2/4) Epoch 10, batch 3700, loss[ctc_loss=0.1107, att_loss=0.2323, loss=0.208, over 15992.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007688, over 40.00 utterances.], tot_loss[ctc_loss=0.1073, att_loss=0.2519, loss=0.223, over 3254374.93 frames. utt_duration=1215 frames, utt_pad_proportion=0.06694, over 10729.48 utterances.], batch size: 40, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:53:17,102 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8387, 4.7705, 4.6494, 2.5142, 4.5515, 4.4660, 3.9485, 2.4885], device='cuda:2'), covar=tensor([0.0082, 0.0097, 0.0238, 0.1150, 0.0094, 0.0196, 0.0374, 0.1462], device='cuda:2'), in_proj_covar=tensor([0.0059, 0.0081, 0.0073, 0.0100, 0.0068, 0.0091, 0.0091, 0.0100], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 03:53:55,817 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 03:54:12,934 INFO [train2.py:809] (2/4) Epoch 10, batch 3750, loss[ctc_loss=0.1107, att_loss=0.2674, loss=0.236, over 17411.00 frames. utt_duration=1107 frames, utt_pad_proportion=0.03238, over 63.00 utterances.], tot_loss[ctc_loss=0.1073, att_loss=0.2521, loss=0.2232, over 3263529.97 frames. 
utt_duration=1238 frames, utt_pad_proportion=0.06002, over 10555.32 utterances.], batch size: 63, lr: 1.07e-02, grad_scale: 8.0 2023-03-08 03:54:38,433 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.452e+02 2.346e+02 2.722e+02 3.425e+02 7.090e+02, threshold=5.444e+02, percent-clipped=1.0 2023-03-08 03:55:08,184 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39639.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 03:55:09,571 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39640.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:55:24,489 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8834, 5.2233, 5.4002, 5.4145, 5.2934, 5.8591, 5.0746, 5.9498], device='cuda:2'), covar=tensor([0.0713, 0.0593, 0.0742, 0.0960, 0.1814, 0.0879, 0.0662, 0.0650], device='cuda:2'), in_proj_covar=tensor([0.0705, 0.0426, 0.0492, 0.0553, 0.0744, 0.0496, 0.0400, 0.0489], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:55:24,660 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4429, 3.5671, 3.5935, 3.0151, 3.5378, 3.5481, 3.5063, 2.0806], device='cuda:2'), covar=tensor([0.1041, 0.1491, 0.2243, 0.5187, 0.2229, 0.3045, 0.0783, 0.9155], device='cuda:2'), in_proj_covar=tensor([0.0095, 0.0115, 0.0122, 0.0192, 0.0098, 0.0177, 0.0102, 0.0171], device='cuda:2'), out_proj_covar=tensor([9.1499e-05, 1.0100e-04, 1.0992e-04, 1.5588e-04, 9.1012e-05, 1.4684e-04, 8.9931e-05, 1.3990e-04], device='cuda:2') 2023-03-08 03:55:33,238 INFO [train2.py:809] (2/4) Epoch 10, batch 3800, loss[ctc_loss=0.09425, att_loss=0.2379, loss=0.2092, over 15964.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005755, over 41.00 utterances.], tot_loss[ctc_loss=0.1072, att_loss=0.252, loss=0.223, over 3264635.87 frames. utt_duration=1248 frames, utt_pad_proportion=0.0567, over 10472.63 utterances.], batch size: 41, lr: 1.06e-02, grad_scale: 8.0 2023-03-08 03:56:46,663 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39700.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 03:56:53,085 INFO [train2.py:809] (2/4) Epoch 10, batch 3850, loss[ctc_loss=0.09858, att_loss=0.2538, loss=0.2227, over 17437.00 frames. utt_duration=1012 frames, utt_pad_proportion=0.04213, over 69.00 utterances.], tot_loss[ctc_loss=0.1068, att_loss=0.2518, loss=0.2228, over 3259483.29 frames. utt_duration=1248 frames, utt_pad_proportion=0.05677, over 10461.20 utterances.], batch size: 69, lr: 1.06e-02, grad_scale: 8.0 2023-03-08 03:57:18,321 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+02 2.275e+02 2.627e+02 3.517e+02 7.871e+02, threshold=5.254e+02, percent-clipped=5.0 2023-03-08 03:58:10,241 INFO [train2.py:809] (2/4) Epoch 10, batch 3900, loss[ctc_loss=0.09512, att_loss=0.2314, loss=0.2041, over 16184.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.005633, over 41.00 utterances.], tot_loss[ctc_loss=0.1076, att_loss=0.2525, loss=0.2236, over 3266704.36 frames. 
utt_duration=1252 frames, utt_pad_proportion=0.0538, over 10453.19 utterances.], batch size: 41, lr: 1.06e-02, grad_scale: 8.0 2023-03-08 03:58:24,444 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8600, 5.1318, 5.4265, 5.3158, 5.3513, 5.8274, 5.1034, 5.9339], device='cuda:2'), covar=tensor([0.0769, 0.0710, 0.0708, 0.1082, 0.2003, 0.0856, 0.0633, 0.0647], device='cuda:2'), in_proj_covar=tensor([0.0698, 0.0420, 0.0483, 0.0552, 0.0730, 0.0490, 0.0400, 0.0481], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 03:58:30,692 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39767.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:59:26,950 INFO [train2.py:809] (2/4) Epoch 10, batch 3950, loss[ctc_loss=0.1918, att_loss=0.301, loss=0.2792, over 14235.00 frames. utt_duration=394.2 frames, utt_pad_proportion=0.3155, over 145.00 utterances.], tot_loss[ctc_loss=0.1074, att_loss=0.2523, loss=0.2233, over 3263946.68 frames. utt_duration=1239 frames, utt_pad_proportion=0.05865, over 10551.14 utterances.], batch size: 145, lr: 1.06e-02, grad_scale: 8.0 2023-03-08 03:59:44,086 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39815.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:59:50,030 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39819.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 03:59:51,185 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.448e+02 2.608e+02 3.275e+02 3.896e+02 7.798e+02, threshold=6.551e+02, percent-clipped=9.0 2023-03-08 04:00:04,093 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39828.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:00:45,409 INFO [train2.py:809] (2/4) Epoch 11, batch 0, loss[ctc_loss=0.1107, att_loss=0.2592, loss=0.2295, over 16764.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.00688, over 48.00 utterances.], tot_loss[ctc_loss=0.1107, att_loss=0.2592, loss=0.2295, over 16764.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.00688, over 48.00 utterances.], batch size: 48, lr: 1.01e-02, grad_scale: 8.0 2023-03-08 04:00:45,410 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 04:00:57,590 INFO [train2.py:843] (2/4) Epoch 11, validation: ctc_loss=0.05063, att_loss=0.2383, loss=0.2008, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 04:00:57,591 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 04:01:02,360 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=39841.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:01:12,696 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. limit=2.0 2023-03-08 04:01:58,693 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39876.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:02:12,665 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.77 vs. limit=5.0 2023-03-08 04:02:18,277 INFO [train2.py:809] (2/4) Epoch 11, batch 50, loss[ctc_loss=0.1146, att_loss=0.2574, loss=0.2288, over 16632.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.00514, over 47.00 utterances.], tot_loss[ctc_loss=0.1085, att_loss=0.2574, loss=0.2276, over 755804.82 frames. 
utt_duration=1263 frames, utt_pad_proportion=0.02714, over 2395.70 utterances.], batch size: 47, lr: 1.01e-02, grad_scale: 8.0 2023-03-08 04:02:19,881 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=39889.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:03:08,054 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 2.626e+02 3.075e+02 3.847e+02 1.291e+03, threshold=6.150e+02, percent-clipped=4.0 2023-03-08 04:03:37,059 INFO [train2.py:809] (2/4) Epoch 11, batch 100, loss[ctc_loss=0.105, att_loss=0.2202, loss=0.1972, over 15386.00 frames. utt_duration=1760 frames, utt_pad_proportion=0.0102, over 35.00 utterances.], tot_loss[ctc_loss=0.1078, att_loss=0.2542, loss=0.2249, over 1317544.60 frames. utt_duration=1273 frames, utt_pad_proportion=0.03565, over 4144.77 utterances.], batch size: 35, lr: 1.01e-02, grad_scale: 8.0 2023-03-08 04:03:40,313 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=39940.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:04:55,829 INFO [train2.py:809] (2/4) Epoch 11, batch 150, loss[ctc_loss=0.1077, att_loss=0.2684, loss=0.2362, over 16974.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007144, over 50.00 utterances.], tot_loss[ctc_loss=0.1074, att_loss=0.2536, loss=0.2243, over 1751687.77 frames. utt_duration=1262 frames, utt_pad_proportion=0.04426, over 5558.58 utterances.], batch size: 50, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:04:55,935 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=39988.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:05:07,717 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39995.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 04:05:51,856 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.648e+02 2.498e+02 3.139e+02 4.301e+02 8.550e+02, threshold=6.278e+02, percent-clipped=5.0 2023-03-08 04:06:01,263 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0165, 3.8436, 3.1230, 3.4904, 3.9078, 3.6693, 2.8046, 4.3786], device='cuda:2'), covar=tensor([0.1044, 0.0476, 0.1199, 0.0677, 0.0679, 0.0670, 0.0911, 0.0443], device='cuda:2'), in_proj_covar=tensor([0.0183, 0.0179, 0.0205, 0.0174, 0.0232, 0.0209, 0.0182, 0.0249], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 04:06:20,496 INFO [train2.py:809] (2/4) Epoch 11, batch 200, loss[ctc_loss=0.1193, att_loss=0.2668, loss=0.2373, over 16762.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006673, over 48.00 utterances.], tot_loss[ctc_loss=0.108, att_loss=0.2528, loss=0.2239, over 2086903.97 frames. utt_duration=1230 frames, utt_pad_proportion=0.05552, over 6792.23 utterances.], batch size: 48, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:07:39,781 INFO [train2.py:809] (2/4) Epoch 11, batch 250, loss[ctc_loss=0.1693, att_loss=0.2901, loss=0.2659, over 13876.00 frames. utt_duration=381.7 frames, utt_pad_proportion=0.335, over 146.00 utterances.], tot_loss[ctc_loss=0.1074, att_loss=0.2519, loss=0.223, over 2337440.93 frames. 
utt_duration=1246 frames, utt_pad_proportion=0.05732, over 7514.77 utterances.], batch size: 146, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:08:20,363 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6848, 4.8791, 4.6156, 2.3158, 4.5929, 4.5175, 3.8789, 2.4251], device='cuda:2'), covar=tensor([0.0146, 0.0103, 0.0279, 0.1450, 0.0104, 0.0189, 0.0452, 0.1786], device='cuda:2'), in_proj_covar=tensor([0.0060, 0.0083, 0.0075, 0.0104, 0.0069, 0.0094, 0.0093, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 04:08:28,155 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40119.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:08:29,483 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.404e+02 2.494e+02 2.990e+02 3.866e+02 8.648e+02, threshold=5.979e+02, percent-clipped=3.0 2023-03-08 04:08:34,920 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40123.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:08:59,721 INFO [train2.py:809] (2/4) Epoch 11, batch 300, loss[ctc_loss=0.08335, att_loss=0.2348, loss=0.2045, over 15974.00 frames. utt_duration=1560 frames, utt_pad_proportion=0.005923, over 41.00 utterances.], tot_loss[ctc_loss=0.1067, att_loss=0.252, loss=0.223, over 2542911.18 frames. utt_duration=1244 frames, utt_pad_proportion=0.05563, over 8185.45 utterances.], batch size: 41, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:09:14,418 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6290, 2.5656, 2.8653, 4.5159, 4.1179, 4.1406, 3.0512, 1.8136], device='cuda:2'), covar=tensor([0.0458, 0.2194, 0.1386, 0.0434, 0.0505, 0.0301, 0.1203, 0.2323], device='cuda:2'), in_proj_covar=tensor([0.0163, 0.0207, 0.0186, 0.0187, 0.0180, 0.0148, 0.0190, 0.0181], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 04:09:45,570 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40167.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:09:52,699 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40171.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:10:20,969 INFO [train2.py:809] (2/4) Epoch 11, batch 350, loss[ctc_loss=0.09991, att_loss=0.252, loss=0.2216, over 16539.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006347, over 45.00 utterances.], tot_loss[ctc_loss=0.1057, att_loss=0.2516, loss=0.2224, over 2707186.35 frames. utt_duration=1246 frames, utt_pad_proportion=0.05326, over 8704.31 utterances.], batch size: 45, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:10:31,980 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40195.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:11:08,665 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40218.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:11:11,922 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.687e+02 2.384e+02 2.929e+02 3.608e+02 1.665e+03, threshold=5.857e+02, percent-clipped=2.0 2023-03-08 04:11:41,144 INFO [train2.py:809] (2/4) Epoch 11, batch 400, loss[ctc_loss=0.09131, att_loss=0.2136, loss=0.1892, over 15764.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.007907, over 38.00 utterances.], tot_loss[ctc_loss=0.1061, att_loss=0.2521, loss=0.2229, over 2833089.22 frames. 
utt_duration=1211 frames, utt_pad_proportion=0.0627, over 9369.92 utterances.], batch size: 38, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:12:09,460 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40256.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:12:36,242 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40272.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:12:47,334 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40279.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:13:01,948 INFO [train2.py:809] (2/4) Epoch 11, batch 450, loss[ctc_loss=0.07995, att_loss=0.2268, loss=0.1974, over 14540.00 frames. utt_duration=1819 frames, utt_pad_proportion=0.03826, over 32.00 utterances.], tot_loss[ctc_loss=0.1059, att_loss=0.2519, loss=0.2227, over 2932612.04 frames. utt_duration=1213 frames, utt_pad_proportion=0.06114, over 9685.44 utterances.], batch size: 32, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:13:12,972 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40295.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 04:13:52,370 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.652e+02 2.417e+02 2.903e+02 3.339e+02 6.124e+02, threshold=5.806e+02, percent-clipped=1.0 2023-03-08 04:14:08,015 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40330.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:14:13,340 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40333.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:14:21,408 INFO [train2.py:809] (2/4) Epoch 11, batch 500, loss[ctc_loss=0.07393, att_loss=0.2034, loss=0.1775, over 15743.00 frames. utt_duration=1659 frames, utt_pad_proportion=0.01028, over 38.00 utterances.], tot_loss[ctc_loss=0.1056, att_loss=0.2515, loss=0.2223, over 3010792.57 frames. utt_duration=1250 frames, utt_pad_proportion=0.05174, over 9645.48 utterances.], batch size: 38, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:14:29,171 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40343.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 04:15:41,808 INFO [train2.py:809] (2/4) Epoch 11, batch 550, loss[ctc_loss=0.09701, att_loss=0.2544, loss=0.223, over 16536.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006407, over 45.00 utterances.], tot_loss[ctc_loss=0.1057, att_loss=0.2515, loss=0.2223, over 3074511.15 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05021, over 9840.00 utterances.], batch size: 45, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:15:46,976 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40391.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:16:33,185 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+02 2.316e+02 2.892e+02 3.547e+02 1.294e+03, threshold=5.784e+02, percent-clipped=4.0 2023-03-08 04:16:38,021 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40423.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:16:49,627 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9611, 2.0704, 2.2374, 2.0987, 2.5266, 1.6760, 1.8526, 1.9325], device='cuda:2'), covar=tensor([0.0989, 0.4090, 0.3116, 0.1959, 0.1635, 0.2165, 0.2986, 0.1841], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0089, 0.0091, 0.0077, 0.0076, 0.0071, 0.0088, 0.0064], device='cuda:2'), out_proj_covar=tensor([4.8907e-05, 5.9961e-05, 6.0984e-05, 5.1433e-05, 4.9085e-05, 4.9544e-05, 5.8611e-05, 4.5578e-05], device='cuda:2') 2023-03-08 04:17:02,143 INFO [train2.py:809] (2/4) Epoch 11, batch 600, loss[ctc_loss=0.1188, att_loss=0.2665, loss=0.2369, over 17066.00 frames. utt_duration=1290 frames, utt_pad_proportion=0.008774, over 53.00 utterances.], tot_loss[ctc_loss=0.1044, att_loss=0.2502, loss=0.2211, over 3114268.61 frames. utt_duration=1262 frames, utt_pad_proportion=0.04933, over 9879.31 utterances.], batch size: 53, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:17:54,375 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40471.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:17:54,601 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40471.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:18:11,528 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0 2023-03-08 04:18:22,146 INFO [train2.py:809] (2/4) Epoch 11, batch 650, loss[ctc_loss=0.09893, att_loss=0.2437, loss=0.2148, over 16401.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007676, over 44.00 utterances.], tot_loss[ctc_loss=0.1056, att_loss=0.2509, loss=0.2219, over 3155412.36 frames. utt_duration=1270 frames, utt_pad_proportion=0.0467, over 9953.93 utterances.], batch size: 44, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:18:28,832 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40492.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:19:03,054 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1276, 4.4724, 4.6826, 5.0979, 2.6793, 4.5893, 2.4632, 1.8925], device='cuda:2'), covar=tensor([0.0296, 0.0182, 0.0615, 0.0070, 0.1847, 0.0144, 0.1774, 0.1839], device='cuda:2'), in_proj_covar=tensor([0.0132, 0.0107, 0.0254, 0.0104, 0.0220, 0.0107, 0.0223, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 04:19:11,085 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40519.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:19:12,501 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.622e+02 2.380e+02 3.105e+02 3.848e+02 6.391e+02, threshold=6.210e+02, percent-clipped=2.0 2023-03-08 04:19:41,595 INFO [train2.py:809] (2/4) Epoch 11, batch 700, loss[ctc_loss=0.0918, att_loss=0.2574, loss=0.2243, over 16763.00 frames. 
utt_duration=1398 frames, utt_pad_proportion=0.006747, over 48.00 utterances.], tot_loss[ctc_loss=0.1048, att_loss=0.2506, loss=0.2215, over 3173546.86 frames. utt_duration=1268 frames, utt_pad_proportion=0.04944, over 10025.59 utterances.], batch size: 48, lr: 1.01e-02, grad_scale: 16.0 2023-03-08 04:19:55,544 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3811, 3.4640, 3.4748, 3.0376, 3.4323, 3.5355, 3.3818, 2.6867], device='cuda:2'), covar=tensor([0.1002, 0.1597, 0.3300, 0.5712, 0.3261, 0.3367, 0.1142, 0.6510], device='cuda:2'), in_proj_covar=tensor([0.0094, 0.0118, 0.0124, 0.0190, 0.0099, 0.0178, 0.0103, 0.0168], device='cuda:2'), out_proj_covar=tensor([9.0549e-05, 1.0360e-04, 1.1223e-04, 1.5512e-04, 9.2170e-05, 1.4756e-04, 9.1124e-05, 1.3864e-04], device='cuda:2') 2023-03-08 04:20:01,480 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40551.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:20:04,719 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40553.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:20:12,329 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0927, 4.5303, 4.1935, 4.9709, 2.5249, 4.4920, 2.4133, 1.7531], device='cuda:2'), covar=tensor([0.0317, 0.0145, 0.0802, 0.0079, 0.1941, 0.0161, 0.1672, 0.1782], device='cuda:2'), in_proj_covar=tensor([0.0133, 0.0107, 0.0254, 0.0105, 0.0219, 0.0106, 0.0223, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 04:20:38,008 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40574.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:21:00,399 INFO [train2.py:809] (2/4) Epoch 11, batch 750, loss[ctc_loss=0.1318, att_loss=0.2649, loss=0.2383, over 16281.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.006553, over 43.00 utterances.], tot_loss[ctc_loss=0.1044, att_loss=0.2502, loss=0.221, over 3190460.12 frames. utt_duration=1283 frames, utt_pad_proportion=0.04713, over 9955.22 utterances.], batch size: 43, lr: 1.00e-02, grad_scale: 16.0 2023-03-08 04:21:51,407 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.282e+02 2.391e+02 2.736e+02 3.441e+02 6.786e+02, threshold=5.472e+02, percent-clipped=2.0 2023-03-08 04:22:03,984 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40628.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:22:08,273 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4472, 5.0263, 4.5977, 4.8389, 5.0488, 4.5542, 3.6104, 4.9415], device='cuda:2'), covar=tensor([0.0122, 0.0102, 0.0146, 0.0095, 0.0087, 0.0137, 0.0654, 0.0205], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0071, 0.0086, 0.0052, 0.0058, 0.0067, 0.0090, 0.0092], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 04:22:20,809 INFO [train2.py:809] (2/4) Epoch 11, batch 800, loss[ctc_loss=0.1159, att_loss=0.2655, loss=0.2356, over 16786.00 frames. utt_duration=679.7 frames, utt_pad_proportion=0.1417, over 99.00 utterances.], tot_loss[ctc_loss=0.1037, att_loss=0.2498, loss=0.2206, over 3202751.52 frames. 
utt_duration=1279 frames, utt_pad_proportion=0.04818, over 10030.45 utterances.], batch size: 99, lr: 1.00e-02, grad_scale: 16.0 2023-03-08 04:22:22,745 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9775, 3.8278, 3.1751, 3.5572, 3.9822, 3.6841, 2.8372, 4.4223], device='cuda:2'), covar=tensor([0.0966, 0.0520, 0.1008, 0.0601, 0.0626, 0.0611, 0.0896, 0.0436], device='cuda:2'), in_proj_covar=tensor([0.0184, 0.0181, 0.0204, 0.0174, 0.0234, 0.0212, 0.0183, 0.0248], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 04:22:41,716 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40651.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:23:07,278 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9768, 5.2227, 5.4807, 5.3504, 5.3376, 5.9282, 5.1476, 6.0507], device='cuda:2'), covar=tensor([0.0688, 0.0721, 0.0796, 0.1114, 0.1956, 0.0852, 0.0580, 0.0650], device='cuda:2'), in_proj_covar=tensor([0.0701, 0.0424, 0.0488, 0.0547, 0.0730, 0.0489, 0.0396, 0.0482], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 04:23:27,409 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7346, 3.6953, 3.6995, 3.1802, 3.5280, 3.6916, 3.4447, 2.8239], device='cuda:2'), covar=tensor([0.1038, 0.1573, 0.2886, 0.6886, 0.3310, 0.5722, 0.1482, 0.7302], device='cuda:2'), in_proj_covar=tensor([0.0094, 0.0118, 0.0124, 0.0191, 0.0100, 0.0178, 0.0103, 0.0168], device='cuda:2'), out_proj_covar=tensor([9.0382e-05, 1.0391e-04, 1.1192e-04, 1.5566e-04, 9.2999e-05, 1.4779e-04, 9.1075e-05, 1.3842e-04], device='cuda:2') 2023-03-08 04:23:27,458 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2161, 4.5007, 4.6438, 4.8157, 2.6437, 4.5349, 2.9437, 1.5947], device='cuda:2'), covar=tensor([0.0295, 0.0190, 0.0631, 0.0105, 0.1967, 0.0176, 0.1465, 0.2063], device='cuda:2'), in_proj_covar=tensor([0.0134, 0.0108, 0.0257, 0.0107, 0.0221, 0.0108, 0.0225, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 04:23:38,285 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40686.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:23:41,167 INFO [train2.py:809] (2/4) Epoch 11, batch 850, loss[ctc_loss=0.07993, att_loss=0.2438, loss=0.211, over 16535.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006081, over 45.00 utterances.], tot_loss[ctc_loss=0.1045, att_loss=0.2506, loss=0.2214, over 3217072.85 frames. utt_duration=1253 frames, utt_pad_proportion=0.0552, over 10282.12 utterances.], batch size: 45, lr: 1.00e-02, grad_scale: 16.0 2023-03-08 04:24:18,993 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40712.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:24:31,780 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.342e+02 2.347e+02 2.849e+02 3.646e+02 6.696e+02, threshold=5.699e+02, percent-clipped=1.0 2023-03-08 04:25:00,587 INFO [train2.py:809] (2/4) Epoch 11, batch 900, loss[ctc_loss=0.1037, att_loss=0.2331, loss=0.2073, over 15789.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007725, over 38.00 utterances.], tot_loss[ctc_loss=0.1057, att_loss=0.2513, loss=0.2222, over 3230779.95 frames. 
utt_duration=1249 frames, utt_pad_proportion=0.05537, over 10359.71 utterances.], batch size: 38, lr: 1.00e-02, grad_scale: 16.0 2023-03-08 04:25:38,826 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7747, 5.0368, 5.0154, 4.9717, 5.1106, 5.0617, 4.8165, 4.6207], device='cuda:2'), covar=tensor([0.0929, 0.0531, 0.0239, 0.0453, 0.0291, 0.0271, 0.0318, 0.0307], device='cuda:2'), in_proj_covar=tensor([0.0454, 0.0291, 0.0243, 0.0276, 0.0341, 0.0362, 0.0289, 0.0322], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 04:26:21,130 INFO [train2.py:809] (2/4) Epoch 11, batch 950, loss[ctc_loss=0.07069, att_loss=0.2143, loss=0.1856, over 15748.00 frames. utt_duration=1659 frames, utt_pad_proportion=0.01, over 38.00 utterances.], tot_loss[ctc_loss=0.105, att_loss=0.2506, loss=0.2215, over 3240505.76 frames. utt_duration=1263 frames, utt_pad_proportion=0.05179, over 10274.79 utterances.], batch size: 38, lr: 1.00e-02, grad_scale: 16.0 2023-03-08 04:26:26,087 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9731, 4.0995, 3.8202, 4.2788, 2.5393, 3.9180, 2.5246, 1.8520], device='cuda:2'), covar=tensor([0.0330, 0.0153, 0.0825, 0.0141, 0.1938, 0.0217, 0.1704, 0.1879], device='cuda:2'), in_proj_covar=tensor([0.0135, 0.0109, 0.0257, 0.0106, 0.0219, 0.0108, 0.0226, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 04:27:07,946 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1840, 5.4706, 5.6436, 5.5996, 5.5988, 6.0943, 5.2649, 6.2257], device='cuda:2'), covar=tensor([0.0575, 0.0596, 0.0735, 0.0914, 0.1750, 0.0812, 0.0489, 0.0552], device='cuda:2'), in_proj_covar=tensor([0.0697, 0.0425, 0.0489, 0.0552, 0.0735, 0.0492, 0.0399, 0.0485], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 04:27:12,373 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.493e+02 2.301e+02 2.865e+02 3.345e+02 1.136e+03, threshold=5.731e+02, percent-clipped=4.0 2023-03-08 04:27:41,341 INFO [train2.py:809] (2/4) Epoch 11, batch 1000, loss[ctc_loss=0.2067, att_loss=0.3097, loss=0.2891, over 14257.00 frames. utt_duration=392 frames, utt_pad_proportion=0.3171, over 146.00 utterances.], tot_loss[ctc_loss=0.1054, att_loss=0.2508, loss=0.2217, over 3251365.73 frames. utt_duration=1274 frames, utt_pad_proportion=0.04884, over 10221.55 utterances.], batch size: 146, lr: 1.00e-02, grad_scale: 16.0 2023-03-08 04:27:57,008 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40848.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:28:01,798 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40851.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:28:25,277 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40865.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:28:39,142 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40874.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:28:41,846 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.84 vs. limit=2.0 2023-03-08 04:28:48,591 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.30 vs. 
limit=5.0 2023-03-08 04:29:01,316 INFO [train2.py:809] (2/4) Epoch 11, batch 1050, loss[ctc_loss=0.09966, att_loss=0.2653, loss=0.2322, over 16762.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.005967, over 48.00 utterances.], tot_loss[ctc_loss=0.1049, att_loss=0.251, loss=0.2218, over 3258817.91 frames. utt_duration=1289 frames, utt_pad_proportion=0.04369, over 10122.37 utterances.], batch size: 48, lr: 1.00e-02, grad_scale: 16.0 2023-03-08 04:29:18,983 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40899.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:29:20,785 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40900.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:29:48,755 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-08 04:29:52,389 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+02 2.443e+02 2.925e+02 3.809e+02 9.259e+02, threshold=5.849e+02, percent-clipped=3.0 2023-03-08 04:29:56,138 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40922.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:30:02,236 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.88 vs. limit=2.0 2023-03-08 04:30:03,363 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40926.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:30:06,377 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40928.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:30:22,027 INFO [train2.py:809] (2/4) Epoch 11, batch 1100, loss[ctc_loss=0.0925, att_loss=0.2338, loss=0.2055, over 15876.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009783, over 39.00 utterances.], tot_loss[ctc_loss=0.104, att_loss=0.2507, loss=0.2214, over 3266401.47 frames. utt_duration=1275 frames, utt_pad_proportion=0.04559, over 10259.46 utterances.], batch size: 39, lr: 1.00e-02, grad_scale: 16.0 2023-03-08 04:30:59,272 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40961.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 04:31:23,533 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40976.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:31:39,207 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40986.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:31:42,188 INFO [train2.py:809] (2/4) Epoch 11, batch 1150, loss[ctc_loss=0.1118, att_loss=0.2319, loss=0.2079, over 16169.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.006401, over 41.00 utterances.], tot_loss[ctc_loss=0.1052, att_loss=0.2518, loss=0.2225, over 3275551.70 frames. 
utt_duration=1264 frames, utt_pad_proportion=0.04738, over 10374.92 utterances.], batch size: 41, lr: 1.00e-02, grad_scale: 16.0 2023-03-08 04:31:56,134 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40997.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:32:12,921 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41007.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:32:34,186 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.489e+02 2.552e+02 3.086e+02 3.791e+02 8.496e+02, threshold=6.172e+02, percent-clipped=3.0 2023-03-08 04:32:44,273 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9411, 5.2589, 4.7707, 5.3737, 4.6984, 4.9720, 5.3790, 5.2496], device='cuda:2'), covar=tensor([0.0499, 0.0268, 0.0717, 0.0215, 0.0399, 0.0217, 0.0246, 0.0158], device='cuda:2'), in_proj_covar=tensor([0.0324, 0.0251, 0.0310, 0.0244, 0.0258, 0.0197, 0.0237, 0.0228], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 04:32:56,347 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41034.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:33:02,230 INFO [train2.py:809] (2/4) Epoch 11, batch 1200, loss[ctc_loss=0.08064, att_loss=0.2175, loss=0.1901, over 15380.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.01049, over 35.00 utterances.], tot_loss[ctc_loss=0.1055, att_loss=0.2517, loss=0.2224, over 3276185.73 frames. utt_duration=1272 frames, utt_pad_proportion=0.04697, over 10318.13 utterances.], batch size: 35, lr: 9.99e-03, grad_scale: 16.0 2023-03-08 04:33:13,640 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41045.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:33:34,500 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41058.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:34:22,832 INFO [train2.py:809] (2/4) Epoch 11, batch 1250, loss[ctc_loss=0.09359, att_loss=0.2223, loss=0.1965, over 15491.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.009568, over 36.00 utterances.], tot_loss[ctc_loss=0.1047, att_loss=0.2517, loss=0.2223, over 3278425.48 frames. 
utt_duration=1263 frames, utt_pad_proportion=0.04801, over 10395.69 utterances.], batch size: 36, lr: 9.99e-03, grad_scale: 16.0 2023-03-08 04:34:23,378 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0125, 4.8971, 4.8318, 2.3351, 1.9960, 2.6298, 2.6689, 3.8188], device='cuda:2'), covar=tensor([0.0653, 0.0190, 0.0211, 0.3855, 0.5535, 0.2724, 0.2284, 0.1610], device='cuda:2'), in_proj_covar=tensor([0.0329, 0.0215, 0.0232, 0.0205, 0.0343, 0.0332, 0.0228, 0.0349], device='cuda:2'), out_proj_covar=tensor([1.4933e-04, 8.0602e-05, 9.9579e-05, 9.1603e-05, 1.5066e-04, 1.3526e-04, 9.0473e-05, 1.4835e-04], device='cuda:2') 2023-03-08 04:34:37,409 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1097, 3.7631, 3.1513, 3.3981, 3.8822, 3.5378, 2.9032, 4.2757], device='cuda:2'), covar=tensor([0.0895, 0.0423, 0.1075, 0.0767, 0.0734, 0.0715, 0.0951, 0.0528], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0180, 0.0201, 0.0172, 0.0234, 0.0209, 0.0182, 0.0247], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 04:34:45,126 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41102.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:34:51,873 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41106.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:35:14,660 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.546e+02 2.278e+02 2.660e+02 3.579e+02 5.724e+02, threshold=5.320e+02, percent-clipped=1.0 2023-03-08 04:35:14,933 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8878, 5.1975, 5.3998, 5.2890, 5.3355, 5.8421, 5.1038, 5.9518], device='cuda:2'), covar=tensor([0.0647, 0.0707, 0.0771, 0.1097, 0.1664, 0.0808, 0.0671, 0.0559], device='cuda:2'), in_proj_covar=tensor([0.0705, 0.0430, 0.0494, 0.0561, 0.0742, 0.0493, 0.0398, 0.0484], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 04:35:42,813 INFO [train2.py:809] (2/4) Epoch 11, batch 1300, loss[ctc_loss=0.1526, att_loss=0.2809, loss=0.2553, over 16763.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.005967, over 48.00 utterances.], tot_loss[ctc_loss=0.1056, att_loss=0.2523, loss=0.223, over 3263913.70 frames. 
utt_duration=1230 frames, utt_pad_proportion=0.06014, over 10631.18 utterances.], batch size: 48, lr: 9.98e-03, grad_scale: 16.0 2023-03-08 04:35:52,345 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9130, 6.0483, 5.5455, 5.9193, 5.7590, 5.2788, 5.4840, 5.4414], device='cuda:2'), covar=tensor([0.1093, 0.0815, 0.0706, 0.0615, 0.0693, 0.1397, 0.2327, 0.2035], device='cuda:2'), in_proj_covar=tensor([0.0439, 0.0509, 0.0369, 0.0383, 0.0360, 0.0420, 0.0523, 0.0458], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 04:35:58,622 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41148.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:36:23,576 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41163.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:36:43,504 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9012, 3.9459, 3.1757, 3.6980, 4.0488, 3.7535, 2.8738, 4.4383], device='cuda:2'), covar=tensor([0.0989, 0.0361, 0.0986, 0.0546, 0.0588, 0.0611, 0.0895, 0.0403], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0177, 0.0199, 0.0169, 0.0230, 0.0206, 0.0180, 0.0244], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 04:36:54,076 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2957, 4.5663, 4.4908, 4.5549, 4.6453, 4.5817, 4.3782, 4.1539], device='cuda:2'), covar=tensor([0.0963, 0.0469, 0.0289, 0.0430, 0.0266, 0.0312, 0.0332, 0.0376], device='cuda:2'), in_proj_covar=tensor([0.0451, 0.0285, 0.0242, 0.0278, 0.0341, 0.0362, 0.0287, 0.0319], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 04:37:03,123 INFO [train2.py:809] (2/4) Epoch 11, batch 1350, loss[ctc_loss=0.07175, att_loss=0.2098, loss=0.1822, over 15649.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008545, over 37.00 utterances.], tot_loss[ctc_loss=0.1055, att_loss=0.2522, loss=0.2228, over 3267840.16 frames. utt_duration=1220 frames, utt_pad_proportion=0.05985, over 10724.81 utterances.], batch size: 37, lr: 9.98e-03, grad_scale: 16.0 2023-03-08 04:37:15,597 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41196.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:37:55,026 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.568e+02 2.465e+02 2.945e+02 3.733e+02 9.191e+02, threshold=5.889e+02, percent-clipped=5.0 2023-03-08 04:37:56,887 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41221.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:38:23,869 INFO [train2.py:809] (2/4) Epoch 11, batch 1400, loss[ctc_loss=0.09374, att_loss=0.2648, loss=0.2306, over 16477.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006745, over 46.00 utterances.], tot_loss[ctc_loss=0.1042, att_loss=0.2513, loss=0.2219, over 3270460.06 frames. utt_duration=1257 frames, utt_pad_proportion=0.0513, over 10415.51 utterances.], batch size: 46, lr: 9.97e-03, grad_scale: 16.0 2023-03-08 04:38:52,762 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41256.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 04:39:44,640 INFO [train2.py:809] (2/4) Epoch 11, batch 1450, loss[ctc_loss=0.08404, att_loss=0.2233, loss=0.1954, over 15507.00 frames. 
utt_duration=1724 frames, utt_pad_proportion=0.007943, over 36.00 utterances.], tot_loss[ctc_loss=0.1052, att_loss=0.2517, loss=0.2224, over 3270076.98 frames. utt_duration=1257 frames, utt_pad_proportion=0.05209, over 10415.06 utterances.], batch size: 36, lr: 9.96e-03, grad_scale: 16.0 2023-03-08 04:40:04,957 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2023-03-08 04:40:15,975 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41307.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:40:19,181 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1414, 5.1691, 5.0564, 2.8873, 4.8577, 4.5833, 4.3887, 2.5907], device='cuda:2'), covar=tensor([0.0112, 0.0073, 0.0200, 0.0977, 0.0085, 0.0178, 0.0334, 0.1440], device='cuda:2'), in_proj_covar=tensor([0.0059, 0.0082, 0.0074, 0.0102, 0.0068, 0.0093, 0.0091, 0.0098], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 04:40:37,477 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+02 2.447e+02 2.872e+02 3.322e+02 6.594e+02, threshold=5.743e+02, percent-clipped=2.0 2023-03-08 04:41:05,356 INFO [train2.py:809] (2/4) Epoch 11, batch 1500, loss[ctc_loss=0.07918, att_loss=0.2302, loss=0.2, over 15679.00 frames. utt_duration=1696 frames, utt_pad_proportion=0.006741, over 37.00 utterances.], tot_loss[ctc_loss=0.1033, att_loss=0.2496, loss=0.2203, over 3257401.33 frames. utt_duration=1279 frames, utt_pad_proportion=0.05219, over 10203.01 utterances.], batch size: 37, lr: 9.96e-03, grad_scale: 16.0 2023-03-08 04:41:30,420 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41353.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:41:33,415 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41355.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:42:04,028 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0874, 2.1966, 3.1057, 4.4434, 4.0317, 4.0262, 2.7799, 2.1615], device='cuda:2'), covar=tensor([0.0847, 0.2883, 0.1194, 0.0503, 0.0646, 0.0340, 0.1700, 0.2427], device='cuda:2'), in_proj_covar=tensor([0.0164, 0.0203, 0.0181, 0.0186, 0.0183, 0.0143, 0.0188, 0.0177], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 04:42:25,614 INFO [train2.py:809] (2/4) Epoch 11, batch 1550, loss[ctc_loss=0.082, att_loss=0.2288, loss=0.1994, over 16175.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.007067, over 41.00 utterances.], tot_loss[ctc_loss=0.1035, att_loss=0.2499, loss=0.2206, over 3263495.04 frames. 
utt_duration=1275 frames, utt_pad_proportion=0.05206, over 10247.92 utterances.], batch size: 41, lr: 9.95e-03, grad_scale: 16.0 2023-03-08 04:42:47,298 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41401.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:43:18,733 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6086, 2.5953, 4.9323, 3.8052, 3.0091, 4.3944, 4.7290, 4.5448], device='cuda:2'), covar=tensor([0.0174, 0.1565, 0.0095, 0.0901, 0.1666, 0.0213, 0.0121, 0.0228], device='cuda:2'), in_proj_covar=tensor([0.0149, 0.0245, 0.0134, 0.0305, 0.0276, 0.0187, 0.0118, 0.0154], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 04:43:19,832 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+02 2.586e+02 2.982e+02 3.761e+02 9.691e+02, threshold=5.963e+02, percent-clipped=6.0 2023-03-08 04:43:33,039 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. limit=2.0 2023-03-08 04:43:43,999 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.76 vs. limit=2.0 2023-03-08 04:43:45,722 INFO [train2.py:809] (2/4) Epoch 11, batch 1600, loss[ctc_loss=0.1023, att_loss=0.2575, loss=0.2265, over 16462.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.0069, over 46.00 utterances.], tot_loss[ctc_loss=0.1037, att_loss=0.2497, loss=0.2205, over 3258177.44 frames. utt_duration=1261 frames, utt_pad_proportion=0.05689, over 10343.49 utterances.], batch size: 46, lr: 9.95e-03, grad_scale: 8.0 2023-03-08 04:44:17,922 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41458.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:45:05,422 INFO [train2.py:809] (2/4) Epoch 11, batch 1650, loss[ctc_loss=0.09473, att_loss=0.2399, loss=0.2108, over 16018.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.00726, over 40.00 utterances.], tot_loss[ctc_loss=0.1043, att_loss=0.2506, loss=0.2213, over 3264384.01 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05766, over 10447.70 utterances.], batch size: 40, lr: 9.94e-03, grad_scale: 8.0 2023-03-08 04:45:58,326 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.357e+02 2.532e+02 2.866e+02 3.682e+02 7.453e+02, threshold=5.731e+02, percent-clipped=3.0 2023-03-08 04:45:58,660 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41521.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:46:00,247 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7636, 2.8552, 5.0814, 4.1084, 3.2016, 4.6286, 5.0594, 4.6515], device='cuda:2'), covar=tensor([0.0217, 0.1623, 0.0215, 0.0932, 0.1830, 0.0203, 0.0104, 0.0246], device='cuda:2'), in_proj_covar=tensor([0.0149, 0.0245, 0.0135, 0.0307, 0.0278, 0.0187, 0.0118, 0.0155], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 04:46:14,380 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6472, 5.0633, 4.9013, 4.8722, 5.0482, 4.7989, 3.8757, 5.0306], device='cuda:2'), covar=tensor([0.0093, 0.0097, 0.0100, 0.0096, 0.0101, 0.0105, 0.0544, 0.0235], device='cuda:2'), in_proj_covar=tensor([0.0072, 0.0071, 0.0086, 0.0053, 0.0058, 0.0069, 0.0090, 0.0091], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 04:46:25,300 INFO [train2.py:809] (2/4) Epoch 11, batch 1700, loss[ctc_loss=0.08081, att_loss=0.2202, loss=0.1923, over 15368.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01139, over 35.00 utterances.], tot_loss[ctc_loss=0.1042, att_loss=0.2503, loss=0.2211, over 3262523.65 frames. utt_duration=1242 frames, utt_pad_proportion=0.06089, over 10518.30 utterances.], batch size: 35, lr: 9.93e-03, grad_scale: 8.0 2023-03-08 04:46:55,393 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41556.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 04:47:16,357 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41569.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:47:29,169 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8158, 2.1483, 4.9955, 4.1390, 3.3482, 4.6081, 4.8583, 4.7544], device='cuda:2'), covar=tensor([0.0115, 0.1863, 0.0102, 0.0752, 0.1553, 0.0164, 0.0080, 0.0142], device='cuda:2'), in_proj_covar=tensor([0.0148, 0.0243, 0.0132, 0.0304, 0.0275, 0.0185, 0.0116, 0.0153], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 04:47:33,905 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1170, 4.9386, 5.0323, 2.2202, 1.9099, 2.4115, 2.7346, 3.5047], device='cuda:2'), covar=tensor([0.0817, 0.0267, 0.0251, 0.4190, 0.7403, 0.3827, 0.2627, 0.2431], device='cuda:2'), in_proj_covar=tensor([0.0336, 0.0222, 0.0238, 0.0213, 0.0353, 0.0339, 0.0230, 0.0359], device='cuda:2'), out_proj_covar=tensor([1.5223e-04, 8.3149e-05, 1.0181e-04, 9.5607e-05, 1.5487e-04, 1.3800e-04, 9.1205e-05, 1.5220e-04], device='cuda:2') 2023-03-08 04:47:46,037 INFO [train2.py:809] (2/4) Epoch 11, batch 1750, loss[ctc_loss=0.09843, att_loss=0.2526, loss=0.2218, over 16616.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.005992, over 47.00 utterances.], tot_loss[ctc_loss=0.104, att_loss=0.2499, loss=0.2207, over 3258778.01 frames. 
utt_duration=1253 frames, utt_pad_proportion=0.05758, over 10413.69 utterances.], batch size: 47, lr: 9.93e-03, grad_scale: 8.0 2023-03-08 04:47:49,382 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3294, 2.1917, 2.0323, 1.6637, 2.5043, 2.2200, 2.3495, 2.2214], device='cuda:2'), covar=tensor([0.0769, 0.3778, 0.3610, 0.2437, 0.3271, 0.1827, 0.1749, 0.1200], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0087, 0.0088, 0.0075, 0.0076, 0.0071, 0.0084, 0.0060], device='cuda:2'), out_proj_covar=tensor([4.9032e-05, 5.9019e-05, 6.0066e-05, 5.1301e-05, 4.9305e-05, 4.9609e-05, 5.6804e-05, 4.3770e-05], device='cuda:2') 2023-03-08 04:48:12,485 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41604.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:48:39,825 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.573e+02 2.288e+02 2.927e+02 3.480e+02 6.791e+02, threshold=5.853e+02, percent-clipped=3.0 2023-03-08 04:49:06,239 INFO [train2.py:809] (2/4) Epoch 11, batch 1800, loss[ctc_loss=0.08862, att_loss=0.2435, loss=0.2125, over 16177.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.007052, over 41.00 utterances.], tot_loss[ctc_loss=0.104, att_loss=0.2505, loss=0.2212, over 3269252.51 frames. utt_duration=1252 frames, utt_pad_proportion=0.05438, over 10453.33 utterances.], batch size: 41, lr: 9.92e-03, grad_scale: 8.0 2023-03-08 04:49:07,000 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.88 vs. limit=2.0 2023-03-08 04:49:31,704 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41653.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:49:33,307 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41654.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:49:43,861 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1148, 4.7624, 4.9252, 4.7110, 2.7829, 4.5422, 2.7423, 1.8569], device='cuda:2'), covar=tensor([0.0359, 0.0152, 0.0539, 0.0144, 0.1811, 0.0188, 0.1589, 0.1794], device='cuda:2'), in_proj_covar=tensor([0.0137, 0.0108, 0.0253, 0.0107, 0.0217, 0.0108, 0.0219, 0.0199], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 04:50:21,857 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2023-03-08 04:50:27,082 INFO [train2.py:809] (2/4) Epoch 11, batch 1850, loss[ctc_loss=0.09552, att_loss=0.2362, loss=0.2081, over 15875.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009876, over 39.00 utterances.], tot_loss[ctc_loss=0.1043, att_loss=0.2511, loss=0.2217, over 3273080.78 frames. 
utt_duration=1245 frames, utt_pad_proportion=0.0554, over 10525.41 utterances.], batch size: 39, lr: 9.92e-03, grad_scale: 8.0 2023-03-08 04:50:48,888 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41701.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:50:49,114 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41701.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:51:12,147 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41715.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 04:51:20,777 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.644e+02 2.415e+02 2.840e+02 3.452e+02 8.289e+02, threshold=5.680e+02, percent-clipped=4.0 2023-03-08 04:51:46,884 INFO [train2.py:809] (2/4) Epoch 11, batch 1900, loss[ctc_loss=0.09972, att_loss=0.2417, loss=0.2133, over 16417.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.006018, over 44.00 utterances.], tot_loss[ctc_loss=0.1041, att_loss=0.2511, loss=0.2217, over 3268588.11 frames. utt_duration=1231 frames, utt_pad_proportion=0.06012, over 10637.28 utterances.], batch size: 44, lr: 9.91e-03, grad_scale: 8.0 2023-03-08 04:52:04,781 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41749.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:52:19,287 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41758.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:53:07,409 INFO [train2.py:809] (2/4) Epoch 11, batch 1950, loss[ctc_loss=0.1607, att_loss=0.2786, loss=0.255, over 14150.00 frames. utt_duration=391.8 frames, utt_pad_proportion=0.3186, over 145.00 utterances.], tot_loss[ctc_loss=0.1043, att_loss=0.2508, loss=0.2215, over 3262541.83 frames. utt_duration=1231 frames, utt_pad_proportion=0.05961, over 10616.51 utterances.], batch size: 145, lr: 9.91e-03, grad_scale: 8.0 2023-03-08 04:53:37,477 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41806.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:54:00,605 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.413e+02 2.550e+02 3.116e+02 4.012e+02 6.949e+02, threshold=6.232e+02, percent-clipped=6.0 2023-03-08 04:54:02,164 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2023-03-08 04:54:27,444 INFO [train2.py:809] (2/4) Epoch 11, batch 2000, loss[ctc_loss=0.1032, att_loss=0.2511, loss=0.2215, over 17340.00 frames. utt_duration=879.5 frames, utt_pad_proportion=0.07611, over 79.00 utterances.], tot_loss[ctc_loss=0.1035, att_loss=0.2506, loss=0.2212, over 3260563.79 frames. utt_duration=1229 frames, utt_pad_proportion=0.0595, over 10625.26 utterances.], batch size: 79, lr: 9.90e-03, grad_scale: 8.0 2023-03-08 04:55:08,010 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7685, 6.0416, 5.5158, 5.8460, 5.7530, 5.2860, 5.4741, 5.3803], device='cuda:2'), covar=tensor([0.1423, 0.0830, 0.0750, 0.0809, 0.0809, 0.1387, 0.2253, 0.2233], device='cuda:2'), in_proj_covar=tensor([0.0446, 0.0510, 0.0372, 0.0389, 0.0363, 0.0424, 0.0526, 0.0465], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 04:55:47,165 INFO [train2.py:809] (2/4) Epoch 11, batch 2050, loss[ctc_loss=0.08481, att_loss=0.2375, loss=0.2069, over 16166.00 frames. 
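[Annotation, not part of the original log] The loss field in the train2.py records above is consistent with a fixed interpolation of the two logged losses, loss = 0.2 * ctc_loss + 0.8 * att_loss; for example, the Epoch 11, batch 1900 record gives 0.2 * 0.09972 + 0.8 * 0.2417 = 0.2133, matching the logged value. A minimal Python sketch of that assumed weighting (the function name and the 0.8 weight are inferred from the numbers, not read from train2.py):

def combined_loss(ctc_loss, att_loss, att_weight=0.8):
    # Assumed form: interpolate the attention and CTC losses with a fixed weight.
    return (1.0 - att_weight) * ctc_loss + att_weight * att_loss

print(round(combined_loss(0.09972, 0.2417), 4))  # 0.2133, as in the Epoch 11, batch 1900 record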
utt_duration=1579 frames, utt_pad_proportion=0.006554, over 41.00 utterances.], tot_loss[ctc_loss=0.1047, att_loss=0.2517, loss=0.2223, over 3267234.85 frames. utt_duration=1213 frames, utt_pad_proportion=0.06211, over 10784.69 utterances.], batch size: 41, lr: 9.89e-03, grad_scale: 8.0 2023-03-08 04:56:05,553 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4035, 2.6919, 3.0446, 4.1776, 3.8695, 3.9780, 2.9589, 2.1099], device='cuda:2'), covar=tensor([0.0705, 0.2372, 0.1177, 0.0735, 0.0857, 0.0381, 0.1468, 0.2541], device='cuda:2'), in_proj_covar=tensor([0.0165, 0.0204, 0.0184, 0.0188, 0.0188, 0.0148, 0.0192, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 04:56:17,859 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41906.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:56:32,118 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2023-03-08 04:56:40,437 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.467e+02 2.492e+02 2.943e+02 3.583e+02 1.498e+03, threshold=5.886e+02, percent-clipped=5.0 2023-03-08 04:57:07,336 INFO [train2.py:809] (2/4) Epoch 11, batch 2100, loss[ctc_loss=0.1293, att_loss=0.2558, loss=0.2305, over 16114.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006299, over 42.00 utterances.], tot_loss[ctc_loss=0.1058, att_loss=0.2522, loss=0.223, over 3256472.44 frames. utt_duration=1190 frames, utt_pad_proportion=0.07245, over 10958.54 utterances.], batch size: 42, lr: 9.89e-03, grad_scale: 8.0 2023-03-08 04:57:10,795 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5375, 4.6112, 4.5031, 4.6274, 5.0021, 4.8158, 4.6526, 2.0748], device='cuda:2'), covar=tensor([0.0212, 0.0226, 0.0211, 0.0182, 0.1264, 0.0153, 0.0190, 0.2475], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0130, 0.0137, 0.0137, 0.0329, 0.0120, 0.0119, 0.0217], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 04:57:23,978 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41948.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 04:57:54,634 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41967.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 04:58:17,131 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0992, 5.4000, 5.3423, 5.3913, 5.4282, 5.3536, 5.1344, 4.8901], device='cuda:2'), covar=tensor([0.0990, 0.0383, 0.0206, 0.0298, 0.0264, 0.0269, 0.0300, 0.0289], device='cuda:2'), in_proj_covar=tensor([0.0447, 0.0283, 0.0239, 0.0276, 0.0337, 0.0351, 0.0284, 0.0315], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 04:58:27,977 INFO [train2.py:809] (2/4) Epoch 11, batch 2150, loss[ctc_loss=0.09029, att_loss=0.2476, loss=0.2162, over 16527.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006201, over 45.00 utterances.], tot_loss[ctc_loss=0.1058, att_loss=0.2522, loss=0.2229, over 3253429.29 frames. 
utt_duration=1180 frames, utt_pad_proportion=0.0772, over 11038.73 utterances.], batch size: 45, lr: 9.88e-03, grad_scale: 8.0 2023-03-08 04:59:07,012 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42009.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 04:59:08,296 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42010.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 04:59:19,887 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.88 vs. limit=2.0 2023-03-08 04:59:25,319 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+02 2.469e+02 2.816e+02 3.466e+02 1.172e+03, threshold=5.632e+02, percent-clipped=3.0 2023-03-08 04:59:37,190 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6325, 3.0859, 3.7098, 3.1008, 3.5574, 4.7141, 4.3650, 3.4856], device='cuda:2'), covar=tensor([0.0317, 0.1569, 0.0934, 0.1339, 0.0992, 0.0713, 0.0563, 0.1224], device='cuda:2'), in_proj_covar=tensor([0.0226, 0.0226, 0.0244, 0.0203, 0.0241, 0.0300, 0.0215, 0.0221], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 04:59:52,149 INFO [train2.py:809] (2/4) Epoch 11, batch 2200, loss[ctc_loss=0.1107, att_loss=0.2494, loss=0.2216, over 16273.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007676, over 43.00 utterances.], tot_loss[ctc_loss=0.1048, att_loss=0.2521, loss=0.2226, over 3262197.42 frames. utt_duration=1212 frames, utt_pad_proportion=0.06621, over 10780.16 utterances.], batch size: 43, lr: 9.88e-03, grad_scale: 8.0 2023-03-08 05:00:25,438 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.85 vs. limit=2.0 2023-03-08 05:01:12,046 INFO [train2.py:809] (2/4) Epoch 11, batch 2250, loss[ctc_loss=0.1199, att_loss=0.2674, loss=0.2379, over 17462.00 frames. utt_duration=886 frames, utt_pad_proportion=0.0703, over 79.00 utterances.], tot_loss[ctc_loss=0.1033, att_loss=0.2508, loss=0.2213, over 3267108.04 frames. utt_duration=1246 frames, utt_pad_proportion=0.05716, over 10503.42 utterances.], batch size: 79, lr: 9.87e-03, grad_scale: 8.0 2023-03-08 05:02:04,984 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.420e+02 2.270e+02 2.801e+02 3.546e+02 6.959e+02, threshold=5.601e+02, percent-clipped=2.0 2023-03-08 05:02:27,216 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.60 vs. limit=2.0 2023-03-08 05:02:32,249 INFO [train2.py:809] (2/4) Epoch 11, batch 2300, loss[ctc_loss=0.08404, att_loss=0.2559, loss=0.2215, over 16627.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005289, over 47.00 utterances.], tot_loss[ctc_loss=0.1032, att_loss=0.2512, loss=0.2216, over 3271617.45 frames. utt_duration=1233 frames, utt_pad_proportion=0.0599, over 10624.29 utterances.], batch size: 47, lr: 9.86e-03, grad_scale: 8.0 2023-03-08 05:03:33,688 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8186, 5.1430, 5.4016, 5.2932, 5.2782, 5.7934, 5.1067, 5.9558], device='cuda:2'), covar=tensor([0.0683, 0.0684, 0.0666, 0.1064, 0.1644, 0.0791, 0.0652, 0.0517], device='cuda:2'), in_proj_covar=tensor([0.0713, 0.0423, 0.0491, 0.0559, 0.0743, 0.0487, 0.0400, 0.0486], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 05:03:52,294 INFO [train2.py:809] (2/4) Epoch 11, batch 2350, loss[ctc_loss=0.08332, att_loss=0.2173, loss=0.1905, over 15756.00 frames. 
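[Annotation, not part of the original log] In the optim.py "Clipping_scale=2.0, grad-norm quartiles ..." records, the reported threshold is consistently twice the middle quartile (e.g. 2 x 2.816e+02 = 5.632e+02 just above), which suggests the clipping threshold is derived as clipping_scale times a running median of recent gradient norms, with percent-clipped the share of recent steps whose norm exceeded it. A hedged sketch of that bookkeeping, under those assumptions (class and method names are hypothetical, not icefall's):

import statistics
from collections import deque

class GradNormTracker:
    def __init__(self, clipping_scale=2.0, history=128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)
        self.seen = 0
        self.clipped = 0

    def threshold(self):
        # clipping_scale times the median of recently observed gradient norms.
        return self.clipping_scale * statistics.median(self.norms) if self.norms else float("inf")

    def observe(self, grad_norm):
        t = self.threshold()
        self.seen += 1
        if grad_norm > t:
            self.clipped += 1
        self.norms.append(grad_norm)
        # Factor (<= 1.0) a caller could multiply the gradients by.
        return min(1.0, t / max(grad_norm, 1e-12))

    def percent_clipped(self):
        return 100.0 * self.clipped / max(1, self.seen)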
utt_duration=1660 frames, utt_pad_proportion=0.007256, over 38.00 utterances.], tot_loss[ctc_loss=0.1023, att_loss=0.2504, loss=0.2208, over 3265189.70 frames. utt_duration=1248 frames, utt_pad_proportion=0.05658, over 10473.79 utterances.], batch size: 38, lr: 9.86e-03, grad_scale: 8.0 2023-03-08 05:04:18,926 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.58 vs. limit=2.0 2023-03-08 05:04:45,693 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.492e+02 2.954e+02 3.764e+02 8.189e+02, threshold=5.908e+02, percent-clipped=4.0 2023-03-08 05:04:49,884 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6731, 2.8122, 5.1395, 3.9678, 2.9530, 4.3654, 4.7651, 4.7395], device='cuda:2'), covar=tensor([0.0223, 0.1644, 0.0134, 0.0922, 0.1870, 0.0237, 0.0131, 0.0237], device='cuda:2'), in_proj_covar=tensor([0.0151, 0.0245, 0.0135, 0.0306, 0.0280, 0.0188, 0.0120, 0.0155], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 05:05:13,513 INFO [train2.py:809] (2/4) Epoch 11, batch 2400, loss[ctc_loss=0.08422, att_loss=0.2489, loss=0.2159, over 16756.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.005631, over 48.00 utterances.], tot_loss[ctc_loss=0.102, att_loss=0.2505, loss=0.2208, over 3277129.15 frames. utt_duration=1259 frames, utt_pad_proportion=0.05044, over 10420.59 utterances.], batch size: 48, lr: 9.85e-03, grad_scale: 8.0 2023-03-08 05:05:52,557 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42262.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:06:25,753 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0693, 5.3795, 5.3329, 5.2251, 5.3731, 5.3433, 4.9915, 4.8355], device='cuda:2'), covar=tensor([0.0965, 0.0361, 0.0224, 0.0537, 0.0261, 0.0252, 0.0337, 0.0316], device='cuda:2'), in_proj_covar=tensor([0.0456, 0.0289, 0.0244, 0.0286, 0.0346, 0.0354, 0.0290, 0.0322], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 05:06:33,431 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6239, 3.7330, 3.4780, 3.0162, 3.5574, 3.7473, 3.6689, 2.3010], device='cuda:2'), covar=tensor([0.1166, 0.1568, 0.4430, 0.6998, 0.1391, 0.3784, 0.0898, 0.8226], device='cuda:2'), in_proj_covar=tensor([0.0098, 0.0117, 0.0126, 0.0192, 0.0103, 0.0179, 0.0104, 0.0169], device='cuda:2'), out_proj_covar=tensor([9.4742e-05, 1.0322e-04, 1.1361e-04, 1.5710e-04, 9.5359e-05, 1.4914e-04, 9.3104e-05, 1.3988e-04], device='cuda:2') 2023-03-08 05:06:34,473 INFO [train2.py:809] (2/4) Epoch 11, batch 2450, loss[ctc_loss=0.102, att_loss=0.2451, loss=0.2165, over 16392.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.00747, over 44.00 utterances.], tot_loss[ctc_loss=0.1027, att_loss=0.2503, loss=0.2208, over 3272879.52 frames. 
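[Annotation, not part of the original log] The zipformer.py attn_weights_entropy dumps above appear to be a diagnostic of how peaked the attention distributions are: the entropy of an attention row, -sum_j p_j * log(p_j), is near 0 when a query attends to a single position and approaches log(T) when the weights are spread evenly. A small sketch of that computation (the shapes and the reduction over query positions are assumptions, not a reading of zipformer.py):

import torch

def attn_weights_entropy(attn, eps=1e-20):
    # attn: (..., num_queries, num_keys), rows summing to 1 after softmax.
    p = attn.clamp_min(eps)
    ent = -(p * p.log()).sum(dim=-1)   # entropy per query position
    return ent.mean(dim=-1)            # averaged over query positions

weights = torch.softmax(torch.randn(2, 8, 10, 10), dim=-1)  # (batch, heads, T, T)
print(attn_weights_entropy(weights).shape)  # torch.Size([2, 8]) -> one value per head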
utt_duration=1230 frames, utt_pad_proportion=0.05834, over 10653.54 utterances.], batch size: 44, lr: 9.85e-03, grad_scale: 8.0 2023-03-08 05:06:59,993 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42304.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 05:07:09,505 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=42310.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 05:07:26,993 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+02 2.356e+02 2.915e+02 3.557e+02 7.393e+02, threshold=5.830e+02, percent-clipped=4.0 2023-03-08 05:07:37,987 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0796, 5.0955, 4.9613, 3.0117, 4.8329, 4.6469, 4.3419, 2.5226], device='cuda:2'), covar=tensor([0.0115, 0.0078, 0.0222, 0.1082, 0.0088, 0.0168, 0.0322, 0.1539], device='cuda:2'), in_proj_covar=tensor([0.0061, 0.0085, 0.0076, 0.0105, 0.0070, 0.0096, 0.0093, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 05:07:41,603 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-03-08 05:07:54,612 INFO [train2.py:809] (2/4) Epoch 11, batch 2500, loss[ctc_loss=0.2449, att_loss=0.3156, loss=0.3014, over 14194.00 frames. utt_duration=390.4 frames, utt_pad_proportion=0.321, over 146.00 utterances.], tot_loss[ctc_loss=0.1024, att_loss=0.2497, loss=0.2202, over 3246179.36 frames. utt_duration=1230 frames, utt_pad_proportion=0.06355, over 10566.54 utterances.], batch size: 146, lr: 9.84e-03, grad_scale: 8.0 2023-03-08 05:08:26,079 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=42358.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:08:59,989 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42379.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:09:14,584 INFO [train2.py:809] (2/4) Epoch 11, batch 2550, loss[ctc_loss=0.1546, att_loss=0.2723, loss=0.2488, over 17060.00 frames. utt_duration=690.8 frames, utt_pad_proportion=0.1321, over 99.00 utterances.], tot_loss[ctc_loss=0.1027, att_loss=0.2503, loss=0.2208, over 3252143.06 frames. utt_duration=1219 frames, utt_pad_proportion=0.06556, over 10680.82 utterances.], batch size: 99, lr: 9.84e-03, grad_scale: 8.0 2023-03-08 05:10:06,489 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.521e+02 2.376e+02 2.981e+02 3.759e+02 6.397e+02, threshold=5.961e+02, percent-clipped=1.0 2023-03-08 05:10:34,790 INFO [train2.py:809] (2/4) Epoch 11, batch 2600, loss[ctc_loss=0.08379, att_loss=0.2194, loss=0.1922, over 15777.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008212, over 38.00 utterances.], tot_loss[ctc_loss=0.1021, att_loss=0.2488, loss=0.2194, over 3246029.66 frames. 
utt_duration=1212 frames, utt_pad_proportion=0.07046, over 10722.13 utterances.], batch size: 38, lr: 9.83e-03, grad_scale: 8.0 2023-03-08 05:10:38,274 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42440.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:10:49,275 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9465, 3.8097, 3.1439, 3.2520, 3.9142, 3.4603, 2.9071, 4.3624], device='cuda:2'), covar=tensor([0.0919, 0.0423, 0.0969, 0.0676, 0.0566, 0.0703, 0.0803, 0.0342], device='cuda:2'), in_proj_covar=tensor([0.0184, 0.0180, 0.0200, 0.0172, 0.0233, 0.0210, 0.0181, 0.0249], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 05:11:23,464 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2023-03-08 05:11:38,055 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-08 05:11:55,329 INFO [train2.py:809] (2/4) Epoch 11, batch 2650, loss[ctc_loss=0.08424, att_loss=0.2325, loss=0.2028, over 15658.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.00807, over 37.00 utterances.], tot_loss[ctc_loss=0.1022, att_loss=0.2493, loss=0.2199, over 3249485.01 frames. utt_duration=1214 frames, utt_pad_proportion=0.06959, over 10717.76 utterances.], batch size: 37, lr: 9.82e-03, grad_scale: 8.0 2023-03-08 05:12:23,796 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42506.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:12:47,504 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.508e+02 2.235e+02 2.761e+02 3.286e+02 8.509e+02, threshold=5.523e+02, percent-clipped=3.0 2023-03-08 05:13:03,746 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.1536, 3.3699, 3.2161, 2.8925, 3.2338, 3.1530, 3.3612, 2.1499], device='cuda:2'), covar=tensor([0.1375, 0.1558, 0.2398, 0.5788, 0.2243, 0.4373, 0.0986, 0.8027], device='cuda:2'), in_proj_covar=tensor([0.0100, 0.0119, 0.0126, 0.0193, 0.0103, 0.0178, 0.0106, 0.0169], device='cuda:2'), out_proj_covar=tensor([9.5956e-05, 1.0454e-04, 1.1421e-04, 1.5798e-04, 9.5328e-05, 1.4831e-04, 9.3877e-05, 1.3962e-04], device='cuda:2') 2023-03-08 05:13:15,980 INFO [train2.py:809] (2/4) Epoch 11, batch 2700, loss[ctc_loss=0.09159, att_loss=0.2483, loss=0.217, over 16957.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008079, over 50.00 utterances.], tot_loss[ctc_loss=0.1034, att_loss=0.2498, loss=0.2205, over 3250784.95 frames. utt_duration=1199 frames, utt_pad_proportion=0.07277, over 10854.49 utterances.], batch size: 50, lr: 9.82e-03, grad_scale: 8.0 2023-03-08 05:13:17,846 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7199, 5.1525, 5.0175, 5.1743, 5.1768, 4.7378, 3.7708, 5.1991], device='cuda:2'), covar=tensor([0.0090, 0.0105, 0.0097, 0.0071, 0.0084, 0.0111, 0.0617, 0.0143], device='cuda:2'), in_proj_covar=tensor([0.0074, 0.0072, 0.0088, 0.0053, 0.0059, 0.0070, 0.0091, 0.0091], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 05:13:38,855 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. 
limit=2.0 2023-03-08 05:13:53,928 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=42562.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:14:02,143 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42567.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:14:03,788 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1326, 4.9076, 4.7719, 5.0808, 2.7971, 4.8000, 2.7583, 2.0216], device='cuda:2'), covar=tensor([0.0354, 0.0148, 0.0677, 0.0101, 0.1712, 0.0133, 0.1534, 0.1698], device='cuda:2'), in_proj_covar=tensor([0.0140, 0.0110, 0.0257, 0.0107, 0.0217, 0.0108, 0.0221, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 05:14:37,196 INFO [train2.py:809] (2/4) Epoch 11, batch 2750, loss[ctc_loss=0.1241, att_loss=0.2738, loss=0.2438, over 17295.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01211, over 55.00 utterances.], tot_loss[ctc_loss=0.1041, att_loss=0.2506, loss=0.2213, over 3251190.03 frames. utt_duration=1179 frames, utt_pad_proportion=0.07784, over 11046.47 utterances.], batch size: 55, lr: 9.81e-03, grad_scale: 8.0 2023-03-08 05:15:02,444 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=42604.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 05:15:12,250 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=42610.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:15:30,221 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.628e+02 2.426e+02 2.955e+02 3.528e+02 6.434e+02, threshold=5.910e+02, percent-clipped=2.0 2023-03-08 05:15:54,253 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-03-08 05:15:57,777 INFO [train2.py:809] (2/4) Epoch 11, batch 2800, loss[ctc_loss=0.09684, att_loss=0.2496, loss=0.2191, over 16313.00 frames. utt_duration=1451 frames, utt_pad_proportion=0.007281, over 45.00 utterances.], tot_loss[ctc_loss=0.1046, att_loss=0.2514, loss=0.2221, over 3255574.87 frames. utt_duration=1173 frames, utt_pad_proportion=0.07914, over 11120.33 utterances.], batch size: 45, lr: 9.81e-03, grad_scale: 8.0 2023-03-08 05:16:15,483 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42649.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:16:20,521 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=42652.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 05:16:21,503 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. limit=2.0 2023-03-08 05:17:17,565 INFO [train2.py:809] (2/4) Epoch 11, batch 2850, loss[ctc_loss=0.09294, att_loss=0.2424, loss=0.2125, over 16471.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006327, over 46.00 utterances.], tot_loss[ctc_loss=0.104, att_loss=0.2508, loss=0.2215, over 3261126.45 frames. 
utt_duration=1180 frames, utt_pad_proportion=0.07567, over 11070.61 utterances.], batch size: 46, lr: 9.80e-03, grad_scale: 8.0 2023-03-08 05:17:52,702 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42710.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:18:10,595 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.499e+02 2.468e+02 2.915e+02 3.973e+02 1.035e+03, threshold=5.829e+02, percent-clipped=4.0 2023-03-08 05:18:33,604 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42735.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:18:38,144 INFO [train2.py:809] (2/4) Epoch 11, batch 2900, loss[ctc_loss=0.07703, att_loss=0.2368, loss=0.2048, over 16468.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006433, over 46.00 utterances.], tot_loss[ctc_loss=0.1033, att_loss=0.2504, loss=0.221, over 3252021.72 frames. utt_duration=1186 frames, utt_pad_proportion=0.07654, over 10978.73 utterances.], batch size: 46, lr: 9.80e-03, grad_scale: 8.0 2023-03-08 05:19:58,452 INFO [train2.py:809] (2/4) Epoch 11, batch 2950, loss[ctc_loss=0.1044, att_loss=0.2675, loss=0.2348, over 17334.00 frames. utt_duration=879.2 frames, utt_pad_proportion=0.07741, over 79.00 utterances.], tot_loss[ctc_loss=0.1032, att_loss=0.2506, loss=0.2211, over 3255421.91 frames. utt_duration=1174 frames, utt_pad_proportion=0.0775, over 11102.47 utterances.], batch size: 79, lr: 9.79e-03, grad_scale: 8.0 2023-03-08 05:20:52,200 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+02 2.273e+02 2.678e+02 3.532e+02 7.141e+02, threshold=5.356e+02, percent-clipped=1.0 2023-03-08 05:21:19,019 INFO [train2.py:809] (2/4) Epoch 11, batch 3000, loss[ctc_loss=0.104, att_loss=0.2652, loss=0.233, over 16965.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.006813, over 50.00 utterances.], tot_loss[ctc_loss=0.1026, att_loss=0.2508, loss=0.2212, over 3260186.53 frames. utt_duration=1187 frames, utt_pad_proportion=0.07301, over 10996.49 utterances.], batch size: 50, lr: 9.78e-03, grad_scale: 8.0 2023-03-08 05:21:19,019 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 05:21:32,740 INFO [train2.py:843] (2/4) Epoch 11, validation: ctc_loss=0.04985, att_loss=0.2382, loss=0.2006, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 05:21:32,741 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 05:22:05,851 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42859.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:22:10,464 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42862.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:22:37,885 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42879.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:22:52,098 INFO [train2.py:809] (2/4) Epoch 11, batch 3050, loss[ctc_loss=0.09348, att_loss=0.2486, loss=0.2175, over 16485.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.006263, over 46.00 utterances.], tot_loss[ctc_loss=0.1035, att_loss=0.2512, loss=0.2217, over 3264812.04 frames. 
utt_duration=1194 frames, utt_pad_proportion=0.06951, over 10951.67 utterances.], batch size: 46, lr: 9.78e-03, grad_scale: 8.0 2023-03-08 05:22:55,483 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1501, 2.3483, 3.1880, 4.2476, 3.7209, 3.8806, 2.6805, 1.9463], device='cuda:2'), covar=tensor([0.0872, 0.2706, 0.1063, 0.0601, 0.0923, 0.0421, 0.1848, 0.2541], device='cuda:2'), in_proj_covar=tensor([0.0167, 0.0204, 0.0183, 0.0188, 0.0185, 0.0150, 0.0193, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 05:22:56,240 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-03-08 05:23:06,810 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9903, 5.2708, 5.1699, 5.2066, 5.3048, 5.2773, 4.9669, 4.7532], device='cuda:2'), covar=tensor([0.1096, 0.0460, 0.0236, 0.0478, 0.0292, 0.0256, 0.0314, 0.0325], device='cuda:2'), in_proj_covar=tensor([0.0455, 0.0291, 0.0244, 0.0284, 0.0344, 0.0355, 0.0286, 0.0319], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 05:23:43,383 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42920.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:23:44,616 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.578e+02 2.391e+02 3.019e+02 3.669e+02 9.404e+02, threshold=6.038e+02, percent-clipped=6.0 2023-03-08 05:24:11,651 INFO [train2.py:809] (2/4) Epoch 11, batch 3100, loss[ctc_loss=0.09097, att_loss=0.2303, loss=0.2024, over 15773.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008589, over 38.00 utterances.], tot_loss[ctc_loss=0.1032, att_loss=0.2511, loss=0.2215, over 3268124.92 frames. utt_duration=1205 frames, utt_pad_proportion=0.06544, over 10864.96 utterances.], batch size: 38, lr: 9.77e-03, grad_scale: 8.0 2023-03-08 05:24:15,780 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42940.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:24:17,006 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7944, 6.0826, 5.4209, 5.7651, 5.6480, 5.3305, 5.4826, 5.3091], device='cuda:2'), covar=tensor([0.1377, 0.0816, 0.0809, 0.0822, 0.0880, 0.1569, 0.2300, 0.2429], device='cuda:2'), in_proj_covar=tensor([0.0444, 0.0502, 0.0376, 0.0388, 0.0364, 0.0424, 0.0526, 0.0463], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 05:25:02,055 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1093, 5.3776, 5.3566, 5.3172, 5.4734, 5.3792, 5.1068, 4.8926], device='cuda:2'), covar=tensor([0.0930, 0.0468, 0.0215, 0.0414, 0.0219, 0.0257, 0.0247, 0.0274], device='cuda:2'), in_proj_covar=tensor([0.0457, 0.0292, 0.0247, 0.0286, 0.0347, 0.0358, 0.0288, 0.0322], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 05:25:32,683 INFO [train2.py:809] (2/4) Epoch 11, batch 3150, loss[ctc_loss=0.1119, att_loss=0.2568, loss=0.2278, over 16621.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005603, over 47.00 utterances.], tot_loss[ctc_loss=0.1039, att_loss=0.2517, loss=0.2222, over 3270994.48 frames. 
utt_duration=1195 frames, utt_pad_proportion=0.06832, over 10958.48 utterances.], batch size: 47, lr: 9.77e-03, grad_scale: 8.0 2023-03-08 05:26:00,842 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=43005.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:26:26,646 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.352e+02 2.340e+02 2.878e+02 3.301e+02 6.036e+02, threshold=5.756e+02, percent-clipped=0.0 2023-03-08 05:26:49,556 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43035.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:26:53,989 INFO [train2.py:809] (2/4) Epoch 11, batch 3200, loss[ctc_loss=0.09885, att_loss=0.2573, loss=0.2256, over 17156.00 frames. utt_duration=1227 frames, utt_pad_proportion=0.01317, over 56.00 utterances.], tot_loss[ctc_loss=0.1047, att_loss=0.2529, loss=0.2232, over 3279528.93 frames. utt_duration=1166 frames, utt_pad_proportion=0.07301, over 11260.44 utterances.], batch size: 56, lr: 9.76e-03, grad_scale: 8.0 2023-03-08 05:28:06,592 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43083.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:28:14,938 INFO [train2.py:809] (2/4) Epoch 11, batch 3250, loss[ctc_loss=0.1176, att_loss=0.2643, loss=0.235, over 16763.00 frames. utt_duration=678.9 frames, utt_pad_proportion=0.1482, over 99.00 utterances.], tot_loss[ctc_loss=0.1057, att_loss=0.2534, loss=0.2239, over 3278191.00 frames. utt_duration=1149 frames, utt_pad_proportion=0.07593, over 11430.77 utterances.], batch size: 99, lr: 9.76e-03, grad_scale: 8.0 2023-03-08 05:29:07,590 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+02 2.382e+02 2.901e+02 3.870e+02 8.262e+02, threshold=5.801e+02, percent-clipped=4.0 2023-03-08 05:29:35,241 INFO [train2.py:809] (2/4) Epoch 11, batch 3300, loss[ctc_loss=0.1105, att_loss=0.2484, loss=0.2209, over 16388.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.008462, over 44.00 utterances.], tot_loss[ctc_loss=0.1048, att_loss=0.253, loss=0.2233, over 3286395.63 frames. utt_duration=1180 frames, utt_pad_proportion=0.06612, over 11155.11 utterances.], batch size: 44, lr: 9.75e-03, grad_scale: 8.0 2023-03-08 05:29:38,055 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-08 05:29:43,991 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-03-08 05:30:12,619 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43162.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:30:55,503 INFO [train2.py:809] (2/4) Epoch 11, batch 3350, loss[ctc_loss=0.08451, att_loss=0.2408, loss=0.2096, over 16338.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005746, over 45.00 utterances.], tot_loss[ctc_loss=0.1046, att_loss=0.2526, loss=0.223, over 3289527.76 frames. 
utt_duration=1187 frames, utt_pad_proportion=0.06389, over 11097.10 utterances.], batch size: 45, lr: 9.75e-03, grad_scale: 8.0 2023-03-08 05:31:30,915 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43210.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:31:39,583 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=43215.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:31:49,462 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.559e+02 2.322e+02 2.901e+02 3.775e+02 1.145e+03, threshold=5.802e+02, percent-clipped=9.0 2023-03-08 05:32:12,670 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=43235.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:32:17,048 INFO [train2.py:809] (2/4) Epoch 11, batch 3400, loss[ctc_loss=0.08036, att_loss=0.221, loss=0.1929, over 15343.00 frames. utt_duration=1755 frames, utt_pad_proportion=0.01297, over 35.00 utterances.], tot_loss[ctc_loss=0.1045, att_loss=0.2524, loss=0.2228, over 3279708.66 frames. utt_duration=1167 frames, utt_pad_proportion=0.0719, over 11255.71 utterances.], batch size: 35, lr: 9.74e-03, grad_scale: 8.0 2023-03-08 05:33:37,405 INFO [train2.py:809] (2/4) Epoch 11, batch 3450, loss[ctc_loss=0.07495, att_loss=0.2156, loss=0.1875, over 15381.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.01057, over 35.00 utterances.], tot_loss[ctc_loss=0.1032, att_loss=0.2516, loss=0.2219, over 3281311.41 frames. utt_duration=1185 frames, utt_pad_proportion=0.06682, over 11086.70 utterances.], batch size: 35, lr: 9.73e-03, grad_scale: 8.0 2023-03-08 05:34:03,791 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43305.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:34:05,902 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-03-08 05:34:20,938 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.62 vs. limit=5.0 2023-03-08 05:34:29,499 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+02 2.188e+02 2.793e+02 3.558e+02 5.760e+02, threshold=5.586e+02, percent-clipped=0.0 2023-03-08 05:34:56,566 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.34 vs. limit=5.0 2023-03-08 05:34:57,085 INFO [train2.py:809] (2/4) Epoch 11, batch 3500, loss[ctc_loss=0.1066, att_loss=0.2619, loss=0.2308, over 16869.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007523, over 49.00 utterances.], tot_loss[ctc_loss=0.103, att_loss=0.251, loss=0.2214, over 3271805.28 frames. utt_duration=1210 frames, utt_pad_proportion=0.06259, over 10831.15 utterances.], batch size: 49, lr: 9.73e-03, grad_scale: 8.0 2023-03-08 05:35:20,572 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43353.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:36:17,065 INFO [train2.py:809] (2/4) Epoch 11, batch 3550, loss[ctc_loss=0.08139, att_loss=0.2376, loss=0.2064, over 16645.00 frames. utt_duration=1418 frames, utt_pad_proportion=0.004184, over 47.00 utterances.], tot_loss[ctc_loss=0.1024, att_loss=0.2508, loss=0.2211, over 3280842.62 frames. utt_duration=1223 frames, utt_pad_proportion=0.05782, over 10743.81 utterances.], batch size: 47, lr: 9.72e-03, grad_scale: 4.0 2023-03-08 05:37:07,659 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. 
limit=2.0 2023-03-08 05:37:11,207 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+02 2.366e+02 2.941e+02 3.430e+02 6.532e+02, threshold=5.882e+02, percent-clipped=4.0 2023-03-08 05:37:37,562 INFO [train2.py:809] (2/4) Epoch 11, batch 3600, loss[ctc_loss=0.1257, att_loss=0.2732, loss=0.2437, over 16334.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005974, over 45.00 utterances.], tot_loss[ctc_loss=0.1028, att_loss=0.2512, loss=0.2215, over 3284126.88 frames. utt_duration=1230 frames, utt_pad_proportion=0.05498, over 10691.10 utterances.], batch size: 45, lr: 9.72e-03, grad_scale: 8.0 2023-03-08 05:38:41,727 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3452, 4.9262, 4.8147, 5.0220, 2.7667, 4.7265, 2.8730, 1.9183], device='cuda:2'), covar=tensor([0.0291, 0.0151, 0.0631, 0.0102, 0.1712, 0.0173, 0.1489, 0.1786], device='cuda:2'), in_proj_covar=tensor([0.0143, 0.0112, 0.0261, 0.0110, 0.0223, 0.0111, 0.0225, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 05:38:58,674 INFO [train2.py:809] (2/4) Epoch 11, batch 3650, loss[ctc_loss=0.09089, att_loss=0.2519, loss=0.2197, over 16758.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.00694, over 48.00 utterances.], tot_loss[ctc_loss=0.103, att_loss=0.2506, loss=0.2211, over 3272562.44 frames. utt_duration=1222 frames, utt_pad_proportion=0.06116, over 10727.72 utterances.], batch size: 48, lr: 9.71e-03, grad_scale: 8.0 2023-03-08 05:39:41,224 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=43514.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:39:42,795 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43515.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:39:53,256 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.496e+02 2.334e+02 2.934e+02 3.698e+02 7.886e+02, threshold=5.869e+02, percent-clipped=6.0 2023-03-08 05:40:03,530 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2023-03-08 05:40:14,897 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43535.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:40:19,099 INFO [train2.py:809] (2/4) Epoch 11, batch 3700, loss[ctc_loss=0.1025, att_loss=0.263, loss=0.2309, over 17387.00 frames. utt_duration=1009 frames, utt_pad_proportion=0.04861, over 69.00 utterances.], tot_loss[ctc_loss=0.1021, att_loss=0.2501, loss=0.2205, over 3270635.81 frames. utt_duration=1249 frames, utt_pad_proportion=0.05555, over 10490.83 utterances.], batch size: 69, lr: 9.71e-03, grad_scale: 8.0 2023-03-08 05:40:59,608 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43563.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:41:19,038 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=43575.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:41:30,968 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43583.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:41:38,578 INFO [train2.py:809] (2/4) Epoch 11, batch 3750, loss[ctc_loss=0.0849, att_loss=0.2618, loss=0.2264, over 17051.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.009086, over 52.00 utterances.], tot_loss[ctc_loss=0.1017, att_loss=0.2498, loss=0.2202, over 3258250.32 frames. 
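[Annotation, not part of the original log] The grad_scale field drops from 16.0 to 8.0 around batch 1600, dips to 4.0 near batch 3550 and is back at 8.0 by batch 3600. That pattern is consistent with dynamic loss scaling for fp16 training: the scale is halved when non-finite gradients are detected (and that step skipped), then grown back after a run of clean steps. A generic sketch of that behaviour, not the actual scaler used here (the growth interval is a placeholder):

class DynamicLossScale:
    def __init__(self, init_scale=16.0, growth_interval=50):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, grads_finite):
        if not grads_finite:
            self.scale /= 2.0          # halve on overflow and skip the update
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= 2.0      # grow back after a run of clean steps
                self._good_steps = 0
        return self.scale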
utt_duration=1243 frames, utt_pad_proportion=0.06007, over 10497.43 utterances.], batch size: 52, lr: 9.70e-03, grad_scale: 8.0 2023-03-08 05:41:47,930 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2458, 2.6158, 3.4969, 2.5439, 3.3385, 4.3441, 4.0404, 3.1059], device='cuda:2'), covar=tensor([0.0365, 0.1898, 0.1171, 0.1702, 0.1108, 0.0751, 0.0625, 0.1479], device='cuda:2'), in_proj_covar=tensor([0.0223, 0.0224, 0.0242, 0.0201, 0.0234, 0.0299, 0.0214, 0.0219], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 05:42:32,238 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.570e+02 2.236e+02 2.829e+02 3.523e+02 7.340e+02, threshold=5.658e+02, percent-clipped=1.0 2023-03-08 05:42:37,276 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2023-03-08 05:42:58,805 INFO [train2.py:809] (2/4) Epoch 11, batch 3800, loss[ctc_loss=0.0826, att_loss=0.2273, loss=0.1984, over 12279.00 frames. utt_duration=1821 frames, utt_pad_proportion=0.1436, over 27.00 utterances.], tot_loss[ctc_loss=0.1009, att_loss=0.249, loss=0.2194, over 3253284.86 frames. utt_duration=1247 frames, utt_pad_proportion=0.05955, over 10449.83 utterances.], batch size: 27, lr: 9.70e-03, grad_scale: 8.0 2023-03-08 05:44:18,445 INFO [train2.py:809] (2/4) Epoch 11, batch 3850, loss[ctc_loss=0.09151, att_loss=0.2586, loss=0.2252, over 16960.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007904, over 50.00 utterances.], tot_loss[ctc_loss=0.09986, att_loss=0.2484, loss=0.2187, over 3251630.68 frames. utt_duration=1268 frames, utt_pad_proportion=0.05449, over 10266.79 utterances.], batch size: 50, lr: 9.69e-03, grad_scale: 8.0 2023-03-08 05:45:00,477 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-03-08 05:45:10,393 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.366e+02 2.441e+02 3.005e+02 4.132e+02 9.940e+02, threshold=6.009e+02, percent-clipped=5.0 2023-03-08 05:45:35,989 INFO [train2.py:809] (2/4) Epoch 11, batch 3900, loss[ctc_loss=0.079, att_loss=0.2203, loss=0.1921, over 15760.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009342, over 38.00 utterances.], tot_loss[ctc_loss=0.1013, att_loss=0.2495, loss=0.2199, over 3257795.05 frames. utt_duration=1225 frames, utt_pad_proportion=0.06359, over 10653.25 utterances.], batch size: 38, lr: 9.69e-03, grad_scale: 8.0 2023-03-08 05:46:51,881 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=43787.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:46:53,044 INFO [train2.py:809] (2/4) Epoch 11, batch 3950, loss[ctc_loss=0.08037, att_loss=0.2438, loss=0.2111, over 16495.00 frames. utt_duration=1436 frames, utt_pad_proportion=0.004791, over 46.00 utterances.], tot_loss[ctc_loss=0.1018, att_loss=0.2498, loss=0.2202, over 3251780.81 frames. utt_duration=1218 frames, utt_pad_proportion=0.06644, over 10693.23 utterances.], batch size: 46, lr: 9.68e-03, grad_scale: 8.0 2023-03-08 05:48:13,299 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.357e+02 2.409e+02 2.914e+02 3.960e+02 1.049e+03, threshold=5.829e+02, percent-clipped=5.0 2023-03-08 05:48:13,344 INFO [train2.py:809] (2/4) Epoch 12, batch 0, loss[ctc_loss=0.08676, att_loss=0.2396, loss=0.209, over 16535.00 frames. 
utt_duration=1471 frames, utt_pad_proportion=0.006647, over 45.00 utterances.], tot_loss[ctc_loss=0.08676, att_loss=0.2396, loss=0.209, over 16535.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006647, over 45.00 utterances.], batch size: 45, lr: 9.27e-03, grad_scale: 8.0 2023-03-08 05:48:13,345 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 05:48:20,805 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9862, 4.0087, 3.8635, 2.3258, 3.7282, 3.8274, 3.4142, 2.4491], device='cuda:2'), covar=tensor([0.0140, 0.0116, 0.0218, 0.1128, 0.0163, 0.0250, 0.0393, 0.1470], device='cuda:2'), in_proj_covar=tensor([0.0063, 0.0086, 0.0078, 0.0107, 0.0072, 0.0097, 0.0095, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 05:48:25,650 INFO [train2.py:843] (2/4) Epoch 12, validation: ctc_loss=0.04785, att_loss=0.2384, loss=0.2003, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 05:48:25,651 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 05:49:09,183 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=43848.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:49:45,222 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=43870.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:49:48,003 INFO [train2.py:809] (2/4) Epoch 12, batch 50, loss[ctc_loss=0.09628, att_loss=0.2377, loss=0.2094, over 15650.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008545, over 37.00 utterances.], tot_loss[ctc_loss=0.1016, att_loss=0.251, loss=0.2211, over 736477.88 frames. utt_duration=1139 frames, utt_pad_proportion=0.08619, over 2589.07 utterances.], batch size: 37, lr: 9.26e-03, grad_scale: 8.0 2023-03-08 05:50:08,442 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6881, 3.5384, 2.9269, 3.1406, 3.6717, 3.4043, 2.4873, 3.8811], device='cuda:2'), covar=tensor([0.1065, 0.0498, 0.1151, 0.0716, 0.0734, 0.0679, 0.0961, 0.0537], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0182, 0.0201, 0.0173, 0.0236, 0.0208, 0.0181, 0.0248], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 05:50:48,080 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=43909.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:50:57,788 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6354, 2.7496, 3.7962, 4.4200, 3.9604, 3.9880, 3.1038, 2.1870], device='cuda:2'), covar=tensor([0.0625, 0.2165, 0.0739, 0.0537, 0.0693, 0.0363, 0.1318, 0.2368], device='cuda:2'), in_proj_covar=tensor([0.0166, 0.0205, 0.0185, 0.0191, 0.0188, 0.0149, 0.0193, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 05:51:09,637 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.511e+02 2.447e+02 2.836e+02 3.349e+02 4.947e+02, threshold=5.672e+02, percent-clipped=0.0 2023-03-08 05:51:09,682 INFO [train2.py:809] (2/4) Epoch 12, batch 100, loss[ctc_loss=0.0844, att_loss=0.2206, loss=0.1933, over 14540.00 frames. utt_duration=1819 frames, utt_pad_proportion=0.03811, over 32.00 utterances.], tot_loss[ctc_loss=0.1023, att_loss=0.2512, loss=0.2214, over 1288045.23 frames. 
utt_duration=1143 frames, utt_pad_proportion=0.08774, over 4514.06 utterances.], batch size: 32, lr: 9.26e-03, grad_scale: 8.0 2023-03-08 05:52:09,343 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=43959.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:52:28,371 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=43970.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:52:31,105 INFO [train2.py:809] (2/4) Epoch 12, batch 150, loss[ctc_loss=0.1016, att_loss=0.2359, loss=0.2091, over 13657.00 frames. utt_duration=1822 frames, utt_pad_proportion=0.07777, over 30.00 utterances.], tot_loss[ctc_loss=0.1009, att_loss=0.2493, loss=0.2196, over 1723792.17 frames. utt_duration=1224 frames, utt_pad_proportion=0.06858, over 5642.12 utterances.], batch size: 30, lr: 9.25e-03, grad_scale: 8.0 2023-03-08 05:52:57,515 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9212, 5.1781, 5.1716, 5.1915, 5.3193, 5.2434, 4.9788, 4.7584], device='cuda:2'), covar=tensor([0.1085, 0.0579, 0.0275, 0.0489, 0.0303, 0.0302, 0.0333, 0.0335], device='cuda:2'), in_proj_covar=tensor([0.0459, 0.0294, 0.0253, 0.0286, 0.0351, 0.0361, 0.0292, 0.0325], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 05:53:52,671 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=44020.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:53:55,400 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.396e+02 2.379e+02 2.873e+02 3.449e+02 9.028e+02, threshold=5.746e+02, percent-clipped=3.0 2023-03-08 05:53:55,442 INFO [train2.py:809] (2/4) Epoch 12, batch 200, loss[ctc_loss=0.06119, att_loss=0.2284, loss=0.195, over 16383.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.00809, over 44.00 utterances.], tot_loss[ctc_loss=0.1011, att_loss=0.2493, loss=0.2196, over 2058569.55 frames. utt_duration=1233 frames, utt_pad_proportion=0.06542, over 6687.49 utterances.], batch size: 44, lr: 9.25e-03, grad_scale: 8.0 2023-03-08 05:54:20,773 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6987, 3.7110, 3.5156, 3.0056, 3.6085, 3.6694, 3.6099, 2.6134], device='cuda:2'), covar=tensor([0.1147, 0.1510, 0.3036, 0.8195, 0.2973, 0.4840, 0.0880, 0.7975], device='cuda:2'), in_proj_covar=tensor([0.0102, 0.0123, 0.0136, 0.0204, 0.0105, 0.0187, 0.0107, 0.0176], device='cuda:2'), out_proj_covar=tensor([9.8501e-05, 1.0916e-04, 1.2193e-04, 1.6674e-04, 9.8410e-05, 1.5541e-04, 9.5477e-05, 1.4611e-04], device='cuda:2') 2023-03-08 05:55:14,812 INFO [train2.py:809] (2/4) Epoch 12, batch 250, loss[ctc_loss=0.1578, att_loss=0.2878, loss=0.2618, over 13945.00 frames. utt_duration=383.7 frames, utt_pad_proportion=0.3316, over 146.00 utterances.], tot_loss[ctc_loss=0.1015, att_loss=0.2496, loss=0.22, over 2331185.18 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.06256, over 7561.58 utterances.], batch size: 146, lr: 9.24e-03, grad_scale: 8.0 2023-03-08 05:56:14,290 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6532, 2.0737, 5.0162, 3.8616, 2.8976, 4.3579, 4.9023, 4.6606], device='cuda:2'), covar=tensor([0.0185, 0.2068, 0.0139, 0.1026, 0.1946, 0.0219, 0.0090, 0.0196], device='cuda:2'), in_proj_covar=tensor([0.0152, 0.0245, 0.0137, 0.0306, 0.0273, 0.0184, 0.0122, 0.0156], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 05:56:35,436 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.521e+02 2.301e+02 2.863e+02 3.737e+02 8.031e+02, threshold=5.726e+02, percent-clipped=4.0 2023-03-08 05:56:35,480 INFO [train2.py:809] (2/4) Epoch 12, batch 300, loss[ctc_loss=0.08967, att_loss=0.2288, loss=0.201, over 15916.00 frames. utt_duration=1634 frames, utt_pad_proportion=0.007353, over 39.00 utterances.], tot_loss[ctc_loss=0.1011, att_loss=0.2495, loss=0.2198, over 2549585.11 frames. utt_duration=1245 frames, utt_pad_proportion=0.05409, over 8203.13 utterances.], batch size: 39, lr: 9.24e-03, grad_scale: 8.0 2023-03-08 05:56:46,169 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-03-08 05:57:10,234 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44143.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:57:53,918 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=44170.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:57:56,721 INFO [train2.py:809] (2/4) Epoch 12, batch 350, loss[ctc_loss=0.06754, att_loss=0.2269, loss=0.195, over 15942.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007031, over 41.00 utterances.], tot_loss[ctc_loss=0.1014, att_loss=0.25, loss=0.2202, over 2707055.65 frames. utt_duration=1194 frames, utt_pad_proportion=0.06778, over 9078.26 utterances.], batch size: 41, lr: 9.23e-03, grad_scale: 8.0 2023-03-08 05:59:10,321 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=44218.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 05:59:17,105 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.518e+02 2.261e+02 2.765e+02 3.364e+02 1.191e+03, threshold=5.531e+02, percent-clipped=3.0 2023-03-08 05:59:17,148 INFO [train2.py:809] (2/4) Epoch 12, batch 400, loss[ctc_loss=0.06912, att_loss=0.2225, loss=0.1918, over 16272.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007041, over 43.00 utterances.], tot_loss[ctc_loss=0.1001, att_loss=0.2492, loss=0.2194, over 2832734.51 frames. utt_duration=1233 frames, utt_pad_proportion=0.05721, over 9202.73 utterances.], batch size: 43, lr: 9.23e-03, grad_scale: 8.0 2023-03-08 05:59:25,780 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-03-08 05:59:53,296 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.80 vs. limit=2.0 2023-03-08 06:00:24,823 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44265.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:00:25,468 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.58 vs. limit=2.0 2023-03-08 06:00:36,683 INFO [train2.py:809] (2/4) Epoch 12, batch 450, loss[ctc_loss=0.1049, att_loss=0.221, loss=0.1978, over 14526.00 frames. 
utt_duration=1817 frames, utt_pad_proportion=0.03231, over 32.00 utterances.], tot_loss[ctc_loss=0.1006, att_loss=0.2494, loss=0.2196, over 2927827.70 frames. utt_duration=1223 frames, utt_pad_proportion=0.06122, over 9586.48 utterances.], batch size: 32, lr: 9.22e-03, grad_scale: 8.0 2023-03-08 06:01:28,121 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2023-03-08 06:01:44,251 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44315.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:01:56,074 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+02 2.333e+02 2.897e+02 3.791e+02 1.244e+03, threshold=5.793e+02, percent-clipped=7.0 2023-03-08 06:01:56,118 INFO [train2.py:809] (2/4) Epoch 12, batch 500, loss[ctc_loss=0.08837, att_loss=0.2303, loss=0.2019, over 16186.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.00589, over 41.00 utterances.], tot_loss[ctc_loss=0.1004, att_loss=0.2495, loss=0.2197, over 3008099.08 frames. utt_duration=1234 frames, utt_pad_proportion=0.05689, over 9760.35 utterances.], batch size: 41, lr: 9.22e-03, grad_scale: 8.0 2023-03-08 06:03:03,222 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-08 06:03:15,764 INFO [train2.py:809] (2/4) Epoch 12, batch 550, loss[ctc_loss=0.0915, att_loss=0.2541, loss=0.2216, over 17299.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02366, over 59.00 utterances.], tot_loss[ctc_loss=0.09988, att_loss=0.249, loss=0.2191, over 3064923.38 frames. utt_duration=1270 frames, utt_pad_proportion=0.04875, over 9663.82 utterances.], batch size: 59, lr: 9.21e-03, grad_scale: 8.0 2023-03-08 06:03:21,518 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-08 06:04:33,719 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7984, 5.1184, 4.5662, 5.2281, 4.5247, 4.8430, 5.2524, 5.0451], device='cuda:2'), covar=tensor([0.0591, 0.0331, 0.1005, 0.0243, 0.0512, 0.0293, 0.0225, 0.0185], device='cuda:2'), in_proj_covar=tensor([0.0340, 0.0267, 0.0321, 0.0263, 0.0273, 0.0207, 0.0251, 0.0242], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 06:04:34,992 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.625e+02 2.230e+02 2.908e+02 3.570e+02 8.469e+02, threshold=5.817e+02, percent-clipped=3.0 2023-03-08 06:04:35,039 INFO [train2.py:809] (2/4) Epoch 12, batch 600, loss[ctc_loss=0.1086, att_loss=0.2529, loss=0.2241, over 17023.00 frames. utt_duration=1286 frames, utt_pad_proportion=0.01134, over 53.00 utterances.], tot_loss[ctc_loss=0.09995, att_loss=0.2496, loss=0.2197, over 3123966.23 frames. 
utt_duration=1266 frames, utt_pad_proportion=0.04631, over 9885.17 utterances.], batch size: 53, lr: 9.21e-03, grad_scale: 8.0 2023-03-08 06:05:04,144 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5966, 5.1882, 4.8581, 4.9239, 5.2129, 4.7254, 3.6967, 5.1039], device='cuda:2'), covar=tensor([0.0110, 0.0089, 0.0113, 0.0082, 0.0080, 0.0101, 0.0592, 0.0181], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0071, 0.0088, 0.0054, 0.0059, 0.0069, 0.0092, 0.0093], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 06:05:08,620 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=44443.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:05:35,077 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.51 vs. limit=5.0 2023-03-08 06:05:53,592 INFO [train2.py:809] (2/4) Epoch 12, batch 650, loss[ctc_loss=0.09807, att_loss=0.2349, loss=0.2075, over 15992.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.008086, over 40.00 utterances.], tot_loss[ctc_loss=0.09894, att_loss=0.2483, loss=0.2185, over 3155350.14 frames. utt_duration=1285 frames, utt_pad_proportion=0.04361, over 9829.89 utterances.], batch size: 40, lr: 9.20e-03, grad_scale: 8.0 2023-03-08 06:06:23,474 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=44491.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:07:12,354 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.481e+02 2.522e+02 3.198e+02 3.829e+02 8.235e+02, threshold=6.395e+02, percent-clipped=2.0 2023-03-08 06:07:12,397 INFO [train2.py:809] (2/4) Epoch 12, batch 700, loss[ctc_loss=0.09553, att_loss=0.2512, loss=0.22, over 16969.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007275, over 50.00 utterances.], tot_loss[ctc_loss=0.1001, att_loss=0.2489, loss=0.2191, over 3185257.33 frames. utt_duration=1281 frames, utt_pad_proportion=0.04364, over 9956.32 utterances.], batch size: 50, lr: 9.20e-03, grad_scale: 8.0 2023-03-08 06:07:20,679 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=44527.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:07:57,873 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=44550.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:08:17,045 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1642, 1.9097, 2.0789, 2.3138, 2.8544, 2.1895, 2.3957, 3.1388], device='cuda:2'), covar=tensor([0.1013, 0.3591, 0.3146, 0.1614, 0.1246, 0.1593, 0.1996, 0.0941], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0085, 0.0091, 0.0076, 0.0076, 0.0069, 0.0088, 0.0062], device='cuda:2'), out_proj_covar=tensor([5.2537e-05, 5.9881e-05, 6.2751e-05, 5.2260e-05, 5.0311e-05, 5.0069e-05, 6.0393e-05, 4.6047e-05], device='cuda:2') 2023-03-08 06:08:21,819 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=44565.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:08:32,493 INFO [train2.py:809] (2/4) Epoch 12, batch 750, loss[ctc_loss=0.09559, att_loss=0.237, loss=0.2087, over 16167.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.007072, over 41.00 utterances.], tot_loss[ctc_loss=0.1006, att_loss=0.2493, loss=0.2196, over 3203335.96 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.05034, over 10188.77 utterances.], batch size: 41, lr: 9.19e-03, grad_scale: 8.0 2023-03-08 06:08:58,575 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=44588.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:09:11,729 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.65 vs. limit=5.0 2023-03-08 06:09:35,570 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=44611.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:09:39,034 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=44613.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:09:42,318 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=44615.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:09:52,664 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.593e+02 2.480e+02 2.940e+02 3.562e+02 6.927e+02, threshold=5.880e+02, percent-clipped=2.0 2023-03-08 06:09:52,707 INFO [train2.py:809] (2/4) Epoch 12, batch 800, loss[ctc_loss=0.123, att_loss=0.2688, loss=0.2396, over 17333.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02204, over 59.00 utterances.], tot_loss[ctc_loss=0.09984, att_loss=0.2489, loss=0.2191, over 3215580.89 frames. utt_duration=1255 frames, utt_pad_proportion=0.05192, over 10258.94 utterances.], batch size: 59, lr: 9.19e-03, grad_scale: 8.0 2023-03-08 06:10:23,862 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.55 vs. limit=5.0 2023-03-08 06:10:57,837 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=44663.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:11:12,232 INFO [train2.py:809] (2/4) Epoch 12, batch 850, loss[ctc_loss=0.112, att_loss=0.2624, loss=0.2323, over 17311.00 frames. utt_duration=878 frames, utt_pad_proportion=0.07969, over 79.00 utterances.], tot_loss[ctc_loss=0.1003, att_loss=0.2494, loss=0.2196, over 3229349.56 frames. utt_duration=1217 frames, utt_pad_proportion=0.06107, over 10626.33 utterances.], batch size: 79, lr: 9.18e-03, grad_scale: 8.0 2023-03-08 06:12:30,574 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.599e+02 2.371e+02 2.719e+02 3.452e+02 7.248e+02, threshold=5.437e+02, percent-clipped=1.0 2023-03-08 06:12:30,617 INFO [train2.py:809] (2/4) Epoch 12, batch 900, loss[ctc_loss=0.09246, att_loss=0.2573, loss=0.2243, over 17295.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01235, over 55.00 utterances.], tot_loss[ctc_loss=0.09996, att_loss=0.2494, loss=0.2195, over 3244199.13 frames. utt_duration=1219 frames, utt_pad_proportion=0.05843, over 10654.84 utterances.], batch size: 55, lr: 9.18e-03, grad_scale: 8.0 2023-03-08 06:13:24,801 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5449, 5.1676, 4.9631, 4.9270, 5.0907, 4.8177, 3.6349, 5.0206], device='cuda:2'), covar=tensor([0.0117, 0.0122, 0.0132, 0.0109, 0.0098, 0.0096, 0.0679, 0.0289], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0071, 0.0088, 0.0054, 0.0059, 0.0069, 0.0092, 0.0092], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 06:13:33,555 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.99 vs. limit=5.0 2023-03-08 06:13:49,827 INFO [train2.py:809] (2/4) Epoch 12, batch 950, loss[ctc_loss=0.123, att_loss=0.2556, loss=0.2291, over 17059.00 frames. 
utt_duration=1314 frames, utt_pad_proportion=0.008578, over 52.00 utterances.], tot_loss[ctc_loss=0.09987, att_loss=0.2493, loss=0.2194, over 3248141.78 frames. utt_duration=1218 frames, utt_pad_proportion=0.06046, over 10680.50 utterances.], batch size: 52, lr: 9.17e-03, grad_scale: 8.0 2023-03-08 06:14:44,460 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7049, 4.6061, 4.5871, 4.6216, 5.1239, 4.5665, 4.7229, 2.2947], device='cuda:2'), covar=tensor([0.0182, 0.0244, 0.0211, 0.0218, 0.1069, 0.0199, 0.0199, 0.2170], device='cuda:2'), in_proj_covar=tensor([0.0125, 0.0132, 0.0136, 0.0144, 0.0330, 0.0119, 0.0119, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 06:15:09,798 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.510e+02 2.480e+02 3.005e+02 3.455e+02 6.503e+02, threshold=6.009e+02, percent-clipped=2.0 2023-03-08 06:15:09,841 INFO [train2.py:809] (2/4) Epoch 12, batch 1000, loss[ctc_loss=0.06894, att_loss=0.2331, loss=0.2003, over 16328.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.00646, over 45.00 utterances.], tot_loss[ctc_loss=0.09984, att_loss=0.2493, loss=0.2194, over 3259131.94 frames. utt_duration=1246 frames, utt_pad_proportion=0.05263, over 10474.92 utterances.], batch size: 45, lr: 9.17e-03, grad_scale: 8.0 2023-03-08 06:16:28,096 INFO [train2.py:809] (2/4) Epoch 12, batch 1050, loss[ctc_loss=0.09012, att_loss=0.2527, loss=0.2202, over 16969.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007305, over 50.00 utterances.], tot_loss[ctc_loss=0.101, att_loss=0.25, loss=0.2202, over 3266643.18 frames. utt_duration=1250 frames, utt_pad_proportion=0.05061, over 10464.51 utterances.], batch size: 50, lr: 9.16e-03, grad_scale: 8.0 2023-03-08 06:16:46,083 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44883.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:17:22,292 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44906.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:17:39,195 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2023-03-08 06:17:47,208 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+02 2.358e+02 3.023e+02 3.796e+02 1.133e+03, threshold=6.045e+02, percent-clipped=5.0 2023-03-08 06:17:47,251 INFO [train2.py:809] (2/4) Epoch 12, batch 1100, loss[ctc_loss=0.1107, att_loss=0.234, loss=0.2093, over 15647.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008233, over 37.00 utterances.], tot_loss[ctc_loss=0.1004, att_loss=0.2492, loss=0.2194, over 3252584.55 frames. utt_duration=1273 frames, utt_pad_proportion=0.04861, over 10228.49 utterances.], batch size: 37, lr: 9.16e-03, grad_scale: 8.0 2023-03-08 06:18:52,476 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2023-03-08 06:19:06,840 INFO [train2.py:809] (2/4) Epoch 12, batch 1150, loss[ctc_loss=0.08771, att_loss=0.2278, loss=0.1998, over 15786.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007819, over 38.00 utterances.], tot_loss[ctc_loss=0.1001, att_loss=0.2483, loss=0.2187, over 3255767.52 frames. 
utt_duration=1276 frames, utt_pad_proportion=0.04898, over 10217.23 utterances.], batch size: 38, lr: 9.15e-03, grad_scale: 8.0 2023-03-08 06:20:27,105 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+02 2.531e+02 3.116e+02 3.796e+02 1.058e+03, threshold=6.232e+02, percent-clipped=4.0 2023-03-08 06:20:27,148 INFO [train2.py:809] (2/4) Epoch 12, batch 1200, loss[ctc_loss=0.0807, att_loss=0.2183, loss=0.1908, over 15899.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.008521, over 39.00 utterances.], tot_loss[ctc_loss=0.09906, att_loss=0.2472, loss=0.2176, over 3252994.60 frames. utt_duration=1286 frames, utt_pad_proportion=0.04776, over 10126.60 utterances.], batch size: 39, lr: 9.15e-03, grad_scale: 8.0 2023-03-08 06:21:22,296 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45056.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:21:47,503 INFO [train2.py:809] (2/4) Epoch 12, batch 1250, loss[ctc_loss=0.1112, att_loss=0.279, loss=0.2455, over 17081.00 frames. utt_duration=1290 frames, utt_pad_proportion=0.008165, over 53.00 utterances.], tot_loss[ctc_loss=0.0989, att_loss=0.2475, loss=0.2178, over 3259709.39 frames. utt_duration=1279 frames, utt_pad_proportion=0.0493, over 10204.58 utterances.], batch size: 53, lr: 9.14e-03, grad_scale: 8.0 2023-03-08 06:21:51,602 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4583, 2.7149, 3.6646, 4.3342, 3.9471, 4.0429, 2.9250, 2.4076], device='cuda:2'), covar=tensor([0.0647, 0.2283, 0.0842, 0.0552, 0.0813, 0.0355, 0.1415, 0.2091], device='cuda:2'), in_proj_covar=tensor([0.0170, 0.0210, 0.0187, 0.0197, 0.0190, 0.0153, 0.0196, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 06:22:02,286 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45081.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:22:15,894 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0450, 4.9702, 4.8290, 2.9816, 4.6134, 4.4584, 4.2001, 2.6585], device='cuda:2'), covar=tensor([0.0147, 0.0095, 0.0246, 0.1036, 0.0111, 0.0229, 0.0366, 0.1445], device='cuda:2'), in_proj_covar=tensor([0.0062, 0.0085, 0.0078, 0.0105, 0.0071, 0.0096, 0.0095, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 06:22:58,408 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45117.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 06:23:05,868 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.644e+02 2.248e+02 2.874e+02 3.449e+02 8.124e+02, threshold=5.748e+02, percent-clipped=2.0 2023-03-08 06:23:05,913 INFO [train2.py:809] (2/4) Epoch 12, batch 1300, loss[ctc_loss=0.1074, att_loss=0.2633, loss=0.2321, over 16759.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.00691, over 48.00 utterances.], tot_loss[ctc_loss=0.09958, att_loss=0.2478, loss=0.2182, over 3264167.27 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.05264, over 10385.95 utterances.], batch size: 48, lr: 9.14e-03, grad_scale: 8.0 2023-03-08 06:23:23,913 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1189, 5.2148, 5.0168, 2.4444, 2.0244, 2.9193, 2.8580, 3.8054], device='cuda:2'), covar=tensor([0.0673, 0.0212, 0.0215, 0.4818, 0.6222, 0.2419, 0.2592, 0.1888], device='cuda:2'), in_proj_covar=tensor([0.0339, 0.0225, 0.0241, 0.0215, 0.0348, 0.0338, 0.0232, 0.0355], device='cuda:2'), out_proj_covar=tensor([1.5141e-04, 8.3281e-05, 1.0350e-04, 9.5110e-05, 1.5155e-04, 1.3644e-04, 9.1692e-05, 1.4946e-04], device='cuda:2') 2023-03-08 06:23:37,476 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45142.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:24:24,487 INFO [train2.py:809] (2/4) Epoch 12, batch 1350, loss[ctc_loss=0.1172, att_loss=0.2716, loss=0.2407, over 17029.00 frames. utt_duration=1287 frames, utt_pad_proportion=0.01035, over 53.00 utterances.], tot_loss[ctc_loss=0.09943, att_loss=0.2486, loss=0.2188, over 3276826.36 frames. utt_duration=1283 frames, utt_pad_proportion=0.04436, over 10224.85 utterances.], batch size: 53, lr: 9.13e-03, grad_scale: 8.0 2023-03-08 06:24:42,121 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=45183.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:24:54,760 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5421, 5.0566, 4.7402, 4.8441, 5.0071, 4.6154, 3.5324, 4.8915], device='cuda:2'), covar=tensor([0.0112, 0.0111, 0.0144, 0.0087, 0.0088, 0.0121, 0.0686, 0.0208], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0070, 0.0087, 0.0053, 0.0058, 0.0070, 0.0091, 0.0091], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 06:25:18,425 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=45206.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:25:43,525 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.529e+02 2.386e+02 2.782e+02 3.300e+02 6.204e+02, threshold=5.564e+02, percent-clipped=3.0 2023-03-08 06:25:43,570 INFO [train2.py:809] (2/4) Epoch 12, batch 1400, loss[ctc_loss=0.1154, att_loss=0.2583, loss=0.2297, over 16642.00 frames. utt_duration=673.9 frames, utt_pad_proportion=0.1491, over 99.00 utterances.], tot_loss[ctc_loss=0.09861, att_loss=0.2472, loss=0.2175, over 3267682.52 frames. utt_duration=1311 frames, utt_pad_proportion=0.04016, over 9982.10 utterances.], batch size: 99, lr: 9.13e-03, grad_scale: 8.0 2023-03-08 06:25:45,867 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.82 vs. limit=2.0 2023-03-08 06:25:58,481 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=45231.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:26:25,571 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.95 vs. limit=5.0 2023-03-08 06:26:34,542 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=45254.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:27:04,202 INFO [train2.py:809] (2/4) Epoch 12, batch 1450, loss[ctc_loss=0.158, att_loss=0.2844, loss=0.2591, over 13822.00 frames. utt_duration=383 frames, utt_pad_proportion=0.3339, over 145.00 utterances.], tot_loss[ctc_loss=0.09884, att_loss=0.2479, loss=0.2181, over 3270369.99 frames. 
utt_duration=1276 frames, utt_pad_proportion=0.04826, over 10263.50 utterances.], batch size: 145, lr: 9.12e-03, grad_scale: 8.0 2023-03-08 06:28:17,197 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-08 06:28:24,615 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 2.356e+02 2.763e+02 3.457e+02 6.962e+02, threshold=5.526e+02, percent-clipped=5.0 2023-03-08 06:28:24,658 INFO [train2.py:809] (2/4) Epoch 12, batch 1500, loss[ctc_loss=0.08136, att_loss=0.214, loss=0.1875, over 15513.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.007971, over 36.00 utterances.], tot_loss[ctc_loss=0.0986, att_loss=0.2477, loss=0.2179, over 3266241.39 frames. utt_duration=1273 frames, utt_pad_proportion=0.05018, over 10273.26 utterances.], batch size: 36, lr: 9.12e-03, grad_scale: 8.0 2023-03-08 06:29:06,048 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45348.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 06:29:09,232 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-03-08 06:29:45,020 INFO [train2.py:809] (2/4) Epoch 12, batch 1550, loss[ctc_loss=0.08792, att_loss=0.2196, loss=0.1933, over 15792.00 frames. utt_duration=1664 frames, utt_pad_proportion=0.007254, over 38.00 utterances.], tot_loss[ctc_loss=0.09721, att_loss=0.2464, loss=0.2166, over 3257390.51 frames. utt_duration=1292 frames, utt_pad_proportion=0.04779, over 10096.13 utterances.], batch size: 38, lr: 9.11e-03, grad_scale: 8.0 2023-03-08 06:30:44,805 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45409.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 06:30:49,187 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=45412.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 06:31:05,236 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.391e+02 2.353e+02 2.825e+02 3.392e+02 6.329e+02, threshold=5.650e+02, percent-clipped=2.0 2023-03-08 06:31:05,279 INFO [train2.py:809] (2/4) Epoch 12, batch 1600, loss[ctc_loss=0.08478, att_loss=0.2454, loss=0.2133, over 17414.00 frames. utt_duration=1011 frames, utt_pad_proportion=0.04718, over 69.00 utterances.], tot_loss[ctc_loss=0.09826, att_loss=0.247, loss=0.2173, over 3260623.14 frames. utt_duration=1280 frames, utt_pad_proportion=0.05009, over 10204.42 utterances.], batch size: 69, lr: 9.11e-03, grad_scale: 16.0 2023-03-08 06:31:28,671 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=45437.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:31:49,347 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1132, 6.3010, 5.6922, 6.0689, 5.9905, 5.4981, 5.7572, 5.5046], device='cuda:2'), covar=tensor([0.1260, 0.0872, 0.0945, 0.0750, 0.0804, 0.1454, 0.2250, 0.2496], device='cuda:2'), in_proj_covar=tensor([0.0445, 0.0507, 0.0386, 0.0383, 0.0372, 0.0431, 0.0525, 0.0461], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 06:32:24,619 INFO [train2.py:809] (2/4) Epoch 12, batch 1650, loss[ctc_loss=0.1183, att_loss=0.2566, loss=0.229, over 16889.00 frames. utt_duration=683.8 frames, utt_pad_proportion=0.1399, over 99.00 utterances.], tot_loss[ctc_loss=0.09948, att_loss=0.2485, loss=0.2187, over 3270912.15 frames. 
utt_duration=1244 frames, utt_pad_proportion=0.05615, over 10531.79 utterances.], batch size: 99, lr: 9.10e-03, grad_scale: 16.0 2023-03-08 06:32:36,135 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7731, 3.7158, 3.5201, 3.2165, 3.7316, 3.7625, 3.5132, 2.6411], device='cuda:2'), covar=tensor([0.1048, 0.1795, 0.3190, 0.5054, 0.4262, 0.4590, 0.1281, 0.6822], device='cuda:2'), in_proj_covar=tensor([0.0106, 0.0129, 0.0139, 0.0214, 0.0112, 0.0196, 0.0116, 0.0184], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 06:33:43,906 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+02 2.369e+02 2.877e+02 3.430e+02 6.900e+02, threshold=5.754e+02, percent-clipped=3.0 2023-03-08 06:33:43,951 INFO [train2.py:809] (2/4) Epoch 12, batch 1700, loss[ctc_loss=0.09229, att_loss=0.2248, loss=0.1983, over 15660.00 frames. utt_duration=1695 frames, utt_pad_proportion=0.007817, over 37.00 utterances.], tot_loss[ctc_loss=0.09906, att_loss=0.2478, loss=0.2181, over 3266153.79 frames. utt_duration=1253 frames, utt_pad_proportion=0.05578, over 10443.12 utterances.], batch size: 37, lr: 9.10e-03, grad_scale: 16.0 2023-03-08 06:35:04,020 INFO [train2.py:809] (2/4) Epoch 12, batch 1750, loss[ctc_loss=0.06439, att_loss=0.2129, loss=0.1832, over 15507.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.007544, over 36.00 utterances.], tot_loss[ctc_loss=0.09882, att_loss=0.2477, loss=0.218, over 3275488.04 frames. utt_duration=1262 frames, utt_pad_proportion=0.05036, over 10395.31 utterances.], batch size: 36, lr: 9.09e-03, grad_scale: 16.0 2023-03-08 06:36:24,186 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.564e+02 2.369e+02 2.874e+02 3.424e+02 1.036e+03, threshold=5.747e+02, percent-clipped=2.0 2023-03-08 06:36:24,229 INFO [train2.py:809] (2/4) Epoch 12, batch 1800, loss[ctc_loss=0.08873, att_loss=0.2499, loss=0.2177, over 17331.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02293, over 59.00 utterances.], tot_loss[ctc_loss=0.09831, att_loss=0.2476, loss=0.2178, over 3281943.26 frames. utt_duration=1274 frames, utt_pad_proportion=0.0467, over 10314.13 utterances.], batch size: 59, lr: 9.09e-03, grad_scale: 16.0 2023-03-08 06:36:26,059 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-08 06:37:08,222 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.25 vs. limit=5.0 2023-03-08 06:37:43,871 INFO [train2.py:809] (2/4) Epoch 12, batch 1850, loss[ctc_loss=0.1353, att_loss=0.2774, loss=0.249, over 17144.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.0135, over 56.00 utterances.], tot_loss[ctc_loss=0.09868, att_loss=0.2484, loss=0.2185, over 3282974.04 frames. utt_duration=1274 frames, utt_pad_proportion=0.04611, over 10320.72 utterances.], batch size: 56, lr: 9.08e-03, grad_scale: 8.0 2023-03-08 06:38:32,229 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45702.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:38:35,070 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=45704.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 06:38:48,030 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=45712.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 06:39:02,763 INFO [train2.py:809] (2/4) Epoch 12, batch 1900, loss[ctc_loss=0.1211, att_loss=0.2684, loss=0.239, over 17292.00 frames. 
utt_duration=877.1 frames, utt_pad_proportion=0.07967, over 79.00 utterances.], tot_loss[ctc_loss=0.09909, att_loss=0.2492, loss=0.2192, over 3289224.30 frames. utt_duration=1272 frames, utt_pad_proportion=0.04483, over 10354.28 utterances.], batch size: 79, lr: 9.08e-03, grad_scale: 8.0 2023-03-08 06:39:04,236 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.442e+02 2.369e+02 2.834e+02 3.743e+02 8.835e+02, threshold=5.667e+02, percent-clipped=7.0 2023-03-08 06:39:26,609 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=45737.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:39:39,303 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7698, 3.0483, 3.6917, 3.3709, 3.6104, 4.7944, 4.5369, 3.6816], device='cuda:2'), covar=tensor([0.0362, 0.1742, 0.1095, 0.1185, 0.1013, 0.0720, 0.0502, 0.1072], device='cuda:2'), in_proj_covar=tensor([0.0236, 0.0238, 0.0256, 0.0212, 0.0247, 0.0318, 0.0228, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 06:39:50,591 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6963, 3.9265, 3.9035, 2.3891, 2.3459, 2.7604, 2.4487, 3.5135], device='cuda:2'), covar=tensor([0.0681, 0.0267, 0.0299, 0.2959, 0.3973, 0.2166, 0.2129, 0.1350], device='cuda:2'), in_proj_covar=tensor([0.0334, 0.0218, 0.0236, 0.0209, 0.0339, 0.0332, 0.0227, 0.0350], device='cuda:2'), out_proj_covar=tensor([1.4946e-04, 8.1045e-05, 1.0167e-04, 9.1687e-05, 1.4737e-04, 1.3363e-04, 8.9648e-05, 1.4721e-04], device='cuda:2') 2023-03-08 06:40:02,993 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=45760.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:40:07,768 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45763.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:40:21,061 INFO [train2.py:809] (2/4) Epoch 12, batch 1950, loss[ctc_loss=0.07692, att_loss=0.2387, loss=0.2063, over 16463.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.006915, over 46.00 utterances.], tot_loss[ctc_loss=0.09961, att_loss=0.2492, loss=0.2193, over 3287822.03 frames. utt_duration=1252 frames, utt_pad_proportion=0.05044, over 10519.46 utterances.], batch size: 46, lr: 9.07e-03, grad_scale: 8.0 2023-03-08 06:40:24,508 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45774.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:40:41,626 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=45785.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:41:41,126 INFO [train2.py:809] (2/4) Epoch 12, batch 2000, loss[ctc_loss=0.09235, att_loss=0.2551, loss=0.2225, over 16459.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.006855, over 46.00 utterances.], tot_loss[ctc_loss=0.1002, att_loss=0.2495, loss=0.2197, over 3278885.63 frames. 
utt_duration=1230 frames, utt_pad_proportion=0.05559, over 10675.20 utterances.], batch size: 46, lr: 9.07e-03, grad_scale: 8.0 2023-03-08 06:41:42,586 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.549e+02 2.558e+02 3.002e+02 3.965e+02 8.183e+02, threshold=6.004e+02, percent-clipped=7.0 2023-03-08 06:42:02,194 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45835.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:42:41,075 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45859.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:43:00,499 INFO [train2.py:809] (2/4) Epoch 12, batch 2050, loss[ctc_loss=0.07875, att_loss=0.227, loss=0.1974, over 14116.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.04941, over 31.00 utterances.], tot_loss[ctc_loss=0.1004, att_loss=0.2497, loss=0.2198, over 3282925.93 frames. utt_duration=1223 frames, utt_pad_proportion=0.05517, over 10746.90 utterances.], batch size: 31, lr: 9.06e-03, grad_scale: 8.0 2023-03-08 06:43:02,970 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0118, 4.9694, 4.8168, 2.8641, 4.7520, 4.4757, 4.3962, 2.4922], device='cuda:2'), covar=tensor([0.0101, 0.0088, 0.0232, 0.1055, 0.0086, 0.0213, 0.0288, 0.1467], device='cuda:2'), in_proj_covar=tensor([0.0060, 0.0084, 0.0077, 0.0102, 0.0070, 0.0095, 0.0093, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 06:43:07,824 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9696, 3.6590, 3.1416, 3.2103, 3.9636, 3.5813, 2.6170, 4.2599], device='cuda:2'), covar=tensor([0.0986, 0.0464, 0.1046, 0.0706, 0.0584, 0.0697, 0.0970, 0.0426], device='cuda:2'), in_proj_covar=tensor([0.0186, 0.0186, 0.0203, 0.0175, 0.0239, 0.0212, 0.0181, 0.0254], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 06:44:18,294 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45920.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:44:21,024 INFO [train2.py:809] (2/4) Epoch 12, batch 2100, loss[ctc_loss=0.08218, att_loss=0.2295, loss=0.2001, over 11478.00 frames. utt_duration=1838 frames, utt_pad_proportion=0.1726, over 25.00 utterances.], tot_loss[ctc_loss=0.1, att_loss=0.2498, loss=0.2199, over 3279407.73 frames. utt_duration=1229 frames, utt_pad_proportion=0.05501, over 10688.85 utterances.], batch size: 25, lr: 9.06e-03, grad_scale: 8.0 2023-03-08 06:44:22,594 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.410e+02 2.414e+02 3.055e+02 3.692e+02 7.937e+02, threshold=6.110e+02, percent-clipped=4.0 2023-03-08 06:45:41,045 INFO [train2.py:809] (2/4) Epoch 12, batch 2150, loss[ctc_loss=0.05969, att_loss=0.2081, loss=0.1784, over 15629.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.009827, over 37.00 utterances.], tot_loss[ctc_loss=0.09932, att_loss=0.249, loss=0.2191, over 3280948.97 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.04742, over 10421.30 utterances.], batch size: 37, lr: 9.05e-03, grad_scale: 8.0 2023-03-08 06:45:56,989 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4964, 4.9557, 4.7941, 4.8941, 4.9446, 4.6805, 3.4801, 4.7289], device='cuda:2'), covar=tensor([0.0120, 0.0113, 0.0120, 0.0070, 0.0097, 0.0115, 0.0688, 0.0207], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0072, 0.0088, 0.0054, 0.0060, 0.0071, 0.0092, 0.0091], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 06:46:37,287 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46004.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 06:47:04,790 INFO [train2.py:809] (2/4) Epoch 12, batch 2200, loss[ctc_loss=0.1001, att_loss=0.2591, loss=0.2273, over 16465.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.006614, over 46.00 utterances.], tot_loss[ctc_loss=0.09959, att_loss=0.2491, loss=0.2192, over 3280312.32 frames. utt_duration=1242 frames, utt_pad_proportion=0.05226, over 10578.86 utterances.], batch size: 46, lr: 9.05e-03, grad_scale: 8.0 2023-03-08 06:47:06,226 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.436e+02 2.315e+02 2.803e+02 3.534e+02 8.357e+02, threshold=5.606e+02, percent-clipped=3.0 2023-03-08 06:47:39,212 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9692, 3.6840, 3.0239, 3.4759, 3.8627, 3.5038, 2.8240, 4.1589], device='cuda:2'), covar=tensor([0.1048, 0.0469, 0.1188, 0.0608, 0.0677, 0.0726, 0.0906, 0.0513], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0187, 0.0207, 0.0177, 0.0243, 0.0215, 0.0184, 0.0259], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 06:47:53,437 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46052.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 06:48:02,429 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46058.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:48:23,928 INFO [train2.py:809] (2/4) Epoch 12, batch 2250, loss[ctc_loss=0.09287, att_loss=0.2372, loss=0.2084, over 15946.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.006257, over 41.00 utterances.], tot_loss[ctc_loss=0.09878, att_loss=0.2481, loss=0.2183, over 3280837.82 frames. utt_duration=1266 frames, utt_pad_proportion=0.0463, over 10378.56 utterances.], batch size: 41, lr: 9.04e-03, grad_scale: 8.0 2023-03-08 06:49:42,480 INFO [train2.py:809] (2/4) Epoch 12, batch 2300, loss[ctc_loss=0.1357, att_loss=0.2735, loss=0.2459, over 17115.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01435, over 56.00 utterances.], tot_loss[ctc_loss=0.09906, att_loss=0.2485, loss=0.2186, over 3286130.24 frames. 
utt_duration=1249 frames, utt_pad_proportion=0.04938, over 10537.50 utterances.], batch size: 56, lr: 9.04e-03, grad_scale: 8.0 2023-03-08 06:49:44,022 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.648e+02 2.436e+02 3.067e+02 3.650e+02 8.781e+02, threshold=6.133e+02, percent-clipped=4.0 2023-03-08 06:49:55,624 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46130.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:50:35,538 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6083, 2.5456, 5.0751, 3.8799, 2.9802, 4.2171, 4.8987, 4.7016], device='cuda:2'), covar=tensor([0.0214, 0.1734, 0.0151, 0.1167, 0.1875, 0.0245, 0.0107, 0.0219], device='cuda:2'), in_proj_covar=tensor([0.0151, 0.0246, 0.0138, 0.0308, 0.0271, 0.0186, 0.0126, 0.0157], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 06:51:01,256 INFO [train2.py:809] (2/4) Epoch 12, batch 2350, loss[ctc_loss=0.1213, att_loss=0.2661, loss=0.2372, over 17380.00 frames. utt_duration=1009 frames, utt_pad_proportion=0.04809, over 69.00 utterances.], tot_loss[ctc_loss=0.09978, att_loss=0.2489, loss=0.2191, over 3286989.42 frames. utt_duration=1248 frames, utt_pad_proportion=0.04989, over 10548.77 utterances.], batch size: 69, lr: 9.03e-03, grad_scale: 8.0 2023-03-08 06:52:09,335 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46215.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:52:19,909 INFO [train2.py:809] (2/4) Epoch 12, batch 2400, loss[ctc_loss=0.101, att_loss=0.256, loss=0.225, over 16748.00 frames. utt_duration=1397 frames, utt_pad_proportion=0.006782, over 48.00 utterances.], tot_loss[ctc_loss=0.09921, att_loss=0.2487, loss=0.2188, over 3289891.48 frames. utt_duration=1271 frames, utt_pad_proportion=0.04393, over 10366.34 utterances.], batch size: 48, lr: 9.03e-03, grad_scale: 8.0 2023-03-08 06:52:21,344 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.579e+02 2.431e+02 2.918e+02 3.900e+02 7.255e+02, threshold=5.836e+02, percent-clipped=5.0 2023-03-08 06:53:39,241 INFO [train2.py:809] (2/4) Epoch 12, batch 2450, loss[ctc_loss=0.08771, att_loss=0.2239, loss=0.1966, over 14502.00 frames. utt_duration=1814 frames, utt_pad_proportion=0.04199, over 32.00 utterances.], tot_loss[ctc_loss=0.09914, att_loss=0.2484, loss=0.2186, over 3276923.04 frames. utt_duration=1281 frames, utt_pad_proportion=0.04386, over 10243.21 utterances.], batch size: 32, lr: 9.02e-03, grad_scale: 8.0 2023-03-08 06:53:46,248 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=46276.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:54:12,729 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5716, 2.3074, 5.0534, 4.1126, 2.9336, 4.3116, 4.8164, 4.7560], device='cuda:2'), covar=tensor([0.0194, 0.1717, 0.0126, 0.0857, 0.1907, 0.0196, 0.0101, 0.0179], device='cuda:2'), in_proj_covar=tensor([0.0152, 0.0246, 0.0139, 0.0308, 0.0273, 0.0186, 0.0127, 0.0159], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 06:54:58,104 INFO [train2.py:809] (2/4) Epoch 12, batch 2500, loss[ctc_loss=0.1324, att_loss=0.2804, loss=0.2508, over 17498.00 frames. utt_duration=1113 frames, utt_pad_proportion=0.02834, over 63.00 utterances.], tot_loss[ctc_loss=0.09868, att_loss=0.2481, loss=0.2182, over 3278479.52 frames. 
utt_duration=1283 frames, utt_pad_proportion=0.04355, over 10229.63 utterances.], batch size: 63, lr: 9.02e-03, grad_scale: 8.0 2023-03-08 06:55:00,273 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.509e+02 2.157e+02 2.613e+02 3.362e+02 8.530e+02, threshold=5.227e+02, percent-clipped=3.0 2023-03-08 06:55:22,647 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=46337.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:55:56,110 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46358.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:56:18,494 INFO [train2.py:809] (2/4) Epoch 12, batch 2550, loss[ctc_loss=0.1407, att_loss=0.2753, loss=0.2484, over 17106.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.01579, over 56.00 utterances.], tot_loss[ctc_loss=0.0998, att_loss=0.2488, loss=0.219, over 3273298.70 frames. utt_duration=1266 frames, utt_pad_proportion=0.04965, over 10353.47 utterances.], batch size: 56, lr: 9.01e-03, grad_scale: 8.0 2023-03-08 06:56:20,028 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8548, 6.0587, 5.4426, 5.8366, 5.7007, 5.3205, 5.4922, 5.1665], device='cuda:2'), covar=tensor([0.1108, 0.0820, 0.0871, 0.0776, 0.0881, 0.1426, 0.2244, 0.2554], device='cuda:2'), in_proj_covar=tensor([0.0447, 0.0507, 0.0382, 0.0384, 0.0371, 0.0423, 0.0529, 0.0461], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 06:56:25,869 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-03-08 06:57:12,239 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46406.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:57:26,981 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0617, 4.0350, 3.9277, 2.6911, 3.8775, 3.7559, 3.5986, 2.6592], device='cuda:2'), covar=tensor([0.0095, 0.0090, 0.0168, 0.0969, 0.0102, 0.0362, 0.0305, 0.1231], device='cuda:2'), in_proj_covar=tensor([0.0060, 0.0084, 0.0077, 0.0103, 0.0071, 0.0095, 0.0094, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 06:57:38,531 INFO [train2.py:809] (2/4) Epoch 12, batch 2600, loss[ctc_loss=0.07236, att_loss=0.2196, loss=0.1902, over 15887.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009113, over 39.00 utterances.], tot_loss[ctc_loss=0.09966, att_loss=0.2492, loss=0.2193, over 3276737.52 frames. utt_duration=1252 frames, utt_pad_proportion=0.05352, over 10484.48 utterances.], batch size: 39, lr: 9.01e-03, grad_scale: 8.0 2023-03-08 06:57:39,865 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+02 2.587e+02 3.130e+02 3.974e+02 7.202e+02, threshold=6.259e+02, percent-clipped=6.0 2023-03-08 06:57:51,786 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46430.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:58:57,577 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2023-03-08 06:58:58,112 INFO [train2.py:809] (2/4) Epoch 12, batch 2650, loss[ctc_loss=0.06866, att_loss=0.2296, loss=0.1974, over 15884.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.008713, over 39.00 utterances.], tot_loss[ctc_loss=0.1001, att_loss=0.2496, loss=0.2197, over 3279612.56 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.05718, over 10635.42 utterances.], batch size: 39, lr: 9.00e-03, grad_scale: 8.0 2023-03-08 06:59:08,156 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46478.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 06:59:48,713 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-03-08 07:00:07,013 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46515.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:00:11,541 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 07:00:18,193 INFO [train2.py:809] (2/4) Epoch 12, batch 2700, loss[ctc_loss=0.1383, att_loss=0.2749, loss=0.2476, over 17044.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.009405, over 52.00 utterances.], tot_loss[ctc_loss=0.09989, att_loss=0.2493, loss=0.2194, over 3283007.89 frames. utt_duration=1246 frames, utt_pad_proportion=0.05304, over 10549.95 utterances.], batch size: 52, lr: 9.00e-03, grad_scale: 8.0 2023-03-08 07:00:19,663 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.175e+02 2.444e+02 2.933e+02 3.610e+02 6.469e+02, threshold=5.867e+02, percent-clipped=1.0 2023-03-08 07:01:03,128 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3404, 3.5630, 3.3790, 2.9835, 3.6540, 3.5754, 3.3975, 2.6453], device='cuda:2'), covar=tensor([0.1760, 0.1726, 0.3854, 0.7927, 0.2543, 0.4031, 0.1272, 0.7480], device='cuda:2'), in_proj_covar=tensor([0.0108, 0.0126, 0.0140, 0.0209, 0.0109, 0.0194, 0.0116, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 07:01:23,498 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46563.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:01:37,871 INFO [train2.py:809] (2/4) Epoch 12, batch 2750, loss[ctc_loss=0.1066, att_loss=0.256, loss=0.2261, over 17382.00 frames. utt_duration=1009 frames, utt_pad_proportion=0.04784, over 69.00 utterances.], tot_loss[ctc_loss=0.09934, att_loss=0.2493, loss=0.2193, over 3281665.43 frames. utt_duration=1256 frames, utt_pad_proportion=0.05017, over 10460.01 utterances.], batch size: 69, lr: 9.00e-03, grad_scale: 8.0 2023-03-08 07:02:15,219 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=46595.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:02:26,690 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=46602.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:02:57,839 INFO [train2.py:809] (2/4) Epoch 12, batch 2800, loss[ctc_loss=0.09484, att_loss=0.2351, loss=0.207, over 15972.00 frames. utt_duration=1560 frames, utt_pad_proportion=0.005366, over 41.00 utterances.], tot_loss[ctc_loss=0.09955, att_loss=0.2493, loss=0.2194, over 3284113.75 frames. 
utt_duration=1249 frames, utt_pad_proportion=0.05175, over 10533.36 utterances.], batch size: 41, lr: 8.99e-03, grad_scale: 8.0 2023-03-08 07:02:59,238 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.251e+02 2.286e+02 2.805e+02 3.463e+02 9.248e+02, threshold=5.611e+02, percent-clipped=3.0 2023-03-08 07:03:14,311 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46632.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:03:52,985 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=46656.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 07:04:04,685 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=46663.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:04:18,469 INFO [train2.py:809] (2/4) Epoch 12, batch 2850, loss[ctc_loss=0.07265, att_loss=0.2206, loss=0.191, over 15881.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009549, over 39.00 utterances.], tot_loss[ctc_loss=0.09893, att_loss=0.2485, loss=0.2186, over 3278897.77 frames. utt_duration=1263 frames, utt_pad_proportion=0.04936, over 10395.90 utterances.], batch size: 39, lr: 8.99e-03, grad_scale: 8.0 2023-03-08 07:05:39,967 INFO [train2.py:809] (2/4) Epoch 12, batch 2900, loss[ctc_loss=0.1082, att_loss=0.2489, loss=0.2207, over 16871.00 frames. utt_duration=683.3 frames, utt_pad_proportion=0.1426, over 99.00 utterances.], tot_loss[ctc_loss=0.09906, att_loss=0.2487, loss=0.2188, over 3281611.77 frames. utt_duration=1258 frames, utt_pad_proportion=0.04973, over 10443.89 utterances.], batch size: 99, lr: 8.98e-03, grad_scale: 8.0 2023-03-08 07:05:41,465 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.596e+02 2.133e+02 2.612e+02 3.383e+02 7.389e+02, threshold=5.223e+02, percent-clipped=3.0 2023-03-08 07:06:57,205 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6154, 4.8466, 5.1480, 5.0200, 5.0143, 5.5638, 4.9776, 5.6872], device='cuda:2'), covar=tensor([0.0847, 0.0946, 0.0964, 0.1382, 0.2420, 0.0987, 0.0845, 0.0706], device='cuda:2'), in_proj_covar=tensor([0.0740, 0.0434, 0.0513, 0.0572, 0.0753, 0.0519, 0.0416, 0.0502], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 07:07:00,904 INFO [train2.py:809] (2/4) Epoch 12, batch 2950, loss[ctc_loss=0.1192, att_loss=0.2619, loss=0.2334, over 17447.00 frames. utt_duration=1109 frames, utt_pad_proportion=0.02954, over 63.00 utterances.], tot_loss[ctc_loss=0.09932, att_loss=0.2493, loss=0.2193, over 3293479.39 frames. utt_duration=1245 frames, utt_pad_proportion=0.04917, over 10593.17 utterances.], batch size: 63, lr: 8.98e-03, grad_scale: 8.0 2023-03-08 07:08:03,483 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=46810.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:08:08,062 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4697, 2.4234, 3.2868, 4.4927, 3.9501, 4.1866, 2.8895, 2.3005], device='cuda:2'), covar=tensor([0.0702, 0.2613, 0.1096, 0.0543, 0.0628, 0.0335, 0.1463, 0.2378], device='cuda:2'), in_proj_covar=tensor([0.0173, 0.0209, 0.0186, 0.0194, 0.0190, 0.0155, 0.0195, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 07:08:12,201 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.56 vs. 
limit=2.0 2023-03-08 07:08:12,792 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4502, 3.5765, 3.4059, 2.9630, 3.6199, 3.5548, 3.4719, 2.3702], device='cuda:2'), covar=tensor([0.1683, 0.1935, 0.3218, 0.8217, 0.4260, 0.3771, 0.1389, 0.9903], device='cuda:2'), in_proj_covar=tensor([0.0108, 0.0127, 0.0140, 0.0211, 0.0108, 0.0193, 0.0116, 0.0183], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 07:08:23,023 INFO [train2.py:809] (2/4) Epoch 12, batch 3000, loss[ctc_loss=0.09994, att_loss=0.246, loss=0.2168, over 15957.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.006684, over 41.00 utterances.], tot_loss[ctc_loss=0.09887, att_loss=0.2494, loss=0.2193, over 3292469.06 frames. utt_duration=1217 frames, utt_pad_proportion=0.05654, over 10838.79 utterances.], batch size: 41, lr: 8.97e-03, grad_scale: 8.0 2023-03-08 07:08:23,023 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 07:08:36,640 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5987, 3.6457, 3.4674, 2.3027, 3.5157, 3.6029, 3.1213, 2.2645], device='cuda:2'), covar=tensor([0.0128, 0.0119, 0.0246, 0.0980, 0.0122, 0.0212, 0.0370, 0.1312], device='cuda:2'), in_proj_covar=tensor([0.0061, 0.0085, 0.0079, 0.0103, 0.0071, 0.0095, 0.0094, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0002, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 07:08:37,753 INFO [train2.py:843] (2/4) Epoch 12, validation: ctc_loss=0.04782, att_loss=0.2378, loss=0.1998, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 07:08:37,754 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 07:08:39,279 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.604e+02 2.287e+02 2.757e+02 3.651e+02 8.301e+02, threshold=5.514e+02, percent-clipped=2.0 2023-03-08 07:08:58,844 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3484, 2.3608, 3.0951, 2.3352, 3.0216, 3.5016, 3.4136, 2.6891], device='cuda:2'), covar=tensor([0.0544, 0.1747, 0.1127, 0.1454, 0.1049, 0.1260, 0.0695, 0.1360], device='cuda:2'), in_proj_covar=tensor([0.0232, 0.0234, 0.0253, 0.0209, 0.0243, 0.0314, 0.0226, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 07:09:56,646 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=46871.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:09:57,803 INFO [train2.py:809] (2/4) Epoch 12, batch 3050, loss[ctc_loss=0.07174, att_loss=0.2464, loss=0.2115, over 16640.00 frames. utt_duration=1418 frames, utt_pad_proportion=0.004318, over 47.00 utterances.], tot_loss[ctc_loss=0.09864, att_loss=0.249, loss=0.219, over 3287567.34 frames. 
utt_duration=1224 frames, utt_pad_proportion=0.0557, over 10758.66 utterances.], batch size: 47, lr: 8.97e-03, grad_scale: 8.0 2023-03-08 07:10:14,178 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1459, 2.4835, 3.2357, 4.2055, 3.7819, 3.8447, 2.7682, 1.9063], device='cuda:2'), covar=tensor([0.0915, 0.2568, 0.1019, 0.0600, 0.0727, 0.0424, 0.1610, 0.2801], device='cuda:2'), in_proj_covar=tensor([0.0175, 0.0211, 0.0188, 0.0197, 0.0191, 0.0157, 0.0198, 0.0183], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 07:10:33,992 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. limit=2.0 2023-03-08 07:10:44,999 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0500, 5.0585, 4.8887, 2.0170, 1.9586, 2.7108, 2.9575, 3.9221], device='cuda:2'), covar=tensor([0.0605, 0.0200, 0.0205, 0.5172, 0.6060, 0.2748, 0.2375, 0.1554], device='cuda:2'), in_proj_covar=tensor([0.0333, 0.0222, 0.0237, 0.0213, 0.0342, 0.0332, 0.0229, 0.0352], device='cuda:2'), out_proj_covar=tensor([1.4783e-04, 8.2677e-05, 1.0284e-04, 9.4084e-05, 1.4870e-04, 1.3381e-04, 9.0165e-05, 1.4783e-04], device='cuda:2') 2023-03-08 07:11:17,290 INFO [train2.py:809] (2/4) Epoch 12, batch 3100, loss[ctc_loss=0.08533, att_loss=0.2505, loss=0.2175, over 17050.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008279, over 52.00 utterances.], tot_loss[ctc_loss=0.09945, att_loss=0.2494, loss=0.2194, over 3282656.52 frames. utt_duration=1218 frames, utt_pad_proportion=0.05943, over 10791.01 utterances.], batch size: 52, lr: 8.96e-03, grad_scale: 8.0 2023-03-08 07:11:18,791 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.489e+02 2.293e+02 2.705e+02 3.311e+02 9.468e+02, threshold=5.410e+02, percent-clipped=5.0 2023-03-08 07:11:32,980 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46932.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:12:02,263 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46951.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 07:12:13,038 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46958.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:12:25,442 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-03-08 07:12:36,181 INFO [train2.py:809] (2/4) Epoch 12, batch 3150, loss[ctc_loss=0.08005, att_loss=0.2572, loss=0.2218, over 16960.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007991, over 50.00 utterances.], tot_loss[ctc_loss=0.09886, att_loss=0.2488, loss=0.2188, over 3280127.19 frames. utt_duration=1241 frames, utt_pad_proportion=0.05536, over 10588.29 utterances.], batch size: 50, lr: 8.96e-03, grad_scale: 8.0 2023-03-08 07:12:48,208 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46980.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:13:55,728 INFO [train2.py:809] (2/4) Epoch 12, batch 3200, loss[ctc_loss=0.1379, att_loss=0.2706, loss=0.2441, over 17004.00 frames. utt_duration=688.5 frames, utt_pad_proportion=0.1362, over 99.00 utterances.], tot_loss[ctc_loss=0.09779, att_loss=0.2473, loss=0.2174, over 3267324.90 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.05358, over 10396.39 utterances.], batch size: 99, lr: 8.95e-03, grad_scale: 8.0 2023-03-08 07:13:57,255 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.636e+02 2.280e+02 2.660e+02 3.739e+02 9.855e+02, threshold=5.321e+02, percent-clipped=6.0 2023-03-08 07:14:04,000 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0568, 4.5286, 4.1636, 4.5861, 2.5790, 4.4167, 2.4609, 1.7392], device='cuda:2'), covar=tensor([0.0344, 0.0135, 0.0768, 0.0161, 0.1900, 0.0188, 0.1755, 0.1900], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0118, 0.0257, 0.0116, 0.0218, 0.0110, 0.0225, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 07:14:06,361 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.91 vs. limit=5.0 2023-03-08 07:15:09,187 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2732, 4.7340, 4.6079, 4.8439, 2.7935, 4.6488, 2.8092, 2.0262], device='cuda:2'), covar=tensor([0.0284, 0.0155, 0.0599, 0.0131, 0.1643, 0.0166, 0.1455, 0.1690], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0118, 0.0258, 0.0116, 0.0220, 0.0110, 0.0226, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 07:15:16,018 INFO [train2.py:809] (2/4) Epoch 12, batch 3250, loss[ctc_loss=0.1199, att_loss=0.255, loss=0.228, over 16271.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007782, over 43.00 utterances.], tot_loss[ctc_loss=0.09758, att_loss=0.2477, loss=0.2176, over 3274888.20 frames. utt_duration=1273 frames, utt_pad_proportion=0.04732, over 10298.99 utterances.], batch size: 43, lr: 8.95e-03, grad_scale: 8.0 2023-03-08 07:16:03,003 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7176, 2.8892, 3.5332, 3.1943, 3.6064, 4.7580, 4.5279, 3.5560], device='cuda:2'), covar=tensor([0.0343, 0.1661, 0.1359, 0.1208, 0.1088, 0.0740, 0.0580, 0.1101], device='cuda:2'), in_proj_covar=tensor([0.0230, 0.0232, 0.0253, 0.0208, 0.0242, 0.0313, 0.0225, 0.0221], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 07:16:34,791 INFO [train2.py:809] (2/4) Epoch 12, batch 3300, loss[ctc_loss=0.08068, att_loss=0.2324, loss=0.2021, over 11022.00 frames. utt_duration=1839 frames, utt_pad_proportion=0.2034, over 24.00 utterances.], tot_loss[ctc_loss=0.09711, att_loss=0.2477, loss=0.2176, over 3272838.68 frames. utt_duration=1257 frames, utt_pad_proportion=0.05068, over 10430.61 utterances.], batch size: 24, lr: 8.94e-03, grad_scale: 8.0 2023-03-08 07:16:36,329 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.338e+02 2.270e+02 2.752e+02 3.318e+02 7.003e+02, threshold=5.505e+02, percent-clipped=5.0 2023-03-08 07:16:51,086 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.84 vs. limit=2.0 2023-03-08 07:17:43,755 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=47166.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:17:53,902 INFO [train2.py:809] (2/4) Epoch 12, batch 3350, loss[ctc_loss=0.07912, att_loss=0.2439, loss=0.2109, over 16492.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.006037, over 46.00 utterances.], tot_loss[ctc_loss=0.09686, att_loss=0.2481, loss=0.2178, over 3285625.08 frames. 
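On the [optim.py:369] clipping entries: in each of them the reported threshold is about twice the middle of the five grad-norm values, e.g. 2 x 2.660e+02 ≈ 5.321e+02 and 2 x 2.752e+02 ≈ 5.505e+02 above (up to display rounding), which reads as Clipping_scale times a running median of recent gradient norms. The following is only a generic sketch of that style of adaptive clipping, not the optimizer used in this run.

```python
import collections
import statistics
import torch

class MedianGradClipper:
    """Generic sketch: clip gradients at `scale` times the median of recent grad norms."""

    def __init__(self, scale: float = 2.0, window: int = 1000):
        self.scale = scale
        self.history = collections.deque(maxlen=window)
        self.num_batches = 0
        self.num_clipped = 0

    def __call__(self, model: torch.nn.Module) -> float:
        # clip_grad_norm_ with max_norm=inf only measures the total gradient norm.
        total_norm = float(
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float("inf"))
        )
        self.history.append(total_norm)
        self.num_batches += 1
        # Threshold tracks the recent median, as suggested by the logged quartiles.
        threshold = self.scale * statistics.median(self.history)
        if total_norm > threshold:
            self.num_clipped += 1
            for p in model.parameters():
                if p.grad is not None:
                    p.grad.mul_(threshold / total_norm)
        return threshold
```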
utt_duration=1262 frames, utt_pad_proportion=0.0456, over 10423.43 utterances.], batch size: 46, lr: 8.94e-03, grad_scale: 8.0 2023-03-08 07:18:31,490 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2023-03-08 07:19:14,294 INFO [train2.py:809] (2/4) Epoch 12, batch 3400, loss[ctc_loss=0.08698, att_loss=0.2634, loss=0.2282, over 16889.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006538, over 49.00 utterances.], tot_loss[ctc_loss=0.0979, att_loss=0.2487, loss=0.2185, over 3286051.24 frames. utt_duration=1238 frames, utt_pad_proportion=0.05234, over 10631.93 utterances.], batch size: 49, lr: 8.93e-03, grad_scale: 8.0 2023-03-08 07:19:15,802 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.372e+02 2.335e+02 2.724e+02 3.467e+02 7.189e+02, threshold=5.448e+02, percent-clipped=4.0 2023-03-08 07:19:44,373 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47241.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:19:50,512 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6607, 2.9008, 3.5011, 4.4432, 3.9882, 4.0152, 3.0037, 2.2319], device='cuda:2'), covar=tensor([0.0619, 0.2199, 0.0944, 0.0613, 0.0700, 0.0453, 0.1448, 0.2377], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0213, 0.0190, 0.0199, 0.0195, 0.0160, 0.0198, 0.0184], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 07:19:56,613 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4740, 4.9515, 4.8459, 4.8895, 5.0608, 4.6336, 3.4175, 4.8094], device='cuda:2'), covar=tensor([0.0120, 0.0103, 0.0108, 0.0088, 0.0084, 0.0113, 0.0704, 0.0194], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0073, 0.0091, 0.0056, 0.0061, 0.0073, 0.0096, 0.0095], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 07:19:59,823 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=47251.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:20:10,655 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=47258.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:20:33,847 INFO [train2.py:809] (2/4) Epoch 12, batch 3450, loss[ctc_loss=0.08351, att_loss=0.2344, loss=0.2042, over 16017.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.007523, over 40.00 utterances.], tot_loss[ctc_loss=0.09786, att_loss=0.2485, loss=0.2183, over 3282115.32 frames. utt_duration=1231 frames, utt_pad_proportion=0.05652, over 10678.41 utterances.], batch size: 40, lr: 8.93e-03, grad_scale: 8.0 2023-03-08 07:21:15,942 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=47299.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:21:20,715 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47302.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:21:26,639 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=47306.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:21:53,256 INFO [train2.py:809] (2/4) Epoch 12, batch 3500, loss[ctc_loss=0.1052, att_loss=0.2625, loss=0.2311, over 17317.00 frames. utt_duration=1101 frames, utt_pad_proportion=0.0377, over 63.00 utterances.], tot_loss[ctc_loss=0.09775, att_loss=0.2477, loss=0.2177, over 3272815.78 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.05614, over 10540.70 utterances.], batch size: 63, lr: 8.92e-03, grad_scale: 8.0 2023-03-08 07:21:54,808 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.587e+02 2.392e+02 2.957e+02 3.491e+02 6.570e+02, threshold=5.913e+02, percent-clipped=5.0 2023-03-08 07:23:13,813 INFO [train2.py:809] (2/4) Epoch 12, batch 3550, loss[ctc_loss=0.08766, att_loss=0.231, loss=0.2023, over 15897.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.008428, over 39.00 utterances.], tot_loss[ctc_loss=0.09682, att_loss=0.2472, loss=0.2171, over 3276071.97 frames. utt_duration=1251 frames, utt_pad_proportion=0.05183, over 10491.17 utterances.], batch size: 39, lr: 8.92e-03, grad_scale: 8.0 2023-03-08 07:23:25,978 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47379.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:23:46,140 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8604, 5.1346, 5.2132, 5.0640, 5.2324, 5.1429, 4.9060, 4.6278], device='cuda:2'), covar=tensor([0.1061, 0.0565, 0.0204, 0.0417, 0.0254, 0.0270, 0.0335, 0.0347], device='cuda:2'), in_proj_covar=tensor([0.0466, 0.0302, 0.0262, 0.0294, 0.0353, 0.0371, 0.0304, 0.0334], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 07:24:34,835 INFO [train2.py:809] (2/4) Epoch 12, batch 3600, loss[ctc_loss=0.1107, att_loss=0.2597, loss=0.2299, over 16327.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006475, over 45.00 utterances.], tot_loss[ctc_loss=0.09803, att_loss=0.2483, loss=0.2182, over 3271126.14 frames. utt_duration=1220 frames, utt_pad_proportion=0.06244, over 10739.43 utterances.], batch size: 45, lr: 8.92e-03, grad_scale: 8.0 2023-03-08 07:24:36,352 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+02 2.371e+02 2.874e+02 3.552e+02 5.141e+02, threshold=5.747e+02, percent-clipped=0.0 2023-03-08 07:24:51,496 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2492, 5.5627, 5.0911, 5.6416, 5.0356, 5.0827, 5.6788, 5.4556], device='cuda:2'), covar=tensor([0.0389, 0.0216, 0.0683, 0.0200, 0.0340, 0.0246, 0.0219, 0.0143], device='cuda:2'), in_proj_covar=tensor([0.0339, 0.0263, 0.0322, 0.0264, 0.0272, 0.0203, 0.0253, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004], device='cuda:2') 2023-03-08 07:25:04,428 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47440.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:25:46,472 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-03-08 07:25:47,360 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=47466.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:25:56,430 INFO [train2.py:809] (2/4) Epoch 12, batch 3650, loss[ctc_loss=0.1053, att_loss=0.2643, loss=0.2325, over 16787.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005549, over 48.00 utterances.], tot_loss[ctc_loss=0.09683, att_loss=0.2474, loss=0.2173, over 3271231.04 frames. 
utt_duration=1233 frames, utt_pad_proportion=0.05836, over 10624.09 utterances.], batch size: 48, lr: 8.91e-03, grad_scale: 8.0 2023-03-08 07:26:14,267 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7476, 1.9425, 2.4898, 2.0470, 2.8943, 2.6827, 2.4716, 2.6731], device='cuda:2'), covar=tensor([0.1731, 0.5959, 0.4148, 0.2713, 0.1548, 0.1854, 0.3538, 0.1255], device='cuda:2'), in_proj_covar=tensor([0.0080, 0.0086, 0.0091, 0.0074, 0.0078, 0.0070, 0.0091, 0.0062], device='cuda:2'), out_proj_covar=tensor([5.4617e-05, 6.1347e-05, 6.3911e-05, 5.2495e-05, 5.2674e-05, 5.2004e-05, 6.2718e-05, 4.6796e-05], device='cuda:2') 2023-03-08 07:27:03,209 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=47514.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:27:10,012 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2172, 4.5760, 4.6441, 4.7589, 2.8883, 4.4718, 2.8678, 1.8497], device='cuda:2'), covar=tensor([0.0313, 0.0177, 0.0591, 0.0180, 0.1638, 0.0212, 0.1382, 0.1861], device='cuda:2'), in_proj_covar=tensor([0.0147, 0.0119, 0.0257, 0.0117, 0.0221, 0.0110, 0.0225, 0.0199], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 07:27:16,065 INFO [train2.py:809] (2/4) Epoch 12, batch 3700, loss[ctc_loss=0.1137, att_loss=0.2756, loss=0.2432, over 17404.00 frames. utt_duration=1106 frames, utt_pad_proportion=0.03305, over 63.00 utterances.], tot_loss[ctc_loss=0.09697, att_loss=0.2472, loss=0.2171, over 3278357.52 frames. utt_duration=1262 frames, utt_pad_proportion=0.04984, over 10399.42 utterances.], batch size: 63, lr: 8.91e-03, grad_scale: 8.0 2023-03-08 07:27:17,601 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+02 2.382e+02 2.831e+02 3.423e+02 6.375e+02, threshold=5.662e+02, percent-clipped=1.0 2023-03-08 07:28:38,168 INFO [train2.py:809] (2/4) Epoch 12, batch 3750, loss[ctc_loss=0.08633, att_loss=0.2415, loss=0.2105, over 16399.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007737, over 44.00 utterances.], tot_loss[ctc_loss=0.09653, att_loss=0.2466, loss=0.2166, over 3264875.56 frames. utt_duration=1250 frames, utt_pad_proportion=0.05598, over 10458.00 utterances.], batch size: 44, lr: 8.90e-03, grad_scale: 8.0 2023-03-08 07:29:07,755 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.61 vs. limit=2.0 2023-03-08 07:29:17,213 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=47597.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:29:51,409 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=2.10 vs. limit=2.0 2023-03-08 07:29:58,055 INFO [train2.py:809] (2/4) Epoch 12, batch 3800, loss[ctc_loss=0.0959, att_loss=0.225, loss=0.1991, over 15371.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01076, over 35.00 utterances.], tot_loss[ctc_loss=0.09688, att_loss=0.2467, loss=0.2168, over 3263345.89 frames. utt_duration=1232 frames, utt_pad_proportion=0.06014, over 10609.65 utterances.], batch size: 35, lr: 8.90e-03, grad_scale: 8.0 2023-03-08 07:29:59,588 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+02 2.386e+02 2.816e+02 3.593e+02 7.143e+02, threshold=5.632e+02, percent-clipped=4.0 2023-03-08 07:30:43,875 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.29 vs. limit=5.0 2023-03-08 07:31:06,948 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. 
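On the attn_weights_entropy diagnostics interleaved above: they summarize, per attention head, how spread out the attention distributions are (larger values mean flatter attention). Independent of how these particular tensors are aggregated, the underlying quantity is the Shannon entropy of each attention distribution; a generic sketch:

```python
import torch

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """attn_weights: (num_heads, query_len, key_len) with rows summing to 1.
    Returns one value per head: entropy in nats, averaged over query positions."""
    p = attn_weights.clamp_min(eps)
    entropy = -(p * p.log()).sum(dim=-1)   # (num_heads, query_len)
    return entropy.mean(dim=-1)            # (num_heads,)

# Uniform attention over 50 keys gives log(50) ~= 3.9 per head, in the same
# range as the values logged above.
print(attention_entropy(torch.full((8, 10, 50), 1.0 / 50)))
```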
limit=2.0 2023-03-08 07:31:17,528 INFO [train2.py:809] (2/4) Epoch 12, batch 3850, loss[ctc_loss=0.09638, att_loss=0.2329, loss=0.2056, over 15771.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.00873, over 38.00 utterances.], tot_loss[ctc_loss=0.09729, att_loss=0.2468, loss=0.2169, over 3259455.64 frames. utt_duration=1232 frames, utt_pad_proportion=0.06233, over 10595.65 utterances.], batch size: 38, lr: 8.89e-03, grad_scale: 16.0 2023-03-08 07:31:19,206 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7274, 5.9532, 5.3494, 5.6779, 5.6065, 5.0890, 5.2978, 5.0792], device='cuda:2'), covar=tensor([0.1253, 0.0932, 0.0913, 0.0772, 0.0760, 0.1613, 0.2334, 0.2149], device='cuda:2'), in_proj_covar=tensor([0.0443, 0.0517, 0.0387, 0.0386, 0.0373, 0.0425, 0.0532, 0.0468], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 07:32:20,163 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.32 vs. limit=5.0 2023-03-08 07:32:34,729 INFO [train2.py:809] (2/4) Epoch 12, batch 3900, loss[ctc_loss=0.1067, att_loss=0.2586, loss=0.2282, over 17281.00 frames. utt_duration=1173 frames, utt_pad_proportion=0.02256, over 59.00 utterances.], tot_loss[ctc_loss=0.09665, att_loss=0.2469, loss=0.2168, over 3261292.69 frames. utt_duration=1254 frames, utt_pad_proportion=0.05608, over 10412.11 utterances.], batch size: 59, lr: 8.89e-03, grad_scale: 16.0 2023-03-08 07:32:36,205 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.572e+02 2.278e+02 2.685e+02 3.452e+02 6.474e+02, threshold=5.369e+02, percent-clipped=2.0 2023-03-08 07:32:55,460 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=47735.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:33:00,352 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47738.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:33:53,002 INFO [train2.py:809] (2/4) Epoch 12, batch 3950, loss[ctc_loss=0.1212, att_loss=0.2665, loss=0.2374, over 17384.00 frames. utt_duration=881.7 frames, utt_pad_proportion=0.0768, over 79.00 utterances.], tot_loss[ctc_loss=0.09823, att_loss=0.2481, loss=0.2181, over 3249834.01 frames. utt_duration=1202 frames, utt_pad_proportion=0.06975, over 10827.73 utterances.], batch size: 79, lr: 8.88e-03, grad_scale: 8.0 2023-03-08 07:33:53,241 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5209, 4.7632, 4.7055, 4.6945, 4.8437, 4.7915, 4.4918, 4.3060], device='cuda:2'), covar=tensor([0.1042, 0.0602, 0.0351, 0.0503, 0.0317, 0.0339, 0.0381, 0.0396], device='cuda:2'), in_proj_covar=tensor([0.0463, 0.0299, 0.0262, 0.0291, 0.0351, 0.0369, 0.0301, 0.0334], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 07:34:17,728 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47788.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:34:35,152 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47799.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:35:12,427 INFO [train2.py:809] (2/4) Epoch 13, batch 0, loss[ctc_loss=0.1208, att_loss=0.2463, loss=0.2212, over 15966.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005755, over 41.00 utterances.], tot_loss[ctc_loss=0.1208, att_loss=0.2463, loss=0.2212, over 15966.00 frames. 
utt_duration=1559 frames, utt_pad_proportion=0.005755, over 41.00 utterances.], batch size: 41, lr: 8.53e-03, grad_scale: 8.0 2023-03-08 07:35:12,427 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 07:35:24,621 INFO [train2.py:843] (2/4) Epoch 13, validation: ctc_loss=0.04799, att_loss=0.2379, loss=0.2, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 07:35:24,622 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 07:35:29,660 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47809.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 07:35:54,074 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.302e+02 2.314e+02 2.717e+02 3.534e+02 1.025e+03, threshold=5.435e+02, percent-clipped=1.0 2023-03-08 07:36:08,427 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.55 vs. limit=2.0 2023-03-08 07:36:15,496 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4046, 2.0293, 2.0048, 2.2413, 3.0460, 2.2871, 2.0190, 2.8435], device='cuda:2'), covar=tensor([0.1764, 0.4090, 0.3596, 0.1533, 0.1188, 0.1890, 0.3119, 0.0811], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0086, 0.0091, 0.0074, 0.0079, 0.0072, 0.0091, 0.0061], device='cuda:2'), out_proj_covar=tensor([5.4412e-05, 6.1716e-05, 6.4223e-05, 5.2685e-05, 5.3681e-05, 5.2867e-05, 6.3350e-05, 4.6330e-05], device='cuda:2') 2023-03-08 07:36:34,382 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47849.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:36:45,287 INFO [train2.py:809] (2/4) Epoch 13, batch 50, loss[ctc_loss=0.08535, att_loss=0.2295, loss=0.2007, over 15948.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007446, over 41.00 utterances.], tot_loss[ctc_loss=0.0988, att_loss=0.2506, loss=0.2203, over 746377.26 frames. utt_duration=1135 frames, utt_pad_proportion=0.0715, over 2634.56 utterances.], batch size: 41, lr: 8.53e-03, grad_scale: 8.0 2023-03-08 07:36:57,018 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6512, 5.8685, 5.3184, 5.6558, 5.5138, 5.1447, 5.2998, 5.1152], device='cuda:2'), covar=tensor([0.1200, 0.0939, 0.0898, 0.0745, 0.0793, 0.1530, 0.2357, 0.2314], device='cuda:2'), in_proj_covar=tensor([0.0448, 0.0523, 0.0389, 0.0386, 0.0374, 0.0429, 0.0536, 0.0472], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 07:37:08,578 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47870.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 07:37:32,453 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1907, 3.9042, 3.1404, 3.5699, 4.0761, 3.6968, 3.0670, 4.4595], device='cuda:2'), covar=tensor([0.0937, 0.0579, 0.1124, 0.0630, 0.0646, 0.0676, 0.0788, 0.0422], device='cuda:2'), in_proj_covar=tensor([0.0186, 0.0188, 0.0205, 0.0177, 0.0242, 0.0215, 0.0184, 0.0259], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 07:37:51,326 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=47897.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:38:05,797 INFO [train2.py:809] (2/4) Epoch 13, batch 100, loss[ctc_loss=0.07604, att_loss=0.2218, loss=0.1927, over 15996.00 frames. 
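The "Maximum memory allocated so far is 16150MB" lines report the peak CUDA allocation seen by this rank and appear right after the validation entries. The same kind of figure can be read back from PyTorch's allocator statistics (the device index below is assumed to match the cuda:2 tensors in this log):

```python
import torch

# Peak memory ever allocated for tensors on this process's GPU, reported in MB.
peak_mb = torch.cuda.max_memory_allocated(torch.device("cuda:2")) // (1024 * 1024)
print(f"Maximum memory allocated so far is {peak_mb}MB")
```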
utt_duration=1601 frames, utt_pad_proportion=0.006021, over 40.00 utterances.], tot_loss[ctc_loss=0.09779, att_loss=0.25, loss=0.2196, over 1300513.90 frames. utt_duration=1203 frames, utt_pad_proportion=0.06507, over 4331.19 utterances.], batch size: 40, lr: 8.52e-03, grad_scale: 8.0 2023-03-08 07:38:35,372 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.519e+02 2.449e+02 2.869e+02 3.545e+02 8.286e+02, threshold=5.739e+02, percent-clipped=3.0 2023-03-08 07:39:08,787 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=47945.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:39:26,430 INFO [train2.py:809] (2/4) Epoch 13, batch 150, loss[ctc_loss=0.1192, att_loss=0.266, loss=0.2367, over 16931.00 frames. utt_duration=685.6 frames, utt_pad_proportion=0.1409, over 99.00 utterances.], tot_loss[ctc_loss=0.09714, att_loss=0.2477, loss=0.2176, over 1725989.87 frames. utt_duration=1250 frames, utt_pad_proportion=0.05755, over 5528.76 utterances.], batch size: 99, lr: 8.52e-03, grad_scale: 8.0 2023-03-08 07:40:28,052 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6399, 4.6498, 4.6093, 4.5576, 5.1279, 4.6108, 4.5740, 2.1720], device='cuda:2'), covar=tensor([0.0225, 0.0225, 0.0207, 0.0195, 0.0957, 0.0242, 0.0254, 0.2231], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0136, 0.0138, 0.0150, 0.0335, 0.0123, 0.0125, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 07:40:51,752 INFO [train2.py:809] (2/4) Epoch 13, batch 200, loss[ctc_loss=0.1106, att_loss=0.2636, loss=0.233, over 17233.00 frames. utt_duration=1170 frames, utt_pad_proportion=0.02858, over 59.00 utterances.], tot_loss[ctc_loss=0.09744, att_loss=0.2488, loss=0.2186, over 2082918.17 frames. utt_duration=1278 frames, utt_pad_proportion=0.04439, over 6524.84 utterances.], batch size: 59, lr: 8.52e-03, grad_scale: 8.0 2023-03-08 07:40:52,660 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.14 vs. limit=2.0 2023-03-08 07:41:07,835 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. 
limit=2.0 2023-03-08 07:41:20,953 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.382e+02 2.209e+02 2.594e+02 3.149e+02 6.379e+02, threshold=5.187e+02, percent-clipped=1.0 2023-03-08 07:41:38,784 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48035.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:41:38,947 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6037, 2.3623, 5.1273, 3.9779, 3.0226, 4.4462, 4.9685, 4.6101], device='cuda:2'), covar=tensor([0.0226, 0.1732, 0.0144, 0.0957, 0.1850, 0.0194, 0.0086, 0.0245], device='cuda:2'), in_proj_covar=tensor([0.0150, 0.0241, 0.0138, 0.0305, 0.0265, 0.0182, 0.0125, 0.0157], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 07:42:10,168 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2804, 5.2243, 4.9231, 3.1257, 5.0192, 4.9438, 4.5555, 2.6034], device='cuda:2'), covar=tensor([0.0128, 0.0087, 0.0321, 0.1219, 0.0109, 0.0164, 0.0361, 0.2008], device='cuda:2'), in_proj_covar=tensor([0.0063, 0.0087, 0.0082, 0.0106, 0.0073, 0.0095, 0.0095, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 07:42:12,098 INFO [train2.py:809] (2/4) Epoch 13, batch 250, loss[ctc_loss=0.08465, att_loss=0.2309, loss=0.2017, over 15497.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009041, over 36.00 utterances.], tot_loss[ctc_loss=0.09654, att_loss=0.2475, loss=0.2173, over 2337709.32 frames. utt_duration=1289 frames, utt_pad_proportion=0.04451, over 7260.18 utterances.], batch size: 36, lr: 8.51e-03, grad_scale: 8.0 2023-03-08 07:42:38,645 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48072.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:42:55,996 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48083.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:43:12,983 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48094.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:43:31,447 INFO [train2.py:809] (2/4) Epoch 13, batch 300, loss[ctc_loss=0.1061, att_loss=0.2629, loss=0.2315, over 16763.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006792, over 48.00 utterances.], tot_loss[ctc_loss=0.09568, att_loss=0.2467, loss=0.2165, over 2545786.82 frames. utt_duration=1250 frames, utt_pad_proportion=0.05311, over 8157.36 utterances.], batch size: 48, lr: 8.51e-03, grad_scale: 8.0 2023-03-08 07:44:01,029 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.586e+02 2.171e+02 2.716e+02 3.489e+02 9.070e+02, threshold=5.431e+02, percent-clipped=6.0 2023-03-08 07:44:16,013 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48133.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:44:32,581 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48144.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:44:51,565 INFO [train2.py:809] (2/4) Epoch 13, batch 350, loss[ctc_loss=0.1735, att_loss=0.2937, loss=0.2696, over 14001.00 frames. utt_duration=385 frames, utt_pad_proportion=0.3293, over 146.00 utterances.], tot_loss[ctc_loss=0.0977, att_loss=0.2485, loss=0.2184, over 2705286.45 frames. 
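The tot_loss[...] statistics above carry fractional frame and utterance counts (e.g. 2705286.45 frames over 8994.19 utterances), which indicates a decayed running aggregate over recent batches rather than an exact sum. A generic sketch of such a tracker follows; the decay value is purely illustrative and not taken from this run.

```python
# Illustrative running aggregate: each update decays the accumulated statistics
# before adding the current batch, which is why the logged counts are fractional.
class RunningLossStats:
    def __init__(self, decay: float = 0.99):  # decay chosen for illustration only
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0
        self.utterances = 0.0

    def update(self, batch_loss_sum: float, batch_frames: int, batch_utterances: int) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
        self.frames = self.decay * self.frames + batch_frames
        self.utterances = self.decay * self.utterances + batch_utterances

    @property
    def loss_per_frame(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```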
utt_duration=1205 frames, utt_pad_proportion=0.06383, over 8994.19 utterances.], batch size: 146, lr: 8.50e-03, grad_scale: 8.0 2023-03-08 07:45:04,059 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2023-03-08 07:45:06,297 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48165.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 07:45:35,907 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8501, 6.0537, 5.5007, 5.8321, 5.6876, 5.3059, 5.5220, 5.2896], device='cuda:2'), covar=tensor([0.1064, 0.0855, 0.0749, 0.0708, 0.0779, 0.1551, 0.2184, 0.2300], device='cuda:2'), in_proj_covar=tensor([0.0450, 0.0521, 0.0388, 0.0384, 0.0373, 0.0426, 0.0534, 0.0466], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 07:46:10,724 INFO [train2.py:809] (2/4) Epoch 13, batch 400, loss[ctc_loss=0.08422, att_loss=0.2444, loss=0.2124, over 16879.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007767, over 49.00 utterances.], tot_loss[ctc_loss=0.09664, att_loss=0.2482, loss=0.2179, over 2832508.04 frames. utt_duration=1212 frames, utt_pad_proportion=0.06255, over 9361.14 utterances.], batch size: 49, lr: 8.50e-03, grad_scale: 8.0 2023-03-08 07:46:21,641 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48212.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 07:46:40,272 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.388e+02 2.275e+02 2.732e+02 3.494e+02 6.607e+02, threshold=5.464e+02, percent-clipped=2.0 2023-03-08 07:46:48,659 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1861, 5.1276, 5.0432, 2.4840, 1.9981, 2.5902, 2.7142, 3.9171], device='cuda:2'), covar=tensor([0.0557, 0.0197, 0.0199, 0.3748, 0.5805, 0.2786, 0.2500, 0.1545], device='cuda:2'), in_proj_covar=tensor([0.0339, 0.0230, 0.0245, 0.0218, 0.0348, 0.0338, 0.0236, 0.0360], device='cuda:2'), out_proj_covar=tensor([1.4987e-04, 8.5479e-05, 1.0611e-04, 9.5292e-05, 1.5042e-04, 1.3589e-04, 9.3383e-05, 1.5023e-04], device='cuda:2') 2023-03-08 07:47:03,914 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-08 07:47:29,597 INFO [train2.py:809] (2/4) Epoch 13, batch 450, loss[ctc_loss=0.06977, att_loss=0.2211, loss=0.1908, over 15878.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009689, over 39.00 utterances.], tot_loss[ctc_loss=0.09596, att_loss=0.2476, loss=0.2172, over 2932240.72 frames. utt_duration=1233 frames, utt_pad_proportion=0.0588, over 9527.44 utterances.], batch size: 39, lr: 8.49e-03, grad_scale: 8.0 2023-03-08 07:47:56,910 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48273.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 07:48:48,218 INFO [train2.py:809] (2/4) Epoch 13, batch 500, loss[ctc_loss=0.1076, att_loss=0.2542, loss=0.2248, over 15952.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.00712, over 41.00 utterances.], tot_loss[ctc_loss=0.09636, att_loss=0.2479, loss=0.2176, over 3000240.47 frames. utt_duration=1220 frames, utt_pad_proportion=0.06468, over 9851.89 utterances.], batch size: 41, lr: 8.49e-03, grad_scale: 8.0 2023-03-08 07:49:10,183 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. 
limit=2.0 2023-03-08 07:49:16,721 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+02 2.184e+02 2.674e+02 3.467e+02 7.635e+02, threshold=5.348e+02, percent-clipped=2.0 2023-03-08 07:49:17,121 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48324.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:50:07,389 INFO [train2.py:809] (2/4) Epoch 13, batch 550, loss[ctc_loss=0.08951, att_loss=0.2502, loss=0.2181, over 16625.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005379, over 47.00 utterances.], tot_loss[ctc_loss=0.09629, att_loss=0.2478, loss=0.2175, over 3066976.08 frames. utt_duration=1249 frames, utt_pad_proportion=0.05531, over 9836.17 utterances.], batch size: 47, lr: 8.49e-03, grad_scale: 8.0 2023-03-08 07:50:21,590 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8567, 3.7560, 3.6662, 3.2043, 3.6921, 3.7127, 3.4024, 2.7918], device='cuda:2'), covar=tensor([0.1235, 0.1419, 0.2739, 0.5381, 0.3505, 0.4439, 0.2103, 0.6572], device='cuda:2'), in_proj_covar=tensor([0.0112, 0.0132, 0.0139, 0.0212, 0.0111, 0.0198, 0.0118, 0.0183], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 07:50:39,246 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. limit=2.0 2023-03-08 07:50:49,106 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. limit=2.0 2023-03-08 07:50:54,959 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48385.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:51:08,887 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48394.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:51:13,516 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48397.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:51:27,704 INFO [train2.py:809] (2/4) Epoch 13, batch 600, loss[ctc_loss=0.08193, att_loss=0.2367, loss=0.2057, over 16180.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006397, over 41.00 utterances.], tot_loss[ctc_loss=0.0955, att_loss=0.2469, loss=0.2167, over 3115927.49 frames. utt_duration=1258 frames, utt_pad_proportion=0.05196, over 9922.01 utterances.], batch size: 41, lr: 8.48e-03, grad_scale: 8.0 2023-03-08 07:51:42,627 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.20 vs. limit=5.0 2023-03-08 07:51:56,285 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48423.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:51:57,442 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.401e+02 2.243e+02 2.721e+02 3.200e+02 7.268e+02, threshold=5.442e+02, percent-clipped=2.0 2023-03-08 07:52:04,399 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48428.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:52:25,665 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48442.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:52:28,967 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48444.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:52:47,829 INFO [train2.py:809] (2/4) Epoch 13, batch 650, loss[ctc_loss=0.1169, att_loss=0.2703, loss=0.2396, over 16781.00 frames. 
utt_duration=1400 frames, utt_pad_proportion=0.005005, over 48.00 utterances.], tot_loss[ctc_loss=0.09541, att_loss=0.2471, loss=0.2168, over 3153192.40 frames. utt_duration=1224 frames, utt_pad_proportion=0.05912, over 10313.08 utterances.], batch size: 48, lr: 8.48e-03, grad_scale: 8.0 2023-03-08 07:52:51,853 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48458.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:53:03,150 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48465.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 07:53:04,681 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0055, 3.7737, 3.1272, 3.3693, 3.9958, 3.6663, 2.7406, 4.3611], device='cuda:2'), covar=tensor([0.0986, 0.0526, 0.1084, 0.0719, 0.0723, 0.0611, 0.0968, 0.0494], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0190, 0.0210, 0.0178, 0.0245, 0.0215, 0.0186, 0.0260], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 07:53:33,519 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48484.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:53:45,420 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48492.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:54:07,434 INFO [train2.py:809] (2/4) Epoch 13, batch 700, loss[ctc_loss=0.09986, att_loss=0.2428, loss=0.2142, over 16334.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005943, over 45.00 utterances.], tot_loss[ctc_loss=0.09466, att_loss=0.2474, loss=0.2168, over 3185030.12 frames. utt_duration=1234 frames, utt_pad_proportion=0.05521, over 10340.12 utterances.], batch size: 45, lr: 8.47e-03, grad_scale: 8.0 2023-03-08 07:54:19,502 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48513.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 07:54:33,010 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.58 vs. limit=2.0 2023-03-08 07:54:36,144 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.318e+02 2.306e+02 2.831e+02 3.598e+02 7.957e+02, threshold=5.663e+02, percent-clipped=6.0 2023-03-08 07:55:26,624 INFO [train2.py:809] (2/4) Epoch 13, batch 750, loss[ctc_loss=0.08169, att_loss=0.2453, loss=0.2125, over 16779.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005933, over 48.00 utterances.], tot_loss[ctc_loss=0.09451, att_loss=0.2465, loss=0.2161, over 3195557.34 frames. utt_duration=1241 frames, utt_pad_proportion=0.05594, over 10316.39 utterances.], batch size: 48, lr: 8.47e-03, grad_scale: 8.0 2023-03-08 07:55:46,618 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48568.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 07:56:47,921 INFO [train2.py:809] (2/4) Epoch 13, batch 800, loss[ctc_loss=0.09976, att_loss=0.2526, loss=0.2221, over 17352.00 frames. utt_duration=1103 frames, utt_pad_proportion=0.03585, over 63.00 utterances.], tot_loss[ctc_loss=0.09601, att_loss=0.2478, loss=0.2174, over 3220207.33 frames. 
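The [zipformer.py:625] lines record how many encoder layers are randomly bypassed for the current batch (mostly num_to_drop=0 at this stage, occasionally 1). The snippet below only illustrates sampling such a set of layers; the actual drop probability, and how it depends on warmup_begin/warmup_end and batch_count, is not reproduced here.

```python
import random

def sample_layers_to_drop(num_layers: int, drop_prob: float) -> set:
    """Pick a (possibly empty) set of layer indices to skip for this batch.
    drop_prob is a placeholder; the real schedule depends on the warmup state."""
    return {i for i in range(num_layers) if random.random() < drop_prob}

layers_to_drop = sample_layers_to_drop(num_layers=4, drop_prob=0.05)
print(f"num_to_drop={len(layers_to_drop)}, layers_to_drop={layers_to_drop or set()}")
```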
utt_duration=1216 frames, utt_pad_proportion=0.06005, over 10603.14 utterances.], batch size: 63, lr: 8.46e-03, grad_scale: 8.0 2023-03-08 07:56:57,668 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8465, 4.9024, 4.7901, 2.6869, 4.6002, 4.4943, 4.0878, 2.6821], device='cuda:2'), covar=tensor([0.0085, 0.0084, 0.0214, 0.1104, 0.0110, 0.0167, 0.0317, 0.1334], device='cuda:2'), in_proj_covar=tensor([0.0063, 0.0088, 0.0082, 0.0105, 0.0074, 0.0094, 0.0094, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 07:56:59,492 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5215, 2.3960, 5.1652, 3.9854, 2.9788, 4.3731, 4.9501, 4.6235], device='cuda:2'), covar=tensor([0.0232, 0.1801, 0.0118, 0.0969, 0.1838, 0.0223, 0.0087, 0.0216], device='cuda:2'), in_proj_covar=tensor([0.0150, 0.0239, 0.0140, 0.0301, 0.0265, 0.0182, 0.0126, 0.0157], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 07:57:16,744 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.553e+02 2.319e+02 2.822e+02 3.507e+02 7.468e+02, threshold=5.644e+02, percent-clipped=2.0 2023-03-08 07:57:31,139 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6958, 4.5139, 4.6212, 4.6262, 5.1406, 4.7522, 4.7204, 2.2889], device='cuda:2'), covar=tensor([0.0188, 0.0315, 0.0240, 0.0226, 0.1003, 0.0173, 0.0239, 0.1980], device='cuda:2'), in_proj_covar=tensor([0.0125, 0.0137, 0.0141, 0.0153, 0.0337, 0.0123, 0.0125, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 07:57:43,767 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6283, 2.7324, 5.2171, 4.0457, 3.0819, 4.4589, 5.0398, 4.5707], device='cuda:2'), covar=tensor([0.0230, 0.1580, 0.0147, 0.0927, 0.1681, 0.0212, 0.0089, 0.0239], device='cuda:2'), in_proj_covar=tensor([0.0149, 0.0238, 0.0140, 0.0301, 0.0264, 0.0182, 0.0125, 0.0157], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 07:58:07,623 INFO [train2.py:809] (2/4) Epoch 13, batch 850, loss[ctc_loss=0.08387, att_loss=0.2439, loss=0.2119, over 17053.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.007311, over 52.00 utterances.], tot_loss[ctc_loss=0.0951, att_loss=0.2476, loss=0.2171, over 3239413.24 frames. utt_duration=1242 frames, utt_pad_proportion=0.05331, over 10446.16 utterances.], batch size: 52, lr: 8.46e-03, grad_scale: 8.0 2023-03-08 07:58:30,294 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9975, 5.0709, 4.9211, 2.2088, 1.9454, 2.5404, 2.8975, 3.7647], device='cuda:2'), covar=tensor([0.0731, 0.0205, 0.0230, 0.4519, 0.6022, 0.3017, 0.2381, 0.1830], device='cuda:2'), in_proj_covar=tensor([0.0337, 0.0230, 0.0242, 0.0215, 0.0346, 0.0336, 0.0233, 0.0357], device='cuda:2'), out_proj_covar=tensor([1.4950e-04, 8.5714e-05, 1.0496e-04, 9.4360e-05, 1.4908e-04, 1.3485e-04, 9.2100e-05, 1.4903e-04], device='cuda:2') 2023-03-08 07:58:46,036 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48680.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 07:59:28,737 INFO [train2.py:809] (2/4) Epoch 13, batch 900, loss[ctc_loss=0.08218, att_loss=0.2497, loss=0.2162, over 16698.00 frames. 
utt_duration=1453 frames, utt_pad_proportion=0.005903, over 46.00 utterances.], tot_loss[ctc_loss=0.09469, att_loss=0.2474, loss=0.2169, over 3251696.40 frames. utt_duration=1252 frames, utt_pad_proportion=0.05036, over 10398.45 utterances.], batch size: 46, lr: 8.45e-03, grad_scale: 8.0 2023-03-08 07:59:56,781 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.412e+02 2.153e+02 2.503e+02 3.048e+02 6.271e+02, threshold=5.007e+02, percent-clipped=1.0 2023-03-08 08:00:04,009 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48728.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:00:43,564 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48753.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:00:49,765 INFO [train2.py:809] (2/4) Epoch 13, batch 950, loss[ctc_loss=0.07605, att_loss=0.2266, loss=0.1965, over 15784.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007898, over 38.00 utterances.], tot_loss[ctc_loss=0.09382, att_loss=0.2461, loss=0.2156, over 3254273.81 frames. utt_duration=1271 frames, utt_pad_proportion=0.04749, over 10251.79 utterances.], batch size: 38, lr: 8.45e-03, grad_scale: 8.0 2023-03-08 08:00:54,715 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9562, 5.2940, 5.5417, 5.3929, 5.4660, 5.9220, 5.2551, 6.0497], device='cuda:2'), covar=tensor([0.0663, 0.0570, 0.0636, 0.1077, 0.1540, 0.0787, 0.0567, 0.0592], device='cuda:2'), in_proj_covar=tensor([0.0741, 0.0437, 0.0515, 0.0574, 0.0762, 0.0521, 0.0415, 0.0513], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 08:01:21,802 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48776.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:01:26,505 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48779.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:02:10,409 INFO [train2.py:809] (2/4) Epoch 13, batch 1000, loss[ctc_loss=0.08599, att_loss=0.2121, loss=0.1869, over 11862.00 frames. utt_duration=1826 frames, utt_pad_proportion=0.1725, over 26.00 utterances.], tot_loss[ctc_loss=0.09339, att_loss=0.2455, loss=0.2151, over 3257670.30 frames. 
utt_duration=1283 frames, utt_pad_proportion=0.04371, over 10168.82 utterances.], batch size: 26, lr: 8.45e-03, grad_scale: 8.0 2023-03-08 08:02:33,386 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48821.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:02:37,582 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.313e+02 2.228e+02 2.748e+02 3.245e+02 7.781e+02, threshold=5.496e+02, percent-clipped=5.0 2023-03-08 08:02:41,594 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8970, 5.1444, 5.1386, 5.0836, 5.2139, 5.1556, 4.9089, 4.6427], device='cuda:2'), covar=tensor([0.1094, 0.0601, 0.0256, 0.0460, 0.0280, 0.0331, 0.0307, 0.0383], device='cuda:2'), in_proj_covar=tensor([0.0477, 0.0313, 0.0270, 0.0303, 0.0363, 0.0385, 0.0306, 0.0346], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 08:03:00,454 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48838.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:03:04,751 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7491, 5.9502, 5.3617, 5.7745, 5.6024, 5.2407, 5.3146, 5.1779], device='cuda:2'), covar=tensor([0.1144, 0.0949, 0.0863, 0.0682, 0.0806, 0.1489, 0.2381, 0.2240], device='cuda:2'), in_proj_covar=tensor([0.0455, 0.0528, 0.0392, 0.0392, 0.0375, 0.0431, 0.0544, 0.0479], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 08:03:30,178 INFO [train2.py:809] (2/4) Epoch 13, batch 1050, loss[ctc_loss=0.1189, att_loss=0.2542, loss=0.2271, over 16192.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.006009, over 41.00 utterances.], tot_loss[ctc_loss=0.09392, att_loss=0.2462, loss=0.2157, over 3263129.72 frames. utt_duration=1263 frames, utt_pad_proportion=0.04809, over 10345.99 utterances.], batch size: 41, lr: 8.44e-03, grad_scale: 8.0 2023-03-08 08:03:49,032 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48868.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 08:04:11,855 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48882.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:04:39,691 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48899.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:04:50,177 INFO [train2.py:809] (2/4) Epoch 13, batch 1100, loss[ctc_loss=0.1139, att_loss=0.2392, loss=0.2142, over 15390.00 frames. utt_duration=1760 frames, utt_pad_proportion=0.009963, over 35.00 utterances.], tot_loss[ctc_loss=0.09555, att_loss=0.2467, loss=0.2165, over 3254352.11 frames. utt_duration=1227 frames, utt_pad_proportion=0.06069, over 10619.78 utterances.], batch size: 35, lr: 8.44e-03, grad_scale: 8.0 2023-03-08 08:05:05,420 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48916.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 08:05:18,174 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.628e+02 2.462e+02 3.131e+02 3.917e+02 9.784e+02, threshold=6.261e+02, percent-clipped=6.0 2023-03-08 08:05:49,300 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.92 vs. limit=5.0 2023-03-08 08:06:09,390 INFO [train2.py:809] (2/4) Epoch 13, batch 1150, loss[ctc_loss=0.06908, att_loss=0.223, loss=0.1922, over 15862.00 frames. 
utt_duration=1628 frames, utt_pad_proportion=0.01067, over 39.00 utterances.], tot_loss[ctc_loss=0.09639, att_loss=0.2472, loss=0.2171, over 3254435.06 frames. utt_duration=1186 frames, utt_pad_proportion=0.07232, over 10993.64 utterances.], batch size: 39, lr: 8.43e-03, grad_scale: 8.0 2023-03-08 08:06:12,815 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9525, 3.7796, 3.2569, 3.5613, 4.0655, 3.7431, 2.8858, 4.3883], device='cuda:2'), covar=tensor([0.1050, 0.0393, 0.0956, 0.0620, 0.0601, 0.0619, 0.0927, 0.0414], device='cuda:2'), in_proj_covar=tensor([0.0187, 0.0189, 0.0206, 0.0178, 0.0242, 0.0217, 0.0185, 0.0257], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 08:06:47,200 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48980.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:07:02,561 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-03-08 08:07:04,926 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7665, 1.9464, 2.0803, 2.4585, 3.0659, 2.0736, 1.9018, 2.9509], device='cuda:2'), covar=tensor([0.1954, 0.4993, 0.4010, 0.1794, 0.2039, 0.2113, 0.3488, 0.0959], device='cuda:2'), in_proj_covar=tensor([0.0082, 0.0090, 0.0093, 0.0077, 0.0081, 0.0073, 0.0093, 0.0064], device='cuda:2'), out_proj_covar=tensor([5.6400e-05, 6.4552e-05, 6.6521e-05, 5.4814e-05, 5.5442e-05, 5.4505e-05, 6.4899e-05, 4.8614e-05], device='cuda:2') 2023-03-08 08:07:28,988 INFO [train2.py:809] (2/4) Epoch 13, batch 1200, loss[ctc_loss=0.1159, att_loss=0.273, loss=0.2416, over 17298.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01215, over 55.00 utterances.], tot_loss[ctc_loss=0.0959, att_loss=0.2467, loss=0.2166, over 3259330.64 frames. utt_duration=1196 frames, utt_pad_proportion=0.06884, over 10914.69 utterances.], batch size: 55, lr: 8.43e-03, grad_scale: 8.0 2023-03-08 08:07:57,672 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.513e+02 2.494e+02 2.872e+02 3.566e+02 1.058e+03, threshold=5.743e+02, percent-clipped=4.0 2023-03-08 08:07:57,932 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0213, 5.2732, 5.5351, 5.3361, 5.3929, 5.9516, 5.1558, 6.0965], device='cuda:2'), covar=tensor([0.0649, 0.0610, 0.0726, 0.1223, 0.1857, 0.0852, 0.0639, 0.0571], device='cuda:2'), in_proj_covar=tensor([0.0751, 0.0440, 0.0524, 0.0580, 0.0774, 0.0528, 0.0422, 0.0517], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 08:08:04,015 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49028.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:08:10,820 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.58 vs. limit=5.0 2023-03-08 08:08:44,298 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49053.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:08:48,630 INFO [train2.py:809] (2/4) Epoch 13, batch 1250, loss[ctc_loss=0.0888, att_loss=0.2386, loss=0.2086, over 16410.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006532, over 44.00 utterances.], tot_loss[ctc_loss=0.09591, att_loss=0.2472, loss=0.217, over 3272750.46 frames. 
utt_duration=1218 frames, utt_pad_proportion=0.05973, over 10759.31 utterances.], batch size: 44, lr: 8.42e-03, grad_scale: 8.0 2023-03-08 08:09:06,929 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. limit=2.0 2023-03-08 08:09:25,528 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49079.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:10:01,131 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49101.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:10:08,581 INFO [train2.py:809] (2/4) Epoch 13, batch 1300, loss[ctc_loss=0.09101, att_loss=0.2543, loss=0.2216, over 16792.00 frames. utt_duration=679.9 frames, utt_pad_proportion=0.1448, over 99.00 utterances.], tot_loss[ctc_loss=0.09507, att_loss=0.2467, loss=0.2164, over 3264591.67 frames. utt_duration=1210 frames, utt_pad_proportion=0.06511, over 10807.65 utterances.], batch size: 99, lr: 8.42e-03, grad_scale: 8.0 2023-03-08 08:10:08,823 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9269, 5.2228, 5.4845, 5.3733, 5.3337, 5.8707, 5.1470, 5.9891], device='cuda:2'), covar=tensor([0.0749, 0.0658, 0.0686, 0.1171, 0.1889, 0.0937, 0.0680, 0.0675], device='cuda:2'), in_proj_covar=tensor([0.0746, 0.0440, 0.0520, 0.0576, 0.0771, 0.0527, 0.0420, 0.0518], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 08:10:20,992 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3419, 4.7193, 4.9958, 4.7800, 4.8468, 5.2534, 4.7914, 5.3543], device='cuda:2'), covar=tensor([0.0688, 0.0688, 0.0663, 0.1095, 0.1672, 0.0823, 0.1059, 0.0604], device='cuda:2'), in_proj_covar=tensor([0.0742, 0.0439, 0.0516, 0.0573, 0.0766, 0.0523, 0.0418, 0.0514], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 08:10:36,805 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.295e+02 2.317e+02 2.735e+02 3.523e+02 8.419e+02, threshold=5.469e+02, percent-clipped=3.0 2023-03-08 08:10:41,556 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49127.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:10:59,572 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-03-08 08:11:28,002 INFO [train2.py:809] (2/4) Epoch 13, batch 1350, loss[ctc_loss=0.09869, att_loss=0.2372, loss=0.2095, over 16206.00 frames. utt_duration=1582 frames, utt_pad_proportion=0.004801, over 41.00 utterances.], tot_loss[ctc_loss=0.09492, att_loss=0.246, loss=0.2158, over 3253579.28 frames. utt_duration=1223 frames, utt_pad_proportion=0.06439, over 10658.27 utterances.], batch size: 41, lr: 8.42e-03, grad_scale: 8.0 2023-03-08 08:12:01,197 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49177.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:12:29,055 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49194.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:12:47,667 INFO [train2.py:809] (2/4) Epoch 13, batch 1400, loss[ctc_loss=0.06851, att_loss=0.2164, loss=0.1868, over 13264.00 frames. utt_duration=1831 frames, utt_pad_proportion=0.09443, over 29.00 utterances.], tot_loss[ctc_loss=0.09463, att_loss=0.2463, loss=0.216, over 3266452.68 frames. 
utt_duration=1253 frames, utt_pad_proportion=0.05436, over 10443.04 utterances.], batch size: 29, lr: 8.41e-03, grad_scale: 8.0 2023-03-08 08:12:55,482 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6826, 5.9667, 5.4126, 5.7744, 5.5468, 5.1938, 5.2311, 5.1648], device='cuda:2'), covar=tensor([0.1312, 0.0828, 0.0895, 0.0794, 0.0963, 0.1449, 0.2351, 0.2304], device='cuda:2'), in_proj_covar=tensor([0.0456, 0.0526, 0.0392, 0.0396, 0.0372, 0.0425, 0.0536, 0.0472], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 08:13:15,979 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.537e+02 2.165e+02 2.664e+02 3.367e+02 6.809e+02, threshold=5.329e+02, percent-clipped=2.0 2023-03-08 08:13:35,669 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=49236.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:14:07,333 INFO [train2.py:809] (2/4) Epoch 13, batch 1450, loss[ctc_loss=0.06832, att_loss=0.2446, loss=0.2094, over 16751.00 frames. utt_duration=1397 frames, utt_pad_proportion=0.007502, over 48.00 utterances.], tot_loss[ctc_loss=0.09498, att_loss=0.2466, loss=0.2163, over 3263584.05 frames. utt_duration=1221 frames, utt_pad_proportion=0.06437, over 10701.66 utterances.], batch size: 48, lr: 8.41e-03, grad_scale: 8.0 2023-03-08 08:14:55,054 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4719, 2.2782, 4.9392, 3.6634, 2.7037, 4.1361, 4.6668, 4.4390], device='cuda:2'), covar=tensor([0.0201, 0.1788, 0.0111, 0.1042, 0.2010, 0.0224, 0.0116, 0.0237], device='cuda:2'), in_proj_covar=tensor([0.0153, 0.0241, 0.0143, 0.0304, 0.0267, 0.0185, 0.0128, 0.0159], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 08:15:12,345 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.46 vs. limit=5.0 2023-03-08 08:15:13,315 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=49297.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:15:26,869 INFO [train2.py:809] (2/4) Epoch 13, batch 1500, loss[ctc_loss=0.1418, att_loss=0.2663, loss=0.2414, over 14423.00 frames. utt_duration=399.3 frames, utt_pad_proportion=0.3067, over 145.00 utterances.], tot_loss[ctc_loss=0.09531, att_loss=0.2469, loss=0.2166, over 3261477.87 frames. utt_duration=1215 frames, utt_pad_proportion=0.06495, over 10748.35 utterances.], batch size: 145, lr: 8.40e-03, grad_scale: 8.0 2023-03-08 08:15:55,303 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.604e+02 2.233e+02 2.652e+02 3.112e+02 8.079e+02, threshold=5.303e+02, percent-clipped=1.0 2023-03-08 08:16:00,990 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-03-08 08:16:10,861 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.45 vs. 
limit=5.0 2023-03-08 08:16:32,372 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6489, 2.8233, 3.4078, 4.5006, 3.9913, 3.9843, 3.0053, 2.2039], device='cuda:2'), covar=tensor([0.0530, 0.2025, 0.0947, 0.0501, 0.0745, 0.0392, 0.1417, 0.2252], device='cuda:2'), in_proj_covar=tensor([0.0169, 0.0208, 0.0187, 0.0195, 0.0192, 0.0160, 0.0195, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 08:16:47,514 INFO [train2.py:809] (2/4) Epoch 13, batch 1550, loss[ctc_loss=0.09195, att_loss=0.2373, loss=0.2083, over 15872.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.01022, over 39.00 utterances.], tot_loss[ctc_loss=0.0943, att_loss=0.246, loss=0.2157, over 3265266.30 frames. utt_duration=1237 frames, utt_pad_proportion=0.0583, over 10572.47 utterances.], batch size: 39, lr: 8.40e-03, grad_scale: 8.0 2023-03-08 08:17:43,735 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3822, 3.4830, 3.3522, 2.9037, 3.4645, 3.4326, 3.2937, 2.4414], device='cuda:2'), covar=tensor([0.1171, 0.2749, 0.4582, 0.5367, 0.1275, 0.4699, 0.2342, 0.6736], device='cuda:2'), in_proj_covar=tensor([0.0109, 0.0128, 0.0139, 0.0205, 0.0109, 0.0196, 0.0116, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 08:18:04,223 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8556, 5.1526, 5.1649, 5.0699, 5.2072, 5.2082, 4.9062, 4.7134], device='cuda:2'), covar=tensor([0.1189, 0.0553, 0.0258, 0.0446, 0.0280, 0.0303, 0.0370, 0.0358], device='cuda:2'), in_proj_covar=tensor([0.0473, 0.0306, 0.0269, 0.0298, 0.0356, 0.0379, 0.0303, 0.0340], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 08:18:08,484 INFO [train2.py:809] (2/4) Epoch 13, batch 1600, loss[ctc_loss=0.08978, att_loss=0.245, loss=0.2139, over 16287.00 frames. utt_duration=1517 frames, utt_pad_proportion=0.006126, over 43.00 utterances.], tot_loss[ctc_loss=0.09423, att_loss=0.2459, loss=0.2156, over 3266327.30 frames. utt_duration=1237 frames, utt_pad_proportion=0.05868, over 10573.62 utterances.], batch size: 43, lr: 8.40e-03, grad_scale: 8.0 2023-03-08 08:18:34,206 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=49422.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:18:36,893 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.458e+02 2.327e+02 2.714e+02 3.320e+02 9.456e+02, threshold=5.428e+02, percent-clipped=3.0 2023-03-08 08:18:50,254 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=49432.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 08:19:27,360 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3275, 2.4954, 3.2469, 4.3465, 3.8247, 3.8183, 2.9513, 1.8181], device='cuda:2'), covar=tensor([0.0708, 0.2373, 0.0998, 0.0542, 0.0741, 0.0423, 0.1386, 0.2606], device='cuda:2'), in_proj_covar=tensor([0.0169, 0.0207, 0.0186, 0.0195, 0.0192, 0.0159, 0.0193, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 08:19:28,471 INFO [train2.py:809] (2/4) Epoch 13, batch 1650, loss[ctc_loss=0.1467, att_loss=0.2827, loss=0.2555, over 16457.00 frames. 
utt_duration=1433 frames, utt_pad_proportion=0.007919, over 46.00 utterances.], tot_loss[ctc_loss=0.09548, att_loss=0.247, loss=0.2167, over 3269200.37 frames. utt_duration=1200 frames, utt_pad_proportion=0.06662, over 10908.84 utterances.], batch size: 46, lr: 8.39e-03, grad_scale: 8.0 2023-03-08 08:20:03,222 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49477.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:20:12,589 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=49483.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:20:16,786 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7475, 6.0083, 5.3922, 5.7799, 5.6619, 5.2492, 5.3804, 5.1764], device='cuda:2'), covar=tensor([0.1194, 0.0902, 0.0860, 0.0766, 0.0811, 0.1403, 0.2299, 0.2284], device='cuda:2'), in_proj_covar=tensor([0.0449, 0.0527, 0.0388, 0.0389, 0.0371, 0.0421, 0.0530, 0.0470], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 08:20:29,301 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=49493.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 08:20:30,711 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49494.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:20:48,673 INFO [train2.py:809] (2/4) Epoch 13, batch 1700, loss[ctc_loss=0.1003, att_loss=0.2532, loss=0.2226, over 16945.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.008707, over 50.00 utterances.], tot_loss[ctc_loss=0.09544, att_loss=0.2468, loss=0.2166, over 3269344.27 frames. utt_duration=1211 frames, utt_pad_proportion=0.06439, over 10811.45 utterances.], batch size: 50, lr: 8.39e-03, grad_scale: 8.0 2023-03-08 08:21:16,981 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.458e+02 2.324e+02 2.788e+02 3.752e+02 8.319e+02, threshold=5.576e+02, percent-clipped=5.0 2023-03-08 08:21:18,730 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49525.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:21:27,815 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.81 vs. limit=2.0 2023-03-08 08:21:46,039 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49542.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:22:07,063 INFO [train2.py:809] (2/4) Epoch 13, batch 1750, loss[ctc_loss=0.07671, att_loss=0.2245, loss=0.1949, over 15759.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.008782, over 38.00 utterances.], tot_loss[ctc_loss=0.0963, att_loss=0.2474, loss=0.2171, over 3265118.68 frames. 
utt_duration=1211 frames, utt_pad_proportion=0.06551, over 10799.22 utterances.], batch size: 38, lr: 8.38e-03, grad_scale: 8.0 2023-03-08 08:23:03,829 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49592.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:23:04,064 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7246, 4.8062, 4.7857, 4.6954, 5.1917, 4.7020, 4.7493, 2.3714], device='cuda:2'), covar=tensor([0.0204, 0.0213, 0.0188, 0.0220, 0.1012, 0.0217, 0.0226, 0.2158], device='cuda:2'), in_proj_covar=tensor([0.0127, 0.0137, 0.0142, 0.0154, 0.0340, 0.0124, 0.0125, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 08:23:25,899 INFO [train2.py:809] (2/4) Epoch 13, batch 1800, loss[ctc_loss=0.08396, att_loss=0.2343, loss=0.2042, over 16117.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006895, over 42.00 utterances.], tot_loss[ctc_loss=0.09514, att_loss=0.2458, loss=0.2157, over 3258118.15 frames. utt_duration=1240 frames, utt_pad_proportion=0.05988, over 10521.95 utterances.], batch size: 42, lr: 8.38e-03, grad_scale: 8.0 2023-03-08 08:23:54,567 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.478e+02 2.431e+02 2.885e+02 3.492e+02 8.603e+02, threshold=5.770e+02, percent-clipped=6.0 2023-03-08 08:24:27,821 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.59 vs. limit=5.0 2023-03-08 08:24:45,024 INFO [train2.py:809] (2/4) Epoch 13, batch 1850, loss[ctc_loss=0.1109, att_loss=0.254, loss=0.2254, over 16618.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005947, over 47.00 utterances.], tot_loss[ctc_loss=0.09504, att_loss=0.2459, loss=0.2157, over 3269120.95 frames. utt_duration=1265 frames, utt_pad_proportion=0.05146, over 10351.74 utterances.], batch size: 47, lr: 8.37e-03, grad_scale: 8.0 2023-03-08 08:25:07,409 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0713, 5.3232, 5.6089, 5.4856, 5.5059, 6.0477, 5.2005, 6.1555], device='cuda:2'), covar=tensor([0.0733, 0.0725, 0.0783, 0.1214, 0.1956, 0.0831, 0.0714, 0.0633], device='cuda:2'), in_proj_covar=tensor([0.0756, 0.0447, 0.0526, 0.0586, 0.0768, 0.0531, 0.0433, 0.0527], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 08:25:42,443 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2023-03-08 08:26:04,418 INFO [train2.py:809] (2/4) Epoch 13, batch 1900, loss[ctc_loss=0.09312, att_loss=0.2559, loss=0.2233, over 16868.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007479, over 49.00 utterances.], tot_loss[ctc_loss=0.0939, att_loss=0.2455, loss=0.2152, over 3274838.08 frames. 
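Note on the per-batch statistics in the loss lines: they fit together with this model's 4x subsampling, with utt_duration the mean utterance length in input frames, the frame count roughly utt_duration * utterances / 4 (1536 * 42 / 4 ≈ 16128 against the logged 16117 for batch 1800 above), and utt_pad_proportion the share of padding in the zero-padded batch. A small sketch of how such statistics could be computed; the exact definitions are assumed rather than taken from train2.py:

    import torch

    def batch_stats(num_frames_per_utt: torch.Tensor, subsampling_factor: int = 4):
        # num_frames_per_utt: input-frame length of each utterance in the batch.
        utt_duration = num_frames_per_utt.float().mean()              # mean input frames
        padded_total = num_frames_per_utt.max() * len(num_frames_per_utt)
        utt_pad_proportion = 1.0 - num_frames_per_utt.sum() / padded_total
        frames = num_frames_per_utt.sum() / subsampling_factor        # ~ logged "frames"
        return utt_duration.item(), utt_pad_proportion.item(), frames.item()

    print(batch_stats(torch.tensor([1536, 1490, 1602, 1377])))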
utt_duration=1271 frames, utt_pad_proportion=0.04907, over 10321.55 utterances.], batch size: 49, lr: 8.37e-03, grad_scale: 8.0 2023-03-08 08:26:33,118 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.532e+02 2.157e+02 2.753e+02 3.358e+02 9.197e+02, threshold=5.506e+02, percent-clipped=6.0 2023-03-08 08:26:53,210 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6287, 4.9843, 5.2101, 5.0112, 5.0616, 5.5857, 5.0075, 5.7232], device='cuda:2'), covar=tensor([0.0669, 0.0645, 0.0759, 0.1173, 0.1775, 0.0802, 0.0709, 0.0559], device='cuda:2'), in_proj_covar=tensor([0.0751, 0.0448, 0.0524, 0.0586, 0.0768, 0.0534, 0.0430, 0.0521], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 08:27:00,258 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5859, 3.5570, 3.5063, 3.1752, 3.4833, 3.5330, 3.3973, 2.6814], device='cuda:2'), covar=tensor([0.1168, 0.2577, 0.2775, 0.5058, 0.5338, 0.4540, 0.2242, 0.5795], device='cuda:2'), in_proj_covar=tensor([0.0109, 0.0131, 0.0140, 0.0206, 0.0107, 0.0193, 0.0116, 0.0177], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001], device='cuda:2') 2023-03-08 08:27:24,889 INFO [train2.py:809] (2/4) Epoch 13, batch 1950, loss[ctc_loss=0.114, att_loss=0.2669, loss=0.2363, over 17436.00 frames. utt_duration=1109 frames, utt_pad_proportion=0.03008, over 63.00 utterances.], tot_loss[ctc_loss=0.09456, att_loss=0.2464, loss=0.216, over 3282164.61 frames. utt_duration=1255 frames, utt_pad_proportion=0.05053, over 10475.90 utterances.], batch size: 63, lr: 8.37e-03, grad_scale: 16.0 2023-03-08 08:27:41,109 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=49766.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:27:59,745 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49778.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:28:07,040 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.23 vs. limit=5.0 2023-03-08 08:28:17,537 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49788.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 08:28:45,895 INFO [train2.py:809] (2/4) Epoch 13, batch 2000, loss[ctc_loss=0.1158, att_loss=0.2615, loss=0.2324, over 17422.00 frames. utt_duration=1011 frames, utt_pad_proportion=0.04672, over 69.00 utterances.], tot_loss[ctc_loss=0.09455, att_loss=0.2466, loss=0.2162, over 3289390.53 frames. utt_duration=1259 frames, utt_pad_proportion=0.04786, over 10460.00 utterances.], batch size: 69, lr: 8.36e-03, grad_scale: 16.0 2023-03-08 08:29:14,827 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.359e+02 2.246e+02 2.552e+02 3.168e+02 7.222e+02, threshold=5.105e+02, percent-clipped=2.0 2023-03-08 08:29:19,817 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=49827.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:30:05,631 INFO [train2.py:809] (2/4) Epoch 13, batch 2050, loss[ctc_loss=0.08521, att_loss=0.2258, loss=0.1977, over 16005.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007373, over 40.00 utterances.], tot_loss[ctc_loss=0.09437, att_loss=0.2462, loss=0.2158, over 3290383.14 frames. 
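Note on grad_scale: the value steps from 8.0 to 16.0 in the entries above and falls back to 8.0 a few hundred batches later, the typical behaviour of dynamic loss scaling under fp16 training, where the scale doubles after a long run of overflow-free steps and is halved when inf/nan gradients appear. A hedged sketch using PyTorch's stock GradScaler; the training script may wrap this differently:

    import torch

    model = torch.nn.Linear(80, 500)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000,
                                       enabled=torch.cuda.is_available())

    x = torch.randn(4, 80)
    with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales, skips the step on overflow
    scaler.update()                 # grows or backs off the scale
    print(scaler.get_scale())       # analogous to the logged grad_scale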
utt_duration=1276 frames, utt_pad_proportion=0.04462, over 10330.65 utterances.], batch size: 40, lr: 8.36e-03, grad_scale: 16.0 2023-03-08 08:30:48,740 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0862, 5.3718, 4.8180, 5.4592, 4.7952, 5.0265, 5.4921, 5.2839], device='cuda:2'), covar=tensor([0.0481, 0.0249, 0.0794, 0.0229, 0.0374, 0.0237, 0.0205, 0.0176], device='cuda:2'), in_proj_covar=tensor([0.0342, 0.0272, 0.0327, 0.0276, 0.0278, 0.0211, 0.0258, 0.0244], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0005, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 08:31:03,764 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49892.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:31:25,303 INFO [train2.py:809] (2/4) Epoch 13, batch 2100, loss[ctc_loss=0.1165, att_loss=0.2776, loss=0.2454, over 17131.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01352, over 56.00 utterances.], tot_loss[ctc_loss=0.09464, att_loss=0.2463, loss=0.2159, over 3281836.09 frames. utt_duration=1265 frames, utt_pad_proportion=0.04976, over 10391.87 utterances.], batch size: 56, lr: 8.35e-03, grad_scale: 16.0 2023-03-08 08:31:42,602 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9824, 5.2436, 5.2215, 5.1039, 5.2840, 5.2388, 4.9550, 4.6893], device='cuda:2'), covar=tensor([0.1028, 0.0539, 0.0252, 0.0509, 0.0295, 0.0329, 0.0355, 0.0361], device='cuda:2'), in_proj_covar=tensor([0.0480, 0.0312, 0.0277, 0.0306, 0.0364, 0.0385, 0.0310, 0.0345], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 08:31:53,560 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+02 2.361e+02 2.979e+02 3.749e+02 1.127e+03, threshold=5.957e+02, percent-clipped=6.0 2023-03-08 08:32:08,317 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5413, 4.9923, 4.8049, 4.9175, 5.0485, 4.5901, 3.8936, 4.9453], device='cuda:2'), covar=tensor([0.0105, 0.0101, 0.0126, 0.0083, 0.0096, 0.0106, 0.0504, 0.0184], device='cuda:2'), in_proj_covar=tensor([0.0078, 0.0074, 0.0092, 0.0057, 0.0063, 0.0073, 0.0094, 0.0095], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 08:32:19,978 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49940.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:32:44,497 INFO [train2.py:809] (2/4) Epoch 13, batch 2150, loss[ctc_loss=0.08914, att_loss=0.2543, loss=0.2213, over 16980.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.006691, over 50.00 utterances.], tot_loss[ctc_loss=0.09415, att_loss=0.246, loss=0.2156, over 3282722.96 frames. 
utt_duration=1297 frames, utt_pad_proportion=0.04233, over 10134.88 utterances.], batch size: 50, lr: 8.35e-03, grad_scale: 16.0 2023-03-08 08:33:03,783 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4880, 1.5303, 2.1777, 2.5400, 2.5930, 2.3093, 2.2829, 2.5378], device='cuda:2'), covar=tensor([0.1934, 0.6558, 0.3706, 0.1454, 0.2176, 0.1565, 0.2795, 0.1196], device='cuda:2'), in_proj_covar=tensor([0.0085, 0.0094, 0.0097, 0.0080, 0.0087, 0.0075, 0.0095, 0.0065], device='cuda:2'), out_proj_covar=tensor([5.9242e-05, 6.6963e-05, 6.9282e-05, 5.7555e-05, 5.9209e-05, 5.6446e-05, 6.6945e-05, 5.0167e-05], device='cuda:2') 2023-03-08 08:34:08,270 INFO [train2.py:809] (2/4) Epoch 13, batch 2200, loss[ctc_loss=0.09821, att_loss=0.2586, loss=0.2265, over 17297.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02478, over 59.00 utterances.], tot_loss[ctc_loss=0.09466, att_loss=0.2466, loss=0.2162, over 3286143.71 frames. utt_duration=1306 frames, utt_pad_proportion=0.04014, over 10078.03 utterances.], batch size: 59, lr: 8.35e-03, grad_scale: 16.0 2023-03-08 08:34:35,765 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.558e+02 2.313e+02 2.828e+02 3.809e+02 8.151e+02, threshold=5.657e+02, percent-clipped=5.0 2023-03-08 08:35:25,945 INFO [train2.py:809] (2/4) Epoch 13, batch 2250, loss[ctc_loss=0.08485, att_loss=0.2482, loss=0.2155, over 16893.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006274, over 49.00 utterances.], tot_loss[ctc_loss=0.09458, att_loss=0.2464, loss=0.216, over 3286597.98 frames. utt_duration=1294 frames, utt_pad_proportion=0.04152, over 10172.21 utterances.], batch size: 49, lr: 8.34e-03, grad_scale: 16.0 2023-03-08 08:36:00,896 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50078.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:36:06,765 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-08 08:36:17,744 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50088.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 08:36:45,461 INFO [train2.py:809] (2/4) Epoch 13, batch 2300, loss[ctc_loss=0.09605, att_loss=0.2278, loss=0.2014, over 15753.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009254, over 38.00 utterances.], tot_loss[ctc_loss=0.09516, att_loss=0.2467, loss=0.2164, over 3284696.90 frames. 
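Note on the zipformer.py:1447 diagnostics above: attn_weights_entropy prints one value per attention head, where values near the log of the sequence length indicate nearly uniform attention and small values indicate peaky heads; the accompanying covar tensors summarise the projection statistics. A hedged sketch of the entropy computation, with the exact reduction in the model code assumed:

    import torch

    def attention_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (num_heads, tgt_len, src_len), each row a distribution.
        p = attn_weights.clamp_min(1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)   # one entropy per head

    w = torch.softmax(torch.randn(8, 350, 350), dim=-1)
    print(attention_entropy(w))   # fairly high entropies; the upper bound is log(350) ~ 5.9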
utt_duration=1289 frames, utt_pad_proportion=0.0439, over 10202.72 utterances.], batch size: 38, lr: 8.34e-03, grad_scale: 8.0 2023-03-08 08:37:10,654 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=50122.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:37:15,133 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.321e+02 2.441e+02 2.731e+02 3.266e+02 6.421e+02, threshold=5.462e+02, percent-clipped=3.0 2023-03-08 08:37:17,333 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50126.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:37:20,755 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50128.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:37:34,160 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50136.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 08:37:52,508 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4982, 5.7511, 5.2220, 5.6303, 5.3758, 4.9799, 5.1934, 5.0150], device='cuda:2'), covar=tensor([0.1175, 0.0827, 0.0712, 0.0738, 0.0847, 0.1368, 0.2252, 0.2065], device='cuda:2'), in_proj_covar=tensor([0.0446, 0.0524, 0.0389, 0.0392, 0.0374, 0.0425, 0.0541, 0.0470], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 08:37:55,609 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.2034, 5.4806, 5.7790, 5.6274, 5.6863, 6.1631, 5.3179, 6.2597], device='cuda:2'), covar=tensor([0.0671, 0.0624, 0.0662, 0.1122, 0.1894, 0.0819, 0.0583, 0.0616], device='cuda:2'), in_proj_covar=tensor([0.0757, 0.0447, 0.0531, 0.0585, 0.0770, 0.0530, 0.0427, 0.0525], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 08:38:04,742 INFO [train2.py:809] (2/4) Epoch 13, batch 2350, loss[ctc_loss=0.09926, att_loss=0.2542, loss=0.2232, over 16953.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008269, over 50.00 utterances.], tot_loss[ctc_loss=0.09626, att_loss=0.2473, loss=0.2171, over 3271914.82 frames. 
utt_duration=1233 frames, utt_pad_proportion=0.06173, over 10625.19 utterances.], batch size: 50, lr: 8.33e-03, grad_scale: 8.0 2023-03-08 08:38:58,510 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50189.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:39:07,521 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4186, 3.5114, 3.4306, 3.0494, 3.4173, 3.5157, 3.3424, 2.3797], device='cuda:2'), covar=tensor([0.1393, 0.1455, 0.2681, 0.5528, 0.1650, 0.3106, 0.1165, 0.7071], device='cuda:2'), in_proj_covar=tensor([0.0108, 0.0131, 0.0140, 0.0209, 0.0108, 0.0194, 0.0118, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 08:39:10,456 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5132, 3.0782, 3.7443, 4.5772, 4.0713, 4.0345, 3.1088, 2.2726], device='cuda:2'), covar=tensor([0.0718, 0.2023, 0.0827, 0.0434, 0.0674, 0.0406, 0.1441, 0.2344], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0213, 0.0193, 0.0202, 0.0199, 0.0164, 0.0200, 0.0185], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 08:39:10,504 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50197.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 08:39:24,033 INFO [train2.py:809] (2/4) Epoch 13, batch 2400, loss[ctc_loss=0.08062, att_loss=0.225, loss=0.1962, over 15497.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008423, over 36.00 utterances.], tot_loss[ctc_loss=0.09679, att_loss=0.248, loss=0.2177, over 3274512.50 frames. utt_duration=1221 frames, utt_pad_proportion=0.06329, over 10744.03 utterances.], batch size: 36, lr: 8.33e-03, grad_scale: 8.0 2023-03-08 08:39:27,453 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9438, 5.0175, 4.7713, 2.8682, 4.6813, 4.5597, 4.0632, 2.7009], device='cuda:2'), covar=tensor([0.0121, 0.0068, 0.0209, 0.0971, 0.0096, 0.0188, 0.0328, 0.1298], device='cuda:2'), in_proj_covar=tensor([0.0062, 0.0087, 0.0084, 0.0104, 0.0074, 0.0095, 0.0091, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 08:39:54,444 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.486e+02 2.261e+02 2.728e+02 3.440e+02 5.730e+02, threshold=5.455e+02, percent-clipped=1.0 2023-03-08 08:40:42,741 INFO [train2.py:809] (2/4) Epoch 13, batch 2450, loss[ctc_loss=0.0675, att_loss=0.2109, loss=0.1822, over 15352.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01097, over 35.00 utterances.], tot_loss[ctc_loss=0.0965, att_loss=0.2479, loss=0.2176, over 3284686.90 frames. utt_duration=1240 frames, utt_pad_proportion=0.05534, over 10611.29 utterances.], batch size: 35, lr: 8.32e-03, grad_scale: 8.0 2023-03-08 08:40:45,463 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-03-08 08:40:46,207 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50258.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 08:41:21,551 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-08 08:42:02,222 INFO [train2.py:809] (2/4) Epoch 13, batch 2500, loss[ctc_loss=0.1028, att_loss=0.2617, loss=0.2299, over 17316.00 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.01111, over 55.00 utterances.], tot_loss[ctc_loss=0.09625, att_loss=0.2474, loss=0.2172, over 3278377.24 frames. utt_duration=1242 frames, utt_pad_proportion=0.05546, over 10573.01 utterances.], batch size: 55, lr: 8.32e-03, grad_scale: 8.0 2023-03-08 08:42:33,321 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.348e+02 2.276e+02 2.773e+02 3.463e+02 7.694e+02, threshold=5.547e+02, percent-clipped=7.0 2023-03-08 08:43:17,164 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8674, 4.7804, 4.7169, 4.6915, 5.2442, 4.7726, 4.8076, 2.5391], device='cuda:2'), covar=tensor([0.0161, 0.0211, 0.0216, 0.0199, 0.0759, 0.0177, 0.0214, 0.1901], device='cuda:2'), in_proj_covar=tensor([0.0128, 0.0137, 0.0144, 0.0154, 0.0340, 0.0125, 0.0126, 0.0210], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 08:43:21,289 INFO [train2.py:809] (2/4) Epoch 13, batch 2550, loss[ctc_loss=0.08165, att_loss=0.2419, loss=0.2098, over 16271.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007691, over 43.00 utterances.], tot_loss[ctc_loss=0.0967, att_loss=0.2475, loss=0.2174, over 3275050.08 frames. utt_duration=1236 frames, utt_pad_proportion=0.05811, over 10614.47 utterances.], batch size: 43, lr: 8.32e-03, grad_scale: 8.0 2023-03-08 08:44:40,230 INFO [train2.py:809] (2/4) Epoch 13, batch 2600, loss[ctc_loss=0.1294, att_loss=0.2466, loss=0.2232, over 15645.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.00894, over 37.00 utterances.], tot_loss[ctc_loss=0.09638, att_loss=0.2472, loss=0.217, over 3266921.49 frames. utt_duration=1248 frames, utt_pad_proportion=0.05726, over 10483.61 utterances.], batch size: 37, lr: 8.31e-03, grad_scale: 8.0 2023-03-08 08:45:06,210 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50422.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:45:10,507 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.653e+02 2.288e+02 2.850e+02 3.496e+02 8.572e+02, threshold=5.699e+02, percent-clipped=4.0 2023-03-08 08:45:59,190 INFO [train2.py:809] (2/4) Epoch 13, batch 2650, loss[ctc_loss=0.06817, att_loss=0.2264, loss=0.1947, over 16389.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.006359, over 44.00 utterances.], tot_loss[ctc_loss=0.09629, att_loss=0.2474, loss=0.2172, over 3269530.54 frames. utt_duration=1246 frames, utt_pad_proportion=0.05638, over 10505.94 utterances.], batch size: 44, lr: 8.31e-03, grad_scale: 8.0 2023-03-08 08:46:22,030 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50470.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:46:44,868 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=50484.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:47:17,951 INFO [train2.py:809] (2/4) Epoch 13, batch 2700, loss[ctc_loss=0.08493, att_loss=0.2349, loss=0.2049, over 16114.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006222, over 42.00 utterances.], tot_loss[ctc_loss=0.09671, att_loss=0.2474, loss=0.2173, over 3272993.10 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.05299, over 10432.59 utterances.], batch size: 42, lr: 8.30e-03, grad_scale: 8.0 2023-03-08 08:47:25,964 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9420, 4.3197, 3.9437, 4.3633, 2.5407, 4.3382, 2.4265, 1.6019], device='cuda:2'), covar=tensor([0.0445, 0.0150, 0.0900, 0.0173, 0.1817, 0.0144, 0.1667, 0.1865], device='cuda:2'), in_proj_covar=tensor([0.0153, 0.0123, 0.0254, 0.0119, 0.0218, 0.0111, 0.0224, 0.0199], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 08:47:49,716 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.590e+02 2.365e+02 2.788e+02 3.367e+02 6.677e+02, threshold=5.576e+02, percent-clipped=4.0 2023-03-08 08:48:23,031 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50546.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:48:33,804 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=50553.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 08:48:38,308 INFO [train2.py:809] (2/4) Epoch 13, batch 2750, loss[ctc_loss=0.08725, att_loss=0.2517, loss=0.2188, over 17437.00 frames. utt_duration=1012 frames, utt_pad_proportion=0.04515, over 69.00 utterances.], tot_loss[ctc_loss=0.09709, att_loss=0.2483, loss=0.218, over 3279920.44 frames. utt_duration=1231 frames, utt_pad_proportion=0.05736, over 10672.92 utterances.], batch size: 69, lr: 8.30e-03, grad_scale: 8.0 2023-03-08 08:48:39,539 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-03-08 08:49:57,333 INFO [train2.py:809] (2/4) Epoch 13, batch 2800, loss[ctc_loss=0.1018, att_loss=0.2432, loss=0.2149, over 15872.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.008906, over 39.00 utterances.], tot_loss[ctc_loss=0.09634, att_loss=0.2473, loss=0.2171, over 3279178.82 frames. utt_duration=1245 frames, utt_pad_proportion=0.05376, over 10546.39 utterances.], batch size: 39, lr: 8.30e-03, grad_scale: 8.0 2023-03-08 08:49:59,327 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50607.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:50:29,966 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.488e+02 2.143e+02 2.586e+02 3.162e+02 8.446e+02, threshold=5.172e+02, percent-clipped=2.0 2023-03-08 08:50:33,481 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0507, 4.7194, 4.4905, 4.7279, 3.0070, 4.5953, 2.5546, 1.9231], device='cuda:2'), covar=tensor([0.0367, 0.0136, 0.0719, 0.0161, 0.1483, 0.0145, 0.1497, 0.1668], device='cuda:2'), in_proj_covar=tensor([0.0153, 0.0123, 0.0253, 0.0119, 0.0219, 0.0111, 0.0224, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 08:51:17,761 INFO [train2.py:809] (2/4) Epoch 13, batch 2850, loss[ctc_loss=0.07423, att_loss=0.2106, loss=0.1834, over 15872.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.008224, over 39.00 utterances.], tot_loss[ctc_loss=0.09504, att_loss=0.2466, loss=0.2163, over 3277170.73 frames. utt_duration=1256 frames, utt_pad_proportion=0.05146, over 10448.59 utterances.], batch size: 39, lr: 8.29e-03, grad_scale: 8.0 2023-03-08 08:52:37,436 INFO [train2.py:809] (2/4) Epoch 13, batch 2900, loss[ctc_loss=0.1035, att_loss=0.2324, loss=0.2066, over 15490.00 frames. 
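Note on the scaling.py Whitening entries: they compare a whiteness measure of some activations against a limit (metric=1.38 vs. limit=2.0 above), where the metric is 1 when the per-group covariance is a multiple of the identity and grows as the energy concentrates in fewer directions. One plausible form of such a measure, not necessarily the exact scaling.py formula:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels), num_channels divisible by num_groups.
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n      # per-group covariance
        eig = torch.linalg.eigvalsh(cov)                  # eigenvalues per group
        return (eig.pow(2).mean() / eig.mean().pow(2)).item()

    feats = torch.randn(4000, 96)                 # (near-)white activations
    print(whitening_metric(feats, num_groups=8))  # ~1.0; larger means less white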
utt_duration=1723 frames, utt_pad_proportion=0.00944, over 36.00 utterances.], tot_loss[ctc_loss=0.09448, att_loss=0.2464, loss=0.216, over 3276459.46 frames. utt_duration=1272 frames, utt_pad_proportion=0.04765, over 10313.27 utterances.], batch size: 36, lr: 8.29e-03, grad_scale: 8.0 2023-03-08 08:53:08,849 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.567e+02 2.231e+02 2.674e+02 3.196e+02 5.262e+02, threshold=5.348e+02, percent-clipped=1.0 2023-03-08 08:53:56,555 INFO [train2.py:809] (2/4) Epoch 13, batch 2950, loss[ctc_loss=0.08277, att_loss=0.218, loss=0.191, over 15475.00 frames. utt_duration=1721 frames, utt_pad_proportion=0.009813, over 36.00 utterances.], tot_loss[ctc_loss=0.09482, att_loss=0.2463, loss=0.216, over 3275051.71 frames. utt_duration=1255 frames, utt_pad_proportion=0.05227, over 10452.99 utterances.], batch size: 36, lr: 8.28e-03, grad_scale: 8.0 2023-03-08 08:54:39,265 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50782.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:54:42,258 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50784.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:55:02,325 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3600, 3.4474, 3.3355, 2.9760, 3.4137, 3.4877, 3.3832, 2.4296], device='cuda:2'), covar=tensor([0.1212, 0.2103, 0.3458, 0.5057, 0.2518, 0.3536, 0.1788, 0.5839], device='cuda:2'), in_proj_covar=tensor([0.0111, 0.0135, 0.0147, 0.0217, 0.0110, 0.0200, 0.0122, 0.0186], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 08:55:16,324 INFO [train2.py:809] (2/4) Epoch 13, batch 3000, loss[ctc_loss=0.07956, att_loss=0.2363, loss=0.205, over 15771.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008604, over 38.00 utterances.], tot_loss[ctc_loss=0.09501, att_loss=0.2459, loss=0.2158, over 3281988.00 frames. utt_duration=1269 frames, utt_pad_proportion=0.04678, over 10361.30 utterances.], batch size: 38, lr: 8.28e-03, grad_scale: 8.0 2023-03-08 08:55:16,325 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 08:55:30,013 INFO [train2.py:843] (2/4) Epoch 13, validation: ctc_loss=0.04571, att_loss=0.2368, loss=0.1986, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 08:55:30,013 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 08:55:33,400 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50808.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 08:56:00,975 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+02 2.441e+02 3.101e+02 4.045e+02 6.345e+02, threshold=6.202e+02, percent-clipped=8.0 2023-03-08 08:56:12,103 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50832.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:56:29,178 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50843.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:56:44,449 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50853.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 08:56:48,706 INFO [train2.py:809] (2/4) Epoch 13, batch 3050, loss[ctc_loss=0.08237, att_loss=0.2339, loss=0.2036, over 16281.00 frames. 
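Note on the validation entries above: alongside training, train2.py periodically scores the same held-out set (944034 frames over 5567 utterances each time) and reports the peak of the CUDA allocator, here 16150MB. A hedged sketch of such a loop; the batch layout and model call below are hypothetical, and only the memory query is the actual PyTorch API:

    import torch

    @torch.no_grad()
    def validate(model, dev_batches, device="cuda"):
        # Frame-weighted average loss over a fixed dev set.
        model.eval()
        total_loss, total_frames = 0.0, 0
        for feats, targets, num_frames in dev_batches:    # hypothetical layout
            loss = model(feats.to(device), targets.to(device))
            total_loss += loss.item() * num_frames
            total_frames += num_frames
        model.train()
        return total_loss / max(total_frames, 1)

    if torch.cuda.is_available():
        peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")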
utt_duration=1516 frames, utt_pad_proportion=0.006629, over 43.00 utterances.], tot_loss[ctc_loss=0.09495, att_loss=0.2461, loss=0.2159, over 3283692.76 frames. utt_duration=1283 frames, utt_pad_proportion=0.04439, over 10251.71 utterances.], batch size: 43, lr: 8.28e-03, grad_scale: 8.0 2023-03-08 08:57:10,215 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50869.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 08:57:20,444 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-03-08 08:57:58,277 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.80 vs. limit=2.0 2023-03-08 08:58:00,756 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50901.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 08:58:02,420 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=50902.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:58:08,259 INFO [train2.py:809] (2/4) Epoch 13, batch 3100, loss[ctc_loss=0.09636, att_loss=0.2567, loss=0.2247, over 17051.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.009013, over 53.00 utterances.], tot_loss[ctc_loss=0.09512, att_loss=0.2466, loss=0.2163, over 3277468.05 frames. utt_duration=1261 frames, utt_pad_proportion=0.04975, over 10406.27 utterances.], batch size: 53, lr: 8.27e-03, grad_scale: 8.0 2023-03-08 08:58:39,136 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.514e+02 2.262e+02 2.755e+02 3.486e+02 1.234e+03, threshold=5.511e+02, percent-clipped=2.0 2023-03-08 08:59:11,666 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50946.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 08:59:26,546 INFO [train2.py:809] (2/4) Epoch 13, batch 3150, loss[ctc_loss=0.1158, att_loss=0.2589, loss=0.2303, over 17469.00 frames. utt_duration=1014 frames, utt_pad_proportion=0.04331, over 69.00 utterances.], tot_loss[ctc_loss=0.09596, att_loss=0.2477, loss=0.2174, over 3278584.48 frames. utt_duration=1225 frames, utt_pad_proportion=0.05738, over 10720.34 utterances.], batch size: 69, lr: 8.27e-03, grad_scale: 8.0 2023-03-08 08:59:28,378 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1597, 2.7998, 3.1033, 4.2601, 3.8363, 3.8573, 2.8093, 1.8638], device='cuda:2'), covar=tensor([0.0839, 0.2221, 0.1071, 0.0882, 0.0776, 0.0546, 0.1699, 0.2805], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0211, 0.0188, 0.0200, 0.0199, 0.0160, 0.0197, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 09:00:46,288 INFO [train2.py:809] (2/4) Epoch 13, batch 3200, loss[ctc_loss=0.09598, att_loss=0.2636, loss=0.2301, over 17043.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.009289, over 52.00 utterances.], tot_loss[ctc_loss=0.09595, att_loss=0.2482, loss=0.2177, over 3284724.18 frames. 
utt_duration=1213 frames, utt_pad_proportion=0.05874, over 10848.90 utterances.], batch size: 52, lr: 8.26e-03, grad_scale: 8.0 2023-03-08 09:00:48,213 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51007.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:01:17,934 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.429e+02 2.247e+02 2.628e+02 3.429e+02 8.157e+02, threshold=5.255e+02, percent-clipped=5.0 2023-03-08 09:02:05,326 INFO [train2.py:809] (2/4) Epoch 13, batch 3250, loss[ctc_loss=0.07034, att_loss=0.2131, loss=0.1846, over 15500.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008865, over 36.00 utterances.], tot_loss[ctc_loss=0.09464, att_loss=0.2471, loss=0.2166, over 3283780.20 frames. utt_duration=1240 frames, utt_pad_proportion=0.05216, over 10604.08 utterances.], batch size: 36, lr: 8.26e-03, grad_scale: 8.0 2023-03-08 09:02:14,801 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51062.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:03:23,230 INFO [train2.py:809] (2/4) Epoch 13, batch 3300, loss[ctc_loss=0.06643, att_loss=0.2152, loss=0.1855, over 15516.00 frames. utt_duration=1726 frames, utt_pad_proportion=0.007603, over 36.00 utterances.], tot_loss[ctc_loss=0.09576, att_loss=0.2479, loss=0.2174, over 3283598.59 frames. utt_duration=1227 frames, utt_pad_proportion=0.05531, over 10715.11 utterances.], batch size: 36, lr: 8.26e-03, grad_scale: 8.0 2023-03-08 09:03:51,966 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51123.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:03:54,635 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+02 2.303e+02 2.804e+02 3.427e+02 6.539e+02, threshold=5.609e+02, percent-clipped=6.0 2023-03-08 09:04:14,234 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51138.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:04:29,395 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9466, 5.1750, 5.5224, 5.3667, 5.3764, 5.9502, 5.1832, 6.0282], device='cuda:2'), covar=tensor([0.0698, 0.0716, 0.0733, 0.1276, 0.1761, 0.0813, 0.0598, 0.0586], device='cuda:2'), in_proj_covar=tensor([0.0763, 0.0455, 0.0534, 0.0592, 0.0786, 0.0540, 0.0436, 0.0522], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 09:04:41,663 INFO [train2.py:809] (2/4) Epoch 13, batch 3350, loss[ctc_loss=0.08777, att_loss=0.2593, loss=0.225, over 17290.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01183, over 55.00 utterances.], tot_loss[ctc_loss=0.09559, att_loss=0.2475, loss=0.2171, over 3284389.61 frames. utt_duration=1244 frames, utt_pad_proportion=0.05163, over 10572.15 utterances.], batch size: 55, lr: 8.25e-03, grad_scale: 8.0 2023-03-08 09:04:55,045 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51164.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 09:05:08,566 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51172.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 09:05:55,369 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51202.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:06:01,137 INFO [train2.py:809] (2/4) Epoch 13, batch 3400, loss[ctc_loss=0.1456, att_loss=0.2724, loss=0.247, over 17433.00 frames. 
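Note on the zipformer.py:625 entries above: each encoder stack occasionally skips one or two of its layers for a batch (layers_to_drop), a form of stochastic depth whose probability is tied to a per-stack warmup window; the occasional num_to_drop=1 entries at batch counts far beyond warmup_end suggest a small residual drop probability remains. A hedged sketch of such a schedule, with made-up probabilities:

    import random

    def layers_to_drop(num_layers, batch_count, warmup_begin, warmup_end,
                       warmup_p=0.5, final_p=0.05):
        # Linearly anneal the per-layer drop probability across the warmup
        # window, keeping a small floor afterwards (illustrative values only).
        if batch_count < warmup_begin:
            p = warmup_p
        elif batch_count < warmup_end:
            frac = (batch_count - warmup_begin) / (warmup_end - warmup_begin)
            p = warmup_p + frac * (final_p - warmup_p)
        else:
            p = final_p
        return {i for i in range(num_layers) if random.random() < p}

    print(layers_to_drop(4, batch_count=51007.0,
                         warmup_begin=2666.7, warmup_end=3333.3))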
utt_duration=1012 frames, utt_pad_proportion=0.04516, over 69.00 utterances.], tot_loss[ctc_loss=0.0952, att_loss=0.2466, loss=0.2163, over 3275180.03 frames. utt_duration=1246 frames, utt_pad_proportion=0.05506, over 10524.67 utterances.], batch size: 69, lr: 8.25e-03, grad_scale: 8.0 2023-03-08 09:06:29,478 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2023-03-08 09:06:33,019 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.585e+02 2.487e+02 2.909e+02 3.549e+02 7.479e+02, threshold=5.817e+02, percent-clipped=4.0 2023-03-08 09:06:45,897 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51233.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 09:07:11,549 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51250.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:07:20,299 INFO [train2.py:809] (2/4) Epoch 13, batch 3450, loss[ctc_loss=0.08181, att_loss=0.2353, loss=0.2046, over 16121.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006464, over 42.00 utterances.], tot_loss[ctc_loss=0.09485, att_loss=0.2465, loss=0.2162, over 3271852.04 frames. utt_duration=1261 frames, utt_pad_proportion=0.05316, over 10391.20 utterances.], batch size: 42, lr: 8.24e-03, grad_scale: 8.0 2023-03-08 09:07:50,801 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8262, 5.1434, 4.5994, 5.1992, 4.5530, 4.8667, 5.2309, 5.0223], device='cuda:2'), covar=tensor([0.0533, 0.0284, 0.0922, 0.0261, 0.0456, 0.0269, 0.0224, 0.0212], device='cuda:2'), in_proj_covar=tensor([0.0349, 0.0276, 0.0330, 0.0281, 0.0283, 0.0216, 0.0262, 0.0244], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 09:08:33,581 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51302.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:08:39,552 INFO [train2.py:809] (2/4) Epoch 13, batch 3500, loss[ctc_loss=0.09554, att_loss=0.264, loss=0.2303, over 17346.00 frames. utt_duration=1103 frames, utt_pad_proportion=0.0358, over 63.00 utterances.], tot_loss[ctc_loss=0.09504, att_loss=0.2474, loss=0.2169, over 3271357.67 frames. utt_duration=1215 frames, utt_pad_proportion=0.06495, over 10784.78 utterances.], batch size: 63, lr: 8.24e-03, grad_scale: 8.0 2023-03-08 09:09:10,405 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+02 2.305e+02 2.663e+02 3.285e+02 9.490e+02, threshold=5.326e+02, percent-clipped=4.0 2023-03-08 09:09:39,651 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.2305, 5.3535, 5.7561, 5.5510, 5.6686, 6.1547, 5.3373, 6.2746], device='cuda:2'), covar=tensor([0.0613, 0.0696, 0.0660, 0.1108, 0.1688, 0.0758, 0.0553, 0.0599], device='cuda:2'), in_proj_covar=tensor([0.0768, 0.0456, 0.0537, 0.0591, 0.0789, 0.0540, 0.0437, 0.0526], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 09:09:57,628 INFO [train2.py:809] (2/4) Epoch 13, batch 3550, loss[ctc_loss=0.0974, att_loss=0.245, loss=0.2155, over 16254.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.008666, over 43.00 utterances.], tot_loss[ctc_loss=0.09465, att_loss=0.2468, loss=0.2164, over 3269122.28 frames. 
utt_duration=1223 frames, utt_pad_proportion=0.0635, over 10701.36 utterances.], batch size: 43, lr: 8.24e-03, grad_scale: 8.0 2023-03-08 09:10:30,824 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1171, 4.4344, 4.5790, 4.9747, 2.6864, 4.6035, 2.7725, 1.9902], device='cuda:2'), covar=tensor([0.0443, 0.0255, 0.0675, 0.0122, 0.1865, 0.0146, 0.1615, 0.1871], device='cuda:2'), in_proj_covar=tensor([0.0157, 0.0128, 0.0258, 0.0120, 0.0224, 0.0114, 0.0227, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 09:10:39,748 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51382.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:10:54,961 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51392.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:11:16,642 INFO [train2.py:809] (2/4) Epoch 13, batch 3600, loss[ctc_loss=0.1126, att_loss=0.2649, loss=0.2344, over 17097.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.01622, over 56.00 utterances.], tot_loss[ctc_loss=0.09502, att_loss=0.2467, loss=0.2164, over 3268049.91 frames. utt_duration=1209 frames, utt_pad_proportion=0.06732, over 10829.39 utterances.], batch size: 56, lr: 8.23e-03, grad_scale: 8.0 2023-03-08 09:11:36,222 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51418.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:11:46,269 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+02 2.412e+02 2.884e+02 3.551e+02 7.030e+02, threshold=5.767e+02, percent-clipped=4.0 2023-03-08 09:12:05,887 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51438.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:12:13,371 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51443.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:12:29,022 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51453.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:12:33,508 INFO [train2.py:809] (2/4) Epoch 13, batch 3650, loss[ctc_loss=0.1399, att_loss=0.2832, loss=0.2546, over 13925.00 frames. utt_duration=385.7 frames, utt_pad_proportion=0.3293, over 145.00 utterances.], tot_loss[ctc_loss=0.09592, att_loss=0.2472, loss=0.2169, over 3267162.05 frames. 
utt_duration=1203 frames, utt_pad_proportion=0.06772, over 10874.08 utterances.], batch size: 145, lr: 8.23e-03, grad_scale: 8.0 2023-03-08 09:12:47,052 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51464.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 09:13:20,392 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51486.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:13:34,850 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51495.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:13:34,959 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8081, 2.5309, 5.1329, 4.0623, 3.0642, 4.4949, 5.0000, 4.7103], device='cuda:2'), covar=tensor([0.0194, 0.1520, 0.0184, 0.0883, 0.1779, 0.0198, 0.0100, 0.0234], device='cuda:2'), in_proj_covar=tensor([0.0154, 0.0238, 0.0144, 0.0303, 0.0265, 0.0185, 0.0130, 0.0160], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 09:13:45,015 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9945, 3.7607, 3.1727, 3.3984, 3.9922, 3.5841, 2.8331, 4.2596], device='cuda:2'), covar=tensor([0.0938, 0.0482, 0.0975, 0.0665, 0.0584, 0.0633, 0.0877, 0.0493], device='cuda:2'), in_proj_covar=tensor([0.0186, 0.0191, 0.0205, 0.0180, 0.0242, 0.0218, 0.0185, 0.0260], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 09:13:52,936 INFO [train2.py:809] (2/4) Epoch 13, batch 3700, loss[ctc_loss=0.1048, att_loss=0.2643, loss=0.2324, over 17045.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.009187, over 52.00 utterances.], tot_loss[ctc_loss=0.09616, att_loss=0.2474, loss=0.2172, over 3268868.81 frames. utt_duration=1224 frames, utt_pad_proportion=0.06174, over 10695.61 utterances.], batch size: 52, lr: 8.22e-03, grad_scale: 8.0 2023-03-08 09:13:54,713 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8071, 5.0523, 5.3707, 5.2750, 5.2877, 5.8331, 5.0701, 5.8968], device='cuda:2'), covar=tensor([0.0673, 0.0751, 0.0827, 0.1179, 0.1711, 0.0858, 0.0691, 0.0658], device='cuda:2'), in_proj_covar=tensor([0.0767, 0.0452, 0.0535, 0.0588, 0.0779, 0.0539, 0.0435, 0.0522], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 09:14:03,483 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51512.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 09:14:23,030 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.524e+02 2.328e+02 2.806e+02 3.830e+02 9.896e+02, threshold=5.611e+02, percent-clipped=3.0 2023-03-08 09:14:28,086 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51528.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 09:15:04,285 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2023-03-08 09:15:13,024 INFO [train2.py:809] (2/4) Epoch 13, batch 3750, loss[ctc_loss=0.08495, att_loss=0.2358, loss=0.2057, over 16274.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007661, over 43.00 utterances.], tot_loss[ctc_loss=0.09502, att_loss=0.2464, loss=0.2161, over 3266806.59 frames. 
utt_duration=1240 frames, utt_pad_proportion=0.05782, over 10550.05 utterances.], batch size: 43, lr: 8.22e-03, grad_scale: 8.0 2023-03-08 09:15:13,435 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51556.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:16:26,974 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51602.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:16:32,781 INFO [train2.py:809] (2/4) Epoch 13, batch 3800, loss[ctc_loss=0.1038, att_loss=0.243, loss=0.2152, over 16771.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006392, over 48.00 utterances.], tot_loss[ctc_loss=0.09485, att_loss=0.2459, loss=0.2157, over 3267738.09 frames. utt_duration=1239 frames, utt_pad_proportion=0.05672, over 10558.75 utterances.], batch size: 48, lr: 8.22e-03, grad_scale: 8.0 2023-03-08 09:17:02,533 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.490e+02 2.194e+02 2.674e+02 3.110e+02 7.500e+02, threshold=5.349e+02, percent-clipped=1.0 2023-03-08 09:17:24,133 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3593, 3.2473, 3.2013, 2.8990, 3.2978, 3.1389, 3.2755, 2.3100], device='cuda:2'), covar=tensor([0.1020, 0.2217, 0.2685, 0.4732, 0.1240, 0.3586, 0.1174, 0.6425], device='cuda:2'), in_proj_covar=tensor([0.0113, 0.0136, 0.0147, 0.0216, 0.0112, 0.0201, 0.0123, 0.0184], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 09:17:41,355 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51650.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:17:51,575 INFO [train2.py:809] (2/4) Epoch 13, batch 3850, loss[ctc_loss=0.08675, att_loss=0.2344, loss=0.2049, over 15959.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006393, over 41.00 utterances.], tot_loss[ctc_loss=0.09414, att_loss=0.245, loss=0.2148, over 3248377.86 frames. utt_duration=1229 frames, utt_pad_proportion=0.06391, over 10583.86 utterances.], batch size: 41, lr: 8.21e-03, grad_scale: 8.0 2023-03-08 09:18:30,094 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3773, 5.2476, 5.1308, 2.9432, 5.1390, 4.7637, 4.6124, 3.1442], device='cuda:2'), covar=tensor([0.0101, 0.0080, 0.0270, 0.1048, 0.0074, 0.0159, 0.0245, 0.1112], device='cuda:2'), in_proj_covar=tensor([0.0064, 0.0089, 0.0086, 0.0105, 0.0075, 0.0098, 0.0093, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 09:19:07,770 INFO [train2.py:809] (2/4) Epoch 13, batch 3900, loss[ctc_loss=0.09039, att_loss=0.2294, loss=0.2016, over 15885.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009191, over 39.00 utterances.], tot_loss[ctc_loss=0.09451, att_loss=0.245, loss=0.2149, over 3247506.52 frames. 
utt_duration=1215 frames, utt_pad_proportion=0.06879, over 10708.73 utterances.], batch size: 39, lr: 8.21e-03, grad_scale: 8.0 2023-03-08 09:19:26,178 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51718.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:19:36,509 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.485e+02 2.131e+02 2.698e+02 3.249e+02 8.076e+02, threshold=5.395e+02, percent-clipped=6.0 2023-03-08 09:19:56,079 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8437, 5.1009, 5.3916, 5.2752, 5.2713, 5.8423, 5.1547, 5.9478], device='cuda:2'), covar=tensor([0.0689, 0.0738, 0.0656, 0.1237, 0.1863, 0.0792, 0.0594, 0.0590], device='cuda:2'), in_proj_covar=tensor([0.0759, 0.0447, 0.0532, 0.0583, 0.0774, 0.0534, 0.0431, 0.0515], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 09:19:56,095 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51738.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:20:10,170 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51747.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:20:11,525 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51748.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:20:23,592 INFO [train2.py:809] (2/4) Epoch 13, batch 3950, loss[ctc_loss=0.07665, att_loss=0.2461, loss=0.2122, over 16542.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006032, over 45.00 utterances.], tot_loss[ctc_loss=0.09437, att_loss=0.2454, loss=0.2152, over 3253021.02 frames. utt_duration=1224 frames, utt_pad_proportion=0.06597, over 10645.48 utterances.], batch size: 45, lr: 8.20e-03, grad_scale: 8.0 2023-03-08 09:20:38,818 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51766.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:20:40,666 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0891, 5.1241, 4.9037, 3.0672, 4.8769, 4.6493, 4.3683, 2.6143], device='cuda:2'), covar=tensor([0.0115, 0.0067, 0.0263, 0.0963, 0.0087, 0.0164, 0.0287, 0.1341], device='cuda:2'), in_proj_covar=tensor([0.0063, 0.0087, 0.0085, 0.0105, 0.0074, 0.0097, 0.0092, 0.0100], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 09:20:45,970 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. limit=2.0 2023-03-08 09:20:55,645 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8245, 3.7135, 3.0720, 3.4818, 3.9129, 3.5916, 2.8331, 4.2152], device='cuda:2'), covar=tensor([0.1092, 0.0477, 0.1097, 0.0677, 0.0679, 0.0697, 0.0914, 0.0569], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0194, 0.0209, 0.0183, 0.0247, 0.0221, 0.0186, 0.0263], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 09:21:41,018 INFO [train2.py:809] (2/4) Epoch 14, batch 0, loss[ctc_loss=0.08293, att_loss=0.235, loss=0.2046, over 15968.00 frames. utt_duration=1560 frames, utt_pad_proportion=0.006016, over 41.00 utterances.], tot_loss[ctc_loss=0.08293, att_loss=0.235, loss=0.2046, over 15968.00 frames. 
utt_duration=1560 frames, utt_pad_proportion=0.006016, over 41.00 utterances.], batch size: 41, lr: 7.90e-03, grad_scale: 8.0 2023-03-08 09:21:41,018 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 09:21:52,751 INFO [train2.py:843] (2/4) Epoch 14, validation: ctc_loss=0.04501, att_loss=0.2367, loss=0.1984, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 09:21:52,752 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 09:22:20,796 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51808.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:22:46,827 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.359e+02 2.322e+02 2.890e+02 3.740e+02 8.433e+02, threshold=5.780e+02, percent-clipped=5.0 2023-03-08 09:22:49,378 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-08 09:22:52,361 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51828.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 09:23:04,213 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2023-03-08 09:23:11,221 INFO [train2.py:809] (2/4) Epoch 14, batch 50, loss[ctc_loss=0.09529, att_loss=0.2411, loss=0.212, over 16416.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.00626, over 44.00 utterances.], tot_loss[ctc_loss=0.09346, att_loss=0.2464, loss=0.2158, over 746955.16 frames. utt_duration=1367 frames, utt_pad_proportion=0.01673, over 2188.09 utterances.], batch size: 44, lr: 7.90e-03, grad_scale: 8.0 2023-03-08 09:23:29,061 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51851.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:24:08,124 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51876.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 09:24:29,806 INFO [train2.py:809] (2/4) Epoch 14, batch 100, loss[ctc_loss=0.09894, att_loss=0.2602, loss=0.228, over 17120.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01493, over 56.00 utterances.], tot_loss[ctc_loss=0.09176, att_loss=0.2453, loss=0.2146, over 1310541.64 frames. utt_duration=1274 frames, utt_pad_proportion=0.04247, over 4118.24 utterances.], batch size: 56, lr: 7.90e-03, grad_scale: 8.0 2023-03-08 09:25:03,407 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5794, 2.2847, 5.1045, 3.9902, 3.0139, 4.3606, 4.8232, 4.7431], device='cuda:2'), covar=tensor([0.0211, 0.1809, 0.0124, 0.0970, 0.1818, 0.0228, 0.0095, 0.0195], device='cuda:2'), in_proj_covar=tensor([0.0155, 0.0239, 0.0145, 0.0302, 0.0265, 0.0186, 0.0130, 0.0160], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 09:25:24,996 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.531e+02 2.393e+02 2.937e+02 3.541e+02 1.195e+03, threshold=5.873e+02, percent-clipped=2.0 2023-03-08 09:25:48,793 INFO [train2.py:809] (2/4) Epoch 14, batch 150, loss[ctc_loss=0.1608, att_loss=0.2854, loss=0.2604, over 14713.00 frames. utt_duration=404.6 frames, utt_pad_proportion=0.2951, over 146.00 utterances.], tot_loss[ctc_loss=0.0928, att_loss=0.2455, loss=0.215, over 1736474.44 frames. 
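Note on the learning rate across the epoch 13 to epoch 14 boundary above: it steps from about 8.2e-03 down to 7.90e-03 while decaying only slowly within an epoch. The logged values agree to three figures with an Eden-style schedule that decays in both batch count and completed epochs, assuming base_lr=0.05, lr_batches=5000 and lr_epochs=3.5 for this run:

    def eden_lr(base_lr, batch_idx, epochs_done, lr_batches=5000.0, lr_epochs=3.5):
        # Decay factors in the batch and epoch dimensions, both -0.25 power laws.
        batch_factor = ((batch_idx ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epochs_done ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.05, batch_idx=49236, epochs_done=12))  # ~8.41e-03 (epoch 13, earlier in this section)
    print(eden_lr(0.05, batch_idx=51900, epochs_done=13))  # ~7.90e-03 (start of epoch 14)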
utt_duration=1228 frames, utt_pad_proportion=0.06295, over 5664.12 utterances.], batch size: 146, lr: 7.89e-03, grad_scale: 8.0 2023-03-08 09:26:28,850 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5923, 2.9694, 3.7374, 4.5789, 4.1734, 4.1703, 3.0001, 2.5096], device='cuda:2'), covar=tensor([0.0628, 0.1989, 0.0738, 0.0481, 0.0639, 0.0395, 0.1470, 0.2060], device='cuda:2'), in_proj_covar=tensor([0.0174, 0.0214, 0.0191, 0.0206, 0.0204, 0.0163, 0.0200, 0.0185], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 09:26:56,459 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3289, 5.2338, 5.1886, 2.6549, 2.1524, 3.0461, 2.8185, 3.9630], device='cuda:2'), covar=tensor([0.0628, 0.0312, 0.0195, 0.4526, 0.5841, 0.2476, 0.2995, 0.1687], device='cuda:2'), in_proj_covar=tensor([0.0346, 0.0239, 0.0246, 0.0225, 0.0351, 0.0338, 0.0236, 0.0357], device='cuda:2'), out_proj_covar=tensor([1.5275e-04, 8.8419e-05, 1.0621e-04, 9.8406e-05, 1.5032e-04, 1.3506e-04, 9.3965e-05, 1.4913e-04], device='cuda:2') 2023-03-08 09:27:07,212 INFO [train2.py:809] (2/4) Epoch 14, batch 200, loss[ctc_loss=0.1082, att_loss=0.2559, loss=0.2264, over 17011.00 frames. utt_duration=1310 frames, utt_pad_proportion=0.008961, over 52.00 utterances.], tot_loss[ctc_loss=0.09244, att_loss=0.2452, loss=0.2146, over 2074782.29 frames. utt_duration=1221 frames, utt_pad_proportion=0.06365, over 6806.86 utterances.], batch size: 52, lr: 7.89e-03, grad_scale: 8.0 2023-03-08 09:28:06,027 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.576e+02 2.279e+02 2.700e+02 3.291e+02 7.221e+02, threshold=5.400e+02, percent-clipped=2.0 2023-03-08 09:28:27,681 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52038.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:28:30,296 INFO [train2.py:809] (2/4) Epoch 14, batch 250, loss[ctc_loss=0.1259, att_loss=0.2664, loss=0.2383, over 16484.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.00571, over 46.00 utterances.], tot_loss[ctc_loss=0.09334, att_loss=0.2461, loss=0.2155, over 2330414.33 frames. utt_duration=1175 frames, utt_pad_proportion=0.07928, over 7944.32 utterances.], batch size: 46, lr: 7.88e-03, grad_scale: 8.0 2023-03-08 09:28:30,676 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52040.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:28:42,581 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52048.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:29:42,268 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52086.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:29:48,099 INFO [train2.py:809] (2/4) Epoch 14, batch 300, loss[ctc_loss=0.08477, att_loss=0.226, loss=0.1978, over 15954.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007026, over 41.00 utterances.], tot_loss[ctc_loss=0.09398, att_loss=0.2462, loss=0.2157, over 2541399.66 frames. 
utt_duration=1172 frames, utt_pad_proportion=0.0738, over 8681.29 utterances.], batch size: 41, lr: 7.88e-03, grad_scale: 16.0 2023-03-08 09:29:57,181 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52096.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:29:58,963 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0467, 3.8558, 3.1849, 3.5936, 3.9771, 3.6620, 3.1854, 4.3707], device='cuda:2'), covar=tensor([0.0939, 0.0479, 0.0982, 0.0633, 0.0658, 0.0656, 0.0738, 0.0425], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0194, 0.0210, 0.0182, 0.0246, 0.0221, 0.0186, 0.0262], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 09:30:05,843 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52101.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:30:05,868 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0988, 4.5639, 4.5109, 4.8587, 2.4569, 4.7397, 2.5947, 1.7858], device='cuda:2'), covar=tensor([0.0333, 0.0152, 0.0599, 0.0126, 0.1899, 0.0142, 0.1674, 0.1803], device='cuda:2'), in_proj_covar=tensor([0.0158, 0.0128, 0.0257, 0.0121, 0.0223, 0.0114, 0.0226, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 09:30:08,641 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52103.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:30:42,753 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.393e+02 2.191e+02 2.821e+02 3.545e+02 1.097e+03, threshold=5.643e+02, percent-clipped=10.0 2023-03-08 09:30:57,585 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0017, 5.1128, 5.0293, 2.5145, 2.0130, 2.8251, 2.1953, 3.8792], device='cuda:2'), covar=tensor([0.0763, 0.0180, 0.0196, 0.4178, 0.5967, 0.2581, 0.3261, 0.1644], device='cuda:2'), in_proj_covar=tensor([0.0342, 0.0236, 0.0245, 0.0222, 0.0347, 0.0335, 0.0232, 0.0354], device='cuda:2'), out_proj_covar=tensor([1.5083e-04, 8.7472e-05, 1.0574e-04, 9.7124e-05, 1.4871e-04, 1.3390e-04, 9.2291e-05, 1.4764e-04], device='cuda:2') 2023-03-08 09:31:06,326 INFO [train2.py:809] (2/4) Epoch 14, batch 350, loss[ctc_loss=0.08622, att_loss=0.2306, loss=0.2017, over 16123.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006418, over 42.00 utterances.], tot_loss[ctc_loss=0.09355, att_loss=0.2461, loss=0.2156, over 2693780.98 frames. 
utt_duration=1201 frames, utt_pad_proportion=0.0696, over 8980.32 utterances.], batch size: 42, lr: 7.88e-03, grad_scale: 16.0 2023-03-08 09:31:11,456 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4206, 4.3557, 4.1295, 4.2953, 4.7138, 4.5075, 4.2744, 2.4208], device='cuda:2'), covar=tensor([0.0213, 0.0272, 0.0358, 0.0241, 0.0888, 0.0189, 0.0269, 0.2013], device='cuda:2'), in_proj_covar=tensor([0.0131, 0.0141, 0.0146, 0.0159, 0.0346, 0.0129, 0.0130, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 09:31:24,126 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52151.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:31:27,222 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1834, 4.6777, 4.5884, 5.0014, 2.7177, 4.8179, 2.7442, 2.2554], device='cuda:2'), covar=tensor([0.0337, 0.0160, 0.0700, 0.0120, 0.1876, 0.0134, 0.1620, 0.1615], device='cuda:2'), in_proj_covar=tensor([0.0157, 0.0128, 0.0258, 0.0121, 0.0224, 0.0115, 0.0227, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 09:31:27,883 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2023-03-08 09:32:25,894 INFO [train2.py:809] (2/4) Epoch 14, batch 400, loss[ctc_loss=0.1016, att_loss=0.2298, loss=0.2042, over 15996.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007869, over 40.00 utterances.], tot_loss[ctc_loss=0.09391, att_loss=0.246, loss=0.2155, over 2823898.04 frames. utt_duration=1228 frames, utt_pad_proportion=0.06181, over 9213.12 utterances.], batch size: 40, lr: 7.87e-03, grad_scale: 16.0 2023-03-08 09:32:31,136 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52193.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:32:40,746 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52199.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:33:21,771 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+02 2.286e+02 2.626e+02 3.493e+02 1.383e+03, threshold=5.251e+02, percent-clipped=5.0 2023-03-08 09:33:46,078 INFO [train2.py:809] (2/4) Epoch 14, batch 450, loss[ctc_loss=0.0671, att_loss=0.2127, loss=0.1836, over 16187.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.0065, over 41.00 utterances.], tot_loss[ctc_loss=0.09309, att_loss=0.2456, loss=0.2151, over 2934028.53 frames. utt_duration=1240 frames, utt_pad_proportion=0.05437, over 9478.37 utterances.], batch size: 41, lr: 7.87e-03, grad_scale: 16.0 2023-03-08 09:34:09,176 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52254.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:35:04,913 INFO [train2.py:809] (2/4) Epoch 14, batch 500, loss[ctc_loss=0.09502, att_loss=0.2558, loss=0.2237, over 17370.00 frames. utt_duration=881 frames, utt_pad_proportion=0.07746, over 79.00 utterances.], tot_loss[ctc_loss=0.09246, att_loss=0.2451, loss=0.2146, over 3006924.51 frames. utt_duration=1269 frames, utt_pad_proportion=0.04775, over 9490.19 utterances.], batch size: 79, lr: 7.87e-03, grad_scale: 16.0 2023-03-08 09:35:29,371 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.53 vs. 
limit=2.0 2023-03-08 09:35:39,562 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52312.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:36:01,138 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.299e+02 2.275e+02 2.805e+02 3.649e+02 6.948e+02, threshold=5.610e+02, percent-clipped=5.0 2023-03-08 09:36:05,231 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4775, 2.6699, 3.6479, 2.8470, 3.4303, 4.6233, 4.4260, 3.1415], device='cuda:2'), covar=tensor([0.0398, 0.2190, 0.1159, 0.1506, 0.1143, 0.0779, 0.0573, 0.1570], device='cuda:2'), in_proj_covar=tensor([0.0230, 0.0232, 0.0253, 0.0207, 0.0250, 0.0324, 0.0231, 0.0224], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 09:36:23,207 INFO [train2.py:809] (2/4) Epoch 14, batch 550, loss[ctc_loss=0.1218, att_loss=0.273, loss=0.2427, over 17378.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03431, over 63.00 utterances.], tot_loss[ctc_loss=0.09162, att_loss=0.2446, loss=0.214, over 3069740.09 frames. utt_duration=1295 frames, utt_pad_proportion=0.04127, over 9490.95 utterances.], batch size: 63, lr: 7.86e-03, grad_scale: 8.0 2023-03-08 09:37:15,139 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52373.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:37:42,123 INFO [train2.py:809] (2/4) Epoch 14, batch 600, loss[ctc_loss=0.0785, att_loss=0.2236, loss=0.1946, over 16025.00 frames. utt_duration=1604 frames, utt_pad_proportion=0.006196, over 40.00 utterances.], tot_loss[ctc_loss=0.09113, att_loss=0.2442, loss=0.2136, over 3120927.90 frames. utt_duration=1301 frames, utt_pad_proportion=0.03772, over 9605.59 utterances.], batch size: 40, lr: 7.86e-03, grad_scale: 8.0 2023-03-08 09:37:51,398 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52396.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:38:03,148 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52403.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:38:39,775 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.434e+02 2.147e+02 2.714e+02 3.419e+02 5.544e+02, threshold=5.429e+02, percent-clipped=0.0 2023-03-08 09:39:01,290 INFO [train2.py:809] (2/4) Epoch 14, batch 650, loss[ctc_loss=0.1001, att_loss=0.245, loss=0.216, over 16758.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.007102, over 48.00 utterances.], tot_loss[ctc_loss=0.09094, att_loss=0.2447, loss=0.2139, over 3158982.12 frames. 
utt_duration=1273 frames, utt_pad_proportion=0.04386, over 9941.56 utterances.], batch size: 48, lr: 7.85e-03, grad_scale: 8.0 2023-03-08 09:39:18,702 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52451.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:39:51,716 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1347, 5.2135, 5.0692, 2.5371, 2.0355, 2.8596, 2.7123, 3.8574], device='cuda:2'), covar=tensor([0.0678, 0.0274, 0.0220, 0.4357, 0.5766, 0.2539, 0.2671, 0.1756], device='cuda:2'), in_proj_covar=tensor([0.0342, 0.0237, 0.0245, 0.0223, 0.0348, 0.0335, 0.0231, 0.0353], device='cuda:2'), out_proj_covar=tensor([1.5068e-04, 8.8237e-05, 1.0640e-04, 9.7536e-05, 1.4882e-04, 1.3395e-04, 9.1641e-05, 1.4740e-04], device='cuda:2') 2023-03-08 09:40:19,852 INFO [train2.py:809] (2/4) Epoch 14, batch 700, loss[ctc_loss=0.08988, att_loss=0.2532, loss=0.2206, over 17123.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01392, over 56.00 utterances.], tot_loss[ctc_loss=0.09151, att_loss=0.245, loss=0.2143, over 3188145.81 frames. utt_duration=1276 frames, utt_pad_proportion=0.04257, over 10005.56 utterances.], batch size: 56, lr: 7.85e-03, grad_scale: 8.0 2023-03-08 09:41:15,851 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.392e+02 1.961e+02 2.440e+02 3.056e+02 5.134e+02, threshold=4.881e+02, percent-clipped=0.0 2023-03-08 09:41:22,236 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2023-03-08 09:41:27,588 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3102, 4.5992, 4.5204, 4.9679, 2.7466, 4.7759, 2.5339, 2.1219], device='cuda:2'), covar=tensor([0.0276, 0.0153, 0.0661, 0.0092, 0.1717, 0.0110, 0.1672, 0.1576], device='cuda:2'), in_proj_covar=tensor([0.0156, 0.0130, 0.0258, 0.0120, 0.0224, 0.0115, 0.0227, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 09:41:37,731 INFO [train2.py:809] (2/4) Epoch 14, batch 750, loss[ctc_loss=0.06809, att_loss=0.2226, loss=0.1917, over 15775.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.00907, over 38.00 utterances.], tot_loss[ctc_loss=0.09093, att_loss=0.245, loss=0.2142, over 3213471.59 frames. utt_duration=1285 frames, utt_pad_proportion=0.04092, over 10014.86 utterances.], batch size: 38, lr: 7.85e-03, grad_scale: 8.0 2023-03-08 09:41:45,335 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-08 09:41:52,172 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52549.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:42:02,155 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52555.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 09:42:56,741 INFO [train2.py:809] (2/4) Epoch 14, batch 800, loss[ctc_loss=0.1119, att_loss=0.269, loss=0.2376, over 17263.00 frames. utt_duration=1172 frames, utt_pad_proportion=0.0258, over 59.00 utterances.], tot_loss[ctc_loss=0.09148, att_loss=0.2452, loss=0.2144, over 3225786.94 frames. 
utt_duration=1263 frames, utt_pad_proportion=0.04819, over 10227.28 utterances.], batch size: 59, lr: 7.84e-03, grad_scale: 8.0 2023-03-08 09:43:38,092 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52616.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 09:43:39,444 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2503, 2.7074, 3.5341, 2.8622, 3.4054, 4.4008, 4.2216, 3.1383], device='cuda:2'), covar=tensor([0.0415, 0.1853, 0.0997, 0.1346, 0.0949, 0.0911, 0.0571, 0.1258], device='cuda:2'), in_proj_covar=tensor([0.0232, 0.0232, 0.0253, 0.0208, 0.0248, 0.0325, 0.0232, 0.0224], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 09:43:43,823 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2023-03-08 09:43:54,265 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.482e+02 2.071e+02 2.560e+02 3.132e+02 4.899e+02, threshold=5.119e+02, percent-clipped=1.0 2023-03-08 09:44:16,112 INFO [train2.py:809] (2/4) Epoch 14, batch 850, loss[ctc_loss=0.06095, att_loss=0.2142, loss=0.1835, over 15495.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009009, over 36.00 utterances.], tot_loss[ctc_loss=0.09166, att_loss=0.245, loss=0.2143, over 3230126.79 frames. utt_duration=1247 frames, utt_pad_proportion=0.05497, over 10375.95 utterances.], batch size: 36, lr: 7.84e-03, grad_scale: 8.0 2023-03-08 09:44:20,932 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0145, 5.3486, 5.2494, 5.2130, 5.3707, 5.3239, 5.0480, 4.8477], device='cuda:2'), covar=tensor([0.0909, 0.0453, 0.0294, 0.0451, 0.0280, 0.0263, 0.0340, 0.0310], device='cuda:2'), in_proj_covar=tensor([0.0475, 0.0314, 0.0279, 0.0307, 0.0360, 0.0378, 0.0311, 0.0346], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 09:45:00,849 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52668.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:45:32,741 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2302, 4.6929, 4.4732, 4.8424, 2.7741, 4.5822, 2.6762, 2.2310], device='cuda:2'), covar=tensor([0.0312, 0.0145, 0.0648, 0.0127, 0.1740, 0.0168, 0.1609, 0.1525], device='cuda:2'), in_proj_covar=tensor([0.0157, 0.0130, 0.0259, 0.0122, 0.0226, 0.0116, 0.0229, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 09:45:35,372 INFO [train2.py:809] (2/4) Epoch 14, batch 900, loss[ctc_loss=0.07325, att_loss=0.2213, loss=0.1917, over 15876.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009306, over 39.00 utterances.], tot_loss[ctc_loss=0.09089, att_loss=0.2443, loss=0.2136, over 3242335.84 frames. utt_duration=1283 frames, utt_pad_proportion=0.04634, over 10117.06 utterances.], batch size: 39, lr: 7.84e-03, grad_scale: 8.0 2023-03-08 09:45:44,738 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52696.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:46:32,364 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.422e+02 2.184e+02 2.605e+02 3.188e+02 5.544e+02, threshold=5.210e+02, percent-clipped=2.0 2023-03-08 09:46:53,846 INFO [train2.py:809] (2/4) Epoch 14, batch 950, loss[ctc_loss=0.0714, att_loss=0.2149, loss=0.1862, over 14487.00 frames. 
utt_duration=1812 frames, utt_pad_proportion=0.04331, over 32.00 utterances.], tot_loss[ctc_loss=0.09085, att_loss=0.2439, loss=0.2133, over 3244288.58 frames. utt_duration=1279 frames, utt_pad_proportion=0.04874, over 10159.81 utterances.], batch size: 32, lr: 7.83e-03, grad_scale: 8.0 2023-03-08 09:47:00,136 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52744.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:47:13,071 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52752.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:47:59,774 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.81 vs. limit=5.0 2023-03-08 09:48:12,276 INFO [train2.py:809] (2/4) Epoch 14, batch 1000, loss[ctc_loss=0.1032, att_loss=0.2557, loss=0.2252, over 17203.00 frames. utt_duration=872.5 frames, utt_pad_proportion=0.08445, over 79.00 utterances.], tot_loss[ctc_loss=0.09018, att_loss=0.2435, loss=0.2128, over 3253101.53 frames. utt_duration=1290 frames, utt_pad_proportion=0.04455, over 10098.01 utterances.], batch size: 79, lr: 7.83e-03, grad_scale: 8.0 2023-03-08 09:48:48,539 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52813.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:49:09,713 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.532e+02 2.243e+02 2.663e+02 3.401e+02 6.618e+02, threshold=5.327e+02, percent-clipped=3.0 2023-03-08 09:49:31,215 INFO [train2.py:809] (2/4) Epoch 14, batch 1050, loss[ctc_loss=0.0698, att_loss=0.2168, loss=0.1874, over 15616.00 frames. utt_duration=1690 frames, utt_pad_proportion=0.01051, over 37.00 utterances.], tot_loss[ctc_loss=0.08944, att_loss=0.2427, loss=0.2121, over 3252976.92 frames. utt_duration=1293 frames, utt_pad_proportion=0.04428, over 10072.63 utterances.], batch size: 37, lr: 7.82e-03, grad_scale: 4.0 2023-03-08 09:49:45,239 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52849.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:50:49,485 INFO [train2.py:809] (2/4) Epoch 14, batch 1100, loss[ctc_loss=0.08189, att_loss=0.2487, loss=0.2153, over 17284.00 frames. utt_duration=1004 frames, utt_pad_proportion=0.05235, over 69.00 utterances.], tot_loss[ctc_loss=0.09021, att_loss=0.2435, loss=0.2128, over 3259068.04 frames. utt_duration=1268 frames, utt_pad_proportion=0.04883, over 10289.48 utterances.], batch size: 69, lr: 7.82e-03, grad_scale: 4.0 2023-03-08 09:51:00,110 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52897.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:51:23,354 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52911.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 09:51:47,356 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2023-03-08 09:51:48,090 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.396e+02 2.277e+02 2.760e+02 3.306e+02 6.438e+02, threshold=5.521e+02, percent-clipped=4.0 2023-03-08 09:52:07,935 INFO [train2.py:809] (2/4) Epoch 14, batch 1150, loss[ctc_loss=0.1386, att_loss=0.2729, loss=0.2461, over 17348.00 frames. utt_duration=1007 frames, utt_pad_proportion=0.05088, over 69.00 utterances.], tot_loss[ctc_loss=0.09155, att_loss=0.2441, loss=0.2136, over 3264095.93 frames. 
utt_duration=1274 frames, utt_pad_proportion=0.04694, over 10260.91 utterances.], batch size: 69, lr: 7.82e-03, grad_scale: 4.0 2023-03-08 09:52:52,779 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52968.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:53:26,827 INFO [train2.py:809] (2/4) Epoch 14, batch 1200, loss[ctc_loss=0.08492, att_loss=0.2168, loss=0.1904, over 15516.00 frames. utt_duration=1726 frames, utt_pad_proportion=0.007715, over 36.00 utterances.], tot_loss[ctc_loss=0.09222, att_loss=0.2448, loss=0.2143, over 3273695.34 frames. utt_duration=1268 frames, utt_pad_proportion=0.04599, over 10342.67 utterances.], batch size: 36, lr: 7.81e-03, grad_scale: 8.0 2023-03-08 09:54:08,668 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53016.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:54:16,488 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-08 09:54:25,777 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.353e+02 2.184e+02 2.648e+02 3.563e+02 1.260e+03, threshold=5.296e+02, percent-clipped=9.0 2023-03-08 09:54:45,535 INFO [train2.py:809] (2/4) Epoch 14, batch 1250, loss[ctc_loss=0.1065, att_loss=0.2648, loss=0.2331, over 17442.00 frames. utt_duration=1013 frames, utt_pad_proportion=0.04456, over 69.00 utterances.], tot_loss[ctc_loss=0.09229, att_loss=0.2448, loss=0.2143, over 3274835.58 frames. utt_duration=1262 frames, utt_pad_proportion=0.04834, over 10396.28 utterances.], batch size: 69, lr: 7.81e-03, grad_scale: 8.0 2023-03-08 09:55:10,579 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53056.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:55:53,660 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0967, 5.1388, 4.9745, 2.5335, 1.9455, 2.7858, 2.5308, 3.8537], device='cuda:2'), covar=tensor([0.0686, 0.0224, 0.0240, 0.4593, 0.5780, 0.2627, 0.2912, 0.1638], device='cuda:2'), in_proj_covar=tensor([0.0343, 0.0240, 0.0248, 0.0225, 0.0349, 0.0339, 0.0234, 0.0357], device='cuda:2'), out_proj_covar=tensor([1.5082e-04, 8.8837e-05, 1.0703e-04, 9.8935e-05, 1.4927e-04, 1.3503e-04, 9.3048e-05, 1.4856e-04], device='cuda:2') 2023-03-08 09:56:03,883 INFO [train2.py:809] (2/4) Epoch 14, batch 1300, loss[ctc_loss=0.08792, att_loss=0.2427, loss=0.2117, over 17302.00 frames. utt_duration=877.8 frames, utt_pad_proportion=0.08085, over 79.00 utterances.], tot_loss[ctc_loss=0.09188, att_loss=0.2443, loss=0.2139, over 3276016.75 frames. utt_duration=1276 frames, utt_pad_proportion=0.0443, over 10281.64 utterances.], batch size: 79, lr: 7.81e-03, grad_scale: 8.0 2023-03-08 09:56:04,918 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. 
limit=2.0 2023-03-08 09:56:32,030 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53108.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:56:41,959 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0686, 5.3328, 5.3062, 5.2921, 5.3975, 5.3045, 5.0870, 4.8064], device='cuda:2'), covar=tensor([0.0880, 0.0409, 0.0244, 0.0413, 0.0252, 0.0268, 0.0273, 0.0307], device='cuda:2'), in_proj_covar=tensor([0.0473, 0.0316, 0.0280, 0.0310, 0.0364, 0.0381, 0.0310, 0.0349], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 09:56:47,504 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53117.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 09:57:03,198 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.477e+02 2.168e+02 2.572e+02 3.596e+02 7.210e+02, threshold=5.144e+02, percent-clipped=5.0 2023-03-08 09:57:23,233 INFO [train2.py:809] (2/4) Epoch 14, batch 1350, loss[ctc_loss=0.1108, att_loss=0.2658, loss=0.2348, over 17039.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.009521, over 52.00 utterances.], tot_loss[ctc_loss=0.09111, att_loss=0.2447, loss=0.214, over 3285737.50 frames. utt_duration=1277 frames, utt_pad_proportion=0.04127, over 10301.38 utterances.], batch size: 52, lr: 7.80e-03, grad_scale: 8.0 2023-03-08 09:57:56,772 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4204, 2.4848, 5.0416, 3.9001, 2.7589, 4.2638, 4.8359, 4.6644], device='cuda:2'), covar=tensor([0.0264, 0.1974, 0.0145, 0.1068, 0.2114, 0.0251, 0.0110, 0.0224], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0245, 0.0150, 0.0307, 0.0268, 0.0190, 0.0133, 0.0166], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 09:58:00,575 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5124, 2.4103, 5.0450, 3.8984, 2.7489, 4.3271, 4.8570, 4.6844], device='cuda:2'), covar=tensor([0.0252, 0.1925, 0.0143, 0.1069, 0.2117, 0.0232, 0.0108, 0.0226], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0245, 0.0150, 0.0306, 0.0268, 0.0189, 0.0133, 0.0166], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 09:58:41,977 INFO [train2.py:809] (2/4) Epoch 14, batch 1400, loss[ctc_loss=0.1121, att_loss=0.2622, loss=0.2321, over 17370.00 frames. utt_duration=1104 frames, utt_pad_proportion=0.03394, over 63.00 utterances.], tot_loss[ctc_loss=0.092, att_loss=0.2448, loss=0.2142, over 3275232.01 frames. utt_duration=1246 frames, utt_pad_proportion=0.05194, over 10526.22 utterances.], batch size: 63, lr: 7.80e-03, grad_scale: 8.0 2023-03-08 09:59:16,385 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53211.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 09:59:41,363 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.524e+02 2.260e+02 2.643e+02 3.236e+02 5.232e+02, threshold=5.286e+02, percent-clipped=1.0 2023-03-08 10:00:01,538 INFO [train2.py:809] (2/4) Epoch 14, batch 1450, loss[ctc_loss=0.06919, att_loss=0.2257, loss=0.1944, over 16401.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007692, over 44.00 utterances.], tot_loss[ctc_loss=0.09256, att_loss=0.2447, loss=0.2143, over 3270334.43 frames. 
utt_duration=1220 frames, utt_pad_proportion=0.06085, over 10732.53 utterances.], batch size: 44, lr: 7.80e-03, grad_scale: 8.0 2023-03-08 10:00:03,456 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53241.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:00:06,931 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0270, 5.2937, 5.2977, 5.2375, 5.4044, 5.3365, 5.0991, 4.7966], device='cuda:2'), covar=tensor([0.1049, 0.0578, 0.0265, 0.0514, 0.0278, 0.0302, 0.0298, 0.0361], device='cuda:2'), in_proj_covar=tensor([0.0481, 0.0321, 0.0285, 0.0316, 0.0369, 0.0385, 0.0316, 0.0355], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 10:00:23,765 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6497, 5.0379, 5.2406, 5.0370, 5.0482, 5.6072, 5.0641, 5.7032], device='cuda:2'), covar=tensor([0.0715, 0.0667, 0.0748, 0.1237, 0.1863, 0.0946, 0.0667, 0.0671], device='cuda:2'), in_proj_covar=tensor([0.0783, 0.0462, 0.0542, 0.0601, 0.0798, 0.0551, 0.0436, 0.0538], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 10:00:31,931 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53259.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 10:00:42,528 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7368, 3.9435, 3.8679, 3.9553, 4.0087, 3.7846, 3.0073, 3.8721], device='cuda:2'), covar=tensor([0.0120, 0.0117, 0.0137, 0.0076, 0.0086, 0.0112, 0.0605, 0.0213], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0076, 0.0095, 0.0059, 0.0065, 0.0075, 0.0096, 0.0097], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 10:01:21,624 INFO [train2.py:809] (2/4) Epoch 14, batch 1500, loss[ctc_loss=0.09378, att_loss=0.23, loss=0.2027, over 15632.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.009041, over 37.00 utterances.], tot_loss[ctc_loss=0.09247, att_loss=0.2447, loss=0.2143, over 3268954.88 frames. utt_duration=1206 frames, utt_pad_proportion=0.06522, over 10859.49 utterances.], batch size: 37, lr: 7.79e-03, grad_scale: 8.0 2023-03-08 10:01:41,575 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53302.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:01:44,596 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53304.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:02:02,620 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53315.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:02:20,571 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.404e+02 2.202e+02 2.545e+02 3.206e+02 8.434e+02, threshold=5.090e+02, percent-clipped=3.0 2023-03-08 10:02:41,152 INFO [train2.py:809] (2/4) Epoch 14, batch 1550, loss[ctc_loss=0.08319, att_loss=0.2292, loss=0.2, over 16179.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006197, over 41.00 utterances.], tot_loss[ctc_loss=0.09204, att_loss=0.2448, loss=0.2142, over 3267035.07 frames. 
utt_duration=1205 frames, utt_pad_proportion=0.06513, over 10861.53 utterances.], batch size: 41, lr: 7.79e-03, grad_scale: 8.0 2023-03-08 10:03:21,796 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53365.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:03:24,979 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53367.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:03:37,809 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2023-03-08 10:03:38,792 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53376.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:03:47,403 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5850, 4.8150, 4.2721, 4.7421, 4.4349, 4.1204, 4.3520, 4.1479], device='cuda:2'), covar=tensor([0.1368, 0.1263, 0.1132, 0.0961, 0.1241, 0.1804, 0.2514, 0.2628], device='cuda:2'), in_proj_covar=tensor([0.0475, 0.0548, 0.0412, 0.0403, 0.0389, 0.0442, 0.0562, 0.0497], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 10:03:59,967 INFO [train2.py:809] (2/4) Epoch 14, batch 1600, loss[ctc_loss=0.06303, att_loss=0.2226, loss=0.1907, over 16541.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006362, over 45.00 utterances.], tot_loss[ctc_loss=0.09241, att_loss=0.2451, loss=0.2146, over 3270697.56 frames. utt_duration=1192 frames, utt_pad_proportion=0.0674, over 10986.11 utterances.], batch size: 45, lr: 7.78e-03, grad_scale: 8.0 2023-03-08 10:04:29,513 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53408.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:04:36,036 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53412.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:04:56,771 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2023-03-08 10:04:58,720 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.459e+02 2.287e+02 2.681e+02 3.372e+02 7.467e+02, threshold=5.361e+02, percent-clipped=4.0 2023-03-08 10:05:00,644 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53428.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 10:05:19,365 INFO [train2.py:809] (2/4) Epoch 14, batch 1650, loss[ctc_loss=0.05798, att_loss=0.2136, loss=0.1825, over 15507.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.00837, over 36.00 utterances.], tot_loss[ctc_loss=0.09155, att_loss=0.2448, loss=0.2141, over 3275167.90 frames. utt_duration=1228 frames, utt_pad_proportion=0.05796, over 10678.09 utterances.], batch size: 36, lr: 7.78e-03, grad_scale: 8.0 2023-03-08 10:05:44,647 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53456.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:06:38,740 INFO [train2.py:809] (2/4) Epoch 14, batch 1700, loss[ctc_loss=0.09966, att_loss=0.2567, loss=0.2253, over 16470.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006508, over 46.00 utterances.], tot_loss[ctc_loss=0.09109, att_loss=0.2451, loss=0.2143, over 3278606.10 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.05497, over 10595.40 utterances.], batch size: 46, lr: 7.78e-03, grad_scale: 8.0 2023-03-08 10:07:31,345 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0099, 1.9740, 2.1047, 2.7489, 2.5863, 2.0973, 2.2558, 2.8593], device='cuda:2'), covar=tensor([0.1366, 0.4853, 0.3629, 0.1273, 0.1512, 0.2159, 0.3220, 0.1172], device='cuda:2'), in_proj_covar=tensor([0.0087, 0.0095, 0.0096, 0.0081, 0.0085, 0.0077, 0.0099, 0.0068], device='cuda:2'), out_proj_covar=tensor([6.0905e-05, 6.8505e-05, 7.0779e-05, 5.9518e-05, 6.0115e-05, 5.9014e-05, 6.9649e-05, 5.2910e-05], device='cuda:2') 2023-03-08 10:07:33,531 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-08 10:07:37,132 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.475e+02 2.130e+02 2.657e+02 3.171e+02 6.492e+02, threshold=5.314e+02, percent-clipped=1.0 2023-03-08 10:07:57,987 INFO [train2.py:809] (2/4) Epoch 14, batch 1750, loss[ctc_loss=0.1013, att_loss=0.2551, loss=0.2244, over 16342.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005518, over 45.00 utterances.], tot_loss[ctc_loss=0.09086, att_loss=0.2444, loss=0.2137, over 3267824.50 frames. utt_duration=1235 frames, utt_pad_proportion=0.05943, over 10593.05 utterances.], batch size: 45, lr: 7.77e-03, grad_scale: 8.0 2023-03-08 10:08:48,097 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.64 vs. limit=5.0 2023-03-08 10:09:17,125 INFO [train2.py:809] (2/4) Epoch 14, batch 1800, loss[ctc_loss=0.0823, att_loss=0.2383, loss=0.2071, over 16285.00 frames. utt_duration=1517 frames, utt_pad_proportion=0.006747, over 43.00 utterances.], tot_loss[ctc_loss=0.09118, att_loss=0.2451, loss=0.2143, over 3275449.56 frames. utt_duration=1219 frames, utt_pad_proportion=0.0602, over 10762.89 utterances.], batch size: 43, lr: 7.77e-03, grad_scale: 8.0 2023-03-08 10:09:28,935 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53597.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:09:45,877 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7952, 3.6245, 3.0966, 3.4284, 3.9004, 3.6140, 2.9629, 4.2269], device='cuda:2'), covar=tensor([0.1077, 0.0432, 0.1019, 0.0635, 0.0637, 0.0618, 0.0774, 0.0417], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0194, 0.0210, 0.0183, 0.0249, 0.0221, 0.0189, 0.0266], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 10:10:17,824 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 2.192e+02 2.789e+02 3.495e+02 6.708e+02, threshold=5.578e+02, percent-clipped=5.0 2023-03-08 10:10:21,451 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0827, 3.8131, 3.2798, 3.6137, 4.0585, 3.8543, 3.2102, 4.5076], device='cuda:2'), covar=tensor([0.0956, 0.0422, 0.0928, 0.0606, 0.0655, 0.0548, 0.0757, 0.0380], device='cuda:2'), in_proj_covar=tensor([0.0191, 0.0195, 0.0211, 0.0184, 0.0250, 0.0222, 0.0190, 0.0266], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 10:10:39,128 INFO [train2.py:809] (2/4) Epoch 14, batch 1850, loss[ctc_loss=0.09936, att_loss=0.263, loss=0.2303, over 17375.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03424, over 63.00 utterances.], tot_loss[ctc_loss=0.09142, att_loss=0.2452, loss=0.2145, over 3275387.78 frames. 
utt_duration=1213 frames, utt_pad_proportion=0.06264, over 10816.51 utterances.], batch size: 63, lr: 7.77e-03, grad_scale: 8.0 2023-03-08 10:11:12,781 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53660.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:11:29,366 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53671.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:11:36,286 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 10:11:59,262 INFO [train2.py:809] (2/4) Epoch 14, batch 1900, loss[ctc_loss=0.08073, att_loss=0.2383, loss=0.2068, over 15640.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.009209, over 37.00 utterances.], tot_loss[ctc_loss=0.091, att_loss=0.2449, loss=0.2141, over 3277597.99 frames. utt_duration=1233 frames, utt_pad_proportion=0.0576, over 10647.78 utterances.], batch size: 37, lr: 7.76e-03, grad_scale: 8.0 2023-03-08 10:12:28,693 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9681, 3.7557, 3.1980, 3.5637, 4.0536, 3.7802, 2.9421, 4.3988], device='cuda:2'), covar=tensor([0.1029, 0.0548, 0.1076, 0.0635, 0.0659, 0.0598, 0.0890, 0.0424], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0194, 0.0211, 0.0183, 0.0248, 0.0221, 0.0188, 0.0265], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 10:12:32,207 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7995, 6.0843, 5.5096, 5.8797, 5.7438, 5.3705, 5.5525, 5.4463], device='cuda:2'), covar=tensor([0.1459, 0.0873, 0.0914, 0.0708, 0.0692, 0.1371, 0.2223, 0.2220], device='cuda:2'), in_proj_covar=tensor([0.0475, 0.0540, 0.0410, 0.0401, 0.0388, 0.0439, 0.0555, 0.0493], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 10:12:35,609 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53712.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:12:43,133 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1545, 5.2052, 4.9632, 2.9566, 4.8674, 4.7637, 4.2994, 2.7624], device='cuda:2'), covar=tensor([0.0129, 0.0075, 0.0240, 0.1007, 0.0090, 0.0156, 0.0316, 0.1352], device='cuda:2'), in_proj_covar=tensor([0.0065, 0.0090, 0.0086, 0.0106, 0.0075, 0.0100, 0.0095, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 10:12:52,348 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53723.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 10:12:58,261 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.175e+02 2.093e+02 2.485e+02 3.000e+02 6.734e+02, threshold=4.970e+02, percent-clipped=1.0 2023-03-08 10:13:18,791 INFO [train2.py:809] (2/4) Epoch 14, batch 1950, loss[ctc_loss=0.1001, att_loss=0.2554, loss=0.2243, over 16610.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.006186, over 47.00 utterances.], tot_loss[ctc_loss=0.09053, att_loss=0.2445, loss=0.2137, over 3277058.49 frames. 
utt_duration=1249 frames, utt_pad_proportion=0.05415, over 10507.12 utterances.], batch size: 47, lr: 7.76e-03, grad_scale: 8.0 2023-03-08 10:13:50,783 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53760.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:14:37,559 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.87 vs. limit=2.0 2023-03-08 10:14:38,393 INFO [train2.py:809] (2/4) Epoch 14, batch 2000, loss[ctc_loss=0.08297, att_loss=0.249, loss=0.2158, over 17383.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03413, over 63.00 utterances.], tot_loss[ctc_loss=0.08961, att_loss=0.2439, loss=0.213, over 3275466.71 frames. utt_duration=1272 frames, utt_pad_proportion=0.0481, over 10312.18 utterances.], batch size: 63, lr: 7.76e-03, grad_scale: 8.0 2023-03-08 10:14:49,853 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6746, 5.0502, 4.8771, 4.9452, 5.1378, 4.6932, 3.7154, 4.9776], device='cuda:2'), covar=tensor([0.0113, 0.0116, 0.0120, 0.0089, 0.0092, 0.0115, 0.0662, 0.0268], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0077, 0.0096, 0.0059, 0.0065, 0.0076, 0.0096, 0.0097], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 10:15:36,775 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.520e+02 2.201e+02 2.632e+02 3.371e+02 1.571e+03, threshold=5.264e+02, percent-clipped=3.0 2023-03-08 10:15:56,557 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5184, 3.6118, 3.4080, 2.9604, 3.5939, 3.4905, 3.5199, 2.3770], device='cuda:2'), covar=tensor([0.1250, 0.1449, 0.3081, 0.5259, 0.1203, 0.3063, 0.1037, 0.6948], device='cuda:2'), in_proj_covar=tensor([0.0117, 0.0139, 0.0148, 0.0221, 0.0117, 0.0207, 0.0127, 0.0187], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 10:15:57,724 INFO [train2.py:809] (2/4) Epoch 14, batch 2050, loss[ctc_loss=0.07573, att_loss=0.2234, loss=0.1939, over 16539.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.005751, over 45.00 utterances.], tot_loss[ctc_loss=0.09008, att_loss=0.2441, loss=0.2133, over 3268557.46 frames. utt_duration=1234 frames, utt_pad_proportion=0.06002, over 10605.98 utterances.], batch size: 45, lr: 7.75e-03, grad_scale: 8.0 2023-03-08 10:17:18,529 INFO [train2.py:809] (2/4) Epoch 14, batch 2100, loss[ctc_loss=0.08451, att_loss=0.2392, loss=0.2083, over 16191.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.005599, over 41.00 utterances.], tot_loss[ctc_loss=0.0895, att_loss=0.2431, loss=0.2124, over 3264537.99 frames. utt_duration=1254 frames, utt_pad_proportion=0.05735, over 10423.56 utterances.], batch size: 41, lr: 7.75e-03, grad_scale: 8.0 2023-03-08 10:17:27,927 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.55 vs. limit=2.0 2023-03-08 10:17:30,139 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53897.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:18:17,165 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.409e+02 2.088e+02 2.471e+02 3.126e+02 7.099e+02, threshold=4.942e+02, percent-clipped=2.0 2023-03-08 10:18:37,840 INFO [train2.py:809] (2/4) Epoch 14, batch 2150, loss[ctc_loss=0.09593, att_loss=0.2599, loss=0.2271, over 16872.00 frames. 
utt_duration=1379 frames, utt_pad_proportion=0.007479, over 49.00 utterances.], tot_loss[ctc_loss=0.08965, att_loss=0.2433, loss=0.2126, over 3270575.68 frames. utt_duration=1257 frames, utt_pad_proportion=0.05425, over 10418.54 utterances.], batch size: 49, lr: 7.75e-03, grad_scale: 8.0 2023-03-08 10:18:41,277 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0338, 4.9726, 4.8246, 2.9426, 4.6615, 4.6529, 4.3290, 2.6352], device='cuda:2'), covar=tensor([0.0139, 0.0104, 0.0260, 0.1080, 0.0119, 0.0190, 0.0315, 0.1467], device='cuda:2'), in_proj_covar=tensor([0.0065, 0.0090, 0.0086, 0.0106, 0.0075, 0.0100, 0.0095, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 10:18:45,611 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53945.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:19:10,081 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53960.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:19:27,873 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53971.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:19:57,013 INFO [train2.py:809] (2/4) Epoch 14, batch 2200, loss[ctc_loss=0.07645, att_loss=0.2384, loss=0.206, over 16844.00 frames. utt_duration=682 frames, utt_pad_proportion=0.1421, over 99.00 utterances.], tot_loss[ctc_loss=0.09038, att_loss=0.2438, loss=0.2131, over 3277857.14 frames. utt_duration=1257 frames, utt_pad_proportion=0.05202, over 10442.46 utterances.], batch size: 99, lr: 7.74e-03, grad_scale: 8.0 2023-03-08 10:20:30,046 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=54008.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:20:47,494 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=54019.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:20:53,796 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=54023.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 10:20:59,197 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.392e+02 2.188e+02 2.816e+02 3.562e+02 5.650e+02, threshold=5.631e+02, percent-clipped=6.0 2023-03-08 10:21:20,203 INFO [train2.py:809] (2/4) Epoch 14, batch 2250, loss[ctc_loss=0.09368, att_loss=0.2532, loss=0.2213, over 17003.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.008981, over 51.00 utterances.], tot_loss[ctc_loss=0.09088, att_loss=0.2443, loss=0.2136, over 3287250.72 frames. 
utt_duration=1263 frames, utt_pad_proportion=0.04756, over 10424.79 utterances.], batch size: 51, lr: 7.74e-03, grad_scale: 8.0 2023-03-08 10:21:25,112 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6365, 3.7063, 2.8650, 3.1142, 3.8593, 3.5037, 2.5435, 4.0922], device='cuda:2'), covar=tensor([0.1280, 0.0437, 0.1209, 0.0762, 0.0616, 0.0681, 0.1118, 0.0503], device='cuda:2'), in_proj_covar=tensor([0.0191, 0.0194, 0.0213, 0.0184, 0.0251, 0.0222, 0.0190, 0.0266], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 10:21:36,567 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6545, 2.6038, 5.2519, 4.0911, 3.0787, 4.4368, 5.0256, 4.7889], device='cuda:2'), covar=tensor([0.0270, 0.1667, 0.0163, 0.0856, 0.1688, 0.0249, 0.0114, 0.0241], device='cuda:2'), in_proj_covar=tensor([0.0162, 0.0245, 0.0150, 0.0306, 0.0267, 0.0192, 0.0137, 0.0167], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 10:21:54,048 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0628, 5.3358, 4.7144, 5.6042, 4.9130, 5.0915, 5.4774, 5.2985], device='cuda:2'), covar=tensor([0.0571, 0.0362, 0.1102, 0.0247, 0.0362, 0.0234, 0.0332, 0.0203], device='cuda:2'), in_proj_covar=tensor([0.0358, 0.0280, 0.0334, 0.0288, 0.0286, 0.0218, 0.0267, 0.0250], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 10:22:09,521 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=54071.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:22:39,616 INFO [train2.py:809] (2/4) Epoch 14, batch 2300, loss[ctc_loss=0.06489, att_loss=0.2152, loss=0.1851, over 16194.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.00604, over 41.00 utterances.], tot_loss[ctc_loss=0.09094, att_loss=0.2443, loss=0.2136, over 3276765.35 frames. utt_duration=1271 frames, utt_pad_proportion=0.04723, over 10327.67 utterances.], batch size: 41, lr: 7.73e-03, grad_scale: 8.0 2023-03-08 10:23:36,761 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.455e+02 2.251e+02 2.657e+02 3.358e+02 7.808e+02, threshold=5.315e+02, percent-clipped=1.0 2023-03-08 10:23:57,482 INFO [train2.py:809] (2/4) Epoch 14, batch 2350, loss[ctc_loss=0.1476, att_loss=0.2768, loss=0.251, over 14241.00 frames. utt_duration=391.7 frames, utt_pad_proportion=0.3151, over 146.00 utterances.], tot_loss[ctc_loss=0.09174, att_loss=0.2441, loss=0.2136, over 3269776.84 frames. utt_duration=1248 frames, utt_pad_proportion=0.05542, over 10492.36 utterances.], batch size: 146, lr: 7.73e-03, grad_scale: 8.0 2023-03-08 10:24:01,663 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5302, 2.4412, 5.1041, 3.8904, 2.8219, 4.2799, 4.8824, 4.7796], device='cuda:2'), covar=tensor([0.0236, 0.1886, 0.0128, 0.0957, 0.1930, 0.0238, 0.0105, 0.0191], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0244, 0.0150, 0.0305, 0.0266, 0.0190, 0.0137, 0.0165], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 10:25:16,436 INFO [train2.py:809] (2/4) Epoch 14, batch 2400, loss[ctc_loss=0.07323, att_loss=0.2208, loss=0.1913, over 15767.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.008667, over 38.00 utterances.], tot_loss[ctc_loss=0.09083, att_loss=0.2437, loss=0.2131, over 3270704.16 frames. 
utt_duration=1280 frames, utt_pad_proportion=0.04773, over 10230.85 utterances.], batch size: 38, lr: 7.73e-03, grad_scale: 8.0 2023-03-08 10:26:13,537 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.93 vs. limit=5.0 2023-03-08 10:26:15,330 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.443e+02 2.183e+02 2.531e+02 3.194e+02 7.343e+02, threshold=5.062e+02, percent-clipped=2.0 2023-03-08 10:26:20,233 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1215, 5.0143, 4.9563, 2.9829, 4.9289, 4.6292, 4.1211, 2.9243], device='cuda:2'), covar=tensor([0.0103, 0.0079, 0.0212, 0.0992, 0.0077, 0.0178, 0.0350, 0.1203], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0090, 0.0086, 0.0107, 0.0076, 0.0100, 0.0095, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 10:26:35,973 INFO [train2.py:809] (2/4) Epoch 14, batch 2450, loss[ctc_loss=0.1072, att_loss=0.2583, loss=0.2281, over 17129.00 frames. utt_duration=693.6 frames, utt_pad_proportion=0.1286, over 99.00 utterances.], tot_loss[ctc_loss=0.09049, att_loss=0.2434, loss=0.2128, over 3262619.37 frames. utt_duration=1260 frames, utt_pad_proportion=0.05491, over 10368.16 utterances.], batch size: 99, lr: 7.72e-03, grad_scale: 8.0 2023-03-08 10:27:54,984 INFO [train2.py:809] (2/4) Epoch 14, batch 2500, loss[ctc_loss=0.08631, att_loss=0.2587, loss=0.2242, over 17435.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.02938, over 63.00 utterances.], tot_loss[ctc_loss=0.09012, att_loss=0.2437, loss=0.2129, over 3273723.56 frames. utt_duration=1276 frames, utt_pad_proportion=0.04765, over 10271.43 utterances.], batch size: 63, lr: 7.72e-03, grad_scale: 8.0 2023-03-08 10:28:53,990 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.418e+02 2.435e+02 2.776e+02 3.474e+02 9.434e+02, threshold=5.553e+02, percent-clipped=7.0 2023-03-08 10:29:12,659 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.13 vs. limit=5.0 2023-03-08 10:29:14,830 INFO [train2.py:809] (2/4) Epoch 14, batch 2550, loss[ctc_loss=0.06392, att_loss=0.2298, loss=0.1966, over 16128.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005621, over 42.00 utterances.], tot_loss[ctc_loss=0.08977, att_loss=0.244, loss=0.2131, over 3275253.59 frames. utt_duration=1287 frames, utt_pad_proportion=0.04481, over 10190.72 utterances.], batch size: 42, lr: 7.72e-03, grad_scale: 8.0 2023-03-08 10:30:37,341 INFO [train2.py:809] (2/4) Epoch 14, batch 2600, loss[ctc_loss=0.1458, att_loss=0.2781, loss=0.2516, over 13210.00 frames. utt_duration=361 frames, utt_pad_proportion=0.3656, over 147.00 utterances.], tot_loss[ctc_loss=0.09095, att_loss=0.245, loss=0.2142, over 3266271.36 frames. utt_duration=1226 frames, utt_pad_proportion=0.06381, over 10665.80 utterances.], batch size: 147, lr: 7.71e-03, grad_scale: 8.0 2023-03-08 10:31:18,568 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.65 vs. limit=5.0 2023-03-08 10:31:39,670 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.402e+02 2.230e+02 2.611e+02 3.212e+02 5.306e+02, threshold=5.223e+02, percent-clipped=0.0 2023-03-08 10:31:55,743 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2023-03-08 10:32:01,194 INFO [train2.py:809] (2/4) Epoch 14, batch 2650, loss[ctc_loss=0.08839, att_loss=0.2325, loss=0.2036, over 15755.00 frames. 
utt_duration=1660 frames, utt_pad_proportion=0.01012, over 38.00 utterances.], tot_loss[ctc_loss=0.09154, att_loss=0.2451, loss=0.2144, over 3260564.83 frames. utt_duration=1183 frames, utt_pad_proportion=0.07417, over 11039.55 utterances.], batch size: 38, lr: 7.71e-03, grad_scale: 8.0 2023-03-08 10:33:24,001 INFO [train2.py:809] (2/4) Epoch 14, batch 2700, loss[ctc_loss=0.07123, att_loss=0.2174, loss=0.1881, over 15648.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008091, over 37.00 utterances.], tot_loss[ctc_loss=0.09142, att_loss=0.245, loss=0.2143, over 3265145.32 frames. utt_duration=1194 frames, utt_pad_proportion=0.07055, over 10951.49 utterances.], batch size: 37, lr: 7.71e-03, grad_scale: 8.0 2023-03-08 10:34:25,552 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 2.188e+02 2.730e+02 3.256e+02 6.934e+02, threshold=5.460e+02, percent-clipped=5.0 2023-03-08 10:34:47,225 INFO [train2.py:809] (2/4) Epoch 14, batch 2750, loss[ctc_loss=0.08326, att_loss=0.2204, loss=0.193, over 15366.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.009706, over 35.00 utterances.], tot_loss[ctc_loss=0.09192, att_loss=0.2455, loss=0.2148, over 3270485.14 frames. utt_duration=1190 frames, utt_pad_proportion=0.07095, over 11011.11 utterances.], batch size: 35, lr: 7.70e-03, grad_scale: 8.0 2023-03-08 10:35:04,578 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0876, 5.4099, 5.6603, 5.4902, 5.5205, 6.0528, 5.2644, 6.1350], device='cuda:2'), covar=tensor([0.0698, 0.0728, 0.0671, 0.1210, 0.1866, 0.0780, 0.0661, 0.0564], device='cuda:2'), in_proj_covar=tensor([0.0768, 0.0451, 0.0533, 0.0595, 0.0787, 0.0538, 0.0436, 0.0526], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 10:35:29,440 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4382, 2.4775, 4.9888, 3.8005, 2.7911, 4.2028, 4.6712, 4.5620], device='cuda:2'), covar=tensor([0.0178, 0.1815, 0.0115, 0.0982, 0.2026, 0.0250, 0.0108, 0.0202], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0245, 0.0151, 0.0307, 0.0268, 0.0191, 0.0138, 0.0167], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 10:36:10,902 INFO [train2.py:809] (2/4) Epoch 14, batch 2800, loss[ctc_loss=0.08872, att_loss=0.2596, loss=0.2254, over 17052.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.009702, over 53.00 utterances.], tot_loss[ctc_loss=0.09109, att_loss=0.2449, loss=0.2141, over 3269240.40 frames. utt_duration=1202 frames, utt_pad_proportion=0.06834, over 10890.61 utterances.], batch size: 53, lr: 7.70e-03, grad_scale: 8.0 2023-03-08 10:37:12,827 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.585e+02 2.211e+02 2.653e+02 3.453e+02 1.073e+03, threshold=5.307e+02, percent-clipped=4.0 2023-03-08 10:37:34,793 INFO [train2.py:809] (2/4) Epoch 14, batch 2850, loss[ctc_loss=0.08817, att_loss=0.2294, loss=0.2011, over 15512.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.008019, over 36.00 utterances.], tot_loss[ctc_loss=0.09097, att_loss=0.2444, loss=0.2137, over 3270848.72 frames. 
utt_duration=1208 frames, utt_pad_proportion=0.06678, over 10844.42 utterances.], batch size: 36, lr: 7.70e-03, grad_scale: 8.0 2023-03-08 10:38:14,645 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=54664.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 10:38:57,456 INFO [train2.py:809] (2/4) Epoch 14, batch 2900, loss[ctc_loss=0.07255, att_loss=0.2234, loss=0.1932, over 12760.00 frames. utt_duration=1824 frames, utt_pad_proportion=0.03782, over 28.00 utterances.], tot_loss[ctc_loss=0.09066, att_loss=0.2438, loss=0.2132, over 3264427.97 frames. utt_duration=1211 frames, utt_pad_proportion=0.06759, over 10798.06 utterances.], batch size: 28, lr: 7.69e-03, grad_scale: 8.0 2023-03-08 10:39:10,328 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.91 vs. limit=5.0 2023-03-08 10:39:51,089 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7279, 3.5315, 3.4431, 2.7293, 3.5768, 3.6649, 3.5291, 2.0312], device='cuda:2'), covar=tensor([0.1200, 0.1969, 0.3742, 0.8885, 0.5179, 0.3678, 0.1358, 1.1720], device='cuda:2'), in_proj_covar=tensor([0.0115, 0.0138, 0.0151, 0.0218, 0.0114, 0.0205, 0.0128, 0.0186], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 10:39:55,628 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=54725.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 10:39:58,424 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 2.074e+02 2.453e+02 3.116e+02 5.349e+02, threshold=4.905e+02, percent-clipped=1.0 2023-03-08 10:40:19,744 INFO [train2.py:809] (2/4) Epoch 14, batch 2950, loss[ctc_loss=0.09476, att_loss=0.2478, loss=0.2172, over 16542.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006242, over 45.00 utterances.], tot_loss[ctc_loss=0.09056, att_loss=0.2441, loss=0.2134, over 3269263.24 frames. utt_duration=1231 frames, utt_pad_proportion=0.06004, over 10637.48 utterances.], batch size: 45, lr: 7.69e-03, grad_scale: 8.0 2023-03-08 10:41:41,648 INFO [train2.py:809] (2/4) Epoch 14, batch 3000, loss[ctc_loss=0.09213, att_loss=0.259, loss=0.2256, over 17286.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01282, over 55.00 utterances.], tot_loss[ctc_loss=0.09034, att_loss=0.2443, loss=0.2135, over 3270322.31 frames. utt_duration=1253 frames, utt_pad_proportion=0.05487, over 10451.28 utterances.], batch size: 55, lr: 7.69e-03, grad_scale: 8.0 2023-03-08 10:41:41,649 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 10:41:56,406 INFO [train2.py:843] (2/4) Epoch 14, validation: ctc_loss=0.0441, att_loss=0.2368, loss=0.1983, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 10:41:56,407 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 10:42:09,312 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0296, 3.7819, 3.0919, 3.4749, 3.9666, 3.6115, 2.8685, 4.2307], device='cuda:2'), covar=tensor([0.0980, 0.0435, 0.1073, 0.0619, 0.0638, 0.0711, 0.0905, 0.0447], device='cuda:2'), in_proj_covar=tensor([0.0191, 0.0197, 0.0212, 0.0186, 0.0252, 0.0222, 0.0190, 0.0268], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 10:42:21,841 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.80 vs. 
limit=5.0 2023-03-08 10:42:55,793 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 2.238e+02 2.720e+02 3.334e+02 7.498e+02, threshold=5.441e+02, percent-clipped=5.0 2023-03-08 10:43:16,796 INFO [train2.py:809] (2/4) Epoch 14, batch 3050, loss[ctc_loss=0.0607, att_loss=0.2107, loss=0.1807, over 15891.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.008957, over 39.00 utterances.], tot_loss[ctc_loss=0.09042, att_loss=0.2444, loss=0.2136, over 3271051.24 frames. utt_duration=1259 frames, utt_pad_proportion=0.05266, over 10407.18 utterances.], batch size: 39, lr: 7.68e-03, grad_scale: 16.0 2023-03-08 10:43:43,784 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9935, 4.7130, 4.6092, 4.5882, 5.3634, 4.9502, 4.7914, 2.7320], device='cuda:2'), covar=tensor([0.0131, 0.0324, 0.0302, 0.0327, 0.0836, 0.0139, 0.0289, 0.1632], device='cuda:2'), in_proj_covar=tensor([0.0133, 0.0146, 0.0153, 0.0163, 0.0350, 0.0131, 0.0138, 0.0212], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 10:43:52,882 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=54862.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:44:38,672 INFO [train2.py:809] (2/4) Epoch 14, batch 3100, loss[ctc_loss=0.09313, att_loss=0.2645, loss=0.2302, over 17096.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.01635, over 56.00 utterances.], tot_loss[ctc_loss=0.09103, att_loss=0.245, loss=0.2142, over 3273035.76 frames. utt_duration=1249 frames, utt_pad_proportion=0.05587, over 10493.99 utterances.], batch size: 56, lr: 7.68e-03, grad_scale: 16.0 2023-03-08 10:44:53,091 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7335, 5.0183, 4.6175, 5.1509, 4.4310, 4.8748, 5.2207, 4.9861], device='cuda:2'), covar=tensor([0.0609, 0.0315, 0.0827, 0.0248, 0.0522, 0.0247, 0.0240, 0.0206], device='cuda:2'), in_proj_covar=tensor([0.0351, 0.0275, 0.0328, 0.0287, 0.0280, 0.0215, 0.0267, 0.0248], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 10:45:32,223 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=54923.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 10:45:38,028 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.392e+02 2.291e+02 2.720e+02 3.507e+02 7.867e+02, threshold=5.439e+02, percent-clipped=2.0 2023-03-08 10:45:59,422 INFO [train2.py:809] (2/4) Epoch 14, batch 3150, loss[ctc_loss=0.09515, att_loss=0.2408, loss=0.2117, over 16003.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007667, over 40.00 utterances.], tot_loss[ctc_loss=0.09165, att_loss=0.245, loss=0.2144, over 3261622.49 frames. utt_duration=1218 frames, utt_pad_proportion=0.06594, over 10725.13 utterances.], batch size: 40, lr: 7.68e-03, grad_scale: 16.0 2023-03-08 10:47:20,544 INFO [train2.py:809] (2/4) Epoch 14, batch 3200, loss[ctc_loss=0.08197, att_loss=0.2213, loss=0.1934, over 16016.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.006955, over 40.00 utterances.], tot_loss[ctc_loss=0.09106, att_loss=0.2445, loss=0.2138, over 3262289.92 frames. 
utt_duration=1223 frames, utt_pad_proportion=0.06359, over 10687.18 utterances.], batch size: 40, lr: 7.67e-03, grad_scale: 16.0 2023-03-08 10:47:31,407 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0775, 5.0713, 4.9525, 2.2102, 1.9573, 2.6542, 2.1850, 3.7987], device='cuda:2'), covar=tensor([0.0658, 0.0232, 0.0207, 0.4164, 0.5433, 0.2589, 0.3147, 0.1726], device='cuda:2'), in_proj_covar=tensor([0.0333, 0.0233, 0.0239, 0.0216, 0.0338, 0.0329, 0.0230, 0.0348], device='cuda:2'), out_proj_covar=tensor([1.4577e-04, 8.6809e-05, 1.0300e-04, 9.3938e-05, 1.4432e-04, 1.3041e-04, 9.1323e-05, 1.4483e-04], device='cuda:2') 2023-03-08 10:48:12,459 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=55020.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 10:48:23,962 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.330e+02 2.213e+02 2.608e+02 3.360e+02 8.145e+02, threshold=5.216e+02, percent-clipped=2.0 2023-03-08 10:48:38,137 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0839, 3.9027, 3.1600, 3.7042, 4.1599, 3.6723, 3.2013, 4.5270], device='cuda:2'), covar=tensor([0.1038, 0.0413, 0.1234, 0.0615, 0.0661, 0.0708, 0.0823, 0.0408], device='cuda:2'), in_proj_covar=tensor([0.0187, 0.0193, 0.0208, 0.0182, 0.0246, 0.0219, 0.0187, 0.0262], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 10:48:46,130 INFO [train2.py:809] (2/4) Epoch 14, batch 3250, loss[ctc_loss=0.1022, att_loss=0.2303, loss=0.2047, over 15782.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007258, over 38.00 utterances.], tot_loss[ctc_loss=0.09101, att_loss=0.2444, loss=0.2137, over 3271390.36 frames. utt_duration=1235 frames, utt_pad_proportion=0.05843, over 10611.35 utterances.], batch size: 38, lr: 7.67e-03, grad_scale: 16.0 2023-03-08 10:50:12,431 INFO [train2.py:809] (2/4) Epoch 14, batch 3300, loss[ctc_loss=0.09012, att_loss=0.2334, loss=0.2048, over 16271.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007767, over 43.00 utterances.], tot_loss[ctc_loss=0.09137, att_loss=0.2445, loss=0.2139, over 3273171.18 frames. utt_duration=1217 frames, utt_pad_proportion=0.06196, over 10772.25 utterances.], batch size: 43, lr: 7.66e-03, grad_scale: 16.0 2023-03-08 10:50:43,490 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5648, 3.5649, 3.4681, 3.0080, 3.5849, 3.6585, 3.5769, 2.7090], device='cuda:2'), covar=tensor([0.1119, 0.1597, 0.2545, 0.5447, 0.0968, 0.3771, 0.1455, 0.5479], device='cuda:2'), in_proj_covar=tensor([0.0118, 0.0142, 0.0154, 0.0224, 0.0117, 0.0207, 0.0130, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 10:51:15,769 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+02 2.260e+02 2.683e+02 3.437e+02 9.607e+02, threshold=5.365e+02, percent-clipped=9.0 2023-03-08 10:51:37,889 INFO [train2.py:809] (2/4) Epoch 14, batch 3350, loss[ctc_loss=0.07015, att_loss=0.2265, loss=0.1952, over 16182.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006182, over 41.00 utterances.], tot_loss[ctc_loss=0.09042, att_loss=0.2438, loss=0.2131, over 3272609.75 frames. 
utt_duration=1226 frames, utt_pad_proportion=0.06013, over 10690.60 utterances.], batch size: 41, lr: 7.66e-03, grad_scale: 16.0 2023-03-08 10:51:51,521 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=55148.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:52:17,425 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3004, 4.6705, 4.9325, 4.8338, 4.7640, 5.2325, 4.8983, 5.3485], device='cuda:2'), covar=tensor([0.0816, 0.0699, 0.0704, 0.1234, 0.1876, 0.0854, 0.0918, 0.0653], device='cuda:2'), in_proj_covar=tensor([0.0779, 0.0462, 0.0544, 0.0607, 0.0798, 0.0549, 0.0446, 0.0538], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 10:53:01,326 INFO [train2.py:809] (2/4) Epoch 14, batch 3400, loss[ctc_loss=0.06355, att_loss=0.2407, loss=0.2052, over 16991.00 frames. utt_duration=1334 frames, utt_pad_proportion=0.009491, over 51.00 utterances.], tot_loss[ctc_loss=0.08991, att_loss=0.2434, loss=0.2127, over 3274277.81 frames. utt_duration=1241 frames, utt_pad_proportion=0.05486, over 10568.15 utterances.], batch size: 51, lr: 7.66e-03, grad_scale: 16.0 2023-03-08 10:53:34,753 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=55209.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:53:38,105 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=55211.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:53:50,360 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=55218.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 10:54:05,573 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.493e+02 2.249e+02 2.663e+02 3.402e+02 7.529e+02, threshold=5.327e+02, percent-clipped=2.0 2023-03-08 10:54:12,471 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2279, 4.5270, 4.3750, 4.8321, 2.5552, 4.9798, 2.7945, 1.6999], device='cuda:2'), covar=tensor([0.0344, 0.0202, 0.0758, 0.0149, 0.2006, 0.0113, 0.1550, 0.1928], device='cuda:2'), in_proj_covar=tensor([0.0159, 0.0133, 0.0251, 0.0122, 0.0220, 0.0115, 0.0225, 0.0198], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 10:54:27,357 INFO [train2.py:809] (2/4) Epoch 14, batch 3450, loss[ctc_loss=0.09999, att_loss=0.2531, loss=0.2225, over 17039.00 frames. utt_duration=1338 frames, utt_pad_proportion=0.006798, over 51.00 utterances.], tot_loss[ctc_loss=0.09097, att_loss=0.245, loss=0.2142, over 3278819.13 frames. 
utt_duration=1245 frames, utt_pad_proportion=0.05403, over 10545.95 utterances.], batch size: 51, lr: 7.65e-03, grad_scale: 16.0 2023-03-08 10:54:29,396 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2051, 2.8997, 3.1975, 4.2033, 3.6898, 3.7406, 2.7202, 1.7903], device='cuda:2'), covar=tensor([0.0755, 0.1910, 0.1034, 0.0584, 0.0936, 0.0519, 0.1839, 0.2725], device='cuda:2'), in_proj_covar=tensor([0.0171, 0.0213, 0.0190, 0.0200, 0.0206, 0.0165, 0.0198, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 10:54:59,710 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1335, 5.1930, 5.0170, 2.6548, 1.9142, 2.8081, 3.0179, 3.9357], device='cuda:2'), covar=tensor([0.0718, 0.0223, 0.0226, 0.3595, 0.6087, 0.2632, 0.2501, 0.1663], device='cuda:2'), in_proj_covar=tensor([0.0340, 0.0236, 0.0242, 0.0220, 0.0343, 0.0335, 0.0233, 0.0354], device='cuda:2'), out_proj_covar=tensor([1.4895e-04, 8.7999e-05, 1.0418e-04, 9.5789e-05, 1.4661e-04, 1.3287e-04, 9.2566e-05, 1.4706e-04], device='cuda:2') 2023-03-08 10:55:22,018 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=55272.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 10:55:52,466 INFO [train2.py:809] (2/4) Epoch 14, batch 3500, loss[ctc_loss=0.07009, att_loss=0.2294, loss=0.1975, over 16110.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.007126, over 42.00 utterances.], tot_loss[ctc_loss=0.09051, att_loss=0.2448, loss=0.214, over 3284562.65 frames. utt_duration=1263 frames, utt_pad_proportion=0.04763, over 10414.66 utterances.], batch size: 42, lr: 7.65e-03, grad_scale: 8.0 2023-03-08 10:56:18,166 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8219, 3.6901, 3.6932, 3.1933, 3.7162, 3.7259, 3.7604, 2.8459], device='cuda:2'), covar=tensor([0.1076, 0.1688, 0.2182, 0.4828, 0.1084, 0.6354, 0.1014, 0.5736], device='cuda:2'), in_proj_covar=tensor([0.0120, 0.0144, 0.0154, 0.0223, 0.0117, 0.0209, 0.0130, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 10:56:43,454 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=55320.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 10:56:56,799 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.444e+02 2.024e+02 2.386e+02 2.873e+02 9.037e+02, threshold=4.771e+02, percent-clipped=2.0 2023-03-08 10:57:17,392 INFO [train2.py:809] (2/4) Epoch 14, batch 3550, loss[ctc_loss=0.09152, att_loss=0.2443, loss=0.2137, over 16966.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.00767, over 50.00 utterances.], tot_loss[ctc_loss=0.09049, att_loss=0.2444, loss=0.2136, over 3277177.93 frames. utt_duration=1264 frames, utt_pad_proportion=0.0476, over 10381.33 utterances.], batch size: 50, lr: 7.65e-03, grad_scale: 8.0 2023-03-08 10:58:05,150 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=55368.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 10:58:21,695 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.76 vs. limit=5.0 2023-03-08 10:58:41,900 INFO [train2.py:809] (2/4) Epoch 14, batch 3600, loss[ctc_loss=0.08149, att_loss=0.2307, loss=0.2008, over 15890.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009129, over 39.00 utterances.], tot_loss[ctc_loss=0.09036, att_loss=0.245, loss=0.2141, over 3275395.59 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.05308, over 10556.24 utterances.], batch size: 39, lr: 7.64e-03, grad_scale: 8.0 2023-03-08 10:59:40,639 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6031, 2.1648, 5.0317, 4.1206, 2.9149, 4.3710, 4.7247, 4.7228], device='cuda:2'), covar=tensor([0.0142, 0.1656, 0.0126, 0.0763, 0.1767, 0.0184, 0.0108, 0.0147], device='cuda:2'), in_proj_covar=tensor([0.0163, 0.0247, 0.0154, 0.0311, 0.0269, 0.0192, 0.0140, 0.0169], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 10:59:46,533 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.378e+02 2.040e+02 2.412e+02 2.968e+02 1.250e+03, threshold=4.823e+02, percent-clipped=3.0 2023-03-08 11:00:07,019 INFO [train2.py:809] (2/4) Epoch 14, batch 3650, loss[ctc_loss=0.08391, att_loss=0.2415, loss=0.21, over 16408.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.007132, over 44.00 utterances.], tot_loss[ctc_loss=0.08915, att_loss=0.2445, loss=0.2134, over 3283495.25 frames. utt_duration=1265 frames, utt_pad_proportion=0.04626, over 10398.64 utterances.], batch size: 44, lr: 7.64e-03, grad_scale: 8.0 2023-03-08 11:00:49,517 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4177, 4.9164, 4.7013, 4.8471, 4.9959, 4.5903, 3.5956, 4.8575], device='cuda:2'), covar=tensor([0.0111, 0.0101, 0.0118, 0.0078, 0.0100, 0.0110, 0.0629, 0.0183], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0077, 0.0095, 0.0058, 0.0065, 0.0076, 0.0095, 0.0097], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 11:01:08,350 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7541, 3.0272, 3.8827, 4.7165, 4.0533, 4.1669, 3.0001, 2.1675], device='cuda:2'), covar=tensor([0.0548, 0.2014, 0.0759, 0.0475, 0.0900, 0.0377, 0.1469, 0.2297], device='cuda:2'), in_proj_covar=tensor([0.0170, 0.0212, 0.0188, 0.0199, 0.0204, 0.0162, 0.0197, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:01:29,821 INFO [train2.py:809] (2/4) Epoch 14, batch 3700, loss[ctc_loss=0.0713, att_loss=0.2378, loss=0.2045, over 16331.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005989, over 45.00 utterances.], tot_loss[ctc_loss=0.09013, att_loss=0.2447, loss=0.2138, over 3273960.34 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.04996, over 10437.24 utterances.], batch size: 45, lr: 7.64e-03, grad_scale: 8.0 2023-03-08 11:01:46,818 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7546, 3.5988, 2.8776, 3.3218, 3.7791, 3.5093, 2.4178, 4.0382], device='cuda:2'), covar=tensor([0.1010, 0.0454, 0.1040, 0.0604, 0.0614, 0.0606, 0.1047, 0.0423], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0197, 0.0210, 0.0187, 0.0253, 0.0224, 0.0191, 0.0270], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 11:01:53,189 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=55504.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:02:16,964 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=55518.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 11:02:33,334 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.607e+02 2.162e+02 2.668e+02 3.280e+02 7.175e+02, threshold=5.335e+02, percent-clipped=8.0 2023-03-08 11:02:53,079 INFO [train2.py:809] (2/4) Epoch 14, batch 3750, loss[ctc_loss=0.1326, att_loss=0.2738, loss=0.2456, over 17304.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01102, over 55.00 utterances.], tot_loss[ctc_loss=0.09084, att_loss=0.2445, loss=0.2138, over 3264900.59 frames. utt_duration=1220 frames, utt_pad_proportion=0.06105, over 10721.89 utterances.], batch size: 55, lr: 7.63e-03, grad_scale: 8.0 2023-03-08 11:02:53,568 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9364, 4.8864, 4.9031, 4.7942, 5.3261, 4.9526, 4.8021, 2.0864], device='cuda:2'), covar=tensor([0.0097, 0.0149, 0.0130, 0.0130, 0.0796, 0.0106, 0.0178, 0.2134], device='cuda:2'), in_proj_covar=tensor([0.0135, 0.0147, 0.0155, 0.0166, 0.0356, 0.0134, 0.0140, 0.0217], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 11:03:36,233 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=55566.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:03:37,993 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=55567.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:04:16,307 INFO [train2.py:809] (2/4) Epoch 14, batch 3800, loss[ctc_loss=0.1099, att_loss=0.2306, loss=0.2065, over 15631.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008977, over 37.00 utterances.], tot_loss[ctc_loss=0.09136, att_loss=0.2449, loss=0.2142, over 3260734.30 frames. utt_duration=1230 frames, utt_pad_proportion=0.05978, over 10617.31 utterances.], batch size: 37, lr: 7.63e-03, grad_scale: 8.0 2023-03-08 11:05:20,070 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.413e+02 2.212e+02 2.620e+02 3.323e+02 8.677e+02, threshold=5.239e+02, percent-clipped=3.0 2023-03-08 11:05:40,105 INFO [train2.py:809] (2/4) Epoch 14, batch 3850, loss[ctc_loss=0.0715, att_loss=0.2247, loss=0.1941, over 16184.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.005967, over 41.00 utterances.], tot_loss[ctc_loss=0.09067, att_loss=0.2444, loss=0.2136, over 3270050.87 frames. utt_duration=1236 frames, utt_pad_proportion=0.05616, over 10591.89 utterances.], batch size: 41, lr: 7.63e-03, grad_scale: 8.0 2023-03-08 11:07:00,900 INFO [train2.py:809] (2/4) Epoch 14, batch 3900, loss[ctc_loss=0.07456, att_loss=0.217, loss=0.1885, over 15765.00 frames. 
utt_duration=1661 frames, utt_pad_proportion=0.008934, over 38.00 utterances.], tot_loss[ctc_loss=0.08969, att_loss=0.2441, loss=0.2132, over 3273167.98 frames. utt_duration=1229 frames, utt_pad_proportion=0.05882, over 10667.62 utterances.], batch size: 38, lr: 7.62e-03, grad_scale: 8.0 2023-03-08 11:08:01,631 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.280e+02 2.167e+02 2.705e+02 3.104e+02 1.128e+03, threshold=5.409e+02, percent-clipped=4.0 2023-03-08 11:08:20,153 INFO [train2.py:809] (2/4) Epoch 14, batch 3950, loss[ctc_loss=0.09114, att_loss=0.2577, loss=0.2244, over 17060.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.009166, over 53.00 utterances.], tot_loss[ctc_loss=0.09027, att_loss=0.2445, loss=0.2137, over 3272466.81 frames. utt_duration=1226 frames, utt_pad_proportion=0.05908, over 10687.97 utterances.], batch size: 53, lr: 7.62e-03, grad_scale: 8.0 2023-03-08 11:08:32,858 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=55748.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:09:03,629 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5680, 2.7736, 3.5545, 4.5303, 3.9949, 4.1045, 3.0365, 2.1431], device='cuda:2'), covar=tensor([0.0553, 0.2193, 0.0877, 0.0538, 0.0754, 0.0426, 0.1383, 0.2361], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0215, 0.0192, 0.0202, 0.0207, 0.0164, 0.0198, 0.0183], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:09:39,037 INFO [train2.py:809] (2/4) Epoch 15, batch 0, loss[ctc_loss=0.1058, att_loss=0.2437, loss=0.2161, over 16194.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.006086, over 41.00 utterances.], tot_loss[ctc_loss=0.1058, att_loss=0.2437, loss=0.2161, over 16194.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.006086, over 41.00 utterances.], batch size: 41, lr: 7.36e-03, grad_scale: 8.0 2023-03-08 11:09:39,038 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 11:09:51,761 INFO [train2.py:843] (2/4) Epoch 15, validation: ctc_loss=0.04404, att_loss=0.2365, loss=0.198, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
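A note on how the three logged loss values relate (a hedged sketch, inferred from the numbers rather than stated anywhere in the log): each report above is consistent with an interpolation loss = 0.2 * ctc_loss + 0.8 * att_loss, e.g. 0.2 * 0.04404 + 0.8 * 0.2365 ≈ 0.198 for the Epoch 15 validation entry just above. The weight 0.8 (the quantity icefall-style recipes usually call att_rate) is an assumption recovered from the logged values. A minimal Python check:

    # Hypothetical helper reproducing the logged combined loss from its parts.
    # ATT_RATE = 0.8 is an assumption inferred from the logged numbers, not read from the log.
    ATT_RATE = 0.8

    def combined_loss(ctc_loss: float, att_loss: float, att_rate: float = ATT_RATE) -> float:
        """Interpolate CTC and attention losses: (1 - att_rate) * ctc + att_rate * att."""
        return (1.0 - att_rate) * ctc_loss + att_rate * att_loss

    # Check against the Epoch 15 validation entry above (ctc_loss=0.04404, att_loss=0.2365, loss=0.198).
    print(round(combined_loss(0.04404, 0.2365), 3))  # -> 0.198

The same relation reproduces the running tot_loss values elsewhere in this section (for example 0.2 * 0.08838 + 0.8 * 0.2441 ≈ 0.213 for the Epoch 15, batch 250 entry).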
2023-03-08 11:09:51,762 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 11:10:43,424 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=55804.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:10:44,868 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1128, 5.3779, 5.6225, 5.5284, 5.5383, 6.0614, 5.2164, 6.1997], device='cuda:2'), covar=tensor([0.0635, 0.0677, 0.0728, 0.1123, 0.1829, 0.0793, 0.0615, 0.0536], device='cuda:2'), in_proj_covar=tensor([0.0780, 0.0464, 0.0547, 0.0606, 0.0802, 0.0556, 0.0453, 0.0539], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 11:10:46,717 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6535, 3.7056, 3.6005, 3.8821, 2.7851, 3.6870, 2.7879, 2.0862], device='cuda:2'), covar=tensor([0.0406, 0.0247, 0.0649, 0.0193, 0.1404, 0.0226, 0.1119, 0.1376], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0135, 0.0254, 0.0124, 0.0221, 0.0116, 0.0226, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:10:51,504 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=55809.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:11:05,770 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7489, 1.9292, 1.9652, 2.0413, 2.4190, 2.4869, 2.0583, 2.8742], device='cuda:2'), covar=tensor([0.2141, 0.4829, 0.2530, 0.2553, 0.2216, 0.1769, 0.4205, 0.1909], device='cuda:2'), in_proj_covar=tensor([0.0088, 0.0096, 0.0098, 0.0084, 0.0087, 0.0079, 0.0102, 0.0069], device='cuda:2'), out_proj_covar=tensor([6.3082e-05, 7.0094e-05, 7.2881e-05, 6.1847e-05, 6.1910e-05, 6.1199e-05, 7.2331e-05, 5.4228e-05], device='cuda:2') 2023-03-08 11:11:13,890 INFO [train2.py:809] (2/4) Epoch 15, batch 50, loss[ctc_loss=0.09122, att_loss=0.2537, loss=0.2212, over 17097.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.01559, over 56.00 utterances.], tot_loss[ctc_loss=0.09415, att_loss=0.2461, loss=0.2157, over 736773.53 frames. utt_duration=1124 frames, utt_pad_proportion=0.0846, over 2624.94 utterances.], batch size: 56, lr: 7.35e-03, grad_scale: 8.0 2023-03-08 11:11:21,610 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.330e+02 2.261e+02 2.719e+02 3.409e+02 7.287e+02, threshold=5.439e+02, percent-clipped=2.0 2023-03-08 11:11:42,963 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.90 vs. 
limit=5.0 2023-03-08 11:11:55,437 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=55848.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:12:01,375 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=55852.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:12:25,427 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=55867.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:12:29,294 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9360, 5.2442, 4.7197, 5.2915, 4.6362, 5.0233, 5.3989, 5.1417], device='cuda:2'), covar=tensor([0.0561, 0.0270, 0.0812, 0.0277, 0.0421, 0.0198, 0.0199, 0.0176], device='cuda:2'), in_proj_covar=tensor([0.0359, 0.0280, 0.0336, 0.0296, 0.0284, 0.0219, 0.0272, 0.0250], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 11:12:35,388 INFO [train2.py:809] (2/4) Epoch 15, batch 100, loss[ctc_loss=0.0871, att_loss=0.2255, loss=0.1978, over 15526.00 frames. utt_duration=1727 frames, utt_pad_proportion=0.007044, over 36.00 utterances.], tot_loss[ctc_loss=0.09285, att_loss=0.2462, loss=0.2155, over 1301632.79 frames. utt_duration=1174 frames, utt_pad_proportion=0.07125, over 4440.13 utterances.], batch size: 36, lr: 7.35e-03, grad_scale: 8.0 2023-03-08 11:13:35,186 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=55909.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:13:44,375 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=55915.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:13:48,623 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.01 vs. limit=5.0 2023-03-08 11:13:57,609 INFO [train2.py:809] (2/4) Epoch 15, batch 150, loss[ctc_loss=0.1101, att_loss=0.2646, loss=0.2337, over 17424.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.03164, over 63.00 utterances.], tot_loss[ctc_loss=0.08986, att_loss=0.2448, loss=0.2138, over 1739172.18 frames. utt_duration=1227 frames, utt_pad_proportion=0.05859, over 5675.69 utterances.], batch size: 63, lr: 7.35e-03, grad_scale: 8.0 2023-03-08 11:14:05,582 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.482e+02 2.292e+02 2.863e+02 3.453e+02 8.582e+02, threshold=5.726e+02, percent-clipped=4.0 2023-03-08 11:15:19,355 INFO [train2.py:809] (2/4) Epoch 15, batch 200, loss[ctc_loss=0.0616, att_loss=0.2382, loss=0.2029, over 16480.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006026, over 46.00 utterances.], tot_loss[ctc_loss=0.08781, att_loss=0.2438, loss=0.2126, over 2081816.08 frames. utt_duration=1270 frames, utt_pad_proportion=0.04766, over 6564.76 utterances.], batch size: 46, lr: 7.34e-03, grad_scale: 8.0 2023-03-08 11:16:45,492 INFO [train2.py:809] (2/4) Epoch 15, batch 250, loss[ctc_loss=0.08688, att_loss=0.2446, loss=0.2131, over 17339.00 frames. utt_duration=1102 frames, utt_pad_proportion=0.03561, over 63.00 utterances.], tot_loss[ctc_loss=0.08838, att_loss=0.2441, loss=0.213, over 2345677.24 frames. 
utt_duration=1237 frames, utt_pad_proportion=0.05604, over 7592.53 utterances.], batch size: 63, lr: 7.34e-03, grad_scale: 8.0 2023-03-08 11:16:53,154 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.371e+02 2.127e+02 2.516e+02 2.865e+02 8.492e+02, threshold=5.032e+02, percent-clipped=2.0 2023-03-08 11:17:46,770 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4642, 5.7648, 5.2640, 5.5897, 5.3774, 5.0727, 5.2047, 5.0744], device='cuda:2'), covar=tensor([0.1482, 0.0963, 0.0866, 0.0803, 0.0947, 0.1498, 0.2364, 0.2524], device='cuda:2'), in_proj_covar=tensor([0.0468, 0.0544, 0.0416, 0.0405, 0.0390, 0.0439, 0.0563, 0.0495], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 11:18:06,089 INFO [train2.py:809] (2/4) Epoch 15, batch 300, loss[ctc_loss=0.1042, att_loss=0.2539, loss=0.224, over 16545.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.005897, over 45.00 utterances.], tot_loss[ctc_loss=0.08857, att_loss=0.2437, loss=0.2127, over 2552499.77 frames. utt_duration=1248 frames, utt_pad_proportion=0.05455, over 8193.58 utterances.], batch size: 45, lr: 7.34e-03, grad_scale: 8.0 2023-03-08 11:18:24,621 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7567, 5.9772, 5.4585, 5.7554, 5.6260, 5.2267, 5.4887, 5.2009], device='cuda:2'), covar=tensor([0.1113, 0.0880, 0.0763, 0.0762, 0.0771, 0.1402, 0.2064, 0.2450], device='cuda:2'), in_proj_covar=tensor([0.0466, 0.0541, 0.0413, 0.0405, 0.0389, 0.0437, 0.0561, 0.0492], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 11:18:45,819 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 11:18:55,707 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=56104.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:19:25,555 INFO [train2.py:809] (2/4) Epoch 15, batch 350, loss[ctc_loss=0.0764, att_loss=0.2353, loss=0.2035, over 16535.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006722, over 45.00 utterances.], tot_loss[ctc_loss=0.08907, att_loss=0.2426, loss=0.2119, over 2700946.38 frames. utt_duration=1232 frames, utt_pad_proportion=0.06103, over 8779.01 utterances.], batch size: 45, lr: 7.34e-03, grad_scale: 8.0 2023-03-08 11:19:34,067 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.479e+02 2.148e+02 2.622e+02 3.077e+02 6.565e+02, threshold=5.243e+02, percent-clipped=3.0 2023-03-08 11:20:01,861 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56145.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:20:45,056 INFO [train2.py:809] (2/4) Epoch 15, batch 400, loss[ctc_loss=0.08052, att_loss=0.2175, loss=0.1901, over 15901.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.007653, over 39.00 utterances.], tot_loss[ctc_loss=0.0885, att_loss=0.2417, loss=0.2111, over 2827084.16 frames. 
utt_duration=1278 frames, utt_pad_proportion=0.05029, over 8856.63 utterances.], batch size: 39, lr: 7.33e-03, grad_scale: 8.0 2023-03-08 11:21:34,924 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=56204.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:21:38,235 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=56206.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:22:00,257 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8932, 5.1574, 5.4431, 5.2547, 5.3447, 5.8351, 5.0977, 5.9441], device='cuda:2'), covar=tensor([0.0686, 0.0847, 0.0722, 0.1218, 0.1824, 0.0913, 0.0688, 0.0653], device='cuda:2'), in_proj_covar=tensor([0.0786, 0.0469, 0.0545, 0.0612, 0.0809, 0.0559, 0.0455, 0.0542], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 11:22:02,740 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-08 11:22:04,490 INFO [train2.py:809] (2/4) Epoch 15, batch 450, loss[ctc_loss=0.07836, att_loss=0.2453, loss=0.2119, over 16314.00 frames. utt_duration=1451 frames, utt_pad_proportion=0.007281, over 45.00 utterances.], tot_loss[ctc_loss=0.08853, att_loss=0.2424, loss=0.2116, over 2926657.08 frames. utt_duration=1284 frames, utt_pad_proportion=0.04798, over 9125.61 utterances.], batch size: 45, lr: 7.33e-03, grad_scale: 8.0 2023-03-08 11:22:12,115 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.329e+02 2.121e+02 2.729e+02 3.274e+02 6.212e+02, threshold=5.457e+02, percent-clipped=2.0 2023-03-08 11:23:15,442 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5742, 2.5837, 5.0251, 3.9229, 2.8946, 4.2270, 4.8401, 4.6377], device='cuda:2'), covar=tensor([0.0236, 0.1671, 0.0188, 0.0915, 0.1906, 0.0274, 0.0115, 0.0229], device='cuda:2'), in_proj_covar=tensor([0.0160, 0.0242, 0.0153, 0.0305, 0.0265, 0.0190, 0.0138, 0.0169], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 11:23:23,983 INFO [train2.py:809] (2/4) Epoch 15, batch 500, loss[ctc_loss=0.1039, att_loss=0.232, loss=0.2064, over 15764.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.008531, over 38.00 utterances.], tot_loss[ctc_loss=0.08774, att_loss=0.2417, loss=0.2109, over 3000462.30 frames. utt_duration=1259 frames, utt_pad_proportion=0.05393, over 9541.77 utterances.], batch size: 38, lr: 7.33e-03, grad_scale: 8.0 2023-03-08 11:23:52,899 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.91 vs. limit=5.0 2023-03-08 11:24:43,547 INFO [train2.py:809] (2/4) Epoch 15, batch 550, loss[ctc_loss=0.1716, att_loss=0.2814, loss=0.2594, over 14746.00 frames. utt_duration=402.7 frames, utt_pad_proportion=0.2947, over 147.00 utterances.], tot_loss[ctc_loss=0.08825, att_loss=0.2424, loss=0.2116, over 3066076.56 frames. utt_duration=1271 frames, utt_pad_proportion=0.04966, over 9661.47 utterances.], batch size: 147, lr: 7.32e-03, grad_scale: 8.0 2023-03-08 11:24:51,297 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.338e+02 2.199e+02 2.714e+02 3.543e+02 1.087e+03, threshold=5.428e+02, percent-clipped=4.0 2023-03-08 11:26:03,950 INFO [train2.py:809] (2/4) Epoch 15, batch 600, loss[ctc_loss=0.07284, att_loss=0.2188, loss=0.1896, over 15998.00 frames. 
utt_duration=1601 frames, utt_pad_proportion=0.0079, over 40.00 utterances.], tot_loss[ctc_loss=0.08951, att_loss=0.2436, loss=0.2128, over 3116142.49 frames. utt_duration=1241 frames, utt_pad_proportion=0.05622, over 10057.23 utterances.], batch size: 40, lr: 7.32e-03, grad_scale: 8.0 2023-03-08 11:26:11,705 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2023-03-08 11:26:15,791 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5668, 3.4368, 3.4898, 2.9437, 3.4990, 3.4963, 3.5191, 2.3307], device='cuda:2'), covar=tensor([0.1213, 0.2584, 0.2192, 0.5045, 0.1223, 0.3961, 0.1148, 0.6068], device='cuda:2'), in_proj_covar=tensor([0.0125, 0.0146, 0.0157, 0.0227, 0.0119, 0.0210, 0.0133, 0.0191], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 11:26:54,540 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=56404.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:27:23,612 INFO [train2.py:809] (2/4) Epoch 15, batch 650, loss[ctc_loss=0.07804, att_loss=0.238, loss=0.206, over 16380.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.007551, over 44.00 utterances.], tot_loss[ctc_loss=0.09007, att_loss=0.2439, loss=0.2132, over 3151776.10 frames. utt_duration=1241 frames, utt_pad_proportion=0.05527, over 10174.35 utterances.], batch size: 44, lr: 7.32e-03, grad_scale: 8.0 2023-03-08 11:27:31,801 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.524e+02 2.098e+02 2.434e+02 3.232e+02 5.405e+02, threshold=4.867e+02, percent-clipped=0.0 2023-03-08 11:28:09,902 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=56452.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:28:33,796 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6844, 5.0713, 4.0586, 5.2558, 4.4040, 4.9550, 5.0730, 4.9966], device='cuda:2'), covar=tensor([0.0645, 0.0379, 0.1380, 0.0324, 0.0459, 0.0277, 0.0379, 0.0230], device='cuda:2'), in_proj_covar=tensor([0.0358, 0.0280, 0.0336, 0.0296, 0.0287, 0.0221, 0.0274, 0.0250], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 11:28:42,741 INFO [train2.py:809] (2/4) Epoch 15, batch 700, loss[ctc_loss=0.0896, att_loss=0.253, loss=0.2203, over 16628.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005439, over 47.00 utterances.], tot_loss[ctc_loss=0.09078, att_loss=0.2445, loss=0.2137, over 3177821.48 frames. utt_duration=1227 frames, utt_pad_proportion=0.0604, over 10370.61 utterances.], batch size: 47, lr: 7.31e-03, grad_scale: 8.0 2023-03-08 11:29:28,152 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=56501.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:29:33,142 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=56504.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:30:02,715 INFO [train2.py:809] (2/4) Epoch 15, batch 750, loss[ctc_loss=0.08646, att_loss=0.2514, loss=0.2184, over 16534.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006542, over 45.00 utterances.], tot_loss[ctc_loss=0.09023, att_loss=0.2439, loss=0.2131, over 3202548.35 frames. 
utt_duration=1237 frames, utt_pad_proportion=0.05628, over 10366.03 utterances.], batch size: 45, lr: 7.31e-03, grad_scale: 8.0 2023-03-08 11:30:03,080 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56523.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 11:30:11,024 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.244e+02 2.381e+02 2.975e+02 3.740e+02 8.505e+02, threshold=5.949e+02, percent-clipped=7.0 2023-03-08 11:30:49,607 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=56552.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:31:03,225 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.65 vs. limit=5.0 2023-03-08 11:31:15,198 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3031, 2.6744, 3.1833, 4.3885, 3.8566, 3.9336, 2.6999, 1.8198], device='cuda:2'), covar=tensor([0.0722, 0.2368, 0.1102, 0.0582, 0.0892, 0.0453, 0.1819, 0.2712], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0215, 0.0190, 0.0200, 0.0207, 0.0163, 0.0200, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:31:22,468 INFO [train2.py:809] (2/4) Epoch 15, batch 800, loss[ctc_loss=0.1128, att_loss=0.2495, loss=0.2221, over 16265.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.008057, over 43.00 utterances.], tot_loss[ctc_loss=0.09026, att_loss=0.2446, loss=0.2137, over 3224211.18 frames. utt_duration=1216 frames, utt_pad_proportion=0.05855, over 10615.60 utterances.], batch size: 43, lr: 7.31e-03, grad_scale: 8.0 2023-03-08 11:31:42,512 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=56584.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 11:32:16,013 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1018, 5.1429, 5.0111, 2.3437, 1.9459, 2.5567, 2.2540, 3.7719], device='cuda:2'), covar=tensor([0.0741, 0.0246, 0.0202, 0.4401, 0.5726, 0.2809, 0.3331, 0.1881], device='cuda:2'), in_proj_covar=tensor([0.0340, 0.0238, 0.0243, 0.0224, 0.0346, 0.0335, 0.0232, 0.0354], device='cuda:2'), out_proj_covar=tensor([1.4846e-04, 8.9278e-05, 1.0496e-04, 9.7803e-05, 1.4696e-04, 1.3277e-04, 9.2795e-05, 1.4698e-04], device='cuda:2') 2023-03-08 11:32:20,174 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.96 vs. limit=2.0 2023-03-08 11:32:44,276 INFO [train2.py:809] (2/4) Epoch 15, batch 850, loss[ctc_loss=0.09498, att_loss=0.2526, loss=0.2211, over 17055.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008723, over 52.00 utterances.], tot_loss[ctc_loss=0.0898, att_loss=0.244, loss=0.2131, over 3224175.60 frames. 
utt_duration=1199 frames, utt_pad_proportion=0.06749, over 10769.40 utterances.], batch size: 52, lr: 7.30e-03, grad_scale: 8.0 2023-03-08 11:32:52,930 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.315e+02 2.144e+02 2.509e+02 3.065e+02 7.548e+02, threshold=5.017e+02, percent-clipped=1.0 2023-03-08 11:32:57,974 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8601, 4.0837, 3.9070, 4.2278, 2.4615, 4.0561, 2.5227, 1.6814], device='cuda:2'), covar=tensor([0.0399, 0.0194, 0.0768, 0.0231, 0.1886, 0.0243, 0.1564, 0.1728], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0136, 0.0255, 0.0127, 0.0221, 0.0117, 0.0227, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:33:57,210 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5216, 2.5795, 5.0169, 3.9480, 2.8403, 4.2542, 4.5902, 4.6000], device='cuda:2'), covar=tensor([0.0190, 0.1525, 0.0110, 0.0850, 0.1754, 0.0216, 0.0124, 0.0221], device='cuda:2'), in_proj_covar=tensor([0.0158, 0.0238, 0.0151, 0.0300, 0.0261, 0.0189, 0.0137, 0.0167], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 11:34:04,335 INFO [train2.py:809] (2/4) Epoch 15, batch 900, loss[ctc_loss=0.1085, att_loss=0.2645, loss=0.2333, over 17391.00 frames. utt_duration=882.2 frames, utt_pad_proportion=0.07628, over 79.00 utterances.], tot_loss[ctc_loss=0.08949, att_loss=0.2437, loss=0.2129, over 3239024.30 frames. utt_duration=1213 frames, utt_pad_proportion=0.06229, over 10694.84 utterances.], batch size: 79, lr: 7.30e-03, grad_scale: 8.0 2023-03-08 11:34:42,344 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.70 vs. limit=2.0 2023-03-08 11:35:24,397 INFO [train2.py:809] (2/4) Epoch 15, batch 950, loss[ctc_loss=0.09355, att_loss=0.243, loss=0.2131, over 16773.00 frames. utt_duration=672.5 frames, utt_pad_proportion=0.1498, over 100.00 utterances.], tot_loss[ctc_loss=0.09053, att_loss=0.2445, loss=0.2137, over 3254475.03 frames. utt_duration=1216 frames, utt_pad_proportion=0.05903, over 10716.54 utterances.], batch size: 100, lr: 7.30e-03, grad_scale: 8.0 2023-03-08 11:35:32,193 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.300e+02 2.400e+02 2.758e+02 3.423e+02 1.509e+03, threshold=5.516e+02, percent-clipped=3.0 2023-03-08 11:36:34,839 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0146, 5.2568, 4.7831, 5.3138, 4.6625, 4.9781, 5.3977, 5.2091], device='cuda:2'), covar=tensor([0.0519, 0.0272, 0.0765, 0.0258, 0.0404, 0.0229, 0.0231, 0.0183], device='cuda:2'), in_proj_covar=tensor([0.0362, 0.0285, 0.0340, 0.0301, 0.0292, 0.0222, 0.0275, 0.0255], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 11:36:43,409 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6148, 2.5726, 5.1432, 3.9089, 2.9317, 4.3140, 4.9458, 4.7771], device='cuda:2'), covar=tensor([0.0208, 0.1599, 0.0148, 0.0897, 0.1815, 0.0241, 0.0102, 0.0198], device='cuda:2'), in_proj_covar=tensor([0.0158, 0.0239, 0.0152, 0.0300, 0.0262, 0.0189, 0.0137, 0.0167], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 11:36:44,601 INFO [train2.py:809] (2/4) Epoch 15, batch 1000, loss[ctc_loss=0.09562, att_loss=0.2319, loss=0.2046, over 16177.00 frames. 
utt_duration=1580 frames, utt_pad_proportion=0.006412, over 41.00 utterances.], tot_loss[ctc_loss=0.08936, att_loss=0.2436, loss=0.2127, over 3254549.18 frames. utt_duration=1223 frames, utt_pad_proportion=0.05873, over 10655.41 utterances.], batch size: 41, lr: 7.29e-03, grad_scale: 8.0 2023-03-08 11:37:30,261 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=56801.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:37:33,401 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56803.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:38:05,065 INFO [train2.py:809] (2/4) Epoch 15, batch 1050, loss[ctc_loss=0.08296, att_loss=0.2513, loss=0.2177, over 17351.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.01924, over 59.00 utterances.], tot_loss[ctc_loss=0.08821, att_loss=0.2429, loss=0.2119, over 3258986.94 frames. utt_duration=1253 frames, utt_pad_proportion=0.05155, over 10420.23 utterances.], batch size: 59, lr: 7.29e-03, grad_scale: 8.0 2023-03-08 11:38:12,496 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.420e+02 2.111e+02 2.438e+02 3.282e+02 6.926e+02, threshold=4.877e+02, percent-clipped=2.0 2023-03-08 11:38:20,333 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5202, 2.8875, 3.4424, 4.4924, 4.0772, 4.0601, 2.8300, 2.1560], device='cuda:2'), covar=tensor([0.0599, 0.2019, 0.0969, 0.0468, 0.0638, 0.0417, 0.1576, 0.2285], device='cuda:2'), in_proj_covar=tensor([0.0174, 0.0214, 0.0192, 0.0202, 0.0208, 0.0164, 0.0200, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:38:46,230 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=56849.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:39:10,435 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=56864.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:39:24,724 INFO [train2.py:809] (2/4) Epoch 15, batch 1100, loss[ctc_loss=0.08768, att_loss=0.2285, loss=0.2004, over 15507.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008418, over 36.00 utterances.], tot_loss[ctc_loss=0.08798, att_loss=0.2421, loss=0.2113, over 3253612.16 frames. utt_duration=1260 frames, utt_pad_proportion=0.0511, over 10338.03 utterances.], batch size: 36, lr: 7.29e-03, grad_scale: 8.0 2023-03-08 11:39:34,159 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=56879.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 11:40:20,524 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56908.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:40:44,822 INFO [train2.py:809] (2/4) Epoch 15, batch 1150, loss[ctc_loss=0.0822, att_loss=0.2435, loss=0.2112, over 16771.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006155, over 48.00 utterances.], tot_loss[ctc_loss=0.08797, att_loss=0.2417, loss=0.211, over 3256241.00 frames. 
utt_duration=1276 frames, utt_pad_proportion=0.04848, over 10221.57 utterances.], batch size: 48, lr: 7.28e-03, grad_scale: 8.0 2023-03-08 11:40:52,693 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 2.173e+02 2.768e+02 3.173e+02 4.889e+02, threshold=5.536e+02, percent-clipped=1.0 2023-03-08 11:41:58,516 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=56969.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:41:58,544 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3306, 4.6070, 4.3967, 4.9834, 2.5236, 4.7998, 2.8083, 2.0008], device='cuda:2'), covar=tensor([0.0306, 0.0229, 0.0814, 0.0167, 0.1864, 0.0148, 0.1400, 0.1626], device='cuda:2'), in_proj_covar=tensor([0.0157, 0.0133, 0.0250, 0.0125, 0.0216, 0.0114, 0.0221, 0.0192], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:42:04,423 INFO [train2.py:809] (2/4) Epoch 15, batch 1200, loss[ctc_loss=0.06834, att_loss=0.2145, loss=0.1852, over 15743.00 frames. utt_duration=1658 frames, utt_pad_proportion=0.01061, over 38.00 utterances.], tot_loss[ctc_loss=0.08817, att_loss=0.2418, loss=0.2111, over 3257120.28 frames. utt_duration=1264 frames, utt_pad_proportion=0.05297, over 10322.13 utterances.], batch size: 38, lr: 7.28e-03, grad_scale: 8.0 2023-03-08 11:42:36,796 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56993.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:42:39,834 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56995.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:43:23,350 INFO [train2.py:809] (2/4) Epoch 15, batch 1250, loss[ctc_loss=0.06722, att_loss=0.2358, loss=0.2021, over 16274.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007676, over 43.00 utterances.], tot_loss[ctc_loss=0.08822, att_loss=0.2416, loss=0.2109, over 3251699.41 frames. utt_duration=1267 frames, utt_pad_proportion=0.05185, over 10281.11 utterances.], batch size: 43, lr: 7.28e-03, grad_scale: 8.0 2023-03-08 11:43:31,118 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.571e+02 2.221e+02 2.683e+02 3.397e+02 1.061e+03, threshold=5.366e+02, percent-clipped=4.0 2023-03-08 11:44:13,714 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57054.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:44:17,051 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57056.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:44:43,934 INFO [train2.py:809] (2/4) Epoch 15, batch 1300, loss[ctc_loss=0.07846, att_loss=0.2254, loss=0.196, over 16413.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.00683, over 44.00 utterances.], tot_loss[ctc_loss=0.08831, att_loss=0.2413, loss=0.2107, over 3251135.23 frames. utt_duration=1251 frames, utt_pad_proportion=0.05753, over 10405.69 utterances.], batch size: 44, lr: 7.27e-03, grad_scale: 8.0 2023-03-08 11:45:23,678 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.80 vs. limit=5.0 2023-03-08 11:46:03,445 INFO [train2.py:809] (2/4) Epoch 15, batch 1350, loss[ctc_loss=0.1039, att_loss=0.2602, loss=0.229, over 16891.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006974, over 49.00 utterances.], tot_loss[ctc_loss=0.08973, att_loss=0.2429, loss=0.2123, over 3264169.95 frames. 
utt_duration=1240 frames, utt_pad_proportion=0.05663, over 10544.98 utterances.], batch size: 49, lr: 7.27e-03, grad_scale: 8.0 2023-03-08 11:46:11,169 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.494e+02 2.073e+02 2.593e+02 3.161e+02 1.501e+03, threshold=5.185e+02, percent-clipped=3.0 2023-03-08 11:46:28,768 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-08 11:46:37,150 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0355, 5.2280, 5.2470, 5.2290, 5.3848, 5.3175, 5.0390, 4.7847], device='cuda:2'), covar=tensor([0.0943, 0.0516, 0.0290, 0.0461, 0.0266, 0.0305, 0.0339, 0.0339], device='cuda:2'), in_proj_covar=tensor([0.0480, 0.0321, 0.0292, 0.0314, 0.0371, 0.0395, 0.0317, 0.0356], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 11:46:43,343 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0928, 4.4934, 4.3756, 4.7423, 2.7281, 4.7160, 2.6599, 1.7816], device='cuda:2'), covar=tensor([0.0400, 0.0243, 0.0853, 0.0196, 0.1819, 0.0177, 0.1594, 0.1848], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0136, 0.0254, 0.0127, 0.0220, 0.0116, 0.0225, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:47:00,513 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57159.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:47:22,524 INFO [train2.py:809] (2/4) Epoch 15, batch 1400, loss[ctc_loss=0.07562, att_loss=0.2295, loss=0.1987, over 16001.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.00776, over 40.00 utterances.], tot_loss[ctc_loss=0.08892, att_loss=0.2423, loss=0.2116, over 3265321.49 frames. utt_duration=1277 frames, utt_pad_proportion=0.04816, over 10237.78 utterances.], batch size: 40, lr: 7.27e-03, grad_scale: 8.0 2023-03-08 11:47:32,865 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57179.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 11:47:44,285 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57186.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:47:45,769 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2601, 2.4831, 3.3060, 4.3194, 3.9333, 3.8566, 2.6958, 1.9696], device='cuda:2'), covar=tensor([0.0726, 0.2312, 0.1018, 0.0532, 0.0655, 0.0444, 0.1691, 0.2427], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0210, 0.0188, 0.0200, 0.0206, 0.0162, 0.0197, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:48:00,881 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57197.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:48:19,134 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0340, 4.3464, 4.2639, 4.6238, 2.6917, 4.4597, 2.9337, 1.6824], device='cuda:2'), covar=tensor([0.0361, 0.0237, 0.0716, 0.0192, 0.1761, 0.0201, 0.1335, 0.1828], device='cuda:2'), in_proj_covar=tensor([0.0159, 0.0135, 0.0252, 0.0126, 0.0219, 0.0116, 0.0222, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:48:42,464 INFO [train2.py:809] (2/4) Epoch 15, batch 1450, loss[ctc_loss=0.1043, att_loss=0.2686, loss=0.2358, over 17093.00 frames. 
utt_duration=1291 frames, utt_pad_proportion=0.007367, over 53.00 utterances.], tot_loss[ctc_loss=0.08813, att_loss=0.2421, loss=0.2113, over 3262322.84 frames. utt_duration=1264 frames, utt_pad_proportion=0.05184, over 10337.09 utterances.], batch size: 53, lr: 7.26e-03, grad_scale: 8.0 2023-03-08 11:48:48,502 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57227.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 11:48:49,835 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.461e+02 2.071e+02 2.451e+02 3.320e+02 6.721e+02, threshold=4.903e+02, percent-clipped=4.0 2023-03-08 11:49:15,898 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57244.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:49:20,444 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57247.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:49:38,328 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57258.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 11:49:48,050 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57264.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:49:51,278 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1703, 5.1091, 4.9659, 2.9007, 4.9034, 4.7698, 4.3846, 2.8980], device='cuda:2'), covar=tensor([0.0104, 0.0067, 0.0185, 0.0973, 0.0085, 0.0152, 0.0285, 0.1163], device='cuda:2'), in_proj_covar=tensor([0.0066, 0.0091, 0.0086, 0.0106, 0.0075, 0.0100, 0.0095, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 11:50:02,006 INFO [train2.py:809] (2/4) Epoch 15, batch 1500, loss[ctc_loss=0.1264, att_loss=0.2693, loss=0.2408, over 16930.00 frames. utt_duration=685.4 frames, utt_pad_proportion=0.1335, over 99.00 utterances.], tot_loss[ctc_loss=0.08773, att_loss=0.2424, loss=0.2115, over 3269358.38 frames. utt_duration=1253 frames, utt_pad_proportion=0.05339, over 10453.49 utterances.], batch size: 99, lr: 7.26e-03, grad_scale: 16.0 2023-03-08 11:50:20,493 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0995, 2.4579, 3.0907, 4.2812, 3.8048, 3.8456, 2.7861, 2.0164], device='cuda:2'), covar=tensor([0.0789, 0.2285, 0.1119, 0.0535, 0.0715, 0.0422, 0.1645, 0.2433], device='cuda:2'), in_proj_covar=tensor([0.0173, 0.0211, 0.0188, 0.0200, 0.0206, 0.0164, 0.0198, 0.0181], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:50:53,402 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57305.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:51:20,963 INFO [train2.py:809] (2/4) Epoch 15, batch 1550, loss[ctc_loss=0.101, att_loss=0.255, loss=0.2242, over 16765.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006658, over 48.00 utterances.], tot_loss[ctc_loss=0.08826, att_loss=0.2428, loss=0.2119, over 3272487.80 frames. 
utt_duration=1258 frames, utt_pad_proportion=0.05281, over 10419.68 utterances.], batch size: 48, lr: 7.26e-03, grad_scale: 16.0 2023-03-08 11:51:29,315 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.286e+02 2.083e+02 2.433e+02 3.048e+02 8.459e+02, threshold=4.866e+02, percent-clipped=4.0 2023-03-08 11:52:02,049 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57349.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:52:05,153 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57351.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:52:39,937 INFO [train2.py:809] (2/4) Epoch 15, batch 1600, loss[ctc_loss=0.08056, att_loss=0.2462, loss=0.2131, over 16615.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005827, over 47.00 utterances.], tot_loss[ctc_loss=0.08799, att_loss=0.2425, loss=0.2116, over 3272438.19 frames. utt_duration=1248 frames, utt_pad_proportion=0.05512, over 10499.36 utterances.], batch size: 47, lr: 7.26e-03, grad_scale: 16.0 2023-03-08 11:53:20,342 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57398.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:54:00,801 INFO [train2.py:809] (2/4) Epoch 15, batch 1650, loss[ctc_loss=0.1025, att_loss=0.2564, loss=0.2257, over 17048.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008366, over 52.00 utterances.], tot_loss[ctc_loss=0.08799, att_loss=0.2426, loss=0.2117, over 3273779.49 frames. utt_duration=1255 frames, utt_pad_proportion=0.05312, over 10444.53 utterances.], batch size: 52, lr: 7.25e-03, grad_scale: 16.0 2023-03-08 11:54:09,581 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.363e+02 2.043e+02 2.377e+02 2.961e+02 5.254e+02, threshold=4.753e+02, percent-clipped=2.0 2023-03-08 11:54:59,141 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57459.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:54:59,227 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57459.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:55:02,015 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0200, 6.2236, 5.5499, 5.9563, 5.8271, 5.3076, 5.6428, 5.3069], device='cuda:2'), covar=tensor([0.1095, 0.0814, 0.0959, 0.0777, 0.0753, 0.1542, 0.2164, 0.2482], device='cuda:2'), in_proj_covar=tensor([0.0467, 0.0540, 0.0410, 0.0405, 0.0389, 0.0430, 0.0555, 0.0495], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 11:55:03,811 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57462.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:55:20,179 INFO [train2.py:809] (2/4) Epoch 15, batch 1700, loss[ctc_loss=0.08273, att_loss=0.2422, loss=0.2103, over 16758.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.00694, over 48.00 utterances.], tot_loss[ctc_loss=0.08823, att_loss=0.2429, loss=0.212, over 3271884.03 frames. 
utt_duration=1219 frames, utt_pad_proportion=0.06198, over 10745.88 utterances.], batch size: 48, lr: 7.25e-03, grad_scale: 16.0 2023-03-08 11:55:33,403 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1063, 5.3554, 5.6163, 5.4511, 5.5855, 6.0756, 5.2778, 6.2223], device='cuda:2'), covar=tensor([0.0712, 0.0661, 0.0775, 0.1227, 0.1876, 0.0885, 0.0629, 0.0559], device='cuda:2'), in_proj_covar=tensor([0.0793, 0.0464, 0.0546, 0.0607, 0.0796, 0.0560, 0.0449, 0.0535], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 11:55:54,992 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57495.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:56:06,457 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6999, 2.2210, 5.0658, 3.9103, 3.0208, 4.5350, 4.8199, 4.7606], device='cuda:2'), covar=tensor([0.0176, 0.1691, 0.0124, 0.0906, 0.1721, 0.0165, 0.0094, 0.0179], device='cuda:2'), in_proj_covar=tensor([0.0165, 0.0242, 0.0155, 0.0309, 0.0266, 0.0193, 0.0140, 0.0171], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 11:56:10,885 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8731, 1.8875, 2.3459, 2.0393, 2.6612, 2.5929, 2.1046, 2.9268], device='cuda:2'), covar=tensor([0.2705, 0.4602, 0.3429, 0.2108, 0.1756, 0.1431, 0.3282, 0.0784], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0098, 0.0100, 0.0088, 0.0090, 0.0080, 0.0104, 0.0071], device='cuda:2'), out_proj_covar=tensor([6.5332e-05, 7.1957e-05, 7.5019e-05, 6.4814e-05, 6.4556e-05, 6.2588e-05, 7.4149e-05, 5.6134e-05], device='cuda:2') 2023-03-08 11:56:14,330 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57507.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:56:38,718 INFO [train2.py:809] (2/4) Epoch 15, batch 1750, loss[ctc_loss=0.0857, att_loss=0.2574, loss=0.223, over 16459.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.007096, over 46.00 utterances.], tot_loss[ctc_loss=0.08908, att_loss=0.2434, loss=0.2125, over 3264072.32 frames. utt_duration=1208 frames, utt_pad_proportion=0.0655, over 10818.68 utterances.], batch size: 46, lr: 7.25e-03, grad_scale: 16.0 2023-03-08 11:56:39,758 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57523.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:56:46,885 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+02 2.236e+02 2.703e+02 3.413e+02 7.176e+02, threshold=5.406e+02, percent-clipped=5.0 2023-03-08 11:56:57,022 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. 
limit=2.0 2023-03-08 11:56:59,363 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0163, 5.3027, 4.8164, 5.3855, 4.7286, 4.9967, 5.4914, 5.2471], device='cuda:2'), covar=tensor([0.0516, 0.0290, 0.0849, 0.0263, 0.0444, 0.0217, 0.0199, 0.0177], device='cuda:2'), in_proj_covar=tensor([0.0364, 0.0284, 0.0342, 0.0299, 0.0293, 0.0222, 0.0275, 0.0255], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 11:57:08,506 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57542.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:57:26,198 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57553.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 11:57:31,834 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57556.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:57:44,183 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57564.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:57:58,282 INFO [train2.py:809] (2/4) Epoch 15, batch 1800, loss[ctc_loss=0.06576, att_loss=0.2144, loss=0.1847, over 15348.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.01198, over 35.00 utterances.], tot_loss[ctc_loss=0.08857, att_loss=0.2429, loss=0.212, over 3261468.75 frames. utt_duration=1243 frames, utt_pad_proportion=0.05712, over 10507.07 utterances.], batch size: 35, lr: 7.24e-03, grad_scale: 16.0 2023-03-08 11:58:02,089 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5171, 3.7907, 3.5146, 3.8860, 2.6117, 3.7097, 2.7819, 2.0439], device='cuda:2'), covar=tensor([0.0420, 0.0251, 0.0729, 0.0203, 0.1584, 0.0234, 0.1272, 0.1424], device='cuda:2'), in_proj_covar=tensor([0.0158, 0.0133, 0.0251, 0.0123, 0.0217, 0.0115, 0.0220, 0.0193], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:58:08,324 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4602, 2.3194, 4.9779, 3.6037, 3.0233, 4.3330, 4.5929, 4.5762], device='cuda:2'), covar=tensor([0.0210, 0.1820, 0.0108, 0.1105, 0.1706, 0.0210, 0.0117, 0.0214], device='cuda:2'), in_proj_covar=tensor([0.0165, 0.0242, 0.0154, 0.0307, 0.0265, 0.0192, 0.0139, 0.0170], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 11:58:19,531 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.15 vs. limit=2.0 2023-03-08 11:58:41,633 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57600.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:58:46,474 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5490, 2.8891, 3.5924, 4.5754, 4.0937, 3.9858, 2.9272, 2.1005], device='cuda:2'), covar=tensor([0.0686, 0.1906, 0.0906, 0.0413, 0.0622, 0.0402, 0.1497, 0.2274], device='cuda:2'), in_proj_covar=tensor([0.0173, 0.0209, 0.0187, 0.0197, 0.0204, 0.0163, 0.0198, 0.0181], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 11:59:00,593 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57612.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 11:59:11,005 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. 
limit=2.0 2023-03-08 11:59:19,509 INFO [train2.py:809] (2/4) Epoch 15, batch 1850, loss[ctc_loss=0.102, att_loss=0.2343, loss=0.2078, over 16184.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006653, over 41.00 utterances.], tot_loss[ctc_loss=0.08847, att_loss=0.2428, loss=0.212, over 3260041.62 frames. utt_duration=1237 frames, utt_pad_proportion=0.05985, over 10558.73 utterances.], batch size: 41, lr: 7.24e-03, grad_scale: 16.0 2023-03-08 11:59:27,208 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.618e+02 2.149e+02 2.578e+02 3.271e+02 8.979e+02, threshold=5.155e+02, percent-clipped=4.0 2023-03-08 12:00:00,125 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57649.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:00:03,808 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57651.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:00:38,106 INFO [train2.py:809] (2/4) Epoch 15, batch 1900, loss[ctc_loss=0.08828, att_loss=0.2183, loss=0.1923, over 15508.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008386, over 36.00 utterances.], tot_loss[ctc_loss=0.08777, att_loss=0.2422, loss=0.2114, over 3262484.46 frames. utt_duration=1242 frames, utt_pad_proportion=0.05742, over 10518.51 utterances.], batch size: 36, lr: 7.24e-03, grad_scale: 16.0 2023-03-08 12:01:00,433 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4186, 4.9202, 4.7383, 4.9571, 4.9787, 4.5598, 3.3695, 4.8859], device='cuda:2'), covar=tensor([0.0126, 0.0094, 0.0137, 0.0068, 0.0089, 0.0116, 0.0742, 0.0173], device='cuda:2'), in_proj_covar=tensor([0.0082, 0.0079, 0.0098, 0.0060, 0.0067, 0.0078, 0.0097, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 12:01:15,662 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57697.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:01:18,927 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57699.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:01:56,431 INFO [train2.py:809] (2/4) Epoch 15, batch 1950, loss[ctc_loss=0.09961, att_loss=0.2577, loss=0.2261, over 17429.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.03132, over 63.00 utterances.], tot_loss[ctc_loss=0.08762, att_loss=0.2421, loss=0.2112, over 3269499.25 frames. utt_duration=1277 frames, utt_pad_proportion=0.04848, over 10257.02 utterances.], batch size: 63, lr: 7.23e-03, grad_scale: 16.0 2023-03-08 12:02:05,439 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.339e+02 2.187e+02 2.629e+02 3.507e+02 9.887e+02, threshold=5.258e+02, percent-clipped=4.0 2023-03-08 12:02:13,636 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57733.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:02:47,500 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57754.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:03:17,165 INFO [train2.py:809] (2/4) Epoch 15, batch 2000, loss[ctc_loss=0.07662, att_loss=0.236, loss=0.2041, over 16277.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.006858, over 43.00 utterances.], tot_loss[ctc_loss=0.08764, att_loss=0.2427, loss=0.2117, over 3271344.33 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.05367, over 10430.68 utterances.], batch size: 43, lr: 7.23e-03, grad_scale: 16.0 2023-03-08 12:03:21,158 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1030, 5.3454, 5.6379, 5.4478, 5.5518, 6.0724, 5.2535, 6.1526], device='cuda:2'), covar=tensor([0.0681, 0.0804, 0.0701, 0.1139, 0.1731, 0.0774, 0.0586, 0.0586], device='cuda:2'), in_proj_covar=tensor([0.0796, 0.0469, 0.0545, 0.0606, 0.0801, 0.0562, 0.0448, 0.0531], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 12:03:50,219 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57794.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:04:28,945 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57818.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:04:37,299 INFO [train2.py:809] (2/4) Epoch 15, batch 2050, loss[ctc_loss=0.0697, att_loss=0.2411, loss=0.2069, over 17356.00 frames. utt_duration=880 frames, utt_pad_proportion=0.0737, over 79.00 utterances.], tot_loss[ctc_loss=0.08626, att_loss=0.2419, loss=0.2108, over 3272147.27 frames. utt_duration=1276 frames, utt_pad_proportion=0.04937, over 10270.64 utterances.], batch size: 79, lr: 7.23e-03, grad_scale: 16.0 2023-03-08 12:04:45,710 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.367e+02 2.062e+02 2.536e+02 3.073e+02 8.062e+02, threshold=5.072e+02, percent-clipped=2.0 2023-03-08 12:05:07,136 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57842.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:05:22,333 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57851.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:05:25,422 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57853.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 12:05:29,210 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.18 vs. limit=5.0 2023-03-08 12:05:56,962 INFO [train2.py:809] (2/4) Epoch 15, batch 2100, loss[ctc_loss=0.08694, att_loss=0.2522, loss=0.2192, over 17399.00 frames. utt_duration=1010 frames, utt_pad_proportion=0.04701, over 69.00 utterances.], tot_loss[ctc_loss=0.08667, att_loss=0.2416, loss=0.2106, over 3270372.19 frames. utt_duration=1287 frames, utt_pad_proportion=0.04714, over 10173.18 utterances.], batch size: 69, lr: 7.22e-03, grad_scale: 16.0 2023-03-08 12:06:23,268 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57890.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:06:40,402 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57900.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:06:41,917 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57901.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:07:03,550 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. 
limit=2.0 2023-03-08 12:07:09,374 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6798, 5.0815, 4.9632, 5.0727, 5.2768, 4.7295, 3.7158, 5.0634], device='cuda:2'), covar=tensor([0.0115, 0.0120, 0.0119, 0.0076, 0.0072, 0.0128, 0.0647, 0.0188], device='cuda:2'), in_proj_covar=tensor([0.0082, 0.0079, 0.0098, 0.0060, 0.0066, 0.0078, 0.0096, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 12:07:17,264 INFO [train2.py:809] (2/4) Epoch 15, batch 2150, loss[ctc_loss=0.07999, att_loss=0.2404, loss=0.2083, over 16557.00 frames. utt_duration=1473 frames, utt_pad_proportion=0.005267, over 45.00 utterances.], tot_loss[ctc_loss=0.08873, att_loss=0.243, loss=0.2121, over 3262077.41 frames. utt_duration=1213 frames, utt_pad_proportion=0.06643, over 10772.33 utterances.], batch size: 45, lr: 7.22e-03, grad_scale: 16.0 2023-03-08 12:07:19,361 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7515, 3.4679, 3.5484, 2.7771, 3.5523, 3.6312, 3.6302, 2.2793], device='cuda:2'), covar=tensor([0.1125, 0.1997, 0.3045, 1.1527, 0.1982, 0.3427, 0.1092, 1.1602], device='cuda:2'), in_proj_covar=tensor([0.0124, 0.0146, 0.0157, 0.0225, 0.0120, 0.0212, 0.0134, 0.0192], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 12:07:25,215 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.416e+02 2.164e+02 2.681e+02 3.169e+02 6.485e+02, threshold=5.361e+02, percent-clipped=3.0 2023-03-08 12:07:27,228 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0201, 5.2382, 5.5221, 5.4100, 5.4990, 5.9225, 5.1695, 6.0598], device='cuda:2'), covar=tensor([0.0658, 0.0648, 0.0668, 0.1167, 0.1613, 0.0890, 0.0600, 0.0637], device='cuda:2'), in_proj_covar=tensor([0.0792, 0.0469, 0.0543, 0.0607, 0.0799, 0.0561, 0.0447, 0.0532], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 12:07:57,495 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57948.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:08:36,584 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0583, 4.4943, 4.2771, 4.7085, 2.6079, 4.7171, 2.4664, 1.8491], device='cuda:2'), covar=tensor([0.0387, 0.0210, 0.0694, 0.0177, 0.1954, 0.0158, 0.1665, 0.1769], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0136, 0.0254, 0.0127, 0.0224, 0.0118, 0.0224, 0.0198], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 12:08:37,641 INFO [train2.py:809] (2/4) Epoch 15, batch 2200, loss[ctc_loss=0.1095, att_loss=0.2618, loss=0.2314, over 17002.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.008778, over 51.00 utterances.], tot_loss[ctc_loss=0.08922, att_loss=0.2432, loss=0.2124, over 3261142.10 frames. utt_duration=1187 frames, utt_pad_proportion=0.07279, over 11001.68 utterances.], batch size: 51, lr: 7.22e-03, grad_scale: 16.0 2023-03-08 12:10:01,086 INFO [train2.py:809] (2/4) Epoch 15, batch 2250, loss[ctc_loss=0.08183, att_loss=0.2302, loss=0.2005, over 16192.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.00566, over 41.00 utterances.], tot_loss[ctc_loss=0.08953, att_loss=0.2441, loss=0.2132, over 3271663.66 frames. 
utt_duration=1186 frames, utt_pad_proportion=0.07112, over 11052.65 utterances.], batch size: 41, lr: 7.22e-03, grad_scale: 16.0 2023-03-08 12:10:08,781 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.417e+02 2.180e+02 2.771e+02 3.383e+02 6.703e+02, threshold=5.542e+02, percent-clipped=1.0 2023-03-08 12:10:49,919 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58054.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:10:58,170 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.34 vs. limit=5.0 2023-03-08 12:11:07,922 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1155, 4.4947, 4.2290, 4.6517, 3.0493, 4.7728, 2.3809, 2.0152], device='cuda:2'), covar=tensor([0.0366, 0.0190, 0.0696, 0.0179, 0.1552, 0.0141, 0.1579, 0.1608], device='cuda:2'), in_proj_covar=tensor([0.0160, 0.0136, 0.0253, 0.0127, 0.0223, 0.0118, 0.0223, 0.0198], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 12:11:20,344 INFO [train2.py:809] (2/4) Epoch 15, batch 2300, loss[ctc_loss=0.08351, att_loss=0.2278, loss=0.199, over 15959.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.00616, over 41.00 utterances.], tot_loss[ctc_loss=0.08927, att_loss=0.2438, loss=0.2129, over 3276257.81 frames. utt_duration=1187 frames, utt_pad_proportion=0.0682, over 11050.83 utterances.], batch size: 41, lr: 7.21e-03, grad_scale: 8.0 2023-03-08 12:11:27,141 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1508, 4.5637, 4.4589, 4.6333, 2.9246, 4.7426, 2.5801, 1.8087], device='cuda:2'), covar=tensor([0.0340, 0.0196, 0.0651, 0.0216, 0.1665, 0.0160, 0.1489, 0.1743], device='cuda:2'), in_proj_covar=tensor([0.0160, 0.0136, 0.0254, 0.0127, 0.0224, 0.0118, 0.0224, 0.0198], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 12:11:29,003 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-03-08 12:11:45,856 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=58089.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:11:46,682 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2023-03-08 12:12:06,471 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58102.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:12:06,843 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1464, 2.3471, 4.6324, 3.7292, 2.9443, 4.0121, 4.1911, 4.2157], device='cuda:2'), covar=tensor([0.0211, 0.1659, 0.0105, 0.0835, 0.1618, 0.0278, 0.0169, 0.0255], device='cuda:2'), in_proj_covar=tensor([0.0165, 0.0240, 0.0154, 0.0304, 0.0262, 0.0191, 0.0140, 0.0169], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 12:12:33,098 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58118.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:12:40,614 INFO [train2.py:809] (2/4) Epoch 15, batch 2350, loss[ctc_loss=0.08255, att_loss=0.2235, loss=0.1953, over 14891.00 frames. utt_duration=1806 frames, utt_pad_proportion=0.03094, over 33.00 utterances.], tot_loss[ctc_loss=0.0883, att_loss=0.2435, loss=0.2125, over 3265100.21 frames. 
utt_duration=1198 frames, utt_pad_proportion=0.06675, over 10916.16 utterances.], batch size: 33, lr: 7.21e-03, grad_scale: 8.0 2023-03-08 12:12:49,822 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+02 2.243e+02 2.592e+02 3.349e+02 5.777e+02, threshold=5.184e+02, percent-clipped=2.0 2023-03-08 12:13:06,590 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9780, 5.2443, 5.1618, 5.1890, 5.2973, 5.3033, 4.9794, 4.7607], device='cuda:2'), covar=tensor([0.0976, 0.0469, 0.0318, 0.0431, 0.0243, 0.0259, 0.0323, 0.0320], device='cuda:2'), in_proj_covar=tensor([0.0483, 0.0324, 0.0293, 0.0322, 0.0378, 0.0395, 0.0318, 0.0358], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 12:13:08,889 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58141.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:13:14,196 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58144.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:13:24,727 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58151.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:13:48,260 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58166.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:13:58,783 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.78 vs. limit=2.0 2023-03-08 12:13:58,916 INFO [train2.py:809] (2/4) Epoch 15, batch 2400, loss[ctc_loss=0.0801, att_loss=0.2524, loss=0.218, over 17310.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02326, over 59.00 utterances.], tot_loss[ctc_loss=0.08901, att_loss=0.2438, loss=0.2128, over 3265935.90 frames. utt_duration=1208 frames, utt_pad_proportion=0.06654, over 10829.40 utterances.], batch size: 59, lr: 7.21e-03, grad_scale: 8.0 2023-03-08 12:14:14,958 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1561, 5.2281, 5.0098, 2.3780, 2.0541, 2.9511, 2.6661, 3.9244], device='cuda:2'), covar=tensor([0.0664, 0.0235, 0.0221, 0.4487, 0.5292, 0.2401, 0.2885, 0.1679], device='cuda:2'), in_proj_covar=tensor([0.0344, 0.0239, 0.0245, 0.0225, 0.0343, 0.0336, 0.0234, 0.0355], device='cuda:2'), out_proj_covar=tensor([1.5011e-04, 9.0163e-05, 1.0601e-04, 9.7931e-05, 1.4621e-04, 1.3341e-04, 9.3777e-05, 1.4697e-04], device='cuda:2') 2023-03-08 12:14:40,626 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58199.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:14:45,875 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58202.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:14:50,486 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58205.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:15:14,928 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2023-03-08 12:15:18,769 INFO [train2.py:809] (2/4) Epoch 15, batch 2450, loss[ctc_loss=0.07006, att_loss=0.2217, loss=0.1914, over 15759.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.01022, over 38.00 utterances.], tot_loss[ctc_loss=0.08829, att_loss=0.2433, loss=0.2123, over 3260458.87 frames. 
utt_duration=1219 frames, utt_pad_proportion=0.06404, over 10713.38 utterances.], batch size: 38, lr: 7.20e-03, grad_scale: 8.0 2023-03-08 12:15:27,699 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.313e+02 2.141e+02 2.652e+02 3.416e+02 6.108e+02, threshold=5.305e+02, percent-clipped=4.0 2023-03-08 12:16:38,079 INFO [train2.py:809] (2/4) Epoch 15, batch 2500, loss[ctc_loss=0.0692, att_loss=0.2372, loss=0.2036, over 16865.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.007861, over 49.00 utterances.], tot_loss[ctc_loss=0.0874, att_loss=0.2428, loss=0.2117, over 3264221.91 frames. utt_duration=1242 frames, utt_pad_proportion=0.05849, over 10521.69 utterances.], batch size: 49, lr: 7.20e-03, grad_scale: 8.0 2023-03-08 12:17:57,543 INFO [train2.py:809] (2/4) Epoch 15, batch 2550, loss[ctc_loss=0.08537, att_loss=0.2554, loss=0.2214, over 17315.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02368, over 59.00 utterances.], tot_loss[ctc_loss=0.08753, att_loss=0.2434, loss=0.2122, over 3272626.00 frames. utt_duration=1238 frames, utt_pad_proportion=0.05819, over 10586.42 utterances.], batch size: 59, lr: 7.20e-03, grad_scale: 8.0 2023-03-08 12:18:06,732 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.352e+02 2.062e+02 2.631e+02 3.242e+02 5.829e+02, threshold=5.263e+02, percent-clipped=2.0 2023-03-08 12:19:17,178 INFO [train2.py:809] (2/4) Epoch 15, batch 2600, loss[ctc_loss=0.07816, att_loss=0.2438, loss=0.2107, over 17426.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.03151, over 63.00 utterances.], tot_loss[ctc_loss=0.08709, att_loss=0.2429, loss=0.2117, over 3275744.17 frames. utt_duration=1224 frames, utt_pad_proportion=0.05993, over 10717.68 utterances.], batch size: 63, lr: 7.19e-03, grad_scale: 8.0 2023-03-08 12:19:35,369 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.98 vs. limit=5.0 2023-03-08 12:19:43,491 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58389.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:20:37,375 INFO [train2.py:809] (2/4) Epoch 15, batch 2650, loss[ctc_loss=0.06994, att_loss=0.2355, loss=0.2024, over 16622.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005513, over 47.00 utterances.], tot_loss[ctc_loss=0.08636, att_loss=0.2418, loss=0.2107, over 3261430.36 frames. 
utt_duration=1241 frames, utt_pad_proportion=0.05911, over 10526.58 utterances.], batch size: 47, lr: 7.19e-03, grad_scale: 8.0 2023-03-08 12:20:42,305 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8684, 4.3428, 4.5562, 4.3777, 4.4291, 4.7741, 4.4610, 4.8356], device='cuda:2'), covar=tensor([0.0812, 0.0729, 0.0749, 0.1166, 0.1754, 0.0984, 0.1960, 0.0797], device='cuda:2'), in_proj_covar=tensor([0.0779, 0.0461, 0.0538, 0.0599, 0.0790, 0.0552, 0.0442, 0.0525], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 12:20:46,672 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.284e+02 2.218e+02 2.453e+02 3.355e+02 6.382e+02, threshold=4.906e+02, percent-clipped=2.0 2023-03-08 12:21:00,179 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58437.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:21:01,918 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8196, 5.0542, 5.0019, 4.9544, 5.1068, 5.1002, 4.7380, 4.5997], device='cuda:2'), covar=tensor([0.0994, 0.0552, 0.0310, 0.0512, 0.0307, 0.0293, 0.0379, 0.0359], device='cuda:2'), in_proj_covar=tensor([0.0483, 0.0325, 0.0294, 0.0322, 0.0377, 0.0397, 0.0319, 0.0360], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 12:21:18,078 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9932, 5.2035, 5.1252, 5.1266, 5.2807, 5.2705, 4.9092, 4.7755], device='cuda:2'), covar=tensor([0.0951, 0.0480, 0.0316, 0.0467, 0.0281, 0.0286, 0.0380, 0.0339], device='cuda:2'), in_proj_covar=tensor([0.0483, 0.0325, 0.0294, 0.0322, 0.0377, 0.0397, 0.0320, 0.0360], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 12:21:22,722 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.2621, 5.5704, 5.7840, 5.5264, 5.7330, 6.2059, 5.3730, 6.2694], device='cuda:2'), covar=tensor([0.0591, 0.0552, 0.0743, 0.1167, 0.1712, 0.0818, 0.0600, 0.0597], device='cuda:2'), in_proj_covar=tensor([0.0779, 0.0461, 0.0538, 0.0600, 0.0790, 0.0553, 0.0441, 0.0525], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 12:21:57,758 INFO [train2.py:809] (2/4) Epoch 15, batch 2700, loss[ctc_loss=0.1012, att_loss=0.2392, loss=0.2116, over 16182.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006715, over 41.00 utterances.], tot_loss[ctc_loss=0.08663, att_loss=0.242, loss=0.211, over 3260715.65 frames. utt_duration=1227 frames, utt_pad_proportion=0.06251, over 10645.02 utterances.], batch size: 41, lr: 7.19e-03, grad_scale: 8.0 2023-03-08 12:22:36,297 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=58497.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:22:41,393 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=58500.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:23:17,021 INFO [train2.py:809] (2/4) Epoch 15, batch 2750, loss[ctc_loss=0.09133, att_loss=0.2602, loss=0.2264, over 16780.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.0058, over 48.00 utterances.], tot_loss[ctc_loss=0.08703, att_loss=0.2425, loss=0.2114, over 3263962.77 frames. 
utt_duration=1231 frames, utt_pad_proportion=0.06098, over 10617.09 utterances.], batch size: 48, lr: 7.18e-03, grad_scale: 8.0 2023-03-08 12:23:26,923 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.411e+02 2.307e+02 2.848e+02 3.393e+02 6.929e+02, threshold=5.697e+02, percent-clipped=6.0 2023-03-08 12:23:56,840 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-08 12:24:35,855 INFO [train2.py:809] (2/4) Epoch 15, batch 2800, loss[ctc_loss=0.07758, att_loss=0.228, loss=0.1979, over 15894.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.008692, over 39.00 utterances.], tot_loss[ctc_loss=0.08754, att_loss=0.2423, loss=0.2113, over 3257387.34 frames. utt_duration=1218 frames, utt_pad_proportion=0.06652, over 10707.28 utterances.], batch size: 39, lr: 7.18e-03, grad_scale: 8.0 2023-03-08 12:24:51,039 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9029, 4.3891, 3.9916, 4.5104, 2.4051, 4.4320, 2.2062, 1.6616], device='cuda:2'), covar=tensor([0.0402, 0.0171, 0.0905, 0.0162, 0.2103, 0.0167, 0.1836, 0.1895], device='cuda:2'), in_proj_covar=tensor([0.0157, 0.0133, 0.0247, 0.0124, 0.0218, 0.0117, 0.0219, 0.0194], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 12:25:53,744 INFO [train2.py:809] (2/4) Epoch 15, batch 2850, loss[ctc_loss=0.1262, att_loss=0.235, loss=0.2132, over 15521.00 frames. utt_duration=1726 frames, utt_pad_proportion=0.007492, over 36.00 utterances.], tot_loss[ctc_loss=0.08751, att_loss=0.2426, loss=0.2116, over 3262284.13 frames. utt_duration=1244 frames, utt_pad_proportion=0.05809, over 10500.04 utterances.], batch size: 36, lr: 7.18e-03, grad_scale: 8.0 2023-03-08 12:26:02,953 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.515e+02 2.157e+02 2.674e+02 3.074e+02 6.924e+02, threshold=5.347e+02, percent-clipped=2.0 2023-03-08 12:26:07,516 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5649, 2.9388, 3.2302, 4.5443, 3.9411, 3.9731, 2.9640, 2.1895], device='cuda:2'), covar=tensor([0.0621, 0.1914, 0.1082, 0.0498, 0.0797, 0.0466, 0.1439, 0.2351], device='cuda:2'), in_proj_covar=tensor([0.0173, 0.0211, 0.0189, 0.0199, 0.0206, 0.0167, 0.0198, 0.0181], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 12:27:12,212 INFO [train2.py:809] (2/4) Epoch 15, batch 2900, loss[ctc_loss=0.09863, att_loss=0.2376, loss=0.2098, over 15947.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007539, over 41.00 utterances.], tot_loss[ctc_loss=0.08782, att_loss=0.2425, loss=0.2115, over 3255597.87 frames. 
utt_duration=1236 frames, utt_pad_proportion=0.06179, over 10545.20 utterances.], batch size: 41, lr: 7.18e-03, grad_scale: 8.0 2023-03-08 12:27:18,802 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0471, 4.5572, 4.4227, 4.5038, 2.6237, 4.5274, 2.4274, 2.0681], device='cuda:2'), covar=tensor([0.0333, 0.0173, 0.0633, 0.0166, 0.1759, 0.0155, 0.1564, 0.1549], device='cuda:2'), in_proj_covar=tensor([0.0157, 0.0133, 0.0248, 0.0124, 0.0220, 0.0118, 0.0219, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 12:27:23,868 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0228, 5.3555, 5.2944, 5.2002, 5.3547, 5.3397, 5.0567, 4.8038], device='cuda:2'), covar=tensor([0.1088, 0.0491, 0.0274, 0.0528, 0.0275, 0.0319, 0.0392, 0.0370], device='cuda:2'), in_proj_covar=tensor([0.0480, 0.0323, 0.0293, 0.0318, 0.0377, 0.0395, 0.0319, 0.0360], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 12:27:26,460 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58681.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:27:47,579 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0054, 5.2873, 5.2245, 5.1106, 5.2653, 5.2633, 5.0188, 4.7868], device='cuda:2'), covar=tensor([0.1019, 0.0464, 0.0284, 0.0545, 0.0282, 0.0301, 0.0356, 0.0344], device='cuda:2'), in_proj_covar=tensor([0.0482, 0.0324, 0.0294, 0.0320, 0.0379, 0.0397, 0.0320, 0.0361], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 12:28:31,198 INFO [train2.py:809] (2/4) Epoch 15, batch 2950, loss[ctc_loss=0.06545, att_loss=0.2207, loss=0.1896, over 16172.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.007358, over 41.00 utterances.], tot_loss[ctc_loss=0.08723, att_loss=0.2419, loss=0.211, over 3253058.04 frames. utt_duration=1255 frames, utt_pad_proportion=0.05733, over 10380.43 utterances.], batch size: 41, lr: 7.17e-03, grad_scale: 8.0 2023-03-08 12:28:41,001 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.392e+02 2.063e+02 2.403e+02 2.955e+02 5.605e+02, threshold=4.806e+02, percent-clipped=1.0 2023-03-08 12:29:01,926 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58742.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:29:49,917 INFO [train2.py:809] (2/4) Epoch 15, batch 3000, loss[ctc_loss=0.09081, att_loss=0.2634, loss=0.2289, over 17112.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01546, over 56.00 utterances.], tot_loss[ctc_loss=0.08786, att_loss=0.2427, loss=0.2117, over 3254258.97 frames. utt_duration=1230 frames, utt_pad_proportion=0.06349, over 10595.88 utterances.], batch size: 56, lr: 7.17e-03, grad_scale: 8.0 2023-03-08 12:29:49,918 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 12:30:03,571 INFO [train2.py:843] (2/4) Epoch 15, validation: ctc_loss=0.04336, att_loss=0.2362, loss=0.1976, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
2023-03-08 12:30:03,571 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 12:30:07,178 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58775.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:30:42,556 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58797.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:30:47,649 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58800.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:31:15,301 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4802, 2.9344, 3.0060, 2.5246, 3.0237, 2.9621, 2.9848, 1.9712], device='cuda:2'), covar=tensor([0.1107, 0.2116, 0.2334, 0.8501, 0.1391, 0.3197, 0.1415, 0.9964], device='cuda:2'), in_proj_covar=tensor([0.0129, 0.0151, 0.0164, 0.0230, 0.0125, 0.0220, 0.0140, 0.0197], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 12:31:24,433 INFO [train2.py:809] (2/4) Epoch 15, batch 3050, loss[ctc_loss=0.09874, att_loss=0.2496, loss=0.2194, over 16687.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.005907, over 46.00 utterances.], tot_loss[ctc_loss=0.08632, att_loss=0.2416, loss=0.2105, over 3258825.08 frames. utt_duration=1267 frames, utt_pad_proportion=0.05433, over 10300.23 utterances.], batch size: 46, lr: 7.17e-03, grad_scale: 8.0 2023-03-08 12:31:34,016 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.376e+02 2.164e+02 2.584e+02 3.141e+02 6.750e+02, threshold=5.168e+02, percent-clipped=6.0 2023-03-08 12:31:45,903 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58836.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:31:59,462 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58845.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:32:04,247 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58848.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:32:17,089 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9294, 5.1662, 5.0940, 5.0437, 5.1490, 5.1792, 4.8373, 4.6540], device='cuda:2'), covar=tensor([0.0830, 0.0423, 0.0259, 0.0450, 0.0266, 0.0277, 0.0369, 0.0338], device='cuda:2'), in_proj_covar=tensor([0.0477, 0.0322, 0.0292, 0.0318, 0.0375, 0.0394, 0.0318, 0.0359], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 12:32:43,304 INFO [train2.py:809] (2/4) Epoch 15, batch 3100, loss[ctc_loss=0.07917, att_loss=0.2276, loss=0.1979, over 16285.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.007036, over 43.00 utterances.], tot_loss[ctc_loss=0.08668, att_loss=0.2413, loss=0.2104, over 3245696.43 frames. utt_duration=1264 frames, utt_pad_proportion=0.05862, over 10283.17 utterances.], batch size: 43, lr: 7.16e-03, grad_scale: 8.0 2023-03-08 12:33:00,593 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-03-08 12:33:34,860 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58906.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:34:00,655 INFO [train2.py:809] (2/4) Epoch 15, batch 3150, loss[ctc_loss=0.1131, att_loss=0.2655, loss=0.235, over 17365.00 frames. 
utt_duration=880.6 frames, utt_pad_proportion=0.07792, over 79.00 utterances.], tot_loss[ctc_loss=0.08715, att_loss=0.242, loss=0.211, over 3254761.27 frames. utt_duration=1254 frames, utt_pad_proportion=0.05854, over 10397.85 utterances.], batch size: 79, lr: 7.16e-03, grad_scale: 8.0 2023-03-08 12:34:09,777 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.384e+02 2.265e+02 2.592e+02 3.060e+02 7.077e+02, threshold=5.184e+02, percent-clipped=3.0 2023-03-08 12:34:15,425 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4279, 2.6478, 3.6173, 3.0440, 3.4425, 4.5655, 4.3293, 3.2076], device='cuda:2'), covar=tensor([0.0370, 0.1828, 0.1150, 0.1129, 0.1029, 0.0686, 0.0491, 0.1210], device='cuda:2'), in_proj_covar=tensor([0.0236, 0.0236, 0.0264, 0.0211, 0.0254, 0.0332, 0.0237, 0.0230], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 12:35:10,534 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58967.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:35:19,407 INFO [train2.py:809] (2/4) Epoch 15, batch 3200, loss[ctc_loss=0.07692, att_loss=0.2406, loss=0.2078, over 16965.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007553, over 50.00 utterances.], tot_loss[ctc_loss=0.08661, att_loss=0.2417, loss=0.2107, over 3256562.96 frames. utt_duration=1238 frames, utt_pad_proportion=0.06226, over 10533.46 utterances.], batch size: 50, lr: 7.16e-03, grad_scale: 8.0 2023-03-08 12:36:14,660 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5505, 4.7806, 4.5616, 4.7099, 5.2083, 4.7255, 4.6748, 2.4042], device='cuda:2'), covar=tensor([0.0248, 0.0253, 0.0282, 0.0261, 0.0918, 0.0211, 0.0312, 0.2153], device='cuda:2'), in_proj_covar=tensor([0.0134, 0.0147, 0.0151, 0.0166, 0.0344, 0.0133, 0.0141, 0.0209], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 12:36:38,220 INFO [train2.py:809] (2/4) Epoch 15, batch 3250, loss[ctc_loss=0.1119, att_loss=0.2516, loss=0.2237, over 17321.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02338, over 59.00 utterances.], tot_loss[ctc_loss=0.08691, att_loss=0.2419, loss=0.2109, over 3260485.93 frames. utt_duration=1233 frames, utt_pad_proportion=0.06196, over 10592.43 utterances.], batch size: 59, lr: 7.15e-03, grad_scale: 8.0 2023-03-08 12:36:48,042 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 2.216e+02 2.757e+02 3.407e+02 8.183e+02, threshold=5.513e+02, percent-clipped=5.0 2023-03-08 12:37:00,989 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59037.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:37:57,611 INFO [train2.py:809] (2/4) Epoch 15, batch 3300, loss[ctc_loss=0.07238, att_loss=0.2208, loss=0.1911, over 15870.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.0102, over 39.00 utterances.], tot_loss[ctc_loss=0.08661, att_loss=0.2415, loss=0.2105, over 3263097.06 frames. utt_duration=1236 frames, utt_pad_proportion=0.0608, over 10576.20 utterances.], batch size: 39, lr: 7.15e-03, grad_scale: 8.0 2023-03-08 12:39:16,673 INFO [train2.py:809] (2/4) Epoch 15, batch 3350, loss[ctc_loss=0.1174, att_loss=0.2556, loss=0.228, over 16879.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007679, over 49.00 utterances.], tot_loss[ctc_loss=0.08591, att_loss=0.2412, loss=0.2101, over 3262015.23 frames. 
utt_duration=1267 frames, utt_pad_proportion=0.05289, over 10308.25 utterances.], batch size: 49, lr: 7.15e-03, grad_scale: 8.0 2023-03-08 12:39:27,189 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.236e+02 1.917e+02 2.406e+02 2.955e+02 7.524e+02, threshold=4.811e+02, percent-clipped=3.0 2023-03-08 12:39:30,493 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59131.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:40:22,546 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=59164.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:40:36,060 INFO [train2.py:809] (2/4) Epoch 15, batch 3400, loss[ctc_loss=0.07186, att_loss=0.218, loss=0.1888, over 15355.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.01221, over 35.00 utterances.], tot_loss[ctc_loss=0.08563, att_loss=0.2406, loss=0.2096, over 3259495.36 frames. utt_duration=1280 frames, utt_pad_proportion=0.05112, over 10195.92 utterances.], batch size: 35, lr: 7.15e-03, grad_scale: 8.0 2023-03-08 12:41:50,260 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5013, 2.2670, 4.9453, 3.8297, 2.8253, 4.2672, 4.7725, 4.5629], device='cuda:2'), covar=tensor([0.0247, 0.1882, 0.0154, 0.0957, 0.1880, 0.0233, 0.0104, 0.0229], device='cuda:2'), in_proj_covar=tensor([0.0166, 0.0240, 0.0157, 0.0308, 0.0264, 0.0191, 0.0142, 0.0173], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 12:41:56,068 INFO [train2.py:809] (2/4) Epoch 15, batch 3450, loss[ctc_loss=0.09004, att_loss=0.2468, loss=0.2154, over 16617.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.005977, over 47.00 utterances.], tot_loss[ctc_loss=0.08553, att_loss=0.2414, loss=0.2102, over 3273156.81 frames. utt_duration=1293 frames, utt_pad_proportion=0.0445, over 10140.10 utterances.], batch size: 47, lr: 7.14e-03, grad_scale: 8.0 2023-03-08 12:42:00,138 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=59225.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:42:06,829 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.536e+02 2.401e+02 2.877e+02 3.602e+02 6.308e+02, threshold=5.755e+02, percent-clipped=6.0 2023-03-08 12:42:38,905 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.69 vs. limit=5.0 2023-03-08 12:42:58,741 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59262.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:43:06,934 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9545, 4.1990, 4.2866, 4.5263, 2.7666, 4.4319, 2.4028, 1.7044], device='cuda:2'), covar=tensor([0.0439, 0.0243, 0.0629, 0.0159, 0.1629, 0.0189, 0.1629, 0.1806], device='cuda:2'), in_proj_covar=tensor([0.0162, 0.0138, 0.0252, 0.0127, 0.0223, 0.0120, 0.0226, 0.0198], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 12:43:16,318 INFO [train2.py:809] (2/4) Epoch 15, batch 3500, loss[ctc_loss=0.06662, att_loss=0.2301, loss=0.1974, over 16000.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007931, over 40.00 utterances.], tot_loss[ctc_loss=0.08561, att_loss=0.2412, loss=0.2101, over 3277289.77 frames. 
utt_duration=1292 frames, utt_pad_proportion=0.04341, over 10158.57 utterances.], batch size: 40, lr: 7.14e-03, grad_scale: 8.0 2023-03-08 12:44:35,591 INFO [train2.py:809] (2/4) Epoch 15, batch 3550, loss[ctc_loss=0.09592, att_loss=0.2531, loss=0.2216, over 16960.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007801, over 50.00 utterances.], tot_loss[ctc_loss=0.08541, att_loss=0.2412, loss=0.21, over 3281213.70 frames. utt_duration=1297 frames, utt_pad_proportion=0.0402, over 10132.65 utterances.], batch size: 50, lr: 7.14e-03, grad_scale: 8.0 2023-03-08 12:44:45,171 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 2.222e+02 2.629e+02 3.391e+02 5.578e+02, threshold=5.259e+02, percent-clipped=0.0 2023-03-08 12:44:56,738 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2023-03-08 12:44:57,812 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=59337.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:45:54,579 INFO [train2.py:809] (2/4) Epoch 15, batch 3600, loss[ctc_loss=0.07836, att_loss=0.2154, loss=0.188, over 15645.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008719, over 37.00 utterances.], tot_loss[ctc_loss=0.08699, att_loss=0.2425, loss=0.2114, over 3272053.28 frames. utt_duration=1268 frames, utt_pad_proportion=0.05064, over 10331.05 utterances.], batch size: 37, lr: 7.13e-03, grad_scale: 8.0 2023-03-08 12:46:01,687 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3097, 3.7981, 3.2585, 3.5964, 4.0418, 3.6862, 3.1462, 4.3408], device='cuda:2'), covar=tensor([0.0869, 0.0480, 0.1012, 0.0625, 0.0679, 0.0684, 0.0807, 0.0518], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0201, 0.0217, 0.0190, 0.0260, 0.0229, 0.0193, 0.0278], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 12:46:13,893 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=59385.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:46:40,023 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.69 vs. limit=2.0 2023-03-08 12:46:43,784 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.44 vs. limit=5.0 2023-03-08 12:47:15,176 INFO [train2.py:809] (2/4) Epoch 15, batch 3650, loss[ctc_loss=0.118, att_loss=0.2664, loss=0.2367, over 17345.00 frames. utt_duration=1103 frames, utt_pad_proportion=0.03193, over 63.00 utterances.], tot_loss[ctc_loss=0.08667, att_loss=0.2418, loss=0.2107, over 3270816.04 frames. 
utt_duration=1266 frames, utt_pad_proportion=0.05048, over 10342.77 utterances.], batch size: 63, lr: 7.13e-03, grad_scale: 8.0 2023-03-08 12:47:15,423 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3016, 5.5841, 5.1697, 5.6765, 5.0360, 5.1963, 5.7001, 5.4915], device='cuda:2'), covar=tensor([0.0478, 0.0242, 0.0679, 0.0250, 0.0345, 0.0194, 0.0176, 0.0150], device='cuda:2'), in_proj_covar=tensor([0.0365, 0.0288, 0.0340, 0.0306, 0.0294, 0.0224, 0.0276, 0.0258], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 12:47:24,350 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.406e+02 2.155e+02 2.711e+02 3.466e+02 9.955e+02, threshold=5.422e+02, percent-clipped=4.0 2023-03-08 12:47:27,722 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=59431.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:47:32,110 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0263, 5.3046, 5.5696, 5.4358, 5.5220, 5.9928, 5.2406, 6.0495], device='cuda:2'), covar=tensor([0.0661, 0.0705, 0.0732, 0.1172, 0.1686, 0.0906, 0.0709, 0.0702], device='cuda:2'), in_proj_covar=tensor([0.0800, 0.0473, 0.0552, 0.0613, 0.0806, 0.0564, 0.0458, 0.0547], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 12:48:26,443 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0451, 5.3906, 4.8520, 5.4156, 4.7514, 5.0485, 5.4822, 5.2812], device='cuda:2'), covar=tensor([0.0516, 0.0220, 0.0770, 0.0231, 0.0407, 0.0188, 0.0193, 0.0160], device='cuda:2'), in_proj_covar=tensor([0.0363, 0.0287, 0.0339, 0.0304, 0.0293, 0.0223, 0.0274, 0.0258], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 12:48:34,103 INFO [train2.py:809] (2/4) Epoch 15, batch 3700, loss[ctc_loss=0.09817, att_loss=0.2546, loss=0.2233, over 16892.00 frames. utt_duration=1381 frames, utt_pad_proportion=0.006083, over 49.00 utterances.], tot_loss[ctc_loss=0.08608, att_loss=0.2413, loss=0.2102, over 3266659.92 frames. utt_duration=1279 frames, utt_pad_proportion=0.04918, over 10230.52 utterances.], batch size: 49, lr: 7.13e-03, grad_scale: 8.0 2023-03-08 12:48:43,285 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=59479.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:49:20,200 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=59502.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:49:44,331 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.81 vs. limit=2.0 2023-03-08 12:49:49,035 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59520.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:49:53,480 INFO [train2.py:809] (2/4) Epoch 15, batch 3750, loss[ctc_loss=0.07243, att_loss=0.2151, loss=0.1866, over 15634.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.008555, over 37.00 utterances.], tot_loss[ctc_loss=0.08676, att_loss=0.2422, loss=0.2111, over 3267782.06 frames. 
utt_duration=1255 frames, utt_pad_proportion=0.05416, over 10424.78 utterances.], batch size: 37, lr: 7.12e-03, grad_scale: 8.0 2023-03-08 12:50:02,432 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.260e+02 2.043e+02 2.443e+02 3.044e+02 5.040e+02, threshold=4.885e+02, percent-clipped=0.0 2023-03-08 12:50:54,594 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=59562.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:50:56,184 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=59563.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:51:12,183 INFO [train2.py:809] (2/4) Epoch 15, batch 3800, loss[ctc_loss=0.08628, att_loss=0.2344, loss=0.2047, over 16032.00 frames. utt_duration=1604 frames, utt_pad_proportion=0.005932, over 40.00 utterances.], tot_loss[ctc_loss=0.08657, att_loss=0.2423, loss=0.2111, over 3273386.96 frames. utt_duration=1256 frames, utt_pad_proportion=0.05245, over 10438.82 utterances.], batch size: 40, lr: 7.12e-03, grad_scale: 8.0 2023-03-08 12:52:09,968 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=59610.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:52:31,309 INFO [train2.py:809] (2/4) Epoch 15, batch 3850, loss[ctc_loss=0.08459, att_loss=0.2296, loss=0.2006, over 15639.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.009162, over 37.00 utterances.], tot_loss[ctc_loss=0.08653, att_loss=0.2422, loss=0.2111, over 3270767.16 frames. utt_duration=1266 frames, utt_pad_proportion=0.04992, over 10342.94 utterances.], batch size: 37, lr: 7.12e-03, grad_scale: 8.0 2023-03-08 12:52:40,310 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.434e+02 2.221e+02 2.684e+02 3.228e+02 7.729e+02, threshold=5.369e+02, percent-clipped=7.0 2023-03-08 12:53:46,842 INFO [train2.py:809] (2/4) Epoch 15, batch 3900, loss[ctc_loss=0.08338, att_loss=0.2496, loss=0.2163, over 16608.00 frames. utt_duration=679.5 frames, utt_pad_proportion=0.1485, over 98.00 utterances.], tot_loss[ctc_loss=0.08673, att_loss=0.242, loss=0.211, over 3264227.06 frames. utt_duration=1263 frames, utt_pad_proportion=0.05267, over 10352.20 utterances.], batch size: 98, lr: 7.12e-03, grad_scale: 8.0 2023-03-08 12:55:03,845 INFO [train2.py:809] (2/4) Epoch 15, batch 3950, loss[ctc_loss=0.07515, att_loss=0.2189, loss=0.1902, over 15637.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008582, over 37.00 utterances.], tot_loss[ctc_loss=0.08701, att_loss=0.2425, loss=0.2114, over 3270296.22 frames. utt_duration=1247 frames, utt_pad_proportion=0.05501, over 10505.95 utterances.], batch size: 37, lr: 7.11e-03, grad_scale: 8.0 2023-03-08 12:55:12,749 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.264e+02 1.994e+02 2.418e+02 2.996e+02 5.766e+02, threshold=4.836e+02, percent-clipped=1.0 2023-03-08 12:55:36,040 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1093, 4.3838, 4.5141, 4.6845, 2.7940, 4.6077, 2.5397, 2.1272], device='cuda:2'), covar=tensor([0.0405, 0.0270, 0.0662, 0.0170, 0.1666, 0.0173, 0.1509, 0.1528], device='cuda:2'), in_proj_covar=tensor([0.0163, 0.0137, 0.0249, 0.0127, 0.0221, 0.0120, 0.0224, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 12:56:20,781 INFO [train2.py:809] (2/4) Epoch 16, batch 0, loss[ctc_loss=0.08112, att_loss=0.2472, loss=0.214, over 16341.00 frames. 
utt_duration=1454 frames, utt_pad_proportion=0.005533, over 45.00 utterances.], tot_loss[ctc_loss=0.08112, att_loss=0.2472, loss=0.214, over 16341.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005533, over 45.00 utterances.], batch size: 45, lr: 6.88e-03, grad_scale: 8.0 2023-03-08 12:56:20,781 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 12:56:32,579 INFO [train2.py:843] (2/4) Epoch 16, validation: ctc_loss=0.04399, att_loss=0.2363, loss=0.1978, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 12:56:32,580 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 12:57:52,424 INFO [train2.py:809] (2/4) Epoch 16, batch 50, loss[ctc_loss=0.09703, att_loss=0.2497, loss=0.2192, over 16860.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.007978, over 49.00 utterances.], tot_loss[ctc_loss=0.08551, att_loss=0.2421, loss=0.2108, over 736305.99 frames. utt_duration=1247 frames, utt_pad_proportion=0.05714, over 2364.99 utterances.], batch size: 49, lr: 6.88e-03, grad_scale: 8.0 2023-03-08 12:58:13,663 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=59820.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:58:27,785 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.394e+02 2.112e+02 2.420e+02 3.068e+02 7.828e+02, threshold=4.840e+02, percent-clipped=6.0 2023-03-08 12:58:59,310 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=59848.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:59:12,662 INFO [train2.py:809] (2/4) Epoch 16, batch 100, loss[ctc_loss=0.08434, att_loss=0.2422, loss=0.2106, over 16257.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.00865, over 43.00 utterances.], tot_loss[ctc_loss=0.08483, att_loss=0.2419, loss=0.2105, over 1299422.66 frames. utt_duration=1237 frames, utt_pad_proportion=0.05759, over 4207.69 utterances.], batch size: 43, lr: 6.88e-03, grad_scale: 8.0 2023-03-08 12:59:14,411 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59858.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 12:59:30,342 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=59868.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:00:03,277 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=59889.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:00:31,894 INFO [train2.py:809] (2/4) Epoch 16, batch 150, loss[ctc_loss=0.08462, att_loss=0.2422, loss=0.2107, over 16330.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006202, over 45.00 utterances.], tot_loss[ctc_loss=0.08298, att_loss=0.2402, loss=0.2087, over 1728600.94 frames. utt_duration=1277 frames, utt_pad_proportion=0.05058, over 5420.09 utterances.], batch size: 45, lr: 6.87e-03, grad_scale: 8.0 2023-03-08 13:00:35,482 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=59909.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:01:06,710 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.476e+02 2.211e+02 2.747e+02 3.339e+02 6.489e+02, threshold=5.494e+02, percent-clipped=4.0 2023-03-08 13:01:40,806 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=59950.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:01:51,216 INFO [train2.py:809] (2/4) Epoch 16, batch 200, loss[ctc_loss=0.05938, att_loss=0.2074, loss=0.1778, over 15355.00 frames. 
utt_duration=1756 frames, utt_pad_proportion=0.01223, over 35.00 utterances.], tot_loss[ctc_loss=0.08434, att_loss=0.2411, loss=0.2098, over 2077274.90 frames. utt_duration=1251 frames, utt_pad_proportion=0.05191, over 6651.52 utterances.], batch size: 35, lr: 6.87e-03, grad_scale: 8.0 2023-03-08 13:02:21,907 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8352, 4.7658, 4.6026, 2.9299, 4.5826, 4.4898, 4.1859, 2.7107], device='cuda:2'), covar=tensor([0.0132, 0.0089, 0.0261, 0.0968, 0.0092, 0.0209, 0.0297, 0.1240], device='cuda:2'), in_proj_covar=tensor([0.0067, 0.0094, 0.0091, 0.0108, 0.0077, 0.0103, 0.0097, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 13:02:45,702 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6011, 5.0173, 4.8357, 5.0218, 5.0759, 4.7651, 3.7330, 4.9735], device='cuda:2'), covar=tensor([0.0107, 0.0095, 0.0105, 0.0065, 0.0074, 0.0092, 0.0583, 0.0193], device='cuda:2'), in_proj_covar=tensor([0.0082, 0.0079, 0.0099, 0.0061, 0.0066, 0.0078, 0.0097, 0.0100], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 13:02:47,084 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.08 vs. limit=5.0 2023-03-08 13:03:15,166 INFO [train2.py:809] (2/4) Epoch 16, batch 250, loss[ctc_loss=0.09208, att_loss=0.2394, loss=0.2099, over 15954.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007151, over 41.00 utterances.], tot_loss[ctc_loss=0.08423, att_loss=0.2407, loss=0.2094, over 2343265.01 frames. utt_duration=1242 frames, utt_pad_proportion=0.05453, over 7557.16 utterances.], batch size: 41, lr: 6.87e-03, grad_scale: 8.0 2023-03-08 13:03:31,828 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6471, 1.7770, 2.1362, 2.6498, 2.5265, 2.4116, 2.3631, 2.8749], device='cuda:2'), covar=tensor([0.2001, 0.4082, 0.2971, 0.1653, 0.2922, 0.1640, 0.2711, 0.0876], device='cuda:2'), in_proj_covar=tensor([0.0090, 0.0095, 0.0097, 0.0085, 0.0089, 0.0080, 0.0100, 0.0070], device='cuda:2'), out_proj_covar=tensor([6.5158e-05, 7.0772e-05, 7.3692e-05, 6.3835e-05, 6.4175e-05, 6.2730e-05, 7.2493e-05, 5.5581e-05], device='cuda:2') 2023-03-08 13:03:50,525 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.435e+02 2.138e+02 2.534e+02 3.199e+02 6.195e+02, threshold=5.067e+02, percent-clipped=2.0 2023-03-08 13:04:09,073 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60040.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:04:28,384 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60052.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:04:35,587 INFO [train2.py:809] (2/4) Epoch 16, batch 300, loss[ctc_loss=0.08979, att_loss=0.2392, loss=0.2093, over 16187.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.006439, over 41.00 utterances.], tot_loss[ctc_loss=0.08515, att_loss=0.2414, loss=0.2102, over 2554056.91 frames. 
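Note: the `attn_weights_entropy` dumps are diagnostics of how peaked the self-attention distributions are. The sketch below shows the underlying quantity; how those entropies are aggregated into the printed tensors is an assumption and is not reproduced here:

```python
import torch

def attention_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """Entropy (in nats) of each attention distribution.

    `attn_weights` is assumed non-negative and normalised over its last dimension;
    larger values mean flatter attention, smaller values mean sharper attention.
    """
    p = attn_weights.clamp_min(1e-20)
    return -(p * p.log()).sum(dim=-1)
```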
utt_duration=1245 frames, utt_pad_proportion=0.05318, over 8218.90 utterances.], batch size: 41, lr: 6.87e-03, grad_scale: 16.0 2023-03-08 13:04:49,167 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60065.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:05:09,384 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60078.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:05:46,278 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60101.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 13:05:55,886 INFO [train2.py:809] (2/4) Epoch 16, batch 350, loss[ctc_loss=0.07126, att_loss=0.2153, loss=0.1865, over 15875.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009228, over 39.00 utterances.], tot_loss[ctc_loss=0.08578, att_loss=0.2418, loss=0.2106, over 2714178.75 frames. utt_duration=1236 frames, utt_pad_proportion=0.05514, over 8791.86 utterances.], batch size: 39, lr: 6.86e-03, grad_scale: 16.0 2023-03-08 13:06:05,707 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60113.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 13:06:26,340 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60126.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:06:30,555 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.343e+02 2.169e+02 2.724e+02 3.771e+02 9.216e+02, threshold=5.447e+02, percent-clipped=7.0 2023-03-08 13:06:47,223 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60139.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:06:55,462 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5076, 2.6985, 3.3839, 4.3747, 3.8994, 3.8030, 2.9156, 2.4192], device='cuda:2'), covar=tensor([0.0640, 0.2312, 0.0983, 0.0615, 0.0952, 0.0548, 0.1629, 0.2316], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0215, 0.0191, 0.0202, 0.0211, 0.0169, 0.0198, 0.0181], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 13:07:14,379 INFO [train2.py:809] (2/4) Epoch 16, batch 400, loss[ctc_loss=0.08567, att_loss=0.2407, loss=0.2097, over 16540.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006062, over 45.00 utterances.], tot_loss[ctc_loss=0.08696, att_loss=0.243, loss=0.2118, over 2850534.35 frames. 
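Note: the "Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=..." lines summarise a window of recent gradient norms: five order statistics (min/25%/50%/75%/max), a threshold that in every entry above equals clipping_scale times the reported median, and the share of batches whose norm exceeded that threshold. A sketch of those statistics under that reading (inferred from the logged numbers, not taken from the optimizer code):

```python
import torch

def grad_norm_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Summarise a window of recent gradient norms the way the log lines do."""
    quartiles = torch.quantile(recent_norms.float(),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]          # 2.0 x median in the entries above
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
```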
utt_duration=1234 frames, utt_pad_proportion=0.05228, over 9248.20 utterances.], batch size: 45, lr: 6.86e-03, grad_scale: 8.0 2023-03-08 13:07:16,894 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60158.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:08:16,872 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5505, 2.5922, 4.9632, 3.9435, 3.0663, 4.2789, 4.8644, 4.6234], device='cuda:2'), covar=tensor([0.0206, 0.1699, 0.0192, 0.0984, 0.1855, 0.0241, 0.0112, 0.0207], device='cuda:2'), in_proj_covar=tensor([0.0165, 0.0237, 0.0157, 0.0304, 0.0259, 0.0191, 0.0139, 0.0170], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 13:08:19,862 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60198.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:08:29,146 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60204.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:08:32,211 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60206.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:08:33,620 INFO [train2.py:809] (2/4) Epoch 16, batch 450, loss[ctc_loss=0.08052, att_loss=0.2515, loss=0.2173, over 17287.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01263, over 55.00 utterances.], tot_loss[ctc_loss=0.08698, att_loss=0.243, loss=0.2118, over 2950652.74 frames. utt_duration=1261 frames, utt_pad_proportion=0.04527, over 9371.19 utterances.], batch size: 55, lr: 6.86e-03, grad_scale: 8.0 2023-03-08 13:09:10,571 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.357e+02 2.196e+02 2.695e+02 3.219e+02 5.768e+02, threshold=5.391e+02, percent-clipped=1.0 2023-03-08 13:09:34,776 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60245.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:09:54,249 INFO [train2.py:809] (2/4) Epoch 16, batch 500, loss[ctc_loss=0.07708, att_loss=0.2179, loss=0.1897, over 14227.00 frames. utt_duration=1837 frames, utt_pad_proportion=0.04206, over 31.00 utterances.], tot_loss[ctc_loss=0.08684, att_loss=0.243, loss=0.2118, over 3025682.95 frames. utt_duration=1237 frames, utt_pad_proportion=0.05067, over 9799.09 utterances.], batch size: 31, lr: 6.85e-03, grad_scale: 8.0 2023-03-08 13:09:57,665 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60259.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:10:52,688 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-08 13:11:14,952 INFO [train2.py:809] (2/4) Epoch 16, batch 550, loss[ctc_loss=0.05929, att_loss=0.2217, loss=0.1893, over 15960.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005989, over 41.00 utterances.], tot_loss[ctc_loss=0.08647, att_loss=0.2426, loss=0.2114, over 3082215.24 frames. utt_duration=1242 frames, utt_pad_proportion=0.05082, over 9936.65 utterances.], batch size: 41, lr: 6.85e-03, grad_scale: 8.0 2023-03-08 13:11:30,559 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.74 vs. 
limit=2.0 2023-03-08 13:11:51,581 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.455e+02 1.986e+02 2.421e+02 3.099e+02 6.630e+02, threshold=4.842e+02, percent-clipped=3.0 2023-03-08 13:12:13,592 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0757, 3.6916, 3.6713, 3.2846, 3.6959, 3.8367, 3.6939, 2.5124], device='cuda:2'), covar=tensor([0.1080, 0.1631, 0.3094, 0.4878, 0.1881, 0.2345, 0.1250, 0.6372], device='cuda:2'), in_proj_covar=tensor([0.0130, 0.0152, 0.0163, 0.0228, 0.0127, 0.0220, 0.0139, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 13:12:35,540 INFO [train2.py:809] (2/4) Epoch 16, batch 600, loss[ctc_loss=0.06568, att_loss=0.2293, loss=0.1966, over 16326.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006399, over 45.00 utterances.], tot_loss[ctc_loss=0.08735, att_loss=0.243, loss=0.2119, over 3124948.82 frames. utt_duration=1229 frames, utt_pad_proportion=0.05583, over 10184.34 utterances.], batch size: 45, lr: 6.85e-03, grad_scale: 8.0 2023-03-08 13:12:41,143 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-03-08 13:13:26,365 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9091, 5.2827, 4.8154, 5.3075, 4.7374, 4.9740, 5.4142, 5.2179], device='cuda:2'), covar=tensor([0.0559, 0.0238, 0.0847, 0.0236, 0.0408, 0.0252, 0.0204, 0.0159], device='cuda:2'), in_proj_covar=tensor([0.0365, 0.0291, 0.0345, 0.0302, 0.0298, 0.0227, 0.0280, 0.0262], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 13:13:29,411 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4200, 1.5695, 1.8731, 2.3394, 2.1657, 2.0580, 1.7708, 2.5470], device='cuda:2'), covar=tensor([0.1922, 0.2527, 0.2285, 0.1199, 0.2826, 0.1198, 0.1651, 0.0848], device='cuda:2'), in_proj_covar=tensor([0.0089, 0.0093, 0.0096, 0.0086, 0.0089, 0.0080, 0.0100, 0.0069], device='cuda:2'), out_proj_covar=tensor([6.4698e-05, 7.0130e-05, 7.3426e-05, 6.4212e-05, 6.4492e-05, 6.2557e-05, 7.2977e-05, 5.5236e-05], device='cuda:2') 2023-03-08 13:13:31,585 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2023-03-08 13:13:39,236 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60396.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 13:13:57,398 INFO [train2.py:809] (2/4) Epoch 16, batch 650, loss[ctc_loss=0.07214, att_loss=0.2262, loss=0.1954, over 16181.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006807, over 41.00 utterances.], tot_loss[ctc_loss=0.0871, att_loss=0.2426, loss=0.2115, over 3151186.12 frames. 
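Note: the "Whitening: num_groups=..., num_channels=..., metric=... vs. limit=..." lines compare a measure of how non-white (non-isotropic) the activation covariance is against a limit such as 2.0 or 5.0. The sketch below is one standard way to compute such a metric, equal to 1 for a perfectly white covariance and growing as the spectrum becomes uneven; the exact formulation used in scaling.py is an assumption here:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Deviation of per-group channel covariance from a multiple of the identity.

    x: (num_frames, num_channels) activations. Returns the mean over groups of
    d * sum(C**2) / trace(C)**2, which is 1 iff C is isotropic (white) and larger
    otherwise. This is an assumed formulation for illustration only.
    """
    n, c = x.shape
    d = c // num_groups
    xg = x.reshape(n, num_groups, d).transpose(0, 1)     # (groups, frames, d)
    xg = xg - xg.mean(dim=1, keepdim=True)
    cov = xg.transpose(1, 2) @ xg / n                    # (groups, d, d)
    num = cov.pow(2).sum(dim=(1, 2)) * d
    den = cov.diagonal(dim1=1, dim2=2).sum(dim=1).pow(2)
    return (num / den).mean()
```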
utt_duration=1221 frames, utt_pad_proportion=0.0604, over 10332.08 utterances.], batch size: 41, lr: 6.85e-03, grad_scale: 8.0 2023-03-08 13:13:59,129 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60408.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 13:14:14,115 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6811, 3.4288, 3.3955, 3.0268, 3.4410, 3.4979, 3.4462, 2.3670], device='cuda:2'), covar=tensor([0.1160, 0.1776, 0.2338, 0.4247, 0.1269, 0.4214, 0.1336, 0.5661], device='cuda:2'), in_proj_covar=tensor([0.0130, 0.0151, 0.0162, 0.0227, 0.0127, 0.0217, 0.0138, 0.0193], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 13:14:20,207 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60421.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:14:33,816 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.518e+02 2.189e+02 2.670e+02 3.387e+02 7.027e+02, threshold=5.339e+02, percent-clipped=7.0 2023-03-08 13:14:36,770 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.82 vs. limit=5.0 2023-03-08 13:14:40,854 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60434.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:14:52,968 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.00 vs. limit=5.0 2023-03-08 13:15:17,214 INFO [train2.py:809] (2/4) Epoch 16, batch 700, loss[ctc_loss=0.08901, att_loss=0.2494, loss=0.2173, over 17252.00 frames. utt_duration=875.1 frames, utt_pad_proportion=0.08364, over 79.00 utterances.], tot_loss[ctc_loss=0.08632, att_loss=0.2422, loss=0.211, over 3171331.34 frames. utt_duration=1239 frames, utt_pad_proportion=0.05695, over 10251.10 utterances.], batch size: 79, lr: 6.84e-03, grad_scale: 8.0 2023-03-08 13:16:22,337 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2023-03-08 13:16:33,161 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60504.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:16:37,565 INFO [train2.py:809] (2/4) Epoch 16, batch 750, loss[ctc_loss=0.1069, att_loss=0.2553, loss=0.2256, over 16886.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.00737, over 49.00 utterances.], tot_loss[ctc_loss=0.0863, att_loss=0.2423, loss=0.2111, over 3199351.32 frames. utt_duration=1253 frames, utt_pad_proportion=0.05259, over 10227.40 utterances.], batch size: 49, lr: 6.84e-03, grad_scale: 8.0 2023-03-08 13:17:14,653 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.359e+02 2.152e+02 2.643e+02 3.369e+02 1.101e+03, threshold=5.285e+02, percent-clipped=4.0 2023-03-08 13:17:40,028 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60545.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:17:40,745 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.35 vs. 
limit=5.0 2023-03-08 13:17:50,568 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60552.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:17:54,426 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60554.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:17:57,660 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2637, 2.6425, 3.0770, 4.1934, 3.7032, 3.6670, 2.7297, 2.0720], device='cuda:2'), covar=tensor([0.0686, 0.2083, 0.1077, 0.0558, 0.0815, 0.0551, 0.1578, 0.2311], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0215, 0.0190, 0.0204, 0.0212, 0.0169, 0.0197, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 13:17:58,940 INFO [train2.py:809] (2/4) Epoch 16, batch 800, loss[ctc_loss=0.09589, att_loss=0.2587, loss=0.2261, over 17296.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02468, over 59.00 utterances.], tot_loss[ctc_loss=0.08626, att_loss=0.2422, loss=0.211, over 3212851.02 frames. utt_duration=1238 frames, utt_pad_proportion=0.05713, over 10397.29 utterances.], batch size: 59, lr: 6.84e-03, grad_scale: 8.0 2023-03-08 13:18:56,201 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60593.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:19:00,145 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-03-08 13:19:18,474 INFO [train2.py:809] (2/4) Epoch 16, batch 850, loss[ctc_loss=0.08249, att_loss=0.2442, loss=0.2119, over 17228.00 frames. utt_duration=873.8 frames, utt_pad_proportion=0.08603, over 79.00 utterances.], tot_loss[ctc_loss=0.08624, att_loss=0.2421, loss=0.2109, over 3222953.71 frames. utt_duration=1239 frames, utt_pad_proportion=0.0583, over 10418.71 utterances.], batch size: 79, lr: 6.83e-03, grad_scale: 8.0 2023-03-08 13:19:23,254 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.22 vs. limit=5.0 2023-03-08 13:19:24,895 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-03-08 13:19:55,689 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.256e+02 2.159e+02 2.689e+02 3.192e+02 6.800e+02, threshold=5.378e+02, percent-clipped=2.0 2023-03-08 13:20:39,640 INFO [train2.py:809] (2/4) Epoch 16, batch 900, loss[ctc_loss=0.1054, att_loss=0.2591, loss=0.2284, over 17282.00 frames. utt_duration=1173 frames, utt_pad_proportion=0.02565, over 59.00 utterances.], tot_loss[ctc_loss=0.0865, att_loss=0.2421, loss=0.211, over 3234924.98 frames. utt_duration=1221 frames, utt_pad_proportion=0.06197, over 10611.54 utterances.], batch size: 59, lr: 6.83e-03, grad_scale: 8.0 2023-03-08 13:21:20,733 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. 
limit=2.0 2023-03-08 13:21:39,238 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5236, 2.2606, 4.9284, 3.8758, 2.8501, 4.1731, 4.6161, 4.6468], device='cuda:2'), covar=tensor([0.0211, 0.1918, 0.0159, 0.0872, 0.1889, 0.0263, 0.0133, 0.0205], device='cuda:2'), in_proj_covar=tensor([0.0166, 0.0237, 0.0158, 0.0304, 0.0259, 0.0193, 0.0140, 0.0171], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 13:21:42,289 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60696.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:22:00,172 INFO [train2.py:809] (2/4) Epoch 16, batch 950, loss[ctc_loss=0.1107, att_loss=0.2701, loss=0.2382, over 17277.00 frames. utt_duration=1258 frames, utt_pad_proportion=0.01335, over 55.00 utterances.], tot_loss[ctc_loss=0.08692, att_loss=0.2423, loss=0.2112, over 3240352.76 frames. utt_duration=1219 frames, utt_pad_proportion=0.06363, over 10644.34 utterances.], batch size: 55, lr: 6.83e-03, grad_scale: 8.0 2023-03-08 13:22:02,084 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60708.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:22:07,250 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.29 vs. limit=5.0 2023-03-08 13:22:18,049 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60718.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:22:22,542 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60721.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:22:36,658 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.437e+02 2.156e+02 2.670e+02 3.136e+02 6.539e+02, threshold=5.340e+02, percent-clipped=2.0 2023-03-08 13:22:43,843 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60734.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:22:59,110 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60744.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:23:19,150 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60756.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:23:20,646 INFO [train2.py:809] (2/4) Epoch 16, batch 1000, loss[ctc_loss=0.07594, att_loss=0.236, loss=0.204, over 17361.00 frames. utt_duration=880.4 frames, utt_pad_proportion=0.07716, over 79.00 utterances.], tot_loss[ctc_loss=0.08648, att_loss=0.2425, loss=0.2113, over 3253580.73 frames. utt_duration=1249 frames, utt_pad_proportion=0.05408, over 10435.87 utterances.], batch size: 79, lr: 6.83e-03, grad_scale: 8.0 2023-03-08 13:23:39,233 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60769.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:23:55,729 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60779.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:24:00,514 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60782.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:24:40,433 INFO [train2.py:809] (2/4) Epoch 16, batch 1050, loss[ctc_loss=0.06362, att_loss=0.2145, loss=0.1843, over 15346.00 frames. utt_duration=1755 frames, utt_pad_proportion=0.0116, over 35.00 utterances.], tot_loss[ctc_loss=0.08622, att_loss=0.2416, loss=0.2105, over 3252168.14 frames. 
utt_duration=1265 frames, utt_pad_proportion=0.05142, over 10293.50 utterances.], batch size: 35, lr: 6.82e-03, grad_scale: 8.0 2023-03-08 13:25:17,107 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.408e+02 2.141e+02 2.597e+02 3.263e+02 1.129e+03, threshold=5.194e+02, percent-clipped=2.0 2023-03-08 13:25:47,589 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8484, 3.7183, 3.1356, 3.4277, 3.8891, 3.6273, 2.8670, 4.2877], device='cuda:2'), covar=tensor([0.1102, 0.0485, 0.1077, 0.0684, 0.0680, 0.0669, 0.0960, 0.0490], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0203, 0.0217, 0.0190, 0.0263, 0.0230, 0.0194, 0.0280], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 13:25:56,054 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60854.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:26:00,458 INFO [train2.py:809] (2/4) Epoch 16, batch 1100, loss[ctc_loss=0.08532, att_loss=0.2528, loss=0.2193, over 16955.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008342, over 50.00 utterances.], tot_loss[ctc_loss=0.08599, att_loss=0.2417, loss=0.2106, over 3257214.81 frames. utt_duration=1267 frames, utt_pad_proportion=0.05169, over 10294.56 utterances.], batch size: 50, lr: 6.82e-03, grad_scale: 8.0 2023-03-08 13:26:04,166 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5025, 2.5698, 5.0192, 3.8894, 2.8875, 4.2786, 4.8233, 4.6413], device='cuda:2'), covar=tensor([0.0251, 0.1648, 0.0178, 0.0941, 0.1841, 0.0247, 0.0112, 0.0232], device='cuda:2'), in_proj_covar=tensor([0.0167, 0.0236, 0.0158, 0.0304, 0.0259, 0.0192, 0.0140, 0.0170], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 13:26:12,109 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0305, 5.0380, 5.0044, 2.1173, 1.8460, 2.6731, 2.4321, 3.6955], device='cuda:2'), covar=tensor([0.0702, 0.0246, 0.0212, 0.5164, 0.6222, 0.2894, 0.3271, 0.1927], device='cuda:2'), in_proj_covar=tensor([0.0342, 0.0243, 0.0247, 0.0226, 0.0342, 0.0335, 0.0236, 0.0356], device='cuda:2'), out_proj_covar=tensor([1.4903e-04, 9.0768e-05, 1.0580e-04, 9.8127e-05, 1.4529e-04, 1.3275e-04, 9.4639e-05, 1.4662e-04], device='cuda:2') 2023-03-08 13:27:04,875 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1563, 5.0935, 4.8573, 2.8681, 4.8369, 4.6562, 4.3248, 2.7272], device='cuda:2'), covar=tensor([0.0093, 0.0090, 0.0288, 0.1092, 0.0092, 0.0213, 0.0332, 0.1434], device='cuda:2'), in_proj_covar=tensor([0.0069, 0.0095, 0.0094, 0.0110, 0.0079, 0.0107, 0.0099, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 13:27:12,936 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60902.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:27:21,211 INFO [train2.py:809] (2/4) Epoch 16, batch 1150, loss[ctc_loss=0.07559, att_loss=0.2147, loss=0.1869, over 15644.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008909, over 37.00 utterances.], tot_loss[ctc_loss=0.08611, att_loss=0.2422, loss=0.211, over 3270446.98 frames. 
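Note: the recurring "warmup_begin=..., warmup_end=..., batch_count=..., num_to_drop=..., layers_to_drop=..." lines come from the encoder's stochastic layer skipping: each stack randomly bypasses some of its layers, aggressively early in training and only rarely at this point, hence the long runs of num_to_drop=0 with an occasional 1. The sketch below shows the general shape of such a schedule; the probabilities and the decay rule are assumptions, not the actual zipformer.py logic:

```python
import random

def pick_layers_to_drop(num_layers: int, batch_count: float,
                        warmup_begin: float, warmup_end: float,
                        initial_prob: float = 0.5, final_prob: float = 0.05) -> set:
    """Choose a random subset of layers to bypass for this batch.

    The drop probability decays linearly over [warmup_begin, warmup_end] and then
    stays at a small floor (all numbers here are illustrative assumptions).
    """
    if batch_count <= warmup_begin:
        p = initial_prob
    elif batch_count >= warmup_end:
        p = final_prob
    else:
        frac = (batch_count - warmup_begin) / (warmup_end - warmup_begin)
        p = initial_prob + frac * (final_prob - initial_prob)
    return {i for i in range(num_layers) if random.random() < p}
```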
utt_duration=1265 frames, utt_pad_proportion=0.04898, over 10350.98 utterances.], batch size: 37, lr: 6.82e-03, grad_scale: 8.0 2023-03-08 13:27:42,314 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7783, 4.6815, 4.5716, 4.5710, 5.1469, 4.6574, 4.5388, 2.4245], device='cuda:2'), covar=tensor([0.0181, 0.0263, 0.0284, 0.0266, 0.0942, 0.0190, 0.0305, 0.2120], device='cuda:2'), in_proj_covar=tensor([0.0137, 0.0153, 0.0157, 0.0169, 0.0351, 0.0134, 0.0145, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 13:27:59,215 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 2.021e+02 2.572e+02 3.109e+02 6.628e+02, threshold=5.143e+02, percent-clipped=1.0 2023-03-08 13:28:05,705 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7086, 5.9613, 5.4279, 5.7400, 5.5687, 5.1311, 5.2999, 5.1841], device='cuda:2'), covar=tensor([0.1337, 0.0825, 0.0779, 0.0702, 0.0942, 0.1505, 0.2327, 0.2196], device='cuda:2'), in_proj_covar=tensor([0.0478, 0.0553, 0.0421, 0.0419, 0.0397, 0.0439, 0.0564, 0.0503], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 13:28:42,082 INFO [train2.py:809] (2/4) Epoch 16, batch 1200, loss[ctc_loss=0.09633, att_loss=0.2519, loss=0.2208, over 17311.00 frames. utt_duration=878 frames, utt_pad_proportion=0.08156, over 79.00 utterances.], tot_loss[ctc_loss=0.08496, att_loss=0.2418, loss=0.2105, over 3277821.40 frames. utt_duration=1275 frames, utt_pad_proportion=0.04432, over 10296.97 utterances.], batch size: 79, lr: 6.81e-03, grad_scale: 8.0 2023-03-08 13:30:02,898 INFO [train2.py:809] (2/4) Epoch 16, batch 1250, loss[ctc_loss=0.108, att_loss=0.2535, loss=0.2244, over 17026.00 frames. utt_duration=1311 frames, utt_pad_proportion=0.01032, over 52.00 utterances.], tot_loss[ctc_loss=0.08464, att_loss=0.2414, loss=0.2101, over 3277269.02 frames. utt_duration=1270 frames, utt_pad_proportion=0.04637, over 10336.10 utterances.], batch size: 52, lr: 6.81e-03, grad_scale: 8.0 2023-03-08 13:30:20,098 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.53 vs. 
limit=2.0 2023-03-08 13:30:29,342 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5740, 5.8611, 5.2426, 5.6100, 5.4947, 5.1039, 5.2521, 5.0552], device='cuda:2'), covar=tensor([0.1151, 0.0970, 0.0835, 0.0791, 0.0837, 0.1232, 0.2320, 0.2607], device='cuda:2'), in_proj_covar=tensor([0.0477, 0.0547, 0.0416, 0.0413, 0.0392, 0.0432, 0.0560, 0.0496], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 13:30:39,969 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.300e+02 1.974e+02 2.401e+02 2.960e+02 6.292e+02, threshold=4.803e+02, percent-clipped=4.0 2023-03-08 13:31:03,629 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7073, 4.6299, 4.4532, 2.8044, 4.4263, 4.3704, 3.9566, 2.6350], device='cuda:2'), covar=tensor([0.0104, 0.0103, 0.0284, 0.1010, 0.0103, 0.0231, 0.0340, 0.1347], device='cuda:2'), in_proj_covar=tensor([0.0068, 0.0094, 0.0093, 0.0110, 0.0078, 0.0107, 0.0098, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 13:31:12,738 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6258, 3.2203, 3.8048, 3.0876, 3.7534, 4.7224, 4.5466, 3.4826], device='cuda:2'), covar=tensor([0.0377, 0.1510, 0.1093, 0.1394, 0.0930, 0.0766, 0.0445, 0.1153], device='cuda:2'), in_proj_covar=tensor([0.0238, 0.0235, 0.0265, 0.0208, 0.0253, 0.0335, 0.0241, 0.0228], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 13:31:22,902 INFO [train2.py:809] (2/4) Epoch 16, batch 1300, loss[ctc_loss=0.07755, att_loss=0.2327, loss=0.2017, over 16398.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007047, over 44.00 utterances.], tot_loss[ctc_loss=0.08535, att_loss=0.2422, loss=0.2108, over 3277473.90 frames. utt_duration=1235 frames, utt_pad_proportion=0.05389, over 10626.57 utterances.], batch size: 44, lr: 6.81e-03, grad_scale: 8.0 2023-03-08 13:31:29,553 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=61061.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:31:50,218 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=61074.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:32:42,655 INFO [train2.py:809] (2/4) Epoch 16, batch 1350, loss[ctc_loss=0.05843, att_loss=0.2019, loss=0.1732, over 15373.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.01088, over 35.00 utterances.], tot_loss[ctc_loss=0.08601, att_loss=0.242, loss=0.2108, over 3270826.51 frames. utt_duration=1250 frames, utt_pad_proportion=0.05051, over 10480.67 utterances.], batch size: 35, lr: 6.81e-03, grad_scale: 8.0 2023-03-08 13:33:06,285 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=61122.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:33:18,946 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.675e+02 2.149e+02 2.520e+02 3.082e+02 5.998e+02, threshold=5.039e+02, percent-clipped=2.0 2023-03-08 13:34:02,074 INFO [train2.py:809] (2/4) Epoch 16, batch 1400, loss[ctc_loss=0.05665, att_loss=0.2192, loss=0.1867, over 15993.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007672, over 40.00 utterances.], tot_loss[ctc_loss=0.08551, att_loss=0.2417, loss=0.2104, over 3279385.73 frames. 
utt_duration=1263 frames, utt_pad_proportion=0.04553, over 10402.48 utterances.], batch size: 40, lr: 6.80e-03, grad_scale: 8.0 2023-03-08 13:34:44,051 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.27 vs. limit=5.0 2023-03-08 13:35:21,833 INFO [train2.py:809] (2/4) Epoch 16, batch 1450, loss[ctc_loss=0.08022, att_loss=0.2512, loss=0.217, over 16463.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.006809, over 46.00 utterances.], tot_loss[ctc_loss=0.08528, att_loss=0.2421, loss=0.2107, over 3285096.20 frames. utt_duration=1283 frames, utt_pad_proportion=0.03995, over 10251.34 utterances.], batch size: 46, lr: 6.80e-03, grad_scale: 8.0 2023-03-08 13:35:40,107 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9904, 3.6995, 3.0640, 3.4837, 3.8957, 3.6395, 2.8599, 4.1858], device='cuda:2'), covar=tensor([0.1054, 0.0459, 0.1114, 0.0610, 0.0677, 0.0655, 0.0939, 0.0460], device='cuda:2'), in_proj_covar=tensor([0.0195, 0.0201, 0.0217, 0.0188, 0.0260, 0.0228, 0.0193, 0.0277], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 13:35:58,473 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.469e+02 2.227e+02 2.591e+02 3.129e+02 8.266e+02, threshold=5.182e+02, percent-clipped=3.0 2023-03-08 13:36:12,556 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6105, 2.6074, 2.9073, 4.3886, 4.0261, 4.0444, 2.9223, 2.2321], device='cuda:2'), covar=tensor([0.0569, 0.2203, 0.1364, 0.0651, 0.0722, 0.0370, 0.1527, 0.2234], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0213, 0.0190, 0.0202, 0.0212, 0.0168, 0.0197, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 13:36:40,857 INFO [train2.py:809] (2/4) Epoch 16, batch 1500, loss[ctc_loss=0.09824, att_loss=0.236, loss=0.2084, over 16165.00 frames. utt_duration=1578 frames, utt_pad_proportion=0.007271, over 41.00 utterances.], tot_loss[ctc_loss=0.08534, att_loss=0.2413, loss=0.2101, over 3282621.25 frames. utt_duration=1278 frames, utt_pad_proportion=0.04251, over 10287.34 utterances.], batch size: 41, lr: 6.80e-03, grad_scale: 8.0 2023-03-08 13:36:53,260 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7401, 5.1063, 5.2240, 5.1573, 5.1415, 5.1940, 4.8083, 4.6401], device='cuda:2'), covar=tensor([0.1486, 0.0785, 0.0309, 0.0554, 0.0521, 0.0418, 0.0441, 0.0473], device='cuda:2'), in_proj_covar=tensor([0.0493, 0.0330, 0.0301, 0.0326, 0.0384, 0.0403, 0.0328, 0.0365], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 13:37:59,315 INFO [train2.py:809] (2/4) Epoch 16, batch 1550, loss[ctc_loss=0.09406, att_loss=0.2484, loss=0.2176, over 17328.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02225, over 59.00 utterances.], tot_loss[ctc_loss=0.0853, att_loss=0.2408, loss=0.2097, over 3275537.07 frames. utt_duration=1292 frames, utt_pad_proportion=0.04201, over 10152.94 utterances.], batch size: 59, lr: 6.80e-03, grad_scale: 8.0 2023-03-08 13:38:11,091 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.72 vs. 
limit=2.0 2023-03-08 13:38:25,745 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6216, 4.8531, 4.7897, 4.8394, 4.8713, 4.8923, 4.5401, 4.4061], device='cuda:2'), covar=tensor([0.0896, 0.0531, 0.0356, 0.0451, 0.0279, 0.0311, 0.0357, 0.0324], device='cuda:2'), in_proj_covar=tensor([0.0487, 0.0327, 0.0300, 0.0324, 0.0380, 0.0400, 0.0325, 0.0361], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 13:38:35,927 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.223e+02 2.094e+02 2.601e+02 3.579e+02 7.093e+02, threshold=5.201e+02, percent-clipped=2.0 2023-03-08 13:39:18,922 INFO [train2.py:809] (2/4) Epoch 16, batch 1600, loss[ctc_loss=0.06276, att_loss=0.2151, loss=0.1846, over 15750.00 frames. utt_duration=1659 frames, utt_pad_proportion=0.009269, over 38.00 utterances.], tot_loss[ctc_loss=0.08525, att_loss=0.2411, loss=0.2099, over 3281623.67 frames. utt_duration=1279 frames, utt_pad_proportion=0.04385, over 10274.60 utterances.], batch size: 38, lr: 6.79e-03, grad_scale: 8.0 2023-03-08 13:39:46,410 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=61374.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:40:38,376 INFO [train2.py:809] (2/4) Epoch 16, batch 1650, loss[ctc_loss=0.07216, att_loss=0.2227, loss=0.1926, over 14574.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.03423, over 32.00 utterances.], tot_loss[ctc_loss=0.08531, att_loss=0.241, loss=0.2099, over 3276567.21 frames. utt_duration=1282 frames, utt_pad_proportion=0.04379, over 10234.96 utterances.], batch size: 32, lr: 6.79e-03, grad_scale: 8.0 2023-03-08 13:40:44,764 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7682, 2.2157, 2.1227, 2.6784, 2.9747, 2.5432, 2.0648, 2.7198], device='cuda:2'), covar=tensor([0.1610, 0.3677, 0.3716, 0.1642, 0.1749, 0.1457, 0.3222, 0.1023], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0095, 0.0100, 0.0088, 0.0091, 0.0083, 0.0103, 0.0071], device='cuda:2'), out_proj_covar=tensor([6.6255e-05, 7.1851e-05, 7.5481e-05, 6.5651e-05, 6.6031e-05, 6.4385e-05, 7.4914e-05, 5.6994e-05], device='cuda:2') 2023-03-08 13:40:54,190 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=61417.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:41:02,527 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=61422.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:41:12,276 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.07 vs. limit=5.0 2023-03-08 13:41:14,485 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.484e+02 2.105e+02 2.549e+02 2.944e+02 7.207e+02, threshold=5.098e+02, percent-clipped=3.0 2023-03-08 13:41:38,557 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5487, 2.5925, 5.1460, 3.7972, 2.9765, 4.3735, 4.8794, 4.6074], device='cuda:2'), covar=tensor([0.0212, 0.1608, 0.0143, 0.1126, 0.1840, 0.0207, 0.0111, 0.0225], device='cuda:2'), in_proj_covar=tensor([0.0170, 0.0244, 0.0163, 0.0313, 0.0266, 0.0198, 0.0145, 0.0174], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 13:41:56,931 INFO [train2.py:809] (2/4) Epoch 16, batch 1700, loss[ctc_loss=0.08885, att_loss=0.2388, loss=0.2088, over 16406.00 frames. 
utt_duration=1493 frames, utt_pad_proportion=0.006608, over 44.00 utterances.], tot_loss[ctc_loss=0.08638, att_loss=0.2419, loss=0.2108, over 3280203.40 frames. utt_duration=1278 frames, utt_pad_proportion=0.04392, over 10281.05 utterances.], batch size: 44, lr: 6.79e-03, grad_scale: 8.0 2023-03-08 13:42:01,620 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0775, 5.3512, 5.2820, 5.2796, 5.4068, 5.3814, 5.0376, 4.8145], device='cuda:2'), covar=tensor([0.0971, 0.0480, 0.0282, 0.0515, 0.0272, 0.0271, 0.0362, 0.0308], device='cuda:2'), in_proj_covar=tensor([0.0495, 0.0331, 0.0303, 0.0328, 0.0386, 0.0406, 0.0331, 0.0367], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 13:42:43,848 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-03-08 13:42:54,778 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-08 13:43:00,596 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8855, 6.0916, 5.4381, 5.8853, 5.7304, 5.3264, 5.5578, 5.3265], device='cuda:2'), covar=tensor([0.1111, 0.0839, 0.0887, 0.0731, 0.0830, 0.1410, 0.2145, 0.2275], device='cuda:2'), in_proj_covar=tensor([0.0488, 0.0555, 0.0424, 0.0426, 0.0404, 0.0442, 0.0576, 0.0505], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 13:43:16,832 INFO [train2.py:809] (2/4) Epoch 16, batch 1750, loss[ctc_loss=0.0847, att_loss=0.2319, loss=0.2025, over 16154.00 frames. utt_duration=1578 frames, utt_pad_proportion=0.007168, over 41.00 utterances.], tot_loss[ctc_loss=0.08494, att_loss=0.2406, loss=0.2095, over 3275645.16 frames. utt_duration=1294 frames, utt_pad_proportion=0.04085, over 10139.24 utterances.], batch size: 41, lr: 6.78e-03, grad_scale: 8.0 2023-03-08 13:43:53,597 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.239e+02 1.998e+02 2.470e+02 3.033e+02 6.251e+02, threshold=4.940e+02, percent-clipped=1.0 2023-03-08 13:44:12,686 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.57 vs. limit=5.0 2023-03-08 13:44:14,030 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8749, 3.6090, 3.0848, 3.3394, 3.8267, 3.5172, 2.9417, 4.1802], device='cuda:2'), covar=tensor([0.0991, 0.0496, 0.1042, 0.0665, 0.0721, 0.0742, 0.0819, 0.0419], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0200, 0.0215, 0.0187, 0.0258, 0.0226, 0.0193, 0.0275], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 13:44:36,285 INFO [train2.py:809] (2/4) Epoch 16, batch 1800, loss[ctc_loss=0.09565, att_loss=0.2536, loss=0.222, over 16873.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007302, over 49.00 utterances.], tot_loss[ctc_loss=0.08428, att_loss=0.2404, loss=0.2091, over 3276623.96 frames. 
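Note: every loss line also reports batch composition: the utterance count, `utt_duration` (average utterance length in feature frames), and `utt_pad_proportion` (the share of the padded batch that is padding). A small sketch of those statistics under that reading of the fields; the exact definitions are assumed:

```python
import torch

def batch_padding_stats(feature_lens: torch.Tensor):
    """Batch composition stats from per-utterance frame counts (assumed definitions)."""
    num_utts = feature_lens.numel()
    utt_duration = feature_lens.float().mean()
    padded_total = num_utts * feature_lens.max().float()
    utt_pad_proportion = 1.0 - feature_lens.float().sum() / padded_total
    return num_utts, utt_duration, utt_pad_proportion
```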
utt_duration=1304 frames, utt_pad_proportion=0.03811, over 10060.94 utterances.], batch size: 49, lr: 6.78e-03, grad_scale: 8.0 2023-03-08 13:45:14,732 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4524, 2.4340, 5.0166, 3.7844, 2.8071, 4.2926, 4.7521, 4.5985], device='cuda:2'), covar=tensor([0.0246, 0.1750, 0.0145, 0.1042, 0.1987, 0.0224, 0.0118, 0.0230], device='cuda:2'), in_proj_covar=tensor([0.0171, 0.0246, 0.0163, 0.0315, 0.0268, 0.0199, 0.0146, 0.0175], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 13:45:55,250 INFO [train2.py:809] (2/4) Epoch 16, batch 1850, loss[ctc_loss=0.1132, att_loss=0.2616, loss=0.232, over 17298.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01214, over 55.00 utterances.], tot_loss[ctc_loss=0.08469, att_loss=0.2416, loss=0.2102, over 3283547.09 frames. utt_duration=1288 frames, utt_pad_proportion=0.03964, over 10210.34 utterances.], batch size: 55, lr: 6.78e-03, grad_scale: 8.0 2023-03-08 13:46:31,939 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.537e+02 2.233e+02 2.629e+02 3.133e+02 5.185e+02, threshold=5.258e+02, percent-clipped=1.0 2023-03-08 13:47:14,836 INFO [train2.py:809] (2/4) Epoch 16, batch 1900, loss[ctc_loss=0.09822, att_loss=0.2685, loss=0.2344, over 17385.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03366, over 63.00 utterances.], tot_loss[ctc_loss=0.08587, att_loss=0.2423, loss=0.211, over 3272503.37 frames. utt_duration=1248 frames, utt_pad_proportion=0.05224, over 10497.73 utterances.], batch size: 63, lr: 6.78e-03, grad_scale: 8.0 2023-03-08 13:47:28,946 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2499, 5.2253, 4.9865, 2.8867, 4.9690, 4.7193, 4.4820, 2.8363], device='cuda:2'), covar=tensor([0.0100, 0.0068, 0.0249, 0.1055, 0.0080, 0.0177, 0.0272, 0.1263], device='cuda:2'), in_proj_covar=tensor([0.0068, 0.0094, 0.0092, 0.0109, 0.0079, 0.0106, 0.0097, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 13:48:33,461 INFO [train2.py:809] (2/4) Epoch 16, batch 1950, loss[ctc_loss=0.08426, att_loss=0.2406, loss=0.2093, over 16185.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.006531, over 41.00 utterances.], tot_loss[ctc_loss=0.08635, att_loss=0.2429, loss=0.2116, over 3271570.98 frames. 
utt_duration=1230 frames, utt_pad_proportion=0.05923, over 10650.57 utterances.], batch size: 41, lr: 6.77e-03, grad_scale: 8.0 2023-03-08 13:48:36,785 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0264, 6.2114, 5.6452, 6.0202, 5.8391, 5.4242, 5.7039, 5.4481], device='cuda:2'), covar=tensor([0.1253, 0.0918, 0.0842, 0.0690, 0.0948, 0.1596, 0.2243, 0.2321], device='cuda:2'), in_proj_covar=tensor([0.0487, 0.0556, 0.0425, 0.0426, 0.0408, 0.0446, 0.0576, 0.0504], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 13:48:49,370 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=61717.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:49:09,858 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.327e+02 2.102e+02 2.537e+02 2.891e+02 6.099e+02, threshold=5.073e+02, percent-clipped=2.0 2023-03-08 13:49:13,323 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=61732.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:49:30,498 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=61743.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:49:53,043 INFO [train2.py:809] (2/4) Epoch 16, batch 2000, loss[ctc_loss=0.07665, att_loss=0.2416, loss=0.2086, over 17025.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007788, over 51.00 utterances.], tot_loss[ctc_loss=0.08536, att_loss=0.2426, loss=0.2112, over 3278939.06 frames. utt_duration=1243 frames, utt_pad_proportion=0.05466, over 10565.65 utterances.], batch size: 51, lr: 6.77e-03, grad_scale: 8.0 2023-03-08 13:50:05,416 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=61765.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:50:49,945 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=61793.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:51:08,257 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=61804.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:51:12,563 INFO [train2.py:809] (2/4) Epoch 16, batch 2050, loss[ctc_loss=0.08496, att_loss=0.2526, loss=0.2191, over 17283.00 frames. utt_duration=1258 frames, utt_pad_proportion=0.01233, over 55.00 utterances.], tot_loss[ctc_loss=0.08602, att_loss=0.2423, loss=0.2111, over 3268466.33 frames. utt_duration=1232 frames, utt_pad_proportion=0.05972, over 10628.88 utterances.], batch size: 55, lr: 6.77e-03, grad_scale: 8.0 2023-03-08 13:51:48,695 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.602e+02 2.229e+02 2.633e+02 3.391e+02 8.027e+02, threshold=5.266e+02, percent-clipped=7.0 2023-03-08 13:52:32,063 INFO [train2.py:809] (2/4) Epoch 16, batch 2100, loss[ctc_loss=0.1362, att_loss=0.2692, loss=0.2426, over 13646.00 frames. utt_duration=373 frames, utt_pad_proportion=0.3468, over 147.00 utterances.], tot_loss[ctc_loss=0.08636, att_loss=0.2426, loss=0.2114, over 3268954.43 frames. utt_duration=1205 frames, utt_pad_proportion=0.06711, over 10867.58 utterances.], batch size: 147, lr: 6.77e-03, grad_scale: 8.0 2023-03-08 13:53:51,105 INFO [train2.py:809] (2/4) Epoch 16, batch 2150, loss[ctc_loss=0.07766, att_loss=0.2484, loss=0.2142, over 16625.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.00517, over 47.00 utterances.], tot_loss[ctc_loss=0.08563, att_loss=0.2426, loss=0.2112, over 3270033.17 frames. 
utt_duration=1209 frames, utt_pad_proportion=0.0644, over 10835.16 utterances.], batch size: 47, lr: 6.76e-03, grad_scale: 8.0 2023-03-08 13:54:03,336 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=61914.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:54:09,418 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4499, 2.7349, 3.4850, 4.3658, 3.8911, 3.9497, 2.9098, 1.8460], device='cuda:2'), covar=tensor([0.0601, 0.1990, 0.0893, 0.0672, 0.0746, 0.0411, 0.1481, 0.2524], device='cuda:2'), in_proj_covar=tensor([0.0175, 0.0212, 0.0188, 0.0202, 0.0213, 0.0169, 0.0196, 0.0183], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 13:54:27,521 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.318e+02 2.191e+02 2.731e+02 3.267e+02 7.309e+02, threshold=5.461e+02, percent-clipped=1.0 2023-03-08 13:55:11,221 INFO [train2.py:809] (2/4) Epoch 16, batch 2200, loss[ctc_loss=0.09164, att_loss=0.228, loss=0.2007, over 15476.00 frames. utt_duration=1721 frames, utt_pad_proportion=0.009227, over 36.00 utterances.], tot_loss[ctc_loss=0.0851, att_loss=0.2421, loss=0.2107, over 3275622.27 frames. utt_duration=1235 frames, utt_pad_proportion=0.05639, over 10619.17 utterances.], batch size: 36, lr: 6.76e-03, grad_scale: 8.0 2023-03-08 13:55:40,153 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=61975.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:56:35,310 INFO [train2.py:809] (2/4) Epoch 16, batch 2250, loss[ctc_loss=0.09611, att_loss=0.2565, loss=0.2244, over 16316.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.00684, over 45.00 utterances.], tot_loss[ctc_loss=0.085, att_loss=0.2422, loss=0.2107, over 3274157.57 frames. utt_duration=1235 frames, utt_pad_proportion=0.05683, over 10619.33 utterances.], batch size: 45, lr: 6.76e-03, grad_scale: 8.0 2023-03-08 13:57:11,186 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.510e+02 2.098e+02 2.625e+02 3.264e+02 7.065e+02, threshold=5.250e+02, percent-clipped=5.0 2023-03-08 13:57:11,453 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5527, 4.9552, 5.1637, 4.9907, 5.0712, 5.5192, 4.8435, 5.6296], device='cuda:2'), covar=tensor([0.0757, 0.0741, 0.0788, 0.1232, 0.1763, 0.0933, 0.1045, 0.0610], device='cuda:2'), in_proj_covar=tensor([0.0808, 0.0480, 0.0565, 0.0620, 0.0821, 0.0577, 0.0454, 0.0550], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 13:57:53,150 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2023-03-08 13:57:54,976 INFO [train2.py:809] (2/4) Epoch 16, batch 2300, loss[ctc_loss=0.1228, att_loss=0.2651, loss=0.2367, over 17286.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01273, over 55.00 utterances.], tot_loss[ctc_loss=0.0853, att_loss=0.242, loss=0.2107, over 3278847.09 frames. 
utt_duration=1238 frames, utt_pad_proportion=0.05461, over 10603.69 utterances.], batch size: 55, lr: 6.75e-03, grad_scale: 8.0 2023-03-08 13:58:44,421 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62088.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:59:02,066 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62099.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:59:14,701 INFO [train2.py:809] (2/4) Epoch 16, batch 2350, loss[ctc_loss=0.08888, att_loss=0.2514, loss=0.2189, over 17326.00 frames. utt_duration=878.8 frames, utt_pad_proportion=0.07887, over 79.00 utterances.], tot_loss[ctc_loss=0.08572, att_loss=0.2423, loss=0.211, over 3284331.86 frames. utt_duration=1258 frames, utt_pad_proportion=0.04815, over 10456.13 utterances.], batch size: 79, lr: 6.75e-03, grad_scale: 8.0 2023-03-08 13:59:38,241 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62122.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 13:59:49,925 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.396e+02 2.132e+02 2.401e+02 3.401e+02 9.519e+02, threshold=4.802e+02, percent-clipped=3.0 2023-03-08 14:00:34,133 INFO [train2.py:809] (2/4) Epoch 16, batch 2400, loss[ctc_loss=0.06175, att_loss=0.2201, loss=0.1884, over 16101.00 frames. utt_duration=1535 frames, utt_pad_proportion=0.00768, over 42.00 utterances.], tot_loss[ctc_loss=0.0858, att_loss=0.2419, loss=0.2107, over 3279251.61 frames. utt_duration=1255 frames, utt_pad_proportion=0.04982, over 10466.42 utterances.], batch size: 42, lr: 6.75e-03, grad_scale: 16.0 2023-03-08 14:00:41,444 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. limit=2.0 2023-03-08 14:01:15,281 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62183.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:01:53,490 INFO [train2.py:809] (2/4) Epoch 16, batch 2450, loss[ctc_loss=0.1287, att_loss=0.277, loss=0.2473, over 14731.00 frames. utt_duration=408 frames, utt_pad_proportion=0.2904, over 145.00 utterances.], tot_loss[ctc_loss=0.08667, att_loss=0.2427, loss=0.2115, over 3271587.27 frames. utt_duration=1188 frames, utt_pad_proportion=0.0677, over 11026.38 utterances.], batch size: 145, lr: 6.75e-03, grad_scale: 16.0 2023-03-08 14:02:29,359 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.309e+02 2.234e+02 2.691e+02 3.084e+02 9.437e+02, threshold=5.381e+02, percent-clipped=5.0 2023-03-08 14:03:13,144 INFO [train2.py:809] (2/4) Epoch 16, batch 2500, loss[ctc_loss=0.09155, att_loss=0.261, loss=0.2271, over 16771.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.005567, over 48.00 utterances.], tot_loss[ctc_loss=0.08597, att_loss=0.2425, loss=0.2112, over 3276318.59 frames. utt_duration=1205 frames, utt_pad_proportion=0.06265, over 10890.91 utterances.], batch size: 48, lr: 6.74e-03, grad_scale: 16.0 2023-03-08 14:03:33,442 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62270.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:03:38,191 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62273.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:04:32,515 INFO [train2.py:809] (2/4) Epoch 16, batch 2550, loss[ctc_loss=0.0977, att_loss=0.2546, loss=0.2232, over 16867.00 frames. 
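Note: the `grad_scale` field at the end of each loss line (8.0 through most of this stretch, briefly 16.0) tracks the dynamic loss-scaling factor used for mixed-precision training; the scale grows after a run of overflow-free steps and is backed off when infs/NaNs appear. Below is a minimal sketch using PyTorch's standard GradScaler; reading the value via get_scale() is the standard API, while the claim that this is exactly the logged quantity is an inference from the log, and the batch/criterion names are illustrative placeholders:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0)

def training_step(model, optimizer, batch, criterion):
    """One fp16 step; scaler.get_scale() is the kind of value logged as grad_scale."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()                      # grows or shrinks the scale dynamically
    return loss.detach(), scaler.get_scale()
```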
utt_duration=1378 frames, utt_pad_proportion=0.006896, over 49.00 utterances.], tot_loss[ctc_loss=0.08515, att_loss=0.2418, loss=0.2105, over 3279781.90 frames. utt_duration=1222 frames, utt_pad_proportion=0.05786, over 10748.27 utterances.], batch size: 49, lr: 6.74e-03, grad_scale: 8.0 2023-03-08 14:04:32,742 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1320, 5.4116, 5.7126, 5.6352, 5.5891, 6.0606, 5.2893, 6.1360], device='cuda:2'), covar=tensor([0.0747, 0.0747, 0.0752, 0.1180, 0.1959, 0.0879, 0.0613, 0.0725], device='cuda:2'), in_proj_covar=tensor([0.0793, 0.0469, 0.0552, 0.0606, 0.0807, 0.0561, 0.0445, 0.0542], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 14:05:05,868 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5597, 2.9050, 3.6325, 2.9520, 3.5835, 4.6063, 4.4368, 3.4964], device='cuda:2'), covar=tensor([0.0310, 0.1821, 0.1159, 0.1345, 0.1017, 0.0804, 0.0523, 0.1091], device='cuda:2'), in_proj_covar=tensor([0.0235, 0.0235, 0.0263, 0.0208, 0.0250, 0.0335, 0.0238, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 14:05:09,997 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.185e+02 2.191e+02 2.595e+02 3.145e+02 7.901e+02, threshold=5.189e+02, percent-clipped=4.0 2023-03-08 14:05:15,230 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62334.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:05:52,045 INFO [train2.py:809] (2/4) Epoch 16, batch 2600, loss[ctc_loss=0.1239, att_loss=0.2725, loss=0.2428, over 17054.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.009615, over 53.00 utterances.], tot_loss[ctc_loss=0.08493, att_loss=0.2418, loss=0.2104, over 3282217.85 frames. utt_duration=1238 frames, utt_pad_proportion=0.05373, over 10617.86 utterances.], batch size: 53, lr: 6.74e-03, grad_scale: 8.0 2023-03-08 14:06:41,219 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62388.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:06:45,789 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62391.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:06:58,814 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62399.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:07:11,500 INFO [train2.py:809] (2/4) Epoch 16, batch 2650, loss[ctc_loss=0.07773, att_loss=0.2223, loss=0.1934, over 15783.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007788, over 38.00 utterances.], tot_loss[ctc_loss=0.0849, att_loss=0.2413, loss=0.21, over 3273658.01 frames. utt_duration=1248 frames, utt_pad_proportion=0.05573, over 10508.73 utterances.], batch size: 38, lr: 6.74e-03, grad_scale: 8.0 2023-03-08 14:07:49,390 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.560e+02 2.130e+02 2.553e+02 3.253e+02 7.281e+02, threshold=5.105e+02, percent-clipped=6.0 2023-03-08 14:07:57,927 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62436.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:08:15,335 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62447.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:08:17,685 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. 
limit=2.0 2023-03-08 14:08:24,099 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62452.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:08:31,468 INFO [train2.py:809] (2/4) Epoch 16, batch 2700, loss[ctc_loss=0.1014, att_loss=0.2423, loss=0.2141, over 16128.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005436, over 42.00 utterances.], tot_loss[ctc_loss=0.08429, att_loss=0.2405, loss=0.2092, over 3269005.78 frames. utt_duration=1277 frames, utt_pad_proportion=0.04863, over 10248.67 utterances.], batch size: 42, lr: 6.73e-03, grad_scale: 8.0 2023-03-08 14:09:05,214 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62478.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:09:25,965 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6763, 3.5827, 3.4913, 3.1172, 3.5808, 3.7149, 3.6337, 2.5994], device='cuda:2'), covar=tensor([0.1154, 0.1564, 0.2772, 0.4604, 0.1167, 0.3960, 0.0980, 0.5764], device='cuda:2'), in_proj_covar=tensor([0.0136, 0.0160, 0.0168, 0.0233, 0.0132, 0.0225, 0.0146, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 14:09:34,471 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9185, 3.7579, 3.6805, 3.2324, 3.6704, 3.8708, 3.7306, 2.8665], device='cuda:2'), covar=tensor([0.0976, 0.1200, 0.1791, 0.5247, 0.2344, 0.2107, 0.1065, 0.5321], device='cuda:2'), in_proj_covar=tensor([0.0136, 0.0160, 0.0169, 0.0233, 0.0132, 0.0225, 0.0146, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 14:09:38,065 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6376, 4.8778, 5.3789, 4.8775, 4.6329, 5.4307, 5.0280, 5.4704], device='cuda:2'), covar=tensor([0.1431, 0.1911, 0.1431, 0.2608, 0.4277, 0.2031, 0.1245, 0.1698], device='cuda:2'), in_proj_covar=tensor([0.0790, 0.0468, 0.0553, 0.0606, 0.0805, 0.0559, 0.0444, 0.0541], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 14:09:51,797 INFO [train2.py:809] (2/4) Epoch 16, batch 2750, loss[ctc_loss=0.07676, att_loss=0.2203, loss=0.1916, over 15663.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.00623, over 37.00 utterances.], tot_loss[ctc_loss=0.0842, att_loss=0.2402, loss=0.209, over 3261541.61 frames. utt_duration=1284 frames, utt_pad_proportion=0.04746, over 10172.85 utterances.], batch size: 37, lr: 6.73e-03, grad_scale: 8.0 2023-03-08 14:10:29,273 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.352e+02 2.080e+02 2.636e+02 3.250e+02 8.951e+02, threshold=5.273e+02, percent-clipped=4.0 2023-03-08 14:10:52,932 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62545.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:11:10,992 INFO [train2.py:809] (2/4) Epoch 16, batch 2800, loss[ctc_loss=0.0785, att_loss=0.2335, loss=0.2025, over 16396.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007183, over 44.00 utterances.], tot_loss[ctc_loss=0.08545, att_loss=0.2412, loss=0.21, over 3269802.92 frames. utt_duration=1289 frames, utt_pad_proportion=0.04445, over 10160.16 utterances.], batch size: 44, lr: 6.73e-03, grad_scale: 8.0 2023-03-08 14:11:11,719 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. 
limit=2.0 2023-03-08 14:11:12,879 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1812, 5.1443, 4.9133, 2.8452, 4.9039, 4.8430, 4.6364, 3.0362], device='cuda:2'), covar=tensor([0.0113, 0.0099, 0.0306, 0.1068, 0.0095, 0.0163, 0.0247, 0.1210], device='cuda:2'), in_proj_covar=tensor([0.0069, 0.0096, 0.0093, 0.0109, 0.0079, 0.0106, 0.0098, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 14:11:31,252 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62570.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:11:55,461 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62585.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:12:07,526 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9224, 6.0976, 5.5058, 5.8920, 5.7779, 5.2602, 5.5562, 5.3094], device='cuda:2'), covar=tensor([0.1165, 0.0901, 0.0884, 0.0835, 0.0759, 0.1478, 0.2264, 0.2398], device='cuda:2'), in_proj_covar=tensor([0.0494, 0.0563, 0.0433, 0.0434, 0.0410, 0.0449, 0.0588, 0.0509], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 14:12:29,674 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62606.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:12:30,822 INFO [train2.py:809] (2/4) Epoch 16, batch 2850, loss[ctc_loss=0.07579, att_loss=0.2227, loss=0.1933, over 15866.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.009296, over 39.00 utterances.], tot_loss[ctc_loss=0.0855, att_loss=0.2413, loss=0.2101, over 3270522.30 frames. utt_duration=1253 frames, utt_pad_proportion=0.05322, over 10453.44 utterances.], batch size: 39, lr: 6.72e-03, grad_scale: 8.0 2023-03-08 14:12:48,242 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62618.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:13:06,009 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62629.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:13:09,364 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.381e+02 1.902e+02 2.514e+02 2.955e+02 4.757e+02, threshold=5.028e+02, percent-clipped=0.0 2023-03-08 14:13:32,832 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5367, 2.6688, 3.3882, 4.5182, 3.9618, 4.0674, 2.9370, 2.4793], device='cuda:2'), covar=tensor([0.0632, 0.2349, 0.0976, 0.0502, 0.0775, 0.0433, 0.1452, 0.2132], device='cuda:2'), in_proj_covar=tensor([0.0179, 0.0216, 0.0192, 0.0204, 0.0214, 0.0171, 0.0198, 0.0183], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 14:13:34,642 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62646.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:13:51,158 INFO [train2.py:809] (2/4) Epoch 16, batch 2900, loss[ctc_loss=0.1032, att_loss=0.2587, loss=0.2276, over 16626.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005394, over 47.00 utterances.], tot_loss[ctc_loss=0.08464, att_loss=0.2407, loss=0.2095, over 3268787.15 frames. 
utt_duration=1257 frames, utt_pad_proportion=0.05325, over 10411.68 utterances.], batch size: 47, lr: 6.72e-03, grad_scale: 8.0 2023-03-08 14:14:06,748 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62666.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:15:12,440 INFO [train2.py:809] (2/4) Epoch 16, batch 2950, loss[ctc_loss=0.06991, att_loss=0.2248, loss=0.1938, over 15869.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.01009, over 39.00 utterances.], tot_loss[ctc_loss=0.08402, att_loss=0.2406, loss=0.2093, over 3276444.91 frames. utt_duration=1258 frames, utt_pad_proportion=0.05104, over 10433.98 utterances.], batch size: 39, lr: 6.72e-03, grad_scale: 8.0 2023-03-08 14:15:44,576 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62727.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:15:50,980 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.359e+02 2.017e+02 2.387e+02 2.901e+02 6.735e+02, threshold=4.774e+02, percent-clipped=4.0 2023-03-08 14:16:02,978 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2023-03-08 14:16:10,793 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 14:16:17,628 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62747.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:16:20,673 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7144, 5.9667, 5.3826, 5.7505, 5.5927, 5.2023, 5.4289, 5.1862], device='cuda:2'), covar=tensor([0.1206, 0.0874, 0.0838, 0.0812, 0.0906, 0.1394, 0.2323, 0.2279], device='cuda:2'), in_proj_covar=tensor([0.0493, 0.0560, 0.0431, 0.0432, 0.0409, 0.0446, 0.0580, 0.0508], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 14:16:32,917 INFO [train2.py:809] (2/4) Epoch 16, batch 3000, loss[ctc_loss=0.06182, att_loss=0.2184, loss=0.1871, over 15775.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008542, over 38.00 utterances.], tot_loss[ctc_loss=0.08419, att_loss=0.241, loss=0.2097, over 3275058.49 frames. utt_duration=1213 frames, utt_pad_proportion=0.06243, over 10815.07 utterances.], batch size: 38, lr: 6.72e-03, grad_scale: 8.0 2023-03-08 14:16:32,917 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 14:16:46,718 INFO [train2.py:843] (2/4) Epoch 16, validation: ctc_loss=0.0433, att_loss=0.235, loss=0.1967, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 14:16:46,719 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 14:17:11,758 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62773.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:17:19,166 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62778.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:17:42,012 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-03-08 14:18:06,025 INFO [train2.py:809] (2/4) Epoch 16, batch 3050, loss[ctc_loss=0.07015, att_loss=0.2434, loss=0.2087, over 16694.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005238, over 46.00 utterances.], tot_loss[ctc_loss=0.08352, att_loss=0.2401, loss=0.2088, over 3274457.18 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05519, over 10559.38 utterances.], batch size: 46, lr: 6.71e-03, grad_scale: 8.0 2023-03-08 14:18:28,066 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.79 vs. limit=2.0 2023-03-08 14:18:36,336 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62826.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:18:42,999 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0644, 5.1333, 5.0804, 2.3848, 2.0709, 2.8216, 3.4409, 3.9156], device='cuda:2'), covar=tensor([0.0674, 0.0279, 0.0216, 0.5181, 0.5749, 0.2648, 0.2120, 0.1636], device='cuda:2'), in_proj_covar=tensor([0.0344, 0.0247, 0.0248, 0.0229, 0.0340, 0.0333, 0.0238, 0.0357], device='cuda:2'), out_proj_covar=tensor([1.4967e-04, 9.2764e-05, 1.0723e-04, 9.9625e-05, 1.4458e-04, 1.3161e-04, 9.4979e-05, 1.4730e-04], device='cuda:2') 2023-03-08 14:18:43,970 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.353e+02 2.145e+02 2.502e+02 3.037e+02 6.032e+02, threshold=5.003e+02, percent-clipped=3.0 2023-03-08 14:18:48,972 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62834.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:19:28,202 INFO [train2.py:809] (2/4) Epoch 16, batch 3100, loss[ctc_loss=0.08299, att_loss=0.2562, loss=0.2215, over 17043.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.009521, over 52.00 utterances.], tot_loss[ctc_loss=0.08337, att_loss=0.2401, loss=0.2087, over 3277895.54 frames. utt_duration=1251 frames, utt_pad_proportion=0.05087, over 10497.41 utterances.], batch size: 52, lr: 6.71e-03, grad_scale: 8.0 2023-03-08 14:20:35,237 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0164, 4.1383, 3.8575, 4.1943, 3.8569, 3.7296, 4.2148, 4.1121], device='cuda:2'), covar=tensor([0.0546, 0.0315, 0.0658, 0.0361, 0.0422, 0.0889, 0.0272, 0.0196], device='cuda:2'), in_proj_covar=tensor([0.0368, 0.0291, 0.0344, 0.0309, 0.0298, 0.0221, 0.0279, 0.0261], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0005, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 14:20:42,393 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62901.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:20:51,887 INFO [train2.py:809] (2/4) Epoch 16, batch 3150, loss[ctc_loss=0.08385, att_loss=0.2488, loss=0.2158, over 16326.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006217, over 45.00 utterances.], tot_loss[ctc_loss=0.0836, att_loss=0.2402, loss=0.2089, over 3280735.58 frames. utt_duration=1258 frames, utt_pad_proportion=0.04892, over 10445.08 utterances.], batch size: 45, lr: 6.71e-03, grad_scale: 8.0 2023-03-08 14:21:28,216 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62929.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:21:31,248 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.516e+02 2.078e+02 2.532e+02 3.325e+02 7.691e+02, threshold=5.063e+02, percent-clipped=6.0 2023-03-08 14:21:49,292 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62941.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:22:15,346 INFO [train2.py:809] (2/4) Epoch 16, batch 3200, loss[ctc_loss=0.07796, att_loss=0.218, loss=0.19, over 15745.00 frames. utt_duration=1659 frames, utt_pad_proportion=0.01072, over 38.00 utterances.], tot_loss[ctc_loss=0.08363, att_loss=0.2399, loss=0.2086, over 3275740.42 frames. 
utt_duration=1250 frames, utt_pad_proportion=0.05231, over 10492.00 utterances.], batch size: 38, lr: 6.71e-03, grad_scale: 8.0 2023-03-08 14:22:15,668 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0241, 4.9892, 4.7766, 2.7507, 4.8615, 4.5625, 4.2980, 2.7618], device='cuda:2'), covar=tensor([0.0116, 0.0090, 0.0241, 0.1072, 0.0087, 0.0202, 0.0315, 0.1345], device='cuda:2'), in_proj_covar=tensor([0.0070, 0.0098, 0.0095, 0.0111, 0.0080, 0.0108, 0.0099, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 14:22:47,674 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62977.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:23:19,219 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4353, 4.8586, 4.7227, 4.8439, 4.9760, 4.6095, 3.3458, 4.7279], device='cuda:2'), covar=tensor([0.0117, 0.0107, 0.0127, 0.0086, 0.0088, 0.0111, 0.0746, 0.0190], device='cuda:2'), in_proj_covar=tensor([0.0083, 0.0079, 0.0098, 0.0063, 0.0067, 0.0079, 0.0097, 0.0100], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 14:23:37,496 INFO [train2.py:809] (2/4) Epoch 16, batch 3250, loss[ctc_loss=0.09073, att_loss=0.2499, loss=0.2181, over 16554.00 frames. utt_duration=1473 frames, utt_pad_proportion=0.004955, over 45.00 utterances.], tot_loss[ctc_loss=0.08314, att_loss=0.2399, loss=0.2085, over 3277527.59 frames. utt_duration=1284 frames, utt_pad_proportion=0.04457, over 10223.86 utterances.], batch size: 45, lr: 6.70e-03, grad_scale: 8.0 2023-03-08 14:24:03,051 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63022.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:24:17,457 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.315e+02 2.062e+02 2.466e+02 2.972e+02 8.137e+02, threshold=4.932e+02, percent-clipped=1.0 2023-03-08 14:24:45,795 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63047.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:25:01,615 INFO [train2.py:809] (2/4) Epoch 16, batch 3300, loss[ctc_loss=0.08, att_loss=0.2224, loss=0.1939, over 15622.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.009563, over 37.00 utterances.], tot_loss[ctc_loss=0.08339, att_loss=0.2404, loss=0.209, over 3281567.68 frames. utt_duration=1253 frames, utt_pad_proportion=0.05119, over 10487.23 utterances.], batch size: 37, lr: 6.70e-03, grad_scale: 8.0 2023-03-08 14:26:04,494 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63095.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:26:24,098 INFO [train2.py:809] (2/4) Epoch 16, batch 3350, loss[ctc_loss=0.09981, att_loss=0.262, loss=0.2295, over 17012.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008166, over 51.00 utterances.], tot_loss[ctc_loss=0.08451, att_loss=0.241, loss=0.2097, over 3278771.35 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.05735, over 10631.94 utterances.], batch size: 51, lr: 6.70e-03, grad_scale: 8.0 2023-03-08 14:26:24,467 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0095, 5.0233, 4.7914, 2.4835, 4.8654, 4.6241, 4.1740, 2.7097], device='cuda:2'), covar=tensor([0.0119, 0.0087, 0.0255, 0.1226, 0.0083, 0.0191, 0.0334, 0.1345], device='cuda:2'), in_proj_covar=tensor([0.0070, 0.0097, 0.0095, 0.0110, 0.0080, 0.0107, 0.0099, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 14:26:33,984 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2023-03-08 14:26:49,482 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0936, 5.3997, 4.9157, 5.4648, 4.8243, 5.0890, 5.5159, 5.2797], device='cuda:2'), covar=tensor([0.0532, 0.0243, 0.0731, 0.0230, 0.0377, 0.0186, 0.0208, 0.0157], device='cuda:2'), in_proj_covar=tensor([0.0371, 0.0293, 0.0346, 0.0311, 0.0299, 0.0223, 0.0282, 0.0260], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 14:27:00,460 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63129.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:27:03,353 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.304e+02 2.090e+02 2.534e+02 3.183e+02 5.519e+02, threshold=5.068e+02, percent-clipped=6.0 2023-03-08 14:27:04,402 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2023-03-08 14:27:47,072 INFO [train2.py:809] (2/4) Epoch 16, batch 3400, loss[ctc_loss=0.0905, att_loss=0.2644, loss=0.2296, over 17041.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.01028, over 53.00 utterances.], tot_loss[ctc_loss=0.08445, att_loss=0.2417, loss=0.2103, over 3288148.79 frames. utt_duration=1235 frames, utt_pad_proportion=0.05455, over 10660.70 utterances.], batch size: 53, lr: 6.70e-03, grad_scale: 8.0 2023-03-08 14:27:59,278 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3438, 5.2471, 4.9982, 3.1434, 5.0781, 4.8755, 4.5780, 3.0503], device='cuda:2'), covar=tensor([0.0103, 0.0084, 0.0258, 0.0975, 0.0093, 0.0166, 0.0287, 0.1301], device='cuda:2'), in_proj_covar=tensor([0.0070, 0.0096, 0.0095, 0.0109, 0.0079, 0.0106, 0.0098, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 14:29:01,699 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63201.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:29:11,054 INFO [train2.py:809] (2/4) Epoch 16, batch 3450, loss[ctc_loss=0.09435, att_loss=0.2578, loss=0.2251, over 16779.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005889, over 48.00 utterances.], tot_loss[ctc_loss=0.08453, att_loss=0.2418, loss=0.2104, over 3289479.81 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.0519, over 10595.07 utterances.], batch size: 48, lr: 6.69e-03, grad_scale: 8.0 2023-03-08 14:29:34,425 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63221.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:29:38,928 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9372, 5.1579, 5.0763, 5.0652, 5.2227, 5.1971, 4.8725, 4.6295], device='cuda:2'), covar=tensor([0.1008, 0.0561, 0.0334, 0.0526, 0.0280, 0.0329, 0.0354, 0.0382], device='cuda:2'), in_proj_covar=tensor([0.0482, 0.0325, 0.0302, 0.0318, 0.0376, 0.0398, 0.0322, 0.0358], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 14:29:49,780 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.272e+02 2.022e+02 2.458e+02 2.979e+02 7.741e+02, threshold=4.916e+02, percent-clipped=5.0 2023-03-08 14:30:06,830 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63241.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:30:20,198 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63249.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:30:32,948 INFO [train2.py:809] (2/4) Epoch 16, batch 3500, loss[ctc_loss=0.06123, att_loss=0.2077, loss=0.1784, over 15772.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.009149, over 38.00 utterances.], tot_loss[ctc_loss=0.08429, att_loss=0.2414, loss=0.21, over 3287159.67 frames. utt_duration=1260 frames, utt_pad_proportion=0.04729, over 10445.50 utterances.], batch size: 38, lr: 6.69e-03, grad_scale: 8.0 2023-03-08 14:31:14,065 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63282.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:31:25,098 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63289.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:31:55,978 INFO [train2.py:809] (2/4) Epoch 16, batch 3550, loss[ctc_loss=0.09149, att_loss=0.2441, loss=0.2136, over 16297.00 frames. utt_duration=1517 frames, utt_pad_proportion=0.006457, over 43.00 utterances.], tot_loss[ctc_loss=0.08441, att_loss=0.2411, loss=0.2098, over 3288558.19 frames. utt_duration=1274 frames, utt_pad_proportion=0.04303, over 10340.29 utterances.], batch size: 43, lr: 6.69e-03, grad_scale: 8.0 2023-03-08 14:32:01,343 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4752, 2.4256, 4.9910, 3.8092, 2.9214, 4.1796, 4.6017, 4.5600], device='cuda:2'), covar=tensor([0.0224, 0.1770, 0.0109, 0.0981, 0.1796, 0.0269, 0.0136, 0.0229], device='cuda:2'), in_proj_covar=tensor([0.0173, 0.0244, 0.0162, 0.0311, 0.0264, 0.0197, 0.0143, 0.0173], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 14:32:20,750 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63322.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:32:36,185 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.235e+02 2.051e+02 2.467e+02 3.018e+02 7.226e+02, threshold=4.933e+02, percent-clipped=6.0 2023-03-08 14:33:19,238 INFO [train2.py:809] (2/4) Epoch 16, batch 3600, loss[ctc_loss=0.06887, att_loss=0.2309, loss=0.1985, over 16392.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.006764, over 44.00 utterances.], tot_loss[ctc_loss=0.08432, att_loss=0.2412, loss=0.2098, over 3284687.08 frames. 
utt_duration=1281 frames, utt_pad_proportion=0.04243, over 10265.19 utterances.], batch size: 44, lr: 6.69e-03, grad_scale: 8.0 2023-03-08 14:33:39,692 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63370.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:34:42,617 INFO [train2.py:809] (2/4) Epoch 16, batch 3650, loss[ctc_loss=0.07329, att_loss=0.2172, loss=0.1884, over 15896.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.008459, over 39.00 utterances.], tot_loss[ctc_loss=0.0837, att_loss=0.2411, loss=0.2096, over 3287585.10 frames. utt_duration=1283 frames, utt_pad_proportion=0.04042, over 10257.92 utterances.], batch size: 39, lr: 6.68e-03, grad_scale: 8.0 2023-03-08 14:35:03,589 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63420.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:35:06,811 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63422.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:35:19,552 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63429.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:35:22,396 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.472e+02 2.142e+02 2.479e+02 3.214e+02 4.969e+02, threshold=4.958e+02, percent-clipped=1.0 2023-03-08 14:36:06,126 INFO [train2.py:809] (2/4) Epoch 16, batch 3700, loss[ctc_loss=0.1104, att_loss=0.2428, loss=0.2163, over 16341.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005472, over 45.00 utterances.], tot_loss[ctc_loss=0.08442, att_loss=0.2412, loss=0.2099, over 3274857.16 frames. utt_duration=1240 frames, utt_pad_proportion=0.05194, over 10576.31 utterances.], batch size: 45, lr: 6.68e-03, grad_scale: 8.0 2023-03-08 14:36:40,564 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63477.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:36:47,212 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63481.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:36:50,269 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63483.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:36:59,167 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7561, 3.5224, 3.5069, 3.0510, 3.6038, 3.5053, 3.6050, 2.6810], device='cuda:2'), covar=tensor([0.0965, 0.1729, 0.2124, 0.4423, 0.1318, 0.2502, 0.1009, 0.4317], device='cuda:2'), in_proj_covar=tensor([0.0137, 0.0160, 0.0171, 0.0236, 0.0131, 0.0229, 0.0146, 0.0198], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 14:37:29,951 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-08 14:37:30,409 INFO [train2.py:809] (2/4) Epoch 16, batch 3750, loss[ctc_loss=0.06764, att_loss=0.2243, loss=0.1929, over 16017.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.006707, over 40.00 utterances.], tot_loss[ctc_loss=0.08338, att_loss=0.24, loss=0.2087, over 3267240.87 frames. 
utt_duration=1272 frames, utt_pad_proportion=0.04819, over 10289.61 utterances.], batch size: 40, lr: 6.68e-03, grad_scale: 8.0 2023-03-08 14:37:53,642 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5238, 2.4415, 4.9954, 3.6036, 3.0695, 4.1914, 4.6778, 4.6240], device='cuda:2'), covar=tensor([0.0215, 0.1738, 0.0113, 0.1119, 0.1723, 0.0236, 0.0130, 0.0212], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0241, 0.0161, 0.0309, 0.0262, 0.0196, 0.0143, 0.0172], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 14:38:09,923 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.475e+02 2.179e+02 2.509e+02 3.051e+02 5.574e+02, threshold=5.018e+02, percent-clipped=2.0 2023-03-08 14:38:52,443 INFO [train2.py:809] (2/4) Epoch 16, batch 3800, loss[ctc_loss=0.08633, att_loss=0.2524, loss=0.2192, over 16349.00 frames. utt_duration=1455 frames, utt_pad_proportion=0.005107, over 45.00 utterances.], tot_loss[ctc_loss=0.08284, att_loss=0.2393, loss=0.208, over 3265911.44 frames. utt_duration=1283 frames, utt_pad_proportion=0.04611, over 10191.34 utterances.], batch size: 45, lr: 6.67e-03, grad_scale: 8.0 2023-03-08 14:39:11,915 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63569.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:39:25,935 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63577.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:39:41,254 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63586.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:40:15,163 INFO [train2.py:809] (2/4) Epoch 16, batch 3850, loss[ctc_loss=0.1168, att_loss=0.2693, loss=0.2388, over 17429.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.03132, over 63.00 utterances.], tot_loss[ctc_loss=0.08355, att_loss=0.2403, loss=0.2089, over 3268879.66 frames. utt_duration=1261 frames, utt_pad_proportion=0.0509, over 10383.41 utterances.], batch size: 63, lr: 6.67e-03, grad_scale: 8.0 2023-03-08 14:40:52,485 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63630.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:40:53,568 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.309e+02 2.062e+02 2.555e+02 3.342e+02 5.623e+02, threshold=5.109e+02, percent-clipped=2.0 2023-03-08 14:41:19,866 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63647.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:41:35,251 INFO [train2.py:809] (2/4) Epoch 16, batch 3900, loss[ctc_loss=0.09903, att_loss=0.2558, loss=0.2244, over 17058.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008003, over 52.00 utterances.], tot_loss[ctc_loss=0.08263, att_loss=0.2395, loss=0.2081, over 3255171.96 frames. utt_duration=1276 frames, utt_pad_proportion=0.05129, over 10218.71 utterances.], batch size: 52, lr: 6.67e-03, grad_scale: 8.0 2023-03-08 14:42:21,145 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2023-03-08 14:42:54,343 INFO [train2.py:809] (2/4) Epoch 16, batch 3950, loss[ctc_loss=0.06891, att_loss=0.2235, loss=0.1926, over 16174.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006489, over 41.00 utterances.], tot_loss[ctc_loss=0.0826, att_loss=0.2395, loss=0.2081, over 3261377.72 frames. 
utt_duration=1258 frames, utt_pad_proportion=0.05365, over 10378.53 utterances.], batch size: 41, lr: 6.67e-03, grad_scale: 8.0 2023-03-08 14:43:31,754 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.446e+02 2.047e+02 2.429e+02 3.327e+02 6.424e+02, threshold=4.857e+02, percent-clipped=3.0 2023-03-08 14:44:08,716 INFO [train2.py:809] (2/4) Epoch 17, batch 0, loss[ctc_loss=0.08783, att_loss=0.2494, loss=0.2171, over 16479.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.005981, over 46.00 utterances.], tot_loss[ctc_loss=0.08783, att_loss=0.2494, loss=0.2171, over 16479.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.005981, over 46.00 utterances.], batch size: 46, lr: 6.46e-03, grad_scale: 8.0 2023-03-08 14:44:08,716 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 14:44:15,261 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8159, 4.6029, 4.5774, 2.2872, 1.9447, 2.7246, 1.9964, 3.6670], device='cuda:2'), covar=tensor([0.0728, 0.0241, 0.0254, 0.4712, 0.5899, 0.2692, 0.3944, 0.1443], device='cuda:2'), in_proj_covar=tensor([0.0340, 0.0244, 0.0249, 0.0225, 0.0338, 0.0329, 0.0239, 0.0352], device='cuda:2'), out_proj_covar=tensor([1.4724e-04, 9.0898e-05, 1.0734e-04, 9.7254e-05, 1.4338e-04, 1.2993e-04, 9.5658e-05, 1.4504e-04], device='cuda:2') 2023-03-08 14:44:15,625 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4458, 5.0171, 4.9873, 4.8351, 4.8915, 5.3813, 4.9924, 5.4728], device='cuda:2'), covar=tensor([0.0704, 0.0663, 0.0828, 0.1268, 0.1981, 0.0894, 0.0464, 0.0612], device='cuda:2'), in_proj_covar=tensor([0.0799, 0.0469, 0.0555, 0.0615, 0.0814, 0.0568, 0.0449, 0.0546], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 14:44:18,664 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7385, 3.5034, 3.4334, 3.0902, 3.5390, 3.4825, 3.5333, 2.5506], device='cuda:2'), covar=tensor([0.0881, 0.1558, 0.3108, 0.3615, 0.1146, 0.2223, 0.0944, 0.4767], device='cuda:2'), in_proj_covar=tensor([0.0136, 0.0158, 0.0170, 0.0232, 0.0131, 0.0227, 0.0145, 0.0197], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 14:44:21,756 INFO [train2.py:843] (2/4) Epoch 17, validation: ctc_loss=0.04327, att_loss=0.2362, loss=0.1976, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 14:44:21,757 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 14:44:22,948 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. 
limit=2.0 2023-03-08 14:44:39,657 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1540, 3.9148, 3.2914, 3.3684, 4.1160, 3.7246, 2.9689, 4.4398], device='cuda:2'), covar=tensor([0.0984, 0.0483, 0.1020, 0.0758, 0.0669, 0.0678, 0.0926, 0.0422], device='cuda:2'), in_proj_covar=tensor([0.0191, 0.0201, 0.0214, 0.0187, 0.0258, 0.0226, 0.0190, 0.0271], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 14:45:20,161 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63776.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:45:23,476 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63778.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:45:44,537 INFO [train2.py:809] (2/4) Epoch 17, batch 50, loss[ctc_loss=0.06464, att_loss=0.2199, loss=0.1888, over 12262.00 frames. utt_duration=1818 frames, utt_pad_proportion=0.03248, over 27.00 utterances.], tot_loss[ctc_loss=0.08173, att_loss=0.2386, loss=0.2073, over 737708.80 frames. utt_duration=1205 frames, utt_pad_proportion=0.05989, over 2452.67 utterances.], batch size: 27, lr: 6.46e-03, grad_scale: 8.0 2023-03-08 14:46:15,467 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5390, 3.6750, 3.5681, 3.4576, 3.8815, 3.5164, 3.3777, 2.5741], device='cuda:2'), covar=tensor([0.0361, 0.0441, 0.0433, 0.0470, 0.0744, 0.0333, 0.0418, 0.1594], device='cuda:2'), in_proj_covar=tensor([0.0141, 0.0157, 0.0163, 0.0175, 0.0353, 0.0138, 0.0147, 0.0215], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 14:46:17,011 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63810.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:46:43,424 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6987, 2.1497, 2.1651, 2.4332, 2.6219, 2.4471, 2.0037, 2.8412], device='cuda:2'), covar=tensor([0.1731, 0.3574, 0.3453, 0.1963, 0.2125, 0.1415, 0.3117, 0.1101], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0103, 0.0109, 0.0097, 0.0100, 0.0088, 0.0109, 0.0079], device='cuda:2'), out_proj_covar=tensor([7.1188e-05, 7.8097e-05, 8.2864e-05, 7.2003e-05, 7.2522e-05, 6.9419e-05, 8.0158e-05, 6.2930e-05], device='cuda:2') 2023-03-08 14:46:50,928 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 2.041e+02 2.465e+02 3.095e+02 5.834e+02, threshold=4.929e+02, percent-clipped=2.0 2023-03-08 14:46:56,123 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7710, 4.7549, 4.4707, 2.6462, 4.6090, 4.4746, 3.9003, 2.2806], device='cuda:2'), covar=tensor([0.0131, 0.0118, 0.0305, 0.1215, 0.0098, 0.0233, 0.0434, 0.1727], device='cuda:2'), in_proj_covar=tensor([0.0070, 0.0097, 0.0095, 0.0110, 0.0080, 0.0107, 0.0099, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 14:47:08,059 INFO [train2.py:809] (2/4) Epoch 17, batch 100, loss[ctc_loss=0.06674, att_loss=0.2032, loss=0.1759, over 15424.00 frames. utt_duration=1764 frames, utt_pad_proportion=0.007316, over 35.00 utterances.], tot_loss[ctc_loss=0.08528, att_loss=0.2421, loss=0.2107, over 1301330.29 frames. 
utt_duration=1136 frames, utt_pad_proportion=0.07754, over 4586.93 utterances.], batch size: 35, lr: 6.46e-03, grad_scale: 8.0 2023-03-08 14:47:16,171 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63845.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:47:58,810 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63871.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:48:08,254 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63877.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:48:31,342 INFO [train2.py:809] (2/4) Epoch 17, batch 150, loss[ctc_loss=0.09233, att_loss=0.2358, loss=0.2071, over 16176.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.00655, over 41.00 utterances.], tot_loss[ctc_loss=0.08526, att_loss=0.2426, loss=0.2111, over 1735618.97 frames. utt_duration=1159 frames, utt_pad_proportion=0.07565, over 5996.87 utterances.], batch size: 41, lr: 6.46e-03, grad_scale: 4.0 2023-03-08 14:48:56,221 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63906.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:49:12,706 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63916.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:49:17,345 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5474, 4.9713, 4.8135, 4.8688, 4.9943, 4.7062, 3.5242, 4.9233], device='cuda:2'), covar=tensor([0.0113, 0.0098, 0.0114, 0.0071, 0.0084, 0.0102, 0.0679, 0.0178], device='cuda:2'), in_proj_covar=tensor([0.0083, 0.0079, 0.0100, 0.0063, 0.0067, 0.0079, 0.0097, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 14:49:26,807 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63925.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:49:26,915 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63925.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:49:37,549 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.362e+02 1.983e+02 2.552e+02 2.961e+02 5.419e+02, threshold=5.104e+02, percent-clipped=2.0 2023-03-08 14:49:52,618 INFO [train2.py:809] (2/4) Epoch 17, batch 200, loss[ctc_loss=0.06692, att_loss=0.2323, loss=0.1992, over 16017.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.007492, over 40.00 utterances.], tot_loss[ctc_loss=0.0842, att_loss=0.2422, loss=0.2106, over 2074182.75 frames. utt_duration=1203 frames, utt_pad_proportion=0.06693, over 6903.77 utterances.], batch size: 40, lr: 6.45e-03, grad_scale: 4.0 2023-03-08 14:49:55,674 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63942.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:50:52,543 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63977.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:51:15,255 INFO [train2.py:809] (2/4) Epoch 17, batch 250, loss[ctc_loss=0.07165, att_loss=0.2118, loss=0.1837, over 15647.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008782, over 37.00 utterances.], tot_loss[ctc_loss=0.08423, att_loss=0.2429, loss=0.2112, over 2345857.72 frames. 
utt_duration=1226 frames, utt_pad_proportion=0.06062, over 7664.86 utterances.], batch size: 37, lr: 6.45e-03, grad_scale: 4.0 2023-03-08 14:51:19,680 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-08 14:52:14,228 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64023.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:52:28,263 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.355e+02 2.064e+02 2.460e+02 3.052e+02 5.985e+02, threshold=4.919e+02, percent-clipped=1.0 2023-03-08 14:52:43,727 INFO [train2.py:809] (2/4) Epoch 17, batch 300, loss[ctc_loss=0.08779, att_loss=0.2507, loss=0.2181, over 16950.00 frames. utt_duration=686.4 frames, utt_pad_proportion=0.1377, over 99.00 utterances.], tot_loss[ctc_loss=0.0834, att_loss=0.2416, loss=0.21, over 2555456.46 frames. utt_duration=1217 frames, utt_pad_proportion=0.06066, over 8407.30 utterances.], batch size: 99, lr: 6.45e-03, grad_scale: 4.0 2023-03-08 14:53:01,292 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5151, 4.9734, 4.7500, 4.9524, 5.0191, 4.6206, 3.4422, 4.8952], device='cuda:2'), covar=tensor([0.0111, 0.0085, 0.0132, 0.0068, 0.0088, 0.0120, 0.0655, 0.0166], device='cuda:2'), in_proj_covar=tensor([0.0082, 0.0078, 0.0098, 0.0062, 0.0066, 0.0078, 0.0096, 0.0099], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 14:53:21,266 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.37 vs. limit=5.0 2023-03-08 14:53:39,776 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64076.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:53:42,899 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64078.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:53:52,362 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64084.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:53:57,543 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0642, 5.3153, 5.3121, 5.1719, 5.3386, 5.3406, 5.0323, 4.7950], device='cuda:2'), covar=tensor([0.1104, 0.0515, 0.0249, 0.0460, 0.0293, 0.0308, 0.0349, 0.0354], device='cuda:2'), in_proj_covar=tensor([0.0490, 0.0331, 0.0307, 0.0323, 0.0383, 0.0402, 0.0328, 0.0363], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 14:54:04,130 INFO [train2.py:809] (2/4) Epoch 17, batch 350, loss[ctc_loss=0.1012, att_loss=0.2607, loss=0.2288, over 16758.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.007102, over 48.00 utterances.], tot_loss[ctc_loss=0.08462, att_loss=0.2427, loss=0.2111, over 2717557.38 frames. 
utt_duration=1210 frames, utt_pad_proportion=0.06071, over 8995.68 utterances.], batch size: 48, lr: 6.45e-03, grad_scale: 4.0 2023-03-08 14:54:56,967 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64124.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:55:00,047 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64126.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:55:09,305 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.372e+02 2.060e+02 2.368e+02 3.330e+02 9.339e+02, threshold=4.736e+02, percent-clipped=6.0 2023-03-08 14:55:25,122 INFO [train2.py:809] (2/4) Epoch 17, batch 400, loss[ctc_loss=0.1353, att_loss=0.2693, loss=0.2425, over 14494.00 frames. utt_duration=401.5 frames, utt_pad_proportion=0.3017, over 145.00 utterances.], tot_loss[ctc_loss=0.08428, att_loss=0.2419, loss=0.2104, over 2838407.61 frames. utt_duration=1226 frames, utt_pad_proportion=0.05924, over 9274.42 utterances.], batch size: 145, lr: 6.44e-03, grad_scale: 8.0 2023-03-08 14:56:07,200 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64166.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:56:36,181 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2703, 2.3743, 4.4067, 3.7077, 2.9253, 3.9605, 4.0943, 4.1220], device='cuda:2'), covar=tensor([0.0199, 0.1670, 0.0135, 0.0896, 0.1670, 0.0269, 0.0189, 0.0309], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0240, 0.0165, 0.0308, 0.0263, 0.0197, 0.0144, 0.0173], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 14:56:47,842 INFO [train2.py:809] (2/4) Epoch 17, batch 450, loss[ctc_loss=0.09145, att_loss=0.2454, loss=0.2146, over 16888.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006274, over 49.00 utterances.], tot_loss[ctc_loss=0.08444, att_loss=0.2422, loss=0.2106, over 2950134.20 frames. utt_duration=1230 frames, utt_pad_proportion=0.05282, over 9604.21 utterances.], batch size: 49, lr: 6.44e-03, grad_scale: 8.0 2023-03-08 14:57:04,188 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64201.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:57:42,947 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64225.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:57:53,584 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.502e+02 2.140e+02 2.597e+02 3.193e+02 5.967e+02, threshold=5.195e+02, percent-clipped=4.0 2023-03-08 14:57:54,046 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64232.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:58:09,212 INFO [train2.py:809] (2/4) Epoch 17, batch 500, loss[ctc_loss=0.0763, att_loss=0.2415, loss=0.2085, over 16699.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.00574, over 46.00 utterances.], tot_loss[ctc_loss=0.08391, att_loss=0.242, loss=0.2104, over 3022043.20 frames. 
utt_duration=1224 frames, utt_pad_proportion=0.05477, over 9890.38 utterances.], batch size: 46, lr: 6.44e-03, grad_scale: 8.0 2023-03-08 14:58:11,140 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64242.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:58:58,189 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64272.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:58:59,753 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64273.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:59:10,955 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5009, 4.9440, 4.7431, 5.0529, 5.0149, 4.6502, 3.4639, 4.9220], device='cuda:2'), covar=tensor([0.0124, 0.0097, 0.0143, 0.0066, 0.0083, 0.0119, 0.0680, 0.0174], device='cuda:2'), in_proj_covar=tensor([0.0084, 0.0081, 0.0100, 0.0063, 0.0067, 0.0079, 0.0097, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0003], device='cuda:2') 2023-03-08 14:59:27,931 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64290.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 14:59:29,444 INFO [train2.py:809] (2/4) Epoch 17, batch 550, loss[ctc_loss=0.07746, att_loss=0.2248, loss=0.1953, over 15498.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008897, over 36.00 utterances.], tot_loss[ctc_loss=0.08333, att_loss=0.2416, loss=0.21, over 3082401.23 frames. utt_duration=1220 frames, utt_pad_proportion=0.05619, over 10118.44 utterances.], batch size: 36, lr: 6.44e-03, grad_scale: 8.0 2023-03-08 14:59:33,713 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64293.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:00:34,620 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.348e+02 1.879e+02 2.342e+02 2.961e+02 5.931e+02, threshold=4.683e+02, percent-clipped=3.0 2023-03-08 15:00:50,370 INFO [train2.py:809] (2/4) Epoch 17, batch 600, loss[ctc_loss=0.08699, att_loss=0.2397, loss=0.2092, over 16396.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007394, over 44.00 utterances.], tot_loss[ctc_loss=0.08342, att_loss=0.2413, loss=0.2097, over 3120481.21 frames. utt_duration=1213 frames, utt_pad_proportion=0.06107, over 10300.18 utterances.], batch size: 44, lr: 6.43e-03, grad_scale: 8.0 2023-03-08 15:00:57,637 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64345.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:01:42,118 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7362, 5.9830, 5.5026, 5.7545, 5.6048, 5.2408, 5.4563, 5.3133], device='cuda:2'), covar=tensor([0.1384, 0.0878, 0.0859, 0.0820, 0.1019, 0.1459, 0.2337, 0.2085], device='cuda:2'), in_proj_covar=tensor([0.0497, 0.0568, 0.0432, 0.0431, 0.0412, 0.0456, 0.0586, 0.0508], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 15:01:51,612 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64379.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:02:12,267 INFO [train2.py:809] (2/4) Epoch 17, batch 650, loss[ctc_loss=0.1165, att_loss=0.235, loss=0.2113, over 16185.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.005434, over 41.00 utterances.], tot_loss[ctc_loss=0.08266, att_loss=0.2404, loss=0.2089, over 3153738.60 frames. 
utt_duration=1226 frames, utt_pad_proportion=0.05932, over 10304.42 utterances.], batch size: 41, lr: 6.43e-03, grad_scale: 8.0 2023-03-08 15:02:36,764 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64406.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:03:17,364 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.378e+02 2.002e+02 2.471e+02 3.095e+02 7.929e+02, threshold=4.943e+02, percent-clipped=8.0 2023-03-08 15:03:32,610 INFO [train2.py:809] (2/4) Epoch 17, batch 700, loss[ctc_loss=0.07961, att_loss=0.2277, loss=0.1981, over 15374.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01055, over 35.00 utterances.], tot_loss[ctc_loss=0.08304, att_loss=0.2404, loss=0.2089, over 3176811.24 frames. utt_duration=1226 frames, utt_pad_proportion=0.05936, over 10373.28 utterances.], batch size: 35, lr: 6.43e-03, grad_scale: 8.0 2023-03-08 15:04:09,386 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4293, 2.7514, 2.9902, 4.3898, 3.9430, 4.0348, 2.9207, 2.1658], device='cuda:2'), covar=tensor([0.0619, 0.1982, 0.1270, 0.0495, 0.0748, 0.0434, 0.1347, 0.2247], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0218, 0.0190, 0.0206, 0.0215, 0.0172, 0.0200, 0.0188], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 15:04:12,656 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64465.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:04:14,121 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64466.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:04:55,323 INFO [train2.py:809] (2/4) Epoch 17, batch 750, loss[ctc_loss=0.06743, att_loss=0.2301, loss=0.1976, over 16549.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.005792, over 45.00 utterances.], tot_loss[ctc_loss=0.0831, att_loss=0.2404, loss=0.2089, over 3193785.50 frames. utt_duration=1248 frames, utt_pad_proportion=0.05496, over 10251.45 utterances.], batch size: 45, lr: 6.43e-03, grad_scale: 8.0 2023-03-08 15:05:07,165 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.32 vs. limit=5.0 2023-03-08 15:05:11,194 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64501.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:05:18,270 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7480, 2.2094, 5.1281, 4.1634, 3.0645, 4.3549, 4.8830, 4.8104], device='cuda:2'), covar=tensor([0.0149, 0.1606, 0.0106, 0.0722, 0.1690, 0.0167, 0.0094, 0.0154], device='cuda:2'), in_proj_covar=tensor([0.0171, 0.0237, 0.0163, 0.0303, 0.0259, 0.0194, 0.0144, 0.0172], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 15:05:31,729 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64514.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:05:51,395 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64526.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:06:01,019 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 2.298e+02 2.805e+02 3.347e+02 6.423e+02, threshold=5.609e+02, percent-clipped=6.0 2023-03-08 15:06:16,286 INFO [train2.py:809] (2/4) Epoch 17, batch 800, loss[ctc_loss=0.08531, att_loss=0.222, loss=0.1947, over 15653.00 frames. 
utt_duration=1694 frames, utt_pad_proportion=0.008292, over 37.00 utterances.], tot_loss[ctc_loss=0.0837, att_loss=0.2403, loss=0.209, over 3209544.60 frames. utt_duration=1247 frames, utt_pad_proportion=0.05559, over 10306.11 utterances.], batch size: 37, lr: 6.42e-03, grad_scale: 8.0 2023-03-08 15:06:28,798 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64549.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:07:05,774 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64572.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:07:32,191 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64588.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:07:33,848 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64589.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:07:36,540 INFO [train2.py:809] (2/4) Epoch 17, batch 850, loss[ctc_loss=0.071, att_loss=0.2256, loss=0.1947, over 15949.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007244, over 41.00 utterances.], tot_loss[ctc_loss=0.08238, att_loss=0.2393, loss=0.2079, over 3223892.97 frames. utt_duration=1277 frames, utt_pad_proportion=0.04831, over 10106.98 utterances.], batch size: 41, lr: 6.42e-03, grad_scale: 8.0 2023-03-08 15:08:24,221 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64620.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:08:43,536 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.557e+02 2.027e+02 2.349e+02 3.075e+02 7.936e+02, threshold=4.698e+02, percent-clipped=3.0 2023-03-08 15:08:58,460 INFO [train2.py:809] (2/4) Epoch 17, batch 900, loss[ctc_loss=0.09826, att_loss=0.2617, loss=0.229, over 17034.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.009913, over 52.00 utterances.], tot_loss[ctc_loss=0.08243, att_loss=0.2401, loss=0.2086, over 3243120.48 frames. utt_duration=1290 frames, utt_pad_proportion=0.04237, over 10069.02 utterances.], batch size: 52, lr: 6.42e-03, grad_scale: 8.0 2023-03-08 15:09:07,686 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1889, 4.4563, 4.3931, 4.5383, 2.9064, 4.5036, 2.5906, 2.0369], device='cuda:2'), covar=tensor([0.0344, 0.0231, 0.0665, 0.0176, 0.1454, 0.0154, 0.1580, 0.1665], device='cuda:2'), in_proj_covar=tensor([0.0169, 0.0143, 0.0257, 0.0136, 0.0219, 0.0124, 0.0231, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 15:09:10,778 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64648.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:09:14,134 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64650.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:10:01,409 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64679.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:10:21,892 INFO [train2.py:809] (2/4) Epoch 17, batch 950, loss[ctc_loss=0.07722, att_loss=0.2215, loss=0.1927, over 15636.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.009257, over 37.00 utterances.], tot_loss[ctc_loss=0.08265, att_loss=0.2403, loss=0.2088, over 3246933.87 frames. 
utt_duration=1264 frames, utt_pad_proportion=0.04851, over 10284.81 utterances.], batch size: 37, lr: 6.42e-03, grad_scale: 8.0 2023-03-08 15:10:34,176 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-03-08 15:10:38,867 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64701.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:10:42,294 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64703.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:10:51,707 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64709.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:11:19,704 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64727.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:11:25,967 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.10 vs. limit=5.0 2023-03-08 15:11:28,016 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 2.001e+02 2.338e+02 2.858e+02 5.487e+02, threshold=4.676e+02, percent-clipped=1.0 2023-03-08 15:11:42,848 INFO [train2.py:809] (2/4) Epoch 17, batch 1000, loss[ctc_loss=0.07361, att_loss=0.2396, loss=0.2064, over 17070.00 frames. utt_duration=1315 frames, utt_pad_proportion=0.00791, over 52.00 utterances.], tot_loss[ctc_loss=0.08269, att_loss=0.2403, loss=0.2088, over 3255759.60 frames. utt_duration=1268 frames, utt_pad_proportion=0.04668, over 10281.52 utterances.], batch size: 52, lr: 6.41e-03, grad_scale: 8.0 2023-03-08 15:11:49,881 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64745.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:11:56,102 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4976, 3.0793, 2.6938, 2.9014, 3.2109, 3.0618, 2.5480, 3.1398], device='cuda:2'), covar=tensor([0.0826, 0.0410, 0.0811, 0.0562, 0.0636, 0.0519, 0.0739, 0.0489], device='cuda:2'), in_proj_covar=tensor([0.0195, 0.0206, 0.0219, 0.0190, 0.0265, 0.0230, 0.0195, 0.0278], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 15:12:01,528 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64752.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:12:20,101 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64764.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:13:02,948 INFO [train2.py:809] (2/4) Epoch 17, batch 1050, loss[ctc_loss=0.06848, att_loss=0.2096, loss=0.1814, over 15507.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.008306, over 36.00 utterances.], tot_loss[ctc_loss=0.08326, att_loss=0.2403, loss=0.2089, over 3262946.17 frames. 
utt_duration=1260 frames, utt_pad_proportion=0.04837, over 10370.90 utterances.], batch size: 36, lr: 6.41e-03, grad_scale: 8.0 2023-03-08 15:13:27,819 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64806.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:13:38,374 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64813.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:13:50,529 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64821.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:14:08,042 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.316e+02 2.188e+02 2.644e+02 3.168e+02 6.507e+02, threshold=5.288e+02, percent-clipped=8.0 2023-03-08 15:14:22,624 INFO [train2.py:809] (2/4) Epoch 17, batch 1100, loss[ctc_loss=0.08387, att_loss=0.2372, loss=0.2065, over 16267.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.008117, over 43.00 utterances.], tot_loss[ctc_loss=0.08247, att_loss=0.2396, loss=0.2082, over 3256713.26 frames. utt_duration=1283 frames, utt_pad_proportion=0.0456, over 10167.11 utterances.], batch size: 43, lr: 6.41e-03, grad_scale: 8.0 2023-03-08 15:14:44,758 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4937, 3.2715, 3.1504, 2.7987, 3.3244, 3.1936, 3.2919, 2.1916], device='cuda:2'), covar=tensor([0.1102, 0.1475, 0.3620, 0.5436, 0.1494, 0.4445, 0.1264, 0.6099], device='cuda:2'), in_proj_covar=tensor([0.0141, 0.0163, 0.0177, 0.0239, 0.0139, 0.0237, 0.0151, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 15:15:19,143 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.48 vs. limit=2.0 2023-03-08 15:15:37,946 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64888.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:15:42,103 INFO [train2.py:809] (2/4) Epoch 17, batch 1150, loss[ctc_loss=0.0784, att_loss=0.2303, loss=0.1999, over 16008.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.006649, over 40.00 utterances.], tot_loss[ctc_loss=0.082, att_loss=0.239, loss=0.2076, over 3256451.55 frames. 
utt_duration=1282 frames, utt_pad_proportion=0.04811, over 10174.13 utterances.], batch size: 40, lr: 6.41e-03, grad_scale: 8.0 2023-03-08 15:15:49,554 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0577, 4.5596, 4.6104, 4.9494, 2.7911, 4.8087, 3.0107, 2.1937], device='cuda:2'), covar=tensor([0.0428, 0.0248, 0.0677, 0.0129, 0.1616, 0.0144, 0.1305, 0.1625], device='cuda:2'), in_proj_covar=tensor([0.0170, 0.0142, 0.0257, 0.0135, 0.0217, 0.0124, 0.0230, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 15:16:24,102 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1720, 5.1686, 4.9394, 2.2295, 1.9024, 2.6505, 2.1376, 3.7733], device='cuda:2'), covar=tensor([0.0603, 0.0225, 0.0251, 0.4786, 0.5965, 0.2773, 0.3280, 0.1843], device='cuda:2'), in_proj_covar=tensor([0.0342, 0.0250, 0.0250, 0.0229, 0.0339, 0.0331, 0.0237, 0.0357], device='cuda:2'), out_proj_covar=tensor([1.4800e-04, 9.1936e-05, 1.0752e-04, 9.9374e-05, 1.4357e-04, 1.3076e-04, 9.5073e-05, 1.4650e-04], device='cuda:2') 2023-03-08 15:16:47,807 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 2.059e+02 2.511e+02 2.874e+02 5.628e+02, threshold=5.021e+02, percent-clipped=1.0 2023-03-08 15:16:54,646 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64936.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:17:02,370 INFO [train2.py:809] (2/4) Epoch 17, batch 1200, loss[ctc_loss=0.0682, att_loss=0.2216, loss=0.1909, over 16175.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.007083, over 41.00 utterances.], tot_loss[ctc_loss=0.08167, att_loss=0.2385, loss=0.2072, over 3260004.61 frames. utt_duration=1293 frames, utt_pad_proportion=0.04592, over 10100.09 utterances.], batch size: 41, lr: 6.40e-03, grad_scale: 8.0 2023-03-08 15:17:09,401 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64945.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:17:10,058 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-03-08 15:17:58,151 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64976.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:18:22,338 INFO [train2.py:809] (2/4) Epoch 17, batch 1250, loss[ctc_loss=0.07116, att_loss=0.2266, loss=0.1955, over 15962.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005973, over 41.00 utterances.], tot_loss[ctc_loss=0.08192, att_loss=0.2387, loss=0.2073, over 3261419.06 frames. utt_duration=1321 frames, utt_pad_proportion=0.03929, over 9885.69 utterances.], batch size: 41, lr: 6.40e-03, grad_scale: 8.0 2023-03-08 15:18:31,750 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. 
limit=2.0 2023-03-08 15:18:34,212 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6638, 2.0512, 2.1696, 2.4917, 2.6069, 2.3316, 2.2773, 2.6363], device='cuda:2'), covar=tensor([0.1550, 0.3355, 0.2645, 0.1404, 0.1454, 0.1396, 0.2467, 0.1252], device='cuda:2'), in_proj_covar=tensor([0.0099, 0.0107, 0.0109, 0.0096, 0.0102, 0.0091, 0.0110, 0.0080], device='cuda:2'), out_proj_covar=tensor([7.2460e-05, 8.0748e-05, 8.3368e-05, 7.2258e-05, 7.4272e-05, 7.1288e-05, 8.1166e-05, 6.4050e-05], device='cuda:2') 2023-03-08 15:18:39,117 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65001.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:18:44,350 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65004.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:19:27,888 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.350e+02 2.065e+02 2.436e+02 3.098e+02 5.553e+02, threshold=4.871e+02, percent-clipped=2.0 2023-03-08 15:19:36,247 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65037.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 15:19:41,902 INFO [train2.py:809] (2/4) Epoch 17, batch 1300, loss[ctc_loss=0.09756, att_loss=0.2298, loss=0.2033, over 16012.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.007632, over 40.00 utterances.], tot_loss[ctc_loss=0.08239, att_loss=0.2392, loss=0.2078, over 3268133.65 frames. utt_duration=1290 frames, utt_pad_proportion=0.04472, over 10149.07 utterances.], batch size: 40, lr: 6.40e-03, grad_scale: 8.0 2023-03-08 15:19:55,566 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65049.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:20:05,126 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=65055.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:20:11,148 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65059.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:20:45,112 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-03-08 15:21:01,328 INFO [train2.py:809] (2/4) Epoch 17, batch 1350, loss[ctc_loss=0.04891, att_loss=0.215, loss=0.1818, over 16182.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006136, over 41.00 utterances.], tot_loss[ctc_loss=0.08272, att_loss=0.2397, loss=0.2083, over 3271769.08 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05301, over 10477.95 utterances.], batch size: 41, lr: 6.40e-03, grad_scale: 8.0 2023-03-08 15:21:12,329 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8993, 5.0725, 5.1119, 5.0216, 5.1265, 5.1006, 4.8569, 4.5903], device='cuda:2'), covar=tensor([0.0864, 0.0491, 0.0230, 0.0482, 0.0288, 0.0298, 0.0325, 0.0357], device='cuda:2'), in_proj_covar=tensor([0.0497, 0.0339, 0.0312, 0.0334, 0.0392, 0.0411, 0.0333, 0.0373], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 15:21:18,508 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65101.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:21:29,264 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65108.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:21:41,848 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65116.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:21:49,450 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65121.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:22:07,568 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.334e+02 2.057e+02 2.538e+02 3.006e+02 4.389e+02, threshold=5.076e+02, percent-clipped=0.0 2023-03-08 15:22:22,264 INFO [train2.py:809] (2/4) Epoch 17, batch 1400, loss[ctc_loss=0.07457, att_loss=0.2235, loss=0.1937, over 15767.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.009028, over 38.00 utterances.], tot_loss[ctc_loss=0.08285, att_loss=0.2397, loss=0.2084, over 3259735.65 frames. utt_duration=1240 frames, utt_pad_proportion=0.06005, over 10529.78 utterances.], batch size: 38, lr: 6.39e-03, grad_scale: 8.0 2023-03-08 15:23:03,340 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0979, 5.0362, 4.7704, 2.9214, 4.7800, 4.6727, 3.9958, 2.5087], device='cuda:2'), covar=tensor([0.0098, 0.0090, 0.0265, 0.0969, 0.0094, 0.0173, 0.0387, 0.1443], device='cuda:2'), in_proj_covar=tensor([0.0070, 0.0096, 0.0095, 0.0109, 0.0080, 0.0107, 0.0098, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 15:23:06,280 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65169.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:23:35,245 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3532, 5.3038, 5.0886, 3.4561, 5.1016, 4.8491, 4.4524, 2.9153], device='cuda:2'), covar=tensor([0.0098, 0.0071, 0.0252, 0.0751, 0.0088, 0.0173, 0.0302, 0.1222], device='cuda:2'), in_proj_covar=tensor([0.0070, 0.0096, 0.0095, 0.0108, 0.0080, 0.0106, 0.0097, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 15:23:41,173 INFO [train2.py:809] (2/4) Epoch 17, batch 1450, loss[ctc_loss=0.05923, att_loss=0.2051, loss=0.176, over 15361.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01137, over 35.00 utterances.], tot_loss[ctc_loss=0.08321, att_loss=0.2395, loss=0.2083, over 3260422.80 frames. 
utt_duration=1252 frames, utt_pad_proportion=0.05638, over 10430.15 utterances.], batch size: 35, lr: 6.39e-03, grad_scale: 8.0 2023-03-08 15:24:04,398 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0567, 5.1075, 4.7882, 2.6429, 4.8802, 4.6541, 4.0073, 2.4539], device='cuda:2'), covar=tensor([0.0137, 0.0086, 0.0283, 0.1206, 0.0088, 0.0216, 0.0409, 0.1585], device='cuda:2'), in_proj_covar=tensor([0.0069, 0.0095, 0.0094, 0.0108, 0.0079, 0.0106, 0.0096, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 15:24:13,610 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6703, 2.6963, 3.8964, 3.4422, 2.8943, 3.6096, 3.5859, 3.7323], device='cuda:2'), covar=tensor([0.0308, 0.1216, 0.0165, 0.0862, 0.1348, 0.0295, 0.0178, 0.0302], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0246, 0.0171, 0.0318, 0.0269, 0.0202, 0.0150, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 15:24:37,678 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0015, 6.1418, 5.6228, 5.8605, 5.8132, 5.3196, 5.6570, 5.4286], device='cuda:2'), covar=tensor([0.0982, 0.0833, 0.0871, 0.0792, 0.0893, 0.1429, 0.2132, 0.2144], device='cuda:2'), in_proj_covar=tensor([0.0502, 0.0571, 0.0435, 0.0434, 0.0416, 0.0459, 0.0589, 0.0513], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 15:24:46,651 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.514e+02 2.067e+02 2.518e+02 3.214e+02 7.164e+02, threshold=5.035e+02, percent-clipped=4.0 2023-03-08 15:25:01,179 INFO [train2.py:809] (2/4) Epoch 17, batch 1500, loss[ctc_loss=0.08563, att_loss=0.2324, loss=0.203, over 16423.00 frames. utt_duration=1495 frames, utt_pad_proportion=0.006271, over 44.00 utterances.], tot_loss[ctc_loss=0.08317, att_loss=0.2393, loss=0.2081, over 3257156.04 frames. utt_duration=1239 frames, utt_pad_proportion=0.06224, over 10525.41 utterances.], batch size: 44, lr: 6.39e-03, grad_scale: 8.0 2023-03-08 15:25:08,284 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65245.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:26:11,707 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=65285.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:26:21,178 INFO [train2.py:809] (2/4) Epoch 17, batch 1550, loss[ctc_loss=0.08362, att_loss=0.2577, loss=0.2228, over 17305.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01125, over 55.00 utterances.], tot_loss[ctc_loss=0.08229, att_loss=0.2388, loss=0.2075, over 3261001.13 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05946, over 10441.58 utterances.], batch size: 55, lr: 6.39e-03, grad_scale: 8.0 2023-03-08 15:26:24,406 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65293.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:26:34,869 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4536, 2.7677, 4.9448, 3.7832, 2.9234, 4.1919, 4.5695, 4.5829], device='cuda:2'), covar=tensor([0.0228, 0.1545, 0.0148, 0.1027, 0.1771, 0.0254, 0.0161, 0.0252], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0244, 0.0169, 0.0315, 0.0266, 0.0201, 0.0150, 0.0178], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 15:26:42,562 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65304.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:27:27,597 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.558e+02 2.050e+02 2.369e+02 2.748e+02 8.749e+02, threshold=4.738e+02, percent-clipped=3.0 2023-03-08 15:27:27,841 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65332.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 15:27:41,714 INFO [train2.py:809] (2/4) Epoch 17, batch 1600, loss[ctc_loss=0.1013, att_loss=0.2648, loss=0.2321, over 17219.00 frames. utt_duration=873.3 frames, utt_pad_proportion=0.08459, over 79.00 utterances.], tot_loss[ctc_loss=0.0823, att_loss=0.2396, loss=0.2081, over 3273631.14 frames. utt_duration=1244 frames, utt_pad_proportion=0.05602, over 10542.18 utterances.], batch size: 79, lr: 6.38e-03, grad_scale: 8.0 2023-03-08 15:27:51,063 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65346.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:28:00,085 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65352.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:28:10,885 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65359.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:28:30,327 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7114, 5.0994, 5.3357, 5.1157, 5.1468, 5.6641, 5.0871, 5.7860], device='cuda:2'), covar=tensor([0.0768, 0.0707, 0.0794, 0.1101, 0.1848, 0.0888, 0.0751, 0.0662], device='cuda:2'), in_proj_covar=tensor([0.0807, 0.0474, 0.0564, 0.0622, 0.0827, 0.0572, 0.0459, 0.0558], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 15:29:02,418 INFO [train2.py:809] (2/4) Epoch 17, batch 1650, loss[ctc_loss=0.1002, att_loss=0.2608, loss=0.2287, over 17134.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01437, over 56.00 utterances.], tot_loss[ctc_loss=0.08328, att_loss=0.2404, loss=0.209, over 3268921.71 frames. 
utt_duration=1221 frames, utt_pad_proportion=0.06336, over 10726.15 utterances.], batch size: 56, lr: 6.38e-03, grad_scale: 8.0 2023-03-08 15:29:10,822 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7659, 2.6148, 3.3189, 2.7413, 3.2372, 4.0263, 3.8558, 2.7401], device='cuda:2'), covar=tensor([0.0461, 0.1830, 0.1226, 0.1345, 0.1034, 0.0889, 0.0619, 0.1474], device='cuda:2'), in_proj_covar=tensor([0.0242, 0.0241, 0.0269, 0.0214, 0.0256, 0.0350, 0.0249, 0.0233], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 15:29:20,638 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65401.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:29:29,998 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65407.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:29:31,721 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65408.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:29:36,204 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65411.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:30:09,495 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 2.074e+02 2.531e+02 3.179e+02 8.424e+02, threshold=5.062e+02, percent-clipped=4.0 2023-03-08 15:30:23,556 INFO [train2.py:809] (2/4) Epoch 17, batch 1700, loss[ctc_loss=0.08002, att_loss=0.2451, loss=0.2121, over 16454.00 frames. utt_duration=1432 frames, utt_pad_proportion=0.008039, over 46.00 utterances.], tot_loss[ctc_loss=0.08424, att_loss=0.2407, loss=0.2094, over 3274877.93 frames. utt_duration=1217 frames, utt_pad_proportion=0.06269, over 10779.26 utterances.], batch size: 46, lr: 6.38e-03, grad_scale: 8.0 2023-03-08 15:30:29,883 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8385, 2.6406, 3.4094, 2.7159, 3.2801, 4.0544, 3.8692, 2.7454], device='cuda:2'), covar=tensor([0.0463, 0.1740, 0.1097, 0.1295, 0.0947, 0.0865, 0.0625, 0.1422], device='cuda:2'), in_proj_covar=tensor([0.0239, 0.0238, 0.0266, 0.0211, 0.0252, 0.0347, 0.0246, 0.0230], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 15:30:37,511 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65449.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:30:48,517 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65456.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:30:53,697 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7332, 2.4153, 5.1028, 4.0199, 3.1466, 4.3705, 4.7769, 4.7617], device='cuda:2'), covar=tensor([0.0168, 0.1688, 0.0117, 0.0891, 0.1635, 0.0190, 0.0090, 0.0156], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0246, 0.0170, 0.0316, 0.0269, 0.0202, 0.0151, 0.0179], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 15:31:05,803 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9050, 4.8711, 4.4587, 2.6293, 4.6164, 4.5252, 3.8178, 2.3278], device='cuda:2'), covar=tensor([0.0132, 0.0100, 0.0301, 0.1164, 0.0111, 0.0217, 0.0451, 0.1684], device='cuda:2'), in_proj_covar=tensor([0.0069, 0.0095, 0.0094, 0.0107, 0.0079, 0.0105, 0.0096, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 
2023-03-08 15:31:44,384 INFO [train2.py:809] (2/4) Epoch 17, batch 1750, loss[ctc_loss=0.0647, att_loss=0.2239, loss=0.1921, over 16291.00 frames. utt_duration=1517 frames, utt_pad_proportion=0.006564, over 43.00 utterances.], tot_loss[ctc_loss=0.08376, att_loss=0.2405, loss=0.2092, over 3277146.77 frames. utt_duration=1223 frames, utt_pad_proportion=0.05971, over 10735.31 utterances.], batch size: 43, lr: 6.38e-03, grad_scale: 8.0 2023-03-08 15:32:49,893 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 2.005e+02 2.493e+02 3.105e+02 6.516e+02, threshold=4.986e+02, percent-clipped=3.0 2023-03-08 15:32:50,251 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=65532.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 15:32:51,749 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8931, 2.2266, 2.4966, 2.6643, 2.6437, 2.9428, 2.4446, 3.0053], device='cuda:2'), covar=tensor([0.1468, 0.3376, 0.2443, 0.1316, 0.1641, 0.1259, 0.2288, 0.1038], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0105, 0.0109, 0.0094, 0.0101, 0.0088, 0.0107, 0.0078], device='cuda:2'), out_proj_covar=tensor([7.1307e-05, 7.9328e-05, 8.2688e-05, 7.0721e-05, 7.3479e-05, 6.9750e-05, 7.9105e-05, 6.2547e-05], device='cuda:2') 2023-03-08 15:33:05,036 INFO [train2.py:809] (2/4) Epoch 17, batch 1800, loss[ctc_loss=0.06374, att_loss=0.2184, loss=0.1875, over 15890.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.00902, over 39.00 utterances.], tot_loss[ctc_loss=0.08294, att_loss=0.24, loss=0.2086, over 3270522.59 frames. utt_duration=1224 frames, utt_pad_proportion=0.06177, over 10698.22 utterances.], batch size: 39, lr: 6.37e-03, grad_scale: 8.0 2023-03-08 15:34:19,280 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8165, 4.7097, 4.6421, 2.3310, 1.9929, 2.7723, 2.0775, 3.7393], device='cuda:2'), covar=tensor([0.0769, 0.0224, 0.0243, 0.4750, 0.5427, 0.2518, 0.3486, 0.1470], device='cuda:2'), in_proj_covar=tensor([0.0342, 0.0249, 0.0251, 0.0228, 0.0339, 0.0331, 0.0238, 0.0356], device='cuda:2'), out_proj_covar=tensor([1.4757e-04, 9.1661e-05, 1.0781e-04, 9.8479e-05, 1.4326e-04, 1.3052e-04, 9.5255e-05, 1.4617e-04], device='cuda:2') 2023-03-08 15:34:26,296 INFO [train2.py:809] (2/4) Epoch 17, batch 1850, loss[ctc_loss=0.06743, att_loss=0.226, loss=0.1943, over 16009.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007786, over 40.00 utterances.], tot_loss[ctc_loss=0.08273, att_loss=0.2398, loss=0.2084, over 3259622.80 frames. 
utt_duration=1208 frames, utt_pad_proportion=0.06838, over 10805.99 utterances.], batch size: 40, lr: 6.37e-03, grad_scale: 8.0 2023-03-08 15:34:29,964 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65593.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 15:35:23,989 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7636, 6.0715, 5.6130, 5.9041, 5.7749, 5.3322, 5.4890, 5.3525], device='cuda:2'), covar=tensor([0.1287, 0.0899, 0.0845, 0.0713, 0.0898, 0.1533, 0.2358, 0.2393], device='cuda:2'), in_proj_covar=tensor([0.0498, 0.0571, 0.0434, 0.0429, 0.0413, 0.0458, 0.0586, 0.0508], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 15:35:31,594 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.506e+02 1.987e+02 2.376e+02 2.824e+02 5.455e+02, threshold=4.753e+02, percent-clipped=1.0 2023-03-08 15:35:31,969 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65632.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:35:46,839 INFO [train2.py:809] (2/4) Epoch 17, batch 1900, loss[ctc_loss=0.07386, att_loss=0.2542, loss=0.2182, over 16781.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005889, over 48.00 utterances.], tot_loss[ctc_loss=0.08268, att_loss=0.24, loss=0.2085, over 3261309.95 frames. utt_duration=1228 frames, utt_pad_proportion=0.06298, over 10639.04 utterances.], batch size: 48, lr: 6.37e-03, grad_scale: 8.0 2023-03-08 15:35:47,051 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65641.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:36:48,027 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65680.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:37:05,991 INFO [train2.py:809] (2/4) Epoch 17, batch 1950, loss[ctc_loss=0.06817, att_loss=0.2276, loss=0.1957, over 12741.00 frames. utt_duration=1822 frames, utt_pad_proportion=0.1207, over 28.00 utterances.], tot_loss[ctc_loss=0.0821, att_loss=0.24, loss=0.2084, over 3268863.29 frames. utt_duration=1251 frames, utt_pad_proportion=0.05494, over 10466.49 utterances.], batch size: 28, lr: 6.37e-03, grad_scale: 8.0 2023-03-08 15:37:36,755 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65711.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:38:10,050 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.350e+02 2.136e+02 2.531e+02 3.248e+02 6.819e+02, threshold=5.061e+02, percent-clipped=5.0 2023-03-08 15:38:25,205 INFO [train2.py:809] (2/4) Epoch 17, batch 2000, loss[ctc_loss=0.1072, att_loss=0.2578, loss=0.2277, over 17222.00 frames. utt_duration=873.3 frames, utt_pad_proportion=0.08456, over 79.00 utterances.], tot_loss[ctc_loss=0.08335, att_loss=0.2401, loss=0.2088, over 3266468.95 frames. utt_duration=1225 frames, utt_pad_proportion=0.06215, over 10675.14 utterances.], batch size: 79, lr: 6.37e-03, grad_scale: 8.0 2023-03-08 15:38:30,703 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.71 vs. limit=2.0 2023-03-08 15:38:53,012 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65759.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:39:44,925 INFO [train2.py:809] (2/4) Epoch 17, batch 2050, loss[ctc_loss=0.06076, att_loss=0.2195, loss=0.1877, over 15772.00 frames. 
utt_duration=1662 frames, utt_pad_proportion=0.008573, over 38.00 utterances.], tot_loss[ctc_loss=0.08365, att_loss=0.2408, loss=0.2093, over 3269641.13 frames. utt_duration=1203 frames, utt_pad_proportion=0.06733, over 10889.35 utterances.], batch size: 38, lr: 6.36e-03, grad_scale: 8.0 2023-03-08 15:40:33,366 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2723, 5.2755, 5.0448, 3.2162, 5.0466, 4.8674, 4.5174, 2.6786], device='cuda:2'), covar=tensor([0.0136, 0.0085, 0.0248, 0.0911, 0.0089, 0.0184, 0.0292, 0.1404], device='cuda:2'), in_proj_covar=tensor([0.0071, 0.0096, 0.0095, 0.0108, 0.0080, 0.0106, 0.0096, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 15:40:51,675 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.921e+02 2.500e+02 3.054e+02 5.987e+02, threshold=5.000e+02, percent-clipped=2.0 2023-03-08 15:41:00,854 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2023-03-08 15:41:01,840 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=65838.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 15:41:06,584 INFO [train2.py:809] (2/4) Epoch 17, batch 2100, loss[ctc_loss=0.0858, att_loss=0.2656, loss=0.2296, over 17439.00 frames. utt_duration=1109 frames, utt_pad_proportion=0.03082, over 63.00 utterances.], tot_loss[ctc_loss=0.08299, att_loss=0.2403, loss=0.2088, over 3276014.27 frames. utt_duration=1230 frames, utt_pad_proportion=0.05888, over 10667.28 utterances.], batch size: 63, lr: 6.36e-03, grad_scale: 8.0 2023-03-08 15:41:53,411 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4783, 2.4006, 4.9575, 3.8641, 3.0036, 4.2524, 4.6191, 4.6165], device='cuda:2'), covar=tensor([0.0190, 0.1839, 0.0145, 0.1013, 0.1667, 0.0236, 0.0135, 0.0199], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0246, 0.0170, 0.0315, 0.0269, 0.0201, 0.0152, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 15:42:21,794 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65888.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 15:42:26,207 INFO [train2.py:809] (2/4) Epoch 17, batch 2150, loss[ctc_loss=0.1011, att_loss=0.2667, loss=0.2336, over 17316.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02207, over 59.00 utterances.], tot_loss[ctc_loss=0.08342, att_loss=0.2403, loss=0.209, over 3281536.09 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05476, over 10585.32 utterances.], batch size: 59, lr: 6.36e-03, grad_scale: 16.0 2023-03-08 15:42:38,836 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65899.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 15:43:30,032 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7159, 3.2976, 3.6355, 3.1453, 3.5724, 4.6970, 4.4604, 3.4121], device='cuda:2'), covar=tensor([0.0333, 0.1432, 0.1294, 0.1332, 0.1096, 0.0831, 0.0533, 0.1266], device='cuda:2'), in_proj_covar=tensor([0.0242, 0.0239, 0.0268, 0.0211, 0.0256, 0.0350, 0.0247, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 15:43:31,156 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.335e+02 2.187e+02 2.638e+02 3.352e+02 9.129e+02, threshold=5.276e+02, percent-clipped=7.0 2023-03-08 15:43:45,838 INFO [train2.py:809] (2/4) Epoch 17, batch 2200, loss[ctc_loss=0.06705, att_loss=0.2278, loss=0.1957, over 15856.00 frames. utt_duration=1628 frames, utt_pad_proportion=0.01043, over 39.00 utterances.], tot_loss[ctc_loss=0.08327, att_loss=0.2406, loss=0.2091, over 3282783.42 frames. utt_duration=1230 frames, utt_pad_proportion=0.05611, over 10685.30 utterances.], batch size: 39, lr: 6.36e-03, grad_scale: 16.0 2023-03-08 15:43:46,132 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65941.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:45:01,904 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65989.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:45:04,874 INFO [train2.py:809] (2/4) Epoch 17, batch 2250, loss[ctc_loss=0.07082, att_loss=0.2264, loss=0.1953, over 16004.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.00745, over 40.00 utterances.], tot_loss[ctc_loss=0.08297, att_loss=0.2407, loss=0.2092, over 3288279.06 frames. utt_duration=1236 frames, utt_pad_proportion=0.0533, over 10652.83 utterances.], batch size: 40, lr: 6.35e-03, grad_scale: 16.0 2023-03-08 15:45:15,511 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1641, 4.5465, 4.5478, 4.7767, 2.7186, 4.5812, 2.9091, 1.8269], device='cuda:2'), covar=tensor([0.0319, 0.0195, 0.0553, 0.0172, 0.1655, 0.0156, 0.1322, 0.1770], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0145, 0.0262, 0.0139, 0.0223, 0.0126, 0.0232, 0.0207], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 15:45:54,146 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1918, 3.6973, 3.3457, 3.4331, 4.0491, 3.5580, 3.0531, 4.2685], device='cuda:2'), covar=tensor([0.0906, 0.0528, 0.0965, 0.0680, 0.0619, 0.0850, 0.0874, 0.0370], device='cuda:2'), in_proj_covar=tensor([0.0196, 0.0209, 0.0221, 0.0193, 0.0265, 0.0234, 0.0196, 0.0278], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 15:46:12,738 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.416e+02 2.093e+02 2.407e+02 3.008e+02 8.516e+02, threshold=4.813e+02, percent-clipped=4.0 2023-03-08 15:46:27,299 INFO [train2.py:809] (2/4) Epoch 17, batch 2300, loss[ctc_loss=0.0824, att_loss=0.2167, loss=0.1898, over 15528.00 frames. utt_duration=1727 frames, utt_pad_proportion=0.007076, over 36.00 utterances.], tot_loss[ctc_loss=0.08322, att_loss=0.2407, loss=0.2092, over 3272750.94 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05189, over 10477.77 utterances.], batch size: 36, lr: 6.35e-03, grad_scale: 8.0 2023-03-08 15:47:48,103 INFO [train2.py:809] (2/4) Epoch 17, batch 2350, loss[ctc_loss=0.06315, att_loss=0.2199, loss=0.1885, over 15645.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008766, over 37.00 utterances.], tot_loss[ctc_loss=0.08341, att_loss=0.2406, loss=0.2091, over 3274108.18 frames. utt_duration=1260 frames, utt_pad_proportion=0.05016, over 10409.11 utterances.], batch size: 37, lr: 6.35e-03, grad_scale: 8.0 2023-03-08 15:47:53,315 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6784, 4.3959, 4.6624, 4.5378, 5.1787, 4.6309, 4.4534, 2.2320], device='cuda:2'), covar=tensor([0.0170, 0.0327, 0.0232, 0.0254, 0.0737, 0.0186, 0.0291, 0.2175], device='cuda:2'), in_proj_covar=tensor([0.0140, 0.0158, 0.0163, 0.0177, 0.0354, 0.0138, 0.0149, 0.0215], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 15:48:55,984 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.531e+02 2.214e+02 2.683e+02 3.352e+02 8.000e+02, threshold=5.366e+02, percent-clipped=7.0 2023-03-08 15:49:09,134 INFO [train2.py:809] (2/4) Epoch 17, batch 2400, loss[ctc_loss=0.07759, att_loss=0.2429, loss=0.2098, over 16388.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.008341, over 44.00 utterances.], tot_loss[ctc_loss=0.08349, att_loss=0.2408, loss=0.2094, over 3273523.97 frames. utt_duration=1234 frames, utt_pad_proportion=0.05611, over 10625.73 utterances.], batch size: 44, lr: 6.35e-03, grad_scale: 8.0 2023-03-08 15:49:14,392 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66144.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:49:16,989 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2023-03-08 15:49:17,501 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7384, 3.9923, 3.9045, 3.9788, 4.0322, 3.7817, 3.0950, 3.9271], device='cuda:2'), covar=tensor([0.0127, 0.0109, 0.0137, 0.0084, 0.0087, 0.0125, 0.0583, 0.0187], device='cuda:2'), in_proj_covar=tensor([0.0085, 0.0083, 0.0103, 0.0064, 0.0068, 0.0080, 0.0098, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 15:49:39,853 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66160.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 15:50:21,681 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0348, 4.9954, 4.6967, 3.0096, 4.7751, 4.6753, 4.2627, 2.9690], device='cuda:2'), covar=tensor([0.0115, 0.0099, 0.0280, 0.1009, 0.0103, 0.0193, 0.0314, 0.1188], device='cuda:2'), in_proj_covar=tensor([0.0071, 0.0097, 0.0095, 0.0109, 0.0080, 0.0106, 0.0096, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 15:50:24,742 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66188.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 15:50:29,106 INFO [train2.py:809] (2/4) Epoch 17, batch 2450, loss[ctc_loss=0.08957, att_loss=0.2529, loss=0.2203, over 17416.00 frames. utt_duration=1107 frames, utt_pad_proportion=0.0322, over 63.00 utterances.], tot_loss[ctc_loss=0.08431, att_loss=0.2413, loss=0.2099, over 3265125.77 frames. 
utt_duration=1204 frames, utt_pad_proportion=0.06516, over 10858.22 utterances.], batch size: 63, lr: 6.34e-03, grad_scale: 8.0 2023-03-08 15:50:33,914 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66194.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 15:50:51,792 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66205.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 15:51:16,829 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66221.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 15:51:35,449 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.379e+02 1.994e+02 2.398e+02 2.990e+02 4.880e+02, threshold=4.796e+02, percent-clipped=0.0 2023-03-08 15:51:40,272 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66236.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 15:51:48,520 INFO [train2.py:809] (2/4) Epoch 17, batch 2500, loss[ctc_loss=0.0841, att_loss=0.2389, loss=0.208, over 16286.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.006233, over 43.00 utterances.], tot_loss[ctc_loss=0.08449, att_loss=0.2411, loss=0.2098, over 3266280.34 frames. utt_duration=1223 frames, utt_pad_proportion=0.06042, over 10697.12 utterances.], batch size: 43, lr: 6.34e-03, grad_scale: 8.0 2023-03-08 15:52:16,101 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66258.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:53:08,202 INFO [train2.py:809] (2/4) Epoch 17, batch 2550, loss[ctc_loss=0.07005, att_loss=0.2141, loss=0.1853, over 15877.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.01003, over 39.00 utterances.], tot_loss[ctc_loss=0.08358, att_loss=0.2405, loss=0.2091, over 3271154.70 frames. utt_duration=1241 frames, utt_pad_proportion=0.05538, over 10559.99 utterances.], batch size: 39, lr: 6.34e-03, grad_scale: 8.0 2023-03-08 15:53:22,174 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7181, 5.9293, 5.4049, 5.7116, 5.5769, 5.0808, 5.3836, 5.0993], device='cuda:2'), covar=tensor([0.1208, 0.0882, 0.0884, 0.0845, 0.0889, 0.1488, 0.2088, 0.2285], device='cuda:2'), in_proj_covar=tensor([0.0495, 0.0570, 0.0435, 0.0432, 0.0415, 0.0453, 0.0587, 0.0505], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 15:53:52,743 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66319.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 15:54:14,463 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+02 2.284e+02 2.590e+02 3.284e+02 7.249e+02, threshold=5.180e+02, percent-clipped=6.0 2023-03-08 15:54:27,438 INFO [train2.py:809] (2/4) Epoch 17, batch 2600, loss[ctc_loss=0.07861, att_loss=0.2327, loss=0.2019, over 16003.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007559, over 40.00 utterances.], tot_loss[ctc_loss=0.08383, att_loss=0.241, loss=0.2096, over 3279560.72 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.05286, over 10565.66 utterances.], batch size: 40, lr: 6.34e-03, grad_scale: 8.0 2023-03-08 15:55:06,094 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6325, 3.5224, 3.4652, 3.0742, 3.5929, 3.6531, 3.5667, 2.5406], device='cuda:2'), covar=tensor([0.1136, 0.1518, 0.2364, 0.4428, 0.1289, 0.2886, 0.0972, 0.5235], device='cuda:2'), in_proj_covar=tensor([0.0143, 0.0162, 0.0175, 0.0235, 0.0138, 0.0231, 0.0152, 0.0201], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 15:55:36,732 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.35 vs. limit=5.0 2023-03-08 15:55:47,240 INFO [train2.py:809] (2/4) Epoch 17, batch 2650, loss[ctc_loss=0.08828, att_loss=0.2511, loss=0.2186, over 17331.00 frames. utt_duration=1177 frames, utt_pad_proportion=0.02185, over 59.00 utterances.], tot_loss[ctc_loss=0.0837, att_loss=0.2411, loss=0.2096, over 3282413.43 frames. utt_duration=1236 frames, utt_pad_proportion=0.05434, over 10638.71 utterances.], batch size: 59, lr: 6.33e-03, grad_scale: 8.0 2023-03-08 15:56:19,728 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9479, 2.3391, 2.7794, 2.6311, 2.7594, 2.6105, 2.5577, 2.7114], device='cuda:2'), covar=tensor([0.1277, 0.2870, 0.2024, 0.1548, 0.1606, 0.1197, 0.2259, 0.1029], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0104, 0.0108, 0.0092, 0.0101, 0.0088, 0.0107, 0.0079], device='cuda:2'), out_proj_covar=tensor([7.1141e-05, 7.8826e-05, 8.2416e-05, 7.0041e-05, 7.3408e-05, 6.9678e-05, 7.9129e-05, 6.2829e-05], device='cuda:2') 2023-03-08 15:56:28,548 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9011, 6.1397, 5.6345, 5.8901, 5.7862, 5.3555, 5.6431, 5.4022], device='cuda:2'), covar=tensor([0.1239, 0.0780, 0.0912, 0.0752, 0.0827, 0.1453, 0.1993, 0.2196], device='cuda:2'), in_proj_covar=tensor([0.0501, 0.0573, 0.0438, 0.0439, 0.0418, 0.0456, 0.0587, 0.0511], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 15:56:53,519 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.354e+02 2.144e+02 2.549e+02 3.292e+02 5.806e+02, threshold=5.098e+02, percent-clipped=4.0 2023-03-08 15:57:06,279 INFO [train2.py:809] (2/4) Epoch 17, batch 2700, loss[ctc_loss=0.06895, att_loss=0.242, loss=0.2074, over 16774.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006229, over 48.00 utterances.], tot_loss[ctc_loss=0.08411, att_loss=0.2411, loss=0.2097, over 3278665.67 frames. utt_duration=1230 frames, utt_pad_proportion=0.05768, over 10676.35 utterances.], batch size: 48, lr: 6.33e-03, grad_scale: 8.0 2023-03-08 15:57:25,365 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-08 15:58:06,537 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1397, 2.6819, 3.1657, 4.3028, 3.7411, 3.8377, 2.7822, 2.1550], device='cuda:2'), covar=tensor([0.0813, 0.2162, 0.0998, 0.0611, 0.0869, 0.0447, 0.1638, 0.2189], device='cuda:2'), in_proj_covar=tensor([0.0175, 0.0214, 0.0187, 0.0206, 0.0211, 0.0170, 0.0197, 0.0180], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 15:58:12,535 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. 
limit=2.0 2023-03-08 15:58:25,422 INFO [train2.py:809] (2/4) Epoch 17, batch 2750, loss[ctc_loss=0.07341, att_loss=0.2162, loss=0.1876, over 15493.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008806, over 36.00 utterances.], tot_loss[ctc_loss=0.08431, att_loss=0.2411, loss=0.2097, over 3280666.03 frames. utt_duration=1236 frames, utt_pad_proportion=0.05479, over 10628.29 utterances.], batch size: 36, lr: 6.33e-03, grad_scale: 8.0 2023-03-08 15:58:30,330 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66494.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 15:58:40,075 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66500.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 15:58:43,799 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-03-08 15:59:05,245 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66516.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 15:59:12,745 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66521.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 15:59:31,456 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.407e+02 2.169e+02 2.491e+02 3.348e+02 1.363e+03, threshold=4.982e+02, percent-clipped=6.0 2023-03-08 15:59:37,926 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5187, 2.8981, 3.6646, 2.9610, 3.5855, 4.6458, 4.4056, 3.2630], device='cuda:2'), covar=tensor([0.0352, 0.1822, 0.1240, 0.1454, 0.1035, 0.0802, 0.0631, 0.1426], device='cuda:2'), in_proj_covar=tensor([0.0242, 0.0241, 0.0271, 0.0216, 0.0260, 0.0353, 0.0252, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 15:59:43,569 INFO [train2.py:809] (2/4) Epoch 17, batch 2800, loss[ctc_loss=0.07426, att_loss=0.2449, loss=0.2108, over 16490.00 frames. utt_duration=1436 frames, utt_pad_proportion=0.005198, over 46.00 utterances.], tot_loss[ctc_loss=0.08479, att_loss=0.2418, loss=0.2104, over 3277487.91 frames. utt_duration=1202 frames, utt_pad_proportion=0.06438, over 10921.73 utterances.], batch size: 46, lr: 6.33e-03, grad_scale: 8.0 2023-03-08 15:59:45,182 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66542.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:00:48,772 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66582.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:01:02,310 INFO [train2.py:809] (2/4) Epoch 17, batch 2850, loss[ctc_loss=0.07783, att_loss=0.2399, loss=0.2075, over 16866.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.008339, over 49.00 utterances.], tot_loss[ctc_loss=0.08495, att_loss=0.242, loss=0.2106, over 3278158.95 frames. 
utt_duration=1203 frames, utt_pad_proportion=0.06361, over 10916.94 utterances.], batch size: 49, lr: 6.32e-03, grad_scale: 8.0 2023-03-08 16:01:39,566 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66614.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:02:09,210 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.516e+02 2.066e+02 2.507e+02 3.083e+02 5.054e+02, threshold=5.014e+02, percent-clipped=1.0 2023-03-08 16:02:17,165 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9794, 4.0072, 3.7894, 2.8315, 3.8241, 3.8249, 3.5769, 2.7039], device='cuda:2'), covar=tensor([0.0133, 0.0129, 0.0273, 0.0928, 0.0120, 0.0382, 0.0316, 0.1218], device='cuda:2'), in_proj_covar=tensor([0.0071, 0.0097, 0.0096, 0.0109, 0.0080, 0.0107, 0.0097, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 16:02:21,164 INFO [train2.py:809] (2/4) Epoch 17, batch 2900, loss[ctc_loss=0.1079, att_loss=0.2565, loss=0.2268, over 17080.00 frames. utt_duration=691.6 frames, utt_pad_proportion=0.129, over 99.00 utterances.], tot_loss[ctc_loss=0.08417, att_loss=0.2414, loss=0.21, over 3281839.40 frames. utt_duration=1208 frames, utt_pad_proportion=0.06114, over 10883.54 utterances.], batch size: 99, lr: 6.32e-03, grad_scale: 8.0 2023-03-08 16:02:21,410 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2683, 5.4720, 5.5069, 5.4320, 5.5758, 5.5087, 5.2352, 4.9864], device='cuda:2'), covar=tensor([0.0918, 0.0459, 0.0241, 0.0452, 0.0228, 0.0270, 0.0315, 0.0303], device='cuda:2'), in_proj_covar=tensor([0.0498, 0.0336, 0.0313, 0.0329, 0.0396, 0.0407, 0.0332, 0.0369], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 16:03:41,154 INFO [train2.py:809] (2/4) Epoch 17, batch 2950, loss[ctc_loss=0.09404, att_loss=0.2422, loss=0.2126, over 16008.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007817, over 40.00 utterances.], tot_loss[ctc_loss=0.08475, att_loss=0.2423, loss=0.2108, over 3279390.46 frames. 
utt_duration=1181 frames, utt_pad_proportion=0.06799, over 11122.40 utterances.], batch size: 40, lr: 6.32e-03, grad_scale: 8.0 2023-03-08 16:04:39,580 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4215, 2.2848, 5.0209, 3.8646, 2.8962, 4.1857, 4.7535, 4.6717], device='cuda:2'), covar=tensor([0.0287, 0.1873, 0.0189, 0.0961, 0.1887, 0.0275, 0.0130, 0.0228], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0245, 0.0169, 0.0313, 0.0268, 0.0202, 0.0153, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 16:04:44,881 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66731.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:04:47,517 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.385e+02 2.044e+02 2.450e+02 3.054e+02 6.191e+02, threshold=4.900e+02, percent-clipped=3.0 2023-03-08 16:04:54,108 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7378, 4.9761, 4.9366, 4.8576, 5.0068, 4.9711, 4.7026, 4.5332], device='cuda:2'), covar=tensor([0.0981, 0.0521, 0.0342, 0.0503, 0.0327, 0.0350, 0.0356, 0.0344], device='cuda:2'), in_proj_covar=tensor([0.0495, 0.0335, 0.0312, 0.0328, 0.0394, 0.0406, 0.0332, 0.0367], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0003, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 16:05:00,730 INFO [train2.py:809] (2/4) Epoch 17, batch 3000, loss[ctc_loss=0.09787, att_loss=0.2282, loss=0.2021, over 15480.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.007193, over 36.00 utterances.], tot_loss[ctc_loss=0.08388, att_loss=0.2414, loss=0.2099, over 3281225.64 frames. utt_duration=1208 frames, utt_pad_proportion=0.06142, over 10875.90 utterances.], batch size: 36, lr: 6.32e-03, grad_scale: 8.0 2023-03-08 16:05:00,731 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 16:05:14,969 INFO [train2.py:843] (2/4) Epoch 17, validation: ctc_loss=0.04199, att_loss=0.2349, loss=0.1964, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 16:05:14,971 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 16:06:34,013 INFO [train2.py:809] (2/4) Epoch 17, batch 3050, loss[ctc_loss=0.08852, att_loss=0.2479, loss=0.2161, over 16781.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005667, over 48.00 utterances.], tot_loss[ctc_loss=0.08313, att_loss=0.2411, loss=0.2095, over 3282214.78 frames. utt_duration=1243 frames, utt_pad_proportion=0.05448, over 10579.20 utterances.], batch size: 48, lr: 6.32e-03, grad_scale: 4.0 2023-03-08 16:06:35,931 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66792.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:06:49,699 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66800.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:07:14,793 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66816.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:07:18,481 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. 
limit=2.0 2023-03-08 16:07:20,898 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9783, 4.1232, 3.8049, 4.1652, 3.7939, 3.7302, 4.1693, 4.0755], device='cuda:2'), covar=tensor([0.0562, 0.0359, 0.0711, 0.0421, 0.0428, 0.0745, 0.0300, 0.0226], device='cuda:2'), in_proj_covar=tensor([0.0374, 0.0300, 0.0348, 0.0316, 0.0302, 0.0224, 0.0283, 0.0266], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 16:07:42,434 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.427e+02 2.123e+02 2.532e+02 3.472e+02 1.749e+03, threshold=5.065e+02, percent-clipped=12.0 2023-03-08 16:07:53,810 INFO [train2.py:809] (2/4) Epoch 17, batch 3100, loss[ctc_loss=0.0792, att_loss=0.241, loss=0.2086, over 16953.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008167, over 50.00 utterances.], tot_loss[ctc_loss=0.08353, att_loss=0.2414, loss=0.2098, over 3275188.45 frames. utt_duration=1197 frames, utt_pad_proportion=0.06843, over 10961.11 utterances.], batch size: 50, lr: 6.31e-03, grad_scale: 4.0 2023-03-08 16:08:05,849 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66848.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:08:30,874 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66864.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:08:50,683 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66877.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 16:09:12,468 INFO [train2.py:809] (2/4) Epoch 17, batch 3150, loss[ctc_loss=0.08235, att_loss=0.2253, loss=0.1967, over 16186.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.006316, over 41.00 utterances.], tot_loss[ctc_loss=0.08382, att_loss=0.2408, loss=0.2094, over 3262165.82 frames. utt_duration=1172 frames, utt_pad_proportion=0.07794, over 11148.43 utterances.], batch size: 41, lr: 6.31e-03, grad_scale: 4.0 2023-03-08 16:09:50,607 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66914.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:09:56,845 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6371, 4.6524, 4.5665, 4.6457, 5.2510, 4.4144, 4.6337, 2.4093], device='cuda:2'), covar=tensor([0.0193, 0.0229, 0.0245, 0.0228, 0.0807, 0.0239, 0.0245, 0.1975], device='cuda:2'), in_proj_covar=tensor([0.0140, 0.0157, 0.0164, 0.0177, 0.0354, 0.0138, 0.0149, 0.0212], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-08 16:10:21,243 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.312e+02 2.012e+02 2.406e+02 2.883e+02 7.010e+02, threshold=4.812e+02, percent-clipped=2.0 2023-03-08 16:10:32,787 INFO [train2.py:809] (2/4) Epoch 17, batch 3200, loss[ctc_loss=0.07547, att_loss=0.2187, loss=0.19, over 16013.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.007601, over 40.00 utterances.], tot_loss[ctc_loss=0.08273, att_loss=0.2402, loss=0.2087, over 3270645.74 frames. 
utt_duration=1191 frames, utt_pad_proportion=0.07054, over 10994.23 utterances.], batch size: 40, lr: 6.31e-03, grad_scale: 8.0 2023-03-08 16:10:58,877 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66957.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:11:06,319 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66962.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:11:51,563 INFO [train2.py:809] (2/4) Epoch 17, batch 3250, loss[ctc_loss=0.08591, att_loss=0.2497, loss=0.2169, over 16765.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006555, over 48.00 utterances.], tot_loss[ctc_loss=0.08215, att_loss=0.2395, loss=0.208, over 3271293.28 frames. utt_duration=1211 frames, utt_pad_proportion=0.06542, over 10818.78 utterances.], batch size: 48, lr: 6.31e-03, grad_scale: 8.0 2023-03-08 16:12:28,722 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7111, 3.3921, 3.4871, 2.9907, 3.5223, 3.5971, 3.5026, 2.6079], device='cuda:2'), covar=tensor([0.1016, 0.1761, 0.2150, 0.5354, 0.1213, 0.2261, 0.1253, 0.5043], device='cuda:2'), in_proj_covar=tensor([0.0144, 0.0165, 0.0176, 0.0236, 0.0139, 0.0232, 0.0153, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 16:12:34,762 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67018.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 16:12:58,912 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.388e+02 2.115e+02 2.445e+02 2.942e+02 7.309e+02, threshold=4.890e+02, percent-clipped=3.0 2023-03-08 16:13:11,486 INFO [train2.py:809] (2/4) Epoch 17, batch 3300, loss[ctc_loss=0.1654, att_loss=0.2867, loss=0.2625, over 17297.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02482, over 59.00 utterances.], tot_loss[ctc_loss=0.08323, att_loss=0.2404, loss=0.2089, over 3268058.67 frames. utt_duration=1181 frames, utt_pad_proportion=0.07391, over 11081.67 utterances.], batch size: 59, lr: 6.30e-03, grad_scale: 8.0 2023-03-08 16:13:34,681 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-08 16:14:17,412 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67083.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:14:18,815 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6774, 4.8937, 4.8966, 4.8950, 4.9509, 4.9337, 4.6876, 4.4405], device='cuda:2'), covar=tensor([0.1025, 0.0630, 0.0314, 0.0458, 0.0305, 0.0336, 0.0361, 0.0374], device='cuda:2'), in_proj_covar=tensor([0.0499, 0.0337, 0.0315, 0.0328, 0.0396, 0.0409, 0.0335, 0.0372], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 16:14:24,018 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=67087.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:14:31,292 INFO [train2.py:809] (2/4) Epoch 17, batch 3350, loss[ctc_loss=0.08898, att_loss=0.2585, loss=0.2246, over 16771.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006274, over 48.00 utterances.], tot_loss[ctc_loss=0.08326, att_loss=0.241, loss=0.2094, over 3269560.32 frames. utt_duration=1191 frames, utt_pad_proportion=0.06924, over 10996.35 utterances.], batch size: 48, lr: 6.30e-03, grad_scale: 8.0 2023-03-08 16:14:53,695 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.71 vs. 
limit=2.0 2023-03-08 16:15:37,317 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.585e+02 2.015e+02 2.419e+02 3.041e+02 5.956e+02, threshold=4.838e+02, percent-clipped=4.0 2023-03-08 16:15:49,839 INFO [train2.py:809] (2/4) Epoch 17, batch 3400, loss[ctc_loss=0.06352, att_loss=0.2363, loss=0.2017, over 16256.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.008077, over 43.00 utterances.], tot_loss[ctc_loss=0.0833, att_loss=0.2407, loss=0.2093, over 3273384.06 frames. utt_duration=1203 frames, utt_pad_proportion=0.06648, over 10898.97 utterances.], batch size: 43, lr: 6.30e-03, grad_scale: 8.0 2023-03-08 16:15:54,798 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67144.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:15:56,960 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-03-08 16:16:00,842 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6234, 5.1387, 4.9435, 5.0182, 5.1790, 4.8285, 3.6986, 5.0291], device='cuda:2'), covar=tensor([0.0113, 0.0091, 0.0105, 0.0078, 0.0068, 0.0102, 0.0623, 0.0147], device='cuda:2'), in_proj_covar=tensor([0.0087, 0.0084, 0.0105, 0.0066, 0.0070, 0.0082, 0.0101, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 16:16:45,573 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=67177.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 16:17:08,453 INFO [train2.py:809] (2/4) Epoch 17, batch 3450, loss[ctc_loss=0.05205, att_loss=0.2071, loss=0.1761, over 15393.00 frames. utt_duration=1761 frames, utt_pad_proportion=0.009834, over 35.00 utterances.], tot_loss[ctc_loss=0.08342, att_loss=0.2405, loss=0.209, over 3263386.75 frames. utt_duration=1206 frames, utt_pad_proportion=0.06782, over 10838.53 utterances.], batch size: 35, lr: 6.30e-03, grad_scale: 8.0 2023-03-08 16:17:54,016 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4992, 3.0257, 3.6256, 2.8564, 3.5296, 4.6064, 4.4388, 3.4216], device='cuda:2'), covar=tensor([0.0380, 0.1654, 0.1229, 0.1373, 0.1059, 0.0841, 0.0506, 0.1102], device='cuda:2'), in_proj_covar=tensor([0.0238, 0.0236, 0.0268, 0.0211, 0.0256, 0.0348, 0.0247, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 16:18:01,731 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=67225.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:18:15,120 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.253e+02 2.178e+02 2.510e+02 3.172e+02 6.489e+02, threshold=5.020e+02, percent-clipped=3.0 2023-03-08 16:18:24,989 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-08 16:18:27,081 INFO [train2.py:809] (2/4) Epoch 17, batch 3500, loss[ctc_loss=0.1245, att_loss=0.2558, loss=0.2295, over 16944.00 frames. utt_duration=686.1 frames, utt_pad_proportion=0.1359, over 99.00 utterances.], tot_loss[ctc_loss=0.08383, att_loss=0.2406, loss=0.2092, over 3265538.03 frames. 
utt_duration=1196 frames, utt_pad_proportion=0.06912, over 10939.54 utterances.], batch size: 99, lr: 6.29e-03, grad_scale: 8.0 2023-03-08 16:18:36,795 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9942, 5.1105, 4.9419, 2.3403, 2.0309, 3.1494, 2.7487, 3.9122], device='cuda:2'), covar=tensor([0.0744, 0.0269, 0.0273, 0.5013, 0.6205, 0.2225, 0.2869, 0.1697], device='cuda:2'), in_proj_covar=tensor([0.0352, 0.0258, 0.0258, 0.0236, 0.0348, 0.0339, 0.0244, 0.0362], device='cuda:2'), out_proj_covar=tensor([1.5122e-04, 9.5014e-05, 1.1094e-04, 1.0228e-04, 1.4658e-04, 1.3351e-04, 9.7420e-05, 1.4834e-04], device='cuda:2') 2023-03-08 16:19:11,383 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0802, 5.0508, 4.7709, 2.8191, 4.8059, 4.5594, 4.3500, 2.8814], device='cuda:2'), covar=tensor([0.0123, 0.0100, 0.0294, 0.1101, 0.0104, 0.0252, 0.0310, 0.1331], device='cuda:2'), in_proj_covar=tensor([0.0071, 0.0096, 0.0096, 0.0110, 0.0081, 0.0107, 0.0097, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 16:19:24,096 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-03-08 16:19:46,679 INFO [train2.py:809] (2/4) Epoch 17, batch 3550, loss[ctc_loss=0.07006, att_loss=0.2274, loss=0.1959, over 16331.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005791, over 45.00 utterances.], tot_loss[ctc_loss=0.08337, att_loss=0.24, loss=0.2087, over 3258942.10 frames. utt_duration=1221 frames, utt_pad_proportion=0.06404, over 10690.16 utterances.], batch size: 45, lr: 6.29e-03, grad_scale: 8.0 2023-03-08 16:20:21,501 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=67313.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 16:20:36,899 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1688, 5.4502, 4.9411, 5.2597, 5.1075, 4.7238, 4.9781, 4.7103], device='cuda:2'), covar=tensor([0.1317, 0.0849, 0.0929, 0.0888, 0.0980, 0.1456, 0.2004, 0.2293], device='cuda:2'), in_proj_covar=tensor([0.0507, 0.0583, 0.0440, 0.0441, 0.0419, 0.0458, 0.0598, 0.0520], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 16:20:54,011 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.300e+02 1.995e+02 2.311e+02 2.839e+02 6.877e+02, threshold=4.621e+02, percent-clipped=1.0 2023-03-08 16:21:06,785 INFO [train2.py:809] (2/4) Epoch 17, batch 3600, loss[ctc_loss=0.08299, att_loss=0.253, loss=0.219, over 17274.00 frames. utt_duration=1098 frames, utt_pad_proportion=0.03756, over 63.00 utterances.], tot_loss[ctc_loss=0.08291, att_loss=0.2402, loss=0.2087, over 3265939.82 frames. utt_duration=1231 frames, utt_pad_proportion=0.06024, over 10628.10 utterances.], batch size: 63, lr: 6.29e-03, grad_scale: 8.0 2023-03-08 16:22:19,231 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=67387.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 16:22:26,310 INFO [train2.py:809] (2/4) Epoch 17, batch 3650, loss[ctc_loss=0.07059, att_loss=0.2145, loss=0.1857, over 15904.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.007003, over 39.00 utterances.], tot_loss[ctc_loss=0.08273, att_loss=0.2401, loss=0.2087, over 3269183.98 frames. 
utt_duration=1228 frames, utt_pad_proportion=0.0605, over 10662.32 utterances.], batch size: 39, lr: 6.29e-03, grad_scale: 8.0 2023-03-08 16:22:26,741 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2205, 4.6118, 4.6083, 4.9044, 3.1005, 4.5976, 3.0217, 2.0480], device='cuda:2'), covar=tensor([0.0372, 0.0208, 0.0625, 0.0153, 0.1452, 0.0166, 0.1300, 0.1615], device='cuda:2'), in_proj_covar=tensor([0.0174, 0.0148, 0.0258, 0.0142, 0.0222, 0.0128, 0.0230, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 16:22:35,010 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-03-08 16:23:29,977 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8574, 3.7041, 3.1044, 3.2744, 3.7669, 3.4788, 2.6667, 3.9695], device='cuda:2'), covar=tensor([0.1171, 0.0452, 0.1165, 0.0755, 0.0745, 0.0726, 0.1064, 0.0545], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0207, 0.0219, 0.0191, 0.0264, 0.0231, 0.0195, 0.0274], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 16:23:34,278 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.388e+02 2.053e+02 2.468e+02 2.850e+02 9.204e+02, threshold=4.935e+02, percent-clipped=2.0 2023-03-08 16:23:36,643 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=67435.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:23:44,136 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=67439.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:23:47,240 INFO [train2.py:809] (2/4) Epoch 17, batch 3700, loss[ctc_loss=0.07861, att_loss=0.2417, loss=0.209, over 16686.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006468, over 46.00 utterances.], tot_loss[ctc_loss=0.08218, att_loss=0.2394, loss=0.2079, over 3268938.80 frames. utt_duration=1258 frames, utt_pad_proportion=0.0527, over 10404.66 utterances.], batch size: 46, lr: 6.28e-03, grad_scale: 8.0 2023-03-08 16:24:55,144 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9018, 5.1711, 4.7685, 5.2511, 4.7058, 4.8814, 5.3320, 5.0787], device='cuda:2'), covar=tensor([0.0572, 0.0283, 0.0785, 0.0343, 0.0400, 0.0259, 0.0215, 0.0192], device='cuda:2'), in_proj_covar=tensor([0.0374, 0.0299, 0.0351, 0.0320, 0.0304, 0.0227, 0.0286, 0.0269], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 16:25:00,569 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4887, 2.7248, 5.0483, 3.9367, 3.1100, 4.3178, 4.7741, 4.6529], device='cuda:2'), covar=tensor([0.0273, 0.1668, 0.0170, 0.0925, 0.1697, 0.0228, 0.0112, 0.0235], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0240, 0.0168, 0.0309, 0.0265, 0.0200, 0.0152, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 16:25:07,046 INFO [train2.py:809] (2/4) Epoch 17, batch 3750, loss[ctc_loss=0.06832, att_loss=0.231, loss=0.1984, over 16687.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006662, over 46.00 utterances.], tot_loss[ctc_loss=0.08256, att_loss=0.2395, loss=0.2081, over 3277778.18 frames. 
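The optim.py records report five grad-norm "quartiles" (min, 25%, median, 75%, max over a recent window), a Clipping_scale of 2.0, and a threshold that is consistently twice the reported median (e.g. 2.0 × 2.468e+02 ≈ 4.935e+02 in the record just above), with percent-clipped giving the share of recent batches whose gradient norm exceeded that threshold. The sketch below reproduces that bookkeeping; the class name and window size are illustrative assumptions, not the project's actual implementation.

```python
import statistics
from collections import deque


class GradNormClipStats:
    """Track recent gradient norms and derive a clipping threshold.

    Illustrative only: threshold = clipping_scale * median(recent norms),
    which matches the relationship between the 'grad-norm quartiles' and
    'threshold' values printed in this log.
    """

    def __init__(self, clipping_scale: float = 2.0, window: int = 100):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.clipped = deque(maxlen=window)

    def update(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        self.clipped.append(grad_norm > threshold)
        return threshold

    def summary(self) -> str:
        qs = statistics.quantiles(self.norms, n=4)  # 25%, 50%, 75% cut points
        pct = 100.0 * sum(self.clipped) / len(self.clipped)
        return (f"quartiles {min(self.norms):.3e} {qs[0]:.3e} {qs[1]:.3e} "
                f"{qs[2]:.3e} {max(self.norms):.3e}, "
                f"threshold={self.clipping_scale * qs[1]:.3e}, "
                f"percent-clipped={pct:.1f}")
```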
utt_duration=1256 frames, utt_pad_proportion=0.05067, over 10450.15 utterances.], batch size: 46, lr: 6.28e-03, grad_scale: 8.0 2023-03-08 16:25:42,188 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67513.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:25:46,633 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0669, 6.2529, 5.7344, 5.9672, 5.9013, 5.4486, 5.7480, 5.4603], device='cuda:2'), covar=tensor([0.1117, 0.0806, 0.0780, 0.0701, 0.0837, 0.1427, 0.2039, 0.2068], device='cuda:2'), in_proj_covar=tensor([0.0501, 0.0582, 0.0436, 0.0440, 0.0415, 0.0452, 0.0589, 0.0514], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 16:26:14,094 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.323e+02 2.055e+02 2.519e+02 3.069e+02 5.925e+02, threshold=5.038e+02, percent-clipped=2.0 2023-03-08 16:26:24,786 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6932, 2.6765, 3.8981, 3.4677, 2.9826, 3.6707, 3.5641, 3.7433], device='cuda:2'), covar=tensor([0.0318, 0.1316, 0.0165, 0.0848, 0.1330, 0.0286, 0.0228, 0.0325], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0242, 0.0169, 0.0310, 0.0266, 0.0201, 0.0153, 0.0182], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 16:26:25,911 INFO [train2.py:809] (2/4) Epoch 17, batch 3800, loss[ctc_loss=0.1268, att_loss=0.2686, loss=0.2402, over 13604.00 frames. utt_duration=374.2 frames, utt_pad_proportion=0.3493, over 146.00 utterances.], tot_loss[ctc_loss=0.0819, att_loss=0.2387, loss=0.2073, over 3260831.38 frames. utt_duration=1247 frames, utt_pad_proportion=0.05756, over 10471.25 utterances.], batch size: 146, lr: 6.28e-03, grad_scale: 8.0 2023-03-08 16:27:16,652 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0843, 4.3415, 4.0485, 4.4329, 2.7504, 4.4478, 2.6200, 1.8136], device='cuda:2'), covar=tensor([0.0357, 0.0210, 0.0859, 0.0199, 0.1767, 0.0162, 0.1647, 0.1743], device='cuda:2'), in_proj_covar=tensor([0.0171, 0.0146, 0.0255, 0.0141, 0.0219, 0.0126, 0.0227, 0.0201], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 16:27:18,162 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67574.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 16:27:46,046 INFO [train2.py:809] (2/4) Epoch 17, batch 3850, loss[ctc_loss=0.1027, att_loss=0.2596, loss=0.2282, over 17244.00 frames. utt_duration=1001 frames, utt_pad_proportion=0.05555, over 69.00 utterances.], tot_loss[ctc_loss=0.08164, att_loss=0.2389, loss=0.2075, over 3264132.38 frames. utt_duration=1238 frames, utt_pad_proportion=0.05974, over 10561.67 utterances.], batch size: 69, lr: 6.28e-03, grad_scale: 8.0 2023-03-08 16:28:20,429 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=67613.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:28:53,211 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.336e+02 2.153e+02 2.695e+02 3.326e+02 5.215e+02, threshold=5.391e+02, percent-clipped=2.0 2023-03-08 16:28:55,824 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. 
limit=2.0 2023-03-08 16:28:59,761 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2393, 5.4780, 5.4004, 5.3924, 5.5526, 5.4706, 5.2265, 5.0142], device='cuda:2'), covar=tensor([0.0962, 0.0481, 0.0302, 0.0499, 0.0243, 0.0266, 0.0301, 0.0300], device='cuda:2'), in_proj_covar=tensor([0.0499, 0.0336, 0.0316, 0.0331, 0.0391, 0.0409, 0.0335, 0.0372], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 16:29:04,156 INFO [train2.py:809] (2/4) Epoch 17, batch 3900, loss[ctc_loss=0.05091, att_loss=0.2244, loss=0.1897, over 16159.00 frames. utt_duration=1578 frames, utt_pad_proportion=0.007972, over 41.00 utterances.], tot_loss[ctc_loss=0.08125, att_loss=0.2385, loss=0.2071, over 3261169.70 frames. utt_duration=1241 frames, utt_pad_proportion=0.05987, over 10526.23 utterances.], batch size: 41, lr: 6.28e-03, grad_scale: 8.0 2023-03-08 16:29:34,774 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=67661.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:30:04,338 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67680.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:30:21,234 INFO [train2.py:809] (2/4) Epoch 17, batch 3950, loss[ctc_loss=0.09597, att_loss=0.2613, loss=0.2282, over 17302.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02462, over 59.00 utterances.], tot_loss[ctc_loss=0.08138, att_loss=0.2389, loss=0.2074, over 3259996.36 frames. utt_duration=1232 frames, utt_pad_proportion=0.06215, over 10594.54 utterances.], batch size: 59, lr: 6.27e-03, grad_scale: 8.0 2023-03-08 16:30:23,172 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67692.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:30:26,100 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0847, 5.2869, 5.5560, 5.5162, 5.5422, 6.0914, 5.2513, 6.2144], device='cuda:2'), covar=tensor([0.0591, 0.0686, 0.0866, 0.1155, 0.1603, 0.0692, 0.0612, 0.0468], device='cuda:2'), in_proj_covar=tensor([0.0824, 0.0479, 0.0572, 0.0633, 0.0831, 0.0584, 0.0463, 0.0574], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 16:31:39,748 INFO [train2.py:809] (2/4) Epoch 18, batch 0, loss[ctc_loss=0.06021, att_loss=0.2076, loss=0.1781, over 15778.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.006917, over 38.00 utterances.], tot_loss[ctc_loss=0.06021, att_loss=0.2076, loss=0.1781, over 15778.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.006917, over 38.00 utterances.], batch size: 38, lr: 6.09e-03, grad_scale: 8.0 2023-03-08 16:31:39,748 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 16:31:52,765 INFO [train2.py:843] (2/4) Epoch 18, validation: ctc_loss=0.04224, att_loss=0.2352, loss=0.1966, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 16:31:52,766 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 16:32:02,658 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. 
limit=2.0 2023-03-08 16:32:06,401 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.395e+02 2.121e+02 2.560e+02 3.250e+02 7.102e+02, threshold=5.120e+02, percent-clipped=3.0 2023-03-08 16:32:09,798 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1009, 3.7098, 3.6762, 3.2736, 3.7046, 3.8296, 3.7148, 2.8610], device='cuda:2'), covar=tensor([0.1099, 0.1637, 0.2697, 0.4789, 0.2360, 0.3353, 0.1474, 0.4737], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0166, 0.0173, 0.0237, 0.0142, 0.0232, 0.0152, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 16:32:14,445 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=67739.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:32:17,629 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67741.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:32:20,475 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4658, 2.9725, 3.6275, 2.8533, 3.4485, 4.5589, 4.4505, 3.2304], device='cuda:2'), covar=tensor([0.0392, 0.1714, 0.1206, 0.1378, 0.1081, 0.0907, 0.0495, 0.1361], device='cuda:2'), in_proj_covar=tensor([0.0243, 0.0241, 0.0271, 0.0214, 0.0259, 0.0352, 0.0249, 0.0233], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 16:32:35,593 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67753.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 16:32:43,132 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9072, 6.1909, 5.7491, 5.9107, 5.8771, 5.3663, 5.5871, 5.4403], device='cuda:2'), covar=tensor([0.1221, 0.0772, 0.0773, 0.0810, 0.0812, 0.1518, 0.2053, 0.2371], device='cuda:2'), in_proj_covar=tensor([0.0498, 0.0570, 0.0432, 0.0434, 0.0412, 0.0446, 0.0584, 0.0504], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 16:33:00,716 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2023-03-08 16:33:11,757 INFO [train2.py:809] (2/4) Epoch 18, batch 50, loss[ctc_loss=0.07956, att_loss=0.2397, loss=0.2076, over 16874.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007361, over 49.00 utterances.], tot_loss[ctc_loss=0.08268, att_loss=0.2406, loss=0.2091, over 741394.30 frames. utt_duration=1248 frames, utt_pad_proportion=0.0469, over 2378.74 utterances.], batch size: 49, lr: 6.09e-03, grad_scale: 8.0 2023-03-08 16:33:30,482 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=67787.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:34:31,358 INFO [train2.py:809] (2/4) Epoch 18, batch 100, loss[ctc_loss=0.06831, att_loss=0.2183, loss=0.1883, over 15963.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005646, over 41.00 utterances.], tot_loss[ctc_loss=0.08184, att_loss=0.2396, loss=0.2081, over 1298221.47 frames. 
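The learning rates printed in these records decay slowly with both the batch count and the epoch (6.31e-03 late in epoch 17, 6.09e-03 at the start of epoch 18). They are consistent with an Eden-style schedule of the kind used in the icefall recipes, where the base LR is scaled by inverse-fourth-root factors in the batch index and the number of completed epochs. The reconstruction below is a hedged sketch: the base LR of 0.05 and the lr_batches=5000 / lr_epochs=3.5 constants are assumptions, chosen because they reproduce the logged values.

```python
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 5000.0, lr_epochs: float = 3.5) -> float:
    """Eden-style schedule: decay by fourth roots in batch and epoch.

    The constants are assumptions; with base_lr=0.05 they reproduce the
    rates printed in this section of the log.
    """
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor


# Roughly batch 67,000 with 16 completed epochs -> ~6.31e-03 (epoch 17 records)
print(f"{eden_lr(0.05, 67_000, 16):.2e}")
# Roughly batch 67,800 with 17 completed epochs -> ~6.09e-03 (epoch 18 records)
print(f"{eden_lr(0.05, 67_800, 17):.2e}")
```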
utt_duration=1230 frames, utt_pad_proportion=0.05904, over 4226.27 utterances.], batch size: 41, lr: 6.09e-03, grad_scale: 8.0 2023-03-08 16:34:45,110 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.584e+02 2.310e+02 2.796e+02 3.498e+02 6.865e+02, threshold=5.593e+02, percent-clipped=9.0 2023-03-08 16:35:40,473 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=67869.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 16:35:49,981 INFO [train2.py:809] (2/4) Epoch 18, batch 150, loss[ctc_loss=0.0763, att_loss=0.2223, loss=0.1931, over 15950.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006595, over 41.00 utterances.], tot_loss[ctc_loss=0.08266, att_loss=0.2408, loss=0.2092, over 1742576.47 frames. utt_duration=1249 frames, utt_pad_proportion=0.05103, over 5588.33 utterances.], batch size: 41, lr: 6.09e-03, grad_scale: 8.0 2023-03-08 16:36:40,974 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7380, 3.5101, 3.4843, 3.0745, 3.5995, 3.6078, 3.6052, 2.6830], device='cuda:2'), covar=tensor([0.1011, 0.1436, 0.3072, 0.3954, 0.1706, 0.3890, 0.1096, 0.4148], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0165, 0.0175, 0.0237, 0.0142, 0.0234, 0.0152, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 16:37:08,981 INFO [train2.py:809] (2/4) Epoch 18, batch 200, loss[ctc_loss=0.09379, att_loss=0.2648, loss=0.2306, over 16962.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007714, over 50.00 utterances.], tot_loss[ctc_loss=0.08109, att_loss=0.2396, loss=0.2079, over 2083142.23 frames. utt_duration=1265 frames, utt_pad_proportion=0.04562, over 6593.17 utterances.], batch size: 50, lr: 6.08e-03, grad_scale: 8.0 2023-03-08 16:37:12,413 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3761, 2.4548, 4.8086, 3.7225, 2.9154, 4.1075, 4.3576, 4.4946], device='cuda:2'), covar=tensor([0.0205, 0.1656, 0.0119, 0.0961, 0.1706, 0.0241, 0.0169, 0.0217], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0240, 0.0169, 0.0307, 0.0263, 0.0199, 0.0151, 0.0181], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 16:37:22,688 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.418e+02 2.000e+02 2.339e+02 2.884e+02 4.176e+02, threshold=4.678e+02, percent-clipped=0.0 2023-03-08 16:37:45,868 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5337, 2.3772, 2.3085, 2.2259, 3.0245, 2.6721, 2.1698, 3.1116], device='cuda:2'), covar=tensor([0.1598, 0.3015, 0.2818, 0.1956, 0.1399, 0.2038, 0.2575, 0.0836], device='cuda:2'), in_proj_covar=tensor([0.0100, 0.0109, 0.0113, 0.0096, 0.0104, 0.0091, 0.0112, 0.0082], device='cuda:2'), out_proj_covar=tensor([7.4216e-05, 8.2672e-05, 8.6243e-05, 7.3412e-05, 7.6518e-05, 7.2417e-05, 8.3179e-05, 6.5990e-05], device='cuda:2') 2023-03-08 16:38:27,431 INFO [train2.py:809] (2/4) Epoch 18, batch 250, loss[ctc_loss=0.06839, att_loss=0.2123, loss=0.1835, over 15367.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01044, over 35.00 utterances.], tot_loss[ctc_loss=0.08101, att_loss=0.2388, loss=0.2073, over 2347979.80 frames. 
utt_duration=1285 frames, utt_pad_proportion=0.0425, over 7315.99 utterances.], batch size: 35, lr: 6.08e-03, grad_scale: 8.0 2023-03-08 16:38:36,823 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67981.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:39:31,894 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0421, 4.3682, 4.2572, 4.5747, 2.6158, 4.3752, 2.5555, 1.5052], device='cuda:2'), covar=tensor([0.0459, 0.0197, 0.0653, 0.0181, 0.1782, 0.0200, 0.1650, 0.1918], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0146, 0.0253, 0.0140, 0.0218, 0.0126, 0.0227, 0.0201], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 16:39:50,974 INFO [train2.py:809] (2/4) Epoch 18, batch 300, loss[ctc_loss=0.1346, att_loss=0.2764, loss=0.2481, over 14352.00 frames. utt_duration=391.9 frames, utt_pad_proportion=0.3137, over 147.00 utterances.], tot_loss[ctc_loss=0.08148, att_loss=0.2389, loss=0.2074, over 2547313.11 frames. utt_duration=1272 frames, utt_pad_proportion=0.04948, over 8018.62 utterances.], batch size: 147, lr: 6.08e-03, grad_scale: 8.0 2023-03-08 16:40:04,757 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.391e+02 2.007e+02 2.345e+02 2.847e+02 6.033e+02, threshold=4.689e+02, percent-clipped=3.0 2023-03-08 16:40:08,020 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68036.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:40:17,208 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68042.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:40:20,447 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3832, 2.2811, 2.1810, 2.1401, 3.0008, 2.4758, 2.1016, 2.9426], device='cuda:2'), covar=tensor([0.1842, 0.3086, 0.3023, 0.1698, 0.1668, 0.1133, 0.2504, 0.0859], device='cuda:2'), in_proj_covar=tensor([0.0102, 0.0110, 0.0113, 0.0097, 0.0106, 0.0092, 0.0113, 0.0083], device='cuda:2'), out_proj_covar=tensor([7.5274e-05, 8.3732e-05, 8.6878e-05, 7.4247e-05, 7.7508e-05, 7.3401e-05, 8.4206e-05, 6.6685e-05], device='cuda:2') 2023-03-08 16:40:26,341 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68048.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:41:09,393 INFO [train2.py:809] (2/4) Epoch 18, batch 350, loss[ctc_loss=0.06285, att_loss=0.2249, loss=0.1925, over 15874.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009337, over 39.00 utterances.], tot_loss[ctc_loss=0.08212, att_loss=0.2389, loss=0.2076, over 2710417.72 frames. utt_duration=1263 frames, utt_pad_proportion=0.04953, over 8591.45 utterances.], batch size: 39, lr: 6.08e-03, grad_scale: 8.0 2023-03-08 16:41:27,804 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=68087.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 16:42:27,980 INFO [train2.py:809] (2/4) Epoch 18, batch 400, loss[ctc_loss=0.06094, att_loss=0.212, loss=0.1818, over 15628.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.009937, over 37.00 utterances.], tot_loss[ctc_loss=0.08105, att_loss=0.2386, loss=0.2071, over 2835092.18 frames. 
utt_duration=1285 frames, utt_pad_proportion=0.04504, over 8838.06 utterances.], batch size: 37, lr: 6.07e-03, grad_scale: 8.0 2023-03-08 16:42:41,448 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.402e+02 2.125e+02 2.519e+02 3.295e+02 8.158e+02, threshold=5.039e+02, percent-clipped=7.0 2023-03-08 16:43:03,776 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68148.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 16:43:37,532 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68169.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:43:46,625 INFO [train2.py:809] (2/4) Epoch 18, batch 450, loss[ctc_loss=0.07708, att_loss=0.2413, loss=0.2084, over 16495.00 frames. utt_duration=1436 frames, utt_pad_proportion=0.005721, over 46.00 utterances.], tot_loss[ctc_loss=0.08101, att_loss=0.239, loss=0.2074, over 2939337.55 frames. utt_duration=1300 frames, utt_pad_proportion=0.03938, over 9054.21 utterances.], batch size: 46, lr: 6.07e-03, grad_scale: 8.0 2023-03-08 16:44:53,069 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68217.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:45:05,415 INFO [train2.py:809] (2/4) Epoch 18, batch 500, loss[ctc_loss=0.0558, att_loss=0.2071, loss=0.1769, over 15492.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008838, over 36.00 utterances.], tot_loss[ctc_loss=0.08118, att_loss=0.2394, loss=0.2077, over 3014248.94 frames. utt_duration=1258 frames, utt_pad_proportion=0.04999, over 9593.87 utterances.], batch size: 36, lr: 6.07e-03, grad_scale: 8.0 2023-03-08 16:45:19,448 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.432e+02 1.924e+02 2.355e+02 3.288e+02 6.167e+02, threshold=4.710e+02, percent-clipped=3.0 2023-03-08 16:45:59,273 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6025, 5.8518, 5.3168, 5.6193, 5.4804, 5.1157, 5.3396, 5.0780], device='cuda:2'), covar=tensor([0.1218, 0.0925, 0.1063, 0.0836, 0.0974, 0.1380, 0.2217, 0.2529], device='cuda:2'), in_proj_covar=tensor([0.0503, 0.0579, 0.0440, 0.0438, 0.0418, 0.0447, 0.0594, 0.0511], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 16:46:24,806 INFO [train2.py:809] (2/4) Epoch 18, batch 550, loss[ctc_loss=0.07828, att_loss=0.2436, loss=0.2105, over 17291.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02248, over 59.00 utterances.], tot_loss[ctc_loss=0.08145, att_loss=0.2393, loss=0.2077, over 3066003.21 frames. utt_duration=1252 frames, utt_pad_proportion=0.05346, over 9809.11 utterances.], batch size: 59, lr: 6.07e-03, grad_scale: 8.0 2023-03-08 16:46:48,256 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1378, 5.4757, 5.0028, 5.5319, 4.8774, 5.1451, 5.6390, 5.3419], device='cuda:2'), covar=tensor([0.0479, 0.0284, 0.0653, 0.0281, 0.0380, 0.0193, 0.0158, 0.0171], device='cuda:2'), in_proj_covar=tensor([0.0378, 0.0303, 0.0352, 0.0322, 0.0306, 0.0233, 0.0287, 0.0274], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 16:47:10,891 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.63 vs. limit=5.0 2023-03-08 16:47:44,058 INFO [train2.py:809] (2/4) Epoch 18, batch 600, loss[ctc_loss=0.08674, att_loss=0.2279, loss=0.1997, over 16017.00 frames. 
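The scaling.py records report a "Whitening" metric per activation group (e.g. metric=1.85 vs. limit=2.0 for the 8×24-channel groups, or metric=4.63 vs. limit=5.0 for the single 384-channel group). A natural reading is that the metric measures how far the per-group feature covariance is from a scaled identity: the ratio of the mean squared eigenvalue to the squared mean eigenvalue, which is exactly 1.0 for perfectly whitened features and grows as a few directions dominate. The snippet below computes that ratio as an illustration of the quantity being reported; it is not the project's code.

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Ratio mean(eig^2) / mean(eig)^2 of each group's feature covariance.

    Equals 1.0 when the covariance is a multiple of the identity (fully
    'white' features) and grows when a few directions dominate.  The
    result is averaged over groups; this is an illustration only.
    """
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    # (num_groups, c, c) covariance per group
    cov = torch.einsum("tgi,tgj->gij", x, x) / num_frames
    eigs = torch.linalg.eigvalsh(cov)                  # (num_groups, c)
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2
    return metric.mean()


# White Gaussian noise stays near 1.0, well under a limit such as 2.0:
print(whitening_metric(torch.randn(1000, 96), num_groups=8))
```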
utt_duration=1603 frames, utt_pad_proportion=0.006722, over 40.00 utterances.], tot_loss[ctc_loss=0.08122, att_loss=0.2392, loss=0.2076, over 3115512.88 frames. utt_duration=1240 frames, utt_pad_proportion=0.05487, over 10061.34 utterances.], batch size: 40, lr: 6.07e-03, grad_scale: 8.0 2023-03-08 16:47:58,120 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.378e+02 2.193e+02 2.599e+02 3.203e+02 7.782e+02, threshold=5.199e+02, percent-clipped=2.0 2023-03-08 16:48:01,682 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68336.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:48:03,099 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68337.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:48:19,824 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68348.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:49:03,206 INFO [train2.py:809] (2/4) Epoch 18, batch 650, loss[ctc_loss=0.08732, att_loss=0.2334, loss=0.2042, over 16116.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006206, over 42.00 utterances.], tot_loss[ctc_loss=0.08115, att_loss=0.2395, loss=0.2079, over 3150405.91 frames. utt_duration=1203 frames, utt_pad_proportion=0.0653, over 10489.51 utterances.], batch size: 42, lr: 6.06e-03, grad_scale: 8.0 2023-03-08 16:49:09,309 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-08 16:49:17,257 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68384.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:49:35,684 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68396.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:49:40,479 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=68399.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:50:22,681 INFO [train2.py:809] (2/4) Epoch 18, batch 700, loss[ctc_loss=0.05312, att_loss=0.2074, loss=0.1766, over 15664.00 frames. utt_duration=1695 frames, utt_pad_proportion=0.006919, over 37.00 utterances.], tot_loss[ctc_loss=0.08067, att_loss=0.2392, loss=0.2075, over 3179641.89 frames. utt_duration=1207 frames, utt_pad_proportion=0.063, over 10554.49 utterances.], batch size: 37, lr: 6.06e-03, grad_scale: 8.0 2023-03-08 16:50:36,279 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.516e+02 2.020e+02 2.578e+02 3.082e+02 5.307e+02, threshold=5.155e+02, percent-clipped=1.0 2023-03-08 16:50:50,326 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68443.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 16:51:18,380 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68460.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:51:41,628 INFO [train2.py:809] (2/4) Epoch 18, batch 750, loss[ctc_loss=0.07312, att_loss=0.2139, loss=0.1857, over 15622.00 frames. utt_duration=1690 frames, utt_pad_proportion=0.01043, over 37.00 utterances.], tot_loss[ctc_loss=0.08139, att_loss=0.2395, loss=0.2079, over 3200174.55 frames. utt_duration=1206 frames, utt_pad_proportion=0.06419, over 10626.26 utterances.], batch size: 37, lr: 6.06e-03, grad_scale: 8.0 2023-03-08 16:51:45,642 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2023-03-08 16:53:00,270 INFO [train2.py:809] (2/4) Epoch 18, batch 800, loss[ctc_loss=0.07848, att_loss=0.2361, loss=0.2046, over 15955.00 frames. 
utt_duration=1558 frames, utt_pad_proportion=0.007011, over 41.00 utterances.], tot_loss[ctc_loss=0.08156, att_loss=0.2394, loss=0.2078, over 3213969.68 frames. utt_duration=1234 frames, utt_pad_proportion=0.05863, over 10431.87 utterances.], batch size: 41, lr: 6.06e-03, grad_scale: 8.0 2023-03-08 16:53:13,826 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.459e+02 2.218e+02 2.657e+02 2.960e+02 6.407e+02, threshold=5.314e+02, percent-clipped=4.0 2023-03-08 16:54:08,988 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7877, 3.9761, 3.9106, 4.0004, 4.0465, 3.7957, 2.9628, 3.8863], device='cuda:2'), covar=tensor([0.0135, 0.0122, 0.0158, 0.0090, 0.0106, 0.0145, 0.0670, 0.0216], device='cuda:2'), in_proj_covar=tensor([0.0088, 0.0085, 0.0105, 0.0067, 0.0071, 0.0083, 0.0101, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 16:54:19,202 INFO [train2.py:809] (2/4) Epoch 18, batch 850, loss[ctc_loss=0.08003, att_loss=0.2069, loss=0.1815, over 15643.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008265, over 37.00 utterances.], tot_loss[ctc_loss=0.08101, att_loss=0.2388, loss=0.2073, over 3222439.84 frames. utt_duration=1235 frames, utt_pad_proportion=0.06103, over 10447.18 utterances.], batch size: 37, lr: 6.05e-03, grad_scale: 8.0 2023-03-08 16:54:24,126 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=68578.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:54:59,462 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6239, 3.0850, 3.1504, 2.6223, 3.2160, 3.0023, 3.1211, 2.1385], device='cuda:2'), covar=tensor([0.0982, 0.1634, 0.1967, 0.4866, 0.1165, 0.2531, 0.0980, 0.5325], device='cuda:2'), in_proj_covar=tensor([0.0148, 0.0167, 0.0176, 0.0242, 0.0144, 0.0239, 0.0156, 0.0207], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 16:55:38,612 INFO [train2.py:809] (2/4) Epoch 18, batch 900, loss[ctc_loss=0.08256, att_loss=0.2561, loss=0.2214, over 16864.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.007263, over 49.00 utterances.], tot_loss[ctc_loss=0.08145, att_loss=0.2396, loss=0.208, over 3235351.28 frames. utt_duration=1217 frames, utt_pad_proportion=0.06505, over 10647.32 utterances.], batch size: 49, lr: 6.05e-03, grad_scale: 8.0 2023-03-08 16:55:52,341 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.470e+02 1.982e+02 2.432e+02 3.091e+02 4.816e+02, threshold=4.864e+02, percent-clipped=0.0 2023-03-08 16:55:57,208 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68637.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:56:00,277 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68639.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:56:06,354 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2595, 4.7481, 4.6585, 4.8405, 3.1415, 4.5331, 2.9143, 2.1636], device='cuda:2'), covar=tensor([0.0355, 0.0221, 0.0556, 0.0168, 0.1424, 0.0208, 0.1331, 0.1592], device='cuda:2'), in_proj_covar=tensor([0.0173, 0.0145, 0.0251, 0.0141, 0.0218, 0.0127, 0.0227, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 16:56:57,069 INFO [train2.py:809] (2/4) Epoch 18, batch 950, loss[ctc_loss=0.0756, att_loss=0.2465, loss=0.2124, over 17272.00 frames. 
utt_duration=1258 frames, utt_pad_proportion=0.01365, over 55.00 utterances.], tot_loss[ctc_loss=0.08082, att_loss=0.2396, loss=0.2079, over 3244680.06 frames. utt_duration=1239 frames, utt_pad_proportion=0.05915, over 10491.87 utterances.], batch size: 55, lr: 6.05e-03, grad_scale: 8.0 2023-03-08 16:57:12,204 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68685.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:57:12,455 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=68685.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:57:45,065 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8979, 5.1989, 5.4744, 5.3623, 5.3368, 5.8959, 5.2130, 5.9951], device='cuda:2'), covar=tensor([0.0689, 0.0752, 0.0733, 0.1224, 0.1863, 0.0812, 0.0610, 0.0648], device='cuda:2'), in_proj_covar=tensor([0.0828, 0.0482, 0.0571, 0.0639, 0.0839, 0.0585, 0.0469, 0.0576], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 16:58:15,808 INFO [train2.py:809] (2/4) Epoch 18, batch 1000, loss[ctc_loss=0.1089, att_loss=0.2708, loss=0.2384, over 17056.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008853, over 52.00 utterances.], tot_loss[ctc_loss=0.08176, att_loss=0.2405, loss=0.2088, over 3260736.99 frames. utt_duration=1242 frames, utt_pad_proportion=0.05487, over 10515.83 utterances.], batch size: 52, lr: 6.05e-03, grad_scale: 8.0 2023-03-08 16:58:29,416 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.279e+02 2.078e+02 2.479e+02 2.964e+02 7.208e+02, threshold=4.957e+02, percent-clipped=4.0 2023-03-08 16:58:43,589 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68743.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 16:58:47,163 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2023-03-08 16:58:48,352 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68746.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:59:02,234 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68755.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 16:59:33,512 INFO [train2.py:809] (2/4) Epoch 18, batch 1050, loss[ctc_loss=0.08178, att_loss=0.2477, loss=0.2145, over 17585.00 frames. utt_duration=892.1 frames, utt_pad_proportion=0.06681, over 79.00 utterances.], tot_loss[ctc_loss=0.08242, att_loss=0.2411, loss=0.2093, over 3260966.84 frames. utt_duration=1212 frames, utt_pad_proportion=0.06294, over 10773.84 utterances.], batch size: 79, lr: 6.05e-03, grad_scale: 16.0 2023-03-08 16:59:58,352 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68791.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 17:00:26,740 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2669, 2.8703, 3.2829, 4.3430, 3.8652, 3.8161, 2.8555, 2.0982], device='cuda:2'), covar=tensor([0.0704, 0.1981, 0.0906, 0.0559, 0.0824, 0.0523, 0.1528, 0.2242], device='cuda:2'), in_proj_covar=tensor([0.0175, 0.0212, 0.0187, 0.0205, 0.0210, 0.0169, 0.0197, 0.0183], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 17:00:53,123 INFO [train2.py:809] (2/4) Epoch 18, batch 1100, loss[ctc_loss=0.08837, att_loss=0.2427, loss=0.2118, over 16205.00 frames. 
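The grad_scale field in these records comes from mixed-precision training: losses are scaled before the backward pass so fp16 gradients do not underflow, and the scale is grown after a run of steps completes without inf/nan gradients (it moves from 8.0 to 16.0 around the Epoch 18, batch 1050 record above). A generic PyTorch AMP step that produces this kind of behaviour is sketched below; the model, optimizer, and initial-scale values are placeholders, not the recipe's actual settings.

```python
import torch


def amp_train_step(model, optimizer, scaler, features, targets, loss_fn):
    """One fp16 training step with dynamic loss scaling (generic sketch).

    scaler.get_scale() is the kind of quantity reported as 'grad_scale'
    in training logs; it doubles after a stretch of overflow-free steps
    and shrinks when inf/nan gradients are found.
    """
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(features), targets)
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # unscales grads, skips step on inf/nan
    scaler.update()                   # grow or shrink the scale
    return loss.detach(), scaler.get_scale()


# Placeholder setup, assumed values only:
# scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=1000)
```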
utt_duration=1582 frames, utt_pad_proportion=0.004939, over 41.00 utterances.], tot_loss[ctc_loss=0.0826, att_loss=0.2415, loss=0.2097, over 3266408.85 frames. utt_duration=1201 frames, utt_pad_proportion=0.0658, over 10888.78 utterances.], batch size: 41, lr: 6.04e-03, grad_scale: 16.0 2023-03-08 17:01:06,985 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.271e+02 2.026e+02 2.542e+02 3.113e+02 8.116e+02, threshold=5.085e+02, percent-clipped=3.0 2023-03-08 17:02:12,246 INFO [train2.py:809] (2/4) Epoch 18, batch 1150, loss[ctc_loss=0.1049, att_loss=0.251, loss=0.2218, over 16972.00 frames. utt_duration=687.3 frames, utt_pad_proportion=0.1376, over 99.00 utterances.], tot_loss[ctc_loss=0.08238, att_loss=0.2404, loss=0.2088, over 3271229.12 frames. utt_duration=1217 frames, utt_pad_proportion=0.06066, over 10763.12 utterances.], batch size: 99, lr: 6.04e-03, grad_scale: 16.0 2023-03-08 17:03:28,903 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2023-03-08 17:03:29,036 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.18 vs. limit=5.0 2023-03-08 17:03:31,040 INFO [train2.py:809] (2/4) Epoch 18, batch 1200, loss[ctc_loss=0.05641, att_loss=0.2053, loss=0.1756, over 15365.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01154, over 35.00 utterances.], tot_loss[ctc_loss=0.08085, att_loss=0.2391, loss=0.2074, over 3267145.26 frames. utt_duration=1235 frames, utt_pad_proportion=0.05668, over 10595.68 utterances.], batch size: 35, lr: 6.04e-03, grad_scale: 16.0 2023-03-08 17:03:44,924 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.483e+02 2.054e+02 2.571e+02 3.261e+02 6.642e+02, threshold=5.143e+02, percent-clipped=4.0 2023-03-08 17:03:45,173 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68934.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:04:50,130 INFO [train2.py:809] (2/4) Epoch 18, batch 1250, loss[ctc_loss=0.1209, att_loss=0.2585, loss=0.231, over 17114.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01456, over 56.00 utterances.], tot_loss[ctc_loss=0.08106, att_loss=0.2398, loss=0.208, over 3275916.67 frames. utt_duration=1244 frames, utt_pad_proportion=0.0526, over 10544.95 utterances.], batch size: 56, lr: 6.04e-03, grad_scale: 16.0 2023-03-08 17:05:12,726 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 17:06:07,445 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4347, 2.6264, 4.9379, 3.8756, 3.0745, 4.2002, 4.5694, 4.6525], device='cuda:2'), covar=tensor([0.0245, 0.1556, 0.0151, 0.0858, 0.1652, 0.0231, 0.0171, 0.0227], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0240, 0.0171, 0.0309, 0.0266, 0.0202, 0.0154, 0.0184], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 17:06:07,731 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.55 vs. limit=2.0 2023-03-08 17:06:08,500 INFO [train2.py:809] (2/4) Epoch 18, batch 1300, loss[ctc_loss=0.06556, att_loss=0.2378, loss=0.2033, over 17111.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01537, over 56.00 utterances.], tot_loss[ctc_loss=0.08157, att_loss=0.2401, loss=0.2084, over 3274558.57 frames. 
utt_duration=1237 frames, utt_pad_proportion=0.05617, over 10604.96 utterances.], batch size: 56, lr: 6.04e-03, grad_scale: 16.0 2023-03-08 17:06:22,261 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 2.084e+02 2.462e+02 2.947e+02 6.955e+02, threshold=4.923e+02, percent-clipped=4.0 2023-03-08 17:06:33,220 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69041.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:06:36,482 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69043.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:06:54,771 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69055.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:07:27,318 INFO [train2.py:809] (2/4) Epoch 18, batch 1350, loss[ctc_loss=0.09558, att_loss=0.2522, loss=0.2208, over 17053.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008955, over 52.00 utterances.], tot_loss[ctc_loss=0.08209, att_loss=0.2406, loss=0.2089, over 3283925.50 frames. utt_duration=1241 frames, utt_pad_proportion=0.0532, over 10597.76 utterances.], batch size: 52, lr: 6.03e-03, grad_scale: 16.0 2023-03-08 17:07:56,642 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8321, 3.5928, 3.5234, 3.0152, 3.6362, 3.6469, 3.5895, 2.8358], device='cuda:2'), covar=tensor([0.0976, 0.1277, 0.2219, 0.4507, 0.1088, 0.2763, 0.1097, 0.3732], device='cuda:2'), in_proj_covar=tensor([0.0146, 0.0166, 0.0174, 0.0236, 0.0141, 0.0236, 0.0155, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 17:08:10,433 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69103.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:08:10,613 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1825, 5.2358, 5.0361, 2.9254, 4.9948, 4.8236, 4.3978, 3.0514], device='cuda:2'), covar=tensor([0.0117, 0.0091, 0.0227, 0.1067, 0.0096, 0.0200, 0.0317, 0.1171], device='cuda:2'), in_proj_covar=tensor([0.0071, 0.0096, 0.0094, 0.0107, 0.0080, 0.0106, 0.0094, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 17:08:12,201 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69104.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:08:46,055 INFO [train2.py:809] (2/4) Epoch 18, batch 1400, loss[ctc_loss=0.0757, att_loss=0.245, loss=0.2111, over 16761.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006851, over 48.00 utterances.], tot_loss[ctc_loss=0.08185, att_loss=0.2403, loss=0.2086, over 3285322.97 frames. utt_duration=1254 frames, utt_pad_proportion=0.05015, over 10490.59 utterances.], batch size: 48, lr: 6.03e-03, grad_scale: 16.0 2023-03-08 17:08:59,736 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 2.123e+02 2.542e+02 3.068e+02 7.934e+02, threshold=5.085e+02, percent-clipped=3.0 2023-03-08 17:09:44,859 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69162.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:09:54,761 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69168.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:10:05,179 INFO [train2.py:809] (2/4) Epoch 18, batch 1450, loss[ctc_loss=0.07325, att_loss=0.22, loss=0.1906, over 16002.00 frames. 
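Each record reports two loss summaries: the per-batch loss and a tot_loss accumulated over a much larger, fractional frame count (e.g. "over 3274558.57 frames ... 10604.96 utterances" just above). The fractional counts indicate an exponentially decayed running sum rather than a plain total: before each batch is added, the accumulated statistics are multiplied by a factor slightly below one, so old batches fade out. A small sketch of such a tracker follows; the decay constant is an illustrative assumption, not a value read from the code.

```python
class RunningLoss:
    """Exponentially decayed accumulator for (loss, frames, utterances).

    Sketch of how a 'tot_loss over N frames' summary with fractional
    frame counts can arise: every update first decays the running totals,
    so the effective window is roughly 1 / (1 - decay) batches.  The
    default decay is an assumption for illustration.
    """

    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0
        self.utterances = 0.0

    def update(self, batch_loss_sum: float, batch_frames: int, batch_utts: int):
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames
        self.utterances = self.utterances * self.decay + batch_utts

    def summary(self) -> str:
        return (f"loss={self.loss_sum / self.frames:.4g}, "
                f"over {self.frames:.2f} frames, "
                f"over {self.utterances:.2f} utterances")
```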
utt_duration=1601 frames, utt_pad_proportion=0.00776, over 40.00 utterances.], tot_loss[ctc_loss=0.08163, att_loss=0.2402, loss=0.2085, over 3280823.96 frames. utt_duration=1232 frames, utt_pad_proportion=0.0563, over 10663.62 utterances.], batch size: 40, lr: 6.03e-03, grad_scale: 16.0 2023-03-08 17:11:20,875 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69223.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:11:23,579 INFO [train2.py:809] (2/4) Epoch 18, batch 1500, loss[ctc_loss=0.07295, att_loss=0.2464, loss=0.2117, over 17289.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01197, over 55.00 utterances.], tot_loss[ctc_loss=0.08155, att_loss=0.24, loss=0.2083, over 3279123.30 frames. utt_duration=1208 frames, utt_pad_proportion=0.06271, over 10874.96 utterances.], batch size: 55, lr: 6.03e-03, grad_scale: 16.0 2023-03-08 17:11:23,985 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69225.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:11:25,514 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8861, 3.7205, 3.0754, 3.2984, 3.8663, 3.4808, 2.8872, 4.1518], device='cuda:2'), covar=tensor([0.1012, 0.0499, 0.1098, 0.0686, 0.0672, 0.0731, 0.0846, 0.0479], device='cuda:2'), in_proj_covar=tensor([0.0196, 0.0207, 0.0219, 0.0192, 0.0261, 0.0232, 0.0197, 0.0276], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 17:11:29,974 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69229.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:11:37,069 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.337e+02 2.061e+02 2.532e+02 3.025e+02 5.458e+02, threshold=5.063e+02, percent-clipped=1.0 2023-03-08 17:11:37,450 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69234.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:11:54,401 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69245.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 17:12:39,659 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69273.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:12:42,419 INFO [train2.py:809] (2/4) Epoch 18, batch 1550, loss[ctc_loss=0.06629, att_loss=0.2348, loss=0.2011, over 16412.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.006321, over 44.00 utterances.], tot_loss[ctc_loss=0.08068, att_loss=0.2395, loss=0.2077, over 3279348.75 frames. utt_duration=1243 frames, utt_pad_proportion=0.05404, over 10566.98 utterances.], batch size: 44, lr: 6.02e-03, grad_scale: 16.0 2023-03-08 17:12:52,882 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69282.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:12:59,170 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69286.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:13:21,763 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. 
limit=2.0 2023-03-08 17:13:31,389 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69306.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 17:13:53,249 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69320.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:14:00,356 INFO [train2.py:809] (2/4) Epoch 18, batch 1600, loss[ctc_loss=0.07718, att_loss=0.2481, loss=0.2139, over 16468.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006508, over 46.00 utterances.], tot_loss[ctc_loss=0.08062, att_loss=0.2397, loss=0.2079, over 3279289.20 frames. utt_duration=1249 frames, utt_pad_proportion=0.0532, over 10516.45 utterances.], batch size: 46, lr: 6.02e-03, grad_scale: 16.0 2023-03-08 17:14:14,481 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.214e+02 1.944e+02 2.286e+02 2.991e+02 6.797e+02, threshold=4.572e+02, percent-clipped=3.0 2023-03-08 17:14:14,899 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69334.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 17:14:25,732 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69341.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:15:06,014 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8638, 6.1275, 5.5724, 5.8521, 5.7811, 5.3201, 5.5682, 5.3144], device='cuda:2'), covar=tensor([0.1230, 0.0754, 0.0830, 0.0746, 0.0878, 0.1287, 0.2172, 0.2242], device='cuda:2'), in_proj_covar=tensor([0.0501, 0.0569, 0.0436, 0.0433, 0.0408, 0.0447, 0.0588, 0.0502], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 17:15:10,848 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69368.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:15:21,709 INFO [train2.py:809] (2/4) Epoch 18, batch 1650, loss[ctc_loss=0.092, att_loss=0.2503, loss=0.2186, over 17363.00 frames. utt_duration=1008 frames, utt_pad_proportion=0.04827, over 69.00 utterances.], tot_loss[ctc_loss=0.08087, att_loss=0.2395, loss=0.2077, over 3275166.74 frames. utt_duration=1242 frames, utt_pad_proportion=0.05709, over 10560.42 utterances.], batch size: 69, lr: 6.02e-03, grad_scale: 16.0 2023-03-08 17:15:31,639 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69381.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:15:44,485 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69389.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:15:52,796 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69394.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:16:02,177 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69399.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:16:09,713 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2181, 3.7786, 3.2085, 3.4967, 3.9899, 3.5914, 3.0658, 4.3523], device='cuda:2'), covar=tensor([0.0892, 0.0533, 0.1113, 0.0671, 0.0655, 0.0757, 0.0898, 0.0444], device='cuda:2'), in_proj_covar=tensor([0.0195, 0.0207, 0.0219, 0.0192, 0.0261, 0.0231, 0.0196, 0.0275], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 17:16:44,770 INFO [train2.py:809] (2/4) Epoch 18, batch 1700, loss[ctc_loss=0.08133, att_loss=0.2524, loss=0.2181, over 17046.00 frames. 
utt_duration=1313 frames, utt_pad_proportion=0.009202, over 52.00 utterances.], tot_loss[ctc_loss=0.08047, att_loss=0.2392, loss=0.2074, over 3276172.77 frames. utt_duration=1258 frames, utt_pad_proportion=0.05281, over 10433.54 utterances.], batch size: 52, lr: 6.02e-03, grad_scale: 16.0 2023-03-08 17:16:51,578 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69429.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:16:58,945 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.294e+02 2.149e+02 2.524e+02 3.202e+02 9.381e+02, threshold=5.048e+02, percent-clipped=6.0 2023-03-08 17:17:34,278 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69455.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:17:53,652 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7715, 3.5413, 3.4811, 3.0608, 3.4994, 3.5334, 3.5782, 2.5705], device='cuda:2'), covar=tensor([0.0989, 0.1318, 0.1873, 0.3983, 0.1474, 0.1762, 0.0969, 0.4650], device='cuda:2'), in_proj_covar=tensor([0.0149, 0.0166, 0.0176, 0.0237, 0.0143, 0.0239, 0.0157, 0.0207], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 17:18:06,228 INFO [train2.py:809] (2/4) Epoch 18, batch 1750, loss[ctc_loss=0.08353, att_loss=0.2239, loss=0.1959, over 15958.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.00602, over 41.00 utterances.], tot_loss[ctc_loss=0.0808, att_loss=0.2392, loss=0.2075, over 3271748.90 frames. utt_duration=1250 frames, utt_pad_proportion=0.05623, over 10480.11 utterances.], batch size: 41, lr: 6.02e-03, grad_scale: 16.0 2023-03-08 17:19:17,104 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69518.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:19:26,205 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69524.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:19:27,554 INFO [train2.py:809] (2/4) Epoch 18, batch 1800, loss[ctc_loss=0.06301, att_loss=0.2286, loss=0.1954, over 16005.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007636, over 40.00 utterances.], tot_loss[ctc_loss=0.08077, att_loss=0.239, loss=0.2074, over 3276925.18 frames. utt_duration=1278 frames, utt_pad_proportion=0.04815, over 10270.73 utterances.], batch size: 40, lr: 6.01e-03, grad_scale: 16.0 2023-03-08 17:19:41,869 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.594e+02 2.055e+02 2.270e+02 2.783e+02 4.907e+02, threshold=4.541e+02, percent-clipped=0.0 2023-03-08 17:20:49,339 INFO [train2.py:809] (2/4) Epoch 18, batch 1850, loss[ctc_loss=0.06541, att_loss=0.22, loss=0.1891, over 15953.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.00644, over 41.00 utterances.], tot_loss[ctc_loss=0.08009, att_loss=0.2391, loss=0.2073, over 3278492.33 frames. utt_duration=1256 frames, utt_pad_proportion=0.05158, over 10453.17 utterances.], batch size: 41, lr: 6.01e-03, grad_scale: 16.0 2023-03-08 17:20:58,881 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69581.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:21:31,781 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69601.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 17:22:09,408 INFO [train2.py:809] (2/4) Epoch 18, batch 1900, loss[ctc_loss=0.07637, att_loss=0.2291, loss=0.1985, over 15366.00 frames. 
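The zipformer.py:625 records show that, well past the warmup windows printed for each encoder stack (warmup_begin/warmup_end in batches), the model still occasionally skips one or two randomly chosen layers in a stack (num_to_drop=1, layers_to_drop={0} and similar), most often dropping none. The sketch below illustrates that kind of stochastic layer skipping and the bookkeeping these records report; the drop probabilities and their dependence on the warmup window are assumptions for illustration only, not the model's actual schedule.

```python
import random


def choose_layers_to_drop(num_layers: int, batch_count: float,
                          warmup_begin: float, warmup_end: float,
                          base_prob: float = 0.05) -> set:
    """Pick a random subset of layers to skip for this batch (illustration).

    Mirrors the fields printed in the zipformer log records
    (warmup_begin, warmup_end, batch_count, num_to_drop, layers_to_drop);
    the probabilities here are assumptions, not the recipe's values.
    """
    # Assume dropping is more aggressive while still inside the warmup window.
    in_warmup = warmup_begin <= batch_count < warmup_end
    prob = 0.5 if in_warmup else base_prob
    num_to_drop = sum(random.random() < prob for _ in range(2))  # 0, 1 or 2
    return set(random.sample(range(num_layers), num_to_drop))


# e.g. a stack of 4 layers, long past its warmup window:
print(choose_layers_to_drop(4, batch_count=69_429.0,
                            warmup_begin=3333.3, warmup_end=4000.0))
```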
utt_duration=1758 frames, utt_pad_proportion=0.01149, over 35.00 utterances.], tot_loss[ctc_loss=0.08034, att_loss=0.2397, loss=0.2079, over 3291723.69 frames. utt_duration=1272 frames, utt_pad_proportion=0.04472, over 10364.27 utterances.], batch size: 35, lr: 6.01e-03, grad_scale: 16.0 2023-03-08 17:22:15,643 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69629.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 17:22:15,885 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5090, 2.5126, 4.9186, 3.8379, 3.0951, 4.2484, 4.7595, 4.6669], device='cuda:2'), covar=tensor([0.0265, 0.1741, 0.0185, 0.0910, 0.1645, 0.0233, 0.0125, 0.0246], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0239, 0.0170, 0.0309, 0.0264, 0.0202, 0.0154, 0.0185], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 17:22:23,437 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.309e+02 2.048e+02 2.360e+02 2.878e+02 5.569e+02, threshold=4.720e+02, percent-clipped=3.0 2023-03-08 17:22:37,292 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4250, 2.4790, 4.8102, 3.7275, 2.9867, 4.1606, 4.5692, 4.5420], device='cuda:2'), covar=tensor([0.0252, 0.1685, 0.0164, 0.0983, 0.1692, 0.0257, 0.0160, 0.0244], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0238, 0.0169, 0.0308, 0.0264, 0.0202, 0.0154, 0.0185], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 17:23:25,216 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-08 17:23:28,939 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69674.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 17:23:30,114 INFO [train2.py:809] (2/4) Epoch 18, batch 1950, loss[ctc_loss=0.1505, att_loss=0.2762, loss=0.251, over 13812.00 frames. utt_duration=382.4 frames, utt_pad_proportion=0.3361, over 145.00 utterances.], tot_loss[ctc_loss=0.08133, att_loss=0.2403, loss=0.2085, over 3289428.13 frames. utt_duration=1268 frames, utt_pad_proportion=0.04676, over 10387.24 utterances.], batch size: 145, lr: 6.01e-03, grad_scale: 16.0 2023-03-08 17:23:31,801 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69676.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:24:09,276 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69699.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:24:48,837 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69724.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:24:50,198 INFO [train2.py:809] (2/4) Epoch 18, batch 2000, loss[ctc_loss=0.05744, att_loss=0.2198, loss=0.1873, over 15952.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006642, over 41.00 utterances.], tot_loss[ctc_loss=0.08227, att_loss=0.2405, loss=0.2089, over 3280550.98 frames. 
utt_duration=1217 frames, utt_pad_proportion=0.06136, over 10791.60 utterances.], batch size: 41, lr: 6.00e-03, grad_scale: 8.0 2023-03-08 17:24:53,837 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0944, 4.3894, 4.4247, 4.6942, 2.5591, 4.4620, 2.7393, 1.5999], device='cuda:2'), covar=tensor([0.0368, 0.0222, 0.0615, 0.0158, 0.1808, 0.0178, 0.1424, 0.1869], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0150, 0.0256, 0.0144, 0.0220, 0.0129, 0.0230, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 17:25:05,731 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.492e+02 2.144e+02 2.656e+02 3.349e+02 1.249e+03, threshold=5.313e+02, percent-clipped=13.0 2023-03-08 17:25:06,193 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69735.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 17:25:26,862 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69747.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:25:31,584 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69750.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:26:10,679 INFO [train2.py:809] (2/4) Epoch 18, batch 2050, loss[ctc_loss=0.0819, att_loss=0.2495, loss=0.216, over 17379.00 frames. utt_duration=881.6 frames, utt_pad_proportion=0.07689, over 79.00 utterances.], tot_loss[ctc_loss=0.08261, att_loss=0.2414, loss=0.2096, over 3276945.24 frames. utt_duration=1160 frames, utt_pad_proportion=0.07729, over 11315.52 utterances.], batch size: 79, lr: 6.00e-03, grad_scale: 8.0 2023-03-08 17:27:19,501 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69818.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:27:28,555 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69824.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:27:29,781 INFO [train2.py:809] (2/4) Epoch 18, batch 2100, loss[ctc_loss=0.07758, att_loss=0.254, loss=0.2187, over 17285.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.0121, over 55.00 utterances.], tot_loss[ctc_loss=0.08166, att_loss=0.2408, loss=0.209, over 3280838.41 frames. utt_duration=1191 frames, utt_pad_proportion=0.06786, over 11033.23 utterances.], batch size: 55, lr: 6.00e-03, grad_scale: 8.0 2023-03-08 17:27:45,295 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.227e+02 1.987e+02 2.392e+02 3.047e+02 6.775e+02, threshold=4.784e+02, percent-clipped=2.0 2023-03-08 17:28:30,457 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0296, 5.0749, 4.8609, 2.1417, 1.8628, 2.6397, 2.5619, 3.7961], device='cuda:2'), covar=tensor([0.0688, 0.0247, 0.0244, 0.4897, 0.5947, 0.2750, 0.3147, 0.1782], device='cuda:2'), in_proj_covar=tensor([0.0343, 0.0256, 0.0254, 0.0234, 0.0339, 0.0329, 0.0244, 0.0360], device='cuda:2'), out_proj_covar=tensor([1.4668e-04, 9.5002e-05, 1.0912e-04, 1.0098e-04, 1.4280e-04, 1.2965e-04, 9.7337e-05, 1.4732e-04], device='cuda:2') 2023-03-08 17:28:36,237 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69866.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:28:45,327 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69872.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:28:49,864 INFO [train2.py:809] (2/4) Epoch 18, batch 2150, loss[ctc_loss=0.05985, att_loss=0.2229, loss=0.1903, over 16286.00 frames. 
utt_duration=1517 frames, utt_pad_proportion=0.006853, over 43.00 utterances.], tot_loss[ctc_loss=0.08072, att_loss=0.2401, loss=0.2082, over 3268719.62 frames. utt_duration=1192 frames, utt_pad_proportion=0.06948, over 10980.33 utterances.], batch size: 43, lr: 6.00e-03, grad_scale: 8.0 2023-03-08 17:28:59,297 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69881.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:29:31,239 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69901.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 17:29:51,187 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. limit=2.0 2023-03-08 17:30:03,941 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-03-08 17:30:08,949 INFO [train2.py:809] (2/4) Epoch 18, batch 2200, loss[ctc_loss=0.06899, att_loss=0.2294, loss=0.1973, over 15873.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.00875, over 39.00 utterances.], tot_loss[ctc_loss=0.08029, att_loss=0.2398, loss=0.2079, over 3270524.05 frames. utt_duration=1212 frames, utt_pad_proportion=0.06395, over 10807.93 utterances.], batch size: 39, lr: 6.00e-03, grad_scale: 8.0 2023-03-08 17:30:15,156 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69929.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:30:15,359 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69929.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 17:30:24,844 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.283e+02 2.018e+02 2.397e+02 3.065e+02 5.583e+02, threshold=4.795e+02, percent-clipped=3.0 2023-03-08 17:30:47,802 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69949.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 17:31:28,416 INFO [train2.py:809] (2/4) Epoch 18, batch 2250, loss[ctc_loss=0.1116, att_loss=0.2735, loss=0.2411, over 17320.00 frames. utt_duration=1101 frames, utt_pad_proportion=0.03759, over 63.00 utterances.], tot_loss[ctc_loss=0.0807, att_loss=0.2399, loss=0.208, over 3270869.33 frames. 
utt_duration=1206 frames, utt_pad_proportion=0.06559, over 10859.64 utterances.], batch size: 63, lr: 5.99e-03, grad_scale: 8.0 2023-03-08 17:31:30,210 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69976.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:31:31,507 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69977.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:31:31,787 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9604, 3.7010, 3.0650, 3.3010, 3.7616, 3.4317, 2.8261, 4.0351], device='cuda:2'), covar=tensor([0.0987, 0.0479, 0.1167, 0.0706, 0.0827, 0.0789, 0.0971, 0.0566], device='cuda:2'), in_proj_covar=tensor([0.0198, 0.0211, 0.0223, 0.0193, 0.0267, 0.0234, 0.0199, 0.0279], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 17:31:45,229 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1822, 3.9741, 3.3943, 3.7098, 4.0741, 3.7681, 3.4265, 4.5087], device='cuda:2'), covar=tensor([0.1015, 0.0527, 0.1096, 0.0666, 0.0743, 0.0747, 0.0755, 0.0493], device='cuda:2'), in_proj_covar=tensor([0.0198, 0.0212, 0.0223, 0.0193, 0.0267, 0.0234, 0.0200, 0.0280], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 17:31:58,277 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6306, 2.5124, 5.1270, 4.2194, 3.0112, 4.3921, 4.8497, 4.6752], device='cuda:2'), covar=tensor([0.0273, 0.1819, 0.0182, 0.0854, 0.1812, 0.0256, 0.0161, 0.0272], device='cuda:2'), in_proj_covar=tensor([0.0179, 0.0242, 0.0170, 0.0312, 0.0266, 0.0204, 0.0156, 0.0187], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 17:32:51,049 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70024.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:32:51,260 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70024.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:32:52,442 INFO [train2.py:809] (2/4) Epoch 18, batch 2300, loss[ctc_loss=0.05114, att_loss=0.2259, loss=0.191, over 16405.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006143, over 44.00 utterances.], tot_loss[ctc_loss=0.08037, att_loss=0.2395, loss=0.2077, over 3274941.05 frames. utt_duration=1211 frames, utt_pad_proportion=0.06369, over 10828.45 utterances.], batch size: 44, lr: 5.99e-03, grad_scale: 8.0 2023-03-08 17:32:55,039 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. 
limit=2.0 2023-03-08 17:33:01,026 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=70030.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 17:33:04,164 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5295, 2.2063, 2.2924, 2.5064, 2.7201, 2.7483, 2.3427, 3.0110], device='cuda:2'), covar=tensor([0.1847, 0.3536, 0.2572, 0.1930, 0.1806, 0.1275, 0.3136, 0.1203], device='cuda:2'), in_proj_covar=tensor([0.0105, 0.0114, 0.0114, 0.0099, 0.0109, 0.0097, 0.0117, 0.0086], device='cuda:2'), out_proj_covar=tensor([7.7914e-05, 8.6623e-05, 8.7881e-05, 7.5965e-05, 7.9897e-05, 7.6579e-05, 8.6646e-05, 6.8897e-05], device='cuda:2') 2023-03-08 17:33:09,215 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.472e+02 2.121e+02 2.733e+02 3.440e+02 9.117e+02, threshold=5.466e+02, percent-clipped=8.0 2023-03-08 17:33:23,347 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.05 vs. limit=5.0 2023-03-08 17:33:34,404 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70050.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:34:07,955 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70072.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:34:12,449 INFO [train2.py:809] (2/4) Epoch 18, batch 2350, loss[ctc_loss=0.06135, att_loss=0.2272, loss=0.1941, over 16129.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005498, over 42.00 utterances.], tot_loss[ctc_loss=0.08052, att_loss=0.2393, loss=0.2076, over 3271577.17 frames. utt_duration=1202 frames, utt_pad_proportion=0.06689, over 10904.92 utterances.], batch size: 42, lr: 5.99e-03, grad_scale: 8.0 2023-03-08 17:34:47,381 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1448, 2.7790, 3.0069, 4.1499, 3.7197, 3.6932, 2.7473, 2.0805], device='cuda:2'), covar=tensor([0.0779, 0.1964, 0.1032, 0.0616, 0.0823, 0.0520, 0.1667, 0.2451], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0215, 0.0188, 0.0211, 0.0215, 0.0173, 0.0202, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 17:34:50,259 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70098.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:35:08,709 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=70110.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:35:31,915 INFO [train2.py:809] (2/4) Epoch 18, batch 2400, loss[ctc_loss=0.08039, att_loss=0.2553, loss=0.2203, over 17284.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01293, over 55.00 utterances.], tot_loss[ctc_loss=0.08046, att_loss=0.2392, loss=0.2075, over 3266409.04 frames. utt_duration=1206 frames, utt_pad_proportion=0.06632, over 10843.41 utterances.], batch size: 55, lr: 5.99e-03, grad_scale: 8.0 2023-03-08 17:35:48,285 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.485e+02 2.097e+02 2.428e+02 2.770e+02 6.273e+02, threshold=4.856e+02, percent-clipped=1.0 2023-03-08 17:36:45,631 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=70171.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:36:52,019 INFO [train2.py:809] (2/4) Epoch 18, batch 2450, loss[ctc_loss=0.0904, att_loss=0.2561, loss=0.2229, over 16471.00 frames. 
utt_duration=1434 frames, utt_pad_proportion=0.006297, over 46.00 utterances.], tot_loss[ctc_loss=0.08147, att_loss=0.2408, loss=0.2089, over 3273478.99 frames. utt_duration=1183 frames, utt_pad_proportion=0.07137, over 11083.62 utterances.], batch size: 46, lr: 5.99e-03, grad_scale: 8.0 2023-03-08 17:38:11,067 INFO [train2.py:809] (2/4) Epoch 18, batch 2500, loss[ctc_loss=0.1163, att_loss=0.2667, loss=0.2367, over 17149.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.01332, over 56.00 utterances.], tot_loss[ctc_loss=0.08224, att_loss=0.2411, loss=0.2093, over 3280796.86 frames. utt_duration=1179 frames, utt_pad_proportion=0.06839, over 11142.52 utterances.], batch size: 56, lr: 5.98e-03, grad_scale: 8.0 2023-03-08 17:38:26,724 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.323e+02 2.106e+02 2.442e+02 2.921e+02 5.704e+02, threshold=4.884e+02, percent-clipped=1.0 2023-03-08 17:39:06,547 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=70260.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:39:19,792 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0199, 5.3403, 5.6037, 5.4531, 5.5134, 6.0072, 5.2925, 6.0818], device='cuda:2'), covar=tensor([0.0673, 0.0741, 0.0861, 0.1377, 0.1686, 0.0852, 0.0650, 0.0656], device='cuda:2'), in_proj_covar=tensor([0.0834, 0.0486, 0.0571, 0.0639, 0.0834, 0.0591, 0.0472, 0.0580], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 17:39:31,631 INFO [train2.py:809] (2/4) Epoch 18, batch 2550, loss[ctc_loss=0.07866, att_loss=0.2413, loss=0.2088, over 17291.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01242, over 55.00 utterances.], tot_loss[ctc_loss=0.08169, att_loss=0.2408, loss=0.209, over 3278039.63 frames. utt_duration=1177 frames, utt_pad_proportion=0.06997, over 11157.46 utterances.], batch size: 55, lr: 5.98e-03, grad_scale: 8.0 2023-03-08 17:39:43,784 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-08 17:39:48,993 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-03-08 17:40:43,950 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=70321.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:40:50,210 INFO [train2.py:809] (2/4) Epoch 18, batch 2600, loss[ctc_loss=0.1209, att_loss=0.2607, loss=0.2327, over 16468.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.007061, over 46.00 utterances.], tot_loss[ctc_loss=0.08179, att_loss=0.2407, loss=0.2089, over 3281236.35 frames. utt_duration=1181 frames, utt_pad_proportion=0.06779, over 11123.35 utterances.], batch size: 46, lr: 5.98e-03, grad_scale: 8.0 2023-03-08 17:40:58,272 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70330.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 17:41:05,530 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.324e+02 1.967e+02 2.510e+02 3.094e+02 6.487e+02, threshold=5.020e+02, percent-clipped=1.0 2023-03-08 17:42:08,698 INFO [train2.py:809] (2/4) Epoch 18, batch 2650, loss[ctc_loss=0.06983, att_loss=0.2107, loss=0.1825, over 15512.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.008019, over 36.00 utterances.], tot_loss[ctc_loss=0.08176, att_loss=0.2402, loss=0.2085, over 3275163.06 frames. 
utt_duration=1194 frames, utt_pad_proportion=0.06714, over 10987.23 utterances.], batch size: 36, lr: 5.98e-03, grad_scale: 8.0 2023-03-08 17:42:13,358 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70378.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 17:42:14,708 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0258, 6.2642, 5.7177, 6.0406, 5.9172, 5.4521, 5.6671, 5.4644], device='cuda:2'), covar=tensor([0.1223, 0.0784, 0.0767, 0.0729, 0.0792, 0.1349, 0.2135, 0.2294], device='cuda:2'), in_proj_covar=tensor([0.0492, 0.0574, 0.0435, 0.0431, 0.0408, 0.0445, 0.0585, 0.0500], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 17:43:27,229 INFO [train2.py:809] (2/4) Epoch 18, batch 2700, loss[ctc_loss=0.07527, att_loss=0.2291, loss=0.1983, over 16278.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.007478, over 43.00 utterances.], tot_loss[ctc_loss=0.0822, att_loss=0.2399, loss=0.2084, over 3265625.20 frames. utt_duration=1185 frames, utt_pad_proportion=0.07136, over 11036.77 utterances.], batch size: 43, lr: 5.98e-03, grad_scale: 8.0 2023-03-08 17:43:30,656 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6830, 3.0087, 3.6825, 3.1367, 3.6674, 4.6776, 4.5808, 3.2537], device='cuda:2'), covar=tensor([0.0329, 0.1708, 0.1261, 0.1284, 0.1024, 0.0838, 0.0518, 0.1455], device='cuda:2'), in_proj_covar=tensor([0.0240, 0.0240, 0.0270, 0.0213, 0.0261, 0.0352, 0.0249, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 17:43:43,122 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.380e+02 1.957e+02 2.578e+02 3.179e+02 5.678e+02, threshold=5.155e+02, percent-clipped=6.0 2023-03-08 17:43:54,204 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6348, 3.7875, 3.3724, 3.8287, 2.5846, 3.7688, 2.6402, 2.2377], device='cuda:2'), covar=tensor([0.0495, 0.0269, 0.0977, 0.0240, 0.1835, 0.0227, 0.1600, 0.1515], device='cuda:2'), in_proj_covar=tensor([0.0179, 0.0150, 0.0256, 0.0145, 0.0221, 0.0129, 0.0231, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 17:44:15,437 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9309, 6.1596, 5.6240, 5.8940, 5.8166, 5.2782, 5.5503, 5.3386], device='cuda:2'), covar=tensor([0.1218, 0.0802, 0.0933, 0.0766, 0.0831, 0.1458, 0.2221, 0.2082], device='cuda:2'), in_proj_covar=tensor([0.0496, 0.0577, 0.0439, 0.0435, 0.0412, 0.0450, 0.0591, 0.0505], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 17:44:31,282 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=70466.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:44:33,983 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.51 vs. limit=5.0 2023-03-08 17:44:46,104 INFO [train2.py:809] (2/4) Epoch 18, batch 2750, loss[ctc_loss=0.05294, att_loss=0.2124, loss=0.1805, over 15883.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.008963, over 39.00 utterances.], tot_loss[ctc_loss=0.08154, att_loss=0.2393, loss=0.2077, over 3251841.05 frames. 
utt_duration=1192 frames, utt_pad_proportion=0.07362, over 10927.46 utterances.], batch size: 39, lr: 5.97e-03, grad_scale: 8.0 2023-03-08 17:46:05,047 INFO [train2.py:809] (2/4) Epoch 18, batch 2800, loss[ctc_loss=0.1111, att_loss=0.2635, loss=0.233, over 14038.00 frames. utt_duration=386 frames, utt_pad_proportion=0.3275, over 146.00 utterances.], tot_loss[ctc_loss=0.08145, att_loss=0.2396, loss=0.2079, over 3256177.70 frames. utt_duration=1177 frames, utt_pad_proportion=0.07677, over 11080.12 utterances.], batch size: 146, lr: 5.97e-03, grad_scale: 8.0 2023-03-08 17:46:20,626 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.291e+02 1.948e+02 2.363e+02 2.957e+02 5.460e+02, threshold=4.725e+02, percent-clipped=2.0 2023-03-08 17:47:17,800 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0014, 6.1606, 5.5877, 5.9067, 5.8269, 5.3259, 5.6200, 5.3312], device='cuda:2'), covar=tensor([0.1013, 0.0704, 0.0835, 0.0754, 0.0736, 0.1327, 0.1988, 0.2228], device='cuda:2'), in_proj_covar=tensor([0.0492, 0.0576, 0.0437, 0.0430, 0.0410, 0.0446, 0.0586, 0.0505], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 17:47:24,018 INFO [train2.py:809] (2/4) Epoch 18, batch 2850, loss[ctc_loss=0.07201, att_loss=0.2134, loss=0.1851, over 15649.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008091, over 37.00 utterances.], tot_loss[ctc_loss=0.08142, att_loss=0.24, loss=0.2082, over 3263245.72 frames. utt_duration=1175 frames, utt_pad_proportion=0.07563, over 11123.66 utterances.], batch size: 37, lr: 5.97e-03, grad_scale: 8.0 2023-03-08 17:47:54,969 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5157, 2.8552, 3.3638, 4.3900, 3.9103, 3.9322, 3.0315, 2.1987], device='cuda:2'), covar=tensor([0.0643, 0.1953, 0.0885, 0.0544, 0.0818, 0.0451, 0.1421, 0.2306], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0213, 0.0185, 0.0210, 0.0211, 0.0170, 0.0198, 0.0187], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 17:48:21,663 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4581, 4.8150, 4.7471, 4.7960, 4.9018, 4.5843, 3.5653, 4.8017], device='cuda:2'), covar=tensor([0.0137, 0.0124, 0.0127, 0.0109, 0.0101, 0.0124, 0.0659, 0.0203], device='cuda:2'), in_proj_covar=tensor([0.0087, 0.0084, 0.0104, 0.0066, 0.0071, 0.0082, 0.0101, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 17:48:29,674 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=70616.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:48:43,289 INFO [train2.py:809] (2/4) Epoch 18, batch 2900, loss[ctc_loss=0.07096, att_loss=0.2158, loss=0.1869, over 15618.00 frames. utt_duration=1690 frames, utt_pad_proportion=0.009816, over 37.00 utterances.], tot_loss[ctc_loss=0.08116, att_loss=0.2398, loss=0.208, over 3253850.25 frames. utt_duration=1148 frames, utt_pad_proportion=0.08527, over 11350.53 utterances.], batch size: 37, lr: 5.97e-03, grad_scale: 8.0 2023-03-08 17:48:59,499 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.265e+02 1.930e+02 2.447e+02 2.876e+02 5.952e+02, threshold=4.894e+02, percent-clipped=4.0 2023-03-08 17:49:11,433 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.27 vs. 
limit=5.0 2023-03-08 17:49:14,833 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-08 17:50:04,767 INFO [train2.py:809] (2/4) Epoch 18, batch 2950, loss[ctc_loss=0.08618, att_loss=0.2442, loss=0.2126, over 17434.00 frames. utt_duration=1012 frames, utt_pad_proportion=0.04527, over 69.00 utterances.], tot_loss[ctc_loss=0.08112, att_loss=0.2398, loss=0.2081, over 3251659.92 frames. utt_duration=1135 frames, utt_pad_proportion=0.089, over 11476.61 utterances.], batch size: 69, lr: 5.96e-03, grad_scale: 8.0 2023-03-08 17:50:16,393 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5532, 4.9746, 5.1514, 4.9383, 5.0941, 5.4940, 4.9781, 5.5773], device='cuda:2'), covar=tensor([0.0710, 0.0747, 0.0932, 0.1508, 0.1779, 0.0972, 0.0882, 0.0770], device='cuda:2'), in_proj_covar=tensor([0.0831, 0.0490, 0.0571, 0.0639, 0.0837, 0.0594, 0.0470, 0.0586], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 17:50:38,964 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5277, 3.7362, 3.5797, 3.4006, 3.9234, 3.4600, 3.5232, 2.4792], device='cuda:2'), covar=tensor([0.0353, 0.0364, 0.0440, 0.0533, 0.0634, 0.0353, 0.0433, 0.1798], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0166, 0.0169, 0.0187, 0.0357, 0.0142, 0.0158, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 17:51:18,089 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3845, 5.2124, 5.1644, 3.1135, 5.1265, 4.8846, 4.7178, 3.2893], device='cuda:2'), covar=tensor([0.0084, 0.0095, 0.0202, 0.0874, 0.0078, 0.0166, 0.0230, 0.1015], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0097, 0.0096, 0.0108, 0.0080, 0.0106, 0.0096, 0.0101], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 17:51:26,433 INFO [train2.py:809] (2/4) Epoch 18, batch 3000, loss[ctc_loss=0.1343, att_loss=0.2681, loss=0.2414, over 13672.00 frames. utt_duration=376.1 frames, utt_pad_proportion=0.3447, over 146.00 utterances.], tot_loss[ctc_loss=0.0814, att_loss=0.2406, loss=0.2087, over 3260125.09 frames. utt_duration=1139 frames, utt_pad_proportion=0.08588, over 11460.63 utterances.], batch size: 146, lr: 5.96e-03, grad_scale: 8.0 2023-03-08 17:51:26,433 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 17:51:43,588 INFO [train2.py:843] (2/4) Epoch 18, validation: ctc_loss=0.04147, att_loss=0.2347, loss=0.196, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 17:51:43,589 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 17:51:59,784 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.535e+02 1.987e+02 2.490e+02 3.021e+02 7.545e+02, threshold=4.980e+02, percent-clipped=3.0 2023-03-08 17:52:49,417 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70766.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:53:03,428 INFO [train2.py:809] (2/4) Epoch 18, batch 3050, loss[ctc_loss=0.08366, att_loss=0.246, loss=0.2135, over 16316.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.007114, over 45.00 utterances.], tot_loss[ctc_loss=0.08035, att_loss=0.239, loss=0.2073, over 3256725.07 frames. 
utt_duration=1177 frames, utt_pad_proportion=0.07741, over 11085.84 utterances.], batch size: 45, lr: 5.96e-03, grad_scale: 8.0 2023-03-08 17:54:07,343 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70814.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:54:26,016 INFO [train2.py:809] (2/4) Epoch 18, batch 3100, loss[ctc_loss=0.08197, att_loss=0.253, loss=0.2188, over 17394.00 frames. utt_duration=1010 frames, utt_pad_proportion=0.04735, over 69.00 utterances.], tot_loss[ctc_loss=0.0801, att_loss=0.2393, loss=0.2075, over 3269747.46 frames. utt_duration=1181 frames, utt_pad_proportion=0.07278, over 11084.84 utterances.], batch size: 69, lr: 5.96e-03, grad_scale: 8.0 2023-03-08 17:54:41,865 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.217e+02 2.010e+02 2.337e+02 3.007e+02 9.236e+02, threshold=4.674e+02, percent-clipped=2.0 2023-03-08 17:55:35,490 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2023-03-08 17:55:46,922 INFO [train2.py:809] (2/4) Epoch 18, batch 3150, loss[ctc_loss=0.07172, att_loss=0.2284, loss=0.1971, over 16261.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.007635, over 43.00 utterances.], tot_loss[ctc_loss=0.07982, att_loss=0.2391, loss=0.2073, over 3271190.02 frames. utt_duration=1203 frames, utt_pad_proportion=0.06702, over 10892.26 utterances.], batch size: 43, lr: 5.96e-03, grad_scale: 8.0 2023-03-08 17:56:19,732 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2023-03-08 17:56:52,236 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70916.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:57:07,452 INFO [train2.py:809] (2/4) Epoch 18, batch 3200, loss[ctc_loss=0.08397, att_loss=0.2167, loss=0.1902, over 15504.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.007879, over 36.00 utterances.], tot_loss[ctc_loss=0.07958, att_loss=0.2381, loss=0.2064, over 3264237.52 frames. utt_duration=1237 frames, utt_pad_proportion=0.06095, over 10568.44 utterances.], batch size: 36, lr: 5.95e-03, grad_scale: 8.0 2023-03-08 17:57:23,346 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.324e+02 2.043e+02 2.547e+02 2.997e+02 7.270e+02, threshold=5.093e+02, percent-clipped=4.0 2023-03-08 17:58:10,131 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70964.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:58:28,689 INFO [train2.py:809] (2/4) Epoch 18, batch 3250, loss[ctc_loss=0.07017, att_loss=0.213, loss=0.1844, over 15775.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008416, over 38.00 utterances.], tot_loss[ctc_loss=0.07903, att_loss=0.2373, loss=0.2056, over 3263349.13 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.06035, over 10525.53 utterances.], batch size: 38, lr: 5.95e-03, grad_scale: 8.0 2023-03-08 17:58:34,239 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9003, 5.1450, 5.4791, 5.3040, 5.3257, 5.9010, 5.1064, 5.9806], device='cuda:2'), covar=tensor([0.0771, 0.0742, 0.0807, 0.1196, 0.1881, 0.0839, 0.0703, 0.0667], device='cuda:2'), in_proj_covar=tensor([0.0837, 0.0487, 0.0571, 0.0635, 0.0840, 0.0594, 0.0469, 0.0585], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 17:58:56,387 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=70992.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 17:59:47,173 INFO [train2.py:809] (2/4) Epoch 18, batch 3300, loss[ctc_loss=0.06099, att_loss=0.2294, loss=0.1957, over 16403.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007661, over 44.00 utterances.], tot_loss[ctc_loss=0.07876, att_loss=0.237, loss=0.2053, over 3264179.39 frames. utt_duration=1262 frames, utt_pad_proportion=0.05561, over 10360.37 utterances.], batch size: 44, lr: 5.95e-03, grad_scale: 8.0 2023-03-08 17:59:55,980 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=71030.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:00:02,924 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.258e+02 1.989e+02 2.342e+02 2.870e+02 6.957e+02, threshold=4.685e+02, percent-clipped=3.0 2023-03-08 18:00:23,682 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6406, 3.8811, 3.9350, 2.3559, 2.4072, 2.8204, 2.5450, 3.5396], device='cuda:2'), covar=tensor([0.0700, 0.0341, 0.0353, 0.3966, 0.3945, 0.2072, 0.2436, 0.1289], device='cuda:2'), in_proj_covar=tensor([0.0351, 0.0263, 0.0259, 0.0239, 0.0345, 0.0332, 0.0248, 0.0363], device='cuda:2'), out_proj_covar=tensor([1.4948e-04, 9.7775e-05, 1.1091e-04, 1.0357e-04, 1.4514e-04, 1.3064e-04, 9.8964e-05, 1.4878e-04], device='cuda:2') 2023-03-08 18:00:23,936 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.68 vs. limit=2.0 2023-03-08 18:00:31,319 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=71053.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:01:01,598 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. limit=2.0 2023-03-08 18:01:05,448 INFO [train2.py:809] (2/4) Epoch 18, batch 3350, loss[ctc_loss=0.07272, att_loss=0.2352, loss=0.2027, over 16549.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.005762, over 45.00 utterances.], tot_loss[ctc_loss=0.07965, att_loss=0.2383, loss=0.2066, over 3267862.74 frames. utt_duration=1226 frames, utt_pad_proportion=0.06341, over 10676.59 utterances.], batch size: 45, lr: 5.95e-03, grad_scale: 8.0 2023-03-08 18:01:31,179 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=71091.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:02:25,933 INFO [train2.py:809] (2/4) Epoch 18, batch 3400, loss[ctc_loss=0.1054, att_loss=0.2537, loss=0.224, over 14179.00 frames. utt_duration=392.6 frames, utt_pad_proportion=0.3172, over 145.00 utterances.], tot_loss[ctc_loss=0.07977, att_loss=0.2384, loss=0.2066, over 3261344.11 frames. 
utt_duration=1203 frames, utt_pad_proportion=0.06998, over 10860.38 utterances.], batch size: 145, lr: 5.95e-03, grad_scale: 8.0 2023-03-08 18:02:35,233 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2023-03-08 18:02:42,139 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.526e+02 2.089e+02 2.385e+02 3.204e+02 8.386e+02, threshold=4.770e+02, percent-clipped=4.0 2023-03-08 18:03:01,754 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8212, 3.0549, 3.8120, 3.3832, 3.7930, 4.8198, 4.6591, 3.4827], device='cuda:2'), covar=tensor([0.0291, 0.1660, 0.1148, 0.1177, 0.0927, 0.0679, 0.0418, 0.1125], device='cuda:2'), in_proj_covar=tensor([0.0240, 0.0238, 0.0269, 0.0213, 0.0256, 0.0349, 0.0249, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 18:03:12,814 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4700, 2.5393, 2.6609, 4.4680, 4.0205, 3.9635, 3.0382, 2.0209], device='cuda:2'), covar=tensor([0.0636, 0.2227, 0.1630, 0.0566, 0.0778, 0.0452, 0.1282, 0.2598], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0215, 0.0188, 0.0212, 0.0216, 0.0173, 0.0200, 0.0190], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 18:03:46,794 INFO [train2.py:809] (2/4) Epoch 18, batch 3450, loss[ctc_loss=0.08055, att_loss=0.2478, loss=0.2143, over 16956.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008035, over 50.00 utterances.], tot_loss[ctc_loss=0.07974, att_loss=0.2384, loss=0.2067, over 3263760.98 frames. utt_duration=1214 frames, utt_pad_proportion=0.065, over 10764.69 utterances.], batch size: 50, lr: 5.94e-03, grad_scale: 8.0 2023-03-08 18:04:31,284 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.32 vs. limit=5.0 2023-03-08 18:04:51,650 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-03-08 18:05:06,129 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-08 18:05:06,664 INFO [train2.py:809] (2/4) Epoch 18, batch 3500, loss[ctc_loss=0.07255, att_loss=0.2352, loss=0.2027, over 16186.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006013, over 41.00 utterances.], tot_loss[ctc_loss=0.07914, att_loss=0.2382, loss=0.2064, over 3271109.19 frames. utt_duration=1237 frames, utt_pad_proportion=0.05832, over 10590.15 utterances.], batch size: 41, lr: 5.94e-03, grad_scale: 8.0 2023-03-08 18:05:22,700 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.257e+02 1.989e+02 2.438e+02 2.819e+02 5.372e+02, threshold=4.876e+02, percent-clipped=3.0 2023-03-08 18:05:30,019 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.53 vs. limit=2.0 2023-03-08 18:06:26,579 INFO [train2.py:809] (2/4) Epoch 18, batch 3550, loss[ctc_loss=0.1002, att_loss=0.2607, loss=0.2286, over 17084.00 frames. utt_duration=1316 frames, utt_pad_proportion=0.006362, over 52.00 utterances.], tot_loss[ctc_loss=0.07958, att_loss=0.2382, loss=0.2065, over 3268075.75 frames. utt_duration=1242 frames, utt_pad_proportion=0.05639, over 10536.79 utterances.], batch size: 52, lr: 5.94e-03, grad_scale: 8.0 2023-03-08 18:06:46,910 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.10 vs. 
limit=5.0 2023-03-08 18:06:59,096 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9540, 6.1846, 5.5988, 5.9638, 5.8576, 5.2910, 5.6156, 5.3661], device='cuda:2'), covar=tensor([0.1345, 0.0894, 0.1074, 0.0748, 0.1010, 0.1655, 0.2478, 0.2437], device='cuda:2'), in_proj_covar=tensor([0.0500, 0.0582, 0.0438, 0.0433, 0.0414, 0.0452, 0.0587, 0.0509], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 18:07:25,087 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7181, 2.2533, 2.6957, 2.4520, 2.6894, 2.7473, 2.3703, 2.8876], device='cuda:2'), covar=tensor([0.1988, 0.3642, 0.2413, 0.1961, 0.2844, 0.1281, 0.2823, 0.1362], device='cuda:2'), in_proj_covar=tensor([0.0108, 0.0115, 0.0113, 0.0100, 0.0112, 0.0098, 0.0118, 0.0087], device='cuda:2'), out_proj_covar=tensor([7.9530e-05, 8.7672e-05, 8.8109e-05, 7.7162e-05, 8.2207e-05, 7.7673e-05, 8.7609e-05, 7.0122e-05], device='cuda:2') 2023-03-08 18:07:46,253 INFO [train2.py:809] (2/4) Epoch 18, batch 3600, loss[ctc_loss=0.09135, att_loss=0.2529, loss=0.2206, over 17433.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.03212, over 63.00 utterances.], tot_loss[ctc_loss=0.07945, att_loss=0.2381, loss=0.2064, over 3273676.98 frames. utt_duration=1256 frames, utt_pad_proportion=0.05197, over 10440.60 utterances.], batch size: 63, lr: 5.94e-03, grad_scale: 8.0 2023-03-08 18:07:46,705 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3028, 4.0714, 4.1215, 4.0902, 4.7141, 4.2462, 4.0606, 2.4233], device='cuda:2'), covar=tensor([0.0232, 0.0422, 0.0442, 0.0312, 0.0938, 0.0231, 0.0357, 0.1863], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0166, 0.0170, 0.0187, 0.0357, 0.0141, 0.0159, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 18:08:02,039 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.388e+02 1.997e+02 2.305e+02 2.777e+02 7.963e+02, threshold=4.609e+02, percent-clipped=3.0 2023-03-08 18:08:23,284 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=71348.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:09:09,345 INFO [train2.py:809] (2/4) Epoch 18, batch 3650, loss[ctc_loss=0.07406, att_loss=0.2286, loss=0.1977, over 16320.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.005312, over 45.00 utterances.], tot_loss[ctc_loss=0.07958, att_loss=0.2383, loss=0.2066, over 3277282.45 frames. utt_duration=1261 frames, utt_pad_proportion=0.04974, over 10409.35 utterances.], batch size: 45, lr: 5.94e-03, grad_scale: 8.0 2023-03-08 18:09:28,393 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=71386.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:10:33,055 INFO [train2.py:809] (2/4) Epoch 18, batch 3700, loss[ctc_loss=0.04666, att_loss=0.2079, loss=0.1756, over 15766.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.008699, over 38.00 utterances.], tot_loss[ctc_loss=0.07961, att_loss=0.2387, loss=0.2069, over 3288525.99 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.04626, over 10461.34 utterances.], batch size: 38, lr: 5.93e-03, grad_scale: 8.0 2023-03-08 18:10:49,956 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.274e+02 2.052e+02 2.398e+02 2.900e+02 5.315e+02, threshold=4.797e+02, percent-clipped=3.0 2023-03-08 18:11:56,300 INFO [train2.py:809] (2/4) Epoch 18, batch 3750, loss[ctc_loss=0.08987, att_loss=0.2505, loss=0.2184, over 16331.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006186, over 45.00 utterances.], tot_loss[ctc_loss=0.07959, att_loss=0.2386, loss=0.2068, over 3281477.05 frames. utt_duration=1264 frames, utt_pad_proportion=0.04754, over 10393.93 utterances.], batch size: 45, lr: 5.93e-03, grad_scale: 8.0 2023-03-08 18:12:04,212 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.35 vs. limit=5.0 2023-03-08 18:12:18,407 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7117, 5.9869, 5.4223, 5.7147, 5.6293, 5.1972, 5.4191, 5.1824], device='cuda:2'), covar=tensor([0.1197, 0.0893, 0.0953, 0.0837, 0.0840, 0.1722, 0.2360, 0.2343], device='cuda:2'), in_proj_covar=tensor([0.0499, 0.0583, 0.0438, 0.0437, 0.0414, 0.0455, 0.0589, 0.0507], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 18:12:58,118 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-03-08 18:13:18,684 INFO [train2.py:809] (2/4) Epoch 18, batch 3800, loss[ctc_loss=0.06564, att_loss=0.2275, loss=0.1951, over 15355.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.01218, over 35.00 utterances.], tot_loss[ctc_loss=0.07964, att_loss=0.2387, loss=0.2069, over 3289319.26 frames. utt_duration=1263 frames, utt_pad_proportion=0.04436, over 10432.84 utterances.], batch size: 35, lr: 5.93e-03, grad_scale: 8.0 2023-03-08 18:13:28,798 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9479, 3.6057, 3.6560, 3.1555, 3.6064, 3.7417, 3.7341, 2.8959], device='cuda:2'), covar=tensor([0.0992, 0.1512, 0.2486, 0.3681, 0.1235, 0.1944, 0.0918, 0.3966], device='cuda:2'), in_proj_covar=tensor([0.0152, 0.0168, 0.0181, 0.0242, 0.0145, 0.0239, 0.0162, 0.0209], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 18:13:34,473 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.420e+02 2.121e+02 2.472e+02 3.058e+02 5.629e+02, threshold=4.945e+02, percent-clipped=2.0 2023-03-08 18:14:04,566 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6461, 3.4379, 2.8294, 3.1125, 3.5179, 3.3115, 2.6958, 3.5959], device='cuda:2'), covar=tensor([0.1081, 0.0484, 0.1018, 0.0758, 0.0715, 0.0685, 0.0912, 0.0660], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0212, 0.0221, 0.0192, 0.0267, 0.0232, 0.0197, 0.0279], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 18:14:37,808 INFO [train2.py:809] (2/4) Epoch 18, batch 3850, loss[ctc_loss=0.06347, att_loss=0.23, loss=0.1967, over 16689.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005788, over 46.00 utterances.], tot_loss[ctc_loss=0.08029, att_loss=0.2391, loss=0.2073, over 3290307.10 frames. 
utt_duration=1229 frames, utt_pad_proportion=0.05176, over 10722.18 utterances.], batch size: 46, lr: 5.93e-03, grad_scale: 8.0 2023-03-08 18:14:54,608 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.54 vs. limit=2.0 2023-03-08 18:15:29,839 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-03-08 18:15:41,485 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=71616.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:15:54,889 INFO [train2.py:809] (2/4) Epoch 18, batch 3900, loss[ctc_loss=0.1098, att_loss=0.2588, loss=0.229, over 17451.00 frames. utt_duration=1013 frames, utt_pad_proportion=0.04498, over 69.00 utterances.], tot_loss[ctc_loss=0.08007, att_loss=0.239, loss=0.2072, over 3292653.85 frames. utt_duration=1247 frames, utt_pad_proportion=0.04673, over 10575.10 utterances.], batch size: 69, lr: 5.93e-03, grad_scale: 8.0 2023-03-08 18:16:10,963 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.518e+02 2.131e+02 2.535e+02 3.133e+02 8.645e+02, threshold=5.071e+02, percent-clipped=4.0 2023-03-08 18:16:31,490 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=71648.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:17:13,022 INFO [train2.py:809] (2/4) Epoch 18, batch 3950, loss[ctc_loss=0.09261, att_loss=0.2232, loss=0.197, over 15781.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008102, over 38.00 utterances.], tot_loss[ctc_loss=0.08006, att_loss=0.2395, loss=0.2076, over 3290520.27 frames. utt_duration=1232 frames, utt_pad_proportion=0.05192, over 10692.50 utterances.], batch size: 38, lr: 5.92e-03, grad_scale: 8.0 2023-03-08 18:17:16,298 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=71677.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:17:29,977 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=71686.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:17:45,659 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=71696.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:18:26,972 INFO [train2.py:809] (2/4) Epoch 19, batch 0, loss[ctc_loss=0.05769, att_loss=0.2272, loss=0.1933, over 15945.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007446, over 41.00 utterances.], tot_loss[ctc_loss=0.05769, att_loss=0.2272, loss=0.1933, over 15945.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007446, over 41.00 utterances.], batch size: 41, lr: 5.76e-03, grad_scale: 16.0 2023-03-08 18:18:26,972 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 18:18:38,961 INFO [train2.py:843] (2/4) Epoch 19, validation: ctc_loss=0.04291, att_loss=0.2348, loss=0.1964, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 18:18:38,962 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 18:19:20,155 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=71734.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:19:21,551 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.522e+02 2.028e+02 2.615e+02 3.173e+02 7.413e+02, threshold=5.229e+02, percent-clipped=5.0 2023-03-08 18:19:57,598 INFO [train2.py:809] (2/4) Epoch 19, batch 50, loss[ctc_loss=0.0613, att_loss=0.2142, loss=0.1836, over 15367.00 frames. 
utt_duration=1758 frames, utt_pad_proportion=0.01151, over 35.00 utterances.], tot_loss[ctc_loss=0.07275, att_loss=0.2338, loss=0.2016, over 738931.25 frames. utt_duration=1429 frames, utt_pad_proportion=0.01428, over 2070.48 utterances.], batch size: 35, lr: 5.76e-03, grad_scale: 16.0 2023-03-08 18:20:48,151 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3293, 2.5658, 3.0985, 2.5724, 3.0680, 3.4712, 3.3828, 2.7189], device='cuda:2'), covar=tensor([0.0496, 0.1455, 0.1173, 0.1235, 0.0967, 0.1143, 0.0659, 0.1277], device='cuda:2'), in_proj_covar=tensor([0.0242, 0.0243, 0.0275, 0.0216, 0.0262, 0.0355, 0.0254, 0.0230], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 18:21:17,852 INFO [train2.py:809] (2/4) Epoch 19, batch 100, loss[ctc_loss=0.09649, att_loss=0.2298, loss=0.2031, over 16259.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.007909, over 43.00 utterances.], tot_loss[ctc_loss=0.07691, att_loss=0.2361, loss=0.2042, over 1298404.81 frames. utt_duration=1236 frames, utt_pad_proportion=0.06043, over 4207.77 utterances.], batch size: 43, lr: 5.76e-03, grad_scale: 16.0 2023-03-08 18:22:00,749 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.482e+02 2.106e+02 2.518e+02 3.036e+02 5.583e+02, threshold=5.037e+02, percent-clipped=2.0 2023-03-08 18:22:36,458 INFO [train2.py:809] (2/4) Epoch 19, batch 150, loss[ctc_loss=0.04482, att_loss=0.1914, loss=0.1621, over 12846.00 frames. utt_duration=1836 frames, utt_pad_proportion=0.105, over 28.00 utterances.], tot_loss[ctc_loss=0.07664, att_loss=0.2364, loss=0.2044, over 1733932.63 frames. utt_duration=1296 frames, utt_pad_proportion=0.0439, over 5357.12 utterances.], batch size: 28, lr: 5.76e-03, grad_scale: 16.0 2023-03-08 18:22:59,636 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.28 vs. limit=5.0 2023-03-08 18:23:30,855 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7920, 3.9666, 3.9327, 3.9858, 4.0256, 3.7733, 3.0706, 3.9410], device='cuda:2'), covar=tensor([0.0136, 0.0144, 0.0154, 0.0096, 0.0114, 0.0142, 0.0653, 0.0222], device='cuda:2'), in_proj_covar=tensor([0.0088, 0.0085, 0.0105, 0.0066, 0.0071, 0.0082, 0.0101, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 18:23:44,776 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-03-08 18:23:56,060 INFO [train2.py:809] (2/4) Epoch 19, batch 200, loss[ctc_loss=0.06795, att_loss=0.241, loss=0.2064, over 16947.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.008444, over 50.00 utterances.], tot_loss[ctc_loss=0.07784, att_loss=0.238, loss=0.206, over 2074135.33 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.0581, over 6706.81 utterances.], batch size: 50, lr: 5.75e-03, grad_scale: 16.0 2023-03-08 18:24:14,380 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=71919.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:24:36,087 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0708, 6.2947, 5.7060, 5.9869, 5.9040, 5.4499, 5.7413, 5.3640], device='cuda:2'), covar=tensor([0.1095, 0.0869, 0.1022, 0.0883, 0.0826, 0.1612, 0.2255, 0.2923], device='cuda:2'), in_proj_covar=tensor([0.0502, 0.0586, 0.0441, 0.0440, 0.0419, 0.0455, 0.0593, 0.0516], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 18:24:39,023 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.975e+02 2.308e+02 2.986e+02 5.349e+02, threshold=4.617e+02, percent-clipped=2.0 2023-03-08 18:24:54,403 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0592, 3.8996, 3.2617, 3.4343, 4.1030, 3.6272, 2.9872, 4.4327], device='cuda:2'), covar=tensor([0.0987, 0.0458, 0.1028, 0.0814, 0.0668, 0.0761, 0.0916, 0.0476], device='cuda:2'), in_proj_covar=tensor([0.0199, 0.0215, 0.0225, 0.0195, 0.0270, 0.0235, 0.0198, 0.0283], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 18:25:06,250 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.84 vs. limit=2.0 2023-03-08 18:25:14,790 INFO [train2.py:809] (2/4) Epoch 19, batch 250, loss[ctc_loss=0.08075, att_loss=0.2477, loss=0.2143, over 17056.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.00865, over 52.00 utterances.], tot_loss[ctc_loss=0.0785, att_loss=0.239, loss=0.2069, over 2350831.73 frames. utt_duration=1244 frames, utt_pad_proportion=0.05272, over 7570.10 utterances.], batch size: 52, lr: 5.75e-03, grad_scale: 16.0 2023-03-08 18:25:37,776 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=71972.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:25:51,040 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=71980.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:26:05,053 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7838, 3.9795, 3.9468, 4.0148, 4.0230, 3.8336, 3.0772, 3.9295], device='cuda:2'), covar=tensor([0.0124, 0.0138, 0.0133, 0.0088, 0.0108, 0.0129, 0.0598, 0.0197], device='cuda:2'), in_proj_covar=tensor([0.0087, 0.0084, 0.0104, 0.0065, 0.0071, 0.0081, 0.0100, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 18:26:39,084 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2970, 2.7580, 3.5802, 2.7577, 3.4906, 4.4794, 4.3201, 3.0756], device='cuda:2'), covar=tensor([0.0423, 0.1783, 0.1105, 0.1499, 0.1072, 0.0814, 0.0535, 0.1336], device='cuda:2'), in_proj_covar=tensor([0.0240, 0.0241, 0.0273, 0.0216, 0.0261, 0.0353, 0.0253, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 18:26:40,275 INFO [train2.py:809] (2/4) Epoch 19, batch 300, loss[ctc_loss=0.08297, att_loss=0.2541, loss=0.2198, over 17303.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02346, over 59.00 utterances.], tot_loss[ctc_loss=0.07906, att_loss=0.2398, loss=0.2076, over 2557232.36 frames. 
utt_duration=1221 frames, utt_pad_proportion=0.05812, over 8390.20 utterances.], batch size: 59, lr: 5.75e-03, grad_scale: 16.0 2023-03-08 18:27:10,075 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.74 vs. limit=2.0 2023-03-08 18:27:12,141 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1517, 5.1481, 4.8472, 2.7486, 4.9579, 4.6109, 4.3243, 2.4204], device='cuda:2'), covar=tensor([0.0097, 0.0082, 0.0301, 0.1116, 0.0083, 0.0233, 0.0321, 0.1655], device='cuda:2'), in_proj_covar=tensor([0.0072, 0.0098, 0.0099, 0.0109, 0.0082, 0.0108, 0.0098, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 18:27:21,340 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8501, 6.0897, 5.5002, 5.7689, 5.6759, 5.2905, 5.4810, 5.1261], device='cuda:2'), covar=tensor([0.1228, 0.0994, 0.0954, 0.0936, 0.1030, 0.1472, 0.2250, 0.2734], device='cuda:2'), in_proj_covar=tensor([0.0500, 0.0584, 0.0440, 0.0442, 0.0417, 0.0456, 0.0591, 0.0515], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 18:27:22,665 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.261e+02 2.009e+02 2.404e+02 3.052e+02 6.487e+02, threshold=4.807e+02, percent-clipped=6.0 2023-03-08 18:27:23,048 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0140, 3.7569, 3.1684, 3.3927, 4.0556, 3.5445, 2.9528, 4.2869], device='cuda:2'), covar=tensor([0.0950, 0.0542, 0.1075, 0.0690, 0.0600, 0.0670, 0.0928, 0.0431], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0217, 0.0226, 0.0196, 0.0271, 0.0237, 0.0199, 0.0285], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 18:27:55,370 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.41 vs. limit=5.0 2023-03-08 18:27:59,502 INFO [train2.py:809] (2/4) Epoch 19, batch 350, loss[ctc_loss=0.09782, att_loss=0.2528, loss=0.2218, over 17431.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.0313, over 63.00 utterances.], tot_loss[ctc_loss=0.0788, att_loss=0.2395, loss=0.2074, over 2718499.93 frames. utt_duration=1223 frames, utt_pad_proportion=0.05811, over 8898.73 utterances.], batch size: 63, lr: 5.75e-03, grad_scale: 16.0 2023-03-08 18:28:09,206 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5592, 2.7406, 4.9979, 3.9771, 2.9753, 4.2029, 4.8552, 4.6965], device='cuda:2'), covar=tensor([0.0238, 0.1540, 0.0177, 0.0921, 0.1787, 0.0265, 0.0132, 0.0214], device='cuda:2'), in_proj_covar=tensor([0.0184, 0.0244, 0.0174, 0.0315, 0.0271, 0.0207, 0.0160, 0.0190], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 18:28:36,894 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1028, 5.3656, 5.6407, 5.4010, 5.5391, 6.0431, 5.2058, 6.1791], device='cuda:2'), covar=tensor([0.0635, 0.0740, 0.0786, 0.1280, 0.1700, 0.0837, 0.0700, 0.0578], device='cuda:2'), in_proj_covar=tensor([0.0823, 0.0484, 0.0567, 0.0626, 0.0837, 0.0588, 0.0464, 0.0576], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 18:29:01,468 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. 
limit=2.0 2023-03-08 18:29:17,681 INFO [train2.py:809] (2/4) Epoch 19, batch 400, loss[ctc_loss=0.08249, att_loss=0.2576, loss=0.2226, over 17062.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.009064, over 53.00 utterances.], tot_loss[ctc_loss=0.07805, att_loss=0.2386, loss=0.2065, over 2838587.42 frames. utt_duration=1237 frames, utt_pad_proportion=0.05712, over 9190.76 utterances.], batch size: 53, lr: 5.75e-03, grad_scale: 8.0 2023-03-08 18:29:42,565 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9634, 4.0156, 3.8512, 4.1665, 2.6253, 4.1251, 2.4329, 1.7504], device='cuda:2'), covar=tensor([0.0376, 0.0232, 0.0764, 0.0244, 0.1748, 0.0177, 0.1640, 0.1817], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0154, 0.0257, 0.0149, 0.0224, 0.0130, 0.0232, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 18:29:51,756 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=72129.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:30:02,014 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.245e+02 2.032e+02 2.343e+02 2.887e+02 1.413e+03, threshold=4.687e+02, percent-clipped=3.0 2023-03-08 18:30:30,550 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.18 vs. limit=5.0 2023-03-08 18:30:37,548 INFO [train2.py:809] (2/4) Epoch 19, batch 450, loss[ctc_loss=0.1087, att_loss=0.2554, loss=0.226, over 17456.00 frames. utt_duration=1013 frames, utt_pad_proportion=0.04413, over 69.00 utterances.], tot_loss[ctc_loss=0.07848, att_loss=0.2385, loss=0.2065, over 2938760.06 frames. utt_duration=1240 frames, utt_pad_proportion=0.05579, over 9493.41 utterances.], batch size: 69, lr: 5.74e-03, grad_scale: 8.0 2023-03-08 18:31:23,417 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-08 18:31:28,139 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=72190.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:31:56,578 INFO [train2.py:809] (2/4) Epoch 19, batch 500, loss[ctc_loss=0.05774, att_loss=0.2181, loss=0.1861, over 15972.00 frames. utt_duration=1560 frames, utt_pad_proportion=0.006, over 41.00 utterances.], tot_loss[ctc_loss=0.07839, att_loss=0.2383, loss=0.2063, over 3016206.96 frames. utt_duration=1265 frames, utt_pad_proportion=0.04895, over 9550.87 utterances.], batch size: 41, lr: 5.74e-03, grad_scale: 8.0 2023-03-08 18:32:39,381 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.209e+02 2.024e+02 2.496e+02 3.106e+02 1.224e+03, threshold=4.991e+02, percent-clipped=5.0 2023-03-08 18:33:14,205 INFO [train2.py:809] (2/4) Epoch 19, batch 550, loss[ctc_loss=0.08586, att_loss=0.2477, loss=0.2153, over 17389.00 frames. utt_duration=1010 frames, utt_pad_proportion=0.04838, over 69.00 utterances.], tot_loss[ctc_loss=0.07869, att_loss=0.2379, loss=0.206, over 3076170.02 frames. 
utt_duration=1265 frames, utt_pad_proportion=0.0474, over 9736.91 utterances.], batch size: 69, lr: 5.74e-03, grad_scale: 8.0 2023-03-08 18:33:37,393 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=72272.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:33:41,793 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=72275.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:33:52,521 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6983, 2.8895, 5.0822, 4.2512, 3.0314, 4.3151, 4.8545, 4.5849], device='cuda:2'), covar=tensor([0.0232, 0.1567, 0.0185, 0.0810, 0.1827, 0.0250, 0.0144, 0.0263], device='cuda:2'), in_proj_covar=tensor([0.0183, 0.0243, 0.0174, 0.0313, 0.0269, 0.0206, 0.0160, 0.0190], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 18:34:28,113 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-08 18:34:32,969 INFO [train2.py:809] (2/4) Epoch 19, batch 600, loss[ctc_loss=0.07815, att_loss=0.2119, loss=0.1851, over 15372.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01052, over 35.00 utterances.], tot_loss[ctc_loss=0.07928, att_loss=0.2377, loss=0.206, over 3111843.42 frames. utt_duration=1259 frames, utt_pad_proportion=0.05266, over 9900.40 utterances.], batch size: 35, lr: 5.74e-03, grad_scale: 8.0 2023-03-08 18:34:52,096 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=72320.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:35:02,566 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=72326.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:35:18,136 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.518e+02 1.966e+02 2.354e+02 3.021e+02 6.403e+02, threshold=4.709e+02, percent-clipped=3.0 2023-03-08 18:35:53,224 INFO [train2.py:809] (2/4) Epoch 19, batch 650, loss[ctc_loss=0.08014, att_loss=0.241, loss=0.2088, over 16542.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006332, over 45.00 utterances.], tot_loss[ctc_loss=0.07859, att_loss=0.2373, loss=0.2056, over 3150937.24 frames. utt_duration=1279 frames, utt_pad_proportion=0.04658, over 9865.64 utterances.], batch size: 45, lr: 5.74e-03, grad_scale: 8.0 2023-03-08 18:36:40,903 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=72387.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:37:01,826 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 18:37:13,706 INFO [train2.py:809] (2/4) Epoch 19, batch 700, loss[ctc_loss=0.07257, att_loss=0.227, loss=0.1961, over 16170.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.006796, over 41.00 utterances.], tot_loss[ctc_loss=0.07814, att_loss=0.2374, loss=0.2055, over 3183274.43 frames. 
utt_duration=1285 frames, utt_pad_proportion=0.04443, over 9924.14 utterances.], batch size: 41, lr: 5.73e-03, grad_scale: 8.0 2023-03-08 18:37:25,359 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4655, 2.8354, 3.3394, 4.4706, 4.0091, 4.0442, 3.0788, 2.3448], device='cuda:2'), covar=tensor([0.0640, 0.1993, 0.1005, 0.0585, 0.0721, 0.0442, 0.1290, 0.2165], device='cuda:2'), in_proj_covar=tensor([0.0174, 0.0208, 0.0183, 0.0208, 0.0212, 0.0167, 0.0195, 0.0184], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 18:37:57,759 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 2.010e+02 2.298e+02 2.722e+02 6.112e+02, threshold=4.595e+02, percent-clipped=4.0 2023-03-08 18:38:32,585 INFO [train2.py:809] (2/4) Epoch 19, batch 750, loss[ctc_loss=0.05578, att_loss=0.2071, loss=0.1768, over 15641.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008016, over 37.00 utterances.], tot_loss[ctc_loss=0.07747, att_loss=0.237, loss=0.2051, over 3208921.32 frames. utt_duration=1298 frames, utt_pad_proportion=0.03967, over 9901.22 utterances.], batch size: 37, lr: 5.73e-03, grad_scale: 8.0 2023-03-08 18:39:16,284 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=72485.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:39:46,964 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2023-03-08 18:39:51,880 INFO [train2.py:809] (2/4) Epoch 19, batch 800, loss[ctc_loss=0.09823, att_loss=0.2471, loss=0.2174, over 16136.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005664, over 42.00 utterances.], tot_loss[ctc_loss=0.07813, att_loss=0.2372, loss=0.2054, over 3212231.28 frames. utt_duration=1259 frames, utt_pad_proportion=0.05267, over 10221.12 utterances.], batch size: 42, lr: 5.73e-03, grad_scale: 8.0 2023-03-08 18:40:37,046 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.504e+02 2.045e+02 2.501e+02 3.023e+02 6.959e+02, threshold=5.001e+02, percent-clipped=5.0 2023-03-08 18:41:12,444 INFO [train2.py:809] (2/4) Epoch 19, batch 850, loss[ctc_loss=0.0832, att_loss=0.2315, loss=0.2018, over 15627.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.009294, over 37.00 utterances.], tot_loss[ctc_loss=0.07809, att_loss=0.2375, loss=0.2056, over 3217098.68 frames. utt_duration=1223 frames, utt_pad_proportion=0.06586, over 10538.92 utterances.], batch size: 37, lr: 5.73e-03, grad_scale: 8.0 2023-03-08 18:41:39,236 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=72575.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:41:41,602 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-08 18:42:10,168 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.18 vs. limit=5.0 2023-03-08 18:42:31,643 INFO [train2.py:809] (2/4) Epoch 19, batch 900, loss[ctc_loss=0.06281, att_loss=0.2368, loss=0.202, over 16323.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006597, over 45.00 utterances.], tot_loss[ctc_loss=0.07849, att_loss=0.2378, loss=0.206, over 3220651.84 frames. 
utt_duration=1170 frames, utt_pad_proportion=0.08087, over 11021.07 utterances.], batch size: 45, lr: 5.73e-03, grad_scale: 8.0 2023-03-08 18:42:51,228 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8008, 3.4979, 3.5135, 2.9982, 3.5923, 3.6127, 3.5872, 2.4181], device='cuda:2'), covar=tensor([0.1034, 0.1714, 0.2670, 0.4415, 0.1458, 0.3110, 0.0925, 0.4848], device='cuda:2'), in_proj_covar=tensor([0.0153, 0.0172, 0.0186, 0.0247, 0.0146, 0.0241, 0.0163, 0.0210], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 18:42:55,554 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=72623.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:43:16,193 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.377e+02 2.045e+02 2.502e+02 3.069e+02 5.173e+02, threshold=5.005e+02, percent-clipped=1.0 2023-03-08 18:43:51,338 INFO [train2.py:809] (2/4) Epoch 19, batch 950, loss[ctc_loss=0.06402, att_loss=0.2294, loss=0.1963, over 16488.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.005363, over 46.00 utterances.], tot_loss[ctc_loss=0.07797, att_loss=0.2378, loss=0.2059, over 3232705.77 frames. utt_duration=1169 frames, utt_pad_proportion=0.07839, over 11077.27 utterances.], batch size: 46, lr: 5.72e-03, grad_scale: 8.0 2023-03-08 18:44:06,774 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.16 vs. limit=5.0 2023-03-08 18:44:30,092 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=72682.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:44:52,471 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4260, 4.4617, 4.2425, 2.6341, 4.3521, 4.1730, 3.6007, 2.7144], device='cuda:2'), covar=tensor([0.0142, 0.0122, 0.0292, 0.1058, 0.0111, 0.0279, 0.0436, 0.1319], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0099, 0.0099, 0.0109, 0.0082, 0.0108, 0.0098, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 18:45:11,226 INFO [train2.py:809] (2/4) Epoch 19, batch 1000, loss[ctc_loss=0.06694, att_loss=0.2407, loss=0.206, over 17219.00 frames. utt_duration=873.1 frames, utt_pad_proportion=0.08288, over 79.00 utterances.], tot_loss[ctc_loss=0.07859, att_loss=0.2387, loss=0.2066, over 3252596.85 frames. utt_duration=1185 frames, utt_pad_proportion=0.07055, over 10996.79 utterances.], batch size: 79, lr: 5.72e-03, grad_scale: 8.0 2023-03-08 18:45:56,177 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.301e+02 2.016e+02 2.281e+02 2.809e+02 6.026e+02, threshold=4.562e+02, percent-clipped=5.0 2023-03-08 18:46:31,108 INFO [train2.py:809] (2/4) Epoch 19, batch 1050, loss[ctc_loss=0.06071, att_loss=0.2099, loss=0.1801, over 15793.00 frames. utt_duration=1664 frames, utt_pad_proportion=0.006646, over 38.00 utterances.], tot_loss[ctc_loss=0.07806, att_loss=0.2386, loss=0.2065, over 3258133.35 frames. utt_duration=1193 frames, utt_pad_proportion=0.06955, over 10935.50 utterances.], batch size: 38, lr: 5.72e-03, grad_scale: 8.0 2023-03-08 18:46:39,765 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.71 vs. limit=2.0 2023-03-08 18:46:40,156 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. 
limit=2.0 2023-03-08 18:47:13,807 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=72785.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:47:50,613 INFO [train2.py:809] (2/4) Epoch 19, batch 1100, loss[ctc_loss=0.0813, att_loss=0.2496, loss=0.2159, over 17033.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.01006, over 52.00 utterances.], tot_loss[ctc_loss=0.07831, att_loss=0.2383, loss=0.2063, over 3257530.32 frames. utt_duration=1202 frames, utt_pad_proportion=0.06793, over 10855.21 utterances.], batch size: 52, lr: 5.72e-03, grad_scale: 8.0 2023-03-08 18:47:57,192 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7605, 3.2601, 3.9150, 3.3178, 3.8025, 4.7638, 4.6182, 3.4055], device='cuda:2'), covar=tensor([0.0311, 0.1464, 0.1038, 0.1369, 0.0913, 0.0930, 0.0570, 0.1282], device='cuda:2'), in_proj_covar=tensor([0.0241, 0.0241, 0.0274, 0.0216, 0.0259, 0.0354, 0.0252, 0.0230], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 18:48:30,782 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=72833.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:48:35,937 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.339e+02 1.909e+02 2.229e+02 2.585e+02 5.838e+02, threshold=4.459e+02, percent-clipped=1.0 2023-03-08 18:48:48,594 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3492, 2.9741, 3.6201, 4.5606, 3.9063, 3.9249, 2.8124, 2.3505], device='cuda:2'), covar=tensor([0.0761, 0.1928, 0.0778, 0.0463, 0.0931, 0.0460, 0.1544, 0.2085], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0211, 0.0185, 0.0209, 0.0216, 0.0170, 0.0197, 0.0186], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 18:49:10,569 INFO [train2.py:809] (2/4) Epoch 19, batch 1150, loss[ctc_loss=0.1336, att_loss=0.2722, loss=0.2445, over 14162.00 frames. utt_duration=389.6 frames, utt_pad_proportion=0.3213, over 146.00 utterances.], tot_loss[ctc_loss=0.07842, att_loss=0.2384, loss=0.2064, over 3260858.40 frames. utt_duration=1198 frames, utt_pad_proportion=0.06887, over 10901.36 utterances.], batch size: 146, lr: 5.72e-03, grad_scale: 8.0 2023-03-08 18:49:10,886 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2707, 2.7887, 3.2599, 4.4303, 3.8237, 3.8176, 2.7297, 2.3394], device='cuda:2'), covar=tensor([0.0719, 0.1932, 0.0920, 0.0466, 0.0892, 0.0475, 0.1589, 0.2111], device='cuda:2'), in_proj_covar=tensor([0.0175, 0.0210, 0.0185, 0.0208, 0.0214, 0.0169, 0.0197, 0.0185], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 18:49:20,067 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2023-03-08 18:50:08,390 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=72894.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:50:30,465 INFO [train2.py:809] (2/4) Epoch 19, batch 1200, loss[ctc_loss=0.06789, att_loss=0.2257, loss=0.1941, over 16025.00 frames. utt_duration=1604 frames, utt_pad_proportion=0.006827, over 40.00 utterances.], tot_loss[ctc_loss=0.07849, att_loss=0.2383, loss=0.2063, over 3262804.19 frames. 
utt_duration=1204 frames, utt_pad_proportion=0.06821, over 10852.09 utterances.], batch size: 40, lr: 5.71e-03, grad_scale: 8.0 2023-03-08 18:50:49,209 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7808, 3.2753, 3.8678, 3.2140, 3.7268, 4.8091, 4.6792, 3.6363], device='cuda:2'), covar=tensor([0.0315, 0.1440, 0.1054, 0.1374, 0.0972, 0.0829, 0.0484, 0.1106], device='cuda:2'), in_proj_covar=tensor([0.0241, 0.0241, 0.0274, 0.0215, 0.0259, 0.0355, 0.0252, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 18:51:15,277 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.411e+02 1.921e+02 2.301e+02 2.976e+02 5.020e+02, threshold=4.601e+02, percent-clipped=3.0 2023-03-08 18:51:41,016 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.56 vs. limit=5.0 2023-03-08 18:51:45,262 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=72955.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:51:50,076 INFO [train2.py:809] (2/4) Epoch 19, batch 1250, loss[ctc_loss=0.08576, att_loss=0.2513, loss=0.2182, over 17062.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.009166, over 53.00 utterances.], tot_loss[ctc_loss=0.07814, att_loss=0.2383, loss=0.2062, over 3264808.84 frames. utt_duration=1198 frames, utt_pad_proportion=0.06805, over 10914.93 utterances.], batch size: 53, lr: 5.71e-03, grad_scale: 8.0 2023-03-08 18:51:59,975 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=72964.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 18:52:28,323 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=72982.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:53:08,941 INFO [train2.py:809] (2/4) Epoch 19, batch 1300, loss[ctc_loss=0.08901, att_loss=0.2595, loss=0.2254, over 17107.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.0151, over 56.00 utterances.], tot_loss[ctc_loss=0.0777, att_loss=0.2382, loss=0.2061, over 3267044.02 frames. utt_duration=1214 frames, utt_pad_proportion=0.06296, over 10778.61 utterances.], batch size: 56, lr: 5.71e-03, grad_scale: 8.0 2023-03-08 18:53:36,058 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5860, 3.6474, 3.3395, 3.7349, 2.5652, 3.7126, 2.7355, 2.1304], device='cuda:2'), covar=tensor([0.0502, 0.0320, 0.0881, 0.0299, 0.1616, 0.0229, 0.1316, 0.1571], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0151, 0.0253, 0.0148, 0.0220, 0.0129, 0.0227, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 18:53:37,536 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=73025.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 18:53:45,281 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=73030.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:53:55,191 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.296e+02 2.017e+02 2.532e+02 3.108e+02 5.795e+02, threshold=5.064e+02, percent-clipped=4.0 2023-03-08 18:54:30,468 INFO [train2.py:809] (2/4) Epoch 19, batch 1350, loss[ctc_loss=0.08088, att_loss=0.2481, loss=0.2147, over 17013.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007604, over 51.00 utterances.], tot_loss[ctc_loss=0.07793, att_loss=0.2385, loss=0.2064, over 3263797.34 frames. 
utt_duration=1208 frames, utt_pad_proportion=0.0655, over 10824.64 utterances.], batch size: 51, lr: 5.71e-03, grad_scale: 8.0 2023-03-08 18:55:35,969 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4299, 2.8359, 3.6388, 2.7133, 3.4848, 4.5981, 4.3751, 3.1818], device='cuda:2'), covar=tensor([0.0377, 0.1865, 0.1159, 0.1616, 0.1137, 0.0767, 0.0490, 0.1425], device='cuda:2'), in_proj_covar=tensor([0.0238, 0.0239, 0.0271, 0.0214, 0.0257, 0.0352, 0.0250, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 18:55:36,024 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7208, 2.2456, 2.2937, 2.1896, 2.8765, 2.5997, 2.3045, 2.9211], device='cuda:2'), covar=tensor([0.1453, 0.3199, 0.2837, 0.3330, 0.1711, 0.1067, 0.2744, 0.1230], device='cuda:2'), in_proj_covar=tensor([0.0107, 0.0114, 0.0112, 0.0101, 0.0111, 0.0097, 0.0121, 0.0089], device='cuda:2'), out_proj_covar=tensor([7.9806e-05, 8.7756e-05, 8.8012e-05, 7.8177e-05, 8.1987e-05, 7.7530e-05, 8.9686e-05, 7.1313e-05], device='cuda:2') 2023-03-08 18:55:50,786 INFO [train2.py:809] (2/4) Epoch 19, batch 1400, loss[ctc_loss=0.1009, att_loss=0.2569, loss=0.2257, over 16954.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008079, over 50.00 utterances.], tot_loss[ctc_loss=0.07782, att_loss=0.2383, loss=0.2062, over 3274192.05 frames. utt_duration=1223 frames, utt_pad_proportion=0.05985, over 10724.23 utterances.], batch size: 50, lr: 5.71e-03, grad_scale: 8.0 2023-03-08 18:56:16,619 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2023-03-08 18:56:35,399 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.220e+02 2.120e+02 2.397e+02 2.926e+02 6.281e+02, threshold=4.794e+02, percent-clipped=3.0 2023-03-08 18:57:10,511 INFO [train2.py:809] (2/4) Epoch 19, batch 1450, loss[ctc_loss=0.04897, att_loss=0.2157, loss=0.1824, over 16283.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.006462, over 43.00 utterances.], tot_loss[ctc_loss=0.07718, att_loss=0.238, loss=0.2058, over 3281993.71 frames. utt_duration=1228 frames, utt_pad_proportion=0.05656, over 10701.67 utterances.], batch size: 43, lr: 5.70e-03, grad_scale: 8.0 2023-03-08 18:58:06,350 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9302, 5.1943, 5.1413, 5.0771, 5.2099, 5.1318, 4.8986, 4.6430], device='cuda:2'), covar=tensor([0.0958, 0.0489, 0.0310, 0.0527, 0.0294, 0.0315, 0.0373, 0.0325], device='cuda:2'), in_proj_covar=tensor([0.0509, 0.0344, 0.0327, 0.0344, 0.0403, 0.0416, 0.0343, 0.0380], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 18:58:15,581 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-03-08 18:58:19,326 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-03-08 18:58:30,638 INFO [train2.py:809] (2/4) Epoch 19, batch 1500, loss[ctc_loss=0.05202, att_loss=0.2122, loss=0.1802, over 15390.00 frames. utt_duration=1760 frames, utt_pad_proportion=0.008961, over 35.00 utterances.], tot_loss[ctc_loss=0.07743, att_loss=0.2382, loss=0.2061, over 3282361.22 frames. 
utt_duration=1218 frames, utt_pad_proportion=0.05885, over 10790.36 utterances.], batch size: 35, lr: 5.70e-03, grad_scale: 8.0 2023-03-08 18:58:37,401 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0071, 4.4673, 4.4798, 4.7898, 2.6239, 4.5030, 2.8764, 1.6469], device='cuda:2'), covar=tensor([0.0413, 0.0240, 0.0670, 0.0178, 0.1724, 0.0175, 0.1381, 0.1842], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0150, 0.0250, 0.0147, 0.0217, 0.0128, 0.0224, 0.0199], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 18:59:00,001 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9444, 4.9512, 4.8788, 2.1657, 1.8894, 2.9282, 2.3872, 3.7548], device='cuda:2'), covar=tensor([0.0830, 0.0244, 0.0235, 0.5237, 0.5823, 0.2339, 0.3395, 0.1751], device='cuda:2'), in_proj_covar=tensor([0.0352, 0.0266, 0.0264, 0.0242, 0.0345, 0.0336, 0.0251, 0.0362], device='cuda:2'), out_proj_covar=tensor([1.4984e-04, 9.8668e-05, 1.1202e-04, 1.0425e-04, 1.4512e-04, 1.3182e-04, 1.0089e-04, 1.4811e-04], device='cuda:2') 2023-03-08 18:59:15,228 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.382e+02 2.048e+02 2.505e+02 3.116e+02 1.439e+03, threshold=5.009e+02, percent-clipped=5.0 2023-03-08 18:59:36,929 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=73250.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 18:59:49,904 INFO [train2.py:809] (2/4) Epoch 19, batch 1550, loss[ctc_loss=0.05449, att_loss=0.208, loss=0.1773, over 15659.00 frames. utt_duration=1695 frames, utt_pad_proportion=0.007896, over 37.00 utterances.], tot_loss[ctc_loss=0.07808, att_loss=0.2384, loss=0.2064, over 3280353.30 frames. utt_duration=1221 frames, utt_pad_proportion=0.06098, over 10763.84 utterances.], batch size: 37, lr: 5.70e-03, grad_scale: 8.0 2023-03-08 19:00:45,881 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=73293.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:01:09,921 INFO [train2.py:809] (2/4) Epoch 19, batch 1600, loss[ctc_loss=0.06614, att_loss=0.205, loss=0.1773, over 14481.00 frames. utt_duration=1812 frames, utt_pad_proportion=0.03355, over 32.00 utterances.], tot_loss[ctc_loss=0.0769, att_loss=0.2371, loss=0.205, over 3274670.14 frames. utt_duration=1244 frames, utt_pad_proportion=0.05602, over 10540.96 utterances.], batch size: 32, lr: 5.70e-03, grad_scale: 8.0 2023-03-08 19:01:18,510 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. 
limit=2.0 2023-03-08 19:01:28,645 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=73320.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 19:01:45,338 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8434, 2.3256, 2.6190, 2.1990, 2.9583, 2.6376, 2.4781, 3.1145], device='cuda:2'), covar=tensor([0.1344, 0.2879, 0.2112, 0.1972, 0.1653, 0.1066, 0.2197, 0.1022], device='cuda:2'), in_proj_covar=tensor([0.0107, 0.0113, 0.0112, 0.0102, 0.0111, 0.0098, 0.0120, 0.0089], device='cuda:2'), out_proj_covar=tensor([7.9710e-05, 8.7031e-05, 8.7966e-05, 7.8186e-05, 8.1832e-05, 7.7707e-05, 8.9221e-05, 7.1439e-05], device='cuda:2') 2023-03-08 19:01:54,272 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.313e+02 2.054e+02 2.356e+02 2.874e+02 5.257e+02, threshold=4.713e+02, percent-clipped=1.0 2023-03-08 19:02:23,079 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=73354.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:02:29,412 INFO [train2.py:809] (2/4) Epoch 19, batch 1650, loss[ctc_loss=0.0571, att_loss=0.2166, loss=0.1847, over 15639.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008439, over 37.00 utterances.], tot_loss[ctc_loss=0.0772, att_loss=0.2379, loss=0.2058, over 3282693.33 frames. utt_duration=1256 frames, utt_pad_proportion=0.05165, over 10464.01 utterances.], batch size: 37, lr: 5.70e-03, grad_scale: 8.0 2023-03-08 19:02:56,830 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.75 vs. limit=2.0 2023-03-08 19:03:31,592 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8565, 4.8972, 4.5947, 2.7868, 4.6415, 4.4732, 3.9714, 2.4814], device='cuda:2'), covar=tensor([0.0125, 0.0098, 0.0274, 0.1036, 0.0103, 0.0231, 0.0378, 0.1542], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0100, 0.0100, 0.0110, 0.0083, 0.0110, 0.0098, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 19:03:48,203 INFO [train2.py:809] (2/4) Epoch 19, batch 1700, loss[ctc_loss=0.07657, att_loss=0.2407, loss=0.2079, over 17020.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007671, over 51.00 utterances.], tot_loss[ctc_loss=0.07757, att_loss=0.2384, loss=0.2062, over 3285922.58 frames. utt_duration=1260 frames, utt_pad_proportion=0.04921, over 10444.30 utterances.], batch size: 51, lr: 5.69e-03, grad_scale: 8.0 2023-03-08 19:03:54,754 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9656, 4.0030, 3.9546, 3.8656, 4.4282, 3.9284, 3.8015, 2.5080], device='cuda:2'), covar=tensor([0.0298, 0.0533, 0.0439, 0.0473, 0.0944, 0.0297, 0.0460, 0.1840], device='cuda:2'), in_proj_covar=tensor([0.0147, 0.0171, 0.0173, 0.0188, 0.0359, 0.0144, 0.0161, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:04:32,146 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.279e+02 2.058e+02 2.463e+02 2.943e+02 5.957e+02, threshold=4.927e+02, percent-clipped=3.0 2023-03-08 19:05:06,957 INFO [train2.py:809] (2/4) Epoch 19, batch 1750, loss[ctc_loss=0.0979, att_loss=0.256, loss=0.2244, over 16744.00 frames. utt_duration=1397 frames, utt_pad_proportion=0.006959, over 48.00 utterances.], tot_loss[ctc_loss=0.07824, att_loss=0.2388, loss=0.2067, over 3287205.42 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.05346, over 10661.42 utterances.], batch size: 48, lr: 5.69e-03, grad_scale: 8.0 2023-03-08 19:05:16,379 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5710, 5.0432, 4.1128, 5.1521, 4.5092, 4.7989, 5.0947, 4.9414], device='cuda:2'), covar=tensor([0.0747, 0.0331, 0.1391, 0.0357, 0.0447, 0.0331, 0.0329, 0.0244], device='cuda:2'), in_proj_covar=tensor([0.0378, 0.0309, 0.0360, 0.0331, 0.0313, 0.0232, 0.0292, 0.0277], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 19:06:25,915 INFO [train2.py:809] (2/4) Epoch 19, batch 1800, loss[ctc_loss=0.0845, att_loss=0.2406, loss=0.2094, over 17375.00 frames. utt_duration=881.2 frames, utt_pad_proportion=0.07729, over 79.00 utterances.], tot_loss[ctc_loss=0.07775, att_loss=0.238, loss=0.2059, over 3278803.91 frames. utt_duration=1238 frames, utt_pad_proportion=0.05309, over 10608.05 utterances.], batch size: 79, lr: 5.69e-03, grad_scale: 8.0 2023-03-08 19:07:10,632 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.911e+02 2.242e+02 2.883e+02 9.834e+02, threshold=4.485e+02, percent-clipped=3.0 2023-03-08 19:07:13,345 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-08 19:07:33,093 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=73550.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:07:45,873 INFO [train2.py:809] (2/4) Epoch 19, batch 1850, loss[ctc_loss=0.07067, att_loss=0.2459, loss=0.2109, over 16616.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.006006, over 47.00 utterances.], tot_loss[ctc_loss=0.07746, att_loss=0.2377, loss=0.2057, over 3282811.70 frames. utt_duration=1250 frames, utt_pad_proportion=0.04839, over 10518.45 utterances.], batch size: 47, lr: 5.69e-03, grad_scale: 8.0 2023-03-08 19:08:11,005 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.91 vs. limit=5.0 2023-03-08 19:08:36,597 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=73590.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:08:47,948 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7991, 5.1264, 5.3451, 5.2408, 5.2561, 5.7115, 5.0803, 5.8001], device='cuda:2'), covar=tensor([0.0671, 0.0724, 0.0747, 0.1149, 0.1765, 0.0897, 0.0795, 0.0678], device='cuda:2'), in_proj_covar=tensor([0.0848, 0.0496, 0.0581, 0.0641, 0.0857, 0.0604, 0.0479, 0.0590], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 19:08:49,384 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=73598.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:09:02,308 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5283, 4.6137, 4.5821, 4.7268, 5.2122, 4.3634, 4.5098, 2.6559], device='cuda:2'), covar=tensor([0.0224, 0.0308, 0.0291, 0.0271, 0.0817, 0.0259, 0.0333, 0.1833], device='cuda:2'), in_proj_covar=tensor([0.0148, 0.0171, 0.0174, 0.0189, 0.0359, 0.0144, 0.0161, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:09:04,975 INFO [train2.py:809] (2/4) Epoch 19, batch 1900, loss[ctc_loss=0.06076, att_loss=0.2053, loss=0.1764, over 15508.00 frames. 
utt_duration=1724 frames, utt_pad_proportion=0.008434, over 36.00 utterances.], tot_loss[ctc_loss=0.07788, att_loss=0.2378, loss=0.2058, over 3283198.50 frames. utt_duration=1255 frames, utt_pad_proportion=0.04774, over 10478.19 utterances.], batch size: 36, lr: 5.69e-03, grad_scale: 8.0 2023-03-08 19:09:21,617 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-08 19:09:23,761 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=73620.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 19:09:48,771 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.371e+02 1.956e+02 2.391e+02 2.954e+02 5.478e+02, threshold=4.781e+02, percent-clipped=4.0 2023-03-08 19:10:08,348 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8538, 5.1728, 5.3947, 5.1619, 5.3025, 5.7399, 5.1616, 5.8903], device='cuda:2'), covar=tensor([0.0716, 0.0781, 0.0797, 0.1416, 0.1912, 0.1071, 0.0713, 0.0695], device='cuda:2'), in_proj_covar=tensor([0.0846, 0.0494, 0.0578, 0.0640, 0.0853, 0.0604, 0.0477, 0.0588], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 19:10:10,669 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=73649.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:10:13,996 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=73651.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:10:14,637 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-08 19:10:24,375 INFO [train2.py:809] (2/4) Epoch 19, batch 1950, loss[ctc_loss=0.06265, att_loss=0.2331, loss=0.199, over 16702.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005576, over 46.00 utterances.], tot_loss[ctc_loss=0.07762, att_loss=0.2379, loss=0.2058, over 3277420.83 frames. utt_duration=1254 frames, utt_pad_proportion=0.04973, over 10465.16 utterances.], batch size: 46, lr: 5.68e-03, grad_scale: 8.0 2023-03-08 19:10:40,462 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=73668.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 19:11:22,803 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7718, 2.3736, 2.5161, 3.3135, 3.1847, 3.2560, 2.5566, 2.1686], device='cuda:2'), covar=tensor([0.0712, 0.1899, 0.1051, 0.0786, 0.0820, 0.0490, 0.1343, 0.1845], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0215, 0.0188, 0.0214, 0.0217, 0.0174, 0.0200, 0.0187], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:11:43,429 INFO [train2.py:809] (2/4) Epoch 19, batch 2000, loss[ctc_loss=0.1103, att_loss=0.2672, loss=0.2358, over 16752.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.007191, over 48.00 utterances.], tot_loss[ctc_loss=0.07901, att_loss=0.239, loss=0.207, over 3277250.73 frames. 
utt_duration=1230 frames, utt_pad_proportion=0.05565, over 10671.20 utterances.], batch size: 48, lr: 5.68e-03, grad_scale: 8.0 2023-03-08 19:12:22,465 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=73732.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:12:28,296 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.224e+02 2.150e+02 2.565e+02 3.312e+02 1.867e+03, threshold=5.130e+02, percent-clipped=8.0 2023-03-08 19:13:03,765 INFO [train2.py:809] (2/4) Epoch 19, batch 2050, loss[ctc_loss=0.08544, att_loss=0.2551, loss=0.2211, over 16472.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.007076, over 46.00 utterances.], tot_loss[ctc_loss=0.07856, att_loss=0.2385, loss=0.2065, over 3272203.38 frames. utt_duration=1253 frames, utt_pad_proportion=0.05187, over 10461.85 utterances.], batch size: 46, lr: 5.68e-03, grad_scale: 8.0 2023-03-08 19:13:10,024 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.19 vs. limit=5.0 2023-03-08 19:13:59,671 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=73793.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:14:06,280 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8629, 3.6111, 3.1189, 3.0968, 3.8330, 3.4815, 2.6801, 4.0635], device='cuda:2'), covar=tensor([0.1038, 0.0513, 0.1097, 0.0753, 0.0737, 0.0799, 0.1004, 0.0460], device='cuda:2'), in_proj_covar=tensor([0.0198, 0.0211, 0.0220, 0.0192, 0.0268, 0.0234, 0.0197, 0.0280], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 19:14:13,838 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.32 vs. limit=5.0 2023-03-08 19:14:24,522 INFO [train2.py:809] (2/4) Epoch 19, batch 2100, loss[ctc_loss=0.06136, att_loss=0.2167, loss=0.1856, over 15664.00 frames. utt_duration=1695 frames, utt_pad_proportion=0.007722, over 37.00 utterances.], tot_loss[ctc_loss=0.07862, att_loss=0.2385, loss=0.2065, over 3266712.30 frames. utt_duration=1232 frames, utt_pad_proportion=0.05921, over 10621.58 utterances.], batch size: 37, lr: 5.68e-03, grad_scale: 8.0 2023-03-08 19:14:26,695 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0094, 4.3992, 4.3386, 4.4911, 2.7943, 4.3826, 2.8226, 1.7758], device='cuda:2'), covar=tensor([0.0380, 0.0206, 0.0651, 0.0261, 0.1540, 0.0185, 0.1303, 0.1767], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0152, 0.0257, 0.0149, 0.0221, 0.0132, 0.0228, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:14:49,104 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9785, 3.6392, 3.6918, 3.2521, 3.7828, 3.7694, 3.7769, 2.7690], device='cuda:2'), covar=tensor([0.0964, 0.1464, 0.2159, 0.3791, 0.1036, 0.3330, 0.0791, 0.4053], device='cuda:2'), in_proj_covar=tensor([0.0157, 0.0175, 0.0186, 0.0249, 0.0150, 0.0247, 0.0164, 0.0211], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:15:03,370 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. 
limit=2.0 2023-03-08 19:15:08,604 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.215e+02 2.020e+02 2.356e+02 2.940e+02 5.527e+02, threshold=4.712e+02, percent-clipped=4.0 2023-03-08 19:15:32,906 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7645, 2.0737, 2.3965, 2.1670, 2.8359, 2.6359, 2.5596, 2.9951], device='cuda:2'), covar=tensor([0.1858, 0.4034, 0.2736, 0.2085, 0.1853, 0.1194, 0.2408, 0.1240], device='cuda:2'), in_proj_covar=tensor([0.0107, 0.0114, 0.0112, 0.0102, 0.0111, 0.0097, 0.0120, 0.0089], device='cuda:2'), out_proj_covar=tensor([7.9820e-05, 8.7479e-05, 8.7901e-05, 7.8414e-05, 8.2133e-05, 7.7330e-05, 8.8854e-05, 7.1165e-05], device='cuda:2') 2023-03-08 19:15:43,844 INFO [train2.py:809] (2/4) Epoch 19, batch 2150, loss[ctc_loss=0.0952, att_loss=0.2562, loss=0.224, over 17289.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01193, over 55.00 utterances.], tot_loss[ctc_loss=0.07848, att_loss=0.2385, loss=0.2065, over 3270079.87 frames. utt_duration=1231 frames, utt_pad_proportion=0.05919, over 10635.22 utterances.], batch size: 55, lr: 5.68e-03, grad_scale: 8.0 2023-03-08 19:15:49,665 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-03-08 19:17:03,551 INFO [train2.py:809] (2/4) Epoch 19, batch 2200, loss[ctc_loss=0.066, att_loss=0.2086, loss=0.18, over 14532.00 frames. utt_duration=1818 frames, utt_pad_proportion=0.03803, over 32.00 utterances.], tot_loss[ctc_loss=0.07829, att_loss=0.2375, loss=0.2057, over 3249627.62 frames. utt_duration=1222 frames, utt_pad_proportion=0.06668, over 10647.39 utterances.], batch size: 32, lr: 5.68e-03, grad_scale: 8.0 2023-03-08 19:17:43,903 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-08 19:17:47,492 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.450e+02 2.057e+02 2.535e+02 3.122e+02 7.194e+02, threshold=5.070e+02, percent-clipped=8.0 2023-03-08 19:18:03,632 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.81 vs. limit=2.0 2023-03-08 19:18:04,651 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=73946.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:18:09,410 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=73949.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:18:10,186 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.16 vs. limit=5.0 2023-03-08 19:18:23,574 INFO [train2.py:809] (2/4) Epoch 19, batch 2250, loss[ctc_loss=0.07413, att_loss=0.2413, loss=0.2079, over 16478.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.006489, over 46.00 utterances.], tot_loss[ctc_loss=0.07743, att_loss=0.237, loss=0.2051, over 3254881.04 frames. utt_duration=1239 frames, utt_pad_proportion=0.06147, over 10519.52 utterances.], batch size: 46, lr: 5.67e-03, grad_scale: 8.0 2023-03-08 19:18:41,628 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1278, 4.4193, 4.1066, 4.6602, 2.4627, 4.4691, 2.4857, 1.6086], device='cuda:2'), covar=tensor([0.0358, 0.0208, 0.0883, 0.0186, 0.2077, 0.0192, 0.1860, 0.2098], device='cuda:2'), in_proj_covar=tensor([0.0183, 0.0154, 0.0260, 0.0150, 0.0222, 0.0133, 0.0232, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:18:45,220 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.02 vs. 
limit=5.0 2023-03-08 19:19:15,893 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9504, 3.6390, 3.9227, 3.7518, 4.1246, 4.9034, 4.7282, 3.6810], device='cuda:2'), covar=tensor([0.0224, 0.1134, 0.1007, 0.0966, 0.0721, 0.0609, 0.0430, 0.1012], device='cuda:2'), in_proj_covar=tensor([0.0241, 0.0239, 0.0274, 0.0213, 0.0257, 0.0355, 0.0251, 0.0230], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 19:19:25,461 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=73997.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:19:47,859 INFO [train2.py:809] (2/4) Epoch 19, batch 2300, loss[ctc_loss=0.08768, att_loss=0.2301, loss=0.2016, over 16018.00 frames. utt_duration=1604 frames, utt_pad_proportion=0.007074, over 40.00 utterances.], tot_loss[ctc_loss=0.07844, att_loss=0.2382, loss=0.2062, over 3264562.70 frames. utt_duration=1234 frames, utt_pad_proportion=0.05992, over 10590.97 utterances.], batch size: 40, lr: 5.67e-03, grad_scale: 8.0 2023-03-08 19:20:22,386 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1604, 4.4719, 4.3331, 4.7331, 3.0992, 4.5278, 2.7330, 1.8750], device='cuda:2'), covar=tensor([0.0366, 0.0271, 0.0693, 0.0188, 0.1321, 0.0197, 0.1452, 0.1691], device='cuda:2'), in_proj_covar=tensor([0.0183, 0.0154, 0.0259, 0.0150, 0.0221, 0.0133, 0.0231, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:20:30,626 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.391e+02 2.198e+02 2.569e+02 3.135e+02 6.426e+02, threshold=5.138e+02, percent-clipped=5.0 2023-03-08 19:21:05,897 INFO [train2.py:809] (2/4) Epoch 19, batch 2350, loss[ctc_loss=0.06833, att_loss=0.2268, loss=0.1951, over 15851.00 frames. utt_duration=1627 frames, utt_pad_proportion=0.009489, over 39.00 utterances.], tot_loss[ctc_loss=0.07808, att_loss=0.2375, loss=0.2056, over 3258527.31 frames. utt_duration=1246 frames, utt_pad_proportion=0.05905, over 10473.84 utterances.], batch size: 39, lr: 5.67e-03, grad_scale: 8.0 2023-03-08 19:21:15,269 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74064.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:21:21,926 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0945, 5.1196, 4.8161, 2.6446, 4.8695, 4.6595, 4.3031, 2.9284], device='cuda:2'), covar=tensor([0.0120, 0.0095, 0.0274, 0.1174, 0.0104, 0.0219, 0.0334, 0.1348], device='cuda:2'), in_proj_covar=tensor([0.0074, 0.0101, 0.0101, 0.0111, 0.0084, 0.0111, 0.0099, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 19:21:52,209 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=74088.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:22:24,337 INFO [train2.py:809] (2/4) Epoch 19, batch 2400, loss[ctc_loss=0.07827, att_loss=0.2474, loss=0.2135, over 17347.00 frames. utt_duration=1103 frames, utt_pad_proportion=0.03682, over 63.00 utterances.], tot_loss[ctc_loss=0.07853, att_loss=0.2382, loss=0.2063, over 3259607.16 frames. 
utt_duration=1241 frames, utt_pad_proportion=0.05963, over 10518.49 utterances.], batch size: 63, lr: 5.67e-03, grad_scale: 16.0 2023-03-08 19:22:51,362 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=74125.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:23:07,412 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.383e+02 2.133e+02 2.520e+02 3.216e+02 6.079e+02, threshold=5.039e+02, percent-clipped=1.0 2023-03-08 19:23:43,615 INFO [train2.py:809] (2/4) Epoch 19, batch 2450, loss[ctc_loss=0.07672, att_loss=0.2284, loss=0.1981, over 16748.00 frames. utt_duration=1397 frames, utt_pad_proportion=0.007487, over 48.00 utterances.], tot_loss[ctc_loss=0.07837, att_loss=0.238, loss=0.206, over 3259037.10 frames. utt_duration=1250 frames, utt_pad_proportion=0.05796, over 10445.63 utterances.], batch size: 48, lr: 5.67e-03, grad_scale: 16.0 2023-03-08 19:25:02,842 INFO [train2.py:809] (2/4) Epoch 19, batch 2500, loss[ctc_loss=0.05819, att_loss=0.2378, loss=0.2019, over 17425.00 frames. utt_duration=883.8 frames, utt_pad_proportion=0.07454, over 79.00 utterances.], tot_loss[ctc_loss=0.07851, att_loss=0.2382, loss=0.2063, over 3257218.81 frames. utt_duration=1198 frames, utt_pad_proportion=0.07188, over 10893.73 utterances.], batch size: 79, lr: 5.66e-03, grad_scale: 16.0 2023-03-08 19:25:47,409 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.253e+02 2.096e+02 2.486e+02 2.922e+02 5.921e+02, threshold=4.971e+02, percent-clipped=2.0 2023-03-08 19:25:54,841 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74240.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:26:03,999 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=74246.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:26:10,779 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6824, 4.9899, 4.5967, 5.0724, 4.4074, 4.7418, 5.1543, 4.9268], device='cuda:2'), covar=tensor([0.0647, 0.0302, 0.0787, 0.0354, 0.0477, 0.0271, 0.0234, 0.0216], device='cuda:2'), in_proj_covar=tensor([0.0373, 0.0307, 0.0355, 0.0328, 0.0310, 0.0230, 0.0293, 0.0274], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 19:26:23,586 INFO [train2.py:809] (2/4) Epoch 19, batch 2550, loss[ctc_loss=0.07317, att_loss=0.2164, loss=0.1878, over 14507.00 frames. utt_duration=1815 frames, utt_pad_proportion=0.02834, over 32.00 utterances.], tot_loss[ctc_loss=0.07813, att_loss=0.2381, loss=0.2061, over 3255491.87 frames. 
utt_duration=1185 frames, utt_pad_proportion=0.07519, over 11004.72 utterances.], batch size: 32, lr: 5.66e-03, grad_scale: 16.0 2023-03-08 19:27:20,421 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=74294.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:27:32,390 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=74301.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:27:38,513 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1015, 5.4888, 4.8993, 5.5191, 4.9021, 5.0718, 5.5598, 5.3558], device='cuda:2'), covar=tensor([0.0583, 0.0282, 0.0808, 0.0315, 0.0403, 0.0229, 0.0242, 0.0203], device='cuda:2'), in_proj_covar=tensor([0.0373, 0.0306, 0.0354, 0.0328, 0.0309, 0.0230, 0.0292, 0.0273], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 19:27:43,384 INFO [train2.py:809] (2/4) Epoch 19, batch 2600, loss[ctc_loss=0.05997, att_loss=0.2108, loss=0.1806, over 15467.00 frames. utt_duration=1720 frames, utt_pad_proportion=0.01094, over 36.00 utterances.], tot_loss[ctc_loss=0.07764, att_loss=0.2375, loss=0.2055, over 3264180.09 frames. utt_duration=1211 frames, utt_pad_proportion=0.06575, over 10795.75 utterances.], batch size: 36, lr: 5.66e-03, grad_scale: 8.0 2023-03-08 19:28:29,336 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.314e+02 1.890e+02 2.336e+02 2.791e+02 4.929e+02, threshold=4.673e+02, percent-clipped=0.0 2023-03-08 19:28:34,909 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7117, 5.0234, 4.5681, 5.0969, 4.4640, 4.6838, 5.1765, 4.9491], device='cuda:2'), covar=tensor([0.0657, 0.0296, 0.0825, 0.0298, 0.0450, 0.0332, 0.0227, 0.0210], device='cuda:2'), in_proj_covar=tensor([0.0373, 0.0306, 0.0355, 0.0327, 0.0309, 0.0230, 0.0292, 0.0272], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 19:29:04,095 INFO [train2.py:809] (2/4) Epoch 19, batch 2650, loss[ctc_loss=0.1338, att_loss=0.2711, loss=0.2436, over 14032.00 frames. utt_duration=385.9 frames, utt_pad_proportion=0.3266, over 146.00 utterances.], tot_loss[ctc_loss=0.07766, att_loss=0.2373, loss=0.2054, over 3266060.74 frames. utt_duration=1233 frames, utt_pad_proportion=0.06029, over 10605.30 utterances.], batch size: 146, lr: 5.66e-03, grad_scale: 8.0 2023-03-08 19:29:36,275 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.33 vs. limit=5.0 2023-03-08 19:29:52,092 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=74388.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:30:23,753 INFO [train2.py:809] (2/4) Epoch 19, batch 2700, loss[ctc_loss=0.1051, att_loss=0.2608, loss=0.2296, over 17354.00 frames. utt_duration=1008 frames, utt_pad_proportion=0.0493, over 69.00 utterances.], tot_loss[ctc_loss=0.07828, att_loss=0.2381, loss=0.2062, over 3274388.12 frames. 
utt_duration=1238 frames, utt_pad_proportion=0.05668, over 10592.22 utterances.], batch size: 69, lr: 5.66e-03, grad_scale: 8.0 2023-03-08 19:30:42,823 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=74420.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:30:44,724 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8384, 3.9940, 3.8113, 4.1996, 2.5645, 4.0822, 2.5196, 1.9124], device='cuda:2'), covar=tensor([0.0498, 0.0294, 0.0822, 0.0221, 0.1709, 0.0193, 0.1593, 0.1704], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0152, 0.0252, 0.0147, 0.0216, 0.0132, 0.0228, 0.0199], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:31:09,477 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=74436.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:31:10,820 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.294e+02 2.123e+02 2.465e+02 2.927e+02 6.214e+02, threshold=4.930e+02, percent-clipped=3.0 2023-03-08 19:31:46,095 INFO [train2.py:809] (2/4) Epoch 19, batch 2750, loss[ctc_loss=0.07759, att_loss=0.2211, loss=0.1924, over 15505.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008482, over 36.00 utterances.], tot_loss[ctc_loss=0.07904, att_loss=0.2382, loss=0.2064, over 3266722.05 frames. utt_duration=1221 frames, utt_pad_proportion=0.06307, over 10712.51 utterances.], batch size: 36, lr: 5.65e-03, grad_scale: 8.0 2023-03-08 19:33:07,057 INFO [train2.py:809] (2/4) Epoch 19, batch 2800, loss[ctc_loss=0.0693, att_loss=0.2497, loss=0.2136, over 17356.00 frames. utt_duration=1104 frames, utt_pad_proportion=0.03194, over 63.00 utterances.], tot_loss[ctc_loss=0.07856, att_loss=0.2384, loss=0.2064, over 3275803.40 frames. utt_duration=1217 frames, utt_pad_proportion=0.06158, over 10780.10 utterances.], batch size: 63, lr: 5.65e-03, grad_scale: 8.0 2023-03-08 19:33:19,787 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7949, 2.2547, 2.2438, 2.3083, 2.9826, 2.4491, 2.1868, 2.7751], device='cuda:2'), covar=tensor([0.1329, 0.3150, 0.2339, 0.1677, 0.2058, 0.1442, 0.2733, 0.1192], device='cuda:2'), in_proj_covar=tensor([0.0109, 0.0116, 0.0113, 0.0102, 0.0114, 0.0100, 0.0122, 0.0090], device='cuda:2'), out_proj_covar=tensor([8.1069e-05, 8.9349e-05, 8.8731e-05, 7.8982e-05, 8.4074e-05, 7.9337e-05, 9.0253e-05, 7.2457e-05], device='cuda:2') 2023-03-08 19:33:46,262 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.25 vs. limit=5.0 2023-03-08 19:33:53,759 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.261e+02 2.053e+02 2.431e+02 2.775e+02 7.062e+02, threshold=4.862e+02, percent-clipped=1.0 2023-03-08 19:34:27,615 INFO [train2.py:809] (2/4) Epoch 19, batch 2850, loss[ctc_loss=0.06367, att_loss=0.2086, loss=0.1796, over 16000.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007099, over 40.00 utterances.], tot_loss[ctc_loss=0.0788, att_loss=0.2391, loss=0.2071, over 3277853.11 frames. 
utt_duration=1200 frames, utt_pad_proportion=0.066, over 10938.25 utterances.], batch size: 40, lr: 5.65e-03, grad_scale: 8.0 2023-03-08 19:35:27,813 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74595.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:35:29,839 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=74596.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:35:48,357 INFO [train2.py:809] (2/4) Epoch 19, batch 2900, loss[ctc_loss=0.05685, att_loss=0.2111, loss=0.1802, over 15512.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.008035, over 36.00 utterances.], tot_loss[ctc_loss=0.07769, att_loss=0.2377, loss=0.2057, over 3270202.29 frames. utt_duration=1216 frames, utt_pad_proportion=0.06463, over 10774.56 utterances.], batch size: 36, lr: 5.65e-03, grad_scale: 8.0 2023-03-08 19:36:08,925 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4992, 3.0009, 3.5326, 2.9649, 3.4437, 4.5809, 4.3987, 3.0015], device='cuda:2'), covar=tensor([0.0352, 0.1570, 0.1395, 0.1304, 0.1061, 0.0823, 0.0524, 0.1410], device='cuda:2'), in_proj_covar=tensor([0.0241, 0.0242, 0.0275, 0.0214, 0.0259, 0.0357, 0.0255, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 19:36:34,195 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.339e+02 1.992e+02 2.354e+02 2.922e+02 1.423e+03, threshold=4.708e+02, percent-clipped=1.0 2023-03-08 19:36:38,209 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.69 vs. limit=2.0 2023-03-08 19:37:06,203 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=74656.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:37:08,927 INFO [train2.py:809] (2/4) Epoch 19, batch 2950, loss[ctc_loss=0.06931, att_loss=0.2384, loss=0.2046, over 16622.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005738, over 47.00 utterances.], tot_loss[ctc_loss=0.07699, att_loss=0.2374, loss=0.2054, over 3269785.05 frames. utt_duration=1232 frames, utt_pad_proportion=0.06035, over 10627.24 utterances.], batch size: 47, lr: 5.65e-03, grad_scale: 8.0 2023-03-08 19:38:28,658 INFO [train2.py:809] (2/4) Epoch 19, batch 3000, loss[ctc_loss=0.07581, att_loss=0.2239, loss=0.1943, over 15788.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007584, over 38.00 utterances.], tot_loss[ctc_loss=0.0774, att_loss=0.2374, loss=0.2054, over 3265628.89 frames. utt_duration=1246 frames, utt_pad_proportion=0.05678, over 10492.28 utterances.], batch size: 38, lr: 5.64e-03, grad_scale: 8.0 2023-03-08 19:38:28,659 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 19:38:42,983 INFO [train2.py:843] (2/4) Epoch 19, validation: ctc_loss=0.04253, att_loss=0.235, loss=0.1965, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
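The loss values reported in these entries are consistent with a fixed interpolation of the CTC and attention-decoder objectives: for the validation entry just above, 0.2 * 0.04253 + 0.8 * 0.235 ≈ 0.1965, and the running tot_loss values follow the same 0.2/0.8 split. A minimal sketch of that combination, with the weight (att_rate = 0.8) inferred from the logged numbers rather than quoted from train2.py:

    def total_loss(ctc_loss, att_loss, att_rate=0.8):
        # Weighted sum of the two objectives; att_rate = 0.8 is an assumption
        # read off the logged loss values, not a quote of the training code.
        return (1.0 - att_rate) * ctc_loss + att_rate * att_loss

    # Validation entry above: ctc_loss=0.04253, att_loss=0.235 -> loss ≈ 0.1965
    assert abs(total_loss(0.04253, 0.235) - 0.1965) < 1e-3

Likewise, the optim.py lines report five quantiles of recent gradient norms followed by a clipping threshold, and in each entry the threshold closely tracks Clipping_scale times the middle quantile (e.g. 2.0 * 2.343e+02 ≈ 4.687e+02). A rough sketch of that bookkeeping over a sliding window of norms; the helper name and the percent-clipped accounting are illustrative assumptions, not the exact optimizer logic:

    import numpy as np

    def grad_norm_stats(recent_norms, clipping_scale=2.0):
        # Quantiles (min, 25%, median, 75%, max) of recently observed gradient
        # norms; the threshold is taken as clipping_scale * median, which is
        # what the logged values suggest, though the optimizer may differ.
        q = np.quantile(recent_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * float(np.mean(np.asarray(recent_norms) > threshold))
        return q, threshold, percent_clipped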
2023-03-08 19:38:42,984 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 19:38:55,905 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2231, 5.4961, 5.3604, 5.4046, 5.5054, 5.4526, 5.1658, 4.9455], device='cuda:2'), covar=tensor([0.0934, 0.0474, 0.0261, 0.0478, 0.0258, 0.0283, 0.0364, 0.0301], device='cuda:2'), in_proj_covar=tensor([0.0510, 0.0347, 0.0329, 0.0341, 0.0401, 0.0416, 0.0344, 0.0382], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 19:39:02,200 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=74720.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:39:29,765 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 2.054e+02 2.397e+02 2.986e+02 9.234e+02, threshold=4.795e+02, percent-clipped=4.0 2023-03-08 19:39:57,313 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5985, 5.0217, 4.8288, 4.9174, 5.0575, 4.7723, 3.5237, 5.0130], device='cuda:2'), covar=tensor([0.0118, 0.0120, 0.0133, 0.0083, 0.0095, 0.0113, 0.0648, 0.0175], device='cuda:2'), in_proj_covar=tensor([0.0090, 0.0086, 0.0109, 0.0067, 0.0072, 0.0084, 0.0104, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 19:40:05,292 INFO [train2.py:809] (2/4) Epoch 19, batch 3050, loss[ctc_loss=0.05649, att_loss=0.2082, loss=0.1778, over 15369.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01128, over 35.00 utterances.], tot_loss[ctc_loss=0.07753, att_loss=0.2377, loss=0.2056, over 3251648.58 frames. utt_duration=1234 frames, utt_pad_proportion=0.06143, over 10549.94 utterances.], batch size: 35, lr: 5.64e-03, grad_scale: 8.0 2023-03-08 19:40:20,706 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=74768.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:40:37,008 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-08 19:41:26,170 INFO [train2.py:809] (2/4) Epoch 19, batch 3100, loss[ctc_loss=0.08308, att_loss=0.2267, loss=0.198, over 16136.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005587, over 42.00 utterances.], tot_loss[ctc_loss=0.07744, att_loss=0.2373, loss=0.2053, over 3253354.84 frames. utt_duration=1252 frames, utt_pad_proportion=0.05728, over 10406.02 utterances.], batch size: 42, lr: 5.64e-03, grad_scale: 8.0 2023-03-08 19:42:11,534 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.275e+02 1.971e+02 2.423e+02 3.077e+02 6.495e+02, threshold=4.846e+02, percent-clipped=7.0 2023-03-08 19:42:46,964 INFO [train2.py:809] (2/4) Epoch 19, batch 3150, loss[ctc_loss=0.09364, att_loss=0.2589, loss=0.2258, over 17047.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.007777, over 52.00 utterances.], tot_loss[ctc_loss=0.07741, att_loss=0.2377, loss=0.2057, over 3266111.71 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.05329, over 10415.38 utterances.], batch size: 52, lr: 5.64e-03, grad_scale: 8.0 2023-03-08 19:43:47,026 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=74896.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:43:49,947 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5967, 4.7728, 4.4105, 4.8953, 4.2792, 4.5300, 4.9306, 4.7308], device='cuda:2'), covar=tensor([0.0574, 0.0412, 0.0869, 0.0399, 0.0474, 0.0352, 0.0317, 0.0238], device='cuda:2'), in_proj_covar=tensor([0.0378, 0.0311, 0.0359, 0.0332, 0.0312, 0.0232, 0.0294, 0.0276], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 19:44:06,752 INFO [train2.py:809] (2/4) Epoch 19, batch 3200, loss[ctc_loss=0.0981, att_loss=0.2475, loss=0.2176, over 17016.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007933, over 51.00 utterances.], tot_loss[ctc_loss=0.07716, att_loss=0.2374, loss=0.2054, over 3266089.45 frames. utt_duration=1249 frames, utt_pad_proportion=0.05515, over 10474.52 utterances.], batch size: 51, lr: 5.64e-03, grad_scale: 8.0 2023-03-08 19:44:08,697 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74909.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:44:51,312 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.517e+02 1.907e+02 2.285e+02 3.012e+02 5.502e+02, threshold=4.570e+02, percent-clipped=3.0 2023-03-08 19:44:55,979 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0197, 6.2544, 5.6313, 5.9485, 5.9351, 5.4657, 5.7116, 5.4267], device='cuda:2'), covar=tensor([0.1236, 0.0828, 0.0989, 0.0837, 0.0773, 0.1501, 0.2138, 0.2176], device='cuda:2'), in_proj_covar=tensor([0.0512, 0.0595, 0.0450, 0.0444, 0.0419, 0.0459, 0.0606, 0.0520], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 19:45:03,892 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=74944.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:45:15,403 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=74951.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:45:26,712 INFO [train2.py:809] (2/4) Epoch 19, batch 3250, loss[ctc_loss=0.07526, att_loss=0.249, loss=0.2142, over 16776.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005993, over 48.00 utterances.], tot_loss[ctc_loss=0.07688, att_loss=0.237, loss=0.205, over 3268868.85 frames. utt_duration=1270 frames, utt_pad_proportion=0.05035, over 10307.27 utterances.], batch size: 48, lr: 5.64e-03, grad_scale: 8.0 2023-03-08 19:45:36,414 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74964.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:45:45,830 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=74970.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:46:46,968 INFO [train2.py:809] (2/4) Epoch 19, batch 3300, loss[ctc_loss=0.08297, att_loss=0.2337, loss=0.2036, over 16179.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006412, over 41.00 utterances.], tot_loss[ctc_loss=0.0773, att_loss=0.2373, loss=0.2053, over 3269141.13 frames. 
utt_duration=1266 frames, utt_pad_proportion=0.05183, over 10343.29 utterances.], batch size: 41, lr: 5.63e-03, grad_scale: 8.0 2023-03-08 19:47:13,530 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=75025.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:47:31,947 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.356e+02 2.051e+02 2.410e+02 2.967e+02 4.688e+02, threshold=4.820e+02, percent-clipped=1.0 2023-03-08 19:48:06,385 INFO [train2.py:809] (2/4) Epoch 19, batch 3350, loss[ctc_loss=0.08795, att_loss=0.2552, loss=0.2217, over 16769.00 frames. utt_duration=679.3 frames, utt_pad_proportion=0.1477, over 99.00 utterances.], tot_loss[ctc_loss=0.07816, att_loss=0.2381, loss=0.2061, over 3269893.32 frames. utt_duration=1246 frames, utt_pad_proportion=0.05676, over 10507.44 utterances.], batch size: 99, lr: 5.63e-03, grad_scale: 8.0 2023-03-08 19:49:19,485 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. limit=2.0 2023-03-08 19:49:26,188 INFO [train2.py:809] (2/4) Epoch 19, batch 3400, loss[ctc_loss=0.0556, att_loss=0.2104, loss=0.1795, over 15625.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.01011, over 37.00 utterances.], tot_loss[ctc_loss=0.07826, att_loss=0.2375, loss=0.2056, over 3265447.20 frames. utt_duration=1209 frames, utt_pad_proportion=0.06658, over 10814.19 utterances.], batch size: 37, lr: 5.63e-03, grad_scale: 8.0 2023-03-08 19:50:01,340 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7388, 5.0128, 4.6454, 5.1302, 4.5552, 4.7307, 5.1483, 4.9302], device='cuda:2'), covar=tensor([0.0592, 0.0339, 0.0752, 0.0345, 0.0410, 0.0366, 0.0245, 0.0201], device='cuda:2'), in_proj_covar=tensor([0.0383, 0.0314, 0.0361, 0.0336, 0.0316, 0.0236, 0.0297, 0.0279], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 19:50:11,775 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 2.118e+02 2.457e+02 2.938e+02 6.241e+02, threshold=4.915e+02, percent-clipped=4.0 2023-03-08 19:50:25,663 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0372, 5.3584, 4.9198, 5.4633, 4.7281, 5.0044, 5.4812, 5.2252], device='cuda:2'), covar=tensor([0.0534, 0.0275, 0.0734, 0.0265, 0.0413, 0.0239, 0.0217, 0.0202], device='cuda:2'), in_proj_covar=tensor([0.0380, 0.0312, 0.0358, 0.0333, 0.0314, 0.0234, 0.0295, 0.0277], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 19:50:46,056 INFO [train2.py:809] (2/4) Epoch 19, batch 3450, loss[ctc_loss=0.0841, att_loss=0.252, loss=0.2184, over 16539.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006272, over 45.00 utterances.], tot_loss[ctc_loss=0.07827, att_loss=0.2381, loss=0.2061, over 3274581.18 frames. 
utt_duration=1238 frames, utt_pad_proportion=0.05724, over 10592.33 utterances.], batch size: 45, lr: 5.63e-03, grad_scale: 8.0 2023-03-08 19:51:32,623 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1321, 4.4104, 4.5287, 4.7986, 3.0130, 4.4460, 2.6956, 2.0541], device='cuda:2'), covar=tensor([0.0404, 0.0269, 0.0619, 0.0169, 0.1371, 0.0166, 0.1432, 0.1577], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0152, 0.0253, 0.0147, 0.0215, 0.0131, 0.0225, 0.0197], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:51:57,384 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7877, 5.2531, 5.0893, 5.1869, 5.3354, 4.8593, 3.7090, 5.2518], device='cuda:2'), covar=tensor([0.0097, 0.0086, 0.0099, 0.0070, 0.0060, 0.0100, 0.0601, 0.0152], device='cuda:2'), in_proj_covar=tensor([0.0089, 0.0085, 0.0107, 0.0066, 0.0072, 0.0083, 0.0102, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 19:52:06,153 INFO [train2.py:809] (2/4) Epoch 19, batch 3500, loss[ctc_loss=0.09604, att_loss=0.2364, loss=0.2084, over 16683.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006914, over 46.00 utterances.], tot_loss[ctc_loss=0.07824, att_loss=0.2382, loss=0.2062, over 3271155.08 frames. utt_duration=1221 frames, utt_pad_proportion=0.0633, over 10733.14 utterances.], batch size: 46, lr: 5.63e-03, grad_scale: 8.0 2023-03-08 19:52:21,279 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. limit=2.0 2023-03-08 19:52:38,245 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=75228.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:52:52,420 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.557e+02 2.026e+02 2.447e+02 3.089e+02 8.845e+02, threshold=4.893e+02, percent-clipped=4.0 2023-03-08 19:53:15,774 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75251.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:53:23,744 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1987, 5.4211, 5.3688, 5.3387, 5.4894, 5.4446, 5.0999, 4.9693], device='cuda:2'), covar=tensor([0.1011, 0.0565, 0.0266, 0.0543, 0.0254, 0.0270, 0.0460, 0.0313], device='cuda:2'), in_proj_covar=tensor([0.0515, 0.0351, 0.0332, 0.0345, 0.0403, 0.0420, 0.0348, 0.0384], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 19:53:26,626 INFO [train2.py:809] (2/4) Epoch 19, batch 3550, loss[ctc_loss=0.06629, att_loss=0.2175, loss=0.1873, over 15505.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.008322, over 36.00 utterances.], tot_loss[ctc_loss=0.07751, att_loss=0.2383, loss=0.2062, over 3276939.22 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05507, over 10567.06 utterances.], batch size: 36, lr: 5.62e-03, grad_scale: 8.0 2023-03-08 19:53:37,480 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=75265.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:53:45,183 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=75270.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:54:01,358 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1686, 5.2493, 4.9592, 2.5604, 2.0132, 3.1681, 2.6008, 4.0088], device='cuda:2'), covar=tensor([0.0661, 0.0274, 0.0272, 0.4414, 0.5517, 0.2214, 0.3114, 0.1577], device='cuda:2'), in_proj_covar=tensor([0.0356, 0.0267, 0.0265, 0.0240, 0.0343, 0.0335, 0.0250, 0.0362], device='cuda:2'), out_proj_covar=tensor([1.5146e-04, 9.8954e-05, 1.1314e-04, 1.0359e-04, 1.4417e-04, 1.3136e-04, 1.0002e-04, 1.4767e-04], device='cuda:2') 2023-03-08 19:54:15,822 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=75289.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:54:32,153 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75299.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:54:46,280 INFO [train2.py:809] (2/4) Epoch 19, batch 3600, loss[ctc_loss=0.08677, att_loss=0.2546, loss=0.221, over 16764.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006703, over 48.00 utterances.], tot_loss[ctc_loss=0.07787, att_loss=0.2384, loss=0.2063, over 3264819.01 frames. utt_duration=1204 frames, utt_pad_proportion=0.06666, over 10864.46 utterances.], batch size: 48, lr: 5.62e-03, grad_scale: 8.0 2023-03-08 19:55:04,938 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=75320.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 19:55:23,535 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=75331.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 19:55:32,224 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.965e+02 2.324e+02 2.872e+02 5.380e+02, threshold=4.648e+02, percent-clipped=3.0 2023-03-08 19:56:05,131 INFO [train2.py:809] (2/4) Epoch 19, batch 3650, loss[ctc_loss=0.08256, att_loss=0.2499, loss=0.2164, over 17313.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02389, over 59.00 utterances.], tot_loss[ctc_loss=0.0789, att_loss=0.2396, loss=0.2075, over 3274807.26 frames. 
utt_duration=1190 frames, utt_pad_proportion=0.06777, over 11020.83 utterances.], batch size: 59, lr: 5.62e-03, grad_scale: 8.0 2023-03-08 19:56:59,345 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9677, 4.4481, 4.4188, 4.5764, 2.8424, 4.3012, 2.5994, 1.8799], device='cuda:2'), covar=tensor([0.0493, 0.0201, 0.0644, 0.0209, 0.1542, 0.0186, 0.1461, 0.1614], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0153, 0.0254, 0.0149, 0.0216, 0.0133, 0.0227, 0.0199], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 19:57:05,509 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0623, 4.9617, 4.9165, 2.2686, 2.0742, 2.7295, 2.2879, 3.8592], device='cuda:2'), covar=tensor([0.0674, 0.0227, 0.0192, 0.4354, 0.5693, 0.2812, 0.3500, 0.1422], device='cuda:2'), in_proj_covar=tensor([0.0357, 0.0267, 0.0266, 0.0240, 0.0345, 0.0337, 0.0250, 0.0363], device='cuda:2'), out_proj_covar=tensor([1.5194e-04, 9.9028e-05, 1.1335e-04, 1.0361e-04, 1.4492e-04, 1.3222e-04, 1.0010e-04, 1.4811e-04], device='cuda:2') 2023-03-08 19:57:25,709 INFO [train2.py:809] (2/4) Epoch 19, batch 3700, loss[ctc_loss=0.08064, att_loss=0.2498, loss=0.2159, over 17434.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.03114, over 63.00 utterances.], tot_loss[ctc_loss=0.0786, att_loss=0.2391, loss=0.207, over 3278709.91 frames. utt_duration=1208 frames, utt_pad_proportion=0.06229, over 10872.28 utterances.], batch size: 63, lr: 5.62e-03, grad_scale: 8.0 2023-03-08 19:57:28,968 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7607, 4.9505, 5.5720, 4.8759, 4.8504, 5.6241, 5.0070, 5.6142], device='cuda:2'), covar=tensor([0.1400, 0.1593, 0.1102, 0.2359, 0.3583, 0.1643, 0.1264, 0.1334], device='cuda:2'), in_proj_covar=tensor([0.0852, 0.0500, 0.0586, 0.0647, 0.0861, 0.0609, 0.0482, 0.0591], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 19:58:11,211 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 2.064e+02 2.535e+02 3.260e+02 9.270e+02, threshold=5.071e+02, percent-clipped=6.0 2023-03-08 19:58:37,157 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5318, 2.5208, 5.0226, 3.9601, 3.1731, 4.3799, 4.8732, 4.7305], device='cuda:2'), covar=tensor([0.0295, 0.1763, 0.0255, 0.0937, 0.1613, 0.0235, 0.0161, 0.0258], device='cuda:2'), in_proj_covar=tensor([0.0185, 0.0241, 0.0178, 0.0308, 0.0263, 0.0205, 0.0163, 0.0193], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002], device='cuda:2') 2023-03-08 19:58:44,486 INFO [train2.py:809] (2/4) Epoch 19, batch 3750, loss[ctc_loss=0.07452, att_loss=0.2439, loss=0.21, over 17078.00 frames. utt_duration=1290 frames, utt_pad_proportion=0.008441, over 53.00 utterances.], tot_loss[ctc_loss=0.07835, att_loss=0.2386, loss=0.2066, over 3274360.82 frames. utt_duration=1211 frames, utt_pad_proportion=0.06227, over 10826.30 utterances.], batch size: 53, lr: 5.62e-03, grad_scale: 8.0 2023-03-08 19:58:50,339 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.15 vs. 
limit=5.0 2023-03-08 19:59:16,093 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5184, 4.9488, 4.8130, 5.0117, 5.0873, 4.6272, 3.7698, 4.9332], device='cuda:2'), covar=tensor([0.0121, 0.0103, 0.0118, 0.0066, 0.0072, 0.0097, 0.0542, 0.0163], device='cuda:2'), in_proj_covar=tensor([0.0089, 0.0085, 0.0107, 0.0066, 0.0072, 0.0083, 0.0102, 0.0106], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 20:00:03,970 INFO [train2.py:809] (2/4) Epoch 19, batch 3800, loss[ctc_loss=0.08025, att_loss=0.2304, loss=0.2004, over 15885.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009393, over 39.00 utterances.], tot_loss[ctc_loss=0.07857, att_loss=0.2387, loss=0.2066, over 3281026.13 frames. utt_duration=1227 frames, utt_pad_proportion=0.05716, over 10712.58 utterances.], batch size: 39, lr: 5.62e-03, grad_scale: 8.0 2023-03-08 20:00:50,186 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.424e+02 2.066e+02 2.537e+02 3.345e+02 7.460e+02, threshold=5.074e+02, percent-clipped=4.0 2023-03-08 20:01:19,383 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8737, 2.4624, 2.6895, 2.6460, 3.2505, 2.6208, 2.4379, 2.9171], device='cuda:2'), covar=tensor([0.1778, 0.3222, 0.2649, 0.1845, 0.1772, 0.1599, 0.3433, 0.1247], device='cuda:2'), in_proj_covar=tensor([0.0111, 0.0118, 0.0117, 0.0104, 0.0117, 0.0103, 0.0125, 0.0092], device='cuda:2'), out_proj_covar=tensor([8.3003e-05, 9.1326e-05, 9.1543e-05, 8.0588e-05, 8.5942e-05, 8.1871e-05, 9.2401e-05, 7.4201e-05], device='cuda:2') 2023-03-08 20:01:23,606 INFO [train2.py:809] (2/4) Epoch 19, batch 3850, loss[ctc_loss=0.06479, att_loss=0.2077, loss=0.1791, over 15475.00 frames. utt_duration=1721 frames, utt_pad_proportion=0.01059, over 36.00 utterances.], tot_loss[ctc_loss=0.07788, att_loss=0.2381, loss=0.2061, over 3275729.26 frames. utt_duration=1218 frames, utt_pad_proportion=0.05986, over 10768.12 utterances.], batch size: 36, lr: 5.61e-03, grad_scale: 8.0 2023-03-08 20:01:34,612 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75565.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:02:03,498 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=75584.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:02:17,138 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6024, 4.9907, 4.7638, 4.9677, 5.0871, 4.6772, 3.5890, 4.9078], device='cuda:2'), covar=tensor([0.0127, 0.0110, 0.0150, 0.0083, 0.0079, 0.0116, 0.0644, 0.0192], device='cuda:2'), in_proj_covar=tensor([0.0089, 0.0085, 0.0108, 0.0066, 0.0072, 0.0083, 0.0102, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 20:02:40,366 INFO [train2.py:809] (2/4) Epoch 19, batch 3900, loss[ctc_loss=0.04851, att_loss=0.2133, loss=0.1803, over 16117.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006818, over 42.00 utterances.], tot_loss[ctc_loss=0.07783, att_loss=0.2385, loss=0.2063, over 3273413.21 frames. 
utt_duration=1211 frames, utt_pad_proportion=0.06229, over 10824.43 utterances.], batch size: 42, lr: 5.61e-03, grad_scale: 8.0 2023-03-08 20:02:47,833 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75613.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:02:55,370 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9465, 6.2038, 5.6580, 5.9221, 5.8774, 5.3518, 5.6758, 5.3459], device='cuda:2'), covar=tensor([0.1183, 0.0768, 0.0885, 0.0826, 0.0643, 0.1481, 0.1750, 0.2251], device='cuda:2'), in_proj_covar=tensor([0.0506, 0.0591, 0.0448, 0.0443, 0.0418, 0.0456, 0.0598, 0.0516], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 20:02:58,537 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75620.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:03:07,707 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=75626.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 20:03:24,258 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.284e+02 1.998e+02 2.333e+02 2.845e+02 5.732e+02, threshold=4.667e+02, percent-clipped=1.0 2023-03-08 20:03:57,008 INFO [train2.py:809] (2/4) Epoch 19, batch 3950, loss[ctc_loss=0.05827, att_loss=0.2179, loss=0.1859, over 14525.00 frames. utt_duration=1817 frames, utt_pad_proportion=0.03637, over 32.00 utterances.], tot_loss[ctc_loss=0.07805, att_loss=0.2386, loss=0.2065, over 3269743.03 frames. utt_duration=1198 frames, utt_pad_proportion=0.06607, over 10928.45 utterances.], batch size: 32, lr: 5.61e-03, grad_scale: 8.0 2023-03-08 20:04:12,252 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75668.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:04:13,978 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7916, 3.9744, 3.9162, 4.0142, 4.0677, 3.8584, 3.0062, 3.8992], device='cuda:2'), covar=tensor([0.0150, 0.0142, 0.0152, 0.0112, 0.0121, 0.0130, 0.0661, 0.0281], device='cuda:2'), in_proj_covar=tensor([0.0090, 0.0086, 0.0109, 0.0067, 0.0073, 0.0084, 0.0103, 0.0108], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 20:05:12,139 INFO [train2.py:809] (2/4) Epoch 20, batch 0, loss[ctc_loss=0.0644, att_loss=0.214, loss=0.1841, over 15761.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.00906, over 38.00 utterances.], tot_loss[ctc_loss=0.0644, att_loss=0.214, loss=0.1841, over 15761.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.00906, over 38.00 utterances.], batch size: 38, lr: 5.46e-03, grad_scale: 8.0 2023-03-08 20:05:12,139 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 20:05:24,206 INFO [train2.py:843] (2/4) Epoch 20, validation: ctc_loss=0.04136, att_loss=0.235, loss=0.1963, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 20:05:24,207 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 20:06:35,312 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.268e+02 2.109e+02 2.667e+02 3.273e+02 6.239e+02, threshold=5.334e+02, percent-clipped=6.0 2023-03-08 20:06:37,674 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. 
limit=2.0 2023-03-08 20:06:43,083 INFO [train2.py:809] (2/4) Epoch 20, batch 50, loss[ctc_loss=0.08387, att_loss=0.2528, loss=0.219, over 17071.00 frames. utt_duration=1290 frames, utt_pad_proportion=0.00847, over 53.00 utterances.], tot_loss[ctc_loss=0.07615, att_loss=0.24, loss=0.2072, over 741861.03 frames. utt_duration=1202 frames, utt_pad_proportion=0.05905, over 2471.70 utterances.], batch size: 53, lr: 5.46e-03, grad_scale: 8.0 2023-03-08 20:07:27,233 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-08 20:08:03,706 INFO [train2.py:809] (2/4) Epoch 20, batch 100, loss[ctc_loss=0.1158, att_loss=0.2638, loss=0.2342, over 13733.00 frames. utt_duration=377.8 frames, utt_pad_proportion=0.343, over 146.00 utterances.], tot_loss[ctc_loss=0.07822, att_loss=0.2388, loss=0.2067, over 1301439.68 frames. utt_duration=1188 frames, utt_pad_proportion=0.06599, over 4388.98 utterances.], batch size: 146, lr: 5.46e-03, grad_scale: 8.0 2023-03-08 20:08:16,354 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8787, 5.3025, 5.0933, 5.1724, 5.3309, 4.9061, 3.5643, 5.2350], device='cuda:2'), covar=tensor([0.0088, 0.0088, 0.0100, 0.0082, 0.0056, 0.0095, 0.0628, 0.0149], device='cuda:2'), in_proj_covar=tensor([0.0089, 0.0085, 0.0107, 0.0066, 0.0072, 0.0083, 0.0102, 0.0106], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 20:08:56,146 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. limit=2.0 2023-03-08 20:09:16,367 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.261e+02 1.985e+02 2.361e+02 2.843e+02 6.853e+02, threshold=4.722e+02, percent-clipped=2.0 2023-03-08 20:09:24,378 INFO [train2.py:809] (2/4) Epoch 20, batch 150, loss[ctc_loss=0.09153, att_loss=0.2507, loss=0.2189, over 17015.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007933, over 51.00 utterances.], tot_loss[ctc_loss=0.07906, att_loss=0.239, loss=0.207, over 1735852.56 frames. utt_duration=1153 frames, utt_pad_proportion=0.07645, over 6027.34 utterances.], batch size: 51, lr: 5.46e-03, grad_scale: 8.0 2023-03-08 20:10:33,724 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75884.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:10:44,800 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5897, 3.1993, 3.7773, 3.0198, 3.6539, 4.7421, 4.5650, 3.3106], device='cuda:2'), covar=tensor([0.0366, 0.1461, 0.1141, 0.1406, 0.0983, 0.0747, 0.0491, 0.1260], device='cuda:2'), in_proj_covar=tensor([0.0244, 0.0241, 0.0275, 0.0216, 0.0263, 0.0355, 0.0256, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 20:10:46,185 INFO [train2.py:809] (2/4) Epoch 20, batch 200, loss[ctc_loss=0.06674, att_loss=0.2362, loss=0.2023, over 16115.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006361, over 42.00 utterances.], tot_loss[ctc_loss=0.07855, att_loss=0.239, loss=0.2069, over 2077211.83 frames. 
utt_duration=1188 frames, utt_pad_proportion=0.06933, over 7000.25 utterances.], batch size: 42, lr: 5.46e-03, grad_scale: 8.0 2023-03-08 20:11:42,081 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75926.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 20:11:51,158 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75932.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:11:58,997 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.248e+02 2.084e+02 2.421e+02 2.926e+02 5.998e+02, threshold=4.842e+02, percent-clipped=2.0 2023-03-08 20:12:06,826 INFO [train2.py:809] (2/4) Epoch 20, batch 250, loss[ctc_loss=0.05361, att_loss=0.2164, loss=0.1838, over 16122.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006603, over 42.00 utterances.], tot_loss[ctc_loss=0.07752, att_loss=0.2388, loss=0.2066, over 2349008.22 frames. utt_duration=1222 frames, utt_pad_proportion=0.05956, over 7697.73 utterances.], batch size: 42, lr: 5.45e-03, grad_scale: 8.0 2023-03-08 20:12:59,212 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75974.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:13:08,644 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6662, 4.7610, 4.6983, 4.8240, 5.2840, 4.5800, 4.6371, 2.4797], device='cuda:2'), covar=tensor([0.0193, 0.0282, 0.0302, 0.0257, 0.0950, 0.0215, 0.0318, 0.1861], device='cuda:2'), in_proj_covar=tensor([0.0151, 0.0174, 0.0176, 0.0192, 0.0356, 0.0147, 0.0165, 0.0211], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 20:13:09,182 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. limit=2.0 2023-03-08 20:13:10,195 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1310, 4.4621, 4.5761, 4.8043, 2.9531, 4.5874, 2.7754, 1.6092], device='cuda:2'), covar=tensor([0.0490, 0.0253, 0.0648, 0.0182, 0.1522, 0.0156, 0.1464, 0.1855], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0153, 0.0253, 0.0150, 0.0215, 0.0133, 0.0227, 0.0198], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 20:13:25,318 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1493, 5.4677, 4.9146, 5.5320, 4.9397, 5.1199, 5.6000, 5.3918], device='cuda:2'), covar=tensor([0.0465, 0.0232, 0.0723, 0.0279, 0.0343, 0.0169, 0.0192, 0.0186], device='cuda:2'), in_proj_covar=tensor([0.0378, 0.0311, 0.0360, 0.0336, 0.0313, 0.0233, 0.0295, 0.0280], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 20:13:26,591 INFO [train2.py:809] (2/4) Epoch 20, batch 300, loss[ctc_loss=0.07827, att_loss=0.2121, loss=0.1853, over 15344.00 frames. utt_duration=1755 frames, utt_pad_proportion=0.01286, over 35.00 utterances.], tot_loss[ctc_loss=0.07732, att_loss=0.2382, loss=0.2061, over 2549634.68 frames. utt_duration=1240 frames, utt_pad_proportion=0.05787, over 8237.72 utterances.], batch size: 35, lr: 5.45e-03, grad_scale: 8.0 2023-03-08 20:14:42,638 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.429e+02 2.144e+02 2.516e+02 3.059e+02 1.142e+03, threshold=5.032e+02, percent-clipped=7.0 2023-03-08 20:14:50,245 INFO [train2.py:809] (2/4) Epoch 20, batch 350, loss[ctc_loss=0.07883, att_loss=0.2452, loss=0.2119, over 16630.00 frames. 
utt_duration=1417 frames, utt_pad_proportion=0.005125, over 47.00 utterances.], tot_loss[ctc_loss=0.07731, att_loss=0.2385, loss=0.2063, over 2717323.29 frames. utt_duration=1231 frames, utt_pad_proportion=0.0572, over 8837.14 utterances.], batch size: 47, lr: 5.45e-03, grad_scale: 8.0 2023-03-08 20:15:17,534 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-08 20:16:10,635 INFO [train2.py:809] (2/4) Epoch 20, batch 400, loss[ctc_loss=0.09597, att_loss=0.2599, loss=0.2271, over 17047.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.009978, over 53.00 utterances.], tot_loss[ctc_loss=0.07755, att_loss=0.238, loss=0.2059, over 2837767.45 frames. utt_duration=1232 frames, utt_pad_proportion=0.05814, over 9222.13 utterances.], batch size: 53, lr: 5.45e-03, grad_scale: 8.0 2023-03-08 20:17:22,263 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.475e+02 1.983e+02 2.448e+02 2.910e+02 4.561e+02, threshold=4.897e+02, percent-clipped=0.0 2023-03-08 20:17:29,974 INFO [train2.py:809] (2/4) Epoch 20, batch 450, loss[ctc_loss=0.06651, att_loss=0.2373, loss=0.2031, over 16406.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.007208, over 44.00 utterances.], tot_loss[ctc_loss=0.07761, att_loss=0.2379, loss=0.2058, over 2943131.67 frames. utt_duration=1243 frames, utt_pad_proportion=0.05228, over 9484.48 utterances.], batch size: 44, lr: 5.45e-03, grad_scale: 8.0 2023-03-08 20:18:18,986 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3193, 3.7781, 3.8865, 3.2212, 3.8730, 3.9740, 3.9429, 2.8324], device='cuda:2'), covar=tensor([0.0938, 0.2071, 0.2152, 0.5864, 0.1674, 0.2564, 0.0796, 0.7136], device='cuda:2'), in_proj_covar=tensor([0.0162, 0.0175, 0.0188, 0.0251, 0.0151, 0.0249, 0.0168, 0.0211], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 20:18:49,885 INFO [train2.py:809] (2/4) Epoch 20, batch 500, loss[ctc_loss=0.07759, att_loss=0.2176, loss=0.1896, over 15624.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.01006, over 37.00 utterances.], tot_loss[ctc_loss=0.07722, att_loss=0.2369, loss=0.2049, over 2997383.50 frames. utt_duration=1235 frames, utt_pad_proportion=0.06014, over 9721.17 utterances.], batch size: 37, lr: 5.45e-03, grad_scale: 8.0 2023-03-08 20:19:11,453 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.74 vs. limit=5.0 2023-03-08 20:20:01,813 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.260e+02 1.846e+02 2.334e+02 2.812e+02 5.634e+02, threshold=4.669e+02, percent-clipped=1.0 2023-03-08 20:20:09,795 INFO [train2.py:809] (2/4) Epoch 20, batch 550, loss[ctc_loss=0.06318, att_loss=0.2275, loss=0.1946, over 16528.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006902, over 45.00 utterances.], tot_loss[ctc_loss=0.07708, att_loss=0.2367, loss=0.2048, over 3053893.31 frames. 
utt_duration=1217 frames, utt_pad_proportion=0.06491, over 10053.94 utterances.], batch size: 45, lr: 5.44e-03, grad_scale: 8.0 2023-03-08 20:20:10,145 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6217, 3.2859, 3.3230, 2.8943, 3.2529, 3.3368, 3.3396, 2.4297], device='cuda:2'), covar=tensor([0.1114, 0.1609, 0.2365, 0.4507, 0.1410, 0.3845, 0.1064, 0.4460], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0175, 0.0187, 0.0249, 0.0150, 0.0248, 0.0167, 0.0211], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 20:20:54,219 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=76269.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 20:21:29,922 INFO [train2.py:809] (2/4) Epoch 20, batch 600, loss[ctc_loss=0.06343, att_loss=0.2079, loss=0.179, over 15495.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009041, over 36.00 utterances.], tot_loss[ctc_loss=0.07681, att_loss=0.2365, loss=0.2045, over 3098989.15 frames. utt_duration=1213 frames, utt_pad_proportion=0.06654, over 10231.31 utterances.], batch size: 36, lr: 5.44e-03, grad_scale: 16.0 2023-03-08 20:21:33,845 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=76294.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:21:44,472 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9601, 3.6682, 3.1512, 3.3582, 3.8647, 3.6019, 2.9369, 4.0686], device='cuda:2'), covar=tensor([0.1029, 0.0507, 0.1033, 0.0694, 0.0711, 0.0705, 0.0881, 0.0498], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0213, 0.0225, 0.0198, 0.0274, 0.0237, 0.0200, 0.0285], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 20:22:20,699 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=76323.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 20:22:31,335 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=76330.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 20:22:41,629 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.413e+02 1.945e+02 2.306e+02 2.690e+02 7.016e+02, threshold=4.612e+02, percent-clipped=2.0 2023-03-08 20:22:49,614 INFO [train2.py:809] (2/4) Epoch 20, batch 650, loss[ctc_loss=0.06596, att_loss=0.2479, loss=0.2115, over 16973.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007056, over 50.00 utterances.], tot_loss[ctc_loss=0.0769, att_loss=0.2368, loss=0.2049, over 3145593.87 frames. 
utt_duration=1213 frames, utt_pad_proportion=0.06374, over 10381.83 utterances.], batch size: 50, lr: 5.44e-03, grad_scale: 16.0 2023-03-08 20:23:11,358 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=76355.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:23:18,796 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0426, 5.0659, 4.8156, 2.8916, 4.8166, 4.6801, 4.3023, 2.7364], device='cuda:2'), covar=tensor([0.0106, 0.0098, 0.0265, 0.1048, 0.0095, 0.0191, 0.0345, 0.1370], device='cuda:2'), in_proj_covar=tensor([0.0072, 0.0100, 0.0101, 0.0110, 0.0083, 0.0110, 0.0097, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 20:23:56,984 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=76384.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 20:24:09,246 INFO [train2.py:809] (2/4) Epoch 20, batch 700, loss[ctc_loss=0.09163, att_loss=0.2493, loss=0.2177, over 17337.00 frames. utt_duration=1102 frames, utt_pad_proportion=0.03657, over 63.00 utterances.], tot_loss[ctc_loss=0.07678, att_loss=0.2367, loss=0.2047, over 3174975.38 frames. utt_duration=1220 frames, utt_pad_proportion=0.06175, over 10426.58 utterances.], batch size: 63, lr: 5.44e-03, grad_scale: 16.0 2023-03-08 20:25:20,467 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.218e+02 1.882e+02 2.322e+02 2.702e+02 6.742e+02, threshold=4.645e+02, percent-clipped=2.0 2023-03-08 20:25:28,802 INFO [train2.py:809] (2/4) Epoch 20, batch 750, loss[ctc_loss=0.05611, att_loss=0.2067, loss=0.1765, over 14110.00 frames. utt_duration=1822 frames, utt_pad_proportion=0.0666, over 31.00 utterances.], tot_loss[ctc_loss=0.07655, att_loss=0.2367, loss=0.2047, over 3196951.86 frames. utt_duration=1240 frames, utt_pad_proportion=0.05659, over 10322.26 utterances.], batch size: 31, lr: 5.44e-03, grad_scale: 16.0 2023-03-08 20:26:20,870 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-03-08 20:26:48,172 INFO [train2.py:809] (2/4) Epoch 20, batch 800, loss[ctc_loss=0.07191, att_loss=0.2404, loss=0.2067, over 17068.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.008905, over 53.00 utterances.], tot_loss[ctc_loss=0.07669, att_loss=0.2364, loss=0.2045, over 3212521.93 frames. utt_duration=1255 frames, utt_pad_proportion=0.05327, over 10250.42 utterances.], batch size: 53, lr: 5.44e-03, grad_scale: 16.0 2023-03-08 20:28:00,600 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.342e+02 1.827e+02 2.351e+02 2.962e+02 7.560e+02, threshold=4.702e+02, percent-clipped=5.0 2023-03-08 20:28:07,270 INFO [train2.py:809] (2/4) Epoch 20, batch 850, loss[ctc_loss=0.05828, att_loss=0.2305, loss=0.196, over 16774.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005919, over 48.00 utterances.], tot_loss[ctc_loss=0.07614, att_loss=0.2364, loss=0.2044, over 3224416.78 frames. utt_duration=1241 frames, utt_pad_proportion=0.05661, over 10403.50 utterances.], batch size: 48, lr: 5.43e-03, grad_scale: 8.0 2023-03-08 20:29:26,681 INFO [train2.py:809] (2/4) Epoch 20, batch 900, loss[ctc_loss=0.08883, att_loss=0.2323, loss=0.2036, over 15945.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007093, over 41.00 utterances.], tot_loss[ctc_loss=0.07629, att_loss=0.2373, loss=0.2051, over 3240568.90 frames. 
utt_duration=1218 frames, utt_pad_proportion=0.06081, over 10651.69 utterances.], batch size: 41, lr: 5.43e-03, grad_scale: 8.0 2023-03-08 20:30:21,609 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=76625.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 20:30:41,671 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.428e+02 2.021e+02 2.452e+02 3.045e+02 6.796e+02, threshold=4.904e+02, percent-clipped=5.0 2023-03-08 20:30:47,768 INFO [train2.py:809] (2/4) Epoch 20, batch 950, loss[ctc_loss=0.09448, att_loss=0.2452, loss=0.2151, over 16257.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.007986, over 43.00 utterances.], tot_loss[ctc_loss=0.07628, att_loss=0.2375, loss=0.2053, over 3248901.60 frames. utt_duration=1229 frames, utt_pad_proportion=0.05867, over 10585.25 utterances.], batch size: 43, lr: 5.43e-03, grad_scale: 8.0 2023-03-08 20:31:02,301 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=76650.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:31:47,586 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=76679.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 20:32:08,844 INFO [train2.py:809] (2/4) Epoch 20, batch 1000, loss[ctc_loss=0.08739, att_loss=0.2561, loss=0.2224, over 17306.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02421, over 59.00 utterances.], tot_loss[ctc_loss=0.07589, att_loss=0.2373, loss=0.205, over 3257596.23 frames. utt_duration=1241 frames, utt_pad_proportion=0.05516, over 10514.54 utterances.], batch size: 59, lr: 5.43e-03, grad_scale: 8.0 2023-03-08 20:33:21,294 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.239e+02 1.956e+02 2.404e+02 2.887e+02 6.438e+02, threshold=4.808e+02, percent-clipped=2.0 2023-03-08 20:33:28,406 INFO [train2.py:809] (2/4) Epoch 20, batch 1050, loss[ctc_loss=0.08506, att_loss=0.252, loss=0.2186, over 17114.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.0155, over 56.00 utterances.], tot_loss[ctc_loss=0.07532, att_loss=0.2368, loss=0.2045, over 3270822.46 frames. 
utt_duration=1260 frames, utt_pad_proportion=0.04813, over 10399.12 utterances.], batch size: 56, lr: 5.43e-03, grad_scale: 8.0 2023-03-08 20:33:47,105 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4179, 5.3973, 5.2217, 3.1604, 5.1303, 4.9982, 4.9641, 3.2113], device='cuda:2'), covar=tensor([0.0081, 0.0081, 0.0215, 0.0918, 0.0087, 0.0152, 0.0202, 0.1176], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0101, 0.0102, 0.0111, 0.0083, 0.0110, 0.0098, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 20:34:07,210 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7120, 2.2421, 2.5378, 2.2777, 2.4765, 2.5028, 2.4658, 3.0644], device='cuda:2'), covar=tensor([0.1444, 0.3178, 0.2120, 0.1633, 0.1507, 0.1249, 0.2323, 0.1048], device='cuda:2'), in_proj_covar=tensor([0.0112, 0.0122, 0.0116, 0.0105, 0.0117, 0.0103, 0.0125, 0.0094], device='cuda:2'), out_proj_covar=tensor([8.3874e-05, 9.3394e-05, 9.1863e-05, 8.1702e-05, 8.6392e-05, 8.2185e-05, 9.2880e-05, 7.5008e-05], device='cuda:2') 2023-03-08 20:34:12,012 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4417, 2.1669, 2.0908, 2.1904, 2.5413, 2.2861, 2.3081, 2.9250], device='cuda:2'), covar=tensor([0.1613, 0.3255, 0.2376, 0.1601, 0.1513, 0.1401, 0.2361, 0.1241], device='cuda:2'), in_proj_covar=tensor([0.0112, 0.0122, 0.0117, 0.0105, 0.0117, 0.0103, 0.0125, 0.0094], device='cuda:2'), out_proj_covar=tensor([8.3973e-05, 9.3516e-05, 9.1957e-05, 8.1814e-05, 8.6504e-05, 8.2293e-05, 9.3011e-05, 7.5094e-05], device='cuda:2') 2023-03-08 20:34:48,261 INFO [train2.py:809] (2/4) Epoch 20, batch 1100, loss[ctc_loss=0.06467, att_loss=0.2143, loss=0.1844, over 15766.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.00917, over 38.00 utterances.], tot_loss[ctc_loss=0.07508, att_loss=0.2368, loss=0.2045, over 3271472.19 frames. utt_duration=1281 frames, utt_pad_proportion=0.04282, over 10231.08 utterances.], batch size: 38, lr: 5.42e-03, grad_scale: 8.0 2023-03-08 20:35:01,894 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0135, 4.9554, 4.6705, 2.8283, 4.7573, 4.6324, 4.2221, 2.5009], device='cuda:2'), covar=tensor([0.0121, 0.0119, 0.0292, 0.1131, 0.0113, 0.0217, 0.0387, 0.1695], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0101, 0.0102, 0.0110, 0.0083, 0.0110, 0.0098, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 20:36:02,081 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 2.008e+02 2.417e+02 3.031e+02 6.018e+02, threshold=4.833e+02, percent-clipped=3.0 2023-03-08 20:36:03,924 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=76839.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:36:08,259 INFO [train2.py:809] (2/4) Epoch 20, batch 1150, loss[ctc_loss=0.07623, att_loss=0.2428, loss=0.2095, over 17029.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.008117, over 51.00 utterances.], tot_loss[ctc_loss=0.07521, att_loss=0.237, loss=0.2046, over 3272605.76 frames. 
utt_duration=1271 frames, utt_pad_proportion=0.04596, over 10313.75 utterances.], batch size: 51, lr: 5.42e-03, grad_scale: 8.0 2023-03-08 20:36:20,537 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0818, 2.6483, 2.7521, 4.2159, 3.8988, 3.8141, 2.8563, 2.2342], device='cuda:2'), covar=tensor([0.0865, 0.2190, 0.1453, 0.0633, 0.0784, 0.0474, 0.1433, 0.2312], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0214, 0.0189, 0.0212, 0.0218, 0.0176, 0.0201, 0.0186], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 20:37:28,705 INFO [train2.py:809] (2/4) Epoch 20, batch 1200, loss[ctc_loss=0.06969, att_loss=0.2373, loss=0.2038, over 16475.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006011, over 46.00 utterances.], tot_loss[ctc_loss=0.07573, att_loss=0.2375, loss=0.2051, over 3272162.83 frames. utt_duration=1232 frames, utt_pad_proportion=0.05616, over 10635.65 utterances.], batch size: 46, lr: 5.42e-03, grad_scale: 8.0 2023-03-08 20:37:41,975 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=76900.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:38:21,816 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=76925.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 20:38:42,102 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.259e+02 1.866e+02 2.347e+02 2.914e+02 5.639e+02, threshold=4.693e+02, percent-clipped=1.0 2023-03-08 20:38:49,690 INFO [train2.py:809] (2/4) Epoch 20, batch 1250, loss[ctc_loss=0.07766, att_loss=0.2488, loss=0.2146, over 17057.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.009383, over 53.00 utterances.], tot_loss[ctc_loss=0.07533, att_loss=0.2365, loss=0.2043, over 3271228.20 frames. utt_duration=1239 frames, utt_pad_proportion=0.05586, over 10576.33 utterances.], batch size: 53, lr: 5.42e-03, grad_scale: 8.0 2023-03-08 20:39:02,601 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=76950.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:39:38,660 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=76973.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 20:39:48,077 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=76979.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 20:40:10,272 INFO [train2.py:809] (2/4) Epoch 20, batch 1300, loss[ctc_loss=0.07067, att_loss=0.245, loss=0.2101, over 17023.00 frames. utt_duration=1286 frames, utt_pad_proportion=0.00963, over 53.00 utterances.], tot_loss[ctc_loss=0.07525, att_loss=0.2363, loss=0.2041, over 3262208.93 frames. utt_duration=1251 frames, utt_pad_proportion=0.05384, over 10443.62 utterances.], batch size: 53, lr: 5.42e-03, grad_scale: 8.0 2023-03-08 20:40:19,513 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=76998.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:41:04,685 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77027.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 20:41:22,762 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.337e+02 2.046e+02 2.509e+02 3.001e+02 6.074e+02, threshold=5.017e+02, percent-clipped=4.0 2023-03-08 20:41:29,839 INFO [train2.py:809] (2/4) Epoch 20, batch 1350, loss[ctc_loss=0.07449, att_loss=0.222, loss=0.1925, over 16137.00 frames. 
utt_duration=1538 frames, utt_pad_proportion=0.005772, over 42.00 utterances.], tot_loss[ctc_loss=0.07539, att_loss=0.2359, loss=0.2038, over 3262836.38 frames. utt_duration=1242 frames, utt_pad_proportion=0.05714, over 10521.22 utterances.], batch size: 42, lr: 5.42e-03, grad_scale: 8.0 2023-03-08 20:42:48,966 INFO [train2.py:809] (2/4) Epoch 20, batch 1400, loss[ctc_loss=0.06669, att_loss=0.2405, loss=0.2057, over 17048.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008381, over 52.00 utterances.], tot_loss[ctc_loss=0.07563, att_loss=0.2359, loss=0.2039, over 3259459.08 frames. utt_duration=1232 frames, utt_pad_proportion=0.0616, over 10599.87 utterances.], batch size: 52, lr: 5.41e-03, grad_scale: 8.0 2023-03-08 20:42:54,070 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77095.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:43:15,373 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9887, 5.2203, 5.5434, 5.2881, 5.4163, 5.9283, 5.2208, 6.0446], device='cuda:2'), covar=tensor([0.0700, 0.0740, 0.0819, 0.1356, 0.1860, 0.0942, 0.0624, 0.0653], device='cuda:2'), in_proj_covar=tensor([0.0863, 0.0499, 0.0585, 0.0645, 0.0858, 0.0611, 0.0476, 0.0590], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 20:44:01,976 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.377e+02 1.974e+02 2.358e+02 2.926e+02 6.381e+02, threshold=4.717e+02, percent-clipped=2.0 2023-03-08 20:44:05,921 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2023-03-08 20:44:08,004 INFO [train2.py:809] (2/4) Epoch 20, batch 1450, loss[ctc_loss=0.06894, att_loss=0.228, loss=0.1962, over 16406.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.007284, over 44.00 utterances.], tot_loss[ctc_loss=0.0751, att_loss=0.2354, loss=0.2034, over 3259570.28 frames. utt_duration=1252 frames, utt_pad_proportion=0.05724, over 10427.72 utterances.], batch size: 44, lr: 5.41e-03, grad_scale: 8.0 2023-03-08 20:44:29,607 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77156.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 20:45:07,780 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5409, 2.1736, 2.4349, 2.5379, 2.8214, 2.6118, 2.4876, 3.1228], device='cuda:2'), covar=tensor([0.2196, 0.3109, 0.2175, 0.1444, 0.1350, 0.0925, 0.2111, 0.1118], device='cuda:2'), in_proj_covar=tensor([0.0113, 0.0122, 0.0116, 0.0105, 0.0117, 0.0101, 0.0125, 0.0093], device='cuda:2'), out_proj_covar=tensor([8.4490e-05, 9.3599e-05, 9.1947e-05, 8.1595e-05, 8.6381e-05, 8.1347e-05, 9.3374e-05, 7.4969e-05], device='cuda:2') 2023-03-08 20:45:27,439 INFO [train2.py:809] (2/4) Epoch 20, batch 1500, loss[ctc_loss=0.07727, att_loss=0.2501, loss=0.2155, over 17372.00 frames. utt_duration=1179 frames, utt_pad_proportion=0.01899, over 59.00 utterances.], tot_loss[ctc_loss=0.0754, att_loss=0.2359, loss=0.2038, over 3256240.72 frames. 
utt_duration=1232 frames, utt_pad_proportion=0.0635, over 10588.42 utterances.], batch size: 59, lr: 5.41e-03, grad_scale: 8.0 2023-03-08 20:45:31,978 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77195.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:45:57,182 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77211.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:46:34,113 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2023-03-08 20:46:40,577 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.285e+02 1.945e+02 2.357e+02 2.826e+02 4.354e+02, threshold=4.714e+02, percent-clipped=0.0 2023-03-08 20:46:46,772 INFO [train2.py:809] (2/4) Epoch 20, batch 1550, loss[ctc_loss=0.08849, att_loss=0.254, loss=0.2209, over 17141.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.01389, over 56.00 utterances.], tot_loss[ctc_loss=0.07587, att_loss=0.237, loss=0.2047, over 3261627.65 frames. utt_duration=1204 frames, utt_pad_proportion=0.06701, over 10848.72 utterances.], batch size: 56, lr: 5.41e-03, grad_scale: 8.0 2023-03-08 20:47:29,718 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6058, 2.1926, 2.3562, 2.4524, 2.7494, 2.5024, 2.5021, 3.1456], device='cuda:2'), covar=tensor([0.2654, 0.3936, 0.2894, 0.1936, 0.1730, 0.1344, 0.2674, 0.2049], device='cuda:2'), in_proj_covar=tensor([0.0117, 0.0124, 0.0119, 0.0107, 0.0119, 0.0104, 0.0127, 0.0095], device='cuda:2'), out_proj_covar=tensor([8.6777e-05, 9.4969e-05, 9.3537e-05, 8.3091e-05, 8.7929e-05, 8.3297e-05, 9.4958e-05, 7.6389e-05], device='cuda:2') 2023-03-08 20:47:34,326 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77272.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:47:35,830 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77273.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:47:42,416 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77277.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:48:05,772 INFO [train2.py:809] (2/4) Epoch 20, batch 1600, loss[ctc_loss=0.1209, att_loss=0.2666, loss=0.2375, over 13516.00 frames. utt_duration=371.7 frames, utt_pad_proportion=0.3501, over 146.00 utterances.], tot_loss[ctc_loss=0.07553, att_loss=0.2369, loss=0.2046, over 3263212.86 frames. utt_duration=1209 frames, utt_pad_proportion=0.06607, over 10808.09 utterances.], batch size: 146, lr: 5.41e-03, grad_scale: 8.0 2023-03-08 20:48:58,741 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-03-08 20:49:11,774 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77334.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:49:18,253 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.314e+02 1.917e+02 2.281e+02 2.951e+02 6.682e+02, threshold=4.563e+02, percent-clipped=3.0 2023-03-08 20:49:18,715 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77338.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:49:25,151 INFO [train2.py:809] (2/4) Epoch 20, batch 1650, loss[ctc_loss=0.07959, att_loss=0.2236, loss=0.1948, over 16010.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.007063, over 40.00 utterances.], tot_loss[ctc_loss=0.07518, att_loss=0.2363, loss=0.2041, over 3258410.76 frames. 
utt_duration=1224 frames, utt_pad_proportion=0.0625, over 10660.08 utterances.], batch size: 40, lr: 5.41e-03, grad_scale: 8.0 2023-03-08 20:50:29,685 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0777, 5.3728, 4.9219, 5.4014, 4.7625, 5.0915, 5.5067, 5.3061], device='cuda:2'), covar=tensor([0.0508, 0.0298, 0.0750, 0.0284, 0.0407, 0.0192, 0.0202, 0.0178], device='cuda:2'), in_proj_covar=tensor([0.0380, 0.0313, 0.0359, 0.0336, 0.0313, 0.0233, 0.0294, 0.0278], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 20:50:44,523 INFO [train2.py:809] (2/4) Epoch 20, batch 1700, loss[ctc_loss=0.08499, att_loss=0.2507, loss=0.2175, over 17303.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.0227, over 59.00 utterances.], tot_loss[ctc_loss=0.0758, att_loss=0.2374, loss=0.2051, over 3272807.06 frames. utt_duration=1219 frames, utt_pad_proportion=0.06094, over 10754.87 utterances.], batch size: 59, lr: 5.40e-03, grad_scale: 8.0 2023-03-08 20:51:30,377 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1099, 6.3672, 5.8677, 6.0903, 6.0637, 5.5553, 5.8869, 5.6249], device='cuda:2'), covar=tensor([0.1193, 0.0803, 0.0845, 0.0688, 0.0753, 0.1475, 0.2010, 0.1906], device='cuda:2'), in_proj_covar=tensor([0.0516, 0.0604, 0.0457, 0.0446, 0.0430, 0.0467, 0.0606, 0.0525], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 20:51:57,823 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.333e+02 1.985e+02 2.370e+02 2.830e+02 5.351e+02, threshold=4.740e+02, percent-clipped=1.0 2023-03-08 20:52:04,402 INFO [train2.py:809] (2/4) Epoch 20, batch 1750, loss[ctc_loss=0.09908, att_loss=0.2549, loss=0.2237, over 16611.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.006231, over 47.00 utterances.], tot_loss[ctc_loss=0.07574, att_loss=0.237, loss=0.2047, over 3274067.32 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05446, over 10557.71 utterances.], batch size: 47, lr: 5.40e-03, grad_scale: 8.0 2023-03-08 20:52:18,558 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77451.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 20:52:26,276 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8197, 6.1541, 5.5493, 5.8573, 5.8165, 5.2523, 5.4703, 5.3330], device='cuda:2'), covar=tensor([0.1397, 0.0889, 0.0955, 0.0801, 0.0911, 0.1583, 0.2660, 0.2432], device='cuda:2'), in_proj_covar=tensor([0.0521, 0.0609, 0.0461, 0.0450, 0.0433, 0.0471, 0.0613, 0.0529], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 20:52:46,499 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7447, 5.1765, 4.9837, 5.1624, 5.1806, 4.8607, 3.4307, 5.0624], device='cuda:2'), covar=tensor([0.0107, 0.0097, 0.0119, 0.0066, 0.0070, 0.0102, 0.0710, 0.0147], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0085, 0.0108, 0.0067, 0.0073, 0.0084, 0.0103, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 20:52:53,396 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1824, 4.5228, 4.7545, 4.7522, 2.9507, 4.6624, 2.7097, 2.1057], device='cuda:2'), covar=tensor([0.0365, 0.0213, 0.0570, 0.0181, 0.1435, 0.0137, 0.1506, 0.1602], device='cuda:2'), in_proj_covar=tensor([0.0186, 0.0158, 0.0259, 0.0154, 0.0221, 0.0138, 0.0232, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 20:53:21,537 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6652, 3.1574, 3.5957, 3.0583, 3.6145, 4.6893, 4.4796, 3.2659], device='cuda:2'), covar=tensor([0.0310, 0.1597, 0.1314, 0.1321, 0.1082, 0.0857, 0.0575, 0.1288], device='cuda:2'), in_proj_covar=tensor([0.0242, 0.0243, 0.0275, 0.0215, 0.0264, 0.0358, 0.0258, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 20:53:24,304 INFO [train2.py:809] (2/4) Epoch 20, batch 1800, loss[ctc_loss=0.06686, att_loss=0.2077, loss=0.1795, over 11440.00 frames. utt_duration=1832 frames, utt_pad_proportion=0.1751, over 25.00 utterances.], tot_loss[ctc_loss=0.07563, att_loss=0.2367, loss=0.2045, over 3263601.61 frames. 
utt_duration=1248 frames, utt_pad_proportion=0.05666, over 10471.29 utterances.], batch size: 25, lr: 5.40e-03, grad_scale: 8.0 2023-03-08 20:53:29,177 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77495.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:53:49,370 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2506, 5.6150, 5.1717, 5.6593, 5.0791, 5.2141, 5.7053, 5.5542], device='cuda:2'), covar=tensor([0.0505, 0.0248, 0.0669, 0.0227, 0.0339, 0.0167, 0.0198, 0.0155], device='cuda:2'), in_proj_covar=tensor([0.0385, 0.0315, 0.0362, 0.0340, 0.0316, 0.0234, 0.0297, 0.0281], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 20:54:13,345 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1839, 5.4168, 5.3829, 5.3723, 5.4756, 5.3755, 5.1746, 4.8970], device='cuda:2'), covar=tensor([0.0983, 0.0526, 0.0289, 0.0420, 0.0296, 0.0305, 0.0334, 0.0356], device='cuda:2'), in_proj_covar=tensor([0.0527, 0.0359, 0.0340, 0.0351, 0.0414, 0.0428, 0.0353, 0.0393], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 20:54:37,624 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.274e+02 1.944e+02 2.387e+02 2.834e+02 5.542e+02, threshold=4.773e+02, percent-clipped=3.0 2023-03-08 20:54:44,320 INFO [train2.py:809] (2/4) Epoch 20, batch 1850, loss[ctc_loss=0.08822, att_loss=0.2474, loss=0.2156, over 16960.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007164, over 50.00 utterances.], tot_loss[ctc_loss=0.07565, att_loss=0.2365, loss=0.2043, over 3266744.49 frames. utt_duration=1233 frames, utt_pad_proportion=0.05748, over 10611.23 utterances.], batch size: 50, lr: 5.40e-03, grad_scale: 8.0 2023-03-08 20:54:45,980 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77543.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:55:23,581 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77567.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:56:03,933 INFO [train2.py:809] (2/4) Epoch 20, batch 1900, loss[ctc_loss=0.09617, att_loss=0.2597, loss=0.227, over 17335.00 frames. utt_duration=1177 frames, utt_pad_proportion=0.02255, over 59.00 utterances.], tot_loss[ctc_loss=0.07599, att_loss=0.237, loss=0.2048, over 3272081.59 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05504, over 10547.18 utterances.], batch size: 59, lr: 5.40e-03, grad_scale: 8.0 2023-03-08 20:56:12,205 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9785, 2.5232, 3.0325, 3.8010, 3.4601, 3.4853, 2.6712, 2.2287], device='cuda:2'), covar=tensor([0.0741, 0.2057, 0.0862, 0.0618, 0.0884, 0.0523, 0.1466, 0.2002], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0217, 0.0188, 0.0216, 0.0219, 0.0176, 0.0203, 0.0186], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 20:56:42,742 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77616.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:57:03,863 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77629.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:57:10,165 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77633.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:57:17,662 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.953e+02 2.465e+02 3.107e+02 2.003e+03, threshold=4.931e+02, percent-clipped=5.0 2023-03-08 20:57:24,504 INFO [train2.py:809] (2/4) Epoch 20, batch 1950, loss[ctc_loss=0.06641, att_loss=0.2244, loss=0.1928, over 15900.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.008334, over 39.00 utterances.], tot_loss[ctc_loss=0.07607, att_loss=0.2371, loss=0.2049, over 3268894.50 frames. utt_duration=1228 frames, utt_pad_proportion=0.05946, over 10657.16 utterances.], batch size: 39, lr: 5.39e-03, grad_scale: 8.0 2023-03-08 20:58:19,905 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77677.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:58:19,933 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77677.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 20:58:30,319 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.85 vs. limit=2.0 2023-03-08 20:58:44,126 INFO [train2.py:809] (2/4) Epoch 20, batch 2000, loss[ctc_loss=0.06045, att_loss=0.2395, loss=0.2037, over 17012.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.009135, over 51.00 utterances.], tot_loss[ctc_loss=0.07611, att_loss=0.2369, loss=0.2047, over 3269963.03 frames. utt_duration=1241 frames, utt_pad_proportion=0.05603, over 10551.41 utterances.], batch size: 51, lr: 5.39e-03, grad_scale: 8.0 2023-03-08 20:59:03,341 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4589, 2.8312, 4.9459, 3.8032, 2.9247, 4.2233, 4.6176, 4.5064], device='cuda:2'), covar=tensor([0.0263, 0.1495, 0.0190, 0.1038, 0.1843, 0.0257, 0.0163, 0.0290], device='cuda:2'), in_proj_covar=tensor([0.0186, 0.0240, 0.0179, 0.0308, 0.0265, 0.0209, 0.0167, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 20:59:14,963 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. 
limit=2.0 2023-03-08 20:59:22,189 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1137, 5.5054, 5.0213, 5.5110, 4.9151, 5.0559, 5.6345, 5.4169], device='cuda:2'), covar=tensor([0.0583, 0.0291, 0.0761, 0.0303, 0.0395, 0.0215, 0.0248, 0.0186], device='cuda:2'), in_proj_covar=tensor([0.0382, 0.0313, 0.0361, 0.0338, 0.0313, 0.0234, 0.0295, 0.0279], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 20:59:43,380 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1717, 5.1548, 4.9062, 2.9911, 4.8708, 4.7478, 4.4352, 3.0834], device='cuda:2'), covar=tensor([0.0096, 0.0088, 0.0272, 0.0929, 0.0100, 0.0185, 0.0279, 0.1201], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0101, 0.0101, 0.0110, 0.0083, 0.0109, 0.0098, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 20:59:57,578 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.253e+02 2.073e+02 2.649e+02 3.244e+02 1.218e+03, threshold=5.298e+02, percent-clipped=5.0 2023-03-08 20:59:57,977 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77738.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:00:03,836 INFO [train2.py:809] (2/4) Epoch 20, batch 2050, loss[ctc_loss=0.06977, att_loss=0.2551, loss=0.218, over 17058.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.007945, over 52.00 utterances.], tot_loss[ctc_loss=0.07638, att_loss=0.2372, loss=0.205, over 3274932.45 frames. utt_duration=1261 frames, utt_pad_proportion=0.0515, over 10399.10 utterances.], batch size: 52, lr: 5.39e-03, grad_scale: 8.0 2023-03-08 21:00:17,886 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77751.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 21:01:23,890 INFO [train2.py:809] (2/4) Epoch 20, batch 2100, loss[ctc_loss=0.08847, att_loss=0.2506, loss=0.2181, over 17316.00 frames. utt_duration=1101 frames, utt_pad_proportion=0.03782, over 63.00 utterances.], tot_loss[ctc_loss=0.07642, att_loss=0.2371, loss=0.205, over 3272531.68 frames. utt_duration=1258 frames, utt_pad_proportion=0.05336, over 10421.18 utterances.], batch size: 63, lr: 5.39e-03, grad_scale: 8.0 2023-03-08 21:01:34,657 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77799.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:02:36,993 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 1.913e+02 2.227e+02 2.675e+02 5.227e+02, threshold=4.453e+02, percent-clipped=0.0 2023-03-08 21:02:43,921 INFO [train2.py:809] (2/4) Epoch 20, batch 2150, loss[ctc_loss=0.05483, att_loss=0.2083, loss=0.1776, over 15652.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.00864, over 37.00 utterances.], tot_loss[ctc_loss=0.07734, att_loss=0.2378, loss=0.2057, over 3266664.06 frames. 
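
The optim.py:369 lines report grad-norm quartiles (min, 25%, median, 75%, max), a clipping threshold, and how many recent batches were clipped. In the entries here the threshold equals Clipping_scale times the median grad-norm (2.0 * 2.649e+02 = 5.298e+02 above), which suggests the threshold is derived from a running median of recent gradient norms. A hedged sketch of that bookkeeping, not the actual optim.py implementation:

    # Sketch: clipping threshold as clipping_scale * median grad-norm over a
    # window of recent batches, with quartiles reported as in the log.
    import torch

    def grad_norm_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # quartiles: min, 25%, median, 75%, max
        qs = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * qs[2]                 # scale * median
        pct_clipped = (grad_norms > threshold).float().mean() * 100.0
        return qs, threshold, pct_clipped

    norms = torch.tensor([125.3, 207.3, 264.9, 324.4, 1218.0])  # illustrative values
    qs, thr, pct = grad_norm_stats(norms)
    print(qs.tolist(), thr.item(), pct.item())
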
utt_duration=1238 frames, utt_pad_proportion=0.05809, over 10570.21 utterances.], batch size: 37, lr: 5.39e-03, grad_scale: 8.0 2023-03-08 21:03:24,195 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77867.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:03:53,017 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1283, 5.1061, 4.9402, 2.1852, 1.9385, 2.5248, 2.3603, 3.8600], device='cuda:2'), covar=tensor([0.0662, 0.0244, 0.0212, 0.4827, 0.5756, 0.3075, 0.3307, 0.1672], device='cuda:2'), in_proj_covar=tensor([0.0354, 0.0270, 0.0266, 0.0244, 0.0344, 0.0336, 0.0251, 0.0368], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-08 21:04:04,224 INFO [train2.py:809] (2/4) Epoch 20, batch 2200, loss[ctc_loss=0.1412, att_loss=0.2768, loss=0.2497, over 14078.00 frames. utt_duration=389.9 frames, utt_pad_proportion=0.3219, over 145.00 utterances.], tot_loss[ctc_loss=0.07859, att_loss=0.2386, loss=0.2066, over 3258700.23 frames. utt_duration=1190 frames, utt_pad_proportion=0.07275, over 10963.81 utterances.], batch size: 145, lr: 5.39e-03, grad_scale: 8.0 2023-03-08 21:04:21,040 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.15 vs. limit=5.0 2023-03-08 21:04:32,554 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.87 vs. limit=5.0 2023-03-08 21:04:41,056 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77915.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:05:03,614 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77929.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:05:10,174 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77933.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:05:17,370 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.298e+02 1.939e+02 2.291e+02 2.587e+02 5.643e+02, threshold=4.582e+02, percent-clipped=3.0 2023-03-08 21:05:23,445 INFO [train2.py:809] (2/4) Epoch 20, batch 2250, loss[ctc_loss=0.06573, att_loss=0.2241, loss=0.1924, over 16150.00 frames. utt_duration=1539 frames, utt_pad_proportion=0.004894, over 42.00 utterances.], tot_loss[ctc_loss=0.07833, att_loss=0.2386, loss=0.2065, over 3260353.06 frames. utt_duration=1217 frames, utt_pad_proportion=0.06555, over 10727.69 utterances.], batch size: 42, lr: 5.38e-03, grad_scale: 8.0 2023-03-08 21:05:42,161 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2023-03-08 21:06:11,734 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77972.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:06:19,581 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77977.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:06:26,478 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77981.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:06:43,280 INFO [train2.py:809] (2/4) Epoch 20, batch 2300, loss[ctc_loss=0.07556, att_loss=0.2365, loss=0.2044, over 16118.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006145, over 42.00 utterances.], tot_loss[ctc_loss=0.07884, att_loss=0.239, loss=0.2069, over 3261797.35 frames. 
utt_duration=1212 frames, utt_pad_proportion=0.06662, over 10776.95 utterances.], batch size: 42, lr: 5.38e-03, grad_scale: 8.0 2023-03-08 21:07:53,498 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=78033.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:07:58,314 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5992, 4.9861, 4.8787, 4.8817, 5.0196, 4.6711, 3.4368, 4.9137], device='cuda:2'), covar=tensor([0.0127, 0.0124, 0.0139, 0.0105, 0.0106, 0.0129, 0.0758, 0.0226], device='cuda:2'), in_proj_covar=tensor([0.0089, 0.0084, 0.0106, 0.0066, 0.0072, 0.0082, 0.0101, 0.0106], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 21:08:01,054 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.567e+02 2.189e+02 2.492e+02 2.915e+02 6.479e+02, threshold=4.984e+02, percent-clipped=5.0 2023-03-08 21:08:07,066 INFO [train2.py:809] (2/4) Epoch 20, batch 2350, loss[ctc_loss=0.08514, att_loss=0.2567, loss=0.2224, over 17361.00 frames. utt_duration=880.6 frames, utt_pad_proportion=0.07792, over 79.00 utterances.], tot_loss[ctc_loss=0.07811, att_loss=0.2385, loss=0.2064, over 3270160.89 frames. utt_duration=1221 frames, utt_pad_proportion=0.06165, over 10723.31 utterances.], batch size: 79, lr: 5.38e-03, grad_scale: 8.0 2023-03-08 21:08:21,604 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6925, 4.9949, 4.8944, 4.9117, 5.0525, 4.7244, 3.4900, 4.9728], device='cuda:2'), covar=tensor([0.0103, 0.0110, 0.0113, 0.0092, 0.0093, 0.0115, 0.0702, 0.0185], device='cuda:2'), in_proj_covar=tensor([0.0089, 0.0084, 0.0106, 0.0066, 0.0072, 0.0082, 0.0101, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 21:09:26,270 INFO [train2.py:809] (2/4) Epoch 20, batch 2400, loss[ctc_loss=0.06942, att_loss=0.2096, loss=0.1815, over 15886.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009175, over 39.00 utterances.], tot_loss[ctc_loss=0.07728, att_loss=0.2379, loss=0.2058, over 3270424.59 frames. utt_duration=1216 frames, utt_pad_proportion=0.06298, over 10770.21 utterances.], batch size: 39, lr: 5.38e-03, grad_scale: 8.0 2023-03-08 21:10:39,631 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.436e+02 2.004e+02 2.481e+02 3.189e+02 5.932e+02, threshold=4.963e+02, percent-clipped=2.0 2023-03-08 21:10:46,372 INFO [train2.py:809] (2/4) Epoch 20, batch 2450, loss[ctc_loss=0.06868, att_loss=0.2073, loss=0.1796, over 15341.00 frames. utt_duration=1755 frames, utt_pad_proportion=0.01182, over 35.00 utterances.], tot_loss[ctc_loss=0.07733, att_loss=0.2384, loss=0.2061, over 3272114.09 frames. utt_duration=1208 frames, utt_pad_proportion=0.06532, over 10852.63 utterances.], batch size: 35, lr: 5.38e-03, grad_scale: 8.0 2023-03-08 21:12:06,741 INFO [train2.py:809] (2/4) Epoch 20, batch 2500, loss[ctc_loss=0.07671, att_loss=0.2502, loss=0.2155, over 16775.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005993, over 48.00 utterances.], tot_loss[ctc_loss=0.07643, att_loss=0.2375, loss=0.2053, over 3273586.55 frames. 
utt_duration=1212 frames, utt_pad_proportion=0.06399, over 10819.26 utterances.], batch size: 48, lr: 5.38e-03, grad_scale: 8.0 2023-03-08 21:12:29,862 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6715, 3.0815, 3.6283, 4.7005, 4.1327, 4.0300, 3.1328, 2.6622], device='cuda:2'), covar=tensor([0.0582, 0.1893, 0.0869, 0.0405, 0.0739, 0.0450, 0.1358, 0.1844], device='cuda:2'), in_proj_covar=tensor([0.0179, 0.0217, 0.0189, 0.0215, 0.0218, 0.0174, 0.0203, 0.0187], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:12:46,175 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4159, 2.4318, 4.9493, 3.8278, 2.9163, 4.1767, 4.6104, 4.5035], device='cuda:2'), covar=tensor([0.0265, 0.1851, 0.0133, 0.0981, 0.1788, 0.0276, 0.0188, 0.0266], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0242, 0.0179, 0.0311, 0.0266, 0.0210, 0.0169, 0.0197], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:13:01,237 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7333, 4.9715, 4.5496, 4.9919, 4.4102, 4.6330, 5.0948, 4.9293], device='cuda:2'), covar=tensor([0.0584, 0.0315, 0.0786, 0.0351, 0.0450, 0.0372, 0.0243, 0.0198], device='cuda:2'), in_proj_covar=tensor([0.0380, 0.0312, 0.0357, 0.0336, 0.0312, 0.0233, 0.0293, 0.0277], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 21:13:12,163 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=78232.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:13:21,001 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.184e+02 1.956e+02 2.237e+02 2.783e+02 5.149e+02, threshold=4.474e+02, percent-clipped=1.0 2023-03-08 21:13:27,566 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. limit=2.0 2023-03-08 21:13:27,749 INFO [train2.py:809] (2/4) Epoch 20, batch 2550, loss[ctc_loss=0.08338, att_loss=0.2402, loss=0.2088, over 16526.00 frames. utt_duration=1470 frames, utt_pad_proportion=0.007142, over 45.00 utterances.], tot_loss[ctc_loss=0.07732, att_loss=0.2381, loss=0.2059, over 3260017.09 frames. utt_duration=1176 frames, utt_pad_proportion=0.07623, over 11106.36 utterances.], batch size: 45, lr: 5.37e-03, grad_scale: 8.0 2023-03-08 21:14:15,798 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=78272.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:14:30,346 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1555, 5.4367, 4.8015, 5.5527, 4.9251, 5.0911, 5.5410, 5.3272], device='cuda:2'), covar=tensor([0.0460, 0.0246, 0.0936, 0.0227, 0.0363, 0.0195, 0.0255, 0.0194], device='cuda:2'), in_proj_covar=tensor([0.0381, 0.0314, 0.0360, 0.0336, 0.0313, 0.0235, 0.0294, 0.0278], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 21:14:47,848 INFO [train2.py:809] (2/4) Epoch 20, batch 2600, loss[ctc_loss=0.1156, att_loss=0.2558, loss=0.2278, over 17394.00 frames. utt_duration=1106 frames, utt_pad_proportion=0.03308, over 63.00 utterances.], tot_loss[ctc_loss=0.07731, att_loss=0.2379, loss=0.2058, over 3261077.23 frames. 
utt_duration=1176 frames, utt_pad_proportion=0.07548, over 11108.80 utterances.], batch size: 63, lr: 5.37e-03, grad_scale: 8.0 2023-03-08 21:14:49,870 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=78293.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 21:15:21,902 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6945, 3.1512, 3.7162, 3.2143, 3.6273, 4.7102, 4.5846, 3.5187], device='cuda:2'), covar=tensor([0.0310, 0.1564, 0.1212, 0.1219, 0.1105, 0.0696, 0.0512, 0.1073], device='cuda:2'), in_proj_covar=tensor([0.0246, 0.0244, 0.0280, 0.0217, 0.0267, 0.0362, 0.0259, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 21:15:32,445 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=78320.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:15:53,563 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=78333.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:16:00,906 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.986e+02 2.327e+02 2.880e+02 6.602e+02, threshold=4.653e+02, percent-clipped=4.0 2023-03-08 21:16:07,832 INFO [train2.py:809] (2/4) Epoch 20, batch 2650, loss[ctc_loss=0.06156, att_loss=0.2216, loss=0.1896, over 16322.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006779, over 45.00 utterances.], tot_loss[ctc_loss=0.07658, att_loss=0.2371, loss=0.205, over 3264720.76 frames. utt_duration=1208 frames, utt_pad_proportion=0.06741, over 10820.47 utterances.], batch size: 45, lr: 5.37e-03, grad_scale: 8.0 2023-03-08 21:16:09,764 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8354, 2.3021, 2.8807, 2.6277, 2.8319, 2.9026, 2.6412, 3.2682], device='cuda:2'), covar=tensor([0.1552, 0.3318, 0.1937, 0.1572, 0.1635, 0.1359, 0.2585, 0.0953], device='cuda:2'), in_proj_covar=tensor([0.0114, 0.0122, 0.0115, 0.0105, 0.0118, 0.0103, 0.0126, 0.0095], device='cuda:2'), out_proj_covar=tensor([8.5693e-05, 9.3876e-05, 9.1799e-05, 8.2069e-05, 8.7099e-05, 8.2324e-05, 9.4136e-05, 7.6158e-05], device='cuda:2') 2023-03-08 21:17:10,074 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=78381.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:17:27,554 INFO [train2.py:809] (2/4) Epoch 20, batch 2700, loss[ctc_loss=0.07986, att_loss=0.2495, loss=0.2155, over 16535.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006752, over 45.00 utterances.], tot_loss[ctc_loss=0.07667, att_loss=0.2375, loss=0.2054, over 3270112.86 frames. 
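
The zipformer.py:625 lines record, per encoder stack, a warmup window (warmup_begin/warmup_end in batches), the current batch count, and which layers are randomly dropped for this step. By batch ~78000 almost all entries show num_to_drop=0 with only an occasional single layer dropped, consistent with a layer-dropout probability that decays over the warmup window but keeps a small residual value afterwards. A hedged sketch of such a schedule (the exact zipformer.py rule may differ):

    # Sketch: choose layers to drop for one batch given a warmup window.
    # Assumed rule: higher drop probability inside [warmup_begin, warmup_end],
    # small residual probability afterwards. Illustrative only.
    import random

    def layers_to_drop(num_layers: int, batch_count: float,
                       warmup_begin: float, warmup_end: float,
                       warm_p: float = 0.5, residual_p: float = 0.025) -> set:
        if batch_count < warmup_end:
            # Probability ramps down from warm_p to residual_p across the window.
            frac = max(0.0, (batch_count - warmup_begin) / (warmup_end - warmup_begin))
            p = warm_p + (residual_p - warm_p) * min(1.0, frac)
        else:
            p = residual_p
        return {i for i in range(num_layers) if random.random() < p}

    # Usually the empty set this late in training, matching num_to_drop=0 above.
    print(layers_to_drop(4, batch_count=78293.0, warmup_begin=3333.3, warmup_end=4000.0))
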
utt_duration=1195 frames, utt_pad_proportion=0.06908, over 10959.79 utterances.], batch size: 45, lr: 5.37e-03, grad_scale: 8.0 2023-03-08 21:17:35,519 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8425, 6.1340, 5.6224, 5.8599, 5.7669, 5.3852, 5.4944, 5.3350], device='cuda:2'), covar=tensor([0.1302, 0.0846, 0.0921, 0.0842, 0.0882, 0.1406, 0.2322, 0.2246], device='cuda:2'), in_proj_covar=tensor([0.0510, 0.0595, 0.0450, 0.0447, 0.0419, 0.0455, 0.0601, 0.0513], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 21:18:41,449 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.385e+02 1.853e+02 2.270e+02 2.670e+02 9.199e+02, threshold=4.540e+02, percent-clipped=5.0 2023-03-08 21:18:47,884 INFO [train2.py:809] (2/4) Epoch 20, batch 2750, loss[ctc_loss=0.0777, att_loss=0.2451, loss=0.2116, over 17068.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.007983, over 52.00 utterances.], tot_loss[ctc_loss=0.07627, att_loss=0.2376, loss=0.2053, over 3274412.59 frames. utt_duration=1213 frames, utt_pad_proportion=0.06311, over 10807.28 utterances.], batch size: 52, lr: 5.37e-03, grad_scale: 8.0 2023-03-08 21:19:17,747 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=78460.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:20:08,380 INFO [train2.py:809] (2/4) Epoch 20, batch 2800, loss[ctc_loss=0.08293, att_loss=0.2436, loss=0.2114, over 17127.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01461, over 56.00 utterances.], tot_loss[ctc_loss=0.07625, att_loss=0.2374, loss=0.2051, over 3279707.09 frames. utt_duration=1232 frames, utt_pad_proportion=0.0577, over 10661.97 utterances.], batch size: 56, lr: 5.37e-03, grad_scale: 8.0 2023-03-08 21:20:54,641 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.98 vs. limit=5.0 2023-03-08 21:20:55,795 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=78521.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:20:57,325 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0710, 5.3971, 5.3068, 5.2578, 5.3676, 5.3476, 5.0226, 4.8148], device='cuda:2'), covar=tensor([0.0971, 0.0495, 0.0309, 0.0503, 0.0293, 0.0302, 0.0411, 0.0340], device='cuda:2'), in_proj_covar=tensor([0.0519, 0.0355, 0.0338, 0.0347, 0.0410, 0.0425, 0.0350, 0.0389], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 21:21:21,797 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.455e+02 2.103e+02 2.542e+02 3.045e+02 6.490e+02, threshold=5.083e+02, percent-clipped=4.0 2023-03-08 21:21:28,201 INFO [train2.py:809] (2/4) Epoch 20, batch 2850, loss[ctc_loss=0.07884, att_loss=0.2228, loss=0.194, over 16013.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.006877, over 40.00 utterances.], tot_loss[ctc_loss=0.07625, att_loss=0.237, loss=0.2049, over 3277281.93 frames. 
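
Each loss line ends with the current grad_scale, which points to fp16 training with dynamic loss scaling: the scale sits at 8.0 through batch 2800 above and doubles to 16.0 from batch 2850 onward in the entries that follow. That is the usual behaviour of a dynamic scaler, which grows the scale after a stretch of overflow-free steps and cuts it back when an inf/nan gradient appears. A generic sketch of that update rule, not necessarily the scaler train2.py uses:

    # Sketch: dynamic loss scaling for fp16 training. grad_scale doubles after
    # `growth_interval` consecutive finite steps and is halved on overflow.
    class SimpleGradScaler:
        def __init__(self, init_scale: float = 8.0, growth_interval: int = 2000):
            self.scale = init_scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> float:
            if found_inf:
                self.scale /= 2.0          # back off on overflow
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps >= self.growth_interval:
                    self.scale *= 2.0      # e.g. 8.0 -> 16.0 as seen around batch 2850
                    self._good_steps = 0
            return self.scale
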
utt_duration=1241 frames, utt_pad_proportion=0.05676, over 10577.41 utterances.], batch size: 40, lr: 5.36e-03, grad_scale: 16.0 2023-03-08 21:21:51,237 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=78556.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:22:42,485 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=78588.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 21:22:48,498 INFO [train2.py:809] (2/4) Epoch 20, batch 2900, loss[ctc_loss=0.1109, att_loss=0.252, loss=0.2238, over 17035.00 frames. utt_duration=1338 frames, utt_pad_proportion=0.006871, over 51.00 utterances.], tot_loss[ctc_loss=0.07673, att_loss=0.2368, loss=0.2048, over 3275290.19 frames. utt_duration=1238 frames, utt_pad_proportion=0.05779, over 10593.51 utterances.], batch size: 51, lr: 5.36e-03, grad_scale: 16.0 2023-03-08 21:23:30,352 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=78617.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:24:02,855 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.232e+02 1.965e+02 2.501e+02 3.006e+02 5.766e+02, threshold=5.002e+02, percent-clipped=1.0 2023-03-08 21:24:08,863 INFO [train2.py:809] (2/4) Epoch 20, batch 2950, loss[ctc_loss=0.07154, att_loss=0.242, loss=0.2079, over 17007.00 frames. utt_duration=1310 frames, utt_pad_proportion=0.009208, over 52.00 utterances.], tot_loss[ctc_loss=0.0768, att_loss=0.2368, loss=0.2048, over 3272481.72 frames. utt_duration=1247 frames, utt_pad_proportion=0.05677, over 10506.49 utterances.], batch size: 52, lr: 5.36e-03, grad_scale: 16.0 2023-03-08 21:25:07,937 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2023-03-08 21:25:28,471 INFO [train2.py:809] (2/4) Epoch 20, batch 3000, loss[ctc_loss=0.07441, att_loss=0.2383, loss=0.2055, over 16166.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.007087, over 41.00 utterances.], tot_loss[ctc_loss=0.0769, att_loss=0.2365, loss=0.2046, over 3258340.13 frames. utt_duration=1237 frames, utt_pad_proportion=0.06351, over 10551.98 utterances.], batch size: 41, lr: 5.36e-03, grad_scale: 16.0 2023-03-08 21:25:28,472 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 21:25:42,049 INFO [train2.py:843] (2/4) Epoch 20, validation: ctc_loss=0.04025, att_loss=0.2341, loss=0.1953, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 21:25:42,049 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 21:26:18,180 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-08 21:26:18,396 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.07 vs. limit=5.0 2023-03-08 21:26:55,738 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.291e+02 1.875e+02 2.186e+02 2.641e+02 4.823e+02, threshold=4.373e+02, percent-clipped=0.0 2023-03-08 21:27:02,051 INFO [train2.py:809] (2/4) Epoch 20, batch 3050, loss[ctc_loss=0.06048, att_loss=0.2071, loss=0.1778, over 15903.00 frames. utt_duration=1633 frames, utt_pad_proportion=0.008147, over 39.00 utterances.], tot_loss[ctc_loss=0.07644, att_loss=0.2368, loss=0.2047, over 3266692.00 frames. 
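
The validation entries (here at Epoch 20, batch 3000) report losses averaged over the full dev set of 944034 frames and 5567 utterances, and are followed by the peak GPU memory figure. The tot_loss[...] numbers in the training lines are instead running statistics; the fractional frame counts such as "over 3258340.13 frames" suggest a decaying, frame-weighted average rather than a plain sum. A hedged sketch of such an aggregate (the exact decay rule in train2.py may differ):

    # Sketch: frame-weighted running average of the loss with exponential
    # forgetting, which produces fractional "over N frames" totals like those
    # logged above. Illustrative only.
    class RunningLoss:
        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss: float, num_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
            self.frame_sum = self.decay * self.frame_sum + num_frames
            return self.loss_sum / self.frame_sum   # value reported as tot_loss
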
utt_duration=1251 frames, utt_pad_proportion=0.05634, over 10454.84 utterances.], batch size: 39, lr: 5.36e-03, grad_scale: 16.0 2023-03-08 21:27:03,949 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1081, 4.9870, 4.7082, 2.9826, 4.7821, 4.6082, 4.3235, 2.8313], device='cuda:2'), covar=tensor([0.0105, 0.0098, 0.0295, 0.0983, 0.0098, 0.0198, 0.0301, 0.1351], device='cuda:2'), in_proj_covar=tensor([0.0072, 0.0100, 0.0100, 0.0109, 0.0082, 0.0109, 0.0097, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 21:28:22,265 INFO [train2.py:809] (2/4) Epoch 20, batch 3100, loss[ctc_loss=0.06603, att_loss=0.2374, loss=0.2031, over 17049.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008453, over 52.00 utterances.], tot_loss[ctc_loss=0.07601, att_loss=0.237, loss=0.2048, over 3268136.64 frames. utt_duration=1254 frames, utt_pad_proportion=0.05358, over 10435.18 utterances.], batch size: 52, lr: 5.36e-03, grad_scale: 16.0 2023-03-08 21:29:01,271 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=78816.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:29:15,667 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1569, 5.4378, 4.8414, 5.2705, 5.1098, 4.5900, 4.8314, 4.6797], device='cuda:2'), covar=tensor([0.1325, 0.0915, 0.0931, 0.0810, 0.1037, 0.1543, 0.2383, 0.2298], device='cuda:2'), in_proj_covar=tensor([0.0512, 0.0597, 0.0451, 0.0450, 0.0423, 0.0455, 0.0607, 0.0520], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 21:29:36,348 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.264e+02 1.977e+02 2.476e+02 2.944e+02 6.319e+02, threshold=4.952e+02, percent-clipped=6.0 2023-03-08 21:29:42,599 INFO [train2.py:809] (2/4) Epoch 20, batch 3150, loss[ctc_loss=0.08304, att_loss=0.2505, loss=0.217, over 16335.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006034, over 45.00 utterances.], tot_loss[ctc_loss=0.07565, att_loss=0.2371, loss=0.2048, over 3266076.64 frames. utt_duration=1236 frames, utt_pad_proportion=0.0591, over 10580.03 utterances.], batch size: 45, lr: 5.35e-03, grad_scale: 16.0 2023-03-08 21:29:59,073 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8016, 3.3932, 3.5181, 2.8023, 3.4783, 3.5144, 3.5216, 2.2513], device='cuda:2'), covar=tensor([0.1360, 0.1550, 0.2270, 0.6223, 0.1141, 0.1934, 0.1023, 0.7918], device='cuda:2'), in_proj_covar=tensor([0.0167, 0.0182, 0.0193, 0.0248, 0.0154, 0.0254, 0.0172, 0.0215], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:30:55,893 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=78888.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:31:01,820 INFO [train2.py:809] (2/4) Epoch 20, batch 3200, loss[ctc_loss=0.07536, att_loss=0.246, loss=0.2119, over 17357.00 frames. utt_duration=1104 frames, utt_pad_proportion=0.03427, over 63.00 utterances.], tot_loss[ctc_loss=0.07556, att_loss=0.2368, loss=0.2046, over 3272879.98 frames. 
utt_duration=1238 frames, utt_pad_proportion=0.05644, over 10589.00 utterances.], batch size: 63, lr: 5.35e-03, grad_scale: 16.0 2023-03-08 21:31:34,106 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8230, 5.1970, 5.4252, 5.2807, 5.2660, 5.7655, 5.1609, 5.8416], device='cuda:2'), covar=tensor([0.0761, 0.0688, 0.0749, 0.1131, 0.1707, 0.0958, 0.0798, 0.0722], device='cuda:2'), in_proj_covar=tensor([0.0860, 0.0504, 0.0588, 0.0654, 0.0861, 0.0608, 0.0481, 0.0599], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 21:31:34,120 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=78912.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:32:12,197 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=78936.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:32:15,121 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.285e+02 2.045e+02 2.242e+02 2.704e+02 5.175e+02, threshold=4.484e+02, percent-clipped=1.0 2023-03-08 21:32:17,041 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4647, 2.8239, 3.5329, 2.9301, 3.4819, 4.5520, 4.3752, 3.0672], device='cuda:2'), covar=tensor([0.0346, 0.1828, 0.1376, 0.1414, 0.1199, 0.0709, 0.0573, 0.1405], device='cuda:2'), in_proj_covar=tensor([0.0245, 0.0245, 0.0281, 0.0219, 0.0267, 0.0363, 0.0260, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 21:32:21,322 INFO [train2.py:809] (2/4) Epoch 20, batch 3250, loss[ctc_loss=0.06561, att_loss=0.2454, loss=0.2095, over 17021.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008035, over 51.00 utterances.], tot_loss[ctc_loss=0.07536, att_loss=0.2363, loss=0.2041, over 3269386.68 frames. utt_duration=1241 frames, utt_pad_proportion=0.05686, over 10554.63 utterances.], batch size: 51, lr: 5.35e-03, grad_scale: 16.0 2023-03-08 21:32:58,080 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=78965.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:33:40,325 INFO [train2.py:809] (2/4) Epoch 20, batch 3300, loss[ctc_loss=0.07166, att_loss=0.2389, loss=0.2054, over 16952.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008327, over 50.00 utterances.], tot_loss[ctc_loss=0.07479, att_loss=0.236, loss=0.2038, over 3272812.31 frames. utt_duration=1257 frames, utt_pad_proportion=0.05091, over 10429.14 utterances.], batch size: 50, lr: 5.35e-03, grad_scale: 16.0 2023-03-08 21:34:35,275 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79026.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:34:54,084 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.427e+02 1.921e+02 2.242e+02 2.810e+02 6.510e+02, threshold=4.483e+02, percent-clipped=2.0 2023-03-08 21:35:00,358 INFO [train2.py:809] (2/4) Epoch 20, batch 3350, loss[ctc_loss=0.04938, att_loss=0.2062, loss=0.1749, over 14063.00 frames. utt_duration=1816 frames, utt_pad_proportion=0.05446, over 31.00 utterances.], tot_loss[ctc_loss=0.07461, att_loss=0.236, loss=0.2037, over 3271258.99 frames. utt_duration=1272 frames, utt_pad_proportion=0.04834, over 10299.19 utterances.], batch size: 31, lr: 5.35e-03, grad_scale: 16.0 2023-03-08 21:35:58,060 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.57 vs. 
limit=2.0 2023-03-08 21:36:21,443 INFO [train2.py:809] (2/4) Epoch 20, batch 3400, loss[ctc_loss=0.08703, att_loss=0.2539, loss=0.2205, over 17105.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.01585, over 56.00 utterances.], tot_loss[ctc_loss=0.07516, att_loss=0.2365, loss=0.2042, over 3274581.74 frames. utt_duration=1256 frames, utt_pad_proportion=0.05103, over 10443.67 utterances.], batch size: 56, lr: 5.35e-03, grad_scale: 16.0 2023-03-08 21:36:59,684 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79116.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:37:34,906 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.410e+02 1.976e+02 2.387e+02 2.813e+02 1.211e+03, threshold=4.774e+02, percent-clipped=8.0 2023-03-08 21:37:41,716 INFO [train2.py:809] (2/4) Epoch 20, batch 3450, loss[ctc_loss=0.08027, att_loss=0.2383, loss=0.2067, over 16272.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007737, over 43.00 utterances.], tot_loss[ctc_loss=0.07507, att_loss=0.2365, loss=0.2042, over 3278316.12 frames. utt_duration=1262 frames, utt_pad_proportion=0.04925, over 10402.46 utterances.], batch size: 43, lr: 5.34e-03, grad_scale: 16.0 2023-03-08 21:38:16,626 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=79164.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:38:50,103 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6134, 2.9997, 3.0371, 2.6762, 2.9043, 2.9765, 3.0578, 2.1562], device='cuda:2'), covar=tensor([0.1107, 0.1435, 0.1906, 0.4040, 0.1969, 0.1972, 0.1056, 0.4218], device='cuda:2'), in_proj_covar=tensor([0.0167, 0.0181, 0.0191, 0.0249, 0.0154, 0.0254, 0.0171, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:39:01,281 INFO [train2.py:809] (2/4) Epoch 20, batch 3500, loss[ctc_loss=0.06602, att_loss=0.2449, loss=0.2092, over 16754.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006367, over 48.00 utterances.], tot_loss[ctc_loss=0.07479, att_loss=0.2361, loss=0.2038, over 3275289.71 frames. utt_duration=1265 frames, utt_pad_proportion=0.04899, over 10368.41 utterances.], batch size: 48, lr: 5.34e-03, grad_scale: 16.0 2023-03-08 21:39:33,906 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79212.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:40:03,430 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6217, 3.3561, 3.3601, 3.0071, 3.2321, 3.3699, 3.4817, 2.4526], device='cuda:2'), covar=tensor([0.1231, 0.1618, 0.2235, 0.3010, 0.2329, 0.3418, 0.0966, 0.3980], device='cuda:2'), in_proj_covar=tensor([0.0167, 0.0182, 0.0192, 0.0248, 0.0155, 0.0254, 0.0171, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:40:15,351 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.933e+02 2.262e+02 2.788e+02 9.048e+02, threshold=4.523e+02, percent-clipped=2.0 2023-03-08 21:40:22,327 INFO [train2.py:809] (2/4) Epoch 20, batch 3550, loss[ctc_loss=0.08238, att_loss=0.2508, loss=0.2171, over 17299.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01146, over 55.00 utterances.], tot_loss[ctc_loss=0.07488, att_loss=0.2366, loss=0.2042, over 3283487.44 frames. 
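
The scaling.py:679 lines compare a whitening metric against a limit (values such as 1.57 and 1.85 against limit=2.0 for the grouped 96- and 192-channel activations, and 3.15 or 4.87 against limit=5.0 for the 384-channel ones). One standard way to measure how far activations are from white is d * trace(C^2) / trace(C)^2 per channel group, which equals 1 when the group covariance C is proportional to the identity and grows as the spectrum becomes uneven; crossing the limit would trigger the whitening penalty. This formulation is an assumption about what scaling.py computes, sketched below:

    # Sketch: a whitening metric for activations x of shape (frames, channels),
    # computed per channel group as d * trace(C @ C) / trace(C)**2, where C is
    # the covariance of one group. Assumed formulation, not the exact scaling.py code.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        n, c = x.shape
        d = c // num_groups
        xg = x.reshape(n, num_groups, d).permute(1, 0, 2)      # (groups, frames, d)
        xg = xg - xg.mean(dim=1, keepdim=True)
        cov = xg.transpose(1, 2) @ xg / n                       # (groups, d, d)
        tr = cov.diagonal(dim1=1, dim2=2).sum(dim=1)            # trace(C)
        tr_sq = (cov * cov).sum(dim=(1, 2))                     # trace(C @ C)
        return (d * tr_sq / (tr * tr)).mean()                   # averaged over groups

    x = torch.randn(10000, 384)
    print(whitening_metric(x, num_groups=1))   # near 1 (up to sampling noise) for white features
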
utt_duration=1261 frames, utt_pad_proportion=0.04622, over 10429.04 utterances.], batch size: 55, lr: 5.34e-03, grad_scale: 16.0 2023-03-08 21:40:35,158 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7750, 2.3229, 2.7009, 2.7651, 2.8144, 2.9362, 2.6280, 3.2900], device='cuda:2'), covar=tensor([0.1497, 0.2991, 0.1985, 0.1494, 0.1671, 0.0959, 0.1911, 0.1133], device='cuda:2'), in_proj_covar=tensor([0.0114, 0.0122, 0.0116, 0.0106, 0.0120, 0.0103, 0.0124, 0.0097], device='cuda:2'), out_proj_covar=tensor([8.5877e-05, 9.4214e-05, 9.2405e-05, 8.2684e-05, 8.8797e-05, 8.2405e-05, 9.3534e-05, 7.7045e-05], device='cuda:2') 2023-03-08 21:40:51,179 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=79260.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:41:00,709 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0970, 5.3693, 5.3012, 5.2566, 5.3608, 5.3393, 5.0945, 4.8284], device='cuda:2'), covar=tensor([0.1004, 0.0541, 0.0296, 0.0553, 0.0304, 0.0318, 0.0366, 0.0361], device='cuda:2'), in_proj_covar=tensor([0.0525, 0.0357, 0.0343, 0.0354, 0.0415, 0.0428, 0.0353, 0.0394], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 21:41:31,907 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79285.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 21:41:42,767 INFO [train2.py:809] (2/4) Epoch 20, batch 3600, loss[ctc_loss=0.09347, att_loss=0.2521, loss=0.2204, over 16478.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.005875, over 46.00 utterances.], tot_loss[ctc_loss=0.07533, att_loss=0.2371, loss=0.2047, over 3279133.53 frames. utt_duration=1237 frames, utt_pad_proportion=0.05453, over 10619.05 utterances.], batch size: 46, lr: 5.34e-03, grad_scale: 16.0 2023-03-08 21:42:29,070 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79321.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:42:37,959 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2766, 3.8566, 3.3491, 3.5315, 4.0776, 3.7976, 3.0866, 4.3639], device='cuda:2'), covar=tensor([0.0828, 0.0485, 0.0970, 0.0618, 0.0652, 0.0662, 0.0837, 0.0455], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0214, 0.0224, 0.0197, 0.0274, 0.0237, 0.0200, 0.0287], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 21:42:48,936 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79333.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:42:56,058 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.246e+02 1.889e+02 2.390e+02 3.056e+02 6.577e+02, threshold=4.779e+02, percent-clipped=3.0 2023-03-08 21:43:03,005 INFO [train2.py:809] (2/4) Epoch 20, batch 3650, loss[ctc_loss=0.05729, att_loss=0.2015, loss=0.1727, over 15393.00 frames. utt_duration=1761 frames, utt_pad_proportion=0.009754, over 35.00 utterances.], tot_loss[ctc_loss=0.07467, att_loss=0.2357, loss=0.2035, over 3264695.36 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.05294, over 10408.44 utterances.], batch size: 35, lr: 5.34e-03, grad_scale: 16.0 2023-03-08 21:43:09,360 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79346.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 21:43:10,878 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5654, 2.5274, 4.9092, 3.8921, 2.9633, 4.2490, 4.8708, 4.5709], device='cuda:2'), covar=tensor([0.0209, 0.1699, 0.0189, 0.0934, 0.1781, 0.0258, 0.0128, 0.0241], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0242, 0.0180, 0.0310, 0.0263, 0.0211, 0.0170, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:43:36,978 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79364.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:43:54,642 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79375.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 21:44:07,369 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9265, 3.6650, 3.1356, 3.2981, 3.8516, 3.5438, 2.9410, 4.0608], device='cuda:2'), covar=tensor([0.0984, 0.0486, 0.0993, 0.0668, 0.0726, 0.0707, 0.0834, 0.0574], device='cuda:2'), in_proj_covar=tensor([0.0199, 0.0213, 0.0223, 0.0196, 0.0273, 0.0235, 0.0199, 0.0285], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 21:44:21,501 INFO [train2.py:809] (2/4) Epoch 20, batch 3700, loss[ctc_loss=0.0671, att_loss=0.2276, loss=0.1955, over 16177.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.007113, over 41.00 utterances.], tot_loss[ctc_loss=0.07549, att_loss=0.2363, loss=0.2041, over 3267694.83 frames. utt_duration=1246 frames, utt_pad_proportion=0.05574, over 10504.73 utterances.], batch size: 41, lr: 5.34e-03, grad_scale: 16.0 2023-03-08 21:44:25,004 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79394.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:44:52,268 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2871, 5.2113, 4.9742, 2.6262, 2.0575, 3.1213, 2.4259, 3.9875], device='cuda:2'), covar=tensor([0.0625, 0.0297, 0.0285, 0.4848, 0.5686, 0.2359, 0.3768, 0.1688], device='cuda:2'), in_proj_covar=tensor([0.0349, 0.0266, 0.0262, 0.0240, 0.0341, 0.0332, 0.0250, 0.0364], device='cuda:2'), out_proj_covar=tensor([1.4789e-04, 9.8745e-05, 1.1175e-04, 1.0226e-04, 1.4272e-04, 1.2999e-04, 1.0033e-04, 1.4770e-04], device='cuda:2') 2023-03-08 21:45:00,424 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2023-03-08 21:45:14,885 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79425.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:45:32,701 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79436.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 21:45:35,343 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.270e+02 1.959e+02 2.392e+02 2.997e+02 5.823e+02, threshold=4.784e+02, percent-clipped=2.0 2023-03-08 21:45:41,552 INFO [train2.py:809] (2/4) Epoch 20, batch 3750, loss[ctc_loss=0.06329, att_loss=0.2297, loss=0.1964, over 16540.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006257, over 45.00 utterances.], tot_loss[ctc_loss=0.07506, att_loss=0.2363, loss=0.2041, over 3272411.67 frames. 
utt_duration=1276 frames, utt_pad_proportion=0.04742, over 10268.75 utterances.], batch size: 45, lr: 5.33e-03, grad_scale: 16.0 2023-03-08 21:45:41,878 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3652, 5.3062, 5.0847, 3.2573, 5.0156, 4.8411, 4.6333, 2.8336], device='cuda:2'), covar=tensor([0.0087, 0.0079, 0.0264, 0.0824, 0.0101, 0.0189, 0.0248, 0.1332], device='cuda:2'), in_proj_covar=tensor([0.0073, 0.0101, 0.0102, 0.0110, 0.0083, 0.0111, 0.0098, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 21:46:21,478 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3981, 4.0087, 3.4638, 3.7111, 4.1667, 3.9640, 3.3400, 4.5540], device='cuda:2'), covar=tensor([0.0761, 0.0473, 0.0927, 0.0544, 0.0637, 0.0538, 0.0712, 0.0393], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0214, 0.0223, 0.0196, 0.0274, 0.0236, 0.0199, 0.0285], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 21:46:37,684 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8425, 3.6794, 3.6697, 3.1913, 3.6166, 3.7526, 3.7206, 2.8379], device='cuda:2'), covar=tensor([0.0946, 0.1043, 0.1535, 0.3036, 0.0941, 0.1690, 0.0786, 0.3479], device='cuda:2'), in_proj_covar=tensor([0.0166, 0.0180, 0.0192, 0.0248, 0.0154, 0.0252, 0.0171, 0.0212], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:47:01,105 INFO [train2.py:809] (2/4) Epoch 20, batch 3800, loss[ctc_loss=0.08612, att_loss=0.2466, loss=0.2145, over 16975.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007159, over 50.00 utterances.], tot_loss[ctc_loss=0.0761, att_loss=0.2368, loss=0.2046, over 3270165.41 frames. utt_duration=1221 frames, utt_pad_proportion=0.06022, over 10723.33 utterances.], batch size: 50, lr: 5.33e-03, grad_scale: 16.0 2023-03-08 21:48:09,353 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1165, 5.3479, 5.3679, 5.2791, 5.3938, 5.3693, 5.1282, 4.8127], device='cuda:2'), covar=tensor([0.0968, 0.0628, 0.0260, 0.0559, 0.0283, 0.0297, 0.0343, 0.0318], device='cuda:2'), in_proj_covar=tensor([0.0525, 0.0357, 0.0343, 0.0353, 0.0414, 0.0427, 0.0352, 0.0391], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 21:48:14,848 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.83 vs. limit=2.0 2023-03-08 21:48:15,597 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.371e+02 1.856e+02 2.318e+02 2.933e+02 8.070e+02, threshold=4.637e+02, percent-clipped=4.0 2023-03-08 21:48:21,940 INFO [train2.py:809] (2/4) Epoch 20, batch 3850, loss[ctc_loss=0.06661, att_loss=0.2268, loss=0.1947, over 16400.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006941, over 44.00 utterances.], tot_loss[ctc_loss=0.0753, att_loss=0.2366, loss=0.2043, over 3263090.03 frames. 
utt_duration=1231 frames, utt_pad_proportion=0.05817, over 10614.11 utterances.], batch size: 44, lr: 5.33e-03, grad_scale: 16.0 2023-03-08 21:48:50,894 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6181, 3.7355, 3.4122, 3.8313, 2.6049, 3.7723, 2.6871, 2.0729], device='cuda:2'), covar=tensor([0.0471, 0.0328, 0.0842, 0.0301, 0.1601, 0.0264, 0.1418, 0.1588], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0162, 0.0259, 0.0155, 0.0221, 0.0141, 0.0230, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:49:18,107 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0795, 5.3068, 5.3225, 5.2528, 5.3598, 5.3241, 5.0620, 4.8554], device='cuda:2'), covar=tensor([0.0941, 0.0514, 0.0268, 0.0523, 0.0250, 0.0272, 0.0350, 0.0279], device='cuda:2'), in_proj_covar=tensor([0.0525, 0.0357, 0.0342, 0.0351, 0.0412, 0.0426, 0.0351, 0.0390], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 21:49:29,013 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79585.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:49:39,444 INFO [train2.py:809] (2/4) Epoch 20, batch 3900, loss[ctc_loss=0.06484, att_loss=0.2015, loss=0.1741, over 15893.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.008817, over 39.00 utterances.], tot_loss[ctc_loss=0.07593, att_loss=0.2369, loss=0.2047, over 3258507.16 frames. utt_duration=1216 frames, utt_pad_proportion=0.06395, over 10736.14 utterances.], batch size: 39, lr: 5.33e-03, grad_scale: 16.0 2023-03-08 21:49:52,734 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.59 vs. limit=5.0 2023-03-08 21:50:23,836 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79621.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:50:50,105 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.328e+02 1.965e+02 2.336e+02 2.819e+02 5.688e+02, threshold=4.673e+02, percent-clipped=3.0 2023-03-08 21:50:54,944 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79641.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 21:50:56,228 INFO [train2.py:809] (2/4) Epoch 20, batch 3950, loss[ctc_loss=0.1289, att_loss=0.2717, loss=0.2431, over 14345.00 frames. utt_duration=394.6 frames, utt_pad_proportion=0.3102, over 146.00 utterances.], tot_loss[ctc_loss=0.07655, att_loss=0.2377, loss=0.2054, over 3262836.84 frames. utt_duration=1220 frames, utt_pad_proportion=0.0622, over 10715.23 utterances.], batch size: 146, lr: 5.33e-03, grad_scale: 16.0 2023-03-08 21:51:02,672 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79646.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:51:19,609 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79657.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:51:38,161 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=79669.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:52:14,301 INFO [train2.py:809] (2/4) Epoch 21, batch 0, loss[ctc_loss=0.06869, att_loss=0.2292, loss=0.1971, over 15955.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006284, over 41.00 utterances.], tot_loss[ctc_loss=0.06869, att_loss=0.2292, loss=0.1971, over 15955.00 frames. 
utt_duration=1558 frames, utt_pad_proportion=0.006284, over 41.00 utterances.], batch size: 41, lr: 5.20e-03, grad_scale: 16.0 2023-03-08 21:52:14,301 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 21:52:26,402 INFO [train2.py:843] (2/4) Epoch 21, validation: ctc_loss=0.04229, att_loss=0.2351, loss=0.1965, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 21:52:26,403 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 21:52:29,128 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-08 21:52:47,110 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79689.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:53:33,510 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79718.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:53:36,528 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79720.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:53:46,007 INFO [train2.py:809] (2/4) Epoch 21, batch 50, loss[ctc_loss=0.08757, att_loss=0.2511, loss=0.2184, over 16691.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.00629, over 46.00 utterances.], tot_loss[ctc_loss=0.07414, att_loss=0.2369, loss=0.2043, over 744998.38 frames. utt_duration=1196 frames, utt_pad_proportion=0.05709, over 2493.71 utterances.], batch size: 46, lr: 5.19e-03, grad_scale: 16.0 2023-03-08 21:53:54,349 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79731.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 21:54:05,022 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.329e+02 1.975e+02 2.497e+02 3.178e+02 8.121e+02, threshold=4.994e+02, percent-clipped=4.0 2023-03-08 21:54:25,534 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-03-08 21:54:43,420 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79762.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:55:05,923 INFO [train2.py:809] (2/4) Epoch 21, batch 100, loss[ctc_loss=0.0868, att_loss=0.2248, loss=0.1972, over 14573.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.03335, over 32.00 utterances.], tot_loss[ctc_loss=0.07474, att_loss=0.237, loss=0.2046, over 1308143.04 frames. utt_duration=1274 frames, utt_pad_proportion=0.04357, over 4111.14 utterances.], batch size: 32, lr: 5.19e-03, grad_scale: 16.0 2023-03-08 21:55:50,585 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3701, 4.0656, 3.4860, 3.7835, 4.2247, 3.9303, 3.4167, 4.5482], device='cuda:2'), covar=tensor([0.0860, 0.0493, 0.0861, 0.0535, 0.0666, 0.0614, 0.0762, 0.0465], device='cuda:2'), in_proj_covar=tensor([0.0202, 0.0214, 0.0225, 0.0197, 0.0276, 0.0237, 0.0199, 0.0286], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 21:56:22,728 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79823.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:56:26,917 INFO [train2.py:809] (2/4) Epoch 21, batch 150, loss[ctc_loss=0.08018, att_loss=0.2528, loss=0.2183, over 17281.00 frames. utt_duration=1258 frames, utt_pad_proportion=0.01306, over 55.00 utterances.], tot_loss[ctc_loss=0.0737, att_loss=0.2355, loss=0.2031, over 1736263.47 frames. 
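
The learning rate decays smoothly within epoch 20 (5.40e-03 down to 5.33e-03 across batches ~77400 to ~79600) and steps down to 5.20e-03 when epoch 21 begins. Assuming base_lr=0.05, lr_batches=5000 and lr_epochs=3.5 (the values given earlier in this log), these numbers match the Eden schedule used in icefall if the epoch factor counts completed epochs. A sketch of that formula under those assumptions:

    # Sketch of the Eden learning-rate schedule:
    #   lr = base_lr * ((batch^2 + lr_batches^2) / lr_batches^2) ** -0.25
    #               * ((epoch^2 + lr_epochs^2)  / lr_epochs^2)  ** -0.25
    # Assumes `epoch` counts completed epochs; the actual icefall code may differ
    # in details such as where the epoch counter is incremented.
    def eden_lr(batch: float, epoch: float,
                base_lr: float = 0.05, lr_batches: float = 5000.0,
                lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(77451, 19):.2e}")   # ~5.40e-03, epoch 20 entries above
    print(f"{eden_lr(79670, 20):.2e}")   # ~5.20e-03, first epoch 21 entries
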
utt_duration=1320 frames, utt_pad_proportion=0.03671, over 5266.93 utterances.], batch size: 55, lr: 5.19e-03, grad_scale: 16.0 2023-03-08 21:56:44,953 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4546, 2.7953, 3.3920, 4.5487, 3.9679, 3.9590, 3.0670, 2.2255], device='cuda:2'), covar=tensor([0.0672, 0.2012, 0.0942, 0.0442, 0.0842, 0.0441, 0.1363, 0.2305], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0215, 0.0190, 0.0216, 0.0221, 0.0176, 0.0201, 0.0188], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:56:46,153 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.523e+02 1.968e+02 2.419e+02 2.959e+02 7.390e+02, threshold=4.838e+02, percent-clipped=3.0 2023-03-08 21:56:51,737 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1749, 5.4016, 5.7253, 5.4380, 5.7232, 6.1213, 5.2658, 6.2423], device='cuda:2'), covar=tensor([0.0573, 0.0688, 0.0737, 0.1284, 0.1430, 0.0801, 0.0637, 0.0550], device='cuda:2'), in_proj_covar=tensor([0.0864, 0.0509, 0.0588, 0.0655, 0.0861, 0.0615, 0.0482, 0.0602], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 21:56:52,031 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2159, 5.2234, 4.9479, 2.6998, 2.0924, 3.2082, 2.5501, 3.9602], device='cuda:2'), covar=tensor([0.0651, 0.0303, 0.0297, 0.4492, 0.5582, 0.2242, 0.3658, 0.1666], device='cuda:2'), in_proj_covar=tensor([0.0348, 0.0266, 0.0262, 0.0240, 0.0340, 0.0330, 0.0249, 0.0361], device='cuda:2'), out_proj_covar=tensor([1.4775e-04, 9.8472e-05, 1.1169e-04, 1.0265e-04, 1.4227e-04, 1.2903e-04, 9.9854e-05, 1.4678e-04], device='cuda:2') 2023-03-08 21:57:47,560 INFO [train2.py:809] (2/4) Epoch 21, batch 200, loss[ctc_loss=0.06805, att_loss=0.2281, loss=0.1961, over 16378.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.008302, over 44.00 utterances.], tot_loss[ctc_loss=0.07388, att_loss=0.235, loss=0.2027, over 2076637.72 frames. utt_duration=1271 frames, utt_pad_proportion=0.0477, over 6541.23 utterances.], batch size: 44, lr: 5.19e-03, grad_scale: 16.0 2023-03-08 21:58:10,256 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-03-08 21:58:20,020 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1371, 4.5088, 4.6883, 4.8970, 3.0235, 4.5716, 3.1349, 2.0188], device='cuda:2'), covar=tensor([0.0474, 0.0254, 0.0615, 0.0268, 0.1594, 0.0191, 0.1350, 0.1800], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0163, 0.0260, 0.0156, 0.0223, 0.0141, 0.0230, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:58:54,709 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4142, 2.3134, 4.8682, 3.8071, 2.9190, 4.2565, 4.5782, 4.6327], device='cuda:2'), covar=tensor([0.0231, 0.1705, 0.0138, 0.0905, 0.1702, 0.0240, 0.0175, 0.0207], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0238, 0.0179, 0.0306, 0.0260, 0.0209, 0.0168, 0.0197], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 21:59:08,382 INFO [train2.py:809] (2/4) Epoch 21, batch 250, loss[ctc_loss=0.06799, att_loss=0.2386, loss=0.2045, over 16486.00 frames. 
utt_duration=1435 frames, utt_pad_proportion=0.005454, over 46.00 utterances.], tot_loss[ctc_loss=0.07401, att_loss=0.2354, loss=0.2031, over 2339991.05 frames. utt_duration=1254 frames, utt_pad_proportion=0.05194, over 7473.69 utterances.], batch size: 46, lr: 5.19e-03, grad_scale: 16.0 2023-03-08 21:59:28,143 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.404e+02 2.056e+02 2.417e+02 2.822e+02 7.152e+02, threshold=4.834e+02, percent-clipped=3.0 2023-03-08 21:59:33,211 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79941.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 21:59:33,315 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79941.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 22:00:23,544 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-03-08 22:00:29,277 INFO [train2.py:809] (2/4) Epoch 21, batch 300, loss[ctc_loss=0.1308, att_loss=0.2711, loss=0.2431, over 13885.00 frames. utt_duration=381.9 frames, utt_pad_proportion=0.3347, over 146.00 utterances.], tot_loss[ctc_loss=0.0747, att_loss=0.236, loss=0.2037, over 2549297.04 frames. utt_duration=1231 frames, utt_pad_proportion=0.05638, over 8294.05 utterances.], batch size: 146, lr: 5.19e-03, grad_scale: 16.0 2023-03-08 22:00:46,938 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7918, 2.2072, 2.5761, 2.4920, 2.7189, 2.6903, 2.5933, 3.1497], device='cuda:2'), covar=tensor([0.1801, 0.2875, 0.2016, 0.1576, 0.1662, 0.1256, 0.2286, 0.1282], device='cuda:2'), in_proj_covar=tensor([0.0114, 0.0120, 0.0115, 0.0106, 0.0119, 0.0103, 0.0124, 0.0096], device='cuda:2'), out_proj_covar=tensor([8.5521e-05, 9.3315e-05, 9.1886e-05, 8.2648e-05, 8.7865e-05, 8.2529e-05, 9.3599e-05, 7.6627e-05], device='cuda:2') 2023-03-08 22:00:49,778 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=79989.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 22:00:49,906 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79989.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:01:32,385 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80013.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:01:43,969 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80020.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:01:53,322 INFO [train2.py:809] (2/4) Epoch 21, batch 350, loss[ctc_loss=0.06831, att_loss=0.2377, loss=0.2038, over 16633.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.004856, over 47.00 utterances.], tot_loss[ctc_loss=0.07479, att_loss=0.236, loss=0.2038, over 2703115.56 frames. 
utt_duration=1222 frames, utt_pad_proportion=0.06203, over 8858.80 utterances.], batch size: 47, lr: 5.18e-03, grad_scale: 16.0 2023-03-08 22:02:02,037 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80031.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 22:02:11,095 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80037.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:02:12,386 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.209e+02 1.969e+02 2.385e+02 2.918e+02 9.167e+02, threshold=4.769e+02, percent-clipped=4.0 2023-03-08 22:03:01,920 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80068.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:03:14,450 INFO [train2.py:809] (2/4) Epoch 21, batch 400, loss[ctc_loss=0.07441, att_loss=0.2483, loss=0.2135, over 16755.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.007354, over 48.00 utterances.], tot_loss[ctc_loss=0.07425, att_loss=0.236, loss=0.2036, over 2836399.72 frames. utt_duration=1244 frames, utt_pad_proportion=0.05414, over 9133.11 utterances.], batch size: 48, lr: 5.18e-03, grad_scale: 16.0 2023-03-08 22:03:16,316 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.91 vs. limit=5.0 2023-03-08 22:03:19,728 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80079.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 22:03:24,410 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1034, 5.4404, 5.6623, 5.4447, 5.6563, 6.0804, 5.2574, 6.1263], device='cuda:2'), covar=tensor([0.0587, 0.0562, 0.0671, 0.1121, 0.1483, 0.0739, 0.0653, 0.0616], device='cuda:2'), in_proj_covar=tensor([0.0851, 0.0500, 0.0580, 0.0645, 0.0850, 0.0603, 0.0473, 0.0597], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 22:04:11,617 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2023-03-08 22:04:22,268 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80118.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:04:30,555 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-08 22:04:34,489 INFO [train2.py:809] (2/4) Epoch 21, batch 450, loss[ctc_loss=0.07971, att_loss=0.2454, loss=0.2123, over 17328.00 frames. utt_duration=1262 frames, utt_pad_proportion=0.009519, over 55.00 utterances.], tot_loss[ctc_loss=0.07517, att_loss=0.2361, loss=0.2039, over 2929390.37 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.05849, over 9501.26 utterances.], batch size: 55, lr: 5.18e-03, grad_scale: 16.0 2023-03-08 22:04:53,681 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.266e+02 2.042e+02 2.548e+02 3.452e+02 1.189e+03, threshold=5.096e+02, percent-clipped=8.0 2023-03-08 22:05:19,396 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1520, 5.1537, 4.9511, 2.2335, 2.0777, 3.1359, 2.4167, 3.8688], device='cuda:2'), covar=tensor([0.0714, 0.0315, 0.0285, 0.5769, 0.5612, 0.2222, 0.3986, 0.1846], device='cuda:2'), in_proj_covar=tensor([0.0351, 0.0269, 0.0265, 0.0242, 0.0342, 0.0332, 0.0250, 0.0364], device='cuda:2'), out_proj_covar=tensor([1.4897e-04, 9.9720e-05, 1.1257e-04, 1.0351e-04, 1.4314e-04, 1.2978e-04, 1.0007e-04, 1.4776e-04], device='cuda:2') 2023-03-08 22:05:55,598 INFO [train2.py:809] (2/4) Epoch 21, batch 500, loss[ctc_loss=0.08602, att_loss=0.2198, loss=0.193, over 14571.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.04154, over 32.00 utterances.], tot_loss[ctc_loss=0.07606, att_loss=0.2363, loss=0.2043, over 2999350.94 frames. utt_duration=1198 frames, utt_pad_proportion=0.06912, over 10024.08 utterances.], batch size: 32, lr: 5.18e-03, grad_scale: 16.0 2023-03-08 22:07:16,765 INFO [train2.py:809] (2/4) Epoch 21, batch 550, loss[ctc_loss=0.07185, att_loss=0.2422, loss=0.2081, over 17288.00 frames. utt_duration=876.8 frames, utt_pad_proportion=0.08281, over 79.00 utterances.], tot_loss[ctc_loss=0.07568, att_loss=0.2364, loss=0.2043, over 3061060.01 frames. utt_duration=1211 frames, utt_pad_proportion=0.06559, over 10125.20 utterances.], batch size: 79, lr: 5.18e-03, grad_scale: 16.0 2023-03-08 22:07:35,877 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.287e+01 1.925e+02 2.164e+02 2.477e+02 4.088e+02, threshold=4.327e+02, percent-clipped=0.0 2023-03-08 22:07:41,492 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80241.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:08:37,651 INFO [train2.py:809] (2/4) Epoch 21, batch 600, loss[ctc_loss=0.05811, att_loss=0.2258, loss=0.1923, over 16006.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007605, over 40.00 utterances.], tot_loss[ctc_loss=0.07549, att_loss=0.2361, loss=0.204, over 3104613.04 frames. 
utt_duration=1209 frames, utt_pad_proportion=0.06713, over 10286.87 utterances.], batch size: 40, lr: 5.18e-03, grad_scale: 16.0 2023-03-08 22:08:44,180 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3935, 3.0095, 3.6516, 4.5055, 3.9990, 3.9299, 3.0657, 2.3464], device='cuda:2'), covar=tensor([0.0716, 0.1959, 0.0829, 0.0526, 0.0788, 0.0514, 0.1437, 0.2354], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0215, 0.0189, 0.0217, 0.0222, 0.0177, 0.0200, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:08:49,081 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80283.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:08:59,324 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80289.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:09:36,934 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6496, 2.0420, 2.2813, 2.4544, 2.5630, 2.6718, 2.4389, 3.1388], device='cuda:2'), covar=tensor([0.1630, 0.3095, 0.2203, 0.1515, 0.1989, 0.1194, 0.2144, 0.0919], device='cuda:2'), in_proj_covar=tensor([0.0116, 0.0123, 0.0116, 0.0108, 0.0121, 0.0105, 0.0127, 0.0097], device='cuda:2'), out_proj_covar=tensor([8.7201e-05, 9.5052e-05, 9.3208e-05, 8.3876e-05, 8.9587e-05, 8.3928e-05, 9.5199e-05, 7.7483e-05], device='cuda:2') 2023-03-08 22:09:38,475 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80313.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:09:58,596 INFO [train2.py:809] (2/4) Epoch 21, batch 650, loss[ctc_loss=0.06073, att_loss=0.2212, loss=0.1891, over 16388.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.008508, over 44.00 utterances.], tot_loss[ctc_loss=0.07516, att_loss=0.2361, loss=0.2039, over 3152163.79 frames. 
utt_duration=1229 frames, utt_pad_proportion=0.05864, over 10274.90 utterances.], batch size: 44, lr: 5.17e-03, grad_scale: 16.0 2023-03-08 22:10:18,135 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.849e+02 2.223e+02 2.876e+02 6.376e+02, threshold=4.446e+02, percent-clipped=5.0 2023-03-08 22:10:27,851 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80344.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:10:50,665 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6606, 2.0005, 2.3065, 2.4855, 2.7064, 2.5987, 2.4661, 3.0749], device='cuda:2'), covar=tensor([0.1471, 0.3206, 0.2221, 0.1409, 0.1707, 0.1098, 0.2123, 0.1266], device='cuda:2'), in_proj_covar=tensor([0.0116, 0.0123, 0.0116, 0.0108, 0.0121, 0.0105, 0.0127, 0.0097], device='cuda:2'), out_proj_covar=tensor([8.7213e-05, 9.5042e-05, 9.3103e-05, 8.4099e-05, 8.9560e-05, 8.4009e-05, 9.5321e-05, 7.7637e-05], device='cuda:2') 2023-03-08 22:10:55,597 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80361.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:11:15,262 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5390, 4.4509, 4.5322, 4.6125, 5.1366, 4.6080, 4.4704, 2.5844], device='cuda:2'), covar=tensor([0.0218, 0.0314, 0.0314, 0.0250, 0.0617, 0.0194, 0.0313, 0.1770], device='cuda:2'), in_proj_covar=tensor([0.0162, 0.0186, 0.0187, 0.0204, 0.0370, 0.0157, 0.0175, 0.0219], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:11:19,497 INFO [train2.py:809] (2/4) Epoch 21, batch 700, loss[ctc_loss=0.07016, att_loss=0.2158, loss=0.1867, over 15381.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.01005, over 35.00 utterances.], tot_loss[ctc_loss=0.0755, att_loss=0.2367, loss=0.2044, over 3172015.28 frames. utt_duration=1195 frames, utt_pad_proportion=0.06859, over 10627.56 utterances.], batch size: 35, lr: 5.17e-03, grad_scale: 16.0 2023-03-08 22:11:46,965 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80393.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:12:28,267 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80418.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:12:37,307 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0238, 6.2854, 5.7769, 6.0031, 5.8949, 5.4921, 5.7616, 5.4273], device='cuda:2'), covar=tensor([0.1186, 0.0798, 0.0917, 0.0761, 0.0940, 0.1525, 0.2010, 0.2365], device='cuda:2'), in_proj_covar=tensor([0.0516, 0.0600, 0.0455, 0.0446, 0.0426, 0.0463, 0.0605, 0.0521], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 22:12:40,236 INFO [train2.py:809] (2/4) Epoch 21, batch 750, loss[ctc_loss=0.07799, att_loss=0.2525, loss=0.2176, over 17363.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.0212, over 59.00 utterances.], tot_loss[ctc_loss=0.07519, att_loss=0.2368, loss=0.2045, over 3201195.67 frames. 
utt_duration=1207 frames, utt_pad_proportion=0.06191, over 10617.85 utterances.], batch size: 59, lr: 5.17e-03, grad_scale: 16.0 2023-03-08 22:12:56,208 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7341, 5.1024, 5.3138, 5.1356, 5.2408, 5.6414, 5.0600, 5.7878], device='cuda:2'), covar=tensor([0.0796, 0.0712, 0.0796, 0.1192, 0.1836, 0.1004, 0.0935, 0.0724], device='cuda:2'), in_proj_covar=tensor([0.0868, 0.0513, 0.0591, 0.0654, 0.0870, 0.0621, 0.0484, 0.0605], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 22:12:59,012 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.272e+02 1.906e+02 2.332e+02 2.965e+02 6.816e+02, threshold=4.664e+02, percent-clipped=5.0 2023-03-08 22:13:25,115 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80454.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:13:31,127 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2599, 4.7241, 4.7255, 5.1048, 3.3494, 4.8737, 3.1120, 2.4336], device='cuda:2'), covar=tensor([0.0397, 0.0244, 0.0566, 0.0141, 0.1292, 0.0135, 0.1313, 0.1518], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0161, 0.0259, 0.0156, 0.0220, 0.0141, 0.0228, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:13:39,551 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80463.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:13:44,270 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80466.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:13:59,488 INFO [train2.py:809] (2/4) Epoch 21, batch 800, loss[ctc_loss=0.05317, att_loss=0.2108, loss=0.1793, over 15648.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008576, over 37.00 utterances.], tot_loss[ctc_loss=0.0772, att_loss=0.2375, loss=0.2054, over 3202604.07 frames. utt_duration=1191 frames, utt_pad_proportion=0.07054, over 10771.42 utterances.], batch size: 37, lr: 5.17e-03, grad_scale: 16.0 2023-03-08 22:14:12,154 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2023-03-08 22:14:29,055 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80494.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:15:04,408 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.81 vs. limit=5.0 2023-03-08 22:15:17,438 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80524.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:15:20,314 INFO [train2.py:809] (2/4) Epoch 21, batch 850, loss[ctc_loss=0.06776, att_loss=0.2379, loss=0.2038, over 17453.00 frames. utt_duration=1013 frames, utt_pad_proportion=0.04145, over 69.00 utterances.], tot_loss[ctc_loss=0.07619, att_loss=0.2375, loss=0.2052, over 3224986.27 frames. utt_duration=1197 frames, utt_pad_proportion=0.06558, over 10787.16 utterances.], batch size: 69, lr: 5.17e-03, grad_scale: 16.0 2023-03-08 22:15:40,669 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.78 vs. 
limit=5.0 2023-03-08 22:15:40,966 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 2.073e+02 2.361e+02 2.912e+02 1.019e+03, threshold=4.723e+02, percent-clipped=4.0 2023-03-08 22:16:07,037 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80555.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:16:40,197 INFO [train2.py:809] (2/4) Epoch 21, batch 900, loss[ctc_loss=0.09782, att_loss=0.264, loss=0.2307, over 17322.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.009169, over 55.00 utterances.], tot_loss[ctc_loss=0.07544, att_loss=0.2369, loss=0.2046, over 3238677.75 frames. utt_duration=1209 frames, utt_pad_proportion=0.06333, over 10725.55 utterances.], batch size: 55, lr: 5.17e-03, grad_scale: 16.0 2023-03-08 22:17:25,153 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. limit=2.0 2023-03-08 22:18:00,936 INFO [train2.py:809] (2/4) Epoch 21, batch 950, loss[ctc_loss=0.09375, att_loss=0.2529, loss=0.2211, over 17435.00 frames. utt_duration=884.2 frames, utt_pad_proportion=0.07223, over 79.00 utterances.], tot_loss[ctc_loss=0.0756, att_loss=0.2372, loss=0.2049, over 3247149.01 frames. utt_duration=1185 frames, utt_pad_proportion=0.0691, over 10973.57 utterances.], batch size: 79, lr: 5.16e-03, grad_scale: 16.0 2023-03-08 22:18:22,506 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.307e+02 1.900e+02 2.361e+02 2.857e+02 6.538e+02, threshold=4.723e+02, percent-clipped=3.0 2023-03-08 22:18:22,782 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80639.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:19:22,687 INFO [train2.py:809] (2/4) Epoch 21, batch 1000, loss[ctc_loss=0.07994, att_loss=0.232, loss=0.2016, over 16388.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.008462, over 44.00 utterances.], tot_loss[ctc_loss=0.07524, att_loss=0.2375, loss=0.205, over 3259818.83 frames. utt_duration=1203 frames, utt_pad_proportion=0.06311, over 10853.68 utterances.], batch size: 44, lr: 5.16e-03, grad_scale: 8.0 2023-03-08 22:20:39,451 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8946, 5.1356, 5.0877, 5.0449, 5.1309, 5.0474, 4.8341, 4.5312], device='cuda:2'), covar=tensor([0.1019, 0.0590, 0.0341, 0.0614, 0.0353, 0.0379, 0.0419, 0.0386], device='cuda:2'), in_proj_covar=tensor([0.0518, 0.0359, 0.0342, 0.0356, 0.0420, 0.0427, 0.0354, 0.0392], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 22:20:44,010 INFO [train2.py:809] (2/4) Epoch 21, batch 1050, loss[ctc_loss=0.05158, att_loss=0.2139, loss=0.1815, over 10073.00 frames. utt_duration=1833 frames, utt_pad_proportion=0.2587, over 22.00 utterances.], tot_loss[ctc_loss=0.07517, att_loss=0.2372, loss=0.2048, over 3250963.77 frames. 
utt_duration=1203 frames, utt_pad_proportion=0.06631, over 10824.81 utterances.], batch size: 22, lr: 5.16e-03, grad_scale: 4.0 2023-03-08 22:21:00,929 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80736.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:21:08,352 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.786e+02 2.169e+02 2.535e+02 5.351e+02, threshold=4.338e+02, percent-clipped=1.0 2023-03-08 22:21:21,085 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80749.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:21:37,417 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2023-03-08 22:22:00,950 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80773.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:22:05,365 INFO [train2.py:809] (2/4) Epoch 21, batch 1100, loss[ctc_loss=0.05702, att_loss=0.2034, loss=0.1741, over 15777.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008165, over 38.00 utterances.], tot_loss[ctc_loss=0.07532, att_loss=0.2375, loss=0.2051, over 3262340.01 frames. utt_duration=1188 frames, utt_pad_proportion=0.06798, over 10995.79 utterances.], batch size: 38, lr: 5.16e-03, grad_scale: 4.0 2023-03-08 22:22:39,445 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80797.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:23:16,400 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80819.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:23:27,020 INFO [train2.py:809] (2/4) Epoch 21, batch 1150, loss[ctc_loss=0.07144, att_loss=0.2241, loss=0.1936, over 15393.00 frames. utt_duration=1760 frames, utt_pad_proportion=0.009931, over 35.00 utterances.], tot_loss[ctc_loss=0.07452, att_loss=0.2363, loss=0.2039, over 3260892.00 frames. utt_duration=1208 frames, utt_pad_proportion=0.06398, over 10808.00 utterances.], batch size: 35, lr: 5.16e-03, grad_scale: 4.0 2023-03-08 22:23:40,049 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6794, 3.4418, 2.8228, 3.0848, 3.5575, 3.2989, 2.7317, 3.5737], device='cuda:2'), covar=tensor([0.1013, 0.0454, 0.0993, 0.0717, 0.0734, 0.0693, 0.0852, 0.0593], device='cuda:2'), in_proj_covar=tensor([0.0198, 0.0213, 0.0224, 0.0196, 0.0273, 0.0235, 0.0197, 0.0280], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-08 22:23:40,088 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80834.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:23:51,460 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 2.024e+02 2.478e+02 3.117e+02 8.099e+02, threshold=4.955e+02, percent-clipped=10.0 2023-03-08 22:24:06,589 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80850.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:24:48,761 INFO [train2.py:809] (2/4) Epoch 21, batch 1200, loss[ctc_loss=0.0592, att_loss=0.2414, loss=0.205, over 17139.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.01308, over 56.00 utterances.], tot_loss[ctc_loss=0.07488, att_loss=0.2369, loss=0.2045, over 3270689.49 frames. 
utt_duration=1210 frames, utt_pad_proportion=0.06181, over 10824.67 utterances.], batch size: 56, lr: 5.16e-03, grad_scale: 8.0 2023-03-08 22:26:09,928 INFO [train2.py:809] (2/4) Epoch 21, batch 1250, loss[ctc_loss=0.05237, att_loss=0.1932, loss=0.165, over 15508.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008338, over 36.00 utterances.], tot_loss[ctc_loss=0.07531, att_loss=0.2376, loss=0.2051, over 3277394.62 frames. utt_duration=1223 frames, utt_pad_proportion=0.0582, over 10734.09 utterances.], batch size: 36, lr: 5.16e-03, grad_scale: 8.0 2023-03-08 22:26:21,750 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5509, 4.9386, 4.7341, 4.8673, 4.8998, 4.6672, 3.1936, 4.7727], device='cuda:2'), covar=tensor([0.0110, 0.0100, 0.0122, 0.0071, 0.0100, 0.0103, 0.0757, 0.0217], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0086, 0.0110, 0.0069, 0.0074, 0.0084, 0.0103, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 22:26:31,970 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80939.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:26:34,818 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.421e+02 2.023e+02 2.469e+02 3.077e+02 5.945e+02, threshold=4.937e+02, percent-clipped=1.0 2023-03-08 22:26:58,854 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7770, 6.1046, 5.6079, 5.8442, 5.8064, 5.3241, 5.4900, 5.3556], device='cuda:2'), covar=tensor([0.1503, 0.0963, 0.1031, 0.0852, 0.1026, 0.1568, 0.2556, 0.2463], device='cuda:2'), in_proj_covar=tensor([0.0520, 0.0603, 0.0456, 0.0453, 0.0427, 0.0464, 0.0605, 0.0522], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 22:27:33,041 INFO [train2.py:809] (2/4) Epoch 21, batch 1300, loss[ctc_loss=0.06286, att_loss=0.2212, loss=0.1896, over 14504.00 frames. utt_duration=1814 frames, utt_pad_proportion=0.04854, over 32.00 utterances.], tot_loss[ctc_loss=0.0755, att_loss=0.2373, loss=0.2049, over 3269781.01 frames. utt_duration=1218 frames, utt_pad_proportion=0.06181, over 10755.19 utterances.], batch size: 32, lr: 5.15e-03, grad_scale: 8.0 2023-03-08 22:27:33,419 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4319, 2.9139, 3.4107, 4.5066, 3.9488, 3.8470, 2.9735, 2.2976], device='cuda:2'), covar=tensor([0.0718, 0.1982, 0.0935, 0.0543, 0.0857, 0.0490, 0.1624, 0.2414], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0216, 0.0191, 0.0220, 0.0223, 0.0179, 0.0204, 0.0191], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:27:50,963 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80987.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:28:55,445 INFO [train2.py:809] (2/4) Epoch 21, batch 1350, loss[ctc_loss=0.05947, att_loss=0.2, loss=0.1719, over 15635.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.009478, over 37.00 utterances.], tot_loss[ctc_loss=0.07456, att_loss=0.2356, loss=0.2034, over 3262217.76 frames. utt_duration=1242 frames, utt_pad_proportion=0.05782, over 10519.96 utterances.], batch size: 37, lr: 5.15e-03, grad_scale: 8.0 2023-03-08 22:29:00,018 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.79 vs. 
limit=2.0 2023-03-08 22:29:19,937 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.247e+02 1.963e+02 2.296e+02 2.713e+02 5.032e+02, threshold=4.592e+02, percent-clipped=2.0 2023-03-08 22:29:33,234 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81049.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:29:51,752 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2830, 2.4193, 3.5122, 2.5572, 3.3083, 4.5444, 4.4226, 2.6699], device='cuda:2'), covar=tensor([0.0505, 0.2364, 0.1314, 0.1860, 0.1141, 0.0731, 0.0552, 0.1996], device='cuda:2'), in_proj_covar=tensor([0.0244, 0.0243, 0.0279, 0.0220, 0.0265, 0.0364, 0.0260, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 22:30:16,871 INFO [train2.py:809] (2/4) Epoch 21, batch 1400, loss[ctc_loss=0.05692, att_loss=0.2231, loss=0.1899, over 15963.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.00642, over 41.00 utterances.], tot_loss[ctc_loss=0.07524, att_loss=0.2363, loss=0.2041, over 3260870.18 frames. utt_duration=1214 frames, utt_pad_proportion=0.0665, over 10753.96 utterances.], batch size: 41, lr: 5.15e-03, grad_scale: 8.0 2023-03-08 22:30:42,802 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81092.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:30:50,488 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81097.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:31:22,078 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81116.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:31:26,390 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81119.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:31:36,833 INFO [train2.py:809] (2/4) Epoch 21, batch 1450, loss[ctc_loss=0.07557, att_loss=0.232, loss=0.2007, over 16334.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006004, over 45.00 utterances.], tot_loss[ctc_loss=0.0753, att_loss=0.2367, loss=0.2044, over 3268879.92 frames. 
utt_duration=1229 frames, utt_pad_proportion=0.06071, over 10656.08 utterances.], batch size: 45, lr: 5.15e-03, grad_scale: 8.0 2023-03-08 22:31:41,563 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81129.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:32:00,571 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.791e+02 2.173e+02 2.721e+02 6.345e+02, threshold=4.347e+02, percent-clipped=3.0 2023-03-08 22:32:15,027 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81150.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:32:24,662 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3666, 4.2623, 4.4066, 4.4622, 4.9690, 4.5785, 4.4276, 2.3055], device='cuda:2'), covar=tensor([0.0258, 0.0393, 0.0349, 0.0269, 0.0731, 0.0210, 0.0305, 0.1903], device='cuda:2'), in_proj_covar=tensor([0.0161, 0.0184, 0.0185, 0.0201, 0.0366, 0.0156, 0.0173, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:32:35,924 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1117, 5.4355, 4.9795, 5.4273, 4.8307, 5.0679, 5.5305, 5.3585], device='cuda:2'), covar=tensor([0.0639, 0.0237, 0.0728, 0.0306, 0.0408, 0.0210, 0.0214, 0.0185], device='cuda:2'), in_proj_covar=tensor([0.0383, 0.0316, 0.0362, 0.0345, 0.0315, 0.0237, 0.0298, 0.0280], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-08 22:32:43,704 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81167.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:32:57,632 INFO [train2.py:809] (2/4) Epoch 21, batch 1500, loss[ctc_loss=0.06428, att_loss=0.2191, loss=0.1882, over 16191.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.005737, over 41.00 utterances.], tot_loss[ctc_loss=0.07465, att_loss=0.237, loss=0.2045, over 3280248.67 frames. utt_duration=1223 frames, utt_pad_proportion=0.05796, over 10744.46 utterances.], batch size: 41, lr: 5.15e-03, grad_scale: 8.0 2023-03-08 22:32:59,545 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=81177.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:33:33,282 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81198.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:33:38,295 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8617, 6.1349, 5.6016, 5.8574, 5.8441, 5.2511, 5.6053, 5.2155], device='cuda:2'), covar=tensor([0.1258, 0.0826, 0.0975, 0.0846, 0.0890, 0.1455, 0.2061, 0.2269], device='cuda:2'), in_proj_covar=tensor([0.0522, 0.0607, 0.0456, 0.0453, 0.0429, 0.0466, 0.0605, 0.0525], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 22:33:45,687 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.25 vs. 
limit=5.0 2023-03-08 22:34:12,851 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6431, 5.0941, 4.8493, 4.9942, 5.1047, 4.8012, 3.3508, 4.9218], device='cuda:2'), covar=tensor([0.0118, 0.0092, 0.0128, 0.0070, 0.0091, 0.0094, 0.0711, 0.0205], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0086, 0.0110, 0.0069, 0.0074, 0.0084, 0.0103, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 22:34:19,135 INFO [train2.py:809] (2/4) Epoch 21, batch 1550, loss[ctc_loss=0.06479, att_loss=0.2356, loss=0.2015, over 16531.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006677, over 45.00 utterances.], tot_loss[ctc_loss=0.07512, att_loss=0.2373, loss=0.2049, over 3274653.34 frames. utt_duration=1209 frames, utt_pad_proportion=0.0642, over 10845.66 utterances.], batch size: 45, lr: 5.15e-03, grad_scale: 8.0 2023-03-08 22:34:32,533 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0185, 5.3425, 5.5960, 5.3503, 5.5334, 5.9958, 5.2477, 6.0969], device='cuda:2'), covar=tensor([0.0734, 0.0702, 0.0784, 0.1299, 0.1680, 0.0888, 0.0701, 0.0604], device='cuda:2'), in_proj_covar=tensor([0.0857, 0.0506, 0.0588, 0.0653, 0.0864, 0.0620, 0.0485, 0.0599], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 22:34:40,966 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.70 vs. limit=5.0 2023-03-08 22:34:43,491 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.361e+02 1.828e+02 2.165e+02 2.736e+02 6.423e+02, threshold=4.329e+02, percent-clipped=4.0 2023-03-08 22:35:05,826 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-03-08 22:35:40,190 INFO [train2.py:809] (2/4) Epoch 21, batch 1600, loss[ctc_loss=0.07845, att_loss=0.245, loss=0.2117, over 16539.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006362, over 45.00 utterances.], tot_loss[ctc_loss=0.07434, att_loss=0.237, loss=0.2045, over 3279935.66 frames. 
utt_duration=1233 frames, utt_pad_proportion=0.05798, over 10656.04 utterances.], batch size: 45, lr: 5.14e-03, grad_scale: 8.0 2023-03-08 22:36:16,922 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81299.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:36:20,815 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81301.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:36:27,146 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5048, 4.4929, 4.6208, 4.6441, 5.1022, 4.6038, 4.5745, 2.3484], device='cuda:2'), covar=tensor([0.0256, 0.0305, 0.0295, 0.0228, 0.0695, 0.0213, 0.0300, 0.1917], device='cuda:2'), in_proj_covar=tensor([0.0160, 0.0183, 0.0184, 0.0199, 0.0365, 0.0155, 0.0173, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:36:59,919 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3537, 2.4798, 3.1228, 2.5476, 3.0788, 3.4650, 3.3808, 2.7034], device='cuda:2'), covar=tensor([0.0497, 0.1587, 0.1063, 0.1230, 0.0895, 0.1210, 0.0694, 0.1228], device='cuda:2'), in_proj_covar=tensor([0.0240, 0.0240, 0.0275, 0.0217, 0.0261, 0.0362, 0.0258, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 22:37:01,162 INFO [train2.py:809] (2/4) Epoch 21, batch 1650, loss[ctc_loss=0.1023, att_loss=0.2615, loss=0.2296, over 14177.00 frames. utt_duration=382 frames, utt_pad_proportion=0.3215, over 149.00 utterances.], tot_loss[ctc_loss=0.07411, att_loss=0.2366, loss=0.2041, over 3276736.08 frames. utt_duration=1231 frames, utt_pad_proportion=0.05891, over 10661.48 utterances.], batch size: 149, lr: 5.14e-03, grad_scale: 8.0 2023-03-08 22:37:25,607 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.957e+02 2.223e+02 2.631e+02 7.416e+02, threshold=4.446e+02, percent-clipped=3.0 2023-03-08 22:37:58,132 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=81360.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:38:01,318 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=81362.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:38:23,010 INFO [train2.py:809] (2/4) Epoch 21, batch 1700, loss[ctc_loss=0.1021, att_loss=0.2573, loss=0.2263, over 14115.00 frames. utt_duration=388.1 frames, utt_pad_proportion=0.3227, over 146.00 utterances.], tot_loss[ctc_loss=0.07408, att_loss=0.2363, loss=0.2039, over 3277141.23 frames. utt_duration=1228 frames, utt_pad_proportion=0.05945, over 10690.43 utterances.], batch size: 146, lr: 5.14e-03, grad_scale: 8.0 2023-03-08 22:38:49,079 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81392.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:39:44,349 INFO [train2.py:809] (2/4) Epoch 21, batch 1750, loss[ctc_loss=0.06788, att_loss=0.2192, loss=0.1889, over 15896.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.008724, over 39.00 utterances.], tot_loss[ctc_loss=0.07404, att_loss=0.2359, loss=0.2035, over 3266552.20 frames. 
utt_duration=1246 frames, utt_pad_proportion=0.05787, over 10499.54 utterances.], batch size: 39, lr: 5.14e-03, grad_scale: 8.0 2023-03-08 22:39:49,350 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81429.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:40:04,204 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0786, 4.4973, 4.6135, 4.9233, 2.6758, 4.6588, 2.6174, 1.7887], device='cuda:2'), covar=tensor([0.0451, 0.0226, 0.0543, 0.0182, 0.1683, 0.0181, 0.1487, 0.1708], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0161, 0.0256, 0.0155, 0.0219, 0.0141, 0.0228, 0.0201], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:40:06,984 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81440.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:40:08,454 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.362e+02 2.066e+02 2.468e+02 3.056e+02 6.142e+02, threshold=4.936e+02, percent-clipped=4.0 2023-03-08 22:40:59,665 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81472.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:41:05,523 INFO [train2.py:809] (2/4) Epoch 21, batch 1800, loss[ctc_loss=0.05927, att_loss=0.2043, loss=0.1753, over 15495.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008695, over 36.00 utterances.], tot_loss[ctc_loss=0.07438, att_loss=0.2361, loss=0.2038, over 3273496.24 frames. utt_duration=1258 frames, utt_pad_proportion=0.05346, over 10422.79 utterances.], batch size: 36, lr: 5.14e-03, grad_scale: 8.0 2023-03-08 22:41:07,200 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81477.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:42:25,224 INFO [train2.py:809] (2/4) Epoch 21, batch 1850, loss[ctc_loss=0.1117, att_loss=0.2641, loss=0.2336, over 17306.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02424, over 59.00 utterances.], tot_loss[ctc_loss=0.07428, att_loss=0.2357, loss=0.2034, over 3260282.99 frames. utt_duration=1273 frames, utt_pad_proportion=0.05139, over 10256.61 utterances.], batch size: 59, lr: 5.14e-03, grad_scale: 8.0 2023-03-08 22:42:48,076 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7379, 3.3550, 3.8311, 3.2242, 3.7258, 4.8311, 4.6375, 3.3311], device='cuda:2'), covar=tensor([0.0405, 0.1361, 0.1095, 0.1285, 0.1035, 0.0744, 0.0593, 0.1333], device='cuda:2'), in_proj_covar=tensor([0.0243, 0.0242, 0.0278, 0.0218, 0.0264, 0.0364, 0.0262, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 22:42:49,240 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.850e+02 2.304e+02 2.710e+02 5.976e+02, threshold=4.608e+02, percent-clipped=1.0 2023-03-08 22:43:45,861 INFO [train2.py:809] (2/4) Epoch 21, batch 1900, loss[ctc_loss=0.06811, att_loss=0.2181, loss=0.1881, over 15883.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009534, over 39.00 utterances.], tot_loss[ctc_loss=0.07416, att_loss=0.2357, loss=0.2034, over 3261836.38 frames. 
utt_duration=1268 frames, utt_pad_proportion=0.05248, over 10302.26 utterances.], batch size: 39, lr: 5.13e-03, grad_scale: 8.0 2023-03-08 22:44:43,725 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1186, 4.6293, 4.5071, 4.7517, 3.0721, 4.6199, 3.1410, 2.1281], device='cuda:2'), covar=tensor([0.0461, 0.0220, 0.0597, 0.0204, 0.1279, 0.0187, 0.1165, 0.1578], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0161, 0.0255, 0.0154, 0.0218, 0.0140, 0.0226, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:44:48,288 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0683, 5.3971, 5.6297, 5.4382, 5.5132, 6.0250, 5.3312, 6.1362], device='cuda:2'), covar=tensor([0.0862, 0.0712, 0.0717, 0.1369, 0.1964, 0.0923, 0.0650, 0.0649], device='cuda:2'), in_proj_covar=tensor([0.0877, 0.0512, 0.0599, 0.0668, 0.0882, 0.0636, 0.0491, 0.0611], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 22:45:07,132 INFO [train2.py:809] (2/4) Epoch 21, batch 1950, loss[ctc_loss=0.06449, att_loss=0.234, loss=0.2001, over 16967.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007378, over 50.00 utterances.], tot_loss[ctc_loss=0.07378, att_loss=0.2352, loss=0.2029, over 3266753.84 frames. utt_duration=1298 frames, utt_pad_proportion=0.04462, over 10080.54 utterances.], batch size: 50, lr: 5.13e-03, grad_scale: 8.0 2023-03-08 22:45:31,392 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.479e+02 1.984e+02 2.266e+02 2.808e+02 6.592e+02, threshold=4.533e+02, percent-clipped=3.0 2023-03-08 22:45:54,451 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81655.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:45:58,064 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81657.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:46:18,709 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81670.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:46:27,651 INFO [train2.py:809] (2/4) Epoch 21, batch 2000, loss[ctc_loss=0.07592, att_loss=0.2573, loss=0.221, over 17076.00 frames. utt_duration=1290 frames, utt_pad_proportion=0.008397, over 53.00 utterances.], tot_loss[ctc_loss=0.07397, att_loss=0.2356, loss=0.2033, over 3270726.12 frames. utt_duration=1291 frames, utt_pad_proportion=0.04446, over 10143.13 utterances.], batch size: 53, lr: 5.13e-03, grad_scale: 8.0 2023-03-08 22:47:07,634 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1104, 4.5344, 4.4845, 4.7137, 2.6705, 4.5441, 2.6388, 1.8096], device='cuda:2'), covar=tensor([0.0483, 0.0232, 0.0634, 0.0185, 0.1665, 0.0197, 0.1527, 0.1805], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0162, 0.0256, 0.0154, 0.0219, 0.0141, 0.0227, 0.0201], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:47:47,653 INFO [train2.py:809] (2/4) Epoch 21, batch 2050, loss[ctc_loss=0.08165, att_loss=0.2415, loss=0.2095, over 16401.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.007616, over 44.00 utterances.], tot_loss[ctc_loss=0.07314, att_loss=0.2349, loss=0.2025, over 3266507.75 frames. 
utt_duration=1306 frames, utt_pad_proportion=0.04142, over 10014.26 utterances.], batch size: 44, lr: 5.13e-03, grad_scale: 8.0 2023-03-08 22:47:55,819 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=81731.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 22:48:11,612 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.378e+02 1.913e+02 2.441e+02 2.878e+02 1.219e+03, threshold=4.882e+02, percent-clipped=4.0 2023-03-08 22:48:42,868 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.75 vs. limit=5.0 2023-03-08 22:49:02,498 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81772.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:49:08,387 INFO [train2.py:809] (2/4) Epoch 21, batch 2100, loss[ctc_loss=0.09443, att_loss=0.2646, loss=0.2306, over 17066.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.008113, over 52.00 utterances.], tot_loss[ctc_loss=0.07271, att_loss=0.2347, loss=0.2023, over 3270060.00 frames. utt_duration=1317 frames, utt_pad_proportion=0.03785, over 9940.42 utterances.], batch size: 52, lr: 5.13e-03, grad_scale: 8.0 2023-03-08 22:49:20,288 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1004, 5.1420, 4.9149, 3.0753, 4.9093, 4.6955, 4.5077, 2.8269], device='cuda:2'), covar=tensor([0.0120, 0.0111, 0.0258, 0.0968, 0.0098, 0.0222, 0.0275, 0.1423], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0102, 0.0105, 0.0111, 0.0085, 0.0113, 0.0100, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 22:50:03,684 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1283, 5.3715, 5.2938, 5.2185, 5.3873, 5.3375, 5.0779, 4.8438], device='cuda:2'), covar=tensor([0.0901, 0.0482, 0.0293, 0.0567, 0.0273, 0.0296, 0.0393, 0.0331], device='cuda:2'), in_proj_covar=tensor([0.0514, 0.0357, 0.0338, 0.0354, 0.0413, 0.0421, 0.0350, 0.0387], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 22:50:20,895 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81820.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:50:30,138 INFO [train2.py:809] (2/4) Epoch 21, batch 2150, loss[ctc_loss=0.06257, att_loss=0.2031, loss=0.175, over 15480.00 frames. utt_duration=1721 frames, utt_pad_proportion=0.009638, over 36.00 utterances.], tot_loss[ctc_loss=0.07206, att_loss=0.2347, loss=0.2022, over 3277399.19 frames. utt_duration=1320 frames, utt_pad_proportion=0.03505, over 9939.78 utterances.], batch size: 36, lr: 5.13e-03, grad_scale: 8.0 2023-03-08 22:50:37,921 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.16 vs. limit=5.0 2023-03-08 22:50:38,491 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.73 vs. limit=2.0 2023-03-08 22:50:54,306 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.279e+02 1.964e+02 2.354e+02 2.825e+02 8.417e+02, threshold=4.708e+02, percent-clipped=3.0 2023-03-08 22:51:23,767 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.79 vs. 
limit=5.0 2023-03-08 22:51:30,904 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1227, 5.1355, 4.9121, 3.3268, 4.9238, 4.7370, 4.5876, 3.0666], device='cuda:2'), covar=tensor([0.0131, 0.0096, 0.0274, 0.0818, 0.0099, 0.0198, 0.0247, 0.1187], device='cuda:2'), in_proj_covar=tensor([0.0074, 0.0102, 0.0104, 0.0110, 0.0085, 0.0112, 0.0100, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 22:51:50,722 INFO [train2.py:809] (2/4) Epoch 21, batch 2200, loss[ctc_loss=0.06406, att_loss=0.2186, loss=0.1877, over 15376.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.01088, over 35.00 utterances.], tot_loss[ctc_loss=0.07204, att_loss=0.2344, loss=0.2019, over 3270078.52 frames. utt_duration=1321 frames, utt_pad_proportion=0.03815, over 9916.56 utterances.], batch size: 35, lr: 5.13e-03, grad_scale: 8.0 2023-03-08 22:53:11,518 INFO [train2.py:809] (2/4) Epoch 21, batch 2250, loss[ctc_loss=0.07358, att_loss=0.2525, loss=0.2167, over 16960.00 frames. utt_duration=686.6 frames, utt_pad_proportion=0.1385, over 99.00 utterances.], tot_loss[ctc_loss=0.07197, att_loss=0.2341, loss=0.2017, over 3266281.22 frames. utt_duration=1314 frames, utt_pad_proportion=0.04144, over 9958.07 utterances.], batch size: 99, lr: 5.12e-03, grad_scale: 8.0 2023-03-08 22:53:35,600 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.367e+02 2.106e+02 2.465e+02 3.018e+02 5.655e+02, threshold=4.929e+02, percent-clipped=3.0 2023-03-08 22:53:58,782 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81955.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:54:02,656 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81957.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:54:32,963 INFO [train2.py:809] (2/4) Epoch 21, batch 2300, loss[ctc_loss=0.07884, att_loss=0.2449, loss=0.2117, over 16617.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005857, over 47.00 utterances.], tot_loss[ctc_loss=0.07257, att_loss=0.2348, loss=0.2024, over 3269070.37 frames. 
utt_duration=1288 frames, utt_pad_proportion=0.04753, over 10164.94 utterances.], batch size: 47, lr: 5.12e-03, grad_scale: 8.0 2023-03-08 22:54:50,578 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81987.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:54:56,749 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8732, 5.0990, 5.0815, 5.0941, 5.1551, 5.1137, 4.8109, 4.6161], device='cuda:2'), covar=tensor([0.1096, 0.0604, 0.0327, 0.0486, 0.0299, 0.0338, 0.0445, 0.0394], device='cuda:2'), in_proj_covar=tensor([0.0518, 0.0360, 0.0341, 0.0354, 0.0414, 0.0421, 0.0350, 0.0390], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 22:55:22,031 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82003.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:55:25,322 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82005.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:55:38,215 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1876, 2.7187, 3.3876, 4.3730, 3.7684, 3.8209, 2.8122, 2.0613], device='cuda:2'), covar=tensor([0.0822, 0.2047, 0.0876, 0.0542, 0.1034, 0.0528, 0.1601, 0.2528], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0217, 0.0192, 0.0219, 0.0225, 0.0178, 0.0203, 0.0191], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:55:59,321 INFO [train2.py:809] (2/4) Epoch 21, batch 2350, loss[ctc_loss=0.07086, att_loss=0.2268, loss=0.1956, over 14143.00 frames. utt_duration=1827 frames, utt_pad_proportion=0.04409, over 31.00 utterances.], tot_loss[ctc_loss=0.07297, att_loss=0.2347, loss=0.2024, over 3265434.87 frames. utt_duration=1253 frames, utt_pad_proportion=0.05661, over 10439.08 utterances.], batch size: 31, lr: 5.12e-03, grad_scale: 8.0 2023-03-08 22:55:59,553 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82026.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 22:56:22,868 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.390e+02 1.965e+02 2.374e+02 2.963e+02 4.469e+02, threshold=4.748e+02, percent-clipped=0.0 2023-03-08 22:56:35,687 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82048.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:56:47,286 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82055.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:57:20,433 INFO [train2.py:809] (2/4) Epoch 21, batch 2400, loss[ctc_loss=0.06725, att_loss=0.2266, loss=0.1947, over 16117.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006222, over 42.00 utterances.], tot_loss[ctc_loss=0.073, att_loss=0.2353, loss=0.2029, over 3270928.14 frames. 
utt_duration=1268 frames, utt_pad_proportion=0.05052, over 10331.07 utterances.], batch size: 42, lr: 5.12e-03, grad_scale: 8.0 2023-03-08 22:58:14,311 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7024, 3.5102, 3.4558, 3.0207, 3.4574, 3.5141, 3.5168, 2.6390], device='cuda:2'), covar=tensor([0.1174, 0.3002, 0.2734, 0.3078, 0.1467, 0.2941, 0.1151, 0.3872], device='cuda:2'), in_proj_covar=tensor([0.0171, 0.0185, 0.0196, 0.0249, 0.0154, 0.0256, 0.0176, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 22:58:25,025 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82116.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 22:58:40,645 INFO [train2.py:809] (2/4) Epoch 21, batch 2450, loss[ctc_loss=0.05283, att_loss=0.2216, loss=0.1878, over 16401.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.00691, over 44.00 utterances.], tot_loss[ctc_loss=0.07281, att_loss=0.2359, loss=0.2033, over 3277433.89 frames. utt_duration=1260 frames, utt_pad_proportion=0.05014, over 10419.79 utterances.], batch size: 44, lr: 5.12e-03, grad_scale: 8.0 2023-03-08 22:59:04,485 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.920e+01 1.856e+02 2.212e+02 2.703e+02 4.819e+02, threshold=4.424e+02, percent-clipped=1.0 2023-03-08 22:59:54,703 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9905, 5.0444, 4.9233, 2.3151, 1.9674, 2.9015, 2.2652, 3.7953], device='cuda:2'), covar=tensor([0.0822, 0.0321, 0.0255, 0.5205, 0.5914, 0.2655, 0.4104, 0.1742], device='cuda:2'), in_proj_covar=tensor([0.0357, 0.0276, 0.0269, 0.0246, 0.0344, 0.0334, 0.0252, 0.0366], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-08 23:00:00,558 INFO [train2.py:809] (2/4) Epoch 21, batch 2500, loss[ctc_loss=0.06376, att_loss=0.2215, loss=0.1899, over 16394.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007349, over 44.00 utterances.], tot_loss[ctc_loss=0.07356, att_loss=0.2359, loss=0.2034, over 3267922.17 frames. utt_duration=1262 frames, utt_pad_proportion=0.05206, over 10369.62 utterances.], batch size: 44, lr: 5.12e-03, grad_scale: 8.0 2023-03-08 23:00:15,472 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.74 vs. limit=2.0 2023-03-08 23:00:25,827 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1039, 5.3584, 5.2734, 5.1943, 5.4059, 5.3538, 5.0627, 4.8425], device='cuda:2'), covar=tensor([0.1003, 0.0521, 0.0311, 0.0563, 0.0276, 0.0321, 0.0392, 0.0332], device='cuda:2'), in_proj_covar=tensor([0.0519, 0.0362, 0.0344, 0.0357, 0.0417, 0.0423, 0.0353, 0.0392], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 23:01:21,093 INFO [train2.py:809] (2/4) Epoch 21, batch 2550, loss[ctc_loss=0.05497, att_loss=0.2125, loss=0.181, over 15874.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.01006, over 39.00 utterances.], tot_loss[ctc_loss=0.07372, att_loss=0.2358, loss=0.2034, over 3269748.40 frames. 
utt_duration=1265 frames, utt_pad_proportion=0.05163, over 10352.41 utterances.], batch size: 39, lr: 5.11e-03, grad_scale: 8.0 2023-03-08 23:01:45,116 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.468e+02 1.926e+02 2.375e+02 3.006e+02 1.037e+03, threshold=4.749e+02, percent-clipped=4.0 2023-03-08 23:02:07,459 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7395, 2.1895, 2.6130, 2.7581, 2.9146, 2.4593, 2.4150, 3.3543], device='cuda:2'), covar=tensor([0.1729, 0.3566, 0.2401, 0.1541, 0.1512, 0.1679, 0.2761, 0.1293], device='cuda:2'), in_proj_covar=tensor([0.0117, 0.0125, 0.0121, 0.0109, 0.0124, 0.0108, 0.0129, 0.0099], device='cuda:2'), out_proj_covar=tensor([8.8289e-05, 9.6544e-05, 9.5787e-05, 8.5026e-05, 9.1781e-05, 8.6299e-05, 9.7273e-05, 7.9172e-05], device='cuda:2') 2023-03-08 23:02:42,044 INFO [train2.py:809] (2/4) Epoch 21, batch 2600, loss[ctc_loss=0.09492, att_loss=0.2608, loss=0.2276, over 17135.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01336, over 56.00 utterances.], tot_loss[ctc_loss=0.07376, att_loss=0.2355, loss=0.2031, over 3264613.32 frames. utt_duration=1256 frames, utt_pad_proportion=0.05555, over 10410.64 utterances.], batch size: 56, lr: 5.11e-03, grad_scale: 8.0 2023-03-08 23:03:37,200 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6616, 3.4122, 2.8410, 3.1110, 3.5322, 3.2680, 2.7843, 3.5686], device='cuda:2'), covar=tensor([0.1119, 0.0484, 0.1144, 0.0715, 0.0747, 0.0782, 0.0853, 0.0507], device='cuda:2'), in_proj_covar=tensor([0.0199, 0.0214, 0.0224, 0.0197, 0.0275, 0.0238, 0.0197, 0.0282], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 23:04:03,371 INFO [train2.py:809] (2/4) Epoch 21, batch 2650, loss[ctc_loss=0.07002, att_loss=0.2445, loss=0.2096, over 17107.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.01579, over 56.00 utterances.], tot_loss[ctc_loss=0.07345, att_loss=0.2351, loss=0.2028, over 3264929.49 frames. utt_duration=1250 frames, utt_pad_proportion=0.05644, over 10464.56 utterances.], batch size: 56, lr: 5.11e-03, grad_scale: 8.0 2023-03-08 23:04:03,714 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=82326.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:04:27,511 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 1.898e+02 2.205e+02 2.899e+02 5.130e+02, threshold=4.409e+02, percent-clipped=1.0 2023-03-08 23:04:31,600 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82343.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:04:36,587 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82346.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:05:21,678 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82374.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:05:24,739 INFO [train2.py:809] (2/4) Epoch 21, batch 2700, loss[ctc_loss=0.09582, att_loss=0.2665, loss=0.2324, over 17049.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.009231, over 53.00 utterances.], tot_loss[ctc_loss=0.07319, att_loss=0.2356, loss=0.2031, over 3265319.46 frames. 
utt_duration=1245 frames, utt_pad_proportion=0.05702, over 10506.56 utterances.], batch size: 53, lr: 5.11e-03, grad_scale: 8.0 2023-03-08 23:06:06,853 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1828, 5.1549, 4.9692, 2.8996, 4.9839, 4.7274, 4.2498, 2.9472], device='cuda:2'), covar=tensor([0.0099, 0.0088, 0.0261, 0.1010, 0.0091, 0.0194, 0.0364, 0.1250], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0102, 0.0105, 0.0112, 0.0085, 0.0114, 0.0100, 0.0106], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 23:06:07,625 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. limit=2.0 2023-03-08 23:06:16,102 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82407.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:06:22,272 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82411.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:06:46,063 INFO [train2.py:809] (2/4) Epoch 21, batch 2750, loss[ctc_loss=0.05708, att_loss=0.2143, loss=0.1829, over 14070.00 frames. utt_duration=1817 frames, utt_pad_proportion=0.05857, over 31.00 utterances.], tot_loss[ctc_loss=0.07284, att_loss=0.2352, loss=0.2027, over 3266665.84 frames. utt_duration=1245 frames, utt_pad_proportion=0.05678, over 10505.59 utterances.], batch size: 31, lr: 5.11e-03, grad_scale: 8.0 2023-03-08 23:07:10,785 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.410e+02 1.903e+02 2.360e+02 2.836e+02 7.135e+02, threshold=4.721e+02, percent-clipped=4.0 2023-03-08 23:07:32,046 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7270, 3.1899, 3.7298, 3.3029, 3.7581, 4.7818, 4.5174, 3.3701], device='cuda:2'), covar=tensor([0.0342, 0.1605, 0.1236, 0.1325, 0.0962, 0.0749, 0.0644, 0.1263], device='cuda:2'), in_proj_covar=tensor([0.0245, 0.0243, 0.0279, 0.0221, 0.0265, 0.0366, 0.0263, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:08:06,984 INFO [train2.py:809] (2/4) Epoch 21, batch 2800, loss[ctc_loss=0.08, att_loss=0.2475, loss=0.214, over 17293.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01252, over 55.00 utterances.], tot_loss[ctc_loss=0.07237, att_loss=0.235, loss=0.2025, over 3270704.40 frames. utt_duration=1252 frames, utt_pad_proportion=0.05396, over 10463.71 utterances.], batch size: 55, lr: 5.11e-03, grad_scale: 8.0 2023-03-08 23:08:34,714 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82493.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 23:09:20,384 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5575, 2.5464, 5.0486, 4.0780, 2.9805, 4.3003, 4.9084, 4.7368], device='cuda:2'), covar=tensor([0.0296, 0.1668, 0.0197, 0.0837, 0.1708, 0.0261, 0.0162, 0.0250], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0241, 0.0185, 0.0309, 0.0265, 0.0213, 0.0174, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 23:09:26,647 INFO [train2.py:809] (2/4) Epoch 21, batch 2850, loss[ctc_loss=0.06247, att_loss=0.2157, loss=0.1851, over 16174.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.006765, over 41.00 utterances.], tot_loss[ctc_loss=0.07318, att_loss=0.236, loss=0.2035, over 3282128.57 frames. 
utt_duration=1248 frames, utt_pad_proportion=0.05134, over 10533.59 utterances.], batch size: 41, lr: 5.11e-03, grad_scale: 8.0 2023-03-08 23:09:50,385 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.302e+02 1.879e+02 2.361e+02 2.971e+02 6.221e+02, threshold=4.723e+02, percent-clipped=6.0 2023-03-08 23:09:52,794 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9117, 6.1187, 5.5241, 5.8713, 5.8023, 5.2141, 5.5543, 5.2351], device='cuda:2'), covar=tensor([0.1135, 0.0833, 0.0902, 0.0735, 0.0847, 0.1466, 0.2227, 0.2171], device='cuda:2'), in_proj_covar=tensor([0.0515, 0.0599, 0.0453, 0.0449, 0.0421, 0.0461, 0.0600, 0.0518], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 23:09:57,816 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3121, 1.7458, 1.9615, 2.2229, 2.1895, 2.0814, 1.8153, 2.5650], device='cuda:2'), covar=tensor([0.1180, 0.1910, 0.1682, 0.1462, 0.1473, 0.1090, 0.1656, 0.1129], device='cuda:2'), in_proj_covar=tensor([0.0116, 0.0124, 0.0119, 0.0107, 0.0122, 0.0106, 0.0128, 0.0098], device='cuda:2'), out_proj_covar=tensor([8.7563e-05, 9.5919e-05, 9.4888e-05, 8.4052e-05, 9.0571e-05, 8.4904e-05, 9.6187e-05, 7.8490e-05], device='cuda:2') 2023-03-08 23:10:11,722 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82554.0, num_to_drop=1, layers_to_drop={0} 2023-03-08 23:10:35,779 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82569.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:10:46,920 INFO [train2.py:809] (2/4) Epoch 21, batch 2900, loss[ctc_loss=0.05292, att_loss=0.2165, loss=0.1838, over 15633.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008962, over 37.00 utterances.], tot_loss[ctc_loss=0.07319, att_loss=0.2353, loss=0.2029, over 3272650.38 frames. utt_duration=1217 frames, utt_pad_proportion=0.06112, over 10771.44 utterances.], batch size: 37, lr: 5.10e-03, grad_scale: 8.0 2023-03-08 23:10:55,027 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0947, 5.0803, 4.8512, 2.7252, 4.9170, 4.6018, 4.1914, 2.7771], device='cuda:2'), covar=tensor([0.0102, 0.0092, 0.0252, 0.1140, 0.0092, 0.0209, 0.0352, 0.1372], device='cuda:2'), in_proj_covar=tensor([0.0074, 0.0101, 0.0104, 0.0111, 0.0085, 0.0112, 0.0099, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-08 23:12:07,973 INFO [train2.py:809] (2/4) Epoch 21, batch 2950, loss[ctc_loss=0.06341, att_loss=0.2336, loss=0.1996, over 16553.00 frames. utt_duration=1473 frames, utt_pad_proportion=0.005687, over 45.00 utterances.], tot_loss[ctc_loss=0.07289, att_loss=0.2356, loss=0.2031, over 3283188.31 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05276, over 10587.83 utterances.], batch size: 45, lr: 5.10e-03, grad_scale: 8.0 2023-03-08 23:12:14,627 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82630.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:12:31,984 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.421e+02 1.787e+02 2.245e+02 2.585e+02 7.263e+02, threshold=4.491e+02, percent-clipped=3.0 2023-03-08 23:12:36,027 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=82643.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:13:27,963 INFO [train2.py:809] (2/4) Epoch 21, batch 3000, loss[ctc_loss=0.1371, att_loss=0.2784, loss=0.2501, over 13737.00 frames. utt_duration=377.9 frames, utt_pad_proportion=0.3416, over 146.00 utterances.], tot_loss[ctc_loss=0.07323, att_loss=0.2358, loss=0.2033, over 3276324.45 frames. utt_duration=1218 frames, utt_pad_proportion=0.06115, over 10773.72 utterances.], batch size: 146, lr: 5.10e-03, grad_scale: 8.0 2023-03-08 23:13:27,964 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 23:13:41,812 INFO [train2.py:843] (2/4) Epoch 21, validation: ctc_loss=0.04141, att_loss=0.2346, loss=0.196, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 23:13:41,813 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 23:14:05,991 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82691.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:14:08,684 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-08 23:14:24,620 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82702.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:14:38,809 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=82711.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:15:02,608 INFO [train2.py:809] (2/4) Epoch 21, batch 3050, loss[ctc_loss=0.09613, att_loss=0.2638, loss=0.2303, over 17270.00 frames. utt_duration=1173 frames, utt_pad_proportion=0.0253, over 59.00 utterances.], tot_loss[ctc_loss=0.07342, att_loss=0.2364, loss=0.2038, over 3276075.76 frames. 
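The "Computing validation loss" / "Maximum memory allocated so far" pair above fires at batch 3000, consistent with a validation pass every 3000 training batches; the memory figure comes from CUDA's allocator statistics. A rough sketch of that control flow (the dataloader, model and loss computation are placeholders, not the actual `train2.py` internals):

```python
# Illustrative control flow only: run validation every `valid_interval` batches
# and report the peak CUDA memory seen so far. `compute_loss` and `valid_loader`
# are placeholders.
import torch

def maybe_validate(model, valid_loader, compute_loss, batch_idx,
                   valid_interval=3000, device="cuda:0"):
    if batch_idx % valid_interval != 0:
        return None
    model.eval()
    total_loss, total_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            total_loss += float(loss)
            total_frames += num_frames
    model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation loss={total_loss / max(total_frames, 1):.4f}, "
          f"Maximum memory allocated so far is {peak_mb}MB")
    return total_loss
```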
utt_duration=1249 frames, utt_pad_proportion=0.0528, over 10500.86 utterances.], batch size: 59, lr: 5.10e-03, grad_scale: 16.0 2023-03-08 23:15:11,246 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4613, 2.6821, 2.8948, 4.3383, 4.0136, 3.9124, 3.0536, 2.3953], device='cuda:2'), covar=tensor([0.0620, 0.2041, 0.1268, 0.0679, 0.0684, 0.0429, 0.1251, 0.1986], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0213, 0.0191, 0.0220, 0.0223, 0.0178, 0.0202, 0.0188], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 23:15:26,168 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.870e+01 1.786e+02 2.187e+02 2.749e+02 5.841e+02, threshold=4.374e+02, percent-clipped=4.0 2023-03-08 23:15:47,932 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4683, 4.4094, 4.5296, 4.4750, 5.0839, 4.5148, 4.5914, 2.3448], device='cuda:2'), covar=tensor([0.0244, 0.0323, 0.0307, 0.0323, 0.0733, 0.0237, 0.0279, 0.1897], device='cuda:2'), in_proj_covar=tensor([0.0162, 0.0185, 0.0184, 0.0201, 0.0364, 0.0156, 0.0174, 0.0213], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 23:15:55,319 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82759.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:16:22,501 INFO [train2.py:809] (2/4) Epoch 21, batch 3100, loss[ctc_loss=0.06995, att_loss=0.2138, loss=0.1851, over 15875.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.008719, over 39.00 utterances.], tot_loss[ctc_loss=0.07427, att_loss=0.237, loss=0.2045, over 3271322.90 frames. utt_duration=1230 frames, utt_pad_proportion=0.05826, over 10647.72 utterances.], batch size: 39, lr: 5.10e-03, grad_scale: 16.0 2023-03-08 23:17:32,589 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.09 vs. limit=5.0 2023-03-08 23:17:43,777 INFO [train2.py:809] (2/4) Epoch 21, batch 3150, loss[ctc_loss=0.06279, att_loss=0.2059, loss=0.1772, over 13203.00 frames. utt_duration=1822 frames, utt_pad_proportion=0.1033, over 29.00 utterances.], tot_loss[ctc_loss=0.07434, att_loss=0.237, loss=0.2044, over 3265314.77 frames. 
utt_duration=1187 frames, utt_pad_proportion=0.06874, over 11018.50 utterances.], batch size: 29, lr: 5.10e-03, grad_scale: 8.0 2023-03-08 23:17:55,149 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2222, 3.8549, 3.2230, 3.4611, 4.0044, 3.6906, 2.9377, 4.3401], device='cuda:2'), covar=tensor([0.0941, 0.0506, 0.0995, 0.0659, 0.0710, 0.0657, 0.0890, 0.0422], device='cuda:2'), in_proj_covar=tensor([0.0203, 0.0217, 0.0226, 0.0199, 0.0278, 0.0241, 0.0200, 0.0283], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 23:18:09,382 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 2.205e+02 2.472e+02 3.065e+02 6.005e+02, threshold=4.943e+02, percent-clipped=5.0 2023-03-08 23:18:21,032 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82849.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 23:19:02,548 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7299, 3.2946, 3.7707, 3.3058, 3.7227, 4.7905, 4.6390, 3.5643], device='cuda:2'), covar=tensor([0.0285, 0.1429, 0.1204, 0.1241, 0.0946, 0.0762, 0.0492, 0.1031], device='cuda:2'), in_proj_covar=tensor([0.0247, 0.0245, 0.0282, 0.0222, 0.0267, 0.0368, 0.0263, 0.0233], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:19:03,800 INFO [train2.py:809] (2/4) Epoch 21, batch 3200, loss[ctc_loss=0.06708, att_loss=0.222, loss=0.191, over 15873.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.009555, over 39.00 utterances.], tot_loss[ctc_loss=0.07455, att_loss=0.2368, loss=0.2044, over 3268420.82 frames. utt_duration=1186 frames, utt_pad_proportion=0.06923, over 11040.26 utterances.], batch size: 39, lr: 5.09e-03, grad_scale: 8.0 2023-03-08 23:19:09,500 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9960, 3.6473, 3.7653, 2.9346, 3.7552, 3.8716, 3.7472, 2.4142], device='cuda:2'), covar=tensor([0.1147, 0.1682, 0.2443, 0.6197, 0.1930, 0.1916, 0.1010, 0.7198], device='cuda:2'), in_proj_covar=tensor([0.0173, 0.0186, 0.0197, 0.0250, 0.0156, 0.0256, 0.0178, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 23:19:22,579 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.21 vs. limit=5.0 2023-03-08 23:19:40,931 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2582, 3.8414, 3.2077, 3.4486, 4.0086, 3.6676, 3.1086, 4.3254], device='cuda:2'), covar=tensor([0.0961, 0.0511, 0.1095, 0.0696, 0.0706, 0.0716, 0.0822, 0.0459], device='cuda:2'), in_proj_covar=tensor([0.0201, 0.0214, 0.0224, 0.0196, 0.0274, 0.0238, 0.0197, 0.0280], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 23:20:23,211 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82925.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:20:24,592 INFO [train2.py:809] (2/4) Epoch 21, batch 3250, loss[ctc_loss=0.06176, att_loss=0.2303, loss=0.1966, over 16183.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006335, over 41.00 utterances.], tot_loss[ctc_loss=0.07373, att_loss=0.2358, loss=0.2034, over 3268848.70 frames. 
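The `grad_scale` value at the end of each loss line is the fp16 loss-scaling factor: it doubles after training runs for a while without overflow (16.0 around batches 3050-3100 above) and is halved again when an inf/nan gradient is detected (back to 8.0 from batch 3150 on). A generic sketch of that grow/backoff behaviour using PyTorch's AMP scaler, which icefall's own scaler only loosely resembles:

```python
# Generic dynamic loss scaling with torch.cuda.amp; shown purely to illustrate
# the grow/backoff behaviour reflected in the logged grad_scale values.
# Requires a CUDA device.
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=5.1e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_factor=2.0, backoff_factor=0.5)

for step in range(100):
    x = torch.randn(16, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)      # skipped automatically if inf/nan grads are found
    scaler.update()             # grows or backs off the scale
    if step % 50 == 0:
        print(step, scaler.get_scale())
```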
utt_duration=1213 frames, utt_pad_proportion=0.06347, over 10792.18 utterances.], batch size: 41, lr: 5.09e-03, grad_scale: 8.0 2023-03-08 23:20:50,590 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.258e+02 1.807e+02 2.273e+02 2.911e+02 7.187e+02, threshold=4.546e+02, percent-clipped=3.0 2023-03-08 23:21:24,861 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82963.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:21:45,456 INFO [train2.py:809] (2/4) Epoch 21, batch 3300, loss[ctc_loss=0.07005, att_loss=0.2345, loss=0.2016, over 16869.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007508, over 49.00 utterances.], tot_loss[ctc_loss=0.074, att_loss=0.2356, loss=0.2033, over 3267070.65 frames. utt_duration=1202 frames, utt_pad_proportion=0.06691, over 10885.34 utterances.], batch size: 49, lr: 5.09e-03, grad_scale: 8.0 2023-03-08 23:22:28,782 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83002.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:22:43,048 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9400, 4.0810, 3.7654, 4.2179, 2.5883, 4.0826, 2.4837, 1.7048], device='cuda:2'), covar=tensor([0.0437, 0.0235, 0.0929, 0.0308, 0.1740, 0.0267, 0.1627, 0.1880], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0163, 0.0254, 0.0154, 0.0217, 0.0142, 0.0226, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-08 23:22:54,059 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2023-03-08 23:23:03,043 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-03-08 23:23:04,082 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=83024.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:23:07,435 INFO [train2.py:809] (2/4) Epoch 21, batch 3350, loss[ctc_loss=0.08459, att_loss=0.2417, loss=0.2103, over 17468.00 frames. utt_duration=886.2 frames, utt_pad_proportion=0.07209, over 79.00 utterances.], tot_loss[ctc_loss=0.07465, att_loss=0.236, loss=0.2037, over 3270204.72 frames. utt_duration=1185 frames, utt_pad_proportion=0.06974, over 11053.10 utterances.], batch size: 79, lr: 5.09e-03, grad_scale: 8.0 2023-03-08 23:23:33,600 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.287e+02 1.819e+02 2.175e+02 2.592e+02 7.580e+02, threshold=4.350e+02, percent-clipped=2.0 2023-03-08 23:23:46,016 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83050.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:24:07,282 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.78 vs. limit=5.0 2023-03-08 23:24:26,033 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1697, 6.2944, 5.8113, 6.0863, 6.0301, 5.4595, 5.8209, 5.5083], device='cuda:2'), covar=tensor([0.1111, 0.0901, 0.0967, 0.0759, 0.0842, 0.1627, 0.1864, 0.2296], device='cuda:2'), in_proj_covar=tensor([0.0520, 0.0605, 0.0454, 0.0455, 0.0422, 0.0462, 0.0601, 0.0517], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 23:24:28,076 INFO [train2.py:809] (2/4) Epoch 21, batch 3400, loss[ctc_loss=0.07588, att_loss=0.2128, loss=0.1854, over 14086.00 frames. 
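The `zipformer.py:625` lines record, per encoder stack and step, the warmup window (`warmup_begin`/`warmup_end`), the running `batch_count`, and which layers (if any) are bypassed on this step; at this point in training (`batch_count` ≈ 83k) the set is almost always empty, whereas one or two layers were dropped regularly near the start of epoch 1. The exact rule lives in `zipformer.py`; the following is only a hypothetical sketch with the same shape, i.e. a drop probability that decays as training moves through and past the warmup window:

```python
# Hypothetical layer-drop rule, NOT the actual zipformer.py logic: drop each
# layer independently with a probability that decays across the warmup window.
import random

def choose_layers_to_drop(num_layers, batch_count, warmup_begin, warmup_end,
                          p_start=0.1, p_end=0.01, rng=random):
    if batch_count >= warmup_end:
        p = p_end
    elif batch_count <= warmup_begin:
        p = p_start
    else:
        frac = (batch_count - warmup_begin) / (warmup_end - warmup_begin)
        p = p_start + frac * (p_end - p_start)
    return {i for i in range(num_layers) if rng.random() < p}

print(choose_layers_to_drop(num_layers=4, batch_count=83024.0,
                            warmup_begin=2666.7, warmup_end=3333.3))
```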
utt_duration=1819 frames, utt_pad_proportion=0.04792, over 31.00 utterances.], tot_loss[ctc_loss=0.07481, att_loss=0.2369, loss=0.2045, over 3275456.03 frames. utt_duration=1184 frames, utt_pad_proportion=0.06927, over 11082.54 utterances.], batch size: 31, lr: 5.09e-03, grad_scale: 8.0 2023-03-08 23:24:35,053 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.13 vs. limit=2.0 2023-03-08 23:25:09,379 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3257, 2.1065, 2.0493, 2.2859, 2.5967, 2.2191, 1.9977, 2.5534], device='cuda:2'), covar=tensor([0.1918, 0.3139, 0.2659, 0.1694, 0.2100, 0.1676, 0.2330, 0.1189], device='cuda:2'), in_proj_covar=tensor([0.0118, 0.0125, 0.0121, 0.0109, 0.0124, 0.0106, 0.0129, 0.0099], device='cuda:2'), out_proj_covar=tensor([8.9027e-05, 9.6650e-05, 9.6354e-05, 8.5186e-05, 9.1919e-05, 8.5328e-05, 9.7070e-05, 7.9323e-05], device='cuda:2') 2023-03-08 23:25:26,879 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. limit=2.0 2023-03-08 23:25:47,342 INFO [train2.py:809] (2/4) Epoch 21, batch 3450, loss[ctc_loss=0.06921, att_loss=0.2141, loss=0.1851, over 15630.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.007534, over 37.00 utterances.], tot_loss[ctc_loss=0.07535, att_loss=0.2376, loss=0.2051, over 3275149.85 frames. utt_duration=1199 frames, utt_pad_proportion=0.06618, over 10937.42 utterances.], batch size: 37, lr: 5.09e-03, grad_scale: 8.0 2023-03-08 23:25:59,518 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-08 23:26:13,618 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.358e+02 1.851e+02 2.238e+02 2.723e+02 5.748e+02, threshold=4.475e+02, percent-clipped=3.0 2023-03-08 23:26:24,478 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83149.0, num_to_drop=1, layers_to_drop={2} 2023-03-08 23:26:26,528 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2023-03-08 23:27:06,898 INFO [train2.py:809] (2/4) Epoch 21, batch 3500, loss[ctc_loss=0.1084, att_loss=0.2594, loss=0.2292, over 17302.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02382, over 59.00 utterances.], tot_loss[ctc_loss=0.07623, att_loss=0.2379, loss=0.2055, over 3269317.97 frames. utt_duration=1188 frames, utt_pad_proportion=0.07045, over 11020.71 utterances.], batch size: 59, lr: 5.09e-03, grad_scale: 8.0 2023-03-08 23:27:39,108 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83196.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:27:40,354 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83197.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 23:28:24,320 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4339, 2.6918, 3.6065, 2.9048, 3.5327, 4.5904, 4.4101, 3.0308], device='cuda:2'), covar=tensor([0.0400, 0.2014, 0.1342, 0.1491, 0.1075, 0.0827, 0.0554, 0.1500], device='cuda:2'), in_proj_covar=tensor([0.0247, 0.0244, 0.0281, 0.0220, 0.0264, 0.0368, 0.0262, 0.0233], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:28:25,785 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83225.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:28:26,899 INFO [train2.py:809] (2/4) Epoch 21, batch 3550, loss[ctc_loss=0.08089, att_loss=0.2497, loss=0.2159, over 16873.00 frames. 
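The `Whitening:` messages compare a statistic of the activation covariance against a per-module limit (2.0 for the grouped 96- and 192-channel modules, 5.0 for the 384-channel ones) and are printed when the statistic approaches or exceeds that limit. One whiteness statistic with the right behaviour, shown here as an assumed stand-in rather than the exact `scaling.py` formula, is mean(λ²)/mean(λ)² over the covariance eigenvalues: it equals 1 for an isotropic (white) covariance and grows as the spectrum becomes lopsided.

```python
# Assumed whiteness metric (not necessarily scaling.py's formula):
#   metric = mean(eig(C)^2) / mean(eig(C))^2, computed per channel group.
# It is 1.0 for an isotropic covariance and grows with anisotropy.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels); channels are split into contiguous groups
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames          # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)                              # (num_groups, channels_per_group)
    return (eigs.pow(2).mean(dim=1) / eigs.mean(dim=1).pow(2)).mean()

feats = torch.randn(1000, 96)
print(float(whitening_metric(feats, num_groups=8)))                # near 1.0 for white noise
```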
utt_duration=1379 frames, utt_pad_proportion=0.007332, over 49.00 utterances.], tot_loss[ctc_loss=0.07563, att_loss=0.238, loss=0.2055, over 3269698.82 frames. utt_duration=1188 frames, utt_pad_proportion=0.06967, over 11025.27 utterances.], batch size: 49, lr: 5.08e-03, grad_scale: 8.0 2023-03-08 23:28:35,516 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.11 vs. limit=5.0 2023-03-08 23:28:53,066 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.215e+02 1.836e+02 2.204e+02 2.637e+02 6.921e+02, threshold=4.407e+02, percent-clipped=5.0 2023-03-08 23:29:16,516 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=83257.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:29:41,887 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83273.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:29:47,121 INFO [train2.py:809] (2/4) Epoch 21, batch 3600, loss[ctc_loss=0.0762, att_loss=0.2443, loss=0.2107, over 16753.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.007206, over 48.00 utterances.], tot_loss[ctc_loss=0.07519, att_loss=0.2377, loss=0.2052, over 3271068.05 frames. utt_duration=1197 frames, utt_pad_proportion=0.06702, over 10946.68 utterances.], batch size: 48, lr: 5.08e-03, grad_scale: 8.0 2023-03-08 23:30:00,072 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83284.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:30:56,573 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=83319.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:31:07,600 INFO [train2.py:809] (2/4) Epoch 21, batch 3650, loss[ctc_loss=0.07568, att_loss=0.2211, loss=0.192, over 14586.00 frames. utt_duration=1825 frames, utt_pad_proportion=0.03704, over 32.00 utterances.], tot_loss[ctc_loss=0.07523, att_loss=0.2375, loss=0.205, over 3275468.90 frames. utt_duration=1201 frames, utt_pad_proportion=0.06474, over 10923.65 utterances.], batch size: 32, lr: 5.08e-03, grad_scale: 8.0 2023-03-08 23:31:33,944 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 2.085e+02 2.475e+02 3.033e+02 6.314e+02, threshold=4.951e+02, percent-clipped=5.0 2023-03-08 23:31:39,092 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=83345.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:32:27,566 INFO [train2.py:809] (2/4) Epoch 21, batch 3700, loss[ctc_loss=0.07042, att_loss=0.2365, loss=0.2033, over 16693.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005595, over 46.00 utterances.], tot_loss[ctc_loss=0.07508, att_loss=0.2365, loss=0.2042, over 3267398.36 frames. utt_duration=1209 frames, utt_pad_proportion=0.06548, over 10827.93 utterances.], batch size: 46, lr: 5.08e-03, grad_scale: 8.0 2023-03-08 23:33:44,785 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3489, 5.6405, 5.0585, 5.4298, 5.3114, 4.7991, 5.1153, 4.8774], device='cuda:2'), covar=tensor([0.1342, 0.0879, 0.0965, 0.0870, 0.0926, 0.1526, 0.2350, 0.2286], device='cuda:2'), in_proj_covar=tensor([0.0523, 0.0606, 0.0455, 0.0455, 0.0425, 0.0464, 0.0606, 0.0519], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-08 23:33:49,256 INFO [train2.py:809] (2/4) Epoch 21, batch 3750, loss[ctc_loss=0.05551, att_loss=0.231, loss=0.1959, over 16413.00 frames. 
utt_duration=1494 frames, utt_pad_proportion=0.005973, over 44.00 utterances.], tot_loss[ctc_loss=0.07486, att_loss=0.2364, loss=0.2041, over 3272672.31 frames. utt_duration=1215 frames, utt_pad_proportion=0.06229, over 10790.65 utterances.], batch size: 44, lr: 5.08e-03, grad_scale: 8.0 2023-03-08 23:34:15,115 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 1.925e+02 2.407e+02 3.142e+02 5.755e+02, threshold=4.814e+02, percent-clipped=4.0 2023-03-08 23:34:57,838 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-03-08 23:35:09,004 INFO [train2.py:809] (2/4) Epoch 21, batch 3800, loss[ctc_loss=0.08903, att_loss=0.2525, loss=0.2198, over 17087.00 frames. utt_duration=1222 frames, utt_pad_proportion=0.01521, over 56.00 utterances.], tot_loss[ctc_loss=0.0754, att_loss=0.2366, loss=0.2043, over 3274575.88 frames. utt_duration=1213 frames, utt_pad_proportion=0.06359, over 10814.17 utterances.], batch size: 56, lr: 5.08e-03, grad_scale: 8.0 2023-03-08 23:35:24,883 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1554, 5.4634, 5.6674, 5.5031, 5.6288, 6.0715, 5.4150, 6.1658], device='cuda:2'), covar=tensor([0.0757, 0.0771, 0.0856, 0.1413, 0.1995, 0.1021, 0.0633, 0.0708], device='cuda:2'), in_proj_covar=tensor([0.0870, 0.0511, 0.0598, 0.0662, 0.0874, 0.0627, 0.0485, 0.0607], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:36:29,146 INFO [train2.py:809] (2/4) Epoch 21, batch 3850, loss[ctc_loss=0.05738, att_loss=0.2156, loss=0.184, over 16011.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.007601, over 40.00 utterances.], tot_loss[ctc_loss=0.07471, att_loss=0.2356, loss=0.2034, over 3257589.62 frames. utt_duration=1202 frames, utt_pad_proportion=0.07132, over 10857.24 utterances.], batch size: 40, lr: 5.07e-03, grad_scale: 8.0 2023-03-08 23:36:53,732 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 2.030e+02 2.457e+02 3.139e+02 8.463e+02, threshold=4.915e+02, percent-clipped=7.0 2023-03-08 23:37:09,544 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=83552.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:37:46,939 INFO [train2.py:809] (2/4) Epoch 21, batch 3900, loss[ctc_loss=0.08232, att_loss=0.2537, loss=0.2194, over 17048.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.009057, over 52.00 utterances.], tot_loss[ctc_loss=0.07426, att_loss=0.2354, loss=0.2032, over 3258730.26 frames. utt_duration=1211 frames, utt_pad_proportion=0.06716, over 10774.39 utterances.], batch size: 52, lr: 5.07e-03, grad_scale: 8.0 2023-03-08 23:38:55,359 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83619.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:39:05,533 INFO [train2.py:809] (2/4) Epoch 21, batch 3950, loss[ctc_loss=0.06652, att_loss=0.2201, loss=0.1894, over 15872.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.009046, over 39.00 utterances.], tot_loss[ctc_loss=0.07447, att_loss=0.2365, loss=0.2041, over 3268234.89 frames. 
utt_duration=1209 frames, utt_pad_proportion=0.06546, over 10829.50 utterances.], batch size: 39, lr: 5.07e-03, grad_scale: 8.0 2023-03-08 23:39:27,336 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=83640.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:39:30,214 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.299e+02 1.844e+02 2.165e+02 2.793e+02 5.808e+02, threshold=4.331e+02, percent-clipped=4.0 2023-03-08 23:40:16,715 INFO [train2.py:809] (2/4) Epoch 22, batch 0, loss[ctc_loss=0.07209, att_loss=0.224, loss=0.1936, over 16175.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.006688, over 41.00 utterances.], tot_loss[ctc_loss=0.07209, att_loss=0.224, loss=0.1936, over 16175.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.006688, over 41.00 utterances.], batch size: 41, lr: 4.95e-03, grad_scale: 8.0 2023-03-08 23:40:16,715 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-08 23:40:29,660 INFO [train2.py:843] (2/4) Epoch 22, validation: ctc_loss=0.04004, att_loss=0.2341, loss=0.1953, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-08 23:40:29,661 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-08 23:40:40,581 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1370, 5.4639, 5.3440, 5.2571, 5.4184, 5.3770, 5.0637, 4.8386], device='cuda:2'), covar=tensor([0.0946, 0.0434, 0.0296, 0.0575, 0.0313, 0.0318, 0.0442, 0.0387], device='cuda:2'), in_proj_covar=tensor([0.0516, 0.0362, 0.0342, 0.0356, 0.0418, 0.0429, 0.0353, 0.0392], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 23:40:41,980 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83667.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:41:48,586 INFO [train2.py:809] (2/4) Epoch 22, batch 50, loss[ctc_loss=0.07904, att_loss=0.2495, loss=0.2154, over 17477.00 frames. utt_duration=1111 frames, utt_pad_proportion=0.02954, over 63.00 utterances.], tot_loss[ctc_loss=0.07192, att_loss=0.2329, loss=0.2007, over 736091.84 frames. 
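The learning rate decays smoothly within epoch 21 (5.11e-03 down to 5.07e-03 as the global batch count climbs past ~83k) and steps down to 4.95e-03 when epoch 22 starts. That is consistent with an Eden-style schedule, lr = base_lr · ((batch² + B²)/B²)^(-1/4) · ((epoch² + E²)/E²)^(-1/4) with base_lr = 0.05, B = 5000 batches and E = 3.5 epochs, under the assumption that the epoch term counts completed epochs. A sketch that reproduces the logged values under that assumption:

```python
# Eden-style LR schedule sketch. The constants match this run's configuration;
# the choice that `epoch` counts *completed* epochs is an assumption made here
# because it reproduces the logged learning rates.
def eden_lr(batch: float, epoch: float, base_lr: float = 0.05,
            lr_batches: float = 5000.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(batch=82300, epoch=20):.2e}")   # ~5.11e-03 (epoch 21, batch_count ~82.3k)
print(f"{eden_lr(batch=83660, epoch=21):.2e}")   # ~4.95e-03 (epoch 22, batch 0)
```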
utt_duration=1315 frames, utt_pad_proportion=0.03976, over 2242.21 utterances.], batch size: 63, lr: 4.95e-03, grad_scale: 8.0 2023-03-08 23:42:40,556 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.187e+02 1.991e+02 2.354e+02 2.967e+02 6.077e+02, threshold=4.708e+02, percent-clipped=4.0 2023-03-08 23:42:43,910 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8568, 6.1190, 5.5463, 5.8684, 5.7834, 5.2688, 5.5584, 5.2766], device='cuda:2'), covar=tensor([0.1339, 0.0820, 0.0960, 0.0785, 0.0905, 0.1667, 0.2216, 0.2333], device='cuda:2'), in_proj_covar=tensor([0.0522, 0.0602, 0.0455, 0.0452, 0.0422, 0.0463, 0.0602, 0.0518], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:2') 2023-03-08 23:43:01,782 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4723, 4.8855, 4.8236, 4.8716, 4.9657, 4.6118, 3.4249, 4.7872], device='cuda:2'), covar=tensor([0.0132, 0.0125, 0.0114, 0.0081, 0.0094, 0.0111, 0.0710, 0.0218], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0086, 0.0109, 0.0069, 0.0075, 0.0085, 0.0103, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:43:08,487 INFO [train2.py:809] (2/4) Epoch 22, batch 100, loss[ctc_loss=0.06008, att_loss=0.2275, loss=0.194, over 16626.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005484, over 47.00 utterances.], tot_loss[ctc_loss=0.07326, att_loss=0.2343, loss=0.2021, over 1293182.57 frames. utt_duration=1238 frames, utt_pad_proportion=0.05818, over 4182.24 utterances.], batch size: 47, lr: 4.95e-03, grad_scale: 8.0 2023-03-08 23:43:10,919 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83760.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:43:48,879 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9931, 5.1645, 5.1780, 5.0963, 5.2286, 5.1775, 4.8680, 4.6538], device='cuda:2'), covar=tensor([0.0979, 0.0554, 0.0280, 0.0526, 0.0293, 0.0327, 0.0386, 0.0365], device='cuda:2'), in_proj_covar=tensor([0.0516, 0.0361, 0.0343, 0.0355, 0.0418, 0.0429, 0.0353, 0.0391], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-08 23:44:29,209 INFO [train2.py:809] (2/4) Epoch 22, batch 150, loss[ctc_loss=0.08177, att_loss=0.235, loss=0.2044, over 17042.00 frames. utt_duration=690.2 frames, utt_pad_proportion=0.1351, over 99.00 utterances.], tot_loss[ctc_loss=0.07299, att_loss=0.2353, loss=0.2029, over 1741211.85 frames. 
utt_duration=1242 frames, utt_pad_proportion=0.05237, over 5612.92 utterances.], batch size: 99, lr: 4.95e-03, grad_scale: 8.0 2023-03-08 23:44:48,449 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=83821.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:45:21,310 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 1.932e+02 2.277e+02 2.985e+02 7.139e+02, threshold=4.555e+02, percent-clipped=2.0 2023-03-08 23:45:37,557 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83852.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:45:37,613 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2921, 4.0198, 3.3951, 3.6507, 4.1631, 3.8499, 3.1982, 4.4761], device='cuda:2'), covar=tensor([0.0894, 0.0468, 0.0967, 0.0619, 0.0628, 0.0638, 0.0831, 0.0375], device='cuda:2'), in_proj_covar=tensor([0.0203, 0.0215, 0.0227, 0.0198, 0.0277, 0.0241, 0.0199, 0.0285], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 23:45:49,298 INFO [train2.py:809] (2/4) Epoch 22, batch 200, loss[ctc_loss=0.0618, att_loss=0.2111, loss=0.1812, over 15652.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.007742, over 37.00 utterances.], tot_loss[ctc_loss=0.07284, att_loss=0.2356, loss=0.2031, over 2083662.86 frames. utt_duration=1228 frames, utt_pad_proportion=0.05594, over 6795.74 utterances.], batch size: 37, lr: 4.95e-03, grad_scale: 8.0 2023-03-08 23:46:28,881 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7993, 4.2786, 4.5327, 4.3218, 4.4042, 4.7035, 4.3959, 4.7966], device='cuda:2'), covar=tensor([0.0914, 0.0938, 0.0822, 0.1228, 0.1774, 0.1053, 0.2129, 0.0743], device='cuda:2'), in_proj_covar=tensor([0.0881, 0.0517, 0.0599, 0.0669, 0.0878, 0.0631, 0.0490, 0.0609], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:46:53,873 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83900.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:46:55,855 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1462, 3.9622, 3.3268, 3.4987, 4.1140, 3.7605, 3.0796, 4.3778], device='cuda:2'), covar=tensor([0.1009, 0.0554, 0.1013, 0.0678, 0.0703, 0.0730, 0.0834, 0.0592], device='cuda:2'), in_proj_covar=tensor([0.0204, 0.0217, 0.0229, 0.0199, 0.0279, 0.0243, 0.0200, 0.0288], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-08 23:47:08,786 INFO [train2.py:809] (2/4) Epoch 22, batch 250, loss[ctc_loss=0.06937, att_loss=0.2369, loss=0.2034, over 17204.00 frames. utt_duration=696.5 frames, utt_pad_proportion=0.1261, over 99.00 utterances.], tot_loss[ctc_loss=0.07381, att_loss=0.236, loss=0.2036, over 2349346.00 frames. 
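The periodic `attn_weights_entropy` dumps (together with the accompanying `covar`/`in_proj_covar`/`out_proj_covar` tensors) are diagnostics of how spread out the self-attention weights are inside each module; larger values indicate attention distributed over more frames. Purely as a generic illustration of such a diagnostic, and not necessarily the quantity `zipformer.py:1447` actually logs, a per-head attention entropy can be computed as:

```python
# Generic per-head attention-entropy diagnostic (illustrative only; the exact
# quantity logged by zipformer.py may differ).
import torch

def attention_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, query_len, key_len), rows sum to 1
    p = attn_weights.clamp(min=1e-20)
    ent = -(p * p.log()).sum(dim=-1)       # entropy per (head, query) in nats
    return ent.mean(dim=-1)                # average over queries -> one value per head

w = torch.softmax(torch.randn(8, 150, 150), dim=-1)
print(attention_entropy(w))                # values bounded by log(150) ~ 5.0 nats
```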
utt_duration=1205 frames, utt_pad_proportion=0.06118, over 7810.16 utterances.], batch size: 99, lr: 4.94e-03, grad_scale: 8.0 2023-03-08 23:47:57,194 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83940.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:47:59,819 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.391e+02 1.880e+02 2.225e+02 2.668e+02 4.764e+02, threshold=4.450e+02, percent-clipped=1.0 2023-03-08 23:48:11,527 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83949.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:48:27,880 INFO [train2.py:809] (2/4) Epoch 22, batch 300, loss[ctc_loss=0.04908, att_loss=0.2014, loss=0.171, over 15498.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008865, over 36.00 utterances.], tot_loss[ctc_loss=0.07354, att_loss=0.2354, loss=0.203, over 2557557.30 frames. utt_duration=1239 frames, utt_pad_proportion=0.05222, over 8266.30 utterances.], batch size: 36, lr: 4.94e-03, grad_scale: 8.0 2023-03-08 23:48:44,313 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.92 vs. limit=5.0 2023-03-08 23:48:46,392 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-08 23:49:09,352 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83985.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:49:13,745 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83988.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:49:51,794 INFO [train2.py:809] (2/4) Epoch 22, batch 350, loss[ctc_loss=0.06181, att_loss=0.2124, loss=0.1823, over 12379.00 frames. utt_duration=1835 frames, utt_pad_proportion=0.1197, over 27.00 utterances.], tot_loss[ctc_loss=0.07348, att_loss=0.2355, loss=0.2031, over 2713759.42 frames. utt_duration=1241 frames, utt_pad_proportion=0.0532, over 8755.73 utterances.], batch size: 27, lr: 4.94e-03, grad_scale: 8.0 2023-03-08 23:49:53,707 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84010.0, num_to_drop=1, layers_to_drop={3} 2023-03-08 23:49:59,927 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5912, 3.0351, 3.8458, 3.0847, 3.7874, 4.7428, 4.5913, 3.3987], device='cuda:2'), covar=tensor([0.0362, 0.1562, 0.1009, 0.1346, 0.0857, 0.0719, 0.0508, 0.1140], device='cuda:2'), in_proj_covar=tensor([0.0246, 0.0242, 0.0281, 0.0219, 0.0264, 0.0366, 0.0259, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:50:06,549 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.82 vs. 
limit=2.0 2023-03-08 23:50:43,092 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.569e+02 1.981e+02 2.404e+02 2.952e+02 5.790e+02, threshold=4.808e+02, percent-clipped=6.0 2023-03-08 23:50:50,298 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84046.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:50:51,779 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5127, 2.9185, 3.6523, 3.0012, 3.5829, 4.6624, 4.4867, 3.2190], device='cuda:2'), covar=tensor([0.0377, 0.1719, 0.1261, 0.1322, 0.1058, 0.0743, 0.0462, 0.1275], device='cuda:2'), in_proj_covar=tensor([0.0246, 0.0243, 0.0282, 0.0219, 0.0264, 0.0368, 0.0260, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:51:11,740 INFO [train2.py:809] (2/4) Epoch 22, batch 400, loss[ctc_loss=0.08759, att_loss=0.2536, loss=0.2204, over 17429.00 frames. utt_duration=884 frames, utt_pad_proportion=0.07535, over 79.00 utterances.], tot_loss[ctc_loss=0.0728, att_loss=0.235, loss=0.2026, over 2832458.16 frames. utt_duration=1275 frames, utt_pad_proportion=0.04679, over 8898.79 utterances.], batch size: 79, lr: 4.94e-03, grad_scale: 8.0 2023-03-08 23:51:57,487 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8676, 5.2005, 5.0972, 5.1051, 5.3214, 4.8753, 3.8987, 5.2250], device='cuda:2'), covar=tensor([0.0098, 0.0110, 0.0113, 0.0088, 0.0086, 0.0096, 0.0566, 0.0158], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0086, 0.0109, 0.0069, 0.0075, 0.0084, 0.0103, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:52:12,394 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84097.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:52:31,744 INFO [train2.py:809] (2/4) Epoch 22, batch 450, loss[ctc_loss=0.04743, att_loss=0.2187, loss=0.1844, over 16104.00 frames. utt_duration=1535 frames, utt_pad_proportion=0.007285, over 42.00 utterances.], tot_loss[ctc_loss=0.07244, att_loss=0.2349, loss=0.2024, over 2929229.50 frames. utt_duration=1287 frames, utt_pad_proportion=0.04503, over 9115.54 utterances.], batch size: 42, lr: 4.94e-03, grad_scale: 8.0 2023-03-08 23:52:42,549 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84116.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:53:22,925 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.259e+02 1.943e+02 2.393e+02 2.892e+02 4.484e+02, threshold=4.787e+02, percent-clipped=0.0 2023-03-08 23:53:39,935 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2023-03-08 23:53:50,183 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84158.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:53:51,322 INFO [train2.py:809] (2/4) Epoch 22, batch 500, loss[ctc_loss=0.07347, att_loss=0.244, loss=0.2099, over 16948.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.006964, over 50.00 utterances.], tot_loss[ctc_loss=0.07312, att_loss=0.2352, loss=0.2028, over 3002862.51 frames. 
utt_duration=1260 frames, utt_pad_proportion=0.05218, over 9544.04 utterances.], batch size: 50, lr: 4.94e-03, grad_scale: 8.0 2023-03-08 23:54:55,806 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84199.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:55:12,694 INFO [train2.py:809] (2/4) Epoch 22, batch 550, loss[ctc_loss=0.08828, att_loss=0.2631, loss=0.2281, over 17295.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02322, over 59.00 utterances.], tot_loss[ctc_loss=0.07319, att_loss=0.2359, loss=0.2033, over 3072132.63 frames. utt_duration=1268 frames, utt_pad_proportion=0.04755, over 9701.90 utterances.], batch size: 59, lr: 4.94e-03, grad_scale: 8.0 2023-03-08 23:55:55,792 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84236.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:56:04,937 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.221e+02 1.935e+02 2.218e+02 2.633e+02 6.667e+02, threshold=4.436e+02, percent-clipped=1.0 2023-03-08 23:56:33,882 INFO [train2.py:809] (2/4) Epoch 22, batch 600, loss[ctc_loss=0.05955, att_loss=0.2253, loss=0.1922, over 15933.00 frames. utt_duration=1556 frames, utt_pad_proportion=0.008394, over 41.00 utterances.], tot_loss[ctc_loss=0.07417, att_loss=0.237, loss=0.2044, over 3118848.18 frames. utt_duration=1219 frames, utt_pad_proportion=0.05699, over 10245.65 utterances.], batch size: 41, lr: 4.93e-03, grad_scale: 8.0 2023-03-08 23:56:35,682 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84260.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:56:44,741 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8849, 5.2246, 5.4101, 5.3190, 5.4220, 5.8660, 5.2099, 5.9354], device='cuda:2'), covar=tensor([0.0784, 0.0795, 0.0927, 0.1349, 0.1897, 0.0986, 0.0707, 0.0762], device='cuda:2'), in_proj_covar=tensor([0.0873, 0.0514, 0.0600, 0.0666, 0.0874, 0.0627, 0.0487, 0.0612], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:57:33,975 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2023-03-08 23:57:35,152 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84297.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:57:48,999 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84305.0, num_to_drop=1, layers_to_drop={1} 2023-03-08 23:57:55,136 INFO [train2.py:809] (2/4) Epoch 22, batch 650, loss[ctc_loss=0.08737, att_loss=0.2441, loss=0.2128, over 17196.00 frames. utt_duration=872 frames, utt_pad_proportion=0.08785, over 79.00 utterances.], tot_loss[ctc_loss=0.07406, att_loss=0.2369, loss=0.2043, over 3164859.08 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.04952, over 10228.89 utterances.], batch size: 79, lr: 4.93e-03, grad_scale: 8.0 2023-03-08 23:58:38,281 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84336.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:58:45,774 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84341.0, num_to_drop=0, layers_to_drop=set() 2023-03-08 23:58:47,101 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.197e+02 2.036e+02 2.451e+02 3.016e+02 6.584e+02, threshold=4.901e+02, percent-clipped=3.0 2023-03-08 23:59:13,065 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6522, 3.1108, 3.8688, 3.0604, 3.6976, 4.7218, 4.5099, 3.5382], device='cuda:2'), covar=tensor([0.0316, 0.1577, 0.1092, 0.1443, 0.0990, 0.0756, 0.0604, 0.1084], device='cuda:2'), in_proj_covar=tensor([0.0246, 0.0243, 0.0281, 0.0219, 0.0264, 0.0367, 0.0261, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-08 23:59:16,009 INFO [train2.py:809] (2/4) Epoch 22, batch 700, loss[ctc_loss=0.05394, att_loss=0.1978, loss=0.1691, over 12748.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.1207, over 28.00 utterances.], tot_loss[ctc_loss=0.07395, att_loss=0.2367, loss=0.2041, over 3184315.83 frames. utt_duration=1225 frames, utt_pad_proportion=0.05702, over 10406.87 utterances.], batch size: 28, lr: 4.93e-03, grad_scale: 8.0 2023-03-08 23:59:41,736 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9305, 3.7560, 3.7020, 3.1569, 3.7830, 3.7733, 3.7198, 2.7825], device='cuda:2'), covar=tensor([0.0898, 0.0984, 0.1298, 0.2914, 0.0822, 0.1168, 0.0610, 0.3286], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0190, 0.0202, 0.0257, 0.0159, 0.0262, 0.0182, 0.0219], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-08 23:59:51,978 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.99 vs. limit=5.0 2023-03-09 00:00:17,546 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84397.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:00:28,895 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3286, 2.4513, 4.8340, 3.8929, 2.9747, 4.1427, 4.4569, 4.5234], device='cuda:2'), covar=tensor([0.0282, 0.1739, 0.0160, 0.0910, 0.1659, 0.0284, 0.0191, 0.0264], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0244, 0.0191, 0.0315, 0.0267, 0.0218, 0.0178, 0.0210], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:00:37,602 INFO [train2.py:809] (2/4) Epoch 22, batch 750, loss[ctc_loss=0.09461, att_loss=0.2597, loss=0.2267, over 17239.00 frames. utt_duration=874.4 frames, utt_pad_proportion=0.08246, over 79.00 utterances.], tot_loss[ctc_loss=0.07367, att_loss=0.2365, loss=0.204, over 3212934.10 frames. 
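Each loss line also summarises the batch it was computed on: `utt_duration` is the mean utterance length in feature frames and `utt_pad_proportion` is the fraction of the padded (batch × max-length) feature grid that is padding rather than speech. Both follow directly from the per-utterance frame counts; a small sketch with illustrative names:

```python
# Batch padding statistics as they appear in the loss lines: mean utterance
# length in frames and the fraction of the padded (batch, max_len) grid that
# is padding. Function and variable names here are illustrative.
import torch

def batch_stats(num_frames: torch.Tensor):
    # num_frames: (batch,) tensor of per-utterance frame counts
    utt_duration = num_frames.float().mean()
    max_len = int(num_frames.max())
    utt_pad_proportion = 1.0 - num_frames.sum() / (len(num_frames) * max_len)
    return float(utt_duration), float(utt_pad_proportion)

lens = torch.tensor([1245, 1198, 1310, 1267])
print(batch_stats(lens))
```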
utt_duration=1245 frames, utt_pad_proportion=0.05032, over 10338.96 utterances.], batch size: 79, lr: 4.93e-03, grad_scale: 8.0 2023-03-09 00:00:48,939 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84416.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:00:50,424 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7544, 6.0144, 5.5151, 5.7559, 5.6871, 5.1998, 5.3986, 5.2386], device='cuda:2'), covar=tensor([0.1373, 0.1009, 0.0950, 0.0877, 0.0947, 0.1668, 0.2497, 0.2328], device='cuda:2'), in_proj_covar=tensor([0.0529, 0.0616, 0.0462, 0.0460, 0.0430, 0.0467, 0.0612, 0.0529], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 00:01:29,373 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.218e+02 1.936e+02 2.365e+02 2.996e+02 6.300e+02, threshold=4.730e+02, percent-clipped=3.0 2023-03-09 00:01:48,763 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84453.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:01:57,858 INFO [train2.py:809] (2/4) Epoch 22, batch 800, loss[ctc_loss=0.06775, att_loss=0.2137, loss=0.1845, over 15502.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008215, over 36.00 utterances.], tot_loss[ctc_loss=0.07334, att_loss=0.2353, loss=0.2029, over 3224712.84 frames. utt_duration=1286 frames, utt_pad_proportion=0.04286, over 10043.32 utterances.], batch size: 36, lr: 4.93e-03, grad_scale: 8.0 2023-03-09 00:02:06,036 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84464.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:02:12,472 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0333, 5.3003, 4.9352, 5.3817, 4.8279, 5.0959, 5.4637, 5.2364], device='cuda:2'), covar=tensor([0.0537, 0.0301, 0.0730, 0.0343, 0.0366, 0.0220, 0.0216, 0.0170], device='cuda:2'), in_proj_covar=tensor([0.0387, 0.0320, 0.0366, 0.0349, 0.0322, 0.0238, 0.0301, 0.0284], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 00:03:17,418 INFO [train2.py:809] (2/4) Epoch 22, batch 850, loss[ctc_loss=0.06194, att_loss=0.2214, loss=0.1895, over 16282.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.007204, over 43.00 utterances.], tot_loss[ctc_loss=0.07339, att_loss=0.2359, loss=0.2034, over 3242050.04 frames. utt_duration=1277 frames, utt_pad_proportion=0.04456, over 10168.99 utterances.], batch size: 43, lr: 4.93e-03, grad_scale: 8.0 2023-03-09 00:03:42,550 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84525.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:04:07,925 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.340e+02 1.954e+02 2.376e+02 2.798e+02 7.444e+02, threshold=4.752e+02, percent-clipped=4.0 2023-03-09 00:04:30,057 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84555.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:04:36,030 INFO [train2.py:809] (2/4) Epoch 22, batch 900, loss[ctc_loss=0.06676, att_loss=0.2462, loss=0.2103, over 16761.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006777, over 48.00 utterances.], tot_loss[ctc_loss=0.07319, att_loss=0.2357, loss=0.2032, over 3244883.34 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.05089, over 10304.54 utterances.], batch size: 48, lr: 4.93e-03, grad_scale: 8.0 2023-03-09 00:04:52,969 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8899, 5.2093, 4.7812, 5.3128, 4.6281, 5.0208, 5.3605, 5.1224], device='cuda:2'), covar=tensor([0.0627, 0.0298, 0.0790, 0.0299, 0.0430, 0.0233, 0.0220, 0.0181], device='cuda:2'), in_proj_covar=tensor([0.0390, 0.0322, 0.0367, 0.0350, 0.0324, 0.0238, 0.0303, 0.0285], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 00:05:08,786 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4615, 2.8171, 3.6469, 2.7240, 3.4921, 4.5597, 4.4172, 3.1853], device='cuda:2'), covar=tensor([0.0376, 0.1803, 0.1208, 0.1556, 0.1090, 0.0935, 0.0608, 0.1331], device='cuda:2'), in_proj_covar=tensor([0.0245, 0.0244, 0.0279, 0.0219, 0.0264, 0.0366, 0.0258, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 00:05:18,210 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84586.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:05:22,788 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9771, 2.5535, 2.8305, 3.8370, 3.4563, 3.6112, 2.6928, 2.0674], device='cuda:2'), covar=tensor([0.0889, 0.2005, 0.1028, 0.0722, 0.1001, 0.0506, 0.1492, 0.2334], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0216, 0.0189, 0.0219, 0.0225, 0.0179, 0.0202, 0.0187], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:05:27,574 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84592.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:05:50,245 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84605.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:05:55,901 INFO [train2.py:809] (2/4) Epoch 22, batch 950, loss[ctc_loss=0.09297, att_loss=0.2474, loss=0.2165, over 17118.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01541, over 56.00 utterances.], tot_loss[ctc_loss=0.07238, att_loss=0.2346, loss=0.2022, over 3237609.90 frames. utt_duration=1266 frames, utt_pad_proportion=0.0521, over 10241.94 utterances.], batch size: 56, lr: 4.92e-03, grad_scale: 8.0 2023-03-09 00:06:33,714 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84633.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:06:45,428 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.96 vs. limit=5.0 2023-03-09 00:06:46,113 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84641.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:06:47,275 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.223e+02 1.899e+02 2.395e+02 3.054e+02 5.479e+02, threshold=4.790e+02, percent-clipped=1.0 2023-03-09 00:07:06,386 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84653.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:07:15,602 INFO [train2.py:809] (2/4) Epoch 22, batch 1000, loss[ctc_loss=0.07417, att_loss=0.2484, loss=0.2136, over 17043.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.01028, over 53.00 utterances.], tot_loss[ctc_loss=0.0721, att_loss=0.2349, loss=0.2023, over 3249703.92 frames. 
utt_duration=1275 frames, utt_pad_proportion=0.04835, over 10208.94 utterances.], batch size: 53, lr: 4.92e-03, grad_scale: 8.0 2023-03-09 00:08:02,347 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84689.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:08:07,641 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84692.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:08:10,942 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84694.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:08:34,841 INFO [train2.py:809] (2/4) Epoch 22, batch 1050, loss[ctc_loss=0.05549, att_loss=0.2111, loss=0.18, over 15359.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01207, over 35.00 utterances.], tot_loss[ctc_loss=0.072, att_loss=0.2349, loss=0.2023, over 3258605.24 frames. utt_duration=1270 frames, utt_pad_proportion=0.04771, over 10275.73 utterances.], batch size: 35, lr: 4.92e-03, grad_scale: 8.0 2023-03-09 00:09:25,763 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.985e+02 2.384e+02 2.768e+02 5.690e+02, threshold=4.768e+02, percent-clipped=2.0 2023-03-09 00:09:45,242 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84753.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:09:54,285 INFO [train2.py:809] (2/4) Epoch 22, batch 1100, loss[ctc_loss=0.06286, att_loss=0.2225, loss=0.1906, over 15894.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.008615, over 39.00 utterances.], tot_loss[ctc_loss=0.07362, att_loss=0.2357, loss=0.2033, over 3254537.31 frames. utt_duration=1230 frames, utt_pad_proportion=0.06046, over 10594.20 utterances.], batch size: 39, lr: 4.92e-03, grad_scale: 8.0 2023-03-09 00:10:41,296 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1233, 5.4170, 5.3313, 5.2979, 5.4274, 5.3869, 5.1016, 4.8799], device='cuda:2'), covar=tensor([0.0884, 0.0441, 0.0298, 0.0541, 0.0270, 0.0281, 0.0376, 0.0308], device='cuda:2'), in_proj_covar=tensor([0.0518, 0.0365, 0.0347, 0.0358, 0.0420, 0.0429, 0.0359, 0.0395], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 00:10:59,977 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84800.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:11:01,285 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84801.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:11:13,082 INFO [train2.py:809] (2/4) Epoch 22, batch 1150, loss[ctc_loss=0.06696, att_loss=0.2195, loss=0.189, over 16175.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006473, over 41.00 utterances.], tot_loss[ctc_loss=0.07343, att_loss=0.2349, loss=0.2026, over 3256272.41 frames. 
utt_duration=1237 frames, utt_pad_proportion=0.06013, over 10543.52 utterances.], batch size: 41, lr: 4.92e-03, grad_scale: 16.0 2023-03-09 00:11:20,869 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8372, 5.1817, 5.4127, 5.2063, 5.2946, 5.8105, 5.2069, 5.8985], device='cuda:2'), covar=tensor([0.0741, 0.0808, 0.0832, 0.1382, 0.1894, 0.0951, 0.0701, 0.0676], device='cuda:2'), in_proj_covar=tensor([0.0872, 0.0515, 0.0597, 0.0664, 0.0872, 0.0626, 0.0486, 0.0607], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 00:12:04,638 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.375e+02 1.971e+02 2.350e+02 2.955e+02 6.874e+02, threshold=4.701e+02, percent-clipped=3.0 2023-03-09 00:12:25,856 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84855.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:12:31,661 INFO [train2.py:809] (2/4) Epoch 22, batch 1200, loss[ctc_loss=0.05915, att_loss=0.2418, loss=0.2053, over 16784.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005578, over 48.00 utterances.], tot_loss[ctc_loss=0.07352, att_loss=0.2354, loss=0.203, over 3261324.57 frames. utt_duration=1233 frames, utt_pad_proportion=0.06086, over 10595.44 utterances.], batch size: 48, lr: 4.92e-03, grad_scale: 16.0 2023-03-09 00:12:35,080 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84861.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:13:06,041 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84881.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:13:24,364 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84892.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:13:41,852 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84903.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:13:51,118 INFO [train2.py:809] (2/4) Epoch 22, batch 1250, loss[ctc_loss=0.08904, att_loss=0.2563, loss=0.2228, over 17111.00 frames. utt_duration=692.9 frames, utt_pad_proportion=0.1317, over 99.00 utterances.], tot_loss[ctc_loss=0.07339, att_loss=0.2355, loss=0.2031, over 3263553.22 frames. utt_duration=1224 frames, utt_pad_proportion=0.0613, over 10675.23 utterances.], batch size: 99, lr: 4.92e-03, grad_scale: 16.0 2023-03-09 00:13:55,030 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2023-03-09 00:14:10,821 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7836, 3.1751, 3.7793, 3.1768, 3.6724, 4.8006, 4.5912, 3.5211], device='cuda:2'), covar=tensor([0.0377, 0.1544, 0.1231, 0.1340, 0.1095, 0.0757, 0.0567, 0.1109], device='cuda:2'), in_proj_covar=tensor([0.0244, 0.0240, 0.0276, 0.0216, 0.0262, 0.0363, 0.0256, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 00:14:40,827 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84940.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:14:43,743 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.899e+02 2.318e+02 2.747e+02 6.253e+02, threshold=4.636e+02, percent-clipped=3.0 2023-03-09 00:14:46,349 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.01 vs. 
limit=5.0 2023-03-09 00:14:58,357 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84951.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:15:09,890 INFO [train2.py:809] (2/4) Epoch 22, batch 1300, loss[ctc_loss=0.05661, att_loss=0.2057, loss=0.1759, over 15355.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.01242, over 35.00 utterances.], tot_loss[ctc_loss=0.07184, att_loss=0.2342, loss=0.2017, over 3259378.75 frames. utt_duration=1270 frames, utt_pad_proportion=0.05253, over 10276.40 utterances.], batch size: 35, lr: 4.91e-03, grad_scale: 16.0 2023-03-09 00:15:56,226 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8911, 3.7293, 3.6589, 3.1751, 3.6745, 3.7748, 3.7799, 2.8764], device='cuda:2'), covar=tensor([0.0945, 0.1019, 0.1972, 0.2826, 0.1127, 0.2475, 0.0776, 0.3371], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0189, 0.0200, 0.0253, 0.0158, 0.0258, 0.0180, 0.0216], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:15:58,285 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84989.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:16:02,925 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84992.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:16:29,439 INFO [train2.py:809] (2/4) Epoch 22, batch 1350, loss[ctc_loss=0.1169, att_loss=0.2484, loss=0.2221, over 16396.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007949, over 44.00 utterances.], tot_loss[ctc_loss=0.07184, att_loss=0.2347, loss=0.2021, over 3267303.63 frames. utt_duration=1284 frames, utt_pad_proportion=0.04851, over 10193.71 utterances.], batch size: 44, lr: 4.91e-03, grad_scale: 16.0 2023-03-09 00:16:34,354 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85012.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:17:18,549 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85040.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:17:18,752 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3423, 5.2807, 5.1288, 3.3458, 5.0687, 4.9368, 4.8095, 3.1825], device='cuda:2'), covar=tensor([0.0090, 0.0071, 0.0213, 0.0793, 0.0090, 0.0158, 0.0213, 0.1118], device='cuda:2'), in_proj_covar=tensor([0.0074, 0.0101, 0.0104, 0.0110, 0.0085, 0.0113, 0.0099, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 00:17:21,480 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.237e+02 1.871e+02 2.280e+02 2.864e+02 6.552e+02, threshold=4.561e+02, percent-clipped=2.0 2023-03-09 00:17:21,706 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8569, 6.1593, 5.6613, 5.8943, 5.8728, 5.3554, 5.5749, 5.3918], device='cuda:2'), covar=tensor([0.1296, 0.1003, 0.0825, 0.0858, 0.0807, 0.1427, 0.2357, 0.2250], device='cuda:2'), in_proj_covar=tensor([0.0528, 0.0618, 0.0462, 0.0461, 0.0430, 0.0467, 0.0615, 0.0529], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 00:17:29,352 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. 
limit=2.0 2023-03-09 00:17:36,314 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1140, 5.1469, 4.9296, 2.3621, 2.0697, 3.0055, 2.5143, 3.8559], device='cuda:2'), covar=tensor([0.0729, 0.0308, 0.0255, 0.4672, 0.5870, 0.2488, 0.3605, 0.1752], device='cuda:2'), in_proj_covar=tensor([0.0354, 0.0276, 0.0266, 0.0244, 0.0339, 0.0330, 0.0253, 0.0360], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 00:17:42,457 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1106, 5.0761, 4.9061, 2.4316, 1.9876, 3.0204, 2.3024, 3.8549], device='cuda:2'), covar=tensor([0.0752, 0.0284, 0.0249, 0.4411, 0.5724, 0.2405, 0.3954, 0.1772], device='cuda:2'), in_proj_covar=tensor([0.0353, 0.0276, 0.0266, 0.0244, 0.0339, 0.0330, 0.0253, 0.0360], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 00:17:48,042 INFO [train2.py:809] (2/4) Epoch 22, batch 1400, loss[ctc_loss=0.07277, att_loss=0.2325, loss=0.2006, over 16975.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.0071, over 50.00 utterances.], tot_loss[ctc_loss=0.07255, att_loss=0.2349, loss=0.2024, over 3271351.10 frames. utt_duration=1265 frames, utt_pad_proportion=0.05159, over 10354.23 utterances.], batch size: 50, lr: 4.91e-03, grad_scale: 16.0 2023-03-09 00:18:51,227 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1204, 5.4455, 5.0102, 5.5239, 4.8526, 5.1320, 5.5941, 5.3157], device='cuda:2'), covar=tensor([0.0595, 0.0266, 0.0819, 0.0287, 0.0429, 0.0193, 0.0210, 0.0203], device='cuda:2'), in_proj_covar=tensor([0.0389, 0.0321, 0.0367, 0.0350, 0.0323, 0.0238, 0.0302, 0.0286], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 00:19:03,488 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85107.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:19:06,354 INFO [train2.py:809] (2/4) Epoch 22, batch 1450, loss[ctc_loss=0.0626, att_loss=0.2067, loss=0.1779, over 15512.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.007939, over 36.00 utterances.], tot_loss[ctc_loss=0.07338, att_loss=0.2353, loss=0.2029, over 3263863.01 frames. utt_duration=1229 frames, utt_pad_proportion=0.06171, over 10636.57 utterances.], batch size: 36, lr: 4.91e-03, grad_scale: 16.0 2023-03-09 00:19:16,253 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85115.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:19:59,352 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.290e+02 1.922e+02 2.285e+02 2.782e+02 4.553e+02, threshold=4.570e+02, percent-clipped=0.0 2023-03-09 00:20:21,434 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85156.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:20:25,860 INFO [train2.py:809] (2/4) Epoch 22, batch 1500, loss[ctc_loss=0.05955, att_loss=0.2402, loss=0.204, over 16344.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.005274, over 45.00 utterances.], tot_loss[ctc_loss=0.07315, att_loss=0.2352, loss=0.2028, over 3264199.23 frames. 
utt_duration=1225 frames, utt_pad_proportion=0.06238, over 10673.62 utterances.], batch size: 45, lr: 4.91e-03, grad_scale: 16.0 2023-03-09 00:20:26,053 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1718, 5.3675, 5.6776, 5.4385, 5.6763, 6.0824, 5.3844, 6.2016], device='cuda:2'), covar=tensor([0.0609, 0.0718, 0.0805, 0.1287, 0.1572, 0.0850, 0.0647, 0.0564], device='cuda:2'), in_proj_covar=tensor([0.0868, 0.0515, 0.0596, 0.0662, 0.0869, 0.0627, 0.0485, 0.0607], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 00:20:40,484 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85168.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 00:20:52,802 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85176.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:21:01,420 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85181.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:21:46,574 INFO [train2.py:809] (2/4) Epoch 22, batch 1550, loss[ctc_loss=0.07788, att_loss=0.2389, loss=0.2067, over 17134.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01333, over 56.00 utterances.], tot_loss[ctc_loss=0.07331, att_loss=0.236, loss=0.2035, over 3275081.79 frames. utt_duration=1192 frames, utt_pad_proportion=0.0668, over 11004.22 utterances.], batch size: 56, lr: 4.91e-03, grad_scale: 16.0 2023-03-09 00:22:19,154 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85229.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:22:42,139 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.384e+02 1.952e+02 2.297e+02 2.764e+02 6.264e+02, threshold=4.595e+02, percent-clipped=3.0 2023-03-09 00:22:44,891 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-09 00:23:06,928 INFO [train2.py:809] (2/4) Epoch 22, batch 1600, loss[ctc_loss=0.1012, att_loss=0.2626, loss=0.2303, over 17286.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01203, over 55.00 utterances.], tot_loss[ctc_loss=0.07297, att_loss=0.2358, loss=0.2033, over 3270280.51 frames. utt_duration=1188 frames, utt_pad_proportion=0.07064, over 11024.52 utterances.], batch size: 55, lr: 4.91e-03, grad_scale: 8.0 2023-03-09 00:23:56,464 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85289.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:24:10,304 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2884, 3.9337, 3.4414, 3.7526, 4.1433, 3.8195, 3.3876, 4.4470], device='cuda:2'), covar=tensor([0.0893, 0.0423, 0.0940, 0.0556, 0.0618, 0.0686, 0.0702, 0.0393], device='cuda:2'), in_proj_covar=tensor([0.0204, 0.0218, 0.0227, 0.0200, 0.0280, 0.0242, 0.0203, 0.0289], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 00:24:23,648 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85307.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:24:27,035 INFO [train2.py:809] (2/4) Epoch 22, batch 1650, loss[ctc_loss=0.06929, att_loss=0.2511, loss=0.2147, over 16754.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.005294, over 48.00 utterances.], tot_loss[ctc_loss=0.07259, att_loss=0.2357, loss=0.2031, over 3271331.46 frames. 
utt_duration=1208 frames, utt_pad_proportion=0.06447, over 10841.52 utterances.], batch size: 48, lr: 4.90e-03, grad_scale: 8.0 2023-03-09 00:25:11,313 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85337.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:25:20,304 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.436e+02 1.903e+02 2.201e+02 2.688e+02 4.353e+02, threshold=4.403e+02, percent-clipped=0.0 2023-03-09 00:25:44,525 INFO [train2.py:809] (2/4) Epoch 22, batch 1700, loss[ctc_loss=0.06114, att_loss=0.2191, loss=0.1875, over 16154.00 frames. utt_duration=1578 frames, utt_pad_proportion=0.008324, over 41.00 utterances.], tot_loss[ctc_loss=0.07345, att_loss=0.2362, loss=0.2036, over 3273955.45 frames. utt_duration=1216 frames, utt_pad_proportion=0.06268, over 10785.59 utterances.], batch size: 41, lr: 4.90e-03, grad_scale: 8.0 2023-03-09 00:27:03,854 INFO [train2.py:809] (2/4) Epoch 22, batch 1750, loss[ctc_loss=0.06622, att_loss=0.2497, loss=0.213, over 16319.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006794, over 45.00 utterances.], tot_loss[ctc_loss=0.07426, att_loss=0.2361, loss=0.2038, over 3257092.45 frames. utt_duration=1177 frames, utt_pad_proportion=0.07566, over 11079.32 utterances.], batch size: 45, lr: 4.90e-03, grad_scale: 8.0 2023-03-09 00:27:35,528 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85428.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:27:58,932 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.196e+02 1.824e+02 2.227e+02 2.751e+02 5.313e+02, threshold=4.454e+02, percent-clipped=1.0 2023-03-09 00:28:19,233 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85456.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:28:24,081 INFO [train2.py:809] (2/4) Epoch 22, batch 1800, loss[ctc_loss=0.06862, att_loss=0.243, loss=0.2081, over 17304.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01166, over 55.00 utterances.], tot_loss[ctc_loss=0.0743, att_loss=0.236, loss=0.2037, over 3257686.58 frames. 
utt_duration=1156 frames, utt_pad_proportion=0.08015, over 11290.95 utterances.], batch size: 55, lr: 4.90e-03, grad_scale: 8.0 2023-03-09 00:28:30,526 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85463.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 00:28:44,315 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85471.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:28:47,526 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2659, 3.0019, 3.3356, 4.4277, 3.9218, 3.8540, 2.9678, 2.2552], device='cuda:2'), covar=tensor([0.0792, 0.1872, 0.0941, 0.0540, 0.0757, 0.0506, 0.1436, 0.2248], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0215, 0.0190, 0.0219, 0.0226, 0.0180, 0.0200, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:29:04,483 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85483.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:29:13,790 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85489.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:29:37,444 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85504.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:29:40,791 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4528, 2.7076, 4.9245, 3.9396, 2.9387, 4.2561, 4.6606, 4.6430], device='cuda:2'), covar=tensor([0.0280, 0.1464, 0.0189, 0.0848, 0.1677, 0.0255, 0.0164, 0.0275], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0242, 0.0190, 0.0312, 0.0264, 0.0217, 0.0179, 0.0210], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:29:45,465 INFO [train2.py:809] (2/4) Epoch 22, batch 1850, loss[ctc_loss=0.06701, att_loss=0.2516, loss=0.2147, over 16970.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.006491, over 50.00 utterances.], tot_loss[ctc_loss=0.07357, att_loss=0.2355, loss=0.2031, over 3264733.84 frames. utt_duration=1204 frames, utt_pad_proportion=0.06776, over 10860.59 utterances.], batch size: 50, lr: 4.90e-03, grad_scale: 8.0 2023-03-09 00:30:19,823 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85530.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:30:39,481 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.220e+02 1.961e+02 2.288e+02 2.776e+02 5.536e+02, threshold=4.576e+02, percent-clipped=1.0 2023-03-09 00:30:41,515 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85544.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:31:04,495 INFO [train2.py:809] (2/4) Epoch 22, batch 1900, loss[ctc_loss=0.08791, att_loss=0.2575, loss=0.2236, over 17383.00 frames. utt_duration=1009 frames, utt_pad_proportion=0.04789, over 69.00 utterances.], tot_loss[ctc_loss=0.07396, att_loss=0.236, loss=0.2036, over 3257869.19 frames. 
utt_duration=1217 frames, utt_pad_proportion=0.06606, over 10721.47 utterances.], batch size: 69, lr: 4.90e-03, grad_scale: 8.0 2023-03-09 00:31:55,963 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85591.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:32:21,681 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85607.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:32:24,411 INFO [train2.py:809] (2/4) Epoch 22, batch 1950, loss[ctc_loss=0.09138, att_loss=0.2584, loss=0.225, over 17393.00 frames. utt_duration=1010 frames, utt_pad_proportion=0.04825, over 69.00 utterances.], tot_loss[ctc_loss=0.0731, att_loss=0.2356, loss=0.2031, over 3263723.59 frames. utt_duration=1213 frames, utt_pad_proportion=0.06517, over 10776.94 utterances.], batch size: 69, lr: 4.90e-03, grad_scale: 8.0 2023-03-09 00:32:41,401 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8992, 5.1683, 5.1465, 5.0333, 5.1749, 5.1401, 4.8374, 4.6961], device='cuda:2'), covar=tensor([0.0939, 0.0537, 0.0304, 0.0451, 0.0304, 0.0304, 0.0423, 0.0307], device='cuda:2'), in_proj_covar=tensor([0.0518, 0.0363, 0.0341, 0.0356, 0.0417, 0.0427, 0.0356, 0.0391], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-09 00:32:51,861 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85625.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:32:58,446 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2023-03-09 00:33:07,166 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9599, 5.1514, 5.6858, 5.0908, 5.1525, 5.8220, 5.0788, 5.8897], device='cuda:2'), covar=tensor([0.1120, 0.1514, 0.1152, 0.2340, 0.3131, 0.1419, 0.1231, 0.1123], device='cuda:2'), in_proj_covar=tensor([0.0868, 0.0517, 0.0598, 0.0662, 0.0876, 0.0627, 0.0484, 0.0609], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 00:33:16,584 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9795, 5.2947, 5.5513, 5.3962, 5.4971, 5.9920, 5.2329, 6.0758], device='cuda:2'), covar=tensor([0.0781, 0.0823, 0.0885, 0.1351, 0.2015, 0.0827, 0.0757, 0.0655], device='cuda:2'), in_proj_covar=tensor([0.0868, 0.0517, 0.0598, 0.0662, 0.0876, 0.0628, 0.0484, 0.0610], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 00:33:19,614 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.244e+02 1.791e+02 2.130e+02 2.666e+02 5.391e+02, threshold=4.261e+02, percent-clipped=1.0 2023-03-09 00:33:31,145 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8823, 2.3066, 2.7071, 2.8407, 2.9756, 2.9501, 2.4079, 3.2446], device='cuda:2'), covar=tensor([0.1768, 0.2817, 0.2079, 0.1312, 0.2137, 0.1687, 0.2447, 0.1295], device='cuda:2'), in_proj_covar=tensor([0.0120, 0.0125, 0.0124, 0.0111, 0.0126, 0.0107, 0.0131, 0.0101], device='cuda:2'), out_proj_covar=tensor([9.0615e-05, 9.7578e-05, 9.8341e-05, 8.6884e-05, 9.3962e-05, 8.6422e-05, 9.9114e-05, 8.1210e-05], device='cuda:2') 2023-03-09 00:33:39,319 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85655.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:33:45,244 INFO [train2.py:809] (2/4) Epoch 22, batch 2000, loss[ctc_loss=0.06071, att_loss=0.2157, loss=0.1847, over 16313.00 frames. 
utt_duration=1519 frames, utt_pad_proportion=0.00533, over 43.00 utterances.], tot_loss[ctc_loss=0.07227, att_loss=0.235, loss=0.2024, over 3264537.79 frames. utt_duration=1242 frames, utt_pad_proportion=0.0571, over 10527.74 utterances.], batch size: 43, lr: 4.89e-03, grad_scale: 8.0 2023-03-09 00:34:26,065 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8364, 3.6865, 3.6879, 3.1776, 3.6165, 3.8165, 3.7724, 2.7137], device='cuda:2'), covar=tensor([0.1007, 0.1477, 0.1585, 0.3244, 0.1095, 0.1385, 0.0771, 0.3532], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0190, 0.0200, 0.0252, 0.0159, 0.0260, 0.0183, 0.0216], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:34:29,124 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85686.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 00:34:37,823 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-03-09 00:35:05,233 INFO [train2.py:809] (2/4) Epoch 22, batch 2050, loss[ctc_loss=0.04296, att_loss=0.212, loss=0.1782, over 16404.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006578, over 44.00 utterances.], tot_loss[ctc_loss=0.0724, att_loss=0.2354, loss=0.2028, over 3269220.98 frames. utt_duration=1254 frames, utt_pad_proportion=0.0533, over 10442.04 utterances.], batch size: 44, lr: 4.89e-03, grad_scale: 8.0 2023-03-09 00:36:01,003 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.043e+02 1.910e+02 2.299e+02 2.633e+02 5.710e+02, threshold=4.599e+02, percent-clipped=8.0 2023-03-09 00:36:03,774 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.27 vs. limit=5.0 2023-03-09 00:36:27,082 INFO [train2.py:809] (2/4) Epoch 22, batch 2100, loss[ctc_loss=0.05658, att_loss=0.2304, loss=0.1956, over 16175.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.005925, over 41.00 utterances.], tot_loss[ctc_loss=0.07353, att_loss=0.2365, loss=0.2039, over 3281600.65 frames. utt_duration=1243 frames, utt_pad_proportion=0.05208, over 10570.71 utterances.], batch size: 41, lr: 4.89e-03, grad_scale: 8.0 2023-03-09 00:36:33,872 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85763.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:36:47,314 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85771.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:36:55,240 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-03-09 00:37:08,399 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85784.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:37:10,033 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0614, 5.3487, 5.2599, 5.2259, 5.3298, 5.3323, 4.9511, 4.7873], device='cuda:2'), covar=tensor([0.0997, 0.0510, 0.0273, 0.0491, 0.0282, 0.0275, 0.0418, 0.0331], device='cuda:2'), in_proj_covar=tensor([0.0521, 0.0364, 0.0344, 0.0359, 0.0421, 0.0428, 0.0358, 0.0394], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-09 00:37:47,852 INFO [train2.py:809] (2/4) Epoch 22, batch 2150, loss[ctc_loss=0.0664, att_loss=0.2483, loss=0.2119, over 17047.00 frames. 
utt_duration=1313 frames, utt_pad_proportion=0.009376, over 52.00 utterances.], tot_loss[ctc_loss=0.07333, att_loss=0.2359, loss=0.2034, over 3279007.04 frames. utt_duration=1240 frames, utt_pad_proportion=0.05351, over 10589.15 utterances.], batch size: 52, lr: 4.89e-03, grad_scale: 8.0 2023-03-09 00:37:51,098 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85811.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:38:04,883 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85819.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:38:36,690 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85839.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:38:42,696 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.253e+02 1.885e+02 2.205e+02 3.046e+02 7.448e+02, threshold=4.410e+02, percent-clipped=6.0 2023-03-09 00:39:07,835 INFO [train2.py:809] (2/4) Epoch 22, batch 2200, loss[ctc_loss=0.07956, att_loss=0.2507, loss=0.2164, over 16951.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.00859, over 50.00 utterances.], tot_loss[ctc_loss=0.07243, att_loss=0.2355, loss=0.2028, over 3282917.49 frames. utt_duration=1251 frames, utt_pad_proportion=0.05036, over 10510.93 utterances.], batch size: 50, lr: 4.89e-03, grad_scale: 8.0 2023-03-09 00:39:35,130 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85875.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:39:52,052 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85886.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:40:28,362 INFO [train2.py:809] (2/4) Epoch 22, batch 2250, loss[ctc_loss=0.08517, att_loss=0.2288, loss=0.2001, over 15770.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.0084, over 38.00 utterances.], tot_loss[ctc_loss=0.07241, att_loss=0.2355, loss=0.2029, over 3279290.43 frames. utt_duration=1235 frames, utt_pad_proportion=0.05468, over 10637.89 utterances.], batch size: 38, lr: 4.89e-03, grad_scale: 8.0 2023-03-09 00:41:02,664 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85930.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 00:41:11,787 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85936.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:41:22,182 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.269e+02 1.895e+02 2.304e+02 2.747e+02 7.782e+02, threshold=4.607e+02, percent-clipped=2.0 2023-03-09 00:41:44,930 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85957.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:41:47,669 INFO [train2.py:809] (2/4) Epoch 22, batch 2300, loss[ctc_loss=0.06989, att_loss=0.2458, loss=0.2106, over 16954.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.0075, over 50.00 utterances.], tot_loss[ctc_loss=0.07208, att_loss=0.2349, loss=0.2024, over 3272905.67 frames. 
utt_duration=1217 frames, utt_pad_proportion=0.06075, over 10774.77 utterances.], batch size: 50, lr: 4.89e-03, grad_scale: 8.0 2023-03-09 00:42:24,312 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85981.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 00:42:27,534 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3335, 2.9306, 3.2100, 4.3792, 3.8041, 3.8536, 2.8739, 2.1413], device='cuda:2'), covar=tensor([0.0754, 0.1905, 0.0995, 0.0593, 0.0944, 0.0491, 0.1718, 0.2397], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0217, 0.0192, 0.0221, 0.0229, 0.0182, 0.0203, 0.0190], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:42:40,403 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85991.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 00:43:02,488 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86002.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:43:10,602 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7659, 2.2841, 2.5862, 2.7817, 3.0440, 2.7308, 2.4701, 3.3148], device='cuda:2'), covar=tensor([0.2296, 0.2758, 0.2010, 0.1306, 0.1884, 0.1032, 0.2218, 0.1016], device='cuda:2'), in_proj_covar=tensor([0.0119, 0.0125, 0.0122, 0.0110, 0.0124, 0.0106, 0.0130, 0.0101], device='cuda:2'), out_proj_covar=tensor([9.0044e-05, 9.6982e-05, 9.7094e-05, 8.6014e-05, 9.2573e-05, 8.5706e-05, 9.8412e-05, 8.0764e-05], device='cuda:2') 2023-03-09 00:43:13,252 INFO [train2.py:809] (2/4) Epoch 22, batch 2350, loss[ctc_loss=0.06428, att_loss=0.2302, loss=0.197, over 16410.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006517, over 44.00 utterances.], tot_loss[ctc_loss=0.07157, att_loss=0.2341, loss=0.2016, over 3261015.79 frames. utt_duration=1226 frames, utt_pad_proportion=0.06162, over 10649.80 utterances.], batch size: 44, lr: 4.88e-03, grad_scale: 8.0 2023-03-09 00:43:28,513 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86018.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:43:55,991 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2023-03-09 00:43:57,596 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. 
limit=2.0 2023-03-09 00:43:58,280 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2772, 4.4224, 4.6492, 4.4910, 5.0378, 4.6392, 4.5030, 2.3605], device='cuda:2'), covar=tensor([0.0310, 0.0340, 0.0290, 0.0286, 0.0913, 0.0208, 0.0296, 0.2110], device='cuda:2'), in_proj_covar=tensor([0.0167, 0.0193, 0.0191, 0.0209, 0.0369, 0.0162, 0.0182, 0.0218], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:44:06,999 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.776e+02 2.270e+02 2.670e+02 5.895e+02, threshold=4.541e+02, percent-clipped=3.0 2023-03-09 00:44:12,098 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1132, 5.2270, 4.9891, 2.3663, 2.0125, 3.0545, 2.3256, 3.9349], device='cuda:2'), covar=tensor([0.0732, 0.0256, 0.0245, 0.4616, 0.5606, 0.2292, 0.3817, 0.1638], device='cuda:2'), in_proj_covar=tensor([0.0356, 0.0277, 0.0267, 0.0246, 0.0341, 0.0334, 0.0255, 0.0366], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 00:44:12,464 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-09 00:44:21,735 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86052.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:44:32,410 INFO [train2.py:809] (2/4) Epoch 22, batch 2400, loss[ctc_loss=0.0896, att_loss=0.2542, loss=0.2213, over 17289.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01273, over 55.00 utterances.], tot_loss[ctc_loss=0.07126, att_loss=0.2342, loss=0.2016, over 3271255.85 frames. utt_duration=1247 frames, utt_pad_proportion=0.05544, over 10507.47 utterances.], batch size: 55, lr: 4.88e-03, grad_scale: 8.0 2023-03-09 00:44:39,823 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86063.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:44:49,069 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86069.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:44:57,364 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7522, 5.0119, 4.6636, 5.1090, 4.5459, 4.8046, 5.1801, 4.9189], device='cuda:2'), covar=tensor([0.0615, 0.0296, 0.0755, 0.0325, 0.0433, 0.0332, 0.0218, 0.0202], device='cuda:2'), in_proj_covar=tensor([0.0389, 0.0321, 0.0368, 0.0352, 0.0323, 0.0238, 0.0304, 0.0286], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 00:45:12,632 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86084.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:45:29,153 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5296, 4.9153, 4.6777, 4.9190, 4.9897, 4.5820, 3.2931, 4.8658], device='cuda:2'), covar=tensor([0.0134, 0.0131, 0.0165, 0.0110, 0.0114, 0.0151, 0.0811, 0.0244], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0090, 0.0114, 0.0071, 0.0078, 0.0088, 0.0107, 0.0111], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 00:45:50,865 INFO [train2.py:809] (2/4) Epoch 22, batch 2450, loss[ctc_loss=0.05861, att_loss=0.206, loss=0.1765, over 15363.00 frames. 
utt_duration=1757 frames, utt_pad_proportion=0.01055, over 35.00 utterances.], tot_loss[ctc_loss=0.07129, att_loss=0.2339, loss=0.2014, over 3268925.41 frames. utt_duration=1265 frames, utt_pad_proportion=0.05182, over 10345.32 utterances.], batch size: 35, lr: 4.88e-03, grad_scale: 8.0 2023-03-09 00:45:58,794 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86113.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:46:26,071 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86130.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:46:28,916 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86132.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:46:39,894 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86139.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:46:45,846 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.355e+02 2.126e+02 2.601e+02 3.056e+02 8.215e+02, threshold=5.203e+02, percent-clipped=8.0 2023-03-09 00:46:48,703 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-09 00:47:11,847 INFO [train2.py:809] (2/4) Epoch 22, batch 2500, loss[ctc_loss=0.06832, att_loss=0.2315, loss=0.1989, over 15984.00 frames. utt_duration=1561 frames, utt_pad_proportion=0.005208, over 41.00 utterances.], tot_loss[ctc_loss=0.07188, att_loss=0.2344, loss=0.2019, over 3265175.23 frames. utt_duration=1271 frames, utt_pad_proportion=0.05083, over 10284.21 utterances.], batch size: 41, lr: 4.88e-03, grad_scale: 8.0 2023-03-09 00:47:13,961 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0740, 5.2110, 4.9135, 2.4154, 2.0738, 3.1053, 2.5948, 3.9434], device='cuda:2'), covar=tensor([0.0780, 0.0252, 0.0277, 0.5062, 0.5605, 0.2333, 0.3605, 0.1619], device='cuda:2'), in_proj_covar=tensor([0.0357, 0.0278, 0.0268, 0.0247, 0.0341, 0.0335, 0.0255, 0.0368], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 00:47:57,552 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86186.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:47:58,892 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86187.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:48:33,442 INFO [train2.py:809] (2/4) Epoch 22, batch 2550, loss[ctc_loss=0.0743, att_loss=0.2267, loss=0.1962, over 16151.00 frames. utt_duration=1578 frames, utt_pad_proportion=0.007184, over 41.00 utterances.], tot_loss[ctc_loss=0.07272, att_loss=0.235, loss=0.2025, over 3276324.35 frames. 
utt_duration=1280 frames, utt_pad_proportion=0.04568, over 10247.16 utterances.], batch size: 41, lr: 4.88e-03, grad_scale: 8.0 2023-03-09 00:49:10,452 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86231.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:49:15,111 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86234.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:49:29,161 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.366e+02 1.974e+02 2.356e+02 2.687e+02 6.031e+02, threshold=4.712e+02, percent-clipped=2.0 2023-03-09 00:49:34,327 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86246.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:49:54,972 INFO [train2.py:809] (2/4) Epoch 22, batch 2600, loss[ctc_loss=0.07441, att_loss=0.2346, loss=0.2026, over 16474.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006086, over 46.00 utterances.], tot_loss[ctc_loss=0.07222, att_loss=0.235, loss=0.2024, over 3278676.08 frames. utt_duration=1260 frames, utt_pad_proportion=0.04993, over 10424.98 utterances.], batch size: 46, lr: 4.88e-03, grad_scale: 8.0 2023-03-09 00:50:16,784 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.10 vs. limit=5.0 2023-03-09 00:50:23,976 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2144, 4.4918, 4.4234, 4.7689, 2.7654, 4.3381, 2.8930, 1.9698], device='cuda:2'), covar=tensor([0.0431, 0.0266, 0.0666, 0.0201, 0.1791, 0.0240, 0.1498, 0.1714], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0165, 0.0255, 0.0156, 0.0218, 0.0147, 0.0228, 0.0200], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:50:31,768 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86281.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:50:32,559 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-03-09 00:50:39,193 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86286.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 00:51:13,063 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86307.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:51:15,788 INFO [train2.py:809] (2/4) Epoch 22, batch 2650, loss[ctc_loss=0.06463, att_loss=0.2184, loss=0.1877, over 16164.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.007711, over 41.00 utterances.], tot_loss[ctc_loss=0.07191, att_loss=0.2347, loss=0.2022, over 3276550.72 frames. utt_duration=1268 frames, utt_pad_proportion=0.04833, over 10349.73 utterances.], batch size: 41, lr: 4.88e-03, grad_scale: 8.0 2023-03-09 00:51:23,792 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86313.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:51:49,595 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86329.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:52:11,644 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. 
limit=2.0 2023-03-09 00:52:11,918 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.265e+02 1.827e+02 2.241e+02 2.634e+02 5.171e+02, threshold=4.483e+02, percent-clipped=1.0 2023-03-09 00:52:37,270 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86358.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:52:38,735 INFO [train2.py:809] (2/4) Epoch 22, batch 2700, loss[ctc_loss=0.09337, att_loss=0.2536, loss=0.2215, over 17036.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.00902, over 52.00 utterances.], tot_loss[ctc_loss=0.07194, att_loss=0.235, loss=0.2024, over 3268617.33 frames. utt_duration=1219 frames, utt_pad_proportion=0.0633, over 10738.71 utterances.], batch size: 52, lr: 4.87e-03, grad_scale: 8.0 2023-03-09 00:53:02,320 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3515, 2.9169, 3.5451, 2.7615, 3.4381, 4.5210, 4.3374, 2.9371], device='cuda:2'), covar=tensor([0.0417, 0.1702, 0.1274, 0.1540, 0.1158, 0.0913, 0.0618, 0.1549], device='cuda:2'), in_proj_covar=tensor([0.0244, 0.0240, 0.0280, 0.0218, 0.0264, 0.0365, 0.0259, 0.0230], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 00:53:58,796 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86408.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:54:00,129 INFO [train2.py:809] (2/4) Epoch 22, batch 2750, loss[ctc_loss=0.04858, att_loss=0.2154, loss=0.182, over 13709.00 frames. utt_duration=1830 frames, utt_pad_proportion=0.07829, over 30.00 utterances.], tot_loss[ctc_loss=0.07177, att_loss=0.2346, loss=0.2021, over 3267140.59 frames. utt_duration=1225 frames, utt_pad_proportion=0.06253, over 10679.74 utterances.], batch size: 30, lr: 4.87e-03, grad_scale: 8.0 2023-03-09 00:54:24,410 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0023, 3.7502, 3.7196, 3.2143, 3.7784, 3.8452, 3.7950, 2.7049], device='cuda:2'), covar=tensor([0.1000, 0.1200, 0.2033, 0.2900, 0.1114, 0.2338, 0.0837, 0.3669], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0194, 0.0206, 0.0260, 0.0164, 0.0267, 0.0189, 0.0221], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:54:25,779 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86425.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:54:49,459 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86440.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:54:50,922 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1809, 5.5486, 5.7097, 5.5318, 5.7192, 6.1342, 5.3312, 6.2073], device='cuda:2'), covar=tensor([0.0627, 0.0614, 0.0745, 0.1348, 0.1695, 0.0847, 0.0584, 0.0642], device='cuda:2'), in_proj_covar=tensor([0.0864, 0.0509, 0.0596, 0.0659, 0.0866, 0.0629, 0.0486, 0.0605], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 00:54:53,694 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.448e+02 1.998e+02 2.389e+02 2.932e+02 8.437e+02, threshold=4.778e+02, percent-clipped=2.0 2023-03-09 00:54:57,209 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86445.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:55:20,789 INFO [train2.py:809] (2/4) Epoch 22, batch 2800, loss[ctc_loss=0.111, att_loss=0.2661, loss=0.2351, over 
14086.00 frames. utt_duration=382.1 frames, utt_pad_proportion=0.3262, over 148.00 utterances.], tot_loss[ctc_loss=0.07146, att_loss=0.2341, loss=0.2016, over 3262453.71 frames. utt_duration=1214 frames, utt_pad_proportion=0.06603, over 10766.58 utterances.], batch size: 148, lr: 4.87e-03, grad_scale: 8.0 2023-03-09 00:56:07,224 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2719, 4.3328, 4.5637, 4.4894, 5.0477, 4.5146, 4.4685, 2.5987], device='cuda:2'), covar=tensor([0.0308, 0.0393, 0.0307, 0.0298, 0.0706, 0.0250, 0.0356, 0.1793], device='cuda:2'), in_proj_covar=tensor([0.0166, 0.0191, 0.0191, 0.0207, 0.0366, 0.0161, 0.0180, 0.0215], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:56:28,104 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86501.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:56:32,723 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2887, 4.5867, 4.6639, 4.9417, 3.0191, 4.3932, 3.1874, 1.8223], device='cuda:2'), covar=tensor([0.0404, 0.0305, 0.0524, 0.0179, 0.1447, 0.0234, 0.1185, 0.1694], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0170, 0.0262, 0.0160, 0.0223, 0.0150, 0.0233, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:56:36,560 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86506.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 00:56:40,824 INFO [train2.py:809] (2/4) Epoch 22, batch 2850, loss[ctc_loss=0.05855, att_loss=0.2428, loss=0.2059, over 16881.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.0067, over 49.00 utterances.], tot_loss[ctc_loss=0.0714, att_loss=0.2347, loss=0.2021, over 3273182.84 frames. utt_duration=1231 frames, utt_pad_proportion=0.0593, over 10650.93 utterances.], batch size: 49, lr: 4.87e-03, grad_scale: 8.0 2023-03-09 00:57:16,128 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86531.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:57:34,135 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.710e+01 1.849e+02 2.202e+02 2.707e+02 3.690e+02, threshold=4.403e+02, percent-clipped=0.0 2023-03-09 00:57:57,080 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7269, 2.7339, 3.9155, 3.5647, 2.9837, 3.6659, 3.6923, 3.7714], device='cuda:2'), covar=tensor([0.0304, 0.1186, 0.0217, 0.0712, 0.1324, 0.0309, 0.0250, 0.0352], device='cuda:2'), in_proj_covar=tensor([0.0198, 0.0242, 0.0193, 0.0316, 0.0265, 0.0218, 0.0181, 0.0211], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 00:58:01,269 INFO [train2.py:809] (2/4) Epoch 22, batch 2900, loss[ctc_loss=0.06355, att_loss=0.2382, loss=0.2033, over 16452.00 frames. utt_duration=1432 frames, utt_pad_proportion=0.007367, over 46.00 utterances.], tot_loss[ctc_loss=0.07145, att_loss=0.2347, loss=0.2021, over 3269939.20 frames. 
utt_duration=1204 frames, utt_pad_proportion=0.06636, over 10874.95 utterances.], batch size: 46, lr: 4.87e-03, grad_scale: 8.0 2023-03-09 00:58:33,082 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86579.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:58:34,892 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86580.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:58:44,199 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86586.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 00:59:09,702 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86602.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 00:59:21,686 INFO [train2.py:809] (2/4) Epoch 22, batch 2950, loss[ctc_loss=0.07569, att_loss=0.2338, loss=0.2022, over 16289.00 frames. utt_duration=1517 frames, utt_pad_proportion=0.006732, over 43.00 utterances.], tot_loss[ctc_loss=0.07047, att_loss=0.2342, loss=0.2014, over 3273814.55 frames. utt_duration=1240 frames, utt_pad_proportion=0.05738, over 10574.78 utterances.], batch size: 43, lr: 4.87e-03, grad_scale: 8.0 2023-03-09 00:59:28,862 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86613.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:00:01,851 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86634.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 01:00:12,892 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86641.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:00:15,584 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.364e+02 1.951e+02 2.348e+02 3.080e+02 7.280e+02, threshold=4.697e+02, percent-clipped=5.0 2023-03-09 01:00:41,175 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86658.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:00:42,427 INFO [train2.py:809] (2/4) Epoch 22, batch 3000, loss[ctc_loss=0.05879, att_loss=0.2283, loss=0.1944, over 17009.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008414, over 51.00 utterances.], tot_loss[ctc_loss=0.07155, att_loss=0.2351, loss=0.2024, over 3272510.21 frames. utt_duration=1214 frames, utt_pad_proportion=0.06484, over 10797.53 utterances.], batch size: 51, lr: 4.87e-03, grad_scale: 8.0 2023-03-09 01:00:42,427 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 01:00:57,104 INFO [train2.py:843] (2/4) Epoch 22, validation: ctc_loss=0.03993, att_loss=0.2347, loss=0.1957, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
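[Editor's note] Each loss entry above reports a ctc_loss, an att_loss, and a combined loss, and throughout this section the combined value is consistent with a fixed interpolation that puts weight 0.8 on the attention loss and 0.2 on the CTC loss; for instance, the Epoch 22 validation entry just above gives 0.2 × 0.03993 + 0.8 × 0.2347 ≈ 0.1957, matching the logged loss=0.1957. The snippet below is only a minimal sketch of that relationship inferred from the logged numbers, not the actual train2.py code; the function name, signature, and default weight are assumptions.

```python
# Hedged sketch: how the logged totals appear to combine the two components.
# This is an illustration inferred from the numbers above, not icefall's code.

def combine_losses(ctc_loss: float, att_loss: float, att_rate: float = 0.8) -> float:
    """Interpolate CTC and attention losses with weight `att_rate` on attention."""
    return (1.0 - att_rate) * ctc_loss + att_rate * att_loss

# Cross-check against the Epoch 22 validation entry logged above:
# 0.2 * 0.03993 + 0.8 * 0.2347 = 0.19575, matching loss=0.1957 (to 4 digits).
print(combine_losses(0.03993, 0.2347))
```

The same check holds for the per-batch entries in this section (e.g. batch 1550: 0.2 × 0.07788 + 0.8 × 0.2389 ≈ 0.2067), which is why only the interpolated total moves when one component spikes.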
2023-03-09 01:00:57,105 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 01:01:01,030 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86661.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:01:15,836 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2654, 4.5476, 4.6028, 4.8467, 2.8615, 4.3551, 2.7832, 2.0616], device='cuda:2'), covar=tensor([0.0448, 0.0223, 0.0594, 0.0180, 0.1671, 0.0223, 0.1533, 0.1707], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0167, 0.0257, 0.0157, 0.0219, 0.0148, 0.0229, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:02:11,054 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86706.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:02:15,056 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86708.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:02:16,474 INFO [train2.py:809] (2/4) Epoch 22, batch 3050, loss[ctc_loss=0.06607, att_loss=0.211, loss=0.182, over 10151.00 frames. utt_duration=1848 frames, utt_pad_proportion=0.2431, over 22.00 utterances.], tot_loss[ctc_loss=0.07171, att_loss=0.235, loss=0.2023, over 3271412.46 frames. utt_duration=1229 frames, utt_pad_proportion=0.06037, over 10657.48 utterances.], batch size: 22, lr: 4.86e-03, grad_scale: 8.0 2023-03-09 01:02:43,492 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86725.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:03:11,084 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.329e+02 1.829e+02 2.203e+02 2.781e+02 6.356e+02, threshold=4.407e+02, percent-clipped=2.0 2023-03-09 01:03:31,281 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86756.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:03:35,076 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5231, 2.9689, 3.6390, 3.1498, 3.5454, 4.6178, 4.4095, 3.2171], device='cuda:2'), covar=tensor([0.0364, 0.1646, 0.1211, 0.1210, 0.1013, 0.0826, 0.0522, 0.1268], device='cuda:2'), in_proj_covar=tensor([0.0247, 0.0242, 0.0281, 0.0219, 0.0264, 0.0369, 0.0262, 0.0231], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 01:03:36,291 INFO [train2.py:809] (2/4) Epoch 22, batch 3100, loss[ctc_loss=0.06715, att_loss=0.2421, loss=0.2071, over 16770.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006303, over 48.00 utterances.], tot_loss[ctc_loss=0.07289, att_loss=0.2356, loss=0.203, over 3269962.96 frames. utt_duration=1213 frames, utt_pad_proportion=0.06462, over 10799.68 utterances.], batch size: 48, lr: 4.86e-03, grad_scale: 8.0 2023-03-09 01:03:42,252 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. 
limit=2.0 2023-03-09 01:03:59,931 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86773.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:04:15,698 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6300, 2.9829, 3.4803, 4.5689, 3.9141, 3.9347, 3.0136, 2.3815], device='cuda:2'), covar=tensor([0.0635, 0.1926, 0.0934, 0.0505, 0.0994, 0.0526, 0.1524, 0.2224], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0213, 0.0189, 0.0219, 0.0225, 0.0182, 0.0200, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:04:35,402 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86796.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:04:43,665 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86801.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 01:04:51,059 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. limit=2.0 2023-03-09 01:04:57,200 INFO [train2.py:809] (2/4) Epoch 22, batch 3150, loss[ctc_loss=0.07876, att_loss=0.242, loss=0.2094, over 16984.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.006486, over 50.00 utterances.], tot_loss[ctc_loss=0.07315, att_loss=0.2357, loss=0.2032, over 3265629.27 frames. utt_duration=1209 frames, utt_pad_proportion=0.06523, over 10816.36 utterances.], batch size: 50, lr: 4.86e-03, grad_scale: 8.0 2023-03-09 01:05:06,043 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86814.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:05:18,285 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.42 vs. limit=5.0 2023-03-09 01:05:51,843 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.403e+02 1.971e+02 2.330e+02 2.834e+02 6.258e+02, threshold=4.661e+02, percent-clipped=4.0 2023-03-09 01:06:18,408 INFO [train2.py:809] (2/4) Epoch 22, batch 3200, loss[ctc_loss=0.06423, att_loss=0.2355, loss=0.2013, over 16327.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006202, over 45.00 utterances.], tot_loss[ctc_loss=0.07266, att_loss=0.2352, loss=0.2027, over 3261961.51 frames. utt_duration=1201 frames, utt_pad_proportion=0.06937, over 10882.42 utterances.], batch size: 45, lr: 4.86e-03, grad_scale: 8.0 2023-03-09 01:06:45,826 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86875.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:06:50,286 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86878.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:07:27,693 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86902.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:07:39,433 INFO [train2.py:809] (2/4) Epoch 22, batch 3250, loss[ctc_loss=0.05871, att_loss=0.2131, loss=0.1822, over 15641.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.00902, over 37.00 utterances.], tot_loss[ctc_loss=0.07165, att_loss=0.2351, loss=0.2024, over 3271499.81 frames. 
utt_duration=1222 frames, utt_pad_proportion=0.06255, over 10720.24 utterances.], batch size: 37, lr: 4.86e-03, grad_scale: 8.0 2023-03-09 01:08:22,557 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86936.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:08:27,458 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86939.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:08:33,098 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.904e+02 2.296e+02 2.851e+02 6.880e+02, threshold=4.593e+02, percent-clipped=1.0 2023-03-09 01:08:44,096 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86950.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:08:59,932 INFO [train2.py:809] (2/4) Epoch 22, batch 3300, loss[ctc_loss=0.05256, att_loss=0.2071, loss=0.1762, over 15636.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008946, over 37.00 utterances.], tot_loss[ctc_loss=0.07095, att_loss=0.2344, loss=0.2017, over 3276483.24 frames. utt_duration=1237 frames, utt_pad_proportion=0.05621, over 10609.67 utterances.], batch size: 37, lr: 4.86e-03, grad_scale: 8.0 2023-03-09 01:09:52,333 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9198, 3.7055, 3.1402, 3.4903, 3.9087, 3.6701, 3.0822, 4.2162], device='cuda:2'), covar=tensor([0.1166, 0.0579, 0.1075, 0.0681, 0.0749, 0.0740, 0.0841, 0.0446], device='cuda:2'), in_proj_covar=tensor([0.0206, 0.0219, 0.0229, 0.0204, 0.0284, 0.0245, 0.0202, 0.0292], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 01:09:56,900 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4574, 3.0162, 3.4740, 4.5316, 3.9861, 3.8860, 2.9836, 2.1612], device='cuda:2'), covar=tensor([0.0723, 0.1847, 0.0883, 0.0536, 0.0973, 0.0503, 0.1455, 0.2343], device='cuda:2'), in_proj_covar=tensor([0.0179, 0.0211, 0.0187, 0.0218, 0.0224, 0.0179, 0.0199, 0.0188], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:10:08,640 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87002.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 01:10:20,440 INFO [train2.py:809] (2/4) Epoch 22, batch 3350, loss[ctc_loss=0.1441, att_loss=0.2714, loss=0.2459, over 14531.00 frames. utt_duration=402.6 frames, utt_pad_proportion=0.2998, over 145.00 utterances.], tot_loss[ctc_loss=0.0712, att_loss=0.2349, loss=0.2022, over 3272075.66 frames. utt_duration=1220 frames, utt_pad_proportion=0.06111, over 10742.01 utterances.], batch size: 145, lr: 4.86e-03, grad_scale: 8.0 2023-03-09 01:11:13,696 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.863e+02 2.152e+02 2.724e+02 4.782e+02, threshold=4.303e+02, percent-clipped=1.0 2023-03-09 01:11:40,079 INFO [train2.py:809] (2/4) Epoch 22, batch 3400, loss[ctc_loss=0.06072, att_loss=0.2099, loss=0.18, over 15640.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008376, over 37.00 utterances.], tot_loss[ctc_loss=0.0714, att_loss=0.2349, loss=0.2022, over 3270297.46 frames. utt_duration=1234 frames, utt_pad_proportion=0.05778, over 10614.98 utterances.], batch size: 37, lr: 4.85e-03, grad_scale: 8.0 2023-03-09 01:11:43,132 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.13 vs. 
limit=5.0 2023-03-09 01:11:47,033 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87063.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 01:11:50,032 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3192, 4.4899, 4.5578, 4.9400, 2.7834, 4.3863, 2.7853, 1.4910], device='cuda:2'), covar=tensor([0.0343, 0.0244, 0.0571, 0.0149, 0.1558, 0.0208, 0.1397, 0.1889], device='cuda:2'), in_proj_covar=tensor([0.0196, 0.0170, 0.0262, 0.0160, 0.0223, 0.0150, 0.0232, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:12:09,718 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6129, 2.1618, 2.1905, 2.4903, 2.8038, 2.5360, 2.4877, 3.0857], device='cuda:2'), covar=tensor([0.2645, 0.3026, 0.2160, 0.1546, 0.1867, 0.1161, 0.2160, 0.1003], device='cuda:2'), in_proj_covar=tensor([0.0121, 0.0127, 0.0123, 0.0111, 0.0127, 0.0108, 0.0130, 0.0103], device='cuda:2'), out_proj_covar=tensor([9.1958e-05, 9.8534e-05, 9.8157e-05, 8.7193e-05, 9.4769e-05, 8.6862e-05, 9.9114e-05, 8.2111e-05], device='cuda:2') 2023-03-09 01:12:38,143 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87096.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:12:46,619 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87101.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 01:12:59,255 INFO [train2.py:809] (2/4) Epoch 22, batch 3450, loss[ctc_loss=0.09031, att_loss=0.257, loss=0.2237, over 16888.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006479, over 49.00 utterances.], tot_loss[ctc_loss=0.07303, att_loss=0.2364, loss=0.2037, over 3271422.91 frames. utt_duration=1219 frames, utt_pad_proportion=0.06112, over 10746.68 utterances.], batch size: 49, lr: 4.85e-03, grad_scale: 8.0 2023-03-09 01:12:59,480 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8046, 5.0120, 4.4539, 5.1747, 4.5539, 4.8293, 5.1308, 4.8658], device='cuda:2'), covar=tensor([0.0617, 0.0322, 0.1028, 0.0357, 0.0418, 0.0329, 0.0335, 0.0250], device='cuda:2'), in_proj_covar=tensor([0.0383, 0.0319, 0.0363, 0.0349, 0.0319, 0.0236, 0.0301, 0.0285], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 01:13:34,514 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. 
limit=2.0 2023-03-09 01:13:49,460 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6841, 2.2441, 2.3981, 2.5785, 2.8160, 2.5603, 2.3810, 3.0566], device='cuda:2'), covar=tensor([0.2518, 0.3690, 0.2650, 0.1736, 0.2106, 0.1477, 0.2804, 0.1131], device='cuda:2'), in_proj_covar=tensor([0.0122, 0.0128, 0.0124, 0.0112, 0.0128, 0.0108, 0.0131, 0.0104], device='cuda:2'), out_proj_covar=tensor([9.2623e-05, 9.9097e-05, 9.8886e-05, 8.7783e-05, 9.5430e-05, 8.7507e-05, 9.9965e-05, 8.2751e-05], device='cuda:2') 2023-03-09 01:13:52,177 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.972e+02 2.278e+02 2.813e+02 5.019e+02, threshold=4.557e+02, percent-clipped=5.0 2023-03-09 01:13:54,598 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87144.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:14:02,270 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87149.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:14:19,573 INFO [train2.py:809] (2/4) Epoch 22, batch 3500, loss[ctc_loss=0.05699, att_loss=0.2144, loss=0.1829, over 15502.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008167, over 36.00 utterances.], tot_loss[ctc_loss=0.07265, att_loss=0.2363, loss=0.2036, over 3272600.90 frames. utt_duration=1212 frames, utt_pad_proportion=0.06275, over 10812.67 utterances.], batch size: 36, lr: 4.85e-03, grad_scale: 8.0 2023-03-09 01:14:37,050 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87170.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:15:39,617 INFO [train2.py:809] (2/4) Epoch 22, batch 3550, loss[ctc_loss=0.05599, att_loss=0.2134, loss=0.1819, over 15376.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.01091, over 35.00 utterances.], tot_loss[ctc_loss=0.07228, att_loss=0.2353, loss=0.2027, over 3265224.25 frames. utt_duration=1243 frames, utt_pad_proportion=0.05705, over 10519.60 utterances.], batch size: 35, lr: 4.85e-03, grad_scale: 8.0 2023-03-09 01:15:51,488 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-09 01:16:12,834 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87230.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:16:18,886 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87234.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:16:22,082 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87236.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:16:32,594 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.973e+02 2.339e+02 2.801e+02 5.420e+02, threshold=4.678e+02, percent-clipped=3.0 2023-03-09 01:16:59,774 INFO [train2.py:809] (2/4) Epoch 22, batch 3600, loss[ctc_loss=0.06411, att_loss=0.2256, loss=0.1933, over 16272.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007645, over 43.00 utterances.], tot_loss[ctc_loss=0.07284, att_loss=0.2356, loss=0.2031, over 3265998.59 frames. 
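The per-batch entries above report a ctc_loss, an att_loss, and a combined loss; the combined value is consistent with a fixed 0.2/0.8 interpolation of the two branches (for the tot_loss logged just above, 0.2 * 0.07284 + 0.8 * 0.2356 ~= 0.2031). The weighting is inferred from the logged numbers rather than taken from train2.py, so the minimal sketch below is an assumption:

# Editor's sketch (inferred from the logged numbers, not quoted from train2.py):
# the combined loss looks like a fixed interpolation of the CTC and attention losses.
def combined_loss(ctc_loss: float, att_loss: float, att_rate: float = 0.8) -> float:
    """Weight the attention branch by att_rate and the CTC branch by (1 - att_rate)."""
    return (1.0 - att_rate) * ctc_loss + att_rate * att_loss

# Reproduces the tot_loss entry logged just above:
print(combined_loss(0.07284, 0.2356))   # -> 0.203048, logged as loss=0.2031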
utt_duration=1226 frames, utt_pad_proportion=0.06071, over 10669.39 utterances.], batch size: 43, lr: 4.85e-03, grad_scale: 16.0 2023-03-09 01:17:39,325 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87284.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:17:47,424 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2820, 5.2752, 4.9794, 3.0191, 4.9940, 4.8220, 4.6443, 3.0136], device='cuda:2'), covar=tensor([0.0117, 0.0092, 0.0286, 0.1042, 0.0102, 0.0202, 0.0245, 0.1210], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0101, 0.0105, 0.0111, 0.0085, 0.0115, 0.0099, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 01:17:51,203 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87291.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:18:18,724 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5077, 2.1786, 2.2427, 2.5124, 2.7253, 2.4579, 2.3990, 3.0129], device='cuda:2'), covar=tensor([0.1748, 0.3009, 0.2063, 0.1419, 0.2799, 0.1437, 0.2301, 0.0956], device='cuda:2'), in_proj_covar=tensor([0.0125, 0.0130, 0.0127, 0.0115, 0.0132, 0.0111, 0.0134, 0.0105], device='cuda:2'), out_proj_covar=tensor([9.4597e-05, 1.0105e-04, 1.0095e-04, 8.9778e-05, 9.8010e-05, 8.9777e-05, 1.0217e-04, 8.4212e-05], device='cuda:2') 2023-03-09 01:18:19,832 INFO [train2.py:809] (2/4) Epoch 22, batch 3650, loss[ctc_loss=0.07152, att_loss=0.2425, loss=0.2083, over 16630.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.00517, over 47.00 utterances.], tot_loss[ctc_loss=0.07184, att_loss=0.235, loss=0.2024, over 3266865.46 frames. utt_duration=1235 frames, utt_pad_proportion=0.0579, over 10594.17 utterances.], batch size: 47, lr: 4.85e-03, grad_scale: 16.0 2023-03-09 01:18:32,550 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8126, 5.2897, 5.0515, 5.1297, 5.2843, 4.8571, 3.8424, 5.3140], device='cuda:2'), covar=tensor([0.0112, 0.0101, 0.0127, 0.0084, 0.0084, 0.0123, 0.0605, 0.0156], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0088, 0.0111, 0.0070, 0.0076, 0.0087, 0.0104, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 01:19:02,892 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-03-09 01:19:13,096 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.232e+02 1.934e+02 2.315e+02 2.927e+02 6.600e+02, threshold=4.631e+02, percent-clipped=3.0 2023-03-09 01:19:38,698 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87358.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 01:19:40,005 INFO [train2.py:809] (2/4) Epoch 22, batch 3700, loss[ctc_loss=0.07669, att_loss=0.2531, loss=0.2178, over 17066.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.008316, over 53.00 utterances.], tot_loss[ctc_loss=0.0713, att_loss=0.2345, loss=0.2018, over 3264134.53 frames. utt_duration=1257 frames, utt_pad_proportion=0.05349, over 10395.92 utterances.], batch size: 53, lr: 4.85e-03, grad_scale: 16.0 2023-03-09 01:19:44,784 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87362.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:20:04,183 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. 
limit=2.0 2023-03-09 01:20:59,237 INFO [train2.py:809] (2/4) Epoch 22, batch 3750, loss[ctc_loss=0.05324, att_loss=0.2339, loss=0.1977, over 17448.00 frames. utt_duration=1109 frames, utt_pad_proportion=0.03062, over 63.00 utterances.], tot_loss[ctc_loss=0.07203, att_loss=0.2349, loss=0.2023, over 3265913.44 frames. utt_duration=1268 frames, utt_pad_proportion=0.05026, over 10311.67 utterances.], batch size: 63, lr: 4.84e-03, grad_scale: 16.0 2023-03-09 01:21:01,719 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-03-09 01:21:21,537 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87423.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:21:23,960 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.89 vs. limit=5.0 2023-03-09 01:21:52,783 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.314e+02 1.951e+02 2.463e+02 3.155e+02 5.552e+02, threshold=4.927e+02, percent-clipped=3.0 2023-03-09 01:22:19,681 INFO [train2.py:809] (2/4) Epoch 22, batch 3800, loss[ctc_loss=0.03934, att_loss=0.1955, loss=0.1643, over 15787.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007552, over 38.00 utterances.], tot_loss[ctc_loss=0.07132, att_loss=0.2341, loss=0.2016, over 3264521.95 frames. utt_duration=1287 frames, utt_pad_proportion=0.04708, over 10157.11 utterances.], batch size: 38, lr: 4.84e-03, grad_scale: 16.0 2023-03-09 01:22:23,121 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87461.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:22:37,208 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87470.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:22:59,376 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4357, 2.6530, 4.9998, 3.8543, 3.1225, 4.2026, 4.7741, 4.6313], device='cuda:2'), covar=tensor([0.0288, 0.1508, 0.0203, 0.0830, 0.1635, 0.0256, 0.0175, 0.0265], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0240, 0.0193, 0.0314, 0.0262, 0.0216, 0.0182, 0.0211], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:23:34,638 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87505.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:23:40,698 INFO [train2.py:809] (2/4) Epoch 22, batch 3850, loss[ctc_loss=0.06693, att_loss=0.2397, loss=0.2052, over 17185.00 frames. utt_duration=871.6 frames, utt_pad_proportion=0.08728, over 79.00 utterances.], tot_loss[ctc_loss=0.0712, att_loss=0.2345, loss=0.2019, over 3271622.73 frames. 
utt_duration=1265 frames, utt_pad_proportion=0.05021, over 10356.97 utterances.], batch size: 79, lr: 4.84e-03, grad_scale: 16.0 2023-03-09 01:23:54,822 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87518.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:23:59,497 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7349, 6.0372, 5.3939, 5.6893, 5.7055, 5.1957, 5.4017, 5.2078], device='cuda:2'), covar=tensor([0.1236, 0.0783, 0.0880, 0.0807, 0.0871, 0.1391, 0.2193, 0.2054], device='cuda:2'), in_proj_covar=tensor([0.0533, 0.0621, 0.0469, 0.0465, 0.0443, 0.0475, 0.0623, 0.0536], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 01:24:01,318 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87522.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:24:19,714 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87534.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:24:33,080 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.913e+02 2.224e+02 2.689e+02 7.925e+02, threshold=4.448e+02, percent-clipped=4.0 2023-03-09 01:24:57,881 INFO [train2.py:809] (2/4) Epoch 22, batch 3900, loss[ctc_loss=0.1071, att_loss=0.263, loss=0.2318, over 17557.00 frames. utt_duration=879.3 frames, utt_pad_proportion=0.07349, over 80.00 utterances.], tot_loss[ctc_loss=0.07157, att_loss=0.2348, loss=0.2022, over 3269207.15 frames. utt_duration=1253 frames, utt_pad_proportion=0.05287, over 10448.28 utterances.], batch size: 80, lr: 4.84e-03, grad_scale: 16.0 2023-03-09 01:25:08,949 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87566.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 01:25:33,508 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87582.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:25:36,949 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87584.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:25:39,676 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87586.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:25:44,706 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87589.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:26:09,214 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87605.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:26:14,864 INFO [train2.py:809] (2/4) Epoch 22, batch 3950, loss[ctc_loss=0.08037, att_loss=0.2444, loss=0.2116, over 17138.00 frames. utt_duration=869.3 frames, utt_pad_proportion=0.08883, over 79.00 utterances.], tot_loss[ctc_loss=0.07159, att_loss=0.2353, loss=0.2025, over 3270903.14 frames. 
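The recurring zipformer.py:625 lines report, per encoder stack, a warmup window in batches (warmup_begin/warmup_end), the current batch_count, and which layers were randomly skipped for that batch (num_to_drop, layers_to_drop). A minimal sketch of that kind of stochastic layer skipping follows; the probability schedule and selection rule are illustrative assumptions, not the Zipformer implementation:

# Editor's sketch of warmup-time stochastic layer drop, suggested by the
# zipformer.py:625 lines. Only the logged fields (warmup window, batch count,
# dropped set) come from the log; the probabilities are made-up placeholders.
import random

def pick_layers_to_drop(num_layers: int, batch_count: float,
                        warmup_begin: float, warmup_end: float,
                        max_drop_prob: float = 0.075) -> set:
    """Randomly choose encoder layers to skip for this batch."""
    if batch_count < warmup_begin:
        drop_prob = max_drop_prob        # early training: skip layers more often
    elif batch_count < warmup_end:
        frac = (batch_count - warmup_begin) / (warmup_end - warmup_begin)
        drop_prob = max_drop_prob * (1.0 - frac)   # anneal across the warmup window
    else:
        drop_prob = 0.01                 # mostly keep all layers afterwards
    return {i for i in range(num_layers) if random.random() < drop_prob}

layers = pick_layers_to_drop(num_layers=4, batch_count=87518.0,
                             warmup_begin=666.7, warmup_end=1333.3)
print(f"num_to_drop={len(layers)}, layers_to_drop={layers}")   # usually 0, as in the log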
utt_duration=1255 frames, utt_pad_proportion=0.05268, over 10433.90 utterances.], batch size: 79, lr: 4.84e-03, grad_scale: 16.0 2023-03-09 01:26:39,259 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9018, 5.2518, 4.8098, 5.3533, 4.6123, 5.0315, 5.3971, 5.0871], device='cuda:2'), covar=tensor([0.0677, 0.0310, 0.0908, 0.0333, 0.0474, 0.0255, 0.0254, 0.0228], device='cuda:2'), in_proj_covar=tensor([0.0390, 0.0322, 0.0368, 0.0354, 0.0325, 0.0239, 0.0305, 0.0288], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 01:27:30,757 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.383e+02 1.991e+02 2.302e+02 2.874e+02 7.527e+02, threshold=4.603e+02, percent-clipped=2.0 2023-03-09 01:27:30,801 INFO [train2.py:809] (2/4) Epoch 23, batch 0, loss[ctc_loss=0.06745, att_loss=0.238, loss=0.2039, over 17282.00 frames. utt_duration=1258 frames, utt_pad_proportion=0.0115, over 55.00 utterances.], tot_loss[ctc_loss=0.06745, att_loss=0.238, loss=0.2039, over 17282.00 frames. utt_duration=1258 frames, utt_pad_proportion=0.0115, over 55.00 utterances.], batch size: 55, lr: 4.73e-03, grad_scale: 16.0 2023-03-09 01:27:30,801 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 01:27:42,673 INFO [train2.py:843] (2/4) Epoch 23, validation: ctc_loss=0.04039, att_loss=0.2346, loss=0.1958, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 01:27:42,674 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 01:27:46,039 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87645.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 01:27:53,622 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87650.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:28:06,064 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87658.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 01:28:18,430 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87666.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 01:28:58,598 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. limit=2.0 2023-03-09 01:29:02,024 INFO [train2.py:809] (2/4) Epoch 23, batch 50, loss[ctc_loss=0.07891, att_loss=0.2435, loss=0.2106, over 17145.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.01268, over 56.00 utterances.], tot_loss[ctc_loss=0.07092, att_loss=0.2372, loss=0.204, over 744676.48 frames. 
utt_duration=1234 frames, utt_pad_proportion=0.04756, over 2417.36 utterances.], batch size: 56, lr: 4.73e-03, grad_scale: 16.0 2023-03-09 01:29:22,816 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87706.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 01:29:41,150 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87718.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:29:54,251 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0812, 4.3182, 4.2294, 4.6405, 2.4896, 4.4698, 2.4895, 1.8369], device='cuda:2'), covar=tensor([0.0468, 0.0263, 0.0718, 0.0237, 0.1806, 0.0204, 0.1652, 0.1693], device='cuda:2'), in_proj_covar=tensor([0.0198, 0.0172, 0.0264, 0.0161, 0.0224, 0.0153, 0.0234, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:30:21,898 INFO [train2.py:809] (2/4) Epoch 23, batch 100, loss[ctc_loss=0.07868, att_loss=0.227, loss=0.1973, over 14558.00 frames. utt_duration=1821 frames, utt_pad_proportion=0.03425, over 32.00 utterances.], tot_loss[ctc_loss=0.07074, att_loss=0.2347, loss=0.2019, over 1305355.95 frames. utt_duration=1278 frames, utt_pad_proportion=0.04417, over 4090.94 utterances.], batch size: 32, lr: 4.73e-03, grad_scale: 8.0 2023-03-09 01:30:23,391 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 2.012e+02 2.393e+02 2.818e+02 6.680e+02, threshold=4.785e+02, percent-clipped=2.0 2023-03-09 01:30:42,579 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2026, 2.9731, 3.0975, 4.2159, 3.8242, 3.7627, 2.9252, 2.2918], device='cuda:2'), covar=tensor([0.0790, 0.1799, 0.1019, 0.0631, 0.0906, 0.0527, 0.1466, 0.2192], device='cuda:2'), in_proj_covar=tensor([0.0183, 0.0217, 0.0192, 0.0224, 0.0229, 0.0183, 0.0205, 0.0192], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:31:37,189 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3624, 2.0081, 1.9766, 2.3055, 2.5155, 2.3134, 2.1226, 2.6726], device='cuda:2'), covar=tensor([0.2017, 0.3132, 0.2086, 0.1586, 0.1861, 0.1604, 0.2286, 0.1660], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0130, 0.0126, 0.0116, 0.0131, 0.0111, 0.0134, 0.0106], device='cuda:2'), out_proj_covar=tensor([9.5104e-05, 1.0106e-04, 1.0058e-04, 9.0597e-05, 9.7684e-05, 8.9809e-05, 1.0218e-04, 8.4767e-05], device='cuda:2') 2023-03-09 01:31:41,399 INFO [train2.py:809] (2/4) Epoch 23, batch 150, loss[ctc_loss=0.06625, att_loss=0.2172, loss=0.187, over 15878.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.008625, over 39.00 utterances.], tot_loss[ctc_loss=0.07069, att_loss=0.2332, loss=0.2007, over 1739153.87 frames. 
utt_duration=1276 frames, utt_pad_proportion=0.04914, over 5459.44 utterances.], batch size: 39, lr: 4.73e-03, grad_scale: 8.0 2023-03-09 01:31:58,519 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1526, 3.7495, 3.2004, 3.4106, 3.9695, 3.6259, 3.1017, 4.2647], device='cuda:2'), covar=tensor([0.1039, 0.0510, 0.1146, 0.0740, 0.0760, 0.0798, 0.0853, 0.0479], device='cuda:2'), in_proj_covar=tensor([0.0203, 0.0217, 0.0227, 0.0202, 0.0280, 0.0243, 0.0200, 0.0289], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 01:32:20,530 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87817.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:32:55,588 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.66 vs. limit=5.0 2023-03-09 01:33:02,585 INFO [train2.py:809] (2/4) Epoch 23, batch 200, loss[ctc_loss=0.06163, att_loss=0.2309, loss=0.1971, over 16131.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005679, over 42.00 utterances.], tot_loss[ctc_loss=0.07132, att_loss=0.2344, loss=0.2018, over 2073733.10 frames. utt_duration=1212 frames, utt_pad_proportion=0.067, over 6851.68 utterances.], batch size: 42, lr: 4.72e-03, grad_scale: 8.0 2023-03-09 01:33:04,008 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.980e+02 2.350e+02 2.635e+02 6.250e+02, threshold=4.701e+02, percent-clipped=5.0 2023-03-09 01:33:16,336 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-09 01:33:31,417 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87861.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 01:34:11,165 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87886.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:34:22,273 INFO [train2.py:809] (2/4) Epoch 23, batch 250, loss[ctc_loss=0.07172, att_loss=0.2228, loss=0.1926, over 15775.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008243, over 38.00 utterances.], tot_loss[ctc_loss=0.0714, att_loss=0.2343, loss=0.2017, over 2341814.99 frames. utt_duration=1240 frames, utt_pad_proportion=0.05819, over 7563.40 utterances.], batch size: 38, lr: 4.72e-03, grad_scale: 8.0 2023-03-09 01:35:27,222 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87934.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:35:37,864 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87940.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 01:35:42,325 INFO [train2.py:809] (2/4) Epoch 23, batch 300, loss[ctc_loss=0.07374, att_loss=0.2426, loss=0.2088, over 17054.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.00963, over 53.00 utterances.], tot_loss[ctc_loss=0.06942, att_loss=0.2328, loss=0.2002, over 2544697.06 frames. 
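In every optim.py:369 entry in this stretch the clipping threshold equals Clipping_scale times the logged median gradient norm (e.g. 2.0 * 2.350e+02 ~= 4.701e+02 in the 01:33:04 entry above), so the threshold appears to be derived from a running median of recent gradient norms. A sketch of computing those statistics from a window of norms is below; the window itself and the percent-clipped bookkeeping are assumptions:

# Editor's sketch (not icefall's optim.py): threshold = clipping_scale * median,
# which matches every logged quartile/threshold pair in this section.
import torch

def clip_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Return (min, q1, median, q3, max), the clip threshold, and percent clipped."""
    quartiles = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]              # scale times the median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

norms = torch.tensor([117.3, 198.0, 235.0, 263.5, 625.0])  # toy values shaped like the log
q, thr, pct = clip_stats(norms)
print(q.tolist(), float(thr), float(pct))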
utt_duration=1276 frames, utt_pad_proportion=0.04948, over 7988.00 utterances.], batch size: 53, lr: 4.72e-03, grad_scale: 8.0 2023-03-09 01:35:43,857 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.187e+02 1.810e+02 2.179e+02 2.816e+02 7.106e+02, threshold=4.359e+02, percent-clipped=3.0 2023-03-09 01:35:45,665 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87945.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:36:10,764 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87961.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 01:36:15,611 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2564, 2.8399, 3.1979, 4.2715, 3.8562, 3.7620, 2.8318, 2.1920], device='cuda:2'), covar=tensor([0.0869, 0.1872, 0.1049, 0.0667, 0.0855, 0.0559, 0.1716, 0.2260], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0215, 0.0191, 0.0223, 0.0227, 0.0182, 0.0204, 0.0190], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:36:55,336 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.82 vs. limit=5.0 2023-03-09 01:37:02,473 INFO [train2.py:809] (2/4) Epoch 23, batch 350, loss[ctc_loss=0.09363, att_loss=0.2559, loss=0.2234, over 16977.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.007159, over 50.00 utterances.], tot_loss[ctc_loss=0.0698, att_loss=0.2327, loss=0.2001, over 2703714.90 frames. utt_duration=1287 frames, utt_pad_proportion=0.04735, over 8413.53 utterances.], batch size: 50, lr: 4.72e-03, grad_scale: 8.0 2023-03-09 01:37:34,089 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3439, 5.2875, 4.9736, 3.5360, 5.1283, 5.0824, 4.8137, 3.4575], device='cuda:2'), covar=tensor([0.0119, 0.0108, 0.0344, 0.0963, 0.0100, 0.0181, 0.0266, 0.1240], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0103, 0.0107, 0.0112, 0.0086, 0.0116, 0.0101, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 01:37:46,398 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88018.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:38:27,590 INFO [train2.py:809] (2/4) Epoch 23, batch 400, loss[ctc_loss=0.07308, att_loss=0.2373, loss=0.2045, over 16871.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.008075, over 49.00 utterances.], tot_loss[ctc_loss=0.07122, att_loss=0.2342, loss=0.2016, over 2827296.44 frames. 
utt_duration=1241 frames, utt_pad_proportion=0.05887, over 9122.40 utterances.], batch size: 49, lr: 4.72e-03, grad_scale: 8.0 2023-03-09 01:38:29,097 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.944e+02 2.395e+02 3.108e+02 5.272e+02, threshold=4.790e+02, percent-clipped=4.0 2023-03-09 01:38:42,577 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8561, 2.2598, 2.6317, 2.8905, 2.9533, 2.6700, 2.5447, 3.1447], device='cuda:2'), covar=tensor([0.1188, 0.3194, 0.1804, 0.1102, 0.1261, 0.1228, 0.2255, 0.1016], device='cuda:2'), in_proj_covar=tensor([0.0123, 0.0128, 0.0124, 0.0114, 0.0129, 0.0110, 0.0132, 0.0105], device='cuda:2'), out_proj_covar=tensor([9.3339e-05, 9.9620e-05, 9.9240e-05, 8.9184e-05, 9.5920e-05, 8.8480e-05, 1.0075e-04, 8.3709e-05], device='cuda:2') 2023-03-09 01:39:04,771 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88066.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:39:48,100 INFO [train2.py:809] (2/4) Epoch 23, batch 450, loss[ctc_loss=0.0755, att_loss=0.2216, loss=0.1924, over 16185.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.00551, over 41.00 utterances.], tot_loss[ctc_loss=0.07192, att_loss=0.2343, loss=0.2018, over 2922069.37 frames. utt_duration=1231 frames, utt_pad_proportion=0.06258, over 9503.96 utterances.], batch size: 41, lr: 4.72e-03, grad_scale: 8.0 2023-03-09 01:40:00,132 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=88100.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:40:10,124 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-09 01:40:26,313 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88117.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:40:56,996 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2023-03-09 01:41:09,077 INFO [train2.py:809] (2/4) Epoch 23, batch 500, loss[ctc_loss=0.07272, att_loss=0.2447, loss=0.2103, over 17299.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01216, over 55.00 utterances.], tot_loss[ctc_loss=0.07219, att_loss=0.2347, loss=0.2022, over 2996585.29 frames. 
utt_duration=1223 frames, utt_pad_proportion=0.06505, over 9815.42 utterances.], batch size: 55, lr: 4.72e-03, grad_scale: 8.0 2023-03-09 01:41:10,580 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.266e+02 1.977e+02 2.397e+02 3.059e+02 5.388e+02, threshold=4.794e+02, percent-clipped=2.0 2023-03-09 01:41:37,460 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88161.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 01:41:37,512 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7434, 3.5018, 3.4479, 2.9228, 3.4422, 3.4278, 3.4954, 2.3985], device='cuda:2'), covar=tensor([0.1137, 0.1017, 0.1916, 0.3850, 0.0996, 0.1679, 0.1060, 0.4809], device='cuda:2'), in_proj_covar=tensor([0.0185, 0.0193, 0.0206, 0.0259, 0.0165, 0.0267, 0.0189, 0.0219], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:41:37,530 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=88161.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:41:44,236 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88165.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:41:50,481 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0523, 5.3617, 4.9489, 5.4145, 4.8169, 5.0498, 5.4880, 5.2563], device='cuda:2'), covar=tensor([0.0570, 0.0261, 0.0689, 0.0318, 0.0383, 0.0272, 0.0222, 0.0199], device='cuda:2'), in_proj_covar=tensor([0.0384, 0.0320, 0.0362, 0.0351, 0.0321, 0.0238, 0.0303, 0.0288], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0005, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 01:42:29,381 INFO [train2.py:809] (2/4) Epoch 23, batch 550, loss[ctc_loss=0.06507, att_loss=0.2339, loss=0.2001, over 16771.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006214, over 48.00 utterances.], tot_loss[ctc_loss=0.07283, att_loss=0.2348, loss=0.2024, over 3057895.91 frames. utt_duration=1221 frames, utt_pad_proportion=0.06426, over 10031.35 utterances.], batch size: 48, lr: 4.72e-03, grad_scale: 8.0 2023-03-09 01:42:41,088 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1042, 4.4324, 4.8354, 4.5970, 3.0069, 4.5369, 3.2275, 2.3072], device='cuda:2'), covar=tensor([0.0561, 0.0356, 0.0515, 0.0265, 0.1441, 0.0239, 0.1200, 0.1529], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0172, 0.0265, 0.0164, 0.0226, 0.0154, 0.0235, 0.0208], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:42:54,653 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88209.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:43:46,374 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88240.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:43:51,893 INFO [train2.py:809] (2/4) Epoch 23, batch 600, loss[ctc_loss=0.05971, att_loss=0.2056, loss=0.1764, over 15645.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008265, over 37.00 utterances.], tot_loss[ctc_loss=0.07171, att_loss=0.2337, loss=0.2013, over 3103482.38 frames. 
utt_duration=1249 frames, utt_pad_proportion=0.05782, over 9952.65 utterances.], batch size: 37, lr: 4.71e-03, grad_scale: 8.0 2023-03-09 01:43:53,455 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.322e+02 1.886e+02 2.333e+02 2.841e+02 4.176e+02, threshold=4.666e+02, percent-clipped=0.0 2023-03-09 01:43:55,486 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88245.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:44:20,771 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88261.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 01:45:04,433 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88288.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:45:10,130 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=88291.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 01:45:12,841 INFO [train2.py:809] (2/4) Epoch 23, batch 650, loss[ctc_loss=0.06327, att_loss=0.239, loss=0.2039, over 17040.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.009478, over 52.00 utterances.], tot_loss[ctc_loss=0.07199, att_loss=0.2342, loss=0.2018, over 3141965.88 frames. utt_duration=1235 frames, utt_pad_proportion=0.05924, over 10187.01 utterances.], batch size: 52, lr: 4.71e-03, grad_scale: 8.0 2023-03-09 01:45:12,985 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88293.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:45:30,405 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9212, 4.9297, 4.5953, 2.8588, 4.6007, 4.6390, 4.0980, 2.4733], device='cuda:2'), covar=tensor([0.0167, 0.0122, 0.0356, 0.1113, 0.0132, 0.0219, 0.0392, 0.1611], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0102, 0.0106, 0.0111, 0.0086, 0.0115, 0.0100, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 01:45:38,313 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88309.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:46:20,230 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2513, 4.5367, 4.7009, 4.9025, 2.9500, 4.6238, 3.1864, 2.0162], device='cuda:2'), covar=tensor([0.0430, 0.0310, 0.0527, 0.0204, 0.1451, 0.0201, 0.1138, 0.1597], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0169, 0.0262, 0.0162, 0.0222, 0.0152, 0.0230, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:46:22,679 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-09 01:46:34,459 INFO [train2.py:809] (2/4) Epoch 23, batch 700, loss[ctc_loss=0.07717, att_loss=0.2242, loss=0.1948, over 16182.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006305, over 41.00 utterances.], tot_loss[ctc_loss=0.07198, att_loss=0.2343, loss=0.2018, over 3168733.50 frames. 
utt_duration=1233 frames, utt_pad_proportion=0.05995, over 10290.40 utterances.], batch size: 41, lr: 4.71e-03, grad_scale: 8.0 2023-03-09 01:46:35,998 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.287e+02 1.968e+02 2.314e+02 2.837e+02 1.022e+03, threshold=4.627e+02, percent-clipped=4.0 2023-03-09 01:46:49,142 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=88352.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 01:47:56,421 INFO [train2.py:809] (2/4) Epoch 23, batch 750, loss[ctc_loss=0.06385, att_loss=0.2358, loss=0.2014, over 16396.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007964, over 44.00 utterances.], tot_loss[ctc_loss=0.07131, att_loss=0.2336, loss=0.2011, over 3184166.77 frames. utt_duration=1247 frames, utt_pad_proportion=0.05894, over 10229.96 utterances.], batch size: 44, lr: 4.71e-03, grad_scale: 8.0 2023-03-09 01:48:17,880 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6945, 3.3567, 3.2838, 2.8626, 3.2928, 3.3090, 3.3763, 2.4527], device='cuda:2'), covar=tensor([0.1162, 0.1490, 0.1778, 0.3489, 0.1461, 0.1838, 0.1082, 0.3971], device='cuda:2'), in_proj_covar=tensor([0.0186, 0.0193, 0.0206, 0.0260, 0.0165, 0.0267, 0.0189, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:49:15,205 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1853, 4.5147, 4.6074, 4.7895, 3.0813, 4.5026, 2.8345, 2.1934], device='cuda:2'), covar=tensor([0.0432, 0.0279, 0.0580, 0.0347, 0.1396, 0.0232, 0.1386, 0.1492], device='cuda:2'), in_proj_covar=tensor([0.0196, 0.0169, 0.0260, 0.0161, 0.0221, 0.0152, 0.0230, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:49:17,982 INFO [train2.py:809] (2/4) Epoch 23, batch 800, loss[ctc_loss=0.08158, att_loss=0.2616, loss=0.2256, over 17326.00 frames. utt_duration=1102 frames, utt_pad_proportion=0.03626, over 63.00 utterances.], tot_loss[ctc_loss=0.07075, att_loss=0.2337, loss=0.2011, over 3206113.05 frames. 
utt_duration=1273 frames, utt_pad_proportion=0.05092, over 10084.27 utterances.], batch size: 63, lr: 4.71e-03, grad_scale: 8.0 2023-03-09 01:49:19,538 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.999e+02 2.331e+02 2.822e+02 6.920e+02, threshold=4.662e+02, percent-clipped=4.0 2023-03-09 01:49:39,400 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=88456.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:49:55,842 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7435, 2.2817, 2.5513, 2.5870, 2.7068, 2.6920, 2.4118, 2.9088], device='cuda:2'), covar=tensor([0.1808, 0.2897, 0.2060, 0.1519, 0.1705, 0.1361, 0.2332, 0.1434], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0130, 0.0127, 0.0117, 0.0130, 0.0112, 0.0136, 0.0107], device='cuda:2'), out_proj_covar=tensor([9.5376e-05, 1.0109e-04, 1.0136e-04, 9.1207e-05, 9.7475e-05, 9.0540e-05, 1.0311e-04, 8.5206e-05], device='cuda:2') 2023-03-09 01:50:00,481 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1200, 5.0332, 4.8146, 3.1309, 4.7364, 4.7505, 4.3157, 2.7237], device='cuda:2'), covar=tensor([0.0121, 0.0110, 0.0290, 0.1008, 0.0124, 0.0190, 0.0330, 0.1346], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0101, 0.0105, 0.0111, 0.0085, 0.0114, 0.0099, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 01:50:32,698 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. limit=2.0 2023-03-09 01:50:39,457 INFO [train2.py:809] (2/4) Epoch 23, batch 850, loss[ctc_loss=0.0641, att_loss=0.2228, loss=0.1911, over 15951.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006658, over 41.00 utterances.], tot_loss[ctc_loss=0.07099, att_loss=0.2337, loss=0.2012, over 3219234.85 frames. utt_duration=1281 frames, utt_pad_proportion=0.05062, over 10067.00 utterances.], batch size: 41, lr: 4.71e-03, grad_scale: 8.0 2023-03-09 01:51:36,794 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7759, 2.4023, 5.1292, 4.2042, 3.2415, 4.4644, 5.0188, 4.8653], device='cuda:2'), covar=tensor([0.0191, 0.1465, 0.0146, 0.0816, 0.1541, 0.0193, 0.0108, 0.0187], device='cuda:2'), in_proj_covar=tensor([0.0204, 0.0244, 0.0194, 0.0319, 0.0266, 0.0218, 0.0184, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 01:51:41,309 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.45 vs. limit=5.0 2023-03-09 01:52:02,090 INFO [train2.py:809] (2/4) Epoch 23, batch 900, loss[ctc_loss=0.07113, att_loss=0.24, loss=0.2063, over 17340.00 frames. utt_duration=879.6 frames, utt_pad_proportion=0.07992, over 79.00 utterances.], tot_loss[ctc_loss=0.07104, att_loss=0.2334, loss=0.2009, over 3223451.33 frames. utt_duration=1263 frames, utt_pad_proportion=0.05448, over 10220.13 utterances.], batch size: 79, lr: 4.71e-03, grad_scale: 8.0 2023-03-09 01:52:03,753 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.224e+02 1.915e+02 2.378e+02 3.006e+02 6.830e+02, threshold=4.756e+02, percent-clipped=6.0 2023-03-09 01:52:35,955 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-09 01:53:24,722 INFO [train2.py:809] (2/4) Epoch 23, batch 950, loss[ctc_loss=0.07175, att_loss=0.254, loss=0.2176, over 16979.00 frames. 
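The scaling.py:679 lines compare a whitening metric against a limit (e.g. metric=1.86 vs. limit=2.0 above); the metric plausibly quantifies how far the grouped channel covariance is from a multiple of the identity, with 1.0 meaning fully white. The exact icefall formula is not reproduced here; the sketch below uses a generic proxy, n * tr(C^2) / tr(C)^2, purely as an illustration:

# Editor's sketch of a whiteness diagnostic, labelled as an assumption: the proxy
# n * tr(C^2) / tr(C)^2 equals 1.0 when the per-group channel covariance C is a
# multiple of the identity and grows as the spectrum spreads. NOT claimed to be
# the exact metric logged by scaling.py.
import torch

def whiteness_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """x: (num_frames, num_channels); channels are split into num_groups groups."""
    num_frames, num_channels = x.shape
    per_group = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, per_group).transpose(0, 1)  # (groups, frames, chans)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames             # (groups, chans, chans)
    tr = cov.diagonal(dim1=-2, dim2=-1).sum(-1)                       # trace of C per group
    tr_sq = (cov * cov).sum(dim=(-2, -1))                             # trace of C^2 per group
    return (per_group * tr_sq / tr.clamp_min(1e-20) ** 2).mean()      # 1.0 == white

x = torch.randn(1000, 192)                          # toy activations: 8 groups of 24 channels
print(float(whiteness_metric(x, num_groups=8)))     # close to 1.0 for white noise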
utt_duration=1360 frames, utt_pad_proportion=0.006939, over 50.00 utterances.], tot_loss[ctc_loss=0.07092, att_loss=0.2338, loss=0.2012, over 3235067.61 frames. utt_duration=1251 frames, utt_pad_proportion=0.0557, over 10352.62 utterances.], batch size: 50, lr: 4.70e-03, grad_scale: 8.0 2023-03-09 01:54:46,091 INFO [train2.py:809] (2/4) Epoch 23, batch 1000, loss[ctc_loss=0.06712, att_loss=0.2497, loss=0.2132, over 16778.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005963, over 48.00 utterances.], tot_loss[ctc_loss=0.07036, att_loss=0.2334, loss=0.2008, over 3242484.15 frames. utt_duration=1259 frames, utt_pad_proportion=0.05314, over 10314.88 utterances.], batch size: 48, lr: 4.70e-03, grad_scale: 8.0 2023-03-09 01:54:48,312 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.864e+02 2.157e+02 2.686e+02 4.602e+02, threshold=4.313e+02, percent-clipped=0.0 2023-03-09 01:54:53,439 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=88647.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 01:55:26,048 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0540, 5.3823, 5.2908, 5.3520, 5.4855, 5.0668, 3.8825, 5.4472], device='cuda:2'), covar=tensor([0.0080, 0.0085, 0.0095, 0.0053, 0.0065, 0.0092, 0.0560, 0.0144], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0087, 0.0111, 0.0069, 0.0075, 0.0086, 0.0103, 0.0108], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 01:56:09,566 INFO [train2.py:809] (2/4) Epoch 23, batch 1050, loss[ctc_loss=0.0671, att_loss=0.2448, loss=0.2093, over 16617.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005723, over 47.00 utterances.], tot_loss[ctc_loss=0.07101, att_loss=0.2342, loss=0.2015, over 3248344.56 frames. utt_duration=1256 frames, utt_pad_proportion=0.05385, over 10355.25 utterances.], batch size: 47, lr: 4.70e-03, grad_scale: 8.0 2023-03-09 01:57:24,375 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0062, 5.3373, 4.9081, 5.3967, 4.7645, 5.0905, 5.4817, 5.2231], device='cuda:2'), covar=tensor([0.0553, 0.0305, 0.0768, 0.0325, 0.0445, 0.0237, 0.0225, 0.0213], device='cuda:2'), in_proj_covar=tensor([0.0390, 0.0326, 0.0367, 0.0355, 0.0326, 0.0241, 0.0307, 0.0292], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 01:57:30,340 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.15 vs. limit=2.0 2023-03-09 01:57:30,982 INFO [train2.py:809] (2/4) Epoch 23, batch 1100, loss[ctc_loss=0.06376, att_loss=0.241, loss=0.2056, over 17322.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02324, over 59.00 utterances.], tot_loss[ctc_loss=0.0709, att_loss=0.2336, loss=0.201, over 3250645.85 frames. utt_duration=1291 frames, utt_pad_proportion=0.0462, over 10082.17 utterances.], batch size: 59, lr: 4.70e-03, grad_scale: 8.0 2023-03-09 01:57:32,494 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.348e+02 2.060e+02 2.487e+02 3.193e+02 8.233e+02, threshold=4.974e+02, percent-clipped=4.0 2023-03-09 01:57:52,258 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88756.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 01:58:52,971 INFO [train2.py:809] (2/4) Epoch 23, batch 1150, loss[ctc_loss=0.1045, att_loss=0.2584, loss=0.2276, over 17341.00 frames. 
utt_duration=1102 frames, utt_pad_proportion=0.03464, over 63.00 utterances.], tot_loss[ctc_loss=0.07044, att_loss=0.2332, loss=0.2007, over 3258116.15 frames. utt_duration=1300 frames, utt_pad_proportion=0.04257, over 10035.06 utterances.], batch size: 63, lr: 4.70e-03, grad_scale: 8.0 2023-03-09 01:59:11,764 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88804.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:00:16,049 INFO [train2.py:809] (2/4) Epoch 23, batch 1200, loss[ctc_loss=0.08295, att_loss=0.2516, loss=0.2179, over 17450.00 frames. utt_duration=1109 frames, utt_pad_proportion=0.03032, over 63.00 utterances.], tot_loss[ctc_loss=0.07016, att_loss=0.2331, loss=0.2005, over 3261526.63 frames. utt_duration=1303 frames, utt_pad_proportion=0.04254, over 10027.64 utterances.], batch size: 63, lr: 4.70e-03, grad_scale: 8.0 2023-03-09 02:00:17,521 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.914e+02 2.277e+02 2.770e+02 5.405e+02, threshold=4.555e+02, percent-clipped=1.0 2023-03-09 02:00:40,198 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5668, 2.5711, 2.5655, 2.3965, 2.5681, 2.4129, 2.6240, 2.0945], device='cuda:2'), covar=tensor([0.1159, 0.1857, 0.2412, 0.3205, 0.1240, 0.2798, 0.1536, 0.3537], device='cuda:2'), in_proj_covar=tensor([0.0187, 0.0194, 0.0205, 0.0261, 0.0165, 0.0268, 0.0191, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 02:01:01,224 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.82 vs. limit=5.0 2023-03-09 02:01:04,503 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.91 vs. limit=5.0 2023-03-09 02:01:37,925 INFO [train2.py:809] (2/4) Epoch 23, batch 1250, loss[ctc_loss=0.08478, att_loss=0.2319, loss=0.2024, over 16003.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007419, over 40.00 utterances.], tot_loss[ctc_loss=0.07059, att_loss=0.2333, loss=0.2007, over 3264867.36 frames. utt_duration=1297 frames, utt_pad_proportion=0.04413, over 10082.74 utterances.], batch size: 40, lr: 4.70e-03, grad_scale: 8.0 2023-03-09 02:03:00,074 INFO [train2.py:809] (2/4) Epoch 23, batch 1300, loss[ctc_loss=0.05319, att_loss=0.2097, loss=0.1784, over 15377.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.01075, over 35.00 utterances.], tot_loss[ctc_loss=0.0712, att_loss=0.2339, loss=0.2014, over 3268966.13 frames. utt_duration=1301 frames, utt_pad_proportion=0.04282, over 10059.13 utterances.], batch size: 35, lr: 4.70e-03, grad_scale: 8.0 2023-03-09 02:03:01,704 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.977e+02 2.384e+02 3.004e+02 6.014e+02, threshold=4.768e+02, percent-clipped=3.0 2023-03-09 02:03:06,871 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88947.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 02:04:21,371 INFO [train2.py:809] (2/4) Epoch 23, batch 1350, loss[ctc_loss=0.07126, att_loss=0.2188, loss=0.1893, over 16272.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007798, over 43.00 utterances.], tot_loss[ctc_loss=0.07117, att_loss=0.2337, loss=0.2012, over 3266592.00 frames. 
utt_duration=1297 frames, utt_pad_proportion=0.04384, over 10085.69 utterances.], batch size: 43, lr: 4.69e-03, grad_scale: 8.0 2023-03-09 02:04:24,661 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88995.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 02:05:14,565 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.53 vs. limit=2.0 2023-03-09 02:05:17,207 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89027.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:05:43,318 INFO [train2.py:809] (2/4) Epoch 23, batch 1400, loss[ctc_loss=0.06301, att_loss=0.2142, loss=0.184, over 15891.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.008324, over 39.00 utterances.], tot_loss[ctc_loss=0.07083, att_loss=0.2332, loss=0.2007, over 3266414.21 frames. utt_duration=1290 frames, utt_pad_proportion=0.04661, over 10139.82 utterances.], batch size: 39, lr: 4.69e-03, grad_scale: 8.0 2023-03-09 02:05:44,870 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.382e+02 1.935e+02 2.297e+02 2.917e+02 5.540e+02, threshold=4.593e+02, percent-clipped=1.0 2023-03-09 02:05:48,801 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-09 02:05:54,598 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9488, 5.2113, 5.4860, 5.3242, 5.4126, 5.8737, 5.1573, 5.9972], device='cuda:2'), covar=tensor([0.0630, 0.0748, 0.0845, 0.1423, 0.1627, 0.0947, 0.0747, 0.0675], device='cuda:2'), in_proj_covar=tensor([0.0881, 0.0515, 0.0617, 0.0671, 0.0879, 0.0638, 0.0496, 0.0619], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 02:06:03,997 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2324, 3.8377, 3.3269, 3.6126, 4.0159, 3.7443, 3.3737, 4.3962], device='cuda:2'), covar=tensor([0.0998, 0.0571, 0.1086, 0.0680, 0.0746, 0.0725, 0.0738, 0.0496], device='cuda:2'), in_proj_covar=tensor([0.0205, 0.0217, 0.0229, 0.0204, 0.0282, 0.0245, 0.0200, 0.0293], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 02:06:20,417 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9177, 3.6366, 3.5844, 3.1355, 3.6403, 3.6582, 3.6789, 2.5610], device='cuda:2'), covar=tensor([0.0998, 0.1197, 0.1787, 0.3251, 0.1177, 0.2364, 0.0735, 0.3569], device='cuda:2'), in_proj_covar=tensor([0.0186, 0.0194, 0.0204, 0.0260, 0.0165, 0.0268, 0.0191, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 02:06:57,316 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89088.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:07:04,586 INFO [train2.py:809] (2/4) Epoch 23, batch 1450, loss[ctc_loss=0.05063, att_loss=0.2133, loss=0.1807, over 16268.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.00795, over 43.00 utterances.], tot_loss[ctc_loss=0.07143, att_loss=0.2343, loss=0.2017, over 3274817.42 frames. 
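The zipformer.py:1447 dumps print attn_weights_entropy tensors, one value per attention head, as a periodic diagnostic of how peaked the attention distributions are (lower entropy = more concentrated on a few keys). A generic way to compute such a per-head entropy is sketched below; the (heads, queries, keys) layout and the averaging over queries are assumptions, not the Zipformer code:

# Editor's sketch of a per-head attention-entropy diagnostic, in the spirit of
# the attn_weights_entropy dumps above.
import torch

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """attn_weights: (num_heads, num_queries, num_keys), each row summing to 1.
    Returns one entropy value (in nats) per head, averaged over queries."""
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return entropy.mean(dim=-1)

weights = torch.softmax(torch.randn(8, 50, 50), dim=-1)   # toy attention maps
print(attention_entropy(weights))                          # 8 values, one per head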
utt_duration=1258 frames, utt_pad_proportion=0.05252, over 10426.11 utterances.], batch size: 43, lr: 4.69e-03, grad_scale: 8.0 2023-03-09 02:07:08,176 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3745, 2.4909, 4.9120, 3.8655, 3.0442, 4.2260, 4.6196, 4.4554], device='cuda:2'), covar=tensor([0.0273, 0.1511, 0.0170, 0.0874, 0.1604, 0.0248, 0.0162, 0.0300], device='cuda:2'), in_proj_covar=tensor([0.0204, 0.0243, 0.0194, 0.0317, 0.0264, 0.0217, 0.0184, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 02:08:26,667 INFO [train2.py:809] (2/4) Epoch 23, batch 1500, loss[ctc_loss=0.07864, att_loss=0.2396, loss=0.2074, over 16120.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006649, over 42.00 utterances.], tot_loss[ctc_loss=0.07154, att_loss=0.2344, loss=0.2019, over 3272458.44 frames. utt_duration=1241 frames, utt_pad_proportion=0.05711, over 10557.81 utterances.], batch size: 42, lr: 4.69e-03, grad_scale: 8.0 2023-03-09 02:08:28,106 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.367e+02 1.977e+02 2.384e+02 2.734e+02 5.349e+02, threshold=4.768e+02, percent-clipped=3.0 2023-03-09 02:08:47,296 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8558, 5.1969, 5.0688, 5.1933, 5.2890, 4.8970, 3.6602, 5.1992], device='cuda:2'), covar=tensor([0.0098, 0.0092, 0.0109, 0.0065, 0.0081, 0.0103, 0.0645, 0.0154], device='cuda:2'), in_proj_covar=tensor([0.0092, 0.0088, 0.0112, 0.0070, 0.0076, 0.0087, 0.0103, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 02:09:08,694 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89169.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:09:39,206 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9181, 5.0123, 4.5111, 2.7429, 4.7919, 4.7280, 3.9356, 2.0841], device='cuda:2'), covar=tensor([0.0217, 0.0127, 0.0447, 0.1522, 0.0142, 0.0221, 0.0585, 0.2595], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0102, 0.0105, 0.0111, 0.0085, 0.0113, 0.0100, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 02:09:48,304 INFO [train2.py:809] (2/4) Epoch 23, batch 1550, loss[ctc_loss=0.06643, att_loss=0.22, loss=0.1893, over 15998.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007931, over 40.00 utterances.], tot_loss[ctc_loss=0.07131, att_loss=0.234, loss=0.2014, over 3272907.03 frames. 
utt_duration=1247 frames, utt_pad_proportion=0.05455, over 10511.84 utterances.], batch size: 40, lr: 4.69e-03, grad_scale: 8.0 2023-03-09 02:10:22,663 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5400, 4.9682, 5.1149, 4.9329, 5.1000, 5.5335, 4.9949, 5.6015], device='cuda:2'), covar=tensor([0.0820, 0.0759, 0.0866, 0.1407, 0.1714, 0.0860, 0.0995, 0.0663], device='cuda:2'), in_proj_covar=tensor([0.0881, 0.0517, 0.0616, 0.0671, 0.0881, 0.0638, 0.0495, 0.0615], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 02:10:49,435 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89230.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:10:59,321 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4829, 5.7858, 5.2173, 5.5509, 5.4162, 4.9911, 5.1663, 5.0086], device='cuda:2'), covar=tensor([0.1298, 0.0894, 0.0994, 0.0814, 0.0960, 0.1529, 0.2258, 0.2083], device='cuda:2'), in_proj_covar=tensor([0.0527, 0.0612, 0.0467, 0.0461, 0.0436, 0.0468, 0.0615, 0.0531], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 02:11:01,098 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4826, 4.7724, 4.6329, 4.7712, 4.8482, 4.5366, 3.4627, 4.7977], device='cuda:2'), covar=tensor([0.0109, 0.0112, 0.0143, 0.0081, 0.0096, 0.0121, 0.0670, 0.0168], device='cuda:2'), in_proj_covar=tensor([0.0092, 0.0088, 0.0112, 0.0070, 0.0076, 0.0087, 0.0103, 0.0110], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 02:11:10,190 INFO [train2.py:809] (2/4) Epoch 23, batch 1600, loss[ctc_loss=0.05673, att_loss=0.1996, loss=0.171, over 15640.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.009115, over 37.00 utterances.], tot_loss[ctc_loss=0.07206, att_loss=0.2349, loss=0.2023, over 3280631.99 frames. utt_duration=1259 frames, utt_pad_proportion=0.0489, over 10438.10 utterances.], batch size: 37, lr: 4.69e-03, grad_scale: 8.0 2023-03-09 02:11:11,786 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.895e+02 2.282e+02 2.881e+02 5.348e+02, threshold=4.565e+02, percent-clipped=1.0 2023-03-09 02:12:31,874 INFO [train2.py:809] (2/4) Epoch 23, batch 1650, loss[ctc_loss=0.09638, att_loss=0.2618, loss=0.2287, over 13807.00 frames. utt_duration=379.6 frames, utt_pad_proportion=0.3375, over 146.00 utterances.], tot_loss[ctc_loss=0.07189, att_loss=0.2349, loss=0.2023, over 3278286.19 frames. utt_duration=1255 frames, utt_pad_proportion=0.051, over 10460.34 utterances.], batch size: 146, lr: 4.69e-03, grad_scale: 8.0 2023-03-09 02:12:42,428 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. 
limit=2.0 2023-03-09 02:12:59,832 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2882, 5.5099, 5.4637, 5.4126, 5.5642, 5.4811, 5.1932, 4.9922], device='cuda:2'), covar=tensor([0.0898, 0.0504, 0.0290, 0.0420, 0.0251, 0.0270, 0.0353, 0.0286], device='cuda:2'), in_proj_covar=tensor([0.0521, 0.0367, 0.0350, 0.0361, 0.0426, 0.0435, 0.0360, 0.0398], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-09 02:13:33,226 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5727, 4.6299, 4.1502, 2.4997, 4.4521, 4.4952, 3.7959, 2.1855], device='cuda:2'), covar=tensor([0.0160, 0.0151, 0.0403, 0.1508, 0.0155, 0.0261, 0.0520, 0.2314], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0102, 0.0105, 0.0111, 0.0086, 0.0114, 0.0100, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 02:13:54,205 INFO [train2.py:809] (2/4) Epoch 23, batch 1700, loss[ctc_loss=0.1285, att_loss=0.2706, loss=0.2422, over 14337.00 frames. utt_duration=394.2 frames, utt_pad_proportion=0.3144, over 146.00 utterances.], tot_loss[ctc_loss=0.07189, att_loss=0.2346, loss=0.2021, over 3275836.54 frames. utt_duration=1244 frames, utt_pad_proportion=0.05514, over 10542.80 utterances.], batch size: 146, lr: 4.68e-03, grad_scale: 8.0 2023-03-09 02:13:55,725 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.335e+02 1.978e+02 2.274e+02 2.807e+02 5.898e+02, threshold=4.548e+02, percent-clipped=3.0 2023-03-09 02:14:45,032 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89374.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:15:00,028 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89383.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:15:15,727 INFO [train2.py:809] (2/4) Epoch 23, batch 1750, loss[ctc_loss=0.06169, att_loss=0.2154, loss=0.1846, over 16172.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.007435, over 41.00 utterances.], tot_loss[ctc_loss=0.07203, att_loss=0.235, loss=0.2024, over 3274066.35 frames. utt_duration=1235 frames, utt_pad_proportion=0.05689, over 10615.06 utterances.], batch size: 41, lr: 4.68e-03, grad_scale: 8.0 2023-03-09 02:16:09,016 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89425.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:16:24,766 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-09 02:16:25,509 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89435.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:16:31,821 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89439.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 02:16:37,758 INFO [train2.py:809] (2/4) Epoch 23, batch 1800, loss[ctc_loss=0.07931, att_loss=0.2553, loss=0.2201, over 16633.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.004856, over 47.00 utterances.], tot_loss[ctc_loss=0.07144, att_loss=0.2349, loss=0.2022, over 3278101.08 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.05001, over 10423.31 utterances.], batch size: 47, lr: 4.68e-03, grad_scale: 8.0 2023-03-09 02:16:39,275 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.252e+02 1.934e+02 2.267e+02 2.788e+02 6.296e+02, threshold=4.533e+02, percent-clipped=2.0 2023-03-09 02:17:49,828 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89486.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:18:00,958 INFO [train2.py:809] (2/4) Epoch 23, batch 1850, loss[ctc_loss=0.07123, att_loss=0.2401, loss=0.2064, over 16533.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006707, over 45.00 utterances.], tot_loss[ctc_loss=0.07041, att_loss=0.2344, loss=0.2016, over 3281244.16 frames. utt_duration=1253 frames, utt_pad_proportion=0.05115, over 10483.43 utterances.], batch size: 45, lr: 4.68e-03, grad_scale: 8.0 2023-03-09 02:18:12,504 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89500.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 02:18:27,092 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89509.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:18:53,263 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89525.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:19:23,209 INFO [train2.py:809] (2/4) Epoch 23, batch 1900, loss[ctc_loss=0.06336, att_loss=0.2154, loss=0.185, over 15522.00 frames. utt_duration=1726 frames, utt_pad_proportion=0.00738, over 36.00 utterances.], tot_loss[ctc_loss=0.07025, att_loss=0.2342, loss=0.2014, over 3277488.56 frames. utt_duration=1270 frames, utt_pad_proportion=0.04839, over 10336.66 utterances.], batch size: 36, lr: 4.68e-03, grad_scale: 8.0 2023-03-09 02:19:24,723 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.258e+02 1.857e+02 2.214e+02 2.613e+02 6.463e+02, threshold=4.427e+02, percent-clipped=1.0 2023-03-09 02:19:42,356 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.78 vs. limit=5.0 2023-03-09 02:20:07,570 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89570.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:20:44,881 INFO [train2.py:809] (2/4) Epoch 23, batch 1950, loss[ctc_loss=0.06054, att_loss=0.235, loss=0.2001, over 17376.00 frames. utt_duration=881.1 frames, utt_pad_proportion=0.07839, over 79.00 utterances.], tot_loss[ctc_loss=0.07169, att_loss=0.2356, loss=0.2028, over 3269205.88 frames. utt_duration=1220 frames, utt_pad_proportion=0.06145, over 10735.76 utterances.], batch size: 79, lr: 4.68e-03, grad_scale: 8.0 2023-03-09 02:20:45,846 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2023-03-09 02:21:38,585 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4306, 2.9686, 3.6365, 2.9332, 3.5938, 4.6214, 4.4325, 3.3085], device='cuda:2'), covar=tensor([0.0477, 0.1845, 0.1267, 0.1581, 0.1037, 0.0815, 0.0647, 0.1258], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0248, 0.0286, 0.0225, 0.0270, 0.0379, 0.0269, 0.0237], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 02:22:06,988 INFO [train2.py:809] (2/4) Epoch 23, batch 2000, loss[ctc_loss=0.08346, att_loss=0.241, loss=0.2095, over 16527.00 frames. 
utt_duration=1471 frames, utt_pad_proportion=0.006962, over 45.00 utterances.], tot_loss[ctc_loss=0.07241, att_loss=0.2355, loss=0.2029, over 3266830.65 frames. utt_duration=1217 frames, utt_pad_proportion=0.06205, over 10749.58 utterances.], batch size: 45, lr: 4.68e-03, grad_scale: 8.0 2023-03-09 02:22:08,496 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.359e+02 1.989e+02 2.307e+02 2.806e+02 7.990e+02, threshold=4.613e+02, percent-clipped=4.0 2023-03-09 02:22:57,830 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6779, 5.1491, 4.9127, 4.9988, 5.1615, 4.8006, 3.5166, 5.1594], device='cuda:2'), covar=tensor([0.0120, 0.0095, 0.0140, 0.0096, 0.0095, 0.0112, 0.0653, 0.0168], device='cuda:2'), in_proj_covar=tensor([0.0091, 0.0087, 0.0111, 0.0069, 0.0075, 0.0086, 0.0102, 0.0108], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 02:23:07,549 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89680.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:23:12,480 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=89683.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:23:28,546 INFO [train2.py:809] (2/4) Epoch 23, batch 2050, loss[ctc_loss=0.06131, att_loss=0.2303, loss=0.1965, over 15958.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006793, over 41.00 utterances.], tot_loss[ctc_loss=0.07158, att_loss=0.2347, loss=0.2021, over 3264276.97 frames. utt_duration=1218 frames, utt_pad_proportion=0.06245, over 10734.06 utterances.], batch size: 41, lr: 4.68e-03, grad_scale: 8.0 2023-03-09 02:24:28,912 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89730.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:24:30,398 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=89731.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:24:42,935 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.78 vs. limit=2.0 2023-03-09 02:24:47,318 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89741.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:24:50,039 INFO [train2.py:809] (2/4) Epoch 23, batch 2100, loss[ctc_loss=0.07158, att_loss=0.2151, loss=0.1864, over 15478.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.009999, over 36.00 utterances.], tot_loss[ctc_loss=0.07163, att_loss=0.2344, loss=0.2019, over 3265454.64 frames. 
utt_duration=1219 frames, utt_pad_proportion=0.06273, over 10729.05 utterances.], batch size: 36, lr: 4.67e-03, grad_scale: 16.0 2023-03-09 02:24:51,496 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.393e+02 1.974e+02 2.233e+02 2.769e+02 5.380e+02, threshold=4.466e+02, percent-clipped=5.0 2023-03-09 02:25:36,369 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5336, 4.9298, 5.1374, 4.9787, 5.1009, 5.5199, 5.0104, 5.5641], device='cuda:2'), covar=tensor([0.0762, 0.0740, 0.0775, 0.1315, 0.1759, 0.0818, 0.0958, 0.0695], device='cuda:2'), in_proj_covar=tensor([0.0876, 0.0513, 0.0614, 0.0665, 0.0880, 0.0635, 0.0492, 0.0613], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 02:25:52,096 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89781.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:26:11,804 INFO [train2.py:809] (2/4) Epoch 23, batch 2150, loss[ctc_loss=0.08598, att_loss=0.2514, loss=0.2183, over 16618.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005633, over 47.00 utterances.], tot_loss[ctc_loss=0.07165, att_loss=0.2346, loss=0.202, over 3267836.53 frames. utt_duration=1230 frames, utt_pad_proportion=0.05823, over 10641.68 utterances.], batch size: 47, lr: 4.67e-03, grad_scale: 16.0 2023-03-09 02:26:15,134 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89795.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 02:26:17,457 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.57 vs. limit=2.0 2023-03-09 02:27:04,691 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=89825.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:27:33,613 INFO [train2.py:809] (2/4) Epoch 23, batch 2200, loss[ctc_loss=0.07162, att_loss=0.2438, loss=0.2094, over 16483.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.005634, over 46.00 utterances.], tot_loss[ctc_loss=0.0718, att_loss=0.2348, loss=0.2022, over 3272303.94 frames. utt_duration=1252 frames, utt_pad_proportion=0.05296, over 10464.92 utterances.], batch size: 46, lr: 4.67e-03, grad_scale: 16.0 2023-03-09 02:27:34,943 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.973e+02 2.386e+02 3.013e+02 6.689e+02, threshold=4.771e+02, percent-clipped=5.0 2023-03-09 02:27:50,538 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8357, 4.6768, 4.6239, 2.3155, 1.9948, 2.8614, 2.2889, 3.7704], device='cuda:2'), covar=tensor([0.0793, 0.0237, 0.0226, 0.4383, 0.5318, 0.2401, 0.3438, 0.1406], device='cuda:2'), in_proj_covar=tensor([0.0357, 0.0280, 0.0268, 0.0243, 0.0339, 0.0331, 0.0255, 0.0365], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 02:28:09,093 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89865.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:28:09,259 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89865.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:28:23,122 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=89873.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:28:54,729 INFO [train2.py:809] (2/4) Epoch 23, batch 2250, loss[ctc_loss=0.05755, att_loss=0.2481, loss=0.21, over 16619.00 frames. 
utt_duration=1416 frames, utt_pad_proportion=0.005678, over 47.00 utterances.], tot_loss[ctc_loss=0.07199, att_loss=0.235, loss=0.2024, over 3273990.56 frames. utt_duration=1252 frames, utt_pad_proportion=0.05312, over 10471.16 utterances.], batch size: 47, lr: 4.67e-03, grad_scale: 16.0 2023-03-09 02:29:41,015 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0985, 5.1316, 4.9972, 2.2369, 1.9590, 3.0426, 2.5621, 3.8466], device='cuda:2'), covar=tensor([0.0742, 0.0338, 0.0217, 0.5466, 0.5812, 0.2404, 0.3380, 0.1713], device='cuda:2'), in_proj_covar=tensor([0.0357, 0.0280, 0.0268, 0.0243, 0.0339, 0.0331, 0.0255, 0.0367], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 02:29:48,472 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89926.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 02:30:14,730 INFO [train2.py:809] (2/4) Epoch 23, batch 2300, loss[ctc_loss=0.06656, att_loss=0.2266, loss=0.1946, over 15957.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.006051, over 41.00 utterances.], tot_loss[ctc_loss=0.07131, att_loss=0.2342, loss=0.2016, over 3275605.40 frames. utt_duration=1284 frames, utt_pad_proportion=0.0454, over 10215.77 utterances.], batch size: 41, lr: 4.67e-03, grad_scale: 16.0 2023-03-09 02:30:16,395 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 1.850e+02 2.227e+02 2.792e+02 5.012e+02, threshold=4.455e+02, percent-clipped=2.0 2023-03-09 02:31:36,741 INFO [train2.py:809] (2/4) Epoch 23, batch 2350, loss[ctc_loss=0.07826, att_loss=0.2438, loss=0.2107, over 17316.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02278, over 59.00 utterances.], tot_loss[ctc_loss=0.07166, att_loss=0.2353, loss=0.2025, over 3285257.91 frames. utt_duration=1264 frames, utt_pad_proportion=0.04794, over 10410.67 utterances.], batch size: 59, lr: 4.67e-03, grad_scale: 16.0 2023-03-09 02:32:42,539 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90030.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:32:52,442 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90036.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:33:00,232 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9991, 5.3685, 5.6126, 5.3494, 5.5082, 5.9853, 5.2673, 6.0384], device='cuda:2'), covar=tensor([0.0714, 0.0671, 0.0764, 0.1149, 0.1586, 0.0801, 0.0640, 0.0742], device='cuda:2'), in_proj_covar=tensor([0.0883, 0.0517, 0.0617, 0.0669, 0.0886, 0.0634, 0.0498, 0.0619], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 02:33:03,245 INFO [train2.py:809] (2/4) Epoch 23, batch 2400, loss[ctc_loss=0.04876, att_loss=0.2247, loss=0.1895, over 16625.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005379, over 47.00 utterances.], tot_loss[ctc_loss=0.07121, att_loss=0.2351, loss=0.2023, over 3290898.10 frames. 
utt_duration=1263 frames, utt_pad_proportion=0.04646, over 10433.98 utterances.], batch size: 47, lr: 4.67e-03, grad_scale: 16.0 2023-03-09 02:33:04,733 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.274e+02 1.874e+02 2.405e+02 2.931e+02 6.452e+02, threshold=4.810e+02, percent-clipped=5.0 2023-03-09 02:33:23,323 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6584, 3.3920, 2.8188, 3.0018, 3.5047, 3.2829, 2.5973, 3.6125], device='cuda:2'), covar=tensor([0.0998, 0.0496, 0.1038, 0.0773, 0.0756, 0.0724, 0.0931, 0.0462], device='cuda:2'), in_proj_covar=tensor([0.0205, 0.0220, 0.0229, 0.0205, 0.0282, 0.0245, 0.0201, 0.0292], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 02:34:00,241 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90078.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:34:05,177 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90081.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:34:24,273 INFO [train2.py:809] (2/4) Epoch 23, batch 2450, loss[ctc_loss=0.05547, att_loss=0.2128, loss=0.1814, over 14979.00 frames. utt_duration=1817 frames, utt_pad_proportion=0.02621, over 33.00 utterances.], tot_loss[ctc_loss=0.07098, att_loss=0.2341, loss=0.2015, over 3276359.39 frames. utt_duration=1257 frames, utt_pad_proportion=0.04947, over 10436.72 utterances.], batch size: 33, lr: 4.67e-03, grad_scale: 16.0 2023-03-09 02:34:27,943 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90095.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 02:34:38,673 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-03-09 02:35:23,984 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90129.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:35:25,870 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90130.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:35:47,560 INFO [train2.py:809] (2/4) Epoch 23, batch 2500, loss[ctc_loss=0.06581, att_loss=0.2349, loss=0.2011, over 16688.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005018, over 46.00 utterances.], tot_loss[ctc_loss=0.071, att_loss=0.234, loss=0.2014, over 3270696.34 frames. utt_duration=1253 frames, utt_pad_proportion=0.05157, over 10449.94 utterances.], batch size: 46, lr: 4.66e-03, grad_scale: 16.0 2023-03-09 02:35:47,665 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90143.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 02:35:48,949 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 2.045e+02 2.395e+02 2.937e+02 6.523e+02, threshold=4.790e+02, percent-clipped=3.0 2023-03-09 02:36:23,545 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90165.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:36:44,026 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2023-03-09 02:37:05,984 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90191.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:37:09,384 INFO [train2.py:809] (2/4) Epoch 23, batch 2550, loss[ctc_loss=0.06119, att_loss=0.2054, loss=0.1766, over 14495.00 frames. 
utt_duration=1813 frames, utt_pad_proportion=0.0427, over 32.00 utterances.], tot_loss[ctc_loss=0.07115, att_loss=0.2337, loss=0.2012, over 3262488.01 frames. utt_duration=1262 frames, utt_pad_proportion=0.05232, over 10352.38 utterances.], batch size: 32, lr: 4.66e-03, grad_scale: 16.0 2023-03-09 02:37:33,590 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3761, 4.3726, 4.4736, 4.5168, 5.0087, 4.5411, 4.5041, 2.6272], device='cuda:2'), covar=tensor([0.0246, 0.0361, 0.0360, 0.0321, 0.0879, 0.0220, 0.0316, 0.1685], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0198, 0.0195, 0.0212, 0.0372, 0.0166, 0.0186, 0.0218], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 02:37:41,489 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90213.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:37:54,852 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90221.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 02:38:12,238 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90232.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:38:13,829 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7862, 2.4615, 2.6318, 2.6579, 2.7531, 2.6423, 2.5989, 3.0493], device='cuda:2'), covar=tensor([0.1340, 0.2368, 0.1759, 0.1256, 0.1268, 0.1161, 0.2055, 0.1020], device='cuda:2'), in_proj_covar=tensor([0.0124, 0.0126, 0.0123, 0.0114, 0.0128, 0.0111, 0.0133, 0.0105], device='cuda:2'), out_proj_covar=tensor([9.3793e-05, 9.8221e-05, 9.8480e-05, 8.9191e-05, 9.5834e-05, 8.9396e-05, 1.0093e-04, 8.4024e-05], device='cuda:2') 2023-03-09 02:38:30,844 INFO [train2.py:809] (2/4) Epoch 23, batch 2600, loss[ctc_loss=0.05368, att_loss=0.2323, loss=0.1966, over 16766.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006496, over 48.00 utterances.], tot_loss[ctc_loss=0.07071, att_loss=0.2328, loss=0.2004, over 3256466.21 frames. utt_duration=1275 frames, utt_pad_proportion=0.05213, over 10231.89 utterances.], batch size: 48, lr: 4.66e-03, grad_scale: 8.0 2023-03-09 02:38:34,181 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 2.020e+02 2.371e+02 2.963e+02 6.277e+02, threshold=4.742e+02, percent-clipped=3.0 2023-03-09 02:38:41,537 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.80 vs. limit=2.0 2023-03-09 02:39:12,234 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1599, 5.0116, 4.9170, 2.9295, 4.8500, 4.6534, 4.3612, 2.7037], device='cuda:2'), covar=tensor([0.0116, 0.0125, 0.0253, 0.1081, 0.0106, 0.0223, 0.0327, 0.1399], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0102, 0.0104, 0.0111, 0.0085, 0.0114, 0.0099, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 02:39:52,428 INFO [train2.py:809] (2/4) Epoch 23, batch 2650, loss[ctc_loss=0.06634, att_loss=0.246, loss=0.21, over 17262.00 frames. utt_duration=875.7 frames, utt_pad_proportion=0.08019, over 79.00 utterances.], tot_loss[ctc_loss=0.07099, att_loss=0.2339, loss=0.2013, over 3262797.35 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.05295, over 10364.33 utterances.], batch size: 79, lr: 4.66e-03, grad_scale: 8.0 2023-03-09 02:39:52,883 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90293.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:41:03,321 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90336.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:41:14,428 INFO [train2.py:809] (2/4) Epoch 23, batch 2700, loss[ctc_loss=0.05383, att_loss=0.2301, loss=0.1948, over 16329.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006369, over 45.00 utterances.], tot_loss[ctc_loss=0.07118, att_loss=0.2341, loss=0.2015, over 3255194.17 frames. utt_duration=1236 frames, utt_pad_proportion=0.06041, over 10543.59 utterances.], batch size: 45, lr: 4.66e-03, grad_scale: 8.0 2023-03-09 02:41:17,512 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.381e+02 1.963e+02 2.237e+02 2.916e+02 6.361e+02, threshold=4.475e+02, percent-clipped=3.0 2023-03-09 02:41:27,035 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90351.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:42:21,367 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90384.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:42:35,956 INFO [train2.py:809] (2/4) Epoch 23, batch 2750, loss[ctc_loss=0.08494, att_loss=0.2487, loss=0.2159, over 17354.00 frames. utt_duration=1008 frames, utt_pad_proportion=0.04928, over 69.00 utterances.], tot_loss[ctc_loss=0.07106, att_loss=0.2342, loss=0.2016, over 3260300.54 frames. utt_duration=1245 frames, utt_pad_proportion=0.0569, over 10485.31 utterances.], batch size: 69, lr: 4.66e-03, grad_scale: 8.0 2023-03-09 02:43:06,575 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90412.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:43:31,589 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5996, 3.1612, 3.4839, 4.7364, 4.2181, 4.1403, 3.1149, 2.5887], device='cuda:2'), covar=tensor([0.0641, 0.1777, 0.0864, 0.0394, 0.0646, 0.0400, 0.1312, 0.1980], device='cuda:2'), in_proj_covar=tensor([0.0184, 0.0219, 0.0190, 0.0222, 0.0227, 0.0182, 0.0205, 0.0188], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 02:43:58,168 INFO [train2.py:809] (2/4) Epoch 23, batch 2800, loss[ctc_loss=0.06596, att_loss=0.2436, loss=0.2081, over 17329.00 frames. utt_duration=1262 frames, utt_pad_proportion=0.008941, over 55.00 utterances.], tot_loss[ctc_loss=0.07167, att_loss=0.2348, loss=0.2022, over 3257422.57 frames. 
utt_duration=1223 frames, utt_pad_proportion=0.06187, over 10668.12 utterances.], batch size: 55, lr: 4.66e-03, grad_scale: 8.0 2023-03-09 02:44:01,315 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.235e+02 1.964e+02 2.366e+02 2.896e+02 7.050e+02, threshold=4.731e+02, percent-clipped=3.0 2023-03-09 02:44:23,938 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90459.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:44:48,735 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7904, 2.2122, 2.5823, 2.5382, 2.7020, 2.6520, 2.4521, 2.9947], device='cuda:2'), covar=tensor([0.1168, 0.2830, 0.1692, 0.1395, 0.1490, 0.1131, 0.2342, 0.1025], device='cuda:2'), in_proj_covar=tensor([0.0126, 0.0129, 0.0125, 0.0117, 0.0131, 0.0113, 0.0137, 0.0108], device='cuda:2'), out_proj_covar=tensor([9.5510e-05, 1.0038e-04, 1.0047e-04, 9.1389e-05, 9.8055e-05, 9.0868e-05, 1.0348e-04, 8.5954e-05], device='cuda:2') 2023-03-09 02:45:07,715 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90486.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:45:19,165 INFO [train2.py:809] (2/4) Epoch 23, batch 2850, loss[ctc_loss=0.07827, att_loss=0.2498, loss=0.2155, over 17146.00 frames. utt_duration=1226 frames, utt_pad_proportion=0.01274, over 56.00 utterances.], tot_loss[ctc_loss=0.07151, att_loss=0.2343, loss=0.2018, over 3256393.19 frames. utt_duration=1234 frames, utt_pad_proportion=0.05979, over 10572.30 utterances.], batch size: 56, lr: 4.66e-03, grad_scale: 8.0 2023-03-09 02:45:24,413 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90496.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:46:03,457 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90520.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:46:04,892 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90521.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 02:46:40,204 INFO [train2.py:809] (2/4) Epoch 23, batch 2900, loss[ctc_loss=0.06253, att_loss=0.2271, loss=0.1942, over 16127.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005667, over 42.00 utterances.], tot_loss[ctc_loss=0.07128, att_loss=0.2344, loss=0.2018, over 3259590.92 frames. 
utt_duration=1213 frames, utt_pad_proportion=0.06548, over 10763.31 utterances.], batch size: 42, lr: 4.65e-03, grad_scale: 8.0 2023-03-09 02:46:42,143 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1432, 5.0853, 4.8183, 3.2096, 4.9538, 4.7487, 4.4482, 3.0265], device='cuda:2'), covar=tensor([0.0135, 0.0092, 0.0310, 0.0885, 0.0092, 0.0183, 0.0271, 0.1157], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0103, 0.0105, 0.0111, 0.0086, 0.0115, 0.0099, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 02:46:43,382 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 2.000e+02 2.282e+02 2.820e+02 8.053e+02, threshold=4.565e+02, percent-clipped=3.0 2023-03-09 02:47:02,775 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90557.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:47:22,352 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90569.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:47:53,002 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90588.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:48:01,367 INFO [train2.py:809] (2/4) Epoch 23, batch 2950, loss[ctc_loss=0.0817, att_loss=0.2535, loss=0.2191, over 17023.00 frames. utt_duration=1286 frames, utt_pad_proportion=0.01073, over 53.00 utterances.], tot_loss[ctc_loss=0.07079, att_loss=0.2337, loss=0.2011, over 3254943.80 frames. utt_duration=1235 frames, utt_pad_proportion=0.06157, over 10556.14 utterances.], batch size: 53, lr: 4.65e-03, grad_scale: 8.0 2023-03-09 02:49:23,757 INFO [train2.py:809] (2/4) Epoch 23, batch 3000, loss[ctc_loss=0.07909, att_loss=0.2495, loss=0.2154, over 17281.00 frames. utt_duration=1258 frames, utt_pad_proportion=0.01309, over 55.00 utterances.], tot_loss[ctc_loss=0.07037, att_loss=0.2341, loss=0.2013, over 3264120.38 frames. utt_duration=1234 frames, utt_pad_proportion=0.05875, over 10595.11 utterances.], batch size: 55, lr: 4.65e-03, grad_scale: 8.0 2023-03-09 02:49:23,757 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 02:49:38,035 INFO [train2.py:843] (2/4) Epoch 23, validation: ctc_loss=0.03973, att_loss=0.234, loss=0.1952, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 02:49:38,035 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 02:49:41,304 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.311e+02 1.804e+02 2.183e+02 2.911e+02 8.359e+02, threshold=4.365e+02, percent-clipped=3.0 2023-03-09 02:50:13,749 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-03-09 02:51:00,735 INFO [train2.py:809] (2/4) Epoch 23, batch 3050, loss[ctc_loss=0.05868, att_loss=0.2421, loss=0.2055, over 17029.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007249, over 51.00 utterances.], tot_loss[ctc_loss=0.07049, att_loss=0.2344, loss=0.2016, over 3267521.79 frames. 
utt_duration=1233 frames, utt_pad_proportion=0.05829, over 10610.50 utterances.], batch size: 51, lr: 4.65e-03, grad_scale: 8.0 2023-03-09 02:51:24,178 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90707.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:51:56,051 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1631, 5.2054, 4.8743, 2.3246, 2.0835, 3.3627, 2.2866, 3.9033], device='cuda:2'), covar=tensor([0.0714, 0.0262, 0.0279, 0.5250, 0.5691, 0.2032, 0.4129, 0.1674], device='cuda:2'), in_proj_covar=tensor([0.0360, 0.0283, 0.0272, 0.0246, 0.0343, 0.0336, 0.0259, 0.0371], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 02:52:23,333 INFO [train2.py:809] (2/4) Epoch 23, batch 3100, loss[ctc_loss=0.05965, att_loss=0.2391, loss=0.2032, over 16999.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.009127, over 51.00 utterances.], tot_loss[ctc_loss=0.07021, att_loss=0.234, loss=0.2012, over 3275019.33 frames. utt_duration=1247 frames, utt_pad_proportion=0.05388, over 10519.61 utterances.], batch size: 51, lr: 4.65e-03, grad_scale: 4.0 2023-03-09 02:52:28,637 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.220e+02 1.776e+02 2.095e+02 2.519e+02 6.521e+02, threshold=4.191e+02, percent-clipped=1.0 2023-03-09 02:53:20,425 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2023-03-09 02:53:34,432 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90786.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:53:45,911 INFO [train2.py:809] (2/4) Epoch 23, batch 3150, loss[ctc_loss=0.07004, att_loss=0.2408, loss=0.2067, over 17444.00 frames. utt_duration=1013 frames, utt_pad_proportion=0.0446, over 69.00 utterances.], tot_loss[ctc_loss=0.06979, att_loss=0.233, loss=0.2003, over 3261282.24 frames. utt_duration=1244 frames, utt_pad_proportion=0.05681, over 10499.10 utterances.], batch size: 69, lr: 4.65e-03, grad_scale: 4.0 2023-03-09 02:54:21,920 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90815.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:54:52,454 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90834.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:55:04,743 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0571, 4.4403, 4.3962, 4.6482, 2.9017, 4.6159, 2.8637, 1.8617], device='cuda:2'), covar=tensor([0.0462, 0.0270, 0.0685, 0.0213, 0.1597, 0.0194, 0.1429, 0.1769], device='cuda:2'), in_proj_covar=tensor([0.0201, 0.0172, 0.0264, 0.0164, 0.0224, 0.0155, 0.0233, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 02:55:08,166 INFO [train2.py:809] (2/4) Epoch 23, batch 3200, loss[ctc_loss=0.053, att_loss=0.2105, loss=0.179, over 16014.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.006862, over 40.00 utterances.], tot_loss[ctc_loss=0.06946, att_loss=0.2329, loss=0.2002, over 3270314.21 frames. 
utt_duration=1267 frames, utt_pad_proportion=0.04912, over 10337.74 utterances.], batch size: 40, lr: 4.65e-03, grad_scale: 8.0 2023-03-09 02:55:12,916 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.356e+02 1.815e+02 2.300e+02 2.792e+02 4.062e+02, threshold=4.600e+02, percent-clipped=0.0 2023-03-09 02:55:18,219 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90849.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:55:22,819 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90852.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:56:21,733 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90888.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:56:29,988 INFO [train2.py:809] (2/4) Epoch 23, batch 3250, loss[ctc_loss=0.06347, att_loss=0.2173, loss=0.1866, over 14569.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.03357, over 32.00 utterances.], tot_loss[ctc_loss=0.07019, att_loss=0.2336, loss=0.2009, over 3273548.21 frames. utt_duration=1267 frames, utt_pad_proportion=0.04788, over 10350.86 utterances.], batch size: 32, lr: 4.64e-03, grad_scale: 8.0 2023-03-09 02:56:57,374 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90910.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:57:02,862 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4794, 2.6360, 4.9762, 3.8579, 3.1089, 4.2460, 4.7254, 4.6009], device='cuda:2'), covar=tensor([0.0273, 0.1509, 0.0159, 0.0869, 0.1596, 0.0233, 0.0182, 0.0254], device='cuda:2'), in_proj_covar=tensor([0.0206, 0.0243, 0.0197, 0.0319, 0.0265, 0.0221, 0.0187, 0.0216], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 02:57:39,806 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90936.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 02:57:51,515 INFO [train2.py:809] (2/4) Epoch 23, batch 3300, loss[ctc_loss=0.1116, att_loss=0.2558, loss=0.227, over 14089.00 frames. utt_duration=387.4 frames, utt_pad_proportion=0.3252, over 146.00 utterances.], tot_loss[ctc_loss=0.07023, att_loss=0.2335, loss=0.2009, over 3269080.70 frames. utt_duration=1258 frames, utt_pad_proportion=0.05244, over 10403.17 utterances.], batch size: 146, lr: 4.64e-03, grad_scale: 8.0 2023-03-09 02:57:56,188 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.358e+02 1.978e+02 2.311e+02 2.914e+02 8.897e+02, threshold=4.622e+02, percent-clipped=4.0 2023-03-09 02:59:13,555 INFO [train2.py:809] (2/4) Epoch 23, batch 3350, loss[ctc_loss=0.07168, att_loss=0.2479, loss=0.2126, over 16325.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006506, over 45.00 utterances.], tot_loss[ctc_loss=0.07002, att_loss=0.2336, loss=0.2009, over 3266295.23 frames. utt_duration=1251 frames, utt_pad_proportion=0.05441, over 10456.05 utterances.], batch size: 45, lr: 4.64e-03, grad_scale: 8.0 2023-03-09 02:59:37,324 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91007.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:00:35,855 INFO [train2.py:809] (2/4) Epoch 23, batch 3400, loss[ctc_loss=0.06291, att_loss=0.2309, loss=0.1973, over 16536.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006332, over 45.00 utterances.], tot_loss[ctc_loss=0.07011, att_loss=0.2339, loss=0.2012, over 3269955.41 frames. 
utt_duration=1240 frames, utt_pad_proportion=0.05587, over 10562.71 utterances.], batch size: 45, lr: 4.64e-03, grad_scale: 8.0 2023-03-09 03:00:40,401 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.261e+02 1.937e+02 2.293e+02 2.658e+02 6.821e+02, threshold=4.585e+02, percent-clipped=3.0 2023-03-09 03:00:55,208 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91055.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:01:57,223 INFO [train2.py:809] (2/4) Epoch 23, batch 3450, loss[ctc_loss=0.0812, att_loss=0.2296, loss=0.2, over 16540.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006257, over 45.00 utterances.], tot_loss[ctc_loss=0.0699, att_loss=0.2331, loss=0.2004, over 3261348.02 frames. utt_duration=1243 frames, utt_pad_proportion=0.05755, over 10509.90 utterances.], batch size: 45, lr: 4.64e-03, grad_scale: 8.0 2023-03-09 03:02:33,698 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91115.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:02:52,890 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5500, 4.5575, 4.6666, 4.6294, 5.2194, 4.7179, 4.5165, 2.5713], device='cuda:2'), covar=tensor([0.0252, 0.0354, 0.0291, 0.0342, 0.0783, 0.0205, 0.0382, 0.1703], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0199, 0.0197, 0.0213, 0.0373, 0.0167, 0.0187, 0.0216], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:03:06,295 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9107, 2.3868, 2.7000, 2.6926, 2.8820, 2.7368, 2.6068, 3.1642], device='cuda:2'), covar=tensor([0.1545, 0.2576, 0.1774, 0.1400, 0.1993, 0.2062, 0.2401, 0.1061], device='cuda:2'), in_proj_covar=tensor([0.0128, 0.0129, 0.0127, 0.0118, 0.0134, 0.0115, 0.0140, 0.0110], device='cuda:2'), out_proj_covar=tensor([9.7132e-05, 1.0127e-04, 1.0237e-04, 9.2448e-05, 1.0033e-04, 9.2306e-05, 1.0608e-04, 8.7434e-05], device='cuda:2') 2023-03-09 03:03:19,285 INFO [train2.py:809] (2/4) Epoch 23, batch 3500, loss[ctc_loss=0.07147, att_loss=0.2312, loss=0.1993, over 16105.00 frames. utt_duration=1535 frames, utt_pad_proportion=0.006977, over 42.00 utterances.], tot_loss[ctc_loss=0.07004, att_loss=0.2332, loss=0.2005, over 3260640.71 frames. utt_duration=1235 frames, utt_pad_proportion=0.0615, over 10573.41 utterances.], batch size: 42, lr: 4.64e-03, grad_scale: 8.0 2023-03-09 03:03:23,956 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.203e+02 1.873e+02 2.204e+02 2.861e+02 4.686e+02, threshold=4.409e+02, percent-clipped=2.0 2023-03-09 03:03:33,670 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91152.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:03:51,250 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91163.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:04:40,872 INFO [train2.py:809] (2/4) Epoch 23, batch 3550, loss[ctc_loss=0.06896, att_loss=0.2289, loss=0.1969, over 16324.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006566, over 45.00 utterances.], tot_loss[ctc_loss=0.07056, att_loss=0.2341, loss=0.2014, over 3270852.21 frames. 
utt_duration=1214 frames, utt_pad_proportion=0.06325, over 10792.40 utterances.], batch size: 45, lr: 4.64e-03, grad_scale: 8.0 2023-03-09 03:04:52,821 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91200.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:04:58,877 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2023-03-09 03:05:01,556 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=91205.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:05:38,259 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7234, 4.5952, 4.7466, 4.6336, 5.3147, 4.7553, 4.5833, 2.8338], device='cuda:2'), covar=tensor([0.0208, 0.0462, 0.0304, 0.0383, 0.0712, 0.0202, 0.0397, 0.1531], device='cuda:2'), in_proj_covar=tensor([0.0172, 0.0199, 0.0197, 0.0213, 0.0372, 0.0166, 0.0187, 0.0215], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:06:04,020 INFO [train2.py:809] (2/4) Epoch 23, batch 3600, loss[ctc_loss=0.07594, att_loss=0.2201, loss=0.1913, over 16009.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.006463, over 40.00 utterances.], tot_loss[ctc_loss=0.06981, att_loss=0.2331, loss=0.2004, over 3264413.21 frames. utt_duration=1242 frames, utt_pad_proportion=0.05687, over 10524.91 utterances.], batch size: 40, lr: 4.64e-03, grad_scale: 8.0 2023-03-09 03:06:08,716 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.503e+02 1.921e+02 2.289e+02 2.747e+02 1.018e+03, threshold=4.578e+02, percent-clipped=6.0 2023-03-09 03:07:14,030 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91285.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:07:22,013 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8943, 5.0907, 5.1482, 5.1039, 5.1688, 5.1361, 4.7975, 4.6003], device='cuda:2'), covar=tensor([0.1009, 0.0587, 0.0286, 0.0446, 0.0282, 0.0330, 0.0436, 0.0357], device='cuda:2'), in_proj_covar=tensor([0.0525, 0.0366, 0.0354, 0.0361, 0.0429, 0.0436, 0.0363, 0.0401], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-09 03:07:26,452 INFO [train2.py:809] (2/4) Epoch 23, batch 3650, loss[ctc_loss=0.06699, att_loss=0.2437, loss=0.2083, over 17350.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.02185, over 59.00 utterances.], tot_loss[ctc_loss=0.07032, att_loss=0.2334, loss=0.2008, over 3269113.95 frames. 
utt_duration=1250 frames, utt_pad_proportion=0.05365, over 10472.66 utterances.], batch size: 59, lr: 4.63e-03, grad_scale: 8.0 2023-03-09 03:07:42,506 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8321, 3.3321, 3.8988, 3.2963, 3.8265, 4.8628, 4.6757, 3.6897], device='cuda:2'), covar=tensor([0.0304, 0.1516, 0.0932, 0.1219, 0.0879, 0.0610, 0.0472, 0.0962], device='cuda:2'), in_proj_covar=tensor([0.0249, 0.0245, 0.0283, 0.0221, 0.0269, 0.0373, 0.0264, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 03:08:22,588 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3349, 4.5494, 4.6712, 4.8737, 2.9356, 4.4663, 2.8947, 2.0544], device='cuda:2'), covar=tensor([0.0380, 0.0274, 0.0540, 0.0451, 0.1546, 0.0237, 0.1388, 0.1616], device='cuda:2'), in_proj_covar=tensor([0.0198, 0.0170, 0.0261, 0.0163, 0.0223, 0.0155, 0.0231, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:08:33,565 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5425, 3.0638, 3.4052, 4.5099, 4.0594, 3.9992, 3.0895, 2.3840], device='cuda:2'), covar=tensor([0.0614, 0.1906, 0.0959, 0.0521, 0.0862, 0.0451, 0.1363, 0.2138], device='cuda:2'), in_proj_covar=tensor([0.0177, 0.0212, 0.0184, 0.0216, 0.0224, 0.0178, 0.0198, 0.0185], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:08:47,986 INFO [train2.py:809] (2/4) Epoch 23, batch 3700, loss[ctc_loss=0.06521, att_loss=0.2369, loss=0.2025, over 16682.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006175, over 46.00 utterances.], tot_loss[ctc_loss=0.06974, att_loss=0.2328, loss=0.2002, over 3260448.91 frames. utt_duration=1269 frames, utt_pad_proportion=0.05142, over 10287.73 utterances.], batch size: 46, lr: 4.63e-03, grad_scale: 4.0 2023-03-09 03:08:53,235 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=91346.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:08:54,348 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.187e+02 1.976e+02 2.254e+02 2.803e+02 7.124e+02, threshold=4.508e+02, percent-clipped=5.0 2023-03-09 03:09:14,608 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5209, 3.0886, 4.9438, 4.0284, 2.9412, 4.2295, 4.8251, 4.6951], device='cuda:2'), covar=tensor([0.0316, 0.1287, 0.0249, 0.0800, 0.1764, 0.0281, 0.0162, 0.0286], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0243, 0.0197, 0.0319, 0.0265, 0.0222, 0.0188, 0.0217], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:10:10,207 INFO [train2.py:809] (2/4) Epoch 23, batch 3750, loss[ctc_loss=0.08103, att_loss=0.2506, loss=0.2167, over 17112.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01484, over 56.00 utterances.], tot_loss[ctc_loss=0.0698, att_loss=0.2322, loss=0.1998, over 3256210.61 frames. utt_duration=1263 frames, utt_pad_proportion=0.05392, over 10324.42 utterances.], batch size: 56, lr: 4.63e-03, grad_scale: 4.0 2023-03-09 03:11:32,631 INFO [train2.py:809] (2/4) Epoch 23, batch 3800, loss[ctc_loss=0.1203, att_loss=0.264, loss=0.2353, over 14124.00 frames. utt_duration=391.1 frames, utt_pad_proportion=0.3199, over 145.00 utterances.], tot_loss[ctc_loss=0.07034, att_loss=0.2332, loss=0.2006, over 3262784.03 frames. 
utt_duration=1241 frames, utt_pad_proportion=0.05862, over 10526.63 utterances.], batch size: 145, lr: 4.63e-03, grad_scale: 4.0 2023-03-09 03:11:38,947 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.926e+02 2.458e+02 3.092e+02 6.476e+02, threshold=4.915e+02, percent-clipped=5.0 2023-03-09 03:12:41,821 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-03-09 03:12:55,887 INFO [train2.py:809] (2/4) Epoch 23, batch 3850, loss[ctc_loss=0.07111, att_loss=0.2458, loss=0.2108, over 17038.00 frames. utt_duration=1338 frames, utt_pad_proportion=0.007549, over 51.00 utterances.], tot_loss[ctc_loss=0.06992, att_loss=0.233, loss=0.2004, over 3257809.77 frames. utt_duration=1250 frames, utt_pad_proportion=0.05761, over 10439.19 utterances.], batch size: 51, lr: 4.63e-03, grad_scale: 4.0 2023-03-09 03:13:15,022 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91505.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:13:39,475 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2078, 3.8337, 3.2698, 3.5994, 3.9981, 3.7372, 3.0685, 4.3266], device='cuda:2'), covar=tensor([0.0965, 0.0522, 0.1153, 0.0623, 0.0768, 0.0670, 0.0835, 0.0556], device='cuda:2'), in_proj_covar=tensor([0.0202, 0.0218, 0.0224, 0.0201, 0.0278, 0.0241, 0.0198, 0.0288], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 03:14:12,251 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91542.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 03:14:13,401 INFO [train2.py:809] (2/4) Epoch 23, batch 3900, loss[ctc_loss=0.07447, att_loss=0.2174, loss=0.1888, over 15881.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.008219, over 39.00 utterances.], tot_loss[ctc_loss=0.06946, att_loss=0.2326, loss=0.2, over 3262173.89 frames. utt_duration=1242 frames, utt_pad_proportion=0.05843, over 10517.79 utterances.], batch size: 39, lr: 4.63e-03, grad_scale: 4.0 2023-03-09 03:14:19,549 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.215e+02 1.985e+02 2.304e+02 2.861e+02 4.797e+02, threshold=4.607e+02, percent-clipped=0.0 2023-03-09 03:14:29,564 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91553.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:15:21,970 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.85 vs. limit=5.0 2023-03-09 03:15:28,235 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2023-03-09 03:15:32,017 INFO [train2.py:809] (2/4) Epoch 23, batch 3950, loss[ctc_loss=0.08674, att_loss=0.2597, loss=0.2251, over 17317.00 frames. utt_duration=1005 frames, utt_pad_proportion=0.05241, over 69.00 utterances.], tot_loss[ctc_loss=0.07057, att_loss=0.2336, loss=0.201, over 3273565.25 frames. utt_duration=1226 frames, utt_pad_proportion=0.05775, over 10697.66 utterances.], batch size: 69, lr: 4.63e-03, grad_scale: 4.0 2023-03-09 03:15:48,515 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=91603.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 03:16:52,901 INFO [train2.py:809] (2/4) Epoch 24, batch 0, loss[ctc_loss=0.08528, att_loss=0.246, loss=0.2139, over 17064.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.007509, over 52.00 utterances.], tot_loss[ctc_loss=0.08528, att_loss=0.246, loss=0.2139, over 17064.00 frames. 
utt_duration=1314 frames, utt_pad_proportion=0.007509, over 52.00 utterances.], batch size: 52, lr: 4.53e-03, grad_scale: 8.0 2023-03-09 03:16:52,902 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 03:17:00,660 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5721, 3.8272, 3.9061, 2.0538, 1.9765, 2.6751, 2.1106, 3.3936], device='cuda:2'), covar=tensor([0.0764, 0.0517, 0.0450, 0.4974, 0.5304, 0.2501, 0.3825, 0.1389], device='cuda:2'), in_proj_covar=tensor([0.0359, 0.0282, 0.0271, 0.0245, 0.0341, 0.0334, 0.0257, 0.0369], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 03:17:05,993 INFO [train2.py:843] (2/4) Epoch 24, validation: ctc_loss=0.04095, att_loss=0.2349, loss=0.1961, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 03:17:05,994 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 03:17:28,525 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=91641.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:17:37,666 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.284e+02 2.024e+02 2.562e+02 3.130e+02 9.930e+02, threshold=5.124e+02, percent-clipped=6.0 2023-03-09 03:18:15,995 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-09 03:18:27,183 INFO [train2.py:809] (2/4) Epoch 24, batch 50, loss[ctc_loss=0.07311, att_loss=0.2184, loss=0.1893, over 15507.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008354, over 36.00 utterances.], tot_loss[ctc_loss=0.06707, att_loss=0.23, loss=0.1974, over 733620.04 frames. utt_duration=1367 frames, utt_pad_proportion=0.04207, over 2149.19 utterances.], batch size: 36, lr: 4.53e-03, grad_scale: 8.0 2023-03-09 03:19:47,844 INFO [train2.py:809] (2/4) Epoch 24, batch 100, loss[ctc_loss=0.07391, att_loss=0.22, loss=0.1908, over 15775.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.0084, over 38.00 utterances.], tot_loss[ctc_loss=0.06795, att_loss=0.2314, loss=0.1987, over 1286097.12 frames. utt_duration=1253 frames, utt_pad_proportion=0.06408, over 4111.82 utterances.], batch size: 38, lr: 4.52e-03, grad_scale: 8.0 2023-03-09 03:20:20,175 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.937e+02 2.364e+02 2.777e+02 5.363e+02, threshold=4.728e+02, percent-clipped=1.0 2023-03-09 03:20:53,279 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.67 vs. limit=5.0 2023-03-09 03:21:08,901 INFO [train2.py:809] (2/4) Epoch 24, batch 150, loss[ctc_loss=0.0543, att_loss=0.2064, loss=0.176, over 14516.00 frames. utt_duration=1816 frames, utt_pad_proportion=0.03047, over 32.00 utterances.], tot_loss[ctc_loss=0.06947, att_loss=0.2327, loss=0.2001, over 1725635.10 frames. utt_duration=1249 frames, utt_pad_proportion=0.06069, over 5534.41 utterances.], batch size: 32, lr: 4.52e-03, grad_scale: 8.0 2023-03-09 03:22:30,995 INFO [train2.py:809] (2/4) Epoch 24, batch 200, loss[ctc_loss=0.08353, att_loss=0.251, loss=0.2175, over 17017.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007948, over 51.00 utterances.], tot_loss[ctc_loss=0.06877, att_loss=0.2325, loss=0.1998, over 2070747.48 frames. 
utt_duration=1252 frames, utt_pad_proportion=0.05614, over 6625.90 utterances.], batch size: 51, lr: 4.52e-03, grad_scale: 8.0 2023-03-09 03:23:02,671 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.924e+02 2.312e+02 2.679e+02 4.327e+02, threshold=4.624e+02, percent-clipped=0.0 2023-03-09 03:23:51,118 INFO [train2.py:809] (2/4) Epoch 24, batch 250, loss[ctc_loss=0.07331, att_loss=0.232, loss=0.2003, over 16538.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006437, over 45.00 utterances.], tot_loss[ctc_loss=0.06935, att_loss=0.2326, loss=0.2, over 2321955.14 frames. utt_duration=1284 frames, utt_pad_proportion=0.05107, over 7244.43 utterances.], batch size: 45, lr: 4.52e-03, grad_scale: 8.0 2023-03-09 03:24:06,714 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3587, 2.6022, 4.8632, 3.8281, 2.9031, 4.1823, 4.6660, 4.5112], device='cuda:2'), covar=tensor([0.0314, 0.1569, 0.0194, 0.0914, 0.1800, 0.0280, 0.0190, 0.0328], device='cuda:2'), in_proj_covar=tensor([0.0208, 0.0241, 0.0196, 0.0317, 0.0263, 0.0220, 0.0188, 0.0217], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:24:14,500 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91891.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:24:25,437 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=91898.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 03:25:06,742 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.90 vs. limit=5.0 2023-03-09 03:25:11,834 INFO [train2.py:809] (2/4) Epoch 24, batch 300, loss[ctc_loss=0.06612, att_loss=0.2375, loss=0.2033, over 16885.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006391, over 49.00 utterances.], tot_loss[ctc_loss=0.0699, att_loss=0.2328, loss=0.2003, over 2525107.24 frames. utt_duration=1267 frames, utt_pad_proportion=0.05575, over 7979.26 utterances.], batch size: 49, lr: 4.52e-03, grad_scale: 8.0 2023-03-09 03:25:35,427 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91941.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:25:44,283 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.878e+02 2.164e+02 2.654e+02 4.888e+02, threshold=4.328e+02, percent-clipped=1.0 2023-03-09 03:25:53,275 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=91952.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:25:57,826 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91955.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:26:32,685 INFO [train2.py:809] (2/4) Epoch 24, batch 350, loss[ctc_loss=0.07166, att_loss=0.2287, loss=0.1973, over 16118.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006237, over 42.00 utterances.], tot_loss[ctc_loss=0.06937, att_loss=0.2328, loss=0.2001, over 2698234.68 frames. 
utt_duration=1266 frames, utt_pad_proportion=0.05173, over 8536.98 utterances.], batch size: 42, lr: 4.52e-03, grad_scale: 8.0 2023-03-09 03:26:39,972 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91981.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 03:26:52,910 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91989.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:27:40,066 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92016.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:27:58,110 INFO [train2.py:809] (2/4) Epoch 24, batch 400, loss[ctc_loss=0.08782, att_loss=0.2489, loss=0.2166, over 16332.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006186, over 45.00 utterances.], tot_loss[ctc_loss=0.06983, att_loss=0.233, loss=0.2004, over 2821074.61 frames. utt_duration=1246 frames, utt_pad_proportion=0.05793, over 9066.94 utterances.], batch size: 45, lr: 4.52e-03, grad_scale: 8.0 2023-03-09 03:28:16,346 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0840, 5.3059, 5.2879, 5.2967, 5.3728, 5.3471, 5.0135, 4.8130], device='cuda:2'), covar=tensor([0.1051, 0.0590, 0.0316, 0.0522, 0.0290, 0.0338, 0.0413, 0.0364], device='cuda:2'), in_proj_covar=tensor([0.0527, 0.0366, 0.0354, 0.0361, 0.0428, 0.0438, 0.0362, 0.0402], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-09 03:28:22,900 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92042.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 03:28:30,201 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 1.943e+02 2.339e+02 2.970e+02 6.262e+02, threshold=4.678e+02, percent-clipped=8.0 2023-03-09 03:29:04,484 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7720, 6.0421, 5.5550, 5.8541, 5.7350, 5.2118, 5.3748, 5.3256], device='cuda:2'), covar=tensor([0.1308, 0.0872, 0.0951, 0.0746, 0.0909, 0.1472, 0.2356, 0.2128], device='cuda:2'), in_proj_covar=tensor([0.0533, 0.0621, 0.0471, 0.0464, 0.0438, 0.0473, 0.0622, 0.0538], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 03:29:18,995 INFO [train2.py:809] (2/4) Epoch 24, batch 450, loss[ctc_loss=0.03364, att_loss=0.2109, loss=0.1755, over 15951.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007151, over 41.00 utterances.], tot_loss[ctc_loss=0.06979, att_loss=0.2326, loss=0.2, over 2919274.01 frames. utt_duration=1268 frames, utt_pad_proportion=0.05126, over 9217.30 utterances.], batch size: 41, lr: 4.52e-03, grad_scale: 8.0 2023-03-09 03:30:40,856 INFO [train2.py:809] (2/4) Epoch 24, batch 500, loss[ctc_loss=0.06465, att_loss=0.2329, loss=0.1993, over 16403.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.00748, over 44.00 utterances.], tot_loss[ctc_loss=0.0702, att_loss=0.2332, loss=0.2006, over 2998066.22 frames. utt_duration=1256 frames, utt_pad_proportion=0.05489, over 9557.64 utterances.], batch size: 44, lr: 4.51e-03, grad_scale: 8.0 2023-03-09 03:31:13,732 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.285e+02 1.745e+02 2.110e+02 2.591e+02 7.932e+02, threshold=4.219e+02, percent-clipped=4.0 2023-03-09 03:32:01,720 INFO [train2.py:809] (2/4) Epoch 24, batch 550, loss[ctc_loss=0.08473, att_loss=0.2359, loss=0.2057, over 16161.00 frames. 
utt_duration=1578 frames, utt_pad_proportion=0.007363, over 41.00 utterances.], tot_loss[ctc_loss=0.07057, att_loss=0.2336, loss=0.201, over 3054632.55 frames. utt_duration=1218 frames, utt_pad_proportion=0.06546, over 10043.24 utterances.], batch size: 41, lr: 4.51e-03, grad_scale: 8.0 2023-03-09 03:32:18,889 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-03-09 03:32:35,879 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=92198.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 03:33:22,318 INFO [train2.py:809] (2/4) Epoch 24, batch 600, loss[ctc_loss=0.08196, att_loss=0.2234, loss=0.1952, over 15782.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.007835, over 38.00 utterances.], tot_loss[ctc_loss=0.07078, att_loss=0.2337, loss=0.2011, over 3095794.81 frames. utt_duration=1229 frames, utt_pad_proportion=0.06537, over 10091.48 utterances.], batch size: 38, lr: 4.51e-03, grad_scale: 8.0 2023-03-09 03:33:53,652 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=92246.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 03:33:54,959 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.830e+02 2.241e+02 2.667e+02 5.450e+02, threshold=4.482e+02, percent-clipped=6.0 2023-03-09 03:33:55,216 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=92247.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:34:38,849 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.94 vs. limit=5.0 2023-03-09 03:34:42,405 INFO [train2.py:809] (2/4) Epoch 24, batch 650, loss[ctc_loss=0.07466, att_loss=0.2465, loss=0.2121, over 17022.00 frames. utt_duration=1286 frames, utt_pad_proportion=0.01058, over 53.00 utterances.], tot_loss[ctc_loss=0.07052, att_loss=0.2342, loss=0.2015, over 3142480.83 frames. utt_duration=1230 frames, utt_pad_proportion=0.0612, over 10232.39 utterances.], batch size: 53, lr: 4.51e-03, grad_scale: 8.0 2023-03-09 03:35:24,929 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3178, 4.4834, 4.4676, 4.4628, 4.9012, 4.3910, 4.4172, 2.4382], device='cuda:2'), covar=tensor([0.0318, 0.0393, 0.0368, 0.0351, 0.0981, 0.0303, 0.0412, 0.2036], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0202, 0.0202, 0.0218, 0.0380, 0.0170, 0.0190, 0.0218], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:35:36,587 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2347, 4.4002, 4.4835, 4.6752, 2.9812, 4.4539, 3.2392, 1.8781], device='cuda:2'), covar=tensor([0.0485, 0.0335, 0.0667, 0.0275, 0.1527, 0.0255, 0.1173, 0.1698], device='cuda:2'), in_proj_covar=tensor([0.0199, 0.0171, 0.0263, 0.0165, 0.0224, 0.0155, 0.0230, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:35:37,892 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=92311.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:36:03,497 INFO [train2.py:809] (2/4) Epoch 24, batch 700, loss[ctc_loss=0.1066, att_loss=0.2647, loss=0.2331, over 14402.00 frames. utt_duration=393.5 frames, utt_pad_proportion=0.3085, over 147.00 utterances.], tot_loss[ctc_loss=0.07035, att_loss=0.2342, loss=0.2015, over 3170187.56 frames. 
utt_duration=1210 frames, utt_pad_proportion=0.06641, over 10492.55 utterances.], batch size: 147, lr: 4.51e-03, grad_scale: 8.0 2023-03-09 03:36:04,565 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1314, 4.2486, 4.3172, 4.5885, 2.7834, 4.3791, 2.9702, 1.8825], device='cuda:2'), covar=tensor([0.0462, 0.0311, 0.0672, 0.0236, 0.1697, 0.0217, 0.1390, 0.1727], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0171, 0.0263, 0.0165, 0.0225, 0.0156, 0.0231, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:36:20,881 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=92337.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 03:36:36,599 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.268e+02 1.935e+02 2.344e+02 2.941e+02 5.112e+02, threshold=4.689e+02, percent-clipped=2.0 2023-03-09 03:36:46,160 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7988, 5.0396, 5.3377, 5.1877, 5.2745, 5.7017, 5.0535, 5.8102], device='cuda:2'), covar=tensor([0.0821, 0.0795, 0.0799, 0.1506, 0.1996, 0.1077, 0.0858, 0.0730], device='cuda:2'), in_proj_covar=tensor([0.0882, 0.0513, 0.0617, 0.0665, 0.0880, 0.0637, 0.0498, 0.0622], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 03:37:25,729 INFO [train2.py:809] (2/4) Epoch 24, batch 750, loss[ctc_loss=0.07, att_loss=0.2313, loss=0.199, over 17046.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.009862, over 53.00 utterances.], tot_loss[ctc_loss=0.06983, att_loss=0.2333, loss=0.2006, over 3188241.18 frames. utt_duration=1235 frames, utt_pad_proportion=0.0613, over 10337.17 utterances.], batch size: 53, lr: 4.51e-03, grad_scale: 8.0 2023-03-09 03:38:04,678 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3735, 4.4661, 4.4580, 4.5271, 4.9891, 4.4361, 4.4386, 2.6215], device='cuda:2'), covar=tensor([0.0286, 0.0344, 0.0360, 0.0291, 0.0811, 0.0265, 0.0335, 0.1693], device='cuda:2'), in_proj_covar=tensor([0.0174, 0.0200, 0.0200, 0.0215, 0.0375, 0.0169, 0.0189, 0.0215], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:38:27,726 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2023-03-09 03:38:46,217 INFO [train2.py:809] (2/4) Epoch 24, batch 800, loss[ctc_loss=0.08973, att_loss=0.2527, loss=0.2201, over 16882.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.006891, over 49.00 utterances.], tot_loss[ctc_loss=0.07003, att_loss=0.2339, loss=0.2011, over 3213230.29 frames. 
utt_duration=1234 frames, utt_pad_proportion=0.05863, over 10428.07 utterances.], batch size: 49, lr: 4.51e-03, grad_scale: 8.0 2023-03-09 03:39:19,378 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.406e+02 1.923e+02 2.416e+02 3.239e+02 1.370e+03, threshold=4.832e+02, percent-clipped=5.0 2023-03-09 03:40:02,103 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7070, 2.3077, 2.5225, 2.6697, 2.9210, 2.5000, 2.4532, 2.8016], device='cuda:2'), covar=tensor([0.2130, 0.3911, 0.2478, 0.2449, 0.2089, 0.2531, 0.3130, 0.2028], device='cuda:2'), in_proj_covar=tensor([0.0127, 0.0132, 0.0128, 0.0119, 0.0134, 0.0116, 0.0142, 0.0112], device='cuda:2'), out_proj_covar=tensor([9.7189e-05, 1.0289e-04, 1.0345e-04, 9.3385e-05, 1.0080e-04, 9.3497e-05, 1.0705e-04, 8.9349e-05], device='cuda:2') 2023-03-09 03:40:07,916 INFO [train2.py:809] (2/4) Epoch 24, batch 850, loss[ctc_loss=0.07032, att_loss=0.2144, loss=0.1856, over 15502.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008594, over 36.00 utterances.], tot_loss[ctc_loss=0.06935, att_loss=0.2339, loss=0.201, over 3231021.74 frames. utt_duration=1240 frames, utt_pad_proportion=0.05644, over 10436.15 utterances.], batch size: 36, lr: 4.51e-03, grad_scale: 8.0 2023-03-09 03:40:42,834 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1554, 5.4422, 5.3456, 5.3531, 5.4735, 5.4480, 5.0838, 4.8430], device='cuda:2'), covar=tensor([0.0994, 0.0514, 0.0316, 0.0510, 0.0312, 0.0324, 0.0382, 0.0343], device='cuda:2'), in_proj_covar=tensor([0.0529, 0.0366, 0.0354, 0.0363, 0.0429, 0.0436, 0.0363, 0.0402], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-09 03:41:20,440 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9317, 5.2592, 4.7046, 5.3616, 4.7040, 5.0421, 5.3657, 5.2054], device='cuda:2'), covar=tensor([0.0641, 0.0321, 0.0962, 0.0332, 0.0427, 0.0250, 0.0280, 0.0212], device='cuda:2'), in_proj_covar=tensor([0.0391, 0.0323, 0.0365, 0.0357, 0.0328, 0.0238, 0.0308, 0.0286], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0005, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 03:41:26,745 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4622, 2.9381, 3.6137, 3.0245, 3.5585, 4.5739, 4.3865, 3.1239], device='cuda:2'), covar=tensor([0.0389, 0.1865, 0.1212, 0.1444, 0.1091, 0.0883, 0.0543, 0.1351], device='cuda:2'), in_proj_covar=tensor([0.0249, 0.0249, 0.0288, 0.0224, 0.0272, 0.0377, 0.0269, 0.0235], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 03:41:29,528 INFO [train2.py:809] (2/4) Epoch 24, batch 900, loss[ctc_loss=0.06273, att_loss=0.2391, loss=0.2038, over 17064.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.008287, over 53.00 utterances.], tot_loss[ctc_loss=0.06945, att_loss=0.2342, loss=0.2013, over 3247062.00 frames. 
utt_duration=1216 frames, utt_pad_proportion=0.05795, over 10692.11 utterances.], batch size: 53, lr: 4.50e-03, grad_scale: 8.0 2023-03-09 03:42:02,358 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.834e+02 2.155e+02 2.748e+02 5.069e+02, threshold=4.309e+02, percent-clipped=1.0 2023-03-09 03:42:02,742 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=92547.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:42:50,356 INFO [train2.py:809] (2/4) Epoch 24, batch 950, loss[ctc_loss=0.07314, att_loss=0.2539, loss=0.2178, over 17060.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.009485, over 53.00 utterances.], tot_loss[ctc_loss=0.0698, att_loss=0.2341, loss=0.2013, over 3255976.04 frames. utt_duration=1224 frames, utt_pad_proportion=0.05519, over 10652.56 utterances.], batch size: 53, lr: 4.50e-03, grad_scale: 8.0 2023-03-09 03:43:20,156 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=92595.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:43:46,291 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=92611.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:44:08,919 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92625.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:44:11,647 INFO [train2.py:809] (2/4) Epoch 24, batch 1000, loss[ctc_loss=0.06052, att_loss=0.2464, loss=0.2092, over 16629.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.00511, over 47.00 utterances.], tot_loss[ctc_loss=0.06957, att_loss=0.2343, loss=0.2014, over 3267031.36 frames. utt_duration=1259 frames, utt_pad_proportion=0.04658, over 10391.98 utterances.], batch size: 47, lr: 4.50e-03, grad_scale: 8.0 2023-03-09 03:44:28,680 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=92637.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 03:44:44,464 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.218e+02 1.788e+02 2.232e+02 2.616e+02 5.460e+02, threshold=4.464e+02, percent-clipped=2.0 2023-03-09 03:45:04,283 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=92659.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:45:27,875 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.17 vs. limit=5.0 2023-03-09 03:45:33,246 INFO [train2.py:809] (2/4) Epoch 24, batch 1050, loss[ctc_loss=0.07847, att_loss=0.258, loss=0.2221, over 17029.00 frames. utt_duration=1287 frames, utt_pad_proportion=0.01015, over 53.00 utterances.], tot_loss[ctc_loss=0.06925, att_loss=0.2341, loss=0.2011, over 3271485.62 frames. 
utt_duration=1270 frames, utt_pad_proportion=0.04448, over 10316.56 utterances.], batch size: 53, lr: 4.50e-03, grad_scale: 8.0 2023-03-09 03:45:47,603 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=92685.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 03:45:49,251 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92686.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:46:23,951 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0530, 5.0958, 4.8193, 2.8443, 4.8594, 4.7493, 4.1912, 2.5595], device='cuda:2'), covar=tensor([0.0124, 0.0089, 0.0261, 0.1088, 0.0106, 0.0181, 0.0351, 0.1445], device='cuda:2'), in_proj_covar=tensor([0.0075, 0.0102, 0.0104, 0.0110, 0.0086, 0.0114, 0.0099, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 03:46:43,756 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92720.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:46:54,334 INFO [train2.py:809] (2/4) Epoch 24, batch 1100, loss[ctc_loss=0.06614, att_loss=0.213, loss=0.1836, over 15652.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.008355, over 37.00 utterances.], tot_loss[ctc_loss=0.06908, att_loss=0.2342, loss=0.2012, over 3274583.76 frames. utt_duration=1296 frames, utt_pad_proportion=0.0384, over 10121.08 utterances.], batch size: 37, lr: 4.50e-03, grad_scale: 8.0 2023-03-09 03:47:27,345 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.412e+02 1.778e+02 2.254e+02 2.467e+02 8.926e+02, threshold=4.507e+02, percent-clipped=3.0 2023-03-09 03:48:16,145 INFO [train2.py:809] (2/4) Epoch 24, batch 1150, loss[ctc_loss=0.05913, att_loss=0.2204, loss=0.1882, over 16397.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007918, over 44.00 utterances.], tot_loss[ctc_loss=0.06864, att_loss=0.2334, loss=0.2005, over 3266511.81 frames. utt_duration=1291 frames, utt_pad_proportion=0.04251, over 10129.74 utterances.], batch size: 44, lr: 4.50e-03, grad_scale: 8.0 2023-03-09 03:48:23,633 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92781.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:49:37,246 INFO [train2.py:809] (2/4) Epoch 24, batch 1200, loss[ctc_loss=0.11, att_loss=0.269, loss=0.2372, over 17050.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008541, over 52.00 utterances.], tot_loss[ctc_loss=0.06881, att_loss=0.2336, loss=0.2006, over 3271578.45 frames. 
utt_duration=1273 frames, utt_pad_proportion=0.04651, over 10288.80 utterances.], batch size: 52, lr: 4.50e-03, grad_scale: 8.0 2023-03-09 03:49:45,332 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92831.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:50:10,381 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.270e+02 1.947e+02 2.306e+02 3.014e+02 1.019e+03, threshold=4.613e+02, percent-clipped=6.0 2023-03-09 03:50:46,901 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0243, 4.3703, 4.1884, 4.6569, 2.7146, 4.4119, 2.6700, 1.9907], device='cuda:2'), covar=tensor([0.0473, 0.0260, 0.0797, 0.0197, 0.1737, 0.0214, 0.1581, 0.1610], device='cuda:2'), in_proj_covar=tensor([0.0202, 0.0170, 0.0263, 0.0166, 0.0223, 0.0156, 0.0232, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 03:50:59,082 INFO [train2.py:809] (2/4) Epoch 24, batch 1250, loss[ctc_loss=0.0614, att_loss=0.2188, loss=0.1873, over 15891.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.00891, over 39.00 utterances.], tot_loss[ctc_loss=0.06765, att_loss=0.2323, loss=0.1994, over 3269621.29 frames. utt_duration=1280 frames, utt_pad_proportion=0.04383, over 10227.54 utterances.], batch size: 39, lr: 4.50e-03, grad_scale: 8.0 2023-03-09 03:51:24,809 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92892.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:51:35,726 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92899.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:52:02,039 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1209, 5.3802, 5.4137, 5.2656, 5.4070, 5.3742, 5.0563, 4.8684], device='cuda:2'), covar=tensor([0.0949, 0.0461, 0.0251, 0.0510, 0.0262, 0.0317, 0.0353, 0.0320], device='cuda:2'), in_proj_covar=tensor([0.0529, 0.0366, 0.0351, 0.0363, 0.0427, 0.0437, 0.0362, 0.0400], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0003, 0.0004], device='cuda:2') 2023-03-09 03:52:20,573 INFO [train2.py:809] (2/4) Epoch 24, batch 1300, loss[ctc_loss=0.07702, att_loss=0.249, loss=0.2146, over 17312.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02274, over 59.00 utterances.], tot_loss[ctc_loss=0.06883, att_loss=0.2333, loss=0.2004, over 3273704.59 frames. utt_duration=1245 frames, utt_pad_proportion=0.05097, over 10529.93 utterances.], batch size: 59, lr: 4.50e-03, grad_scale: 8.0 2023-03-09 03:52:21,480 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.83 vs. 
limit=5.0 2023-03-09 03:52:27,998 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92931.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:52:45,612 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5222, 4.9471, 4.8252, 4.9042, 4.9881, 4.6710, 3.3642, 4.9172], device='cuda:2'), covar=tensor([0.0121, 0.0113, 0.0122, 0.0085, 0.0099, 0.0107, 0.0712, 0.0188], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0089, 0.0111, 0.0070, 0.0076, 0.0087, 0.0102, 0.0108], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 03:52:53,102 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.274e+02 1.940e+02 2.372e+02 2.718e+02 7.433e+02, threshold=4.745e+02, percent-clipped=1.0 2023-03-09 03:53:14,567 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92960.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:53:27,948 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8841, 2.4761, 2.8781, 2.6593, 2.9255, 2.7175, 2.6271, 3.1488], device='cuda:2'), covar=tensor([0.1078, 0.2279, 0.1328, 0.1475, 0.1348, 0.1157, 0.1965, 0.1069], device='cuda:2'), in_proj_covar=tensor([0.0128, 0.0132, 0.0129, 0.0120, 0.0135, 0.0117, 0.0142, 0.0113], device='cuda:2'), out_proj_covar=tensor([9.7633e-05, 1.0342e-04, 1.0375e-04, 9.3657e-05, 1.0139e-04, 9.3769e-05, 1.0714e-04, 8.9623e-05], device='cuda:2') 2023-03-09 03:53:39,304 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92975.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:53:41,992 INFO [train2.py:809] (2/4) Epoch 24, batch 1350, loss[ctc_loss=0.07343, att_loss=0.2185, loss=0.1895, over 15960.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.006653, over 41.00 utterances.], tot_loss[ctc_loss=0.0682, att_loss=0.2323, loss=0.1995, over 3267549.92 frames. utt_duration=1266 frames, utt_pad_proportion=0.04851, over 10336.40 utterances.], batch size: 41, lr: 4.49e-03, grad_scale: 8.0 2023-03-09 03:53:49,251 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=92981.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:53:49,400 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4932, 3.0151, 3.6040, 3.1178, 3.5514, 4.5665, 4.4242, 3.3736], device='cuda:2'), covar=tensor([0.0384, 0.1794, 0.1348, 0.1275, 0.1055, 0.0964, 0.0559, 0.1145], device='cuda:2'), in_proj_covar=tensor([0.0246, 0.0246, 0.0283, 0.0220, 0.0267, 0.0372, 0.0265, 0.0233], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 03:53:50,277 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-09 03:54:07,144 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92992.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:54:36,430 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93010.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:55:03,583 INFO [train2.py:809] (2/4) Epoch 24, batch 1400, loss[ctc_loss=0.06411, att_loss=0.221, loss=0.1896, over 16195.00 frames. utt_duration=1582 frames, utt_pad_proportion=0.005277, over 41.00 utterances.], tot_loss[ctc_loss=0.06845, att_loss=0.2326, loss=0.1998, over 3275774.33 frames. 
utt_duration=1266 frames, utt_pad_proportion=0.04687, over 10365.38 utterances.], batch size: 41, lr: 4.49e-03, grad_scale: 8.0 2023-03-09 03:55:19,544 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93036.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:55:19,574 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93036.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:55:36,735 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.237e+02 1.820e+02 2.183e+02 2.571e+02 5.095e+02, threshold=4.367e+02, percent-clipped=2.0 2023-03-09 03:56:16,545 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93071.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:56:24,062 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93076.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:56:25,392 INFO [train2.py:809] (2/4) Epoch 24, batch 1450, loss[ctc_loss=0.07534, att_loss=0.2399, loss=0.2069, over 16765.00 frames. utt_duration=678.9 frames, utt_pad_proportion=0.1461, over 99.00 utterances.], tot_loss[ctc_loss=0.068, att_loss=0.2321, loss=0.1992, over 3275344.27 frames. utt_duration=1277 frames, utt_pad_proportion=0.04398, over 10273.52 utterances.], batch size: 99, lr: 4.49e-03, grad_scale: 8.0 2023-03-09 03:56:46,255 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-09 03:56:54,726 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1498, 4.4193, 4.3364, 4.3627, 4.4817, 4.2730, 3.2085, 4.3619], device='cuda:2'), covar=tensor([0.0149, 0.0130, 0.0149, 0.0097, 0.0110, 0.0121, 0.0731, 0.0203], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0089, 0.0112, 0.0070, 0.0076, 0.0087, 0.0103, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 03:56:57,989 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93097.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 03:57:26,930 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93115.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:57:45,165 INFO [train2.py:809] (2/4) Epoch 24, batch 1500, loss[ctc_loss=0.08649, att_loss=0.2455, loss=0.2137, over 17171.00 frames. utt_duration=695.4 frames, utt_pad_proportion=0.1275, over 99.00 utterances.], tot_loss[ctc_loss=0.06903, att_loss=0.2329, loss=0.2001, over 3269763.44 frames. 
utt_duration=1232 frames, utt_pad_proportion=0.05693, over 10629.44 utterances.], batch size: 99, lr: 4.49e-03, grad_scale: 8.0 2023-03-09 03:58:17,655 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.288e+02 1.858e+02 2.215e+02 2.637e+02 6.796e+02, threshold=4.431e+02, percent-clipped=2.0 2023-03-09 03:58:39,478 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9259, 5.1872, 5.2321, 5.0908, 5.2355, 5.2231, 4.8397, 4.7073], device='cuda:2'), covar=tensor([0.1068, 0.0607, 0.0270, 0.0499, 0.0298, 0.0338, 0.0428, 0.0337], device='cuda:2'), in_proj_covar=tensor([0.0531, 0.0367, 0.0353, 0.0366, 0.0430, 0.0439, 0.0366, 0.0400], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 03:58:42,932 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0634, 5.1117, 4.7813, 2.2063, 2.1340, 3.1382, 2.2885, 3.7590], device='cuda:2'), covar=tensor([0.0756, 0.0297, 0.0320, 0.5256, 0.5375, 0.2167, 0.3874, 0.1864], device='cuda:2'), in_proj_covar=tensor([0.0354, 0.0279, 0.0267, 0.0243, 0.0334, 0.0330, 0.0258, 0.0366], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 03:59:04,763 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93176.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:59:05,933 INFO [train2.py:809] (2/4) Epoch 24, batch 1550, loss[ctc_loss=0.05849, att_loss=0.2199, loss=0.1876, over 15963.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.006482, over 41.00 utterances.], tot_loss[ctc_loss=0.07004, att_loss=0.234, loss=0.2012, over 3280430.07 frames. utt_duration=1228 frames, utt_pad_proportion=0.05577, over 10695.18 utterances.], batch size: 41, lr: 4.49e-03, grad_scale: 8.0 2023-03-09 03:59:23,146 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93187.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 03:59:29,461 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6407, 5.0038, 4.8399, 4.9470, 5.1327, 4.7251, 3.8658, 5.0253], device='cuda:2'), covar=tensor([0.0116, 0.0125, 0.0127, 0.0091, 0.0080, 0.0124, 0.0556, 0.0196], device='cuda:2'), in_proj_covar=tensor([0.0094, 0.0090, 0.0112, 0.0071, 0.0076, 0.0087, 0.0103, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 04:00:26,992 INFO [train2.py:809] (2/4) Epoch 24, batch 1600, loss[ctc_loss=0.06333, att_loss=0.2228, loss=0.1909, over 15651.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.00864, over 37.00 utterances.], tot_loss[ctc_loss=0.07, att_loss=0.2344, loss=0.2015, over 3277905.26 frames. 
utt_duration=1241 frames, utt_pad_proportion=0.05338, over 10577.50 utterances.], batch size: 37, lr: 4.49e-03, grad_scale: 8.0 2023-03-09 04:00:59,420 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.441e+02 2.026e+02 2.316e+02 3.152e+02 2.970e+03, threshold=4.633e+02, percent-clipped=9.0 2023-03-09 04:01:12,781 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93255.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:01:33,925 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1463, 5.4015, 5.4373, 5.3364, 5.4227, 5.4138, 5.0575, 4.8985], device='cuda:2'), covar=tensor([0.0919, 0.0481, 0.0265, 0.0497, 0.0292, 0.0311, 0.0393, 0.0310], device='cuda:2'), in_proj_covar=tensor([0.0533, 0.0368, 0.0355, 0.0367, 0.0432, 0.0440, 0.0368, 0.0400], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 04:01:47,572 INFO [train2.py:809] (2/4) Epoch 24, batch 1650, loss[ctc_loss=0.05951, att_loss=0.2395, loss=0.2035, over 16324.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006399, over 45.00 utterances.], tot_loss[ctc_loss=0.06953, att_loss=0.2343, loss=0.2013, over 3279009.36 frames. utt_duration=1251 frames, utt_pad_proportion=0.05103, over 10494.47 utterances.], batch size: 45, lr: 4.49e-03, grad_scale: 8.0 2023-03-09 04:01:54,069 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93281.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:02:05,086 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93287.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:03:09,900 INFO [train2.py:809] (2/4) Epoch 24, batch 1700, loss[ctc_loss=0.06352, att_loss=0.2355, loss=0.2011, over 16905.00 frames. utt_duration=684.5 frames, utt_pad_proportion=0.1422, over 99.00 utterances.], tot_loss[ctc_loss=0.06931, att_loss=0.2345, loss=0.2015, over 3288989.86 frames. utt_duration=1236 frames, utt_pad_proportion=0.05137, over 10659.64 utterances.], batch size: 99, lr: 4.49e-03, grad_scale: 16.0 2023-03-09 04:03:13,046 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93329.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:03:16,320 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93331.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:03:32,703 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. 
limit=2.0 2023-03-09 04:03:33,332 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7111, 5.0105, 4.8508, 5.0331, 5.0942, 4.8437, 3.5281, 5.0560], device='cuda:2'), covar=tensor([0.0114, 0.0112, 0.0130, 0.0070, 0.0088, 0.0104, 0.0674, 0.0147], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0089, 0.0111, 0.0070, 0.0075, 0.0086, 0.0102, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 04:03:42,405 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 1.807e+02 2.162e+02 2.444e+02 4.017e+02, threshold=4.325e+02, percent-clipped=0.0 2023-03-09 04:04:04,696 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7961, 5.1426, 4.9997, 5.2117, 5.2464, 4.9873, 3.8595, 5.2134], device='cuda:2'), covar=tensor([0.0110, 0.0096, 0.0141, 0.0059, 0.0107, 0.0088, 0.0574, 0.0133], device='cuda:2'), in_proj_covar=tensor([0.0093, 0.0089, 0.0111, 0.0070, 0.0075, 0.0086, 0.0102, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 04:04:08,125 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0371, 5.1146, 4.8719, 2.2539, 2.1821, 2.9288, 2.4461, 3.8275], device='cuda:2'), covar=tensor([0.0763, 0.0321, 0.0279, 0.5164, 0.5142, 0.2500, 0.3771, 0.1813], device='cuda:2'), in_proj_covar=tensor([0.0361, 0.0286, 0.0273, 0.0248, 0.0341, 0.0337, 0.0261, 0.0373], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 04:04:14,236 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93366.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:04:29,855 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93376.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:04:29,931 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8209, 3.5597, 3.5062, 3.0141, 3.6103, 3.5826, 3.6546, 2.5335], device='cuda:2'), covar=tensor([0.1040, 0.1360, 0.1800, 0.3271, 0.0858, 0.2313, 0.0733, 0.4031], device='cuda:2'), in_proj_covar=tensor([0.0192, 0.0198, 0.0215, 0.0265, 0.0172, 0.0277, 0.0197, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:04:31,041 INFO [train2.py:809] (2/4) Epoch 24, batch 1750, loss[ctc_loss=0.08404, att_loss=0.247, loss=0.2144, over 17012.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008254, over 51.00 utterances.], tot_loss[ctc_loss=0.06854, att_loss=0.2343, loss=0.2011, over 3288756.55 frames. 
utt_duration=1252 frames, utt_pad_proportion=0.04774, over 10520.55 utterances.], batch size: 51, lr: 4.48e-03, grad_scale: 16.0 2023-03-09 04:04:56,254 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93392.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 04:05:19,375 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2694, 4.4200, 4.5199, 4.4703, 5.0219, 4.4922, 4.4065, 2.3871], device='cuda:2'), covar=tensor([0.0301, 0.0418, 0.0329, 0.0418, 0.0755, 0.0225, 0.0366, 0.1756], device='cuda:2'), in_proj_covar=tensor([0.0171, 0.0198, 0.0196, 0.0211, 0.0367, 0.0167, 0.0185, 0.0211], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:05:48,350 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93424.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:05:52,976 INFO [train2.py:809] (2/4) Epoch 24, batch 1800, loss[ctc_loss=0.05308, att_loss=0.2164, loss=0.1838, over 15660.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.008007, over 37.00 utterances.], tot_loss[ctc_loss=0.0682, att_loss=0.2335, loss=0.2004, over 3283277.33 frames. utt_duration=1277 frames, utt_pad_proportion=0.04267, over 10294.12 utterances.], batch size: 37, lr: 4.48e-03, grad_scale: 16.0 2023-03-09 04:06:19,719 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6340, 4.9545, 4.8383, 4.9515, 5.0025, 4.7229, 3.3694, 4.9800], device='cuda:2'), covar=tensor([0.0120, 0.0125, 0.0127, 0.0091, 0.0095, 0.0121, 0.0734, 0.0196], device='cuda:2'), in_proj_covar=tensor([0.0094, 0.0090, 0.0112, 0.0071, 0.0077, 0.0087, 0.0104, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 04:06:25,985 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.320e+02 1.826e+02 2.245e+02 2.637e+02 3.732e+02, threshold=4.489e+02, percent-clipped=0.0 2023-03-09 04:07:05,215 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93471.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:07:14,512 INFO [train2.py:809] (2/4) Epoch 24, batch 1850, loss[ctc_loss=0.06636, att_loss=0.2227, loss=0.1914, over 15861.00 frames. utt_duration=1628 frames, utt_pad_proportion=0.01086, over 39.00 utterances.], tot_loss[ctc_loss=0.06858, att_loss=0.2336, loss=0.2006, over 3271754.99 frames. utt_duration=1236 frames, utt_pad_proportion=0.0564, over 10604.49 utterances.], batch size: 39, lr: 4.48e-03, grad_scale: 16.0 2023-03-09 04:07:24,794 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6607, 4.9771, 5.4564, 4.9909, 4.8587, 5.6223, 5.0587, 5.5655], device='cuda:2'), covar=tensor([0.1559, 0.1447, 0.1275, 0.2337, 0.3524, 0.1654, 0.1303, 0.1550], device='cuda:2'), in_proj_covar=tensor([0.0894, 0.0519, 0.0619, 0.0665, 0.0891, 0.0640, 0.0505, 0.0629], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 04:07:31,205 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93487.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:08:35,371 INFO [train2.py:809] (2/4) Epoch 24, batch 1900, loss[ctc_loss=0.05306, att_loss=0.2135, loss=0.1814, over 16247.00 frames. utt_duration=1513 frames, utt_pad_proportion=0.00929, over 43.00 utterances.], tot_loss[ctc_loss=0.06864, att_loss=0.2337, loss=0.2007, over 3278918.79 frames. 
utt_duration=1248 frames, utt_pad_proportion=0.05106, over 10517.71 utterances.], batch size: 43, lr: 4.48e-03, grad_scale: 16.0 2023-03-09 04:08:48,849 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93535.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:09:07,452 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.393e+02 1.868e+02 2.241e+02 2.979e+02 6.717e+02, threshold=4.482e+02, percent-clipped=6.0 2023-03-09 04:09:21,302 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93555.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:09:22,806 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2594, 5.2696, 4.8930, 3.1524, 5.1106, 4.9673, 4.3941, 2.3231], device='cuda:2'), covar=tensor([0.0160, 0.0105, 0.0339, 0.1151, 0.0110, 0.0184, 0.0410, 0.2174], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0104, 0.0107, 0.0112, 0.0087, 0.0116, 0.0100, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 04:09:55,676 INFO [train2.py:809] (2/4) Epoch 24, batch 1950, loss[ctc_loss=0.05201, att_loss=0.2154, loss=0.1828, over 15899.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.008381, over 39.00 utterances.], tot_loss[ctc_loss=0.06879, att_loss=0.2339, loss=0.2008, over 3280422.66 frames. utt_duration=1268 frames, utt_pad_proportion=0.04719, over 10358.27 utterances.], batch size: 39, lr: 4.48e-03, grad_scale: 16.0 2023-03-09 04:10:06,474 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6625, 3.0519, 2.9931, 2.6915, 3.0814, 2.9953, 3.1073, 2.2390], device='cuda:2'), covar=tensor([0.1080, 0.1462, 0.2662, 0.3389, 0.1060, 0.2594, 0.1126, 0.3774], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0197, 0.0212, 0.0263, 0.0170, 0.0272, 0.0196, 0.0225], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:10:12,621 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93587.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:10:38,953 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93603.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:11:08,714 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0 2023-03-09 04:11:17,165 INFO [train2.py:809] (2/4) Epoch 24, batch 2000, loss[ctc_loss=0.1064, att_loss=0.261, loss=0.2301, over 17271.00 frames. utt_duration=1172 frames, utt_pad_proportion=0.02536, over 59.00 utterances.], tot_loss[ctc_loss=0.06942, att_loss=0.2344, loss=0.2014, over 3281966.15 frames. utt_duration=1243 frames, utt_pad_proportion=0.05365, over 10575.39 utterances.], batch size: 59, lr: 4.48e-03, grad_scale: 16.0 2023-03-09 04:11:24,603 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93631.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:11:30,358 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. 
limit=2.0 2023-03-09 04:11:31,238 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93635.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:11:50,188 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.339e+02 1.872e+02 2.240e+02 2.588e+02 7.651e+02, threshold=4.480e+02, percent-clipped=4.0 2023-03-09 04:11:53,935 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2023-03-09 04:12:07,326 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-03-09 04:12:22,117 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93666.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:12:38,935 INFO [train2.py:809] (2/4) Epoch 24, batch 2050, loss[ctc_loss=0.06425, att_loss=0.2369, loss=0.2024, over 17406.00 frames. utt_duration=1107 frames, utt_pad_proportion=0.0327, over 63.00 utterances.], tot_loss[ctc_loss=0.06926, att_loss=0.2343, loss=0.2013, over 3287403.94 frames. utt_duration=1247 frames, utt_pad_proportion=0.05094, over 10557.13 utterances.], batch size: 63, lr: 4.48e-03, grad_scale: 16.0 2023-03-09 04:12:42,980 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93679.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:13:03,607 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9238, 5.1987, 4.8098, 5.2910, 4.6960, 4.8756, 5.3532, 5.1590], device='cuda:2'), covar=tensor([0.0602, 0.0302, 0.0792, 0.0320, 0.0397, 0.0329, 0.0242, 0.0202], device='cuda:2'), in_proj_covar=tensor([0.0394, 0.0326, 0.0367, 0.0355, 0.0326, 0.0241, 0.0309, 0.0289], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 04:13:03,672 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93692.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:13:39,500 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93714.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:14:00,050 INFO [train2.py:809] (2/4) Epoch 24, batch 2100, loss[ctc_loss=0.0542, att_loss=0.2293, loss=0.1943, over 16336.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005867, over 45.00 utterances.], tot_loss[ctc_loss=0.06875, att_loss=0.234, loss=0.2009, over 3288411.33 frames. 
utt_duration=1240 frames, utt_pad_proportion=0.05293, over 10617.07 utterances.], batch size: 45, lr: 4.48e-03, grad_scale: 8.0 2023-03-09 04:14:15,421 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3568, 2.3933, 4.7989, 3.7044, 2.8976, 4.1551, 4.4460, 4.4823], device='cuda:2'), covar=tensor([0.0277, 0.1721, 0.0214, 0.0882, 0.1651, 0.0264, 0.0234, 0.0276], device='cuda:2'), in_proj_covar=tensor([0.0212, 0.0244, 0.0204, 0.0321, 0.0268, 0.0224, 0.0194, 0.0221], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:14:21,495 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93740.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:14:34,155 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.365e+02 1.856e+02 2.273e+02 2.574e+02 6.013e+02, threshold=4.546e+02, percent-clipped=4.0 2023-03-09 04:15:09,528 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93770.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:15:11,023 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93771.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:15:19,887 INFO [train2.py:809] (2/4) Epoch 24, batch 2150, loss[ctc_loss=0.06209, att_loss=0.2362, loss=0.2014, over 17165.00 frames. utt_duration=870.7 frames, utt_pad_proportion=0.0844, over 79.00 utterances.], tot_loss[ctc_loss=0.06905, att_loss=0.2339, loss=0.201, over 3288575.63 frames. utt_duration=1255 frames, utt_pad_proportion=0.04788, over 10496.90 utterances.], batch size: 79, lr: 4.47e-03, grad_scale: 8.0 2023-03-09 04:15:50,114 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1342, 5.4681, 5.0680, 5.5028, 4.9123, 5.1262, 5.6218, 5.3711], device='cuda:2'), covar=tensor([0.0580, 0.0241, 0.0703, 0.0258, 0.0340, 0.0215, 0.0183, 0.0183], device='cuda:2'), in_proj_covar=tensor([0.0392, 0.0324, 0.0366, 0.0354, 0.0325, 0.0240, 0.0308, 0.0288], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0005, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 04:16:11,727 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7238, 3.1639, 3.9404, 3.3559, 3.6904, 4.7900, 4.5799, 3.5627], device='cuda:2'), covar=tensor([0.0334, 0.1573, 0.1015, 0.1164, 0.1041, 0.0717, 0.0643, 0.0998], device='cuda:2'), in_proj_covar=tensor([0.0247, 0.0247, 0.0284, 0.0219, 0.0269, 0.0371, 0.0268, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 04:16:17,994 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9537, 3.6685, 3.6597, 3.1828, 3.7239, 3.7706, 3.7380, 2.6488], device='cuda:2'), covar=tensor([0.1008, 0.1338, 0.1839, 0.3016, 0.0913, 0.1703, 0.1023, 0.3809], device='cuda:2'), in_proj_covar=tensor([0.0192, 0.0198, 0.0212, 0.0265, 0.0171, 0.0272, 0.0197, 0.0225], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:16:28,703 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93819.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:16:41,184 INFO [train2.py:809] (2/4) Epoch 24, batch 2200, loss[ctc_loss=0.06347, att_loss=0.2421, loss=0.2063, over 17066.00 frames. 
utt_duration=1289 frames, utt_pad_proportion=0.009108, over 53.00 utterances.], tot_loss[ctc_loss=0.06863, att_loss=0.2335, loss=0.2005, over 3285842.33 frames. utt_duration=1278 frames, utt_pad_proportion=0.0428, over 10295.86 utterances.], batch size: 53, lr: 4.47e-03, grad_scale: 8.0 2023-03-09 04:16:49,239 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93831.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:16:58,701 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1319, 2.7109, 3.0156, 4.1108, 3.7217, 3.7475, 2.7726, 2.2027], device='cuda:2'), covar=tensor([0.0783, 0.1914, 0.0976, 0.0603, 0.0857, 0.0486, 0.1570, 0.2121], device='cuda:2'), in_proj_covar=tensor([0.0183, 0.0216, 0.0188, 0.0221, 0.0228, 0.0183, 0.0204, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:17:16,339 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.265e+02 1.790e+02 2.187e+02 2.751e+02 8.387e+02, threshold=4.374e+02, percent-clipped=4.0 2023-03-09 04:18:00,714 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.20 vs. limit=5.0 2023-03-09 04:18:02,864 INFO [train2.py:809] (2/4) Epoch 24, batch 2250, loss[ctc_loss=0.04994, att_loss=0.2258, loss=0.1907, over 16272.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007721, over 43.00 utterances.], tot_loss[ctc_loss=0.0694, att_loss=0.2339, loss=0.201, over 3278293.23 frames. utt_duration=1240 frames, utt_pad_proportion=0.05388, over 10591.02 utterances.], batch size: 43, lr: 4.47e-03, grad_scale: 8.0 2023-03-09 04:18:37,209 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5480, 4.7793, 4.6791, 4.5599, 5.3698, 4.6818, 4.6952, 3.0863], device='cuda:2'), covar=tensor([0.0265, 0.0309, 0.0401, 0.0397, 0.0723, 0.0229, 0.0328, 0.1412], device='cuda:2'), in_proj_covar=tensor([0.0173, 0.0200, 0.0198, 0.0214, 0.0371, 0.0170, 0.0187, 0.0214], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:19:24,950 INFO [train2.py:809] (2/4) Epoch 24, batch 2300, loss[ctc_loss=0.05737, att_loss=0.233, loss=0.1979, over 16604.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.005102, over 47.00 utterances.], tot_loss[ctc_loss=0.06934, att_loss=0.2334, loss=0.2006, over 3263754.44 frames. 
utt_duration=1202 frames, utt_pad_proportion=0.06556, over 10878.03 utterances.], batch size: 47, lr: 4.47e-03, grad_scale: 8.0 2023-03-09 04:19:45,408 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7083, 5.9113, 5.4139, 5.6190, 5.5996, 5.1431, 5.3376, 5.0390], device='cuda:2'), covar=tensor([0.1177, 0.0962, 0.0903, 0.0837, 0.0884, 0.1571, 0.2523, 0.2358], device='cuda:2'), in_proj_covar=tensor([0.0537, 0.0625, 0.0474, 0.0473, 0.0439, 0.0479, 0.0632, 0.0540], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 04:20:00,413 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.894e+02 2.401e+02 2.825e+02 6.941e+02, threshold=4.802e+02, percent-clipped=5.0 2023-03-09 04:20:34,660 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6819, 4.8653, 4.3582, 4.7450, 4.5734, 4.1726, 4.4378, 4.1511], device='cuda:2'), covar=tensor([0.1262, 0.1336, 0.1107, 0.1062, 0.1107, 0.1792, 0.2440, 0.2851], device='cuda:2'), in_proj_covar=tensor([0.0538, 0.0627, 0.0476, 0.0475, 0.0440, 0.0480, 0.0633, 0.0542], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 04:20:34,866 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7893, 2.4605, 2.6505, 3.3617, 3.0420, 3.1969, 2.5155, 2.2656], device='cuda:2'), covar=tensor([0.0798, 0.1895, 0.0934, 0.0746, 0.0980, 0.0591, 0.1579, 0.1749], device='cuda:2'), in_proj_covar=tensor([0.0184, 0.0217, 0.0188, 0.0222, 0.0229, 0.0184, 0.0205, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:20:47,686 INFO [train2.py:809] (2/4) Epoch 24, batch 2350, loss[ctc_loss=0.04587, att_loss=0.2303, loss=0.1934, over 17025.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.008437, over 51.00 utterances.], tot_loss[ctc_loss=0.06912, att_loss=0.2332, loss=0.2004, over 3265024.45 frames. utt_duration=1215 frames, utt_pad_proportion=0.06301, over 10760.82 utterances.], batch size: 51, lr: 4.47e-03, grad_scale: 8.0 2023-03-09 04:20:54,667 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93981.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:21:51,923 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0579, 4.2307, 4.3024, 4.5983, 2.7485, 4.3712, 2.6622, 1.6482], device='cuda:2'), covar=tensor([0.0466, 0.0317, 0.0705, 0.0243, 0.1594, 0.0261, 0.1527, 0.1790], device='cuda:2'), in_proj_covar=tensor([0.0202, 0.0173, 0.0263, 0.0167, 0.0223, 0.0159, 0.0231, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:22:13,394 INFO [train2.py:809] (2/4) Epoch 24, batch 2400, loss[ctc_loss=0.1222, att_loss=0.2636, loss=0.2353, over 13837.00 frames. utt_duration=380.7 frames, utt_pad_proportion=0.3367, over 146.00 utterances.], tot_loss[ctc_loss=0.06948, att_loss=0.2345, loss=0.2015, over 3277900.51 frames. 
utt_duration=1213 frames, utt_pad_proportion=0.06094, over 10818.82 utterances.], batch size: 146, lr: 4.47e-03, grad_scale: 8.0 2023-03-09 04:22:38,935 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=94042.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:22:47,938 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.339e+02 1.881e+02 2.121e+02 2.677e+02 6.185e+02, threshold=4.243e+02, percent-clipped=1.0 2023-03-09 04:23:34,729 INFO [train2.py:809] (2/4) Epoch 24, batch 2450, loss[ctc_loss=0.07208, att_loss=0.2466, loss=0.2117, over 16533.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.005886, over 45.00 utterances.], tot_loss[ctc_loss=0.06963, att_loss=0.2345, loss=0.2015, over 3279926.12 frames. utt_duration=1212 frames, utt_pad_proportion=0.0606, over 10837.50 utterances.], batch size: 45, lr: 4.47e-03, grad_scale: 8.0 2023-03-09 04:23:41,479 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2023-03-09 04:24:55,077 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=94126.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:24:56,497 INFO [train2.py:809] (2/4) Epoch 24, batch 2500, loss[ctc_loss=0.07003, att_loss=0.2359, loss=0.2027, over 16268.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007904, over 43.00 utterances.], tot_loss[ctc_loss=0.07035, att_loss=0.2345, loss=0.2016, over 3269018.25 frames. utt_duration=1185 frames, utt_pad_proportion=0.07099, over 11053.24 utterances.], batch size: 43, lr: 4.47e-03, grad_scale: 8.0 2023-03-09 04:25:12,548 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2547, 4.4087, 4.8561, 4.8238, 3.0355, 4.5737, 2.9721, 1.8914], device='cuda:2'), covar=tensor([0.0421, 0.0300, 0.0519, 0.0233, 0.1390, 0.0253, 0.1320, 0.1665], device='cuda:2'), in_proj_covar=tensor([0.0203, 0.0174, 0.0263, 0.0167, 0.0223, 0.0159, 0.0231, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:25:31,027 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.204e+02 1.808e+02 2.198e+02 2.566e+02 4.413e+02, threshold=4.397e+02, percent-clipped=1.0 2023-03-09 04:26:18,035 INFO [train2.py:809] (2/4) Epoch 24, batch 2550, loss[ctc_loss=0.07343, att_loss=0.2287, loss=0.1977, over 15942.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007202, over 41.00 utterances.], tot_loss[ctc_loss=0.06911, att_loss=0.2331, loss=0.2003, over 3255780.59 frames. 
utt_duration=1219 frames, utt_pad_proportion=0.06534, over 10695.56 utterances.], batch size: 41, lr: 4.47e-03, grad_scale: 8.0 2023-03-09 04:26:25,262 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9530, 3.5826, 3.5992, 2.7899, 3.6497, 3.6780, 3.7078, 2.1740], device='cuda:2'), covar=tensor([0.1261, 0.1369, 0.2352, 0.5737, 0.1364, 0.2904, 0.1115, 0.6726], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0199, 0.0215, 0.0266, 0.0174, 0.0275, 0.0199, 0.0228], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:27:22,319 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6061, 3.0110, 2.6411, 2.8751, 3.1328, 3.0378, 2.4932, 3.0359], device='cuda:2'), covar=tensor([0.0818, 0.0402, 0.0774, 0.0555, 0.0583, 0.0566, 0.0798, 0.0426], device='cuda:2'), in_proj_covar=tensor([0.0204, 0.0221, 0.0225, 0.0204, 0.0282, 0.0245, 0.0200, 0.0291], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 04:27:33,004 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=94223.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 04:27:39,577 INFO [train2.py:809] (2/4) Epoch 24, batch 2600, loss[ctc_loss=0.07228, att_loss=0.2435, loss=0.2093, over 17447.00 frames. utt_duration=1013 frames, utt_pad_proportion=0.04535, over 69.00 utterances.], tot_loss[ctc_loss=0.06858, att_loss=0.2328, loss=0.1999, over 3255857.17 frames. utt_duration=1240 frames, utt_pad_proportion=0.06081, over 10514.78 utterances.], batch size: 69, lr: 4.46e-03, grad_scale: 8.0 2023-03-09 04:27:45,531 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4193, 4.6110, 4.5711, 4.5788, 5.1992, 4.6436, 4.5387, 2.7848], device='cuda:2'), covar=tensor([0.0286, 0.0293, 0.0326, 0.0336, 0.0623, 0.0224, 0.0316, 0.1565], device='cuda:2'), in_proj_covar=tensor([0.0176, 0.0202, 0.0200, 0.0216, 0.0376, 0.0172, 0.0190, 0.0217], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:27:57,910 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2023-03-09 04:28:14,366 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.213e+02 1.761e+02 2.111e+02 2.578e+02 4.836e+02, threshold=4.223e+02, percent-clipped=1.0 2023-03-09 04:29:01,212 INFO [train2.py:809] (2/4) Epoch 24, batch 2650, loss[ctc_loss=0.05154, att_loss=0.2054, loss=0.1747, over 15350.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.01263, over 35.00 utterances.], tot_loss[ctc_loss=0.06881, att_loss=0.2335, loss=0.2006, over 3266250.16 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.05739, over 10557.77 utterances.], batch size: 35, lr: 4.46e-03, grad_scale: 8.0 2023-03-09 04:29:14,082 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=94284.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 04:29:44,041 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0933, 3.8179, 3.8213, 3.2879, 3.8218, 3.9047, 3.8489, 2.9283], device='cuda:2'), covar=tensor([0.1015, 0.1096, 0.1407, 0.2907, 0.1620, 0.1909, 0.0795, 0.3167], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0198, 0.0213, 0.0265, 0.0174, 0.0273, 0.0198, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:29:48,679 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5199, 4.8749, 4.8204, 4.9066, 4.9667, 4.5829, 3.1372, 4.8299], device='cuda:2'), covar=tensor([0.0144, 0.0117, 0.0126, 0.0086, 0.0113, 0.0132, 0.0855, 0.0222], device='cuda:2'), in_proj_covar=tensor([0.0094, 0.0090, 0.0112, 0.0070, 0.0077, 0.0088, 0.0104, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 04:30:23,803 INFO [train2.py:809] (2/4) Epoch 24, batch 2700, loss[ctc_loss=0.05014, att_loss=0.2105, loss=0.1784, over 15496.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.007932, over 36.00 utterances.], tot_loss[ctc_loss=0.06876, att_loss=0.2331, loss=0.2002, over 3265047.65 frames. utt_duration=1240 frames, utt_pad_proportion=0.05877, over 10543.85 utterances.], batch size: 36, lr: 4.46e-03, grad_scale: 8.0 2023-03-09 04:30:40,575 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=94337.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:30:57,580 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.293e+02 1.818e+02 2.150e+02 2.630e+02 4.839e+02, threshold=4.300e+02, percent-clipped=3.0 2023-03-09 04:30:58,029 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=94348.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:31:45,417 INFO [train2.py:809] (2/4) Epoch 24, batch 2750, loss[ctc_loss=0.04764, att_loss=0.2047, loss=0.1733, over 16183.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006029, over 41.00 utterances.], tot_loss[ctc_loss=0.06879, att_loss=0.233, loss=0.2002, over 3266007.40 frames. utt_duration=1239 frames, utt_pad_proportion=0.05905, over 10555.90 utterances.], batch size: 41, lr: 4.46e-03, grad_scale: 8.0 2023-03-09 04:32:38,159 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=94409.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 04:32:39,042 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-09 04:33:06,661 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=94426.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:33:07,895 INFO [train2.py:809] (2/4) Epoch 24, batch 2800, loss[ctc_loss=0.07552, att_loss=0.2388, loss=0.2061, over 17288.00 frames. utt_duration=876.8 frames, utt_pad_proportion=0.08096, over 79.00 utterances.], tot_loss[ctc_loss=0.06953, att_loss=0.2338, loss=0.201, over 3270005.98 frames. 
utt_duration=1208 frames, utt_pad_proportion=0.06595, over 10841.71 utterances.], batch size: 79, lr: 4.46e-03, grad_scale: 8.0 2023-03-09 04:33:37,746 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1779, 5.3930, 5.4043, 5.3712, 5.4237, 5.4304, 5.0472, 4.9179], device='cuda:2'), covar=tensor([0.1070, 0.0598, 0.0375, 0.0582, 0.0310, 0.0322, 0.0445, 0.0304], device='cuda:2'), in_proj_covar=tensor([0.0529, 0.0369, 0.0359, 0.0369, 0.0434, 0.0442, 0.0368, 0.0400], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 04:33:40,431 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.912e+02 2.313e+02 2.793e+02 7.224e+02, threshold=4.627e+02, percent-clipped=4.0 2023-03-09 04:34:24,211 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=94474.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:34:28,881 INFO [train2.py:809] (2/4) Epoch 24, batch 2850, loss[ctc_loss=0.07225, att_loss=0.2352, loss=0.2026, over 16976.00 frames. utt_duration=687.5 frames, utt_pad_proportion=0.1374, over 99.00 utterances.], tot_loss[ctc_loss=0.06885, att_loss=0.2335, loss=0.2006, over 3261002.56 frames. utt_duration=1211 frames, utt_pad_proportion=0.06722, over 10789.02 utterances.], batch size: 99, lr: 4.46e-03, grad_scale: 8.0 2023-03-09 04:35:45,455 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. limit=2.0 2023-03-09 04:35:52,348 INFO [train2.py:809] (2/4) Epoch 24, batch 2900, loss[ctc_loss=0.09185, att_loss=0.2556, loss=0.2228, over 17041.00 frames. utt_duration=1287 frames, utt_pad_proportion=0.0105, over 53.00 utterances.], tot_loss[ctc_loss=0.06937, att_loss=0.2334, loss=0.2006, over 3261712.73 frames. utt_duration=1203 frames, utt_pad_proportion=0.06795, over 10859.01 utterances.], batch size: 53, lr: 4.46e-03, grad_scale: 8.0 2023-03-09 04:36:08,814 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0737, 5.3272, 5.3217, 5.2260, 5.3336, 5.3554, 4.9580, 4.8196], device='cuda:2'), covar=tensor([0.1010, 0.0566, 0.0287, 0.0516, 0.0303, 0.0312, 0.0433, 0.0326], device='cuda:2'), in_proj_covar=tensor([0.0529, 0.0370, 0.0360, 0.0370, 0.0434, 0.0443, 0.0369, 0.0400], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 04:36:26,073 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 1.878e+02 2.321e+02 2.758e+02 4.740e+02, threshold=4.642e+02, percent-clipped=1.0 2023-03-09 04:37:14,351 INFO [train2.py:809] (2/4) Epoch 24, batch 2950, loss[ctc_loss=0.04549, att_loss=0.221, loss=0.1859, over 16477.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.005261, over 46.00 utterances.], tot_loss[ctc_loss=0.06905, att_loss=0.2334, loss=0.2005, over 3270921.09 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.05751, over 10569.83 utterances.], batch size: 46, lr: 4.46e-03, grad_scale: 8.0 2023-03-09 04:37:17,809 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=94579.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 04:38:09,779 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0647, 4.1691, 3.9800, 3.9576, 4.4537, 4.1131, 3.8931, 2.5875], device='cuda:2'), covar=tensor([0.0306, 0.0395, 0.0461, 0.0387, 0.0738, 0.0267, 0.0424, 0.1616], device='cuda:2'), in_proj_covar=tensor([0.0175, 0.0202, 0.0201, 0.0215, 0.0372, 0.0171, 0.0189, 0.0215], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:38:15,459 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.59 vs. limit=5.0 2023-03-09 04:38:36,142 INFO [train2.py:809] (2/4) Epoch 24, batch 3000, loss[ctc_loss=0.06615, att_loss=0.2166, loss=0.1865, over 15777.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008992, over 38.00 utterances.], tot_loss[ctc_loss=0.06966, att_loss=0.2337, loss=0.2009, over 3263318.20 frames. utt_duration=1223 frames, utt_pad_proportion=0.0638, over 10688.41 utterances.], batch size: 38, lr: 4.45e-03, grad_scale: 8.0 2023-03-09 04:38:36,143 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 04:38:50,630 INFO [train2.py:843] (2/4) Epoch 24, validation: ctc_loss=0.04165, att_loss=0.2345, loss=0.196, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 04:38:50,631 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 04:39:06,750 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=94637.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:39:23,558 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.235e+02 2.097e+02 2.380e+02 3.047e+02 5.871e+02, threshold=4.759e+02, percent-clipped=2.0 2023-03-09 04:40:11,569 INFO [train2.py:809] (2/4) Epoch 24, batch 3050, loss[ctc_loss=0.07561, att_loss=0.2388, loss=0.2062, over 16462.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.006734, over 46.00 utterances.], tot_loss[ctc_loss=0.06955, att_loss=0.2333, loss=0.2006, over 3260699.38 frames. utt_duration=1212 frames, utt_pad_proportion=0.06684, over 10778.59 utterances.], batch size: 46, lr: 4.45e-03, grad_scale: 8.0 2023-03-09 04:40:24,108 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=94685.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:40:54,714 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=94704.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 04:41:32,550 INFO [train2.py:809] (2/4) Epoch 24, batch 3100, loss[ctc_loss=0.09443, att_loss=0.2541, loss=0.2222, over 17265.00 frames. utt_duration=1257 frames, utt_pad_proportion=0.01419, over 55.00 utterances.], tot_loss[ctc_loss=0.06933, att_loss=0.2334, loss=0.2006, over 3266690.98 frames. utt_duration=1230 frames, utt_pad_proportion=0.06107, over 10637.54 utterances.], batch size: 55, lr: 4.45e-03, grad_scale: 8.0 2023-03-09 04:42:05,407 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.429e+02 1.910e+02 2.247e+02 2.795e+02 4.738e+02, threshold=4.495e+02, percent-clipped=0.0 2023-03-09 04:42:54,023 INFO [train2.py:809] (2/4) Epoch 24, batch 3150, loss[ctc_loss=0.09654, att_loss=0.2619, loss=0.2288, over 17050.00 frames. 
utt_duration=1313 frames, utt_pad_proportion=0.008076, over 52.00 utterances.], tot_loss[ctc_loss=0.06962, att_loss=0.234, loss=0.2011, over 3274429.26 frames. utt_duration=1225 frames, utt_pad_proportion=0.06013, over 10704.92 utterances.], batch size: 52, lr: 4.45e-03, grad_scale: 8.0 2023-03-09 04:43:01,856 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.35 vs. limit=5.0 2023-03-09 04:43:23,028 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1957, 4.3152, 4.5700, 4.8637, 2.9027, 4.5441, 3.0564, 1.6814], device='cuda:2'), covar=tensor([0.0438, 0.0295, 0.0532, 0.0173, 0.1368, 0.0210, 0.1167, 0.1649], device='cuda:2'), in_proj_covar=tensor([0.0202, 0.0174, 0.0262, 0.0167, 0.0222, 0.0160, 0.0230, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:44:15,930 INFO [train2.py:809] (2/4) Epoch 24, batch 3200, loss[ctc_loss=0.08438, att_loss=0.2564, loss=0.222, over 17299.00 frames. utt_duration=1100 frames, utt_pad_proportion=0.03599, over 63.00 utterances.], tot_loss[ctc_loss=0.06946, att_loss=0.2341, loss=0.2011, over 3279781.63 frames. utt_duration=1233 frames, utt_pad_proportion=0.05488, over 10652.60 utterances.], batch size: 63, lr: 4.45e-03, grad_scale: 8.0 2023-03-09 04:44:44,519 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2415, 3.8512, 3.2601, 3.5899, 4.0482, 3.7334, 3.1067, 4.3510], device='cuda:2'), covar=tensor([0.0879, 0.0510, 0.1017, 0.0607, 0.0648, 0.0663, 0.0842, 0.0450], device='cuda:2'), in_proj_covar=tensor([0.0205, 0.0223, 0.0226, 0.0205, 0.0283, 0.0245, 0.0203, 0.0292], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 04:44:48,824 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.290e+02 1.841e+02 2.191e+02 2.766e+02 6.449e+02, threshold=4.381e+02, percent-clipped=4.0 2023-03-09 04:45:14,011 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. limit=2.0 2023-03-09 04:45:36,260 INFO [train2.py:809] (2/4) Epoch 24, batch 3250, loss[ctc_loss=0.08408, att_loss=0.2526, loss=0.2189, over 17382.00 frames. utt_duration=881.5 frames, utt_pad_proportion=0.07791, over 79.00 utterances.], tot_loss[ctc_loss=0.06988, att_loss=0.2347, loss=0.2017, over 3285482.65 frames. utt_duration=1241 frames, utt_pad_proportion=0.0501, over 10604.15 utterances.], batch size: 79, lr: 4.45e-03, grad_scale: 8.0 2023-03-09 04:45:37,333 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.35 vs. limit=5.0 2023-03-09 04:45:39,724 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=94879.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 04:46:56,545 INFO [train2.py:809] (2/4) Epoch 24, batch 3300, loss[ctc_loss=0.09343, att_loss=0.2589, loss=0.2258, over 17304.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01158, over 55.00 utterances.], tot_loss[ctc_loss=0.06992, att_loss=0.2346, loss=0.2017, over 3285138.35 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.05088, over 10584.90 utterances.], batch size: 55, lr: 4.45e-03, grad_scale: 8.0 2023-03-09 04:46:56,645 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=94927.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 04:47:29,073 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.857e+02 2.212e+02 2.661e+02 5.199e+02, threshold=4.424e+02, percent-clipped=2.0 2023-03-09 04:48:16,306 INFO [train2.py:809] (2/4) Epoch 24, batch 3350, loss[ctc_loss=0.07313, att_loss=0.2383, loss=0.2053, over 17016.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008123, over 51.00 utterances.], tot_loss[ctc_loss=0.06928, att_loss=0.2348, loss=0.2017, over 3286598.02 frames. utt_duration=1238 frames, utt_pad_proportion=0.05274, over 10633.49 utterances.], batch size: 51, lr: 4.45e-03, grad_scale: 8.0 2023-03-09 04:48:59,850 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95004.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:49:36,980 INFO [train2.py:809] (2/4) Epoch 24, batch 3400, loss[ctc_loss=0.0545, att_loss=0.2378, loss=0.2011, over 17033.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.00784, over 51.00 utterances.], tot_loss[ctc_loss=0.06937, att_loss=0.2344, loss=0.2014, over 3283960.12 frames. utt_duration=1231 frames, utt_pad_proportion=0.05486, over 10680.07 utterances.], batch size: 51, lr: 4.45e-03, grad_scale: 8.0 2023-03-09 04:50:09,912 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.325e+02 1.881e+02 2.292e+02 2.864e+02 6.221e+02, threshold=4.584e+02, percent-clipped=2.0 2023-03-09 04:50:16,274 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=95052.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:50:56,263 INFO [train2.py:809] (2/4) Epoch 24, batch 3450, loss[ctc_loss=0.0667, att_loss=0.233, loss=0.1997, over 16329.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006186, over 45.00 utterances.], tot_loss[ctc_loss=0.07032, att_loss=0.235, loss=0.2021, over 3282543.61 frames. utt_duration=1214 frames, utt_pad_proportion=0.06071, over 10829.52 utterances.], batch size: 45, lr: 4.44e-03, grad_scale: 8.0 2023-03-09 04:51:02,165 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.55 vs. limit=2.0 2023-03-09 04:51:34,174 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.43 vs. limit=5.0 2023-03-09 04:51:41,248 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95105.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 04:52:16,795 INFO [train2.py:809] (2/4) Epoch 24, batch 3500, loss[ctc_loss=0.07662, att_loss=0.2411, loss=0.2082, over 17286.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01288, over 55.00 utterances.], tot_loss[ctc_loss=0.0703, att_loss=0.2351, loss=0.2021, over 3280281.87 frames. utt_duration=1207 frames, utt_pad_proportion=0.06327, over 10887.31 utterances.], batch size: 55, lr: 4.44e-03, grad_scale: 8.0 2023-03-09 04:52:49,703 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.240e+01 1.898e+02 2.336e+02 2.890e+02 6.917e+02, threshold=4.673e+02, percent-clipped=5.0 2023-03-09 04:53:18,840 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95166.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 04:53:35,725 INFO [train2.py:809] (2/4) Epoch 24, batch 3550, loss[ctc_loss=0.06847, att_loss=0.2347, loss=0.2015, over 15627.00 frames. 
utt_duration=1691 frames, utt_pad_proportion=0.009247, over 37.00 utterances.], tot_loss[ctc_loss=0.07013, att_loss=0.2346, loss=0.2017, over 3272819.61 frames. utt_duration=1216 frames, utt_pad_proportion=0.0627, over 10780.95 utterances.], batch size: 37, lr: 4.44e-03, grad_scale: 8.0 2023-03-09 04:53:39,189 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3421, 2.7353, 3.6316, 2.7174, 3.5854, 4.5063, 4.3735, 3.2291], device='cuda:2'), covar=tensor([0.0375, 0.1752, 0.1127, 0.1495, 0.0940, 0.0674, 0.0524, 0.1137], device='cuda:2'), in_proj_covar=tensor([0.0250, 0.0248, 0.0287, 0.0222, 0.0270, 0.0375, 0.0268, 0.0235], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 04:54:08,508 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-09 04:54:18,963 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1476, 5.4530, 5.3968, 5.4066, 5.5120, 5.4525, 5.0997, 4.9292], device='cuda:2'), covar=tensor([0.1019, 0.0562, 0.0282, 0.0468, 0.0250, 0.0312, 0.0403, 0.0294], device='cuda:2'), in_proj_covar=tensor([0.0531, 0.0373, 0.0361, 0.0371, 0.0434, 0.0441, 0.0370, 0.0402], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 04:54:56,144 INFO [train2.py:809] (2/4) Epoch 24, batch 3600, loss[ctc_loss=0.08149, att_loss=0.2524, loss=0.2182, over 17053.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008752, over 52.00 utterances.], tot_loss[ctc_loss=0.06959, att_loss=0.2336, loss=0.2008, over 3273645.68 frames. utt_duration=1243 frames, utt_pad_proportion=0.05723, over 10550.00 utterances.], batch size: 52, lr: 4.44e-03, grad_scale: 8.0 2023-03-09 04:55:29,599 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.313e+02 1.952e+02 2.410e+02 2.837e+02 7.042e+02, threshold=4.821e+02, percent-clipped=2.0 2023-03-09 04:55:50,104 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95260.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:56:02,713 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95268.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:56:16,596 INFO [train2.py:809] (2/4) Epoch 24, batch 3650, loss[ctc_loss=0.07406, att_loss=0.2426, loss=0.2089, over 17035.00 frames. utt_duration=1287 frames, utt_pad_proportion=0.0106, over 53.00 utterances.], tot_loss[ctc_loss=0.069, att_loss=0.2331, loss=0.2003, over 3275225.95 frames. utt_duration=1261 frames, utt_pad_proportion=0.05146, over 10400.60 utterances.], batch size: 53, lr: 4.44e-03, grad_scale: 8.0 2023-03-09 04:56:29,458 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1372, 5.3960, 5.6875, 5.4526, 5.6141, 6.0847, 5.3467, 6.1243], device='cuda:2'), covar=tensor([0.0674, 0.0692, 0.0791, 0.1438, 0.1894, 0.0924, 0.0712, 0.0694], device='cuda:2'), in_proj_covar=tensor([0.0893, 0.0519, 0.0624, 0.0670, 0.0899, 0.0646, 0.0509, 0.0631], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 04:57:28,678 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95321.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:57:37,493 INFO [train2.py:809] (2/4) Epoch 24, batch 3700, loss[ctc_loss=0.06647, att_loss=0.214, loss=0.1845, over 15631.00 frames. 
utt_duration=1691 frames, utt_pad_proportion=0.009262, over 37.00 utterances.], tot_loss[ctc_loss=0.06874, att_loss=0.233, loss=0.2001, over 3275248.73 frames. utt_duration=1268 frames, utt_pad_proportion=0.0495, over 10345.62 utterances.], batch size: 37, lr: 4.44e-03, grad_scale: 8.0 2023-03-09 04:57:40,892 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95329.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:57:56,773 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95339.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 04:58:11,161 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.823e+02 2.114e+02 2.643e+02 5.105e+02, threshold=4.228e+02, percent-clipped=2.0 2023-03-09 04:58:58,651 INFO [train2.py:809] (2/4) Epoch 24, batch 3750, loss[ctc_loss=0.05389, att_loss=0.2068, loss=0.1762, over 15750.00 frames. utt_duration=1659 frames, utt_pad_proportion=0.008709, over 38.00 utterances.], tot_loss[ctc_loss=0.06922, att_loss=0.2334, loss=0.2005, over 3274632.17 frames. utt_duration=1247 frames, utt_pad_proportion=0.05498, over 10520.09 utterances.], batch size: 38, lr: 4.44e-03, grad_scale: 8.0 2023-03-09 04:59:19,361 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0412, 4.2874, 4.2701, 4.6322, 2.6593, 4.4048, 2.8385, 2.1494], device='cuda:2'), covar=tensor([0.0474, 0.0320, 0.0673, 0.0335, 0.1627, 0.0264, 0.1316, 0.1398], device='cuda:2'), in_proj_covar=tensor([0.0203, 0.0175, 0.0262, 0.0167, 0.0223, 0.0160, 0.0231, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 04:59:27,769 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.12 vs. limit=2.0 2023-03-09 04:59:35,868 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95400.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:00:08,386 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7984, 2.5208, 2.3398, 2.4962, 2.8583, 2.5447, 2.5442, 2.9125], device='cuda:2'), covar=tensor([0.1420, 0.2345, 0.1904, 0.1371, 0.1488, 0.1136, 0.1933, 0.1118], device='cuda:2'), in_proj_covar=tensor([0.0130, 0.0133, 0.0130, 0.0121, 0.0137, 0.0118, 0.0142, 0.0116], device='cuda:2'), out_proj_covar=tensor([9.9293e-05, 1.0456e-04, 1.0484e-04, 9.5001e-05, 1.0286e-04, 9.5279e-05, 1.0778e-04, 9.2274e-05], device='cuda:2') 2023-03-09 05:00:19,270 INFO [train2.py:809] (2/4) Epoch 24, batch 3800, loss[ctc_loss=0.06739, att_loss=0.2403, loss=0.2057, over 17018.00 frames. utt_duration=1311 frames, utt_pad_proportion=0.01083, over 52.00 utterances.], tot_loss[ctc_loss=0.06903, att_loss=0.2331, loss=0.2003, over 3266590.26 frames. utt_duration=1254 frames, utt_pad_proportion=0.0533, over 10433.34 utterances.], batch size: 52, lr: 4.44e-03, grad_scale: 8.0 2023-03-09 05:00:21,700 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-03-09 05:00:52,294 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.438e+02 1.978e+02 2.351e+02 3.015e+02 8.965e+02, threshold=4.702e+02, percent-clipped=6.0 2023-03-09 05:01:13,133 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95461.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 05:01:33,311 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.25 vs. 
limit=5.0 2023-03-09 05:01:38,906 INFO [train2.py:809] (2/4) Epoch 24, batch 3850, loss[ctc_loss=0.0716, att_loss=0.2337, loss=0.2013, over 16883.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.007532, over 49.00 utterances.], tot_loss[ctc_loss=0.06919, att_loss=0.2333, loss=0.2005, over 3271872.83 frames. utt_duration=1254 frames, utt_pad_proportion=0.05186, over 10449.67 utterances.], batch size: 49, lr: 4.43e-03, grad_scale: 8.0 2023-03-09 05:01:39,762 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.57 vs. limit=2.0 2023-03-09 05:01:57,658 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. limit=2.0 2023-03-09 05:02:54,838 INFO [train2.py:809] (2/4) Epoch 24, batch 3900, loss[ctc_loss=0.05522, att_loss=0.2198, loss=0.1869, over 16284.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.00696, over 43.00 utterances.], tot_loss[ctc_loss=0.06997, att_loss=0.2335, loss=0.2008, over 3270340.41 frames. utt_duration=1242 frames, utt_pad_proportion=0.05686, over 10542.61 utterances.], batch size: 43, lr: 4.43e-03, grad_scale: 8.0 2023-03-09 05:03:26,621 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.865e+02 2.329e+02 2.791e+02 6.016e+02, threshold=4.658e+02, percent-clipped=3.0 2023-03-09 05:04:10,398 INFO [train2.py:809] (2/4) Epoch 24, batch 3950, loss[ctc_loss=0.05165, att_loss=0.2229, loss=0.1886, over 16317.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.007068, over 45.00 utterances.], tot_loss[ctc_loss=0.0693, att_loss=0.2334, loss=0.2006, over 3263778.25 frames. utt_duration=1240 frames, utt_pad_proportion=0.05817, over 10539.40 utterances.], batch size: 45, lr: 4.43e-03, grad_scale: 8.0 2023-03-09 05:05:25,993 INFO [train2.py:809] (2/4) Epoch 25, batch 0, loss[ctc_loss=0.08185, att_loss=0.2272, loss=0.1981, over 15647.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008122, over 37.00 utterances.], tot_loss[ctc_loss=0.08185, att_loss=0.2272, loss=0.1981, over 15647.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008122, over 37.00 utterances.], batch size: 37, lr: 4.34e-03, grad_scale: 8.0 2023-03-09 05:05:25,993 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 05:05:38,245 INFO [train2.py:843] (2/4) Epoch 25, validation: ctc_loss=0.04004, att_loss=0.2344, loss=0.1955, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
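The combined loss values in these records are consistent with interpolating the CTC and attention losses using the att_rate of 0.8 given in the configuration at the top of this log, i.e. loss = (1 - att_rate) * ctc_loss + att_rate * att_loss. A minimal Python sketch of that relationship, assuming this interpolation and plugging in the Epoch 25 validation figures reported just above:

def combined_loss(ctc_loss: float, att_loss: float, att_rate: float = 0.8) -> float:
    # Interpolate the two objectives; att_rate is taken from the training configuration.
    return (1.0 - att_rate) * ctc_loss + att_rate * att_loss

# Epoch 25 validation record above: ctc_loss=0.04004, att_loss=0.2344, reported loss=0.1955
print(round(combined_loss(0.04004, 0.2344), 4))  # 0.1955

The "threshold" figures in the Clipping_scale=2.0 grad-norm records likewise appear to be the clipping scale times the middle of the five printed quartile values, e.g. 2.0 * 2.313e+02 ≈ 4.627e+02 in the record at 04:33:40 above.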
2023-03-09 05:05:38,246 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 05:05:46,427 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95616.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:05:48,117 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95617.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:05:58,895 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95624.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:06:36,951 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.264e+02 1.872e+02 2.311e+02 2.775e+02 5.959e+02, threshold=4.621e+02, percent-clipped=4.0 2023-03-09 05:06:54,924 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.3641, 2.9800, 2.6175, 2.7864, 3.1197, 3.0324, 2.3846, 2.9532], device='cuda:2'), covar=tensor([0.1066, 0.0419, 0.0911, 0.0632, 0.0686, 0.0605, 0.0896, 0.0483], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0225, 0.0228, 0.0207, 0.0286, 0.0245, 0.0204, 0.0295], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 05:06:57,666 INFO [train2.py:809] (2/4) Epoch 25, batch 50, loss[ctc_loss=0.08074, att_loss=0.2416, loss=0.2094, over 16458.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.00699, over 46.00 utterances.], tot_loss[ctc_loss=0.06846, att_loss=0.234, loss=0.2009, over 740131.96 frames. utt_duration=1273 frames, utt_pad_proportion=0.04392, over 2328.95 utterances.], batch size: 46, lr: 4.34e-03, grad_scale: 8.0 2023-03-09 05:07:25,631 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95678.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:07:51,592 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.74 vs. limit=2.0 2023-03-09 05:07:52,062 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95695.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:08:17,165 INFO [train2.py:809] (2/4) Epoch 25, batch 100, loss[ctc_loss=0.08012, att_loss=0.2538, loss=0.219, over 16883.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007635, over 49.00 utterances.], tot_loss[ctc_loss=0.06898, att_loss=0.2331, loss=0.2003, over 1298438.76 frames. utt_duration=1272 frames, utt_pad_proportion=0.04952, over 4089.15 utterances.], batch size: 49, lr: 4.34e-03, grad_scale: 8.0 2023-03-09 05:09:15,973 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.395e+02 1.992e+02 2.344e+02 2.996e+02 8.490e+02, threshold=4.688e+02, percent-clipped=4.0 2023-03-09 05:09:36,922 INFO [train2.py:809] (2/4) Epoch 25, batch 150, loss[ctc_loss=0.06643, att_loss=0.2332, loss=0.1998, over 16534.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006677, over 45.00 utterances.], tot_loss[ctc_loss=0.06964, att_loss=0.2336, loss=0.2008, over 1738874.68 frames. utt_duration=1236 frames, utt_pad_proportion=0.05813, over 5634.89 utterances.], batch size: 45, lr: 4.34e-03, grad_scale: 16.0 2023-03-09 05:09:37,287 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95761.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 05:10:55,139 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=95809.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 05:10:58,462 INFO [train2.py:809] (2/4) Epoch 25, batch 200, loss[ctc_loss=0.07548, att_loss=0.2244, loss=0.1946, over 16010.00 frames. 
utt_duration=1602 frames, utt_pad_proportion=0.007156, over 40.00 utterances.], tot_loss[ctc_loss=0.06983, att_loss=0.2336, loss=0.2008, over 2086450.99 frames. utt_duration=1239 frames, utt_pad_proportion=0.05395, over 6744.39 utterances.], batch size: 40, lr: 4.34e-03, grad_scale: 16.0 2023-03-09 05:11:00,467 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95812.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 05:11:56,672 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95847.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:11:57,809 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.266e+02 1.872e+02 2.363e+02 3.018e+02 4.508e+02, threshold=4.726e+02, percent-clipped=0.0 2023-03-09 05:12:18,652 INFO [train2.py:809] (2/4) Epoch 25, batch 250, loss[ctc_loss=0.08472, att_loss=0.2563, loss=0.222, over 17010.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007677, over 51.00 utterances.], tot_loss[ctc_loss=0.07075, att_loss=0.2344, loss=0.2017, over 2344712.69 frames. utt_duration=1188 frames, utt_pad_proportion=0.06731, over 7903.37 utterances.], batch size: 51, lr: 4.33e-03, grad_scale: 16.0 2023-03-09 05:12:38,741 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95873.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 05:13:35,004 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95908.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 05:13:39,392 INFO [train2.py:809] (2/4) Epoch 25, batch 300, loss[ctc_loss=0.06232, att_loss=0.2082, loss=0.179, over 15639.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.009399, over 37.00 utterances.], tot_loss[ctc_loss=0.07004, att_loss=0.2335, loss=0.2008, over 2545767.69 frames. utt_duration=1200 frames, utt_pad_proportion=0.06531, over 8498.30 utterances.], batch size: 37, lr: 4.33e-03, grad_scale: 16.0 2023-03-09 05:13:48,193 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95916.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:14:00,739 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95924.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:14:38,794 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.855e+02 2.204e+02 2.784e+02 5.954e+02, threshold=4.408e+02, percent-clipped=3.0 2023-03-09 05:14:59,614 INFO [train2.py:809] (2/4) Epoch 25, batch 350, loss[ctc_loss=0.05879, att_loss=0.2223, loss=0.1896, over 16400.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006941, over 44.00 utterances.], tot_loss[ctc_loss=0.06885, att_loss=0.2334, loss=0.2005, over 2707944.72 frames. 
utt_duration=1229 frames, utt_pad_proportion=0.05852, over 8824.15 utterances.], batch size: 44, lr: 4.33e-03, grad_scale: 16.0 2023-03-09 05:15:04,977 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=95964.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:15:17,325 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=95972.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:15:19,649 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95973.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:15:30,463 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95980.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:15:54,767 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95995.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:16:24,359 INFO [train2.py:809] (2/4) Epoch 25, batch 400, loss[ctc_loss=0.05495, att_loss=0.2188, loss=0.186, over 16117.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006114, over 42.00 utterances.], tot_loss[ctc_loss=0.06797, att_loss=0.2326, loss=0.1997, over 2834283.18 frames. utt_duration=1267 frames, utt_pad_proportion=0.04944, over 8960.58 utterances.], batch size: 42, lr: 4.33e-03, grad_scale: 16.0 2023-03-09 05:17:12,349 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96041.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:17:15,168 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96043.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:17:22,748 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.291e+02 1.820e+02 2.178e+02 2.552e+02 4.436e+02, threshold=4.357e+02, percent-clipped=1.0 2023-03-09 05:17:43,677 INFO [train2.py:809] (2/4) Epoch 25, batch 450, loss[ctc_loss=0.09127, att_loss=0.2478, loss=0.2165, over 17126.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.0145, over 56.00 utterances.], tot_loss[ctc_loss=0.06742, att_loss=0.2319, loss=0.199, over 2929467.84 frames. utt_duration=1281 frames, utt_pad_proportion=0.04685, over 9156.44 utterances.], batch size: 56, lr: 4.33e-03, grad_scale: 16.0 2023-03-09 05:18:14,123 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96080.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:19:03,652 INFO [train2.py:809] (2/4) Epoch 25, batch 500, loss[ctc_loss=0.06207, att_loss=0.2382, loss=0.203, over 17355.00 frames. utt_duration=1008 frames, utt_pad_proportion=0.05032, over 69.00 utterances.], tot_loss[ctc_loss=0.06793, att_loss=0.2322, loss=0.1994, over 3007404.29 frames. 
utt_duration=1280 frames, utt_pad_proportion=0.04742, over 9407.32 utterances.], batch size: 69, lr: 4.33e-03, grad_scale: 16.0 2023-03-09 05:19:24,216 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1285, 5.4215, 5.0712, 5.4799, 4.8471, 5.0576, 5.5904, 5.2901], device='cuda:2'), covar=tensor([0.0561, 0.0262, 0.0678, 0.0306, 0.0422, 0.0255, 0.0218, 0.0194], device='cuda:2'), in_proj_covar=tensor([0.0397, 0.0330, 0.0372, 0.0361, 0.0331, 0.0244, 0.0313, 0.0293], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 05:19:50,649 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96141.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:20:00,864 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.348e+02 1.791e+02 2.138e+02 2.685e+02 5.642e+02, threshold=4.276e+02, percent-clipped=2.0 2023-03-09 05:20:21,841 INFO [train2.py:809] (2/4) Epoch 25, batch 550, loss[ctc_loss=0.06183, att_loss=0.2419, loss=0.2059, over 17128.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01445, over 56.00 utterances.], tot_loss[ctc_loss=0.06861, att_loss=0.2325, loss=0.1997, over 3064878.52 frames. utt_duration=1287 frames, utt_pad_proportion=0.04617, over 9538.53 utterances.], batch size: 56, lr: 4.33e-03, grad_scale: 16.0 2023-03-09 05:20:28,369 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96165.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:20:32,705 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96168.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 05:21:27,373 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96203.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 05:21:39,962 INFO [train2.py:809] (2/4) Epoch 25, batch 600, loss[ctc_loss=0.05514, att_loss=0.2207, loss=0.1876, over 16181.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006259, over 41.00 utterances.], tot_loss[ctc_loss=0.06898, att_loss=0.2318, loss=0.1992, over 3100762.67 frames. utt_duration=1269 frames, utt_pad_proportion=0.054, over 9788.24 utterances.], batch size: 41, lr: 4.33e-03, grad_scale: 16.0 2023-03-09 05:22:03,775 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96226.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:22:37,889 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.13 vs. limit=2.0 2023-03-09 05:22:38,437 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.936e+02 2.176e+02 2.702e+02 4.909e+02, threshold=4.353e+02, percent-clipped=2.0 2023-03-09 05:22:59,847 INFO [train2.py:809] (2/4) Epoch 25, batch 650, loss[ctc_loss=0.06439, att_loss=0.2403, loss=0.2051, over 16482.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.006428, over 46.00 utterances.], tot_loss[ctc_loss=0.0683, att_loss=0.2319, loss=0.1992, over 3134537.80 frames. utt_duration=1250 frames, utt_pad_proportion=0.05744, over 10038.99 utterances.], batch size: 46, lr: 4.33e-03, grad_scale: 16.0 2023-03-09 05:23:19,361 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96273.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:23:24,488 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.29 vs. 
limit=5.0 2023-03-09 05:24:19,883 INFO [train2.py:809] (2/4) Epoch 25, batch 700, loss[ctc_loss=0.07534, att_loss=0.2171, loss=0.1888, over 12363.00 frames. utt_duration=1833 frames, utt_pad_proportion=0.03942, over 27.00 utterances.], tot_loss[ctc_loss=0.06803, att_loss=0.232, loss=0.1992, over 3170698.19 frames. utt_duration=1257 frames, utt_pad_proportion=0.05221, over 10104.60 utterances.], batch size: 27, lr: 4.32e-03, grad_scale: 16.0 2023-03-09 05:24:36,213 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96321.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:24:40,971 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96324.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:24:59,819 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96336.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:25:18,101 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.804e+02 2.150e+02 2.706e+02 4.503e+02, threshold=4.300e+02, percent-clipped=0.0 2023-03-09 05:25:38,752 INFO [train2.py:809] (2/4) Epoch 25, batch 750, loss[ctc_loss=0.04712, att_loss=0.1963, loss=0.1665, over 14567.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.04023, over 32.00 utterances.], tot_loss[ctc_loss=0.06825, att_loss=0.2322, loss=0.1994, over 3184226.44 frames. utt_duration=1255 frames, utt_pad_proportion=0.05438, over 10162.28 utterances.], batch size: 32, lr: 4.32e-03, grad_scale: 16.0 2023-03-09 05:26:00,269 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1881, 4.3086, 4.3345, 4.3144, 4.8801, 4.2101, 4.2543, 2.4089], device='cuda:2'), covar=tensor([0.0301, 0.0404, 0.0388, 0.0380, 0.0779, 0.0296, 0.0405, 0.1790], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0206, 0.0203, 0.0221, 0.0378, 0.0175, 0.0193, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:26:16,457 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2023-03-09 05:26:17,567 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96385.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:26:25,361 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96390.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:26:58,846 INFO [train2.py:809] (2/4) Epoch 25, batch 800, loss[ctc_loss=0.089, att_loss=0.2481, loss=0.2163, over 17285.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01286, over 55.00 utterances.], tot_loss[ctc_loss=0.06948, att_loss=0.2332, loss=0.2005, over 3204220.86 frames. 
utt_duration=1238 frames, utt_pad_proportion=0.05838, over 10364.40 utterances.], batch size: 55, lr: 4.32e-03, grad_scale: 16.0 2023-03-09 05:27:38,851 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96436.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:27:48,602 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5324, 2.6468, 5.0392, 3.9797, 3.0485, 4.3055, 4.7896, 4.7226], device='cuda:2'), covar=tensor([0.0333, 0.1505, 0.0222, 0.0822, 0.1610, 0.0270, 0.0184, 0.0284], device='cuda:2'), in_proj_covar=tensor([0.0213, 0.0242, 0.0206, 0.0318, 0.0265, 0.0225, 0.0194, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:27:54,819 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9474, 4.4059, 4.4224, 4.6384, 2.8377, 4.4130, 2.9723, 1.8300], device='cuda:2'), covar=tensor([0.0496, 0.0284, 0.0611, 0.0242, 0.1486, 0.0235, 0.1225, 0.1622], device='cuda:2'), in_proj_covar=tensor([0.0206, 0.0177, 0.0264, 0.0171, 0.0224, 0.0162, 0.0232, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:27:57,436 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.925e+02 2.269e+02 2.805e+02 5.385e+02, threshold=4.538e+02, percent-clipped=5.0 2023-03-09 05:28:02,627 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96451.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:28:19,551 INFO [train2.py:809] (2/4) Epoch 25, batch 850, loss[ctc_loss=0.06581, att_loss=0.2434, loss=0.2078, over 17055.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008766, over 52.00 utterances.], tot_loss[ctc_loss=0.06934, att_loss=0.2335, loss=0.2007, over 3221100.28 frames. utt_duration=1230 frames, utt_pad_proportion=0.06062, over 10484.27 utterances.], batch size: 52, lr: 4.32e-03, grad_scale: 16.0 2023-03-09 05:28:31,891 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96468.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 05:29:28,028 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96503.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:29:41,365 INFO [train2.py:809] (2/4) Epoch 25, batch 900, loss[ctc_loss=0.09517, att_loss=0.2452, loss=0.2152, over 14245.00 frames. utt_duration=394.4 frames, utt_pad_proportion=0.3152, over 145.00 utterances.], tot_loss[ctc_loss=0.06965, att_loss=0.234, loss=0.2011, over 3236005.86 frames. utt_duration=1205 frames, utt_pad_proportion=0.06622, over 10758.72 utterances.], batch size: 145, lr: 4.32e-03, grad_scale: 16.0 2023-03-09 05:29:49,300 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96516.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 05:29:57,210 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96521.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:30:40,145 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.401e+02 1.959e+02 2.352e+02 2.820e+02 9.770e+02, threshold=4.704e+02, percent-clipped=4.0 2023-03-09 05:30:45,075 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96551.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:31:03,282 INFO [train2.py:809] (2/4) Epoch 25, batch 950, loss[ctc_loss=0.08287, att_loss=0.2442, loss=0.2119, over 17300.00 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.01139, over 55.00 utterances.], tot_loss[ctc_loss=0.07046, att_loss=0.2344, loss=0.2016, over 3238171.33 frames. utt_duration=1191 frames, utt_pad_proportion=0.07169, over 10891.01 utterances.], batch size: 55, lr: 4.32e-03, grad_scale: 16.0 2023-03-09 05:31:27,073 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2488, 4.3652, 4.3790, 4.3340, 4.9028, 4.3556, 4.3148, 2.5571], device='cuda:2'), covar=tensor([0.0323, 0.0469, 0.0437, 0.0399, 0.0721, 0.0300, 0.0400, 0.1796], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0207, 0.0204, 0.0223, 0.0380, 0.0177, 0.0195, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:32:25,902 INFO [train2.py:809] (2/4) Epoch 25, batch 1000, loss[ctc_loss=0.06768, att_loss=0.2571, loss=0.2192, over 17347.00 frames. utt_duration=1103 frames, utt_pad_proportion=0.03595, over 63.00 utterances.], tot_loss[ctc_loss=0.0703, att_loss=0.2346, loss=0.2017, over 3246134.16 frames. utt_duration=1214 frames, utt_pad_proportion=0.06476, over 10705.63 utterances.], batch size: 63, lr: 4.32e-03, grad_scale: 16.0 2023-03-09 05:33:04,376 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.80 vs. limit=2.0 2023-03-09 05:33:05,491 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96636.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:33:07,061 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2273, 3.8149, 3.3150, 3.5722, 4.0818, 3.6842, 3.1929, 4.4020], device='cuda:2'), covar=tensor([0.0924, 0.0506, 0.1035, 0.0618, 0.0691, 0.0775, 0.0794, 0.0420], device='cuda:2'), in_proj_covar=tensor([0.0206, 0.0224, 0.0227, 0.0207, 0.0287, 0.0245, 0.0203, 0.0294], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 05:33:23,592 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.244e+02 1.809e+02 2.155e+02 2.476e+02 5.052e+02, threshold=4.310e+02, percent-clipped=2.0 2023-03-09 05:33:40,883 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7643, 6.0583, 5.5345, 5.7996, 5.7035, 5.2796, 5.4524, 5.3314], device='cuda:2'), covar=tensor([0.1344, 0.0843, 0.0960, 0.0817, 0.1027, 0.1510, 0.2295, 0.2378], device='cuda:2'), in_proj_covar=tensor([0.0542, 0.0625, 0.0476, 0.0468, 0.0442, 0.0478, 0.0626, 0.0539], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 05:33:46,084 INFO [train2.py:809] (2/4) Epoch 25, batch 1050, loss[ctc_loss=0.04478, att_loss=0.214, loss=0.1802, over 16122.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005791, over 42.00 utterances.], tot_loss[ctc_loss=0.06891, att_loss=0.2333, loss=0.2004, over 3248200.19 frames. utt_duration=1231 frames, utt_pad_proportion=0.06187, over 10567.08 utterances.], batch size: 42, lr: 4.32e-03, grad_scale: 16.0 2023-03-09 05:34:17,067 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96680.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:34:23,046 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96684.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:35:06,973 INFO [train2.py:809] (2/4) Epoch 25, batch 1100, loss[ctc_loss=0.06655, att_loss=0.2479, loss=0.2116, over 17336.00 frames. 
utt_duration=1102 frames, utt_pad_proportion=0.03569, over 63.00 utterances.], tot_loss[ctc_loss=0.06912, att_loss=0.2335, loss=0.2007, over 3256005.38 frames. utt_duration=1238 frames, utt_pad_proportion=0.05946, over 10535.80 utterances.], batch size: 63, lr: 4.32e-03, grad_scale: 16.0 2023-03-09 05:35:47,569 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96736.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:36:03,172 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96746.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:36:06,428 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.271e+02 2.003e+02 2.274e+02 2.730e+02 5.677e+02, threshold=4.548e+02, percent-clipped=3.0 2023-03-09 05:36:09,968 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5304, 2.8063, 4.8780, 4.0064, 3.1093, 4.3560, 4.8062, 4.6283], device='cuda:2'), covar=tensor([0.0263, 0.1395, 0.0273, 0.0872, 0.1562, 0.0247, 0.0180, 0.0265], device='cuda:2'), in_proj_covar=tensor([0.0212, 0.0241, 0.0205, 0.0317, 0.0263, 0.0224, 0.0193, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:36:19,034 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.13 vs. limit=2.0 2023-03-09 05:36:28,284 INFO [train2.py:809] (2/4) Epoch 25, batch 1150, loss[ctc_loss=0.0684, att_loss=0.2248, loss=0.1935, over 16008.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007234, over 40.00 utterances.], tot_loss[ctc_loss=0.06958, att_loss=0.2342, loss=0.2012, over 3252164.12 frames. utt_duration=1214 frames, utt_pad_proportion=0.06583, over 10727.64 utterances.], batch size: 40, lr: 4.31e-03, grad_scale: 16.0 2023-03-09 05:36:53,004 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96776.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:37:05,056 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96784.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:37:16,120 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0023, 5.0509, 4.7713, 2.8324, 4.8273, 4.6457, 4.3484, 2.8592], device='cuda:2'), covar=tensor([0.0116, 0.0099, 0.0300, 0.1097, 0.0105, 0.0204, 0.0304, 0.1318], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0104, 0.0107, 0.0111, 0.0087, 0.0114, 0.0100, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 05:37:47,485 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0791, 3.7807, 3.7093, 3.2149, 3.7599, 3.9137, 3.8279, 2.8201], device='cuda:2'), covar=tensor([0.1035, 0.1056, 0.1516, 0.3337, 0.2905, 0.1579, 0.1114, 0.3307], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0196, 0.0210, 0.0264, 0.0173, 0.0271, 0.0195, 0.0222], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:37:48,674 INFO [train2.py:809] (2/4) Epoch 25, batch 1200, loss[ctc_loss=0.06623, att_loss=0.2162, loss=0.1862, over 15877.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.009814, over 39.00 utterances.], tot_loss[ctc_loss=0.06978, att_loss=0.2343, loss=0.2014, over 3257285.60 frames. 
utt_duration=1208 frames, utt_pad_proportion=0.06714, over 10800.01 utterances.], batch size: 39, lr: 4.31e-03, grad_scale: 16.0 2023-03-09 05:38:04,276 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1060, 5.1295, 4.9597, 3.2429, 4.9949, 4.6863, 4.5436, 2.9878], device='cuda:2'), covar=tensor([0.0137, 0.0096, 0.0264, 0.0895, 0.0095, 0.0196, 0.0268, 0.1249], device='cuda:2'), in_proj_covar=tensor([0.0077, 0.0104, 0.0106, 0.0111, 0.0087, 0.0115, 0.0100, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 05:38:05,780 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96821.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:38:19,638 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1548, 3.8064, 3.7355, 3.2582, 3.8254, 3.8881, 3.8544, 2.8481], device='cuda:2'), covar=tensor([0.0989, 0.1075, 0.2096, 0.3253, 0.1765, 0.3006, 0.0811, 0.3004], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0195, 0.0209, 0.0263, 0.0173, 0.0270, 0.0195, 0.0221], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:38:30,788 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96837.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:38:48,489 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.354e+02 2.088e+02 2.358e+02 2.925e+02 6.715e+02, threshold=4.715e+02, percent-clipped=3.0 2023-03-09 05:39:08,866 INFO [train2.py:809] (2/4) Epoch 25, batch 1250, loss[ctc_loss=0.05601, att_loss=0.2115, loss=0.1804, over 15958.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006113, over 41.00 utterances.], tot_loss[ctc_loss=0.06916, att_loss=0.2332, loss=0.2004, over 3258325.75 frames. utt_duration=1221 frames, utt_pad_proportion=0.06318, over 10683.33 utterances.], batch size: 41, lr: 4.31e-03, grad_scale: 16.0 2023-03-09 05:39:22,155 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96869.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:39:56,813 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.95 vs. limit=5.0 2023-03-09 05:40:02,563 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9277, 5.1155, 5.6026, 5.0864, 5.0704, 5.6948, 5.1194, 5.7772], device='cuda:2'), covar=tensor([0.1269, 0.1466, 0.1205, 0.2647, 0.3335, 0.1729, 0.1295, 0.1336], device='cuda:2'), in_proj_covar=tensor([0.0905, 0.0522, 0.0632, 0.0677, 0.0906, 0.0656, 0.0515, 0.0642], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 05:40:28,118 INFO [train2.py:809] (2/4) Epoch 25, batch 1300, loss[ctc_loss=0.07869, att_loss=0.2441, loss=0.211, over 16472.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.007196, over 46.00 utterances.], tot_loss[ctc_loss=0.06851, att_loss=0.233, loss=0.2001, over 3269567.24 frames. utt_duration=1230 frames, utt_pad_proportion=0.05825, over 10647.97 utterances.], batch size: 46, lr: 4.31e-03, grad_scale: 16.0 2023-03-09 05:41:27,105 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.796e+02 2.102e+02 2.487e+02 3.780e+02, threshold=4.203e+02, percent-clipped=0.0 2023-03-09 05:41:47,817 INFO [train2.py:809] (2/4) Epoch 25, batch 1350, loss[ctc_loss=0.07911, att_loss=0.2396, loss=0.2075, over 17468.00 frames. 
utt_duration=885.9 frames, utt_pad_proportion=0.07042, over 79.00 utterances.], tot_loss[ctc_loss=0.06859, att_loss=0.2328, loss=0.2, over 3263594.79 frames. utt_duration=1221 frames, utt_pad_proportion=0.06226, over 10703.69 utterances.], batch size: 79, lr: 4.31e-03, grad_scale: 16.0 2023-03-09 05:42:18,038 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96980.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:42:51,020 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97000.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:43:07,413 INFO [train2.py:809] (2/4) Epoch 25, batch 1400, loss[ctc_loss=0.07724, att_loss=0.24, loss=0.2075, over 16463.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.00687, over 46.00 utterances.], tot_loss[ctc_loss=0.06901, att_loss=0.2331, loss=0.2003, over 3256202.22 frames. utt_duration=1186 frames, utt_pad_proportion=0.07318, over 11000.52 utterances.], batch size: 46, lr: 4.31e-03, grad_scale: 16.0 2023-03-09 05:43:17,828 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.75 vs. limit=2.0 2023-03-09 05:43:34,008 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97028.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:44:02,826 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97046.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:44:05,389 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.953e+02 2.309e+02 2.789e+02 6.785e+02, threshold=4.619e+02, percent-clipped=7.0 2023-03-09 05:44:14,918 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.13 vs. limit=2.0 2023-03-09 05:44:18,997 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4513, 4.5269, 4.6682, 4.5812, 5.1466, 4.4655, 4.5721, 2.7208], device='cuda:2'), covar=tensor([0.0263, 0.0359, 0.0303, 0.0320, 0.0699, 0.0264, 0.0323, 0.1704], device='cuda:2'), in_proj_covar=tensor([0.0178, 0.0205, 0.0203, 0.0221, 0.0378, 0.0176, 0.0194, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:44:26,316 INFO [train2.py:809] (2/4) Epoch 25, batch 1450, loss[ctc_loss=0.04787, att_loss=0.1979, loss=0.1679, over 15628.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.00989, over 37.00 utterances.], tot_loss[ctc_loss=0.06923, att_loss=0.2327, loss=0.2, over 3251952.48 frames. 
utt_duration=1200 frames, utt_pad_proportion=0.07283, over 10850.42 utterances.], batch size: 37, lr: 4.31e-03, grad_scale: 16.0 2023-03-09 05:44:27,316 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97061.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:44:45,743 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0576, 4.4342, 4.3720, 4.5842, 3.0248, 4.4322, 2.6383, 1.7374], device='cuda:2'), covar=tensor([0.0510, 0.0285, 0.0700, 0.0254, 0.1445, 0.0226, 0.1575, 0.1766], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0177, 0.0264, 0.0170, 0.0223, 0.0162, 0.0232, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:45:17,670 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97094.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:45:31,098 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0744, 5.3963, 4.8710, 5.4740, 4.8353, 5.0653, 5.5032, 5.2807], device='cuda:2'), covar=tensor([0.0613, 0.0296, 0.0854, 0.0328, 0.0390, 0.0237, 0.0243, 0.0206], device='cuda:2'), in_proj_covar=tensor([0.0400, 0.0331, 0.0371, 0.0363, 0.0330, 0.0244, 0.0315, 0.0293], device='cuda:2'), out_proj_covar=tensor([0.0007, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 05:45:44,472 INFO [train2.py:809] (2/4) Epoch 25, batch 1500, loss[ctc_loss=0.04815, att_loss=0.2249, loss=0.1896, over 16532.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006021, over 45.00 utterances.], tot_loss[ctc_loss=0.06862, att_loss=0.2327, loss=0.1999, over 3257251.40 frames. utt_duration=1223 frames, utt_pad_proportion=0.06551, over 10664.35 utterances.], batch size: 45, lr: 4.31e-03, grad_scale: 16.0 2023-03-09 05:45:52,951 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6095, 5.0471, 4.8003, 4.9715, 5.1283, 4.7054, 3.6894, 5.0276], device='cuda:2'), covar=tensor([0.0119, 0.0116, 0.0135, 0.0098, 0.0092, 0.0126, 0.0651, 0.0159], device='cuda:2'), in_proj_covar=tensor([0.0096, 0.0093, 0.0115, 0.0072, 0.0079, 0.0089, 0.0106, 0.0111], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 05:46:15,852 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1148, 5.3499, 5.2579, 5.2178, 5.3965, 5.3742, 4.9905, 4.8665], device='cuda:2'), covar=tensor([0.1038, 0.0514, 0.0344, 0.0537, 0.0262, 0.0318, 0.0417, 0.0329], device='cuda:2'), in_proj_covar=tensor([0.0536, 0.0375, 0.0364, 0.0374, 0.0436, 0.0442, 0.0370, 0.0406], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 05:46:17,397 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97132.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:46:43,660 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.190e+02 2.011e+02 2.273e+02 2.604e+02 5.523e+02, threshold=4.546e+02, percent-clipped=2.0 2023-03-09 05:46:48,645 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0405, 4.4139, 4.3586, 4.5785, 2.8430, 4.3991, 2.7979, 1.4971], device='cuda:2'), covar=tensor([0.0508, 0.0253, 0.0737, 0.0249, 0.1555, 0.0236, 0.1461, 0.1814], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0176, 0.0263, 0.0170, 0.0222, 0.0162, 0.0231, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 
0.0002], device='cuda:2') 2023-03-09 05:46:59,351 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4509, 3.9129, 3.4330, 3.6658, 4.1248, 3.8573, 3.3841, 4.4766], device='cuda:2'), covar=tensor([0.0899, 0.0499, 0.1056, 0.0675, 0.0739, 0.0695, 0.0766, 0.0607], device='cuda:2'), in_proj_covar=tensor([0.0205, 0.0223, 0.0227, 0.0206, 0.0287, 0.0245, 0.0202, 0.0294], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 05:47:03,717 INFO [train2.py:809] (2/4) Epoch 25, batch 1550, loss[ctc_loss=0.05346, att_loss=0.2126, loss=0.1808, over 15953.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007213, over 41.00 utterances.], tot_loss[ctc_loss=0.06893, att_loss=0.2329, loss=0.2001, over 3267761.23 frames. utt_duration=1234 frames, utt_pad_proportion=0.06017, over 10605.26 utterances.], batch size: 41, lr: 4.31e-03, grad_scale: 16.0 2023-03-09 05:47:17,496 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2023-03-09 05:48:24,492 INFO [train2.py:809] (2/4) Epoch 25, batch 1600, loss[ctc_loss=0.05799, att_loss=0.2063, loss=0.1767, over 15374.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.008235, over 35.00 utterances.], tot_loss[ctc_loss=0.06929, att_loss=0.2337, loss=0.2008, over 3273359.84 frames. utt_duration=1230 frames, utt_pad_proportion=0.05966, over 10653.92 utterances.], batch size: 35, lr: 4.30e-03, grad_scale: 16.0 2023-03-09 05:49:22,652 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.291e+02 1.951e+02 2.275e+02 2.852e+02 6.769e+02, threshold=4.549e+02, percent-clipped=7.0 2023-03-09 05:49:42,313 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97260.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:49:43,459 INFO [train2.py:809] (2/4) Epoch 25, batch 1650, loss[ctc_loss=0.07456, att_loss=0.2501, loss=0.215, over 17298.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01198, over 55.00 utterances.], tot_loss[ctc_loss=0.06916, att_loss=0.2338, loss=0.2009, over 3279286.15 frames. utt_duration=1234 frames, utt_pad_proportion=0.05668, over 10643.88 utterances.], batch size: 55, lr: 4.30e-03, grad_scale: 16.0 2023-03-09 05:51:02,811 INFO [train2.py:809] (2/4) Epoch 25, batch 1700, loss[ctc_loss=0.09258, att_loss=0.2509, loss=0.2192, over 17281.00 frames. utt_duration=1173 frames, utt_pad_proportion=0.02571, over 59.00 utterances.], tot_loss[ctc_loss=0.06857, att_loss=0.2337, loss=0.2007, over 3283219.21 frames. 
utt_duration=1249 frames, utt_pad_proportion=0.05218, over 10528.71 utterances.], batch size: 59, lr: 4.30e-03, grad_scale: 16.0 2023-03-09 05:51:19,403 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97321.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:51:51,950 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97342.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:52:01,001 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97347.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:52:02,096 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.753e+02 2.134e+02 2.732e+02 7.784e+02, threshold=4.268e+02, percent-clipped=3.0 2023-03-09 05:52:15,781 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97356.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:52:23,289 INFO [train2.py:809] (2/4) Epoch 25, batch 1750, loss[ctc_loss=0.04947, att_loss=0.2154, loss=0.1822, over 15771.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.0084, over 38.00 utterances.], tot_loss[ctc_loss=0.06764, att_loss=0.2325, loss=0.1995, over 3275799.64 frames. utt_duration=1282 frames, utt_pad_proportion=0.046, over 10230.29 utterances.], batch size: 38, lr: 4.30e-03, grad_scale: 16.0 2023-03-09 05:52:44,636 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97374.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:53:31,663 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97403.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 05:53:39,242 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97408.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:53:44,074 INFO [train2.py:809] (2/4) Epoch 25, batch 1800, loss[ctc_loss=0.07396, att_loss=0.2348, loss=0.2026, over 16494.00 frames. utt_duration=1436 frames, utt_pad_proportion=0.00557, over 46.00 utterances.], tot_loss[ctc_loss=0.06769, att_loss=0.2324, loss=0.1994, over 3273123.13 frames. utt_duration=1251 frames, utt_pad_proportion=0.05335, over 10476.27 utterances.], batch size: 46, lr: 4.30e-03, grad_scale: 16.0 2023-03-09 05:54:16,748 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97432.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:54:21,370 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97435.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:54:22,981 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5007, 2.8422, 4.9100, 3.9408, 3.0261, 4.2439, 4.6939, 4.5494], device='cuda:2'), covar=tensor([0.0281, 0.1405, 0.0198, 0.0883, 0.1731, 0.0267, 0.0200, 0.0314], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0242, 0.0207, 0.0318, 0.0264, 0.0226, 0.0197, 0.0224], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:54:43,144 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.842e+02 2.140e+02 2.481e+02 6.088e+02, threshold=4.279e+02, percent-clipped=3.0 2023-03-09 05:55:03,728 INFO [train2.py:809] (2/4) Epoch 25, batch 1850, loss[ctc_loss=0.08932, att_loss=0.25, loss=0.2179, over 17305.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01169, over 55.00 utterances.], tot_loss[ctc_loss=0.06729, att_loss=0.2323, loss=0.1993, over 3278842.03 frames. 
utt_duration=1260 frames, utt_pad_proportion=0.04889, over 10424.87 utterances.], batch size: 55, lr: 4.30e-03, grad_scale: 16.0 2023-03-09 05:55:05,514 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7098, 5.1809, 4.9977, 5.1096, 5.2259, 4.8386, 3.8563, 5.1936], device='cuda:2'), covar=tensor([0.0122, 0.0091, 0.0148, 0.0073, 0.0091, 0.0106, 0.0570, 0.0154], device='cuda:2'), in_proj_covar=tensor([0.0096, 0.0092, 0.0115, 0.0072, 0.0079, 0.0089, 0.0106, 0.0111], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 05:55:33,380 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97480.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:56:22,707 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4577, 2.9630, 3.6208, 2.9928, 3.3797, 4.5967, 4.4184, 3.2892], device='cuda:2'), covar=tensor([0.0369, 0.1694, 0.1178, 0.1368, 0.1175, 0.0803, 0.0497, 0.1207], device='cuda:2'), in_proj_covar=tensor([0.0249, 0.0250, 0.0290, 0.0225, 0.0270, 0.0380, 0.0269, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 05:56:23,871 INFO [train2.py:809] (2/4) Epoch 25, batch 1900, loss[ctc_loss=0.06961, att_loss=0.2479, loss=0.2122, over 16878.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007126, over 49.00 utterances.], tot_loss[ctc_loss=0.06756, att_loss=0.2328, loss=0.1997, over 3284388.78 frames. utt_duration=1272 frames, utt_pad_proportion=0.04587, over 10341.82 utterances.], batch size: 49, lr: 4.30e-03, grad_scale: 16.0 2023-03-09 05:56:30,381 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7588, 2.2971, 5.1047, 4.1130, 3.2409, 4.5169, 4.8324, 4.8552], device='cuda:2'), covar=tensor([0.0173, 0.1467, 0.0136, 0.0718, 0.1364, 0.0164, 0.0139, 0.0180], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0243, 0.0207, 0.0319, 0.0264, 0.0226, 0.0197, 0.0224], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:57:23,539 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.351e+02 1.849e+02 2.257e+02 2.657e+02 5.129e+02, threshold=4.513e+02, percent-clipped=1.0 2023-03-09 05:57:35,274 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97555.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:57:44,859 INFO [train2.py:809] (2/4) Epoch 25, batch 1950, loss[ctc_loss=0.06682, att_loss=0.2265, loss=0.1945, over 15944.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007819, over 41.00 utterances.], tot_loss[ctc_loss=0.06731, att_loss=0.2324, loss=0.1994, over 3278910.73 frames. 
utt_duration=1282 frames, utt_pad_proportion=0.04478, over 10241.60 utterances.], batch size: 41, lr: 4.30e-03, grad_scale: 16.0 2023-03-09 05:57:48,516 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4816, 2.7487, 3.6354, 4.5454, 4.0291, 3.9136, 3.0140, 2.4608], device='cuda:2'), covar=tensor([0.0773, 0.2109, 0.0797, 0.0489, 0.0847, 0.0504, 0.1484, 0.2079], device='cuda:2'), in_proj_covar=tensor([0.0185, 0.0219, 0.0188, 0.0221, 0.0230, 0.0185, 0.0206, 0.0191], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:58:08,890 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9639, 3.7365, 3.6555, 3.1209, 3.7196, 3.7678, 3.7940, 2.7338], device='cuda:2'), covar=tensor([0.1270, 0.1132, 0.1537, 0.3579, 0.0979, 0.2143, 0.1051, 0.3487], device='cuda:2'), in_proj_covar=tensor([0.0192, 0.0200, 0.0213, 0.0268, 0.0176, 0.0276, 0.0199, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:58:40,548 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2734, 4.4575, 4.5638, 4.7502, 2.9861, 4.6519, 2.7373, 1.8223], device='cuda:2'), covar=tensor([0.0456, 0.0260, 0.0659, 0.0199, 0.1592, 0.0173, 0.1536, 0.1822], device='cuda:2'), in_proj_covar=tensor([0.0204, 0.0176, 0.0260, 0.0168, 0.0220, 0.0160, 0.0228, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:58:58,651 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3795, 4.5304, 4.7210, 4.5073, 5.2069, 4.3280, 4.4381, 2.6261], device='cuda:2'), covar=tensor([0.0315, 0.0385, 0.0299, 0.0425, 0.0746, 0.0336, 0.0420, 0.1844], device='cuda:2'), in_proj_covar=tensor([0.0181, 0.0207, 0.0205, 0.0224, 0.0382, 0.0179, 0.0196, 0.0221], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 05:59:06,987 INFO [train2.py:809] (2/4) Epoch 25, batch 2000, loss[ctc_loss=0.09635, att_loss=0.2515, loss=0.2205, over 17302.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02441, over 59.00 utterances.], tot_loss[ctc_loss=0.06679, att_loss=0.2318, loss=0.1988, over 3275539.58 frames. utt_duration=1311 frames, utt_pad_proportion=0.0386, over 10007.79 utterances.], batch size: 59, lr: 4.30e-03, grad_scale: 16.0 2023-03-09 05:59:15,364 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97616.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 05:59:15,549 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97616.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:00:08,128 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.275e+02 1.752e+02 2.147e+02 2.537e+02 5.924e+02, threshold=4.294e+02, percent-clipped=3.0 2023-03-09 06:00:21,289 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97656.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:00:25,135 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2023-03-09 06:00:29,817 INFO [train2.py:809] (2/4) Epoch 25, batch 2050, loss[ctc_loss=0.06651, att_loss=0.2438, loss=0.2084, over 16977.00 frames. utt_duration=1359 frames, utt_pad_proportion=0.006287, over 50.00 utterances.], tot_loss[ctc_loss=0.06746, att_loss=0.2324, loss=0.1994, over 3270581.31 frames. 
utt_duration=1290 frames, utt_pad_proportion=0.04405, over 10154.10 utterances.], batch size: 50, lr: 4.29e-03, grad_scale: 16.0 2023-03-09 06:00:50,808 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97674.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 06:00:50,872 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4431, 4.5401, 4.7779, 4.6501, 5.3435, 4.3322, 4.5108, 2.6900], device='cuda:2'), covar=tensor([0.0279, 0.0364, 0.0280, 0.0355, 0.0549, 0.0311, 0.0358, 0.1664], device='cuda:2'), in_proj_covar=tensor([0.0179, 0.0205, 0.0203, 0.0221, 0.0377, 0.0177, 0.0193, 0.0218], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 06:01:16,679 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-09 06:01:30,786 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97698.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 06:01:38,574 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97703.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:01:40,002 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97704.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:01:51,388 INFO [train2.py:809] (2/4) Epoch 25, batch 2100, loss[ctc_loss=0.09407, att_loss=0.2551, loss=0.2229, over 14218.00 frames. utt_duration=391 frames, utt_pad_proportion=0.3188, over 146.00 utterances.], tot_loss[ctc_loss=0.06798, att_loss=0.2329, loss=0.1999, over 3275532.43 frames. utt_duration=1284 frames, utt_pad_proportion=0.04407, over 10215.58 utterances.], batch size: 146, lr: 4.29e-03, grad_scale: 16.0 2023-03-09 06:02:20,960 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97730.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:02:29,451 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97735.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 06:02:50,716 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.928e+02 2.285e+02 2.765e+02 5.128e+02, threshold=4.569e+02, percent-clipped=4.0 2023-03-09 06:03:06,954 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9798, 5.2602, 4.8398, 5.3607, 4.6959, 4.9191, 5.4303, 5.2028], device='cuda:2'), covar=tensor([0.0561, 0.0311, 0.0795, 0.0309, 0.0454, 0.0268, 0.0240, 0.0197], device='cuda:2'), in_proj_covar=tensor([0.0400, 0.0333, 0.0375, 0.0365, 0.0333, 0.0246, 0.0315, 0.0294], device='cuda:2'), out_proj_covar=tensor([0.0007, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 06:03:11,110 INFO [train2.py:809] (2/4) Epoch 25, batch 2150, loss[ctc_loss=0.06909, att_loss=0.2205, loss=0.1902, over 15955.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007151, over 41.00 utterances.], tot_loss[ctc_loss=0.06831, att_loss=0.2329, loss=0.2, over 3276724.01 frames. utt_duration=1268 frames, utt_pad_proportion=0.04787, over 10348.94 utterances.], batch size: 41, lr: 4.29e-03, grad_scale: 32.0 2023-03-09 06:04:32,647 INFO [train2.py:809] (2/4) Epoch 25, batch 2200, loss[ctc_loss=0.0786, att_loss=0.2238, loss=0.1948, over 15757.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009562, over 38.00 utterances.], tot_loss[ctc_loss=0.06828, att_loss=0.233, loss=0.2001, over 3270587.49 frames. 
utt_duration=1241 frames, utt_pad_proportion=0.05667, over 10553.54 utterances.], batch size: 38, lr: 4.29e-03, grad_scale: 32.0 2023-03-09 06:05:34,918 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.282e+02 1.901e+02 2.356e+02 2.897e+02 7.922e+02, threshold=4.712e+02, percent-clipped=5.0 2023-03-09 06:05:53,823 INFO [train2.py:809] (2/4) Epoch 25, batch 2250, loss[ctc_loss=0.04743, att_loss=0.2119, loss=0.179, over 16171.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.006796, over 41.00 utterances.], tot_loss[ctc_loss=0.06827, att_loss=0.2325, loss=0.1996, over 3272919.72 frames. utt_duration=1241 frames, utt_pad_proportion=0.05548, over 10565.04 utterances.], batch size: 41, lr: 4.29e-03, grad_scale: 16.0 2023-03-09 06:05:59,114 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4321, 2.2709, 4.6861, 3.9499, 3.1651, 4.2561, 4.2217, 4.5212], device='cuda:2'), covar=tensor([0.0189, 0.1444, 0.0146, 0.0660, 0.1288, 0.0199, 0.0220, 0.0226], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0241, 0.0206, 0.0317, 0.0263, 0.0225, 0.0197, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 06:06:14,137 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. limit=2.0 2023-03-09 06:06:37,498 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97888.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:07:13,166 INFO [train2.py:809] (2/4) Epoch 25, batch 2300, loss[ctc_loss=0.07478, att_loss=0.2299, loss=0.1989, over 16950.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.00769, over 50.00 utterances.], tot_loss[ctc_loss=0.06829, att_loss=0.2318, loss=0.1991, over 3259799.97 frames. utt_duration=1256 frames, utt_pad_proportion=0.05361, over 10392.51 utterances.], batch size: 50, lr: 4.29e-03, grad_scale: 16.0 2023-03-09 06:07:13,326 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97911.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:07:21,261 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97916.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:07:21,400 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5625, 2.1181, 5.0436, 4.0665, 3.1103, 4.4311, 4.7088, 4.8126], device='cuda:2'), covar=tensor([0.0237, 0.1679, 0.0171, 0.0743, 0.1581, 0.0207, 0.0152, 0.0204], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0241, 0.0206, 0.0318, 0.0263, 0.0225, 0.0197, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 06:07:21,828 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.95 vs. limit=5.0 2023-03-09 06:07:23,435 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.13 vs. limit=2.0 2023-03-09 06:08:14,117 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.837e+02 2.120e+02 2.605e+02 6.957e+02, threshold=4.241e+02, percent-clipped=1.0 2023-03-09 06:08:15,177 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97949.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:08:33,497 INFO [train2.py:809] (2/4) Epoch 25, batch 2350, loss[ctc_loss=0.0608, att_loss=0.2381, loss=0.2027, over 17323.00 frames. 
utt_duration=1176 frames, utt_pad_proportion=0.02258, over 59.00 utterances.], tot_loss[ctc_loss=0.0681, att_loss=0.2323, loss=0.1995, over 3266014.45 frames. utt_duration=1268 frames, utt_pad_proportion=0.04857, over 10318.79 utterances.], batch size: 59, lr: 4.29e-03, grad_scale: 16.0 2023-03-09 06:08:36,087 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-09 06:08:38,251 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97964.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:08:47,681 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7103, 6.0094, 5.3918, 5.7174, 5.5782, 5.0551, 5.2980, 5.1354], device='cuda:2'), covar=tensor([0.1347, 0.0912, 0.0942, 0.0837, 0.0894, 0.1589, 0.2434, 0.2183], device='cuda:2'), in_proj_covar=tensor([0.0536, 0.0622, 0.0471, 0.0463, 0.0433, 0.0477, 0.0619, 0.0530], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 06:09:34,057 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97998.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:09:45,837 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98003.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:09:57,923 INFO [train2.py:809] (2/4) Epoch 25, batch 2400, loss[ctc_loss=0.07588, att_loss=0.2465, loss=0.2124, over 17300.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01136, over 55.00 utterances.], tot_loss[ctc_loss=0.06793, att_loss=0.2324, loss=0.1995, over 3268458.22 frames. utt_duration=1279 frames, utt_pad_proportion=0.04644, over 10236.93 utterances.], batch size: 55, lr: 4.29e-03, grad_scale: 16.0 2023-03-09 06:10:22,794 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1220, 4.1558, 4.2361, 4.1851, 4.6866, 4.1959, 4.0641, 2.4990], device='cuda:2'), covar=tensor([0.0337, 0.0442, 0.0416, 0.0366, 0.0757, 0.0320, 0.0406, 0.1742], device='cuda:2'), in_proj_covar=tensor([0.0180, 0.0206, 0.0204, 0.0222, 0.0380, 0.0179, 0.0194, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 06:10:29,501 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98030.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 06:10:29,578 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98030.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:10:50,507 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6212, 3.3104, 3.7651, 3.1775, 3.6736, 4.7485, 4.5748, 3.3646], device='cuda:2'), covar=tensor([0.0358, 0.1434, 0.1201, 0.1257, 0.1013, 0.0818, 0.0566, 0.1204], device='cuda:2'), in_proj_covar=tensor([0.0246, 0.0247, 0.0285, 0.0220, 0.0265, 0.0372, 0.0265, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 06:10:55,034 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98046.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:11:00,022 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.282e+02 1.784e+02 2.263e+02 2.686e+02 5.862e+02, threshold=4.525e+02, percent-clipped=1.0 2023-03-09 06:11:03,268 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98051.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:11:18,645 INFO [train2.py:809] 
(2/4) Epoch 25, batch 2450, loss[ctc_loss=0.08381, att_loss=0.2404, loss=0.2091, over 17001.00 frames. utt_duration=688.4 frames, utt_pad_proportion=0.1362, over 99.00 utterances.], tot_loss[ctc_loss=0.06735, att_loss=0.2323, loss=0.1993, over 3270269.82 frames. utt_duration=1270 frames, utt_pad_proportion=0.04869, over 10314.37 utterances.], batch size: 99, lr: 4.29e-03, grad_scale: 16.0 2023-03-09 06:11:29,687 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9903, 6.2058, 5.6335, 5.8857, 5.8925, 5.3174, 5.6350, 5.2915], device='cuda:2'), covar=tensor([0.1141, 0.0854, 0.0908, 0.0812, 0.0764, 0.1563, 0.2268, 0.2453], device='cuda:2'), in_proj_covar=tensor([0.0539, 0.0628, 0.0475, 0.0465, 0.0435, 0.0479, 0.0624, 0.0533], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 06:11:46,643 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98078.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:12:31,652 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98106.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:12:39,089 INFO [train2.py:809] (2/4) Epoch 25, batch 2500, loss[ctc_loss=0.05152, att_loss=0.2272, loss=0.1921, over 16771.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006274, over 48.00 utterances.], tot_loss[ctc_loss=0.06695, att_loss=0.2314, loss=0.1985, over 3264795.09 frames. utt_duration=1292 frames, utt_pad_proportion=0.04558, over 10118.44 utterances.], batch size: 48, lr: 4.29e-03, grad_scale: 16.0 2023-03-09 06:13:40,618 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.428e+02 1.837e+02 2.156e+02 2.644e+02 5.272e+02, threshold=4.313e+02, percent-clipped=2.0 2023-03-09 06:13:59,452 INFO [train2.py:809] (2/4) Epoch 25, batch 2550, loss[ctc_loss=0.04602, att_loss=0.2302, loss=0.1933, over 16859.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.008081, over 49.00 utterances.], tot_loss[ctc_loss=0.06714, att_loss=0.2315, loss=0.1986, over 3257853.61 frames. utt_duration=1282 frames, utt_pad_proportion=0.05103, over 10176.00 utterances.], batch size: 49, lr: 4.28e-03, grad_scale: 16.0 2023-03-09 06:14:10,072 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98167.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:15:20,410 INFO [train2.py:809] (2/4) Epoch 25, batch 2600, loss[ctc_loss=0.05525, att_loss=0.2123, loss=0.1809, over 15528.00 frames. utt_duration=1727 frames, utt_pad_proportion=0.006869, over 36.00 utterances.], tot_loss[ctc_loss=0.06723, att_loss=0.232, loss=0.1991, over 3261958.40 frames. utt_duration=1237 frames, utt_pad_proportion=0.06083, over 10559.85 utterances.], batch size: 36, lr: 4.28e-03, grad_scale: 16.0 2023-03-09 06:15:20,707 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98211.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:15:39,907 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.30 vs. 
limit=5.0 2023-03-09 06:16:14,188 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98244.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:16:21,475 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.227e+02 1.793e+02 2.177e+02 2.680e+02 7.649e+02, threshold=4.354e+02, percent-clipped=6.0 2023-03-09 06:16:37,168 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98259.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:16:40,774 INFO [train2.py:809] (2/4) Epoch 25, batch 2650, loss[ctc_loss=0.07778, att_loss=0.2438, loss=0.2106, over 17331.00 frames. utt_duration=1176 frames, utt_pad_proportion=0.02135, over 59.00 utterances.], tot_loss[ctc_loss=0.06697, att_loss=0.2318, loss=0.1988, over 3259393.82 frames. utt_duration=1232 frames, utt_pad_proportion=0.0624, over 10591.34 utterances.], batch size: 59, lr: 4.28e-03, grad_scale: 16.0 2023-03-09 06:18:00,622 INFO [train2.py:809] (2/4) Epoch 25, batch 2700, loss[ctc_loss=0.08636, att_loss=0.2543, loss=0.2207, over 17247.00 frames. utt_duration=874.8 frames, utt_pad_proportion=0.08201, over 79.00 utterances.], tot_loss[ctc_loss=0.06697, att_loss=0.232, loss=0.199, over 3265362.74 frames. utt_duration=1222 frames, utt_pad_proportion=0.06286, over 10701.89 utterances.], batch size: 79, lr: 4.28e-03, grad_scale: 16.0 2023-03-09 06:18:07,841 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2023-03-09 06:18:31,195 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98330.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 06:19:00,609 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.296e+02 1.887e+02 2.414e+02 2.992e+02 7.902e+02, threshold=4.828e+02, percent-clipped=2.0 2023-03-09 06:19:19,945 INFO [train2.py:809] (2/4) Epoch 25, batch 2750, loss[ctc_loss=0.05518, att_loss=0.2053, loss=0.1752, over 15791.00 frames. utt_duration=1664 frames, utt_pad_proportion=0.007223, over 38.00 utterances.], tot_loss[ctc_loss=0.06761, att_loss=0.2325, loss=0.1995, over 3262249.57 frames. utt_duration=1203 frames, utt_pad_proportion=0.06825, over 10864.30 utterances.], batch size: 38, lr: 4.28e-03, grad_scale: 16.0 2023-03-09 06:19:47,085 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98378.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 06:19:56,234 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8127, 6.0960, 5.5646, 5.8062, 5.7698, 5.2770, 5.5066, 5.2502], device='cuda:2'), covar=tensor([0.1271, 0.0887, 0.0978, 0.0784, 0.0886, 0.1578, 0.2177, 0.2175], device='cuda:2'), in_proj_covar=tensor([0.0543, 0.0631, 0.0478, 0.0466, 0.0435, 0.0482, 0.0628, 0.0539], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 06:20:37,052 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98410.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:20:38,133 INFO [train2.py:809] (2/4) Epoch 25, batch 2800, loss[ctc_loss=0.0723, att_loss=0.2357, loss=0.203, over 16422.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.006362, over 44.00 utterances.], tot_loss[ctc_loss=0.06793, att_loss=0.233, loss=0.2, over 3272204.11 frames. 
utt_duration=1209 frames, utt_pad_proportion=0.06468, over 10838.09 utterances.], batch size: 44, lr: 4.28e-03, grad_scale: 16.0 2023-03-09 06:21:09,932 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1744, 2.7736, 3.1412, 4.2929, 3.8301, 3.7945, 2.7841, 2.2655], device='cuda:2'), covar=tensor([0.0860, 0.1936, 0.0923, 0.0552, 0.0957, 0.0503, 0.1584, 0.2071], device='cuda:2'), in_proj_covar=tensor([0.0184, 0.0218, 0.0186, 0.0220, 0.0231, 0.0184, 0.0204, 0.0189], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 06:21:37,615 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.204e+02 1.960e+02 2.287e+02 2.906e+02 4.125e+02, threshold=4.574e+02, percent-clipped=0.0 2023-03-09 06:21:56,305 INFO [train2.py:809] (2/4) Epoch 25, batch 2850, loss[ctc_loss=0.06519, att_loss=0.2229, loss=0.1913, over 15381.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.009953, over 35.00 utterances.], tot_loss[ctc_loss=0.06785, att_loss=0.2326, loss=0.1996, over 3263926.39 frames. utt_duration=1214 frames, utt_pad_proportion=0.0657, over 10769.33 utterances.], batch size: 35, lr: 4.28e-03, grad_scale: 16.0 2023-03-09 06:21:57,965 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98462.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:22:04,240 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9550, 4.3068, 4.6791, 4.4752, 2.7919, 4.3405, 2.9759, 1.7652], device='cuda:2'), covar=tensor([0.0523, 0.0310, 0.0603, 0.0392, 0.1491, 0.0271, 0.1301, 0.1736], device='cuda:2'), in_proj_covar=tensor([0.0209, 0.0179, 0.0265, 0.0174, 0.0225, 0.0165, 0.0233, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 06:22:05,019 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-09 06:22:12,355 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98471.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:23:14,284 INFO [train2.py:809] (2/4) Epoch 25, batch 2900, loss[ctc_loss=0.08053, att_loss=0.2424, loss=0.21, over 17070.00 frames. utt_duration=1221 frames, utt_pad_proportion=0.01721, over 56.00 utterances.], tot_loss[ctc_loss=0.06754, att_loss=0.2327, loss=0.1996, over 3271728.27 frames. utt_duration=1217 frames, utt_pad_proportion=0.06186, over 10768.70 utterances.], batch size: 56, lr: 4.28e-03, grad_scale: 16.0 2023-03-09 06:23:45,946 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.31 vs. 
limit=2.0 2023-03-09 06:23:52,966 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5453, 2.8596, 4.9404, 3.9998, 3.1630, 4.4132, 4.8423, 4.7603], device='cuda:2'), covar=tensor([0.0303, 0.1380, 0.0265, 0.0890, 0.1639, 0.0251, 0.0166, 0.0277], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0243, 0.0209, 0.0320, 0.0265, 0.0229, 0.0199, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 06:24:06,312 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98543.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:24:07,678 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98544.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:24:14,934 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.223e+02 1.817e+02 2.098e+02 2.346e+02 4.343e+02, threshold=4.197e+02, percent-clipped=0.0 2023-03-09 06:24:16,105 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-09 06:24:34,075 INFO [train2.py:809] (2/4) Epoch 25, batch 2950, loss[ctc_loss=0.06628, att_loss=0.2133, loss=0.1839, over 15615.00 frames. utt_duration=1690 frames, utt_pad_proportion=0.008877, over 37.00 utterances.], tot_loss[ctc_loss=0.06819, att_loss=0.233, loss=0.2, over 3265793.64 frames. utt_duration=1201 frames, utt_pad_proportion=0.06632, over 10893.89 utterances.], batch size: 37, lr: 4.28e-03, grad_scale: 16.0 2023-03-09 06:25:23,555 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98592.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:25:43,577 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98604.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:25:51,115 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6662, 5.9122, 5.3590, 5.6485, 5.6049, 5.0499, 5.4297, 5.1699], device='cuda:2'), covar=tensor([0.1316, 0.0974, 0.1006, 0.0880, 0.0916, 0.1623, 0.2060, 0.2388], device='cuda:2'), in_proj_covar=tensor([0.0542, 0.0627, 0.0476, 0.0466, 0.0434, 0.0482, 0.0626, 0.0537], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 06:25:54,029 INFO [train2.py:809] (2/4) Epoch 25, batch 3000, loss[ctc_loss=0.08234, att_loss=0.2586, loss=0.2233, over 17479.00 frames. utt_duration=1111 frames, utt_pad_proportion=0.02864, over 63.00 utterances.], tot_loss[ctc_loss=0.06785, att_loss=0.2324, loss=0.1995, over 3267281.01 frames. utt_duration=1231 frames, utt_pad_proportion=0.05793, over 10626.94 utterances.], batch size: 63, lr: 4.27e-03, grad_scale: 16.0 2023-03-09 06:25:54,029 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 06:26:08,487 INFO [train2.py:843] (2/4) Epoch 25, validation: ctc_loss=0.04165, att_loss=0.235, loss=0.1963, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 06:26:08,488 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 06:27:09,501 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 1.890e+02 2.327e+02 2.970e+02 6.667e+02, threshold=4.653e+02, percent-clipped=5.0 2023-03-09 06:27:28,840 INFO [train2.py:809] (2/4) Epoch 25, batch 3050, loss[ctc_loss=0.08893, att_loss=0.2409, loss=0.2105, over 16420.00 frames. 
utt_duration=1494 frames, utt_pad_proportion=0.006422, over 44.00 utterances.], tot_loss[ctc_loss=0.06803, att_loss=0.2327, loss=0.1998, over 3268244.67 frames. utt_duration=1231 frames, utt_pad_proportion=0.05951, over 10636.22 utterances.], batch size: 44, lr: 4.27e-03, grad_scale: 16.0 2023-03-09 06:27:43,790 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4407, 2.5941, 4.8732, 3.9241, 3.1225, 4.2608, 4.6897, 4.6255], device='cuda:2'), covar=tensor([0.0345, 0.1575, 0.0231, 0.0851, 0.1588, 0.0274, 0.0185, 0.0297], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0244, 0.0209, 0.0320, 0.0265, 0.0229, 0.0199, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 06:27:56,264 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1543, 5.4978, 5.7110, 5.5524, 5.6845, 6.0920, 5.3459, 6.2026], device='cuda:2'), covar=tensor([0.0710, 0.0660, 0.0807, 0.1244, 0.1652, 0.0837, 0.0706, 0.0574], device='cuda:2'), in_proj_covar=tensor([0.0912, 0.0522, 0.0637, 0.0680, 0.0910, 0.0662, 0.0517, 0.0636], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 06:28:48,629 INFO [train2.py:809] (2/4) Epoch 25, batch 3100, loss[ctc_loss=0.0472, att_loss=0.2175, loss=0.1834, over 16009.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.007063, over 40.00 utterances.], tot_loss[ctc_loss=0.06821, att_loss=0.2334, loss=0.2004, over 3278307.65 frames. utt_duration=1226 frames, utt_pad_proportion=0.05794, over 10711.89 utterances.], batch size: 40, lr: 4.27e-03, grad_scale: 16.0 2023-03-09 06:29:43,660 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-03-09 06:29:48,740 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 1.858e+02 2.243e+02 2.658e+02 4.552e+02, threshold=4.486e+02, percent-clipped=0.0 2023-03-09 06:30:07,764 INFO [train2.py:809] (2/4) Epoch 25, batch 3150, loss[ctc_loss=0.06911, att_loss=0.2469, loss=0.2114, over 16891.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006318, over 49.00 utterances.], tot_loss[ctc_loss=0.06797, att_loss=0.2333, loss=0.2003, over 3281063.36 frames. utt_duration=1234 frames, utt_pad_proportion=0.05607, over 10646.14 utterances.], batch size: 49, lr: 4.27e-03, grad_scale: 16.0 2023-03-09 06:30:09,788 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98762.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:30:15,611 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98766.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:31:26,678 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98810.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:31:28,125 INFO [train2.py:809] (2/4) Epoch 25, batch 3200, loss[ctc_loss=0.06797, att_loss=0.2363, loss=0.2026, over 17012.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.009149, over 51.00 utterances.], tot_loss[ctc_loss=0.068, att_loss=0.2333, loss=0.2002, over 3281708.26 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.0562, over 10643.49 utterances.], batch size: 51, lr: 4.27e-03, grad_scale: 8.0 2023-03-09 06:32:29,334 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.298e+02 1.857e+02 2.201e+02 2.576e+02 7.126e+02, threshold=4.402e+02, percent-clipped=1.0 2023-03-09 06:32:46,789 INFO [train2.py:809] (2/4) Epoch 25, batch 3250, loss[ctc_loss=0.1055, att_loss=0.2646, loss=0.2328, over 14023.00 frames. utt_duration=385.8 frames, utt_pad_proportion=0.329, over 146.00 utterances.], tot_loss[ctc_loss=0.06795, att_loss=0.2327, loss=0.1998, over 3268655.01 frames. utt_duration=1226 frames, utt_pad_proportion=0.06262, over 10673.90 utterances.], batch size: 146, lr: 4.27e-03, grad_scale: 8.0 2023-03-09 06:33:33,486 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98890.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:33:41,353 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98895.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:33:47,310 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98899.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:34:05,906 INFO [train2.py:809] (2/4) Epoch 25, batch 3300, loss[ctc_loss=0.06802, att_loss=0.2412, loss=0.2066, over 17007.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.008588, over 51.00 utterances.], tot_loss[ctc_loss=0.06731, att_loss=0.2322, loss=0.1993, over 3270503.28 frames. utt_duration=1250 frames, utt_pad_proportion=0.05554, over 10475.39 utterances.], batch size: 51, lr: 4.27e-03, grad_scale: 8.0 2023-03-09 06:34:50,548 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6575, 1.8767, 1.8382, 2.4365, 2.3695, 2.1868, 1.8293, 2.5963], device='cuda:2'), covar=tensor([0.0921, 0.1684, 0.1210, 0.0708, 0.1413, 0.0797, 0.1039, 0.0873], device='cuda:2'), in_proj_covar=tensor([0.0131, 0.0134, 0.0131, 0.0123, 0.0140, 0.0121, 0.0142, 0.0119], device='cuda:2'), out_proj_covar=tensor([1.0113e-04, 1.0623e-04, 1.0617e-04, 9.6284e-05, 1.0564e-04, 9.7326e-05, 1.0881e-04, 9.4492e-05], device='cuda:2') 2023-03-09 06:35:02,974 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5020, 2.4963, 2.4550, 2.2908, 2.5199, 2.4444, 2.5249, 1.9599], device='cuda:2'), covar=tensor([0.1255, 0.1699, 0.3022, 0.3435, 0.1310, 0.2443, 0.1675, 0.3599], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0201, 0.0214, 0.0268, 0.0177, 0.0276, 0.0199, 0.0225], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 06:35:07,190 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.395e+02 1.882e+02 2.265e+02 2.775e+02 6.610e+02, threshold=4.530e+02, percent-clipped=3.0 2023-03-09 06:35:09,221 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98951.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:35:17,008 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98956.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:35:24,890 INFO [train2.py:809] (2/4) Epoch 25, batch 3350, loss[ctc_loss=0.07535, att_loss=0.2399, loss=0.207, over 17050.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.008453, over 52.00 utterances.], tot_loss[ctc_loss=0.06709, att_loss=0.2324, loss=0.1993, over 3269677.86 frames. 
utt_duration=1250 frames, utt_pad_proportion=0.05641, over 10475.90 utterances.], batch size: 52, lr: 4.27e-03, grad_scale: 8.0 2023-03-09 06:35:41,917 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98971.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:36:44,961 INFO [train2.py:809] (2/4) Epoch 25, batch 3400, loss[ctc_loss=0.06688, att_loss=0.2375, loss=0.2034, over 17016.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.007977, over 51.00 utterances.], tot_loss[ctc_loss=0.06765, att_loss=0.2329, loss=0.1998, over 3267915.81 frames. utt_duration=1226 frames, utt_pad_proportion=0.06166, over 10673.50 utterances.], batch size: 51, lr: 4.27e-03, grad_scale: 8.0 2023-03-09 06:37:07,474 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0637, 5.0802, 4.8683, 2.7963, 4.8541, 4.6930, 4.1530, 2.9125], device='cuda:2'), covar=tensor([0.0126, 0.0120, 0.0262, 0.1084, 0.0103, 0.0191, 0.0369, 0.1269], device='cuda:2'), in_proj_covar=tensor([0.0077, 0.0105, 0.0107, 0.0111, 0.0087, 0.0115, 0.0100, 0.0103], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 06:37:18,886 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99032.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:37:45,686 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.421e+02 1.932e+02 2.257e+02 2.819e+02 9.991e+02, threshold=4.513e+02, percent-clipped=2.0 2023-03-09 06:37:47,571 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99051.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:38:03,633 INFO [train2.py:809] (2/4) Epoch 25, batch 3450, loss[ctc_loss=0.07477, att_loss=0.2493, loss=0.2144, over 17046.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.007588, over 52.00 utterances.], tot_loss[ctc_loss=0.06822, att_loss=0.2334, loss=0.2003, over 3271675.74 frames. utt_duration=1237 frames, utt_pad_proportion=0.05736, over 10594.31 utterances.], batch size: 52, lr: 4.26e-03, grad_scale: 8.0 2023-03-09 06:38:12,082 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99066.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:39:22,270 INFO [train2.py:809] (2/4) Epoch 25, batch 3500, loss[ctc_loss=0.05127, att_loss=0.2317, loss=0.1956, over 16478.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006026, over 46.00 utterances.], tot_loss[ctc_loss=0.06847, att_loss=0.2332, loss=0.2003, over 3264403.48 frames. 
utt_duration=1197 frames, utt_pad_proportion=0.06898, over 10922.41 utterances.], batch size: 46, lr: 4.26e-03, grad_scale: 8.0 2023-03-09 06:39:24,128 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99112.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:39:27,519 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99114.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:39:48,600 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5272, 2.2350, 2.0495, 2.4463, 2.8497, 2.5223, 2.1746, 2.9334], device='cuda:2'), covar=tensor([0.1747, 0.2250, 0.1557, 0.1283, 0.1662, 0.1090, 0.2052, 0.1180], device='cuda:2'), in_proj_covar=tensor([0.0132, 0.0135, 0.0131, 0.0124, 0.0140, 0.0121, 0.0142, 0.0120], device='cuda:2'), out_proj_covar=tensor([1.0145e-04, 1.0651e-04, 1.0642e-04, 9.7022e-05, 1.0580e-04, 9.7354e-05, 1.0880e-04, 9.5026e-05], device='cuda:2') 2023-03-09 06:39:55,268 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.11 vs. limit=5.0 2023-03-09 06:40:23,772 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.663e+02 2.107e+02 2.565e+02 8.689e+02, threshold=4.214e+02, percent-clipped=2.0 2023-03-09 06:40:40,954 INFO [train2.py:809] (2/4) Epoch 25, batch 3550, loss[ctc_loss=0.07247, att_loss=0.2331, loss=0.201, over 16274.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.006527, over 43.00 utterances.], tot_loss[ctc_loss=0.06811, att_loss=0.2336, loss=0.2005, over 3272292.80 frames. utt_duration=1219 frames, utt_pad_proportion=0.06231, over 10748.32 utterances.], batch size: 43, lr: 4.26e-03, grad_scale: 8.0 2023-03-09 06:41:40,265 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99199.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:41:59,394 INFO [train2.py:809] (2/4) Epoch 25, batch 3600, loss[ctc_loss=0.1091, att_loss=0.2649, loss=0.2337, over 13768.00 frames. utt_duration=378.8 frames, utt_pad_proportion=0.3378, over 146.00 utterances.], tot_loss[ctc_loss=0.06969, att_loss=0.2345, loss=0.2015, over 3268087.26 frames. 
utt_duration=1173 frames, utt_pad_proportion=0.07382, over 11156.95 utterances.], batch size: 146, lr: 4.26e-03, grad_scale: 8.0 2023-03-09 06:42:28,462 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99229.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:42:55,189 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99246.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:42:57,263 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99247.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:43:01,838 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.837e+02 2.162e+02 2.807e+02 5.340e+02, threshold=4.325e+02, percent-clipped=4.0 2023-03-09 06:43:03,544 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99251.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:43:11,992 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1167, 5.1563, 4.8524, 2.5301, 2.0043, 3.2574, 2.4482, 3.8073], device='cuda:2'), covar=tensor([0.0724, 0.0320, 0.0354, 0.4899, 0.5468, 0.2229, 0.3907, 0.1848], device='cuda:2'), in_proj_covar=tensor([0.0361, 0.0291, 0.0275, 0.0248, 0.0340, 0.0334, 0.0262, 0.0371], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 06:43:19,232 INFO [train2.py:809] (2/4) Epoch 25, batch 3650, loss[ctc_loss=0.06813, att_loss=0.2269, loss=0.1952, over 16537.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006527, over 45.00 utterances.], tot_loss[ctc_loss=0.06916, att_loss=0.2335, loss=0.2007, over 3264375.38 frames. utt_duration=1207 frames, utt_pad_proportion=0.06678, over 10829.55 utterances.], batch size: 45, lr: 4.26e-03, grad_scale: 8.0 2023-03-09 06:43:30,883 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99268.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:43:34,527 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5805, 4.9922, 4.8380, 4.9825, 5.0771, 4.6295, 3.1477, 5.0375], device='cuda:2'), covar=tensor([0.0123, 0.0108, 0.0135, 0.0080, 0.0100, 0.0133, 0.0827, 0.0166], device='cuda:2'), in_proj_covar=tensor([0.0095, 0.0091, 0.0113, 0.0070, 0.0077, 0.0088, 0.0104, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 06:43:41,603 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2023-03-09 06:44:05,950 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99290.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:44:40,181 INFO [train2.py:809] (2/4) Epoch 25, batch 3700, loss[ctc_loss=0.0606, att_loss=0.2146, loss=0.1838, over 15992.00 frames. utt_duration=1600 frames, utt_pad_proportion=0.00775, over 40.00 utterances.], tot_loss[ctc_loss=0.06886, att_loss=0.2334, loss=0.2005, over 3266765.61 frames. 
utt_duration=1210 frames, utt_pad_proportion=0.06545, over 10808.54 utterances.], batch size: 40, lr: 4.26e-03, grad_scale: 8.0 2023-03-09 06:45:01,482 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99324.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:45:05,841 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99327.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:45:09,022 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99329.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:45:41,895 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 1.855e+02 2.238e+02 2.669e+02 6.912e+02, threshold=4.475e+02, percent-clipped=5.0 2023-03-09 06:45:44,649 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.85 vs. limit=5.0 2023-03-09 06:45:59,439 INFO [train2.py:809] (2/4) Epoch 25, batch 3750, loss[ctc_loss=0.09434, att_loss=0.2653, loss=0.2311, over 17110.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01557, over 56.00 utterances.], tot_loss[ctc_loss=0.0692, att_loss=0.2336, loss=0.2007, over 3266489.46 frames. utt_duration=1208 frames, utt_pad_proportion=0.06657, over 10830.79 utterances.], batch size: 56, lr: 4.26e-03, grad_scale: 8.0 2023-03-09 06:46:37,878 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99385.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:47:13,267 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99407.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:47:19,830 INFO [train2.py:809] (2/4) Epoch 25, batch 3800, loss[ctc_loss=0.07479, att_loss=0.2166, loss=0.1882, over 14546.00 frames. utt_duration=1820 frames, utt_pad_proportion=0.04076, over 32.00 utterances.], tot_loss[ctc_loss=0.06928, att_loss=0.234, loss=0.2011, over 3270733.33 frames. utt_duration=1235 frames, utt_pad_proportion=0.05782, over 10606.40 utterances.], batch size: 32, lr: 4.26e-03, grad_scale: 8.0 2023-03-09 06:47:34,708 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99420.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:48:22,108 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.283e+02 1.840e+02 2.193e+02 2.491e+02 3.920e+02, threshold=4.386e+02, percent-clipped=0.0 2023-03-09 06:48:38,757 INFO [train2.py:809] (2/4) Epoch 25, batch 3850, loss[ctc_loss=0.09757, att_loss=0.2499, loss=0.2194, over 14237.00 frames. utt_duration=386.2 frames, utt_pad_proportion=0.32, over 148.00 utterances.], tot_loss[ctc_loss=0.06916, att_loss=0.2333, loss=0.2005, over 3258424.61 frames. 
utt_duration=1214 frames, utt_pad_proportion=0.0683, over 10752.82 utterances.], batch size: 148, lr: 4.26e-03, grad_scale: 8.0 2023-03-09 06:48:49,792 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99468.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 06:48:51,345 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9154, 5.2407, 4.8604, 5.2957, 4.7087, 4.9374, 5.3611, 5.1602], device='cuda:2'), covar=tensor([0.0623, 0.0298, 0.0764, 0.0317, 0.0392, 0.0300, 0.0236, 0.0192], device='cuda:2'), in_proj_covar=tensor([0.0400, 0.0335, 0.0377, 0.0367, 0.0334, 0.0247, 0.0319, 0.0298], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 06:49:09,257 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99481.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:49:55,229 INFO [train2.py:809] (2/4) Epoch 25, batch 3900, loss[ctc_loss=0.06837, att_loss=0.2349, loss=0.2016, over 17299.00 frames. utt_duration=1100 frames, utt_pad_proportion=0.03836, over 63.00 utterances.], tot_loss[ctc_loss=0.06916, att_loss=0.2337, loss=0.2008, over 3267094.84 frames. utt_duration=1201 frames, utt_pad_proportion=0.06901, over 10898.19 utterances.], batch size: 63, lr: 4.25e-03, grad_scale: 8.0 2023-03-09 06:50:23,082 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99529.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 06:50:48,865 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99546.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:50:54,662 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.849e+02 2.248e+02 2.718e+02 5.190e+02, threshold=4.496e+02, percent-clipped=3.0 2023-03-09 06:50:56,463 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99551.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:51:11,362 INFO [train2.py:809] (2/4) Epoch 25, batch 3950, loss[ctc_loss=0.06124, att_loss=0.2141, loss=0.1835, over 15373.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.01099, over 35.00 utterances.], tot_loss[ctc_loss=0.06816, att_loss=0.2326, loss=0.1997, over 3263844.52 frames. utt_duration=1240 frames, utt_pad_proportion=0.0599, over 10542.25 utterances.], batch size: 35, lr: 4.25e-03, grad_scale: 8.0 2023-03-09 06:51:22,247 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99568.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:51:47,852 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99585.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:51:54,097 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99589.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:52:25,624 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99594.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:52:27,610 INFO [train2.py:809] (2/4) Epoch 26, batch 0, loss[ctc_loss=0.05154, att_loss=0.2034, loss=0.173, over 14480.00 frames. utt_duration=1812 frames, utt_pad_proportion=0.03983, over 32.00 utterances.], tot_loss[ctc_loss=0.05154, att_loss=0.2034, loss=0.173, over 14480.00 frames. 
utt_duration=1812 frames, utt_pad_proportion=0.03983, over 32.00 utterances.], batch size: 32, lr: 4.17e-03, grad_scale: 8.0 2023-03-09 06:52:27,610 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 06:52:36,983 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4910, 3.2544, 3.7033, 3.3719, 3.6241, 4.5526, 4.3767, 3.4934], device='cuda:2'), covar=tensor([0.0365, 0.1385, 0.1239, 0.1069, 0.0986, 0.0858, 0.0651, 0.1047], device='cuda:2'), in_proj_covar=tensor([0.0246, 0.0247, 0.0287, 0.0221, 0.0265, 0.0375, 0.0269, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 06:52:40,072 INFO [train2.py:843] (2/4) Epoch 26, validation: ctc_loss=0.04131, att_loss=0.2338, loss=0.1953, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 06:52:40,073 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 06:52:46,817 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99599.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:53:27,428 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99624.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:53:32,269 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99627.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:53:36,080 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99629.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:53:40,601 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1989, 5.4299, 5.4342, 5.4335, 5.5179, 5.4484, 5.1330, 4.8699], device='cuda:2'), covar=tensor([0.0963, 0.0525, 0.0258, 0.0473, 0.0247, 0.0302, 0.0379, 0.0352], device='cuda:2'), in_proj_covar=tensor([0.0535, 0.0379, 0.0364, 0.0373, 0.0434, 0.0445, 0.0373, 0.0408], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 06:54:00,742 INFO [train2.py:809] (2/4) Epoch 26, batch 50, loss[ctc_loss=0.08845, att_loss=0.2524, loss=0.2196, over 17162.00 frames. utt_duration=870.5 frames, utt_pad_proportion=0.08948, over 79.00 utterances.], tot_loss[ctc_loss=0.06425, att_loss=0.2312, loss=0.1978, over 739400.49 frames. 
utt_duration=1355 frames, utt_pad_proportion=0.02554, over 2185.00 utterances.], batch size: 79, lr: 4.17e-03, grad_scale: 8.0 2023-03-09 06:54:08,986 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.776e+02 2.200e+02 2.639e+02 7.025e+02, threshold=4.399e+02, percent-clipped=1.0 2023-03-09 06:54:09,434 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99650.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:54:48,114 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99675.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:54:56,433 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99680.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:54:56,573 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99680.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:55:06,982 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2251, 5.4372, 5.4372, 5.4453, 5.5246, 5.4740, 5.1040, 4.8990], device='cuda:2'), covar=tensor([0.0950, 0.0576, 0.0281, 0.0510, 0.0281, 0.0320, 0.0431, 0.0364], device='cuda:2'), in_proj_covar=tensor([0.0534, 0.0379, 0.0364, 0.0373, 0.0435, 0.0444, 0.0373, 0.0408], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 06:55:19,428 INFO [train2.py:809] (2/4) Epoch 26, batch 100, loss[ctc_loss=0.0739, att_loss=0.2377, loss=0.2049, over 16121.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006787, over 42.00 utterances.], tot_loss[ctc_loss=0.06596, att_loss=0.2311, loss=0.1981, over 1300969.02 frames. utt_duration=1269 frames, utt_pad_proportion=0.0472, over 4104.44 utterances.], batch size: 42, lr: 4.17e-03, grad_scale: 8.0 2023-03-09 06:55:39,210 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99707.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:56:31,497 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99741.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:56:37,954 INFO [train2.py:809] (2/4) Epoch 26, batch 150, loss[ctc_loss=0.06714, att_loss=0.2122, loss=0.1832, over 15404.00 frames. utt_duration=1762 frames, utt_pad_proportion=0.009192, over 35.00 utterances.], tot_loss[ctc_loss=0.06693, att_loss=0.2314, loss=0.1985, over 1727354.53 frames. utt_duration=1238 frames, utt_pad_proportion=0.06366, over 5588.25 utterances.], batch size: 35, lr: 4.17e-03, grad_scale: 8.0 2023-03-09 06:56:46,323 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.249e+02 1.892e+02 2.300e+02 2.888e+02 7.505e+02, threshold=4.601e+02, percent-clipped=4.0 2023-03-09 06:56:54,122 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99755.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:57:01,587 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. limit=2.0 2023-03-09 06:57:27,683 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99776.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 06:57:57,204 INFO [train2.py:809] (2/4) Epoch 26, batch 200, loss[ctc_loss=0.08198, att_loss=0.2474, loss=0.2143, over 17271.00 frames. utt_duration=1258 frames, utt_pad_proportion=0.01279, over 55.00 utterances.], tot_loss[ctc_loss=0.06693, att_loss=0.2322, loss=0.1991, over 2067935.34 frames. 
utt_duration=1245 frames, utt_pad_proportion=0.0611, over 6654.07 utterances.], batch size: 55, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 06:58:44,447 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99824.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 06:59:17,381 INFO [train2.py:809] (2/4) Epoch 26, batch 250, loss[ctc_loss=0.05934, att_loss=0.2403, loss=0.2041, over 16784.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005667, over 48.00 utterances.], tot_loss[ctc_loss=0.06757, att_loss=0.2329, loss=0.1998, over 2330693.29 frames. utt_duration=1227 frames, utt_pad_proportion=0.06502, over 7606.33 utterances.], batch size: 48, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 06:59:24,073 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8950, 4.9254, 4.6329, 2.8185, 4.5881, 4.6079, 4.1428, 2.5814], device='cuda:2'), covar=tensor([0.0110, 0.0134, 0.0304, 0.1093, 0.0135, 0.0217, 0.0381, 0.1542], device='cuda:2'), in_proj_covar=tensor([0.0076, 0.0104, 0.0107, 0.0110, 0.0087, 0.0114, 0.0100, 0.0102], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 06:59:25,254 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.472e+02 1.972e+02 2.288e+02 2.962e+02 9.624e+02, threshold=4.577e+02, percent-clipped=5.0 2023-03-09 06:59:47,977 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.76 vs. limit=2.0 2023-03-09 06:59:56,934 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.75 vs. limit=2.0 2023-03-09 07:00:03,398 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7714, 3.4516, 3.5042, 3.0720, 3.5193, 3.4451, 3.4992, 2.4065], device='cuda:2'), covar=tensor([0.1161, 0.1239, 0.1468, 0.2822, 0.1043, 0.1641, 0.0844, 0.3566], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0200, 0.0213, 0.0267, 0.0177, 0.0275, 0.0196, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:00:19,868 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99885.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:00:35,345 INFO [train2.py:809] (2/4) Epoch 26, batch 300, loss[ctc_loss=0.07918, att_loss=0.245, loss=0.2118, over 16609.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.005353, over 47.00 utterances.], tot_loss[ctc_loss=0.06763, att_loss=0.2328, loss=0.1997, over 2537231.59 frames. utt_duration=1235 frames, utt_pad_proportion=0.06286, over 8230.39 utterances.], batch size: 47, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 07:01:21,940 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99924.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:01:22,037 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99924.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:01:35,964 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99933.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:01:55,224 INFO [train2.py:809] (2/4) Epoch 26, batch 350, loss[ctc_loss=0.05534, att_loss=0.2339, loss=0.1982, over 16485.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.006172, over 46.00 utterances.], tot_loss[ctc_loss=0.06774, att_loss=0.2332, loss=0.2001, over 2707650.49 frames. 
utt_duration=1264 frames, utt_pad_proportion=0.05209, over 8580.08 utterances.], batch size: 46, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 07:01:55,385 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99945.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:02:02,686 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.378e+02 1.873e+02 2.109e+02 2.728e+02 6.807e+02, threshold=4.219e+02, percent-clipped=3.0 2023-03-09 07:02:13,913 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6900, 5.0888, 4.9153, 5.0514, 5.1059, 4.7883, 3.6007, 5.0848], device='cuda:2'), covar=tensor([0.0121, 0.0109, 0.0128, 0.0070, 0.0106, 0.0125, 0.0652, 0.0160], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0093, 0.0116, 0.0072, 0.0078, 0.0090, 0.0106, 0.0112], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 07:02:39,743 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99972.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:02:46,070 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1811, 3.8404, 3.7831, 3.2090, 3.8721, 3.9517, 3.8570, 2.9286], device='cuda:2'), covar=tensor([0.0947, 0.0974, 0.1576, 0.3461, 0.0892, 0.1360, 0.0948, 0.3105], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0199, 0.0213, 0.0266, 0.0176, 0.0274, 0.0196, 0.0225], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:02:52,065 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99980.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:03:02,902 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1090, 4.3229, 4.2990, 4.4462, 2.8399, 4.4174, 2.6907, 1.7913], device='cuda:2'), covar=tensor([0.0480, 0.0294, 0.0703, 0.0302, 0.1506, 0.0219, 0.1439, 0.1600], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0177, 0.0259, 0.0172, 0.0221, 0.0162, 0.0231, 0.0202], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:03:09,532 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8524, 5.2954, 5.3183, 5.3003, 5.3467, 5.3504, 5.0333, 4.8180], device='cuda:2'), covar=tensor([0.1373, 0.0609, 0.0364, 0.0506, 0.0410, 0.0388, 0.0478, 0.0422], device='cuda:2'), in_proj_covar=tensor([0.0532, 0.0376, 0.0365, 0.0374, 0.0435, 0.0444, 0.0372, 0.0408], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 07:03:15,350 INFO [train2.py:809] (2/4) Epoch 26, batch 400, loss[ctc_loss=0.07195, att_loss=0.2376, loss=0.2045, over 16322.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.00649, over 45.00 utterances.], tot_loss[ctc_loss=0.0668, att_loss=0.2322, loss=0.1991, over 2830527.63 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.0559, over 9062.55 utterances.], batch size: 45, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 07:03:39,411 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=100007.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 07:04:11,936 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100028.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:04:24,061 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=100036.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:04:38,118 INFO [train2.py:809] (2/4) Epoch 26, batch 450, loss[ctc_loss=0.04533, att_loss=0.2117, loss=0.1784, over 16274.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.006995, over 43.00 utterances.], tot_loss[ctc_loss=0.06745, att_loss=0.233, loss=0.1999, over 2933540.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.05958, over 9607.12 utterances.], batch size: 43, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 07:04:46,163 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.318e+02 1.836e+02 2.145e+02 2.710e+02 6.185e+02, threshold=4.291e+02, percent-clipped=5.0 2023-03-09 07:05:14,895 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=100068.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 07:05:27,550 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100076.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:05:56,810 INFO [train2.py:809] (2/4) Epoch 26, batch 500, loss[ctc_loss=0.04569, att_loss=0.1921, loss=0.1628, over 15360.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01183, over 35.00 utterances.], tot_loss[ctc_loss=0.06708, att_loss=0.233, loss=0.1998, over 3011326.03 frames. 
utt_duration=1227 frames, utt_pad_proportion=0.05817, over 9828.25 utterances.], batch size: 35, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 07:06:13,673 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0973, 3.7404, 3.1203, 3.3186, 3.8972, 3.6690, 3.0730, 4.1638], device='cuda:2'), covar=tensor([0.0992, 0.0529, 0.1095, 0.0842, 0.0788, 0.0748, 0.0859, 0.0528], device='cuda:2'), in_proj_covar=tensor([0.0209, 0.0226, 0.0229, 0.0208, 0.0288, 0.0248, 0.0204, 0.0296], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 07:06:27,655 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6878, 5.1789, 4.9360, 5.1306, 5.2445, 4.8757, 3.3844, 5.1292], device='cuda:2'), covar=tensor([0.0141, 0.0131, 0.0187, 0.0076, 0.0127, 0.0127, 0.0787, 0.0222], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0093, 0.0116, 0.0072, 0.0078, 0.0090, 0.0106, 0.0112], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 07:06:29,168 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6504, 3.1213, 3.7352, 3.1765, 3.6108, 4.6986, 4.5503, 3.2583], device='cuda:2'), covar=tensor([0.0292, 0.1621, 0.1263, 0.1336, 0.1060, 0.0796, 0.0548, 0.1210], device='cuda:2'), in_proj_covar=tensor([0.0248, 0.0249, 0.0287, 0.0222, 0.0266, 0.0378, 0.0270, 0.0233], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 07:06:43,579 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100124.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:06:43,779 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100124.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 07:07:07,796 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.7364, 3.9720, 3.9264, 3.9831, 4.0733, 3.8514, 2.9729, 3.9490], device='cuda:2'), covar=tensor([0.0164, 0.0133, 0.0144, 0.0092, 0.0091, 0.0143, 0.0687, 0.0202], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0093, 0.0116, 0.0072, 0.0078, 0.0090, 0.0106, 0.0112], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 07:07:16,638 INFO [train2.py:809] (2/4) Epoch 26, batch 550, loss[ctc_loss=0.06956, att_loss=0.2399, loss=0.2059, over 17067.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.008128, over 52.00 utterances.], tot_loss[ctc_loss=0.06654, att_loss=0.232, loss=0.1989, over 3069471.66 frames. 
utt_duration=1224 frames, utt_pad_proportion=0.0596, over 10040.27 utterances.], batch size: 52, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 07:07:24,147 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.301e+02 1.784e+02 2.229e+02 2.699e+02 5.667e+02, threshold=4.458e+02, percent-clipped=3.0 2023-03-09 07:07:47,293 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9416, 5.2151, 5.4534, 5.2687, 5.4455, 5.8752, 5.2232, 5.9698], device='cuda:2'), covar=tensor([0.0715, 0.0754, 0.0880, 0.1448, 0.1703, 0.0894, 0.0699, 0.0682], device='cuda:2'), in_proj_covar=tensor([0.0913, 0.0527, 0.0640, 0.0685, 0.0908, 0.0662, 0.0512, 0.0649], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 07:08:00,402 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100172.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 07:08:36,931 INFO [train2.py:809] (2/4) Epoch 26, batch 600, loss[ctc_loss=0.0628, att_loss=0.2402, loss=0.2047, over 17320.00 frames. utt_duration=1006 frames, utt_pad_proportion=0.05042, over 69.00 utterances.], tot_loss[ctc_loss=0.06698, att_loss=0.2323, loss=0.1992, over 3117996.53 frames. utt_duration=1227 frames, utt_pad_proportion=0.05829, over 10176.06 utterances.], batch size: 69, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 07:08:52,911 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. limit=2.0 2023-03-09 07:09:08,742 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.78 vs. limit=2.0 2023-03-09 07:09:24,646 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100224.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:09:47,260 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=100238.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:09:58,209 INFO [train2.py:809] (2/4) Epoch 26, batch 650, loss[ctc_loss=0.06644, att_loss=0.2221, loss=0.191, over 15876.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.008765, over 39.00 utterances.], tot_loss[ctc_loss=0.068, att_loss=0.2324, loss=0.1995, over 3136738.97 frames. 
utt_duration=1187 frames, utt_pad_proportion=0.07382, over 10583.05 utterances.], batch size: 39, lr: 4.16e-03, grad_scale: 8.0 2023-03-09 07:09:58,554 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100245.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:10:06,956 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.335e+02 1.945e+02 2.331e+02 2.893e+02 4.719e+02, threshold=4.662e+02, percent-clipped=1.0 2023-03-09 07:10:43,114 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100272.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:10:49,783 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1600, 5.1663, 5.0982, 2.2292, 2.0829, 2.8233, 2.5409, 4.0520], device='cuda:2'), covar=tensor([0.0693, 0.0279, 0.0208, 0.5285, 0.5450, 0.2646, 0.3572, 0.1537], device='cuda:2'), in_proj_covar=tensor([0.0362, 0.0295, 0.0278, 0.0250, 0.0343, 0.0335, 0.0262, 0.0372], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 07:11:16,660 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100293.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:11:19,699 INFO [train2.py:809] (2/4) Epoch 26, batch 700, loss[ctc_loss=0.07964, att_loss=0.2479, loss=0.2142, over 17437.00 frames. utt_duration=1012 frames, utt_pad_proportion=0.04591, over 69.00 utterances.], tot_loss[ctc_loss=0.06759, att_loss=0.2327, loss=0.1997, over 3170143.98 frames. utt_duration=1193 frames, utt_pad_proportion=0.07041, over 10645.42 utterances.], batch size: 69, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:11:26,799 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=100299.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:12:05,542 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2023-03-09 07:12:20,598 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2023-03-09 07:12:26,361 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100336.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:12:40,623 INFO [train2.py:809] (2/4) Epoch 26, batch 750, loss[ctc_loss=0.05023, att_loss=0.2194, loss=0.1855, over 16395.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.008024, over 44.00 utterances.], tot_loss[ctc_loss=0.06769, att_loss=0.2329, loss=0.1999, over 3194885.71 frames. utt_duration=1192 frames, utt_pad_proportion=0.06899, over 10730.86 utterances.], batch size: 44, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:12:48,983 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.213e+02 1.813e+02 2.158e+02 2.667e+02 1.532e+03, threshold=4.316e+02, percent-clipped=4.0 2023-03-09 07:13:10,579 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=100363.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 07:13:43,914 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100384.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:14:01,727 INFO [train2.py:809] (2/4) Epoch 26, batch 800, loss[ctc_loss=0.0434, att_loss=0.238, loss=0.1991, over 16864.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.007626, over 49.00 utterances.], tot_loss[ctc_loss=0.06761, att_loss=0.2333, loss=0.2002, over 3211457.61 frames. 
utt_duration=1206 frames, utt_pad_proportion=0.06584, over 10668.18 utterances.], batch size: 49, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:14:13,225 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=100402.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:14:14,686 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5757, 5.0296, 4.8120, 4.9109, 5.1212, 4.7193, 3.4721, 4.9829], device='cuda:2'), covar=tensor([0.0129, 0.0120, 0.0148, 0.0094, 0.0090, 0.0119, 0.0679, 0.0175], device='cuda:2'), in_proj_covar=tensor([0.0096, 0.0092, 0.0116, 0.0072, 0.0078, 0.0090, 0.0106, 0.0111], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 07:15:05,674 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. limit=2.0 2023-03-09 07:15:22,214 INFO [train2.py:809] (2/4) Epoch 26, batch 850, loss[ctc_loss=0.04773, att_loss=0.2054, loss=0.1739, over 16012.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.006939, over 40.00 utterances.], tot_loss[ctc_loss=0.06718, att_loss=0.2329, loss=0.1998, over 3225091.33 frames. utt_duration=1234 frames, utt_pad_proportion=0.05939, over 10468.93 utterances.], batch size: 40, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:15:24,947 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4959, 4.7215, 4.4113, 4.7835, 4.2993, 4.3459, 4.7892, 4.6472], device='cuda:2'), covar=tensor([0.0594, 0.0321, 0.0648, 0.0381, 0.0411, 0.0503, 0.0269, 0.0200], device='cuda:2'), in_proj_covar=tensor([0.0401, 0.0336, 0.0376, 0.0369, 0.0336, 0.0246, 0.0321, 0.0301], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 07:15:31,016 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.217e+02 1.778e+02 2.108e+02 2.517e+02 5.047e+02, threshold=4.216e+02, percent-clipped=2.0 2023-03-09 07:15:52,669 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=100463.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 07:16:43,801 INFO [train2.py:809] (2/4) Epoch 26, batch 900, loss[ctc_loss=0.05358, att_loss=0.2374, loss=0.2006, over 17029.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007337, over 51.00 utterances.], tot_loss[ctc_loss=0.06658, att_loss=0.2321, loss=0.199, over 3228650.55 frames. utt_duration=1230 frames, utt_pad_proportion=0.06266, over 10514.18 utterances.], batch size: 51, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:18:03,337 INFO [train2.py:809] (2/4) Epoch 26, batch 950, loss[ctc_loss=0.04161, att_loss=0.1954, loss=0.1647, over 15485.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.009744, over 36.00 utterances.], tot_loss[ctc_loss=0.06763, att_loss=0.233, loss=0.1999, over 3244387.18 frames. 
utt_duration=1226 frames, utt_pad_proportion=0.06106, over 10598.53 utterances.], batch size: 36, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:18:11,130 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.401e+02 1.882e+02 2.194e+02 2.822e+02 5.450e+02, threshold=4.387e+02, percent-clipped=2.0 2023-03-09 07:18:49,472 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9849, 5.3112, 4.9354, 5.3878, 4.7953, 5.0580, 5.4736, 5.2370], device='cuda:2'), covar=tensor([0.0721, 0.0294, 0.0791, 0.0332, 0.0436, 0.0229, 0.0226, 0.0202], device='cuda:2'), in_proj_covar=tensor([0.0401, 0.0337, 0.0377, 0.0370, 0.0337, 0.0247, 0.0321, 0.0302], device='cuda:2'), out_proj_covar=tensor([0.0007, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 07:19:22,763 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=100594.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:19:24,219 INFO [train2.py:809] (2/4) Epoch 26, batch 1000, loss[ctc_loss=0.07506, att_loss=0.25, loss=0.215, over 17271.00 frames. utt_duration=1098 frames, utt_pad_proportion=0.04015, over 63.00 utterances.], tot_loss[ctc_loss=0.06673, att_loss=0.232, loss=0.1989, over 3246235.63 frames. utt_duration=1221 frames, utt_pad_proportion=0.06302, over 10648.32 utterances.], batch size: 63, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:20:41,599 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9126, 5.2819, 4.8561, 5.3330, 4.6852, 4.9862, 5.4048, 5.1966], device='cuda:2'), covar=tensor([0.0731, 0.0309, 0.0884, 0.0367, 0.0445, 0.0307, 0.0264, 0.0213], device='cuda:2'), in_proj_covar=tensor([0.0402, 0.0337, 0.0377, 0.0371, 0.0337, 0.0246, 0.0320, 0.0302], device='cuda:2'), out_proj_covar=tensor([0.0007, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 07:20:45,242 INFO [train2.py:809] (2/4) Epoch 26, batch 1050, loss[ctc_loss=0.05395, att_loss=0.2088, loss=0.1778, over 15787.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.008254, over 38.00 utterances.], tot_loss[ctc_loss=0.06716, att_loss=0.2326, loss=0.1995, over 3252311.11 frames. utt_duration=1199 frames, utt_pad_proportion=0.06847, over 10867.65 utterances.], batch size: 38, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:20:53,109 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.280e+02 1.800e+02 2.231e+02 2.686e+02 5.529e+02, threshold=4.462e+02, percent-clipped=2.0 2023-03-09 07:21:14,704 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100663.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 07:22:05,996 INFO [train2.py:809] (2/4) Epoch 26, batch 1100, loss[ctc_loss=0.06228, att_loss=0.2268, loss=0.1939, over 16121.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005898, over 42.00 utterances.], tot_loss[ctc_loss=0.06707, att_loss=0.2327, loss=0.1996, over 3254400.58 frames. 
utt_duration=1207 frames, utt_pad_proportion=0.06791, over 10797.59 utterances.], batch size: 42, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:22:07,982 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3900, 2.8483, 4.9819, 4.0474, 3.1592, 4.2481, 4.7567, 4.7339], device='cuda:2'), covar=tensor([0.0349, 0.1360, 0.0219, 0.0849, 0.1560, 0.0273, 0.0194, 0.0265], device='cuda:2'), in_proj_covar=tensor([0.0216, 0.0242, 0.0209, 0.0316, 0.0264, 0.0228, 0.0199, 0.0225], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:22:31,629 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100711.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 07:23:01,677 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=100730.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 07:23:24,290 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1629, 5.1351, 4.9922, 2.4528, 2.1130, 3.2614, 2.6932, 3.9639], device='cuda:2'), covar=tensor([0.0680, 0.0348, 0.0302, 0.4908, 0.5435, 0.2104, 0.3387, 0.1647], device='cuda:2'), in_proj_covar=tensor([0.0363, 0.0297, 0.0280, 0.0251, 0.0342, 0.0335, 0.0262, 0.0372], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 07:23:25,348 INFO [train2.py:809] (2/4) Epoch 26, batch 1150, loss[ctc_loss=0.03904, att_loss=0.1969, loss=0.1653, over 15510.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.008226, over 36.00 utterances.], tot_loss[ctc_loss=0.06732, att_loss=0.2323, loss=0.1993, over 3251005.09 frames. utt_duration=1204 frames, utt_pad_proportion=0.07069, over 10810.76 utterances.], batch size: 36, lr: 4.15e-03, grad_scale: 8.0 2023-03-09 07:23:32,922 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.823e+02 2.156e+02 2.583e+02 8.005e+02, threshold=4.313e+02, percent-clipped=1.0 2023-03-09 07:23:46,172 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=100758.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 07:24:01,912 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4675, 4.4636, 4.6216, 4.6116, 5.1234, 4.4077, 4.4397, 2.6474], device='cuda:2'), covar=tensor([0.0257, 0.0480, 0.0330, 0.0318, 0.0711, 0.0271, 0.0389, 0.1742], device='cuda:2'), in_proj_covar=tensor([0.0182, 0.0211, 0.0207, 0.0224, 0.0379, 0.0182, 0.0196, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:24:16,134 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1487, 3.7945, 3.2625, 3.3789, 4.0551, 3.6098, 2.9175, 4.2720], device='cuda:2'), covar=tensor([0.0916, 0.0546, 0.0958, 0.0754, 0.0630, 0.0721, 0.0904, 0.0475], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0225, 0.0229, 0.0205, 0.0286, 0.0246, 0.0202, 0.0294], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 07:24:39,259 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=100791.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 07:24:39,765 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-03-09 07:24:44,980 INFO [train2.py:809] (2/4) Epoch 26, batch 1200, loss[ctc_loss=0.05035, att_loss=0.2283, loss=0.1928, over 16886.00 frames. 
utt_duration=683.6 frames, utt_pad_proportion=0.1423, over 99.00 utterances.], tot_loss[ctc_loss=0.06718, att_loss=0.2322, loss=0.1992, over 3253519.80 frames. utt_duration=1201 frames, utt_pad_proportion=0.0694, over 10850.10 utterances.], batch size: 99, lr: 4.14e-03, grad_scale: 16.0 2023-03-09 07:24:55,094 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9003, 4.5823, 4.6621, 2.3969, 2.1732, 2.9986, 2.3358, 3.7742], device='cuda:2'), covar=tensor([0.0747, 0.0347, 0.0264, 0.4658, 0.4855, 0.2324, 0.3665, 0.1486], device='cuda:2'), in_proj_covar=tensor([0.0360, 0.0295, 0.0277, 0.0249, 0.0339, 0.0332, 0.0260, 0.0369], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 07:26:02,234 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.78 vs. limit=2.0 2023-03-09 07:26:04,020 INFO [train2.py:809] (2/4) Epoch 26, batch 1250, loss[ctc_loss=0.07454, att_loss=0.2456, loss=0.2114, over 17006.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.009498, over 51.00 utterances.], tot_loss[ctc_loss=0.06721, att_loss=0.2321, loss=0.1991, over 3263678.10 frames. utt_duration=1226 frames, utt_pad_proportion=0.06071, over 10662.92 utterances.], batch size: 51, lr: 4.14e-03, grad_scale: 16.0 2023-03-09 07:26:11,742 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.226e+02 1.779e+02 2.144e+02 2.683e+02 5.396e+02, threshold=4.288e+02, percent-clipped=4.0 2023-03-09 07:26:50,256 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1024, 5.3608, 5.2993, 5.2887, 5.4079, 5.3514, 4.9716, 4.8400], device='cuda:2'), covar=tensor([0.0919, 0.0566, 0.0305, 0.0527, 0.0281, 0.0324, 0.0448, 0.0330], device='cuda:2'), in_proj_covar=tensor([0.0527, 0.0378, 0.0360, 0.0374, 0.0433, 0.0442, 0.0373, 0.0405], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 07:27:22,195 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100894.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:27:23,460 INFO [train2.py:809] (2/4) Epoch 26, batch 1300, loss[ctc_loss=0.05242, att_loss=0.2216, loss=0.1878, over 15943.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.007617, over 41.00 utterances.], tot_loss[ctc_loss=0.06722, att_loss=0.2325, loss=0.1994, over 3277500.66 frames. utt_duration=1243 frames, utt_pad_proportion=0.05284, over 10563.28 utterances.], batch size: 41, lr: 4.14e-03, grad_scale: 16.0 2023-03-09 07:27:40,272 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5968, 2.7523, 5.0805, 4.1326, 3.2242, 4.3464, 4.8068, 4.7932], device='cuda:2'), covar=tensor([0.0285, 0.1516, 0.0273, 0.0776, 0.1533, 0.0241, 0.0180, 0.0255], device='cuda:2'), in_proj_covar=tensor([0.0221, 0.0247, 0.0214, 0.0323, 0.0270, 0.0232, 0.0202, 0.0230], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:28:03,447 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. 
limit=2.0 2023-03-09 07:28:07,233 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1494, 4.4788, 4.3781, 4.3994, 4.5194, 4.3000, 3.2872, 4.4103], device='cuda:2'), covar=tensor([0.0143, 0.0134, 0.0149, 0.0111, 0.0123, 0.0125, 0.0659, 0.0243], device='cuda:2'), in_proj_covar=tensor([0.0096, 0.0093, 0.0116, 0.0072, 0.0078, 0.0090, 0.0106, 0.0112], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 07:28:38,868 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100942.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:28:43,600 INFO [train2.py:809] (2/4) Epoch 26, batch 1350, loss[ctc_loss=0.06626, att_loss=0.2214, loss=0.1904, over 15776.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.00784, over 38.00 utterances.], tot_loss[ctc_loss=0.06762, att_loss=0.2331, loss=0.2, over 3288856.93 frames. utt_duration=1252 frames, utt_pad_proportion=0.04772, over 10523.43 utterances.], batch size: 38, lr: 4.14e-03, grad_scale: 16.0 2023-03-09 07:28:51,466 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.196e+02 1.935e+02 2.207e+02 2.665e+02 5.260e+02, threshold=4.414e+02, percent-clipped=2.0 2023-03-09 07:29:45,454 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.79 vs. limit=5.0 2023-03-09 07:30:04,562 INFO [train2.py:809] (2/4) Epoch 26, batch 1400, loss[ctc_loss=0.06461, att_loss=0.2378, loss=0.2031, over 16878.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007199, over 49.00 utterances.], tot_loss[ctc_loss=0.06739, att_loss=0.2324, loss=0.1994, over 3278679.08 frames. utt_duration=1264 frames, utt_pad_proportion=0.04719, over 10390.83 utterances.], batch size: 49, lr: 4.14e-03, grad_scale: 16.0 2023-03-09 07:30:04,886 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9347, 5.0234, 4.7564, 3.2045, 4.8418, 4.6211, 4.2544, 2.7631], device='cuda:2'), covar=tensor([0.0140, 0.0112, 0.0287, 0.0956, 0.0104, 0.0213, 0.0334, 0.1395], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0107, 0.0110, 0.0114, 0.0090, 0.0118, 0.0103, 0.0106], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 07:30:59,627 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-09 07:31:24,471 INFO [train2.py:809] (2/4) Epoch 26, batch 1450, loss[ctc_loss=0.06568, att_loss=0.2276, loss=0.1952, over 16384.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.007908, over 44.00 utterances.], tot_loss[ctc_loss=0.06665, att_loss=0.232, loss=0.1989, over 3280902.82 frames. 
utt_duration=1265 frames, utt_pad_proportion=0.04529, over 10384.10 utterances.], batch size: 44, lr: 4.14e-03, grad_scale: 16.0 2023-03-09 07:31:32,129 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.264e+02 1.754e+02 2.064e+02 2.618e+02 5.187e+02, threshold=4.127e+02, percent-clipped=2.0 2023-03-09 07:31:45,734 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=101058.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:32:30,531 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=101086.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 07:32:33,797 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7749, 3.5397, 3.5045, 3.0435, 3.5824, 3.5339, 3.6045, 2.7140], device='cuda:2'), covar=tensor([0.0976, 0.1261, 0.1575, 0.2980, 0.0872, 0.2526, 0.0817, 0.2993], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0199, 0.0214, 0.0264, 0.0176, 0.0275, 0.0195, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:32:44,602 INFO [train2.py:809] (2/4) Epoch 26, batch 1500, loss[ctc_loss=0.127, att_loss=0.2642, loss=0.2368, over 14257.00 frames. utt_duration=386.8 frames, utt_pad_proportion=0.3166, over 148.00 utterances.], tot_loss[ctc_loss=0.06689, att_loss=0.2321, loss=0.199, over 3271670.56 frames. utt_duration=1255 frames, utt_pad_proportion=0.05116, over 10441.99 utterances.], batch size: 148, lr: 4.14e-03, grad_scale: 16.0 2023-03-09 07:33:02,319 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=101106.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:33:04,125 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=101107.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:34:03,293 INFO [train2.py:809] (2/4) Epoch 26, batch 1550, loss[ctc_loss=0.05549, att_loss=0.1992, loss=0.1704, over 15510.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.008114, over 36.00 utterances.], tot_loss[ctc_loss=0.06687, att_loss=0.2318, loss=0.1988, over 3276096.22 frames. utt_duration=1284 frames, utt_pad_proportion=0.04403, over 10220.27 utterances.], batch size: 36, lr: 4.14e-03, grad_scale: 16.0 2023-03-09 07:34:11,012 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.859e+02 2.187e+02 2.715e+02 5.591e+02, threshold=4.373e+02, percent-clipped=3.0 2023-03-09 07:34:39,825 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=101168.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:35:22,401 INFO [train2.py:809] (2/4) Epoch 26, batch 1600, loss[ctc_loss=0.06642, att_loss=0.2518, loss=0.2147, over 16770.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006451, over 48.00 utterances.], tot_loss[ctc_loss=0.06737, att_loss=0.2324, loss=0.1994, over 3281340.16 frames. utt_duration=1284 frames, utt_pad_proportion=0.04268, over 10237.19 utterances.], batch size: 48, lr: 4.14e-03, grad_scale: 16.0 2023-03-09 07:35:48,919 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-03-09 07:36:42,605 INFO [train2.py:809] (2/4) Epoch 26, batch 1650, loss[ctc_loss=0.04564, att_loss=0.212, loss=0.1787, over 16277.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007554, over 43.00 utterances.], tot_loss[ctc_loss=0.06754, att_loss=0.2329, loss=0.1998, over 3282412.05 frames. 
utt_duration=1273 frames, utt_pad_proportion=0.04615, over 10323.50 utterances.], batch size: 43, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:36:50,256 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.244e+02 1.931e+02 2.160e+02 2.616e+02 4.577e+02, threshold=4.320e+02, percent-clipped=2.0 2023-03-09 07:37:12,588 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8181, 6.0509, 5.5789, 5.7735, 5.7189, 5.1780, 5.4951, 5.2829], device='cuda:2'), covar=tensor([0.1217, 0.0939, 0.0889, 0.0868, 0.1024, 0.1511, 0.2117, 0.2297], device='cuda:2'), in_proj_covar=tensor([0.0550, 0.0634, 0.0485, 0.0475, 0.0447, 0.0487, 0.0638, 0.0547], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 07:38:02,609 INFO [train2.py:809] (2/4) Epoch 26, batch 1700, loss[ctc_loss=0.07501, att_loss=0.2416, loss=0.2083, over 16627.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005439, over 47.00 utterances.], tot_loss[ctc_loss=0.06757, att_loss=0.2327, loss=0.1996, over 3283412.14 frames. utt_duration=1281 frames, utt_pad_proportion=0.0444, over 10267.52 utterances.], batch size: 47, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:39:22,755 INFO [train2.py:809] (2/4) Epoch 26, batch 1750, loss[ctc_loss=0.05049, att_loss=0.2268, loss=0.1915, over 17249.00 frames. utt_duration=698.6 frames, utt_pad_proportion=0.1191, over 99.00 utterances.], tot_loss[ctc_loss=0.06754, att_loss=0.2325, loss=0.1995, over 3280004.46 frames. utt_duration=1256 frames, utt_pad_proportion=0.05197, over 10456.24 utterances.], batch size: 99, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:39:30,360 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.455e+02 1.899e+02 2.202e+02 2.819e+02 5.865e+02, threshold=4.404e+02, percent-clipped=4.0 2023-03-09 07:39:42,133 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6636, 2.5154, 2.3940, 2.6577, 2.8693, 2.6075, 2.3544, 3.1137], device='cuda:2'), covar=tensor([0.1668, 0.2337, 0.1677, 0.1324, 0.1635, 0.1308, 0.2102, 0.1093], device='cuda:2'), in_proj_covar=tensor([0.0136, 0.0139, 0.0133, 0.0128, 0.0146, 0.0125, 0.0147, 0.0124], device='cuda:2'), out_proj_covar=tensor([1.0512e-04, 1.0984e-04, 1.0860e-04, 1.0036e-04, 1.0963e-04, 1.0063e-04, 1.1256e-04, 9.8637e-05], device='cuda:2') 2023-03-09 07:40:09,020 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4681, 2.3505, 4.9117, 3.8297, 2.9724, 4.2023, 4.6683, 4.5790], device='cuda:2'), covar=tensor([0.0287, 0.1748, 0.0188, 0.0984, 0.1776, 0.0289, 0.0178, 0.0283], device='cuda:2'), in_proj_covar=tensor([0.0225, 0.0252, 0.0217, 0.0329, 0.0275, 0.0237, 0.0206, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:40:26,893 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=101386.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 07:40:41,623 INFO [train2.py:809] (2/4) Epoch 26, batch 1800, loss[ctc_loss=0.06133, att_loss=0.2088, loss=0.1793, over 15401.00 frames. utt_duration=1762 frames, utt_pad_proportion=0.00916, over 35.00 utterances.], tot_loss[ctc_loss=0.06774, att_loss=0.2331, loss=0.2, over 3283871.34 frames. 
utt_duration=1269 frames, utt_pad_proportion=0.04651, over 10366.75 utterances.], batch size: 35, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:41:43,386 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=101434.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 07:42:01,375 INFO [train2.py:809] (2/4) Epoch 26, batch 1850, loss[ctc_loss=0.06449, att_loss=0.2355, loss=0.2013, over 16976.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.006866, over 50.00 utterances.], tot_loss[ctc_loss=0.06783, att_loss=0.233, loss=0.2, over 3276291.45 frames. utt_duration=1227 frames, utt_pad_proportion=0.05825, over 10691.54 utterances.], batch size: 50, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:42:08,708 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.271e+02 1.881e+02 2.093e+02 3.016e+02 9.107e+02, threshold=4.186e+02, percent-clipped=4.0 2023-03-09 07:42:29,680 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=101463.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:43:19,783 INFO [train2.py:809] (2/4) Epoch 26, batch 1900, loss[ctc_loss=0.06222, att_loss=0.2406, loss=0.205, over 16633.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.004721, over 47.00 utterances.], tot_loss[ctc_loss=0.06818, att_loss=0.2339, loss=0.2008, over 3284511.94 frames. utt_duration=1218 frames, utt_pad_proportion=0.05838, over 10796.20 utterances.], batch size: 47, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:44:39,310 INFO [train2.py:809] (2/4) Epoch 26, batch 1950, loss[ctc_loss=0.05662, att_loss=0.2326, loss=0.1974, over 17333.00 frames. utt_duration=1102 frames, utt_pad_proportion=0.0358, over 63.00 utterances.], tot_loss[ctc_loss=0.06778, att_loss=0.233, loss=0.2, over 3269734.25 frames. utt_duration=1235 frames, utt_pad_proportion=0.05783, over 10600.82 utterances.], batch size: 63, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:44:47,405 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.265e+02 1.785e+02 2.199e+02 2.810e+02 5.538e+02, threshold=4.398e+02, percent-clipped=5.0 2023-03-09 07:45:57,906 INFO [train2.py:809] (2/4) Epoch 26, batch 2000, loss[ctc_loss=0.05778, att_loss=0.2179, loss=0.1859, over 15485.00 frames. utt_duration=1722 frames, utt_pad_proportion=0.009935, over 36.00 utterances.], tot_loss[ctc_loss=0.06764, att_loss=0.2332, loss=0.2001, over 3272161.45 frames. utt_duration=1232 frames, utt_pad_proportion=0.05795, over 10638.39 utterances.], batch size: 36, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:46:03,389 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.21 vs. limit=5.0 2023-03-09 07:47:05,477 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.11 vs. limit=2.0 2023-03-09 07:47:17,702 INFO [train2.py:809] (2/4) Epoch 26, batch 2050, loss[ctc_loss=0.08975, att_loss=0.2575, loss=0.224, over 17414.00 frames. utt_duration=1107 frames, utt_pad_proportion=0.03215, over 63.00 utterances.], tot_loss[ctc_loss=0.06736, att_loss=0.2336, loss=0.2003, over 3279548.07 frames. 
utt_duration=1229 frames, utt_pad_proportion=0.05762, over 10684.60 utterances.], batch size: 63, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:47:26,104 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.028e+02 1.755e+02 2.108e+02 2.595e+02 4.367e+02, threshold=4.215e+02, percent-clipped=0.0 2023-03-09 07:47:50,896 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5589, 3.2538, 3.7751, 3.2676, 3.5234, 4.6436, 4.5237, 3.4533], device='cuda:2'), covar=tensor([0.0406, 0.1440, 0.1081, 0.1236, 0.1076, 0.0808, 0.0492, 0.1100], device='cuda:2'), in_proj_covar=tensor([0.0247, 0.0248, 0.0286, 0.0221, 0.0267, 0.0376, 0.0270, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 07:47:59,105 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2882, 2.8187, 3.3691, 4.3480, 3.8717, 3.8703, 3.0166, 2.3724], device='cuda:2'), covar=tensor([0.0798, 0.1986, 0.0897, 0.0694, 0.0933, 0.0514, 0.1390, 0.2050], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0220, 0.0188, 0.0223, 0.0234, 0.0189, 0.0204, 0.0192], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:48:19,679 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=101684.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:48:37,222 INFO [train2.py:809] (2/4) Epoch 26, batch 2100, loss[ctc_loss=0.07771, att_loss=0.2157, loss=0.1881, over 14580.00 frames. utt_duration=1824 frames, utt_pad_proportion=0.03555, over 32.00 utterances.], tot_loss[ctc_loss=0.06731, att_loss=0.2328, loss=0.1997, over 3264171.90 frames. utt_duration=1221 frames, utt_pad_proportion=0.06381, over 10703.54 utterances.], batch size: 32, lr: 4.13e-03, grad_scale: 16.0 2023-03-09 07:48:46,591 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3878, 5.2980, 5.2018, 3.3765, 5.1820, 4.8971, 4.7702, 3.2852], device='cuda:2'), covar=tensor([0.0099, 0.0087, 0.0228, 0.0797, 0.0082, 0.0167, 0.0240, 0.0997], device='cuda:2'), in_proj_covar=tensor([0.0078, 0.0107, 0.0110, 0.0113, 0.0089, 0.0118, 0.0103, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 07:49:26,815 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6038, 5.0090, 4.8355, 4.9993, 5.0879, 4.8465, 3.4898, 4.9097], device='cuda:2'), covar=tensor([0.0128, 0.0126, 0.0141, 0.0092, 0.0104, 0.0114, 0.0726, 0.0228], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0094, 0.0119, 0.0074, 0.0080, 0.0092, 0.0108, 0.0113], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-09 07:49:55,490 INFO [train2.py:809] (2/4) Epoch 26, batch 2150, loss[ctc_loss=0.06708, att_loss=0.2229, loss=0.1918, over 16132.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005406, over 42.00 utterances.], tot_loss[ctc_loss=0.06674, att_loss=0.2319, loss=0.1989, over 3266170.48 frames. 
utt_duration=1257 frames, utt_pad_proportion=0.05492, over 10402.58 utterances.], batch size: 42, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 07:49:55,849 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=101745.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:50:03,015 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.203e+02 1.940e+02 2.256e+02 2.810e+02 5.136e+02, threshold=4.511e+02, percent-clipped=2.0 2023-03-09 07:50:23,110 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=101763.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:51:13,875 INFO [train2.py:809] (2/4) Epoch 26, batch 2200, loss[ctc_loss=0.04827, att_loss=0.209, loss=0.1768, over 15492.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.008854, over 36.00 utterances.], tot_loss[ctc_loss=0.06655, att_loss=0.2319, loss=0.1989, over 3270409.35 frames. utt_duration=1273 frames, utt_pad_proportion=0.05023, over 10286.44 utterances.], batch size: 36, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 07:51:33,262 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2023-03-09 07:51:38,367 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=101811.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:51:54,467 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3250, 4.4026, 4.5681, 4.5919, 5.1026, 4.3697, 4.4565, 2.4733], device='cuda:2'), covar=tensor([0.0327, 0.0406, 0.0324, 0.0297, 0.0770, 0.0307, 0.0368, 0.1877], device='cuda:2'), in_proj_covar=tensor([0.0184, 0.0212, 0.0207, 0.0224, 0.0379, 0.0183, 0.0197, 0.0219], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:52:32,211 INFO [train2.py:809] (2/4) Epoch 26, batch 2250, loss[ctc_loss=0.07545, att_loss=0.2386, loss=0.2059, over 16399.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.007556, over 44.00 utterances.], tot_loss[ctc_loss=0.06707, att_loss=0.2328, loss=0.1997, over 3281383.85 frames. utt_duration=1269 frames, utt_pad_proportion=0.04773, over 10355.11 utterances.], batch size: 44, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 07:52:39,680 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.215e+02 1.832e+02 2.219e+02 2.925e+02 8.188e+02, threshold=4.439e+02, percent-clipped=2.0 2023-03-09 07:52:44,864 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4141, 2.6579, 4.8786, 3.8978, 3.2110, 4.2293, 4.6419, 4.6341], device='cuda:2'), covar=tensor([0.0285, 0.1585, 0.0212, 0.0863, 0.1534, 0.0267, 0.0219, 0.0255], device='cuda:2'), in_proj_covar=tensor([0.0221, 0.0245, 0.0213, 0.0321, 0.0269, 0.0232, 0.0203, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:53:51,086 INFO [train2.py:809] (2/4) Epoch 26, batch 2300, loss[ctc_loss=0.08455, att_loss=0.2491, loss=0.2162, over 17045.00 frames. utt_duration=1313 frames, utt_pad_proportion=0.009405, over 52.00 utterances.], tot_loss[ctc_loss=0.06794, att_loss=0.2338, loss=0.2006, over 3281853.15 frames. 
utt_duration=1221 frames, utt_pad_proportion=0.05908, over 10768.23 utterances.], batch size: 52, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 07:54:30,189 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=101920.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 07:54:59,224 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1837, 2.8142, 3.6255, 2.9040, 3.3962, 4.3037, 4.2427, 3.0663], device='cuda:2'), covar=tensor([0.0461, 0.1764, 0.1125, 0.1405, 0.1122, 0.1017, 0.0521, 0.1359], device='cuda:2'), in_proj_covar=tensor([0.0247, 0.0249, 0.0286, 0.0222, 0.0268, 0.0378, 0.0271, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 07:55:09,344 INFO [train2.py:809] (2/4) Epoch 26, batch 2350, loss[ctc_loss=0.06333, att_loss=0.2223, loss=0.1905, over 15955.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006995, over 41.00 utterances.], tot_loss[ctc_loss=0.06871, att_loss=0.2342, loss=0.2011, over 3274880.16 frames. utt_duration=1187 frames, utt_pad_proportion=0.07082, over 11046.53 utterances.], batch size: 41, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 07:55:16,748 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.309e+02 1.850e+02 2.322e+02 2.866e+02 7.956e+02, threshold=4.644e+02, percent-clipped=4.0 2023-03-09 07:55:17,261 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4902, 4.5005, 4.6559, 4.5992, 5.1650, 4.5196, 4.4501, 2.7357], device='cuda:2'), covar=tensor([0.0279, 0.0423, 0.0309, 0.0351, 0.0692, 0.0253, 0.0441, 0.1633], device='cuda:2'), in_proj_covar=tensor([0.0185, 0.0213, 0.0207, 0.0225, 0.0380, 0.0184, 0.0198, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:55:41,424 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.48 vs. limit=2.0 2023-03-09 07:56:05,572 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=101981.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 07:56:15,049 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=101987.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:56:27,655 INFO [train2.py:809] (2/4) Epoch 26, batch 2400, loss[ctc_loss=0.1117, att_loss=0.2619, loss=0.2319, over 13654.00 frames. utt_duration=378.2 frames, utt_pad_proportion=0.3434, over 145.00 utterances.], tot_loss[ctc_loss=0.06829, att_loss=0.2338, loss=0.2007, over 3276134.64 frames. utt_duration=1206 frames, utt_pad_proportion=0.06543, over 10877.30 utterances.], batch size: 145, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 07:57:44,283 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=102040.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:57:51,787 INFO [train2.py:809] (2/4) Epoch 26, batch 2450, loss[ctc_loss=0.06105, att_loss=0.2382, loss=0.2028, over 17312.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01151, over 55.00 utterances.], tot_loss[ctc_loss=0.06812, att_loss=0.2337, loss=0.2006, over 3276243.59 frames. 
utt_duration=1211 frames, utt_pad_proportion=0.06274, over 10837.09 utterances.], batch size: 55, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 07:57:53,839 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3587, 4.4471, 4.5854, 4.5202, 5.1072, 4.4126, 4.3735, 2.8772], device='cuda:2'), covar=tensor([0.0317, 0.0432, 0.0364, 0.0451, 0.0746, 0.0288, 0.0440, 0.1586], device='cuda:2'), in_proj_covar=tensor([0.0185, 0.0213, 0.0208, 0.0225, 0.0380, 0.0184, 0.0198, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 07:57:56,740 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=102048.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 07:57:59,372 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.331e+02 1.902e+02 2.246e+02 2.761e+02 8.723e+02, threshold=4.491e+02, percent-clipped=3.0 2023-03-09 07:59:11,505 INFO [train2.py:809] (2/4) Epoch 26, batch 2500, loss[ctc_loss=0.07386, att_loss=0.254, loss=0.218, over 17061.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.008948, over 53.00 utterances.], tot_loss[ctc_loss=0.06794, att_loss=0.2335, loss=0.2004, over 3279962.46 frames. utt_duration=1201 frames, utt_pad_proportion=0.06377, over 10938.11 utterances.], batch size: 53, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 08:00:29,997 INFO [train2.py:809] (2/4) Epoch 26, batch 2550, loss[ctc_loss=0.0719, att_loss=0.2465, loss=0.2116, over 17269.00 frames. utt_duration=875.8 frames, utt_pad_proportion=0.08385, over 79.00 utterances.], tot_loss[ctc_loss=0.06729, att_loss=0.2326, loss=0.1995, over 3279876.66 frames. utt_duration=1201 frames, utt_pad_proportion=0.06352, over 10934.04 utterances.], batch size: 79, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 08:00:37,741 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.009e+02 1.703e+02 2.113e+02 2.564e+02 5.440e+02, threshold=4.227e+02, percent-clipped=2.0 2023-03-09 08:01:48,433 INFO [train2.py:809] (2/4) Epoch 26, batch 2600, loss[ctc_loss=0.05029, att_loss=0.2076, loss=0.1761, over 15644.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.009067, over 37.00 utterances.], tot_loss[ctc_loss=0.06686, att_loss=0.2325, loss=0.1994, over 3265339.47 frames. utt_duration=1191 frames, utt_pad_proportion=0.06929, over 10977.31 utterances.], batch size: 37, lr: 4.12e-03, grad_scale: 16.0 2023-03-09 08:02:09,272 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1373, 2.7865, 3.1601, 4.1431, 3.7660, 3.7532, 2.8026, 2.1213], device='cuda:2'), covar=tensor([0.0848, 0.1943, 0.0947, 0.0617, 0.0942, 0.0546, 0.1568, 0.2330], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0220, 0.0189, 0.0224, 0.0234, 0.0189, 0.0205, 0.0192], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:03:06,803 INFO [train2.py:809] (2/4) Epoch 26, batch 2650, loss[ctc_loss=0.0659, att_loss=0.2317, loss=0.1986, over 16253.00 frames. utt_duration=1514 frames, utt_pad_proportion=0.008772, over 43.00 utterances.], tot_loss[ctc_loss=0.06676, att_loss=0.2323, loss=0.1992, over 3260619.38 frames. 
utt_duration=1185 frames, utt_pad_proportion=0.07153, over 11017.56 utterances.], batch size: 43, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:03:14,513 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.238e+02 1.711e+02 1.973e+02 2.429e+02 4.776e+02, threshold=3.946e+02, percent-clipped=2.0 2023-03-09 08:03:33,907 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3260, 2.8931, 3.6315, 3.0075, 3.4518, 4.4437, 4.2999, 3.2266], device='cuda:2'), covar=tensor([0.0396, 0.1816, 0.1221, 0.1335, 0.1104, 0.0993, 0.0651, 0.1193], device='cuda:2'), in_proj_covar=tensor([0.0249, 0.0250, 0.0289, 0.0223, 0.0269, 0.0380, 0.0272, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 08:03:55,842 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=102276.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 08:04:18,277 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6003, 4.8612, 4.7516, 4.7994, 4.8797, 4.8306, 4.4804, 4.3465], device='cuda:2'), covar=tensor([0.1078, 0.0585, 0.0556, 0.0505, 0.0330, 0.0365, 0.0468, 0.0417], device='cuda:2'), in_proj_covar=tensor([0.0529, 0.0376, 0.0364, 0.0374, 0.0435, 0.0444, 0.0375, 0.0406], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:04:25,662 INFO [train2.py:809] (2/4) Epoch 26, batch 2700, loss[ctc_loss=0.07336, att_loss=0.2415, loss=0.2079, over 16874.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007855, over 49.00 utterances.], tot_loss[ctc_loss=0.0665, att_loss=0.2323, loss=0.1991, over 3263699.55 frames. utt_duration=1211 frames, utt_pad_proportion=0.06475, over 10793.49 utterances.], batch size: 49, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:05:37,433 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=102340.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:05:41,915 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=102343.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:05:44,679 INFO [train2.py:809] (2/4) Epoch 26, batch 2750, loss[ctc_loss=0.0678, att_loss=0.2205, loss=0.19, over 16130.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005895, over 42.00 utterances.], tot_loss[ctc_loss=0.06602, att_loss=0.2314, loss=0.1984, over 3262730.56 frames. utt_duration=1227 frames, utt_pad_proportion=0.06107, over 10650.23 utterances.], batch size: 42, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:05:52,186 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.911e+02 2.234e+02 2.672e+02 6.955e+02, threshold=4.468e+02, percent-clipped=5.0 2023-03-09 08:06:52,284 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=102388.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:06:55,601 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7270, 2.3468, 5.1377, 4.1157, 3.2419, 4.5081, 4.8679, 4.8748], device='cuda:2'), covar=tensor([0.0194, 0.1559, 0.0153, 0.0774, 0.1478, 0.0197, 0.0130, 0.0191], device='cuda:2'), in_proj_covar=tensor([0.0222, 0.0247, 0.0215, 0.0323, 0.0271, 0.0234, 0.0206, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:07:02,733 INFO [train2.py:809] (2/4) Epoch 26, batch 2800, loss[ctc_loss=0.06798, att_loss=0.2446, loss=0.2093, over 17279.00 frames. 
utt_duration=1173 frames, utt_pad_proportion=0.02474, over 59.00 utterances.], tot_loss[ctc_loss=0.06704, att_loss=0.2317, loss=0.1988, over 3262007.59 frames. utt_duration=1218 frames, utt_pad_proportion=0.06409, over 10727.00 utterances.], batch size: 59, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:07:25,021 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7984, 5.0683, 5.3158, 5.1603, 5.3219, 5.7499, 5.1101, 5.8388], device='cuda:2'), covar=tensor([0.0738, 0.0739, 0.0888, 0.1420, 0.1884, 0.0965, 0.0865, 0.0766], device='cuda:2'), in_proj_covar=tensor([0.0918, 0.0524, 0.0637, 0.0686, 0.0907, 0.0663, 0.0512, 0.0644], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 08:08:21,711 INFO [train2.py:809] (2/4) Epoch 26, batch 2850, loss[ctc_loss=0.08087, att_loss=0.2312, loss=0.2011, over 16186.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.006577, over 41.00 utterances.], tot_loss[ctc_loss=0.068, att_loss=0.2333, loss=0.2002, over 3269316.32 frames. utt_duration=1197 frames, utt_pad_proportion=0.06915, over 10940.34 utterances.], batch size: 41, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:08:29,300 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.936e+02 2.281e+02 2.866e+02 5.493e+02, threshold=4.561e+02, percent-clipped=4.0 2023-03-09 08:08:57,571 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2023-03-09 08:09:19,577 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0790, 3.8032, 3.1807, 3.4541, 4.0103, 3.6570, 3.0559, 4.2250], device='cuda:2'), covar=tensor([0.1045, 0.0511, 0.1126, 0.0726, 0.0726, 0.0771, 0.0871, 0.0520], device='cuda:2'), in_proj_covar=tensor([0.0206, 0.0223, 0.0227, 0.0206, 0.0288, 0.0246, 0.0202, 0.0294], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 08:09:40,749 INFO [train2.py:809] (2/4) Epoch 26, batch 2900, loss[ctc_loss=0.05139, att_loss=0.219, loss=0.1854, over 16006.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.007311, over 40.00 utterances.], tot_loss[ctc_loss=0.06769, att_loss=0.2328, loss=0.1998, over 3274867.27 frames. utt_duration=1205 frames, utt_pad_proportion=0.06484, over 10884.63 utterances.], batch size: 40, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:10:31,028 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8498, 5.1385, 5.5539, 5.0988, 5.0229, 5.8121, 5.1517, 5.7476], device='cuda:2'), covar=tensor([0.1488, 0.1442, 0.1290, 0.2281, 0.3312, 0.1625, 0.1189, 0.1283], device='cuda:2'), in_proj_covar=tensor([0.0917, 0.0523, 0.0639, 0.0686, 0.0908, 0.0663, 0.0512, 0.0642], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 08:11:00,377 INFO [train2.py:809] (2/4) Epoch 26, batch 2950, loss[ctc_loss=0.0537, att_loss=0.2116, loss=0.18, over 16007.00 frames. utt_duration=1602 frames, utt_pad_proportion=0.006649, over 40.00 utterances.], tot_loss[ctc_loss=0.06831, att_loss=0.233, loss=0.2001, over 3269316.80 frames. 
utt_duration=1203 frames, utt_pad_proportion=0.06742, over 10883.94 utterances.], batch size: 40, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:11:08,368 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.971e+02 2.322e+02 2.777e+02 5.083e+02, threshold=4.643e+02, percent-clipped=3.0 2023-03-09 08:11:33,015 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7227, 3.2962, 3.9137, 3.4213, 3.6905, 4.7616, 4.6627, 3.5902], device='cuda:2'), covar=tensor([0.0317, 0.1527, 0.1036, 0.1098, 0.0945, 0.1089, 0.0567, 0.1004], device='cuda:2'), in_proj_covar=tensor([0.0250, 0.0253, 0.0289, 0.0224, 0.0271, 0.0383, 0.0274, 0.0237], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 08:11:54,074 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=102576.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 08:12:24,253 INFO [train2.py:809] (2/4) Epoch 26, batch 3000, loss[ctc_loss=0.05145, att_loss=0.2051, loss=0.1744, over 14469.00 frames. utt_duration=1811 frames, utt_pad_proportion=0.04051, over 32.00 utterances.], tot_loss[ctc_loss=0.06769, att_loss=0.2321, loss=0.1992, over 3270145.13 frames. utt_duration=1239 frames, utt_pad_proportion=0.0586, over 10572.93 utterances.], batch size: 32, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:12:24,253 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 08:12:38,598 INFO [train2.py:843] (2/4) Epoch 26, validation: ctc_loss=0.04046, att_loss=0.2348, loss=0.1959, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 08:12:38,599 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 08:12:40,632 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8340, 2.4708, 2.7169, 2.7068, 2.7586, 2.7841, 2.4524, 3.2388], device='cuda:2'), covar=tensor([0.1816, 0.2489, 0.1702, 0.1647, 0.1990, 0.1261, 0.2784, 0.1218], device='cuda:2'), in_proj_covar=tensor([0.0135, 0.0137, 0.0132, 0.0128, 0.0144, 0.0123, 0.0147, 0.0122], device='cuda:2'), out_proj_covar=tensor([1.0436e-04, 1.0897e-04, 1.0799e-04, 9.9791e-05, 1.0873e-04, 9.9299e-05, 1.1244e-04, 9.7378e-05], device='cuda:2') 2023-03-09 08:13:27,790 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=102624.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 08:14:00,124 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=102643.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:14:03,002 INFO [train2.py:809] (2/4) Epoch 26, batch 3050, loss[ctc_loss=0.05587, att_loss=0.2114, loss=0.1803, over 16283.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.006914, over 43.00 utterances.], tot_loss[ctc_loss=0.06735, att_loss=0.2315, loss=0.1987, over 3268221.24 frames. utt_duration=1232 frames, utt_pad_proportion=0.06133, over 10626.64 utterances.], batch size: 43, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:14:10,818 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.303e+02 1.917e+02 2.316e+02 2.727e+02 9.288e+02, threshold=4.632e+02, percent-clipped=2.0 2023-03-09 08:14:33,076 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. 
limit=2.0 2023-03-09 08:15:18,664 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=102691.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:15:24,934 INFO [train2.py:809] (2/4) Epoch 26, batch 3100, loss[ctc_loss=0.07167, att_loss=0.2443, loss=0.2098, over 17153.00 frames. utt_duration=1227 frames, utt_pad_proportion=0.01247, over 56.00 utterances.], tot_loss[ctc_loss=0.06735, att_loss=0.2315, loss=0.1987, over 3259359.22 frames. utt_duration=1241 frames, utt_pad_proportion=0.06075, over 10520.78 utterances.], batch size: 56, lr: 4.11e-03, grad_scale: 16.0 2023-03-09 08:16:47,144 INFO [train2.py:809] (2/4) Epoch 26, batch 3150, loss[ctc_loss=0.06443, att_loss=0.2361, loss=0.2018, over 17354.00 frames. utt_duration=1103 frames, utt_pad_proportion=0.03398, over 63.00 utterances.], tot_loss[ctc_loss=0.06652, att_loss=0.2314, loss=0.1985, over 3262161.98 frames. utt_duration=1240 frames, utt_pad_proportion=0.0597, over 10535.84 utterances.], batch size: 63, lr: 4.10e-03, grad_scale: 16.0 2023-03-09 08:16:54,907 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.334e+02 1.851e+02 2.183e+02 2.802e+02 6.228e+02, threshold=4.366e+02, percent-clipped=1.0 2023-03-09 08:17:15,281 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9399, 3.6665, 3.7393, 3.2832, 3.6287, 3.6613, 3.8023, 2.8232], device='cuda:2'), covar=tensor([0.1130, 0.1330, 0.1460, 0.2743, 0.1241, 0.1756, 0.0960, 0.2977], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0204, 0.0218, 0.0269, 0.0178, 0.0281, 0.0200, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:17:22,790 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9431, 6.1713, 5.6335, 5.8416, 5.8080, 5.2938, 5.5266, 5.2987], device='cuda:2'), covar=tensor([0.1139, 0.0893, 0.0971, 0.0867, 0.1022, 0.1501, 0.2364, 0.2254], device='cuda:2'), in_proj_covar=tensor([0.0555, 0.0635, 0.0491, 0.0482, 0.0451, 0.0488, 0.0640, 0.0546], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:18:06,655 INFO [train2.py:809] (2/4) Epoch 26, batch 3200, loss[ctc_loss=0.07591, att_loss=0.2436, loss=0.2101, over 17475.00 frames. utt_duration=1015 frames, utt_pad_proportion=0.04378, over 69.00 utterances.], tot_loss[ctc_loss=0.06646, att_loss=0.2313, loss=0.1984, over 3270685.97 frames. utt_duration=1255 frames, utt_pad_proportion=0.05337, over 10436.41 utterances.], batch size: 69, lr: 4.10e-03, grad_scale: 32.0 2023-03-09 08:18:21,612 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4757, 2.5586, 4.9714, 3.7516, 3.0661, 4.2597, 4.6102, 4.5705], device='cuda:2'), covar=tensor([0.0284, 0.1607, 0.0179, 0.1022, 0.1672, 0.0278, 0.0225, 0.0291], device='cuda:2'), in_proj_covar=tensor([0.0223, 0.0248, 0.0216, 0.0326, 0.0273, 0.0235, 0.0208, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:19:25,809 INFO [train2.py:809] (2/4) Epoch 26, batch 3250, loss[ctc_loss=0.05912, att_loss=0.2405, loss=0.2042, over 17024.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007584, over 51.00 utterances.], tot_loss[ctc_loss=0.06606, att_loss=0.2315, loss=0.1984, over 3274609.95 frames. 
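The grad_scale field moves from 16.0 to 32.0 here and later back down to 16.0 and 8.0, which is consistent with dynamic fp16 loss scaling: the scale is halved when an overflow is detected and grown again after a run of clean steps. A minimal sketch of that behaviour follows; the class, growth interval, and factor of 2 are illustrative assumptions rather than the training code's actual schedule.

```python
# Minimal sketch of dynamic loss scaling consistent with the logged
# grad_scale values (16.0 -> 32.0 -> 16.0 -> 8.0). The growth interval
# and the factor of 2 are assumptions made for illustration.
class DynamicGradScale:
    def __init__(self, scale: float = 16.0, growth_interval: int = 1000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            self.scale /= 2.0      # back off after an overflow
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.growth_interval == 0:
                self.scale *= 2.0  # grow again after a run of clean steps
        return self.scale
```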
utt_duration=1261 frames, utt_pad_proportion=0.0499, over 10402.25 utterances.], batch size: 51, lr: 4.10e-03, grad_scale: 32.0 2023-03-09 08:19:33,383 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.262e+02 1.810e+02 2.101e+02 2.825e+02 6.306e+02, threshold=4.202e+02, percent-clipped=5.0 2023-03-09 08:19:35,344 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=102851.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:20:32,510 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=102887.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:20:44,487 INFO [train2.py:809] (2/4) Epoch 26, batch 3300, loss[ctc_loss=0.05714, att_loss=0.2296, loss=0.1951, over 16874.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007214, over 49.00 utterances.], tot_loss[ctc_loss=0.06549, att_loss=0.2308, loss=0.1977, over 3271397.97 frames. utt_duration=1275 frames, utt_pad_proportion=0.04706, over 10277.20 utterances.], batch size: 49, lr: 4.10e-03, grad_scale: 32.0 2023-03-09 08:20:59,075 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. limit=2.0 2023-03-09 08:21:11,647 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=102912.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:21:55,661 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-03-09 08:22:03,746 INFO [train2.py:809] (2/4) Epoch 26, batch 3350, loss[ctc_loss=0.07067, att_loss=0.2403, loss=0.2064, over 16759.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006969, over 48.00 utterances.], tot_loss[ctc_loss=0.06528, att_loss=0.2309, loss=0.1977, over 3274531.57 frames. utt_duration=1264 frames, utt_pad_proportion=0.04927, over 10375.95 utterances.], batch size: 48, lr: 4.10e-03, grad_scale: 32.0 2023-03-09 08:22:08,729 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=102948.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:22:11,376 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.207e+02 1.718e+02 2.165e+02 2.625e+02 3.837e+02, threshold=4.330e+02, percent-clipped=0.0 2023-03-09 08:23:22,224 INFO [train2.py:809] (2/4) Epoch 26, batch 3400, loss[ctc_loss=0.1001, att_loss=0.2574, loss=0.2259, over 14388.00 frames. utt_duration=395.7 frames, utt_pad_proportion=0.3106, over 146.00 utterances.], tot_loss[ctc_loss=0.06624, att_loss=0.2321, loss=0.199, over 3280273.47 frames. utt_duration=1245 frames, utt_pad_proportion=0.05221, over 10550.54 utterances.], batch size: 146, lr: 4.10e-03, grad_scale: 16.0 2023-03-09 08:23:52,109 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-03-09 08:24:26,989 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=5.04 vs. limit=5.0 2023-03-09 08:24:40,906 INFO [train2.py:809] (2/4) Epoch 26, batch 3450, loss[ctc_loss=0.06573, att_loss=0.2156, loss=0.1856, over 15770.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008604, over 38.00 utterances.], tot_loss[ctc_loss=0.06638, att_loss=0.232, loss=0.1989, over 3278767.85 frames. utt_duration=1263 frames, utt_pad_proportion=0.04698, over 10397.60 utterances.], batch size: 38, lr: 4.10e-03, grad_scale: 16.0 2023-03-09 08:24:41,753 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. 
limit=2.0 2023-03-09 08:24:50,036 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.313e+02 1.811e+02 2.239e+02 2.771e+02 6.342e+02, threshold=4.479e+02, percent-clipped=4.0 2023-03-09 08:26:00,932 INFO [train2.py:809] (2/4) Epoch 26, batch 3500, loss[ctc_loss=0.08261, att_loss=0.251, loss=0.2173, over 17078.00 frames. utt_duration=1315 frames, utt_pad_proportion=0.007286, over 52.00 utterances.], tot_loss[ctc_loss=0.06635, att_loss=0.2317, loss=0.1986, over 3278927.87 frames. utt_duration=1261 frames, utt_pad_proportion=0.04809, over 10410.03 utterances.], batch size: 52, lr: 4.10e-03, grad_scale: 16.0 2023-03-09 08:26:05,856 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103098.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:26:59,873 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103132.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:27:19,711 INFO [train2.py:809] (2/4) Epoch 26, batch 3550, loss[ctc_loss=0.06091, att_loss=0.2323, loss=0.198, over 16274.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007645, over 43.00 utterances.], tot_loss[ctc_loss=0.06614, att_loss=0.2312, loss=0.1982, over 3272539.90 frames. utt_duration=1244 frames, utt_pad_proportion=0.05391, over 10537.37 utterances.], batch size: 43, lr: 4.10e-03, grad_scale: 16.0 2023-03-09 08:27:28,876 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.344e+02 1.850e+02 2.221e+02 2.819e+02 5.198e+02, threshold=4.442e+02, percent-clipped=1.0 2023-03-09 08:27:33,935 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9835, 5.0444, 4.8804, 2.2345, 2.0389, 3.0076, 2.2115, 3.7762], device='cuda:2'), covar=tensor([0.0767, 0.0316, 0.0270, 0.4899, 0.5432, 0.2338, 0.4017, 0.1754], device='cuda:2'), in_proj_covar=tensor([0.0362, 0.0298, 0.0279, 0.0250, 0.0338, 0.0332, 0.0262, 0.0370], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 08:27:42,880 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103159.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:28:29,946 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6422, 2.4618, 2.3322, 2.4784, 2.6028, 2.5198, 2.3236, 2.9735], device='cuda:2'), covar=tensor([0.1430, 0.2162, 0.2102, 0.1333, 0.1614, 0.1255, 0.1983, 0.1045], device='cuda:2'), in_proj_covar=tensor([0.0136, 0.0140, 0.0135, 0.0130, 0.0146, 0.0125, 0.0148, 0.0124], device='cuda:2'), out_proj_covar=tensor([1.0522e-04, 1.1070e-04, 1.1000e-04, 1.0137e-04, 1.1024e-04, 1.0073e-04, 1.1310e-04, 9.8431e-05], device='cuda:2') 2023-03-09 08:28:36,306 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103193.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:28:38,766 INFO [train2.py:809] (2/4) Epoch 26, batch 3600, loss[ctc_loss=0.07425, att_loss=0.2239, loss=0.1939, over 16181.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.00609, over 41.00 utterances.], tot_loss[ctc_loss=0.06707, att_loss=0.2321, loss=0.1991, over 3271954.19 frames. 
utt_duration=1225 frames, utt_pad_proportion=0.05915, over 10698.39 utterances.], batch size: 41, lr: 4.10e-03, grad_scale: 16.0 2023-03-09 08:28:57,781 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103207.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:29:55,727 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103243.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:29:58,624 INFO [train2.py:809] (2/4) Epoch 26, batch 3650, loss[ctc_loss=0.1114, att_loss=0.2621, loss=0.232, over 13614.00 frames. utt_duration=374.5 frames, utt_pad_proportion=0.3452, over 146.00 utterances.], tot_loss[ctc_loss=0.06687, att_loss=0.232, loss=0.199, over 3274212.91 frames. utt_duration=1243 frames, utt_pad_proportion=0.05512, over 10548.56 utterances.], batch size: 146, lr: 4.09e-03, grad_scale: 16.0 2023-03-09 08:29:58,894 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0395, 5.2794, 5.2147, 5.2320, 5.3213, 5.2797, 4.8869, 4.7663], device='cuda:2'), covar=tensor([0.1012, 0.0580, 0.0361, 0.0527, 0.0280, 0.0334, 0.0439, 0.0338], device='cuda:2'), in_proj_covar=tensor([0.0530, 0.0376, 0.0367, 0.0375, 0.0438, 0.0444, 0.0374, 0.0409], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:30:07,782 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.026e+02 1.817e+02 2.258e+02 2.684e+02 6.066e+02, threshold=4.515e+02, percent-clipped=3.0 2023-03-09 08:30:18,794 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4911, 2.8004, 4.9504, 3.8892, 3.1475, 4.3176, 4.7727, 4.6905], device='cuda:2'), covar=tensor([0.0327, 0.1393, 0.0267, 0.0872, 0.1658, 0.0272, 0.0227, 0.0279], device='cuda:2'), in_proj_covar=tensor([0.0223, 0.0247, 0.0216, 0.0324, 0.0272, 0.0235, 0.0207, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:31:17,478 INFO [train2.py:809] (2/4) Epoch 26, batch 3700, loss[ctc_loss=0.07638, att_loss=0.2511, loss=0.2162, over 16477.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006609, over 46.00 utterances.], tot_loss[ctc_loss=0.06717, att_loss=0.2319, loss=0.199, over 3263472.50 frames. utt_duration=1225 frames, utt_pad_proportion=0.06104, over 10667.46 utterances.], batch size: 46, lr: 4.09e-03, grad_scale: 16.0 2023-03-09 08:31:55,584 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4896, 4.9965, 4.8134, 4.9616, 5.1104, 4.6196, 3.6708, 4.9650], device='cuda:2'), covar=tensor([0.0140, 0.0113, 0.0153, 0.0086, 0.0090, 0.0155, 0.0635, 0.0166], device='cuda:2'), in_proj_covar=tensor([0.0099, 0.0095, 0.0119, 0.0074, 0.0080, 0.0092, 0.0108, 0.0113], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-09 08:32:30,488 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7239, 5.2381, 5.0347, 5.1961, 5.3252, 4.8790, 3.6174, 5.1439], device='cuda:2'), covar=tensor([0.0132, 0.0114, 0.0138, 0.0096, 0.0090, 0.0130, 0.0690, 0.0192], device='cuda:2'), in_proj_covar=tensor([0.0099, 0.0095, 0.0119, 0.0074, 0.0080, 0.0091, 0.0108, 0.0113], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-09 08:32:36,352 INFO [train2.py:809] (2/4) Epoch 26, batch 3750, loss[ctc_loss=0.07597, att_loss=0.2361, loss=0.2041, over 17292.00 frames. 
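The zipformer.py:625 records list, per batch and per encoder stack, how many layers were randomly skipped (num_to_drop) and which ones (layers_to_drop); at this stage of training most records show num_to_drop=0, with an occasional single layer dropped. The sketch below shows that kind of stochastic layer skipping in the abstract; the drop probability and helper name are illustrative assumptions.

```python
import random

# Minimal sketch of stochastic layer skipping of the kind these records
# report; the drop probability and helper are illustrative assumptions.
def sample_layers_to_drop(num_layers: int, drop_prob: float = 0.05,
                          rng: random.Random = random.Random()) -> set:
    return {i for i in range(num_layers) if rng.random() < drop_prob}

for _ in range(3):
    print(sample_layers_to_drop(num_layers=4))  # e.g. set(), {1}, {0, 2}
```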
utt_duration=1259 frames, utt_pad_proportion=0.01241, over 55.00 utterances.], tot_loss[ctc_loss=0.06674, att_loss=0.2313, loss=0.1984, over 3258317.34 frames. utt_duration=1231 frames, utt_pad_proportion=0.06245, over 10597.00 utterances.], batch size: 55, lr: 4.09e-03, grad_scale: 16.0 2023-03-09 08:32:45,487 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.324e+02 1.935e+02 2.309e+02 3.024e+02 8.462e+02, threshold=4.618e+02, percent-clipped=2.0 2023-03-09 08:33:07,160 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2395, 4.4047, 4.4615, 4.5408, 5.0095, 4.3725, 4.5100, 2.4930], device='cuda:2'), covar=tensor([0.0330, 0.0440, 0.0396, 0.0379, 0.0867, 0.0286, 0.0371, 0.1941], device='cuda:2'), in_proj_covar=tensor([0.0187, 0.0215, 0.0211, 0.0228, 0.0382, 0.0185, 0.0200, 0.0221], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:33:10,140 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103366.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 08:33:18,220 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8754, 6.1321, 5.6545, 5.8627, 5.8318, 5.2621, 5.5477, 5.2661], device='cuda:2'), covar=tensor([0.1199, 0.0769, 0.0881, 0.0789, 0.0856, 0.1512, 0.2259, 0.2472], device='cuda:2'), in_proj_covar=tensor([0.0552, 0.0630, 0.0487, 0.0475, 0.0446, 0.0486, 0.0636, 0.0545], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:33:56,328 INFO [train2.py:809] (2/4) Epoch 26, batch 3800, loss[ctc_loss=0.06388, att_loss=0.2365, loss=0.202, over 16634.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.004811, over 47.00 utterances.], tot_loss[ctc_loss=0.06693, att_loss=0.2322, loss=0.1992, over 3264261.99 frames. 
utt_duration=1204 frames, utt_pad_proportion=0.06742, over 10856.88 utterances.], batch size: 47, lr: 4.09e-03, grad_scale: 16.0 2023-03-09 08:34:31,036 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1211, 5.0550, 5.1771, 2.0790, 2.0345, 2.7907, 2.4434, 3.7907], device='cuda:2'), covar=tensor([0.0910, 0.0667, 0.0252, 0.5596, 0.6075, 0.3124, 0.4206, 0.1997], device='cuda:2'), in_proj_covar=tensor([0.0359, 0.0297, 0.0276, 0.0248, 0.0336, 0.0328, 0.0260, 0.0366], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 08:34:32,274 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0963, 5.3878, 5.3076, 5.3404, 5.4361, 5.3796, 5.0385, 4.8764], device='cuda:2'), covar=tensor([0.1089, 0.0506, 0.0337, 0.0448, 0.0264, 0.0331, 0.0395, 0.0299], device='cuda:2'), in_proj_covar=tensor([0.0527, 0.0374, 0.0366, 0.0372, 0.0437, 0.0442, 0.0373, 0.0406], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:34:46,790 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9237, 5.1387, 5.0671, 5.0773, 5.2083, 5.1760, 4.7934, 4.6656], device='cuda:2'), covar=tensor([0.1127, 0.0627, 0.0408, 0.0595, 0.0324, 0.0366, 0.0462, 0.0348], device='cuda:2'), in_proj_covar=tensor([0.0528, 0.0375, 0.0366, 0.0373, 0.0438, 0.0443, 0.0374, 0.0406], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:34:48,472 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103427.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 08:35:16,368 INFO [train2.py:809] (2/4) Epoch 26, batch 3850, loss[ctc_loss=0.06285, att_loss=0.2379, loss=0.2029, over 16781.00 frames. utt_duration=679.4 frames, utt_pad_proportion=0.1454, over 99.00 utterances.], tot_loss[ctc_loss=0.06613, att_loss=0.2315, loss=0.1984, over 3262687.95 frames. utt_duration=1227 frames, utt_pad_proportion=0.06225, over 10647.08 utterances.], batch size: 99, lr: 4.09e-03, grad_scale: 8.0 2023-03-09 08:35:27,000 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.253e+02 1.752e+02 2.146e+02 2.591e+02 4.293e+02, threshold=4.292e+02, percent-clipped=0.0 2023-03-09 08:35:30,352 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103454.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:36:22,164 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103488.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:36:32,453 INFO [train2.py:809] (2/4) Epoch 26, batch 3900, loss[ctc_loss=0.06707, att_loss=0.239, loss=0.2046, over 17311.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.01132, over 55.00 utterances.], tot_loss[ctc_loss=0.06611, att_loss=0.232, loss=0.1988, over 3270258.55 frames. 
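The zipformer.py:1447 records dump attn_weights_entropy tensors with one row per attention head. A plausible reading is the entropy of each head's attention-weight distribution, averaged over query positions, where larger values indicate more diffuse attention; the exact tensor layout and reduction are not recoverable from the log, so the sketch below is only illustrative.

```python
import torch

# Minimal sketch of an attention-weight entropy diagnostic; the layout
# (num_heads, query_len, key_len) and the mean over queries are
# assumptions, not necessarily what the training code computes.
def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    p = attn_weights.clamp_min(1e-20)
    entropy = -(p * p.log()).sum(dim=-1)   # entropy per query position
    return entropy.mean(dim=-1)            # mean over queries -> one value per head

weights = torch.softmax(torch.randn(8, 16, 16), dim=-1)
print(attn_weights_entropy(weights))       # shape (8,), like the logged 8-element rows
```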
utt_duration=1235 frames, utt_pad_proportion=0.05863, over 10604.77 utterances.], batch size: 55, lr: 4.09e-03, grad_scale: 8.0 2023-03-09 08:36:34,180 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5568, 3.1184, 3.6484, 4.6005, 4.0212, 4.0116, 3.1294, 2.6989], device='cuda:2'), covar=tensor([0.0706, 0.1742, 0.0801, 0.0484, 0.0835, 0.0517, 0.1280, 0.1807], device='cuda:2'), in_proj_covar=tensor([0.0187, 0.0219, 0.0188, 0.0224, 0.0234, 0.0190, 0.0204, 0.0193], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:36:50,987 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=103507.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:37:47,345 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=103543.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:37:50,084 INFO [train2.py:809] (2/4) Epoch 26, batch 3950, loss[ctc_loss=0.08297, att_loss=0.251, loss=0.2174, over 17320.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.01099, over 55.00 utterances.], tot_loss[ctc_loss=0.06612, att_loss=0.2323, loss=0.1991, over 3269604.82 frames. utt_duration=1258 frames, utt_pad_proportion=0.05158, over 10408.87 utterances.], batch size: 55, lr: 4.09e-03, grad_scale: 8.0 2023-03-09 08:38:00,647 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.745e+02 2.083e+02 2.654e+02 4.442e+02, threshold=4.166e+02, percent-clipped=1.0 2023-03-09 08:38:05,194 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=103555.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:38:14,782 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3970, 4.5753, 4.6624, 4.7372, 5.1329, 4.5032, 4.6260, 2.6899], device='cuda:2'), covar=tensor([0.0300, 0.0340, 0.0299, 0.0264, 0.0624, 0.0252, 0.0303, 0.1667], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0217, 0.0213, 0.0230, 0.0385, 0.0187, 0.0201, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:39:05,360 INFO [train2.py:809] (2/4) Epoch 27, batch 0, loss[ctc_loss=0.06939, att_loss=0.2341, loss=0.2012, over 17129.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01445, over 56.00 utterances.], tot_loss[ctc_loss=0.06939, att_loss=0.2341, loss=0.2012, over 17129.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01445, over 56.00 utterances.], batch size: 56, lr: 4.01e-03, grad_scale: 8.0 2023-03-09 08:39:05,360 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 08:39:17,374 INFO [train2.py:843] (2/4) Epoch 27, validation: ctc_loss=0.04075, att_loss=0.2342, loss=0.1955, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
2023-03-09 08:39:17,375 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 08:39:36,973 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=103591.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:40:35,283 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4064, 4.5341, 4.6192, 4.6297, 5.1454, 4.3820, 4.6166, 2.6542], device='cuda:2'), covar=tensor([0.0292, 0.0412, 0.0336, 0.0361, 0.0803, 0.0290, 0.0299, 0.1643], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0219, 0.0215, 0.0232, 0.0389, 0.0190, 0.0203, 0.0225], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:40:36,382 INFO [train2.py:809] (2/4) Epoch 27, batch 50, loss[ctc_loss=0.05442, att_loss=0.2122, loss=0.1806, over 16267.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007437, over 43.00 utterances.], tot_loss[ctc_loss=0.06916, att_loss=0.233, loss=0.2002, over 740550.50 frames. utt_duration=1177 frames, utt_pad_proportion=0.07075, over 2519.59 utterances.], batch size: 43, lr: 4.01e-03, grad_scale: 8.0 2023-03-09 08:41:14,259 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.910e+02 2.259e+02 2.713e+02 6.930e+02, threshold=4.518e+02, percent-clipped=6.0 2023-03-09 08:41:42,430 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1789, 2.5981, 3.4956, 2.8969, 3.3890, 4.3050, 4.1844, 3.1221], device='cuda:2'), covar=tensor([0.0386, 0.1994, 0.1314, 0.1345, 0.1085, 0.0944, 0.0623, 0.1180], device='cuda:2'), in_proj_covar=tensor([0.0248, 0.0251, 0.0289, 0.0222, 0.0270, 0.0381, 0.0273, 0.0235], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 08:41:56,144 INFO [train2.py:809] (2/4) Epoch 27, batch 100, loss[ctc_loss=0.04858, att_loss=0.2259, loss=0.1905, over 16323.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006673, over 45.00 utterances.], tot_loss[ctc_loss=0.0658, att_loss=0.2317, loss=0.1985, over 1304984.31 frames. utt_duration=1228 frames, utt_pad_proportion=0.05664, over 4257.48 utterances.], batch size: 45, lr: 4.01e-03, grad_scale: 8.0 2023-03-09 08:43:03,279 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.7776, 6.0745, 5.5488, 5.7628, 5.7094, 5.1608, 5.4492, 5.1687], device='cuda:2'), covar=tensor([0.1183, 0.0767, 0.0993, 0.0829, 0.0864, 0.1594, 0.2232, 0.2369], device='cuda:2'), in_proj_covar=tensor([0.0553, 0.0631, 0.0485, 0.0474, 0.0446, 0.0483, 0.0636, 0.0545], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:43:04,843 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103722.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 08:43:15,221 INFO [train2.py:809] (2/4) Epoch 27, batch 150, loss[ctc_loss=0.06736, att_loss=0.2033, loss=0.1761, over 14601.00 frames. utt_duration=1827 frames, utt_pad_proportion=0.03715, over 32.00 utterances.], tot_loss[ctc_loss=0.06663, att_loss=0.2328, loss=0.1996, over 1735753.14 frames. 
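Each loss record also summarises the batch with utt_duration (mean utterance length in frames) and utt_pad_proportion (fraction of padding in the padded batch); the logged frame totals appear to be on a different, subsampled time scale, so only those two fields are modelled in the sketch below, and the helper name is an assumption.

```python
# Minimal sketch of how utt_duration and utt_pad_proportion can be read
# from the per-utterance lengths of a padded batch; the helper name and
# the exact frame accounting are illustrative assumptions.
def batch_summary(utt_num_frames: list) -> dict:
    num_utts = len(utt_num_frames)
    total = sum(utt_num_frames)
    padded_total = num_utts * max(utt_num_frames)
    return {
        "utt_duration": total / num_utts,
        "utt_pad_proportion": (padded_total - total) / padded_total,
        "num_utterances": num_utts,
    }

print(batch_summary([600, 680, 760]))  # similar lengths -> low utt_pad_proportion
```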
utt_duration=1228 frames, utt_pad_proportion=0.05859, over 5660.94 utterances.], batch size: 32, lr: 4.01e-03, grad_scale: 8.0 2023-03-09 08:43:33,668 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103740.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:43:52,273 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.770e+02 2.085e+02 2.452e+02 6.368e+02, threshold=4.170e+02, percent-clipped=1.0 2023-03-09 08:43:56,612 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=103754.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:44:13,425 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103765.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:44:34,212 INFO [train2.py:809] (2/4) Epoch 27, batch 200, loss[ctc_loss=0.06686, att_loss=0.2468, loss=0.2108, over 17316.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.01098, over 55.00 utterances.], tot_loss[ctc_loss=0.06595, att_loss=0.2319, loss=0.1987, over 2072508.23 frames. utt_duration=1232 frames, utt_pad_proportion=0.05883, over 6736.88 utterances.], batch size: 55, lr: 4.01e-03, grad_scale: 8.0 2023-03-09 08:44:37,774 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4752, 2.5329, 2.6126, 2.4029, 2.5293, 2.4006, 2.5643, 2.0606], device='cuda:2'), covar=tensor([0.1245, 0.1470, 0.1661, 0.3185, 0.1485, 0.1743, 0.1492, 0.3302], device='cuda:2'), in_proj_covar=tensor([0.0200, 0.0205, 0.0220, 0.0270, 0.0179, 0.0281, 0.0200, 0.0229], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:44:49,607 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=103788.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:45:10,663 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103801.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:45:11,937 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=103802.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:45:49,596 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103826.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:45:53,752 INFO [train2.py:809] (2/4) Epoch 27, batch 250, loss[ctc_loss=0.07119, att_loss=0.24, loss=0.2063, over 17049.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.0091, over 53.00 utterances.], tot_loss[ctc_loss=0.0651, att_loss=0.2311, loss=0.1979, over 2335675.29 frames. utt_duration=1259 frames, utt_pad_proportion=0.05322, over 7431.50 utterances.], batch size: 53, lr: 4.01e-03, grad_scale: 8.0 2023-03-09 08:46:05,847 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=103836.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:46:30,827 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.879e+02 2.239e+02 2.738e+02 5.518e+02, threshold=4.478e+02, percent-clipped=1.0 2023-03-09 08:47:08,898 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103876.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:47:13,075 INFO [train2.py:809] (2/4) Epoch 27, batch 300, loss[ctc_loss=0.08137, att_loss=0.2545, loss=0.2199, over 17344.00 frames. utt_duration=879.5 frames, utt_pad_proportion=0.07714, over 79.00 utterances.], tot_loss[ctc_loss=0.06552, att_loss=0.2319, loss=0.1986, over 2545284.41 frames. 
utt_duration=1241 frames, utt_pad_proportion=0.05869, over 8216.75 utterances.], batch size: 79, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:47:57,119 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3182, 5.5580, 5.5310, 5.5130, 5.6250, 5.5845, 5.2449, 5.0816], device='cuda:2'), covar=tensor([0.0922, 0.0496, 0.0273, 0.0417, 0.0237, 0.0272, 0.0378, 0.0291], device='cuda:2'), in_proj_covar=tensor([0.0530, 0.0377, 0.0368, 0.0374, 0.0438, 0.0441, 0.0375, 0.0407], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:47:57,176 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2818, 5.2179, 4.8141, 3.3694, 5.0888, 4.9539, 4.5638, 2.7591], device='cuda:2'), covar=tensor([0.0116, 0.0126, 0.0439, 0.0922, 0.0113, 0.0191, 0.0302, 0.1524], device='cuda:2'), in_proj_covar=tensor([0.0078, 0.0107, 0.0111, 0.0113, 0.0090, 0.0118, 0.0102, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:48:32,099 INFO [train2.py:809] (2/4) Epoch 27, batch 350, loss[ctc_loss=0.06805, att_loss=0.2458, loss=0.2103, over 16962.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007772, over 50.00 utterances.], tot_loss[ctc_loss=0.0658, att_loss=0.2319, loss=0.1987, over 2703964.77 frames. utt_duration=1227 frames, utt_pad_proportion=0.06246, over 8822.29 utterances.], batch size: 50, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:48:45,735 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103937.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 08:49:08,959 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 1.861e+02 2.238e+02 2.752e+02 4.825e+02, threshold=4.477e+02, percent-clipped=1.0 2023-03-09 08:49:15,399 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.83 vs. limit=5.0 2023-03-09 08:49:51,235 INFO [train2.py:809] (2/4) Epoch 27, batch 400, loss[ctc_loss=0.06854, att_loss=0.2281, loss=0.1962, over 16384.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.008462, over 44.00 utterances.], tot_loss[ctc_loss=0.06534, att_loss=0.2318, loss=0.1985, over 2835650.13 frames. 
utt_duration=1250 frames, utt_pad_proportion=0.05366, over 9088.50 utterances.], batch size: 44, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:50:02,795 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7266, 3.7805, 3.1306, 3.2437, 3.8933, 3.5788, 2.6952, 4.1382], device='cuda:2'), covar=tensor([0.1265, 0.0507, 0.1127, 0.0858, 0.0842, 0.0720, 0.1099, 0.0502], device='cuda:2'), in_proj_covar=tensor([0.0208, 0.0224, 0.0231, 0.0206, 0.0290, 0.0248, 0.0204, 0.0295], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 08:50:09,019 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103990.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:50:19,675 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103997.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:51:03,262 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5840, 3.0215, 3.5623, 4.4943, 3.9474, 4.0108, 3.1180, 2.4556], device='cuda:2'), covar=tensor([0.0642, 0.1771, 0.0800, 0.0534, 0.0923, 0.0509, 0.1378, 0.2063], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0219, 0.0187, 0.0224, 0.0233, 0.0190, 0.0204, 0.0192], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:51:04,753 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104022.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 08:51:15,260 INFO [train2.py:809] (2/4) Epoch 27, batch 450, loss[ctc_loss=0.07116, att_loss=0.2454, loss=0.2105, over 17070.00 frames. utt_duration=1315 frames, utt_pad_proportion=0.007852, over 52.00 utterances.], tot_loss[ctc_loss=0.066, att_loss=0.2321, loss=0.1989, over 2933697.06 frames. utt_duration=1250 frames, utt_pad_proportion=0.05338, over 9395.34 utterances.], batch size: 52, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:51:48,292 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=104049.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:51:51,428 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=104051.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:51:52,513 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.211e+02 1.790e+02 2.279e+02 2.671e+02 6.527e+02, threshold=4.558e+02, percent-clipped=3.0 2023-03-09 08:52:02,868 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=104058.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:52:20,943 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104070.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 08:52:21,678 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. 
limit=2.0 2023-03-09 08:52:27,075 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0767, 5.3383, 5.5962, 5.4331, 5.5891, 6.0235, 5.2849, 6.0759], device='cuda:2'), covar=tensor([0.0636, 0.0698, 0.0810, 0.1311, 0.1682, 0.0848, 0.0704, 0.0692], device='cuda:2'), in_proj_covar=tensor([0.0901, 0.0522, 0.0631, 0.0681, 0.0898, 0.0659, 0.0506, 0.0638], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 08:52:27,139 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0355, 5.3294, 4.9248, 5.4223, 4.7688, 5.0465, 5.4940, 5.2237], device='cuda:2'), covar=tensor([0.0611, 0.0339, 0.0771, 0.0350, 0.0419, 0.0265, 0.0227, 0.0197], device='cuda:2'), in_proj_covar=tensor([0.0402, 0.0337, 0.0380, 0.0371, 0.0337, 0.0247, 0.0318, 0.0301], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 08:52:34,616 INFO [train2.py:809] (2/4) Epoch 27, batch 500, loss[ctc_loss=0.05043, att_loss=0.2286, loss=0.193, over 17068.00 frames. utt_duration=1290 frames, utt_pad_proportion=0.008789, over 53.00 utterances.], tot_loss[ctc_loss=0.06615, att_loss=0.2321, loss=0.1989, over 3012551.54 frames. utt_duration=1254 frames, utt_pad_proportion=0.05201, over 9619.90 utterances.], batch size: 53, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:53:02,100 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104096.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:53:23,991 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=104110.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:53:41,003 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104121.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:53:53,121 INFO [train2.py:809] (2/4) Epoch 27, batch 550, loss[ctc_loss=0.06153, att_loss=0.2175, loss=0.1863, over 15666.00 frames. utt_duration=1695 frames, utt_pad_proportion=0.006808, over 37.00 utterances.], tot_loss[ctc_loss=0.06579, att_loss=0.2317, loss=0.1985, over 3070976.77 frames. 
utt_duration=1265 frames, utt_pad_proportion=0.05002, over 9722.35 utterances.], batch size: 37, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:54:09,455 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0224, 4.3273, 4.4956, 4.4880, 2.8683, 4.3860, 2.9283, 1.7829], device='cuda:2'), covar=tensor([0.0517, 0.0326, 0.0585, 0.0262, 0.1481, 0.0269, 0.1328, 0.1653], device='cuda:2'), in_proj_covar=tensor([0.0213, 0.0184, 0.0263, 0.0177, 0.0221, 0.0163, 0.0232, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 08:54:29,373 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.746e+02 2.054e+02 2.540e+02 6.808e+02, threshold=4.109e+02, percent-clipped=2.0 2023-03-09 08:54:58,967 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4981, 3.1617, 3.7526, 3.1351, 3.4939, 4.6477, 4.4160, 3.2475], device='cuda:2'), covar=tensor([0.0359, 0.1520, 0.1179, 0.1257, 0.1182, 0.0739, 0.0639, 0.1192], device='cuda:2'), in_proj_covar=tensor([0.0249, 0.0250, 0.0289, 0.0222, 0.0271, 0.0380, 0.0274, 0.0235], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 08:55:07,901 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8671, 6.1635, 5.6624, 5.8219, 5.8139, 5.2320, 5.5101, 5.3303], device='cuda:2'), covar=tensor([0.1258, 0.0781, 0.1032, 0.0803, 0.0901, 0.1668, 0.2441, 0.2250], device='cuda:2'), in_proj_covar=tensor([0.0554, 0.0633, 0.0488, 0.0478, 0.0449, 0.0483, 0.0640, 0.0547], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:55:10,796 INFO [train2.py:809] (2/4) Epoch 27, batch 600, loss[ctc_loss=0.05828, att_loss=0.2354, loss=0.1999, over 16881.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006817, over 49.00 utterances.], tot_loss[ctc_loss=0.06565, att_loss=0.2313, loss=0.1982, over 3119249.82 frames. utt_duration=1289 frames, utt_pad_proportion=0.04366, over 9692.82 utterances.], batch size: 49, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:56:29,435 INFO [train2.py:809] (2/4) Epoch 27, batch 650, loss[ctc_loss=0.0924, att_loss=0.2567, loss=0.2238, over 16761.00 frames. utt_duration=678.7 frames, utt_pad_proportion=0.1474, over 99.00 utterances.], tot_loss[ctc_loss=0.06658, att_loss=0.2322, loss=0.1991, over 3152256.17 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.0552, over 10154.45 utterances.], batch size: 99, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:56:34,214 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104232.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 08:57:05,923 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.363e+02 1.860e+02 2.240e+02 2.877e+02 8.849e+02, threshold=4.479e+02, percent-clipped=3.0 2023-03-09 08:57:29,947 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5741, 4.9619, 4.7929, 4.9025, 4.9772, 4.6122, 3.4354, 4.9332], device='cuda:2'), covar=tensor([0.0134, 0.0124, 0.0146, 0.0095, 0.0100, 0.0130, 0.0785, 0.0207], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0094, 0.0118, 0.0074, 0.0079, 0.0090, 0.0108, 0.0112], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 08:57:47,592 INFO [train2.py:809] (2/4) Epoch 27, batch 700, loss[ctc_loss=0.0772, att_loss=0.2468, loss=0.2129, over 17397.00 frames. utt_duration=1106 frames, utt_pad_proportion=0.03322, over 63.00 utterances.], tot_loss[ctc_loss=0.06668, att_loss=0.2324, loss=0.1993, over 3176696.55 frames. utt_duration=1235 frames, utt_pad_proportion=0.05871, over 10302.85 utterances.], batch size: 63, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:58:36,043 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2227, 5.1862, 4.9615, 3.3188, 5.0045, 4.8211, 4.6551, 2.9705], device='cuda:2'), covar=tensor([0.0100, 0.0102, 0.0283, 0.0770, 0.0101, 0.0187, 0.0229, 0.1157], device='cuda:2'), in_proj_covar=tensor([0.0078, 0.0105, 0.0109, 0.0112, 0.0089, 0.0117, 0.0100, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 08:59:06,612 INFO [train2.py:809] (2/4) Epoch 27, batch 750, loss[ctc_loss=0.05682, att_loss=0.2287, loss=0.1943, over 16668.00 frames. utt_duration=1451 frames, utt_pad_proportion=0.007569, over 46.00 utterances.], tot_loss[ctc_loss=0.06631, att_loss=0.2327, loss=0.1994, over 3213512.94 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05028, over 10291.28 utterances.], batch size: 46, lr: 4.00e-03, grad_scale: 8.0 2023-03-09 08:59:34,747 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104346.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:59:40,826 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7090, 4.9902, 4.6056, 5.0727, 4.4976, 4.7546, 5.1115, 4.9360], device='cuda:2'), covar=tensor([0.0626, 0.0332, 0.0808, 0.0341, 0.0451, 0.0353, 0.0243, 0.0209], device='cuda:2'), in_proj_covar=tensor([0.0402, 0.0338, 0.0381, 0.0373, 0.0339, 0.0247, 0.0318, 0.0301], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 08:59:43,524 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.260e+02 1.900e+02 2.289e+02 2.673e+02 5.162e+02, threshold=4.578e+02, percent-clipped=1.0 2023-03-09 08:59:45,856 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104353.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 08:59:52,097 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9680, 5.0402, 4.7217, 2.5681, 4.7885, 4.7068, 4.0688, 2.5224], device='cuda:2'), covar=tensor([0.0154, 0.0125, 0.0328, 0.1281, 0.0131, 0.0224, 0.0413, 0.1607], device='cuda:2'), in_proj_covar=tensor([0.0078, 0.0105, 0.0109, 0.0112, 0.0089, 0.0117, 0.0100, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 09:00:26,055 INFO [train2.py:809] (2/4) Epoch 27, batch 800, loss[ctc_loss=0.09472, att_loss=0.2572, loss=0.2247, over 17377.00 frames. utt_duration=1009 frames, utt_pad_proportion=0.04728, over 69.00 utterances.], tot_loss[ctc_loss=0.06603, att_loss=0.2325, loss=0.1992, over 3228692.27 frames. utt_duration=1251 frames, utt_pad_proportion=0.0491, over 10333.76 utterances.], batch size: 69, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:00:42,299 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2023-03-09 09:00:54,130 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104396.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:01:08,577 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104405.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:01:33,293 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104421.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:01:42,544 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6240, 3.7330, 3.8186, 2.4084, 2.4568, 3.0031, 2.5458, 3.5055], device='cuda:2'), covar=tensor([0.0693, 0.0501, 0.0412, 0.3829, 0.3687, 0.1808, 0.2650, 0.1205], device='cuda:2'), in_proj_covar=tensor([0.0364, 0.0301, 0.0279, 0.0252, 0.0342, 0.0333, 0.0262, 0.0372], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 09:01:46,500 INFO [train2.py:809] (2/4) Epoch 27, batch 850, loss[ctc_loss=0.06694, att_loss=0.217, loss=0.187, over 16200.00 frames. utt_duration=1582 frames, utt_pad_proportion=0.005123, over 41.00 utterances.], tot_loss[ctc_loss=0.06637, att_loss=0.2326, loss=0.1993, over 3237245.95 frames. 
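The scaling.py:679 records compare a per-module Whitening metric against a limit (e.g. metric=1.20 vs. limit=2.0 in the 09:00:42 record). A generic metric of this flavour is 1.0 when the channel covariance is a multiple of the identity and grows as the activations become less "white"; the sketch below shows one such metric purely as an illustration, not necessarily the exact quantity the training code computes.

```python
import torch

# Minimal sketch of a generic "whiteness" metric: 1.0 for an isotropic
# channel covariance, larger otherwise. This is an illustrative stand-in
# for whatever the scaling.py records actually measure.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels); channels are split into groups
    metrics = []
    for c in x.chunk(num_groups, dim=1):
        c = c - c.mean(dim=0, keepdim=True)
        cov = (c.t() @ c) / c.shape[0]
        d = cov.shape[0]
        metrics.append(d * (cov @ cov).trace() / cov.trace() ** 2)
    return torch.stack(metrics).mean()

x = torch.randn(1000, 384)
print(whitening_metric(x))  # close to 1.0 for (near-)white random features
```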
utt_duration=1236 frames, utt_pad_proportion=0.05568, over 10485.27 utterances.], batch size: 41, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:02:06,068 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0953, 5.3565, 5.6043, 5.3797, 5.6210, 6.0598, 5.3708, 6.1303], device='cuda:2'), covar=tensor([0.0821, 0.0751, 0.0842, 0.1489, 0.1909, 0.0870, 0.0657, 0.0729], device='cuda:2'), in_proj_covar=tensor([0.0907, 0.0526, 0.0638, 0.0687, 0.0903, 0.0664, 0.0510, 0.0643], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 09:02:10,579 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104444.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:02:23,114 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.335e+02 1.830e+02 2.139e+02 2.642e+02 5.994e+02, threshold=4.278e+02, percent-clipped=1.0 2023-03-09 09:02:49,621 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104469.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:03:05,896 INFO [train2.py:809] (2/4) Epoch 27, batch 900, loss[ctc_loss=0.06409, att_loss=0.2383, loss=0.2035, over 16636.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.004781, over 47.00 utterances.], tot_loss[ctc_loss=0.06577, att_loss=0.2322, loss=0.1989, over 3244784.61 frames. utt_duration=1231 frames, utt_pad_proportion=0.05817, over 10554.62 utterances.], batch size: 47, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:04:05,410 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0692, 5.3078, 5.5499, 5.4030, 5.5981, 5.9952, 5.3160, 6.0928], device='cuda:2'), covar=tensor([0.0677, 0.0762, 0.0881, 0.1441, 0.1612, 0.0922, 0.0717, 0.0680], device='cuda:2'), in_proj_covar=tensor([0.0913, 0.0530, 0.0643, 0.0691, 0.0908, 0.0668, 0.0513, 0.0646], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 09:04:05,591 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5872, 2.5599, 2.4753, 2.4275, 2.7920, 2.8547, 2.2598, 3.0672], device='cuda:2'), covar=tensor([0.1724, 0.2255, 0.1952, 0.1566, 0.1868, 0.1191, 0.2532, 0.1239], device='cuda:2'), in_proj_covar=tensor([0.0141, 0.0144, 0.0138, 0.0134, 0.0151, 0.0128, 0.0153, 0.0128], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 09:04:24,927 INFO [train2.py:809] (2/4) Epoch 27, batch 950, loss[ctc_loss=0.08517, att_loss=0.2478, loss=0.2153, over 16691.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005714, over 46.00 utterances.], tot_loss[ctc_loss=0.06598, att_loss=0.232, loss=0.1988, over 3244145.08 frames. 
utt_duration=1221 frames, utt_pad_proportion=0.06292, over 10639.95 utterances.], batch size: 46, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:04:30,383 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104532.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 09:04:35,030 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0532, 2.6891, 2.7569, 4.2068, 3.6959, 3.7412, 2.7061, 1.9508], device='cuda:2'), covar=tensor([0.0862, 0.1954, 0.1104, 0.0622, 0.0986, 0.0508, 0.1760, 0.2451], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0221, 0.0190, 0.0226, 0.0236, 0.0192, 0.0206, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 09:04:50,880 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2023-03-09 09:05:01,192 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.309e+02 1.806e+02 2.235e+02 2.788e+02 5.642e+02, threshold=4.470e+02, percent-clipped=2.0 2023-03-09 09:05:43,839 INFO [train2.py:809] (2/4) Epoch 27, batch 1000, loss[ctc_loss=0.04758, att_loss=0.2038, loss=0.1726, over 15501.00 frames. utt_duration=1724 frames, utt_pad_proportion=0.008007, over 36.00 utterances.], tot_loss[ctc_loss=0.06578, att_loss=0.2315, loss=0.1984, over 3252163.98 frames. utt_duration=1231 frames, utt_pad_proportion=0.06064, over 10582.30 utterances.], batch size: 36, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:05:45,471 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104580.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:05:57,552 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8211, 4.3746, 4.4256, 2.1910, 2.1732, 2.8222, 2.3061, 3.7226], device='cuda:2'), covar=tensor([0.0740, 0.0337, 0.0271, 0.5231, 0.4772, 0.2342, 0.3495, 0.1347], device='cuda:2'), in_proj_covar=tensor([0.0366, 0.0302, 0.0280, 0.0254, 0.0343, 0.0334, 0.0263, 0.0372], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 09:07:03,584 INFO [train2.py:809] (2/4) Epoch 27, batch 1050, loss[ctc_loss=0.06539, att_loss=0.2412, loss=0.206, over 16998.00 frames. utt_duration=688.4 frames, utt_pad_proportion=0.1341, over 99.00 utterances.], tot_loss[ctc_loss=0.06539, att_loss=0.231, loss=0.1979, over 3257548.40 frames. utt_duration=1241 frames, utt_pad_proportion=0.05625, over 10515.92 utterances.], batch size: 99, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:07:30,288 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104646.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:07:40,080 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.329e+02 1.873e+02 2.248e+02 2.692e+02 7.712e+02, threshold=4.496e+02, percent-clipped=6.0 2023-03-09 09:07:41,980 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104653.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:08:22,948 INFO [train2.py:809] (2/4) Epoch 27, batch 1100, loss[ctc_loss=0.05896, att_loss=0.239, loss=0.203, over 16884.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006597, over 49.00 utterances.], tot_loss[ctc_loss=0.06546, att_loss=0.2313, loss=0.1981, over 3263672.66 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05307, over 10447.08 utterances.], batch size: 49, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:08:46,595 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104694.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:08:58,148 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104701.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:09:04,425 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104705.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:09:42,147 INFO [train2.py:809] (2/4) Epoch 27, batch 1150, loss[ctc_loss=0.07175, att_loss=0.2452, loss=0.2105, over 16332.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.006126, over 45.00 utterances.], tot_loss[ctc_loss=0.06503, att_loss=0.2309, loss=0.1977, over 3259735.59 frames. utt_duration=1281 frames, utt_pad_proportion=0.04688, over 10190.96 utterances.], batch size: 45, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:10:15,324 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.86 vs. limit=5.0 2023-03-09 09:10:19,015 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.235e+02 1.763e+02 2.147e+02 2.589e+02 3.796e+02, threshold=4.293e+02, percent-clipped=0.0 2023-03-09 09:10:20,732 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104753.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:10:39,554 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8759, 3.6492, 3.6285, 3.2056, 3.6398, 3.7031, 3.6642, 2.6684], device='cuda:2'), covar=tensor([0.0992, 0.1042, 0.1323, 0.2361, 0.0785, 0.1752, 0.0752, 0.2921], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0204, 0.0218, 0.0268, 0.0180, 0.0278, 0.0200, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 09:11:01,665 INFO [train2.py:809] (2/4) Epoch 27, batch 1200, loss[ctc_loss=0.06731, att_loss=0.2264, loss=0.1945, over 13289.00 frames. utt_duration=1834 frames, utt_pad_proportion=0.0869, over 29.00 utterances.], tot_loss[ctc_loss=0.06538, att_loss=0.2311, loss=0.1979, over 3260207.74 frames. utt_duration=1283 frames, utt_pad_proportion=0.04675, over 10174.18 utterances.], batch size: 29, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:11:03,033 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-09 09:12:21,904 INFO [train2.py:809] (2/4) Epoch 27, batch 1250, loss[ctc_loss=0.05314, att_loss=0.2241, loss=0.1899, over 16393.00 frames. utt_duration=1491 frames, utt_pad_proportion=0.006344, over 44.00 utterances.], tot_loss[ctc_loss=0.06564, att_loss=0.2315, loss=0.1983, over 3261664.46 frames. utt_duration=1292 frames, utt_pad_proportion=0.0451, over 10106.17 utterances.], batch size: 44, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:12:58,970 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.231e+02 1.878e+02 2.209e+02 2.737e+02 5.339e+02, threshold=4.418e+02, percent-clipped=1.0 2023-03-09 09:13:42,097 INFO [train2.py:809] (2/4) Epoch 27, batch 1300, loss[ctc_loss=0.06753, att_loss=0.223, loss=0.1919, over 15772.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.008777, over 38.00 utterances.], tot_loss[ctc_loss=0.0652, att_loss=0.2315, loss=0.1982, over 3266676.54 frames. 
utt_duration=1287 frames, utt_pad_proportion=0.04471, over 10164.72 utterances.], batch size: 38, lr: 3.99e-03, grad_scale: 8.0 2023-03-09 09:14:38,499 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1137, 6.2765, 5.8089, 6.0148, 5.9894, 5.4118, 5.8077, 5.4660], device='cuda:2'), covar=tensor([0.1291, 0.0907, 0.0871, 0.0804, 0.0888, 0.1769, 0.2412, 0.2415], device='cuda:2'), in_proj_covar=tensor([0.0551, 0.0631, 0.0483, 0.0476, 0.0448, 0.0480, 0.0638, 0.0545], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 09:15:01,665 INFO [train2.py:809] (2/4) Epoch 27, batch 1350, loss[ctc_loss=0.1349, att_loss=0.2755, loss=0.2474, over 14054.00 frames. utt_duration=389.3 frames, utt_pad_proportion=0.323, over 145.00 utterances.], tot_loss[ctc_loss=0.06612, att_loss=0.2316, loss=0.1985, over 3268307.69 frames. utt_duration=1265 frames, utt_pad_proportion=0.05078, over 10347.35 utterances.], batch size: 145, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:15:37,628 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.826e+02 2.285e+02 3.088e+02 1.440e+03, threshold=4.571e+02, percent-clipped=4.0 2023-03-09 09:16:11,623 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1620, 5.4173, 5.6911, 5.5003, 5.6683, 6.1003, 5.4179, 6.2041], device='cuda:2'), covar=tensor([0.0679, 0.0689, 0.0727, 0.1315, 0.1762, 0.0876, 0.0620, 0.0599], device='cuda:2'), in_proj_covar=tensor([0.0908, 0.0525, 0.0637, 0.0686, 0.0905, 0.0663, 0.0507, 0.0638], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 09:16:20,668 INFO [train2.py:809] (2/4) Epoch 27, batch 1400, loss[ctc_loss=0.07629, att_loss=0.2527, loss=0.2174, over 17338.00 frames. utt_duration=1177 frames, utt_pad_proportion=0.02234, over 59.00 utterances.], tot_loss[ctc_loss=0.06577, att_loss=0.2311, loss=0.198, over 3263815.93 frames. utt_duration=1274 frames, utt_pad_proportion=0.05088, over 10261.06 utterances.], batch size: 59, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:17:40,904 INFO [train2.py:809] (2/4) Epoch 27, batch 1450, loss[ctc_loss=0.05152, att_loss=0.2217, loss=0.1877, over 16782.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005504, over 48.00 utterances.], tot_loss[ctc_loss=0.06631, att_loss=0.2313, loss=0.1983, over 3256833.79 frames. utt_duration=1216 frames, utt_pad_proportion=0.0671, over 10724.12 utterances.], batch size: 48, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:17:42,043 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-09 09:18:16,914 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.239e+02 1.845e+02 2.127e+02 2.675e+02 5.625e+02, threshold=4.253e+02, percent-clipped=2.0 2023-03-09 09:18:31,483 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6027, 4.9090, 4.5370, 4.9274, 4.3989, 4.5572, 4.9994, 4.8397], device='cuda:2'), covar=tensor([0.0625, 0.0326, 0.0717, 0.0387, 0.0422, 0.0355, 0.0243, 0.0209], device='cuda:2'), in_proj_covar=tensor([0.0401, 0.0337, 0.0379, 0.0375, 0.0338, 0.0247, 0.0317, 0.0301], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 09:18:43,518 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.99 vs. 
limit=5.0 2023-03-09 09:19:00,492 INFO [train2.py:809] (2/4) Epoch 27, batch 1500, loss[ctc_loss=0.06332, att_loss=0.2361, loss=0.2016, over 17060.00 frames. utt_duration=1289 frames, utt_pad_proportion=0.00889, over 53.00 utterances.], tot_loss[ctc_loss=0.06597, att_loss=0.2309, loss=0.1979, over 3262818.13 frames. utt_duration=1236 frames, utt_pad_proportion=0.0609, over 10572.53 utterances.], batch size: 53, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:19:27,155 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0384, 4.9292, 4.8765, 2.2396, 2.0633, 3.0928, 2.4347, 3.7547], device='cuda:2'), covar=tensor([0.0727, 0.0399, 0.0266, 0.5489, 0.5677, 0.2272, 0.3861, 0.1726], device='cuda:2'), in_proj_covar=tensor([0.0363, 0.0301, 0.0279, 0.0254, 0.0343, 0.0332, 0.0263, 0.0371], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 09:20:18,687 INFO [train2.py:809] (2/4) Epoch 27, batch 1550, loss[ctc_loss=0.04663, att_loss=0.2164, loss=0.1825, over 16392.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007304, over 44.00 utterances.], tot_loss[ctc_loss=0.06674, att_loss=0.2319, loss=0.1989, over 3271619.35 frames. utt_duration=1229 frames, utt_pad_proportion=0.05981, over 10662.82 utterances.], batch size: 44, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:20:25,375 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0973, 5.3494, 5.3013, 5.2537, 5.3881, 5.3486, 5.0020, 4.8243], device='cuda:2'), covar=tensor([0.0938, 0.0533, 0.0308, 0.0534, 0.0268, 0.0293, 0.0430, 0.0310], device='cuda:2'), in_proj_covar=tensor([0.0542, 0.0384, 0.0375, 0.0383, 0.0444, 0.0452, 0.0382, 0.0415], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 09:20:47,226 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. limit=2.0 2023-03-09 09:20:55,282 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.873e+02 2.333e+02 2.814e+02 5.399e+02, threshold=4.665e+02, percent-clipped=5.0 2023-03-09 09:21:09,671 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-09 09:21:38,745 INFO [train2.py:809] (2/4) Epoch 27, batch 1600, loss[ctc_loss=0.0649, att_loss=0.2291, loss=0.1962, over 15950.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006518, over 41.00 utterances.], tot_loss[ctc_loss=0.06605, att_loss=0.2312, loss=0.1982, over 3263053.76 frames. utt_duration=1257 frames, utt_pad_proportion=0.05549, over 10397.39 utterances.], batch size: 41, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:22:58,645 INFO [train2.py:809] (2/4) Epoch 27, batch 1650, loss[ctc_loss=0.05165, att_loss=0.2247, loss=0.1901, over 16253.00 frames. utt_duration=1513 frames, utt_pad_proportion=0.008367, over 43.00 utterances.], tot_loss[ctc_loss=0.06563, att_loss=0.2309, loss=0.1978, over 3266867.79 frames. 
utt_duration=1276 frames, utt_pad_proportion=0.04959, over 10253.77 utterances.], batch size: 43, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:23:34,346 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 2.028e+02 2.397e+02 2.825e+02 1.110e+03, threshold=4.794e+02, percent-clipped=4.0 2023-03-09 09:23:52,236 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9350, 3.6594, 3.7015, 3.1457, 3.7363, 3.7507, 3.7122, 2.6006], device='cuda:2'), covar=tensor([0.1089, 0.1043, 0.1301, 0.2968, 0.0816, 0.1853, 0.0820, 0.3416], device='cuda:2'), in_proj_covar=tensor([0.0197, 0.0203, 0.0218, 0.0268, 0.0180, 0.0280, 0.0200, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 09:24:17,719 INFO [train2.py:809] (2/4) Epoch 27, batch 1700, loss[ctc_loss=0.07027, att_loss=0.2222, loss=0.1918, over 15642.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.009115, over 37.00 utterances.], tot_loss[ctc_loss=0.06618, att_loss=0.2314, loss=0.1984, over 3265227.06 frames. utt_duration=1237 frames, utt_pad_proportion=0.06052, over 10575.57 utterances.], batch size: 37, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:25:35,430 INFO [train2.py:809] (2/4) Epoch 27, batch 1750, loss[ctc_loss=0.04206, att_loss=0.213, loss=0.1788, over 16276.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007554, over 43.00 utterances.], tot_loss[ctc_loss=0.06611, att_loss=0.2313, loss=0.1983, over 3258723.42 frames. utt_duration=1246 frames, utt_pad_proportion=0.0596, over 10470.15 utterances.], batch size: 43, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:26:11,195 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-09 09:26:11,489 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.890e+02 2.121e+02 2.571e+02 3.982e+02, threshold=4.242e+02, percent-clipped=0.0 2023-03-09 09:26:54,621 INFO [train2.py:809] (2/4) Epoch 27, batch 1800, loss[ctc_loss=0.05032, att_loss=0.2156, loss=0.1826, over 16397.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006895, over 44.00 utterances.], tot_loss[ctc_loss=0.06661, att_loss=0.2321, loss=0.199, over 3258880.93 frames. utt_duration=1208 frames, utt_pad_proportion=0.06958, over 10805.55 utterances.], batch size: 44, lr: 3.98e-03, grad_scale: 8.0 2023-03-09 09:27:08,776 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=105388.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:28:15,234 INFO [train2.py:809] (2/4) Epoch 27, batch 1850, loss[ctc_loss=0.07002, att_loss=0.218, loss=0.1884, over 15626.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.009278, over 37.00 utterances.], tot_loss[ctc_loss=0.0659, att_loss=0.2317, loss=0.1985, over 3260770.18 frames. utt_duration=1225 frames, utt_pad_proportion=0.06577, over 10661.62 utterances.], batch size: 37, lr: 3.97e-03, grad_scale: 16.0 2023-03-09 09:28:27,407 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. 
limit=2.0 2023-03-09 09:28:28,048 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3267, 2.9909, 3.1470, 4.3851, 3.8927, 3.9038, 2.9404, 2.2820], device='cuda:2'), covar=tensor([0.0735, 0.1643, 0.0970, 0.0511, 0.0866, 0.0470, 0.1468, 0.2080], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0217, 0.0188, 0.0225, 0.0232, 0.0191, 0.0203, 0.0192], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 09:28:47,136 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=105449.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 09:28:51,301 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.738e+02 2.150e+02 2.856e+02 4.488e+02, threshold=4.299e+02, percent-clipped=4.0 2023-03-09 09:29:34,708 INFO [train2.py:809] (2/4) Epoch 27, batch 1900, loss[ctc_loss=0.05273, att_loss=0.2059, loss=0.1753, over 15344.00 frames. utt_duration=1755 frames, utt_pad_proportion=0.01179, over 35.00 utterances.], tot_loss[ctc_loss=0.06639, att_loss=0.2326, loss=0.1994, over 3272961.76 frames. utt_duration=1250 frames, utt_pad_proportion=0.05579, over 10483.93 utterances.], batch size: 35, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:29:45,585 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=105486.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:30:03,895 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.10 vs. limit=5.0 2023-03-09 09:30:11,404 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=105502.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:30:54,627 INFO [train2.py:809] (2/4) Epoch 27, batch 1950, loss[ctc_loss=0.07305, att_loss=0.2487, loss=0.2136, over 17023.00 frames. utt_duration=1311 frames, utt_pad_proportion=0.01051, over 52.00 utterances.], tot_loss[ctc_loss=0.06621, att_loss=0.2327, loss=0.1994, over 3271751.20 frames. utt_duration=1248 frames, utt_pad_proportion=0.05629, over 10495.00 utterances.], batch size: 52, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:31:23,702 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=105547.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:31:32,461 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.251e+02 1.913e+02 2.076e+02 2.767e+02 9.695e+02, threshold=4.152e+02, percent-clipped=4.0 2023-03-09 09:31:49,343 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=105563.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:32:14,215 INFO [train2.py:809] (2/4) Epoch 27, batch 2000, loss[ctc_loss=0.05629, att_loss=0.2246, loss=0.1909, over 16884.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006641, over 49.00 utterances.], tot_loss[ctc_loss=0.06581, att_loss=0.2326, loss=0.1992, over 3272659.03 frames. 
utt_duration=1247 frames, utt_pad_proportion=0.05583, over 10508.64 utterances.], batch size: 49, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:32:17,561 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4108, 2.9742, 3.6477, 3.1338, 3.5634, 4.4944, 4.3364, 3.5355], device='cuda:2'), covar=tensor([0.0382, 0.1836, 0.1319, 0.1232, 0.1153, 0.0927, 0.0554, 0.0994], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0252, 0.0294, 0.0223, 0.0274, 0.0385, 0.0277, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 09:32:18,439 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2023-03-09 09:32:36,519 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0657, 5.3431, 5.5932, 5.4266, 5.5721, 6.0589, 5.3462, 6.1315], device='cuda:2'), covar=tensor([0.0732, 0.0713, 0.0738, 0.1383, 0.1804, 0.0852, 0.0604, 0.0575], device='cuda:2'), in_proj_covar=tensor([0.0915, 0.0525, 0.0640, 0.0688, 0.0906, 0.0663, 0.0507, 0.0642], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 09:32:39,709 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9169, 3.5292, 3.6121, 2.7210, 3.4985, 3.6324, 3.6942, 2.1308], device='cuda:2'), covar=tensor([0.1227, 0.1681, 0.2111, 0.5207, 0.2065, 0.3003, 0.0957, 0.6497], device='cuda:2'), in_proj_covar=tensor([0.0199, 0.0205, 0.0220, 0.0271, 0.0182, 0.0281, 0.0202, 0.0230], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 09:33:09,046 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0611, 3.7921, 3.1817, 3.4182, 3.9370, 3.6444, 3.0133, 4.1546], device='cuda:2'), covar=tensor([0.1008, 0.0508, 0.1126, 0.0746, 0.0758, 0.0712, 0.0868, 0.0525], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0226, 0.0230, 0.0206, 0.0289, 0.0248, 0.0203, 0.0296], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 09:33:34,514 INFO [train2.py:809] (2/4) Epoch 27, batch 2050, loss[ctc_loss=0.05674, att_loss=0.2235, loss=0.1901, over 16121.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006541, over 42.00 utterances.], tot_loss[ctc_loss=0.06657, att_loss=0.2337, loss=0.2002, over 3280691.90 frames. utt_duration=1225 frames, utt_pad_proportion=0.05886, over 10729.82 utterances.], batch size: 42, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:34:11,423 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.673e+02 2.035e+02 2.438e+02 4.266e+02, threshold=4.071e+02, percent-clipped=1.0 2023-03-09 09:34:52,219 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9822, 5.2400, 5.2043, 5.2202, 5.3048, 5.2553, 4.9279, 4.7053], device='cuda:2'), covar=tensor([0.1064, 0.0608, 0.0347, 0.0509, 0.0285, 0.0330, 0.0441, 0.0363], device='cuda:2'), in_proj_covar=tensor([0.0539, 0.0384, 0.0372, 0.0378, 0.0443, 0.0451, 0.0381, 0.0414], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 09:34:53,526 INFO [train2.py:809] (2/4) Epoch 27, batch 2100, loss[ctc_loss=0.06966, att_loss=0.2339, loss=0.201, over 17042.00 frames. utt_duration=1338 frames, utt_pad_proportion=0.00658, over 51.00 utterances.], tot_loss[ctc_loss=0.067, att_loss=0.2335, loss=0.2002, over 3288714.77 frames. 
utt_duration=1234 frames, utt_pad_proportion=0.05483, over 10675.75 utterances.], batch size: 51, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:36:13,362 INFO [train2.py:809] (2/4) Epoch 27, batch 2150, loss[ctc_loss=0.06732, att_loss=0.2419, loss=0.207, over 17427.00 frames. utt_duration=1012 frames, utt_pad_proportion=0.04573, over 69.00 utterances.], tot_loss[ctc_loss=0.06644, att_loss=0.2331, loss=0.1998, over 3286097.68 frames. utt_duration=1215 frames, utt_pad_proportion=0.06003, over 10835.79 utterances.], batch size: 69, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:36:37,236 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=105744.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 09:36:51,251 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.182e+02 1.815e+02 2.188e+02 2.690e+02 5.528e+02, threshold=4.376e+02, percent-clipped=1.0 2023-03-09 09:36:59,180 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5353, 3.0253, 3.4554, 4.5117, 4.0166, 3.8654, 3.0470, 2.5283], device='cuda:2'), covar=tensor([0.0673, 0.1771, 0.0830, 0.0549, 0.0843, 0.0540, 0.1453, 0.1952], device='cuda:2'), in_proj_covar=tensor([0.0191, 0.0221, 0.0190, 0.0228, 0.0234, 0.0193, 0.0206, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 09:37:32,705 INFO [train2.py:809] (2/4) Epoch 27, batch 2200, loss[ctc_loss=0.07238, att_loss=0.2505, loss=0.2149, over 16774.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006303, over 48.00 utterances.], tot_loss[ctc_loss=0.0658, att_loss=0.2319, loss=0.1986, over 3275247.40 frames. utt_duration=1245 frames, utt_pad_proportion=0.05542, over 10533.42 utterances.], batch size: 48, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:37:47,010 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7072, 3.1945, 3.8130, 3.1386, 3.7989, 4.7163, 4.5728, 3.4689], device='cuda:2'), covar=tensor([0.0348, 0.1622, 0.1117, 0.1374, 0.1036, 0.0852, 0.0552, 0.1142], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0253, 0.0294, 0.0223, 0.0274, 0.0385, 0.0277, 0.0237], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 09:37:53,009 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=105792.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:37:57,090 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.05 vs. limit=5.0 2023-03-09 09:38:04,357 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=105799.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:38:38,771 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.44 vs. limit=5.0 2023-03-09 09:38:52,244 INFO [train2.py:809] (2/4) Epoch 27, batch 2250, loss[ctc_loss=0.0651, att_loss=0.2186, loss=0.1879, over 15771.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008495, over 38.00 utterances.], tot_loss[ctc_loss=0.06559, att_loss=0.2317, loss=0.1985, over 3272693.30 frames. utt_duration=1253 frames, utt_pad_proportion=0.05465, over 10458.41 utterances.], batch size: 38, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:39:13,138 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=105842.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:39:20,529 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. 
limit=2.0 2023-03-09 09:39:30,859 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.345e+02 1.815e+02 2.086e+02 2.504e+02 6.374e+02, threshold=4.173e+02, percent-clipped=3.0 2023-03-09 09:39:31,329 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=105853.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:39:39,846 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=105858.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:39:43,202 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=105860.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:40:12,503 INFO [train2.py:809] (2/4) Epoch 27, batch 2300, loss[ctc_loss=0.05754, att_loss=0.2168, loss=0.1849, over 15767.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.00895, over 38.00 utterances.], tot_loss[ctc_loss=0.06526, att_loss=0.2317, loss=0.1984, over 3278183.55 frames. utt_duration=1250 frames, utt_pad_proportion=0.05316, over 10501.82 utterances.], batch size: 38, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:41:31,986 INFO [train2.py:809] (2/4) Epoch 27, batch 2350, loss[ctc_loss=0.07795, att_loss=0.2426, loss=0.2097, over 17057.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.00852, over 52.00 utterances.], tot_loss[ctc_loss=0.0657, att_loss=0.232, loss=0.1987, over 3281697.71 frames. utt_duration=1246 frames, utt_pad_proportion=0.05372, over 10546.23 utterances.], batch size: 52, lr: 3.97e-03, grad_scale: 8.0 2023-03-09 09:41:59,628 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.77 vs. limit=5.0 2023-03-09 09:42:09,624 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.259e+02 1.971e+02 2.209e+02 2.522e+02 5.529e+02, threshold=4.418e+02, percent-clipped=2.0 2023-03-09 09:42:51,495 INFO [train2.py:809] (2/4) Epoch 27, batch 2400, loss[ctc_loss=0.08142, att_loss=0.2404, loss=0.2086, over 17292.00 frames. utt_duration=1099 frames, utt_pad_proportion=0.0332, over 63.00 utterances.], tot_loss[ctc_loss=0.06561, att_loss=0.2314, loss=0.1983, over 3278480.30 frames. utt_duration=1251 frames, utt_pad_proportion=0.05368, over 10491.87 utterances.], batch size: 63, lr: 3.96e-03, grad_scale: 8.0 2023-03-09 09:44:16,148 INFO [train2.py:809] (2/4) Epoch 27, batch 2450, loss[ctc_loss=0.06839, att_loss=0.2471, loss=0.2114, over 17315.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02058, over 59.00 utterances.], tot_loss[ctc_loss=0.06565, att_loss=0.2313, loss=0.1982, over 3276081.58 frames. 
utt_duration=1262 frames, utt_pad_proportion=0.051, over 10392.09 utterances.], batch size: 59, lr: 3.96e-03, grad_scale: 4.0 2023-03-09 09:44:40,196 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=106044.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:44:55,919 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.307e+02 1.876e+02 2.325e+02 2.817e+02 5.594e+02, threshold=4.649e+02, percent-clipped=2.0 2023-03-09 09:45:22,173 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0031, 6.2621, 5.7031, 5.9237, 5.9309, 5.3343, 5.7092, 5.2511], device='cuda:2'), covar=tensor([0.1375, 0.0850, 0.0966, 0.0866, 0.0879, 0.1517, 0.2191, 0.2355], device='cuda:2'), in_proj_covar=tensor([0.0559, 0.0635, 0.0483, 0.0477, 0.0451, 0.0480, 0.0645, 0.0545], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 09:45:33,187 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=106077.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:45:35,772 INFO [train2.py:809] (2/4) Epoch 27, batch 2500, loss[ctc_loss=0.0829, att_loss=0.2247, loss=0.1964, over 15875.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.01, over 39.00 utterances.], tot_loss[ctc_loss=0.065, att_loss=0.2304, loss=0.1973, over 3269223.02 frames. utt_duration=1280 frames, utt_pad_proportion=0.0468, over 10224.51 utterances.], batch size: 39, lr: 3.96e-03, grad_scale: 4.0 2023-03-09 09:45:57,173 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=106092.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:46:56,807 INFO [train2.py:809] (2/4) Epoch 27, batch 2550, loss[ctc_loss=0.08366, att_loss=0.2652, loss=0.2289, over 17290.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01193, over 55.00 utterances.], tot_loss[ctc_loss=0.06517, att_loss=0.2311, loss=0.198, over 3270190.66 frames. utt_duration=1262 frames, utt_pad_proportion=0.0525, over 10379.47 utterances.], batch size: 55, lr: 3.96e-03, grad_scale: 4.0 2023-03-09 09:47:12,045 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=106138.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:47:18,007 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=106142.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:47:27,656 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=106148.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:47:36,809 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.781e+02 2.163e+02 2.660e+02 6.949e+02, threshold=4.326e+02, percent-clipped=5.0 2023-03-09 09:47:39,211 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=106155.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:47:44,053 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=106158.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:48:16,575 INFO [train2.py:809] (2/4) Epoch 27, batch 2600, loss[ctc_loss=0.05536, att_loss=0.2445, loss=0.2066, over 16974.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.00614, over 50.00 utterances.], tot_loss[ctc_loss=0.06637, att_loss=0.2323, loss=0.1991, over 3272311.75 frames. 
utt_duration=1222 frames, utt_pad_proportion=0.06084, over 10723.20 utterances.], batch size: 50, lr: 3.96e-03, grad_scale: 4.0 2023-03-09 09:48:33,839 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=106190.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:48:41,769 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2023-03-09 09:49:01,042 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=106206.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:49:09,961 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9488, 4.0738, 3.9180, 4.3091, 2.5886, 4.1803, 2.6636, 1.8920], device='cuda:2'), covar=tensor([0.0507, 0.0292, 0.0789, 0.0288, 0.1847, 0.0256, 0.1553, 0.1805], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0183, 0.0265, 0.0178, 0.0222, 0.0164, 0.0231, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 09:49:37,252 INFO [train2.py:809] (2/4) Epoch 27, batch 2650, loss[ctc_loss=0.05934, att_loss=0.2466, loss=0.2092, over 16859.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.008692, over 49.00 utterances.], tot_loss[ctc_loss=0.06686, att_loss=0.2328, loss=0.1996, over 3269983.91 frames. utt_duration=1199 frames, utt_pad_proportion=0.06627, over 10921.47 utterances.], batch size: 49, lr: 3.96e-03, grad_scale: 4.0 2023-03-09 09:49:39,740 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2023-03-09 09:50:17,426 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.781e+02 2.178e+02 2.674e+02 4.972e+02, threshold=4.357e+02, percent-clipped=1.0 2023-03-09 09:50:39,606 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=106268.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:50:55,948 INFO [train2.py:809] (2/4) Epoch 27, batch 2700, loss[ctc_loss=0.06133, att_loss=0.2328, loss=0.1985, over 16861.00 frames. utt_duration=1378 frames, utt_pad_proportion=0.007919, over 49.00 utterances.], tot_loss[ctc_loss=0.0671, att_loss=0.2332, loss=0.2, over 3278287.28 frames. utt_duration=1210 frames, utt_pad_proportion=0.0611, over 10851.34 utterances.], batch size: 49, lr: 3.96e-03, grad_scale: 4.0 2023-03-09 09:51:02,283 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1494, 6.3548, 5.8953, 6.0290, 6.0419, 5.5448, 5.8955, 5.4965], device='cuda:2'), covar=tensor([0.1023, 0.0736, 0.0738, 0.0709, 0.0671, 0.1296, 0.1668, 0.2143], device='cuda:2'), in_proj_covar=tensor([0.0555, 0.0629, 0.0480, 0.0473, 0.0448, 0.0475, 0.0637, 0.0543], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 09:51:22,459 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.90 vs. limit=5.0 2023-03-09 09:51:48,514 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.79 vs. 
limit=2.0 2023-03-09 09:52:10,949 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9551, 5.3379, 5.3630, 5.3496, 5.3881, 5.3742, 5.0929, 4.9151], device='cuda:2'), covar=tensor([0.1393, 0.0681, 0.0321, 0.0534, 0.0430, 0.0387, 0.0426, 0.0364], device='cuda:2'), in_proj_covar=tensor([0.0544, 0.0386, 0.0376, 0.0383, 0.0448, 0.0455, 0.0383, 0.0419], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 09:52:15,252 INFO [train2.py:809] (2/4) Epoch 27, batch 2750, loss[ctc_loss=0.06206, att_loss=0.2197, loss=0.1882, over 16186.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.005936, over 41.00 utterances.], tot_loss[ctc_loss=0.06678, att_loss=0.233, loss=0.1997, over 3279183.99 frames. utt_duration=1212 frames, utt_pad_proportion=0.06062, over 10832.62 utterances.], batch size: 41, lr: 3.96e-03, grad_scale: 4.0 2023-03-09 09:52:15,649 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=106329.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:52:27,156 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=106336.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:52:55,847 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.858e+02 2.130e+02 2.638e+02 1.636e+03, threshold=4.261e+02, percent-clipped=5.0 2023-03-09 09:53:34,594 INFO [train2.py:809] (2/4) Epoch 27, batch 2800, loss[ctc_loss=0.06047, att_loss=0.2327, loss=0.1983, over 16318.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006946, over 45.00 utterances.], tot_loss[ctc_loss=0.06667, att_loss=0.2326, loss=0.1994, over 3282396.85 frames. utt_duration=1231 frames, utt_pad_proportion=0.05583, over 10679.35 utterances.], batch size: 45, lr: 3.96e-03, grad_scale: 8.0 2023-03-09 09:54:04,682 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=106397.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:54:55,104 INFO [train2.py:809] (2/4) Epoch 27, batch 2850, loss[ctc_loss=0.04845, att_loss=0.2112, loss=0.1786, over 16171.00 frames. utt_duration=1579 frames, utt_pad_proportion=0.007466, over 41.00 utterances.], tot_loss[ctc_loss=0.06595, att_loss=0.2324, loss=0.1991, over 3284622.35 frames. utt_duration=1229 frames, utt_pad_proportion=0.05581, over 10699.33 utterances.], batch size: 41, lr: 3.96e-03, grad_scale: 8.0 2023-03-09 09:55:01,768 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=106433.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:55:26,919 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=106448.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:55:36,025 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.809e+02 2.032e+02 2.602e+02 5.247e+02, threshold=4.065e+02, percent-clipped=2.0 2023-03-09 09:55:36,429 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=106454.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:55:37,251 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-09 09:55:37,989 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=106455.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:55:54,834 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. 
limit=2.0 2023-03-09 09:56:15,397 INFO [train2.py:809] (2/4) Epoch 27, batch 2900, loss[ctc_loss=0.0605, att_loss=0.2418, loss=0.2056, over 16782.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005563, over 48.00 utterances.], tot_loss[ctc_loss=0.06566, att_loss=0.2325, loss=0.1992, over 3278715.02 frames. utt_duration=1208 frames, utt_pad_proportion=0.06299, over 10870.43 utterances.], batch size: 48, lr: 3.96e-03, grad_scale: 8.0 2023-03-09 09:56:42,921 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=106496.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:56:53,782 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=106503.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:57:13,213 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=106515.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 09:57:34,530 INFO [train2.py:809] (2/4) Epoch 27, batch 2950, loss[ctc_loss=0.06675, att_loss=0.2351, loss=0.2014, over 17280.00 frames. utt_duration=1173 frames, utt_pad_proportion=0.02495, over 59.00 utterances.], tot_loss[ctc_loss=0.06601, att_loss=0.2325, loss=0.1992, over 3274390.03 frames. utt_duration=1200 frames, utt_pad_proportion=0.06559, over 10930.72 utterances.], batch size: 59, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 09:57:38,647 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7261, 2.4019, 2.5197, 2.7513, 2.7159, 2.7964, 2.6246, 3.1373], device='cuda:2'), covar=tensor([0.2289, 0.2569, 0.2181, 0.1994, 0.2298, 0.1356, 0.2059, 0.3027], device='cuda:2'), in_proj_covar=tensor([0.0142, 0.0143, 0.0139, 0.0134, 0.0152, 0.0129, 0.0150, 0.0128], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 09:57:57,807 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1015, 5.3495, 5.2619, 5.2518, 5.3731, 5.3290, 4.9511, 4.7785], device='cuda:2'), covar=tensor([0.0973, 0.0547, 0.0351, 0.0533, 0.0311, 0.0338, 0.0443, 0.0369], device='cuda:2'), in_proj_covar=tensor([0.0544, 0.0385, 0.0377, 0.0381, 0.0447, 0.0452, 0.0382, 0.0417], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 09:58:15,059 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.306e+02 2.110e+02 2.427e+02 2.893e+02 4.636e+02, threshold=4.854e+02, percent-clipped=4.0 2023-03-09 09:58:35,549 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0030, 4.3500, 4.6385, 4.4495, 2.7175, 4.2325, 3.0776, 1.9355], device='cuda:2'), covar=tensor([0.0543, 0.0390, 0.0539, 0.0291, 0.1565, 0.0310, 0.1215, 0.1631], device='cuda:2'), in_proj_covar=tensor([0.0216, 0.0185, 0.0266, 0.0179, 0.0223, 0.0166, 0.0232, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 09:58:53,985 INFO [train2.py:809] (2/4) Epoch 27, batch 3000, loss[ctc_loss=0.0502, att_loss=0.2168, loss=0.1835, over 16395.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007364, over 44.00 utterances.], tot_loss[ctc_loss=0.06553, att_loss=0.232, loss=0.1987, over 3268505.80 frames. 
utt_duration=1220 frames, utt_pad_proportion=0.06118, over 10733.42 utterances.], batch size: 44, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 09:58:53,985 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 09:59:07,587 INFO [train2.py:843] (2/4) Epoch 27, validation: ctc_loss=0.04056, att_loss=0.2346, loss=0.1958, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 09:59:07,588 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 09:59:18,581 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3451, 4.5246, 4.6052, 4.6618, 5.1535, 4.5176, 4.6211, 2.6194], device='cuda:2'), covar=tensor([0.0334, 0.0406, 0.0374, 0.0332, 0.0703, 0.0255, 0.0350, 0.1725], device='cuda:2'), in_proj_covar=tensor([0.0191, 0.0219, 0.0214, 0.0230, 0.0379, 0.0189, 0.0205, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:00:20,319 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=106624.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:00:28,227 INFO [train2.py:809] (2/4) Epoch 27, batch 3050, loss[ctc_loss=0.06979, att_loss=0.2513, loss=0.215, over 17442.00 frames. utt_duration=1109 frames, utt_pad_proportion=0.02969, over 63.00 utterances.], tot_loss[ctc_loss=0.06586, att_loss=0.2326, loss=0.1992, over 3272768.72 frames. utt_duration=1227 frames, utt_pad_proportion=0.0576, over 10680.96 utterances.], batch size: 63, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 10:00:38,418 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7427, 4.9986, 4.9244, 4.8987, 5.0300, 4.9992, 4.6219, 4.5389], device='cuda:2'), covar=tensor([0.1055, 0.0584, 0.0409, 0.0557, 0.0308, 0.0342, 0.0470, 0.0336], device='cuda:2'), in_proj_covar=tensor([0.0542, 0.0385, 0.0378, 0.0382, 0.0446, 0.0452, 0.0383, 0.0416], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 10:00:50,554 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.14 vs. limit=5.0 2023-03-09 10:01:08,157 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.208e+02 1.917e+02 2.259e+02 2.641e+02 5.465e+02, threshold=4.518e+02, percent-clipped=3.0 2023-03-09 10:01:46,961 INFO [train2.py:809] (2/4) Epoch 27, batch 3100, loss[ctc_loss=0.04776, att_loss=0.2056, loss=0.174, over 16267.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.008011, over 43.00 utterances.], tot_loss[ctc_loss=0.06511, att_loss=0.2316, loss=0.1983, over 3274949.00 frames. utt_duration=1241 frames, utt_pad_proportion=0.05437, over 10566.49 utterances.], batch size: 43, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 10:02:08,959 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=106692.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:02:27,131 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.80 vs. limit=2.0 2023-03-09 10:03:06,632 INFO [train2.py:809] (2/4) Epoch 27, batch 3150, loss[ctc_loss=0.05899, att_loss=0.2167, loss=0.1852, over 16129.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.005987, over 42.00 utterances.], tot_loss[ctc_loss=0.06463, att_loss=0.2311, loss=0.1978, over 3277025.96 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.05312, over 10556.30 utterances.], batch size: 42, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 10:03:13,731 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=106733.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:03:20,462 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4050, 5.7057, 5.1379, 5.4746, 5.3376, 4.8395, 5.1022, 4.9203], device='cuda:2'), covar=tensor([0.1366, 0.0883, 0.0979, 0.0846, 0.0911, 0.1548, 0.2172, 0.2249], device='cuda:2'), in_proj_covar=tensor([0.0559, 0.0638, 0.0486, 0.0478, 0.0453, 0.0480, 0.0640, 0.0544], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 10:03:38,303 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5689, 2.7930, 3.2960, 4.4452, 3.9029, 3.8662, 2.9413, 2.5214], device='cuda:2'), covar=tensor([0.0622, 0.1973, 0.0989, 0.0434, 0.0923, 0.0484, 0.1455, 0.1966], device='cuda:2'), in_proj_covar=tensor([0.0187, 0.0217, 0.0187, 0.0224, 0.0232, 0.0190, 0.0203, 0.0190], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:03:47,285 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.396e+02 1.911e+02 2.328e+02 2.721e+02 4.436e+02, threshold=4.656e+02, percent-clipped=0.0 2023-03-09 10:04:21,826 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2023-03-09 10:04:22,749 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2023-03-09 10:04:26,570 INFO [train2.py:809] (2/4) Epoch 27, batch 3200, loss[ctc_loss=0.05301, att_loss=0.2088, loss=0.1777, over 15778.00 frames. utt_duration=1663 frames, utt_pad_proportion=0.008023, over 38.00 utterances.], tot_loss[ctc_loss=0.0643, att_loss=0.231, loss=0.1976, over 3270105.61 frames. utt_duration=1232 frames, utt_pad_proportion=0.05798, over 10626.67 utterances.], batch size: 38, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 10:04:30,371 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=106781.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:04:44,252 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2545, 5.5112, 5.0679, 5.5740, 4.9679, 5.1295, 5.6444, 5.4088], device='cuda:2'), covar=tensor([0.0527, 0.0264, 0.0647, 0.0298, 0.0325, 0.0208, 0.0189, 0.0191], device='cuda:2'), in_proj_covar=tensor([0.0396, 0.0336, 0.0376, 0.0370, 0.0333, 0.0243, 0.0315, 0.0298], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 10:04:56,533 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9812, 4.9895, 4.7411, 2.8721, 4.8089, 4.6710, 4.1876, 2.7447], device='cuda:2'), covar=tensor([0.0137, 0.0122, 0.0326, 0.1116, 0.0121, 0.0222, 0.0375, 0.1521], device='cuda:2'), in_proj_covar=tensor([0.0080, 0.0109, 0.0113, 0.0115, 0.0092, 0.0121, 0.0104, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 10:05:17,439 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=106810.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:05:47,160 INFO [train2.py:809] (2/4) Epoch 27, batch 3250, loss[ctc_loss=0.07247, att_loss=0.242, loss=0.2081, over 17099.00 frames. 
utt_duration=1223 frames, utt_pad_proportion=0.01507, over 56.00 utterances.], tot_loss[ctc_loss=0.0642, att_loss=0.2304, loss=0.1971, over 3265285.87 frames. utt_duration=1251 frames, utt_pad_proportion=0.05486, over 10451.23 utterances.], batch size: 56, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 10:06:27,323 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.302e+02 1.765e+02 2.095e+02 2.388e+02 3.844e+02, threshold=4.190e+02, percent-clipped=0.0 2023-03-09 10:06:57,029 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2023-03-09 10:07:07,566 INFO [train2.py:809] (2/4) Epoch 27, batch 3300, loss[ctc_loss=0.05137, att_loss=0.2203, loss=0.1865, over 16119.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005898, over 42.00 utterances.], tot_loss[ctc_loss=0.06412, att_loss=0.2305, loss=0.1972, over 3269222.29 frames. utt_duration=1261 frames, utt_pad_proportion=0.05171, over 10384.53 utterances.], batch size: 42, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 10:08:02,668 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4025, 4.4927, 4.5265, 4.5715, 5.0903, 4.4007, 4.5063, 2.5146], device='cuda:2'), covar=tensor([0.0301, 0.0365, 0.0343, 0.0314, 0.0655, 0.0286, 0.0355, 0.1845], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0221, 0.0217, 0.0233, 0.0383, 0.0192, 0.0207, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:08:19,440 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=106924.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:08:27,389 INFO [train2.py:809] (2/4) Epoch 27, batch 3350, loss[ctc_loss=0.06405, att_loss=0.2125, loss=0.1828, over 15493.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009392, over 36.00 utterances.], tot_loss[ctc_loss=0.06535, att_loss=0.2316, loss=0.1984, over 3269360.33 frames. utt_duration=1255 frames, utt_pad_proportion=0.0528, over 10436.89 utterances.], batch size: 36, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 10:09:07,449 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.952e+02 2.382e+02 3.093e+02 6.808e+02, threshold=4.763e+02, percent-clipped=6.0 2023-03-09 10:09:22,186 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-03-09 10:09:26,514 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1072, 5.1648, 4.9964, 2.6458, 2.1820, 3.4543, 2.7698, 4.0257], device='cuda:2'), covar=tensor([0.0730, 0.0409, 0.0282, 0.4145, 0.5276, 0.1833, 0.3417, 0.1607], device='cuda:2'), in_proj_covar=tensor([0.0361, 0.0299, 0.0278, 0.0250, 0.0338, 0.0330, 0.0262, 0.0370], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 10:09:35,730 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=106972.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:09:47,289 INFO [train2.py:809] (2/4) Epoch 27, batch 3400, loss[ctc_loss=0.05735, att_loss=0.2069, loss=0.177, over 11874.00 frames. utt_duration=1828 frames, utt_pad_proportion=0.1678, over 26.00 utterances.], tot_loss[ctc_loss=0.06474, att_loss=0.2313, loss=0.198, over 3265305.68 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.05147, over 10366.20 utterances.], batch size: 26, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 10:09:49,111 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4076, 3.0095, 3.6917, 3.0769, 3.5279, 4.5693, 4.3999, 3.2677], device='cuda:2'), covar=tensor([0.0409, 0.1681, 0.1171, 0.1292, 0.1064, 0.0757, 0.0614, 0.1178], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0252, 0.0296, 0.0223, 0.0273, 0.0386, 0.0278, 0.0239], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:10:09,118 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=106992.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:11:07,805 INFO [train2.py:809] (2/4) Epoch 27, batch 3450, loss[ctc_loss=0.06924, att_loss=0.236, loss=0.2026, over 17308.00 frames. utt_duration=1100 frames, utt_pad_proportion=0.03817, over 63.00 utterances.], tot_loss[ctc_loss=0.06563, att_loss=0.232, loss=0.1988, over 3261953.92 frames. utt_duration=1244 frames, utt_pad_proportion=0.0557, over 10498.94 utterances.], batch size: 63, lr: 3.95e-03, grad_scale: 8.0 2023-03-09 10:11:25,846 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=107040.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:11:48,076 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.835e+02 2.250e+02 2.844e+02 5.706e+02, threshold=4.501e+02, percent-clipped=2.0 2023-03-09 10:12:26,814 INFO [train2.py:809] (2/4) Epoch 27, batch 3500, loss[ctc_loss=0.05417, att_loss=0.2151, loss=0.1829, over 15904.00 frames. utt_duration=1633 frames, utt_pad_proportion=0.007914, over 39.00 utterances.], tot_loss[ctc_loss=0.06571, att_loss=0.2324, loss=0.1991, over 3263106.91 frames. utt_duration=1234 frames, utt_pad_proportion=0.05756, over 10588.92 utterances.], batch size: 39, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:12:37,311 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7288, 3.0681, 3.0349, 2.7443, 3.0270, 2.9942, 3.1123, 2.3032], device='cuda:2'), covar=tensor([0.1035, 0.1314, 0.2007, 0.2873, 0.1091, 0.1770, 0.1053, 0.3178], device='cuda:2'), in_proj_covar=tensor([0.0202, 0.0207, 0.0223, 0.0272, 0.0183, 0.0284, 0.0206, 0.0230], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:13:15,828 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=107110.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:13:45,757 INFO [train2.py:809] (2/4) Epoch 27, batch 3550, loss[ctc_loss=0.09201, att_loss=0.2581, loss=0.2248, over 17349.00 frames. utt_duration=1007 frames, utt_pad_proportion=0.04886, over 69.00 utterances.], tot_loss[ctc_loss=0.06635, att_loss=0.2327, loss=0.1994, over 3264453.01 frames. 
utt_duration=1218 frames, utt_pad_proportion=0.06114, over 10731.97 utterances.], batch size: 69, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:13:51,078 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1873, 5.4073, 5.6863, 5.4138, 5.6875, 6.1349, 5.3487, 6.1868], device='cuda:2'), covar=tensor([0.0686, 0.0695, 0.0871, 0.1454, 0.1726, 0.0840, 0.0713, 0.0741], device='cuda:2'), in_proj_covar=tensor([0.0914, 0.0526, 0.0644, 0.0683, 0.0910, 0.0667, 0.0511, 0.0644], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:14:03,584 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=107140.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 10:14:22,409 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=107152.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:14:24,971 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.809e+01 1.862e+02 2.158e+02 2.587e+02 5.069e+02, threshold=4.316e+02, percent-clipped=3.0 2023-03-09 10:14:31,568 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=107158.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:15:05,779 INFO [train2.py:809] (2/4) Epoch 27, batch 3600, loss[ctc_loss=0.07813, att_loss=0.2392, loss=0.207, over 16410.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.006755, over 44.00 utterances.], tot_loss[ctc_loss=0.066, att_loss=0.2326, loss=0.1993, over 3268155.12 frames. utt_duration=1221 frames, utt_pad_proportion=0.06033, over 10723.29 utterances.], batch size: 44, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:15:41,174 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=107201.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 10:15:59,521 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=107213.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 10:16:24,610 INFO [train2.py:809] (2/4) Epoch 27, batch 3650, loss[ctc_loss=0.0497, att_loss=0.2088, loss=0.177, over 15777.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.007526, over 38.00 utterances.], tot_loss[ctc_loss=0.06565, att_loss=0.2315, loss=0.1983, over 3261008.20 frames. utt_duration=1229 frames, utt_pad_proportion=0.06216, over 10625.80 utterances.], batch size: 38, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:17:03,101 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.892e+02 2.154e+02 2.728e+02 3.735e+02, threshold=4.308e+02, percent-clipped=0.0 2023-03-09 10:17:32,394 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1640, 4.5447, 4.6886, 4.6205, 3.0096, 4.4689, 3.0008, 2.1130], device='cuda:2'), covar=tensor([0.0469, 0.0287, 0.0573, 0.0316, 0.1404, 0.0260, 0.1250, 0.1525], device='cuda:2'), in_proj_covar=tensor([0.0218, 0.0186, 0.0268, 0.0181, 0.0224, 0.0167, 0.0233, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:17:43,432 INFO [train2.py:809] (2/4) Epoch 27, batch 3700, loss[ctc_loss=0.06115, att_loss=0.2442, loss=0.2076, over 17053.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.008911, over 53.00 utterances.], tot_loss[ctc_loss=0.06577, att_loss=0.2314, loss=0.1983, over 3256321.80 frames. 
utt_duration=1207 frames, utt_pad_proportion=0.07021, over 10805.63 utterances.], batch size: 53, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:19:01,339 INFO [train2.py:809] (2/4) Epoch 27, batch 3750, loss[ctc_loss=0.0732, att_loss=0.2452, loss=0.2108, over 17357.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.02131, over 59.00 utterances.], tot_loss[ctc_loss=0.06569, att_loss=0.2313, loss=0.1982, over 3263808.81 frames. utt_duration=1226 frames, utt_pad_proportion=0.06382, over 10660.31 utterances.], batch size: 59, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:19:40,407 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.252e+02 1.917e+02 2.215e+02 2.702e+02 1.402e+03, threshold=4.430e+02, percent-clipped=2.0 2023-03-09 10:20:19,855 INFO [train2.py:809] (2/4) Epoch 27, batch 3800, loss[ctc_loss=0.07317, att_loss=0.2491, loss=0.2139, over 17344.00 frames. utt_duration=1263 frames, utt_pad_proportion=0.009597, over 55.00 utterances.], tot_loss[ctc_loss=0.06601, att_loss=0.2318, loss=0.1987, over 3264483.76 frames. utt_duration=1222 frames, utt_pad_proportion=0.06494, over 10698.54 utterances.], batch size: 55, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:21:20,659 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8610, 3.5523, 2.9826, 3.1777, 3.7171, 3.4639, 2.7639, 3.7963], device='cuda:2'), covar=tensor([0.1003, 0.0502, 0.1059, 0.0708, 0.0710, 0.0705, 0.0911, 0.0476], device='cuda:2'), in_proj_covar=tensor([0.0205, 0.0225, 0.0229, 0.0206, 0.0288, 0.0246, 0.0202, 0.0294], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 10:21:25,855 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9392, 5.2048, 5.4154, 5.2240, 5.4755, 5.8786, 5.2169, 6.0115], device='cuda:2'), covar=tensor([0.0718, 0.0775, 0.0965, 0.1530, 0.1742, 0.0955, 0.0675, 0.0681], device='cuda:2'), in_proj_covar=tensor([0.0920, 0.0527, 0.0646, 0.0685, 0.0915, 0.0670, 0.0512, 0.0643], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:21:39,084 INFO [train2.py:809] (2/4) Epoch 27, batch 3850, loss[ctc_loss=0.03943, att_loss=0.2182, loss=0.1824, over 16110.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.005687, over 42.00 utterances.], tot_loss[ctc_loss=0.06559, att_loss=0.232, loss=0.1987, over 3267713.96 frames. utt_duration=1230 frames, utt_pad_proportion=0.06228, over 10640.01 utterances.], batch size: 42, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:22:17,271 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.331e+02 1.947e+02 2.251e+02 2.768e+02 8.015e+02, threshold=4.503e+02, percent-clipped=5.0 2023-03-09 10:22:53,965 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9057, 5.2402, 5.4609, 5.2461, 5.4079, 5.9060, 5.2373, 5.9747], device='cuda:2'), covar=tensor([0.0767, 0.0810, 0.0872, 0.1478, 0.1886, 0.0862, 0.0816, 0.0718], device='cuda:2'), in_proj_covar=tensor([0.0924, 0.0529, 0.0650, 0.0690, 0.0922, 0.0673, 0.0514, 0.0648], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:22:55,305 INFO [train2.py:809] (2/4) Epoch 27, batch 3900, loss[ctc_loss=0.0494, att_loss=0.2411, loss=0.2028, over 17007.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.009309, over 51.00 utterances.], tot_loss[ctc_loss=0.06579, att_loss=0.2324, loss=0.1991, over 3268995.74 frames. 
utt_duration=1217 frames, utt_pad_proportion=0.06328, over 10755.76 utterances.], batch size: 51, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:23:22,176 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=107496.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 10:23:22,351 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9726, 4.4113, 4.2413, 4.4701, 2.6961, 4.3391, 2.7796, 2.0073], device='cuda:2'), covar=tensor([0.0582, 0.0314, 0.0842, 0.0269, 0.1928, 0.0268, 0.1667, 0.1821], device='cuda:2'), in_proj_covar=tensor([0.0218, 0.0186, 0.0267, 0.0180, 0.0222, 0.0167, 0.0233, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:23:40,495 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=107508.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 10:23:43,582 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=107510.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:24:12,007 INFO [train2.py:809] (2/4) Epoch 27, batch 3950, loss[ctc_loss=0.07556, att_loss=0.239, loss=0.2063, over 17298.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01195, over 55.00 utterances.], tot_loss[ctc_loss=0.06529, att_loss=0.2311, loss=0.1979, over 3262423.56 frames. utt_duration=1237 frames, utt_pad_proportion=0.06092, over 10558.22 utterances.], batch size: 55, lr: 3.94e-03, grad_scale: 8.0 2023-03-09 10:24:17,262 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.57 vs. limit=5.0 2023-03-09 10:24:49,843 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.368e+02 1.912e+02 2.157e+02 2.759e+02 5.743e+02, threshold=4.314e+02, percent-clipped=3.0 2023-03-09 10:24:58,157 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3316, 2.5421, 3.0688, 2.5466, 3.0468, 3.4723, 3.4057, 2.6134], device='cuda:2'), covar=tensor([0.0498, 0.1660, 0.1301, 0.1197, 0.1035, 0.1235, 0.0691, 0.1334], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0253, 0.0296, 0.0222, 0.0273, 0.0385, 0.0277, 0.0240], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:25:20,652 INFO [train2.py:809] (2/4) Epoch 28, batch 0, loss[ctc_loss=0.06345, att_loss=0.2173, loss=0.1865, over 16177.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006351, over 41.00 utterances.], tot_loss[ctc_loss=0.06345, att_loss=0.2173, loss=0.1865, over 16177.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006351, over 41.00 utterances.], batch size: 41, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:25:20,652 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 10:25:32,827 INFO [train2.py:843] (2/4) Epoch 28, validation: ctc_loss=0.04041, att_loss=0.2344, loss=0.1956, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 10:25:32,828 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 10:25:47,324 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=107571.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:26:52,866 INFO [train2.py:809] (2/4) Epoch 28, batch 50, loss[ctc_loss=0.07567, att_loss=0.254, loss=0.2184, over 16953.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.007383, over 50.00 utterances.], tot_loss[ctc_loss=0.06585, att_loss=0.2324, loss=0.1991, over 740596.44 frames. 
utt_duration=1200 frames, utt_pad_proportion=0.06804, over 2471.11 utterances.], batch size: 50, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:27:20,102 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5209, 2.2934, 2.0581, 2.3970, 2.6847, 2.4534, 2.1074, 2.7053], device='cuda:2'), covar=tensor([0.1786, 0.2471, 0.2572, 0.1437, 0.2335, 0.1540, 0.2308, 0.1616], device='cuda:2'), in_proj_covar=tensor([0.0143, 0.0143, 0.0140, 0.0134, 0.0151, 0.0130, 0.0150, 0.0128], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 10:28:00,263 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.779e+02 2.036e+02 2.554e+02 4.414e+02, threshold=4.071e+02, percent-clipped=2.0 2023-03-09 10:28:08,371 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=107659.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:28:12,549 INFO [train2.py:809] (2/4) Epoch 28, batch 100, loss[ctc_loss=0.06215, att_loss=0.2374, loss=0.2023, over 17019.00 frames. utt_duration=1286 frames, utt_pad_proportion=0.01147, over 53.00 utterances.], tot_loss[ctc_loss=0.06483, att_loss=0.231, loss=0.1978, over 1308302.49 frames. utt_duration=1277 frames, utt_pad_proportion=0.04444, over 4103.48 utterances.], batch size: 53, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:29:32,143 INFO [train2.py:809] (2/4) Epoch 28, batch 150, loss[ctc_loss=0.06058, att_loss=0.2518, loss=0.2135, over 17326.00 frames. utt_duration=1102 frames, utt_pad_proportion=0.03275, over 63.00 utterances.], tot_loss[ctc_loss=0.06674, att_loss=0.2329, loss=0.1997, over 1744405.88 frames. utt_duration=1193 frames, utt_pad_proportion=0.06543, over 5856.38 utterances.], batch size: 63, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:29:42,034 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0496, 5.3328, 5.6161, 5.4366, 5.5285, 6.0459, 5.2707, 6.0930], device='cuda:2'), covar=tensor([0.0774, 0.0825, 0.0878, 0.1366, 0.1912, 0.0909, 0.0685, 0.0668], device='cuda:2'), in_proj_covar=tensor([0.0916, 0.0524, 0.0644, 0.0684, 0.0913, 0.0665, 0.0512, 0.0643], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:29:45,368 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=107720.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:30:11,812 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=107737.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:30:38,888 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.211e+02 1.937e+02 2.331e+02 2.865e+02 6.453e+02, threshold=4.662e+02, percent-clipped=4.0 2023-03-09 10:30:51,738 INFO [train2.py:809] (2/4) Epoch 28, batch 200, loss[ctc_loss=0.05494, att_loss=0.205, loss=0.175, over 15631.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.009858, over 37.00 utterances.], tot_loss[ctc_loss=0.06539, att_loss=0.2307, loss=0.1977, over 2075331.08 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05365, over 6644.54 utterances.], batch size: 37, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:31:45,217 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=107796.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 10:31:48,997 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=107798.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 10:32:05,079 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=107808.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:32:05,165 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7536, 2.5939, 2.4407, 2.4852, 2.8527, 2.7814, 2.3843, 3.0982], device='cuda:2'), covar=tensor([0.1794, 0.1839, 0.1577, 0.1373, 0.1497, 0.1011, 0.1896, 0.0990], device='cuda:2'), in_proj_covar=tensor([0.0141, 0.0142, 0.0139, 0.0133, 0.0150, 0.0128, 0.0148, 0.0127], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 10:32:10,871 INFO [train2.py:809] (2/4) Epoch 28, batch 250, loss[ctc_loss=0.04737, att_loss=0.2054, loss=0.1738, over 15898.00 frames. utt_duration=1632 frames, utt_pad_proportion=0.008692, over 39.00 utterances.], tot_loss[ctc_loss=0.06482, att_loss=0.2302, loss=0.1971, over 2341709.09 frames. utt_duration=1251 frames, utt_pad_proportion=0.0541, over 7495.18 utterances.], batch size: 39, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:33:01,123 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=107844.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 10:33:16,917 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.778e+02 2.118e+02 2.564e+02 4.844e+02, threshold=4.236e+02, percent-clipped=1.0 2023-03-09 10:33:20,627 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=107856.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:33:29,916 INFO [train2.py:809] (2/4) Epoch 28, batch 300, loss[ctc_loss=0.06422, att_loss=0.224, loss=0.192, over 16283.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.007051, over 43.00 utterances.], tot_loss[ctc_loss=0.06481, att_loss=0.2304, loss=0.1973, over 2550301.31 frames. utt_duration=1248 frames, utt_pad_proportion=0.05299, over 8186.02 utterances.], batch size: 43, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:33:36,246 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=107866.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:34:49,117 INFO [train2.py:809] (2/4) Epoch 28, batch 350, loss[ctc_loss=0.06766, att_loss=0.2166, loss=0.1868, over 15649.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008592, over 37.00 utterances.], tot_loss[ctc_loss=0.06523, att_loss=0.2304, loss=0.1973, over 2697217.99 frames. 
utt_duration=1234 frames, utt_pad_proportion=0.0613, over 8751.25 utterances.], batch size: 37, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:34:55,382 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8017, 6.0211, 5.3992, 5.7283, 5.7114, 5.1791, 5.5011, 5.2373], device='cuda:2'), covar=tensor([0.1169, 0.0926, 0.1054, 0.0841, 0.1023, 0.1569, 0.2105, 0.2281], device='cuda:2'), in_proj_covar=tensor([0.0558, 0.0638, 0.0487, 0.0475, 0.0452, 0.0481, 0.0642, 0.0542], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 10:34:59,490 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0815, 2.4230, 4.3970, 3.7056, 2.9203, 3.8860, 4.0808, 4.1177], device='cuda:2'), covar=tensor([0.0308, 0.1603, 0.0239, 0.0832, 0.1593, 0.0313, 0.0300, 0.0362], device='cuda:2'), in_proj_covar=tensor([0.0229, 0.0248, 0.0223, 0.0323, 0.0269, 0.0239, 0.0213, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:35:55,812 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.778e+02 2.080e+02 2.502e+02 5.851e+02, threshold=4.160e+02, percent-clipped=2.0 2023-03-09 10:36:02,635 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0887, 5.3238, 5.6023, 5.3889, 5.6004, 6.0115, 5.2768, 6.0787], device='cuda:2'), covar=tensor([0.0701, 0.0829, 0.0937, 0.1364, 0.1859, 0.0945, 0.0781, 0.0749], device='cuda:2'), in_proj_covar=tensor([0.0912, 0.0524, 0.0645, 0.0684, 0.0913, 0.0665, 0.0510, 0.0644], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:36:08,361 INFO [train2.py:809] (2/4) Epoch 28, batch 400, loss[ctc_loss=0.0721, att_loss=0.2371, loss=0.2041, over 17333.00 frames. utt_duration=1262 frames, utt_pad_proportion=0.01001, over 55.00 utterances.], tot_loss[ctc_loss=0.06526, att_loss=0.2311, loss=0.1979, over 2832619.12 frames. utt_duration=1229 frames, utt_pad_proportion=0.05836, over 9227.81 utterances.], batch size: 55, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:36:30,794 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3360, 5.2871, 5.0844, 3.1288, 5.0673, 4.9623, 4.6108, 3.1194], device='cuda:2'), covar=tensor([0.0110, 0.0085, 0.0268, 0.0919, 0.0098, 0.0175, 0.0258, 0.1142], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0107, 0.0112, 0.0114, 0.0091, 0.0120, 0.0102, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 10:37:33,052 INFO [train2.py:809] (2/4) Epoch 28, batch 450, loss[ctc_loss=0.07076, att_loss=0.2392, loss=0.2055, over 16929.00 frames. utt_duration=685.5 frames, utt_pad_proportion=0.1399, over 99.00 utterances.], tot_loss[ctc_loss=0.06471, att_loss=0.2306, loss=0.1975, over 2929282.34 frames. 
utt_duration=1223 frames, utt_pad_proportion=0.06028, over 9589.90 utterances.], batch size: 99, lr: 3.86e-03, grad_scale: 8.0 2023-03-09 10:37:37,596 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=108015.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:38:20,528 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1923, 5.2423, 4.8625, 2.7870, 5.0588, 5.0087, 4.3700, 2.7121], device='cuda:2'), covar=tensor([0.0143, 0.0100, 0.0329, 0.1276, 0.0105, 0.0190, 0.0356, 0.1592], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0106, 0.0111, 0.0113, 0.0090, 0.0119, 0.0101, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 10:38:39,470 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.861e+02 2.240e+02 2.601e+02 4.871e+02, threshold=4.480e+02, percent-clipped=1.0 2023-03-09 10:38:51,886 INFO [train2.py:809] (2/4) Epoch 28, batch 500, loss[ctc_loss=0.04962, att_loss=0.2065, loss=0.1751, over 15778.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008322, over 38.00 utterances.], tot_loss[ctc_loss=0.06477, att_loss=0.2311, loss=0.1978, over 3009269.76 frames. utt_duration=1261 frames, utt_pad_proportion=0.05046, over 9555.82 utterances.], batch size: 38, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:38:55,269 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5549, 2.9776, 3.6529, 3.1829, 3.5584, 4.5704, 4.4044, 3.3009], device='cuda:2'), covar=tensor([0.0321, 0.1777, 0.1261, 0.1187, 0.1068, 0.0864, 0.0584, 0.1264], device='cuda:2'), in_proj_covar=tensor([0.0248, 0.0249, 0.0289, 0.0218, 0.0269, 0.0379, 0.0272, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:39:05,345 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8230, 3.6312, 3.6318, 3.1115, 3.6824, 3.6451, 3.6458, 2.8272], device='cuda:2'), covar=tensor([0.1165, 0.1258, 0.1551, 0.3415, 0.0953, 0.1899, 0.0950, 0.3006], device='cuda:2'), in_proj_covar=tensor([0.0201, 0.0206, 0.0222, 0.0270, 0.0182, 0.0280, 0.0206, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:39:41,178 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=108093.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 10:39:58,049 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1555, 5.1865, 5.0129, 2.3456, 2.2279, 3.0720, 2.5087, 3.9210], device='cuda:2'), covar=tensor([0.0688, 0.0356, 0.0284, 0.5143, 0.5092, 0.2334, 0.3948, 0.1729], device='cuda:2'), in_proj_covar=tensor([0.0365, 0.0302, 0.0282, 0.0254, 0.0341, 0.0335, 0.0265, 0.0373], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 10:40:05,587 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8751, 3.6657, 3.6837, 3.1479, 3.7396, 3.7080, 3.7145, 2.7918], device='cuda:2'), covar=tensor([0.0928, 0.0894, 0.1474, 0.2532, 0.0824, 0.1814, 0.0693, 0.2625], device='cuda:2'), in_proj_covar=tensor([0.0202, 0.0206, 0.0222, 0.0270, 0.0182, 0.0280, 0.0206, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:40:11,189 INFO [train2.py:809] (2/4) Epoch 28, batch 550, loss[ctc_loss=0.0718, att_loss=0.2354, loss=0.2027, 
over 16188.00 frames. utt_duration=1581 frames, utt_pad_proportion=0.006408, over 41.00 utterances.], tot_loss[ctc_loss=0.06462, att_loss=0.2306, loss=0.1974, over 3071807.11 frames. utt_duration=1272 frames, utt_pad_proportion=0.04736, over 9667.90 utterances.], batch size: 41, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:40:43,356 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.4005, 5.6968, 5.1352, 5.4512, 5.3847, 4.8407, 5.0909, 4.8983], device='cuda:2'), covar=tensor([0.1321, 0.0925, 0.1028, 0.0883, 0.1007, 0.1723, 0.2243, 0.2456], device='cuda:2'), in_proj_covar=tensor([0.0563, 0.0643, 0.0490, 0.0478, 0.0457, 0.0485, 0.0645, 0.0550], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 10:41:18,472 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.757e+02 2.157e+02 2.611e+02 5.249e+02, threshold=4.315e+02, percent-clipped=2.0 2023-03-09 10:41:30,578 INFO [train2.py:809] (2/4) Epoch 28, batch 600, loss[ctc_loss=0.08858, att_loss=0.2517, loss=0.219, over 14545.00 frames. utt_duration=402.7 frames, utt_pad_proportion=0.3008, over 145.00 utterances.], tot_loss[ctc_loss=0.06477, att_loss=0.2309, loss=0.1977, over 3116795.25 frames. utt_duration=1250 frames, utt_pad_proportion=0.0538, over 9987.26 utterances.], batch size: 145, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:41:37,577 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=108166.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:42:49,656 INFO [train2.py:809] (2/4) Epoch 28, batch 650, loss[ctc_loss=0.07154, att_loss=0.2388, loss=0.2054, over 16469.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006568, over 46.00 utterances.], tot_loss[ctc_loss=0.06472, att_loss=0.2309, loss=0.1977, over 3155256.29 frames. utt_duration=1233 frames, utt_pad_proportion=0.05627, over 10248.80 utterances.], batch size: 46, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:42:53,334 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=108214.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:43:56,527 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.814e+02 2.237e+02 2.665e+02 5.467e+02, threshold=4.474e+02, percent-clipped=5.0 2023-03-09 10:44:08,563 INFO [train2.py:809] (2/4) Epoch 28, batch 700, loss[ctc_loss=0.06676, att_loss=0.2503, loss=0.2136, over 17035.00 frames. utt_duration=1287 frames, utt_pad_proportion=0.01067, over 53.00 utterances.], tot_loss[ctc_loss=0.06486, att_loss=0.2313, loss=0.198, over 3187956.95 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.05452, over 10339.43 utterances.], batch size: 53, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:44:08,951 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5767, 2.4193, 2.3708, 2.5038, 2.7658, 2.7042, 2.3888, 2.9258], device='cuda:2'), covar=tensor([0.1748, 0.2337, 0.1880, 0.1509, 0.1596, 0.1240, 0.2271, 0.1580], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0147, 0.0142, 0.0137, 0.0154, 0.0132, 0.0153, 0.0131], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 10:44:46,753 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5130, 3.0134, 3.6289, 3.1418, 3.4867, 4.6132, 4.4721, 3.4001], device='cuda:2'), covar=tensor([0.0393, 0.1930, 0.1428, 0.1368, 0.1290, 0.1037, 0.0612, 0.1204], device='cuda:2'), in_proj_covar=tensor([0.0251, 0.0251, 0.0293, 0.0221, 0.0273, 0.0383, 0.0275, 0.0238], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:44:50,541 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.27 vs. limit=5.0 2023-03-09 10:45:27,465 INFO [train2.py:809] (2/4) Epoch 28, batch 750, loss[ctc_loss=0.08534, att_loss=0.244, loss=0.2122, over 16474.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006835, over 46.00 utterances.], tot_loss[ctc_loss=0.06564, att_loss=0.2314, loss=0.1983, over 3208150.84 frames. utt_duration=1239 frames, utt_pad_proportion=0.05413, over 10367.25 utterances.], batch size: 46, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:45:32,965 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=108315.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:45:48,255 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=108325.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:46:33,461 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 1.877e+02 2.184e+02 2.631e+02 4.478e+02, threshold=4.368e+02, percent-clipped=1.0 2023-03-09 10:46:45,997 INFO [train2.py:809] (2/4) Epoch 28, batch 800, loss[ctc_loss=0.1087, att_loss=0.2652, loss=0.2339, over 14196.00 frames. utt_duration=390.4 frames, utt_pad_proportion=0.321, over 146.00 utterances.], tot_loss[ctc_loss=0.06546, att_loss=0.2311, loss=0.1979, over 3215565.77 frames. 
utt_duration=1232 frames, utt_pad_proportion=0.05855, over 10453.03 utterances.], batch size: 146, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:46:47,659 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=108363.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:47:10,563 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5031, 2.9191, 4.9755, 3.9614, 3.0794, 4.2746, 4.8300, 4.6783], device='cuda:2'), covar=tensor([0.0357, 0.1351, 0.0293, 0.0861, 0.1658, 0.0283, 0.0224, 0.0305], device='cuda:2'), in_proj_covar=tensor([0.0233, 0.0250, 0.0227, 0.0329, 0.0273, 0.0242, 0.0217, 0.0241], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:47:23,494 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=108386.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:47:34,685 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=108393.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:48:04,322 INFO [train2.py:809] (2/4) Epoch 28, batch 850, loss[ctc_loss=0.05335, att_loss=0.2218, loss=0.1881, over 15962.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005818, over 41.00 utterances.], tot_loss[ctc_loss=0.06514, att_loss=0.2309, loss=0.1977, over 3227284.19 frames. utt_duration=1254 frames, utt_pad_proportion=0.05394, over 10309.01 utterances.], batch size: 41, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:48:27,374 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8166, 6.0944, 5.5740, 5.7727, 5.7691, 5.2447, 5.5480, 5.2672], device='cuda:2'), covar=tensor([0.1288, 0.0888, 0.0952, 0.0839, 0.0985, 0.1585, 0.2218, 0.2580], device='cuda:2'), in_proj_covar=tensor([0.0564, 0.0646, 0.0492, 0.0481, 0.0459, 0.0487, 0.0648, 0.0552], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 10:48:35,084 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=108432.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:48:49,722 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=108441.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:49:03,426 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.13 vs. limit=5.0 2023-03-09 10:49:10,175 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.210e+02 1.750e+02 2.196e+02 2.771e+02 5.305e+02, threshold=4.392e+02, percent-clipped=2.0 2023-03-09 10:49:23,198 INFO [train2.py:809] (2/4) Epoch 28, batch 900, loss[ctc_loss=0.06917, att_loss=0.2491, loss=0.2131, over 17635.00 frames. utt_duration=1009 frames, utt_pad_proportion=0.04537, over 70.00 utterances.], tot_loss[ctc_loss=0.06506, att_loss=0.2313, loss=0.198, over 3240510.48 frames. 
utt_duration=1271 frames, utt_pad_proportion=0.04971, over 10210.11 utterances.], batch size: 70, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:50:12,173 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=108493.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 10:50:31,041 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3033, 4.3740, 4.4996, 4.4651, 5.1010, 4.2806, 4.3820, 2.5803], device='cuda:2'), covar=tensor([0.0330, 0.0484, 0.0383, 0.0427, 0.0648, 0.0294, 0.0403, 0.1792], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0221, 0.0216, 0.0233, 0.0379, 0.0192, 0.0207, 0.0220], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:50:42,012 INFO [train2.py:809] (2/4) Epoch 28, batch 950, loss[ctc_loss=0.06286, att_loss=0.2332, loss=0.1991, over 16417.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.005988, over 44.00 utterances.], tot_loss[ctc_loss=0.06542, att_loss=0.2312, loss=0.198, over 3246307.06 frames. utt_duration=1265 frames, utt_pad_proportion=0.05184, over 10276.81 utterances.], batch size: 44, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:51:36,938 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4698, 2.8997, 3.5363, 4.4326, 3.9551, 3.9399, 2.9815, 2.3715], device='cuda:2'), covar=tensor([0.0725, 0.1928, 0.0785, 0.0524, 0.0868, 0.0517, 0.1472, 0.2124], device='cuda:2'), in_proj_covar=tensor([0.0191, 0.0221, 0.0188, 0.0228, 0.0236, 0.0194, 0.0206, 0.0194], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:51:47,830 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.763e+02 2.072e+02 2.486e+02 7.743e+02, threshold=4.143e+02, percent-clipped=3.0 2023-03-09 10:52:00,760 INFO [train2.py:809] (2/4) Epoch 28, batch 1000, loss[ctc_loss=0.0845, att_loss=0.257, loss=0.2225, over 17050.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.009746, over 53.00 utterances.], tot_loss[ctc_loss=0.06567, att_loss=0.2313, loss=0.1982, over 3238501.89 frames. utt_duration=1246 frames, utt_pad_proportion=0.05935, over 10405.22 utterances.], batch size: 53, lr: 3.85e-03, grad_scale: 16.0 2023-03-09 10:52:08,887 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1332, 3.7417, 3.2301, 3.4176, 3.9955, 3.6848, 3.0092, 4.2599], device='cuda:2'), covar=tensor([0.0957, 0.0512, 0.1094, 0.0815, 0.0780, 0.0757, 0.0875, 0.0530], device='cuda:2'), in_proj_covar=tensor([0.0208, 0.0228, 0.0233, 0.0210, 0.0293, 0.0252, 0.0207, 0.0299], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 10:52:24,725 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=108577.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:53:13,944 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-09 10:53:20,714 INFO [train2.py:809] (2/4) Epoch 28, batch 1050, loss[ctc_loss=0.0771, att_loss=0.2421, loss=0.2091, over 17336.00 frames. utt_duration=879.1 frames, utt_pad_proportion=0.08041, over 79.00 utterances.], tot_loss[ctc_loss=0.06535, att_loss=0.2313, loss=0.1981, over 3240840.45 frames. 
utt_duration=1247 frames, utt_pad_proportion=0.05895, over 10406.04 utterances.], batch size: 79, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 10:53:50,550 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2023-03-09 10:54:02,745 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=108638.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:54:27,406 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.257e+02 1.905e+02 2.250e+02 2.754e+02 7.007e+02, threshold=4.500e+02, percent-clipped=3.0 2023-03-09 10:54:40,458 INFO [train2.py:809] (2/4) Epoch 28, batch 1100, loss[ctc_loss=0.06999, att_loss=0.2427, loss=0.2082, over 16542.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006197, over 45.00 utterances.], tot_loss[ctc_loss=0.06498, att_loss=0.2314, loss=0.1981, over 3246657.31 frames. utt_duration=1258 frames, utt_pad_proportion=0.05553, over 10338.50 utterances.], batch size: 45, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 10:55:10,686 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=108681.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:55:16,900 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3201, 4.3523, 4.5067, 4.5049, 5.0492, 4.3866, 4.3852, 2.4180], device='cuda:2'), covar=tensor([0.0328, 0.0456, 0.0360, 0.0335, 0.0729, 0.0286, 0.0391, 0.1843], device='cuda:2'), in_proj_covar=tensor([0.0192, 0.0220, 0.0215, 0.0231, 0.0377, 0.0191, 0.0206, 0.0219], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:56:00,159 INFO [train2.py:809] (2/4) Epoch 28, batch 1150, loss[ctc_loss=0.06362, att_loss=0.2395, loss=0.2043, over 16682.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006235, over 46.00 utterances.], tot_loss[ctc_loss=0.06543, att_loss=0.2315, loss=0.1983, over 3250338.69 frames. utt_duration=1245 frames, utt_pad_proportion=0.05924, over 10454.50 utterances.], batch size: 46, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 10:56:08,446 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6501, 2.4696, 2.4128, 2.5419, 2.8573, 2.8233, 2.3790, 3.1526], device='cuda:2'), covar=tensor([0.1482, 0.2301, 0.1703, 0.1406, 0.1492, 0.1029, 0.1959, 0.1168], device='cuda:2'), in_proj_covar=tensor([0.0146, 0.0147, 0.0143, 0.0138, 0.0155, 0.0132, 0.0154, 0.0132], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 10:57:07,175 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.269e+02 1.925e+02 2.338e+02 2.824e+02 7.096e+02, threshold=4.677e+02, percent-clipped=6.0 2023-03-09 10:57:10,346 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7244, 3.1432, 3.7548, 3.2849, 3.6619, 4.7406, 4.6304, 3.7167], device='cuda:2'), covar=tensor([0.0316, 0.1653, 0.1287, 0.1218, 0.1081, 0.0849, 0.0502, 0.0924], device='cuda:2'), in_proj_covar=tensor([0.0246, 0.0248, 0.0287, 0.0217, 0.0268, 0.0377, 0.0270, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 10:57:19,310 INFO [train2.py:809] (2/4) Epoch 28, batch 1200, loss[ctc_loss=0.04414, att_loss=0.2079, loss=0.1751, over 15634.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008756, over 37.00 utterances.], tot_loss[ctc_loss=0.06505, att_loss=0.2315, loss=0.1982, over 3263852.84 frames. 
utt_duration=1272 frames, utt_pad_proportion=0.05046, over 10274.90 utterances.], batch size: 37, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 10:58:00,761 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=108788.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 10:58:05,457 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=108791.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:58:39,211 INFO [train2.py:809] (2/4) Epoch 28, batch 1250, loss[ctc_loss=0.07407, att_loss=0.2474, loss=0.2128, over 17296.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01232, over 55.00 utterances.], tot_loss[ctc_loss=0.06471, att_loss=0.231, loss=0.1977, over 3255480.47 frames. utt_duration=1266 frames, utt_pad_proportion=0.05429, over 10294.65 utterances.], batch size: 55, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 10:59:17,567 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8618, 6.1062, 5.5750, 5.8351, 5.7863, 5.2508, 5.5354, 5.3144], device='cuda:2'), covar=tensor([0.1183, 0.0832, 0.1010, 0.0767, 0.0904, 0.1622, 0.2106, 0.2176], device='cuda:2'), in_proj_covar=tensor([0.0557, 0.0643, 0.0488, 0.0475, 0.0454, 0.0482, 0.0644, 0.0548], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 10:59:28,359 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1998, 3.8432, 3.7608, 3.2639, 3.7730, 3.8658, 3.8799, 2.8757], device='cuda:2'), covar=tensor([0.0954, 0.0964, 0.1955, 0.3027, 0.1345, 0.2529, 0.0792, 0.3258], device='cuda:2'), in_proj_covar=tensor([0.0204, 0.0209, 0.0226, 0.0275, 0.0185, 0.0286, 0.0208, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 10:59:42,575 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=108852.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 10:59:45,115 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.829e+02 2.128e+02 2.551e+02 3.725e+02, threshold=4.255e+02, percent-clipped=0.0 2023-03-09 10:59:58,168 INFO [train2.py:809] (2/4) Epoch 28, batch 1300, loss[ctc_loss=0.06301, att_loss=0.2258, loss=0.1932, over 15959.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.0067, over 41.00 utterances.], tot_loss[ctc_loss=0.06403, att_loss=0.2303, loss=0.197, over 3259883.87 frames. 
utt_duration=1283 frames, utt_pad_proportion=0.05015, over 10172.79 utterances.], batch size: 41, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 11:00:35,142 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7584, 5.1135, 4.9501, 5.1241, 5.2305, 4.8172, 3.6142, 5.0903], device='cuda:2'), covar=tensor([0.0123, 0.0120, 0.0164, 0.0087, 0.0098, 0.0140, 0.0668, 0.0208], device='cuda:2'), in_proj_covar=tensor([0.0096, 0.0092, 0.0116, 0.0073, 0.0079, 0.0089, 0.0105, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 11:01:13,690 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9734, 4.3062, 4.2625, 4.4881, 2.6743, 4.3590, 2.7369, 1.7620], device='cuda:2'), covar=tensor([0.0515, 0.0245, 0.0704, 0.0270, 0.1491, 0.0253, 0.1391, 0.1694], device='cuda:2'), in_proj_covar=tensor([0.0219, 0.0187, 0.0268, 0.0183, 0.0224, 0.0169, 0.0236, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:01:17,642 INFO [train2.py:809] (2/4) Epoch 28, batch 1350, loss[ctc_loss=0.04699, att_loss=0.2341, loss=0.1967, over 16753.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006545, over 48.00 utterances.], tot_loss[ctc_loss=0.0641, att_loss=0.2299, loss=0.1967, over 3257266.25 frames. utt_duration=1279 frames, utt_pad_proportion=0.05168, over 10196.34 utterances.], batch size: 48, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 11:01:23,447 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2023-03-09 11:01:51,235 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=108933.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:02:24,360 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.787e+02 2.090e+02 2.740e+02 6.253e+02, threshold=4.180e+02, percent-clipped=4.0 2023-03-09 11:02:36,713 INFO [train2.py:809] (2/4) Epoch 28, batch 1400, loss[ctc_loss=0.06434, att_loss=0.2304, loss=0.1972, over 17244.00 frames. utt_duration=874.7 frames, utt_pad_proportion=0.0841, over 79.00 utterances.], tot_loss[ctc_loss=0.06408, att_loss=0.2303, loss=0.1971, over 3260601.02 frames. utt_duration=1257 frames, utt_pad_proportion=0.05459, over 10387.37 utterances.], batch size: 79, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 11:02:48,049 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1202, 3.7219, 3.2246, 3.3462, 3.9079, 3.5824, 3.0493, 4.1254], device='cuda:2'), covar=tensor([0.0965, 0.0555, 0.1137, 0.0768, 0.0788, 0.0798, 0.0881, 0.0457], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0227, 0.0230, 0.0208, 0.0289, 0.0249, 0.0205, 0.0298], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 11:03:07,517 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=108981.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:03:39,170 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2023-03-09 11:03:56,712 INFO [train2.py:809] (2/4) Epoch 28, batch 1450, loss[ctc_loss=0.06473, att_loss=0.2445, loss=0.2086, over 17295.00 frames. utt_duration=1174 frames, utt_pad_proportion=0.02384, over 59.00 utterances.], tot_loss[ctc_loss=0.06393, att_loss=0.2302, loss=0.1969, over 3261469.18 frames. 
utt_duration=1256 frames, utt_pad_proportion=0.055, over 10399.78 utterances.], batch size: 59, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 11:04:24,450 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=109029.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:04:36,941 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=109037.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 11:05:03,519 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.756e+02 2.072e+02 2.557e+02 4.820e+02, threshold=4.143e+02, percent-clipped=2.0 2023-03-09 11:05:16,007 INFO [train2.py:809] (2/4) Epoch 28, batch 1500, loss[ctc_loss=0.07523, att_loss=0.256, loss=0.2199, over 17274.00 frames. utt_duration=1173 frames, utt_pad_proportion=0.02348, over 59.00 utterances.], tot_loss[ctc_loss=0.06491, att_loss=0.2316, loss=0.1983, over 3269870.65 frames. utt_duration=1230 frames, utt_pad_proportion=0.05868, over 10642.78 utterances.], batch size: 59, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 11:05:57,487 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=109088.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:06:13,834 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=109098.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 11:06:34,719 INFO [train2.py:809] (2/4) Epoch 28, batch 1550, loss[ctc_loss=0.05912, att_loss=0.2121, loss=0.1815, over 15358.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.01215, over 35.00 utterances.], tot_loss[ctc_loss=0.06452, att_loss=0.2312, loss=0.1979, over 3265027.33 frames. utt_duration=1231 frames, utt_pad_proportion=0.06089, over 10621.18 utterances.], batch size: 35, lr: 3.84e-03, grad_scale: 16.0 2023-03-09 11:06:45,812 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1241, 5.1085, 4.9188, 3.1661, 4.9154, 4.7620, 4.3994, 2.9325], device='cuda:2'), covar=tensor([0.0139, 0.0116, 0.0273, 0.0926, 0.0105, 0.0212, 0.0328, 0.1348], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0107, 0.0112, 0.0113, 0.0090, 0.0119, 0.0102, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 11:06:55,729 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9907, 4.9924, 4.9611, 2.0185, 1.9418, 2.7849, 2.2674, 3.8106], device='cuda:2'), covar=tensor([0.0740, 0.0304, 0.0211, 0.5061, 0.5392, 0.2705, 0.3835, 0.1673], device='cuda:2'), in_proj_covar=tensor([0.0364, 0.0303, 0.0281, 0.0255, 0.0342, 0.0334, 0.0265, 0.0373], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 11:07:12,458 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=109136.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:07:25,399 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3447, 2.7153, 4.7931, 3.7442, 3.0814, 4.1254, 4.4564, 4.5027], device='cuda:2'), covar=tensor([0.0310, 0.1440, 0.0248, 0.0903, 0.1524, 0.0280, 0.0239, 0.0275], device='cuda:2'), in_proj_covar=tensor([0.0231, 0.0247, 0.0224, 0.0324, 0.0268, 0.0239, 0.0214, 0.0238], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:07:30,569 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=109147.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:07:41,450 INFO 
[optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.952e+02 2.217e+02 2.637e+02 4.191e+02, threshold=4.435e+02, percent-clipped=1.0 2023-03-09 11:07:53,851 INFO [train2.py:809] (2/4) Epoch 28, batch 1600, loss[ctc_loss=0.08404, att_loss=0.2541, loss=0.2201, over 17025.00 frames. utt_duration=1311 frames, utt_pad_proportion=0.009717, over 52.00 utterances.], tot_loss[ctc_loss=0.06465, att_loss=0.2318, loss=0.1984, over 3269671.68 frames. utt_duration=1241 frames, utt_pad_proportion=0.05733, over 10550.50 utterances.], batch size: 52, lr: 3.83e-03, grad_scale: 16.0 2023-03-09 11:07:54,125 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8384, 5.2094, 5.0999, 5.1580, 5.2999, 4.9021, 3.6401, 5.1472], device='cuda:2'), covar=tensor([0.0109, 0.0103, 0.0119, 0.0080, 0.0101, 0.0130, 0.0647, 0.0192], device='cuda:2'), in_proj_covar=tensor([0.0096, 0.0091, 0.0115, 0.0072, 0.0079, 0.0089, 0.0104, 0.0109], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 11:08:00,139 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1526, 5.4393, 5.0022, 5.4822, 4.8788, 5.0586, 5.5877, 5.3356], device='cuda:2'), covar=tensor([0.0595, 0.0320, 0.0813, 0.0332, 0.0385, 0.0262, 0.0193, 0.0209], device='cuda:2'), in_proj_covar=tensor([0.0404, 0.0345, 0.0382, 0.0383, 0.0338, 0.0249, 0.0324, 0.0305], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 11:08:11,764 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3085, 2.5436, 3.3269, 4.2208, 3.7655, 3.8600, 2.7404, 2.4535], device='cuda:2'), covar=tensor([0.0786, 0.2111, 0.0778, 0.0581, 0.0922, 0.0478, 0.1632, 0.1921], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0220, 0.0189, 0.0227, 0.0236, 0.0193, 0.0206, 0.0193], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:08:39,298 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0328, 6.2017, 5.6804, 5.8794, 5.9418, 5.2503, 5.7907, 5.3900], device='cuda:2'), covar=tensor([0.1210, 0.0942, 0.0967, 0.0873, 0.0993, 0.1701, 0.2126, 0.2355], device='cuda:2'), in_proj_covar=tensor([0.0556, 0.0638, 0.0485, 0.0475, 0.0452, 0.0482, 0.0641, 0.0545], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 11:08:44,708 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=109194.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:09:12,890 INFO [train2.py:809] (2/4) Epoch 28, batch 1650, loss[ctc_loss=0.05654, att_loss=0.2169, loss=0.1849, over 15890.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009113, over 39.00 utterances.], tot_loss[ctc_loss=0.06493, att_loss=0.2325, loss=0.199, over 3271691.46 frames. 
utt_duration=1225 frames, utt_pad_proportion=0.06112, over 10695.72 utterances.], batch size: 39, lr: 3.83e-03, grad_scale: 16.0 2023-03-09 11:09:47,312 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=109233.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:10:22,241 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.344e+02 1.779e+02 2.256e+02 2.612e+02 6.095e+02, threshold=4.513e+02, percent-clipped=4.0 2023-03-09 11:10:22,710 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=109255.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:10:33,279 INFO [train2.py:809] (2/4) Epoch 28, batch 1700, loss[ctc_loss=0.06772, att_loss=0.2198, loss=0.1894, over 15867.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.0103, over 39.00 utterances.], tot_loss[ctc_loss=0.06436, att_loss=0.2322, loss=0.1987, over 3274438.83 frames. utt_duration=1235 frames, utt_pad_proportion=0.05776, over 10617.81 utterances.], batch size: 39, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:11:02,561 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0305, 4.3185, 4.1596, 4.4444, 2.6199, 4.1843, 2.6878, 1.7501], device='cuda:2'), covar=tensor([0.0489, 0.0247, 0.0731, 0.0288, 0.1616, 0.0302, 0.1511, 0.1708], device='cuda:2'), in_proj_covar=tensor([0.0219, 0.0188, 0.0266, 0.0182, 0.0224, 0.0170, 0.0234, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:11:03,721 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=109281.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:11:23,980 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=109294.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:11:52,251 INFO [train2.py:809] (2/4) Epoch 28, batch 1750, loss[ctc_loss=0.0617, att_loss=0.2122, loss=0.1821, over 15628.00 frames. utt_duration=1691 frames, utt_pad_proportion=0.008761, over 37.00 utterances.], tot_loss[ctc_loss=0.06465, att_loss=0.2318, loss=0.1984, over 3273069.99 frames. utt_duration=1252 frames, utt_pad_proportion=0.05419, over 10472.38 utterances.], batch size: 37, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:12:02,552 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=109318.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:12:47,095 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7516, 5.1251, 5.0101, 5.1076, 5.2325, 4.8755, 3.9002, 5.1206], device='cuda:2'), covar=tensor([0.0122, 0.0102, 0.0123, 0.0071, 0.0084, 0.0094, 0.0542, 0.0180], device='cuda:2'), in_proj_covar=tensor([0.0096, 0.0091, 0.0116, 0.0072, 0.0079, 0.0089, 0.0104, 0.0110], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 11:12:48,675 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2955, 3.1167, 3.5883, 2.6815, 3.3697, 4.4824, 4.3229, 3.0993], device='cuda:2'), covar=tensor([0.0430, 0.1488, 0.1311, 0.1567, 0.1219, 0.0875, 0.0640, 0.1361], device='cuda:2'), in_proj_covar=tensor([0.0248, 0.0250, 0.0290, 0.0220, 0.0271, 0.0380, 0.0274, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 11:12:50,892 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. 
limit=2.0 2023-03-09 11:13:00,660 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.203e+02 1.738e+02 2.027e+02 2.395e+02 5.989e+02, threshold=4.055e+02, percent-clipped=1.0 2023-03-09 11:13:01,094 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=109355.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:13:07,473 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8898, 2.7287, 3.3690, 2.6839, 3.2337, 4.0382, 3.9031, 2.8790], device='cuda:2'), covar=tensor([0.0442, 0.1734, 0.1345, 0.1397, 0.1155, 0.1032, 0.0714, 0.1383], device='cuda:2'), in_proj_covar=tensor([0.0248, 0.0250, 0.0290, 0.0220, 0.0271, 0.0380, 0.0274, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 11:13:11,724 INFO [train2.py:809] (2/4) Epoch 28, batch 1800, loss[ctc_loss=0.07156, att_loss=0.218, loss=0.1887, over 15620.00 frames. utt_duration=1690 frames, utt_pad_proportion=0.01032, over 37.00 utterances.], tot_loss[ctc_loss=0.06567, att_loss=0.2327, loss=0.1993, over 3278341.89 frames. utt_duration=1220 frames, utt_pad_proportion=0.06026, over 10759.60 utterances.], batch size: 37, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:13:39,598 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=109379.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 11:14:01,821 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=109393.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 11:14:31,015 INFO [train2.py:809] (2/4) Epoch 28, batch 1850, loss[ctc_loss=0.06305, att_loss=0.233, loss=0.199, over 16611.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.006051, over 47.00 utterances.], tot_loss[ctc_loss=0.06589, att_loss=0.2329, loss=0.1995, over 3284109.90 frames. utt_duration=1224 frames, utt_pad_proportion=0.05791, over 10746.02 utterances.], batch size: 47, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:15:13,197 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5953, 2.5464, 2.6844, 2.6935, 2.9594, 2.8758, 2.5621, 3.1651], device='cuda:2'), covar=tensor([0.2024, 0.2359, 0.1849, 0.1564, 0.1590, 0.1467, 0.2200, 0.1339], device='cuda:2'), in_proj_covar=tensor([0.0147, 0.0149, 0.0144, 0.0139, 0.0155, 0.0133, 0.0154, 0.0133], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 11:15:26,809 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=109447.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:15:39,066 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.724e+02 2.158e+02 2.874e+02 5.021e+02, threshold=4.317e+02, percent-clipped=1.0 2023-03-09 11:15:42,586 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1987, 5.5320, 5.7434, 5.4936, 5.6681, 6.1330, 5.3986, 6.2121], device='cuda:2'), covar=tensor([0.0776, 0.0759, 0.0903, 0.1501, 0.1933, 0.1039, 0.0605, 0.0724], device='cuda:2'), in_proj_covar=tensor([0.0931, 0.0534, 0.0654, 0.0692, 0.0923, 0.0669, 0.0517, 0.0653], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 11:15:50,141 INFO [train2.py:809] (2/4) Epoch 28, batch 1900, loss[ctc_loss=0.0906, att_loss=0.2512, loss=0.2191, over 17263.00 frames. 
utt_duration=1172 frames, utt_pad_proportion=0.02489, over 59.00 utterances.], tot_loss[ctc_loss=0.06625, att_loss=0.2329, loss=0.1996, over 3280776.26 frames. utt_duration=1228 frames, utt_pad_proportion=0.05795, over 10700.01 utterances.], batch size: 59, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:16:07,924 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-03-09 11:16:12,014 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3754, 2.7880, 3.0399, 4.3612, 3.9529, 3.9301, 2.8583, 2.3268], device='cuda:2'), covar=tensor([0.0744, 0.1934, 0.1045, 0.0529, 0.0832, 0.0474, 0.1461, 0.2087], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0218, 0.0187, 0.0224, 0.0234, 0.0192, 0.0206, 0.0192], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:16:23,182 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2023-03-09 11:16:43,557 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=109495.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:17:09,535 INFO [train2.py:809] (2/4) Epoch 28, batch 1950, loss[ctc_loss=0.08428, att_loss=0.2467, loss=0.2142, over 17112.00 frames. utt_duration=1224 frames, utt_pad_proportion=0.01468, over 56.00 utterances.], tot_loss[ctc_loss=0.06634, att_loss=0.233, loss=0.1997, over 3266943.27 frames. utt_duration=1223 frames, utt_pad_proportion=0.06266, over 10697.04 utterances.], batch size: 56, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:18:01,999 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.69 vs. limit=5.0 2023-03-09 11:18:10,594 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=109550.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:18:17,570 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.372e+02 1.926e+02 2.337e+02 2.750e+02 5.694e+02, threshold=4.673e+02, percent-clipped=4.0 2023-03-09 11:18:21,059 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2208, 3.8270, 3.3641, 3.4533, 4.0976, 3.7236, 3.0998, 4.3691], device='cuda:2'), covar=tensor([0.0956, 0.0591, 0.1046, 0.0772, 0.0699, 0.0742, 0.0854, 0.0495], device='cuda:2'), in_proj_covar=tensor([0.0208, 0.0229, 0.0231, 0.0211, 0.0290, 0.0252, 0.0205, 0.0299], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 11:18:28,471 INFO [train2.py:809] (2/4) Epoch 28, batch 2000, loss[ctc_loss=0.06791, att_loss=0.2368, loss=0.203, over 16542.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.00539, over 45.00 utterances.], tot_loss[ctc_loss=0.06561, att_loss=0.2323, loss=0.199, over 3267814.57 frames. utt_duration=1232 frames, utt_pad_proportion=0.05891, over 10619.07 utterances.], batch size: 45, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:18:55,930 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=109579.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:19:47,002 INFO [train2.py:809] (2/4) Epoch 28, batch 2050, loss[ctc_loss=0.06013, att_loss=0.2419, loss=0.2056, over 16477.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.005936, over 46.00 utterances.], tot_loss[ctc_loss=0.06571, att_loss=0.2322, loss=0.1989, over 3272267.61 frames. 
utt_duration=1240 frames, utt_pad_proportion=0.05635, over 10568.64 utterances.], batch size: 46, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:20:20,231 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-09 11:20:33,232 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=109640.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:20:48,872 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=109650.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:20:56,143 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.260e+02 1.837e+02 2.171e+02 2.517e+02 4.514e+02, threshold=4.342e+02, percent-clipped=0.0 2023-03-09 11:21:02,539 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=109659.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:21:06,771 INFO [train2.py:809] (2/4) Epoch 28, batch 2100, loss[ctc_loss=0.05199, att_loss=0.2222, loss=0.1882, over 16402.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006729, over 44.00 utterances.], tot_loss[ctc_loss=0.06492, att_loss=0.2313, loss=0.1981, over 3267301.64 frames. utt_duration=1240 frames, utt_pad_proportion=0.05746, over 10554.39 utterances.], batch size: 44, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:21:07,020 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6860, 5.0491, 5.2015, 5.0658, 5.1571, 5.6633, 5.0205, 5.7212], device='cuda:2'), covar=tensor([0.0808, 0.0800, 0.0957, 0.1481, 0.2002, 0.0896, 0.0956, 0.0713], device='cuda:2'), in_proj_covar=tensor([0.0916, 0.0528, 0.0646, 0.0684, 0.0917, 0.0664, 0.0511, 0.0643], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 11:21:18,096 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0374, 5.3427, 4.9058, 5.3917, 4.7302, 5.0047, 5.4663, 5.2039], device='cuda:2'), covar=tensor([0.0615, 0.0312, 0.0853, 0.0314, 0.0443, 0.0277, 0.0277, 0.0234], device='cuda:2'), in_proj_covar=tensor([0.0408, 0.0346, 0.0386, 0.0384, 0.0342, 0.0250, 0.0327, 0.0308], device='cuda:2'), out_proj_covar=tensor([0.0007, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0006, 0.0005], device='cuda:2') 2023-03-09 11:21:21,869 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1263, 5.0483, 4.8270, 3.0332, 4.8679, 4.7354, 4.3666, 2.6551], device='cuda:2'), covar=tensor([0.0110, 0.0118, 0.0292, 0.0988, 0.0110, 0.0212, 0.0333, 0.1445], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0106, 0.0112, 0.0112, 0.0089, 0.0118, 0.0101, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 11:21:26,411 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=109674.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 11:21:57,828 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=109693.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 11:22:26,685 INFO [train2.py:809] (2/4) Epoch 28, batch 2150, loss[ctc_loss=0.09459, att_loss=0.2584, loss=0.2256, over 14551.00 frames. utt_duration=402.8 frames, utt_pad_proportion=0.3006, over 145.00 utterances.], tot_loss[ctc_loss=0.06548, att_loss=0.2317, loss=0.1985, over 3265597.44 frames. 
utt_duration=1230 frames, utt_pad_proportion=0.06028, over 10632.78 utterances.], batch size: 145, lr: 3.83e-03, grad_scale: 8.0 2023-03-09 11:22:40,782 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=109720.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:23:12,297 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-03-09 11:23:13,177 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=109741.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 11:23:34,194 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.864e+02 2.379e+02 2.851e+02 5.187e+02, threshold=4.758e+02, percent-clipped=2.0 2023-03-09 11:23:45,111 INFO [train2.py:809] (2/4) Epoch 28, batch 2200, loss[ctc_loss=0.05481, att_loss=0.2179, loss=0.1853, over 16014.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.007017, over 40.00 utterances.], tot_loss[ctc_loss=0.06447, att_loss=0.2309, loss=0.1976, over 3267422.14 frames. utt_duration=1255 frames, utt_pad_proportion=0.05506, over 10425.34 utterances.], batch size: 40, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:25:03,656 INFO [train2.py:809] (2/4) Epoch 28, batch 2250, loss[ctc_loss=0.06945, att_loss=0.2383, loss=0.2046, over 16806.00 frames. utt_duration=680.4 frames, utt_pad_proportion=0.1409, over 99.00 utterances.], tot_loss[ctc_loss=0.06524, att_loss=0.2317, loss=0.1984, over 3268639.70 frames. utt_duration=1213 frames, utt_pad_proportion=0.06341, over 10790.06 utterances.], batch size: 99, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:26:03,222 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=109850.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:26:10,586 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.320e+02 1.849e+02 2.212e+02 2.625e+02 5.541e+02, threshold=4.423e+02, percent-clipped=2.0 2023-03-09 11:26:21,265 INFO [train2.py:809] (2/4) Epoch 28, batch 2300, loss[ctc_loss=0.08291, att_loss=0.25, loss=0.2166, over 16629.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.00499, over 47.00 utterances.], tot_loss[ctc_loss=0.06568, att_loss=0.232, loss=0.1988, over 3275048.29 frames. utt_duration=1228 frames, utt_pad_proportion=0.05848, over 10679.39 utterances.], batch size: 47, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:26:51,213 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5394, 2.6819, 5.0775, 4.0250, 3.2801, 4.3478, 4.8580, 4.6377], device='cuda:2'), covar=tensor([0.0306, 0.1486, 0.0221, 0.0833, 0.1478, 0.0245, 0.0185, 0.0319], device='cuda:2'), in_proj_covar=tensor([0.0231, 0.0248, 0.0226, 0.0325, 0.0270, 0.0242, 0.0217, 0.0241], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:27:17,539 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=109898.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:27:38,458 INFO [train2.py:809] (2/4) Epoch 28, batch 2350, loss[ctc_loss=0.05594, att_loss=0.2195, loss=0.1868, over 16409.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.006351, over 44.00 utterances.], tot_loss[ctc_loss=0.06586, att_loss=0.232, loss=0.1988, over 3280378.07 frames. 
utt_duration=1253 frames, utt_pad_proportion=0.05095, over 10486.59 utterances.], batch size: 44, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:27:47,035 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9178, 5.0201, 4.8660, 2.2153, 1.9851, 2.9412, 2.2968, 3.8968], device='cuda:2'), covar=tensor([0.0797, 0.0325, 0.0302, 0.5196, 0.5508, 0.2419, 0.4180, 0.1529], device='cuda:2'), in_proj_covar=tensor([0.0360, 0.0300, 0.0278, 0.0251, 0.0338, 0.0331, 0.0263, 0.0368], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 11:27:47,643 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.15 vs. limit=5.0 2023-03-09 11:28:16,091 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=109935.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:28:23,943 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9927, 4.2348, 4.0111, 4.5036, 2.6254, 4.2975, 2.6626, 1.7699], device='cuda:2'), covar=tensor([0.0522, 0.0293, 0.0847, 0.0256, 0.1756, 0.0267, 0.1614, 0.1892], device='cuda:2'), in_proj_covar=tensor([0.0217, 0.0188, 0.0265, 0.0181, 0.0222, 0.0170, 0.0233, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:28:39,543 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=109950.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:28:46,857 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.356e+02 1.839e+02 2.224e+02 2.680e+02 5.661e+02, threshold=4.448e+02, percent-clipped=2.0 2023-03-09 11:28:57,735 INFO [train2.py:809] (2/4) Epoch 28, batch 2400, loss[ctc_loss=0.04869, att_loss=0.2145, loss=0.1813, over 15971.00 frames. utt_duration=1560 frames, utt_pad_proportion=0.005907, over 41.00 utterances.], tot_loss[ctc_loss=0.06566, att_loss=0.2323, loss=0.199, over 3287459.73 frames. utt_duration=1260 frames, utt_pad_proportion=0.04806, over 10448.18 utterances.], batch size: 41, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:29:17,575 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=109974.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:29:41,984 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0979, 3.8256, 3.7529, 3.4476, 3.7712, 3.8654, 3.8513, 2.9710], device='cuda:2'), covar=tensor([0.1072, 0.1123, 0.1845, 0.2328, 0.1684, 0.3677, 0.0927, 0.3022], device='cuda:2'), in_proj_covar=tensor([0.0205, 0.0208, 0.0226, 0.0276, 0.0186, 0.0287, 0.0208, 0.0233], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:29:48,450 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4139, 2.4766, 4.8058, 3.8108, 3.0737, 4.0560, 4.5077, 4.4757], device='cuda:2'), covar=tensor([0.0296, 0.1544, 0.0227, 0.0895, 0.1618, 0.0303, 0.0213, 0.0318], device='cuda:2'), in_proj_covar=tensor([0.0232, 0.0250, 0.0228, 0.0328, 0.0272, 0.0243, 0.0218, 0.0243], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:29:55,814 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=109998.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:30:22,867 INFO [train2.py:809] (2/4) Epoch 28, batch 2450, loss[ctc_loss=0.05029, att_loss=0.2068, loss=0.1755, over 15636.00 frames. 
utt_duration=1692 frames, utt_pad_proportion=0.008787, over 37.00 utterances.], tot_loss[ctc_loss=0.06563, att_loss=0.232, loss=0.1987, over 3289185.18 frames. utt_duration=1261 frames, utt_pad_proportion=0.04699, over 10446.00 utterances.], batch size: 37, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:30:28,542 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=110015.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:30:28,618 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8203, 5.1672, 5.0321, 5.1029, 5.2746, 4.9246, 4.0730, 5.2347], device='cuda:2'), covar=tensor([0.0112, 0.0104, 0.0131, 0.0080, 0.0075, 0.0104, 0.0498, 0.0151], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0092, 0.0117, 0.0073, 0.0080, 0.0090, 0.0105, 0.0110], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 11:30:31,947 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110017.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:30:39,403 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=110022.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:30:41,111 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6994, 5.1159, 4.9462, 5.0454, 5.2294, 4.8285, 3.7058, 5.1619], device='cuda:2'), covar=tensor([0.0122, 0.0112, 0.0135, 0.0091, 0.0080, 0.0114, 0.0639, 0.0156], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0093, 0.0118, 0.0073, 0.0080, 0.0090, 0.0105, 0.0110], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 11:31:12,218 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2023-03-09 11:31:14,797 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110044.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:31:30,255 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9742, 3.7377, 3.2042, 3.2999, 3.9214, 3.5925, 3.0367, 4.1243], device='cuda:2'), covar=tensor([0.1122, 0.0532, 0.1158, 0.0856, 0.0777, 0.0768, 0.0909, 0.0499], device='cuda:2'), in_proj_covar=tensor([0.0209, 0.0229, 0.0233, 0.0210, 0.0291, 0.0252, 0.0205, 0.0299], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 11:31:31,405 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.872e+02 2.152e+02 2.645e+02 5.223e+02, threshold=4.304e+02, percent-clipped=1.0 2023-03-09 11:31:41,819 INFO [train2.py:809] (2/4) Epoch 28, batch 2500, loss[ctc_loss=0.07464, att_loss=0.2414, loss=0.2081, over 17000.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.008894, over 51.00 utterances.], tot_loss[ctc_loss=0.06527, att_loss=0.2317, loss=0.1985, over 3283498.99 frames. 
utt_duration=1262 frames, utt_pad_proportion=0.04742, over 10418.07 utterances.], batch size: 51, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:32:07,605 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110078.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:32:50,574 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110105.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:32:58,089 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110110.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:33:01,505 INFO [train2.py:809] (2/4) Epoch 28, batch 2550, loss[ctc_loss=0.06935, att_loss=0.252, loss=0.2155, over 17020.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007715, over 51.00 utterances.], tot_loss[ctc_loss=0.06435, att_loss=0.2319, loss=0.1984, over 3290806.59 frames. utt_duration=1278 frames, utt_pad_proportion=0.04169, over 10308.62 utterances.], batch size: 51, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:33:07,183 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110115.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:33:41,632 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0924, 4.5154, 4.4981, 4.7009, 2.8213, 4.4577, 2.8565, 1.9288], device='cuda:2'), covar=tensor([0.0585, 0.0309, 0.0672, 0.0267, 0.1511, 0.0266, 0.1385, 0.1632], device='cuda:2'), in_proj_covar=tensor([0.0221, 0.0191, 0.0270, 0.0184, 0.0226, 0.0172, 0.0236, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:34:08,240 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.726e+02 2.160e+02 2.628e+02 5.720e+02, threshold=4.319e+02, percent-clipped=3.0 2023-03-09 11:34:19,904 INFO [train2.py:809] (2/4) Epoch 28, batch 2600, loss[ctc_loss=0.06612, att_loss=0.2486, loss=0.2121, over 17067.00 frames. utt_duration=1290 frames, utt_pad_proportion=0.008687, over 53.00 utterances.], tot_loss[ctc_loss=0.0645, att_loss=0.2313, loss=0.1979, over 3282536.21 frames. utt_duration=1262 frames, utt_pad_proportion=0.04767, over 10415.40 utterances.], batch size: 53, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:34:34,589 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110171.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:34:42,163 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110176.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:35:38,092 INFO [train2.py:809] (2/4) Epoch 28, batch 2650, loss[ctc_loss=0.05246, att_loss=0.2198, loss=0.1863, over 16532.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006351, over 45.00 utterances.], tot_loss[ctc_loss=0.06404, att_loss=0.2309, loss=0.1975, over 3282496.75 frames. 
utt_duration=1266 frames, utt_pad_proportion=0.04671, over 10380.68 utterances.], batch size: 45, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:36:02,616 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0715, 4.3615, 4.5068, 4.6210, 2.8579, 4.3188, 3.0289, 1.7576], device='cuda:2'), covar=tensor([0.0521, 0.0329, 0.0668, 0.0298, 0.1475, 0.0284, 0.1345, 0.1697], device='cuda:2'), in_proj_covar=tensor([0.0219, 0.0190, 0.0268, 0.0183, 0.0225, 0.0171, 0.0234, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:36:16,236 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=110235.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:36:46,734 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.822e+02 2.163e+02 2.650e+02 7.141e+02, threshold=4.325e+02, percent-clipped=4.0 2023-03-09 11:36:47,012 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0649, 5.3651, 5.2629, 5.2574, 5.3922, 5.3594, 5.0230, 4.8102], device='cuda:2'), covar=tensor([0.1123, 0.0545, 0.0348, 0.0599, 0.0282, 0.0319, 0.0427, 0.0320], device='cuda:2'), in_proj_covar=tensor([0.0531, 0.0375, 0.0371, 0.0378, 0.0438, 0.0444, 0.0376, 0.0411], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 11:36:57,425 INFO [train2.py:809] (2/4) Epoch 28, batch 2700, loss[ctc_loss=0.0571, att_loss=0.2203, loss=0.1876, over 15952.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.007151, over 41.00 utterances.], tot_loss[ctc_loss=0.06433, att_loss=0.2308, loss=0.1975, over 3274038.67 frames. utt_duration=1285 frames, utt_pad_proportion=0.04459, over 10204.84 utterances.], batch size: 41, lr: 3.82e-03, grad_scale: 8.0 2023-03-09 11:37:32,558 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=110283.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:37:32,822 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110283.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:38:03,444 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110303.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:38:16,818 INFO [train2.py:809] (2/4) Epoch 28, batch 2750, loss[ctc_loss=0.04735, att_loss=0.2277, loss=0.1916, over 16883.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006729, over 49.00 utterances.], tot_loss[ctc_loss=0.06362, att_loss=0.2301, loss=0.1968, over 3273537.29 frames. 
utt_duration=1302 frames, utt_pad_proportion=0.04048, over 10069.62 utterances.], batch size: 49, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:38:22,982 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=110315.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:38:27,615 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110318.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:38:36,807 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1316, 4.4832, 4.5177, 4.6862, 2.7576, 4.4330, 2.9251, 2.3033], device='cuda:2'), covar=tensor([0.0491, 0.0310, 0.0677, 0.0271, 0.1540, 0.0260, 0.1332, 0.1450], device='cuda:2'), in_proj_covar=tensor([0.0219, 0.0190, 0.0268, 0.0182, 0.0224, 0.0171, 0.0233, 0.0204], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:39:08,928 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110344.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:39:13,547 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6839, 2.4522, 2.8411, 2.7780, 3.0817, 3.0354, 2.4361, 3.0866], device='cuda:2'), covar=tensor([0.1380, 0.2067, 0.1446, 0.1254, 0.1118, 0.0956, 0.1749, 0.1013], device='cuda:2'), in_proj_covar=tensor([0.0147, 0.0149, 0.0144, 0.0139, 0.0156, 0.0134, 0.0154, 0.0134], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 11:39:25,559 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.237e+02 1.842e+02 2.112e+02 2.497e+02 5.270e+02, threshold=4.224e+02, percent-clipped=1.0 2023-03-09 11:39:34,249 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.14 vs. limit=5.0 2023-03-09 11:39:36,856 INFO [train2.py:809] (2/4) Epoch 28, batch 2800, loss[ctc_loss=0.04896, att_loss=0.2433, loss=0.2044, over 16614.00 frames. utt_duration=1415 frames, utt_pad_proportion=0.003908, over 47.00 utterances.], tot_loss[ctc_loss=0.06345, att_loss=0.2302, loss=0.1968, over 3276321.19 frames. utt_duration=1281 frames, utt_pad_proportion=0.04485, over 10242.29 utterances.], batch size: 47, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:39:39,131 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=110363.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:39:40,845 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110364.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:39:54,237 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=110373.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:40:04,279 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110379.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:40:37,355 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=110400.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:40:55,889 INFO [train2.py:809] (2/4) Epoch 28, batch 2850, loss[ctc_loss=0.06755, att_loss=0.2308, loss=0.1981, over 16116.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006268, over 42.00 utterances.], tot_loss[ctc_loss=0.06349, att_loss=0.2301, loss=0.1968, over 3272781.07 frames. 
utt_duration=1271 frames, utt_pad_proportion=0.04808, over 10313.85 utterances.], batch size: 42, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:42:02,957 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.889e+02 2.295e+02 2.895e+02 8.128e+02, threshold=4.591e+02, percent-clipped=8.0 2023-03-09 11:42:14,323 INFO [train2.py:809] (2/4) Epoch 28, batch 2900, loss[ctc_loss=0.07947, att_loss=0.2454, loss=0.2122, over 16670.00 frames. utt_duration=1451 frames, utt_pad_proportion=0.007702, over 46.00 utterances.], tot_loss[ctc_loss=0.06395, att_loss=0.2302, loss=0.1969, over 3274389.56 frames. utt_duration=1275 frames, utt_pad_proportion=0.04671, over 10284.63 utterances.], batch size: 46, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:42:21,224 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=110466.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:42:28,779 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=110471.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:42:35,092 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4747, 2.4390, 2.4540, 2.6038, 2.9239, 2.7300, 2.3375, 2.8486], device='cuda:2'), covar=tensor([0.1341, 0.2049, 0.1567, 0.1057, 0.1327, 0.0951, 0.1583, 0.1274], device='cuda:2'), in_proj_covar=tensor([0.0145, 0.0146, 0.0142, 0.0137, 0.0154, 0.0132, 0.0153, 0.0132], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 11:43:32,998 INFO [train2.py:809] (2/4) Epoch 28, batch 2950, loss[ctc_loss=0.05481, att_loss=0.2175, loss=0.1849, over 16178.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.005787, over 41.00 utterances.], tot_loss[ctc_loss=0.06424, att_loss=0.2305, loss=0.1973, over 3267852.59 frames. utt_duration=1271 frames, utt_pad_proportion=0.05003, over 10298.24 utterances.], batch size: 41, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:44:31,434 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1152, 5.3595, 5.2715, 5.3295, 5.4136, 5.3942, 5.0471, 4.8570], device='cuda:2'), covar=tensor([0.1002, 0.0597, 0.0340, 0.0507, 0.0278, 0.0292, 0.0378, 0.0288], device='cuda:2'), in_proj_covar=tensor([0.0531, 0.0374, 0.0370, 0.0376, 0.0436, 0.0440, 0.0374, 0.0409], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 11:44:39,990 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.946e+01 1.771e+02 2.102e+02 2.638e+02 6.862e+02, threshold=4.204e+02, percent-clipped=2.0 2023-03-09 11:44:52,454 INFO [train2.py:809] (2/4) Epoch 28, batch 3000, loss[ctc_loss=0.0534, att_loss=0.2114, loss=0.1798, over 15763.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.008468, over 38.00 utterances.], tot_loss[ctc_loss=0.06422, att_loss=0.2308, loss=0.1975, over 3261650.94 frames. utt_duration=1235 frames, utt_pad_proportion=0.06019, over 10580.25 utterances.], batch size: 38, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:44:52,455 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 11:45:06,793 INFO [train2.py:843] (2/4) Epoch 28, validation: ctc_loss=0.04082, att_loss=0.2346, loss=0.1958, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 
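The train2.py and optim.py records around this point log two derived quantities without showing how they are obtained: the combined loss reported alongside ctc_loss and att_loss, and the gradient-clipping threshold reported next to the grad-norm quartiles. The short Python sketch below is not taken from the icefall sources; it only checks two relationships inferred from the logged numbers, namely that loss is an interpolation of att_loss and ctc_loss with an assumed attention weight of 0.8, and that the threshold equals Clipping_scale times the median (middle) grad-norm quartile.

# Minimal sanity-check sketch; relationships inferred from the logged values,
# not copied from train2.py / optim.py.

def combined_loss(ctc_loss: float, att_loss: float, att_rate: float = 0.8) -> float:
    # att_rate=0.8 is an assumption that reproduces the logged 'loss' figures.
    return att_rate * att_loss + (1.0 - att_rate) * ctc_loss

def clip_threshold(grad_norm_median: float, clipping_scale: float = 2.0) -> float:
    # The logged 'threshold' matches clipping_scale * median grad-norm quartile.
    return clipping_scale * grad_norm_median

if __name__ == "__main__":
    # Epoch 28 validation record: ctc_loss=0.04082, att_loss=0.2346, loss=0.1958
    print(round(combined_loss(0.04082, 0.2346), 4))  # -> 0.1958
    # Clipping record: median quartile 2.102e+02, threshold=4.204e+02
    print(clip_threshold(2.102e+02))                 # -> 420.4

Applied to the batch-level records as well (for example ctc_loss=0.05481, att_loss=0.2179 giving loss=0.1853), the same 0.8/0.2 interpolation reproduces the logged totals, which is why it is used here as the working assumption.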
2023-03-09 11:45:06,793 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 11:46:25,757 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110611.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:46:26,957 INFO [train2.py:809] (2/4) Epoch 28, batch 3050, loss[ctc_loss=0.06102, att_loss=0.2409, loss=0.2049, over 17027.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007366, over 51.00 utterances.], tot_loss[ctc_loss=0.06391, att_loss=0.2309, loss=0.1975, over 3266154.28 frames. utt_duration=1239 frames, utt_pad_proportion=0.05838, over 10559.57 utterances.], batch size: 51, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:47:10,741 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=110639.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:47:35,117 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.322e+02 1.799e+02 2.141e+02 2.649e+02 5.384e+02, threshold=4.281e+02, percent-clipped=1.0 2023-03-09 11:47:42,921 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=110659.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:47:47,511 INFO [train2.py:809] (2/4) Epoch 28, batch 3100, loss[ctc_loss=0.05289, att_loss=0.2293, loss=0.194, over 16977.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.005965, over 50.00 utterances.], tot_loss[ctc_loss=0.06416, att_loss=0.231, loss=0.1976, over 3266056.79 frames. utt_duration=1207 frames, utt_pad_proportion=0.06681, over 10840.77 utterances.], batch size: 50, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:47:57,988 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-09 11:48:03,631 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110672.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:48:05,216 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=110673.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:48:06,682 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=110674.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:48:48,160 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=110700.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:49:07,615 INFO [train2.py:809] (2/4) Epoch 28, batch 3150, loss[ctc_loss=0.05661, att_loss=0.2289, loss=0.1945, over 15999.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007853, over 40.00 utterances.], tot_loss[ctc_loss=0.06417, att_loss=0.2313, loss=0.1979, over 3268773.88 frames. utt_duration=1233 frames, utt_pad_proportion=0.06034, over 10618.56 utterances.], batch size: 40, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:49:21,688 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=110721.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:49:33,172 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. 
limit=2.0 2023-03-09 11:49:46,545 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110736.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:50:04,804 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=110748.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:50:15,626 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.358e+02 1.871e+02 2.189e+02 2.637e+02 5.370e+02, threshold=4.378e+02, percent-clipped=2.0 2023-03-09 11:50:27,799 INFO [train2.py:809] (2/4) Epoch 28, batch 3200, loss[ctc_loss=0.05592, att_loss=0.2292, loss=0.1945, over 16696.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005223, over 46.00 utterances.], tot_loss[ctc_loss=0.06381, att_loss=0.2317, loss=0.1981, over 3279200.75 frames. utt_duration=1229 frames, utt_pad_proportion=0.05781, over 10687.52 utterances.], batch size: 46, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:50:31,384 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6328, 4.6436, 4.7497, 4.7244, 5.3174, 4.5339, 4.6727, 2.8634], device='cuda:2'), covar=tensor([0.0293, 0.0410, 0.0334, 0.0381, 0.0590, 0.0305, 0.0355, 0.1580], device='cuda:2'), in_proj_covar=tensor([0.0196, 0.0225, 0.0222, 0.0238, 0.0384, 0.0195, 0.0212, 0.0221], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:50:34,322 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1741, 5.5255, 5.1182, 5.5864, 4.9339, 5.1113, 5.6568, 5.4665], device='cuda:2'), covar=tensor([0.0550, 0.0240, 0.0685, 0.0272, 0.0334, 0.0222, 0.0178, 0.0180], device='cuda:2'), in_proj_covar=tensor([0.0402, 0.0341, 0.0379, 0.0380, 0.0338, 0.0245, 0.0321, 0.0301], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 11:50:34,380 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=110766.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:50:39,495 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2023-03-09 11:50:42,167 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=110771.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:51:11,549 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0427, 4.3198, 4.4850, 4.5352, 2.6778, 4.4904, 2.5334, 1.4711], device='cuda:2'), covar=tensor([0.0531, 0.0305, 0.0614, 0.0251, 0.1562, 0.0212, 0.1550, 0.1784], device='cuda:2'), in_proj_covar=tensor([0.0219, 0.0189, 0.0267, 0.0181, 0.0223, 0.0170, 0.0232, 0.0203], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 11:51:23,929 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110797.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 11:51:48,161 INFO [train2.py:809] (2/4) Epoch 28, batch 3250, loss[ctc_loss=0.06559, att_loss=0.2349, loss=0.201, over 13896.00 frames. utt_duration=384.8 frames, utt_pad_proportion=0.3307, over 145.00 utterances.], tot_loss[ctc_loss=0.06423, att_loss=0.2312, loss=0.1978, over 3272918.35 frames. 
utt_duration=1215 frames, utt_pad_proportion=0.06271, over 10791.51 utterances.], batch size: 145, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:51:51,340 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=110814.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:51:59,336 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=110819.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:52:07,324 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0608, 4.9711, 4.6845, 2.9996, 4.8742, 4.8397, 4.2846, 2.5296], device='cuda:2'), covar=tensor([0.0137, 0.0143, 0.0382, 0.1136, 0.0114, 0.0200, 0.0368, 0.1754], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0106, 0.0112, 0.0112, 0.0089, 0.0118, 0.0102, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 11:52:24,815 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6930, 2.6015, 2.5455, 2.6684, 2.9191, 2.8685, 2.6676, 3.0142], device='cuda:2'), covar=tensor([0.1577, 0.1997, 0.1805, 0.1288, 0.1393, 0.0904, 0.1522, 0.1229], device='cuda:2'), in_proj_covar=tensor([0.0148, 0.0148, 0.0145, 0.0139, 0.0156, 0.0134, 0.0155, 0.0134], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 11:52:55,395 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 1.746e+02 2.093e+02 2.617e+02 5.212e+02, threshold=4.186e+02, percent-clipped=3.0 2023-03-09 11:53:07,975 INFO [train2.py:809] (2/4) Epoch 28, batch 3300, loss[ctc_loss=0.0703, att_loss=0.234, loss=0.2013, over 17381.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03301, over 63.00 utterances.], tot_loss[ctc_loss=0.06458, att_loss=0.2317, loss=0.1983, over 3277732.15 frames. utt_duration=1240 frames, utt_pad_proportion=0.05436, over 10584.69 utterances.], batch size: 63, lr: 3.81e-03, grad_scale: 8.0 2023-03-09 11:53:17,667 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110868.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:54:27,398 INFO [train2.py:809] (2/4) Epoch 28, batch 3350, loss[ctc_loss=0.08953, att_loss=0.2395, loss=0.2095, over 17379.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03429, over 63.00 utterances.], tot_loss[ctc_loss=0.06452, att_loss=0.2311, loss=0.1978, over 3276956.35 frames. utt_duration=1257 frames, utt_pad_proportion=0.05083, over 10437.61 utterances.], batch size: 63, lr: 3.80e-03, grad_scale: 8.0 2023-03-09 11:54:54,359 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=110929.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:55:10,285 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=110939.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:55:36,025 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.217e+02 1.730e+02 2.154e+02 2.752e+02 9.084e+02, threshold=4.307e+02, percent-clipped=4.0 2023-03-09 11:55:41,052 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=110958.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:55:42,518 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=110959.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:55:46,718 INFO [train2.py:809] (2/4) Epoch 28, batch 3400, loss[ctc_loss=0.07125, att_loss=0.2228, loss=0.1925, over 15967.00 frames. 
utt_duration=1559 frames, utt_pad_proportion=0.005678, over 41.00 utterances.], tot_loss[ctc_loss=0.06525, att_loss=0.2318, loss=0.1985, over 3280297.28 frames. utt_duration=1229 frames, utt_pad_proportion=0.05673, over 10692.89 utterances.], batch size: 41, lr: 3.80e-03, grad_scale: 8.0 2023-03-09 11:55:54,575 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=110967.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:56:05,564 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=110974.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:56:26,762 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=110987.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:56:58,899 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=111007.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:57:06,333 INFO [train2.py:809] (2/4) Epoch 28, batch 3450, loss[ctc_loss=0.07083, att_loss=0.2194, loss=0.1896, over 15767.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.007436, over 38.00 utterances.], tot_loss[ctc_loss=0.06541, att_loss=0.2316, loss=0.1983, over 3276160.11 frames. utt_duration=1220 frames, utt_pad_proportion=0.05985, over 10758.44 utterances.], batch size: 38, lr: 3.80e-03, grad_scale: 8.0 2023-03-09 11:57:17,585 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=111019.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:57:21,942 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=111022.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 11:58:14,706 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.190e+02 1.799e+02 2.104e+02 2.788e+02 6.212e+02, threshold=4.209e+02, percent-clipped=4.0 2023-03-09 11:58:25,585 INFO [train2.py:809] (2/4) Epoch 28, batch 3500, loss[ctc_loss=0.06005, att_loss=0.2123, loss=0.1818, over 15772.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008416, over 38.00 utterances.], tot_loss[ctc_loss=0.0648, att_loss=0.2311, loss=0.1978, over 3271173.89 frames. utt_duration=1231 frames, utt_pad_proportion=0.05887, over 10640.03 utterances.], batch size: 38, lr: 3.80e-03, grad_scale: 8.0 2023-03-09 11:59:14,514 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=111092.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 11:59:18,560 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-09 11:59:46,741 INFO [train2.py:809] (2/4) Epoch 28, batch 3550, loss[ctc_loss=0.06988, att_loss=0.2303, loss=0.1982, over 16120.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.005502, over 42.00 utterances.], tot_loss[ctc_loss=0.06433, att_loss=0.2304, loss=0.1972, over 3263790.41 frames. utt_duration=1211 frames, utt_pad_proportion=0.06676, over 10793.16 utterances.], batch size: 42, lr: 3.80e-03, grad_scale: 8.0 2023-03-09 12:00:51,424 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-03-09 12:00:55,330 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.811e+02 2.154e+02 2.613e+02 6.278e+02, threshold=4.308e+02, percent-clipped=2.0 2023-03-09 12:01:05,920 INFO [train2.py:809] (2/4) Epoch 28, batch 3600, loss[ctc_loss=0.09506, att_loss=0.2482, loss=0.2176, over 17385.00 frames. 
utt_duration=1105 frames, utt_pad_proportion=0.03406, over 63.00 utterances.], tot_loss[ctc_loss=0.06464, att_loss=0.2311, loss=0.1978, over 3266047.46 frames. utt_duration=1174 frames, utt_pad_proportion=0.07438, over 11144.89 utterances.], batch size: 63, lr: 3.80e-03, grad_scale: 8.0 2023-03-09 12:02:20,210 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8505, 4.7855, 4.6400, 2.7702, 4.6393, 4.6648, 4.2271, 2.5471], device='cuda:2'), covar=tensor([0.0129, 0.0132, 0.0264, 0.1110, 0.0114, 0.0207, 0.0329, 0.1534], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0107, 0.0112, 0.0112, 0.0089, 0.0118, 0.0102, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:02:25,880 INFO [train2.py:809] (2/4) Epoch 28, batch 3650, loss[ctc_loss=0.0945, att_loss=0.2267, loss=0.2002, over 15492.00 frames. utt_duration=1723 frames, utt_pad_proportion=0.009169, over 36.00 utterances.], tot_loss[ctc_loss=0.06512, att_loss=0.2306, loss=0.1975, over 3253150.59 frames. utt_duration=1169 frames, utt_pad_proportion=0.08, over 11150.52 utterances.], batch size: 36, lr: 3.80e-03, grad_scale: 8.0 2023-03-09 12:02:45,389 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=111224.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:02:47,109 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=111225.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:03:34,857 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.213e+02 1.792e+02 2.128e+02 2.694e+02 5.056e+02, threshold=4.257e+02, percent-clipped=2.0 2023-03-09 12:03:45,876 INFO [train2.py:809] (2/4) Epoch 28, batch 3700, loss[ctc_loss=0.07137, att_loss=0.2405, loss=0.2066, over 16631.00 frames. utt_duration=673.6 frames, utt_pad_proportion=0.1559, over 99.00 utterances.], tot_loss[ctc_loss=0.06452, att_loss=0.2308, loss=0.1976, over 3264363.51 frames. utt_duration=1190 frames, utt_pad_proportion=0.07069, over 10990.08 utterances.], batch size: 99, lr: 3.80e-03, grad_scale: 16.0 2023-03-09 12:03:54,489 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=111267.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:04:10,481 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0713, 4.2057, 3.9401, 4.2447, 3.8850, 3.7142, 4.2354, 4.1197], device='cuda:2'), covar=tensor([0.0515, 0.0328, 0.0575, 0.0446, 0.0348, 0.1065, 0.0272, 0.0204], device='cuda:2'), in_proj_covar=tensor([0.0402, 0.0338, 0.0378, 0.0379, 0.0337, 0.0244, 0.0319, 0.0300], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 12:04:15,134 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0277, 5.2968, 5.5552, 5.3588, 5.5109, 5.9871, 5.2553, 6.0701], device='cuda:2'), covar=tensor([0.0749, 0.0765, 0.0870, 0.1463, 0.1857, 0.0911, 0.0811, 0.0661], device='cuda:2'), in_proj_covar=tensor([0.0925, 0.0536, 0.0655, 0.0692, 0.0926, 0.0668, 0.0517, 0.0652], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:04:24,237 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=111286.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:05:06,057 INFO [train2.py:809] (2/4) Epoch 28, batch 3750, loss[ctc_loss=0.08867, att_loss=0.242, loss=0.2114, over 17279.00 frames. 
utt_duration=1258 frames, utt_pad_proportion=0.01222, over 55.00 utterances.], tot_loss[ctc_loss=0.06544, att_loss=0.2316, loss=0.1984, over 3265858.23 frames. utt_duration=1164 frames, utt_pad_proportion=0.07658, over 11241.71 utterances.], batch size: 55, lr: 3.80e-03, grad_scale: 16.0 2023-03-09 12:05:09,377 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=111314.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:05:10,823 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=111315.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:05:15,487 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0000, 5.2170, 5.1598, 5.1628, 5.2542, 5.2271, 4.8827, 4.7213], device='cuda:2'), covar=tensor([0.1012, 0.0592, 0.0354, 0.0558, 0.0310, 0.0364, 0.0455, 0.0353], device='cuda:2'), in_proj_covar=tensor([0.0531, 0.0376, 0.0374, 0.0377, 0.0442, 0.0444, 0.0376, 0.0412], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:05:26,771 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1193, 5.4733, 5.0363, 5.4975, 4.8915, 5.0690, 5.5968, 5.3662], device='cuda:2'), covar=tensor([0.0623, 0.0292, 0.0762, 0.0320, 0.0412, 0.0230, 0.0201, 0.0189], device='cuda:2'), in_proj_covar=tensor([0.0404, 0.0340, 0.0380, 0.0382, 0.0340, 0.0247, 0.0321, 0.0302], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 12:06:13,691 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.228e+02 1.865e+02 2.265e+02 2.758e+02 6.117e+02, threshold=4.530e+02, percent-clipped=1.0 2023-03-09 12:06:14,080 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9669, 4.8830, 4.7015, 2.8651, 4.7438, 4.6163, 4.2679, 2.7086], device='cuda:2'), covar=tensor([0.0116, 0.0112, 0.0271, 0.1031, 0.0101, 0.0215, 0.0293, 0.1339], device='cuda:2'), in_proj_covar=tensor([0.0080, 0.0107, 0.0113, 0.0112, 0.0090, 0.0119, 0.0102, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:06:25,559 INFO [train2.py:809] (2/4) Epoch 28, batch 3800, loss[ctc_loss=0.06015, att_loss=0.2366, loss=0.2013, over 16477.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006041, over 46.00 utterances.], tot_loss[ctc_loss=0.06528, att_loss=0.2314, loss=0.1982, over 3253240.86 frames. utt_duration=1194 frames, utt_pad_proportion=0.07237, over 10916.27 utterances.], batch size: 46, lr: 3.80e-03, grad_scale: 16.0 2023-03-09 12:06:46,845 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.2028, 4.4404, 4.5370, 4.5807, 5.0770, 4.3456, 4.4931, 2.5989], device='cuda:2'), covar=tensor([0.0416, 0.0376, 0.0393, 0.0355, 0.0888, 0.0339, 0.0366, 0.1746], device='cuda:2'), in_proj_covar=tensor([0.0199, 0.0227, 0.0225, 0.0239, 0.0386, 0.0196, 0.0215, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:06:51,902 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.00 vs. limit=5.0 2023-03-09 12:07:13,977 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=111392.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 12:07:44,851 INFO [train2.py:809] (2/4) Epoch 28, batch 3850, loss[ctc_loss=0.05594, att_loss=0.2113, loss=0.1803, over 15348.00 frames. 
utt_duration=1756 frames, utt_pad_proportion=0.0126, over 35.00 utterances.], tot_loss[ctc_loss=0.06427, att_loss=0.2306, loss=0.1973, over 3259123.98 frames. utt_duration=1222 frames, utt_pad_proportion=0.06324, over 10681.39 utterances.], batch size: 35, lr: 3.80e-03, grad_scale: 16.0 2023-03-09 12:07:54,368 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3293, 4.6315, 4.5069, 4.6481, 4.7669, 4.4302, 3.4028, 4.6200], device='cuda:2'), covar=tensor([0.0145, 0.0128, 0.0156, 0.0091, 0.0097, 0.0133, 0.0678, 0.0200], device='cuda:2'), in_proj_covar=tensor([0.0098, 0.0094, 0.0119, 0.0074, 0.0080, 0.0092, 0.0107, 0.0112], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:08:28,210 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=111440.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:08:32,024 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-09 12:08:51,737 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 1.814e+02 2.141e+02 2.602e+02 6.153e+02, threshold=4.282e+02, percent-clipped=2.0 2023-03-09 12:09:02,531 INFO [train2.py:809] (2/4) Epoch 28, batch 3900, loss[ctc_loss=0.05499, att_loss=0.2297, loss=0.1948, over 16949.00 frames. utt_duration=1357 frames, utt_pad_proportion=0.007792, over 50.00 utterances.], tot_loss[ctc_loss=0.06391, att_loss=0.2304, loss=0.1971, over 3253845.62 frames. utt_duration=1219 frames, utt_pad_proportion=0.06399, over 10689.90 utterances.], batch size: 50, lr: 3.80e-03, grad_scale: 16.0 2023-03-09 12:09:06,529 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2023-03-09 12:09:12,010 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=111468.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:10:19,197 INFO [train2.py:809] (2/4) Epoch 28, batch 3950, loss[ctc_loss=0.05732, att_loss=0.2123, loss=0.1813, over 13555.00 frames. utt_duration=1809 frames, utt_pad_proportion=0.02742, over 30.00 utterances.], tot_loss[ctc_loss=0.06424, att_loss=0.2306, loss=0.1973, over 3252635.78 frames. utt_duration=1220 frames, utt_pad_proportion=0.06507, over 10676.75 utterances.], batch size: 30, lr: 3.79e-03, grad_scale: 16.0 2023-03-09 12:10:37,694 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=111524.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:10:45,257 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=111529.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:10:59,762 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5500, 2.6882, 3.6600, 2.8466, 3.5611, 4.7578, 4.6917, 3.0139], device='cuda:2'), covar=tensor([0.0506, 0.2344, 0.1156, 0.1812, 0.1091, 0.0863, 0.0491, 0.1735], device='cuda:2'), in_proj_covar=tensor([0.0251, 0.0250, 0.0295, 0.0220, 0.0273, 0.0382, 0.0275, 0.0237], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:11:24,279 INFO [train2.py:809] (2/4) Epoch 29, batch 0, loss[ctc_loss=0.0779, att_loss=0.2507, loss=0.2161, over 17416.00 frames. utt_duration=1108 frames, utt_pad_proportion=0.03188, over 63.00 utterances.], tot_loss[ctc_loss=0.0779, att_loss=0.2507, loss=0.2161, over 17416.00 frames. 
utt_duration=1108 frames, utt_pad_proportion=0.03188, over 63.00 utterances.], batch size: 63, lr: 3.73e-03, grad_scale: 8.0 2023-03-09 12:11:24,280 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 12:11:36,590 INFO [train2.py:843] (2/4) Epoch 29, validation: ctc_loss=0.04125, att_loss=0.2346, loss=0.1959, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 12:11:36,591 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 12:11:54,772 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.196e+02 1.789e+02 2.164e+02 2.545e+02 5.866e+02, threshold=4.328e+02, percent-clipped=2.0 2023-03-09 12:12:19,811 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=111572.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:12:34,351 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=111581.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:12:44,468 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2023-03-09 12:12:55,745 INFO [train2.py:809] (2/4) Epoch 29, batch 50, loss[ctc_loss=0.05521, att_loss=0.2442, loss=0.2064, over 17013.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.009062, over 51.00 utterances.], tot_loss[ctc_loss=0.06564, att_loss=0.2333, loss=0.1997, over 746987.31 frames. utt_duration=1336 frames, utt_pad_proportion=0.02187, over 2238.43 utterances.], batch size: 51, lr: 3.73e-03, grad_scale: 8.0 2023-03-09 12:13:26,138 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.58 vs. limit=5.0 2023-03-09 12:13:26,779 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=111614.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:14:02,201 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0322, 4.1476, 4.1440, 4.4691, 2.6016, 4.3688, 2.8064, 1.8261], device='cuda:2'), covar=tensor([0.0515, 0.0336, 0.0735, 0.0261, 0.1648, 0.0258, 0.1380, 0.1670], device='cuda:2'), in_proj_covar=tensor([0.0222, 0.0192, 0.0270, 0.0183, 0.0227, 0.0173, 0.0234, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:14:15,536 INFO [train2.py:809] (2/4) Epoch 29, batch 100, loss[ctc_loss=0.06688, att_loss=0.2131, loss=0.1838, over 15660.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.008038, over 37.00 utterances.], tot_loss[ctc_loss=0.06303, att_loss=0.2288, loss=0.1956, over 1298459.18 frames. 
utt_duration=1343 frames, utt_pad_proportion=0.03184, over 3872.25 utterances.], batch size: 37, lr: 3.72e-03, grad_scale: 8.0 2023-03-09 12:14:33,656 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.291e+02 1.812e+02 2.121e+02 2.575e+02 9.495e+02, threshold=4.242e+02, percent-clipped=2.0 2023-03-09 12:14:40,247 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5123, 2.8720, 4.9755, 4.0254, 3.1416, 4.2518, 4.8084, 4.6893], device='cuda:2'), covar=tensor([0.0294, 0.1463, 0.0229, 0.0885, 0.1648, 0.0293, 0.0224, 0.0287], device='cuda:2'), in_proj_covar=tensor([0.0236, 0.0249, 0.0229, 0.0328, 0.0272, 0.0244, 0.0221, 0.0244], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:14:42,935 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=111662.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:15:35,264 INFO [train2.py:809] (2/4) Epoch 29, batch 150, loss[ctc_loss=0.0879, att_loss=0.2534, loss=0.2203, over 16762.00 frames. utt_duration=1398 frames, utt_pad_proportion=0.006792, over 48.00 utterances.], tot_loss[ctc_loss=0.06353, att_loss=0.2305, loss=0.1971, over 1734755.12 frames. utt_duration=1301 frames, utt_pad_proportion=0.04002, over 5341.17 utterances.], batch size: 48, lr: 3.72e-03, grad_scale: 8.0 2023-03-09 12:16:20,868 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0896, 5.0609, 4.8463, 2.9006, 4.8745, 4.8202, 4.4293, 2.8191], device='cuda:2'), covar=tensor([0.0130, 0.0120, 0.0300, 0.1038, 0.0105, 0.0191, 0.0311, 0.1364], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0107, 0.0113, 0.0113, 0.0090, 0.0119, 0.0101, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:16:25,610 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.3120, 5.5341, 5.4752, 5.4874, 5.5721, 5.5650, 5.1877, 4.9924], device='cuda:2'), covar=tensor([0.0947, 0.0468, 0.0260, 0.0416, 0.0280, 0.0266, 0.0428, 0.0331], device='cuda:2'), in_proj_covar=tensor([0.0539, 0.0379, 0.0377, 0.0380, 0.0446, 0.0448, 0.0380, 0.0417], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:16:57,985 INFO [train2.py:809] (2/4) Epoch 29, batch 200, loss[ctc_loss=0.05488, att_loss=0.2324, loss=0.1969, over 16877.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.007126, over 49.00 utterances.], tot_loss[ctc_loss=0.06319, att_loss=0.2309, loss=0.1974, over 2080654.31 frames. utt_duration=1263 frames, utt_pad_proportion=0.0477, over 6599.01 utterances.], batch size: 49, lr: 3.72e-03, grad_scale: 8.0 2023-03-09 12:17:15,320 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. 
limit=2.0 2023-03-09 12:17:15,891 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.262e+02 1.846e+02 2.300e+02 2.696e+02 5.049e+02, threshold=4.600e+02, percent-clipped=1.0 2023-03-09 12:17:45,429 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1331, 5.4395, 5.6612, 5.4609, 5.6745, 6.0991, 5.3189, 6.1878], device='cuda:2'), covar=tensor([0.0737, 0.0698, 0.0811, 0.1370, 0.1828, 0.0853, 0.0682, 0.0657], device='cuda:2'), in_proj_covar=tensor([0.0927, 0.0538, 0.0656, 0.0694, 0.0926, 0.0672, 0.0516, 0.0656], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:18:01,234 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9706, 4.9415, 4.7155, 3.0246, 4.7916, 4.6460, 4.3014, 2.7742], device='cuda:2'), covar=tensor([0.0129, 0.0105, 0.0302, 0.0941, 0.0102, 0.0210, 0.0298, 0.1299], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0107, 0.0112, 0.0112, 0.0090, 0.0118, 0.0101, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:18:12,035 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=111791.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 12:18:17,855 INFO [train2.py:809] (2/4) Epoch 29, batch 250, loss[ctc_loss=0.07009, att_loss=0.2365, loss=0.2032, over 16267.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007285, over 43.00 utterances.], tot_loss[ctc_loss=0.06299, att_loss=0.23, loss=0.1966, over 2343133.93 frames. utt_duration=1280 frames, utt_pad_proportion=0.045, over 7328.26 utterances.], batch size: 43, lr: 3.72e-03, grad_scale: 8.0 2023-03-09 12:19:05,346 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=111824.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:19:37,433 INFO [train2.py:809] (2/4) Epoch 29, batch 300, loss[ctc_loss=0.08251, att_loss=0.2507, loss=0.2171, over 17354.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.02161, over 59.00 utterances.], tot_loss[ctc_loss=0.06426, att_loss=0.2309, loss=0.1976, over 2553094.74 frames. utt_duration=1244 frames, utt_pad_proportion=0.05368, over 8221.43 utterances.], batch size: 59, lr: 3.72e-03, grad_scale: 8.0 2023-03-09 12:19:49,948 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=111852.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 12:19:55,511 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.235e+02 1.741e+02 2.135e+02 2.604e+02 1.132e+03, threshold=4.270e+02, percent-clipped=2.0 2023-03-09 12:20:35,428 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=111881.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:20:56,610 INFO [train2.py:809] (2/4) Epoch 29, batch 350, loss[ctc_loss=0.119, att_loss=0.2609, loss=0.2326, over 14765.00 frames. utt_duration=405.9 frames, utt_pad_proportion=0.2928, over 146.00 utterances.], tot_loss[ctc_loss=0.06443, att_loss=0.2307, loss=0.1975, over 2713205.14 frames. utt_duration=1263 frames, utt_pad_proportion=0.04977, over 8600.13 utterances.], batch size: 146, lr: 3.72e-03, grad_scale: 8.0 2023-03-09 12:21:50,813 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=111929.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:22:16,054 INFO [train2.py:809] (2/4) Epoch 29, batch 400, loss[ctc_loss=0.05369, att_loss=0.2356, loss=0.1992, over 16883.00 frames. 
utt_duration=1380 frames, utt_pad_proportion=0.007503, over 49.00 utterances.], tot_loss[ctc_loss=0.06356, att_loss=0.2306, loss=0.1972, over 2842414.51 frames. utt_duration=1261 frames, utt_pad_proportion=0.05002, over 9029.59 utterances.], batch size: 49, lr: 3.72e-03, grad_scale: 8.0 2023-03-09 12:22:17,701 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0736, 6.3274, 5.7356, 6.0421, 5.9850, 5.5919, 5.7669, 5.4356], device='cuda:2'), covar=tensor([0.1246, 0.0796, 0.1198, 0.0785, 0.0836, 0.1415, 0.2059, 0.2116], device='cuda:2'), in_proj_covar=tensor([0.0563, 0.0639, 0.0490, 0.0475, 0.0454, 0.0483, 0.0641, 0.0547], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:22:31,918 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0929, 5.0258, 4.8246, 3.0325, 4.8687, 4.7187, 4.3494, 2.6978], device='cuda:2'), covar=tensor([0.0104, 0.0098, 0.0291, 0.0943, 0.0098, 0.0204, 0.0316, 0.1389], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0106, 0.0112, 0.0112, 0.0089, 0.0118, 0.0100, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:22:34,679 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.870e+02 2.171e+02 2.633e+02 6.586e+02, threshold=4.343e+02, percent-clipped=4.0 2023-03-09 12:23:38,447 INFO [train2.py:809] (2/4) Epoch 29, batch 450, loss[ctc_loss=0.05488, att_loss=0.2238, loss=0.19, over 17009.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008501, over 51.00 utterances.], tot_loss[ctc_loss=0.06368, att_loss=0.2307, loss=0.1973, over 2939768.62 frames. utt_duration=1272 frames, utt_pad_proportion=0.04758, over 9257.00 utterances.], batch size: 51, lr: 3.72e-03, grad_scale: 4.0 2023-03-09 12:23:43,575 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4530, 2.4156, 2.1923, 2.4352, 2.7921, 2.4326, 2.3163, 2.6820], device='cuda:2'), covar=tensor([0.1834, 0.2178, 0.2277, 0.1267, 0.1509, 0.1409, 0.1994, 0.1624], device='cuda:2'), in_proj_covar=tensor([0.0150, 0.0149, 0.0146, 0.0140, 0.0158, 0.0135, 0.0158, 0.0135], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 12:25:02,538 INFO [train2.py:809] (2/4) Epoch 29, batch 500, loss[ctc_loss=0.06987, att_loss=0.2297, loss=0.1978, over 15996.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007393, over 40.00 utterances.], tot_loss[ctc_loss=0.06354, att_loss=0.2304, loss=0.197, over 3011572.72 frames. utt_duration=1264 frames, utt_pad_proportion=0.05015, over 9539.84 utterances.], batch size: 40, lr: 3.72e-03, grad_scale: 4.0 2023-03-09 12:25:22,139 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.395e+02 1.941e+02 2.244e+02 2.736e+02 5.323e+02, threshold=4.488e+02, percent-clipped=4.0 2023-03-09 12:25:37,183 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=112066.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:26:06,144 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. 
limit=2.0 2023-03-09 12:26:06,996 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5350, 2.8616, 3.3924, 4.5310, 3.9810, 3.9825, 2.9722, 2.5121], device='cuda:2'), covar=tensor([0.0723, 0.2016, 0.0912, 0.0493, 0.0903, 0.0494, 0.1498, 0.2036], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0223, 0.0189, 0.0233, 0.0239, 0.0196, 0.0209, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-09 12:26:22,018 INFO [train2.py:809] (2/4) Epoch 29, batch 550, loss[ctc_loss=0.06626, att_loss=0.2298, loss=0.1971, over 16177.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006914, over 41.00 utterances.], tot_loss[ctc_loss=0.06408, att_loss=0.2304, loss=0.1971, over 3061759.82 frames. utt_duration=1261 frames, utt_pad_proportion=0.0534, over 9724.99 utterances.], batch size: 41, lr: 3.72e-03, grad_scale: 4.0 2023-03-09 12:27:10,027 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=112124.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:27:11,546 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7627, 5.2204, 5.0266, 5.1321, 5.3247, 4.9528, 3.7431, 5.2399], device='cuda:2'), covar=tensor([0.0121, 0.0097, 0.0114, 0.0085, 0.0082, 0.0097, 0.0605, 0.0146], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0093, 0.0118, 0.0074, 0.0080, 0.0091, 0.0105, 0.0111], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:27:14,814 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=112127.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:27:42,786 INFO [train2.py:809] (2/4) Epoch 29, batch 600, loss[ctc_loss=0.08518, att_loss=0.2446, loss=0.2127, over 16815.00 frames. utt_duration=680.8 frames, utt_pad_proportion=0.1437, over 99.00 utterances.], tot_loss[ctc_loss=0.06426, att_loss=0.2306, loss=0.1973, over 3107252.77 frames. utt_duration=1216 frames, utt_pad_proportion=0.06308, over 10229.76 utterances.], batch size: 99, lr: 3.72e-03, grad_scale: 4.0 2023-03-09 12:27:45,923 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=112147.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 12:28:01,253 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.227e+02 2.016e+02 2.437e+02 2.973e+02 9.680e+02, threshold=4.873e+02, percent-clipped=6.0 2023-03-09 12:28:26,112 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=112172.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:29:03,067 INFO [train2.py:809] (2/4) Epoch 29, batch 650, loss[ctc_loss=0.06035, att_loss=0.227, loss=0.1937, over 16544.00 frames. utt_duration=1472 frames, utt_pad_proportion=0.006152, over 45.00 utterances.], tot_loss[ctc_loss=0.0642, att_loss=0.2307, loss=0.1974, over 3143666.63 frames. 
utt_duration=1224 frames, utt_pad_proportion=0.06136, over 10284.89 utterances.], batch size: 45, lr: 3.72e-03, grad_scale: 4.0 2023-03-09 12:29:11,371 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5262, 3.1868, 3.6426, 3.1852, 3.5999, 4.6324, 4.4946, 3.3865], device='cuda:2'), covar=tensor([0.0394, 0.1614, 0.1404, 0.1247, 0.1134, 0.0862, 0.0602, 0.1172], device='cuda:2'), in_proj_covar=tensor([0.0253, 0.0252, 0.0297, 0.0221, 0.0276, 0.0385, 0.0275, 0.0239], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:30:22,505 INFO [train2.py:809] (2/4) Epoch 29, batch 700, loss[ctc_loss=0.07383, att_loss=0.2126, loss=0.1848, over 15513.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.007907, over 36.00 utterances.], tot_loss[ctc_loss=0.06394, att_loss=0.2301, loss=0.1969, over 3171408.94 frames. utt_duration=1249 frames, utt_pad_proportion=0.05584, over 10168.84 utterances.], batch size: 36, lr: 3.71e-03, grad_scale: 4.0 2023-03-09 12:30:41,002 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.343e+02 1.769e+02 2.156e+02 2.659e+02 7.287e+02, threshold=4.312e+02, percent-clipped=5.0 2023-03-09 12:30:58,609 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-09 12:31:07,385 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1259, 5.0398, 4.8399, 2.9604, 4.8696, 4.6573, 4.3570, 2.6934], device='cuda:2'), covar=tensor([0.0108, 0.0107, 0.0317, 0.0991, 0.0099, 0.0220, 0.0307, 0.1360], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0107, 0.0112, 0.0113, 0.0090, 0.0119, 0.0101, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:31:21,076 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4715, 4.8809, 4.7454, 4.7562, 4.9094, 4.6406, 3.2451, 4.8103], device='cuda:2'), covar=tensor([0.0144, 0.0142, 0.0162, 0.0129, 0.0116, 0.0151, 0.0867, 0.0222], device='cuda:2'), in_proj_covar=tensor([0.0098, 0.0093, 0.0119, 0.0074, 0.0080, 0.0092, 0.0106, 0.0111], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:31:41,968 INFO [train2.py:809] (2/4) Epoch 29, batch 750, loss[ctc_loss=0.06189, att_loss=0.2426, loss=0.2065, over 17307.00 frames. utt_duration=1101 frames, utt_pad_proportion=0.03546, over 63.00 utterances.], tot_loss[ctc_loss=0.0639, att_loss=0.2308, loss=0.1974, over 3199503.30 frames. utt_duration=1246 frames, utt_pad_proportion=0.05422, over 10279.67 utterances.], batch size: 63, lr: 3.71e-03, grad_scale: 4.0 2023-03-09 12:33:02,100 INFO [train2.py:809] (2/4) Epoch 29, batch 800, loss[ctc_loss=0.07176, att_loss=0.2381, loss=0.2048, over 17007.00 frames. utt_duration=1336 frames, utt_pad_proportion=0.008472, over 51.00 utterances.], tot_loss[ctc_loss=0.06429, att_loss=0.2318, loss=0.1983, over 3219174.96 frames. 
utt_duration=1232 frames, utt_pad_proportion=0.05613, over 10468.28 utterances.], batch size: 51, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:33:21,112 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.436e+02 1.905e+02 2.158e+02 2.747e+02 4.535e+02, threshold=4.315e+02, percent-clipped=2.0 2023-03-09 12:34:13,334 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=112389.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:34:22,115 INFO [train2.py:809] (2/4) Epoch 29, batch 850, loss[ctc_loss=0.06232, att_loss=0.2149, loss=0.1844, over 15945.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.006969, over 41.00 utterances.], tot_loss[ctc_loss=0.06398, att_loss=0.2315, loss=0.198, over 3236315.13 frames. utt_duration=1251 frames, utt_pad_proportion=0.04952, over 10363.75 utterances.], batch size: 41, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:35:06,133 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=112422.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:35:29,396 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.4953, 2.1981, 2.3105, 2.2866, 2.5692, 2.4415, 2.0695, 2.4500], device='cuda:2'), covar=tensor([0.1968, 0.2420, 0.1895, 0.1281, 0.2195, 0.1396, 0.1785, 0.1467], device='cuda:2'), in_proj_covar=tensor([0.0150, 0.0149, 0.0147, 0.0140, 0.0158, 0.0136, 0.0158, 0.0136], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 12:35:42,812 INFO [train2.py:809] (2/4) Epoch 29, batch 900, loss[ctc_loss=0.05939, att_loss=0.2334, loss=0.1986, over 16469.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.007046, over 46.00 utterances.], tot_loss[ctc_loss=0.06415, att_loss=0.2317, loss=0.1982, over 3254159.37 frames. utt_duration=1257 frames, utt_pad_proportion=0.04557, over 10364.78 utterances.], batch size: 46, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:35:46,215 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=112447.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 12:35:51,271 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=112450.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:36:02,470 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.247e+02 1.980e+02 2.255e+02 2.971e+02 7.744e+02, threshold=4.510e+02, percent-clipped=6.0 2023-03-09 12:36:16,730 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2023-03-09 12:36:34,572 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2390, 3.8875, 3.9043, 3.4075, 3.9051, 4.0283, 3.9744, 2.9973], device='cuda:2'), covar=tensor([0.0959, 0.1043, 0.1391, 0.2459, 0.0802, 0.1379, 0.0627, 0.2554], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0210, 0.0226, 0.0276, 0.0187, 0.0287, 0.0210, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:37:03,515 INFO [train2.py:809] (2/4) Epoch 29, batch 950, loss[ctc_loss=0.08387, att_loss=0.2278, loss=0.199, over 15888.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.009222, over 39.00 utterances.], tot_loss[ctc_loss=0.06403, att_loss=0.2309, loss=0.1976, over 3251288.55 frames. 
utt_duration=1260 frames, utt_pad_proportion=0.04853, over 10332.31 utterances.], batch size: 39, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:37:03,629 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=112495.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 12:38:17,828 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2023-03-09 12:38:19,291 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5856, 3.2560, 3.5392, 4.5026, 4.0104, 4.0151, 3.0875, 2.6692], device='cuda:2'), covar=tensor([0.0699, 0.1675, 0.0815, 0.0598, 0.0943, 0.0503, 0.1409, 0.1941], device='cuda:2'), in_proj_covar=tensor([0.0190, 0.0219, 0.0186, 0.0228, 0.0235, 0.0193, 0.0206, 0.0193], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:38:23,728 INFO [train2.py:809] (2/4) Epoch 29, batch 1000, loss[ctc_loss=0.06178, att_loss=0.2144, loss=0.1839, over 15854.00 frames. utt_duration=1628 frames, utt_pad_proportion=0.01114, over 39.00 utterances.], tot_loss[ctc_loss=0.06368, att_loss=0.2305, loss=0.1971, over 3255594.99 frames. utt_duration=1260 frames, utt_pad_proportion=0.04962, over 10350.90 utterances.], batch size: 39, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:38:42,357 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.260e+02 1.873e+02 2.288e+02 2.757e+02 5.242e+02, threshold=4.577e+02, percent-clipped=3.0 2023-03-09 12:38:51,141 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=112562.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:39:14,919 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=112577.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:39:42,798 INFO [train2.py:809] (2/4) Epoch 29, batch 1050, loss[ctc_loss=0.09641, att_loss=0.2364, loss=0.2084, over 15967.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005693, over 41.00 utterances.], tot_loss[ctc_loss=0.06412, att_loss=0.2311, loss=0.1977, over 3264032.03 frames. utt_duration=1247 frames, utt_pad_proportion=0.05179, over 10480.60 utterances.], batch size: 41, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:40:19,254 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=112617.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:40:28,637 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=112623.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:40:52,252 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=112638.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 12:41:03,004 INFO [train2.py:809] (2/4) Epoch 29, batch 1100, loss[ctc_loss=0.07221, att_loss=0.2496, loss=0.2142, over 17351.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.02162, over 59.00 utterances.], tot_loss[ctc_loss=0.06405, att_loss=0.231, loss=0.1976, over 3269759.19 frames. 
utt_duration=1277 frames, utt_pad_proportion=0.04442, over 10257.11 utterances.], batch size: 59, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:41:22,207 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.269e+02 1.863e+02 2.140e+02 2.769e+02 5.111e+02, threshold=4.279e+02, percent-clipped=1.0 2023-03-09 12:41:43,281 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1647, 5.0888, 4.8582, 3.3631, 4.8481, 4.7033, 4.3935, 2.9871], device='cuda:2'), covar=tensor([0.0107, 0.0112, 0.0302, 0.0848, 0.0117, 0.0225, 0.0309, 0.1265], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0107, 0.0112, 0.0113, 0.0091, 0.0119, 0.0102, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 12:41:55,440 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=112678.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 12:42:21,110 INFO [train2.py:809] (2/4) Epoch 29, batch 1150, loss[ctc_loss=0.08302, att_loss=0.2491, loss=0.2158, over 17034.00 frames. utt_duration=1312 frames, utt_pad_proportion=0.009151, over 52.00 utterances.], tot_loss[ctc_loss=0.0638, att_loss=0.2309, loss=0.1975, over 3275418.01 frames. utt_duration=1277 frames, utt_pad_proportion=0.04355, over 10275.10 utterances.], batch size: 52, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:42:21,381 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0189, 5.3629, 4.9627, 5.3767, 4.7618, 5.0312, 5.5008, 5.2631], device='cuda:2'), covar=tensor([0.0601, 0.0286, 0.0750, 0.0351, 0.0404, 0.0264, 0.0183, 0.0204], device='cuda:2'), in_proj_covar=tensor([0.0408, 0.0343, 0.0383, 0.0386, 0.0341, 0.0247, 0.0324, 0.0305], device='cuda:2'), out_proj_covar=tensor([0.0007, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 12:43:01,944 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9923, 4.1558, 3.9814, 4.3648, 2.7443, 4.2886, 2.5338, 1.8029], device='cuda:2'), covar=tensor([0.0512, 0.0320, 0.0850, 0.0279, 0.1570, 0.0293, 0.1587, 0.1712], device='cuda:2'), in_proj_covar=tensor([0.0223, 0.0192, 0.0269, 0.0183, 0.0226, 0.0174, 0.0234, 0.0207], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:43:04,798 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=112722.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:43:40,474 INFO [train2.py:809] (2/4) Epoch 29, batch 1200, loss[ctc_loss=0.07343, att_loss=0.2461, loss=0.2115, over 17063.00 frames. utt_duration=1314 frames, utt_pad_proportion=0.007422, over 52.00 utterances.], tot_loss[ctc_loss=0.0639, att_loss=0.2306, loss=0.1973, over 3269139.38 frames. utt_duration=1289 frames, utt_pad_proportion=0.04247, over 10157.58 utterances.], batch size: 52, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:43:40,702 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=112745.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:43:59,908 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.786e+02 2.116e+02 2.747e+02 6.920e+02, threshold=4.232e+02, percent-clipped=7.0 2023-03-09 12:44:21,247 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=112770.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:45:00,865 INFO [train2.py:809] (2/4) Epoch 29, batch 1250, loss[ctc_loss=0.07342, att_loss=0.2403, loss=0.2069, over 16971.00 frames. 
utt_duration=1359 frames, utt_pad_proportion=0.006608, over 50.00 utterances.], tot_loss[ctc_loss=0.06424, att_loss=0.231, loss=0.1977, over 3276293.32 frames. utt_duration=1276 frames, utt_pad_proportion=0.04441, over 10281.06 utterances.], batch size: 50, lr: 3.71e-03, grad_scale: 8.0 2023-03-09 12:46:20,450 INFO [train2.py:809] (2/4) Epoch 29, batch 1300, loss[ctc_loss=0.07069, att_loss=0.2505, loss=0.2145, over 17070.00 frames. utt_duration=1315 frames, utt_pad_proportion=0.007896, over 52.00 utterances.], tot_loss[ctc_loss=0.06438, att_loss=0.2309, loss=0.1976, over 3280060.03 frames. utt_duration=1266 frames, utt_pad_proportion=0.04663, over 10379.09 utterances.], batch size: 52, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:46:40,102 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.906e+02 2.330e+02 2.870e+02 1.155e+03, threshold=4.661e+02, percent-clipped=5.0 2023-03-09 12:47:40,653 INFO [train2.py:809] (2/4) Epoch 29, batch 1350, loss[ctc_loss=0.05418, att_loss=0.2079, loss=0.1771, over 15508.00 frames. utt_duration=1725 frames, utt_pad_proportion=0.008226, over 36.00 utterances.], tot_loss[ctc_loss=0.06461, att_loss=0.2317, loss=0.1983, over 3280555.29 frames. utt_duration=1221 frames, utt_pad_proportion=0.05831, over 10764.57 utterances.], batch size: 36, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:48:17,511 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=112918.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:48:22,230 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=112921.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:48:30,065 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7108, 5.0357, 4.5976, 5.0445, 4.4952, 4.7530, 5.1383, 4.9383], device='cuda:2'), covar=tensor([0.0661, 0.0347, 0.0819, 0.0436, 0.0441, 0.0321, 0.0233, 0.0213], device='cuda:2'), in_proj_covar=tensor([0.0409, 0.0343, 0.0383, 0.0386, 0.0341, 0.0247, 0.0324, 0.0305], device='cuda:2'), out_proj_covar=tensor([0.0007, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 12:48:41,466 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=112933.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 12:49:00,127 INFO [train2.py:809] (2/4) Epoch 29, batch 1400, loss[ctc_loss=0.0481, att_loss=0.208, loss=0.1761, over 15768.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008604, over 38.00 utterances.], tot_loss[ctc_loss=0.06403, att_loss=0.2312, loss=0.1978, over 3280591.10 frames. 
utt_duration=1236 frames, utt_pad_proportion=0.05482, over 10628.05 utterances.], batch size: 38, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:49:19,016 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.843e+02 2.238e+02 2.535e+02 6.230e+02, threshold=4.475e+02, percent-clipped=2.0 2023-03-09 12:49:29,284 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5627, 2.8692, 3.6670, 4.6230, 4.0433, 4.0919, 3.0891, 2.7834], device='cuda:2'), covar=tensor([0.0684, 0.1915, 0.0734, 0.0433, 0.0870, 0.0460, 0.1369, 0.1732], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0220, 0.0185, 0.0227, 0.0235, 0.0192, 0.0205, 0.0191], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:49:41,520 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.2220, 5.5035, 5.7424, 5.5577, 5.7335, 6.1815, 5.3600, 6.2113], device='cuda:2'), covar=tensor([0.0618, 0.0667, 0.0733, 0.1252, 0.1518, 0.0749, 0.0707, 0.0648], device='cuda:2'), in_proj_covar=tensor([0.0912, 0.0528, 0.0638, 0.0680, 0.0905, 0.0663, 0.0509, 0.0636], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:49:44,607 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=112973.0, num_to_drop=1, layers_to_drop={3} 2023-03-09 12:49:59,375 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=112982.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:50:19,499 INFO [train2.py:809] (2/4) Epoch 29, batch 1450, loss[ctc_loss=0.05093, att_loss=0.2162, loss=0.1832, over 16122.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006052, over 42.00 utterances.], tot_loss[ctc_loss=0.06397, att_loss=0.2305, loss=0.1972, over 3274555.85 frames. utt_duration=1236 frames, utt_pad_proportion=0.05799, over 10613.06 utterances.], batch size: 42, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:51:17,172 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4403, 2.7984, 3.3006, 4.3612, 3.8585, 3.9235, 2.9323, 2.3876], device='cuda:2'), covar=tensor([0.0720, 0.1810, 0.0830, 0.0596, 0.0962, 0.0489, 0.1489, 0.2014], device='cuda:2'), in_proj_covar=tensor([0.0188, 0.0219, 0.0184, 0.0226, 0.0233, 0.0191, 0.0204, 0.0190], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:51:39,659 INFO [train2.py:809] (2/4) Epoch 29, batch 1500, loss[ctc_loss=0.05871, att_loss=0.2399, loss=0.2036, over 17307.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02414, over 59.00 utterances.], tot_loss[ctc_loss=0.06489, att_loss=0.2311, loss=0.1979, over 3272408.68 frames. utt_duration=1220 frames, utt_pad_proportion=0.06261, over 10744.60 utterances.], batch size: 59, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:51:40,001 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=113045.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:51:58,818 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.764e+02 2.089e+02 2.547e+02 5.820e+02, threshold=4.178e+02, percent-clipped=2.0 2023-03-09 12:52:55,237 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=113093.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:52:58,847 INFO [train2.py:809] (2/4) Epoch 29, batch 1550, loss[ctc_loss=0.04123, att_loss=0.2113, loss=0.1773, over 16021.00 frames. 
utt_duration=1603 frames, utt_pad_proportion=0.007136, over 40.00 utterances.], tot_loss[ctc_loss=0.06468, att_loss=0.231, loss=0.1977, over 3272140.80 frames. utt_duration=1230 frames, utt_pad_proportion=0.06072, over 10657.59 utterances.], batch size: 40, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:53:16,020 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2023-03-09 12:53:30,838 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7108, 4.7083, 4.8248, 4.7281, 5.2468, 4.6150, 4.6905, 2.5743], device='cuda:2'), covar=tensor([0.0191, 0.0244, 0.0212, 0.0209, 0.0499, 0.0200, 0.0225, 0.1648], device='cuda:2'), in_proj_covar=tensor([0.0201, 0.0229, 0.0225, 0.0241, 0.0387, 0.0198, 0.0216, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:53:44,286 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.81 vs. limit=5.0 2023-03-09 12:54:19,351 INFO [train2.py:809] (2/4) Epoch 29, batch 1600, loss[ctc_loss=0.07092, att_loss=0.2452, loss=0.2103, over 17129.00 frames. utt_duration=1225 frames, utt_pad_proportion=0.01458, over 56.00 utterances.], tot_loss[ctc_loss=0.06454, att_loss=0.2315, loss=0.1981, over 3270421.89 frames. utt_duration=1227 frames, utt_pad_proportion=0.06157, over 10673.97 utterances.], batch size: 56, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:54:37,609 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4653, 4.8840, 4.7521, 4.7620, 4.9087, 4.5653, 3.4750, 4.8313], device='cuda:2'), covar=tensor([0.0150, 0.0130, 0.0160, 0.0102, 0.0137, 0.0147, 0.0706, 0.0256], device='cuda:2'), in_proj_covar=tensor([0.0098, 0.0093, 0.0118, 0.0074, 0.0081, 0.0091, 0.0107, 0.0112], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:54:38,749 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.261e+02 1.848e+02 2.168e+02 2.655e+02 4.318e+02, threshold=4.336e+02, percent-clipped=2.0 2023-03-09 12:55:22,789 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4172, 4.8880, 4.7221, 4.7978, 4.8875, 4.4978, 3.5994, 4.8129], device='cuda:2'), covar=tensor([0.0151, 0.0109, 0.0160, 0.0092, 0.0105, 0.0134, 0.0645, 0.0200], device='cuda:2'), in_proj_covar=tensor([0.0098, 0.0093, 0.0118, 0.0074, 0.0081, 0.0091, 0.0107, 0.0112], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 12:55:38,882 INFO [train2.py:809] (2/4) Epoch 29, batch 1650, loss[ctc_loss=0.04615, att_loss=0.2079, loss=0.1756, over 15890.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.008308, over 39.00 utterances.], tot_loss[ctc_loss=0.0651, att_loss=0.2319, loss=0.1985, over 3274794.98 frames. 
utt_duration=1227 frames, utt_pad_proportion=0.06008, over 10684.89 utterances.], batch size: 39, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:55:40,862 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4869, 4.6161, 4.6907, 4.6071, 5.2958, 4.4753, 4.5950, 2.7042], device='cuda:2'), covar=tensor([0.0355, 0.0405, 0.0381, 0.0419, 0.0652, 0.0321, 0.0371, 0.1659], device='cuda:2'), in_proj_covar=tensor([0.0202, 0.0230, 0.0226, 0.0241, 0.0389, 0.0198, 0.0217, 0.0224], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:55:53,048 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2023-03-09 12:56:06,438 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-03-09 12:56:14,947 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=113218.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:56:38,806 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=113233.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:56:40,394 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=113234.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:56:58,264 INFO [train2.py:809] (2/4) Epoch 29, batch 1700, loss[ctc_loss=0.05993, att_loss=0.2318, loss=0.1975, over 16885.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006685, over 49.00 utterances.], tot_loss[ctc_loss=0.06381, att_loss=0.2306, loss=0.1972, over 3269701.82 frames. utt_duration=1228 frames, utt_pad_proportion=0.06028, over 10665.62 utterances.], batch size: 49, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:57:17,570 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.337e+02 1.796e+02 2.106e+02 2.556e+02 4.663e+02, threshold=4.212e+02, percent-clipped=1.0 2023-03-09 12:57:18,130 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0094, 4.1238, 4.0024, 3.9506, 4.4718, 3.9607, 3.9232, 2.5213], device='cuda:2'), covar=tensor([0.0437, 0.0507, 0.0527, 0.0488, 0.0790, 0.0397, 0.0459, 0.1632], device='cuda:2'), in_proj_covar=tensor([0.0201, 0.0228, 0.0225, 0.0239, 0.0387, 0.0197, 0.0216, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 12:57:31,630 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=113266.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:57:43,284 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=113273.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 12:57:46,855 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.23 vs. limit=5.0 2023-03-09 12:57:49,295 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=113277.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:57:55,546 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=113281.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:58:18,081 INFO [train2.py:809] (2/4) Epoch 29, batch 1750, loss[ctc_loss=0.05279, att_loss=0.2133, loss=0.1812, over 16012.00 frames. utt_duration=1603 frames, utt_pad_proportion=0.006908, over 40.00 utterances.], tot_loss[ctc_loss=0.0633, att_loss=0.2297, loss=0.1965, over 3273586.81 frames. 
utt_duration=1252 frames, utt_pad_proportion=0.053, over 10468.40 utterances.], batch size: 40, lr: 3.70e-03, grad_scale: 8.0 2023-03-09 12:58:18,460 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=113295.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:58:37,253 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9084, 5.1818, 4.7397, 5.2221, 4.6184, 4.9316, 5.2944, 5.0809], device='cuda:2'), covar=tensor([0.0529, 0.0270, 0.0754, 0.0345, 0.0381, 0.0270, 0.0224, 0.0183], device='cuda:2'), in_proj_covar=tensor([0.0407, 0.0343, 0.0383, 0.0386, 0.0341, 0.0248, 0.0325, 0.0306], device='cuda:2'), out_proj_covar=tensor([0.0007, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 12:58:59,082 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=113321.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 12:59:06,400 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.53 vs. limit=5.0 2023-03-09 12:59:37,377 INFO [train2.py:809] (2/4) Epoch 29, batch 1800, loss[ctc_loss=0.05018, att_loss=0.2162, loss=0.183, over 16164.00 frames. utt_duration=1578 frames, utt_pad_proportion=0.008048, over 41.00 utterances.], tot_loss[ctc_loss=0.06368, att_loss=0.2304, loss=0.1971, over 3273357.78 frames. utt_duration=1243 frames, utt_pad_proportion=0.05487, over 10548.46 utterances.], batch size: 41, lr: 3.70e-03, grad_scale: 4.0 2023-03-09 12:59:39,122 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. limit=2.0 2023-03-09 12:59:57,720 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.881e+02 2.263e+02 2.651e+02 5.025e+02, threshold=4.526e+02, percent-clipped=2.0 2023-03-09 13:00:07,796 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.78 vs. limit=2.0 2023-03-09 13:00:18,141 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0078, 5.2418, 5.2179, 5.2560, 5.2766, 5.2549, 4.9048, 4.7468], device='cuda:2'), covar=tensor([0.1030, 0.0590, 0.0295, 0.0465, 0.0302, 0.0319, 0.0465, 0.0337], device='cuda:2'), in_proj_covar=tensor([0.0532, 0.0378, 0.0373, 0.0379, 0.0442, 0.0445, 0.0376, 0.0414], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 13:00:37,689 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9996, 3.7331, 3.6851, 3.1666, 3.6692, 3.8566, 3.7192, 2.6637], device='cuda:2'), covar=tensor([0.1020, 0.1029, 0.1624, 0.2873, 0.1367, 0.1553, 0.0860, 0.3219], device='cuda:2'), in_proj_covar=tensor([0.0208, 0.0210, 0.0227, 0.0277, 0.0186, 0.0288, 0.0209, 0.0232], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 13:00:55,299 INFO [train2.py:809] (2/4) Epoch 29, batch 1850, loss[ctc_loss=0.06966, att_loss=0.2499, loss=0.2139, over 17294.00 frames. utt_duration=1259 frames, utt_pad_proportion=0.01153, over 55.00 utterances.], tot_loss[ctc_loss=0.06414, att_loss=0.2307, loss=0.1974, over 3279811.75 frames. 
utt_duration=1255 frames, utt_pad_proportion=0.04954, over 10462.84 utterances.], batch size: 55, lr: 3.70e-03, grad_scale: 4.0 2023-03-09 13:02:00,031 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0104, 5.3199, 4.9589, 5.3660, 4.7053, 5.0142, 5.4686, 5.2464], device='cuda:2'), covar=tensor([0.0577, 0.0290, 0.0732, 0.0333, 0.0422, 0.0263, 0.0218, 0.0192], device='cuda:2'), in_proj_covar=tensor([0.0404, 0.0341, 0.0381, 0.0383, 0.0339, 0.0245, 0.0323, 0.0304], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 13:02:15,433 INFO [train2.py:809] (2/4) Epoch 29, batch 1900, loss[ctc_loss=0.05672, att_loss=0.215, loss=0.1834, over 14455.00 frames. utt_duration=1808 frames, utt_pad_proportion=0.03866, over 32.00 utterances.], tot_loss[ctc_loss=0.0642, att_loss=0.231, loss=0.1976, over 3281938.70 frames. utt_duration=1253 frames, utt_pad_proportion=0.05076, over 10489.65 utterances.], batch size: 32, lr: 3.70e-03, grad_scale: 4.0 2023-03-09 13:02:35,441 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 1.919e+02 2.347e+02 2.814e+02 3.658e+03, threshold=4.695e+02, percent-clipped=4.0 2023-03-09 13:03:34,130 INFO [train2.py:809] (2/4) Epoch 29, batch 1950, loss[ctc_loss=0.05003, att_loss=0.202, loss=0.1716, over 15355.00 frames. utt_duration=1756 frames, utt_pad_proportion=0.0122, over 35.00 utterances.], tot_loss[ctc_loss=0.06349, att_loss=0.2304, loss=0.1971, over 3282407.29 frames. utt_duration=1245 frames, utt_pad_proportion=0.05301, over 10559.15 utterances.], batch size: 35, lr: 3.69e-03, grad_scale: 4.0 2023-03-09 13:04:41,116 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6199, 4.9052, 4.5333, 4.9234, 4.3594, 4.5332, 5.0048, 4.8451], device='cuda:2'), covar=tensor([0.0645, 0.0332, 0.0820, 0.0409, 0.0449, 0.0438, 0.0271, 0.0212], device='cuda:2'), in_proj_covar=tensor([0.0403, 0.0340, 0.0379, 0.0382, 0.0338, 0.0244, 0.0322, 0.0303], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 13:04:45,102 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.06 vs. limit=5.0 2023-03-09 13:04:54,505 INFO [train2.py:809] (2/4) Epoch 29, batch 2000, loss[ctc_loss=0.06605, att_loss=0.2283, loss=0.1959, over 16414.00 frames. utt_duration=1494 frames, utt_pad_proportion=0.006891, over 44.00 utterances.], tot_loss[ctc_loss=0.06362, att_loss=0.2309, loss=0.1975, over 3279852.38 frames. 
utt_duration=1245 frames, utt_pad_proportion=0.05374, over 10548.58 utterances.], batch size: 44, lr: 3.69e-03, grad_scale: 8.0 2023-03-09 13:05:15,508 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.751e+02 2.072e+02 2.530e+02 5.087e+02, threshold=4.144e+02, percent-clipped=1.0 2023-03-09 13:05:45,792 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=113577.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:06:06,330 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=113590.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:06:06,465 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5474, 3.0448, 3.6693, 3.2067, 3.5385, 4.6033, 4.4266, 3.4414], device='cuda:2'), covar=tensor([0.0346, 0.1813, 0.1242, 0.1196, 0.1109, 0.0969, 0.0566, 0.1131], device='cuda:2'), in_proj_covar=tensor([0.0255, 0.0254, 0.0296, 0.0223, 0.0277, 0.0388, 0.0276, 0.0237], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 13:06:10,786 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.87 vs. limit=5.0 2023-03-09 13:06:14,829 INFO [train2.py:809] (2/4) Epoch 29, batch 2050, loss[ctc_loss=0.0799, att_loss=0.2512, loss=0.2169, over 17317.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02368, over 59.00 utterances.], tot_loss[ctc_loss=0.06345, att_loss=0.2307, loss=0.1973, over 3284665.21 frames. utt_duration=1253 frames, utt_pad_proportion=0.05075, over 10495.03 utterances.], batch size: 59, lr: 3.69e-03, grad_scale: 8.0 2023-03-09 13:06:49,203 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=113617.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:07:02,078 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=113625.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:07:23,419 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6669, 4.0710, 3.5563, 3.7614, 4.2964, 4.0545, 3.4787, 4.5857], device='cuda:2'), covar=tensor([0.0649, 0.0447, 0.0840, 0.0604, 0.0634, 0.0573, 0.0681, 0.0308], device='cuda:2'), in_proj_covar=tensor([0.0211, 0.0232, 0.0234, 0.0212, 0.0296, 0.0254, 0.0209, 0.0303], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:2') 2023-03-09 13:07:35,390 INFO [train2.py:809] (2/4) Epoch 29, batch 2100, loss[ctc_loss=0.05306, att_loss=0.2063, loss=0.1756, over 15375.00 frames. utt_duration=1759 frames, utt_pad_proportion=0.01083, over 35.00 utterances.], tot_loss[ctc_loss=0.06298, att_loss=0.2301, loss=0.1967, over 3278931.25 frames. utt_duration=1259 frames, utt_pad_proportion=0.05016, over 10426.98 utterances.], batch size: 35, lr: 3.69e-03, grad_scale: 8.0 2023-03-09 13:07:37,277 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=113646.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:07:55,038 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.850e+02 2.211e+02 2.646e+02 5.386e+02, threshold=4.422e+02, percent-clipped=4.0 2023-03-09 13:08:26,915 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=113678.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:08:55,292 INFO [train2.py:809] (2/4) Epoch 29, batch 2150, loss[ctc_loss=0.05046, att_loss=0.2219, loss=0.1876, over 16138.00 frames. 
utt_duration=1538 frames, utt_pad_proportion=0.00551, over 42.00 utterances.], tot_loss[ctc_loss=0.06352, att_loss=0.2304, loss=0.197, over 3276361.65 frames. utt_duration=1241 frames, utt_pad_proportion=0.05562, over 10576.65 utterances.], batch size: 42, lr: 3.69e-03, grad_scale: 8.0 2023-03-09 13:09:13,870 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=113707.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:10:13,174 INFO [train2.py:809] (2/4) Epoch 29, batch 2200, loss[ctc_loss=0.07902, att_loss=0.2388, loss=0.2068, over 17450.00 frames. utt_duration=1013 frames, utt_pad_proportion=0.04516, over 69.00 utterances.], tot_loss[ctc_loss=0.06455, att_loss=0.2312, loss=0.1979, over 3278020.56 frames. utt_duration=1240 frames, utt_pad_proportion=0.05571, over 10585.25 utterances.], batch size: 69, lr: 3.69e-03, grad_scale: 8.0 2023-03-09 13:10:30,453 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.07 vs. limit=5.0 2023-03-09 13:10:32,604 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.369e+02 1.987e+02 2.347e+02 2.821e+02 4.899e+02, threshold=4.694e+02, percent-clipped=4.0 2023-03-09 13:11:32,256 INFO [train2.py:809] (2/4) Epoch 29, batch 2250, loss[ctc_loss=0.05895, att_loss=0.24, loss=0.2038, over 16886.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.007062, over 49.00 utterances.], tot_loss[ctc_loss=0.06447, att_loss=0.231, loss=0.1977, over 3278684.42 frames. utt_duration=1245 frames, utt_pad_proportion=0.0548, over 10545.22 utterances.], batch size: 49, lr: 3.69e-03, grad_scale: 8.0 2023-03-09 13:12:01,337 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.79 vs. limit=2.0 2023-03-09 13:12:18,342 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-03-09 13:12:51,815 INFO [train2.py:809] (2/4) Epoch 29, batch 2300, loss[ctc_loss=0.04971, att_loss=0.2253, loss=0.1901, over 16695.00 frames. utt_duration=1453 frames, utt_pad_proportion=0.005386, over 46.00 utterances.], tot_loss[ctc_loss=0.06431, att_loss=0.232, loss=0.1984, over 3289750.50 frames. utt_duration=1228 frames, utt_pad_proportion=0.0557, over 10731.80 utterances.], batch size: 46, lr: 3.69e-03, grad_scale: 8.0 2023-03-09 13:13:12,816 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.846e+02 2.164e+02 2.611e+02 9.969e+02, threshold=4.329e+02, percent-clipped=2.0 2023-03-09 13:13:13,836 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.12 vs. limit=5.0 2023-03-09 13:14:03,906 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=113890.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:14:11,234 INFO [train2.py:809] (2/4) Epoch 29, batch 2350, loss[ctc_loss=0.08292, att_loss=0.244, loss=0.2118, over 17052.00 frames. utt_duration=1288 frames, utt_pad_proportion=0.009688, over 53.00 utterances.], tot_loss[ctc_loss=0.06362, att_loss=0.2319, loss=0.1982, over 3290085.28 frames. utt_duration=1239 frames, utt_pad_proportion=0.054, over 10634.15 utterances.], batch size: 53, lr: 3.69e-03, grad_scale: 8.0 2023-03-09 13:14:27,152 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.67 vs. 
limit=5.0 2023-03-09 13:14:32,504 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4768, 2.7953, 4.9347, 3.9786, 3.1312, 4.2176, 4.6714, 4.6462], device='cuda:2'), covar=tensor([0.0319, 0.1369, 0.0255, 0.0845, 0.1565, 0.0288, 0.0224, 0.0285], device='cuda:2'), in_proj_covar=tensor([0.0237, 0.0248, 0.0230, 0.0324, 0.0271, 0.0244, 0.0224, 0.0244], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 13:15:19,183 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=113938.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:15:29,841 INFO [train2.py:809] (2/4) Epoch 29, batch 2400, loss[ctc_loss=0.06903, att_loss=0.2291, loss=0.1971, over 16117.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006741, over 42.00 utterances.], tot_loss[ctc_loss=0.06402, att_loss=0.2321, loss=0.1985, over 3292140.68 frames. utt_duration=1229 frames, utt_pad_proportion=0.0548, over 10726.74 utterances.], batch size: 42, lr: 3.69e-03, grad_scale: 8.0 2023-03-09 13:15:50,422 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 1.796e+02 2.124e+02 2.708e+02 5.253e+02, threshold=4.249e+02, percent-clipped=4.0 2023-03-09 13:16:14,204 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=113973.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:16:49,308 INFO [train2.py:809] (2/4) Epoch 29, batch 2450, loss[ctc_loss=0.0469, att_loss=0.2301, loss=0.1935, over 16704.00 frames. utt_duration=1454 frames, utt_pad_proportion=0.004851, over 46.00 utterances.], tot_loss[ctc_loss=0.06399, att_loss=0.2314, loss=0.1979, over 3285067.31 frames. utt_duration=1236 frames, utt_pad_proportion=0.0553, over 10642.59 utterances.], batch size: 46, lr: 3.69e-03, grad_scale: 4.0 2023-03-09 13:17:04,656 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=114002.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:18:12,390 INFO [train2.py:809] (2/4) Epoch 29, batch 2500, loss[ctc_loss=0.0539, att_loss=0.205, loss=0.1748, over 15664.00 frames. utt_duration=1695 frames, utt_pad_proportion=0.007706, over 37.00 utterances.], tot_loss[ctc_loss=0.06447, att_loss=0.2314, loss=0.198, over 3271699.78 frames. utt_duration=1214 frames, utt_pad_proportion=0.06356, over 10796.69 utterances.], batch size: 37, lr: 3.69e-03, grad_scale: 4.0 2023-03-09 13:18:34,819 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.371e+02 1.923e+02 2.146e+02 2.615e+02 3.828e+02, threshold=4.291e+02, percent-clipped=0.0 2023-03-09 13:19:32,906 INFO [train2.py:809] (2/4) Epoch 29, batch 2550, loss[ctc_loss=0.06415, att_loss=0.2502, loss=0.213, over 17039.00 frames. utt_duration=1338 frames, utt_pad_proportion=0.006696, over 51.00 utterances.], tot_loss[ctc_loss=0.06424, att_loss=0.2318, loss=0.1983, over 3277691.32 frames. utt_duration=1236 frames, utt_pad_proportion=0.05599, over 10622.74 utterances.], batch size: 51, lr: 3.68e-03, grad_scale: 4.0 2023-03-09 13:20:23,967 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=114127.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:20:52,674 INFO [train2.py:809] (2/4) Epoch 29, batch 2600, loss[ctc_loss=0.05754, att_loss=0.2108, loss=0.1801, over 15517.00 frames. utt_duration=1726 frames, utt_pad_proportion=0.007144, over 36.00 utterances.], tot_loss[ctc_loss=0.06456, att_loss=0.2318, loss=0.1983, over 3266316.65 frames. 
utt_duration=1191 frames, utt_pad_proportion=0.06966, over 10988.32 utterances.], batch size: 36, lr: 3.68e-03, grad_scale: 4.0 2023-03-09 13:21:15,099 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.450e+01 1.839e+02 2.178e+02 2.659e+02 5.439e+02, threshold=4.356e+02, percent-clipped=3.0 2023-03-09 13:21:55,349 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8619, 5.2129, 4.7699, 5.1987, 4.6519, 4.8562, 5.3069, 5.0575], device='cuda:2'), covar=tensor([0.0618, 0.0312, 0.0823, 0.0407, 0.0423, 0.0307, 0.0282, 0.0221], device='cuda:2'), in_proj_covar=tensor([0.0405, 0.0343, 0.0383, 0.0385, 0.0338, 0.0245, 0.0323, 0.0305], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 13:22:01,469 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=114188.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:22:11,799 INFO [train2.py:809] (2/4) Epoch 29, batch 2650, loss[ctc_loss=0.08966, att_loss=0.2405, loss=0.2103, over 16883.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006788, over 49.00 utterances.], tot_loss[ctc_loss=0.0642, att_loss=0.2312, loss=0.1978, over 3267277.76 frames. utt_duration=1213 frames, utt_pad_proportion=0.06411, over 10785.72 utterances.], batch size: 49, lr: 3.68e-03, grad_scale: 4.0 2023-03-09 13:22:52,868 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2023-03-09 13:22:55,448 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4663, 3.1759, 3.4490, 4.4067, 4.0173, 3.9584, 3.0108, 2.4136], device='cuda:2'), covar=tensor([0.0718, 0.1708, 0.0816, 0.0576, 0.0893, 0.0503, 0.1409, 0.1974], device='cuda:2'), in_proj_covar=tensor([0.0189, 0.0220, 0.0186, 0.0229, 0.0236, 0.0194, 0.0206, 0.0190], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 13:23:31,705 INFO [train2.py:809] (2/4) Epoch 29, batch 2700, loss[ctc_loss=0.05368, att_loss=0.2194, loss=0.1863, over 16119.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.00596, over 42.00 utterances.], tot_loss[ctc_loss=0.06428, att_loss=0.2308, loss=0.1975, over 3267925.36 frames. utt_duration=1228 frames, utt_pad_proportion=0.06119, over 10660.23 utterances.], batch size: 42, lr: 3.68e-03, grad_scale: 4.0 2023-03-09 13:23:53,981 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.272e+02 1.816e+02 2.102e+02 2.530e+02 4.306e+02, threshold=4.205e+02, percent-clipped=0.0 2023-03-09 13:24:16,607 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=114273.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:24:51,547 INFO [train2.py:809] (2/4) Epoch 29, batch 2750, loss[ctc_loss=0.04119, att_loss=0.2309, loss=0.1929, over 16956.00 frames. utt_duration=1358 frames, utt_pad_proportion=0.008137, over 50.00 utterances.], tot_loss[ctc_loss=0.06454, att_loss=0.2309, loss=0.1977, over 3270326.89 frames. 
utt_duration=1219 frames, utt_pad_proportion=0.06304, over 10741.96 utterances.], batch size: 50, lr: 3.68e-03, grad_scale: 4.0 2023-03-09 13:25:02,997 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=114302.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:25:32,585 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=114321.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:26:11,316 INFO [train2.py:809] (2/4) Epoch 29, batch 2800, loss[ctc_loss=0.05433, att_loss=0.2082, loss=0.1774, over 15650.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008576, over 37.00 utterances.], tot_loss[ctc_loss=0.06366, att_loss=0.2306, loss=0.1972, over 3277332.95 frames. utt_duration=1241 frames, utt_pad_proportion=0.05565, over 10580.15 utterances.], batch size: 37, lr: 3.68e-03, grad_scale: 8.0 2023-03-09 13:26:19,150 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=114350.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:26:33,215 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.204e+02 1.822e+02 2.190e+02 2.901e+02 6.134e+02, threshold=4.379e+02, percent-clipped=9.0 2023-03-09 13:26:38,287 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7558, 5.0414, 4.9680, 4.9934, 5.0879, 4.7575, 3.4821, 5.0896], device='cuda:2'), covar=tensor([0.0107, 0.0119, 0.0127, 0.0086, 0.0106, 0.0150, 0.0722, 0.0158], device='cuda:2'), in_proj_covar=tensor([0.0098, 0.0094, 0.0119, 0.0074, 0.0081, 0.0092, 0.0107, 0.0114], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 13:27:30,405 INFO [train2.py:809] (2/4) Epoch 29, batch 2850, loss[ctc_loss=0.07148, att_loss=0.2487, loss=0.2132, over 17302.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02364, over 59.00 utterances.], tot_loss[ctc_loss=0.06423, att_loss=0.2314, loss=0.198, over 3281803.76 frames. utt_duration=1220 frames, utt_pad_proportion=0.0592, over 10768.96 utterances.], batch size: 59, lr: 3.68e-03, grad_scale: 8.0 2023-03-09 13:28:50,606 INFO [train2.py:809] (2/4) Epoch 29, batch 2900, loss[ctc_loss=0.0648, att_loss=0.2176, loss=0.187, over 15886.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.008729, over 39.00 utterances.], tot_loss[ctc_loss=0.06423, att_loss=0.2308, loss=0.1975, over 3265564.22 frames. utt_duration=1208 frames, utt_pad_proportion=0.0665, over 10822.47 utterances.], batch size: 39, lr: 3.68e-03, grad_scale: 8.0 2023-03-09 13:28:56,230 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9271, 4.8063, 4.9939, 4.9007, 5.3787, 4.8419, 4.8353, 2.7700], device='cuda:2'), covar=tensor([0.0190, 0.0232, 0.0193, 0.0231, 0.0563, 0.0181, 0.0209, 0.1590], device='cuda:2'), in_proj_covar=tensor([0.0203, 0.0231, 0.0226, 0.0242, 0.0390, 0.0200, 0.0219, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 13:29:12,892 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.369e+02 1.896e+02 2.215e+02 2.552e+02 6.599e+02, threshold=4.430e+02, percent-clipped=4.0 2023-03-09 13:29:52,212 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=114483.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:30:10,924 INFO [train2.py:809] (2/4) Epoch 29, batch 2950, loss[ctc_loss=0.08189, att_loss=0.2524, loss=0.2183, over 17059.00 frames. 
utt_duration=1289 frames, utt_pad_proportion=0.008679, over 53.00 utterances.], tot_loss[ctc_loss=0.06452, att_loss=0.2308, loss=0.1975, over 3257863.41 frames. utt_duration=1188 frames, utt_pad_proportion=0.07434, over 10985.26 utterances.], batch size: 53, lr: 3.68e-03, grad_scale: 8.0 2023-03-09 13:31:30,953 INFO [train2.py:809] (2/4) Epoch 29, batch 3000, loss[ctc_loss=0.06349, att_loss=0.2406, loss=0.2052, over 16461.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.007051, over 46.00 utterances.], tot_loss[ctc_loss=0.06418, att_loss=0.2307, loss=0.1974, over 3266047.78 frames. utt_duration=1203 frames, utt_pad_proportion=0.06848, over 10877.57 utterances.], batch size: 46, lr: 3.68e-03, grad_scale: 8.0 2023-03-09 13:31:30,954 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 13:31:44,955 INFO [train2.py:843] (2/4) Epoch 29, validation: ctc_loss=0.04115, att_loss=0.2348, loss=0.1961, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 13:31:44,955 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 13:32:06,766 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 1.877e+02 2.271e+02 2.617e+02 4.276e+02, threshold=4.543e+02, percent-clipped=0.0 2023-03-09 13:33:04,603 INFO [train2.py:809] (2/4) Epoch 29, batch 3050, loss[ctc_loss=0.05338, att_loss=0.219, loss=0.1859, over 15361.00 frames. utt_duration=1757 frames, utt_pad_proportion=0.01192, over 35.00 utterances.], tot_loss[ctc_loss=0.06388, att_loss=0.2303, loss=0.197, over 3255479.93 frames. utt_duration=1233 frames, utt_pad_proportion=0.06302, over 10574.08 utterances.], batch size: 35, lr: 3.68e-03, grad_scale: 8.0 2023-03-09 13:33:06,404 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1159, 5.4639, 5.6316, 5.4279, 5.6238, 6.0644, 5.3544, 6.1604], device='cuda:2'), covar=tensor([0.0702, 0.0726, 0.0874, 0.1440, 0.1752, 0.0860, 0.0723, 0.0618], device='cuda:2'), in_proj_covar=tensor([0.0912, 0.0530, 0.0641, 0.0676, 0.0903, 0.0662, 0.0509, 0.0635], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 13:34:24,746 INFO [train2.py:809] (2/4) Epoch 29, batch 3100, loss[ctc_loss=0.06226, att_loss=0.2036, loss=0.1754, over 15774.00 frames. utt_duration=1662 frames, utt_pad_proportion=0.008589, over 38.00 utterances.], tot_loss[ctc_loss=0.0631, att_loss=0.2302, loss=0.1968, over 3247457.64 frames. utt_duration=1217 frames, utt_pad_proportion=0.06938, over 10690.39 utterances.], batch size: 38, lr: 3.68e-03, grad_scale: 8.0 2023-03-09 13:34:46,588 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.249e+02 1.785e+02 2.252e+02 2.835e+02 6.156e+02, threshold=4.505e+02, percent-clipped=2.0 2023-03-09 13:35:44,837 INFO [train2.py:809] (2/4) Epoch 29, batch 3150, loss[ctc_loss=0.0679, att_loss=0.244, loss=0.2088, over 17329.00 frames. utt_duration=879.1 frames, utt_pad_proportion=0.07953, over 79.00 utterances.], tot_loss[ctc_loss=0.06292, att_loss=0.2304, loss=0.1969, over 3250221.99 frames. utt_duration=1212 frames, utt_pad_proportion=0.06945, over 10736.42 utterances.], batch size: 79, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:36:18,395 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=114716.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:36:28,379 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.31 vs. 
limit=2.0 2023-03-09 13:36:34,223 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7419, 2.3248, 2.4617, 3.2905, 3.1286, 3.3064, 2.6238, 2.2226], device='cuda:2'), covar=tensor([0.0779, 0.1906, 0.1128, 0.0702, 0.0848, 0.0488, 0.1294, 0.1770], device='cuda:2'), in_proj_covar=tensor([0.0192, 0.0222, 0.0189, 0.0233, 0.0239, 0.0196, 0.0209, 0.0194], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-09 13:37:04,182 INFO [train2.py:809] (2/4) Epoch 29, batch 3200, loss[ctc_loss=0.0658, att_loss=0.2424, loss=0.2071, over 17382.00 frames. utt_duration=1105 frames, utt_pad_proportion=0.03387, over 63.00 utterances.], tot_loss[ctc_loss=0.06318, att_loss=0.23, loss=0.1966, over 3253837.26 frames. utt_duration=1208 frames, utt_pad_proportion=0.06973, over 10789.12 utterances.], batch size: 63, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:37:27,013 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.042e+02 1.814e+02 2.180e+02 2.558e+02 5.727e+02, threshold=4.361e+02, percent-clipped=1.0 2023-03-09 13:37:43,899 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.02 vs. limit=5.0 2023-03-09 13:37:52,369 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3094, 2.5454, 3.5033, 2.7150, 3.4487, 4.5175, 4.4126, 2.7904], device='cuda:2'), covar=tensor([0.0520, 0.2434, 0.1318, 0.1839, 0.1078, 0.0848, 0.0559, 0.1865], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0251, 0.0294, 0.0220, 0.0274, 0.0384, 0.0275, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 13:37:54,471 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=114777.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:38:04,230 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=114783.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:38:07,290 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9416, 5.3297, 5.4825, 5.2430, 5.4603, 5.9642, 5.2940, 6.0375], device='cuda:2'), covar=tensor([0.0751, 0.0729, 0.0974, 0.1540, 0.1811, 0.0802, 0.0711, 0.0699], device='cuda:2'), in_proj_covar=tensor([0.0914, 0.0530, 0.0640, 0.0678, 0.0906, 0.0661, 0.0511, 0.0635], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 13:38:22,855 INFO [train2.py:809] (2/4) Epoch 29, batch 3250, loss[ctc_loss=0.06228, att_loss=0.2163, loss=0.1855, over 15657.00 frames. utt_duration=1694 frames, utt_pad_proportion=0.008007, over 37.00 utterances.], tot_loss[ctc_loss=0.06286, att_loss=0.23, loss=0.1965, over 3256710.77 frames. utt_duration=1222 frames, utt_pad_proportion=0.0665, over 10676.63 utterances.], batch size: 37, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:39:20,570 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=114831.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:39:42,519 INFO [train2.py:809] (2/4) Epoch 29, batch 3300, loss[ctc_loss=0.03655, att_loss=0.2085, loss=0.1741, over 15956.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.005806, over 41.00 utterances.], tot_loss[ctc_loss=0.06329, att_loss=0.2307, loss=0.1972, over 3266568.80 frames. 
utt_duration=1226 frames, utt_pad_proportion=0.06322, over 10672.52 utterances.], batch size: 41, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:40:05,654 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.268e+02 1.794e+02 2.073e+02 2.411e+02 5.454e+02, threshold=4.146e+02, percent-clipped=1.0 2023-03-09 13:40:22,998 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=114871.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:41:01,526 INFO [train2.py:809] (2/4) Epoch 29, batch 3350, loss[ctc_loss=0.07171, att_loss=0.2463, loss=0.2114, over 16768.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006436, over 48.00 utterances.], tot_loss[ctc_loss=0.06358, att_loss=0.2311, loss=0.1976, over 3275389.41 frames. utt_duration=1241 frames, utt_pad_proportion=0.05649, over 10571.44 utterances.], batch size: 48, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:42:00,796 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=114932.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:42:13,676 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9705, 4.1560, 4.0235, 3.9843, 4.4521, 4.0448, 3.9179, 2.5470], device='cuda:2'), covar=tensor([0.0441, 0.0515, 0.0491, 0.0539, 0.0639, 0.0371, 0.0535, 0.1638], device='cuda:2'), in_proj_covar=tensor([0.0201, 0.0228, 0.0224, 0.0241, 0.0384, 0.0197, 0.0217, 0.0223], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 13:42:20,957 INFO [train2.py:809] (2/4) Epoch 29, batch 3400, loss[ctc_loss=0.1056, att_loss=0.2622, loss=0.2309, over 14221.00 frames. utt_duration=391.1 frames, utt_pad_proportion=0.3198, over 146.00 utterances.], tot_loss[ctc_loss=0.06353, att_loss=0.2309, loss=0.1974, over 3272049.23 frames. utt_duration=1229 frames, utt_pad_proportion=0.05969, over 10660.00 utterances.], batch size: 146, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:42:44,863 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.258e+02 1.858e+02 2.229e+02 2.688e+02 7.167e+02, threshold=4.457e+02, percent-clipped=6.0 2023-03-09 13:43:27,104 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.78 vs. limit=5.0 2023-03-09 13:43:35,018 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2023-03-09 13:43:41,690 INFO [train2.py:809] (2/4) Epoch 29, batch 3450, loss[ctc_loss=0.04442, att_loss=0.2301, loss=0.1929, over 16315.00 frames. utt_duration=1451 frames, utt_pad_proportion=0.00722, over 45.00 utterances.], tot_loss[ctc_loss=0.06327, att_loss=0.2306, loss=0.1971, over 3273924.02 frames. utt_duration=1240 frames, utt_pad_proportion=0.05634, over 10577.48 utterances.], batch size: 45, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:43:43,100 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2023-03-09 13:45:02,582 INFO [train2.py:809] (2/4) Epoch 29, batch 3500, loss[ctc_loss=0.07134, att_loss=0.2492, loss=0.2136, over 17250.00 frames. utt_duration=1256 frames, utt_pad_proportion=0.0149, over 55.00 utterances.], tot_loss[ctc_loss=0.06327, att_loss=0.2307, loss=0.1972, over 3273150.50 frames. 
utt_duration=1235 frames, utt_pad_proportion=0.05829, over 10613.81 utterances.], batch size: 55, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:45:26,257 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.209e+02 1.733e+02 2.269e+02 2.703e+02 9.473e+02, threshold=4.539e+02, percent-clipped=4.0 2023-03-09 13:45:43,151 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2997, 3.1248, 3.3367, 4.4138, 3.9236, 3.9244, 2.8595, 2.2837], device='cuda:2'), covar=tensor([0.0873, 0.1773, 0.0872, 0.0566, 0.0863, 0.0520, 0.1724, 0.2253], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0223, 0.0189, 0.0233, 0.0240, 0.0196, 0.0210, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-09 13:45:46,135 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=115072.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:46:06,710 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.70 vs. limit=2.0 2023-03-09 13:46:09,451 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=115086.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:46:22,694 INFO [train2.py:809] (2/4) Epoch 29, batch 3550, loss[ctc_loss=0.05302, att_loss=0.2068, loss=0.1761, over 15517.00 frames. utt_duration=1726 frames, utt_pad_proportion=0.007683, over 36.00 utterances.], tot_loss[ctc_loss=0.06324, att_loss=0.2303, loss=0.1969, over 3273194.33 frames. utt_duration=1248 frames, utt_pad_proportion=0.05483, over 10506.71 utterances.], batch size: 36, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:46:33,872 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9156, 5.3120, 5.1446, 5.2570, 5.3645, 4.9445, 3.9840, 5.2861], device='cuda:2'), covar=tensor([0.0111, 0.0086, 0.0108, 0.0062, 0.0067, 0.0107, 0.0551, 0.0135], device='cuda:2'), in_proj_covar=tensor([0.0099, 0.0095, 0.0120, 0.0075, 0.0082, 0.0093, 0.0109, 0.0114], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0004], device='cuda:2') 2023-03-09 13:47:27,453 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.88 vs. limit=2.0 2023-03-09 13:47:42,041 INFO [train2.py:809] (2/4) Epoch 29, batch 3600, loss[ctc_loss=0.05612, att_loss=0.2392, loss=0.2026, over 16481.00 frames. utt_duration=1435 frames, utt_pad_proportion=0.00571, over 46.00 utterances.], tot_loss[ctc_loss=0.06321, att_loss=0.2304, loss=0.197, over 3271559.85 frames. 
utt_duration=1261 frames, utt_pad_proportion=0.0503, over 10391.42 utterances.], batch size: 46, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:47:45,616 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=115147.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:48:05,804 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.296e+02 1.777e+02 2.035e+02 2.403e+02 5.227e+02, threshold=4.069e+02, percent-clipped=1.0 2023-03-09 13:48:48,755 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2228, 5.5222, 5.1132, 5.5502, 4.9811, 5.1196, 5.6613, 5.4003], device='cuda:2'), covar=tensor([0.0551, 0.0262, 0.0754, 0.0327, 0.0374, 0.0226, 0.0188, 0.0177], device='cuda:2'), in_proj_covar=tensor([0.0403, 0.0339, 0.0380, 0.0381, 0.0336, 0.0244, 0.0319, 0.0302], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 13:49:01,650 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2023-03-09 13:49:02,353 INFO [train2.py:809] (2/4) Epoch 29, batch 3650, loss[ctc_loss=0.047, att_loss=0.2317, loss=0.1948, over 16642.00 frames. utt_duration=1418 frames, utt_pad_proportion=0.004557, over 47.00 utterances.], tot_loss[ctc_loss=0.06354, att_loss=0.231, loss=0.1975, over 3266626.27 frames. utt_duration=1234 frames, utt_pad_proportion=0.0578, over 10603.55 utterances.], batch size: 47, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:49:14,037 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9758, 3.6556, 3.7196, 3.2747, 3.6731, 3.7756, 3.7063, 2.7304], device='cuda:2'), covar=tensor([0.0956, 0.0949, 0.1218, 0.2447, 0.0752, 0.1241, 0.0820, 0.2753], device='cuda:2'), in_proj_covar=tensor([0.0210, 0.0213, 0.0228, 0.0278, 0.0190, 0.0289, 0.0212, 0.0234], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 13:49:54,465 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=115227.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:49:57,896 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8414, 3.5401, 3.9287, 3.4508, 3.9077, 4.9081, 4.7706, 3.7225], device='cuda:2'), covar=tensor([0.0279, 0.1316, 0.1023, 0.1175, 0.0884, 0.0636, 0.0384, 0.0983], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0252, 0.0293, 0.0220, 0.0273, 0.0383, 0.0276, 0.0237], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 13:50:23,436 INFO [train2.py:809] (2/4) Epoch 29, batch 3700, loss[ctc_loss=0.05144, att_loss=0.2199, loss=0.1862, over 16881.00 frames. utt_duration=1380 frames, utt_pad_proportion=0.006803, over 49.00 utterances.], tot_loss[ctc_loss=0.06289, att_loss=0.2305, loss=0.197, over 3265545.07 frames. utt_duration=1241 frames, utt_pad_proportion=0.05563, over 10538.09 utterances.], batch size: 49, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:50:44,136 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2023-03-09 13:50:47,789 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.294e+02 1.820e+02 2.082e+02 2.598e+02 4.716e+02, threshold=4.165e+02, percent-clipped=2.0 2023-03-09 13:51:44,492 INFO [train2.py:809] (2/4) Epoch 29, batch 3750, loss[ctc_loss=0.05354, att_loss=0.2181, loss=0.1852, over 16181.00 frames. 
utt_duration=1580 frames, utt_pad_proportion=0.006121, over 41.00 utterances.], tot_loss[ctc_loss=0.06192, att_loss=0.2297, loss=0.1962, over 3263046.67 frames. utt_duration=1250 frames, utt_pad_proportion=0.055, over 10453.33 utterances.], batch size: 41, lr: 3.67e-03, grad_scale: 8.0 2023-03-09 13:52:02,041 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6078, 4.9532, 4.7938, 4.9059, 5.0632, 4.6449, 3.3856, 4.8372], device='cuda:2'), covar=tensor([0.0126, 0.0100, 0.0135, 0.0081, 0.0079, 0.0117, 0.0754, 0.0185], device='cuda:2'), in_proj_covar=tensor([0.0099, 0.0095, 0.0120, 0.0075, 0.0082, 0.0092, 0.0108, 0.0113], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 13:52:11,432 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.9756, 5.3123, 5.5607, 5.3952, 5.4780, 5.9066, 5.2589, 5.9905], device='cuda:2'), covar=tensor([0.0674, 0.0776, 0.0854, 0.1280, 0.1701, 0.0951, 0.0701, 0.0691], device='cuda:2'), in_proj_covar=tensor([0.0917, 0.0528, 0.0640, 0.0679, 0.0904, 0.0661, 0.0509, 0.0638], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 13:52:39,489 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4864, 2.9708, 4.9427, 3.9150, 3.1550, 4.3330, 4.7774, 4.7312], device='cuda:2'), covar=tensor([0.0343, 0.1288, 0.0308, 0.0910, 0.1530, 0.0272, 0.0263, 0.0286], device='cuda:2'), in_proj_covar=tensor([0.0240, 0.0249, 0.0234, 0.0327, 0.0275, 0.0246, 0.0228, 0.0247], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 13:53:04,026 INFO [train2.py:809] (2/4) Epoch 29, batch 3800, loss[ctc_loss=0.1154, att_loss=0.26, loss=0.2311, over 14165.00 frames. utt_duration=392.1 frames, utt_pad_proportion=0.318, over 145.00 utterances.], tot_loss[ctc_loss=0.06222, att_loss=0.2298, loss=0.1962, over 3252894.37 frames. utt_duration=1212 frames, utt_pad_proportion=0.06793, over 10746.40 utterances.], batch size: 145, lr: 3.66e-03, grad_scale: 8.0 2023-03-09 13:53:27,652 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.885e+02 2.239e+02 2.673e+02 5.789e+02, threshold=4.479e+02, percent-clipped=6.0 2023-03-09 13:53:43,990 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=115370.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:53:44,686 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2023-03-09 13:53:46,197 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=115371.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:53:47,646 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=115372.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:53:54,731 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. limit=2.0 2023-03-09 13:54:23,177 INFO [train2.py:809] (2/4) Epoch 29, batch 3850, loss[ctc_loss=0.05772, att_loss=0.2147, loss=0.1833, over 16127.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.006264, over 42.00 utterances.], tot_loss[ctc_loss=0.06224, att_loss=0.2298, loss=0.1963, over 3259744.84 frames. 
utt_duration=1218 frames, utt_pad_proportion=0.06445, over 10717.64 utterances.], batch size: 42, lr: 3.66e-03, grad_scale: 8.0 2023-03-09 13:55:01,818 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=115420.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:55:18,396 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=115431.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:55:20,006 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=115432.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:55:34,964 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=115442.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:55:39,326 INFO [train2.py:809] (2/4) Epoch 29, batch 3900, loss[ctc_loss=0.07623, att_loss=0.2435, loss=0.2101, over 17301.00 frames. utt_duration=1175 frames, utt_pad_proportion=0.02442, over 59.00 utterances.], tot_loss[ctc_loss=0.06264, att_loss=0.2307, loss=0.1971, over 3270666.88 frames. utt_duration=1218 frames, utt_pad_proportion=0.06055, over 10757.03 utterances.], batch size: 59, lr: 3.66e-03, grad_scale: 8.0 2023-03-09 13:55:57,077 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1316, 4.2975, 4.2740, 4.5765, 2.6848, 4.4847, 2.9331, 1.7135], device='cuda:2'), covar=tensor([0.0463, 0.0294, 0.0663, 0.0238, 0.1638, 0.0249, 0.1328, 0.1640], device='cuda:2'), in_proj_covar=tensor([0.0223, 0.0195, 0.0267, 0.0184, 0.0226, 0.0175, 0.0237, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 13:56:02,813 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 1.900e+02 2.175e+02 2.636e+02 4.480e+02, threshold=4.350e+02, percent-clipped=1.0 2023-03-09 13:56:05,258 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.12 vs. limit=2.0 2023-03-09 13:56:51,642 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.5327, 5.8110, 5.2620, 5.5637, 5.4561, 4.9663, 5.2619, 5.0101], device='cuda:2'), covar=tensor([0.1306, 0.0950, 0.1103, 0.0935, 0.1107, 0.1644, 0.2348, 0.2295], device='cuda:2'), in_proj_covar=tensor([0.0563, 0.0639, 0.0492, 0.0475, 0.0454, 0.0486, 0.0639, 0.0544], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 13:56:55,925 INFO [train2.py:809] (2/4) Epoch 29, batch 3950, loss[ctc_loss=0.04876, att_loss=0.2082, loss=0.1763, over 15608.00 frames. utt_duration=1689 frames, utt_pad_proportion=0.008803, over 37.00 utterances.], tot_loss[ctc_loss=0.06314, att_loss=0.2314, loss=0.1977, over 3283435.55 frames. utt_duration=1242 frames, utt_pad_proportion=0.05193, over 10588.08 utterances.], batch size: 37, lr: 3.66e-03, grad_scale: 8.0 2023-03-09 13:57:44,608 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=115527.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:58:12,589 INFO [train2.py:809] (2/4) Epoch 30, batch 0, loss[ctc_loss=0.08847, att_loss=0.254, loss=0.2209, over 16979.00 frames. utt_duration=1360 frames, utt_pad_proportion=0.006837, over 50.00 utterances.], tot_loss[ctc_loss=0.08847, att_loss=0.254, loss=0.2209, over 16979.00 frames. 
utt_duration=1360 frames, utt_pad_proportion=0.006837, over 50.00 utterances.], batch size: 50, lr: 3.60e-03, grad_scale: 8.0 2023-03-09 13:58:12,589 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 13:58:17,339 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8622, 4.7910, 4.7605, 2.2352, 2.0425, 3.0421, 2.2751, 3.6690], device='cuda:2'), covar=tensor([0.0779, 0.0362, 0.0303, 0.5093, 0.5577, 0.2361, 0.4159, 0.1559], device='cuda:2'), in_proj_covar=tensor([0.0365, 0.0307, 0.0280, 0.0253, 0.0340, 0.0332, 0.0265, 0.0373], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 13:58:24,677 INFO [train2.py:843] (2/4) Epoch 30, validation: ctc_loss=0.03959, att_loss=0.2341, loss=0.1952, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 13:58:24,678 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 13:58:49,315 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2737, 2.8575, 3.2405, 4.3634, 3.8551, 3.8427, 2.8410, 2.2128], device='cuda:2'), covar=tensor([0.0914, 0.1994, 0.0950, 0.0554, 0.0918, 0.0493, 0.1658, 0.2286], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0223, 0.0189, 0.0232, 0.0241, 0.0196, 0.0209, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-09 13:59:05,303 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2023-03-09 13:59:13,841 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 1.763e+02 2.122e+02 2.494e+02 5.580e+02, threshold=4.245e+02, percent-clipped=4.0 2023-03-09 13:59:38,534 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=115575.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 13:59:44,598 INFO [train2.py:809] (2/4) Epoch 30, batch 50, loss[ctc_loss=0.06519, att_loss=0.2382, loss=0.2036, over 17028.00 frames. utt_duration=689.4 frames, utt_pad_proportion=0.1339, over 99.00 utterances.], tot_loss[ctc_loss=0.05939, att_loss=0.2271, loss=0.1935, over 736481.59 frames. utt_duration=1330 frames, utt_pad_proportion=0.03197, over 2217.92 utterances.], batch size: 99, lr: 3.60e-03, grad_scale: 8.0 2023-03-09 14:01:04,302 INFO [train2.py:809] (2/4) Epoch 30, batch 100, loss[ctc_loss=0.0602, att_loss=0.2385, loss=0.2029, over 16873.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.008075, over 49.00 utterances.], tot_loss[ctc_loss=0.0602, att_loss=0.2281, loss=0.1946, over 1302459.05 frames. 
utt_duration=1302 frames, utt_pad_proportion=0.03722, over 4006.77 utterances.], batch size: 49, lr: 3.60e-03, grad_scale: 8.0 2023-03-09 14:01:06,163 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2251, 2.6280, 3.1592, 4.2162, 3.7707, 3.7737, 2.8335, 2.1163], device='cuda:2'), covar=tensor([0.0864, 0.2103, 0.0984, 0.0563, 0.0873, 0.0545, 0.1591, 0.2297], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0222, 0.0189, 0.0232, 0.0241, 0.0196, 0.0209, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-09 14:01:54,056 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.758e+02 2.072e+02 2.595e+02 5.402e+02, threshold=4.143e+02, percent-clipped=3.0 2023-03-09 14:02:13,419 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1783, 2.6669, 3.2303, 4.2103, 3.7013, 3.7345, 2.7391, 2.2842], device='cuda:2'), covar=tensor([0.0874, 0.1925, 0.0866, 0.0659, 0.0998, 0.0616, 0.1680, 0.2085], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0223, 0.0190, 0.0233, 0.0242, 0.0197, 0.0210, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-09 14:02:22,596 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0410, 5.2714, 5.2389, 5.2600, 5.3683, 5.3082, 4.9306, 4.8100], device='cuda:2'), covar=tensor([0.0982, 0.0597, 0.0320, 0.0467, 0.0258, 0.0330, 0.0444, 0.0349], device='cuda:2'), in_proj_covar=tensor([0.0543, 0.0387, 0.0380, 0.0386, 0.0453, 0.0456, 0.0385, 0.0425], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:02:23,889 INFO [train2.py:809] (2/4) Epoch 30, batch 150, loss[ctc_loss=0.05018, att_loss=0.2303, loss=0.1943, over 16326.00 frames. utt_duration=1452 frames, utt_pad_proportion=0.006597, over 45.00 utterances.], tot_loss[ctc_loss=0.06229, att_loss=0.2296, loss=0.1961, over 1741826.03 frames. 
utt_duration=1231 frames, utt_pad_proportion=0.05736, over 5664.82 utterances.], batch size: 45, lr: 3.60e-03, grad_scale: 8.0 2023-03-09 14:03:04,241 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0031, 6.2916, 5.8492, 5.9494, 5.9731, 5.3837, 5.6454, 5.3658], device='cuda:2'), covar=tensor([0.1313, 0.0880, 0.0982, 0.0963, 0.0962, 0.1912, 0.2295, 0.2260], device='cuda:2'), in_proj_covar=tensor([0.0563, 0.0638, 0.0492, 0.0476, 0.0454, 0.0486, 0.0641, 0.0543], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:03:39,640 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=115726.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:03:39,890 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5612, 2.8903, 4.9183, 3.9639, 3.0807, 4.3232, 4.8269, 4.6977], device='cuda:2'), covar=tensor([0.0304, 0.1281, 0.0261, 0.0910, 0.1634, 0.0281, 0.0226, 0.0297], device='cuda:2'), in_proj_covar=tensor([0.0239, 0.0247, 0.0233, 0.0324, 0.0272, 0.0246, 0.0226, 0.0245], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:03:41,129 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=115727.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:03:44,042 INFO [train2.py:809] (2/4) Epoch 30, batch 200, loss[ctc_loss=0.05615, att_loss=0.2292, loss=0.1946, over 16623.00 frames. utt_duration=1416 frames, utt_pad_proportion=0.005379, over 47.00 utterances.], tot_loss[ctc_loss=0.06132, att_loss=0.2287, loss=0.1952, over 2073230.12 frames. utt_duration=1230 frames, utt_pad_proportion=0.06206, over 6750.49 utterances.], batch size: 47, lr: 3.60e-03, grad_scale: 8.0 2023-03-09 14:04:04,992 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=115742.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:04:23,461 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7376, 3.3578, 3.3827, 2.8454, 3.3316, 3.3980, 3.4101, 2.3930], device='cuda:2'), covar=tensor([0.1061, 0.1225, 0.1441, 0.3061, 0.1519, 0.1763, 0.0849, 0.3383], device='cuda:2'), in_proj_covar=tensor([0.0211, 0.0214, 0.0227, 0.0279, 0.0190, 0.0289, 0.0212, 0.0235], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:04:32,779 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.790e+02 2.190e+02 2.640e+02 5.774e+02, threshold=4.380e+02, percent-clipped=3.0 2023-03-09 14:05:03,781 INFO [train2.py:809] (2/4) Epoch 30, batch 250, loss[ctc_loss=0.06347, att_loss=0.2383, loss=0.2033, over 17515.00 frames. utt_duration=1017 frames, utt_pad_proportion=0.0398, over 69.00 utterances.], tot_loss[ctc_loss=0.06266, att_loss=0.2297, loss=0.1963, over 2333937.96 frames. 
utt_duration=1200 frames, utt_pad_proportion=0.06927, over 7787.55 utterances.], batch size: 69, lr: 3.60e-03, grad_scale: 8.0 2023-03-09 14:05:15,078 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4186, 2.5698, 4.7605, 3.8147, 3.0464, 4.1208, 4.6031, 4.6080], device='cuda:2'), covar=tensor([0.0316, 0.1470, 0.0312, 0.0857, 0.1573, 0.0301, 0.0219, 0.0275], device='cuda:2'), in_proj_covar=tensor([0.0240, 0.0247, 0.0233, 0.0324, 0.0273, 0.0246, 0.0226, 0.0246], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:05:18,771 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1134, 5.1698, 4.9198, 2.3488, 2.2167, 3.3004, 2.5360, 3.9200], device='cuda:2'), covar=tensor([0.0714, 0.0326, 0.0334, 0.5460, 0.5236, 0.2041, 0.4029, 0.1596], device='cuda:2'), in_proj_covar=tensor([0.0367, 0.0308, 0.0281, 0.0254, 0.0341, 0.0332, 0.0265, 0.0374], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 14:05:21,466 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=115790.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:06:23,343 INFO [train2.py:809] (2/4) Epoch 30, batch 300, loss[ctc_loss=0.08059, att_loss=0.2509, loss=0.2168, over 17101.00 frames. utt_duration=1223 frames, utt_pad_proportion=0.01615, over 56.00 utterances.], tot_loss[ctc_loss=0.0628, att_loss=0.2293, loss=0.196, over 2537795.58 frames. utt_duration=1209 frames, utt_pad_proportion=0.06746, over 8404.01 utterances.], batch size: 56, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:07:12,539 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.192e+02 1.839e+02 2.070e+02 2.435e+02 4.420e+02, threshold=4.140e+02, percent-clipped=2.0 2023-03-09 14:07:43,746 INFO [train2.py:809] (2/4) Epoch 30, batch 350, loss[ctc_loss=0.08211, att_loss=0.2436, loss=0.2113, over 16471.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006177, over 46.00 utterances.], tot_loss[ctc_loss=0.06279, att_loss=0.2296, loss=0.1963, over 2701671.19 frames. utt_duration=1203 frames, utt_pad_proportion=0.06725, over 8991.48 utterances.], batch size: 46, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:08:05,276 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=115892.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 14:09:04,301 INFO [train2.py:809] (2/4) Epoch 30, batch 400, loss[ctc_loss=0.06136, att_loss=0.2305, loss=0.1967, over 17311.00 frames. utt_duration=1261 frames, utt_pad_proportion=0.01127, over 55.00 utterances.], tot_loss[ctc_loss=0.06267, att_loss=0.2295, loss=0.1962, over 2831572.39 frames. utt_duration=1244 frames, utt_pad_proportion=0.05591, over 9119.23 utterances.], batch size: 55, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:09:43,441 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=115953.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 14:09:53,661 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.410e+02 1.924e+02 2.363e+02 2.895e+02 9.842e+02, threshold=4.725e+02, percent-clipped=4.0 2023-03-09 14:10:24,227 INFO [train2.py:809] (2/4) Epoch 30, batch 450, loss[ctc_loss=0.0624, att_loss=0.2134, loss=0.1832, over 14517.00 frames. utt_duration=1816 frames, utt_pad_proportion=0.04015, over 32.00 utterances.], tot_loss[ctc_loss=0.06194, att_loss=0.2292, loss=0.1957, over 2923999.74 frames. 
utt_duration=1271 frames, utt_pad_proportion=0.05139, over 9216.01 utterances.], batch size: 32, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:11:25,115 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=116014.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:11:43,440 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=116026.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:11:45,007 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=116027.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:11:48,383 INFO [train2.py:809] (2/4) Epoch 30, batch 500, loss[ctc_loss=0.07489, att_loss=0.2396, loss=0.2067, over 17278.00 frames. utt_duration=1258 frames, utt_pad_proportion=0.01253, over 55.00 utterances.], tot_loss[ctc_loss=0.06245, att_loss=0.2294, loss=0.196, over 3004437.54 frames. utt_duration=1273 frames, utt_pad_proportion=0.04911, over 9451.82 utterances.], batch size: 55, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:12:37,212 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.798e+02 2.149e+02 2.710e+02 6.262e+02, threshold=4.298e+02, percent-clipped=5.0 2023-03-09 14:12:59,510 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=116074.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:13:01,050 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=116075.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:13:01,329 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=116075.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 14:13:07,556 INFO [train2.py:809] (2/4) Epoch 30, batch 550, loss[ctc_loss=0.09079, att_loss=0.2568, loss=0.2236, over 17353.00 frames. utt_duration=1103 frames, utt_pad_proportion=0.03487, over 63.00 utterances.], tot_loss[ctc_loss=0.06301, att_loss=0.2301, loss=0.1966, over 3061066.44 frames. utt_duration=1226 frames, utt_pad_proportion=0.06193, over 10000.14 utterances.], batch size: 63, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:13:47,221 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.7234, 4.6772, 4.3173, 2.7351, 4.5563, 4.4061, 3.9579, 2.4877], device='cuda:2'), covar=tensor([0.0149, 0.0153, 0.0418, 0.1290, 0.0135, 0.0300, 0.0417, 0.1708], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0108, 0.0113, 0.0114, 0.0091, 0.0121, 0.0103, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:13:48,814 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=116105.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:14:06,609 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=116116.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:14:27,136 INFO [train2.py:809] (2/4) Epoch 30, batch 600, loss[ctc_loss=0.07535, att_loss=0.2442, loss=0.2104, over 17416.00 frames. utt_duration=1011 frames, utt_pad_proportion=0.04623, over 69.00 utterances.], tot_loss[ctc_loss=0.06367, att_loss=0.2304, loss=0.1971, over 3106189.91 frames. 
utt_duration=1210 frames, utt_pad_proportion=0.06542, over 10277.59 utterances.], batch size: 69, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:15:15,787 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.620e+01 1.813e+02 2.125e+02 2.957e+02 6.068e+02, threshold=4.250e+02, percent-clipped=9.0 2023-03-09 14:15:26,204 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=116166.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:15:44,326 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=116177.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:15:47,048 INFO [train2.py:809] (2/4) Epoch 30, batch 650, loss[ctc_loss=0.04787, att_loss=0.2108, loss=0.1782, over 15996.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.00807, over 40.00 utterances.], tot_loss[ctc_loss=0.06399, att_loss=0.2311, loss=0.1977, over 3150459.21 frames. utt_duration=1202 frames, utt_pad_proportion=0.06435, over 10495.99 utterances.], batch size: 40, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:17:07,961 INFO [train2.py:809] (2/4) Epoch 30, batch 700, loss[ctc_loss=0.05017, att_loss=0.2282, loss=0.1926, over 16632.00 frames. utt_duration=1417 frames, utt_pad_proportion=0.005005, over 47.00 utterances.], tot_loss[ctc_loss=0.06352, att_loss=0.2306, loss=0.1972, over 3179877.69 frames. utt_duration=1223 frames, utt_pad_proportion=0.05767, over 10415.54 utterances.], batch size: 47, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:17:38,392 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=116248.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 14:17:57,485 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.730e+02 1.988e+02 2.420e+02 4.405e+02, threshold=3.977e+02, percent-clipped=2.0 2023-03-09 14:18:28,324 INFO [train2.py:809] (2/4) Epoch 30, batch 750, loss[ctc_loss=0.06847, att_loss=0.2251, loss=0.1938, over 15959.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.00616, over 41.00 utterances.], tot_loss[ctc_loss=0.06318, att_loss=0.2299, loss=0.1965, over 3190299.62 frames. utt_duration=1222 frames, utt_pad_proportion=0.06056, over 10456.16 utterances.], batch size: 41, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:19:38,020 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=116323.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:19:47,386 INFO [train2.py:809] (2/4) Epoch 30, batch 800, loss[ctc_loss=0.07093, att_loss=0.2423, loss=0.208, over 17351.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.02092, over 59.00 utterances.], tot_loss[ctc_loss=0.0632, att_loss=0.2303, loss=0.1969, over 3214475.11 frames. 
utt_duration=1220 frames, utt_pad_proportion=0.0601, over 10549.00 utterances.], batch size: 59, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:20:11,263 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2952, 2.5440, 3.0519, 2.5746, 3.0203, 3.4488, 3.3318, 2.6659], device='cuda:2'), covar=tensor([0.0538, 0.1676, 0.1291, 0.1252, 0.0985, 0.1310, 0.0802, 0.1317], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0254, 0.0295, 0.0222, 0.0274, 0.0385, 0.0276, 0.0240], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 14:20:30,379 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5934, 3.1852, 3.7471, 3.2998, 3.6510, 4.6702, 4.5567, 3.3544], device='cuda:2'), covar=tensor([0.0413, 0.1623, 0.1278, 0.1279, 0.1114, 0.0895, 0.0544, 0.1267], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0254, 0.0295, 0.0222, 0.0274, 0.0385, 0.0276, 0.0240], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 14:20:36,147 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.359e+02 1.849e+02 2.173e+02 2.485e+02 4.246e+02, threshold=4.345e+02, percent-clipped=3.0 2023-03-09 14:20:37,002 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2023-03-09 14:20:52,676 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=116370.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 14:21:06,894 INFO [train2.py:809] (2/4) Epoch 30, batch 850, loss[ctc_loss=0.06694, att_loss=0.2252, loss=0.1936, over 16129.00 frames. utt_duration=1538 frames, utt_pad_proportion=0.004777, over 42.00 utterances.], tot_loss[ctc_loss=0.06246, att_loss=0.2294, loss=0.196, over 3220539.50 frames. utt_duration=1235 frames, utt_pad_proportion=0.05918, over 10440.87 utterances.], batch size: 42, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:21:15,678 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=116384.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:22:27,522 INFO [train2.py:809] (2/4) Epoch 30, batch 900, loss[ctc_loss=0.08392, att_loss=0.2495, loss=0.2164, over 13871.00 frames. utt_duration=384.4 frames, utt_pad_proportion=0.3316, over 145.00 utterances.], tot_loss[ctc_loss=0.06252, att_loss=0.2301, loss=0.1966, over 3234331.90 frames. 
utt_duration=1192 frames, utt_pad_proportion=0.06832, over 10867.95 utterances.], batch size: 145, lr: 3.59e-03, grad_scale: 8.0 2023-03-09 14:22:54,786 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1673, 3.8338, 3.3227, 3.4931, 4.0376, 3.7848, 3.2059, 4.2675], device='cuda:2'), covar=tensor([0.0974, 0.0516, 0.0982, 0.0710, 0.0752, 0.0640, 0.0801, 0.0498], device='cuda:2'), in_proj_covar=tensor([0.0212, 0.0230, 0.0234, 0.0211, 0.0294, 0.0253, 0.0208, 0.0302], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:2') 2023-03-09 14:22:54,794 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=116446.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:23:16,724 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.211e+02 1.807e+02 2.134e+02 2.722e+02 5.477e+02, threshold=4.269e+02, percent-clipped=4.0 2023-03-09 14:23:18,519 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=116461.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:23:36,265 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=116472.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:23:47,729 INFO [train2.py:809] (2/4) Epoch 30, batch 950, loss[ctc_loss=0.05554, att_loss=0.224, loss=0.1903, over 16400.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007213, over 44.00 utterances.], tot_loss[ctc_loss=0.06305, att_loss=0.2305, loss=0.197, over 3252006.78 frames. utt_duration=1201 frames, utt_pad_proportion=0.06268, over 10842.85 utterances.], batch size: 44, lr: 3.58e-03, grad_scale: 8.0 2023-03-09 14:24:29,912 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=116505.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:24:33,186 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=116507.0, num_to_drop=1, layers_to_drop={2} 2023-03-09 14:25:07,879 INFO [train2.py:809] (2/4) Epoch 30, batch 1000, loss[ctc_loss=0.04939, att_loss=0.2189, loss=0.185, over 16400.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007797, over 44.00 utterances.], tot_loss[ctc_loss=0.06304, att_loss=0.2309, loss=0.1973, over 3255681.70 frames. utt_duration=1193 frames, utt_pad_proportion=0.06536, over 10925.82 utterances.], batch size: 44, lr: 3.58e-03, grad_scale: 8.0 2023-03-09 14:25:38,097 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=116548.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 14:25:56,767 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.757e+02 2.029e+02 2.635e+02 6.922e+02, threshold=4.057e+02, percent-clipped=2.0 2023-03-09 14:26:07,201 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=116566.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:26:21,820 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1761, 5.1877, 4.9456, 2.7980, 2.2951, 3.4712, 2.7268, 3.9616], device='cuda:2'), covar=tensor([0.0733, 0.0423, 0.0339, 0.4776, 0.4978, 0.1946, 0.3798, 0.1606], device='cuda:2'), in_proj_covar=tensor([0.0362, 0.0304, 0.0277, 0.0250, 0.0335, 0.0330, 0.0262, 0.0370], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 14:26:27,609 INFO [train2.py:809] (2/4) Epoch 30, batch 1050, loss[ctc_loss=0.05626, att_loss=0.2242, loss=0.1906, over 17583.00 frames. 
utt_duration=1006 frames, utt_pad_proportion=0.04717, over 70.00 utterances.], tot_loss[ctc_loss=0.06212, att_loss=0.2295, loss=0.1961, over 3250840.36 frames. utt_duration=1228 frames, utt_pad_proportion=0.06002, over 10606.12 utterances.], batch size: 70, lr: 3.58e-03, grad_scale: 8.0 2023-03-09 14:26:37,703 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9935, 5.2288, 5.1585, 5.2074, 5.2572, 5.2454, 4.8664, 4.6951], device='cuda:2'), covar=tensor([0.1039, 0.0540, 0.0333, 0.0475, 0.0273, 0.0316, 0.0459, 0.0360], device='cuda:2'), in_proj_covar=tensor([0.0540, 0.0387, 0.0382, 0.0387, 0.0452, 0.0456, 0.0385, 0.0424], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:26:54,763 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=116596.0, num_to_drop=1, layers_to_drop={1} 2023-03-09 14:27:30,148 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=116618.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:27:47,630 INFO [train2.py:809] (2/4) Epoch 30, batch 1100, loss[ctc_loss=0.06184, att_loss=0.244, loss=0.2075, over 17025.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007613, over 51.00 utterances.], tot_loss[ctc_loss=0.06218, att_loss=0.2299, loss=0.1963, over 3262344.60 frames. utt_duration=1240 frames, utt_pad_proportion=0.05548, over 10538.22 utterances.], batch size: 51, lr: 3.58e-03, grad_scale: 8.0 2023-03-09 14:27:53,160 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8018, 6.0887, 5.6180, 5.7874, 5.7865, 5.2033, 5.4814, 5.2895], device='cuda:2'), covar=tensor([0.1478, 0.0892, 0.0903, 0.0919, 0.1115, 0.1691, 0.2396, 0.2093], device='cuda:2'), in_proj_covar=tensor([0.0568, 0.0641, 0.0495, 0.0477, 0.0458, 0.0486, 0.0649, 0.0545], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:28:34,267 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4531, 4.5921, 4.5401, 4.6257, 4.6609, 4.6673, 4.3200, 4.2025], device='cuda:2'), covar=tensor([0.0937, 0.0613, 0.0626, 0.0571, 0.0317, 0.0387, 0.0481, 0.0377], device='cuda:2'), in_proj_covar=tensor([0.0539, 0.0385, 0.0381, 0.0386, 0.0451, 0.0454, 0.0384, 0.0422], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:28:37,009 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.262e+02 1.803e+02 2.274e+02 2.882e+02 6.074e+02, threshold=4.549e+02, percent-clipped=3.0 2023-03-09 14:28:54,127 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=116670.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 14:29:07,782 INFO [train2.py:809] (2/4) Epoch 30, batch 1150, loss[ctc_loss=0.05143, att_loss=0.2204, loss=0.1866, over 16124.00 frames. utt_duration=1537 frames, utt_pad_proportion=0.005667, over 42.00 utterances.], tot_loss[ctc_loss=0.06253, att_loss=0.2295, loss=0.1961, over 3261737.76 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.05644, over 10541.50 utterances.], batch size: 42, lr: 3.58e-03, grad_scale: 8.0 2023-03-09 14:29:08,016 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=116679.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:29:08,185 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=116679.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:29:47,286 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=116704.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:30:09,503 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=116718.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:30:26,869 INFO [train2.py:809] (2/4) Epoch 30, batch 1200, loss[ctc_loss=0.04743, att_loss=0.2035, loss=0.1723, over 15873.00 frames. utt_duration=1630 frames, utt_pad_proportion=0.008781, over 39.00 utterances.], tot_loss[ctc_loss=0.06247, att_loss=0.2292, loss=0.1959, over 3263655.26 frames. utt_duration=1254 frames, utt_pad_proportion=0.05311, over 10426.35 utterances.], batch size: 39, lr: 3.58e-03, grad_scale: 8.0 2023-03-09 14:30:56,380 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0866, 5.4365, 5.6363, 5.4775, 5.5864, 6.0656, 5.3540, 6.1689], device='cuda:2'), covar=tensor([0.0729, 0.0741, 0.0828, 0.1343, 0.1691, 0.0930, 0.0642, 0.0655], device='cuda:2'), in_proj_covar=tensor([0.0930, 0.0538, 0.0652, 0.0689, 0.0910, 0.0681, 0.0523, 0.0651], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 14:31:16,204 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.678e+02 2.013e+02 2.421e+02 9.091e+02, threshold=4.026e+02, percent-clipped=5.0 2023-03-09 14:31:18,142 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=116761.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:31:25,111 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=116765.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:31:36,041 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=116772.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:31:47,190 INFO [train2.py:809] (2/4) Epoch 30, batch 1250, loss[ctc_loss=0.04749, att_loss=0.2095, loss=0.1771, over 15750.00 frames. utt_duration=1660 frames, utt_pad_proportion=0.009719, over 38.00 utterances.], tot_loss[ctc_loss=0.06212, att_loss=0.229, loss=0.1956, over 3261135.71 frames. utt_duration=1257 frames, utt_pad_proportion=0.05408, over 10388.48 utterances.], batch size: 38, lr: 3.58e-03, grad_scale: 16.0 2023-03-09 14:31:48,075 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.23 vs. 
limit=5.0 2023-03-09 14:32:23,516 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=116802.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 14:32:23,655 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=116802.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:32:35,289 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=116809.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:32:53,098 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=116820.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:32:56,198 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.0450, 6.2638, 5.8505, 5.9599, 5.9601, 5.4542, 5.7776, 5.4418], device='cuda:2'), covar=tensor([0.1187, 0.0879, 0.0914, 0.0839, 0.0870, 0.1623, 0.2112, 0.2188], device='cuda:2'), in_proj_covar=tensor([0.0569, 0.0645, 0.0496, 0.0479, 0.0460, 0.0487, 0.0651, 0.0548], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:33:07,494 INFO [train2.py:809] (2/4) Epoch 30, batch 1300, loss[ctc_loss=0.0594, att_loss=0.2327, loss=0.198, over 16770.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.00617, over 48.00 utterances.], tot_loss[ctc_loss=0.06205, att_loss=0.2296, loss=0.1961, over 3270476.12 frames. utt_duration=1258 frames, utt_pad_proportion=0.05164, over 10409.70 utterances.], batch size: 48, lr: 3.58e-03, grad_scale: 16.0 2023-03-09 14:33:57,075 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 1.796e+02 2.133e+02 2.719e+02 8.820e+02, threshold=4.267e+02, percent-clipped=2.0 2023-03-09 14:33:58,894 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=116861.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:34:02,741 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=116863.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:34:28,221 INFO [train2.py:809] (2/4) Epoch 30, batch 1350, loss[ctc_loss=0.06191, att_loss=0.2196, loss=0.1881, over 16116.00 frames. utt_duration=1536 frames, utt_pad_proportion=0.006345, over 42.00 utterances.], tot_loss[ctc_loss=0.06167, att_loss=0.2298, loss=0.1961, over 3281463.89 frames. 
utt_duration=1269 frames, utt_pad_proportion=0.04622, over 10358.07 utterances.], batch size: 42, lr: 3.58e-03, grad_scale: 16.0 2023-03-09 14:34:35,090 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9738, 5.0518, 4.7884, 2.1340, 2.0649, 3.0203, 2.5631, 3.8799], device='cuda:2'), covar=tensor([0.0772, 0.0322, 0.0354, 0.6039, 0.5231, 0.2397, 0.3597, 0.1591], device='cuda:2'), in_proj_covar=tensor([0.0363, 0.0306, 0.0279, 0.0252, 0.0337, 0.0332, 0.0262, 0.0371], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 14:34:55,300 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.9861, 5.0559, 4.8123, 2.3062, 2.1513, 3.0132, 2.5928, 3.8820], device='cuda:2'), covar=tensor([0.0781, 0.0333, 0.0324, 0.5131, 0.5063, 0.2403, 0.3668, 0.1638], device='cuda:2'), in_proj_covar=tensor([0.0364, 0.0307, 0.0279, 0.0253, 0.0337, 0.0333, 0.0263, 0.0372], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 14:35:07,858 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1333, 4.4421, 4.4132, 4.7499, 2.9256, 4.3979, 2.9078, 2.0253], device='cuda:2'), covar=tensor([0.0568, 0.0313, 0.0670, 0.0261, 0.1483, 0.0286, 0.1299, 0.1647], device='cuda:2'), in_proj_covar=tensor([0.0225, 0.0196, 0.0267, 0.0184, 0.0225, 0.0176, 0.0236, 0.0205], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:35:43,044 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.39 vs. limit=5.0 2023-03-09 14:35:48,166 INFO [train2.py:809] (2/4) Epoch 30, batch 1400, loss[ctc_loss=0.05624, att_loss=0.2141, loss=0.1825, over 15962.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.005849, over 41.00 utterances.], tot_loss[ctc_loss=0.06135, att_loss=0.2297, loss=0.196, over 3283508.88 frames. utt_duration=1272 frames, utt_pad_proportion=0.04493, over 10335.95 utterances.], batch size: 41, lr: 3.58e-03, grad_scale: 16.0 2023-03-09 14:36:37,549 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.312e+02 1.819e+02 2.157e+02 2.707e+02 5.079e+02, threshold=4.315e+02, percent-clipped=2.0 2023-03-09 14:37:01,013 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=116974.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:37:01,591 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.55 vs. limit=5.0 2023-03-09 14:37:08,666 INFO [train2.py:809] (2/4) Epoch 30, batch 1450, loss[ctc_loss=0.05917, att_loss=0.2326, loss=0.1979, over 16523.00 frames. utt_duration=1470 frames, utt_pad_proportion=0.006847, over 45.00 utterances.], tot_loss[ctc_loss=0.06196, att_loss=0.2304, loss=0.1967, over 3290670.27 frames. 
utt_duration=1259 frames, utt_pad_proportion=0.04566, over 10465.73 utterances.], batch size: 45, lr: 3.58e-03, grad_scale: 16.0 2023-03-09 14:37:08,998 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=116979.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:37:13,826 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4513, 4.6602, 4.6623, 4.7097, 5.3018, 4.6347, 4.6293, 3.0290], device='cuda:2'), covar=tensor([0.0322, 0.0367, 0.0361, 0.0369, 0.0560, 0.0252, 0.0360, 0.1468], device='cuda:2'), in_proj_covar=tensor([0.0207, 0.0235, 0.0227, 0.0247, 0.0390, 0.0202, 0.0222, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:38:25,651 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=117027.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:38:28,586 INFO [train2.py:809] (2/4) Epoch 30, batch 1500, loss[ctc_loss=0.05211, att_loss=0.2229, loss=0.1888, over 16871.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.00742, over 49.00 utterances.], tot_loss[ctc_loss=0.06139, att_loss=0.2294, loss=0.1958, over 3280953.45 frames. utt_duration=1294 frames, utt_pad_proportion=0.03983, over 10152.68 utterances.], batch size: 49, lr: 3.58e-03, grad_scale: 16.0 2023-03-09 14:39:00,527 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0982, 5.0805, 4.8081, 2.3521, 2.0994, 3.2394, 2.5759, 3.9492], device='cuda:2'), covar=tensor([0.0730, 0.0323, 0.0308, 0.5020, 0.5240, 0.2173, 0.3735, 0.1557], device='cuda:2'), in_proj_covar=tensor([0.0362, 0.0306, 0.0278, 0.0252, 0.0336, 0.0331, 0.0262, 0.0371], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 14:39:18,347 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.257e+02 1.806e+02 2.089e+02 2.584e+02 4.922e+02, threshold=4.178e+02, percent-clipped=1.0 2023-03-09 14:39:18,628 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=117060.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:39:48,453 INFO [train2.py:809] (2/4) Epoch 30, batch 1550, loss[ctc_loss=0.07469, att_loss=0.2391, loss=0.2062, over 17264.00 frames. utt_duration=1257 frames, utt_pad_proportion=0.01405, over 55.00 utterances.], tot_loss[ctc_loss=0.0622, att_loss=0.2302, loss=0.1966, over 3286196.24 frames. utt_duration=1285 frames, utt_pad_proportion=0.04043, over 10242.68 utterances.], batch size: 55, lr: 3.58e-03, grad_scale: 16.0 2023-03-09 14:40:25,839 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=117102.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:41:08,937 INFO [train2.py:809] (2/4) Epoch 30, batch 1600, loss[ctc_loss=0.0484, att_loss=0.2107, loss=0.1782, over 15623.00 frames. utt_duration=1690 frames, utt_pad_proportion=0.009848, over 37.00 utterances.], tot_loss[ctc_loss=0.06271, att_loss=0.2305, loss=0.1969, over 3291777.05 frames. 
utt_duration=1273 frames, utt_pad_proportion=0.04146, over 10353.85 utterances.], batch size: 37, lr: 3.57e-03, grad_scale: 16.0 2023-03-09 14:41:43,252 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=117150.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:41:56,324 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=117158.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:41:59,198 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.817e+02 2.106e+02 2.742e+02 6.475e+02, threshold=4.211e+02, percent-clipped=3.0 2023-03-09 14:41:59,636 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5638, 2.9073, 4.9553, 4.0680, 3.2154, 4.3205, 4.8473, 4.7341], device='cuda:2'), covar=tensor([0.0315, 0.1366, 0.0276, 0.0783, 0.1562, 0.0271, 0.0226, 0.0298], device='cuda:2'), in_proj_covar=tensor([0.0244, 0.0249, 0.0237, 0.0326, 0.0274, 0.0249, 0.0231, 0.0250], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:42:00,956 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=117161.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:42:29,624 INFO [train2.py:809] (2/4) Epoch 30, batch 1650, loss[ctc_loss=0.06402, att_loss=0.2385, loss=0.2036, over 17299.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01115, over 55.00 utterances.], tot_loss[ctc_loss=0.06196, att_loss=0.2292, loss=0.1958, over 3284186.35 frames. utt_duration=1286 frames, utt_pad_proportion=0.0411, over 10224.07 utterances.], batch size: 55, lr: 3.57e-03, grad_scale: 16.0 2023-03-09 14:42:50,120 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1647, 5.5022, 5.6672, 5.4875, 5.6851, 6.1139, 5.3242, 6.1766], device='cuda:2'), covar=tensor([0.0657, 0.0705, 0.0873, 0.1417, 0.1600, 0.0821, 0.0706, 0.0671], device='cuda:2'), in_proj_covar=tensor([0.0926, 0.0538, 0.0649, 0.0688, 0.0910, 0.0675, 0.0520, 0.0652], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 14:43:10,834 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0236, 5.0375, 4.7555, 2.9582, 4.8900, 4.7734, 4.3237, 2.8099], device='cuda:2'), covar=tensor([0.0124, 0.0102, 0.0318, 0.1073, 0.0105, 0.0202, 0.0326, 0.1412], device='cuda:2'), in_proj_covar=tensor([0.0080, 0.0109, 0.0114, 0.0115, 0.0092, 0.0121, 0.0104, 0.0107], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:43:17,778 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.7914, 3.2591, 3.3096, 2.8528, 3.2486, 3.2599, 3.3138, 2.3694], device='cuda:2'), covar=tensor([0.1038, 0.1219, 0.1620, 0.3312, 0.0948, 0.2257, 0.1053, 0.3326], device='cuda:2'), in_proj_covar=tensor([0.0218, 0.0223, 0.0236, 0.0290, 0.0197, 0.0299, 0.0219, 0.0242], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:43:19,241 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=117209.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:43:30,589 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.6563, 3.6029, 3.3700, 3.8323, 2.7498, 3.7137, 2.8525, 2.2289], device='cuda:2'), covar=tensor([0.0555, 0.0403, 0.0949, 0.0326, 0.1492, 0.0319, 0.1287, 0.1596], device='cuda:2'), in_proj_covar=tensor([0.0228, 
0.0199, 0.0270, 0.0187, 0.0228, 0.0178, 0.0239, 0.0208], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:43:39,148 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=117221.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:43:51,228 INFO [train2.py:809] (2/4) Epoch 30, batch 1700, loss[ctc_loss=0.05378, att_loss=0.213, loss=0.1812, over 14574.00 frames. utt_duration=1823 frames, utt_pad_proportion=0.03367, over 32.00 utterances.], tot_loss[ctc_loss=0.06137, att_loss=0.2292, loss=0.1956, over 3280627.02 frames. utt_duration=1292 frames, utt_pad_proportion=0.04082, over 10169.70 utterances.], batch size: 32, lr: 3.57e-03, grad_scale: 16.0 2023-03-09 14:44:04,246 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.6086, 2.4281, 2.3929, 2.6235, 2.8418, 2.6789, 2.2985, 2.9990], device='cuda:2'), covar=tensor([0.2073, 0.2096, 0.1944, 0.1122, 0.2804, 0.1138, 0.1881, 0.1777], device='cuda:2'), in_proj_covar=tensor([0.0156, 0.0152, 0.0150, 0.0147, 0.0162, 0.0140, 0.0163, 0.0141], device='cuda:2'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:2') 2023-03-09 14:44:36,742 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=3.27 vs. limit=5.0 2023-03-09 14:44:43,555 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.821e+02 2.138e+02 2.705e+02 3.860e+02, threshold=4.276e+02, percent-clipped=0.0 2023-03-09 14:45:04,986 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=117274.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:45:12,574 INFO [train2.py:809] (2/4) Epoch 30, batch 1750, loss[ctc_loss=0.06754, att_loss=0.2262, loss=0.1944, over 16766.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.006481, over 48.00 utterances.], tot_loss[ctc_loss=0.062, att_loss=0.2298, loss=0.1962, over 3286404.13 frames. utt_duration=1273 frames, utt_pad_proportion=0.04505, over 10341.12 utterances.], batch size: 48, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:45:17,618 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=117282.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:45:28,946 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.8071, 3.0449, 3.0904, 2.7579, 3.1019, 3.0337, 3.0770, 2.3428], device='cuda:2'), covar=tensor([0.1059, 0.1338, 0.1673, 0.3147, 0.1085, 0.1977, 0.1129, 0.3149], device='cuda:2'), in_proj_covar=tensor([0.0218, 0.0222, 0.0236, 0.0288, 0.0197, 0.0298, 0.0219, 0.0241], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:45:50,719 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.70 vs. limit=2.0 2023-03-09 14:46:23,175 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=117322.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:46:34,461 INFO [train2.py:809] (2/4) Epoch 30, batch 1800, loss[ctc_loss=0.06257, att_loss=0.2229, loss=0.1909, over 16407.00 frames. utt_duration=1493 frames, utt_pad_proportion=0.00742, over 44.00 utterances.], tot_loss[ctc_loss=0.06203, att_loss=0.2298, loss=0.1963, over 3280558.78 frames. 
utt_duration=1250 frames, utt_pad_proportion=0.05199, over 10510.22 utterances.], batch size: 44, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:46:34,852 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=117329.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:47:25,092 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=117360.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:47:26,232 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.207e+02 1.761e+02 2.042e+02 2.458e+02 5.519e+02, threshold=4.084e+02, percent-clipped=2.0 2023-03-09 14:47:54,416 INFO [train2.py:809] (2/4) Epoch 30, batch 1850, loss[ctc_loss=0.05219, att_loss=0.2053, loss=0.1747, over 15647.00 frames. utt_duration=1693 frames, utt_pad_proportion=0.008202, over 37.00 utterances.], tot_loss[ctc_loss=0.06193, att_loss=0.2298, loss=0.1962, over 3269985.29 frames. utt_duration=1244 frames, utt_pad_proportion=0.05624, over 10525.50 utterances.], batch size: 37, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:48:12,238 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=117390.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:48:41,540 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=117408.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:49:14,297 INFO [train2.py:809] (2/4) Epoch 30, batch 1900, loss[ctc_loss=0.05512, att_loss=0.2342, loss=0.1984, over 17332.00 frames. utt_duration=1102 frames, utt_pad_proportion=0.03655, over 63.00 utterances.], tot_loss[ctc_loss=0.06256, att_loss=0.2306, loss=0.197, over 3281299.86 frames. utt_duration=1242 frames, utt_pad_proportion=0.05345, over 10580.63 utterances.], batch size: 63, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:49:23,818 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9800, 5.3163, 4.8478, 5.3387, 4.7512, 5.0204, 5.4230, 5.1952], device='cuda:2'), covar=tensor([0.0601, 0.0243, 0.0845, 0.0323, 0.0381, 0.0287, 0.0205, 0.0201], device='cuda:2'), in_proj_covar=tensor([0.0403, 0.0340, 0.0383, 0.0384, 0.0336, 0.0243, 0.0320, 0.0302], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 14:50:00,413 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=117458.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:50:04,887 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.786e+02 2.185e+02 2.601e+02 6.892e+02, threshold=4.370e+02, percent-clipped=2.0 2023-03-09 14:50:33,475 INFO [train2.py:809] (2/4) Epoch 30, batch 1950, loss[ctc_loss=0.05278, att_loss=0.2264, loss=0.1917, over 15964.00 frames. utt_duration=1559 frames, utt_pad_proportion=0.006513, over 41.00 utterances.], tot_loss[ctc_loss=0.06225, att_loss=0.2303, loss=0.1967, over 3281237.21 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.05215, over 10571.43 utterances.], batch size: 41, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:51:17,147 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=117506.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:51:20,664 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0780, 3.7574, 3.7392, 3.3196, 3.7374, 3.8090, 3.8005, 2.8237], device='cuda:2'), covar=tensor([0.0949, 0.1076, 0.1895, 0.2591, 0.1211, 0.2124, 0.0821, 0.2841], device='cuda:2'), in_proj_covar=tensor([0.0216, 0.0220, 0.0233, 0.0285, 0.0194, 0.0295, 0.0217, 0.0238], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:51:53,201 INFO [train2.py:809] (2/4) Epoch 30, batch 2000, loss[ctc_loss=0.05337, att_loss=0.2234, loss=0.1894, over 16183.00 frames. utt_duration=1580 frames, utt_pad_proportion=0.006075, over 41.00 utterances.], tot_loss[ctc_loss=0.06246, att_loss=0.2304, loss=0.1968, over 3284621.81 frames. utt_duration=1246 frames, utt_pad_proportion=0.05053, over 10554.99 utterances.], batch size: 41, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:51:53,574 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.5584, 2.5351, 2.5905, 2.3849, 2.5589, 2.4852, 2.5842, 2.0562], device='cuda:2'), covar=tensor([0.1087, 0.1413, 0.1675, 0.2903, 0.0996, 0.2105, 0.1403, 0.2931], device='cuda:2'), in_proj_covar=tensor([0.0216, 0.0219, 0.0232, 0.0285, 0.0194, 0.0294, 0.0217, 0.0238], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:52:37,244 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0819, 5.0329, 4.8127, 3.2731, 4.9679, 4.8028, 4.3264, 2.8371], device='cuda:2'), covar=tensor([0.0120, 0.0099, 0.0306, 0.0846, 0.0097, 0.0183, 0.0304, 0.1273], device='cuda:2'), in_proj_covar=tensor([0.0080, 0.0109, 0.0113, 0.0114, 0.0091, 0.0121, 0.0103, 0.0106], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:52:37,270 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4153, 3.1117, 3.3324, 4.4151, 3.9394, 3.9953, 3.1265, 2.3656], device='cuda:2'), covar=tensor([0.0707, 0.1740, 0.0864, 0.0534, 0.0922, 0.0492, 0.1341, 0.2140], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0224, 0.0189, 0.0234, 0.0244, 0.0198, 0.0208, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-09 14:52:43,009 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 1.738e+02 2.282e+02 2.861e+02 7.444e+02, threshold=4.565e+02, percent-clipped=5.0 2023-03-09 14:52:48,612 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1938, 2.9418, 3.0906, 4.2798, 3.8365, 3.8762, 2.9722, 2.1858], device='cuda:2'), covar=tensor([0.0870, 0.1828, 0.0963, 0.0594, 0.0989, 0.0512, 0.1440, 0.2215], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0223, 0.0189, 0.0234, 0.0243, 0.0197, 0.0208, 0.0196], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:2') 2023-03-09 14:53:08,577 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=117577.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:53:11,453 INFO [train2.py:809] (2/4) Epoch 30, batch 2050, loss[ctc_loss=0.04789, att_loss=0.2171, 
loss=0.1833, over 16393.00 frames. utt_duration=1492 frames, utt_pad_proportion=0.007561, over 44.00 utterances.], tot_loss[ctc_loss=0.06267, att_loss=0.23, loss=0.1965, over 3276857.98 frames. utt_duration=1260 frames, utt_pad_proportion=0.04885, over 10411.84 utterances.], batch size: 44, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:53:48,481 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6167, 4.8907, 4.5069, 4.8774, 4.4025, 4.5416, 4.9902, 4.7876], device='cuda:2'), covar=tensor([0.0603, 0.0331, 0.0827, 0.0411, 0.0414, 0.0389, 0.0254, 0.0227], device='cuda:2'), in_proj_covar=tensor([0.0405, 0.0342, 0.0384, 0.0386, 0.0338, 0.0245, 0.0322, 0.0303], device='cuda:2'), out_proj_covar=tensor([0.0006, 0.0006, 0.0006, 0.0006, 0.0005, 0.0004, 0.0005, 0.0005], device='cuda:2') 2023-03-09 14:54:32,196 INFO [train2.py:809] (2/4) Epoch 30, batch 2100, loss[ctc_loss=0.07169, att_loss=0.2488, loss=0.2134, over 17003.00 frames. utt_duration=1335 frames, utt_pad_proportion=0.00869, over 51.00 utterances.], tot_loss[ctc_loss=0.06312, att_loss=0.2305, loss=0.197, over 3280379.55 frames. utt_duration=1260 frames, utt_pad_proportion=0.04923, over 10422.58 utterances.], batch size: 51, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:55:24,881 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.295e+02 1.716e+02 1.992e+02 2.490e+02 6.469e+02, threshold=3.983e+02, percent-clipped=1.0 2023-03-09 14:55:39,894 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-03-09 14:55:48,473 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9316, 4.9681, 4.6876, 2.8620, 4.7336, 4.6615, 4.2711, 2.6014], device='cuda:2'), covar=tensor([0.0121, 0.0109, 0.0335, 0.1135, 0.0118, 0.0222, 0.0332, 0.1485], device='cuda:2'), in_proj_covar=tensor([0.0080, 0.0109, 0.0113, 0.0114, 0.0091, 0.0121, 0.0103, 0.0106], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 14:55:52,862 INFO [train2.py:809] (2/4) Epoch 30, batch 2150, loss[ctc_loss=0.08104, att_loss=0.2613, loss=0.2252, over 17309.00 frames. utt_duration=1260 frames, utt_pad_proportion=0.01068, over 55.00 utterances.], tot_loss[ctc_loss=0.06283, att_loss=0.2311, loss=0.1975, over 3283246.87 frames. utt_duration=1262 frames, utt_pad_proportion=0.04795, over 10418.08 utterances.], batch size: 55, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:55:57,710 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.9167, 5.2708, 5.0932, 5.2120, 5.2986, 4.9670, 3.7266, 5.2846], device='cuda:2'), covar=tensor([0.0109, 0.0101, 0.0133, 0.0072, 0.0093, 0.0115, 0.0625, 0.0152], device='cuda:2'), in_proj_covar=tensor([0.0099, 0.0094, 0.0121, 0.0075, 0.0082, 0.0093, 0.0108, 0.0113], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 14:56:02,888 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=117685.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 14:56:48,312 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2023-03-09 14:56:53,009 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-09 14:57:12,172 INFO [train2.py:809] (2/4) Epoch 30, batch 2200, loss[ctc_loss=0.07635, att_loss=0.219, loss=0.1905, over 15881.00 frames. 
utt_duration=1630 frames, utt_pad_proportion=0.009456, over 39.00 utterances.], tot_loss[ctc_loss=0.06279, att_loss=0.2305, loss=0.197, over 3274820.77 frames. utt_duration=1228 frames, utt_pad_proportion=0.0592, over 10676.72 utterances.], batch size: 39, lr: 3.57e-03, grad_scale: 8.0 2023-03-09 14:58:03,079 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.299e+02 1.825e+02 2.078e+02 2.480e+02 5.999e+02, threshold=4.155e+02, percent-clipped=3.0 2023-03-09 14:58:21,734 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.1715, 4.4369, 4.5456, 4.7453, 2.7443, 4.5120, 3.1846, 1.9271], device='cuda:2'), covar=tensor([0.0478, 0.0336, 0.0556, 0.0278, 0.1552, 0.0299, 0.1129, 0.1666], device='cuda:2'), in_proj_covar=tensor([0.0225, 0.0197, 0.0267, 0.0185, 0.0225, 0.0176, 0.0235, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:58:30,566 INFO [train2.py:809] (2/4) Epoch 30, batch 2250, loss[ctc_loss=0.06733, att_loss=0.2468, loss=0.2109, over 17021.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007671, over 51.00 utterances.], tot_loss[ctc_loss=0.06266, att_loss=0.2305, loss=0.1969, over 3277024.23 frames. utt_duration=1241 frames, utt_pad_proportion=0.05579, over 10574.44 utterances.], batch size: 51, lr: 3.56e-03, grad_scale: 8.0 2023-03-09 14:59:14,652 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9115, 3.6620, 3.6704, 3.1199, 3.6630, 3.7006, 3.7110, 2.5590], device='cuda:2'), covar=tensor([0.1148, 0.1122, 0.1423, 0.3237, 0.1306, 0.1794, 0.0833, 0.3514], device='cuda:2'), in_proj_covar=tensor([0.0215, 0.0220, 0.0232, 0.0284, 0.0194, 0.0295, 0.0216, 0.0238], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 14:59:50,948 INFO [train2.py:809] (2/4) Epoch 30, batch 2300, loss[ctc_loss=0.06821, att_loss=0.2465, loss=0.2108, over 17330.00 frames. utt_duration=1102 frames, utt_pad_proportion=0.036, over 63.00 utterances.], tot_loss[ctc_loss=0.06233, att_loss=0.2307, loss=0.197, over 3273966.57 frames. utt_duration=1222 frames, utt_pad_proportion=0.06059, over 10729.65 utterances.], batch size: 63, lr: 3.56e-03, grad_scale: 8.0 2023-03-09 15:00:42,222 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.175e+02 1.803e+02 2.047e+02 2.475e+02 6.158e+02, threshold=4.093e+02, percent-clipped=3.0 2023-03-09 15:00:45,719 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6913, 2.4464, 5.0482, 4.0811, 3.3699, 4.4369, 4.8438, 4.8092], device='cuda:2'), covar=tensor([0.0202, 0.1534, 0.0166, 0.0687, 0.1386, 0.0210, 0.0135, 0.0208], device='cuda:2'), in_proj_covar=tensor([0.0245, 0.0249, 0.0239, 0.0329, 0.0275, 0.0251, 0.0232, 0.0252], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:01:07,144 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=117877.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:01:09,861 INFO [train2.py:809] (2/4) Epoch 30, batch 2350, loss[ctc_loss=0.05435, att_loss=0.2165, loss=0.1841, over 15887.00 frames. utt_duration=1631 frames, utt_pad_proportion=0.00916, over 39.00 utterances.], tot_loss[ctc_loss=0.06268, att_loss=0.2309, loss=0.1973, over 3274395.01 frames. 
utt_duration=1238 frames, utt_pad_proportion=0.0568, over 10590.39 utterances.], batch size: 39, lr: 3.56e-03, grad_scale: 8.0 2023-03-09 15:02:14,443 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.0081, 5.0694, 4.7774, 2.2929, 2.0531, 3.0309, 2.7507, 3.8976], device='cuda:2'), covar=tensor([0.0777, 0.0371, 0.0347, 0.5610, 0.5371, 0.2300, 0.3316, 0.1606], device='cuda:2'), in_proj_covar=tensor([0.0369, 0.0311, 0.0282, 0.0257, 0.0343, 0.0337, 0.0266, 0.0377], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 15:02:23,401 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=117925.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:02:29,491 INFO [train2.py:809] (2/4) Epoch 30, batch 2400, loss[ctc_loss=0.0766, att_loss=0.2402, loss=0.2075, over 16271.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.007056, over 43.00 utterances.], tot_loss[ctc_loss=0.06283, att_loss=0.2308, loss=0.1972, over 3277291.86 frames. utt_duration=1251 frames, utt_pad_proportion=0.05296, over 10495.33 utterances.], batch size: 43, lr: 3.56e-03, grad_scale: 8.0 2023-03-09 15:03:11,152 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. limit=2.0 2023-03-09 15:03:20,928 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.808e+02 2.132e+02 2.494e+02 4.997e+02, threshold=4.263e+02, percent-clipped=2.0 2023-03-09 15:03:48,718 INFO [train2.py:809] (2/4) Epoch 30, batch 2450, loss[ctc_loss=0.0595, att_loss=0.2216, loss=0.1892, over 15637.00 frames. utt_duration=1692 frames, utt_pad_proportion=0.008724, over 37.00 utterances.], tot_loss[ctc_loss=0.06278, att_loss=0.2306, loss=0.1971, over 3277290.87 frames. utt_duration=1249 frames, utt_pad_proportion=0.05385, over 10509.11 utterances.], batch size: 37, lr: 3.56e-03, grad_scale: 8.0 2023-03-09 15:03:59,594 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=117985.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:05:13,630 INFO [train2.py:809] (2/4) Epoch 30, batch 2500, loss[ctc_loss=0.08209, att_loss=0.2485, loss=0.2152, over 16467.00 frames. utt_duration=1433 frames, utt_pad_proportion=0.006614, over 46.00 utterances.], tot_loss[ctc_loss=0.06304, att_loss=0.2311, loss=0.1975, over 3277632.31 frames. utt_duration=1248 frames, utt_pad_proportion=0.05445, over 10517.02 utterances.], batch size: 46, lr: 3.56e-03, grad_scale: 8.0 2023-03-09 15:05:21,135 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=118033.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:05:32,764 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=118040.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:06:07,287 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.938e+02 2.187e+02 2.706e+02 8.579e+02, threshold=4.374e+02, percent-clipped=4.0 2023-03-09 15:06:35,074 INFO [train2.py:809] (2/4) Epoch 30, batch 2550, loss[ctc_loss=0.04112, att_loss=0.2, loss=0.1682, over 15794.00 frames. utt_duration=1664 frames, utt_pad_proportion=0.007317, over 38.00 utterances.], tot_loss[ctc_loss=0.06273, att_loss=0.2304, loss=0.1968, over 3272860.18 frames. utt_duration=1251 frames, utt_pad_proportion=0.05477, over 10473.93 utterances.], batch size: 38, lr: 3.56e-03, grad_scale: 4.0 2023-03-09 15:06:38,981 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. 
limit=2.0 2023-03-09 15:07:11,127 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=118101.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:07:30,128 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2023-03-09 15:07:49,915 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8487, 3.3942, 3.8793, 3.4418, 3.8838, 4.8266, 4.6699, 3.8226], device='cuda:2'), covar=tensor([0.0286, 0.1378, 0.1158, 0.1088, 0.0961, 0.0863, 0.0563, 0.0900], device='cuda:2'), in_proj_covar=tensor([0.0252, 0.0251, 0.0292, 0.0219, 0.0273, 0.0384, 0.0276, 0.0238], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 15:07:54,643 INFO [train2.py:809] (2/4) Epoch 30, batch 2600, loss[ctc_loss=0.05009, att_loss=0.2117, loss=0.1794, over 13646.00 frames. utt_duration=1821 frames, utt_pad_proportion=0.07323, over 30.00 utterances.], tot_loss[ctc_loss=0.06283, att_loss=0.2304, loss=0.1969, over 3268300.01 frames. utt_duration=1241 frames, utt_pad_proportion=0.05827, over 10550.83 utterances.], batch size: 30, lr: 3.56e-03, grad_scale: 4.0 2023-03-09 15:08:20,488 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.5328, 4.9105, 4.8216, 4.8990, 4.9320, 4.6620, 3.6345, 4.8494], device='cuda:2'), covar=tensor([0.0127, 0.0110, 0.0133, 0.0077, 0.0113, 0.0103, 0.0635, 0.0200], device='cuda:2'), in_proj_covar=tensor([0.0098, 0.0094, 0.0121, 0.0075, 0.0081, 0.0092, 0.0108, 0.0113], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 15:08:43,345 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2023-03-09 15:08:46,944 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.196e+02 1.796e+02 2.196e+02 2.749e+02 4.358e+02, threshold=4.391e+02, percent-clipped=0.0 2023-03-09 15:08:56,474 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=118168.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:08:57,901 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([6.1002, 6.3451, 5.9058, 6.0355, 5.9994, 5.4797, 5.7866, 5.5235], device='cuda:2'), covar=tensor([0.1265, 0.0859, 0.0970, 0.0800, 0.0995, 0.1600, 0.2270, 0.2303], device='cuda:2'), in_proj_covar=tensor([0.0564, 0.0640, 0.0493, 0.0481, 0.0460, 0.0484, 0.0649, 0.0548], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 15:09:14,092 INFO [train2.py:809] (2/4) Epoch 30, batch 2650, loss[ctc_loss=0.05281, att_loss=0.2117, loss=0.18, over 15391.00 frames. utt_duration=1760 frames, utt_pad_proportion=0.01001, over 35.00 utterances.], tot_loss[ctc_loss=0.06247, att_loss=0.2302, loss=0.1967, over 3271216.95 frames. 
utt_duration=1243 frames, utt_pad_proportion=0.05604, over 10535.55 utterances.], batch size: 35, lr: 3.56e-03, grad_scale: 4.0 2023-03-09 15:09:14,471 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([2.9938, 3.8009, 3.8309, 3.2954, 3.7407, 3.7909, 3.7398, 2.7833], device='cuda:2'), covar=tensor([0.1124, 0.1064, 0.1339, 0.2696, 0.1461, 0.2805, 0.1123, 0.2941], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0217, 0.0231, 0.0280, 0.0193, 0.0292, 0.0215, 0.0236], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:10:14,691 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.8255, 5.2003, 5.0334, 5.1302, 5.2107, 4.8553, 3.6219, 5.1201], device='cuda:2'), covar=tensor([0.0119, 0.0111, 0.0140, 0.0079, 0.0103, 0.0116, 0.0678, 0.0169], device='cuda:2'), in_proj_covar=tensor([0.0099, 0.0095, 0.0121, 0.0075, 0.0082, 0.0093, 0.0109, 0.0114], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 15:10:33,458 INFO [train2.py:809] (2/4) Epoch 30, batch 2700, loss[ctc_loss=0.04938, att_loss=0.2098, loss=0.1777, over 14489.00 frames. utt_duration=1813 frames, utt_pad_proportion=0.04248, over 32.00 utterances.], tot_loss[ctc_loss=0.06264, att_loss=0.2299, loss=0.1964, over 3269042.96 frames. utt_duration=1258 frames, utt_pad_proportion=0.05259, over 10404.83 utterances.], batch size: 32, lr: 3.56e-03, grad_scale: 4.0 2023-03-09 15:10:33,864 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=118229.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 15:11:25,934 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.836e+02 2.266e+02 2.685e+02 4.437e+02, threshold=4.533e+02, percent-clipped=1.0 2023-03-09 15:11:37,235 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=118269.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:11:41,003 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2023-03-09 15:11:52,825 INFO [train2.py:809] (2/4) Epoch 30, batch 2750, loss[ctc_loss=0.06758, att_loss=0.2239, loss=0.1926, over 16287.00 frames. utt_duration=1516 frames, utt_pad_proportion=0.00693, over 43.00 utterances.], tot_loss[ctc_loss=0.06266, att_loss=0.2308, loss=0.1972, over 3284764.54 frames. utt_duration=1260 frames, utt_pad_proportion=0.04748, over 10443.34 utterances.], batch size: 43, lr: 3.56e-03, grad_scale: 4.0 2023-03-09 15:12:10,157 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=118289.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:13:12,424 INFO [train2.py:809] (2/4) Epoch 30, batch 2800, loss[ctc_loss=0.07401, att_loss=0.2274, loss=0.1967, over 15991.00 frames. utt_duration=1601 frames, utt_pad_proportion=0.007471, over 40.00 utterances.], tot_loss[ctc_loss=0.06278, att_loss=0.2305, loss=0.197, over 3283343.61 frames. 
utt_duration=1275 frames, utt_pad_proportion=0.04478, over 10313.71 utterances.], batch size: 40, lr: 3.56e-03, grad_scale: 8.0 2023-03-09 15:13:12,807 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=118329.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:13:15,023 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=118330.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:13:47,140 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=118350.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:14:05,709 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.224e+02 1.819e+02 2.314e+02 2.830e+02 4.183e+02, threshold=4.629e+02, percent-clipped=0.0 2023-03-09 15:14:26,538 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.48 vs. limit=2.0 2023-03-09 15:14:30,112 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.2387, 2.8827, 3.0484, 4.4768, 3.9251, 3.9590, 3.0327, 2.2749], device='cuda:2'), covar=tensor([0.0870, 0.2011, 0.1061, 0.0428, 0.0858, 0.0474, 0.1441, 0.2242], device='cuda:2'), in_proj_covar=tensor([0.0193, 0.0222, 0.0187, 0.0230, 0.0242, 0.0197, 0.0205, 0.0194], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:14:32,831 INFO [train2.py:809] (2/4) Epoch 30, batch 2850, loss[ctc_loss=0.05013, att_loss=0.2381, loss=0.2005, over 16777.00 frames. utt_duration=1400 frames, utt_pad_proportion=0.005978, over 48.00 utterances.], tot_loss[ctc_loss=0.06311, att_loss=0.2306, loss=0.1971, over 3269106.71 frames. utt_duration=1227 frames, utt_pad_proportion=0.05949, over 10671.73 utterances.], batch size: 48, lr: 3.56e-03, grad_scale: 8.0 2023-03-09 15:14:51,883 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=118390.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:15:01,136 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=118396.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:15:53,107 INFO [train2.py:809] (2/4) Epoch 30, batch 2900, loss[ctc_loss=0.06077, att_loss=0.2122, loss=0.1819, over 15762.00 frames. utt_duration=1661 frames, utt_pad_proportion=0.008264, over 38.00 utterances.], tot_loss[ctc_loss=0.06238, att_loss=0.2296, loss=0.1961, over 3261383.24 frames. utt_duration=1252 frames, utt_pad_proportion=0.05527, over 10429.02 utterances.], batch size: 38, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:16:07,230 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5353, 4.0233, 3.5303, 3.7533, 4.1968, 3.9402, 3.3409, 4.4473], device='cuda:2'), covar=tensor([0.0877, 0.0496, 0.1034, 0.0631, 0.0712, 0.0750, 0.0794, 0.0478], device='cuda:2'), in_proj_covar=tensor([0.0214, 0.0232, 0.0236, 0.0213, 0.0296, 0.0256, 0.0209, 0.0304], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:2') 2023-03-09 15:16:46,257 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.238e+02 1.677e+02 2.045e+02 2.519e+02 5.925e+02, threshold=4.089e+02, percent-clipped=1.0 2023-03-09 15:17:12,893 INFO [train2.py:809] (2/4) Epoch 30, batch 2950, loss[ctc_loss=0.0411, att_loss=0.2077, loss=0.1744, over 15862.00 frames. utt_duration=1628 frames, utt_pad_proportion=0.01083, over 39.00 utterances.], tot_loss[ctc_loss=0.06258, att_loss=0.2302, loss=0.1967, over 3270591.89 frames. 
utt_duration=1240 frames, utt_pad_proportion=0.05654, over 10563.93 utterances.], batch size: 39, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:17:24,448 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.78 vs. limit=2.0 2023-03-09 15:18:23,834 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=118524.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 15:18:31,908 INFO [train2.py:809] (2/4) Epoch 30, batch 3000, loss[ctc_loss=0.06379, att_loss=0.2281, loss=0.1952, over 16530.00 frames. utt_duration=1471 frames, utt_pad_proportion=0.006066, over 45.00 utterances.], tot_loss[ctc_loss=0.06311, att_loss=0.2301, loss=0.1967, over 3263301.36 frames. utt_duration=1230 frames, utt_pad_proportion=0.06054, over 10623.23 utterances.], batch size: 45, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:18:31,908 INFO [train2.py:834] (2/4) Computing validation loss 2023-03-09 15:18:46,178 INFO [train2.py:843] (2/4) Epoch 30, validation: ctc_loss=0.04128, att_loss=0.2346, loss=0.196, over 944034.00 frames. utt_duration=679.8 frames, utt_pad_proportion=0.1349, over 5567.00 utterances. 2023-03-09 15:18:46,179 INFO [train2.py:844] (2/4) Maximum memory allocated so far is 16150MB 2023-03-09 15:19:08,068 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8667, 3.9628, 3.8450, 4.3593, 2.6225, 4.1385, 2.8260, 1.9028], device='cuda:2'), covar=tensor([0.0654, 0.0423, 0.0816, 0.0332, 0.1683, 0.0307, 0.1480, 0.1771], device='cuda:2'), in_proj_covar=tensor([0.0227, 0.0199, 0.0269, 0.0187, 0.0226, 0.0177, 0.0236, 0.0206], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:19:37,716 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.6415, 5.0502, 4.8671, 5.0349, 5.0585, 4.7724, 3.4603, 4.9564], device='cuda:2'), covar=tensor([0.0127, 0.0109, 0.0136, 0.0079, 0.0105, 0.0111, 0.0710, 0.0168], device='cuda:2'), in_proj_covar=tensor([0.0097, 0.0093, 0.0120, 0.0074, 0.0081, 0.0092, 0.0107, 0.0111], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 15:19:38,942 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.708e+02 2.016e+02 2.374e+02 4.392e+02, threshold=4.031e+02, percent-clipped=1.0 2023-03-09 15:20:05,703 INFO [train2.py:809] (2/4) Epoch 30, batch 3050, loss[ctc_loss=0.06067, att_loss=0.2403, loss=0.2043, over 16469.00 frames. utt_duration=1434 frames, utt_pad_proportion=0.006508, over 46.00 utterances.], tot_loss[ctc_loss=0.06272, att_loss=0.2297, loss=0.1963, over 3262204.19 frames. 
utt_duration=1246 frames, utt_pad_proportion=0.05668, over 10483.90 utterances.], batch size: 46, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:20:21,830 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.4984, 3.9628, 3.4798, 3.6409, 4.1661, 3.8846, 3.3421, 4.4633], device='cuda:2'), covar=tensor([0.0837, 0.0464, 0.0958, 0.0679, 0.0713, 0.0672, 0.0797, 0.0450], device='cuda:2'), in_proj_covar=tensor([0.0213, 0.0232, 0.0235, 0.0212, 0.0295, 0.0255, 0.0209, 0.0303], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:2') 2023-03-09 15:21:20,127 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=118625.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:21:26,773 INFO [train2.py:809] (2/4) Epoch 30, batch 3100, loss[ctc_loss=0.07375, att_loss=0.2495, loss=0.2144, over 17394.00 frames. utt_duration=1181 frames, utt_pad_proportion=0.0193, over 59.00 utterances.], tot_loss[ctc_loss=0.06308, att_loss=0.2309, loss=0.1974, over 3271532.91 frames. utt_duration=1245 frames, utt_pad_proportion=0.05569, over 10525.50 utterances.], batch size: 59, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:21:30,269 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=118631.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:21:41,740 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.0703, 5.0929, 4.8268, 3.0393, 4.8990, 4.7382, 4.5402, 2.9684], device='cuda:2'), covar=tensor([0.0127, 0.0112, 0.0282, 0.1049, 0.0114, 0.0209, 0.0263, 0.1274], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0108, 0.0113, 0.0114, 0.0091, 0.0120, 0.0102, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 15:21:53,103 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=118645.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:21:59,591 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3123, 3.8800, 3.3794, 3.4966, 4.0580, 3.7586, 3.2343, 4.2798], device='cuda:2'), covar=tensor([0.0931, 0.0512, 0.1003, 0.0747, 0.0736, 0.0777, 0.0846, 0.0527], device='cuda:2'), in_proj_covar=tensor([0.0212, 0.0231, 0.0234, 0.0212, 0.0294, 0.0255, 0.0208, 0.0301], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0005], device='cuda:2') 2023-03-09 15:22:03,336 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2023-03-09 15:22:19,879 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.328e+02 1.896e+02 2.204e+02 2.593e+02 6.390e+02, threshold=4.408e+02, percent-clipped=1.0 2023-03-09 15:22:47,061 INFO [train2.py:809] (2/4) Epoch 30, batch 3150, loss[ctc_loss=0.04342, att_loss=0.2051, loss=0.1728, over 15872.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.007839, over 39.00 utterances.], tot_loss[ctc_loss=0.06254, att_loss=0.2308, loss=0.1972, over 3277336.71 frames. 
utt_duration=1251 frames, utt_pad_proportion=0.05256, over 10494.18 utterances.], batch size: 39, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:22:56,720 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=118685.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:23:08,832 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=118692.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:23:14,664 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=118696.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:23:17,767 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6642, 5.9746, 5.4715, 5.6818, 5.6315, 5.1224, 5.3650, 5.1560], device='cuda:2'), covar=tensor([0.1677, 0.0912, 0.0973, 0.0899, 0.1127, 0.1843, 0.2644, 0.2475], device='cuda:2'), in_proj_covar=tensor([0.0568, 0.0645, 0.0494, 0.0481, 0.0459, 0.0484, 0.0648, 0.0551], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 15:23:50,385 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2023-03-09 15:24:06,683 INFO [train2.py:809] (2/4) Epoch 30, batch 3200, loss[ctc_loss=0.07379, att_loss=0.2465, loss=0.2119, over 17290.00 frames. utt_duration=877.2 frames, utt_pad_proportion=0.08148, over 79.00 utterances.], tot_loss[ctc_loss=0.06264, att_loss=0.2311, loss=0.1974, over 3283174.88 frames. utt_duration=1242 frames, utt_pad_proportion=0.05236, over 10588.21 utterances.], batch size: 79, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:24:30,974 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=118744.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:24:59,444 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.732e+02 2.054e+02 2.597e+02 6.136e+02, threshold=4.108e+02, percent-clipped=5.0 2023-03-09 15:25:26,528 INFO [train2.py:809] (2/4) Epoch 30, batch 3250, loss[ctc_loss=0.05544, att_loss=0.2403, loss=0.2033, over 17024.00 frames. utt_duration=1337 frames, utt_pad_proportion=0.007482, over 51.00 utterances.], tot_loss[ctc_loss=0.06281, att_loss=0.2309, loss=0.1973, over 3286728.27 frames. utt_duration=1245 frames, utt_pad_proportion=0.0509, over 10570.35 utterances.], batch size: 51, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:25:37,221 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=2.92 vs. limit=5.0 2023-03-09 15:26:39,449 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=118824.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:26:41,730 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.31 vs. limit=2.0 2023-03-09 15:26:46,840 INFO [train2.py:809] (2/4) Epoch 30, batch 3300, loss[ctc_loss=0.06335, att_loss=0.2386, loss=0.2035, over 17465.00 frames. utt_duration=1014 frames, utt_pad_proportion=0.04355, over 69.00 utterances.], tot_loss[ctc_loss=0.06216, att_loss=0.2307, loss=0.197, over 3294427.66 frames. 
utt_duration=1258 frames, utt_pad_proportion=0.04563, over 10484.76 utterances.], batch size: 69, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:27:02,312 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1125, 5.3409, 5.3056, 5.3131, 5.3705, 5.3242, 4.9623, 4.7510], device='cuda:2'), covar=tensor([0.0948, 0.0509, 0.0304, 0.0442, 0.0293, 0.0325, 0.0454, 0.0365], device='cuda:2'), in_proj_covar=tensor([0.0538, 0.0384, 0.0382, 0.0386, 0.0451, 0.0454, 0.0385, 0.0422], device='cuda:2'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:2') 2023-03-09 15:27:11,425 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.6495, 5.9293, 5.4233, 5.6643, 5.6177, 5.1194, 5.3461, 5.0591], device='cuda:2'), covar=tensor([0.1302, 0.0970, 0.0951, 0.0887, 0.0940, 0.1501, 0.2274, 0.2297], device='cuda:2'), in_proj_covar=tensor([0.0566, 0.0645, 0.0495, 0.0482, 0.0459, 0.0482, 0.0648, 0.0550], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:2') 2023-03-09 15:27:39,412 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.286e+02 1.786e+02 1.999e+02 2.429e+02 6.941e+02, threshold=3.998e+02, percent-clipped=3.0 2023-03-09 15:27:55,370 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=118872.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:28:06,047 INFO [train2.py:809] (2/4) Epoch 30, batch 3350, loss[ctc_loss=0.05701, att_loss=0.2167, loss=0.1848, over 15943.00 frames. utt_duration=1557 frames, utt_pad_proportion=0.005299, over 41.00 utterances.], tot_loss[ctc_loss=0.06261, att_loss=0.2309, loss=0.1973, over 3291639.12 frames. utt_duration=1255 frames, utt_pad_proportion=0.04725, over 10501.59 utterances.], batch size: 41, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:28:24,881 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.8972, 4.8817, 4.7588, 2.1496, 1.8964, 2.7488, 2.2384, 3.8528], device='cuda:2'), covar=tensor([0.0825, 0.0295, 0.0310, 0.5354, 0.5519, 0.2740, 0.4006, 0.1553], device='cuda:2'), in_proj_covar=tensor([0.0366, 0.0311, 0.0283, 0.0257, 0.0342, 0.0336, 0.0267, 0.0375], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:2') 2023-03-09 15:28:50,466 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.1517, 3.7305, 3.1901, 3.3445, 3.9249, 3.6352, 3.0540, 4.2261], device='cuda:2'), covar=tensor([0.0905, 0.0490, 0.0996, 0.0766, 0.0706, 0.0750, 0.0804, 0.0389], device='cuda:2'), in_proj_covar=tensor([0.0211, 0.0229, 0.0233, 0.0211, 0.0293, 0.0253, 0.0208, 0.0301], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:2') 2023-03-09 15:29:01,267 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.5296, 3.7754, 3.6348, 3.6686, 3.9605, 3.6918, 3.5525, 2.7977], device='cuda:2'), covar=tensor([0.0536, 0.0562, 0.0516, 0.0563, 0.0535, 0.0402, 0.0535, 0.1353], device='cuda:2'), in_proj_covar=tensor([0.0208, 0.0237, 0.0231, 0.0248, 0.0390, 0.0204, 0.0223, 0.0226], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:29:17,635 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3560, 2.5620, 3.0978, 2.6155, 3.0656, 3.4595, 3.3620, 2.7773], device='cuda:2'), covar=tensor([0.0502, 0.1498, 0.1060, 0.1161, 0.0924, 0.1187, 0.0687, 0.1128], device='cuda:2'), 
in_proj_covar=tensor([0.0253, 0.0252, 0.0295, 0.0222, 0.0277, 0.0387, 0.0278, 0.0239], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:2') 2023-03-09 15:29:20,612 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=118925.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:29:26,687 INFO [train2.py:809] (2/4) Epoch 30, batch 3400, loss[ctc_loss=0.05252, att_loss=0.1992, loss=0.1699, over 15374.00 frames. utt_duration=1758 frames, utt_pad_proportion=0.0106, over 35.00 utterances.], tot_loss[ctc_loss=0.06227, att_loss=0.2306, loss=0.1969, over 3282415.86 frames. utt_duration=1240 frames, utt_pad_proportion=0.05395, over 10604.80 utterances.], batch size: 35, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:29:53,492 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=118945.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:30:19,283 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.021e+02 1.939e+02 2.194e+02 2.759e+02 6.591e+02, threshold=4.388e+02, percent-clipped=2.0 2023-03-09 15:30:37,147 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=118973.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:30:46,796 INFO [train2.py:809] (2/4) Epoch 30, batch 3450, loss[ctc_loss=0.05425, att_loss=0.2327, loss=0.197, over 16880.00 frames. utt_duration=1379 frames, utt_pad_proportion=0.006935, over 49.00 utterances.], tot_loss[ctc_loss=0.06213, att_loss=0.2304, loss=0.1967, over 3288550.46 frames. utt_duration=1242 frames, utt_pad_proportion=0.05205, over 10602.96 utterances.], batch size: 49, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:30:57,726 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=118985.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:31:00,745 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=118987.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:31:09,824 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=118993.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:32:07,441 INFO [train2.py:809] (2/4) Epoch 30, batch 3500, loss[ctc_loss=0.09391, att_loss=0.2494, loss=0.2183, over 14008.00 frames. utt_duration=380 frames, utt_pad_proportion=0.3298, over 148.00 utterances.], tot_loss[ctc_loss=0.06226, att_loss=0.2306, loss=0.1969, over 3288919.05 frames. utt_duration=1234 frames, utt_pad_proportion=0.05447, over 10674.31 utterances.], batch size: 148, lr: 3.55e-03, grad_scale: 8.0 2023-03-09 15:32:15,048 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=119033.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:32:19,432 INFO [scaling.py:679] (2/4) Whitening: num_groups=1, num_channels=384, metric=4.71 vs. 
limit=5.0 2023-03-09 15:32:59,918 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.231e+02 1.837e+02 2.167e+02 2.597e+02 5.563e+02, threshold=4.335e+02, percent-clipped=3.0 2023-03-09 15:33:11,438 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.0725, 3.8335, 3.7156, 3.2649, 3.7778, 3.7426, 3.8380, 2.8018], device='cuda:2'), covar=tensor([0.1064, 0.1081, 0.2570, 0.3570, 0.1054, 0.2938, 0.1120, 0.3330], device='cuda:2'), in_proj_covar=tensor([0.0217, 0.0222, 0.0235, 0.0285, 0.0198, 0.0295, 0.0218, 0.0239], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:33:20,393 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=119075.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:33:27,089 INFO [train2.py:809] (2/4) Epoch 30, batch 3550, loss[ctc_loss=0.05586, att_loss=0.2083, loss=0.1778, over 14577.00 frames. utt_duration=1824 frames, utt_pad_proportion=0.03617, over 32.00 utterances.], tot_loss[ctc_loss=0.06251, att_loss=0.23, loss=0.1965, over 3278265.22 frames. utt_duration=1239 frames, utt_pad_proportion=0.05512, over 10593.69 utterances.], batch size: 32, lr: 3.54e-03, grad_scale: 8.0 2023-03-09 15:33:50,471 INFO [zipformer.py:625] (2/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=119093.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:34:10,198 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.8479, 5.1850, 5.4816, 5.2221, 5.4024, 5.8289, 5.1566, 5.9455], device='cuda:2'), covar=tensor([0.0730, 0.0750, 0.0806, 0.1323, 0.1857, 0.0875, 0.0789, 0.0640], device='cuda:2'), in_proj_covar=tensor([0.0926, 0.0540, 0.0650, 0.0688, 0.0913, 0.0678, 0.0515, 0.0651], device='cuda:2'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003], device='cuda:2') 2023-03-09 15:34:21,273 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([3.3497, 2.8491, 3.4058, 4.4348, 3.9577, 4.0192, 2.9619, 2.3939], device='cuda:2'), covar=tensor([0.0849, 0.1961, 0.0880, 0.0539, 0.0854, 0.0495, 0.1402, 0.2133], device='cuda:2'), in_proj_covar=tensor([0.0194, 0.0224, 0.0188, 0.0233, 0.0243, 0.0198, 0.0207, 0.0195], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:34:45,500 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3555, 2.2671, 4.8325, 3.7703, 2.8582, 4.1592, 4.6576, 4.5656], device='cuda:2'), covar=tensor([0.0347, 0.1766, 0.0261, 0.0899, 0.1789, 0.0302, 0.0245, 0.0295], device='cuda:2'), in_proj_covar=tensor([0.0242, 0.0246, 0.0235, 0.0324, 0.0272, 0.0247, 0.0231, 0.0249], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:34:46,591 INFO [train2.py:809] (2/4) Epoch 30, batch 3600, loss[ctc_loss=0.05212, att_loss=0.2466, loss=0.2077, over 16769.00 frames. utt_duration=1399 frames, utt_pad_proportion=0.005523, over 48.00 utterances.], tot_loss[ctc_loss=0.06266, att_loss=0.2302, loss=0.1967, over 3283936.51 frames. 
utt_duration=1239 frames, utt_pad_proportion=0.05345, over 10612.93 utterances.], batch size: 48, lr: 3.54e-03, grad_scale: 8.0 2023-03-09 15:34:57,945 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=119136.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:35:26,361 INFO [zipformer.py:625] (2/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=119154.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 15:35:38,266 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.787e+02 2.179e+02 2.834e+02 1.118e+03, threshold=4.357e+02, percent-clipped=5.0 2023-03-09 15:35:49,149 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2023-03-09 15:36:05,603 INFO [train2.py:809] (2/4) Epoch 30, batch 3650, loss[ctc_loss=0.04422, att_loss=0.2121, loss=0.1785, over 15864.00 frames. utt_duration=1629 frames, utt_pad_proportion=0.01056, over 39.00 utterances.], tot_loss[ctc_loss=0.06234, att_loss=0.2298, loss=0.1963, over 3278534.87 frames. utt_duration=1246 frames, utt_pad_proportion=0.05269, over 10540.88 utterances.], batch size: 39, lr: 3.54e-03, grad_scale: 8.0 2023-03-09 15:37:24,888 INFO [train2.py:809] (2/4) Epoch 30, batch 3700, loss[ctc_loss=0.08834, att_loss=0.2505, loss=0.2181, over 17349.00 frames. utt_duration=1178 frames, utt_pad_proportion=0.02098, over 59.00 utterances.], tot_loss[ctc_loss=0.06268, att_loss=0.2304, loss=0.1968, over 3284910.30 frames. utt_duration=1255 frames, utt_pad_proportion=0.0494, over 10483.71 utterances.], batch size: 59, lr: 3.54e-03, grad_scale: 8.0 2023-03-09 15:38:18,045 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.417e+01 1.774e+02 2.178e+02 2.630e+02 5.010e+02, threshold=4.357e+02, percent-clipped=3.0 2023-03-09 15:38:45,447 INFO [train2.py:809] (2/4) Epoch 30, batch 3750, loss[ctc_loss=0.04848, att_loss=0.2154, loss=0.182, over 15956.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006902, over 41.00 utterances.], tot_loss[ctc_loss=0.06243, att_loss=0.2302, loss=0.1967, over 3284224.65 frames. utt_duration=1254 frames, utt_pad_proportion=0.05006, over 10492.33 utterances.], batch size: 41, lr: 3.54e-03, grad_scale: 8.0 2023-03-09 15:38:59,238 INFO [zipformer.py:625] (2/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=119287.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:39:32,056 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.4804, 2.3053, 4.9263, 3.8571, 3.0822, 4.2058, 4.7173, 4.6347], device='cuda:2'), covar=tensor([0.0304, 0.1625, 0.0203, 0.0830, 0.1532, 0.0274, 0.0225, 0.0257], device='cuda:2'), in_proj_covar=tensor([0.0241, 0.0244, 0.0235, 0.0323, 0.0270, 0.0247, 0.0230, 0.0248], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:39:51,050 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.1679, 5.1487, 4.9325, 3.2277, 5.0199, 4.7827, 4.4297, 2.8689], device='cuda:2'), covar=tensor([0.0111, 0.0095, 0.0267, 0.0907, 0.0100, 0.0184, 0.0314, 0.1317], device='cuda:2'), in_proj_covar=tensor([0.0079, 0.0108, 0.0112, 0.0113, 0.0091, 0.0119, 0.0102, 0.0105], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 15:40:05,285 INFO [train2.py:809] (2/4) Epoch 30, batch 3800, loss[ctc_loss=0.0591, att_loss=0.2392, loss=0.2032, over 16769.00 frames. 
utt_duration=1399 frames, utt_pad_proportion=0.006481, over 48.00 utterances.], tot_loss[ctc_loss=0.06292, att_loss=0.2308, loss=0.1972, over 3279821.68 frames. utt_duration=1220 frames, utt_pad_proportion=0.05995, over 10762.70 utterances.], batch size: 48, lr: 3.54e-03, grad_scale: 8.0 2023-03-09 15:40:15,894 INFO [zipformer.py:625] (2/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=119335.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:40:58,011 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.731e+02 2.132e+02 2.565e+02 6.876e+02, threshold=4.265e+02, percent-clipped=4.0 2023-03-09 15:41:24,904 INFO [train2.py:809] (2/4) Epoch 30, batch 3850, loss[ctc_loss=0.07028, att_loss=0.2341, loss=0.2013, over 16300.00 frames. utt_duration=1518 frames, utt_pad_proportion=0.006107, over 43.00 utterances.], tot_loss[ctc_loss=0.06335, att_loss=0.2304, loss=0.197, over 3269430.36 frames. utt_duration=1219 frames, utt_pad_proportion=0.06292, over 10744.56 utterances.], batch size: 43, lr: 3.54e-03, grad_scale: 8.0 2023-03-09 15:41:44,892 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([5.2575, 5.2643, 5.0846, 3.5312, 5.0926, 4.9258, 4.6653, 3.2515], device='cuda:2'), covar=tensor([0.0109, 0.0082, 0.0239, 0.0737, 0.0088, 0.0158, 0.0243, 0.1026], device='cuda:2'), in_proj_covar=tensor([0.0078, 0.0107, 0.0112, 0.0112, 0.0090, 0.0119, 0.0102, 0.0104], device='cuda:2'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:2') 2023-03-09 15:41:52,667 INFO [zipformer.py:1447] (2/4) attn_weights_entropy = tensor([4.3708, 4.5544, 4.5517, 4.6843, 5.1040, 4.4680, 4.5476, 2.9107], device='cuda:2'), covar=tensor([0.0338, 0.0455, 0.0411, 0.0370, 0.0832, 0.0318, 0.0422, 0.1596], device='cuda:2'), in_proj_covar=tensor([0.0209, 0.0238, 0.0233, 0.0248, 0.0392, 0.0205, 0.0224, 0.0227], device='cuda:2'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:2') 2023-03-09 15:42:43,260 INFO [train2.py:809] (2/4) Epoch 30, batch 3900, loss[ctc_loss=0.05236, att_loss=0.2299, loss=0.1944, over 16267.00 frames. utt_duration=1515 frames, utt_pad_proportion=0.008072, over 43.00 utterances.], tot_loss[ctc_loss=0.06337, att_loss=0.2304, loss=0.197, over 3265352.53 frames. utt_duration=1239 frames, utt_pad_proportion=0.05868, over 10554.90 utterances.], batch size: 43, lr: 3.54e-03, grad_scale: 8.0 2023-03-09 15:42:46,468 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=119431.0, num_to_drop=0, layers_to_drop=set() 2023-03-09 15:43:14,043 INFO [zipformer.py:625] (2/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=119449.0, num_to_drop=1, layers_to_drop={0} 2023-03-09 15:43:33,706 INFO [optim.py:369] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.815e+02 2.229e+02 2.704e+02 7.416e+02, threshold=4.458e+02, percent-clipped=4.0 2023-03-09 15:43:59,616 INFO [train2.py:809] (2/4) Epoch 30, batch 3950, loss[ctc_loss=0.04859, att_loss=0.2248, loss=0.1896, over 15952.00 frames. utt_duration=1558 frames, utt_pad_proportion=0.006549, over 41.00 utterances.], tot_loss[ctc_loss=0.06332, att_loss=0.23, loss=0.1966, over 3263911.44 frames. utt_duration=1243 frames, utt_pad_proportion=0.05818, over 10514.03 utterances.], batch size: 41, lr: 3.54e-03, grad_scale: 8.0 2023-03-09 15:44:12,001 INFO [scaling.py:679] (2/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2023-03-09 15:44:52,312 INFO [train2.py:1037] (2/4) Done!
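
Notes on the quantities reported in the log above (appended for reference; the Python below is illustrative only and is not taken from the recipe's code).

Each per-batch report prints three numbers: ctc_loss, att_loss, and loss. The logged values are consistent with a fixed interpolation that puts a weight of about 0.8 on the attention loss and 0.2 on the CTC loss; for example, at epoch 30, batch 1700 above, 0.8 * 0.213 + 0.2 * 0.05378 ≈ 0.1812, which matches the printed loss. A minimal sketch of such a combination (the function name and the 0.8 weight are inferred from the printed values, not read from the code):

    def combine_losses(ctc_loss: float, att_loss: float, att_weight: float = 0.8) -> float:
        """Weighted interpolation of the two criteria; with att_weight=0.8 this
        reproduces the 'loss' column printed for each batch."""
        return att_weight * att_loss + (1.0 - att_weight) * ctc_loss

    assert abs(combine_losses(0.05378, 0.213) - 0.1812) < 1e-3   # epoch 30, batch 1700
    assert abs(combine_losses(0.06754, 0.2262) - 0.1944) < 1e-3  # epoch 30, batch 1750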
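The optim.py lines report grad-norm statistics together with a clipping threshold. In every instance above the printed threshold equals Clipping_scale times the middle of the five grad-norm values (e.g. 2.0 * 2.042e+02 = 4.084e+02 for the batch-1800 report), which suggests the threshold is set to twice the median of recently observed gradient norms, with percent-clipped the share of recent steps whose norm exceeded it. A small check of that arithmetic (reading the five numbers as 0/25/50/75/100th percentiles is an assumption):

    grad_norm_percentiles = [1.207e2, 1.761e2, 2.042e2, 2.458e2, 5.519e2]  # batch-1800 report
    clipping_scale = 2.0
    threshold = clipping_scale * grad_norm_percentiles[2]  # scale times the median
    assert abs(threshold - 4.084e2) < 1e-6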
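grad_scale in the per-batch report moves in powers of two (16.0 at batch 1700, 8.0 shortly after, briefly 4.0 around batches 2550-2700, then back to 8.0). That pattern is typical of dynamic loss scaling in mixed-precision training: the scale is halved when a step produces inf/nan gradients and grown again after a stretch of stable steps. A minimal sketch with PyTorch's AMP grad scaler (model, criterion and batch are placeholders; the recipe's own training loop may differ):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, backoff_factor=0.5,
                                       growth_factor=2.0, growth_interval=2000)

    def training_step(model, criterion, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if inf/nan gradients were found
        scaler.update()          # backs off (halves) on overflow, grows after enough good steps
        return loss.detach()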
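The scaling.py lines ("Whitening: num_groups=..., num_channels=..., metric=X vs. limit=Y") compare a whiteness statistic of some intermediate activations against a limit: values near 1.0 (most of the num_groups=8 reports above) indicate a nearly isotropic feature covariance, while larger values (e.g. metric=4.71 vs. limit=5.0 for num_groups=1) indicate a more uneven spectrum. One plausible form of such a metric, equal to 1.0 exactly when the covariance is a multiple of the identity and growing as the eigenvalue spectrum spreads out, is sketched below; this illustrates the idea and is not necessarily the recipe's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """Hypothetical whiteness statistic: E[lambda^2] / (E[lambda])^2 over the
        eigenvalues of the feature covariance; 1.0 for an isotropic covariance."""
        x = x.reshape(-1, x.shape[-1])              # (frames, channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]              # (channels, channels)
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

A typical use of such a statistic would be to add a penalty whenever it exceeds the logged limit, nudging the activations toward a flatter covariance spectrum.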
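The zipformer.py:625 lines track a per-step choice of encoder layers to bypass: at this stage (batch_count ≈ 118k, well past the printed warmup_end values of 1333.3-4000.0) num_to_drop is almost always 0, with an occasional single layer dropped (layers_to_drop={0}). This looks like stochastic layer skipping whose probability is high during warmup and small afterwards. A minimal sketch of that idea (the probabilities and the annealing shape are assumptions, not the recipe's exact schedule):

    import random

    def pick_layers_to_drop(num_layers: int, batch_count: float,
                            warmup_begin: float, warmup_end: float) -> set:
        """Choose a set of layer indices to bypass for this training step."""
        if batch_count < warmup_begin:
            p = 0.5
        elif batch_count < warmup_end:
            # anneal the drop probability linearly across the warmup window
            p = 0.5 * (warmup_end - batch_count) / (warmup_end - warmup_begin)
        else:
            p = 0.025  # small residual probability once warmup is over (assumed value)
        return {i for i in range(num_layers) if random.random() < p}

With a small residual probability like this, most steps drop no layer and a few drop exactly one, matching the mostly-empty layers_to_drop sets in the log above.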