2022-11-15 13:11:38,145 INFO [train.py:944] (0/4) Training started 2022-11-15 13:11:38,149 INFO [train.py:954] (0/4) Device: cuda:0 2022-11-15 13:11:38,154 INFO [train.py:963] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 100, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.21', 'k2-build-type': 'Debug', 'k2-with-cuda': True, 'k2-git-sha1': 'f271e82ef30f75fecbae44b163e1244e53def116', 'k2-git-date': 'Fri Oct 28 05:02:16 2022', 'lhotse-version': '1.9.0.dev+git.97bf4b0.dirty', 'torch-version': '1.10.0+cu111', 'torch-cuda-available': True, 'torch-cuda-version': '11.1', 'python-version': '3.8', 'icefall-git-branch': 'ami', 'icefall-git-sha1': '65f14ba-dirty', 'icefall-git-date': 'Mon Nov 14 18:45:09 2022', 'icefall-path': '/exp/draj/mini_scale_2022/icefall', 'k2-path': '/exp/draj/mini_scale_2022/k2/k2/python/k2/__init__.py', 'lhotse-path': '/exp/draj/mini_scale_2022/lhotse/lhotse/__init__.py', 'hostname': 'r8n04', 'IP address': '10.1.8.4'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('pruned_transducer_stateless7/exp/v2'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.05, 'lr_batches': 5000, 'lr_epochs': 3.5, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 5000, 'keep_last_k': 10, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'manifest_dir': PosixPath('data/manifests'), 'enable_musan': True, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'max_duration': 120, 'num_buckets': 50, 'on_the_fly_feats': False, 'shuffle': True, 'num_workers': 8, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'blank_id': 0, 'vocab_size': 500} 2022-11-15 13:11:38,154 INFO [train.py:965] (0/4) About to create model 2022-11-15 13:11:38,546 INFO [zipformer.py:176] (0/4) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8. 2022-11-15 13:11:38,557 INFO [train.py:969] (0/4) Number of model parameters: 70369391 2022-11-15 13:11:43,356 INFO [train.py:984] (0/4) Using DDP 2022-11-15 13:11:43,611 INFO [asr_datamodule.py:353] (0/4) About to get AMI train cuts 2022-11-15 13:11:43,616 INFO [asr_datamodule.py:201] (0/4) About to get Musan cuts 2022-11-15 13:11:44,967 INFO [asr_datamodule.py:206] (0/4) Enable MUSAN 2022-11-15 13:11:44,967 INFO [asr_datamodule.py:229] (0/4) Enable SpecAugment 2022-11-15 13:11:44,967 INFO [asr_datamodule.py:230] (0/4) Time warp factor: 80 2022-11-15 13:11:44,967 INFO [asr_datamodule.py:243] (0/4) About to create train dataset 2022-11-15 13:11:44,967 INFO [asr_datamodule.py:256] (0/4) Using DynamicBucketingSampler. 2022-11-15 13:11:45,329 INFO [asr_datamodule.py:264] (0/4) About to create train dataloader 2022-11-15 13:11:45,330 INFO [asr_datamodule.py:385] (0/4) About to get AMI IHM dev cuts 2022-11-15 13:11:45,331 INFO [asr_datamodule.py:296] (0/4) About to create dev dataset 2022-11-15 13:11:45,676 INFO [asr_datamodule.py:311] (0/4) About to create dev dataloader 2022-11-15 13:12:20,641 INFO [train.py:876] (0/4) Epoch 1, batch 0, loss[loss=3.735, simple_loss=3.372, pruned_loss=3.624, over 5621.00 frames. ], tot_loss[loss=3.735, simple_loss=3.372, pruned_loss=3.624, over 5621.00 frames. ], batch size: 23, lr: 2.50e-02, grad_scale: 2.0 2022-11-15 13:12:20,643 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 13:12:37,308 INFO [train.py:908] (0/4) Epoch 1, validation: loss=3.424, simple_loss=3.08, pruned_loss=3.435, over 1530663.00 frames. 2022-11-15 13:12:37,341 INFO [train.py:909] (0/4) Maximum memory allocated so far is 2688MB 2022-11-15 13:12:39,698 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5.0, num_to_drop=2, layers_to_drop={0, 1} 2022-11-15 13:12:50,431 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:13:01,678 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=12.99 vs. limit=2.0 2022-11-15 13:13:09,674 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=99.15 vs. limit=5.0 2022-11-15 13:13:21,377 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=93.22 vs. limit=5.0 2022-11-15 13:13:23,355 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:13:32,522 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.298e+01 5.514e+01 1.134e+02 1.922e+02 2.006e+03, threshold=2.268e+02, percent-clipped=0.0 2022-11-15 13:13:32,568 INFO [train.py:876] (0/4) Epoch 1, batch 100, loss[loss=0.3454, simple_loss=0.2949, pruned_loss=0.3982, over 4625.00 frames. ], tot_loss[loss=0.7377, simple_loss=0.6627, pruned_loss=0.6822, over 430356.48 frames. ], batch size: 5, lr: 3.00e-02, grad_scale: 2.0 2022-11-15 13:13:57,868 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=144.0, num_to_drop=2, layers_to_drop={2, 3} 2022-11-15 13:14:15,114 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=6.02 vs. limit=2.0 2022-11-15 13:14:31,594 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.547e+01 3.263e+01 4.120e+01 1.011e+02, threshold=6.525e+01, percent-clipped=0.0 2022-11-15 13:14:31,634 INFO [train.py:876] (0/4) Epoch 1, batch 200, loss[loss=0.4489, simple_loss=0.3849, pruned_loss=0.4376, over 5764.00 frames. ], tot_loss[loss=0.5534, simple_loss=0.4882, pruned_loss=0.5304, over 682995.19 frames. ], batch size: 20, lr: 3.50e-02, grad_scale: 2.0 2022-11-15 13:14:39,493 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=40.49 vs. limit=5.0 2022-11-15 13:15:24,760 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=296.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:15:27,524 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=300.0, num_to_drop=2, layers_to_drop={0, 3} 2022-11-15 13:15:28,297 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.173e+01 3.095e+01 4.109e+01 5.560e+01 3.461e+02, threshold=8.218e+01, percent-clipped=17.0 2022-11-15 13:15:28,336 INFO [train.py:876] (0/4) Epoch 1, batch 300, loss[loss=0.417, simple_loss=0.3557, pruned_loss=0.3739, over 5473.00 frames. ], tot_loss[loss=0.48, simple_loss=0.4169, pruned_loss=0.4575, over 844780.88 frames. ], batch size: 58, lr: 4.00e-02, grad_scale: 2.0 2022-11-15 13:15:33,793 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=9.53 vs. limit=5.0 2022-11-15 13:15:50,085 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.99 vs. limit=5.0 2022-11-15 13:15:54,349 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=4.55 vs. limit=2.0 2022-11-15 13:16:00,380 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=357.0, num_to_drop=2, layers_to_drop={1, 2} 2022-11-15 13:16:16,638 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=387.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:16:24,455 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+01 3.559e+01 4.600e+01 6.651e+01 2.649e+02, threshold=9.199e+01, percent-clipped=13.0 2022-11-15 13:16:24,498 INFO [train.py:876] (0/4) Epoch 1, batch 400, loss[loss=0.3893, simple_loss=0.3221, pruned_loss=0.3547, over 5591.00 frames. ], tot_loss[loss=0.4506, simple_loss=0.3862, pruned_loss=0.4193, over 934794.35 frames. ], batch size: 23, lr: 4.50e-02, grad_scale: 4.0 2022-11-15 13:16:32,992 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4309, 4.5204, 4.5065, 4.4918, 4.2429, 4.4629, 4.4872, 4.5026], device='cuda:0'), covar=tensor([0.0451, 0.0160, 0.0145, 0.0284, 0.2261, 0.0657, 0.0194, 0.0260], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0010, 0.0009, 0.0009, 0.0009, 0.0010, 0.0010, 0.0009], device='cuda:0'), out_proj_covar=tensor([9.6285e-06, 9.6098e-06, 9.2074e-06, 9.3946e-06, 9.9590e-06, 9.5039e-06, 9.5084e-06, 9.2398e-06], device='cuda:0') 2022-11-15 13:16:46,791 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=439.0, num_to_drop=2, layers_to_drop={0, 2} 2022-11-15 13:16:51,922 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=448.0, num_to_drop=2, layers_to_drop={1, 3} 2022-11-15 13:16:53,695 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=4.15 vs. limit=2.0 2022-11-15 13:17:22,361 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.467e+01 3.366e+01 4.397e+01 5.953e+01 8.256e+02, threshold=8.794e+01, percent-clipped=9.0 2022-11-15 13:17:22,403 INFO [train.py:876] (0/4) Epoch 1, batch 500, loss[loss=0.4516, simple_loss=0.3636, pruned_loss=0.4101, over 5745.00 frames. ], tot_loss[loss=0.432, simple_loss=0.3648, pruned_loss=0.3928, over 998155.40 frames. ], batch size: 16, lr: 4.99e-02, grad_scale: 4.0 2022-11-15 13:17:27,209 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=8.60 vs. limit=5.0 2022-11-15 13:17:57,338 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=562.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:18:00,627 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=568.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:18:14,158 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=590.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:18:20,264 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=600.0, num_to_drop=2, layers_to_drop={0, 2} 2022-11-15 13:18:20,636 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.558e+01 3.991e+01 4.815e+01 6.242e+01 3.465e+02, threshold=9.630e+01, percent-clipped=11.0 2022-11-15 13:18:20,677 INFO [train.py:876] (0/4) Epoch 1, batch 600, loss[loss=0.3282, simple_loss=0.2626, pruned_loss=0.2837, over 5111.00 frames. ], tot_loss[loss=0.4214, simple_loss=0.3508, pruned_loss=0.3738, over 1029501.77 frames. ], batch size: 8, lr: 4.98e-02, grad_scale: 4.0 2022-11-15 13:18:33,000 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=623.0, num_to_drop=2, layers_to_drop={0, 1} 2022-11-15 13:18:36,257 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=629.0, num_to_drop=2, layers_to_drop={1, 2} 2022-11-15 13:18:47,931 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=648.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:18:49,664 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=651.0, num_to_drop=2, layers_to_drop={0, 2} 2022-11-15 13:18:50,152 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=652.0, num_to_drop=2, layers_to_drop={0, 2} 2022-11-15 13:18:51,323 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0257, 4.2353, 4.1215, 4.2290, 4.0682, 3.4303, 3.9947, 4.2225], device='cuda:0'), covar=tensor([0.1737, 0.2622, 0.1442, 0.2968, 0.3613, 0.4671, 0.1626, 0.2498], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0013, 0.0012, 0.0013, 0.0013, 0.0014, 0.0013, 0.0013], device='cuda:0'), out_proj_covar=tensor([1.0736e-05, 1.1338e-05, 1.0938e-05, 1.1889e-05, 1.1431e-05, 1.2432e-05, 1.0656e-05, 1.1000e-05], device='cuda:0') 2022-11-15 13:18:56,346 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1901, 3.6883, 3.7109, 3.1245, 4.1841, 4.0275, 3.6316, 3.6794], device='cuda:0'), covar=tensor([0.2728, 0.1926, 0.1234, 0.5202, 0.1036, 0.1259, 0.1414, 0.1555], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0015, 0.0015, 0.0016, 0.0015, 0.0014, 0.0014, 0.0014], device='cuda:0'), out_proj_covar=tensor([1.4853e-05, 1.3558e-05, 1.3938e-05, 1.4653e-05, 1.3190e-05, 1.2636e-05, 1.3327e-05, 1.3701e-05], device='cuda:0') 2022-11-15 13:18:58,730 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.44 vs. limit=2.0 2022-11-15 13:19:18,386 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.943e+01 4.768e+01 6.411e+01 9.551e+01 4.417e+02, threshold=1.282e+02, percent-clipped=24.0 2022-11-15 13:19:18,426 INFO [train.py:876] (0/4) Epoch 1, batch 700, loss[loss=0.4523, simple_loss=0.3618, pruned_loss=0.3705, over 5447.00 frames. ], tot_loss[loss=0.4176, simple_loss=0.3425, pruned_loss=0.3615, over 1055007.59 frames. ], batch size: 64, lr: 4.98e-02, grad_scale: 4.0 2022-11-15 13:19:39,844 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=739.0, num_to_drop=2, layers_to_drop={1, 2} 2022-11-15 13:19:40,182 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.76 vs. limit=2.0 2022-11-15 13:19:42,022 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=743.0, num_to_drop=2, layers_to_drop={1, 3} 2022-11-15 13:19:52,681 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=3.43 vs. limit=2.0 2022-11-15 13:20:07,231 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=787.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:20:15,194 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.610e+01 5.331e+01 6.709e+01 8.250e+01 2.326e+02, threshold=1.342e+02, percent-clipped=8.0 2022-11-15 13:20:15,237 INFO [train.py:876] (0/4) Epoch 1, batch 800, loss[loss=0.426, simple_loss=0.3362, pruned_loss=0.3412, over 5601.00 frames. ], tot_loss[loss=0.4105, simple_loss=0.3329, pruned_loss=0.3454, over 1069546.33 frames. ], batch size: 22, lr: 4.97e-02, grad_scale: 8.0 2022-11-15 13:20:20,024 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.62 vs. limit=2.0 2022-11-15 13:20:38,005 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.11 vs. limit=2.0 2022-11-15 13:20:42,175 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=847.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:20:48,303 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.07 vs. limit=2.0 2022-11-15 13:21:13,723 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.878e+01 6.865e+01 9.860e+01 1.395e+02 3.183e+02, threshold=1.972e+02, percent-clipped=28.0 2022-11-15 13:21:13,767 INFO [train.py:876] (0/4) Epoch 1, batch 900, loss[loss=0.3457, simple_loss=0.2745, pruned_loss=0.2632, over 5750.00 frames. ], tot_loss[loss=0.4091, simple_loss=0.3289, pruned_loss=0.3337, over 1075506.71 frames. ], batch size: 13, lr: 4.96e-02, grad_scale: 8.0 2022-11-15 13:21:17,933 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=908.0, num_to_drop=2, layers_to_drop={2, 3} 2022-11-15 13:21:21,497 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.34 vs. limit=2.0 2022-11-15 13:21:22,346 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2149, 2.2144, 2.2561, 2.1649, 1.8396, 2.2279, 2.2133, 1.9655], device='cuda:0'), covar=tensor([0.2634, 0.3271, 0.2666, 0.2961, 0.3693, 0.2401, 0.2101, 0.2480], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0018, 0.0015, 0.0016, 0.0017, 0.0017, 0.0016, 0.0017], device='cuda:0'), out_proj_covar=tensor([1.5609e-05, 1.6193e-05, 1.3707e-05, 1.4643e-05, 1.4296e-05, 1.4539e-05, 1.4055e-05, 1.4717e-05], device='cuda:0') 2022-11-15 13:21:23,966 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=918.0, num_to_drop=2, layers_to_drop={1, 3} 2022-11-15 13:21:28,507 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=924.0, num_to_drop=2, layers_to_drop={0, 3} 2022-11-15 13:21:40,748 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=946.0, num_to_drop=2, layers_to_drop={0, 1} 2022-11-15 13:21:44,156 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=952.0, num_to_drop=2, layers_to_drop={1, 2} 2022-11-15 13:22:12,363 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1000.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:22:12,824 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.013e+01 8.825e+01 1.129e+02 1.516e+02 3.115e+02, threshold=2.258e+02, percent-clipped=11.0 2022-11-15 13:22:12,874 INFO [train.py:876] (0/4) Epoch 1, batch 1000, loss[loss=0.425, simple_loss=0.3332, pruned_loss=0.3182, over 5477.00 frames. ], tot_loss[loss=0.4038, simple_loss=0.3225, pruned_loss=0.3192, over 1085339.50 frames. ], batch size: 64, lr: 4.95e-02, grad_scale: 8.0 2022-11-15 13:22:18,722 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=10.20 vs. limit=5.0 2022-11-15 13:22:37,959 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1043.0, num_to_drop=2, layers_to_drop={0, 2} 2022-11-15 13:23:05,513 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1091.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:23:11,682 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.943e+01 8.541e+01 1.080e+02 1.456e+02 2.899e+02, threshold=2.160e+02, percent-clipped=5.0 2022-11-15 13:23:11,724 INFO [train.py:876] (0/4) Epoch 1, batch 1100, loss[loss=0.3419, simple_loss=0.2704, pruned_loss=0.245, over 3098.00 frames. ], tot_loss[loss=0.3958, simple_loss=0.315, pruned_loss=0.3031, over 1088041.82 frames. ], batch size: 284, lr: 4.94e-02, grad_scale: 8.0 2022-11-15 13:23:14,766 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.69 vs. limit=2.0 2022-11-15 13:23:29,752 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1488, 4.1537, 4.3780, 4.1357, 4.2575, 4.6927, 4.4583, 4.5473], device='cuda:0'), covar=tensor([0.1360, 0.1137, 0.0948, 0.1665, 0.0908, 0.0484, 0.0853, 0.0564], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0012, 0.0013, 0.0012, 0.0012, 0.0010, 0.0011, 0.0011], device='cuda:0'), out_proj_covar=tensor([1.3001e-05, 1.2089e-05, 1.2424e-05, 1.1813e-05, 1.1291e-05, 1.0348e-05, 1.1387e-05, 1.0270e-05], device='cuda:0') 2022-11-15 13:23:42,396 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1153.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:23:44,037 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1946, 2.9891, 3.1135, 3.3093, 3.0233, 3.4684, 3.1671, 3.0765], device='cuda:0'), covar=tensor([0.4505, 0.6350, 0.4739, 0.4351, 0.4832, 0.3161, 0.4363, 0.5997], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0032, 0.0031, 0.0029, 0.0027, 0.0028, 0.0029, 0.0030], device='cuda:0'), out_proj_covar=tensor([2.5735e-05, 2.8378e-05, 3.1398e-05, 2.6814e-05, 2.4965e-05, 2.4291e-05, 2.4343e-05, 2.9306e-05], device='cuda:0') 2022-11-15 13:23:47,971 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.10 vs. limit=2.0 2022-11-15 13:23:56,963 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6634, 1.5755, 1.6308, 1.6448, 1.6377, 1.5910, 1.6579, 1.5935], device='cuda:0'), covar=tensor([0.2744, 0.3503, 0.3418, 0.3225, 0.3599, 0.2997, 0.3598, 0.3631], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0030, 0.0030, 0.0028, 0.0031, 0.0028, 0.0030, 0.0030], device='cuda:0'), out_proj_covar=tensor([2.2814e-05, 2.5422e-05, 2.7202e-05, 2.3940e-05, 2.8693e-05, 2.3022e-05, 2.6365e-05, 2.8081e-05], device='cuda:0') 2022-11-15 13:24:03,796 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=7.57 vs. limit=5.0 2022-11-15 13:24:04,864 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8166, 3.9168, 3.9730, 3.6264, 3.9117, 3.9008, 3.7908, 4.1307], device='cuda:0'), covar=tensor([0.2753, 0.2722, 0.3535, 0.5413, 0.3013, 0.3235, 0.4809, 0.2413], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0030, 0.0031, 0.0032, 0.0030, 0.0026, 0.0030, 0.0029], device='cuda:0'), out_proj_covar=tensor([2.9120e-05, 3.0628e-05, 2.8284e-05, 3.0041e-05, 2.7625e-05, 2.4511e-05, 2.5950e-05, 2.6920e-05], device='cuda:0') 2022-11-15 13:24:09,644 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.854e+01 1.222e+02 1.602e+02 1.992e+02 6.604e+02, threshold=3.204e+02, percent-clipped=19.0 2022-11-15 13:24:09,687 INFO [train.py:876] (0/4) Epoch 1, batch 1200, loss[loss=0.4526, simple_loss=0.3556, pruned_loss=0.3182, over 5620.00 frames. ], tot_loss[loss=0.3867, simple_loss=0.3069, pruned_loss=0.2876, over 1088059.92 frames. ], batch size: 23, lr: 4.93e-02, grad_scale: 8.0 2022-11-15 13:24:10,852 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1203.0, num_to_drop=2, layers_to_drop={1, 2} 2022-11-15 13:24:13,747 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.85 vs. limit=5.0 2022-11-15 13:24:17,634 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1214.0, num_to_drop=2, layers_to_drop={0, 2} 2022-11-15 13:24:20,297 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1218.0, num_to_drop=2, layers_to_drop={0, 1} 2022-11-15 13:24:23,565 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1224.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:24:36,253 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1246.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:24:48,462 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1266.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:24:51,634 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1272.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:25:04,717 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1294.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:25:07,092 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=7.65 vs. limit=5.0 2022-11-15 13:25:08,463 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 1.113e+02 1.512e+02 2.067e+02 6.699e+02, threshold=3.023e+02, percent-clipped=4.0 2022-11-15 13:25:08,508 INFO [train.py:876] (0/4) Epoch 1, batch 1300, loss[loss=0.3895, simple_loss=0.3092, pruned_loss=0.2639, over 5525.00 frames. ], tot_loss[loss=0.3799, simple_loss=0.3008, pruned_loss=0.2746, over 1085196.49 frames. ], batch size: 13, lr: 4.92e-02, grad_scale: 8.0 2022-11-15 13:25:19,900 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.39 vs. limit=2.0 2022-11-15 13:25:39,908 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=11.16 vs. limit=5.0 2022-11-15 13:25:45,799 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.09 vs. limit=2.0 2022-11-15 13:25:59,568 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.96 vs. limit=2.0 2022-11-15 13:26:09,308 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.007e+01 1.313e+02 1.838e+02 2.468e+02 4.304e+02, threshold=3.675e+02, percent-clipped=9.0 2022-11-15 13:26:09,349 INFO [train.py:876] (0/4) Epoch 1, batch 1400, loss[loss=0.3759, simple_loss=0.3017, pruned_loss=0.2462, over 5717.00 frames. ], tot_loss[loss=0.3747, simple_loss=0.2962, pruned_loss=0.264, over 1081499.40 frames. ], batch size: 19, lr: 4.91e-02, grad_scale: 8.0 2022-11-15 13:26:28,893 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1434.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:26:36,848 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.88 vs. limit=5.0 2022-11-15 13:26:55,308 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3213, 3.5573, 3.5217, 3.3137, 3.2920, 3.3020, 3.2522, 3.4111], device='cuda:0'), covar=tensor([0.1190, 0.1051, 0.1090, 0.1297, 0.1247, 0.1253, 0.1096, 0.1195], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0028, 0.0027, 0.0029, 0.0030, 0.0029, 0.0025, 0.0026], device='cuda:0'), out_proj_covar=tensor([2.7080e-05, 2.6991e-05, 2.4143e-05, 2.6520e-05, 2.7473e-05, 2.8433e-05, 2.3185e-05, 2.5254e-05], device='cuda:0') 2022-11-15 13:27:05,760 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1495.0, num_to_drop=2, layers_to_drop={2, 3} 2022-11-15 13:27:09,431 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.161e+01 1.360e+02 1.948e+02 2.853e+02 7.338e+02, threshold=3.896e+02, percent-clipped=10.0 2022-11-15 13:27:09,473 INFO [train.py:876] (0/4) Epoch 1, batch 1500, loss[loss=0.3715, simple_loss=0.2981, pruned_loss=0.2389, over 5593.00 frames. ], tot_loss[loss=0.3731, simple_loss=0.2934, pruned_loss=0.2574, over 1086097.45 frames. ], batch size: 24, lr: 4.89e-02, grad_scale: 8.0 2022-11-15 13:27:10,772 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1503.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:27:14,361 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1509.0, num_to_drop=2, layers_to_drop={0, 3} 2022-11-15 13:27:40,012 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1551.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:27:45,147 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8281, 0.8470, 0.8432, 0.7774, 0.7703, 0.8622, 0.8658, 0.7744], device='cuda:0'), covar=tensor([0.0935, 0.1014, 0.1207, 0.1237, 0.1193, 0.0952, 0.0898, 0.1234], device='cuda:0'), in_proj_covar=tensor([0.0022, 0.0024, 0.0025, 0.0027, 0.0026, 0.0026, 0.0025, 0.0026], device='cuda:0'), out_proj_covar=tensor([1.9746e-05, 2.1882e-05, 2.2037e-05, 2.4934e-05, 2.3364e-05, 2.4249e-05, 2.1218e-05, 2.5233e-05], device='cuda:0') 2022-11-15 13:27:49,357 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1566.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:28:10,526 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.692e+01 1.557e+02 1.959e+02 2.401e+02 7.122e+02, threshold=3.919e+02, percent-clipped=3.0 2022-11-15 13:28:10,569 INFO [train.py:876] (0/4) Epoch 1, batch 1600, loss[loss=0.3121, simple_loss=0.2507, pruned_loss=0.1972, over 5340.00 frames. ], tot_loss[loss=0.3702, simple_loss=0.2905, pruned_loss=0.25, over 1087922.17 frames. ], batch size: 9, lr: 4.88e-02, grad_scale: 8.0 2022-11-15 13:28:17,302 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1611.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:28:26,872 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1627.0, num_to_drop=2, layers_to_drop={0, 1} 2022-11-15 13:28:28,682 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.67 vs. limit=5.0 2022-11-15 13:28:28,881 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.46 vs. limit=2.0 2022-11-15 13:28:54,815 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1672.0, num_to_drop=2, layers_to_drop={1, 2} 2022-11-15 13:29:11,722 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.804e+01 1.584e+02 2.065e+02 2.557e+02 6.153e+02, threshold=4.130e+02, percent-clipped=6.0 2022-11-15 13:29:11,765 INFO [train.py:876] (0/4) Epoch 1, batch 1700, loss[loss=0.4372, simple_loss=0.3287, pruned_loss=0.2869, over 5369.00 frames. ], tot_loss[loss=0.3686, simple_loss=0.2882, pruned_loss=0.2444, over 1089961.64 frames. ], batch size: 70, lr: 4.86e-02, grad_scale: 8.0 2022-11-15 13:29:23,971 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.96 vs. limit=5.0 2022-11-15 13:29:43,314 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0 2022-11-15 13:29:52,071 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=8.85 vs. limit=5.0 2022-11-15 13:30:07,143 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1790.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 13:30:14,465 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.349e+01 1.628e+02 2.392e+02 3.072e+02 5.496e+02, threshold=4.784e+02, percent-clipped=8.0 2022-11-15 13:30:14,508 INFO [train.py:876] (0/4) Epoch 1, batch 1800, loss[loss=0.3773, simple_loss=0.2961, pruned_loss=0.2357, over 5724.00 frames. ], tot_loss[loss=0.362, simple_loss=0.283, pruned_loss=0.2355, over 1092613.87 frames. ], batch size: 15, lr: 4.85e-02, grad_scale: 8.0 2022-11-15 13:30:19,419 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=1809.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:30:37,489 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1838.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:30:49,385 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=1857.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:30:49,482 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1857.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:30:53,043 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9173, 2.7385, 2.9596, 2.9546, 2.7742, 2.8062, 2.9860, 2.8051], device='cuda:0'), covar=tensor([0.0624, 0.0560, 0.0363, 0.0434, 0.0446, 0.0569, 0.0429, 0.0486], device='cuda:0'), in_proj_covar=tensor([0.0026, 0.0027, 0.0027, 0.0027, 0.0020, 0.0025, 0.0025, 0.0026], device='cuda:0'), out_proj_covar=tensor([2.8140e-05, 2.7728e-05, 2.7964e-05, 2.6823e-05, 2.1322e-05, 2.5751e-05, 2.5418e-05, 2.6527e-05], device='cuda:0') 2022-11-15 13:30:57,323 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=1870.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:30:58,629 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9792, 1.9689, 1.4266, 1.5424, 1.8765, 2.1491, 1.9647, 2.0507], device='cuda:0'), covar=tensor([0.0641, 0.0699, 0.0849, 0.1062, 0.0742, 0.0536, 0.0750, 0.0781], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0024, 0.0027, 0.0026, 0.0026, 0.0025, 0.0026, 0.0025], device='cuda:0'), out_proj_covar=tensor([2.0267e-05, 2.1166e-05, 2.3216e-05, 2.4722e-05, 2.2762e-05, 2.2162e-05, 2.1396e-05, 2.5090e-05], device='cuda:0') 2022-11-15 13:31:09,598 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9815, 4.3320, 4.1276, 3.9437, 4.4352, 4.5805, 4.1021, 4.5605], device='cuda:0'), covar=tensor([0.0616, 0.0663, 0.0331, 0.0957, 0.0275, 0.0503, 0.0409, 0.0242], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0019, 0.0018, 0.0022, 0.0019, 0.0018, 0.0016, 0.0015], device='cuda:0'), out_proj_covar=tensor([1.7415e-05, 1.8147e-05, 1.5324e-05, 2.0501e-05, 1.7320e-05, 1.6941e-05, 1.4899e-05, 1.3855e-05], device='cuda:0') 2022-11-15 13:31:16,043 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1899.0, num_to_drop=2, layers_to_drop={2, 3} 2022-11-15 13:31:17,086 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.491e+01 1.765e+02 2.160e+02 2.889e+02 5.463e+02, threshold=4.319e+02, percent-clipped=1.0 2022-11-15 13:31:17,126 INFO [train.py:876] (0/4) Epoch 1, batch 1900, loss[loss=0.2742, simple_loss=0.2182, pruned_loss=0.1672, over 4305.00 frames. ], tot_loss[loss=0.3585, simple_loss=0.2795, pruned_loss=0.2296, over 1084772.62 frames. ], batch size: 5, lr: 4.83e-02, grad_scale: 8.0 2022-11-15 13:31:27,818 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1918.0, num_to_drop=2, layers_to_drop={2, 3} 2022-11-15 13:31:30,122 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1922.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 13:31:35,521 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=1931.0, num_to_drop=2, layers_to_drop={0, 1} 2022-11-15 13:31:58,484 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=1967.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:32:11,237 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.41 vs. limit=5.0 2022-11-15 13:32:19,572 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.971e+02 2.700e+02 3.319e+02 5.947e+02, threshold=5.400e+02, percent-clipped=8.0 2022-11-15 13:32:19,615 INFO [train.py:876] (0/4) Epoch 1, batch 2000, loss[loss=0.3281, simple_loss=0.2576, pruned_loss=0.1993, over 5574.00 frames. ], tot_loss[loss=0.3594, simple_loss=0.2794, pruned_loss=0.2268, over 1089047.85 frames. ], batch size: 25, lr: 4.82e-02, grad_scale: 16.0 2022-11-15 13:32:24,372 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2022-11-15 13:32:34,875 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.91 vs. limit=5.0 2022-11-15 13:33:20,321 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2090.0, num_to_drop=2, layers_to_drop={0, 1} 2022-11-15 13:33:27,170 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 1.719e+02 2.410e+02 2.808e+02 6.250e+02, threshold=4.821e+02, percent-clipped=3.0 2022-11-15 13:33:27,212 INFO [train.py:876] (0/4) Epoch 1, batch 2100, loss[loss=0.3621, simple_loss=0.282, pruned_loss=0.2211, over 5458.00 frames. ], tot_loss[loss=0.3545, simple_loss=0.2764, pruned_loss=0.2206, over 1090241.56 frames. ], batch size: 53, lr: 4.80e-02, grad_scale: 16.0 2022-11-15 13:33:34,912 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.97 vs. limit=2.0 2022-11-15 13:33:52,293 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2138.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:33:53,139 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.36 vs. limit=5.0 2022-11-15 13:34:12,154 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 13:34:29,777 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=2194.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:34:34,680 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.857e+02 2.320e+02 3.014e+02 5.745e+02, threshold=4.640e+02, percent-clipped=3.0 2022-11-15 13:34:34,723 INFO [train.py:876] (0/4) Epoch 1, batch 2200, loss[loss=0.3376, simple_loss=0.2755, pruned_loss=0.1999, over 5730.00 frames. ], tot_loss[loss=0.3489, simple_loss=0.2731, pruned_loss=0.215, over 1079503.85 frames. ], batch size: 27, lr: 4.78e-02, grad_scale: 16.0 2022-11-15 13:34:43,169 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=2213.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:34:49,316 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2222.0, num_to_drop=2, layers_to_drop={0, 1} 2022-11-15 13:34:49,920 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3901, 2.1432, 2.4272, 2.3272, 2.3500, 2.3369, 2.3998, 2.4026], device='cuda:0'), covar=tensor([0.0541, 0.0796, 0.0586, 0.0656, 0.0559, 0.0481, 0.0494, 0.0507], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0031, 0.0031, 0.0029, 0.0024, 0.0026, 0.0027, 0.0030], device='cuda:0'), out_proj_covar=tensor([3.0890e-05, 3.3080e-05, 3.4110e-05, 3.0926e-05, 2.5203e-05, 2.6979e-05, 2.7926e-05, 3.0885e-05], device='cuda:0') 2022-11-15 13:34:51,868 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=2226.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:35:19,976 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2267.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:35:21,885 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2270.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:35:25,612 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6716, 3.5504, 3.2483, 3.5564, 3.2656, 2.7805, 3.4890, 3.7396], device='cuda:0'), covar=tensor([0.0437, 0.0350, 0.0398, 0.0409, 0.0523, 0.0694, 0.0509, 0.0469], device='cuda:0'), in_proj_covar=tensor([0.0022, 0.0025, 0.0024, 0.0026, 0.0027, 0.0027, 0.0029, 0.0026], device='cuda:0'), out_proj_covar=tensor([1.7830e-05, 2.0359e-05, 1.9550e-05, 2.1216e-05, 2.3672e-05, 2.2335e-05, 2.4932e-05, 2.1149e-05], device='cuda:0') 2022-11-15 13:35:34,405 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8318, 2.7326, 2.6299, 2.5904, 2.7221, 2.8374, 3.2717, 3.0576], device='cuda:0'), covar=tensor([0.0485, 0.1304, 0.0746, 0.1279, 0.0574, 0.0630, 0.0475, 0.0532], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0016, 0.0012, 0.0015, 0.0011, 0.0011, 0.0013, 0.0010], device='cuda:0'), out_proj_covar=tensor([8.8039e-06, 1.5160e-05, 1.0582e-05, 1.3041e-05, 8.2997e-06, 8.2175e-06, 1.0710e-05, 8.2462e-06], device='cuda:0') 2022-11-15 13:35:42,517 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9845, 4.4303, 4.1704, 3.9759, 3.9495, 4.3780, 3.3859, 4.6221], device='cuda:0'), covar=tensor([0.0269, 0.0238, 0.0179, 0.0554, 0.0299, 0.0187, 0.0439, 0.0104], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0018, 0.0017, 0.0022, 0.0020, 0.0017, 0.0016, 0.0015], device='cuda:0'), out_proj_covar=tensor([1.7227e-05, 1.8421e-05, 1.5440e-05, 2.2480e-05, 1.8714e-05, 1.6938e-05, 1.5772e-05, 1.3758e-05], device='cuda:0') 2022-11-15 13:35:42,976 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.350e+01 1.693e+02 2.305e+02 3.135e+02 7.690e+02, threshold=4.610e+02, percent-clipped=6.0 2022-11-15 13:35:43,019 INFO [train.py:876] (0/4) Epoch 1, batch 2300, loss[loss=0.3154, simple_loss=0.2593, pruned_loss=0.1857, over 5758.00 frames. ], tot_loss[loss=0.3399, simple_loss=0.268, pruned_loss=0.2075, over 1076471.77 frames. ], batch size: 14, lr: 4.77e-02, grad_scale: 16.0 2022-11-15 13:35:43,906 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 13:35:52,627 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2315.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:36:09,638 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=2340.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:36:34,793 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4140, 2.2711, 2.3600, 2.3773, 2.1050, 1.2782, 2.3761, 2.3109], device='cuda:0'), covar=tensor([0.0402, 0.0485, 0.0309, 0.0330, 0.0575, 0.0800, 0.0379, 0.0397], device='cuda:0'), in_proj_covar=tensor([0.0025, 0.0026, 0.0026, 0.0027, 0.0029, 0.0028, 0.0030, 0.0026], device='cuda:0'), out_proj_covar=tensor([2.0926e-05, 2.1351e-05, 2.0216e-05, 2.2100e-05, 2.5654e-05, 2.4128e-05, 2.5889e-05, 2.1630e-05], device='cuda:0') 2022-11-15 13:36:40,334 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5785, 3.1615, 3.5707, 3.2680, 3.2093, 3.1989, 2.7171, 3.2235], device='cuda:0'), covar=tensor([0.0403, 0.0572, 0.0389, 0.0468, 0.0666, 0.0633, 0.0822, 0.0447], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0029, 0.0028, 0.0028, 0.0032, 0.0032, 0.0034, 0.0030], device='cuda:0'), out_proj_covar=tensor([2.2132e-05, 2.3757e-05, 2.4492e-05, 2.3360e-05, 2.6777e-05, 2.7201e-05, 2.8920e-05, 2.5100e-05], device='cuda:0') 2022-11-15 13:36:51,173 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.270e+01 1.877e+02 2.321e+02 3.011e+02 5.507e+02, threshold=4.642e+02, percent-clipped=4.0 2022-11-15 13:36:51,213 INFO [train.py:876] (0/4) Epoch 1, batch 2400, loss[loss=0.3101, simple_loss=0.2578, pruned_loss=0.1812, over 5733.00 frames. ], tot_loss[loss=0.338, simple_loss=0.2673, pruned_loss=0.2054, over 1078974.92 frames. ], batch size: 28, lr: 4.75e-02, grad_scale: 16.0 2022-11-15 13:36:51,409 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=2401.0, num_to_drop=2, layers_to_drop={0, 2} 2022-11-15 13:36:57,394 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0 2022-11-15 13:37:12,113 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=7.02 vs. limit=5.0 2022-11-15 13:37:36,254 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 13:37:53,805 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2494.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 13:37:58,565 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.057e+02 2.130e+02 2.557e+02 3.291e+02 7.818e+02, threshold=5.113e+02, percent-clipped=6.0 2022-11-15 13:37:58,607 INFO [train.py:876] (0/4) Epoch 1, batch 2500, loss[loss=0.3646, simple_loss=0.2696, pruned_loss=0.2298, over 5347.00 frames. ], tot_loss[loss=0.333, simple_loss=0.2643, pruned_loss=0.2014, over 1078736.12 frames. ], batch size: 70, lr: 4.73e-02, grad_scale: 16.0 2022-11-15 13:38:06,771 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2513.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:38:15,174 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2526.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:38:23,066 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.96 vs. limit=2.0 2022-11-15 13:38:25,937 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2542.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:38:38,838 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2561.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:38:40,186 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2725, 4.5228, 4.3404, 4.6838, 4.7682, 4.5569, 3.3431, 4.5754], device='cuda:0'), covar=tensor([0.0443, 0.0326, 0.0421, 0.0215, 0.0134, 0.0247, 0.1925, 0.0304], device='cuda:0'), in_proj_covar=tensor([0.0022, 0.0021, 0.0020, 0.0019, 0.0017, 0.0019, 0.0032, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.1182e-05, 1.9725e-05, 1.8562e-05, 1.6068e-05, 1.3407e-05, 1.6480e-05, 3.5350e-05, 1.6509e-05], device='cuda:0') 2022-11-15 13:38:47,655 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=2574.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:39:02,972 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.0178, 4.9979, 5.0204, 4.9261, 4.9037, 5.0707, 4.8737, 4.8372], device='cuda:0'), covar=tensor([0.0164, 0.0103, 0.0119, 0.0100, 0.0083, 0.0111, 0.0116, 0.0162], device='cuda:0'), in_proj_covar=tensor([0.0025, 0.0023, 0.0025, 0.0023, 0.0023, 0.0022, 0.0026, 0.0024], device='cuda:0'), out_proj_covar=tensor([2.5020e-05, 2.1275e-05, 2.5874e-05, 2.2417e-05, 2.2494e-05, 2.1220e-05, 2.5014e-05, 2.3610e-05], device='cuda:0') 2022-11-15 13:39:06,571 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.872e+02 2.338e+02 3.221e+02 9.126e+02, threshold=4.676e+02, percent-clipped=6.0 2022-11-15 13:39:06,613 INFO [train.py:876] (0/4) Epoch 1, batch 2600, loss[loss=0.3216, simple_loss=0.2559, pruned_loss=0.1937, over 5125.00 frames. ], tot_loss[loss=0.332, simple_loss=0.2642, pruned_loss=0.2002, over 1080648.99 frames. ], batch size: 91, lr: 4.71e-02, grad_scale: 16.0 2022-11-15 13:39:19,074 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.15 vs. limit=2.0 2022-11-15 13:39:44,452 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.55 vs. limit=2.0 2022-11-15 13:40:01,538 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.46 vs. limit=2.0 2022-11-15 13:40:04,082 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=2685.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:40:11,215 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=2696.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 13:40:14,324 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.241e+02 1.859e+02 2.538e+02 3.417e+02 6.213e+02, threshold=5.075e+02, percent-clipped=6.0 2022-11-15 13:40:14,364 INFO [train.py:876] (0/4) Epoch 1, batch 2700, loss[loss=0.3104, simple_loss=0.2549, pruned_loss=0.1829, over 5612.00 frames. ], tot_loss[loss=0.3296, simple_loss=0.263, pruned_loss=0.1983, over 1072651.01 frames. ], batch size: 23, lr: 4.69e-02, grad_scale: 16.0 2022-11-15 13:40:15,581 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.57 vs. limit=5.0 2022-11-15 13:40:45,509 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=2746.0, num_to_drop=2, layers_to_drop={0, 3} 2022-11-15 13:41:13,209 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.12 vs. limit=5.0 2022-11-15 13:41:22,886 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-15 13:41:23,155 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.193e+02 2.037e+02 2.481e+02 3.360e+02 6.352e+02, threshold=4.961e+02, percent-clipped=3.0 2022-11-15 13:41:23,198 INFO [train.py:876] (0/4) Epoch 1, batch 2800, loss[loss=0.3237, simple_loss=0.2718, pruned_loss=0.1878, over 5611.00 frames. ], tot_loss[loss=0.3256, simple_loss=0.2614, pruned_loss=0.195, over 1076123.81 frames. ], batch size: 23, lr: 4.67e-02, grad_scale: 16.0 2022-11-15 13:41:27,447 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.19 vs. limit=2.0 2022-11-15 13:41:39,159 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=2824.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:41:58,455 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3115, 3.8832, 4.4925, 4.2062, 3.8847, 4.0709, 4.4635, 4.1250], device='cuda:0'), covar=tensor([0.1039, 0.1416, 0.0781, 0.0837, 0.0996, 0.0626, 0.0701, 0.0663], device='cuda:0'), in_proj_covar=tensor([0.0037, 0.0046, 0.0044, 0.0042, 0.0033, 0.0034, 0.0038, 0.0038], device='cuda:0'), out_proj_covar=tensor([4.3103e-05, 5.3642e-05, 5.3568e-05, 4.8113e-05, 3.9301e-05, 3.9485e-05, 4.3839e-05, 4.2534e-05], device='cuda:0') 2022-11-15 13:42:10,139 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.74 vs. limit=5.0 2022-11-15 13:42:14,507 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.02 vs. limit=2.0 2022-11-15 13:42:16,552 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-15 13:42:20,474 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=2885.0, num_to_drop=2, layers_to_drop={2, 3} 2022-11-15 13:42:30,823 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 2.042e+02 2.577e+02 3.345e+02 6.665e+02, threshold=5.155e+02, percent-clipped=6.0 2022-11-15 13:42:30,864 INFO [train.py:876] (0/4) Epoch 1, batch 2900, loss[loss=0.2898, simple_loss=0.2358, pruned_loss=0.1719, over 5748.00 frames. ], tot_loss[loss=0.3226, simple_loss=0.2595, pruned_loss=0.1929, over 1079248.81 frames. ], batch size: 15, lr: 4.65e-02, grad_scale: 16.0 2022-11-15 13:42:47,949 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.08 vs. limit=2.0 2022-11-15 13:42:55,183 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=2937.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:42:58,139 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 13:43:07,119 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6555, 2.6976, 2.3514, 2.3169, 1.9508, 1.6208, 2.2941, 2.4941], device='cuda:0'), covar=tensor([0.0244, 0.0199, 0.0228, 0.0298, 0.0315, 0.0439, 0.0306, 0.0285], device='cuda:0'), in_proj_covar=tensor([0.0020, 0.0020, 0.0018, 0.0023, 0.0021, 0.0022, 0.0021, 0.0021], device='cuda:0'), out_proj_covar=tensor([1.6799e-05, 1.5687e-05, 1.4772e-05, 1.8825e-05, 1.7784e-05, 1.9746e-05, 1.8639e-05, 1.8199e-05], device='cuda:0') 2022-11-15 13:43:07,790 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1903, 2.4598, 2.8997, 2.6957, 2.9113, 2.7093, 2.3347, 3.0080], device='cuda:0'), covar=tensor([0.0367, 0.0704, 0.0324, 0.0431, 0.0409, 0.0481, 0.0572, 0.0417], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0030, 0.0026, 0.0028, 0.0028, 0.0029, 0.0030, 0.0026], device='cuda:0'), out_proj_covar=tensor([2.4665e-05, 2.6644e-05, 2.4301e-05, 2.4287e-05, 2.4737e-05, 2.7154e-05, 2.6448e-05, 2.3335e-05], device='cuda:0') 2022-11-15 13:43:35,963 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=2996.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:43:37,254 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=2998.0, num_to_drop=2, layers_to_drop={1, 2} 2022-11-15 13:43:37,879 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0644, 3.3734, 3.0605, 3.5960, 2.8658, 3.4214, 3.3286, 2.7182], device='cuda:0'), covar=tensor([0.1422, 0.0389, 0.0504, 0.0256, 0.0553, 0.0443, 0.0368, 0.0405], device='cuda:0'), in_proj_covar=tensor([0.0022, 0.0017, 0.0018, 0.0014, 0.0017, 0.0016, 0.0015, 0.0014], device='cuda:0'), out_proj_covar=tensor([1.8039e-05, 1.3012e-05, 1.4222e-05, 8.5652e-06, 1.2487e-05, 1.2103e-05, 9.8310e-06, 8.8649e-06], device='cuda:0') 2022-11-15 13:43:39,748 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.318e+01 2.017e+02 2.613e+02 3.220e+02 7.122e+02, threshold=5.226e+02, percent-clipped=3.0 2022-11-15 13:43:39,792 INFO [train.py:876] (0/4) Epoch 1, batch 3000, loss[loss=0.3362, simple_loss=0.2757, pruned_loss=0.1983, over 5717.00 frames. ], tot_loss[loss=0.3218, simple_loss=0.2597, pruned_loss=0.192, over 1084920.34 frames. ], batch size: 19, lr: 4.63e-02, grad_scale: 16.0 2022-11-15 13:43:39,793 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 13:43:54,116 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7770, 2.8013, 2.7270, 2.8733, 2.5245, 2.7590, 2.6518, 2.8789], device='cuda:0'), covar=tensor([0.0299, 0.0224, 0.0275, 0.0195, 0.0316, 0.0263, 0.0370, 0.0243], device='cuda:0'), in_proj_covar=tensor([0.0026, 0.0023, 0.0026, 0.0023, 0.0025, 0.0023, 0.0028, 0.0026], device='cuda:0'), out_proj_covar=tensor([2.7191e-05, 2.2706e-05, 2.7906e-05, 2.3454e-05, 2.7041e-05, 2.2996e-05, 2.8651e-05, 2.7244e-05], device='cuda:0') 2022-11-15 13:43:57,077 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8890, 2.9560, 2.9156, 2.9330, 2.6217, 2.8913, 2.7951, 3.0198], device='cuda:0'), covar=tensor([0.0237, 0.0137, 0.0202, 0.0135, 0.0201, 0.0164, 0.0263, 0.0123], device='cuda:0'), in_proj_covar=tensor([0.0026, 0.0023, 0.0026, 0.0023, 0.0025, 0.0023, 0.0028, 0.0026], device='cuda:0'), out_proj_covar=tensor([2.7191e-05, 2.2706e-05, 2.7906e-05, 2.3454e-05, 2.7041e-05, 2.2996e-05, 2.8651e-05, 2.7244e-05], device='cuda:0') 2022-11-15 13:43:58,880 INFO [train.py:908] (0/4) Epoch 1, validation: loss=0.2736, simple_loss=0.2548, pruned_loss=0.1462, over 1530663.00 frames. 2022-11-15 13:43:58,881 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4410MB 2022-11-15 13:44:11,421 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3019.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:44:26,568 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3041.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:44:28,580 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3044.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:44:32,771 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.40 vs. limit=5.0 2022-11-15 13:44:47,931 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.15 vs. limit=5.0 2022-11-15 13:44:48,760 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.04 vs. limit=2.0 2022-11-15 13:44:53,862 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7510, 1.3996, 1.0686, 1.0932, 2.4903, 2.4469, 2.7615, 2.2229], device='cuda:0'), covar=tensor([0.0354, 0.0272, 0.0692, 0.0516, 0.0255, 0.0245, 0.0158, 0.0201], device='cuda:0'), in_proj_covar=tensor([0.0021, 0.0019, 0.0022, 0.0022, 0.0020, 0.0020, 0.0020, 0.0019], device='cuda:0'), out_proj_covar=tensor([1.8800e-05, 1.6988e-05, 2.1523e-05, 2.2410e-05, 1.7225e-05, 1.8533e-05, 1.6980e-05, 1.7522e-05], device='cuda:0') 2022-11-15 13:44:53,879 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3080.0, num_to_drop=2, layers_to_drop={0, 1} 2022-11-15 13:45:08,266 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 2.068e+02 2.442e+02 3.389e+02 5.023e+02, threshold=4.884e+02, percent-clipped=1.0 2022-11-15 13:45:08,307 INFO [train.py:876] (0/4) Epoch 1, batch 3100, loss[loss=0.3177, simple_loss=0.2582, pruned_loss=0.1886, over 5530.00 frames. ], tot_loss[loss=0.3196, simple_loss=0.2591, pruned_loss=0.1901, over 1084258.86 frames. ], batch size: 13, lr: 4.61e-02, grad_scale: 16.0 2022-11-15 13:45:17,152 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=7.58 vs. limit=5.0 2022-11-15 13:45:45,426 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0 2022-11-15 13:45:51,753 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.15 vs. limit=5.0 2022-11-15 13:46:02,907 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3180.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:46:17,615 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 2.233e+02 2.604e+02 3.182e+02 6.551e+02, threshold=5.207e+02, percent-clipped=6.0 2022-11-15 13:46:17,656 INFO [train.py:876] (0/4) Epoch 1, batch 3200, loss[loss=0.329, simple_loss=0.2663, pruned_loss=0.1958, over 5588.00 frames. ], tot_loss[loss=0.3179, simple_loss=0.2586, pruned_loss=0.1886, over 1084422.18 frames. ], batch size: 24, lr: 4.59e-02, grad_scale: 16.0 2022-11-15 13:46:44,621 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.51 vs. limit=5.0 2022-11-15 13:46:45,745 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1593, 4.4902, 4.5169, 4.1749, 4.5609, 4.0779, 4.0656, 4.0571], device='cuda:0'), covar=tensor([0.0397, 0.0263, 0.0412, 0.0353, 0.0424, 0.0392, 0.0387, 0.0379], device='cuda:0'), in_proj_covar=tensor([0.0035, 0.0036, 0.0032, 0.0034, 0.0034, 0.0032, 0.0031, 0.0031], device='cuda:0'), out_proj_covar=tensor([4.7057e-05, 4.3482e-05, 3.8106e-05, 3.9651e-05, 4.5825e-05, 4.2633e-05, 3.6675e-05, 3.8081e-05], device='cuda:0') 2022-11-15 13:46:53,606 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-15 13:46:54,897 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.00 vs. limit=2.0 2022-11-15 13:47:17,418 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3288.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:47:20,608 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3293.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:47:25,677 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.790e+01 2.210e+02 2.954e+02 4.251e+02 1.287e+03, threshold=5.908e+02, percent-clipped=13.0 2022-11-15 13:47:25,721 INFO [train.py:876] (0/4) Epoch 1, batch 3300, loss[loss=0.2401, simple_loss=0.2071, pruned_loss=0.1366, over 5734.00 frames. ], tot_loss[loss=0.3145, simple_loss=0.2565, pruned_loss=0.1862, over 1088102.32 frames. ], batch size: 11, lr: 4.57e-02, grad_scale: 16.0 2022-11-15 13:47:34,234 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5544, 4.2190, 4.1885, 3.8677, 3.2039, 3.0265, 3.9016, 4.2184], device='cuda:0'), covar=tensor([0.0230, 0.0191, 0.0086, 0.0138, 0.0647, 0.0533, 0.0300, 0.0153], device='cuda:0'), in_proj_covar=tensor([0.0021, 0.0019, 0.0019, 0.0020, 0.0022, 0.0020, 0.0020, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.1502e-05, 1.8572e-05, 1.7665e-05, 1.9891e-05, 2.2025e-05, 1.9740e-05, 2.0890e-05, 1.8726e-05], device='cuda:0') 2022-11-15 13:47:53,293 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3341.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:47:59,053 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3349.0, num_to_drop=2, layers_to_drop={0, 3} 2022-11-15 13:48:16,352 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.12 vs. limit=5.0 2022-11-15 13:48:17,244 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3375.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:48:26,983 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3389.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:48:35,851 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.826e+01 1.721e+02 2.073e+02 2.750e+02 6.330e+02, threshold=4.146e+02, percent-clipped=3.0 2022-11-15 13:48:35,892 INFO [train.py:876] (0/4) Epoch 1, batch 3400, loss[loss=0.3601, simple_loss=0.2777, pruned_loss=0.2213, over 4706.00 frames. ], tot_loss[loss=0.31, simple_loss=0.2547, pruned_loss=0.1827, over 1094337.25 frames. ], batch size: 135, lr: 4.55e-02, grad_scale: 16.0 2022-11-15 13:49:30,568 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3480.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:49:41,509 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5788, 2.5534, 2.8016, 2.2176, 2.4491, 2.3430, 2.2074, 2.6425], device='cuda:0'), covar=tensor([0.0337, 0.0436, 0.0175, 0.0565, 0.0353, 0.0260, 0.0438, 0.0238], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0029, 0.0022, 0.0030, 0.0024, 0.0026, 0.0029, 0.0024], device='cuda:0'), out_proj_covar=tensor([2.5436e-05, 2.9040e-05, 2.1913e-05, 2.8904e-05, 2.2870e-05, 2.5827e-05, 2.7697e-05, 2.3368e-05], device='cuda:0') 2022-11-15 13:49:45,036 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.861e+02 2.622e+02 3.661e+02 7.520e+02, threshold=5.245e+02, percent-clipped=13.0 2022-11-15 13:49:45,080 INFO [train.py:876] (0/4) Epoch 1, batch 3500, loss[loss=0.2709, simple_loss=0.2317, pruned_loss=0.155, over 5703.00 frames. ], tot_loss[loss=0.309, simple_loss=0.2538, pruned_loss=0.1821, over 1091970.85 frames. ], batch size: 15, lr: 4.53e-02, grad_scale: 16.0 2022-11-15 13:49:48,240 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-15 13:50:04,181 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3528.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:50:17,078 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3546.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:50:19,062 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 13:50:49,957 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3593.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:50:53,153 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1900, 1.9930, 2.1802, 1.8675, 2.3430, 1.8982, 1.3493, 2.1706], device='cuda:0'), covar=tensor([0.0138, 0.0156, 0.0136, 0.0143, 0.0077, 0.0142, 0.0230, 0.0128], device='cuda:0'), in_proj_covar=tensor([0.0024, 0.0021, 0.0023, 0.0023, 0.0021, 0.0022, 0.0023, 0.0021], device='cuda:0'), out_proj_covar=tensor([2.2237e-05, 2.0425e-05, 2.1996e-05, 2.1967e-05, 2.0342e-05, 2.1952e-05, 2.4235e-05, 1.9648e-05], device='cuda:0') 2022-11-15 13:50:56,145 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 2.156e+02 2.680e+02 3.527e+02 6.618e+02, threshold=5.360e+02, percent-clipped=3.0 2022-11-15 13:50:56,187 INFO [train.py:876] (0/4) Epoch 1, batch 3600, loss[loss=0.2532, simple_loss=0.2213, pruned_loss=0.1425, over 5774.00 frames. ], tot_loss[loss=0.3073, simple_loss=0.253, pruned_loss=0.1808, over 1089179.71 frames. ], batch size: 9, lr: 4.50e-02, grad_scale: 16.0 2022-11-15 13:51:00,508 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3607.0, num_to_drop=2, layers_to_drop={0, 3} 2022-11-15 13:51:17,574 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9207, 2.1265, 0.9482, 1.6726, 1.1347, 1.3917, 1.7081, 1.7138], device='cuda:0'), covar=tensor([0.0194, 0.0107, 0.0357, 0.0248, 0.0258, 0.0214, 0.0155, 0.0149], device='cuda:0'), in_proj_covar=tensor([0.0025, 0.0025, 0.0024, 0.0026, 0.0023, 0.0022, 0.0021, 0.0024], device='cuda:0'), out_proj_covar=tensor([2.2642e-05, 2.1996e-05, 2.4721e-05, 2.4721e-05, 2.1718e-05, 2.0042e-05, 1.9446e-05, 2.0516e-05], device='cuda:0') 2022-11-15 13:51:24,662 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3641.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:51:27,130 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3644.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 13:51:43,192 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8851, 3.3635, 2.9998, 3.4096, 3.2527, 2.8485, 2.9453, 2.7918], device='cuda:0'), covar=tensor([0.0329, 0.0202, 0.0306, 0.0163, 0.0218, 0.0405, 0.0325, 0.0373], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0031, 0.0041, 0.0034, 0.0034, 0.0038, 0.0033, 0.0032], device='cuda:0'), out_proj_covar=tensor([3.8617e-05, 3.7961e-05, 4.7746e-05, 3.8979e-05, 3.7504e-05, 4.4104e-05, 3.8309e-05, 3.5783e-05], device='cuda:0') 2022-11-15 13:51:49,277 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3675.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:51:49,959 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3676.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:52:07,941 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.388e+02 2.330e+02 3.139e+02 3.973e+02 9.859e+02, threshold=6.278e+02, percent-clipped=9.0 2022-11-15 13:52:07,981 INFO [train.py:876] (0/4) Epoch 1, batch 3700, loss[loss=0.2359, simple_loss=0.1975, pruned_loss=0.1371, over 5167.00 frames. ], tot_loss[loss=0.3067, simple_loss=0.2524, pruned_loss=0.1805, over 1081701.63 frames. ], batch size: 7, lr: 4.48e-02, grad_scale: 16.0 2022-11-15 13:52:21,296 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.25 vs. limit=5.0 2022-11-15 13:52:24,108 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3723.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:52:33,985 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3737.0, num_to_drop=2, layers_to_drop={2, 3} 2022-11-15 13:52:38,583 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-15 13:52:53,002 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 13:52:54,187 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-15 13:53:05,483 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-15 13:53:10,784 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.37 vs. limit=5.0 2022-11-15 13:53:11,243 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6272, 3.3726, 3.4822, 3.6837, 3.5996, 3.0645, 2.2217, 3.3806], device='cuda:0'), covar=tensor([0.0842, 0.0234, 0.0263, 0.0141, 0.0178, 0.0541, 0.2715, 0.0191], device='cuda:0'), in_proj_covar=tensor([0.0076, 0.0055, 0.0057, 0.0048, 0.0049, 0.0069, 0.0108, 0.0051], device='cuda:0'), out_proj_covar=tensor([8.0127e-05, 5.1223e-05, 5.0847e-05, 4.0071e-05, 4.0567e-05, 6.5248e-05, 1.3428e-04, 4.2289e-05], device='cuda:0') 2022-11-15 13:53:20,083 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 2.129e+02 2.584e+02 3.254e+02 9.208e+02, threshold=5.168e+02, percent-clipped=2.0 2022-11-15 13:53:20,124 INFO [train.py:876] (0/4) Epoch 1, batch 3800, loss[loss=0.3581, simple_loss=0.2616, pruned_loss=0.2273, over 3206.00 frames. ], tot_loss[loss=0.307, simple_loss=0.2533, pruned_loss=0.1804, over 1083006.62 frames. ], batch size: 284, lr: 4.46e-02, grad_scale: 16.0 2022-11-15 13:53:27,000 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.10 vs. limit=2.0 2022-11-15 13:54:31,929 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 2.219e+02 2.583e+02 3.487e+02 8.673e+02, threshold=5.166e+02, percent-clipped=10.0 2022-11-15 13:54:31,972 INFO [train.py:876] (0/4) Epoch 1, batch 3900, loss[loss=0.2476, simple_loss=0.2036, pruned_loss=0.1458, over 5349.00 frames. ], tot_loss[loss=0.3058, simple_loss=0.2529, pruned_loss=0.1794, over 1085620.30 frames. ], batch size: 9, lr: 4.44e-02, grad_scale: 16.0 2022-11-15 13:54:32,688 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=3902.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 13:54:50,518 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=3926.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:55:03,816 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=3944.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:55:34,398 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=3987.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:55:38,173 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=3992.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:55:44,803 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 2.234e+02 2.679e+02 3.538e+02 6.488e+02, threshold=5.359e+02, percent-clipped=3.0 2022-11-15 13:55:44,845 INFO [train.py:876] (0/4) Epoch 1, batch 4000, loss[loss=0.3173, simple_loss=0.2614, pruned_loss=0.1866, over 5461.00 frames. ], tot_loss[loss=0.3026, simple_loss=0.2509, pruned_loss=0.1772, over 1090248.34 frames. ], batch size: 58, lr: 4.42e-02, grad_scale: 32.0 2022-11-15 13:56:07,922 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=4032.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 13:56:58,561 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.016e+02 2.074e+02 2.727e+02 3.371e+02 7.611e+02, threshold=5.454e+02, percent-clipped=4.0 2022-11-15 13:56:58,604 INFO [train.py:876] (0/4) Epoch 1, batch 4100, loss[loss=0.3211, simple_loss=0.278, pruned_loss=0.1821, over 5501.00 frames. ], tot_loss[loss=0.3046, simple_loss=0.2524, pruned_loss=0.1784, over 1086399.67 frames. ], batch size: 17, lr: 4.40e-02, grad_scale: 32.0 2022-11-15 13:57:08,293 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7727, 2.0769, 2.0970, 1.8727, 1.8918, 1.9909, 1.7460, 1.9744], device='cuda:0'), covar=tensor([0.0202, 0.0167, 0.0129, 0.0197, 0.0231, 0.0225, 0.0337, 0.0184], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0016, 0.0016, 0.0019, 0.0018, 0.0017, 0.0018, 0.0017], device='cuda:0'), out_proj_covar=tensor([1.9108e-05, 1.6809e-05, 1.6168e-05, 2.0617e-05, 2.0055e-05, 1.8702e-05, 2.0932e-05, 1.8494e-05], device='cuda:0') 2022-11-15 13:57:28,544 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=4141.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:57:39,102 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.17 vs. limit=2.0 2022-11-15 13:58:09,923 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2368, 4.1968, 4.6248, 3.8958, 4.5731, 4.3326, 4.1201, 3.8277], device='cuda:0'), covar=tensor([0.0355, 0.0328, 0.0341, 0.0342, 0.0435, 0.0168, 0.0346, 0.0392], device='cuda:0'), in_proj_covar=tensor([0.0038, 0.0041, 0.0033, 0.0039, 0.0043, 0.0029, 0.0034, 0.0032], device='cuda:0'), out_proj_covar=tensor([5.5222e-05, 5.4976e-05, 4.6181e-05, 5.0985e-05, 6.9526e-05, 4.2284e-05, 4.7839e-05, 4.4100e-05], device='cuda:0') 2022-11-15 13:58:13,172 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.362e+02 1.959e+02 2.484e+02 3.266e+02 7.504e+02, threshold=4.967e+02, percent-clipped=2.0 2022-11-15 13:58:13,215 INFO [train.py:876] (0/4) Epoch 1, batch 4200, loss[loss=0.3665, simple_loss=0.2934, pruned_loss=0.2198, over 5568.00 frames. ], tot_loss[loss=0.3013, simple_loss=0.2503, pruned_loss=0.1762, over 1083693.24 frames. ], batch size: 46, lr: 4.38e-02, grad_scale: 32.0 2022-11-15 13:58:14,057 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4202.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 13:58:14,103 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7680, 0.7253, 0.5399, 0.6471, 0.6222, 0.9270, 0.3029, 0.7312], device='cuda:0'), covar=tensor([0.0093, 0.0151, 0.0172, 0.0094, 0.0163, 0.0095, 0.0225, 0.0101], device='cuda:0'), in_proj_covar=tensor([0.0030, 0.0028, 0.0030, 0.0029, 0.0031, 0.0026, 0.0029, 0.0027], device='cuda:0'), out_proj_covar=tensor([3.0588e-05, 2.7739e-05, 3.1684e-05, 2.8460e-05, 3.1823e-05, 2.6537e-05, 3.3365e-05, 2.8207e-05], device='cuda:0') 2022-11-15 13:58:14,109 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=4202.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:58:49,670 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4250.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:59:04,726 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0286, 1.9253, 2.0159, 1.8468, 1.9929, 2.1602, 2.2477, 2.4299], device='cuda:0'), covar=tensor([0.0238, 0.0698, 0.0227, 0.0297, 0.0246, 0.0165, 0.0322, 0.0138], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0025, 0.0019, 0.0022, 0.0022, 0.0022, 0.0024, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.4305e-05, 2.7033e-05, 2.0881e-05, 2.3271e-05, 2.2992e-05, 2.2710e-05, 2.6216e-05, 1.9239e-05], device='cuda:0') 2022-11-15 13:59:13,090 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=4282.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:59:13,142 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0406, 2.5284, 2.4461, 2.7516, 2.2979, 2.4451, 1.5893, 2.4730], device='cuda:0'), covar=tensor([0.0994, 0.0265, 0.0300, 0.0151, 0.0357, 0.0533, 0.2518, 0.0356], device='cuda:0'), in_proj_covar=tensor([0.0096, 0.0063, 0.0067, 0.0055, 0.0056, 0.0083, 0.0130, 0.0063], device='cuda:0'), out_proj_covar=tensor([1.0097e-04, 5.7326e-05, 6.3416e-05, 4.7163e-05, 5.0370e-05, 8.4012e-05, 1.5443e-04, 5.5624e-05], device='cuda:0') 2022-11-15 13:59:17,699 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=4288.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 13:59:27,272 INFO [train.py:876] (0/4) Epoch 1, batch 4300, loss[loss=0.24, simple_loss=0.2056, pruned_loss=0.1373, over 4411.00 frames. ], tot_loss[loss=0.3016, simple_loss=0.2507, pruned_loss=0.1763, over 1076203.30 frames. ], batch size: 5, lr: 4.35e-02, grad_scale: 16.0 2022-11-15 13:59:27,971 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 2.229e+02 3.130e+02 3.930e+02 1.663e+03, threshold=6.259e+02, percent-clipped=10.0 2022-11-15 13:59:28,560 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-15 13:59:29,403 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.91 vs. limit=5.0 2022-11-15 13:59:47,951 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5852, 1.8461, 2.0857, 1.7083, 2.5742, 1.4043, 2.4005, 2.0079], device='cuda:0'), covar=tensor([0.0161, 0.1046, 0.0228, 0.0289, 0.0306, 0.2470, 0.0250, 0.0306], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0017, 0.0019, 0.0019, 0.0019, 0.0017, 0.0018, 0.0018], device='cuda:0'), out_proj_covar=tensor([2.0338e-05, 1.8952e-05, 1.9374e-05, 2.1577e-05, 2.1993e-05, 1.9247e-05, 2.1187e-05, 2.0777e-05], device='cuda:0') 2022-11-15 13:59:50,082 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4332.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 13:59:53,505 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1675, 4.3667, 4.1772, 4.3786, 3.8256, 4.1604, 3.6735, 4.3062], device='cuda:0'), covar=tensor([0.0212, 0.0230, 0.0196, 0.0141, 0.0288, 0.0321, 0.0457, 0.0187], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0030, 0.0028, 0.0026, 0.0029, 0.0026, 0.0037, 0.0029], device='cuda:0'), out_proj_covar=tensor([3.9782e-05, 3.6754e-05, 3.5531e-05, 3.2066e-05, 3.6410e-05, 3.2725e-05, 4.6364e-05, 3.6162e-05], device='cuda:0') 2022-11-15 14:00:02,431 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=4349.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:00:05,786 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.44 vs. limit=5.0 2022-11-15 14:00:08,729 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.09 vs. limit=2.0 2022-11-15 14:00:25,183 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4380.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 14:00:41,286 INFO [train.py:876] (0/4) Epoch 1, batch 4400, loss[loss=0.3777, simple_loss=0.2975, pruned_loss=0.229, over 5397.00 frames. ], tot_loss[loss=0.2992, simple_loss=0.2491, pruned_loss=0.1747, over 1074377.89 frames. ], batch size: 70, lr: 4.33e-02, grad_scale: 16.0 2022-11-15 14:00:41,972 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.193e+02 1.948e+02 2.508e+02 3.167e+02 7.237e+02, threshold=5.016e+02, percent-clipped=3.0 2022-11-15 14:01:50,284 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6800, 4.3947, 4.3055, 4.0137, 4.2100, 3.8757, 2.8726, 4.1911], device='cuda:0'), covar=tensor([0.1681, 0.0157, 0.0218, 0.0191, 0.0119, 0.0416, 0.2985, 0.0142], device='cuda:0'), in_proj_covar=tensor([0.0098, 0.0063, 0.0070, 0.0058, 0.0058, 0.0083, 0.0130, 0.0063], device='cuda:0'), out_proj_covar=tensor([1.0331e-04, 5.7952e-05, 6.6150e-05, 5.1300e-05, 5.3194e-05, 8.4424e-05, 1.5408e-04, 5.5734e-05], device='cuda:0') 2022-11-15 14:01:51,577 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=4497.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:01:54,770 INFO [train.py:876] (0/4) Epoch 1, batch 4500, loss[loss=0.3101, simple_loss=0.2636, pruned_loss=0.1783, over 5772.00 frames. ], tot_loss[loss=0.2993, simple_loss=0.2495, pruned_loss=0.1746, over 1077136.86 frames. ], batch size: 16, lr: 4.31e-02, grad_scale: 16.0 2022-11-15 14:01:55,417 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 2.194e+02 3.031e+02 3.828e+02 9.010e+02, threshold=6.062e+02, percent-clipped=8.0 2022-11-15 14:01:58,715 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-15 14:02:17,224 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8915, 1.2596, 1.7477, 1.7531, 1.5563, 1.7542, 1.7279, 1.5798], device='cuda:0'), covar=tensor([0.0060, 0.0290, 0.0088, 0.0153, 0.0126, 0.0114, 0.0123, 0.0098], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0064, 0.0033, 0.0043, 0.0032, 0.0036, 0.0036, 0.0033], device='cuda:0'), out_proj_covar=tensor([2.6496e-05, 7.0024e-05, 3.2315e-05, 4.3597e-05, 3.1168e-05, 3.6688e-05, 3.5423e-05, 3.2994e-05], device='cuda:0') 2022-11-15 14:02:54,229 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4582.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:03:08,725 INFO [train.py:876] (0/4) Epoch 1, batch 4600, loss[loss=0.3123, simple_loss=0.261, pruned_loss=0.1818, over 5756.00 frames. ], tot_loss[loss=0.3002, simple_loss=0.2501, pruned_loss=0.1752, over 1074064.55 frames. ], batch size: 21, lr: 4.29e-02, grad_scale: 16.0 2022-11-15 14:03:09,371 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.162e+01 1.839e+02 2.747e+02 3.849e+02 7.443e+02, threshold=5.493e+02, percent-clipped=4.0 2022-11-15 14:03:29,919 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4630.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:03:30,824 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.17 vs. limit=5.0 2022-11-15 14:03:39,183 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7404, 1.7401, 2.3613, 1.8576, 2.1542, 2.2694, 2.1219, 2.3983], device='cuda:0'), covar=tensor([0.0156, 0.1194, 0.0348, 0.0714, 0.0437, 0.0363, 0.0362, 0.0474], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0067, 0.0035, 0.0047, 0.0032, 0.0037, 0.0038, 0.0034], device='cuda:0'), out_proj_covar=tensor([2.6372e-05, 7.4634e-05, 3.5070e-05, 4.8178e-05, 3.1636e-05, 3.8247e-05, 3.8345e-05, 3.5054e-05], device='cuda:0') 2022-11-15 14:03:40,563 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=4644.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:03:45,791 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2391, 4.0455, 4.4136, 4.1987, 3.8705, 3.7436, 4.7986, 3.9622], device='cuda:0'), covar=tensor([0.0417, 0.0913, 0.0394, 0.0626, 0.0426, 0.0386, 0.0454, 0.0497], device='cuda:0'), in_proj_covar=tensor([0.0042, 0.0063, 0.0049, 0.0054, 0.0039, 0.0039, 0.0055, 0.0046], device='cuda:0'), out_proj_covar=tensor([5.9132e-05, 9.3029e-05, 6.9664e-05, 7.3200e-05, 5.6922e-05, 5.3720e-05, 8.5576e-05, 6.4159e-05], device='cuda:0') 2022-11-15 14:03:59,883 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.59 vs. limit=5.0 2022-11-15 14:04:15,077 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2022-11-15 14:04:16,572 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. limit=2.0 2022-11-15 14:04:23,991 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.54 vs. limit=5.0 2022-11-15 14:04:24,344 INFO [train.py:876] (0/4) Epoch 1, batch 4700, loss[loss=0.3511, simple_loss=0.275, pruned_loss=0.2136, over 5262.00 frames. ], tot_loss[loss=0.2961, simple_loss=0.2474, pruned_loss=0.1724, over 1081963.36 frames. ], batch size: 79, lr: 4.27e-02, grad_scale: 16.0 2022-11-15 14:04:24,961 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 2.250e+02 2.748e+02 3.964e+02 7.433e+02, threshold=5.495e+02, percent-clipped=7.0 2022-11-15 14:04:32,035 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4770, 1.5567, 0.6153, 1.4466, 0.8893, 1.4959, 1.4295, 1.3861], device='cuda:0'), covar=tensor([0.0156, 0.0112, 0.0329, 0.0120, 0.0142, 0.0080, 0.0089, 0.0093], device='cuda:0'), in_proj_covar=tensor([0.0039, 0.0034, 0.0035, 0.0032, 0.0029, 0.0027, 0.0026, 0.0030], device='cuda:0'), out_proj_covar=tensor([4.0322e-05, 3.4651e-05, 4.1606e-05, 3.3829e-05, 3.1222e-05, 2.7779e-05, 2.9034e-05, 3.1398e-05], device='cuda:0') 2022-11-15 14:05:00,035 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9960, 0.7211, 0.6534, 0.6854, 0.9445, 1.2002, 0.6054, 1.0762], device='cuda:0'), covar=tensor([0.0187, 0.0167, 0.0253, 0.0335, 0.0143, 0.0126, 0.0292, 0.0106], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0021, 0.0024, 0.0026, 0.0024, 0.0024, 0.0024, 0.0022], device='cuda:0'), out_proj_covar=tensor([2.5715e-05, 2.5634e-05, 2.8378e-05, 3.5047e-05, 2.7354e-05, 2.5796e-05, 2.5310e-05, 2.5270e-05], device='cuda:0') 2022-11-15 14:05:04,418 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=7.58 vs. limit=5.0 2022-11-15 14:05:14,633 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2113, 4.2468, 4.3821, 4.2763, 3.9047, 3.7308, 4.7320, 4.1705], device='cuda:0'), covar=tensor([0.0526, 0.0761, 0.0419, 0.0595, 0.0592, 0.0583, 0.0781, 0.0448], device='cuda:0'), in_proj_covar=tensor([0.0044, 0.0065, 0.0053, 0.0057, 0.0040, 0.0039, 0.0059, 0.0047], device='cuda:0'), out_proj_covar=tensor([6.1334e-05, 9.6302e-05, 7.5582e-05, 7.8584e-05, 5.8173e-05, 5.4655e-05, 9.3185e-05, 6.5653e-05], device='cuda:0') 2022-11-15 14:05:19,709 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=7.02 vs. limit=5.0 2022-11-15 14:05:21,571 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3795, 2.1284, 2.0759, 3.2984, 3.1263, 2.8938, 1.8822, 3.6612], device='cuda:0'), covar=tensor([0.0411, 0.1356, 0.1276, 0.0228, 0.0422, 0.0705, 0.1415, 0.0194], device='cuda:0'), in_proj_covar=tensor([0.0037, 0.0074, 0.0068, 0.0036, 0.0043, 0.0054, 0.0061, 0.0042], device='cuda:0'), out_proj_covar=tensor([3.7187e-05, 7.8917e-05, 6.8308e-05, 3.5265e-05, 4.1137e-05, 5.5558e-05, 6.3599e-05, 3.9914e-05], device='cuda:0') 2022-11-15 14:05:24,006 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.49 vs. limit=5.0 2022-11-15 14:05:34,560 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4797.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:05:37,709 INFO [train.py:876] (0/4) Epoch 1, batch 4800, loss[loss=0.3097, simple_loss=0.2487, pruned_loss=0.1853, over 5679.00 frames. ], tot_loss[loss=0.2948, simple_loss=0.2469, pruned_loss=0.1714, over 1081956.68 frames. ], batch size: 36, lr: 4.25e-02, grad_scale: 16.0 2022-11-15 14:05:38,343 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.248e+02 1.870e+02 2.529e+02 3.283e+02 6.481e+02, threshold=5.059e+02, percent-clipped=1.0 2022-11-15 14:05:45,804 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8648, 3.7701, 3.3740, 3.7430, 3.6347, 3.3914, 3.1984, 2.8419], device='cuda:0'), covar=tensor([0.0529, 0.0265, 0.0282, 0.0320, 0.0235, 0.0322, 0.0326, 0.0483], device='cuda:0'), in_proj_covar=tensor([0.0042, 0.0036, 0.0048, 0.0037, 0.0043, 0.0041, 0.0039, 0.0038], device='cuda:0'), out_proj_covar=tensor([5.6591e-05, 5.0376e-05, 6.5084e-05, 4.8109e-05, 5.5548e-05, 5.0745e-05, 5.1047e-05, 5.0483e-05], device='cuda:0') 2022-11-15 14:06:09,937 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4845.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:06:21,038 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.82 vs. limit=5.0 2022-11-15 14:06:32,231 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.27 vs. limit=5.0 2022-11-15 14:06:50,935 INFO [train.py:876] (0/4) Epoch 1, batch 4900, loss[loss=0.2574, simple_loss=0.2229, pruned_loss=0.1459, over 5535.00 frames. ], tot_loss[loss=0.2965, simple_loss=0.2479, pruned_loss=0.1726, over 1079504.84 frames. ], batch size: 13, lr: 4.23e-02, grad_scale: 16.0 2022-11-15 14:06:51,539 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 14:06:51,622 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 2.074e+02 2.835e+02 3.865e+02 7.498e+02, threshold=5.670e+02, percent-clipped=5.0 2022-11-15 14:07:12,349 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.13 vs. limit=5.0 2022-11-15 14:07:22,591 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=4944.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:07:51,915 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 14:07:57,421 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=4992.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:08:03,868 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-5000.pt 2022-11-15 14:08:08,412 INFO [train.py:876] (0/4) Epoch 1, batch 5000, loss[loss=0.2715, simple_loss=0.2414, pruned_loss=0.1508, over 5656.00 frames. ], tot_loss[loss=0.2948, simple_loss=0.2473, pruned_loss=0.1711, over 1086864.83 frames. ], batch size: 29, lr: 4.20e-02, grad_scale: 16.0 2022-11-15 14:08:09,092 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 2.045e+02 2.576e+02 3.694e+02 7.012e+02, threshold=5.152e+02, percent-clipped=6.0 2022-11-15 14:08:22,610 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1146, 1.0024, 0.8685, 1.1908, 1.2790, 1.2340, 0.5131, 0.9756], device='cuda:0'), covar=tensor([0.0157, 0.0120, 0.0144, 0.0137, 0.0112, 0.0136, 0.0220, 0.0147], device='cuda:0'), in_proj_covar=tensor([0.0025, 0.0024, 0.0025, 0.0026, 0.0023, 0.0023, 0.0025, 0.0022], device='cuda:0'), out_proj_covar=tensor([2.6566e-05, 2.4296e-05, 2.8028e-05, 2.6568e-05, 2.4536e-05, 2.4172e-05, 3.2258e-05, 2.4413e-05], device='cuda:0') 2022-11-15 14:08:24,041 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8566, 2.4547, 1.8698, 3.3090, 3.7904, 3.5084, 2.7799, 4.0079], device='cuda:0'), covar=tensor([0.0150, 0.1057, 0.1179, 0.0271, 0.0226, 0.0547, 0.1005, 0.0171], device='cuda:0'), in_proj_covar=tensor([0.0039, 0.0079, 0.0072, 0.0040, 0.0049, 0.0064, 0.0068, 0.0046], device='cuda:0'), out_proj_covar=tensor([3.8315e-05, 8.5408e-05, 7.4110e-05, 3.9042e-05, 4.8441e-05, 6.5603e-05, 7.2192e-05, 4.4577e-05], device='cuda:0') 2022-11-15 14:08:42,336 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.32 vs. limit=5.0 2022-11-15 14:09:01,066 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2022-11-15 14:09:21,850 INFO [train.py:876] (0/4) Epoch 1, batch 5100, loss[loss=0.224, simple_loss=0.2074, pruned_loss=0.1203, over 5737.00 frames. ], tot_loss[loss=0.2916, simple_loss=0.2454, pruned_loss=0.1689, over 1083722.12 frames. ], batch size: 13, lr: 4.18e-02, grad_scale: 16.0 2022-11-15 14:09:22,488 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 2.161e+02 2.601e+02 3.354e+02 8.150e+02, threshold=5.203e+02, percent-clipped=5.0 2022-11-15 14:09:40,418 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7657, 2.4134, 1.4593, 1.8588, 2.1583, 1.9084, 2.7586, 2.0359], device='cuda:0'), covar=tensor([0.0089, 0.0162, 0.0318, 0.0435, 0.0353, 0.0230, 0.0244, 0.0363], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0022, 0.0025, 0.0025, 0.0027, 0.0024, 0.0029, 0.0025], device='cuda:0'), out_proj_covar=tensor([2.1233e-05, 2.3966e-05, 2.8613e-05, 2.8647e-05, 3.2615e-05, 2.9858e-05, 3.4374e-05, 2.9308e-05], device='cuda:0') 2022-11-15 14:10:09,104 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=10.85 vs. limit=5.0 2022-11-15 14:10:27,392 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=7.57 vs. limit=5.0 2022-11-15 14:10:29,563 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.66 vs. limit=5.0 2022-11-15 14:10:33,657 INFO [train.py:876] (0/4) Epoch 1, batch 5200, loss[loss=0.2419, simple_loss=0.2043, pruned_loss=0.1397, over 5365.00 frames. ], tot_loss[loss=0.2916, simple_loss=0.2453, pruned_loss=0.169, over 1085000.29 frames. ], batch size: 6, lr: 4.16e-02, grad_scale: 16.0 2022-11-15 14:10:34,300 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.259e+02 2.030e+02 2.662e+02 3.916e+02 1.299e+03, threshold=5.323e+02, percent-clipped=9.0 2022-11-15 14:10:53,987 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4394, 1.5727, 1.7969, 2.0372, 2.5536, 1.6007, 1.2200, 2.6196], device='cuda:0'), covar=tensor([0.0158, 0.0795, 0.0631, 0.0326, 0.0207, 0.0679, 0.0950, 0.0179], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0090, 0.0080, 0.0045, 0.0054, 0.0071, 0.0079, 0.0050], device='cuda:0'), out_proj_covar=tensor([4.3229e-05, 9.8562e-05, 8.3055e-05, 4.6535e-05, 5.3320e-05, 7.5251e-05, 8.5019e-05, 4.6911e-05], device='cuda:0') 2022-11-15 14:10:59,199 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.57 vs. limit=5.0 2022-11-15 14:11:41,132 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2022-11-15 14:11:46,391 INFO [train.py:876] (0/4) Epoch 1, batch 5300, loss[loss=0.2685, simple_loss=0.2316, pruned_loss=0.1527, over 5512.00 frames. ], tot_loss[loss=0.2902, simple_loss=0.2448, pruned_loss=0.1678, over 1088317.15 frames. ], batch size: 17, lr: 4.14e-02, grad_scale: 16.0 2022-11-15 14:11:47,374 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 2.088e+02 2.621e+02 3.205e+02 6.242e+02, threshold=5.243e+02, percent-clipped=4.0 2022-11-15 14:12:00,415 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.68 vs. limit=5.0 2022-11-15 14:12:32,367 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.92 vs. limit=2.0 2022-11-15 14:13:00,084 INFO [train.py:876] (0/4) Epoch 1, batch 5400, loss[loss=0.2785, simple_loss=0.2295, pruned_loss=0.1637, over 5186.00 frames. ], tot_loss[loss=0.2865, simple_loss=0.2424, pruned_loss=0.1653, over 1089099.10 frames. ], batch size: 91, lr: 4.12e-02, grad_scale: 16.0 2022-11-15 14:13:00,710 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 2.076e+02 2.742e+02 3.471e+02 5.546e+02, threshold=5.484e+02, percent-clipped=1.0 2022-11-15 14:13:51,467 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-15 14:14:12,220 INFO [train.py:876] (0/4) Epoch 1, batch 5500, loss[loss=0.2989, simple_loss=0.2464, pruned_loss=0.1756, over 5639.00 frames. ], tot_loss[loss=0.287, simple_loss=0.2428, pruned_loss=0.1656, over 1088068.03 frames. ], batch size: 32, lr: 4.10e-02, grad_scale: 16.0 2022-11-15 14:14:12,892 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 2.146e+02 2.749e+02 3.970e+02 7.189e+02, threshold=5.498e+02, percent-clipped=5.0 2022-11-15 14:15:10,384 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0 2022-11-15 14:15:25,183 INFO [train.py:876] (0/4) Epoch 1, batch 5600, loss[loss=0.2682, simple_loss=0.2316, pruned_loss=0.1524, over 5369.00 frames. ], tot_loss[loss=0.2901, simple_loss=0.245, pruned_loss=0.1676, over 1078000.71 frames. ], batch size: 9, lr: 4.08e-02, grad_scale: 16.0 2022-11-15 14:15:26,174 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 2.152e+02 2.832e+02 3.606e+02 7.262e+02, threshold=5.664e+02, percent-clipped=5.0 2022-11-15 14:15:28,864 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2022-11-15 14:15:52,391 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-15 14:16:00,757 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=5650.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 14:16:06,609 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.11 vs. limit=2.0 2022-11-15 14:16:19,250 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2284, 1.5743, 2.9654, 2.2428, 2.5644, 2.7179, 2.8653, 2.8219], device='cuda:0'), covar=tensor([0.0147, 0.1649, 0.0281, 0.0819, 0.0334, 0.0419, 0.0337, 0.0425], device='cuda:0'), in_proj_covar=tensor([0.0033, 0.0083, 0.0042, 0.0060, 0.0040, 0.0049, 0.0046, 0.0041], device='cuda:0'), out_proj_covar=tensor([3.8024e-05, 9.6731e-05, 4.7218e-05, 6.9425e-05, 4.5658e-05, 5.5478e-05, 5.2827e-05, 4.9432e-05], device='cuda:0') 2022-11-15 14:16:37,326 INFO [train.py:876] (0/4) Epoch 1, batch 5700, loss[loss=0.254, simple_loss=0.2206, pruned_loss=0.1437, over 4981.00 frames. ], tot_loss[loss=0.2895, simple_loss=0.2447, pruned_loss=0.1672, over 1080435.82 frames. ], batch size: 7, lr: 4.06e-02, grad_scale: 16.0 2022-11-15 14:16:37,963 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 2.164e+02 2.765e+02 3.457e+02 8.983e+02, threshold=5.530e+02, percent-clipped=5.0 2022-11-15 14:16:44,908 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5711.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 14:16:50,168 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.5671, 1.5014, 1.6913, 1.4057, 1.9973, 1.5666, 0.9251, 0.3575], device='cuda:0'), covar=tensor([0.0506, 0.0323, 0.0141, 0.0710, 0.0278, 0.0365, 0.0547, 0.0893], device='cuda:0'), in_proj_covar=tensor([0.0021, 0.0018, 0.0019, 0.0025, 0.0021, 0.0022, 0.0023, 0.0021], device='cuda:0'), out_proj_covar=tensor([2.5708e-05, 2.3476e-05, 2.3127e-05, 3.7295e-05, 2.6483e-05, 2.6777e-05, 3.0287e-05, 2.7852e-05], device='cuda:0') 2022-11-15 14:16:57,943 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.97 vs. limit=2.0 2022-11-15 14:17:16,880 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=5755.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:17:30,173 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2244, 4.1343, 4.3815, 3.9707, 4.5432, 4.1893, 4.0146, 3.8665], device='cuda:0'), covar=tensor([0.0534, 0.0489, 0.0511, 0.0443, 0.0630, 0.0397, 0.0403, 0.0509], device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0053, 0.0042, 0.0048, 0.0053, 0.0037, 0.0040, 0.0040], device='cuda:0'), out_proj_covar=tensor([8.7081e-05, 8.6064e-05, 6.8099e-05, 7.4338e-05, 1.0258e-04, 6.0357e-05, 6.5572e-05, 6.8165e-05], device='cuda:0') 2022-11-15 14:17:30,283 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.6755, 1.1983, 0.9414, 0.8225, 1.2804, 1.1542, 0.8674, 0.5555], device='cuda:0'), covar=tensor([0.0416, 0.0213, 0.0242, 0.0546, 0.0295, 0.0239, 0.0393, 0.0729], device='cuda:0'), in_proj_covar=tensor([0.0021, 0.0018, 0.0019, 0.0026, 0.0020, 0.0021, 0.0023, 0.0022], device='cuda:0'), out_proj_covar=tensor([2.5985e-05, 2.3632e-05, 2.3384e-05, 3.8272e-05, 2.6198e-05, 2.5123e-05, 2.9705e-05, 2.8425e-05], device='cuda:0') 2022-11-15 14:17:48,664 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.21 vs. limit=2.0 2022-11-15 14:17:50,566 INFO [train.py:876] (0/4) Epoch 1, batch 5800, loss[loss=0.267, simple_loss=0.252, pruned_loss=0.141, over 5498.00 frames. ], tot_loss[loss=0.2877, simple_loss=0.2437, pruned_loss=0.1658, over 1083810.46 frames. ], batch size: 17, lr: 4.04e-02, grad_scale: 16.0 2022-11-15 14:17:51,229 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.770e+01 1.984e+02 2.593e+02 3.696e+02 7.124e+02, threshold=5.186e+02, percent-clipped=5.0 2022-11-15 14:18:01,249 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5816.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:18:30,781 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.93 vs. limit=5.0 2022-11-15 14:18:31,837 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=5858.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:18:36,904 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.04 vs. limit=2.0 2022-11-15 14:19:03,093 INFO [train.py:876] (0/4) Epoch 1, batch 5900, loss[loss=0.227, simple_loss=0.2142, pruned_loss=0.1199, over 5697.00 frames. ], tot_loss[loss=0.2837, simple_loss=0.241, pruned_loss=0.1632, over 1085749.45 frames. ], batch size: 11, lr: 4.02e-02, grad_scale: 16.0 2022-11-15 14:19:03,748 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.869e+02 2.719e+02 3.366e+02 7.828e+02, threshold=5.439e+02, percent-clipped=3.0 2022-11-15 14:19:12,006 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7494, 0.8641, 0.6960, 0.4768, 0.7426, 0.7144, 0.5592, 0.5823], device='cuda:0'), covar=tensor([0.0037, 0.0043, 0.0056, 0.0045, 0.0040, 0.0036, 0.0072, 0.0047], device='cuda:0'), in_proj_covar=tensor([0.0026, 0.0024, 0.0025, 0.0024, 0.0023, 0.0022, 0.0024, 0.0021], device='cuda:0'), out_proj_covar=tensor([2.8252e-05, 2.7694e-05, 2.9685e-05, 2.8197e-05, 2.6518e-05, 2.5492e-05, 3.2877e-05, 2.4577e-05], device='cuda:0') 2022-11-15 14:19:16,327 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=5919.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:19:26,898 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.19 vs. limit=5.0 2022-11-15 14:19:34,997 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 14:19:43,144 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1270, 1.6812, 1.8487, 2.6780, 2.5955, 2.7251, 2.2535, 3.1103], device='cuda:0'), covar=tensor([0.0091, 0.0772, 0.0651, 0.0160, 0.0143, 0.0326, 0.0767, 0.0099], device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0100, 0.0096, 0.0054, 0.0066, 0.0088, 0.0105, 0.0056], device='cuda:0'), out_proj_covar=tensor([5.1824e-05, 1.1305e-04, 1.0427e-04, 6.0071e-05, 6.7837e-05, 9.7718e-05, 1.1432e-04, 5.7516e-05], device='cuda:0') 2022-11-15 14:19:44,025 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.00 vs. limit=2.0 2022-11-15 14:20:06,202 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5243, 4.6360, 4.5473, 4.4674, 4.1417, 4.5952, 3.0750, 4.0670], device='cuda:0'), covar=tensor([0.0196, 0.0216, 0.0129, 0.0174, 0.0224, 0.0256, 0.0710, 0.0242], device='cuda:0'), in_proj_covar=tensor([0.0035, 0.0033, 0.0031, 0.0026, 0.0030, 0.0028, 0.0049, 0.0032], device='cuda:0'), out_proj_covar=tensor([5.3016e-05, 4.9327e-05, 4.6849e-05, 3.9050e-05, 4.5890e-05, 4.1427e-05, 7.2075e-05, 4.9204e-05], device='cuda:0') 2022-11-15 14:20:07,708 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=5990.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:20:15,907 INFO [train.py:876] (0/4) Epoch 1, batch 6000, loss[loss=0.2994, simple_loss=0.2466, pruned_loss=0.1761, over 5486.00 frames. ], tot_loss[loss=0.2816, simple_loss=0.239, pruned_loss=0.162, over 1085512.22 frames. ], batch size: 58, lr: 4.00e-02, grad_scale: 16.0 2022-11-15 14:20:15,908 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 14:20:24,838 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2568, 4.3389, 4.2400, 3.6202, 4.2359, 4.5217, 4.0281, 3.6882], device='cuda:0'), covar=tensor([0.0695, 0.0461, 0.0619, 0.0742, 0.1055, 0.0216, 0.0454, 0.0611], device='cuda:0'), in_proj_covar=tensor([0.0052, 0.0053, 0.0045, 0.0049, 0.0053, 0.0040, 0.0043, 0.0042], device='cuda:0'), out_proj_covar=tensor([9.2733e-05, 8.7861e-05, 7.5447e-05, 7.6983e-05, 1.0534e-04, 6.4029e-05, 7.0708e-05, 7.3182e-05], device='cuda:0') 2022-11-15 14:20:34,715 INFO [train.py:908] (0/4) Epoch 1, validation: loss=0.2263, simple_loss=0.2274, pruned_loss=0.1126, over 1530663.00 frames. 2022-11-15 14:20:34,716 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4494MB 2022-11-15 14:20:35,399 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 2.347e+02 2.873e+02 3.885e+02 1.859e+03, threshold=5.746e+02, percent-clipped=5.0 2022-11-15 14:20:38,382 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6006.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 14:20:40,173 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 14:21:05,045 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6043.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:21:11,063 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6051.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:21:16,051 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5476, 4.9938, 3.3772, 2.1259, 4.4872, 3.7086, 3.8532, 3.2616], device='cuda:0'), covar=tensor([0.0231, 0.0100, 0.0348, 0.1653, 0.0135, 0.0327, 0.0187, 0.0410], device='cuda:0'), in_proj_covar=tensor([0.0044, 0.0031, 0.0029, 0.0062, 0.0033, 0.0036, 0.0027, 0.0037], device='cuda:0'), out_proj_covar=tensor([7.2200e-05, 4.9411e-05, 4.7965e-05, 9.6227e-05, 5.2802e-05, 5.8960e-05, 4.6806e-05, 6.0223e-05], device='cuda:0') 2022-11-15 14:21:42,631 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2022-11-15 14:21:47,064 INFO [train.py:876] (0/4) Epoch 1, batch 6100, loss[loss=0.3342, simple_loss=0.2868, pruned_loss=0.1909, over 5751.00 frames. ], tot_loss[loss=0.2818, simple_loss=0.2398, pruned_loss=0.1619, over 1085201.39 frames. ], batch size: 27, lr: 3.98e-02, grad_scale: 16.0 2022-11-15 14:21:47,728 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.385e+02 2.266e+02 2.673e+02 3.416e+02 6.924e+02, threshold=5.346e+02, percent-clipped=3.0 2022-11-15 14:21:49,343 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6104.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:21:54,398 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6111.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:22:11,646 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6135.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:22:12,263 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8727, 5.3997, 4.8632, 5.2923, 5.3117, 4.7827, 4.4997, 3.8652], device='cuda:0'), covar=tensor([0.0239, 0.0193, 0.0236, 0.0312, 0.0140, 0.0237, 0.0233, 0.0447], device='cuda:0'), in_proj_covar=tensor([0.0045, 0.0040, 0.0052, 0.0041, 0.0048, 0.0047, 0.0041, 0.0040], device='cuda:0'), out_proj_covar=tensor([6.5629e-05, 6.3781e-05, 7.5558e-05, 6.3307e-05, 7.1349e-05, 6.4997e-05, 6.0486e-05, 5.7412e-05], device='cuda:0') 2022-11-15 14:22:20,041 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.84 vs. limit=5.0 2022-11-15 14:22:29,242 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3649, 1.8503, 1.4711, 1.4609, 0.7491, 1.1762, 1.2689, 1.3586], device='cuda:0'), covar=tensor([0.0194, 0.0202, 0.0152, 0.0260, 0.0581, 0.0999, 0.0258, 0.0239], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0016, 0.0018, 0.0019, 0.0019, 0.0016, 0.0016, 0.0015], device='cuda:0'), out_proj_covar=tensor([2.2889e-05, 1.9832e-05, 2.0480e-05, 2.4763e-05, 2.5675e-05, 2.1564e-05, 2.0865e-05, 1.9436e-05], device='cuda:0') 2022-11-15 14:22:37,317 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6170.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:22:37,535 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.84 vs. limit=5.0 2022-11-15 14:22:55,822 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6196.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:22:59,790 INFO [train.py:876] (0/4) Epoch 1, batch 6200, loss[loss=0.3748, simple_loss=0.2996, pruned_loss=0.2251, over 5282.00 frames. ], tot_loss[loss=0.2826, simple_loss=0.2406, pruned_loss=0.1623, over 1081763.17 frames. ], batch size: 79, lr: 3.96e-02, grad_scale: 16.0 2022-11-15 14:23:00,451 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.942e+02 2.617e+02 4.109e+02 1.137e+03, threshold=5.234e+02, percent-clipped=10.0 2022-11-15 14:23:01,349 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6203.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:23:06,996 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1058, 2.5521, 2.1373, 1.4391, 2.4528, 1.9835, 2.4645, 2.1574], device='cuda:0'), covar=tensor([0.0383, 0.0137, 0.0162, 0.1245, 0.0171, 0.0318, 0.0193, 0.0299], device='cuda:0'), in_proj_covar=tensor([0.0046, 0.0032, 0.0028, 0.0064, 0.0035, 0.0037, 0.0028, 0.0039], device='cuda:0'), out_proj_covar=tensor([7.5489e-05, 5.0844e-05, 4.7932e-05, 1.0010e-04, 5.5389e-05, 6.1738e-05, 4.9770e-05, 6.5457e-05], device='cuda:0') 2022-11-15 14:23:07,036 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6211.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:23:09,417 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6214.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:23:15,448 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.09 vs. limit=2.0 2022-11-15 14:23:17,598 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9925, 1.5404, 1.3075, 1.5596, 0.5176, 1.2896, 1.0746, 1.0384], device='cuda:0'), covar=tensor([0.0191, 0.0161, 0.0129, 0.0154, 0.0337, 0.0322, 0.0333, 0.0203], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0016, 0.0017, 0.0018, 0.0018, 0.0015, 0.0016, 0.0015], device='cuda:0'), out_proj_covar=tensor([2.2183e-05, 1.9546e-05, 1.9513e-05, 2.3814e-05, 2.4110e-05, 2.1079e-05, 2.0919e-05, 1.9125e-05], device='cuda:0') 2022-11-15 14:23:21,828 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6231.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:23:38,181 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7341, 1.4343, 1.8200, 2.0428, 2.1340, 2.0556, 1.6676, 2.1348], device='cuda:0'), covar=tensor([0.0163, 0.0443, 0.0324, 0.0161, 0.0117, 0.0134, 0.0267, 0.0159], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0016, 0.0017, 0.0018, 0.0018, 0.0016, 0.0021, 0.0017], device='cuda:0'), out_proj_covar=tensor([2.0177e-05, 2.1772e-05, 2.0368e-05, 2.1149e-05, 2.2462e-05, 1.7543e-05, 2.7389e-05, 1.9657e-05], device='cuda:0') 2022-11-15 14:23:40,855 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9284, 4.2521, 3.7893, 3.8132, 3.6382, 3.5426, 2.3765, 4.1304], device='cuda:0'), covar=tensor([0.0269, 0.0278, 0.0312, 0.0191, 0.0198, 0.0404, 0.0918, 0.0157], device='cuda:0'), in_proj_covar=tensor([0.0036, 0.0032, 0.0031, 0.0027, 0.0030, 0.0028, 0.0051, 0.0033], device='cuda:0'), out_proj_covar=tensor([5.5401e-05, 4.8329e-05, 4.7667e-05, 4.1125e-05, 4.5829e-05, 4.3018e-05, 7.7073e-05, 5.0403e-05], device='cuda:0') 2022-11-15 14:23:45,460 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6264.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:23:46,892 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6266.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:23:51,491 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6272.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:23:52,422 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.12 vs. limit=2.0 2022-11-15 14:24:12,260 INFO [train.py:876] (0/4) Epoch 1, batch 6300, loss[loss=0.2828, simple_loss=0.2312, pruned_loss=0.1672, over 5698.00 frames. ], tot_loss[loss=0.2803, simple_loss=0.2393, pruned_loss=0.1606, over 1080915.58 frames. ], batch size: 36, lr: 3.94e-02, grad_scale: 32.0 2022-11-15 14:24:12,647 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.12 vs. limit=2.0 2022-11-15 14:24:12,910 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.397e+02 2.220e+02 2.802e+02 3.554e+02 1.076e+03, threshold=5.605e+02, percent-clipped=6.0 2022-11-15 14:24:15,833 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6306.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 14:24:21,133 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6451, 2.4985, 1.6875, 1.8842, 1.0360, 1.9186, 1.4142, 2.0320], device='cuda:0'), covar=tensor([0.0272, 0.0172, 0.0267, 0.0393, 0.0476, 0.0260, 0.0502, 0.0275], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0027, 0.0032, 0.0028, 0.0038, 0.0029, 0.0037, 0.0028], device='cuda:0'), out_proj_covar=tensor([3.5529e-05, 3.2267e-05, 4.1875e-05, 3.7789e-05, 5.2455e-05, 4.4409e-05, 4.9875e-05, 3.7891e-05], device='cuda:0') 2022-11-15 14:24:31,156 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6327.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:24:44,718 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6346.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:24:50,571 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6354.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 14:25:03,857 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2022-11-15 14:25:14,542 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8772, 2.1300, 2.4210, 2.1291, 2.0887, 2.7289, 2.4364, 2.5650], device='cuda:0'), covar=tensor([0.0758, 0.0328, 0.0359, 0.0439, 0.0466, 0.0241, 0.0359, 0.0148], device='cuda:0'), in_proj_covar=tensor([0.0058, 0.0037, 0.0037, 0.0030, 0.0049, 0.0037, 0.0044, 0.0031], device='cuda:0'), out_proj_covar=tensor([6.2535e-05, 3.9547e-05, 4.1082e-05, 3.5287e-05, 5.5108e-05, 3.9321e-05, 4.7270e-05, 3.2076e-05], device='cuda:0') 2022-11-15 14:25:22,863 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6399.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:25:24,487 INFO [train.py:876] (0/4) Epoch 1, batch 6400, loss[loss=0.1903, simple_loss=0.1856, pruned_loss=0.09752, over 5671.00 frames. ], tot_loss[loss=0.2808, simple_loss=0.2397, pruned_loss=0.1609, over 1085352.51 frames. ], batch size: 12, lr: 3.92e-02, grad_scale: 32.0 2022-11-15 14:25:25,172 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 2.235e+02 2.872e+02 3.964e+02 7.777e+02, threshold=5.745e+02, percent-clipped=4.0 2022-11-15 14:25:32,020 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6411.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:26:06,807 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6459.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:26:09,012 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6462.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:26:12,751 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6467.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:26:14,366 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 14:26:29,880 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.40 vs. limit=5.0 2022-11-15 14:26:30,099 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6491.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:26:36,862 INFO [train.py:876] (0/4) Epoch 1, batch 6500, loss[loss=0.3005, simple_loss=0.2408, pruned_loss=0.1802, over 4998.00 frames. ], tot_loss[loss=0.2818, simple_loss=0.24, pruned_loss=0.1617, over 1081009.46 frames. ], batch size: 109, lr: 3.90e-02, grad_scale: 32.0 2022-11-15 14:26:37,572 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.388e+02 2.150e+02 2.872e+02 3.674e+02 6.857e+02, threshold=5.744e+02, percent-clipped=4.0 2022-11-15 14:26:46,488 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6514.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:26:53,180 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6523.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 14:26:55,148 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6526.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:26:56,663 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6528.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:27:18,843 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6559.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:27:20,931 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6562.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:27:24,785 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6567.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:27:49,422 INFO [train.py:876] (0/4) Epoch 1, batch 6600, loss[loss=0.2749, simple_loss=0.2431, pruned_loss=0.1534, over 5605.00 frames. ], tot_loss[loss=0.2813, simple_loss=0.2397, pruned_loss=0.1614, over 1076689.65 frames. ], batch size: 23, lr: 3.89e-02, grad_scale: 32.0 2022-11-15 14:27:50,094 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 2.099e+02 2.757e+02 3.560e+02 8.696e+02, threshold=5.514e+02, percent-clipped=5.0 2022-11-15 14:28:04,922 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6622.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:28:09,780 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6629.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:28:22,500 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6646.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:28:54,045 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6690.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:28:56,687 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6694.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:29:00,629 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6699.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:29:01,744 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-15 14:29:01,856 INFO [train.py:876] (0/4) Epoch 1, batch 6700, loss[loss=0.3126, simple_loss=0.2593, pruned_loss=0.1829, over 5684.00 frames. ], tot_loss[loss=0.2774, simple_loss=0.237, pruned_loss=0.1589, over 1076174.36 frames. ], batch size: 19, lr: 3.87e-02, grad_scale: 16.0 2022-11-15 14:29:03,243 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.203e+02 2.211e+02 2.874e+02 3.707e+02 9.191e+02, threshold=5.749e+02, percent-clipped=7.0 2022-11-15 14:29:15,624 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8028, 4.0250, 4.0473, 4.0544, 3.4547, 3.3827, 4.4638, 3.8192], device='cuda:0'), covar=tensor([0.0610, 0.0820, 0.0476, 0.0609, 0.0843, 0.0548, 0.0882, 0.0462], device='cuda:0'), in_proj_covar=tensor([0.0049, 0.0071, 0.0058, 0.0063, 0.0045, 0.0039, 0.0066, 0.0052], device='cuda:0'), out_proj_covar=tensor([8.1471e-05, 1.1999e-04, 9.4806e-05, 1.0420e-04, 7.6471e-05, 6.5019e-05, 1.2446e-04, 8.6882e-05], device='cuda:0') 2022-11-15 14:29:34,900 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6747.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:30:06,617 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6791.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:30:13,990 INFO [train.py:876] (0/4) Epoch 1, batch 6800, loss[loss=0.2674, simple_loss=0.2383, pruned_loss=0.1482, over 5629.00 frames. ], tot_loss[loss=0.2789, simple_loss=0.2385, pruned_loss=0.1597, over 1078109.81 frames. ], batch size: 18, lr: 3.85e-02, grad_scale: 16.0 2022-11-15 14:30:15,299 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.257e+02 2.052e+02 2.561e+02 3.297e+02 6.876e+02, threshold=5.122e+02, percent-clipped=2.0 2022-11-15 14:30:26,392 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6818.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 14:30:30,172 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6823.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:30:32,303 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6826.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:30:41,203 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6839.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:30:57,026 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6859.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:31:02,565 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6867.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:31:07,534 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6874.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:31:10,668 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6878.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 14:31:23,036 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2508, 3.2249, 3.1586, 3.3976, 3.0035, 2.8080, 3.6410, 3.0746], device='cuda:0'), covar=tensor([0.0529, 0.0697, 0.0506, 0.0508, 0.0680, 0.0421, 0.0657, 0.0453], device='cuda:0'), in_proj_covar=tensor([0.0045, 0.0066, 0.0055, 0.0060, 0.0042, 0.0037, 0.0061, 0.0050], device='cuda:0'), out_proj_covar=tensor([7.6885e-05, 1.1070e-04, 9.0354e-05, 1.0021e-04, 7.3241e-05, 6.0660e-05, 1.1573e-04, 8.2863e-05], device='cuda:0') 2022-11-15 14:31:25,377 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.07 vs. limit=5.0 2022-11-15 14:31:26,638 INFO [train.py:876] (0/4) Epoch 1, batch 6900, loss[loss=0.2697, simple_loss=0.2316, pruned_loss=0.1539, over 5785.00 frames. ], tot_loss[loss=0.2772, simple_loss=0.2375, pruned_loss=0.1584, over 1079961.71 frames. ], batch size: 21, lr: 3.83e-02, grad_scale: 16.0 2022-11-15 14:31:27,997 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.342e+02 2.317e+02 3.048e+02 4.158e+02 6.462e+02, threshold=6.096e+02, percent-clipped=10.0 2022-11-15 14:31:30,925 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6907.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:31:32,403 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=6909.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:31:32,991 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.33 vs. limit=5.0 2022-11-15 14:31:36,814 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6915.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:31:41,780 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=6922.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:31:54,350 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6939.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 14:32:17,036 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=6970.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:32:17,170 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=6970.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:32:27,731 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=6985.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:32:38,867 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4274, 2.3972, 1.0554, 2.3310, 1.4160, 0.9827, 1.9555, 1.3805], device='cuda:0'), covar=tensor([0.0186, 0.0092, 0.0174, 0.0151, 0.0286, 0.1061, 0.0228, 0.0317], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0016, 0.0019, 0.0020, 0.0020, 0.0018, 0.0018, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.6488e-05, 2.0237e-05, 2.2696e-05, 2.7233e-05, 2.8170e-05, 2.4946e-05, 2.3518e-05, 2.5480e-05], device='cuda:0') 2022-11-15 14:32:39,433 INFO [train.py:876] (0/4) Epoch 1, batch 7000, loss[loss=0.3174, simple_loss=0.2622, pruned_loss=0.1863, over 5536.00 frames. ], tot_loss[loss=0.2774, simple_loss=0.2379, pruned_loss=0.1585, over 1082369.51 frames. ], batch size: 15, lr: 3.81e-02, grad_scale: 16.0 2022-11-15 14:32:40,781 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 2.319e+02 2.855e+02 3.574e+02 7.700e+02, threshold=5.709e+02, percent-clipped=2.0 2022-11-15 14:33:04,311 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1068, 1.8071, 2.2797, 3.3826, 3.4044, 2.9274, 2.3990, 3.5884], device='cuda:0'), covar=tensor([0.0222, 0.1617, 0.1314, 0.0324, 0.0295, 0.1025, 0.1607, 0.0166], device='cuda:0'), in_proj_covar=tensor([0.0063, 0.0128, 0.0128, 0.0065, 0.0083, 0.0125, 0.0141, 0.0071], device='cuda:0'), out_proj_covar=tensor([6.8532e-05, 1.4896e-04, 1.4667e-04, 7.7781e-05, 9.0665e-05, 1.4771e-04, 1.6143e-04, 7.4609e-05], device='cuda:0') 2022-11-15 14:33:10,514 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4080, 1.6822, 1.7311, 1.9775, 1.5398, 1.8101, 1.3430, 1.8959], device='cuda:0'), covar=tensor([0.0712, 0.0179, 0.0311, 0.0111, 0.0389, 0.0361, 0.1178, 0.0181], device='cuda:0'), in_proj_covar=tensor([0.0147, 0.0080, 0.0096, 0.0068, 0.0081, 0.0121, 0.0167, 0.0078], device='cuda:0'), out_proj_covar=tensor([1.6125e-04, 8.3729e-05, 1.0440e-04, 7.3211e-05, 9.2836e-05, 1.3816e-04, 1.8400e-04, 8.1894e-05], device='cuda:0') 2022-11-15 14:33:13,384 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7048.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 14:33:15,738 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.05 vs. limit=5.0 2022-11-15 14:33:47,603 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.15 vs. limit=5.0 2022-11-15 14:33:51,321 INFO [train.py:876] (0/4) Epoch 1, batch 7100, loss[loss=0.2844, simple_loss=0.2521, pruned_loss=0.1584, over 5547.00 frames. ], tot_loss[loss=0.2783, simple_loss=0.2392, pruned_loss=0.1587, over 1084323.20 frames. ], batch size: 21, lr: 3.79e-02, grad_scale: 16.0 2022-11-15 14:33:52,659 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 2.197e+02 2.721e+02 3.665e+02 9.993e+02, threshold=5.441e+02, percent-clipped=4.0 2022-11-15 14:33:56,933 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7109.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 14:34:05,346 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7118.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 14:34:09,105 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7123.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:34:34,318 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8064, 4.8124, 5.1435, 4.7097, 5.5214, 5.1020, 4.4345, 4.3674], device='cuda:0'), covar=tensor([0.0388, 0.0253, 0.0367, 0.0266, 0.0327, 0.0139, 0.0285, 0.0306], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0054, 0.0045, 0.0053, 0.0054, 0.0038, 0.0046, 0.0042], device='cuda:0'), out_proj_covar=tensor([1.0022e-04, 9.6415e-05, 8.2445e-05, 9.0712e-05, 1.1997e-04, 6.5270e-05, 8.2542e-05, 7.7442e-05], device='cuda:0') 2022-11-15 14:34:37,893 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7163.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:34:40,168 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7166.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:34:43,790 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7171.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:34:48,482 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-15 14:34:49,830 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 14:34:50,397 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=7.06 vs. limit=5.0 2022-11-15 14:34:54,748 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8664, 1.5863, 2.8726, 2.0847, 2.4977, 2.0359, 2.8566, 3.0922], device='cuda:0'), covar=tensor([0.0113, 0.1185, 0.0144, 0.0637, 0.0203, 0.0560, 0.0239, 0.0155], device='cuda:0'), in_proj_covar=tensor([0.0044, 0.0107, 0.0053, 0.0081, 0.0048, 0.0073, 0.0063, 0.0057], device='cuda:0'), out_proj_covar=tensor([5.5992e-05, 1.3438e-04, 6.8585e-05, 1.0243e-04, 6.1980e-05, 9.3585e-05, 8.0899e-05, 7.4111e-05], device='cuda:0') 2022-11-15 14:35:05,693 INFO [train.py:876] (0/4) Epoch 1, batch 7200, loss[loss=0.2018, simple_loss=0.1902, pruned_loss=0.1067, over 5449.00 frames. ], tot_loss[loss=0.2753, simple_loss=0.2372, pruned_loss=0.1567, over 1079264.54 frames. ], batch size: 10, lr: 3.78e-02, grad_scale: 16.0 2022-11-15 14:35:07,001 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.380e+02 2.253e+02 2.788e+02 3.499e+02 9.174e+02, threshold=5.576e+02, percent-clipped=8.0 2022-11-15 14:35:22,002 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7224.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 14:35:29,020 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7234.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 14:35:40,251 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.02 vs. limit=5.0 2022-11-15 14:35:48,199 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0337, 1.3472, 1.4887, 1.2235, 0.8655, 1.3298, 1.5684, 1.2181], device='cuda:0'), covar=tensor([0.0253, 0.0175, 0.0163, 0.0226, 0.0422, 0.0280, 0.0211, 0.0219], device='cuda:0'), in_proj_covar=tensor([0.0021, 0.0017, 0.0020, 0.0022, 0.0020, 0.0018, 0.0018, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.9233e-05, 2.1647e-05, 2.4546e-05, 2.9425e-05, 2.9865e-05, 2.6509e-05, 2.3752e-05, 2.6064e-05], device='cuda:0') 2022-11-15 14:35:50,128 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7265.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:35:56,849 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-1.pt 2022-11-15 14:37:25,309 INFO [train.py:876] (0/4) Epoch 2, batch 0, loss[loss=0.3208, simple_loss=0.2584, pruned_loss=0.1916, over 5428.00 frames. ], tot_loss[loss=0.3208, simple_loss=0.2584, pruned_loss=0.1916, over 5428.00 frames. ], batch size: 58, lr: 3.69e-02, grad_scale: 16.0 2022-11-15 14:37:25,311 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 14:37:42,502 INFO [train.py:908] (0/4) Epoch 2, validation: loss=0.2258, simple_loss=0.228, pruned_loss=0.1118, over 1530663.00 frames. 2022-11-15 14:37:42,502 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4494MB 2022-11-15 14:37:44,077 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7275.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:37:51,154 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7285.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:38:04,591 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.318e+02 2.115e+02 2.889e+02 4.195e+02 1.182e+03, threshold=5.778e+02, percent-clipped=11.0 2022-11-15 14:38:07,749 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4707, 1.4325, 1.5716, 1.9673, 1.2353, 1.4002, 1.5038, 1.3845], device='cuda:0'), covar=tensor([0.0104, 0.0046, 0.0060, 0.0033, 0.0171, 0.0057, 0.0091, 0.0048], device='cuda:0'), in_proj_covar=tensor([0.0065, 0.0037, 0.0040, 0.0036, 0.0058, 0.0039, 0.0050, 0.0033], device='cuda:0'), out_proj_covar=tensor([7.5635e-05, 4.2540e-05, 4.7291e-05, 4.6461e-05, 7.4584e-05, 4.3950e-05, 5.9840e-05, 3.7481e-05], device='cuda:0') 2022-11-15 14:38:26,542 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7333.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:38:28,806 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7336.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:38:55,140 INFO [train.py:876] (0/4) Epoch 2, batch 100, loss[loss=0.2247, simple_loss=0.2013, pruned_loss=0.1241, over 5506.00 frames. ], tot_loss[loss=0.2792, simple_loss=0.2398, pruned_loss=0.1593, over 431580.34 frames. ], batch size: 10, lr: 3.67e-02, grad_scale: 16.0 2022-11-15 14:39:03,499 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.04 vs. limit=2.0 2022-11-15 14:39:17,529 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.188e+01 2.195e+02 2.755e+02 3.428e+02 7.515e+02, threshold=5.510e+02, percent-clipped=5.0 2022-11-15 14:39:18,277 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7404.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 14:39:20,779 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7407.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:39:23,637 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9005, 1.1954, 0.8465, 1.2534, 1.6262, 0.8010, 1.0184, 0.9220], device='cuda:0'), covar=tensor([0.0661, 0.0174, 0.0257, 0.0704, 0.0220, 0.0192, 0.0551, 0.0343], device='cuda:0'), in_proj_covar=tensor([0.0020, 0.0019, 0.0019, 0.0023, 0.0018, 0.0019, 0.0021, 0.0020], device='cuda:0'), out_proj_covar=tensor([2.9374e-05, 2.7400e-05, 2.8058e-05, 3.8010e-05, 2.8140e-05, 2.5588e-05, 3.2247e-05, 2.7444e-05], device='cuda:0') 2022-11-15 14:39:27,944 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0 2022-11-15 14:40:05,192 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7468.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:40:08,500 INFO [train.py:876] (0/4) Epoch 2, batch 200, loss[loss=0.3292, simple_loss=0.2786, pruned_loss=0.1899, over 5700.00 frames. ], tot_loss[loss=0.2729, simple_loss=0.2369, pruned_loss=0.1545, over 692589.93 frames. ], batch size: 28, lr: 3.66e-02, grad_scale: 16.0 2022-11-15 14:40:30,170 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 2.136e+02 2.623e+02 3.249e+02 5.222e+02, threshold=5.245e+02, percent-clipped=0.0 2022-11-15 14:40:41,701 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7519.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 14:40:52,511 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7534.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 14:41:06,022 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.21 vs. limit=2.0 2022-11-15 14:41:06,880 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-15 14:41:14,920 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7565.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:41:20,231 INFO [train.py:876] (0/4) Epoch 2, batch 300, loss[loss=0.3202, simple_loss=0.262, pruned_loss=0.1892, over 5585.00 frames. ], tot_loss[loss=0.2744, simple_loss=0.2372, pruned_loss=0.1558, over 850694.47 frames. ], batch size: 43, lr: 3.64e-02, grad_scale: 16.0 2022-11-15 14:41:27,227 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7582.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 14:41:34,450 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.40 vs. limit=5.0 2022-11-15 14:41:41,824 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9587, 1.1387, 1.0563, 0.5524, 1.1110, 1.5636, 0.8306, 0.6780], device='cuda:0'), covar=tensor([0.0247, 0.0091, 0.0243, 0.0134, 0.0101, 0.0160, 0.0189, 0.0349], device='cuda:0'), in_proj_covar=tensor([0.0026, 0.0021, 0.0021, 0.0022, 0.0022, 0.0021, 0.0023, 0.0024], device='cuda:0'), out_proj_covar=tensor([3.0952e-05, 2.5065e-05, 2.9293e-05, 2.5439e-05, 2.5132e-05, 2.6411e-05, 3.7644e-05, 3.0158e-05], device='cuda:0') 2022-11-15 14:41:42,306 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 2.168e+02 2.646e+02 3.466e+02 1.431e+03, threshold=5.292e+02, percent-clipped=6.0 2022-11-15 14:41:49,771 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7613.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:41:57,607 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7624.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:42:02,262 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7631.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:42:14,431 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.41 vs. limit=5.0 2022-11-15 14:42:17,790 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4970, 1.9458, 2.3852, 3.3562, 3.8344, 3.1817, 2.3675, 3.8635], device='cuda:0'), covar=tensor([0.0138, 0.1506, 0.1397, 0.0286, 0.0178, 0.1008, 0.1496, 0.0084], device='cuda:0'), in_proj_covar=tensor([0.0069, 0.0132, 0.0134, 0.0070, 0.0080, 0.0126, 0.0142, 0.0070], device='cuda:0'), out_proj_covar=tensor([7.8700e-05, 1.5765e-04, 1.5516e-04, 8.6324e-05, 9.0673e-05, 1.5124e-04, 1.6732e-04, 7.6052e-05], device='cuda:0') 2022-11-15 14:42:33,000 INFO [train.py:876] (0/4) Epoch 2, batch 400, loss[loss=0.2373, simple_loss=0.2232, pruned_loss=0.1257, over 5759.00 frames. ], tot_loss[loss=0.2713, simple_loss=0.2352, pruned_loss=0.1537, over 944045.73 frames. ], batch size: 20, lr: 3.62e-02, grad_scale: 16.0 2022-11-15 14:42:38,697 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4922, 3.6124, 3.4667, 3.7774, 3.3778, 2.8022, 3.9344, 3.2017], device='cuda:0'), covar=tensor([0.0518, 0.0672, 0.0563, 0.0529, 0.0475, 0.0436, 0.0665, 0.0650], device='cuda:0'), in_proj_covar=tensor([0.0048, 0.0070, 0.0058, 0.0064, 0.0041, 0.0038, 0.0065, 0.0050], device='cuda:0'), out_proj_covar=tensor([8.4651e-05, 1.2748e-04, 1.0140e-04, 1.1305e-04, 7.5490e-05, 6.6273e-05, 1.3014e-04, 8.7246e-05], device='cuda:0') 2022-11-15 14:42:41,564 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7685.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:42:54,873 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 2.314e+02 2.984e+02 3.754e+02 8.890e+02, threshold=5.969e+02, percent-clipped=7.0 2022-11-15 14:42:55,018 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8063, 1.3617, 1.1317, 1.1679, 1.1246, 1.1362, 1.2248, 0.9028], device='cuda:0'), covar=tensor([0.0284, 0.0244, 0.0270, 0.0247, 0.0329, 0.0535, 0.0291, 0.0269], device='cuda:0'), in_proj_covar=tensor([0.0020, 0.0019, 0.0021, 0.0021, 0.0019, 0.0018, 0.0019, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.7163e-05, 2.3762e-05, 2.7188e-05, 2.9051e-05, 2.8916e-05, 2.7455e-05, 2.6279e-05, 2.7611e-05], device='cuda:0') 2022-11-15 14:42:55,760 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7704.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 14:42:58,974 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.28 vs. limit=2.0 2022-11-15 14:43:23,431 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7743.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:43:29,846 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7752.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 14:43:32,142 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 14:43:37,924 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7763.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:43:44,746 INFO [train.py:876] (0/4) Epoch 2, batch 500, loss[loss=0.241, simple_loss=0.2188, pruned_loss=0.1316, over 5722.00 frames. ], tot_loss[loss=0.269, simple_loss=0.2336, pruned_loss=0.1522, over 994183.14 frames. ], batch size: 12, lr: 3.61e-02, grad_scale: 16.0 2022-11-15 14:44:06,087 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=7802.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:44:06,570 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.399e+02 2.366e+02 3.140e+02 3.903e+02 7.653e+02, threshold=6.280e+02, percent-clipped=5.0 2022-11-15 14:44:07,475 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7804.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:44:18,466 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7819.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 14:44:28,516 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5293, 4.2057, 3.8539, 4.2196, 4.1759, 3.7006, 3.2675, 3.2924], device='cuda:0'), covar=tensor([0.0363, 0.0323, 0.0299, 0.0289, 0.0322, 0.0398, 0.0404, 0.0448], device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0043, 0.0058, 0.0043, 0.0062, 0.0056, 0.0050, 0.0046], device='cuda:0'), out_proj_covar=tensor([8.2623e-05, 7.7441e-05, 9.3044e-05, 7.5056e-05, 1.0925e-04, 8.7597e-05, 8.3425e-05, 7.4172e-05], device='cuda:0') 2022-11-15 14:44:43,420 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.43 vs. limit=5.0 2022-11-15 14:44:45,379 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 14:44:50,027 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=7863.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:44:52,614 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7867.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:44:56,654 INFO [train.py:876] (0/4) Epoch 2, batch 600, loss[loss=0.3171, simple_loss=0.2529, pruned_loss=0.1907, over 5390.00 frames. ], tot_loss[loss=0.2713, simple_loss=0.2351, pruned_loss=0.1537, over 1025627.44 frames. ], batch size: 70, lr: 3.59e-02, grad_scale: 16.0 2022-11-15 14:45:18,185 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 2.116e+02 2.659e+02 3.486e+02 9.417e+02, threshold=5.318e+02, percent-clipped=5.0 2022-11-15 14:45:26,852 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9284, 4.1319, 3.7989, 4.0254, 4.0205, 4.2145, 2.0439, 3.7536], device='cuda:0'), covar=tensor([0.0322, 0.0362, 0.0386, 0.0189, 0.0256, 0.0211, 0.1571, 0.0355], device='cuda:0'), in_proj_covar=tensor([0.0042, 0.0040, 0.0034, 0.0029, 0.0037, 0.0032, 0.0063, 0.0040], device='cuda:0'), out_proj_covar=tensor([7.0997e-05, 7.1434e-05, 5.8549e-05, 4.9377e-05, 6.2172e-05, 5.4509e-05, 1.0541e-04, 6.8567e-05], device='cuda:0') 2022-11-15 14:45:38,492 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=7931.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:45:46,999 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.76 vs. limit=5.0 2022-11-15 14:46:08,129 INFO [train.py:876] (0/4) Epoch 2, batch 700, loss[loss=0.338, simple_loss=0.2742, pruned_loss=0.2009, over 5337.00 frames. ], tot_loss[loss=0.2729, simple_loss=0.2369, pruned_loss=0.1544, over 1051840.07 frames. ], batch size: 70, lr: 3.57e-02, grad_scale: 16.0 2022-11-15 14:46:12,998 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=7979.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:46:13,699 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=7980.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:46:17,646 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.72 vs. limit=5.0 2022-11-15 14:46:30,161 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.456e+02 2.448e+02 3.316e+02 4.282e+02 8.235e+02, threshold=6.631e+02, percent-clipped=7.0 2022-11-15 14:46:51,793 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.68 vs. limit=2.0 2022-11-15 14:47:06,732 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7236, 4.6577, 3.9895, 4.7290, 4.6518, 4.0890, 3.8097, 3.5639], device='cuda:0'), covar=tensor([0.0323, 0.0171, 0.0339, 0.0195, 0.0138, 0.0247, 0.0259, 0.0359], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0045, 0.0062, 0.0047, 0.0064, 0.0059, 0.0052, 0.0047], device='cuda:0'), out_proj_covar=tensor([8.8044e-05, 8.3001e-05, 1.0165e-04, 8.0992e-05, 1.1379e-04, 9.5698e-05, 8.7267e-05, 7.6063e-05], device='cuda:0') 2022-11-15 14:47:13,893 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8063.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:47:14,957 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.62 vs. limit=5.0 2022-11-15 14:47:20,645 INFO [train.py:876] (0/4) Epoch 2, batch 800, loss[loss=0.1912, simple_loss=0.1764, pruned_loss=0.103, over 5731.00 frames. ], tot_loss[loss=0.2704, simple_loss=0.235, pruned_loss=0.1529, over 1064771.68 frames. ], batch size: 9, lr: 3.56e-02, grad_scale: 16.0 2022-11-15 14:47:39,435 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=8099.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:47:42,123 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 2.291e+02 2.781e+02 3.438e+02 1.081e+03, threshold=5.561e+02, percent-clipped=3.0 2022-11-15 14:47:48,146 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8111.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:47:57,745 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8164, 1.4003, 1.0878, 0.7690, 1.0866, 1.1124, 1.0479, 1.0784], device='cuda:0'), covar=tensor([0.0565, 0.0142, 0.0223, 0.0788, 0.0652, 0.0328, 0.0345, 0.0339], device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0016, 0.0015, 0.0021, 0.0014, 0.0016, 0.0017, 0.0015], device='cuda:0'), out_proj_covar=tensor([2.2650e-05, 2.3071e-05, 2.4027e-05, 3.5835e-05, 2.2293e-05, 2.3189e-05, 2.6817e-05, 2.2917e-05], device='cuda:0') 2022-11-15 14:47:59,950 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9774, 2.9800, 2.8801, 3.0887, 2.2871, 2.4859, 1.7594, 2.7374], device='cuda:0'), covar=tensor([0.1504, 0.0217, 0.0331, 0.0152, 0.0485, 0.0775, 0.2365, 0.0212], device='cuda:0'), in_proj_covar=tensor([0.0156, 0.0083, 0.0101, 0.0073, 0.0085, 0.0126, 0.0175, 0.0081], device='cuda:0'), out_proj_covar=tensor([1.7522e-04, 8.7607e-05, 1.1545e-04, 8.1666e-05, 1.0142e-04, 1.4842e-04, 1.9381e-04, 8.7549e-05], device='cuda:0') 2022-11-15 14:48:16,196 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.73 vs. limit=5.0 2022-11-15 14:48:22,104 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=8158.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:48:32,983 INFO [train.py:876] (0/4) Epoch 2, batch 900, loss[loss=0.3338, simple_loss=0.2726, pruned_loss=0.1975, over 5705.00 frames. ], tot_loss[loss=0.2692, simple_loss=0.2346, pruned_loss=0.1519, over 1079344.03 frames. ], batch size: 34, lr: 3.54e-02, grad_scale: 16.0 2022-11-15 14:48:44,384 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.34 vs. limit=5.0 2022-11-15 14:48:51,396 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1737, 1.3815, 1.4684, 1.9306, 2.0080, 1.5620, 1.2095, 1.7887], device='cuda:0'), covar=tensor([0.0086, 0.0802, 0.0712, 0.0271, 0.0139, 0.0893, 0.1303, 0.0159], device='cuda:0'), in_proj_covar=tensor([0.0068, 0.0148, 0.0145, 0.0081, 0.0085, 0.0143, 0.0155, 0.0077], device='cuda:0'), out_proj_covar=tensor([8.0079e-05, 1.7736e-04, 1.7159e-04, 1.0230e-04, 9.8097e-05, 1.7432e-04, 1.8471e-04, 8.8145e-05], device='cuda:0') 2022-11-15 14:48:54,941 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 2.386e+02 2.834e+02 3.764e+02 8.164e+02, threshold=5.667e+02, percent-clipped=3.0 2022-11-15 14:49:39,593 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=8265.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:49:39,606 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0960, 0.8670, 0.9263, 0.6670, 1.1027, 0.8238, 0.8378, 1.2486], device='cuda:0'), covar=tensor([0.0289, 0.0206, 0.0236, 0.0421, 0.0284, 0.0176, 0.0233, 0.0250], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0015, 0.0014, 0.0019, 0.0014, 0.0014, 0.0016, 0.0014], device='cuda:0'), out_proj_covar=tensor([2.1073e-05, 2.2572e-05, 2.2861e-05, 3.2769e-05, 2.2515e-05, 2.1669e-05, 2.4982e-05, 2.1657e-05], device='cuda:0') 2022-11-15 14:49:41,216 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.17 vs. limit=5.0 2022-11-15 14:49:44,956 INFO [train.py:876] (0/4) Epoch 2, batch 1000, loss[loss=0.2031, simple_loss=0.1879, pruned_loss=0.1092, over 5448.00 frames. ], tot_loss[loss=0.2712, simple_loss=0.2361, pruned_loss=0.1532, over 1077781.32 frames. ], batch size: 10, lr: 3.53e-02, grad_scale: 16.0 2022-11-15 14:49:50,693 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8280.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:50:06,026 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.08 vs. limit=5.0 2022-11-15 14:50:07,376 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.298e+02 2.271e+02 2.772e+02 3.875e+02 7.231e+02, threshold=5.545e+02, percent-clipped=6.0 2022-11-15 14:50:23,534 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=8326.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:50:24,724 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8328.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:50:57,474 INFO [train.py:876] (0/4) Epoch 2, batch 1100, loss[loss=0.2891, simple_loss=0.241, pruned_loss=0.1686, over 5525.00 frames. ], tot_loss[loss=0.2689, simple_loss=0.2343, pruned_loss=0.1517, over 1078105.32 frames. ], batch size: 46, lr: 3.51e-02, grad_scale: 16.0 2022-11-15 14:51:06,108 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.11 vs. limit=2.0 2022-11-15 14:51:09,406 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.88 vs. limit=5.0 2022-11-15 14:51:16,584 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8399.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:51:19,489 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.385e+02 2.261e+02 2.575e+02 3.836e+02 7.235e+02, threshold=5.150e+02, percent-clipped=6.0 2022-11-15 14:51:51,358 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8447.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:51:51,535 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6020, 1.8736, 3.5300, 2.6223, 3.1489, 2.5791, 3.8916, 3.8748], device='cuda:0'), covar=tensor([0.0086, 0.1383, 0.0227, 0.0883, 0.0178, 0.0679, 0.0199, 0.0150], device='cuda:0'), in_proj_covar=tensor([0.0054, 0.0120, 0.0070, 0.0103, 0.0061, 0.0099, 0.0076, 0.0062], device='cuda:0'), out_proj_covar=tensor([7.5089e-05, 1.6206e-04, 9.8154e-05, 1.3726e-04, 8.4261e-05, 1.3470e-04, 1.0528e-04, 8.8504e-05], device='cuda:0') 2022-11-15 14:51:58,970 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8458.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:52:02,572 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.08 vs. limit=2.0 2022-11-15 14:52:09,821 INFO [train.py:876] (0/4) Epoch 2, batch 1200, loss[loss=0.2372, simple_loss=0.2082, pruned_loss=0.1331, over 5588.00 frames. ], tot_loss[loss=0.269, simple_loss=0.2342, pruned_loss=0.1519, over 1083481.26 frames. ], batch size: 22, lr: 3.50e-02, grad_scale: 16.0 2022-11-15 14:52:31,194 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 2.113e+02 2.806e+02 3.522e+02 6.703e+02, threshold=5.613e+02, percent-clipped=5.0 2022-11-15 14:52:33,373 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8506.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:52:35,835 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-15 14:52:38,589 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 14:52:52,892 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.42 vs. limit=5.0 2022-11-15 14:52:58,979 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1971, 2.0460, 2.3020, 3.5530, 3.3871, 2.9038, 2.4227, 3.0042], device='cuda:0'), covar=tensor([0.0160, 0.1532, 0.1138, 0.0179, 0.0134, 0.0804, 0.1218, 0.0152], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0149, 0.0152, 0.0080, 0.0088, 0.0146, 0.0155, 0.0081], device='cuda:0'), out_proj_covar=tensor([8.6453e-05, 1.8158e-04, 1.8206e-04, 1.0240e-04, 1.0330e-04, 1.8151e-04, 1.8802e-04, 9.4125e-05], device='cuda:0') 2022-11-15 14:53:20,999 INFO [train.py:876] (0/4) Epoch 2, batch 1300, loss[loss=0.234, simple_loss=0.213, pruned_loss=0.1275, over 5589.00 frames. ], tot_loss[loss=0.2705, simple_loss=0.235, pruned_loss=0.153, over 1079805.08 frames. ], batch size: 16, lr: 3.48e-02, grad_scale: 16.0 2022-11-15 14:53:22,272 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.08 vs. limit=2.0 2022-11-15 14:53:42,885 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 2.077e+02 2.771e+02 3.615e+02 8.724e+02, threshold=5.542e+02, percent-clipped=7.0 2022-11-15 14:53:56,854 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=8621.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:54:35,336 INFO [train.py:876] (0/4) Epoch 2, batch 1400, loss[loss=0.2492, simple_loss=0.2269, pruned_loss=0.1358, over 5385.00 frames. ], tot_loss[loss=0.2683, simple_loss=0.2341, pruned_loss=0.1512, over 1083396.89 frames. ], batch size: 9, lr: 3.46e-02, grad_scale: 32.0 2022-11-15 14:54:56,930 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.304e+02 2.372e+02 3.042e+02 3.801e+02 7.959e+02, threshold=6.083e+02, percent-clipped=7.0 2022-11-15 14:55:12,410 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4861, 4.1185, 4.4223, 4.0331, 4.7504, 4.4264, 4.1527, 4.2196], device='cuda:0'), covar=tensor([0.0410, 0.0426, 0.0476, 0.0347, 0.0338, 0.0174, 0.0258, 0.0360], device='cuda:0'), in_proj_covar=tensor([0.0063, 0.0065, 0.0052, 0.0063, 0.0062, 0.0040, 0.0052, 0.0051], device='cuda:0'), out_proj_covar=tensor([1.3075e-04, 1.2378e-04, 1.0353e-04, 1.1576e-04, 1.4079e-04, 7.3973e-05, 9.8824e-05, 1.0230e-04], device='cuda:0') 2022-11-15 14:55:37,201 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=8760.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:55:46,190 INFO [train.py:876] (0/4) Epoch 2, batch 1500, loss[loss=0.2614, simple_loss=0.2372, pruned_loss=0.1428, over 5699.00 frames. ], tot_loss[loss=0.2641, simple_loss=0.2312, pruned_loss=0.1485, over 1086642.72 frames. ], batch size: 34, lr: 3.45e-02, grad_scale: 32.0 2022-11-15 14:55:54,872 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.18 vs. limit=5.0 2022-11-15 14:56:08,246 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.328e+02 2.321e+02 2.844e+02 3.403e+02 6.170e+02, threshold=5.688e+02, percent-clipped=1.0 2022-11-15 14:56:20,992 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=8821.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:56:34,887 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.33 vs. limit=5.0 2022-11-15 14:56:57,231 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-15 14:56:57,534 INFO [train.py:876] (0/4) Epoch 2, batch 1600, loss[loss=0.2771, simple_loss=0.2431, pruned_loss=0.1556, over 5278.00 frames. ], tot_loss[loss=0.263, simple_loss=0.2301, pruned_loss=0.1479, over 1082559.60 frames. ], batch size: 79, lr: 3.44e-02, grad_scale: 16.0 2022-11-15 14:56:57,670 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1139, 1.0817, 1.1122, 1.7483, 0.7092, 1.3689, 1.1574, 1.1996], device='cuda:0'), covar=tensor([0.0245, 0.0637, 0.0213, 0.0222, 0.0623, 0.1415, 0.0608, 0.0298], device='cuda:0'), in_proj_covar=tensor([0.0022, 0.0021, 0.0021, 0.0022, 0.0022, 0.0018, 0.0020, 0.0019], device='cuda:0'), out_proj_covar=tensor([3.3942e-05, 2.8292e-05, 2.7618e-05, 3.2024e-05, 3.4062e-05, 2.9437e-05, 2.8539e-05, 2.8640e-05], device='cuda:0') 2022-11-15 14:57:19,297 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.565e+02 2.087e+02 2.971e+02 3.839e+02 7.053e+02, threshold=5.941e+02, percent-clipped=2.0 2022-11-15 14:57:26,404 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6468, 1.2369, 1.7966, 1.4254, 2.2354, 2.4007, 1.9623, 1.9494], device='cuda:0'), covar=tensor([0.0165, 0.0840, 0.0239, 0.0443, 0.0122, 0.0147, 0.0180, 0.0126], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0018, 0.0016, 0.0022, 0.0018, 0.0017, 0.0019, 0.0017], device='cuda:0'), out_proj_covar=tensor([2.2974e-05, 2.5091e-05, 2.3355e-05, 2.9042e-05, 2.4699e-05, 2.2674e-05, 2.5783e-05, 2.1256e-05], device='cuda:0') 2022-11-15 14:57:27,598 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9546, 2.3518, 2.2705, 2.7515, 1.7427, 2.5207, 1.9998, 1.7182], device='cuda:0'), covar=tensor([0.0240, 0.0071, 0.0091, 0.0099, 0.0275, 0.0063, 0.0148, 0.0081], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0043, 0.0043, 0.0043, 0.0078, 0.0044, 0.0057, 0.0038], device='cuda:0'), out_proj_covar=tensor([9.8527e-05, 5.7582e-05, 5.7274e-05, 6.3496e-05, 1.1182e-04, 5.6948e-05, 7.7985e-05, 4.9988e-05], device='cuda:0') 2022-11-15 14:57:31,961 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=8921.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:57:36,139 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=8927.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:58:05,411 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=8969.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:58:08,099 INFO [train.py:876] (0/4) Epoch 2, batch 1700, loss[loss=0.2637, simple_loss=0.2351, pruned_loss=0.1461, over 5736.00 frames. ], tot_loss[loss=0.26, simple_loss=0.2283, pruned_loss=0.1459, over 1079607.61 frames. ], batch size: 20, lr: 3.42e-02, grad_scale: 16.0 2022-11-15 14:58:18,144 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=8986.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:58:19,532 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=8988.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:58:30,484 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.407e+02 2.261e+02 2.879e+02 3.540e+02 8.492e+02, threshold=5.758e+02, percent-clipped=3.0 2022-11-15 14:58:37,630 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9013.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:59:01,660 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9047.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:59:20,368 INFO [train.py:876] (0/4) Epoch 2, batch 1800, loss[loss=0.2857, simple_loss=0.2418, pruned_loss=0.1648, over 5680.00 frames. ], tot_loss[loss=0.2615, simple_loss=0.23, pruned_loss=0.1465, over 1085769.14 frames. ], batch size: 36, lr: 3.41e-02, grad_scale: 16.0 2022-11-15 14:59:21,219 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9074.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 14:59:27,109 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.72 vs. limit=2.0 2022-11-15 14:59:42,158 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 2.362e+02 3.022e+02 3.932e+02 1.031e+03, threshold=6.044e+02, percent-clipped=5.0 2022-11-15 14:59:50,855 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9116.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:00:31,253 INFO [train.py:876] (0/4) Epoch 2, batch 1900, loss[loss=0.2036, simple_loss=0.1951, pruned_loss=0.106, over 5569.00 frames. ], tot_loss[loss=0.2612, simple_loss=0.2299, pruned_loss=0.1462, over 1088899.00 frames. ], batch size: 15, lr: 3.39e-02, grad_scale: 16.0 2022-11-15 15:00:33,105 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2095, 1.4820, 1.3710, 2.1765, 1.7528, 1.0971, 1.4651, 1.7197], device='cuda:0'), covar=tensor([0.0618, 0.0334, 0.0583, 0.0294, 0.0337, 0.0530, 0.0444, 0.0340], device='cuda:0'), in_proj_covar=tensor([0.0033, 0.0034, 0.0039, 0.0030, 0.0041, 0.0034, 0.0041, 0.0026], device='cuda:0'), out_proj_covar=tensor([5.0352e-05, 5.1462e-05, 6.4945e-05, 4.7608e-05, 7.0757e-05, 5.9602e-05, 6.7084e-05, 4.2535e-05], device='cuda:0') 2022-11-15 15:00:52,750 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1825, 1.3750, 0.6645, 0.7378, 1.1377, 0.9483, 0.8355, 0.7192], device='cuda:0'), covar=tensor([0.0208, 0.0125, 0.0783, 0.0556, 0.0278, 0.0385, 0.0287, 0.0455], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0015, 0.0016, 0.0019, 0.0013, 0.0014, 0.0016, 0.0015], device='cuda:0'), out_proj_covar=tensor([2.3732e-05, 2.4137e-05, 2.8148e-05, 3.6073e-05, 2.4560e-05, 2.2689e-05, 2.7826e-05, 2.4553e-05], device='cuda:0') 2022-11-15 15:00:53,891 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 2.307e+02 3.025e+02 3.862e+02 6.126e+02, threshold=6.049e+02, percent-clipped=1.0 2022-11-15 15:01:09,891 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6483, 5.0224, 5.0573, 5.0283, 4.1744, 3.1998, 5.4834, 4.6060], device='cuda:0'), covar=tensor([0.0383, 0.0485, 0.0323, 0.0356, 0.0397, 0.0483, 0.0504, 0.0287], device='cuda:0'), in_proj_covar=tensor([0.0048, 0.0069, 0.0057, 0.0066, 0.0041, 0.0040, 0.0071, 0.0053], device='cuda:0'), out_proj_covar=tensor([9.1684e-05, 1.3112e-04, 1.0815e-04, 1.2273e-04, 8.1302e-05, 7.6071e-05, 1.4949e-04, 9.9562e-05], device='cuda:0') 2022-11-15 15:01:10,287 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 15:01:21,452 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.10 vs. limit=5.0 2022-11-15 15:01:22,404 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0112, 3.7261, 3.3729, 3.6428, 3.1841, 2.6727, 2.1149, 3.2000], device='cuda:0'), covar=tensor([0.2120, 0.0212, 0.0531, 0.0225, 0.0394, 0.1117, 0.3130, 0.0204], device='cuda:0'), in_proj_covar=tensor([0.0155, 0.0085, 0.0109, 0.0078, 0.0093, 0.0136, 0.0173, 0.0081], device='cuda:0'), out_proj_covar=tensor([1.7764e-04, 9.3703e-05, 1.2955e-04, 9.0815e-05, 1.1168e-04, 1.6195e-04, 1.9507e-04, 9.1513e-05], device='cuda:0') 2022-11-15 15:01:42,873 INFO [train.py:876] (0/4) Epoch 2, batch 2000, loss[loss=0.2976, simple_loss=0.2694, pruned_loss=0.1629, over 5558.00 frames. ], tot_loss[loss=0.2623, simple_loss=0.231, pruned_loss=0.1469, over 1088498.78 frames. ], batch size: 22, lr: 3.38e-02, grad_scale: 16.0 2022-11-15 15:01:50,671 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9283.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:01:51,607 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 15:02:05,764 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.305e+02 2.275e+02 2.942e+02 3.786e+02 7.709e+02, threshold=5.884e+02, percent-clipped=5.0 2022-11-15 15:02:11,082 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.12 vs. limit=2.0 2022-11-15 15:02:17,099 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9336, 1.3285, 0.9636, 1.6475, 1.2001, 1.1546, 0.6888, 0.9487], device='cuda:0'), covar=tensor([0.0174, 0.0167, 0.0277, 0.0103, 0.0152, 0.0133, 0.0820, 0.0575], device='cuda:0'), in_proj_covar=tensor([0.0025, 0.0022, 0.0020, 0.0023, 0.0022, 0.0023, 0.0024, 0.0021], device='cuda:0'), out_proj_covar=tensor([3.1599e-05, 2.7684e-05, 2.9814e-05, 2.7354e-05, 2.8474e-05, 3.0017e-05, 3.9808e-05, 2.7400e-05], device='cuda:0') 2022-11-15 15:02:32,847 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9342.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:02:52,308 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9369.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:02:54,973 INFO [train.py:876] (0/4) Epoch 2, batch 2100, loss[loss=0.2222, simple_loss=0.1971, pruned_loss=0.1237, over 4757.00 frames. ], tot_loss[loss=0.2621, simple_loss=0.2311, pruned_loss=0.1465, over 1076759.12 frames. ], batch size: 135, lr: 3.36e-02, grad_scale: 16.0 2022-11-15 15:03:17,094 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 2.437e+02 2.901e+02 3.645e+02 9.793e+02, threshold=5.801e+02, percent-clipped=2.0 2022-11-15 15:03:24,895 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9414.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:03:24,988 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.79 vs. limit=5.0 2022-11-15 15:03:26,179 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=9416.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:03:35,870 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9466, 3.5313, 2.8469, 0.8460, 3.5084, 3.8121, 2.4888, 3.8947], device='cuda:0'), covar=tensor([0.0768, 0.0270, 0.0344, 0.1027, 0.0080, 0.0043, 0.0173, 0.0061], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0083, 0.0060, 0.0100, 0.0056, 0.0055, 0.0053, 0.0058], device='cuda:0'), out_proj_covar=tensor([1.3912e-04, 1.0919e-04, 8.6458e-05, 1.3401e-04, 7.2338e-05, 7.2869e-05, 7.1836e-05, 7.2258e-05], device='cuda:0') 2022-11-15 15:04:00,503 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=9464.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:04:01,567 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.10 vs. limit=2.0 2022-11-15 15:04:06,699 INFO [train.py:876] (0/4) Epoch 2, batch 2200, loss[loss=0.2108, simple_loss=0.1929, pruned_loss=0.1144, over 5172.00 frames. ], tot_loss[loss=0.2615, simple_loss=0.2304, pruned_loss=0.1463, over 1076334.38 frames. ], batch size: 7, lr: 3.35e-02, grad_scale: 16.0 2022-11-15 15:04:08,203 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9475.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:04:14,500 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 15:04:28,437 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.967e+01 2.258e+02 2.836e+02 4.027e+02 8.312e+02, threshold=5.673e+02, percent-clipped=7.0 2022-11-15 15:04:45,071 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.72 vs. limit=5.0 2022-11-15 15:05:18,107 INFO [train.py:876] (0/4) Epoch 2, batch 2300, loss[loss=0.2199, simple_loss=0.2111, pruned_loss=0.1144, over 5763.00 frames. ], tot_loss[loss=0.2607, simple_loss=0.2293, pruned_loss=0.1461, over 1074748.44 frames. ], batch size: 14, lr: 3.34e-02, grad_scale: 16.0 2022-11-15 15:05:19,230 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.19 vs. limit=2.0 2022-11-15 15:05:25,125 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=9583.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:05:25,857 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9584.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:05:39,846 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 2.289e+02 3.013e+02 4.083e+02 8.581e+02, threshold=6.026e+02, percent-clipped=8.0 2022-11-15 15:05:59,625 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=9631.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:06:05,696 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 15:06:07,368 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=9642.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:06:09,473 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9645.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:06:10,581 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.26 vs. limit=2.0 2022-11-15 15:06:18,529 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9658.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:06:24,779 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=3.00 vs. limit=2.0 2022-11-15 15:06:26,534 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=9669.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:06:29,226 INFO [train.py:876] (0/4) Epoch 2, batch 2400, loss[loss=0.2471, simple_loss=0.2312, pruned_loss=0.1315, over 5486.00 frames. ], tot_loss[loss=0.2602, simple_loss=0.2295, pruned_loss=0.1455, over 1078808.77 frames. ], batch size: 12, lr: 3.32e-02, grad_scale: 16.0 2022-11-15 15:06:41,887 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=9690.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:06:51,268 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.316e+02 2.170e+02 2.571e+02 3.474e+02 5.585e+02, threshold=5.143e+02, percent-clipped=0.0 2022-11-15 15:06:53,575 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4667, 1.8130, 1.4604, 1.6466, 1.7353, 1.6660, 1.0428, 2.1445], device='cuda:0'), covar=tensor([0.0193, 0.0326, 0.0426, 0.0103, 0.0134, 0.0187, 0.0204, 0.0129], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0017, 0.0018, 0.0019, 0.0018, 0.0018, 0.0019, 0.0016], device='cuda:0'), out_proj_covar=tensor([2.4334e-05, 2.4671e-05, 2.7058e-05, 2.5086e-05, 2.6073e-05, 2.3708e-05, 2.6684e-05, 2.0754e-05], device='cuda:0') 2022-11-15 15:06:55,945 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.58 vs. limit=5.0 2022-11-15 15:07:00,758 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=9717.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:07:02,280 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9719.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:07:08,934 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.23 vs. limit=2.0 2022-11-15 15:07:19,402 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=9743.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:07:27,391 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4728, 4.7692, 4.4705, 4.6932, 4.5785, 4.3322, 1.5726, 3.9417], device='cuda:0'), covar=tensor([0.0169, 0.0097, 0.0137, 0.0060, 0.0129, 0.0239, 0.2049, 0.0267], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0046, 0.0042, 0.0034, 0.0047, 0.0037, 0.0085, 0.0050], device='cuda:0'), out_proj_covar=tensor([9.8121e-05, 8.4860e-05, 7.5505e-05, 6.2355e-05, 8.3988e-05, 6.7451e-05, 1.4653e-04, 9.1407e-05], device='cuda:0') 2022-11-15 15:07:38,161 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9770.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:07:40,530 INFO [train.py:876] (0/4) Epoch 2, batch 2500, loss[loss=0.2701, simple_loss=0.2166, pruned_loss=0.1618, over 3087.00 frames. ], tot_loss[loss=0.2581, simple_loss=0.228, pruned_loss=0.1441, over 1079092.96 frames. ], batch size: 284, lr: 3.31e-02, grad_scale: 16.0 2022-11-15 15:08:03,223 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.240e+02 2.186e+02 2.866e+02 3.924e+02 6.368e+02, threshold=5.732e+02, percent-clipped=5.0 2022-11-15 15:08:03,452 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=9804.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:08:25,691 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-15 15:08:52,036 INFO [train.py:876] (0/4) Epoch 2, batch 2600, loss[loss=0.2382, simple_loss=0.2032, pruned_loss=0.1366, over 5295.00 frames. ], tot_loss[loss=0.2595, simple_loss=0.2285, pruned_loss=0.1452, over 1082955.50 frames. ], batch size: 79, lr: 3.30e-02, grad_scale: 16.0 2022-11-15 15:08:56,720 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7987, 4.4604, 4.0870, 4.2035, 3.8780, 3.1188, 2.1928, 3.9627], device='cuda:0'), covar=tensor([0.1522, 0.0102, 0.0375, 0.0267, 0.0229, 0.0896, 0.2919, 0.0219], device='cuda:0'), in_proj_covar=tensor([0.0155, 0.0088, 0.0115, 0.0079, 0.0096, 0.0134, 0.0173, 0.0082], device='cuda:0'), out_proj_covar=tensor([1.8041e-04, 1.0030e-04, 1.3814e-04, 9.4051e-05, 1.1699e-04, 1.6102e-04, 1.9631e-04, 9.4866e-05], device='cuda:0') 2022-11-15 15:09:07,807 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-15 15:09:14,801 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.239e+02 2.199e+02 2.978e+02 3.710e+02 9.077e+02, threshold=5.957e+02, percent-clipped=5.0 2022-11-15 15:09:19,185 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9352, 0.8659, 1.2315, 1.4595, 0.7875, 0.6606, 0.7179, 0.6911], device='cuda:0'), covar=tensor([0.0377, 0.0734, 0.0124, 0.0142, 0.1181, 0.0244, 0.0857, 0.0695], device='cuda:0'), in_proj_covar=tensor([0.0024, 0.0022, 0.0021, 0.0022, 0.0023, 0.0022, 0.0023, 0.0021], device='cuda:0'), out_proj_covar=tensor([3.1499e-05, 3.0381e-05, 3.0735e-05, 2.6092e-05, 3.0826e-05, 2.8805e-05, 3.7701e-05, 2.8984e-05], device='cuda:0') 2022-11-15 15:09:40,255 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=9940.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:09:40,292 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7387, 4.7563, 3.8408, 1.9018, 4.9365, 2.6971, 4.3843, 2.9378], device='cuda:0'), covar=tensor([0.0694, 0.0202, 0.0217, 0.2367, 0.0091, 0.1213, 0.0183, 0.1510], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0056, 0.0048, 0.0094, 0.0055, 0.0084, 0.0041, 0.0087], device='cuda:0'), out_proj_covar=tensor([1.8099e-04, 1.1620e-04, 1.0280e-04, 1.9070e-04, 1.1084e-04, 1.7297e-04, 9.2394e-05, 1.8185e-04], device='cuda:0') 2022-11-15 15:09:44,787 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4249, 4.6653, 4.4474, 4.2804, 4.4039, 4.1593, 1.9440, 4.4217], device='cuda:0'), covar=tensor([0.0159, 0.0147, 0.0144, 0.0153, 0.0174, 0.0262, 0.1901, 0.0174], device='cuda:0'), in_proj_covar=tensor([0.0052, 0.0045, 0.0043, 0.0033, 0.0048, 0.0037, 0.0086, 0.0049], device='cuda:0'), out_proj_covar=tensor([9.6782e-05, 8.2437e-05, 7.9483e-05, 6.0551e-05, 8.7161e-05, 6.9333e-05, 1.4842e-04, 9.1231e-05], device='cuda:0') 2022-11-15 15:10:03,769 INFO [train.py:876] (0/4) Epoch 2, batch 2700, loss[loss=0.2845, simple_loss=0.2343, pruned_loss=0.1673, over 5448.00 frames. ], tot_loss[loss=0.2565, simple_loss=0.2273, pruned_loss=0.1429, over 1083075.44 frames. ], batch size: 58, lr: 3.28e-02, grad_scale: 16.0 2022-11-15 15:10:04,850 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.33 vs. limit=2.0 2022-11-15 15:10:23,340 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-10000.pt 2022-11-15 15:10:29,550 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.282e+02 2.293e+02 2.993e+02 4.046e+02 1.330e+03, threshold=5.986e+02, percent-clipped=8.0 2022-11-15 15:10:37,230 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10014.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:10:54,452 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10039.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:11:10,952 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10062.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 15:11:16,391 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10070.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:11:18,623 INFO [train.py:876] (0/4) Epoch 2, batch 2800, loss[loss=0.2462, simple_loss=0.2198, pruned_loss=0.1363, over 5577.00 frames. ], tot_loss[loss=0.2545, simple_loss=0.2259, pruned_loss=0.1415, over 1087649.45 frames. ], batch size: 13, lr: 3.27e-02, grad_scale: 16.0 2022-11-15 15:11:36,440 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10099.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:11:37,193 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10100.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 15:11:39,748 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.304e+02 2.201e+02 2.831e+02 3.552e+02 8.014e+02, threshold=5.662e+02, percent-clipped=2.0 2022-11-15 15:11:49,980 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10118.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:11:53,501 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10123.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 15:11:59,808 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.34 vs. limit=5.0 2022-11-15 15:12:15,615 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 15:12:29,743 INFO [train.py:876] (0/4) Epoch 2, batch 2900, loss[loss=0.2577, simple_loss=0.2364, pruned_loss=0.1395, over 5580.00 frames. ], tot_loss[loss=0.254, simple_loss=0.2255, pruned_loss=0.1413, over 1084948.35 frames. ], batch size: 22, lr: 3.26e-02, grad_scale: 16.0 2022-11-15 15:12:42,390 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 15:12:52,051 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 2.141e+02 2.737e+02 3.549e+02 7.365e+02, threshold=5.475e+02, percent-clipped=2.0 2022-11-15 15:12:55,049 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9043, 3.6130, 3.4138, 3.8029, 3.8098, 3.3092, 3.2619, 2.9783], device='cuda:0'), covar=tensor([0.0948, 0.0326, 0.0408, 0.0230, 0.0230, 0.0352, 0.0314, 0.0590], device='cuda:0'), in_proj_covar=tensor([0.0058, 0.0057, 0.0079, 0.0059, 0.0075, 0.0072, 0.0062, 0.0054], device='cuda:0'), out_proj_covar=tensor([1.0606e-04, 1.1458e-04, 1.3682e-04, 1.1244e-04, 1.4565e-04, 1.1959e-04, 1.0966e-04, 9.4234e-05], device='cuda:0') 2022-11-15 15:13:18,139 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10240.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:13:38,872 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2124, 3.1087, 3.1291, 2.7394, 2.4455, 3.5692, 2.7230, 2.9594], device='cuda:0'), covar=tensor([0.0284, 0.0101, 0.0096, 0.0276, 0.0315, 0.0060, 0.0183, 0.0054], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0047, 0.0053, 0.0050, 0.0086, 0.0052, 0.0067, 0.0044], device='cuda:0'), out_proj_covar=tensor([1.1929e-04, 6.7150e-05, 7.4542e-05, 7.8060e-05, 1.3222e-04, 6.8745e-05, 9.8085e-05, 6.1538e-05], device='cuda:0') 2022-11-15 15:13:41,346 INFO [train.py:876] (0/4) Epoch 2, batch 3000, loss[loss=0.3268, simple_loss=0.2574, pruned_loss=0.1981, over 4953.00 frames. ], tot_loss[loss=0.2528, simple_loss=0.2248, pruned_loss=0.1404, over 1085307.36 frames. ], batch size: 109, lr: 3.24e-02, grad_scale: 16.0 2022-11-15 15:13:41,347 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 15:13:46,744 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8280, 3.2806, 2.6159, 1.7001, 3.3768, 1.4266, 3.3933, 2.3876], device='cuda:0'), covar=tensor([0.0586, 0.0196, 0.0356, 0.2181, 0.0164, 0.1380, 0.0114, 0.1233], device='cuda:0'), in_proj_covar=tensor([0.0090, 0.0056, 0.0049, 0.0094, 0.0056, 0.0089, 0.0043, 0.0088], device='cuda:0'), out_proj_covar=tensor([1.9104e-04, 1.1816e-04, 1.0936e-04, 1.9249e-04, 1.1056e-04, 1.8433e-04, 9.6911e-05, 1.8694e-04], device='cuda:0') 2022-11-15 15:13:47,742 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2907, 3.8125, 3.7534, 3.8606, 3.3612, 2.9529, 2.2526, 3.2858], device='cuda:0'), covar=tensor([0.2682, 0.0277, 0.0452, 0.0174, 0.0437, 0.1254, 0.3108, 0.0208], device='cuda:0'), in_proj_covar=tensor([0.0163, 0.0090, 0.0122, 0.0080, 0.0099, 0.0140, 0.0176, 0.0084], device='cuda:0'), out_proj_covar=tensor([1.9129e-04, 1.0250e-04, 1.4983e-04, 9.6769e-05, 1.2265e-04, 1.6920e-04, 2.0274e-04, 9.7667e-05], device='cuda:0') 2022-11-15 15:13:49,926 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0147, 1.0720, 1.6304, 1.0430, 0.5149, 1.1005, 1.2838, 0.8771], device='cuda:0'), covar=tensor([0.0318, 0.0566, 0.0247, 0.0443, 0.1545, 0.0823, 0.0395, 0.0598], device='cuda:0'), in_proj_covar=tensor([0.0024, 0.0022, 0.0023, 0.0025, 0.0023, 0.0018, 0.0022, 0.0022], device='cuda:0'), out_proj_covar=tensor([3.8000e-05, 3.2319e-05, 3.0317e-05, 3.6622e-05, 3.9071e-05, 3.0658e-05, 3.3691e-05, 3.2866e-05], device='cuda:0') 2022-11-15 15:13:52,905 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1958, 2.4958, 2.0742, 2.7736, 1.4088, 2.3894, 1.9418, 1.9773], device='cuda:0'), covar=tensor([0.0045, 0.0023, 0.0046, 0.0035, 0.0118, 0.0035, 0.0055, 0.0031], device='cuda:0'), in_proj_covar=tensor([0.0083, 0.0046, 0.0052, 0.0050, 0.0086, 0.0051, 0.0067, 0.0044], device='cuda:0'), out_proj_covar=tensor([1.1843e-04, 6.6768e-05, 7.4193e-05, 7.7527e-05, 1.3132e-04, 6.8366e-05, 9.7583e-05, 6.1282e-05], device='cuda:0') 2022-11-15 15:14:00,267 INFO [train.py:908] (0/4) Epoch 2, validation: loss=0.2049, simple_loss=0.215, pruned_loss=0.09736, over 1530663.00 frames. 2022-11-15 15:14:00,267 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 15:14:02,115 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 15:14:10,874 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10288.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:14:11,643 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8719, 4.2707, 3.7155, 3.6944, 3.9277, 3.9700, 1.5640, 4.0165], device='cuda:0'), covar=tensor([0.0281, 0.0195, 0.0264, 0.0201, 0.0243, 0.0218, 0.2116, 0.0228], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0044, 0.0044, 0.0034, 0.0050, 0.0037, 0.0085, 0.0050], device='cuda:0'), out_proj_covar=tensor([9.8019e-05, 8.1951e-05, 8.1061e-05, 6.3354e-05, 9.0374e-05, 6.8739e-05, 1.4756e-04, 9.4017e-05], device='cuda:0') 2022-11-15 15:14:13,041 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5206, 3.1170, 2.4436, 1.6665, 3.0329, 1.2322, 3.0668, 1.8980], device='cuda:0'), covar=tensor([0.0597, 0.0153, 0.0321, 0.1691, 0.0159, 0.1318, 0.0115, 0.1077], device='cuda:0'), in_proj_covar=tensor([0.0090, 0.0057, 0.0050, 0.0094, 0.0056, 0.0089, 0.0044, 0.0089], device='cuda:0'), out_proj_covar=tensor([1.9346e-04, 1.2018e-04, 1.1033e-04, 1.9426e-04, 1.1195e-04, 1.8572e-04, 9.8762e-05, 1.8813e-04], device='cuda:0') 2022-11-15 15:14:22,008 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 2.226e+02 2.767e+02 3.573e+02 6.449e+02, threshold=5.534e+02, percent-clipped=5.0 2022-11-15 15:14:28,949 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10314.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:14:59,757 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6400, 4.3034, 3.7304, 4.4028, 4.3475, 3.8365, 3.8933, 3.3868], device='cuda:0'), covar=tensor([0.0403, 0.0338, 0.0397, 0.0341, 0.0364, 0.0369, 0.0314, 0.0463], device='cuda:0'), in_proj_covar=tensor([0.0059, 0.0056, 0.0079, 0.0058, 0.0074, 0.0072, 0.0061, 0.0054], device='cuda:0'), out_proj_covar=tensor([1.0876e-04, 1.1327e-04, 1.3588e-04, 1.1129e-04, 1.4456e-04, 1.2038e-04, 1.0737e-04, 9.5154e-05], device='cuda:0') 2022-11-15 15:15:03,069 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10362.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:15:10,522 INFO [train.py:876] (0/4) Epoch 2, batch 3100, loss[loss=0.3227, simple_loss=0.2735, pruned_loss=0.186, over 5559.00 frames. ], tot_loss[loss=0.2527, simple_loss=0.2244, pruned_loss=0.1405, over 1085202.86 frames. ], batch size: 40, lr: 3.23e-02, grad_scale: 16.0 2022-11-15 15:15:26,633 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10395.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 15:15:29,517 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10399.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:15:33,065 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.883e+01 2.180e+02 2.990e+02 3.781e+02 9.963e+02, threshold=5.979e+02, percent-clipped=5.0 2022-11-15 15:15:36,991 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10409.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:15:42,650 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.84 vs. limit=5.0 2022-11-15 15:15:43,100 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10418.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 15:16:03,470 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5489, 1.2407, 2.0148, 1.4392, 1.2127, 1.0551, 1.9005, 0.9198], device='cuda:0'), covar=tensor([0.0137, 0.0160, 0.0105, 0.0207, 0.0316, 0.0961, 0.0172, 0.0184], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0021, 0.0022, 0.0024, 0.0022, 0.0019, 0.0021, 0.0020], device='cuda:0'), out_proj_covar=tensor([3.4852e-05, 3.0726e-05, 2.9649e-05, 3.5899e-05, 3.7105e-05, 3.1389e-05, 3.1940e-05, 3.1155e-05], device='cuda:0') 2022-11-15 15:16:04,086 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10447.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:16:05,787 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.10 vs. limit=2.0 2022-11-15 15:16:08,976 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10454.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:16:20,276 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10470.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:16:22,103 INFO [train.py:876] (0/4) Epoch 2, batch 3200, loss[loss=0.1889, simple_loss=0.1839, pruned_loss=0.09698, over 5691.00 frames. ], tot_loss[loss=0.2524, simple_loss=0.225, pruned_loss=0.1399, over 1087858.54 frames. ], batch size: 12, lr: 3.22e-02, grad_scale: 16.0 2022-11-15 15:16:27,560 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 15:16:44,405 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.221e+01 2.155e+02 2.904e+02 3.416e+02 7.936e+02, threshold=5.808e+02, percent-clipped=4.0 2022-11-15 15:16:52,734 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10515.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 15:17:11,597 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9294, 1.7261, 2.8365, 2.0925, 3.0470, 1.9667, 2.9082, 3.1173], device='cuda:0'), covar=tensor([0.0052, 0.0617, 0.0118, 0.0397, 0.0100, 0.0348, 0.0170, 0.0177], device='cuda:0'), in_proj_covar=tensor([0.0069, 0.0142, 0.0085, 0.0133, 0.0073, 0.0120, 0.0111, 0.0086], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 15:17:33,911 INFO [train.py:876] (0/4) Epoch 2, batch 3300, loss[loss=0.2277, simple_loss=0.2194, pruned_loss=0.118, over 5729.00 frames. ], tot_loss[loss=0.2519, simple_loss=0.2256, pruned_loss=0.1391, over 1084918.73 frames. ], batch size: 14, lr: 3.21e-02, grad_scale: 16.0 2022-11-15 15:17:48,986 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.28 vs. limit=5.0 2022-11-15 15:17:55,742 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.974e+02 2.609e+02 3.131e+02 6.226e+02, threshold=5.219e+02, percent-clipped=2.0 2022-11-15 15:18:12,357 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.13 vs. limit=5.0 2022-11-15 15:18:45,894 INFO [train.py:876] (0/4) Epoch 2, batch 3400, loss[loss=0.3093, simple_loss=0.2598, pruned_loss=0.1794, over 5552.00 frames. ], tot_loss[loss=0.2518, simple_loss=0.2251, pruned_loss=0.1392, over 1085307.04 frames. ], batch size: 30, lr: 3.19e-02, grad_scale: 16.0 2022-11-15 15:19:01,402 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10695.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:19:07,528 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.578e+02 2.426e+02 2.941e+02 3.632e+02 1.443e+03, threshold=5.881e+02, percent-clipped=8.0 2022-11-15 15:19:11,116 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.74 vs. limit=5.0 2022-11-15 15:19:18,049 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=10718.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 15:19:36,077 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10743.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:19:36,933 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10744.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:19:41,097 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0670, 4.4148, 4.1445, 4.4079, 3.4739, 3.1775, 4.8604, 4.1196], device='cuda:0'), covar=tensor([0.0487, 0.0683, 0.0442, 0.0679, 0.0729, 0.0481, 0.0921, 0.0623], device='cuda:0'), in_proj_covar=tensor([0.0051, 0.0075, 0.0061, 0.0073, 0.0048, 0.0043, 0.0076, 0.0056], device='cuda:0'), out_proj_covar=tensor([9.9430e-05, 1.5138e-04, 1.2022e-04, 1.4303e-04, 1.0092e-04, 8.5631e-05, 1.6899e-04, 1.1131e-04], device='cuda:0') 2022-11-15 15:19:51,855 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10765.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:19:52,488 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=10766.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 15:19:58,020 INFO [train.py:876] (0/4) Epoch 2, batch 3500, loss[loss=0.2524, simple_loss=0.2349, pruned_loss=0.135, over 5705.00 frames. ], tot_loss[loss=0.249, simple_loss=0.2233, pruned_loss=0.1374, over 1089534.52 frames. ], batch size: 19, lr: 3.18e-02, grad_scale: 16.0 2022-11-15 15:19:58,852 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10774.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:20:20,009 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 2.227e+02 2.667e+02 3.424e+02 6.980e+02, threshold=5.333e+02, percent-clipped=5.0 2022-11-15 15:20:20,897 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10805.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:20:24,281 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=10810.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 15:20:40,514 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7741, 1.3299, 1.8606, 2.7187, 1.3492, 1.5659, 1.6865, 1.7477], device='cuda:0'), covar=tensor([0.0203, 0.0266, 0.0320, 0.0078, 0.0271, 0.0255, 0.0521, 0.0566], device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0031, 0.0035, 0.0026, 0.0037, 0.0029, 0.0037, 0.0026], device='cuda:0'), out_proj_covar=tensor([5.0801e-05, 5.3350e-05, 6.6959e-05, 4.1537e-05, 6.6835e-05, 5.5162e-05, 6.3949e-05, 4.5199e-05], device='cuda:0') 2022-11-15 15:20:42,599 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10835.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:21:08,879 INFO [train.py:876] (0/4) Epoch 2, batch 3600, loss[loss=0.331, simple_loss=0.2767, pruned_loss=0.1927, over 5447.00 frames. ], tot_loss[loss=0.2505, simple_loss=0.2239, pruned_loss=0.1386, over 1087407.01 frames. ], batch size: 58, lr: 3.17e-02, grad_scale: 32.0 2022-11-15 15:21:12,249 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 15:21:18,335 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=10885.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 15:21:31,440 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.316e+02 2.250e+02 2.765e+02 3.840e+02 7.288e+02, threshold=5.531e+02, percent-clipped=6.0 2022-11-15 15:21:49,468 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.45 vs. limit=5.0 2022-11-15 15:22:01,315 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=10946.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 15:22:04,056 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3163, 2.1994, 2.1825, 2.3197, 1.2359, 2.7645, 2.0539, 1.6233], device='cuda:0'), covar=tensor([0.0111, 0.0039, 0.0049, 0.0080, 0.0208, 0.0037, 0.0094, 0.0053], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0047, 0.0055, 0.0052, 0.0091, 0.0052, 0.0070, 0.0047], device='cuda:0'), out_proj_covar=tensor([1.2576e-04, 7.1759e-05, 8.0365e-05, 8.4175e-05, 1.4186e-04, 7.2664e-05, 1.0549e-04, 6.8569e-05], device='cuda:0') 2022-11-15 15:22:13,558 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.19 vs. limit=5.0 2022-11-15 15:22:19,993 INFO [train.py:876] (0/4) Epoch 2, batch 3700, loss[loss=0.253, simple_loss=0.2286, pruned_loss=0.1387, over 5744.00 frames. ], tot_loss[loss=0.2514, simple_loss=0.2246, pruned_loss=0.1391, over 1088752.68 frames. ], batch size: 16, lr: 3.16e-02, grad_scale: 32.0 2022-11-15 15:22:29,194 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3795, 1.1215, 1.0285, 1.4362, 1.2366, 1.4793, 0.7362, 1.5853], device='cuda:0'), covar=tensor([0.0148, 0.0139, 0.0386, 0.0122, 0.0133, 0.0099, 0.0282, 0.0093], device='cuda:0'), in_proj_covar=tensor([0.0022, 0.0020, 0.0020, 0.0020, 0.0019, 0.0019, 0.0022, 0.0020], device='cuda:0'), out_proj_covar=tensor([2.8353e-05, 2.8058e-05, 3.1116e-05, 2.3898e-05, 2.6420e-05, 2.6738e-05, 3.5506e-05, 2.8951e-05], device='cuda:0') 2022-11-15 15:22:35,601 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.32 vs. limit=2.0 2022-11-15 15:22:42,990 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.438e+02 2.397e+02 3.169e+02 4.273e+02 6.249e+02, threshold=6.338e+02, percent-clipped=7.0 2022-11-15 15:22:49,532 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.86 vs. limit=5.0 2022-11-15 15:23:05,872 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.55 vs. limit=5.0 2022-11-15 15:23:12,825 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.37 vs. limit=5.0 2022-11-15 15:23:18,203 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6945, 3.6569, 3.6472, 3.9171, 3.4492, 2.9177, 2.2788, 3.5648], device='cuda:0'), covar=tensor([0.1476, 0.0358, 0.0385, 0.0241, 0.0377, 0.1081, 0.2622, 0.0188], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0096, 0.0125, 0.0084, 0.0105, 0.0144, 0.0179, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 15:23:25,750 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11065.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:23:31,391 INFO [train.py:876] (0/4) Epoch 2, batch 3800, loss[loss=0.2612, simple_loss=0.2475, pruned_loss=0.1375, over 5565.00 frames. ], tot_loss[loss=0.2519, simple_loss=0.225, pruned_loss=0.1394, over 1085510.34 frames. ], batch size: 22, lr: 3.15e-02, grad_scale: 16.0 2022-11-15 15:23:32,596 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.54 vs. limit=2.0 2022-11-15 15:23:35,818 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9720, 0.9894, 1.0080, 1.0264, 0.7917, 1.1255, 0.9979, 1.3304], device='cuda:0'), covar=tensor([0.0044, 0.0028, 0.0030, 0.0035, 0.0035, 0.0029, 0.0061, 0.0035], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0018, 0.0016, 0.0020, 0.0020, 0.0019, 0.0024, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.5843e-05, 2.6259e-05, 2.4214e-05, 2.5391e-05, 2.7901e-05, 2.5811e-05, 3.3552e-05, 2.5140e-05], device='cuda:0') 2022-11-15 15:23:48,867 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6445, 1.6061, 2.4849, 2.0714, 2.4674, 1.7556, 2.3714, 2.5700], device='cuda:0'), covar=tensor([0.0055, 0.0356, 0.0085, 0.0160, 0.0087, 0.0291, 0.0123, 0.0089], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0147, 0.0090, 0.0142, 0.0079, 0.0127, 0.0117, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 15:23:50,500 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11100.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:23:54,057 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.988e+01 2.162e+02 2.820e+02 3.661e+02 7.630e+02, threshold=5.641e+02, percent-clipped=4.0 2022-11-15 15:23:57,607 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11110.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 15:23:57,621 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3521, 1.1545, 1.0879, 1.1240, 1.1765, 1.7171, 1.1520, 1.1554], device='cuda:0'), covar=tensor([0.0123, 0.0231, 0.0099, 0.0115, 0.0102, 0.0095, 0.0245, 0.0140], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0017, 0.0015, 0.0020, 0.0019, 0.0018, 0.0023, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.5291e-05, 2.6168e-05, 2.3860e-05, 2.5178e-05, 2.7388e-05, 2.5576e-05, 3.2885e-05, 2.4850e-05], device='cuda:0') 2022-11-15 15:23:59,574 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11113.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:24:05,268 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.13 vs. limit=2.0 2022-11-15 15:24:11,509 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11130.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:24:31,160 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11158.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:24:38,982 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.88 vs. limit=2.0 2022-11-15 15:24:41,929 INFO [train.py:876] (0/4) Epoch 2, batch 3900, loss[loss=0.2957, simple_loss=0.2581, pruned_loss=0.1666, over 5734.00 frames. ], tot_loss[loss=0.2523, simple_loss=0.2256, pruned_loss=0.1395, over 1086910.74 frames. ], batch size: 31, lr: 3.13e-02, grad_scale: 16.0 2022-11-15 15:24:42,729 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11174.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:24:45,038 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4556, 3.4431, 3.2983, 3.5975, 2.8768, 2.7038, 3.7811, 3.4270], device='cuda:0'), covar=tensor([0.0554, 0.0860, 0.0728, 0.0673, 0.0856, 0.0481, 0.0849, 0.0663], device='cuda:0'), in_proj_covar=tensor([0.0047, 0.0070, 0.0057, 0.0069, 0.0045, 0.0039, 0.0073, 0.0053], device='cuda:0'), out_proj_covar=tensor([9.2514e-05, 1.4193e-04, 1.1452e-04, 1.3604e-04, 9.5325e-05, 7.9962e-05, 1.6407e-04, 1.0421e-04], device='cuda:0') 2022-11-15 15:24:45,269 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.85 vs. limit=2.0 2022-11-15 15:25:00,613 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.28 vs. limit=2.0 2022-11-15 15:25:04,410 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. limit=2.0 2022-11-15 15:25:04,798 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 2.353e+02 2.852e+02 3.627e+02 7.008e+02, threshold=5.704e+02, percent-clipped=3.0 2022-11-15 15:25:27,370 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11235.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:25:31,459 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11241.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 15:25:54,056 INFO [train.py:876] (0/4) Epoch 2, batch 4000, loss[loss=0.2954, simple_loss=0.2541, pruned_loss=0.1684, over 5566.00 frames. ], tot_loss[loss=0.251, simple_loss=0.2245, pruned_loss=0.1387, over 1085989.11 frames. ], batch size: 43, lr: 3.12e-02, grad_scale: 16.0 2022-11-15 15:26:16,540 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.259e+02 2.176e+02 2.940e+02 3.819e+02 6.622e+02, threshold=5.880e+02, percent-clipped=2.0 2022-11-15 15:26:24,447 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2022-11-15 15:26:35,325 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6916, 1.9221, 3.4827, 2.5803, 3.4826, 2.6213, 3.3037, 3.7694], device='cuda:0'), covar=tensor([0.0054, 0.0563, 0.0087, 0.0399, 0.0083, 0.0361, 0.0189, 0.0091], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0142, 0.0089, 0.0143, 0.0078, 0.0128, 0.0119, 0.0091], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 15:26:44,643 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.17 vs. limit=5.0 2022-11-15 15:26:57,271 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2384, 1.0861, 1.0381, 0.7172, 1.5145, 0.9400, 0.9255, 1.3107], device='cuda:0'), covar=tensor([0.0301, 0.0157, 0.0347, 0.0707, 0.0191, 0.0161, 0.0281, 0.0271], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0011, 0.0011, 0.0013, 0.0010, 0.0010, 0.0011, 0.0010], device='cuda:0'), out_proj_covar=tensor([2.1245e-05, 2.0995e-05, 2.3634e-05, 2.7690e-05, 2.0093e-05, 2.0686e-05, 2.2766e-05, 1.9481e-05], device='cuda:0') 2022-11-15 15:27:04,364 INFO [train.py:876] (0/4) Epoch 2, batch 4100, loss[loss=0.2336, simple_loss=0.2279, pruned_loss=0.1197, over 5654.00 frames. ], tot_loss[loss=0.2512, simple_loss=0.2244, pruned_loss=0.139, over 1087305.27 frames. ], batch size: 32, lr: 3.11e-02, grad_scale: 16.0 2022-11-15 15:27:24,547 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11400.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:27:25,988 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-15 15:27:27,789 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 2.309e+02 2.774e+02 3.508e+02 5.775e+02, threshold=5.548e+02, percent-clipped=0.0 2022-11-15 15:27:28,575 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6263, 1.5822, 1.5294, 1.6832, 1.7597, 1.5224, 1.8111, 1.6779], device='cuda:0'), covar=tensor([0.0743, 0.1166, 0.0921, 0.0979, 0.0625, 0.0569, 0.1192, 0.0687], device='cuda:0'), in_proj_covar=tensor([0.0048, 0.0070, 0.0057, 0.0069, 0.0045, 0.0041, 0.0073, 0.0051], device='cuda:0'), out_proj_covar=tensor([9.3747e-05, 1.4388e-04, 1.1596e-04, 1.3880e-04, 9.7085e-05, 8.3249e-05, 1.6568e-04, 1.0283e-04], device='cuda:0') 2022-11-15 15:27:46,242 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11430.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:27:58,771 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11448.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:28:04,035 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.33 vs. limit=2.0 2022-11-15 15:28:16,217 INFO [train.py:876] (0/4) Epoch 2, batch 4200, loss[loss=0.2031, simple_loss=0.196, pruned_loss=0.1051, over 5569.00 frames. ], tot_loss[loss=0.2483, simple_loss=0.2229, pruned_loss=0.1369, over 1080275.74 frames. ], batch size: 15, lr: 3.10e-02, grad_scale: 16.0 2022-11-15 15:28:19,836 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11478.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:28:39,537 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.267e+02 2.116e+02 2.605e+02 3.416e+02 5.601e+02, threshold=5.209e+02, percent-clipped=1.0 2022-11-15 15:28:56,845 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11530.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:28:58,338 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11532.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:29:04,723 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11541.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 15:29:27,548 INFO [train.py:876] (0/4) Epoch 2, batch 4300, loss[loss=0.2924, simple_loss=0.2469, pruned_loss=0.169, over 5085.00 frames. ], tot_loss[loss=0.2466, simple_loss=0.221, pruned_loss=0.1361, over 1074949.09 frames. ], batch size: 91, lr: 3.09e-02, grad_scale: 16.0 2022-11-15 15:29:32,676 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2720, 0.9734, 1.9023, 1.3221, 1.4273, 1.9355, 1.7256, 1.7175], device='cuda:0'), covar=tensor([0.0156, 0.0333, 0.0105, 0.0112, 0.0116, 0.0076, 0.0146, 0.0100], device='cuda:0'), in_proj_covar=tensor([0.0020, 0.0019, 0.0017, 0.0022, 0.0020, 0.0018, 0.0023, 0.0020], device='cuda:0'), out_proj_covar=tensor([2.7499e-05, 2.8339e-05, 2.6771e-05, 2.7404e-05, 2.8031e-05, 2.4103e-05, 3.1839e-05, 2.6515e-05], device='cuda:0') 2022-11-15 15:29:38,724 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11589.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 15:29:41,470 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11593.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:29:51,949 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.207e+02 2.389e+02 3.097e+02 3.751e+02 1.482e+03, threshold=6.195e+02, percent-clipped=9.0 2022-11-15 15:30:11,788 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11635.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:30:23,786 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.96 vs. limit=2.0 2022-11-15 15:30:39,153 INFO [train.py:876] (0/4) Epoch 2, batch 4400, loss[loss=0.2339, simple_loss=0.2104, pruned_loss=0.1287, over 5561.00 frames. ], tot_loss[loss=0.2483, simple_loss=0.2221, pruned_loss=0.1373, over 1078915.12 frames. ], batch size: 25, lr: 3.08e-02, grad_scale: 8.0 2022-11-15 15:30:55,187 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11696.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:30:57,335 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11699.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:31:02,564 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.242e+02 2.298e+02 2.735e+02 3.598e+02 7.155e+02, threshold=5.470e+02, percent-clipped=1.0 2022-11-15 15:31:40,414 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11760.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:31:44,560 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3236, 4.1747, 3.5536, 2.0052, 4.2466, 2.0050, 4.1572, 2.7160], device='cuda:0'), covar=tensor([0.0867, 0.0185, 0.0399, 0.2296, 0.0145, 0.1687, 0.0142, 0.1403], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0065, 0.0060, 0.0103, 0.0063, 0.0105, 0.0052, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:0') 2022-11-15 15:31:49,953 INFO [train.py:876] (0/4) Epoch 2, batch 4500, loss[loss=0.2946, simple_loss=0.2402, pruned_loss=0.1745, over 5336.00 frames. ], tot_loss[loss=0.25, simple_loss=0.2236, pruned_loss=0.1382, over 1077957.30 frames. ], batch size: 70, lr: 3.07e-02, grad_scale: 8.0 2022-11-15 15:32:13,965 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.306e+02 2.368e+02 2.947e+02 3.819e+02 5.858e+02, threshold=5.894e+02, percent-clipped=4.0 2022-11-15 15:32:16,189 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7698, 1.1760, 1.2969, 1.4619, 1.1419, 1.6048, 1.4542, 0.6083], device='cuda:0'), covar=tensor([0.0181, 0.0084, 0.0099, 0.0126, 0.0352, 0.0116, 0.0342, 0.0364], device='cuda:0'), in_proj_covar=tensor([0.0024, 0.0022, 0.0024, 0.0025, 0.0024, 0.0020, 0.0022, 0.0025], device='cuda:0'), out_proj_covar=tensor([3.7100e-05, 3.1442e-05, 3.3498e-05, 3.7752e-05, 4.2189e-05, 3.3488e-05, 3.6238e-05, 3.8685e-05], device='cuda:0') 2022-11-15 15:32:19,711 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-15 15:32:20,955 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11816.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:32:31,015 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=11830.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:32:41,978 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11846.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:32:50,246 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8878, 1.4176, 1.6198, 1.5020, 0.8707, 1.6782, 1.4429, 0.8358], device='cuda:0'), covar=tensor([0.0171, 0.0097, 0.0142, 0.0139, 0.0399, 0.0118, 0.0165, 0.0262], device='cuda:0'), in_proj_covar=tensor([0.0025, 0.0022, 0.0025, 0.0026, 0.0025, 0.0021, 0.0023, 0.0026], device='cuda:0'), out_proj_covar=tensor([3.8295e-05, 3.2189e-05, 3.4631e-05, 3.8797e-05, 4.3761e-05, 3.4824e-05, 3.6772e-05, 3.9292e-05], device='cuda:0') 2022-11-15 15:32:52,497 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.24 vs. limit=5.0 2022-11-15 15:33:01,635 INFO [train.py:876] (0/4) Epoch 2, batch 4600, loss[loss=0.2693, simple_loss=0.2292, pruned_loss=0.1547, over 4925.00 frames. ], tot_loss[loss=0.248, simple_loss=0.2229, pruned_loss=0.1365, over 1077055.95 frames. ], batch size: 109, lr: 3.05e-02, grad_scale: 8.0 2022-11-15 15:33:02,078 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.20 vs. limit=2.0 2022-11-15 15:33:04,578 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11877.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:33:05,510 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=11878.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:33:05,642 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9465, 1.2587, 1.2860, 0.7935, 1.2614, 1.3735, 1.1261, 0.9347], device='cuda:0'), covar=tensor([0.0801, 0.0701, 0.0388, 0.1021, 0.1033, 0.0376, 0.0804, 0.0870], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0011, 0.0009, 0.0012, 0.0010, 0.0010, 0.0011, 0.0010], device='cuda:0'), out_proj_covar=tensor([2.1668e-05, 2.0660e-05, 2.1131e-05, 2.7181e-05, 2.0985e-05, 1.9576e-05, 2.2858e-05, 2.0423e-05], device='cuda:0') 2022-11-15 15:33:12,415 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11888.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:33:25,411 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 2.171e+02 2.900e+02 3.774e+02 7.017e+02, threshold=5.800e+02, percent-clipped=1.0 2022-11-15 15:33:25,612 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11907.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:33:34,959 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11920.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:33:54,825 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11948.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:34:00,386 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2423, 4.6445, 3.6381, 2.1601, 4.5669, 2.0509, 4.0765, 3.0130], device='cuda:0'), covar=tensor([0.0779, 0.0138, 0.0350, 0.2067, 0.0142, 0.1609, 0.0136, 0.1256], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0063, 0.0060, 0.0103, 0.0064, 0.0105, 0.0052, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:0') 2022-11-15 15:34:06,048 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9201, 1.6615, 2.0953, 2.1912, 1.6768, 2.3845, 1.8508, 2.0510], device='cuda:0'), covar=tensor([0.0150, 0.0165, 0.0311, 0.0310, 0.0282, 0.0221, 0.0586, 0.0447], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0035, 0.0041, 0.0029, 0.0041, 0.0035, 0.0041, 0.0028], device='cuda:0'), out_proj_covar=tensor([5.3303e-05, 6.1449e-05, 8.0543e-05, 5.4065e-05, 7.5711e-05, 6.8332e-05, 7.4583e-05, 5.2085e-05], device='cuda:0') 2022-11-15 15:34:12,182 INFO [train.py:876] (0/4) Epoch 2, batch 4700, loss[loss=0.2407, simple_loss=0.2124, pruned_loss=0.1345, over 5795.00 frames. ], tot_loss[loss=0.2449, simple_loss=0.2211, pruned_loss=0.1343, over 1087818.52 frames. ], batch size: 22, lr: 3.04e-02, grad_scale: 8.0 2022-11-15 15:34:12,933 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9636, 4.0012, 3.8570, 4.0381, 3.8367, 3.3394, 4.5768, 3.7331], device='cuda:0'), covar=tensor([0.0467, 0.0987, 0.0478, 0.0857, 0.0432, 0.0450, 0.0725, 0.0545], device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0073, 0.0061, 0.0071, 0.0048, 0.0041, 0.0077, 0.0056], device='cuda:0'), out_proj_covar=tensor([1.0277e-04, 1.5404e-04, 1.2635e-04, 1.4617e-04, 1.0326e-04, 8.6215e-05, 1.7687e-04, 1.1428e-04], device='cuda:0') 2022-11-15 15:34:18,288 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=11981.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:34:20,692 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=11984.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:34:25,362 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=11991.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:34:36,941 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.194e+02 2.107e+02 2.730e+02 3.355e+02 8.347e+02, threshold=5.461e+02, percent-clipped=3.0 2022-11-15 15:34:38,506 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12009.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:35:04,514 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12045.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:35:09,215 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.57 vs. limit=5.0 2022-11-15 15:35:11,921 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12055.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:35:24,088 INFO [train.py:876] (0/4) Epoch 2, batch 4800, loss[loss=0.2555, simple_loss=0.2375, pruned_loss=0.1368, over 5753.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.2207, pruned_loss=0.134, over 1089591.04 frames. ], batch size: 21, lr: 3.03e-02, grad_scale: 8.0 2022-11-15 15:35:26,316 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=12076.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:35:29,037 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8687, 3.8863, 3.5780, 3.5981, 3.4600, 3.8979, 1.5742, 3.7192], device='cuda:0'), covar=tensor([0.0157, 0.0181, 0.0204, 0.0164, 0.0248, 0.0162, 0.2157, 0.0198], device='cuda:0'), in_proj_covar=tensor([0.0058, 0.0048, 0.0048, 0.0038, 0.0054, 0.0041, 0.0094, 0.0057], device='cuda:0'), out_proj_covar=tensor([1.1391e-04, 9.3782e-05, 9.2766e-05, 7.3197e-05, 1.0213e-04, 7.9746e-05, 1.6507e-04, 1.1050e-04], device='cuda:0') 2022-11-15 15:35:42,190 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2022-11-15 15:35:48,822 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.190e+02 2.296e+02 2.933e+02 3.523e+02 8.613e+02, threshold=5.866e+02, percent-clipped=4.0 2022-11-15 15:36:09,331 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12137.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:36:34,717 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12172.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:36:35,355 INFO [train.py:876] (0/4) Epoch 2, batch 4900, loss[loss=0.3433, simple_loss=0.275, pruned_loss=0.2058, over 5467.00 frames. ], tot_loss[loss=0.2458, simple_loss=0.2213, pruned_loss=0.1351, over 1085920.26 frames. ], batch size: 64, lr: 3.02e-02, grad_scale: 8.0 2022-11-15 15:36:45,767 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12188.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:36:51,328 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6645, 2.9646, 2.2096, 1.2256, 2.2444, 2.9066, 2.3129, 3.1260], device='cuda:0'), covar=tensor([0.0664, 0.0385, 0.0329, 0.0752, 0.0104, 0.0112, 0.0109, 0.0077], device='cuda:0'), in_proj_covar=tensor([0.0125, 0.0110, 0.0080, 0.0126, 0.0071, 0.0066, 0.0066, 0.0073], device='cuda:0'), out_proj_covar=tensor([1.7326e-04, 1.4645e-04, 1.2024e-04, 1.7123e-04, 9.8401e-05, 9.2156e-05, 9.5337e-05, 9.5866e-05], device='cuda:0') 2022-11-15 15:36:55,618 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12202.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:36:59,040 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.339e+02 2.288e+02 2.922e+02 4.193e+02 1.035e+03, threshold=5.844e+02, percent-clipped=8.0 2022-11-15 15:37:18,599 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.73 vs. limit=2.0 2022-11-15 15:37:20,202 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12236.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:37:23,019 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8793, 1.8716, 1.2619, 1.7927, 0.9800, 1.9456, 1.7738, 1.0977], device='cuda:0'), covar=tensor([0.0183, 0.0108, 0.0128, 0.0195, 0.0660, 0.0218, 0.0219, 0.0325], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0024, 0.0027, 0.0029, 0.0026, 0.0023, 0.0024, 0.0026], device='cuda:0'), out_proj_covar=tensor([4.3107e-05, 3.4331e-05, 3.8573e-05, 4.3540e-05, 4.5677e-05, 3.8677e-05, 3.9187e-05, 3.9928e-05], device='cuda:0') 2022-11-15 15:37:46,254 INFO [train.py:876] (0/4) Epoch 2, batch 5000, loss[loss=0.2244, simple_loss=0.2107, pruned_loss=0.119, over 5569.00 frames. ], tot_loss[loss=0.2471, simple_loss=0.2218, pruned_loss=0.1362, over 1085712.93 frames. ], batch size: 22, lr: 3.01e-02, grad_scale: 8.0 2022-11-15 15:37:48,754 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12276.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:37:59,286 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12291.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:38:08,022 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12304.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:38:09,979 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 2.107e+02 2.700e+02 3.490e+02 8.758e+02, threshold=5.401e+02, percent-clipped=1.0 2022-11-15 15:38:23,758 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0292, 1.6268, 2.8952, 2.2348, 3.0730, 1.9864, 2.6378, 2.8290], device='cuda:0'), covar=tensor([0.0031, 0.0390, 0.0055, 0.0254, 0.0042, 0.0257, 0.0142, 0.0081], device='cuda:0'), in_proj_covar=tensor([0.0077, 0.0151, 0.0094, 0.0158, 0.0083, 0.0139, 0.0134, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 15:38:32,897 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12339.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:38:33,578 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12340.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:38:43,948 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12355.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:38:46,153 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8738, 1.9897, 3.9380, 2.7115, 4.0106, 2.7751, 3.7647, 3.6892], device='cuda:0'), covar=tensor([0.0037, 0.0481, 0.0064, 0.0353, 0.0033, 0.0318, 0.0162, 0.0115], device='cuda:0'), in_proj_covar=tensor([0.0077, 0.0150, 0.0094, 0.0157, 0.0082, 0.0137, 0.0133, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 15:38:56,191 INFO [train.py:876] (0/4) Epoch 2, batch 5100, loss[loss=0.3125, simple_loss=0.2505, pruned_loss=0.1873, over 3148.00 frames. ], tot_loss[loss=0.2432, simple_loss=0.2192, pruned_loss=0.1336, over 1083005.63 frames. ], batch size: 285, lr: 3.00e-02, grad_scale: 8.0 2022-11-15 15:38:57,628 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 15:38:59,417 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2074, 4.4280, 4.2822, 4.4393, 4.1518, 3.7541, 4.9151, 4.2504], device='cuda:0'), covar=tensor([0.0377, 0.0696, 0.0334, 0.0671, 0.0334, 0.0284, 0.0564, 0.0339], device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0072, 0.0060, 0.0071, 0.0048, 0.0041, 0.0077, 0.0053], device='cuda:0'), out_proj_covar=tensor([1.0361e-04, 1.5303e-04, 1.2405e-04, 1.4879e-04, 1.0440e-04, 8.6217e-05, 1.8008e-04, 1.1068e-04], device='cuda:0') 2022-11-15 15:39:06,201 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5922, 4.1048, 3.6414, 4.2172, 4.2234, 3.5495, 3.3359, 3.2313], device='cuda:0'), covar=tensor([0.0411, 0.0351, 0.0509, 0.0260, 0.0339, 0.0352, 0.0370, 0.0659], device='cuda:0'), in_proj_covar=tensor([0.0064, 0.0070, 0.0091, 0.0067, 0.0087, 0.0082, 0.0072, 0.0063], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 15:39:17,688 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12403.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:39:20,362 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 2.363e+02 2.989e+02 3.735e+02 9.189e+02, threshold=5.978e+02, percent-clipped=6.0 2022-11-15 15:39:30,141 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8443, 4.6070, 3.9393, 4.8208, 4.8606, 3.9765, 4.1599, 3.7276], device='cuda:0'), covar=tensor([0.0284, 0.0323, 0.0580, 0.0232, 0.0237, 0.0351, 0.0335, 0.1329], device='cuda:0'), in_proj_covar=tensor([0.0064, 0.0070, 0.0091, 0.0067, 0.0086, 0.0082, 0.0072, 0.0062], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 15:39:38,296 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12432.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:39:48,218 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=12446.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:40:04,645 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.07 vs. limit=2.0 2022-11-15 15:40:05,738 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12472.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:40:05,781 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1403, 0.9756, 0.9387, 0.6974, 1.3235, 1.6568, 1.0181, 1.0751], device='cuda:0'), covar=tensor([0.0792, 0.0174, 0.0365, 0.0436, 0.0632, 0.0203, 0.0248, 0.0237], device='cuda:0'), in_proj_covar=tensor([0.0010, 0.0011, 0.0009, 0.0011, 0.0010, 0.0010, 0.0011, 0.0009], device='cuda:0'), out_proj_covar=tensor([2.2217e-05, 2.1158e-05, 2.2476e-05, 2.7248e-05, 2.1842e-05, 2.0487e-05, 2.5216e-05, 2.0346e-05], device='cuda:0') 2022-11-15 15:40:06,297 INFO [train.py:876] (0/4) Epoch 2, batch 5200, loss[loss=0.254, simple_loss=0.2391, pruned_loss=0.1345, over 5653.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.2202, pruned_loss=0.1343, over 1087061.12 frames. ], batch size: 29, lr: 2.99e-02, grad_scale: 8.0 2022-11-15 15:40:11,254 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3825, 1.6849, 1.3597, 1.4112, 1.0883, 1.5752, 1.5477, 1.0624], device='cuda:0'), covar=tensor([0.0175, 0.0131, 0.0127, 0.0247, 0.0538, 0.0309, 0.0287, 0.0252], device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0025, 0.0028, 0.0030, 0.0028, 0.0023, 0.0026, 0.0028], device='cuda:0'), out_proj_covar=tensor([4.4714e-05, 3.5257e-05, 4.0200e-05, 4.5939e-05, 4.8606e-05, 3.9436e-05, 4.2668e-05, 4.3410e-05], device='cuda:0') 2022-11-15 15:40:27,619 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12502.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:40:30,896 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.236e+02 2.247e+02 2.853e+02 3.509e+02 7.106e+02, threshold=5.707e+02, percent-clipped=3.0 2022-11-15 15:40:31,104 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12507.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:40:39,596 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.20 vs. limit=2.0 2022-11-15 15:40:39,938 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12520.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:40:46,717 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.11 vs. limit=5.0 2022-11-15 15:40:47,557 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.3524, 5.3718, 5.3947, 5.4201, 5.0501, 4.2738, 5.9157, 5.0940], device='cuda:0'), covar=tensor([0.0261, 0.1100, 0.0227, 0.0695, 0.0374, 0.0359, 0.0636, 0.0333], device='cuda:0'), in_proj_covar=tensor([0.0047, 0.0069, 0.0058, 0.0070, 0.0047, 0.0040, 0.0076, 0.0053], device='cuda:0'), out_proj_covar=tensor([9.8491e-05, 1.4614e-04, 1.2063e-04, 1.4583e-04, 1.0404e-04, 8.3922e-05, 1.7799e-04, 1.0988e-04], device='cuda:0') 2022-11-15 15:41:01,892 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12550.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:41:17,841 INFO [train.py:876] (0/4) Epoch 2, batch 5300, loss[loss=0.2791, simple_loss=0.245, pruned_loss=0.1566, over 5546.00 frames. ], tot_loss[loss=0.2429, simple_loss=0.2192, pruned_loss=0.1333, over 1084425.22 frames. ], batch size: 30, lr: 2.98e-02, grad_scale: 8.0 2022-11-15 15:41:19,424 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6273, 4.5865, 3.3725, 2.2099, 4.6188, 2.1470, 4.1140, 2.7740], device='cuda:0'), covar=tensor([0.0628, 0.0160, 0.0464, 0.1909, 0.0105, 0.1444, 0.0191, 0.1327], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0065, 0.0064, 0.0106, 0.0066, 0.0108, 0.0056, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002], device='cuda:0') 2022-11-15 15:41:20,142 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12576.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:41:20,560 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0 2022-11-15 15:41:20,856 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.4297, 1.0988, 0.5785, 0.6564, 0.4850, 1.0745, 0.9173, 0.4570], device='cuda:0'), covar=tensor([0.0156, 0.0096, 0.0229, 0.0144, 0.0341, 0.0044, 0.0251, 0.0224], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0025, 0.0026, 0.0030, 0.0028, 0.0023, 0.0025, 0.0027], device='cuda:0'), out_proj_covar=tensor([4.3455e-05, 3.5490e-05, 3.8167e-05, 4.5097e-05, 4.8082e-05, 3.8678e-05, 4.1307e-05, 4.2884e-05], device='cuda:0') 2022-11-15 15:41:32,999 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.69 vs. limit=5.0 2022-11-15 15:41:40,303 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12604.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:41:42,528 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.253e+02 2.286e+02 2.922e+02 3.556e+02 5.667e+02, threshold=5.844e+02, percent-clipped=0.0 2022-11-15 15:41:54,445 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12624.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:42:02,059 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0988, 1.1589, 1.4037, 1.0094, 1.4418, 1.1873, 0.6688, 1.1447], device='cuda:0'), covar=tensor([0.0058, 0.0036, 0.0063, 0.0033, 0.0026, 0.0035, 0.0081, 0.0053], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0023, 0.0021, 0.0022, 0.0020, 0.0021, 0.0023, 0.0021], device='cuda:0'), out_proj_covar=tensor([3.0810e-05, 3.4033e-05, 3.3252e-05, 2.6712e-05, 2.8078e-05, 2.7774e-05, 4.0106e-05, 3.0873e-05], device='cuda:0') 2022-11-15 15:42:04,707 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3959, 1.9579, 2.3618, 3.2222, 3.1923, 2.5875, 1.9256, 3.1822], device='cuda:0'), covar=tensor([0.0046, 0.0801, 0.0576, 0.0111, 0.0065, 0.0546, 0.0668, 0.0042], device='cuda:0'), in_proj_covar=tensor([0.0098, 0.0189, 0.0196, 0.0109, 0.0119, 0.0212, 0.0187, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0001], device='cuda:0') 2022-11-15 15:42:05,296 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12640.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:42:09,421 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4324, 3.5342, 3.3473, 3.5879, 2.6894, 2.7146, 3.9548, 3.5572], device='cuda:0'), covar=tensor([0.0485, 0.0822, 0.0608, 0.0729, 0.0881, 0.0442, 0.0847, 0.0455], device='cuda:0'), in_proj_covar=tensor([0.0049, 0.0072, 0.0060, 0.0070, 0.0049, 0.0040, 0.0078, 0.0054], device='cuda:0'), out_proj_covar=tensor([1.0330e-04, 1.5212e-04, 1.2568e-04, 1.4737e-04, 1.0743e-04, 8.5638e-05, 1.8526e-04, 1.1254e-04], device='cuda:0') 2022-11-15 15:42:13,377 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12652.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:42:26,652 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-15 15:42:29,012 INFO [train.py:876] (0/4) Epoch 2, batch 5400, loss[loss=0.1919, simple_loss=0.1863, pruned_loss=0.09877, over 5719.00 frames. ], tot_loss[loss=0.2405, simple_loss=0.218, pruned_loss=0.1315, over 1081561.13 frames. ], batch size: 14, lr: 2.97e-02, grad_scale: 8.0 2022-11-15 15:42:31,315 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9394, 3.9275, 2.9353, 3.0339, 2.3732, 3.7548, 2.5147, 3.1486], device='cuda:0'), covar=tensor([0.0242, 0.0019, 0.0083, 0.0113, 0.0180, 0.0028, 0.0120, 0.0029], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0057, 0.0069, 0.0063, 0.0107, 0.0061, 0.0089, 0.0054], device='cuda:0'), out_proj_covar=tensor([1.5905e-04, 9.4303e-05, 1.1167e-04, 1.1216e-04, 1.7348e-04, 9.1287e-05, 1.4119e-04, 8.4899e-05], device='cuda:0') 2022-11-15 15:42:39,249 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12688.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:42:52,240 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.322e+02 2.272e+02 2.868e+02 3.649e+02 6.503e+02, threshold=5.736e+02, percent-clipped=2.0 2022-11-15 15:43:10,911 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=12732.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:43:32,315 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=12763.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:43:40,145 INFO [train.py:876] (0/4) Epoch 2, batch 5500, loss[loss=0.2042, simple_loss=0.1921, pruned_loss=0.1081, over 5530.00 frames. ], tot_loss[loss=0.2436, simple_loss=0.22, pruned_loss=0.1336, over 1082477.91 frames. ], batch size: 13, lr: 2.96e-02, grad_scale: 8.0 2022-11-15 15:43:45,536 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=12780.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:43:58,206 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.22 vs. limit=2.0 2022-11-15 15:44:01,663 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=12802.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:44:05,122 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.455e+02 2.098e+02 2.585e+02 3.345e+02 6.540e+02, threshold=5.170e+02, percent-clipped=1.0 2022-11-15 15:44:18,790 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=12824.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 15:44:29,392 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1590, 2.5445, 2.0574, 1.4672, 2.5148, 1.1782, 2.6873, 1.6408], device='cuda:0'), covar=tensor([0.0668, 0.0186, 0.0423, 0.1432, 0.0225, 0.1480, 0.0138, 0.1104], device='cuda:0'), in_proj_covar=tensor([0.0111, 0.0067, 0.0064, 0.0107, 0.0070, 0.0113, 0.0057, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0002], device='cuda:0') 2022-11-15 15:44:36,973 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.3693, 4.6701, 4.8168, 4.5951, 5.5659, 5.3546, 4.4455, 4.6037], device='cuda:0'), covar=tensor([0.0920, 0.0609, 0.1427, 0.0834, 0.0668, 0.0254, 0.0515, 0.1300], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0077, 0.0064, 0.0076, 0.0069, 0.0051, 0.0063, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 15:44:53,055 INFO [train.py:876] (0/4) Epoch 2, batch 5600, loss[loss=0.2689, simple_loss=0.2288, pruned_loss=0.1545, over 4756.00 frames. ], tot_loss[loss=0.2442, simple_loss=0.2198, pruned_loss=0.1344, over 1083254.35 frames. ], batch size: 136, lr: 2.95e-02, grad_scale: 8.0 2022-11-15 15:45:00,226 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5432, 1.2599, 1.3770, 1.6549, 0.5424, 1.2943, 1.2619, 1.0925], device='cuda:0'), covar=tensor([0.0030, 0.0200, 0.0096, 0.0056, 0.0315, 0.0225, 0.0163, 0.0079], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0034, 0.0041, 0.0028, 0.0042, 0.0035, 0.0040, 0.0025], device='cuda:0'), out_proj_covar=tensor([4.8204e-05, 6.1996e-05, 8.5404e-05, 5.2810e-05, 8.1041e-05, 7.3526e-05, 7.5074e-05, 4.8896e-05], device='cuda:0') 2022-11-15 15:45:14,528 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1099, 4.7333, 4.1874, 4.8682, 4.7585, 4.2205, 4.1362, 3.8245], device='cuda:0'), covar=tensor([0.0226, 0.0327, 0.0520, 0.0148, 0.0273, 0.0309, 0.0233, 0.0532], device='cuda:0'), in_proj_covar=tensor([0.0065, 0.0072, 0.0093, 0.0066, 0.0088, 0.0086, 0.0075, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001, 0.0001], device='cuda:0') 2022-11-15 15:45:17,179 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 2.173e+02 2.659e+02 3.655e+02 7.963e+02, threshold=5.318e+02, percent-clipped=7.0 2022-11-15 15:46:03,226 INFO [train.py:876] (0/4) Epoch 2, batch 5700, loss[loss=0.3121, simple_loss=0.2502, pruned_loss=0.1869, over 5433.00 frames. ], tot_loss[loss=0.2429, simple_loss=0.219, pruned_loss=0.1335, over 1081388.41 frames. ], batch size: 58, lr: 2.94e-02, grad_scale: 8.0 2022-11-15 15:46:28,142 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.438e+02 2.403e+02 3.143e+02 3.984e+02 1.020e+03, threshold=6.287e+02, percent-clipped=10.0 2022-11-15 15:46:33,106 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9163, 3.7095, 3.0959, 3.0745, 2.0074, 3.9476, 2.5691, 3.5883], device='cuda:0'), covar=tensor([0.0225, 0.0085, 0.0083, 0.0156, 0.0271, 0.0025, 0.0122, 0.0021], device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0060, 0.0076, 0.0067, 0.0113, 0.0068, 0.0092, 0.0059], device='cuda:0'), out_proj_covar=tensor([1.6694e-04, 1.0136e-04, 1.2365e-04, 1.2058e-04, 1.8512e-04, 1.0353e-04, 1.4817e-04, 9.4015e-05], device='cuda:0') 2022-11-15 15:47:14,318 INFO [train.py:876] (0/4) Epoch 2, batch 5800, loss[loss=0.2212, simple_loss=0.2025, pruned_loss=0.12, over 5703.00 frames. ], tot_loss[loss=0.2451, simple_loss=0.2207, pruned_loss=0.1348, over 1072886.38 frames. ], batch size: 12, lr: 2.93e-02, grad_scale: 8.0 2022-11-15 15:47:35,089 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=13102.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:47:38,322 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.334e+02 2.022e+02 2.861e+02 3.567e+02 9.909e+02, threshold=5.722e+02, percent-clipped=2.0 2022-11-15 15:47:46,190 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2022-11-15 15:47:46,536 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=13119.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 15:48:02,235 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7178, 1.1367, 2.0813, 1.9516, 2.7347, 3.2261, 3.0256, 1.7276], device='cuda:0'), covar=tensor([0.0044, 0.0876, 0.0073, 0.0118, 0.0072, 0.0047, 0.0056, 0.0157], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0017, 0.0018, 0.0019, 0.0018, 0.0019, 0.0021, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.5283e-05, 2.5643e-05, 2.7614e-05, 2.6063e-05, 2.5717e-05, 2.4101e-05, 2.8312e-05, 2.5953e-05], device='cuda:0') 2022-11-15 15:48:05,297 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8607, 3.0451, 2.5252, 3.0417, 2.2011, 2.4694, 1.6981, 2.6249], device='cuda:0'), covar=tensor([0.1600, 0.0186, 0.0678, 0.0303, 0.0647, 0.0811, 0.2063, 0.0208], device='cuda:0'), in_proj_covar=tensor([0.0174, 0.0101, 0.0142, 0.0094, 0.0116, 0.0156, 0.0185, 0.0094], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 15:48:08,274 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=13150.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:48:16,943 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.59 vs. limit=2.0 2022-11-15 15:48:24,659 INFO [train.py:876] (0/4) Epoch 2, batch 5900, loss[loss=0.2393, simple_loss=0.2231, pruned_loss=0.1278, over 5693.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.218, pruned_loss=0.1317, over 1079491.41 frames. ], batch size: 28, lr: 2.92e-02, grad_scale: 8.0 2022-11-15 15:48:49,672 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 2.159e+02 2.799e+02 3.502e+02 4.935e+02, threshold=5.598e+02, percent-clipped=0.0 2022-11-15 15:48:52,656 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0770, 2.7005, 2.3797, 2.7487, 1.7760, 2.7736, 2.1091, 2.2808], device='cuda:0'), covar=tensor([0.0105, 0.0022, 0.0040, 0.0039, 0.0114, 0.0028, 0.0068, 0.0025], device='cuda:0'), in_proj_covar=tensor([0.0102, 0.0055, 0.0072, 0.0063, 0.0106, 0.0065, 0.0089, 0.0056], device='cuda:0'), out_proj_covar=tensor([1.5690e-04, 9.4256e-05, 1.1667e-04, 1.1328e-04, 1.7581e-04, 9.9015e-05, 1.4543e-04, 8.9698e-05], device='cuda:0') 2022-11-15 15:48:55,637 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2022-11-15 15:49:14,190 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2022-11-15 15:49:21,456 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.53 vs. limit=5.0 2022-11-15 15:49:36,282 INFO [train.py:876] (0/4) Epoch 2, batch 6000, loss[loss=0.275, simple_loss=0.2337, pruned_loss=0.1581, over 5304.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2162, pruned_loss=0.1297, over 1083651.03 frames. ], batch size: 79, lr: 2.91e-02, grad_scale: 8.0 2022-11-15 15:49:36,284 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 15:49:41,221 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1037, 1.4456, 1.4141, 1.4937, 2.6794, 2.3995, 1.5737, 1.8500], device='cuda:0'), covar=tensor([0.0071, 0.0355, 0.0692, 0.0122, 0.0062, 0.0090, 0.0159, 0.0114], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0015, 0.0016, 0.0018, 0.0017, 0.0017, 0.0020, 0.0018], device='cuda:0'), out_proj_covar=tensor([2.3435e-05, 2.3845e-05, 2.4809e-05, 2.4171e-05, 2.3553e-05, 2.2219e-05, 2.7234e-05, 2.4643e-05], device='cuda:0') 2022-11-15 15:49:50,101 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2773, 3.7578, 3.0562, 3.1481, 2.2798, 3.5236, 2.5373, 3.3179], device='cuda:0'), covar=tensor([0.0306, 0.0077, 0.0120, 0.0167, 0.0259, 0.0077, 0.0173, 0.0046], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0056, 0.0072, 0.0064, 0.0107, 0.0065, 0.0090, 0.0057], device='cuda:0'), out_proj_covar=tensor([1.6255e-04, 9.5963e-05, 1.1763e-04, 1.1484e-04, 1.7728e-04, 9.9358e-05, 1.4684e-04, 9.2276e-05], device='cuda:0') 2022-11-15 15:49:54,756 INFO [train.py:908] (0/4) Epoch 2, validation: loss=0.1945, simple_loss=0.208, pruned_loss=0.09052, over 1530663.00 frames. 2022-11-15 15:49:54,757 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 15:50:19,048 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.409e+02 2.427e+02 3.017e+02 4.086e+02 8.174e+02, threshold=6.035e+02, percent-clipped=9.0 2022-11-15 15:50:58,418 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5499, 4.5014, 4.9272, 4.7805, 4.3596, 3.7856, 5.3786, 4.3978], device='cuda:0'), covar=tensor([0.0453, 0.1097, 0.0349, 0.0603, 0.0399, 0.0393, 0.0770, 0.0349], device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0073, 0.0060, 0.0068, 0.0049, 0.0042, 0.0077, 0.0052], device='cuda:0'), out_proj_covar=tensor([1.0798e-04, 1.5591e-04, 1.2888e-04, 1.4499e-04, 1.0683e-04, 8.9911e-05, 1.8766e-04, 1.1068e-04], device='cuda:0') 2022-11-15 15:51:06,258 INFO [train.py:876] (0/4) Epoch 2, batch 6100, loss[loss=0.2745, simple_loss=0.26, pruned_loss=0.1445, over 5751.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2164, pruned_loss=0.13, over 1077639.39 frames. ], batch size: 27, lr: 2.90e-02, grad_scale: 8.0 2022-11-15 15:51:21,992 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2022-11-15 15:51:29,934 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.327e+02 2.166e+02 2.663e+02 3.329e+02 7.571e+02, threshold=5.325e+02, percent-clipped=1.0 2022-11-15 15:51:38,747 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=13419.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:51:41,516 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4372, 4.3888, 4.2787, 4.5645, 4.3649, 4.3534, 1.7460, 4.4628], device='cuda:0'), covar=tensor([0.0171, 0.0286, 0.0167, 0.0085, 0.0220, 0.0198, 0.2302, 0.0217], device='cuda:0'), in_proj_covar=tensor([0.0068, 0.0054, 0.0052, 0.0042, 0.0061, 0.0046, 0.0106, 0.0065], device='cuda:0'), out_proj_covar=tensor([1.3622e-04, 1.0774e-04, 1.0188e-04, 8.2418e-05, 1.2018e-04, 9.5165e-05, 1.8873e-04, 1.2734e-04], device='cuda:0') 2022-11-15 15:51:42,535 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 15:51:45,633 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9155, 1.9348, 1.8563, 2.4761, 0.9148, 1.6458, 1.6761, 2.0662], device='cuda:0'), covar=tensor([0.0135, 0.0119, 0.0243, 0.0118, 0.0324, 0.0134, 0.0210, 0.1480], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0035, 0.0042, 0.0028, 0.0046, 0.0033, 0.0044, 0.0030], device='cuda:0'), out_proj_covar=tensor([5.5164e-05, 6.4854e-05, 8.8128e-05, 5.4377e-05, 8.9435e-05, 7.1585e-05, 8.3288e-05, 5.6833e-05], device='cuda:0') 2022-11-15 15:51:56,385 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6256, 3.7004, 2.7971, 2.0720, 3.7128, 1.2120, 3.4644, 2.3190], device='cuda:0'), covar=tensor([0.0830, 0.0172, 0.0551, 0.1780, 0.0124, 0.1807, 0.0185, 0.1291], device='cuda:0'), in_proj_covar=tensor([0.0110, 0.0070, 0.0069, 0.0111, 0.0074, 0.0116, 0.0062, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0001, 0.0003], device='cuda:0') 2022-11-15 15:52:12,693 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=13467.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:52:17,514 INFO [train.py:876] (0/4) Epoch 2, batch 6200, loss[loss=0.2546, simple_loss=0.2139, pruned_loss=0.1477, over 4672.00 frames. ], tot_loss[loss=0.2403, simple_loss=0.2182, pruned_loss=0.1312, over 1080316.48 frames. ], batch size: 135, lr: 2.89e-02, grad_scale: 8.0 2022-11-15 15:52:30,174 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 15:52:41,225 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 2.479e+02 3.223e+02 3.937e+02 9.344e+02, threshold=6.446e+02, percent-clipped=9.0 2022-11-15 15:52:44,220 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9276, 2.1582, 3.7706, 2.7897, 3.8045, 2.8608, 3.7556, 3.9795], device='cuda:0'), covar=tensor([0.0033, 0.0412, 0.0084, 0.0329, 0.0049, 0.0245, 0.0135, 0.0086], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0169, 0.0104, 0.0170, 0.0094, 0.0148, 0.0152, 0.0113], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 15:53:02,937 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3226, 3.1642, 3.2518, 3.0747, 3.4381, 3.2156, 3.2135, 3.3798], device='cuda:0'), covar=tensor([0.0488, 0.0377, 0.0504, 0.0388, 0.0441, 0.0213, 0.0360, 0.0433], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0075, 0.0063, 0.0076, 0.0071, 0.0049, 0.0065, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 15:53:27,903 INFO [train.py:876] (0/4) Epoch 2, batch 6300, loss[loss=0.1596, simple_loss=0.1766, pruned_loss=0.07134, over 5549.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2168, pruned_loss=0.1302, over 1081740.24 frames. ], batch size: 13, lr: 2.88e-02, grad_scale: 8.0 2022-11-15 15:53:52,434 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.388e+02 2.381e+02 3.023e+02 3.897e+02 7.097e+02, threshold=6.046e+02, percent-clipped=2.0 2022-11-15 15:53:55,995 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9383, 0.7376, 1.0746, 0.8842, 1.4032, 1.1241, 0.7905, 1.0775], device='cuda:0'), covar=tensor([0.0665, 0.0231, 0.0241, 0.0379, 0.0207, 0.0273, 0.0278, 0.0203], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0010, 0.0008, 0.0010, 0.0008, 0.0009, 0.0011, 0.0009], device='cuda:0'), out_proj_covar=tensor([2.1351e-05, 2.1559e-05, 2.0935e-05, 2.5190e-05, 1.9178e-05, 2.0054e-05, 2.4459e-05, 1.9613e-05], device='cuda:0') 2022-11-15 15:53:57,696 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 15:54:24,349 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-15 15:54:39,261 INFO [train.py:876] (0/4) Epoch 2, batch 6400, loss[loss=0.2467, simple_loss=0.2345, pruned_loss=0.1294, over 5770.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2165, pruned_loss=0.129, over 1084041.78 frames. ], batch size: 27, lr: 2.87e-02, grad_scale: 16.0 2022-11-15 15:55:03,886 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.329e+02 2.247e+02 2.645e+02 3.306e+02 5.348e+02, threshold=5.289e+02, percent-clipped=0.0 2022-11-15 15:55:24,658 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=13737.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:55:27,039 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3985, 4.1374, 3.4165, 4.0207, 3.3947, 3.0482, 2.2695, 3.6406], device='cuda:0'), covar=tensor([0.1606, 0.0101, 0.0578, 0.0238, 0.0312, 0.0746, 0.2161, 0.0165], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0099, 0.0139, 0.0094, 0.0112, 0.0150, 0.0178, 0.0089], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 15:55:30,013 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1647, 3.8208, 3.9693, 3.7479, 4.1850, 3.6443, 3.8540, 4.1911], device='cuda:0'), covar=tensor([0.0316, 0.0242, 0.0384, 0.0285, 0.0337, 0.0402, 0.0196, 0.0247], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0074, 0.0061, 0.0075, 0.0069, 0.0048, 0.0062, 0.0062], device='cuda:0'), out_proj_covar=tensor([1.5449e-04, 1.5567e-04, 1.3191e-04, 1.5307e-04, 1.6342e-04, 9.9824e-05, 1.3454e-04, 1.3616e-04], device='cuda:0') 2022-11-15 15:55:47,719 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 15:55:50,604 INFO [train.py:876] (0/4) Epoch 2, batch 6500, loss[loss=0.193, simple_loss=0.1809, pruned_loss=0.1026, over 5338.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.2183, pruned_loss=0.1307, over 1081085.44 frames. ], batch size: 9, lr: 2.86e-02, grad_scale: 16.0 2022-11-15 15:56:08,997 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.32 vs. limit=2.0 2022-11-15 15:56:09,395 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=13798.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 15:56:15,578 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.258e+02 2.395e+02 2.916e+02 4.086e+02 9.118e+02, threshold=5.832e+02, percent-clipped=10.0 2022-11-15 15:56:37,130 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2022-11-15 15:57:02,022 INFO [train.py:876] (0/4) Epoch 2, batch 6600, loss[loss=0.2258, simple_loss=0.2157, pruned_loss=0.118, over 5645.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.2195, pruned_loss=0.1325, over 1082801.14 frames. ], batch size: 32, lr: 2.85e-02, grad_scale: 16.0 2022-11-15 15:57:11,394 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.78 vs. limit=2.0 2022-11-15 15:57:25,735 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.321e+02 2.160e+02 2.814e+02 3.632e+02 6.196e+02, threshold=5.627e+02, percent-clipped=2.0 2022-11-15 15:57:40,361 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1438, 4.1818, 3.3380, 3.3718, 2.6406, 4.0247, 2.8884, 3.5965], device='cuda:0'), covar=tensor([0.0194, 0.0045, 0.0081, 0.0094, 0.0199, 0.0032, 0.0101, 0.0024], device='cuda:0'), in_proj_covar=tensor([0.0111, 0.0059, 0.0076, 0.0070, 0.0114, 0.0069, 0.0092, 0.0060], device='cuda:0'), out_proj_covar=tensor([1.7470e-04, 1.0284e-04, 1.2669e-04, 1.2717e-04, 1.8869e-04, 1.0685e-04, 1.5265e-04, 9.8734e-05], device='cuda:0') 2022-11-15 15:57:49,919 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7462, 1.3034, 1.2640, 0.5214, 1.3909, 1.0217, 0.7067, 1.1194], device='cuda:0'), covar=tensor([0.0149, 0.0054, 0.0070, 0.0048, 0.0051, 0.0084, 0.0185, 0.0094], device='cuda:0'), in_proj_covar=tensor([0.0021, 0.0021, 0.0021, 0.0020, 0.0019, 0.0020, 0.0021, 0.0018], device='cuda:0'), out_proj_covar=tensor([3.0595e-05, 3.0784e-05, 3.0644e-05, 2.4837e-05, 2.3894e-05, 2.7386e-05, 3.8220e-05, 2.6876e-05], device='cuda:0') 2022-11-15 15:58:10,467 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-15 15:58:12,915 INFO [train.py:876] (0/4) Epoch 2, batch 6700, loss[loss=0.2246, simple_loss=0.2108, pruned_loss=0.1192, over 5613.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.217, pruned_loss=0.1303, over 1079222.29 frames. ], batch size: 23, lr: 2.85e-02, grad_scale: 16.0 2022-11-15 15:58:17,187 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4813, 1.8302, 2.0802, 3.5150, 3.3774, 2.2150, 1.8864, 3.2478], device='cuda:0'), covar=tensor([0.0057, 0.1263, 0.1039, 0.0221, 0.0119, 0.1084, 0.0995, 0.0108], device='cuda:0'), in_proj_covar=tensor([0.0108, 0.0209, 0.0210, 0.0127, 0.0143, 0.0227, 0.0198, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 15:58:36,323 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 2.069e+02 2.722e+02 3.403e+02 5.968e+02, threshold=5.443e+02, percent-clipped=2.0 2022-11-15 15:59:08,248 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9127, 3.9993, 4.0160, 4.2450, 3.4025, 3.5338, 4.6066, 3.6897], device='cuda:0'), covar=tensor([0.0361, 0.0597, 0.0373, 0.0507, 0.0631, 0.0292, 0.0529, 0.0413], device='cuda:0'), in_proj_covar=tensor([0.0052, 0.0072, 0.0062, 0.0069, 0.0049, 0.0043, 0.0079, 0.0054], device='cuda:0'), out_proj_covar=tensor([1.1440e-04, 1.5545e-04, 1.3576e-04, 1.5020e-04, 1.0978e-04, 9.4953e-05, 1.9171e-04, 1.1564e-04], device='cuda:0') 2022-11-15 15:59:11,926 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8900, 1.8961, 3.8130, 2.7962, 3.4869, 2.6187, 3.5399, 3.7422], device='cuda:0'), covar=tensor([0.0036, 0.0470, 0.0064, 0.0348, 0.0071, 0.0276, 0.0158, 0.0092], device='cuda:0'), in_proj_covar=tensor([0.0090, 0.0160, 0.0109, 0.0170, 0.0096, 0.0148, 0.0153, 0.0116], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 15:59:23,683 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14072.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:59:24,582 INFO [train.py:876] (0/4) Epoch 2, batch 6800, loss[loss=0.2063, simple_loss=0.1911, pruned_loss=0.1108, over 5734.00 frames. ], tot_loss[loss=0.2403, simple_loss=0.218, pruned_loss=0.1313, over 1080614.82 frames. ], batch size: 31, lr: 2.84e-02, grad_scale: 16.0 2022-11-15 15:59:38,921 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14093.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 15:59:45,925 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14103.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 15:59:48,497 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.444e+02 2.309e+02 2.876e+02 3.823e+02 9.866e+02, threshold=5.752e+02, percent-clipped=3.0 2022-11-15 15:59:58,346 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14120.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 16:00:08,024 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14133.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:00:29,245 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14164.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:00:33,731 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3861, 3.5022, 3.2648, 3.2761, 2.6148, 3.8127, 2.8432, 3.1403], device='cuda:0'), covar=tensor([0.0187, 0.0198, 0.0072, 0.0109, 0.0223, 0.0037, 0.0123, 0.0029], device='cuda:0'), in_proj_covar=tensor([0.0110, 0.0061, 0.0078, 0.0069, 0.0115, 0.0068, 0.0094, 0.0062], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:00:35,849 INFO [train.py:876] (0/4) Epoch 2, batch 6900, loss[loss=0.238, simple_loss=0.2204, pruned_loss=0.1278, over 5732.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.2179, pruned_loss=0.1304, over 1085336.99 frames. ], batch size: 15, lr: 2.83e-02, grad_scale: 16.0 2022-11-15 16:00:41,935 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14181.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:00:42,405 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.74 vs. limit=2.0 2022-11-15 16:00:47,177 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1867, 3.4946, 3.0212, 3.1533, 2.2521, 3.5944, 2.5606, 3.0317], device='cuda:0'), covar=tensor([0.0186, 0.0125, 0.0072, 0.0097, 0.0224, 0.0036, 0.0115, 0.0029], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0062, 0.0079, 0.0070, 0.0116, 0.0069, 0.0095, 0.0063], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:01:00,099 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.482e+02 2.412e+02 3.170e+02 4.129e+02 8.263e+02, threshold=6.339e+02, percent-clipped=8.0 2022-11-15 16:01:03,056 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.52 vs. limit=5.0 2022-11-15 16:01:05,027 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6485, 3.0037, 2.5113, 1.1232, 3.0522, 3.3600, 2.4639, 3.2681], device='cuda:0'), covar=tensor([0.0781, 0.0380, 0.0331, 0.0870, 0.0094, 0.0067, 0.0124, 0.0072], device='cuda:0'), in_proj_covar=tensor([0.0146, 0.0133, 0.0093, 0.0146, 0.0088, 0.0078, 0.0080, 0.0087], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 16:01:40,409 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.11 vs. limit=2.0 2022-11-15 16:01:47,611 INFO [train.py:876] (0/4) Epoch 2, batch 7000, loss[loss=0.2111, simple_loss=0.1998, pruned_loss=0.1112, over 5690.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2161, pruned_loss=0.1294, over 1075291.58 frames. ], batch size: 28, lr: 2.82e-02, grad_scale: 16.0 2022-11-15 16:02:11,793 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 2.311e+02 3.068e+02 3.846e+02 6.793e+02, threshold=6.137e+02, percent-clipped=2.0 2022-11-15 16:02:52,003 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.15 vs. limit=2.0 2022-11-15 16:02:57,422 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.27 vs. limit=5.0 2022-11-15 16:02:58,387 INFO [train.py:876] (0/4) Epoch 2, batch 7100, loss[loss=0.2119, simple_loss=0.2153, pruned_loss=0.1043, over 5735.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2161, pruned_loss=0.1296, over 1070141.50 frames. ], batch size: 15, lr: 2.81e-02, grad_scale: 16.0 2022-11-15 16:03:13,684 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=14393.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:03:23,297 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 2.305e+02 2.820e+02 3.726e+02 7.277e+02, threshold=5.640e+02, percent-clipped=2.0 2022-11-15 16:03:27,743 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14413.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:03:29,907 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4986, 4.1093, 3.4867, 3.1964, 2.6503, 3.9985, 2.7733, 3.2985], device='cuda:0'), covar=tensor([0.0182, 0.0033, 0.0051, 0.0104, 0.0184, 0.0029, 0.0121, 0.0032], device='cuda:0'), in_proj_covar=tensor([0.0114, 0.0062, 0.0080, 0.0073, 0.0119, 0.0072, 0.0099, 0.0063], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:03:30,068 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.08 vs. limit=2.0 2022-11-15 16:03:35,047 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7074, 4.0683, 4.4713, 4.0837, 4.7049, 4.4423, 4.0938, 4.6501], device='cuda:0'), covar=tensor([0.0269, 0.0251, 0.0464, 0.0251, 0.0306, 0.0129, 0.0200, 0.0166], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0080, 0.0065, 0.0077, 0.0071, 0.0049, 0.0064, 0.0062], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 16:03:37,774 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14428.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:03:47,210 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=14441.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:03:47,901 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.9940, 5.4391, 5.7705, 5.2587, 6.0265, 6.0323, 4.8949, 5.9294], device='cuda:0'), covar=tensor([0.0227, 0.0200, 0.0359, 0.0196, 0.0303, 0.0053, 0.0176, 0.0204], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0080, 0.0065, 0.0077, 0.0070, 0.0049, 0.0064, 0.0062], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 16:03:56,045 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.15 vs. limit=5.0 2022-11-15 16:04:00,298 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14459.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:04:09,798 INFO [train.py:876] (0/4) Epoch 2, batch 7200, loss[loss=0.2135, simple_loss=0.1871, pruned_loss=0.1199, over 5016.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2156, pruned_loss=0.1283, over 1084166.57 frames. ], batch size: 109, lr: 2.80e-02, grad_scale: 16.0 2022-11-15 16:04:10,626 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14474.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:04:11,845 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14476.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 16:04:18,392 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. limit=2.0 2022-11-15 16:04:19,425 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14487.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:04:20,664 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9110, 4.2690, 3.4363, 2.0410, 4.1832, 1.4542, 3.9877, 2.2640], device='cuda:0'), covar=tensor([0.0830, 0.0088, 0.0284, 0.1682, 0.0105, 0.1747, 0.0118, 0.1548], device='cuda:0'), in_proj_covar=tensor([0.0116, 0.0075, 0.0074, 0.0113, 0.0080, 0.0119, 0.0066, 0.0115], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 16:04:33,141 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.449e+02 2.188e+02 2.926e+02 3.996e+02 7.445e+02, threshold=5.852e+02, percent-clipped=7.0 2022-11-15 16:04:49,529 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.81 vs. limit=5.0 2022-11-15 16:04:59,838 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-2.pt 2022-11-15 16:05:50,251 INFO [train.py:876] (0/4) Epoch 3, batch 0, loss[loss=0.2351, simple_loss=0.206, pruned_loss=0.1321, over 5057.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.206, pruned_loss=0.1321, over 5057.00 frames. ], batch size: 109, lr: 2.66e-02, grad_scale: 16.0 2022-11-15 16:05:50,252 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 16:05:59,776 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8910, 1.3976, 1.5736, 1.9811, 1.2211, 1.2009, 1.1642, 2.0032], device='cuda:0'), covar=tensor([0.0100, 0.0278, 0.0379, 0.0186, 0.0440, 0.0581, 0.0375, 0.0271], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0037, 0.0042, 0.0031, 0.0048, 0.0037, 0.0044, 0.0028], device='cuda:0'), out_proj_covar=tensor([6.1677e-05, 7.2798e-05, 9.3187e-05, 6.1843e-05, 9.7840e-05, 7.9605e-05, 8.7051e-05, 5.5923e-05], device='cuda:0') 2022-11-15 16:06:07,522 INFO [train.py:908] (0/4) Epoch 3, validation: loss=0.1917, simple_loss=0.2065, pruned_loss=0.08845, over 1530663.00 frames. 2022-11-15 16:06:07,523 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 16:06:09,670 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14548.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:06:51,176 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.16 vs. limit=2.0 2022-11-15 16:06:52,131 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.043e+02 2.390e+02 2.831e+02 3.708e+02 1.001e+03, threshold=5.662e+02, percent-clipped=6.0 2022-11-15 16:07:19,305 INFO [train.py:876] (0/4) Epoch 3, batch 100, loss[loss=0.2033, simple_loss=0.2015, pruned_loss=0.1026, over 5542.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2119, pruned_loss=0.1236, over 431776.12 frames. ], batch size: 21, lr: 2.65e-02, grad_scale: 16.0 2022-11-15 16:08:03,403 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.336e+02 2.280e+02 2.660e+02 3.506e+02 7.201e+02, threshold=5.320e+02, percent-clipped=1.0 2022-11-15 16:08:03,899 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-15 16:08:18,649 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=14728.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:08:27,262 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=14740.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:08:28,033 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3723, 2.0437, 2.3819, 3.4800, 3.3992, 2.5031, 2.1754, 3.4352], device='cuda:0'), covar=tensor([0.0047, 0.1236, 0.0963, 0.0220, 0.0141, 0.0947, 0.0850, 0.0070], device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0211, 0.0220, 0.0134, 0.0147, 0.0216, 0.0204, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:08:30,531 INFO [train.py:876] (0/4) Epoch 3, batch 200, loss[loss=0.2111, simple_loss=0.2001, pruned_loss=0.111, over 5583.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2131, pruned_loss=0.1242, over 692088.96 frames. ], batch size: 25, lr: 2.64e-02, grad_scale: 16.0 2022-11-15 16:08:41,081 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=14759.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:08:48,054 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14769.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:08:53,269 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=14776.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:08:53,348 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=14776.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 16:08:56,299 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9223, 2.7905, 2.6858, 1.1535, 2.9822, 2.9164, 2.7186, 2.6166], device='cuda:0'), covar=tensor([0.0843, 0.0479, 0.0265, 0.0985, 0.0097, 0.0123, 0.0112, 0.0118], device='cuda:0'), in_proj_covar=tensor([0.0149, 0.0133, 0.0096, 0.0153, 0.0090, 0.0085, 0.0082, 0.0094], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 16:09:11,204 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=14801.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:09:15,919 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.042e+02 2.295e+02 2.977e+02 3.816e+02 8.125e+02, threshold=5.953e+02, percent-clipped=6.0 2022-11-15 16:09:16,003 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=14807.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:09:27,530 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=14824.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:09:30,272 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4297, 1.4443, 2.2329, 1.7507, 2.1512, 1.3132, 1.8630, 2.0442], device='cuda:0'), covar=tensor([0.0040, 0.0233, 0.0051, 0.0113, 0.0062, 0.0297, 0.0143, 0.0088], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0164, 0.0114, 0.0171, 0.0102, 0.0151, 0.0163, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:09:41,492 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=14843.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:09:42,823 INFO [train.py:876] (0/4) Epoch 3, batch 300, loss[loss=0.2386, simple_loss=0.2205, pruned_loss=0.1283, over 5813.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2143, pruned_loss=0.1263, over 852934.43 frames. ], batch size: 21, lr: 2.63e-02, grad_scale: 16.0 2022-11-15 16:09:49,023 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0970, 1.1808, 0.8523, 0.7365, 0.9760, 1.0270, 0.6796, 1.0572], device='cuda:0'), covar=tensor([0.0080, 0.0049, 0.0114, 0.0084, 0.0096, 0.0416, 0.0143, 0.0117], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0019, 0.0021, 0.0022, 0.0020, 0.0019, 0.0021, 0.0017], device='cuda:0'), out_proj_covar=tensor([2.7365e-05, 2.6888e-05, 2.9566e-05, 2.6813e-05, 2.6173e-05, 2.6120e-05, 3.6296e-05, 2.3875e-05], device='cuda:0') 2022-11-15 16:10:27,141 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.433e+02 2.202e+02 2.979e+02 3.584e+02 7.564e+02, threshold=5.959e+02, percent-clipped=7.0 2022-11-15 16:10:28,645 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.05 vs. limit=2.0 2022-11-15 16:10:38,216 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9397, 2.0018, 3.7953, 2.7861, 3.9863, 3.0987, 3.5700, 4.0766], device='cuda:0'), covar=tensor([0.0046, 0.0448, 0.0089, 0.0336, 0.0040, 0.0233, 0.0182, 0.0113], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0168, 0.0117, 0.0174, 0.0103, 0.0151, 0.0164, 0.0125], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:10:55,020 INFO [train.py:876] (0/4) Epoch 3, batch 400, loss[loss=0.22, simple_loss=0.2099, pruned_loss=0.115, over 5510.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2142, pruned_loss=0.1257, over 946890.94 frames. ], batch size: 17, lr: 2.62e-02, grad_scale: 16.0 2022-11-15 16:11:02,113 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4556, 1.0728, 1.0227, 0.8863, 1.4428, 0.7298, 1.2699, 1.5332], device='cuda:0'), covar=tensor([0.0376, 0.0269, 0.0329, 0.0639, 0.0220, 0.1140, 0.0241, 0.0209], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0010, 0.0009, 0.0010, 0.0009, 0.0010, 0.0010, 0.0008], device='cuda:0'), out_proj_covar=tensor([2.1750e-05, 2.3586e-05, 2.2332e-05, 2.7021e-05, 2.1860e-05, 2.2593e-05, 2.6144e-05, 2.0321e-05], device='cuda:0') 2022-11-15 16:11:03,014 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.08 vs. limit=2.0 2022-11-15 16:11:07,227 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9861, 4.3721, 3.3200, 1.8458, 4.2580, 1.6569, 3.9879, 2.5186], device='cuda:0'), covar=tensor([0.0945, 0.0108, 0.0375, 0.2263, 0.0100, 0.1691, 0.0156, 0.1435], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0076, 0.0075, 0.0114, 0.0077, 0.0118, 0.0066, 0.0115], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 16:11:26,437 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.35 vs. limit=5.0 2022-11-15 16:11:34,628 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-15000.pt 2022-11-15 16:11:43,139 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.230e+02 2.415e+02 2.867e+02 3.380e+02 8.857e+02, threshold=5.735e+02, percent-clipped=3.0 2022-11-15 16:12:10,987 INFO [train.py:876] (0/4) Epoch 3, batch 500, loss[loss=0.2811, simple_loss=0.245, pruned_loss=0.1586, over 5450.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2127, pruned_loss=0.1252, over 992155.34 frames. ], batch size: 64, lr: 2.62e-02, grad_scale: 16.0 2022-11-15 16:12:27,968 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=15069.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:12:34,680 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8542, 3.5730, 3.7026, 3.4754, 3.9845, 3.3803, 3.5561, 3.8951], device='cuda:0'), covar=tensor([0.0417, 0.0294, 0.0464, 0.0301, 0.0331, 0.0388, 0.0281, 0.0309], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0081, 0.0066, 0.0077, 0.0074, 0.0050, 0.0066, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 16:12:48,130 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=15096.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:12:48,142 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6807, 1.7730, 1.5981, 1.7456, 1.8773, 1.7221, 1.7140, 1.6720], device='cuda:0'), covar=tensor([0.0344, 0.0514, 0.0622, 0.0434, 0.0330, 0.0469, 0.0377, 0.0385], device='cuda:0'), in_proj_covar=tensor([0.0076, 0.0085, 0.0114, 0.0080, 0.0103, 0.0099, 0.0085, 0.0079], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:12:55,529 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.456e+02 2.220e+02 2.761e+02 3.566e+02 7.983e+02, threshold=5.522e+02, percent-clipped=3.0 2022-11-15 16:13:00,911 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.08 vs. limit=2.0 2022-11-15 16:13:02,513 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=15117.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:13:11,008 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4567, 4.0281, 4.2682, 4.0406, 4.5646, 4.2226, 4.0730, 4.4620], device='cuda:0'), covar=tensor([0.0350, 0.0250, 0.0446, 0.0252, 0.0275, 0.0202, 0.0177, 0.0280], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0082, 0.0065, 0.0078, 0.0075, 0.0050, 0.0066, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 16:13:18,318 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.25 vs. limit=5.0 2022-11-15 16:13:21,500 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=15143.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:13:22,752 INFO [train.py:876] (0/4) Epoch 3, batch 600, loss[loss=0.2183, simple_loss=0.2048, pruned_loss=0.1159, over 5576.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.2138, pruned_loss=0.1249, over 1025100.28 frames. ], batch size: 43, lr: 2.61e-02, grad_scale: 16.0 2022-11-15 16:13:25,958 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9511, 5.0673, 5.5870, 5.3863, 4.9603, 3.9941, 5.9991, 5.1916], device='cuda:0'), covar=tensor([0.0254, 0.0678, 0.0205, 0.0417, 0.0244, 0.0333, 0.0388, 0.0217], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0075, 0.0062, 0.0072, 0.0052, 0.0045, 0.0085, 0.0055], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:13:26,765 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7564, 2.3790, 1.5382, 1.0030, 1.9244, 2.6463, 1.7513, 3.0215], device='cuda:0'), covar=tensor([0.0705, 0.0362, 0.0454, 0.0954, 0.0133, 0.0128, 0.0164, 0.0091], device='cuda:0'), in_proj_covar=tensor([0.0158, 0.0133, 0.0101, 0.0160, 0.0097, 0.0089, 0.0085, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 16:13:28,072 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2444, 3.8023, 3.0637, 3.0603, 2.3831, 3.5028, 2.4667, 2.9276], device='cuda:0'), covar=tensor([0.0176, 0.0048, 0.0068, 0.0093, 0.0190, 0.0042, 0.0117, 0.0031], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0062, 0.0080, 0.0074, 0.0119, 0.0072, 0.0102, 0.0064], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:13:53,270 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2687, 3.7018, 4.1718, 3.8198, 4.3390, 3.9988, 3.9065, 4.2984], device='cuda:0'), covar=tensor([0.0296, 0.0341, 0.0357, 0.0298, 0.0317, 0.0255, 0.0271, 0.0255], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0084, 0.0068, 0.0080, 0.0077, 0.0051, 0.0068, 0.0067], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 16:13:55,646 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=15191.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:13:58,627 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=15195.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:14:07,258 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.247e+02 2.220e+02 2.668e+02 3.263e+02 7.749e+02, threshold=5.336e+02, percent-clipped=2.0 2022-11-15 16:14:29,956 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0 2022-11-15 16:14:33,897 INFO [train.py:876] (0/4) Epoch 3, batch 700, loss[loss=0.2256, simple_loss=0.2075, pruned_loss=0.1218, over 5662.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2149, pruned_loss=0.1261, over 1049624.41 frames. ], batch size: 29, lr: 2.60e-02, grad_scale: 16.0 2022-11-15 16:14:42,115 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=15256.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 16:15:05,890 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.70 vs. limit=5.0 2022-11-15 16:15:10,475 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1299, 1.1529, 1.0927, 1.2871, 1.0894, 1.1763, 1.0173, 1.2038], device='cuda:0'), covar=tensor([0.0021, 0.0011, 0.0014, 0.0013, 0.0010, 0.0011, 0.0022, 0.0011], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0012, 0.0012, 0.0015, 0.0013, 0.0014, 0.0016, 0.0013], device='cuda:0'), out_proj_covar=tensor([2.0236e-05, 1.8154e-05, 1.8170e-05, 1.7485e-05, 1.7787e-05, 1.7750e-05, 2.1851e-05, 1.7828e-05], device='cuda:0') 2022-11-15 16:15:11,505 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.97 vs. limit=5.0 2022-11-15 16:15:18,348 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.235e+02 2.381e+02 2.880e+02 4.091e+02 8.657e+02, threshold=5.760e+02, percent-clipped=7.0 2022-11-15 16:15:44,959 INFO [train.py:876] (0/4) Epoch 3, batch 800, loss[loss=0.2959, simple_loss=0.244, pruned_loss=0.1739, over 5465.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2148, pruned_loss=0.1258, over 1067326.53 frames. ], batch size: 49, lr: 2.59e-02, grad_scale: 16.0 2022-11-15 16:16:01,417 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3141, 1.4885, 1.8030, 1.5244, 0.7460, 1.7561, 1.0645, 1.4120], device='cuda:0'), covar=tensor([0.0192, 0.0127, 0.0147, 0.0184, 0.0959, 0.0369, 0.0465, 0.0390], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0027, 0.0028, 0.0032, 0.0028, 0.0025, 0.0027, 0.0029], device='cuda:0'), out_proj_covar=tensor([5.2084e-05, 3.6554e-05, 4.3189e-05, 5.1035e-05, 4.8372e-05, 4.3327e-05, 4.2338e-05, 4.7188e-05], device='cuda:0') 2022-11-15 16:16:21,718 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=15396.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:16:30,195 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 2.163e+02 2.793e+02 3.391e+02 6.505e+02, threshold=5.586e+02, percent-clipped=1.0 2022-11-15 16:16:56,551 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=15444.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:16:57,171 INFO [train.py:876] (0/4) Epoch 3, batch 900, loss[loss=0.2541, simple_loss=0.2213, pruned_loss=0.1435, over 5072.00 frames. ], tot_loss[loss=0.233, simple_loss=0.2146, pruned_loss=0.1257, over 1074596.45 frames. ], batch size: 91, lr: 2.59e-02, grad_scale: 16.0 2022-11-15 16:17:10,717 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2022-11-15 16:17:25,980 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=9.02 vs. limit=5.0 2022-11-15 16:17:41,564 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 2.326e+02 2.765e+02 3.451e+02 5.869e+02, threshold=5.530e+02, percent-clipped=2.0 2022-11-15 16:17:51,383 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7387, 4.1900, 3.3675, 4.0461, 4.0719, 3.4800, 3.7509, 3.2983], device='cuda:0'), covar=tensor([0.0302, 0.0541, 0.1027, 0.0696, 0.0529, 0.0639, 0.0467, 0.0574], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0084, 0.0117, 0.0084, 0.0104, 0.0100, 0.0089, 0.0081], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:18:09,250 INFO [train.py:876] (0/4) Epoch 3, batch 1000, loss[loss=0.1843, simple_loss=0.1896, pruned_loss=0.08947, over 5737.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2154, pruned_loss=0.1263, over 1073841.53 frames. ], batch size: 14, lr: 2.58e-02, grad_scale: 16.0 2022-11-15 16:18:13,437 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=15551.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 16:18:53,836 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.311e+02 2.148e+02 2.818e+02 3.595e+02 6.939e+02, threshold=5.636e+02, percent-clipped=2.0 2022-11-15 16:19:02,542 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2022-11-15 16:19:12,229 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2885, 3.9484, 3.0900, 3.1681, 2.3164, 3.6471, 2.3446, 3.1353], device='cuda:0'), covar=tensor([0.0164, 0.0026, 0.0067, 0.0081, 0.0162, 0.0029, 0.0110, 0.0025], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0062, 0.0082, 0.0077, 0.0119, 0.0075, 0.0099, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:19:16,190 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.4198, 4.9270, 5.3339, 4.7602, 5.4876, 5.3750, 4.7584, 5.4301], device='cuda:0'), covar=tensor([0.0368, 0.0250, 0.0354, 0.0266, 0.0395, 0.0081, 0.0207, 0.0181], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0083, 0.0070, 0.0081, 0.0079, 0.0051, 0.0068, 0.0071], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:0') 2022-11-15 16:19:20,895 INFO [train.py:876] (0/4) Epoch 3, batch 1100, loss[loss=0.3237, simple_loss=0.2665, pruned_loss=0.1904, over 5455.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2125, pruned_loss=0.124, over 1078497.37 frames. ], batch size: 58, lr: 2.57e-02, grad_scale: 32.0 2022-11-15 16:19:27,876 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6831, 2.0151, 3.4253, 2.5667, 3.8069, 2.4248, 3.1994, 3.8940], device='cuda:0'), covar=tensor([0.0048, 0.0406, 0.0100, 0.0378, 0.0050, 0.0260, 0.0148, 0.0078], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0160, 0.0116, 0.0175, 0.0104, 0.0153, 0.0161, 0.0123], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:19:48,299 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0621, 1.9289, 3.4884, 2.5188, 3.9549, 2.0961, 3.3023, 3.9418], device='cuda:0'), covar=tensor([0.0028, 0.0398, 0.0087, 0.0330, 0.0035, 0.0321, 0.0155, 0.0090], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0156, 0.0114, 0.0171, 0.0101, 0.0150, 0.0157, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:20:05,751 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.351e+02 2.101e+02 2.719e+02 3.370e+02 6.510e+02, threshold=5.438e+02, percent-clipped=5.0 2022-11-15 16:20:16,674 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7538, 1.6164, 1.2865, 1.4867, 1.3957, 2.2688, 1.7483, 1.9130], device='cuda:0'), covar=tensor([0.0066, 0.0249, 0.0101, 0.0040, 0.0036, 0.0035, 0.0050, 0.0034], device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0014, 0.0013, 0.0016, 0.0014, 0.0014, 0.0017, 0.0014], device='cuda:0'), out_proj_covar=tensor([2.0853e-05, 2.0257e-05, 2.0185e-05, 2.0121e-05, 1.8558e-05, 1.8141e-05, 2.2377e-05, 1.8597e-05], device='cuda:0') 2022-11-15 16:20:32,642 INFO [train.py:876] (0/4) Epoch 3, batch 1200, loss[loss=0.2432, simple_loss=0.2228, pruned_loss=0.1318, over 5747.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2113, pruned_loss=0.1222, over 1084194.15 frames. ], batch size: 31, lr: 2.56e-02, grad_scale: 16.0 2022-11-15 16:20:45,551 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3101, 4.6733, 3.6076, 2.2203, 4.4561, 1.5838, 4.3176, 2.9098], device='cuda:0'), covar=tensor([0.0828, 0.0100, 0.0472, 0.1698, 0.0119, 0.1763, 0.0125, 0.1192], device='cuda:0'), in_proj_covar=tensor([0.0116, 0.0081, 0.0078, 0.0112, 0.0080, 0.0122, 0.0069, 0.0117], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 16:21:17,794 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.458e+02 2.202e+02 2.751e+02 3.254e+02 5.977e+02, threshold=5.502e+02, percent-clipped=4.0 2022-11-15 16:21:43,802 INFO [train.py:876] (0/4) Epoch 3, batch 1300, loss[loss=0.1739, simple_loss=0.1844, pruned_loss=0.08171, over 5739.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2105, pruned_loss=0.1218, over 1083010.82 frames. ], batch size: 13, lr: 2.56e-02, grad_scale: 16.0 2022-11-15 16:21:48,890 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=15851.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:22:11,342 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=15883.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:22:22,433 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=15899.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:22:29,934 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.357e+02 2.010e+02 2.682e+02 3.501e+02 1.988e+03, threshold=5.365e+02, percent-clipped=5.0 2022-11-15 16:22:37,779 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0633, 3.1375, 2.7796, 2.9579, 2.0307, 2.9668, 2.0440, 2.5794], device='cuda:0'), covar=tensor([0.0187, 0.0047, 0.0060, 0.0088, 0.0184, 0.0046, 0.0141, 0.0039], device='cuda:0'), in_proj_covar=tensor([0.0123, 0.0064, 0.0084, 0.0080, 0.0123, 0.0079, 0.0105, 0.0069], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:22:54,772 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=15944.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:22:55,282 INFO [train.py:876] (0/4) Epoch 3, batch 1400, loss[loss=0.216, simple_loss=0.207, pruned_loss=0.1125, over 5491.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.21, pruned_loss=0.1216, over 1085811.14 frames. ], batch size: 12, lr: 2.55e-02, grad_scale: 8.0 2022-11-15 16:23:12,847 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.12 vs. limit=5.0 2022-11-15 16:23:29,065 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4190, 1.8475, 1.5492, 1.4550, 1.0456, 1.4768, 1.3128, 1.0872], device='cuda:0'), covar=tensor([0.0060, 0.0013, 0.0028, 0.0023, 0.0096, 0.0026, 0.0044, 0.0028], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0062, 0.0083, 0.0078, 0.0121, 0.0077, 0.0102, 0.0069], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:23:34,108 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-15 16:23:36,100 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16001.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:23:42,148 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.208e+02 2.137e+02 2.778e+02 3.586e+02 7.557e+02, threshold=5.556e+02, percent-clipped=2.0 2022-11-15 16:24:07,310 INFO [train.py:876] (0/4) Epoch 3, batch 1500, loss[loss=0.1737, simple_loss=0.1858, pruned_loss=0.08084, over 5562.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2122, pruned_loss=0.1228, over 1086663.49 frames. ], batch size: 15, lr: 2.54e-02, grad_scale: 8.0 2022-11-15 16:24:19,064 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16062.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:24:36,019 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16085.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:24:52,650 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.193e+02 2.433e+02 2.944e+02 3.861e+02 5.407e+02, threshold=5.888e+02, percent-clipped=0.0 2022-11-15 16:25:18,685 INFO [train.py:876] (0/4) Epoch 3, batch 1600, loss[loss=0.2039, simple_loss=0.1922, pruned_loss=0.1078, over 5609.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2108, pruned_loss=0.1216, over 1087585.26 frames. ], batch size: 23, lr: 2.53e-02, grad_scale: 8.0 2022-11-15 16:25:19,555 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16146.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:25:21,628 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16149.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:25:25,522 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0961, 1.0963, 1.4399, 0.8462, 1.4317, 1.3825, 0.7811, 1.3010], device='cuda:0'), covar=tensor([0.0031, 0.0021, 0.0025, 0.0029, 0.0033, 0.0025, 0.0069, 0.0022], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0019, 0.0019, 0.0019, 0.0020, 0.0016, 0.0021, 0.0017], device='cuda:0'), out_proj_covar=tensor([2.4957e-05, 2.8488e-05, 2.6650e-05, 2.3814e-05, 2.3892e-05, 2.1495e-05, 3.5871e-05, 2.2452e-05], device='cuda:0') 2022-11-15 16:25:56,170 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8449, 0.9068, 1.1432, 0.7577, 0.9440, 0.9004, 0.9893, 1.1559], device='cuda:0'), covar=tensor([0.0226, 0.0347, 0.0370, 0.1052, 0.0628, 0.0646, 0.0607, 0.0613], device='cuda:0'), in_proj_covar=tensor([0.0010, 0.0012, 0.0009, 0.0010, 0.0010, 0.0010, 0.0011, 0.0010], device='cuda:0'), out_proj_covar=tensor([2.5936e-05, 2.8551e-05, 2.4956e-05, 3.0076e-05, 2.4969e-05, 2.5977e-05, 2.8653e-05, 2.5689e-05], device='cuda:0') 2022-11-15 16:26:04,923 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.227e+02 2.236e+02 2.729e+02 3.361e+02 8.285e+02, threshold=5.459e+02, percent-clipped=7.0 2022-11-15 16:26:05,778 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16210.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:26:23,499 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2693, 2.2790, 1.4263, 2.8720, 1.9975, 2.1705, 2.0778, 2.8190], device='cuda:0'), covar=tensor([0.0219, 0.0298, 0.0828, 0.0396, 0.0425, 0.0385, 0.0366, 0.0742], device='cuda:0'), in_proj_covar=tensor([0.0039, 0.0040, 0.0047, 0.0035, 0.0051, 0.0043, 0.0050, 0.0035], device='cuda:0'), out_proj_covar=tensor([7.4456e-05, 8.1591e-05, 1.1034e-04, 7.2832e-05, 1.0777e-04, 9.6728e-05, 1.0083e-04, 7.0610e-05], device='cuda:0') 2022-11-15 16:26:26,173 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16239.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:26:30,141 INFO [train.py:876] (0/4) Epoch 3, batch 1700, loss[loss=0.1983, simple_loss=0.1974, pruned_loss=0.09959, over 5698.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2103, pruned_loss=0.1216, over 1085685.17 frames. ], batch size: 19, lr: 2.53e-02, grad_scale: 8.0 2022-11-15 16:26:30,479 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.25 vs. limit=5.0 2022-11-15 16:26:49,993 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.71 vs. limit=2.0 2022-11-15 16:27:15,063 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.295e+02 2.221e+02 2.797e+02 3.532e+02 1.022e+03, threshold=5.594e+02, percent-clipped=2.0 2022-11-15 16:27:18,113 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6889, 2.9127, 2.2000, 1.2380, 2.7089, 3.1862, 2.9689, 3.5511], device='cuda:0'), covar=tensor([0.0923, 0.0436, 0.0365, 0.1103, 0.0145, 0.0099, 0.0113, 0.0097], device='cuda:0'), in_proj_covar=tensor([0.0159, 0.0147, 0.0106, 0.0171, 0.0103, 0.0093, 0.0093, 0.0106], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:0') 2022-11-15 16:27:41,535 INFO [train.py:876] (0/4) Epoch 3, batch 1800, loss[loss=0.1479, simple_loss=0.1497, pruned_loss=0.07308, over 5707.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2101, pruned_loss=0.1212, over 1087464.57 frames. ], batch size: 11, lr: 2.52e-02, grad_scale: 8.0 2022-11-15 16:27:49,687 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16357.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:27:59,184 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1734, 0.9758, 1.7548, 1.4316, 0.1953, 1.5112, 0.9176, 1.0502], device='cuda:0'), covar=tensor([0.0180, 0.0176, 0.0084, 0.0335, 0.0661, 0.0253, 0.0665, 0.0294], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0027, 0.0025, 0.0029, 0.0026, 0.0023, 0.0025, 0.0027], device='cuda:0'), out_proj_covar=tensor([4.4885e-05, 3.7280e-05, 3.6373e-05, 4.6111e-05, 4.5266e-05, 4.0304e-05, 4.1581e-05, 4.3781e-05], device='cuda:0') 2022-11-15 16:27:59,870 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16371.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:28:26,563 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.507e+02 2.508e+02 2.970e+02 3.858e+02 6.298e+02, threshold=5.940e+02, percent-clipped=3.0 2022-11-15 16:28:34,926 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.7625, 5.2594, 5.6517, 5.1549, 5.7459, 5.7623, 4.7333, 5.4731], device='cuda:0'), covar=tensor([0.0262, 0.0176, 0.0281, 0.0240, 0.0304, 0.0056, 0.0263, 0.0238], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0082, 0.0067, 0.0084, 0.0078, 0.0052, 0.0067, 0.0071], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:0') 2022-11-15 16:28:42,864 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16432.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:28:49,198 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16441.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:28:51,927 INFO [train.py:876] (0/4) Epoch 3, batch 1900, loss[loss=0.2384, simple_loss=0.2126, pruned_loss=0.1321, over 5528.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2084, pruned_loss=0.1198, over 1084353.00 frames. ], batch size: 40, lr: 2.51e-02, grad_scale: 8.0 2022-11-15 16:29:18,489 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0648, 1.2914, 0.9157, 1.0721, 1.5355, 1.9568, 0.6745, 1.2455], device='cuda:0'), covar=tensor([0.0048, 0.0032, 0.0041, 0.0035, 0.0031, 0.0031, 0.0109, 0.0035], device='cuda:0'), in_proj_covar=tensor([0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0018, 0.0023, 0.0018], device='cuda:0'), out_proj_covar=tensor([2.8031e-05, 3.0354e-05, 2.8233e-05, 2.4599e-05, 2.4857e-05, 2.2718e-05, 4.0344e-05, 2.3972e-05], device='cuda:0') 2022-11-15 16:29:20,249 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2035, 0.9398, 1.0082, 0.8328, 0.8962, 1.1545, 1.3762, 1.1976], device='cuda:0'), covar=tensor([0.0305, 0.0273, 0.0296, 0.0485, 0.0272, 0.0555, 0.0259, 0.0301], device='cuda:0'), in_proj_covar=tensor([0.0010, 0.0012, 0.0009, 0.0009, 0.0010, 0.0010, 0.0010, 0.0010], device='cuda:0'), out_proj_covar=tensor([2.6199e-05, 2.8226e-05, 2.5101e-05, 2.7444e-05, 2.5578e-05, 2.6515e-05, 2.7957e-05, 2.5814e-05], device='cuda:0') 2022-11-15 16:29:22,829 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4405, 4.4351, 4.3789, 3.8922, 4.5069, 4.1734, 1.6862, 4.3067], device='cuda:0'), covar=tensor([0.0213, 0.0168, 0.0208, 0.0296, 0.0191, 0.0221, 0.2247, 0.0279], device='cuda:0'), in_proj_covar=tensor([0.0078, 0.0059, 0.0060, 0.0049, 0.0071, 0.0049, 0.0113, 0.0077], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 16:29:34,091 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16505.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:29:37,426 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.237e+02 2.066e+02 2.615e+02 3.411e+02 7.424e+02, threshold=5.231e+02, percent-clipped=4.0 2022-11-15 16:29:49,360 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2717, 1.7430, 2.8735, 2.2531, 3.0108, 2.1609, 2.6389, 3.0095], device='cuda:0'), covar=tensor([0.0036, 0.0334, 0.0091, 0.0281, 0.0061, 0.0226, 0.0171, 0.0089], device='cuda:0'), in_proj_covar=tensor([0.0099, 0.0165, 0.0123, 0.0176, 0.0113, 0.0157, 0.0168, 0.0131], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:29:58,066 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=16539.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:30:02,345 INFO [train.py:876] (0/4) Epoch 3, batch 2000, loss[loss=0.244, simple_loss=0.201, pruned_loss=0.1435, over 4109.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2091, pruned_loss=0.1199, over 1089033.12 frames. ], batch size: 181, lr: 2.51e-02, grad_scale: 8.0 2022-11-15 16:30:32,349 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=16587.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:30:48,527 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.413e+02 2.235e+02 2.805e+02 3.719e+02 8.798e+02, threshold=5.609e+02, percent-clipped=4.0 2022-11-15 16:30:48,685 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6861, 4.3177, 3.3500, 1.7539, 4.2291, 1.2861, 4.0168, 2.3681], device='cuda:0'), covar=tensor([0.1192, 0.0143, 0.0423, 0.2227, 0.0105, 0.1908, 0.0134, 0.1486], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0085, 0.0082, 0.0115, 0.0081, 0.0124, 0.0071, 0.0118], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 16:30:59,136 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9243, 3.2385, 2.5921, 2.6056, 1.6281, 2.8904, 2.1582, 2.1898], device='cuda:0'), covar=tensor([0.0126, 0.0025, 0.0043, 0.0075, 0.0137, 0.0028, 0.0072, 0.0030], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0068, 0.0089, 0.0084, 0.0126, 0.0084, 0.0107, 0.0074], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:31:14,091 INFO [train.py:876] (0/4) Epoch 3, batch 2100, loss[loss=0.2409, simple_loss=0.2144, pruned_loss=0.1337, over 5673.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2102, pruned_loss=0.1216, over 1089579.21 frames. ], batch size: 34, lr: 2.50e-02, grad_scale: 8.0 2022-11-15 16:31:19,734 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2587, 1.0551, 1.1451, 0.8271, 1.0031, 1.3701, 1.2953, 1.2063], device='cuda:0'), covar=tensor([0.0231, 0.0141, 0.0219, 0.0408, 0.0270, 0.0439, 0.0360, 0.0286], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0011, 0.0009, 0.0009, 0.0010, 0.0010, 0.0010, 0.0009], device='cuda:0'), out_proj_covar=tensor([2.4325e-05, 2.7311e-05, 2.5340e-05, 2.6539e-05, 2.4707e-05, 2.5595e-05, 2.7102e-05, 2.3904e-05], device='cuda:0') 2022-11-15 16:31:22,846 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=16657.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:31:56,391 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=16705.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:31:59,088 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.519e+01 2.019e+02 2.564e+02 3.147e+02 5.259e+02, threshold=5.129e+02, percent-clipped=0.0 2022-11-15 16:32:12,153 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=16727.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:32:22,630 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=16741.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:32:25,239 INFO [train.py:876] (0/4) Epoch 3, batch 2200, loss[loss=0.2714, simple_loss=0.2308, pruned_loss=0.156, over 4668.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.209, pruned_loss=0.1204, over 1082878.80 frames. ], batch size: 135, lr: 2.49e-02, grad_scale: 8.0 2022-11-15 16:32:56,717 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=16789.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:33:07,901 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=16805.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:33:10,455 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.364e+02 2.121e+02 2.774e+02 3.414e+02 1.004e+03, threshold=5.548e+02, percent-clipped=7.0 2022-11-15 16:33:11,445 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.38 vs. limit=5.0 2022-11-15 16:33:36,649 INFO [train.py:876] (0/4) Epoch 3, batch 2300, loss[loss=0.2975, simple_loss=0.2607, pruned_loss=0.1671, over 5484.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2084, pruned_loss=0.1201, over 1078356.17 frames. ], batch size: 53, lr: 2.49e-02, grad_scale: 8.0 2022-11-15 16:33:37,532 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6222, 4.7073, 3.6348, 4.5755, 3.7050, 3.1999, 2.4297, 4.1496], device='cuda:0'), covar=tensor([0.1400, 0.0075, 0.0629, 0.0140, 0.0299, 0.0863, 0.2017, 0.0143], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0106, 0.0154, 0.0103, 0.0130, 0.0167, 0.0184, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:33:42,403 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=16853.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:33:53,205 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16868.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 16:34:22,204 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=16906.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:34:24,019 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.385e+02 2.301e+02 2.850e+02 3.623e+02 1.087e+03, threshold=5.699e+02, percent-clipped=3.0 2022-11-15 16:34:29,365 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.11 vs. limit=2.0 2022-11-15 16:34:38,460 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16929.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:34:50,032 INFO [train.py:876] (0/4) Epoch 3, batch 2400, loss[loss=0.2485, simple_loss=0.2294, pruned_loss=0.1338, over 5701.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2087, pruned_loss=0.1203, over 1081037.11 frames. ], batch size: 28, lr: 2.48e-02, grad_scale: 8.0 2022-11-15 16:34:54,359 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 16:35:02,522 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 16:35:05,727 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=16967.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 16:35:29,931 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1668, 0.6862, 1.1308, 0.9903, 1.4093, 1.0273, 1.3720, 1.5923], device='cuda:0'), covar=tensor([0.0361, 0.0368, 0.0717, 0.1024, 0.0373, 0.0590, 0.0394, 0.0523], device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0011, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009, 0.0009], device='cuda:0'), out_proj_covar=tensor([2.3905e-05, 2.6553e-05, 2.4480e-05, 2.6522e-05, 2.3308e-05, 2.4479e-05, 2.5406e-05, 2.3214e-05], device='cuda:0') 2022-11-15 16:35:36,095 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.257e+02 2.211e+02 2.830e+02 3.499e+02 8.414e+02, threshold=5.661e+02, percent-clipped=2.0 2022-11-15 16:35:48,486 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=17027.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:35:54,973 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8374, 3.8684, 3.6037, 3.8057, 3.9033, 3.7256, 1.2292, 3.7797], device='cuda:0'), covar=tensor([0.0241, 0.0283, 0.0320, 0.0147, 0.0191, 0.0161, 0.3054, 0.0247], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0062, 0.0063, 0.0052, 0.0074, 0.0055, 0.0120, 0.0082], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 16:35:55,001 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2966, 4.0687, 3.0698, 3.9636, 3.1064, 2.9753, 1.7825, 3.6186], device='cuda:0'), covar=tensor([0.1530, 0.0153, 0.0698, 0.0165, 0.0428, 0.0744, 0.2047, 0.0174], device='cuda:0'), in_proj_covar=tensor([0.0188, 0.0113, 0.0167, 0.0110, 0.0138, 0.0175, 0.0195, 0.0115], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:35:59,870 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.55 vs. limit=2.0 2022-11-15 16:36:01,855 INFO [train.py:876] (0/4) Epoch 3, batch 2500, loss[loss=0.1142, simple_loss=0.1314, pruned_loss=0.04853, over 5169.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.209, pruned_loss=0.1207, over 1083334.46 frames. ], batch size: 7, lr: 2.47e-02, grad_scale: 8.0 2022-11-15 16:36:02,965 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2022-11-15 16:36:07,464 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-15 16:36:22,798 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=17075.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:36:24,916 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0376, 1.2060, 2.0816, 1.3222, 0.9182, 2.4537, 1.8234, 1.6243], device='cuda:0'), covar=tensor([0.0229, 0.0205, 0.0123, 0.0251, 0.0664, 0.0265, 0.0140, 0.0202], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0028, 0.0027, 0.0030, 0.0027, 0.0024, 0.0024, 0.0027], device='cuda:0'), out_proj_covar=tensor([4.5896e-05, 4.0628e-05, 4.0933e-05, 4.7114e-05, 4.7677e-05, 4.2108e-05, 4.0224e-05, 4.5500e-05], device='cuda:0') 2022-11-15 16:36:28,927 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3553, 1.8670, 1.5011, 1.4838, 1.8575, 1.6414, 1.8843, 1.6222], device='cuda:0'), covar=tensor([0.0028, 0.0103, 0.0048, 0.0030, 0.0021, 0.0035, 0.0025, 0.0030], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0013, 0.0013, 0.0016, 0.0013, 0.0015, 0.0015, 0.0014], device='cuda:0'), out_proj_covar=tensor([1.9948e-05, 1.9559e-05, 1.8467e-05, 1.9895e-05, 1.7357e-05, 1.8413e-05, 2.0688e-05, 1.8672e-05], device='cuda:0') 2022-11-15 16:36:47,314 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.241e+02 2.290e+02 2.908e+02 3.668e+02 6.942e+02, threshold=5.815e+02, percent-clipped=2.0 2022-11-15 16:37:12,390 INFO [train.py:876] (0/4) Epoch 3, batch 2600, loss[loss=0.1911, simple_loss=0.1773, pruned_loss=0.1025, over 5354.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2079, pruned_loss=0.1193, over 1083846.09 frames. ], batch size: 9, lr: 2.47e-02, grad_scale: 8.0 2022-11-15 16:37:26,763 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 16:37:43,072 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17189.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:37:48,785 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 16:37:57,338 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 2.163e+02 2.684e+02 3.226e+02 5.923e+02, threshold=5.368e+02, percent-clipped=1.0 2022-11-15 16:38:08,601 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17224.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 16:38:22,718 INFO [train.py:876] (0/4) Epoch 3, batch 2700, loss[loss=0.2936, simple_loss=0.2506, pruned_loss=0.1683, over 5489.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2093, pruned_loss=0.1201, over 1080899.32 frames. ], batch size: 49, lr: 2.46e-02, grad_scale: 8.0 2022-11-15 16:38:23,522 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7886, 4.7141, 4.6478, 4.0818, 4.7585, 4.4450, 1.7014, 4.6720], device='cuda:0'), covar=tensor([0.0201, 0.0171, 0.0177, 0.0195, 0.0253, 0.0192, 0.2333, 0.0248], device='cuda:0'), in_proj_covar=tensor([0.0081, 0.0059, 0.0061, 0.0051, 0.0075, 0.0053, 0.0113, 0.0080], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 16:38:26,435 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17250.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:38:34,859 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17262.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 16:38:37,146 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5084, 3.4985, 3.4370, 3.6885, 3.1027, 2.9949, 3.8780, 3.3473], device='cuda:0'), covar=tensor([0.0391, 0.0668, 0.0404, 0.0458, 0.0513, 0.0309, 0.0635, 0.0445], device='cuda:0'), in_proj_covar=tensor([0.0052, 0.0074, 0.0063, 0.0070, 0.0053, 0.0044, 0.0082, 0.0055], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 16:39:01,401 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 16:39:07,892 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 2.430e+02 3.020e+02 3.987e+02 6.630e+02, threshold=6.040e+02, percent-clipped=4.0 2022-11-15 16:39:12,274 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2629, 1.8502, 2.9874, 2.4441, 2.8913, 2.2749, 2.6589, 3.1721], device='cuda:0'), covar=tensor([0.0036, 0.0324, 0.0085, 0.0209, 0.0065, 0.0224, 0.0167, 0.0067], device='cuda:0'), in_proj_covar=tensor([0.0102, 0.0164, 0.0122, 0.0177, 0.0111, 0.0158, 0.0171, 0.0132], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:39:13,122 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.36 vs. limit=5.0 2022-11-15 16:39:14,282 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0072, 1.3459, 2.0009, 1.7545, 1.9562, 1.3310, 1.6040, 1.9459], device='cuda:0'), covar=tensor([0.0024, 0.0174, 0.0032, 0.0055, 0.0034, 0.0217, 0.0090, 0.0037], device='cuda:0'), in_proj_covar=tensor([0.0101, 0.0163, 0.0121, 0.0176, 0.0110, 0.0157, 0.0170, 0.0132], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:39:21,766 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-15 16:39:33,471 INFO [train.py:876] (0/4) Epoch 3, batch 2800, loss[loss=0.1953, simple_loss=0.1947, pruned_loss=0.09799, over 5710.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2055, pruned_loss=0.1163, over 1085243.50 frames. ], batch size: 13, lr: 2.45e-02, grad_scale: 8.0 2022-11-15 16:39:45,078 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17362.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:39:50,158 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.17 vs. limit=5.0 2022-11-15 16:40:00,284 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17383.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:40:18,908 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.389e+02 2.246e+02 2.631e+02 3.087e+02 4.773e+02, threshold=5.262e+02, percent-clipped=0.0 2022-11-15 16:40:28,884 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17423.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:40:34,943 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.14 vs. limit=2.0 2022-11-15 16:40:44,641 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17444.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:40:45,103 INFO [train.py:876] (0/4) Epoch 3, batch 2900, loss[loss=0.218, simple_loss=0.2, pruned_loss=0.118, over 5556.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2083, pruned_loss=0.1194, over 1088056.47 frames. ], batch size: 25, lr: 2.45e-02, grad_scale: 8.0 2022-11-15 16:41:13,541 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0677, 2.5429, 3.1379, 4.4566, 4.7097, 3.2999, 2.9869, 4.8589], device='cuda:0'), covar=tensor([0.0044, 0.1838, 0.1106, 0.0531, 0.0104, 0.1153, 0.1072, 0.0024], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0218, 0.0231, 0.0178, 0.0159, 0.0237, 0.0207, 0.0116], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0003, 0.0002, 0.0004, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:41:15,703 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.60 vs. limit=5.0 2022-11-15 16:41:30,293 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.327e+02 2.189e+02 2.747e+02 3.512e+02 7.345e+02, threshold=5.495e+02, percent-clipped=3.0 2022-11-15 16:41:40,747 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=17524.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:41:54,811 INFO [train.py:876] (0/4) Epoch 3, batch 3000, loss[loss=0.2009, simple_loss=0.2056, pruned_loss=0.09814, over 5747.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2084, pruned_loss=0.1195, over 1089426.53 frames. ], batch size: 16, lr: 2.44e-02, grad_scale: 8.0 2022-11-15 16:41:54,812 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 16:42:13,677 INFO [train.py:908] (0/4) Epoch 3, validation: loss=0.1847, simple_loss=0.2015, pruned_loss=0.08391, over 1530663.00 frames. 2022-11-15 16:42:13,677 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 16:42:13,751 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17545.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:42:18,869 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-15 16:42:25,436 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=17562.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 16:42:32,287 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=17572.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 16:42:41,315 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9586, 5.2462, 4.1459, 5.0962, 4.2924, 3.4937, 2.9349, 4.7640], device='cuda:0'), covar=tensor([0.1174, 0.0090, 0.0458, 0.0134, 0.0175, 0.0616, 0.1691, 0.0092], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0111, 0.0158, 0.0104, 0.0130, 0.0171, 0.0187, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:42:59,061 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 2.309e+02 3.014e+02 4.120e+02 1.212e+03, threshold=6.029e+02, percent-clipped=7.0 2022-11-15 16:43:00,215 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=17610.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:43:24,411 INFO [train.py:876] (0/4) Epoch 3, batch 3100, loss[loss=0.2408, simple_loss=0.2177, pruned_loss=0.132, over 5512.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2084, pruned_loss=0.1201, over 1080631.95 frames. ], batch size: 49, lr: 2.43e-02, grad_scale: 8.0 2022-11-15 16:43:52,670 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4457, 5.0298, 3.7860, 2.3234, 4.8142, 1.6188, 4.5815, 3.1048], device='cuda:0'), covar=tensor([0.0701, 0.0110, 0.0380, 0.1629, 0.0126, 0.1565, 0.0116, 0.0945], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0084, 0.0081, 0.0113, 0.0084, 0.0124, 0.0070, 0.0116], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 16:44:04,501 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17700.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:44:10,834 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 2.237e+02 2.895e+02 3.474e+02 7.762e+02, threshold=5.791e+02, percent-clipped=3.0 2022-11-15 16:44:16,893 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17718.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:44:22,192 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.04 vs. limit=2.0 2022-11-15 16:44:25,297 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17730.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:44:25,552 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 16:44:31,318 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=17739.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:44:35,379 INFO [train.py:876] (0/4) Epoch 3, batch 3200, loss[loss=0.2122, simple_loss=0.19, pruned_loss=0.1172, over 5483.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2093, pruned_loss=0.1203, over 1081674.25 frames. ], batch size: 12, lr: 2.43e-02, grad_scale: 8.0 2022-11-15 16:44:43,816 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2022-11-15 16:44:47,656 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17761.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:45:08,541 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=17791.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:45:21,430 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.240e+02 2.197e+02 2.889e+02 3.896e+02 6.814e+02, threshold=5.779e+02, percent-clipped=2.0 2022-11-15 16:45:44,143 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8238, 2.9400, 2.7657, 2.7564, 2.8435, 2.8109, 1.1940, 2.7739], device='cuda:0'), covar=tensor([0.0238, 0.0148, 0.0189, 0.0170, 0.0226, 0.0179, 0.2115, 0.0277], device='cuda:0'), in_proj_covar=tensor([0.0081, 0.0059, 0.0064, 0.0050, 0.0075, 0.0055, 0.0114, 0.0081], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 16:45:47,451 INFO [train.py:876] (0/4) Epoch 3, batch 3300, loss[loss=0.3762, simple_loss=0.292, pruned_loss=0.2302, over 5468.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2103, pruned_loss=0.1215, over 1074211.39 frames. ], batch size: 64, lr: 2.42e-02, grad_scale: 8.0 2022-11-15 16:45:47,593 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=17845.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:46:02,021 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1521, 4.3949, 3.5772, 1.9734, 4.3397, 1.7460, 4.5581, 2.4125], device='cuda:0'), covar=tensor([0.0886, 0.0106, 0.0354, 0.1815, 0.0131, 0.1679, 0.0056, 0.1473], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0087, 0.0084, 0.0115, 0.0087, 0.0129, 0.0074, 0.0122], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 16:46:11,249 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.08 vs. limit=2.0 2022-11-15 16:46:21,791 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=17893.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:46:32,463 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 2.187e+02 2.686e+02 3.512e+02 8.499e+02, threshold=5.372e+02, percent-clipped=2.0 2022-11-15 16:46:50,371 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.34 vs. limit=5.0 2022-11-15 16:46:58,450 INFO [train.py:876] (0/4) Epoch 3, batch 3400, loss[loss=0.1876, simple_loss=0.1898, pruned_loss=0.09272, over 5773.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2093, pruned_loss=0.1197, over 1086235.49 frames. ], batch size: 20, lr: 2.41e-02, grad_scale: 16.0 2022-11-15 16:47:08,854 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=17960.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:47:19,067 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7511, 1.7536, 1.9942, 1.2636, 2.6614, 1.3633, 2.0847, 1.9378], device='cuda:0'), covar=tensor([0.0026, 0.0093, 0.0067, 0.0035, 0.0011, 0.0084, 0.0022, 0.0030], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0012, 0.0013, 0.0016, 0.0013, 0.0015, 0.0015, 0.0014], device='cuda:0'), out_proj_covar=tensor([1.9574e-05, 1.7440e-05, 1.7634e-05, 1.9840e-05, 1.5013e-05, 1.8309e-05, 1.9527e-05, 1.8696e-05], device='cuda:0') 2022-11-15 16:47:43,699 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.311e+02 2.080e+02 2.540e+02 3.324e+02 6.629e+02, threshold=5.080e+02, percent-clipped=5.0 2022-11-15 16:47:50,076 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18018.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:47:52,126 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18021.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:48:04,387 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18039.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:48:09,016 INFO [train.py:876] (0/4) Epoch 3, batch 3500, loss[loss=0.1906, simple_loss=0.191, pruned_loss=0.09511, over 5584.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2087, pruned_loss=0.119, over 1088585.67 frames. ], batch size: 18, lr: 2.41e-02, grad_scale: 16.0 2022-11-15 16:48:17,473 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18056.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:48:21,964 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.26 vs. limit=2.0 2022-11-15 16:48:24,383 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18066.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:48:28,358 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.95 vs. limit=5.0 2022-11-15 16:48:38,158 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18086.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:48:38,774 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18087.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:48:55,270 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.292e+02 2.252e+02 2.672e+02 3.554e+02 6.890e+02, threshold=5.344e+02, percent-clipped=4.0 2022-11-15 16:49:14,776 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5260, 4.4479, 3.6848, 4.3662, 3.4678, 3.0486, 2.2401, 3.7956], device='cuda:0'), covar=tensor([0.1191, 0.0130, 0.0454, 0.0143, 0.0297, 0.0659, 0.1829, 0.0136], device='cuda:0'), in_proj_covar=tensor([0.0173, 0.0111, 0.0154, 0.0107, 0.0132, 0.0168, 0.0187, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:49:20,260 INFO [train.py:876] (0/4) Epoch 3, batch 3600, loss[loss=0.2047, simple_loss=0.1988, pruned_loss=0.1053, over 5611.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2096, pruned_loss=0.1201, over 1086156.03 frames. ], batch size: 18, lr: 2.40e-02, grad_scale: 16.0 2022-11-15 16:49:32,013 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.09 vs. limit=2.0 2022-11-15 16:49:34,888 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9932, 3.5339, 2.8180, 3.4220, 2.5334, 2.5580, 1.8129, 2.9769], device='cuda:0'), covar=tensor([0.1327, 0.0134, 0.0534, 0.0205, 0.0565, 0.0751, 0.1858, 0.0173], device='cuda:0'), in_proj_covar=tensor([0.0174, 0.0111, 0.0153, 0.0107, 0.0132, 0.0169, 0.0185, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:49:41,594 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18174.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:49:42,291 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18175.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:50:06,560 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.401e+02 2.223e+02 2.647e+02 3.378e+02 5.745e+02, threshold=5.295e+02, percent-clipped=2.0 2022-11-15 16:50:25,759 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18235.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:50:26,451 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18236.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:50:32,385 INFO [train.py:876] (0/4) Epoch 3, batch 3700, loss[loss=0.3489, simple_loss=0.2749, pruned_loss=0.2114, over 5405.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2086, pruned_loss=0.1192, over 1087873.77 frames. ], batch size: 58, lr: 2.40e-02, grad_scale: 16.0 2022-11-15 16:50:47,102 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.7861, 5.3229, 5.6258, 5.2992, 5.8010, 5.8712, 4.9525, 5.7380], device='cuda:0'), covar=tensor([0.0267, 0.0166, 0.0275, 0.0207, 0.0250, 0.0054, 0.0147, 0.0135], device='cuda:0'), in_proj_covar=tensor([0.0079, 0.0083, 0.0068, 0.0087, 0.0083, 0.0054, 0.0070, 0.0076], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 16:50:48,999 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9384, 1.3789, 1.5351, 1.4556, 1.7844, 1.4078, 1.1277, 1.9297], device='cuda:0'), covar=tensor([0.0082, 0.0691, 0.0402, 0.0295, 0.0171, 0.0428, 0.0673, 0.0086], device='cuda:0'), in_proj_covar=tensor([0.0125, 0.0225, 0.0228, 0.0190, 0.0170, 0.0239, 0.0209, 0.0123], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:51:17,618 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.280e+02 2.074e+02 2.681e+02 3.314e+02 6.700e+02, threshold=5.362e+02, percent-clipped=3.0 2022-11-15 16:51:22,574 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18316.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:51:43,707 INFO [train.py:876] (0/4) Epoch 3, batch 3800, loss[loss=0.2955, simple_loss=0.2556, pruned_loss=0.1677, over 5301.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2074, pruned_loss=0.1182, over 1084061.23 frames. ], batch size: 79, lr: 2.39e-02, grad_scale: 16.0 2022-11-15 16:51:49,493 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.40 vs. limit=5.0 2022-11-15 16:51:51,402 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18356.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:52:12,769 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18386.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:52:25,087 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18404.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:52:28,470 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 2.160e+02 2.623e+02 3.183e+02 6.237e+02, threshold=5.245e+02, percent-clipped=1.0 2022-11-15 16:52:46,227 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18434.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:52:49,528 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8029, 2.2240, 2.7283, 3.7928, 3.8089, 2.8488, 2.1980, 3.9688], device='cuda:0'), covar=tensor([0.0073, 0.1576, 0.1159, 0.0514, 0.0214, 0.1104, 0.1153, 0.0069], device='cuda:0'), in_proj_covar=tensor([0.0125, 0.0220, 0.0229, 0.0193, 0.0173, 0.0235, 0.0211, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:52:54,049 INFO [train.py:876] (0/4) Epoch 3, batch 3900, loss[loss=0.2445, simple_loss=0.23, pruned_loss=0.1295, over 5799.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2057, pruned_loss=0.116, over 1086246.05 frames. ], batch size: 21, lr: 2.38e-02, grad_scale: 16.0 2022-11-15 16:53:06,778 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-15 16:53:14,232 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.6233, 1.2249, 1.2282, 0.9637, 0.9301, 1.4483, 0.8798, 1.3406], device='cuda:0'), covar=tensor([0.0032, 0.0017, 0.0014, 0.0012, 0.0019, 0.0014, 0.0038, 0.0015], device='cuda:0'), in_proj_covar=tensor([0.0020, 0.0018, 0.0018, 0.0018, 0.0020, 0.0016, 0.0019, 0.0018], device='cuda:0'), out_proj_covar=tensor([2.6436e-05, 2.5252e-05, 2.2566e-05, 1.8734e-05, 2.3485e-05, 1.8695e-05, 3.0565e-05, 2.2744e-05], device='cuda:0') 2022-11-15 16:53:26,652 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18490.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:53:34,540 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8766, 2.1452, 2.6272, 3.8793, 4.0619, 2.6945, 2.2345, 3.9801], device='cuda:0'), covar=tensor([0.0059, 0.1701, 0.1262, 0.0587, 0.0179, 0.1376, 0.1116, 0.0066], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0223, 0.0231, 0.0194, 0.0174, 0.0236, 0.0210, 0.0123], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002], device='cuda:0') 2022-11-15 16:53:39,715 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18508.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 16:53:40,176 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.408e+02 2.213e+02 2.841e+02 3.897e+02 8.408e+02, threshold=5.683e+02, percent-clipped=7.0 2022-11-15 16:53:54,782 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18530.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:53:55,468 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18531.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:54:05,586 INFO [train.py:876] (0/4) Epoch 3, batch 4000, loss[loss=0.2355, simple_loss=0.209, pruned_loss=0.131, over 5270.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2056, pruned_loss=0.1155, over 1084409.82 frames. ], batch size: 79, lr: 2.38e-02, grad_scale: 16.0 2022-11-15 16:54:10,044 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18551.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:54:22,442 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18569.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 16:54:51,285 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.420e+02 2.156e+02 2.771e+02 3.318e+02 9.148e+02, threshold=5.542e+02, percent-clipped=3.0 2022-11-15 16:54:56,232 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18616.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:55:16,119 INFO [train.py:876] (0/4) Epoch 3, batch 4100, loss[loss=0.1669, simple_loss=0.1591, pruned_loss=0.08737, over 5740.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2062, pruned_loss=0.1171, over 1086903.15 frames. ], batch size: 11, lr: 2.37e-02, grad_scale: 16.0 2022-11-15 16:55:19,708 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 16:55:29,860 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18664.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:55:37,265 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18674.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:55:48,019 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2279, 1.2565, 1.8113, 1.5230, 1.8152, 1.2325, 1.3824, 1.7905], device='cuda:0'), covar=tensor([0.0026, 0.0048, 0.0020, 0.0027, 0.0020, 0.0044, 0.0029, 0.0026], device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0013, 0.0012, 0.0015, 0.0014, 0.0014, 0.0015, 0.0015], device='cuda:0'), out_proj_covar=tensor([1.9688e-05, 1.8748e-05, 1.6952e-05, 1.9128e-05, 1.5623e-05, 1.7788e-05, 1.9169e-05, 1.9008e-05], device='cuda:0') 2022-11-15 16:56:01,443 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 2.068e+02 2.779e+02 3.549e+02 8.529e+02, threshold=5.559e+02, percent-clipped=5.0 2022-11-15 16:56:20,127 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18735.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:56:26,895 INFO [train.py:876] (0/4) Epoch 3, batch 4200, loss[loss=0.2197, simple_loss=0.2215, pruned_loss=0.109, over 5546.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2069, pruned_loss=0.1173, over 1084477.78 frames. ], batch size: 15, lr: 2.37e-02, grad_scale: 16.0 2022-11-15 16:56:54,914 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 16:56:56,229 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.02 vs. limit=2.0 2022-11-15 16:57:12,962 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 2.085e+02 2.508e+02 3.197e+02 8.828e+02, threshold=5.016e+02, percent-clipped=2.0 2022-11-15 16:57:18,631 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2022-11-15 16:57:28,700 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18830.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:57:29,349 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=18831.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:57:38,764 INFO [train.py:876] (0/4) Epoch 3, batch 4300, loss[loss=0.2669, simple_loss=0.2317, pruned_loss=0.1511, over 5499.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2076, pruned_loss=0.1171, over 1085739.93 frames. ], batch size: 49, lr: 2.36e-02, grad_scale: 16.0 2022-11-15 16:57:39,544 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18846.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:57:46,157 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.61 vs. limit=5.0 2022-11-15 16:57:52,362 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=18864.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 16:58:02,525 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18878.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:58:03,526 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=18879.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:58:13,841 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.51 vs. limit=5.0 2022-11-15 16:58:24,292 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.372e+02 2.437e+02 2.955e+02 3.758e+02 8.175e+02, threshold=5.909e+02, percent-clipped=8.0 2022-11-15 16:58:35,252 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.40 vs. limit=5.0 2022-11-15 16:58:38,459 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4558, 2.1925, 1.8972, 2.5487, 2.2886, 1.7624, 1.8232, 2.3153], device='cuda:0'), covar=tensor([0.0571, 0.0498, 0.1059, 0.0462, 0.0647, 0.0639, 0.0683, 0.0786], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0044, 0.0053, 0.0034, 0.0054, 0.0043, 0.0053, 0.0033], device='cuda:0'), out_proj_covar=tensor([8.9371e-05, 1.0009e-04, 1.2495e-04, 7.9628e-05, 1.1960e-04, 1.0212e-04, 1.1416e-04, 7.6944e-05], device='cuda:0') 2022-11-15 16:58:40,182 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=18930.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:58:43,693 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2022-11-15 16:58:46,279 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.21 vs. limit=2.0 2022-11-15 16:58:50,712 INFO [train.py:876] (0/4) Epoch 3, batch 4400, loss[loss=0.1579, simple_loss=0.1543, pruned_loss=0.08074, over 5058.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2079, pruned_loss=0.1174, over 1083214.41 frames. ], batch size: 7, lr: 2.35e-02, grad_scale: 16.0 2022-11-15 16:58:54,542 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.07 vs. limit=2.0 2022-11-15 16:59:23,531 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=18991.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 16:59:36,220 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.259e+02 2.090e+02 2.682e+02 3.252e+02 5.703e+02, threshold=5.365e+02, percent-clipped=0.0 2022-11-15 16:59:51,555 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19030.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:00:02,100 INFO [train.py:876] (0/4) Epoch 3, batch 4500, loss[loss=0.2501, simple_loss=0.233, pruned_loss=0.1336, over 5758.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2061, pruned_loss=0.1156, over 1087318.72 frames. ], batch size: 31, lr: 2.35e-02, grad_scale: 16.0 2022-11-15 17:00:11,821 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. limit=2.0 2022-11-15 17:00:13,925 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.09 vs. limit=5.0 2022-11-15 17:00:19,852 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0515, 1.6734, 1.4444, 2.0416, 1.0381, 1.1136, 1.1931, 1.6931], device='cuda:0'), covar=tensor([0.0299, 0.0329, 0.0531, 0.0185, 0.0539, 0.0622, 0.0394, 0.0386], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0044, 0.0054, 0.0034, 0.0055, 0.0045, 0.0053, 0.0034], device='cuda:0'), out_proj_covar=tensor([8.9916e-05, 1.0055e-04, 1.2795e-04, 7.9733e-05, 1.2394e-04, 1.0561e-04, 1.1560e-04, 7.9275e-05], device='cuda:0') 2022-11-15 17:00:48,080 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.336e+02 1.944e+02 2.562e+02 3.151e+02 4.952e+02, threshold=5.125e+02, percent-clipped=0.0 2022-11-15 17:01:02,226 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=19129.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 17:01:13,373 INFO [train.py:876] (0/4) Epoch 3, batch 4600, loss[loss=0.1794, simple_loss=0.1798, pruned_loss=0.08951, over 5675.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2055, pruned_loss=0.1153, over 1078215.80 frames. ], batch size: 11, lr: 2.34e-02, grad_scale: 16.0 2022-11-15 17:01:14,577 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19146.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:01:27,314 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19164.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 17:01:46,007 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=19190.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 17:01:48,572 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19194.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:01:59,681 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 2.110e+02 2.501e+02 3.073e+02 9.500e+02, threshold=5.001e+02, percent-clipped=3.0 2022-11-15 17:02:01,808 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19212.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 17:02:24,582 INFO [train.py:876] (0/4) Epoch 3, batch 4700, loss[loss=0.2831, simple_loss=0.2527, pruned_loss=0.1567, over 5778.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.206, pruned_loss=0.1153, over 1084815.60 frames. ], batch size: 27, lr: 2.34e-02, grad_scale: 16.0 2022-11-15 17:02:37,908 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.57 vs. limit=2.0 2022-11-15 17:02:53,841 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19286.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:03:07,072 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.34 vs. limit=5.0 2022-11-15 17:03:10,434 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.250e+02 2.139e+02 2.654e+02 3.598e+02 8.917e+02, threshold=5.308e+02, percent-clipped=4.0 2022-11-15 17:03:17,458 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-15 17:03:17,774 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3985, 3.4525, 2.6377, 1.6944, 3.4504, 1.1264, 3.3208, 1.9162], device='cuda:0'), covar=tensor([0.0914, 0.0140, 0.0586, 0.1657, 0.0139, 0.1880, 0.0160, 0.1325], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0086, 0.0089, 0.0121, 0.0091, 0.0130, 0.0079, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 17:03:25,176 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19330.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:03:35,748 INFO [train.py:876] (0/4) Epoch 3, batch 4800, loss[loss=0.2441, simple_loss=0.217, pruned_loss=0.1356, over 5559.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2052, pruned_loss=0.1152, over 1085945.00 frames. ], batch size: 40, lr: 2.33e-02, grad_scale: 8.0 2022-11-15 17:03:59,322 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19378.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:04:10,754 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=19394.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:04:21,908 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.292e+02 2.238e+02 2.787e+02 3.697e+02 9.222e+02, threshold=5.575e+02, percent-clipped=6.0 2022-11-15 17:04:47,609 INFO [train.py:876] (0/4) Epoch 3, batch 4900, loss[loss=0.1933, simple_loss=0.1941, pruned_loss=0.09622, over 5592.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2045, pruned_loss=0.1152, over 1082566.33 frames. ], batch size: 24, lr: 2.32e-02, grad_scale: 8.0 2022-11-15 17:04:54,588 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=19455.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:05:16,059 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19485.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 17:05:33,094 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 2.112e+02 2.712e+02 3.360e+02 6.764e+02, threshold=5.425e+02, percent-clipped=2.0 2022-11-15 17:05:58,041 INFO [train.py:876] (0/4) Epoch 3, batch 5000, loss[loss=0.2421, simple_loss=0.222, pruned_loss=0.1311, over 5591.00 frames. ], tot_loss[loss=0.22, simple_loss=0.2068, pruned_loss=0.1166, over 1081743.33 frames. ], batch size: 22, lr: 2.32e-02, grad_scale: 8.0 2022-11-15 17:06:01,746 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 17:06:01,938 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.92 vs. limit=2.0 2022-11-15 17:06:09,504 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2794, 3.4383, 2.9428, 2.9945, 2.1087, 3.3115, 2.4562, 3.0172], device='cuda:0'), covar=tensor([0.0151, 0.0029, 0.0064, 0.0081, 0.0158, 0.0038, 0.0106, 0.0026], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0076, 0.0096, 0.0097, 0.0131, 0.0088, 0.0114, 0.0075], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:06:13,579 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3952, 4.2182, 3.5107, 3.6268, 2.4864, 4.1293, 2.7703, 3.4569], device='cuda:0'), covar=tensor([0.0235, 0.0052, 0.0079, 0.0136, 0.0221, 0.0038, 0.0151, 0.0032], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0076, 0.0096, 0.0097, 0.0131, 0.0088, 0.0114, 0.0075], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:06:26,434 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19586.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:06:44,100 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.388e+02 2.122e+02 2.600e+02 3.385e+02 5.559e+02, threshold=5.201e+02, percent-clipped=2.0 2022-11-15 17:06:47,322 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7162, 4.8458, 5.1048, 5.1450, 4.7705, 4.3768, 5.6629, 4.9698], device='cuda:0'), covar=tensor([0.0564, 0.0929, 0.0460, 0.0609, 0.0481, 0.0350, 0.0760, 0.0551], device='cuda:0'), in_proj_covar=tensor([0.0055, 0.0078, 0.0065, 0.0076, 0.0058, 0.0048, 0.0092, 0.0060], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:06:50,200 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=19618.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:06:53,905 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2833, 1.9897, 2.2735, 3.2652, 3.4390, 2.3255, 1.9392, 3.3072], device='cuda:0'), covar=tensor([0.0097, 0.1937, 0.1926, 0.0681, 0.0223, 0.1891, 0.1517, 0.0075], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0216, 0.0229, 0.0208, 0.0174, 0.0231, 0.0208, 0.0127], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0002], device='cuda:0') 2022-11-15 17:07:01,378 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19634.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:07:02,106 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.19 vs. limit=2.0 2022-11-15 17:07:09,723 INFO [train.py:876] (0/4) Epoch 3, batch 5100, loss[loss=0.1688, simple_loss=0.1825, pruned_loss=0.07758, over 5709.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2065, pruned_loss=0.1163, over 1079303.95 frames. ], batch size: 15, lr: 2.31e-02, grad_scale: 8.0 2022-11-15 17:07:12,571 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 17:07:17,443 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.14 vs. limit=2.0 2022-11-15 17:07:34,068 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=19679.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:07:54,167 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5110, 1.5647, 1.7685, 1.6441, 0.4696, 1.4829, 1.5502, 1.5814], device='cuda:0'), covar=tensor([0.0218, 0.0222, 0.0134, 0.0324, 0.0679, 0.0452, 0.0355, 0.0253], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0027, 0.0027, 0.0030, 0.0027, 0.0027, 0.0025, 0.0027], device='cuda:0'), out_proj_covar=tensor([4.4661e-05, 4.1230e-05, 4.1078e-05, 4.9745e-05, 4.7608e-05, 4.8111e-05, 3.9566e-05, 4.5093e-05], device='cuda:0') 2022-11-15 17:07:56,097 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.321e+02 2.277e+02 2.877e+02 3.641e+02 9.302e+02, threshold=5.755e+02, percent-clipped=5.0 2022-11-15 17:08:09,698 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0213, 4.4365, 4.2173, 4.5091, 4.0564, 3.2321, 5.0213, 4.3237], device='cuda:0'), covar=tensor([0.0571, 0.0638, 0.0428, 0.0633, 0.0435, 0.0490, 0.0654, 0.0376], device='cuda:0'), in_proj_covar=tensor([0.0056, 0.0079, 0.0066, 0.0076, 0.0059, 0.0048, 0.0093, 0.0060], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:08:20,159 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6336, 3.8582, 3.3412, 3.3180, 2.6128, 4.1819, 2.6929, 3.5371], device='cuda:0'), covar=tensor([0.0194, 0.0064, 0.0085, 0.0175, 0.0211, 0.0033, 0.0149, 0.0032], device='cuda:0'), in_proj_covar=tensor([0.0136, 0.0078, 0.0101, 0.0102, 0.0134, 0.0092, 0.0116, 0.0078], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:08:20,624 INFO [train.py:876] (0/4) Epoch 3, batch 5200, loss[loss=0.243, simple_loss=0.2204, pruned_loss=0.1328, over 5489.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2041, pruned_loss=0.1137, over 1081393.86 frames. ], batch size: 12, lr: 2.31e-02, grad_scale: 8.0 2022-11-15 17:08:24,894 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19750.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:08:49,751 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=19785.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 17:09:03,838 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.88 vs. limit=2.0 2022-11-15 17:09:07,142 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.329e+02 2.129e+02 2.534e+02 3.438e+02 1.087e+03, threshold=5.069e+02, percent-clipped=3.0 2022-11-15 17:09:23,653 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=19833.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 17:09:32,161 INFO [train.py:876] (0/4) Epoch 3, batch 5300, loss[loss=0.1695, simple_loss=0.1774, pruned_loss=0.08076, over 5725.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2044, pruned_loss=0.1139, over 1087469.26 frames. ], batch size: 15, lr: 2.30e-02, grad_scale: 8.0 2022-11-15 17:10:10,034 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.95 vs. limit=5.0 2022-11-15 17:10:18,063 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.350e+02 2.053e+02 2.554e+02 3.563e+02 7.883e+02, threshold=5.108e+02, percent-clipped=6.0 2022-11-15 17:10:43,253 INFO [train.py:876] (0/4) Epoch 3, batch 5400, loss[loss=0.185, simple_loss=0.187, pruned_loss=0.09154, over 5735.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2027, pruned_loss=0.1121, over 1088416.47 frames. ], batch size: 13, lr: 2.30e-02, grad_scale: 8.0 2022-11-15 17:10:50,123 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 17:11:04,386 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=19974.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:11:22,978 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-20000.pt 2022-11-15 17:11:33,651 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 2.054e+02 2.433e+02 3.312e+02 7.375e+02, threshold=4.866e+02, percent-clipped=2.0 2022-11-15 17:11:40,710 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20020.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:11:57,602 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0015, 2.1040, 2.6820, 3.8824, 4.0673, 2.9120, 2.3792, 4.1314], device='cuda:0'), covar=tensor([0.0091, 0.2751, 0.1899, 0.1440, 0.0234, 0.1897, 0.1949, 0.0089], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0224, 0.0230, 0.0214, 0.0180, 0.0240, 0.0222, 0.0135], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0004, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004, 0.0002], device='cuda:0') 2022-11-15 17:11:58,779 INFO [train.py:876] (0/4) Epoch 3, batch 5500, loss[loss=0.332, simple_loss=0.277, pruned_loss=0.1935, over 5406.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2033, pruned_loss=0.1123, over 1091285.82 frames. ], batch size: 70, lr: 2.29e-02, grad_scale: 8.0 2022-11-15 17:12:02,255 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20050.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:12:06,979 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2022-11-15 17:12:17,111 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3752, 1.9313, 1.8140, 1.3463, 1.3046, 1.3453, 1.6473, 1.6097], device='cuda:0'), covar=tensor([0.0028, 0.0048, 0.0063, 0.0026, 0.0021, 0.0087, 0.0023, 0.0021], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0012, 0.0012, 0.0014, 0.0014, 0.0013, 0.0015, 0.0013], device='cuda:0'), out_proj_covar=tensor([1.8948e-05, 1.6130e-05, 1.6315e-05, 1.7865e-05, 1.4842e-05, 1.6033e-05, 1.8388e-05, 1.7165e-05], device='cuda:0') 2022-11-15 17:12:24,744 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20081.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:12:30,161 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20089.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:12:34,434 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.85 vs. limit=2.0 2022-11-15 17:12:36,656 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20098.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:12:44,382 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 17:12:45,203 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.380e+02 2.322e+02 3.147e+02 3.867e+02 1.026e+03, threshold=6.293e+02, percent-clipped=11.0 2022-11-15 17:12:46,851 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20112.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:12:50,917 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3486, 3.4111, 3.3458, 3.3250, 3.4098, 3.2922, 1.2421, 3.4041], device='cuda:0'), covar=tensor([0.0227, 0.0201, 0.0180, 0.0144, 0.0233, 0.0215, 0.2208, 0.0193], device='cuda:0'), in_proj_covar=tensor([0.0082, 0.0062, 0.0064, 0.0054, 0.0077, 0.0060, 0.0115, 0.0084], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:13:00,341 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2022-11-15 17:13:09,287 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2531, 1.8478, 2.8298, 2.5538, 2.9858, 1.8527, 2.6675, 3.1910], device='cuda:0'), covar=tensor([0.0058, 0.0412, 0.0131, 0.0330, 0.0091, 0.0326, 0.0241, 0.0128], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0174, 0.0131, 0.0182, 0.0122, 0.0162, 0.0189, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 17:13:10,507 INFO [train.py:876] (0/4) Epoch 3, batch 5600, loss[loss=0.174, simple_loss=0.179, pruned_loss=0.08448, over 5722.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2039, pruned_loss=0.1129, over 1085503.48 frames. ], batch size: 17, lr: 2.29e-02, grad_scale: 8.0 2022-11-15 17:13:14,117 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20150.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:13:28,136 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 17:13:30,566 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20173.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:13:52,969 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20204.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:13:56,931 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 2.126e+02 2.547e+02 3.418e+02 6.941e+02, threshold=5.093e+02, percent-clipped=1.0 2022-11-15 17:14:16,439 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=8.58 vs. limit=5.0 2022-11-15 17:14:22,610 INFO [train.py:876] (0/4) Epoch 3, batch 5700, loss[loss=0.229, simple_loss=0.2129, pruned_loss=0.1225, over 5651.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2013, pruned_loss=0.1105, over 1081986.28 frames. ], batch size: 32, lr: 2.28e-02, grad_scale: 8.0 2022-11-15 17:14:36,317 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20265.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:14:42,876 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20274.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:15:08,577 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 2.183e+02 2.755e+02 3.255e+02 9.254e+02, threshold=5.510e+02, percent-clipped=4.0 2022-11-15 17:15:16,852 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20322.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:15:33,291 INFO [train.py:876] (0/4) Epoch 3, batch 5800, loss[loss=0.2442, simple_loss=0.2073, pruned_loss=0.1405, over 4705.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2004, pruned_loss=0.1096, over 1085528.78 frames. ], batch size: 135, lr: 2.28e-02, grad_scale: 8.0 2022-11-15 17:15:37,541 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 17:15:55,493 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20376.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:16:04,046 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20388.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:16:19,794 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.418e+02 2.270e+02 2.884e+02 3.519e+02 5.566e+02, threshold=5.768e+02, percent-clipped=1.0 2022-11-15 17:16:25,584 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20418.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:16:44,673 INFO [train.py:876] (0/4) Epoch 3, batch 5900, loss[loss=0.153, simple_loss=0.1598, pruned_loss=0.07309, over 5360.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2015, pruned_loss=0.1112, over 1089866.05 frames. ], batch size: 9, lr: 2.27e-02, grad_scale: 8.0 2022-11-15 17:16:44,761 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20445.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:16:47,617 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20449.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:17:01,271 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20468.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:17:09,328 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20479.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:17:30,803 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 2.074e+02 2.650e+02 3.512e+02 6.887e+02, threshold=5.300e+02, percent-clipped=1.0 2022-11-15 17:17:45,608 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 17:17:55,880 INFO [train.py:876] (0/4) Epoch 3, batch 6000, loss[loss=0.1342, simple_loss=0.144, pruned_loss=0.06216, over 5072.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2016, pruned_loss=0.1113, over 1087770.22 frames. ], batch size: 7, lr: 2.27e-02, grad_scale: 8.0 2022-11-15 17:17:55,883 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 17:18:13,095 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5148, 1.8449, 2.5874, 3.2188, 3.4819, 2.2668, 2.0487, 3.2876], device='cuda:0'), covar=tensor([0.0131, 0.3079, 0.1774, 0.1151, 0.0272, 0.2211, 0.1946, 0.0110], device='cuda:0'), in_proj_covar=tensor([0.0127, 0.0221, 0.0223, 0.0215, 0.0184, 0.0231, 0.0209, 0.0136], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0002], device='cuda:0') 2022-11-15 17:18:14,732 INFO [train.py:908] (0/4) Epoch 3, validation: loss=0.1788, simple_loss=0.1971, pruned_loss=0.08032, over 1530663.00 frames. 2022-11-15 17:18:14,733 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 17:18:25,277 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20560.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:18:45,549 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=20588.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:19:00,872 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 1.977e+02 2.420e+02 3.254e+02 5.998e+02, threshold=4.840e+02, percent-clipped=5.0 2022-11-15 17:19:25,865 INFO [train.py:876] (0/4) Epoch 3, batch 6100, loss[loss=0.1899, simple_loss=0.1847, pruned_loss=0.09751, over 4658.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2019, pruned_loss=0.1117, over 1084030.50 frames. ], batch size: 5, lr: 2.26e-02, grad_scale: 8.0 2022-11-15 17:19:28,889 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=20649.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:19:47,898 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20676.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:20:02,204 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-15 17:20:09,899 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2231, 3.5777, 3.4200, 3.7235, 3.3167, 2.7058, 4.0616, 3.3579], device='cuda:0'), covar=tensor([0.0643, 0.0767, 0.0600, 0.0735, 0.0592, 0.0543, 0.0634, 0.0583], device='cuda:0'), in_proj_covar=tensor([0.0056, 0.0079, 0.0067, 0.0075, 0.0058, 0.0049, 0.0092, 0.0061], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:20:11,190 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.178e+02 2.120e+02 2.658e+02 3.170e+02 7.084e+02, threshold=5.316e+02, percent-clipped=6.0 2022-11-15 17:20:17,390 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4559, 5.1514, 3.8284, 2.3323, 4.9917, 2.3785, 4.9565, 2.7795], device='cuda:0'), covar=tensor([0.0812, 0.0131, 0.0555, 0.1754, 0.0160, 0.1578, 0.0067, 0.1465], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0086, 0.0089, 0.0116, 0.0091, 0.0128, 0.0075, 0.0119], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 17:20:19,438 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0168, 4.7566, 4.5377, 4.9535, 4.2696, 3.3175, 5.3416, 4.5369], device='cuda:0'), covar=tensor([0.0489, 0.0563, 0.0324, 0.0479, 0.0366, 0.0333, 0.0502, 0.0284], device='cuda:0'), in_proj_covar=tensor([0.0055, 0.0079, 0.0066, 0.0074, 0.0058, 0.0049, 0.0090, 0.0060], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:20:20,748 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20724.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:20:36,117 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20744.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:20:36,747 INFO [train.py:876] (0/4) Epoch 3, batch 6200, loss[loss=0.1926, simple_loss=0.1895, pruned_loss=0.09782, over 5743.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2004, pruned_loss=0.1098, over 1090200.69 frames. ], batch size: 20, lr: 2.26e-02, grad_scale: 8.0 2022-11-15 17:20:36,870 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20745.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:20:41,905 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.57 vs. limit=2.0 2022-11-15 17:20:48,420 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.82 vs. limit=2.0 2022-11-15 17:20:49,484 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6846, 2.7103, 2.5288, 1.3241, 2.7440, 3.2490, 3.4654, 3.3766], device='cuda:0'), covar=tensor([0.1199, 0.0772, 0.0591, 0.1513, 0.0165, 0.0185, 0.0119, 0.0159], device='cuda:0'), in_proj_covar=tensor([0.0181, 0.0170, 0.0125, 0.0183, 0.0116, 0.0111, 0.0107, 0.0127], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:20:52,824 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20768.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:20:57,238 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20774.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:21:11,095 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20793.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:21:22,960 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.536e+02 2.113e+02 2.498e+02 3.399e+02 6.391e+02, threshold=4.996e+02, percent-clipped=4.0 2022-11-15 17:21:27,274 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20816.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:21:29,493 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0105, 2.0892, 2.6676, 3.6285, 4.0447, 2.7521, 2.3807, 4.2956], device='cuda:0'), covar=tensor([0.0260, 0.4294, 0.3018, 0.2169, 0.0463, 0.3286, 0.2524, 0.0175], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0222, 0.0228, 0.0220, 0.0188, 0.0235, 0.0210, 0.0142], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0002], device='cuda:0') 2022-11-15 17:21:47,811 INFO [train.py:876] (0/4) Epoch 3, batch 6300, loss[loss=0.2692, simple_loss=0.2341, pruned_loss=0.1521, over 5450.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2006, pruned_loss=0.1107, over 1078272.34 frames. ], batch size: 58, lr: 2.25e-02, grad_scale: 8.0 2022-11-15 17:21:58,670 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=20860.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:22:00,099 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5039, 2.3806, 1.6815, 1.3353, 1.4601, 1.7376, 1.7781, 1.5814], device='cuda:0'), covar=tensor([0.0069, 0.0047, 0.0052, 0.0019, 0.0015, 0.0135, 0.0018, 0.0021], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0012, 0.0012, 0.0013, 0.0013, 0.0012, 0.0014, 0.0013], device='cuda:0'), out_proj_covar=tensor([1.6643e-05, 1.4987e-05, 1.5251e-05, 1.6235e-05, 1.4029e-05, 1.4779e-05, 1.7100e-05, 1.6889e-05], device='cuda:0') 2022-11-15 17:22:32,809 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=20908.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:22:34,073 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.371e+02 2.128e+02 2.723e+02 3.616e+02 7.802e+02, threshold=5.445e+02, percent-clipped=12.0 2022-11-15 17:22:50,082 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0799, 1.3843, 1.0961, 1.4895, 0.9954, 0.9366, 0.7562, 1.4206], device='cuda:0'), covar=tensor([0.0534, 0.0687, 0.0781, 0.0363, 0.0980, 0.1023, 0.0973, 0.0253], device='cuda:0'), in_proj_covar=tensor([0.0041, 0.0045, 0.0053, 0.0038, 0.0057, 0.0043, 0.0053, 0.0039], device='cuda:0'), out_proj_covar=tensor([9.1666e-05, 1.0435e-04, 1.3435e-04, 9.0587e-05, 1.3418e-04, 1.0878e-04, 1.2309e-04, 9.3990e-05], device='cuda:0') 2022-11-15 17:22:58,277 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=20944.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:22:58,896 INFO [train.py:876] (0/4) Epoch 3, batch 6400, loss[loss=0.1727, simple_loss=0.1788, pruned_loss=0.08328, over 5566.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2017, pruned_loss=0.1113, over 1081132.67 frames. ], batch size: 25, lr: 2.25e-02, grad_scale: 8.0 2022-11-15 17:23:00,580 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.96 vs. limit=2.0 2022-11-15 17:23:10,987 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5392, 2.0091, 1.0982, 1.7123, 1.2143, 1.7998, 1.2748, 1.9445], device='cuda:0'), covar=tensor([0.0553, 0.1247, 0.1317, 0.1397, 0.1140, 0.1103, 0.0897, 0.0649], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0047, 0.0056, 0.0041, 0.0058, 0.0045, 0.0055, 0.0041], device='cuda:0'), out_proj_covar=tensor([9.5677e-05, 1.0919e-04, 1.4020e-04, 9.4927e-05, 1.3862e-04, 1.1246e-04, 1.2760e-04, 9.8066e-05], device='cuda:0') 2022-11-15 17:23:25,049 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 17:23:46,175 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.963e+02 2.483e+02 3.263e+02 5.829e+02, threshold=4.966e+02, percent-clipped=2.0 2022-11-15 17:23:47,028 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7968, 3.8888, 3.9687, 3.4888, 3.9659, 3.6458, 1.4151, 3.7760], device='cuda:0'), covar=tensor([0.0436, 0.0264, 0.0243, 0.0247, 0.0406, 0.0311, 0.2833, 0.0486], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0065, 0.0065, 0.0055, 0.0079, 0.0060, 0.0122, 0.0087], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:24:09,215 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1673, 1.2818, 1.3559, 1.2622, 1.4828, 1.3045, 1.7343, 1.4431], device='cuda:0'), covar=tensor([0.0016, 0.0054, 0.0037, 0.0018, 0.0014, 0.0027, 0.0017, 0.0021], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0012, 0.0011, 0.0013, 0.0012, 0.0012, 0.0014, 0.0012], device='cuda:0'), out_proj_covar=tensor([1.5334e-05, 1.4981e-05, 1.4949e-05, 1.5189e-05, 1.3164e-05, 1.4020e-05, 1.6843e-05, 1.5842e-05], device='cuda:0') 2022-11-15 17:24:10,686 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21044.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:24:11,255 INFO [train.py:876] (0/4) Epoch 3, batch 6500, loss[loss=0.1374, simple_loss=0.1488, pruned_loss=0.06297, over 4507.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2012, pruned_loss=0.1116, over 1074652.16 frames. ], batch size: 5, lr: 2.24e-02, grad_scale: 8.0 2022-11-15 17:24:17,479 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21052.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:24:32,924 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21074.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:24:45,227 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21092.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:24:58,583 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.422e+02 2.262e+02 2.845e+02 3.817e+02 7.458e+02, threshold=5.690e+02, percent-clipped=11.0 2022-11-15 17:25:00,959 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21113.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:25:04,396 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21118.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:25:06,963 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21122.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:25:22,799 INFO [train.py:876] (0/4) Epoch 3, batch 6600, loss[loss=0.187, simple_loss=0.198, pruned_loss=0.08798, over 5679.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.1989, pruned_loss=0.1087, over 1081150.59 frames. ], batch size: 19, lr: 2.23e-02, grad_scale: 8.0 2022-11-15 17:25:39,062 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0765, 3.2857, 2.4357, 3.2289, 3.1956, 3.0867, 3.3569, 3.0623], device='cuda:0'), covar=tensor([0.0837, 0.0699, 0.1631, 0.0760, 0.0880, 0.0510, 0.0433, 0.0530], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0108, 0.0162, 0.0102, 0.0129, 0.0115, 0.0111, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:25:47,399 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21179.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:26:05,447 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.55 vs. limit=5.0 2022-11-15 17:26:09,147 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.333e+02 2.074e+02 2.594e+02 3.441e+02 9.119e+02, threshold=5.187e+02, percent-clipped=3.0 2022-11-15 17:26:33,847 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21244.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:26:34,418 INFO [train.py:876] (0/4) Epoch 3, batch 6700, loss[loss=0.2345, simple_loss=0.2426, pruned_loss=0.1132, over 5566.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.1995, pruned_loss=0.1089, over 1079862.04 frames. ], batch size: 22, lr: 2.23e-02, grad_scale: 8.0 2022-11-15 17:27:07,743 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21292.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:27:18,991 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.34 vs. limit=2.0 2022-11-15 17:27:20,388 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.449e+02 2.110e+02 2.601e+02 3.417e+02 8.582e+02, threshold=5.201e+02, percent-clipped=1.0 2022-11-15 17:27:22,010 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21312.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:27:29,538 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.13 vs. limit=2.0 2022-11-15 17:27:33,316 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21328.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:27:45,487 INFO [train.py:876] (0/4) Epoch 3, batch 6800, loss[loss=0.2453, simple_loss=0.2285, pruned_loss=0.1311, over 5730.00 frames. ], tot_loss[loss=0.2107, simple_loss=0.2011, pruned_loss=0.1101, over 1085652.27 frames. ], batch size: 27, lr: 2.22e-02, grad_scale: 16.0 2022-11-15 17:28:05,232 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21373.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:28:17,371 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21389.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:28:30,803 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21408.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:28:32,050 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.395e+02 2.037e+02 2.447e+02 3.250e+02 5.677e+02, threshold=4.895e+02, percent-clipped=2.0 2022-11-15 17:28:33,194 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.39 vs. limit=2.0 2022-11-15 17:28:57,299 INFO [train.py:876] (0/4) Epoch 3, batch 6900, loss[loss=0.2341, simple_loss=0.2193, pruned_loss=0.1245, over 5736.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2, pruned_loss=0.1096, over 1086369.02 frames. ], batch size: 14, lr: 2.22e-02, grad_scale: 16.0 2022-11-15 17:29:03,541 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9099, 4.8253, 5.1711, 5.0469, 3.9567, 3.7737, 5.5271, 5.0060], device='cuda:0'), covar=tensor([0.0316, 0.0579, 0.0195, 0.0416, 0.0671, 0.0247, 0.0427, 0.0245], device='cuda:0'), in_proj_covar=tensor([0.0057, 0.0078, 0.0063, 0.0073, 0.0059, 0.0048, 0.0090, 0.0059], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:29:10,350 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4074, 1.1060, 1.2109, 1.0928, 1.0427, 1.7275, 0.9549, 0.9704], device='cuda:0'), covar=tensor([0.0025, 0.0024, 0.0015, 0.0015, 0.0021, 0.0006, 0.0030, 0.0039], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0018, 0.0019, 0.0018, 0.0019, 0.0015, 0.0019, 0.0017], device='cuda:0'), out_proj_covar=tensor([2.1750e-05, 2.3434e-05, 2.0546e-05, 1.8797e-05, 2.0617e-05, 1.5119e-05, 2.9039e-05, 2.0341e-05], device='cuda:0') 2022-11-15 17:29:17,889 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21474.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:29:39,041 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0502, 3.4472, 2.7678, 2.9601, 1.9446, 3.1990, 2.0161, 2.7131], device='cuda:0'), covar=tensor([0.0207, 0.0061, 0.0090, 0.0119, 0.0211, 0.0051, 0.0171, 0.0046], device='cuda:0'), in_proj_covar=tensor([0.0138, 0.0080, 0.0100, 0.0104, 0.0137, 0.0095, 0.0117, 0.0081], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:29:43,551 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.421e+02 2.216e+02 2.709e+02 3.216e+02 6.064e+02, threshold=5.419e+02, percent-clipped=2.0 2022-11-15 17:29:56,983 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4174, 0.8739, 1.1850, 0.9615, 1.4478, 1.8391, 0.9462, 1.0917], device='cuda:0'), covar=tensor([0.0020, 0.0024, 0.0013, 0.0040, 0.0016, 0.0005, 0.0029, 0.0079], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0018, 0.0019, 0.0018, 0.0019, 0.0016, 0.0020, 0.0018], device='cuda:0'), out_proj_covar=tensor([2.2187e-05, 2.3800e-05, 2.1209e-05, 1.9381e-05, 2.0468e-05, 1.5571e-05, 2.9239e-05, 2.1529e-05], device='cuda:0') 2022-11-15 17:30:08,651 INFO [train.py:876] (0/4) Epoch 3, batch 7000, loss[loss=0.1818, simple_loss=0.1866, pruned_loss=0.0885, over 5759.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2018, pruned_loss=0.1112, over 1089338.24 frames. ], batch size: 14, lr: 2.22e-02, grad_scale: 16.0 2022-11-15 17:30:24,561 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7461, 1.8742, 1.7756, 1.6196, 1.7932, 1.8858, 0.8773, 1.8382], device='cuda:0'), covar=tensor([0.0280, 0.0156, 0.0194, 0.0182, 0.0250, 0.0145, 0.1368, 0.0229], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0065, 0.0066, 0.0055, 0.0078, 0.0060, 0.0119, 0.0084], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:30:43,774 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21595.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:30:55,041 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.332e+02 2.225e+02 2.783e+02 3.485e+02 7.501e+02, threshold=5.566e+02, percent-clipped=1.0 2022-11-15 17:30:57,269 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21613.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:31:13,462 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0910, 1.4773, 1.9532, 1.3334, 0.3314, 1.7829, 1.3736, 1.3253], device='cuda:0'), covar=tensor([0.0238, 0.0219, 0.0123, 0.0311, 0.1028, 0.0621, 0.0325, 0.0376], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0032, 0.0030, 0.0034, 0.0032, 0.0026, 0.0030, 0.0032], device='cuda:0'), out_proj_covar=tensor([5.2838e-05, 4.9342e-05, 4.5431e-05, 5.9912e-05, 5.7328e-05, 4.8713e-05, 4.9219e-05, 5.3421e-05], device='cuda:0') 2022-11-15 17:31:19,547 INFO [train.py:876] (0/4) Epoch 3, batch 7100, loss[loss=0.2167, simple_loss=0.2088, pruned_loss=0.1123, over 5791.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2021, pruned_loss=0.1108, over 1086878.58 frames. ], batch size: 21, lr: 2.21e-02, grad_scale: 16.0 2022-11-15 17:31:27,560 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21656.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:31:36,523 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21668.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:31:40,745 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21674.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:31:47,842 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21684.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:32:04,374 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21708.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:32:05,171 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1734, 1.0229, 0.9023, 0.9868, 1.3554, 1.5986, 1.3337, 1.3388], device='cuda:0'), covar=tensor([0.0016, 0.0019, 0.0015, 0.0019, 0.0015, 0.0006, 0.0027, 0.0014], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0018, 0.0019, 0.0019, 0.0019, 0.0016, 0.0019, 0.0018], device='cuda:0'), out_proj_covar=tensor([2.3088e-05, 2.3742e-05, 2.1390e-05, 2.0330e-05, 2.0320e-05, 1.5695e-05, 2.9106e-05, 2.0622e-05], device='cuda:0') 2022-11-15 17:32:05,619 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.022e+02 2.184e+02 2.618e+02 3.432e+02 5.543e+02, threshold=5.236e+02, percent-clipped=0.0 2022-11-15 17:32:05,743 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8057, 2.8353, 2.3725, 2.7069, 2.7487, 2.5451, 2.4502, 2.3470], device='cuda:0'), covar=tensor([0.0249, 0.0329, 0.1023, 0.0373, 0.0378, 0.0369, 0.0483, 0.0476], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0108, 0.0168, 0.0104, 0.0132, 0.0114, 0.0113, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:32:15,305 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.18 vs. limit=5.0 2022-11-15 17:32:31,171 INFO [train.py:876] (0/4) Epoch 3, batch 7200, loss[loss=0.1665, simple_loss=0.1747, pruned_loss=0.07915, over 5751.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2017, pruned_loss=0.1106, over 1086637.83 frames. ], batch size: 14, lr: 2.21e-02, grad_scale: 16.0 2022-11-15 17:32:38,697 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21756.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:32:43,680 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21763.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:32:51,897 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21774.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:33:16,994 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.323e+02 2.105e+02 2.662e+02 3.428e+02 5.862e+02, threshold=5.324e+02, percent-clipped=2.0 2022-11-15 17:33:23,625 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-3.pt 2022-11-15 17:34:13,371 INFO [train.py:876] (0/4) Epoch 4, batch 0, loss[loss=0.3014, simple_loss=0.2611, pruned_loss=0.1709, over 5702.00 frames. ], tot_loss[loss=0.3014, simple_loss=0.2611, pruned_loss=0.1709, over 5702.00 frames. ], batch size: 36, lr: 2.06e-02, grad_scale: 16.0 2022-11-15 17:34:13,372 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 17:34:21,994 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5406, 2.3192, 2.2207, 2.3568, 2.6154, 2.3511, 2.7329, 2.6299], device='cuda:0'), covar=tensor([0.1054, 0.1217, 0.0912, 0.1270, 0.0728, 0.0720, 0.1306, 0.0580], device='cuda:0'), in_proj_covar=tensor([0.0055, 0.0077, 0.0063, 0.0073, 0.0057, 0.0048, 0.0091, 0.0059], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 17:34:27,384 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3579, 0.9006, 1.0157, 1.1873, 1.1341, 1.2318, 0.6383, 1.0358], device='cuda:0'), covar=tensor([0.0173, 0.0101, 0.1499, 0.1106, 0.0163, 0.0232, 0.0452, 0.0416], device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0010, 0.0008, 0.0008, 0.0009, 0.0009, 0.0009, 0.0009], device='cuda:0'), out_proj_covar=tensor([2.9260e-05, 3.0943e-05, 2.6788e-05, 2.9744e-05, 2.9754e-05, 2.9359e-05, 2.9969e-05, 2.8538e-05], device='cuda:0') 2022-11-15 17:34:29,964 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4087, 2.0835, 3.0047, 2.7314, 3.2094, 1.9014, 2.4835, 3.2208], device='cuda:0'), covar=tensor([0.0099, 0.0502, 0.0171, 0.0383, 0.0130, 0.0480, 0.0354, 0.0214], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0174, 0.0141, 0.0191, 0.0131, 0.0165, 0.0194, 0.0160], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 17:34:30,839 INFO [train.py:908] (0/4) Epoch 4, validation: loss=0.1863, simple_loss=0.204, pruned_loss=0.08431, over 1530663.00 frames. 2022-11-15 17:34:30,839 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 17:34:34,344 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=21822.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:34:35,823 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21824.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:34:50,123 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 17:34:55,150 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21850.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:35:34,971 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.12 vs. limit=2.0 2022-11-15 17:35:38,006 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 2.101e+02 2.586e+02 3.383e+02 7.997e+02, threshold=5.171e+02, percent-clipped=3.0 2022-11-15 17:35:38,903 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=21911.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:35:42,903 INFO [train.py:876] (0/4) Epoch 4, batch 100, loss[loss=0.1996, simple_loss=0.2046, pruned_loss=0.09728, over 5601.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2078, pruned_loss=0.1147, over 434327.02 frames. ], batch size: 23, lr: 2.05e-02, grad_scale: 16.0 2022-11-15 17:36:07,080 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21951.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:36:17,057 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0084, 2.6208, 2.5270, 2.3665, 1.4966, 2.5576, 1.7917, 1.9862], device='cuda:0'), covar=tensor([0.0139, 0.0029, 0.0045, 0.0070, 0.0154, 0.0040, 0.0107, 0.0046], device='cuda:0'), in_proj_covar=tensor([0.0143, 0.0085, 0.0105, 0.0108, 0.0142, 0.0100, 0.0121, 0.0085], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:36:19,006 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21968.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:36:19,638 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=21969.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:36:20,410 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=21970.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:36:29,984 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=21984.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:36:49,993 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.351e+02 2.318e+02 2.900e+02 3.607e+02 8.310e+02, threshold=5.801e+02, percent-clipped=7.0 2022-11-15 17:36:53,596 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22016.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:36:54,234 INFO [train.py:876] (0/4) Epoch 4, batch 200, loss[loss=0.1449, simple_loss=0.1694, pruned_loss=0.06022, over 5513.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2042, pruned_loss=0.1117, over 697924.34 frames. ], batch size: 14, lr: 2.05e-02, grad_scale: 8.0 2022-11-15 17:37:04,777 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22031.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:37:05,302 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22032.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:37:32,436 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9430, 3.4654, 2.5103, 3.4003, 2.3322, 2.6729, 2.0162, 3.0806], device='cuda:0'), covar=tensor([0.1229, 0.0150, 0.0695, 0.0175, 0.0770, 0.0670, 0.1446, 0.0200], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0114, 0.0161, 0.0112, 0.0150, 0.0177, 0.0187, 0.0120], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 17:37:38,900 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7927, 1.7923, 3.4201, 2.6805, 3.4982, 2.3886, 3.1193, 3.7509], device='cuda:0'), covar=tensor([0.0085, 0.0620, 0.0157, 0.0521, 0.0135, 0.0421, 0.0358, 0.0144], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0174, 0.0139, 0.0189, 0.0129, 0.0164, 0.0191, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 17:37:40,974 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3904, 1.9115, 2.3237, 3.3131, 3.2110, 2.4093, 1.7898, 3.4401], device='cuda:0'), covar=tensor([0.0198, 0.2920, 0.2392, 0.1675, 0.0598, 0.2461, 0.2213, 0.0159], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0217, 0.0226, 0.0228, 0.0185, 0.0227, 0.0204, 0.0137], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0002], device='cuda:0') 2022-11-15 17:37:44,398 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0611, 2.2300, 2.2489, 1.0651, 0.6783, 2.0028, 1.6469, 1.6499], device='cuda:0'), covar=tensor([0.0311, 0.0279, 0.0147, 0.0821, 0.1265, 0.1211, 0.0479, 0.0315], device='cuda:0'), in_proj_covar=tensor([0.0030, 0.0030, 0.0029, 0.0033, 0.0031, 0.0026, 0.0028, 0.0031], device='cuda:0'), out_proj_covar=tensor([5.0984e-05, 4.6537e-05, 4.4242e-05, 5.7722e-05, 5.4669e-05, 4.7606e-05, 4.7001e-05, 5.0651e-05], device='cuda:0') 2022-11-15 17:37:48,813 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.15 vs. limit=5.0 2022-11-15 17:37:57,671 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2022-11-15 17:38:01,665 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 1.970e+02 2.360e+02 3.051e+02 4.623e+02, threshold=4.719e+02, percent-clipped=0.0 2022-11-15 17:38:06,229 INFO [train.py:876] (0/4) Epoch 4, batch 300, loss[loss=0.1566, simple_loss=0.1614, pruned_loss=0.07591, over 5481.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.2006, pruned_loss=0.1092, over 849386.74 frames. ], batch size: 12, lr: 2.05e-02, grad_scale: 8.0 2022-11-15 17:38:07,665 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22119.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:38:11,077 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0706, 1.7402, 2.0603, 2.7974, 2.8002, 2.0398, 1.5882, 3.1019], device='cuda:0'), covar=tensor([0.0157, 0.2428, 0.1787, 0.1027, 0.0433, 0.2126, 0.1850, 0.0154], device='cuda:0'), in_proj_covar=tensor([0.0131, 0.0215, 0.0222, 0.0225, 0.0184, 0.0227, 0.0202, 0.0135], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0002], device='cuda:0') 2022-11-15 17:38:15,916 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22130.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:38:42,827 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-15 17:38:47,801 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.44 vs. limit=5.0 2022-11-15 17:38:48,228 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22176.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:38:59,424 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22191.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:39:01,435 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2619, 3.2150, 3.4231, 1.2955, 3.0132, 3.9642, 3.6131, 3.5572], device='cuda:0'), covar=tensor([0.1190, 0.0750, 0.0359, 0.1640, 0.0150, 0.0089, 0.0120, 0.0146], device='cuda:0'), in_proj_covar=tensor([0.0182, 0.0180, 0.0131, 0.0186, 0.0121, 0.0111, 0.0116, 0.0131], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:39:09,868 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22206.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:39:13,065 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.175e+02 2.158e+02 2.699e+02 3.433e+02 6.693e+02, threshold=5.398e+02, percent-clipped=7.0 2022-11-15 17:39:15,895 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22215.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:39:17,098 INFO [train.py:876] (0/4) Epoch 4, batch 400, loss[loss=0.1621, simple_loss=0.1681, pruned_loss=0.07809, over 5500.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.1985, pruned_loss=0.1073, over 940402.99 frames. ], batch size: 11, lr: 2.04e-02, grad_scale: 8.0 2022-11-15 17:39:22,143 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2895, 3.8145, 4.1168, 3.8455, 4.2229, 3.8800, 3.8338, 4.2407], device='cuda:0'), covar=tensor([0.0324, 0.0255, 0.0396, 0.0263, 0.0371, 0.0307, 0.0251, 0.0246], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0091, 0.0076, 0.0100, 0.0096, 0.0059, 0.0082, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:39:31,526 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22237.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:39:31,546 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22237.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:39:41,367 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22251.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:39:54,908 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22269.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:39:59,930 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22276.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:40:15,756 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22298.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:40:16,318 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22299.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:40:25,359 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.316e+02 2.107e+02 2.622e+02 3.333e+02 5.947e+02, threshold=5.245e+02, percent-clipped=1.0 2022-11-15 17:40:29,487 INFO [train.py:876] (0/4) Epoch 4, batch 500, loss[loss=0.2189, simple_loss=0.212, pruned_loss=0.1129, over 5709.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.1973, pruned_loss=0.1059, over 996570.18 frames. ], batch size: 17, lr: 2.04e-02, grad_scale: 8.0 2022-11-15 17:40:29,530 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22317.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:40:35,644 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22326.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:40:59,125 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1875, 1.8532, 3.8898, 2.8754, 4.1412, 2.7923, 3.7609, 4.1954], device='cuda:0'), covar=tensor([0.0094, 0.0866, 0.0177, 0.0695, 0.0066, 0.0679, 0.0358, 0.0187], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0173, 0.0138, 0.0188, 0.0130, 0.0164, 0.0192, 0.0162], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 17:41:17,519 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22385.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:41:36,528 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.346e+02 2.045e+02 2.785e+02 3.827e+02 6.845e+02, threshold=5.570e+02, percent-clipped=5.0 2022-11-15 17:41:38,118 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7938, 3.6734, 3.2645, 1.9351, 3.2036, 4.0967, 3.4966, 3.3306], device='cuda:0'), covar=tensor([0.1078, 0.0626, 0.0464, 0.1198, 0.0150, 0.0095, 0.0176, 0.0242], device='cuda:0'), in_proj_covar=tensor([0.0189, 0.0184, 0.0137, 0.0191, 0.0126, 0.0116, 0.0118, 0.0136], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:41:40,692 INFO [train.py:876] (0/4) Epoch 4, batch 600, loss[loss=0.1486, simple_loss=0.1613, pruned_loss=0.06793, over 5548.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.1969, pruned_loss=0.1054, over 1032565.22 frames. ], batch size: 21, lr: 2.03e-02, grad_scale: 8.0 2022-11-15 17:41:42,576 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22419.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:42:01,009 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1464, 1.3406, 1.4039, 0.8223, 1.1033, 0.8427, 0.5963, 1.1017], device='cuda:0'), covar=tensor([0.0015, 0.0020, 0.0018, 0.0015, 0.0025, 0.0036, 0.0032, 0.0031], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0012, 0.0012, 0.0013, 0.0013, 0.0014, 0.0015, 0.0013], device='cuda:0'), out_proj_covar=tensor([1.4763e-05, 1.5337e-05, 1.5279e-05, 1.5653e-05, 1.3551e-05, 1.6796e-05, 1.8179e-05, 1.7057e-05], device='cuda:0') 2022-11-15 17:42:01,045 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22446.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:42:16,335 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22467.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:42:29,717 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22486.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:42:30,475 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5786, 2.7603, 2.7718, 1.1476, 2.5670, 3.3162, 3.2062, 2.8067], device='cuda:0'), covar=tensor([0.2105, 0.1085, 0.0759, 0.2078, 0.0248, 0.0229, 0.0214, 0.0295], device='cuda:0'), in_proj_covar=tensor([0.0189, 0.0181, 0.0137, 0.0187, 0.0125, 0.0113, 0.0116, 0.0134], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:42:36,179 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2022-11-15 17:42:43,791 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22506.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:42:47,540 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.340e+02 1.953e+02 2.259e+02 3.253e+02 4.972e+02, threshold=4.518e+02, percent-clipped=0.0 2022-11-15 17:42:51,951 INFO [train.py:876] (0/4) Epoch 4, batch 700, loss[loss=0.1458, simple_loss=0.152, pruned_loss=0.06977, over 5179.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.1994, pruned_loss=0.1069, over 1055008.25 frames. ], batch size: 8, lr: 2.03e-02, grad_scale: 8.0 2022-11-15 17:42:52,123 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9114, 0.5133, 0.6676, 0.8369, 0.6083, 0.7208, 0.5457, 0.6603], device='cuda:0'), covar=tensor([0.0201, 0.0298, 0.0212, 0.0448, 0.0227, 0.0181, 0.0399, 0.0259], device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0011, 0.0008, 0.0009, 0.0009, 0.0009, 0.0009, 0.0008], device='cuda:0'), out_proj_covar=tensor([2.9718e-05, 3.2572e-05, 2.8054e-05, 3.1025e-05, 2.9418e-05, 2.9591e-05, 3.0215e-05, 2.8077e-05], device='cuda:0') 2022-11-15 17:43:02,571 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22532.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:43:18,007 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22554.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:43:19,538 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1988, 3.0545, 2.8018, 2.7610, 1.9920, 2.9418, 1.9953, 2.4270], device='cuda:0'), covar=tensor([0.0136, 0.0042, 0.0050, 0.0092, 0.0160, 0.0043, 0.0127, 0.0044], device='cuda:0'), in_proj_covar=tensor([0.0145, 0.0088, 0.0107, 0.0110, 0.0143, 0.0103, 0.0122, 0.0083], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:43:21,140 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.14 vs. limit=2.0 2022-11-15 17:43:25,049 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7226, 1.8818, 1.8157, 1.6313, 1.8599, 1.8890, 0.8591, 1.8647], device='cuda:0'), covar=tensor([0.0432, 0.0229, 0.0237, 0.0265, 0.0336, 0.0189, 0.1846, 0.0336], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0067, 0.0067, 0.0058, 0.0084, 0.0064, 0.0125, 0.0087], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:43:30,238 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22571.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:43:45,948 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22593.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:43:58,688 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 2.071e+02 2.662e+02 3.271e+02 7.195e+02, threshold=5.325e+02, percent-clipped=7.0 2022-11-15 17:44:03,253 INFO [train.py:876] (0/4) Epoch 4, batch 800, loss[loss=0.2598, simple_loss=0.214, pruned_loss=0.1528, over 4623.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.1959, pruned_loss=0.1044, over 1066950.91 frames. ], batch size: 135, lr: 2.02e-02, grad_scale: 8.0 2022-11-15 17:44:04,555 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.72 vs. limit=2.0 2022-11-15 17:44:09,810 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22626.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:44:19,320 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22639.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:44:43,330 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22674.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:44:57,846 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 17:45:01,832 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22700.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 17:45:07,953 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 17:45:09,608 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.215e+02 2.138e+02 2.671e+02 3.427e+02 5.214e+02, threshold=5.342e+02, percent-clipped=0.0 2022-11-15 17:45:13,840 INFO [train.py:876] (0/4) Epoch 4, batch 900, loss[loss=0.2465, simple_loss=0.2289, pruned_loss=0.132, over 5547.00 frames. ], tot_loss[loss=0.2057, simple_loss=0.1981, pruned_loss=0.1067, over 1076625.62 frames. ], batch size: 46, lr: 2.02e-02, grad_scale: 8.0 2022-11-15 17:45:21,800 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.43 vs. limit=5.0 2022-11-15 17:45:27,365 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1269, 0.7668, 0.7860, 0.9629, 1.2953, 1.1305, 1.0522, 1.1574], device='cuda:0'), covar=tensor([0.1110, 0.0653, 0.1719, 0.0950, 0.0972, 0.0694, 0.0698, 0.0621], device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0010, 0.0009, 0.0008, 0.0009, 0.0008, 0.0009, 0.0007], device='cuda:0'), out_proj_covar=tensor([2.9957e-05, 3.1839e-05, 2.9349e-05, 3.0048e-05, 3.0311e-05, 2.9064e-05, 2.9612e-05, 2.6462e-05], device='cuda:0') 2022-11-15 17:45:29,466 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-15 17:45:31,219 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22741.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:45:33,618 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.24 vs. limit=2.0 2022-11-15 17:45:48,557 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-15 17:45:58,889 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2022-11-15 17:46:03,076 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22786.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:46:08,878 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1462, 4.0985, 3.6816, 4.0627, 4.2937, 3.8520, 1.4576, 4.3574], device='cuda:0'), covar=tensor([0.0333, 0.0399, 0.0400, 0.0240, 0.0335, 0.0350, 0.3389, 0.0251], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0069, 0.0068, 0.0060, 0.0085, 0.0065, 0.0127, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:46:08,940 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22794.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:46:20,975 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 2.200e+02 2.748e+02 3.509e+02 2.017e+03, threshold=5.496e+02, percent-clipped=6.0 2022-11-15 17:46:25,457 INFO [train.py:876] (0/4) Epoch 4, batch 1000, loss[loss=0.2628, simple_loss=0.2193, pruned_loss=0.1532, over 4759.00 frames. ], tot_loss[loss=0.207, simple_loss=0.1994, pruned_loss=0.1073, over 1081699.84 frames. ], batch size: 136, lr: 2.02e-02, grad_scale: 8.0 2022-11-15 17:46:31,093 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=22825.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:46:35,857 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22832.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:46:37,244 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22834.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:46:52,929 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22855.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:47:04,122 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22871.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:47:10,227 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22880.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:47:14,477 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=22886.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:47:19,579 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=22893.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:47:32,215 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.242e+02 1.971e+02 2.339e+02 2.906e+02 5.624e+02, threshold=4.678e+02, percent-clipped=1.0 2022-11-15 17:47:36,401 INFO [train.py:876] (0/4) Epoch 4, batch 1100, loss[loss=0.1707, simple_loss=0.1768, pruned_loss=0.08232, over 5288.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.1994, pruned_loss=0.1069, over 1080315.24 frames. ], batch size: 9, lr: 2.01e-02, grad_scale: 8.0 2022-11-15 17:47:37,873 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22919.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:47:53,246 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=22941.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:48:12,532 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.78 vs. limit=2.0 2022-11-15 17:48:13,078 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0 2022-11-15 17:48:31,531 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=22995.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 17:48:40,541 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5627, 5.9872, 4.9936, 5.5196, 4.1415, 3.7988, 3.7812, 4.9144], device='cuda:0'), covar=tensor([0.0965, 0.0039, 0.0304, 0.0181, 0.0288, 0.0560, 0.1095, 0.0111], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0114, 0.0159, 0.0110, 0.0151, 0.0178, 0.0184, 0.0119], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 17:48:43,818 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.434e+02 2.126e+02 2.633e+02 3.216e+02 6.006e+02, threshold=5.267e+02, percent-clipped=5.0 2022-11-15 17:48:47,958 INFO [train.py:876] (0/4) Epoch 4, batch 1200, loss[loss=0.2074, simple_loss=0.2114, pruned_loss=0.1017, over 5793.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.1997, pruned_loss=0.1076, over 1078734.76 frames. ], batch size: 21, lr: 2.01e-02, grad_scale: 8.0 2022-11-15 17:49:04,970 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23041.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:49:27,658 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3229, 4.2292, 4.1623, 4.5059, 4.1108, 3.3618, 4.8780, 4.1304], device='cuda:0'), covar=tensor([0.0441, 0.0783, 0.0414, 0.0668, 0.0415, 0.0351, 0.0585, 0.0415], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0083, 0.0068, 0.0079, 0.0062, 0.0050, 0.0098, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:49:38,852 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23089.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:49:44,013 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.62 vs. limit=2.0 2022-11-15 17:49:52,046 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=23108.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:49:54,246 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.201e+02 2.049e+02 2.605e+02 3.403e+02 6.305e+02, threshold=5.209e+02, percent-clipped=2.0 2022-11-15 17:49:58,726 INFO [train.py:876] (0/4) Epoch 4, batch 1300, loss[loss=0.2135, simple_loss=0.2051, pruned_loss=0.1109, over 5500.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.198, pruned_loss=0.1057, over 1072843.67 frames. ], batch size: 49, lr: 2.00e-02, grad_scale: 8.0 2022-11-15 17:50:21,970 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23150.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:50:35,772 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=23169.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:50:42,905 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.00 vs. limit=2.0 2022-11-15 17:50:44,511 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23181.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:50:52,518 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-15 17:51:05,598 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.170e+01 2.060e+02 2.584e+02 3.363e+02 5.860e+02, threshold=5.168e+02, percent-clipped=2.0 2022-11-15 17:51:09,744 INFO [train.py:876] (0/4) Epoch 4, batch 1400, loss[loss=0.2275, simple_loss=0.211, pruned_loss=0.122, over 5676.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.1965, pruned_loss=0.1047, over 1084338.91 frames. ], batch size: 19, lr: 2.00e-02, grad_scale: 8.0 2022-11-15 17:51:09,928 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=23217.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:51:19,202 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=23229.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:51:25,631 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1594, 3.8687, 4.0638, 4.0909, 3.6450, 3.3872, 4.5236, 4.0372], device='cuda:0'), covar=tensor([0.0471, 0.1382, 0.0419, 0.0904, 0.0699, 0.0312, 0.0914, 0.0458], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0082, 0.0068, 0.0080, 0.0062, 0.0050, 0.0099, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:51:53,340 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=23278.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:52:02,186 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.51 vs. limit=5.0 2022-11-15 17:52:02,654 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=23290.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:52:06,016 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23295.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:52:17,016 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.385e+02 2.039e+02 2.495e+02 2.958e+02 6.410e+02, threshold=4.991e+02, percent-clipped=2.0 2022-11-15 17:52:21,018 INFO [train.py:876] (0/4) Epoch 4, batch 1500, loss[loss=0.2251, simple_loss=0.2147, pruned_loss=0.1178, over 5762.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.1981, pruned_loss=0.1058, over 1085690.08 frames. ], batch size: 21, lr: 1.99e-02, grad_scale: 8.0 2022-11-15 17:52:22,608 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7752, 3.5619, 3.6636, 3.4323, 3.7914, 3.5595, 1.3825, 3.8819], device='cuda:0'), covar=tensor([0.0303, 0.0387, 0.0258, 0.0284, 0.0345, 0.0414, 0.2902, 0.0246], device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0070, 0.0069, 0.0062, 0.0084, 0.0066, 0.0128, 0.0090], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:52:39,664 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23343.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:52:39,695 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8874, 3.4757, 3.7560, 3.4323, 3.9315, 3.1863, 3.4891, 3.9021], device='cuda:0'), covar=tensor([0.0251, 0.0249, 0.0323, 0.0285, 0.0279, 0.0368, 0.0248, 0.0247], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0092, 0.0075, 0.0099, 0.0096, 0.0059, 0.0082, 0.0089], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:53:27,573 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.369e+02 2.167e+02 2.612e+02 3.180e+02 6.082e+02, threshold=5.224e+02, percent-clipped=1.0 2022-11-15 17:53:32,219 INFO [train.py:876] (0/4) Epoch 4, batch 1600, loss[loss=0.1876, simple_loss=0.1977, pruned_loss=0.08876, over 5526.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.1975, pruned_loss=0.1051, over 1084524.92 frames. ], batch size: 17, lr: 1.99e-02, grad_scale: 8.0 2022-11-15 17:53:42,841 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2326, 1.7601, 1.6743, 1.9938, 1.2026, 1.2500, 1.3615, 1.6461], device='cuda:0'), covar=tensor([0.0295, 0.0572, 0.1118, 0.0373, 0.1200, 0.1248, 0.0824, 0.0599], device='cuda:0'), in_proj_covar=tensor([0.0041, 0.0052, 0.0060, 0.0045, 0.0062, 0.0049, 0.0059, 0.0043], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 17:53:56,060 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23450.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:54:05,585 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23464.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:54:05,666 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7900, 2.3808, 1.7828, 2.6600, 1.5876, 1.9267, 2.0718, 2.4221], device='cuda:0'), covar=tensor([0.0325, 0.0901, 0.1569, 0.0469, 0.1331, 0.0882, 0.1086, 0.3453], device='cuda:0'), in_proj_covar=tensor([0.0041, 0.0053, 0.0060, 0.0046, 0.0062, 0.0049, 0.0059, 0.0043], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 17:54:05,871 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.24 vs. limit=2.0 2022-11-15 17:54:17,869 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23481.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:54:22,751 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=23488.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:54:29,479 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23498.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:54:39,499 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.935e+02 2.419e+02 3.182e+02 7.599e+02, threshold=4.837e+02, percent-clipped=7.0 2022-11-15 17:54:43,668 INFO [train.py:876] (0/4) Epoch 4, batch 1700, loss[loss=0.2745, simple_loss=0.2462, pruned_loss=0.1513, over 5285.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.1963, pruned_loss=0.1045, over 1086212.28 frames. ], batch size: 79, lr: 1.99e-02, grad_scale: 8.0 2022-11-15 17:54:51,900 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23529.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:55:05,125 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.11 vs. limit=2.0 2022-11-15 17:55:06,082 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2476, 4.3462, 2.8367, 4.0623, 3.2936, 2.8216, 2.1339, 3.5712], device='cuda:0'), covar=tensor([0.1337, 0.0085, 0.0829, 0.0207, 0.0411, 0.0751, 0.1766, 0.0162], device='cuda:0'), in_proj_covar=tensor([0.0181, 0.0115, 0.0165, 0.0115, 0.0154, 0.0177, 0.0190, 0.0120], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 17:55:06,134 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=23549.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 17:55:21,952 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2022-11-15 17:55:23,675 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23573.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:55:28,552 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7212, 1.6196, 0.9502, 0.9392, 0.5274, 1.4324, 1.0192, 1.0763], device='cuda:0'), covar=tensor([0.0514, 0.0130, 0.0914, 0.0507, 0.0735, 0.0220, 0.0848, 0.0530], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0032, 0.0033, 0.0036, 0.0034, 0.0028, 0.0032, 0.0034], device='cuda:0'), out_proj_covar=tensor([5.8068e-05, 4.8332e-05, 5.1301e-05, 6.6058e-05, 6.0546e-05, 5.1657e-05, 5.4326e-05, 5.5834e-05], device='cuda:0') 2022-11-15 17:55:32,238 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23585.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:55:51,047 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.312e+02 2.088e+02 2.577e+02 3.329e+02 6.388e+02, threshold=5.153e+02, percent-clipped=6.0 2022-11-15 17:55:55,440 INFO [train.py:876] (0/4) Epoch 4, batch 1800, loss[loss=0.1395, simple_loss=0.1529, pruned_loss=0.06302, over 5494.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.196, pruned_loss=0.1045, over 1082599.65 frames. ], batch size: 12, lr: 1.98e-02, grad_scale: 8.0 2022-11-15 17:56:12,930 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2138, 1.9996, 1.5306, 2.1295, 1.4462, 1.4238, 1.7984, 2.2780], device='cuda:0'), covar=tensor([0.0424, 0.0644, 0.1554, 0.0461, 0.1341, 0.0978, 0.0874, 0.1364], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0052, 0.0062, 0.0046, 0.0064, 0.0050, 0.0061, 0.0042], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 17:56:26,619 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7373, 2.7288, 1.5706, 2.7419, 1.7390, 2.0354, 2.4753, 2.6611], device='cuda:0'), covar=tensor([0.0263, 0.0613, 0.1680, 0.0521, 0.1298, 0.0861, 0.0771, 0.1342], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0052, 0.0062, 0.0045, 0.0063, 0.0050, 0.0060, 0.0042], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 17:56:41,562 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9440, 2.6412, 2.3558, 1.1810, 2.9078, 2.8407, 2.6416, 3.2694], device='cuda:0'), covar=tensor([0.1245, 0.0855, 0.0712, 0.1632, 0.0165, 0.0239, 0.0213, 0.0204], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0175, 0.0128, 0.0185, 0.0124, 0.0121, 0.0112, 0.0133], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:56:46,078 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8838, 2.6958, 2.6684, 2.8645, 2.6525, 2.3904, 3.0777, 2.7181], device='cuda:0'), covar=tensor([0.0443, 0.0834, 0.0486, 0.0669, 0.0583, 0.0415, 0.0759, 0.0603], device='cuda:0'), in_proj_covar=tensor([0.0059, 0.0079, 0.0065, 0.0077, 0.0061, 0.0049, 0.0096, 0.0064], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:57:01,353 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.273e+02 2.086e+02 2.661e+02 3.167e+02 7.420e+02, threshold=5.321e+02, percent-clipped=1.0 2022-11-15 17:57:05,432 INFO [train.py:876] (0/4) Epoch 4, batch 1900, loss[loss=0.226, simple_loss=0.2135, pruned_loss=0.1193, over 5113.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.1958, pruned_loss=0.1047, over 1077177.67 frames. ], batch size: 91, lr: 1.98e-02, grad_scale: 8.0 2022-11-15 17:57:35,653 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2097, 4.7974, 4.0754, 4.8648, 4.8428, 4.0068, 4.1961, 3.8651], device='cuda:0'), covar=tensor([0.0226, 0.0327, 0.1370, 0.0251, 0.0243, 0.0447, 0.0590, 0.0662], device='cuda:0'), in_proj_covar=tensor([0.0101, 0.0115, 0.0173, 0.0108, 0.0139, 0.0121, 0.0121, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:57:35,725 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3721, 1.6327, 2.0731, 2.4909, 1.7197, 1.6645, 1.4672, 1.4911], device='cuda:0'), covar=tensor([0.0019, 0.0024, 0.0026, 0.0013, 0.0038, 0.0049, 0.0020, 0.0026], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0014, 0.0012, 0.0014, 0.0013, 0.0014, 0.0016, 0.0014], device='cuda:0'), out_proj_covar=tensor([1.5103e-05, 1.6560e-05, 1.4622e-05, 1.6551e-05, 1.3546e-05, 1.6796e-05, 1.8812e-05, 1.8673e-05], device='cuda:0') 2022-11-15 17:57:39,177 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23764.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:58:02,519 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 17:58:12,700 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 2.073e+02 2.797e+02 3.748e+02 9.524e+02, threshold=5.593e+02, percent-clipped=9.0 2022-11-15 17:58:13,469 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23812.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:58:16,811 INFO [train.py:876] (0/4) Epoch 4, batch 2000, loss[loss=0.2459, simple_loss=0.2272, pruned_loss=0.1322, over 5352.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.1939, pruned_loss=0.1031, over 1079986.63 frames. ], batch size: 70, lr: 1.97e-02, grad_scale: 8.0 2022-11-15 17:58:36,068 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5667, 1.2177, 1.9018, 1.7259, 1.6262, 1.0814, 1.2715, 1.4368], device='cuda:0'), covar=tensor([0.0013, 0.0088, 0.0027, 0.0015, 0.0016, 0.0047, 0.0019, 0.0019], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0014, 0.0012, 0.0015, 0.0014, 0.0015, 0.0016, 0.0015], device='cuda:0'), out_proj_covar=tensor([1.5300e-05, 1.6982e-05, 1.4803e-05, 1.6955e-05, 1.3532e-05, 1.7465e-05, 1.8826e-05, 1.9150e-05], device='cuda:0') 2022-11-15 17:58:36,714 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=23844.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 17:58:56,894 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23873.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:59:03,013 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.1676, 4.4957, 4.8592, 4.5159, 5.1911, 5.0380, 4.4280, 5.1403], device='cuda:0'), covar=tensor([0.0345, 0.0254, 0.0484, 0.0296, 0.0315, 0.0101, 0.0244, 0.0231], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0096, 0.0077, 0.0102, 0.0101, 0.0060, 0.0084, 0.0090], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:59:05,511 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=23885.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:59:06,109 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 17:59:12,455 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0184, 4.5873, 4.0276, 4.5148, 4.5126, 3.7208, 4.0313, 3.7408], device='cuda:0'), covar=tensor([0.0305, 0.0273, 0.0935, 0.0376, 0.0255, 0.0376, 0.0356, 0.0445], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0115, 0.0179, 0.0110, 0.0145, 0.0125, 0.0123, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 17:59:23,777 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 2.154e+02 2.632e+02 3.272e+02 6.334e+02, threshold=5.263e+02, percent-clipped=2.0 2022-11-15 17:59:27,899 INFO [train.py:876] (0/4) Epoch 4, batch 2100, loss[loss=0.2301, simple_loss=0.2037, pruned_loss=0.1282, over 5079.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.1959, pruned_loss=0.1045, over 1083094.92 frames. ], batch size: 91, lr: 1.97e-02, grad_scale: 8.0 2022-11-15 17:59:31,150 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23921.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:59:39,163 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=23933.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 17:59:51,256 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.88 vs. limit=5.0 2022-11-15 18:00:05,942 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2022-11-15 18:00:34,706 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.500e+02 2.023e+02 2.446e+02 2.916e+02 5.131e+02, threshold=4.891e+02, percent-clipped=0.0 2022-11-15 18:00:38,753 INFO [train.py:876] (0/4) Epoch 4, batch 2200, loss[loss=0.1967, simple_loss=0.1936, pruned_loss=0.09992, over 5571.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.1961, pruned_loss=0.1042, over 1084548.63 frames. ], batch size: 16, lr: 1.97e-02, grad_scale: 16.0 2022-11-15 18:00:39,887 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-15 18:00:53,028 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24037.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:01:08,600 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1209, 1.6988, 1.7154, 1.4097, 1.4075, 1.2586, 0.7014, 1.5803], device='cuda:0'), covar=tensor([0.0036, 0.0024, 0.0018, 0.0027, 0.0028, 0.0020, 0.0025, 0.0021], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0022, 0.0021, 0.0021, 0.0022, 0.0020, 0.0022, 0.0020], device='cuda:0'), out_proj_covar=tensor([2.7225e-05, 2.6420e-05, 2.0756e-05, 2.1629e-05, 2.2107e-05, 1.8850e-05, 3.1622e-05, 2.0409e-05], device='cuda:0') 2022-11-15 18:01:32,870 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3898, 1.8063, 1.2908, 2.7541, 1.5266, 2.1839, 2.5597, 3.0080], device='cuda:0'), covar=tensor([0.0323, 0.0904, 0.1679, 0.0403, 0.1168, 0.0795, 0.0770, 0.0299], device='cuda:0'), in_proj_covar=tensor([0.0042, 0.0052, 0.0062, 0.0045, 0.0062, 0.0050, 0.0059, 0.0044], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 18:01:36,333 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24098.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:01:45,263 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 2.100e+02 2.713e+02 3.440e+02 7.345e+02, threshold=5.426e+02, percent-clipped=3.0 2022-11-15 18:01:50,146 INFO [train.py:876] (0/4) Epoch 4, batch 2300, loss[loss=0.2527, simple_loss=0.2136, pruned_loss=0.1459, over 3131.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.1944, pruned_loss=0.1027, over 1082004.85 frames. ], batch size: 284, lr: 1.96e-02, grad_scale: 16.0 2022-11-15 18:02:08,839 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=24144.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 18:02:33,327 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2022-11-15 18:02:34,409 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5828, 2.3272, 1.5023, 3.0186, 1.8302, 2.4150, 2.8439, 2.7571], device='cuda:0'), covar=tensor([0.0351, 0.0706, 0.2505, 0.0640, 0.1643, 0.0700, 0.1203, 0.2540], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0053, 0.0065, 0.0045, 0.0064, 0.0052, 0.0061, 0.0046], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 18:02:42,464 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=24192.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:02:45,416 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4312, 1.7548, 1.9404, 1.3550, 1.2617, 1.7064, 1.8446, 1.5473], device='cuda:0'), covar=tensor([0.0298, 0.0210, 0.0242, 0.0829, 0.1192, 0.5337, 0.0410, 0.0365], device='cuda:0'), in_proj_covar=tensor([0.0036, 0.0034, 0.0034, 0.0038, 0.0034, 0.0029, 0.0032, 0.0037], device='cuda:0'), out_proj_covar=tensor([6.0789e-05, 5.2378e-05, 5.2407e-05, 6.9792e-05, 6.0180e-05, 5.3511e-05, 5.6350e-05, 6.0325e-05], device='cuda:0') 2022-11-15 18:02:55,720 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3036, 2.1349, 1.6831, 2.5499, 1.6738, 1.9955, 2.4424, 2.6764], device='cuda:0'), covar=tensor([0.0434, 0.0820, 0.1771, 0.1251, 0.1344, 0.0846, 0.0984, 0.1142], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0052, 0.0064, 0.0044, 0.0063, 0.0052, 0.0060, 0.0045], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 18:02:56,252 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.235e+02 1.961e+02 2.462e+02 3.190e+02 6.497e+02, threshold=4.924e+02, percent-clipped=3.0 2022-11-15 18:03:00,410 INFO [train.py:876] (0/4) Epoch 4, batch 2400, loss[loss=0.1682, simple_loss=0.1647, pruned_loss=0.08588, over 5354.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.1934, pruned_loss=0.1012, over 1087260.76 frames. ], batch size: 9, lr: 1.96e-02, grad_scale: 16.0 2022-11-15 18:03:27,252 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2022-11-15 18:03:27,770 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.86 vs. limit=5.0 2022-11-15 18:03:49,356 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 18:04:01,096 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7055, 4.1409, 3.2541, 1.9250, 4.0743, 1.5828, 3.8956, 2.2668], device='cuda:0'), covar=tensor([0.1039, 0.0130, 0.0439, 0.1868, 0.0135, 0.1755, 0.0150, 0.1522], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0093, 0.0095, 0.0118, 0.0096, 0.0131, 0.0081, 0.0122], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0002, 0.0004], device='cuda:0') 2022-11-15 18:04:07,105 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 2.091e+02 2.435e+02 3.216e+02 5.165e+02, threshold=4.869e+02, percent-clipped=2.0 2022-11-15 18:04:11,698 INFO [train.py:876] (0/4) Epoch 4, batch 2500, loss[loss=0.1505, simple_loss=0.1572, pruned_loss=0.07193, over 5452.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.1954, pruned_loss=0.1032, over 1080810.81 frames. ], batch size: 10, lr: 1.96e-02, grad_scale: 16.0 2022-11-15 18:04:14,120 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 18:04:15,331 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1706, 1.0083, 1.8520, 1.3075, 1.6057, 1.0391, 1.5298, 1.0807], device='cuda:0'), covar=tensor([0.0019, 0.0050, 0.0021, 0.0024, 0.0019, 0.0061, 0.0020, 0.0040], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0014, 0.0011, 0.0014, 0.0013, 0.0014, 0.0015, 0.0014], device='cuda:0'), out_proj_covar=tensor([1.4491e-05, 1.6058e-05, 1.3025e-05, 1.5915e-05, 1.2958e-05, 1.6183e-05, 1.7105e-05, 1.7636e-05], device='cuda:0') 2022-11-15 18:04:15,495 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.23 vs. limit=5.0 2022-11-15 18:05:05,221 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=24393.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:05:18,633 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.341e+02 2.064e+02 2.482e+02 3.355e+02 5.467e+02, threshold=4.964e+02, percent-clipped=1.0 2022-11-15 18:05:22,807 INFO [train.py:876] (0/4) Epoch 4, batch 2600, loss[loss=0.1334, simple_loss=0.1421, pruned_loss=0.06232, over 5538.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.1945, pruned_loss=0.1026, over 1079173.56 frames. ], batch size: 14, lr: 1.95e-02, grad_scale: 16.0 2022-11-15 18:05:36,403 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.06 vs. limit=2.0 2022-11-15 18:06:28,721 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2022-11-15 18:06:29,498 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.874e+02 2.327e+02 3.027e+02 6.734e+02, threshold=4.654e+02, percent-clipped=5.0 2022-11-15 18:06:33,930 INFO [train.py:876] (0/4) Epoch 4, batch 2700, loss[loss=0.2883, simple_loss=0.236, pruned_loss=0.1703, over 2961.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.1947, pruned_loss=0.1031, over 1080987.48 frames. ], batch size: 284, lr: 1.95e-02, grad_scale: 16.0 2022-11-15 18:06:34,842 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7385, 2.6701, 2.7752, 1.2001, 3.2894, 2.9987, 3.0229, 3.1303], device='cuda:0'), covar=tensor([0.1726, 0.1265, 0.0566, 0.2189, 0.0167, 0.0235, 0.0180, 0.0264], device='cuda:0'), in_proj_covar=tensor([0.0187, 0.0181, 0.0133, 0.0188, 0.0127, 0.0123, 0.0117, 0.0136], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:06:52,717 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2022-11-15 18:06:55,310 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24547.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:07:11,169 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24569.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:07:18,281 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2863, 1.2436, 1.9531, 1.1598, 0.5241, 2.0351, 1.4705, 1.7697], device='cuda:0'), covar=tensor([0.0471, 0.0584, 0.0199, 0.1045, 0.1671, 0.0241, 0.0393, 0.0518], device='cuda:0'), in_proj_covar=tensor([0.0036, 0.0033, 0.0032, 0.0037, 0.0034, 0.0027, 0.0029, 0.0035], device='cuda:0'), out_proj_covar=tensor([6.0828e-05, 5.2339e-05, 4.9158e-05, 6.8139e-05, 6.0429e-05, 5.1071e-05, 5.1789e-05, 5.8331e-05], device='cuda:0') 2022-11-15 18:07:38,833 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24608.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:07:40,661 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.287e+02 2.178e+02 2.525e+02 3.048e+02 6.598e+02, threshold=5.050e+02, percent-clipped=4.0 2022-11-15 18:07:45,018 INFO [train.py:876] (0/4) Epoch 4, batch 2800, loss[loss=0.2141, simple_loss=0.2099, pruned_loss=0.1091, over 5543.00 frames. ], tot_loss[loss=0.198, simple_loss=0.1937, pruned_loss=0.1011, over 1080835.15 frames. ], batch size: 40, lr: 1.94e-02, grad_scale: 16.0 2022-11-15 18:07:46,762 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 18:07:54,325 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24630.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:08:04,124 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7423, 0.9626, 1.2023, 0.6775, 1.2536, 1.2360, 0.8926, 1.0776], device='cuda:0'), covar=tensor([0.0035, 0.0026, 0.0020, 0.0029, 0.0023, 0.0016, 0.0028, 0.0047], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0022, 0.0020, 0.0021, 0.0022, 0.0019, 0.0021, 0.0019], device='cuda:0'), out_proj_covar=tensor([2.7191e-05, 2.6218e-05, 2.0139e-05, 2.0734e-05, 2.2826e-05, 1.7958e-05, 2.9261e-05, 2.0156e-05], device='cuda:0') 2022-11-15 18:08:38,945 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=24693.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:08:42,507 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7680, 1.1218, 1.0852, 0.9895, 1.5167, 1.6065, 0.6997, 1.2891], device='cuda:0'), covar=tensor([0.0033, 0.0026, 0.0018, 0.0020, 0.0020, 0.0011, 0.0029, 0.0018], device='cuda:0'), in_proj_covar=tensor([0.0024, 0.0022, 0.0021, 0.0021, 0.0023, 0.0020, 0.0021, 0.0020], device='cuda:0'), out_proj_covar=tensor([2.8000e-05, 2.6627e-05, 2.0507e-05, 2.1091e-05, 2.3050e-05, 1.7937e-05, 3.0051e-05, 2.0202e-05], device='cuda:0') 2022-11-15 18:08:51,551 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.858e+02 2.475e+02 3.106e+02 5.971e+02, threshold=4.951e+02, percent-clipped=4.0 2022-11-15 18:08:55,665 INFO [train.py:876] (0/4) Epoch 4, batch 2900, loss[loss=0.1876, simple_loss=0.1932, pruned_loss=0.091, over 5580.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.1941, pruned_loss=0.1013, over 1085347.39 frames. ], batch size: 23, lr: 1.94e-02, grad_scale: 16.0 2022-11-15 18:08:58,496 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0798, 3.9137, 2.9142, 3.7557, 2.8306, 2.8624, 2.1192, 3.2379], device='cuda:0'), covar=tensor([0.1229, 0.0133, 0.0661, 0.0188, 0.0520, 0.0683, 0.1458, 0.0180], device='cuda:0'), in_proj_covar=tensor([0.0181, 0.0120, 0.0164, 0.0115, 0.0154, 0.0181, 0.0193, 0.0123], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:09:13,199 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=24741.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:09:23,747 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8694, 3.4220, 3.7426, 3.4490, 3.8947, 3.3515, 3.5365, 3.8309], device='cuda:0'), covar=tensor([0.0307, 0.0322, 0.0359, 0.0331, 0.0300, 0.0316, 0.0246, 0.0315], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0095, 0.0076, 0.0100, 0.0098, 0.0058, 0.0085, 0.0092], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:09:30,492 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24765.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:10:03,413 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.408e+02 2.004e+02 2.362e+02 2.844e+02 4.612e+02, threshold=4.725e+02, percent-clipped=0.0 2022-11-15 18:10:03,566 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2223, 3.0493, 2.4056, 1.5742, 2.9948, 1.1146, 3.0247, 1.5538], device='cuda:0'), covar=tensor([0.1425, 0.0313, 0.1027, 0.2400, 0.0337, 0.2560, 0.0284, 0.2303], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0094, 0.0100, 0.0122, 0.0100, 0.0135, 0.0086, 0.0128], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 18:10:07,568 INFO [train.py:876] (0/4) Epoch 4, batch 3000, loss[loss=0.1943, simple_loss=0.1996, pruned_loss=0.09446, over 5599.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.1951, pruned_loss=0.1029, over 1085037.98 frames. ], batch size: 22, lr: 1.94e-02, grad_scale: 16.0 2022-11-15 18:10:07,569 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 18:10:15,389 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6156, 2.1285, 2.7427, 3.4722, 3.6889, 2.6092, 1.7970, 3.8683], device='cuda:0'), covar=tensor([0.0248, 0.3375, 0.2789, 0.2470, 0.0625, 0.2781, 0.2768, 0.0139], device='cuda:0'), in_proj_covar=tensor([0.0147, 0.0220, 0.0231, 0.0272, 0.0201, 0.0229, 0.0207, 0.0146], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:10:21,653 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2951, 1.8334, 1.3588, 2.4134, 1.2736, 1.7449, 2.0090, 2.3683], device='cuda:0'), covar=tensor([0.0377, 0.1161, 0.1757, 0.0690, 0.1402, 0.1162, 0.1029, 0.0522], device='cuda:0'), in_proj_covar=tensor([0.0042, 0.0049, 0.0058, 0.0040, 0.0055, 0.0047, 0.0055, 0.0040], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 18:10:26,938 INFO [train.py:908] (0/4) Epoch 4, validation: loss=0.1712, simple_loss=0.1916, pruned_loss=0.07544, over 1530663.00 frames. 2022-11-15 18:10:26,938 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 18:10:33,622 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24826.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:10:48,965 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24848.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:11:16,521 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7744, 1.8998, 1.8408, 1.7687, 1.9218, 1.9110, 0.8811, 1.9338], device='cuda:0'), covar=tensor([0.0285, 0.0205, 0.0200, 0.0207, 0.0267, 0.0180, 0.1465, 0.0227], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0067, 0.0069, 0.0059, 0.0083, 0.0064, 0.0121, 0.0090], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:11:28,023 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=24903.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:11:32,532 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=24909.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:11:34,400 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.259e+02 1.907e+02 2.538e+02 3.082e+02 5.544e+02, threshold=5.077e+02, percent-clipped=4.0 2022-11-15 18:11:37,854 INFO [train.py:876] (0/4) Epoch 4, batch 3100, loss[loss=0.1936, simple_loss=0.2144, pruned_loss=0.08635, over 5519.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.1961, pruned_loss=0.1038, over 1083824.39 frames. ], batch size: 14, lr: 1.93e-02, grad_scale: 8.0 2022-11-15 18:11:43,321 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=24925.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:12:04,423 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5977, 2.3363, 2.3257, 1.0229, 2.6678, 2.6920, 2.7366, 2.9766], device='cuda:0'), covar=tensor([0.1669, 0.1274, 0.0712, 0.1969, 0.0203, 0.0255, 0.0195, 0.0248], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0188, 0.0138, 0.0192, 0.0130, 0.0127, 0.0120, 0.0141], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:12:20,177 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-15 18:12:33,569 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=24995.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:12:37,216 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-25000.pt 2022-11-15 18:12:48,704 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 2.138e+02 2.743e+02 3.438e+02 6.298e+02, threshold=5.486e+02, percent-clipped=1.0 2022-11-15 18:12:52,455 INFO [train.py:876] (0/4) Epoch 4, batch 3200, loss[loss=0.1864, simple_loss=0.1899, pruned_loss=0.09149, over 5763.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.1947, pruned_loss=0.102, over 1086462.39 frames. ], batch size: 21, lr: 1.93e-02, grad_scale: 8.0 2022-11-15 18:13:20,288 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=25056.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:14:01,374 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.332e+02 2.126e+02 2.620e+02 3.307e+02 8.462e+02, threshold=5.241e+02, percent-clipped=1.0 2022-11-15 18:14:05,068 INFO [train.py:876] (0/4) Epoch 4, batch 3300, loss[loss=0.1808, simple_loss=0.182, pruned_loss=0.08977, over 5747.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.1935, pruned_loss=0.1013, over 1086513.93 frames. ], batch size: 31, lr: 1.93e-02, grad_scale: 8.0 2022-11-15 18:14:07,996 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=25121.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:15:06,859 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25203.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:15:07,926 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=25204.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:15:13,949 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.413e+02 1.803e+02 2.408e+02 3.153e+02 7.479e+02, threshold=4.816e+02, percent-clipped=4.0 2022-11-15 18:15:15,176 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2022-11-15 18:15:17,452 INFO [train.py:876] (0/4) Epoch 4, batch 3400, loss[loss=0.1834, simple_loss=0.1818, pruned_loss=0.09251, over 5752.00 frames. ], tot_loss[loss=0.1972, simple_loss=0.193, pruned_loss=0.1007, over 1088445.26 frames. ], batch size: 20, lr: 1.92e-02, grad_scale: 8.0 2022-11-15 18:15:23,194 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25225.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:15:35,019 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9362, 5.1299, 5.2356, 5.3536, 4.5475, 4.5338, 5.6904, 4.9346], device='cuda:0'), covar=tensor([0.0360, 0.0874, 0.0213, 0.0585, 0.0504, 0.0185, 0.0495, 0.0321], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0080, 0.0067, 0.0080, 0.0061, 0.0051, 0.0098, 0.0067], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:15:41,056 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25251.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:15:44,812 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.54 vs. limit=2.0 2022-11-15 18:15:47,115 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2599, 0.9098, 1.3817, 1.4036, 1.3107, 1.6900, 1.5953, 1.2274], device='cuda:0'), covar=tensor([0.1705, 0.0450, 0.0850, 0.0538, 0.1379, 0.1457, 0.0644, 0.2257], device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0011, 0.0010, 0.0009, 0.0009, 0.0008, 0.0010, 0.0009], device='cuda:0'), out_proj_covar=tensor([3.0792e-05, 3.7877e-05, 3.4110e-05, 3.3381e-05, 3.3997e-05, 3.1291e-05, 3.3603e-05, 3.2097e-05], device='cuda:0') 2022-11-15 18:15:56,860 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25273.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:16:23,802 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.900e+02 2.223e+02 2.867e+02 4.399e+02, threshold=4.447e+02, percent-clipped=0.0 2022-11-15 18:16:27,922 INFO [train.py:876] (0/4) Epoch 4, batch 3500, loss[loss=0.1619, simple_loss=0.174, pruned_loss=0.07487, over 5738.00 frames. ], tot_loss[loss=0.1953, simple_loss=0.1919, pruned_loss=0.09934, over 1089551.05 frames. ], batch size: 20, lr: 1.92e-02, grad_scale: 8.0 2022-11-15 18:16:40,692 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0678, 1.8669, 2.1167, 2.7506, 2.8473, 2.1798, 1.5707, 2.9958], device='cuda:0'), covar=tensor([0.0241, 0.2497, 0.2075, 0.1222, 0.0676, 0.2235, 0.1963, 0.0262], device='cuda:0'), in_proj_covar=tensor([0.0146, 0.0216, 0.0224, 0.0272, 0.0202, 0.0226, 0.0201, 0.0144], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:16:51,495 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=25351.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 18:17:04,877 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2912, 4.3553, 2.9162, 4.1737, 3.2573, 3.0133, 1.9939, 3.6126], device='cuda:0'), covar=tensor([0.1196, 0.0149, 0.0785, 0.0168, 0.0408, 0.0673, 0.1763, 0.0156], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0120, 0.0166, 0.0117, 0.0154, 0.0177, 0.0192, 0.0119], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:17:19,294 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. limit=2.0 2022-11-15 18:17:31,726 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.73 vs. limit=5.0 2022-11-15 18:17:34,604 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 2.249e+02 2.867e+02 3.402e+02 1.165e+03, threshold=5.733e+02, percent-clipped=11.0 2022-11-15 18:17:38,066 INFO [train.py:876] (0/4) Epoch 4, batch 3600, loss[loss=0.247, simple_loss=0.2206, pruned_loss=0.1367, over 5439.00 frames. ], tot_loss[loss=0.1961, simple_loss=0.1923, pruned_loss=0.09994, over 1091226.16 frames. ], batch size: 70, lr: 1.91e-02, grad_scale: 8.0 2022-11-15 18:17:38,540 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.72 vs. limit=5.0 2022-11-15 18:17:41,240 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25421.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:17:46,135 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=25428.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 18:17:47,941 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.34 vs. limit=5.0 2022-11-15 18:18:15,273 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25469.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:18:30,213 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=25489.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 18:18:40,596 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25504.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:18:45,864 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 2.020e+02 2.444e+02 3.029e+02 5.693e+02, threshold=4.888e+02, percent-clipped=0.0 2022-11-15 18:18:49,296 INFO [train.py:876] (0/4) Epoch 4, batch 3700, loss[loss=0.1338, simple_loss=0.1539, pruned_loss=0.05686, over 5760.00 frames. ], tot_loss[loss=0.1954, simple_loss=0.1918, pruned_loss=0.09952, over 1094276.71 frames. ], batch size: 13, lr: 1.91e-02, grad_scale: 8.0 2022-11-15 18:18:53,694 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0738, 3.3861, 3.0594, 2.9948, 2.0444, 3.3322, 2.2132, 2.9547], device='cuda:0'), covar=tensor([0.0285, 0.0134, 0.0109, 0.0194, 0.0293, 0.0077, 0.0227, 0.0065], device='cuda:0'), in_proj_covar=tensor([0.0151, 0.0098, 0.0116, 0.0123, 0.0151, 0.0110, 0.0132, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:19:14,621 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25552.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:19:57,015 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.296e+02 2.082e+02 2.548e+02 3.444e+02 5.526e+02, threshold=5.097e+02, percent-clipped=3.0 2022-11-15 18:20:00,474 INFO [train.py:876] (0/4) Epoch 4, batch 3800, loss[loss=0.2054, simple_loss=0.1807, pruned_loss=0.115, over 4121.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.1935, pruned_loss=0.1013, over 1084169.69 frames. ], batch size: 181, lr: 1.91e-02, grad_scale: 8.0 2022-11-15 18:20:04,706 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4492, 1.8440, 2.0699, 1.4052, 0.7694, 1.8434, 1.4205, 1.6197], device='cuda:0'), covar=tensor([0.0290, 0.0251, 0.0300, 0.0861, 0.1439, 0.1450, 0.0607, 0.0457], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0032, 0.0033, 0.0037, 0.0032, 0.0028, 0.0029, 0.0037], device='cuda:0'), out_proj_covar=tensor([5.9149e-05, 4.9698e-05, 5.0768e-05, 6.9767e-05, 5.8008e-05, 5.1940e-05, 5.1293e-05, 6.2175e-05], device='cuda:0') 2022-11-15 18:20:24,502 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=25651.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 18:20:37,284 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2022-11-15 18:20:58,562 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=25699.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:21:06,850 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4381, 2.7299, 2.5801, 2.5590, 2.6491, 2.7030, 1.0721, 2.5603], device='cuda:0'), covar=tensor([0.0332, 0.0220, 0.0271, 0.0247, 0.0334, 0.0268, 0.2665, 0.0384], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0069, 0.0070, 0.0061, 0.0086, 0.0067, 0.0125, 0.0093], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:21:08,419 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.968e+01 1.976e+02 2.317e+02 2.935e+02 6.102e+02, threshold=4.634e+02, percent-clipped=2.0 2022-11-15 18:21:11,946 INFO [train.py:876] (0/4) Epoch 4, batch 3900, loss[loss=0.2224, simple_loss=0.2067, pruned_loss=0.119, over 5263.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.1941, pruned_loss=0.1025, over 1076696.51 frames. ], batch size: 79, lr: 1.90e-02, grad_scale: 8.0 2022-11-15 18:21:40,531 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9105, 4.3998, 4.7686, 4.2726, 4.9562, 4.8277, 4.2383, 4.9941], device='cuda:0'), covar=tensor([0.0331, 0.0255, 0.0405, 0.0289, 0.0357, 0.0085, 0.0253, 0.0221], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0096, 0.0080, 0.0102, 0.0100, 0.0060, 0.0087, 0.0091], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:21:51,699 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9710, 1.2915, 1.3579, 0.9636, 1.0450, 0.9335, 0.9824, 1.3015], device='cuda:0'), covar=tensor([0.0023, 0.0017, 0.0013, 0.0028, 0.0019, 0.0015, 0.0061, 0.0016], device='cuda:0'), in_proj_covar=tensor([0.0022, 0.0021, 0.0020, 0.0021, 0.0022, 0.0019, 0.0020, 0.0020], device='cuda:0'), out_proj_covar=tensor([2.5448e-05, 2.4425e-05, 1.9236e-05, 2.1088e-05, 2.1434e-05, 1.6695e-05, 2.7953e-05, 2.0691e-05], device='cuda:0') 2022-11-15 18:21:59,895 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=25784.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 18:22:11,671 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.68 vs. limit=2.0 2022-11-15 18:22:20,041 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.927e+01 1.909e+02 2.418e+02 3.208e+02 6.074e+02, threshold=4.836e+02, percent-clipped=3.0 2022-11-15 18:22:21,518 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.5940, 5.0300, 5.3151, 4.9384, 5.5925, 5.5157, 4.6583, 5.5072], device='cuda:0'), covar=tensor([0.0279, 0.0243, 0.0394, 0.0263, 0.0311, 0.0057, 0.0170, 0.0174], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0094, 0.0079, 0.0101, 0.0098, 0.0059, 0.0085, 0.0089], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:22:23,381 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8767, 2.7181, 2.5522, 2.4903, 1.5813, 2.6313, 1.6895, 1.7601], device='cuda:0'), covar=tensor([0.0155, 0.0032, 0.0062, 0.0085, 0.0171, 0.0049, 0.0147, 0.0066], device='cuda:0'), in_proj_covar=tensor([0.0144, 0.0094, 0.0111, 0.0115, 0.0144, 0.0106, 0.0127, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:22:24,177 INFO [train.py:876] (0/4) Epoch 4, batch 4000, loss[loss=0.1998, simple_loss=0.1819, pruned_loss=0.1089, over 4963.00 frames. ], tot_loss[loss=0.1966, simple_loss=0.1928, pruned_loss=0.1002, over 1081832.05 frames. ], batch size: 109, lr: 1.90e-02, grad_scale: 8.0 2022-11-15 18:22:52,061 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=25858.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:23:08,503 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2022-11-15 18:23:09,386 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2944, 3.1506, 2.4220, 1.6910, 3.0903, 1.1956, 2.9290, 1.6981], device='cuda:0'), covar=tensor([0.1012, 0.0180, 0.0648, 0.1708, 0.0208, 0.1897, 0.0257, 0.1520], device='cuda:0'), in_proj_covar=tensor([0.0131, 0.0096, 0.0102, 0.0121, 0.0100, 0.0134, 0.0087, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 18:23:22,734 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 18:23:30,891 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.220e+02 1.955e+02 2.286e+02 3.069e+02 5.405e+02, threshold=4.573e+02, percent-clipped=2.0 2022-11-15 18:23:35,104 INFO [train.py:876] (0/4) Epoch 4, batch 4100, loss[loss=0.1732, simple_loss=0.1832, pruned_loss=0.08158, over 5075.00 frames. ], tot_loss[loss=0.1931, simple_loss=0.1901, pruned_loss=0.09802, over 1081364.91 frames. ], batch size: 7, lr: 1.90e-02, grad_scale: 8.0 2022-11-15 18:23:37,062 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=25919.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 18:23:48,535 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5121, 4.0010, 4.3897, 3.9523, 4.6252, 4.2354, 3.9577, 4.5384], device='cuda:0'), covar=tensor([0.0365, 0.0330, 0.0429, 0.0369, 0.0364, 0.0243, 0.0293, 0.0292], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0095, 0.0078, 0.0102, 0.0099, 0.0059, 0.0086, 0.0089], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:24:11,893 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2471, 3.0747, 2.3751, 1.6768, 3.0314, 1.0891, 3.0681, 1.6923], device='cuda:0'), covar=tensor([0.0964, 0.0173, 0.0787, 0.1646, 0.0188, 0.1917, 0.0173, 0.1507], device='cuda:0'), in_proj_covar=tensor([0.0131, 0.0096, 0.0103, 0.0120, 0.0099, 0.0134, 0.0086, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 18:24:24,155 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6343, 2.4699, 2.0859, 2.3255, 1.2393, 2.0806, 1.5884, 2.1698], device='cuda:0'), covar=tensor([0.0738, 0.0125, 0.0494, 0.0209, 0.1104, 0.0493, 0.1029, 0.0195], device='cuda:0'), in_proj_covar=tensor([0.0181, 0.0118, 0.0169, 0.0118, 0.0155, 0.0178, 0.0186, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:24:42,056 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.193e+02 2.057e+02 2.578e+02 3.231e+02 5.725e+02, threshold=5.157e+02, percent-clipped=4.0 2022-11-15 18:24:45,827 INFO [train.py:876] (0/4) Epoch 4, batch 4200, loss[loss=0.1859, simple_loss=0.1881, pruned_loss=0.09185, over 5756.00 frames. ], tot_loss[loss=0.1959, simple_loss=0.1924, pruned_loss=0.0997, over 1083024.31 frames. ], batch size: 16, lr: 1.89e-02, grad_scale: 8.0 2022-11-15 18:24:57,513 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.00 vs. limit=2.0 2022-11-15 18:25:07,373 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5637, 2.5341, 3.9623, 2.9564, 4.3975, 3.1873, 4.0924, 4.3408], device='cuda:0'), covar=tensor([0.0086, 0.0584, 0.0230, 0.0697, 0.0078, 0.0419, 0.0341, 0.0168], device='cuda:0'), in_proj_covar=tensor([0.0155, 0.0180, 0.0159, 0.0195, 0.0148, 0.0171, 0.0213, 0.0175], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:25:34,267 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26084.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:25:46,679 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26102.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:25:53,375 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.334e+02 1.847e+02 2.412e+02 3.140e+02 6.094e+02, threshold=4.824e+02, percent-clipped=3.0 2022-11-15 18:25:56,767 INFO [train.py:876] (0/4) Epoch 4, batch 4300, loss[loss=0.2214, simple_loss=0.2149, pruned_loss=0.1139, over 5597.00 frames. ], tot_loss[loss=0.1972, simple_loss=0.1936, pruned_loss=0.1004, over 1083896.35 frames. ], batch size: 43, lr: 1.89e-02, grad_scale: 8.0 2022-11-15 18:26:08,217 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26132.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:26:29,689 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26163.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:26:31,337 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.92 vs. limit=2.0 2022-11-15 18:26:33,040 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26168.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:27:04,697 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.262e+02 2.078e+02 2.562e+02 3.309e+02 6.433e+02, threshold=5.124e+02, percent-clipped=7.0 2022-11-15 18:27:06,219 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26214.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 18:27:08,252 INFO [train.py:876] (0/4) Epoch 4, batch 4400, loss[loss=0.1867, simple_loss=0.1968, pruned_loss=0.08823, over 5592.00 frames. ], tot_loss[loss=0.1963, simple_loss=0.1935, pruned_loss=0.09956, over 1082453.76 frames. ], batch size: 18, lr: 1.89e-02, grad_scale: 8.0 2022-11-15 18:27:16,423 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26229.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:27:32,227 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26250.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:28:15,290 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26311.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:28:15,742 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.392e+02 2.026e+02 2.337e+02 2.838e+02 7.809e+02, threshold=4.674e+02, percent-clipped=2.0 2022-11-15 18:28:19,245 INFO [train.py:876] (0/4) Epoch 4, batch 4500, loss[loss=0.2089, simple_loss=0.2036, pruned_loss=0.1071, over 5592.00 frames. ], tot_loss[loss=0.1971, simple_loss=0.1938, pruned_loss=0.1002, over 1084069.97 frames. ], batch size: 22, lr: 1.88e-02, grad_scale: 8.0 2022-11-15 18:29:00,314 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.97 vs. limit=2.0 2022-11-15 18:29:14,449 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6835, 4.2282, 3.6615, 4.2644, 4.1818, 3.4596, 3.7894, 3.5046], device='cuda:0'), covar=tensor([0.0559, 0.0354, 0.1269, 0.0354, 0.0402, 0.0444, 0.0339, 0.0581], device='cuda:0'), in_proj_covar=tensor([0.0108, 0.0126, 0.0204, 0.0126, 0.0158, 0.0135, 0.0132, 0.0120], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:29:22,544 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8564, 2.1568, 2.6621, 3.4284, 3.7802, 2.8694, 2.1411, 3.8502], device='cuda:0'), covar=tensor([0.0131, 0.2657, 0.2132, 0.2380, 0.0555, 0.2241, 0.2053, 0.0183], device='cuda:0'), in_proj_covar=tensor([0.0147, 0.0212, 0.0220, 0.0285, 0.0202, 0.0222, 0.0199, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:29:27,465 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.298e+02 1.949e+02 2.521e+02 3.180e+02 6.694e+02, threshold=5.042e+02, percent-clipped=2.0 2022-11-15 18:29:29,005 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26414.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:29:31,202 INFO [train.py:876] (0/4) Epoch 4, batch 4600, loss[loss=0.1657, simple_loss=0.163, pruned_loss=0.08418, over 5284.00 frames. ], tot_loss[loss=0.1956, simple_loss=0.1928, pruned_loss=0.09918, over 1085236.41 frames. ], batch size: 9, lr: 1.88e-02, grad_scale: 8.0 2022-11-15 18:29:39,737 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.87 vs. limit=5.0 2022-11-15 18:29:47,074 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1810, 3.1958, 3.3106, 1.4196, 3.0769, 3.6716, 3.8048, 3.3632], device='cuda:0'), covar=tensor([0.1744, 0.1023, 0.0651, 0.2341, 0.0192, 0.0259, 0.0141, 0.0402], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0177, 0.0126, 0.0187, 0.0130, 0.0127, 0.0116, 0.0142], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:29:58,816 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3662, 3.3586, 3.0594, 2.9589, 1.8376, 3.3013, 2.1013, 2.7569], device='cuda:0'), covar=tensor([0.0214, 0.0058, 0.0081, 0.0162, 0.0243, 0.0062, 0.0190, 0.0053], device='cuda:0'), in_proj_covar=tensor([0.0150, 0.0099, 0.0115, 0.0122, 0.0150, 0.0113, 0.0132, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:29:59,367 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26458.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:30:12,135 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26475.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:30:20,901 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0163, 2.0828, 2.9680, 3.6980, 4.1268, 2.9694, 2.5161, 4.0546], device='cuda:0'), covar=tensor([0.0201, 0.4506, 0.2477, 0.3468, 0.0587, 0.3044, 0.2336, 0.0175], device='cuda:0'), in_proj_covar=tensor([0.0150, 0.0212, 0.0220, 0.0283, 0.0201, 0.0227, 0.0198, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:30:37,113 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.423e+02 1.888e+02 2.493e+02 3.125e+02 5.400e+02, threshold=4.987e+02, percent-clipped=2.0 2022-11-15 18:30:38,691 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26514.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 18:30:40,595 INFO [train.py:876] (0/4) Epoch 4, batch 4700, loss[loss=0.272, simple_loss=0.2419, pruned_loss=0.1511, over 5338.00 frames. ], tot_loss[loss=0.1946, simple_loss=0.1919, pruned_loss=0.09865, over 1088599.92 frames. ], batch size: 70, lr: 1.88e-02, grad_scale: 8.0 2022-11-15 18:30:45,558 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2323, 1.0163, 2.1087, 1.5769, 1.1510, 0.9597, 1.3995, 1.1807], device='cuda:0'), covar=tensor([0.0015, 0.0082, 0.0016, 0.0019, 0.0029, 0.0071, 0.0020, 0.0025], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0013, 0.0012, 0.0014, 0.0014, 0.0014, 0.0015, 0.0015], device='cuda:0'), out_proj_covar=tensor([1.4675e-05, 1.5540e-05, 1.3342e-05, 1.5624e-05, 1.4693e-05, 1.5666e-05, 1.6510e-05, 1.9108e-05], device='cuda:0') 2022-11-15 18:30:46,182 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26524.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:31:12,670 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26562.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:31:22,605 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9524, 2.4498, 2.0463, 1.1476, 2.3772, 2.6909, 2.5706, 2.9286], device='cuda:0'), covar=tensor([0.1371, 0.1170, 0.0796, 0.2007, 0.0291, 0.0262, 0.0243, 0.0351], device='cuda:0'), in_proj_covar=tensor([0.0183, 0.0175, 0.0126, 0.0182, 0.0130, 0.0126, 0.0117, 0.0141], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:31:43,656 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26606.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:31:47,743 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.209e+02 2.139e+02 2.674e+02 3.379e+02 6.556e+02, threshold=5.348e+02, percent-clipped=3.0 2022-11-15 18:31:51,249 INFO [train.py:876] (0/4) Epoch 4, batch 4800, loss[loss=0.2135, simple_loss=0.182, pruned_loss=0.1225, over 4124.00 frames. ], tot_loss[loss=0.1937, simple_loss=0.1912, pruned_loss=0.0981, over 1092871.63 frames. ], batch size: 181, lr: 1.87e-02, grad_scale: 8.0 2022-11-15 18:32:03,364 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0612, 1.3779, 1.8671, 1.5986, 1.0644, 0.8414, 1.2995, 1.0314], device='cuda:0'), covar=tensor([0.0019, 0.0025, 0.0025, 0.0015, 0.0020, 0.0046, 0.0018, 0.0035], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0014, 0.0012, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015], device='cuda:0'), out_proj_covar=tensor([1.5477e-05, 1.6021e-05, 1.3990e-05, 1.5980e-05, 1.5213e-05, 1.6624e-05, 1.6720e-05, 1.9118e-05], device='cuda:0') 2022-11-15 18:32:17,246 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4661, 2.1810, 1.8035, 2.7306, 1.9910, 2.3792, 2.3729, 3.0209], device='cuda:0'), covar=tensor([0.0649, 0.1099, 0.1672, 0.0674, 0.1255, 0.0607, 0.1216, 0.1222], device='cuda:0'), in_proj_covar=tensor([0.0046, 0.0054, 0.0066, 0.0042, 0.0062, 0.0050, 0.0062, 0.0045], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 18:32:23,964 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9924, 3.8921, 3.8082, 4.1295, 3.5661, 3.2996, 4.4613, 3.7971], device='cuda:0'), covar=tensor([0.0365, 0.0587, 0.0409, 0.0494, 0.0565, 0.0329, 0.0510, 0.0455], device='cuda:0'), in_proj_covar=tensor([0.0058, 0.0076, 0.0064, 0.0077, 0.0061, 0.0052, 0.0098, 0.0064], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:32:29,517 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9675, 1.3919, 1.7204, 1.2636, 0.7234, 1.6407, 1.0931, 1.0930], device='cuda:0'), covar=tensor([0.0428, 0.0333, 0.0370, 0.0575, 0.0734, 0.0407, 0.0600, 0.0538], device='cuda:0'), in_proj_covar=tensor([0.0036, 0.0034, 0.0034, 0.0037, 0.0033, 0.0030, 0.0030, 0.0036], device='cuda:0'), out_proj_covar=tensor([6.2657e-05, 5.3514e-05, 5.3307e-05, 7.0869e-05, 6.0480e-05, 5.5085e-05, 5.2761e-05, 6.1734e-05], device='cuda:0') 2022-11-15 18:32:58,145 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.842e+02 2.215e+02 2.988e+02 4.976e+02, threshold=4.429e+02, percent-clipped=0.0 2022-11-15 18:33:01,598 INFO [train.py:876] (0/4) Epoch 4, batch 4900, loss[loss=0.1318, simple_loss=0.1436, pruned_loss=0.05998, over 4366.00 frames. ], tot_loss[loss=0.1962, simple_loss=0.1927, pruned_loss=0.09982, over 1089049.37 frames. ], batch size: 5, lr: 1.87e-02, grad_scale: 8.0 2022-11-15 18:33:05,829 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5200, 1.7290, 1.7823, 1.7845, 1.1086, 1.5139, 1.0733, 1.2532], device='cuda:0'), covar=tensor([0.0051, 0.0018, 0.0033, 0.0025, 0.0099, 0.0028, 0.0054, 0.0038], device='cuda:0'), in_proj_covar=tensor([0.0148, 0.0099, 0.0112, 0.0121, 0.0148, 0.0111, 0.0132, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:33:12,686 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0668, 4.5071, 3.5996, 1.9763, 4.2847, 1.5833, 4.2084, 2.5542], device='cuda:0'), covar=tensor([0.0977, 0.0122, 0.0380, 0.2192, 0.0152, 0.2298, 0.0175, 0.1772], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0096, 0.0101, 0.0121, 0.0099, 0.0135, 0.0085, 0.0123], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 18:33:21,560 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7672, 2.5679, 2.4057, 1.1794, 2.8453, 2.9062, 2.8668, 3.1524], device='cuda:0'), covar=tensor([0.1973, 0.1531, 0.0758, 0.2428, 0.0295, 0.0349, 0.0253, 0.0354], device='cuda:0'), in_proj_covar=tensor([0.0188, 0.0180, 0.0129, 0.0186, 0.0133, 0.0130, 0.0118, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:33:31,039 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26758.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:33:39,436 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=26770.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:33:59,479 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4043, 3.2764, 3.0945, 3.5431, 3.1201, 2.7389, 3.7854, 3.1434], device='cuda:0'), covar=tensor([0.0428, 0.0755, 0.0601, 0.0761, 0.0612, 0.0413, 0.0837, 0.0689], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0080, 0.0068, 0.0082, 0.0063, 0.0054, 0.0104, 0.0067], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:34:05,200 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26806.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:34:07,910 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4578, 4.2148, 4.1245, 4.6751, 4.0388, 3.7574, 4.9813, 4.2191], device='cuda:0'), covar=tensor([0.0313, 0.0685, 0.0340, 0.0548, 0.0480, 0.0282, 0.0610, 0.0391], device='cuda:0'), in_proj_covar=tensor([0.0060, 0.0079, 0.0067, 0.0081, 0.0062, 0.0053, 0.0102, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:34:09,164 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.286e+02 1.967e+02 2.433e+02 3.040e+02 5.263e+02, threshold=4.865e+02, percent-clipped=3.0 2022-11-15 18:34:10,836 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5135, 2.2614, 3.6283, 3.0396, 4.2760, 2.5333, 3.7606, 4.2829], device='cuda:0'), covar=tensor([0.0104, 0.0754, 0.0263, 0.0686, 0.0174, 0.0623, 0.0420, 0.0174], device='cuda:0'), in_proj_covar=tensor([0.0157, 0.0185, 0.0163, 0.0198, 0.0154, 0.0177, 0.0217, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:34:12,718 INFO [train.py:876] (0/4) Epoch 4, batch 5000, loss[loss=0.2032, simple_loss=0.193, pruned_loss=0.1068, over 5637.00 frames. ], tot_loss[loss=0.1959, simple_loss=0.1923, pruned_loss=0.09973, over 1086288.76 frames. ], batch size: 32, lr: 1.87e-02, grad_scale: 8.0 2022-11-15 18:34:17,621 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26824.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:34:36,110 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. limit=2.0 2022-11-15 18:34:51,691 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26872.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:35:04,748 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7376, 2.4907, 2.1285, 2.2348, 1.3399, 2.0788, 1.4996, 2.1756], device='cuda:0'), covar=tensor([0.0694, 0.0126, 0.0512, 0.0232, 0.0891, 0.0507, 0.1044, 0.0184], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0119, 0.0165, 0.0119, 0.0153, 0.0179, 0.0185, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:35:14,871 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26904.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:35:16,178 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=26906.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:35:20,045 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 1.925e+02 2.492e+02 3.201e+02 4.823e+02, threshold=4.984e+02, percent-clipped=0.0 2022-11-15 18:35:23,728 INFO [train.py:876] (0/4) Epoch 4, batch 5100, loss[loss=0.125, simple_loss=0.1422, pruned_loss=0.05387, over 5726.00 frames. ], tot_loss[loss=0.1946, simple_loss=0.1914, pruned_loss=0.0989, over 1087059.36 frames. ], batch size: 11, lr: 1.86e-02, grad_scale: 16.0 2022-11-15 18:35:46,957 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=26950.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:35:50,396 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=26954.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:35:58,378 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=26965.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:36:10,374 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.5558, 0.9458, 0.9280, 0.6366, 0.5432, 0.5257, 0.4572, 0.8611], device='cuda:0'), covar=tensor([0.0017, 0.0014, 0.0011, 0.0010, 0.0018, 0.0014, 0.0028, 0.0009], device='cuda:0'), in_proj_covar=tensor([0.0022, 0.0021, 0.0021, 0.0021, 0.0022, 0.0019, 0.0021, 0.0020], device='cuda:0'), out_proj_covar=tensor([2.5617e-05, 2.5269e-05, 1.9472e-05, 2.0818e-05, 2.0826e-05, 1.6479e-05, 2.7033e-05, 1.9495e-05], device='cuda:0') 2022-11-15 18:36:31,260 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=27011.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:36:31,700 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.350e+02 2.232e+02 2.660e+02 3.188e+02 6.115e+02, threshold=5.320e+02, percent-clipped=2.0 2022-11-15 18:36:35,115 INFO [train.py:876] (0/4) Epoch 4, batch 5200, loss[loss=0.2517, simple_loss=0.2239, pruned_loss=0.1398, over 5437.00 frames. ], tot_loss[loss=0.1967, simple_loss=0.1929, pruned_loss=0.1002, over 1081678.50 frames. ], batch size: 53, lr: 1.86e-02, grad_scale: 16.0 2022-11-15 18:36:40,327 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.78 vs. limit=2.0 2022-11-15 18:37:02,262 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-15 18:37:12,293 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=27070.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:37:42,560 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.845e+02 2.257e+02 3.098e+02 5.332e+02, threshold=4.514e+02, percent-clipped=1.0 2022-11-15 18:37:45,684 INFO [train.py:876] (0/4) Epoch 4, batch 5300, loss[loss=0.1744, simple_loss=0.1791, pruned_loss=0.08483, over 5574.00 frames. ], tot_loss[loss=0.196, simple_loss=0.1933, pruned_loss=0.09936, over 1087041.27 frames. ], batch size: 22, lr: 1.86e-02, grad_scale: 8.0 2022-11-15 18:37:46,366 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=27118.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:37:58,698 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.15 vs. limit=2.0 2022-11-15 18:38:07,346 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=27148.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:38:08,783 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1042, 1.8245, 2.8251, 2.4287, 2.7602, 1.8908, 2.6959, 3.1175], device='cuda:0'), covar=tensor([0.0117, 0.0521, 0.0189, 0.0356, 0.0198, 0.0417, 0.0272, 0.0206], device='cuda:0'), in_proj_covar=tensor([0.0156, 0.0182, 0.0159, 0.0193, 0.0153, 0.0174, 0.0211, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:38:50,775 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=27209.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:38:53,684 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.431e+02 2.106e+02 2.706e+02 3.414e+02 5.155e+02, threshold=5.411e+02, percent-clipped=7.0 2022-11-15 18:38:55,785 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7082, 5.3261, 3.9760, 2.3760, 5.0790, 2.0708, 5.0201, 2.9017], device='cuda:0'), covar=tensor([0.0984, 0.0060, 0.0326, 0.1912, 0.0084, 0.1852, 0.0080, 0.1482], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0097, 0.0100, 0.0122, 0.0097, 0.0134, 0.0085, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 18:38:56,361 INFO [train.py:876] (0/4) Epoch 4, batch 5400, loss[loss=0.1358, simple_loss=0.1546, pruned_loss=0.05845, over 5191.00 frames. ], tot_loss[loss=0.1945, simple_loss=0.1926, pruned_loss=0.09822, over 1091081.16 frames. ], batch size: 7, lr: 1.85e-02, grad_scale: 8.0 2022-11-15 18:39:15,310 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0642, 3.7441, 3.1826, 3.7230, 3.7348, 3.1932, 3.3783, 3.1520], device='cuda:0'), covar=tensor([0.1261, 0.0393, 0.1185, 0.0362, 0.0456, 0.0394, 0.0427, 0.0507], device='cuda:0'), in_proj_covar=tensor([0.0107, 0.0121, 0.0192, 0.0123, 0.0153, 0.0129, 0.0128, 0.0117], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:39:26,836 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=27260.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 18:39:26,914 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4274, 1.7873, 1.3217, 1.1476, 1.2972, 1.9659, 1.6892, 1.7023], device='cuda:0'), covar=tensor([0.0932, 0.0608, 0.0832, 0.1267, 0.0443, 0.0248, 0.0243, 0.0657], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0179, 0.0127, 0.0184, 0.0132, 0.0129, 0.0118, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:39:29,943 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4505, 4.7134, 3.6587, 2.0329, 4.5466, 1.7407, 4.5337, 2.6458], device='cuda:0'), covar=tensor([0.0814, 0.0098, 0.0277, 0.1963, 0.0148, 0.1820, 0.0094, 0.1479], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0097, 0.0102, 0.0123, 0.0098, 0.0134, 0.0086, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 18:39:43,361 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-15 18:39:59,501 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=27306.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:40:02,927 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7710, 4.2108, 3.6644, 4.1876, 4.2380, 3.5367, 3.7747, 3.5485], device='cuda:0'), covar=tensor([0.0392, 0.0380, 0.1051, 0.0354, 0.0358, 0.0412, 0.0263, 0.0399], device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0122, 0.0193, 0.0124, 0.0155, 0.0129, 0.0129, 0.0118], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:40:04,135 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 2.138e+02 2.582e+02 3.197e+02 8.943e+02, threshold=5.164e+02, percent-clipped=1.0 2022-11-15 18:40:07,247 INFO [train.py:876] (0/4) Epoch 4, batch 5500, loss[loss=0.1413, simple_loss=0.153, pruned_loss=0.06481, over 5562.00 frames. ], tot_loss[loss=0.1958, simple_loss=0.193, pruned_loss=0.0993, over 1088243.49 frames. ], batch size: 15, lr: 1.85e-02, grad_scale: 8.0 2022-11-15 18:40:28,501 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4607, 1.0255, 1.4617, 0.3694, 1.3075, 1.2476, 0.7939, 1.5057], device='cuda:0'), covar=tensor([0.0013, 0.0011, 0.0009, 0.0016, 0.0013, 0.0012, 0.0027, 0.0012], device='cuda:0'), in_proj_covar=tensor([0.0022, 0.0022, 0.0021, 0.0021, 0.0021, 0.0020, 0.0021, 0.0020], device='cuda:0'), out_proj_covar=tensor([2.4842e-05, 2.5743e-05, 1.9829e-05, 2.0794e-05, 1.9961e-05, 1.6673e-05, 2.6934e-05, 1.9535e-05], device='cuda:0') 2022-11-15 18:40:51,551 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2022-11-15 18:41:00,205 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.31 vs. limit=2.0 2022-11-15 18:41:15,201 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.296e+02 2.174e+02 2.548e+02 3.213e+02 6.484e+02, threshold=5.095e+02, percent-clipped=3.0 2022-11-15 18:41:17,888 INFO [train.py:876] (0/4) Epoch 4, batch 5600, loss[loss=0.1368, simple_loss=0.1527, pruned_loss=0.06046, over 5437.00 frames. ], tot_loss[loss=0.1965, simple_loss=0.1937, pruned_loss=0.09968, over 1092658.03 frames. ], batch size: 11, lr: 1.85e-02, grad_scale: 8.0 2022-11-15 18:41:58,719 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5986, 3.5332, 3.2963, 3.5546, 3.2974, 3.4089, 1.2854, 3.4863], device='cuda:0'), covar=tensor([0.0400, 0.0389, 0.0489, 0.0325, 0.0598, 0.0486, 0.4155, 0.0506], device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0066, 0.0067, 0.0057, 0.0084, 0.0066, 0.0122, 0.0091], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:42:14,022 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.72 vs. limit=2.0 2022-11-15 18:42:19,412 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=27504.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:42:25,765 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.283e+02 2.178e+02 2.574e+02 3.004e+02 8.808e+02, threshold=5.148e+02, percent-clipped=2.0 2022-11-15 18:42:28,790 INFO [train.py:876] (0/4) Epoch 4, batch 5700, loss[loss=0.177, simple_loss=0.1676, pruned_loss=0.09318, over 5467.00 frames. ], tot_loss[loss=0.1952, simple_loss=0.1926, pruned_loss=0.09892, over 1088104.28 frames. ], batch size: 11, lr: 1.84e-02, grad_scale: 8.0 2022-11-15 18:42:53,303 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7697, 4.0562, 3.1408, 1.9958, 3.6714, 1.4065, 3.7502, 2.0665], device='cuda:0'), covar=tensor([0.1092, 0.0112, 0.0410, 0.1949, 0.0178, 0.1930, 0.0181, 0.1743], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0096, 0.0100, 0.0119, 0.0097, 0.0130, 0.0085, 0.0122], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 18:42:59,135 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=27560.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:43:30,255 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.21 vs. limit=5.0 2022-11-15 18:43:32,371 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=27606.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:43:33,704 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=27608.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 18:43:37,002 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.377e+02 1.840e+02 2.310e+02 2.841e+02 5.066e+02, threshold=4.620e+02, percent-clipped=0.0 2022-11-15 18:43:40,075 INFO [train.py:876] (0/4) Epoch 4, batch 5800, loss[loss=0.2236, simple_loss=0.2289, pruned_loss=0.1092, over 5578.00 frames. ], tot_loss[loss=0.1947, simple_loss=0.1921, pruned_loss=0.09859, over 1091143.00 frames. ], batch size: 22, lr: 1.84e-02, grad_scale: 8.0 2022-11-15 18:43:55,117 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9096, 1.0024, 1.1125, 0.3981, 1.1680, 1.3363, 0.6471, 1.3322], device='cuda:0'), covar=tensor([0.0023, 0.0015, 0.0016, 0.0026, 0.0020, 0.0014, 0.0047, 0.0020], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0023, 0.0023, 0.0024, 0.0023, 0.0021, 0.0023, 0.0022], device='cuda:0'), out_proj_covar=tensor([2.6534e-05, 2.7190e-05, 2.1371e-05, 2.3822e-05, 2.1856e-05, 1.7369e-05, 3.0540e-05, 2.1283e-05], device='cuda:0') 2022-11-15 18:44:06,316 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=27654.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:44:37,437 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4155, 4.5638, 4.7373, 4.8421, 3.8975, 3.4119, 5.2384, 4.6094], device='cuda:0'), covar=tensor([0.0444, 0.0990, 0.0254, 0.0804, 0.0586, 0.0312, 0.0786, 0.0388], device='cuda:0'), in_proj_covar=tensor([0.0058, 0.0077, 0.0066, 0.0077, 0.0060, 0.0052, 0.0101, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:44:47,988 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.281e+02 1.945e+02 2.573e+02 2.996e+02 4.462e+02, threshold=5.147e+02, percent-clipped=0.0 2022-11-15 18:44:48,892 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=27714.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 18:44:51,048 INFO [train.py:876] (0/4) Epoch 4, batch 5900, loss[loss=0.2045, simple_loss=0.2043, pruned_loss=0.1024, over 5572.00 frames. ], tot_loss[loss=0.1922, simple_loss=0.1901, pruned_loss=0.09713, over 1082612.95 frames. ], batch size: 40, lr: 1.84e-02, grad_scale: 8.0 2022-11-15 18:45:22,947 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8827, 4.0427, 3.8640, 4.2487, 3.6346, 3.2513, 4.6623, 3.7642], device='cuda:0'), covar=tensor([0.0518, 0.0956, 0.0370, 0.0701, 0.0568, 0.0439, 0.0661, 0.0617], device='cuda:0'), in_proj_covar=tensor([0.0058, 0.0076, 0.0066, 0.0078, 0.0060, 0.0052, 0.0101, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:45:32,355 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=27775.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 18:45:36,608 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 18:45:53,389 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=27804.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:45:57,421 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0237, 0.7765, 0.7626, 0.9845, 0.9546, 1.3034, 0.9838, 1.2436], device='cuda:0'), covar=tensor([0.1399, 0.0640, 0.1086, 0.2275, 0.5268, 0.0892, 0.1621, 0.0717], device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0010, 0.0008, 0.0009, 0.0008, 0.0007, 0.0009, 0.0008], device='cuda:0'), out_proj_covar=tensor([3.0363e-05, 3.6599e-05, 3.2523e-05, 3.5466e-05, 3.2255e-05, 2.8947e-05, 3.4290e-05, 3.0813e-05], device='cuda:0') 2022-11-15 18:45:59,276 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.246e+02 2.057e+02 2.608e+02 3.378e+02 5.938e+02, threshold=5.215e+02, percent-clipped=4.0 2022-11-15 18:46:02,455 INFO [train.py:876] (0/4) Epoch 4, batch 6000, loss[loss=0.2361, simple_loss=0.2122, pruned_loss=0.1301, over 5467.00 frames. ], tot_loss[loss=0.1898, simple_loss=0.188, pruned_loss=0.09583, over 1078907.57 frames. ], batch size: 53, lr: 1.83e-02, grad_scale: 8.0 2022-11-15 18:46:02,457 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 18:46:07,483 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3402, 3.5745, 2.8735, 3.4726, 2.6322, 2.8455, 1.9063, 3.3032], device='cuda:0'), covar=tensor([0.0947, 0.0176, 0.0709, 0.0183, 0.0807, 0.0499, 0.1348, 0.0155], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0127, 0.0170, 0.0122, 0.0155, 0.0177, 0.0190, 0.0126], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:46:07,887 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2841, 2.2637, 3.9597, 3.0762, 4.1335, 2.7361, 3.9876, 4.1610], device='cuda:0'), covar=tensor([0.0161, 0.0853, 0.0214, 0.0879, 0.0115, 0.0680, 0.0395, 0.0191], device='cuda:0'), in_proj_covar=tensor([0.0158, 0.0186, 0.0163, 0.0202, 0.0153, 0.0177, 0.0218, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:46:20,379 INFO [train.py:908] (0/4) Epoch 4, validation: loss=0.1691, simple_loss=0.1898, pruned_loss=0.07419, over 1530663.00 frames. 2022-11-15 18:46:20,380 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 18:46:27,983 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7588, 2.2936, 2.9969, 3.7633, 4.0919, 3.0828, 2.9718, 4.0462], device='cuda:0'), covar=tensor([0.0239, 0.2787, 0.1914, 0.2533, 0.0549, 0.2555, 0.1626, 0.0170], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0214, 0.0215, 0.0289, 0.0201, 0.0222, 0.0197, 0.0160], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:46:40,722 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8048, 2.1592, 2.8693, 3.7320, 4.0968, 3.0711, 2.9224, 4.1022], device='cuda:0'), covar=tensor([0.0380, 0.3335, 0.2738, 0.3319, 0.0709, 0.2982, 0.2037, 0.0273], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0212, 0.0215, 0.0288, 0.0201, 0.0222, 0.0198, 0.0160], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:46:44,059 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7113, 1.1386, 1.3328, 1.0424, 0.9384, 0.9381, 0.7001, 0.9974], device='cuda:0'), covar=tensor([0.0035, 0.0029, 0.0026, 0.0039, 0.0033, 0.0030, 0.0054, 0.0045], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0022, 0.0022, 0.0023, 0.0023, 0.0020, 0.0022, 0.0022], device='cuda:0'), out_proj_covar=tensor([2.5953e-05, 2.5952e-05, 2.0339e-05, 2.2528e-05, 2.1715e-05, 1.7024e-05, 2.8946e-05, 2.1376e-05], device='cuda:0') 2022-11-15 18:46:45,372 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=27852.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:46:59,550 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0007, 2.3962, 1.8672, 2.7438, 1.7415, 2.2864, 2.4187, 3.0703], device='cuda:0'), covar=tensor([0.0487, 0.0964, 0.1445, 0.0613, 0.1475, 0.1129, 0.0956, 0.1596], device='cuda:0'), in_proj_covar=tensor([0.0046, 0.0052, 0.0067, 0.0044, 0.0060, 0.0051, 0.0061, 0.0043], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001, 0.0002, 0.0001], device='cuda:0') 2022-11-15 18:47:28,452 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.961e+02 2.307e+02 2.838e+02 4.570e+02, threshold=4.614e+02, percent-clipped=0.0 2022-11-15 18:47:31,206 INFO [train.py:876] (0/4) Epoch 4, batch 6100, loss[loss=0.1951, simple_loss=0.1866, pruned_loss=0.1018, over 5550.00 frames. ], tot_loss[loss=0.1922, simple_loss=0.19, pruned_loss=0.09715, over 1080872.30 frames. ], batch size: 25, lr: 1.83e-02, grad_scale: 8.0 2022-11-15 18:47:38,808 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 18:48:14,137 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.56 vs. limit=2.0 2022-11-15 18:48:16,897 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-15 18:48:16,988 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0 2022-11-15 18:48:23,648 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2022-11-15 18:48:39,806 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.715e+01 1.962e+02 2.542e+02 3.499e+02 5.895e+02, threshold=5.084e+02, percent-clipped=10.0 2022-11-15 18:48:42,558 INFO [train.py:876] (0/4) Epoch 4, batch 6200, loss[loss=0.2159, simple_loss=0.205, pruned_loss=0.1134, over 5590.00 frames. ], tot_loss[loss=0.1934, simple_loss=0.1906, pruned_loss=0.09806, over 1080321.51 frames. ], batch size: 22, lr: 1.83e-02, grad_scale: 8.0 2022-11-15 18:49:19,871 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28070.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 18:49:50,344 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.357e+02 2.008e+02 2.464e+02 2.967e+02 7.434e+02, threshold=4.928e+02, percent-clipped=2.0 2022-11-15 18:49:53,145 INFO [train.py:876] (0/4) Epoch 4, batch 6300, loss[loss=0.2007, simple_loss=0.2002, pruned_loss=0.1007, over 5560.00 frames. ], tot_loss[loss=0.1933, simple_loss=0.1908, pruned_loss=0.09791, over 1082554.13 frames. ], batch size: 22, lr: 1.82e-02, grad_scale: 8.0 2022-11-15 18:50:37,222 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.49 vs. limit=5.0 2022-11-15 18:50:44,959 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2894, 1.2952, 1.4936, 1.1669, 1.3217, 1.1629, 1.1101, 1.5518], device='cuda:0'), covar=tensor([0.0029, 0.0027, 0.0023, 0.0033, 0.0040, 0.0019, 0.0040, 0.0034], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0023, 0.0025, 0.0025, 0.0024, 0.0022, 0.0024, 0.0022], device='cuda:0'), out_proj_covar=tensor([2.5607e-05, 2.6741e-05, 2.3120e-05, 2.4060e-05, 2.2456e-05, 1.8036e-05, 3.0102e-05, 2.1485e-05], device='cuda:0') 2022-11-15 18:51:00,501 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.320e+02 2.107e+02 2.497e+02 3.273e+02 5.114e+02, threshold=4.994e+02, percent-clipped=2.0 2022-11-15 18:51:04,031 INFO [train.py:876] (0/4) Epoch 4, batch 6400, loss[loss=0.2035, simple_loss=0.2044, pruned_loss=0.1013, over 5792.00 frames. ], tot_loss[loss=0.1906, simple_loss=0.1899, pruned_loss=0.09568, over 1091664.84 frames. ], batch size: 26, lr: 1.82e-02, grad_scale: 8.0 2022-11-15 18:51:10,672 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-15 18:51:12,356 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28229.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:51:13,572 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5119, 4.5462, 4.3783, 4.7145, 4.1065, 3.9750, 5.1417, 4.2147], device='cuda:0'), covar=tensor([0.0341, 0.0630, 0.0381, 0.0550, 0.0522, 0.0342, 0.0526, 0.0652], device='cuda:0'), in_proj_covar=tensor([0.0062, 0.0080, 0.0068, 0.0083, 0.0063, 0.0054, 0.0106, 0.0068], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 18:51:30,768 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28255.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:51:31,426 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9833, 1.4066, 0.8766, 1.1559, 1.3681, 1.7566, 1.5063, 1.3779], device='cuda:0'), covar=tensor([0.1973, 0.0517, 0.1103, 0.1935, 0.0559, 0.0313, 0.0337, 0.0837], device='cuda:0'), in_proj_covar=tensor([0.0008, 0.0011, 0.0009, 0.0010, 0.0008, 0.0008, 0.0010, 0.0009], device='cuda:0'), out_proj_covar=tensor([3.2491e-05, 3.8602e-05, 3.4875e-05, 3.8162e-05, 3.3686e-05, 3.0760e-05, 3.7123e-05, 3.3793e-05], device='cuda:0') 2022-11-15 18:51:53,828 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28287.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:51:55,747 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9850, 3.5961, 3.8853, 3.6422, 3.9972, 3.2905, 3.5472, 3.9466], device='cuda:0'), covar=tensor([0.0301, 0.0285, 0.0325, 0.0272, 0.0312, 0.0467, 0.0287, 0.0318], device='cuda:0'), in_proj_covar=tensor([0.0096, 0.0100, 0.0081, 0.0110, 0.0106, 0.0061, 0.0087, 0.0099], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:51:55,860 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28290.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:52:11,778 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 2.095e+02 2.644e+02 3.190e+02 6.758e+02, threshold=5.289e+02, percent-clipped=3.0 2022-11-15 18:52:14,085 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28316.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 18:52:14,563 INFO [train.py:876] (0/4) Epoch 4, batch 6500, loss[loss=0.1335, simple_loss=0.1553, pruned_loss=0.05585, over 4609.00 frames. ], tot_loss[loss=0.1922, simple_loss=0.1906, pruned_loss=0.0969, over 1085585.81 frames. ], batch size: 5, lr: 1.82e-02, grad_scale: 8.0 2022-11-15 18:52:36,616 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28348.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:52:52,089 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=28370.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:53:01,729 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2022-11-15 18:53:22,968 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.292e+02 1.962e+02 2.374e+02 2.918e+02 5.741e+02, threshold=4.749e+02, percent-clipped=1.0 2022-11-15 18:53:26,230 INFO [train.py:876] (0/4) Epoch 4, batch 6600, loss[loss=0.1933, simple_loss=0.1939, pruned_loss=0.09634, over 5571.00 frames. ], tot_loss[loss=0.1919, simple_loss=0.1899, pruned_loss=0.09693, over 1086229.13 frames. ], batch size: 25, lr: 1.81e-02, grad_scale: 8.0 2022-11-15 18:53:26,944 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=28418.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:54:14,662 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28486.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:54:16,617 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7859, 4.8229, 4.4228, 4.5837, 4.7478, 4.5698, 2.0530, 4.6956], device='cuda:0'), covar=tensor([0.0297, 0.0181, 0.0347, 0.0293, 0.0303, 0.0358, 0.3046, 0.0351], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0070, 0.0071, 0.0058, 0.0086, 0.0068, 0.0124, 0.0093], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:54:19,424 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28492.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:54:22,399 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0177, 3.2391, 3.0491, 3.0377, 3.1330, 3.1029, 1.0288, 3.1488], device='cuda:0'), covar=tensor([0.0351, 0.0211, 0.0308, 0.0221, 0.0265, 0.0256, 0.3001, 0.0285], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0070, 0.0071, 0.0058, 0.0086, 0.0068, 0.0124, 0.0093], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:54:33,671 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.274e+02 2.033e+02 2.580e+02 2.965e+02 5.128e+02, threshold=5.159e+02, percent-clipped=4.0 2022-11-15 18:54:36,432 INFO [train.py:876] (0/4) Epoch 4, batch 6700, loss[loss=0.1761, simple_loss=0.1709, pruned_loss=0.09064, over 5715.00 frames. ], tot_loss[loss=0.1913, simple_loss=0.19, pruned_loss=0.0963, over 1090798.74 frames. ], batch size: 11, lr: 1.81e-02, grad_scale: 8.0 2022-11-15 18:54:38,599 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.2094, 4.7556, 5.0479, 4.6131, 5.3390, 5.1540, 4.4439, 5.2462], device='cuda:0'), covar=tensor([0.0317, 0.0194, 0.0327, 0.0248, 0.0274, 0.0080, 0.0199, 0.0202], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0100, 0.0080, 0.0110, 0.0105, 0.0060, 0.0088, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:54:57,424 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28547.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:55:02,303 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28553.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:55:18,773 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.09 vs. limit=2.0 2022-11-15 18:55:19,980 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6142, 2.1567, 2.4249, 3.3964, 3.7187, 2.5990, 2.1110, 3.5705], device='cuda:0'), covar=tensor([0.0282, 0.3221, 0.3322, 0.3441, 0.0903, 0.3520, 0.2720, 0.0255], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0214, 0.0218, 0.0291, 0.0206, 0.0221, 0.0197, 0.0162], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:55:24,585 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28585.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:55:43,579 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28611.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 18:55:44,799 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.410e+02 2.021e+02 2.337e+02 3.022e+02 5.149e+02, threshold=4.674e+02, percent-clipped=0.0 2022-11-15 18:55:47,616 INFO [train.py:876] (0/4) Epoch 4, batch 6800, loss[loss=0.1185, simple_loss=0.1388, pruned_loss=0.04912, over 5473.00 frames. ], tot_loss[loss=0.1915, simple_loss=0.1902, pruned_loss=0.09642, over 1090654.99 frames. ], batch size: 12, lr: 1.81e-02, grad_scale: 8.0 2022-11-15 18:56:06,109 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28643.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:56:56,416 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 2.083e+02 2.579e+02 3.189e+02 6.516e+02, threshold=5.157e+02, percent-clipped=5.0 2022-11-15 18:56:57,473 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.13 vs. limit=2.0 2022-11-15 18:56:59,135 INFO [train.py:876] (0/4) Epoch 4, batch 6900, loss[loss=0.1523, simple_loss=0.1681, pruned_loss=0.06828, over 5117.00 frames. ], tot_loss[loss=0.1914, simple_loss=0.1898, pruned_loss=0.09655, over 1077246.76 frames. ], batch size: 8, lr: 1.80e-02, grad_scale: 8.0 2022-11-15 18:57:00,578 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2332, 5.0251, 4.1801, 4.8410, 4.8710, 4.2215, 4.4359, 3.7764], device='cuda:0'), covar=tensor([0.0297, 0.0260, 0.1135, 0.0397, 0.0333, 0.0411, 0.0364, 0.0762], device='cuda:0'), in_proj_covar=tensor([0.0111, 0.0128, 0.0203, 0.0131, 0.0158, 0.0135, 0.0132, 0.0120], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 18:57:20,351 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.12 vs. limit=2.0 2022-11-15 18:57:28,987 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6245, 2.9247, 3.6605, 4.2233, 4.9004, 4.3790, 3.3959, 4.9145], device='cuda:0'), covar=tensor([0.0118, 0.3549, 0.1745, 0.2685, 0.0342, 0.1656, 0.1642, 0.0117], device='cuda:0'), in_proj_covar=tensor([0.0156, 0.0212, 0.0217, 0.0290, 0.0203, 0.0219, 0.0194, 0.0164], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:58:07,455 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.482e+01 2.092e+02 2.563e+02 3.027e+02 6.161e+02, threshold=5.126e+02, percent-clipped=1.0 2022-11-15 18:58:10,632 INFO [train.py:876] (0/4) Epoch 4, batch 7000, loss[loss=0.1917, simple_loss=0.1645, pruned_loss=0.1094, over 4086.00 frames. ], tot_loss[loss=0.1931, simple_loss=0.1906, pruned_loss=0.09783, over 1078297.11 frames. ], batch size: 181, lr: 1.80e-02, grad_scale: 8.0 2022-11-15 18:58:28,009 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28842.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:58:30,175 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5720, 1.9744, 2.7518, 3.5794, 3.7192, 3.0273, 2.4316, 4.0560], device='cuda:0'), covar=tensor([0.0193, 0.3742, 0.2032, 0.2484, 0.0644, 0.2470, 0.2012, 0.0152], device='cuda:0'), in_proj_covar=tensor([0.0157, 0.0215, 0.0218, 0.0296, 0.0210, 0.0224, 0.0198, 0.0167], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 18:58:32,039 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=28848.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:58:41,445 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=28861.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:58:58,291 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=28885.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:59:16,482 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=28911.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 18:59:17,665 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 2.242e+02 2.603e+02 3.137e+02 5.359e+02, threshold=5.206e+02, percent-clipped=2.0 2022-11-15 18:59:21,223 INFO [train.py:876] (0/4) Epoch 4, batch 7100, loss[loss=0.1818, simple_loss=0.1816, pruned_loss=0.09102, over 5716.00 frames. ], tot_loss[loss=0.1939, simple_loss=0.1915, pruned_loss=0.09819, over 1084105.71 frames. ], batch size: 17, lr: 1.80e-02, grad_scale: 8.0 2022-11-15 18:59:24,869 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=28922.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:59:32,541 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=28933.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:59:38,542 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 18:59:39,665 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=28943.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 18:59:50,841 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=28959.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 18:59:50,931 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5173, 1.0610, 1.1857, 0.9813, 1.6528, 1.0874, 2.1205, 1.2909], device='cuda:0'), covar=tensor([0.0013, 0.0048, 0.0052, 0.0020, 0.0019, 0.0106, 0.0017, 0.0033], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0014, 0.0013, 0.0015, 0.0014, 0.0014, 0.0016, 0.0015], device='cuda:0'), out_proj_covar=tensor([1.4668e-05, 1.5412e-05, 1.4113e-05, 1.5351e-05, 1.4392e-05, 1.6186e-05, 1.6799e-05, 1.7825e-05], device='cuda:0') 2022-11-15 18:59:52,436 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-15 19:00:13,729 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=28991.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:00:21,861 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29002.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:00:29,546 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 2.015e+02 2.566e+02 3.105e+02 6.744e+02, threshold=5.133e+02, percent-clipped=2.0 2022-11-15 19:00:32,329 INFO [train.py:876] (0/4) Epoch 4, batch 7200, loss[loss=0.2264, simple_loss=0.2118, pruned_loss=0.1205, over 5664.00 frames. ], tot_loss[loss=0.1939, simple_loss=0.1918, pruned_loss=0.09793, over 1086377.87 frames. ], batch size: 32, lr: 1.80e-02, grad_scale: 8.0 2022-11-15 19:00:59,549 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2751, 2.3304, 1.9483, 2.3094, 2.4206, 2.1700, 1.9048, 2.0111], device='cuda:0'), covar=tensor([0.0387, 0.0582, 0.1550, 0.0537, 0.0440, 0.0488, 0.0654, 0.0678], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0129, 0.0204, 0.0132, 0.0156, 0.0132, 0.0133, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:01:03,731 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29063.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:01:22,852 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-4.pt 2022-11-15 19:02:11,321 INFO [train.py:876] (0/4) Epoch 5, batch 0, loss[loss=0.2652, simple_loss=0.2423, pruned_loss=0.1441, over 5702.00 frames. ], tot_loss[loss=0.2652, simple_loss=0.2423, pruned_loss=0.1441, over 5702.00 frames. ], batch size: 36, lr: 1.67e-02, grad_scale: 16.0 2022-11-15 19:02:11,323 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 19:02:24,968 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2636, 1.0141, 0.7589, 0.9096, 1.0270, 1.1745, 0.9905, 0.7537], device='cuda:0'), covar=tensor([0.0303, 0.0236, 0.0426, 0.0346, 0.0317, 0.0111, 0.0342, 0.0228], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0011, 0.0010, 0.0010, 0.0009, 0.0008, 0.0010, 0.0009], device='cuda:0'), out_proj_covar=tensor([3.5605e-05, 4.2313e-05, 3.7726e-05, 4.1032e-05, 3.6362e-05, 3.2453e-05, 3.8988e-05, 3.6036e-05], device='cuda:0') 2022-11-15 19:02:28,899 INFO [train.py:908] (0/4) Epoch 5, validation: loss=0.1679, simple_loss=0.1892, pruned_loss=0.07329, over 1530663.00 frames. 2022-11-15 19:02:28,900 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 19:02:46,021 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.011e+02 1.898e+02 2.233e+02 2.969e+02 5.666e+02, threshold=4.467e+02, percent-clipped=2.0 2022-11-15 19:02:58,323 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1482, 4.7773, 3.5366, 2.1623, 4.5682, 2.1784, 4.4075, 2.5257], device='cuda:0'), covar=tensor([0.1151, 0.0119, 0.0411, 0.2309, 0.0182, 0.1826, 0.0149, 0.1956], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0100, 0.0105, 0.0125, 0.0102, 0.0134, 0.0089, 0.0126], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 19:03:06,473 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29142.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:03:10,783 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29148.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:03:40,110 INFO [train.py:876] (0/4) Epoch 5, batch 100, loss[loss=0.1622, simple_loss=0.1761, pruned_loss=0.07414, over 5549.00 frames. ], tot_loss[loss=0.1906, simple_loss=0.1907, pruned_loss=0.09529, over 438282.72 frames. ], batch size: 13, lr: 1.67e-02, grad_scale: 16.0 2022-11-15 19:03:41,273 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29190.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:03:45,637 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29196.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:03:48,330 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 19:03:58,638 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.864e+02 2.257e+02 2.875e+02 4.325e+02, threshold=4.514e+02, percent-clipped=0.0 2022-11-15 19:04:01,602 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=29217.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:04:02,623 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-15 19:04:16,094 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29236.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:04:54,397 INFO [train.py:876] (0/4) Epoch 5, batch 200, loss[loss=0.1748, simple_loss=0.1818, pruned_loss=0.08387, over 5509.00 frames. ], tot_loss[loss=0.1871, simple_loss=0.1879, pruned_loss=0.09311, over 698851.66 frames. ], batch size: 10, lr: 1.66e-02, grad_scale: 16.0 2022-11-15 19:05:00,110 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29297.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:05:11,282 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.852e+02 2.311e+02 2.896e+02 4.298e+02, threshold=4.622e+02, percent-clipped=0.0 2022-11-15 19:05:31,874 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.24 vs. limit=5.0 2022-11-15 19:05:34,244 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9141, 1.3037, 1.5949, 1.3397, 1.7679, 1.4982, 1.2301, 1.9449], device='cuda:0'), covar=tensor([0.0318, 0.1309, 0.0788, 0.0838, 0.0548, 0.1083, 0.1423, 0.0344], device='cuda:0'), in_proj_covar=tensor([0.0161, 0.0216, 0.0214, 0.0296, 0.0205, 0.0222, 0.0199, 0.0165], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 19:05:40,852 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2867, 3.4658, 2.7772, 1.7546, 3.3430, 1.3461, 3.3795, 1.8136], device='cuda:0'), covar=tensor([0.1021, 0.0136, 0.0617, 0.1720, 0.0185, 0.1973, 0.0156, 0.1697], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0099, 0.0105, 0.0121, 0.0100, 0.0132, 0.0087, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 19:05:43,280 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=29358.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:06:04,996 INFO [train.py:876] (0/4) Epoch 5, batch 300, loss[loss=0.2043, simple_loss=0.2068, pruned_loss=0.1009, over 5681.00 frames. ], tot_loss[loss=0.1878, simple_loss=0.1879, pruned_loss=0.09382, over 852491.16 frames. ], batch size: 34, lr: 1.66e-02, grad_scale: 8.0 2022-11-15 19:06:11,629 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0044, 0.8078, 0.8069, 0.5823, 0.8110, 0.8850, 0.7249, 0.4979], device='cuda:0'), covar=tensor([0.0385, 0.0356, 0.0471, 0.0532, 0.0442, 0.0468, 0.0645, 0.0459], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0011, 0.0010, 0.0010, 0.0009, 0.0008, 0.0010, 0.0009], device='cuda:0'), out_proj_covar=tensor([3.4848e-05, 4.1483e-05, 3.8394e-05, 3.9964e-05, 3.5019e-05, 3.2262e-05, 3.9547e-05, 3.5824e-05], device='cuda:0') 2022-11-15 19:06:22,462 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.196e+02 1.934e+02 2.477e+02 3.130e+02 8.486e+02, threshold=4.954e+02, percent-clipped=7.0 2022-11-15 19:06:59,101 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7099, 4.0055, 3.1210, 1.8580, 3.7772, 1.3045, 3.5709, 2.1835], device='cuda:0'), covar=tensor([0.1029, 0.0141, 0.0608, 0.2256, 0.0179, 0.2099, 0.0181, 0.1982], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0100, 0.0106, 0.0123, 0.0101, 0.0132, 0.0089, 0.0126], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 19:07:15,086 INFO [train.py:876] (0/4) Epoch 5, batch 400, loss[loss=0.2334, simple_loss=0.2127, pruned_loss=0.1271, over 5371.00 frames. ], tot_loss[loss=0.188, simple_loss=0.1879, pruned_loss=0.09403, over 950367.21 frames. ], batch size: 70, lr: 1.66e-02, grad_scale: 8.0 2022-11-15 19:07:32,202 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.241e+02 1.992e+02 2.391e+02 2.865e+02 5.865e+02, threshold=4.782e+02, percent-clipped=1.0 2022-11-15 19:07:34,756 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29517.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:07:58,392 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29551.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:08:07,906 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29565.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:08:25,427 INFO [train.py:876] (0/4) Epoch 5, batch 500, loss[loss=0.1889, simple_loss=0.1924, pruned_loss=0.09277, over 5636.00 frames. ], tot_loss[loss=0.1867, simple_loss=0.1871, pruned_loss=0.09314, over 1006776.41 frames. ], batch size: 29, lr: 1.66e-02, grad_scale: 8.0 2022-11-15 19:08:27,520 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=29592.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:08:27,806 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.13 vs. limit=2.0 2022-11-15 19:08:30,805 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2022-11-15 19:08:41,584 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29612.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:08:42,746 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.299e+01 1.690e+02 2.176e+02 2.799e+02 4.262e+02, threshold=4.352e+02, percent-clipped=0.0 2022-11-15 19:08:52,774 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2022-11-15 19:09:14,441 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29658.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:09:17,228 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29662.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:09:36,296 INFO [train.py:876] (0/4) Epoch 5, batch 600, loss[loss=0.1249, simple_loss=0.1404, pruned_loss=0.05474, over 5447.00 frames. ], tot_loss[loss=0.1871, simple_loss=0.187, pruned_loss=0.09359, over 1038670.14 frames. ], batch size: 10, lr: 1.65e-02, grad_scale: 8.0 2022-11-15 19:09:44,842 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9327, 5.1594, 3.5640, 4.8447, 3.8751, 3.4253, 3.1398, 4.4594], device='cuda:0'), covar=tensor([0.1283, 0.0123, 0.0753, 0.0146, 0.0286, 0.0808, 0.1404, 0.0103], device='cuda:0'), in_proj_covar=tensor([0.0178, 0.0121, 0.0165, 0.0121, 0.0156, 0.0180, 0.0186, 0.0128], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:09:48,477 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29706.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:09:53,666 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.408e+01 2.026e+02 2.597e+02 3.156e+02 6.029e+02, threshold=5.193e+02, percent-clipped=5.0 2022-11-15 19:09:59,862 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29723.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 19:10:46,602 INFO [train.py:876] (0/4) Epoch 5, batch 700, loss[loss=0.1199, simple_loss=0.139, pruned_loss=0.05044, over 5736.00 frames. ], tot_loss[loss=0.1872, simple_loss=0.1869, pruned_loss=0.09372, over 1052119.67 frames. ], batch size: 11, lr: 1.65e-02, grad_scale: 8.0 2022-11-15 19:10:56,335 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=29802.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:11:04,298 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.261e+02 2.007e+02 2.393e+02 2.831e+02 4.977e+02, threshold=4.787e+02, percent-clipped=0.0 2022-11-15 19:11:39,254 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=29863.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:11:47,903 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.26 vs. limit=2.0 2022-11-15 19:11:57,080 INFO [train.py:876] (0/4) Epoch 5, batch 800, loss[loss=0.1269, simple_loss=0.1419, pruned_loss=0.05599, over 5483.00 frames. ], tot_loss[loss=0.1841, simple_loss=0.1848, pruned_loss=0.09165, over 1061069.14 frames. ], batch size: 10, lr: 1.65e-02, grad_scale: 8.0 2022-11-15 19:11:59,158 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=29892.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:12:10,337 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=29907.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:12:15,045 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.261e+02 1.917e+02 2.325e+02 2.760e+02 6.321e+02, threshold=4.650e+02, percent-clipped=1.0 2022-11-15 19:12:33,174 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=29940.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:12:54,403 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 19:12:54,832 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7802, 4.7448, 3.5995, 2.2018, 4.5473, 1.7116, 4.2205, 2.7191], device='cuda:0'), covar=tensor([0.1089, 0.0099, 0.0376, 0.1831, 0.0110, 0.1657, 0.0287, 0.1395], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0098, 0.0107, 0.0122, 0.0098, 0.0132, 0.0088, 0.0125], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 19:13:07,972 INFO [train.py:876] (0/4) Epoch 5, batch 900, loss[loss=0.1119, simple_loss=0.1273, pruned_loss=0.04821, over 3653.00 frames. ], tot_loss[loss=0.1834, simple_loss=0.1847, pruned_loss=0.09103, over 1071379.97 frames. ], batch size: 4, lr: 1.65e-02, grad_scale: 8.0 2022-11-15 19:13:16,119 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-30000.pt 2022-11-15 19:13:29,502 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.538e+01 1.888e+02 2.305e+02 2.835e+02 5.650e+02, threshold=4.611e+02, percent-clipped=1.0 2022-11-15 19:13:31,425 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.13 vs. limit=2.0 2022-11-15 19:13:32,270 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30018.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 19:13:52,511 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4128, 1.7188, 2.5222, 3.1356, 3.3037, 2.4719, 1.8961, 3.2810], device='cuda:0'), covar=tensor([0.0373, 0.3445, 0.2150, 0.4126, 0.0741, 0.3082, 0.2435, 0.0393], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0217, 0.0212, 0.0300, 0.0212, 0.0228, 0.0201, 0.0168], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 19:14:22,091 INFO [train.py:876] (0/4) Epoch 5, batch 1000, loss[loss=0.09223, simple_loss=0.1026, pruned_loss=0.04093, over 4947.00 frames. ], tot_loss[loss=0.1828, simple_loss=0.1848, pruned_loss=0.09039, over 1078534.69 frames. ], batch size: 5, lr: 1.64e-02, grad_scale: 8.0 2022-11-15 19:14:39,368 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.933e+01 1.915e+02 2.282e+02 2.914e+02 7.246e+02, threshold=4.564e+02, percent-clipped=3.0 2022-11-15 19:14:47,769 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30125.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:15:10,215 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30158.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:15:30,414 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30186.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:15:32,253 INFO [train.py:876] (0/4) Epoch 5, batch 1100, loss[loss=0.1006, simple_loss=0.1241, pruned_loss=0.03857, over 4760.00 frames. ], tot_loss[loss=0.1842, simple_loss=0.186, pruned_loss=0.09119, over 1082911.93 frames. ], batch size: 5, lr: 1.64e-02, grad_scale: 8.0 2022-11-15 19:15:39,690 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2022-11-15 19:15:45,335 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30207.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:15:46,757 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30209.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:15:49,991 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.297e+02 1.933e+02 2.265e+02 2.887e+02 4.331e+02, threshold=4.530e+02, percent-clipped=0.0 2022-11-15 19:16:19,262 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30255.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:16:30,097 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30270.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:16:43,129 INFO [train.py:876] (0/4) Epoch 5, batch 1200, loss[loss=0.2419, simple_loss=0.2043, pruned_loss=0.1397, over 4051.00 frames. ], tot_loss[loss=0.1839, simple_loss=0.1852, pruned_loss=0.09127, over 1082484.07 frames. ], batch size: 181, lr: 1.64e-02, grad_scale: 8.0 2022-11-15 19:16:58,852 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9345, 0.8216, 0.7595, 0.7897, 1.0369, 1.2471, 0.7975, 0.5526], device='cuda:0'), covar=tensor([0.0521, 0.0451, 0.0528, 0.0887, 0.0223, 0.0406, 0.0549, 0.0525], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0011, 0.0009, 0.0010, 0.0009, 0.0008, 0.0010, 0.0009], device='cuda:0'), out_proj_covar=tensor([3.6550e-05, 4.2887e-05, 3.8111e-05, 4.2077e-05, 3.5759e-05, 3.3343e-05, 3.9286e-05, 3.7324e-05], device='cuda:0') 2022-11-15 19:17:00,684 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.969e+02 2.334e+02 2.925e+02 5.027e+02, threshold=4.667e+02, percent-clipped=3.0 2022-11-15 19:17:03,527 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30318.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 19:17:33,177 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30360.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:17:34,474 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8012, 4.2234, 3.6258, 4.2175, 4.2525, 3.5535, 3.7338, 3.5619], device='cuda:0'), covar=tensor([0.0469, 0.0421, 0.1602, 0.0412, 0.0415, 0.0468, 0.0540, 0.0542], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0138, 0.0216, 0.0138, 0.0168, 0.0142, 0.0142, 0.0127], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:17:37,208 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30366.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:17:53,002 INFO [train.py:876] (0/4) Epoch 5, batch 1300, loss[loss=0.1891, simple_loss=0.188, pruned_loss=0.09511, over 5784.00 frames. ], tot_loss[loss=0.1848, simple_loss=0.1858, pruned_loss=0.09195, over 1086928.22 frames. ], batch size: 21, lr: 1.63e-02, grad_scale: 8.0 2022-11-15 19:18:10,526 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.384e+02 1.960e+02 2.546e+02 3.212e+02 9.234e+02, threshold=5.093e+02, percent-clipped=6.0 2022-11-15 19:18:15,966 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30421.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:18:41,956 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30458.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:18:45,994 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 19:18:58,170 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30481.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:19:04,003 INFO [train.py:876] (0/4) Epoch 5, batch 1400, loss[loss=0.1825, simple_loss=0.1963, pruned_loss=0.08438, over 5787.00 frames. ], tot_loss[loss=0.1826, simple_loss=0.1842, pruned_loss=0.09049, over 1086837.83 frames. ], batch size: 22, lr: 1.63e-02, grad_scale: 8.0 2022-11-15 19:19:15,745 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30506.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:19:21,872 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.868e+02 2.246e+02 2.844e+02 4.562e+02, threshold=4.491e+02, percent-clipped=0.0 2022-11-15 19:19:33,167 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6163, 1.6101, 1.2680, 1.4842, 1.3566, 1.7597, 1.3882, 1.0494], device='cuda:0'), covar=tensor([0.0012, 0.0017, 0.0029, 0.0015, 0.0019, 0.0030, 0.0025, 0.0024], device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0015, 0.0014, 0.0016, 0.0016, 0.0015, 0.0018, 0.0016], device='cuda:0'), out_proj_covar=tensor([1.6433e-05, 1.6601e-05, 1.5403e-05, 1.6881e-05, 1.6341e-05, 1.6906e-05, 1.8984e-05, 1.9451e-05], device='cuda:0') 2022-11-15 19:19:57,523 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30565.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:20:04,413 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-15 19:20:14,011 INFO [train.py:876] (0/4) Epoch 5, batch 1500, loss[loss=0.1773, simple_loss=0.1957, pruned_loss=0.07945, over 5756.00 frames. ], tot_loss[loss=0.1817, simple_loss=0.1837, pruned_loss=0.08989, over 1081807.87 frames. ], batch size: 16, lr: 1.63e-02, grad_scale: 8.0 2022-11-15 19:20:31,688 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.945e+02 2.500e+02 2.887e+02 6.503e+02, threshold=4.999e+02, percent-clipped=3.0 2022-11-15 19:20:50,811 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6337, 3.3388, 2.8677, 3.2871, 3.3159, 2.9367, 2.8675, 2.9260], device='cuda:0'), covar=tensor([0.1827, 0.0425, 0.1467, 0.0407, 0.0476, 0.0485, 0.0590, 0.0462], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0137, 0.0216, 0.0135, 0.0167, 0.0141, 0.0141, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:20:52,234 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4913, 2.3036, 2.9766, 1.0342, 1.6560, 3.0669, 2.4552, 1.5276], device='cuda:0'), covar=tensor([0.0250, 0.0489, 0.0198, 0.1308, 0.1538, 0.1493, 0.0570, 0.0467], device='cuda:0'), in_proj_covar=tensor([0.0046, 0.0042, 0.0044, 0.0051, 0.0043, 0.0037, 0.0038, 0.0043], device='cuda:0'), out_proj_covar=tensor([8.1808e-05, 7.2729e-05, 7.4665e-05, 9.8851e-05, 7.8959e-05, 7.2415e-05, 6.8871e-05, 7.6738e-05], device='cuda:0') 2022-11-15 19:20:53,239 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.96 vs. limit=2.0 2022-11-15 19:21:01,543 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30655.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:21:25,659 INFO [train.py:876] (0/4) Epoch 5, batch 1600, loss[loss=0.1813, simple_loss=0.1909, pruned_loss=0.08588, over 5697.00 frames. ], tot_loss[loss=0.1833, simple_loss=0.1849, pruned_loss=0.09081, over 1078856.38 frames. ], batch size: 12, lr: 1.63e-02, grad_scale: 8.0 2022-11-15 19:21:43,137 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.815e+02 2.150e+02 2.703e+02 4.613e+02, threshold=4.301e+02, percent-clipped=0.0 2022-11-15 19:21:44,597 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=30716.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:21:44,710 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30716.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 19:21:57,640 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-15 19:22:31,303 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30781.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:22:36,913 INFO [train.py:876] (0/4) Epoch 5, batch 1700, loss[loss=0.1876, simple_loss=0.1897, pruned_loss=0.0928, over 5538.00 frames. ], tot_loss[loss=0.1803, simple_loss=0.1829, pruned_loss=0.08885, over 1082204.36 frames. ], batch size: 46, lr: 1.62e-02, grad_scale: 8.0 2022-11-15 19:22:54,316 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.936e+02 2.464e+02 3.039e+02 4.896e+02, threshold=4.928e+02, percent-clipped=3.0 2022-11-15 19:23:04,933 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30829.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:23:29,933 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=30865.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:23:45,849 INFO [train.py:876] (0/4) Epoch 5, batch 1800, loss[loss=0.201, simple_loss=0.2021, pruned_loss=0.09994, over 5670.00 frames. ], tot_loss[loss=0.1801, simple_loss=0.1825, pruned_loss=0.08885, over 1083325.94 frames. ], batch size: 36, lr: 1.62e-02, grad_scale: 8.0 2022-11-15 19:23:49,717 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0341, 3.5117, 3.1054, 2.8692, 2.0719, 3.2265, 2.0488, 2.7792], device='cuda:0'), covar=tensor([0.0357, 0.0086, 0.0153, 0.0283, 0.0355, 0.0133, 0.0326, 0.0092], device='cuda:0'), in_proj_covar=tensor([0.0160, 0.0111, 0.0128, 0.0138, 0.0154, 0.0126, 0.0143, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:23:55,999 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30903.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:24:02,784 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=30913.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:24:03,364 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.810e+02 2.217e+02 2.817e+02 4.516e+02, threshold=4.435e+02, percent-clipped=0.0 2022-11-15 19:24:10,692 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30925.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:24:16,530 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=30934.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:24:27,149 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3386, 1.4187, 1.5772, 1.0391, 1.1855, 1.8291, 1.1813, 1.1111], device='cuda:0'), covar=tensor([0.0017, 0.0030, 0.0023, 0.0026, 0.0032, 0.0024, 0.0023, 0.0030], device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0015, 0.0014, 0.0017, 0.0016, 0.0016, 0.0018, 0.0016], device='cuda:0'), out_proj_covar=tensor([1.6250e-05, 1.6329e-05, 1.5383e-05, 1.7746e-05, 1.6384e-05, 1.6811e-05, 1.8810e-05, 1.9459e-05], device='cuda:0') 2022-11-15 19:24:34,892 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.94 vs. limit=5.0 2022-11-15 19:24:37,860 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30964.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:24:52,282 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30986.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:24:54,009 INFO [train.py:876] (0/4) Epoch 5, batch 1900, loss[loss=0.1969, simple_loss=0.1736, pruned_loss=0.1101, over 4167.00 frames. ], tot_loss[loss=0.1822, simple_loss=0.184, pruned_loss=0.09019, over 1077139.83 frames. ], batch size: 181, lr: 1.62e-02, grad_scale: 8.0 2022-11-15 19:24:58,416 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=30995.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:25:10,262 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31011.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 19:25:10,331 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31011.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 19:25:12,009 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.895e+02 2.325e+02 2.903e+02 5.344e+02, threshold=4.651e+02, percent-clipped=5.0 2022-11-15 19:25:13,452 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31016.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:25:46,274 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31064.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:25:51,709 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31072.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 19:25:58,282 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6142, 1.5779, 2.0492, 0.9792, 0.9256, 2.0428, 1.4295, 1.4291], device='cuda:0'), covar=tensor([0.0355, 0.0610, 0.0331, 0.1256, 0.1847, 0.0669, 0.0791, 0.0935], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0040, 0.0042, 0.0047, 0.0040, 0.0034, 0.0035, 0.0042], device='cuda:0'), out_proj_covar=tensor([7.7283e-05, 7.0404e-05, 7.1949e-05, 9.2061e-05, 7.4655e-05, 6.8800e-05, 6.5538e-05, 7.3803e-05], device='cuda:0') 2022-11-15 19:26:02,816 INFO [train.py:876] (0/4) Epoch 5, batch 2000, loss[loss=0.19, simple_loss=0.1961, pruned_loss=0.09192, over 5651.00 frames. ], tot_loss[loss=0.1822, simple_loss=0.1837, pruned_loss=0.09034, over 1070817.92 frames. ], batch size: 32, lr: 1.62e-02, grad_scale: 8.0 2022-11-15 19:26:20,067 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.737e+02 2.220e+02 2.843e+02 5.773e+02, threshold=4.440e+02, percent-clipped=3.0 2022-11-15 19:26:24,271 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 19:26:34,639 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.94 vs. limit=5.0 2022-11-15 19:27:10,091 INFO [train.py:876] (0/4) Epoch 5, batch 2100, loss[loss=0.1783, simple_loss=0.1819, pruned_loss=0.08731, over 5307.00 frames. ], tot_loss[loss=0.1836, simple_loss=0.185, pruned_loss=0.09114, over 1075374.18 frames. ], batch size: 79, lr: 1.61e-02, grad_scale: 8.0 2022-11-15 19:27:26,926 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.254e+02 1.972e+02 2.588e+02 3.306e+02 8.013e+02, threshold=5.176e+02, percent-clipped=4.0 2022-11-15 19:27:58,194 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31259.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:28:13,290 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31281.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:28:18,980 INFO [train.py:876] (0/4) Epoch 5, batch 2200, loss[loss=0.2031, simple_loss=0.2024, pruned_loss=0.1019, over 5453.00 frames. ], tot_loss[loss=0.1817, simple_loss=0.1839, pruned_loss=0.08972, over 1083508.63 frames. ], batch size: 53, lr: 1.61e-02, grad_scale: 8.0 2022-11-15 19:28:19,739 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31290.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:28:33,494 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31311.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:28:35,401 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 2.016e+02 2.479e+02 3.242e+02 5.312e+02, threshold=4.958e+02, percent-clipped=2.0 2022-11-15 19:28:44,763 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1936, 2.4066, 2.0792, 2.8261, 1.9474, 2.7698, 1.9802, 3.0964], device='cuda:0'), covar=tensor([0.0782, 0.2142, 0.2915, 0.2017, 0.2169, 0.0905, 0.1784, 0.0986], device='cuda:0'), in_proj_covar=tensor([0.0052, 0.0059, 0.0076, 0.0050, 0.0064, 0.0053, 0.0067, 0.0049], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:29:06,517 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31359.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:29:11,628 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31367.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 19:29:15,261 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 19:29:25,316 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1103, 2.5991, 2.0647, 2.9441, 1.9761, 2.6206, 2.0110, 3.1175], device='cuda:0'), covar=tensor([0.0812, 0.1097, 0.2431, 0.1218, 0.1944, 0.0747, 0.1606, 0.1595], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0060, 0.0077, 0.0051, 0.0065, 0.0054, 0.0069, 0.0049], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:29:26,896 INFO [train.py:876] (0/4) Epoch 5, batch 2300, loss[loss=0.1338, simple_loss=0.1562, pruned_loss=0.05565, over 5707.00 frames. ], tot_loss[loss=0.1764, simple_loss=0.1798, pruned_loss=0.08649, over 1079914.63 frames. ], batch size: 15, lr: 1.61e-02, grad_scale: 16.0 2022-11-15 19:29:38,065 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.02 vs. limit=2.0 2022-11-15 19:29:41,111 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31409.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:29:44,269 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.900e+02 2.299e+02 2.871e+02 4.466e+02, threshold=4.598e+02, percent-clipped=0.0 2022-11-15 19:29:47,072 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5235, 1.7503, 2.9925, 2.2221, 3.3138, 1.9944, 2.8793, 3.4779], device='cuda:0'), covar=tensor([0.0253, 0.1222, 0.0457, 0.1125, 0.0307, 0.0948, 0.0692, 0.0318], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0189, 0.0172, 0.0201, 0.0171, 0.0181, 0.0216, 0.0196], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:0') 2022-11-15 19:29:54,414 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1460, 2.3184, 3.6455, 2.9609, 4.1378, 2.6035, 3.6967, 4.2103], device='cuda:0'), covar=tensor([0.0206, 0.1010, 0.0342, 0.0772, 0.0157, 0.0771, 0.0484, 0.0360], device='cuda:0'), in_proj_covar=tensor([0.0174, 0.0188, 0.0172, 0.0199, 0.0171, 0.0180, 0.0214, 0.0196], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003], device='cuda:0') 2022-11-15 19:30:03,859 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.99 vs. limit=2.0 2022-11-15 19:30:16,680 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8984, 3.8991, 3.8748, 4.1330, 3.4842, 3.2163, 4.4547, 3.7802], device='cuda:0'), covar=tensor([0.0477, 0.0757, 0.0391, 0.0638, 0.0576, 0.0380, 0.0604, 0.0540], device='cuda:0'), in_proj_covar=tensor([0.0063, 0.0083, 0.0069, 0.0084, 0.0066, 0.0055, 0.0106, 0.0069], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:30:22,592 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31470.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:30:35,178 INFO [train.py:876] (0/4) Epoch 5, batch 2400, loss[loss=0.2317, simple_loss=0.2189, pruned_loss=0.1223, over 5545.00 frames. ], tot_loss[loss=0.1775, simple_loss=0.1808, pruned_loss=0.08706, over 1079431.20 frames. ], batch size: 46, lr: 1.61e-02, grad_scale: 16.0 2022-11-15 19:30:37,080 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31491.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 19:30:53,270 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.223e+02 1.788e+02 2.240e+02 2.771e+02 4.453e+02, threshold=4.479e+02, percent-clipped=0.0 2022-11-15 19:30:55,812 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-15 19:30:57,283 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8720, 4.1773, 3.1923, 1.8141, 3.8646, 1.5131, 3.8660, 2.0190], device='cuda:0'), covar=tensor([0.1173, 0.0150, 0.0558, 0.2143, 0.0219, 0.2228, 0.0211, 0.2122], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0097, 0.0107, 0.0121, 0.0101, 0.0132, 0.0088, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 19:31:08,525 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5234, 1.0231, 1.6575, 1.3896, 1.1556, 1.9702, 1.5100, 1.3423], device='cuda:0'), covar=tensor([0.0024, 0.0041, 0.0050, 0.0019, 0.0042, 0.0044, 0.0020, 0.0028], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0015, 0.0015, 0.0017, 0.0017, 0.0016, 0.0018, 0.0017], device='cuda:0'), out_proj_covar=tensor([1.6111e-05, 1.6483e-05, 1.5273e-05, 1.7740e-05, 1.7196e-05, 1.6630e-05, 1.9495e-05, 2.0108e-05], device='cuda:0') 2022-11-15 19:31:19,170 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31552.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 19:31:24,379 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31559.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:31:38,728 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31581.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:31:43,846 INFO [train.py:876] (0/4) Epoch 5, batch 2500, loss[loss=0.1684, simple_loss=0.1824, pruned_loss=0.07721, over 5586.00 frames. ], tot_loss[loss=0.1774, simple_loss=0.1813, pruned_loss=0.08675, over 1083523.71 frames. ], batch size: 18, lr: 1.60e-02, grad_scale: 16.0 2022-11-15 19:31:44,620 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31590.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:31:56,591 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31607.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:32:01,410 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.811e+02 2.202e+02 2.747e+02 5.680e+02, threshold=4.404e+02, percent-clipped=5.0 2022-11-15 19:32:11,658 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31629.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:32:13,061 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4539, 2.4415, 1.9360, 2.4923, 1.9715, 2.4384, 2.2932, 2.6278], device='cuda:0'), covar=tensor([0.0667, 0.1281, 0.2644, 0.1759, 0.1805, 0.0988, 0.1640, 0.2417], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0060, 0.0076, 0.0051, 0.0066, 0.0055, 0.0069, 0.0050], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:32:17,650 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31638.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:32:18,050 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2022-11-15 19:32:37,442 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=31667.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 19:32:52,307 INFO [train.py:876] (0/4) Epoch 5, batch 2600, loss[loss=0.156, simple_loss=0.1748, pruned_loss=0.06867, over 5477.00 frames. ], tot_loss[loss=0.1784, simple_loss=0.182, pruned_loss=0.0874, over 1085541.78 frames. ], batch size: 12, lr: 1.60e-02, grad_scale: 16.0 2022-11-15 19:32:52,754 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2022-11-15 19:33:02,099 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31704.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:33:03,543 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2022-11-15 19:33:08,478 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.912e+02 2.361e+02 2.985e+02 5.385e+02, threshold=4.723e+02, percent-clipped=4.0 2022-11-15 19:33:09,210 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=31715.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 19:33:25,795 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.06 vs. limit=5.0 2022-11-15 19:33:40,787 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8137, 0.7399, 1.0536, 1.5405, 0.9534, 1.2685, 1.3033, 1.2435], device='cuda:0'), covar=tensor([0.0021, 0.0074, 0.0060, 0.0019, 0.0091, 0.0057, 0.0020, 0.0032], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0015, 0.0015, 0.0017, 0.0017, 0.0015, 0.0018, 0.0016], device='cuda:0'), out_proj_covar=tensor([1.6030e-05, 1.6461e-05, 1.5490e-05, 1.7527e-05, 1.7434e-05, 1.6066e-05, 1.9702e-05, 1.9682e-05], device='cuda:0') 2022-11-15 19:33:43,348 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31765.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:33:43,447 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31765.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:34:00,059 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31788.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:34:00,581 INFO [train.py:876] (0/4) Epoch 5, batch 2700, loss[loss=0.2166, simple_loss=0.2198, pruned_loss=0.1066, over 5593.00 frames. ], tot_loss[loss=0.1768, simple_loss=0.1812, pruned_loss=0.08625, over 1089575.63 frames. ], batch size: 43, lr: 1.60e-02, grad_scale: 16.0 2022-11-15 19:34:17,215 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.996e+02 2.440e+02 3.153e+02 7.916e+02, threshold=4.880e+02, percent-clipped=8.0 2022-11-15 19:34:23,133 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31823.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:34:34,461 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.92 vs. limit=5.0 2022-11-15 19:34:40,050 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=31847.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 19:34:41,485 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31849.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:35:00,776 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31878.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:35:04,132 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.79 vs. limit=2.0 2022-11-15 19:35:05,091 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31884.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:35:08,455 INFO [train.py:876] (0/4) Epoch 5, batch 2800, loss[loss=0.2165, simple_loss=0.2128, pruned_loss=0.1101, over 5586.00 frames. ], tot_loss[loss=0.1785, simple_loss=0.1823, pruned_loss=0.08734, over 1089735.69 frames. ], batch size: 24, lr: 1.60e-02, grad_scale: 16.0 2022-11-15 19:35:25,010 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.889e+02 2.395e+02 3.111e+02 5.606e+02, threshold=4.789e+02, percent-clipped=5.0 2022-11-15 19:35:25,904 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31915.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:35:42,294 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31939.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:35:50,333 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=31950.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:35:54,364 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2022-11-15 19:36:07,077 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=31976.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:36:15,574 INFO [train.py:876] (0/4) Epoch 5, batch 2900, loss[loss=0.1441, simple_loss=0.1632, pruned_loss=0.06252, over 5825.00 frames. ], tot_loss[loss=0.1784, simple_loss=0.1822, pruned_loss=0.08728, over 1085519.58 frames. ], batch size: 18, lr: 1.59e-02, grad_scale: 16.0 2022-11-15 19:36:31,852 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=32011.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:36:33,617 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.337e+02 1.876e+02 2.342e+02 2.951e+02 6.735e+02, threshold=4.684e+02, percent-clipped=4.0 2022-11-15 19:36:54,373 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2022-11-15 19:36:54,939 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4567, 2.8509, 3.6343, 4.3842, 4.5747, 3.7697, 2.8618, 4.5767], device='cuda:0'), covar=tensor([0.0219, 0.3664, 0.1954, 0.3613, 0.0483, 0.2226, 0.2124, 0.0190], device='cuda:0'), in_proj_covar=tensor([0.0173, 0.0215, 0.0213, 0.0310, 0.0217, 0.0222, 0.0204, 0.0171], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 19:37:04,952 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32060.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:37:08,624 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32065.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:37:24,195 INFO [train.py:876] (0/4) Epoch 5, batch 3000, loss[loss=0.2092, simple_loss=0.2054, pruned_loss=0.1065, over 5766.00 frames. ], tot_loss[loss=0.1785, simple_loss=0.1817, pruned_loss=0.08768, over 1087820.03 frames. ], batch size: 20, lr: 1.59e-02, grad_scale: 16.0 2022-11-15 19:37:24,197 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 19:37:39,097 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3271, 3.0410, 2.4569, 1.6435, 3.1185, 1.1284, 2.9332, 1.7224], device='cuda:0'), covar=tensor([0.0617, 0.0184, 0.0624, 0.1065, 0.0141, 0.1357, 0.0216, 0.0846], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0099, 0.0107, 0.0122, 0.0099, 0.0132, 0.0089, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 19:37:41,543 INFO [train.py:908] (0/4) Epoch 5, validation: loss=0.1632, simple_loss=0.186, pruned_loss=0.07021, over 1530663.00 frames. 2022-11-15 19:37:41,543 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 19:37:58,635 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32113.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:37:59,205 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.218e+02 2.101e+02 2.644e+02 3.414e+02 5.284e+02, threshold=5.288e+02, percent-clipped=5.0 2022-11-15 19:38:09,887 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1883, 2.8296, 3.2717, 1.4016, 3.0787, 3.5650, 3.2779, 3.8245], device='cuda:0'), covar=tensor([0.1636, 0.1468, 0.0589, 0.2262, 0.0216, 0.0365, 0.0217, 0.0279], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0182, 0.0138, 0.0185, 0.0140, 0.0142, 0.0127, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 19:38:13,700 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7742, 4.3362, 3.7469, 4.1941, 4.2847, 3.5897, 3.8074, 3.6712], device='cuda:0'), covar=tensor([0.0410, 0.0310, 0.1133, 0.0413, 0.0335, 0.0383, 0.0541, 0.0573], device='cuda:0'), in_proj_covar=tensor([0.0111, 0.0131, 0.0211, 0.0134, 0.0158, 0.0135, 0.0141, 0.0127], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:38:19,076 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32144.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:38:21,102 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32147.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 19:38:26,706 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6590, 1.9677, 2.5546, 3.3701, 3.5022, 2.7122, 1.8898, 3.5066], device='cuda:0'), covar=tensor([0.0299, 0.3587, 0.2817, 0.3155, 0.0840, 0.2782, 0.2558, 0.0282], device='cuda:0'), in_proj_covar=tensor([0.0171, 0.0215, 0.0211, 0.0311, 0.0215, 0.0219, 0.0202, 0.0171], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 19:38:41,140 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2022-11-15 19:38:43,312 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32179.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:38:49,858 INFO [train.py:876] (0/4) Epoch 5, batch 3100, loss[loss=0.1486, simple_loss=0.1573, pruned_loss=0.06994, over 5288.00 frames. ], tot_loss[loss=0.1783, simple_loss=0.1816, pruned_loss=0.08752, over 1085444.06 frames. ], batch size: 9, lr: 1.59e-02, grad_scale: 16.0 2022-11-15 19:38:53,718 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32195.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 19:39:07,482 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.018e+02 2.024e+02 2.459e+02 3.147e+02 6.395e+02, threshold=4.918e+02, percent-clipped=1.0 2022-11-15 19:39:21,165 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32234.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:39:46,913 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32271.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:39:49,210 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.01 vs. limit=2.0 2022-11-15 19:39:58,835 INFO [train.py:876] (0/4) Epoch 5, batch 3200, loss[loss=0.2099, simple_loss=0.207, pruned_loss=0.1064, over 5715.00 frames. ], tot_loss[loss=0.1792, simple_loss=0.1826, pruned_loss=0.08791, over 1081677.59 frames. ], batch size: 34, lr: 1.59e-02, grad_scale: 16.0 2022-11-15 19:40:10,374 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=32306.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:40:16,208 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.227e+02 1.788e+02 2.177e+02 2.789e+02 5.595e+02, threshold=4.355e+02, percent-clipped=3.0 2022-11-15 19:40:22,803 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2022-11-15 19:40:38,446 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8178, 1.2720, 1.1818, 1.3779, 1.0969, 1.3597, 1.0274, 1.3579], device='cuda:0'), covar=tensor([0.0904, 0.0716, 0.0968, 0.0310, 0.0881, 0.0855, 0.1014, 0.0302], device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0056, 0.0075, 0.0048, 0.0061, 0.0053, 0.0067, 0.0047], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:40:47,296 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32360.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:41:07,144 INFO [train.py:876] (0/4) Epoch 5, batch 3300, loss[loss=0.2087, simple_loss=0.2051, pruned_loss=0.1061, over 5640.00 frames. ], tot_loss[loss=0.1791, simple_loss=0.1822, pruned_loss=0.08802, over 1084757.79 frames. ], batch size: 38, lr: 1.58e-02, grad_scale: 16.0 2022-11-15 19:41:18,071 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. limit=2.0 2022-11-15 19:41:19,784 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32408.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:41:24,094 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.275e+02 1.817e+02 2.244e+02 2.768e+02 6.144e+02, threshold=4.488e+02, percent-clipped=4.0 2022-11-15 19:41:44,493 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32444.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:42:04,770 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2621, 0.9539, 1.8321, 1.1933, 1.1090, 1.7468, 0.8254, 0.8495], device='cuda:0'), covar=tensor([0.0014, 0.0037, 0.0012, 0.0017, 0.0033, 0.0021, 0.0022, 0.0028], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0014, 0.0014, 0.0015, 0.0015, 0.0014, 0.0017, 0.0015], device='cuda:0'), out_proj_covar=tensor([1.4224e-05, 1.5139e-05, 1.4218e-05, 1.5493e-05, 1.5657e-05, 1.4342e-05, 1.7922e-05, 1.7680e-05], device='cuda:0') 2022-11-15 19:42:05,997 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5151, 4.1324, 4.4039, 4.1177, 4.5657, 4.5255, 4.2501, 4.6080], device='cuda:0'), covar=tensor([0.0435, 0.0268, 0.0407, 0.0295, 0.0398, 0.0136, 0.0190, 0.0255], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0108, 0.0086, 0.0117, 0.0117, 0.0070, 0.0094, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:42:08,048 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32479.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:42:15,161 INFO [train.py:876] (0/4) Epoch 5, batch 3400, loss[loss=0.1799, simple_loss=0.1878, pruned_loss=0.08597, over 5710.00 frames. ], tot_loss[loss=0.1792, simple_loss=0.1825, pruned_loss=0.08795, over 1090918.48 frames. ], batch size: 36, lr: 1.58e-02, grad_scale: 16.0 2022-11-15 19:42:17,560 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32492.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:42:32,057 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.882e+02 2.362e+02 2.947e+02 5.374e+02, threshold=4.725e+02, percent-clipped=3.0 2022-11-15 19:42:40,950 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32527.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:42:46,036 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32534.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:42:49,746 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.54 vs. limit=2.0 2022-11-15 19:42:55,760 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2714, 2.4859, 3.3352, 4.0569, 4.4504, 3.6611, 3.0768, 4.4498], device='cuda:0'), covar=tensor([0.0240, 0.4618, 0.1953, 0.2834, 0.0540, 0.2335, 0.1831, 0.0192], device='cuda:0'), in_proj_covar=tensor([0.0172, 0.0218, 0.0217, 0.0316, 0.0219, 0.0227, 0.0205, 0.0171], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 19:43:10,883 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32571.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:43:16,566 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5556, 3.8912, 3.0553, 1.7634, 3.7166, 1.2069, 3.6859, 2.1261], device='cuda:0'), covar=tensor([0.1292, 0.0173, 0.0671, 0.2450, 0.0194, 0.2487, 0.0200, 0.2169], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0102, 0.0109, 0.0124, 0.0101, 0.0134, 0.0092, 0.0127], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 19:43:18,441 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32582.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:43:23,545 INFO [train.py:876] (0/4) Epoch 5, batch 3500, loss[loss=0.1798, simple_loss=0.1803, pruned_loss=0.08964, over 5555.00 frames. ], tot_loss[loss=0.1818, simple_loss=0.1842, pruned_loss=0.08965, over 1088676.97 frames. ], batch size: 40, lr: 1.58e-02, grad_scale: 16.0 2022-11-15 19:43:35,722 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=32606.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:43:40,884 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 1.935e+02 2.270e+02 2.895e+02 5.164e+02, threshold=4.540e+02, percent-clipped=2.0 2022-11-15 19:43:44,152 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32619.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:44:08,843 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=32654.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:44:11,884 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2022-11-15 19:44:24,697 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0783, 4.5306, 3.5008, 2.1457, 4.2049, 1.6449, 3.8698, 2.4882], device='cuda:0'), covar=tensor([0.0904, 0.0116, 0.0514, 0.1794, 0.0153, 0.1753, 0.0224, 0.1438], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0099, 0.0107, 0.0121, 0.0101, 0.0131, 0.0089, 0.0123], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 19:44:32,333 INFO [train.py:876] (0/4) Epoch 5, batch 3600, loss[loss=0.1655, simple_loss=0.1776, pruned_loss=0.07674, over 5593.00 frames. ], tot_loss[loss=0.1799, simple_loss=0.183, pruned_loss=0.08839, over 1086164.63 frames. ], batch size: 25, lr: 1.58e-02, grad_scale: 16.0 2022-11-15 19:44:49,900 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.827e+02 2.423e+02 3.084e+02 7.397e+02, threshold=4.846e+02, percent-clipped=5.0 2022-11-15 19:45:03,150 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8140, 2.8732, 2.1467, 2.7071, 1.8912, 2.3341, 1.6382, 2.5764], device='cuda:0'), covar=tensor([0.1188, 0.0181, 0.0943, 0.0295, 0.0948, 0.0803, 0.1630, 0.0327], device='cuda:0'), in_proj_covar=tensor([0.0170, 0.0123, 0.0164, 0.0129, 0.0160, 0.0177, 0.0182, 0.0136], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:45:40,987 INFO [train.py:876] (0/4) Epoch 5, batch 3700, loss[loss=0.2044, simple_loss=0.1996, pruned_loss=0.1046, over 5521.00 frames. ], tot_loss[loss=0.1785, simple_loss=0.1822, pruned_loss=0.08742, over 1086932.56 frames. ], batch size: 17, lr: 1.58e-02, grad_scale: 16.0 2022-11-15 19:45:57,989 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.334e+02 2.092e+02 2.533e+02 3.307e+02 5.477e+02, threshold=5.066e+02, percent-clipped=1.0 2022-11-15 19:46:14,937 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-15 19:46:24,474 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2022-11-15 19:46:25,746 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2022-11-15 19:46:27,621 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 19:46:37,571 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.27 vs. limit=5.0 2022-11-15 19:46:49,253 INFO [train.py:876] (0/4) Epoch 5, batch 3800, loss[loss=0.1364, simple_loss=0.152, pruned_loss=0.06037, over 5494.00 frames. ], tot_loss[loss=0.18, simple_loss=0.1831, pruned_loss=0.08843, over 1090897.87 frames. ], batch size: 12, lr: 1.57e-02, grad_scale: 16.0 2022-11-15 19:47:03,551 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1909, 2.8913, 2.9316, 1.4374, 2.9654, 3.2073, 2.9312, 3.2413], device='cuda:0'), covar=tensor([0.1567, 0.1336, 0.0658, 0.2471, 0.0332, 0.0363, 0.0391, 0.0445], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0190, 0.0145, 0.0201, 0.0147, 0.0150, 0.0134, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 19:47:05,646 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.79 vs. limit=5.0 2022-11-15 19:47:05,911 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.821e+02 2.317e+02 3.245e+02 5.660e+02, threshold=4.635e+02, percent-clipped=3.0 2022-11-15 19:47:15,042 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1951, 1.2955, 1.5298, 1.0613, 0.7331, 2.0445, 1.4957, 1.0678], device='cuda:0'), covar=tensor([0.0686, 0.0817, 0.0654, 0.1601, 0.1932, 0.1182, 0.0967, 0.0932], device='cuda:0'), in_proj_covar=tensor([0.0048, 0.0041, 0.0044, 0.0052, 0.0044, 0.0037, 0.0040, 0.0041], device='cuda:0'), out_proj_covar=tensor([8.8106e-05, 7.6693e-05, 7.8545e-05, 1.0129e-04, 8.4422e-05, 7.5749e-05, 7.5196e-05, 7.6251e-05], device='cuda:0') 2022-11-15 19:47:42,278 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=32966.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:47:57,891 INFO [train.py:876] (0/4) Epoch 5, batch 3900, loss[loss=0.1644, simple_loss=0.1807, pruned_loss=0.0741, over 5717.00 frames. ], tot_loss[loss=0.1807, simple_loss=0.1833, pruned_loss=0.08903, over 1085838.46 frames. ], batch size: 17, lr: 1.57e-02, grad_scale: 16.0 2022-11-15 19:48:14,684 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9526, 2.3259, 2.2532, 1.3091, 2.3353, 2.6205, 2.4834, 2.4136], device='cuda:0'), covar=tensor([0.1414, 0.1293, 0.0866, 0.2199, 0.0411, 0.0402, 0.0255, 0.0765], device='cuda:0'), in_proj_covar=tensor([0.0189, 0.0187, 0.0142, 0.0199, 0.0149, 0.0149, 0.0134, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 19:48:15,129 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.831e+02 2.324e+02 2.912e+02 5.041e+02, threshold=4.648e+02, percent-clipped=1.0 2022-11-15 19:48:17,136 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3225, 3.2803, 3.3162, 3.5083, 3.1889, 2.9189, 3.7611, 3.2539], device='cuda:0'), covar=tensor([0.0469, 0.0794, 0.0437, 0.0875, 0.0635, 0.0380, 0.0845, 0.0613], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0086, 0.0071, 0.0088, 0.0068, 0.0057, 0.0111, 0.0073], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:48:24,292 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33027.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:48:34,354 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9631, 2.3818, 1.7624, 2.4214, 1.6045, 1.7164, 1.7525, 2.6115], device='cuda:0'), covar=tensor([0.0701, 0.0886, 0.2748, 0.1159, 0.1412, 0.1269, 0.1563, 0.1087], device='cuda:0'), in_proj_covar=tensor([0.0051, 0.0057, 0.0075, 0.0049, 0.0063, 0.0054, 0.0068, 0.0046], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:48:52,383 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0080, 2.2795, 1.7020, 2.4411, 1.5321, 1.6800, 1.7635, 2.3665], device='cuda:0'), covar=tensor([0.0566, 0.0917, 0.2774, 0.0677, 0.1708, 0.1068, 0.1819, 0.0932], device='cuda:0'), in_proj_covar=tensor([0.0052, 0.0058, 0.0076, 0.0049, 0.0064, 0.0055, 0.0069, 0.0047], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:48:52,387 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33069.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:49:06,330 INFO [train.py:876] (0/4) Epoch 5, batch 4000, loss[loss=0.1466, simple_loss=0.1674, pruned_loss=0.06286, over 5517.00 frames. ], tot_loss[loss=0.1836, simple_loss=0.1854, pruned_loss=0.09091, over 1083870.81 frames. ], batch size: 17, lr: 1.57e-02, grad_scale: 16.0 2022-11-15 19:49:23,825 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.282e+02 1.884e+02 2.400e+02 2.913e+02 6.279e+02, threshold=4.801e+02, percent-clipped=5.0 2022-11-15 19:49:34,158 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33130.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:49:51,341 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33155.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:49:51,925 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7094, 3.5167, 3.9149, 3.6098, 3.6010, 3.4389, 1.5389, 3.8013], device='cuda:0'), covar=tensor([0.0375, 0.0474, 0.0231, 0.0330, 0.0469, 0.0519, 0.3223, 0.0364], device='cuda:0'), in_proj_covar=tensor([0.0096, 0.0071, 0.0072, 0.0063, 0.0088, 0.0074, 0.0129, 0.0096], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:49:53,492 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.96 vs. limit=5.0 2022-11-15 19:50:13,957 INFO [train.py:876] (0/4) Epoch 5, batch 4100, loss[loss=0.196, simple_loss=0.1954, pruned_loss=0.09834, over 5560.00 frames. ], tot_loss[loss=0.1812, simple_loss=0.1837, pruned_loss=0.08938, over 1083613.78 frames. ], batch size: 25, lr: 1.57e-02, grad_scale: 8.0 2022-11-15 19:50:32,027 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.879e+02 2.350e+02 3.001e+02 5.532e+02, threshold=4.700e+02, percent-clipped=2.0 2022-11-15 19:50:32,849 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33216.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:50:58,822 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2022-11-15 19:51:03,915 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0052, 3.4966, 2.4174, 3.2450, 2.5032, 2.5285, 1.8831, 2.9366], device='cuda:0'), covar=tensor([0.1162, 0.0163, 0.0788, 0.0224, 0.0660, 0.0775, 0.1508, 0.0265], device='cuda:0'), in_proj_covar=tensor([0.0178, 0.0130, 0.0171, 0.0132, 0.0162, 0.0184, 0.0186, 0.0138], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:51:22,796 INFO [train.py:876] (0/4) Epoch 5, batch 4200, loss[loss=0.197, simple_loss=0.1864, pruned_loss=0.1038, over 5045.00 frames. ], tot_loss[loss=0.1807, simple_loss=0.1833, pruned_loss=0.08903, over 1077124.93 frames. ], batch size: 110, lr: 1.56e-02, grad_scale: 8.0 2022-11-15 19:51:40,446 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.845e+02 2.173e+02 2.647e+02 4.072e+02, threshold=4.345e+02, percent-clipped=0.0 2022-11-15 19:51:40,768 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.81 vs. limit=5.0 2022-11-15 19:51:45,056 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=33322.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:51:57,405 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7764, 4.1454, 3.6737, 3.2138, 2.4241, 4.1261, 2.3634, 3.3779], device='cuda:0'), covar=tensor([0.0307, 0.0210, 0.0143, 0.0295, 0.0381, 0.0092, 0.0305, 0.0074], device='cuda:0'), in_proj_covar=tensor([0.0165, 0.0117, 0.0134, 0.0144, 0.0160, 0.0127, 0.0149, 0.0113], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:52:17,260 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.68 vs. limit=5.0 2022-11-15 19:52:24,410 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4241, 0.8066, 1.3063, 1.0012, 0.9773, 1.1514, 0.8571, 1.0041], device='cuda:0'), covar=tensor([0.0632, 0.0586, 0.0873, 0.2303, 0.3163, 0.1307, 0.1259, 0.0592], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0012, 0.0009, 0.0010, 0.0009, 0.0009, 0.0010, 0.0009], device='cuda:0'), out_proj_covar=tensor([3.7527e-05, 4.7729e-05, 3.9892e-05, 4.3754e-05, 4.0568e-05, 3.7924e-05, 4.1605e-05, 4.0965e-05], device='cuda:0') 2022-11-15 19:52:30,460 INFO [train.py:876] (0/4) Epoch 5, batch 4300, loss[loss=0.1739, simple_loss=0.1654, pruned_loss=0.09124, over 5278.00 frames. ], tot_loss[loss=0.1799, simple_loss=0.1826, pruned_loss=0.08859, over 1074616.12 frames. ], batch size: 79, lr: 1.56e-02, grad_scale: 8.0 2022-11-15 19:52:44,172 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2022-11-15 19:52:49,001 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.372e+01 1.969e+02 2.435e+02 3.163e+02 9.091e+02, threshold=4.870e+02, percent-clipped=6.0 2022-11-15 19:52:55,802 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=33425.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:52:56,479 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3132, 4.3620, 2.8406, 4.0263, 3.3813, 2.8654, 2.1634, 3.6513], device='cuda:0'), covar=tensor([0.1401, 0.0150, 0.0893, 0.0256, 0.0442, 0.0921, 0.1850, 0.0222], device='cuda:0'), in_proj_covar=tensor([0.0173, 0.0128, 0.0165, 0.0129, 0.0159, 0.0181, 0.0181, 0.0133], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:52:57,818 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5773, 2.3325, 2.7047, 3.5814, 3.6574, 2.8053, 2.5865, 3.9828], device='cuda:0'), covar=tensor([0.0315, 0.3334, 0.2621, 0.4245, 0.0863, 0.3181, 0.1929, 0.0351], device='cuda:0'), in_proj_covar=tensor([0.0168, 0.0211, 0.0212, 0.0314, 0.0214, 0.0223, 0.0201, 0.0171], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 19:53:28,326 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33472.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:53:39,861 INFO [train.py:876] (0/4) Epoch 5, batch 4400, loss[loss=0.1352, simple_loss=0.1634, pruned_loss=0.05351, over 5803.00 frames. ], tot_loss[loss=0.1804, simple_loss=0.1837, pruned_loss=0.08855, over 1085139.65 frames. ], batch size: 21, lr: 1.56e-02, grad_scale: 8.0 2022-11-15 19:53:55,856 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=33511.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:53:58,512 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.879e+02 2.441e+02 2.964e+02 5.680e+02, threshold=4.882e+02, percent-clipped=2.0 2022-11-15 19:54:00,501 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2584, 3.9393, 3.9872, 1.4788, 3.6509, 4.1160, 4.0155, 4.6360], device='cuda:0'), covar=tensor([0.1735, 0.0986, 0.0391, 0.2283, 0.0173, 0.0199, 0.0239, 0.0184], device='cuda:0'), in_proj_covar=tensor([0.0187, 0.0183, 0.0140, 0.0191, 0.0142, 0.0147, 0.0132, 0.0171], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 19:54:12,051 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33533.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:54:43,107 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7022, 3.6582, 3.6419, 3.4673, 3.5573, 3.6297, 1.3701, 3.7875], device='cuda:0'), covar=tensor([0.0298, 0.0314, 0.0224, 0.0266, 0.0436, 0.0311, 0.2913, 0.0443], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0068, 0.0070, 0.0062, 0.0088, 0.0073, 0.0123, 0.0094], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:54:50,516 INFO [train.py:876] (0/4) Epoch 5, batch 4500, loss[loss=0.1495, simple_loss=0.1654, pruned_loss=0.06678, over 5530.00 frames. ], tot_loss[loss=0.1775, simple_loss=0.1818, pruned_loss=0.08666, over 1090465.93 frames. ], batch size: 17, lr: 1.56e-02, grad_scale: 8.0 2022-11-15 19:55:08,210 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.892e+02 2.378e+02 2.959e+02 6.563e+02, threshold=4.756e+02, percent-clipped=3.0 2022-11-15 19:55:13,243 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=33622.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:55:36,308 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1728, 3.7424, 3.1824, 3.6055, 3.6818, 3.1109, 3.3106, 2.9465], device='cuda:0'), covar=tensor([0.0799, 0.0441, 0.1594, 0.0585, 0.0524, 0.0509, 0.0488, 0.0791], device='cuda:0'), in_proj_covar=tensor([0.0115, 0.0141, 0.0225, 0.0140, 0.0172, 0.0145, 0.0148, 0.0134], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:55:45,882 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=33670.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:55:48,030 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.81 vs. limit=5.0 2022-11-15 19:55:50,869 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33677.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:55:58,824 INFO [train.py:876] (0/4) Epoch 5, batch 4600, loss[loss=0.2548, simple_loss=0.2315, pruned_loss=0.1391, over 5469.00 frames. ], tot_loss[loss=0.1775, simple_loss=0.1821, pruned_loss=0.08643, over 1093533.16 frames. ], batch size: 58, lr: 1.55e-02, grad_scale: 8.0 2022-11-15 19:56:16,174 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.284e+02 1.825e+02 2.230e+02 2.892e+02 8.047e+02, threshold=4.459e+02, percent-clipped=4.0 2022-11-15 19:56:23,215 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=33725.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:56:32,056 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=33738.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 19:56:39,909 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1535, 3.3547, 2.5803, 1.6952, 3.2953, 1.2589, 3.3201, 1.6691], device='cuda:0'), covar=tensor([0.1267, 0.0157, 0.0761, 0.1739, 0.0187, 0.2000, 0.0182, 0.1752], device='cuda:0'), in_proj_covar=tensor([0.0131, 0.0101, 0.0110, 0.0121, 0.0101, 0.0134, 0.0093, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 19:56:55,338 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.80 vs. limit=2.0 2022-11-15 19:56:55,499 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=33773.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:57:03,541 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2022-11-15 19:57:06,415 INFO [train.py:876] (0/4) Epoch 5, batch 4700, loss[loss=0.1339, simple_loss=0.1535, pruned_loss=0.0571, over 5709.00 frames. ], tot_loss[loss=0.1757, simple_loss=0.1807, pruned_loss=0.08534, over 1092238.34 frames. ], batch size: 15, lr: 1.55e-02, grad_scale: 8.0 2022-11-15 19:57:22,558 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=33811.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:57:25,039 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.755e+02 2.231e+02 2.801e+02 4.827e+02, threshold=4.463e+02, percent-clipped=3.0 2022-11-15 19:57:33,505 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=33828.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:57:50,784 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2414, 4.2146, 2.9252, 3.9078, 3.2480, 3.0766, 2.2875, 3.5543], device='cuda:0'), covar=tensor([0.1742, 0.0204, 0.0939, 0.0303, 0.0589, 0.0872, 0.1949, 0.0266], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0129, 0.0169, 0.0132, 0.0165, 0.0180, 0.0184, 0.0135], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 19:57:55,049 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=33859.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:58:15,453 INFO [train.py:876] (0/4) Epoch 5, batch 4800, loss[loss=0.2523, simple_loss=0.2332, pruned_loss=0.1357, over 5575.00 frames. ], tot_loss[loss=0.1739, simple_loss=0.1792, pruned_loss=0.08428, over 1095615.30 frames. ], batch size: 43, lr: 1.55e-02, grad_scale: 8.0 2022-11-15 19:58:30,165 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3005, 2.6987, 2.0331, 2.9507, 1.7070, 2.1209, 2.3512, 2.5287], device='cuda:0'), covar=tensor([0.0807, 0.1497, 0.2972, 0.0817, 0.2562, 0.1379, 0.1887, 0.2756], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0063, 0.0077, 0.0051, 0.0067, 0.0057, 0.0071, 0.0048], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 19:58:33,232 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.053e+02 1.864e+02 2.250e+02 2.859e+02 4.870e+02, threshold=4.500e+02, percent-clipped=2.0 2022-11-15 19:58:49,522 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=33939.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:58:50,225 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6007, 1.9887, 2.3535, 3.4957, 3.5592, 2.5553, 2.0340, 3.5127], device='cuda:0'), covar=tensor([0.0322, 0.3647, 0.2100, 0.2501, 0.0773, 0.2563, 0.2252, 0.0277], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0215, 0.0218, 0.0322, 0.0215, 0.0226, 0.0207, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 19:59:23,240 INFO [train.py:876] (0/4) Epoch 5, batch 4900, loss[loss=0.1014, simple_loss=0.1272, pruned_loss=0.03783, over 5439.00 frames. ], tot_loss[loss=0.1763, simple_loss=0.1807, pruned_loss=0.08593, over 1097767.29 frames. ], batch size: 10, lr: 1.55e-02, grad_scale: 8.0 2022-11-15 19:59:31,333 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34000.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 19:59:41,366 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.343e+02 1.963e+02 2.433e+02 3.223e+02 8.796e+02, threshold=4.867e+02, percent-clipped=10.0 2022-11-15 19:59:53,931 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34033.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:00:01,245 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34044.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:00:10,117 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1882, 1.1099, 1.4465, 1.3625, 1.2971, 1.6198, 1.2183, 0.8173], device='cuda:0'), covar=tensor([0.0023, 0.0049, 0.0023, 0.0022, 0.0024, 0.0025, 0.0017, 0.0038], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0016, 0.0016, 0.0017, 0.0017, 0.0017, 0.0018, 0.0018], device='cuda:0'), out_proj_covar=tensor([1.7617e-05, 1.7522e-05, 1.6479e-05, 1.7723e-05, 1.7200e-05, 1.7292e-05, 1.9433e-05, 2.0214e-05], device='cuda:0') 2022-11-15 20:00:32,069 INFO [train.py:876] (0/4) Epoch 5, batch 5000, loss[loss=0.1661, simple_loss=0.1765, pruned_loss=0.07783, over 5555.00 frames. ], tot_loss[loss=0.1768, simple_loss=0.1813, pruned_loss=0.08615, over 1092165.72 frames. ], batch size: 16, lr: 1.55e-02, grad_scale: 8.0 2022-11-15 20:00:35,835 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.56 vs. limit=5.0 2022-11-15 20:00:42,670 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34105.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:00:49,474 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.808e+02 2.328e+02 2.773e+02 5.652e+02, threshold=4.656e+02, percent-clipped=1.0 2022-11-15 20:00:55,460 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 20:00:58,868 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34128.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:01:10,306 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34145.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:01:21,161 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-15 20:01:28,869 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 20:01:31,143 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34176.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:01:40,503 INFO [train.py:876] (0/4) Epoch 5, batch 5100, loss[loss=0.113, simple_loss=0.139, pruned_loss=0.04344, over 5684.00 frames. ], tot_loss[loss=0.1776, simple_loss=0.182, pruned_loss=0.08659, over 1089297.14 frames. ], batch size: 12, lr: 1.54e-02, grad_scale: 8.0 2022-11-15 20:01:43,426 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9593, 1.5680, 1.3391, 1.5757, 1.0480, 1.3597, 1.3382, 1.3450], device='cuda:0'), covar=tensor([0.1893, 0.1157, 0.1225, 0.0670, 0.1769, 0.1594, 0.1233, 0.0402], device='cuda:0'), in_proj_covar=tensor([0.0055, 0.0063, 0.0076, 0.0050, 0.0064, 0.0056, 0.0070, 0.0048], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:01:52,676 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34206.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:01:58,448 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.276e+02 1.924e+02 2.234e+02 2.951e+02 5.133e+02, threshold=4.468e+02, percent-clipped=1.0 2022-11-15 20:02:05,239 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2022-11-15 20:02:29,719 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6997, 2.6108, 3.2781, 1.5362, 2.8092, 3.4271, 3.1890, 3.7288], device='cuda:0'), covar=tensor([0.1443, 0.1555, 0.0798, 0.2257, 0.0279, 0.0458, 0.0247, 0.0353], device='cuda:0'), in_proj_covar=tensor([0.0184, 0.0182, 0.0142, 0.0189, 0.0142, 0.0145, 0.0132, 0.0168], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 20:02:42,384 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.75 vs. limit=2.0 2022-11-15 20:02:49,026 INFO [train.py:876] (0/4) Epoch 5, batch 5200, loss[loss=0.2466, simple_loss=0.2111, pruned_loss=0.1411, over 3027.00 frames. ], tot_loss[loss=0.1799, simple_loss=0.1837, pruned_loss=0.08806, over 1086413.45 frames. ], batch size: 284, lr: 1.54e-02, grad_scale: 8.0 2022-11-15 20:02:53,494 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34295.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:03:07,059 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.891e+02 2.372e+02 3.198e+02 5.762e+02, threshold=4.744e+02, percent-clipped=5.0 2022-11-15 20:03:19,809 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34333.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:03:52,329 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34381.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:03:57,989 INFO [train.py:876] (0/4) Epoch 5, batch 5300, loss[loss=0.1752, simple_loss=0.1807, pruned_loss=0.08487, over 5756.00 frames. ], tot_loss[loss=0.1792, simple_loss=0.1827, pruned_loss=0.08789, over 1082047.77 frames. ], batch size: 21, lr: 1.54e-02, grad_scale: 8.0 2022-11-15 20:04:05,524 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34400.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:04:14,423 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9280, 3.5241, 3.0760, 3.4923, 3.5297, 2.9992, 3.0366, 2.9486], device='cuda:0'), covar=tensor([0.1276, 0.0402, 0.1262, 0.0406, 0.0376, 0.0463, 0.0464, 0.0566], device='cuda:0'), in_proj_covar=tensor([0.0113, 0.0141, 0.0220, 0.0136, 0.0168, 0.0143, 0.0147, 0.0131], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:04:15,583 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.782e+02 2.113e+02 2.792e+02 4.181e+02, threshold=4.226e+02, percent-clipped=0.0 2022-11-15 20:04:28,966 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5170, 3.9945, 3.2440, 3.2326, 2.2014, 3.9680, 2.1606, 3.3141], device='cuda:0'), covar=tensor([0.0411, 0.0353, 0.0174, 0.0267, 0.0459, 0.0102, 0.0406, 0.0091], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0120, 0.0133, 0.0148, 0.0160, 0.0130, 0.0150, 0.0115], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:04:41,608 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.41 vs. limit=2.0 2022-11-15 20:05:06,440 INFO [train.py:876] (0/4) Epoch 5, batch 5400, loss[loss=0.1902, simple_loss=0.1951, pruned_loss=0.09266, over 5497.00 frames. ], tot_loss[loss=0.1781, simple_loss=0.1814, pruned_loss=0.08733, over 1083185.79 frames. ], batch size: 49, lr: 1.54e-02, grad_scale: 8.0 2022-11-15 20:05:07,188 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6820, 4.4340, 3.5154, 1.6944, 4.3788, 1.2650, 4.2329, 2.1099], device='cuda:0'), covar=tensor([0.1227, 0.0127, 0.0405, 0.2198, 0.0151, 0.2349, 0.0159, 0.2026], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0101, 0.0110, 0.0120, 0.0104, 0.0133, 0.0094, 0.0122], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 20:05:13,838 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-15 20:05:14,775 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34501.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:05:24,229 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.783e+02 2.368e+02 3.183e+02 6.760e+02, threshold=4.736e+02, percent-clipped=8.0 2022-11-15 20:05:39,685 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6459, 1.2664, 1.4645, 1.4905, 1.3921, 1.7023, 1.5352, 1.2472], device='cuda:0'), covar=tensor([0.0015, 0.0060, 0.0055, 0.0023, 0.0022, 0.0033, 0.0019, 0.0034], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0016, 0.0016, 0.0018, 0.0017, 0.0016, 0.0018, 0.0017], device='cuda:0'), out_proj_covar=tensor([1.7250e-05, 1.6863e-05, 1.6060e-05, 1.7699e-05, 1.7404e-05, 1.6922e-05, 1.9264e-05, 1.9707e-05], device='cuda:0') 2022-11-15 20:05:47,326 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4313, 3.6127, 3.1724, 3.1394, 2.1622, 3.4520, 2.0923, 2.7905], device='cuda:0'), covar=tensor([0.0289, 0.0063, 0.0119, 0.0186, 0.0269, 0.0078, 0.0262, 0.0089], device='cuda:0'), in_proj_covar=tensor([0.0165, 0.0120, 0.0133, 0.0146, 0.0160, 0.0131, 0.0149, 0.0114], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:06:14,513 INFO [train.py:876] (0/4) Epoch 5, batch 5500, loss[loss=0.1803, simple_loss=0.1664, pruned_loss=0.09713, over 4099.00 frames. ], tot_loss[loss=0.1774, simple_loss=0.1811, pruned_loss=0.08683, over 1086484.56 frames. ], batch size: 181, lr: 1.53e-02, grad_scale: 8.0 2022-11-15 20:06:17,327 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34593.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:06:18,642 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34595.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:06:24,680 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.07 vs. limit=2.0 2022-11-15 20:06:32,588 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.877e+02 2.418e+02 2.886e+02 5.617e+02, threshold=4.837e+02, percent-clipped=2.0 2022-11-15 20:06:37,684 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34622.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:06:40,298 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34626.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:06:46,417 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1400, 1.5406, 1.3998, 1.1118, 0.9919, 1.2427, 1.1664, 0.8132], device='cuda:0'), covar=tensor([0.0022, 0.0022, 0.0024, 0.0019, 0.0031, 0.0027, 0.0021, 0.0033], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0016, 0.0016, 0.0018, 0.0018, 0.0017, 0.0019, 0.0017], device='cuda:0'), out_proj_covar=tensor([1.8154e-05, 1.7479e-05, 1.6449e-05, 1.7977e-05, 1.8225e-05, 1.7564e-05, 1.9676e-05, 1.9935e-05], device='cuda:0') 2022-11-15 20:06:51,537 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34643.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:06:58,377 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0370, 3.1102, 3.1005, 2.9308, 3.1440, 2.9731, 1.1092, 3.1955], device='cuda:0'), covar=tensor([0.0376, 0.0364, 0.0263, 0.0282, 0.0345, 0.0396, 0.3096, 0.0393], device='cuda:0'), in_proj_covar=tensor([0.0097, 0.0071, 0.0073, 0.0062, 0.0088, 0.0074, 0.0125, 0.0096], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:06:59,158 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34654.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:07:19,507 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34683.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:07:22,490 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34687.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:07:23,629 INFO [train.py:876] (0/4) Epoch 5, batch 5600, loss[loss=0.2115, simple_loss=0.2035, pruned_loss=0.1098, over 5368.00 frames. ], tot_loss[loss=0.1784, simple_loss=0.182, pruned_loss=0.08747, over 1082610.91 frames. ], batch size: 70, lr: 1.53e-02, grad_scale: 8.0 2022-11-15 20:07:31,130 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34700.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:07:33,390 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2022-11-15 20:07:37,943 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=34710.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:07:38,551 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4804, 1.7798, 1.5212, 1.0553, 0.6130, 2.2325, 1.7082, 0.9111], device='cuda:0'), covar=tensor([0.0550, 0.0687, 0.0609, 0.2053, 0.2328, 0.0575, 0.0929, 0.1125], device='cuda:0'), in_proj_covar=tensor([0.0048, 0.0043, 0.0046, 0.0054, 0.0045, 0.0039, 0.0039, 0.0043], device='cuda:0'), out_proj_covar=tensor([9.1593e-05, 8.2069e-05, 8.5824e-05, 1.0802e-04, 8.8398e-05, 8.0669e-05, 7.8602e-05, 8.2300e-05], device='cuda:0') 2022-11-15 20:07:41,445 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.835e+02 2.169e+02 2.808e+02 5.282e+02, threshold=4.338e+02, percent-clipped=2.0 2022-11-15 20:07:49,220 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.92 vs. limit=5.0 2022-11-15 20:07:58,475 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1913, 0.9518, 1.1555, 0.8731, 1.1370, 0.9590, 0.7690, 0.7982], device='cuda:0'), covar=tensor([0.0378, 0.0535, 0.0426, 0.0947, 0.0355, 0.0424, 0.0951, 0.0602], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0012, 0.0009, 0.0010, 0.0010, 0.0009, 0.0011, 0.0009], device='cuda:0'), out_proj_covar=tensor([3.7471e-05, 5.0442e-05, 3.9607e-05, 4.5577e-05, 4.1948e-05, 3.9446e-05, 4.3234e-05, 4.1355e-05], device='cuda:0') 2022-11-15 20:08:03,953 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34748.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:08:19,708 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=34771.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 20:08:31,299 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.77 vs. limit=2.0 2022-11-15 20:08:32,090 INFO [train.py:876] (0/4) Epoch 5, batch 5700, loss[loss=0.1697, simple_loss=0.1703, pruned_loss=0.08452, over 5684.00 frames. ], tot_loss[loss=0.1774, simple_loss=0.1812, pruned_loss=0.08675, over 1085716.37 frames. ], batch size: 34, lr: 1.53e-02, grad_scale: 8.0 2022-11-15 20:08:40,365 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=34801.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:08:49,699 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.053e+02 1.820e+02 2.216e+02 2.818e+02 4.619e+02, threshold=4.433e+02, percent-clipped=3.0 2022-11-15 20:09:13,152 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=34849.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:09:18,122 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5490, 1.0072, 1.4166, 0.9644, 0.8553, 1.9478, 1.5521, 1.0976], device='cuda:0'), covar=tensor([0.0565, 0.0888, 0.0635, 0.1883, 0.3394, 0.1942, 0.0977, 0.1315], device='cuda:0'), in_proj_covar=tensor([0.0049, 0.0044, 0.0046, 0.0054, 0.0046, 0.0040, 0.0040, 0.0043], device='cuda:0'), out_proj_covar=tensor([9.3039e-05, 8.3291e-05, 8.6263e-05, 1.0815e-04, 8.9247e-05, 8.2322e-05, 7.9095e-05, 8.2979e-05], device='cuda:0') 2022-11-15 20:09:40,485 INFO [train.py:876] (0/4) Epoch 5, batch 5800, loss[loss=0.1553, simple_loss=0.1545, pruned_loss=0.07805, over 5161.00 frames. ], tot_loss[loss=0.1788, simple_loss=0.1827, pruned_loss=0.08746, over 1087315.17 frames. ], batch size: 7, lr: 1.53e-02, grad_scale: 8.0 2022-11-15 20:09:51,020 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4416, 2.0731, 2.9080, 2.7167, 3.0243, 2.1190, 2.8461, 3.4026], device='cuda:0'), covar=tensor([0.0460, 0.1045, 0.0453, 0.0845, 0.0412, 0.0901, 0.0626, 0.0370], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0191, 0.0184, 0.0207, 0.0178, 0.0188, 0.0226, 0.0201], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:0') 2022-11-15 20:09:58,513 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.871e+02 2.247e+02 2.954e+02 6.973e+02, threshold=4.493e+02, percent-clipped=4.0 2022-11-15 20:10:17,696 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.00 vs. limit=5.0 2022-11-15 20:10:21,823 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34949.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:10:23,787 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5932, 4.4055, 4.5765, 4.6344, 4.6560, 4.5054, 1.7844, 4.8323], device='cuda:0'), covar=tensor([0.0276, 0.0399, 0.0234, 0.0185, 0.0349, 0.0404, 0.2712, 0.0231], device='cuda:0'), in_proj_covar=tensor([0.0096, 0.0070, 0.0074, 0.0062, 0.0087, 0.0074, 0.0125, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:10:28,323 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4784, 0.8722, 1.4006, 0.8837, 1.2880, 1.0281, 1.0281, 0.9941], device='cuda:0'), covar=tensor([0.0757, 0.0817, 0.0560, 0.1775, 0.2125, 0.1627, 0.1911, 0.1058], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0013, 0.0010, 0.0011, 0.0010, 0.0009, 0.0011, 0.0009], device='cuda:0'), out_proj_covar=tensor([3.8157e-05, 5.2319e-05, 4.1518e-05, 4.7552e-05, 4.4414e-05, 4.0679e-05, 4.4344e-05, 4.2286e-05], device='cuda:0') 2022-11-15 20:10:38,482 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1856, 3.7217, 2.6239, 3.5426, 2.6550, 2.6195, 1.9782, 3.0690], device='cuda:0'), covar=tensor([0.1212, 0.0140, 0.0913, 0.0237, 0.0700, 0.0927, 0.1746, 0.0267], device='cuda:0'), in_proj_covar=tensor([0.0176, 0.0131, 0.0171, 0.0131, 0.0169, 0.0181, 0.0183, 0.0138], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:10:40,992 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34978.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:10:44,252 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=34982.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:10:48,843 INFO [train.py:876] (0/4) Epoch 5, batch 5900, loss[loss=0.1339, simple_loss=0.1482, pruned_loss=0.05977, over 5504.00 frames. ], tot_loss[loss=0.1766, simple_loss=0.1809, pruned_loss=0.08609, over 1086484.07 frames. ], batch size: 10, lr: 1.53e-02, grad_scale: 8.0 2022-11-15 20:10:56,510 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-35000.pt 2022-11-15 20:11:09,452 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 2.067e+02 2.510e+02 3.048e+02 6.634e+02, threshold=5.021e+02, percent-clipped=2.0 2022-11-15 20:11:32,052 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2022-11-15 20:11:44,830 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35066.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:12:00,906 INFO [train.py:876] (0/4) Epoch 5, batch 6000, loss[loss=0.1452, simple_loss=0.1605, pruned_loss=0.06495, over 5535.00 frames. ], tot_loss[loss=0.1764, simple_loss=0.181, pruned_loss=0.08595, over 1084482.26 frames. ], batch size: 13, lr: 1.52e-02, grad_scale: 8.0 2022-11-15 20:12:00,907 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 20:12:05,411 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6620, 4.1537, 4.4641, 4.3569, 4.6855, 3.8787, 4.8876, 4.7628], device='cuda:0'), covar=tensor([0.0367, 0.0825, 0.0424, 0.1137, 0.0302, 0.0378, 0.0565, 0.0243], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0090, 0.0075, 0.0093, 0.0071, 0.0061, 0.0118, 0.0078], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:12:18,600 INFO [train.py:908] (0/4) Epoch 5, validation: loss=0.1648, simple_loss=0.1864, pruned_loss=0.07158, over 1530663.00 frames. 2022-11-15 20:12:18,601 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 20:12:20,797 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35092.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:12:36,105 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.732e+01 1.810e+02 2.246e+02 2.914e+02 5.187e+02, threshold=4.493e+02, percent-clipped=1.0 2022-11-15 20:12:55,261 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1138, 2.7410, 2.6102, 1.2667, 2.4925, 2.8461, 2.5951, 3.1039], device='cuda:0'), covar=tensor([0.1386, 0.1065, 0.0578, 0.2157, 0.0333, 0.0409, 0.0227, 0.0458], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0180, 0.0140, 0.0193, 0.0150, 0.0147, 0.0135, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 20:12:56,509 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1394, 3.3865, 2.5827, 1.6737, 3.2719, 1.1382, 3.2003, 1.5893], device='cuda:0'), covar=tensor([0.1337, 0.0241, 0.0830, 0.1989, 0.0244, 0.2336, 0.0243, 0.1949], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0103, 0.0109, 0.0119, 0.0103, 0.0134, 0.0093, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 20:13:02,261 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35153.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:13:26,555 INFO [train.py:876] (0/4) Epoch 5, batch 6100, loss[loss=0.0993, simple_loss=0.1204, pruned_loss=0.03912, over 5155.00 frames. ], tot_loss[loss=0.1746, simple_loss=0.18, pruned_loss=0.08462, over 1087155.15 frames. ], batch size: 8, lr: 1.52e-02, grad_scale: 16.0 2022-11-15 20:13:28,855 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-15 20:13:44,499 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.904e+02 2.292e+02 2.845e+02 6.036e+02, threshold=4.585e+02, percent-clipped=4.0 2022-11-15 20:13:58,064 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.32 vs. limit=2.0 2022-11-15 20:14:07,830 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35249.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:14:09,189 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4965, 1.6495, 1.6857, 1.2356, 0.9670, 2.3138, 1.7778, 1.2692], device='cuda:0'), covar=tensor([0.0539, 0.0695, 0.0573, 0.1388, 0.2204, 0.0815, 0.0760, 0.0872], device='cuda:0'), in_proj_covar=tensor([0.0044, 0.0040, 0.0041, 0.0049, 0.0041, 0.0036, 0.0036, 0.0039], device='cuda:0'), out_proj_covar=tensor([8.5049e-05, 7.6519e-05, 7.8152e-05, 9.8337e-05, 8.1639e-05, 7.5651e-05, 7.1878e-05, 7.6333e-05], device='cuda:0') 2022-11-15 20:14:14,775 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.73 vs. limit=2.0 2022-11-15 20:14:28,468 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35278.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:14:31,241 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35282.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:14:36,327 INFO [train.py:876] (0/4) Epoch 5, batch 6200, loss[loss=0.1461, simple_loss=0.173, pruned_loss=0.05957, over 5754.00 frames. ], tot_loss[loss=0.1726, simple_loss=0.1789, pruned_loss=0.0832, over 1088939.40 frames. ], batch size: 27, lr: 1.52e-02, grad_scale: 16.0 2022-11-15 20:14:41,924 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35297.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:14:55,053 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.804e+02 2.260e+02 2.687e+02 5.215e+02, threshold=4.521e+02, percent-clipped=1.0 2022-11-15 20:15:02,893 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35326.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:15:05,751 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35330.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:15:16,321 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35345.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 20:15:18,446 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2271, 3.2831, 3.8287, 1.4078, 3.1893, 3.9564, 3.5653, 4.0961], device='cuda:0'), covar=tensor([0.2049, 0.1275, 0.0512, 0.2569, 0.0392, 0.0303, 0.0287, 0.0410], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0183, 0.0141, 0.0194, 0.0153, 0.0148, 0.0138, 0.0175], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 20:15:30,941 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35366.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 20:15:36,101 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4120, 4.7697, 3.2614, 4.4145, 3.7251, 3.4144, 2.7448, 4.0834], device='cuda:0'), covar=tensor([0.1413, 0.0100, 0.0882, 0.0298, 0.0380, 0.0653, 0.1659, 0.0154], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0133, 0.0171, 0.0131, 0.0169, 0.0180, 0.0184, 0.0138], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:15:36,809 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35374.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:15:47,106 INFO [train.py:876] (0/4) Epoch 5, batch 6300, loss[loss=0.1815, simple_loss=0.1868, pruned_loss=0.08814, over 5756.00 frames. ], tot_loss[loss=0.172, simple_loss=0.1781, pruned_loss=0.08289, over 1085408.12 frames. ], batch size: 31, lr: 1.52e-02, grad_scale: 16.0 2022-11-15 20:15:59,511 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35406.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:16:04,765 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35414.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:16:05,294 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.022e+02 1.779e+02 2.287e+02 3.039e+02 6.063e+02, threshold=4.574e+02, percent-clipped=3.0 2022-11-15 20:16:20,126 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35435.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:16:28,627 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35448.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:16:35,293 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7446, 0.7748, 0.7616, 0.2733, 0.8555, 0.8276, 0.4107, 0.9416], device='cuda:0'), covar=tensor([0.0020, 0.0013, 0.0013, 0.0013, 0.0016, 0.0013, 0.0041, 0.0014], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0025, 0.0028, 0.0026, 0.0024, 0.0024, 0.0026, 0.0022], device='cuda:0'), out_proj_covar=tensor([2.6725e-05, 2.7229e-05, 2.5632e-05, 2.3889e-05, 2.1947e-05, 2.0462e-05, 2.9458e-05, 1.9663e-05], device='cuda:0') 2022-11-15 20:16:50,461 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.06 vs. limit=2.0 2022-11-15 20:16:57,614 INFO [train.py:876] (0/4) Epoch 5, batch 6400, loss[loss=0.2775, simple_loss=0.2307, pruned_loss=0.1621, over 2984.00 frames. ], tot_loss[loss=0.1739, simple_loss=0.1792, pruned_loss=0.08428, over 1084232.44 frames. ], batch size: 284, lr: 1.52e-02, grad_scale: 16.0 2022-11-15 20:17:02,974 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3175, 1.9591, 2.9973, 2.4506, 3.1181, 1.9818, 2.6949, 3.3542], device='cuda:0'), covar=tensor([0.0362, 0.1001, 0.0430, 0.1104, 0.0301, 0.1081, 0.0733, 0.0462], device='cuda:0'), in_proj_covar=tensor([0.0189, 0.0188, 0.0181, 0.0206, 0.0175, 0.0187, 0.0221, 0.0200], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:0') 2022-11-15 20:17:04,148 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.5806, 4.6990, 5.3033, 4.7312, 5.5885, 5.3277, 4.7070, 5.3426], device='cuda:0'), covar=tensor([0.0174, 0.0295, 0.0316, 0.0278, 0.0182, 0.0091, 0.0213, 0.0354], device='cuda:0'), in_proj_covar=tensor([0.0106, 0.0115, 0.0088, 0.0117, 0.0121, 0.0070, 0.0099, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:17:14,787 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.260e+02 1.875e+02 2.323e+02 3.297e+02 5.699e+02, threshold=4.646e+02, percent-clipped=4.0 2022-11-15 20:17:20,501 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7922, 2.4654, 2.1242, 1.4484, 2.1013, 2.5030, 2.2398, 2.7213], device='cuda:0'), covar=tensor([0.1558, 0.1115, 0.0984, 0.2119, 0.0515, 0.0405, 0.0278, 0.0548], device='cuda:0'), in_proj_covar=tensor([0.0183, 0.0183, 0.0140, 0.0192, 0.0152, 0.0147, 0.0138, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 20:17:25,523 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35530.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:17:35,243 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2022-11-15 20:18:05,886 INFO [train.py:876] (0/4) Epoch 5, batch 6500, loss[loss=0.1612, simple_loss=0.1653, pruned_loss=0.07862, over 5691.00 frames. ], tot_loss[loss=0.1732, simple_loss=0.1788, pruned_loss=0.08383, over 1089381.29 frames. ], batch size: 28, lr: 1.51e-02, grad_scale: 16.0 2022-11-15 20:18:07,317 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35591.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:18:23,696 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.209e+02 1.920e+02 2.418e+02 3.178e+02 5.825e+02, threshold=4.835e+02, percent-clipped=5.0 2022-11-15 20:19:14,314 INFO [train.py:876] (0/4) Epoch 5, batch 6600, loss[loss=0.2465, simple_loss=0.225, pruned_loss=0.134, over 5543.00 frames. ], tot_loss[loss=0.1714, simple_loss=0.1775, pruned_loss=0.08265, over 1088001.11 frames. ], batch size: 43, lr: 1.51e-02, grad_scale: 16.0 2022-11-15 20:19:22,591 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35701.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 20:19:31,772 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.292e+02 1.753e+02 2.153e+02 2.756e+02 5.336e+02, threshold=4.306e+02, percent-clipped=2.0 2022-11-15 20:19:42,131 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35730.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:19:42,530 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.87 vs. limit=2.0 2022-11-15 20:19:54,533 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=35748.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:20:09,978 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35771.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:20:13,216 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6620, 1.8213, 1.6683, 2.1950, 1.4592, 1.6674, 1.5488, 2.1994], device='cuda:0'), covar=tensor([0.1010, 0.1310, 0.2681, 0.0708, 0.1676, 0.1447, 0.1609, 0.0491], device='cuda:0'), in_proj_covar=tensor([0.0059, 0.0064, 0.0083, 0.0054, 0.0067, 0.0060, 0.0075, 0.0053], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:20:22,005 INFO [train.py:876] (0/4) Epoch 5, batch 6700, loss[loss=0.161, simple_loss=0.176, pruned_loss=0.07303, over 5640.00 frames. ], tot_loss[loss=0.1719, simple_loss=0.1779, pruned_loss=0.08297, over 1091659.94 frames. ], batch size: 32, lr: 1.51e-02, grad_scale: 16.0 2022-11-15 20:20:23,734 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7152, 3.0383, 2.3192, 2.8864, 1.9791, 2.3756, 1.5613, 2.7558], device='cuda:0'), covar=tensor([0.1295, 0.0170, 0.0752, 0.0261, 0.0905, 0.0785, 0.1687, 0.0257], device='cuda:0'), in_proj_covar=tensor([0.0178, 0.0133, 0.0170, 0.0129, 0.0170, 0.0182, 0.0184, 0.0141], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:20:26,992 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=35796.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:20:38,040 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=35811.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:20:40,468 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.856e+02 2.372e+02 2.960e+02 5.756e+02, threshold=4.743e+02, percent-clipped=4.0 2022-11-15 20:20:42,547 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9463, 1.2790, 1.1822, 0.5184, 0.7183, 1.1442, 0.9579, 0.7979], device='cuda:0'), covar=tensor([0.0015, 0.0008, 0.0011, 0.0011, 0.0020, 0.0013, 0.0018, 0.0023], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0016, 0.0016, 0.0018, 0.0016, 0.0016, 0.0018, 0.0017], device='cuda:0'), out_proj_covar=tensor([1.6383e-05, 1.6588e-05, 1.6062e-05, 1.7665e-05, 1.6360e-05, 1.6368e-05, 1.8307e-05, 1.9861e-05], device='cuda:0') 2022-11-15 20:20:50,907 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.83 vs. limit=5.0 2022-11-15 20:20:52,211 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35832.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:21:19,386 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=35872.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:21:28,453 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=35886.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:21:30,760 INFO [train.py:876] (0/4) Epoch 5, batch 6800, loss[loss=0.2359, simple_loss=0.2177, pruned_loss=0.1271, over 5448.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.1782, pruned_loss=0.08372, over 1087338.28 frames. ], batch size: 64, lr: 1.51e-02, grad_scale: 16.0 2022-11-15 20:21:48,382 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.265e+02 1.967e+02 2.535e+02 3.123e+02 6.625e+02, threshold=5.070e+02, percent-clipped=2.0 2022-11-15 20:22:02,823 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1625, 4.2320, 2.9245, 4.0594, 3.2756, 2.9654, 2.0533, 3.4497], device='cuda:0'), covar=tensor([0.1540, 0.0154, 0.0914, 0.0199, 0.0555, 0.0974, 0.1962, 0.0255], device='cuda:0'), in_proj_covar=tensor([0.0181, 0.0133, 0.0173, 0.0131, 0.0173, 0.0184, 0.0186, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:22:16,250 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2022-11-15 20:22:20,830 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5208, 4.6445, 3.3945, 4.5065, 3.5994, 3.1642, 2.3817, 3.7685], device='cuda:0'), covar=tensor([0.1424, 0.0153, 0.0676, 0.0158, 0.0387, 0.0933, 0.1712, 0.0182], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0131, 0.0169, 0.0129, 0.0169, 0.0180, 0.0182, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:22:38,599 INFO [train.py:876] (0/4) Epoch 5, batch 6900, loss[loss=0.1613, simple_loss=0.1762, pruned_loss=0.07318, over 5765.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.178, pruned_loss=0.08373, over 1084214.97 frames. ], batch size: 20, lr: 1.51e-02, grad_scale: 16.0 2022-11-15 20:22:46,843 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36001.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 20:22:49,726 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9073, 4.7243, 4.7430, 5.0048, 4.7971, 4.3703, 5.5088, 4.8320], device='cuda:0'), covar=tensor([0.0480, 0.0855, 0.0509, 0.0780, 0.0392, 0.0436, 0.0666, 0.0696], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0088, 0.0076, 0.0093, 0.0071, 0.0060, 0.0119, 0.0078], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:22:56,602 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.741e+02 2.226e+02 2.702e+02 5.830e+02, threshold=4.452e+02, percent-clipped=1.0 2022-11-15 20:22:56,937 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.12 vs. limit=5.0 2022-11-15 20:23:07,219 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36030.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:23:19,726 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36049.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 20:23:32,739 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2022-11-15 20:23:39,888 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36078.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:23:44,317 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7614, 1.8812, 1.5065, 2.2714, 1.4649, 1.5796, 1.6505, 2.2459], device='cuda:0'), covar=tensor([0.1116, 0.1614, 0.3149, 0.1090, 0.2617, 0.1477, 0.1939, 0.1524], device='cuda:0'), in_proj_covar=tensor([0.0060, 0.0065, 0.0082, 0.0053, 0.0067, 0.0060, 0.0074, 0.0053], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:23:47,620 INFO [train.py:876] (0/4) Epoch 5, batch 7000, loss[loss=0.1706, simple_loss=0.1794, pruned_loss=0.08086, over 5715.00 frames. ], tot_loss[loss=0.1747, simple_loss=0.1798, pruned_loss=0.08481, over 1084981.12 frames. ], batch size: 28, lr: 1.50e-02, grad_scale: 16.0 2022-11-15 20:23:58,886 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8308, 4.5629, 3.6075, 2.0752, 4.3464, 2.0514, 4.0940, 2.5375], device='cuda:0'), covar=tensor([0.1224, 0.0122, 0.0479, 0.2107, 0.0126, 0.1701, 0.0197, 0.1680], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0105, 0.0112, 0.0124, 0.0107, 0.0136, 0.0096, 0.0125], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 20:24:05,016 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.861e+02 2.290e+02 2.878e+02 5.762e+02, threshold=4.579e+02, percent-clipped=5.0 2022-11-15 20:24:07,114 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4696, 5.4001, 4.1240, 2.6150, 5.1730, 2.7776, 4.8636, 3.1230], device='cuda:0'), covar=tensor([0.0969, 0.0117, 0.0267, 0.1632, 0.0096, 0.1174, 0.0164, 0.1294], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0104, 0.0111, 0.0123, 0.0106, 0.0134, 0.0095, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 20:24:13,296 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36127.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:24:40,738 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36167.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:24:45,550 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.95 vs. limit=5.0 2022-11-15 20:24:51,021 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36181.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:24:54,730 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36186.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:24:56,577 INFO [train.py:876] (0/4) Epoch 5, batch 7100, loss[loss=0.2092, simple_loss=0.2018, pruned_loss=0.1082, over 5672.00 frames. ], tot_loss[loss=0.1739, simple_loss=0.1793, pruned_loss=0.08428, over 1080572.60 frames. ], batch size: 34, lr: 1.50e-02, grad_scale: 16.0 2022-11-15 20:25:03,099 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-15 20:25:14,391 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.219e+01 1.811e+02 2.272e+02 2.779e+02 4.389e+02, threshold=4.544e+02, percent-clipped=0.0 2022-11-15 20:25:27,657 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36234.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:25:32,004 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0465, 3.6239, 3.1216, 3.5996, 3.5844, 3.1718, 3.2374, 3.0699], device='cuda:0'), covar=tensor([0.1142, 0.0415, 0.1342, 0.0369, 0.0524, 0.0431, 0.0582, 0.0545], device='cuda:0'), in_proj_covar=tensor([0.0114, 0.0141, 0.0227, 0.0138, 0.0173, 0.0145, 0.0154, 0.0134], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:25:33,399 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36242.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:25:53,930 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1473, 0.7991, 1.3384, 1.0102, 1.0918, 1.0346, 1.2335, 0.6826], device='cuda:0'), covar=tensor([0.0034, 0.0058, 0.0032, 0.0027, 0.0025, 0.0046, 0.0021, 0.0042], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0015, 0.0016, 0.0018, 0.0016, 0.0016, 0.0017, 0.0018], device='cuda:0'), out_proj_covar=tensor([1.6267e-05, 1.6251e-05, 1.5897e-05, 1.7663e-05, 1.5846e-05, 1.6935e-05, 1.7368e-05, 2.0112e-05], device='cuda:0') 2022-11-15 20:25:58,460 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9287, 1.0792, 1.2674, 0.8112, 0.9079, 1.0765, 0.5565, 1.0792], device='cuda:0'), covar=tensor([0.0025, 0.0017, 0.0021, 0.0031, 0.0019, 0.0022, 0.0049, 0.0026], device='cuda:0'), in_proj_covar=tensor([0.0030, 0.0027, 0.0030, 0.0029, 0.0027, 0.0027, 0.0028, 0.0024], device='cuda:0'), out_proj_covar=tensor([2.9308e-05, 2.9038e-05, 2.7304e-05, 2.6248e-05, 2.4192e-05, 2.2964e-05, 3.0924e-05, 2.1849e-05], device='cuda:0') 2022-11-15 20:26:05,724 INFO [train.py:876] (0/4) Epoch 5, batch 7200, loss[loss=0.1643, simple_loss=0.1695, pruned_loss=0.0795, over 5699.00 frames. ], tot_loss[loss=0.174, simple_loss=0.1791, pruned_loss=0.08449, over 1077014.45 frames. ], batch size: 15, lr: 1.50e-02, grad_scale: 16.0 2022-11-15 20:26:10,524 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7928, 2.1054, 1.6660, 2.1607, 1.5538, 1.5817, 1.6981, 2.2439], device='cuda:0'), covar=tensor([0.1329, 0.1725, 0.2768, 0.2481, 0.2368, 0.2129, 0.2555, 0.2160], device='cuda:0'), in_proj_covar=tensor([0.0060, 0.0065, 0.0086, 0.0055, 0.0068, 0.0061, 0.0075, 0.0054], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:26:22,592 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.732e+01 1.770e+02 2.199e+02 2.603e+02 4.829e+02, threshold=4.399e+02, percent-clipped=1.0 2022-11-15 20:26:36,187 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9615, 4.5987, 3.6716, 1.9886, 4.4010, 1.9914, 4.4197, 2.5989], device='cuda:0'), covar=tensor([0.1395, 0.0241, 0.0410, 0.2832, 0.0280, 0.2246, 0.0211, 0.2270], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0104, 0.0112, 0.0122, 0.0105, 0.0136, 0.0096, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 20:26:53,495 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-5.pt 2022-11-15 20:27:38,810 INFO [train.py:876] (0/4) Epoch 6, batch 0, loss[loss=0.1638, simple_loss=0.1832, pruned_loss=0.07217, over 5561.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.1832, pruned_loss=0.07217, over 5561.00 frames. ], batch size: 13, lr: 1.40e-02, grad_scale: 16.0 2022-11-15 20:27:38,811 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 20:27:52,741 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8033, 1.1276, 1.2993, 0.8237, 1.2476, 0.8459, 0.5503, 1.2400], device='cuda:0'), covar=tensor([0.0031, 0.0010, 0.0025, 0.0013, 0.0013, 0.0025, 0.0041, 0.0019], device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0026, 0.0028, 0.0027, 0.0026, 0.0026, 0.0027, 0.0023], device='cuda:0'), out_proj_covar=tensor([2.7984e-05, 2.7969e-05, 2.6033e-05, 2.4886e-05, 2.3148e-05, 2.2181e-05, 3.0083e-05, 2.0988e-05], device='cuda:0') 2022-11-15 20:27:55,407 INFO [train.py:908] (0/4) Epoch 6, validation: loss=0.1637, simple_loss=0.1861, pruned_loss=0.07065, over 1530663.00 frames. 2022-11-15 20:27:55,408 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 20:28:31,820 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.207e+02 1.895e+02 2.226e+02 2.646e+02 4.624e+02, threshold=4.452e+02, percent-clipped=2.0 2022-11-15 20:28:40,309 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36427.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:28:47,911 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6157, 0.7224, 0.9483, 0.8775, 0.9839, 1.0427, 0.8341, 0.9608], device='cuda:0'), covar=tensor([0.0511, 0.0380, 0.0968, 0.0657, 0.0870, 0.0869, 0.0881, 0.0590], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0012, 0.0010, 0.0010, 0.0010, 0.0009, 0.0011, 0.0009], device='cuda:0'), out_proj_covar=tensor([3.8645e-05, 5.1881e-05, 4.1409e-05, 4.6329e-05, 4.3187e-05, 4.0476e-05, 4.5722e-05, 4.2435e-05], device='cuda:0') 2022-11-15 20:29:03,066 INFO [train.py:876] (0/4) Epoch 6, batch 100, loss[loss=0.1638, simple_loss=0.1751, pruned_loss=0.07626, over 5752.00 frames. ], tot_loss[loss=0.1763, simple_loss=0.1809, pruned_loss=0.08585, over 426793.97 frames. ], batch size: 26, lr: 1.40e-02, grad_scale: 16.0 2022-11-15 20:29:07,130 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36467.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:29:12,766 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36475.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:29:26,354 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.23 vs. limit=5.0 2022-11-15 20:29:40,202 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.873e+02 2.283e+02 2.904e+02 6.033e+02, threshold=4.566e+02, percent-clipped=4.0 2022-11-15 20:29:40,277 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36515.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:29:55,291 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36537.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:30:07,945 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7553, 1.2174, 1.8777, 1.3673, 1.2554, 1.3401, 1.8200, 1.0008], device='cuda:0'), covar=tensor([0.0025, 0.0101, 0.0025, 0.0023, 0.0050, 0.0075, 0.0015, 0.0040], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0016, 0.0017, 0.0018, 0.0017, 0.0016, 0.0018, 0.0018], device='cuda:0'), out_proj_covar=tensor([1.7044e-05, 1.6656e-05, 1.6704e-05, 1.8311e-05, 1.6556e-05, 1.7123e-05, 1.8296e-05, 2.0912e-05], device='cuda:0') 2022-11-15 20:30:11,769 INFO [train.py:876] (0/4) Epoch 6, batch 200, loss[loss=0.2118, simple_loss=0.2001, pruned_loss=0.1117, over 5422.00 frames. ], tot_loss[loss=0.1744, simple_loss=0.1798, pruned_loss=0.08449, over 687505.43 frames. ], batch size: 58, lr: 1.39e-02, grad_scale: 16.0 2022-11-15 20:30:37,110 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7540, 1.9082, 1.7125, 2.2582, 1.5832, 1.7610, 1.7644, 2.2131], device='cuda:0'), covar=tensor([0.1170, 0.1588, 0.2830, 0.1051, 0.1906, 0.1812, 0.1711, 0.1054], device='cuda:0'), in_proj_covar=tensor([0.0060, 0.0069, 0.0084, 0.0056, 0.0068, 0.0061, 0.0075, 0.0055], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:30:42,292 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36605.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:30:44,355 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36608.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:30:46,963 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36612.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:30:48,739 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.665e+02 2.149e+02 2.847e+02 6.157e+02, threshold=4.299e+02, percent-clipped=2.0 2022-11-15 20:31:20,153 INFO [train.py:876] (0/4) Epoch 6, batch 300, loss[loss=0.1545, simple_loss=0.162, pruned_loss=0.07345, over 5560.00 frames. ], tot_loss[loss=0.1742, simple_loss=0.1798, pruned_loss=0.08435, over 840204.96 frames. ], batch size: 16, lr: 1.39e-02, grad_scale: 16.0 2022-11-15 20:31:23,584 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36666.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:31:25,566 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36669.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:31:28,130 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36673.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:31:46,945 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36700.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:31:56,888 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.849e+02 2.219e+02 2.840e+02 6.377e+02, threshold=4.439e+02, percent-clipped=4.0 2022-11-15 20:31:59,134 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8045, 3.1765, 3.4173, 1.6945, 2.9442, 3.6616, 3.2407, 3.8355], device='cuda:0'), covar=tensor([0.1451, 0.1251, 0.0925, 0.2558, 0.0346, 0.0400, 0.0436, 0.0413], device='cuda:0'), in_proj_covar=tensor([0.0181, 0.0181, 0.0141, 0.0189, 0.0149, 0.0149, 0.0133, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0003], device='cuda:0') 2022-11-15 20:32:27,630 INFO [train.py:876] (0/4) Epoch 6, batch 400, loss[loss=0.1424, simple_loss=0.1566, pruned_loss=0.06408, over 5749.00 frames. ], tot_loss[loss=0.1746, simple_loss=0.1803, pruned_loss=0.08448, over 942544.52 frames. ], batch size: 13, lr: 1.39e-02, grad_scale: 16.0 2022-11-15 20:32:27,801 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36761.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 20:32:29,143 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4124, 1.9251, 3.0122, 2.6806, 3.0414, 2.0689, 2.6717, 3.2768], device='cuda:0'), covar=tensor([0.0241, 0.0944, 0.0376, 0.0844, 0.0382, 0.0799, 0.0622, 0.0341], device='cuda:0'), in_proj_covar=tensor([0.0193, 0.0187, 0.0181, 0.0205, 0.0178, 0.0185, 0.0219, 0.0199], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0004, 0.0004], device='cuda:0') 2022-11-15 20:32:58,150 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36805.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 20:33:04,837 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.826e+02 2.118e+02 2.813e+02 4.458e+02, threshold=4.236e+02, percent-clipped=1.0 2022-11-15 20:33:16,942 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3156, 4.2129, 2.8605, 3.9885, 3.2142, 2.8590, 2.1503, 3.5484], device='cuda:0'), covar=tensor([0.1299, 0.0162, 0.0906, 0.0377, 0.0510, 0.0921, 0.1907, 0.0253], device='cuda:0'), in_proj_covar=tensor([0.0176, 0.0130, 0.0169, 0.0132, 0.0169, 0.0182, 0.0184, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:33:20,210 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=36837.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:33:35,935 INFO [train.py:876] (0/4) Epoch 6, batch 500, loss[loss=0.1222, simple_loss=0.1333, pruned_loss=0.05557, over 5168.00 frames. ], tot_loss[loss=0.1703, simple_loss=0.1779, pruned_loss=0.08138, over 1004264.68 frames. ], batch size: 8, lr: 1.39e-02, grad_scale: 16.0 2022-11-15 20:33:39,722 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36866.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 20:33:46,963 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2879, 3.7456, 3.2583, 3.1394, 1.8802, 3.4398, 2.1051, 3.1889], device='cuda:0'), covar=tensor([0.0320, 0.0095, 0.0159, 0.0232, 0.0371, 0.0103, 0.0339, 0.0088], device='cuda:0'), in_proj_covar=tensor([0.0162, 0.0122, 0.0138, 0.0148, 0.0159, 0.0135, 0.0150, 0.0120], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:33:53,147 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=36885.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:34:07,743 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7928, 3.1593, 2.2861, 2.8777, 2.9840, 2.8956, 2.8519, 2.9952], device='cuda:0'), covar=tensor([0.1815, 0.0882, 0.2781, 0.1112, 0.1236, 0.0684, 0.0981, 0.0833], device='cuda:0'), in_proj_covar=tensor([0.0115, 0.0146, 0.0232, 0.0143, 0.0179, 0.0150, 0.0156, 0.0139], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:34:07,815 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9880, 1.9807, 1.6768, 2.1652, 1.6816, 1.7449, 1.6609, 2.2332], device='cuda:0'), covar=tensor([0.0981, 0.1355, 0.2536, 0.0957, 0.1892, 0.1529, 0.1750, 0.0913], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0070, 0.0087, 0.0057, 0.0071, 0.0063, 0.0078, 0.0056], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:34:10,484 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36911.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:34:13,304 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.178e+02 1.807e+02 2.308e+02 2.926e+02 6.442e+02, threshold=4.616e+02, percent-clipped=7.0 2022-11-15 20:34:43,142 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=36958.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:34:44,969 INFO [train.py:876] (0/4) Epoch 6, batch 600, loss[loss=0.1436, simple_loss=0.169, pruned_loss=0.05913, over 5496.00 frames. ], tot_loss[loss=0.171, simple_loss=0.1784, pruned_loss=0.08176, over 1034208.96 frames. ], batch size: 17, lr: 1.39e-02, grad_scale: 16.0 2022-11-15 20:34:45,044 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36961.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:34:47,032 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36964.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:34:49,623 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=36968.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:34:52,729 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=36972.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:35:08,155 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.43 vs. limit=5.0 2022-11-15 20:35:15,266 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=37004.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:35:22,242 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.873e+02 2.337e+02 2.752e+02 4.766e+02, threshold=4.674e+02, percent-clipped=2.0 2022-11-15 20:35:25,066 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=37019.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:35:46,936 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1469, 0.6705, 1.0693, 0.8918, 1.1340, 1.1799, 0.9234, 0.8103], device='cuda:0'), covar=tensor([0.0685, 0.0825, 0.0608, 0.1493, 0.2487, 0.0459, 0.1162, 0.2116], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0013, 0.0010, 0.0011, 0.0011, 0.0009, 0.0012, 0.0009], device='cuda:0'), out_proj_covar=tensor([4.0400e-05, 5.4157e-05, 4.2802e-05, 4.9537e-05, 4.6649e-05, 4.1422e-05, 4.8422e-05, 4.3985e-05], device='cuda:0') 2022-11-15 20:35:51,189 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37056.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 20:35:52,601 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6207, 1.8407, 2.1655, 2.6435, 2.7589, 2.0001, 1.6740, 2.9055], device='cuda:0'), covar=tensor([0.0879, 0.2569, 0.1605, 0.1305, 0.0758, 0.2649, 0.1895, 0.0486], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0210, 0.0206, 0.0316, 0.0216, 0.0220, 0.0195, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 20:35:54,386 INFO [train.py:876] (0/4) Epoch 6, batch 700, loss[loss=0.1713, simple_loss=0.1817, pruned_loss=0.08045, over 5581.00 frames. ], tot_loss[loss=0.1693, simple_loss=0.1768, pruned_loss=0.08089, over 1058967.54 frames. ], batch size: 22, lr: 1.38e-02, grad_scale: 16.0 2022-11-15 20:35:57,293 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=37065.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:36:28,685 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.25 vs. limit=5.0 2022-11-15 20:36:31,398 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.989e+01 2.001e+02 2.497e+02 2.973e+02 6.578e+02, threshold=4.994e+02, percent-clipped=4.0 2022-11-15 20:36:38,939 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1673, 2.0320, 2.1812, 3.0971, 2.9007, 2.1400, 1.8346, 3.3188], device='cuda:0'), covar=tensor([0.0700, 0.2736, 0.2808, 0.2014, 0.1210, 0.3243, 0.2528, 0.0379], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0211, 0.0211, 0.0319, 0.0216, 0.0221, 0.0197, 0.0186], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 20:37:02,432 INFO [train.py:876] (0/4) Epoch 6, batch 800, loss[loss=0.1213, simple_loss=0.153, pruned_loss=0.04482, over 5781.00 frames. ], tot_loss[loss=0.1682, simple_loss=0.1759, pruned_loss=0.08031, over 1066964.66 frames. ], batch size: 14, lr: 1.38e-02, grad_scale: 16.0 2022-11-15 20:37:02,507 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37161.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 20:37:40,547 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.784e+02 2.226e+02 2.677e+02 4.647e+02, threshold=4.452e+02, percent-clipped=0.0 2022-11-15 20:38:11,376 INFO [train.py:876] (0/4) Epoch 6, batch 900, loss[loss=0.1645, simple_loss=0.1755, pruned_loss=0.07673, over 5711.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.1736, pruned_loss=0.07854, over 1078281.96 frames. ], batch size: 15, lr: 1.38e-02, grad_scale: 16.0 2022-11-15 20:38:11,496 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37261.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:38:13,409 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37264.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:38:15,266 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37267.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:38:15,954 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37268.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:38:15,988 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=37268.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:38:40,156 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0276, 3.9542, 3.7863, 3.8941, 3.9757, 4.0694, 1.5273, 4.1431], device='cuda:0'), covar=tensor([0.0545, 0.0518, 0.0620, 0.0474, 0.0717, 0.0446, 0.4073, 0.0519], device='cuda:0'), in_proj_covar=tensor([0.0098, 0.0075, 0.0074, 0.0064, 0.0091, 0.0079, 0.0127, 0.0096], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:38:43,255 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37309.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:38:45,286 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37312.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:38:46,654 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37314.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:38:48,597 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.992e+02 2.315e+02 2.795e+02 5.537e+02, threshold=4.630e+02, percent-clipped=3.0 2022-11-15 20:38:48,678 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37316.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:38:57,781 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=37329.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:39:01,967 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2483, 3.2115, 2.9803, 2.8471, 1.8701, 3.1833, 2.0004, 2.7295], device='cuda:0'), covar=tensor([0.0281, 0.0091, 0.0108, 0.0205, 0.0328, 0.0086, 0.0296, 0.0089], device='cuda:0'), in_proj_covar=tensor([0.0167, 0.0126, 0.0141, 0.0155, 0.0163, 0.0139, 0.0154, 0.0125], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:39:15,675 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37356.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 20:39:18,339 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37360.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:39:18,955 INFO [train.py:876] (0/4) Epoch 6, batch 1000, loss[loss=0.1473, simple_loss=0.1744, pruned_loss=0.06012, over 5706.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.1747, pruned_loss=0.0796, over 1077265.24 frames. ], batch size: 28, lr: 1.38e-02, grad_scale: 16.0 2022-11-15 20:39:21,053 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=37364.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:39:48,184 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37404.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 20:39:56,108 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.744e+01 1.732e+02 2.123e+02 2.683e+02 6.509e+02, threshold=4.246e+02, percent-clipped=3.0 2022-11-15 20:40:02,121 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=37425.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:40:11,503 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5456, 4.0519, 4.3423, 4.0710, 4.5509, 4.3617, 4.1008, 4.5156], device='cuda:0'), covar=tensor([0.0255, 0.0270, 0.0337, 0.0290, 0.0292, 0.0173, 0.0219, 0.0259], device='cuda:0'), in_proj_covar=tensor([0.0102, 0.0112, 0.0085, 0.0114, 0.0121, 0.0071, 0.0097, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:40:20,503 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3623, 0.7178, 1.1179, 0.8713, 1.0850, 1.2217, 0.8286, 1.2282], device='cuda:0'), covar=tensor([0.0308, 0.0440, 0.0213, 0.0679, 0.0767, 0.0828, 0.0949, 0.0604], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0013, 0.0010, 0.0011, 0.0010, 0.0009, 0.0012, 0.0009], device='cuda:0'), out_proj_covar=tensor([4.0424e-05, 5.4530e-05, 4.2632e-05, 4.9483e-05, 4.5865e-05, 4.1824e-05, 4.8508e-05, 4.3883e-05], device='cuda:0') 2022-11-15 20:40:25,131 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.31 vs. limit=2.0 2022-11-15 20:40:26,882 INFO [train.py:876] (0/4) Epoch 6, batch 1100, loss[loss=0.1603, simple_loss=0.1757, pruned_loss=0.07247, over 5556.00 frames. ], tot_loss[loss=0.1699, simple_loss=0.177, pruned_loss=0.08137, over 1079688.57 frames. ], batch size: 25, lr: 1.38e-02, grad_scale: 16.0 2022-11-15 20:40:26,997 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37461.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 20:40:46,083 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8476, 3.2782, 2.0748, 3.0476, 2.1950, 2.3843, 1.7553, 2.8534], device='cuda:0'), covar=tensor([0.1418, 0.0175, 0.1128, 0.0303, 0.0830, 0.0887, 0.1738, 0.0301], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0134, 0.0172, 0.0134, 0.0170, 0.0183, 0.0186, 0.0145], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:40:59,804 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37509.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 20:41:04,181 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.745e+02 2.111e+02 2.538e+02 7.660e+02, threshold=4.223e+02, percent-clipped=2.0 2022-11-15 20:41:15,059 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5619, 2.2604, 1.7509, 1.3305, 1.7220, 2.4829, 2.0646, 2.2962], device='cuda:0'), covar=tensor([0.1637, 0.1164, 0.1218, 0.2038, 0.0637, 0.0506, 0.0392, 0.0857], device='cuda:0'), in_proj_covar=tensor([0.0183, 0.0182, 0.0142, 0.0189, 0.0151, 0.0151, 0.0135, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:41:33,623 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.41 vs. limit=5.0 2022-11-15 20:41:35,287 INFO [train.py:876] (0/4) Epoch 6, batch 1200, loss[loss=0.1665, simple_loss=0.1794, pruned_loss=0.07683, over 5718.00 frames. ], tot_loss[loss=0.1705, simple_loss=0.1777, pruned_loss=0.08162, over 1080543.68 frames. ], batch size: 11, lr: 1.38e-02, grad_scale: 16.0 2022-11-15 20:41:39,295 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37567.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:41:48,444 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9382, 4.3050, 3.7348, 4.2551, 4.2641, 3.4893, 3.8915, 3.6049], device='cuda:0'), covar=tensor([0.0365, 0.0354, 0.1335, 0.0366, 0.0415, 0.0437, 0.0402, 0.0523], device='cuda:0'), in_proj_covar=tensor([0.0113, 0.0146, 0.0229, 0.0142, 0.0176, 0.0151, 0.0154, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:42:11,409 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37614.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:42:12,013 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37615.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:42:12,627 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 1.872e+02 2.304e+02 2.879e+02 7.161e+02, threshold=4.608e+02, percent-clipped=4.0 2022-11-15 20:42:14,852 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0331, 2.9911, 2.8162, 2.7490, 1.8225, 2.8545, 1.9169, 2.0413], device='cuda:0'), covar=tensor([0.0270, 0.0092, 0.0120, 0.0185, 0.0286, 0.0091, 0.0310, 0.0122], device='cuda:0'), in_proj_covar=tensor([0.0167, 0.0127, 0.0140, 0.0155, 0.0162, 0.0140, 0.0154, 0.0125], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:42:18,027 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37624.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:42:42,619 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37660.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:42:43,124 INFO [train.py:876] (0/4) Epoch 6, batch 1300, loss[loss=0.1935, simple_loss=0.196, pruned_loss=0.09549, over 5798.00 frames. ], tot_loss[loss=0.1675, simple_loss=0.1761, pruned_loss=0.07944, over 1084656.18 frames. ], batch size: 21, lr: 1.37e-02, grad_scale: 16.0 2022-11-15 20:42:43,839 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37662.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:43:15,363 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37708.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:43:20,505 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.632e+02 1.886e+02 2.277e+02 3.768e+02, threshold=3.772e+02, percent-clipped=0.0 2022-11-15 20:43:23,146 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=37720.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:43:26,621 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.94 vs. limit=5.0 2022-11-15 20:43:51,754 INFO [train.py:876] (0/4) Epoch 6, batch 1400, loss[loss=0.1472, simple_loss=0.1578, pruned_loss=0.06832, over 5601.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.1742, pruned_loss=0.07878, over 1079720.51 frames. ], batch size: 23, lr: 1.37e-02, grad_scale: 16.0 2022-11-15 20:44:07,003 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.46 vs. limit=5.0 2022-11-15 20:44:31,665 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.336e+02 1.794e+02 2.213e+02 2.826e+02 4.726e+02, threshold=4.425e+02, percent-clipped=5.0 2022-11-15 20:44:56,707 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-15 20:45:01,661 INFO [train.py:876] (0/4) Epoch 6, batch 1500, loss[loss=0.1631, simple_loss=0.1629, pruned_loss=0.0816, over 5762.00 frames. ], tot_loss[loss=0.1665, simple_loss=0.175, pruned_loss=0.07904, over 1083247.59 frames. ], batch size: 15, lr: 1.37e-02, grad_scale: 16.0 2022-11-15 20:45:33,821 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-15 20:45:38,927 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.859e+02 2.319e+02 2.620e+02 5.467e+02, threshold=4.638e+02, percent-clipped=1.0 2022-11-15 20:45:39,106 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3468, 2.3730, 1.6215, 2.3711, 1.6609, 2.0412, 1.9864, 2.8593], device='cuda:0'), covar=tensor([0.1054, 0.1469, 0.4861, 0.1950, 0.2485, 0.2076, 0.2667, 0.1130], device='cuda:0'), in_proj_covar=tensor([0.0065, 0.0071, 0.0086, 0.0057, 0.0073, 0.0064, 0.0080, 0.0057], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 20:45:45,052 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=37924.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:46:10,112 INFO [train.py:876] (0/4) Epoch 6, batch 1600, loss[loss=0.1547, simple_loss=0.1639, pruned_loss=0.07275, over 5734.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.1744, pruned_loss=0.07803, over 1085914.91 frames. ], batch size: 31, lr: 1.37e-02, grad_scale: 16.0 2022-11-15 20:46:17,394 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=37972.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:46:22,791 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=37979.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:46:47,656 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.872e+01 1.813e+02 2.310e+02 2.979e+02 5.455e+02, threshold=4.619e+02, percent-clipped=4.0 2022-11-15 20:46:50,393 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=38020.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:47:03,824 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2246, 4.3524, 3.0362, 4.1616, 3.2917, 2.9171, 2.1966, 3.6579], device='cuda:0'), covar=tensor([0.1476, 0.0179, 0.0898, 0.0238, 0.0524, 0.0917, 0.1964, 0.0216], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0130, 0.0166, 0.0131, 0.0166, 0.0177, 0.0180, 0.0139], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:47:03,882 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=38040.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:47:18,027 INFO [train.py:876] (0/4) Epoch 6, batch 1700, loss[loss=0.2023, simple_loss=0.1939, pruned_loss=0.1053, over 5538.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.1738, pruned_loss=0.07818, over 1087376.24 frames. ], batch size: 46, lr: 1.37e-02, grad_scale: 16.0 2022-11-15 20:47:22,630 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=38068.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:47:55,397 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.764e+02 2.104e+02 2.685e+02 5.145e+02, threshold=4.208e+02, percent-clipped=1.0 2022-11-15 20:48:25,387 INFO [train.py:876] (0/4) Epoch 6, batch 1800, loss[loss=0.16, simple_loss=0.169, pruned_loss=0.07556, over 5814.00 frames. ], tot_loss[loss=0.166, simple_loss=0.1744, pruned_loss=0.07877, over 1087703.69 frames. ], batch size: 21, lr: 1.36e-02, grad_scale: 16.0 2022-11-15 20:48:41,725 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6924, 2.1526, 3.3090, 2.7848, 3.4495, 2.2104, 3.1461, 3.6370], device='cuda:0'), covar=tensor([0.0336, 0.1153, 0.0563, 0.1090, 0.0370, 0.1161, 0.0811, 0.0733], device='cuda:0'), in_proj_covar=tensor([0.0200, 0.0190, 0.0191, 0.0208, 0.0187, 0.0192, 0.0225, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 20:48:43,060 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4224, 3.5735, 3.4215, 1.7174, 3.0147, 3.8434, 3.5150, 4.0917], device='cuda:0'), covar=tensor([0.1697, 0.0917, 0.0544, 0.2345, 0.0392, 0.0298, 0.0270, 0.0276], device='cuda:0'), in_proj_covar=tensor([0.0184, 0.0186, 0.0145, 0.0190, 0.0152, 0.0154, 0.0135, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:49:03,153 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.837e+02 2.239e+02 2.713e+02 4.527e+02, threshold=4.477e+02, percent-clipped=1.0 2022-11-15 20:49:33,775 INFO [train.py:876] (0/4) Epoch 6, batch 1900, loss[loss=0.1175, simple_loss=0.1394, pruned_loss=0.04781, over 5164.00 frames. ], tot_loss[loss=0.1666, simple_loss=0.1748, pruned_loss=0.07926, over 1081428.83 frames. ], batch size: 8, lr: 1.36e-02, grad_scale: 16.0 2022-11-15 20:49:43,406 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.90 vs. limit=5.0 2022-11-15 20:50:10,565 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.862e+02 2.216e+02 2.632e+02 5.559e+02, threshold=4.433e+02, percent-clipped=2.0 2022-11-15 20:50:20,385 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3700, 1.0017, 0.8417, 0.6743, 1.2655, 1.0636, 0.7368, 1.2736], device='cuda:0'), covar=tensor([0.0447, 0.0441, 0.0739, 0.0854, 0.0815, 0.0836, 0.0872, 0.0864], device='cuda:0'), in_proj_covar=tensor([0.0009, 0.0013, 0.0010, 0.0011, 0.0010, 0.0009, 0.0012, 0.0009], device='cuda:0'), out_proj_covar=tensor([4.1510e-05, 5.4725e-05, 4.4224e-05, 4.7690e-05, 4.5779e-05, 4.0785e-05, 4.9185e-05, 4.3043e-05], device='cuda:0') 2022-11-15 20:50:24,249 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=38335.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:50:39,578 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9518, 3.6086, 2.3743, 3.3699, 2.6280, 2.5161, 1.8129, 3.1534], device='cuda:0'), covar=tensor([0.1315, 0.0165, 0.0976, 0.0239, 0.0764, 0.0896, 0.1862, 0.0226], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0127, 0.0164, 0.0129, 0.0165, 0.0174, 0.0177, 0.0138], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 20:50:40,939 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=38360.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:50:41,460 INFO [train.py:876] (0/4) Epoch 6, batch 2000, loss[loss=0.2453, simple_loss=0.2347, pruned_loss=0.1279, over 5691.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.1737, pruned_loss=0.07758, over 1089914.24 frames. ], batch size: 36, lr: 1.36e-02, grad_scale: 16.0 2022-11-15 20:50:50,379 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2915, 2.1960, 1.8602, 0.9244, 1.0525, 2.6382, 1.9483, 1.8682], device='cuda:0'), covar=tensor([0.0600, 0.0537, 0.0735, 0.2597, 0.1747, 0.0925, 0.0776, 0.0523], device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0042, 0.0045, 0.0055, 0.0045, 0.0037, 0.0042, 0.0044], device='cuda:0'), out_proj_covar=tensor([9.8913e-05, 8.5506e-05, 9.0266e-05, 1.1103e-04, 9.4112e-05, 8.2375e-05, 8.7102e-05, 8.8307e-05], device='cuda:0') 2022-11-15 20:51:00,426 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5659, 1.8211, 2.2441, 3.3167, 3.3374, 2.5902, 2.2516, 3.4125], device='cuda:0'), covar=tensor([0.0424, 0.3407, 0.2936, 0.2891, 0.0969, 0.3459, 0.2355, 0.0404], device='cuda:0'), in_proj_covar=tensor([0.0188, 0.0206, 0.0206, 0.0323, 0.0217, 0.0219, 0.0200, 0.0189], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 20:51:19,232 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 2.027e+02 2.499e+02 3.028e+02 6.402e+02, threshold=4.998e+02, percent-clipped=6.0 2022-11-15 20:51:22,847 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=38421.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:51:50,149 INFO [train.py:876] (0/4) Epoch 6, batch 2100, loss[loss=0.1545, simple_loss=0.1705, pruned_loss=0.06926, over 5791.00 frames. ], tot_loss[loss=0.166, simple_loss=0.1748, pruned_loss=0.07856, over 1094759.50 frames. ], batch size: 22, lr: 1.36e-02, grad_scale: 16.0 2022-11-15 20:52:27,550 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.554e+01 1.866e+02 2.249e+02 2.710e+02 6.180e+02, threshold=4.497e+02, percent-clipped=1.0 2022-11-15 20:52:32,459 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3976, 3.0822, 3.1562, 2.8390, 1.9479, 3.0708, 2.0190, 2.9231], device='cuda:0'), covar=tensor([0.0282, 0.0112, 0.0109, 0.0223, 0.0325, 0.0098, 0.0302, 0.0082], device='cuda:0'), in_proj_covar=tensor([0.0172, 0.0130, 0.0143, 0.0161, 0.0165, 0.0141, 0.0157, 0.0131], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:52:58,526 INFO [train.py:876] (0/4) Epoch 6, batch 2200, loss[loss=0.186, simple_loss=0.1881, pruned_loss=0.09197, over 5623.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.1752, pruned_loss=0.07929, over 1093250.37 frames. ], batch size: 38, lr: 1.36e-02, grad_scale: 16.0 2022-11-15 20:53:26,364 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4783, 0.9827, 1.2072, 0.9874, 1.2476, 1.0164, 1.1657, 1.0435], device='cuda:0'), covar=tensor([0.0043, 0.0071, 0.0035, 0.0037, 0.0034, 0.0041, 0.0023, 0.0028], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0016, 0.0017, 0.0020, 0.0018, 0.0016, 0.0018, 0.0019], device='cuda:0'), out_proj_covar=tensor([1.7347e-05, 1.6790e-05, 1.6960e-05, 1.9906e-05, 1.7951e-05, 1.7052e-05, 1.8218e-05, 2.0812e-05], device='cuda:0') 2022-11-15 20:53:36,405 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.801e+02 2.106e+02 2.533e+02 3.933e+02, threshold=4.211e+02, percent-clipped=0.0 2022-11-15 20:53:49,040 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=38635.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:54:06,824 INFO [train.py:876] (0/4) Epoch 6, batch 2300, loss[loss=0.1249, simple_loss=0.1418, pruned_loss=0.05402, over 5111.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.1747, pruned_loss=0.07873, over 1092203.19 frames. ], batch size: 8, lr: 1.36e-02, grad_scale: 16.0 2022-11-15 20:54:16,298 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.20 vs. limit=5.0 2022-11-15 20:54:21,936 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=38683.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:54:24,200 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-15 20:54:44,925 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.033e+02 1.846e+02 2.203e+02 3.021e+02 7.472e+02, threshold=4.405e+02, percent-clipped=8.0 2022-11-15 20:54:45,036 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=38716.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:55:15,141 INFO [train.py:876] (0/4) Epoch 6, batch 2400, loss[loss=0.1461, simple_loss=0.1673, pruned_loss=0.0624, over 5782.00 frames. ], tot_loss[loss=0.1673, simple_loss=0.1753, pruned_loss=0.0797, over 1093659.52 frames. ], batch size: 21, lr: 1.35e-02, grad_scale: 16.0 2022-11-15 20:55:52,457 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.810e+02 2.219e+02 2.775e+02 4.582e+02, threshold=4.438e+02, percent-clipped=1.0 2022-11-15 20:55:57,928 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=38823.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:56:16,726 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8762, 1.4281, 1.7818, 1.1945, 0.8406, 2.2439, 1.7572, 1.4760], device='cuda:0'), covar=tensor([0.0419, 0.0907, 0.0480, 0.2037, 0.1733, 0.0629, 0.1364, 0.0788], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0044, 0.0046, 0.0058, 0.0046, 0.0037, 0.0043, 0.0046], device='cuda:0'), out_proj_covar=tensor([1.0449e-04, 8.8937e-05, 9.3224e-05, 1.1716e-04, 9.6784e-05, 8.4044e-05, 8.9537e-05, 9.3042e-05], device='cuda:0') 2022-11-15 20:56:23,605 INFO [train.py:876] (0/4) Epoch 6, batch 2500, loss[loss=0.08324, simple_loss=0.1135, pruned_loss=0.02646, over 5535.00 frames. ], tot_loss[loss=0.1649, simple_loss=0.1739, pruned_loss=0.07794, over 1097182.91 frames. ], batch size: 10, lr: 1.35e-02, grad_scale: 16.0 2022-11-15 20:56:39,541 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=38884.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:56:50,146 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-15 20:57:01,241 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.730e+02 2.141e+02 2.666e+02 7.999e+02, threshold=4.283e+02, percent-clipped=2.0 2022-11-15 20:57:01,726 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.19 vs. limit=5.0 2022-11-15 20:57:19,649 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.08 vs. limit=5.0 2022-11-15 20:57:31,864 INFO [train.py:876] (0/4) Epoch 6, batch 2600, loss[loss=0.1441, simple_loss=0.158, pruned_loss=0.06509, over 5498.00 frames. ], tot_loss[loss=0.1636, simple_loss=0.173, pruned_loss=0.07712, over 1099116.50 frames. ], batch size: 17, lr: 1.35e-02, grad_scale: 16.0 2022-11-15 20:58:09,807 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.231e+02 1.750e+02 2.073e+02 2.691e+02 5.649e+02, threshold=4.146e+02, percent-clipped=5.0 2022-11-15 20:58:09,956 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=39016.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:58:26,738 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2022-11-15 20:58:38,175 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2115, 3.4359, 3.0814, 2.9526, 1.9220, 3.3457, 1.8664, 2.8932], device='cuda:0'), covar=tensor([0.0411, 0.0176, 0.0195, 0.0320, 0.0426, 0.0136, 0.0441, 0.0120], device='cuda:0'), in_proj_covar=tensor([0.0170, 0.0128, 0.0143, 0.0159, 0.0163, 0.0140, 0.0156, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 20:58:40,661 INFO [train.py:876] (0/4) Epoch 6, batch 2700, loss[loss=0.102, simple_loss=0.1173, pruned_loss=0.04336, over 4527.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.1729, pruned_loss=0.07737, over 1088020.72 frames. ], batch size: 5, lr: 1.35e-02, grad_scale: 16.0 2022-11-15 20:58:42,652 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=39064.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 20:59:00,997 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.45 vs. limit=5.0 2022-11-15 20:59:18,425 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.741e+02 2.185e+02 2.571e+02 4.809e+02, threshold=4.370e+02, percent-clipped=1.0 2022-11-15 20:59:49,229 INFO [train.py:876] (0/4) Epoch 6, batch 2800, loss[loss=0.1534, simple_loss=0.1819, pruned_loss=0.06249, over 5568.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.1731, pruned_loss=0.0772, over 1082031.03 frames. ], batch size: 22, lr: 1.35e-02, grad_scale: 16.0 2022-11-15 21:00:01,214 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39179.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:00:05,184 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39185.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:00:27,552 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.875e+02 2.211e+02 2.740e+02 7.049e+02, threshold=4.422e+02, percent-clipped=6.0 2022-11-15 21:00:30,066 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-15 21:00:47,924 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39246.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:00:58,295 INFO [train.py:876] (0/4) Epoch 6, batch 2900, loss[loss=0.1965, simple_loss=0.2063, pruned_loss=0.0934, over 5610.00 frames. ], tot_loss[loss=0.1676, simple_loss=0.1751, pruned_loss=0.08006, over 1073079.27 frames. ], batch size: 38, lr: 1.35e-02, grad_scale: 16.0 2022-11-15 21:01:13,881 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39284.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:01:36,727 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.430e+02 2.016e+02 2.489e+02 3.062e+02 5.776e+02, threshold=4.978e+02, percent-clipped=1.0 2022-11-15 21:01:55,787 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39345.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:02:06,701 INFO [train.py:876] (0/4) Epoch 6, batch 3000, loss[loss=0.1757, simple_loss=0.1838, pruned_loss=0.08384, over 5724.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.1749, pruned_loss=0.08028, over 1079123.22 frames. ], batch size: 17, lr: 1.34e-02, grad_scale: 16.0 2022-11-15 21:02:06,702 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 21:02:10,954 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.1367, 4.8699, 4.7430, 4.7249, 5.1975, 5.1664, 4.7418, 5.1534], device='cuda:0'), covar=tensor([0.0276, 0.0187, 0.0484, 0.0273, 0.0271, 0.0079, 0.0181, 0.0165], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0112, 0.0085, 0.0114, 0.0119, 0.0073, 0.0099, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:02:19,505 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1903, 3.1648, 3.1103, 3.0080, 1.8888, 3.2020, 2.0005, 2.7832], device='cuda:0'), covar=tensor([0.0332, 0.0154, 0.0126, 0.0199, 0.0394, 0.0132, 0.0368, 0.0118], device='cuda:0'), in_proj_covar=tensor([0.0167, 0.0128, 0.0141, 0.0159, 0.0161, 0.0140, 0.0154, 0.0130], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:02:19,867 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7751, 2.9431, 2.5520, 2.9563, 2.9226, 2.7239, 2.6920, 2.5524], device='cuda:0'), covar=tensor([0.0298, 0.0492, 0.1816, 0.0498, 0.0488, 0.0464, 0.0836, 0.0670], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0146, 0.0233, 0.0145, 0.0178, 0.0151, 0.0156, 0.0141], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:02:24,359 INFO [train.py:908] (0/4) Epoch 6, validation: loss=0.1626, simple_loss=0.1844, pruned_loss=0.07046, over 1530663.00 frames. 2022-11-15 21:02:24,359 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 21:02:47,366 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2607, 2.2739, 1.8466, 2.2568, 1.7248, 1.8485, 2.0023, 2.8839], device='cuda:0'), covar=tensor([0.1165, 0.1703, 0.3542, 0.1726, 0.2660, 0.2136, 0.2760, 0.1224], device='cuda:0'), in_proj_covar=tensor([0.0064, 0.0071, 0.0085, 0.0060, 0.0069, 0.0065, 0.0078, 0.0056], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:03:01,502 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.785e+02 2.182e+02 2.915e+02 6.384e+02, threshold=4.364e+02, percent-clipped=3.0 2022-11-15 21:03:31,202 INFO [train.py:876] (0/4) Epoch 6, batch 3100, loss[loss=0.1328, simple_loss=0.1557, pruned_loss=0.05494, over 5558.00 frames. ], tot_loss[loss=0.166, simple_loss=0.1747, pruned_loss=0.07864, over 1083732.71 frames. ], batch size: 15, lr: 1.34e-02, grad_scale: 16.0 2022-11-15 21:03:32,673 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39463.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:03:37,011 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2022-11-15 21:03:43,782 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=39479.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:04:03,258 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.81 vs. limit=2.0 2022-11-15 21:04:06,500 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 21:04:09,321 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.769e+02 2.283e+02 2.689e+02 7.122e+02, threshold=4.567e+02, percent-clipped=2.0 2022-11-15 21:04:14,705 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39524.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:04:16,926 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=39527.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:04:26,499 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39541.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:04:31,311 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2022-11-15 21:04:40,020 INFO [train.py:876] (0/4) Epoch 6, batch 3200, loss[loss=0.1673, simple_loss=0.1822, pruned_loss=0.07623, over 5693.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.1735, pruned_loss=0.07693, over 1090177.62 frames. ], batch size: 17, lr: 1.34e-02, grad_scale: 16.0 2022-11-15 21:05:09,136 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6255, 4.0839, 3.6090, 3.4674, 2.1801, 3.9518, 2.1025, 3.4499], device='cuda:0'), covar=tensor([0.0385, 0.0164, 0.0186, 0.0291, 0.0502, 0.0104, 0.0442, 0.0083], device='cuda:0'), in_proj_covar=tensor([0.0169, 0.0131, 0.0144, 0.0162, 0.0164, 0.0141, 0.0158, 0.0131], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:05:18,392 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.194e+02 1.784e+02 2.150e+02 2.757e+02 5.278e+02, threshold=4.299e+02, percent-clipped=1.0 2022-11-15 21:05:22,499 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7246, 4.1292, 3.3138, 1.9011, 3.8497, 1.4751, 3.8188, 2.1850], device='cuda:0'), covar=tensor([0.1559, 0.0257, 0.0699, 0.2629, 0.0266, 0.2552, 0.0295, 0.2514], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0101, 0.0109, 0.0118, 0.0103, 0.0129, 0.0094, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 21:05:33,012 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0134, 2.7393, 2.2075, 1.4917, 2.6309, 1.0786, 2.5759, 1.5504], device='cuda:0'), covar=tensor([0.1010, 0.0198, 0.0613, 0.1525, 0.0210, 0.1949, 0.0286, 0.1437], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0101, 0.0109, 0.0118, 0.0103, 0.0128, 0.0094, 0.0120], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 21:05:33,692 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39639.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 21:05:34,267 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39640.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:05:44,900 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2022-11-15 21:05:48,168 INFO [train.py:876] (0/4) Epoch 6, batch 3300, loss[loss=0.2318, simple_loss=0.2177, pruned_loss=0.123, over 5301.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.1733, pruned_loss=0.07655, over 1086313.57 frames. ], batch size: 79, lr: 1.34e-02, grad_scale: 16.0 2022-11-15 21:06:15,364 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39700.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 21:06:26,590 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4493, 2.0051, 1.6659, 1.8392, 1.1403, 1.6514, 1.3679, 1.8413], device='cuda:0'), covar=tensor([0.0645, 0.0170, 0.0586, 0.0282, 0.0939, 0.0581, 0.0905, 0.0287], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0129, 0.0169, 0.0133, 0.0163, 0.0179, 0.0179, 0.0139], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:06:27,137 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 1.777e+02 2.323e+02 2.808e+02 5.808e+02, threshold=4.646e+02, percent-clipped=1.0 2022-11-15 21:06:38,906 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.5413, 5.0433, 5.2197, 4.9664, 5.5738, 5.4744, 4.7989, 5.4951], device='cuda:0'), covar=tensor([0.0227, 0.0174, 0.0362, 0.0179, 0.0238, 0.0072, 0.0196, 0.0192], device='cuda:0'), in_proj_covar=tensor([0.0107, 0.0115, 0.0087, 0.0117, 0.0124, 0.0075, 0.0101, 0.0113], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:06:57,086 INFO [train.py:876] (0/4) Epoch 6, batch 3400, loss[loss=0.1963, simple_loss=0.1908, pruned_loss=0.1009, over 5489.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.1735, pruned_loss=0.07701, over 1084993.67 frames. ], batch size: 49, lr: 1.34e-02, grad_scale: 16.0 2022-11-15 21:07:01,527 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39767.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:07:03,806 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2022-11-15 21:07:14,841 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7442, 2.7460, 2.1438, 2.4907, 1.5519, 2.1541, 1.6945, 2.4888], device='cuda:0'), covar=tensor([0.1094, 0.0200, 0.0706, 0.0341, 0.1083, 0.0767, 0.1296, 0.0297], device='cuda:0'), in_proj_covar=tensor([0.0178, 0.0132, 0.0172, 0.0136, 0.0166, 0.0182, 0.0180, 0.0142], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:07:34,869 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=39815.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:07:36,024 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.758e+02 2.190e+02 2.600e+02 4.839e+02, threshold=4.379e+02, percent-clipped=2.0 2022-11-15 21:07:37,416 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39819.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:07:43,446 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39828.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:07:52,330 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=39841.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:08:05,257 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.52 vs. limit=5.0 2022-11-15 21:08:06,130 INFO [train.py:876] (0/4) Epoch 6, batch 3500, loss[loss=0.1384, simple_loss=0.1636, pruned_loss=0.05657, over 5618.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.174, pruned_loss=0.07812, over 1077306.63 frames. ], batch size: 18, lr: 1.34e-02, grad_scale: 16.0 2022-11-15 21:08:16,394 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=39876.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:08:24,879 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=39889.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:08:43,940 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.235e+02 1.932e+02 2.210e+02 2.836e+02 6.294e+02, threshold=4.421e+02, percent-clipped=3.0 2022-11-15 21:08:59,713 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=39940.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:09:14,037 INFO [train.py:876] (0/4) Epoch 6, batch 3600, loss[loss=0.1839, simple_loss=0.1959, pruned_loss=0.0859, over 5807.00 frames. ], tot_loss[loss=0.1665, simple_loss=0.1751, pruned_loss=0.07899, over 1081573.50 frames. ], batch size: 18, lr: 1.33e-02, grad_scale: 16.0 2022-11-15 21:09:32,386 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=39988.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:09:36,966 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=39995.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 21:09:40,433 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-40000.pt 2022-11-15 21:09:55,017 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.820e+02 2.295e+02 3.006e+02 4.723e+02, threshold=4.590e+02, percent-clipped=2.0 2022-11-15 21:10:25,137 INFO [train.py:876] (0/4) Epoch 6, batch 3700, loss[loss=0.1007, simple_loss=0.1281, pruned_loss=0.03664, over 5541.00 frames. ], tot_loss[loss=0.1658, simple_loss=0.1743, pruned_loss=0.07863, over 1084498.18 frames. ], batch size: 13, lr: 1.33e-02, grad_scale: 16.0 2022-11-15 21:10:33,277 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3531, 4.5235, 4.2313, 4.7045, 4.3866, 3.8739, 5.2346, 4.2685], device='cuda:0'), covar=tensor([0.0369, 0.0861, 0.0358, 0.0903, 0.0351, 0.0239, 0.0546, 0.0415], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0090, 0.0075, 0.0095, 0.0072, 0.0061, 0.0121, 0.0079], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:10:43,655 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2022-11-15 21:11:03,523 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.767e+02 2.199e+02 2.713e+02 5.676e+02, threshold=4.399e+02, percent-clipped=2.0 2022-11-15 21:11:04,980 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40119.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:11:07,569 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40123.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:11:33,225 INFO [train.py:876] (0/4) Epoch 6, batch 3800, loss[loss=0.2131, simple_loss=0.2122, pruned_loss=0.107, over 5389.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.1733, pruned_loss=0.07739, over 1081930.27 frames. ], batch size: 70, lr: 1.33e-02, grad_scale: 16.0 2022-11-15 21:11:37,516 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40167.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:11:40,242 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40171.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:11:40,978 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8537, 1.3683, 1.0732, 1.3291, 1.0314, 1.2182, 1.1257, 1.1733], device='cuda:0'), covar=tensor([0.1748, 0.0936, 0.1306, 0.0499, 0.1308, 0.1461, 0.1196, 0.0490], device='cuda:0'), in_proj_covar=tensor([0.0067, 0.0073, 0.0087, 0.0062, 0.0072, 0.0065, 0.0080, 0.0058], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:11:52,582 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 21:11:57,149 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40195.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:12:02,664 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2259, 3.3040, 2.4557, 1.6780, 3.1797, 1.0779, 3.0749, 1.7193], device='cuda:0'), covar=tensor([0.1131, 0.0151, 0.0821, 0.1771, 0.0190, 0.2084, 0.0246, 0.1563], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0101, 0.0112, 0.0117, 0.0103, 0.0127, 0.0096, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 21:12:09,414 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9736, 2.0983, 2.2279, 3.2567, 3.0893, 2.3444, 1.9474, 3.4608], device='cuda:0'), covar=tensor([0.0746, 0.2479, 0.2202, 0.1943, 0.0823, 0.2957, 0.2288, 0.0417], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0215, 0.0206, 0.0324, 0.0217, 0.0220, 0.0201, 0.0191], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:12:11,859 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.630e+02 2.050e+02 2.494e+02 4.190e+02, threshold=4.099e+02, percent-clipped=0.0 2022-11-15 21:12:13,024 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40218.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:12:16,277 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6562, 1.1031, 1.4752, 0.9914, 1.5334, 1.3936, 1.1183, 1.3976], device='cuda:0'), covar=tensor([0.3047, 0.0524, 0.0803, 0.1510, 0.2175, 0.1489, 0.0818, 0.0617], device='cuda:0'), in_proj_covar=tensor([0.0010, 0.0013, 0.0010, 0.0012, 0.0010, 0.0010, 0.0013, 0.0010], device='cuda:0'), out_proj_covar=tensor([4.4648e-05, 5.7863e-05, 4.6365e-05, 5.2994e-05, 4.7680e-05, 4.4769e-05, 5.3630e-05, 4.7014e-05], device='cuda:0') 2022-11-15 21:12:26,840 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2022-11-15 21:12:39,057 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40256.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:12:42,147 INFO [train.py:876] (0/4) Epoch 6, batch 3900, loss[loss=0.1104, simple_loss=0.1289, pruned_loss=0.04589, over 5476.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.1739, pruned_loss=0.07746, over 1085301.91 frames. ], batch size: 11, lr: 1.33e-02, grad_scale: 16.0 2022-11-15 21:12:49,546 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40272.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:12:54,505 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40279.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:13:05,284 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40295.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 21:13:20,315 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 1.970e+02 2.299e+02 2.764e+02 4.240e+02, threshold=4.598e+02, percent-clipped=2.0 2022-11-15 21:13:22,383 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1160, 4.5331, 4.2761, 3.9813, 4.3926, 4.0273, 1.7954, 4.4391], device='cuda:0'), covar=tensor([0.0329, 0.0217, 0.0289, 0.0394, 0.0327, 0.0369, 0.2993, 0.0327], device='cuda:0'), in_proj_covar=tensor([0.0098, 0.0078, 0.0079, 0.0069, 0.0093, 0.0081, 0.0129, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:13:28,948 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40330.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:13:30,870 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40333.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:13:37,590 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40343.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 21:13:49,826 INFO [train.py:876] (0/4) Epoch 6, batch 4000, loss[loss=0.1616, simple_loss=0.1751, pruned_loss=0.07406, over 5521.00 frames. ], tot_loss[loss=0.1671, simple_loss=0.1757, pruned_loss=0.07926, over 1083300.81 frames. ], batch size: 21, lr: 1.33e-02, grad_scale: 16.0 2022-11-15 21:14:10,419 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40391.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:14:28,142 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.247e+02 1.803e+02 2.226e+02 2.834e+02 5.770e+02, threshold=4.452e+02, percent-clipped=4.0 2022-11-15 21:14:32,582 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40423.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:14:35,136 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2417, 4.7621, 4.1290, 4.7976, 4.7351, 3.9871, 4.3540, 4.1082], device='cuda:0'), covar=tensor([0.0263, 0.0460, 0.1890, 0.0445, 0.0441, 0.0477, 0.0462, 0.0543], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0153, 0.0244, 0.0148, 0.0182, 0.0156, 0.0164, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:14:57,702 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2022-11-15 21:14:58,065 INFO [train.py:876] (0/4) Epoch 6, batch 4100, loss[loss=0.1648, simple_loss=0.1732, pruned_loss=0.07818, over 5693.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.1744, pruned_loss=0.0782, over 1078770.42 frames. ], batch size: 36, lr: 1.33e-02, grad_scale: 16.0 2022-11-15 21:15:04,544 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40471.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:15:04,617 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40471.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:15:06,036 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.92 vs. limit=5.0 2022-11-15 21:15:14,826 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2022-11-15 21:15:19,097 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40492.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:15:36,005 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.881e+02 2.239e+02 2.711e+02 4.699e+02, threshold=4.478e+02, percent-clipped=1.0 2022-11-15 21:15:37,404 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40519.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:15:49,815 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.45 vs. limit=5.0 2022-11-15 21:15:50,127 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7411, 2.1997, 1.8463, 1.3229, 1.5095, 2.5225, 1.8460, 2.5196], device='cuda:0'), covar=tensor([0.1800, 0.1384, 0.1193, 0.2501, 0.0861, 0.0435, 0.0451, 0.0632], device='cuda:0'), in_proj_covar=tensor([0.0184, 0.0192, 0.0147, 0.0196, 0.0162, 0.0164, 0.0137, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:15:59,207 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40551.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:16:00,607 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40553.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:16:05,972 INFO [train.py:876] (0/4) Epoch 6, batch 4200, loss[loss=0.1523, simple_loss=0.1734, pruned_loss=0.06557, over 5590.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.174, pruned_loss=0.07757, over 1085236.36 frames. ], batch size: 18, lr: 1.32e-02, grad_scale: 16.0 2022-11-15 21:16:14,935 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40574.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:16:44,496 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.208e+02 1.657e+02 2.106e+02 2.806e+02 4.425e+02, threshold=4.213e+02, percent-clipped=0.0 2022-11-15 21:16:52,160 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40628.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:17:06,846 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2367, 1.7178, 1.8941, 1.1306, 0.9174, 2.2599, 1.3807, 1.3806], device='cuda:0'), covar=tensor([0.0852, 0.0685, 0.0509, 0.1964, 0.4374, 0.1177, 0.1232, 0.1065], device='cuda:0'), in_proj_covar=tensor([0.0056, 0.0045, 0.0048, 0.0060, 0.0050, 0.0039, 0.0045, 0.0049], device='cuda:0'), out_proj_covar=tensor([1.1303e-04, 9.3670e-05, 9.8399e-05, 1.2264e-04, 1.0527e-04, 8.8981e-05, 9.5854e-05, 1.0011e-04], device='cuda:0') 2022-11-15 21:17:08,259 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40651.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:17:14,810 INFO [train.py:876] (0/4) Epoch 6, batch 4300, loss[loss=0.1562, simple_loss=0.1716, pruned_loss=0.07039, over 5694.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.1741, pruned_loss=0.07825, over 1079189.51 frames. ], batch size: 17, lr: 1.32e-02, grad_scale: 16.0 2022-11-15 21:17:30,315 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.54 vs. limit=2.0 2022-11-15 21:17:32,055 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40686.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:17:49,658 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40712.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:17:52,814 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.535e+01 1.873e+02 2.249e+02 2.828e+02 5.659e+02, threshold=4.498e+02, percent-clipped=5.0 2022-11-15 21:18:08,368 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8617, 2.0429, 2.0705, 1.2764, 2.0086, 2.6178, 2.2129, 2.5314], device='cuda:0'), covar=tensor([0.1705, 0.1354, 0.1118, 0.2503, 0.0674, 0.0501, 0.0335, 0.0777], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0187, 0.0145, 0.0192, 0.0160, 0.0162, 0.0137, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:18:17,217 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.92 vs. limit=2.0 2022-11-15 21:18:23,387 INFO [train.py:876] (0/4) Epoch 6, batch 4400, loss[loss=0.1528, simple_loss=0.1623, pruned_loss=0.07167, over 5578.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.1748, pruned_loss=0.07865, over 1081557.55 frames. ], batch size: 23, lr: 1.32e-02, grad_scale: 16.0 2022-11-15 21:18:34,377 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5548, 1.9031, 1.8078, 2.0569, 1.6716, 1.6152, 1.5277, 1.9057], device='cuda:0'), covar=tensor([0.1465, 0.1551, 0.1778, 0.0968, 0.1471, 0.1623, 0.1946, 0.0631], device='cuda:0'), in_proj_covar=tensor([0.0068, 0.0072, 0.0084, 0.0062, 0.0069, 0.0064, 0.0079, 0.0055], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:18:40,217 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.03 vs. limit=2.0 2022-11-15 21:19:01,938 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.711e+02 2.029e+02 2.549e+02 3.838e+02, threshold=4.057e+02, percent-clipped=0.0 2022-11-15 21:19:23,026 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=40848.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:19:24,983 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40851.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:19:32,349 INFO [train.py:876] (0/4) Epoch 6, batch 4500, loss[loss=0.1678, simple_loss=0.1873, pruned_loss=0.0741, over 5531.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.1734, pruned_loss=0.07682, over 1086005.22 frames. ], batch size: 40, lr: 1.32e-02, grad_scale: 16.0 2022-11-15 21:19:35,157 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40865.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:19:41,105 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40874.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:19:55,485 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1117, 0.9185, 1.3320, 0.6745, 0.9590, 1.2936, 0.8082, 1.1019], device='cuda:0'), covar=tensor([0.0036, 0.0037, 0.0029, 0.0035, 0.0029, 0.0023, 0.0038, 0.0105], device='cuda:0'), in_proj_covar=tensor([0.0035, 0.0032, 0.0034, 0.0034, 0.0032, 0.0031, 0.0034, 0.0029], device='cuda:0'), out_proj_covar=tensor([3.3388e-05, 3.2400e-05, 3.1292e-05, 3.1477e-05, 2.8501e-05, 2.6145e-05, 3.4462e-05, 2.6301e-05], device='cuda:0') 2022-11-15 21:19:58,074 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40899.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:19:58,859 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40900.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:19:59,444 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1322, 4.1157, 4.2770, 4.4933, 3.8804, 3.7064, 4.9458, 3.9114], device='cuda:0'), covar=tensor([0.0460, 0.1450, 0.0359, 0.1133, 0.0545, 0.0368, 0.0841, 0.0645], device='cuda:0'), in_proj_covar=tensor([0.0071, 0.0093, 0.0076, 0.0098, 0.0075, 0.0063, 0.0122, 0.0082], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:20:01,995 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8851, 4.8620, 4.9923, 5.1091, 4.5200, 4.2364, 5.5653, 4.6056], device='cuda:0'), covar=tensor([0.0333, 0.0859, 0.0265, 0.0804, 0.0462, 0.0241, 0.0667, 0.0424], device='cuda:0'), in_proj_covar=tensor([0.0071, 0.0094, 0.0076, 0.0099, 0.0075, 0.0063, 0.0122, 0.0082], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:20:03,448 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3254, 3.1015, 3.0874, 1.2780, 3.1062, 3.4594, 3.2245, 3.7093], device='cuda:0'), covar=tensor([0.2030, 0.1542, 0.0731, 0.3435, 0.0329, 0.0618, 0.0341, 0.0577], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0186, 0.0144, 0.0192, 0.0159, 0.0162, 0.0139, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:20:10,192 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.352e+02 1.848e+02 2.167e+02 2.574e+02 4.684e+02, threshold=4.333e+02, percent-clipped=2.0 2022-11-15 21:20:13,541 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40922.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:20:16,361 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40926.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:20:17,593 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40928.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:20:18,871 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5468, 5.0854, 4.5410, 5.0376, 5.0869, 4.3437, 4.7620, 4.1966], device='cuda:0'), covar=tensor([0.0216, 0.0457, 0.1204, 0.0478, 0.0366, 0.0409, 0.0427, 0.0689], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0157, 0.0246, 0.0151, 0.0185, 0.0158, 0.0166, 0.0150], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:20:35,267 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.9236, 5.2277, 5.7009, 5.2440, 5.9924, 5.8830, 4.8415, 5.8710], device='cuda:0'), covar=tensor([0.0271, 0.0259, 0.0281, 0.0290, 0.0287, 0.0113, 0.0225, 0.0210], device='cuda:0'), in_proj_covar=tensor([0.0106, 0.0118, 0.0088, 0.0119, 0.0128, 0.0075, 0.0101, 0.0116], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:20:39,752 INFO [train.py:876] (0/4) Epoch 6, batch 4600, loss[loss=0.1147, simple_loss=0.1418, pruned_loss=0.0438, over 5457.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.1721, pruned_loss=0.07548, over 1086782.80 frames. ], batch size: 10, lr: 1.32e-02, grad_scale: 16.0 2022-11-15 21:20:39,920 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=40961.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 21:20:50,070 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=40976.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:20:56,732 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=40986.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:21:04,233 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=40997.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:21:11,462 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41007.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:21:17,997 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.184e+02 1.825e+02 2.193e+02 2.554e+02 5.272e+02, threshold=4.385e+02, percent-clipped=4.0 2022-11-15 21:21:29,541 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41034.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:21:36,927 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41045.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:21:45,908 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41058.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:21:48,093 INFO [train.py:876] (0/4) Epoch 6, batch 4700, loss[loss=0.1492, simple_loss=0.1534, pruned_loss=0.07248, over 5340.00 frames. ], tot_loss[loss=0.1609, simple_loss=0.1714, pruned_loss=0.0752, over 1087804.85 frames. ], batch size: 6, lr: 1.32e-02, grad_scale: 16.0 2022-11-15 21:22:16,159 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41102.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:22:18,822 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41106.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:22:25,666 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7594, 4.6626, 4.9797, 5.0132, 4.5957, 4.4786, 5.5034, 4.8250], device='cuda:0'), covar=tensor([0.0411, 0.0855, 0.0310, 0.0900, 0.0426, 0.0311, 0.0749, 0.0488], device='cuda:0'), in_proj_covar=tensor([0.0069, 0.0090, 0.0075, 0.0096, 0.0073, 0.0062, 0.0120, 0.0079], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:22:26,238 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.964e+01 1.808e+02 2.339e+02 2.942e+02 4.729e+02, threshold=4.678e+02, percent-clipped=2.0 2022-11-15 21:22:47,071 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41148.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:22:56,341 INFO [train.py:876] (0/4) Epoch 6, batch 4800, loss[loss=0.1634, simple_loss=0.1884, pruned_loss=0.06921, over 5732.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.1708, pruned_loss=0.07496, over 1086132.69 frames. ], batch size: 20, lr: 1.31e-02, grad_scale: 16.0 2022-11-15 21:22:57,802 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41163.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:23:18,064 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.5977, 1.2072, 1.1100, 0.8111, 1.1184, 1.3968, 0.3961, 1.1319], device='cuda:0'), covar=tensor([0.0024, 0.0014, 0.0019, 0.0021, 0.0019, 0.0015, 0.0042, 0.0020], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0031, 0.0033, 0.0033, 0.0031, 0.0030, 0.0034, 0.0028], device='cuda:0'), out_proj_covar=tensor([3.2285e-05, 3.1409e-05, 2.9989e-05, 3.0489e-05, 2.7445e-05, 2.5104e-05, 3.4308e-05, 2.5033e-05], device='cuda:0') 2022-11-15 21:23:19,851 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41196.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:23:31,352 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8208, 4.2666, 3.3460, 2.0320, 4.1712, 1.7154, 4.0772, 2.4218], device='cuda:0'), covar=tensor([0.1228, 0.0131, 0.0568, 0.2196, 0.0175, 0.1923, 0.0196, 0.1630], device='cuda:0'), in_proj_covar=tensor([0.0131, 0.0102, 0.0113, 0.0120, 0.0106, 0.0129, 0.0098, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 21:23:35,273 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.896e+02 2.366e+02 3.104e+02 6.971e+02, threshold=4.733e+02, percent-clipped=2.0 2022-11-15 21:23:37,413 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41221.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:24:00,968 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41256.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 21:24:04,478 INFO [train.py:876] (0/4) Epoch 6, batch 4900, loss[loss=0.1493, simple_loss=0.1686, pruned_loss=0.06505, over 5576.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.1706, pruned_loss=0.07506, over 1081813.91 frames. ], batch size: 40, lr: 1.31e-02, grad_scale: 16.0 2022-11-15 21:24:35,582 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41307.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:24:43,024 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.647e+02 1.947e+02 2.444e+02 4.412e+02, threshold=3.894e+02, percent-clipped=0.0 2022-11-15 21:24:50,751 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0272, 4.6567, 3.5065, 2.2025, 4.4684, 1.9008, 4.1780, 2.7074], device='cuda:0'), covar=tensor([0.1249, 0.0177, 0.0608, 0.1945, 0.0194, 0.1846, 0.0175, 0.1699], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0102, 0.0113, 0.0118, 0.0104, 0.0128, 0.0097, 0.0120], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 21:25:07,255 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41353.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:25:08,584 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41355.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:25:12,444 INFO [train.py:876] (0/4) Epoch 6, batch 5000, loss[loss=0.1003, simple_loss=0.1203, pruned_loss=0.04011, over 5183.00 frames. ], tot_loss[loss=0.1623, simple_loss=0.1718, pruned_loss=0.07645, over 1080587.09 frames. ], batch size: 8, lr: 1.31e-02, grad_scale: 16.0 2022-11-15 21:25:39,901 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41401.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:25:51,843 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.916e+02 2.252e+02 2.845e+02 5.364e+02, threshold=4.504e+02, percent-clipped=7.0 2022-11-15 21:26:01,488 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.4807, 4.8890, 5.2407, 4.8080, 5.6020, 5.4653, 4.7566, 5.4873], device='cuda:0'), covar=tensor([0.0283, 0.0252, 0.0406, 0.0290, 0.0248, 0.0077, 0.0169, 0.0179], device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0119, 0.0091, 0.0120, 0.0129, 0.0075, 0.0103, 0.0117], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:26:18,842 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=41458.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:26:20,687 INFO [train.py:876] (0/4) Epoch 6, batch 5100, loss[loss=0.1656, simple_loss=0.1783, pruned_loss=0.07643, over 5689.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.1708, pruned_loss=0.07499, over 1083542.35 frames. ], batch size: 19, lr: 1.31e-02, grad_scale: 16.0 2022-11-15 21:26:43,906 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1159, 4.1439, 3.8301, 3.8170, 4.2225, 4.0211, 1.5953, 4.2238], device='cuda:0'), covar=tensor([0.0291, 0.0301, 0.0474, 0.0354, 0.0338, 0.0256, 0.2958, 0.0334], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0074, 0.0076, 0.0068, 0.0090, 0.0077, 0.0124, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:26:59,645 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.689e+02 2.100e+02 2.600e+02 5.796e+02, threshold=4.200e+02, percent-clipped=2.0 2022-11-15 21:27:02,070 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41521.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:27:18,855 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8826, 4.0578, 3.8295, 3.8560, 4.0149, 3.9139, 1.3632, 4.1380], device='cuda:0'), covar=tensor([0.0252, 0.0197, 0.0279, 0.0226, 0.0259, 0.0254, 0.3047, 0.0226], device='cuda:0'), in_proj_covar=tensor([0.0096, 0.0075, 0.0077, 0.0069, 0.0090, 0.0077, 0.0126, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:27:25,836 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41556.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 21:27:28,987 INFO [train.py:876] (0/4) Epoch 6, batch 5200, loss[loss=0.1989, simple_loss=0.1933, pruned_loss=0.1023, over 5751.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.1719, pruned_loss=0.07578, over 1086047.35 frames. ], batch size: 27, lr: 1.31e-02, grad_scale: 16.0 2022-11-15 21:27:34,213 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41569.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:27:58,401 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41604.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:28:08,059 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.794e+02 2.245e+02 2.810e+02 6.868e+02, threshold=4.491e+02, percent-clipped=3.0 2022-11-15 21:28:12,295 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7151, 1.2289, 1.0940, 1.1917, 0.9327, 1.2204, 1.0807, 1.1495], device='cuda:0'), covar=tensor([0.1697, 0.0993, 0.0966, 0.0406, 0.1186, 0.1027, 0.0889, 0.0416], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0072, 0.0086, 0.0064, 0.0071, 0.0067, 0.0080, 0.0056], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:28:18,258 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4942, 1.7497, 1.2211, 1.0150, 1.4369, 1.6789, 1.2950, 1.0519], device='cuda:0'), covar=tensor([0.0019, 0.0019, 0.0043, 0.0024, 0.0028, 0.0025, 0.0019, 0.0023], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0017, 0.0018, 0.0020, 0.0019, 0.0017, 0.0020, 0.0020], device='cuda:0'), out_proj_covar=tensor([1.7948e-05, 1.6785e-05, 1.8010e-05, 2.0243e-05, 1.8104e-05, 1.7707e-05, 1.9202e-05, 2.1345e-05], device='cuda:0') 2022-11-15 21:28:32,314 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41653.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:28:33,023 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41654.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:28:35,071 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3200, 1.3970, 1.6525, 1.5971, 1.7246, 1.5897, 1.9748, 1.5827], device='cuda:0'), covar=tensor([0.0068, 0.0044, 0.0053, 0.0019, 0.0048, 0.0087, 0.0015, 0.0028], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0017, 0.0018, 0.0020, 0.0019, 0.0017, 0.0020, 0.0020], device='cuda:0'), out_proj_covar=tensor([1.7968e-05, 1.6707e-05, 1.7841e-05, 2.0105e-05, 1.7983e-05, 1.7712e-05, 1.9034e-05, 2.1387e-05], device='cuda:0') 2022-11-15 21:28:37,523 INFO [train.py:876] (0/4) Epoch 6, batch 5300, loss[loss=0.1355, simple_loss=0.157, pruned_loss=0.05696, over 5526.00 frames. ], tot_loss[loss=0.16, simple_loss=0.1711, pruned_loss=0.07445, over 1085739.52 frames. ], batch size: 13, lr: 1.31e-02, grad_scale: 16.0 2022-11-15 21:29:04,431 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6225, 4.2475, 4.4573, 4.2767, 4.7496, 4.5180, 4.2882, 4.7625], device='cuda:0'), covar=tensor([0.0455, 0.0288, 0.0482, 0.0339, 0.0363, 0.0193, 0.0224, 0.0259], device='cuda:0'), in_proj_covar=tensor([0.0110, 0.0117, 0.0088, 0.0119, 0.0128, 0.0075, 0.0101, 0.0115], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:29:05,163 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41701.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:29:05,240 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41701.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:29:15,012 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41715.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 21:29:16,828 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.330e+02 1.882e+02 2.155e+02 2.589e+02 5.849e+02, threshold=4.310e+02, percent-clipped=2.0 2022-11-15 21:29:33,590 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0122, 4.2967, 3.7755, 3.4799, 3.8927, 3.9792, 1.6431, 4.1817], device='cuda:0'), covar=tensor([0.0312, 0.0269, 0.0483, 0.0675, 0.0461, 0.0347, 0.3678, 0.0325], device='cuda:0'), in_proj_covar=tensor([0.0101, 0.0078, 0.0080, 0.0072, 0.0094, 0.0081, 0.0132, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:29:38,513 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41749.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:29:44,494 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=41758.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:29:46,714 INFO [train.py:876] (0/4) Epoch 6, batch 5400, loss[loss=0.1425, simple_loss=0.1604, pruned_loss=0.06226, over 5554.00 frames. ], tot_loss[loss=0.16, simple_loss=0.1713, pruned_loss=0.07439, over 1082765.59 frames. ], batch size: 13, lr: 1.31e-02, grad_scale: 16.0 2022-11-15 21:30:17,680 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=41806.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:30:26,062 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.040e+02 1.844e+02 2.196e+02 2.605e+02 6.100e+02, threshold=4.391e+02, percent-clipped=2.0 2022-11-15 21:30:34,294 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9222, 3.1706, 3.1478, 3.0239, 3.0815, 3.0663, 1.2795, 3.1791], device='cuda:0'), covar=tensor([0.0345, 0.0231, 0.0231, 0.0236, 0.0323, 0.0321, 0.2841, 0.0308], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0076, 0.0078, 0.0070, 0.0093, 0.0080, 0.0131, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:30:55,102 INFO [train.py:876] (0/4) Epoch 6, batch 5500, loss[loss=0.1598, simple_loss=0.1789, pruned_loss=0.07032, over 5783.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.1717, pruned_loss=0.07488, over 1082882.71 frames. ], batch size: 21, lr: 1.30e-02, grad_scale: 16.0 2022-11-15 21:31:25,961 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41906.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:31:27,953 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8307, 1.9098, 1.6814, 2.1222, 1.6172, 1.4754, 1.7231, 2.3436], device='cuda:0'), covar=tensor([0.1155, 0.2017, 0.2833, 0.1048, 0.1832, 0.2182, 0.1967, 0.0965], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0072, 0.0087, 0.0063, 0.0072, 0.0068, 0.0081, 0.0057], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:31:32,929 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2022-11-15 21:31:33,899 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.786e+02 2.219e+02 2.876e+02 6.539e+02, threshold=4.438e+02, percent-clipped=3.0 2022-11-15 21:31:44,488 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. limit=2.0 2022-11-15 21:31:46,190 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1934, 4.1558, 3.7778, 3.6738, 4.2072, 3.9331, 1.7211, 4.3231], device='cuda:0'), covar=tensor([0.0272, 0.0394, 0.0370, 0.0306, 0.0243, 0.0317, 0.2869, 0.0325], device='cuda:0'), in_proj_covar=tensor([0.0098, 0.0075, 0.0077, 0.0069, 0.0092, 0.0079, 0.0128, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:31:54,695 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=41948.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 21:32:03,400 INFO [train.py:876] (0/4) Epoch 6, batch 5600, loss[loss=0.195, simple_loss=0.1916, pruned_loss=0.09917, over 5492.00 frames. ], tot_loss[loss=0.162, simple_loss=0.1731, pruned_loss=0.07542, over 1090940.87 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 16.0 2022-11-15 21:32:05,236 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6452, 1.3828, 1.6050, 1.1964, 1.9060, 1.5247, 1.1365, 1.8077], device='cuda:0'), covar=tensor([0.0500, 0.0890, 0.0638, 0.1085, 0.0429, 0.0941, 0.1655, 0.0617], device='cuda:0'), in_proj_covar=tensor([0.0197, 0.0215, 0.0204, 0.0327, 0.0218, 0.0216, 0.0197, 0.0195], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:32:07,833 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=41967.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:32:08,595 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-15 21:32:25,271 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8797, 1.7598, 2.5681, 2.3891, 2.3331, 1.7496, 2.2692, 2.7360], device='cuda:0'), covar=tensor([0.0337, 0.1060, 0.0516, 0.0702, 0.0613, 0.1039, 0.0681, 0.0459], device='cuda:0'), in_proj_covar=tensor([0.0212, 0.0192, 0.0192, 0.0212, 0.0197, 0.0187, 0.0229, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:32:36,403 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42009.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 21:32:36,942 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42010.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 21:32:40,654 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6828, 2.3997, 2.7988, 3.7197, 3.8002, 2.7462, 2.3353, 3.7889], device='cuda:0'), covar=tensor([0.0392, 0.2643, 0.2070, 0.3949, 0.0969, 0.3101, 0.2195, 0.0422], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0212, 0.0202, 0.0323, 0.0216, 0.0214, 0.0195, 0.0196], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:32:42,424 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.893e+02 2.253e+02 2.739e+02 4.723e+02, threshold=4.505e+02, percent-clipped=2.0 2022-11-15 21:32:52,209 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.26 vs. limit=2.0 2022-11-15 21:32:58,977 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0525, 2.4338, 1.9936, 1.4215, 2.1693, 2.5242, 2.1746, 2.5510], device='cuda:0'), covar=tensor([0.1202, 0.0898, 0.1088, 0.1812, 0.0490, 0.0491, 0.0357, 0.0667], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0191, 0.0148, 0.0197, 0.0161, 0.0167, 0.0144, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:33:00,224 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2316, 4.4230, 2.7949, 4.1925, 3.4375, 3.1557, 2.4706, 3.6672], device='cuda:0'), covar=tensor([0.1481, 0.0176, 0.0995, 0.0183, 0.0487, 0.0799, 0.1586, 0.0273], device='cuda:0'), in_proj_covar=tensor([0.0174, 0.0133, 0.0170, 0.0135, 0.0170, 0.0180, 0.0178, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:33:11,599 INFO [train.py:876] (0/4) Epoch 6, batch 5700, loss[loss=0.2039, simple_loss=0.1831, pruned_loss=0.1124, over 4197.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.1721, pruned_loss=0.0752, over 1089774.49 frames. ], batch size: 181, lr: 1.30e-02, grad_scale: 16.0 2022-11-15 21:33:51,625 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.009e+02 1.803e+02 2.125e+02 2.518e+02 5.093e+02, threshold=4.250e+02, percent-clipped=1.0 2022-11-15 21:34:23,229 INFO [train.py:876] (0/4) Epoch 6, batch 5800, loss[loss=0.1519, simple_loss=0.1826, pruned_loss=0.06065, over 5715.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.1723, pruned_loss=0.07542, over 1086171.57 frames. ], batch size: 36, lr: 1.30e-02, grad_scale: 16.0 2022-11-15 21:35:01,870 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.784e+02 2.140e+02 2.674e+02 6.368e+02, threshold=4.280e+02, percent-clipped=2.0 2022-11-15 21:35:31,646 INFO [train.py:876] (0/4) Epoch 6, batch 5900, loss[loss=0.121, simple_loss=0.1503, pruned_loss=0.04584, over 5764.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.1715, pruned_loss=0.07491, over 1091515.47 frames. ], batch size: 14, lr: 1.30e-02, grad_scale: 16.0 2022-11-15 21:35:32,385 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42262.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:35:34,357 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8700, 3.5316, 3.7018, 3.4277, 3.9635, 3.7004, 3.6700, 3.9122], device='cuda:0'), covar=tensor([0.0358, 0.0337, 0.0446, 0.0394, 0.0348, 0.0240, 0.0265, 0.0353], device='cuda:0'), in_proj_covar=tensor([0.0110, 0.0118, 0.0089, 0.0120, 0.0129, 0.0076, 0.0101, 0.0117], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 21:35:56,165 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7838, 4.6861, 4.5745, 3.9358, 4.7924, 4.1390, 2.3170, 5.0275], device='cuda:0'), covar=tensor([0.0265, 0.0314, 0.0267, 0.0418, 0.0321, 0.0426, 0.2936, 0.0303], device='cuda:0'), in_proj_covar=tensor([0.0102, 0.0078, 0.0078, 0.0071, 0.0097, 0.0082, 0.0132, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:36:00,789 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42304.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 21:36:05,185 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=42310.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 21:36:10,905 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 1.865e+02 2.230e+02 2.875e+02 6.515e+02, threshold=4.459e+02, percent-clipped=7.0 2022-11-15 21:36:26,331 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.36 vs. limit=2.0 2022-11-15 21:36:37,637 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=42358.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:36:39,543 INFO [train.py:876] (0/4) Epoch 6, batch 6000, loss[loss=0.1682, simple_loss=0.1887, pruned_loss=0.07387, over 5737.00 frames. ], tot_loss[loss=0.1583, simple_loss=0.1697, pruned_loss=0.07339, over 1091787.44 frames. ], batch size: 20, lr: 1.30e-02, grad_scale: 8.0 2022-11-15 21:36:39,544 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 21:36:44,595 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.1787, 4.7883, 4.8129, 4.6813, 5.2149, 5.2290, 4.6285, 5.2649], device='cuda:0'), covar=tensor([0.0228, 0.0297, 0.0377, 0.0451, 0.0352, 0.0091, 0.0280, 0.0236], device='cuda:0'), in_proj_covar=tensor([0.0114, 0.0121, 0.0092, 0.0123, 0.0132, 0.0078, 0.0103, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 21:36:53,912 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9206, 1.5249, 1.4645, 1.3568, 1.4240, 2.0522, 1.2620, 1.7074], device='cuda:0'), covar=tensor([0.0029, 0.0131, 0.0035, 0.0035, 0.0124, 0.0048, 0.0025, 0.0035], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0017, 0.0018, 0.0021, 0.0019, 0.0018, 0.0020, 0.0021], device='cuda:0'), out_proj_covar=tensor([1.7873e-05, 1.7179e-05, 1.8034e-05, 2.1060e-05, 1.8683e-05, 1.8386e-05, 1.9998e-05, 2.2163e-05], device='cuda:0') 2022-11-15 21:36:55,158 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6799, 2.2641, 2.9815, 3.5466, 3.8355, 2.7020, 2.2151, 3.7644], device='cuda:0'), covar=tensor([0.0426, 0.3892, 0.2102, 0.2723, 0.0770, 0.3419, 0.2631, 0.0299], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0212, 0.0203, 0.0323, 0.0221, 0.0217, 0.0196, 0.0193], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:36:57,490 INFO [train.py:908] (0/4) Epoch 6, validation: loss=0.1626, simple_loss=0.1837, pruned_loss=0.07077, over 1530663.00 frames. 2022-11-15 21:36:57,490 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 21:37:09,889 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42379.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:37:21,389 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6137, 1.8750, 2.1036, 1.2441, 1.0554, 2.3870, 1.6371, 1.8875], device='cuda:0'), covar=tensor([0.0889, 0.0547, 0.0632, 0.2343, 0.3830, 0.1497, 0.1304, 0.0864], device='cuda:0'), in_proj_covar=tensor([0.0056, 0.0044, 0.0049, 0.0060, 0.0047, 0.0039, 0.0045, 0.0049], device='cuda:0'), out_proj_covar=tensor([1.1422e-04, 9.2343e-05, 1.0043e-04, 1.2474e-04, 1.0309e-04, 8.8974e-05, 9.7217e-05, 1.0125e-04], device='cuda:0') 2022-11-15 21:37:24,549 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9340, 2.2803, 2.8140, 3.7023, 4.0724, 2.9730, 2.7680, 3.9645], device='cuda:0'), covar=tensor([0.0295, 0.3794, 0.2245, 0.4052, 0.0876, 0.3280, 0.2272, 0.0339], device='cuda:0'), in_proj_covar=tensor([0.0193, 0.0210, 0.0202, 0.0320, 0.0218, 0.0214, 0.0194, 0.0191], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:37:37,847 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.691e+02 2.055e+02 2.547e+02 4.668e+02, threshold=4.111e+02, percent-clipped=1.0 2022-11-15 21:37:52,149 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42440.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:38:06,270 INFO [train.py:876] (0/4) Epoch 6, batch 6100, loss[loss=0.139, simple_loss=0.1615, pruned_loss=0.05828, over 5773.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.1717, pruned_loss=0.07568, over 1087982.25 frames. ], batch size: 16, lr: 1.29e-02, grad_scale: 8.0 2022-11-15 21:38:22,983 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.0460, 4.9484, 4.8261, 5.2236, 4.4607, 4.0964, 5.5759, 4.8628], device='cuda:0'), covar=tensor([0.0247, 0.0634, 0.0231, 0.0655, 0.0402, 0.0251, 0.0533, 0.0369], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0093, 0.0078, 0.0099, 0.0075, 0.0065, 0.0123, 0.0082], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:38:37,127 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42506.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:38:45,893 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.891e+02 2.304e+02 2.895e+02 7.790e+02, threshold=4.608e+02, percent-clipped=4.0 2022-11-15 21:39:14,512 INFO [train.py:876] (0/4) Epoch 6, batch 6200, loss[loss=0.1521, simple_loss=0.1661, pruned_loss=0.06903, over 5598.00 frames. ], tot_loss[loss=0.1599, simple_loss=0.171, pruned_loss=0.07441, over 1088415.80 frames. ], batch size: 38, lr: 1.29e-02, grad_scale: 8.0 2022-11-15 21:39:15,260 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=42562.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:39:18,947 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42567.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:39:26,652 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.73 vs. limit=2.0 2022-11-15 21:39:44,649 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=42604.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 21:39:48,427 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=42610.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:39:54,752 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.838e+01 1.745e+02 2.107e+02 2.704e+02 4.645e+02, threshold=4.215e+02, percent-clipped=1.0 2022-11-15 21:40:15,000 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42649.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:40:17,232 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=42652.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 21:40:23,151 INFO [train.py:876] (0/4) Epoch 6, batch 6300, loss[loss=0.1329, simple_loss=0.1554, pruned_loss=0.05519, over 5726.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.1716, pruned_loss=0.07531, over 1080335.57 frames. ], batch size: 14, lr: 1.29e-02, grad_scale: 8.0 2022-11-15 21:40:45,377 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5333, 3.8686, 3.0090, 1.8618, 3.7843, 1.5096, 3.6778, 1.9114], device='cuda:0'), covar=tensor([0.1241, 0.0141, 0.0663, 0.1819, 0.0154, 0.1830, 0.0170, 0.1835], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0101, 0.0112, 0.0117, 0.0105, 0.0128, 0.0098, 0.0120], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 21:40:48,669 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0208, 2.3699, 3.3885, 3.1992, 3.9189, 2.5155, 3.3467, 3.9010], device='cuda:0'), covar=tensor([0.0435, 0.1292, 0.0597, 0.1091, 0.0276, 0.1156, 0.0839, 0.0566], device='cuda:0'), in_proj_covar=tensor([0.0213, 0.0193, 0.0193, 0.0209, 0.0192, 0.0186, 0.0227, 0.0214], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:40:55,889 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42710.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:41:02,049 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.507e+01 1.912e+02 2.400e+02 2.974e+02 5.468e+02, threshold=4.801e+02, percent-clipped=3.0 2022-11-15 21:41:03,909 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 21:41:13,147 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42735.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:41:14,126 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.84 vs. limit=2.0 2022-11-15 21:41:31,362 INFO [train.py:876] (0/4) Epoch 6, batch 6400, loss[loss=0.1412, simple_loss=0.1613, pruned_loss=0.06053, over 5737.00 frames. ], tot_loss[loss=0.1614, simple_loss=0.1724, pruned_loss=0.0752, over 1086626.56 frames. ], batch size: 11, lr: 1.29e-02, grad_scale: 8.0 2022-11-15 21:42:09,672 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3399, 4.3473, 2.9757, 4.2164, 3.3738, 3.1127, 2.6080, 3.7855], device='cuda:0'), covar=tensor([0.1720, 0.0266, 0.0988, 0.0317, 0.0646, 0.0834, 0.1716, 0.0290], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0132, 0.0168, 0.0139, 0.0173, 0.0179, 0.0182, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:42:11,496 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.772e+02 2.214e+02 2.586e+02 5.921e+02, threshold=4.429e+02, percent-clipped=3.0 2022-11-15 21:42:38,690 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42859.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:42:39,812 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.07 vs. limit=2.0 2022-11-15 21:42:39,913 INFO [train.py:876] (0/4) Epoch 6, batch 6500, loss[loss=0.1228, simple_loss=0.1462, pruned_loss=0.04971, over 5696.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.1712, pruned_loss=0.07481, over 1087959.01 frames. ], batch size: 19, lr: 1.29e-02, grad_scale: 8.0 2022-11-15 21:42:40,659 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=42862.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:42:50,353 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7323, 2.3583, 3.0447, 3.6232, 3.9536, 2.7614, 2.1750, 3.7229], device='cuda:0'), covar=tensor([0.0429, 0.3181, 0.1774, 0.3727, 0.0650, 0.2712, 0.2452, 0.0488], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0207, 0.0197, 0.0324, 0.0219, 0.0212, 0.0193, 0.0195], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:42:52,369 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=42879.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:43:18,179 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3643, 1.0446, 1.6298, 1.3104, 1.3063, 1.2780, 1.4580, 1.3678], device='cuda:0'), covar=tensor([0.0073, 0.0082, 0.0040, 0.0029, 0.0062, 0.0057, 0.0025, 0.0037], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0016, 0.0017, 0.0019, 0.0018, 0.0016, 0.0018, 0.0019], device='cuda:0'), out_proj_covar=tensor([1.5887e-05, 1.6413e-05, 1.6376e-05, 1.8941e-05, 1.7191e-05, 1.6467e-05, 1.7582e-05, 2.0458e-05], device='cuda:0') 2022-11-15 21:43:19,262 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.824e+02 2.207e+02 2.735e+02 7.842e+02, threshold=4.414e+02, percent-clipped=4.0 2022-11-15 21:43:20,105 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42920.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:43:25,741 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9177, 2.6390, 2.3394, 1.6327, 2.5109, 2.9119, 2.5535, 2.8855], device='cuda:0'), covar=tensor([0.2017, 0.1402, 0.0888, 0.2432, 0.0421, 0.0508, 0.0411, 0.0805], device='cuda:0'), in_proj_covar=tensor([0.0183, 0.0186, 0.0147, 0.0194, 0.0159, 0.0167, 0.0137, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:43:33,781 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=42940.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:43:47,834 INFO [train.py:876] (0/4) Epoch 6, batch 6600, loss[loss=0.1348, simple_loss=0.1585, pruned_loss=0.05556, over 5556.00 frames. ], tot_loss[loss=0.1594, simple_loss=0.1706, pruned_loss=0.07412, over 1086378.55 frames. ], batch size: 15, lr: 1.29e-02, grad_scale: 8.0 2022-11-15 21:44:18,660 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=43005.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:44:28,196 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.711e+02 2.097e+02 2.455e+02 4.370e+02, threshold=4.194e+02, percent-clipped=0.0 2022-11-15 21:44:39,646 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43035.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:44:56,951 INFO [train.py:876] (0/4) Epoch 6, batch 6700, loss[loss=0.1463, simple_loss=0.1668, pruned_loss=0.06292, over 5715.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.1708, pruned_loss=0.07436, over 1084974.28 frames. ], batch size: 28, lr: 1.29e-02, grad_scale: 8.0 2022-11-15 21:45:01,121 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2022-11-15 21:45:12,505 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43083.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:45:34,145 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-15 21:45:36,326 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.495e+01 1.976e+02 2.503e+02 2.987e+02 6.000e+02, threshold=5.005e+02, percent-clipped=7.0 2022-11-15 21:46:01,222 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5543, 4.0334, 3.5335, 3.5380, 2.1178, 3.9330, 2.1650, 3.3010], device='cuda:0'), covar=tensor([0.0433, 0.0144, 0.0174, 0.0276, 0.0457, 0.0119, 0.0409, 0.0098], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0140, 0.0154, 0.0170, 0.0173, 0.0152, 0.0165, 0.0139], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:46:04,886 INFO [train.py:876] (0/4) Epoch 6, batch 6800, loss[loss=0.132, simple_loss=0.157, pruned_loss=0.05347, over 5541.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.1715, pruned_loss=0.07508, over 1083527.83 frames. ], batch size: 15, lr: 1.28e-02, grad_scale: 8.0 2022-11-15 21:46:05,663 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43162.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:46:29,801 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9994, 3.1322, 2.9790, 3.0500, 3.0960, 3.1063, 1.2430, 3.3298], device='cuda:0'), covar=tensor([0.0423, 0.0278, 0.0388, 0.0254, 0.0360, 0.0368, 0.3102, 0.0313], device='cuda:0'), in_proj_covar=tensor([0.0099, 0.0077, 0.0078, 0.0069, 0.0094, 0.0081, 0.0127, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:46:31,446 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.92 vs. limit=2.0 2022-11-15 21:46:38,996 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43210.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:46:39,890 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2022-11-15 21:46:42,320 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=43215.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:46:44,823 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 1.755e+02 2.083e+02 2.590e+02 5.758e+02, threshold=4.166e+02, percent-clipped=2.0 2022-11-15 21:46:55,882 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=43235.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:47:02,299 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6340, 2.5763, 2.1447, 3.0682, 2.1122, 2.6080, 2.5979, 3.2178], device='cuda:0'), covar=tensor([0.0983, 0.1478, 0.2787, 0.0792, 0.1997, 0.1168, 0.1879, 0.2803], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0070, 0.0087, 0.0062, 0.0070, 0.0068, 0.0079, 0.0055], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:47:13,944 INFO [train.py:876] (0/4) Epoch 6, batch 6900, loss[loss=0.166, simple_loss=0.163, pruned_loss=0.08452, over 5706.00 frames. ], tot_loss[loss=0.1589, simple_loss=0.1702, pruned_loss=0.07379, over 1092909.85 frames. ], batch size: 28, lr: 1.28e-02, grad_scale: 8.0 2022-11-15 21:47:23,024 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6579, 3.4518, 3.2620, 1.5630, 3.2257, 3.6290, 3.3500, 3.9224], device='cuda:0'), covar=tensor([0.1438, 0.0990, 0.0948, 0.2432, 0.0271, 0.0616, 0.0237, 0.0484], device='cuda:0'), in_proj_covar=tensor([0.0184, 0.0187, 0.0146, 0.0190, 0.0160, 0.0166, 0.0136, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:47:32,972 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.98 vs. limit=2.0 2022-11-15 21:47:44,594 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43305.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:47:53,737 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 1.786e+02 2.260e+02 2.890e+02 4.786e+02, threshold=4.520e+02, percent-clipped=4.0 2022-11-15 21:48:16,246 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5168, 2.2455, 2.7640, 3.4851, 3.4752, 2.6961, 2.1275, 3.5902], device='cuda:0'), covar=tensor([0.0499, 0.2877, 0.2069, 0.2634, 0.1082, 0.2439, 0.2144, 0.0586], device='cuda:0'), in_proj_covar=tensor([0.0197, 0.0212, 0.0200, 0.0325, 0.0218, 0.0212, 0.0193, 0.0199], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0005, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:48:16,705 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43353.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:48:22,248 INFO [train.py:876] (0/4) Epoch 6, batch 7000, loss[loss=0.2421, simple_loss=0.2207, pruned_loss=0.1317, over 5459.00 frames. ], tot_loss[loss=0.1583, simple_loss=0.1699, pruned_loss=0.07339, over 1093126.04 frames. ], batch size: 53, lr: 1.28e-02, grad_scale: 8.0 2022-11-15 21:48:49,162 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0329, 4.7001, 3.6363, 2.0162, 4.5248, 1.7100, 4.4530, 2.6407], device='cuda:0'), covar=tensor([0.1084, 0.0111, 0.0499, 0.2164, 0.0116, 0.1873, 0.0118, 0.1475], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0101, 0.0113, 0.0118, 0.0105, 0.0129, 0.0096, 0.0119], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 21:48:59,713 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1831, 2.7597, 2.7168, 1.4023, 2.8872, 3.2118, 3.0049, 3.2258], device='cuda:0'), covar=tensor([0.1919, 0.1563, 0.0764, 0.2735, 0.0513, 0.0519, 0.0292, 0.0707], device='cuda:0'), in_proj_covar=tensor([0.0184, 0.0187, 0.0144, 0.0189, 0.0161, 0.0166, 0.0136, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:49:02,129 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.834e+02 2.170e+02 2.645e+02 5.568e+02, threshold=4.340e+02, percent-clipped=1.0 2022-11-15 21:49:30,778 INFO [train.py:876] (0/4) Epoch 6, batch 7100, loss[loss=0.1455, simple_loss=0.1632, pruned_loss=0.06388, over 5634.00 frames. ], tot_loss[loss=0.1555, simple_loss=0.168, pruned_loss=0.07147, over 1089702.19 frames. ], batch size: 38, lr: 1.28e-02, grad_scale: 8.0 2022-11-15 21:49:55,924 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.98 vs. limit=5.0 2022-11-15 21:49:57,032 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.14 vs. limit=5.0 2022-11-15 21:50:06,731 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=43514.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:50:07,362 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43515.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:50:10,161 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.966e+01 1.749e+02 2.086e+02 2.685e+02 6.877e+02, threshold=4.173e+02, percent-clipped=3.0 2022-11-15 21:50:21,421 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=43535.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:50:31,264 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=8.04 vs. limit=5.0 2022-11-15 21:50:38,609 INFO [train.py:876] (0/4) Epoch 6, batch 7200, loss[loss=0.1664, simple_loss=0.1594, pruned_loss=0.08676, over 5016.00 frames. ], tot_loss[loss=0.1585, simple_loss=0.1695, pruned_loss=0.07376, over 1083458.99 frames. ], batch size: 109, lr: 1.28e-02, grad_scale: 8.0 2022-11-15 21:50:40,019 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43563.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:50:48,393 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=43575.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:50:52,560 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7261, 4.9837, 3.1452, 4.8541, 3.9834, 3.2706, 3.2507, 4.3678], device='cuda:0'), covar=tensor([0.1466, 0.0207, 0.1068, 0.0252, 0.0417, 0.0892, 0.1453, 0.0191], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0134, 0.0171, 0.0140, 0.0176, 0.0183, 0.0185, 0.0146], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 21:50:53,857 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=43583.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:51:17,828 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.746e+02 2.258e+02 2.937e+02 6.701e+02, threshold=4.517e+02, percent-clipped=5.0 2022-11-15 21:51:28,037 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-6.pt 2022-11-15 21:52:13,099 INFO [train.py:876] (0/4) Epoch 7, batch 0, loss[loss=0.1282, simple_loss=0.1513, pruned_loss=0.05258, over 5701.00 frames. ], tot_loss[loss=0.1282, simple_loss=0.1513, pruned_loss=0.05258, over 5701.00 frames. ], batch size: 15, lr: 1.20e-02, grad_scale: 8.0 2022-11-15 21:52:13,101 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 21:52:27,870 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8927, 3.1309, 3.0575, 2.9420, 3.0373, 3.0323, 1.4796, 3.0826], device='cuda:0'), covar=tensor([0.0240, 0.0151, 0.0201, 0.0159, 0.0230, 0.0214, 0.2316, 0.0238], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0077, 0.0078, 0.0068, 0.0094, 0.0082, 0.0127, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:52:29,702 INFO [train.py:908] (0/4) Epoch 7, validation: loss=0.1631, simple_loss=0.1871, pruned_loss=0.06958, over 1530663.00 frames. 2022-11-15 21:52:29,703 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 21:52:44,094 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2358, 3.3883, 3.2754, 3.2231, 3.3459, 2.9897, 1.2901, 3.6128], device='cuda:0'), covar=tensor([0.0376, 0.0337, 0.0325, 0.0281, 0.0406, 0.0523, 0.3270, 0.0312], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0076, 0.0078, 0.0068, 0.0094, 0.0082, 0.0127, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:52:44,340 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.09 vs. limit=2.0 2022-11-15 21:52:47,043 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-15 21:53:01,895 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0902, 2.8949, 2.2545, 1.7173, 2.8157, 1.1010, 2.7942, 1.6831], device='cuda:0'), covar=tensor([0.0952, 0.0179, 0.0757, 0.1435, 0.0195, 0.1702, 0.0229, 0.1316], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0100, 0.0110, 0.0115, 0.0102, 0.0126, 0.0095, 0.0118], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 21:53:20,093 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.31 vs. limit=2.0 2022-11-15 21:53:28,696 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.724e+02 2.075e+02 2.427e+02 4.341e+02, threshold=4.150e+02, percent-clipped=0.0 2022-11-15 21:53:37,919 INFO [train.py:876] (0/4) Epoch 7, batch 100, loss[loss=0.2027, simple_loss=0.2025, pruned_loss=0.1015, over 4981.00 frames. ], tot_loss[loss=0.151, simple_loss=0.1643, pruned_loss=0.06884, over 432217.21 frames. ], batch size: 109, lr: 1.20e-02, grad_scale: 8.0 2022-11-15 21:54:15,348 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=43787.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:54:37,877 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.863e+02 2.234e+02 2.721e+02 4.230e+02, threshold=4.468e+02, percent-clipped=1.0 2022-11-15 21:54:40,760 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4812, 3.1527, 2.9030, 1.4529, 2.8827, 3.4613, 3.2032, 3.7007], device='cuda:0'), covar=tensor([0.1663, 0.1272, 0.0757, 0.2449, 0.0334, 0.0333, 0.0263, 0.0509], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0191, 0.0149, 0.0196, 0.0160, 0.0169, 0.0139, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 21:54:47,141 INFO [train.py:876] (0/4) Epoch 7, batch 200, loss[loss=0.1389, simple_loss=0.1646, pruned_loss=0.05661, over 5509.00 frames. ], tot_loss[loss=0.1552, simple_loss=0.1676, pruned_loss=0.07144, over 689503.31 frames. ], batch size: 17, lr: 1.19e-02, grad_scale: 8.0 2022-11-15 21:54:49,974 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6432, 2.6595, 2.2295, 2.9227, 2.2233, 2.3321, 2.8080, 3.3177], device='cuda:0'), covar=tensor([0.0857, 0.1529, 0.2655, 0.1710, 0.1543, 0.1379, 0.1415, 0.0652], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0074, 0.0090, 0.0065, 0.0076, 0.0072, 0.0085, 0.0058], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:54:57,701 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=43848.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:55:12,847 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=43870.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:55:31,513 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4014, 1.3846, 1.5787, 1.2998, 1.8558, 1.3969, 1.1078, 1.8011], device='cuda:0'), covar=tensor([0.0830, 0.1566, 0.0767, 0.1203, 0.0691, 0.1100, 0.1822, 0.0976], device='cuda:0'), in_proj_covar=tensor([0.0199, 0.0207, 0.0201, 0.0323, 0.0218, 0.0214, 0.0190, 0.0201], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0006, 0.0005, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:55:37,747 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9151, 1.7759, 2.5910, 2.3149, 2.5193, 1.8169, 2.3382, 2.8405], device='cuda:0'), covar=tensor([0.0480, 0.1185, 0.0607, 0.1006, 0.0554, 0.1150, 0.0830, 0.0634], device='cuda:0'), in_proj_covar=tensor([0.0213, 0.0194, 0.0193, 0.0212, 0.0195, 0.0187, 0.0227, 0.0211], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 21:55:38,978 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=43909.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:55:40,291 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2071, 2.4241, 2.4348, 1.2613, 1.6189, 2.8412, 2.2004, 1.8295], device='cuda:0'), covar=tensor([0.0530, 0.0427, 0.0391, 0.2379, 0.2108, 0.4289, 0.1179, 0.0652], device='cuda:0'), in_proj_covar=tensor([0.0059, 0.0046, 0.0051, 0.0064, 0.0051, 0.0043, 0.0049, 0.0051], device='cuda:0'), out_proj_covar=tensor([1.2139e-04, 9.7658e-05, 1.0770e-04, 1.3491e-04, 1.1155e-04, 9.8090e-05, 1.0549e-04, 1.0826e-04], device='cuda:0') 2022-11-15 21:55:46,088 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.324e+02 1.863e+02 2.345e+02 2.603e+02 4.668e+02, threshold=4.689e+02, percent-clipped=1.0 2022-11-15 21:55:48,324 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8497, 4.0342, 3.8955, 4.1741, 3.6426, 3.2862, 4.5514, 3.9001], device='cuda:0'), covar=tensor([0.0428, 0.0828, 0.0447, 0.1020, 0.0681, 0.0511, 0.0842, 0.0630], device='cuda:0'), in_proj_covar=tensor([0.0073, 0.0093, 0.0078, 0.0100, 0.0076, 0.0065, 0.0126, 0.0084], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 21:55:55,859 INFO [train.py:876] (0/4) Epoch 7, batch 300, loss[loss=0.2116, simple_loss=0.2032, pruned_loss=0.11, over 5552.00 frames. ], tot_loss[loss=0.1561, simple_loss=0.1679, pruned_loss=0.07212, over 843252.78 frames. ], batch size: 46, lr: 1.19e-02, grad_scale: 8.0 2022-11-15 21:56:13,536 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=43959.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:56:21,737 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=43970.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:56:54,567 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.760e+02 2.035e+02 2.652e+02 4.590e+02, threshold=4.070e+02, percent-clipped=0.0 2022-11-15 21:56:55,804 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=44020.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:57:02,220 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 21:57:04,544 INFO [train.py:876] (0/4) Epoch 7, batch 400, loss[loss=0.1736, simple_loss=0.1842, pruned_loss=0.08149, over 5533.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.1686, pruned_loss=0.07146, over 935058.02 frames. ], batch size: 40, lr: 1.19e-02, grad_scale: 8.0 2022-11-15 21:58:03,002 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.245e+02 1.796e+02 2.180e+02 2.809e+02 7.513e+02, threshold=4.360e+02, percent-clipped=4.0 2022-11-15 21:58:13,183 INFO [train.py:876] (0/4) Epoch 7, batch 500, loss[loss=0.1499, simple_loss=0.1798, pruned_loss=0.06005, over 5602.00 frames. ], tot_loss[loss=0.1562, simple_loss=0.1687, pruned_loss=0.07182, over 990507.76 frames. ], batch size: 18, lr: 1.19e-02, grad_scale: 8.0 2022-11-15 21:58:20,226 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44143.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:58:37,755 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=44170.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:58:46,965 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.41 vs. limit=5.0 2022-11-15 21:59:10,092 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 21:59:10,361 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=44218.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 21:59:10,883 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.014e+02 1.778e+02 2.186e+02 2.838e+02 5.081e+02, threshold=4.371e+02, percent-clipped=2.0 2022-11-15 21:59:20,342 INFO [train.py:876] (0/4) Epoch 7, batch 600, loss[loss=0.1574, simple_loss=0.1581, pruned_loss=0.07839, over 5293.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.1689, pruned_loss=0.07123, over 1031139.98 frames. ], batch size: 9, lr: 1.19e-02, grad_scale: 8.0 2022-11-15 21:59:33,747 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2022-11-15 21:59:38,139 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2022-11-15 21:59:39,135 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7229, 1.2626, 1.0411, 0.7488, 1.1950, 1.4473, 0.6533, 1.2724], device='cuda:0'), covar=tensor([0.0037, 0.0028, 0.0042, 0.0042, 0.0028, 0.0022, 0.0060, 0.0039], device='cuda:0'), in_proj_covar=tensor([0.0035, 0.0033, 0.0035, 0.0035, 0.0031, 0.0029, 0.0034, 0.0028], device='cuda:0'), out_proj_covar=tensor([3.2613e-05, 3.1429e-05, 3.1815e-05, 3.2441e-05, 2.7949e-05, 2.4405e-05, 3.3989e-05, 2.4791e-05], device='cuda:0') 2022-11-15 21:59:42,353 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44265.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:00:16,424 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44315.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:00:18,959 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.730e+02 2.067e+02 2.557e+02 5.519e+02, threshold=4.134e+02, percent-clipped=3.0 2022-11-15 22:00:28,128 INFO [train.py:876] (0/4) Epoch 7, batch 700, loss[loss=0.1309, simple_loss=0.1442, pruned_loss=0.05884, over 5179.00 frames. ], tot_loss[loss=0.1552, simple_loss=0.1681, pruned_loss=0.07115, over 1050187.96 frames. ], batch size: 8, lr: 1.19e-02, grad_scale: 16.0 2022-11-15 22:01:27,712 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.022e+02 1.682e+02 2.055e+02 2.418e+02 5.378e+02, threshold=4.109e+02, percent-clipped=3.0 2022-11-15 22:01:29,405 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.07 vs. limit=2.0 2022-11-15 22:01:36,450 INFO [train.py:876] (0/4) Epoch 7, batch 800, loss[loss=0.1726, simple_loss=0.1841, pruned_loss=0.08052, over 5585.00 frames. ], tot_loss[loss=0.1545, simple_loss=0.1674, pruned_loss=0.07077, over 1064445.13 frames. ], batch size: 50, lr: 1.19e-02, grad_scale: 8.0 2022-11-15 22:01:43,084 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=44443.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:02:12,788 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.61 vs. limit=5.0 2022-11-15 22:02:15,245 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7242, 3.8318, 3.6342, 3.7863, 3.8862, 3.6572, 1.1324, 3.9671], device='cuda:0'), covar=tensor([0.0275, 0.0300, 0.0320, 0.0220, 0.0290, 0.0340, 0.3335, 0.0244], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0076, 0.0078, 0.0068, 0.0095, 0.0080, 0.0125, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:02:15,858 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=44491.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:02:35,512 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.359e+02 1.866e+02 2.294e+02 2.881e+02 8.701e+02, threshold=4.588e+02, percent-clipped=3.0 2022-11-15 22:02:40,764 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=44527.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:02:44,439 INFO [train.py:876] (0/4) Epoch 7, batch 900, loss[loss=0.1296, simple_loss=0.1449, pruned_loss=0.05711, over 5750.00 frames. ], tot_loss[loss=0.1564, simple_loss=0.1684, pruned_loss=0.07218, over 1067387.35 frames. ], batch size: 13, lr: 1.19e-02, grad_scale: 8.0 2022-11-15 22:02:56,137 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=44550.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:03:05,893 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=44565.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:03:22,303 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=44588.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:03:37,754 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=44611.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:03:38,959 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=44613.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:03:40,654 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=44615.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:03:44,048 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.775e+02 2.262e+02 2.701e+02 4.427e+02, threshold=4.524e+02, percent-clipped=0.0 2022-11-15 22:03:52,912 INFO [train.py:876] (0/4) Epoch 7, batch 1000, loss[loss=0.2343, simple_loss=0.2101, pruned_loss=0.1292, over 5478.00 frames. ], tot_loss[loss=0.1563, simple_loss=0.1678, pruned_loss=0.07237, over 1061424.79 frames. ], batch size: 64, lr: 1.18e-02, grad_scale: 8.0 2022-11-15 22:04:02,949 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.27 vs. limit=5.0 2022-11-15 22:04:05,876 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7021, 2.5850, 2.2360, 3.0095, 2.3787, 2.4402, 2.5462, 3.3502], device='cuda:0'), covar=tensor([0.0898, 0.1669, 0.2852, 0.0946, 0.1505, 0.1391, 0.1873, 0.4417], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0077, 0.0092, 0.0067, 0.0074, 0.0073, 0.0086, 0.0058], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:04:13,009 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=44663.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:04:21,421 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2022-11-15 22:04:48,551 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=2.00 vs. limit=2.0 2022-11-15 22:04:52,371 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.670e+02 2.114e+02 2.520e+02 5.115e+02, threshold=4.229e+02, percent-clipped=2.0 2022-11-15 22:05:01,298 INFO [train.py:876] (0/4) Epoch 7, batch 1100, loss[loss=0.1418, simple_loss=0.1595, pruned_loss=0.06199, over 5709.00 frames. ], tot_loss[loss=0.1547, simple_loss=0.1671, pruned_loss=0.07117, over 1073061.38 frames. ], batch size: 15, lr: 1.18e-02, grad_scale: 8.0 2022-11-15 22:05:44,798 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.79 vs. limit=5.0 2022-11-15 22:06:00,677 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.756e+02 2.130e+02 2.754e+02 4.425e+02, threshold=4.261e+02, percent-clipped=1.0 2022-11-15 22:06:08,894 INFO [train.py:876] (0/4) Epoch 7, batch 1200, loss[loss=0.1329, simple_loss=0.1523, pruned_loss=0.0567, over 5565.00 frames. ], tot_loss[loss=0.1523, simple_loss=0.1661, pruned_loss=0.06923, over 1082321.29 frames. ], batch size: 21, lr: 1.18e-02, grad_scale: 8.0 2022-11-15 22:06:20,775 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-15 22:06:35,126 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4513, 4.6231, 3.1082, 4.2995, 3.6591, 3.1476, 2.6459, 3.8664], device='cuda:0'), covar=tensor([0.1462, 0.0191, 0.0939, 0.0364, 0.0511, 0.0876, 0.1779, 0.0294], device='cuda:0'), in_proj_covar=tensor([0.0172, 0.0133, 0.0166, 0.0136, 0.0172, 0.0181, 0.0178, 0.0145], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:06:42,959 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44883.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:06:59,231 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=44906.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:07:03,827 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9433, 2.5996, 2.0728, 1.5216, 2.3613, 2.7632, 2.4264, 2.7880], device='cuda:0'), covar=tensor([0.1931, 0.1452, 0.1067, 0.2496, 0.0577, 0.0472, 0.0424, 0.0800], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0190, 0.0150, 0.0191, 0.0163, 0.0170, 0.0141, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:07:08,225 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.257e+02 1.841e+02 2.145e+02 2.633e+02 4.840e+02, threshold=4.289e+02, percent-clipped=5.0 2022-11-15 22:07:16,621 INFO [train.py:876] (0/4) Epoch 7, batch 1300, loss[loss=0.1472, simple_loss=0.1675, pruned_loss=0.06341, over 5606.00 frames. ], tot_loss[loss=0.1535, simple_loss=0.1668, pruned_loss=0.07009, over 1084000.22 frames. ], batch size: 24, lr: 1.18e-02, grad_scale: 8.0 2022-11-15 22:07:40,412 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3512, 4.0852, 2.5867, 3.8942, 3.0724, 2.7026, 2.0956, 3.4190], device='cuda:0'), covar=tensor([0.1317, 0.0157, 0.1093, 0.0235, 0.0693, 0.1094, 0.1928, 0.0340], device='cuda:0'), in_proj_covar=tensor([0.0170, 0.0133, 0.0165, 0.0135, 0.0172, 0.0181, 0.0178, 0.0145], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:08:02,621 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-45000.pt 2022-11-15 22:08:19,522 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.856e+01 1.627e+02 2.176e+02 2.698e+02 5.044e+02, threshold=4.352e+02, percent-clipped=2.0 2022-11-15 22:08:28,064 INFO [train.py:876] (0/4) Epoch 7, batch 1400, loss[loss=0.1324, simple_loss=0.1486, pruned_loss=0.05812, over 5663.00 frames. ], tot_loss[loss=0.1509, simple_loss=0.165, pruned_loss=0.06839, over 1088356.06 frames. ], batch size: 29, lr: 1.18e-02, grad_scale: 8.0 2022-11-15 22:08:43,539 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45056.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:09:00,838 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45081.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:09:07,859 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.4435, 4.9582, 5.3605, 4.8910, 5.6558, 5.5837, 4.7867, 5.6042], device='cuda:0'), covar=tensor([0.0333, 0.0232, 0.0349, 0.0300, 0.0249, 0.0101, 0.0208, 0.0177], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0122, 0.0090, 0.0123, 0.0132, 0.0078, 0.0104, 0.0120], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 22:09:24,840 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45117.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 22:09:26,571 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.653e+02 2.032e+02 2.682e+02 5.131e+02, threshold=4.064e+02, percent-clipped=2.0 2022-11-15 22:09:35,809 INFO [train.py:876] (0/4) Epoch 7, batch 1500, loss[loss=0.1923, simple_loss=0.1962, pruned_loss=0.09425, over 5732.00 frames. ], tot_loss[loss=0.1514, simple_loss=0.1657, pruned_loss=0.06853, over 1090966.03 frames. ], batch size: 36, lr: 1.18e-02, grad_scale: 8.0 2022-11-15 22:09:37,269 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6311, 1.0976, 0.8519, 1.1086, 1.0463, 0.7375, 0.9152, 1.0863], device='cuda:0'), covar=tensor([0.0835, 0.1254, 0.3798, 0.1247, 0.2979, 0.1424, 0.1368, 0.0715], device='cuda:0'), in_proj_covar=tensor([0.0010, 0.0015, 0.0011, 0.0013, 0.0012, 0.0010, 0.0014, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.1210e-05, 6.6906e-05, 5.1666e-05, 6.0396e-05, 5.5808e-05, 5.1373e-05, 6.3397e-05, 5.1594e-05], device='cuda:0') 2022-11-15 22:09:41,795 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45142.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:09:57,118 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7773, 2.4342, 3.0583, 3.5525, 3.8937, 3.1441, 2.4961, 3.7526], device='cuda:0'), covar=tensor([0.0464, 0.3313, 0.2081, 0.3762, 0.0819, 0.2619, 0.2130, 0.0422], device='cuda:0'), in_proj_covar=tensor([0.0204, 0.0209, 0.0203, 0.0330, 0.0223, 0.0218, 0.0193, 0.0206], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005], device='cuda:0') 2022-11-15 22:10:09,062 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=45183.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:10:15,692 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2022-11-15 22:10:17,638 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2633, 3.1986, 3.1746, 2.9780, 1.9630, 3.1855, 2.1217, 2.5700], device='cuda:0'), covar=tensor([0.0366, 0.0124, 0.0132, 0.0279, 0.0362, 0.0129, 0.0329, 0.0127], device='cuda:0'), in_proj_covar=tensor([0.0178, 0.0139, 0.0156, 0.0175, 0.0173, 0.0157, 0.0166, 0.0146], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:10:24,971 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=45206.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:10:34,225 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.120e+01 1.667e+02 2.029e+02 2.581e+02 4.326e+02, threshold=4.059e+02, percent-clipped=3.0 2022-11-15 22:10:41,633 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=45231.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:10:42,912 INFO [train.py:876] (0/4) Epoch 7, batch 1600, loss[loss=0.1032, simple_loss=0.1274, pruned_loss=0.03949, over 5494.00 frames. ], tot_loss[loss=0.1513, simple_loss=0.1654, pruned_loss=0.06855, over 1087651.40 frames. ], batch size: 12, lr: 1.18e-02, grad_scale: 8.0 2022-11-15 22:10:46,027 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5679, 4.2272, 3.1909, 1.6906, 3.9549, 1.4809, 3.9495, 2.3698], device='cuda:0'), covar=tensor([0.1253, 0.0151, 0.0608, 0.2006, 0.0191, 0.1937, 0.0232, 0.1537], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0102, 0.0113, 0.0118, 0.0104, 0.0128, 0.0096, 0.0119], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 22:10:57,954 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=45254.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:11:07,758 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9423, 4.4182, 3.8729, 4.3753, 4.3593, 3.5979, 3.9491, 3.7573], device='cuda:0'), covar=tensor([0.0485, 0.0377, 0.1491, 0.0382, 0.0396, 0.0418, 0.0405, 0.0497], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0154, 0.0245, 0.0154, 0.0190, 0.0154, 0.0165, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 22:11:42,141 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.688e+02 2.070e+02 2.514e+02 3.710e+02, threshold=4.140e+02, percent-clipped=0.0 2022-11-15 22:11:51,082 INFO [train.py:876] (0/4) Epoch 7, batch 1700, loss[loss=0.1188, simple_loss=0.1456, pruned_loss=0.04602, over 5430.00 frames. ], tot_loss[loss=0.1532, simple_loss=0.1665, pruned_loss=0.06994, over 1086056.38 frames. ], batch size: 9, lr: 1.18e-02, grad_scale: 8.0 2022-11-15 22:12:00,813 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4595, 4.8679, 3.1482, 4.5945, 3.6932, 3.1874, 2.4003, 4.1453], device='cuda:0'), covar=tensor([0.1604, 0.0139, 0.0891, 0.0229, 0.0453, 0.0891, 0.1959, 0.0226], device='cuda:0'), in_proj_covar=tensor([0.0173, 0.0134, 0.0168, 0.0140, 0.0174, 0.0181, 0.0181, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 22:12:01,491 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45348.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 22:12:30,634 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1845, 2.3427, 2.0741, 2.3899, 1.9766, 1.6748, 2.0980, 2.7769], device='cuda:0'), covar=tensor([0.1138, 0.1564, 0.2398, 0.1420, 0.2110, 0.3025, 0.1854, 0.1131], device='cuda:0'), in_proj_covar=tensor([0.0077, 0.0079, 0.0091, 0.0070, 0.0077, 0.0075, 0.0088, 0.0060], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:12:42,830 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45409.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 22:12:45,011 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=45412.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 22:12:46,031 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5044, 3.4091, 3.4404, 3.4796, 3.4792, 3.3604, 1.2482, 3.6011], device='cuda:0'), covar=tensor([0.0295, 0.0345, 0.0346, 0.0261, 0.0415, 0.0360, 0.3458, 0.0327], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0078, 0.0081, 0.0071, 0.0100, 0.0083, 0.0132, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:12:50,414 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.783e+02 2.189e+02 2.743e+02 5.258e+02, threshold=4.378e+02, percent-clipped=6.0 2022-11-15 22:12:58,913 INFO [train.py:876] (0/4) Epoch 7, batch 1800, loss[loss=0.116, simple_loss=0.148, pruned_loss=0.04199, over 5760.00 frames. ], tot_loss[loss=0.1545, simple_loss=0.1677, pruned_loss=0.07066, over 1086002.77 frames. ], batch size: 16, lr: 1.17e-02, grad_scale: 8.0 2022-11-15 22:13:01,038 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7989, 2.2480, 2.5907, 3.8648, 3.7400, 3.0049, 2.4221, 3.9244], device='cuda:0'), covar=tensor([0.0421, 0.2791, 0.2750, 0.2253, 0.1053, 0.2770, 0.2166, 0.0383], device='cuda:0'), in_proj_covar=tensor([0.0205, 0.0209, 0.0205, 0.0327, 0.0224, 0.0220, 0.0195, 0.0205], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005], device='cuda:0') 2022-11-15 22:13:01,573 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=45437.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:13:14,652 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3638, 3.0601, 3.2025, 2.9876, 2.0801, 3.2431, 2.0595, 2.8608], device='cuda:0'), covar=tensor([0.0266, 0.0164, 0.0124, 0.0199, 0.0316, 0.0127, 0.0328, 0.0092], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0143, 0.0160, 0.0177, 0.0175, 0.0160, 0.0169, 0.0148], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:13:57,532 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.508e+01 1.821e+02 2.184e+02 2.835e+02 7.738e+02, threshold=4.367e+02, percent-clipped=5.0 2022-11-15 22:14:06,281 INFO [train.py:876] (0/4) Epoch 7, batch 1900, loss[loss=0.1575, simple_loss=0.1643, pruned_loss=0.07533, over 5588.00 frames. ], tot_loss[loss=0.1535, simple_loss=0.1664, pruned_loss=0.0703, over 1079702.16 frames. ], batch size: 23, lr: 1.17e-02, grad_scale: 8.0 2022-11-15 22:14:20,394 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.57 vs. limit=5.0 2022-11-15 22:15:04,861 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.287e+02 1.823e+02 2.221e+02 2.683e+02 3.782e+02, threshold=4.442e+02, percent-clipped=0.0 2022-11-15 22:15:13,881 INFO [train.py:876] (0/4) Epoch 7, batch 2000, loss[loss=0.2159, simple_loss=0.2182, pruned_loss=0.1068, over 5691.00 frames. ], tot_loss[loss=0.152, simple_loss=0.1654, pruned_loss=0.06932, over 1080876.67 frames. ], batch size: 36, lr: 1.17e-02, grad_scale: 8.0 2022-11-15 22:15:42,740 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-15 22:16:01,064 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45702.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:16:02,302 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=45704.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 22:16:07,672 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=45712.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 22:16:12,592 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.738e+02 2.020e+02 2.541e+02 5.348e+02, threshold=4.040e+02, percent-clipped=1.0 2022-11-15 22:16:21,530 INFO [train.py:876] (0/4) Epoch 7, batch 2100, loss[loss=0.1085, simple_loss=0.1472, pruned_loss=0.03494, over 5550.00 frames. ], tot_loss[loss=0.1507, simple_loss=0.1645, pruned_loss=0.06846, over 1082552.67 frames. ], batch size: 14, lr: 1.17e-02, grad_scale: 8.0 2022-11-15 22:16:24,224 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=45737.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:16:39,950 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=45760.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:16:40,005 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6281, 4.6301, 4.1483, 3.9459, 4.5721, 4.2268, 1.8225, 4.7045], device='cuda:0'), covar=tensor([0.0191, 0.0266, 0.0261, 0.0349, 0.0369, 0.0445, 0.3376, 0.0257], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0078, 0.0080, 0.0072, 0.0100, 0.0083, 0.0133, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:16:42,020 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45763.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:16:49,021 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45774.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:16:55,979 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=45785.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:17:20,074 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.330e+01 1.725e+02 2.150e+02 2.571e+02 5.272e+02, threshold=4.300e+02, percent-clipped=3.0 2022-11-15 22:17:25,732 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2022-11-15 22:17:28,682 INFO [train.py:876] (0/4) Epoch 7, batch 2200, loss[loss=0.1328, simple_loss=0.1569, pruned_loss=0.05436, over 5801.00 frames. ], tot_loss[loss=0.1511, simple_loss=0.1648, pruned_loss=0.06869, over 1086209.57 frames. ], batch size: 21, lr: 1.17e-02, grad_scale: 8.0 2022-11-15 22:17:30,164 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45835.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:17:46,379 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=45859.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:18:03,347 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 22:18:27,933 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.053e+02 1.659e+02 2.110e+02 2.622e+02 5.586e+02, threshold=4.221e+02, percent-clipped=2.0 2022-11-15 22:18:28,118 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=45920.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:18:30,613 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3259, 4.2660, 4.1407, 4.4857, 3.8141, 3.5686, 4.9711, 4.2126], device='cuda:0'), covar=tensor([0.0421, 0.0798, 0.0440, 0.1018, 0.0531, 0.0325, 0.0625, 0.0569], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0098, 0.0082, 0.0104, 0.0077, 0.0066, 0.0129, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:18:36,860 INFO [train.py:876] (0/4) Epoch 7, batch 2300, loss[loss=0.1279, simple_loss=0.1529, pruned_loss=0.05148, over 5612.00 frames. ], tot_loss[loss=0.1515, simple_loss=0.165, pruned_loss=0.06897, over 1083828.05 frames. ], batch size: 23, lr: 1.17e-02, grad_scale: 8.0 2022-11-15 22:19:05,582 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2728, 3.5849, 2.7483, 1.6822, 3.4045, 1.1831, 3.3159, 1.8417], device='cuda:0'), covar=tensor([0.1285, 0.0176, 0.0970, 0.1914, 0.0224, 0.2149, 0.0257, 0.1716], device='cuda:0'), in_proj_covar=tensor([0.0125, 0.0102, 0.0114, 0.0115, 0.0102, 0.0127, 0.0095, 0.0117], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0003, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 22:19:10,387 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9786, 3.9718, 3.9992, 4.1922, 3.6463, 3.5611, 4.5617, 3.9034], device='cuda:0'), covar=tensor([0.0430, 0.0726, 0.0353, 0.0919, 0.0618, 0.0339, 0.0752, 0.0674], device='cuda:0'), in_proj_covar=tensor([0.0076, 0.0099, 0.0083, 0.0106, 0.0079, 0.0067, 0.0132, 0.0089], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:19:13,116 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4167, 1.2111, 1.8055, 0.8847, 1.4637, 0.9871, 1.0064, 1.4011], device='cuda:0'), covar=tensor([0.2644, 0.1051, 0.0416, 0.3383, 0.0928, 0.1003, 0.1188, 0.1172], device='cuda:0'), in_proj_covar=tensor([0.0010, 0.0015, 0.0011, 0.0013, 0.0012, 0.0010, 0.0014, 0.0010], device='cuda:0'), out_proj_covar=tensor([5.0231e-05, 6.7339e-05, 5.1654e-05, 5.9232e-05, 5.5425e-05, 5.1146e-05, 6.2525e-05, 5.1151e-05], device='cuda:0') 2022-11-15 22:19:25,060 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46004.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 22:19:35,544 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.800e+02 2.204e+02 2.753e+02 4.636e+02, threshold=4.408e+02, percent-clipped=3.0 2022-11-15 22:19:44,586 INFO [train.py:876] (0/4) Epoch 7, batch 2400, loss[loss=0.1945, simple_loss=0.1998, pruned_loss=0.09456, over 5774.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.166, pruned_loss=0.06924, over 1082226.45 frames. ], batch size: 20, lr: 1.17e-02, grad_scale: 8.0 2022-11-15 22:19:57,578 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46052.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 22:20:01,768 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46058.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:20:13,887 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.03 vs. limit=5.0 2022-11-15 22:20:35,927 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6753, 1.8291, 2.3302, 1.4140, 1.1332, 2.7675, 1.9906, 1.7810], device='cuda:0'), covar=tensor([0.0862, 0.0551, 0.0409, 0.1664, 0.2301, 0.0373, 0.0759, 0.0909], device='cuda:0'), in_proj_covar=tensor([0.0067, 0.0051, 0.0057, 0.0070, 0.0057, 0.0048, 0.0050, 0.0057], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 22:20:43,096 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 1.760e+02 2.143e+02 2.682e+02 4.652e+02, threshold=4.287e+02, percent-clipped=1.0 2022-11-15 22:20:49,993 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46130.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:20:51,907 INFO [train.py:876] (0/4) Epoch 7, batch 2500, loss[loss=0.11, simple_loss=0.144, pruned_loss=0.03796, over 5567.00 frames. ], tot_loss[loss=0.1549, simple_loss=0.1683, pruned_loss=0.07071, over 1087237.05 frames. ], batch size: 13, lr: 1.17e-02, grad_scale: 8.0 2022-11-15 22:21:47,939 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46215.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:21:51,094 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.657e+02 1.935e+02 2.390e+02 4.850e+02, threshold=3.869e+02, percent-clipped=1.0 2022-11-15 22:21:59,980 INFO [train.py:876] (0/4) Epoch 7, batch 2600, loss[loss=0.229, simple_loss=0.2143, pruned_loss=0.1219, over 5340.00 frames. ], tot_loss[loss=0.1547, simple_loss=0.1677, pruned_loss=0.07088, over 1088043.56 frames. ], batch size: 70, lr: 1.16e-02, grad_scale: 8.0 2022-11-15 22:22:01,436 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0503, 1.4694, 2.1445, 1.6694, 1.8043, 1.8134, 1.9283, 1.7572], device='cuda:0'), covar=tensor([0.0032, 0.0064, 0.0049, 0.0027, 0.0098, 0.0066, 0.0022, 0.0028], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0019, 0.0019, 0.0023, 0.0021, 0.0018, 0.0023, 0.0022], device='cuda:0'), out_proj_covar=tensor([1.7326e-05, 1.8436e-05, 1.7936e-05, 2.2778e-05, 2.0147e-05, 1.8383e-05, 2.2430e-05, 2.3918e-05], device='cuda:0') 2022-11-15 22:22:14,866 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4344, 1.4868, 1.4346, 1.2570, 1.2331, 2.0648, 1.4662, 1.1616], device='cuda:0'), covar=tensor([0.1269, 0.0659, 0.1542, 0.1969, 0.2310, 0.0402, 0.1171, 0.1848], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0052, 0.0057, 0.0070, 0.0057, 0.0046, 0.0051, 0.0057], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 22:22:18,820 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2995, 0.8873, 1.1408, 0.9024, 1.1545, 1.2164, 0.7825, 1.0651], device='cuda:0'), covar=tensor([0.0451, 0.0473, 0.0387, 0.0760, 0.0699, 0.0300, 0.0590, 0.0415], device='cuda:0'), in_proj_covar=tensor([0.0010, 0.0015, 0.0011, 0.0013, 0.0012, 0.0010, 0.0014, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.1437e-05, 6.9048e-05, 5.2736e-05, 6.0970e-05, 5.6821e-05, 5.2278e-05, 6.3945e-05, 5.2856e-05], device='cuda:0') 2022-11-15 22:22:24,372 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.93 vs. limit=5.0 2022-11-15 22:22:29,333 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=46276.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:22:59,257 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.863e+02 2.232e+02 2.729e+02 4.275e+02, threshold=4.463e+02, percent-clipped=4.0 2022-11-15 22:23:07,785 INFO [train.py:876] (0/4) Epoch 7, batch 2700, loss[loss=0.2066, simple_loss=0.2039, pruned_loss=0.1047, over 5408.00 frames. ], tot_loss[loss=0.1533, simple_loss=0.1666, pruned_loss=0.07003, over 1083011.00 frames. ], batch size: 70, lr: 1.16e-02, grad_scale: 8.0 2022-11-15 22:23:10,623 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=46337.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:23:16,152 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4197, 1.3237, 1.7901, 1.2150, 1.3274, 1.6072, 1.0510, 0.8286], device='cuda:0'), covar=tensor([0.0013, 0.0035, 0.0014, 0.0025, 0.0024, 0.0018, 0.0025, 0.0038], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0018, 0.0018, 0.0022, 0.0020, 0.0018, 0.0022, 0.0022], device='cuda:0'), out_proj_covar=tensor([1.6389e-05, 1.7922e-05, 1.7516e-05, 2.2001e-05, 1.9354e-05, 1.7748e-05, 2.1848e-05, 2.2879e-05], device='cuda:0') 2022-11-15 22:23:25,120 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46358.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:23:33,383 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8879, 1.2933, 1.1311, 0.7214, 1.3816, 1.3671, 0.8921, 1.3043], device='cuda:0'), covar=tensor([0.0038, 0.0024, 0.0027, 0.0039, 0.0031, 0.0022, 0.0043, 0.0029], device='cuda:0'), in_proj_covar=tensor([0.0036, 0.0033, 0.0035, 0.0035, 0.0032, 0.0029, 0.0034, 0.0028], device='cuda:0'), out_proj_covar=tensor([3.3331e-05, 3.1411e-05, 3.1301e-05, 3.2474e-05, 2.8566e-05, 2.4850e-05, 3.2578e-05, 2.5549e-05], device='cuda:0') 2022-11-15 22:23:59,109 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46406.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:24:09,519 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.800e+02 2.089e+02 2.479e+02 4.263e+02, threshold=4.179e+02, percent-clipped=0.0 2022-11-15 22:24:16,889 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46430.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:24:18,865 INFO [train.py:876] (0/4) Epoch 7, batch 2800, loss[loss=0.1832, simple_loss=0.1854, pruned_loss=0.0905, over 5504.00 frames. ], tot_loss[loss=0.1555, simple_loss=0.168, pruned_loss=0.07149, over 1082119.81 frames. ], batch size: 49, lr: 1.16e-02, grad_scale: 16.0 2022-11-15 22:24:19,952 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.17 vs. limit=2.0 2022-11-15 22:24:41,989 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.52 vs. limit=5.0 2022-11-15 22:24:48,176 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9377, 4.9233, 5.1934, 5.2583, 4.6880, 4.5208, 5.6317, 5.0516], device='cuda:0'), covar=tensor([0.0378, 0.0850, 0.0282, 0.0813, 0.0422, 0.0312, 0.0629, 0.0426], device='cuda:0'), in_proj_covar=tensor([0.0072, 0.0095, 0.0078, 0.0102, 0.0075, 0.0064, 0.0127, 0.0084], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:24:49,432 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46478.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:25:14,653 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46515.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:25:18,215 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.794e+02 2.220e+02 2.623e+02 4.968e+02, threshold=4.440e+02, percent-clipped=2.0 2022-11-15 22:25:27,097 INFO [train.py:876] (0/4) Epoch 7, batch 2900, loss[loss=0.09013, simple_loss=0.1196, pruned_loss=0.03031, over 5045.00 frames. ], tot_loss[loss=0.1535, simple_loss=0.1663, pruned_loss=0.07034, over 1075643.76 frames. ], batch size: 7, lr: 1.16e-02, grad_scale: 16.0 2022-11-15 22:25:41,600 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.27 vs. limit=5.0 2022-11-15 22:25:47,069 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46563.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:26:08,944 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=46595.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:26:13,972 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=46602.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:26:26,117 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.182e+02 1.852e+02 2.208e+02 2.830e+02 6.179e+02, threshold=4.416e+02, percent-clipped=4.0 2022-11-15 22:26:33,839 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6002, 2.8623, 2.6803, 2.9609, 2.4320, 2.7794, 2.8481, 3.3223], device='cuda:0'), covar=tensor([0.1016, 0.1298, 0.2261, 0.1585, 0.1725, 0.1369, 0.1845, 0.2533], device='cuda:0'), in_proj_covar=tensor([0.0078, 0.0080, 0.0093, 0.0072, 0.0078, 0.0075, 0.0087, 0.0061], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:26:34,356 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46632.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:26:34,917 INFO [train.py:876] (0/4) Epoch 7, batch 3000, loss[loss=0.151, simple_loss=0.1771, pruned_loss=0.06242, over 5695.00 frames. ], tot_loss[loss=0.1527, simple_loss=0.1663, pruned_loss=0.06959, over 1082303.90 frames. ], batch size: 28, lr: 1.16e-02, grad_scale: 16.0 2022-11-15 22:26:34,918 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 22:26:45,902 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8141, 3.7994, 2.9449, 2.2042, 3.6937, 1.7166, 3.3106, 2.4858], device='cuda:0'), covar=tensor([0.0988, 0.0158, 0.1041, 0.1864, 0.0184, 0.1638, 0.0260, 0.1258], device='cuda:0'), in_proj_covar=tensor([0.0127, 0.0105, 0.0116, 0.0119, 0.0106, 0.0129, 0.0097, 0.0121], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 22:26:52,608 INFO [train.py:908] (0/4) Epoch 7, validation: loss=0.1596, simple_loss=0.1815, pruned_loss=0.06886, over 1530663.00 frames. 2022-11-15 22:26:52,609 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 22:27:07,829 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=46656.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 22:27:12,626 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=46663.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:27:14,589 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.24 vs. limit=2.0 2022-11-15 22:27:32,734 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9398, 2.3634, 3.4177, 3.1634, 3.4673, 2.3444, 3.4362, 3.8756], device='cuda:0'), covar=tensor([0.0658, 0.1429, 0.0797, 0.1377, 0.0575, 0.1421, 0.0917, 0.0692], device='cuda:0'), in_proj_covar=tensor([0.0214, 0.0189, 0.0196, 0.0210, 0.0204, 0.0185, 0.0223, 0.0214], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004], device='cuda:0') 2022-11-15 22:27:42,102 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2823, 1.7593, 1.8310, 1.1752, 1.2783, 2.2241, 1.8908, 1.3907], device='cuda:0'), covar=tensor([0.1125, 0.0702, 0.0760, 0.2189, 0.2107, 0.0476, 0.1271, 0.1283], device='cuda:0'), in_proj_covar=tensor([0.0063, 0.0050, 0.0054, 0.0067, 0.0053, 0.0044, 0.0049, 0.0055], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 22:27:50,897 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.581e+01 1.880e+02 2.186e+02 2.617e+02 4.736e+02, threshold=4.372e+02, percent-clipped=3.0 2022-11-15 22:27:59,279 INFO [train.py:876] (0/4) Epoch 7, batch 3100, loss[loss=0.1677, simple_loss=0.1822, pruned_loss=0.07665, over 5749.00 frames. ], tot_loss[loss=0.1517, simple_loss=0.1663, pruned_loss=0.06854, over 1084315.21 frames. ], batch size: 27, lr: 1.16e-02, grad_scale: 16.0 2022-11-15 22:28:35,339 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3606, 2.6654, 3.7636, 3.5780, 4.2912, 2.8935, 3.9755, 4.3782], device='cuda:0'), covar=tensor([0.0508, 0.1450, 0.0783, 0.1223, 0.0293, 0.1234, 0.0949, 0.0568], device='cuda:0'), in_proj_covar=tensor([0.0217, 0.0191, 0.0199, 0.0213, 0.0207, 0.0187, 0.0227, 0.0219], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 22:28:51,913 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=46810.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:28:58,609 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.858e+01 1.618e+02 2.001e+02 2.391e+02 5.637e+02, threshold=4.002e+02, percent-clipped=1.0 2022-11-15 22:29:06,283 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5648, 3.5795, 3.5420, 3.2209, 3.6839, 3.5227, 1.2089, 3.7285], device='cuda:0'), covar=tensor([0.0354, 0.0391, 0.0434, 0.0476, 0.0390, 0.0394, 0.3828, 0.0365], device='cuda:0'), in_proj_covar=tensor([0.0101, 0.0078, 0.0079, 0.0073, 0.0098, 0.0082, 0.0129, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:29:07,480 INFO [train.py:876] (0/4) Epoch 7, batch 3200, loss[loss=0.0971, simple_loss=0.1211, pruned_loss=0.03657, over 5322.00 frames. ], tot_loss[loss=0.1523, simple_loss=0.1664, pruned_loss=0.06914, over 1089351.47 frames. ], batch size: 9, lr: 1.16e-02, grad_scale: 16.0 2022-11-15 22:29:08,926 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8162, 3.4364, 3.0921, 3.4390, 3.4413, 2.9195, 3.0208, 2.9274], device='cuda:0'), covar=tensor([0.2469, 0.0517, 0.1615, 0.0545, 0.0543, 0.0566, 0.0689, 0.0774], device='cuda:0'), in_proj_covar=tensor([0.0123, 0.0157, 0.0251, 0.0157, 0.0193, 0.0157, 0.0167, 0.0153], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 22:29:33,378 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=46871.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:29:34,001 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6850, 1.3468, 1.8371, 1.3948, 1.6148, 1.7865, 1.4970, 1.1215], device='cuda:0'), covar=tensor([0.0013, 0.0039, 0.0021, 0.0024, 0.0024, 0.0068, 0.0018, 0.0035], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0018, 0.0018, 0.0022, 0.0020, 0.0018, 0.0022, 0.0022], device='cuda:0'), out_proj_covar=tensor([1.6312e-05, 1.7907e-05, 1.7404e-05, 2.1765e-05, 1.8844e-05, 1.8198e-05, 2.1681e-05, 2.3219e-05], device='cuda:0') 2022-11-15 22:29:46,830 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8029, 1.1871, 1.0367, 0.9307, 0.9983, 1.4718, 1.2418, 1.1526], device='cuda:0'), covar=tensor([0.2182, 0.0346, 0.1977, 0.2049, 0.2195, 0.0330, 0.1542, 0.1772], device='cuda:0'), in_proj_covar=tensor([0.0065, 0.0052, 0.0056, 0.0069, 0.0056, 0.0046, 0.0052, 0.0057], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 22:30:06,306 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.769e+02 2.073e+02 2.656e+02 5.224e+02, threshold=4.147e+02, percent-clipped=4.0 2022-11-15 22:30:14,569 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=46932.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:30:15,149 INFO [train.py:876] (0/4) Epoch 7, batch 3300, loss[loss=0.1743, simple_loss=0.1916, pruned_loss=0.07849, over 5588.00 frames. ], tot_loss[loss=0.1518, simple_loss=0.1659, pruned_loss=0.06885, over 1085335.53 frames. ], batch size: 23, lr: 1.16e-02, grad_scale: 16.0 2022-11-15 22:30:27,235 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46951.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 22:30:28,639 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.60 vs. limit=2.0 2022-11-15 22:30:32,486 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=46958.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:30:47,175 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=46980.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:31:14,482 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.793e+02 2.115e+02 2.507e+02 4.736e+02, threshold=4.229e+02, percent-clipped=3.0 2022-11-15 22:31:23,350 INFO [train.py:876] (0/4) Epoch 7, batch 3400, loss[loss=0.1172, simple_loss=0.1469, pruned_loss=0.04376, over 5517.00 frames. ], tot_loss[loss=0.1513, simple_loss=0.1659, pruned_loss=0.06841, over 1093601.07 frames. ], batch size: 14, lr: 1.15e-02, grad_scale: 16.0 2022-11-15 22:32:22,899 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.015e+02 1.842e+02 2.111e+02 2.660e+02 4.907e+02, threshold=4.221e+02, percent-clipped=3.0 2022-11-15 22:32:31,438 INFO [train.py:876] (0/4) Epoch 7, batch 3500, loss[loss=0.158, simple_loss=0.1623, pruned_loss=0.07682, over 5753.00 frames. ], tot_loss[loss=0.1507, simple_loss=0.1652, pruned_loss=0.06808, over 1095070.15 frames. ], batch size: 14, lr: 1.15e-02, grad_scale: 16.0 2022-11-15 22:32:38,455 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7417, 1.1930, 1.0749, 0.9020, 0.9559, 1.4937, 1.1495, 0.8456], device='cuda:0'), covar=tensor([0.2329, 0.0467, 0.1548, 0.2131, 0.2060, 0.0451, 0.1855, 0.2194], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0052, 0.0055, 0.0070, 0.0056, 0.0045, 0.0050, 0.0058], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 22:32:54,450 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=47166.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:33:30,929 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.714e+02 2.031e+02 2.581e+02 4.185e+02, threshold=4.062e+02, percent-clipped=0.0 2022-11-15 22:33:39,504 INFO [train.py:876] (0/4) Epoch 7, batch 3600, loss[loss=0.1242, simple_loss=0.1476, pruned_loss=0.05037, over 5558.00 frames. ], tot_loss[loss=0.152, simple_loss=0.166, pruned_loss=0.06902, over 1086528.66 frames. ], batch size: 14, lr: 1.15e-02, grad_scale: 16.0 2022-11-15 22:33:44,751 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47241.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:33:49,060 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7488, 4.6673, 4.8237, 4.9406, 4.2195, 4.1049, 5.3008, 4.7452], device='cuda:0'), covar=tensor([0.0261, 0.0724, 0.0232, 0.0784, 0.0507, 0.0275, 0.0525, 0.0327], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0097, 0.0082, 0.0104, 0.0077, 0.0066, 0.0130, 0.0086], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:33:52,085 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=47251.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:33:56,657 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=47258.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:34:23,971 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=47299.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:34:26,047 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47302.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:34:28,903 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=47306.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:34:38,257 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.693e+02 2.082e+02 2.562e+02 5.983e+02, threshold=4.163e+02, percent-clipped=3.0 2022-11-15 22:34:47,572 INFO [train.py:876] (0/4) Epoch 7, batch 3700, loss[loss=0.1787, simple_loss=0.1832, pruned_loss=0.08706, over 5164.00 frames. ], tot_loss[loss=0.15, simple_loss=0.165, pruned_loss=0.06754, over 1090413.45 frames. ], batch size: 91, lr: 1.15e-02, grad_scale: 16.0 2022-11-15 22:34:51,064 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2022-11-15 22:35:18,575 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47379.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:35:46,988 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.850e+02 2.309e+02 2.787e+02 6.530e+02, threshold=4.618e+02, percent-clipped=2.0 2022-11-15 22:35:56,051 INFO [train.py:876] (0/4) Epoch 7, batch 3800, loss[loss=0.2216, simple_loss=0.1964, pruned_loss=0.1234, over 3111.00 frames. ], tot_loss[loss=0.1499, simple_loss=0.1648, pruned_loss=0.06748, over 1082131.50 frames. ], batch size: 284, lr: 1.15e-02, grad_scale: 16.0 2022-11-15 22:36:00,945 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47440.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:36:18,971 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=47466.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:36:51,695 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=47514.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:36:56,172 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.606e+02 1.959e+02 2.426e+02 3.388e+02, threshold=3.918e+02, percent-clipped=0.0 2022-11-15 22:36:59,641 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8214, 2.2678, 2.3616, 1.2852, 2.7037, 2.7293, 2.5770, 2.9661], device='cuda:0'), covar=tensor([0.1953, 0.1942, 0.1092, 0.2940, 0.0527, 0.0572, 0.0412, 0.0817], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0188, 0.0151, 0.0189, 0.0161, 0.0174, 0.0141, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:37:04,433 INFO [train.py:876] (0/4) Epoch 7, batch 3900, loss[loss=0.1916, simple_loss=0.1986, pruned_loss=0.09231, over 5541.00 frames. ], tot_loss[loss=0.1495, simple_loss=0.1642, pruned_loss=0.06737, over 1078882.78 frames. ], batch size: 46, lr: 1.15e-02, grad_scale: 8.0 2022-11-15 22:37:15,972 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.48 vs. limit=2.0 2022-11-15 22:37:47,948 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=47597.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:38:04,272 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.833e+02 2.156e+02 2.681e+02 5.878e+02, threshold=4.311e+02, percent-clipped=5.0 2022-11-15 22:38:12,260 INFO [train.py:876] (0/4) Epoch 7, batch 4000, loss[loss=0.1338, simple_loss=0.1596, pruned_loss=0.05404, over 5719.00 frames. ], tot_loss[loss=0.1479, simple_loss=0.1634, pruned_loss=0.06626, over 1079981.60 frames. ], batch size: 17, lr: 1.15e-02, grad_scale: 8.0 2022-11-15 22:38:41,915 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. limit=2.0 2022-11-15 22:39:12,714 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.745e+02 2.156e+02 2.532e+02 4.633e+02, threshold=4.312e+02, percent-clipped=4.0 2022-11-15 22:39:20,695 INFO [train.py:876] (0/4) Epoch 7, batch 4100, loss[loss=0.09275, simple_loss=0.1215, pruned_loss=0.032, over 5478.00 frames. ], tot_loss[loss=0.1475, simple_loss=0.1628, pruned_loss=0.06607, over 1086444.39 frames. ], batch size: 11, lr: 1.15e-02, grad_scale: 8.0 2022-11-15 22:39:21,996 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=47735.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:39:24,128 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47738.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:39:55,732 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9181, 2.2203, 1.9693, 1.2842, 2.0576, 2.5728, 2.5689, 2.7762], device='cuda:0'), covar=tensor([0.1574, 0.1449, 0.1508, 0.2454, 0.0695, 0.0693, 0.0604, 0.0781], device='cuda:0'), in_proj_covar=tensor([0.0182, 0.0189, 0.0150, 0.0190, 0.0164, 0.0175, 0.0144, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:39:58,267 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47788.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:40:05,379 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47799.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:40:09,697 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.20 vs. limit=5.0 2022-11-15 22:40:12,060 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=47809.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 22:40:17,308 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9759, 3.6857, 2.5036, 3.4173, 2.6676, 2.6723, 2.1228, 3.0384], device='cuda:0'), covar=tensor([0.1394, 0.0181, 0.1091, 0.0292, 0.0802, 0.0900, 0.1689, 0.0352], device='cuda:0'), in_proj_covar=tensor([0.0170, 0.0135, 0.0168, 0.0142, 0.0173, 0.0177, 0.0178, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 22:40:20,467 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.020e+02 1.771e+02 2.148e+02 2.600e+02 4.098e+02, threshold=4.296e+02, percent-clipped=0.0 2022-11-15 22:40:25,113 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2022-11-15 22:40:29,021 INFO [train.py:876] (0/4) Epoch 7, batch 4200, loss[loss=0.1182, simple_loss=0.146, pruned_loss=0.04526, over 5463.00 frames. ], tot_loss[loss=0.1489, simple_loss=0.1638, pruned_loss=0.06705, over 1089670.98 frames. ], batch size: 12, lr: 1.14e-02, grad_scale: 8.0 2022-11-15 22:40:39,595 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47849.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:40:53,488 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=47870.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 22:41:04,250 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.55 vs. limit=2.0 2022-11-15 22:41:11,956 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=47897.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:41:27,580 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.804e+02 2.056e+02 2.615e+02 5.152e+02, threshold=4.112e+02, percent-clipped=3.0 2022-11-15 22:41:30,739 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8924, 1.3605, 1.3744, 0.9724, 1.1886, 1.3382, 0.6979, 1.0847], device='cuda:0'), covar=tensor([0.0027, 0.0020, 0.0017, 0.0024, 0.0021, 0.0021, 0.0044, 0.0030], device='cuda:0'), in_proj_covar=tensor([0.0039, 0.0036, 0.0037, 0.0039, 0.0035, 0.0033, 0.0036, 0.0030], device='cuda:0'), out_proj_covar=tensor([3.5383e-05, 3.3744e-05, 3.3581e-05, 3.5782e-05, 3.1117e-05, 2.8371e-05, 3.4269e-05, 2.6962e-05], device='cuda:0') 2022-11-15 22:41:36,123 INFO [train.py:876] (0/4) Epoch 7, batch 4300, loss[loss=0.1417, simple_loss=0.1636, pruned_loss=0.05991, over 5691.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.164, pruned_loss=0.06728, over 1091983.96 frames. ], batch size: 19, lr: 1.14e-02, grad_scale: 8.0 2022-11-15 22:41:37,922 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9389, 4.5075, 3.9617, 4.4865, 4.4327, 3.6760, 4.0881, 3.7480], device='cuda:0'), covar=tensor([0.0493, 0.0481, 0.1500, 0.0450, 0.0485, 0.0524, 0.0558, 0.0808], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0160, 0.0256, 0.0158, 0.0197, 0.0159, 0.0169, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 22:41:44,852 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=47945.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:41:57,463 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9009, 3.5737, 2.4168, 3.3010, 2.5206, 2.5686, 2.0477, 3.0253], device='cuda:0'), covar=tensor([0.1518, 0.0209, 0.1038, 0.0324, 0.0948, 0.0989, 0.1833, 0.0375], device='cuda:0'), in_proj_covar=tensor([0.0170, 0.0134, 0.0166, 0.0140, 0.0171, 0.0177, 0.0175, 0.0146], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 22:42:04,744 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8647, 2.3016, 3.4209, 3.0092, 3.8454, 2.5466, 3.4652, 3.9826], device='cuda:0'), covar=tensor([0.0628, 0.1741, 0.0752, 0.1410, 0.0357, 0.1319, 0.0843, 0.0675], device='cuda:0'), in_proj_covar=tensor([0.0217, 0.0193, 0.0198, 0.0210, 0.0205, 0.0183, 0.0222, 0.0217], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004], device='cuda:0') 2022-11-15 22:42:19,510 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. limit=2.0 2022-11-15 22:42:36,451 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.735e+02 2.013e+02 2.634e+02 5.022e+02, threshold=4.026e+02, percent-clipped=3.0 2022-11-15 22:42:44,693 INFO [train.py:876] (0/4) Epoch 7, batch 4400, loss[loss=0.1299, simple_loss=0.153, pruned_loss=0.05345, over 5689.00 frames. ], tot_loss[loss=0.1515, simple_loss=0.1654, pruned_loss=0.06876, over 1089202.78 frames. ], batch size: 19, lr: 1.14e-02, grad_scale: 8.0 2022-11-15 22:42:46,469 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48035.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:42:53,637 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.65 vs. limit=5.0 2022-11-15 22:42:56,647 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.92 vs. limit=2.0 2022-11-15 22:42:56,980 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5361, 3.7690, 3.7362, 4.0573, 3.6378, 3.2599, 4.3542, 3.6917], device='cuda:0'), covar=tensor([0.0534, 0.0885, 0.0491, 0.0980, 0.0587, 0.0423, 0.0747, 0.0612], device='cuda:0'), in_proj_covar=tensor([0.0076, 0.0097, 0.0083, 0.0104, 0.0078, 0.0066, 0.0131, 0.0087], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:43:11,627 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48072.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:43:18,743 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48083.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:43:26,711 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48094.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:43:26,820 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3387, 3.0505, 3.2932, 1.5743, 2.8702, 3.4493, 3.4362, 3.8475], device='cuda:0'), covar=tensor([0.2141, 0.1831, 0.0855, 0.3320, 0.0620, 0.0604, 0.0298, 0.0611], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0197, 0.0157, 0.0197, 0.0172, 0.0180, 0.0146, 0.0189], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:0') 2022-11-15 22:43:44,369 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.659e+02 2.129e+02 2.741e+02 6.031e+02, threshold=4.258e+02, percent-clipped=4.0 2022-11-15 22:43:52,256 INFO [train.py:876] (0/4) Epoch 7, batch 4500, loss[loss=0.1569, simple_loss=0.1736, pruned_loss=0.07007, over 5694.00 frames. ], tot_loss[loss=0.1521, simple_loss=0.1656, pruned_loss=0.06933, over 1087083.74 frames. ], batch size: 36, lr: 1.14e-02, grad_scale: 8.0 2022-11-15 22:43:52,408 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48133.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:43:53,022 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4844, 1.1907, 1.7135, 1.2867, 1.1778, 1.3027, 1.2363, 0.8536], device='cuda:0'), covar=tensor([0.0019, 0.0045, 0.0026, 0.0034, 0.0057, 0.0083, 0.0032, 0.0055], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0019, 0.0019, 0.0023, 0.0021, 0.0020, 0.0023, 0.0023], device='cuda:0'), out_proj_covar=tensor([1.6494e-05, 1.8735e-05, 1.8109e-05, 2.2660e-05, 1.9905e-05, 1.9782e-05, 2.2721e-05, 2.4950e-05], device='cuda:0') 2022-11-15 22:43:58,847 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1576, 1.4349, 1.9662, 1.2951, 0.7830, 2.3176, 1.8605, 1.2741], device='cuda:0'), covar=tensor([0.1552, 0.1009, 0.0850, 0.2364, 0.3098, 0.0447, 0.0948, 0.1429], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0051, 0.0055, 0.0070, 0.0055, 0.0046, 0.0051, 0.0058], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 22:43:59,428 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48144.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:44:06,454 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9045, 4.2300, 3.9247, 4.1861, 4.1938, 4.1630, 1.4352, 4.1752], device='cuda:0'), covar=tensor([0.0684, 0.0538, 0.0473, 0.0407, 0.0490, 0.0395, 0.5232, 0.0585], device='cuda:0'), in_proj_covar=tensor([0.0101, 0.0081, 0.0081, 0.0075, 0.0098, 0.0085, 0.0133, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:44:14,568 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48165.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 22:44:46,300 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48212.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 22:44:46,579 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-15 22:44:52,385 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.044e+02 1.660e+02 2.072e+02 2.700e+02 4.661e+02, threshold=4.143e+02, percent-clipped=2.0 2022-11-15 22:45:00,425 INFO [train.py:876] (0/4) Epoch 7, batch 4600, loss[loss=0.1829, simple_loss=0.1944, pruned_loss=0.08564, over 5559.00 frames. ], tot_loss[loss=0.1504, simple_loss=0.1648, pruned_loss=0.06797, over 1091468.19 frames. ], batch size: 43, lr: 1.14e-02, grad_scale: 8.0 2022-11-15 22:45:27,994 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48273.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 22:45:56,321 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8800, 4.1453, 3.7057, 3.4227, 2.4488, 4.1815, 2.2510, 3.4316], device='cuda:0'), covar=tensor([0.0376, 0.0172, 0.0209, 0.0318, 0.0476, 0.0136, 0.0488, 0.0122], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0146, 0.0159, 0.0177, 0.0173, 0.0158, 0.0169, 0.0151], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:46:00,361 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.286e+02 1.710e+02 2.127e+02 2.706e+02 4.557e+02, threshold=4.254e+02, percent-clipped=4.0 2022-11-15 22:46:02,400 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48324.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:46:08,189 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1249, 2.8863, 3.2384, 1.5087, 2.8377, 3.4868, 3.3882, 3.8785], device='cuda:0'), covar=tensor([0.2190, 0.1778, 0.0616, 0.3160, 0.0416, 0.0372, 0.0454, 0.0529], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0194, 0.0154, 0.0191, 0.0169, 0.0172, 0.0144, 0.0185], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:0') 2022-11-15 22:46:08,675 INFO [train.py:876] (0/4) Epoch 7, batch 4700, loss[loss=0.1299, simple_loss=0.167, pruned_loss=0.04645, over 5728.00 frames. ], tot_loss[loss=0.1488, simple_loss=0.1639, pruned_loss=0.0669, over 1088182.80 frames. ], batch size: 15, lr: 1.14e-02, grad_scale: 8.0 2022-11-15 22:46:42,926 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7939, 4.9247, 4.9439, 5.1039, 4.6613, 4.3982, 5.5546, 4.7976], device='cuda:0'), covar=tensor([0.0401, 0.0650, 0.0300, 0.0902, 0.0312, 0.0233, 0.0515, 0.0584], device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0096, 0.0081, 0.0105, 0.0077, 0.0067, 0.0128, 0.0087], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:46:44,368 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48385.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:46:50,272 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48394.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:46:52,220 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48397.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:47:08,411 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.686e+02 2.046e+02 2.615e+02 4.137e+02, threshold=4.091e+02, percent-clipped=0.0 2022-11-15 22:47:10,264 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48423.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:47:13,455 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48428.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:47:17,033 INFO [train.py:876] (0/4) Epoch 7, batch 4800, loss[loss=0.2623, simple_loss=0.2198, pruned_loss=0.1524, over 2989.00 frames. ], tot_loss[loss=0.1495, simple_loss=0.1644, pruned_loss=0.06731, over 1083842.16 frames. ], batch size: 284, lr: 1.14e-02, grad_scale: 8.0 2022-11-15 22:47:22,998 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48442.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:47:23,160 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8721, 1.8768, 2.2127, 2.0503, 1.3273, 2.0078, 1.3819, 1.2865], device='cuda:0'), covar=tensor([0.0177, 0.0067, 0.0082, 0.0105, 0.0230, 0.0119, 0.0227, 0.0164], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0144, 0.0158, 0.0176, 0.0172, 0.0157, 0.0169, 0.0151], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:47:24,423 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48444.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:47:33,545 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48458.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:47:34,808 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7858, 4.6999, 3.7260, 2.0621, 4.3151, 1.9311, 4.1343, 2.4202], device='cuda:0'), covar=tensor([0.1248, 0.0118, 0.0474, 0.2104, 0.0190, 0.1780, 0.0191, 0.1598], device='cuda:0'), in_proj_covar=tensor([0.0123, 0.0104, 0.0113, 0.0116, 0.0104, 0.0127, 0.0095, 0.0116], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 22:47:38,232 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48465.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 22:47:40,310 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.95 vs. limit=5.0 2022-11-15 22:47:51,718 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48484.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:47:57,021 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48492.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:48:10,193 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.97 vs. limit=5.0 2022-11-15 22:48:10,992 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48513.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 22:48:16,162 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.802e+02 2.155e+02 2.691e+02 6.184e+02, threshold=4.310e+02, percent-clipped=3.0 2022-11-15 22:48:25,052 INFO [train.py:876] (0/4) Epoch 7, batch 4900, loss[loss=0.1034, simple_loss=0.1254, pruned_loss=0.04069, over 5484.00 frames. ], tot_loss[loss=0.1505, simple_loss=0.1648, pruned_loss=0.06815, over 1076538.58 frames. ], batch size: 11, lr: 1.14e-02, grad_scale: 8.0 2022-11-15 22:48:48,179 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48568.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 22:48:49,148 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.07 vs. limit=2.0 2022-11-15 22:49:07,356 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.77 vs. limit=2.0 2022-11-15 22:49:24,582 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.657e+02 1.917e+02 2.351e+02 4.435e+02, threshold=3.835e+02, percent-clipped=1.0 2022-11-15 22:49:32,510 INFO [train.py:876] (0/4) Epoch 7, batch 5000, loss[loss=0.1844, simple_loss=0.1914, pruned_loss=0.08872, over 5701.00 frames. ], tot_loss[loss=0.1496, simple_loss=0.1639, pruned_loss=0.06767, over 1076090.53 frames. ], batch size: 36, lr: 1.14e-02, grad_scale: 8.0 2022-11-15 22:50:01,077 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7926, 4.3236, 4.5703, 4.3402, 4.8713, 4.6939, 4.2913, 4.8409], device='cuda:0'), covar=tensor([0.0357, 0.0275, 0.0438, 0.0281, 0.0333, 0.0136, 0.0236, 0.0240], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0125, 0.0093, 0.0125, 0.0138, 0.0083, 0.0106, 0.0124], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 22:50:04,375 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48680.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:50:32,599 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 1.796e+02 2.203e+02 2.625e+02 4.237e+02, threshold=4.405e+02, percent-clipped=2.0 2022-11-15 22:50:37,386 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48728.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:50:40,537 INFO [train.py:876] (0/4) Epoch 7, batch 5100, loss[loss=0.1174, simple_loss=0.1448, pruned_loss=0.04502, over 5590.00 frames. ], tot_loss[loss=0.147, simple_loss=0.1628, pruned_loss=0.06558, over 1085753.80 frames. ], batch size: 18, lr: 1.13e-02, grad_scale: 8.0 2022-11-15 22:50:48,460 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5534, 1.7318, 2.0967, 1.3985, 1.1770, 2.6572, 1.9425, 1.6958], device='cuda:0'), covar=tensor([0.0772, 0.1131, 0.0483, 0.2558, 0.2718, 0.0560, 0.1026, 0.1707], device='cuda:0'), in_proj_covar=tensor([0.0067, 0.0053, 0.0055, 0.0071, 0.0055, 0.0046, 0.0052, 0.0058], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 22:50:53,609 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48753.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:50:53,647 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0718, 4.8512, 3.8131, 2.2282, 4.5519, 1.7957, 4.4141, 2.7378], device='cuda:0'), covar=tensor([0.1139, 0.0107, 0.0436, 0.2143, 0.0188, 0.1919, 0.0175, 0.1521], device='cuda:0'), in_proj_covar=tensor([0.0123, 0.0104, 0.0114, 0.0116, 0.0105, 0.0127, 0.0095, 0.0117], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 22:51:09,842 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48776.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:51:12,138 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=48779.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:51:19,999 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9635, 2.7018, 2.2543, 1.5534, 2.6095, 1.1142, 2.6597, 1.6444], device='cuda:0'), covar=tensor([0.1114, 0.0195, 0.0793, 0.1674, 0.0241, 0.1941, 0.0246, 0.1394], device='cuda:0'), in_proj_covar=tensor([0.0125, 0.0105, 0.0115, 0.0117, 0.0105, 0.0129, 0.0096, 0.0118], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 22:51:39,999 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.794e+02 2.055e+02 2.515e+02 6.538e+02, threshold=4.110e+02, percent-clipped=4.0 2022-11-15 22:51:40,525 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48821.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:51:48,602 INFO [train.py:876] (0/4) Epoch 7, batch 5200, loss[loss=0.1864, simple_loss=0.1913, pruned_loss=0.09081, over 5182.00 frames. ], tot_loss[loss=0.1487, simple_loss=0.1642, pruned_loss=0.06665, over 1081560.82 frames. ], batch size: 91, lr: 1.13e-02, grad_scale: 8.0 2022-11-15 22:51:51,928 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=48838.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:52:00,382 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9461, 4.1395, 3.9989, 4.2037, 3.4525, 3.2920, 4.6155, 3.9713], device='cuda:0'), covar=tensor([0.0462, 0.0760, 0.0427, 0.1081, 0.0813, 0.0520, 0.0845, 0.0698], device='cuda:0'), in_proj_covar=tensor([0.0077, 0.0097, 0.0084, 0.0105, 0.0079, 0.0069, 0.0131, 0.0089], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:52:11,481 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48868.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 22:52:14,157 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.52 vs. limit=5.0 2022-11-15 22:52:21,492 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48882.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:52:32,603 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=48899.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:52:43,679 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=48916.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 22:52:46,877 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.737e+02 2.203e+02 2.681e+02 5.321e+02, threshold=4.406e+02, percent-clipped=3.0 2022-11-15 22:52:55,127 INFO [train.py:876] (0/4) Epoch 7, batch 5300, loss[loss=0.1686, simple_loss=0.1706, pruned_loss=0.08333, over 5087.00 frames. ], tot_loss[loss=0.1483, simple_loss=0.1639, pruned_loss=0.06636, over 1081325.51 frames. ], batch size: 91, lr: 1.13e-02, grad_scale: 8.0 2022-11-15 22:53:26,616 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=48980.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:53:36,968 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.78 vs. limit=2.0 2022-11-15 22:53:54,575 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.623e+02 1.921e+02 2.407e+02 5.715e+02, threshold=3.842e+02, percent-clipped=1.0 2022-11-15 22:53:59,193 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49028.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:54:02,408 INFO [train.py:876] (0/4) Epoch 7, batch 5400, loss[loss=0.1108, simple_loss=0.1395, pruned_loss=0.04101, over 5555.00 frames. ], tot_loss[loss=0.1495, simple_loss=0.1649, pruned_loss=0.06705, over 1089287.31 frames. ], batch size: 15, lr: 1.13e-02, grad_scale: 8.0 2022-11-15 22:54:16,521 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49053.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:54:17,183 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9988, 2.0707, 2.0179, 2.0792, 1.9190, 1.4762, 2.0255, 2.2805], device='cuda:0'), covar=tensor([0.1485, 0.2196, 0.2888, 0.1795, 0.1991, 0.3292, 0.2187, 0.1177], device='cuda:0'), in_proj_covar=tensor([0.0081, 0.0084, 0.0097, 0.0076, 0.0080, 0.0079, 0.0089, 0.0061], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:54:18,520 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3904, 1.1205, 1.3653, 0.9690, 1.4992, 1.0050, 1.0537, 1.2115], device='cuda:0'), covar=tensor([0.1351, 0.0954, 0.0832, 0.1402, 0.1078, 0.0770, 0.1391, 0.0939], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0016, 0.0011, 0.0013, 0.0013, 0.0011, 0.0015, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.5495e-05, 7.2772e-05, 5.5219e-05, 6.3163e-05, 6.0690e-05, 5.4602e-05, 6.8380e-05, 5.5006e-05], device='cuda:0') 2022-11-15 22:54:22,097 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6608, 3.7756, 3.7970, 3.8554, 3.4946, 3.2604, 4.3542, 3.7031], device='cuda:0'), covar=tensor([0.0460, 0.0942, 0.0444, 0.1082, 0.0525, 0.0372, 0.0690, 0.0576], device='cuda:0'), in_proj_covar=tensor([0.0076, 0.0097, 0.0082, 0.0104, 0.0077, 0.0068, 0.0131, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:54:30,808 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0526, 3.0566, 2.3199, 1.5557, 2.9798, 1.2442, 2.8167, 1.6274], device='cuda:0'), covar=tensor([0.1263, 0.0214, 0.0863, 0.1776, 0.0245, 0.1901, 0.0295, 0.1612], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0103, 0.0112, 0.0114, 0.0103, 0.0125, 0.0095, 0.0115], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 22:54:34,066 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49079.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:54:48,496 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49101.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:55:02,427 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.256e+02 1.756e+02 2.023e+02 2.468e+02 4.836e+02, threshold=4.046e+02, percent-clipped=5.0 2022-11-15 22:55:06,389 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49127.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:55:10,244 INFO [train.py:876] (0/4) Epoch 7, batch 5500, loss[loss=0.1129, simple_loss=0.1463, pruned_loss=0.03977, over 5810.00 frames. ], tot_loss[loss=0.1516, simple_loss=0.1664, pruned_loss=0.06844, over 1085307.15 frames. ], batch size: 26, lr: 1.13e-02, grad_scale: 8.0 2022-11-15 22:55:40,392 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49177.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:55:45,515 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-15 22:55:51,701 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49194.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:56:10,383 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.636e+02 2.022e+02 2.644e+02 5.189e+02, threshold=4.044e+02, percent-clipped=1.0 2022-11-15 22:56:18,735 INFO [train.py:876] (0/4) Epoch 7, batch 5600, loss[loss=0.1774, simple_loss=0.1903, pruned_loss=0.08221, over 5462.00 frames. ], tot_loss[loss=0.15, simple_loss=0.1654, pruned_loss=0.06726, over 1081977.82 frames. ], batch size: 49, lr: 1.13e-02, grad_scale: 8.0 2022-11-15 22:56:20,826 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=49236.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:56:37,768 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6088, 1.1645, 1.7778, 0.9935, 1.2892, 1.4983, 1.1584, 1.1816], device='cuda:0'), covar=tensor([0.0020, 0.0045, 0.0032, 0.0037, 0.0035, 0.0044, 0.0033, 0.0038], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0020, 0.0020, 0.0024, 0.0021, 0.0020, 0.0024, 0.0025], device='cuda:0'), out_proj_covar=tensor([1.6829e-05, 1.9752e-05, 1.9152e-05, 2.4009e-05, 2.0335e-05, 1.9713e-05, 2.3147e-05, 2.6154e-05], device='cuda:0') 2022-11-15 22:56:44,745 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.45 vs. limit=5.0 2022-11-15 22:56:45,093 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2295, 3.6476, 3.2628, 3.1852, 2.0343, 3.4697, 1.9241, 2.9046], device='cuda:0'), covar=tensor([0.0466, 0.0155, 0.0190, 0.0303, 0.0452, 0.0153, 0.0593, 0.0183], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0143, 0.0160, 0.0176, 0.0172, 0.0157, 0.0169, 0.0150], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 22:57:02,610 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=49297.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:57:15,689 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5703, 3.7336, 3.6464, 3.5429, 3.6476, 3.4957, 1.4049, 3.7383], device='cuda:0'), covar=tensor([0.0326, 0.0240, 0.0305, 0.0341, 0.0431, 0.0391, 0.3665, 0.0394], device='cuda:0'), in_proj_covar=tensor([0.0098, 0.0079, 0.0079, 0.0074, 0.0097, 0.0082, 0.0129, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 22:57:18,851 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.878e+01 1.572e+02 2.070e+02 2.645e+02 4.455e+02, threshold=4.140e+02, percent-clipped=3.0 2022-11-15 22:57:27,066 INFO [train.py:876] (0/4) Epoch 7, batch 5700, loss[loss=0.1501, simple_loss=0.1667, pruned_loss=0.06677, over 5539.00 frames. ], tot_loss[loss=0.1484, simple_loss=0.1642, pruned_loss=0.06634, over 1077805.22 frames. ], batch size: 21, lr: 1.13e-02, grad_scale: 8.0 2022-11-15 22:57:40,109 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.16 vs. limit=2.0 2022-11-15 22:57:45,808 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9990, 1.6920, 1.8481, 1.8311, 2.2409, 1.7483, 1.3488, 2.0850], device='cuda:0'), covar=tensor([0.0925, 0.1465, 0.1112, 0.0690, 0.0614, 0.1483, 0.1782, 0.1152], device='cuda:0'), in_proj_covar=tensor([0.0207, 0.0205, 0.0200, 0.0321, 0.0222, 0.0211, 0.0195, 0.0212], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-15 22:58:27,012 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.691e+02 1.989e+02 2.573e+02 4.583e+02, threshold=3.978e+02, percent-clipped=3.0 2022-11-15 22:58:27,809 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=49422.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:58:34,798 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=49432.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 22:58:35,267 INFO [train.py:876] (0/4) Epoch 7, batch 5800, loss[loss=0.2054, simple_loss=0.1959, pruned_loss=0.1075, over 5375.00 frames. ], tot_loss[loss=0.1515, simple_loss=0.1655, pruned_loss=0.06872, over 1076029.31 frames. ], batch size: 70, lr: 1.13e-02, grad_scale: 8.0 2022-11-15 22:58:52,125 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8777, 2.1062, 3.3548, 2.8985, 3.5770, 2.0714, 2.9662, 3.8056], device='cuda:0'), covar=tensor([0.0526, 0.1650, 0.0851, 0.1712, 0.0607, 0.1721, 0.1402, 0.0876], device='cuda:0'), in_proj_covar=tensor([0.0213, 0.0187, 0.0193, 0.0204, 0.0207, 0.0182, 0.0218, 0.0211], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004], device='cuda:0') 2022-11-15 22:59:04,665 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49477.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:59:08,974 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=49483.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:59:15,730 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=49493.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 22:59:16,291 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49494.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:59:33,818 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.316e+02 1.742e+02 2.165e+02 2.707e+02 6.167e+02, threshold=4.330e+02, percent-clipped=6.0 2022-11-15 22:59:36,558 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49525.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 22:59:42,439 INFO [train.py:876] (0/4) Epoch 7, batch 5900, loss[loss=0.2058, simple_loss=0.213, pruned_loss=0.09927, over 5638.00 frames. ], tot_loss[loss=0.1509, simple_loss=0.1651, pruned_loss=0.0684, over 1076322.51 frames. ], batch size: 36, lr: 1.12e-02, grad_scale: 16.0 2022-11-15 22:59:48,543 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49542.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:00:16,665 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.73 vs. limit=2.0 2022-11-15 23:00:21,900 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49592.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:00:33,921 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7598, 2.3310, 2.8987, 3.7673, 3.9882, 2.8280, 2.3670, 3.7000], device='cuda:0'), covar=tensor([0.0511, 0.3462, 0.2278, 0.2613, 0.0749, 0.3155, 0.2488, 0.0740], device='cuda:0'), in_proj_covar=tensor([0.0208, 0.0203, 0.0198, 0.0321, 0.0218, 0.0210, 0.0191, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0004, 0.0005], device='cuda:0') 2022-11-15 23:00:42,047 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.707e+02 2.021e+02 2.580e+02 5.267e+02, threshold=4.043e+02, percent-clipped=3.0 2022-11-15 23:00:50,008 INFO [train.py:876] (0/4) Epoch 7, batch 6000, loss[loss=0.1198, simple_loss=0.1495, pruned_loss=0.04507, over 5775.00 frames. ], tot_loss[loss=0.1497, simple_loss=0.1647, pruned_loss=0.06731, over 1078750.52 frames. ], batch size: 16, lr: 1.12e-02, grad_scale: 16.0 2022-11-15 23:00:50,009 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 23:01:00,539 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6833, 3.7371, 3.1840, 3.5852, 3.6156, 3.1502, 3.6752, 3.4394], device='cuda:0'), covar=tensor([0.0264, 0.0480, 0.1693, 0.0460, 0.0655, 0.0583, 0.0364, 0.0454], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0162, 0.0260, 0.0158, 0.0203, 0.0162, 0.0171, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 23:01:07,886 INFO [train.py:908] (0/4) Epoch 7, validation: loss=0.1616, simple_loss=0.1829, pruned_loss=0.07014, over 1530663.00 frames. 2022-11-15 23:01:07,887 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 23:01:34,835 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0030, 4.2049, 3.9211, 3.7804, 2.4835, 4.5674, 2.5020, 3.7624], device='cuda:0'), covar=tensor([0.0357, 0.0185, 0.0146, 0.0304, 0.0462, 0.0100, 0.0422, 0.0112], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0145, 0.0161, 0.0178, 0.0174, 0.0158, 0.0172, 0.0151], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:02:07,547 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.774e+02 2.178e+02 2.539e+02 4.839e+02, threshold=4.355e+02, percent-clipped=2.0 2022-11-15 23:02:15,504 INFO [train.py:876] (0/4) Epoch 7, batch 6100, loss[loss=0.1007, simple_loss=0.1255, pruned_loss=0.03789, over 5419.00 frames. ], tot_loss[loss=0.15, simple_loss=0.1646, pruned_loss=0.06771, over 1073018.16 frames. ], batch size: 11, lr: 1.12e-02, grad_scale: 16.0 2022-11-15 23:02:38,526 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=49766.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:02:40,760 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2022-11-15 23:02:46,273 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49778.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:02:52,897 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=49788.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 23:03:16,327 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.740e+01 1.709e+02 2.118e+02 2.760e+02 4.478e+02, threshold=4.236e+02, percent-clipped=1.0 2022-11-15 23:03:20,405 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=49827.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:03:24,170 INFO [train.py:876] (0/4) Epoch 7, batch 6200, loss[loss=0.09311, simple_loss=0.1307, pruned_loss=0.02775, over 5771.00 frames. ], tot_loss[loss=0.149, simple_loss=0.1641, pruned_loss=0.06696, over 1083810.69 frames. ], batch size: 16, lr: 1.12e-02, grad_scale: 16.0 2022-11-15 23:04:03,704 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=49892.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:04:23,026 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.007e+02 1.621e+02 1.970e+02 2.358e+02 3.613e+02, threshold=3.939e+02, percent-clipped=0.0 2022-11-15 23:04:31,725 INFO [train.py:876] (0/4) Epoch 7, batch 6300, loss[loss=0.1478, simple_loss=0.1759, pruned_loss=0.05988, over 5700.00 frames. ], tot_loss[loss=0.1459, simple_loss=0.1616, pruned_loss=0.06513, over 1085146.43 frames. ], batch size: 17, lr: 1.12e-02, grad_scale: 16.0 2022-11-15 23:04:35,675 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5995, 1.0926, 1.8522, 1.2789, 1.3309, 1.5806, 1.2723, 1.2836], device='cuda:0'), covar=tensor([0.0019, 0.0071, 0.0018, 0.0035, 0.0042, 0.0073, 0.0027, 0.0044], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0021, 0.0021, 0.0024, 0.0022, 0.0020, 0.0024, 0.0025], device='cuda:0'), out_proj_covar=tensor([1.7631e-05, 2.0315e-05, 1.9590e-05, 2.4117e-05, 2.0810e-05, 2.0198e-05, 2.3475e-05, 2.6709e-05], device='cuda:0') 2022-11-15 23:04:36,238 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=49940.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:05:05,502 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0920, 3.0151, 3.0059, 3.2047, 3.0832, 2.8269, 3.4589, 3.0948], device='cuda:0'), covar=tensor([0.0481, 0.0946, 0.0539, 0.1033, 0.0660, 0.0450, 0.0830, 0.0639], device='cuda:0'), in_proj_covar=tensor([0.0077, 0.0100, 0.0083, 0.0107, 0.0080, 0.0070, 0.0134, 0.0091], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 23:05:07,659 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2022-11-15 23:05:08,214 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0999, 2.4390, 3.2805, 2.2297, 1.5187, 3.5334, 2.4894, 1.8800], device='cuda:0'), covar=tensor([0.0509, 0.0984, 0.0214, 0.1690, 0.2608, 0.1096, 0.1950, 0.0968], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0056, 0.0058, 0.0072, 0.0057, 0.0047, 0.0052, 0.0060], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 23:05:13,349 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.45 vs. limit=5.0 2022-11-15 23:05:15,248 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.76 vs. limit=2.0 2022-11-15 23:05:17,064 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-50000.pt 2022-11-15 23:05:27,458 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-15 23:05:34,292 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.742e+02 2.037e+02 2.563e+02 6.362e+02, threshold=4.074e+02, percent-clipped=2.0 2022-11-15 23:05:42,857 INFO [train.py:876] (0/4) Epoch 7, batch 6400, loss[loss=0.0903, simple_loss=0.1232, pruned_loss=0.02872, over 5202.00 frames. ], tot_loss[loss=0.1457, simple_loss=0.1617, pruned_loss=0.06488, over 1085002.20 frames. ], batch size: 8, lr: 1.12e-02, grad_scale: 16.0 2022-11-15 23:05:59,847 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6091, 2.5942, 2.4134, 2.8346, 2.3382, 2.6837, 2.5720, 3.4073], device='cuda:0'), covar=tensor([0.1256, 0.1502, 0.2499, 0.0926, 0.1832, 0.0856, 0.1822, 0.1437], device='cuda:0'), in_proj_covar=tensor([0.0080, 0.0086, 0.0096, 0.0075, 0.0080, 0.0079, 0.0089, 0.0061], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0002], device='cuda:0') 2022-11-15 23:06:05,825 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6082, 4.1273, 3.7590, 3.5080, 2.1445, 4.2231, 2.2201, 3.6856], device='cuda:0'), covar=tensor([0.0436, 0.0157, 0.0195, 0.0438, 0.0570, 0.0113, 0.0483, 0.0124], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0143, 0.0157, 0.0176, 0.0171, 0.0157, 0.0169, 0.0151], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:06:13,105 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50078.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:06:20,277 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50088.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 23:06:21,078 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2022-11-15 23:06:22,922 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7751, 4.0104, 3.7952, 3.5129, 2.2944, 4.2235, 2.3573, 3.6624], device='cuda:0'), covar=tensor([0.0405, 0.0314, 0.0156, 0.0392, 0.0535, 0.0152, 0.0466, 0.0116], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0143, 0.0157, 0.0175, 0.0170, 0.0156, 0.0169, 0.0150], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:06:26,798 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5975, 4.0916, 3.7354, 3.4247, 2.2048, 4.1558, 2.2240, 3.5497], device='cuda:0'), covar=tensor([0.0433, 0.0198, 0.0179, 0.0446, 0.0580, 0.0136, 0.0478, 0.0139], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0143, 0.0158, 0.0175, 0.0171, 0.0157, 0.0170, 0.0150], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:06:41,518 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.674e+02 2.100e+02 2.789e+02 6.462e+02, threshold=4.200e+02, percent-clipped=5.0 2022-11-15 23:06:42,283 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=50122.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:06:45,240 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50126.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:06:46,639 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50128.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:06:50,153 INFO [train.py:876] (0/4) Epoch 7, batch 6500, loss[loss=0.1078, simple_loss=0.1381, pruned_loss=0.03872, over 5468.00 frames. ], tot_loss[loss=0.147, simple_loss=0.1625, pruned_loss=0.06579, over 1079104.28 frames. ], batch size: 12, lr: 1.12e-02, grad_scale: 16.0 2022-11-15 23:06:52,570 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50136.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:06:52,786 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. limit=2.0 2022-11-15 23:07:28,580 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50189.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:07:34,061 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50197.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:07:49,845 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.678e+02 2.090e+02 2.492e+02 4.594e+02, threshold=4.179e+02, percent-clipped=1.0 2022-11-15 23:07:58,107 INFO [train.py:876] (0/4) Epoch 7, batch 6600, loss[loss=0.1502, simple_loss=0.159, pruned_loss=0.07075, over 5579.00 frames. ], tot_loss[loss=0.1476, simple_loss=0.1631, pruned_loss=0.06603, over 1078825.10 frames. ], batch size: 22, lr: 1.12e-02, grad_scale: 16.0 2022-11-15 23:08:12,927 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6736, 3.9710, 3.5501, 3.4138, 2.2085, 3.9135, 2.0728, 3.1872], device='cuda:0'), covar=tensor([0.0399, 0.0186, 0.0189, 0.0289, 0.0528, 0.0134, 0.0525, 0.0151], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0145, 0.0160, 0.0179, 0.0172, 0.0158, 0.0171, 0.0153], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:08:13,523 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0224, 3.6128, 2.4387, 3.3156, 2.5677, 2.4858, 1.9781, 2.9610], device='cuda:0'), covar=tensor([0.1449, 0.0209, 0.1087, 0.0333, 0.1035, 0.1030, 0.1930, 0.0397], device='cuda:0'), in_proj_covar=tensor([0.0168, 0.0134, 0.0166, 0.0136, 0.0172, 0.0173, 0.0172, 0.0148], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 23:08:15,526 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50258.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 23:08:50,445 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-15 23:08:57,891 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.220e+02 1.709e+02 2.078e+02 2.678e+02 6.792e+02, threshold=4.156e+02, percent-clipped=5.0 2022-11-15 23:09:05,715 INFO [train.py:876] (0/4) Epoch 7, batch 6700, loss[loss=0.1288, simple_loss=0.1535, pruned_loss=0.05206, over 5580.00 frames. ], tot_loss[loss=0.15, simple_loss=0.1654, pruned_loss=0.06725, over 1079363.84 frames. ], batch size: 30, lr: 1.12e-02, grad_scale: 16.0 2022-11-15 23:09:21,520 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2022-11-15 23:10:05,742 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.792e+02 2.397e+02 3.058e+02 5.863e+02, threshold=4.794e+02, percent-clipped=9.0 2022-11-15 23:10:06,558 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50422.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:10:10,971 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.55 vs. limit=5.0 2022-11-15 23:10:13,767 INFO [train.py:876] (0/4) Epoch 7, batch 6800, loss[loss=0.1565, simple_loss=0.1604, pruned_loss=0.07632, over 5050.00 frames. ], tot_loss[loss=0.1507, simple_loss=0.1656, pruned_loss=0.06785, over 1075382.07 frames. ], batch size: 110, lr: 1.11e-02, grad_scale: 16.0 2022-11-15 23:10:23,675 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7553, 4.6445, 4.9041, 4.9258, 4.3183, 4.0063, 5.3185, 4.7498], device='cuda:0'), covar=tensor([0.0383, 0.1171, 0.0305, 0.1001, 0.0619, 0.0321, 0.0702, 0.0448], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0095, 0.0080, 0.0103, 0.0077, 0.0067, 0.0128, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0003, 0.0002], device='cuda:0') 2022-11-15 23:10:38,765 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50470.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:10:40,650 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2022-11-15 23:10:48,160 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=50484.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:11:08,327 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.43 vs. limit=2.0 2022-11-15 23:11:12,537 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.731e+02 2.163e+02 2.798e+02 6.704e+02, threshold=4.325e+02, percent-clipped=4.0 2022-11-15 23:11:20,771 INFO [train.py:876] (0/4) Epoch 7, batch 6900, loss[loss=0.1255, simple_loss=0.1502, pruned_loss=0.05042, over 5521.00 frames. ], tot_loss[loss=0.1488, simple_loss=0.1646, pruned_loss=0.06648, over 1080086.41 frames. ], batch size: 14, lr: 1.11e-02, grad_scale: 16.0 2022-11-15 23:11:29,486 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50546.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:11:34,011 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=50553.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 23:12:03,186 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.92 vs. limit=5.0 2022-11-15 23:12:03,876 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2022-11-15 23:12:10,969 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50607.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:12:20,424 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.773e+02 2.130e+02 2.487e+02 4.682e+02, threshold=4.260e+02, percent-clipped=1.0 2022-11-15 23:12:24,168 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0130, 1.9144, 1.9038, 1.7624, 2.0420, 1.9622, 1.9918, 1.9960], device='cuda:0'), covar=tensor([0.0570, 0.0528, 0.0625, 0.0650, 0.0617, 0.0281, 0.0499, 0.0743], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0129, 0.0097, 0.0127, 0.0142, 0.0083, 0.0108, 0.0126], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 23:12:28,770 INFO [train.py:876] (0/4) Epoch 7, batch 7000, loss[loss=0.1152, simple_loss=0.1439, pruned_loss=0.04328, over 5488.00 frames. ], tot_loss[loss=0.1494, simple_loss=0.1649, pruned_loss=0.0669, over 1079040.66 frames. ], batch size: 17, lr: 1.11e-02, grad_scale: 16.0 2022-11-15 23:12:43,513 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 23:13:27,405 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.720e+02 2.093e+02 2.606e+02 4.257e+02, threshold=4.187e+02, percent-clipped=0.0 2022-11-15 23:13:35,680 INFO [train.py:876] (0/4) Epoch 7, batch 7100, loss[loss=0.08683, simple_loss=0.1195, pruned_loss=0.02707, over 5505.00 frames. ], tot_loss[loss=0.1474, simple_loss=0.1633, pruned_loss=0.06572, over 1077893.20 frames. ], batch size: 10, lr: 1.11e-02, grad_scale: 16.0 2022-11-15 23:13:53,607 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1170, 4.6536, 4.3030, 4.7539, 4.7445, 3.8385, 4.3522, 4.1734], device='cuda:0'), covar=tensor([0.0296, 0.0404, 0.1158, 0.0330, 0.0419, 0.0485, 0.0456, 0.0570], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0160, 0.0254, 0.0154, 0.0201, 0.0160, 0.0173, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 23:14:02,709 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0631, 1.3380, 1.6018, 1.2379, 1.1268, 1.5729, 1.2313, 1.0046], device='cuda:0'), covar=tensor([0.0020, 0.0026, 0.0027, 0.0026, 0.0035, 0.0027, 0.0024, 0.0045], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0020, 0.0020, 0.0024, 0.0022, 0.0021, 0.0023, 0.0025], device='cuda:0'), out_proj_covar=tensor([1.7582e-05, 1.9993e-05, 1.8998e-05, 2.3528e-05, 2.1392e-05, 2.0250e-05, 2.3010e-05, 2.5933e-05], device='cuda:0') 2022-11-15 23:14:11,183 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3463, 2.3714, 2.8811, 1.5072, 1.5440, 2.8674, 2.2719, 1.7965], device='cuda:0'), covar=tensor([0.0575, 0.0494, 0.0451, 0.2259, 0.1730, 0.1845, 0.2145, 0.0950], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0053, 0.0058, 0.0072, 0.0056, 0.0046, 0.0051, 0.0059], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 23:14:11,212 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50782.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:14:12,516 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50784.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:14:29,873 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50808.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 23:14:38,106 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.697e+02 1.983e+02 2.631e+02 5.249e+02, threshold=3.966e+02, percent-clipped=2.0 2022-11-15 23:14:45,322 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50832.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:14:45,962 INFO [train.py:876] (0/4) Epoch 7, batch 7200, loss[loss=0.1822, simple_loss=0.1756, pruned_loss=0.09444, over 4155.00 frames. ], tot_loss[loss=0.148, simple_loss=0.1639, pruned_loss=0.06601, over 1083660.96 frames. ], batch size: 181, lr: 1.11e-02, grad_scale: 16.0 2022-11-15 23:14:53,254 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50843.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:14:59,651 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=50853.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 23:15:10,556 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=50869.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:15:20,226 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4189, 4.6099, 2.9672, 4.3428, 3.3938, 3.0125, 2.5386, 3.8078], device='cuda:0'), covar=tensor([0.1471, 0.0166, 0.1072, 0.0265, 0.0573, 0.0973, 0.1816, 0.0340], device='cuda:0'), in_proj_covar=tensor([0.0169, 0.0136, 0.0165, 0.0139, 0.0173, 0.0175, 0.0176, 0.0149], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 23:15:31,054 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=50901.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 23:15:31,694 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=50902.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:15:34,942 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-7.pt 2022-11-15 23:16:19,012 INFO [train.py:876] (0/4) Epoch 8, batch 0, loss[loss=0.1375, simple_loss=0.1601, pruned_loss=0.05745, over 5683.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.1601, pruned_loss=0.05745, over 5683.00 frames. ], batch size: 28, lr: 1.05e-02, grad_scale: 16.0 2022-11-15 23:16:19,013 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 23:16:35,654 INFO [train.py:908] (0/4) Epoch 8, validation: loss=0.161, simple_loss=0.1821, pruned_loss=0.06991, over 1530663.00 frames. 2022-11-15 23:16:35,654 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 23:16:38,365 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1760, 1.2939, 1.3409, 1.1050, 1.2136, 1.3790, 1.0836, 1.3332], device='cuda:0'), covar=tensor([0.0041, 0.0033, 0.0032, 0.0035, 0.0033, 0.0028, 0.0049, 0.0029], device='cuda:0'), in_proj_covar=tensor([0.0040, 0.0035, 0.0038, 0.0039, 0.0036, 0.0033, 0.0037, 0.0031], device='cuda:0'), out_proj_covar=tensor([3.6275e-05, 3.2223e-05, 3.4748e-05, 3.6045e-05, 3.2099e-05, 2.8906e-05, 3.5462e-05, 2.7360e-05], device='cuda:0') 2022-11-15 23:16:38,479 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.64 vs. limit=2.0 2022-11-15 23:16:45,814 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.842e+02 2.230e+02 2.830e+02 5.263e+02, threshold=4.459e+02, percent-clipped=7.0 2022-11-15 23:17:02,911 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=50946.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:17:11,820 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3983, 3.9787, 3.5819, 3.9764, 3.9596, 3.3310, 3.4304, 3.2858], device='cuda:0'), covar=tensor([0.0896, 0.0476, 0.1318, 0.0366, 0.0403, 0.0468, 0.0695, 0.0742], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0157, 0.0248, 0.0154, 0.0198, 0.0157, 0.0171, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 23:17:21,939 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.80 vs. limit=2.0 2022-11-15 23:17:29,495 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9854, 3.6313, 3.2317, 3.5879, 3.5747, 3.0335, 3.0618, 2.9963], device='cuda:0'), covar=tensor([0.1442, 0.0479, 0.1405, 0.0423, 0.0511, 0.0507, 0.0847, 0.0773], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0157, 0.0250, 0.0155, 0.0200, 0.0158, 0.0171, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 23:17:34,175 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0308, 3.1532, 3.0612, 1.7347, 2.8754, 3.3721, 3.4001, 3.8865], device='cuda:0'), covar=tensor([0.2421, 0.1507, 0.1223, 0.3122, 0.0446, 0.0581, 0.0357, 0.0539], device='cuda:0'), in_proj_covar=tensor([0.0182, 0.0191, 0.0159, 0.0194, 0.0171, 0.0181, 0.0149, 0.0186], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:0') 2022-11-15 23:17:42,749 INFO [train.py:876] (0/4) Epoch 8, batch 100, loss[loss=0.1396, simple_loss=0.1573, pruned_loss=0.061, over 5549.00 frames. ], tot_loss[loss=0.1439, simple_loss=0.1613, pruned_loss=0.06327, over 434112.23 frames. ], batch size: 13, lr: 1.04e-02, grad_scale: 16.0 2022-11-15 23:17:44,302 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51007.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:17:53,333 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.018e+02 1.590e+02 1.934e+02 2.468e+02 5.065e+02, threshold=3.869e+02, percent-clipped=2.0 2022-11-15 23:18:21,359 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51062.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:18:30,003 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-15 23:18:49,819 INFO [train.py:876] (0/4) Epoch 8, batch 200, loss[loss=0.1322, simple_loss=0.1694, pruned_loss=0.04748, over 5589.00 frames. ], tot_loss[loss=0.1484, simple_loss=0.1641, pruned_loss=0.06636, over 690339.32 frames. ], batch size: 23, lr: 1.04e-02, grad_scale: 16.0 2022-11-15 23:19:00,022 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.568e+01 1.812e+02 2.179e+02 2.624e+02 4.566e+02, threshold=4.359e+02, percent-clipped=4.0 2022-11-15 23:19:01,581 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51123.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:19:11,878 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51138.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:19:12,555 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2076, 4.8931, 3.6533, 1.9704, 4.7799, 1.8795, 4.5680, 2.4383], device='cuda:0'), covar=tensor([0.1097, 0.0111, 0.0608, 0.2176, 0.0125, 0.1772, 0.0127, 0.1573], device='cuda:0'), in_proj_covar=tensor([0.0127, 0.0107, 0.0117, 0.0118, 0.0108, 0.0128, 0.0100, 0.0119], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 23:19:12,870 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.78 vs. limit=5.0 2022-11-15 23:19:23,585 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5960, 1.7221, 2.0024, 1.3553, 1.1377, 2.4781, 1.8894, 1.5255], device='cuda:0'), covar=tensor([0.0908, 0.0994, 0.0826, 0.2616, 0.3031, 0.1012, 0.1423, 0.2031], device='cuda:0'), in_proj_covar=tensor([0.0067, 0.0055, 0.0058, 0.0074, 0.0057, 0.0046, 0.0052, 0.0060], device='cuda:0'), out_proj_covar=tensor([0.0001, 0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 23:19:29,228 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51164.0, num_to_drop=1, layers_to_drop={3} 2022-11-15 23:19:34,456 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51172.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:19:46,111 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3021, 0.8236, 1.0184, 0.7480, 1.3456, 1.1833, 0.6274, 0.9799], device='cuda:0'), covar=tensor([0.0497, 0.0441, 0.0571, 0.1174, 0.0323, 0.0406, 0.0989, 0.0546], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0016, 0.0011, 0.0014, 0.0012, 0.0011, 0.0015, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.5557e-05, 7.3955e-05, 5.5615e-05, 6.5088e-05, 5.9323e-05, 5.5237e-05, 6.8827e-05, 5.6440e-05], device='cuda:0') 2022-11-15 23:19:54,817 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51202.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:19:56,970 INFO [train.py:876] (0/4) Epoch 8, batch 300, loss[loss=0.1896, simple_loss=0.1838, pruned_loss=0.0977, over 5005.00 frames. ], tot_loss[loss=0.1449, simple_loss=0.1615, pruned_loss=0.06415, over 851542.88 frames. ], batch size: 109, lr: 1.04e-02, grad_scale: 16.0 2022-11-15 23:19:57,631 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-15 23:20:07,738 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.685e+02 2.012e+02 2.730e+02 5.121e+02, threshold=4.024e+02, percent-clipped=2.0 2022-11-15 23:20:15,722 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51233.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 23:20:25,250 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.49 vs. limit=2.0 2022-11-15 23:20:27,337 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51250.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:20:33,507 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-15 23:21:03,051 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51302.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:21:05,009 INFO [train.py:876] (0/4) Epoch 8, batch 400, loss[loss=0.09077, simple_loss=0.1206, pruned_loss=0.03048, over 5763.00 frames. ], tot_loss[loss=0.1426, simple_loss=0.1602, pruned_loss=0.06251, over 945600.57 frames. ], batch size: 9, lr: 1.04e-02, grad_scale: 16.0 2022-11-15 23:21:06,516 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3870, 2.2249, 2.0913, 2.2754, 2.0386, 1.7775, 1.9986, 2.5074], device='cuda:0'), covar=tensor([0.1199, 0.2061, 0.3399, 0.1530, 0.1984, 0.2216, 0.2454, 0.1507], device='cuda:0'), in_proj_covar=tensor([0.0081, 0.0086, 0.0096, 0.0076, 0.0081, 0.0079, 0.0089, 0.0063], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:21:06,578 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4575, 2.0920, 3.1148, 2.7266, 3.0959, 2.2031, 2.8369, 3.4414], device='cuda:0'), covar=tensor([0.0484, 0.1307, 0.0721, 0.1308, 0.0692, 0.1227, 0.1172, 0.0644], device='cuda:0'), in_proj_covar=tensor([0.0216, 0.0191, 0.0199, 0.0210, 0.0211, 0.0186, 0.0221, 0.0215], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 23:21:08,138 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8650, 4.4396, 3.9762, 4.4220, 4.4061, 3.6882, 4.0160, 3.7691], device='cuda:0'), covar=tensor([0.0490, 0.0422, 0.1390, 0.0351, 0.0407, 0.0453, 0.0478, 0.0512], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0163, 0.0256, 0.0158, 0.0205, 0.0160, 0.0175, 0.0161], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 23:21:16,263 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.652e+01 1.583e+02 1.915e+02 2.557e+02 6.087e+02, threshold=3.830e+02, percent-clipped=2.0 2022-11-15 23:21:17,311 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.13 vs. limit=5.0 2022-11-15 23:21:30,947 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2022-11-15 23:21:57,194 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51382.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:22:03,772 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51392.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:22:12,921 INFO [train.py:876] (0/4) Epoch 8, batch 500, loss[loss=0.1535, simple_loss=0.167, pruned_loss=0.06995, over 5566.00 frames. ], tot_loss[loss=0.1442, simple_loss=0.161, pruned_loss=0.06372, over 995349.91 frames. ], batch size: 50, lr: 1.04e-02, grad_scale: 16.0 2022-11-15 23:22:21,682 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51418.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:22:23,653 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 1.684e+02 2.091e+02 2.743e+02 4.142e+02, threshold=4.181e+02, percent-clipped=1.0 2022-11-15 23:22:35,560 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51438.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:22:38,906 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51443.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:22:45,525 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51453.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:22:50,711 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7287, 0.9400, 1.9361, 1.4928, 1.1192, 2.0624, 1.5385, 1.5177], device='cuda:0'), covar=tensor([0.0019, 0.0133, 0.0034, 0.0050, 0.0059, 0.0021, 0.0026, 0.0035], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0021, 0.0021, 0.0025, 0.0023, 0.0021, 0.0024, 0.0025], device='cuda:0'), out_proj_covar=tensor([1.8075e-05, 2.0307e-05, 1.9103e-05, 2.4853e-05, 2.2309e-05, 2.0548e-05, 2.3337e-05, 2.6257e-05], device='cuda:0') 2022-11-15 23:22:53,355 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51464.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 23:23:08,412 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51486.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:23:14,476 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51495.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:23:21,002 INFO [train.py:876] (0/4) Epoch 8, batch 600, loss[loss=0.221, simple_loss=0.1975, pruned_loss=0.1222, over 4712.00 frames. ], tot_loss[loss=0.1451, simple_loss=0.1614, pruned_loss=0.06445, over 1022643.76 frames. ], batch size: 136, lr: 1.04e-02, grad_scale: 32.0 2022-11-15 23:23:26,050 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51512.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:23:32,193 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.319e+01 1.675e+02 2.028e+02 2.576e+02 4.109e+02, threshold=4.056e+02, percent-clipped=0.0 2022-11-15 23:23:37,220 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51528.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 23:23:47,540 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0169, 4.3959, 3.8758, 3.5944, 2.4531, 4.4336, 2.4641, 3.7185], device='cuda:0'), covar=tensor([0.0364, 0.0245, 0.0196, 0.0404, 0.0443, 0.0097, 0.0456, 0.0128], device='cuda:0'), in_proj_covar=tensor([0.0177, 0.0145, 0.0160, 0.0180, 0.0173, 0.0158, 0.0172, 0.0154], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:23:56,180 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51556.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:24:28,197 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51602.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:24:30,078 INFO [train.py:876] (0/4) Epoch 8, batch 700, loss[loss=0.1214, simple_loss=0.1534, pruned_loss=0.04469, over 5560.00 frames. ], tot_loss[loss=0.1443, simple_loss=0.1612, pruned_loss=0.06374, over 1048918.60 frames. ], batch size: 18, lr: 1.04e-02, grad_scale: 32.0 2022-11-15 23:24:40,706 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.048e+01 1.604e+02 2.114e+02 2.490e+02 4.177e+02, threshold=4.229e+02, percent-clipped=3.0 2022-11-15 23:24:41,593 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3139, 4.4087, 4.2274, 3.8723, 2.5912, 4.8576, 2.4674, 4.2359], device='cuda:0'), covar=tensor([0.0362, 0.0247, 0.0168, 0.0335, 0.0501, 0.0093, 0.0479, 0.0098], device='cuda:0'), in_proj_covar=tensor([0.0180, 0.0147, 0.0162, 0.0183, 0.0176, 0.0160, 0.0175, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:25:01,592 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51650.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:25:39,227 INFO [train.py:876] (0/4) Epoch 8, batch 800, loss[loss=0.1175, simple_loss=0.155, pruned_loss=0.04004, over 5574.00 frames. ], tot_loss[loss=0.1438, simple_loss=0.1613, pruned_loss=0.06312, over 1068657.72 frames. ], batch size: 21, lr: 1.04e-02, grad_scale: 16.0 2022-11-15 23:25:42,106 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9925, 2.1202, 2.5214, 2.2028, 1.3980, 2.2907, 1.5540, 1.6628], device='cuda:0'), covar=tensor([0.0208, 0.0103, 0.0103, 0.0128, 0.0282, 0.0128, 0.0275, 0.0153], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0146, 0.0161, 0.0183, 0.0175, 0.0160, 0.0173, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:25:47,917 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51718.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:25:50,399 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.606e+01 1.543e+02 1.960e+02 2.359e+02 4.121e+02, threshold=3.919e+02, percent-clipped=0.0 2022-11-15 23:26:01,930 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51738.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:26:08,003 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=51747.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:26:08,921 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51748.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:26:09,716 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7101, 2.3320, 3.4144, 3.1103, 3.5122, 2.3355, 3.1645, 3.6899], device='cuda:0'), covar=tensor([0.0694, 0.1677, 0.0919, 0.1559, 0.0747, 0.1675, 0.1270, 0.1078], device='cuda:0'), in_proj_covar=tensor([0.0221, 0.0195, 0.0201, 0.0210, 0.0215, 0.0188, 0.0222, 0.0218], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 23:26:20,919 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51766.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:26:47,892 INFO [train.py:876] (0/4) Epoch 8, batch 900, loss[loss=0.2262, simple_loss=0.2012, pruned_loss=0.1256, over 5429.00 frames. ], tot_loss[loss=0.1449, simple_loss=0.1616, pruned_loss=0.06405, over 1075483.18 frames. ], batch size: 58, lr: 1.04e-02, grad_scale: 16.0 2022-11-15 23:26:50,098 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=51808.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:26:59,509 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 1.796e+02 2.172e+02 2.751e+02 5.616e+02, threshold=4.345e+02, percent-clipped=4.0 2022-11-15 23:27:03,643 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=51828.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 23:27:04,379 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0504, 2.0758, 2.3534, 3.2407, 3.1289, 2.4130, 1.9972, 3.2855], device='cuda:0'), covar=tensor([0.0899, 0.2991, 0.2398, 0.2415, 0.1308, 0.2886, 0.2204, 0.0928], device='cuda:0'), in_proj_covar=tensor([0.0216, 0.0208, 0.0200, 0.0321, 0.0228, 0.0216, 0.0195, 0.0220], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-15 23:27:19,688 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=51851.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:27:20,953 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6158, 3.4061, 3.5478, 3.7110, 3.2394, 3.0346, 4.1117, 3.4821], device='cuda:0'), covar=tensor([0.0558, 0.0897, 0.0585, 0.1082, 0.0701, 0.0509, 0.0791, 0.0892], device='cuda:0'), in_proj_covar=tensor([0.0079, 0.0100, 0.0086, 0.0111, 0.0081, 0.0073, 0.0136, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 23:27:37,035 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=51876.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:27:40,381 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4956, 1.0010, 1.1219, 0.8570, 1.3335, 1.3565, 0.8537, 0.9842], device='cuda:0'), covar=tensor([0.0319, 0.0428, 0.0198, 0.0671, 0.0564, 0.0756, 0.0559, 0.0350], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0016, 0.0011, 0.0013, 0.0012, 0.0010, 0.0014, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.4704e-05, 7.2884e-05, 5.3828e-05, 6.4489e-05, 5.9443e-05, 5.4055e-05, 6.7049e-05, 5.5526e-05], device='cuda:0') 2022-11-15 23:27:57,326 INFO [train.py:876] (0/4) Epoch 8, batch 1000, loss[loss=0.1192, simple_loss=0.1454, pruned_loss=0.04652, over 5791.00 frames. ], tot_loss[loss=0.1446, simple_loss=0.1615, pruned_loss=0.06383, over 1079507.06 frames. ], batch size: 21, lr: 1.04e-02, grad_scale: 16.0 2022-11-15 23:28:08,739 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.780e+02 2.132e+02 2.745e+02 5.068e+02, threshold=4.264e+02, percent-clipped=2.0 2022-11-15 23:28:21,805 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.81 vs. limit=2.0 2022-11-15 23:29:04,722 INFO [train.py:876] (0/4) Epoch 8, batch 1100, loss[loss=0.09224, simple_loss=0.1243, pruned_loss=0.0301, over 5728.00 frames. ], tot_loss[loss=0.1428, simple_loss=0.1607, pruned_loss=0.06244, over 1086905.72 frames. ], batch size: 13, lr: 1.03e-02, grad_scale: 16.0 2022-11-15 23:29:16,567 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.856e+01 1.738e+02 2.117e+02 2.536e+02 5.317e+02, threshold=4.235e+02, percent-clipped=1.0 2022-11-15 23:29:27,915 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52038.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:29:29,235 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52040.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:29:34,331 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52048.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:29:50,017 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.18 vs. limit=2.0 2022-11-15 23:29:54,390 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5390, 1.0593, 1.5124, 0.7530, 1.5159, 1.4340, 1.0905, 1.1077], device='cuda:0'), covar=tensor([0.0318, 0.0904, 0.0343, 0.1403, 0.1287, 0.1047, 0.1117, 0.0311], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0016, 0.0011, 0.0014, 0.0012, 0.0011, 0.0015, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.6373e-05, 7.3508e-05, 5.5028e-05, 6.5553e-05, 6.0569e-05, 5.4943e-05, 6.8383e-05, 5.6449e-05], device='cuda:0') 2022-11-15 23:29:59,641 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52086.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:30:05,850 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 23:30:06,068 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52096.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:30:09,800 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52101.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:30:10,944 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52103.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:30:12,209 INFO [train.py:876] (0/4) Epoch 8, batch 1200, loss[loss=0.2621, simple_loss=0.2159, pruned_loss=0.1541, over 3146.00 frames. ], tot_loss[loss=0.141, simple_loss=0.1592, pruned_loss=0.0614, over 1085354.06 frames. ], batch size: 284, lr: 1.03e-02, grad_scale: 8.0 2022-11-15 23:30:23,865 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.007e+02 1.672e+02 2.083e+02 2.557e+02 4.587e+02, threshold=4.167e+02, percent-clipped=2.0 2022-11-15 23:30:43,487 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52151.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:31:11,407 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6439, 4.4415, 3.7967, 3.5534, 2.1811, 4.0820, 2.3273, 3.4990], device='cuda:0'), covar=tensor([0.0399, 0.0084, 0.0225, 0.0421, 0.0497, 0.0147, 0.0445, 0.0137], device='cuda:0'), in_proj_covar=tensor([0.0179, 0.0145, 0.0158, 0.0179, 0.0172, 0.0158, 0.0172, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:31:12,297 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4494, 3.9942, 3.1020, 1.7158, 3.8623, 1.5269, 3.4784, 2.0727], device='cuda:0'), covar=tensor([0.1473, 0.0146, 0.0606, 0.2201, 0.0158, 0.1958, 0.0285, 0.1626], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0106, 0.0117, 0.0116, 0.0106, 0.0127, 0.0100, 0.0117], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 23:31:12,341 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52193.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:31:16,112 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52199.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:31:20,578 INFO [train.py:876] (0/4) Epoch 8, batch 1300, loss[loss=0.1406, simple_loss=0.1664, pruned_loss=0.05742, over 5748.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.1587, pruned_loss=0.06136, over 1086903.56 frames. ], batch size: 16, lr: 1.03e-02, grad_scale: 8.0 2022-11-15 23:31:32,437 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.004e+02 1.693e+02 2.062e+02 2.550e+02 7.238e+02, threshold=4.125e+02, percent-clipped=3.0 2022-11-15 23:31:54,152 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52254.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:31:59,027 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.75 vs. limit=2.0 2022-11-15 23:32:27,890 INFO [train.py:876] (0/4) Epoch 8, batch 1400, loss[loss=0.1302, simple_loss=0.1556, pruned_loss=0.05241, over 5571.00 frames. ], tot_loss[loss=0.1418, simple_loss=0.1596, pruned_loss=0.06203, over 1087666.77 frames. ], batch size: 22, lr: 1.03e-02, grad_scale: 8.0 2022-11-15 23:32:32,551 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52312.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:32:39,667 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.794e+02 2.190e+02 2.627e+02 5.142e+02, threshold=4.380e+02, percent-clipped=4.0 2022-11-15 23:33:00,112 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0496, 2.3256, 2.4012, 2.0916, 2.2988, 2.3228, 1.1476, 2.4184], device='cuda:0'), covar=tensor([0.0355, 0.0320, 0.0257, 0.0328, 0.0366, 0.0329, 0.2390, 0.0363], device='cuda:0'), in_proj_covar=tensor([0.0099, 0.0081, 0.0077, 0.0072, 0.0096, 0.0081, 0.0125, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 23:33:07,924 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.81 vs. limit=5.0 2022-11-15 23:33:13,755 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7093, 0.5308, 0.6936, 0.6215, 0.6556, 0.7206, 0.3766, 0.6214], device='cuda:0'), covar=tensor([0.0269, 0.0528, 0.0327, 0.0451, 0.0322, 0.0267, 0.0749, 0.0364], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0016, 0.0011, 0.0014, 0.0012, 0.0011, 0.0015, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.6271e-05, 7.3162e-05, 5.4525e-05, 6.4730e-05, 6.0261e-05, 5.4815e-05, 6.8531e-05, 5.5984e-05], device='cuda:0') 2022-11-15 23:33:13,769 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52373.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:33:28,831 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52396.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:33:33,945 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52403.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:33:35,125 INFO [train.py:876] (0/4) Epoch 8, batch 1500, loss[loss=0.1559, simple_loss=0.1657, pruned_loss=0.07299, over 5560.00 frames. ], tot_loss[loss=0.142, simple_loss=0.1596, pruned_loss=0.06217, over 1087048.13 frames. ], batch size: 43, lr: 1.03e-02, grad_scale: 8.0 2022-11-15 23:33:47,240 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.630e+02 1.908e+02 2.524e+02 5.804e+02, threshold=3.816e+02, percent-clipped=2.0 2022-11-15 23:33:51,626 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6834, 5.4142, 3.7111, 2.3685, 5.1631, 2.1990, 4.8927, 3.4939], device='cuda:0'), covar=tensor([0.0841, 0.0082, 0.0453, 0.1705, 0.0115, 0.1498, 0.0114, 0.1028], device='cuda:0'), in_proj_covar=tensor([0.0125, 0.0105, 0.0115, 0.0114, 0.0104, 0.0126, 0.0099, 0.0116], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 23:34:06,263 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52451.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:34:08,357 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5042, 3.2439, 3.7492, 1.4570, 3.1264, 3.6257, 3.6248, 4.0249], device='cuda:0'), covar=tensor([0.1650, 0.1376, 0.0672, 0.2608, 0.0354, 0.0605, 0.0284, 0.0404], device='cuda:0'), in_proj_covar=tensor([0.0178, 0.0186, 0.0155, 0.0187, 0.0169, 0.0178, 0.0150, 0.0184], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:0') 2022-11-15 23:34:09,257 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0 2022-11-15 23:34:09,652 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5664, 1.8719, 1.4908, 1.3515, 1.5886, 1.3047, 1.3052, 1.5684], device='cuda:0'), covar=tensor([0.0036, 0.0040, 0.0033, 0.0036, 0.0039, 0.0040, 0.0034, 0.0053], device='cuda:0'), in_proj_covar=tensor([0.0043, 0.0038, 0.0040, 0.0040, 0.0039, 0.0035, 0.0039, 0.0033], device='cuda:0'), out_proj_covar=tensor([3.8715e-05, 3.4571e-05, 3.6543e-05, 3.5968e-05, 3.4368e-05, 3.0459e-05, 3.7504e-05, 2.9047e-05], device='cuda:0') 2022-11-15 23:34:11,020 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5648, 1.9518, 2.1914, 2.5156, 2.7398, 2.1200, 1.7328, 2.7603], device='cuda:0'), covar=tensor([0.1140, 0.2377, 0.2087, 0.1072, 0.1174, 0.2551, 0.2046, 0.0836], device='cuda:0'), in_proj_covar=tensor([0.0212, 0.0206, 0.0199, 0.0321, 0.0221, 0.0213, 0.0194, 0.0217], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-15 23:34:32,249 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8591, 2.2246, 2.6118, 3.6582, 3.6944, 2.7190, 2.1737, 3.6489], device='cuda:0'), covar=tensor([0.0432, 0.3934, 0.2499, 0.2533, 0.0925, 0.3305, 0.2534, 0.0728], device='cuda:0'), in_proj_covar=tensor([0.0212, 0.0206, 0.0199, 0.0321, 0.0222, 0.0214, 0.0195, 0.0218], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-15 23:34:42,715 INFO [train.py:876] (0/4) Epoch 8, batch 1600, loss[loss=0.1344, simple_loss=0.1593, pruned_loss=0.05473, over 5657.00 frames. ], tot_loss[loss=0.1422, simple_loss=0.1598, pruned_loss=0.06228, over 1081213.90 frames. ], batch size: 29, lr: 1.03e-02, grad_scale: 8.0 2022-11-15 23:34:55,356 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.681e+02 2.048e+02 2.311e+02 7.167e+02, threshold=4.097e+02, percent-clipped=4.0 2022-11-15 23:35:13,813 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52549.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:35:18,147 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52555.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:35:54,042 INFO [train.py:876] (0/4) Epoch 8, batch 1700, loss[loss=0.1306, simple_loss=0.1481, pruned_loss=0.05661, over 5532.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.1589, pruned_loss=0.06125, over 1082257.86 frames. ], batch size: 13, lr: 1.03e-02, grad_scale: 8.0 2022-11-15 23:36:01,856 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52616.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:36:06,806 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.756e+02 2.105e+02 2.573e+02 4.026e+02, threshold=4.210e+02, percent-clipped=0.0 2022-11-15 23:36:18,164 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3729, 2.6168, 2.6307, 2.3916, 2.5868, 2.5905, 1.1737, 2.6586], device='cuda:0'), covar=tensor([0.0386, 0.0288, 0.0269, 0.0306, 0.0376, 0.0295, 0.2664, 0.0355], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0083, 0.0080, 0.0073, 0.0097, 0.0083, 0.0128, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 23:36:38,987 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52668.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:36:54,251 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7486, 2.3999, 2.6567, 3.5796, 3.4580, 2.7191, 2.2543, 3.5368], device='cuda:0'), covar=tensor([0.0579, 0.3083, 0.2799, 0.2862, 0.1229, 0.2803, 0.2308, 0.0565], device='cuda:0'), in_proj_covar=tensor([0.0219, 0.0209, 0.0201, 0.0326, 0.0227, 0.0219, 0.0199, 0.0223], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-15 23:36:59,013 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52696.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:37:05,449 INFO [train.py:876] (0/4) Epoch 8, batch 1800, loss[loss=0.1335, simple_loss=0.1553, pruned_loss=0.05585, over 5563.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1601, pruned_loss=0.06246, over 1086719.40 frames. ], batch size: 15, lr: 1.03e-02, grad_scale: 8.0 2022-11-15 23:37:18,120 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.718e+02 1.975e+02 2.529e+02 4.433e+02, threshold=3.950e+02, percent-clipped=1.0 2022-11-15 23:37:24,411 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.3338, 4.6710, 4.9570, 4.6283, 5.4322, 5.2715, 4.6444, 5.3039], device='cuda:0'), covar=tensor([0.0252, 0.0243, 0.0431, 0.0267, 0.0181, 0.0101, 0.0179, 0.0191], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0127, 0.0097, 0.0126, 0.0142, 0.0085, 0.0107, 0.0127], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 23:37:33,367 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52744.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:37:38,966 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=52752.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:38:17,349 INFO [train.py:876] (0/4) Epoch 8, batch 1900, loss[loss=0.1232, simple_loss=0.1567, pruned_loss=0.04484, over 5453.00 frames. ], tot_loss[loss=0.1428, simple_loss=0.1604, pruned_loss=0.06258, over 1082584.13 frames. ], batch size: 12, lr: 1.03e-02, grad_scale: 8.0 2022-11-15 23:38:18,619 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.87 vs. limit=2.0 2022-11-15 23:38:23,472 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=52813.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:38:30,568 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.627e+02 1.958e+02 2.548e+02 4.819e+02, threshold=3.916e+02, percent-clipped=3.0 2022-11-15 23:38:49,583 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52849.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:39:23,808 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=52897.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:39:24,646 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3696, 2.4626, 2.3658, 2.6126, 2.1631, 2.0669, 2.3446, 3.0554], device='cuda:0'), covar=tensor([0.1179, 0.1519, 0.2783, 0.1787, 0.1980, 0.1441, 0.2182, 0.2012], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0087, 0.0097, 0.0080, 0.0082, 0.0082, 0.0090, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:39:29,661 INFO [train.py:876] (0/4) Epoch 8, batch 2000, loss[loss=0.2384, simple_loss=0.2012, pruned_loss=0.1378, over 3141.00 frames. ], tot_loss[loss=0.1432, simple_loss=0.1605, pruned_loss=0.06299, over 1085824.23 frames. ], batch size: 284, lr: 1.03e-02, grad_scale: 8.0 2022-11-15 23:39:33,825 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=52911.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 23:39:41,635 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8394, 3.3283, 2.1664, 3.1887, 2.5882, 2.3670, 1.8832, 2.8119], device='cuda:0'), covar=tensor([0.2089, 0.0386, 0.1500, 0.0474, 0.1165, 0.1575, 0.2468, 0.0539], device='cuda:0'), in_proj_covar=tensor([0.0168, 0.0140, 0.0165, 0.0141, 0.0176, 0.0178, 0.0174, 0.0149], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-15 23:39:42,130 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.867e+01 1.663e+02 2.013e+02 2.665e+02 5.051e+02, threshold=4.025e+02, percent-clipped=6.0 2022-11-15 23:40:11,118 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2022-11-15 23:40:13,900 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2022-11-15 23:40:14,209 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=52968.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:40:24,291 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9390, 1.0986, 0.9888, 1.0011, 1.1760, 1.1659, 0.9142, 1.3062], device='cuda:0'), covar=tensor([0.0050, 0.0040, 0.0060, 0.0042, 0.0038, 0.0033, 0.0055, 0.0047], device='cuda:0'), in_proj_covar=tensor([0.0044, 0.0039, 0.0041, 0.0041, 0.0040, 0.0036, 0.0041, 0.0034], device='cuda:0'), out_proj_covar=tensor([3.9978e-05, 3.5409e-05, 3.7589e-05, 3.6863e-05, 3.5248e-05, 3.1029e-05, 3.8451e-05, 3.0294e-05], device='cuda:0') 2022-11-15 23:40:40,882 INFO [train.py:876] (0/4) Epoch 8, batch 2100, loss[loss=0.1236, simple_loss=0.1571, pruned_loss=0.045, over 5559.00 frames. ], tot_loss[loss=0.1423, simple_loss=0.1599, pruned_loss=0.06237, over 1088481.90 frames. ], batch size: 16, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:40:48,788 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53016.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:40:53,708 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.690e+02 2.165e+02 2.534e+02 4.185e+02, threshold=4.330e+02, percent-clipped=4.0 2022-11-15 23:40:55,968 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5167, 4.0083, 4.3136, 3.9952, 4.5812, 4.4343, 4.0892, 4.4473], device='cuda:0'), covar=tensor([0.0321, 0.0360, 0.0418, 0.0343, 0.0308, 0.0176, 0.0260, 0.0360], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0133, 0.0100, 0.0132, 0.0148, 0.0089, 0.0111, 0.0134], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 23:41:17,508 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53056.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:41:20,916 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([6.1273, 5.6295, 5.9415, 5.4763, 6.3075, 6.0291, 5.1229, 6.2041], device='cuda:0'), covar=tensor([0.0369, 0.0233, 0.0369, 0.0277, 0.0249, 0.0115, 0.0202, 0.0192], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0133, 0.0101, 0.0132, 0.0148, 0.0090, 0.0111, 0.0134], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 23:41:52,188 INFO [train.py:876] (0/4) Epoch 8, batch 2200, loss[loss=0.1025, simple_loss=0.1387, pruned_loss=0.03319, over 5552.00 frames. ], tot_loss[loss=0.1427, simple_loss=0.1604, pruned_loss=0.06249, over 1089909.21 frames. ], batch size: 15, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:41:54,668 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53108.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:42:01,091 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53117.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:42:05,401 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.642e+02 2.019e+02 2.545e+02 4.106e+02, threshold=4.038e+02, percent-clipped=0.0 2022-11-15 23:42:17,606 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.59 vs. limit=5.0 2022-11-15 23:42:52,190 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.97 vs. limit=2.0 2022-11-15 23:43:02,824 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.45 vs. limit=2.0 2022-11-15 23:43:05,026 INFO [train.py:876] (0/4) Epoch 8, batch 2300, loss[loss=0.2363, simple_loss=0.2166, pruned_loss=0.128, over 3129.00 frames. ], tot_loss[loss=0.1416, simple_loss=0.1593, pruned_loss=0.06193, over 1086898.24 frames. ], batch size: 284, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:43:09,397 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53211.0, num_to_drop=1, layers_to_drop={2} 2022-11-15 23:43:17,885 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.001e+02 1.609e+02 1.989e+02 2.421e+02 4.681e+02, threshold=3.978e+02, percent-clipped=2.0 2022-11-15 23:43:31,396 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53241.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:43:43,978 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53259.0, num_to_drop=1, layers_to_drop={1} 2022-11-15 23:44:14,682 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53302.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:44:14,969 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-15 23:44:16,033 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53304.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:44:16,588 INFO [train.py:876] (0/4) Epoch 8, batch 2400, loss[loss=0.1008, simple_loss=0.1202, pruned_loss=0.04067, over 5462.00 frames. ], tot_loss[loss=0.141, simple_loss=0.1593, pruned_loss=0.06135, over 1087006.35 frames. ], batch size: 11, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:44:24,075 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53315.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:44:29,616 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.667e+02 1.893e+02 2.315e+02 4.306e+02, threshold=3.787e+02, percent-clipped=3.0 2022-11-15 23:44:31,317 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2022-11-15 23:44:58,437 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-15 23:45:00,200 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53365.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:45:01,567 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=53367.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:45:03,759 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1766, 2.4079, 3.6590, 3.1979, 4.2668, 2.6205, 3.7108, 4.1229], device='cuda:0'), covar=tensor([0.0553, 0.1760, 0.0778, 0.1493, 0.0403, 0.1847, 0.0979, 0.0691], device='cuda:0'), in_proj_covar=tensor([0.0222, 0.0194, 0.0203, 0.0210, 0.0217, 0.0193, 0.0222, 0.0218], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 23:45:07,912 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53376.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:45:28,834 INFO [train.py:876] (0/4) Epoch 8, batch 2500, loss[loss=0.1358, simple_loss=0.1709, pruned_loss=0.05033, over 5534.00 frames. ], tot_loss[loss=0.1411, simple_loss=0.1593, pruned_loss=0.0614, over 1086857.64 frames. ], batch size: 17, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:45:31,083 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53408.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:45:33,741 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53412.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:45:41,343 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.752e+02 2.233e+02 2.745e+02 4.955e+02, threshold=4.465e+02, percent-clipped=8.0 2022-11-15 23:45:41,808 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.54 vs. limit=5.0 2022-11-15 23:45:44,958 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=53428.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:46:04,892 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53456.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:46:27,732 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4317, 3.4979, 3.2805, 3.1234, 1.9977, 3.3978, 2.0803, 2.9980], device='cuda:0'), covar=tensor([0.0356, 0.0127, 0.0182, 0.0263, 0.0425, 0.0134, 0.0411, 0.0142], device='cuda:0'), in_proj_covar=tensor([0.0182, 0.0153, 0.0163, 0.0184, 0.0179, 0.0164, 0.0175, 0.0161], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:46:39,623 INFO [train.py:876] (0/4) Epoch 8, batch 2600, loss[loss=0.2196, simple_loss=0.1902, pruned_loss=0.1245, over 3122.00 frames. ], tot_loss[loss=0.1371, simple_loss=0.1568, pruned_loss=0.05877, over 1087330.48 frames. ], batch size: 284, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:46:44,419 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2022-11-15 23:46:52,569 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.516e+01 1.584e+02 1.998e+02 2.446e+02 4.760e+02, threshold=3.997e+02, percent-clipped=2.0 2022-11-15 23:47:06,505 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.38 vs. limit=2.0 2022-11-15 23:47:45,386 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53597.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:47:51,091 INFO [train.py:876] (0/4) Epoch 8, batch 2700, loss[loss=0.153, simple_loss=0.175, pruned_loss=0.0655, over 5601.00 frames. ], tot_loss[loss=0.139, simple_loss=0.1575, pruned_loss=0.06023, over 1082458.98 frames. ], batch size: 18, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:48:04,169 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.781e+02 2.176e+02 2.706e+02 9.486e+02, threshold=4.353e+02, percent-clipped=5.0 2022-11-15 23:48:20,939 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4247, 2.5613, 2.3320, 2.5912, 2.0835, 2.1006, 2.4843, 2.8410], device='cuda:0'), covar=tensor([0.1586, 0.1400, 0.2532, 0.2755, 0.2167, 0.1549, 0.1770, 0.2779], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0087, 0.0097, 0.0081, 0.0082, 0.0084, 0.0089, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-15 23:48:30,816 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53660.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:48:38,711 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53671.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:49:01,445 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4374, 3.1797, 3.2924, 1.7880, 2.8554, 3.4021, 3.3562, 3.9532], device='cuda:0'), covar=tensor([0.1719, 0.1085, 0.0499, 0.2271, 0.0462, 0.0524, 0.0330, 0.0425], device='cuda:0'), in_proj_covar=tensor([0.0178, 0.0185, 0.0160, 0.0191, 0.0172, 0.0184, 0.0150, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-15 23:49:02,627 INFO [train.py:876] (0/4) Epoch 8, batch 2800, loss[loss=0.2068, simple_loss=0.1905, pruned_loss=0.1115, over 3160.00 frames. ], tot_loss[loss=0.14, simple_loss=0.1586, pruned_loss=0.06066, over 1081126.60 frames. ], batch size: 284, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:49:07,921 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53712.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:49:15,687 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.500e+01 1.616e+02 2.009e+02 2.401e+02 5.865e+02, threshold=4.018e+02, percent-clipped=2.0 2022-11-15 23:49:15,808 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=53723.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:49:42,114 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53760.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:50:15,003 INFO [train.py:876] (0/4) Epoch 8, batch 2900, loss[loss=0.1186, simple_loss=0.1413, pruned_loss=0.04795, over 5678.00 frames. ], tot_loss[loss=0.1403, simple_loss=0.1586, pruned_loss=0.06096, over 1085653.72 frames. ], batch size: 36, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:50:15,577 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.59 vs. limit=2.0 2022-11-15 23:50:20,643 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0363, 1.2294, 1.0892, 0.8480, 1.3626, 1.4894, 0.7413, 1.4888], device='cuda:0'), covar=tensor([0.0037, 0.0027, 0.0038, 0.0042, 0.0031, 0.0026, 0.0064, 0.0032], device='cuda:0'), in_proj_covar=tensor([0.0046, 0.0041, 0.0043, 0.0042, 0.0041, 0.0037, 0.0042, 0.0036], device='cuda:0'), out_proj_covar=tensor([4.1507e-05, 3.6966e-05, 3.9678e-05, 3.7660e-05, 3.6570e-05, 3.1916e-05, 3.9980e-05, 3.1543e-05], device='cuda:0') 2022-11-15 23:50:27,692 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.192e+01 1.630e+02 2.037e+02 2.446e+02 6.104e+02, threshold=4.074e+02, percent-clipped=4.0 2022-11-15 23:50:27,808 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3494, 4.0417, 4.1807, 4.0135, 4.4409, 4.2527, 4.0447, 4.4106], device='cuda:0'), covar=tensor([0.0387, 0.0278, 0.0434, 0.0283, 0.0360, 0.0243, 0.0240, 0.0283], device='cuda:0'), in_proj_covar=tensor([0.0124, 0.0132, 0.0101, 0.0131, 0.0145, 0.0089, 0.0109, 0.0132], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 23:50:36,561 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3128, 1.5561, 1.9377, 1.3454, 0.7914, 2.2089, 1.7798, 1.4274], device='cuda:0'), covar=tensor([0.1115, 0.1231, 0.1130, 0.3107, 0.3056, 0.0860, 0.1844, 0.1547], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0061, 0.0061, 0.0077, 0.0057, 0.0047, 0.0054, 0.0062], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-15 23:50:40,711 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5977, 2.1163, 2.6766, 3.5988, 3.5072, 2.7906, 2.3329, 3.5398], device='cuda:0'), covar=tensor([0.0936, 0.3025, 0.2346, 0.2719, 0.1041, 0.2856, 0.2119, 0.0639], device='cuda:0'), in_proj_covar=tensor([0.0219, 0.0203, 0.0200, 0.0323, 0.0223, 0.0213, 0.0195, 0.0218], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-15 23:51:11,673 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5204, 3.9814, 3.5577, 4.0060, 4.0351, 3.3698, 3.5366, 3.2847], device='cuda:0'), covar=tensor([0.0956, 0.0602, 0.1607, 0.0422, 0.0523, 0.0549, 0.0682, 0.0864], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0163, 0.0256, 0.0157, 0.0203, 0.0160, 0.0172, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-15 23:51:20,333 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53897.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:51:26,050 INFO [train.py:876] (0/4) Epoch 8, batch 3000, loss[loss=0.1452, simple_loss=0.167, pruned_loss=0.06171, over 5317.00 frames. ], tot_loss[loss=0.1406, simple_loss=0.1591, pruned_loss=0.06108, over 1087951.06 frames. ], batch size: 79, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:51:26,051 INFO [train.py:899] (0/4) Computing validation loss 2022-11-15 23:51:42,744 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7530, 2.6166, 2.6483, 2.5238, 2.7295, 2.6480, 2.6507, 2.6998], device='cuda:0'), covar=tensor([0.0404, 0.0679, 0.0476, 0.0706, 0.0576, 0.0338, 0.0438, 0.0756], device='cuda:0'), in_proj_covar=tensor([0.0124, 0.0134, 0.0101, 0.0132, 0.0147, 0.0089, 0.0110, 0.0132], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 23:51:44,991 INFO [train.py:908] (0/4) Epoch 8, validation: loss=0.1608, simple_loss=0.1816, pruned_loss=0.06996, over 1530663.00 frames. 2022-11-15 23:51:44,991 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-15 23:51:57,592 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.245e+01 1.684e+02 1.979e+02 2.404e+02 5.002e+02, threshold=3.957e+02, percent-clipped=2.0 2022-11-15 23:52:13,783 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=53945.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:52:24,749 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53960.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:52:32,415 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=53971.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:52:34,512 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1186, 3.2859, 3.2018, 3.0100, 3.2839, 3.1658, 1.1677, 3.2692], device='cuda:0'), covar=tensor([0.0431, 0.0309, 0.0353, 0.0350, 0.0434, 0.0392, 0.3631, 0.0411], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0083, 0.0081, 0.0075, 0.0098, 0.0084, 0.0129, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 23:52:57,152 INFO [train.py:876] (0/4) Epoch 8, batch 3100, loss[loss=0.1698, simple_loss=0.1723, pruned_loss=0.0837, over 5571.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.159, pruned_loss=0.06118, over 1085651.78 frames. ], batch size: 43, lr: 1.02e-02, grad_scale: 8.0 2022-11-15 23:52:59,282 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=54008.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:53:07,362 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=54019.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:53:09,940 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.766e+01 1.773e+02 2.219e+02 2.737e+02 4.389e+02, threshold=4.437e+02, percent-clipped=4.0 2022-11-15 23:53:10,099 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=54023.0, num_to_drop=1, layers_to_drop={0} 2022-11-15 23:53:43,826 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=54071.0, num_to_drop=0, layers_to_drop=set() 2022-11-15 23:54:08,038 INFO [train.py:876] (0/4) Epoch 8, batch 3200, loss[loss=0.1082, simple_loss=0.1329, pruned_loss=0.04171, over 5547.00 frames. ], tot_loss[loss=0.139, simple_loss=0.1581, pruned_loss=0.05997, over 1091846.47 frames. ], batch size: 21, lr: 1.01e-02, grad_scale: 16.0 2022-11-15 23:54:21,049 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.665e+02 2.003e+02 2.661e+02 5.081e+02, threshold=4.007e+02, percent-clipped=1.0 2022-11-15 23:54:21,909 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9640, 5.6622, 4.0180, 2.8738, 5.2423, 3.2082, 5.3692, 3.6477], device='cuda:0'), covar=tensor([0.0837, 0.0072, 0.0554, 0.1888, 0.0103, 0.1160, 0.0102, 0.1021], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0107, 0.0115, 0.0116, 0.0104, 0.0128, 0.0100, 0.0117], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-15 23:55:20,029 INFO [train.py:876] (0/4) Epoch 8, batch 3300, loss[loss=0.1201, simple_loss=0.1509, pruned_loss=0.04465, over 5541.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.1579, pruned_loss=0.05985, over 1092895.21 frames. ], batch size: 15, lr: 1.01e-02, grad_scale: 16.0 2022-11-15 23:55:32,990 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.566e+02 1.855e+02 2.366e+02 3.545e+02, threshold=3.710e+02, percent-clipped=0.0 2022-11-15 23:56:16,674 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2543, 3.8680, 4.1576, 3.7935, 4.3046, 4.0941, 3.8882, 4.2572], device='cuda:0'), covar=tensor([0.0321, 0.0347, 0.0327, 0.0337, 0.0336, 0.0260, 0.0341, 0.0364], device='cuda:0'), in_proj_covar=tensor([0.0123, 0.0132, 0.0100, 0.0132, 0.0147, 0.0088, 0.0109, 0.0131], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-15 23:56:31,635 INFO [train.py:876] (0/4) Epoch 8, batch 3400, loss[loss=0.09411, simple_loss=0.1257, pruned_loss=0.03128, over 5488.00 frames. ], tot_loss[loss=0.1382, simple_loss=0.1574, pruned_loss=0.05949, over 1091381.69 frames. ], batch size: 10, lr: 1.01e-02, grad_scale: 16.0 2022-11-15 23:56:43,984 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.624e+02 2.119e+02 2.818e+02 4.148e+02, threshold=4.237e+02, percent-clipped=5.0 2022-11-15 23:57:28,006 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2022-11-15 23:57:44,047 INFO [train.py:876] (0/4) Epoch 8, batch 3500, loss[loss=0.1525, simple_loss=0.152, pruned_loss=0.07649, over 4203.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.1589, pruned_loss=0.06127, over 1085450.13 frames. ], batch size: 181, lr: 1.01e-02, grad_scale: 16.0 2022-11-15 23:57:56,209 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 1.748e+02 2.123e+02 2.644e+02 4.958e+02, threshold=4.247e+02, percent-clipped=1.0 2022-11-15 23:58:17,439 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0076, 4.1765, 4.2380, 4.3763, 3.9957, 3.6397, 4.7831, 4.1333], device='cuda:0'), covar=tensor([0.0542, 0.0828, 0.0443, 0.0994, 0.0497, 0.0367, 0.0766, 0.0720], device='cuda:0'), in_proj_covar=tensor([0.0076, 0.0099, 0.0086, 0.0108, 0.0079, 0.0070, 0.0135, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-15 23:58:43,512 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2022-11-15 23:58:53,937 INFO [train.py:876] (0/4) Epoch 8, batch 3600, loss[loss=0.1302, simple_loss=0.1648, pruned_loss=0.04786, over 5737.00 frames. ], tot_loss[loss=0.1422, simple_loss=0.16, pruned_loss=0.06222, over 1088395.81 frames. ], batch size: 20, lr: 1.01e-02, grad_scale: 16.0 2022-11-15 23:59:05,648 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.193e+02 1.765e+02 2.064e+02 2.542e+02 7.404e+02, threshold=4.127e+02, percent-clipped=4.0 2022-11-15 23:59:14,232 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0631, 1.2883, 1.2701, 0.7707, 1.2081, 1.5712, 0.6220, 1.3828], device='cuda:0'), covar=tensor([0.0032, 0.0018, 0.0025, 0.0024, 0.0025, 0.0020, 0.0049, 0.0020], device='cuda:0'), in_proj_covar=tensor([0.0045, 0.0040, 0.0043, 0.0042, 0.0041, 0.0037, 0.0041, 0.0035], device='cuda:0'), out_proj_covar=tensor([4.0215e-05, 3.6307e-05, 3.9288e-05, 3.8048e-05, 3.6870e-05, 3.2085e-05, 3.8833e-05, 3.1212e-05], device='cuda:0') 2022-11-16 00:00:01,687 INFO [train.py:876] (0/4) Epoch 8, batch 3700, loss[loss=0.2116, simple_loss=0.2051, pruned_loss=0.1091, over 5363.00 frames. ], tot_loss[loss=0.1392, simple_loss=0.1581, pruned_loss=0.06018, over 1089716.60 frames. ], batch size: 70, lr: 1.01e-02, grad_scale: 16.0 2022-11-16 00:00:14,195 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.627e+02 2.007e+02 2.375e+02 5.660e+02, threshold=4.014e+02, percent-clipped=3.0 2022-11-16 00:00:41,897 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=54664.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 00:01:09,319 INFO [train.py:876] (0/4) Epoch 8, batch 3800, loss[loss=0.1949, simple_loss=0.1992, pruned_loss=0.09523, over 5563.00 frames. ], tot_loss[loss=0.1392, simple_loss=0.1576, pruned_loss=0.06045, over 1086456.99 frames. ], batch size: 43, lr: 1.01e-02, grad_scale: 16.0 2022-11-16 00:01:22,413 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.662e+02 2.074e+02 2.682e+02 3.562e+02, threshold=4.148e+02, percent-clipped=0.0 2022-11-16 00:01:23,829 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=54725.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 00:01:49,399 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4462, 1.6935, 1.6376, 1.1655, 0.9310, 1.5430, 1.1334, 1.4477], device='cuda:0'), covar=tensor([0.0042, 0.0054, 0.0031, 0.0044, 0.0044, 0.0027, 0.0035, 0.0083], device='cuda:0'), in_proj_covar=tensor([0.0045, 0.0040, 0.0043, 0.0042, 0.0042, 0.0037, 0.0042, 0.0035], device='cuda:0'), out_proj_covar=tensor([4.0331e-05, 3.6623e-05, 3.9246e-05, 3.8017e-05, 3.7148e-05, 3.2321e-05, 3.9303e-05, 3.1299e-05], device='cuda:0') 2022-11-16 00:01:50,699 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5257, 2.0575, 3.1848, 2.7495, 3.1123, 2.1258, 2.9728, 3.4896], device='cuda:0'), covar=tensor([0.0717, 0.1686, 0.0852, 0.1615, 0.0847, 0.1712, 0.1185, 0.0912], device='cuda:0'), in_proj_covar=tensor([0.0221, 0.0194, 0.0202, 0.0210, 0.0216, 0.0190, 0.0221, 0.0217], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:02:17,442 INFO [train.py:876] (0/4) Epoch 8, batch 3900, loss[loss=0.08718, simple_loss=0.1232, pruned_loss=0.02556, over 5762.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.1572, pruned_loss=0.06022, over 1082158.57 frames. ], batch size: 16, lr: 1.01e-02, grad_scale: 16.0 2022-11-16 00:02:29,731 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.833e+01 1.696e+02 2.085e+02 2.412e+02 7.560e+02, threshold=4.170e+02, percent-clipped=1.0 2022-11-16 00:02:30,842 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7718, 4.8383, 4.7317, 5.0249, 4.4820, 4.0269, 5.4022, 4.5714], device='cuda:0'), covar=tensor([0.0259, 0.0488, 0.0309, 0.0677, 0.0393, 0.0268, 0.0457, 0.0504], device='cuda:0'), in_proj_covar=tensor([0.0075, 0.0097, 0.0084, 0.0106, 0.0079, 0.0069, 0.0132, 0.0089], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 00:02:42,333 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2626, 3.6192, 2.7190, 1.6459, 3.4450, 1.3894, 3.4071, 1.7938], device='cuda:0'), covar=tensor([0.1436, 0.0177, 0.0813, 0.2019, 0.0216, 0.2146, 0.0238, 0.1915], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0106, 0.0116, 0.0115, 0.0105, 0.0128, 0.0099, 0.0116], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:02:56,182 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=54862.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:03:11,695 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.27 vs. limit=5.0 2022-11-16 00:03:25,455 INFO [train.py:876] (0/4) Epoch 8, batch 4000, loss[loss=0.1347, simple_loss=0.1452, pruned_loss=0.0621, over 5589.00 frames. ], tot_loss[loss=0.1394, simple_loss=0.1573, pruned_loss=0.0607, over 1080557.54 frames. ], batch size: 43, lr: 1.01e-02, grad_scale: 16.0 2022-11-16 00:03:36,933 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.57 vs. limit=5.0 2022-11-16 00:03:37,028 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.640e+02 2.022e+02 2.606e+02 3.847e+02, threshold=4.045e+02, percent-clipped=0.0 2022-11-16 00:03:37,222 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=54923.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 00:04:33,507 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-55000.pt 2022-11-16 00:04:39,876 INFO [train.py:876] (0/4) Epoch 8, batch 4100, loss[loss=0.1387, simple_loss=0.1403, pruned_loss=0.06849, over 4150.00 frames. ], tot_loss[loss=0.1389, simple_loss=0.1575, pruned_loss=0.06017, over 1082410.14 frames. ], batch size: 181, lr: 1.01e-02, grad_scale: 16.0 2022-11-16 00:04:49,731 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=55020.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 00:04:51,561 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.489e+01 1.632e+02 1.927e+02 2.505e+02 4.639e+02, threshold=3.854e+02, percent-clipped=4.0 2022-11-16 00:05:02,502 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9873, 4.0901, 3.8229, 3.8479, 4.0903, 3.8464, 1.6286, 4.1848], device='cuda:0'), covar=tensor([0.0264, 0.0223, 0.0237, 0.0340, 0.0304, 0.0329, 0.3015, 0.0287], device='cuda:0'), in_proj_covar=tensor([0.0099, 0.0083, 0.0080, 0.0074, 0.0098, 0.0084, 0.0127, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 00:05:06,108 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0149, 2.1771, 1.9673, 2.3561, 1.8285, 1.9741, 1.9641, 2.5792], device='cuda:0'), covar=tensor([0.1191, 0.2045, 0.2677, 0.1478, 0.1994, 0.1466, 0.2099, 0.1093], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0088, 0.0093, 0.0079, 0.0081, 0.0084, 0.0088, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 00:05:14,244 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8297, 2.1254, 2.4500, 3.1439, 3.1428, 2.4911, 2.1380, 3.2515], device='cuda:0'), covar=tensor([0.1346, 0.2853, 0.2219, 0.2850, 0.1095, 0.2830, 0.2154, 0.0840], device='cuda:0'), in_proj_covar=tensor([0.0223, 0.0207, 0.0203, 0.0324, 0.0226, 0.0216, 0.0196, 0.0224], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 00:05:35,881 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8395, 2.5497, 1.8803, 2.3874, 2.5199, 2.4147, 2.4779, 2.6321], device='cuda:0'), covar=tensor([0.0439, 0.1117, 0.2758, 0.1158, 0.1088, 0.0803, 0.1099, 0.0773], device='cuda:0'), in_proj_covar=tensor([0.0124, 0.0168, 0.0263, 0.0163, 0.0208, 0.0164, 0.0175, 0.0162], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 00:05:39,724 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-16 00:05:47,254 INFO [train.py:876] (0/4) Epoch 8, batch 4200, loss[loss=0.1797, simple_loss=0.19, pruned_loss=0.08468, over 5368.00 frames. ], tot_loss[loss=0.1391, simple_loss=0.1578, pruned_loss=0.06014, over 1083102.35 frames. ], batch size: 70, lr: 1.01e-02, grad_scale: 16.0 2022-11-16 00:05:59,242 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.030e+02 1.648e+02 1.989e+02 2.446e+02 4.173e+02, threshold=3.979e+02, percent-clipped=3.0 2022-11-16 00:06:14,433 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5226, 4.7292, 3.0992, 4.3821, 3.5777, 3.0308, 2.6493, 4.0739], device='cuda:0'), covar=tensor([0.1462, 0.0217, 0.1074, 0.0294, 0.0607, 0.0949, 0.1776, 0.0287], device='cuda:0'), in_proj_covar=tensor([0.0169, 0.0142, 0.0165, 0.0142, 0.0176, 0.0178, 0.0175, 0.0152], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 00:06:16,041 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=55148.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:06:39,781 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4803, 2.4625, 1.8798, 2.8341, 1.8849, 2.1722, 2.3641, 2.8673], device='cuda:0'), covar=tensor([0.1128, 0.1830, 0.3372, 0.1290, 0.2239, 0.1627, 0.2323, 0.2150], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0089, 0.0095, 0.0080, 0.0082, 0.0084, 0.0089, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 00:06:44,441 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9468, 2.1593, 2.1070, 1.3445, 2.2811, 2.5514, 2.3483, 2.5417], device='cuda:0'), covar=tensor([0.1845, 0.1542, 0.1299, 0.2679, 0.0709, 0.0705, 0.0492, 0.0936], device='cuda:0'), in_proj_covar=tensor([0.0171, 0.0181, 0.0156, 0.0186, 0.0168, 0.0180, 0.0146, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0003, 0.0003, 0.0004], device='cuda:0') 2022-11-16 00:06:54,338 INFO [train.py:876] (0/4) Epoch 8, batch 4300, loss[loss=0.09565, simple_loss=0.1247, pruned_loss=0.03332, over 5718.00 frames. ], tot_loss[loss=0.14, simple_loss=0.1581, pruned_loss=0.06093, over 1075662.61 frames. ], batch size: 12, lr: 1.00e-02, grad_scale: 16.0 2022-11-16 00:06:57,942 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=55209.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:06:59,239 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=55211.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:07:04,169 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=55218.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 00:07:07,358 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.645e+01 1.651e+02 2.028e+02 2.598e+02 5.835e+02, threshold=4.056e+02, percent-clipped=3.0 2022-11-16 00:07:07,814 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2022-11-16 00:07:30,838 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5929, 3.6310, 3.5999, 3.2816, 3.6270, 3.3884, 1.4474, 3.7640], device='cuda:0'), covar=tensor([0.0277, 0.0243, 0.0250, 0.0331, 0.0301, 0.0373, 0.2956, 0.0279], device='cuda:0'), in_proj_covar=tensor([0.0102, 0.0084, 0.0081, 0.0075, 0.0099, 0.0086, 0.0129, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 00:07:40,457 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=55272.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:07:58,405 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0739, 1.7772, 2.0362, 2.0157, 2.4840, 1.8643, 1.4744, 2.2670], device='cuda:0'), covar=tensor([0.1198, 0.2015, 0.1622, 0.0645, 0.0852, 0.2191, 0.1877, 0.1023], device='cuda:0'), in_proj_covar=tensor([0.0223, 0.0206, 0.0205, 0.0322, 0.0226, 0.0217, 0.0196, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 00:08:02,074 INFO [train.py:876] (0/4) Epoch 8, batch 4400, loss[loss=0.1365, simple_loss=0.1631, pruned_loss=0.05491, over 5727.00 frames. ], tot_loss[loss=0.1382, simple_loss=0.1568, pruned_loss=0.05987, over 1072379.22 frames. ], batch size: 28, lr: 1.00e-02, grad_scale: 16.0 2022-11-16 00:08:12,638 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=55320.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 00:08:14,768 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.701e+02 2.121e+02 2.893e+02 5.250e+02, threshold=4.241e+02, percent-clipped=3.0 2022-11-16 00:08:44,935 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=55368.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 00:09:10,910 INFO [train.py:876] (0/4) Epoch 8, batch 4500, loss[loss=0.1316, simple_loss=0.1724, pruned_loss=0.04537, over 5713.00 frames. ], tot_loss[loss=0.1394, simple_loss=0.1581, pruned_loss=0.06036, over 1080173.44 frames. ], batch size: 17, lr: 1.00e-02, grad_scale: 16.0 2022-11-16 00:09:13,636 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0833, 0.8053, 0.8345, 0.7780, 1.1452, 0.9317, 0.7980, 0.8170], device='cuda:0'), covar=tensor([0.0350, 0.0358, 0.0364, 0.0548, 0.0264, 0.0248, 0.0751, 0.0440], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0016, 0.0011, 0.0014, 0.0012, 0.0010, 0.0015, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.6584e-05, 7.3061e-05, 5.6840e-05, 6.7234e-05, 6.1208e-05, 5.5006e-05, 6.9815e-05, 5.6948e-05], device='cuda:0') 2022-11-16 00:09:22,570 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 1.575e+02 1.933e+02 2.382e+02 3.910e+02, threshold=3.866e+02, percent-clipped=0.0 2022-11-16 00:10:01,877 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2097, 2.6272, 2.9134, 1.5815, 2.7833, 3.1720, 3.3165, 3.3126], device='cuda:0'), covar=tensor([0.2021, 0.1835, 0.1184, 0.3239, 0.0547, 0.0924, 0.0301, 0.0975], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0186, 0.0158, 0.0192, 0.0172, 0.0183, 0.0150, 0.0186], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 00:10:18,066 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=55504.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:10:18,620 INFO [train.py:876] (0/4) Epoch 8, batch 4600, loss[loss=0.1065, simple_loss=0.1359, pruned_loss=0.03859, over 5450.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.1587, pruned_loss=0.06161, over 1079295.47 frames. ], batch size: 11, lr: 1.00e-02, grad_scale: 16.0 2022-11-16 00:10:25,289 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2745, 4.3297, 3.0222, 4.0834, 3.2697, 2.9475, 2.5869, 3.7189], device='cuda:0'), covar=tensor([0.1593, 0.0219, 0.0914, 0.0286, 0.0738, 0.0958, 0.1805, 0.0353], device='cuda:0'), in_proj_covar=tensor([0.0162, 0.0137, 0.0160, 0.0136, 0.0169, 0.0169, 0.0168, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 00:10:27,253 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=55518.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 00:10:30,361 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.015e+02 1.826e+02 2.052e+02 2.615e+02 3.619e+02, threshold=4.103e+02, percent-clipped=0.0 2022-11-16 00:10:59,622 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=55566.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:11:00,307 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=55567.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:11:26,709 INFO [train.py:876] (0/4) Epoch 8, batch 4700, loss[loss=0.1778, simple_loss=0.1677, pruned_loss=0.09392, over 4121.00 frames. ], tot_loss[loss=0.1401, simple_loss=0.1578, pruned_loss=0.06119, over 1077176.69 frames. ], batch size: 181, lr: 1.00e-02, grad_scale: 16.0 2022-11-16 00:11:33,372 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8582, 1.3962, 1.2899, 0.7008, 1.3782, 1.4036, 0.5271, 1.3212], device='cuda:0'), covar=tensor([0.0042, 0.0024, 0.0033, 0.0041, 0.0030, 0.0025, 0.0065, 0.0030], device='cuda:0'), in_proj_covar=tensor([0.0046, 0.0041, 0.0044, 0.0043, 0.0042, 0.0037, 0.0043, 0.0036], device='cuda:0'), out_proj_covar=tensor([4.1384e-05, 3.7474e-05, 3.9572e-05, 3.8305e-05, 3.7658e-05, 3.2388e-05, 3.9907e-05, 3.1909e-05], device='cuda:0') 2022-11-16 00:11:38,375 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.703e+02 2.017e+02 2.735e+02 4.468e+02, threshold=4.034e+02, percent-clipped=2.0 2022-11-16 00:12:33,872 INFO [train.py:876] (0/4) Epoch 8, batch 4800, loss[loss=0.1272, simple_loss=0.1586, pruned_loss=0.04784, over 5730.00 frames. ], tot_loss[loss=0.1379, simple_loss=0.1567, pruned_loss=0.05956, over 1081867.85 frames. ], batch size: 31, lr: 1.00e-02, grad_scale: 16.0 2022-11-16 00:12:40,312 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.96 vs. limit=2.0 2022-11-16 00:12:46,304 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.028e+02 1.529e+02 1.884e+02 2.231e+02 4.028e+02, threshold=3.767e+02, percent-clipped=0.0 2022-11-16 00:12:51,852 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.85 vs. limit=2.0 2022-11-16 00:12:54,867 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4286, 3.2150, 3.2055, 2.9232, 2.0213, 3.3556, 2.0708, 2.8706], device='cuda:0'), covar=tensor([0.0294, 0.0211, 0.0116, 0.0286, 0.0353, 0.0107, 0.0377, 0.0104], device='cuda:0'), in_proj_covar=tensor([0.0183, 0.0156, 0.0167, 0.0185, 0.0180, 0.0165, 0.0176, 0.0161], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 00:13:02,505 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=55748.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:13:13,256 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2137, 1.4266, 1.4386, 0.8745, 1.1437, 1.4924, 1.0339, 1.2800], device='cuda:0'), covar=tensor([0.0043, 0.0045, 0.0035, 0.0038, 0.0029, 0.0028, 0.0045, 0.0053], device='cuda:0'), in_proj_covar=tensor([0.0046, 0.0041, 0.0044, 0.0043, 0.0042, 0.0038, 0.0043, 0.0036], device='cuda:0'), out_proj_covar=tensor([4.1248e-05, 3.7221e-05, 3.9647e-05, 3.8387e-05, 3.7354e-05, 3.2534e-05, 4.0216e-05, 3.2187e-05], device='cuda:0') 2022-11-16 00:13:26,010 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8692, 1.4050, 0.9831, 1.0447, 0.9745, 1.6347, 1.1700, 1.0628], device='cuda:0'), covar=tensor([0.2208, 0.0501, 0.1899, 0.2211, 0.2111, 0.0467, 0.2225, 0.2245], device='cuda:0'), in_proj_covar=tensor([0.0076, 0.0066, 0.0063, 0.0080, 0.0061, 0.0050, 0.0058, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0001, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001], device='cuda:0') 2022-11-16 00:13:29,944 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1773, 3.3642, 2.3901, 1.7512, 3.1424, 1.3050, 3.1489, 1.6905], device='cuda:0'), covar=tensor([0.1403, 0.0175, 0.0948, 0.1850, 0.0249, 0.2205, 0.0249, 0.1812], device='cuda:0'), in_proj_covar=tensor([0.0127, 0.0109, 0.0118, 0.0118, 0.0107, 0.0130, 0.0101, 0.0117], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:13:40,597 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=55804.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:13:41,117 INFO [train.py:876] (0/4) Epoch 8, batch 4900, loss[loss=0.1169, simple_loss=0.1428, pruned_loss=0.04544, over 5737.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.1562, pruned_loss=0.0594, over 1088278.67 frames. ], batch size: 14, lr: 9.99e-03, grad_scale: 16.0 2022-11-16 00:13:43,886 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=55809.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:13:47,103 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.82 vs. limit=5.0 2022-11-16 00:13:53,468 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.007e+01 1.783e+02 2.085e+02 2.465e+02 4.573e+02, threshold=4.169e+02, percent-clipped=4.0 2022-11-16 00:14:08,070 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8582, 2.1789, 3.3924, 2.8728, 3.7252, 2.2386, 3.1514, 3.7958], device='cuda:0'), covar=tensor([0.0801, 0.1686, 0.0816, 0.1530, 0.0488, 0.1639, 0.1283, 0.0805], device='cuda:0'), in_proj_covar=tensor([0.0223, 0.0192, 0.0200, 0.0209, 0.0219, 0.0192, 0.0226, 0.0221], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:14:10,697 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=55848.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:14:13,177 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=55852.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:14:14,816 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.41 vs. limit=5.0 2022-11-16 00:14:21,230 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. limit=2.0 2022-11-16 00:14:23,018 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=55867.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:14:30,187 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6909, 4.2634, 3.3396, 1.8010, 4.0023, 1.7502, 3.9404, 2.3598], device='cuda:0'), covar=tensor([0.1149, 0.0141, 0.0522, 0.2154, 0.0182, 0.1894, 0.0160, 0.1524], device='cuda:0'), in_proj_covar=tensor([0.0124, 0.0106, 0.0115, 0.0116, 0.0105, 0.0127, 0.0098, 0.0114], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:14:49,101 INFO [train.py:876] (0/4) Epoch 8, batch 5000, loss[loss=0.1398, simple_loss=0.1687, pruned_loss=0.05547, over 5553.00 frames. ], tot_loss[loss=0.1373, simple_loss=0.1561, pruned_loss=0.05919, over 1083616.39 frames. ], batch size: 16, lr: 9.98e-03, grad_scale: 16.0 2022-11-16 00:14:51,907 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=55909.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:14:55,671 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=55915.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:15:00,898 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.639e+01 1.479e+02 1.807e+02 2.290e+02 3.768e+02, threshold=3.615e+02, percent-clipped=0.0 2022-11-16 00:15:02,027 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.7696, 4.9018, 5.3428, 4.9746, 5.7399, 5.5831, 4.8806, 5.5524], device='cuda:0'), covar=tensor([0.0220, 0.0253, 0.0437, 0.0269, 0.0250, 0.0113, 0.0184, 0.0199], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0135, 0.0102, 0.0133, 0.0150, 0.0088, 0.0113, 0.0134], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 00:15:18,033 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.70 vs. limit=2.0 2022-11-16 00:15:37,757 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-16 00:15:40,809 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.98 vs. limit=5.0 2022-11-16 00:15:57,403 INFO [train.py:876] (0/4) Epoch 8, batch 5100, loss[loss=0.1127, simple_loss=0.1326, pruned_loss=0.04635, over 5457.00 frames. ], tot_loss[loss=0.1334, simple_loss=0.1538, pruned_loss=0.05653, over 1089378.55 frames. ], batch size: 12, lr: 9.97e-03, grad_scale: 16.0 2022-11-16 00:16:09,575 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.582e+02 1.987e+02 2.357e+02 3.737e+02, threshold=3.975e+02, percent-clipped=1.0 2022-11-16 00:17:05,608 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=56104.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:17:06,566 INFO [train.py:876] (0/4) Epoch 8, batch 5200, loss[loss=0.137, simple_loss=0.1369, pruned_loss=0.06856, over 4733.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.1557, pruned_loss=0.05874, over 1085279.22 frames. ], batch size: 135, lr: 9.96e-03, grad_scale: 32.0 2022-11-16 00:17:16,566 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9644, 1.6719, 1.2720, 0.8757, 1.1496, 1.2771, 0.7516, 1.3234], device='cuda:0'), covar=tensor([0.0053, 0.0028, 0.0031, 0.0038, 0.0031, 0.0031, 0.0045, 0.0046], device='cuda:0'), in_proj_covar=tensor([0.0045, 0.0040, 0.0043, 0.0042, 0.0041, 0.0037, 0.0043, 0.0036], device='cuda:0'), out_proj_covar=tensor([4.0874e-05, 3.6400e-05, 3.9034e-05, 3.8103e-05, 3.6496e-05, 3.2498e-05, 4.0106e-05, 3.1884e-05], device='cuda:0') 2022-11-16 00:17:18,356 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.676e+02 2.046e+02 2.590e+02 6.107e+02, threshold=4.093e+02, percent-clipped=5.0 2022-11-16 00:17:32,915 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56145.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:18:13,506 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=56204.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:18:14,419 INFO [train.py:876] (0/4) Epoch 8, batch 5300, loss[loss=0.1479, simple_loss=0.1712, pruned_loss=0.06233, over 5605.00 frames. ], tot_loss[loss=0.1365, simple_loss=0.1559, pruned_loss=0.05861, over 1085723.06 frames. ], batch size: 24, lr: 9.95e-03, grad_scale: 16.0 2022-11-16 00:18:15,310 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=56206.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:18:27,817 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.018e+02 1.491e+02 1.984e+02 2.507e+02 5.516e+02, threshold=3.968e+02, percent-clipped=2.0 2022-11-16 00:19:00,753 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7469, 1.5328, 1.8076, 1.7503, 1.0354, 1.6047, 1.1659, 1.3901], device='cuda:0'), covar=tensor([0.0093, 0.0052, 0.0055, 0.0060, 0.0195, 0.0067, 0.0139, 0.0079], device='cuda:0'), in_proj_covar=tensor([0.0183, 0.0156, 0.0167, 0.0187, 0.0182, 0.0167, 0.0179, 0.0163], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 00:19:22,482 INFO [train.py:876] (0/4) Epoch 8, batch 5400, loss[loss=0.1628, simple_loss=0.1729, pruned_loss=0.07635, over 5530.00 frames. ], tot_loss[loss=0.1383, simple_loss=0.1566, pruned_loss=0.05995, over 1078587.32 frames. ], batch size: 46, lr: 9.94e-03, grad_scale: 16.0 2022-11-16 00:19:34,944 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.62 vs. limit=5.0 2022-11-16 00:19:35,869 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.666e+02 2.112e+02 2.749e+02 4.650e+02, threshold=4.223e+02, percent-clipped=6.0 2022-11-16 00:19:51,596 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-16 00:19:54,311 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.75 vs. limit=5.0 2022-11-16 00:20:29,852 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=56404.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:20:30,393 INFO [train.py:876] (0/4) Epoch 8, batch 5500, loss[loss=0.08886, simple_loss=0.111, pruned_loss=0.03336, over 4725.00 frames. ], tot_loss[loss=0.1408, simple_loss=0.1589, pruned_loss=0.06138, over 1075032.41 frames. ], batch size: 5, lr: 9.94e-03, grad_scale: 16.0 2022-11-16 00:20:34,208 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-16 00:20:42,595 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.705e+02 2.205e+02 2.519e+02 5.507e+02, threshold=4.409e+02, percent-clipped=4.0 2022-11-16 00:20:42,788 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6932, 1.7785, 1.8947, 2.0368, 1.8201, 1.5418, 1.8047, 1.9399], device='cuda:0'), covar=tensor([0.1831, 0.2412, 0.2333, 0.1442, 0.1946, 0.2675, 0.1759, 0.0735], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0088, 0.0096, 0.0082, 0.0082, 0.0085, 0.0089, 0.0064], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 00:20:56,410 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4278, 4.0817, 3.0733, 1.8748, 3.8167, 1.5383, 3.4802, 2.0540], device='cuda:0'), covar=tensor([0.1180, 0.0116, 0.0700, 0.1801, 0.0166, 0.1819, 0.0265, 0.1484], device='cuda:0'), in_proj_covar=tensor([0.0124, 0.0106, 0.0114, 0.0115, 0.0104, 0.0126, 0.0097, 0.0114], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 00:21:02,100 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=56452.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:21:06,285 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2022-11-16 00:21:10,707 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7716, 1.4815, 1.5172, 0.9253, 1.3164, 1.4867, 1.4041, 1.5878], device='cuda:0'), covar=tensor([0.1299, 0.0677, 0.0857, 0.1611, 0.2815, 0.0576, 0.0587, 0.0643], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0016, 0.0012, 0.0014, 0.0013, 0.0011, 0.0015, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.8626e-05, 7.7436e-05, 6.1781e-05, 6.9529e-05, 6.4563e-05, 5.7825e-05, 7.2204e-05, 5.8287e-05], device='cuda:0') 2022-11-16 00:21:35,008 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0557, 1.5296, 1.9234, 1.7858, 2.1838, 1.7319, 1.4264, 1.9577], device='cuda:0'), covar=tensor([0.1135, 0.1742, 0.1120, 0.0823, 0.0735, 0.1930, 0.1933, 0.1686], device='cuda:0'), in_proj_covar=tensor([0.0225, 0.0207, 0.0198, 0.0325, 0.0222, 0.0215, 0.0197, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 00:21:35,574 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=56501.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:21:37,720 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=56504.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:21:38,227 INFO [train.py:876] (0/4) Epoch 8, batch 5600, loss[loss=0.1655, simple_loss=0.1843, pruned_loss=0.07342, over 5746.00 frames. ], tot_loss[loss=0.14, simple_loss=0.1586, pruned_loss=0.06076, over 1074229.93 frames. ], batch size: 27, lr: 9.93e-03, grad_scale: 16.0 2022-11-16 00:21:50,130 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56523.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 00:21:50,546 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.658e+02 2.017e+02 2.598e+02 4.706e+02, threshold=4.034e+02, percent-clipped=2.0 2022-11-16 00:22:00,794 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5378, 3.2525, 3.3786, 3.1873, 3.5985, 3.4966, 3.3629, 3.5500], device='cuda:0'), covar=tensor([0.0466, 0.0428, 0.0534, 0.0430, 0.0427, 0.0213, 0.0398, 0.0491], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0134, 0.0103, 0.0132, 0.0149, 0.0090, 0.0114, 0.0135], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 00:22:09,934 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=56552.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:22:31,083 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=56584.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 00:22:46,105 INFO [train.py:876] (0/4) Epoch 8, batch 5700, loss[loss=0.1847, simple_loss=0.1928, pruned_loss=0.08829, over 5590.00 frames. ], tot_loss[loss=0.1396, simple_loss=0.1583, pruned_loss=0.06042, over 1076949.66 frames. ], batch size: 43, lr: 9.92e-03, grad_scale: 16.0 2022-11-16 00:22:58,516 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.608e+02 1.896e+02 2.176e+02 4.174e+02, threshold=3.791e+02, percent-clipped=1.0 2022-11-16 00:23:31,636 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.35 vs. limit=2.0 2022-11-16 00:23:52,798 INFO [train.py:876] (0/4) Epoch 8, batch 5800, loss[loss=0.1775, simple_loss=0.1936, pruned_loss=0.08072, over 5487.00 frames. ], tot_loss[loss=0.1381, simple_loss=0.1575, pruned_loss=0.05937, over 1081962.07 frames. ], batch size: 53, lr: 9.91e-03, grad_scale: 16.0 2022-11-16 00:24:05,870 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.015e+02 1.678e+02 1.927e+02 2.406e+02 3.937e+02, threshold=3.853e+02, percent-clipped=1.0 2022-11-16 00:24:22,266 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4007, 4.3048, 2.7995, 3.9582, 3.2946, 2.8565, 2.4169, 3.5280], device='cuda:0'), covar=tensor([0.1497, 0.0222, 0.1056, 0.0337, 0.0669, 0.1004, 0.1777, 0.0359], device='cuda:0'), in_proj_covar=tensor([0.0165, 0.0138, 0.0164, 0.0140, 0.0174, 0.0174, 0.0172, 0.0151], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 00:24:57,431 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=56801.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:24:58,797 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56803.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:25:00,324 INFO [train.py:876] (0/4) Epoch 8, batch 5900, loss[loss=0.1333, simple_loss=0.1625, pruned_loss=0.05205, over 5583.00 frames. ], tot_loss[loss=0.1367, simple_loss=0.1565, pruned_loss=0.05844, over 1085626.12 frames. ], batch size: 22, lr: 9.90e-03, grad_scale: 16.0 2022-11-16 00:25:10,209 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.78 vs. limit=2.0 2022-11-16 00:25:13,652 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.513e+01 1.636e+02 2.086e+02 2.680e+02 4.178e+02, threshold=4.173e+02, percent-clipped=3.0 2022-11-16 00:25:29,997 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=56849.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:25:40,345 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=56864.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:25:50,688 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=56879.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 00:26:07,759 INFO [train.py:876] (0/4) Epoch 8, batch 6000, loss[loss=0.1612, simple_loss=0.1626, pruned_loss=0.07994, over 4166.00 frames. ], tot_loss[loss=0.1384, simple_loss=0.1579, pruned_loss=0.05949, over 1085210.91 frames. ], batch size: 181, lr: 9.89e-03, grad_scale: 16.0 2022-11-16 00:26:07,760 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 00:26:13,916 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8848, 2.2922, 2.7114, 2.6896, 2.6209, 2.1050, 2.6992, 3.0739], device='cuda:0'), covar=tensor([0.0295, 0.0735, 0.0490, 0.0440, 0.0539, 0.1178, 0.0466, 0.0360], device='cuda:0'), in_proj_covar=tensor([0.0225, 0.0190, 0.0206, 0.0211, 0.0219, 0.0192, 0.0223, 0.0221], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:26:21,447 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3653, 5.0252, 3.8357, 2.3576, 4.4760, 2.3030, 4.5325, 3.0730], device='cuda:0'), covar=tensor([0.0927, 0.0067, 0.0343, 0.2019, 0.0131, 0.1458, 0.0092, 0.1253], device='cuda:0'), in_proj_covar=tensor([0.0124, 0.0106, 0.0113, 0.0115, 0.0105, 0.0126, 0.0097, 0.0114], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 00:26:25,621 INFO [train.py:908] (0/4) Epoch 8, validation: loss=0.1622, simple_loss=0.1823, pruned_loss=0.07105, over 1530663.00 frames. 2022-11-16 00:26:25,622 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-16 00:26:27,740 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56908.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:26:38,697 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.650e+01 1.622e+02 2.023e+02 2.494e+02 5.348e+02, threshold=4.047e+02, percent-clipped=2.0 2022-11-16 00:27:08,840 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=56969.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:27:12,433 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4349, 4.1842, 3.2062, 1.8951, 3.7674, 1.3478, 3.8589, 2.0875], device='cuda:0'), covar=tensor([0.1614, 0.0174, 0.0732, 0.2262, 0.0251, 0.2462, 0.0240, 0.1850], device='cuda:0'), in_proj_covar=tensor([0.0123, 0.0106, 0.0114, 0.0114, 0.0105, 0.0126, 0.0097, 0.0114], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 00:27:24,442 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7620, 4.0161, 3.8129, 3.5902, 2.1417, 3.9864, 2.2351, 3.1524], device='cuda:0'), covar=tensor([0.0439, 0.0197, 0.0188, 0.0383, 0.0567, 0.0151, 0.0491, 0.0159], device='cuda:0'), in_proj_covar=tensor([0.0178, 0.0152, 0.0163, 0.0182, 0.0176, 0.0161, 0.0173, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 00:27:25,016 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56993.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:27:26,253 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=56995.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:27:32,826 INFO [train.py:876] (0/4) Epoch 8, batch 6100, loss[loss=0.1773, simple_loss=0.1866, pruned_loss=0.08403, over 5734.00 frames. ], tot_loss[loss=0.139, simple_loss=0.1583, pruned_loss=0.05981, over 1087258.89 frames. ], batch size: 31, lr: 9.88e-03, grad_scale: 16.0 2022-11-16 00:27:45,774 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.054e+01 1.623e+02 1.859e+02 2.466e+02 4.899e+02, threshold=3.719e+02, percent-clipped=4.0 2022-11-16 00:28:07,741 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57054.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:28:09,136 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57056.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:28:37,227 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0730, 2.6136, 3.4296, 3.1018, 3.8023, 2.6870, 3.6401, 3.9702], device='cuda:0'), covar=tensor([0.0559, 0.1474, 0.0922, 0.1416, 0.0568, 0.1430, 0.0959, 0.0718], device='cuda:0'), in_proj_covar=tensor([0.0225, 0.0189, 0.0207, 0.0213, 0.0221, 0.0192, 0.0223, 0.0222], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:28:43,879 INFO [train.py:876] (0/4) Epoch 8, batch 6200, loss[loss=0.08614, simple_loss=0.1154, pruned_loss=0.02847, over 5506.00 frames. ], tot_loss[loss=0.138, simple_loss=0.1575, pruned_loss=0.05925, over 1087032.37 frames. ], batch size: 10, lr: 9.88e-03, grad_scale: 16.0 2022-11-16 00:28:47,811 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.09 vs. limit=5.0 2022-11-16 00:28:48,866 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8261, 2.8410, 2.4607, 3.2083, 2.3509, 2.8468, 2.9958, 3.4172], device='cuda:0'), covar=tensor([0.1272, 0.2122, 0.2980, 0.1174, 0.1733, 0.1210, 0.1718, 0.1937], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0089, 0.0097, 0.0083, 0.0082, 0.0086, 0.0090, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 00:28:56,822 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.699e+02 2.012e+02 2.315e+02 3.529e+02, threshold=4.025e+02, percent-clipped=0.0 2022-11-16 00:28:59,212 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.18 vs. limit=5.0 2022-11-16 00:29:12,826 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.89 vs. limit=2.0 2022-11-16 00:29:17,852 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3890, 1.9845, 1.5221, 1.5220, 0.9928, 1.5668, 1.2748, 1.7216], device='cuda:0'), covar=tensor([0.1086, 0.0315, 0.0882, 0.0666, 0.1805, 0.0908, 0.1428, 0.0507], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0139, 0.0166, 0.0142, 0.0176, 0.0178, 0.0174, 0.0150], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 00:29:22,006 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57159.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:29:35,659 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57179.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 00:29:40,812 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57186.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:29:44,590 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5386, 2.4968, 2.2326, 2.6231, 2.0921, 2.1677, 2.3926, 2.8940], device='cuda:0'), covar=tensor([0.1312, 0.2065, 0.2551, 0.1275, 0.2189, 0.3576, 0.1868, 0.2691], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0088, 0.0096, 0.0082, 0.0082, 0.0085, 0.0090, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 00:29:49,415 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57197.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:29:55,084 INFO [train.py:876] (0/4) Epoch 8, batch 6300, loss[loss=0.1028, simple_loss=0.1427, pruned_loss=0.03144, over 5547.00 frames. ], tot_loss[loss=0.1367, simple_loss=0.1566, pruned_loss=0.05845, over 1083204.40 frames. ], batch size: 15, lr: 9.87e-03, grad_scale: 16.0 2022-11-16 00:30:08,132 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.308e+02 1.749e+02 2.073e+02 2.619e+02 6.715e+02, threshold=4.147e+02, percent-clipped=6.0 2022-11-16 00:30:10,270 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57227.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 00:30:23,461 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57244.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:30:25,496 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57247.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:30:33,122 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57258.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 00:30:37,141 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57264.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:30:39,657 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-16 00:30:55,768 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9528, 2.3852, 3.4331, 3.0415, 3.6796, 2.5675, 3.2179, 3.8871], device='cuda:0'), covar=tensor([0.0536, 0.1493, 0.0818, 0.1646, 0.0513, 0.1571, 0.1204, 0.0808], device='cuda:0'), in_proj_covar=tensor([0.0221, 0.0190, 0.0204, 0.0209, 0.0219, 0.0190, 0.0220, 0.0219], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:31:06,488 INFO [train.py:876] (0/4) Epoch 8, batch 6400, loss[loss=0.176, simple_loss=0.1883, pruned_loss=0.08182, over 5591.00 frames. ], tot_loss[loss=0.139, simple_loss=0.158, pruned_loss=0.05997, over 1084193.27 frames. ], batch size: 18, lr: 9.86e-03, grad_scale: 16.0 2022-11-16 00:31:06,662 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57305.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:31:15,532 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5421, 1.0075, 1.1947, 0.8836, 1.1610, 1.4669, 0.9283, 0.9035], device='cuda:0'), covar=tensor([0.0527, 0.0472, 0.0647, 0.0913, 0.1091, 0.0325, 0.0610, 0.0667], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0017, 0.0012, 0.0015, 0.0014, 0.0011, 0.0016, 0.0012], device='cuda:0'), out_proj_covar=tensor([6.0794e-05, 8.0802e-05, 6.3003e-05, 7.4386e-05, 6.7083e-05, 6.0329e-05, 7.5645e-05, 6.1256e-05], device='cuda:0') 2022-11-16 00:31:19,419 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.947e+01 1.601e+02 2.011e+02 2.621e+02 5.168e+02, threshold=4.022e+02, percent-clipped=2.0 2022-11-16 00:31:23,091 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8114, 2.6323, 2.1048, 2.1915, 1.5050, 2.1876, 1.6004, 2.2169], device='cuda:0'), covar=tensor([0.1191, 0.0320, 0.0939, 0.0601, 0.1639, 0.0862, 0.1699, 0.0451], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0139, 0.0166, 0.0143, 0.0176, 0.0177, 0.0175, 0.0152], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 00:31:36,917 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57349.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:31:37,061 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9146, 2.4233, 3.4838, 3.1431, 3.7382, 2.5177, 3.4790, 3.8646], device='cuda:0'), covar=tensor([0.0615, 0.2030, 0.0928, 0.1493, 0.0578, 0.1550, 0.1106, 0.0735], device='cuda:0'), in_proj_covar=tensor([0.0223, 0.0192, 0.0205, 0.0210, 0.0219, 0.0191, 0.0221, 0.0219], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:31:38,376 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57351.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:32:11,880 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57398.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:32:16,757 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3378, 2.2572, 2.0599, 2.2445, 1.8192, 1.5878, 2.0337, 2.5929], device='cuda:0'), covar=tensor([0.1177, 0.1615, 0.2258, 0.1180, 0.1728, 0.2165, 0.1817, 0.1415], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0088, 0.0096, 0.0083, 0.0081, 0.0085, 0.0090, 0.0065], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 00:32:17,278 INFO [train.py:876] (0/4) Epoch 8, batch 6500, loss[loss=0.1565, simple_loss=0.1694, pruned_loss=0.07182, over 5707.00 frames. ], tot_loss[loss=0.1382, simple_loss=0.1572, pruned_loss=0.05959, over 1085107.50 frames. ], batch size: 36, lr: 9.85e-03, grad_scale: 16.0 2022-11-16 00:32:30,736 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.052e+02 1.615e+02 1.950e+02 2.299e+02 5.072e+02, threshold=3.899e+02, percent-clipped=1.0 2022-11-16 00:32:55,149 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57459.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:32:55,211 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57459.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:32:57,565 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57462.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:33:02,090 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1900, 1.0317, 1.2112, 0.9686, 0.9936, 1.1265, 0.9120, 0.7589], device='cuda:0'), covar=tensor([0.0030, 0.0063, 0.0055, 0.0061, 0.0071, 0.0066, 0.0040, 0.0053], device='cuda:0'), in_proj_covar=tensor([0.0019, 0.0020, 0.0021, 0.0026, 0.0022, 0.0021, 0.0024, 0.0024], device='cuda:0'), out_proj_covar=tensor([1.6882e-05, 1.8757e-05, 1.8867e-05, 2.5236e-05, 2.1086e-05, 2.0645e-05, 2.3022e-05, 2.4069e-05], device='cuda:0') 2022-11-16 00:33:20,302 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57495.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:33:26,884 INFO [train.py:876] (0/4) Epoch 8, batch 6600, loss[loss=0.1081, simple_loss=0.1394, pruned_loss=0.0384, over 5683.00 frames. ], tot_loss[loss=0.1389, simple_loss=0.1579, pruned_loss=0.05999, over 1083074.71 frames. ], batch size: 34, lr: 9.84e-03, grad_scale: 16.0 2022-11-16 00:33:28,252 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57507.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:33:39,743 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57523.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:33:40,181 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.548e+02 1.936e+02 2.529e+02 5.371e+02, threshold=3.872e+02, percent-clipped=1.0 2022-11-16 00:33:52,098 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57542.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:33:59,272 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57553.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 00:34:01,288 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57556.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:34:06,415 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57564.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:34:27,562 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2022-11-16 00:34:30,740 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57600.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:34:33,916 INFO [train.py:876] (0/4) Epoch 8, batch 6700, loss[loss=0.09261, simple_loss=0.1274, pruned_loss=0.0289, over 5763.00 frames. ], tot_loss[loss=0.1363, simple_loss=0.1561, pruned_loss=0.05824, over 1084827.05 frames. ], batch size: 20, lr: 9.83e-03, grad_scale: 16.0 2022-11-16 00:34:38,595 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57612.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:34:43,574 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.19 vs. limit=2.0 2022-11-16 00:34:46,347 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.030e+02 1.668e+02 2.078e+02 2.572e+02 7.235e+02, threshold=4.155e+02, percent-clipped=5.0 2022-11-16 00:34:46,550 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8599, 2.4102, 2.4575, 1.4373, 2.6308, 2.8391, 2.6547, 2.8458], device='cuda:0'), covar=tensor([0.2430, 0.1757, 0.1389, 0.3019, 0.0641, 0.0923, 0.0548, 0.1308], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0182, 0.0156, 0.0186, 0.0169, 0.0180, 0.0156, 0.0184], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 00:35:04,267 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57649.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:35:05,558 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57651.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:35:36,532 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57697.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:35:37,897 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57699.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:35:41,756 INFO [train.py:876] (0/4) Epoch 8, batch 6800, loss[loss=0.1264, simple_loss=0.15, pruned_loss=0.05139, over 5534.00 frames. ], tot_loss[loss=0.1367, simple_loss=0.1567, pruned_loss=0.05837, over 1087429.20 frames. ], batch size: 14, lr: 9.82e-03, grad_scale: 16.0 2022-11-16 00:35:53,963 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.669e+02 2.102e+02 2.619e+02 4.252e+02, threshold=4.204e+02, percent-clipped=2.0 2022-11-16 00:35:56,124 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4912, 3.2846, 3.2523, 1.4295, 2.9093, 3.5427, 3.4495, 4.0272], device='cuda:0'), covar=tensor([0.2136, 0.1261, 0.0636, 0.3215, 0.0538, 0.0507, 0.0410, 0.0485], device='cuda:0'), in_proj_covar=tensor([0.0173, 0.0181, 0.0157, 0.0185, 0.0168, 0.0179, 0.0155, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 00:36:00,304 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=57733.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:36:12,769 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9892, 2.4248, 2.8532, 3.8848, 3.8801, 3.1675, 2.6788, 3.9480], device='cuda:0'), covar=tensor([0.0670, 0.3382, 0.2617, 0.2792, 0.1315, 0.2791, 0.1926, 0.0537], device='cuda:0'), in_proj_covar=tensor([0.0227, 0.0207, 0.0196, 0.0324, 0.0225, 0.0214, 0.0193, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 00:36:14,617 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57754.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:36:26,024 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4925, 1.1097, 1.0187, 0.7534, 1.2235, 1.1575, 0.7588, 1.1095], device='cuda:0'), covar=tensor([0.0291, 0.0701, 0.0358, 0.0629, 0.0275, 0.0427, 0.0623, 0.0410], device='cuda:0'), in_proj_covar=tensor([0.0011, 0.0017, 0.0012, 0.0015, 0.0013, 0.0011, 0.0015, 0.0011], device='cuda:0'), out_proj_covar=tensor([5.9867e-05, 7.8794e-05, 6.0797e-05, 7.2968e-05, 6.5353e-05, 5.8536e-05, 7.3546e-05, 5.8640e-05], device='cuda:0') 2022-11-16 00:36:41,882 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=57794.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:36:49,277 INFO [train.py:876] (0/4) Epoch 8, batch 6900, loss[loss=0.0924, simple_loss=0.121, pruned_loss=0.03191, over 5724.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1557, pruned_loss=0.05817, over 1084665.32 frames. ], batch size: 14, lr: 9.82e-03, grad_scale: 16.0 2022-11-16 00:36:52,679 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5633, 2.0200, 2.2085, 2.7072, 2.8582, 2.2067, 1.8286, 2.8098], device='cuda:0'), covar=tensor([0.1464, 0.2493, 0.1863, 0.1437, 0.1048, 0.2510, 0.2160, 0.1292], device='cuda:0'), in_proj_covar=tensor([0.0226, 0.0208, 0.0195, 0.0325, 0.0226, 0.0213, 0.0194, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 00:36:57,824 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57818.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:37:01,645 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.691e+02 2.166e+02 2.756e+02 5.042e+02, threshold=4.332e+02, percent-clipped=3.0 2022-11-16 00:37:13,691 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57842.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:37:20,215 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=57851.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:37:21,624 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57853.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 00:37:35,230 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-16 00:37:45,646 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57890.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:37:53,395 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=57900.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:37:54,292 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57901.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:37:56,853 INFO [train.py:876] (0/4) Epoch 8, batch 7000, loss[loss=0.181, simple_loss=0.1717, pruned_loss=0.09515, over 4126.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1562, pruned_loss=0.05801, over 1086024.77 frames. ], batch size: 181, lr: 9.81e-03, grad_scale: 16.0 2022-11-16 00:38:09,168 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.725e+02 2.099e+02 2.587e+02 4.633e+02, threshold=4.198e+02, percent-clipped=1.0 2022-11-16 00:38:21,767 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7830, 2.0324, 3.2739, 2.6756, 3.8334, 2.0627, 3.0226, 3.6977], device='cuda:0'), covar=tensor([0.0794, 0.2682, 0.1293, 0.2394, 0.0519, 0.2411, 0.1967, 0.1033], device='cuda:0'), in_proj_covar=tensor([0.0227, 0.0195, 0.0210, 0.0213, 0.0226, 0.0195, 0.0222, 0.0224], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:38:24,052 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4026, 2.1146, 3.0927, 2.7055, 2.9695, 2.1269, 2.8019, 3.3375], device='cuda:0'), covar=tensor([0.0615, 0.1663, 0.0748, 0.1383, 0.0692, 0.1567, 0.1132, 0.0800], device='cuda:0'), in_proj_covar=tensor([0.0227, 0.0195, 0.0210, 0.0213, 0.0226, 0.0195, 0.0222, 0.0224], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:38:25,213 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=57948.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:39:03,875 INFO [train.py:876] (0/4) Epoch 8, batch 7100, loss[loss=0.1053, simple_loss=0.1464, pruned_loss=0.03208, over 5471.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.1565, pruned_loss=0.05863, over 1085016.18 frames. ], batch size: 12, lr: 9.80e-03, grad_scale: 16.0 2022-11-16 00:39:12,626 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.33 vs. limit=5.0 2022-11-16 00:39:16,950 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.016e+02 1.751e+02 2.181e+02 2.576e+02 4.312e+02, threshold=4.361e+02, percent-clipped=1.0 2022-11-16 00:39:18,019 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.70 vs. limit=2.0 2022-11-16 00:39:36,881 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58054.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:40:00,891 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=58089.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:40:09,391 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58102.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:40:11,334 INFO [train.py:876] (0/4) Epoch 8, batch 7200, loss[loss=0.2046, simple_loss=0.2011, pruned_loss=0.1041, over 4720.00 frames. ], tot_loss[loss=0.1359, simple_loss=0.1556, pruned_loss=0.05813, over 1085122.00 frames. ], batch size: 136, lr: 9.79e-03, grad_scale: 16.0 2022-11-16 00:40:20,266 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58118.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:40:24,026 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.930e+01 1.590e+02 1.880e+02 2.458e+02 4.390e+02, threshold=3.761e+02, percent-clipped=1.0 2022-11-16 00:40:36,091 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58141.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:40:38,016 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58144.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:40:42,414 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58151.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:40:51,878 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58166.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:41:00,078 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-8.pt 2022-11-16 00:41:42,120 INFO [train.py:876] (0/4) Epoch 9, batch 0, loss[loss=0.1712, simple_loss=0.1853, pruned_loss=0.07856, over 5653.00 frames. ], tot_loss[loss=0.1712, simple_loss=0.1853, pruned_loss=0.07856, over 5653.00 frames. ], batch size: 38, lr: 9.26e-03, grad_scale: 16.0 2022-11-16 00:41:42,121 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 00:41:49,438 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5519, 5.2347, 4.5334, 5.1883, 5.2239, 4.4874, 4.9850, 4.6654], device='cuda:0'), covar=tensor([0.0185, 0.0298, 0.1008, 0.0280, 0.0342, 0.0292, 0.0318, 0.0380], device='cuda:0'), in_proj_covar=tensor([0.0124, 0.0165, 0.0260, 0.0159, 0.0207, 0.0164, 0.0174, 0.0162], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 00:41:58,752 INFO [train.py:908] (0/4) Epoch 9, validation: loss=0.1631, simple_loss=0.1836, pruned_loss=0.0713, over 1530663.00 frames. 2022-11-16 00:41:58,753 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-16 00:42:13,709 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58199.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:42:16,527 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58202.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:42:18,485 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58205.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:42:31,154 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.772e+02 2.208e+02 2.571e+02 4.464e+02, threshold=4.417e+02, percent-clipped=3.0 2022-11-16 00:43:06,380 INFO [train.py:876] (0/4) Epoch 9, batch 100, loss[loss=0.1283, simple_loss=0.1487, pruned_loss=0.05393, over 5324.00 frames. ], tot_loss[loss=0.1318, simple_loss=0.1534, pruned_loss=0.05506, over 440551.08 frames. ], batch size: 79, lr: 9.26e-03, grad_scale: 16.0 2022-11-16 00:43:38,872 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.840e+01 1.567e+02 1.869e+02 2.319e+02 4.319e+02, threshold=3.738e+02, percent-clipped=0.0 2022-11-16 00:44:13,597 INFO [train.py:876] (0/4) Epoch 9, batch 200, loss[loss=0.09904, simple_loss=0.1357, pruned_loss=0.03121, over 5506.00 frames. ], tot_loss[loss=0.131, simple_loss=0.1533, pruned_loss=0.05432, over 701432.67 frames. ], batch size: 12, lr: 9.25e-03, grad_scale: 16.0 2022-11-16 00:44:22,166 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58389.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:44:46,847 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.538e+01 1.502e+02 1.721e+02 2.221e+02 4.054e+02, threshold=3.441e+02, percent-clipped=2.0 2022-11-16 00:44:49,050 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9678, 1.5991, 2.2065, 1.4729, 1.0252, 2.8828, 2.0407, 1.8157], device='cuda:0'), covar=tensor([0.0788, 0.1142, 0.0617, 0.2468, 0.2522, 0.0742, 0.1563, 0.1269], device='cuda:0'), in_proj_covar=tensor([0.0078, 0.0069, 0.0067, 0.0081, 0.0060, 0.0052, 0.0059, 0.0066], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:0') 2022-11-16 00:44:55,438 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58437.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:45:21,748 INFO [train.py:876] (0/4) Epoch 9, batch 300, loss[loss=0.1121, simple_loss=0.1456, pruned_loss=0.03932, over 5762.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1561, pruned_loss=0.058, over 845693.12 frames. ], batch size: 16, lr: 9.24e-03, grad_scale: 16.0 2022-11-16 00:45:35,854 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=58497.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:45:37,821 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=58500.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:45:49,510 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4013, 2.3165, 2.2424, 2.4443, 2.3778, 2.3153, 2.6330, 2.4713], device='cuda:0'), covar=tensor([0.0659, 0.0999, 0.0702, 0.1122, 0.0800, 0.0529, 0.1148, 0.0834], device='cuda:0'), in_proj_covar=tensor([0.0078, 0.0100, 0.0089, 0.0112, 0.0083, 0.0074, 0.0139, 0.0094], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 00:45:53,983 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.573e+02 2.012e+02 2.567e+02 4.757e+02, threshold=4.023e+02, percent-clipped=7.0 2022-11-16 00:46:05,784 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3267, 4.3620, 4.4580, 4.5358, 4.0175, 3.9037, 4.9988, 4.4376], device='cuda:0'), covar=tensor([0.0420, 0.1035, 0.0348, 0.0996, 0.0532, 0.0375, 0.0694, 0.0570], device='cuda:0'), in_proj_covar=tensor([0.0079, 0.0102, 0.0089, 0.0113, 0.0083, 0.0074, 0.0140, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 00:46:29,075 INFO [train.py:876] (0/4) Epoch 9, batch 400, loss[loss=0.1193, simple_loss=0.1367, pruned_loss=0.0509, over 5738.00 frames. ], tot_loss[loss=0.1338, simple_loss=0.154, pruned_loss=0.05683, over 941861.38 frames. ], batch size: 14, lr: 9.23e-03, grad_scale: 16.0 2022-11-16 00:47:01,901 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.026e+02 1.664e+02 1.953e+02 2.587e+02 4.932e+02, threshold=3.905e+02, percent-clipped=2.0 2022-11-16 00:47:37,285 INFO [train.py:876] (0/4) Epoch 9, batch 500, loss[loss=0.1142, simple_loss=0.1426, pruned_loss=0.0429, over 5689.00 frames. ], tot_loss[loss=0.1341, simple_loss=0.1542, pruned_loss=0.05695, over 994747.55 frames. ], batch size: 34, lr: 9.22e-03, grad_scale: 16.0 2022-11-16 00:47:40,002 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58681.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:48:00,387 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2022-11-16 00:48:09,721 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 1.707e+02 2.003e+02 2.573e+02 4.379e+02, threshold=4.006e+02, percent-clipped=1.0 2022-11-16 00:48:17,657 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3264, 3.7167, 2.8526, 1.7527, 3.4467, 1.4616, 3.5161, 1.8844], device='cuda:0'), covar=tensor([0.1542, 0.0182, 0.0762, 0.2216, 0.0302, 0.2267, 0.0268, 0.1803], device='cuda:0'), in_proj_covar=tensor([0.0126, 0.0107, 0.0115, 0.0116, 0.0107, 0.0126, 0.0099, 0.0116], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 00:48:20,971 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58742.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:48:32,601 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2022-11-16 00:48:36,100 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5395, 3.0410, 3.1629, 1.6284, 3.0768, 3.2255, 3.5151, 3.6639], device='cuda:0'), covar=tensor([0.1687, 0.1426, 0.0927, 0.3048, 0.0577, 0.0834, 0.0344, 0.0589], device='cuda:0'), in_proj_covar=tensor([0.0174, 0.0182, 0.0163, 0.0189, 0.0172, 0.0184, 0.0156, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 00:48:43,671 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58775.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:48:44,825 INFO [train.py:876] (0/4) Epoch 9, batch 600, loss[loss=0.1521, simple_loss=0.1514, pruned_loss=0.07638, over 4682.00 frames. ], tot_loss[loss=0.1334, simple_loss=0.1541, pruned_loss=0.05634, over 1027680.73 frames. ], batch size: 136, lr: 9.22e-03, grad_scale: 16.0 2022-11-16 00:48:52,535 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.43 vs. limit=5.0 2022-11-16 00:48:58,073 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58797.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:49:00,346 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=58800.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:49:17,737 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.043e+02 1.543e+02 1.852e+02 2.303e+02 3.719e+02, threshold=3.705e+02, percent-clipped=0.0 2022-11-16 00:49:25,137 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58836.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:49:30,905 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58845.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:49:32,853 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=58848.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:49:52,480 INFO [train.py:876] (0/4) Epoch 9, batch 700, loss[loss=0.1007, simple_loss=0.1236, pruned_loss=0.03886, over 5179.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1552, pruned_loss=0.05711, over 1049691.94 frames. ], batch size: 8, lr: 9.21e-03, grad_scale: 8.0 2022-11-16 00:50:11,747 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=58906.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:50:19,041 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.30 vs. limit=2.0 2022-11-16 00:50:25,240 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.741e+02 2.186e+02 2.790e+02 4.131e+02, threshold=4.372e+02, percent-clipped=5.0 2022-11-16 00:50:52,769 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=58967.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:50:59,888 INFO [train.py:876] (0/4) Epoch 9, batch 800, loss[loss=0.1534, simple_loss=0.1699, pruned_loss=0.06841, over 5616.00 frames. ], tot_loss[loss=0.1348, simple_loss=0.1556, pruned_loss=0.05697, over 1068979.79 frames. ], batch size: 38, lr: 9.20e-03, grad_scale: 8.0 2022-11-16 00:51:01,097 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.73 vs. limit=2.0 2022-11-16 00:51:04,893 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6223, 2.7086, 2.3405, 2.6430, 2.7360, 2.4742, 2.3070, 2.4674], device='cuda:0'), covar=tensor([0.0396, 0.0662, 0.1723, 0.0611, 0.0620, 0.0568, 0.0989, 0.0679], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0167, 0.0264, 0.0163, 0.0209, 0.0170, 0.0176, 0.0162], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 00:51:33,804 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.601e+01 1.692e+02 2.170e+02 2.841e+02 6.263e+02, threshold=4.339e+02, percent-clipped=3.0 2022-11-16 00:51:41,838 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59037.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:52:08,560 INFO [train.py:876] (0/4) Epoch 9, batch 900, loss[loss=0.136, simple_loss=0.1642, pruned_loss=0.0539, over 5575.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.1549, pruned_loss=0.05624, over 1073488.26 frames. ], batch size: 25, lr: 9.19e-03, grad_scale: 8.0 2022-11-16 00:52:22,481 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0187, 3.5141, 3.0827, 3.4746, 3.4797, 3.1356, 3.0921, 3.0725], device='cuda:0'), covar=tensor([0.1404, 0.0525, 0.1519, 0.0542, 0.0568, 0.0508, 0.0806, 0.0591], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0167, 0.0264, 0.0163, 0.0209, 0.0169, 0.0176, 0.0162], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 00:52:30,435 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4855, 4.3943, 2.8748, 4.1289, 3.3585, 3.0238, 2.2938, 3.7154], device='cuda:0'), covar=tensor([0.1407, 0.0178, 0.0998, 0.0340, 0.0634, 0.0972, 0.1892, 0.0286], device='cuda:0'), in_proj_covar=tensor([0.0163, 0.0138, 0.0163, 0.0141, 0.0176, 0.0175, 0.0172, 0.0152], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 00:52:36,484 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5744, 1.3349, 1.6159, 1.2986, 1.7490, 1.4218, 1.1017, 1.6219], device='cuda:0'), covar=tensor([0.0934, 0.1236, 0.1245, 0.1006, 0.0953, 0.1285, 0.2274, 0.1346], device='cuda:0'), in_proj_covar=tensor([0.0227, 0.0207, 0.0199, 0.0319, 0.0224, 0.0210, 0.0194, 0.0227], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 00:52:42,260 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.620e+01 1.634e+02 1.980e+02 2.467e+02 5.009e+02, threshold=3.960e+02, percent-clipped=1.0 2022-11-16 00:52:46,072 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59131.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:53:08,448 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=59164.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:53:16,812 INFO [train.py:876] (0/4) Epoch 9, batch 1000, loss[loss=0.2045, simple_loss=0.201, pruned_loss=0.104, over 5487.00 frames. ], tot_loss[loss=0.1351, simple_loss=0.1558, pruned_loss=0.05717, over 1073633.87 frames. ], batch size: 64, lr: 9.19e-03, grad_scale: 8.0 2022-11-16 00:53:42,287 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.88 vs. limit=2.0 2022-11-16 00:53:50,612 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=59225.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:53:51,082 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.814e+01 1.686e+02 1.957e+02 2.486e+02 6.337e+02, threshold=3.914e+02, percent-clipped=1.0 2022-11-16 00:53:51,253 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5612, 4.7923, 3.1442, 4.3984, 3.6365, 3.2660, 2.8493, 4.2062], device='cuda:0'), covar=tensor([0.1506, 0.0212, 0.0908, 0.0321, 0.0532, 0.0906, 0.1597, 0.0238], device='cuda:0'), in_proj_covar=tensor([0.0164, 0.0139, 0.0163, 0.0142, 0.0176, 0.0177, 0.0172, 0.0152], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 00:54:18,046 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59262.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:54:28,526 INFO [train.py:876] (0/4) Epoch 9, batch 1100, loss[loss=0.1123, simple_loss=0.1411, pruned_loss=0.04176, over 5791.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.155, pruned_loss=0.05624, over 1079860.53 frames. ], batch size: 21, lr: 9.18e-03, grad_scale: 8.0 2022-11-16 00:54:49,528 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7330, 2.3984, 1.9484, 1.9290, 1.2401, 1.9604, 1.4914, 2.1419], device='cuda:0'), covar=tensor([0.1015, 0.0309, 0.0852, 0.0713, 0.1839, 0.0814, 0.1523, 0.0447], device='cuda:0'), in_proj_covar=tensor([0.0165, 0.0140, 0.0165, 0.0143, 0.0176, 0.0178, 0.0173, 0.0152], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 00:55:01,721 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.767e+01 1.618e+02 1.962e+02 2.354e+02 6.461e+02, threshold=3.925e+02, percent-clipped=2.0 2022-11-16 00:55:08,988 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=59337.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:55:35,527 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5337, 1.0497, 1.4858, 0.9882, 1.5972, 1.4159, 1.1076, 1.4685], device='cuda:0'), covar=tensor([0.2387, 0.0598, 0.0478, 0.1598, 0.1712, 0.0677, 0.0594, 0.0366], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0018, 0.0013, 0.0016, 0.0014, 0.0012, 0.0017, 0.0012], device='cuda:0'), out_proj_covar=tensor([6.4163e-05, 8.4959e-05, 6.5652e-05, 7.6902e-05, 7.0236e-05, 6.2942e-05, 7.9592e-05, 6.2724e-05], device='cuda:0') 2022-11-16 00:55:36,025 INFO [train.py:876] (0/4) Epoch 9, batch 1200, loss[loss=0.1366, simple_loss=0.1551, pruned_loss=0.059, over 5716.00 frames. ], tot_loss[loss=0.1332, simple_loss=0.1546, pruned_loss=0.05584, over 1086111.22 frames. ], batch size: 19, lr: 9.17e-03, grad_scale: 8.0 2022-11-16 00:55:41,352 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=59385.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:55:58,858 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5494, 1.7474, 2.0902, 1.6092, 1.2096, 2.4906, 2.0763, 1.8565], device='cuda:0'), covar=tensor([0.1074, 0.1176, 0.0920, 0.2366, 0.2392, 0.1550, 0.1052, 0.1311], device='cuda:0'), in_proj_covar=tensor([0.0080, 0.0070, 0.0069, 0.0083, 0.0060, 0.0052, 0.0058, 0.0068], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0001, 0.0002], device='cuda:0') 2022-11-16 00:56:09,405 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.585e+02 1.981e+02 2.415e+02 4.268e+02, threshold=3.961e+02, percent-clipped=1.0 2022-11-16 00:56:12,826 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=59431.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:56:20,016 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1818, 3.7402, 3.2034, 3.6906, 3.6896, 3.1098, 3.2987, 3.3040], device='cuda:0'), covar=tensor([0.1122, 0.0462, 0.1465, 0.0434, 0.0435, 0.0575, 0.0690, 0.0527], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0167, 0.0264, 0.0164, 0.0209, 0.0169, 0.0176, 0.0161], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 00:56:43,601 INFO [train.py:876] (0/4) Epoch 9, batch 1300, loss[loss=0.1185, simple_loss=0.1359, pruned_loss=0.05057, over 5737.00 frames. ], tot_loss[loss=0.1319, simple_loss=0.1533, pruned_loss=0.05527, over 1082451.72 frames. ], batch size: 31, lr: 9.16e-03, grad_scale: 8.0 2022-11-16 00:56:45,395 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=59479.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:57:00,736 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=59502.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:57:13,460 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59520.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:57:17,255 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.599e+01 1.572e+02 1.990e+02 2.382e+02 3.719e+02, threshold=3.980e+02, percent-clipped=0.0 2022-11-16 00:57:40,990 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=59562.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:57:41,698 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=59563.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:57:43,318 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.93 vs. limit=2.0 2022-11-16 00:57:51,214 INFO [train.py:876] (0/4) Epoch 9, batch 1400, loss[loss=0.1155, simple_loss=0.1589, pruned_loss=0.03607, over 5615.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.1529, pruned_loss=0.0546, over 1087233.60 frames. ], batch size: 18, lr: 9.15e-03, grad_scale: 8.0 2022-11-16 00:58:02,087 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.78 vs. limit=5.0 2022-11-16 00:58:13,882 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=59610.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 00:58:24,718 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.626e+02 1.961e+02 2.375e+02 3.725e+02, threshold=3.922e+02, percent-clipped=0.0 2022-11-16 00:58:58,825 INFO [train.py:876] (0/4) Epoch 9, batch 1500, loss[loss=0.1551, simple_loss=0.1482, pruned_loss=0.08098, over 4145.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.154, pruned_loss=0.05563, over 1085972.66 frames. ], batch size: 181, lr: 9.15e-03, grad_scale: 8.0 2022-11-16 00:59:17,971 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2022-11-16 00:59:31,912 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.993e+01 1.447e+02 1.801e+02 2.298e+02 3.676e+02, threshold=3.602e+02, percent-clipped=0.0 2022-11-16 00:59:51,677 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.60 vs. limit=5.0 2022-11-16 01:00:06,514 INFO [train.py:876] (0/4) Epoch 9, batch 1600, loss[loss=0.1752, simple_loss=0.1834, pruned_loss=0.08348, over 5466.00 frames. ], tot_loss[loss=0.1336, simple_loss=0.1543, pruned_loss=0.05642, over 1079801.49 frames. ], batch size: 53, lr: 9.14e-03, grad_scale: 8.0 2022-11-16 01:00:33,179 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8337, 1.9265, 1.6237, 1.9159, 1.9681, 1.8245, 1.6688, 1.8146], device='cuda:0'), covar=tensor([0.0486, 0.0775, 0.1681, 0.0658, 0.0637, 0.0573, 0.1230, 0.0665], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0168, 0.0266, 0.0164, 0.0209, 0.0170, 0.0178, 0.0164], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:00:36,433 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=59820.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:00:40,151 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.161e+01 1.571e+02 1.942e+02 2.410e+02 4.577e+02, threshold=3.885e+02, percent-clipped=5.0 2022-11-16 01:00:45,106 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-16 01:00:55,339 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=59848.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:01:01,629 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=59858.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:01:08,520 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=59868.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:01:14,344 INFO [train.py:876] (0/4) Epoch 9, batch 1700, loss[loss=0.09302, simple_loss=0.1255, pruned_loss=0.0303, over 5700.00 frames. ], tot_loss[loss=0.1317, simple_loss=0.1532, pruned_loss=0.05505, over 1083199.28 frames. ], batch size: 12, lr: 9.13e-03, grad_scale: 8.0 2022-11-16 01:01:22,410 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=59889.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:01:35,836 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=59909.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:01:46,280 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2022-11-16 01:01:47,042 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.735e+01 1.514e+02 1.939e+02 2.316e+02 5.054e+02, threshold=3.877e+02, percent-clipped=3.0 2022-11-16 01:02:02,586 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.21 vs. limit=2.0 2022-11-16 01:02:02,978 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=59950.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:02:13,879 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2022-11-16 01:02:21,362 INFO [train.py:876] (0/4) Epoch 9, batch 1800, loss[loss=0.1587, simple_loss=0.1757, pruned_loss=0.07085, over 5541.00 frames. ], tot_loss[loss=0.128, simple_loss=0.151, pruned_loss=0.05253, over 1085878.86 frames. ], batch size: 21, lr: 9.12e-03, grad_scale: 8.0 2022-11-16 01:02:37,113 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-60000.pt 2022-11-16 01:02:59,237 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.022e+02 1.503e+02 1.882e+02 2.236e+02 4.811e+02, threshold=3.764e+02, percent-clipped=2.0 2022-11-16 01:03:01,965 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1544, 3.2272, 3.1717, 3.0267, 3.3560, 3.1922, 1.3375, 3.5042], device='cuda:0'), covar=tensor([0.0329, 0.0340, 0.0374, 0.0387, 0.0313, 0.0331, 0.3011, 0.0256], device='cuda:0'), in_proj_covar=tensor([0.0102, 0.0081, 0.0083, 0.0074, 0.0097, 0.0085, 0.0127, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:03:08,930 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60040.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:03:16,841 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60052.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:03:25,135 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60065.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:03:33,730 INFO [train.py:876] (0/4) Epoch 9, batch 1900, loss[loss=0.1015, simple_loss=0.1323, pruned_loss=0.03533, over 5542.00 frames. ], tot_loss[loss=0.1306, simple_loss=0.1531, pruned_loss=0.05407, over 1089715.79 frames. ], batch size: 16, lr: 9.12e-03, grad_scale: 8.0 2022-11-16 01:03:34,462 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60078.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:03:49,836 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60101.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 01:03:57,580 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60113.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 01:04:06,950 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.557e+01 1.713e+02 2.167e+02 2.668e+02 5.052e+02, threshold=4.333e+02, percent-clipped=7.0 2022-11-16 01:04:07,148 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60126.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:04:15,556 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60139.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:04:19,909 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8164, 2.4062, 3.5367, 3.0654, 3.6882, 2.3187, 3.3184, 3.8107], device='cuda:0'), covar=tensor([0.0671, 0.1723, 0.1155, 0.1855, 0.0576, 0.1763, 0.1173, 0.0764], device='cuda:0'), in_proj_covar=tensor([0.0228, 0.0191, 0.0202, 0.0207, 0.0224, 0.0191, 0.0221, 0.0224], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 01:04:28,109 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60158.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:04:28,268 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.81 vs. limit=2.0 2022-11-16 01:04:31,600 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.64 vs. limit=5.0 2022-11-16 01:04:40,471 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2022-11-16 01:04:40,597 INFO [train.py:876] (0/4) Epoch 9, batch 2000, loss[loss=0.09178, simple_loss=0.1242, pruned_loss=0.02966, over 5676.00 frames. ], tot_loss[loss=0.1328, simple_loss=0.1546, pruned_loss=0.05554, over 1093635.06 frames. ], batch size: 11, lr: 9.11e-03, grad_scale: 8.0 2022-11-16 01:04:55,779 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60198.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:04:59,897 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60204.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:05:01,212 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60206.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:05:13,735 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2022-11-16 01:05:14,563 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.629e+02 2.002e+02 2.495e+02 4.506e+02, threshold=4.003e+02, percent-clipped=3.0 2022-11-16 01:05:27,713 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60245.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:05:37,281 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60259.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:05:37,377 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.06 vs. limit=5.0 2022-11-16 01:05:49,240 INFO [train.py:876] (0/4) Epoch 9, batch 2100, loss[loss=0.1171, simple_loss=0.1522, pruned_loss=0.04105, over 5609.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.1543, pruned_loss=0.05538, over 1084617.82 frames. ], batch size: 18, lr: 9.10e-03, grad_scale: 8.0 2022-11-16 01:05:53,160 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2819, 4.4381, 4.2878, 3.7149, 2.6445, 4.8907, 2.7764, 4.0836], device='cuda:0'), covar=tensor([0.0317, 0.0133, 0.0168, 0.0489, 0.0512, 0.0090, 0.0418, 0.0104], device='cuda:0'), in_proj_covar=tensor([0.0184, 0.0158, 0.0169, 0.0191, 0.0180, 0.0170, 0.0177, 0.0168], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 01:06:07,271 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0124, 2.8085, 2.9282, 1.4167, 2.7937, 3.2938, 3.0333, 3.1153], device='cuda:0'), covar=tensor([0.2028, 0.1753, 0.1084, 0.3292, 0.0591, 0.0567, 0.0466, 0.1025], device='cuda:0'), in_proj_covar=tensor([0.0172, 0.0182, 0.0162, 0.0188, 0.0170, 0.0184, 0.0156, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 01:06:22,290 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.667e+02 2.049e+02 2.499e+02 3.972e+02, threshold=4.098e+02, percent-clipped=0.0 2022-11-16 01:06:56,810 INFO [train.py:876] (0/4) Epoch 9, batch 2200, loss[loss=0.1465, simple_loss=0.1616, pruned_loss=0.06572, over 5547.00 frames. ], tot_loss[loss=0.1293, simple_loss=0.1517, pruned_loss=0.05344, over 1083439.13 frames. ], batch size: 46, lr: 9.09e-03, grad_scale: 8.0 2022-11-16 01:07:09,673 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60396.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 01:07:18,002 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60408.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 01:07:26,505 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60421.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:07:30,026 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.625e+02 1.909e+02 2.506e+02 7.347e+02, threshold=3.819e+02, percent-clipped=1.0 2022-11-16 01:07:35,204 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60434.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:08:04,143 INFO [train.py:876] (0/4) Epoch 9, batch 2300, loss[loss=0.08743, simple_loss=0.1079, pruned_loss=0.0335, over 5525.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1502, pruned_loss=0.053, over 1088227.02 frames. ], batch size: 10, lr: 9.09e-03, grad_scale: 8.0 2022-11-16 01:08:22,441 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60504.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:08:37,684 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.337e+01 1.638e+02 1.971e+02 2.511e+02 5.541e+02, threshold=3.943e+02, percent-clipped=0.0 2022-11-16 01:08:50,478 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60545.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:08:54,913 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60552.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:08:56,295 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=60554.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:09:12,098 INFO [train.py:876] (0/4) Epoch 9, batch 2400, loss[loss=0.1715, simple_loss=0.1704, pruned_loss=0.08636, over 4985.00 frames. ], tot_loss[loss=0.1314, simple_loss=0.1523, pruned_loss=0.05523, over 1079186.82 frames. ], batch size: 109, lr: 9.08e-03, grad_scale: 8.0 2022-11-16 01:09:22,867 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60593.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:09:37,451 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5199, 3.5566, 3.4418, 3.3294, 2.0100, 3.4804, 2.1423, 2.8963], device='cuda:0'), covar=tensor([0.0342, 0.0199, 0.0139, 0.0279, 0.0467, 0.0146, 0.0422, 0.0173], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0160, 0.0172, 0.0194, 0.0183, 0.0171, 0.0181, 0.0172], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 01:09:44,906 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.802e+02 2.307e+02 2.858e+02 9.175e+02, threshold=4.615e+02, percent-clipped=3.0 2022-11-16 01:09:45,800 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0294, 4.4641, 4.0448, 4.1645, 2.4063, 4.6757, 2.5486, 4.0429], device='cuda:0'), covar=tensor([0.0342, 0.0141, 0.0179, 0.0279, 0.0552, 0.0120, 0.0492, 0.0252], device='cuda:0'), in_proj_covar=tensor([0.0187, 0.0161, 0.0173, 0.0195, 0.0183, 0.0172, 0.0181, 0.0173], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 01:09:58,421 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-16 01:10:11,534 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8895, 1.8955, 2.4167, 1.5222, 0.9820, 2.8114, 2.1823, 2.0480], device='cuda:0'), covar=tensor([0.1110, 0.1489, 0.0825, 0.2996, 0.3116, 0.1095, 0.1321, 0.1288], device='cuda:0'), in_proj_covar=tensor([0.0079, 0.0071, 0.0070, 0.0082, 0.0063, 0.0050, 0.0060, 0.0068], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:0') 2022-11-16 01:10:18,993 INFO [train.py:876] (0/4) Epoch 9, batch 2500, loss[loss=0.1195, simple_loss=0.1504, pruned_loss=0.04427, over 5619.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.1539, pruned_loss=0.05557, over 1087614.08 frames. ], batch size: 23, lr: 9.07e-03, grad_scale: 8.0 2022-11-16 01:10:31,965 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60696.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:10:33,465 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.02 vs. limit=5.0 2022-11-16 01:10:39,139 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1820, 1.4326, 1.7697, 1.3117, 0.9965, 2.2704, 1.6569, 1.5103], device='cuda:0'), covar=tensor([0.1505, 0.1251, 0.1344, 0.2295, 0.2996, 0.0838, 0.1514, 0.1558], device='cuda:0'), in_proj_covar=tensor([0.0079, 0.0071, 0.0070, 0.0082, 0.0063, 0.0050, 0.0061, 0.0069], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:0') 2022-11-16 01:10:40,128 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60708.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:10:46,723 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=60718.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:10:48,596 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60721.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:10:51,736 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.647e+01 1.621e+02 1.973e+02 2.538e+02 4.790e+02, threshold=3.945e+02, percent-clipped=1.0 2022-11-16 01:10:57,726 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60734.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:11:04,177 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60744.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:11:04,636 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-16 01:11:11,855 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60756.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:11:20,528 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60769.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:11:25,649 INFO [train.py:876] (0/4) Epoch 9, batch 2600, loss[loss=0.1125, simple_loss=0.1457, pruned_loss=0.03962, over 5741.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.1533, pruned_loss=0.05539, over 1083688.72 frames. ], batch size: 16, lr: 9.06e-03, grad_scale: 8.0 2022-11-16 01:11:27,134 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=60779.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:11:28,051 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.21 vs. limit=5.0 2022-11-16 01:11:28,884 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60782.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:11:35,059 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.99 vs. limit=2.0 2022-11-16 01:11:58,598 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.024e+02 1.669e+02 1.987e+02 2.479e+02 5.022e+02, threshold=3.974e+02, percent-clipped=3.0 2022-11-16 01:12:06,022 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6508, 2.9992, 4.1459, 3.5143, 4.7909, 3.1044, 4.1704, 4.7042], device='cuda:0'), covar=tensor([0.0590, 0.1898, 0.0851, 0.1841, 0.0343, 0.1642, 0.1261, 0.0529], device='cuda:0'), in_proj_covar=tensor([0.0233, 0.0193, 0.0207, 0.0210, 0.0228, 0.0194, 0.0223, 0.0222], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 01:12:11,424 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7933, 3.9086, 3.9064, 3.6751, 3.8993, 3.9180, 1.5057, 4.0943], device='cuda:0'), covar=tensor([0.0317, 0.0308, 0.0304, 0.0437, 0.0383, 0.0358, 0.3348, 0.0289], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0082, 0.0083, 0.0075, 0.0099, 0.0085, 0.0127, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:12:18,170 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=60854.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:12:19,566 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2022-11-16 01:12:33,334 INFO [train.py:876] (0/4) Epoch 9, batch 2700, loss[loss=0.1221, simple_loss=0.15, pruned_loss=0.04715, over 5612.00 frames. ], tot_loss[loss=0.1317, simple_loss=0.1531, pruned_loss=0.05519, over 1081154.22 frames. ], batch size: 23, lr: 9.06e-03, grad_scale: 16.0 2022-11-16 01:12:50,322 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=60902.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:12:53,004 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2384, 2.0224, 2.7066, 1.6614, 1.1478, 3.0172, 2.3837, 2.2058], device='cuda:0'), covar=tensor([0.1260, 0.2013, 0.0754, 0.3073, 0.5146, 0.1468, 0.0908, 0.1223], device='cuda:0'), in_proj_covar=tensor([0.0079, 0.0070, 0.0070, 0.0083, 0.0063, 0.0051, 0.0060, 0.0069], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0001, 0.0002], device='cuda:0') 2022-11-16 01:12:53,574 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2356, 4.7591, 4.3089, 4.7485, 4.7869, 4.0067, 4.4648, 4.2079], device='cuda:0'), covar=tensor([0.0400, 0.0595, 0.1425, 0.0534, 0.0381, 0.0466, 0.0603, 0.0821], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0170, 0.0267, 0.0167, 0.0211, 0.0169, 0.0179, 0.0168], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:13:06,004 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.571e+02 1.865e+02 2.553e+02 5.202e+02, threshold=3.729e+02, percent-clipped=2.0 2022-11-16 01:13:20,911 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3257, 4.3112, 2.8691, 4.2075, 3.3400, 2.9078, 2.2321, 3.7736], device='cuda:0'), covar=tensor([0.1901, 0.0331, 0.1351, 0.0327, 0.0821, 0.1198, 0.2400, 0.0458], device='cuda:0'), in_proj_covar=tensor([0.0164, 0.0139, 0.0164, 0.0142, 0.0177, 0.0177, 0.0175, 0.0155], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 01:13:40,752 INFO [train.py:876] (0/4) Epoch 9, batch 2800, loss[loss=0.2312, simple_loss=0.2073, pruned_loss=0.1275, over 3058.00 frames. ], tot_loss[loss=0.1331, simple_loss=0.1539, pruned_loss=0.05618, over 1075984.78 frames. ], batch size: 285, lr: 9.05e-03, grad_scale: 16.0 2022-11-16 01:14:06,259 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8806, 1.4126, 1.9593, 1.5735, 1.4244, 1.9264, 1.8218, 1.5979], device='cuda:0'), covar=tensor([0.0047, 0.0198, 0.0057, 0.0042, 0.0115, 0.0114, 0.0026, 0.0028], device='cuda:0'), in_proj_covar=tensor([0.0021, 0.0021, 0.0022, 0.0028, 0.0024, 0.0023, 0.0027, 0.0026], device='cuda:0'), out_proj_covar=tensor([1.9516e-05, 2.0364e-05, 1.9944e-05, 2.7258e-05, 2.2880e-05, 2.2244e-05, 2.5792e-05, 2.5717e-05], device='cuda:0') 2022-11-16 01:14:13,879 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.565e+02 1.869e+02 2.208e+02 4.201e+02, threshold=3.737e+02, percent-clipped=2.0 2022-11-16 01:14:21,271 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0873, 2.6055, 3.7698, 3.3251, 4.0552, 2.6423, 3.4926, 4.1814], device='cuda:0'), covar=tensor([0.0628, 0.1728, 0.0941, 0.1487, 0.0511, 0.1772, 0.1419, 0.0733], device='cuda:0'), in_proj_covar=tensor([0.0239, 0.0196, 0.0212, 0.0215, 0.0232, 0.0197, 0.0227, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 01:14:37,284 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=61061.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:14:46,628 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=61074.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:14:48,547 INFO [train.py:876] (0/4) Epoch 9, batch 2900, loss[loss=0.1241, simple_loss=0.1504, pruned_loss=0.04888, over 5726.00 frames. ], tot_loss[loss=0.132, simple_loss=0.153, pruned_loss=0.05554, over 1083971.69 frames. ], batch size: 13, lr: 9.04e-03, grad_scale: 16.0 2022-11-16 01:15:18,269 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=61122.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:15:21,362 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.003e+02 1.585e+02 1.881e+02 2.261e+02 3.985e+02, threshold=3.762e+02, percent-clipped=2.0 2022-11-16 01:15:51,096 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-16 01:15:55,636 INFO [train.py:876] (0/4) Epoch 9, batch 3000, loss[loss=0.1757, simple_loss=0.1671, pruned_loss=0.09218, over 4747.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1534, pruned_loss=0.05587, over 1081040.41 frames. ], batch size: 136, lr: 9.03e-03, grad_scale: 16.0 2022-11-16 01:15:55,637 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 01:16:03,326 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0689, 4.6132, 3.6076, 4.5691, 3.6126, 3.5648, 2.9817, 4.0989], device='cuda:0'), covar=tensor([0.1021, 0.0258, 0.0781, 0.0198, 0.0618, 0.0704, 0.1555, 0.0312], device='cuda:0'), in_proj_covar=tensor([0.0161, 0.0137, 0.0162, 0.0141, 0.0174, 0.0174, 0.0170, 0.0154], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 01:16:16,567 INFO [train.py:908] (0/4) Epoch 9, validation: loss=0.1637, simple_loss=0.1831, pruned_loss=0.07219, over 1530663.00 frames. 2022-11-16 01:16:16,567 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-16 01:16:19,547 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.87 vs. limit=2.0 2022-11-16 01:16:49,307 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.005e+02 1.614e+02 2.000e+02 2.380e+02 4.865e+02, threshold=4.000e+02, percent-clipped=3.0 2022-11-16 01:17:23,848 INFO [train.py:876] (0/4) Epoch 9, batch 3100, loss[loss=0.1044, simple_loss=0.1361, pruned_loss=0.03634, over 5561.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1551, pruned_loss=0.05645, over 1084387.54 frames. ], batch size: 16, lr: 9.03e-03, grad_scale: 16.0 2022-11-16 01:17:38,527 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7615, 2.5948, 2.0757, 2.0891, 1.4440, 2.2301, 1.6664, 2.2418], device='cuda:0'), covar=tensor([0.1176, 0.0361, 0.1057, 0.0642, 0.1826, 0.0875, 0.1713, 0.0464], device='cuda:0'), in_proj_covar=tensor([0.0162, 0.0137, 0.0163, 0.0141, 0.0175, 0.0174, 0.0170, 0.0154], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 01:17:57,137 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.004e+02 1.659e+02 1.981e+02 2.507e+02 4.168e+02, threshold=3.962e+02, percent-clipped=1.0 2022-11-16 01:18:12,382 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. limit=2.0 2022-11-16 01:18:30,148 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=61374.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:18:31,951 INFO [train.py:876] (0/4) Epoch 9, batch 3200, loss[loss=0.1644, simple_loss=0.1801, pruned_loss=0.07438, over 5249.00 frames. ], tot_loss[loss=0.1333, simple_loss=0.1546, pruned_loss=0.05597, over 1083471.80 frames. ], batch size: 79, lr: 9.02e-03, grad_scale: 16.0 2022-11-16 01:18:59,217 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=61417.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:19:02,494 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=61422.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:19:04,946 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.596e+02 1.869e+02 2.254e+02 4.160e+02, threshold=3.737e+02, percent-clipped=1.0 2022-11-16 01:19:11,709 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.06 vs. limit=5.0 2022-11-16 01:19:12,013 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0859, 4.0209, 2.4087, 3.8213, 3.1382, 2.6127, 2.0636, 3.3434], device='cuda:0'), covar=tensor([0.2224, 0.0427, 0.2046, 0.0561, 0.1052, 0.1683, 0.2756, 0.0593], device='cuda:0'), in_proj_covar=tensor([0.0162, 0.0138, 0.0164, 0.0141, 0.0175, 0.0175, 0.0171, 0.0155], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 01:19:39,536 INFO [train.py:876] (0/4) Epoch 9, batch 3300, loss[loss=0.1581, simple_loss=0.1601, pruned_loss=0.07807, over 5571.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1553, pruned_loss=0.05676, over 1080886.16 frames. ], batch size: 40, lr: 9.01e-03, grad_scale: 16.0 2022-11-16 01:19:49,729 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.95 vs. limit=2.0 2022-11-16 01:20:02,768 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2022-11-16 01:20:11,839 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2849, 2.5137, 3.7891, 3.3470, 4.1277, 2.5592, 3.7471, 4.0500], device='cuda:0'), covar=tensor([0.0447, 0.1783, 0.0768, 0.1281, 0.0438, 0.1556, 0.1067, 0.0705], device='cuda:0'), in_proj_covar=tensor([0.0232, 0.0195, 0.0205, 0.0206, 0.0226, 0.0192, 0.0222, 0.0224], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 01:20:12,867 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.659e+02 1.971e+02 2.388e+02 4.331e+02, threshold=3.941e+02, percent-clipped=1.0 2022-11-16 01:20:27,887 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5948, 1.4523, 1.5624, 0.8903, 1.4091, 1.4172, 1.1525, 0.7964], device='cuda:0'), covar=tensor([0.0018, 0.0035, 0.0019, 0.0046, 0.0038, 0.0047, 0.0033, 0.0053], device='cuda:0'), in_proj_covar=tensor([0.0020, 0.0020, 0.0021, 0.0027, 0.0023, 0.0021, 0.0025, 0.0024], device='cuda:0'), out_proj_covar=tensor([1.8522e-05, 1.8810e-05, 1.9129e-05, 2.6267e-05, 2.1765e-05, 2.0743e-05, 2.4512e-05, 2.4531e-05], device='cuda:0') 2022-11-16 01:20:47,469 INFO [train.py:876] (0/4) Epoch 9, batch 3400, loss[loss=0.1683, simple_loss=0.1646, pruned_loss=0.08596, over 5292.00 frames. ], tot_loss[loss=0.1338, simple_loss=0.1548, pruned_loss=0.05641, over 1079087.25 frames. ], batch size: 79, lr: 9.01e-03, grad_scale: 16.0 2022-11-16 01:21:20,961 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.197e+01 1.625e+02 1.885e+02 2.405e+02 4.262e+02, threshold=3.771e+02, percent-clipped=2.0 2022-11-16 01:21:27,052 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.84 vs. limit=5.0 2022-11-16 01:21:55,065 INFO [train.py:876] (0/4) Epoch 9, batch 3500, loss[loss=0.1927, simple_loss=0.1837, pruned_loss=0.1009, over 5569.00 frames. ], tot_loss[loss=0.1323, simple_loss=0.1536, pruned_loss=0.05544, over 1082149.70 frames. ], batch size: 46, lr: 9.00e-03, grad_scale: 16.0 2022-11-16 01:22:22,290 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=61717.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:22:28,020 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.632e+02 2.023e+02 2.591e+02 5.866e+02, threshold=4.047e+02, percent-clipped=6.0 2022-11-16 01:22:32,170 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=61732.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:22:39,370 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=61743.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:22:47,227 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.11 vs. limit=5.0 2022-11-16 01:22:54,359 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=61765.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:22:54,411 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9806, 3.1953, 3.1344, 2.9718, 3.1475, 3.1234, 1.2102, 3.2570], device='cuda:0'), covar=tensor([0.0410, 0.0311, 0.0306, 0.0363, 0.0414, 0.0342, 0.3564, 0.0349], device='cuda:0'), in_proj_covar=tensor([0.0106, 0.0083, 0.0086, 0.0077, 0.0103, 0.0087, 0.0132, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:23:02,301 INFO [train.py:876] (0/4) Epoch 9, batch 3600, loss[loss=0.1082, simple_loss=0.1341, pruned_loss=0.0412, over 5525.00 frames. ], tot_loss[loss=0.1308, simple_loss=0.1528, pruned_loss=0.05437, over 1083950.67 frames. ], batch size: 10, lr: 8.99e-03, grad_scale: 16.0 2022-11-16 01:23:12,895 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=61793.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:23:20,612 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=61804.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:23:25,697 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4085, 4.2433, 4.4311, 4.4616, 3.7732, 3.6118, 4.8960, 4.1729], device='cuda:0'), covar=tensor([0.0386, 0.0734, 0.0333, 0.1085, 0.0725, 0.0478, 0.0721, 0.0606], device='cuda:0'), in_proj_covar=tensor([0.0081, 0.0101, 0.0088, 0.0111, 0.0082, 0.0073, 0.0139, 0.0093], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:23:35,478 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.175e+02 1.617e+02 2.067e+02 2.692e+02 5.357e+02, threshold=4.135e+02, percent-clipped=2.0 2022-11-16 01:23:40,894 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8378, 1.4780, 1.7975, 1.8007, 1.8497, 1.2368, 1.6047, 1.9295], device='cuda:0'), covar=tensor([0.0305, 0.0819, 0.0329, 0.0324, 0.0374, 0.1090, 0.0470, 0.0380], device='cuda:0'), in_proj_covar=tensor([0.0233, 0.0195, 0.0207, 0.0210, 0.0229, 0.0192, 0.0225, 0.0225], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 01:24:10,273 INFO [train.py:876] (0/4) Epoch 9, batch 3700, loss[loss=0.1451, simple_loss=0.1656, pruned_loss=0.0623, over 5603.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1541, pruned_loss=0.0555, over 1081989.89 frames. ], batch size: 23, lr: 8.98e-03, grad_scale: 16.0 2022-11-16 01:24:34,707 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=61914.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:24:42,850 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.652e+01 1.593e+02 2.000e+02 2.466e+02 4.913e+02, threshold=3.999e+02, percent-clipped=2.0 2022-11-16 01:25:15,830 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=61975.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:25:16,949 INFO [train.py:876] (0/4) Epoch 9, batch 3800, loss[loss=0.151, simple_loss=0.1706, pruned_loss=0.06571, over 5594.00 frames. ], tot_loss[loss=0.133, simple_loss=0.1541, pruned_loss=0.05595, over 1083268.41 frames. ], batch size: 24, lr: 8.98e-03, grad_scale: 16.0 2022-11-16 01:25:45,397 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.49 vs. limit=5.0 2022-11-16 01:25:50,159 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.003e+01 1.597e+02 1.892e+02 2.479e+02 4.563e+02, threshold=3.783e+02, percent-clipped=2.0 2022-11-16 01:26:24,516 INFO [train.py:876] (0/4) Epoch 9, batch 3900, loss[loss=0.1507, simple_loss=0.1605, pruned_loss=0.07047, over 4707.00 frames. ], tot_loss[loss=0.1336, simple_loss=0.1546, pruned_loss=0.05628, over 1081664.41 frames. ], batch size: 135, lr: 8.97e-03, grad_scale: 16.0 2022-11-16 01:26:31,654 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62088.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:26:39,487 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62099.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:26:50,504 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4536, 3.1385, 3.2927, 3.0358, 3.4953, 3.3323, 3.2524, 3.4035], device='cuda:0'), covar=tensor([0.0382, 0.0414, 0.0477, 0.0450, 0.0405, 0.0240, 0.0378, 0.0527], device='cuda:0'), in_proj_covar=tensor([0.0131, 0.0142, 0.0105, 0.0141, 0.0162, 0.0094, 0.0118, 0.0146], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 01:26:55,152 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62122.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:26:57,545 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.676e+02 2.020e+02 2.333e+02 7.144e+02, threshold=4.039e+02, percent-clipped=2.0 2022-11-16 01:27:16,679 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3707, 1.1718, 1.1372, 0.9178, 1.3523, 1.4329, 0.7814, 1.1043], device='cuda:0'), covar=tensor([0.0496, 0.0559, 0.0374, 0.0724, 0.1073, 0.0504, 0.0773, 0.0892], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0018, 0.0013, 0.0016, 0.0014, 0.0012, 0.0017, 0.0012], device='cuda:0'), out_proj_covar=tensor([6.4561e-05, 8.6121e-05, 6.5382e-05, 7.7909e-05, 7.0582e-05, 6.3325e-05, 8.0077e-05, 6.3894e-05], device='cuda:0') 2022-11-16 01:27:17,966 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2022-11-16 01:27:29,885 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.0794, 5.2789, 5.5245, 5.5184, 5.0610, 4.4789, 6.1715, 5.3639], device='cuda:0'), covar=tensor([0.0536, 0.0840, 0.0266, 0.0870, 0.0514, 0.0378, 0.0649, 0.0455], device='cuda:0'), in_proj_covar=tensor([0.0083, 0.0103, 0.0090, 0.0113, 0.0085, 0.0074, 0.0143, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:27:32,461 INFO [train.py:876] (0/4) Epoch 9, batch 4000, loss[loss=0.1518, simple_loss=0.1444, pruned_loss=0.07963, over 4148.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.154, pruned_loss=0.05665, over 1080196.19 frames. ], batch size: 181, lr: 8.96e-03, grad_scale: 16.0 2022-11-16 01:27:36,379 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.9030, 5.3976, 5.7533, 5.2533, 6.0377, 5.8473, 5.0581, 5.8881], device='cuda:0'), covar=tensor([0.0340, 0.0241, 0.0346, 0.0307, 0.0285, 0.0138, 0.0207, 0.0216], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0143, 0.0106, 0.0142, 0.0164, 0.0095, 0.0119, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 01:27:36,460 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62183.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:28:05,700 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.381e+01 1.660e+02 2.083e+02 2.529e+02 5.015e+02, threshold=4.165e+02, percent-clipped=1.0 2022-11-16 01:28:22,940 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8286, 5.0205, 4.9608, 5.0370, 4.3244, 4.3739, 5.5529, 4.7166], device='cuda:0'), covar=tensor([0.0390, 0.0589, 0.0332, 0.0847, 0.0595, 0.0344, 0.0632, 0.0444], device='cuda:0'), in_proj_covar=tensor([0.0081, 0.0102, 0.0089, 0.0111, 0.0084, 0.0073, 0.0141, 0.0093], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:28:35,671 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62270.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:28:37,730 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62273.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:28:40,189 INFO [train.py:876] (0/4) Epoch 9, batch 4100, loss[loss=0.09915, simple_loss=0.1271, pruned_loss=0.03561, over 5768.00 frames. ], tot_loss[loss=0.1328, simple_loss=0.1534, pruned_loss=0.05614, over 1079852.90 frames. ], batch size: 21, lr: 8.96e-03, grad_scale: 16.0 2022-11-16 01:29:13,574 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.483e+02 1.988e+02 2.459e+02 5.062e+02, threshold=3.976e+02, percent-clipped=4.0 2022-11-16 01:29:19,051 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62334.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:29:48,369 INFO [train.py:876] (0/4) Epoch 9, batch 4200, loss[loss=0.1038, simple_loss=0.1377, pruned_loss=0.03498, over 5749.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.1536, pruned_loss=0.05568, over 1088555.02 frames. ], batch size: 20, lr: 8.95e-03, grad_scale: 16.0 2022-11-16 01:29:55,566 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62388.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:29:57,580 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62391.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:30:02,837 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62399.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:30:06,938 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0473, 2.9023, 3.1631, 1.6677, 2.7088, 3.3318, 3.2810, 3.5439], device='cuda:0'), covar=tensor([0.2070, 0.1589, 0.0822, 0.2792, 0.0657, 0.0581, 0.0389, 0.0809], device='cuda:0'), in_proj_covar=tensor([0.0172, 0.0183, 0.0164, 0.0189, 0.0171, 0.0190, 0.0159, 0.0185], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 01:30:21,289 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.948e+01 1.584e+02 2.002e+02 2.529e+02 4.819e+02, threshold=4.003e+02, percent-clipped=3.0 2022-11-16 01:30:28,428 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62436.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:30:35,822 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62447.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:30:35,934 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1847, 0.8149, 0.9121, 0.7660, 1.1351, 1.0896, 0.6646, 0.7303], device='cuda:0'), covar=tensor([0.0313, 0.0312, 0.0323, 0.0430, 0.0273, 0.0281, 0.0704, 0.0338], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0018, 0.0013, 0.0016, 0.0014, 0.0012, 0.0016, 0.0012], device='cuda:0'), out_proj_covar=tensor([6.4478e-05, 8.5410e-05, 6.5092e-05, 7.6994e-05, 6.9580e-05, 6.3440e-05, 7.9296e-05, 6.3381e-05], device='cuda:0') 2022-11-16 01:30:39,201 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62452.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:30:55,988 INFO [train.py:876] (0/4) Epoch 9, batch 4300, loss[loss=0.1262, simple_loss=0.1619, pruned_loss=0.04529, over 5758.00 frames. ], tot_loss[loss=0.1318, simple_loss=0.1536, pruned_loss=0.05505, over 1087004.38 frames. ], batch size: 20, lr: 8.94e-03, grad_scale: 16.0 2022-11-16 01:30:56,685 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62478.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:31:03,800 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.0519, 4.9027, 4.9262, 5.0596, 4.5306, 4.3770, 5.5841, 4.9578], device='cuda:0'), covar=tensor([0.0301, 0.0755, 0.0383, 0.1128, 0.0382, 0.0309, 0.0595, 0.0461], device='cuda:0'), in_proj_covar=tensor([0.0082, 0.0103, 0.0090, 0.0113, 0.0085, 0.0074, 0.0142, 0.0094], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:31:07,133 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.3661, 4.8324, 5.1275, 4.8141, 5.4097, 5.2681, 4.5808, 5.3339], device='cuda:0'), covar=tensor([0.0308, 0.0246, 0.0425, 0.0247, 0.0267, 0.0127, 0.0209, 0.0242], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0139, 0.0103, 0.0138, 0.0159, 0.0093, 0.0115, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 01:31:09,104 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1411, 4.4506, 3.6932, 4.3490, 4.3518, 3.6923, 4.2814, 3.9413], device='cuda:0'), covar=tensor([0.0407, 0.0560, 0.2053, 0.0663, 0.0647, 0.0608, 0.0503, 0.0713], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0170, 0.0265, 0.0166, 0.0208, 0.0168, 0.0179, 0.0165], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:31:28,850 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.018e+02 1.728e+02 2.092e+02 2.577e+02 4.668e+02, threshold=4.183e+02, percent-clipped=3.0 2022-11-16 01:31:42,541 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62545.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:31:56,114 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5975, 4.1083, 3.7009, 4.0449, 4.1155, 3.4933, 3.7048, 3.5724], device='cuda:0'), covar=tensor([0.0775, 0.0388, 0.1339, 0.0433, 0.0408, 0.0472, 0.0568, 0.0564], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0168, 0.0262, 0.0166, 0.0207, 0.0166, 0.0179, 0.0165], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:31:58,788 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62570.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:32:03,250 INFO [train.py:876] (0/4) Epoch 9, batch 4400, loss[loss=0.1126, simple_loss=0.1288, pruned_loss=0.0482, over 5738.00 frames. ], tot_loss[loss=0.1314, simple_loss=0.1534, pruned_loss=0.05467, over 1088256.16 frames. ], batch size: 13, lr: 8.93e-03, grad_scale: 16.0 2022-11-16 01:32:08,962 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62585.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:32:10,322 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9983, 2.4435, 3.5536, 3.1880, 4.0841, 2.3887, 3.4209, 4.1238], device='cuda:0'), covar=tensor([0.0731, 0.1720, 0.0902, 0.1606, 0.0528, 0.1866, 0.1437, 0.0721], device='cuda:0'), in_proj_covar=tensor([0.0230, 0.0191, 0.0207, 0.0206, 0.0225, 0.0190, 0.0221, 0.0222], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 01:32:11,947 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7837, 3.7428, 3.6838, 3.2502, 2.1748, 3.9796, 2.2054, 3.1340], device='cuda:0'), covar=tensor([0.0382, 0.0215, 0.0194, 0.0387, 0.0565, 0.0139, 0.0502, 0.0196], device='cuda:0'), in_proj_covar=tensor([0.0185, 0.0160, 0.0169, 0.0189, 0.0181, 0.0172, 0.0180, 0.0170], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 01:32:23,683 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62606.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:32:31,369 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62618.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:32:36,537 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.591e+02 1.968e+02 2.413e+02 5.184e+02, threshold=3.935e+02, percent-clipped=2.0 2022-11-16 01:32:38,741 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62629.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:32:51,042 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62646.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:32:57,266 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2022-11-16 01:33:04,321 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62666.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:33:11,145 INFO [train.py:876] (0/4) Epoch 9, batch 4500, loss[loss=0.1057, simple_loss=0.1474, pruned_loss=0.032, over 5691.00 frames. ], tot_loss[loss=0.13, simple_loss=0.1524, pruned_loss=0.0538, over 1092979.34 frames. ], batch size: 15, lr: 8.93e-03, grad_scale: 16.0 2022-11-16 01:33:43,436 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-16 01:33:44,305 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.986e+01 1.583e+02 2.006e+02 2.339e+02 4.798e+02, threshold=4.012e+02, percent-clipped=3.0 2022-11-16 01:33:45,183 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62727.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:33:58,097 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62747.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:34:16,515 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=62773.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:34:18,988 INFO [train.py:876] (0/4) Epoch 9, batch 4600, loss[loss=0.2251, simple_loss=0.2059, pruned_loss=0.1221, over 3037.00 frames. ], tot_loss[loss=0.1291, simple_loss=0.152, pruned_loss=0.05308, over 1092049.23 frames. ], batch size: 284, lr: 8.92e-03, grad_scale: 16.0 2022-11-16 01:34:19,712 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62778.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:34:38,363 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5091, 2.6263, 2.6266, 2.8714, 2.2765, 2.4062, 2.3876, 3.2205], device='cuda:0'), covar=tensor([0.1108, 0.1332, 0.1777, 0.1184, 0.1566, 0.1843, 0.1725, 0.0635], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0093, 0.0097, 0.0085, 0.0085, 0.0089, 0.0091, 0.0068], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 01:34:52,258 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.537e+01 1.546e+02 1.870e+02 2.337e+02 4.153e+02, threshold=3.740e+02, percent-clipped=1.0 2022-11-16 01:34:52,343 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62826.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:34:57,790 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=62834.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:35:01,755 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7364, 2.5004, 2.5050, 1.2780, 2.6299, 2.7605, 2.6935, 2.9427], device='cuda:0'), covar=tensor([0.1883, 0.1479, 0.0991, 0.2780, 0.0821, 0.0864, 0.0595, 0.0890], device='cuda:0'), in_proj_covar=tensor([0.0175, 0.0184, 0.0165, 0.0192, 0.0176, 0.0194, 0.0163, 0.0189], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 01:35:19,403 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.79 vs. limit=5.0 2022-11-16 01:35:27,132 INFO [train.py:876] (0/4) Epoch 9, batch 4700, loss[loss=0.1166, simple_loss=0.149, pruned_loss=0.04205, over 5690.00 frames. ], tot_loss[loss=0.1293, simple_loss=0.1522, pruned_loss=0.05319, over 1087362.94 frames. ], batch size: 17, lr: 8.91e-03, grad_scale: 32.0 2022-11-16 01:35:42,713 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62901.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:36:00,632 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.617e+02 2.054e+02 2.509e+02 4.948e+02, threshold=4.108e+02, percent-clipped=3.0 2022-11-16 01:36:02,081 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=62929.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:36:04,041 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6443, 2.0750, 2.8389, 1.7771, 1.1445, 3.3418, 2.7130, 2.0773], device='cuda:0'), covar=tensor([0.0750, 0.1009, 0.0488, 0.2466, 0.3127, 0.0539, 0.2149, 0.1266], device='cuda:0'), in_proj_covar=tensor([0.0083, 0.0074, 0.0072, 0.0085, 0.0064, 0.0054, 0.0062, 0.0072], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:36:09,994 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=62941.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:36:34,658 INFO [train.py:876] (0/4) Epoch 9, batch 4800, loss[loss=0.1061, simple_loss=0.1431, pruned_loss=0.03452, over 5757.00 frames. ], tot_loss[loss=0.1308, simple_loss=0.1531, pruned_loss=0.05419, over 1087198.71 frames. ], batch size: 13, lr: 8.91e-03, grad_scale: 16.0 2022-11-16 01:36:34,703 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=62977.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:36:37,101 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-16 01:37:05,669 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63022.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:37:09,061 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.223e+01 1.584e+02 1.833e+02 2.198e+02 5.399e+02, threshold=3.666e+02, percent-clipped=2.0 2022-11-16 01:37:22,491 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63047.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:37:27,002 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9936, 4.1454, 3.2283, 4.0247, 4.0532, 3.8688, 4.3702, 4.0799], device='cuda:0'), covar=tensor([0.0563, 0.1052, 0.2602, 0.0975, 0.1025, 0.0739, 0.0706, 0.0874], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0167, 0.0262, 0.0165, 0.0206, 0.0165, 0.0179, 0.0165], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:37:42,691 INFO [train.py:876] (0/4) Epoch 9, batch 4900, loss[loss=0.1522, simple_loss=0.1639, pruned_loss=0.07028, over 5551.00 frames. ], tot_loss[loss=0.1295, simple_loss=0.1518, pruned_loss=0.05355, over 1084127.59 frames. ], batch size: 30, lr: 8.90e-03, grad_scale: 16.0 2022-11-16 01:37:52,980 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5815, 2.1374, 2.9268, 2.0345, 1.5898, 3.2761, 2.8712, 2.1559], device='cuda:0'), covar=tensor([0.0892, 0.1076, 0.0608, 0.2070, 0.1824, 0.0684, 0.0941, 0.0979], device='cuda:0'), in_proj_covar=tensor([0.0082, 0.0073, 0.0072, 0.0084, 0.0063, 0.0053, 0.0062, 0.0072], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:37:54,805 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63095.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:38:13,589 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1765, 3.8068, 2.6570, 3.5749, 2.8418, 2.6853, 2.1450, 3.2819], device='cuda:0'), covar=tensor([0.1409, 0.0242, 0.1065, 0.0351, 0.1055, 0.1034, 0.1882, 0.0383], device='cuda:0'), in_proj_covar=tensor([0.0162, 0.0137, 0.0162, 0.0141, 0.0176, 0.0172, 0.0169, 0.0155], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 01:38:17,342 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.788e+02 2.171e+02 2.737e+02 6.659e+02, threshold=4.342e+02, percent-clipped=3.0 2022-11-16 01:38:18,103 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63129.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:38:49,665 INFO [train.py:876] (0/4) Epoch 9, batch 5000, loss[loss=0.2082, simple_loss=0.1886, pruned_loss=0.114, over 3037.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1506, pruned_loss=0.05261, over 1081919.05 frames. ], batch size: 284, lr: 8.89e-03, grad_scale: 8.0 2022-11-16 01:39:06,636 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63201.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:39:13,466 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-16 01:39:19,705 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63221.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:39:24,001 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.228e+01 1.555e+02 1.927e+02 2.355e+02 4.069e+02, threshold=3.855e+02, percent-clipped=0.0 2022-11-16 01:39:32,907 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63241.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:39:34,776 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0 2022-11-16 01:39:38,705 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63249.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:39:51,476 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8637, 3.8786, 3.5741, 3.5291, 3.9071, 3.5666, 1.7027, 4.1440], device='cuda:0'), covar=tensor([0.0315, 0.0308, 0.0379, 0.0380, 0.0305, 0.0472, 0.3158, 0.0302], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0082, 0.0085, 0.0077, 0.0100, 0.0086, 0.0129, 0.0106], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:39:57,148 INFO [train.py:876] (0/4) Epoch 9, batch 5100, loss[loss=0.1754, simple_loss=0.1719, pruned_loss=0.08946, over 4711.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1509, pruned_loss=0.05295, over 1085259.04 frames. ], batch size: 135, lr: 8.88e-03, grad_scale: 8.0 2022-11-16 01:39:58,583 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0651, 1.9776, 2.4603, 1.4514, 1.3375, 2.8790, 2.3758, 2.0142], device='cuda:0'), covar=tensor([0.1044, 0.1253, 0.0786, 0.3209, 0.4016, 0.0974, 0.1419, 0.1560], device='cuda:0'), in_proj_covar=tensor([0.0082, 0.0074, 0.0072, 0.0085, 0.0063, 0.0055, 0.0062, 0.0073], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:40:00,535 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63282.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:40:04,995 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63289.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:40:27,658 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63322.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:40:31,502 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.064e+01 1.589e+02 1.945e+02 2.471e+02 3.864e+02, threshold=3.889e+02, percent-clipped=1.0 2022-11-16 01:40:37,802 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-16 01:40:59,519 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63370.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:41:04,331 INFO [train.py:876] (0/4) Epoch 9, batch 5200, loss[loss=0.09072, simple_loss=0.1172, pruned_loss=0.03213, over 5294.00 frames. ], tot_loss[loss=0.1254, simple_loss=0.1485, pruned_loss=0.05112, over 1087454.01 frames. ], batch size: 9, lr: 8.88e-03, grad_scale: 8.0 2022-11-16 01:41:33,495 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63420.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:41:34,781 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63422.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:41:38,978 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.008e+02 1.457e+02 1.778e+02 2.243e+02 4.193e+02, threshold=3.556e+02, percent-clipped=1.0 2022-11-16 01:41:39,819 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63429.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:42:00,072 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4677, 5.0173, 4.5466, 5.0141, 5.0459, 4.3279, 4.7227, 4.3608], device='cuda:0'), covar=tensor([0.0304, 0.0380, 0.1329, 0.0352, 0.0441, 0.0448, 0.0397, 0.0479], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0172, 0.0269, 0.0169, 0.0212, 0.0170, 0.0183, 0.0170], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:42:11,805 INFO [train.py:876] (0/4) Epoch 9, batch 5300, loss[loss=0.1747, simple_loss=0.1839, pruned_loss=0.08278, over 5579.00 frames. ], tot_loss[loss=0.1282, simple_loss=0.1508, pruned_loss=0.05283, over 1088000.28 frames. ], batch size: 54, lr: 8.87e-03, grad_scale: 8.0 2022-11-16 01:42:11,848 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63477.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:42:14,585 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63481.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:42:16,186 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63483.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:42:18,023 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8227, 1.9150, 2.4389, 1.4172, 1.0391, 2.8159, 2.2573, 1.9837], device='cuda:0'), covar=tensor([0.1232, 0.2007, 0.0774, 0.3306, 0.4378, 0.1016, 0.1362, 0.1741], device='cuda:0'), in_proj_covar=tensor([0.0083, 0.0076, 0.0074, 0.0086, 0.0065, 0.0055, 0.0064, 0.0074], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:42:18,701 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8029, 1.2195, 1.8779, 1.0975, 1.4811, 1.7154, 1.3569, 1.1953], device='cuda:0'), covar=tensor([0.0022, 0.0057, 0.0017, 0.0043, 0.0081, 0.0059, 0.0035, 0.0041], device='cuda:0'), in_proj_covar=tensor([0.0021, 0.0021, 0.0023, 0.0028, 0.0025, 0.0023, 0.0027, 0.0027], device='cuda:0'), out_proj_covar=tensor([1.9570e-05, 2.0179e-05, 2.0363e-05, 2.7917e-05, 2.3259e-05, 2.2332e-05, 2.6153e-05, 2.6751e-05], device='cuda:0') 2022-11-16 01:42:46,192 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.109e+01 1.553e+02 1.965e+02 2.267e+02 4.174e+02, threshold=3.929e+02, percent-clipped=2.0 2022-11-16 01:42:50,548 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6194, 3.6325, 3.5680, 3.7210, 3.2652, 3.0907, 4.0974, 3.5564], device='cuda:0'), covar=tensor([0.0504, 0.0973, 0.0584, 0.1215, 0.0741, 0.0494, 0.0809, 0.0688], device='cuda:0'), in_proj_covar=tensor([0.0083, 0.0103, 0.0089, 0.0114, 0.0084, 0.0075, 0.0140, 0.0096], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:43:13,465 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63569.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:43:18,727 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0228, 4.2995, 3.6464, 4.2305, 4.2278, 3.8060, 4.2259, 4.0438], device='cuda:0'), covar=tensor([0.0573, 0.0632, 0.1809, 0.0675, 0.0668, 0.0511, 0.0660, 0.0615], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0172, 0.0269, 0.0170, 0.0212, 0.0171, 0.0184, 0.0171], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:43:19,366 INFO [train.py:876] (0/4) Epoch 9, batch 5400, loss[loss=0.1325, simple_loss=0.1604, pruned_loss=0.05231, over 5493.00 frames. ], tot_loss[loss=0.1288, simple_loss=0.1515, pruned_loss=0.05304, over 1087591.50 frames. ], batch size: 17, lr: 8.86e-03, grad_scale: 8.0 2022-11-16 01:43:19,444 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63577.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:43:21,387 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7919, 2.9298, 3.0179, 2.7751, 2.9442, 2.8051, 1.2601, 3.0322], device='cuda:0'), covar=tensor([0.0326, 0.0281, 0.0292, 0.0276, 0.0357, 0.0432, 0.2868, 0.0352], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0082, 0.0086, 0.0077, 0.0101, 0.0086, 0.0130, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:43:25,752 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63586.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:43:26,393 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7410, 2.8576, 2.4414, 3.1212, 2.4204, 2.7295, 2.9575, 3.3357], device='cuda:0'), covar=tensor([0.1600, 0.1298, 0.2214, 0.0935, 0.1545, 0.1037, 0.1273, 0.1513], device='cuda:0'), in_proj_covar=tensor([0.0094, 0.0093, 0.0098, 0.0087, 0.0085, 0.0090, 0.0091, 0.0068], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 01:43:55,258 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.021e+02 1.654e+02 2.116e+02 2.560e+02 5.660e+02, threshold=4.233e+02, percent-clipped=4.0 2022-11-16 01:43:56,865 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63630.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:44:09,228 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63647.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:44:30,045 INFO [train.py:876] (0/4) Epoch 9, batch 5500, loss[loss=0.1015, simple_loss=0.1225, pruned_loss=0.04029, over 5470.00 frames. ], tot_loss[loss=0.1278, simple_loss=0.1511, pruned_loss=0.05224, over 1092663.89 frames. ], batch size: 11, lr: 8.86e-03, grad_scale: 8.0 2022-11-16 01:45:00,587 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5883, 4.5101, 4.7867, 4.8037, 4.3267, 4.1750, 5.2788, 4.6286], device='cuda:0'), covar=tensor([0.0312, 0.0967, 0.0283, 0.1195, 0.0505, 0.0278, 0.0628, 0.0550], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0106, 0.0090, 0.0116, 0.0085, 0.0076, 0.0142, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:45:04,282 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.831e+01 1.633e+02 1.870e+02 2.382e+02 5.170e+02, threshold=3.741e+02, percent-clipped=1.0 2022-11-16 01:45:29,448 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2022-11-16 01:45:36,943 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63776.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:45:37,000 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2429, 2.0047, 2.6669, 1.8604, 1.1993, 3.2045, 2.4944, 2.3050], device='cuda:0'), covar=tensor([0.0867, 0.0931, 0.0526, 0.2766, 0.3413, 0.1038, 0.0966, 0.1008], device='cuda:0'), in_proj_covar=tensor([0.0082, 0.0074, 0.0073, 0.0085, 0.0063, 0.0054, 0.0063, 0.0072], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:45:37,521 INFO [train.py:876] (0/4) Epoch 9, batch 5600, loss[loss=0.1194, simple_loss=0.1581, pruned_loss=0.04031, over 5755.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.151, pruned_loss=0.05199, over 1087738.24 frames. ], batch size: 21, lr: 8.85e-03, grad_scale: 8.0 2022-11-16 01:45:38,263 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63778.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:45:44,934 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2022-11-16 01:46:00,502 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63810.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:46:08,466 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.97 vs. limit=5.0 2022-11-16 01:46:12,215 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.664e+02 2.053e+02 2.570e+02 4.903e+02, threshold=4.106e+02, percent-clipped=3.0 2022-11-16 01:46:23,450 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63845.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:46:41,778 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63871.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:46:44,948 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1500, 3.6394, 3.2239, 3.6166, 3.6385, 3.1179, 3.2868, 3.0983], device='cuda:0'), covar=tensor([0.1251, 0.0533, 0.1504, 0.0468, 0.0502, 0.0531, 0.0675, 0.0804], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0170, 0.0264, 0.0166, 0.0208, 0.0167, 0.0180, 0.0167], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:46:45,513 INFO [train.py:876] (0/4) Epoch 9, batch 5700, loss[loss=0.1383, simple_loss=0.1613, pruned_loss=0.05769, over 5551.00 frames. ], tot_loss[loss=0.1283, simple_loss=0.1516, pruned_loss=0.0525, over 1090090.47 frames. ], batch size: 46, lr: 8.84e-03, grad_scale: 8.0 2022-11-16 01:46:45,630 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=63877.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:47:05,276 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63906.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:47:06,696 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.75 vs. limit=2.0 2022-11-16 01:47:09,686 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3916, 2.7470, 3.0594, 2.8515, 1.7503, 2.8684, 1.9671, 2.0877], device='cuda:0'), covar=tensor([0.0218, 0.0156, 0.0094, 0.0159, 0.0313, 0.0128, 0.0342, 0.0184], device='cuda:0'), in_proj_covar=tensor([0.0183, 0.0159, 0.0167, 0.0188, 0.0180, 0.0168, 0.0179, 0.0170], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 01:47:12,357 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=63916.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:47:18,380 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=63925.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:47:18,412 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63925.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:47:20,209 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.040e+01 1.586e+02 1.908e+02 2.453e+02 4.306e+02, threshold=3.816e+02, percent-clipped=1.0 2022-11-16 01:47:29,398 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=63942.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:47:42,370 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7417, 1.9497, 2.4481, 2.3739, 2.3520, 1.8344, 2.3383, 2.6657], device='cuda:0'), covar=tensor([0.0538, 0.1148, 0.0684, 0.0983, 0.0758, 0.1184, 0.0977, 0.0637], device='cuda:0'), in_proj_covar=tensor([0.0231, 0.0190, 0.0206, 0.0207, 0.0228, 0.0188, 0.0223, 0.0222], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 01:47:49,019 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.5943, 5.1655, 5.3925, 5.1926, 5.8019, 5.7133, 4.9061, 5.6685], device='cuda:0'), covar=tensor([0.0417, 0.0283, 0.0495, 0.0257, 0.0289, 0.0126, 0.0211, 0.0244], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0139, 0.0102, 0.0136, 0.0160, 0.0091, 0.0116, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 01:47:53,267 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.97 vs. limit=2.0 2022-11-16 01:47:53,503 INFO [train.py:876] (0/4) Epoch 9, batch 5800, loss[loss=0.07945, simple_loss=0.1179, pruned_loss=0.02052, over 5509.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1509, pruned_loss=0.05209, over 1092507.90 frames. ], batch size: 10, lr: 8.84e-03, grad_scale: 8.0 2022-11-16 01:47:53,665 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=63977.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:48:08,708 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2022-11-16 01:48:25,311 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64023.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:48:28,337 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.198e+01 1.501e+02 1.757e+02 2.288e+02 3.552e+02, threshold=3.514e+02, percent-clipped=0.0 2022-11-16 01:48:31,687 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3368, 4.1323, 4.1785, 4.2365, 3.9235, 3.7999, 4.7565, 4.2447], device='cuda:0'), covar=tensor([0.0351, 0.1118, 0.0360, 0.1546, 0.0490, 0.0364, 0.0695, 0.0566], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0106, 0.0090, 0.0115, 0.0085, 0.0076, 0.0141, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:48:39,085 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2022-11-16 01:49:00,485 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64076.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:49:01,352 INFO [train.py:876] (0/4) Epoch 9, batch 5900, loss[loss=0.08342, simple_loss=0.1067, pruned_loss=0.03006, over 5115.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.151, pruned_loss=0.05323, over 1084966.19 frames. ], batch size: 8, lr: 8.83e-03, grad_scale: 8.0 2022-11-16 01:49:02,112 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64078.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:49:06,057 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64084.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:49:32,759 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64124.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:49:34,399 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64126.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:49:35,640 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.966e+01 1.656e+02 2.045e+02 2.527e+02 4.457e+02, threshold=4.090e+02, percent-clipped=4.0 2022-11-16 01:50:00,868 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64166.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:50:06,333 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1094, 1.7626, 2.4978, 1.6024, 1.0018, 2.9837, 2.3668, 2.0423], device='cuda:0'), covar=tensor([0.0894, 0.1339, 0.0592, 0.2519, 0.2669, 0.2040, 0.0590, 0.1743], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0075, 0.0074, 0.0085, 0.0064, 0.0054, 0.0062, 0.0073], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:50:08,147 INFO [train.py:876] (0/4) Epoch 9, batch 6000, loss[loss=0.211, simple_loss=0.2007, pruned_loss=0.1107, over 5461.00 frames. ], tot_loss[loss=0.1285, simple_loss=0.1509, pruned_loss=0.05302, over 1088062.02 frames. ], batch size: 58, lr: 8.82e-03, grad_scale: 8.0 2022-11-16 01:50:08,148 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 01:50:12,633 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4828, 2.6860, 3.7569, 3.3356, 4.2782, 2.6724, 3.7935, 4.2778], device='cuda:0'), covar=tensor([0.0500, 0.1756, 0.1019, 0.1828, 0.0573, 0.2017, 0.1411, 0.0742], device='cuda:0'), in_proj_covar=tensor([0.0232, 0.0189, 0.0206, 0.0207, 0.0229, 0.0190, 0.0225, 0.0225], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 01:50:17,752 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0171, 3.0702, 2.4396, 2.6536, 1.9390, 2.5973, 2.0177, 2.8296], device='cuda:0'), covar=tensor([0.0591, 0.0169, 0.0397, 0.0337, 0.1691, 0.0407, 0.0898, 0.0250], device='cuda:0'), in_proj_covar=tensor([0.0162, 0.0138, 0.0164, 0.0143, 0.0177, 0.0172, 0.0169, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 01:50:25,044 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4830, 3.3288, 3.6084, 3.3000, 2.8669, 3.3407, 3.7425, 3.7270], device='cuda:0'), covar=tensor([0.0316, 0.0949, 0.0365, 0.0977, 0.0698, 0.0359, 0.0651, 0.0402], device='cuda:0'), in_proj_covar=tensor([0.0083, 0.0105, 0.0089, 0.0115, 0.0084, 0.0075, 0.0140, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:50:25,851 INFO [train.py:908] (0/4) Epoch 9, validation: loss=0.1648, simple_loss=0.1829, pruned_loss=0.07333, over 1530663.00 frames. 2022-11-16 01:50:25,851 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-16 01:50:37,266 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2022-11-16 01:50:42,283 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64201.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:50:43,843 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.90 vs. limit=2.0 2022-11-16 01:50:53,207 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8527, 3.6855, 3.8249, 3.9218, 3.4938, 3.2362, 4.3071, 3.7377], device='cuda:0'), covar=tensor([0.0541, 0.1012, 0.0485, 0.1241, 0.0632, 0.0419, 0.0710, 0.0696], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0107, 0.0090, 0.0116, 0.0085, 0.0076, 0.0142, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:50:57,915 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64225.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:50:59,713 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.067e+01 1.626e+02 1.990e+02 2.309e+02 5.533e+02, threshold=3.980e+02, percent-clipped=3.0 2022-11-16 01:51:02,872 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64232.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:51:10,135 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64242.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:51:26,100 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2022-11-16 01:51:29,849 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64272.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:51:30,477 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64273.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:51:33,036 INFO [train.py:876] (0/4) Epoch 9, batch 6100, loss[loss=0.1051, simple_loss=0.1396, pruned_loss=0.03532, over 5543.00 frames. ], tot_loss[loss=0.1283, simple_loss=0.1507, pruned_loss=0.05295, over 1084542.34 frames. ], batch size: 17, lr: 8.82e-03, grad_scale: 8.0 2022-11-16 01:51:41,909 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64290.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:51:44,145 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64293.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:52:07,147 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.015e+02 1.572e+02 1.862e+02 2.390e+02 5.405e+02, threshold=3.724e+02, percent-clipped=2.0 2022-11-16 01:52:18,837 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64345.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:52:40,440 INFO [train.py:876] (0/4) Epoch 9, batch 6200, loss[loss=0.1403, simple_loss=0.1637, pruned_loss=0.05845, over 5714.00 frames. ], tot_loss[loss=0.1274, simple_loss=0.15, pruned_loss=0.05239, over 1082834.50 frames. ], batch size: 31, lr: 8.81e-03, grad_scale: 8.0 2022-11-16 01:52:41,778 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64379.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:52:42,102 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2022-11-16 01:53:00,245 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64406.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:53:14,907 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.648e+02 1.944e+02 2.421e+02 4.973e+02, threshold=3.888e+02, percent-clipped=5.0 2022-11-16 01:53:15,863 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2022-11-16 01:53:25,189 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-16 01:53:33,511 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2307, 1.6250, 1.9853, 1.5420, 1.1528, 2.4670, 1.7691, 1.7739], device='cuda:0'), covar=tensor([0.1605, 0.1758, 0.1810, 0.2514, 0.2445, 0.0734, 0.1462, 0.1310], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0077, 0.0076, 0.0088, 0.0066, 0.0055, 0.0064, 0.0075], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 01:53:39,846 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64465.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:53:40,801 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64466.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:53:44,040 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.32 vs. limit=5.0 2022-11-16 01:53:48,093 INFO [train.py:876] (0/4) Epoch 9, batch 6300, loss[loss=0.1619, simple_loss=0.164, pruned_loss=0.07996, over 5290.00 frames. ], tot_loss[loss=0.1292, simple_loss=0.151, pruned_loss=0.05369, over 1079844.50 frames. ], batch size: 79, lr: 8.80e-03, grad_scale: 8.0 2022-11-16 01:54:03,889 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64501.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:54:12,479 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64514.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:54:17,205 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8568, 2.1563, 1.9917, 1.3062, 1.4655, 1.3353, 1.2977, 1.9493], device='cuda:0'), covar=tensor([0.0031, 0.0029, 0.0031, 0.0037, 0.0038, 0.0030, 0.0032, 0.0061], device='cuda:0'), in_proj_covar=tensor([0.0050, 0.0044, 0.0047, 0.0047, 0.0046, 0.0042, 0.0045, 0.0040], device='cuda:0'), out_proj_covar=tensor([4.5495e-05, 4.0052e-05, 4.2464e-05, 4.2078e-05, 4.0999e-05, 3.6211e-05, 4.1208e-05, 3.4914e-05], device='cuda:0') 2022-11-16 01:54:21,627 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64526.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:54:22,700 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.211e+02 1.573e+02 1.938e+02 2.415e+02 6.064e+02, threshold=3.877e+02, percent-clipped=4.0 2022-11-16 01:54:36,383 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64549.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:54:52,159 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64572.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:54:55,718 INFO [train.py:876] (0/4) Epoch 9, batch 6400, loss[loss=0.1132, simple_loss=0.1294, pruned_loss=0.04847, over 5318.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.1508, pruned_loss=0.05334, over 1079012.01 frames. ], batch size: 9, lr: 8.80e-03, grad_scale: 8.0 2022-11-16 01:55:03,013 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64588.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:55:03,712 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64589.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:55:24,419 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64620.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:55:30,096 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.639e+02 1.887e+02 2.519e+02 5.774e+02, threshold=3.775e+02, percent-clipped=3.0 2022-11-16 01:55:41,574 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.53 vs. limit=2.0 2022-11-16 01:55:43,713 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64648.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:55:45,014 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64650.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:56:02,881 INFO [train.py:876] (0/4) Epoch 9, batch 6500, loss[loss=0.1313, simple_loss=0.1601, pruned_loss=0.05127, over 5619.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.1503, pruned_loss=0.05242, over 1087262.87 frames. ], batch size: 29, lr: 8.79e-03, grad_scale: 8.0 2022-11-16 01:56:04,242 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64679.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:56:19,466 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64701.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:56:19,771 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2022-11-16 01:56:20,845 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64703.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:56:24,829 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64709.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:56:36,723 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64727.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:56:37,276 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.652e+01 1.582e+02 1.894e+02 2.392e+02 5.037e+02, threshold=3.789e+02, percent-clipped=1.0 2022-11-16 01:56:49,136 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.79 vs. limit=5.0 2022-11-16 01:56:49,587 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64745.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:56:54,286 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64752.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:57:02,057 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64764.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:57:10,222 INFO [train.py:876] (0/4) Epoch 9, batch 6600, loss[loss=0.1499, simple_loss=0.1621, pruned_loss=0.06883, over 5131.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1506, pruned_loss=0.0526, over 1083232.21 frames. ], batch size: 91, lr: 8.78e-03, grad_scale: 8.0 2022-11-16 01:57:30,794 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64806.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:57:35,338 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=64813.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:57:40,451 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64821.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:57:44,979 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.581e+02 1.869e+02 2.492e+02 4.006e+02, threshold=3.739e+02, percent-clipped=2.0 2022-11-16 01:57:45,860 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3909, 3.3197, 3.3698, 3.1287, 1.9769, 3.3797, 2.2005, 2.8841], device='cuda:0'), covar=tensor([0.0462, 0.0259, 0.0183, 0.0333, 0.0514, 0.0171, 0.0488, 0.0188], device='cuda:0'), in_proj_covar=tensor([0.0187, 0.0163, 0.0172, 0.0194, 0.0183, 0.0171, 0.0182, 0.0173], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 01:57:54,043 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0388, 2.0678, 1.9351, 2.1451, 2.0302, 1.5793, 1.8652, 2.3322], device='cuda:0'), covar=tensor([0.1376, 0.1360, 0.2232, 0.1174, 0.1361, 0.1876, 0.1627, 0.0848], device='cuda:0'), in_proj_covar=tensor([0.0098, 0.0096, 0.0101, 0.0090, 0.0088, 0.0092, 0.0094, 0.0072], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 01:58:12,099 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-16 01:58:18,045 INFO [train.py:876] (0/4) Epoch 9, batch 6700, loss[loss=0.1216, simple_loss=0.153, pruned_loss=0.0451, over 5634.00 frames. ], tot_loss[loss=0.1298, simple_loss=0.1518, pruned_loss=0.0539, over 1082116.49 frames. ], batch size: 29, lr: 8.77e-03, grad_scale: 8.0 2022-11-16 01:58:22,739 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7563, 0.4675, 0.7895, 0.5798, 0.6839, 0.6771, 0.3987, 0.6729], device='cuda:0'), covar=tensor([0.0148, 0.0223, 0.0184, 0.0222, 0.0170, 0.0164, 0.0475, 0.0224], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0020, 0.0013, 0.0017, 0.0015, 0.0013, 0.0018, 0.0013], device='cuda:0'), out_proj_covar=tensor([6.8480e-05, 9.2740e-05, 6.9592e-05, 8.4372e-05, 7.4914e-05, 6.9442e-05, 8.6634e-05, 6.8748e-05], device='cuda:0') 2022-11-16 01:58:25,443 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=64888.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:58:45,362 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6739, 3.7557, 3.6388, 3.4998, 3.7732, 3.6665, 1.3182, 3.9754], device='cuda:0'), covar=tensor([0.0271, 0.0250, 0.0305, 0.0346, 0.0313, 0.0332, 0.3169, 0.0246], device='cuda:0'), in_proj_covar=tensor([0.0102, 0.0081, 0.0083, 0.0074, 0.0100, 0.0085, 0.0127, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:58:52,429 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.525e+01 1.574e+02 2.010e+02 2.472e+02 4.884e+02, threshold=4.021e+02, percent-clipped=4.0 2022-11-16 01:58:57,699 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=64936.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:59:03,058 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3384, 4.3902, 4.5802, 4.7031, 4.2539, 3.9369, 5.0649, 4.6181], device='cuda:0'), covar=tensor([0.0520, 0.0942, 0.0484, 0.1075, 0.0485, 0.0457, 0.0779, 0.0796], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0104, 0.0088, 0.0114, 0.0084, 0.0075, 0.0139, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 01:59:03,705 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=64945.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:59:04,813 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2022-11-16 01:59:25,574 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=64976.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:59:26,077 INFO [train.py:876] (0/4) Epoch 9, batch 6800, loss[loss=0.1543, simple_loss=0.1762, pruned_loss=0.06625, over 5266.00 frames. ], tot_loss[loss=0.129, simple_loss=0.1512, pruned_loss=0.05343, over 1076508.56 frames. ], batch size: 79, lr: 8.77e-03, grad_scale: 8.0 2022-11-16 01:59:41,430 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-65000.pt 2022-11-16 01:59:45,791 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65001.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 01:59:48,097 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65004.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:00:04,568 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.642e+02 2.032e+02 2.678e+02 4.129e+02, threshold=4.063e+02, percent-clipped=1.0 2022-11-16 02:00:10,600 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65037.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:00:18,370 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65049.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:00:22,402 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=65055.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:00:25,191 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65059.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:00:37,935 INFO [train.py:876] (0/4) Epoch 9, batch 6900, loss[loss=0.1667, simple_loss=0.1764, pruned_loss=0.07854, over 4634.00 frames. ], tot_loss[loss=0.129, simple_loss=0.1514, pruned_loss=0.05327, over 1076456.36 frames. ], batch size: 135, lr: 8.76e-03, grad_scale: 8.0 2022-11-16 02:00:53,892 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65101.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:00:58,439 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65108.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:01:04,190 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65116.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:01:05,785 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2022-11-16 02:01:07,478 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65121.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:01:12,981 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.038e+02 1.555e+02 1.818e+02 2.213e+02 4.720e+02, threshold=3.636e+02, percent-clipped=2.0 2022-11-16 02:01:40,198 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65169.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:01:45,807 INFO [train.py:876] (0/4) Epoch 9, batch 7000, loss[loss=0.07904, simple_loss=0.1066, pruned_loss=0.02576, over 5421.00 frames. ], tot_loss[loss=0.1289, simple_loss=0.1516, pruned_loss=0.05312, over 1076658.19 frames. ], batch size: 9, lr: 8.75e-03, grad_scale: 16.0 2022-11-16 02:02:19,940 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.702e+02 2.050e+02 2.485e+02 3.887e+02, threshold=4.100e+02, percent-clipped=2.0 2022-11-16 02:02:32,222 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65245.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:02:40,715 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9329, 1.2262, 2.2233, 1.4880, 1.4352, 1.9918, 1.8203, 1.5390], device='cuda:0'), covar=tensor([0.0042, 0.0075, 0.0026, 0.0049, 0.0121, 0.0110, 0.0037, 0.0045], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0023, 0.0024, 0.0031, 0.0026, 0.0025, 0.0029, 0.0029], device='cuda:0'), out_proj_covar=tensor([2.1556e-05, 2.1858e-05, 2.2031e-05, 3.0381e-05, 2.4690e-05, 2.4016e-05, 2.8315e-05, 2.8617e-05], device='cuda:0') 2022-11-16 02:02:53,102 INFO [train.py:876] (0/4) Epoch 9, batch 7100, loss[loss=0.1594, simple_loss=0.1679, pruned_loss=0.07546, over 5573.00 frames. ], tot_loss[loss=0.127, simple_loss=0.1506, pruned_loss=0.05167, over 1083103.26 frames. ], batch size: 46, lr: 8.75e-03, grad_scale: 16.0 2022-11-16 02:02:58,815 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=65285.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:03:04,607 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65293.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:03:12,355 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65304.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:03:16,332 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2226, 3.2200, 3.1802, 3.3612, 3.0437, 3.7510, 4.1217, 3.8545], device='cuda:0'), covar=tensor([0.1072, 0.1447, 0.1938, 0.1276, 0.1487, 0.0921, 0.1132, 0.2566], device='cuda:0'), in_proj_covar=tensor([0.0097, 0.0096, 0.0101, 0.0091, 0.0086, 0.0092, 0.0095, 0.0071], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:03:28,153 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.907e+01 1.625e+02 1.964e+02 2.489e+02 4.704e+02, threshold=3.927e+02, percent-clipped=1.0 2022-11-16 02:03:31,218 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65332.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 02:03:41,023 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65346.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:03:45,049 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65352.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:03:49,614 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65359.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:04:01,553 INFO [train.py:876] (0/4) Epoch 9, batch 7200, loss[loss=0.103, simple_loss=0.1314, pruned_loss=0.03727, over 5630.00 frames. ], tot_loss[loss=0.1269, simple_loss=0.1504, pruned_loss=0.05166, over 1082618.18 frames. ], batch size: 29, lr: 8.74e-03, grad_scale: 16.0 2022-11-16 02:04:18,237 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65401.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:04:22,060 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65407.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:04:22,783 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65408.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:04:24,628 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65411.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:04:35,329 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.623e+01 1.548e+02 1.861e+02 2.163e+02 4.412e+02, threshold=3.722e+02, percent-clipped=1.0 2022-11-16 02:04:46,040 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. limit=2.0 2022-11-16 02:04:50,194 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-9.pt 2022-11-16 02:05:33,820 INFO [train.py:876] (0/4) Epoch 10, batch 0, loss[loss=0.1062, simple_loss=0.143, pruned_loss=0.03466, over 5694.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.143, pruned_loss=0.03466, over 5694.00 frames. ], batch size: 19, lr: 8.31e-03, grad_scale: 16.0 2022-11-16 02:05:33,822 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 02:05:50,440 INFO [train.py:908] (0/4) Epoch 10, validation: loss=0.1665, simple_loss=0.1839, pruned_loss=0.07458, over 1530663.00 frames. 2022-11-16 02:05:50,441 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-16 02:05:50,500 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65449.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:05:55,443 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65456.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:06:07,207 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6360, 3.9975, 3.5650, 3.9004, 3.9841, 3.4385, 3.5303, 3.3969], device='cuda:0'), covar=tensor([0.0659, 0.0518, 0.1392, 0.0527, 0.0482, 0.0468, 0.0660, 0.0653], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0166, 0.0263, 0.0165, 0.0208, 0.0167, 0.0178, 0.0166], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 02:06:11,976 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.52 vs. limit=5.0 2022-11-16 02:06:19,511 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8812, 4.4087, 4.6520, 4.3896, 4.9771, 4.8188, 4.4027, 4.9433], device='cuda:0'), covar=tensor([0.0356, 0.0325, 0.0429, 0.0359, 0.0343, 0.0167, 0.0249, 0.0264], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0139, 0.0104, 0.0137, 0.0161, 0.0092, 0.0115, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 02:06:37,486 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6835, 1.9984, 2.2016, 2.8486, 2.7936, 2.2701, 1.9321, 2.8611], device='cuda:0'), covar=tensor([0.1507, 0.2808, 0.2110, 0.1776, 0.1421, 0.2770, 0.2186, 0.1415], device='cuda:0'), in_proj_covar=tensor([0.0238, 0.0203, 0.0197, 0.0315, 0.0225, 0.0209, 0.0192, 0.0238], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:06:43,804 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.371e+01 1.569e+02 1.978e+02 2.490e+02 6.089e+02, threshold=3.956e+02, percent-clipped=4.0 2022-11-16 02:06:46,653 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=65532.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:06:57,654 INFO [train.py:876] (0/4) Epoch 10, batch 100, loss[loss=0.1249, simple_loss=0.1496, pruned_loss=0.05011, over 5694.00 frames. ], tot_loss[loss=0.133, simple_loss=0.1542, pruned_loss=0.05587, over 434311.46 frames. ], batch size: 19, lr: 8.30e-03, grad_scale: 16.0 2022-11-16 02:07:27,701 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65593.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:07:31,036 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7425, 2.3694, 2.6700, 3.7113, 3.5200, 2.8178, 2.5452, 3.4833], device='cuda:0'), covar=tensor([0.0715, 0.3132, 0.2701, 0.3102, 0.1382, 0.3207, 0.2245, 0.0810], device='cuda:0'), in_proj_covar=tensor([0.0237, 0.0203, 0.0196, 0.0314, 0.0225, 0.0209, 0.0192, 0.0239], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:07:41,485 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2022-11-16 02:07:51,875 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.404e+01 1.672e+02 2.008e+02 2.472e+02 6.251e+02, threshold=4.017e+02, percent-clipped=3.0 2022-11-16 02:07:54,633 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65632.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:08:00,525 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65641.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:08:01,254 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2453, 2.1405, 2.2191, 2.2200, 2.0312, 1.7088, 2.0747, 2.5626], device='cuda:0'), covar=tensor([0.1239, 0.1920, 0.2105, 0.1351, 0.1617, 0.2357, 0.1733, 0.0918], device='cuda:0'), in_proj_covar=tensor([0.0096, 0.0095, 0.0098, 0.0090, 0.0085, 0.0091, 0.0094, 0.0069], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:08:05,592 INFO [train.py:876] (0/4) Epoch 10, batch 200, loss[loss=0.1534, simple_loss=0.1636, pruned_loss=0.07163, over 5119.00 frames. ], tot_loss[loss=0.1272, simple_loss=0.1501, pruned_loss=0.05214, over 688512.62 frames. ], batch size: 91, lr: 8.30e-03, grad_scale: 16.0 2022-11-16 02:08:07,147 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9239, 2.3346, 3.3838, 2.8828, 3.7974, 2.5005, 3.3986, 3.9922], device='cuda:0'), covar=tensor([0.0627, 0.1763, 0.0893, 0.1628, 0.0468, 0.1601, 0.1103, 0.0673], device='cuda:0'), in_proj_covar=tensor([0.0234, 0.0192, 0.0209, 0.0209, 0.0229, 0.0191, 0.0225, 0.0227], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:08:13,526 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5361, 1.5628, 1.5534, 1.2621, 1.5872, 1.1309, 1.5278, 1.7282], device='cuda:0'), covar=tensor([0.0052, 0.0056, 0.0048, 0.0050, 0.0041, 0.0047, 0.0047, 0.0038], device='cuda:0'), in_proj_covar=tensor([0.0052, 0.0046, 0.0049, 0.0049, 0.0048, 0.0044, 0.0046, 0.0041], device='cuda:0'), out_proj_covar=tensor([4.7119e-05, 4.1998e-05, 4.3735e-05, 4.4122e-05, 4.2597e-05, 3.8088e-05, 4.1480e-05, 3.6425e-05], device='cuda:0') 2022-11-16 02:08:26,066 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65680.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:08:40,940 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9003, 4.1928, 3.8360, 3.5778, 2.2415, 3.9902, 2.3008, 3.1879], device='cuda:0'), covar=tensor([0.0358, 0.0117, 0.0139, 0.0344, 0.0539, 0.0155, 0.0452, 0.0202], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0163, 0.0174, 0.0194, 0.0184, 0.0173, 0.0184, 0.0175], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:08:47,439 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65711.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:08:55,252 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0733, 1.3550, 1.2579, 0.8613, 1.2443, 1.1879, 1.0062, 1.4790], device='cuda:0'), covar=tensor([0.0052, 0.0036, 0.0046, 0.0056, 0.0038, 0.0042, 0.0075, 0.0040], device='cuda:0'), in_proj_covar=tensor([0.0052, 0.0047, 0.0049, 0.0049, 0.0048, 0.0044, 0.0046, 0.0041], device='cuda:0'), out_proj_covar=tensor([4.7605e-05, 4.2381e-05, 4.3677e-05, 4.4346e-05, 4.2783e-05, 3.8187e-05, 4.1738e-05, 3.6428e-05], device='cuda:0') 2022-11-16 02:08:58,246 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.788e+01 1.559e+02 1.915e+02 2.411e+02 5.348e+02, threshold=3.830e+02, percent-clipped=2.0 2022-11-16 02:09:12,969 INFO [train.py:876] (0/4) Epoch 10, batch 300, loss[loss=0.141, simple_loss=0.1644, pruned_loss=0.05879, over 5693.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1507, pruned_loss=0.0527, over 844902.59 frames. ], batch size: 19, lr: 8.29e-03, grad_scale: 16.0 2022-11-16 02:09:19,522 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65759.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:09:32,158 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.0354, 4.8314, 5.0997, 5.0126, 5.0408, 4.8972, 5.9355, 5.2123], device='cuda:0'), covar=tensor([0.0360, 0.1096, 0.0376, 0.1228, 0.0324, 0.0338, 0.0499, 0.0459], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0106, 0.0089, 0.0115, 0.0085, 0.0075, 0.0142, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:09:32,206 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8036, 4.6789, 4.8912, 4.0460, 4.7672, 4.7936, 2.2548, 5.0485], device='cuda:0'), covar=tensor([0.0220, 0.0294, 0.0270, 0.0442, 0.0239, 0.0175, 0.2734, 0.0255], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0083, 0.0086, 0.0076, 0.0103, 0.0087, 0.0131, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:09:36,541 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8416, 1.8711, 1.6385, 1.8933, 1.9699, 1.8228, 1.7065, 1.8548], device='cuda:0'), covar=tensor([0.0466, 0.0868, 0.1750, 0.0804, 0.0690, 0.0588, 0.1213, 0.0644], device='cuda:0'), in_proj_covar=tensor([0.0128, 0.0169, 0.0267, 0.0166, 0.0207, 0.0166, 0.0177, 0.0166], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 02:10:05,914 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.600e+01 1.581e+02 1.964e+02 2.494e+02 5.554e+02, threshold=3.929e+02, percent-clipped=0.0 2022-11-16 02:10:12,706 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=65838.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:10:18,397 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1509, 4.4471, 2.7330, 4.1442, 3.3745, 2.9016, 2.3316, 3.7682], device='cuda:0'), covar=tensor([0.1725, 0.0172, 0.1178, 0.0312, 0.0662, 0.1143, 0.1962, 0.0364], device='cuda:0'), in_proj_covar=tensor([0.0162, 0.0142, 0.0163, 0.0143, 0.0177, 0.0175, 0.0173, 0.0160], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 02:10:20,645 INFO [train.py:876] (0/4) Epoch 10, batch 400, loss[loss=0.1069, simple_loss=0.1409, pruned_loss=0.03642, over 5476.00 frames. ], tot_loss[loss=0.1255, simple_loss=0.1488, pruned_loss=0.05108, over 935794.99 frames. ], batch size: 12, lr: 8.28e-03, grad_scale: 16.0 2022-11-16 02:10:23,729 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5733, 1.2171, 1.7763, 0.8432, 1.8304, 1.1416, 1.0447, 1.4991], device='cuda:0'), covar=tensor([0.0476, 0.0882, 0.0310, 0.0997, 0.0975, 0.1262, 0.0695, 0.0378], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0019, 0.0013, 0.0017, 0.0014, 0.0013, 0.0018, 0.0013], device='cuda:0'), out_proj_covar=tensor([6.6522e-05, 9.1296e-05, 6.9197e-05, 8.2131e-05, 7.1825e-05, 6.7792e-05, 8.4924e-05, 6.7268e-05], device='cuda:0') 2022-11-16 02:10:46,438 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=65888.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:10:53,786 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=65899.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:11:13,839 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.863e+01 1.601e+02 1.999e+02 2.596e+02 7.493e+02, threshold=3.998e+02, percent-clipped=4.0 2022-11-16 02:11:14,061 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1781, 2.4999, 2.9371, 3.8183, 3.8478, 3.1581, 2.8554, 4.0402], device='cuda:0'), covar=tensor([0.0482, 0.3329, 0.2631, 0.2619, 0.1303, 0.3109, 0.2454, 0.0521], device='cuda:0'), in_proj_covar=tensor([0.0236, 0.0200, 0.0195, 0.0310, 0.0221, 0.0208, 0.0189, 0.0233], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:11:22,484 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=65941.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:11:27,628 INFO [train.py:876] (0/4) Epoch 10, batch 500, loss[loss=0.121, simple_loss=0.1405, pruned_loss=0.05076, over 5484.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.1478, pruned_loss=0.0496, over 1000215.23 frames. ], batch size: 12, lr: 8.28e-03, grad_scale: 16.0 2022-11-16 02:11:55,177 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=65989.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:11:59,276 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2022-11-16 02:12:01,587 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5278, 1.6588, 1.7244, 1.4307, 1.0683, 2.3864, 1.8969, 1.5245], device='cuda:0'), covar=tensor([0.1501, 0.1305, 0.1717, 0.2570, 0.3200, 0.0513, 0.1223, 0.2501], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0079, 0.0078, 0.0091, 0.0068, 0.0055, 0.0065, 0.0078], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 02:12:01,676 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2935, 2.0916, 2.9705, 2.5395, 2.8492, 2.1322, 2.7837, 3.2319], device='cuda:0'), covar=tensor([0.0755, 0.1519, 0.0713, 0.1236, 0.0828, 0.1469, 0.0987, 0.0765], device='cuda:0'), in_proj_covar=tensor([0.0231, 0.0189, 0.0205, 0.0205, 0.0227, 0.0189, 0.0221, 0.0224], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:12:05,754 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9657, 1.5001, 2.0763, 1.1382, 1.8532, 1.4129, 1.6449, 1.4472], device='cuda:0'), covar=tensor([0.1369, 0.0836, 0.0637, 0.2085, 0.3440, 0.1744, 0.0615, 0.1414], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0020, 0.0013, 0.0017, 0.0014, 0.0013, 0.0018, 0.0013], device='cuda:0'), out_proj_covar=tensor([6.7970e-05, 9.3078e-05, 7.0559e-05, 8.4320e-05, 7.4350e-05, 6.9126e-05, 8.6910e-05, 6.8957e-05], device='cuda:0') 2022-11-16 02:12:09,114 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8997, 2.4373, 2.6963, 3.8450, 3.8059, 2.8415, 2.6731, 3.8219], device='cuda:0'), covar=tensor([0.0638, 0.3120, 0.2840, 0.3730, 0.1264, 0.3552, 0.2705, 0.0691], device='cuda:0'), in_proj_covar=tensor([0.0236, 0.0201, 0.0196, 0.0312, 0.0221, 0.0208, 0.0191, 0.0235], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:12:13,937 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-16 02:12:17,552 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.40 vs. limit=2.0 2022-11-16 02:12:21,979 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.783e+01 1.631e+02 2.001e+02 2.434e+02 4.617e+02, threshold=4.002e+02, percent-clipped=2.0 2022-11-16 02:12:35,742 INFO [train.py:876] (0/4) Epoch 10, batch 600, loss[loss=0.09373, simple_loss=0.115, pruned_loss=0.03624, over 5111.00 frames. ], tot_loss[loss=0.1227, simple_loss=0.1472, pruned_loss=0.04906, over 1034033.97 frames. ], batch size: 7, lr: 8.27e-03, grad_scale: 16.0 2022-11-16 02:13:20,237 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5449, 1.6580, 2.0883, 1.5031, 1.3510, 2.5916, 1.9273, 1.7300], device='cuda:0'), covar=tensor([0.1162, 0.1473, 0.1180, 0.2795, 0.2460, 0.0780, 0.1313, 0.1659], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0079, 0.0077, 0.0090, 0.0067, 0.0055, 0.0065, 0.0078], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 02:13:26,066 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5072, 4.1842, 3.1598, 1.9074, 3.7122, 1.5216, 3.8662, 2.1081], device='cuda:0'), covar=tensor([0.1298, 0.0131, 0.0676, 0.1989, 0.0193, 0.1970, 0.0210, 0.1486], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0103, 0.0113, 0.0113, 0.0103, 0.0122, 0.0100, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:13:27,937 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.503e+01 1.627e+02 2.016e+02 2.685e+02 4.936e+02, threshold=4.031e+02, percent-clipped=2.0 2022-11-16 02:13:36,651 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9550, 0.6152, 0.8088, 0.7524, 0.9203, 0.9459, 0.5268, 0.7797], device='cuda:0'), covar=tensor([0.0282, 0.0401, 0.0364, 0.0530, 0.0323, 0.0268, 0.0787, 0.0361], device='cuda:0'), in_proj_covar=tensor([0.0012, 0.0019, 0.0013, 0.0017, 0.0014, 0.0012, 0.0017, 0.0013], device='cuda:0'), out_proj_covar=tensor([6.6102e-05, 9.0787e-05, 6.7980e-05, 8.1800e-05, 7.2365e-05, 6.6971e-05, 8.4228e-05, 6.6688e-05], device='cuda:0') 2022-11-16 02:13:39,962 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66144.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:13:43,118 INFO [train.py:876] (0/4) Epoch 10, batch 700, loss[loss=0.1361, simple_loss=0.1606, pruned_loss=0.05578, over 5799.00 frames. ], tot_loss[loss=0.1249, simple_loss=0.1489, pruned_loss=0.05044, over 1054336.43 frames. ], batch size: 22, lr: 8.26e-03, grad_scale: 16.0 2022-11-16 02:13:50,242 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66160.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:13:52,229 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7913, 3.5035, 3.6084, 1.9905, 3.0869, 3.9469, 3.8492, 4.2786], device='cuda:0'), covar=tensor([0.1873, 0.1344, 0.0768, 0.2803, 0.0658, 0.0466, 0.0289, 0.0556], device='cuda:0'), in_proj_covar=tensor([0.0170, 0.0182, 0.0165, 0.0188, 0.0177, 0.0194, 0.0159, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004], device='cuda:0') 2022-11-16 02:14:08,605 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66188.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 02:14:12,740 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66194.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:14:21,179 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66205.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:14:31,677 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66221.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:14:35,980 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.700e+01 1.569e+02 1.911e+02 2.505e+02 4.164e+02, threshold=3.822e+02, percent-clipped=1.0 2022-11-16 02:14:41,292 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66236.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:14:50,408 INFO [train.py:876] (0/4) Epoch 10, batch 800, loss[loss=0.108, simple_loss=0.1372, pruned_loss=0.03947, over 5722.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.1494, pruned_loss=0.05143, over 1058925.52 frames. ], batch size: 13, lr: 8.26e-03, grad_scale: 16.0 2022-11-16 02:14:54,450 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1826, 4.2748, 2.8883, 4.0293, 3.2487, 3.0970, 2.3977, 3.6784], device='cuda:0'), covar=tensor([0.1692, 0.0247, 0.1134, 0.0346, 0.0731, 0.1014, 0.1975, 0.0410], device='cuda:0'), in_proj_covar=tensor([0.0161, 0.0139, 0.0160, 0.0143, 0.0175, 0.0171, 0.0169, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 02:14:57,071 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66258.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:15:24,103 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8253, 2.1326, 2.2792, 3.0560, 2.9721, 2.2878, 2.0391, 3.1540], device='cuda:0'), covar=tensor([0.1659, 0.2471, 0.2144, 0.1875, 0.1131, 0.2894, 0.2130, 0.0981], device='cuda:0'), in_proj_covar=tensor([0.0239, 0.0204, 0.0197, 0.0312, 0.0224, 0.0210, 0.0192, 0.0238], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:15:38,170 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66319.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:15:43,767 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.565e+02 1.875e+02 2.256e+02 4.217e+02, threshold=3.749e+02, percent-clipped=3.0 2022-11-16 02:15:57,482 INFO [train.py:876] (0/4) Epoch 10, batch 900, loss[loss=0.09858, simple_loss=0.1259, pruned_loss=0.03565, over 5510.00 frames. ], tot_loss[loss=0.1253, simple_loss=0.1491, pruned_loss=0.05076, over 1069876.76 frames. ], batch size: 12, lr: 8.25e-03, grad_scale: 16.0 2022-11-16 02:16:43,297 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1960, 2.2287, 2.6497, 1.6900, 1.3609, 2.9031, 2.5870, 2.1497], device='cuda:0'), covar=tensor([0.0866, 0.1031, 0.0607, 0.2638, 0.2463, 0.1012, 0.0866, 0.1275], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0077, 0.0076, 0.0088, 0.0067, 0.0054, 0.0064, 0.0076], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0001, 0.0002, 0.0002], device='cuda:0') 2022-11-16 02:16:49,278 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7673, 1.2321, 1.8672, 1.3821, 1.1534, 1.6629, 1.4426, 1.3143], device='cuda:0'), covar=tensor([0.0074, 0.0157, 0.0024, 0.0041, 0.0098, 0.0051, 0.0038, 0.0041], device='cuda:0'), in_proj_covar=tensor([0.0023, 0.0022, 0.0024, 0.0031, 0.0026, 0.0024, 0.0029, 0.0028], device='cuda:0'), out_proj_covar=tensor([2.1662e-05, 2.1278e-05, 2.1443e-05, 3.0012e-05, 2.4153e-05, 2.3353e-05, 2.7552e-05, 2.7717e-05], device='cuda:0') 2022-11-16 02:16:51,649 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.944e+02 2.274e+02 2.934e+02 5.796e+02, threshold=4.548e+02, percent-clipped=10.0 2022-11-16 02:17:05,484 INFO [train.py:876] (0/4) Epoch 10, batch 1000, loss[loss=0.15, simple_loss=0.1551, pruned_loss=0.07245, over 4740.00 frames. ], tot_loss[loss=0.1272, simple_loss=0.1506, pruned_loss=0.05194, over 1077910.62 frames. ], batch size: 136, lr: 8.25e-03, grad_scale: 16.0 2022-11-16 02:17:35,876 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66494.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 02:17:39,665 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66500.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:17:50,080 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66516.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 02:17:53,502 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66521.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:17:58,496 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.268e+01 1.607e+02 1.990e+02 2.689e+02 6.236e+02, threshold=3.979e+02, percent-clipped=3.0 2022-11-16 02:18:08,455 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66542.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:18:13,074 INFO [train.py:876] (0/4) Epoch 10, batch 1100, loss[loss=0.07142, simple_loss=0.1146, pruned_loss=0.01413, over 5555.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1496, pruned_loss=0.05155, over 1077630.93 frames. ], batch size: 16, lr: 8.24e-03, grad_scale: 16.0 2022-11-16 02:18:34,183 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.09 vs. limit=2.0 2022-11-16 02:18:35,127 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66582.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:18:56,850 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66614.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:19:05,749 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.897e+01 1.663e+02 2.010e+02 2.428e+02 5.298e+02, threshold=4.020e+02, percent-clipped=1.0 2022-11-16 02:19:07,156 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4367, 4.0992, 3.0932, 1.8801, 3.7954, 1.2901, 3.6085, 1.9300], device='cuda:0'), covar=tensor([0.1463, 0.0122, 0.0659, 0.2024, 0.0217, 0.2195, 0.0298, 0.1821], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0103, 0.0113, 0.0111, 0.0101, 0.0121, 0.0099, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:19:20,778 INFO [train.py:876] (0/4) Epoch 10, batch 1200, loss[loss=0.1128, simple_loss=0.1416, pruned_loss=0.04198, over 5690.00 frames. ], tot_loss[loss=0.1243, simple_loss=0.1487, pruned_loss=0.04996, over 1081111.50 frames. ], batch size: 15, lr: 8.23e-03, grad_scale: 16.0 2022-11-16 02:19:41,988 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-16 02:19:45,817 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.38 vs. limit=5.0 2022-11-16 02:20:13,038 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.003e+02 1.536e+02 1.899e+02 2.389e+02 5.504e+02, threshold=3.797e+02, percent-clipped=2.0 2022-11-16 02:20:15,157 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66731.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:20:27,601 INFO [train.py:876] (0/4) Epoch 10, batch 1300, loss[loss=0.1312, simple_loss=0.1614, pruned_loss=0.05054, over 5694.00 frames. ], tot_loss[loss=0.1253, simple_loss=0.149, pruned_loss=0.05081, over 1079545.17 frames. ], batch size: 19, lr: 8.23e-03, grad_scale: 16.0 2022-11-16 02:20:56,004 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=66792.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:21:01,743 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66800.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:21:13,042 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66816.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:21:17,621 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4719, 2.1599, 2.4433, 3.4501, 3.4086, 2.6591, 2.3343, 3.5368], device='cuda:0'), covar=tensor([0.0726, 0.2314, 0.2074, 0.2682, 0.1151, 0.2925, 0.2044, 0.0502], device='cuda:0'), in_proj_covar=tensor([0.0236, 0.0199, 0.0193, 0.0310, 0.0223, 0.0206, 0.0188, 0.0233], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:21:20,619 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.546e+02 1.789e+02 2.414e+02 5.527e+02, threshold=3.579e+02, percent-clipped=2.0 2022-11-16 02:21:33,576 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66848.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:21:34,166 INFO [train.py:876] (0/4) Epoch 10, batch 1400, loss[loss=0.1212, simple_loss=0.1542, pruned_loss=0.04413, over 5696.00 frames. ], tot_loss[loss=0.1233, simple_loss=0.1473, pruned_loss=0.04965, over 1080785.68 frames. ], batch size: 28, lr: 8.22e-03, grad_scale: 16.0 2022-11-16 02:21:44,992 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66864.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:21:52,653 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.63 vs. limit=5.0 2022-11-16 02:21:53,525 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=66877.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 02:22:18,750 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=66914.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:22:27,874 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.435e+01 1.629e+02 1.952e+02 2.381e+02 3.716e+02, threshold=3.904e+02, percent-clipped=1.0 2022-11-16 02:22:41,671 INFO [train.py:876] (0/4) Epoch 10, batch 1500, loss[loss=0.1426, simple_loss=0.1544, pruned_loss=0.06541, over 4979.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.1477, pruned_loss=0.04965, over 1083293.92 frames. ], batch size: 110, lr: 8.21e-03, grad_scale: 16.0 2022-11-16 02:22:46,961 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=66957.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:22:50,116 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=66962.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:23:00,129 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-16 02:23:27,991 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67018.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:23:34,295 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 1.625e+02 1.914e+02 2.331e+02 6.825e+02, threshold=3.828e+02, percent-clipped=2.0 2022-11-16 02:23:36,842 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2022-11-16 02:23:43,663 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7610, 2.3518, 3.3487, 2.7912, 3.3992, 2.2916, 3.0351, 3.6401], device='cuda:0'), covar=tensor([0.0860, 0.1465, 0.0971, 0.1843, 0.0801, 0.1690, 0.1262, 0.1191], device='cuda:0'), in_proj_covar=tensor([0.0233, 0.0190, 0.0208, 0.0208, 0.0230, 0.0193, 0.0222, 0.0226], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:23:49,308 INFO [train.py:876] (0/4) Epoch 10, batch 1600, loss[loss=0.1454, simple_loss=0.163, pruned_loss=0.06393, over 5729.00 frames. ], tot_loss[loss=0.1231, simple_loss=0.1475, pruned_loss=0.04935, over 1089787.11 frames. ], batch size: 31, lr: 8.21e-03, grad_scale: 16.0 2022-11-16 02:24:11,988 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67083.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:24:14,888 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=67087.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:24:41,918 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.648e+02 1.940e+02 2.386e+02 4.578e+02, threshold=3.880e+02, percent-clipped=4.0 2022-11-16 02:24:53,230 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67144.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:24:56,732 INFO [train.py:876] (0/4) Epoch 10, batch 1700, loss[loss=0.08277, simple_loss=0.1214, pruned_loss=0.02205, over 5467.00 frames. ], tot_loss[loss=0.1237, simple_loss=0.1476, pruned_loss=0.04989, over 1091428.00 frames. ], batch size: 12, lr: 8.20e-03, grad_scale: 16.0 2022-11-16 02:25:15,541 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=67177.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:25:21,844 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.1536, 4.6298, 4.9173, 4.6749, 5.2477, 5.1311, 4.5023, 5.2295], device='cuda:0'), covar=tensor([0.0333, 0.0271, 0.0425, 0.0272, 0.0302, 0.0140, 0.0221, 0.0228], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0141, 0.0103, 0.0138, 0.0162, 0.0095, 0.0116, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 02:25:33,586 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4941, 2.5904, 2.4013, 2.7985, 2.2264, 2.1744, 2.3010, 2.9708], device='cuda:0'), covar=tensor([0.1004, 0.1621, 0.1945, 0.0751, 0.1711, 0.1080, 0.1712, 0.0831], device='cuda:0'), in_proj_covar=tensor([0.0099, 0.0097, 0.0101, 0.0092, 0.0089, 0.0093, 0.0096, 0.0072], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:25:48,250 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=67225.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:25:50,860 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.967e+01 1.444e+02 1.809e+02 2.360e+02 5.215e+02, threshold=3.618e+02, percent-clipped=3.0 2022-11-16 02:26:02,432 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9533, 2.5543, 2.8670, 3.8395, 4.0447, 3.3234, 2.8798, 3.8624], device='cuda:0'), covar=tensor([0.0751, 0.2622, 0.2517, 0.3738, 0.1002, 0.2812, 0.2039, 0.0921], device='cuda:0'), in_proj_covar=tensor([0.0239, 0.0198, 0.0191, 0.0310, 0.0222, 0.0205, 0.0189, 0.0234], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:26:04,153 INFO [train.py:876] (0/4) Epoch 10, batch 1800, loss[loss=0.09895, simple_loss=0.1318, pruned_loss=0.03308, over 5455.00 frames. ], tot_loss[loss=0.125, simple_loss=0.149, pruned_loss=0.05052, over 1089222.95 frames. ], batch size: 12, lr: 8.20e-03, grad_scale: 16.0 2022-11-16 02:26:19,404 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2307, 4.8898, 4.3761, 4.0939, 2.3752, 4.9156, 2.5644, 4.3749], device='cuda:0'), covar=tensor([0.0345, 0.0192, 0.0176, 0.0317, 0.0655, 0.0136, 0.0546, 0.0071], device='cuda:0'), in_proj_covar=tensor([0.0188, 0.0165, 0.0175, 0.0194, 0.0186, 0.0172, 0.0185, 0.0175], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:26:44,371 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2502, 1.3349, 1.2211, 1.0742, 1.2203, 1.3078, 0.9417, 1.1856], device='cuda:0'), covar=tensor([0.0063, 0.0053, 0.0045, 0.0054, 0.0048, 0.0039, 0.0054, 0.0062], device='cuda:0'), in_proj_covar=tensor([0.0054, 0.0047, 0.0050, 0.0050, 0.0050, 0.0045, 0.0047, 0.0042], device='cuda:0'), out_proj_covar=tensor([4.8733e-05, 4.2582e-05, 4.4225e-05, 4.5177e-05, 4.4424e-05, 3.9023e-05, 4.2443e-05, 3.6968e-05], device='cuda:0') 2022-11-16 02:26:47,956 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=67313.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 02:26:58,162 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.612e+02 1.959e+02 2.585e+02 8.694e+02, threshold=3.917e+02, percent-clipped=8.0 2022-11-16 02:27:11,096 INFO [train.py:876] (0/4) Epoch 10, batch 1900, loss[loss=0.09851, simple_loss=0.139, pruned_loss=0.02901, over 5675.00 frames. ], tot_loss[loss=0.1236, simple_loss=0.1482, pruned_loss=0.04946, over 1088290.60 frames. ], batch size: 11, lr: 8.19e-03, grad_scale: 16.0 2022-11-16 02:27:28,511 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.81 vs. limit=2.0 2022-11-16 02:27:37,293 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=67387.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:28:05,971 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.608e+02 1.927e+02 2.290e+02 4.521e+02, threshold=3.854e+02, percent-clipped=3.0 2022-11-16 02:28:10,016 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=67435.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:28:12,563 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=67439.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:28:19,257 INFO [train.py:876] (0/4) Epoch 10, batch 2000, loss[loss=0.114, simple_loss=0.1461, pruned_loss=0.04096, over 5728.00 frames. ], tot_loss[loss=0.1244, simple_loss=0.1487, pruned_loss=0.05004, over 1082592.71 frames. ], batch size: 31, lr: 8.18e-03, grad_scale: 16.0 2022-11-16 02:28:28,357 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2022-11-16 02:28:47,091 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.59 vs. limit=5.0 2022-11-16 02:28:50,734 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0492, 3.3406, 2.5715, 1.6504, 3.2133, 1.2121, 3.0516, 1.6445], device='cuda:0'), covar=tensor([0.1609, 0.0185, 0.0990, 0.2067, 0.0272, 0.2373, 0.0352, 0.1694], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0104, 0.0113, 0.0113, 0.0103, 0.0123, 0.0100, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:29:01,716 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.7016, 5.1370, 5.4849, 5.1843, 5.7657, 5.7062, 4.6941, 5.7033], device='cuda:0'), covar=tensor([0.0285, 0.0299, 0.0313, 0.0292, 0.0276, 0.0097, 0.0223, 0.0224], device='cuda:0'), in_proj_covar=tensor([0.0131, 0.0141, 0.0104, 0.0136, 0.0161, 0.0095, 0.0115, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 02:29:02,486 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67513.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:29:14,260 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.999e+01 1.503e+02 1.762e+02 2.315e+02 5.487e+02, threshold=3.525e+02, percent-clipped=4.0 2022-11-16 02:29:15,856 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2022-11-16 02:29:27,324 INFO [train.py:876] (0/4) Epoch 10, batch 2100, loss[loss=0.1456, simple_loss=0.1539, pruned_loss=0.06865, over 4997.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.15, pruned_loss=0.0511, over 1074968.02 frames. ], batch size: 109, lr: 8.18e-03, grad_scale: 16.0 2022-11-16 02:29:44,913 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67574.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 02:29:45,241 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.04 vs. limit=2.0 2022-11-16 02:30:10,748 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=67613.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:30:20,668 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4218, 1.1444, 1.2655, 1.0065, 1.1433, 1.5140, 0.8244, 1.4648], device='cuda:0'), covar=tensor([0.1069, 0.0715, 0.0901, 0.0779, 0.1925, 0.0865, 0.2949, 0.0397], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0020, 0.0013, 0.0017, 0.0014, 0.0012, 0.0018, 0.0013], device='cuda:0'), out_proj_covar=tensor([6.7964e-05, 9.3581e-05, 6.8876e-05, 8.2454e-05, 7.3821e-05, 6.7773e-05, 8.5485e-05, 6.6604e-05], device='cuda:0') 2022-11-16 02:30:21,801 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.895e+01 1.533e+02 1.917e+02 2.461e+02 4.676e+02, threshold=3.833e+02, percent-clipped=3.0 2022-11-16 02:30:35,258 INFO [train.py:876] (0/4) Epoch 10, batch 2200, loss[loss=0.09926, simple_loss=0.1329, pruned_loss=0.03279, over 5539.00 frames. ], tot_loss[loss=0.1248, simple_loss=0.149, pruned_loss=0.05029, over 1079811.35 frames. ], batch size: 40, lr: 8.17e-03, grad_scale: 16.0 2022-11-16 02:30:39,352 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8443, 1.2072, 1.5348, 0.8620, 1.5975, 1.6073, 1.0353, 1.3931], device='cuda:0'), covar=tensor([0.0383, 0.0482, 0.0379, 0.0993, 0.0468, 0.0911, 0.0673, 0.0330], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0020, 0.0013, 0.0017, 0.0014, 0.0012, 0.0018, 0.0012], device='cuda:0'), out_proj_covar=tensor([6.7755e-05, 9.3272e-05, 6.8753e-05, 8.2107e-05, 7.3487e-05, 6.7493e-05, 8.5326e-05, 6.6337e-05], device='cuda:0') 2022-11-16 02:30:43,330 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=67661.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:30:49,276 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9921, 4.5024, 4.0429, 4.4890, 4.5366, 3.8672, 4.0971, 3.9460], device='cuda:0'), covar=tensor([0.0604, 0.0523, 0.1515, 0.0521, 0.0481, 0.0548, 0.0643, 0.0660], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0172, 0.0264, 0.0166, 0.0211, 0.0168, 0.0181, 0.0164], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:30:53,661 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5028, 1.5227, 1.4901, 1.0611, 1.2412, 1.4543, 1.3599, 1.4635], device='cuda:0'), covar=tensor([0.0053, 0.0045, 0.0050, 0.0063, 0.0045, 0.0040, 0.0059, 0.0059], device='cuda:0'), in_proj_covar=tensor([0.0052, 0.0047, 0.0048, 0.0049, 0.0049, 0.0044, 0.0046, 0.0042], device='cuda:0'), out_proj_covar=tensor([4.7524e-05, 4.2093e-05, 4.3189e-05, 4.4318e-05, 4.3376e-05, 3.8297e-05, 4.1734e-05, 3.6676e-05], device='cuda:0') 2022-11-16 02:30:56,577 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67680.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:31:04,409 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67692.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:31:09,317 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4891, 4.0538, 3.5955, 3.9496, 4.0571, 3.4050, 3.5285, 3.4799], device='cuda:0'), covar=tensor([0.0856, 0.0462, 0.1522, 0.0550, 0.0426, 0.0515, 0.0673, 0.0656], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0173, 0.0267, 0.0168, 0.0212, 0.0169, 0.0182, 0.0166], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:31:28,761 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.008e+02 1.628e+02 1.848e+02 2.167e+02 3.547e+02, threshold=3.696e+02, percent-clipped=0.0 2022-11-16 02:31:30,963 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9483, 4.3028, 3.8704, 3.6214, 2.2378, 4.2298, 2.3071, 3.5805], device='cuda:0'), covar=tensor([0.0399, 0.0139, 0.0224, 0.0366, 0.0636, 0.0141, 0.0521, 0.0140], device='cuda:0'), in_proj_covar=tensor([0.0189, 0.0165, 0.0173, 0.0196, 0.0185, 0.0173, 0.0184, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:31:36,133 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=67739.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:31:37,513 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67741.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:31:42,387 INFO [train.py:876] (0/4) Epoch 10, batch 2300, loss[loss=0.153, simple_loss=0.1669, pruned_loss=0.0696, over 5574.00 frames. ], tot_loss[loss=0.1243, simple_loss=0.1485, pruned_loss=0.05007, over 1079967.73 frames. ], batch size: 25, lr: 8.17e-03, grad_scale: 16.0 2022-11-16 02:31:45,478 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=67753.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:31:47,403 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8696, 1.9765, 1.8958, 1.3096, 1.6025, 1.8728, 1.3944, 1.4018], device='cuda:0'), covar=tensor([0.0030, 0.0030, 0.0056, 0.0045, 0.0039, 0.0057, 0.0031, 0.0037], device='cuda:0'), in_proj_covar=tensor([0.0024, 0.0023, 0.0023, 0.0031, 0.0026, 0.0024, 0.0028, 0.0028], device='cuda:0'), out_proj_covar=tensor([2.1757e-05, 2.1760e-05, 2.1178e-05, 3.0059e-05, 2.4426e-05, 2.3449e-05, 2.7041e-05, 2.7940e-05], device='cuda:0') 2022-11-16 02:32:00,444 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.63 vs. limit=5.0 2022-11-16 02:32:08,278 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=67787.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:32:33,131 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9405, 3.0459, 2.2153, 2.5902, 1.8819, 2.3893, 1.7578, 2.5866], device='cuda:0'), covar=tensor([0.1152, 0.0277, 0.0968, 0.0550, 0.1620, 0.0852, 0.1786, 0.0468], device='cuda:0'), in_proj_covar=tensor([0.0159, 0.0140, 0.0162, 0.0146, 0.0177, 0.0172, 0.0169, 0.0160], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:32:36,227 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.663e+01 1.643e+02 1.965e+02 2.550e+02 4.357e+02, threshold=3.931e+02, percent-clipped=5.0 2022-11-16 02:32:50,269 INFO [train.py:876] (0/4) Epoch 10, batch 2400, loss[loss=0.1226, simple_loss=0.1483, pruned_loss=0.04844, over 5618.00 frames. ], tot_loss[loss=0.1251, simple_loss=0.1487, pruned_loss=0.05077, over 1076047.92 frames. ], batch size: 32, lr: 8.16e-03, grad_scale: 16.0 2022-11-16 02:33:03,305 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=67869.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:33:43,934 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.957e+01 1.533e+02 1.852e+02 2.262e+02 4.255e+02, threshold=3.703e+02, percent-clipped=1.0 2022-11-16 02:33:55,825 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.1397, 4.5584, 4.9481, 4.6246, 5.1784, 5.0679, 4.5791, 5.1719], device='cuda:0'), covar=tensor([0.0325, 0.0323, 0.0363, 0.0275, 0.0357, 0.0159, 0.0247, 0.0233], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0141, 0.0104, 0.0137, 0.0163, 0.0096, 0.0117, 0.0140], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 02:33:58,900 INFO [train.py:876] (0/4) Epoch 10, batch 2500, loss[loss=0.1095, simple_loss=0.1351, pruned_loss=0.04196, over 5721.00 frames. ], tot_loss[loss=0.1249, simple_loss=0.1492, pruned_loss=0.05028, over 1083604.05 frames. ], batch size: 15, lr: 8.15e-03, grad_scale: 16.0 2022-11-16 02:34:22,081 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=67981.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:34:29,020 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.43 vs. limit=5.0 2022-11-16 02:34:54,543 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.495e+02 1.844e+02 2.229e+02 3.731e+02, threshold=3.687e+02, percent-clipped=1.0 2022-11-16 02:34:59,151 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68036.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:35:03,040 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68042.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:35:06,808 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68048.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 02:35:07,301 INFO [train.py:876] (0/4) Epoch 10, batch 2600, loss[loss=0.1459, simple_loss=0.1652, pruned_loss=0.06327, over 5738.00 frames. ], tot_loss[loss=0.1246, simple_loss=0.1493, pruned_loss=0.04998, over 1088927.18 frames. ], batch size: 20, lr: 8.15e-03, grad_scale: 16.0 2022-11-16 02:35:33,579 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=68087.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:36:01,565 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.022e+02 1.584e+02 1.812e+02 2.334e+02 4.676e+02, threshold=3.625e+02, percent-clipped=1.0 2022-11-16 02:36:14,135 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68148.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:36:14,628 INFO [train.py:876] (0/4) Epoch 10, batch 2700, loss[loss=0.1318, simple_loss=0.1615, pruned_loss=0.05104, over 5711.00 frames. ], tot_loss[loss=0.123, simple_loss=0.1481, pruned_loss=0.04897, over 1085608.34 frames. ], batch size: 36, lr: 8.14e-03, grad_scale: 16.0 2022-11-16 02:36:28,376 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.27 vs. limit=2.0 2022-11-16 02:36:28,845 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68169.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:37:00,602 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68217.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:37:09,358 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.044e+01 1.607e+02 1.920e+02 2.533e+02 3.834e+02, threshold=3.840e+02, percent-clipped=3.0 2022-11-16 02:37:17,787 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6813, 2.0466, 1.7536, 1.2985, 1.8831, 2.2021, 1.9565, 2.1368], device='cuda:0'), covar=tensor([0.1869, 0.1497, 0.1722, 0.2892, 0.1157, 0.1008, 0.0805, 0.1276], device='cuda:0'), in_proj_covar=tensor([0.0173, 0.0185, 0.0166, 0.0188, 0.0180, 0.0197, 0.0167, 0.0188], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:37:22,791 INFO [train.py:876] (0/4) Epoch 10, batch 2800, loss[loss=0.07558, simple_loss=0.1068, pruned_loss=0.02217, over 5307.00 frames. ], tot_loss[loss=0.123, simple_loss=0.1478, pruned_loss=0.04908, over 1086622.13 frames. ], batch size: 9, lr: 8.14e-03, grad_scale: 16.0 2022-11-16 02:37:29,288 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9349, 1.5432, 1.4751, 0.9587, 1.6794, 2.3517, 1.4441, 1.5609], device='cuda:0'), covar=tensor([0.0593, 0.1076, 0.1695, 0.2204, 0.2704, 0.0225, 0.1369, 0.0621], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0020, 0.0013, 0.0017, 0.0014, 0.0013, 0.0018, 0.0013], device='cuda:0'), out_proj_covar=tensor([6.9409e-05, 9.4780e-05, 7.0793e-05, 8.4960e-05, 7.4858e-05, 6.9512e-05, 8.7875e-05, 6.9293e-05], device='cuda:0') 2022-11-16 02:37:55,817 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2022-11-16 02:38:11,850 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2022-11-16 02:38:16,586 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.672e+01 1.605e+02 1.834e+02 2.410e+02 3.703e+02, threshold=3.668e+02, percent-clipped=0.0 2022-11-16 02:38:21,177 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4278, 2.1423, 2.9833, 2.5839, 2.9184, 2.1883, 2.9085, 3.3448], device='cuda:0'), covar=tensor([0.0840, 0.1576, 0.1022, 0.1801, 0.0849, 0.1529, 0.1226, 0.1075], device='cuda:0'), in_proj_covar=tensor([0.0233, 0.0191, 0.0207, 0.0208, 0.0231, 0.0188, 0.0222, 0.0224], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:38:21,759 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68336.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:38:22,373 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68337.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:38:25,740 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8285, 1.4621, 1.8512, 1.7134, 1.9148, 1.2066, 1.7141, 1.9606], device='cuda:0'), covar=tensor([0.0317, 0.0861, 0.0335, 0.0367, 0.0369, 0.0807, 0.0455, 0.0301], device='cuda:0'), in_proj_covar=tensor([0.0234, 0.0191, 0.0208, 0.0208, 0.0231, 0.0188, 0.0223, 0.0225], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:38:30,000 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68348.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:38:30,506 INFO [train.py:876] (0/4) Epoch 10, batch 2900, loss[loss=0.109, simple_loss=0.1414, pruned_loss=0.03827, over 5545.00 frames. ], tot_loss[loss=0.1244, simple_loss=0.1485, pruned_loss=0.05012, over 1088584.96 frames. ], batch size: 14, lr: 8.13e-03, grad_scale: 16.0 2022-11-16 02:38:38,422 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4685, 1.8274, 1.5595, 1.3472, 1.4907, 1.3682, 1.3504, 1.6033], device='cuda:0'), covar=tensor([0.0047, 0.0048, 0.0044, 0.0044, 0.0043, 0.0040, 0.0040, 0.0039], device='cuda:0'), in_proj_covar=tensor([0.0053, 0.0048, 0.0049, 0.0050, 0.0049, 0.0045, 0.0046, 0.0042], device='cuda:0'), out_proj_covar=tensor([4.7966e-05, 4.2932e-05, 4.4064e-05, 4.4677e-05, 4.3810e-05, 3.8852e-05, 4.1526e-05, 3.6667e-05], device='cuda:0') 2022-11-16 02:38:39,083 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.95 vs. limit=5.0 2022-11-16 02:38:53,224 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68384.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:39:02,028 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68396.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:39:04,131 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=68399.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:39:23,978 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.585e+02 1.960e+02 2.484e+02 4.720e+02, threshold=3.919e+02, percent-clipped=5.0 2022-11-16 02:39:25,435 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8172, 2.8364, 2.5162, 2.9589, 2.4324, 2.6171, 2.6940, 3.3763], device='cuda:0'), covar=tensor([0.1309, 0.1306, 0.2417, 0.1286, 0.1636, 0.1429, 0.1396, 0.3145], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0099, 0.0102, 0.0094, 0.0089, 0.0096, 0.0097, 0.0074], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:39:33,256 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68443.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:39:38,224 INFO [train.py:876] (0/4) Epoch 10, batch 3000, loss[loss=0.1917, simple_loss=0.1814, pruned_loss=0.101, over 3137.00 frames. ], tot_loss[loss=0.1243, simple_loss=0.1485, pruned_loss=0.05004, over 1083483.10 frames. ], batch size: 284, lr: 8.12e-03, grad_scale: 16.0 2022-11-16 02:39:38,225 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 02:39:56,213 INFO [train.py:908] (0/4) Epoch 10, validation: loss=0.1681, simple_loss=0.1842, pruned_loss=0.07602, over 1530663.00 frames. 2022-11-16 02:39:56,213 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-16 02:40:03,623 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68460.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:40:49,560 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.217e+01 1.619e+02 2.006e+02 2.438e+02 5.141e+02, threshold=4.012e+02, percent-clipped=2.0 2022-11-16 02:40:49,775 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9648, 2.5111, 2.3225, 1.4213, 2.6908, 2.8371, 2.5740, 2.9479], device='cuda:0'), covar=tensor([0.1839, 0.1667, 0.1653, 0.2763, 0.0622, 0.0923, 0.0545, 0.0884], device='cuda:0'), in_proj_covar=tensor([0.0174, 0.0188, 0.0165, 0.0189, 0.0181, 0.0198, 0.0169, 0.0188], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:41:02,560 INFO [train.py:876] (0/4) Epoch 10, batch 3100, loss[loss=0.1034, simple_loss=0.1409, pruned_loss=0.03293, over 5770.00 frames. ], tot_loss[loss=0.1241, simple_loss=0.1484, pruned_loss=0.04989, over 1094143.07 frames. ], batch size: 16, lr: 8.12e-03, grad_scale: 16.0 2022-11-16 02:41:22,506 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=68578.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:41:57,045 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.658e+01 1.525e+02 1.984e+02 2.613e+02 4.758e+02, threshold=3.969e+02, percent-clipped=4.0 2022-11-16 02:42:02,892 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68637.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:42:04,170 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68639.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:42:10,467 INFO [train.py:876] (0/4) Epoch 10, batch 3200, loss[loss=0.1434, simple_loss=0.1679, pruned_loss=0.05941, over 5613.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1481, pruned_loss=0.04981, over 1083838.95 frames. ], batch size: 29, lr: 8.11e-03, grad_scale: 16.0 2022-11-16 02:42:18,637 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0 2022-11-16 02:42:35,038 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68685.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:42:35,133 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=68685.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:42:39,688 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0655, 3.7827, 3.9222, 3.9988, 4.0845, 3.8455, 1.6399, 4.2254], device='cuda:0'), covar=tensor([0.0302, 0.0775, 0.0312, 0.0248, 0.0386, 0.0419, 0.3367, 0.0288], device='cuda:0'), in_proj_covar=tensor([0.0102, 0.0085, 0.0085, 0.0076, 0.0101, 0.0087, 0.0129, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:42:40,811 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2479, 3.4452, 2.5766, 1.7595, 3.2437, 1.2995, 3.3139, 1.7893], device='cuda:0'), covar=tensor([0.1517, 0.0192, 0.0935, 0.2035, 0.0258, 0.2442, 0.0257, 0.1824], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0103, 0.0112, 0.0113, 0.0101, 0.0123, 0.0100, 0.0112], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:43:04,535 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.562e+02 1.856e+02 2.201e+02 3.564e+02, threshold=3.712e+02, percent-clipped=0.0 2022-11-16 02:43:14,420 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=68743.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 02:43:16,478 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=68746.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:43:18,251 INFO [train.py:876] (0/4) Epoch 10, batch 3300, loss[loss=0.1126, simple_loss=0.1285, pruned_loss=0.04829, over 5119.00 frames. ], tot_loss[loss=0.1229, simple_loss=0.1475, pruned_loss=0.04912, over 1087614.99 frames. ], batch size: 91, lr: 8.11e-03, grad_scale: 16.0 2022-11-16 02:43:22,184 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68755.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:43:30,763 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4844, 1.6645, 2.1185, 1.4206, 1.0287, 2.4675, 2.1219, 1.7558], device='cuda:0'), covar=tensor([0.1290, 0.1313, 0.0821, 0.2370, 0.3104, 0.0818, 0.0944, 0.1366], device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0081, 0.0080, 0.0091, 0.0069, 0.0061, 0.0067, 0.0078], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 02:43:30,827 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8771, 2.1879, 2.3523, 3.2036, 3.1450, 2.4871, 1.9434, 3.2381], device='cuda:0'), covar=tensor([0.1529, 0.2483, 0.2072, 0.1937, 0.1252, 0.2902, 0.2353, 0.1048], device='cuda:0'), in_proj_covar=tensor([0.0241, 0.0203, 0.0193, 0.0312, 0.0227, 0.0209, 0.0192, 0.0239], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:43:46,718 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=68791.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:44:02,311 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5960, 1.8842, 1.5673, 1.1905, 1.8357, 0.9710, 1.9197, 1.1628], device='cuda:0'), covar=tensor([0.0847, 0.0271, 0.0858, 0.1108, 0.0306, 0.1820, 0.0310, 0.1166], device='cuda:0'), in_proj_covar=tensor([0.0123, 0.0105, 0.0113, 0.0114, 0.0102, 0.0124, 0.0101, 0.0113], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:44:05,252 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.35 vs. limit=2.0 2022-11-16 02:44:11,791 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.90 vs. limit=5.0 2022-11-16 02:44:12,127 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.117e+01 1.492e+02 1.907e+02 2.389e+02 5.299e+02, threshold=3.813e+02, percent-clipped=1.0 2022-11-16 02:44:25,991 INFO [train.py:876] (0/4) Epoch 10, batch 3400, loss[loss=0.1664, simple_loss=0.1828, pruned_loss=0.07497, over 5169.00 frames. ], tot_loss[loss=0.123, simple_loss=0.1476, pruned_loss=0.04922, over 1082532.35 frames. ], batch size: 91, lr: 8.10e-03, grad_scale: 16.0 2022-11-16 02:44:33,517 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2022-11-16 02:44:42,250 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3941, 4.0064, 3.0823, 1.7548, 3.8220, 1.6920, 3.8064, 1.9619], device='cuda:0'), covar=tensor([0.1491, 0.0150, 0.0662, 0.2357, 0.0181, 0.1923, 0.0254, 0.1716], device='cuda:0'), in_proj_covar=tensor([0.0124, 0.0105, 0.0114, 0.0115, 0.0102, 0.0124, 0.0100, 0.0113], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:45:20,579 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.200e+01 1.439e+02 1.799e+02 2.243e+02 3.372e+02, threshold=3.599e+02, percent-clipped=0.0 2022-11-16 02:45:24,301 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=68934.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:45:34,200 INFO [train.py:876] (0/4) Epoch 10, batch 3500, loss[loss=0.1326, simple_loss=0.1463, pruned_loss=0.05942, over 5654.00 frames. ], tot_loss[loss=0.1233, simple_loss=0.1475, pruned_loss=0.04958, over 1079691.34 frames. ], batch size: 32, lr: 8.10e-03, grad_scale: 16.0 2022-11-16 02:46:28,137 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.544e+01 1.634e+02 1.923e+02 2.372e+02 5.246e+02, threshold=3.846e+02, percent-clipped=2.0 2022-11-16 02:46:36,713 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69041.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:46:38,119 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69043.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:46:41,867 INFO [train.py:876] (0/4) Epoch 10, batch 3600, loss[loss=0.1272, simple_loss=0.1508, pruned_loss=0.05181, over 5575.00 frames. ], tot_loss[loss=0.1218, simple_loss=0.1464, pruned_loss=0.04863, over 1076909.92 frames. ], batch size: 43, lr: 8.09e-03, grad_scale: 16.0 2022-11-16 02:46:42,230 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.73 vs. limit=5.0 2022-11-16 02:46:46,001 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69055.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:47:18,336 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69103.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:47:19,100 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69104.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:47:35,521 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.617e+01 1.458e+02 1.877e+02 2.187e+02 3.990e+02, threshold=3.754e+02, percent-clipped=1.0 2022-11-16 02:47:49,300 INFO [train.py:876] (0/4) Epoch 10, batch 3700, loss[loss=0.1204, simple_loss=0.1522, pruned_loss=0.04427, over 5744.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1479, pruned_loss=0.04996, over 1081773.69 frames. ], batch size: 31, lr: 8.08e-03, grad_scale: 32.0 2022-11-16 02:47:58,269 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69162.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:48:02,160 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69168.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:48:22,741 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3994, 4.4874, 3.8752, 4.3459, 4.4715, 4.3080, 2.1676, 4.5392], device='cuda:0'), covar=tensor([0.0227, 0.0220, 0.0400, 0.0320, 0.0208, 0.0337, 0.2811, 0.0270], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0085, 0.0085, 0.0078, 0.0101, 0.0088, 0.0129, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:48:40,040 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69223.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:48:41,330 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69225.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:48:43,926 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69229.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:48:44,356 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.159e+01 1.529e+02 2.089e+02 2.315e+02 4.555e+02, threshold=4.177e+02, percent-clipped=1.0 2022-11-16 02:48:47,071 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69234.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:48:54,611 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69245.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:48:57,844 INFO [train.py:876] (0/4) Epoch 10, batch 3800, loss[loss=0.1105, simple_loss=0.144, pruned_loss=0.03846, over 5733.00 frames. ], tot_loss[loss=0.1231, simple_loss=0.1473, pruned_loss=0.04947, over 1082490.16 frames. ], batch size: 20, lr: 8.08e-03, grad_scale: 16.0 2022-11-16 02:49:14,255 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69273.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:49:18,073 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.1218, 4.6604, 4.8641, 4.6414, 5.2536, 5.0786, 4.4879, 5.1826], device='cuda:0'), covar=tensor([0.0317, 0.0270, 0.0400, 0.0289, 0.0254, 0.0191, 0.0252, 0.0259], device='cuda:0'), in_proj_covar=tensor([0.0136, 0.0142, 0.0103, 0.0136, 0.0163, 0.0098, 0.0118, 0.0144], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 02:49:20,032 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69282.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:49:22,745 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69286.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:49:36,639 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69306.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:49:45,837 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69320.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:49:53,217 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.561e+02 1.843e+02 2.119e+02 2.947e+02, threshold=3.686e+02, percent-clipped=0.0 2022-11-16 02:49:55,290 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69334.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 02:49:57,215 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7497, 2.8328, 2.9805, 2.6578, 2.9186, 2.8569, 1.2170, 2.9969], device='cuda:0'), covar=tensor([0.0317, 0.0403, 0.0280, 0.0338, 0.0364, 0.0336, 0.2876, 0.0334], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0087, 0.0085, 0.0079, 0.0102, 0.0088, 0.0130, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:49:57,308 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6173, 2.1483, 1.6410, 1.2140, 1.7996, 2.2567, 1.8921, 2.3761], device='cuda:0'), covar=tensor([0.1647, 0.1358, 0.1771, 0.2609, 0.1176, 0.1006, 0.0655, 0.1141], device='cuda:0'), in_proj_covar=tensor([0.0172, 0.0186, 0.0163, 0.0187, 0.0182, 0.0200, 0.0168, 0.0188], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:49:59,877 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69341.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:50:05,017 INFO [train.py:876] (0/4) Epoch 10, batch 3900, loss[loss=0.1236, simple_loss=0.1469, pruned_loss=0.05016, over 5238.00 frames. ], tot_loss[loss=0.1216, simple_loss=0.146, pruned_loss=0.04859, over 1085292.57 frames. ], batch size: 79, lr: 8.07e-03, grad_scale: 8.0 2022-11-16 02:50:18,578 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69368.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:50:27,396 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69381.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:50:32,634 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69389.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:50:36,062 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69394.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:50:39,284 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69399.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:50:54,223 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-16 02:51:00,411 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69429.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:51:01,539 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.631e+02 1.973e+02 2.599e+02 5.372e+02, threshold=3.946e+02, percent-clipped=3.0 2022-11-16 02:51:06,210 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3506, 2.7328, 2.3923, 2.5870, 2.2521, 1.9579, 2.5107, 2.9190], device='cuda:0'), covar=tensor([0.1514, 0.1178, 0.1996, 0.1370, 0.1576, 0.1720, 0.1460, 0.1686], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0098, 0.0101, 0.0094, 0.0088, 0.0095, 0.0094, 0.0074], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:51:13,627 INFO [train.py:876] (0/4) Epoch 10, batch 4000, loss[loss=0.08566, simple_loss=0.1189, pruned_loss=0.02619, over 5694.00 frames. ], tot_loss[loss=0.1217, simple_loss=0.1464, pruned_loss=0.04845, over 1091980.85 frames. ], batch size: 19, lr: 8.07e-03, grad_scale: 8.0 2022-11-16 02:51:13,702 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.7337, 5.2840, 5.4538, 5.1225, 5.8778, 5.6613, 4.8740, 5.8095], device='cuda:0'), covar=tensor([0.0403, 0.0305, 0.0492, 0.0318, 0.0330, 0.0202, 0.0254, 0.0223], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0141, 0.0103, 0.0135, 0.0163, 0.0097, 0.0117, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 02:51:17,619 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69455.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:51:59,806 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69518.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:52:03,558 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69524.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:52:08,748 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.549e+01 1.604e+02 1.884e+02 2.288e+02 4.952e+02, threshold=3.768e+02, percent-clipped=2.0 2022-11-16 02:52:20,881 INFO [train.py:876] (0/4) Epoch 10, batch 4100, loss[loss=0.156, simple_loss=0.1654, pruned_loss=0.07329, over 5601.00 frames. ], tot_loss[loss=0.1219, simple_loss=0.1469, pruned_loss=0.04843, over 1093042.94 frames. ], batch size: 50, lr: 8.06e-03, grad_scale: 8.0 2022-11-16 02:52:33,586 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2022-11-16 02:52:42,295 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69581.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:52:56,100 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69601.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 02:53:14,377 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69629.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:53:15,586 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.609e+02 2.045e+02 2.634e+02 5.309e+02, threshold=4.090e+02, percent-clipped=7.0 2022-11-16 02:53:28,115 INFO [train.py:876] (0/4) Epoch 10, batch 4200, loss[loss=0.079, simple_loss=0.1189, pruned_loss=0.01954, over 5749.00 frames. ], tot_loss[loss=0.1226, simple_loss=0.1474, pruned_loss=0.04895, over 1091439.36 frames. ], batch size: 15, lr: 8.05e-03, grad_scale: 8.0 2022-11-16 02:53:45,416 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=69674.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:53:45,526 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-16 02:53:46,616 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69676.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:53:49,260 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2273, 1.7701, 1.9557, 1.8675, 1.5031, 2.1835, 2.0739, 1.9195], device='cuda:0'), covar=tensor([0.0024, 0.0039, 0.0048, 0.0045, 0.0090, 0.0059, 0.0035, 0.0035], device='cuda:0'), in_proj_covar=tensor([0.0024, 0.0022, 0.0023, 0.0030, 0.0026, 0.0024, 0.0029, 0.0028], device='cuda:0'), out_proj_covar=tensor([2.1989e-05, 2.1301e-05, 2.0812e-05, 2.9502e-05, 2.4768e-05, 2.2775e-05, 2.7521e-05, 2.7941e-05], device='cuda:0') 2022-11-16 02:54:01,867 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69699.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:54:18,351 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69724.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:54:23,253 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.003e+02 1.585e+02 1.904e+02 2.422e+02 5.734e+02, threshold=3.808e+02, percent-clipped=2.0 2022-11-16 02:54:26,115 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=69735.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 02:54:34,704 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69747.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:54:36,044 INFO [train.py:876] (0/4) Epoch 10, batch 4300, loss[loss=0.1454, simple_loss=0.1743, pruned_loss=0.05825, over 5625.00 frames. ], tot_loss[loss=0.1229, simple_loss=0.1474, pruned_loss=0.04922, over 1084272.31 frames. ], batch size: 29, lr: 8.05e-03, grad_scale: 8.0 2022-11-16 02:54:37,120 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=69750.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:54:54,465 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3877, 1.5764, 1.3926, 1.1905, 1.2277, 1.4584, 1.1570, 0.8392], device='cuda:0'), covar=tensor([0.0031, 0.0025, 0.0029, 0.0041, 0.0052, 0.0036, 0.0044, 0.0052], device='cuda:0'), in_proj_covar=tensor([0.0024, 0.0022, 0.0023, 0.0030, 0.0026, 0.0023, 0.0029, 0.0028], device='cuda:0'), out_proj_covar=tensor([2.1791e-05, 2.1237e-05, 2.0667e-05, 2.9430e-05, 2.4529e-05, 2.2375e-05, 2.7472e-05, 2.8017e-05], device='cuda:0') 2022-11-16 02:55:23,530 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69818.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:55:27,391 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69824.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:55:30,712 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5347, 3.4756, 3.2745, 3.2009, 3.5251, 3.9840, 4.4617, 3.4791], device='cuda:0'), covar=tensor([0.0808, 0.0920, 0.1694, 0.2621, 0.1231, 0.0455, 0.0500, 0.3846], device='cuda:0'), in_proj_covar=tensor([0.0101, 0.0097, 0.0100, 0.0092, 0.0086, 0.0094, 0.0093, 0.0072], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:55:31,844 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.497e+01 1.489e+02 1.786e+02 2.151e+02 5.165e+02, threshold=3.571e+02, percent-clipped=4.0 2022-11-16 02:55:44,013 INFO [train.py:876] (0/4) Epoch 10, batch 4400, loss[loss=0.1263, simple_loss=0.1582, pruned_loss=0.04723, over 5580.00 frames. ], tot_loss[loss=0.1223, simple_loss=0.1469, pruned_loss=0.04884, over 1080878.87 frames. ], batch size: 24, lr: 8.04e-03, grad_scale: 8.0 2022-11-16 02:55:56,375 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69866.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:56:00,367 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69872.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:56:06,506 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69881.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:56:13,021 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0407, 3.3535, 3.6286, 1.7076, 3.1978, 3.9383, 3.7196, 4.2622], device='cuda:0'), covar=tensor([0.2101, 0.1321, 0.1022, 0.2953, 0.0545, 0.0471, 0.0345, 0.0412], device='cuda:0'), in_proj_covar=tensor([0.0172, 0.0184, 0.0161, 0.0185, 0.0179, 0.0198, 0.0167, 0.0188], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 02:56:19,537 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69901.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 02:56:29,890 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5425, 2.7980, 2.8533, 2.5957, 2.7232, 2.7723, 1.2591, 2.8395], device='cuda:0'), covar=tensor([0.0375, 0.0326, 0.0281, 0.0315, 0.0379, 0.0328, 0.2806, 0.0358], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0085, 0.0085, 0.0077, 0.0101, 0.0087, 0.0130, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:56:38,888 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69929.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:56:38,963 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69929.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:56:40,107 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.783e+01 1.456e+02 1.859e+02 2.401e+02 3.905e+02, threshold=3.718e+02, percent-clipped=1.0 2022-11-16 02:56:52,007 INFO [train.py:876] (0/4) Epoch 10, batch 4500, loss[loss=0.1143, simple_loss=0.1447, pruned_loss=0.04194, over 5634.00 frames. ], tot_loss[loss=0.1225, simple_loss=0.1472, pruned_loss=0.04889, over 1082988.19 frames. ], batch size: 29, lr: 8.04e-03, grad_scale: 8.0 2022-11-16 02:56:52,043 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69949.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 02:56:58,306 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1780, 4.3021, 4.2274, 4.3369, 4.0258, 3.7025, 4.8094, 4.2172], device='cuda:0'), covar=tensor([0.0442, 0.0785, 0.0362, 0.1215, 0.0523, 0.0404, 0.0551, 0.0715], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0106, 0.0090, 0.0116, 0.0086, 0.0077, 0.0140, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:57:10,764 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=69976.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:57:11,363 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=69977.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:57:24,669 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1092, 2.2481, 2.5994, 2.3213, 1.4783, 2.3120, 1.7129, 1.9388], device='cuda:0'), covar=tensor([0.0238, 0.0159, 0.0114, 0.0175, 0.0347, 0.0149, 0.0334, 0.0199], device='cuda:0'), in_proj_covar=tensor([0.0187, 0.0166, 0.0173, 0.0194, 0.0185, 0.0171, 0.0185, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 02:57:27,419 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-70000.pt 2022-11-16 02:57:38,014 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4139, 3.3801, 3.3964, 3.4386, 3.3393, 3.0241, 3.8193, 3.2453], device='cuda:0'), covar=tensor([0.0572, 0.0906, 0.0471, 0.1213, 0.0583, 0.0435, 0.0710, 0.0884], device='cuda:0'), in_proj_covar=tensor([0.0085, 0.0105, 0.0090, 0.0115, 0.0085, 0.0077, 0.0140, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 02:57:47,881 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70024.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:57:47,956 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70024.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:57:52,469 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=70030.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 02:57:52,996 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.972e+01 1.578e+02 1.852e+02 2.258e+02 4.002e+02, threshold=3.705e+02, percent-clipped=2.0 2022-11-16 02:58:05,288 INFO [train.py:876] (0/4) Epoch 10, batch 4600, loss[loss=0.1057, simple_loss=0.1399, pruned_loss=0.03572, over 5784.00 frames. ], tot_loss[loss=0.1223, simple_loss=0.1471, pruned_loss=0.0487, over 1082590.10 frames. ], batch size: 21, lr: 8.03e-03, grad_scale: 8.0 2022-11-16 02:58:06,020 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70050.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:58:10,753 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2022-11-16 02:58:11,323 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0477, 2.1266, 2.4184, 3.2161, 3.1780, 2.5203, 2.0718, 3.3174], device='cuda:0'), covar=tensor([0.0980, 0.2621, 0.2022, 0.2196, 0.1182, 0.2833, 0.2144, 0.0880], device='cuda:0'), in_proj_covar=tensor([0.0243, 0.0200, 0.0190, 0.0313, 0.0225, 0.0207, 0.0188, 0.0241], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:58:18,392 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.64 vs. limit=2.0 2022-11-16 02:58:20,681 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70072.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:58:38,729 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70098.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:58:40,382 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0 2022-11-16 02:58:42,406 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2022-11-16 02:58:46,567 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=70110.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:59:00,301 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.197e+02 1.693e+02 2.084e+02 2.528e+02 4.221e+02, threshold=4.168e+02, percent-clipped=4.0 2022-11-16 02:59:13,004 INFO [train.py:876] (0/4) Epoch 10, batch 4700, loss[loss=0.09357, simple_loss=0.1397, pruned_loss=0.02372, over 5727.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1459, pruned_loss=0.04796, over 1076377.91 frames. ], batch size: 15, lr: 8.03e-03, grad_scale: 8.0 2022-11-16 02:59:27,632 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=70171.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 02:59:41,569 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3730, 1.8480, 2.1067, 2.4124, 2.7506, 1.9721, 1.5783, 2.6471], device='cuda:0'), covar=tensor([0.1941, 0.2602, 0.2078, 0.0995, 0.1154, 0.2812, 0.2257, 0.1515], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0202, 0.0192, 0.0315, 0.0227, 0.0208, 0.0190, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 02:59:46,572 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2022-11-16 03:00:08,314 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.549e+01 1.608e+02 1.952e+02 2.269e+02 5.385e+02, threshold=3.904e+02, percent-clipped=1.0 2022-11-16 03:00:20,901 INFO [train.py:876] (0/4) Epoch 10, batch 4800, loss[loss=0.1024, simple_loss=0.1433, pruned_loss=0.03079, over 5672.00 frames. ], tot_loss[loss=0.1204, simple_loss=0.1457, pruned_loss=0.04753, over 1082026.78 frames. ], batch size: 19, lr: 8.02e-03, grad_scale: 8.0 2022-11-16 03:00:28,847 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=70260.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:00:47,137 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-16 03:00:54,344 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.73 vs. limit=2.0 2022-11-16 03:01:00,616 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5289, 2.4612, 2.7428, 3.5287, 3.5291, 2.8250, 2.3724, 3.5560], device='cuda:0'), covar=tensor([0.0871, 0.2349, 0.2127, 0.2092, 0.1204, 0.2559, 0.2077, 0.1000], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0202, 0.0192, 0.0315, 0.0228, 0.0207, 0.0192, 0.0244], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 03:01:09,994 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=70321.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:01:13,456 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.34 vs. limit=2.0 2022-11-16 03:01:15,759 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70330.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 03:01:16,224 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.701e+02 2.058e+02 2.547e+02 4.780e+02, threshold=4.115e+02, percent-clipped=1.0 2022-11-16 03:01:17,318 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2022-11-16 03:01:27,691 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4645, 4.1869, 4.1379, 4.0291, 4.6055, 4.2847, 4.0761, 4.5008], device='cuda:0'), covar=tensor([0.0763, 0.0650, 0.0926, 0.0788, 0.0720, 0.0524, 0.0632, 0.0858], device='cuda:0'), in_proj_covar=tensor([0.0137, 0.0141, 0.0103, 0.0136, 0.0164, 0.0099, 0.0117, 0.0144], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 03:01:28,223 INFO [train.py:876] (0/4) Epoch 10, batch 4900, loss[loss=0.1348, simple_loss=0.1612, pruned_loss=0.05418, over 5715.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.1455, pruned_loss=0.04787, over 1077633.86 frames. ], batch size: 28, lr: 8.01e-03, grad_scale: 8.0 2022-11-16 03:01:28,413 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9774, 2.4697, 3.0985, 3.8034, 3.8432, 3.0985, 2.6154, 3.7354], device='cuda:0'), covar=tensor([0.0625, 0.3456, 0.2342, 0.3064, 0.1317, 0.3151, 0.2103, 0.0971], device='cuda:0'), in_proj_covar=tensor([0.0248, 0.0202, 0.0193, 0.0317, 0.0229, 0.0209, 0.0192, 0.0246], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 03:01:47,957 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70378.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 03:01:56,766 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.43 vs. limit=5.0 2022-11-16 03:02:06,251 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5310, 4.4902, 3.3952, 1.9661, 4.2378, 1.8436, 4.2324, 2.3493], device='cuda:0'), covar=tensor([0.1403, 0.0099, 0.0576, 0.1981, 0.0137, 0.1753, 0.0189, 0.1582], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0103, 0.0113, 0.0113, 0.0100, 0.0121, 0.0099, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:02:24,079 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.613e+02 1.862e+02 2.297e+02 4.157e+02, threshold=3.724e+02, percent-clipped=1.0 2022-11-16 03:02:36,299 INFO [train.py:876] (0/4) Epoch 10, batch 5000, loss[loss=0.07482, simple_loss=0.1011, pruned_loss=0.02429, over 5183.00 frames. ], tot_loss[loss=0.122, simple_loss=0.1466, pruned_loss=0.04869, over 1074713.78 frames. ], batch size: 8, lr: 8.01e-03, grad_scale: 8.0 2022-11-16 03:02:48,235 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=70466.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:03:13,517 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.72 vs. limit=5.0 2022-11-16 03:03:32,300 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 1.626e+02 2.052e+02 2.453e+02 4.423e+02, threshold=4.104e+02, percent-clipped=3.0 2022-11-16 03:03:44,159 INFO [train.py:876] (0/4) Epoch 10, batch 5100, loss[loss=0.1755, simple_loss=0.1881, pruned_loss=0.08144, over 5332.00 frames. ], tot_loss[loss=0.1218, simple_loss=0.1466, pruned_loss=0.04848, over 1079429.64 frames. ], batch size: 70, lr: 8.00e-03, grad_scale: 8.0 2022-11-16 03:04:30,171 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=70616.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:04:40,395 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.739e+01 1.539e+02 1.803e+02 2.270e+02 4.392e+02, threshold=3.606e+02, percent-clipped=1.0 2022-11-16 03:04:52,545 INFO [train.py:876] (0/4) Epoch 10, batch 5200, loss[loss=0.1152, simple_loss=0.152, pruned_loss=0.03921, over 5653.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1462, pruned_loss=0.04809, over 1080949.56 frames. ], batch size: 29, lr: 8.00e-03, grad_scale: 8.0 2022-11-16 03:05:19,741 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6964, 4.5890, 3.5010, 1.9538, 4.2608, 1.6605, 4.3652, 2.3218], device='cuda:0'), covar=tensor([0.1290, 0.0145, 0.0654, 0.2225, 0.0190, 0.1920, 0.0267, 0.1620], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0105, 0.0113, 0.0114, 0.0102, 0.0122, 0.0100, 0.0112], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:05:22,396 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.72 vs. limit=5.0 2022-11-16 03:05:29,145 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.33 vs. limit=5.0 2022-11-16 03:05:42,106 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.63 vs. limit=5.0 2022-11-16 03:05:44,923 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2022-11-16 03:05:47,144 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.458e+01 1.499e+02 1.855e+02 2.184e+02 5.918e+02, threshold=3.711e+02, percent-clipped=4.0 2022-11-16 03:05:48,242 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.64 vs. limit=2.0 2022-11-16 03:05:59,635 INFO [train.py:876] (0/4) Epoch 10, batch 5300, loss[loss=0.09668, simple_loss=0.1347, pruned_loss=0.02933, over 5719.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1466, pruned_loss=0.04791, over 1086488.68 frames. ], batch size: 36, lr: 7.99e-03, grad_scale: 8.0 2022-11-16 03:06:11,220 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70766.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:06:18,824 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([6.2674, 5.7419, 6.1834, 5.5007, 6.3306, 6.1051, 5.2662, 6.2250], device='cuda:0'), covar=tensor([0.0372, 0.0414, 0.0394, 0.0496, 0.0336, 0.0188, 0.0237, 0.0279], device='cuda:0'), in_proj_covar=tensor([0.0140, 0.0146, 0.0105, 0.0141, 0.0169, 0.0101, 0.0119, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 03:06:30,448 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0016, 3.2847, 3.2572, 3.0855, 3.2349, 3.1920, 1.2740, 3.3724], device='cuda:0'), covar=tensor([0.0353, 0.0272, 0.0280, 0.0288, 0.0306, 0.0317, 0.3339, 0.0287], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0085, 0.0085, 0.0078, 0.0100, 0.0086, 0.0128, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:06:43,989 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70814.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:06:55,260 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.500e+01 1.530e+02 1.938e+02 2.261e+02 4.134e+02, threshold=3.876e+02, percent-clipped=2.0 2022-11-16 03:07:07,455 INFO [train.py:876] (0/4) Epoch 10, batch 5400, loss[loss=0.1586, simple_loss=0.1518, pruned_loss=0.08268, over 4127.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1476, pruned_loss=0.04859, over 1090824.71 frames. ], batch size: 181, lr: 7.99e-03, grad_scale: 8.0 2022-11-16 03:07:43,292 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2022-11-16 03:07:52,773 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=70916.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:08:02,764 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.679e+02 2.107e+02 2.563e+02 5.005e+02, threshold=4.215e+02, percent-clipped=8.0 2022-11-16 03:08:14,643 INFO [train.py:876] (0/4) Epoch 10, batch 5500, loss[loss=0.1125, simple_loss=0.1438, pruned_loss=0.04064, over 5583.00 frames. ], tot_loss[loss=0.1218, simple_loss=0.1466, pruned_loss=0.04851, over 1089826.38 frames. ], batch size: 25, lr: 7.98e-03, grad_scale: 8.0 2022-11-16 03:08:24,878 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=70964.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:08:42,598 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0689, 3.0091, 2.6768, 2.9712, 3.0200, 2.6694, 2.5415, 2.6962], device='cuda:0'), covar=tensor([0.0317, 0.0614, 0.1475, 0.0657, 0.0594, 0.0579, 0.1198, 0.0827], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0177, 0.0271, 0.0171, 0.0215, 0.0175, 0.0186, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:08:43,995 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=70992.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:08:56,441 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0761, 2.1380, 2.8348, 2.5828, 2.6171, 2.0491, 2.7104, 3.0253], device='cuda:0'), covar=tensor([0.0517, 0.1093, 0.0672, 0.0902, 0.0726, 0.1150, 0.0818, 0.0736], device='cuda:0'), in_proj_covar=tensor([0.0235, 0.0190, 0.0207, 0.0206, 0.0230, 0.0190, 0.0222, 0.0226], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:09:00,877 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6258, 1.2949, 1.4827, 0.9573, 1.6845, 1.5287, 1.2050, 1.3116], device='cuda:0'), covar=tensor([0.0782, 0.0492, 0.0317, 0.0918, 0.0505, 0.1507, 0.0445, 0.0757], device='cuda:0'), in_proj_covar=tensor([0.0013, 0.0021, 0.0014, 0.0018, 0.0015, 0.0013, 0.0020, 0.0014], device='cuda:0'), out_proj_covar=tensor([7.3111e-05, 1.0031e-04, 7.6521e-05, 8.9864e-05, 7.8044e-05, 7.2239e-05, 9.5371e-05, 7.4028e-05], device='cuda:0') 2022-11-16 03:09:05,349 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6294, 4.7304, 3.2095, 4.3925, 3.5974, 3.0786, 2.6498, 3.8232], device='cuda:0'), covar=tensor([0.1250, 0.0183, 0.1073, 0.0275, 0.0647, 0.1015, 0.1690, 0.0458], device='cuda:0'), in_proj_covar=tensor([0.0157, 0.0140, 0.0161, 0.0147, 0.0175, 0.0171, 0.0165, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:09:10,367 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=71030.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:09:10,842 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.028e+02 1.597e+02 2.081e+02 2.610e+02 5.785e+02, threshold=4.161e+02, percent-clipped=1.0 2022-11-16 03:09:22,365 INFO [train.py:876] (0/4) Epoch 10, batch 5600, loss[loss=0.1454, simple_loss=0.1658, pruned_loss=0.06248, over 5093.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.1474, pruned_loss=0.04977, over 1084828.59 frames. ], batch size: 91, lr: 7.98e-03, grad_scale: 8.0 2022-11-16 03:09:25,140 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=71053.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:09:51,303 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=71091.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:10:05,795 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2426, 1.4555, 1.1091, 1.0151, 1.4100, 1.2621, 0.6668, 1.5273], device='cuda:0'), covar=tensor([0.0048, 0.0034, 0.0052, 0.0053, 0.0043, 0.0040, 0.0084, 0.0058], device='cuda:0'), in_proj_covar=tensor([0.0054, 0.0050, 0.0052, 0.0053, 0.0053, 0.0046, 0.0048, 0.0045], device='cuda:0'), out_proj_covar=tensor([4.8042e-05, 4.4888e-05, 4.6089e-05, 4.7585e-05, 4.7282e-05, 4.0163e-05, 4.3591e-05, 3.9675e-05], device='cuda:0') 2022-11-16 03:10:17,297 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4955, 2.2504, 3.1627, 2.7932, 3.0218, 2.1420, 2.9894, 3.4523], device='cuda:0'), covar=tensor([0.0834, 0.1439, 0.0868, 0.1377, 0.0927, 0.1546, 0.1031, 0.0926], device='cuda:0'), in_proj_covar=tensor([0.0238, 0.0191, 0.0211, 0.0208, 0.0235, 0.0192, 0.0224, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:10:18,056 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.341e+01 1.536e+02 1.905e+02 2.360e+02 4.065e+02, threshold=3.810e+02, percent-clipped=0.0 2022-11-16 03:10:30,666 INFO [train.py:876] (0/4) Epoch 10, batch 5700, loss[loss=0.1495, simple_loss=0.1773, pruned_loss=0.06089, over 5702.00 frames. ], tot_loss[loss=0.1213, simple_loss=0.146, pruned_loss=0.04833, over 1083891.70 frames. ], batch size: 28, lr: 7.97e-03, grad_scale: 8.0 2022-11-16 03:10:33,756 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2022-11-16 03:11:26,879 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.887e+01 1.504e+02 1.909e+02 2.394e+02 3.838e+02, threshold=3.819e+02, percent-clipped=2.0 2022-11-16 03:11:38,466 INFO [train.py:876] (0/4) Epoch 10, batch 5800, loss[loss=0.06407, simple_loss=0.09717, pruned_loss=0.01548, over 5311.00 frames. ], tot_loss[loss=0.1215, simple_loss=0.1468, pruned_loss=0.04812, over 1084035.42 frames. ], batch size: 6, lr: 7.96e-03, grad_scale: 4.0 2022-11-16 03:11:45,045 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.59 vs. limit=2.0 2022-11-16 03:11:55,205 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3469, 2.3989, 2.0853, 2.3467, 2.4171, 2.1462, 2.0432, 2.2381], device='cuda:0'), covar=tensor([0.0431, 0.0754, 0.1805, 0.0712, 0.0718, 0.0672, 0.1331, 0.0647], device='cuda:0'), in_proj_covar=tensor([0.0131, 0.0173, 0.0268, 0.0171, 0.0215, 0.0174, 0.0184, 0.0172], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:12:14,090 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.00 vs. limit=5.0 2022-11-16 03:12:34,046 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.935e+01 1.517e+02 1.924e+02 2.485e+02 4.438e+02, threshold=3.847e+02, percent-clipped=4.0 2022-11-16 03:12:41,383 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8976, 3.9764, 3.7176, 3.4073, 2.1249, 3.9907, 2.3591, 3.2502], device='cuda:0'), covar=tensor([0.0398, 0.0170, 0.0203, 0.0395, 0.0684, 0.0163, 0.0521, 0.0141], device='cuda:0'), in_proj_covar=tensor([0.0186, 0.0166, 0.0174, 0.0195, 0.0186, 0.0171, 0.0183, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 03:12:44,747 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=71348.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:12:45,333 INFO [train.py:876] (0/4) Epoch 10, batch 5900, loss[loss=0.09449, simple_loss=0.1359, pruned_loss=0.02655, over 5715.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1468, pruned_loss=0.04897, over 1078145.24 frames. ], batch size: 17, lr: 7.96e-03, grad_scale: 4.0 2022-11-16 03:12:51,834 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-16 03:13:05,375 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5295, 4.5375, 4.6216, 4.6909, 4.0468, 4.1451, 5.1009, 4.4480], device='cuda:0'), covar=tensor([0.0361, 0.0742, 0.0386, 0.0966, 0.0495, 0.0284, 0.0604, 0.0567], device='cuda:0'), in_proj_covar=tensor([0.0085, 0.0105, 0.0089, 0.0116, 0.0086, 0.0078, 0.0141, 0.0099], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:13:10,618 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=71386.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:13:42,135 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.550e+02 1.857e+02 2.377e+02 5.014e+02, threshold=3.713e+02, percent-clipped=7.0 2022-11-16 03:13:53,302 INFO [train.py:876] (0/4) Epoch 10, batch 6000, loss[loss=0.1867, simple_loss=0.1776, pruned_loss=0.09791, over 3122.00 frames. ], tot_loss[loss=0.1228, simple_loss=0.1474, pruned_loss=0.04911, over 1084946.75 frames. ], batch size: 284, lr: 7.95e-03, grad_scale: 8.0 2022-11-16 03:13:53,303 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 03:13:59,818 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3825, 2.4374, 2.7616, 3.6572, 3.5971, 2.5767, 2.3867, 3.6771], device='cuda:0'), covar=tensor([0.0986, 0.2874, 0.2100, 0.1466, 0.1116, 0.2748, 0.2327, 0.0795], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0201, 0.0191, 0.0315, 0.0229, 0.0206, 0.0192, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 03:14:08,093 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9037, 2.2152, 2.8117, 2.6932, 2.5861, 2.1436, 2.7446, 3.0479], device='cuda:0'), covar=tensor([0.0440, 0.0819, 0.0494, 0.0463, 0.0662, 0.0974, 0.0478, 0.0364], device='cuda:0'), in_proj_covar=tensor([0.0238, 0.0193, 0.0212, 0.0211, 0.0235, 0.0191, 0.0224, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:14:09,314 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8757, 2.4701, 3.4022, 3.0942, 3.4257, 2.8368, 3.4567, 3.7865], device='cuda:0'), covar=tensor([0.0766, 0.1614, 0.1334, 0.1628, 0.1008, 0.1426, 0.1167, 0.0906], device='cuda:0'), in_proj_covar=tensor([0.0238, 0.0193, 0.0212, 0.0211, 0.0235, 0.0191, 0.0224, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:14:11,199 INFO [train.py:908] (0/4) Epoch 10, validation: loss=0.1673, simple_loss=0.1835, pruned_loss=0.0755, over 1530663.00 frames. 2022-11-16 03:14:11,199 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-16 03:14:16,975 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2022-11-16 03:14:22,566 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.82 vs. limit=2.0 2022-11-16 03:15:07,799 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.614e+01 1.652e+02 1.842e+02 2.318e+02 4.092e+02, threshold=3.683e+02, percent-clipped=3.0 2022-11-16 03:15:18,741 INFO [train.py:876] (0/4) Epoch 10, batch 6100, loss[loss=0.1441, simple_loss=0.1661, pruned_loss=0.06101, over 5580.00 frames. ], tot_loss[loss=0.1216, simple_loss=0.1467, pruned_loss=0.04825, over 1085797.78 frames. ], batch size: 43, lr: 7.95e-03, grad_scale: 8.0 2022-11-16 03:15:34,187 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2022-11-16 03:16:04,384 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=71616.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:16:16,026 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 1.497e+02 1.833e+02 2.234e+02 4.359e+02, threshold=3.667e+02, percent-clipped=3.0 2022-11-16 03:16:26,679 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=71648.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:16:27,198 INFO [train.py:876] (0/4) Epoch 10, batch 6200, loss[loss=0.08655, simple_loss=0.1139, pruned_loss=0.0296, over 5460.00 frames. ], tot_loss[loss=0.1205, simple_loss=0.1457, pruned_loss=0.04765, over 1084740.50 frames. ], batch size: 11, lr: 7.94e-03, grad_scale: 8.0 2022-11-16 03:16:30,562 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4883, 5.0723, 4.7620, 5.0979, 5.1376, 4.3513, 4.5937, 4.4627], device='cuda:0'), covar=tensor([0.0343, 0.0401, 0.1204, 0.0281, 0.0350, 0.0399, 0.0519, 0.0347], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0173, 0.0269, 0.0173, 0.0214, 0.0173, 0.0185, 0.0173], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:16:33,784 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9690, 3.7308, 3.8962, 3.4170, 3.9007, 3.6969, 1.6062, 4.0815], device='cuda:0'), covar=tensor([0.0337, 0.0522, 0.0345, 0.0450, 0.0347, 0.0486, 0.3238, 0.0337], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0088, 0.0087, 0.0081, 0.0104, 0.0090, 0.0130, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:16:45,859 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=71677.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:16:52,342 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=71686.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:16:58,947 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=71696.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:17:13,628 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2022-11-16 03:17:14,700 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8352, 2.4498, 2.2841, 1.4842, 2.7787, 2.8275, 2.5881, 2.9965], device='cuda:0'), covar=tensor([0.2035, 0.1726, 0.1795, 0.3006, 0.0738, 0.1552, 0.0537, 0.1004], device='cuda:0'), in_proj_covar=tensor([0.0170, 0.0185, 0.0163, 0.0184, 0.0176, 0.0198, 0.0166, 0.0186], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:17:23,599 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.040e+02 1.569e+02 1.914e+02 2.269e+02 4.591e+02, threshold=3.828e+02, percent-clipped=5.0 2022-11-16 03:17:24,937 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=71734.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:17:34,640 INFO [train.py:876] (0/4) Epoch 10, batch 6300, loss[loss=0.1031, simple_loss=0.1474, pruned_loss=0.0294, over 5694.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.146, pruned_loss=0.0482, over 1078486.54 frames. ], batch size: 19, lr: 7.94e-03, grad_scale: 8.0 2022-11-16 03:17:37,348 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4710, 3.4839, 3.1527, 3.3708, 2.9371, 3.8753, 3.4643, 3.5826], device='cuda:0'), covar=tensor([0.0652, 0.0828, 0.1359, 0.1069, 0.1041, 0.0572, 0.0902, 0.3092], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0100, 0.0102, 0.0097, 0.0088, 0.0098, 0.0095, 0.0076], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 03:17:47,143 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8807, 5.1137, 3.3021, 4.6601, 3.8501, 3.3894, 3.0199, 4.4609], device='cuda:0'), covar=tensor([0.1157, 0.0128, 0.0926, 0.0363, 0.0566, 0.0943, 0.1489, 0.0201], device='cuda:0'), in_proj_covar=tensor([0.0155, 0.0137, 0.0160, 0.0144, 0.0173, 0.0169, 0.0164, 0.0155], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 03:17:49,452 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.84 vs. limit=2.0 2022-11-16 03:17:53,313 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5826, 1.0269, 1.2283, 0.8519, 1.3167, 1.3264, 0.8840, 1.1797], device='cuda:0'), covar=tensor([0.0291, 0.0423, 0.0297, 0.0808, 0.0288, 0.0255, 0.0802, 0.0367], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0021, 0.0015, 0.0018, 0.0015, 0.0013, 0.0020, 0.0014], device='cuda:0'), out_proj_covar=tensor([7.4360e-05, 1.0143e-04, 7.7694e-05, 9.1097e-05, 7.9160e-05, 7.3595e-05, 9.7045e-05, 7.5126e-05], device='cuda:0') 2022-11-16 03:18:13,337 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9744, 2.5053, 2.8745, 3.7502, 3.7586, 2.9108, 2.4955, 3.7910], device='cuda:0'), covar=tensor([0.0539, 0.2641, 0.2113, 0.2962, 0.1141, 0.2632, 0.2004, 0.0795], device='cuda:0'), in_proj_covar=tensor([0.0250, 0.0202, 0.0191, 0.0318, 0.0231, 0.0209, 0.0191, 0.0245], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 03:18:30,419 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.579e+02 1.935e+02 2.633e+02 4.632e+02, threshold=3.870e+02, percent-clipped=2.0 2022-11-16 03:18:42,503 INFO [train.py:876] (0/4) Epoch 10, batch 6400, loss[loss=0.1014, simple_loss=0.1238, pruned_loss=0.03955, over 5302.00 frames. ], tot_loss[loss=0.122, simple_loss=0.1464, pruned_loss=0.04879, over 1080122.65 frames. ], batch size: 9, lr: 7.93e-03, grad_scale: 8.0 2022-11-16 03:19:07,989 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.91 vs. limit=2.0 2022-11-16 03:19:29,521 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=71919.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:19:37,764 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.943e+01 1.534e+02 1.967e+02 2.415e+02 6.448e+02, threshold=3.935e+02, percent-clipped=1.0 2022-11-16 03:19:50,173 INFO [train.py:876] (0/4) Epoch 10, batch 6500, loss[loss=0.1065, simple_loss=0.1378, pruned_loss=0.03762, over 5600.00 frames. ], tot_loss[loss=0.1221, simple_loss=0.1464, pruned_loss=0.04885, over 1081157.71 frames. ], batch size: 18, lr: 7.93e-03, grad_scale: 8.0 2022-11-16 03:20:05,471 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=71972.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:20:10,803 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=71980.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:20:46,047 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.311e+01 1.515e+02 1.926e+02 2.349e+02 4.433e+02, threshold=3.852e+02, percent-clipped=3.0 2022-11-16 03:20:47,235 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2022-11-16 03:20:47,819 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-16 03:20:57,525 INFO [train.py:876] (0/4) Epoch 10, batch 6600, loss[loss=0.09812, simple_loss=0.1316, pruned_loss=0.03234, over 5529.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.1451, pruned_loss=0.04823, over 1074814.29 frames. ], batch size: 14, lr: 7.92e-03, grad_scale: 8.0 2022-11-16 03:21:07,528 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2061, 4.3319, 4.3677, 4.3882, 3.8456, 3.6201, 4.9515, 4.3053], device='cuda:0'), covar=tensor([0.0478, 0.0928, 0.0385, 0.1484, 0.0567, 0.0500, 0.0654, 0.0528], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0107, 0.0091, 0.0119, 0.0087, 0.0079, 0.0144, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:21:51,556 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=72129.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:21:53,317 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.616e+01 1.605e+02 1.991e+02 2.572e+02 4.942e+02, threshold=3.982e+02, percent-clipped=4.0 2022-11-16 03:22:04,581 INFO [train.py:876] (0/4) Epoch 10, batch 6700, loss[loss=0.1131, simple_loss=0.1404, pruned_loss=0.04291, over 5770.00 frames. ], tot_loss[loss=0.1182, simple_loss=0.144, pruned_loss=0.04623, over 1081436.77 frames. ], batch size: 21, lr: 7.91e-03, grad_scale: 8.0 2022-11-16 03:22:14,347 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-16 03:22:19,723 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-16 03:22:33,089 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=72190.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:22:57,959 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.96 vs. limit=2.0 2022-11-16 03:23:01,932 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.623e+02 1.994e+02 2.476e+02 5.420e+02, threshold=3.989e+02, percent-clipped=4.0 2022-11-16 03:23:13,011 INFO [train.py:876] (0/4) Epoch 10, batch 6800, loss[loss=0.1285, simple_loss=0.1646, pruned_loss=0.04615, over 5822.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.1436, pruned_loss=0.04593, over 1084325.19 frames. ], batch size: 18, lr: 7.91e-03, grad_scale: 8.0 2022-11-16 03:23:16,285 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.0036, 5.7702, 5.2529, 5.8316, 5.8395, 5.0268, 5.2563, 4.9348], device='cuda:0'), covar=tensor([0.0202, 0.0348, 0.1110, 0.0308, 0.0354, 0.0403, 0.0309, 0.0784], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0169, 0.0261, 0.0167, 0.0212, 0.0169, 0.0178, 0.0168], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 03:23:28,139 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=72272.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:23:30,337 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=72275.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:23:37,429 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2022-11-16 03:23:48,395 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.55 vs. limit=5.0 2022-11-16 03:24:02,089 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=72320.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:24:06,393 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=72326.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:24:10,107 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8175, 3.7044, 3.7102, 3.9206, 3.3836, 3.3670, 4.2555, 3.7928], device='cuda:0'), covar=tensor([0.0493, 0.1139, 0.0492, 0.1300, 0.0712, 0.0541, 0.0731, 0.0763], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0107, 0.0091, 0.0118, 0.0088, 0.0078, 0.0145, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:24:11,067 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.013e+02 1.569e+02 1.889e+02 2.290e+02 4.536e+02, threshold=3.778e+02, percent-clipped=2.0 2022-11-16 03:24:23,846 INFO [train.py:876] (0/4) Epoch 10, batch 6900, loss[loss=0.113, simple_loss=0.1313, pruned_loss=0.04737, over 5706.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1447, pruned_loss=0.04727, over 1076153.94 frames. ], batch size: 28, lr: 7.90e-03, grad_scale: 8.0 2022-11-16 03:24:49,655 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=72387.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:25:09,507 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4161, 1.5921, 1.4434, 1.2585, 1.4248, 1.4145, 1.0904, 1.4645], device='cuda:0'), covar=tensor([0.0046, 0.0055, 0.0043, 0.0049, 0.0045, 0.0040, 0.0051, 0.0066], device='cuda:0'), in_proj_covar=tensor([0.0055, 0.0052, 0.0053, 0.0054, 0.0054, 0.0048, 0.0049, 0.0046], device='cuda:0'), out_proj_covar=tensor([4.9688e-05, 4.6913e-05, 4.6815e-05, 4.8398e-05, 4.8272e-05, 4.1660e-05, 4.4495e-05, 4.0717e-05], device='cuda:0') 2022-11-16 03:25:19,931 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.314e+01 1.574e+02 1.843e+02 2.228e+02 3.077e+02, threshold=3.686e+02, percent-clipped=0.0 2022-11-16 03:25:32,128 INFO [train.py:876] (0/4) Epoch 10, batch 7000, loss[loss=0.0923, simple_loss=0.1261, pruned_loss=0.02925, over 5478.00 frames. ], tot_loss[loss=0.1187, simple_loss=0.1442, pruned_loss=0.04657, over 1077699.29 frames. ], batch size: 12, lr: 7.90e-03, grad_scale: 8.0 2022-11-16 03:25:36,538 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2434, 2.5379, 3.7063, 3.3078, 4.1517, 2.7154, 3.4469, 4.1837], device='cuda:0'), covar=tensor([0.0560, 0.1743, 0.0823, 0.1315, 0.0395, 0.1453, 0.1279, 0.0738], device='cuda:0'), in_proj_covar=tensor([0.0233, 0.0190, 0.0207, 0.0207, 0.0231, 0.0188, 0.0222, 0.0224], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:25:39,333 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.69 vs. limit=2.0 2022-11-16 03:25:49,019 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1061, 0.7071, 0.8588, 0.7962, 1.0965, 0.9646, 0.6130, 0.7724], device='cuda:0'), covar=tensor([0.0280, 0.0333, 0.0318, 0.0576, 0.0299, 0.0249, 0.0885, 0.0356], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0021, 0.0015, 0.0018, 0.0015, 0.0013, 0.0020, 0.0014], device='cuda:0'), out_proj_covar=tensor([7.5793e-05, 1.0044e-04, 7.7361e-05, 9.0735e-05, 7.9259e-05, 7.3221e-05, 9.6989e-05, 7.5119e-05], device='cuda:0') 2022-11-16 03:25:56,118 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=72485.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:26:27,976 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.005e+02 1.555e+02 1.806e+02 2.267e+02 5.483e+02, threshold=3.612e+02, percent-clipped=3.0 2022-11-16 03:26:39,475 INFO [train.py:876] (0/4) Epoch 10, batch 7100, loss[loss=0.1147, simple_loss=0.1469, pruned_loss=0.04126, over 5630.00 frames. ], tot_loss[loss=0.1188, simple_loss=0.1447, pruned_loss=0.04649, over 1082694.51 frames. ], batch size: 29, lr: 7.89e-03, grad_scale: 8.0 2022-11-16 03:26:53,682 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6989, 0.9615, 1.4024, 0.9177, 1.3992, 1.5608, 0.8263, 1.0220], device='cuda:0'), covar=tensor([0.0148, 0.0420, 0.0490, 0.0541, 0.0542, 0.0575, 0.0734, 0.0738], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0021, 0.0015, 0.0018, 0.0015, 0.0013, 0.0020, 0.0014], device='cuda:0'), out_proj_covar=tensor([7.5638e-05, 1.0043e-04, 7.7187e-05, 9.0771e-05, 7.9059e-05, 7.2952e-05, 9.6742e-05, 7.4834e-05], device='cuda:0') 2022-11-16 03:26:57,716 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=72575.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:27:00,009 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2022-11-16 03:27:16,324 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8181, 2.2939, 1.9834, 1.4980, 1.9846, 2.3935, 2.2428, 2.6129], device='cuda:0'), covar=tensor([0.1871, 0.1377, 0.1715, 0.2545, 0.1107, 0.0956, 0.0823, 0.1032], device='cuda:0'), in_proj_covar=tensor([0.0171, 0.0182, 0.0165, 0.0185, 0.0181, 0.0198, 0.0169, 0.0188], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:27:26,997 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6127, 4.4060, 2.9194, 4.1677, 3.3371, 2.9526, 2.3460, 3.7244], device='cuda:0'), covar=tensor([0.1254, 0.0211, 0.1043, 0.0318, 0.0721, 0.0947, 0.1885, 0.0333], device='cuda:0'), in_proj_covar=tensor([0.0155, 0.0138, 0.0159, 0.0143, 0.0175, 0.0167, 0.0163, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 03:27:30,437 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=72623.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:27:36,241 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.573e+02 1.962e+02 2.591e+02 6.270e+02, threshold=3.924e+02, percent-clipped=1.0 2022-11-16 03:27:45,426 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6759, 1.7199, 2.0431, 1.6560, 1.3952, 2.4973, 2.0353, 1.8904], device='cuda:0'), covar=tensor([0.1183, 0.1353, 0.0927, 0.2292, 0.2236, 0.0677, 0.1195, 0.1464], device='cuda:0'), in_proj_covar=tensor([0.0093, 0.0082, 0.0084, 0.0093, 0.0068, 0.0061, 0.0070, 0.0080], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 03:27:47,367 INFO [train.py:876] (0/4) Epoch 10, batch 7200, loss[loss=0.1309, simple_loss=0.1462, pruned_loss=0.0578, over 5566.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1446, pruned_loss=0.04618, over 1082592.83 frames. ], batch size: 43, lr: 7.89e-03, grad_scale: 8.0 2022-11-16 03:27:57,219 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0017, 3.0944, 2.6154, 2.9638, 2.5902, 2.9154, 2.9406, 3.4057], device='cuda:0'), covar=tensor([0.1226, 0.1238, 0.2212, 0.2246, 0.1605, 0.1081, 0.1443, 0.1485], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0101, 0.0103, 0.0098, 0.0090, 0.0098, 0.0093, 0.0075], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 03:27:57,903 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0125, 1.8967, 2.0496, 1.7511, 1.7040, 1.3786, 1.4435, 2.0742], device='cuda:0'), covar=tensor([0.0045, 0.0047, 0.0048, 0.0040, 0.0044, 0.0038, 0.0039, 0.0081], device='cuda:0'), in_proj_covar=tensor([0.0054, 0.0051, 0.0051, 0.0053, 0.0053, 0.0047, 0.0048, 0.0045], device='cuda:0'), out_proj_covar=tensor([4.8537e-05, 4.5759e-05, 4.5803e-05, 4.7335e-05, 4.7045e-05, 4.0895e-05, 4.3196e-05, 4.0005e-05], device='cuda:0') 2022-11-16 03:28:10,212 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=72682.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:28:10,247 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0101, 3.1921, 2.3228, 1.6575, 3.1242, 1.2343, 3.0009, 1.7551], device='cuda:0'), covar=tensor([0.1275, 0.0236, 0.1094, 0.1784, 0.0225, 0.1895, 0.0331, 0.1579], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0104, 0.0113, 0.0112, 0.0101, 0.0120, 0.0098, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:28:11,947 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-16 03:28:36,801 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-10.pt 2022-11-16 03:29:18,983 INFO [train.py:876] (0/4) Epoch 11, batch 0, loss[loss=0.0925, simple_loss=0.1258, pruned_loss=0.02959, over 5701.00 frames. ], tot_loss[loss=0.0925, simple_loss=0.1258, pruned_loss=0.02959, over 5701.00 frames. ], batch size: 15, lr: 7.53e-03, grad_scale: 8.0 2022-11-16 03:29:18,984 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 03:29:24,116 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5372, 4.3739, 4.6744, 4.1448, 4.4315, 4.7280, 4.7465, 4.8763], device='cuda:0'), covar=tensor([0.0456, 0.1359, 0.0383, 0.0880, 0.0349, 0.0184, 0.0672, 0.0223], device='cuda:0'), in_proj_covar=tensor([0.0085, 0.0106, 0.0090, 0.0117, 0.0087, 0.0077, 0.0143, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:29:28,437 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0792, 3.1628, 2.3288, 1.8007, 3.0766, 1.2872, 2.8072, 1.9176], device='cuda:0'), covar=tensor([0.0622, 0.0164, 0.0663, 0.0925, 0.0202, 0.1224, 0.0289, 0.0696], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0104, 0.0113, 0.0112, 0.0101, 0.0119, 0.0097, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:29:35,597 INFO [train.py:908] (0/4) Epoch 11, validation: loss=0.1663, simple_loss=0.1831, pruned_loss=0.07475, over 1530663.00 frames. 2022-11-16 03:29:35,598 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4686MB 2022-11-16 03:29:43,100 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.965e+01 1.625e+02 2.129e+02 2.463e+02 4.242e+02, threshold=4.258e+02, percent-clipped=1.0 2022-11-16 03:30:17,727 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=72785.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:30:17,743 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7459, 1.6995, 1.9012, 1.7061, 1.1900, 2.5190, 1.9605, 1.6760], device='cuda:0'), covar=tensor([0.1002, 0.1382, 0.1280, 0.2270, 0.3142, 0.0776, 0.1113, 0.1862], device='cuda:0'), in_proj_covar=tensor([0.0092, 0.0082, 0.0084, 0.0092, 0.0068, 0.0061, 0.0069, 0.0080], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 03:30:36,690 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6384, 3.7476, 3.5944, 3.2695, 2.1184, 3.8517, 2.2046, 2.9755], device='cuda:0'), covar=tensor([0.0429, 0.0250, 0.0202, 0.0429, 0.0613, 0.0158, 0.0541, 0.0192], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0172, 0.0178, 0.0199, 0.0189, 0.0174, 0.0186, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 03:30:42,330 INFO [train.py:876] (0/4) Epoch 11, batch 100, loss[loss=0.1047, simple_loss=0.1262, pruned_loss=0.04164, over 5590.00 frames. ], tot_loss[loss=0.1183, simple_loss=0.145, pruned_loss=0.04578, over 432601.39 frames. ], batch size: 22, lr: 7.52e-03, grad_scale: 8.0 2022-11-16 03:30:48,750 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.46 vs. limit=5.0 2022-11-16 03:30:49,527 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.571e+02 1.949e+02 2.153e+02 3.381e+02, threshold=3.898e+02, percent-clipped=0.0 2022-11-16 03:30:50,279 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=72833.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:31:08,932 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1790, 2.4475, 2.7229, 2.4014, 1.5611, 2.4896, 1.8162, 2.1454], device='cuda:0'), covar=tensor([0.0254, 0.0166, 0.0130, 0.0205, 0.0364, 0.0162, 0.0333, 0.0176], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0171, 0.0177, 0.0199, 0.0188, 0.0173, 0.0185, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 03:31:22,226 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6597, 4.5115, 4.4960, 4.6841, 4.3313, 4.0826, 5.1252, 4.4646], device='cuda:0'), covar=tensor([0.0346, 0.0724, 0.0348, 0.1045, 0.0480, 0.0325, 0.0625, 0.0609], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0107, 0.0091, 0.0118, 0.0088, 0.0079, 0.0144, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:31:25,048 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. limit=2.0 2022-11-16 03:31:31,482 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=72894.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:31:46,192 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.3504, 4.7194, 5.0937, 4.7128, 5.3687, 5.2311, 4.5601, 5.2983], device='cuda:0'), covar=tensor([0.0273, 0.0300, 0.0398, 0.0308, 0.0274, 0.0169, 0.0227, 0.0232], device='cuda:0'), in_proj_covar=tensor([0.0135, 0.0145, 0.0107, 0.0140, 0.0168, 0.0098, 0.0119, 0.0143], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 03:31:50,365 INFO [train.py:876] (0/4) Epoch 11, batch 200, loss[loss=0.07384, simple_loss=0.111, pruned_loss=0.01832, over 5107.00 frames. ], tot_loss[loss=0.1198, simple_loss=0.145, pruned_loss=0.04727, over 687032.62 frames. ], batch size: 7, lr: 7.52e-03, grad_scale: 8.0 2022-11-16 03:31:57,298 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.513e+01 1.538e+02 1.800e+02 2.272e+02 4.125e+02, threshold=3.600e+02, percent-clipped=3.0 2022-11-16 03:32:12,300 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=72955.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:32:13,611 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4585, 3.7187, 4.0919, 1.9032, 3.8125, 4.3329, 4.2163, 4.3181], device='cuda:0'), covar=tensor([0.1831, 0.1318, 0.0450, 0.2641, 0.0317, 0.0396, 0.0329, 0.0549], device='cuda:0'), in_proj_covar=tensor([0.0168, 0.0179, 0.0163, 0.0182, 0.0178, 0.0195, 0.0166, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:32:17,085 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.10 vs. limit=5.0 2022-11-16 03:32:18,266 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=72964.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 03:32:30,979 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=72982.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:32:40,137 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6975, 1.6755, 2.1156, 1.6179, 1.8938, 2.1699, 1.7851, 1.8597], device='cuda:0'), covar=tensor([0.2377, 0.0484, 0.0460, 0.0637, 0.0913, 0.0223, 0.0402, 0.0453], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0021, 0.0015, 0.0018, 0.0015, 0.0013, 0.0020, 0.0014], device='cuda:0'), out_proj_covar=tensor([7.5581e-05, 1.0235e-04, 7.7978e-05, 9.1781e-05, 8.0028e-05, 7.4023e-05, 9.8502e-05, 7.5839e-05], device='cuda:0') 2022-11-16 03:32:57,223 INFO [train.py:876] (0/4) Epoch 11, batch 300, loss[loss=0.1186, simple_loss=0.1597, pruned_loss=0.03869, over 5561.00 frames. ], tot_loss[loss=0.1182, simple_loss=0.1441, pruned_loss=0.0461, over 847617.94 frames. ], batch size: 21, lr: 7.51e-03, grad_scale: 8.0 2022-11-16 03:33:00,023 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=73025.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 03:33:03,463 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=73030.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:33:04,728 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.624e+02 1.928e+02 2.465e+02 5.255e+02, threshold=3.856e+02, percent-clipped=4.0 2022-11-16 03:34:03,259 INFO [train.py:876] (0/4) Epoch 11, batch 400, loss[loss=0.115, simple_loss=0.1515, pruned_loss=0.03927, over 5613.00 frames. ], tot_loss[loss=0.118, simple_loss=0.1443, pruned_loss=0.0458, over 949627.18 frames. ], batch size: 32, lr: 7.51e-03, grad_scale: 8.0 2022-11-16 03:34:11,192 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.143e+01 1.547e+02 1.866e+02 2.274e+02 4.703e+02, threshold=3.733e+02, percent-clipped=2.0 2022-11-16 03:34:44,061 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8806, 1.9161, 1.9696, 1.9743, 1.8113, 1.7063, 1.8596, 2.1688], device='cuda:0'), covar=tensor([0.1638, 0.1564, 0.1865, 0.1202, 0.1648, 0.1702, 0.1521, 0.0736], device='cuda:0'), in_proj_covar=tensor([0.0106, 0.0102, 0.0103, 0.0098, 0.0090, 0.0099, 0.0095, 0.0076], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 03:35:01,112 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.05 vs. limit=5.0 2022-11-16 03:35:10,797 INFO [train.py:876] (0/4) Epoch 11, batch 500, loss[loss=0.1131, simple_loss=0.1429, pruned_loss=0.04164, over 5582.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1432, pruned_loss=0.04516, over 1006559.01 frames. ], batch size: 43, lr: 7.50e-03, grad_scale: 8.0 2022-11-16 03:35:17,968 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 1.458e+02 1.748e+02 2.225e+02 4.920e+02, threshold=3.496e+02, percent-clipped=3.0 2022-11-16 03:35:30,814 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=73250.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:35:59,475 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=73293.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:36:18,444 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=73320.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 03:36:19,023 INFO [train.py:876] (0/4) Epoch 11, batch 600, loss[loss=0.129, simple_loss=0.1453, pruned_loss=0.05638, over 4632.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1447, pruned_loss=0.04609, over 1040195.56 frames. ], batch size: 135, lr: 7.50e-03, grad_scale: 16.0 2022-11-16 03:36:26,010 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.407e+01 1.498e+02 1.818e+02 2.192e+02 5.468e+02, threshold=3.637e+02, percent-clipped=3.0 2022-11-16 03:36:41,244 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=73354.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:37:26,415 INFO [train.py:876] (0/4) Epoch 11, batch 700, loss[loss=0.1211, simple_loss=0.1527, pruned_loss=0.04477, over 5741.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1453, pruned_loss=0.04621, over 1053874.57 frames. ], batch size: 31, lr: 7.49e-03, grad_scale: 16.0 2022-11-16 03:37:33,752 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.643e+01 1.493e+02 1.779e+02 2.171e+02 7.161e+02, threshold=3.558e+02, percent-clipped=3.0 2022-11-16 03:38:33,547 INFO [train.py:876] (0/4) Epoch 11, batch 800, loss[loss=0.07485, simple_loss=0.1085, pruned_loss=0.02059, over 5571.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1443, pruned_loss=0.04571, over 1066702.20 frames. ], batch size: 16, lr: 7.49e-03, grad_scale: 8.0 2022-11-16 03:38:41,647 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.679e+01 1.505e+02 1.889e+02 2.408e+02 4.187e+02, threshold=3.778e+02, percent-clipped=1.0 2022-11-16 03:38:51,274 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7356, 1.9418, 2.3631, 1.7076, 1.5940, 2.7361, 2.3904, 1.9208], device='cuda:0'), covar=tensor([0.1131, 0.1344, 0.0914, 0.2608, 0.3321, 0.0638, 0.1172, 0.1767], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0082, 0.0085, 0.0094, 0.0069, 0.0062, 0.0071, 0.0082], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 03:38:53,253 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=73550.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:39:09,768 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.78 vs. limit=2.0 2022-11-16 03:39:20,190 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=73590.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:39:25,556 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=73598.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:39:26,007 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.70 vs. limit=2.0 2022-11-16 03:39:40,093 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=73620.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 03:39:40,651 INFO [train.py:876] (0/4) Epoch 11, batch 900, loss[loss=0.1079, simple_loss=0.1385, pruned_loss=0.03871, over 5577.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1444, pruned_loss=0.0456, over 1077870.15 frames. ], batch size: 24, lr: 7.48e-03, grad_scale: 8.0 2022-11-16 03:39:49,606 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.416e+01 1.675e+02 2.016e+02 2.471e+02 4.865e+02, threshold=4.032e+02, percent-clipped=2.0 2022-11-16 03:40:00,140 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=73649.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:40:01,545 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=73651.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:40:12,986 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=73668.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 03:40:47,970 INFO [train.py:876] (0/4) Epoch 11, batch 1000, loss[loss=0.06793, simple_loss=0.09715, pruned_loss=0.01936, over 5044.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1441, pruned_loss=0.04524, over 1078181.19 frames. ], batch size: 7, lr: 7.48e-03, grad_scale: 8.0 2022-11-16 03:40:55,188 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=73732.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:40:55,647 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.576e+01 1.693e+02 2.139e+02 2.600e+02 5.774e+02, threshold=4.279e+02, percent-clipped=7.0 2022-11-16 03:41:36,501 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.6099, 1.1740, 0.8838, 0.7468, 1.0530, 1.0730, 0.6396, 1.0970], device='cuda:0'), covar=tensor([0.0095, 0.0041, 0.0076, 0.0056, 0.0053, 0.0054, 0.0103, 0.0054], device='cuda:0'), in_proj_covar=tensor([0.0058, 0.0052, 0.0054, 0.0055, 0.0056, 0.0049, 0.0051, 0.0048], device='cuda:0'), out_proj_covar=tensor([5.1964e-05, 4.6814e-05, 4.7586e-05, 4.9415e-05, 4.9499e-05, 4.2758e-05, 4.6017e-05, 4.2093e-05], device='cuda:0') 2022-11-16 03:41:36,506 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=73793.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:41:55,286 INFO [train.py:876] (0/4) Epoch 11, batch 1100, loss[loss=0.1072, simple_loss=0.1388, pruned_loss=0.0378, over 5556.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1448, pruned_loss=0.04595, over 1079853.40 frames. ], batch size: 25, lr: 7.47e-03, grad_scale: 8.0 2022-11-16 03:42:02,967 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.033e+02 1.524e+02 1.845e+02 2.203e+02 3.683e+02, threshold=3.689e+02, percent-clipped=0.0 2022-11-16 03:43:01,960 INFO [train.py:876] (0/4) Epoch 11, batch 1200, loss[loss=0.09734, simple_loss=0.1394, pruned_loss=0.02762, over 5570.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1456, pruned_loss=0.04667, over 1080085.24 frames. ], batch size: 25, lr: 7.47e-03, grad_scale: 8.0 2022-11-16 03:43:07,277 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2022-11-16 03:43:10,219 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.261e+01 1.559e+02 1.976e+02 2.426e+02 6.394e+02, threshold=3.952e+02, percent-clipped=4.0 2022-11-16 03:43:10,450 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4799, 1.9809, 2.1624, 2.6250, 2.7453, 2.1349, 1.6540, 2.7560], device='cuda:0'), covar=tensor([0.1909, 0.2819, 0.2133, 0.1256, 0.1323, 0.2878, 0.2491, 0.1112], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0200, 0.0188, 0.0307, 0.0226, 0.0204, 0.0190, 0.0243], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 03:43:18,849 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=73946.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:43:20,843 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=73949.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:43:50,858 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.79 vs. limit=2.0 2022-11-16 03:43:53,689 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=73997.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:44:10,602 INFO [train.py:876] (0/4) Epoch 11, batch 1300, loss[loss=0.09502, simple_loss=0.1273, pruned_loss=0.03137, over 5739.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1449, pruned_loss=0.04593, over 1079880.39 frames. ], batch size: 13, lr: 7.46e-03, grad_scale: 8.0 2022-11-16 03:44:13,715 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2022-11-16 03:44:18,270 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.904e+01 1.577e+02 1.830e+02 2.359e+02 4.082e+02, threshold=3.660e+02, percent-clipped=1.0 2022-11-16 03:44:28,392 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8224, 3.9518, 3.6818, 3.5493, 2.0126, 3.9092, 2.3850, 3.1644], device='cuda:0'), covar=tensor([0.0399, 0.0189, 0.0203, 0.0283, 0.0624, 0.0174, 0.0498, 0.0265], device='cuda:0'), in_proj_covar=tensor([0.0188, 0.0170, 0.0176, 0.0196, 0.0185, 0.0174, 0.0185, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 03:44:38,542 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74064.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:44:53,316 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8275, 1.5208, 1.0580, 0.8499, 1.3918, 1.4333, 0.7122, 1.3485], device='cuda:0'), covar=tensor([0.0062, 0.0037, 0.0052, 0.0065, 0.0048, 0.0042, 0.0091, 0.0049], device='cuda:0'), in_proj_covar=tensor([0.0057, 0.0053, 0.0053, 0.0055, 0.0055, 0.0049, 0.0050, 0.0047], device='cuda:0'), out_proj_covar=tensor([5.1387e-05, 4.7154e-05, 4.6532e-05, 4.9283e-05, 4.9167e-05, 4.2302e-05, 4.5439e-05, 4.2023e-05], device='cuda:0') 2022-11-16 03:44:55,159 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=74088.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:44:56,508 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3629, 1.9042, 1.5056, 1.3470, 0.9153, 1.5603, 1.2184, 1.7495], device='cuda:0'), covar=tensor([0.0924, 0.0453, 0.0899, 0.0948, 0.2201, 0.0879, 0.1517, 0.0616], device='cuda:0'), in_proj_covar=tensor([0.0161, 0.0141, 0.0161, 0.0147, 0.0178, 0.0173, 0.0167, 0.0161], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:44:59,096 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5639, 2.1619, 2.2349, 2.7732, 2.8557, 2.1771, 1.9273, 2.8531], device='cuda:0'), covar=tensor([0.1640, 0.1914, 0.1707, 0.1291, 0.1107, 0.2573, 0.1914, 0.1207], device='cuda:0'), in_proj_covar=tensor([0.0248, 0.0201, 0.0189, 0.0312, 0.0225, 0.0205, 0.0192, 0.0245], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 03:45:00,321 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8897, 1.5165, 1.7743, 1.4066, 1.6007, 1.8340, 1.1920, 1.3867], device='cuda:0'), covar=tensor([0.0049, 0.0068, 0.0025, 0.0057, 0.0063, 0.0083, 0.0051, 0.0055], device='cuda:0'), in_proj_covar=tensor([0.0026, 0.0024, 0.0024, 0.0033, 0.0027, 0.0027, 0.0032, 0.0032], device='cuda:0'), out_proj_covar=tensor([2.3789e-05, 2.2507e-05, 2.2200e-05, 3.2267e-05, 2.5585e-05, 2.5259e-05, 3.0935e-05, 3.0939e-05], device='cuda:0') 2022-11-16 03:45:16,722 INFO [train.py:876] (0/4) Epoch 11, batch 1400, loss[loss=0.147, simple_loss=0.1567, pruned_loss=0.06863, over 4762.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1441, pruned_loss=0.04491, over 1086869.60 frames. ], batch size: 135, lr: 7.46e-03, grad_scale: 8.0 2022-11-16 03:45:19,904 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=74125.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:45:25,523 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 1.555e+02 1.864e+02 2.328e+02 5.952e+02, threshold=3.728e+02, percent-clipped=5.0 2022-11-16 03:46:24,537 INFO [train.py:876] (0/4) Epoch 11, batch 1500, loss[loss=0.09726, simple_loss=0.1369, pruned_loss=0.0288, over 5483.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1439, pruned_loss=0.04505, over 1082777.74 frames. ], batch size: 12, lr: 7.45e-03, grad_scale: 8.0 2022-11-16 03:46:24,703 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2763, 1.4623, 1.1117, 1.0829, 1.0450, 1.7755, 1.5882, 1.4846], device='cuda:0'), covar=tensor([0.1041, 0.0753, 0.1851, 0.1920, 0.1354, 0.0761, 0.1013, 0.1224], device='cuda:0'), in_proj_covar=tensor([0.0171, 0.0183, 0.0167, 0.0186, 0.0185, 0.0199, 0.0170, 0.0189], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:46:32,815 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.551e+02 1.772e+02 2.146e+02 3.863e+02, threshold=3.544e+02, percent-clipped=1.0 2022-11-16 03:46:37,954 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74240.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:46:42,230 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=74246.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:46:46,854 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1522, 2.3292, 3.4899, 3.0226, 4.0774, 2.6897, 3.5256, 4.1398], device='cuda:0'), covar=tensor([0.0747, 0.1620, 0.1168, 0.1878, 0.0501, 0.1434, 0.1155, 0.0710], device='cuda:0'), in_proj_covar=tensor([0.0236, 0.0189, 0.0208, 0.0207, 0.0230, 0.0191, 0.0222, 0.0222], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:46:57,386 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-16 03:47:03,593 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6727, 4.3431, 3.9163, 3.9258, 4.3812, 4.2026, 2.2004, 4.6024], device='cuda:0'), covar=tensor([0.0182, 0.0399, 0.0395, 0.0287, 0.0318, 0.0446, 0.2826, 0.0354], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0087, 0.0086, 0.0081, 0.0104, 0.0089, 0.0133, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:47:15,033 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=74294.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:47:19,869 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=74301.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:47:33,156 INFO [train.py:876] (0/4) Epoch 11, batch 1600, loss[loss=0.0835, simple_loss=0.1231, pruned_loss=0.02194, over 5532.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1438, pruned_loss=0.04493, over 1073678.50 frames. ], batch size: 13, lr: 7.45e-03, grad_scale: 8.0 2022-11-16 03:47:40,983 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.188e+01 1.509e+02 1.863e+02 2.484e+02 5.200e+02, threshold=3.726e+02, percent-clipped=6.0 2022-11-16 03:48:18,288 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=74388.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:48:39,916 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=74420.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:48:40,466 INFO [train.py:876] (0/4) Epoch 11, batch 1700, loss[loss=0.1497, simple_loss=0.1721, pruned_loss=0.06365, over 5249.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1461, pruned_loss=0.04667, over 1079835.83 frames. ], batch size: 79, lr: 7.44e-03, grad_scale: 8.0 2022-11-16 03:48:48,599 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.679e+02 2.068e+02 2.361e+02 5.198e+02, threshold=4.137e+02, percent-clipped=4.0 2022-11-16 03:48:50,639 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=74436.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:49:48,195 INFO [train.py:876] (0/4) Epoch 11, batch 1800, loss[loss=0.08281, simple_loss=0.1206, pruned_loss=0.02253, over 5513.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.1454, pruned_loss=0.04651, over 1080705.20 frames. ], batch size: 13, lr: 7.44e-03, grad_scale: 8.0 2022-11-16 03:49:55,852 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.990e+01 1.615e+02 2.043e+02 2.453e+02 6.860e+02, threshold=4.086e+02, percent-clipped=1.0 2022-11-16 03:49:57,953 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9521, 3.8823, 3.7904, 3.6284, 3.9017, 3.8849, 1.5040, 3.9983], device='cuda:0'), covar=tensor([0.0300, 0.0356, 0.0329, 0.0387, 0.0366, 0.0300, 0.3542, 0.0333], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0087, 0.0087, 0.0081, 0.0104, 0.0089, 0.0133, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:50:14,772 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7895, 1.1466, 1.2993, 0.8462, 1.4048, 1.5411, 0.7727, 1.2260], device='cuda:0'), covar=tensor([0.0178, 0.0448, 0.0300, 0.0828, 0.0445, 0.0201, 0.0702, 0.0438], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0022, 0.0015, 0.0019, 0.0015, 0.0014, 0.0020, 0.0015], device='cuda:0'), out_proj_covar=tensor([7.7388e-05, 1.0596e-04, 8.0100e-05, 9.4553e-05, 8.1437e-05, 7.5978e-05, 9.9807e-05, 7.7501e-05], device='cuda:0') 2022-11-16 03:50:35,529 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7820, 1.9426, 2.3233, 1.8496, 1.4468, 2.9999, 2.3560, 2.0139], device='cuda:0'), covar=tensor([0.1166, 0.1338, 0.0829, 0.2854, 0.2279, 0.0643, 0.1477, 0.1512], device='cuda:0'), in_proj_covar=tensor([0.0095, 0.0083, 0.0084, 0.0093, 0.0069, 0.0061, 0.0071, 0.0082], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 03:50:38,143 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74595.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:50:38,729 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=74596.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:50:39,476 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1471, 1.5881, 1.2824, 1.1949, 1.4628, 1.3624, 1.2075, 1.3817], device='cuda:0'), covar=tensor([0.0059, 0.0047, 0.0071, 0.0054, 0.0047, 0.0043, 0.0078, 0.0056], device='cuda:0'), in_proj_covar=tensor([0.0055, 0.0051, 0.0051, 0.0053, 0.0053, 0.0048, 0.0049, 0.0046], device='cuda:0'), out_proj_covar=tensor([4.9821e-05, 4.5441e-05, 4.5195e-05, 4.7924e-05, 4.7688e-05, 4.1729e-05, 4.4000e-05, 4.0641e-05], device='cuda:0') 2022-11-16 03:50:56,052 INFO [train.py:876] (0/4) Epoch 11, batch 1900, loss[loss=0.1211, simple_loss=0.1541, pruned_loss=0.04405, over 5765.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1437, pruned_loss=0.0441, over 1092273.05 frames. ], batch size: 20, lr: 7.43e-03, grad_scale: 8.0 2022-11-16 03:51:04,167 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.647e+01 1.532e+02 1.879e+02 2.235e+02 4.032e+02, threshold=3.759e+02, percent-clipped=0.0 2022-11-16 03:51:19,907 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=74656.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:51:46,275 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.87 vs. limit=2.0 2022-11-16 03:51:51,225 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0198, 2.3991, 3.5602, 3.3907, 3.9163, 2.5776, 3.5297, 3.9708], device='cuda:0'), covar=tensor([0.0690, 0.1750, 0.1017, 0.1480, 0.0529, 0.1862, 0.1138, 0.0793], device='cuda:0'), in_proj_covar=tensor([0.0240, 0.0194, 0.0213, 0.0211, 0.0237, 0.0197, 0.0226, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:52:00,727 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1878, 4.5315, 2.5800, 4.3204, 3.6210, 2.7574, 2.2559, 3.9582], device='cuda:0'), covar=tensor([0.2288, 0.0382, 0.1806, 0.0525, 0.0759, 0.1538, 0.2456, 0.0433], device='cuda:0'), in_proj_covar=tensor([0.0159, 0.0140, 0.0161, 0.0146, 0.0179, 0.0171, 0.0166, 0.0160], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:52:03,038 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=74720.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:52:03,554 INFO [train.py:876] (0/4) Epoch 11, batch 2000, loss[loss=0.08687, simple_loss=0.1018, pruned_loss=0.03596, over 5054.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.1444, pruned_loss=0.04555, over 1087054.62 frames. ], batch size: 7, lr: 7.43e-03, grad_scale: 8.0 2022-11-16 03:52:09,101 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.25 vs. limit=2.0 2022-11-16 03:52:12,066 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.989e+01 1.482e+02 1.897e+02 2.343e+02 3.956e+02, threshold=3.795e+02, percent-clipped=2.0 2022-11-16 03:52:35,421 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=74768.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:52:53,180 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1543, 4.0179, 3.5044, 3.3068, 1.7645, 3.6418, 2.0089, 3.1807], device='cuda:0'), covar=tensor([0.0585, 0.0188, 0.0187, 0.0378, 0.0795, 0.0192, 0.0608, 0.0174], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0173, 0.0177, 0.0199, 0.0190, 0.0176, 0.0188, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 03:53:11,057 INFO [train.py:876] (0/4) Epoch 11, batch 2100, loss[loss=0.09412, simple_loss=0.1325, pruned_loss=0.02788, over 5531.00 frames. ], tot_loss[loss=0.118, simple_loss=0.1443, pruned_loss=0.04584, over 1088354.00 frames. ], batch size: 21, lr: 7.42e-03, grad_scale: 8.0 2022-11-16 03:53:19,079 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.996e+01 1.479e+02 1.848e+02 2.338e+02 4.200e+02, threshold=3.697e+02, percent-clipped=1.0 2022-11-16 03:53:40,141 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.74 vs. limit=2.0 2022-11-16 03:54:02,275 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=74896.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:54:10,815 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74909.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:54:18,890 INFO [train.py:876] (0/4) Epoch 11, batch 2200, loss[loss=0.118, simple_loss=0.1353, pruned_loss=0.05031, over 5128.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1447, pruned_loss=0.0461, over 1088716.72 frames. ], batch size: 91, lr: 7.42e-03, grad_scale: 8.0 2022-11-16 03:54:21,746 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8995, 2.2967, 2.7355, 3.6409, 3.3945, 2.8983, 2.5482, 3.7442], device='cuda:0'), covar=tensor([0.0586, 0.3350, 0.2208, 0.2899, 0.1654, 0.2663, 0.2144, 0.0877], device='cuda:0'), in_proj_covar=tensor([0.0248, 0.0199, 0.0187, 0.0308, 0.0222, 0.0203, 0.0191, 0.0242], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 03:54:26,979 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.575e+02 1.874e+02 2.286e+02 4.723e+02, threshold=3.748e+02, percent-clipped=3.0 2022-11-16 03:54:34,897 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=74944.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:54:39,518 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=74951.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:54:48,161 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=74964.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:54:52,441 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=74970.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:55:12,916 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-75000.pt 2022-11-16 03:55:30,347 INFO [train.py:876] (0/4) Epoch 11, batch 2300, loss[loss=0.09967, simple_loss=0.1335, pruned_loss=0.03293, over 5559.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1438, pruned_loss=0.04562, over 1092365.45 frames. ], batch size: 14, lr: 7.41e-03, grad_scale: 8.0 2022-11-16 03:55:33,370 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=75025.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:55:38,336 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 1.570e+02 1.933e+02 2.290e+02 4.748e+02, threshold=3.866e+02, percent-clipped=2.0 2022-11-16 03:55:52,390 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4916, 5.2812, 4.7836, 5.3677, 5.3215, 4.6119, 4.7731, 4.5096], device='cuda:0'), covar=tensor([0.0292, 0.0437, 0.1287, 0.0344, 0.0307, 0.0292, 0.0303, 0.0570], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0172, 0.0267, 0.0169, 0.0212, 0.0168, 0.0183, 0.0172], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:56:09,006 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.96 vs. limit=5.0 2022-11-16 03:56:37,692 INFO [train.py:876] (0/4) Epoch 11, batch 2400, loss[loss=0.1177, simple_loss=0.1366, pruned_loss=0.04941, over 5780.00 frames. ], tot_loss[loss=0.1172, simple_loss=0.1438, pruned_loss=0.04534, over 1092702.90 frames. ], batch size: 27, lr: 7.41e-03, grad_scale: 8.0 2022-11-16 03:56:45,362 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.612e+02 2.010e+02 2.396e+02 4.325e+02, threshold=4.021e+02, percent-clipped=4.0 2022-11-16 03:57:05,087 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9351, 2.5839, 2.3106, 1.5061, 2.6694, 2.8606, 2.7707, 2.9204], device='cuda:0'), covar=tensor([0.1766, 0.1478, 0.1343, 0.2798, 0.0800, 0.1000, 0.0512, 0.0931], device='cuda:0'), in_proj_covar=tensor([0.0167, 0.0180, 0.0163, 0.0183, 0.0182, 0.0197, 0.0168, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:57:45,188 INFO [train.py:876] (0/4) Epoch 11, batch 2500, loss[loss=0.1472, simple_loss=0.1449, pruned_loss=0.07472, over 4060.00 frames. ], tot_loss[loss=0.1141, simple_loss=0.1417, pruned_loss=0.04325, over 1090975.76 frames. ], batch size: 181, lr: 7.40e-03, grad_scale: 8.0 2022-11-16 03:57:50,334 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=75228.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:57:53,348 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.180e+01 1.567e+02 1.927e+02 2.439e+02 5.845e+02, threshold=3.854e+02, percent-clipped=5.0 2022-11-16 03:58:00,043 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6689, 4.7473, 3.7141, 2.2129, 4.3382, 2.0545, 4.3239, 2.7307], device='cuda:0'), covar=tensor([0.1523, 0.0157, 0.0433, 0.2144, 0.0229, 0.1714, 0.0188, 0.1473], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0105, 0.0113, 0.0113, 0.0100, 0.0122, 0.0100, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 03:58:05,556 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75251.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:58:14,711 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=75265.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:58:18,689 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=75270.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:58:31,499 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=75289.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:58:38,025 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75299.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:58:52,022 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=75320.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 03:58:52,581 INFO [train.py:876] (0/4) Epoch 11, batch 2600, loss[loss=0.1229, simple_loss=0.144, pruned_loss=0.05084, over 5556.00 frames. ], tot_loss[loss=0.1128, simple_loss=0.1407, pruned_loss=0.04242, over 1089933.99 frames. ], batch size: 46, lr: 7.40e-03, grad_scale: 8.0 2022-11-16 03:58:53,665 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2485, 4.7507, 4.3685, 4.9215, 4.8763, 3.9834, 4.4863, 4.0388], device='cuda:0'), covar=tensor([0.0317, 0.0569, 0.1569, 0.0329, 0.0410, 0.0459, 0.0589, 0.0554], device='cuda:0'), in_proj_covar=tensor([0.0135, 0.0176, 0.0273, 0.0171, 0.0217, 0.0172, 0.0186, 0.0175], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:58:59,934 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=75331.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 03:59:01,316 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.425e+02 1.750e+02 2.205e+02 4.754e+02, threshold=3.499e+02, percent-clipped=3.0 2022-11-16 03:59:27,611 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0296, 3.9826, 4.0086, 4.1512, 3.5651, 3.2735, 4.4886, 3.9516], device='cuda:0'), covar=tensor([0.0430, 0.0785, 0.0612, 0.1030, 0.0599, 0.0487, 0.0661, 0.0567], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0106, 0.0092, 0.0115, 0.0087, 0.0077, 0.0141, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 03:59:36,295 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5920, 4.5236, 4.6293, 4.7039, 4.0576, 4.0238, 5.1192, 4.5760], device='cuda:0'), covar=tensor([0.0367, 0.0734, 0.0331, 0.0942, 0.0530, 0.0302, 0.0510, 0.0413], device='cuda:0'), in_proj_covar=tensor([0.0084, 0.0106, 0.0092, 0.0115, 0.0087, 0.0077, 0.0141, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:00:00,556 INFO [train.py:876] (0/4) Epoch 11, batch 2700, loss[loss=0.1069, simple_loss=0.1442, pruned_loss=0.03476, over 5793.00 frames. ], tot_loss[loss=0.1138, simple_loss=0.1416, pruned_loss=0.04303, over 1091656.41 frames. ], batch size: 21, lr: 7.39e-03, grad_scale: 8.0 2022-11-16 04:00:08,243 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.727e+01 1.476e+02 1.842e+02 2.376e+02 5.290e+02, threshold=3.683e+02, percent-clipped=4.0 2022-11-16 04:00:21,780 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7042, 1.2035, 1.1715, 1.0091, 1.3205, 1.6798, 0.8733, 1.2616], device='cuda:0'), covar=tensor([0.0393, 0.0347, 0.0523, 0.0880, 0.0597, 0.0646, 0.0805, 0.0450], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0023, 0.0016, 0.0020, 0.0016, 0.0015, 0.0021, 0.0015], device='cuda:0'), out_proj_covar=tensor([8.0188e-05, 1.1067e-04, 8.3055e-05, 9.9786e-05, 8.4612e-05, 8.0289e-05, 1.0356e-04, 8.0686e-05], device='cuda:0') 2022-11-16 04:00:47,110 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-16 04:01:07,891 INFO [train.py:876] (0/4) Epoch 11, batch 2800, loss[loss=0.08365, simple_loss=0.1173, pruned_loss=0.02501, over 5752.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.1419, pruned_loss=0.04387, over 1084167.42 frames. ], batch size: 16, lr: 7.39e-03, grad_scale: 16.0 2022-11-16 04:01:15,808 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.621e+01 1.514e+02 1.754e+02 2.242e+02 3.721e+02, threshold=3.509e+02, percent-clipped=2.0 2022-11-16 04:01:22,745 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8041, 1.3983, 2.1004, 1.7407, 1.1227, 1.5972, 1.9818, 1.8810], device='cuda:0'), covar=tensor([0.0117, 0.0168, 0.0043, 0.0062, 0.0106, 0.0235, 0.0050, 0.0048], device='cuda:0'), in_proj_covar=tensor([0.0026, 0.0025, 0.0025, 0.0034, 0.0028, 0.0027, 0.0032, 0.0032], device='cuda:0'), out_proj_covar=tensor([2.4351e-05, 2.3040e-05, 2.2876e-05, 3.3171e-05, 2.6081e-05, 2.5975e-05, 3.1410e-05, 3.1223e-05], device='cuda:0') 2022-11-16 04:01:37,952 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75565.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:01:44,656 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.6920, 1.0374, 0.6655, 0.7947, 0.9246, 1.0561, 0.5111, 1.1303], device='cuda:0'), covar=tensor([0.0081, 0.0034, 0.0062, 0.0047, 0.0049, 0.0049, 0.0099, 0.0048], device='cuda:0'), in_proj_covar=tensor([0.0056, 0.0051, 0.0052, 0.0055, 0.0055, 0.0049, 0.0049, 0.0047], device='cuda:0'), out_proj_covar=tensor([5.0636e-05, 4.5663e-05, 4.6024e-05, 4.9666e-05, 4.8847e-05, 4.3007e-05, 4.4515e-05, 4.1482e-05], device='cuda:0') 2022-11-16 04:01:50,645 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=75584.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:01:58,852 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.54 vs. limit=2.0 2022-11-16 04:02:10,819 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75613.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:02:10,834 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0513, 3.9726, 4.0773, 4.1240, 3.7636, 3.6045, 4.4423, 3.9115], device='cuda:0'), covar=tensor([0.0394, 0.0718, 0.0383, 0.0973, 0.0535, 0.0368, 0.0627, 0.0676], device='cuda:0'), in_proj_covar=tensor([0.0085, 0.0108, 0.0092, 0.0117, 0.0088, 0.0077, 0.0143, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:02:15,420 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75620.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:02:15,978 INFO [train.py:876] (0/4) Epoch 11, batch 2900, loss[loss=0.1426, simple_loss=0.1594, pruned_loss=0.06288, over 5590.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.141, pruned_loss=0.04412, over 1071682.87 frames. ], batch size: 43, lr: 7.38e-03, grad_scale: 16.0 2022-11-16 04:02:19,367 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=75626.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 04:02:23,733 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 1.563e+02 1.912e+02 2.291e+02 3.744e+02, threshold=3.824e+02, percent-clipped=2.0 2022-11-16 04:02:32,070 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0271, 3.2299, 2.8978, 3.1400, 2.7978, 3.2101, 3.3084, 3.8388], device='cuda:0'), covar=tensor([0.0689, 0.1227, 0.1383, 0.0828, 0.1387, 0.0683, 0.1115, 0.1256], device='cuda:0'), in_proj_covar=tensor([0.0107, 0.0102, 0.0102, 0.0098, 0.0090, 0.0098, 0.0094, 0.0077], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:02:47,900 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75668.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:03:09,903 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.55 vs. limit=5.0 2022-11-16 04:03:16,425 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.52 vs. limit=5.0 2022-11-16 04:03:23,300 INFO [train.py:876] (0/4) Epoch 11, batch 3000, loss[loss=0.07257, simple_loss=0.1025, pruned_loss=0.02133, over 5710.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1411, pruned_loss=0.04403, over 1073406.63 frames. ], batch size: 11, lr: 7.38e-03, grad_scale: 16.0 2022-11-16 04:03:23,301 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 04:03:29,729 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9268, 4.6445, 4.6973, 4.6754, 5.0184, 4.9785, 4.6416, 5.0029], device='cuda:0'), covar=tensor([0.0326, 0.0358, 0.0478, 0.0463, 0.0327, 0.0199, 0.0338, 0.0408], device='cuda:0'), in_proj_covar=tensor([0.0140, 0.0146, 0.0107, 0.0142, 0.0171, 0.0101, 0.0122, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:03:33,453 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7599, 3.6358, 3.5986, 3.5379, 3.8531, 3.7394, 3.8672, 3.8972], device='cuda:0'), covar=tensor([0.0438, 0.0418, 0.0534, 0.0531, 0.0509, 0.0285, 0.0267, 0.0456], device='cuda:0'), in_proj_covar=tensor([0.0140, 0.0146, 0.0107, 0.0142, 0.0171, 0.0101, 0.0122, 0.0147], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:03:40,544 INFO [train.py:908] (0/4) Epoch 11, validation: loss=0.1699, simple_loss=0.1855, pruned_loss=0.07718, over 1530663.00 frames. 2022-11-16 04:03:40,544 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 04:03:48,303 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.213e+02 1.514e+02 1.845e+02 2.226e+02 5.649e+02, threshold=3.690e+02, percent-clipped=5.0 2022-11-16 04:04:19,240 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6724, 3.6239, 3.5415, 3.7771, 3.5776, 3.3522, 4.0801, 3.6453], device='cuda:0'), covar=tensor([0.0469, 0.0761, 0.0502, 0.0941, 0.0520, 0.0400, 0.0696, 0.0668], device='cuda:0'), in_proj_covar=tensor([0.0085, 0.0108, 0.0093, 0.0118, 0.0089, 0.0078, 0.0144, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:04:20,016 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9255, 1.8111, 1.8680, 1.2630, 1.6969, 1.3855, 1.6109, 1.8254], device='cuda:0'), covar=tensor([0.0060, 0.0061, 0.0041, 0.0062, 0.0052, 0.0045, 0.0041, 0.0077], device='cuda:0'), in_proj_covar=tensor([0.0057, 0.0051, 0.0052, 0.0056, 0.0055, 0.0049, 0.0050, 0.0047], device='cuda:0'), out_proj_covar=tensor([5.1061e-05, 4.5478e-05, 4.6334e-05, 5.0066e-05, 4.9003e-05, 4.3078e-05, 4.4744e-05, 4.1495e-05], device='cuda:0') 2022-11-16 04:04:29,857 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1375, 4.0629, 3.9932, 4.2514, 4.0573, 3.8392, 4.6588, 4.0518], device='cuda:0'), covar=tensor([0.0439, 0.0778, 0.0391, 0.0923, 0.0445, 0.0293, 0.0596, 0.0631], device='cuda:0'), in_proj_covar=tensor([0.0085, 0.0108, 0.0093, 0.0118, 0.0089, 0.0077, 0.0144, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:04:39,841 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7350, 1.6207, 1.7632, 1.4258, 1.4356, 1.5080, 1.3071, 1.8205], device='cuda:0'), covar=tensor([0.0051, 0.0060, 0.0035, 0.0057, 0.0054, 0.0034, 0.0045, 0.0045], device='cuda:0'), in_proj_covar=tensor([0.0056, 0.0051, 0.0052, 0.0055, 0.0055, 0.0049, 0.0050, 0.0047], device='cuda:0'), out_proj_covar=tensor([5.0844e-05, 4.5101e-05, 4.6133e-05, 4.9859e-05, 4.8745e-05, 4.3083e-05, 4.4769e-05, 4.1300e-05], device='cuda:0') 2022-11-16 04:04:49,201 INFO [train.py:876] (0/4) Epoch 11, batch 3100, loss[loss=0.1181, simple_loss=0.1523, pruned_loss=0.04193, over 5747.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.142, pruned_loss=0.04388, over 1075762.67 frames. ], batch size: 21, lr: 7.37e-03, grad_scale: 16.0 2022-11-16 04:04:56,948 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.177e+01 1.516e+02 1.803e+02 2.135e+02 3.632e+02, threshold=3.607e+02, percent-clipped=0.0 2022-11-16 04:05:31,462 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75884.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:05:33,447 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5867, 3.0934, 4.1999, 3.9109, 4.9369, 3.1113, 4.2782, 4.8602], device='cuda:0'), covar=tensor([0.0524, 0.1283, 0.0810, 0.1174, 0.0235, 0.1442, 0.1036, 0.0456], device='cuda:0'), in_proj_covar=tensor([0.0243, 0.0196, 0.0214, 0.0213, 0.0240, 0.0199, 0.0228, 0.0231], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:05:55,894 INFO [train.py:876] (0/4) Epoch 11, batch 3200, loss[loss=0.1511, simple_loss=0.1788, pruned_loss=0.06171, over 5555.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1406, pruned_loss=0.04297, over 1081253.29 frames. ], batch size: 46, lr: 7.37e-03, grad_scale: 16.0 2022-11-16 04:05:59,614 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=75926.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 04:06:04,137 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75932.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:06:04,812 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.709e+01 1.702e+02 2.039e+02 2.411e+02 4.513e+02, threshold=4.077e+02, percent-clipped=5.0 2022-11-16 04:06:31,889 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=75974.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:06:35,881 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6062, 1.4079, 1.6037, 1.4481, 1.4815, 1.5184, 1.3466, 1.8375], device='cuda:0'), covar=tensor([0.0064, 0.0057, 0.0046, 0.0058, 0.0054, 0.0046, 0.0052, 0.0049], device='cuda:0'), in_proj_covar=tensor([0.0058, 0.0052, 0.0054, 0.0057, 0.0056, 0.0051, 0.0051, 0.0048], device='cuda:0'), out_proj_covar=tensor([5.2135e-05, 4.6118e-05, 4.7342e-05, 5.1269e-05, 4.9854e-05, 4.4172e-05, 4.6010e-05, 4.2502e-05], device='cuda:0') 2022-11-16 04:06:45,302 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3468, 3.9326, 4.1471, 3.8997, 4.4347, 4.1530, 4.0277, 4.3918], device='cuda:0'), covar=tensor([0.0395, 0.0407, 0.0526, 0.0404, 0.0368, 0.0331, 0.0299, 0.0404], device='cuda:0'), in_proj_covar=tensor([0.0141, 0.0148, 0.0107, 0.0144, 0.0172, 0.0103, 0.0122, 0.0149], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:07:03,629 INFO [train.py:876] (0/4) Epoch 11, batch 3300, loss[loss=0.118, simple_loss=0.149, pruned_loss=0.04343, over 5816.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.1418, pruned_loss=0.04388, over 1084090.84 frames. ], batch size: 18, lr: 7.36e-03, grad_scale: 16.0 2022-11-16 04:07:11,807 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.536e+01 1.447e+02 1.827e+02 2.353e+02 6.584e+02, threshold=3.655e+02, percent-clipped=2.0 2022-11-16 04:07:27,290 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7495, 2.4319, 2.6923, 3.6324, 3.6391, 2.8297, 2.3897, 3.6068], device='cuda:0'), covar=tensor([0.0595, 0.3127, 0.2299, 0.3143, 0.1056, 0.3129, 0.2538, 0.0913], device='cuda:0'), in_proj_covar=tensor([0.0247, 0.0202, 0.0191, 0.0307, 0.0223, 0.0202, 0.0190, 0.0245], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 04:07:48,031 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5088, 1.6046, 1.8134, 1.7686, 1.6097, 1.4088, 1.6931, 1.5752], device='cuda:0'), covar=tensor([0.2346, 0.2203, 0.1960, 0.1426, 0.1769, 0.2760, 0.1796, 0.1062], device='cuda:0'), in_proj_covar=tensor([0.0108, 0.0105, 0.0105, 0.0101, 0.0092, 0.0101, 0.0096, 0.0078], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:08:11,730 INFO [train.py:876] (0/4) Epoch 11, batch 3400, loss[loss=0.135, simple_loss=0.1568, pruned_loss=0.05656, over 5525.00 frames. ], tot_loss[loss=0.1141, simple_loss=0.1412, pruned_loss=0.04347, over 1087411.22 frames. ], batch size: 49, lr: 7.36e-03, grad_scale: 8.0 2022-11-16 04:08:20,082 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.517e+02 1.870e+02 2.344e+02 4.526e+02, threshold=3.741e+02, percent-clipped=4.0 2022-11-16 04:08:38,487 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.08 vs. limit=2.0 2022-11-16 04:08:38,581 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2022-11-16 04:08:40,771 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8338, 4.3555, 3.4567, 1.9896, 3.9937, 1.8940, 3.9474, 2.4265], device='cuda:0'), covar=tensor([0.1137, 0.0134, 0.0505, 0.1923, 0.0188, 0.1651, 0.0210, 0.1388], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0104, 0.0113, 0.0114, 0.0101, 0.0123, 0.0099, 0.0112], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:08:43,353 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9819, 4.4228, 4.0127, 4.4264, 4.4052, 3.6720, 4.0797, 3.8221], device='cuda:0'), covar=tensor([0.0456, 0.0336, 0.1450, 0.0372, 0.0423, 0.0482, 0.0620, 0.0495], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0173, 0.0270, 0.0170, 0.0217, 0.0169, 0.0184, 0.0172], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:08:52,049 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9813, 4.5163, 4.8777, 4.4984, 5.0428, 4.9479, 4.4444, 5.0952], device='cuda:0'), covar=tensor([0.0413, 0.0349, 0.0372, 0.0319, 0.0366, 0.0186, 0.0233, 0.0241], device='cuda:0'), in_proj_covar=tensor([0.0141, 0.0147, 0.0107, 0.0143, 0.0172, 0.0103, 0.0122, 0.0148], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:09:14,411 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3832, 2.0827, 2.5172, 1.8614, 1.4589, 3.2815, 2.5779, 2.2326], device='cuda:0'), covar=tensor([0.1053, 0.1543, 0.0942, 0.2457, 0.4466, 0.0846, 0.1227, 0.1273], device='cuda:0'), in_proj_covar=tensor([0.0096, 0.0086, 0.0086, 0.0096, 0.0068, 0.0064, 0.0072, 0.0083], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 04:09:15,729 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2394, 1.7627, 1.2908, 1.2446, 1.3724, 1.5555, 1.1454, 1.5096], device='cuda:0'), covar=tensor([0.0070, 0.0047, 0.0053, 0.0048, 0.0054, 0.0033, 0.0043, 0.0065], device='cuda:0'), in_proj_covar=tensor([0.0057, 0.0051, 0.0053, 0.0055, 0.0055, 0.0050, 0.0050, 0.0047], device='cuda:0'), out_proj_covar=tensor([5.1284e-05, 4.5192e-05, 4.6512e-05, 4.9851e-05, 4.8941e-05, 4.3324e-05, 4.4801e-05, 4.1719e-05], device='cuda:0') 2022-11-16 04:09:19,518 INFO [train.py:876] (0/4) Epoch 11, batch 3500, loss[loss=0.163, simple_loss=0.1775, pruned_loss=0.07427, over 5465.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.143, pruned_loss=0.04534, over 1081504.38 frames. ], batch size: 49, lr: 7.35e-03, grad_scale: 8.0 2022-11-16 04:09:20,377 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1562, 2.0280, 2.0640, 2.1006, 1.9548, 1.6990, 2.0331, 2.4166], device='cuda:0'), covar=tensor([0.1401, 0.1852, 0.2035, 0.1480, 0.1546, 0.2477, 0.1796, 0.1103], device='cuda:0'), in_proj_covar=tensor([0.0107, 0.0104, 0.0104, 0.0100, 0.0091, 0.0100, 0.0095, 0.0076], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:09:23,000 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5908, 1.7482, 1.6569, 2.1868, 1.9868, 1.1302, 1.9104, 2.0810], device='cuda:0'), covar=tensor([0.2216, 0.0307, 0.0908, 0.0244, 0.1239, 0.1683, 0.0315, 0.0265], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0022, 0.0015, 0.0019, 0.0015, 0.0014, 0.0021, 0.0015], device='cuda:0'), out_proj_covar=tensor([7.8282e-05, 1.0759e-04, 8.1382e-05, 9.5645e-05, 8.2123e-05, 7.7544e-05, 1.0154e-04, 7.9029e-05], device='cuda:0') 2022-11-16 04:09:27,972 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.997e+01 1.638e+02 2.033e+02 2.357e+02 4.512e+02, threshold=4.066e+02, percent-clipped=3.0 2022-11-16 04:09:52,287 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=76269.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 04:10:08,545 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=76294.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:10:27,147 INFO [train.py:876] (0/4) Epoch 11, batch 3600, loss[loss=0.1515, simple_loss=0.1744, pruned_loss=0.06428, over 5461.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1422, pruned_loss=0.04465, over 1080581.93 frames. ], batch size: 53, lr: 7.35e-03, grad_scale: 8.0 2022-11-16 04:10:28,589 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=76323.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:10:33,282 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=76330.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 04:10:35,632 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.020e+02 1.560e+02 1.906e+02 2.408e+02 5.224e+02, threshold=3.812e+02, percent-clipped=4.0 2022-11-16 04:10:49,952 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=76355.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:10:58,113 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1391, 2.7198, 2.5210, 1.4622, 2.6148, 3.0539, 2.8827, 3.2280], device='cuda:0'), covar=tensor([0.1667, 0.1549, 0.1226, 0.2797, 0.0843, 0.0851, 0.0487, 0.0816], device='cuda:0'), in_proj_covar=tensor([0.0169, 0.0182, 0.0165, 0.0182, 0.0181, 0.0197, 0.0167, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:11:01,918 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9009, 4.4543, 4.7429, 4.3765, 4.9509, 4.7978, 4.3340, 4.9396], device='cuda:0'), covar=tensor([0.0301, 0.0309, 0.0403, 0.0342, 0.0287, 0.0211, 0.0265, 0.0265], device='cuda:0'), in_proj_covar=tensor([0.0138, 0.0145, 0.0107, 0.0142, 0.0171, 0.0102, 0.0121, 0.0146], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:11:09,733 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=76384.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 04:11:13,910 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.76 vs. limit=2.0 2022-11-16 04:11:29,424 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2346, 2.1068, 2.0632, 2.1448, 1.9959, 1.4546, 2.0760, 2.5064], device='cuda:0'), covar=tensor([0.1535, 0.2104, 0.2128, 0.1427, 0.1803, 0.2976, 0.1632, 0.1062], device='cuda:0'), in_proj_covar=tensor([0.0106, 0.0102, 0.0102, 0.0099, 0.0089, 0.0098, 0.0094, 0.0075], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:11:35,222 INFO [train.py:876] (0/4) Epoch 11, batch 3700, loss[loss=0.1329, simple_loss=0.1502, pruned_loss=0.05778, over 5563.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1426, pruned_loss=0.04491, over 1078787.50 frames. ], batch size: 40, lr: 7.34e-03, grad_scale: 8.0 2022-11-16 04:11:43,592 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.959e+01 1.528e+02 1.916e+02 2.228e+02 3.767e+02, threshold=3.832e+02, percent-clipped=0.0 2022-11-16 04:12:12,432 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5222, 4.8444, 4.3962, 4.8722, 4.8094, 4.0600, 4.6155, 4.3772], device='cuda:0'), covar=tensor([0.0261, 0.0504, 0.1480, 0.0472, 0.0501, 0.0621, 0.0390, 0.0609], device='cuda:0'), in_proj_covar=tensor([0.0136, 0.0178, 0.0275, 0.0173, 0.0219, 0.0173, 0.0186, 0.0175], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:12:41,655 INFO [train.py:876] (0/4) Epoch 11, batch 3800, loss[loss=0.107, simple_loss=0.1492, pruned_loss=0.03237, over 5537.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1417, pruned_loss=0.04351, over 1081066.20 frames. ], batch size: 17, lr: 7.34e-03, grad_scale: 8.0 2022-11-16 04:12:42,314 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4928, 2.3750, 2.3098, 2.4556, 2.5332, 2.3515, 2.6899, 2.4807], device='cuda:0'), covar=tensor([0.0536, 0.0895, 0.0652, 0.1116, 0.0613, 0.0473, 0.0899, 0.0823], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0110, 0.0095, 0.0120, 0.0091, 0.0080, 0.0147, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:12:50,477 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.425e+01 1.575e+02 2.020e+02 2.543e+02 6.057e+02, threshold=4.040e+02, percent-clipped=8.0 2022-11-16 04:13:25,616 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2022-11-16 04:13:25,666 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2022-11-16 04:13:49,820 INFO [train.py:876] (0/4) Epoch 11, batch 3900, loss[loss=0.1523, simple_loss=0.1646, pruned_loss=0.07002, over 4978.00 frames. ], tot_loss[loss=0.1138, simple_loss=0.1416, pruned_loss=0.04304, over 1079973.57 frames. ], batch size: 109, lr: 7.33e-03, grad_scale: 8.0 2022-11-16 04:13:51,067 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-16 04:13:53,082 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=76625.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 04:13:57,012 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3560, 1.6703, 2.1074, 1.7683, 1.4087, 2.5333, 2.0682, 1.7126], device='cuda:0'), covar=tensor([0.1735, 0.1538, 0.1110, 0.2095, 0.2532, 0.0996, 0.1230, 0.1930], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0088, 0.0089, 0.0099, 0.0071, 0.0066, 0.0075, 0.0086], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 04:13:59,641 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.483e+01 1.484e+02 1.748e+02 2.175e+02 4.162e+02, threshold=3.496e+02, percent-clipped=1.0 2022-11-16 04:14:11,303 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=76650.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:14:31,194 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=76679.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 04:14:59,830 INFO [train.py:876] (0/4) Epoch 11, batch 4000, loss[loss=0.09597, simple_loss=0.1313, pruned_loss=0.03031, over 5597.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1416, pruned_loss=0.04318, over 1085703.14 frames. ], batch size: 18, lr: 7.33e-03, grad_scale: 8.0 2022-11-16 04:15:08,073 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.537e+02 1.803e+02 2.088e+02 3.858e+02, threshold=3.606e+02, percent-clipped=2.0 2022-11-16 04:15:52,126 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2022-11-16 04:16:00,091 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.74 vs. limit=2.0 2022-11-16 04:16:06,433 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.48 vs. limit=2.0 2022-11-16 04:16:07,301 INFO [train.py:876] (0/4) Epoch 11, batch 4100, loss[loss=0.07087, simple_loss=0.1076, pruned_loss=0.01708, over 5325.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.141, pruned_loss=0.04322, over 1089907.33 frames. ], batch size: 9, lr: 7.32e-03, grad_scale: 8.0 2022-11-16 04:16:15,769 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.454e+02 1.745e+02 2.235e+02 4.051e+02, threshold=3.490e+02, percent-clipped=2.0 2022-11-16 04:16:19,237 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=76839.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:16:43,907 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8966, 1.4741, 2.1832, 1.6262, 1.9455, 2.1405, 1.6054, 1.6823], device='cuda:0'), covar=tensor([0.0060, 0.0085, 0.0033, 0.0049, 0.0070, 0.0047, 0.0043, 0.0044], device='cuda:0'), in_proj_covar=tensor([0.0026, 0.0024, 0.0024, 0.0033, 0.0028, 0.0026, 0.0032, 0.0031], device='cuda:0'), out_proj_covar=tensor([2.4278e-05, 2.2933e-05, 2.1855e-05, 3.2225e-05, 2.5747e-05, 2.5098e-05, 3.1497e-05, 2.9842e-05], device='cuda:0') 2022-11-16 04:17:00,350 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=76900.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:17:14,931 INFO [train.py:876] (0/4) Epoch 11, batch 4200, loss[loss=0.07938, simple_loss=0.1213, pruned_loss=0.01873, over 5432.00 frames. ], tot_loss[loss=0.1142, simple_loss=0.1415, pruned_loss=0.04339, over 1086164.57 frames. ], batch size: 11, lr: 7.32e-03, grad_scale: 8.0 2022-11-16 04:17:17,656 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=76925.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 04:17:23,251 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 1.372e+02 1.800e+02 2.122e+02 4.143e+02, threshold=3.599e+02, percent-clipped=4.0 2022-11-16 04:17:34,003 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=76950.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:17:49,764 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=76973.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:17:53,747 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=76979.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 04:18:01,645 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8585, 4.8273, 3.2970, 4.5710, 3.6746, 3.2733, 2.6784, 4.0414], device='cuda:0'), covar=tensor([0.1188, 0.0153, 0.0836, 0.0359, 0.0575, 0.0818, 0.1581, 0.0297], device='cuda:0'), in_proj_covar=tensor([0.0157, 0.0141, 0.0159, 0.0145, 0.0176, 0.0167, 0.0162, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:18:06,028 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=76998.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:18:21,656 INFO [train.py:876] (0/4) Epoch 11, batch 4300, loss[loss=0.06809, simple_loss=0.1089, pruned_loss=0.01365, over 5708.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1425, pruned_loss=0.04423, over 1080815.37 frames. ], batch size: 12, lr: 7.31e-03, grad_scale: 8.0 2022-11-16 04:18:25,913 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77027.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:18:30,434 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.745e+01 1.515e+02 1.890e+02 2.345e+02 3.579e+02, threshold=3.779e+02, percent-clipped=0.0 2022-11-16 04:18:41,676 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4671, 4.0145, 3.6129, 4.0459, 4.0753, 3.4133, 3.6080, 3.5083], device='cuda:0'), covar=tensor([0.0921, 0.0520, 0.1328, 0.0440, 0.0493, 0.0481, 0.0667, 0.0666], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0177, 0.0273, 0.0172, 0.0215, 0.0170, 0.0186, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:18:49,638 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2022-11-16 04:19:06,642 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.98 vs. limit=5.0 2022-11-16 04:19:11,516 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77095.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:19:13,893 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.74 vs. limit=2.0 2022-11-16 04:19:28,274 INFO [train.py:876] (0/4) Epoch 11, batch 4400, loss[loss=0.0955, simple_loss=0.1363, pruned_loss=0.02734, over 5476.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1424, pruned_loss=0.04379, over 1082035.77 frames. ], batch size: 11, lr: 7.31e-03, grad_scale: 8.0 2022-11-16 04:19:37,949 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.506e+01 1.521e+02 1.875e+02 2.343e+02 5.225e+02, threshold=3.749e+02, percent-clipped=3.0 2022-11-16 04:19:52,381 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77156.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:20:19,050 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77195.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:20:29,852 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77211.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:20:36,410 INFO [train.py:876] (0/4) Epoch 11, batch 4500, loss[loss=0.1516, simple_loss=0.1705, pruned_loss=0.0663, over 5566.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1425, pruned_loss=0.04399, over 1079371.91 frames. ], batch size: 43, lr: 7.31e-03, grad_scale: 8.0 2022-11-16 04:20:45,512 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 1.435e+02 1.838e+02 2.199e+02 4.502e+02, threshold=3.675e+02, percent-clipped=1.0 2022-11-16 04:21:11,318 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77272.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:21:11,930 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77273.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:21:14,572 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77277.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:21:19,777 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.57 vs. limit=5.0 2022-11-16 04:21:44,389 INFO [train.py:876] (0/4) Epoch 11, batch 4600, loss[loss=0.1096, simple_loss=0.1429, pruned_loss=0.03815, over 5602.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.144, pruned_loss=0.0447, over 1077814.04 frames. ], batch size: 23, lr: 7.30e-03, grad_scale: 8.0 2022-11-16 04:21:52,864 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.709e+01 1.569e+02 2.032e+02 2.456e+02 5.240e+02, threshold=4.063e+02, percent-clipped=2.0 2022-11-16 04:21:53,048 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77334.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:21:56,319 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77338.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:22:18,063 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.67 vs. limit=5.0 2022-11-16 04:22:52,445 INFO [train.py:876] (0/4) Epoch 11, batch 4700, loss[loss=0.101, simple_loss=0.1419, pruned_loss=0.03003, over 5682.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1441, pruned_loss=0.04437, over 1083255.00 frames. ], batch size: 36, lr: 7.30e-03, grad_scale: 8.0 2022-11-16 04:23:00,908 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.694e+01 1.424e+02 1.701e+02 2.094e+02 3.279e+02, threshold=3.401e+02, percent-clipped=0.0 2022-11-16 04:23:12,350 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77451.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:23:33,379 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7524, 1.6363, 1.6640, 1.6416, 1.8479, 1.7447, 1.9207, 1.8420], device='cuda:0'), covar=tensor([0.0740, 0.1280, 0.0925, 0.1451, 0.0739, 0.0663, 0.1240, 0.1008], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0109, 0.0095, 0.0120, 0.0089, 0.0080, 0.0145, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:23:42,070 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77495.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:23:45,079 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9816, 2.2844, 3.0393, 2.1951, 1.3196, 3.2105, 2.8770, 2.3434], device='cuda:0'), covar=tensor([0.0667, 0.1209, 0.0567, 0.2643, 0.4674, 0.1725, 0.1345, 0.1473], device='cuda:0'), in_proj_covar=tensor([0.0096, 0.0087, 0.0086, 0.0096, 0.0070, 0.0064, 0.0074, 0.0086], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 04:23:49,134 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8862, 3.0770, 2.7556, 3.1807, 2.4939, 2.8866, 2.9298, 3.6001], device='cuda:0'), covar=tensor([0.1038, 0.1204, 0.1590, 0.1193, 0.1489, 0.1232, 0.1288, 0.4006], device='cuda:0'), in_proj_covar=tensor([0.0107, 0.0103, 0.0103, 0.0100, 0.0091, 0.0100, 0.0095, 0.0077], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:24:00,657 INFO [train.py:876] (0/4) Epoch 11, batch 4800, loss[loss=0.1073, simple_loss=0.1382, pruned_loss=0.03824, over 5746.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1428, pruned_loss=0.04421, over 1080419.83 frames. ], batch size: 31, lr: 7.29e-03, grad_scale: 8.0 2022-11-16 04:24:09,188 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.251e+01 1.590e+02 1.859e+02 2.447e+02 5.021e+02, threshold=3.719e+02, percent-clipped=6.0 2022-11-16 04:24:15,206 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77543.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:24:32,377 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77567.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:24:42,652 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-16 04:24:47,765 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.37 vs. limit=5.0 2022-11-16 04:25:05,972 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77616.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:25:09,012 INFO [train.py:876] (0/4) Epoch 11, batch 4900, loss[loss=0.1252, simple_loss=0.1505, pruned_loss=0.04996, over 5753.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1418, pruned_loss=0.04353, over 1077499.55 frames. ], batch size: 20, lr: 7.29e-03, grad_scale: 8.0 2022-11-16 04:25:14,421 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77629.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:25:17,033 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77633.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:25:17,557 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.666e+01 1.544e+02 1.877e+02 2.554e+02 4.573e+02, threshold=3.753e+02, percent-clipped=4.0 2022-11-16 04:25:46,991 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=77677.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:25:47,010 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77677.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:25:48,235 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1205, 3.8936, 2.9009, 1.7779, 3.7159, 1.5679, 3.5394, 1.9544], device='cuda:0'), covar=tensor([0.1641, 0.0189, 0.0809, 0.2179, 0.0223, 0.2036, 0.0266, 0.1659], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0104, 0.0112, 0.0114, 0.0101, 0.0122, 0.0099, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:25:48,926 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8086, 1.4984, 1.8393, 1.3339, 1.7143, 1.7411, 1.2295, 1.2006], device='cuda:0'), covar=tensor([0.0047, 0.0066, 0.0037, 0.0050, 0.0059, 0.0082, 0.0049, 0.0056], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0025, 0.0025, 0.0034, 0.0028, 0.0027, 0.0033, 0.0031], device='cuda:0'), out_proj_covar=tensor([2.5196e-05, 2.3540e-05, 2.2802e-05, 3.2888e-05, 2.6485e-05, 2.5658e-05, 3.2298e-05, 3.0528e-05], device='cuda:0') 2022-11-16 04:26:16,785 INFO [train.py:876] (0/4) Epoch 11, batch 5000, loss[loss=0.09159, simple_loss=0.1334, pruned_loss=0.02489, over 5750.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1425, pruned_loss=0.04431, over 1078263.48 frames. ], batch size: 15, lr: 7.28e-03, grad_scale: 8.0 2022-11-16 04:26:20,200 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0594, 1.8345, 1.9150, 1.6122, 2.1225, 1.8734, 1.4388, 1.5267], device='cuda:0'), covar=tensor([0.0038, 0.0051, 0.0053, 0.0049, 0.0034, 0.0059, 0.0047, 0.0056], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0025, 0.0025, 0.0033, 0.0028, 0.0027, 0.0033, 0.0031], device='cuda:0'), out_proj_covar=tensor([2.4809e-05, 2.3151e-05, 2.2507e-05, 3.2618e-05, 2.6067e-05, 2.5325e-05, 3.1948e-05, 3.0288e-05], device='cuda:0') 2022-11-16 04:26:25,204 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.970e+01 1.490e+02 1.911e+02 2.304e+02 5.675e+02, threshold=3.822e+02, percent-clipped=3.0 2022-11-16 04:26:28,008 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=77738.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:26:28,280 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.94 vs. limit=2.0 2022-11-16 04:26:36,334 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77751.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 04:26:41,515 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3723, 4.0239, 2.9533, 1.8457, 3.7688, 1.6634, 3.6972, 1.8133], device='cuda:0'), covar=tensor([0.1393, 0.0153, 0.0951, 0.1904, 0.0229, 0.1845, 0.0277, 0.1710], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0102, 0.0111, 0.0112, 0.0100, 0.0120, 0.0098, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:27:09,052 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77799.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:27:23,476 INFO [train.py:876] (0/4) Epoch 11, batch 5100, loss[loss=0.1256, simple_loss=0.1557, pruned_loss=0.04776, over 5742.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1433, pruned_loss=0.04471, over 1076110.67 frames. ], batch size: 27, lr: 7.28e-03, grad_scale: 8.0 2022-11-16 04:27:30,753 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4600, 4.5052, 2.9850, 4.2488, 3.5055, 3.2156, 2.5561, 3.9285], device='cuda:0'), covar=tensor([0.1474, 0.0192, 0.1019, 0.0335, 0.0669, 0.0839, 0.1762, 0.0307], device='cuda:0'), in_proj_covar=tensor([0.0158, 0.0143, 0.0160, 0.0147, 0.0179, 0.0168, 0.0165, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:27:32,550 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.765e+01 1.557e+02 2.003e+02 2.576e+02 4.677e+02, threshold=4.007e+02, percent-clipped=1.0 2022-11-16 04:27:54,783 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77867.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:27:55,896 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.59 vs. limit=2.0 2022-11-16 04:28:26,612 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77915.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:28:30,551 INFO [train.py:876] (0/4) Epoch 11, batch 5200, loss[loss=0.1254, simple_loss=0.1559, pruned_loss=0.04745, over 5688.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1434, pruned_loss=0.04434, over 1082773.59 frames. ], batch size: 34, lr: 7.27e-03, grad_scale: 8.0 2022-11-16 04:28:35,986 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77929.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:28:38,561 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=77933.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:28:39,078 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.507e+02 1.817e+02 2.273e+02 5.327e+02, threshold=3.634e+02, percent-clipped=3.0 2022-11-16 04:29:04,820 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=77972.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:29:06,886 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5992, 2.3220, 1.9085, 1.8202, 1.2207, 1.9578, 1.4871, 1.9787], device='cuda:0'), covar=tensor([0.1232, 0.0425, 0.0873, 0.0933, 0.2413, 0.1010, 0.1630, 0.0652], device='cuda:0'), in_proj_covar=tensor([0.0159, 0.0143, 0.0159, 0.0147, 0.0180, 0.0169, 0.0165, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:29:08,136 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77977.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:29:10,793 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=77981.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:29:30,396 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.65 vs. limit=2.0 2022-11-16 04:29:38,678 INFO [train.py:876] (0/4) Epoch 11, batch 5300, loss[loss=0.1065, simple_loss=0.1375, pruned_loss=0.03772, over 5790.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.1438, pruned_loss=0.04467, over 1085672.36 frames. ], batch size: 22, lr: 7.27e-03, grad_scale: 8.0 2022-11-16 04:29:46,503 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=78033.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:29:47,077 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 1.542e+02 1.854e+02 2.251e+02 5.839e+02, threshold=3.709e+02, percent-clipped=3.0 2022-11-16 04:30:35,918 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.0259, 4.4895, 4.8846, 4.5444, 5.0615, 4.9701, 4.4944, 5.0632], device='cuda:0'), covar=tensor([0.0308, 0.0378, 0.0379, 0.0329, 0.0340, 0.0222, 0.0314, 0.0250], device='cuda:0'), in_proj_covar=tensor([0.0141, 0.0151, 0.0110, 0.0143, 0.0176, 0.0105, 0.0124, 0.0151], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:30:46,414 INFO [train.py:876] (0/4) Epoch 11, batch 5400, loss[loss=0.1572, simple_loss=0.1575, pruned_loss=0.07846, over 4144.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1426, pruned_loss=0.04428, over 1083374.40 frames. ], batch size: 181, lr: 7.26e-03, grad_scale: 16.0 2022-11-16 04:30:47,707 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2022-11-16 04:30:55,256 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.532e+02 1.889e+02 2.326e+02 4.779e+02, threshold=3.778e+02, percent-clipped=5.0 2022-11-16 04:31:48,478 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2022-11-16 04:31:55,286 INFO [train.py:876] (0/4) Epoch 11, batch 5500, loss[loss=0.1469, simple_loss=0.1638, pruned_loss=0.06502, over 5436.00 frames. ], tot_loss[loss=0.1151, simple_loss=0.1421, pruned_loss=0.0441, over 1080241.73 frames. ], batch size: 58, lr: 7.26e-03, grad_scale: 16.0 2022-11-16 04:31:56,316 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. limit=2.0 2022-11-16 04:32:02,847 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=78232.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:32:04,000 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.013e+02 1.487e+02 1.903e+02 2.328e+02 5.113e+02, threshold=3.806e+02, percent-clipped=2.0 2022-11-16 04:32:11,230 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2300, 4.5617, 4.1771, 4.5959, 4.6064, 3.8085, 4.1901, 3.9036], device='cuda:0'), covar=tensor([0.0387, 0.0436, 0.1252, 0.0418, 0.0378, 0.0531, 0.0527, 0.0666], device='cuda:0'), in_proj_covar=tensor([0.0135, 0.0177, 0.0275, 0.0173, 0.0217, 0.0172, 0.0186, 0.0173], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:32:11,261 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5546, 4.5946, 3.5101, 1.9627, 4.3015, 2.0213, 4.1050, 2.3415], device='cuda:0'), covar=tensor([0.1400, 0.0144, 0.0541, 0.1937, 0.0175, 0.1555, 0.0237, 0.1515], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0103, 0.0112, 0.0112, 0.0101, 0.0122, 0.0098, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:32:29,968 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=78272.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:32:32,997 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3496, 1.7852, 2.0703, 2.0787, 2.2641, 1.4746, 2.0418, 2.1408], device='cuda:0'), covar=tensor([0.0495, 0.1011, 0.0656, 0.0614, 0.0612, 0.1210, 0.0664, 0.0550], device='cuda:0'), in_proj_covar=tensor([0.0240, 0.0191, 0.0210, 0.0208, 0.0236, 0.0192, 0.0222, 0.0224], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:32:43,988 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=78293.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 04:32:44,324 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-16 04:33:01,992 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=78320.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:33:02,618 INFO [train.py:876] (0/4) Epoch 11, batch 5600, loss[loss=0.08244, simple_loss=0.1184, pruned_loss=0.02326, over 5595.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1426, pruned_loss=0.04455, over 1079733.57 frames. ], batch size: 22, lr: 7.25e-03, grad_scale: 16.0 2022-11-16 04:33:11,293 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=78333.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:33:11,813 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 1.405e+02 1.623e+02 2.102e+02 3.893e+02, threshold=3.245e+02, percent-clipped=1.0 2022-11-16 04:33:40,804 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7379, 2.9107, 2.5015, 2.8665, 2.3694, 2.3347, 2.7009, 3.3165], device='cuda:0'), covar=tensor([0.1155, 0.1273, 0.1930, 0.1275, 0.1650, 0.2016, 0.1448, 0.1304], device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0105, 0.0105, 0.0101, 0.0093, 0.0100, 0.0096, 0.0079], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:33:43,979 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=78381.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:34:11,589 INFO [train.py:876] (0/4) Epoch 11, batch 5700, loss[loss=0.1199, simple_loss=0.1316, pruned_loss=0.0541, over 4176.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1425, pruned_loss=0.04465, over 1075519.24 frames. ], batch size: 181, lr: 7.25e-03, grad_scale: 16.0 2022-11-16 04:34:20,539 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.114e+01 1.453e+02 1.885e+02 2.462e+02 5.318e+02, threshold=3.770e+02, percent-clipped=5.0 2022-11-16 04:34:37,877 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=78460.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:34:52,836 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.14 vs. limit=5.0 2022-11-16 04:35:01,395 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7636, 4.1635, 4.0214, 3.5873, 2.2801, 4.2201, 2.4890, 3.5102], device='cuda:0'), covar=tensor([0.0466, 0.0210, 0.0199, 0.0446, 0.0655, 0.0147, 0.0521, 0.0180], device='cuda:0'), in_proj_covar=tensor([0.0191, 0.0174, 0.0179, 0.0200, 0.0191, 0.0178, 0.0188, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:35:05,977 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.58 vs. limit=2.0 2022-11-16 04:35:18,652 INFO [train.py:876] (0/4) Epoch 11, batch 5800, loss[loss=0.1356, simple_loss=0.1614, pruned_loss=0.05488, over 5569.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1417, pruned_loss=0.04404, over 1077393.56 frames. ], batch size: 40, lr: 7.24e-03, grad_scale: 16.0 2022-11-16 04:35:18,839 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=78521.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:35:27,562 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.244e+01 1.530e+02 1.946e+02 2.399e+02 7.039e+02, threshold=3.892e+02, percent-clipped=3.0 2022-11-16 04:35:42,329 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=78556.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:35:45,991 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8599, 2.8704, 2.7049, 2.9944, 2.3681, 2.7859, 2.9992, 3.3238], device='cuda:0'), covar=tensor([0.0966, 0.1262, 0.1536, 0.1360, 0.1451, 0.1567, 0.1046, 0.1761], device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0105, 0.0104, 0.0101, 0.0093, 0.0100, 0.0096, 0.0079], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:35:47,956 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8284, 2.5054, 2.9232, 1.5026, 2.7865, 2.8871, 3.0219, 3.0113], device='cuda:0'), covar=tensor([0.2851, 0.2131, 0.1066, 0.3528, 0.1406, 0.1094, 0.0947, 0.1422], device='cuda:0'), in_proj_covar=tensor([0.0169, 0.0181, 0.0163, 0.0184, 0.0182, 0.0198, 0.0169, 0.0186], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:36:03,413 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=78588.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 04:36:08,822 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. limit=2.0 2022-11-16 04:36:23,748 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=78617.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:36:26,217 INFO [train.py:876] (0/4) Epoch 11, batch 5900, loss[loss=0.1167, simple_loss=0.153, pruned_loss=0.04021, over 5585.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1401, pruned_loss=0.04261, over 1078711.68 frames. ], batch size: 24, lr: 7.24e-03, grad_scale: 16.0 2022-11-16 04:36:34,707 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.483e+02 1.879e+02 2.262e+02 4.846e+02, threshold=3.758e+02, percent-clipped=3.0 2022-11-16 04:37:33,703 INFO [train.py:876] (0/4) Epoch 11, batch 6000, loss[loss=0.07168, simple_loss=0.1011, pruned_loss=0.02115, over 5381.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1392, pruned_loss=0.0417, over 1084979.68 frames. ], batch size: 9, lr: 7.24e-03, grad_scale: 16.0 2022-11-16 04:37:33,704 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 04:37:46,012 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1873, 3.1396, 2.2405, 1.8929, 3.0253, 1.4109, 2.8429, 1.8858], device='cuda:0'), covar=tensor([0.0600, 0.0163, 0.0681, 0.0816, 0.0227, 0.1226, 0.0312, 0.0686], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0104, 0.0114, 0.0114, 0.0101, 0.0122, 0.0098, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:37:51,351 INFO [train.py:908] (0/4) Epoch 11, validation: loss=0.1691, simple_loss=0.1834, pruned_loss=0.07744, over 1530663.00 frames. 2022-11-16 04:37:51,351 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 04:37:53,485 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2853, 1.2789, 1.6687, 0.8356, 1.6378, 2.1313, 1.4771, 1.5603], device='cuda:0'), covar=tensor([0.1938, 0.1079, 0.0542, 0.2127, 0.1776, 0.0455, 0.0894, 0.0697], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0023, 0.0016, 0.0020, 0.0016, 0.0014, 0.0021, 0.0015], device='cuda:0'), out_proj_covar=tensor([8.0928e-05, 1.1228e-04, 8.4355e-05, 9.9667e-05, 8.5621e-05, 8.0312e-05, 1.0583e-04, 8.0752e-05], device='cuda:0') 2022-11-16 04:37:59,809 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.487e+02 1.823e+02 2.133e+02 3.868e+02, threshold=3.646e+02, percent-clipped=1.0 2022-11-16 04:38:02,545 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0498, 1.4602, 2.3158, 2.1187, 2.3692, 2.1601, 2.9803, 2.1003], device='cuda:0'), covar=tensor([0.0017, 0.0082, 0.0040, 0.0040, 0.0027, 0.0057, 0.0029, 0.0030], device='cuda:0'), in_proj_covar=tensor([0.0027, 0.0025, 0.0025, 0.0034, 0.0028, 0.0026, 0.0032, 0.0031], device='cuda:0'), out_proj_covar=tensor([2.4510e-05, 2.2917e-05, 2.2590e-05, 3.2790e-05, 2.6279e-05, 2.5011e-05, 3.0960e-05, 2.9993e-05], device='cuda:0') 2022-11-16 04:38:08,451 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8606, 3.0479, 3.0428, 2.7752, 3.0103, 2.9105, 1.3069, 3.1335], device='cuda:0'), covar=tensor([0.0356, 0.0290, 0.0383, 0.0341, 0.0361, 0.0433, 0.2868, 0.0319], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0087, 0.0088, 0.0080, 0.0102, 0.0090, 0.0133, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:38:37,623 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2377, 4.5413, 4.3448, 3.8670, 2.4521, 4.8745, 2.6894, 4.1887], device='cuda:0'), covar=tensor([0.0331, 0.0152, 0.0173, 0.0342, 0.0611, 0.0100, 0.0540, 0.0172], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0176, 0.0180, 0.0202, 0.0191, 0.0178, 0.0190, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:38:56,297 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=78816.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:38:59,574 INFO [train.py:876] (0/4) Epoch 11, batch 6100, loss[loss=0.1261, simple_loss=0.1569, pruned_loss=0.04768, over 5569.00 frames. ], tot_loss[loss=0.1122, simple_loss=0.1404, pruned_loss=0.042, over 1085194.56 frames. ], batch size: 25, lr: 7.23e-03, grad_scale: 16.0 2022-11-16 04:39:00,955 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.5273, 5.0765, 5.4191, 5.0221, 5.5976, 5.4069, 4.7244, 5.5731], device='cuda:0'), covar=tensor([0.0356, 0.0332, 0.0368, 0.0253, 0.0278, 0.0213, 0.0238, 0.0217], device='cuda:0'), in_proj_covar=tensor([0.0141, 0.0150, 0.0108, 0.0142, 0.0174, 0.0105, 0.0122, 0.0150], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:39:02,354 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2131, 2.5705, 2.6201, 1.7297, 2.6472, 2.7523, 3.0163, 3.2892], device='cuda:0'), covar=tensor([0.1653, 0.1649, 0.1097, 0.2697, 0.0939, 0.1081, 0.0473, 0.0791], device='cuda:0'), in_proj_covar=tensor([0.0167, 0.0180, 0.0161, 0.0182, 0.0182, 0.0197, 0.0167, 0.0184], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:39:08,232 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.645e+01 1.456e+02 1.777e+02 2.158e+02 4.181e+02, threshold=3.555e+02, percent-clipped=3.0 2022-11-16 04:39:20,176 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.65 vs. limit=5.0 2022-11-16 04:39:44,979 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=78888.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:40:01,679 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=78912.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:40:07,448 INFO [train.py:876] (0/4) Epoch 11, batch 6200, loss[loss=0.1037, simple_loss=0.1324, pruned_loss=0.03747, over 5456.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1404, pruned_loss=0.04169, over 1082718.84 frames. ], batch size: 11, lr: 7.23e-03, grad_scale: 16.0 2022-11-16 04:40:16,301 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.628e+01 1.452e+02 1.821e+02 2.109e+02 4.865e+02, threshold=3.642e+02, percent-clipped=3.0 2022-11-16 04:40:17,677 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=78936.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:40:21,170 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5108, 1.9007, 1.7155, 1.2065, 1.6019, 1.9069, 2.0124, 2.1040], device='cuda:0'), covar=tensor([0.1666, 0.1280, 0.1851, 0.2409, 0.1425, 0.1223, 0.0975, 0.1172], device='cuda:0'), in_proj_covar=tensor([0.0167, 0.0180, 0.0161, 0.0181, 0.0180, 0.0196, 0.0167, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:40:25,960 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-16 04:40:32,309 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3450, 2.2560, 2.6331, 3.5252, 3.4176, 2.6377, 2.2835, 3.5233], device='cuda:0'), covar=tensor([0.2659, 0.2850, 0.1826, 0.1636, 0.1232, 0.2757, 0.2165, 0.0926], device='cuda:0'), in_proj_covar=tensor([0.0250, 0.0197, 0.0188, 0.0301, 0.0223, 0.0202, 0.0187, 0.0242], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0005], device='cuda:0') 2022-11-16 04:40:37,734 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=78965.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:41:15,713 INFO [train.py:876] (0/4) Epoch 11, batch 6300, loss[loss=0.1245, simple_loss=0.1529, pruned_loss=0.04802, over 5271.00 frames. ], tot_loss[loss=0.1124, simple_loss=0.1409, pruned_loss=0.04199, over 1082708.51 frames. ], batch size: 79, lr: 7.22e-03, grad_scale: 16.0 2022-11-16 04:41:19,119 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79026.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:41:24,116 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.477e+02 1.821e+02 2.187e+02 5.336e+02, threshold=3.643e+02, percent-clipped=2.0 2022-11-16 04:41:58,556 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.03 vs. limit=2.0 2022-11-16 04:42:11,520 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.27 vs. limit=5.0 2022-11-16 04:42:19,624 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79116.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:42:23,417 INFO [train.py:876] (0/4) Epoch 11, batch 6400, loss[loss=0.1046, simple_loss=0.1303, pruned_loss=0.03944, over 5559.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.1415, pruned_loss=0.04293, over 1079668.86 frames. ], batch size: 30, lr: 7.22e-03, grad_scale: 16.0 2022-11-16 04:42:32,274 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.453e+01 1.576e+02 1.936e+02 2.244e+02 4.119e+02, threshold=3.873e+02, percent-clipped=1.0 2022-11-16 04:42:52,527 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=79164.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:43:25,624 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79212.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:43:31,337 INFO [train.py:876] (0/4) Epoch 11, batch 6500, loss[loss=0.199, simple_loss=0.1974, pruned_loss=0.1003, over 5437.00 frames. ], tot_loss[loss=0.1132, simple_loss=0.1413, pruned_loss=0.04261, over 1084627.14 frames. ], batch size: 64, lr: 7.21e-03, grad_scale: 16.0 2022-11-16 04:43:40,081 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.904e+01 1.514e+02 1.877e+02 2.297e+02 4.296e+02, threshold=3.754e+02, percent-clipped=3.0 2022-11-16 04:43:57,774 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=79260.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:44:12,657 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-16 04:44:14,392 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79285.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:44:39,322 INFO [train.py:876] (0/4) Epoch 11, batch 6600, loss[loss=0.1077, simple_loss=0.1389, pruned_loss=0.03829, over 5731.00 frames. ], tot_loss[loss=0.1123, simple_loss=0.1405, pruned_loss=0.04206, over 1088478.03 frames. ], batch size: 28, lr: 7.21e-03, grad_scale: 16.0 2022-11-16 04:44:39,398 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79321.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:44:47,230 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79333.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:44:47,746 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.843e+01 1.552e+02 2.036e+02 2.359e+02 4.515e+02, threshold=4.072e+02, percent-clipped=1.0 2022-11-16 04:44:56,158 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79346.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 04:45:08,448 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79364.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:45:15,642 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79375.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:45:28,427 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79394.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:45:32,665 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7913, 2.4061, 2.6232, 3.6485, 3.5081, 2.6839, 2.4611, 3.7610], device='cuda:0'), covar=tensor([0.0876, 0.2851, 0.2007, 0.2885, 0.1360, 0.3158, 0.2016, 0.1652], device='cuda:0'), in_proj_covar=tensor([0.0249, 0.0198, 0.0189, 0.0302, 0.0223, 0.0203, 0.0189, 0.0244], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 04:45:35,718 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.6545, 5.1849, 5.4718, 5.0400, 5.6689, 5.5069, 4.7650, 5.6809], device='cuda:0'), covar=tensor([0.0340, 0.0377, 0.0422, 0.0311, 0.0330, 0.0204, 0.0258, 0.0270], device='cuda:0'), in_proj_covar=tensor([0.0144, 0.0153, 0.0111, 0.0145, 0.0178, 0.0107, 0.0126, 0.0153], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:45:46,992 INFO [train.py:876] (0/4) Epoch 11, batch 6700, loss[loss=0.1239, simple_loss=0.1435, pruned_loss=0.05215, over 5263.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1405, pruned_loss=0.04247, over 1093349.99 frames. ], batch size: 8, lr: 7.20e-03, grad_scale: 16.0 2022-11-16 04:45:49,759 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79425.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:45:55,338 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.416e+01 1.570e+02 1.880e+02 2.430e+02 4.197e+02, threshold=3.759e+02, percent-clipped=3.0 2022-11-16 04:45:56,821 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79436.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:46:54,265 INFO [train.py:876] (0/4) Epoch 11, batch 6800, loss[loss=0.1212, simple_loss=0.1523, pruned_loss=0.04501, over 5637.00 frames. ], tot_loss[loss=0.112, simple_loss=0.1403, pruned_loss=0.04184, over 1094964.77 frames. ], batch size: 29, lr: 7.20e-03, grad_scale: 16.0 2022-11-16 04:47:03,441 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 1.544e+02 1.830e+02 2.349e+02 4.429e+02, threshold=3.660e+02, percent-clipped=3.0 2022-11-16 04:47:05,285 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.81 vs. limit=2.0 2022-11-16 04:47:07,755 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2022-11-16 04:47:38,077 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79585.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:47:55,507 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.74 vs. limit=2.0 2022-11-16 04:48:02,161 INFO [train.py:876] (0/4) Epoch 11, batch 6900, loss[loss=0.06219, simple_loss=0.09896, pruned_loss=0.01271, over 5264.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1403, pruned_loss=0.04173, over 1093430.51 frames. ], batch size: 7, lr: 7.19e-03, grad_scale: 16.0 2022-11-16 04:48:02,275 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79621.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:48:10,568 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.132e+01 1.500e+02 1.797e+02 2.147e+02 3.952e+02, threshold=3.594e+02, percent-clipped=1.0 2022-11-16 04:48:15,838 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79641.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 04:48:19,220 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79646.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:48:26,773 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79657.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:48:34,566 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=79669.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:48:35,266 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8243, 4.6351, 3.4607, 2.0851, 4.2937, 1.8120, 4.2717, 2.5182], device='cuda:0'), covar=tensor([0.1216, 0.0102, 0.0461, 0.1977, 0.0170, 0.1737, 0.0163, 0.1349], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0104, 0.0114, 0.0113, 0.0100, 0.0121, 0.0099, 0.0112], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:48:47,697 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-16 04:48:48,063 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79689.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:49:08,148 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79718.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:49:09,332 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79720.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:49:09,911 INFO [train.py:876] (0/4) Epoch 11, batch 7000, loss[loss=0.09498, simple_loss=0.1265, pruned_loss=0.03173, over 5582.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.14, pruned_loss=0.04129, over 1095989.75 frames. ], batch size: 22, lr: 7.19e-03, grad_scale: 16.0 2022-11-16 04:49:16,818 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79731.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 04:49:18,677 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.908e+01 1.674e+02 1.897e+02 2.324e+02 4.828e+02, threshold=3.794e+02, percent-clipped=2.0 2022-11-16 04:49:38,290 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=79762.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:49:44,826 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([6.2531, 5.9765, 6.2672, 5.5253, 6.3297, 6.0038, 5.5574, 6.3358], device='cuda:0'), covar=tensor([0.0358, 0.0246, 0.0208, 0.0342, 0.0397, 0.0292, 0.0178, 0.0249], device='cuda:0'), in_proj_covar=tensor([0.0140, 0.0152, 0.0109, 0.0142, 0.0176, 0.0105, 0.0123, 0.0150], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:50:18,626 INFO [train.py:876] (0/4) Epoch 11, batch 7100, loss[loss=0.1341, simple_loss=0.156, pruned_loss=0.0561, over 5274.00 frames. ], tot_loss[loss=0.1136, simple_loss=0.1417, pruned_loss=0.0427, over 1094588.22 frames. ], batch size: 79, lr: 7.19e-03, grad_scale: 16.0 2022-11-16 04:50:20,104 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=79823.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:50:27,110 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.428e+02 1.794e+02 2.274e+02 4.053e+02, threshold=3.587e+02, percent-clipped=1.0 2022-11-16 04:50:41,087 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4848, 2.5230, 2.3647, 2.4976, 2.1185, 1.8912, 2.4454, 2.8329], device='cuda:0'), covar=tensor([0.1201, 0.1417, 0.1862, 0.1136, 0.1599, 0.1715, 0.1362, 0.0998], device='cuda:0'), in_proj_covar=tensor([0.0110, 0.0105, 0.0106, 0.0103, 0.0093, 0.0102, 0.0097, 0.0080], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 04:50:50,599 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0251, 3.7294, 3.9030, 3.6586, 4.1080, 3.8468, 3.6781, 4.0274], device='cuda:0'), covar=tensor([0.0416, 0.0418, 0.0471, 0.0408, 0.0366, 0.0414, 0.0403, 0.0464], device='cuda:0'), in_proj_covar=tensor([0.0143, 0.0155, 0.0111, 0.0144, 0.0178, 0.0106, 0.0125, 0.0153], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:51:27,327 INFO [train.py:876] (0/4) Epoch 11, batch 7200, loss[loss=0.1166, simple_loss=0.1354, pruned_loss=0.04889, over 5144.00 frames. ], tot_loss[loss=0.1134, simple_loss=0.1414, pruned_loss=0.04269, over 1086131.10 frames. ], batch size: 91, lr: 7.18e-03, grad_scale: 16.0 2022-11-16 04:51:28,401 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.00 vs. limit=5.0 2022-11-16 04:51:35,800 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.528e+01 1.486e+02 1.793e+02 2.179e+02 3.743e+02, threshold=3.587e+02, percent-clipped=3.0 2022-11-16 04:51:40,436 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=79941.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:51:40,482 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79941.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 04:51:53,459 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1835, 2.9638, 2.9875, 3.0998, 3.0260, 2.8233, 3.4282, 3.0619], device='cuda:0'), covar=tensor([0.0491, 0.1038, 0.0548, 0.1233, 0.0664, 0.0520, 0.0818, 0.0843], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0111, 0.0096, 0.0120, 0.0090, 0.0081, 0.0147, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:52:10,817 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.73 vs. limit=2.0 2022-11-16 04:52:11,193 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=79989.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:52:11,247 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=79989.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:52:15,182 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-11.pt 2022-11-16 04:52:57,787 INFO [train.py:876] (0/4) Epoch 12, batch 0, loss[loss=0.1178, simple_loss=0.1551, pruned_loss=0.04026, over 5742.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1551, pruned_loss=0.04026, over 5742.00 frames. ], batch size: 20, lr: 6.88e-03, grad_scale: 16.0 2022-11-16 04:52:57,789 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 04:53:01,648 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4252, 5.2831, 3.6531, 5.0394, 3.9573, 3.9007, 3.5927, 4.4607], device='cuda:0'), covar=tensor([0.0881, 0.0128, 0.0793, 0.0131, 0.0508, 0.0592, 0.1096, 0.0204], device='cuda:0'), in_proj_covar=tensor([0.0155, 0.0141, 0.0155, 0.0144, 0.0170, 0.0163, 0.0159, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:53:02,969 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.3946, 5.0074, 5.0751, 4.9740, 5.4086, 5.2430, 4.7022, 5.4425], device='cuda:0'), covar=tensor([0.0179, 0.0261, 0.0332, 0.0350, 0.0248, 0.0162, 0.0190, 0.0184], device='cuda:0'), in_proj_covar=tensor([0.0142, 0.0152, 0.0110, 0.0143, 0.0177, 0.0105, 0.0124, 0.0151], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:53:14,373 INFO [train.py:908] (0/4) Epoch 12, validation: loss=0.1725, simple_loss=0.1858, pruned_loss=0.07956, over 1530663.00 frames. 2022-11-16 04:53:14,374 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 04:53:19,235 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-80000.pt 2022-11-16 04:53:30,728 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3552, 3.8030, 3.4008, 3.8010, 3.7981, 3.3041, 3.4058, 3.4901], device='cuda:0'), covar=tensor([0.1101, 0.0451, 0.1253, 0.0425, 0.0456, 0.0497, 0.0847, 0.0562], device='cuda:0'), in_proj_covar=tensor([0.0139, 0.0182, 0.0279, 0.0176, 0.0222, 0.0176, 0.0191, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:53:31,367 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80013.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:53:36,408 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80020.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:53:40,796 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.65 vs. limit=5.0 2022-11-16 04:53:43,635 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80031.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 04:53:45,399 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.136e+01 1.488e+02 1.843e+02 2.324e+02 4.228e+02, threshold=3.685e+02, percent-clipped=3.0 2022-11-16 04:53:47,748 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80037.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:54:01,885 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.48 vs. limit=2.0 2022-11-16 04:54:08,554 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80068.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:54:15,971 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80079.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 04:54:25,345 INFO [train.py:876] (0/4) Epoch 12, batch 100, loss[loss=0.1078, simple_loss=0.1428, pruned_loss=0.03637, over 5550.00 frames. ], tot_loss[loss=0.1151, simple_loss=0.1419, pruned_loss=0.04415, over 424283.30 frames. ], batch size: 40, lr: 6.87e-03, grad_scale: 16.0 2022-11-16 04:54:42,310 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80118.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:54:52,892 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.598e+01 1.606e+02 2.099e+02 2.573e+02 5.035e+02, threshold=4.198e+02, percent-clipped=4.0 2022-11-16 04:55:21,964 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4103, 3.9864, 4.2396, 3.8975, 4.4615, 4.2129, 3.9361, 4.3726], device='cuda:0'), covar=tensor([0.0361, 0.0465, 0.0429, 0.0398, 0.0368, 0.0289, 0.0355, 0.0423], device='cuda:0'), in_proj_covar=tensor([0.0144, 0.0156, 0.0111, 0.0145, 0.0179, 0.0107, 0.0127, 0.0155], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 04:55:31,787 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2022-11-16 04:55:32,802 INFO [train.py:876] (0/4) Epoch 12, batch 200, loss[loss=0.1094, simple_loss=0.1325, pruned_loss=0.04317, over 5492.00 frames. ], tot_loss[loss=0.1124, simple_loss=0.1408, pruned_loss=0.04202, over 689614.32 frames. ], batch size: 10, lr: 6.87e-03, grad_scale: 16.0 2022-11-16 04:55:59,591 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4721, 4.1906, 2.7118, 4.0626, 3.1916, 2.9411, 2.2334, 3.4683], device='cuda:0'), covar=tensor([0.1259, 0.0233, 0.1132, 0.0339, 0.0739, 0.0966, 0.1831, 0.0412], device='cuda:0'), in_proj_covar=tensor([0.0157, 0.0141, 0.0156, 0.0145, 0.0171, 0.0165, 0.0159, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:56:01,419 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 1.528e+02 1.725e+02 2.144e+02 5.994e+02, threshold=3.450e+02, percent-clipped=2.0 2022-11-16 04:56:05,497 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80241.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:56:33,653 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80283.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:56:37,804 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80289.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:56:40,372 INFO [train.py:876] (0/4) Epoch 12, batch 300, loss[loss=0.1489, simple_loss=0.1663, pruned_loss=0.06576, over 5284.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1391, pruned_loss=0.04103, over 841361.45 frames. ], batch size: 79, lr: 6.86e-03, grad_scale: 16.0 2022-11-16 04:56:53,748 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80313.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:56:58,919 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0215, 4.0688, 3.8219, 3.7212, 4.0467, 3.9473, 1.5865, 4.1010], device='cuda:0'), covar=tensor([0.0213, 0.0217, 0.0328, 0.0338, 0.0230, 0.0323, 0.2924, 0.0215], device='cuda:0'), in_proj_covar=tensor([0.0102, 0.0086, 0.0087, 0.0079, 0.0100, 0.0089, 0.0131, 0.0106], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 04:57:05,408 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.66 vs. limit=5.0 2022-11-16 04:57:08,169 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.446e+01 1.444e+02 1.739e+02 2.202e+02 4.917e+02, threshold=3.479e+02, percent-clipped=4.0 2022-11-16 04:57:08,438 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6738, 3.9532, 3.7906, 3.5135, 2.1371, 3.8366, 2.3118, 3.0786], device='cuda:0'), covar=tensor([0.0410, 0.0168, 0.0185, 0.0326, 0.0567, 0.0183, 0.0505, 0.0173], device='cuda:0'), in_proj_covar=tensor([0.0192, 0.0175, 0.0180, 0.0202, 0.0190, 0.0178, 0.0189, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:57:14,544 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80344.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:57:25,857 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80361.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:57:41,542 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-16 04:57:47,428 INFO [train.py:876] (0/4) Epoch 12, batch 400, loss[loss=0.09656, simple_loss=0.1363, pruned_loss=0.0284, over 5741.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1395, pruned_loss=0.04154, over 938503.16 frames. ], batch size: 20, lr: 6.86e-03, grad_scale: 16.0 2022-11-16 04:57:47,562 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80393.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:57:48,291 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6046, 3.6737, 3.7242, 3.3495, 1.9193, 3.7888, 2.1698, 3.1304], device='cuda:0'), covar=tensor([0.0457, 0.0270, 0.0174, 0.0423, 0.0675, 0.0172, 0.0526, 0.0197], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0177, 0.0182, 0.0203, 0.0192, 0.0180, 0.0191, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 04:57:56,241 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-16 04:58:04,471 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80418.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:58:15,553 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.295e+01 1.482e+02 1.820e+02 2.321e+02 3.687e+02, threshold=3.641e+02, percent-clipped=1.0 2022-11-16 04:58:28,297 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80454.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:58:34,734 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80463.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:58:36,493 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80466.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:58:54,182 INFO [train.py:876] (0/4) Epoch 12, batch 500, loss[loss=0.1103, simple_loss=0.1506, pruned_loss=0.03503, over 5638.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1387, pruned_loss=0.0407, over 998472.88 frames. ], batch size: 29, lr: 6.86e-03, grad_scale: 16.0 2022-11-16 04:58:54,986 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80494.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:58:55,582 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7506, 5.0069, 3.7486, 2.0885, 4.6866, 2.2189, 4.6551, 2.6440], device='cuda:0'), covar=tensor([0.1334, 0.0111, 0.0500, 0.2084, 0.0132, 0.1562, 0.0171, 0.1435], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0104, 0.0115, 0.0114, 0.0101, 0.0121, 0.0100, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 04:59:03,320 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.76 vs. limit=2.0 2022-11-16 04:59:15,515 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80524.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:59:22,650 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.271e+01 1.520e+02 1.906e+02 2.365e+02 4.910e+02, threshold=3.812e+02, percent-clipped=3.0 2022-11-16 04:59:36,592 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80555.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 04:59:48,116 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.70 vs. limit=2.0 2022-11-16 04:59:55,578 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4716, 2.2510, 3.1743, 2.7356, 3.1139, 2.3540, 2.9104, 3.4470], device='cuda:0'), covar=tensor([0.0733, 0.1598, 0.0953, 0.1601, 0.0743, 0.1542, 0.1179, 0.0809], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0193, 0.0214, 0.0214, 0.0239, 0.0197, 0.0224, 0.0231], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:00:00,105 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5325, 2.5255, 2.1072, 2.3908, 2.0406, 1.8419, 2.1901, 2.8130], device='cuda:0'), covar=tensor([0.1294, 0.1491, 0.2837, 0.2185, 0.1797, 0.1940, 0.1878, 0.1372], device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0103, 0.0104, 0.0102, 0.0091, 0.0101, 0.0097, 0.0078], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 05:00:01,972 INFO [train.py:876] (0/4) Epoch 12, batch 600, loss[loss=0.1156, simple_loss=0.153, pruned_loss=0.03913, over 5662.00 frames. ], tot_loss[loss=0.1122, simple_loss=0.1406, pruned_loss=0.04192, over 1033091.66 frames. ], batch size: 29, lr: 6.85e-03, grad_scale: 16.0 2022-11-16 05:00:30,596 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.813e+01 1.490e+02 1.792e+02 2.361e+02 3.754e+02, threshold=3.583e+02, percent-clipped=0.0 2022-11-16 05:00:33,644 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80639.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:01:10,385 INFO [train.py:876] (0/4) Epoch 12, batch 700, loss[loss=0.1475, simple_loss=0.1581, pruned_loss=0.06845, over 5353.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.1398, pruned_loss=0.0408, over 1051287.63 frames. ], batch size: 70, lr: 6.85e-03, grad_scale: 16.0 2022-11-16 05:01:38,617 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.003e+02 1.475e+02 1.756e+02 2.175e+02 4.412e+02, threshold=3.511e+02, percent-clipped=5.0 2022-11-16 05:01:39,446 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80736.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:01:48,506 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80749.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:02:04,774 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=80773.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:02:17,705 INFO [train.py:876] (0/4) Epoch 12, batch 800, loss[loss=0.07737, simple_loss=0.1214, pruned_loss=0.01669, over 5571.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1394, pruned_loss=0.04125, over 1062975.36 frames. ], batch size: 16, lr: 6.84e-03, grad_scale: 16.0 2022-11-16 05:02:20,487 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80797.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:02:35,204 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80819.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:02:45,861 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=80834.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:02:46,324 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.454e+01 1.512e+02 1.839e+02 2.382e+02 3.696e+02, threshold=3.678e+02, percent-clipped=2.0 2022-11-16 05:02:51,962 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.84 vs. limit=2.0 2022-11-16 05:02:56,115 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=80850.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:02:58,713 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8519, 4.7631, 4.8740, 4.9191, 4.4753, 4.3572, 5.5348, 4.7860], device='cuda:0'), covar=tensor([0.0473, 0.0962, 0.0442, 0.1049, 0.0509, 0.0269, 0.0647, 0.0597], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0111, 0.0096, 0.0121, 0.0089, 0.0080, 0.0146, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:03:02,120 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-16 05:03:07,525 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.32 vs. limit=2.0 2022-11-16 05:03:25,609 INFO [train.py:876] (0/4) Epoch 12, batch 900, loss[loss=0.1003, simple_loss=0.1348, pruned_loss=0.03289, over 5711.00 frames. ], tot_loss[loss=0.1111, simple_loss=0.1398, pruned_loss=0.04127, over 1068685.15 frames. ], batch size: 15, lr: 6.84e-03, grad_scale: 8.0 2022-11-16 05:03:47,058 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.48 vs. limit=5.0 2022-11-16 05:03:56,033 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.546e+02 1.867e+02 2.262e+02 5.374e+02, threshold=3.734e+02, percent-clipped=6.0 2022-11-16 05:03:58,289 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=80939.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:04:06,189 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8495, 2.4129, 3.4078, 2.8963, 3.5006, 2.4975, 3.1264, 3.7049], device='cuda:0'), covar=tensor([0.0591, 0.1310, 0.0904, 0.1554, 0.0638, 0.1456, 0.1145, 0.0828], device='cuda:0'), in_proj_covar=tensor([0.0239, 0.0192, 0.0211, 0.0211, 0.0237, 0.0195, 0.0221, 0.0227], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:04:18,704 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.80 vs. limit=2.0 2022-11-16 05:04:18,971 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4344, 5.0539, 4.5947, 4.9816, 4.9679, 4.3491, 4.6533, 4.3836], device='cuda:0'), covar=tensor([0.0291, 0.0378, 0.1308, 0.0414, 0.0432, 0.0424, 0.0381, 0.0541], device='cuda:0'), in_proj_covar=tensor([0.0136, 0.0180, 0.0277, 0.0174, 0.0221, 0.0173, 0.0189, 0.0174], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:04:19,773 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5169, 1.3525, 1.1005, 0.8381, 1.3166, 1.4480, 0.6973, 1.0467], device='cuda:0'), covar=tensor([0.0227, 0.0495, 0.0381, 0.0755, 0.0338, 0.0244, 0.0848, 0.0452], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0023, 0.0016, 0.0020, 0.0016, 0.0015, 0.0022, 0.0015], device='cuda:0'), out_proj_covar=tensor([8.0316e-05, 1.1168e-04, 8.4241e-05, 9.9537e-05, 8.6740e-05, 8.1619e-05, 1.0651e-04, 8.0543e-05], device='cuda:0') 2022-11-16 05:04:33,103 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=80987.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:04:37,286 INFO [train.py:876] (0/4) Epoch 12, batch 1000, loss[loss=0.1016, simple_loss=0.1411, pruned_loss=0.03106, over 5776.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1406, pruned_loss=0.04164, over 1075974.21 frames. ], batch size: 26, lr: 6.83e-03, grad_scale: 8.0 2022-11-16 05:05:06,152 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.560e+01 1.507e+02 1.766e+02 2.175e+02 5.874e+02, threshold=3.531e+02, percent-clipped=3.0 2022-11-16 05:05:15,374 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81049.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:05:23,196 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0648, 3.9440, 3.9074, 3.6313, 4.0939, 3.9228, 1.4889, 4.2218], device='cuda:0'), covar=tensor([0.0235, 0.0330, 0.0302, 0.0349, 0.0239, 0.0341, 0.3059, 0.0287], device='cuda:0'), in_proj_covar=tensor([0.0100, 0.0085, 0.0086, 0.0080, 0.0098, 0.0088, 0.0129, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:05:43,843 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81092.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:05:44,403 INFO [train.py:876] (0/4) Epoch 12, batch 1100, loss[loss=0.09399, simple_loss=0.1349, pruned_loss=0.02655, over 5683.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1399, pruned_loss=0.04105, over 1081808.16 frames. ], batch size: 34, lr: 6.83e-03, grad_scale: 8.0 2022-11-16 05:05:45,829 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2726, 1.0816, 1.0333, 0.8926, 1.2169, 1.4397, 0.7494, 0.9921], device='cuda:0'), covar=tensor([0.0492, 0.0502, 0.0406, 0.0837, 0.0484, 0.0294, 0.0905, 0.0415], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0023, 0.0016, 0.0020, 0.0016, 0.0015, 0.0021, 0.0015], device='cuda:0'), out_proj_covar=tensor([7.9881e-05, 1.1119e-04, 8.3922e-05, 9.9069e-05, 8.6102e-05, 8.1267e-05, 1.0555e-04, 8.0451e-05], device='cuda:0') 2022-11-16 05:05:47,051 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81097.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:06:00,238 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81116.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:06:02,121 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81119.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:06:09,125 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81129.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:06:13,616 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.779e+01 1.469e+02 1.860e+02 2.387e+02 4.762e+02, threshold=3.720e+02, percent-clipped=3.0 2022-11-16 05:06:18,340 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2909, 3.1223, 3.3962, 2.9594, 3.3391, 3.2573, 1.3253, 3.5437], device='cuda:0'), covar=tensor([0.0303, 0.0586, 0.0372, 0.0469, 0.0372, 0.0471, 0.3330, 0.0314], device='cuda:0'), in_proj_covar=tensor([0.0101, 0.0087, 0.0087, 0.0081, 0.0099, 0.0089, 0.0130, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:06:23,030 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81150.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:06:27,405 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8459, 1.9053, 1.8990, 1.9523, 1.7317, 1.6719, 1.7518, 2.0165], device='cuda:0'), covar=tensor([0.2360, 0.2593, 0.2130, 0.1933, 0.2344, 0.3256, 0.2078, 0.0909], device='cuda:0'), in_proj_covar=tensor([0.0111, 0.0106, 0.0107, 0.0105, 0.0093, 0.0102, 0.0098, 0.0080], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 05:06:33,794 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.6163, 5.1503, 5.3609, 5.0278, 5.6621, 5.4659, 4.7193, 5.6889], device='cuda:0'), covar=tensor([0.0374, 0.0282, 0.0493, 0.0279, 0.0323, 0.0196, 0.0243, 0.0180], device='cuda:0'), in_proj_covar=tensor([0.0142, 0.0154, 0.0112, 0.0145, 0.0177, 0.0107, 0.0126, 0.0152], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 05:06:34,429 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81167.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:06:41,057 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=81177.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:06:52,015 INFO [train.py:876] (0/4) Epoch 12, batch 1200, loss[loss=0.117, simple_loss=0.1438, pruned_loss=0.04511, over 5553.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1393, pruned_loss=0.04029, over 1087163.96 frames. ], batch size: 46, lr: 6.83e-03, grad_scale: 8.0 2022-11-16 05:06:55,244 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81198.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:07:06,367 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.62 vs. limit=5.0 2022-11-16 05:07:20,808 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.439e+02 1.852e+02 2.463e+02 6.772e+02, threshold=3.705e+02, percent-clipped=5.0 2022-11-16 05:07:36,610 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1521, 4.6901, 4.2320, 4.6909, 4.6027, 3.9027, 4.3320, 4.0928], device='cuda:0'), covar=tensor([0.0382, 0.0330, 0.1122, 0.0297, 0.0394, 0.0530, 0.0478, 0.0516], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0176, 0.0271, 0.0173, 0.0219, 0.0170, 0.0188, 0.0172], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:07:36,650 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4002, 4.4761, 2.9659, 4.3211, 3.4546, 3.0205, 2.3451, 3.7537], device='cuda:0'), covar=tensor([0.1644, 0.0240, 0.1222, 0.0305, 0.0631, 0.1044, 0.2124, 0.0328], device='cuda:0'), in_proj_covar=tensor([0.0158, 0.0142, 0.0157, 0.0147, 0.0175, 0.0167, 0.0161, 0.0161], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:07:41,453 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2232, 2.8766, 3.1766, 1.8362, 3.2528, 3.5725, 3.4520, 4.0011], device='cuda:0'), covar=tensor([0.1973, 0.1569, 0.1503, 0.2796, 0.0720, 0.1008, 0.0682, 0.0642], device='cuda:0'), in_proj_covar=tensor([0.0167, 0.0180, 0.0166, 0.0184, 0.0184, 0.0202, 0.0166, 0.0185], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:07:41,994 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1940, 4.7622, 3.9667, 4.5657, 4.6158, 3.8648, 4.3435, 4.0630], device='cuda:0'), covar=tensor([0.0331, 0.0399, 0.1575, 0.0589, 0.0477, 0.0648, 0.0635, 0.0633], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0177, 0.0272, 0.0173, 0.0220, 0.0171, 0.0189, 0.0173], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:07:59,530 INFO [train.py:876] (0/4) Epoch 12, batch 1300, loss[loss=0.08786, simple_loss=0.1201, pruned_loss=0.02779, over 5157.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1381, pruned_loss=0.03883, over 1086705.03 frames. ], batch size: 8, lr: 6.82e-03, grad_scale: 8.0 2022-11-16 05:08:01,394 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5684, 1.3110, 1.1189, 0.8435, 1.1553, 1.4356, 0.5700, 1.0239], device='cuda:0'), covar=tensor([0.0420, 0.0441, 0.0522, 0.0704, 0.0525, 0.0471, 0.0989, 0.0693], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0023, 0.0016, 0.0020, 0.0016, 0.0015, 0.0022, 0.0015], device='cuda:0'), out_proj_covar=tensor([8.0955e-05, 1.1283e-04, 8.5339e-05, 1.0032e-04, 8.7280e-05, 8.2659e-05, 1.0690e-04, 8.1254e-05], device='cuda:0') 2022-11-16 05:08:03,913 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81299.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:08:05,216 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81301.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:08:28,635 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.965e+01 1.370e+02 1.672e+02 2.098e+02 3.683e+02, threshold=3.343e+02, percent-clipped=0.0 2022-11-16 05:08:34,127 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6341, 2.3462, 3.3243, 2.7802, 3.2438, 2.5329, 3.0838, 3.7066], device='cuda:0'), covar=tensor([0.0717, 0.1426, 0.0816, 0.1482, 0.0689, 0.1392, 0.1256, 0.0693], device='cuda:0'), in_proj_covar=tensor([0.0240, 0.0191, 0.0213, 0.0210, 0.0236, 0.0195, 0.0221, 0.0230], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:08:44,925 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=81360.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:08:46,215 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=81362.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:09:06,316 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81392.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:09:06,896 INFO [train.py:876] (0/4) Epoch 12, batch 1400, loss[loss=0.1073, simple_loss=0.1436, pruned_loss=0.0355, over 5689.00 frames. ], tot_loss[loss=0.107, simple_loss=0.137, pruned_loss=0.03847, over 1084886.45 frames. ], batch size: 34, lr: 6.82e-03, grad_scale: 8.0 2022-11-16 05:09:10,930 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9112, 1.4517, 1.2928, 1.3972, 1.0621, 2.0340, 1.5461, 1.2287], device='cuda:0'), covar=tensor([0.2937, 0.1000, 0.2922, 0.2577, 0.2904, 0.0662, 0.1975, 0.2838], device='cuda:0'), in_proj_covar=tensor([0.0099, 0.0091, 0.0092, 0.0098, 0.0073, 0.0065, 0.0075, 0.0086], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 05:09:31,110 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81429.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:09:35,795 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.508e+01 1.575e+02 1.879e+02 2.348e+02 6.067e+02, threshold=3.757e+02, percent-clipped=7.0 2022-11-16 05:09:38,790 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81440.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:09:39,457 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.4685, 4.8305, 5.3095, 4.8708, 5.4453, 5.2139, 4.6965, 5.4694], device='cuda:0'), covar=tensor([0.0274, 0.0314, 0.0316, 0.0281, 0.0286, 0.0214, 0.0233, 0.0192], device='cuda:0'), in_proj_covar=tensor([0.0139, 0.0151, 0.0109, 0.0141, 0.0176, 0.0105, 0.0123, 0.0150], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 05:09:56,296 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0596, 2.9698, 3.4013, 1.4702, 2.9480, 3.3067, 3.4923, 3.6641], device='cuda:0'), covar=tensor([0.2004, 0.2020, 0.0916, 0.3318, 0.0744, 0.1523, 0.0642, 0.0710], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0181, 0.0166, 0.0185, 0.0183, 0.0202, 0.0167, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:10:00,096 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81472.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:10:03,336 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81477.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:10:14,234 INFO [train.py:876] (0/4) Epoch 12, batch 1500, loss[loss=0.1223, simple_loss=0.1495, pruned_loss=0.04758, over 5574.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1381, pruned_loss=0.03936, over 1082740.37 frames. ], batch size: 22, lr: 6.81e-03, grad_scale: 8.0 2022-11-16 05:10:42,646 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.569e+01 1.444e+02 1.773e+02 2.242e+02 5.218e+02, threshold=3.547e+02, percent-clipped=2.0 2022-11-16 05:11:21,230 INFO [train.py:876] (0/4) Epoch 12, batch 1600, loss[loss=0.09474, simple_loss=0.1262, pruned_loss=0.03166, over 5700.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1387, pruned_loss=0.03983, over 1080297.14 frames. ], batch size: 19, lr: 6.81e-03, grad_scale: 8.0 2022-11-16 05:11:21,556 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.46 vs. limit=2.0 2022-11-16 05:11:51,017 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 1.477e+02 1.862e+02 2.272e+02 5.995e+02, threshold=3.723e+02, percent-clipped=4.0 2022-11-16 05:11:59,296 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6090, 5.4146, 4.1471, 2.5230, 4.8249, 2.7158, 4.8489, 3.5689], device='cuda:0'), covar=tensor([0.0929, 0.0102, 0.0381, 0.1760, 0.0169, 0.1352, 0.0147, 0.1099], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0104, 0.0114, 0.0112, 0.0100, 0.0121, 0.0099, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:12:04,142 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81655.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:12:05,446 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=81657.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:12:13,609 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-16 05:12:14,069 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81670.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:12:20,576 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.5096, 4.9735, 5.4126, 4.9592, 5.5643, 5.4210, 4.7724, 5.5572], device='cuda:0'), covar=tensor([0.0361, 0.0327, 0.0367, 0.0305, 0.0317, 0.0177, 0.0235, 0.0181], device='cuda:0'), in_proj_covar=tensor([0.0138, 0.0151, 0.0110, 0.0141, 0.0175, 0.0105, 0.0123, 0.0149], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 05:12:29,718 INFO [train.py:876] (0/4) Epoch 12, batch 1700, loss[loss=0.1093, simple_loss=0.1449, pruned_loss=0.03691, over 5731.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1376, pruned_loss=0.0389, over 1082562.91 frames. ], batch size: 12, lr: 6.80e-03, grad_scale: 8.0 2022-11-16 05:12:43,040 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6868, 4.2489, 3.7696, 3.5880, 2.0062, 4.0729, 2.2528, 3.4807], device='cuda:0'), covar=tensor([0.0439, 0.0124, 0.0159, 0.0316, 0.0663, 0.0145, 0.0502, 0.0159], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0175, 0.0179, 0.0198, 0.0190, 0.0178, 0.0188, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 05:12:43,506 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9274, 4.4529, 4.8232, 4.4523, 4.9539, 4.8190, 4.4337, 4.9829], device='cuda:0'), covar=tensor([0.0364, 0.0373, 0.0362, 0.0306, 0.0396, 0.0242, 0.0274, 0.0243], device='cuda:0'), in_proj_covar=tensor([0.0140, 0.0152, 0.0111, 0.0143, 0.0178, 0.0106, 0.0125, 0.0151], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 05:12:44,171 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1680, 5.1104, 3.9204, 2.3920, 4.6270, 2.3943, 4.7177, 3.2152], device='cuda:0'), covar=tensor([0.1159, 0.0086, 0.0411, 0.1808, 0.0176, 0.1457, 0.0153, 0.1053], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0105, 0.0115, 0.0112, 0.0101, 0.0122, 0.0100, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:12:51,040 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2022-11-16 05:12:55,331 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=81731.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 05:12:59,064 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.982e+01 1.420e+02 1.796e+02 2.143e+02 4.079e+02, threshold=3.592e+02, percent-clipped=2.0 2022-11-16 05:13:23,664 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81772.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:13:37,838 INFO [train.py:876] (0/4) Epoch 12, batch 1800, loss[loss=0.1423, simple_loss=0.1701, pruned_loss=0.05724, over 5615.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1393, pruned_loss=0.04034, over 1086990.98 frames. ], batch size: 24, lr: 6.80e-03, grad_scale: 8.0 2022-11-16 05:13:56,460 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=81820.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:14:06,568 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.901e+01 1.448e+02 1.812e+02 2.296e+02 5.003e+02, threshold=3.624e+02, percent-clipped=1.0 2022-11-16 05:14:45,100 INFO [train.py:876] (0/4) Epoch 12, batch 1900, loss[loss=0.108, simple_loss=0.1348, pruned_loss=0.04055, over 5600.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1394, pruned_loss=0.04064, over 1092394.08 frames. ], batch size: 18, lr: 6.80e-03, grad_scale: 8.0 2022-11-16 05:14:53,101 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0186, 1.8403, 1.6806, 1.5864, 1.9548, 2.0684, 1.8348, 1.4989], device='cuda:0'), covar=tensor([0.0041, 0.0091, 0.0061, 0.0049, 0.0062, 0.0088, 0.0047, 0.0047], device='cuda:0'), in_proj_covar=tensor([0.0030, 0.0027, 0.0027, 0.0037, 0.0032, 0.0029, 0.0036, 0.0035], device='cuda:0'), out_proj_covar=tensor([2.7273e-05, 2.5454e-05, 2.4775e-05, 3.5998e-05, 2.9718e-05, 2.7931e-05, 3.4966e-05, 3.3794e-05], device='cuda:0') 2022-11-16 05:15:14,249 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.303e+01 1.364e+02 1.744e+02 2.090e+02 4.733e+02, threshold=3.488e+02, percent-clipped=3.0 2022-11-16 05:15:27,298 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81955.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:15:28,620 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=81957.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:15:48,680 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=81987.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:15:52,893 INFO [train.py:876] (0/4) Epoch 12, batch 2000, loss[loss=0.1036, simple_loss=0.1395, pruned_loss=0.03391, over 5725.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1391, pruned_loss=0.04045, over 1090935.10 frames. ], batch size: 14, lr: 6.79e-03, grad_scale: 8.0 2022-11-16 05:15:59,641 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82003.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:16:01,282 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82005.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:16:15,356 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82026.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 05:16:22,027 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 1.417e+02 1.705e+02 2.224e+02 2.971e+02, threshold=3.410e+02, percent-clipped=0.0 2022-11-16 05:16:30,591 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82048.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:16:35,229 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82055.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:16:55,595 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8327, 4.1629, 3.9807, 3.5900, 2.1016, 4.1449, 2.4091, 3.5667], device='cuda:0'), covar=tensor([0.0429, 0.0215, 0.0163, 0.0303, 0.0658, 0.0164, 0.0510, 0.0182], device='cuda:0'), in_proj_covar=tensor([0.0190, 0.0175, 0.0179, 0.0198, 0.0191, 0.0178, 0.0187, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 05:17:00,878 INFO [train.py:876] (0/4) Epoch 12, batch 2100, loss[loss=0.07587, simple_loss=0.1182, pruned_loss=0.01677, over 5595.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.139, pruned_loss=0.0407, over 1086203.65 frames. ], batch size: 16, lr: 6.79e-03, grad_scale: 8.0 2022-11-16 05:17:15,821 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9383, 1.4811, 1.4612, 1.3691, 1.2913, 1.3343, 1.3047, 1.3385], device='cuda:0'), covar=tensor([0.3847, 0.2263, 0.2351, 0.1879, 0.2289, 0.2920, 0.2969, 0.1051], device='cuda:0'), in_proj_covar=tensor([0.0110, 0.0105, 0.0103, 0.0103, 0.0091, 0.0100, 0.0097, 0.0078], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 05:17:16,448 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82116.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:17:29,604 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.760e+01 1.480e+02 1.837e+02 2.085e+02 4.685e+02, threshold=3.673e+02, percent-clipped=4.0 2022-11-16 05:18:07,794 INFO [train.py:876] (0/4) Epoch 12, batch 2200, loss[loss=0.1197, simple_loss=0.1499, pruned_loss=0.04472, over 5649.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.139, pruned_loss=0.0411, over 1081315.22 frames. ], batch size: 29, lr: 6.78e-03, grad_scale: 8.0 2022-11-16 05:18:18,010 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-16 05:18:32,921 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5325, 1.6600, 1.8450, 1.4073, 1.6485, 1.6447, 1.3687, 1.8787], device='cuda:0'), covar=tensor([0.0049, 0.0064, 0.0046, 0.0056, 0.0048, 0.0037, 0.0051, 0.0042], device='cuda:0'), in_proj_covar=tensor([0.0060, 0.0056, 0.0055, 0.0060, 0.0058, 0.0054, 0.0052, 0.0050], device='cuda:0'), out_proj_covar=tensor([5.4051e-05, 4.9618e-05, 4.8532e-05, 5.4030e-05, 5.1035e-05, 4.6938e-05, 4.6657e-05, 4.4088e-05], device='cuda:0') 2022-11-16 05:18:37,329 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.617e+01 1.469e+02 1.927e+02 2.568e+02 3.955e+02, threshold=3.853e+02, percent-clipped=1.0 2022-11-16 05:19:15,471 INFO [train.py:876] (0/4) Epoch 12, batch 2300, loss[loss=0.08265, simple_loss=0.1253, pruned_loss=0.02, over 5713.00 frames. ], tot_loss[loss=0.1115, simple_loss=0.1398, pruned_loss=0.04159, over 1082242.63 frames. ], batch size: 17, lr: 6.78e-03, grad_scale: 8.0 2022-11-16 05:19:37,395 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=82326.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:19:44,123 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.401e+02 1.761e+02 2.241e+02 4.168e+02, threshold=3.523e+02, percent-clipped=1.0 2022-11-16 05:19:44,280 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0638, 3.9521, 2.5255, 3.6774, 2.9612, 2.7164, 2.0244, 3.2637], device='cuda:0'), covar=tensor([0.1563, 0.0244, 0.1248, 0.0421, 0.0963, 0.1092, 0.2119, 0.0474], device='cuda:0'), in_proj_covar=tensor([0.0159, 0.0142, 0.0159, 0.0149, 0.0178, 0.0169, 0.0161, 0.0161], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:19:49,484 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82343.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:19:51,468 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82346.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:20:09,897 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82374.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:20:22,511 INFO [train.py:876] (0/4) Epoch 12, batch 2400, loss[loss=0.1497, simple_loss=0.1675, pruned_loss=0.06596, over 5434.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1398, pruned_loss=0.04151, over 1079921.94 frames. ], batch size: 58, lr: 6.78e-03, grad_scale: 8.0 2022-11-16 05:20:32,558 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82407.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:20:35,161 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82411.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:20:40,595 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2395, 2.6687, 3.7089, 3.4124, 4.1702, 2.7840, 3.5623, 4.1816], device='cuda:0'), covar=tensor([0.0515, 0.1505, 0.0855, 0.1316, 0.0378, 0.1590, 0.1311, 0.0738], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0198, 0.0220, 0.0217, 0.0241, 0.0201, 0.0228, 0.0234], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:20:51,658 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.786e+01 1.416e+02 1.815e+02 2.094e+02 3.842e+02, threshold=3.631e+02, percent-clipped=3.0 2022-11-16 05:21:29,660 INFO [train.py:876] (0/4) Epoch 12, batch 2500, loss[loss=0.08277, simple_loss=0.1076, pruned_loss=0.02896, over 5185.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1393, pruned_loss=0.04077, over 1085297.39 frames. ], batch size: 8, lr: 6.77e-03, grad_scale: 8.0 2022-11-16 05:21:29,824 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82493.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 05:21:51,162 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.76 vs. limit=2.0 2022-11-16 05:21:58,346 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.955e+01 1.483e+02 1.924e+02 2.282e+02 6.804e+02, threshold=3.849e+02, percent-clipped=1.0 2022-11-16 05:22:10,841 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4327, 3.1651, 3.3046, 3.0565, 3.4658, 3.3317, 3.2409, 3.4140], device='cuda:0'), covar=tensor([0.0424, 0.0518, 0.0479, 0.0489, 0.0440, 0.0267, 0.0408, 0.0529], device='cuda:0'), in_proj_covar=tensor([0.0143, 0.0153, 0.0111, 0.0143, 0.0179, 0.0106, 0.0125, 0.0154], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 05:22:10,930 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82554.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 05:22:21,433 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82569.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:22:37,474 INFO [train.py:876] (0/4) Epoch 12, batch 2600, loss[loss=0.1101, simple_loss=0.13, pruned_loss=0.04514, over 5164.00 frames. ], tot_loss[loss=0.1108, simple_loss=0.139, pruned_loss=0.04125, over 1072223.50 frames. ], batch size: 91, lr: 6.77e-03, grad_scale: 8.0 2022-11-16 05:22:49,216 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6213, 3.6176, 3.6782, 3.6852, 3.3787, 3.3230, 4.1119, 3.7356], device='cuda:0'), covar=tensor([0.0481, 0.0895, 0.0496, 0.1069, 0.0664, 0.0458, 0.0718, 0.0692], device='cuda:0'), in_proj_covar=tensor([0.0090, 0.0112, 0.0099, 0.0123, 0.0092, 0.0082, 0.0150, 0.0106], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:23:03,073 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=82630.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:23:06,796 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.089e+01 1.435e+02 1.814e+02 2.327e+02 4.653e+02, threshold=3.628e+02, percent-clipped=2.0 2022-11-16 05:23:11,778 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=82643.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:23:43,825 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82691.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:23:45,099 INFO [train.py:876] (0/4) Epoch 12, batch 2700, loss[loss=0.135, simple_loss=0.1621, pruned_loss=0.054, over 5557.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.1384, pruned_loss=0.04047, over 1076564.78 frames. ], batch size: 46, lr: 6.76e-03, grad_scale: 8.0 2022-11-16 05:23:51,439 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82702.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:23:57,160 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=82711.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:24:14,436 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.009e+02 1.412e+02 1.711e+02 2.254e+02 4.662e+02, threshold=3.423e+02, percent-clipped=2.0 2022-11-16 05:24:29,672 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=82759.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:24:36,419 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-16 05:24:43,755 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6974, 1.3994, 1.7657, 1.3026, 1.6734, 1.6328, 1.2740, 0.9846], device='cuda:0'), covar=tensor([0.0049, 0.0070, 0.0032, 0.0053, 0.0051, 0.0060, 0.0051, 0.0069], device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0027, 0.0026, 0.0036, 0.0031, 0.0028, 0.0035, 0.0034], device='cuda:0'), out_proj_covar=tensor([2.6680e-05, 2.5153e-05, 2.3854e-05, 3.4699e-05, 2.9084e-05, 2.6895e-05, 3.4010e-05, 3.3082e-05], device='cuda:0') 2022-11-16 05:24:49,674 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5387, 1.2280, 1.1413, 0.7767, 1.5044, 1.4723, 0.8425, 1.1538], device='cuda:0'), covar=tensor([0.0622, 0.0589, 0.0632, 0.1041, 0.0299, 0.0432, 0.0768, 0.0577], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0023, 0.0016, 0.0020, 0.0016, 0.0015, 0.0021, 0.0015], device='cuda:0'), out_proj_covar=tensor([8.0582e-05, 1.1239e-04, 8.5298e-05, 9.9323e-05, 8.7566e-05, 8.1622e-05, 1.0674e-04, 8.1847e-05], device='cuda:0') 2022-11-16 05:24:52,719 INFO [train.py:876] (0/4) Epoch 12, batch 2800, loss[loss=0.121, simple_loss=0.1521, pruned_loss=0.0449, over 5811.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1387, pruned_loss=0.04041, over 1086011.83 frames. ], batch size: 25, lr: 6.76e-03, grad_scale: 8.0 2022-11-16 05:24:58,217 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3754, 4.3152, 3.3195, 1.8445, 3.9057, 1.6152, 4.0535, 2.1460], device='cuda:0'), covar=tensor([0.1565, 0.0165, 0.0590, 0.2071, 0.0246, 0.1919, 0.0191, 0.1698], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0104, 0.0114, 0.0111, 0.0100, 0.0121, 0.0099, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:25:21,516 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.023e+02 1.398e+02 1.709e+02 2.150e+02 5.035e+02, threshold=3.419e+02, percent-clipped=1.0 2022-11-16 05:25:30,440 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82849.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 05:25:49,850 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.57 vs. limit=2.0 2022-11-16 05:25:59,792 INFO [train.py:876] (0/4) Epoch 12, batch 2900, loss[loss=0.1544, simple_loss=0.1685, pruned_loss=0.0701, over 5477.00 frames. ], tot_loss[loss=0.1109, simple_loss=0.139, pruned_loss=0.04139, over 1079949.89 frames. ], batch size: 49, lr: 6.76e-03, grad_scale: 16.0 2022-11-16 05:26:17,959 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.79 vs. limit=2.0 2022-11-16 05:26:21,469 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=82925.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:26:22,904 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5830, 1.8437, 2.3304, 2.2859, 2.3392, 1.5884, 2.2101, 2.4653], device='cuda:0'), covar=tensor([0.0642, 0.1203, 0.0769, 0.0881, 0.0709, 0.1438, 0.0880, 0.0716], device='cuda:0'), in_proj_covar=tensor([0.0243, 0.0194, 0.0219, 0.0216, 0.0238, 0.0199, 0.0224, 0.0230], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:26:28,505 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 1.420e+02 1.709e+02 2.033e+02 4.998e+02, threshold=3.418e+02, percent-clipped=2.0 2022-11-16 05:26:47,410 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=82963.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:26:48,248 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-16 05:26:51,749 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6371, 1.8228, 2.2859, 1.5920, 1.0966, 2.6776, 2.2702, 1.9751], device='cuda:0'), covar=tensor([0.1724, 0.1888, 0.1309, 0.3269, 0.3734, 0.0782, 0.1328, 0.2145], device='cuda:0'), in_proj_covar=tensor([0.0101, 0.0093, 0.0092, 0.0098, 0.0073, 0.0065, 0.0075, 0.0086], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 05:27:07,644 INFO [train.py:876] (0/4) Epoch 12, batch 3000, loss[loss=0.1352, simple_loss=0.1464, pruned_loss=0.06197, over 5025.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1386, pruned_loss=0.04064, over 1082856.51 frames. ], batch size: 109, lr: 6.75e-03, grad_scale: 16.0 2022-11-16 05:27:07,645 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 05:27:25,141 INFO [train.py:908] (0/4) Epoch 12, validation: loss=0.1722, simple_loss=0.1854, pruned_loss=0.07947, over 1530663.00 frames. 2022-11-16 05:27:25,142 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 05:27:31,371 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83002.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:27:43,401 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3017, 4.8101, 4.3946, 4.7823, 4.7991, 3.9800, 4.4189, 4.0608], device='cuda:0'), covar=tensor([0.0403, 0.0488, 0.1491, 0.0644, 0.0527, 0.0477, 0.0872, 0.0695], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0176, 0.0270, 0.0171, 0.0218, 0.0169, 0.0187, 0.0172], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:27:45,968 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=83024.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:27:53,838 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.307e+01 1.447e+02 1.727e+02 2.150e+02 3.702e+02, threshold=3.454e+02, percent-clipped=1.0 2022-11-16 05:28:03,082 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83050.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:28:32,007 INFO [train.py:876] (0/4) Epoch 12, batch 3100, loss[loss=0.08875, simple_loss=0.1143, pruned_loss=0.03161, over 5279.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1378, pruned_loss=0.0399, over 1084656.07 frames. ], batch size: 6, lr: 6.75e-03, grad_scale: 16.0 2022-11-16 05:28:56,878 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5328, 1.1811, 1.2047, 0.8879, 1.3353, 1.4395, 0.8741, 0.9440], device='cuda:0'), covar=tensor([0.0447, 0.0417, 0.0354, 0.0770, 0.0344, 0.0239, 0.0604, 0.0528], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0023, 0.0016, 0.0020, 0.0016, 0.0015, 0.0021, 0.0015], device='cuda:0'), out_proj_covar=tensor([7.9805e-05, 1.1221e-04, 8.5039e-05, 9.9546e-05, 8.7741e-05, 8.0797e-05, 1.0623e-04, 8.1809e-05], device='cuda:0') 2022-11-16 05:29:01,264 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.033e+02 1.527e+02 1.920e+02 2.292e+02 4.866e+02, threshold=3.839e+02, percent-clipped=2.0 2022-11-16 05:29:09,691 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83149.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 05:29:27,117 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.57 vs. limit=2.0 2022-11-16 05:29:39,573 INFO [train.py:876] (0/4) Epoch 12, batch 3200, loss[loss=0.1212, simple_loss=0.1475, pruned_loss=0.04743, over 4958.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1394, pruned_loss=0.04168, over 1079415.58 frames. ], batch size: 109, lr: 6.74e-03, grad_scale: 16.0 2022-11-16 05:29:41,730 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83196.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:29:42,238 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83197.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 05:29:52,011 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2136, 3.8101, 3.3987, 3.8299, 3.8503, 3.2663, 3.4304, 3.2780], device='cuda:0'), covar=tensor([0.1098, 0.0488, 0.1398, 0.0456, 0.0461, 0.0586, 0.0929, 0.0704], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0176, 0.0272, 0.0171, 0.0218, 0.0170, 0.0188, 0.0173], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:30:01,765 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83225.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:30:09,030 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.526e+02 1.871e+02 2.239e+02 5.623e+02, threshold=3.742e+02, percent-clipped=1.0 2022-11-16 05:30:11,871 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9340, 1.8702, 2.1223, 1.6595, 1.8578, 1.9899, 1.7048, 2.1869], device='cuda:0'), covar=tensor([0.0054, 0.0056, 0.0042, 0.0054, 0.0048, 0.0053, 0.0040, 0.0039], device='cuda:0'), in_proj_covar=tensor([0.0060, 0.0056, 0.0055, 0.0059, 0.0058, 0.0053, 0.0052, 0.0050], device='cuda:0'), out_proj_covar=tensor([5.4047e-05, 4.9944e-05, 4.7966e-05, 5.2820e-05, 5.1047e-05, 4.6391e-05, 4.6260e-05, 4.3712e-05], device='cuda:0') 2022-11-16 05:30:23,072 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=83257.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:30:33,803 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83273.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:30:41,809 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83284.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:30:47,948 INFO [train.py:876] (0/4) Epoch 12, batch 3300, loss[loss=0.2357, simple_loss=0.2093, pruned_loss=0.131, over 3025.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1395, pruned_loss=0.04158, over 1082613.18 frames. ], batch size: 284, lr: 6.74e-03, grad_scale: 16.0 2022-11-16 05:31:05,237 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=83319.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:31:07,873 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-16 05:31:17,549 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 1.415e+02 1.663e+02 2.154e+02 3.672e+02, threshold=3.327e+02, percent-clipped=0.0 2022-11-16 05:31:24,253 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=83345.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:31:43,013 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6377, 2.2203, 2.8023, 3.6942, 3.5565, 2.7747, 2.3401, 3.5934], device='cuda:0'), covar=tensor([0.1042, 0.3032, 0.1991, 0.2612, 0.1276, 0.3069, 0.2458, 0.1106], device='cuda:0'), in_proj_covar=tensor([0.0249, 0.0196, 0.0185, 0.0298, 0.0220, 0.0202, 0.0188, 0.0245], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 05:31:56,028 INFO [train.py:876] (0/4) Epoch 12, batch 3400, loss[loss=0.084, simple_loss=0.1285, pruned_loss=0.01977, over 5513.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1381, pruned_loss=0.04018, over 1087670.49 frames. ], batch size: 17, lr: 6.74e-03, grad_scale: 16.0 2022-11-16 05:32:22,932 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0133, 3.6877, 3.8173, 3.5973, 4.0804, 3.6479, 3.7111, 4.0748], device='cuda:0'), covar=tensor([0.0398, 0.0462, 0.0508, 0.0476, 0.0348, 0.0592, 0.0412, 0.0365], device='cuda:0'), in_proj_covar=tensor([0.0143, 0.0153, 0.0111, 0.0144, 0.0180, 0.0108, 0.0127, 0.0153], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 05:32:24,635 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1711, 1.6670, 1.3091, 1.1768, 1.4857, 1.4332, 1.1326, 1.5110], device='cuda:0'), covar=tensor([0.0079, 0.0042, 0.0060, 0.0067, 0.0054, 0.0063, 0.0085, 0.0072], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0056, 0.0055, 0.0059, 0.0058, 0.0054, 0.0053, 0.0050], device='cuda:0'), out_proj_covar=tensor([5.4559e-05, 5.0052e-05, 4.7954e-05, 5.3011e-05, 5.1584e-05, 4.7065e-05, 4.7064e-05, 4.4015e-05], device='cuda:0') 2022-11-16 05:32:25,092 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.935e+01 1.460e+02 1.839e+02 2.206e+02 3.947e+02, threshold=3.678e+02, percent-clipped=4.0 2022-11-16 05:32:52,811 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1036, 0.7809, 1.0455, 0.7870, 1.0851, 0.9971, 0.5301, 0.7340], device='cuda:0'), covar=tensor([0.0290, 0.0386, 0.0320, 0.0437, 0.0362, 0.0320, 0.0765, 0.0385], device='cuda:0'), in_proj_covar=tensor([0.0014, 0.0023, 0.0016, 0.0020, 0.0017, 0.0015, 0.0022, 0.0016], device='cuda:0'), out_proj_covar=tensor([8.0631e-05, 1.1314e-04, 8.6398e-05, 1.0087e-04, 8.9354e-05, 8.2426e-05, 1.0738e-04, 8.3777e-05], device='cuda:0') 2022-11-16 05:33:03,154 INFO [train.py:876] (0/4) Epoch 12, batch 3500, loss[loss=0.09867, simple_loss=0.1268, pruned_loss=0.03526, over 5716.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1382, pruned_loss=0.04068, over 1088118.33 frames. ], batch size: 11, lr: 6.73e-03, grad_scale: 16.0 2022-11-16 05:33:32,640 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.791e+01 1.416e+02 1.765e+02 2.301e+02 4.941e+02, threshold=3.529e+02, percent-clipped=1.0 2022-11-16 05:33:43,837 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=83552.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:34:11,111 INFO [train.py:876] (0/4) Epoch 12, batch 3600, loss[loss=0.1353, simple_loss=0.1451, pruned_loss=0.06273, over 4971.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.137, pruned_loss=0.03995, over 1085604.40 frames. ], batch size: 110, lr: 6.73e-03, grad_scale: 16.0 2022-11-16 05:34:29,516 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83619.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:34:40,744 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.534e+01 1.623e+02 1.893e+02 2.295e+02 4.939e+02, threshold=3.787e+02, percent-clipped=6.0 2022-11-16 05:34:43,450 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=83640.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:35:02,006 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83667.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:35:16,008 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1118, 4.5088, 4.0868, 4.5432, 4.5044, 3.7562, 4.1338, 3.9196], device='cuda:0'), covar=tensor([0.0430, 0.0432, 0.1398, 0.0326, 0.0351, 0.0566, 0.0578, 0.0605], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0182, 0.0278, 0.0177, 0.0224, 0.0172, 0.0192, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:35:19,166 INFO [train.py:876] (0/4) Epoch 12, batch 3700, loss[loss=0.1295, simple_loss=0.1584, pruned_loss=0.05029, over 5608.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1381, pruned_loss=0.04019, over 1085354.35 frames. ], batch size: 22, lr: 6.72e-03, grad_scale: 16.0 2022-11-16 05:35:45,217 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9258, 2.7555, 3.1626, 1.8814, 2.8358, 3.4999, 3.4387, 3.4109], device='cuda:0'), covar=tensor([0.2110, 0.1652, 0.0998, 0.2598, 0.0755, 0.0779, 0.0394, 0.0933], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0182, 0.0168, 0.0184, 0.0183, 0.0199, 0.0169, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:35:48,569 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.368e+01 1.468e+02 1.783e+02 2.131e+02 3.533e+02, threshold=3.566e+02, percent-clipped=0.0 2022-11-16 05:36:01,074 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.60 vs. limit=2.0 2022-11-16 05:36:04,182 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83760.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:36:06,055 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.1210, 4.9782, 5.0521, 5.2133, 4.9244, 4.7225, 5.6355, 5.1346], device='cuda:0'), covar=tensor([0.0325, 0.0727, 0.0351, 0.0961, 0.0330, 0.0204, 0.0529, 0.0330], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0107, 0.0096, 0.0120, 0.0089, 0.0079, 0.0145, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:36:27,058 INFO [train.py:876] (0/4) Epoch 12, batch 3800, loss[loss=0.08545, simple_loss=0.1191, pruned_loss=0.02588, over 5750.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1383, pruned_loss=0.04002, over 1084914.61 frames. ], batch size: 13, lr: 6.72e-03, grad_scale: 16.0 2022-11-16 05:36:44,433 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5558, 1.8306, 2.0095, 1.2795, 1.3083, 1.9085, 1.3412, 1.2685], device='cuda:0'), covar=tensor([0.0036, 0.0039, 0.0022, 0.0074, 0.0106, 0.0034, 0.0047, 0.0060], device='cuda:0'), in_proj_covar=tensor([0.0028, 0.0026, 0.0026, 0.0035, 0.0030, 0.0027, 0.0034, 0.0033], device='cuda:0'), out_proj_covar=tensor([2.5863e-05, 2.4148e-05, 2.3758e-05, 3.3941e-05, 2.8148e-05, 2.5754e-05, 3.2867e-05, 3.1839e-05], device='cuda:0') 2022-11-16 05:36:46,166 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=83821.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:36:56,434 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.699e+01 1.465e+02 1.775e+02 2.276e+02 4.868e+02, threshold=3.550e+02, percent-clipped=4.0 2022-11-16 05:36:57,798 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.07 vs. limit=5.0 2022-11-16 05:37:02,156 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0521, 2.6068, 2.7983, 1.6076, 2.8891, 3.1022, 3.0141, 3.2112], device='cuda:0'), covar=tensor([0.1743, 0.1549, 0.1034, 0.2884, 0.0749, 0.1008, 0.0524, 0.0902], device='cuda:0'), in_proj_covar=tensor([0.0167, 0.0183, 0.0170, 0.0186, 0.0184, 0.0202, 0.0171, 0.0188], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:37:05,977 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8473, 2.2070, 2.3501, 3.2155, 3.0737, 2.4307, 2.2119, 3.1957], device='cuda:0'), covar=tensor([0.1464, 0.2250, 0.2037, 0.1514, 0.1330, 0.2808, 0.1866, 0.1079], device='cuda:0'), in_proj_covar=tensor([0.0252, 0.0198, 0.0188, 0.0301, 0.0224, 0.0205, 0.0189, 0.0247], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 05:37:07,268 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83852.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:37:31,760 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9751, 2.3407, 2.7967, 4.0285, 3.8958, 2.9352, 2.7979, 3.8512], device='cuda:0'), covar=tensor([0.0762, 0.3175, 0.2481, 0.2154, 0.1114, 0.3495, 0.2322, 0.0572], device='cuda:0'), in_proj_covar=tensor([0.0254, 0.0200, 0.0188, 0.0303, 0.0226, 0.0206, 0.0191, 0.0249], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 05:37:35,200 INFO [train.py:876] (0/4) Epoch 12, batch 3900, loss[loss=0.1031, simple_loss=0.1476, pruned_loss=0.02931, over 5601.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1385, pruned_loss=0.03938, over 1087200.95 frames. ], batch size: 22, lr: 6.72e-03, grad_scale: 16.0 2022-11-16 05:37:36,327 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.97 vs. limit=5.0 2022-11-16 05:37:39,879 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83900.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:38:04,277 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.940e+01 1.504e+02 1.885e+02 2.376e+02 4.069e+02, threshold=3.770e+02, percent-clipped=4.0 2022-11-16 05:38:07,080 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=83940.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:38:08,699 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5793, 3.8039, 2.9717, 3.6334, 3.6811, 3.4173, 3.8214, 3.6175], device='cuda:0'), covar=tensor([0.0822, 0.0900, 0.2490, 0.1360, 0.0924, 0.0711, 0.0829, 0.0814], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0182, 0.0279, 0.0178, 0.0225, 0.0173, 0.0192, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:38:13,236 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83949.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:38:13,936 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0488, 1.4923, 1.1497, 1.0586, 1.5164, 1.1402, 0.7349, 1.4744], device='cuda:0'), covar=tensor([0.0059, 0.0040, 0.0057, 0.0066, 0.0049, 0.0050, 0.0083, 0.0050], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0057, 0.0056, 0.0060, 0.0059, 0.0054, 0.0054, 0.0051], device='cuda:0'), out_proj_covar=tensor([5.5154e-05, 5.0715e-05, 4.8688e-05, 5.3605e-05, 5.2369e-05, 4.7250e-05, 4.7803e-05, 4.4702e-05], device='cuda:0') 2022-11-16 05:38:22,700 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.56 vs. limit=5.0 2022-11-16 05:38:37,783 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=83985.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:38:39,632 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=83988.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:38:43,116 INFO [train.py:876] (0/4) Epoch 12, batch 4000, loss[loss=0.1465, simple_loss=0.1703, pruned_loss=0.06133, over 5569.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1394, pruned_loss=0.04038, over 1080422.89 frames. ], batch size: 46, lr: 6.71e-03, grad_scale: 16.0 2022-11-16 05:38:48,625 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3140, 1.8371, 1.4974, 1.2851, 0.8821, 1.5192, 1.1926, 1.6230], device='cuda:0'), covar=tensor([0.1266, 0.0495, 0.1038, 0.1251, 0.2665, 0.1036, 0.1667, 0.0745], device='cuda:0'), in_proj_covar=tensor([0.0157, 0.0141, 0.0157, 0.0149, 0.0174, 0.0166, 0.0161, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:38:54,483 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84010.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 05:39:11,844 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.002e+01 1.513e+02 1.878e+02 2.345e+02 5.861e+02, threshold=3.757e+02, percent-clipped=4.0 2022-11-16 05:39:18,954 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84046.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:39:50,160 INFO [train.py:876] (0/4) Epoch 12, batch 4100, loss[loss=0.09458, simple_loss=0.1341, pruned_loss=0.02753, over 5706.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1386, pruned_loss=0.04044, over 1084678.56 frames. ], batch size: 19, lr: 6.71e-03, grad_scale: 16.0 2022-11-16 05:39:53,212 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84097.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:40:06,005 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84116.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:40:06,618 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7537, 1.6751, 1.6373, 1.6901, 1.8198, 1.6798, 1.8949, 1.8448], device='cuda:0'), covar=tensor([0.0762, 0.1121, 0.0948, 0.1429, 0.0724, 0.0695, 0.1269, 0.0972], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0108, 0.0096, 0.0120, 0.0090, 0.0080, 0.0145, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:40:18,962 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.530e+02 1.828e+02 2.227e+02 3.985e+02, threshold=3.657e+02, percent-clipped=1.0 2022-11-16 05:40:34,671 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84158.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:40:56,962 INFO [train.py:876] (0/4) Epoch 12, batch 4200, loss[loss=0.08223, simple_loss=0.1213, pruned_loss=0.02156, over 5427.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1385, pruned_loss=0.04061, over 1078599.94 frames. ], batch size: 11, lr: 6.70e-03, grad_scale: 16.0 2022-11-16 05:41:01,786 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84199.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:41:20,895 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4032, 2.2653, 2.1016, 2.3225, 2.0438, 1.7560, 2.1339, 2.6634], device='cuda:0'), covar=tensor([0.1086, 0.1487, 0.1758, 0.1144, 0.1364, 0.1649, 0.1794, 0.1535], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0106, 0.0104, 0.0103, 0.0091, 0.0100, 0.0098, 0.0081], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 05:41:26,498 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.244e+01 1.460e+02 1.805e+02 2.127e+02 5.710e+02, threshold=3.610e+02, percent-clipped=1.0 2022-11-16 05:41:26,704 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84236.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:41:28,690 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2766, 2.1002, 2.7573, 1.7825, 1.3837, 3.0700, 2.4344, 2.1857], device='cuda:0'), covar=tensor([0.1429, 0.1567, 0.1101, 0.3077, 0.3538, 0.1067, 0.2227, 0.1527], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0096, 0.0095, 0.0101, 0.0076, 0.0068, 0.0079, 0.0090], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 05:41:31,097 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.74 vs. limit=5.0 2022-11-16 05:41:41,666 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.71 vs. limit=2.0 2022-11-16 05:41:42,995 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84260.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:42:04,886 INFO [train.py:876] (0/4) Epoch 12, batch 4300, loss[loss=0.06717, simple_loss=0.1058, pruned_loss=0.01427, over 5456.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1378, pruned_loss=0.0393, over 1087709.74 frames. ], batch size: 10, lr: 6.70e-03, grad_scale: 16.0 2022-11-16 05:42:07,615 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84297.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:42:12,729 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84305.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 05:42:24,740 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7654, 3.6543, 3.6409, 3.8256, 3.4913, 3.4442, 4.0878, 3.6831], device='cuda:0'), covar=tensor([0.0434, 0.0861, 0.0503, 0.1047, 0.0586, 0.0436, 0.0780, 0.0761], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0107, 0.0096, 0.0119, 0.0090, 0.0079, 0.0144, 0.0102], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:42:34,273 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 1.531e+02 1.904e+02 2.409e+02 4.137e+02, threshold=3.809e+02, percent-clipped=6.0 2022-11-16 05:42:34,453 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84336.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:42:37,627 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84341.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:42:44,225 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0925, 3.1594, 2.7409, 3.1795, 2.6426, 3.7438, 3.4205, 3.5761], device='cuda:0'), covar=tensor([0.0939, 0.1187, 0.2068, 0.1009, 0.1169, 0.0546, 0.1099, 0.1250], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0106, 0.0104, 0.0103, 0.0092, 0.0101, 0.0098, 0.0081], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 05:43:12,340 INFO [train.py:876] (0/4) Epoch 12, batch 4400, loss[loss=0.1017, simple_loss=0.133, pruned_loss=0.03521, over 5803.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1371, pruned_loss=0.03853, over 1092840.35 frames. ], batch size: 21, lr: 6.70e-03, grad_scale: 16.0 2022-11-16 05:43:15,058 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84397.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:43:24,980 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.9675, 5.2640, 5.3195, 4.9516, 5.9598, 5.3067, 4.9363, 5.7815], device='cuda:0'), covar=tensor([0.0832, 0.1319, 0.1673, 0.1530, 0.0806, 0.1359, 0.0933, 0.1160], device='cuda:0'), in_proj_covar=tensor([0.0145, 0.0157, 0.0113, 0.0146, 0.0184, 0.0110, 0.0129, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 05:43:27,612 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84416.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:43:41,383 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.927e+01 1.500e+02 1.777e+02 2.208e+02 3.922e+02, threshold=3.553e+02, percent-clipped=1.0 2022-11-16 05:43:43,115 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-16 05:43:46,715 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8336, 1.4508, 2.0629, 1.7019, 1.7664, 2.0968, 1.8582, 1.6213], device='cuda:0'), covar=tensor([0.0065, 0.0090, 0.0023, 0.0057, 0.0067, 0.0111, 0.0044, 0.0045], device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0026, 0.0027, 0.0035, 0.0031, 0.0028, 0.0035, 0.0033], device='cuda:0'), out_proj_covar=tensor([2.6497e-05, 2.4458e-05, 2.4311e-05, 3.3791e-05, 2.8580e-05, 2.6383e-05, 3.3471e-05, 3.2118e-05], device='cuda:0') 2022-11-16 05:43:52,910 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84453.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:43:53,044 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8496, 2.3015, 3.5885, 3.1675, 3.8437, 2.3943, 3.2059, 3.9483], device='cuda:0'), covar=tensor([0.0711, 0.1701, 0.0718, 0.1550, 0.0649, 0.1741, 0.1421, 0.0830], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0196, 0.0218, 0.0213, 0.0240, 0.0197, 0.0224, 0.0230], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:43:59,962 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84464.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:44:19,824 INFO [train.py:876] (0/4) Epoch 12, batch 4500, loss[loss=0.06621, simple_loss=0.1004, pruned_loss=0.01599, over 5709.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.138, pruned_loss=0.0399, over 1087817.54 frames. ], batch size: 11, lr: 6.69e-03, grad_scale: 16.0 2022-11-16 05:44:20,305 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.36 vs. limit=2.0 2022-11-16 05:44:41,056 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84525.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:44:41,885 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. limit=2.0 2022-11-16 05:44:47,018 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.80 vs. limit=2.0 2022-11-16 05:44:48,043 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.005e+02 1.515e+02 1.740e+02 2.350e+02 4.214e+02, threshold=3.480e+02, percent-clipped=2.0 2022-11-16 05:45:01,592 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84555.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:45:22,428 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84586.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:45:26,270 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84592.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:45:26,816 INFO [train.py:876] (0/4) Epoch 12, batch 4600, loss[loss=0.1144, simple_loss=0.1509, pruned_loss=0.03895, over 5749.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1381, pruned_loss=0.03923, over 1085699.95 frames. ], batch size: 27, lr: 6.69e-03, grad_scale: 16.0 2022-11-16 05:45:32,849 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2925, 1.5431, 1.1423, 1.0408, 1.4565, 1.2696, 0.8939, 1.1760], device='cuda:0'), covar=tensor([0.0053, 0.0034, 0.0057, 0.0052, 0.0039, 0.0039, 0.0067, 0.0072], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0057, 0.0056, 0.0060, 0.0059, 0.0054, 0.0054, 0.0050], device='cuda:0'), out_proj_covar=tensor([5.4835e-05, 5.0716e-05, 4.9110e-05, 5.3491e-05, 5.1679e-05, 4.6975e-05, 4.7690e-05, 4.4148e-05], device='cuda:0') 2022-11-16 05:45:36,010 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84605.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:45:54,576 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84633.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:45:56,354 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.464e+02 1.836e+02 2.237e+02 3.755e+02, threshold=3.672e+02, percent-clipped=3.0 2022-11-16 05:45:59,680 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84641.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:46:08,133 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84653.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:46:32,291 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84689.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:46:34,320 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84692.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:46:34,898 INFO [train.py:876] (0/4) Epoch 12, batch 4700, loss[loss=0.07333, simple_loss=0.1093, pruned_loss=0.01866, over 5427.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1363, pruned_loss=0.03803, over 1083327.08 frames. ], batch size: 11, lr: 6.68e-03, grad_scale: 16.0 2022-11-16 05:46:35,704 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84694.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:46:56,984 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2337, 0.9131, 1.0085, 0.8112, 1.2778, 1.1233, 0.5255, 0.9423], device='cuda:0'), covar=tensor([0.0309, 0.0441, 0.0297, 0.0695, 0.0364, 0.0280, 0.0929, 0.0323], device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0024, 0.0017, 0.0021, 0.0017, 0.0015, 0.0023, 0.0016], device='cuda:0'), out_proj_covar=tensor([8.4450e-05, 1.1643e-04, 8.9799e-05, 1.0464e-04, 9.1893e-05, 8.5505e-05, 1.1307e-04, 8.6140e-05], device='cuda:0') 2022-11-16 05:47:03,946 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 1.427e+02 1.725e+02 2.096e+02 3.748e+02, threshold=3.451e+02, percent-clipped=1.0 2022-11-16 05:47:15,107 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84753.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:47:30,469 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1109, 3.0152, 2.7184, 3.0454, 3.0610, 2.7411, 2.6072, 2.8882], device='cuda:0'), covar=tensor([0.0290, 0.0689, 0.1405, 0.0546, 0.0591, 0.0526, 0.1207, 0.0648], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0180, 0.0275, 0.0176, 0.0223, 0.0173, 0.0191, 0.0175], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:47:42,122 INFO [train.py:876] (0/4) Epoch 12, batch 4800, loss[loss=0.1021, simple_loss=0.133, pruned_loss=0.0356, over 5774.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1383, pruned_loss=0.03999, over 1084051.32 frames. ], batch size: 16, lr: 6.68e-03, grad_scale: 16.0 2022-11-16 05:47:47,074 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84800.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:47:47,585 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84801.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:48:11,628 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.403e+01 1.631e+02 1.987e+02 2.475e+02 5.083e+02, threshold=3.974e+02, percent-clipped=5.0 2022-11-16 05:48:17,139 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.13 vs. limit=2.0 2022-11-16 05:48:24,058 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84855.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:48:26,179 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.41 vs. limit=5.0 2022-11-16 05:48:27,992 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=84861.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:48:41,869 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84881.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:48:49,081 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8766, 4.5998, 4.9154, 4.9525, 4.3434, 4.2165, 5.3899, 4.6013], device='cuda:0'), covar=tensor([0.0484, 0.1269, 0.0490, 0.1440, 0.0460, 0.0319, 0.0692, 0.0580], device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0111, 0.0099, 0.0124, 0.0091, 0.0082, 0.0149, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:48:49,152 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84892.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:48:49,705 INFO [train.py:876] (0/4) Epoch 12, batch 4900, loss[loss=0.08932, simple_loss=0.1217, pruned_loss=0.02848, over 5742.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1378, pruned_loss=0.03968, over 1084130.70 frames. ], batch size: 13, lr: 6.68e-03, grad_scale: 32.0 2022-11-16 05:48:56,262 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84903.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:49:19,714 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.452e+01 1.404e+02 1.712e+02 2.121e+02 6.209e+02, threshold=3.423e+02, percent-clipped=1.0 2022-11-16 05:49:21,800 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=84940.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:49:24,444 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3037, 3.5571, 3.4410, 3.3167, 3.5626, 3.4795, 1.3024, 3.6235], device='cuda:0'), covar=tensor([0.0644, 0.0606, 0.0490, 0.0419, 0.0519, 0.0528, 0.4046, 0.0533], device='cuda:0'), in_proj_covar=tensor([0.0103, 0.0088, 0.0088, 0.0082, 0.0102, 0.0090, 0.0130, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:49:28,964 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=84951.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:49:45,492 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2022-11-16 05:49:54,403 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=84989.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:49:56,371 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=84992.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:49:56,852 INFO [train.py:876] (0/4) Epoch 12, batch 5000, loss[loss=0.09248, simple_loss=0.1165, pruned_loss=0.03423, over 5708.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1369, pruned_loss=0.03872, over 1089101.50 frames. ], batch size: 11, lr: 6.67e-03, grad_scale: 16.0 2022-11-16 05:49:58,508 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.54 vs. limit=2.0 2022-11-16 05:50:01,691 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-85000.pt 2022-11-16 05:50:11,864 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2022-11-16 05:50:13,406 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85012.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:50:29,276 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.668e+01 1.461e+02 1.751e+02 2.205e+02 3.739e+02, threshold=3.502e+02, percent-clipped=4.0 2022-11-16 05:50:31,256 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85040.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:50:56,825 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.79 vs. limit=5.0 2022-11-16 05:51:06,697 INFO [train.py:876] (0/4) Epoch 12, batch 5100, loss[loss=0.08545, simple_loss=0.125, pruned_loss=0.02296, over 5513.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1376, pruned_loss=0.03941, over 1081575.12 frames. ], batch size: 12, lr: 6.67e-03, grad_scale: 16.0 2022-11-16 05:51:10,932 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([6.0246, 5.4975, 5.6652, 5.3167, 6.1040, 5.9493, 5.0784, 6.0738], device='cuda:0'), covar=tensor([0.0324, 0.0317, 0.0446, 0.0280, 0.0287, 0.0199, 0.0195, 0.0232], device='cuda:0'), in_proj_covar=tensor([0.0143, 0.0152, 0.0109, 0.0142, 0.0180, 0.0107, 0.0126, 0.0153], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 05:51:16,482 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85107.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:51:21,653 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85115.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:51:29,840 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3508, 2.2402, 2.4961, 3.3957, 3.2905, 2.5248, 2.1787, 3.4091], device='cuda:0'), covar=tensor([0.1001, 0.2692, 0.2010, 0.2214, 0.1509, 0.3147, 0.2231, 0.1234], device='cuda:0'), in_proj_covar=tensor([0.0251, 0.0195, 0.0189, 0.0303, 0.0222, 0.0204, 0.0190, 0.0248], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 05:51:36,165 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 1.524e+02 1.873e+02 2.260e+02 4.795e+02, threshold=3.745e+02, percent-clipped=3.0 2022-11-16 05:51:49,408 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85156.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:51:57,601 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85168.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 05:52:03,255 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85176.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:52:05,251 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8561, 4.4653, 4.0355, 3.6949, 2.1687, 4.2352, 2.4657, 3.8088], device='cuda:0'), covar=tensor([0.0406, 0.0140, 0.0187, 0.0356, 0.0640, 0.0168, 0.0507, 0.0120], device='cuda:0'), in_proj_covar=tensor([0.0194, 0.0178, 0.0181, 0.0204, 0.0192, 0.0181, 0.0190, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 05:52:06,499 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85181.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:52:14,134 INFO [train.py:876] (0/4) Epoch 12, batch 5200, loss[loss=0.1222, simple_loss=0.1545, pruned_loss=0.04498, over 5343.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1369, pruned_loss=0.03867, over 1082689.04 frames. ], batch size: 79, lr: 6.66e-03, grad_scale: 16.0 2022-11-16 05:52:15,826 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.96 vs. limit=2.0 2022-11-16 05:52:39,250 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85229.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:52:45,062 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.605e+01 1.468e+02 1.779e+02 2.161e+02 4.129e+02, threshold=3.557e+02, percent-clipped=1.0 2022-11-16 05:52:52,503 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9573, 4.8104, 5.0272, 4.9729, 4.5961, 4.5258, 5.5142, 5.0191], device='cuda:0'), covar=tensor([0.0312, 0.0954, 0.0574, 0.1229, 0.0470, 0.0379, 0.0649, 0.0518], device='cuda:0'), in_proj_covar=tensor([0.0090, 0.0112, 0.0099, 0.0123, 0.0092, 0.0083, 0.0149, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:53:20,077 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85289.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:53:21,309 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9404, 2.8660, 2.5687, 2.8387, 2.8988, 2.6359, 2.4700, 2.8124], device='cuda:0'), covar=tensor([0.0300, 0.0692, 0.1517, 0.0599, 0.0626, 0.0505, 0.1125, 0.0558], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0178, 0.0274, 0.0174, 0.0223, 0.0174, 0.0189, 0.0176], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:53:22,524 INFO [train.py:876] (0/4) Epoch 12, batch 5300, loss[loss=0.08061, simple_loss=0.1134, pruned_loss=0.0239, over 5600.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1374, pruned_loss=0.0388, over 1084170.15 frames. ], batch size: 18, lr: 6.66e-03, grad_scale: 8.0 2022-11-16 05:53:23,327 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8186, 1.2321, 1.0723, 1.2486, 1.0545, 1.1367, 0.9586, 1.2561], device='cuda:0'), covar=tensor([0.2644, 0.1317, 0.1314, 0.1029, 0.1388, 0.1605, 0.1519, 0.0610], device='cuda:0'), in_proj_covar=tensor([0.0113, 0.0106, 0.0106, 0.0104, 0.0093, 0.0102, 0.0098, 0.0082], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 05:53:31,519 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85307.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:53:52,734 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85337.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:53:53,357 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.591e+01 1.461e+02 1.746e+02 2.193e+02 3.892e+02, threshold=3.493e+02, percent-clipped=1.0 2022-11-16 05:54:19,590 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5631, 4.6046, 3.4538, 2.0631, 4.1453, 2.0321, 4.3502, 2.5353], device='cuda:0'), covar=tensor([0.1305, 0.0109, 0.0558, 0.1876, 0.0159, 0.1508, 0.0172, 0.1396], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0103, 0.0115, 0.0111, 0.0101, 0.0119, 0.0100, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:54:33,130 INFO [train.py:876] (0/4) Epoch 12, batch 5400, loss[loss=0.1046, simple_loss=0.1336, pruned_loss=0.03777, over 5742.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1373, pruned_loss=0.03902, over 1084455.70 frames. ], batch size: 31, lr: 6.66e-03, grad_scale: 8.0 2022-11-16 05:54:57,330 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85428.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:55:04,100 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.454e+02 1.853e+02 2.296e+02 5.814e+02, threshold=3.706e+02, percent-clipped=5.0 2022-11-16 05:55:05,304 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.67 vs. limit=2.0 2022-11-16 05:55:15,941 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85456.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:55:20,368 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85463.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 05:55:25,549 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85471.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:55:34,545 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85483.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:55:38,420 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85489.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:55:41,215 INFO [train.py:876] (0/4) Epoch 12, batch 5500, loss[loss=0.1079, simple_loss=0.1442, pruned_loss=0.03578, over 5549.00 frames. ], tot_loss[loss=0.109, simple_loss=0.138, pruned_loss=0.03998, over 1074391.18 frames. ], batch size: 43, lr: 6.65e-03, grad_scale: 8.0 2022-11-16 05:55:48,422 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85504.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:56:03,092 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.00 vs. limit=2.0 2022-11-16 05:56:06,093 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85530.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:56:11,498 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.321e+01 1.546e+02 1.853e+02 2.385e+02 3.916e+02, threshold=3.707e+02, percent-clipped=1.0 2022-11-16 05:56:16,108 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85544.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:56:37,977 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2226, 4.0289, 4.1498, 4.1753, 3.8155, 3.6891, 4.6213, 4.2154], device='cuda:0'), covar=tensor([0.0447, 0.0803, 0.0440, 0.0994, 0.0545, 0.0382, 0.0589, 0.0568], device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0110, 0.0098, 0.0123, 0.0091, 0.0082, 0.0148, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 05:56:47,722 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85591.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:56:48,894 INFO [train.py:876] (0/4) Epoch 12, batch 5600, loss[loss=0.1033, simple_loss=0.1411, pruned_loss=0.03272, over 5574.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1376, pruned_loss=0.03955, over 1080737.59 frames. ], batch size: 21, lr: 6.65e-03, grad_scale: 8.0 2022-11-16 05:56:58,719 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85607.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:57:09,518 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.47 vs. limit=5.0 2022-11-16 05:57:10,800 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85625.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:57:20,010 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.221e+01 1.474e+02 1.888e+02 2.414e+02 5.206e+02, threshold=3.776e+02, percent-clipped=5.0 2022-11-16 05:57:25,498 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9876, 2.0028, 1.9138, 2.0869, 1.8762, 1.6463, 1.9306, 2.1848], device='cuda:0'), covar=tensor([0.1731, 0.1654, 0.2099, 0.1554, 0.1624, 0.2201, 0.1558, 0.1043], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0106, 0.0104, 0.0104, 0.0091, 0.0101, 0.0097, 0.0081], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 05:57:32,028 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85655.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:57:49,388 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.47 vs. limit=5.0 2022-11-16 05:57:52,331 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85686.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 05:57:52,897 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2465, 3.7217, 2.9498, 1.8437, 3.4171, 1.3314, 3.4906, 1.9640], device='cuda:0'), covar=tensor([0.1565, 0.0180, 0.0759, 0.1969, 0.0278, 0.2177, 0.0255, 0.1623], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0103, 0.0114, 0.0111, 0.0101, 0.0120, 0.0100, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 05:57:56,995 INFO [train.py:876] (0/4) Epoch 12, batch 5700, loss[loss=0.1278, simple_loss=0.1484, pruned_loss=0.05358, over 4654.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1378, pruned_loss=0.04, over 1081192.47 frames. ], batch size: 135, lr: 6.64e-03, grad_scale: 8.0 2022-11-16 05:58:09,759 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9766, 2.1352, 2.4332, 3.3002, 3.1531, 2.4208, 2.1349, 3.3861], device='cuda:0'), covar=tensor([0.1633, 0.2647, 0.2072, 0.2008, 0.1207, 0.2914, 0.2079, 0.0910], device='cuda:0'), in_proj_covar=tensor([0.0255, 0.0199, 0.0188, 0.0304, 0.0227, 0.0204, 0.0189, 0.0250], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 05:58:26,981 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.513e+01 1.496e+02 1.877e+02 2.228e+02 5.709e+02, threshold=3.754e+02, percent-clipped=3.0 2022-11-16 05:58:43,939 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85763.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:58:49,764 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=85771.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:58:52,465 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2022-11-16 05:58:58,186 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85784.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:59:04,113 INFO [train.py:876] (0/4) Epoch 12, batch 5800, loss[loss=0.1088, simple_loss=0.1342, pruned_loss=0.0417, over 5547.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1381, pruned_loss=0.04012, over 1083233.12 frames. ], batch size: 14, lr: 6.64e-03, grad_scale: 8.0 2022-11-16 05:59:16,819 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85811.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:59:22,781 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=85819.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 05:59:35,106 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.803e+01 1.482e+02 1.820e+02 2.147e+02 4.590e+02, threshold=3.641e+02, percent-clipped=4.0 2022-11-16 05:59:35,892 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85839.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:00:00,440 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85875.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:00:07,910 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85886.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:00:12,395 INFO [train.py:876] (0/4) Epoch 12, batch 5900, loss[loss=0.08691, simple_loss=0.1227, pruned_loss=0.02554, over 5675.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1379, pruned_loss=0.03954, over 1089113.99 frames. ], batch size: 28, lr: 6.64e-03, grad_scale: 8.0 2022-11-16 06:00:32,220 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.50 vs. limit=2.0 2022-11-16 06:00:38,036 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85930.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:00:41,974 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85936.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:00:43,059 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.779e+01 1.458e+02 1.851e+02 2.281e+02 4.967e+02, threshold=3.703e+02, percent-clipped=4.0 2022-11-16 06:00:44,539 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3344, 1.2543, 1.2378, 0.9294, 1.3883, 1.4976, 0.7514, 1.1645], device='cuda:0'), covar=tensor([0.0572, 0.0556, 0.0353, 0.0727, 0.0494, 0.0316, 0.1089, 0.0393], device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0024, 0.0017, 0.0021, 0.0017, 0.0016, 0.0023, 0.0016], device='cuda:0'), out_proj_covar=tensor([8.6649e-05, 1.1906e-04, 9.2798e-05, 1.0582e-04, 9.3719e-05, 8.7854e-05, 1.1495e-04, 8.8845e-05], device='cuda:0') 2022-11-16 06:00:55,686 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=85957.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:01:12,051 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=85981.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:01:18,824 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=85991.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 06:01:19,973 INFO [train.py:876] (0/4) Epoch 12, batch 6000, loss[loss=0.121, simple_loss=0.1464, pruned_loss=0.0478, over 5593.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1365, pruned_loss=0.03833, over 1092978.71 frames. ], batch size: 43, lr: 6.63e-03, grad_scale: 8.0 2022-11-16 06:01:19,974 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 06:01:33,273 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8194, 1.7645, 2.3751, 1.8450, 1.6433, 2.8661, 2.0458, 1.8828], device='cuda:0'), covar=tensor([0.1055, 0.1890, 0.1183, 0.2533, 0.2938, 0.0354, 0.1444, 0.2149], device='cuda:0'), in_proj_covar=tensor([0.0107, 0.0098, 0.0097, 0.0103, 0.0076, 0.0069, 0.0080, 0.0092], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 06:01:37,497 INFO [train.py:908] (0/4) Epoch 12, validation: loss=0.1738, simple_loss=0.1864, pruned_loss=0.08063, over 1530663.00 frames. 2022-11-16 06:01:37,497 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 06:01:43,763 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86002.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:01:54,426 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86018.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:01:56,712 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-16 06:01:58,571 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3069, 2.3901, 2.1146, 2.3439, 2.4227, 2.2012, 2.0996, 2.2493], device='cuda:0'), covar=tensor([0.0459, 0.0855, 0.1783, 0.0768, 0.0732, 0.0684, 0.1333, 0.0741], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0179, 0.0273, 0.0174, 0.0223, 0.0174, 0.0190, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:02:08,196 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 1.359e+02 1.723e+02 2.216e+02 5.600e+02, threshold=3.445e+02, percent-clipped=2.0 2022-11-16 06:02:17,508 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86052.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:02:24,797 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86063.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:02:29,097 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86069.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:02:39,113 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86084.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:02:45,097 INFO [train.py:876] (0/4) Epoch 12, batch 6100, loss[loss=0.07218, simple_loss=0.09635, pruned_loss=0.024, over 5403.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1362, pruned_loss=0.03817, over 1092259.88 frames. ], batch size: 9, lr: 6.63e-03, grad_scale: 8.0 2022-11-16 06:02:58,464 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86113.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:03:10,247 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86130.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:03:11,382 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86132.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:03:15,121 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.336e+01 1.468e+02 1.787e+02 2.256e+02 5.479e+02, threshold=3.574e+02, percent-clipped=5.0 2022-11-16 06:03:15,908 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86139.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:03:20,815 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0304, 1.6025, 1.0312, 0.8292, 1.4004, 1.0416, 0.7254, 1.3155], device='cuda:0'), covar=tensor([0.0065, 0.0050, 0.0059, 0.0061, 0.0051, 0.0056, 0.0076, 0.0066], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0057, 0.0056, 0.0060, 0.0058, 0.0055, 0.0054, 0.0052], device='cuda:0'), out_proj_covar=tensor([5.4489e-05, 5.0299e-05, 4.9009e-05, 5.3589e-05, 5.1241e-05, 4.7509e-05, 4.8159e-05, 4.5314e-05], device='cuda:0') 2022-11-16 06:03:32,096 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.60 vs. limit=2.0 2022-11-16 06:03:36,508 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3196, 2.8466, 3.1590, 4.2066, 4.1563, 3.2129, 2.8737, 4.0940], device='cuda:0'), covar=tensor([0.0641, 0.2663, 0.1852, 0.1873, 0.0972, 0.2719, 0.1977, 0.0877], device='cuda:0'), in_proj_covar=tensor([0.0255, 0.0198, 0.0190, 0.0303, 0.0225, 0.0204, 0.0188, 0.0249], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 06:03:47,209 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86186.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:03:47,784 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86187.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:03:51,956 INFO [train.py:876] (0/4) Epoch 12, batch 6200, loss[loss=0.1129, simple_loss=0.1436, pruned_loss=0.04104, over 5545.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1365, pruned_loss=0.03841, over 1093036.75 frames. ], batch size: 40, lr: 6.63e-03, grad_scale: 8.0 2022-11-16 06:04:17,345 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86231.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:04:19,250 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86234.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:04:22,081 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.231e+01 1.405e+02 1.749e+02 2.219e+02 4.004e+02, threshold=3.499e+02, percent-clipped=3.0 2022-11-16 06:04:27,860 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86246.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:04:33,729 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1323, 2.7642, 2.6948, 1.6690, 3.0890, 2.9208, 3.1030, 3.3111], device='cuda:0'), covar=tensor([0.1970, 0.1774, 0.1144, 0.3111, 0.0687, 0.1077, 0.0512, 0.0844], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0182, 0.0167, 0.0186, 0.0183, 0.0203, 0.0168, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:04:51,626 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86281.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:04:54,813 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86286.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 06:04:59,676 INFO [train.py:876] (0/4) Epoch 12, batch 6300, loss[loss=0.08896, simple_loss=0.1241, pruned_loss=0.02691, over 5513.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1363, pruned_loss=0.03844, over 1086867.89 frames. ], batch size: 17, lr: 6.62e-03, grad_scale: 8.0 2022-11-16 06:05:09,333 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86307.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:05:13,613 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86313.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:05:20,529 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.86 vs. limit=2.0 2022-11-16 06:05:24,117 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86329.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:05:28,875 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8816, 2.5528, 2.2525, 1.6655, 2.7548, 2.5950, 2.6289, 2.9393], device='cuda:0'), covar=tensor([0.1765, 0.1638, 0.1583, 0.2707, 0.0933, 0.1191, 0.0664, 0.0991], device='cuda:0'), in_proj_covar=tensor([0.0168, 0.0183, 0.0168, 0.0187, 0.0184, 0.0205, 0.0169, 0.0188], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:05:29,915 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.920e+01 1.421e+02 1.647e+02 2.112e+02 5.317e+02, threshold=3.295e+02, percent-clipped=6.0 2022-11-16 06:05:32,701 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.09 vs. limit=5.0 2022-11-16 06:05:44,651 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86358.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:06:07,711 INFO [train.py:876] (0/4) Epoch 12, batch 6400, loss[loss=0.05691, simple_loss=0.0852, pruned_loss=0.01431, over 5219.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1368, pruned_loss=0.03887, over 1080745.70 frames. ], batch size: 8, lr: 6.62e-03, grad_scale: 8.0 2022-11-16 06:06:18,206 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86408.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:06:29,990 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86425.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:06:38,424 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.070e+01 1.434e+02 1.773e+02 2.236e+02 3.206e+02, threshold=3.547e+02, percent-clipped=0.0 2022-11-16 06:06:39,864 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86440.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:06:43,159 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86445.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:07:15,664 INFO [train.py:876] (0/4) Epoch 12, batch 6500, loss[loss=0.1064, simple_loss=0.1309, pruned_loss=0.04096, over 5542.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1376, pruned_loss=0.03952, over 1080746.07 frames. ], batch size: 40, lr: 6.61e-03, grad_scale: 8.0 2022-11-16 06:07:21,057 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86501.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:07:25,132 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86506.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:07:42,123 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86531.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:07:46,550 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.116e+01 1.482e+02 1.807e+02 2.369e+02 3.734e+02, threshold=3.614e+02, percent-clipped=1.0 2022-11-16 06:08:09,052 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2022-11-16 06:08:13,931 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86579.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:08:14,718 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86580.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:08:18,592 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86586.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:08:23,752 INFO [train.py:876] (0/4) Epoch 12, batch 6600, loss[loss=0.1398, simple_loss=0.1581, pruned_loss=0.0607, over 5685.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1372, pruned_loss=0.03829, over 1088970.78 frames. ], batch size: 36, lr: 6.61e-03, grad_scale: 8.0 2022-11-16 06:08:29,997 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86602.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:08:37,135 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86613.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:08:37,859 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9403, 1.9584, 2.3386, 2.0856, 1.2812, 1.9800, 1.4533, 1.8820], device='cuda:0'), covar=tensor([0.0239, 0.0126, 0.0121, 0.0197, 0.0350, 0.0175, 0.0342, 0.0193], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0181, 0.0182, 0.0205, 0.0194, 0.0182, 0.0190, 0.0185], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 06:08:46,794 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0317, 3.6668, 3.3082, 3.6798, 3.6715, 3.1828, 3.3035, 3.3397], device='cuda:0'), covar=tensor([0.1372, 0.0477, 0.1302, 0.0430, 0.0484, 0.0584, 0.0827, 0.0604], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0178, 0.0277, 0.0176, 0.0223, 0.0175, 0.0191, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:08:51,380 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86634.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 06:08:54,694 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 1.415e+02 1.832e+02 2.260e+02 3.608e+02, threshold=3.664e+02, percent-clipped=0.0 2022-11-16 06:08:56,853 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86641.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:09:08,014 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86658.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:09:09,925 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86661.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:09:32,603 INFO [train.py:876] (0/4) Epoch 12, batch 6700, loss[loss=0.06848, simple_loss=0.1093, pruned_loss=0.0138, over 5590.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1382, pruned_loss=0.0388, over 1090723.07 frames. ], batch size: 16, lr: 6.61e-03, grad_scale: 8.0 2022-11-16 06:09:41,150 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86706.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:09:42,520 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86708.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:09:53,706 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86725.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:09:59,296 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-16 06:10:02,549 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.026e+02 1.559e+02 1.954e+02 2.479e+02 4.501e+02, threshold=3.908e+02, percent-clipped=4.0 2022-11-16 06:10:15,000 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86756.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:10:25,964 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86773.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:10:36,047 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. limit=2.0 2022-11-16 06:10:39,695 INFO [train.py:876] (0/4) Epoch 12, batch 6800, loss[loss=0.09929, simple_loss=0.1404, pruned_loss=0.02911, over 5573.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1377, pruned_loss=0.03858, over 1085718.36 frames. ], batch size: 16, lr: 6.60e-03, grad_scale: 8.0 2022-11-16 06:10:41,680 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86796.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:10:45,583 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86801.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 06:10:46,994 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8800, 2.5144, 3.5379, 3.0367, 3.7318, 2.5617, 3.4038, 3.9295], device='cuda:0'), covar=tensor([0.0911, 0.1667, 0.1008, 0.1962, 0.0538, 0.1719, 0.1369, 0.0855], device='cuda:0'), in_proj_covar=tensor([0.0243, 0.0194, 0.0215, 0.0213, 0.0241, 0.0198, 0.0226, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:10:54,161 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86814.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:11:02,054 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0531, 2.8056, 3.0589, 1.8518, 2.9300, 3.1977, 3.3630, 3.5866], device='cuda:0'), covar=tensor([0.2079, 0.1735, 0.0740, 0.2756, 0.0972, 0.0906, 0.0486, 0.0675], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0181, 0.0167, 0.0183, 0.0182, 0.0202, 0.0167, 0.0185], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:11:10,428 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.000e+02 1.446e+02 1.789e+02 2.436e+02 4.053e+02, threshold=3.578e+02, percent-clipped=1.0 2022-11-16 06:11:35,469 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86875.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:11:37,376 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=86878.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:11:41,918 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9305, 1.5034, 0.9150, 0.9694, 1.2659, 1.2330, 0.7748, 1.2292], device='cuda:0'), covar=tensor([0.0071, 0.0039, 0.0056, 0.0048, 0.0053, 0.0061, 0.0094, 0.0052], device='cuda:0'), in_proj_covar=tensor([0.0061, 0.0057, 0.0056, 0.0060, 0.0058, 0.0055, 0.0054, 0.0052], device='cuda:0'), out_proj_covar=tensor([5.4982e-05, 5.0360e-05, 4.9318e-05, 5.3538e-05, 5.1647e-05, 4.7833e-05, 4.7988e-05, 4.5194e-05], device='cuda:0') 2022-11-16 06:11:42,829 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.83 vs. limit=2.0 2022-11-16 06:11:47,374 INFO [train.py:876] (0/4) Epoch 12, batch 6900, loss[loss=0.1114, simple_loss=0.1361, pruned_loss=0.04333, over 5552.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1365, pruned_loss=0.03809, over 1086542.70 frames. ], batch size: 46, lr: 6.60e-03, grad_scale: 8.0 2022-11-16 06:11:53,841 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=86902.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:12:17,107 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=86936.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:12:18,368 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.928e+01 1.457e+02 1.817e+02 2.231e+02 4.523e+02, threshold=3.633e+02, percent-clipped=5.0 2022-11-16 06:12:19,215 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=86939.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:12:21,398 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9746, 3.5894, 2.5017, 3.3533, 2.6776, 2.5681, 1.9479, 3.0248], device='cuda:0'), covar=tensor([0.1553, 0.0297, 0.1093, 0.0423, 0.1205, 0.1129, 0.1907, 0.0551], device='cuda:0'), in_proj_covar=tensor([0.0156, 0.0143, 0.0159, 0.0149, 0.0173, 0.0168, 0.0159, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:12:26,805 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=86950.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:12:32,614 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.84 vs. limit=2.0 2022-11-16 06:12:41,407 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0915, 1.6345, 1.1060, 1.1789, 1.3609, 1.2335, 1.1197, 1.3654], device='cuda:0'), covar=tensor([0.0073, 0.0043, 0.0073, 0.0073, 0.0072, 0.0064, 0.0089, 0.0070], device='cuda:0'), in_proj_covar=tensor([0.0062, 0.0057, 0.0057, 0.0060, 0.0059, 0.0055, 0.0054, 0.0052], device='cuda:0'), out_proj_covar=tensor([5.5181e-05, 5.0629e-05, 4.9498e-05, 5.3580e-05, 5.1705e-05, 4.7743e-05, 4.8238e-05, 4.5409e-05], device='cuda:0') 2022-11-16 06:12:55,753 INFO [train.py:876] (0/4) Epoch 12, batch 7000, loss[loss=0.1109, simple_loss=0.1415, pruned_loss=0.04016, over 5577.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1371, pruned_loss=0.03823, over 1081440.03 frames. ], batch size: 43, lr: 6.60e-03, grad_scale: 8.0 2022-11-16 06:13:02,849 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87002.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 06:13:19,439 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8449, 4.1931, 3.9311, 3.4919, 2.0464, 4.1147, 2.3800, 3.5967], device='cuda:0'), covar=tensor([0.0462, 0.0202, 0.0196, 0.0525, 0.0716, 0.0182, 0.0571, 0.0163], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0180, 0.0183, 0.0205, 0.0193, 0.0182, 0.0190, 0.0185], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 06:13:26,384 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.516e+01 1.515e+02 1.846e+02 2.332e+02 4.129e+02, threshold=3.691e+02, percent-clipped=3.0 2022-11-16 06:13:43,346 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87063.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 06:14:03,309 INFO [train.py:876] (0/4) Epoch 12, batch 7100, loss[loss=0.114, simple_loss=0.1419, pruned_loss=0.04308, over 5699.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1378, pruned_loss=0.03871, over 1085921.88 frames. ], batch size: 28, lr: 6.59e-03, grad_scale: 8.0 2022-11-16 06:14:05,397 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87096.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:14:08,675 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87101.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 06:14:16,731 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0787, 3.0517, 2.6945, 2.9890, 3.0527, 2.6562, 2.6403, 2.7967], device='cuda:0'), covar=tensor([0.0289, 0.0543, 0.1452, 0.0629, 0.0608, 0.0574, 0.0944, 0.0671], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0177, 0.0273, 0.0174, 0.0222, 0.0173, 0.0189, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:14:29,501 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6115, 3.2524, 3.4294, 3.1542, 3.6724, 3.5340, 3.3786, 3.5988], device='cuda:0'), covar=tensor([0.0454, 0.0522, 0.0523, 0.0501, 0.0464, 0.0270, 0.0452, 0.0493], device='cuda:0'), in_proj_covar=tensor([0.0148, 0.0158, 0.0113, 0.0149, 0.0187, 0.0110, 0.0131, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 06:14:33,939 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.177e+01 1.551e+02 1.888e+02 2.451e+02 4.689e+02, threshold=3.775e+02, percent-clipped=4.0 2022-11-16 06:14:37,912 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87144.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:14:41,184 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87149.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:14:55,687 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87170.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:15:11,220 INFO [train.py:876] (0/4) Epoch 12, batch 7200, loss[loss=0.1578, simple_loss=0.1688, pruned_loss=0.07342, over 5402.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1378, pruned_loss=0.03892, over 1079017.49 frames. ], batch size: 70, lr: 6.59e-03, grad_scale: 8.0 2022-11-16 06:15:20,955 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.25 vs. limit=5.0 2022-11-16 06:15:36,450 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87230.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:15:39,013 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87234.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:15:39,036 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7742, 3.8308, 3.9567, 3.5761, 3.8702, 3.7076, 1.5583, 4.0003], device='cuda:0'), covar=tensor([0.0336, 0.0416, 0.0297, 0.0383, 0.0385, 0.0451, 0.2972, 0.0354], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0088, 0.0088, 0.0082, 0.0103, 0.0090, 0.0130, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:15:40,325 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87236.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:15:41,450 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.393e+01 1.550e+02 1.929e+02 2.381e+02 4.425e+02, threshold=3.859e+02, percent-clipped=3.0 2022-11-16 06:15:46,153 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8101, 3.2794, 3.3287, 1.9628, 3.2635, 3.6783, 3.5544, 4.2351], device='cuda:0'), covar=tensor([0.1347, 0.1323, 0.0783, 0.2373, 0.0549, 0.0658, 0.0685, 0.0488], device='cuda:0'), in_proj_covar=tensor([0.0164, 0.0179, 0.0166, 0.0181, 0.0181, 0.0199, 0.0167, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:15:55,059 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1961, 2.4850, 3.8344, 3.2912, 4.0524, 2.7820, 3.5738, 4.2188], device='cuda:0'), covar=tensor([0.0694, 0.1485, 0.0808, 0.1654, 0.0562, 0.1519, 0.1329, 0.0605], device='cuda:0'), in_proj_covar=tensor([0.0242, 0.0193, 0.0214, 0.0212, 0.0241, 0.0195, 0.0223, 0.0226], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:16:00,079 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-12.pt 2022-11-16 06:16:42,440 INFO [train.py:876] (0/4) Epoch 13, batch 0, loss[loss=0.1029, simple_loss=0.1403, pruned_loss=0.03278, over 5599.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1403, pruned_loss=0.03278, over 5599.00 frames. ], batch size: 18, lr: 6.33e-03, grad_scale: 16.0 2022-11-16 06:16:42,442 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 06:16:58,482 INFO [train.py:908] (0/4) Epoch 13, validation: loss=0.175, simple_loss=0.1891, pruned_loss=0.08049, over 1530663.00 frames. 2022-11-16 06:16:58,483 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 06:17:10,372 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.23 vs. limit=5.0 2022-11-16 06:17:11,204 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87284.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:17:16,601 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87291.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:17:41,158 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2022-11-16 06:17:47,397 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.806e+01 1.427e+02 1.803e+02 2.265e+02 3.823e+02, threshold=3.607e+02, percent-clipped=0.0 2022-11-16 06:18:01,500 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87358.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 06:18:04,458 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87362.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:18:05,074 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7280, 2.9278, 2.5369, 2.8945, 2.4934, 2.8673, 2.9104, 3.2914], device='cuda:0'), covar=tensor([0.1356, 0.1316, 0.1862, 0.3118, 0.1489, 0.1305, 0.1558, 0.1157], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0106, 0.0103, 0.0104, 0.0093, 0.0103, 0.0098, 0.0082], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 06:18:06,212 INFO [train.py:876] (0/4) Epoch 13, batch 100, loss[loss=0.1494, simple_loss=0.166, pruned_loss=0.06638, over 5534.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1387, pruned_loss=0.03924, over 439382.84 frames. ], batch size: 49, lr: 6.32e-03, grad_scale: 16.0 2022-11-16 06:18:33,004 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1491, 4.2516, 4.2907, 3.9028, 2.4005, 4.7870, 2.6476, 4.1195], device='cuda:0'), covar=tensor([0.0375, 0.0264, 0.0163, 0.0327, 0.0637, 0.0127, 0.0517, 0.0121], device='cuda:0'), in_proj_covar=tensor([0.0197, 0.0182, 0.0185, 0.0206, 0.0195, 0.0184, 0.0191, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 06:18:45,452 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87423.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:18:55,226 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.013e+01 1.498e+02 1.837e+02 2.189e+02 4.153e+02, threshold=3.674e+02, percent-clipped=6.0 2022-11-16 06:19:10,501 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87461.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:19:12,926 INFO [train.py:876] (0/4) Epoch 13, batch 200, loss[loss=0.1011, simple_loss=0.1324, pruned_loss=0.03493, over 5550.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1371, pruned_loss=0.03789, over 699051.14 frames. ], batch size: 25, lr: 6.32e-03, grad_scale: 16.0 2022-11-16 06:19:16,712 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87470.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:19:40,369 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87505.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:19:49,624 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87518.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:19:52,407 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87522.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:19:58,298 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0788, 1.5555, 1.3145, 1.4323, 1.2494, 1.9673, 1.5525, 1.2890], device='cuda:0'), covar=tensor([0.3010, 0.1121, 0.3044, 0.2830, 0.2426, 0.0707, 0.2206, 0.2797], device='cuda:0'), in_proj_covar=tensor([0.0106, 0.0096, 0.0097, 0.0100, 0.0075, 0.0068, 0.0080, 0.0090], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 06:20:00,735 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87534.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:20:03,214 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.494e+01 1.564e+02 1.812e+02 2.322e+02 4.189e+02, threshold=3.625e+02, percent-clipped=2.0 2022-11-16 06:20:14,742 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.80 vs. limit=2.0 2022-11-16 06:20:15,779 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5915, 4.3625, 3.2834, 1.9111, 4.1495, 1.7632, 3.8857, 2.2172], device='cuda:0'), covar=tensor([0.1293, 0.0139, 0.0688, 0.1877, 0.0184, 0.1636, 0.0240, 0.1422], device='cuda:0'), in_proj_covar=tensor([0.0122, 0.0105, 0.0116, 0.0112, 0.0103, 0.0120, 0.0101, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:20:21,145 INFO [train.py:876] (0/4) Epoch 13, batch 300, loss[loss=0.08761, simple_loss=0.1309, pruned_loss=0.02217, over 5600.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1366, pruned_loss=0.03847, over 852697.65 frames. ], batch size: 18, lr: 6.32e-03, grad_scale: 16.0 2022-11-16 06:20:21,964 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87566.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:20:33,024 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87582.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:20:34,406 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87584.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:20:35,592 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87586.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:20:37,645 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87589.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:20:43,239 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3872, 2.5654, 3.7938, 3.1047, 4.2174, 2.8632, 3.6740, 4.3092], device='cuda:0'), covar=tensor([0.0498, 0.1707, 0.0724, 0.1597, 0.0503, 0.1495, 0.1312, 0.0708], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0194, 0.0216, 0.0214, 0.0242, 0.0197, 0.0225, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:20:48,686 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=87605.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:21:11,331 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.900e+01 1.347e+02 1.607e+02 1.950e+02 4.005e+02, threshold=3.214e+02, percent-clipped=2.0 2022-11-16 06:21:16,191 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87645.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:21:19,729 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87650.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:21:24,820 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87658.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 06:21:29,302 INFO [train.py:876] (0/4) Epoch 13, batch 400, loss[loss=0.08834, simple_loss=0.1244, pruned_loss=0.02616, over 5470.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1346, pruned_loss=0.0368, over 944920.56 frames. ], batch size: 10, lr: 6.31e-03, grad_scale: 16.0 2022-11-16 06:21:30,111 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=87666.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 06:21:30,909 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.68 vs. limit=2.0 2022-11-16 06:21:57,181 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87706.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:22:05,082 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87718.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:22:19,046 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.955e+01 1.567e+02 1.911e+02 2.428e+02 4.922e+02, threshold=3.823e+02, percent-clipped=4.0 2022-11-16 06:22:21,962 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9842, 0.7298, 0.9978, 0.8321, 1.0147, 0.8081, 0.4412, 0.7448], device='cuda:0'), covar=tensor([0.0287, 0.0353, 0.0368, 0.0416, 0.0381, 0.0314, 0.0770, 0.0346], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0025, 0.0017, 0.0021, 0.0017, 0.0016, 0.0024, 0.0016], device='cuda:0'), out_proj_covar=tensor([8.8208e-05, 1.2111e-04, 9.2377e-05, 1.0752e-04, 9.4262e-05, 8.7733e-05, 1.1701e-04, 8.9319e-05], device='cuda:0') 2022-11-16 06:22:31,902 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.56 vs. limit=2.0 2022-11-16 06:22:37,353 INFO [train.py:876] (0/4) Epoch 13, batch 500, loss[loss=0.1117, simple_loss=0.1565, pruned_loss=0.03349, over 5638.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1368, pruned_loss=0.03787, over 998165.05 frames. ], batch size: 38, lr: 6.31e-03, grad_scale: 16.0 2022-11-16 06:22:44,913 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.56 vs. limit=5.0 2022-11-16 06:22:50,460 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4849, 4.3141, 4.5201, 4.5437, 4.2047, 4.0564, 4.9925, 4.6276], device='cuda:0'), covar=tensor([0.0487, 0.0845, 0.0474, 0.1455, 0.0526, 0.0352, 0.0680, 0.0610], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0108, 0.0096, 0.0122, 0.0091, 0.0080, 0.0146, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:22:52,436 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3075, 4.4934, 4.5437, 4.5609, 4.0692, 4.1281, 5.0024, 4.6089], device='cuda:0'), covar=tensor([0.0475, 0.0815, 0.0371, 0.1111, 0.0624, 0.0351, 0.0761, 0.0563], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0108, 0.0096, 0.0122, 0.0091, 0.0080, 0.0146, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:23:12,961 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87817.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:23:26,585 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.493e+01 1.445e+02 1.920e+02 2.398e+02 4.024e+02, threshold=3.840e+02, percent-clipped=2.0 2022-11-16 06:23:42,716 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87861.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 06:23:45,206 INFO [train.py:876] (0/4) Epoch 13, batch 600, loss[loss=0.1143, simple_loss=0.1403, pruned_loss=0.04412, over 5542.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1354, pruned_loss=0.03702, over 1036871.91 frames. ], batch size: 14, lr: 6.31e-03, grad_scale: 16.0 2022-11-16 06:23:59,294 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=87886.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:24:04,445 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4952, 4.9270, 4.5793, 4.9280, 4.9284, 4.1189, 4.4686, 4.3398], device='cuda:0'), covar=tensor([0.0320, 0.0420, 0.1293, 0.0319, 0.0399, 0.0457, 0.0468, 0.0457], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0176, 0.0276, 0.0174, 0.0221, 0.0174, 0.0189, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:24:31,945 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=87934.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:24:35,163 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 1.443e+02 1.741e+02 2.053e+02 3.488e+02, threshold=3.481e+02, percent-clipped=0.0 2022-11-16 06:24:35,921 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87940.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 06:24:39,192 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87945.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:24:43,150 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4757, 2.2390, 3.1554, 2.6753, 3.1643, 2.1950, 2.9863, 3.4260], device='cuda:0'), covar=tensor([0.0740, 0.1590, 0.0993, 0.1668, 0.0857, 0.1806, 0.1100, 0.1177], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0193, 0.0218, 0.0214, 0.0242, 0.0197, 0.0227, 0.0231], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:24:49,806 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=87961.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:24:52,674 INFO [train.py:876] (0/4) Epoch 13, batch 700, loss[loss=0.1724, simple_loss=0.1829, pruned_loss=0.08095, over 5463.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1356, pruned_loss=0.03727, over 1059131.74 frames. ], batch size: 53, lr: 6.30e-03, grad_scale: 8.0 2022-11-16 06:25:19,382 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0438, 2.5296, 2.9616, 3.8010, 3.7735, 2.9558, 2.5915, 3.8905], device='cuda:0'), covar=tensor([0.0694, 0.2847, 0.2097, 0.2458, 0.1298, 0.2629, 0.2175, 0.0781], device='cuda:0'), in_proj_covar=tensor([0.0256, 0.0196, 0.0188, 0.0302, 0.0227, 0.0203, 0.0190, 0.0248], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 06:25:22,609 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3262, 3.0339, 3.4380, 1.7977, 3.3039, 3.5669, 3.5569, 4.0092], device='cuda:0'), covar=tensor([0.1826, 0.1401, 0.0923, 0.2655, 0.0564, 0.1323, 0.0403, 0.0525], device='cuda:0'), in_proj_covar=tensor([0.0164, 0.0181, 0.0168, 0.0182, 0.0180, 0.0200, 0.0167, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:25:29,000 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88018.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:25:42,925 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.823e+01 1.432e+02 1.789e+02 2.095e+02 4.590e+02, threshold=3.577e+02, percent-clipped=1.0 2022-11-16 06:26:00,229 INFO [train.py:876] (0/4) Epoch 13, batch 800, loss[loss=0.1119, simple_loss=0.1498, pruned_loss=0.03699, over 5760.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1367, pruned_loss=0.03822, over 1060421.61 frames. ], batch size: 16, lr: 6.30e-03, grad_scale: 8.0 2022-11-16 06:26:01,330 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88066.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:26:06,066 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5859, 1.8239, 2.1865, 1.7079, 2.1646, 2.3003, 1.6054, 1.5731], device='cuda:0'), covar=tensor([0.2137, 0.0471, 0.0240, 0.0561, 0.1404, 0.0734, 0.0614, 0.0768], device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0024, 0.0017, 0.0020, 0.0017, 0.0015, 0.0023, 0.0016], device='cuda:0'), out_proj_covar=tensor([8.4425e-05, 1.1668e-04, 8.8997e-05, 1.0314e-04, 9.0831e-05, 8.5333e-05, 1.1296e-04, 8.6204e-05], device='cuda:0') 2022-11-16 06:26:23,570 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-16 06:26:24,665 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=88100.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:26:33,989 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2022-11-16 06:26:36,774 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88117.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:26:51,416 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.479e+01 1.431e+02 1.753e+02 2.206e+02 3.833e+02, threshold=3.505e+02, percent-clipped=1.0 2022-11-16 06:27:05,810 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88161.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 06:27:05,854 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=88161.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:27:08,635 INFO [train.py:876] (0/4) Epoch 13, batch 900, loss[loss=0.09375, simple_loss=0.1301, pruned_loss=0.02869, over 5541.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1373, pruned_loss=0.03857, over 1068307.26 frames. ], batch size: 14, lr: 6.30e-03, grad_scale: 8.0 2022-11-16 06:27:08,676 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88165.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:27:38,493 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88209.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:27:59,345 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.909e+01 1.508e+02 1.868e+02 2.272e+02 4.107e+02, threshold=3.735e+02, percent-clipped=5.0 2022-11-16 06:28:00,097 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88240.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:28:03,431 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88245.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:28:09,271 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1782, 2.2185, 2.8722, 2.6636, 2.5588, 2.0607, 2.6798, 3.1285], device='cuda:0'), covar=tensor([0.0976, 0.1245, 0.0925, 0.1277, 0.0891, 0.1421, 0.1121, 0.0731], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0193, 0.0218, 0.0214, 0.0243, 0.0197, 0.0227, 0.0230], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:28:13,890 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-16 06:28:14,161 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88261.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:28:16,639 INFO [train.py:876] (0/4) Epoch 13, batch 1000, loss[loss=0.1063, simple_loss=0.1424, pruned_loss=0.0351, over 5604.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1373, pruned_loss=0.03825, over 1077183.72 frames. ], batch size: 22, lr: 6.29e-03, grad_scale: 8.0 2022-11-16 06:28:21,271 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2662, 3.8396, 4.0992, 3.8379, 4.3271, 4.1284, 3.9483, 4.2898], device='cuda:0'), covar=tensor([0.0339, 0.0435, 0.0457, 0.0390, 0.0357, 0.0263, 0.0348, 0.0360], device='cuda:0'), in_proj_covar=tensor([0.0146, 0.0155, 0.0111, 0.0144, 0.0183, 0.0109, 0.0129, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 06:28:28,629 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2022-11-16 06:28:32,426 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88288.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:28:34,845 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=88291.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 06:28:36,005 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88293.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:28:46,433 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88309.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:29:06,586 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.385e+01 1.405e+02 1.701e+02 2.123e+02 3.653e+02, threshold=3.402e+02, percent-clipped=0.0 2022-11-16 06:29:09,345 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.21 vs. limit=5.0 2022-11-16 06:29:16,012 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=88352.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 06:29:21,084 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2300, 2.8335, 3.0814, 1.6610, 2.9980, 3.3981, 3.2198, 3.8029], device='cuda:0'), covar=tensor([0.1866, 0.1686, 0.1220, 0.2938, 0.0988, 0.1095, 0.0933, 0.0691], device='cuda:0'), in_proj_covar=tensor([0.0165, 0.0181, 0.0167, 0.0182, 0.0182, 0.0201, 0.0168, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:29:21,665 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8257, 2.3595, 2.8341, 1.9945, 1.6432, 3.2349, 2.7261, 2.2589], device='cuda:0'), covar=tensor([0.0906, 0.1329, 0.1101, 0.2862, 0.3416, 0.2518, 0.0903, 0.1526], device='cuda:0'), in_proj_covar=tensor([0.0111, 0.0102, 0.0102, 0.0105, 0.0078, 0.0072, 0.0083, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 06:29:24,127 INFO [train.py:876] (0/4) Epoch 13, batch 1100, loss[loss=0.1189, simple_loss=0.1338, pruned_loss=0.05194, over 4689.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1366, pruned_loss=0.03747, over 1081384.85 frames. ], batch size: 135, lr: 6.29e-03, grad_scale: 8.0 2022-11-16 06:29:54,255 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0247, 2.3819, 3.5377, 3.1432, 3.8030, 2.6381, 3.3282, 3.9525], device='cuda:0'), covar=tensor([0.0675, 0.1761, 0.1030, 0.1479, 0.0517, 0.1704, 0.1391, 0.0726], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0193, 0.0217, 0.0212, 0.0243, 0.0198, 0.0228, 0.0231], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:30:13,853 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 1.472e+02 1.907e+02 2.402e+02 6.330e+02, threshold=3.813e+02, percent-clipped=8.0 2022-11-16 06:30:25,732 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=88456.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:30:31,427 INFO [train.py:876] (0/4) Epoch 13, batch 1200, loss[loss=0.06712, simple_loss=0.1025, pruned_loss=0.01585, over 5566.00 frames. ], tot_loss[loss=0.105, simple_loss=0.1358, pruned_loss=0.03706, over 1086312.73 frames. ], batch size: 10, lr: 6.28e-03, grad_scale: 8.0 2022-11-16 06:30:52,097 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6533, 1.8846, 1.9431, 1.6587, 1.9017, 1.8525, 0.9157, 1.9670], device='cuda:0'), covar=tensor([0.0439, 0.0487, 0.0408, 0.0493, 0.0490, 0.0511, 0.2478, 0.0483], device='cuda:0'), in_proj_covar=tensor([0.0106, 0.0089, 0.0089, 0.0084, 0.0103, 0.0091, 0.0132, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:31:20,659 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6328, 1.7156, 1.5652, 1.4303, 1.6362, 1.7219, 1.5026, 1.8369], device='cuda:0'), covar=tensor([0.0062, 0.0057, 0.0060, 0.0061, 0.0053, 0.0044, 0.0057, 0.0053], device='cuda:0'), in_proj_covar=tensor([0.0062, 0.0058, 0.0057, 0.0062, 0.0060, 0.0055, 0.0054, 0.0052], device='cuda:0'), out_proj_covar=tensor([5.5716e-05, 5.1105e-05, 4.9971e-05, 5.4693e-05, 5.3332e-05, 4.7837e-05, 4.8113e-05, 4.5824e-05], device='cuda:0') 2022-11-16 06:31:21,163 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.717e+01 1.518e+02 1.854e+02 2.184e+02 7.084e+02, threshold=3.708e+02, percent-clipped=2.0 2022-11-16 06:31:38,931 INFO [train.py:876] (0/4) Epoch 13, batch 1300, loss[loss=0.1063, simple_loss=0.1384, pruned_loss=0.03709, over 5542.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1356, pruned_loss=0.03743, over 1084604.08 frames. ], batch size: 15, lr: 6.28e-03, grad_scale: 8.0 2022-11-16 06:32:19,677 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.84 vs. limit=2.0 2022-11-16 06:32:26,912 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.16 vs. limit=5.0 2022-11-16 06:32:28,429 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.235e+01 1.439e+02 1.728e+02 2.189e+02 4.268e+02, threshold=3.455e+02, percent-clipped=2.0 2022-11-16 06:32:29,235 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1068, 2.0855, 2.0374, 1.8687, 1.8541, 1.5343, 2.4028, 1.8749], device='cuda:0'), covar=tensor([0.0073, 0.0039, 0.0070, 0.0058, 0.0072, 0.0134, 0.0037, 0.0046], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0028, 0.0028, 0.0037, 0.0032, 0.0029, 0.0037, 0.0034], device='cuda:0'), out_proj_covar=tensor([2.7976e-05, 2.6462e-05, 2.5399e-05, 3.5335e-05, 2.9469e-05, 2.7629e-05, 3.5336e-05, 3.3028e-05], device='cuda:0') 2022-11-16 06:32:33,695 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=88647.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:32:41,767 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8040, 2.9652, 3.0321, 2.8190, 2.9793, 2.8341, 1.1877, 3.0542], device='cuda:0'), covar=tensor([0.0323, 0.0315, 0.0302, 0.0317, 0.0325, 0.0417, 0.3041, 0.0383], device='cuda:0'), in_proj_covar=tensor([0.0107, 0.0090, 0.0090, 0.0085, 0.0105, 0.0092, 0.0133, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:32:45,383 INFO [train.py:876] (0/4) Epoch 13, batch 1400, loss[loss=0.1382, simple_loss=0.1559, pruned_loss=0.06027, over 5166.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1373, pruned_loss=0.03861, over 1080332.22 frames. ], batch size: 91, lr: 6.28e-03, grad_scale: 8.0 2022-11-16 06:33:34,910 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.011e+02 1.374e+02 1.560e+02 2.014e+02 3.886e+02, threshold=3.121e+02, percent-clipped=4.0 2022-11-16 06:33:41,079 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6460, 3.5847, 3.5799, 3.3147, 2.0393, 3.6117, 2.2573, 3.0637], device='cuda:0'), covar=tensor([0.0403, 0.0286, 0.0208, 0.0366, 0.0634, 0.0195, 0.0512, 0.0196], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0181, 0.0186, 0.0208, 0.0196, 0.0184, 0.0192, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 06:33:44,841 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6830, 1.1871, 1.2593, 0.9615, 1.3933, 1.4000, 0.6896, 1.2177], device='cuda:0'), covar=tensor([0.0234, 0.0418, 0.0414, 0.0770, 0.0778, 0.0659, 0.0821, 0.0391], device='cuda:0'), in_proj_covar=tensor([0.0015, 0.0024, 0.0017, 0.0020, 0.0017, 0.0016, 0.0023, 0.0016], device='cuda:0'), out_proj_covar=tensor([8.5750e-05, 1.1996e-04, 9.1520e-05, 1.0491e-04, 9.3637e-05, 8.8085e-05, 1.1438e-04, 8.7042e-05], device='cuda:0') 2022-11-16 06:33:46,719 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88756.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:33:52,513 INFO [train.py:876] (0/4) Epoch 13, batch 1500, loss[loss=0.09676, simple_loss=0.1336, pruned_loss=0.02995, over 5685.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1373, pruned_loss=0.03832, over 1089567.40 frames. ], batch size: 19, lr: 6.27e-03, grad_scale: 8.0 2022-11-16 06:34:19,409 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88804.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:34:39,996 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2022-11-16 06:34:42,724 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.033e+02 1.503e+02 1.931e+02 2.477e+02 5.840e+02, threshold=3.862e+02, percent-clipped=6.0 2022-11-16 06:35:00,127 INFO [train.py:876] (0/4) Epoch 13, batch 1600, loss[loss=0.1525, simple_loss=0.1719, pruned_loss=0.06661, over 5364.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.137, pruned_loss=0.03769, over 1088282.90 frames. ], batch size: 70, lr: 6.27e-03, grad_scale: 8.0 2022-11-16 06:35:05,762 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.62 vs. limit=2.0 2022-11-16 06:35:20,641 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1258, 2.9091, 2.9369, 1.5386, 2.8202, 3.1580, 2.9005, 3.3781], device='cuda:0'), covar=tensor([0.1941, 0.1381, 0.1505, 0.2940, 0.0971, 0.0889, 0.0881, 0.0781], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0183, 0.0168, 0.0182, 0.0182, 0.0201, 0.0169, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:35:49,235 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.383e+02 1.790e+02 2.013e+02 5.184e+02, threshold=3.580e+02, percent-clipped=2.0 2022-11-16 06:35:55,095 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=88947.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 06:35:59,273 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9614, 1.6435, 1.9377, 1.5383, 1.8729, 1.9151, 1.6875, 1.5639], device='cuda:0'), covar=tensor([0.0046, 0.0059, 0.0039, 0.0055, 0.0109, 0.0097, 0.0046, 0.0054], device='cuda:0'), in_proj_covar=tensor([0.0030, 0.0027, 0.0028, 0.0036, 0.0031, 0.0028, 0.0036, 0.0034], device='cuda:0'), out_proj_covar=tensor([2.7598e-05, 2.5623e-05, 2.5258e-05, 3.4475e-05, 2.8825e-05, 2.7064e-05, 3.4241e-05, 3.2279e-05], device='cuda:0') 2022-11-16 06:36:07,118 INFO [train.py:876] (0/4) Epoch 13, batch 1700, loss[loss=0.1147, simple_loss=0.1388, pruned_loss=0.04526, over 5018.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1367, pruned_loss=0.03786, over 1084999.95 frames. ], batch size: 110, lr: 6.27e-03, grad_scale: 8.0 2022-11-16 06:36:18,301 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.79 vs. limit=5.0 2022-11-16 06:36:18,783 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9900, 2.5457, 2.9766, 1.8856, 1.6787, 3.5735, 2.8126, 2.4547], device='cuda:0'), covar=tensor([0.0713, 0.1037, 0.0657, 0.2498, 0.2861, 0.2180, 0.0828, 0.1179], device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0100, 0.0100, 0.0102, 0.0077, 0.0070, 0.0081, 0.0093], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 06:36:26,872 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=88995.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:36:49,155 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89027.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:36:55,567 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.34 vs. limit=5.0 2022-11-16 06:36:56,972 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.337e+01 1.392e+02 1.730e+02 2.257e+02 5.092e+02, threshold=3.461e+02, percent-clipped=3.0 2022-11-16 06:37:01,296 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1152, 3.4464, 2.9038, 3.4368, 3.4605, 3.0772, 3.1512, 3.1681], device='cuda:0'), covar=tensor([0.1280, 0.0735, 0.2212, 0.0711, 0.0723, 0.0683, 0.0855, 0.0790], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0178, 0.0274, 0.0175, 0.0222, 0.0174, 0.0190, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:37:15,267 INFO [train.py:876] (0/4) Epoch 13, batch 1800, loss[loss=0.1358, simple_loss=0.1625, pruned_loss=0.05453, over 5708.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1359, pruned_loss=0.03726, over 1087402.00 frames. ], batch size: 28, lr: 6.26e-03, grad_scale: 8.0 2022-11-16 06:37:17,108 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2022-11-16 06:37:30,531 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89088.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:37:36,069 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7423, 1.0471, 1.0006, 0.7674, 0.7727, 0.9604, 0.8566, 0.7518], device='cuda:0'), covar=tensor([0.0037, 0.0023, 0.0027, 0.0030, 0.0037, 0.0031, 0.0033, 0.0052], device='cuda:0'), in_proj_covar=tensor([0.0030, 0.0028, 0.0028, 0.0036, 0.0032, 0.0028, 0.0036, 0.0034], device='cuda:0'), out_proj_covar=tensor([2.7998e-05, 2.5779e-05, 2.5532e-05, 3.4884e-05, 2.9371e-05, 2.7341e-05, 3.4413e-05, 3.2608e-05], device='cuda:0') 2022-11-16 06:37:39,964 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6380, 2.3424, 3.2070, 2.8772, 3.1248, 2.3288, 2.9936, 3.5799], device='cuda:0'), covar=tensor([0.0585, 0.1284, 0.0888, 0.1395, 0.0893, 0.1497, 0.1123, 0.0745], device='cuda:0'), in_proj_covar=tensor([0.0240, 0.0191, 0.0214, 0.0207, 0.0239, 0.0195, 0.0222, 0.0226], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:37:54,575 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8394, 2.8495, 2.9855, 2.7071, 2.9682, 2.8399, 1.3351, 3.0140], device='cuda:0'), covar=tensor([0.0284, 0.0344, 0.0306, 0.0342, 0.0303, 0.0357, 0.2599, 0.0307], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0088, 0.0088, 0.0083, 0.0102, 0.0090, 0.0130, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:38:04,922 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.586e+01 1.379e+02 1.721e+02 2.183e+02 4.295e+02, threshold=3.442e+02, percent-clipped=5.0 2022-11-16 06:38:23,036 INFO [train.py:876] (0/4) Epoch 13, batch 1900, loss[loss=0.09487, simple_loss=0.1276, pruned_loss=0.03104, over 5721.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.136, pruned_loss=0.03768, over 1081148.90 frames. ], batch size: 17, lr: 6.26e-03, grad_scale: 8.0 2022-11-16 06:38:25,958 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89169.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:39:06,965 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89230.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:39:12,531 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 1.392e+02 1.772e+02 2.206e+02 3.328e+02, threshold=3.543e+02, percent-clipped=0.0 2022-11-16 06:39:29,336 INFO [train.py:876] (0/4) Epoch 13, batch 2000, loss[loss=0.09752, simple_loss=0.1349, pruned_loss=0.03009, over 5735.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1342, pruned_loss=0.03606, over 1087688.68 frames. ], batch size: 20, lr: 6.26e-03, grad_scale: 8.0 2022-11-16 06:39:40,894 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5270, 1.8556, 1.5204, 1.1640, 1.6425, 2.0200, 1.8068, 2.0120], device='cuda:0'), covar=tensor([0.1557, 0.1295, 0.1883, 0.2550, 0.1238, 0.1075, 0.0935, 0.1186], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0184, 0.0170, 0.0184, 0.0182, 0.0203, 0.0170, 0.0184], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:40:20,292 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.714e+01 1.486e+02 1.827e+02 2.274e+02 3.584e+02, threshold=3.655e+02, percent-clipped=1.0 2022-11-16 06:40:37,213 INFO [train.py:876] (0/4) Epoch 13, batch 2100, loss[loss=0.09394, simple_loss=0.1367, pruned_loss=0.0256, over 5549.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1352, pruned_loss=0.03692, over 1086096.95 frames. ], batch size: 21, lr: 6.25e-03, grad_scale: 8.0 2022-11-16 06:40:43,512 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89374.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:40:49,278 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89383.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:41:17,725 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89425.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:41:19,020 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3851, 4.5881, 3.0599, 4.2407, 3.6449, 3.0468, 2.5879, 3.8430], device='cuda:0'), covar=tensor([0.1764, 0.0246, 0.1239, 0.0479, 0.0564, 0.1040, 0.1935, 0.0416], device='cuda:0'), in_proj_covar=tensor([0.0155, 0.0143, 0.0156, 0.0148, 0.0173, 0.0169, 0.0157, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:41:24,436 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89435.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:41:26,555 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.69 vs. limit=2.0 2022-11-16 06:41:26,712 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.88 vs. limit=2.0 2022-11-16 06:41:26,874 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.665e+01 1.565e+02 1.857e+02 2.395e+02 6.396e+02, threshold=3.713e+02, percent-clipped=5.0 2022-11-16 06:41:27,071 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89439.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:41:31,531 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7095, 2.1783, 2.6326, 3.6962, 3.6287, 2.6584, 2.5694, 3.6571], device='cuda:0'), covar=tensor([0.0867, 0.3062, 0.2281, 0.1792, 0.1190, 0.2950, 0.2013, 0.0860], device='cuda:0'), in_proj_covar=tensor([0.0257, 0.0197, 0.0189, 0.0297, 0.0224, 0.0201, 0.0189, 0.0249], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 06:41:41,599 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3886, 2.9366, 3.4179, 1.5338, 3.1726, 3.6375, 3.4089, 3.7236], device='cuda:0'), covar=tensor([0.2041, 0.1622, 0.0952, 0.3432, 0.0462, 0.0672, 0.0542, 0.0845], device='cuda:0'), in_proj_covar=tensor([0.0165, 0.0185, 0.0169, 0.0184, 0.0183, 0.0203, 0.0169, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:41:44,654 INFO [train.py:876] (0/4) Epoch 13, batch 2200, loss[loss=0.07427, simple_loss=0.1007, pruned_loss=0.02393, over 5527.00 frames. ], tot_loss[loss=0.105, simple_loss=0.1356, pruned_loss=0.03721, over 1084634.10 frames. ], batch size: 10, lr: 6.25e-03, grad_scale: 8.0 2022-11-16 06:41:58,281 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89486.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:42:07,608 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89500.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:42:14,370 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89509.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:42:24,620 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89525.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:42:33,578 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.106e+01 1.371e+02 1.691e+02 2.068e+02 3.234e+02, threshold=3.383e+02, percent-clipped=0.0 2022-11-16 06:42:41,285 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6206, 4.8754, 4.5887, 4.3508, 4.8414, 4.6342, 2.0794, 4.8675], device='cuda:0'), covar=tensor([0.0254, 0.0281, 0.0260, 0.0320, 0.0229, 0.0393, 0.2970, 0.0261], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0089, 0.0088, 0.0083, 0.0102, 0.0090, 0.0130, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:42:51,730 INFO [train.py:876] (0/4) Epoch 13, batch 2300, loss[loss=0.0762, simple_loss=0.1211, pruned_loss=0.01563, over 5763.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1365, pruned_loss=0.03836, over 1086643.80 frames. ], batch size: 14, lr: 6.25e-03, grad_scale: 8.0 2022-11-16 06:42:55,271 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89570.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:43:09,549 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-16 06:43:41,343 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 1.502e+02 1.727e+02 2.123e+02 1.355e+03, threshold=3.453e+02, percent-clipped=6.0 2022-11-16 06:44:00,256 INFO [train.py:876] (0/4) Epoch 13, batch 2400, loss[loss=0.06706, simple_loss=0.1103, pruned_loss=0.01189, over 5561.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1374, pruned_loss=0.03829, over 1083490.56 frames. ], batch size: 13, lr: 6.24e-03, grad_scale: 8.0 2022-11-16 06:44:12,322 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89680.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:44:14,486 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=89683.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:44:44,756 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8362, 2.3373, 2.2872, 1.3790, 2.7003, 2.7064, 2.5790, 2.6543], device='cuda:0'), covar=tensor([0.2391, 0.2060, 0.1646, 0.3416, 0.0919, 0.1466, 0.0727, 0.1502], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0183, 0.0170, 0.0185, 0.0183, 0.0204, 0.0169, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:44:47,031 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89730.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:44:47,657 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=89731.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:44:52,830 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.552e+02 1.853e+02 2.424e+02 4.958e+02, threshold=3.705e+02, percent-clipped=7.0 2022-11-16 06:44:54,407 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89741.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:45:00,992 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.28 vs. limit=2.0 2022-11-16 06:45:09,831 INFO [train.py:876] (0/4) Epoch 13, batch 2500, loss[loss=0.1209, simple_loss=0.1545, pruned_loss=0.04369, over 5773.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.136, pruned_loss=0.03719, over 1085612.81 frames. ], batch size: 21, lr: 6.24e-03, grad_scale: 8.0 2022-11-16 06:45:21,168 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89781.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:45:30,642 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89795.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:45:46,818 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.60 vs. limit=2.0 2022-11-16 06:45:50,791 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=89825.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:46:00,865 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.702e+01 1.449e+02 1.693e+02 2.128e+02 5.529e+02, threshold=3.385e+02, percent-clipped=3.0 2022-11-16 06:46:06,941 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3132, 2.3680, 3.0176, 2.7790, 2.8485, 2.3399, 2.8734, 3.2644], device='cuda:0'), covar=tensor([0.0746, 0.1210, 0.0742, 0.1245, 0.0933, 0.1346, 0.1010, 0.0775], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0195, 0.0216, 0.0212, 0.0241, 0.0199, 0.0228, 0.0231], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:46:17,694 INFO [train.py:876] (0/4) Epoch 13, batch 2600, loss[loss=0.1126, simple_loss=0.1436, pruned_loss=0.04075, over 5466.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1356, pruned_loss=0.03695, over 1084419.01 frames. ], batch size: 53, lr: 6.24e-03, grad_scale: 8.0 2022-11-16 06:46:17,778 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=89865.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:46:17,859 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=89865.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:46:23,247 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=89873.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:46:35,979 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3983, 2.3052, 3.0758, 2.7732, 2.8516, 2.3351, 2.9372, 3.3668], device='cuda:0'), covar=tensor([0.0763, 0.1389, 0.1004, 0.1364, 0.1186, 0.1381, 0.1204, 0.0919], device='cuda:0'), in_proj_covar=tensor([0.0242, 0.0194, 0.0215, 0.0211, 0.0240, 0.0197, 0.0226, 0.0230], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:46:58,591 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=89926.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 06:47:07,229 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.464e+01 1.385e+02 1.759e+02 2.199e+02 3.359e+02, threshold=3.518e+02, percent-clipped=0.0 2022-11-16 06:47:23,182 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3874, 3.0708, 3.2733, 1.5139, 3.1089, 3.4641, 3.4066, 3.7360], device='cuda:0'), covar=tensor([0.1952, 0.1659, 0.0812, 0.3282, 0.0812, 0.0813, 0.0743, 0.0932], device='cuda:0'), in_proj_covar=tensor([0.0168, 0.0183, 0.0172, 0.0185, 0.0184, 0.0204, 0.0170, 0.0184], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:47:24,942 INFO [train.py:876] (0/4) Epoch 13, batch 2700, loss[loss=0.1021, simple_loss=0.1574, pruned_loss=0.02342, over 5105.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1356, pruned_loss=0.03683, over 1085082.13 frames. ], batch size: 7, lr: 6.23e-03, grad_scale: 16.0 2022-11-16 06:47:48,639 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-90000.pt 2022-11-16 06:48:11,426 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5471, 4.0819, 4.3594, 4.1132, 4.6146, 4.4067, 4.1470, 4.5739], device='cuda:0'), covar=tensor([0.0405, 0.0443, 0.0480, 0.0341, 0.0398, 0.0261, 0.0322, 0.0318], device='cuda:0'), in_proj_covar=tensor([0.0149, 0.0157, 0.0113, 0.0147, 0.0186, 0.0112, 0.0131, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 06:48:12,089 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90030.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:48:15,890 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90036.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:48:17,698 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.688e+01 1.446e+02 1.718e+02 2.130e+02 5.119e+02, threshold=3.437e+02, percent-clipped=5.0 2022-11-16 06:48:35,916 INFO [train.py:876] (0/4) Epoch 13, batch 2800, loss[loss=0.1064, simple_loss=0.1408, pruned_loss=0.03602, over 5692.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.135, pruned_loss=0.03659, over 1084190.93 frames. ], batch size: 19, lr: 6.23e-03, grad_scale: 16.0 2022-11-16 06:48:44,255 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90078.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:48:46,086 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.42 vs. limit=5.0 2022-11-16 06:48:46,333 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90081.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:48:55,330 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90095.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 06:49:18,682 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90129.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:49:19,458 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90130.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:49:25,057 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.858e+01 1.324e+02 1.624e+02 2.114e+02 4.134e+02, threshold=3.247e+02, percent-clipped=3.0 2022-11-16 06:49:27,794 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90143.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 06:49:28,137 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.87 vs. limit=2.0 2022-11-16 06:49:43,063 INFO [train.py:876] (0/4) Epoch 13, batch 2900, loss[loss=0.06094, simple_loss=0.1064, pruned_loss=0.007739, over 5136.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1359, pruned_loss=0.03713, over 1081040.53 frames. ], batch size: 7, lr: 6.23e-03, grad_scale: 16.0 2022-11-16 06:49:43,166 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90165.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:50:00,520 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90191.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:50:16,487 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90213.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:50:21,686 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90221.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 06:50:28,901 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90232.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:50:33,300 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 1.381e+02 1.773e+02 2.128e+02 3.504e+02, threshold=3.546e+02, percent-clipped=3.0 2022-11-16 06:50:49,075 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8499, 4.4265, 4.6574, 4.4267, 4.9337, 4.7912, 4.3347, 4.8744], device='cuda:0'), covar=tensor([0.0369, 0.0387, 0.0441, 0.0288, 0.0331, 0.0225, 0.0335, 0.0306], device='cuda:0'), in_proj_covar=tensor([0.0147, 0.0155, 0.0111, 0.0145, 0.0183, 0.0112, 0.0129, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 06:50:51,297 INFO [train.py:876] (0/4) Epoch 13, batch 3000, loss[loss=0.1767, simple_loss=0.1684, pruned_loss=0.09247, over 3084.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1347, pruned_loss=0.03679, over 1079986.53 frames. ], batch size: 284, lr: 6.22e-03, grad_scale: 16.0 2022-11-16 06:50:51,298 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 06:50:56,582 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.2562, 4.9059, 5.2191, 4.8878, 4.7038, 4.8544, 5.4330, 5.3400], device='cuda:0'), covar=tensor([0.0307, 0.0627, 0.0200, 0.1140, 0.0280, 0.0171, 0.0469, 0.0196], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0107, 0.0095, 0.0121, 0.0089, 0.0080, 0.0147, 0.0103], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:51:08,998 INFO [train.py:908] (0/4) Epoch 13, validation: loss=0.1737, simple_loss=0.1855, pruned_loss=0.08091, over 1530663.00 frames. 2022-11-16 06:51:08,999 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 06:51:11,753 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.0458, 4.5386, 4.7618, 4.5682, 5.0615, 4.8215, 4.3919, 5.0047], device='cuda:0'), covar=tensor([0.0304, 0.0365, 0.0440, 0.0279, 0.0333, 0.0235, 0.0298, 0.0260], device='cuda:0'), in_proj_covar=tensor([0.0148, 0.0156, 0.0112, 0.0146, 0.0183, 0.0112, 0.0129, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 06:51:15,060 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7478, 4.2385, 3.9691, 4.2973, 4.3325, 3.6805, 3.9438, 3.7837], device='cuda:0'), covar=tensor([0.0578, 0.0433, 0.1115, 0.0356, 0.0320, 0.0425, 0.0552, 0.0507], device='cuda:0'), in_proj_covar=tensor([0.0133, 0.0179, 0.0277, 0.0176, 0.0222, 0.0176, 0.0191, 0.0177], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:51:27,450 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90293.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:51:53,024 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6919, 1.6007, 1.7141, 1.3157, 1.6804, 1.7460, 1.4791, 1.1238], device='cuda:0'), covar=tensor([0.0053, 0.0065, 0.0081, 0.0078, 0.0068, 0.0050, 0.0045, 0.0061], device='cuda:0'), in_proj_covar=tensor([0.0030, 0.0026, 0.0027, 0.0035, 0.0031, 0.0028, 0.0034, 0.0033], device='cuda:0'), out_proj_covar=tensor([2.7341e-05, 2.4882e-05, 2.4641e-05, 3.3688e-05, 2.8487e-05, 2.6582e-05, 3.2582e-05, 3.1574e-05], device='cuda:0') 2022-11-16 06:51:56,871 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90336.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:51:58,707 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.650e+01 1.477e+02 1.762e+02 2.223e+02 4.727e+02, threshold=3.524e+02, percent-clipped=4.0 2022-11-16 06:52:06,689 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90351.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:52:16,335 INFO [train.py:876] (0/4) Epoch 13, batch 3100, loss[loss=0.09768, simple_loss=0.1284, pruned_loss=0.03346, over 5560.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1349, pruned_loss=0.03666, over 1078856.18 frames. ], batch size: 13, lr: 6.22e-03, grad_scale: 16.0 2022-11-16 06:52:29,310 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90384.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:52:47,892 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90412.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:53:03,809 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2025, 5.0360, 3.8075, 2.1984, 4.6695, 2.0020, 4.6985, 2.7595], device='cuda:0'), covar=tensor([0.1112, 0.0135, 0.0464, 0.2009, 0.0163, 0.1802, 0.0221, 0.1521], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0103, 0.0115, 0.0111, 0.0103, 0.0118, 0.0100, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:53:06,644 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.938e+01 1.382e+02 1.732e+02 2.119e+02 3.320e+02, threshold=3.464e+02, percent-clipped=0.0 2022-11-16 06:53:19,976 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90459.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:53:23,666 INFO [train.py:876] (0/4) Epoch 13, batch 3200, loss[loss=0.1469, simple_loss=0.1649, pruned_loss=0.06444, over 5578.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.135, pruned_loss=0.03698, over 1076021.83 frames. ], batch size: 43, lr: 6.22e-03, grad_scale: 16.0 2022-11-16 06:53:25,669 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5188, 4.3174, 4.5434, 4.4433, 4.1668, 3.8368, 4.9601, 4.5314], device='cuda:0'), covar=tensor([0.0453, 0.0981, 0.0465, 0.1382, 0.0493, 0.0432, 0.0718, 0.0566], device='cuda:0'), in_proj_covar=tensor([0.0086, 0.0106, 0.0094, 0.0120, 0.0088, 0.0079, 0.0145, 0.0101], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 06:53:28,995 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7061, 1.1897, 0.7600, 0.8940, 1.0122, 0.9169, 0.4617, 1.1158], device='cuda:0'), covar=tensor([0.0102, 0.0050, 0.0085, 0.0053, 0.0066, 0.0074, 0.0120, 0.0064], device='cuda:0'), in_proj_covar=tensor([0.0063, 0.0059, 0.0059, 0.0063, 0.0061, 0.0057, 0.0056, 0.0053], device='cuda:0'), out_proj_covar=tensor([5.6624e-05, 5.1854e-05, 5.1563e-05, 5.5964e-05, 5.4264e-05, 4.9833e-05, 5.0045e-05, 4.6723e-05], device='cuda:0') 2022-11-16 06:53:33,832 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2022-11-16 06:53:38,082 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90486.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:53:44,976 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90496.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:53:45,611 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1224, 0.9142, 1.0316, 0.8572, 1.1214, 0.9556, 0.5938, 0.7640], device='cuda:0'), covar=tensor([0.0244, 0.0446, 0.0318, 0.0429, 0.0327, 0.0298, 0.0823, 0.0398], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0026, 0.0018, 0.0022, 0.0018, 0.0017, 0.0024, 0.0017], device='cuda:0'), out_proj_covar=tensor([9.0706e-05, 1.2796e-04, 9.6738e-05, 1.1123e-04, 9.9414e-05, 9.2293e-05, 1.2089e-04, 9.2709e-05], device='cuda:0') 2022-11-16 06:54:00,526 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90520.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:54:01,091 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90521.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 06:54:09,679 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.94 vs. limit=5.0 2022-11-16 06:54:13,667 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.437e+02 1.897e+02 2.279e+02 5.045e+02, threshold=3.794e+02, percent-clipped=5.0 2022-11-16 06:54:25,888 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90557.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:54:30,905 INFO [train.py:876] (0/4) Epoch 13, batch 3300, loss[loss=0.1189, simple_loss=0.1403, pruned_loss=0.04879, over 4605.00 frames. ], tot_loss[loss=0.1049, simple_loss=0.1356, pruned_loss=0.03714, over 1082664.79 frames. ], batch size: 135, lr: 6.21e-03, grad_scale: 16.0 2022-11-16 06:54:33,619 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90569.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:54:46,731 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90588.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:55:21,064 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 1.383e+02 1.673e+02 2.134e+02 3.431e+02, threshold=3.345e+02, percent-clipped=0.0 2022-11-16 06:55:22,605 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2022-11-16 06:55:35,389 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.60 vs. limit=2.0 2022-11-16 06:55:38,732 INFO [train.py:876] (0/4) Epoch 13, batch 3400, loss[loss=0.14, simple_loss=0.1547, pruned_loss=0.06271, over 5540.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1355, pruned_loss=0.03741, over 1083321.78 frames. ], batch size: 46, lr: 6.21e-03, grad_scale: 16.0 2022-11-16 06:56:07,881 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90707.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:56:29,389 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.201e+01 1.464e+02 1.802e+02 2.100e+02 5.077e+02, threshold=3.604e+02, percent-clipped=5.0 2022-11-16 06:56:47,179 INFO [train.py:876] (0/4) Epoch 13, batch 3500, loss[loss=0.09526, simple_loss=0.1469, pruned_loss=0.0218, over 5555.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.136, pruned_loss=0.03776, over 1088289.14 frames. ], batch size: 25, lr: 6.21e-03, grad_scale: 16.0 2022-11-16 06:57:00,806 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90786.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:57:05,638 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.82 vs. limit=5.0 2022-11-16 06:57:21,074 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90815.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:57:33,464 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90834.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:57:35,592 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5829, 1.2466, 1.2505, 1.1341, 1.4766, 1.7014, 0.9587, 1.1715], device='cuda:0'), covar=tensor([0.0292, 0.0512, 0.0402, 0.0636, 0.0420, 0.0441, 0.0691, 0.0585], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0025, 0.0018, 0.0021, 0.0018, 0.0016, 0.0024, 0.0017], device='cuda:0'), out_proj_covar=tensor([9.0031e-05, 1.2574e-04, 9.5476e-05, 1.0891e-04, 9.7324e-05, 9.1081e-05, 1.1844e-04, 9.1347e-05], device='cuda:0') 2022-11-16 06:57:36,721 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 1.483e+02 1.758e+02 2.114e+02 3.884e+02, threshold=3.515e+02, percent-clipped=1.0 2022-11-16 06:57:43,829 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=90849.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:57:46,113 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=90852.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:57:54,730 INFO [train.py:876] (0/4) Epoch 13, batch 3600, loss[loss=0.08073, simple_loss=0.1151, pruned_loss=0.02319, over 5579.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1356, pruned_loss=0.0373, over 1088995.00 frames. ], batch size: 18, lr: 6.20e-03, grad_scale: 16.0 2022-11-16 06:58:08,761 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6175, 2.3073, 2.8531, 2.0935, 1.4877, 3.2968, 2.7086, 2.4173], device='cuda:0'), covar=tensor([0.1146, 0.1539, 0.0844, 0.2492, 0.3428, 0.0605, 0.1106, 0.1443], device='cuda:0'), in_proj_covar=tensor([0.0111, 0.0103, 0.0101, 0.0104, 0.0078, 0.0072, 0.0081, 0.0094], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 06:58:10,069 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=90888.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:58:25,598 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=90910.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:58:42,747 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=90936.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:58:44,664 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.811e+01 1.461e+02 1.838e+02 2.236e+02 5.014e+02, threshold=3.676e+02, percent-clipped=2.0 2022-11-16 06:59:02,754 INFO [train.py:876] (0/4) Epoch 13, batch 3700, loss[loss=0.1162, simple_loss=0.1498, pruned_loss=0.0413, over 5703.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.135, pruned_loss=0.03696, over 1088352.18 frames. ], batch size: 34, lr: 6.20e-03, grad_scale: 16.0 2022-11-16 06:59:30,590 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91007.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 06:59:35,136 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6354, 4.3847, 3.3699, 2.1669, 4.1320, 1.8366, 4.2077, 2.5748], device='cuda:0'), covar=tensor([0.1463, 0.0149, 0.0609, 0.1941, 0.0242, 0.1828, 0.0205, 0.1641], device='cuda:0'), in_proj_covar=tensor([0.0121, 0.0104, 0.0116, 0.0112, 0.0104, 0.0120, 0.0101, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 06:59:52,312 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.488e+01 1.377e+02 1.670e+02 2.030e+02 3.964e+02, threshold=3.341e+02, percent-clipped=2.0 2022-11-16 07:00:03,056 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91055.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:00:09,530 INFO [train.py:876] (0/4) Epoch 13, batch 3800, loss[loss=0.08337, simple_loss=0.125, pruned_loss=0.02088, over 5525.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1339, pruned_loss=0.03582, over 1089117.50 frames. ], batch size: 13, lr: 6.19e-03, grad_scale: 16.0 2022-11-16 07:00:43,435 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91115.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:00:59,665 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 1.416e+02 1.762e+02 2.192e+02 4.990e+02, threshold=3.525e+02, percent-clipped=3.0 2022-11-16 07:01:08,644 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91152.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:01:08,708 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3759, 3.0662, 3.6282, 1.8383, 3.3231, 3.7141, 3.5736, 3.9104], device='cuda:0'), covar=tensor([0.1954, 0.1690, 0.0678, 0.3030, 0.0760, 0.0681, 0.0732, 0.0805], device='cuda:0'), in_proj_covar=tensor([0.0165, 0.0181, 0.0169, 0.0185, 0.0182, 0.0201, 0.0168, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:01:15,758 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91163.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:01:17,058 INFO [train.py:876] (0/4) Epoch 13, batch 3900, loss[loss=0.08945, simple_loss=0.1293, pruned_loss=0.02481, over 5560.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1344, pruned_loss=0.03635, over 1085017.44 frames. ], batch size: 25, lr: 6.19e-03, grad_scale: 16.0 2022-11-16 07:01:25,048 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3941, 2.3306, 2.3538, 2.3702, 2.3943, 2.2143, 2.6633, 2.4738], device='cuda:0'), covar=tensor([0.0691, 0.0989, 0.0732, 0.1395, 0.0781, 0.0612, 0.1023, 0.1057], device='cuda:0'), in_proj_covar=tensor([0.0087, 0.0108, 0.0096, 0.0123, 0.0090, 0.0081, 0.0147, 0.0104], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:01:25,343 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.77 vs. limit=5.0 2022-11-16 07:01:41,633 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91200.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:01:43,215 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-16 07:01:44,964 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=91205.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:01:56,171 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1133, 3.0608, 2.7376, 3.0172, 3.0843, 2.7093, 2.6618, 2.8066], device='cuda:0'), covar=tensor([0.0298, 0.0611, 0.1396, 0.0537, 0.0521, 0.0555, 0.1025, 0.0695], device='cuda:0'), in_proj_covar=tensor([0.0129, 0.0179, 0.0270, 0.0173, 0.0219, 0.0172, 0.0187, 0.0175], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 07:02:07,454 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.326e+01 1.380e+02 1.738e+02 2.230e+02 3.262e+02, threshold=3.475e+02, percent-clipped=0.0 2022-11-16 07:02:25,432 INFO [train.py:876] (0/4) Epoch 13, batch 4000, loss[loss=0.07467, simple_loss=0.1183, pruned_loss=0.01554, over 5544.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1337, pruned_loss=0.0359, over 1080833.96 frames. ], batch size: 14, lr: 6.19e-03, grad_scale: 16.0 2022-11-16 07:02:38,645 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91285.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:03:15,061 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.467e+01 1.389e+02 1.729e+02 2.051e+02 4.497e+02, threshold=3.458e+02, percent-clipped=2.0 2022-11-16 07:03:19,973 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=91346.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:03:33,555 INFO [train.py:876] (0/4) Epoch 13, batch 4100, loss[loss=0.1102, simple_loss=0.1398, pruned_loss=0.04028, over 5527.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1343, pruned_loss=0.03668, over 1079415.50 frames. ], batch size: 40, lr: 6.18e-03, grad_scale: 16.0 2022-11-16 07:03:41,465 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8910, 2.5153, 2.0623, 1.5000, 2.6917, 2.5162, 2.4298, 2.6876], device='cuda:0'), covar=tensor([0.1575, 0.1517, 0.1750, 0.2709, 0.0725, 0.1100, 0.0659, 0.1163], device='cuda:0'), in_proj_covar=tensor([0.0164, 0.0181, 0.0169, 0.0184, 0.0182, 0.0200, 0.0168, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:04:23,286 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.983e+01 1.379e+02 1.736e+02 2.115e+02 4.817e+02, threshold=3.473e+02, percent-clipped=3.0 2022-11-16 07:04:40,828 INFO [train.py:876] (0/4) Epoch 13, batch 4200, loss[loss=0.1425, simple_loss=0.1513, pruned_loss=0.0668, over 4111.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1341, pruned_loss=0.03608, over 1084805.01 frames. ], batch size: 181, lr: 6.18e-03, grad_scale: 16.0 2022-11-16 07:04:54,194 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3701, 2.4824, 2.1876, 2.4180, 2.0469, 1.8506, 2.2455, 2.8573], device='cuda:0'), covar=tensor([0.1259, 0.1620, 0.1896, 0.1743, 0.1702, 0.1897, 0.1554, 0.0740], device='cuda:0'), in_proj_covar=tensor([0.0116, 0.0111, 0.0108, 0.0109, 0.0096, 0.0107, 0.0101, 0.0085], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 07:05:07,947 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91505.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:05:31,428 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.444e+01 1.459e+02 1.798e+02 2.244e+02 4.763e+02, threshold=3.595e+02, percent-clipped=2.0 2022-11-16 07:05:33,640 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91542.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:05:40,833 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91553.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:05:48,487 INFO [train.py:876] (0/4) Epoch 13, batch 4300, loss[loss=0.1212, simple_loss=0.1533, pruned_loss=0.04456, over 5496.00 frames. ], tot_loss[loss=0.1043, simple_loss=0.1347, pruned_loss=0.0369, over 1078429.22 frames. ], batch size: 49, lr: 6.18e-03, grad_scale: 16.0 2022-11-16 07:05:53,089 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.1570, 5.1297, 4.9697, 4.4605, 5.1421, 4.7138, 2.5096, 5.2366], device='cuda:0'), covar=tensor([0.0228, 0.0241, 0.0246, 0.0278, 0.0218, 0.0377, 0.2599, 0.0311], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0090, 0.0089, 0.0083, 0.0102, 0.0091, 0.0132, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:06:11,576 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5532, 3.3907, 3.1075, 3.3583, 3.4103, 4.0365, 4.3808, 4.0411], device='cuda:0'), covar=tensor([0.0638, 0.1092, 0.1430, 0.1503, 0.1112, 0.0512, 0.0502, 0.1113], device='cuda:0'), in_proj_covar=tensor([0.0115, 0.0109, 0.0106, 0.0108, 0.0095, 0.0105, 0.0100, 0.0084], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 07:06:15,242 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=91603.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 07:06:19,687 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8716, 4.0989, 3.8785, 3.6875, 4.0333, 3.8200, 1.6897, 4.0948], device='cuda:0'), covar=tensor([0.0460, 0.0444, 0.0495, 0.0556, 0.0408, 0.0469, 0.3997, 0.0584], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0090, 0.0089, 0.0083, 0.0102, 0.0091, 0.0132, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:06:20,712 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.01 vs. limit=2.0 2022-11-16 07:06:39,670 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.763e+01 1.435e+02 1.703e+02 2.070e+02 3.900e+02, threshold=3.406e+02, percent-clipped=1.0 2022-11-16 07:06:41,064 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=91641.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:06:56,954 INFO [train.py:876] (0/4) Epoch 13, batch 4400, loss[loss=0.07249, simple_loss=0.1207, pruned_loss=0.01212, over 5546.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1332, pruned_loss=0.03478, over 1082896.32 frames. ], batch size: 15, lr: 6.17e-03, grad_scale: 16.0 2022-11-16 07:07:46,911 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.810e+01 1.451e+02 1.858e+02 2.310e+02 4.864e+02, threshold=3.715e+02, percent-clipped=3.0 2022-11-16 07:08:04,751 INFO [train.py:876] (0/4) Epoch 13, batch 4500, loss[loss=0.1048, simple_loss=0.1382, pruned_loss=0.03569, over 5756.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1329, pruned_loss=0.03512, over 1082412.79 frames. ], batch size: 20, lr: 6.17e-03, grad_scale: 16.0 2022-11-16 07:08:07,241 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6343, 1.7814, 2.1661, 1.7037, 1.3254, 2.5576, 2.1137, 1.7960], device='cuda:0'), covar=tensor([0.1528, 0.1692, 0.1426, 0.2718, 0.3083, 0.0796, 0.2041, 0.1849], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0102, 0.0101, 0.0104, 0.0077, 0.0073, 0.0082, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 07:08:35,031 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4262, 5.5408, 4.1399, 2.6134, 5.2240, 2.5164, 5.2642, 3.1257], device='cuda:0'), covar=tensor([0.0942, 0.0101, 0.0368, 0.1592, 0.0139, 0.1389, 0.0114, 0.1267], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0104, 0.0115, 0.0111, 0.0102, 0.0119, 0.0100, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:08:53,258 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.15 vs. limit=2.0 2022-11-16 07:08:55,631 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.055e+01 1.330e+02 1.643e+02 2.153e+02 4.136e+02, threshold=3.287e+02, percent-clipped=1.0 2022-11-16 07:09:13,844 INFO [train.py:876] (0/4) Epoch 13, batch 4600, loss[loss=0.07735, simple_loss=0.1172, pruned_loss=0.01877, over 5607.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1346, pruned_loss=0.03592, over 1083360.07 frames. ], batch size: 18, lr: 6.17e-03, grad_scale: 16.0 2022-11-16 07:09:20,155 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7003, 4.7449, 3.7204, 1.9664, 4.4253, 1.7380, 4.3998, 2.7747], device='cuda:0'), covar=tensor([0.1405, 0.0201, 0.0544, 0.2067, 0.0191, 0.1935, 0.0244, 0.1366], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0104, 0.0115, 0.0111, 0.0103, 0.0119, 0.0101, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:09:31,312 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91891.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:09:35,816 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=91898.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 07:10:03,762 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.956e+01 1.427e+02 1.793e+02 2.306e+02 3.919e+02, threshold=3.587e+02, percent-clipped=3.0 2022-11-16 07:10:05,197 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=91941.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:10:10,160 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2022-11-16 07:10:12,665 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=91952.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:10:14,591 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91955.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:10:21,954 INFO [train.py:876] (0/4) Epoch 13, batch 4700, loss[loss=0.06344, simple_loss=0.1028, pruned_loss=0.01203, over 5707.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1343, pruned_loss=0.03591, over 1088769.30 frames. ], batch size: 12, lr: 6.16e-03, grad_scale: 32.0 2022-11-16 07:10:27,461 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0775, 2.4069, 3.6266, 3.1005, 3.9056, 2.3250, 3.3953, 3.9315], device='cuda:0'), covar=tensor([0.0757, 0.1571, 0.0900, 0.1709, 0.0567, 0.1981, 0.1242, 0.0809], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0192, 0.0217, 0.0211, 0.0240, 0.0195, 0.0225, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:10:32,096 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8119, 3.9862, 3.8648, 3.3428, 1.9458, 4.0682, 2.2753, 3.3869], device='cuda:0'), covar=tensor([0.0443, 0.0204, 0.0182, 0.0410, 0.0771, 0.0169, 0.0644, 0.0194], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0184, 0.0184, 0.0209, 0.0197, 0.0185, 0.0193, 0.0188], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 07:10:32,694 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=91981.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:10:38,256 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=91989.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:10:57,224 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92016.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:11:00,912 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.69 vs. limit=2.0 2022-11-16 07:11:13,169 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.635e+01 1.436e+02 1.731e+02 2.246e+02 5.128e+02, threshold=3.463e+02, percent-clipped=1.0 2022-11-16 07:11:14,675 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92042.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:11:25,412 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-16 07:11:29,852 INFO [train.py:876] (0/4) Epoch 13, batch 4800, loss[loss=0.09958, simple_loss=0.1319, pruned_loss=0.03362, over 5725.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1343, pruned_loss=0.03608, over 1086902.25 frames. ], batch size: 13, lr: 6.16e-03, grad_scale: 16.0 2022-11-16 07:12:01,915 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8334, 2.4971, 2.8695, 3.7734, 3.8377, 3.0472, 2.6884, 3.8638], device='cuda:0'), covar=tensor([0.0826, 0.3312, 0.2048, 0.3128, 0.1014, 0.2957, 0.1961, 0.0987], device='cuda:0'), in_proj_covar=tensor([0.0260, 0.0200, 0.0190, 0.0302, 0.0228, 0.0203, 0.0191, 0.0251], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 07:12:21,126 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.028e+02 1.456e+02 1.773e+02 2.176e+02 4.110e+02, threshold=3.546e+02, percent-clipped=4.0 2022-11-16 07:12:37,615 INFO [train.py:876] (0/4) Epoch 13, batch 4900, loss[loss=0.0913, simple_loss=0.1241, pruned_loss=0.02925, over 5715.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1336, pruned_loss=0.03538, over 1089610.58 frames. ], batch size: 31, lr: 6.16e-03, grad_scale: 8.0 2022-11-16 07:12:44,211 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.6027, 5.1215, 5.3702, 5.0705, 5.6910, 5.4670, 4.7091, 5.6378], device='cuda:0'), covar=tensor([0.0378, 0.0333, 0.0453, 0.0347, 0.0320, 0.0252, 0.0274, 0.0233], device='cuda:0'), in_proj_covar=tensor([0.0147, 0.0154, 0.0111, 0.0144, 0.0182, 0.0111, 0.0128, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 07:12:50,696 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2719, 1.8547, 2.2336, 1.8760, 1.5158, 2.1896, 2.0279, 1.5930], device='cuda:0'), covar=tensor([0.0057, 0.0061, 0.0030, 0.0074, 0.0201, 0.0105, 0.0045, 0.0061], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0028, 0.0029, 0.0037, 0.0032, 0.0029, 0.0037, 0.0035], device='cuda:0'), out_proj_covar=tensor([2.8513e-05, 2.6050e-05, 2.6374e-05, 3.5850e-05, 3.0088e-05, 2.7908e-05, 3.5102e-05, 3.4010e-05], device='cuda:0') 2022-11-16 07:13:00,128 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=92198.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:13:29,749 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.574e+01 1.436e+02 1.818e+02 2.548e+02 4.452e+02, threshold=3.637e+02, percent-clipped=5.0 2022-11-16 07:13:33,144 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=92246.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 07:13:33,776 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=92247.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:13:45,761 INFO [train.py:876] (0/4) Epoch 13, batch 5000, loss[loss=0.09606, simple_loss=0.1387, pruned_loss=0.0267, over 5745.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1341, pruned_loss=0.03576, over 1094931.33 frames. ], batch size: 16, lr: 6.15e-03, grad_scale: 8.0 2022-11-16 07:13:58,785 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3967, 5.5843, 3.9993, 2.5278, 5.3262, 2.7805, 5.1181, 3.8564], device='cuda:0'), covar=tensor([0.1279, 0.0142, 0.0697, 0.2073, 0.0144, 0.1587, 0.0150, 0.1108], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0103, 0.0114, 0.0111, 0.0101, 0.0118, 0.0100, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:14:16,842 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=92311.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:14:34,570 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=92337.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:14:37,064 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.814e+01 1.442e+02 1.699e+02 2.118e+02 7.275e+02, threshold=3.398e+02, percent-clipped=6.0 2022-11-16 07:14:54,056 INFO [train.py:876] (0/4) Epoch 13, batch 5100, loss[loss=0.08338, simple_loss=0.1214, pruned_loss=0.0227, over 5544.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1345, pruned_loss=0.03547, over 1090035.51 frames. ], batch size: 14, lr: 6.15e-03, grad_scale: 8.0 2022-11-16 07:14:59,180 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9092, 1.6881, 1.8739, 1.7429, 1.5256, 1.8436, 1.6433, 1.6895], device='cuda:0'), covar=tensor([0.0071, 0.0105, 0.0058, 0.0059, 0.0083, 0.0070, 0.0056, 0.0053], device='cuda:0'), in_proj_covar=tensor([0.0032, 0.0028, 0.0030, 0.0038, 0.0033, 0.0029, 0.0037, 0.0036], device='cuda:0'), out_proj_covar=tensor([2.8918e-05, 2.6388e-05, 2.6654e-05, 3.6534e-05, 3.0300e-05, 2.8164e-05, 3.5317e-05, 3.4456e-05], device='cuda:0') 2022-11-16 07:15:45,934 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.294e+01 1.423e+02 1.797e+02 2.276e+02 4.290e+02, threshold=3.595e+02, percent-clipped=2.0 2022-11-16 07:15:51,327 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7145, 1.0928, 0.6761, 0.8511, 0.9183, 0.8088, 0.5753, 1.0736], device='cuda:0'), covar=tensor([0.0110, 0.0055, 0.0096, 0.0051, 0.0064, 0.0088, 0.0140, 0.0060], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0061, 0.0060, 0.0065, 0.0063, 0.0059, 0.0057, 0.0055], device='cuda:0'), out_proj_covar=tensor([5.8887e-05, 5.3926e-05, 5.2421e-05, 5.7674e-05, 5.5592e-05, 5.1134e-05, 5.0893e-05, 4.8082e-05], device='cuda:0') 2022-11-16 07:16:02,830 INFO [train.py:876] (0/4) Epoch 13, batch 5200, loss[loss=0.09126, simple_loss=0.1311, pruned_loss=0.02571, over 5574.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1338, pruned_loss=0.0349, over 1084797.92 frames. ], batch size: 23, lr: 6.15e-03, grad_scale: 8.0 2022-11-16 07:16:08,268 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6436, 3.8776, 3.6640, 3.2907, 2.0530, 3.8022, 2.1478, 3.1414], device='cuda:0'), covar=tensor([0.0493, 0.0173, 0.0171, 0.0442, 0.0669, 0.0193, 0.0608, 0.0190], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0184, 0.0183, 0.0208, 0.0195, 0.0185, 0.0194, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 07:16:31,582 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.27 vs. limit=5.0 2022-11-16 07:16:42,488 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9506, 2.7997, 2.8155, 1.5304, 3.0195, 3.0045, 2.7815, 3.2883], device='cuda:0'), covar=tensor([0.2184, 0.1748, 0.1162, 0.3062, 0.0841, 0.0970, 0.0713, 0.0959], device='cuda:0'), in_proj_covar=tensor([0.0167, 0.0183, 0.0169, 0.0187, 0.0185, 0.0206, 0.0171, 0.0186], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:16:54,292 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.360e+01 1.391e+02 1.845e+02 2.323e+02 4.876e+02, threshold=3.690e+02, percent-clipped=4.0 2022-11-16 07:16:56,315 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7898, 4.5569, 4.6516, 4.7418, 4.4217, 4.2097, 5.2041, 4.6403], device='cuda:0'), covar=tensor([0.0319, 0.0811, 0.0401, 0.1058, 0.0363, 0.0295, 0.0602, 0.0584], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0109, 0.0096, 0.0123, 0.0089, 0.0081, 0.0148, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:16:58,347 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=92547.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:17:10,185 INFO [train.py:876] (0/4) Epoch 13, batch 5300, loss[loss=0.1232, simple_loss=0.154, pruned_loss=0.04622, over 5638.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1341, pruned_loss=0.03482, over 1094028.98 frames. ], batch size: 38, lr: 6.14e-03, grad_scale: 8.0 2022-11-16 07:17:20,107 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6001, 2.3487, 2.9400, 1.8162, 1.5276, 3.2287, 2.7103, 2.3615], device='cuda:0'), covar=tensor([0.1138, 0.1571, 0.0733, 0.2858, 0.3123, 0.2463, 0.0877, 0.1334], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0103, 0.0101, 0.0104, 0.0078, 0.0072, 0.0082, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 07:17:30,907 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=92595.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:17:41,711 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=92611.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:17:44,399 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0432, 2.9205, 3.0681, 1.7813, 2.8448, 3.2507, 2.9645, 3.5515], device='cuda:0'), covar=tensor([0.2161, 0.1605, 0.0974, 0.2854, 0.0582, 0.1255, 0.0743, 0.0869], device='cuda:0'), in_proj_covar=tensor([0.0166, 0.0183, 0.0169, 0.0187, 0.0185, 0.0205, 0.0170, 0.0184], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:17:51,957 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92625.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:17:59,643 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=92637.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 07:18:02,109 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.870e+01 1.449e+02 1.761e+02 2.157e+02 4.829e+02, threshold=3.522e+02, percent-clipped=3.0 2022-11-16 07:18:06,934 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2022-11-16 07:18:14,397 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=92659.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:18:18,357 INFO [train.py:876] (0/4) Epoch 13, batch 5400, loss[loss=0.1523, simple_loss=0.1656, pruned_loss=0.06954, over 5135.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1342, pruned_loss=0.03509, over 1093525.27 frames. ], batch size: 91, lr: 6.14e-03, grad_scale: 8.0 2022-11-16 07:18:27,798 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.63 vs. limit=5.0 2022-11-16 07:18:32,611 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=92685.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:18:33,345 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92686.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:18:33,445 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.16 vs. limit=2.0 2022-11-16 07:18:55,986 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92720.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:19:10,734 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.348e+01 1.471e+02 1.783e+02 2.230e+02 3.628e+02, threshold=3.566e+02, percent-clipped=2.0 2022-11-16 07:19:26,814 INFO [train.py:876] (0/4) Epoch 13, batch 5500, loss[loss=0.1135, simple_loss=0.1448, pruned_loss=0.04107, over 5302.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1339, pruned_loss=0.03491, over 1087684.66 frames. ], batch size: 79, lr: 6.14e-03, grad_scale: 8.0 2022-11-16 07:19:38,281 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92781.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:19:55,595 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0581, 3.9934, 4.1434, 4.2052, 3.8150, 3.8082, 4.5559, 3.9633], device='cuda:0'), covar=tensor([0.0499, 0.0928, 0.0461, 0.1097, 0.0654, 0.0348, 0.0682, 0.0846], device='cuda:0'), in_proj_covar=tensor([0.0089, 0.0110, 0.0097, 0.0124, 0.0090, 0.0081, 0.0149, 0.0106], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:19:58,405 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2464, 2.9428, 3.0515, 1.8256, 2.9150, 3.0896, 3.0741, 3.3458], device='cuda:0'), covar=tensor([0.1776, 0.1453, 0.0917, 0.2605, 0.0674, 0.0921, 0.0622, 0.0882], device='cuda:0'), in_proj_covar=tensor([0.0163, 0.0181, 0.0167, 0.0184, 0.0182, 0.0201, 0.0169, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:20:12,843 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92831.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:20:19,502 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.251e+01 1.445e+02 1.860e+02 2.400e+02 4.307e+02, threshold=3.721e+02, percent-clipped=4.0 2022-11-16 07:20:30,705 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1975, 3.1970, 2.8097, 3.1390, 2.7074, 3.5188, 3.2896, 3.8397], device='cuda:0'), covar=tensor([0.0669, 0.1035, 0.1648, 0.1724, 0.1727, 0.0815, 0.0976, 0.0755], device='cuda:0'), in_proj_covar=tensor([0.0114, 0.0108, 0.0107, 0.0107, 0.0093, 0.0104, 0.0097, 0.0084], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 07:20:35,421 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.77 vs. limit=2.0 2022-11-16 07:20:35,694 INFO [train.py:876] (0/4) Epoch 13, batch 5600, loss[loss=0.1106, simple_loss=0.1462, pruned_loss=0.03753, over 5757.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1345, pruned_loss=0.03607, over 1082038.56 frames. ], batch size: 21, lr: 6.13e-03, grad_scale: 8.0 2022-11-16 07:20:41,169 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3058, 4.6469, 4.4790, 3.8804, 2.4481, 4.8573, 2.8222, 4.1061], device='cuda:0'), covar=tensor([0.0331, 0.0117, 0.0140, 0.0362, 0.0589, 0.0097, 0.0475, 0.0136], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0184, 0.0183, 0.0208, 0.0195, 0.0186, 0.0195, 0.0189], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 07:20:54,528 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92892.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:20:59,082 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92899.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:21:09,030 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.9180, 1.4105, 0.9609, 1.0243, 1.3006, 1.1673, 0.7536, 1.2578], device='cuda:0'), covar=tensor([0.0080, 0.0051, 0.0072, 0.0061, 0.0059, 0.0061, 0.0093, 0.0070], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0060, 0.0060, 0.0065, 0.0063, 0.0059, 0.0056, 0.0055], device='cuda:0'), out_proj_covar=tensor([5.8762e-05, 5.3494e-05, 5.2268e-05, 5.7740e-05, 5.5310e-05, 5.1241e-05, 5.0061e-05, 4.7843e-05], device='cuda:0') 2022-11-16 07:21:20,278 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92931.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:21:27,404 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.557e+01 1.397e+02 1.621e+02 2.063e+02 4.647e+02, threshold=3.241e+02, percent-clipped=3.0 2022-11-16 07:21:34,355 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1594, 3.5772, 2.5724, 3.2546, 2.6534, 2.4580, 1.9284, 2.9267], device='cuda:0'), covar=tensor([0.1312, 0.0287, 0.1102, 0.0453, 0.1170, 0.1157, 0.2006, 0.0566], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0142, 0.0155, 0.0149, 0.0175, 0.0168, 0.0158, 0.0159], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:21:35,830 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.54 vs. limit=5.0 2022-11-16 07:21:40,272 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92960.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:21:43,372 INFO [train.py:876] (0/4) Epoch 13, batch 5700, loss[loss=0.09335, simple_loss=0.1289, pruned_loss=0.02887, over 5546.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1338, pruned_loss=0.03531, over 1083930.63 frames. ], batch size: 16, lr: 6.13e-03, grad_scale: 8.0 2022-11-16 07:21:50,425 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=92975.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:21:54,338 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=92981.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:22:01,940 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=92992.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:22:11,608 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.47 vs. limit=2.0 2022-11-16 07:22:14,756 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93010.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:22:32,125 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93036.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:22:32,146 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93036.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:22:36,330 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.192e+01 1.368e+02 1.655e+02 2.036e+02 3.681e+02, threshold=3.311e+02, percent-clipped=3.0 2022-11-16 07:22:52,171 INFO [train.py:876] (0/4) Epoch 13, batch 5800, loss[loss=0.1048, simple_loss=0.1389, pruned_loss=0.03538, over 5157.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1355, pruned_loss=0.0363, over 1080448.24 frames. ], batch size: 91, lr: 6.13e-03, grad_scale: 8.0 2022-11-16 07:22:56,537 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93071.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:22:59,760 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93076.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:23:10,571 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-16 07:23:13,887 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93097.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 07:23:18,668 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=5.88 vs. limit=5.0 2022-11-16 07:23:26,249 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93115.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:23:43,301 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.091e+01 1.350e+02 1.786e+02 2.182e+02 6.733e+02, threshold=3.572e+02, percent-clipped=3.0 2022-11-16 07:24:00,089 INFO [train.py:876] (0/4) Epoch 13, batch 5900, loss[loss=0.09601, simple_loss=0.1257, pruned_loss=0.03316, over 5751.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1337, pruned_loss=0.03538, over 1086042.36 frames. ], batch size: 27, lr: 6.12e-03, grad_scale: 8.0 2022-11-16 07:24:04,294 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2022-11-16 07:24:07,379 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93176.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:24:13,481 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.29 vs. limit=2.0 2022-11-16 07:24:14,461 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93187.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:24:29,469 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2690, 3.1738, 2.8998, 3.1651, 3.2169, 2.8491, 2.7324, 3.0079], device='cuda:0'), covar=tensor([0.0295, 0.0538, 0.1299, 0.0542, 0.0518, 0.0517, 0.1080, 0.0576], device='cuda:0'), in_proj_covar=tensor([0.0130, 0.0179, 0.0272, 0.0178, 0.0222, 0.0174, 0.0189, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:24:31,151 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4960, 2.1686, 2.2373, 3.0133, 2.9149, 2.3201, 2.0061, 2.9270], device='cuda:0'), covar=tensor([0.1849, 0.2326, 0.2193, 0.1433, 0.1334, 0.2757, 0.2089, 0.1310], device='cuda:0'), in_proj_covar=tensor([0.0260, 0.0196, 0.0187, 0.0298, 0.0226, 0.0202, 0.0189, 0.0251], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 07:24:34,315 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6836, 4.8991, 3.2369, 4.5476, 3.8244, 3.3167, 2.8596, 4.0861], device='cuda:0'), covar=tensor([0.1326, 0.0254, 0.1131, 0.0265, 0.0515, 0.0850, 0.1680, 0.0286], device='cuda:0'), in_proj_covar=tensor([0.0152, 0.0141, 0.0154, 0.0146, 0.0173, 0.0165, 0.0156, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:24:51,118 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.010e+02 1.444e+02 1.638e+02 1.961e+02 3.238e+02, threshold=3.277e+02, percent-clipped=0.0 2022-11-16 07:25:00,822 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93255.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:25:07,946 INFO [train.py:876] (0/4) Epoch 13, batch 6000, loss[loss=0.09296, simple_loss=0.13, pruned_loss=0.02795, over 5777.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1346, pruned_loss=0.0364, over 1089102.10 frames. ], batch size: 21, lr: 6.12e-03, grad_scale: 8.0 2022-11-16 07:25:07,947 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 07:25:27,339 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1680, 2.1580, 2.2880, 1.8354, 1.9768, 1.9343, 2.0758, 2.1387], device='cuda:0'), covar=tensor([0.0048, 0.0056, 0.0042, 0.0059, 0.0048, 0.0036, 0.0037, 0.0071], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0060, 0.0060, 0.0066, 0.0062, 0.0059, 0.0056, 0.0055], device='cuda:0'), out_proj_covar=tensor([5.9115e-05, 5.3248e-05, 5.2097e-05, 5.8325e-05, 5.5121e-05, 5.1402e-05, 4.9829e-05, 4.7798e-05], device='cuda:0') 2022-11-16 07:25:32,184 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1266, 1.5328, 2.0376, 1.5848, 1.7845, 1.8438, 1.7658, 1.3621], device='cuda:0'), covar=tensor([0.0042, 0.0088, 0.0055, 0.0076, 0.0085, 0.0088, 0.0049, 0.0063], device='cuda:0'), in_proj_covar=tensor([0.0032, 0.0029, 0.0030, 0.0038, 0.0033, 0.0030, 0.0038, 0.0036], device='cuda:0'), out_proj_covar=tensor([2.9195e-05, 2.7338e-05, 2.6919e-05, 3.6743e-05, 3.0867e-05, 2.9207e-05, 3.5986e-05, 3.4687e-05], device='cuda:0') 2022-11-16 07:25:32,621 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4249, 1.8005, 1.5945, 1.4388, 1.5226, 1.4957, 1.5904, 0.7366], device='cuda:0'), covar=tensor([0.0033, 0.0043, 0.0044, 0.0056, 0.0043, 0.0036, 0.0047, 0.0102], device='cuda:0'), in_proj_covar=tensor([0.0032, 0.0029, 0.0030, 0.0038, 0.0033, 0.0030, 0.0038, 0.0036], device='cuda:0'), out_proj_covar=tensor([2.9195e-05, 2.7338e-05, 2.6919e-05, 3.6743e-05, 3.0867e-05, 2.9207e-05, 3.5986e-05, 3.4687e-05], device='cuda:0') 2022-11-16 07:25:35,118 INFO [train.py:908] (0/4) Epoch 13, validation: loss=0.1768, simple_loss=0.1872, pruned_loss=0.08323, over 1530663.00 frames. 2022-11-16 07:25:35,118 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 07:25:45,034 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7160, 2.3906, 3.3132, 3.0230, 3.3355, 2.3364, 3.1261, 3.6283], device='cuda:0'), covar=tensor([0.0758, 0.1675, 0.1082, 0.1459, 0.0800, 0.1654, 0.1319, 0.0983], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0193, 0.0219, 0.0212, 0.0242, 0.0196, 0.0226, 0.0233], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:25:45,575 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93281.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:25:49,439 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93287.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:26:18,401 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93329.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:26:19,777 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93331.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:26:19,856 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5575, 2.4124, 2.2947, 2.5567, 2.0946, 1.8832, 2.2325, 2.8081], device='cuda:0'), covar=tensor([0.1115, 0.1507, 0.1605, 0.1188, 0.1443, 0.1446, 0.1361, 0.0954], device='cuda:0'), in_proj_covar=tensor([0.0116, 0.0111, 0.0109, 0.0109, 0.0095, 0.0106, 0.0100, 0.0086], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 07:26:26,167 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.791e+01 1.381e+02 1.698e+02 2.176e+02 5.974e+02, threshold=3.396e+02, percent-clipped=7.0 2022-11-16 07:26:42,555 INFO [train.py:876] (0/4) Epoch 13, batch 6100, loss[loss=0.06221, simple_loss=0.09141, pruned_loss=0.01651, over 5390.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1332, pruned_loss=0.03514, over 1090024.70 frames. ], batch size: 9, lr: 6.12e-03, grad_scale: 8.0 2022-11-16 07:26:43,310 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93366.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:26:50,782 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93376.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:27:01,542 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93392.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:27:23,292 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93424.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:27:35,459 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 1.359e+02 1.712e+02 2.042e+02 3.975e+02, threshold=3.424e+02, percent-clipped=1.0 2022-11-16 07:27:51,739 INFO [train.py:876] (0/4) Epoch 13, batch 6200, loss[loss=0.06381, simple_loss=0.09017, pruned_loss=0.01872, over 5461.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1336, pruned_loss=0.03561, over 1081703.21 frames. ], batch size: 10, lr: 6.12e-03, grad_scale: 8.0 2022-11-16 07:27:55,647 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=93471.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:28:06,763 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93487.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:28:38,381 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93535.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:28:42,535 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.266e+01 1.465e+02 1.713e+02 2.165e+02 4.081e+02, threshold=3.427e+02, percent-clipped=1.0 2022-11-16 07:28:52,491 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93555.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:28:58,918 INFO [train.py:876] (0/4) Epoch 13, batch 6300, loss[loss=0.1057, simple_loss=0.1466, pruned_loss=0.03244, over 5518.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1336, pruned_loss=0.03587, over 1083832.14 frames. ], batch size: 14, lr: 6.11e-03, grad_scale: 8.0 2022-11-16 07:29:03,478 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2022-11-16 07:29:13,676 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93587.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:29:14,964 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5057, 4.5421, 3.4614, 2.0758, 4.3578, 1.7076, 4.2702, 2.4457], device='cuda:0'), covar=tensor([0.1467, 0.0109, 0.0597, 0.1854, 0.0158, 0.1791, 0.0180, 0.1507], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0104, 0.0115, 0.0110, 0.0103, 0.0118, 0.0101, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:29:24,621 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93603.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:29:43,335 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93631.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:29:46,174 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93635.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:29:49,999 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.661e+01 1.430e+02 1.750e+02 2.354e+02 5.950e+02, threshold=3.500e+02, percent-clipped=2.0 2022-11-16 07:30:07,004 INFO [train.py:876] (0/4) Epoch 13, batch 6400, loss[loss=0.1045, simple_loss=0.1415, pruned_loss=0.03378, over 5714.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.133, pruned_loss=0.0347, over 1091236.89 frames. ], batch size: 31, lr: 6.11e-03, grad_scale: 8.0 2022-11-16 07:30:07,758 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93666.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:30:16,122 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93679.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:30:24,946 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93692.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:30:39,893 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93714.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:30:45,963 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.93 vs. limit=2.0 2022-11-16 07:30:57,370 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93740.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:30:57,985 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.087e+01 1.406e+02 1.692e+02 2.048e+02 4.065e+02, threshold=3.385e+02, percent-clipped=2.0 2022-11-16 07:31:11,981 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1026, 3.0437, 2.7798, 3.1299, 2.5897, 2.9191, 3.0728, 3.5536], device='cuda:0'), covar=tensor([0.1017, 0.0928, 0.1679, 0.1350, 0.1442, 0.0686, 0.1063, 0.0525], device='cuda:0'), in_proj_covar=tensor([0.0114, 0.0109, 0.0107, 0.0108, 0.0093, 0.0105, 0.0099, 0.0085], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 07:31:13,824 INFO [train.py:876] (0/4) Epoch 13, batch 6500, loss[loss=0.1195, simple_loss=0.1489, pruned_loss=0.04509, over 5597.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1337, pruned_loss=0.03545, over 1090392.93 frames. ], batch size: 43, lr: 6.11e-03, grad_scale: 8.0 2022-11-16 07:31:17,576 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93770.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:31:18,258 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=93771.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:31:26,391 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2466, 3.0688, 3.0869, 3.1744, 3.0811, 2.9262, 3.5127, 3.1356], device='cuda:0'), covar=tensor([0.0476, 0.0871, 0.0581, 0.1244, 0.0677, 0.0459, 0.0795, 0.0888], device='cuda:0'), in_proj_covar=tensor([0.0088, 0.0110, 0.0096, 0.0124, 0.0090, 0.0081, 0.0146, 0.0105], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:31:50,711 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=93819.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:31:57,092 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5060, 2.5959, 2.4753, 2.4860, 2.1313, 1.8191, 2.3389, 2.7028], device='cuda:0'), covar=tensor([0.1214, 0.1399, 0.1785, 0.1167, 0.1442, 0.1653, 0.1600, 0.1135], device='cuda:0'), in_proj_covar=tensor([0.0114, 0.0109, 0.0107, 0.0108, 0.0093, 0.0104, 0.0099, 0.0084], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0003, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 07:31:59,072 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=93831.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:32:05,379 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.939e+01 1.358e+02 1.733e+02 2.139e+02 3.711e+02, threshold=3.467e+02, percent-clipped=1.0 2022-11-16 07:32:21,362 INFO [train.py:876] (0/4) Epoch 13, batch 6600, loss[loss=0.1288, simple_loss=0.156, pruned_loss=0.05081, over 5123.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1327, pruned_loss=0.03471, over 1090482.76 frames. ], batch size: 91, lr: 6.10e-03, grad_scale: 8.0 2022-11-16 07:32:34,835 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0567, 3.9795, 3.7559, 3.6833, 4.0618, 3.8569, 1.6040, 4.2720], device='cuda:0'), covar=tensor([0.0269, 0.0554, 0.0364, 0.0478, 0.0342, 0.0401, 0.3409, 0.0386], device='cuda:0'), in_proj_covar=tensor([0.0106, 0.0091, 0.0089, 0.0083, 0.0105, 0.0092, 0.0133, 0.0111], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:32:40,925 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3120, 3.9891, 3.0303, 1.9573, 3.8349, 1.5113, 3.6384, 2.1217], device='cuda:0'), covar=tensor([0.1402, 0.0160, 0.0777, 0.1779, 0.0203, 0.1795, 0.0309, 0.1473], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0103, 0.0114, 0.0109, 0.0102, 0.0117, 0.0100, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:33:02,845 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7023, 2.3456, 2.9279, 1.8782, 1.8495, 3.1058, 2.9315, 2.3358], device='cuda:0'), covar=tensor([0.0862, 0.1448, 0.0757, 0.2348, 0.2864, 0.1356, 0.0875, 0.1319], device='cuda:0'), in_proj_covar=tensor([0.0109, 0.0101, 0.0099, 0.0102, 0.0074, 0.0071, 0.0081, 0.0092], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 07:33:09,900 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.10 vs. limit=2.0 2022-11-16 07:33:13,069 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.008e+02 1.396e+02 1.723e+02 2.017e+02 3.846e+02, threshold=3.447e+02, percent-clipped=2.0 2022-11-16 07:33:28,186 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8138, 2.5112, 3.4586, 3.0887, 3.4578, 2.3868, 3.2399, 3.6982], device='cuda:0'), covar=tensor([0.0725, 0.1593, 0.0920, 0.1477, 0.0740, 0.1752, 0.1437, 0.0998], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0195, 0.0217, 0.0211, 0.0240, 0.0195, 0.0227, 0.0231], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:33:29,293 INFO [train.py:876] (0/4) Epoch 13, batch 6700, loss[loss=0.08758, simple_loss=0.1229, pruned_loss=0.02611, over 5483.00 frames. ], tot_loss[loss=0.09929, simple_loss=0.131, pruned_loss=0.03378, over 1090194.66 frames. ], batch size: 12, lr: 6.10e-03, grad_scale: 8.0 2022-11-16 07:33:40,298 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=93981.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:33:44,472 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=6.17 vs. limit=5.0 2022-11-16 07:34:23,829 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.687e+01 1.406e+02 1.734e+02 2.317e+02 5.958e+02, threshold=3.468e+02, percent-clipped=4.0 2022-11-16 07:34:25,115 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=94042.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:34:35,540 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.56 vs. limit=2.0 2022-11-16 07:34:40,853 INFO [train.py:876] (0/4) Epoch 13, batch 6800, loss[loss=0.07245, simple_loss=0.1059, pruned_loss=0.01951, over 5494.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1327, pruned_loss=0.03541, over 1086310.37 frames. ], batch size: 10, lr: 6.10e-03, grad_scale: 8.0 2022-11-16 07:35:22,156 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=94126.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:35:32,243 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.571e+01 1.399e+02 1.659e+02 1.999e+02 4.068e+02, threshold=3.319e+02, percent-clipped=3.0 2022-11-16 07:35:47,783 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1017, 3.2461, 3.1700, 3.0938, 3.2412, 3.0734, 1.3074, 3.4097], device='cuda:0'), covar=tensor([0.0387, 0.0307, 0.0366, 0.0300, 0.0368, 0.0447, 0.3318, 0.0319], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0090, 0.0088, 0.0082, 0.0104, 0.0090, 0.0132, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:35:48,960 INFO [train.py:876] (0/4) Epoch 13, batch 6900, loss[loss=0.1052, simple_loss=0.1355, pruned_loss=0.03741, over 5623.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1327, pruned_loss=0.03542, over 1078142.40 frames. ], batch size: 23, lr: 6.09e-03, grad_scale: 16.0 2022-11-16 07:36:04,225 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.45 vs. limit=5.0 2022-11-16 07:36:26,760 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4574, 4.1113, 3.2563, 1.9484, 3.9196, 1.7919, 3.7999, 2.2143], device='cuda:0'), covar=tensor([0.1521, 0.0138, 0.0736, 0.2204, 0.0236, 0.1874, 0.0286, 0.1594], device='cuda:0'), in_proj_covar=tensor([0.0120, 0.0105, 0.0115, 0.0111, 0.0103, 0.0118, 0.0101, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:36:28,750 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=94223.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:36:40,443 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.408e+01 1.493e+02 1.868e+02 2.271e+02 3.997e+02, threshold=3.737e+02, percent-clipped=4.0 2022-11-16 07:36:56,956 INFO [train.py:876] (0/4) Epoch 13, batch 7000, loss[loss=0.09878, simple_loss=0.1327, pruned_loss=0.0324, over 5658.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1339, pruned_loss=0.03529, over 1087748.47 frames. ], batch size: 38, lr: 6.09e-03, grad_scale: 16.0 2022-11-16 07:37:09,845 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=94284.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 07:37:24,596 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9679, 3.0705, 3.1114, 2.9522, 3.1364, 2.9716, 1.3040, 3.2568], device='cuda:0'), covar=tensor([0.0373, 0.0354, 0.0348, 0.0325, 0.0354, 0.0397, 0.3151, 0.0365], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0090, 0.0088, 0.0082, 0.0103, 0.0091, 0.0132, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:37:40,139 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-16 07:37:45,785 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=94337.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:37:48,222 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.793e+01 1.390e+02 1.725e+02 2.069e+02 3.894e+02, threshold=3.450e+02, percent-clipped=1.0 2022-11-16 07:37:52,961 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=94348.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:37:53,154 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-16 07:38:04,148 INFO [train.py:876] (0/4) Epoch 13, batch 7100, loss[loss=0.09968, simple_loss=0.138, pruned_loss=0.0307, over 5678.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1342, pruned_loss=0.03517, over 1092306.63 frames. ], batch size: 36, lr: 6.09e-03, grad_scale: 16.0 2022-11-16 07:38:34,088 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=94409.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 07:38:45,566 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=94426.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:38:56,308 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 1.440e+02 1.714e+02 2.140e+02 3.966e+02, threshold=3.428e+02, percent-clipped=2.0 2022-11-16 07:39:06,953 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0088, 3.9540, 3.8434, 3.6535, 2.2141, 4.4133, 2.4571, 3.8849], device='cuda:0'), covar=tensor([0.0500, 0.0314, 0.0255, 0.0387, 0.0741, 0.0189, 0.0700, 0.0163], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0186, 0.0182, 0.0209, 0.0197, 0.0185, 0.0194, 0.0188], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 07:39:12,057 INFO [train.py:876] (0/4) Epoch 13, batch 7200, loss[loss=0.112, simple_loss=0.145, pruned_loss=0.03957, over 5799.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1334, pruned_loss=0.03486, over 1094328.72 frames. ], batch size: 22, lr: 6.08e-03, grad_scale: 16.0 2022-11-16 07:39:18,278 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=94474.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:39:21,680 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6124, 2.4054, 3.2213, 3.0200, 3.0881, 2.2952, 3.0053, 3.4925], device='cuda:0'), covar=tensor([0.0654, 0.1401, 0.0866, 0.1315, 0.0729, 0.1540, 0.1192, 0.0962], device='cuda:0'), in_proj_covar=tensor([0.0243, 0.0190, 0.0213, 0.0208, 0.0237, 0.0191, 0.0223, 0.0226], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:40:01,693 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-13.pt 2022-11-16 07:40:43,965 INFO [train.py:876] (0/4) Epoch 14, batch 0, loss[loss=0.1213, simple_loss=0.1496, pruned_loss=0.04644, over 5687.00 frames. ], tot_loss[loss=0.1213, simple_loss=0.1496, pruned_loss=0.04644, over 5687.00 frames. ], batch size: 34, lr: 5.86e-03, grad_scale: 16.0 2022-11-16 07:40:43,966 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 07:41:00,499 INFO [train.py:908] (0/4) Epoch 14, validation: loss=0.1755, simple_loss=0.1868, pruned_loss=0.08205, over 1530663.00 frames. 2022-11-16 07:41:00,500 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 07:41:03,038 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 1.398e+02 1.682e+02 2.138e+02 4.621e+02, threshold=3.364e+02, percent-clipped=3.0 2022-11-16 07:41:23,870 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8741, 2.5270, 2.3810, 1.4954, 2.6482, 2.7162, 2.6723, 2.9580], device='cuda:0'), covar=tensor([0.1977, 0.1702, 0.1457, 0.3031, 0.0903, 0.1148, 0.0759, 0.1195], device='cuda:0'), in_proj_covar=tensor([0.0163, 0.0178, 0.0167, 0.0182, 0.0181, 0.0201, 0.0169, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:41:25,218 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6973, 3.2140, 4.0564, 3.8136, 4.4769, 2.9280, 3.9040, 4.4656], device='cuda:0'), covar=tensor([0.0562, 0.1474, 0.0791, 0.1184, 0.0341, 0.1546, 0.1113, 0.0961], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0191, 0.0214, 0.0209, 0.0239, 0.0191, 0.0224, 0.0227], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:41:29,025 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=94579.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 07:42:08,084 INFO [train.py:876] (0/4) Epoch 14, batch 100, loss[loss=0.1075, simple_loss=0.1346, pruned_loss=0.04024, over 5752.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1362, pruned_loss=0.03628, over 435307.08 frames. ], batch size: 26, lr: 5.86e-03, grad_scale: 16.0 2022-11-16 07:42:08,186 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=94637.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:42:10,675 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 1.465e+02 1.762e+02 2.317e+02 5.551e+02, threshold=3.525e+02, percent-clipped=6.0 2022-11-16 07:42:20,758 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-16 07:42:40,597 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=94685.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:42:52,945 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=94704.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 07:43:15,766 INFO [train.py:876] (0/4) Epoch 14, batch 200, loss[loss=0.07149, simple_loss=0.1122, pruned_loss=0.01538, over 5752.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.134, pruned_loss=0.03558, over 684988.13 frames. ], batch size: 20, lr: 5.85e-03, grad_scale: 16.0 2022-11-16 07:43:18,295 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.022e+02 1.376e+02 1.680e+02 2.123e+02 3.782e+02, threshold=3.359e+02, percent-clipped=1.0 2022-11-16 07:43:19,788 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7689, 1.7861, 1.8123, 1.5793, 1.7519, 1.7832, 1.6975, 1.8693], device='cuda:0'), covar=tensor([0.0069, 0.0061, 0.0055, 0.0062, 0.0056, 0.0050, 0.0051, 0.0065], device='cuda:0'), in_proj_covar=tensor([0.0065, 0.0060, 0.0060, 0.0065, 0.0062, 0.0058, 0.0056, 0.0054], device='cuda:0'), out_proj_covar=tensor([5.7979e-05, 5.2838e-05, 5.2133e-05, 5.7392e-05, 5.4515e-05, 5.0596e-05, 4.9581e-05, 4.7090e-05], device='cuda:0') 2022-11-16 07:43:27,585 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6556, 4.5813, 3.1215, 4.3767, 3.5040, 3.1162, 2.4002, 3.8864], device='cuda:0'), covar=tensor([0.1461, 0.0229, 0.1080, 0.0279, 0.0661, 0.0960, 0.1983, 0.0404], device='cuda:0'), in_proj_covar=tensor([0.0154, 0.0141, 0.0156, 0.0148, 0.0173, 0.0167, 0.0157, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:44:22,890 INFO [train.py:876] (0/4) Epoch 14, batch 300, loss[loss=0.07868, simple_loss=0.1235, pruned_loss=0.01692, over 5700.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1347, pruned_loss=0.0364, over 837995.31 frames. ], batch size: 17, lr: 5.85e-03, grad_scale: 16.0 2022-11-16 07:44:25,436 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.663e+01 1.544e+02 1.893e+02 2.592e+02 6.103e+02, threshold=3.786e+02, percent-clipped=6.0 2022-11-16 07:44:41,636 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0413, 3.6018, 2.4179, 3.4478, 2.8622, 2.5597, 1.9144, 3.0115], device='cuda:0'), covar=tensor([0.1589, 0.0393, 0.1382, 0.0462, 0.1143, 0.1295, 0.2365, 0.0790], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0140, 0.0155, 0.0147, 0.0172, 0.0166, 0.0156, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:44:50,719 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=94879.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:45:22,537 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=94927.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:45:29,277 INFO [train.py:876] (0/4) Epoch 14, batch 400, loss[loss=0.1023, simple_loss=0.1379, pruned_loss=0.03337, over 5747.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1343, pruned_loss=0.03614, over 934306.02 frames. ], batch size: 20, lr: 5.85e-03, grad_scale: 16.0 2022-11-16 07:45:32,605 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.318e+01 1.380e+02 1.703e+02 1.935e+02 3.356e+02, threshold=3.406e+02, percent-clipped=0.0 2022-11-16 07:45:53,280 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5088, 2.2722, 2.6724, 1.9594, 1.4655, 3.3014, 2.6176, 2.1304], device='cuda:0'), covar=tensor([0.0929, 0.1560, 0.0983, 0.2435, 0.2836, 0.0324, 0.0873, 0.1432], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0103, 0.0103, 0.0104, 0.0076, 0.0073, 0.0084, 0.0095], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 07:45:59,956 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-16 07:46:12,765 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-95000.pt 2022-11-16 07:46:19,810 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95004.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:46:24,144 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.8969, 4.7805, 4.8421, 4.9054, 4.3672, 4.1645, 5.4535, 4.7250], device='cuda:0'), covar=tensor([0.0407, 0.0797, 0.0338, 0.1114, 0.0511, 0.0323, 0.0629, 0.0641], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0112, 0.0097, 0.0124, 0.0091, 0.0082, 0.0148, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 07:46:25,532 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2905, 2.2097, 2.6710, 1.9687, 1.7458, 3.0587, 2.5566, 2.1370], device='cuda:0'), covar=tensor([0.1023, 0.1124, 0.0745, 0.2325, 0.2131, 0.1514, 0.1189, 0.1497], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0103, 0.0103, 0.0104, 0.0076, 0.0073, 0.0084, 0.0096], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 07:46:28,337 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.30 vs. limit=5.0 2022-11-16 07:46:29,400 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1108, 2.4700, 3.5380, 3.0747, 3.9395, 2.3458, 3.4775, 3.9317], device='cuda:0'), covar=tensor([0.0716, 0.1725, 0.0976, 0.1803, 0.0477, 0.1784, 0.1282, 0.0946], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0194, 0.0215, 0.0212, 0.0242, 0.0196, 0.0225, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:46:39,999 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.85 vs. limit=2.0 2022-11-16 07:46:40,791 INFO [train.py:876] (0/4) Epoch 14, batch 500, loss[loss=0.0796, simple_loss=0.1128, pruned_loss=0.0232, over 5100.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1345, pruned_loss=0.03591, over 995711.55 frames. ], batch size: 7, lr: 5.84e-03, grad_scale: 16.0 2022-11-16 07:46:43,324 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.308e+01 1.428e+02 1.816e+02 2.349e+02 3.391e+02, threshold=3.632e+02, percent-clipped=0.0 2022-11-16 07:46:51,589 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=95052.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:47:04,963 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-16 07:47:26,149 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95105.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 07:47:48,217 INFO [train.py:876] (0/4) Epoch 14, batch 600, loss[loss=0.09072, simple_loss=0.1271, pruned_loss=0.02719, over 5624.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1358, pruned_loss=0.0366, over 1033904.80 frames. ], batch size: 38, lr: 5.84e-03, grad_scale: 16.0 2022-11-16 07:47:50,762 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.356e+01 1.449e+02 1.769e+02 2.281e+02 4.546e+02, threshold=3.538e+02, percent-clipped=1.0 2022-11-16 07:48:07,443 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95166.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 07:48:23,512 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0892, 1.0468, 0.9900, 0.8727, 1.1889, 0.9652, 0.5357, 0.8794], device='cuda:0'), covar=tensor([0.0224, 0.0337, 0.0296, 0.0396, 0.0250, 0.0279, 0.0806, 0.0312], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0026, 0.0019, 0.0022, 0.0018, 0.0017, 0.0025, 0.0017], device='cuda:0'), out_proj_covar=tensor([9.4382e-05, 1.3181e-04, 1.0006e-04, 1.1413e-04, 1.0066e-04, 9.4623e-05, 1.2510e-04, 9.4238e-05], device='cuda:0') 2022-11-16 07:48:37,591 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5593, 2.4973, 2.3231, 2.5326, 2.2602, 2.1127, 2.3511, 2.8604], device='cuda:0'), covar=tensor([0.1220, 0.1367, 0.1956, 0.1163, 0.1402, 0.1306, 0.1380, 0.1231], device='cuda:0'), in_proj_covar=tensor([0.0115, 0.0108, 0.0107, 0.0109, 0.0094, 0.0104, 0.0099, 0.0085], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 07:48:56,204 INFO [train.py:876] (0/4) Epoch 14, batch 700, loss[loss=0.07537, simple_loss=0.1117, pruned_loss=0.0195, over 5508.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1348, pruned_loss=0.03667, over 1046460.96 frames. ], batch size: 12, lr: 5.84e-03, grad_scale: 16.0 2022-11-16 07:48:58,828 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.646e+01 1.509e+02 1.874e+02 2.495e+02 6.608e+02, threshold=3.748e+02, percent-clipped=12.0 2022-11-16 07:49:11,289 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95260.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:49:16,500 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95268.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:49:51,034 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0977, 2.9744, 2.7479, 3.1306, 2.5924, 3.1816, 3.2410, 3.7738], device='cuda:0'), covar=tensor([0.1436, 0.1077, 0.1330, 0.1002, 0.1429, 0.0771, 0.0934, 0.0607], device='cuda:0'), in_proj_covar=tensor([0.0115, 0.0108, 0.0107, 0.0109, 0.0094, 0.0103, 0.0099, 0.0085], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 07:49:52,345 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95321.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:49:58,233 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95329.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:50:03,503 INFO [train.py:876] (0/4) Epoch 14, batch 800, loss[loss=0.09607, simple_loss=0.1342, pruned_loss=0.02897, over 5576.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1325, pruned_loss=0.03511, over 1058967.21 frames. ], batch size: 18, lr: 5.83e-03, grad_scale: 16.0 2022-11-16 07:50:04,885 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95339.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:50:06,024 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.478e+01 1.483e+02 1.769e+02 2.212e+02 4.574e+02, threshold=3.537e+02, percent-clipped=3.0 2022-11-16 07:50:14,295 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.73 vs. limit=2.0 2022-11-16 07:50:46,441 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95400.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:51:06,434 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5798, 2.3820, 2.8147, 1.9322, 1.4827, 3.2398, 2.6878, 2.2496], device='cuda:0'), covar=tensor([0.1129, 0.1362, 0.0813, 0.2493, 0.3144, 0.3069, 0.1217, 0.1949], device='cuda:0'), in_proj_covar=tensor([0.0113, 0.0104, 0.0105, 0.0105, 0.0078, 0.0073, 0.0084, 0.0096], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 07:51:11,128 INFO [train.py:876] (0/4) Epoch 14, batch 900, loss[loss=0.08438, simple_loss=0.1242, pruned_loss=0.02228, over 5572.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1339, pruned_loss=0.0355, over 1071522.94 frames. ], batch size: 25, lr: 5.83e-03, grad_scale: 16.0 2022-11-16 07:51:13,910 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.380e+01 1.449e+02 1.681e+02 2.078e+02 5.193e+02, threshold=3.361e+02, percent-clipped=2.0 2022-11-16 07:51:26,949 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95461.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 07:52:17,321 INFO [train.py:876] (0/4) Epoch 14, batch 1000, loss[loss=0.147, simple_loss=0.1641, pruned_loss=0.06495, over 5510.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1339, pruned_loss=0.03538, over 1080486.96 frames. ], batch size: 53, lr: 5.83e-03, grad_scale: 16.0 2022-11-16 07:52:19,862 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.883e+01 1.427e+02 1.771e+02 2.173e+02 4.557e+02, threshold=3.542e+02, percent-clipped=6.0 2022-11-16 07:52:30,491 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7267, 4.8277, 3.1340, 4.5158, 3.6781, 3.1732, 2.6035, 3.9844], device='cuda:0'), covar=tensor([0.1272, 0.0155, 0.0949, 0.0259, 0.0545, 0.0981, 0.1739, 0.0318], device='cuda:0'), in_proj_covar=tensor([0.0152, 0.0139, 0.0152, 0.0145, 0.0170, 0.0165, 0.0155, 0.0154], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:53:11,115 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95616.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:53:11,840 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95617.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:53:16,360 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95624.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:53:24,670 INFO [train.py:876] (0/4) Epoch 14, batch 1100, loss[loss=0.05564, simple_loss=0.09284, pruned_loss=0.009224, over 4760.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1338, pruned_loss=0.03577, over 1081116.36 frames. ], batch size: 5, lr: 5.83e-03, grad_scale: 16.0 2022-11-16 07:53:27,207 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 1.386e+02 1.674e+02 2.201e+02 3.601e+02, threshold=3.349e+02, percent-clipped=2.0 2022-11-16 07:53:35,995 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2973, 2.8749, 3.0267, 1.8105, 2.8270, 3.2170, 3.2746, 3.5879], device='cuda:0'), covar=tensor([0.1854, 0.1769, 0.1270, 0.3210, 0.0887, 0.1107, 0.0587, 0.0954], device='cuda:0'), in_proj_covar=tensor([0.0164, 0.0179, 0.0170, 0.0182, 0.0186, 0.0203, 0.0170, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:53:50,729 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.72 vs. limit=2.0 2022-11-16 07:53:53,099 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95678.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:54:03,922 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95695.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:54:29,651 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1743, 1.4258, 1.2798, 1.0288, 1.1920, 1.6551, 1.6817, 1.5003], device='cuda:0'), covar=tensor([0.1335, 0.1066, 0.2036, 0.2423, 0.1575, 0.1092, 0.1155, 0.1472], device='cuda:0'), in_proj_covar=tensor([0.0165, 0.0179, 0.0170, 0.0182, 0.0186, 0.0204, 0.0170, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:54:32,046 INFO [train.py:876] (0/4) Epoch 14, batch 1200, loss[loss=0.07661, simple_loss=0.1065, pruned_loss=0.02337, over 5167.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1335, pruned_loss=0.03519, over 1083539.61 frames. ], batch size: 8, lr: 5.82e-03, grad_scale: 16.0 2022-11-16 07:54:34,544 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.521e+01 1.381e+02 1.760e+02 2.068e+02 4.246e+02, threshold=3.521e+02, percent-clipped=4.0 2022-11-16 07:54:47,844 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95761.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 07:54:48,559 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.7685, 2.3585, 3.4129, 2.9888, 3.6075, 2.3826, 3.1603, 3.7919], device='cuda:0'), covar=tensor([0.0831, 0.1652, 0.0853, 0.1464, 0.0582, 0.1769, 0.1307, 0.0742], device='cuda:0'), in_proj_covar=tensor([0.0243, 0.0192, 0.0212, 0.0210, 0.0238, 0.0193, 0.0221, 0.0228], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:54:59,266 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1973, 2.9768, 3.6742, 1.9195, 3.2365, 3.6990, 3.7634, 4.0536], device='cuda:0'), covar=tensor([0.1957, 0.1604, 0.0765, 0.2909, 0.0651, 0.0732, 0.0495, 0.0643], device='cuda:0'), in_proj_covar=tensor([0.0163, 0.0177, 0.0169, 0.0181, 0.0185, 0.0202, 0.0169, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 07:55:05,786 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4930, 1.1290, 1.3766, 1.1440, 1.1006, 1.4132, 1.1541, 1.1558], device='cuda:0'), covar=tensor([0.0054, 0.0142, 0.0085, 0.0107, 0.0145, 0.0101, 0.0082, 0.0092], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0029, 0.0029, 0.0038, 0.0033, 0.0030, 0.0036, 0.0035], device='cuda:0'), out_proj_covar=tensor([2.8461e-05, 2.6961e-05, 2.6061e-05, 3.6157e-05, 3.0483e-05, 2.8451e-05, 3.4575e-05, 3.3334e-05], device='cuda:0') 2022-11-16 07:55:20,103 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=95809.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 07:55:22,110 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95812.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 07:55:29,035 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.6472, 5.2144, 5.4925, 5.1334, 5.7293, 5.5405, 4.9938, 5.7540], device='cuda:0'), covar=tensor([0.0387, 0.0329, 0.0396, 0.0331, 0.0388, 0.0201, 0.0190, 0.0246], device='cuda:0'), in_proj_covar=tensor([0.0145, 0.0156, 0.0110, 0.0146, 0.0185, 0.0113, 0.0129, 0.0155], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 07:55:38,696 INFO [train.py:876] (0/4) Epoch 14, batch 1300, loss[loss=0.08713, simple_loss=0.1246, pruned_loss=0.02482, over 5711.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1325, pruned_loss=0.03422, over 1086566.96 frames. ], batch size: 17, lr: 5.82e-03, grad_scale: 16.0 2022-11-16 07:55:41,904 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.562e+01 1.314e+02 1.675e+02 2.012e+02 3.727e+02, threshold=3.350e+02, percent-clipped=1.0 2022-11-16 07:55:43,116 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.91 vs. limit=2.0 2022-11-16 07:55:45,969 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95847.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:55:48,934 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2022-11-16 07:56:02,941 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95873.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:56:27,139 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=95908.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:56:32,276 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95916.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:56:37,639 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95924.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:56:43,898 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2022-11-16 07:56:46,581 INFO [train.py:876] (0/4) Epoch 14, batch 1400, loss[loss=0.1115, simple_loss=0.1393, pruned_loss=0.04183, over 5602.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1331, pruned_loss=0.03477, over 1081539.98 frames. ], batch size: 38, lr: 5.82e-03, grad_scale: 16.0 2022-11-16 07:56:49,504 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.691e+01 1.448e+02 1.696e+02 2.137e+02 4.589e+02, threshold=3.392e+02, percent-clipped=2.0 2022-11-16 07:57:04,970 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=95964.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:57:10,143 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=95972.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:57:10,812 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=95973.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:57:15,492 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=95980.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:57:25,798 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=95995.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:57:54,059 INFO [train.py:876] (0/4) Epoch 14, batch 1500, loss[loss=0.07118, simple_loss=0.1069, pruned_loss=0.01775, over 5429.00 frames. ], tot_loss[loss=0.09827, simple_loss=0.1311, pruned_loss=0.03272, over 1090610.61 frames. ], batch size: 11, lr: 5.81e-03, grad_scale: 16.0 2022-11-16 07:57:55,532 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3437, 1.6191, 1.6661, 1.6215, 1.4586, 2.2071, 1.8142, 1.4173], device='cuda:0'), covar=tensor([0.1900, 0.1554, 0.2003, 0.2389, 0.1805, 0.0808, 0.1376, 0.1962], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0102, 0.0103, 0.0103, 0.0076, 0.0072, 0.0082, 0.0094], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 07:57:56,680 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.465e+02 1.739e+02 2.113e+02 3.912e+02, threshold=3.478e+02, percent-clipped=1.0 2022-11-16 07:57:56,880 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96041.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:57:58,092 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96043.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:58:24,054 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96080.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:59:02,198 INFO [train.py:876] (0/4) Epoch 14, batch 1600, loss[loss=0.11, simple_loss=0.1382, pruned_loss=0.04087, over 5741.00 frames. ], tot_loss[loss=0.09957, simple_loss=0.132, pruned_loss=0.03358, over 1089193.13 frames. ], batch size: 27, lr: 5.81e-03, grad_scale: 16.0 2022-11-16 07:59:04,693 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.966e+01 1.432e+02 1.726e+02 2.177e+02 5.569e+02, threshold=3.453e+02, percent-clipped=4.0 2022-11-16 07:59:04,905 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96141.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:59:21,825 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96165.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 07:59:23,744 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96168.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 07:59:32,683 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2022-11-16 07:59:46,890 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96203.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 07:59:50,849 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9269, 1.3337, 1.8268, 1.6259, 1.6806, 1.7338, 1.7424, 1.6207], device='cuda:0'), covar=tensor([0.0065, 0.0156, 0.0080, 0.0081, 0.0113, 0.0100, 0.0055, 0.0064], device='cuda:0'), in_proj_covar=tensor([0.0032, 0.0030, 0.0030, 0.0039, 0.0033, 0.0030, 0.0037, 0.0036], device='cuda:0'), out_proj_covar=tensor([2.9254e-05, 2.8012e-05, 2.7032e-05, 3.6980e-05, 3.1035e-05, 2.8890e-05, 3.5173e-05, 3.4565e-05], device='cuda:0') 2022-11-16 07:59:56,810 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.51 vs. limit=2.0 2022-11-16 08:00:03,431 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96226.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:00:10,794 INFO [train.py:876] (0/4) Epoch 14, batch 1700, loss[loss=0.1124, simple_loss=0.1486, pruned_loss=0.03813, over 5702.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.133, pruned_loss=0.03458, over 1082292.13 frames. ], batch size: 34, lr: 5.81e-03, grad_scale: 16.0 2022-11-16 08:00:14,147 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.983e+01 1.431e+02 1.731e+02 2.193e+02 6.139e+02, threshold=3.462e+02, percent-clipped=4.0 2022-11-16 08:00:28,698 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4238, 4.3236, 4.5044, 4.5082, 4.1598, 3.9691, 4.9750, 4.4515], device='cuda:0'), covar=tensor([0.0356, 0.0782, 0.0336, 0.1141, 0.0447, 0.0367, 0.0539, 0.0605], device='cuda:0'), in_proj_covar=tensor([0.0090, 0.0110, 0.0096, 0.0123, 0.0090, 0.0082, 0.0147, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:00:36,049 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96273.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:00:45,758 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8962, 3.1075, 2.2970, 2.7069, 2.0359, 2.2820, 1.8327, 2.5190], device='cuda:0'), covar=tensor([0.1320, 0.0358, 0.1051, 0.0657, 0.1777, 0.1125, 0.1828, 0.0665], device='cuda:0'), in_proj_covar=tensor([0.0151, 0.0139, 0.0151, 0.0145, 0.0170, 0.0164, 0.0155, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:00:58,804 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6751, 1.3550, 1.7254, 1.1141, 1.9128, 1.9736, 1.1391, 1.4057], device='cuda:0'), covar=tensor([0.0595, 0.0847, 0.0804, 0.1249, 0.0686, 0.1084, 0.0745, 0.0470], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0026, 0.0019, 0.0021, 0.0018, 0.0017, 0.0025, 0.0017], device='cuda:0'), out_proj_covar=tensor([9.2532e-05, 1.2882e-04, 9.9499e-05, 1.1130e-04, 9.9422e-05, 9.2886e-05, 1.2338e-04, 9.2939e-05], device='cuda:0') 2022-11-16 08:01:07,784 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96321.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:01:10,178 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96324.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:01:12,938 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.92 vs. limit=2.0 2022-11-16 08:01:18,261 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96336.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:01:18,842 INFO [train.py:876] (0/4) Epoch 14, batch 1800, loss[loss=0.08812, simple_loss=0.1327, pruned_loss=0.02175, over 5817.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1333, pruned_loss=0.03458, over 1088915.10 frames. ], batch size: 21, lr: 5.80e-03, grad_scale: 16.0 2022-11-16 08:01:22,026 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.395e+01 1.406e+02 1.732e+02 2.199e+02 6.902e+02, threshold=3.464e+02, percent-clipped=4.0 2022-11-16 08:01:50,739 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96385.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:01:54,347 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96390.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:02:24,418 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7687, 1.1406, 0.8588, 0.9078, 1.0014, 1.1193, 0.6086, 1.1922], device='cuda:0'), covar=tensor([0.0117, 0.0057, 0.0104, 0.0084, 0.0079, 0.0062, 0.0117, 0.0072], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0061, 0.0060, 0.0066, 0.0063, 0.0059, 0.0057, 0.0056], device='cuda:0'), out_proj_covar=tensor([5.8662e-05, 5.3593e-05, 5.2223e-05, 5.7966e-05, 5.5914e-05, 5.0815e-05, 5.0242e-05, 4.8862e-05], device='cuda:0') 2022-11-16 08:02:25,303 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96436.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:02:25,903 INFO [train.py:876] (0/4) Epoch 14, batch 1900, loss[loss=0.09147, simple_loss=0.1244, pruned_loss=0.02927, over 5586.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1335, pruned_loss=0.03444, over 1089994.36 frames. ], batch size: 46, lr: 5.80e-03, grad_scale: 16.0 2022-11-16 08:02:29,415 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.308e+01 1.441e+02 1.720e+02 2.105e+02 7.193e+02, threshold=3.439e+02, percent-clipped=2.0 2022-11-16 08:02:35,460 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96451.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:02:44,500 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4484, 2.3836, 3.1383, 2.9047, 2.9874, 2.2736, 2.9341, 3.4725], device='cuda:0'), covar=tensor([0.0786, 0.1407, 0.0856, 0.1217, 0.0974, 0.1514, 0.1143, 0.0893], device='cuda:0'), in_proj_covar=tensor([0.0248, 0.0196, 0.0218, 0.0214, 0.0243, 0.0197, 0.0225, 0.0234], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:02:46,379 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96468.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:03:10,709 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96503.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:03:18,512 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3571, 4.1503, 4.2676, 3.9314, 4.3101, 4.0070, 1.6070, 4.4739], device='cuda:0'), covar=tensor([0.0244, 0.0303, 0.0291, 0.0266, 0.0270, 0.0362, 0.3057, 0.0246], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0091, 0.0088, 0.0083, 0.0102, 0.0091, 0.0132, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:03:19,134 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96516.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:03:22,351 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96521.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:03:25,607 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9556, 3.8311, 3.7734, 4.0443, 3.7463, 3.6347, 4.4121, 3.8239], device='cuda:0'), covar=tensor([0.0478, 0.0884, 0.0539, 0.1129, 0.0488, 0.0413, 0.0615, 0.0756], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0112, 0.0097, 0.0124, 0.0091, 0.0083, 0.0149, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:03:32,721 INFO [train.py:876] (0/4) Epoch 14, batch 2000, loss[loss=0.1125, simple_loss=0.1529, pruned_loss=0.03604, over 5758.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1345, pruned_loss=0.03557, over 1085310.67 frames. ], batch size: 21, lr: 5.80e-03, grad_scale: 16.0 2022-11-16 08:03:36,667 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.185e+01 1.342e+02 1.750e+02 2.213e+02 4.524e+02, threshold=3.499e+02, percent-clipped=3.0 2022-11-16 08:03:42,014 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9822, 5.2845, 3.8230, 2.2350, 5.0082, 2.2289, 4.9209, 3.1168], device='cuda:0'), covar=tensor([0.1229, 0.0124, 0.0448, 0.2079, 0.0149, 0.1497, 0.0141, 0.1162], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0104, 0.0115, 0.0111, 0.0102, 0.0118, 0.0100, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:03:43,303 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96551.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:04:40,274 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96636.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:04:40,791 INFO [train.py:876] (0/4) Epoch 14, batch 2100, loss[loss=0.06838, simple_loss=0.1153, pruned_loss=0.01074, over 5550.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1339, pruned_loss=0.03538, over 1075826.62 frames. ], batch size: 14, lr: 5.80e-03, grad_scale: 8.0 2022-11-16 08:04:45,309 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 1.415e+02 1.802e+02 2.261e+02 4.449e+02, threshold=3.604e+02, percent-clipped=2.0 2022-11-16 08:05:10,357 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96680.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:05:12,894 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96684.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:05:48,018 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96736.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:05:48,600 INFO [train.py:876] (0/4) Epoch 14, batch 2200, loss[loss=0.09539, simple_loss=0.1293, pruned_loss=0.03075, over 5546.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1335, pruned_loss=0.03525, over 1075687.65 frames. ], batch size: 40, lr: 5.79e-03, grad_scale: 8.0 2022-11-16 08:05:52,487 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.562e+01 1.449e+02 1.748e+02 2.179e+02 3.480e+02, threshold=3.495e+02, percent-clipped=0.0 2022-11-16 08:05:55,270 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=96746.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:06:15,846 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=96776.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:06:20,965 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96784.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:06:21,972 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.94 vs. limit=5.0 2022-11-16 08:06:46,714 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96821.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:06:54,533 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5231, 4.6118, 3.5550, 2.0944, 4.1829, 1.6562, 4.2284, 2.4204], device='cuda:0'), covar=tensor([0.1387, 0.0119, 0.0619, 0.1788, 0.0244, 0.1720, 0.0219, 0.1386], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0103, 0.0114, 0.0110, 0.0101, 0.0117, 0.0099, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:06:56,990 INFO [train.py:876] (0/4) Epoch 14, batch 2300, loss[loss=0.1019, simple_loss=0.143, pruned_loss=0.03034, over 5705.00 frames. ], tot_loss[loss=0.09979, simple_loss=0.1318, pruned_loss=0.03391, over 1084368.64 frames. ], batch size: 36, lr: 5.79e-03, grad_scale: 4.0 2022-11-16 08:06:57,160 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=96837.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:06:59,545 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.86 vs. limit=2.0 2022-11-16 08:07:01,480 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.903e+01 1.452e+02 1.773e+02 2.447e+02 7.176e+02, threshold=3.545e+02, percent-clipped=7.0 2022-11-16 08:07:18,439 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=96869.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:07:40,806 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.7671, 4.2892, 4.6124, 4.2835, 4.8418, 4.6577, 4.2743, 4.8472], device='cuda:0'), covar=tensor([0.0390, 0.0445, 0.0482, 0.0399, 0.0411, 0.0268, 0.0327, 0.0272], device='cuda:0'), in_proj_covar=tensor([0.0150, 0.0162, 0.0113, 0.0150, 0.0190, 0.0117, 0.0134, 0.0161], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 08:07:40,945 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6366, 3.6353, 3.5289, 3.1849, 2.0618, 3.6094, 2.2919, 3.1233], device='cuda:0'), covar=tensor([0.0433, 0.0186, 0.0187, 0.0370, 0.0595, 0.0179, 0.0549, 0.0230], device='cuda:0'), in_proj_covar=tensor([0.0196, 0.0183, 0.0182, 0.0209, 0.0196, 0.0185, 0.0195, 0.0189], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 08:08:04,678 INFO [train.py:876] (0/4) Epoch 14, batch 2400, loss[loss=0.1173, simple_loss=0.1372, pruned_loss=0.0487, over 5493.00 frames. ], tot_loss[loss=0.09954, simple_loss=0.1317, pruned_loss=0.03366, over 1086247.19 frames. ], batch size: 64, lr: 5.79e-03, grad_scale: 8.0 2022-11-16 08:08:09,597 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.192e+01 1.391e+02 1.679e+02 2.056e+02 5.016e+02, threshold=3.358e+02, percent-clipped=6.0 2022-11-16 08:08:12,315 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7188, 0.6502, 0.7455, 0.7295, 0.7855, 0.6101, 0.3445, 0.6813], device='cuda:0'), covar=tensor([0.0253, 0.0390, 0.0310, 0.0322, 0.0269, 0.0260, 0.0622, 0.0252], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0026, 0.0018, 0.0021, 0.0018, 0.0016, 0.0024, 0.0017], device='cuda:0'), out_proj_covar=tensor([9.1180e-05, 1.2791e-04, 9.8656e-05, 1.1120e-04, 9.8956e-05, 9.2369e-05, 1.2182e-04, 9.2210e-05], device='cuda:0') 2022-11-16 08:08:34,215 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=96980.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:08:48,041 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97000.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:09:06,577 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97028.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:09:12,706 INFO [train.py:876] (0/4) Epoch 14, batch 2500, loss[loss=0.0993, simple_loss=0.1334, pruned_loss=0.03261, over 5576.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1325, pruned_loss=0.03465, over 1079491.79 frames. ], batch size: 16, lr: 5.78e-03, grad_scale: 8.0 2022-11-16 08:09:17,248 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.842e+01 1.491e+02 1.726e+02 2.085e+02 3.787e+02, threshold=3.452e+02, percent-clipped=1.0 2022-11-16 08:09:18,709 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97046.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:09:29,248 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97061.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:09:51,070 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97094.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:10:16,525 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97132.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:10:17,216 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0224, 1.4120, 1.8583, 1.5022, 1.4029, 1.6991, 1.3668, 1.4810], device='cuda:0'), covar=tensor([0.0041, 0.0079, 0.0042, 0.0074, 0.0161, 0.0149, 0.0057, 0.0056], device='cuda:0'), in_proj_covar=tensor([0.0032, 0.0030, 0.0030, 0.0039, 0.0034, 0.0031, 0.0038, 0.0037], device='cuda:0'), out_proj_covar=tensor([2.9290e-05, 2.7865e-05, 2.7000e-05, 3.6965e-05, 3.1439e-05, 2.9553e-05, 3.6060e-05, 3.4847e-05], device='cuda:0') 2022-11-16 08:10:19,678 INFO [train.py:876] (0/4) Epoch 14, batch 2600, loss[loss=0.1364, simple_loss=0.1532, pruned_loss=0.05981, over 5697.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1326, pruned_loss=0.03503, over 1077619.45 frames. ], batch size: 34, lr: 5.78e-03, grad_scale: 8.0 2022-11-16 08:10:24,998 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.432e+01 1.490e+02 1.874e+02 2.362e+02 5.488e+02, threshold=3.748e+02, percent-clipped=3.0 2022-11-16 08:11:20,873 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.33 vs. limit=2.0 2022-11-16 08:11:27,612 INFO [train.py:876] (0/4) Epoch 14, batch 2700, loss[loss=0.1511, simple_loss=0.1527, pruned_loss=0.07482, over 4153.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1326, pruned_loss=0.03424, over 1081691.63 frames. ], batch size: 181, lr: 5.78e-03, grad_scale: 8.0 2022-11-16 08:11:32,090 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.527e+01 1.417e+02 1.707e+02 2.038e+02 4.656e+02, threshold=3.414e+02, percent-clipped=3.0 2022-11-16 08:11:43,152 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97260.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:12:24,359 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97321.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:12:33,103 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3544, 3.2860, 3.4240, 3.1010, 3.3797, 3.3136, 1.3782, 3.5241], device='cuda:0'), covar=tensor([0.0311, 0.0407, 0.0282, 0.0474, 0.0394, 0.0378, 0.3262, 0.0338], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0089, 0.0086, 0.0082, 0.0101, 0.0090, 0.0129, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:12:34,931 INFO [train.py:876] (0/4) Epoch 14, batch 2800, loss[loss=0.112, simple_loss=0.1485, pruned_loss=0.03777, over 5538.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1328, pruned_loss=0.03396, over 1086522.72 frames. ], batch size: 46, lr: 5.77e-03, grad_scale: 8.0 2022-11-16 08:12:38,285 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97342.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:12:39,405 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.283e+01 1.377e+02 1.617e+02 1.956e+02 4.684e+02, threshold=3.233e+02, percent-clipped=2.0 2022-11-16 08:12:41,607 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97347.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:12:47,483 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97356.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:12:57,858 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1966, 4.6512, 4.1906, 4.7490, 4.6330, 3.9412, 4.2157, 4.0696], device='cuda:0'), covar=tensor([0.0337, 0.0421, 0.1166, 0.0274, 0.0371, 0.0432, 0.0642, 0.0405], device='cuda:0'), in_proj_covar=tensor([0.0132, 0.0179, 0.0271, 0.0175, 0.0222, 0.0171, 0.0188, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:12:59,839 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97374.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:13:19,926 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97403.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 08:13:23,126 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97408.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:13:35,244 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.71 vs. limit=2.0 2022-11-16 08:13:39,491 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97432.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:13:41,856 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97435.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:13:43,017 INFO [train.py:876] (0/4) Epoch 14, batch 2900, loss[loss=0.113, simple_loss=0.1389, pruned_loss=0.0436, over 5533.00 frames. ], tot_loss[loss=0.09962, simple_loss=0.1318, pruned_loss=0.03373, over 1085118.41 frames. ], batch size: 21, lr: 5.77e-03, grad_scale: 8.0 2022-11-16 08:13:44,543 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1079, 2.6085, 3.7450, 3.4252, 4.1390, 2.7374, 3.5182, 4.2648], device='cuda:0'), covar=tensor([0.0627, 0.1770, 0.0807, 0.1248, 0.0435, 0.1640, 0.1386, 0.0581], device='cuda:0'), in_proj_covar=tensor([0.0245, 0.0192, 0.0215, 0.0210, 0.0240, 0.0194, 0.0225, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:13:47,951 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.266e+01 1.378e+02 1.705e+02 2.124e+02 3.777e+02, threshold=3.411e+02, percent-clipped=2.0 2022-11-16 08:13:59,164 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.34 vs. limit=5.0 2022-11-16 08:14:12,310 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97480.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:14:50,997 INFO [train.py:876] (0/4) Epoch 14, batch 3000, loss[loss=0.07799, simple_loss=0.1123, pruned_loss=0.02182, over 5452.00 frames. ], tot_loss[loss=0.09922, simple_loss=0.1316, pruned_loss=0.0334, over 1087397.85 frames. ], batch size: 10, lr: 5.77e-03, grad_scale: 8.0 2022-11-16 08:14:50,998 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 08:14:59,851 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5165, 4.4350, 4.4102, 3.9404, 2.3818, 4.7602, 2.9353, 4.1499], device='cuda:0'), covar=tensor([0.0355, 0.0116, 0.0134, 0.0372, 0.0679, 0.0143, 0.0498, 0.0146], device='cuda:0'), in_proj_covar=tensor([0.0198, 0.0187, 0.0185, 0.0211, 0.0200, 0.0188, 0.0196, 0.0190], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 08:15:01,966 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5468, 4.3371, 4.3469, 4.2466, 4.6022, 4.4009, 4.3696, 4.7326], device='cuda:0'), covar=tensor([0.0377, 0.0303, 0.0453, 0.0350, 0.0404, 0.0238, 0.0216, 0.0208], device='cuda:0'), in_proj_covar=tensor([0.0149, 0.0159, 0.0111, 0.0148, 0.0189, 0.0115, 0.0131, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 08:15:06,759 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4676, 3.7558, 3.0613, 3.5659, 2.9165, 2.7459, 2.2483, 3.2703], device='cuda:0'), covar=tensor([0.1022, 0.0209, 0.0810, 0.0232, 0.1022, 0.0834, 0.1631, 0.0337], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0141, 0.0154, 0.0148, 0.0173, 0.0166, 0.0157, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:15:08,554 INFO [train.py:908] (0/4) Epoch 14, validation: loss=0.178, simple_loss=0.188, pruned_loss=0.08395, over 1530663.00 frames. 2022-11-16 08:15:08,554 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 08:15:12,972 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.365e+01 1.436e+02 1.776e+02 2.242e+02 5.969e+02, threshold=3.553e+02, percent-clipped=3.0 2022-11-16 08:15:20,371 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97555.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:15:39,865 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. limit=2.0 2022-11-16 08:16:01,559 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97616.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:16:01,648 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97616.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:16:12,650 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.44 vs. limit=2.0 2022-11-16 08:16:14,406 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2374, 1.6447, 1.6684, 1.5491, 1.4994, 2.1560, 1.9956, 1.4939], device='cuda:0'), covar=tensor([0.2012, 0.1645, 0.1991, 0.2425, 0.2277, 0.1005, 0.1319, 0.2445], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0107, 0.0108, 0.0108, 0.0078, 0.0074, 0.0088, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 08:16:15,009 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5792, 4.3581, 4.6575, 4.4999, 4.1523, 4.0018, 5.0241, 4.5070], device='cuda:0'), covar=tensor([0.0423, 0.1051, 0.0327, 0.1218, 0.0502, 0.0361, 0.0668, 0.0678], device='cuda:0'), in_proj_covar=tensor([0.0090, 0.0112, 0.0097, 0.0125, 0.0090, 0.0084, 0.0148, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:16:15,589 INFO [train.py:876] (0/4) Epoch 14, batch 3100, loss[loss=0.09181, simple_loss=0.1288, pruned_loss=0.02739, over 5694.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1334, pruned_loss=0.03468, over 1087125.96 frames. ], batch size: 19, lr: 5.77e-03, grad_scale: 8.0 2022-11-16 08:16:16,375 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5504, 1.4068, 1.4222, 1.1445, 1.1986, 1.2842, 1.2547, 0.7090], device='cuda:0'), covar=tensor([0.0035, 0.0055, 0.0043, 0.0083, 0.0066, 0.0065, 0.0061, 0.0086], device='cuda:0'), in_proj_covar=tensor([0.0032, 0.0029, 0.0029, 0.0038, 0.0034, 0.0030, 0.0037, 0.0036], device='cuda:0'), out_proj_covar=tensor([2.9126e-05, 2.7479e-05, 2.6369e-05, 3.6547e-05, 3.1490e-05, 2.9179e-05, 3.5078e-05, 3.4453e-05], device='cuda:0') 2022-11-16 08:16:20,444 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.977e+01 1.456e+02 1.784e+02 2.207e+02 3.883e+02, threshold=3.567e+02, percent-clipped=1.0 2022-11-16 08:16:28,859 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97656.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:16:41,505 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97674.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 08:16:46,757 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7516, 3.7268, 3.6632, 3.3242, 2.0847, 3.8276, 2.3615, 3.2408], device='cuda:0'), covar=tensor([0.0454, 0.0224, 0.0172, 0.0326, 0.0655, 0.0171, 0.0578, 0.0215], device='cuda:0'), in_proj_covar=tensor([0.0199, 0.0187, 0.0185, 0.0210, 0.0200, 0.0188, 0.0196, 0.0190], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 08:16:57,837 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97698.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 08:17:01,157 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97703.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:17:01,768 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97704.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:17:19,788 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97730.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:17:23,174 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97735.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 08:17:24,289 INFO [train.py:876] (0/4) Epoch 14, batch 3200, loss[loss=0.09023, simple_loss=0.1284, pruned_loss=0.02604, over 5738.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1329, pruned_loss=0.03475, over 1083187.07 frames. ], batch size: 27, lr: 5.76e-03, grad_scale: 8.0 2022-11-16 08:17:29,198 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 1.455e+02 1.714e+02 2.116e+02 4.590e+02, threshold=3.428e+02, percent-clipped=2.0 2022-11-16 08:17:50,721 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.98 vs. limit=2.0 2022-11-16 08:17:54,441 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6419, 3.4167, 3.4497, 3.2260, 1.9479, 3.4764, 2.1967, 2.9365], device='cuda:0'), covar=tensor([0.0438, 0.0204, 0.0179, 0.0325, 0.0682, 0.0283, 0.0557, 0.0212], device='cuda:0'), in_proj_covar=tensor([0.0201, 0.0189, 0.0187, 0.0213, 0.0202, 0.0190, 0.0199, 0.0192], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 08:18:08,855 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2022-11-16 08:18:32,093 INFO [train.py:876] (0/4) Epoch 14, batch 3300, loss[loss=0.0964, simple_loss=0.1289, pruned_loss=0.03196, over 5465.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1319, pruned_loss=0.03451, over 1076716.17 frames. ], batch size: 12, lr: 5.76e-03, grad_scale: 8.0 2022-11-16 08:18:36,476 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.779e+01 1.482e+02 1.736e+02 2.155e+02 3.992e+02, threshold=3.473e+02, percent-clipped=3.0 2022-11-16 08:18:41,693 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9020, 2.8173, 3.4130, 4.2308, 4.4107, 3.6795, 3.2825, 4.5597], device='cuda:0'), covar=tensor([0.0388, 0.2934, 0.1910, 0.3211, 0.0986, 0.2571, 0.1906, 0.0593], device='cuda:0'), in_proj_covar=tensor([0.0260, 0.0197, 0.0187, 0.0295, 0.0227, 0.0200, 0.0191, 0.0247], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 08:18:46,811 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.67 vs. limit=5.0 2022-11-16 08:19:04,299 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0401, 3.2023, 2.4019, 1.6866, 2.9626, 1.2194, 2.9479, 1.6870], device='cuda:0'), covar=tensor([0.1428, 0.0242, 0.1031, 0.1978, 0.0371, 0.2266, 0.0374, 0.1782], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0104, 0.0115, 0.0112, 0.0103, 0.0119, 0.0100, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:19:06,975 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=97888.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:19:10,926 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0161, 1.6621, 1.8477, 1.5938, 1.3242, 1.7559, 1.6507, 1.5531], device='cuda:0'), covar=tensor([0.0038, 0.0077, 0.0058, 0.0075, 0.0104, 0.0125, 0.0051, 0.0061], device='cuda:0'), in_proj_covar=tensor([0.0031, 0.0029, 0.0029, 0.0037, 0.0033, 0.0030, 0.0036, 0.0035], device='cuda:0'), out_proj_covar=tensor([2.8552e-05, 2.6675e-05, 2.5775e-05, 3.5594e-05, 3.0733e-05, 2.8442e-05, 3.4581e-05, 3.3482e-05], device='cuda:0') 2022-11-16 08:19:18,083 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5667, 2.9146, 3.8221, 2.1202, 3.4833, 3.8204, 3.8278, 4.2162], device='cuda:0'), covar=tensor([0.1785, 0.1681, 0.0905, 0.2796, 0.0800, 0.1046, 0.0487, 0.0560], device='cuda:0'), in_proj_covar=tensor([0.0165, 0.0179, 0.0167, 0.0181, 0.0187, 0.0204, 0.0171, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:19:22,246 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=97911.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:19:25,491 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97916.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:19:37,422 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.63 vs. limit=2.0 2022-11-16 08:19:39,518 INFO [train.py:876] (0/4) Epoch 14, batch 3400, loss[loss=0.1058, simple_loss=0.1225, pruned_loss=0.04454, over 4132.00 frames. ], tot_loss[loss=0.09959, simple_loss=0.1311, pruned_loss=0.03405, over 1080057.29 frames. ], batch size: 181, lr: 5.76e-03, grad_scale: 8.0 2022-11-16 08:19:44,335 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 1.387e+02 1.696e+02 2.106e+02 3.635e+02, threshold=3.392e+02, percent-clipped=1.0 2022-11-16 08:19:47,856 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=97949.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:19:57,793 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-16 08:19:58,066 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=97964.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:20:02,547 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3890, 2.7688, 3.8829, 3.5253, 4.1965, 2.8717, 3.7984, 4.2958], device='cuda:0'), covar=tensor([0.0515, 0.1435, 0.0709, 0.1240, 0.0372, 0.1396, 0.1028, 0.0465], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0193, 0.0216, 0.0212, 0.0239, 0.0195, 0.0224, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:20:21,514 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=97998.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:20:23,164 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7248, 1.3376, 1.4921, 1.1785, 1.8393, 1.7571, 1.1511, 1.4837], device='cuda:0'), covar=tensor([0.0575, 0.0770, 0.0666, 0.0835, 0.0546, 0.0866, 0.0931, 0.0382], device='cuda:0'), in_proj_covar=tensor([0.0016, 0.0026, 0.0018, 0.0022, 0.0018, 0.0017, 0.0025, 0.0017], device='cuda:0'), out_proj_covar=tensor([9.2024e-05, 1.2959e-04, 9.9246e-05, 1.1174e-04, 9.9918e-05, 9.3477e-05, 1.2380e-04, 9.3066e-05], device='cuda:0') 2022-11-16 08:20:25,085 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98003.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:20:36,289 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. limit=2.0 2022-11-16 08:20:43,490 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98030.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:20:43,526 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98030.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:20:47,938 INFO [train.py:876] (0/4) Epoch 14, batch 3500, loss[loss=0.09471, simple_loss=0.1246, pruned_loss=0.0324, over 5188.00 frames. ], tot_loss[loss=0.0989, simple_loss=0.1308, pruned_loss=0.03351, over 1083462.91 frames. ], batch size: 91, lr: 5.75e-03, grad_scale: 8.0 2022-11-16 08:20:52,493 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.923e+01 1.343e+02 1.705e+02 2.352e+02 4.621e+02, threshold=3.411e+02, percent-clipped=6.0 2022-11-16 08:20:53,901 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98046.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:20:57,581 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98051.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:21:15,962 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98078.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:21:17,923 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3328, 4.2581, 2.8700, 4.0219, 3.3774, 3.0554, 2.3886, 3.6151], device='cuda:0'), covar=tensor([0.1521, 0.0272, 0.1110, 0.0401, 0.0734, 0.1004, 0.1932, 0.0404], device='cuda:0'), in_proj_covar=tensor([0.0152, 0.0140, 0.0154, 0.0147, 0.0173, 0.0165, 0.0156, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:21:34,722 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98106.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:21:35,652 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1406, 3.7144, 3.3619, 3.7399, 3.7295, 3.2256, 3.3253, 3.2572], device='cuda:0'), covar=tensor([0.1333, 0.0583, 0.1373, 0.0465, 0.0585, 0.0603, 0.0912, 0.0783], device='cuda:0'), in_proj_covar=tensor([0.0135, 0.0183, 0.0273, 0.0177, 0.0225, 0.0173, 0.0191, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:21:55,984 INFO [train.py:876] (0/4) Epoch 14, batch 3600, loss[loss=0.1135, simple_loss=0.1439, pruned_loss=0.0415, over 5035.00 frames. ], tot_loss[loss=0.09894, simple_loss=0.1311, pruned_loss=0.03337, over 1085096.46 frames. ], batch size: 109, lr: 5.75e-03, grad_scale: 8.0 2022-11-16 08:22:00,948 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.265e+01 1.372e+02 1.706e+02 2.197e+02 4.106e+02, threshold=3.412e+02, percent-clipped=3.0 2022-11-16 08:22:16,987 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98167.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:22:46,755 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98211.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:23:04,155 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2173, 4.5719, 4.0823, 4.5385, 4.5180, 3.8953, 4.1561, 3.9419], device='cuda:0'), covar=tensor([0.0369, 0.0432, 0.1380, 0.0471, 0.0476, 0.0605, 0.0751, 0.0868], device='cuda:0'), in_proj_covar=tensor([0.0136, 0.0184, 0.0276, 0.0179, 0.0226, 0.0175, 0.0192, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:23:04,737 INFO [train.py:876] (0/4) Epoch 14, batch 3700, loss[loss=0.0829, simple_loss=0.1188, pruned_loss=0.02352, over 5494.00 frames. ], tot_loss[loss=0.09978, simple_loss=0.1319, pruned_loss=0.03383, over 1082686.47 frames. ], batch size: 17, lr: 5.75e-03, grad_scale: 8.0 2022-11-16 08:23:06,846 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5036, 2.4431, 2.1440, 2.4902, 2.0788, 1.9443, 2.1264, 2.8120], device='cuda:0'), covar=tensor([0.1191, 0.1547, 0.2231, 0.1456, 0.1732, 0.1832, 0.1854, 0.1149], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0109, 0.0107, 0.0109, 0.0095, 0.0105, 0.0099, 0.0085], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 08:23:09,246 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.991e+01 1.391e+02 1.713e+02 2.053e+02 4.916e+02, threshold=3.427e+02, percent-clipped=4.0 2022-11-16 08:23:09,338 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98244.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:23:19,333 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98259.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:23:29,093 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9899, 3.1141, 2.7193, 3.2518, 2.7359, 3.1598, 3.1567, 3.4434], device='cuda:0'), covar=tensor([0.0964, 0.1014, 0.1574, 0.1155, 0.1249, 0.0775, 0.1044, 0.1765], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0109, 0.0107, 0.0109, 0.0095, 0.0105, 0.0099, 0.0086], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 08:23:38,634 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3376, 2.8340, 3.3179, 1.8289, 3.1687, 3.4594, 3.4766, 3.8869], device='cuda:0'), covar=tensor([0.2131, 0.1941, 0.1024, 0.3198, 0.0752, 0.1128, 0.0624, 0.0730], device='cuda:0'), in_proj_covar=tensor([0.0163, 0.0175, 0.0166, 0.0179, 0.0184, 0.0203, 0.0168, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:23:55,806 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.23 vs. limit=2.0 2022-11-16 08:24:05,961 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([6.1401, 5.6645, 5.9651, 5.4350, 6.2050, 5.9347, 5.2042, 6.1598], device='cuda:0'), covar=tensor([0.0311, 0.0370, 0.0363, 0.0508, 0.0292, 0.0231, 0.0260, 0.0304], device='cuda:0'), in_proj_covar=tensor([0.0148, 0.0158, 0.0110, 0.0148, 0.0188, 0.0116, 0.0132, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 08:24:08,793 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98330.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:24:13,986 INFO [train.py:876] (0/4) Epoch 14, batch 3800, loss[loss=0.09416, simple_loss=0.1244, pruned_loss=0.03196, over 5478.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1318, pruned_loss=0.0342, over 1084148.93 frames. ], batch size: 11, lr: 5.74e-03, grad_scale: 4.0 2022-11-16 08:24:19,594 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.532e+01 1.388e+02 1.687e+02 2.091e+02 4.683e+02, threshold=3.374e+02, percent-clipped=2.0 2022-11-16 08:24:30,193 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.21 vs. limit=2.0 2022-11-16 08:24:42,604 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98378.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 08:25:04,760 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98410.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:25:22,856 INFO [train.py:876] (0/4) Epoch 14, batch 3900, loss[loss=0.09959, simple_loss=0.1422, pruned_loss=0.02851, over 5537.00 frames. ], tot_loss[loss=0.09995, simple_loss=0.132, pruned_loss=0.03393, over 1081895.07 frames. ], batch size: 30, lr: 5.74e-03, grad_scale: 4.0 2022-11-16 08:25:27,987 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.450e+01 1.480e+02 1.725e+02 2.158e+02 4.236e+02, threshold=3.450e+02, percent-clipped=3.0 2022-11-16 08:25:39,863 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98462.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:25:45,715 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98471.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:26:24,188 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.66 vs. limit=2.0 2022-11-16 08:26:26,552 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3033, 3.3530, 3.4330, 3.0183, 3.3709, 3.2701, 1.3633, 3.5587], device='cuda:0'), covar=tensor([0.0298, 0.0282, 0.0300, 0.0386, 0.0333, 0.0370, 0.2986, 0.0339], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0089, 0.0087, 0.0082, 0.0103, 0.0090, 0.0129, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:26:27,294 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5555, 2.0905, 3.1958, 2.6945, 3.3429, 2.1372, 3.0709, 3.5745], device='cuda:0'), covar=tensor([0.0665, 0.1722, 0.1001, 0.1702, 0.0787, 0.1722, 0.1259, 0.0793], device='cuda:0'), in_proj_covar=tensor([0.0243, 0.0192, 0.0216, 0.0211, 0.0240, 0.0196, 0.0223, 0.0231], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:26:30,065 INFO [train.py:876] (0/4) Epoch 14, batch 4000, loss[loss=0.1081, simple_loss=0.1463, pruned_loss=0.03497, over 5299.00 frames. ], tot_loss[loss=0.09889, simple_loss=0.1315, pruned_loss=0.03312, over 1085308.29 frames. ], batch size: 79, lr: 5.74e-03, grad_scale: 8.0 2022-11-16 08:26:34,187 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98543.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:26:34,792 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98544.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:26:35,247 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.016e+02 1.413e+02 1.702e+02 2.140e+02 3.638e+02, threshold=3.404e+02, percent-clipped=2.0 2022-11-16 08:27:07,305 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98592.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:27:15,815 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98604.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:27:22,537 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.5163, 5.7488, 4.0337, 2.5092, 5.3392, 2.6473, 5.4389, 3.0928], device='cuda:0'), covar=tensor([0.1021, 0.0090, 0.0535, 0.1778, 0.0147, 0.1421, 0.0142, 0.1244], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0103, 0.0115, 0.0111, 0.0103, 0.0118, 0.0099, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:27:31,708 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4033, 1.7698, 1.3745, 1.2122, 1.5880, 1.4848, 1.0737, 1.6114], device='cuda:0'), covar=tensor([0.0072, 0.0043, 0.0070, 0.0072, 0.0071, 0.0058, 0.0094, 0.0063], device='cuda:0'), in_proj_covar=tensor([0.0066, 0.0061, 0.0060, 0.0066, 0.0064, 0.0059, 0.0057, 0.0055], device='cuda:0'), out_proj_covar=tensor([5.8728e-05, 5.4024e-05, 5.2436e-05, 5.7666e-05, 5.6464e-05, 5.1395e-05, 5.0624e-05, 4.8153e-05], device='cuda:0') 2022-11-16 08:27:37,421 INFO [train.py:876] (0/4) Epoch 14, batch 4100, loss[loss=0.1082, simple_loss=0.1458, pruned_loss=0.03533, over 5589.00 frames. ], tot_loss[loss=0.09926, simple_loss=0.1314, pruned_loss=0.03358, over 1085634.38 frames. ], batch size: 24, lr: 5.74e-03, grad_scale: 8.0 2022-11-16 08:27:42,932 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.009e+01 1.414e+02 1.742e+02 2.183e+02 4.032e+02, threshold=3.484e+02, percent-clipped=2.0 2022-11-16 08:28:10,423 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.02 vs. limit=2.0 2022-11-16 08:28:20,124 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2022-11-16 08:28:45,154 INFO [train.py:876] (0/4) Epoch 14, batch 4200, loss[loss=0.08028, simple_loss=0.1237, pruned_loss=0.01845, over 5572.00 frames. ], tot_loss[loss=0.09789, simple_loss=0.1307, pruned_loss=0.03253, over 1088062.73 frames. ], batch size: 15, lr: 5.73e-03, grad_scale: 8.0 2022-11-16 08:28:50,363 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.013e+02 1.341e+02 1.638e+02 2.138e+02 3.541e+02, threshold=3.276e+02, percent-clipped=2.0 2022-11-16 08:29:01,474 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=98762.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:29:04,377 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98766.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:29:17,528 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2953, 1.6682, 1.2179, 1.2582, 1.5487, 1.3565, 1.0301, 1.4987], device='cuda:0'), covar=tensor([0.0081, 0.0051, 0.0084, 0.0076, 0.0071, 0.0058, 0.0103, 0.0099], device='cuda:0'), in_proj_covar=tensor([0.0068, 0.0062, 0.0061, 0.0066, 0.0065, 0.0060, 0.0058, 0.0056], device='cuda:0'), out_proj_covar=tensor([6.0020e-05, 5.4691e-05, 5.3168e-05, 5.8457e-05, 5.7485e-05, 5.2207e-05, 5.1621e-05, 4.9054e-05], device='cuda:0') 2022-11-16 08:29:19,519 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9562, 1.8274, 2.2098, 1.8607, 1.4676, 2.5671, 2.2469, 1.8427], device='cuda:0'), covar=tensor([0.1217, 0.1739, 0.1303, 0.2390, 0.2973, 0.0686, 0.1161, 0.1989], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0107, 0.0107, 0.0108, 0.0080, 0.0073, 0.0089, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 08:29:34,472 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=98810.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:29:40,115 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.7160, 1.8468, 1.9130, 1.5571, 1.9407, 1.8491, 1.0099, 1.9872], device='cuda:0'), covar=tensor([0.0425, 0.0473, 0.0395, 0.0556, 0.0450, 0.0515, 0.2416, 0.0461], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0090, 0.0088, 0.0083, 0.0103, 0.0091, 0.0130, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:29:53,230 INFO [train.py:876] (0/4) Epoch 14, batch 4300, loss[loss=0.1264, simple_loss=0.1538, pruned_loss=0.0495, over 5637.00 frames. ], tot_loss[loss=0.09879, simple_loss=0.1312, pruned_loss=0.03317, over 1085295.65 frames. ], batch size: 29, lr: 5.73e-03, grad_scale: 8.0 2022-11-16 08:29:54,387 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.57 vs. limit=5.0 2022-11-16 08:29:58,787 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.664e+01 1.356e+02 1.673e+02 1.998e+02 3.650e+02, threshold=3.347e+02, percent-clipped=3.0 2022-11-16 08:29:59,768 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.31 vs. limit=5.0 2022-11-16 08:30:15,458 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3248, 2.7228, 3.0090, 1.6788, 3.0281, 3.2900, 3.1807, 3.5791], device='cuda:0'), covar=tensor([0.1935, 0.1938, 0.1284, 0.3178, 0.1070, 0.1287, 0.0790, 0.0769], device='cuda:0'), in_proj_covar=tensor([0.0163, 0.0178, 0.0169, 0.0182, 0.0187, 0.0205, 0.0171, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:30:28,752 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98890.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:30:31,969 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98895.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:30:34,850 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=98899.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:30:52,975 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5573, 4.7367, 3.6292, 1.8380, 4.3730, 1.6382, 4.2975, 2.4786], device='cuda:0'), covar=tensor([0.1580, 0.0125, 0.0539, 0.2232, 0.0211, 0.1886, 0.0413, 0.1481], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0104, 0.0116, 0.0112, 0.0104, 0.0120, 0.0100, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:31:00,054 INFO [train.py:876] (0/4) Epoch 14, batch 4400, loss[loss=0.07456, simple_loss=0.1133, pruned_loss=0.0179, over 5538.00 frames. ], tot_loss[loss=0.09845, simple_loss=0.131, pruned_loss=0.03293, over 1088052.86 frames. ], batch size: 16, lr: 5.73e-03, grad_scale: 8.0 2022-11-16 08:31:05,616 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.027e+01 1.520e+02 1.757e+02 2.071e+02 5.109e+02, threshold=3.514e+02, percent-clipped=4.0 2022-11-16 08:31:09,735 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98951.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:31:13,107 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=98956.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:31:23,416 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=98971.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:31:34,419 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2670, 4.0689, 4.0697, 3.7586, 4.2024, 4.0373, 1.8840, 4.4707], device='cuda:0'), covar=tensor([0.0265, 0.0389, 0.0450, 0.0474, 0.0390, 0.0412, 0.2934, 0.0326], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0089, 0.0087, 0.0082, 0.0102, 0.0090, 0.0130, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:32:04,685 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99032.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:32:07,917 INFO [train.py:876] (0/4) Epoch 14, batch 4500, loss[loss=0.07982, simple_loss=0.1187, pruned_loss=0.02046, over 5750.00 frames. ], tot_loss[loss=0.09828, simple_loss=0.1309, pruned_loss=0.03282, over 1085992.41 frames. ], batch size: 20, lr: 5.72e-03, grad_scale: 8.0 2022-11-16 08:32:13,104 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.350e+01 1.443e+02 1.649e+02 2.186e+02 4.322e+02, threshold=3.298e+02, percent-clipped=3.0 2022-11-16 08:32:17,534 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99051.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:32:27,871 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99066.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:32:43,237 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0624, 2.3433, 2.9930, 3.9191, 3.8597, 2.9862, 2.6922, 3.8724], device='cuda:0'), covar=tensor([0.0688, 0.3374, 0.2215, 0.2337, 0.1115, 0.3071, 0.2321, 0.0928], device='cuda:0'), in_proj_covar=tensor([0.0258, 0.0200, 0.0186, 0.0295, 0.0224, 0.0200, 0.0189, 0.0250], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 08:32:58,458 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99112.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:33:00,365 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99114.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:33:10,385 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2870, 2.6991, 3.0029, 2.6856, 1.8224, 2.8721, 2.0015, 2.5031], device='cuda:0'), covar=tensor([0.0415, 0.0243, 0.0194, 0.0361, 0.0587, 0.0235, 0.0518, 0.0213], device='cuda:0'), in_proj_covar=tensor([0.0198, 0.0188, 0.0185, 0.0213, 0.0200, 0.0186, 0.0196, 0.0191], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 08:33:16,012 INFO [train.py:876] (0/4) Epoch 14, batch 4600, loss[loss=0.07995, simple_loss=0.1129, pruned_loss=0.02348, over 5746.00 frames. ], tot_loss[loss=0.09759, simple_loss=0.1306, pruned_loss=0.0323, over 1088277.58 frames. ], batch size: 15, lr: 5.72e-03, grad_scale: 8.0 2022-11-16 08:33:17,556 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.20 vs. limit=2.0 2022-11-16 08:33:18,970 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2022-11-16 08:33:21,170 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.692e+01 1.360e+02 1.705e+02 2.390e+02 5.580e+02, threshold=3.409e+02, percent-clipped=5.0 2022-11-16 08:33:34,913 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3044, 1.9039, 2.7742, 1.6056, 2.2688, 2.3964, 2.1222, 1.3216], device='cuda:0'), covar=tensor([0.0779, 0.0519, 0.0422, 0.1293, 0.1175, 0.2265, 0.0317, 0.0970], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0027, 0.0019, 0.0022, 0.0019, 0.0017, 0.0025, 0.0017], device='cuda:0'), out_proj_covar=tensor([9.5230e-05, 1.3359e-04, 1.0172e-04, 1.1455e-04, 1.0240e-04, 9.6643e-05, 1.2541e-04, 9.5584e-05], device='cuda:0') 2022-11-16 08:33:51,195 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-16 08:33:57,232 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99199.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:34:00,780 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2202, 3.0543, 3.2223, 1.7147, 3.0017, 3.2920, 3.2069, 3.6305], device='cuda:0'), covar=tensor([0.1859, 0.1487, 0.1167, 0.2755, 0.0588, 0.1632, 0.0560, 0.0743], device='cuda:0'), in_proj_covar=tensor([0.0161, 0.0175, 0.0166, 0.0180, 0.0184, 0.0204, 0.0171, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:34:17,424 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99229.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:34:22,317 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3929, 1.9000, 2.1741, 1.7836, 1.7874, 2.0328, 1.8955, 1.6049], device='cuda:0'), covar=tensor([0.0029, 0.0069, 0.0038, 0.0056, 0.0108, 0.0107, 0.0052, 0.0063], device='cuda:0'), in_proj_covar=tensor([0.0033, 0.0030, 0.0031, 0.0040, 0.0035, 0.0031, 0.0039, 0.0038], device='cuda:0'), out_proj_covar=tensor([3.0716e-05, 2.8355e-05, 2.7762e-05, 3.7871e-05, 3.2760e-05, 2.9692e-05, 3.6910e-05, 3.5892e-05], device='cuda:0') 2022-11-16 08:34:22,858 INFO [train.py:876] (0/4) Epoch 14, batch 4700, loss[loss=0.08974, simple_loss=0.127, pruned_loss=0.02626, over 5497.00 frames. ], tot_loss[loss=0.09966, simple_loss=0.1316, pruned_loss=0.03385, over 1080562.44 frames. ], batch size: 12, lr: 5.72e-03, grad_scale: 8.0 2022-11-16 08:34:28,048 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.829e+01 1.395e+02 1.659e+02 2.125e+02 3.836e+02, threshold=3.317e+02, percent-clipped=3.0 2022-11-16 08:34:28,802 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99246.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:34:29,429 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99247.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:34:32,128 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99251.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:34:34,809 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2826, 4.2311, 4.2193, 3.8554, 4.2293, 4.0891, 1.6876, 4.5809], device='cuda:0'), covar=tensor([0.0233, 0.0325, 0.0262, 0.0365, 0.0253, 0.0391, 0.2792, 0.0249], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0090, 0.0088, 0.0083, 0.0103, 0.0091, 0.0131, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:34:43,504 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99268.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:34:58,194 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99290.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:35:21,407 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99324.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:35:23,239 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2070, 2.9807, 3.0651, 2.7993, 3.2484, 3.0878, 3.0247, 3.2228], device='cuda:0'), covar=tensor([0.0418, 0.0552, 0.0509, 0.0571, 0.0473, 0.0312, 0.0481, 0.0590], device='cuda:0'), in_proj_covar=tensor([0.0150, 0.0158, 0.0110, 0.0147, 0.0189, 0.0116, 0.0131, 0.0158], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 08:35:23,246 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99327.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:35:24,657 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99329.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:35:29,929 INFO [train.py:876] (0/4) Epoch 14, batch 4800, loss[loss=0.1108, simple_loss=0.1378, pruned_loss=0.04188, over 5553.00 frames. ], tot_loss[loss=0.09669, simple_loss=0.1295, pruned_loss=0.03192, over 1083056.47 frames. ], batch size: 15, lr: 5.72e-03, grad_scale: 8.0 2022-11-16 08:35:35,128 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.609e+01 1.410e+02 1.721e+02 2.085e+02 4.264e+02, threshold=3.442e+02, percent-clipped=4.0 2022-11-16 08:35:52,005 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2803, 3.8290, 2.7989, 1.8129, 3.5484, 1.5343, 3.5078, 1.9433], device='cuda:0'), covar=tensor([0.1447, 0.0183, 0.0959, 0.1937, 0.0269, 0.1907, 0.0365, 0.1576], device='cuda:0'), in_proj_covar=tensor([0.0115, 0.0102, 0.0113, 0.0109, 0.0101, 0.0116, 0.0097, 0.0107], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:36:02,035 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99385.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:36:02,714 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2557, 1.7618, 2.1073, 2.0609, 2.2946, 1.5261, 2.0258, 2.1613], device='cuda:0'), covar=tensor([0.0530, 0.1067, 0.0620, 0.0585, 0.0616, 0.1245, 0.0666, 0.0641], device='cuda:0'), in_proj_covar=tensor([0.0242, 0.0192, 0.0214, 0.0213, 0.0242, 0.0196, 0.0223, 0.0233], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:36:05,725 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8676, 3.4859, 3.1493, 3.4454, 3.4875, 3.1705, 3.0390, 3.1751], device='cuda:0'), covar=tensor([0.1601, 0.0555, 0.1402, 0.0543, 0.0517, 0.0485, 0.0911, 0.0751], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0184, 0.0276, 0.0178, 0.0225, 0.0176, 0.0191, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:36:17,335 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99407.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:36:22,612 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1065, 2.6848, 3.5304, 2.1850, 1.9280, 3.3278, 2.9952, 2.4806], device='cuda:0'), covar=tensor([0.0648, 0.1137, 0.0336, 0.2507, 0.1564, 0.2805, 0.0816, 0.0952], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0109, 0.0107, 0.0107, 0.0081, 0.0074, 0.0089, 0.0099], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 08:36:24,787 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.72 vs. limit=2.0 2022-11-16 08:36:25,859 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99420.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:36:29,787 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.2570, 3.1821, 2.8656, 3.1625, 3.2092, 2.8649, 2.8396, 2.9985], device='cuda:0'), covar=tensor([0.0281, 0.0601, 0.1321, 0.0514, 0.0526, 0.0521, 0.1017, 0.0661], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0184, 0.0275, 0.0178, 0.0225, 0.0175, 0.0191, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:36:37,295 INFO [train.py:876] (0/4) Epoch 14, batch 4900, loss[loss=0.09685, simple_loss=0.1272, pruned_loss=0.03324, over 5126.00 frames. ], tot_loss[loss=0.09719, simple_loss=0.1296, pruned_loss=0.03237, over 1080703.43 frames. ], batch size: 7, lr: 5.71e-03, grad_scale: 8.0 2022-11-16 08:36:43,021 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.258e+01 1.532e+02 1.838e+02 2.274e+02 5.384e+02, threshold=3.676e+02, percent-clipped=5.0 2022-11-16 08:36:45,780 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.8342, 0.9376, 1.0267, 0.9316, 0.7545, 0.8524, 0.9159, 0.7606], device='cuda:0'), covar=tensor([0.0042, 0.0036, 0.0032, 0.0038, 0.0050, 0.0042, 0.0051, 0.0068], device='cuda:0'), in_proj_covar=tensor([0.0033, 0.0030, 0.0031, 0.0040, 0.0035, 0.0031, 0.0039, 0.0038], device='cuda:0'), out_proj_covar=tensor([3.0672e-05, 2.8450e-05, 2.7804e-05, 3.7965e-05, 3.2643e-05, 2.9833e-05, 3.7046e-05, 3.5906e-05], device='cuda:0') 2022-11-16 08:36:50,083 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6256, 2.6496, 2.3836, 2.6727, 2.2516, 2.1584, 2.4980, 2.9526], device='cuda:0'), covar=tensor([0.1186, 0.1292, 0.1778, 0.1047, 0.1598, 0.1295, 0.1466, 0.1041], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0109, 0.0107, 0.0110, 0.0095, 0.0106, 0.0098, 0.0086], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0003], device='cuda:0') 2022-11-16 08:36:58,586 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99468.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 08:37:07,066 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99481.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:37:19,250 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6467, 2.0000, 1.8985, 1.3130, 1.8935, 2.2536, 2.0138, 2.2291], device='cuda:0'), covar=tensor([0.1798, 0.1704, 0.1931, 0.2881, 0.1395, 0.1245, 0.0820, 0.1380], device='cuda:0'), in_proj_covar=tensor([0.0161, 0.0176, 0.0167, 0.0181, 0.0185, 0.0205, 0.0171, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:37:39,773 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99529.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:37:44,759 INFO [train.py:876] (0/4) Epoch 14, batch 5000, loss[loss=0.07539, simple_loss=0.1113, pruned_loss=0.01974, over 5703.00 frames. ], tot_loss[loss=0.09796, simple_loss=0.1303, pruned_loss=0.03281, over 1085492.67 frames. ], batch size: 28, lr: 5.71e-03, grad_scale: 8.0 2022-11-16 08:37:50,173 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.179e+01 1.461e+02 1.855e+02 2.293e+02 4.970e+02, threshold=3.710e+02, percent-clipped=2.0 2022-11-16 08:37:51,239 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99546.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:37:54,686 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99551.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:38:06,366 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99568.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:38:17,357 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99585.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:38:20,025 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99589.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:38:23,144 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99594.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:38:26,734 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99599.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:38:44,689 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99624.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:38:46,679 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99627.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:38:48,060 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99629.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:38:52,866 INFO [train.py:876] (0/4) Epoch 14, batch 5100, loss[loss=0.1549, simple_loss=0.1697, pruned_loss=0.07004, over 5289.00 frames. ], tot_loss[loss=0.0989, simple_loss=0.1314, pruned_loss=0.0332, over 1090571.19 frames. ], batch size: 79, lr: 5.71e-03, grad_scale: 8.0 2022-11-16 08:38:58,065 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.215e+01 1.466e+02 1.636e+02 2.038e+02 3.411e+02, threshold=3.271e+02, percent-clipped=0.0 2022-11-16 08:39:01,520 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99650.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:39:19,179 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99675.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:39:22,545 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99680.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:39:22,615 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=99680.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:39:40,061 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99707.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:40:00,861 INFO [train.py:876] (0/4) Epoch 14, batch 5200, loss[loss=0.1339, simple_loss=0.1519, pruned_loss=0.05794, over 5374.00 frames. ], tot_loss[loss=0.09812, simple_loss=0.1307, pruned_loss=0.03276, over 1091079.28 frames. ], batch size: 70, lr: 5.70e-03, grad_scale: 8.0 2022-11-16 08:40:03,570 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=99741.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:40:05,955 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.036e+02 1.374e+02 1.799e+02 2.301e+02 6.123e+02, threshold=3.597e+02, percent-clipped=6.0 2022-11-16 08:40:12,567 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99755.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:40:12,684 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.4695, 1.4092, 1.5487, 1.2799, 1.5903, 1.8311, 1.5656, 1.5560], device='cuda:0'), covar=tensor([0.0099, 0.0081, 0.0085, 0.0085, 0.0075, 0.0069, 0.0082, 0.0071], device='cuda:0'), in_proj_covar=tensor([0.0069, 0.0063, 0.0062, 0.0068, 0.0066, 0.0061, 0.0059, 0.0057], device='cuda:0'), out_proj_covar=tensor([6.1354e-05, 5.5791e-05, 5.4340e-05, 6.0100e-05, 5.8231e-05, 5.3020e-05, 5.2035e-05, 4.9908e-05], device='cuda:0') 2022-11-16 08:40:26,773 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99776.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:40:35,340 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5611, 1.7758, 1.8249, 1.8027, 1.5452, 2.5156, 2.0767, 1.5787], device='cuda:0'), covar=tensor([0.1590, 0.1665, 0.1667, 0.2381, 0.2185, 0.0745, 0.1349, 0.1897], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0108, 0.0107, 0.0107, 0.0080, 0.0074, 0.0089, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 08:40:59,385 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99824.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 08:41:03,675 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([6.0675, 5.7095, 5.8225, 5.4769, 6.1782, 6.0390, 4.9714, 6.1683], device='cuda:0'), covar=tensor([0.0496, 0.0335, 0.0598, 0.0375, 0.0356, 0.0227, 0.0307, 0.0241], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0161, 0.0114, 0.0150, 0.0193, 0.0119, 0.0133, 0.0161], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 08:41:08,868 INFO [train.py:876] (0/4) Epoch 14, batch 5300, loss[loss=0.08588, simple_loss=0.1203, pruned_loss=0.02575, over 5593.00 frames. ], tot_loss[loss=0.0975, simple_loss=0.1305, pruned_loss=0.03224, over 1092258.67 frames. ], batch size: 18, lr: 5.70e-03, grad_scale: 8.0 2022-11-16 08:41:14,413 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.304e+01 1.298e+02 1.618e+02 1.971e+02 5.007e+02, threshold=3.235e+02, percent-clipped=2.0 2022-11-16 08:41:19,328 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2022-11-16 08:41:27,015 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4815, 3.2848, 3.2769, 3.1540, 2.0148, 3.3550, 2.2123, 2.8954], device='cuda:0'), covar=tensor([0.0394, 0.0234, 0.0220, 0.0312, 0.0623, 0.0189, 0.0574, 0.0268], device='cuda:0'), in_proj_covar=tensor([0.0197, 0.0186, 0.0184, 0.0211, 0.0200, 0.0186, 0.0195, 0.0189], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 08:41:40,966 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99885.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:42:07,322 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99924.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:42:07,371 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99924.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:42:13,192 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99933.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:42:16,093 INFO [train.py:876] (0/4) Epoch 14, batch 5400, loss[loss=0.07732, simple_loss=0.1148, pruned_loss=0.01991, over 5619.00 frames. ], tot_loss[loss=0.09755, simple_loss=0.1302, pruned_loss=0.03247, over 1088879.86 frames. ], batch size: 23, lr: 5.70e-03, grad_scale: 8.0 2022-11-16 08:42:21,979 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.513e+01 1.478e+02 1.709e+02 2.137e+02 3.244e+02, threshold=3.418e+02, percent-clipped=1.0 2022-11-16 08:42:22,085 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=99945.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:42:26,519 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5291, 1.2228, 1.3208, 1.0640, 1.4950, 1.8001, 0.8417, 1.3337], device='cuda:0'), covar=tensor([0.0404, 0.0606, 0.0810, 0.0676, 0.0604, 0.0270, 0.0733, 0.0431], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0027, 0.0019, 0.0022, 0.0019, 0.0017, 0.0025, 0.0018], device='cuda:0'), out_proj_covar=tensor([9.5510e-05, 1.3418e-04, 1.0204e-04, 1.1366e-04, 1.0258e-04, 9.5973e-05, 1.2526e-04, 9.5933e-05], device='cuda:0') 2022-11-16 08:42:40,283 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=99972.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:42:45,521 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=99980.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:42:59,392 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-100000.pt 2022-11-16 08:43:07,456 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.2846, 4.7876, 4.3302, 4.7960, 4.7728, 3.9756, 4.3933, 4.0603], device='cuda:0'), covar=tensor([0.0350, 0.0472, 0.1324, 0.0371, 0.0390, 0.0496, 0.0611, 0.0587], device='cuda:0'), in_proj_covar=tensor([0.0135, 0.0185, 0.0277, 0.0179, 0.0226, 0.0177, 0.0193, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0003, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:43:08,155 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=100007.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:43:21,944 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100028.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:43:27,252 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=100036.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:43:27,828 INFO [train.py:876] (0/4) Epoch 14, batch 5500, loss[loss=0.07989, simple_loss=0.1144, pruned_loss=0.02267, over 5576.00 frames. ], tot_loss[loss=0.09772, simple_loss=0.1307, pruned_loss=0.03236, over 1088952.53 frames. ], batch size: 22, lr: 5.70e-03, grad_scale: 8.0 2022-11-16 08:43:32,933 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 1.432e+02 1.744e+02 2.287e+02 4.720e+02, threshold=3.489e+02, percent-clipped=3.0 2022-11-16 08:43:49,320 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=100068.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 08:43:54,517 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100076.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:44:04,444 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6512, 1.6892, 2.1221, 1.7090, 1.3777, 2.6443, 2.1114, 1.7940], device='cuda:0'), covar=tensor([0.1623, 0.2179, 0.1673, 0.2764, 0.3399, 0.0694, 0.1827, 0.2375], device='cuda:0'), in_proj_covar=tensor([0.0117, 0.0108, 0.0107, 0.0106, 0.0080, 0.0074, 0.0089, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 08:44:20,530 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.6843, 0.5722, 0.7393, 0.6392, 0.7814, 0.5736, 0.3848, 0.6203], device='cuda:0'), covar=tensor([0.0361, 0.0572, 0.0481, 0.0505, 0.0399, 0.0419, 0.0879, 0.0389], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0026, 0.0019, 0.0022, 0.0018, 0.0017, 0.0024, 0.0017], device='cuda:0'), out_proj_covar=tensor([9.3878e-05, 1.3177e-04, 1.0025e-04, 1.1223e-04, 1.0066e-04, 9.4507e-05, 1.2313e-04, 9.4719e-05], device='cuda:0') 2022-11-16 08:44:27,260 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100124.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:44:27,344 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100124.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:44:35,691 INFO [train.py:876] (0/4) Epoch 14, batch 5600, loss[loss=0.1733, simple_loss=0.1645, pruned_loss=0.09107, over 3084.00 frames. ], tot_loss[loss=0.09817, simple_loss=0.1307, pruned_loss=0.03284, over 1079900.88 frames. ], batch size: 284, lr: 5.69e-03, grad_scale: 8.0 2022-11-16 08:44:40,974 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.553e+01 1.416e+02 1.691e+02 1.981e+02 4.410e+02, threshold=3.382e+02, percent-clipped=1.0 2022-11-16 08:44:42,446 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1780, 1.6019, 1.2683, 1.2084, 1.5408, 1.4208, 1.1767, 1.5164], device='cuda:0'), covar=tensor([0.0078, 0.0049, 0.0072, 0.0074, 0.0066, 0.0060, 0.0099, 0.0068], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0064, 0.0063, 0.0069, 0.0067, 0.0062, 0.0060, 0.0058], device='cuda:0'), out_proj_covar=tensor([6.2458e-05, 5.6554e-05, 5.5249e-05, 6.1052e-05, 5.9486e-05, 5.3520e-05, 5.2887e-05, 5.0380e-05], device='cuda:0') 2022-11-16 08:44:59,693 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100172.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 08:45:06,414 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.29 vs. limit=5.0 2022-11-16 08:45:34,535 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100224.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:45:41,085 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.46 vs. limit=5.0 2022-11-16 08:45:43,183 INFO [train.py:876] (0/4) Epoch 14, batch 5700, loss[loss=0.08098, simple_loss=0.1196, pruned_loss=0.02119, over 5582.00 frames. ], tot_loss[loss=0.09859, simple_loss=0.1308, pruned_loss=0.0332, over 1078161.64 frames. ], batch size: 16, lr: 5.69e-03, grad_scale: 8.0 2022-11-16 08:45:43,981 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=100238.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:45:47,271 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9836, 2.2633, 2.4339, 3.1604, 3.1039, 2.4056, 2.1999, 3.2137], device='cuda:0'), covar=tensor([0.1406, 0.2410, 0.2238, 0.1851, 0.1363, 0.2982, 0.2091, 0.1780], device='cuda:0'), in_proj_covar=tensor([0.0261, 0.0200, 0.0187, 0.0301, 0.0228, 0.0199, 0.0189, 0.0251], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 08:45:48,356 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.517e+01 1.364e+02 1.751e+02 2.063e+02 4.628e+02, threshold=3.502e+02, percent-clipped=3.0 2022-11-16 08:45:48,531 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100245.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:45:50,575 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4197, 3.0634, 3.5383, 1.9116, 3.2659, 3.8493, 3.5396, 4.1616], device='cuda:0'), covar=tensor([0.1828, 0.1640, 0.0671, 0.2840, 0.0596, 0.0487, 0.0741, 0.0514], device='cuda:0'), in_proj_covar=tensor([0.0160, 0.0177, 0.0167, 0.0181, 0.0184, 0.0203, 0.0171, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:45:56,100 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.76 vs. limit=2.0 2022-11-16 08:46:06,330 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100272.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:46:21,246 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100293.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:46:21,493 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.67 vs. limit=2.0 2022-11-16 08:46:25,377 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=100299.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:46:30,884 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.52 vs. limit=2.0 2022-11-16 08:46:50,206 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100336.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:46:50,737 INFO [train.py:876] (0/4) Epoch 14, batch 5800, loss[loss=0.1216, simple_loss=0.1502, pruned_loss=0.04656, over 5116.00 frames. ], tot_loss[loss=0.09762, simple_loss=0.1301, pruned_loss=0.03259, over 1082615.13 frames. ], batch size: 91, lr: 5.69e-03, grad_scale: 16.0 2022-11-16 08:46:56,199 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.798e+01 1.394e+02 1.721e+02 2.262e+02 4.141e+02, threshold=3.442e+02, percent-clipped=1.0 2022-11-16 08:47:07,932 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=100363.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:47:21,125 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2639, 1.8592, 2.1450, 1.5091, 1.4861, 2.2623, 1.9097, 1.8395], device='cuda:0'), covar=tensor([0.3427, 0.0558, 0.0433, 0.1610, 0.2883, 0.0958, 0.0818, 0.0895], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0027, 0.0019, 0.0022, 0.0018, 0.0017, 0.0025, 0.0018], device='cuda:0'), out_proj_covar=tensor([9.5031e-05, 1.3314e-04, 1.0053e-04, 1.1365e-04, 1.0114e-04, 9.5585e-05, 1.2485e-04, 9.6106e-05], device='cuda:0') 2022-11-16 08:47:21,653 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100384.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:47:32,539 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6153, 1.4023, 1.4452, 1.1734, 1.4018, 1.5110, 1.1174, 0.8042], device='cuda:0'), covar=tensor([0.0041, 0.0056, 0.0065, 0.0083, 0.0072, 0.0049, 0.0068, 0.0099], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0030, 0.0031, 0.0040, 0.0035, 0.0030, 0.0039, 0.0038], device='cuda:0'), out_proj_covar=tensor([3.1020e-05, 2.8351e-05, 2.7583e-05, 3.8189e-05, 3.2488e-05, 2.9367e-05, 3.7353e-05, 3.6333e-05], device='cuda:0') 2022-11-16 08:47:34,802 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=100402.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:47:57,685 INFO [train.py:876] (0/4) Epoch 14, batch 5900, loss[loss=0.1676, simple_loss=0.1686, pruned_loss=0.08328, over 3102.00 frames. ], tot_loss[loss=0.09592, simple_loss=0.1287, pruned_loss=0.03156, over 1084957.65 frames. ], batch size: 284, lr: 5.68e-03, grad_scale: 16.0 2022-11-16 08:48:03,427 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.325e+01 1.325e+02 1.657e+02 2.050e+02 5.165e+02, threshold=3.313e+02, percent-clipped=3.0 2022-11-16 08:48:16,280 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=100463.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 08:48:42,809 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.2978, 4.7352, 4.9878, 4.6858, 5.3845, 5.1548, 4.5375, 5.3155], device='cuda:0'), covar=tensor([0.0317, 0.0319, 0.0504, 0.0405, 0.0286, 0.0254, 0.0285, 0.0245], device='cuda:0'), in_proj_covar=tensor([0.0150, 0.0159, 0.0112, 0.0148, 0.0189, 0.0117, 0.0131, 0.0160], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 08:49:05,460 INFO [train.py:876] (0/4) Epoch 14, batch 6000, loss[loss=0.1103, simple_loss=0.1459, pruned_loss=0.03733, over 5574.00 frames. ], tot_loss[loss=0.09662, simple_loss=0.1297, pruned_loss=0.03175, over 1086923.38 frames. ], batch size: 30, lr: 5.68e-03, grad_scale: 16.0 2022-11-16 08:49:05,461 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 08:49:23,826 INFO [train.py:908] (0/4) Epoch 14, validation: loss=0.1801, simple_loss=0.1888, pruned_loss=0.08568, over 1530663.00 frames. 2022-11-16 08:49:23,827 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 08:49:29,378 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.974e+01 1.386e+02 1.665e+02 1.958e+02 3.486e+02, threshold=3.330e+02, percent-clipped=1.0 2022-11-16 08:49:48,507 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([6.3489, 5.8570, 6.0429, 5.6827, 6.3932, 6.0873, 5.3732, 6.2789], device='cuda:0'), covar=tensor([0.0299, 0.0320, 0.0415, 0.0297, 0.0241, 0.0229, 0.0196, 0.0242], device='cuda:0'), in_proj_covar=tensor([0.0150, 0.0159, 0.0112, 0.0148, 0.0189, 0.0117, 0.0131, 0.0160], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0003, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 08:49:55,176 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.22 vs. limit=2.0 2022-11-16 08:50:01,473 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=100594.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:50:23,090 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0572, 3.1784, 3.1519, 2.9931, 3.1844, 3.0724, 1.3035, 3.3606], device='cuda:0'), covar=tensor([0.0415, 0.0365, 0.0413, 0.0372, 0.0413, 0.0426, 0.3322, 0.0414], device='cuda:0'), in_proj_covar=tensor([0.0105, 0.0090, 0.0088, 0.0081, 0.0102, 0.0090, 0.0130, 0.0109], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:50:30,876 INFO [train.py:876] (0/4) Epoch 14, batch 6100, loss[loss=0.09308, simple_loss=0.1349, pruned_loss=0.02564, over 5777.00 frames. ], tot_loss[loss=0.09588, simple_loss=0.1287, pruned_loss=0.03152, over 1078592.68 frames. ], batch size: 16, lr: 5.68e-03, grad_scale: 16.0 2022-11-16 08:50:36,072 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.272e+01 1.427e+02 1.651e+02 1.977e+02 4.323e+02, threshold=3.302e+02, percent-clipped=4.0 2022-11-16 08:50:48,645 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100663.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 08:51:18,772 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8407, 1.3501, 1.8875, 1.0569, 2.0804, 2.1816, 1.2311, 1.4324], device='cuda:0'), covar=tensor([0.0529, 0.0865, 0.0292, 0.1539, 0.0372, 0.0350, 0.0887, 0.0468], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0027, 0.0019, 0.0022, 0.0018, 0.0017, 0.0025, 0.0018], device='cuda:0'), out_proj_covar=tensor([9.5009e-05, 1.3398e-04, 1.0071e-04, 1.1440e-04, 1.0169e-04, 9.6388e-05, 1.2545e-04, 9.6974e-05], device='cuda:0') 2022-11-16 08:51:20,682 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100711.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 08:51:34,124 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=100730.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 08:51:34,772 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5696, 1.7702, 1.5831, 1.2155, 1.7089, 2.0271, 2.0104, 2.0361], device='cuda:0'), covar=tensor([0.1758, 0.1559, 0.2421, 0.3067, 0.1542, 0.1495, 0.0955, 0.1372], device='cuda:0'), in_proj_covar=tensor([0.0163, 0.0180, 0.0169, 0.0181, 0.0186, 0.0205, 0.0171, 0.0182], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:51:38,465 INFO [train.py:876] (0/4) Epoch 14, batch 6200, loss[loss=0.0964, simple_loss=0.1348, pruned_loss=0.02902, over 5561.00 frames. ], tot_loss[loss=0.09516, simple_loss=0.1286, pruned_loss=0.03086, over 1085361.60 frames. ], batch size: 15, lr: 5.68e-03, grad_scale: 16.0 2022-11-16 08:51:43,691 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.597e+01 1.377e+02 1.629e+02 2.100e+02 5.485e+02, threshold=3.258e+02, percent-clipped=3.0 2022-11-16 08:51:49,035 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3273, 2.3162, 3.0656, 2.7986, 2.7455, 2.2788, 2.8104, 3.2700], device='cuda:0'), covar=tensor([0.0704, 0.1310, 0.1003, 0.1298, 0.0898, 0.1437, 0.1213, 0.0996], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0194, 0.0216, 0.0213, 0.0242, 0.0197, 0.0224, 0.0231], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:51:52,256 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=100758.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 08:52:15,303 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=100791.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:52:22,760 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0409, 4.4550, 4.0852, 4.3665, 4.4488, 3.7387, 4.0335, 3.9431], device='cuda:0'), covar=tensor([0.0541, 0.0574, 0.1395, 0.0699, 0.0654, 0.0645, 0.0963, 0.1083], device='cuda:0'), in_proj_covar=tensor([0.0135, 0.0183, 0.0276, 0.0179, 0.0225, 0.0176, 0.0191, 0.0180], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 08:52:27,716 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.73 vs. limit=2.0 2022-11-16 08:52:46,739 INFO [train.py:876] (0/4) Epoch 14, batch 6300, loss[loss=0.07568, simple_loss=0.1067, pruned_loss=0.02231, over 5231.00 frames. ], tot_loss[loss=0.09582, simple_loss=0.1291, pruned_loss=0.03129, over 1088687.14 frames. ], batch size: 9, lr: 5.67e-03, grad_scale: 16.0 2022-11-16 08:52:51,883 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.472e+01 1.427e+02 1.681e+02 2.021e+02 4.003e+02, threshold=3.363e+02, percent-clipped=4.0 2022-11-16 08:53:14,291 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6464, 5.0636, 2.9594, 4.8851, 4.0774, 3.2156, 2.7893, 4.4473], device='cuda:0'), covar=tensor([0.1815, 0.0354, 0.1726, 0.0395, 0.0680, 0.1244, 0.2206, 0.0376], device='cuda:0'), in_proj_covar=tensor([0.0152, 0.0140, 0.0152, 0.0145, 0.0169, 0.0163, 0.0155, 0.0153], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:53:24,992 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=100894.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:53:36,756 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8134, 1.8894, 1.9261, 1.5293, 1.6194, 1.6221, 1.7385, 1.6991], device='cuda:0'), covar=tensor([0.0059, 0.0096, 0.0046, 0.0096, 0.0106, 0.0138, 0.0060, 0.0070], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0030, 0.0031, 0.0040, 0.0035, 0.0031, 0.0039, 0.0038], device='cuda:0'), out_proj_covar=tensor([3.1404e-05, 2.8434e-05, 2.7697e-05, 3.7764e-05, 3.2842e-05, 2.9715e-05, 3.7669e-05, 3.6542e-05], device='cuda:0') 2022-11-16 08:53:37,610 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.62 vs. limit=2.0 2022-11-16 08:53:40,258 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.39 vs. limit=2.0 2022-11-16 08:53:42,115 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2022-11-16 08:53:53,676 INFO [train.py:876] (0/4) Epoch 14, batch 6400, loss[loss=0.1306, simple_loss=0.1528, pruned_loss=0.05418, over 5557.00 frames. ], tot_loss[loss=0.09741, simple_loss=0.13, pruned_loss=0.03239, over 1089148.14 frames. ], batch size: 46, lr: 5.67e-03, grad_scale: 16.0 2022-11-16 08:53:57,517 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=100942.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:53:59,455 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.356e+01 1.445e+02 1.710e+02 2.169e+02 3.491e+02, threshold=3.419e+02, percent-clipped=2.0 2022-11-16 08:54:01,579 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0294, 1.6863, 2.0377, 1.7388, 1.7078, 1.9823, 1.8086, 1.9317], device='cuda:0'), covar=tensor([0.0822, 0.0606, 0.0809, 0.0623, 0.3710, 0.0928, 0.0446, 0.0286], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0027, 0.0019, 0.0022, 0.0019, 0.0017, 0.0025, 0.0018], device='cuda:0'), out_proj_covar=tensor([9.5076e-05, 1.3364e-04, 1.0133e-04, 1.1463e-04, 1.0231e-04, 9.6541e-05, 1.2613e-04, 9.7452e-05], device='cuda:0') 2022-11-16 08:54:49,805 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8025, 2.3196, 3.4256, 2.9711, 3.5311, 2.1840, 3.1284, 3.8059], device='cuda:0'), covar=tensor([0.0592, 0.1584, 0.0926, 0.1647, 0.0746, 0.1838, 0.1273, 0.0885], device='cuda:0'), in_proj_covar=tensor([0.0241, 0.0193, 0.0212, 0.0209, 0.0239, 0.0195, 0.0220, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:55:01,410 INFO [train.py:876] (0/4) Epoch 14, batch 6500, loss[loss=0.1515, simple_loss=0.1462, pruned_loss=0.07844, over 4131.00 frames. ], tot_loss[loss=0.0966, simple_loss=0.1294, pruned_loss=0.03188, over 1087162.49 frames. ], batch size: 181, lr: 5.67e-03, grad_scale: 16.0 2022-11-16 08:55:06,942 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.989e+01 1.437e+02 1.725e+02 2.064e+02 3.698e+02, threshold=3.449e+02, percent-clipped=2.0 2022-11-16 08:55:16,243 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=101058.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:55:25,549 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.10 vs. limit=5.0 2022-11-16 08:55:34,979 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=101086.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 08:55:38,938 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3458, 2.0862, 2.7551, 2.0086, 1.4351, 3.0449, 2.4629, 2.1773], device='cuda:0'), covar=tensor([0.1107, 0.1935, 0.0849, 0.2706, 0.2977, 0.0625, 0.1043, 0.1812], device='cuda:0'), in_proj_covar=tensor([0.0112, 0.0105, 0.0105, 0.0104, 0.0078, 0.0072, 0.0086, 0.0096], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 08:55:48,579 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=101106.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:55:49,315 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=101107.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:56:03,986 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.78 vs. limit=5.0 2022-11-16 08:56:09,261 INFO [train.py:876] (0/4) Epoch 14, batch 6600, loss[loss=0.07072, simple_loss=0.1049, pruned_loss=0.01829, over 5727.00 frames. ], tot_loss[loss=0.09553, simple_loss=0.1288, pruned_loss=0.03115, over 1093304.42 frames. ], batch size: 11, lr: 5.66e-03, grad_scale: 16.0 2022-11-16 08:56:14,437 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.470e+01 1.360e+02 1.590e+02 2.159e+02 4.243e+02, threshold=3.180e+02, percent-clipped=1.0 2022-11-16 08:56:31,067 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=101168.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:57:17,780 INFO [train.py:876] (0/4) Epoch 14, batch 6700, loss[loss=0.1006, simple_loss=0.1379, pruned_loss=0.03159, over 5505.00 frames. ], tot_loss[loss=0.09741, simple_loss=0.13, pruned_loss=0.0324, over 1086382.99 frames. ], batch size: 17, lr: 5.66e-03, grad_scale: 16.0 2022-11-16 08:57:22,870 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.036e+01 1.360e+02 1.742e+02 2.134e+02 3.328e+02, threshold=3.484e+02, percent-clipped=2.0 2022-11-16 08:57:26,400 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1901, 2.9403, 3.1805, 1.8653, 3.0816, 3.3977, 3.3747, 3.6431], device='cuda:0'), covar=tensor([0.1976, 0.1464, 0.0889, 0.2578, 0.0825, 0.0911, 0.0633, 0.0746], device='cuda:0'), in_proj_covar=tensor([0.0163, 0.0179, 0.0169, 0.0181, 0.0185, 0.0205, 0.0173, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:58:25,804 INFO [train.py:876] (0/4) Epoch 14, batch 6800, loss[loss=0.1278, simple_loss=0.1605, pruned_loss=0.04758, over 5600.00 frames. ], tot_loss[loss=0.09782, simple_loss=0.1305, pruned_loss=0.03257, over 1085123.09 frames. ], batch size: 50, lr: 5.66e-03, grad_scale: 16.0 2022-11-16 08:58:30,944 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.618e+01 1.430e+02 1.640e+02 2.057e+02 3.965e+02, threshold=3.281e+02, percent-clipped=2.0 2022-11-16 08:58:58,316 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=101386.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 08:59:00,564 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.1350, 1.4589, 1.2958, 1.3830, 1.2182, 1.8339, 1.4921, 1.3730], device='cuda:0'), covar=tensor([0.3279, 0.1266, 0.3473, 0.2936, 0.2646, 0.0817, 0.2186, 0.2818], device='cuda:0'), in_proj_covar=tensor([0.0113, 0.0105, 0.0104, 0.0104, 0.0078, 0.0072, 0.0086, 0.0096], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 08:59:04,730 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6733, 1.6629, 1.5688, 1.2257, 1.5786, 1.6233, 1.1994, 1.0484], device='cuda:0'), covar=tensor([0.0045, 0.0069, 0.0056, 0.0075, 0.0087, 0.0081, 0.0067, 0.0075], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0030, 0.0031, 0.0039, 0.0035, 0.0030, 0.0039, 0.0038], device='cuda:0'), out_proj_covar=tensor([3.1417e-05, 2.8145e-05, 2.7565e-05, 3.7108e-05, 3.2225e-05, 2.9346e-05, 3.7262e-05, 3.6385e-05], device='cuda:0') 2022-11-16 08:59:11,076 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.22 vs. limit=5.0 2022-11-16 08:59:25,208 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.1348, 5.2922, 3.7216, 4.9246, 4.0617, 3.7222, 3.3027, 4.5168], device='cuda:0'), covar=tensor([0.1087, 0.0209, 0.0767, 0.0347, 0.0440, 0.0835, 0.1391, 0.0229], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0141, 0.0152, 0.0146, 0.0171, 0.0164, 0.0156, 0.0156], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 08:59:30,802 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=101434.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 08:59:32,746 INFO [train.py:876] (0/4) Epoch 14, batch 6900, loss[loss=0.08332, simple_loss=0.123, pruned_loss=0.02183, over 5751.00 frames. ], tot_loss[loss=0.09702, simple_loss=0.1306, pruned_loss=0.03173, over 1091600.32 frames. ], batch size: 13, lr: 5.66e-03, grad_scale: 16.0 2022-11-16 08:59:39,121 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.093e+01 1.364e+02 1.752e+02 2.207e+02 5.252e+02, threshold=3.504e+02, percent-clipped=3.0 2022-11-16 08:59:50,942 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=101463.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 08:59:57,564 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-16 08:59:59,422 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5589, 1.8844, 2.3389, 2.2417, 2.4277, 1.6367, 2.2151, 2.4767], device='cuda:0'), covar=tensor([0.0609, 0.1114, 0.0700, 0.0728, 0.0804, 0.1301, 0.0957, 0.0671], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0194, 0.0215, 0.0210, 0.0240, 0.0197, 0.0225, 0.0230], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:00:03,829 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.42 vs. limit=2.0 2022-11-16 09:00:04,550 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.26 vs. limit=2.0 2022-11-16 09:00:40,656 INFO [train.py:876] (0/4) Epoch 14, batch 7000, loss[loss=0.1186, simple_loss=0.151, pruned_loss=0.04307, over 5264.00 frames. ], tot_loss[loss=0.09815, simple_loss=0.1307, pruned_loss=0.0328, over 1082837.48 frames. ], batch size: 79, lr: 5.65e-03, grad_scale: 16.0 2022-11-16 09:00:47,040 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.705e+01 1.319e+02 1.556e+02 2.148e+02 4.018e+02, threshold=3.112e+02, percent-clipped=2.0 2022-11-16 09:00:50,276 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-16 09:01:04,093 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1793, 2.5974, 3.3204, 1.5966, 3.0722, 3.3698, 3.3374, 3.3984], device='cuda:0'), covar=tensor([0.2093, 0.1948, 0.0819, 0.3263, 0.0701, 0.0930, 0.0539, 0.0835], device='cuda:0'), in_proj_covar=tensor([0.0164, 0.0180, 0.0169, 0.0183, 0.0186, 0.0207, 0.0174, 0.0185], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:01:25,117 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8555, 2.7738, 3.3278, 2.3895, 1.5114, 3.7542, 2.9245, 2.5248], device='cuda:0'), covar=tensor([0.1017, 0.0993, 0.0608, 0.2194, 0.2293, 0.0889, 0.1567, 0.1045], device='cuda:0'), in_proj_covar=tensor([0.0115, 0.0106, 0.0107, 0.0105, 0.0079, 0.0074, 0.0088, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 09:01:30,646 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7001, 4.8351, 3.1135, 4.6378, 3.7578, 3.2788, 2.9255, 4.1069], device='cuda:0'), covar=tensor([0.1421, 0.0211, 0.1117, 0.0261, 0.0706, 0.0929, 0.1872, 0.0396], device='cuda:0'), in_proj_covar=tensor([0.0152, 0.0141, 0.0151, 0.0145, 0.0171, 0.0164, 0.0156, 0.0155], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:01:48,509 INFO [train.py:876] (0/4) Epoch 14, batch 7100, loss[loss=0.08872, simple_loss=0.1214, pruned_loss=0.02799, over 5535.00 frames. ], tot_loss[loss=0.09848, simple_loss=0.1309, pruned_loss=0.03303, over 1079175.99 frames. ], batch size: 13, lr: 5.65e-03, grad_scale: 8.0 2022-11-16 09:01:54,700 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.279e+01 1.412e+02 1.777e+02 2.341e+02 5.678e+02, threshold=3.553e+02, percent-clipped=7.0 2022-11-16 09:02:17,048 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.3691, 1.6782, 1.3154, 1.2051, 1.5588, 1.4638, 1.0316, 1.6137], device='cuda:0'), covar=tensor([0.0072, 0.0050, 0.0063, 0.0073, 0.0054, 0.0052, 0.0086, 0.0056], device='cuda:0'), in_proj_covar=tensor([0.0069, 0.0063, 0.0063, 0.0068, 0.0066, 0.0062, 0.0059, 0.0057], device='cuda:0'), out_proj_covar=tensor([6.1384e-05, 5.5759e-05, 5.4488e-05, 5.9598e-05, 5.7911e-05, 5.3493e-05, 5.2115e-05, 4.9912e-05], device='cuda:0') 2022-11-16 09:02:21,010 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=101684.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:02:57,246 INFO [train.py:876] (0/4) Epoch 14, batch 7200, loss[loss=0.08835, simple_loss=0.1248, pruned_loss=0.02597, over 5708.00 frames. ], tot_loss[loss=0.0981, simple_loss=0.1306, pruned_loss=0.03278, over 1084106.82 frames. ], batch size: 17, lr: 5.65e-03, grad_scale: 8.0 2022-11-16 09:03:02,966 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=101745.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:03:03,779 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.181e+01 1.351e+02 1.693e+02 2.156e+02 4.493e+02, threshold=3.386e+02, percent-clipped=3.0 2022-11-16 09:03:15,174 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=101763.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:03:26,524 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.58 vs. limit=5.0 2022-11-16 09:03:28,723 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0238, 1.1131, 1.1057, 1.0222, 0.8448, 1.0678, 0.8350, 0.8700], device='cuda:0'), covar=tensor([0.0052, 0.0047, 0.0043, 0.0055, 0.0060, 0.0049, 0.0078, 0.0087], device='cuda:0'), in_proj_covar=tensor([0.0033, 0.0030, 0.0030, 0.0038, 0.0034, 0.0030, 0.0039, 0.0038], device='cuda:0'), out_proj_covar=tensor([3.1010e-05, 2.7666e-05, 2.7081e-05, 3.6743e-05, 3.1633e-05, 2.8956e-05, 3.6789e-05, 3.6042e-05], device='cuda:0') 2022-11-16 09:03:46,022 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/epoch-14.pt 2022-11-16 09:04:28,272 INFO [train.py:876] (0/4) Epoch 15, batch 0, loss[loss=0.1587, simple_loss=0.1729, pruned_loss=0.07224, over 5396.00 frames. ], tot_loss[loss=0.1587, simple_loss=0.1729, pruned_loss=0.07224, over 5396.00 frames. ], batch size: 70, lr: 5.45e-03, grad_scale: 8.0 2022-11-16 09:04:28,274 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 09:04:44,422 INFO [train.py:908] (0/4) Epoch 15, validation: loss=0.1798, simple_loss=0.1892, pruned_loss=0.08518, over 1530663.00 frames. 2022-11-16 09:04:44,423 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 09:04:45,740 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=101811.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:04:46,440 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8571, 1.7490, 1.6448, 1.7563, 1.8766, 2.0190, 1.7346, 1.6679], device='cuda:0'), covar=tensor([0.0050, 0.0114, 0.0099, 0.0058, 0.0056, 0.0060, 0.0051, 0.0062], device='cuda:0'), in_proj_covar=tensor([0.0033, 0.0029, 0.0030, 0.0038, 0.0034, 0.0030, 0.0038, 0.0037], device='cuda:0'), out_proj_covar=tensor([3.0858e-05, 2.7438e-05, 2.6959e-05, 3.6436e-05, 3.1441e-05, 2.8814e-05, 3.6456e-05, 3.5802e-05], device='cuda:0') 2022-11-16 09:05:09,640 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.429e+01 1.523e+02 1.865e+02 2.134e+02 5.248e+02, threshold=3.731e+02, percent-clipped=3.0 2022-11-16 09:05:24,508 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.4253, 4.8828, 5.2238, 4.7571, 5.4985, 5.3443, 4.5432, 5.3899], device='cuda:0'), covar=tensor([0.0339, 0.0365, 0.0412, 0.0401, 0.0315, 0.0275, 0.0353, 0.0310], device='cuda:0'), in_proj_covar=tensor([0.0152, 0.0161, 0.0113, 0.0149, 0.0192, 0.0118, 0.0133, 0.0163], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 09:05:51,986 INFO [train.py:876] (0/4) Epoch 15, batch 100, loss[loss=0.06245, simple_loss=0.1013, pruned_loss=0.01182, over 5495.00 frames. ], tot_loss[loss=0.09714, simple_loss=0.1292, pruned_loss=0.03256, over 432390.35 frames. ], batch size: 12, lr: 5.45e-03, grad_scale: 8.0 2022-11-16 09:05:59,267 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=101920.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 09:05:59,951 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.6195, 2.8382, 3.2727, 4.3722, 4.2892, 3.2217, 2.8474, 4.3342], device='cuda:0'), covar=tensor([0.0407, 0.2793, 0.1846, 0.1983, 0.0851, 0.2568, 0.1919, 0.0570], device='cuda:0'), in_proj_covar=tensor([0.0259, 0.0194, 0.0185, 0.0293, 0.0226, 0.0197, 0.0186, 0.0249], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 09:06:13,986 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.76 vs. limit=2.0 2022-11-16 09:06:16,935 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.051e+01 1.432e+02 1.657e+02 2.152e+02 4.167e+02, threshold=3.314e+02, percent-clipped=2.0 2022-11-16 09:06:29,455 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.19 vs. limit=2.0 2022-11-16 09:06:40,785 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=101981.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 09:06:44,624 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=101987.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:07:00,220 INFO [train.py:876] (0/4) Epoch 15, batch 200, loss[loss=0.09497, simple_loss=0.1286, pruned_loss=0.03065, over 5697.00 frames. ], tot_loss[loss=0.09658, simple_loss=0.1296, pruned_loss=0.03179, over 691560.70 frames. ], batch size: 17, lr: 5.45e-03, grad_scale: 8.0 2022-11-16 09:07:20,549 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=102040.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:07:24,775 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.824e+01 1.391e+02 1.728e+02 2.198e+02 6.457e+02, threshold=3.456e+02, percent-clipped=4.0 2022-11-16 09:07:26,238 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=102048.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:08:07,555 INFO [train.py:876] (0/4) Epoch 15, batch 300, loss[loss=0.134, simple_loss=0.1603, pruned_loss=0.05389, over 5481.00 frames. ], tot_loss[loss=0.0977, simple_loss=0.13, pruned_loss=0.03268, over 846246.36 frames. ], batch size: 64, lr: 5.45e-03, grad_scale: 8.0 2022-11-16 09:08:25,358 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2022-11-16 09:08:32,098 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.719e+01 1.477e+02 1.712e+02 2.125e+02 3.835e+02, threshold=3.423e+02, percent-clipped=2.0 2022-11-16 09:08:48,453 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0524, 3.2207, 2.9949, 3.2250, 2.7229, 3.1431, 3.3515, 3.4873], device='cuda:0'), covar=tensor([0.0812, 0.0830, 0.1092, 0.0747, 0.1155, 0.0847, 0.0730, 0.1332], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0110, 0.0109, 0.0111, 0.0096, 0.0108, 0.0099, 0.0087], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:08:50,681 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0642, 2.8799, 3.2477, 1.6031, 3.0704, 3.4260, 3.4215, 3.7418], device='cuda:0'), covar=tensor([0.1973, 0.1576, 0.0968, 0.2925, 0.0847, 0.0664, 0.0630, 0.0641], device='cuda:0'), in_proj_covar=tensor([0.0162, 0.0177, 0.0168, 0.0181, 0.0185, 0.0203, 0.0174, 0.0184], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:09:13,511 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.4521, 2.5419, 3.1262, 4.1190, 4.1481, 3.2983, 3.1200, 4.3037], device='cuda:0'), covar=tensor([0.0519, 0.2855, 0.2613, 0.3292, 0.1098, 0.2869, 0.1946, 0.0665], device='cuda:0'), in_proj_covar=tensor([0.0261, 0.0195, 0.0186, 0.0294, 0.0226, 0.0198, 0.0188, 0.0250], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 09:09:15,302 INFO [train.py:876] (0/4) Epoch 15, batch 400, loss[loss=0.1198, simple_loss=0.1337, pruned_loss=0.05293, over 4127.00 frames. ], tot_loss[loss=0.09662, simple_loss=0.1294, pruned_loss=0.03193, over 945403.42 frames. ], batch size: 181, lr: 5.44e-03, grad_scale: 8.0 2022-11-16 09:09:40,375 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.002e+02 1.441e+02 1.640e+02 2.170e+02 4.348e+02, threshold=3.279e+02, percent-clipped=4.0 2022-11-16 09:10:00,702 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=102276.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 09:10:22,877 INFO [train.py:876] (0/4) Epoch 15, batch 500, loss[loss=0.08433, simple_loss=0.12, pruned_loss=0.02433, over 5702.00 frames. ], tot_loss[loss=0.09685, simple_loss=0.1302, pruned_loss=0.03176, over 1003916.54 frames. ], batch size: 15, lr: 5.44e-03, grad_scale: 8.0 2022-11-16 09:10:44,310 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=102340.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:10:46,601 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=102343.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:10:48,482 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.403e+01 1.330e+02 1.651e+02 2.070e+02 4.075e+02, threshold=3.302e+02, percent-clipped=1.0 2022-11-16 09:11:16,842 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=102388.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:11:31,320 INFO [train.py:876] (0/4) Epoch 15, batch 600, loss[loss=0.1594, simple_loss=0.1648, pruned_loss=0.07701, over 5461.00 frames. ], tot_loss[loss=0.09692, simple_loss=0.1296, pruned_loss=0.03213, over 1032545.16 frames. ], batch size: 64, lr: 5.44e-03, grad_scale: 8.0 2022-11-16 09:11:56,870 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.008e+02 1.396e+02 1.798e+02 2.195e+02 6.197e+02, threshold=3.596e+02, percent-clipped=7.0 2022-11-16 09:12:39,187 INFO [train.py:876] (0/4) Epoch 15, batch 700, loss[loss=0.07537, simple_loss=0.1009, pruned_loss=0.02489, over 5316.00 frames. ], tot_loss[loss=0.09746, simple_loss=0.1304, pruned_loss=0.03224, over 1051538.59 frames. ], batch size: 9, lr: 5.44e-03, grad_scale: 8.0 2022-11-16 09:12:59,704 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.5531, 1.2100, 1.2034, 0.9966, 1.2955, 1.5005, 0.8095, 1.1529], device='cuda:0'), covar=tensor([0.0345, 0.0437, 0.0409, 0.0682, 0.0382, 0.0317, 0.1109, 0.0424], device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0028, 0.0020, 0.0023, 0.0019, 0.0018, 0.0026, 0.0019], device='cuda:0'), out_proj_covar=tensor([9.9286e-05, 1.3920e-04, 1.0550e-04, 1.1979e-04, 1.0598e-04, 1.0058e-04, 1.3182e-04, 1.0164e-04], device='cuda:0') 2022-11-16 09:13:04,053 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.068e+01 1.444e+02 1.826e+02 2.249e+02 4.884e+02, threshold=3.652e+02, percent-clipped=1.0 2022-11-16 09:13:24,171 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=102576.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 09:13:46,903 INFO [train.py:876] (0/4) Epoch 15, batch 800, loss[loss=0.09744, simple_loss=0.1308, pruned_loss=0.03202, over 5283.00 frames. ], tot_loss[loss=0.09708, simple_loss=0.13, pruned_loss=0.03207, over 1073448.31 frames. ], batch size: 79, lr: 5.43e-03, grad_scale: 8.0 2022-11-16 09:13:54,482 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5784, 2.4356, 3.0110, 2.2483, 1.7433, 3.3210, 2.7932, 2.3644], device='cuda:0'), covar=tensor([0.1246, 0.1224, 0.0634, 0.2120, 0.2858, 0.0753, 0.0836, 0.1420], device='cuda:0'), in_proj_covar=tensor([0.0113, 0.0104, 0.0105, 0.0103, 0.0078, 0.0073, 0.0087, 0.0098], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 09:13:57,845 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=102624.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 09:14:02,754 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5965, 2.2497, 2.3609, 2.9879, 2.9611, 2.3982, 1.9994, 2.9540], device='cuda:0'), covar=tensor([0.2059, 0.2051, 0.1769, 0.1522, 0.1194, 0.2793, 0.2102, 0.1154], device='cuda:0'), in_proj_covar=tensor([0.0258, 0.0194, 0.0186, 0.0293, 0.0226, 0.0198, 0.0187, 0.0250], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 09:14:11,169 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=102643.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:14:13,462 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.304e+01 1.433e+02 1.767e+02 2.256e+02 4.472e+02, threshold=3.533e+02, percent-clipped=3.0 2022-11-16 09:14:21,746 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.49 vs. limit=2.0 2022-11-16 09:14:45,023 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=102691.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:14:48,981 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2012, 1.5305, 1.2144, 1.0919, 1.4583, 1.2500, 0.8159, 1.5619], device='cuda:0'), covar=tensor([0.0089, 0.0061, 0.0086, 0.0108, 0.0083, 0.0094, 0.0171, 0.0055], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0063, 0.0063, 0.0068, 0.0066, 0.0062, 0.0060, 0.0058], device='cuda:0'), out_proj_covar=tensor([6.2333e-05, 5.5619e-05, 5.5277e-05, 5.9982e-05, 5.8640e-05, 5.3882e-05, 5.3106e-05, 5.0403e-05], device='cuda:0') 2022-11-16 09:14:56,842 INFO [train.py:876] (0/4) Epoch 15, batch 900, loss[loss=0.1051, simple_loss=0.1479, pruned_loss=0.03117, over 5753.00 frames. ], tot_loss[loss=0.09592, simple_loss=0.1294, pruned_loss=0.03121, over 1086247.12 frames. ], batch size: 20, lr: 5.43e-03, grad_scale: 8.0 2022-11-16 09:15:00,160 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.3444, 3.9078, 4.1004, 3.7830, 4.4415, 4.0607, 4.0265, 4.4641], device='cuda:0'), covar=tensor([0.0745, 0.0930, 0.0879, 0.1015, 0.0816, 0.0740, 0.0705, 0.0785], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0162, 0.0114, 0.0150, 0.0194, 0.0119, 0.0133, 0.0163], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 09:15:21,995 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.688e+01 1.409e+02 1.812e+02 2.361e+02 4.444e+02, threshold=3.625e+02, percent-clipped=1.0 2022-11-16 09:15:58,223 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9126, 4.3317, 4.0296, 3.6400, 2.0539, 4.2284, 2.4497, 3.9104], device='cuda:0'), covar=tensor([0.0471, 0.0199, 0.0204, 0.0407, 0.0740, 0.0208, 0.0647, 0.0143], device='cuda:0'), in_proj_covar=tensor([0.0198, 0.0188, 0.0187, 0.0210, 0.0200, 0.0188, 0.0197, 0.0190], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 09:16:04,878 INFO [train.py:876] (0/4) Epoch 15, batch 1000, loss[loss=0.1649, simple_loss=0.167, pruned_loss=0.0814, over 5356.00 frames. ], tot_loss[loss=0.0961, simple_loss=0.1292, pruned_loss=0.03147, over 1089754.84 frames. ], batch size: 70, lr: 5.43e-03, grad_scale: 8.0 2022-11-16 09:16:29,880 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.467e+01 1.525e+02 1.767e+02 2.154e+02 7.760e+02, threshold=3.535e+02, percent-clipped=4.0 2022-11-16 09:16:33,616 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=102851.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:16:54,477 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5769, 2.9112, 3.3398, 4.1983, 4.2684, 3.5101, 3.1549, 4.2794], device='cuda:0'), covar=tensor([0.0454, 0.1911, 0.1551, 0.2016, 0.1066, 0.2336, 0.1504, 0.0543], device='cuda:0'), in_proj_covar=tensor([0.0259, 0.0194, 0.0187, 0.0294, 0.0227, 0.0198, 0.0188, 0.0251], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 09:16:57,606 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=102887.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:16:59,328 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.60 vs. limit=2.0 2022-11-16 09:17:01,984 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4157, 3.3142, 3.7149, 1.9255, 3.4779, 3.7212, 3.8352, 4.3030], device='cuda:0'), covar=tensor([0.1751, 0.1480, 0.0739, 0.2874, 0.0580, 0.0803, 0.0470, 0.0659], device='cuda:0'), in_proj_covar=tensor([0.0161, 0.0175, 0.0165, 0.0179, 0.0185, 0.0203, 0.0173, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:17:12,900 INFO [train.py:876] (0/4) Epoch 15, batch 1100, loss[loss=0.09986, simple_loss=0.1321, pruned_loss=0.03379, over 5698.00 frames. ], tot_loss[loss=0.09691, simple_loss=0.1301, pruned_loss=0.03186, over 1082809.50 frames. ], batch size: 36, lr: 5.42e-03, grad_scale: 8.0 2022-11-16 09:17:14,975 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=102912.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:17:37,838 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 1.387e+02 1.709e+02 2.035e+02 6.034e+02, threshold=3.418e+02, percent-clipped=1.0 2022-11-16 09:17:38,731 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=102948.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:18:06,285 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.89 vs. limit=2.0 2022-11-16 09:18:19,911 INFO [train.py:876] (0/4) Epoch 15, batch 1200, loss[loss=0.1236, simple_loss=0.1512, pruned_loss=0.04803, over 5593.00 frames. ], tot_loss[loss=0.09766, simple_loss=0.1305, pruned_loss=0.03242, over 1078526.38 frames. ], batch size: 43, lr: 5.42e-03, grad_scale: 8.0 2022-11-16 09:18:31,780 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2018, 2.7689, 2.5543, 1.7326, 2.8791, 2.9961, 2.9299, 3.2392], device='cuda:0'), covar=tensor([0.1902, 0.1618, 0.1705, 0.2831, 0.0874, 0.1485, 0.0692, 0.0914], device='cuda:0'), in_proj_covar=tensor([0.0162, 0.0177, 0.0167, 0.0181, 0.0186, 0.0204, 0.0174, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:18:45,657 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.196e+01 1.283e+02 1.545e+02 2.048e+02 3.817e+02, threshold=3.089e+02, percent-clipped=2.0 2022-11-16 09:19:20,295 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103098.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:19:27,357 INFO [train.py:876] (0/4) Epoch 15, batch 1300, loss[loss=0.06212, simple_loss=0.1004, pruned_loss=0.01192, over 5698.00 frames. ], tot_loss[loss=0.09751, simple_loss=0.1304, pruned_loss=0.0323, over 1079602.69 frames. ], batch size: 12, lr: 5.42e-03, grad_scale: 8.0 2022-11-16 09:19:43,275 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103132.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:19:53,329 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 1.342e+02 1.572e+02 1.992e+02 4.136e+02, threshold=3.143e+02, percent-clipped=6.0 2022-11-16 09:20:01,530 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103159.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:20:24,863 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103193.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:20:34,612 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103207.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:20:35,871 INFO [train.py:876] (0/4) Epoch 15, batch 1400, loss[loss=0.0877, simple_loss=0.1254, pruned_loss=0.02499, over 5567.00 frames. ], tot_loss[loss=0.09785, simple_loss=0.1307, pruned_loss=0.03248, over 1075655.36 frames. ], batch size: 43, lr: 5.42e-03, grad_scale: 8.0 2022-11-16 09:20:56,467 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.0269, 1.4423, 1.4847, 1.3811, 1.2923, 1.2631, 1.2390, 1.3726], device='cuda:0'), covar=tensor([0.3622, 0.2341, 0.1582, 0.1694, 0.2323, 0.2989, 0.2297, 0.1198], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0112, 0.0109, 0.0113, 0.0097, 0.0108, 0.0101, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:20:59,051 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103243.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:21:01,547 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.466e+01 1.279e+02 1.600e+02 2.005e+02 3.383e+02, threshold=3.199e+02, percent-clipped=1.0 2022-11-16 09:21:28,111 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.3833, 2.9439, 3.7742, 2.1253, 1.9820, 3.6612, 2.9962, 2.8258], device='cuda:0'), covar=tensor([0.0590, 0.0821, 0.0471, 0.2131, 0.2370, 0.2761, 0.0775, 0.0831], device='cuda:0'), in_proj_covar=tensor([0.0113, 0.0105, 0.0105, 0.0103, 0.0079, 0.0074, 0.0087, 0.0097], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 09:21:42,722 INFO [train.py:876] (0/4) Epoch 15, batch 1500, loss[loss=0.08433, simple_loss=0.1262, pruned_loss=0.02123, over 5702.00 frames. ], tot_loss[loss=0.09584, simple_loss=0.1291, pruned_loss=0.03127, over 1082012.21 frames. ], batch size: 34, lr: 5.41e-03, grad_scale: 8.0 2022-11-16 09:21:56,836 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.9878, 1.5955, 2.0390, 1.4001, 2.3174, 1.8642, 1.5342, 1.6977], device='cuda:0'), covar=tensor([0.0460, 0.0481, 0.0466, 0.0681, 0.0267, 0.0488, 0.0672, 0.0782], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0027, 0.0019, 0.0023, 0.0019, 0.0018, 0.0026, 0.0019], device='cuda:0'), out_proj_covar=tensor([9.8576e-05, 1.3741e-04, 1.0413e-04, 1.1870e-04, 1.0528e-04, 1.0014e-04, 1.3067e-04, 1.0160e-04], device='cuda:0') 2022-11-16 09:22:08,705 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.936e+01 1.307e+02 1.656e+02 2.050e+02 4.827e+02, threshold=3.313e+02, percent-clipped=2.0 2022-11-16 09:22:10,506 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.37 vs. limit=2.0 2022-11-16 09:22:21,961 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103366.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 09:22:24,678 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8908, 2.3922, 3.5696, 3.0434, 3.5136, 2.4816, 3.1518, 3.7878], device='cuda:0'), covar=tensor([0.0755, 0.1601, 0.0809, 0.1363, 0.0536, 0.1570, 0.1300, 0.0729], device='cuda:0'), in_proj_covar=tensor([0.0246, 0.0191, 0.0216, 0.0210, 0.0241, 0.0199, 0.0226, 0.0231], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:22:35,353 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8213, 2.6194, 2.6740, 2.4838, 2.8841, 2.7302, 2.7656, 2.8236], device='cuda:0'), covar=tensor([0.0509, 0.0559, 0.0588, 0.0588, 0.0502, 0.0298, 0.0452, 0.0664], device='cuda:0'), in_proj_covar=tensor([0.0156, 0.0163, 0.0117, 0.0153, 0.0197, 0.0120, 0.0135, 0.0165], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 09:22:51,379 INFO [train.py:876] (0/4) Epoch 15, batch 1600, loss[loss=0.08848, simple_loss=0.131, pruned_loss=0.02299, over 5571.00 frames. ], tot_loss[loss=0.09599, simple_loss=0.1295, pruned_loss=0.03125, over 1082332.04 frames. ], batch size: 46, lr: 5.41e-03, grad_scale: 8.0 2022-11-16 09:23:03,576 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103427.0, num_to_drop=1, layers_to_drop={3} 2022-11-16 09:23:04,833 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.7231, 1.1454, 0.8302, 0.8264, 0.8919, 0.9707, 0.6746, 1.2301], device='cuda:0'), covar=tensor([0.0135, 0.0059, 0.0102, 0.0082, 0.0088, 0.0095, 0.0121, 0.0077], device='cuda:0'), in_proj_covar=tensor([0.0070, 0.0063, 0.0063, 0.0068, 0.0067, 0.0063, 0.0060, 0.0059], device='cuda:0'), out_proj_covar=tensor([6.2422e-05, 5.5684e-05, 5.5179e-05, 6.0134e-05, 5.9573e-05, 5.4388e-05, 5.3143e-05, 5.1024e-05], device='cuda:0') 2022-11-16 09:23:17,105 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.055e+01 1.366e+02 1.646e+02 2.008e+02 3.608e+02, threshold=3.293e+02, percent-clipped=2.0 2022-11-16 09:23:17,834 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6465, 3.3591, 3.4903, 3.3112, 3.7299, 3.5732, 3.4932, 3.6530], device='cuda:0'), covar=tensor([0.0411, 0.0453, 0.0534, 0.0401, 0.0370, 0.0260, 0.0356, 0.0476], device='cuda:0'), in_proj_covar=tensor([0.0154, 0.0162, 0.0117, 0.0151, 0.0196, 0.0119, 0.0134, 0.0164], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 09:23:21,829 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103454.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:23:25,398 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.35 vs. limit=2.0 2022-11-16 09:23:27,582 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5081, 4.4876, 2.9862, 4.2462, 3.4636, 2.9727, 2.5486, 3.7625], device='cuda:0'), covar=tensor([0.1596, 0.0265, 0.1274, 0.0529, 0.0854, 0.1305, 0.2023, 0.0547], device='cuda:0'), in_proj_covar=tensor([0.0153, 0.0143, 0.0152, 0.0145, 0.0172, 0.0165, 0.0157, 0.0157], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:23:38,275 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6603, 1.6849, 1.7117, 1.3671, 1.5453, 1.7023, 1.5749, 1.2936], device='cuda:0'), covar=tensor([0.0050, 0.0064, 0.0049, 0.0086, 0.0129, 0.0085, 0.0058, 0.0079], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0031, 0.0031, 0.0039, 0.0035, 0.0032, 0.0039, 0.0038], device='cuda:0'), out_proj_covar=tensor([3.1828e-05, 2.8502e-05, 2.8194e-05, 3.7663e-05, 3.2450e-05, 3.0233e-05, 3.7286e-05, 3.6612e-05], device='cuda:0') 2022-11-16 09:23:44,219 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103488.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:23:54,798 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.95 vs. limit=2.0 2022-11-16 09:23:57,694 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=103507.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:23:58,830 INFO [train.py:876] (0/4) Epoch 15, batch 1700, loss[loss=0.1238, simple_loss=0.143, pruned_loss=0.05227, over 5587.00 frames. ], tot_loss[loss=0.09591, simple_loss=0.129, pruned_loss=0.0314, over 1079390.43 frames. ], batch size: 50, lr: 5.41e-03, grad_scale: 8.0 2022-11-16 09:24:21,635 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=103543.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:24:24,447 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.609e+01 1.355e+02 1.661e+02 2.024e+02 3.979e+02, threshold=3.323e+02, percent-clipped=4.0 2022-11-16 09:24:30,121 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=103555.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:24:53,831 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=103591.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:24:55,977 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.7816, 3.7679, 4.0405, 2.1928, 3.7417, 4.2532, 3.7736, 4.5931], device='cuda:0'), covar=tensor([0.1660, 0.1190, 0.0492, 0.2242, 0.0550, 0.0451, 0.0554, 0.0389], device='cuda:0'), in_proj_covar=tensor([0.0161, 0.0178, 0.0168, 0.0181, 0.0186, 0.0204, 0.0174, 0.0183], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:25:06,721 INFO [train.py:876] (0/4) Epoch 15, batch 1800, loss[loss=0.2228, simple_loss=0.2075, pruned_loss=0.119, over 2872.00 frames. ], tot_loss[loss=0.09625, simple_loss=0.1297, pruned_loss=0.03138, over 1081671.04 frames. ], batch size: 284, lr: 5.41e-03, grad_scale: 8.0 2022-11-16 09:25:31,583 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.004e+01 1.509e+02 1.805e+02 2.366e+02 4.290e+02, threshold=3.611e+02, percent-clipped=5.0 2022-11-16 09:26:12,553 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0542, 1.8521, 2.0548, 1.6965, 1.7223, 1.8347, 1.9288, 1.9812], device='cuda:0'), covar=tensor([0.0073, 0.0077, 0.0055, 0.0063, 0.0070, 0.0048, 0.0053, 0.0059], device='cuda:0'), in_proj_covar=tensor([0.0069, 0.0063, 0.0063, 0.0067, 0.0066, 0.0062, 0.0059, 0.0058], device='cuda:0'), out_proj_covar=tensor([6.1512e-05, 5.4984e-05, 5.4557e-05, 5.9254e-05, 5.8527e-05, 5.3439e-05, 5.2714e-05, 5.0460e-05], device='cuda:0') 2022-11-16 09:26:13,039 INFO [train.py:876] (0/4) Epoch 15, batch 1900, loss[loss=0.05681, simple_loss=0.1016, pruned_loss=0.006026, over 5463.00 frames. ], tot_loss[loss=0.09461, simple_loss=0.1284, pruned_loss=0.0304, over 1083026.96 frames. ], batch size: 11, lr: 5.40e-03, grad_scale: 8.0 2022-11-16 09:26:22,456 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=103722.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 09:26:34,271 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103740.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:26:38,965 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.053e+01 1.311e+02 1.680e+02 2.035e+02 3.370e+02, threshold=3.360e+02, percent-clipped=0.0 2022-11-16 09:26:43,678 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=103754.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:26:50,725 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103765.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:27:04,591 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0745, 3.0418, 2.7423, 3.0306, 3.0562, 2.7423, 2.7394, 2.7759], device='cuda:0'), covar=tensor([0.0354, 0.0653, 0.1441, 0.0575, 0.0604, 0.0625, 0.1016, 0.0844], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0184, 0.0276, 0.0179, 0.0219, 0.0177, 0.0191, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 09:27:06,540 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=103788.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:27:12,493 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1965, 2.6018, 2.9930, 4.0840, 3.9104, 3.0938, 2.9663, 4.0502], device='cuda:0'), covar=tensor([0.0809, 0.2787, 0.2075, 0.2792, 0.1109, 0.2747, 0.2101, 0.0745], device='cuda:0'), in_proj_covar=tensor([0.0260, 0.0191, 0.0185, 0.0294, 0.0225, 0.0198, 0.0187, 0.0248], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 09:27:15,381 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103801.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:27:15,902 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=103802.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:27:20,735 INFO [train.py:876] (0/4) Epoch 15, batch 2000, loss[loss=0.118, simple_loss=0.1493, pruned_loss=0.04336, over 5352.00 frames. ], tot_loss[loss=0.09482, simple_loss=0.1281, pruned_loss=0.03075, over 1081500.34 frames. ], batch size: 70, lr: 5.40e-03, grad_scale: 8.0 2022-11-16 09:27:32,511 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103826.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:27:39,660 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=103836.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:27:40,727 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.84 vs. limit=5.0 2022-11-16 09:27:47,518 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.204e+01 1.423e+02 1.683e+02 2.188e+02 4.061e+02, threshold=3.366e+02, percent-clipped=3.0 2022-11-16 09:28:06,733 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103876.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:28:27,051 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=3.61 vs. limit=5.0 2022-11-16 09:28:29,290 INFO [train.py:876] (0/4) Epoch 15, batch 2100, loss[loss=0.1961, simple_loss=0.1813, pruned_loss=0.1054, over 3090.00 frames. ], tot_loss[loss=0.09553, simple_loss=0.1288, pruned_loss=0.03113, over 1073752.69 frames. ], batch size: 284, lr: 5.40e-03, grad_scale: 8.0 2022-11-16 09:28:48,744 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=103937.0, num_to_drop=1, layers_to_drop={0} 2022-11-16 09:28:55,842 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 1.249e+02 1.692e+02 2.000e+02 3.700e+02, threshold=3.385e+02, percent-clipped=3.0 2022-11-16 09:29:06,782 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.6256, 2.6914, 2.3897, 2.7017, 2.3375, 1.9011, 2.5207, 3.0110], device='cuda:0'), covar=tensor([0.1151, 0.1597, 0.1701, 0.1415, 0.1155, 0.2220, 0.1300, 0.2172], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0111, 0.0110, 0.0113, 0.0097, 0.0109, 0.0102, 0.0089], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:29:08,251 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.28 vs. limit=5.0 2022-11-16 09:29:24,781 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103990.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:29:29,402 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=103997.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:29:37,374 INFO [train.py:876] (0/4) Epoch 15, batch 2200, loss[loss=0.06281, simple_loss=0.09582, pruned_loss=0.0149, over 5619.00 frames. ], tot_loss[loss=0.09554, simple_loss=0.1287, pruned_loss=0.03119, over 1080993.25 frames. ], batch size: 8, lr: 5.40e-03, grad_scale: 8.0 2022-11-16 09:29:46,230 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104022.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 09:30:02,946 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.2574, 1.4360, 1.1816, 0.9596, 1.1485, 1.6794, 1.6192, 1.5577], device='cuda:0'), covar=tensor([0.1376, 0.0998, 0.1830, 0.2818, 0.1659, 0.1164, 0.1427, 0.1417], device='cuda:0'), in_proj_covar=tensor([0.0164, 0.0180, 0.0168, 0.0184, 0.0189, 0.0208, 0.0176, 0.0186], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:30:04,014 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.673e+01 1.367e+02 1.685e+02 2.131e+02 5.334e+02, threshold=3.371e+02, percent-clipped=2.0 2022-11-16 09:30:04,857 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=104049.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:30:06,194 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=104051.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:30:10,744 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=104058.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:30:18,827 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104070.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 09:30:36,202 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104096.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:30:45,073 INFO [train.py:876] (0/4) Epoch 15, batch 2300, loss[loss=0.07356, simple_loss=0.1206, pruned_loss=0.01324, over 5821.00 frames. ], tot_loss[loss=0.0959, simple_loss=0.1286, pruned_loss=0.03161, over 1080204.40 frames. ], batch size: 18, lr: 5.39e-03, grad_scale: 8.0 2022-11-16 09:30:45,938 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=104110.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:30:52,930 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104121.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:31:10,965 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.069e+01 1.436e+02 1.695e+02 2.146e+02 3.837e+02, threshold=3.391e+02, percent-clipped=3.0 2022-11-16 09:31:11,104 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.3106, 3.9169, 2.6791, 3.7625, 3.1134, 2.7771, 2.3223, 3.2614], device='cuda:0'), covar=tensor([0.1518, 0.0339, 0.1207, 0.0378, 0.0953, 0.1179, 0.1919, 0.0571], device='cuda:0'), in_proj_covar=tensor([0.0150, 0.0141, 0.0149, 0.0143, 0.0169, 0.0162, 0.0154, 0.0155], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:31:22,805 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.1727, 1.3803, 2.3259, 2.0857, 1.8771, 2.2206, 2.3178, 1.8397], device='cuda:0'), covar=tensor([0.0061, 0.0206, 0.0046, 0.0050, 0.0131, 0.0121, 0.0037, 0.0047], device='cuda:0'), in_proj_covar=tensor([0.0034, 0.0030, 0.0031, 0.0040, 0.0035, 0.0032, 0.0039, 0.0038], device='cuda:0'), out_proj_covar=tensor([3.1689e-05, 2.8339e-05, 2.8225e-05, 3.7782e-05, 3.2487e-05, 3.0199e-05, 3.6763e-05, 3.6381e-05], device='cuda:0') 2022-11-16 09:31:52,449 INFO [train.py:876] (0/4) Epoch 15, batch 2400, loss[loss=0.08697, simple_loss=0.122, pruned_loss=0.02594, over 5770.00 frames. ], tot_loss[loss=0.09666, simple_loss=0.1298, pruned_loss=0.03177, over 1083793.51 frames. ], batch size: 16, lr: 5.39e-03, grad_scale: 8.0 2022-11-16 09:31:55,580 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.17 vs. limit=2.0 2022-11-16 09:32:07,832 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.4519, 3.8835, 3.5929, 3.9102, 3.8789, 3.3582, 3.4496, 3.5220], device='cuda:0'), covar=tensor([0.0840, 0.0490, 0.1097, 0.0353, 0.0429, 0.0500, 0.0638, 0.0484], device='cuda:0'), in_proj_covar=tensor([0.0134, 0.0185, 0.0277, 0.0178, 0.0220, 0.0177, 0.0191, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 09:32:08,470 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104232.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 09:32:19,521 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.767e+01 1.368e+02 1.673e+02 2.290e+02 7.384e+02, threshold=3.347e+02, percent-clipped=4.0 2022-11-16 09:32:20,300 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.1917, 4.6430, 4.2243, 4.7087, 4.6296, 3.9913, 4.3578, 4.1635], device='cuda:0'), covar=tensor([0.0300, 0.0519, 0.1416, 0.0314, 0.0463, 0.0570, 0.0495, 0.0493], device='cuda:0'), in_proj_covar=tensor([0.0135, 0.0186, 0.0277, 0.0179, 0.0220, 0.0178, 0.0191, 0.0179], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 09:32:40,596 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.8445, 1.4952, 2.0100, 1.6421, 1.7196, 1.6186, 1.8187, 1.6383], device='cuda:0'), covar=tensor([0.0042, 0.0109, 0.0054, 0.0075, 0.0060, 0.0146, 0.0052, 0.0068], device='cuda:0'), in_proj_covar=tensor([0.0035, 0.0031, 0.0032, 0.0040, 0.0036, 0.0032, 0.0040, 0.0039], device='cuda:0'), out_proj_covar=tensor([3.2295e-05, 2.8827e-05, 2.8700e-05, 3.8309e-05, 3.3104e-05, 3.0838e-05, 3.7609e-05, 3.7116e-05], device='cuda:0') 2022-11-16 09:32:51,568 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.77 vs. limit=2.0 2022-11-16 09:32:57,735 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.0902, 2.7437, 2.9748, 3.9428, 4.0273, 2.8638, 2.7211, 3.8869], device='cuda:0'), covar=tensor([0.0650, 0.2575, 0.2120, 0.2267, 0.1063, 0.2800, 0.2131, 0.0813], device='cuda:0'), in_proj_covar=tensor([0.0261, 0.0193, 0.0186, 0.0296, 0.0228, 0.0199, 0.0188, 0.0250], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 09:33:00,203 INFO [train.py:876] (0/4) Epoch 15, batch 2500, loss[loss=0.09978, simple_loss=0.1266, pruned_loss=0.03646, over 5683.00 frames. ], tot_loss[loss=0.09634, simple_loss=0.1293, pruned_loss=0.0317, over 1081292.24 frames. ], batch size: 17, lr: 5.39e-03, grad_scale: 8.0 2022-11-16 09:33:10,447 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.94 vs. limit=2.0 2022-11-16 09:33:25,645 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104346.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:33:27,458 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.221e+01 1.438e+02 1.846e+02 2.292e+02 4.407e+02, threshold=3.693e+02, percent-clipped=4.0 2022-11-16 09:33:30,198 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104353.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:33:59,807 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104396.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:34:06,225 INFO [zipformer.py:623] (0/4) warmup_begin=1333.3, warmup_end=2000.0, batch_count=104405.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:34:08,831 INFO [train.py:876] (0/4) Epoch 15, batch 2600, loss[loss=0.06839, simple_loss=0.1131, pruned_loss=0.01182, over 5502.00 frames. ], tot_loss[loss=0.0955, simple_loss=0.1287, pruned_loss=0.03116, over 1085316.04 frames. ], batch size: 17, lr: 5.39e-03, grad_scale: 8.0 2022-11-16 09:34:16,657 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104421.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:34:31,322 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.8770, 3.7403, 3.9362, 3.6196, 3.8031, 3.7800, 1.6111, 3.9722], device='cuda:0'), covar=tensor([0.0241, 0.0367, 0.0233, 0.0313, 0.0286, 0.0382, 0.2939, 0.0326], device='cuda:0'), in_proj_covar=tensor([0.0104, 0.0088, 0.0088, 0.0082, 0.0101, 0.0089, 0.0130, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 09:34:32,030 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([1.6672, 1.3830, 1.3273, 1.1237, 1.4619, 1.6539, 0.7809, 1.2308], device='cuda:0'), covar=tensor([0.0344, 0.0337, 0.0393, 0.0589, 0.0279, 0.0503, 0.0697, 0.0502], device='cuda:0'), in_proj_covar=tensor([0.0017, 0.0026, 0.0019, 0.0022, 0.0019, 0.0017, 0.0025, 0.0018], device='cuda:0'), out_proj_covar=tensor([9.5179e-05, 1.3288e-04, 1.0153e-04, 1.1441e-04, 1.0267e-04, 9.6842e-05, 1.2718e-04, 9.8168e-05], device='cuda:0') 2022-11-16 09:34:32,570 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104444.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:34:35,747 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.383e+01 1.354e+02 1.630e+02 1.866e+02 3.463e+02, threshold=3.260e+02, percent-clipped=0.0 2022-11-16 09:34:49,468 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104469.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:35:11,403 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5545, 4.4315, 3.5454, 2.2155, 4.1761, 1.9235, 4.1404, 2.4082], device='cuda:0'), covar=tensor([0.1364, 0.0126, 0.0524, 0.1757, 0.0192, 0.1594, 0.0338, 0.1336], device='cuda:0'), in_proj_covar=tensor([0.0116, 0.0104, 0.0113, 0.0109, 0.0102, 0.0117, 0.0100, 0.0106], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0004, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:35:16,363 INFO [train.py:876] (0/4) Epoch 15, batch 2700, loss[loss=0.08729, simple_loss=0.1238, pruned_loss=0.02541, over 5589.00 frames. ], tot_loss[loss=0.09529, simple_loss=0.1286, pruned_loss=0.031, over 1084392.09 frames. ], batch size: 16, lr: 5.38e-03, grad_scale: 8.0 2022-11-16 09:35:31,686 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104532.0, num_to_drop=1, layers_to_drop={2} 2022-11-16 09:35:40,382 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.6098, 2.3626, 3.3282, 2.8839, 3.2781, 2.1270, 2.8964, 3.6131], device='cuda:0'), covar=tensor([0.0642, 0.1310, 0.0759, 0.1186, 0.0680, 0.1493, 0.1181, 0.0654], device='cuda:0'), in_proj_covar=tensor([0.0244, 0.0189, 0.0214, 0.0209, 0.0239, 0.0196, 0.0226, 0.0229], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:35:41,833 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.09 vs. limit=5.0 2022-11-16 09:35:42,732 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.521e+01 1.320e+02 1.626e+02 1.993e+02 3.646e+02, threshold=3.252e+02, percent-clipped=1.0 2022-11-16 09:36:04,049 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104580.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:36:10,200 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.4680, 4.3064, 2.9153, 4.2265, 3.4908, 3.0812, 2.5545, 3.7305], device='cuda:0'), covar=tensor([0.1383, 0.0391, 0.1270, 0.0473, 0.0730, 0.1051, 0.2087, 0.0468], device='cuda:0'), in_proj_covar=tensor([0.0149, 0.0140, 0.0148, 0.0144, 0.0169, 0.0161, 0.0153, 0.0153], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:36:24,516 INFO [train.py:876] (0/4) Epoch 15, batch 2800, loss[loss=0.09651, simple_loss=0.1306, pruned_loss=0.03123, over 5610.00 frames. ], tot_loss[loss=0.09613, simple_loss=0.1291, pruned_loss=0.03157, over 1081870.37 frames. ], batch size: 50, lr: 5.38e-03, grad_scale: 8.0 2022-11-16 09:36:49,221 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104646.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:36:51,003 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.819e+01 1.524e+02 1.785e+02 2.265e+02 4.174e+02, threshold=3.570e+02, percent-clipped=2.0 2022-11-16 09:36:54,180 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104653.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:37:21,816 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104694.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:37:26,309 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104701.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:37:29,412 INFO [zipformer.py:623] (0/4) warmup_begin=2000.0, warmup_end=2666.7, batch_count=104705.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:37:30,840 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.0618, 4.3095, 3.9787, 3.5574, 2.1077, 4.1859, 2.5177, 3.4472], device='cuda:0'), covar=tensor([0.0402, 0.0134, 0.0147, 0.0315, 0.0629, 0.0141, 0.0548, 0.0227], device='cuda:0'), in_proj_covar=tensor([0.0195, 0.0187, 0.0184, 0.0209, 0.0198, 0.0186, 0.0195, 0.0187], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003], device='cuda:0') 2022-11-16 09:37:32,267 INFO [train.py:876] (0/4) Epoch 15, batch 2900, loss[loss=0.1084, simple_loss=0.1318, pruned_loss=0.04247, over 4705.00 frames. ], tot_loss[loss=0.0966, simple_loss=0.1289, pruned_loss=0.03215, over 1073038.14 frames. ], batch size: 135, lr: 5.38e-03, grad_scale: 8.0 2022-11-16 09:37:34,887 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8984, 2.8828, 2.5618, 2.8883, 2.9120, 2.5560, 2.5118, 2.6853], device='cuda:0'), covar=tensor([0.0372, 0.0731, 0.1553, 0.0567, 0.0654, 0.0642, 0.1171, 0.0793], device='cuda:0'), in_proj_covar=tensor([0.0135, 0.0185, 0.0278, 0.0179, 0.0218, 0.0178, 0.0191, 0.0181], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 09:37:59,214 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.093e+01 1.400e+02 1.739e+02 2.219e+02 5.401e+02, threshold=3.478e+02, percent-clipped=7.0 2022-11-16 09:38:02,045 INFO [zipformer.py:623] (0/4) warmup_begin=666.7, warmup_end=1333.3, batch_count=104753.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:38:40,269 INFO [train.py:876] (0/4) Epoch 15, batch 3000, loss[loss=0.06672, simple_loss=0.1041, pruned_loss=0.01468, over 4582.00 frames. ], tot_loss[loss=0.09665, simple_loss=0.1292, pruned_loss=0.03203, over 1085304.11 frames. ], batch size: 5, lr: 5.38e-03, grad_scale: 8.0 2022-11-16 09:38:40,270 INFO [train.py:899] (0/4) Computing validation loss 2022-11-16 09:38:58,034 INFO [train.py:908] (0/4) Epoch 15, validation: loss=0.1809, simple_loss=0.1888, pruned_loss=0.08654, over 1530663.00 frames. 2022-11-16 09:38:58,035 INFO [train.py:909] (0/4) Maximum memory allocated so far is 4849MB 2022-11-16 09:39:25,316 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.210e+01 1.336e+02 1.693e+02 2.076e+02 4.663e+02, threshold=3.385e+02, percent-clipped=3.0 2022-11-16 09:39:43,216 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.41 vs. limit=2.0 2022-11-16 09:40:06,260 INFO [train.py:876] (0/4) Epoch 15, batch 3100, loss[loss=0.102, simple_loss=0.1317, pruned_loss=0.03612, over 5532.00 frames. ], tot_loss[loss=0.09785, simple_loss=0.1305, pruned_loss=0.0326, over 1084249.98 frames. ], batch size: 15, lr: 5.37e-03, grad_scale: 8.0 2022-11-16 09:40:33,542 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.354e+01 1.349e+02 1.627e+02 2.090e+02 3.507e+02, threshold=3.255e+02, percent-clipped=1.0 2022-11-16 09:40:52,841 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([0.6187, 0.8276, 0.7708, 0.7391, 0.7970, 0.8914, 0.6626, 0.8787], device='cuda:0'), covar=tensor([0.0111, 0.0064, 0.0078, 0.0069, 0.0078, 0.0072, 0.0108, 0.0060], device='cuda:0'), in_proj_covar=tensor([0.0069, 0.0064, 0.0063, 0.0067, 0.0066, 0.0062, 0.0059, 0.0059], device='cuda:0'), out_proj_covar=tensor([6.1315e-05, 5.6227e-05, 5.4964e-05, 5.9092e-05, 5.8385e-05, 5.3294e-05, 5.2389e-05, 5.0854e-05], device='cuda:0') 2022-11-16 09:40:56,638 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.9207, 3.0826, 3.6104, 4.5457, 4.5392, 3.8202, 3.4681, 4.5291], device='cuda:0'), covar=tensor([0.0658, 0.2949, 0.2089, 0.2364, 0.1234, 0.2681, 0.2294, 0.0611], device='cuda:0'), in_proj_covar=tensor([0.0260, 0.0194, 0.0184, 0.0297, 0.0226, 0.0198, 0.0190, 0.0250], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 09:41:09,154 INFO [checkpoint.py:75] (0/4) Saving checkpoint to pruned_transducer_stateless7/exp/v2/checkpoint-105000.pt 2022-11-16 09:41:18,930 INFO [train.py:876] (0/4) Epoch 15, batch 3200, loss[loss=0.132, simple_loss=0.1513, pruned_loss=0.05639, over 5570.00 frames. ], tot_loss[loss=0.09633, simple_loss=0.1289, pruned_loss=0.0319, over 1082533.95 frames. ], batch size: 54, lr: 5.37e-03, grad_scale: 8.0 2022-11-16 09:41:24,362 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2455, 4.3091, 2.7417, 4.0647, 3.3067, 2.9482, 2.2734, 3.4411], device='cuda:0'), covar=tensor([0.1538, 0.0248, 0.1302, 0.0389, 0.0751, 0.1059, 0.2117, 0.0627], device='cuda:0'), in_proj_covar=tensor([0.0150, 0.0141, 0.0150, 0.0144, 0.0170, 0.0161, 0.0154, 0.0154], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:41:46,947 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 1.417e+02 1.645e+02 2.110e+02 3.664e+02, threshold=3.291e+02, percent-clipped=2.0 2022-11-16 09:41:53,517 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=2.05 vs. limit=2.0 2022-11-16 09:42:28,183 INFO [train.py:876] (0/4) Epoch 15, batch 3300, loss[loss=0.09392, simple_loss=0.1317, pruned_loss=0.02808, over 5584.00 frames. ], tot_loss[loss=0.09549, simple_loss=0.1286, pruned_loss=0.03119, over 1083128.69 frames. ], batch size: 22, lr: 5.37e-03, grad_scale: 8.0 2022-11-16 09:42:31,762 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.90 vs. limit=2.0 2022-11-16 09:42:56,051 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.723e+01 1.318e+02 1.706e+02 2.251e+02 4.750e+02, threshold=3.412e+02, percent-clipped=5.0 2022-11-16 09:43:04,138 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5119, 2.9264, 3.2590, 4.3058, 4.3048, 3.3001, 3.2566, 4.3548], device='cuda:0'), covar=tensor([0.0492, 0.2883, 0.1984, 0.2509, 0.0912, 0.2931, 0.1789, 0.0572], device='cuda:0'), in_proj_covar=tensor([0.0257, 0.0192, 0.0183, 0.0294, 0.0224, 0.0197, 0.0189, 0.0247], device='cuda:0'), out_proj_covar=tensor([0.0006, 0.0005, 0.0005, 0.0006, 0.0005, 0.0005, 0.0005, 0.0006], device='cuda:0') 2022-11-16 09:43:24,283 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=192, metric=1.83 vs. limit=2.0 2022-11-16 09:43:25,235 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.8369, 2.5672, 3.0577, 2.0157, 1.7570, 3.4224, 2.8463, 2.4822], device='cuda:0'), covar=tensor([0.1018, 0.1195, 0.0888, 0.2339, 0.4548, 0.0974, 0.1346, 0.1426], device='cuda:0'), in_proj_covar=tensor([0.0116, 0.0110, 0.0110, 0.0107, 0.0081, 0.0077, 0.0089, 0.0100], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0003, 0.0003, 0.0002, 0.0002, 0.0002, 0.0002], device='cuda:0') 2022-11-16 09:43:37,374 INFO [train.py:876] (0/4) Epoch 15, batch 3400, loss[loss=0.09206, simple_loss=0.1406, pruned_loss=0.02177, over 5540.00 frames. ], tot_loss[loss=0.09438, simple_loss=0.1275, pruned_loss=0.03061, over 1083618.96 frames. ], batch size: 14, lr: 5.37e-03, grad_scale: 8.0 2022-11-16 09:44:01,336 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.5896, 4.6784, 3.6125, 2.1005, 4.4589, 1.9628, 4.2668, 2.2313], device='cuda:0'), covar=tensor([0.1501, 0.0116, 0.0537, 0.2045, 0.0189, 0.1706, 0.0218, 0.1634], device='cuda:0'), in_proj_covar=tensor([0.0118, 0.0106, 0.0115, 0.0110, 0.0103, 0.0120, 0.0102, 0.0108], device='cuda:0'), out_proj_covar=tensor([0.0005, 0.0004, 0.0005, 0.0005, 0.0004, 0.0005, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:44:05,014 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.327e+01 1.478e+02 1.801e+02 2.310e+02 1.078e+03, threshold=3.602e+02, percent-clipped=8.0 2022-11-16 09:44:18,289 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([5.0618, 4.5260, 4.8790, 4.4844, 5.1365, 4.9863, 4.3561, 5.1148], device='cuda:0'), covar=tensor([0.0422, 0.0381, 0.0466, 0.0374, 0.0388, 0.0241, 0.0314, 0.0280], device='cuda:0'), in_proj_covar=tensor([0.0155, 0.0161, 0.0116, 0.0152, 0.0197, 0.0120, 0.0135, 0.0163], device='cuda:0'), out_proj_covar=tensor([0.0003, 0.0003, 0.0002, 0.0003, 0.0004, 0.0002, 0.0002, 0.0003], device='cuda:0') 2022-11-16 09:44:46,632 INFO [train.py:876] (0/4) Epoch 15, batch 3500, loss[loss=0.09655, simple_loss=0.1276, pruned_loss=0.03276, over 5579.00 frames. ], tot_loss[loss=0.09547, simple_loss=0.1285, pruned_loss=0.03124, over 1084128.04 frames. ], batch size: 16, lr: 5.36e-03, grad_scale: 8.0 2022-11-16 09:45:15,056 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 1.261e+02 1.523e+02 1.855e+02 3.568e+02, threshold=3.045e+02, percent-clipped=0.0 2022-11-16 09:45:22,054 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.2881, 1.6522, 2.1920, 2.1610, 1.3988, 2.2936, 2.3304, 1.9923], device='cuda:0'), covar=tensor([0.0067, 0.0197, 0.0069, 0.0062, 0.0278, 0.0067, 0.0055, 0.0057], device='cuda:0'), in_proj_covar=tensor([0.0035, 0.0030, 0.0032, 0.0040, 0.0036, 0.0032, 0.0040, 0.0038], device='cuda:0'), out_proj_covar=tensor([3.1986e-05, 2.8384e-05, 2.8648e-05, 3.8158e-05, 3.2963e-05, 3.0516e-05, 3.7603e-05, 3.6589e-05], device='cuda:0') 2022-11-16 09:45:29,162 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([4.5094, 4.4239, 4.5544, 4.5685, 4.1537, 3.6734, 5.0211, 4.4428], device='cuda:0'), covar=tensor([0.0441, 0.0861, 0.0414, 0.1152, 0.0537, 0.0467, 0.0623, 0.0614], device='cuda:0'), in_proj_covar=tensor([0.0091, 0.0114, 0.0099, 0.0126, 0.0092, 0.0084, 0.0148, 0.0110], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0002, 0.0003, 0.0002, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 09:45:42,589 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=105388.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:45:55,330 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([3.9233, 4.3970, 3.7683, 4.3878, 4.2589, 3.7553, 4.1458, 3.9735], device='cuda:0'), covar=tensor([0.0524, 0.0641, 0.1479, 0.0571, 0.0739, 0.0687, 0.0894, 0.0888], device='cuda:0'), in_proj_covar=tensor([0.0136, 0.0184, 0.0275, 0.0178, 0.0219, 0.0177, 0.0191, 0.0178], device='cuda:0'), out_proj_covar=tensor([0.0002, 0.0003, 0.0004, 0.0002, 0.0003, 0.0002, 0.0003, 0.0002], device='cuda:0') 2022-11-16 09:45:57,219 INFO [train.py:876] (0/4) Epoch 15, batch 3600, loss[loss=0.05408, simple_loss=0.09576, pruned_loss=0.006199, over 5526.00 frames. ], tot_loss[loss=0.09506, simple_loss=0.1282, pruned_loss=0.03094, over 1084123.63 frames. ], batch size: 10, lr: 5.36e-03, grad_scale: 8.0 2022-11-16 09:46:01,763 INFO [scaling.py:664] (0/4) Whitening: num_groups=1, num_channels=384, metric=4.18 vs. limit=5.0 2022-11-16 09:46:25,418 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.909e+01 1.343e+02 1.539e+02 1.882e+02 3.139e+02, threshold=3.078e+02, percent-clipped=1.0 2022-11-16 09:46:25,638 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=105449.0, num_to_drop=1, layers_to_drop={1} 2022-11-16 09:46:46,355 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.9750, 3.2118, 2.9366, 3.1625, 2.7004, 3.2170, 3.2240, 3.5058], device='cuda:0'), covar=tensor([0.0925, 0.0911, 0.1271, 0.0956, 0.1264, 0.0849, 0.0785, 0.2407], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0110, 0.0108, 0.0111, 0.0096, 0.0107, 0.0100, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0') 2022-11-16 09:46:52,059 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=105486.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:47:03,222 INFO [zipformer.py:623] (0/4) warmup_begin=2666.7, warmup_end=3333.3, batch_count=105502.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:47:07,858 INFO [train.py:876] (0/4) Epoch 15, batch 3700, loss[loss=0.08474, simple_loss=0.1106, pruned_loss=0.02945, over 4949.00 frames. ], tot_loss[loss=0.09637, simple_loss=0.1293, pruned_loss=0.03174, over 1075080.39 frames. ], batch size: 109, lr: 5.36e-03, grad_scale: 8.0 2022-11-16 09:47:11,062 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.57 vs. limit=2.0 2022-11-16 09:47:34,336 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=105547.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:47:35,430 INFO [optim.py:343] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.168e+01 1.329e+02 1.592e+02 1.987e+02 4.072e+02, threshold=3.183e+02, percent-clipped=1.0 2022-11-16 09:47:36,537 INFO [scaling.py:664] (0/4) Whitening: num_groups=8, num_channels=96, metric=1.18 vs. limit=2.0 2022-11-16 09:47:45,375 INFO [zipformer.py:623] (0/4) warmup_begin=3333.3, warmup_end=4000.0, batch_count=105563.0, num_to_drop=0, layers_to_drop=set() 2022-11-16 09:48:04,610 INFO [zipformer.py:1411] (0/4) attn_weights_entropy = tensor([2.0612, 2.1314, 2.0859, 2.1512, 1.9767, 1.5617, 1.9525, 2.3525], device='cuda:0'), covar=tensor([0.1824, 0.1485, 0.1785, 0.1299, 0.1340, 0.2434, 0.1517, 0.1136], device='cuda:0'), in_proj_covar=tensor([0.0119, 0.0110, 0.0109, 0.0112, 0.0096, 0.0107, 0.0100, 0.0088], device='cuda:0'), out_proj_covar=tensor([0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004], device='cuda:0')