2023-10-10 22:44:47,590 ----------------------------------------------------------------------------------------------------
2023-10-10 22:44:47,592 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-10 22:44:47,593 ----------------------------------------------------------------------------------------------------
2023-10-10 22:44:47,593 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-10 22:44:47,593 ----------------------------------------------------------------------------------------------------
2023-10-10 22:44:47,593 Train:  7142 sentences
2023-10-10 22:44:47,593         (train_with_dev=False, train_with_test=False)
2023-10-10 22:44:47,593 ----------------------------------------------------------------------------------------------------
2023-10-10 22:44:47,593 Training Params:
2023-10-10 22:44:47,593  - learning_rate: "0.00015"
2023-10-10 22:44:47,593  - mini_batch_size: "4"
2023-10-10 22:44:47,593  - max_epochs: "10"
2023-10-10 22:44:47,594  - shuffle: "True"
2023-10-10 22:44:47,594 ----------------------------------------------------------------------------------------------------
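For orientation, the following is a minimal sketch of how a comparable tagger and corpus could be assembled with Flair's public API. It is not the exact script behind this log: the generic TransformerWordEmbeddings class stands in for the ByT5Embeddings wrapper printed above, the Hugging Face model id is inferred from the training base path logged further down, and the corpus-loader arguments are assumptions based on the cached dataset path.

# Minimal sketch, not the actual training script for this run.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

# 7142 train / 698 dev / 2570 test sentences; the cache path above ends in
# "with_doc_seperator", suggesting document separators were added.
corpus = NER_HIPE_2022(dataset_name="newseye", language="fr", add_document_separator=True)

embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # assumed id, inferred from the base path
    layers="-1",               # "layers-1" in the base path
    subtoken_pooling="first",  # "poolingfirst" in the base path
    fine_tune=True,
)

label_dict = corpus.make_label_dictionary(label_type="ner")  # 17 BIOES tags, matching out_features=17

tagger = SequenceTagger(
    hidden_size=256,              # ignored here since no RNN is stacked
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_rnn=False,
    use_crf=False,                # "crfFalse": plain linear + CrossEntropyLoss head, as printed above
    reproject_embeddings=False,   # no embedding2nn layer appears in the printed model
)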
2023-10-10 22:44:47,594 Plugins:
2023-10-10 22:44:47,594  - TensorboardLogger
2023-10-10 22:44:47,594  - LinearScheduler | warmup_fraction: '0.1'
2023-10-10 22:44:47,594 ----------------------------------------------------------------------------------------------------
2023-10-10 22:44:47,594 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 22:44:47,594  - metric: "('micro avg', 'f1-score')"
2023-10-10 22:44:47,594 ----------------------------------------------------------------------------------------------------
2023-10-10 22:44:47,594 Computation:
2023-10-10 22:44:47,594  - compute on device: cuda:0
2023-10-10 22:44:47,594  - embedding storage: none
2023-10-10 22:44:47,594 ----------------------------------------------------------------------------------------------------
2023-10-10 22:44:47,594 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
2023-10-10 22:44:47,594 ----------------------------------------------------------------------------------------------------
2023-10-10 22:44:47,595 ----------------------------------------------------------------------------------------------------
2023-10-10 22:44:47,595 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-10 22:45:45,122 epoch 1 - iter 178/1786 - loss 2.82297678 - time (sec): 57.53 - samples/sec: 442.90 - lr: 0.000015 - momentum: 0.000000
2023-10-10 22:46:40,872 epoch 1 - iter 356/1786 - loss 2.69802388 - time (sec): 113.28 - samples/sec: 445.03 - lr: 0.000030 - momentum: 0.000000
2023-10-10 22:47:36,688 epoch 1 - iter 534/1786 - loss 2.43012170 - time (sec): 169.09 - samples/sec: 443.81 - lr: 0.000045 - momentum: 0.000000
2023-10-10 22:48:32,330 epoch 1 - iter 712/1786 - loss 2.13148166 - time (sec): 224.73 - samples/sec: 442.71 - lr: 0.000060 - momentum: 0.000000
2023-10-10 22:49:29,955 epoch 1 - iter 890/1786 - loss 1.82875030 - time (sec): 282.36 - samples/sec: 445.19 - lr: 0.000075 - momentum: 0.000000
2023-10-10 22:50:25,069 epoch 1 - iter 1068/1786 - loss 1.63107710 - time (sec): 337.47 - samples/sec: 440.99 - lr: 0.000090 - momentum: 0.000000
2023-10-10 22:51:20,820 epoch 1 - iter 1246/1786 - loss 1.46922238 - time (sec): 393.22 - samples/sec: 438.43 - lr: 0.000105 - momentum: 0.000000
2023-10-10 22:52:17,591 epoch 1 - iter 1424/1786 - loss 1.33149873 - time (sec): 449.99 - samples/sec: 439.23 - lr: 0.000120 - momentum: 0.000000
2023-10-10 22:53:14,658 epoch 1 - iter 1602/1786 - loss 1.21920691 - time (sec): 507.06 - samples/sec: 440.59 - lr: 0.000134 - momentum: 0.000000
2023-10-10 22:54:11,490 epoch 1 - iter 1780/1786 - loss 1.13034640 - time (sec): 563.89 - samples/sec: 439.76 - lr: 0.000149 - momentum: 0.000000
2023-10-10 22:54:13,204 ----------------------------------------------------------------------------------------------------
2023-10-10 22:54:13,205 EPOCH 1 done: loss 1.1277 - lr: 0.000149
2023-10-10 22:54:33,067 DEV : loss 0.23323342204093933 - f1-score (micro avg) 0.4173
2023-10-10 22:54:33,096 saving best model
2023-10-10 22:54:33,952 ----------------------------------------------------------------------------------------------------
2023-10-10 22:55:30,220 epoch 2 - iter 178/1786 - loss 0.24102985 - time (sec): 56.27 - samples/sec: 470.07 - lr: 0.000148 - momentum: 0.000000
2023-10-10 22:56:23,566 epoch 2 - iter 356/1786 - loss 0.23283162 - time (sec): 109.61 - samples/sec: 460.21 - lr: 0.000147 - momentum: 0.000000
2023-10-10 22:57:18,800 epoch 2 - iter 534/1786 - loss 0.21609252 - time (sec): 164.85 - samples/sec: 450.65 - lr: 0.000145 - momentum: 0.000000
2023-10-10 22:58:13,255 epoch 2 - iter 712/1786 - loss 0.20093014 - time (sec): 219.30 - samples/sec: 452.98 - lr: 0.000143 - momentum: 0.000000
2023-10-10 22:59:08,194 epoch 2 - iter 890/1786 - loss 0.18923384 - time (sec): 274.24 - samples/sec: 452.74 - lr: 0.000142 - momentum: 0.000000
2023-10-10 23:00:01,459 epoch 2 - iter 1068/1786 - loss 0.18291744 - time (sec): 327.50 - samples/sec: 452.31 - lr: 0.000140 - momentum: 0.000000
2023-10-10 23:00:56,471 epoch 2 - iter 1246/1786 - loss 0.17526909 - time (sec): 382.52 - samples/sec: 453.63 - lr: 0.000138 - momentum: 0.000000
2023-10-10 23:01:52,071 epoch 2 - iter 1424/1786 - loss 0.16882287 - time (sec): 438.12 - samples/sec: 455.71 - lr: 0.000137 - momentum: 0.000000
2023-10-10 23:02:46,754 epoch 2 - iter 1602/1786 - loss 0.16338648 - time (sec): 492.80 - samples/sec: 453.92 - lr: 0.000135 - momentum: 0.000000
2023-10-10 23:03:41,905 epoch 2 - iter 1780/1786 - loss 0.15840008 - time (sec): 547.95 - samples/sec: 452.81 - lr: 0.000133 - momentum: 0.000000
2023-10-10 23:03:43,567 ----------------------------------------------------------------------------------------------------
2023-10-10 23:03:43,567 EPOCH 2 done: loss 0.1584 - lr: 0.000133
2023-10-10 23:04:05,708 DEV : loss 0.11548721790313721 - f1-score (micro avg) 0.7577
2023-10-10 23:04:05,741 saving best model
2023-10-10 23:04:15,609 ----------------------------------------------------------------------------------------------------
2023-10-10 23:05:10,541 epoch 3 - iter 178/1786 - loss 0.08136549 - time (sec): 54.93 - samples/sec: 434.94 - lr: 0.000132 - momentum: 0.000000
2023-10-10 23:06:06,807 epoch 3 - iter 356/1786 - loss 0.07652555 - time (sec): 111.19 - samples/sec: 444.12 - lr: 0.000130 - momentum: 0.000000
2023-10-10 23:07:00,995 epoch 3 - iter 534/1786 - loss 0.08122126 - time (sec): 165.38 - samples/sec: 447.71 - lr: 0.000128 - momentum: 0.000000
2023-10-10 23:07:56,758 epoch 3 - iter 712/1786 - loss 0.08475757 - time (sec): 221.15 - samples/sec: 440.39 - lr: 0.000127 - momentum: 0.000000
2023-10-10 23:08:52,555 epoch 3 - iter 890/1786 - loss 0.08475656 - time (sec): 276.94 - samples/sec: 445.13 - lr: 0.000125 - momentum: 0.000000
2023-10-10 23:09:48,953 epoch 3 - iter 1068/1786 - loss 0.08234623 - time (sec): 333.34 - samples/sec: 444.34 - lr: 0.000123 - momentum: 0.000000
2023-10-10 23:10:44,473 epoch 3 - iter 1246/1786 - loss 0.07964466 - time (sec): 388.86 - samples/sec: 444.66 - lr: 0.000122 - momentum: 0.000000
2023-10-10 23:11:41,277 epoch 3 - iter 1424/1786 - loss 0.07944389 - time (sec): 445.66 - samples/sec: 445.15 - lr: 0.000120 - momentum: 0.000000
2023-10-10 23:12:37,418 epoch 3 - iter 1602/1786 - loss 0.07901652 - time (sec): 501.80 - samples/sec: 448.61 - lr: 0.000118 - momentum: 0.000000
2023-10-10 23:13:29,629 epoch 3 - iter 1780/1786 - loss 0.07923333 - time (sec): 554.02 - samples/sec: 447.67 - lr: 0.000117 - momentum: 0.000000
2023-10-10 23:13:31,248 ----------------------------------------------------------------------------------------------------
2023-10-10 23:13:31,248 EPOCH 3 done: loss 0.0792 - lr: 0.000117
2023-10-10 23:13:52,415 DEV : loss 0.1301306039094925 - f1-score (micro avg) 0.7635
2023-10-10 23:13:52,446 saving best model
2023-10-10 23:13:59,901 ----------------------------------------------------------------------------------------------------
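The learning-rate column in the iteration lines above follows the LinearScheduler plugin listed in the header (roughly 10% warmup to 0.00015, then linear decay towards zero). A minimal sketch of a fine-tuning call matching the logged Training Params is shown here; `tagger` and `corpus` are assumed to be the objects from the sketch near the top of this log, and the exact plugin wiring for TensorboardLogger varies between Flair versions and is omitted.

# Minimal sketch of the fine-tuning call implied by the logged Training Params.
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)  # objects from the earlier sketch

trainer.fine_tune(
    "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1",
    learning_rate=0.00015,
    mini_batch_size=4,
    max_epochs=10,
    # shuffle=True and a linear schedule with ~10% warmup are fine_tune defaults,
    # consistent with the lr ramp during epoch 1 and the decay to 0 by epoch 10.
)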
2023-10-10 23:14:57,003 epoch 4 - iter 178/1786 - loss 0.05275662 - time (sec): 57.10 - samples/sec: 436.39 - lr: 0.000115 - momentum: 0.000000
2023-10-10 23:15:52,667 epoch 4 - iter 356/1786 - loss 0.05729787 - time (sec): 112.76 - samples/sec: 436.43 - lr: 0.000113 - momentum: 0.000000
2023-10-10 23:16:49,319 epoch 4 - iter 534/1786 - loss 0.05946890 - time (sec): 169.41 - samples/sec: 435.58 - lr: 0.000112 - momentum: 0.000000
2023-10-10 23:17:46,635 epoch 4 - iter 712/1786 - loss 0.06229628 - time (sec): 226.73 - samples/sec: 437.60 - lr: 0.000110 - momentum: 0.000000
2023-10-10 23:18:44,994 epoch 4 - iter 890/1786 - loss 0.05943268 - time (sec): 285.09 - samples/sec: 440.37 - lr: 0.000108 - momentum: 0.000000
2023-10-10 23:19:41,425 epoch 4 - iter 1068/1786 - loss 0.05885487 - time (sec): 341.52 - samples/sec: 441.36 - lr: 0.000107 - momentum: 0.000000
2023-10-10 23:20:39,891 epoch 4 - iter 1246/1786 - loss 0.05697691 - time (sec): 399.99 - samples/sec: 442.67 - lr: 0.000105 - momentum: 0.000000
2023-10-10 23:21:37,350 epoch 4 - iter 1424/1786 - loss 0.05690202 - time (sec): 457.45 - samples/sec: 440.56 - lr: 0.000103 - momentum: 0.000000
2023-10-10 23:22:34,147 epoch 4 - iter 1602/1786 - loss 0.05713425 - time (sec): 514.24 - samples/sec: 437.45 - lr: 0.000102 - momentum: 0.000000
2023-10-10 23:23:29,165 epoch 4 - iter 1780/1786 - loss 0.05724047 - time (sec): 569.26 - samples/sec: 435.87 - lr: 0.000100 - momentum: 0.000000
2023-10-10 23:23:30,870 ----------------------------------------------------------------------------------------------------
2023-10-10 23:23:30,870 EPOCH 4 done: loss 0.0573 - lr: 0.000100
2023-10-10 23:23:54,716 DEV : loss 0.15049181878566742 - f1-score (micro avg) 0.7765
2023-10-10 23:23:54,760 saving best model
2023-10-10 23:23:58,096 ----------------------------------------------------------------------------------------------------
2023-10-10 23:24:54,936 epoch 5 - iter 178/1786 - loss 0.03945234 - time (sec): 56.84 - samples/sec: 447.40 - lr: 0.000098 - momentum: 0.000000
2023-10-10 23:25:50,552 epoch 5 - iter 356/1786 - loss 0.04091510 - time (sec): 112.45 - samples/sec: 432.68 - lr: 0.000097 - momentum: 0.000000
2023-10-10 23:26:46,765 epoch 5 - iter 534/1786 - loss 0.03892648 - time (sec): 168.66 - samples/sec: 441.54 - lr: 0.000095 - momentum: 0.000000
2023-10-10 23:27:42,683 epoch 5 - iter 712/1786 - loss 0.04293506 - time (sec): 224.58 - samples/sec: 448.27 - lr: 0.000093 - momentum: 0.000000
2023-10-10 23:28:33,681 epoch 5 - iter 890/1786 - loss 0.04244048 - time (sec): 275.58 - samples/sec: 448.05 - lr: 0.000092 - momentum: 0.000000
2023-10-10 23:29:27,510 epoch 5 - iter 1068/1786 - loss 0.04236998 - time (sec): 329.41 - samples/sec: 447.40 - lr: 0.000090 - momentum: 0.000000
2023-10-10 23:30:20,804 epoch 5 - iter 1246/1786 - loss 0.04272781 - time (sec): 382.70 - samples/sec: 450.66 - lr: 0.000088 - momentum: 0.000000
2023-10-10 23:31:16,093 epoch 5 - iter 1424/1786 - loss 0.04290125 - time (sec): 437.99 - samples/sec: 452.49 - lr: 0.000087 - momentum: 0.000000
2023-10-10 23:32:09,690 epoch 5 - iter 1602/1786 - loss 0.04192998 - time (sec): 491.59 - samples/sec: 453.80 - lr: 0.000085 - momentum: 0.000000
2023-10-10 23:33:03,194 epoch 5 - iter 1780/1786 - loss 0.04145570 - time (sec): 545.09 - samples/sec: 455.03 - lr: 0.000083 - momentum: 0.000000
2023-10-10 23:33:04,778 ----------------------------------------------------------------------------------------------------
2023-10-10 23:33:04,779 EPOCH 5 done: loss 0.0415 - lr: 0.000083
2023-10-10 23:33:26,148 DEV : loss 0.1764456331729889 - f1-score (micro avg) 0.7759
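Epoch 5's dev F1 (0.7759) does not beat the running best (0.7765 from epoch 4), so no "saving best model" line follows it. The small sketch below spells out that selection rule as implied by the header ("Final evaluation on model from best epoch (best-model.pt)", metric = dev micro-F1); it is an illustration, not Flair's actual trainer code.

# Sketch of the best-checkpoint rule implied by the log; illustrative only.
class BestModelKeeper:
    def __init__(self, path: str = "best-model.pt") -> None:
        self.path = path
        self.best_f1 = float("-inf")

    def update(self, model, dev_f1: float) -> bool:
        if dev_f1 > self.best_f1:      # strictly better dev micro-F1
            self.best_f1 = dev_f1
            model.save(self.path)      # Flair models provide .save()
            return True                # corresponds to a "saving best model" line
        return False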
2023-10-10 23:33:26,179 ----------------------------------------------------------------------------------------------------
2023-10-10 23:34:19,998 epoch 6 - iter 178/1786 - loss 0.02795317 - time (sec): 53.82 - samples/sec: 463.22 - lr: 0.000082 - momentum: 0.000000
2023-10-10 23:35:15,410 epoch 6 - iter 356/1786 - loss 0.02843129 - time (sec): 109.23 - samples/sec: 453.52 - lr: 0.000080 - momentum: 0.000000
2023-10-10 23:36:11,923 epoch 6 - iter 534/1786 - loss 0.02722917 - time (sec): 165.74 - samples/sec: 453.47 - lr: 0.000078 - momentum: 0.000000
2023-10-10 23:37:06,711 epoch 6 - iter 712/1786 - loss 0.02848990 - time (sec): 220.53 - samples/sec: 449.83 - lr: 0.000077 - momentum: 0.000000
2023-10-10 23:38:02,420 epoch 6 - iter 890/1786 - loss 0.02740472 - time (sec): 276.24 - samples/sec: 445.37 - lr: 0.000075 - momentum: 0.000000
2023-10-10 23:38:58,416 epoch 6 - iter 1068/1786 - loss 0.02716161 - time (sec): 332.23 - samples/sec: 445.09 - lr: 0.000073 - momentum: 0.000000
2023-10-10 23:39:56,084 epoch 6 - iter 1246/1786 - loss 0.02693100 - time (sec): 389.90 - samples/sec: 446.84 - lr: 0.000072 - momentum: 0.000000
2023-10-10 23:40:51,935 epoch 6 - iter 1424/1786 - loss 0.02765192 - time (sec): 445.75 - samples/sec: 446.68 - lr: 0.000070 - momentum: 0.000000
2023-10-10 23:41:49,667 epoch 6 - iter 1602/1786 - loss 0.02865346 - time (sec): 503.49 - samples/sec: 446.24 - lr: 0.000068 - momentum: 0.000000
2023-10-10 23:42:43,134 epoch 6 - iter 1780/1786 - loss 0.02926603 - time (sec): 556.95 - samples/sec: 445.33 - lr: 0.000067 - momentum: 0.000000
2023-10-10 23:42:44,853 ----------------------------------------------------------------------------------------------------
2023-10-10 23:42:44,853 EPOCH 6 done: loss 0.0292 - lr: 0.000067
2023-10-10 23:43:07,588 DEV : loss 0.18137316405773163 - f1-score (micro avg) 0.7884
2023-10-10 23:43:07,619 saving best model
2023-10-10 23:43:10,492 ----------------------------------------------------------------------------------------------------
2023-10-10 23:44:05,615 epoch 7 - iter 178/1786 - loss 0.01383948 - time (sec): 55.12 - samples/sec: 460.35 - lr: 0.000065 - momentum: 0.000000
2023-10-10 23:44:59,535 epoch 7 - iter 356/1786 - loss 0.01746561 - time (sec): 109.04 - samples/sec: 446.78 - lr: 0.000063 - momentum: 0.000000
2023-10-10 23:45:54,596 epoch 7 - iter 534/1786 - loss 0.01663426 - time (sec): 164.10 - samples/sec: 452.10 - lr: 0.000062 - momentum: 0.000000
2023-10-10 23:46:48,264 epoch 7 - iter 712/1786 - loss 0.01902346 - time (sec): 217.77 - samples/sec: 454.40 - lr: 0.000060 - momentum: 0.000000
2023-10-10 23:47:41,899 epoch 7 - iter 890/1786 - loss 0.02013304 - time (sec): 271.40 - samples/sec: 453.06 - lr: 0.000058 - momentum: 0.000000
2023-10-10 23:48:36,745 epoch 7 - iter 1068/1786 - loss 0.01908802 - time (sec): 326.25 - samples/sec: 454.51 - lr: 0.000057 - momentum: 0.000000
2023-10-10 23:49:31,014 epoch 7 - iter 1246/1786 - loss 0.02158022 - time (sec): 380.52 - samples/sec: 455.03 - lr: 0.000055 - momentum: 0.000000
2023-10-10 23:50:23,849 epoch 7 - iter 1424/1786 - loss 0.02132979 - time (sec): 433.35 - samples/sec: 453.35 - lr: 0.000053 - momentum: 0.000000
2023-10-10 23:51:17,892 epoch 7 - iter 1602/1786 - loss 0.02171688 - time (sec): 487.40 - samples/sec: 457.34 - lr: 0.000052 - momentum: 0.000000
2023-10-10 23:52:11,674 epoch 7 - iter 1780/1786 - loss 0.02202637 - time (sec): 541.18 - samples/sec: 458.44 - lr: 0.000050 - momentum: 0.000000
2023-10-10 23:52:13,249 ----------------------------------------------------------------------------------------------------
2023-10-10 23:52:13,249 EPOCH 7 done: loss 0.0222 - lr: 0.000050
2023-10-10 23:52:35,192 DEV : loss 0.19753895699977875 - f1-score (micro avg) 0.788
2023-10-10 23:52:35,227 ----------------------------------------------------------------------------------------------------
2023-10-10 23:53:28,735 epoch 8 - iter 178/1786 - loss 0.01237478 - time (sec): 53.51 - samples/sec: 458.92 - lr: 0.000048 - momentum: 0.000000
2023-10-10 23:54:21,786 epoch 8 - iter 356/1786 - loss 0.01268093 - time (sec): 106.56 - samples/sec: 455.22 - lr: 0.000047 - momentum: 0.000000
2023-10-10 23:55:14,892 epoch 8 - iter 534/1786 - loss 0.01457889 - time (sec): 159.66 - samples/sec: 451.29 - lr: 0.000045 - momentum: 0.000000
2023-10-10 23:56:10,090 epoch 8 - iter 712/1786 - loss 0.01501835 - time (sec): 214.86 - samples/sec: 456.91 - lr: 0.000043 - momentum: 0.000000
2023-10-10 23:57:03,967 epoch 8 - iter 890/1786 - loss 0.01553035 - time (sec): 268.74 - samples/sec: 456.57 - lr: 0.000042 - momentum: 0.000000
2023-10-10 23:57:57,805 epoch 8 - iter 1068/1786 - loss 0.01603908 - time (sec): 322.58 - samples/sec: 452.35 - lr: 0.000040 - momentum: 0.000000
2023-10-10 23:58:51,476 epoch 8 - iter 1246/1786 - loss 0.01600935 - time (sec): 376.25 - samples/sec: 453.99 - lr: 0.000038 - momentum: 0.000000
2023-10-10 23:59:45,167 epoch 8 - iter 1424/1786 - loss 0.01580202 - time (sec): 429.94 - samples/sec: 453.91 - lr: 0.000037 - momentum: 0.000000
2023-10-11 00:00:39,583 epoch 8 - iter 1602/1786 - loss 0.01601921 - time (sec): 484.35 - samples/sec: 456.64 - lr: 0.000035 - momentum: 0.000000
2023-10-11 00:01:35,445 epoch 8 - iter 1780/1786 - loss 0.01577049 - time (sec): 540.22 - samples/sec: 458.60 - lr: 0.000033 - momentum: 0.000000
2023-10-11 00:01:37,335 ----------------------------------------------------------------------------------------------------
2023-10-11 00:01:37,336 EPOCH 8 done: loss 0.0159 - lr: 0.000033
2023-10-11 00:01:59,826 DEV : loss 0.21947550773620605 - f1-score (micro avg) 0.7661
2023-10-11 00:01:59,858 ----------------------------------------------------------------------------------------------------
2023-10-11 00:02:55,469 epoch 9 - iter 178/1786 - loss 0.01523368 - time (sec): 55.61 - samples/sec: 447.74 - lr: 0.000032 - momentum: 0.000000
2023-10-11 00:03:49,363 epoch 9 - iter 356/1786 - loss 0.01407707 - time (sec): 109.50 - samples/sec: 445.95 - lr: 0.000030 - momentum: 0.000000
2023-10-11 00:04:44,566 epoch 9 - iter 534/1786 - loss 0.01448829 - time (sec): 164.71 - samples/sec: 454.38 - lr: 0.000028 - momentum: 0.000000
2023-10-11 00:05:37,166 epoch 9 - iter 712/1786 - loss 0.01334062 - time (sec): 217.31 - samples/sec: 449.99 - lr: 0.000027 - momentum: 0.000000
2023-10-11 00:06:31,408 epoch 9 - iter 890/1786 - loss 0.01253925 - time (sec): 271.55 - samples/sec: 448.63 - lr: 0.000025 - momentum: 0.000000
2023-10-11 00:07:27,023 epoch 9 - iter 1068/1786 - loss 0.01207849 - time (sec): 327.16 - samples/sec: 446.90 - lr: 0.000023 - momentum: 0.000000
2023-10-11 00:08:22,340 epoch 9 - iter 1246/1786 - loss 0.01195030 - time (sec): 382.48 - samples/sec: 445.36 - lr: 0.000022 - momentum: 0.000000
2023-10-11 00:09:18,578 epoch 9 - iter 1424/1786 - loss 0.01110923 - time (sec): 438.72 - samples/sec: 446.22 - lr: 0.000020 - momentum: 0.000000
2023-10-11 00:10:14,720 epoch 9 - iter 1602/1786 - loss 0.01105152 - time (sec): 494.86 - samples/sec: 446.93 - lr: 0.000018 - momentum: 0.000000
2023-10-11 00:11:11,874 epoch 9 - iter 1780/1786 - loss 0.01082556 - time (sec): 552.01 - samples/sec: 449.12 - lr: 0.000017 - momentum: 0.000000
2023-10-11 00:11:13,673 ----------------------------------------------------------------------------------------------------
2023-10-11 00:11:13,673 EPOCH 9 done: loss 0.0109 - lr: 0.000017
2023-10-11 00:11:35,988 DEV : loss 0.23203261196613312 - f1-score (micro avg) 0.7856
2023-10-11 00:11:36,018 ----------------------------------------------------------------------------------------------------
2023-10-11 00:12:32,054 epoch 10 - iter 178/1786 - loss 0.00835357 - time (sec): 56.03 - samples/sec: 450.39 - lr: 0.000015 - momentum: 0.000000
2023-10-11 00:13:26,270 epoch 10 - iter 356/1786 - loss 0.00922388 - time (sec): 110.25 - samples/sec: 446.76 - lr: 0.000013 - momentum: 0.000000
2023-10-11 00:14:18,625 epoch 10 - iter 534/1786 - loss 0.00920895 - time (sec): 162.60 - samples/sec: 444.35 - lr: 0.000012 - momentum: 0.000000
2023-10-11 00:15:12,635 epoch 10 - iter 712/1786 - loss 0.00888713 - time (sec): 216.61 - samples/sec: 455.30 - lr: 0.000010 - momentum: 0.000000
2023-10-11 00:16:07,255 epoch 10 - iter 890/1786 - loss 0.00878696 - time (sec): 271.23 - samples/sec: 460.18 - lr: 0.000008 - momentum: 0.000000
2023-10-11 00:16:59,266 epoch 10 - iter 1068/1786 - loss 0.00942608 - time (sec): 323.25 - samples/sec: 460.00 - lr: 0.000007 - momentum: 0.000000
2023-10-11 00:17:54,373 epoch 10 - iter 1246/1786 - loss 0.00924649 - time (sec): 378.35 - samples/sec: 462.85 - lr: 0.000005 - momentum: 0.000000
2023-10-11 00:18:47,015 epoch 10 - iter 1424/1786 - loss 0.01026711 - time (sec): 430.99 - samples/sec: 460.94 - lr: 0.000003 - momentum: 0.000000
2023-10-11 00:19:40,614 epoch 10 - iter 1602/1786 - loss 0.00973267 - time (sec): 484.59 - samples/sec: 459.29 - lr: 0.000002 - momentum: 0.000000
2023-10-11 00:20:34,899 epoch 10 - iter 1780/1786 - loss 0.00929539 - time (sec): 538.88 - samples/sec: 460.31 - lr: 0.000000 - momentum: 0.000000
2023-10-11 00:20:36,575 ----------------------------------------------------------------------------------------------------
2023-10-11 00:20:36,575 EPOCH 10 done: loss 0.0093 - lr: 0.000000
2023-10-11 00:20:58,741 DEV : loss 0.2327549308538437 - f1-score (micro avg) 0.783
2023-10-11 00:20:59,663 ----------------------------------------------------------------------------------------------------
2023-10-11 00:20:59,665 Loading model from best epoch ...
2023-10-11 00:21:03,556 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 00:22:14,198
Results:
- F-score (micro) 0.7025
- F-score (macro) 0.6076
- Accuracy 0.5549

By class:
              precision    recall  f1-score   support

         LOC     0.6983    0.7123    0.7052      1095
         PER     0.7871    0.7816    0.7843      1012
         ORG     0.4790    0.5434    0.5092       357
   HumanProd     0.3455    0.5758    0.4318        33

   micro avg     0.6909    0.7145    0.7025      2497
   macro avg     0.5775    0.6533    0.6076      2497
weighted avg     0.6983    0.7145    0.7057      2497

2023-10-11 00:22:14,198 ----------------------------------------------------------------------------------------------------
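The test results above come from the checkpoint of the best epoch (epoch 6, dev micro-F1 0.7884). A minimal usage sketch for that checkpoint follows; the path is assembled from the logged base path plus best-model.pt, and the French example sentence is purely illustrative.

# Minimal inference sketch for the saved tagger; path assembled from the base path above.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1/best-model.pt"
)

sentence = Sentence("Le général Boulanger arriva à Paris.")  # illustrative input
tagger.predict(sentence)

# Entity spans are decoded from the 17 BIOES tags listed above (PER, LOC, ORG, HumanProd).
for entity in sentence.get_spans("ner"):
    print(entity)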