2023-10-11 00:14:02,384 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,386 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 00:14:02,386 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,386 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 00:14:02,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,387 Train:  1166 sentences
2023-10-11 00:14:02,387 (train_with_dev=False, train_with_test=False)
2023-10-11 00:14:02,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,387 Training Params:
2023-10-11 00:14:02,387  - learning_rate: "0.00015"
2023-10-11 00:14:02,387  - mini_batch_size: "8"
2023-10-11 00:14:02,387  - max_epochs: "10"
2023-10-11 00:14:02,387  - shuffle: "True"
2023-10-11 00:14:02,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,387 Plugins:
2023-10-11 00:14:02,387  - TensorboardLogger
2023-10-11 00:14:02,387  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 00:14:02,388  - metric: "('micro avg', 'f1-score')"
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Computation:
2023-10-11 00:14:02,388  - compute on device: cuda:0
2023-10-11 00:14:02,388  - embedding storage: none
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 00:14:11,164 epoch 1 - iter 14/146 - loss 2.82817866 - time (sec): 8.77 - samples/sec: 427.52 - lr: 0.000013 - momentum: 0.000000
2023-10-11 00:14:20,453 epoch 1 - iter 28/146 - loss 2.81986952 - time (sec): 18.06 - samples/sec: 450.86 - lr: 0.000028 - momentum: 0.000000
2023-10-11 00:14:29,670 epoch 1 - iter 42/146 - loss 2.81010839 - time (sec): 27.28 - samples/sec: 448.27 - lr: 0.000042 - momentum: 0.000000
2023-10-11 00:14:38,407 epoch 1 - iter 56/146 - loss 2.79282156 - time (sec): 36.02 - samples/sec: 439.82 - lr: 0.000057 - momentum: 0.000000
2023-10-11 00:14:48,463 epoch 1 - iter 70/146 - loss 2.75564153 - time (sec): 46.07 - samples/sec: 449.23 - lr: 0.000071 - momentum: 0.000000
2023-10-11 00:14:58,486 epoch 1 - iter 84/146 - loss 2.70141670 - time (sec): 56.10 - samples/sec: 458.13 - lr: 0.000085 - momentum: 0.000000
2023-10-11 00:15:07,903 epoch 1 - iter 98/146 - loss 2.63744532 - time (sec): 65.51 - samples/sec: 457.86 - lr: 0.000100 - momentum: 0.000000
2023-10-11 00:15:16,446 epoch 1 - iter 112/146 - loss 2.57069765 - time (sec): 74.06 - samples/sec: 459.84 - lr: 0.000114 - momentum: 0.000000
2023-10-11 00:15:25,138 epoch 1 - iter 126/146 - loss 2.48819700 - time (sec): 82.75 - samples/sec: 464.09 - lr: 0.000128 - momentum: 0.000000
2023-10-11 00:15:33,976 epoch 1 - iter 140/146 - loss 2.40859330 - time (sec): 91.59 - samples/sec: 464.91 - lr: 0.000143 - momentum: 0.000000
2023-10-11 00:15:37,658 ----------------------------------------------------------------------------------------------------
2023-10-11 00:15:37,659 EPOCH 1 done: loss 2.3727 - lr: 0.000143
2023-10-11 00:15:42,937 DEV : loss 1.3521078824996948 - f1-score (micro avg)  0.0
2023-10-11 00:15:42,946 ----------------------------------------------------------------------------------------------------
2023-10-11 00:15:50,957 epoch 2 - iter 14/146 - loss 1.37293642 - time (sec): 8.01 - samples/sec: 471.08 - lr: 0.000149 - momentum: 0.000000
2023-10-11 00:15:59,218 epoch 2 - iter 28/146 - loss 1.27588533 - time (sec): 16.27 - samples/sec: 479.57 - lr: 0.000147 - momentum: 0.000000
2023-10-11 00:16:07,875 epoch 2 - iter 42/146 - loss 1.19578947 - time (sec): 24.93 - samples/sec: 484.01 - lr: 0.000145 - momentum: 0.000000
2023-10-11 00:16:16,004 epoch 2 - iter 56/146 - loss 1.12937665 - time (sec): 33.06 - samples/sec: 480.81 - lr: 0.000144 - momentum: 0.000000
2023-10-11 00:16:24,974 epoch 2 - iter 70/146 - loss 1.04224254 - time (sec): 42.03 - samples/sec: 486.91 - lr: 0.000142 - momentum: 0.000000
2023-10-11 00:16:34,067 epoch 2 - iter 84/146 - loss 1.00774700 - time (sec): 51.12 - samples/sec: 489.84 - lr: 0.000141 - momentum: 0.000000
2023-10-11 00:16:42,519 epoch 2 - iter 98/146 - loss 0.96099641 - time (sec): 59.57 - samples/sec: 487.23 - lr: 0.000139 - momentum: 0.000000
2023-10-11 00:16:51,251 epoch 2 - iter 112/146 - loss 0.90901669 - time (sec): 68.30 - samples/sec: 489.49 - lr: 0.000137 - momentum: 0.000000
2023-10-11 00:17:00,072 epoch 2 - iter 126/146 - loss 0.86569250 - time (sec): 77.12 - samples/sec: 491.52 - lr: 0.000136 - momentum: 0.000000
2023-10-11 00:17:09,029 epoch 2 - iter 140/146 - loss 0.83180630 - time (sec): 86.08 - samples/sec: 492.45 - lr: 0.000134 - momentum: 0.000000
2023-10-11 00:17:12,929 ----------------------------------------------------------------------------------------------------
2023-10-11 00:17:12,930 EPOCH 2 done: loss 0.8265 - lr: 0.000134
2023-10-11 00:17:18,512 DEV : loss 0.45962727069854736 - f1-score (micro avg)  0.0
2023-10-11 00:17:18,522 ----------------------------------------------------------------------------------------------------
2023-10-11 00:17:27,515 epoch 3 - iter 14/146 - loss 0.56664628 - time (sec): 8.99 - samples/sec: 550.74 - lr: 0.000132 - momentum: 0.000000
2023-10-11 00:17:36,702 epoch 3 - iter 28/146 - loss 0.51000469 - time (sec): 18.18 - samples/sec: 553.40 - lr: 0.000130 - momentum: 0.000000
2023-10-11 00:17:45,427 epoch 3 - iter 42/146 - loss 0.55411592 - time (sec): 26.90 - samples/sec: 537.41 - lr: 0.000129 - momentum: 0.000000
2023-10-11 00:17:53,675 epoch 3 - iter 56/146 - loss 0.52261842 - time (sec): 35.15 - samples/sec: 527.12 - lr: 0.000127 - momentum: 0.000000
2023-10-11 00:18:02,091 epoch 3 - iter 70/146 - loss 0.51648209 - time (sec): 43.57 - samples/sec: 523.93 - lr: 0.000126 - momentum: 0.000000
2023-10-11 00:18:10,925 epoch 3 - iter 84/146 - loss 0.49728401 - time (sec): 52.40 - samples/sec: 518.27 - lr: 0.000124 - momentum: 0.000000
2023-10-11 00:18:19,329 epoch 3 - iter 98/146 - loss 0.47812146 - time (sec): 60.81 - samples/sec: 512.91 - lr: 0.000122 - momentum: 0.000000
2023-10-11 00:18:27,211 epoch 3 - iter 112/146 - loss 0.47088239 - time (sec): 68.69 - samples/sec: 505.81 - lr: 0.000121 - momentum: 0.000000
2023-10-11 00:18:34,844 epoch 3 - iter 126/146 - loss 0.46014170 - time (sec): 76.32 - samples/sec: 498.40 - lr: 0.000119 - momentum: 0.000000
2023-10-11 00:18:43,390 epoch 3 - iter 140/146 - loss 0.45282216 - time (sec): 84.87 - samples/sec: 496.35 - lr: 0.000118 - momentum: 0.000000
2023-10-11 00:18:47,362 ----------------------------------------------------------------------------------------------------
2023-10-11 00:18:47,362 EPOCH 3 done: loss 0.4440 - lr: 0.000118
2023-10-11 00:18:53,034 DEV : loss 0.28692546486854553 - f1-score (micro avg)  0.1634
2023-10-11 00:18:53,043 saving best model
2023-10-11 00:18:53,929 ----------------------------------------------------------------------------------------------------
2023-10-11 00:19:02,136 epoch 4 - iter 14/146 - loss 0.33858093 - time (sec): 8.21 - samples/sec: 468.00 - lr: 0.000115 - momentum: 0.000000
2023-10-11 00:19:11,252 epoch 4 - iter 28/146 - loss 0.33897577 - time (sec): 17.32 - samples/sec: 482.48 - lr: 0.000114 - momentum: 0.000000
2023-10-11 00:19:19,557 epoch 4 - iter 42/146 - loss 0.32510505 - time (sec): 25.63 - samples/sec: 480.07 - lr: 0.000112 - momentum: 0.000000
2023-10-11 00:19:27,999 epoch 4 - iter 56/146 - loss 0.33741625 - time (sec): 34.07 - samples/sec: 484.30 - lr: 0.000111 - momentum: 0.000000
2023-10-11 00:19:36,717 epoch 4 - iter 70/146 - loss 0.32330073 - time (sec): 42.79 - samples/sec: 494.60 - lr: 0.000109 - momentum: 0.000000
2023-10-11 00:19:45,262 epoch 4 - iter 84/146 - loss 0.35066962 - time (sec): 51.33 - samples/sec: 491.95 - lr: 0.000107 - momentum: 0.000000
2023-10-11 00:19:53,579 epoch 4 - iter 98/146 - loss 0.34254804 - time (sec): 59.65 - samples/sec: 491.65 - lr: 0.000106 - momentum: 0.000000
2023-10-11 00:20:02,300 epoch 4 - iter 112/146 - loss 0.33425876 - time (sec): 68.37 - samples/sec: 495.96 - lr: 0.000104 - momentum: 0.000000
2023-10-11 00:20:10,709 epoch 4 - iter 126/146 - loss 0.33494481 - time (sec): 76.78 - samples/sec: 495.05 - lr: 0.000103 - momentum: 0.000000
2023-10-11 00:20:19,699 epoch 4 - iter 140/146 - loss 0.32750282 - time (sec): 85.77 - samples/sec: 494.85 - lr: 0.000101 - momentum: 0.000000
2023-10-11 00:20:23,418 ----------------------------------------------------------------------------------------------------
2023-10-11 00:20:23,418 EPOCH 4 done: loss 0.3225 - lr: 0.000101
2023-10-11 00:20:29,017 DEV : loss 0.23322905600070953 - f1-score (micro avg)  0.332
2023-10-11 00:20:29,025 saving best model
2023-10-11 00:20:35,031 ----------------------------------------------------------------------------------------------------
2023-10-11 00:20:43,857 epoch 5 - iter 14/146 - loss 0.27735107 - time (sec): 8.82 - samples/sec: 510.33 - lr: 0.000099 - momentum: 0.000000
2023-10-11 00:20:52,350 epoch 5 - iter 28/146 - loss 0.25431500 - time (sec): 17.31 - samples/sec: 499.40 - lr: 0.000097 - momentum: 0.000000
2023-10-11 00:21:00,691 epoch 5 - iter 42/146 - loss 0.29245784 - time (sec): 25.66 - samples/sec: 492.51 - lr: 0.000096 - momentum: 0.000000
2023-10-11 00:21:08,893 epoch 5 - iter 56/146 - loss 0.30867369 - time (sec): 33.86 - samples/sec: 484.17 - lr: 0.000094 - momentum: 0.000000
2023-10-11 00:21:17,431 epoch 5 - iter 70/146 - loss 0.28826282 - time (sec): 42.40 - samples/sec: 484.22 - lr: 0.000092 - momentum: 0.000000
2023-10-11 00:21:26,668 epoch 5 - iter 84/146 - loss 0.27456335 - time (sec): 51.63 - samples/sec: 487.77 - lr: 0.000091 - momentum: 0.000000
2023-10-11 00:21:36,019 epoch 5 - iter 98/146 - loss 0.26911782 - time (sec): 60.98 - samples/sec: 497.02 - lr: 0.000089 - momentum: 0.000000
2023-10-11 00:21:44,734 epoch 5 - iter 112/146 - loss 0.25803376 - time (sec): 69.70 - samples/sec: 498.10 - lr: 0.000088 - momentum: 0.000000
2023-10-11 00:21:53,421 epoch 5 - iter 126/146 - loss 0.25520050 - time (sec): 78.39 - samples/sec: 498.15 - lr: 0.000086 - momentum: 0.000000
2023-10-11 00:22:01,791 epoch 5 - iter 140/146 - loss 0.25148287 - time (sec): 86.76 - samples/sec: 496.66 - lr: 0.000084 - momentum: 0.000000
2023-10-11 00:22:05,100 ----------------------------------------------------------------------------------------------------
2023-10-11 00:22:05,100 EPOCH 5 done: loss 0.2521 - lr: 0.000084
2023-10-11 00:22:10,781 DEV : loss 0.19501639902591705 - f1-score (micro avg)  0.473
2023-10-11 00:22:10,790 saving best model
2023-10-11 00:22:16,955 ----------------------------------------------------------------------------------------------------
2023-10-11 00:22:26,475 epoch 6 - iter 14/146 - loss 0.16514820 - time (sec): 9.52 - samples/sec: 514.75 - lr: 0.000082 - momentum: 0.000000
2023-10-11 00:22:34,815 epoch 6 - iter 28/146 - loss 0.17522830 - time (sec): 17.86 - samples/sec: 477.67 - lr: 0.000081 - momentum: 0.000000
2023-10-11 00:22:43,531 epoch 6 - iter 42/146 - loss 0.17690455 - time (sec): 26.57 - samples/sec: 477.73 - lr: 0.000079 - momentum: 0.000000
2023-10-11 00:22:52,461 epoch 6 - iter 56/146 - loss 0.16628079 - time (sec): 35.50 - samples/sec: 484.40 - lr: 0.000077 - momentum: 0.000000
2023-10-11 00:23:00,737 epoch 6 - iter 70/146 - loss 0.18071160 - time (sec): 43.78 - samples/sec: 483.63 - lr: 0.000076 - momentum: 0.000000
2023-10-11 00:23:10,600 epoch 6 - iter 84/146 - loss 0.20187792 - time (sec): 53.64 - samples/sec: 497.31 - lr: 0.000074 - momentum: 0.000000
2023-10-11 00:23:19,008 epoch 6 - iter 98/146 - loss 0.20080362 - time (sec): 62.05 - samples/sec: 494.99 - lr: 0.000073 - momentum: 0.000000
2023-10-11 00:23:27,506 epoch 6 - iter 112/146 - loss 0.19888829 - time (sec): 70.55 - samples/sec: 493.73 - lr: 0.000071 - momentum: 0.000000
2023-10-11 00:23:35,994 epoch 6 - iter 126/146 - loss 0.19539473 - time (sec): 79.03 - samples/sec: 493.95 - lr: 0.000069 - momentum: 0.000000
2023-10-11 00:23:43,994 epoch 6 - iter 140/146 - loss 0.19529054 - time (sec): 87.03 - samples/sec: 491.39 - lr: 0.000068 - momentum: 0.000000
2023-10-11 00:23:47,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:23:47,387 EPOCH 6 done: loss 0.1923 - lr: 0.000068
2023-10-11 00:23:52,889 DEV : loss 0.1738743782043457 - f1-score (micro avg)  0.5498
2023-10-11 00:23:52,897 saving best model
2023-10-11 00:23:59,052 ----------------------------------------------------------------------------------------------------
2023-10-11 00:24:08,020 epoch 7 - iter 14/146 - loss 0.15115652 - time (sec): 8.96 - samples/sec: 516.15 - lr: 0.000066 - momentum: 0.000000
2023-10-11 00:24:16,989 epoch 7 - iter 28/146 - loss 0.15169848 - time (sec): 17.93 - samples/sec: 529.01 - lr: 0.000064 - momentum: 0.000000
2023-10-11 00:24:25,559 epoch 7 - iter 42/146 - loss 0.15112913 - time (sec): 26.50 - samples/sec: 514.72 - lr: 0.000062 - momentum: 0.000000
2023-10-11 00:24:33,559 epoch 7 - iter 56/146 - loss 0.14375947 - time (sec): 34.50 - samples/sec: 505.96 - lr: 0.000061 - momentum: 0.000000
2023-10-11 00:24:41,851 epoch 7 - iter 70/146 - loss 0.14191662 - time (sec): 42.80 - samples/sec: 502.65 - lr: 0.000059 - momentum: 0.000000
2023-10-11 00:24:49,764 epoch 7 - iter 84/146 - loss 0.14733674 - time (sec): 50.71 - samples/sec: 499.78 - lr: 0.000058 - momentum: 0.000000
2023-10-11 00:24:58,428 epoch 7 - iter 98/146 - loss 0.15209724 - time (sec): 59.37 - samples/sec: 503.09 - lr: 0.000056 - momentum: 0.000000
2023-10-11 00:25:06,268 epoch 7 - iter 112/146 - loss 0.15110304 - time (sec): 67.21 - samples/sec: 493.92 - lr: 0.000054 - momentum: 0.000000
2023-10-11 00:25:15,356 epoch 7 - iter 126/146 - loss 0.15315807 - time (sec): 76.30 - samples/sec: 498.24 - lr: 0.000053 - momentum: 0.000000
2023-10-11 00:25:24,258 epoch 7 - iter 140/146 - loss 0.15339801 - time (sec): 85.20 - samples/sec: 504.10 - lr: 0.000051 - momentum: 0.000000
2023-10-11 00:25:27,449 ----------------------------------------------------------------------------------------------------
2023-10-11 00:25:27,450 EPOCH 7 done: loss 0.1525 - lr: 0.000051
2023-10-11 00:25:33,160 DEV : loss 0.1568579375743866 - f1-score (micro avg)  0.6026
2023-10-11 00:25:33,170 saving best model
2023-10-11 00:25:39,402 ----------------------------------------------------------------------------------------------------
2023-10-11 00:25:48,695 epoch 8 - iter 14/146 - loss 0.14470409 - time (sec): 9.29 - samples/sec: 565.87 - lr: 0.000049 - momentum: 0.000000
2023-10-11 00:25:56,856 epoch 8 - iter 28/146 - loss 0.15466879 - time (sec): 17.45 - samples/sec: 513.07 - lr: 0.000047 - momentum: 0.000000
2023-10-11 00:26:05,090 epoch 8 - iter 42/146 - loss 0.14556756 - time (sec): 25.68 - samples/sec: 500.74 - lr: 0.000046 - momentum: 0.000000
2023-10-11 00:26:13,757 epoch 8 - iter 56/146 - loss 0.14562604 - time (sec): 34.35 - samples/sec: 497.81 - lr: 0.000044 - momentum: 0.000000
2023-10-11 00:26:22,674 epoch 8 - iter 70/146 - loss 0.14744312 - time (sec): 43.27 - samples/sec: 498.29 - lr: 0.000043 - momentum: 0.000000
2023-10-11 00:26:31,283 epoch 8 - iter 84/146 - loss 0.14623450 - time (sec): 51.88 - samples/sec: 486.66 - lr: 0.000041 - momentum: 0.000000
2023-10-11 00:26:40,567 epoch 8 - iter 98/146 - loss 0.13910467 - time (sec): 61.16 - samples/sec: 479.51 - lr: 0.000039 - momentum: 0.000000
2023-10-11 00:26:50,125 epoch 8 - iter 112/146 - loss 0.13334599 - time (sec): 70.72 - samples/sec: 476.83 - lr: 0.000038 - momentum: 0.000000
2023-10-11 00:26:59,926 epoch 8 - iter 126/146 - loss 0.12965286 - time (sec): 80.52 - samples/sec: 473.91 - lr: 0.000036 - momentum: 0.000000
2023-10-11 00:27:09,581 epoch 8 - iter 140/146 - loss 0.12939577 - time (sec): 90.18 - samples/sec: 471.19 - lr: 0.000035 - momentum: 0.000000
2023-10-11 00:27:13,663 ----------------------------------------------------------------------------------------------------
2023-10-11 00:27:13,664 EPOCH 8 done: loss 0.1293 - lr: 0.000035
2023-10-11 00:27:20,336 DEV : loss 0.14915454387664795 - f1-score (micro avg)  0.6711
2023-10-11 00:27:20,346 saving best model
2023-10-11 00:27:26,639 ----------------------------------------------------------------------------------------------------
2023-10-11 00:27:35,810 epoch 9 - iter 14/146 - loss 0.14480468 - time (sec): 9.17 - samples/sec: 512.79 - lr: 0.000032 - momentum: 0.000000
2023-10-11 00:27:44,972 epoch 9 - iter 28/146 - loss 0.12111733 - time (sec): 18.33 - samples/sec: 508.59 - lr: 0.000031 - momentum: 0.000000
2023-10-11 00:27:53,282 epoch 9 - iter 42/146 - loss 0.11551154 - time (sec): 26.64 - samples/sec: 494.12 - lr: 0.000029 - momentum: 0.000000
2023-10-11 00:28:02,178 epoch 9 - iter 56/146 - loss 0.11450421 - time (sec): 35.54 - samples/sec: 496.91 - lr: 0.000028 - momentum: 0.000000
2023-10-11 00:28:11,315 epoch 9 - iter 70/146 - loss 0.11627392 - time (sec): 44.67 - samples/sec: 491.31 - lr: 0.000026 - momentum: 0.000000
2023-10-11 00:28:20,239 epoch 9 - iter 84/146 - loss 0.11633930 - time (sec): 53.60 - samples/sec: 491.19 - lr: 0.000024 - momentum: 0.000000
2023-10-11 00:28:28,971 epoch 9 - iter 98/146 - loss 0.11323542 - time (sec): 62.33 - samples/sec: 488.62 - lr: 0.000023 - momentum: 0.000000
2023-10-11 00:28:37,804 epoch 9 - iter 112/146 - loss 0.10890718 - time (sec): 71.16 - samples/sec: 488.50 - lr: 0.000021 - momentum: 0.000000
2023-10-11 00:28:46,797 epoch 9 - iter 126/146 - loss 0.11265525 - time (sec): 80.15 - samples/sec: 487.63 - lr: 0.000020 - momentum: 0.000000
2023-10-11 00:28:55,411 epoch 9 - iter 140/146 - loss 0.11540303 - time (sec): 88.77 - samples/sec: 485.35 - lr: 0.000018 - momentum: 0.000000
2023-10-11 00:28:58,607 ----------------------------------------------------------------------------------------------------
2023-10-11 00:28:58,607 EPOCH 9 done: loss 0.1148 - lr: 0.000018
2023-10-11 00:29:04,628 DEV : loss 0.15014490485191345 - f1-score (micro avg)  0.7097
2023-10-11 00:29:04,638 saving best model
2023-10-11 00:29:10,636 ----------------------------------------------------------------------------------------------------
2023-10-11 00:29:19,540 epoch 10 - iter 14/146 - loss 0.11532110 - time (sec): 8.90 - samples/sec: 515.52 - lr: 0.000016 - momentum: 0.000000
2023-10-11 00:29:28,674 epoch 10 - iter 28/146 - loss 0.11738693 - time (sec): 18.03 - samples/sec: 506.38 - lr: 0.000014 - momentum: 0.000000
2023-10-11 00:29:37,812 epoch 10 - iter 42/146 - loss 0.11927845 - time (sec): 27.17 - samples/sec: 512.69 - lr: 0.000013 - momentum: 0.000000
2023-10-11 00:29:47,596 epoch 10 - iter 56/146 - loss 0.11248663 - time (sec): 36.96 - samples/sec: 504.22 - lr: 0.000011 - momentum: 0.000000
2023-10-11 00:29:56,837 epoch 10 - iter 70/146 - loss 0.11378977 - time (sec): 46.20 - samples/sec: 489.68 - lr: 0.000009 - momentum: 0.000000
2023-10-11 00:30:06,496 epoch 10 - iter 84/146 - loss 0.10860430 - time (sec): 55.86 - samples/sec: 483.19 - lr: 0.000008 - momentum: 0.000000
2023-10-11 00:30:15,171 epoch 10 - iter 98/146 - loss 0.10596910 - time (sec): 64.53 - samples/sec: 467.25 - lr: 0.000006 - momentum: 0.000000
2023-10-11 00:30:25,050 epoch 10 - iter 112/146 - loss 0.10933142 - time (sec): 74.41 - samples/sec: 465.68 - lr: 0.000005 - momentum: 0.000000
2023-10-11 00:30:34,564 epoch 10 - iter 126/146 - loss 0.10650302 - time (sec): 83.92 - samples/sec: 460.29 - lr: 0.000003 - momentum: 0.000000
2023-10-11 00:30:44,129 epoch 10 - iter 140/146 - loss 0.10893081 - time (sec): 93.49 - samples/sec: 456.67 - lr: 0.000001 - momentum: 0.000000
2023-10-11 00:30:48,031 ----------------------------------------------------------------------------------------------------
2023-10-11 00:30:48,032 EPOCH 10 done: loss 0.1087 - lr: 0.000001
2023-10-11 00:30:53,773 DEV : loss 0.15238186717033386 - f1-score (micro avg)  0.7229
2023-10-11 00:30:53,782 saving best model
2023-10-11 00:31:00,790 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:00,792 Loading model from best epoch ...
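The lr column in the iteration records above is produced by the LinearScheduler plugin with warmup_fraction '0.1': the rate ramps linearly from 0 to the peak 0.00015 over the first 10% of the 1,460 batch steps (10 epochs × 146 iterations), then decays linearly to 0, which matches the logged values (≈0.000143 at the end of epoch 1, ≈0.000001 at the end of epoch 10). A minimal standalone sketch of that schedule, not Flair's own implementation, with the step counts read from this log:

```python
def linear_warmup_lr(step: int,
                     peak_lr: float = 0.00015,
                     total_steps: int = 1460,     # 10 epochs x 146 iters (from this log)
                     warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)  # 146 steps here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps           # warmup ramp
    # decay phase: peak_lr at the end of warmup, zero at total_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Halfway through warmup the rate is half the peak; at the very end it is zero.
assert abs(linear_warmup_lr(73) - 0.000075) < 1e-9
assert abs(linear_warmup_lr(146) - 0.00015) < 1e-12
assert linear_warmup_lr(1460) == 0.0
```

With shorter warmup the peak is reached earlier but the decay phase becomes correspondingly longer; the 10% warmup here is why epoch 1 starts at lr 0.000013 rather than at the configured 0.00015.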
2023-10-11 00:31:04,651 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 00:31:16,872 Results:
- F-score (micro) 0.7015
- F-score (macro) 0.6099
- Accuracy 0.5632

By class:
              precision    recall  f1-score   support

         PER     0.7821    0.8046    0.7932       348
         LOC     0.5766    0.7931    0.6677       261
         ORG     0.2982    0.3269    0.3119        52
   HumanProd     0.7647    0.5909    0.6667        22

   micro avg     0.6536    0.7570    0.7015       683
   macro avg     0.6054    0.6289    0.6099       683
weighted avg     0.6662    0.7570    0.7045       683

2023-10-11 00:31:16,872 ----------------------------------------------------------------------------------------------------
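The three average rows of the final report can be reproduced from the per-class rows: macro-F1 is the unweighted mean of the class F1 scores, weighted-F1 weights each class F1 by its support, and micro-F1 pools all predicted and gold spans first, so it is the harmonic mean of the micro precision and recall. A small sketch of that standard aggregation, using the numbers copied from the report:

```python
# (precision, recall, f1, support) per class, copied from the report above
classes = {
    "PER":       (0.7821, 0.8046, 0.7932, 348),
    "LOC":       (0.5766, 0.7931, 0.6677, 261),
    "ORG":       (0.2982, 0.3269, 0.3119,  52),
    "HumanProd": (0.7647, 0.5909, 0.6667,  22),
}

total_support = sum(s for *_, s in classes.values())                 # 683 gold spans
macro_f1 = sum(f1 for _, _, f1, _ in classes.values()) / len(classes)
weighted_f1 = sum(f1 * s for _, _, f1, s in classes.values()) / total_support
# micro-F1: harmonic mean of the pooled precision and recall
micro_p, micro_r = 0.6536, 0.7570
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

assert round(macro_f1, 4) == 0.6099     # matches "F-score (macro)"
assert round(weighted_f1, 4) == 0.7045  # matches "weighted avg"
assert round(micro_f1, 4) == 0.7015     # matches "F-score (micro)"
```

The gap between micro (0.7015) and macro (0.6099) F1 comes from the small, poorly-scoring ORG class: macro averaging gives its 0.3119 F1 the same weight as PER's 0.7932 despite it covering only 52 of the 683 spans.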