2023-10-11 13:27:33,976 ----------------------------------------------------------------------------------------------------
2023-10-11 13:27:33,978 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 13:27:33,978 ----------------------------------------------------------------------------------------------------
2023-10-11 13:27:33,978 MultiCorpus: 1085 train + 148 dev + 364 test sentences
- NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 13:27:33,978 ----------------------------------------------------------------------------------------------------
2023-10-11 13:27:33,979 Train: 1085 sentences
2023-10-11 13:27:33,979 (train_with_dev=False, train_with_test=False)
2023-10-11 13:27:33,979 ----------------------------------------------------------------------------------------------------
2023-10-11 13:27:33,979 Training Params:
2023-10-11 13:27:33,979 - learning_rate: "0.00016"
2023-10-11 13:27:33,979 - mini_batch_size: "4"
2023-10-11 13:27:33,979 - max_epochs: "10"
2023-10-11 13:27:33,979 - shuffle: "True"
2023-10-11 13:27:33,979 ----------------------------------------------------------------------------------------------------
2023-10-11 13:27:33,979 Plugins:
2023-10-11 13:27:33,979 - TensorboardLogger
2023-10-11 13:27:33,979 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 13:27:33,979 ----------------------------------------------------------------------------------------------------
2023-10-11 13:27:33,979 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 13:27:33,980 - metric: "('micro avg', 'f1-score')"
2023-10-11 13:27:33,980 ----------------------------------------------------------------------------------------------------
2023-10-11 13:27:33,980 Computation:
2023-10-11 13:27:33,980 - compute on device: cuda:0
2023-10-11 13:27:33,980 - embedding storage: none
2023-10-11 13:27:33,980 ----------------------------------------------------------------------------------------------------
2023-10-11 13:27:33,980 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-11 13:27:33,980 ----------------------------------------------------------------------------------------------------
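
The parameters above (learning rate 0.00016, mini-batch size 4, 10 epochs, linear schedule with 10% warmup, best model chosen by dev micro F1) roughly correspond to a ModelTrainer.fine_tune call like the sketch below, which continues the construction sketch after the model dump. The exact plugin wiring (TensorboardLogger, LinearScheduler) depends on the Flair version and on the benchmarking script, so treat this as an approximation rather than the original code.

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)  # tagger and corpus from the sketch after the model dump

trainer.fine_tune(
    "hmbench-newseye/sv-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5",
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
    # shuffle=True matches the logged params; the logged run also applied a linear
    # LR schedule with 10% warmup and kept the best dev micro-F1 checkpoint
    # as best-model.pt, which is what the final evaluation below loads.
)
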
2023-10-11 13:27:33,980 ----------------------------------------------------------------------------------------------------
2023-10-11 13:27:33,980 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 13:27:43,531 epoch 1 - iter 27/272 - loss 2.81999244 - time (sec): 9.55 - samples/sec: 591.04 - lr: 0.000015 - momentum: 0.000000
2023-10-11 13:27:52,452 epoch 1 - iter 54/272 - loss 2.81118479 - time (sec): 18.47 - samples/sec: 561.40 - lr: 0.000031 - momentum: 0.000000
2023-10-11 13:28:01,343 epoch 1 - iter 81/272 - loss 2.79096698 - time (sec): 27.36 - samples/sec: 545.67 - lr: 0.000047 - momentum: 0.000000
2023-10-11 13:28:11,435 epoch 1 - iter 108/272 - loss 2.71614989 - time (sec): 37.45 - samples/sec: 566.63 - lr: 0.000063 - momentum: 0.000000
2023-10-11 13:28:20,784 epoch 1 - iter 135/272 - loss 2.62925889 - time (sec): 46.80 - samples/sec: 564.79 - lr: 0.000079 - momentum: 0.000000
2023-10-11 13:28:30,396 epoch 1 - iter 162/272 - loss 2.52390517 - time (sec): 56.41 - samples/sec: 562.22 - lr: 0.000095 - momentum: 0.000000
2023-10-11 13:28:39,473 epoch 1 - iter 189/272 - loss 2.41837798 - time (sec): 65.49 - samples/sec: 559.85 - lr: 0.000111 - momentum: 0.000000
2023-10-11 13:28:48,377 epoch 1 - iter 216/272 - loss 2.31158997 - time (sec): 74.40 - samples/sec: 556.49 - lr: 0.000126 - momentum: 0.000000
2023-10-11 13:28:57,118 epoch 1 - iter 243/272 - loss 2.21251411 - time (sec): 83.14 - samples/sec: 551.90 - lr: 0.000142 - momentum: 0.000000
2023-10-11 13:29:07,861 epoch 1 - iter 270/272 - loss 2.06737558 - time (sec): 93.88 - samples/sec: 552.07 - lr: 0.000158 - momentum: 0.000000
2023-10-11 13:29:08,291 ----------------------------------------------------------------------------------------------------
2023-10-11 13:29:08,292 EPOCH 1 done: loss 2.0636 - lr: 0.000158
2023-10-11 13:29:13,402 DEV : loss 0.7253165245056152 - f1-score (micro avg) 0.0
2023-10-11 13:29:13,409 ----------------------------------------------------------------------------------------------------
2023-10-11 13:29:22,469 epoch 2 - iter 27/272 - loss 0.73318608 - time (sec): 9.06 - samples/sec: 526.52 - lr: 0.000158 - momentum: 0.000000
2023-10-11 13:29:32,473 epoch 2 - iter 54/272 - loss 0.66531252 - time (sec): 19.06 - samples/sec: 526.20 - lr: 0.000157 - momentum: 0.000000
2023-10-11 13:29:42,956 epoch 2 - iter 81/272 - loss 0.64023609 - time (sec): 29.54 - samples/sec: 512.28 - lr: 0.000155 - momentum: 0.000000
2023-10-11 13:29:53,234 epoch 2 - iter 108/272 - loss 0.59586906 - time (sec): 39.82 - samples/sec: 523.70 - lr: 0.000153 - momentum: 0.000000
2023-10-11 13:30:03,244 epoch 2 - iter 135/272 - loss 0.57638010 - time (sec): 49.83 - samples/sec: 520.32 - lr: 0.000151 - momentum: 0.000000
2023-10-11 13:30:13,822 epoch 2 - iter 162/272 - loss 0.55581269 - time (sec): 60.41 - samples/sec: 519.74 - lr: 0.000149 - momentum: 0.000000
2023-10-11 13:30:23,650 epoch 2 - iter 189/272 - loss 0.52874034 - time (sec): 70.24 - samples/sec: 522.78 - lr: 0.000148 - momentum: 0.000000
2023-10-11 13:30:32,781 epoch 2 - iter 216/272 - loss 0.50817052 - time (sec): 79.37 - samples/sec: 520.59 - lr: 0.000146 - momentum: 0.000000
2023-10-11 13:30:42,130 epoch 2 - iter 243/272 - loss 0.49516648 - time (sec): 88.72 - samples/sec: 520.92 - lr: 0.000144 - momentum: 0.000000
2023-10-11 13:30:52,007 epoch 2 - iter 270/272 - loss 0.47371239 - time (sec): 98.60 - samples/sec: 523.10 - lr: 0.000142 - momentum: 0.000000
2023-10-11 13:30:52,637 ----------------------------------------------------------------------------------------------------
2023-10-11 13:30:52,637 EPOCH 2 done: loss 0.4728 - lr: 0.000142
2023-10-11 13:30:58,739 DEV : loss 0.2730832099914551 - f1-score (micro avg) 0.458
2023-10-11 13:30:58,747 saving best model
2023-10-11 13:30:59,637 ----------------------------------------------------------------------------------------------------
2023-10-11 13:31:08,898 epoch 3 - iter 27/272 - loss 0.32431702 - time (sec): 9.26 - samples/sec: 502.47 - lr: 0.000141 - momentum: 0.000000
2023-10-11 13:31:20,050 epoch 3 - iter 54/272 - loss 0.29067923 - time (sec): 20.41 - samples/sec: 546.54 - lr: 0.000139 - momentum: 0.000000
2023-10-11 13:31:30,352 epoch 3 - iter 81/272 - loss 0.27571265 - time (sec): 30.71 - samples/sec: 544.60 - lr: 0.000137 - momentum: 0.000000
2023-10-11 13:31:39,766 epoch 3 - iter 108/272 - loss 0.27279233 - time (sec): 40.13 - samples/sec: 531.75 - lr: 0.000135 - momentum: 0.000000
2023-10-11 13:31:49,454 epoch 3 - iter 135/272 - loss 0.26429444 - time (sec): 49.81 - samples/sec: 530.94 - lr: 0.000133 - momentum: 0.000000
2023-10-11 13:31:59,524 epoch 3 - iter 162/272 - loss 0.26191426 - time (sec): 59.88 - samples/sec: 533.41 - lr: 0.000132 - momentum: 0.000000
2023-10-11 13:32:08,926 epoch 3 - iter 189/272 - loss 0.25973280 - time (sec): 69.29 - samples/sec: 530.00 - lr: 0.000130 - momentum: 0.000000
2023-10-11 13:32:18,526 epoch 3 - iter 216/272 - loss 0.25380917 - time (sec): 78.89 - samples/sec: 529.89 - lr: 0.000128 - momentum: 0.000000
2023-10-11 13:32:27,975 epoch 3 - iter 243/272 - loss 0.24522414 - time (sec): 88.34 - samples/sec: 527.87 - lr: 0.000126 - momentum: 0.000000
2023-10-11 13:32:37,515 epoch 3 - iter 270/272 - loss 0.24504080 - time (sec): 97.88 - samples/sec: 529.60 - lr: 0.000125 - momentum: 0.000000
2023-10-11 13:32:37,912 ----------------------------------------------------------------------------------------------------
2023-10-11 13:32:37,912 EPOCH 3 done: loss 0.2450 - lr: 0.000125
2023-10-11 13:32:43,468 DEV : loss 0.17829611897468567 - f1-score (micro avg) 0.6035
2023-10-11 13:32:43,475 saving best model
2023-10-11 13:32:46,048 ----------------------------------------------------------------------------------------------------
2023-10-11 13:32:55,066 epoch 4 - iter 27/272 - loss 0.17217050 - time (sec): 9.01 - samples/sec: 523.11 - lr: 0.000123 - momentum: 0.000000
2023-10-11 13:33:04,366 epoch 4 - iter 54/272 - loss 0.16719853 - time (sec): 18.31 - samples/sec: 549.27 - lr: 0.000121 - momentum: 0.000000
2023-10-11 13:33:14,336 epoch 4 - iter 81/272 - loss 0.15878769 - time (sec): 28.28 - samples/sec: 564.10 - lr: 0.000119 - momentum: 0.000000
2023-10-11 13:33:23,963 epoch 4 - iter 108/272 - loss 0.15319958 - time (sec): 37.91 - samples/sec: 565.67 - lr: 0.000117 - momentum: 0.000000
2023-10-11 13:33:32,824 epoch 4 - iter 135/272 - loss 0.15294648 - time (sec): 46.77 - samples/sec: 563.12 - lr: 0.000116 - momentum: 0.000000
2023-10-11 13:33:42,222 epoch 4 - iter 162/272 - loss 0.14383828 - time (sec): 56.17 - samples/sec: 565.86 - lr: 0.000114 - momentum: 0.000000
2023-10-11 13:33:51,342 epoch 4 - iter 189/272 - loss 0.14409440 - time (sec): 65.29 - samples/sec: 561.27 - lr: 0.000112 - momentum: 0.000000
2023-10-11 13:34:00,576 epoch 4 - iter 216/272 - loss 0.14278202 - time (sec): 74.52 - samples/sec: 562.45 - lr: 0.000110 - momentum: 0.000000
2023-10-11 13:34:09,522 epoch 4 - iter 243/272 - loss 0.14476740 - time (sec): 83.47 - samples/sec: 561.39 - lr: 0.000109 - momentum: 0.000000
2023-10-11 13:34:18,777 epoch 4 - iter 270/272 - loss 0.14220216 - time (sec): 92.72 - samples/sec: 558.81 - lr: 0.000107 - momentum: 0.000000
2023-10-11 13:34:19,205 ----------------------------------------------------------------------------------------------------
2023-10-11 13:34:19,206 EPOCH 4 done: loss 0.1425 - lr: 0.000107
2023-10-11 13:34:24,887 DEV : loss 0.13749206066131592 - f1-score (micro avg) 0.6835
2023-10-11 13:34:24,895 saving best model
2023-10-11 13:34:27,471 ----------------------------------------------------------------------------------------------------
2023-10-11 13:34:37,467 epoch 5 - iter 27/272 - loss 0.13559335 - time (sec): 9.99 - samples/sec: 585.00 - lr: 0.000105 - momentum: 0.000000
2023-10-11 13:34:47,240 epoch 5 - iter 54/272 - loss 0.13141056 - time (sec): 19.76 - samples/sec: 567.27 - lr: 0.000103 - momentum: 0.000000
2023-10-11 13:34:55,969 epoch 5 - iter 81/272 - loss 0.12138647 - time (sec): 28.49 - samples/sec: 545.04 - lr: 0.000101 - momentum: 0.000000
2023-10-11 13:35:05,652 epoch 5 - iter 108/272 - loss 0.11370414 - time (sec): 38.18 - samples/sec: 542.72 - lr: 0.000100 - momentum: 0.000000
2023-10-11 13:35:14,285 epoch 5 - iter 135/272 - loss 0.11107776 - time (sec): 46.81 - samples/sec: 534.80 - lr: 0.000098 - momentum: 0.000000
2023-10-11 13:35:24,070 epoch 5 - iter 162/272 - loss 0.10256163 - time (sec): 56.59 - samples/sec: 538.09 - lr: 0.000096 - momentum: 0.000000
2023-10-11 13:35:33,336 epoch 5 - iter 189/272 - loss 0.10008297 - time (sec): 65.86 - samples/sec: 539.87 - lr: 0.000094 - momentum: 0.000000
2023-10-11 13:35:43,685 epoch 5 - iter 216/272 - loss 0.10184709 - time (sec): 76.21 - samples/sec: 546.33 - lr: 0.000093 - momentum: 0.000000
2023-10-11 13:35:53,096 epoch 5 - iter 243/272 - loss 0.09643962 - time (sec): 85.62 - samples/sec: 541.20 - lr: 0.000091 - momentum: 0.000000
2023-10-11 13:36:02,816 epoch 5 - iter 270/272 - loss 0.09603504 - time (sec): 95.34 - samples/sec: 538.26 - lr: 0.000089 - momentum: 0.000000
2023-10-11 13:36:03,663 ----------------------------------------------------------------------------------------------------
2023-10-11 13:36:03,663 EPOCH 5 done: loss 0.0953 - lr: 0.000089
2023-10-11 13:36:09,473 DEV : loss 0.12472429126501083 - f1-score (micro avg) 0.7786
2023-10-11 13:36:09,481 saving best model
2023-10-11 13:36:12,337 ----------------------------------------------------------------------------------------------------
2023-10-11 13:36:23,136 epoch 6 - iter 27/272 - loss 0.07520327 - time (sec): 10.80 - samples/sec: 505.01 - lr: 0.000087 - momentum: 0.000000
2023-10-11 13:36:32,097 epoch 6 - iter 54/272 - loss 0.07291755 - time (sec): 19.76 - samples/sec: 502.56 - lr: 0.000085 - momentum: 0.000000
2023-10-11 13:36:41,923 epoch 6 - iter 81/272 - loss 0.07828051 - time (sec): 29.58 - samples/sec: 521.33 - lr: 0.000084 - momentum: 0.000000
2023-10-11 13:36:51,453 epoch 6 - iter 108/272 - loss 0.07753702 - time (sec): 39.11 - samples/sec: 524.34 - lr: 0.000082 - momentum: 0.000000
2023-10-11 13:37:01,082 epoch 6 - iter 135/272 - loss 0.07120305 - time (sec): 48.74 - samples/sec: 526.52 - lr: 0.000080 - momentum: 0.000000
2023-10-11 13:37:10,272 epoch 6 - iter 162/272 - loss 0.07288072 - time (sec): 57.93 - samples/sec: 524.83 - lr: 0.000078 - momentum: 0.000000
2023-10-11 13:37:19,939 epoch 6 - iter 189/272 - loss 0.06861992 - time (sec): 67.60 - samples/sec: 526.14 - lr: 0.000077 - momentum: 0.000000
2023-10-11 13:37:30,055 epoch 6 - iter 216/272 - loss 0.06950290 - time (sec): 77.71 - samples/sec: 531.23 - lr: 0.000075 - momentum: 0.000000
2023-10-11 13:37:39,592 epoch 6 - iter 243/272 - loss 0.06815041 - time (sec): 87.25 - samples/sec: 529.93 - lr: 0.000073 - momentum: 0.000000
2023-10-11 13:37:49,504 epoch 6 - iter 270/272 - loss 0.06691943 - time (sec): 97.16 - samples/sec: 532.72 - lr: 0.000071 - momentum: 0.000000
2023-10-11 13:37:49,963 ----------------------------------------------------------------------------------------------------
2023-10-11 13:37:49,963 EPOCH 6 done: loss 0.0675 - lr: 0.000071
2023-10-11 13:37:55,754 DEV : loss 0.13598552346229553 - f1-score (micro avg) 0.7818
2023-10-11 13:37:55,762 saving best model
2023-10-11 13:37:58,488 ----------------------------------------------------------------------------------------------------
2023-10-11 13:38:08,928 epoch 7 - iter 27/272 - loss 0.06175330 - time (sec): 10.44 - samples/sec: 570.42 - lr: 0.000069 - momentum: 0.000000
2023-10-11 13:38:18,475 epoch 7 - iter 54/272 - loss 0.05347701 - time (sec): 19.98 - samples/sec: 564.63 - lr: 0.000068 - momentum: 0.000000
2023-10-11 13:38:28,109 epoch 7 - iter 81/272 - loss 0.05882144 - time (sec): 29.62 - samples/sec: 548.17 - lr: 0.000066 - momentum: 0.000000
2023-10-11 13:38:37,129 epoch 7 - iter 108/272 - loss 0.05357931 - time (sec): 38.64 - samples/sec: 542.35 - lr: 0.000064 - momentum: 0.000000
2023-10-11 13:38:46,616 epoch 7 - iter 135/272 - loss 0.05501228 - time (sec): 48.12 - samples/sec: 549.00 - lr: 0.000062 - momentum: 0.000000
2023-10-11 13:38:56,744 epoch 7 - iter 162/272 - loss 0.05044097 - time (sec): 58.25 - samples/sec: 553.71 - lr: 0.000061 - momentum: 0.000000
2023-10-11 13:39:06,470 epoch 7 - iter 189/272 - loss 0.04992894 - time (sec): 67.98 - samples/sec: 550.74 - lr: 0.000059 - momentum: 0.000000
2023-10-11 13:39:15,738 epoch 7 - iter 216/272 - loss 0.05389474 - time (sec): 77.25 - samples/sec: 548.65 - lr: 0.000057 - momentum: 0.000000
2023-10-11 13:39:24,510 epoch 7 - iter 243/272 - loss 0.05148651 - time (sec): 86.02 - samples/sec: 538.50 - lr: 0.000055 - momentum: 0.000000
2023-10-11 13:39:34,229 epoch 7 - iter 270/272 - loss 0.04901036 - time (sec): 95.74 - samples/sec: 540.38 - lr: 0.000054 - momentum: 0.000000
2023-10-11 13:39:34,706 ----------------------------------------------------------------------------------------------------
2023-10-11 13:39:34,706 EPOCH 7 done: loss 0.0490 - lr: 0.000054
2023-10-11 13:39:40,220 DEV : loss 0.13726186752319336 - f1-score (micro avg) 0.7912
2023-10-11 13:39:40,229 saving best model
2023-10-11 13:39:42,826 ----------------------------------------------------------------------------------------------------
2023-10-11 13:39:52,045 epoch 8 - iter 27/272 - loss 0.02987934 - time (sec): 9.21 - samples/sec: 517.77 - lr: 0.000052 - momentum: 0.000000
2023-10-11 13:40:00,972 epoch 8 - iter 54/272 - loss 0.03485461 - time (sec): 18.14 - samples/sec: 514.85 - lr: 0.000050 - momentum: 0.000000
2023-10-11 13:40:10,553 epoch 8 - iter 81/272 - loss 0.03793773 - time (sec): 27.72 - samples/sec: 523.19 - lr: 0.000048 - momentum: 0.000000
2023-10-11 13:40:21,107 epoch 8 - iter 108/272 - loss 0.03671922 - time (sec): 38.28 - samples/sec: 539.15 - lr: 0.000046 - momentum: 0.000000
2023-10-11 13:40:30,261 epoch 8 - iter 135/272 - loss 0.03973438 - time (sec): 47.43 - samples/sec: 533.66 - lr: 0.000045 - momentum: 0.000000
2023-10-11 13:40:39,815 epoch 8 - iter 162/272 - loss 0.03971603 - time (sec): 56.98 - samples/sec: 536.81 - lr: 0.000043 - momentum: 0.000000
2023-10-11 13:40:49,558 epoch 8 - iter 189/272 - loss 0.04039939 - time (sec): 66.73 - samples/sec: 538.65 - lr: 0.000041 - momentum: 0.000000
2023-10-11 13:40:58,977 epoch 8 - iter 216/272 - loss 0.03807732 - time (sec): 76.15 - samples/sec: 540.79 - lr: 0.000039 - momentum: 0.000000
2023-10-11 13:41:08,486 epoch 8 - iter 243/272 - loss 0.03765314 - time (sec): 85.66 - samples/sec: 544.18 - lr: 0.000038 - momentum: 0.000000
2023-10-11 13:41:17,813 epoch 8 - iter 270/272 - loss 0.03745509 - time (sec): 94.98 - samples/sec: 546.55 - lr: 0.000036 - momentum: 0.000000
2023-10-11 13:41:18,153 ----------------------------------------------------------------------------------------------------
2023-10-11 13:41:18,153 EPOCH 8 done: loss 0.0377 - lr: 0.000036
2023-10-11 13:41:24,087 DEV : loss 0.1399751901626587 - f1-score (micro avg) 0.7963
2023-10-11 13:41:24,095 saving best model
2023-10-11 13:41:26,673 ----------------------------------------------------------------------------------------------------
2023-10-11 13:41:35,890 epoch 9 - iter 27/272 - loss 0.03181528 - time (sec): 9.21 - samples/sec: 549.22 - lr: 0.000034 - momentum: 0.000000
2023-10-11 13:41:45,533 epoch 9 - iter 54/272 - loss 0.03782289 - time (sec): 18.86 - samples/sec: 552.08 - lr: 0.000032 - momentum: 0.000000
2023-10-11 13:41:55,101 epoch 9 - iter 81/272 - loss 0.03522035 - time (sec): 28.42 - samples/sec: 556.79 - lr: 0.000030 - momentum: 0.000000
2023-10-11 13:42:04,329 epoch 9 - iter 108/272 - loss 0.03337439 - time (sec): 37.65 - samples/sec: 554.47 - lr: 0.000029 - momentum: 0.000000
2023-10-11 13:42:13,770 epoch 9 - iter 135/272 - loss 0.03263354 - time (sec): 47.09 - samples/sec: 555.29 - lr: 0.000027 - momentum: 0.000000
2023-10-11 13:42:23,590 epoch 9 - iter 162/272 - loss 0.03181414 - time (sec): 56.91 - samples/sec: 553.60 - lr: 0.000025 - momentum: 0.000000
2023-10-11 13:42:32,803 epoch 9 - iter 189/272 - loss 0.03172942 - time (sec): 66.13 - samples/sec: 551.87 - lr: 0.000023 - momentum: 0.000000
2023-10-11 13:42:41,941 epoch 9 - iter 216/272 - loss 0.03342724 - time (sec): 75.26 - samples/sec: 550.32 - lr: 0.000022 - momentum: 0.000000
2023-10-11 13:42:51,133 epoch 9 - iter 243/272 - loss 0.03123165 - time (sec): 84.46 - samples/sec: 550.62 - lr: 0.000020 - momentum: 0.000000
2023-10-11 13:43:00,428 epoch 9 - iter 270/272 - loss 0.03083400 - time (sec): 93.75 - samples/sec: 549.52 - lr: 0.000018 - momentum: 0.000000
2023-10-11 13:43:01,120 ----------------------------------------------------------------------------------------------------
2023-10-11 13:43:01,121 EPOCH 9 done: loss 0.0310 - lr: 0.000018
2023-10-11 13:43:06,714 DEV : loss 0.13962143659591675 - f1-score (micro avg) 0.7919
2023-10-11 13:43:06,723 ----------------------------------------------------------------------------------------------------
2023-10-11 13:43:16,268 epoch 10 - iter 27/272 - loss 0.02047195 - time (sec): 9.54 - samples/sec: 540.05 - lr: 0.000016 - momentum: 0.000000
2023-10-11 13:43:25,152 epoch 10 - iter 54/272 - loss 0.01777598 - time (sec): 18.43 - samples/sec: 532.04 - lr: 0.000014 - momentum: 0.000000
2023-10-11 13:43:35,144 epoch 10 - iter 81/272 - loss 0.02262355 - time (sec): 28.42 - samples/sec: 549.70 - lr: 0.000013 - momentum: 0.000000
2023-10-11 13:43:44,586 epoch 10 - iter 108/272 - loss 0.02189460 - time (sec): 37.86 - samples/sec: 553.18 - lr: 0.000011 - momentum: 0.000000
2023-10-11 13:43:54,049 epoch 10 - iter 135/272 - loss 0.02502590 - time (sec): 47.32 - samples/sec: 559.33 - lr: 0.000009 - momentum: 0.000000
2023-10-11 13:44:04,410 epoch 10 - iter 162/272 - loss 0.02888403 - time (sec): 57.69 - samples/sec: 570.21 - lr: 0.000007 - momentum: 0.000000
2023-10-11 13:44:12,664 epoch 10 - iter 189/272 - loss 0.02872227 - time (sec): 65.94 - samples/sec: 558.21 - lr: 0.000005 - momentum: 0.000000
2023-10-11 13:44:22,060 epoch 10 - iter 216/272 - loss 0.02809091 - time (sec): 75.34 - samples/sec: 555.91 - lr: 0.000004 - momentum: 0.000000
2023-10-11 13:44:31,176 epoch 10 - iter 243/272 - loss 0.02744388 - time (sec): 84.45 - samples/sec: 553.40 - lr: 0.000002 - momentum: 0.000000
2023-10-11 13:44:40,474 epoch 10 - iter 270/272 - loss 0.02739646 - time (sec): 93.75 - samples/sec: 550.50 - lr: 0.000000 - momentum: 0.000000
2023-10-11 13:44:41,072 ----------------------------------------------------------------------------------------------------
2023-10-11 13:44:41,073 EPOCH 10 done: loss 0.0273 - lr: 0.000000
2023-10-11 13:44:46,865 DEV : loss 0.14164641499519348 - f1-score (micro avg) 0.7839
2023-10-11 13:44:47,739 ----------------------------------------------------------------------------------------------------
2023-10-11 13:44:47,741 Loading model from best epoch ...
2023-10-11 13:44:52,960 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
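
For inference, the saved checkpoint can be loaded back with the standard Flair API. A hedged usage sketch, assuming best-model.pt sits under the base path logged above; the example sentence is invented.

from flair.data import Sentence
from flair.models import SequenceTagger

# Path is the base path from this log; point it at wherever best-model.pt was saved.
tagger = SequenceTagger.load(
    "hmbench-newseye/sv-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5/best-model.pt"
)

# Hypothetical Swedish example sentence, purely for illustration.
sentence = Sentence("Anna Lindh reste till Stockholm i maj 1898.")
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    label = span.get_label("ner")
    print(span.text, label.value, round(label.score, 4))
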
2023-10-11 13:45:05,711
Results:
- F-score (micro) 0.7695
- F-score (macro) 0.7198
- Accuracy 0.6441
By class:
              precision    recall  f1-score   support

         LOC    0.7924    0.8686    0.8287       312
         PER    0.6743    0.8462    0.7505       208
         ORG    0.4746    0.5091    0.4912        55
   HumanProd    0.7600    0.8636    0.8085        22

   micro avg    0.7191    0.8275    0.7695       597
   macro avg    0.6753    0.7719    0.7198       597
weighted avg    0.7208    0.8275    0.7697       597
2023-10-11 13:45:05,712 ----------------------------------------------------------------------------------------------------