2023-10-25 20:51:03,901 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,902 Model: "SequenceTagger(
(embeddings): TransformerWordEmbeddings(
(model): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(64001, 768)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0-11): 12 x BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
)
(locked_dropout): LockedDropout(p=0.5)
(linear): Linear(in_features=768, out_features=17, bias=True)
(loss_function): CrossEntropyLoss()
)"
2023-10-25 20:51:03,902 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 MultiCorpus: 1085 train + 148 dev + 364 test sentences
- NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 Train: 1085 sentences
2023-10-25 20:51:03,903 (train_with_dev=False, train_with_test=False)
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 Training Params:
2023-10-25 20:51:03,903 - learning_rate: "3e-05"
2023-10-25 20:51:03,903 - mini_batch_size: "4"
2023-10-25 20:51:03,903 - max_epochs: "10"
2023-10-25 20:51:03,903 - shuffle: "True"
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 Plugins:
2023-10-25 20:51:03,903 - TensorboardLogger
2023-10-25 20:51:03,903 - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
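Note on the schedule: the parameters logged above (learning_rate 3e-05, mini_batch_size 4, max_epochs 10, warmup_fraction 0.1) together with 1085 training sentences imply 272 iterations per epoch and a linear warmup followed by linear decay. The following is a minimal sketch of that schedule, not Flair's actual LinearScheduler code. It assumes one scheduler step per mini-batch and a decay to zero at the last step; the function name `linear_lr` and the constants are hypothetical names introduced for illustration.

```python
import math

# Values taken from the log; the names are illustrative only.
TRAIN_SENTENCES = 1085
MINI_BATCH_SIZE = 4
MAX_EPOCHS = 10
PEAK_LR = 3e-05
WARMUP_FRACTION = 0.1

# Assumption: the last, partially filled mini-batch still counts as one iteration.
steps_per_epoch = math.ceil(TRAIN_SENTENCES / MINI_BATCH_SIZE)
total_steps = steps_per_epoch * MAX_EPOCHS
warmup_steps = round(total_steps * WARMUP_FRACTION)


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps (assumed linear warmup, then linear decay)."""
    if step < warmup_steps:
        # Warmup: ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / warmup_steps
    # Decay: fall linearly from the peak to 0 at the final step.
    return PEAK_LR * (total_steps - step) / (total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch, it in [(1, 27), (1, 270), (2, 270), (10, 270)]:
        step = (epoch - 1) * steps_per_epoch + it
        print(f"epoch {epoch} iter {it}: lr {linear_lr(step):.6f}")
```

Under these assumptions the sketch appears consistent with the logged lr column when rounded to six decimals, for example 0.000003 at epoch 1, iter 27 and 0.000027 at the end of epoch 2.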
2023-10-25 20:51:03,903 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 20:51:03,903 - metric: "('micro avg', 'f1-score')"
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 Computation:
2023-10-25 20:51:03,903 - compute on device: cuda:0
2023-10-25 20:51:03,903 - embedding storage: none
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,904 Model training base path: "hmbench-newseye/sv-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-25 20:51:03,904 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,904 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,904 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-25 20:51:05,373 epoch 1 - iter 27/272 - loss 3.35810402 - time (sec): 1.47 - samples/sec: 3492.66 - lr: 0.000003 - momentum: 0.000000
2023-10-25 20:51:06,887 epoch 1 - iter 54/272 - loss 2.62664709 - time (sec): 2.98 - samples/sec: 3401.67 - lr: 0.000006 - momentum: 0.000000
2023-10-25 20:51:08,380 epoch 1 - iter 81/272 - loss 1.99702193 - time (sec): 4.48 - samples/sec: 3374.69 - lr: 0.000009 - momentum: 0.000000
2023-10-25 20:51:09,943 epoch 1 - iter 108/272 - loss 1.58713504 - time (sec): 6.04 - samples/sec: 3404.72 - lr: 0.000012 - momentum: 0.000000
2023-10-25 20:51:11,444 epoch 1 - iter 135/272 - loss 1.36358638 - time (sec): 7.54 - samples/sec: 3368.62 - lr: 0.000015 - momentum: 0.000000
2023-10-25 20:51:12,949 epoch 1 - iter 162/272 - loss 1.18146019 - time (sec): 9.04 - samples/sec: 3376.99 - lr: 0.000018 - momentum: 0.000000
2023-10-25 20:51:14,393 epoch 1 - iter 189/272 - loss 1.06467820 - time (sec): 10.49 - samples/sec: 3358.79 - lr: 0.000021 - momentum: 0.000000
2023-10-25 20:51:15,843 epoch 1 - iter 216/272 - loss 0.96906368 - time (sec): 11.94 - samples/sec: 3321.01 - lr: 0.000024 - momentum: 0.000000
2023-10-25 20:51:17,363 epoch 1 - iter 243/272 - loss 0.86325291 - time (sec): 13.46 - samples/sec: 3386.37 - lr: 0.000027 - momentum: 0.000000
2023-10-25 20:51:18,899 epoch 1 - iter 270/272 - loss 0.78237789 - time (sec): 14.99 - samples/sec: 3458.33 - lr: 0.000030 - momentum: 0.000000
2023-10-25 20:51:19,005 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:19,005 EPOCH 1 done: loss 0.7808 - lr: 0.000030
2023-10-25 20:51:20,097 DEV : loss 0.15305721759796143 - f1-score (micro avg) 0.6386
2023-10-25 20:51:20,104 saving best model
2023-10-25 20:51:20,577 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:22,089 epoch 2 - iter 27/272 - loss 0.13098364 - time (sec): 1.51 - samples/sec: 4032.22 - lr: 0.000030 - momentum: 0.000000
2023-10-25 20:51:23,680 epoch 2 - iter 54/272 - loss 0.13797292 - time (sec): 3.10 - samples/sec: 3676.76 - lr: 0.000029 - momentum: 0.000000
2023-10-25 20:51:25,272 epoch 2 - iter 81/272 - loss 0.13466499 - time (sec): 4.69 - samples/sec: 3567.20 - lr: 0.000029 - momentum: 0.000000
2023-10-25 20:51:26,739 epoch 2 - iter 108/272 - loss 0.14329642 - time (sec): 6.16 - samples/sec: 3503.40 - lr: 0.000029 - momentum: 0.000000
2023-10-25 20:51:28,334 epoch 2 - iter 135/272 - loss 0.14715212 - time (sec): 7.76 - samples/sec: 3406.31 - lr: 0.000028 - momentum: 0.000000
2023-10-25 20:51:29,837 epoch 2 - iter 162/272 - loss 0.14527168 - time (sec): 9.26 - samples/sec: 3439.37 - lr: 0.000028 - momentum: 0.000000
2023-10-25 20:51:31,271 epoch 2 - iter 189/272 - loss 0.14428601 - time (sec): 10.69 - samples/sec: 3391.13 - lr: 0.000028 - momentum: 0.000000
2023-10-25 20:51:32,731 epoch 2 - iter 216/272 - loss 0.14094961 - time (sec): 12.15 - samples/sec: 3376.93 - lr: 0.000027 - momentum: 0.000000
2023-10-25 20:51:34,209 epoch 2 - iter 243/272 - loss 0.13825777 - time (sec): 13.63 - samples/sec: 3412.68 - lr: 0.000027 - momentum: 0.000000
2023-10-25 20:51:35,720 epoch 2 - iter 270/272 - loss 0.13406289 - time (sec): 15.14 - samples/sec: 3420.90 - lr: 0.000027 - momentum: 0.000000
2023-10-25 20:51:35,822 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:35,823 EPOCH 2 done: loss 0.1336 - lr: 0.000027
2023-10-25 20:51:37,026 DEV : loss 0.10869525372982025 - f1-score (micro avg) 0.7612
2023-10-25 20:51:37,032 saving best model
2023-10-25 20:51:37,679 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:39,125 epoch 3 - iter 27/272 - loss 0.06125906 - time (sec): 1.44 - samples/sec: 3180.36 - lr: 0.000026 - momentum: 0.000000
2023-10-25 20:51:40,638 epoch 3 - iter 54/272 - loss 0.05737341 - time (sec): 2.96 - samples/sec: 3829.78 - lr: 0.000026 - momentum: 0.000000
2023-10-25 20:51:42,149 epoch 3 - iter 81/272 - loss 0.06197287 - time (sec): 4.47 - samples/sec: 3671.03 - lr: 0.000026 - momentum: 0.000000
2023-10-25 20:51:43,609 epoch 3 - iter 108/272 - loss 0.06482587 - time (sec): 5.93 - samples/sec: 3526.11 - lr: 0.000025 - momentum: 0.000000
2023-10-25 20:51:45,141 epoch 3 - iter 135/272 - loss 0.06852388 - time (sec): 7.46 - samples/sec: 3507.27 - lr: 0.000025 - momentum: 0.000000
2023-10-25 20:51:46,621 epoch 3 - iter 162/272 - loss 0.06845236 - time (sec): 8.94 - samples/sec: 3533.20 - lr: 0.000025 - momentum: 0.000000
2023-10-25 20:51:48,120 epoch 3 - iter 189/272 - loss 0.06756303 - time (sec): 10.44 - samples/sec: 3460.60 - lr: 0.000024 - momentum: 0.000000
2023-10-25 20:51:49,609 epoch 3 - iter 216/272 - loss 0.06827772 - time (sec): 11.93 - samples/sec: 3469.25 - lr: 0.000024 - momentum: 0.000000
2023-10-25 20:51:51,080 epoch 3 - iter 243/272 - loss 0.07019403 - time (sec): 13.40 - samples/sec: 3441.78 - lr: 0.000024 - momentum: 0.000000
2023-10-25 20:51:52,526 epoch 3 - iter 270/272 - loss 0.06889892 - time (sec): 14.85 - samples/sec: 3470.59 - lr: 0.000023 - momentum: 0.000000
2023-10-25 20:51:52,642 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:52,642 EPOCH 3 done: loss 0.0684 - lr: 0.000023
2023-10-25 20:51:53,815 DEV : loss 0.11325477808713913 - f1-score (micro avg) 0.8008
2023-10-25 20:51:53,821 saving best model
2023-10-25 20:51:54,495 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:55,964 epoch 4 - iter 27/272 - loss 0.03659584 - time (sec): 1.46 - samples/sec: 3878.96 - lr: 0.000023 - momentum: 0.000000
2023-10-25 20:51:57,465 epoch 4 - iter 54/272 - loss 0.03315229 - time (sec): 2.96 - samples/sec: 3655.01 - lr: 0.000023 - momentum: 0.000000
2023-10-25 20:51:58,859 epoch 4 - iter 81/272 - loss 0.03745032 - time (sec): 4.36 - samples/sec: 3462.26 - lr: 0.000022 - momentum: 0.000000
2023-10-25 20:52:00,314 epoch 4 - iter 108/272 - loss 0.03711163 - time (sec): 5.81 - samples/sec: 3479.09 - lr: 0.000022 - momentum: 0.000000
2023-10-25 20:52:01,742 epoch 4 - iter 135/272 - loss 0.03775119 - time (sec): 7.24 - samples/sec: 3466.86 - lr: 0.000022 - momentum: 0.000000
2023-10-25 20:52:03,327 epoch 4 - iter 162/272 - loss 0.04244654 - time (sec): 8.83 - samples/sec: 3400.80 - lr: 0.000021 - momentum: 0.000000
2023-10-25 20:52:04,794 epoch 4 - iter 189/272 - loss 0.04115573 - time (sec): 10.29 - samples/sec: 3389.24 - lr: 0.000021 - momentum: 0.000000
2023-10-25 20:52:06,428 epoch 4 - iter 216/272 - loss 0.04090003 - time (sec): 11.93 - samples/sec: 3466.71 - lr: 0.000021 - momentum: 0.000000
2023-10-25 20:52:07,981 epoch 4 - iter 243/272 - loss 0.04016434 - time (sec): 13.48 - samples/sec: 3404.78 - lr: 0.000020 - momentum: 0.000000
2023-10-25 20:52:09,573 epoch 4 - iter 270/272 - loss 0.03989968 - time (sec): 15.07 - samples/sec: 3430.88 - lr: 0.000020 - momentum: 0.000000
2023-10-25 20:52:09,677 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:09,677 EPOCH 4 done: loss 0.0397 - lr: 0.000020
2023-10-25 20:52:10,943 DEV : loss 0.1432153433561325 - f1-score (micro avg) 0.7927
2023-10-25 20:52:10,950 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:12,397 epoch 5 - iter 27/272 - loss 0.03498180 - time (sec): 1.45 - samples/sec: 3105.70 - lr: 0.000020 - momentum: 0.000000
2023-10-25 20:52:13,883 epoch 5 - iter 54/272 - loss 0.03139512 - time (sec): 2.93 - samples/sec: 3136.88 - lr: 0.000019 - momentum: 0.000000
2023-10-25 20:52:15,479 epoch 5 - iter 81/272 - loss 0.02686437 - time (sec): 4.53 - samples/sec: 3278.40 - lr: 0.000019 - momentum: 0.000000
2023-10-25 20:52:17,076 epoch 5 - iter 108/272 - loss 0.02614851 - time (sec): 6.13 - samples/sec: 3313.61 - lr: 0.000019 - momentum: 0.000000
2023-10-25 20:52:18,602 epoch 5 - iter 135/272 - loss 0.02762334 - time (sec): 7.65 - samples/sec: 3288.24 - lr: 0.000018 - momentum: 0.000000
2023-10-25 20:52:20,123 epoch 5 - iter 162/272 - loss 0.02604272 - time (sec): 9.17 - samples/sec: 3385.64 - lr: 0.000018 - momentum: 0.000000
2023-10-25 20:52:21,652 epoch 5 - iter 189/272 - loss 0.02439411 - time (sec): 10.70 - samples/sec: 3327.17 - lr: 0.000018 - momentum: 0.000000
2023-10-25 20:52:23,621 epoch 5 - iter 216/272 - loss 0.02376852 - time (sec): 12.67 - samples/sec: 3258.47 - lr: 0.000017 - momentum: 0.000000
2023-10-25 20:52:25,077 epoch 5 - iter 243/272 - loss 0.02352355 - time (sec): 14.13 - samples/sec: 3289.41 - lr: 0.000017 - momentum: 0.000000
2023-10-25 20:52:26,579 epoch 5 - iter 270/272 - loss 0.02408055 - time (sec): 15.63 - samples/sec: 3309.97 - lr: 0.000017 - momentum: 0.000000
2023-10-25 20:52:26,689 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:26,689 EPOCH 5 done: loss 0.0244 - lr: 0.000017
2023-10-25 20:52:27,848 DEV : loss 0.16343659162521362 - f1-score (micro avg) 0.8102
2023-10-25 20:52:27,855 saving best model
2023-10-25 20:52:28,577 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:30,124 epoch 6 - iter 27/272 - loss 0.02652120 - time (sec): 1.52 - samples/sec: 3067.11 - lr: 0.000016 - momentum: 0.000000
2023-10-25 20:52:31,621 epoch 6 - iter 54/272 - loss 0.02631961 - time (sec): 3.02 - samples/sec: 3218.86 - lr: 0.000016 - momentum: 0.000000
2023-10-25 20:52:33,096 epoch 6 - iter 81/272 - loss 0.02198692 - time (sec): 4.49 - samples/sec: 3280.48 - lr: 0.000016 - momentum: 0.000000
2023-10-25 20:52:34,602 epoch 6 - iter 108/272 - loss 0.02473048 - time (sec): 6.00 - samples/sec: 3337.97 - lr: 0.000015 - momentum: 0.000000
2023-10-25 20:52:36,070 epoch 6 - iter 135/272 - loss 0.02285927 - time (sec): 7.47 - samples/sec: 3371.24 - lr: 0.000015 - momentum: 0.000000
2023-10-25 20:52:37,555 epoch 6 - iter 162/272 - loss 0.02122821 - time (sec): 8.95 - samples/sec: 3473.57 - lr: 0.000015 - momentum: 0.000000
2023-10-25 20:52:39,056 epoch 6 - iter 189/272 - loss 0.01959515 - time (sec): 10.45 - samples/sec: 3470.46 - lr: 0.000014 - momentum: 0.000000
2023-10-25 20:52:40,504 epoch 6 - iter 216/272 - loss 0.01869554 - time (sec): 11.90 - samples/sec: 3465.89 - lr: 0.000014 - momentum: 0.000000
2023-10-25 20:52:41,945 epoch 6 - iter 243/272 - loss 0.01895320 - time (sec): 13.34 - samples/sec: 3492.28 - lr: 0.000014 - momentum: 0.000000
2023-10-25 20:52:43,365 epoch 6 - iter 270/272 - loss 0.01873867 - time (sec): 14.76 - samples/sec: 3507.21 - lr: 0.000013 - momentum: 0.000000
2023-10-25 20:52:43,462 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:43,463 EPOCH 6 done: loss 0.0187 - lr: 0.000013
2023-10-25 20:52:44,733 DEV : loss 0.16893555223941803 - f1-score (micro avg) 0.8324
2023-10-25 20:52:44,741 saving best model
2023-10-25 20:52:45,405 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:46,859 epoch 7 - iter 27/272 - loss 0.01570413 - time (sec): 1.45 - samples/sec: 3659.52 - lr: 0.000013 - momentum: 0.000000
2023-10-25 20:52:48,351 epoch 7 - iter 54/272 - loss 0.01635881 - time (sec): 2.94 - samples/sec: 3551.57 - lr: 0.000013 - momentum: 0.000000
2023-10-25 20:52:49,917 epoch 7 - iter 81/272 - loss 0.01833799 - time (sec): 4.51 - samples/sec: 3370.52 - lr: 0.000012 - momentum: 0.000000
2023-10-25 20:52:51,376 epoch 7 - iter 108/272 - loss 0.01488772 - time (sec): 5.97 - samples/sec: 3387.92 - lr: 0.000012 - momentum: 0.000000
2023-10-25 20:52:52,826 epoch 7 - iter 135/272 - loss 0.01593043 - time (sec): 7.42 - samples/sec: 3335.88 - lr: 0.000012 - momentum: 0.000000
2023-10-25 20:52:54,341 epoch 7 - iter 162/272 - loss 0.01519486 - time (sec): 8.93 - samples/sec: 3367.70 - lr: 0.000011 - momentum: 0.000000
2023-10-25 20:52:55,849 epoch 7 - iter 189/272 - loss 0.01473941 - time (sec): 10.44 - samples/sec: 3372.45 - lr: 0.000011 - momentum: 0.000000
2023-10-25 20:52:57,397 epoch 7 - iter 216/272 - loss 0.01429210 - time (sec): 11.99 - samples/sec: 3440.67 - lr: 0.000011 - momentum: 0.000000
2023-10-25 20:52:58,888 epoch 7 - iter 243/272 - loss 0.01324011 - time (sec): 13.48 - samples/sec: 3476.27 - lr: 0.000010 - momentum: 0.000000
2023-10-25 20:53:00,322 epoch 7 - iter 270/272 - loss 0.01365909 - time (sec): 14.91 - samples/sec: 3480.96 - lr: 0.000010 - momentum: 0.000000
2023-10-25 20:53:00,416 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:00,416 EPOCH 7 done: loss 0.0136 - lr: 0.000010
2023-10-25 20:53:01,585 DEV : loss 0.1673850566148758 - f1-score (micro avg) 0.844
2023-10-25 20:53:01,592 saving best model
2023-10-25 20:53:02,278 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:03,801 epoch 8 - iter 27/272 - loss 0.02161960 - time (sec): 1.52 - samples/sec: 3618.74 - lr: 0.000010 - momentum: 0.000000
2023-10-25 20:53:05,303 epoch 8 - iter 54/272 - loss 0.01680960 - time (sec): 3.02 - samples/sec: 3491.96 - lr: 0.000009 - momentum: 0.000000
2023-10-25 20:53:06,784 epoch 8 - iter 81/272 - loss 0.01355752 - time (sec): 4.50 - samples/sec: 3448.46 - lr: 0.000009 - momentum: 0.000000
2023-10-25 20:53:08,258 epoch 8 - iter 108/272 - loss 0.01579221 - time (sec): 5.98 - samples/sec: 3470.08 - lr: 0.000009 - momentum: 0.000000
2023-10-25 20:53:09,700 epoch 8 - iter 135/272 - loss 0.01480401 - time (sec): 7.42 - samples/sec: 3427.25 - lr: 0.000008 - momentum: 0.000000
2023-10-25 20:53:11,218 epoch 8 - iter 162/272 - loss 0.01266974 - time (sec): 8.94 - samples/sec: 3464.35 - lr: 0.000008 - momentum: 0.000000
2023-10-25 20:53:12,725 epoch 8 - iter 189/272 - loss 0.01141429 - time (sec): 10.44 - samples/sec: 3449.79 - lr: 0.000008 - momentum: 0.000000
2023-10-25 20:53:14,150 epoch 8 - iter 216/272 - loss 0.01178960 - time (sec): 11.87 - samples/sec: 3379.00 - lr: 0.000007 - momentum: 0.000000
2023-10-25 20:53:15,670 epoch 8 - iter 243/272 - loss 0.01123156 - time (sec): 13.39 - samples/sec: 3428.57 - lr: 0.000007 - momentum: 0.000000
2023-10-25 20:53:17,154 epoch 8 - iter 270/272 - loss 0.01148940 - time (sec): 14.87 - samples/sec: 3482.11 - lr: 0.000007 - momentum: 0.000000
2023-10-25 20:53:17,258 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:17,259 EPOCH 8 done: loss 0.0115 - lr: 0.000007
2023-10-25 20:53:18,828 DEV : loss 0.18429012596607208 - f1-score (micro avg) 0.8429
2023-10-25 20:53:18,834 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:20,345 epoch 9 - iter 27/272 - loss 0.00096616 - time (sec): 1.51 - samples/sec: 3279.68 - lr: 0.000006 - momentum: 0.000000
2023-10-25 20:53:21,821 epoch 9 - iter 54/272 - loss 0.00315849 - time (sec): 2.99 - samples/sec: 3323.41 - lr: 0.000006 - momentum: 0.000000
2023-10-25 20:53:23,292 epoch 9 - iter 81/272 - loss 0.00292424 - time (sec): 4.46 - samples/sec: 3273.48 - lr: 0.000006 - momentum: 0.000000
2023-10-25 20:53:24,820 epoch 9 - iter 108/272 - loss 0.00436173 - time (sec): 5.98 - samples/sec: 3396.22 - lr: 0.000005 - momentum: 0.000000
2023-10-25 20:53:26,273 epoch 9 - iter 135/272 - loss 0.00392677 - time (sec): 7.44 - samples/sec: 3513.87 - lr: 0.000005 - momentum: 0.000000
2023-10-25 20:53:27,649 epoch 9 - iter 162/272 - loss 0.00622840 - time (sec): 8.81 - samples/sec: 3541.40 - lr: 0.000005 - momentum: 0.000000
2023-10-25 20:53:29,173 epoch 9 - iter 189/272 - loss 0.00621295 - time (sec): 10.34 - samples/sec: 3498.94 - lr: 0.000004 - momentum: 0.000000
2023-10-25 20:53:30,573 epoch 9 - iter 216/272 - loss 0.00596192 - time (sec): 11.74 - samples/sec: 3536.20 - lr: 0.000004 - momentum: 0.000000
2023-10-25 20:53:32,055 epoch 9 - iter 243/272 - loss 0.00724204 - time (sec): 13.22 - samples/sec: 3575.10 - lr: 0.000004 - momentum: 0.000000
2023-10-25 20:53:33,437 epoch 9 - iter 270/272 - loss 0.00728933 - time (sec): 14.60 - samples/sec: 3541.52 - lr: 0.000003 - momentum: 0.000000
2023-10-25 20:53:33,536 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:33,536 EPOCH 9 done: loss 0.0073 - lr: 0.000003
2023-10-25 20:53:34,754 DEV : loss 0.18683403730392456 - f1-score (micro avg) 0.846
2023-10-25 20:53:34,761 saving best model
2023-10-25 20:53:35,249 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:36,695 epoch 10 - iter 27/272 - loss 0.00555682 - time (sec): 1.44 - samples/sec: 3069.63 - lr: 0.000003 - momentum: 0.000000
2023-10-25 20:53:38,153 epoch 10 - iter 54/272 - loss 0.00691251 - time (sec): 2.90 - samples/sec: 3140.72 - lr: 0.000003 - momentum: 0.000000
2023-10-25 20:53:39,581 epoch 10 - iter 81/272 - loss 0.00559490 - time (sec): 4.33 - samples/sec: 3363.79 - lr: 0.000002 - momentum: 0.000000
2023-10-25 20:53:41,065 epoch 10 - iter 108/272 - loss 0.00408773 - time (sec): 5.81 - samples/sec: 3510.61 - lr: 0.000002 - momentum: 0.000000
2023-10-25 20:53:42,390 epoch 10 - iter 135/272 - loss 0.00363612 - time (sec): 7.14 - samples/sec: 3426.84 - lr: 0.000002 - momentum: 0.000000
2023-10-25 20:53:43,826 epoch 10 - iter 162/272 - loss 0.00590384 - time (sec): 8.57 - samples/sec: 3439.31 - lr: 0.000001 - momentum: 0.000000
2023-10-25 20:53:45,205 epoch 10 - iter 189/272 - loss 0.00552531 - time (sec): 9.95 - samples/sec: 3489.58 - lr: 0.000001 - momentum: 0.000000
2023-10-25 20:53:46,692 epoch 10 - iter 216/272 - loss 0.00564306 - time (sec): 11.44 - samples/sec: 3572.62 - lr: 0.000001 - momentum: 0.000000
2023-10-25 20:53:48,035 epoch 10 - iter 243/272 - loss 0.00641529 - time (sec): 12.78 - samples/sec: 3538.99 - lr: 0.000000 - momentum: 0.000000
2023-10-25 20:53:49,469 epoch 10 - iter 270/272 - loss 0.00576979 - time (sec): 14.22 - samples/sec: 3623.00 - lr: 0.000000 - momentum: 0.000000
2023-10-25 20:53:49,580 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:49,580 EPOCH 10 done: loss 0.0057 - lr: 0.000000
2023-10-25 20:53:50,777 DEV : loss 0.18572133779525757 - f1-score (micro avg) 0.845
2023-10-25 20:53:51,215 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:51,216 Loading model from best epoch ...
2023-10-25 20:53:53,027 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
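Note on the tag dictionary: the 17 tags listed above are what a BIOES-style scheme yields for four entity types (four prefixes per type plus the outside tag O), which matches the 17 output features of the final linear layer. A small sketch that rebuilds the same list, assuming the prefix order S, B, E, I seen in the log; the variable names are hypothetical and this is not how Flair constructs its dictionary.

```python
# Entity types and prefixes in the order they appear in the logged dictionary.
ENTITY_TYPES = ["LOC", "PER", "HumanProd", "ORG"]
PREFIXES = ["S", "B", "E", "I"]  # single, begin, end, inside

# "O" (outside any entity) comes first, followed by every prefix/type pair.
tags = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in PREFIXES]

if __name__ == "__main__":
    print(len(tags), tags)
```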
2023-10-25 20:53:55,007
Results:
- F-score (micro) 0.7975
- F-score (macro) 0.7465
- Accuracy 0.682
By class:
              precision    recall  f1-score   support

         LOC     0.8160    0.8814    0.8475       312
         PER     0.7137    0.8750    0.7862       208
         ORG     0.5510    0.4909    0.5192        55
   HumanProd     0.7692    0.9091    0.8333        22

   micro avg     0.7556    0.8442    0.7975       597
   macro avg     0.7125    0.7891    0.7465       597
weighted avg     0.7542    0.8442    0.7953       597
2023-10-25 20:53:55,007 ----------------------------------------------------------------------------------------------------
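Note on the aggregate scores: the micro, macro, and weighted F-scores in the final report can be cross-checked from the per-class rows. The sketch below uses the standard definitions (F1 as the harmonic mean of precision and recall, macro as the unweighted mean of per-class F1, weighted as the support-weighted mean). The numbers are copied from the report; the names `per_class` and `f1` are hypothetical, and because the inputs are already rounded to four decimals, results are only expected to agree to about three decimals.

```python
# (precision, recall, f1, support) per class, copied from the test report.
per_class = {
    "LOC": (0.8160, 0.8814, 0.8475, 312),
    "PER": (0.7137, 0.8750, 0.7862, 208),
    "ORG": (0.5510, 0.4909, 0.5192, 55),
    "HumanProd": (0.7692, 0.9091, 0.8333, 22),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Micro F1 from the reported micro-averaged precision and recall.
micro_f1 = f1(0.7556, 0.8442)

# Macro F1: plain average of the per-class F1 values.
macro_f1 = sum(row[2] for row in per_class.values()) / len(per_class)

# Weighted F1: per-class F1 weighted by the number of gold entities (support).
total_support = sum(row[3] for row in per_class.values())
weighted_f1 = sum(row[2] * row[3] for row in per_class.values()) / total_support

if __name__ == "__main__":
    print(f"micro {micro_f1:.4f}  macro {macro_f1:.4f}  weighted {weighted_f1:.4f}")
```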