2023-10-24 16:04:44,877 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,878 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-24 16:04:44,878 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,878 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /home/ubuntu/.flair/datasets/ner_icdar_europeana/fr
2023-10-24 16:04:44,878 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,878 Train: 7936 sentences
2023-10-24 16:04:44,878 (train_with_dev=False, train_with_test=False)
2023-10-24 16:04:44,879 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,879 Training Params:
2023-10-24 16:04:44,879  - learning_rate: "5e-05"
2023-10-24 16:04:44,879  - mini_batch_size: "4"
2023-10-24 16:04:44,879  - max_epochs: "10"
2023-10-24 16:04:44,879  - shuffle: "True"
2023-10-24 16:04:44,879 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,879 Plugins:
2023-10-24 16:04:44,879  - TensorboardLogger
2023-10-24 16:04:44,879  - LinearScheduler | warmup_fraction: '0.1'
2023-10-24 16:04:44,879 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,879 Final evaluation on model from best epoch (best-model.pt)
2023-10-24 16:04:44,879  - metric: "('micro avg', 'f1-score')"
2023-10-24 16:04:44,879 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,879 Computation:
2023-10-24 16:04:44,879  - compute on device: cuda:0
2023-10-24 16:04:44,879  - embedding storage: none
2023-10-24 16:04:44,879 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,879 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-24 16:04:44,879 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,879 ----------------------------------------------------------------------------------------------------
2023-10-24 16:04:44,879 Logging anything other than scalars to TensorBoard is currently not supported.
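[Editor's note] The header above corresponds roughly to the following Flair setup. This is a minimal sketch, not the exact hmbench script that produced this log; the loader name NER_ICDAR_EUROPEANA, make_label_dictionary, and the fine_tune defaults (AdamW, linear schedule with warmup_fraction 0.1) follow recent Flair releases and may differ in other versions.

    # Sketch of a Flair run matching the logged configuration (assumed, not
    # the original script; API names follow recent Flair releases).
    from flair.datasets import NER_ICDAR_EUROPEANA
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Corpus logged above: 7936 train / 992 dev / 992 test French sentences.
    corpus = NER_ICDAR_EUROPEANA(language="fr")
    label_dict = corpus.make_label_dictionary(label_type="ner")

    # "layers-1" and "poolingfirst" in the base path suggest last-layer
    # embeddings with first-subtoken pooling.
    embeddings = TransformerWordEmbeddings(
        model="dbmdz/bert-base-historic-multilingual-64k-td-cased",
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    # "crfFalse" in the base path and the printed model (linear head on top
    # of BERT, LockedDropout, CrossEntropyLoss) imply no CRF and no RNN.
    tagger = SequenceTagger(
        hidden_size=256,  # unused without an RNN; kept for the constructor
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=False,
        use_rnn=False,
        reproject_embeddings=False,
    )

    # fine_tune() defaults to AdamW with a linear warmup/decay schedule
    # (warmup_fraction 0.1, as in the Plugins section); depending on the
    # Flair version, TensorBoard logging is enabled via a trainer argument
    # or a TensorboardLogger plugin.
    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-2",
        learning_rate=5e-5,
        mini_batch_size=4,
        max_epochs=10,
    )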
2023-10-24 16:04:56,741 epoch 1 - iter 198/1984 - loss 1.30874305 - time (sec): 11.86 - samples/sec: 1327.48 - lr: 0.000005 - momentum: 0.000000
2023-10-24 16:05:08,929 epoch 1 - iter 396/1984 - loss 0.77290785 - time (sec): 24.05 - samples/sec: 1366.71 - lr: 0.000010 - momentum: 0.000000
2023-10-24 16:05:21,110 epoch 1 - iter 594/1984 - loss 0.57572628 - time (sec): 36.23 - samples/sec: 1376.56 - lr: 0.000015 - momentum: 0.000000
2023-10-24 16:05:33,378 epoch 1 - iter 792/1984 - loss 0.46495084 - time (sec): 48.50 - samples/sec: 1393.71 - lr: 0.000020 - momentum: 0.000000
2023-10-24 16:05:45,281 epoch 1 - iter 990/1984 - loss 0.40496031 - time (sec): 60.40 - samples/sec: 1372.95 - lr: 0.000025 - momentum: 0.000000
2023-10-24 16:05:57,339 epoch 1 - iter 1188/1984 - loss 0.35918387 - time (sec): 72.46 - samples/sec: 1368.87 - lr: 0.000030 - momentum: 0.000000
2023-10-24 16:06:09,207 epoch 1 - iter 1386/1984 - loss 0.33254399 - time (sec): 84.33 - samples/sec: 1357.26 - lr: 0.000035 - momentum: 0.000000
2023-10-24 16:06:21,228 epoch 1 - iter 1584/1984 - loss 0.30875914 - time (sec): 96.35 - samples/sec: 1352.95 - lr: 0.000040 - momentum: 0.000000
2023-10-24 16:06:33,559 epoch 1 - iter 1782/1984 - loss 0.28915180 - time (sec): 108.68 - samples/sec: 1356.10 - lr: 0.000045 - momentum: 0.000000
2023-10-24 16:06:45,621 epoch 1 - iter 1980/1984 - loss 0.27550886 - time (sec): 120.74 - samples/sec: 1353.99 - lr: 0.000050 - momentum: 0.000000
2023-10-24 16:06:45,883 ----------------------------------------------------------------------------------------------------
2023-10-24 16:06:45,883 EPOCH 1 done: loss 0.2751 - lr: 0.000050
2023-10-24 16:06:48,984 DEV : loss 0.1138407364487648 - f1-score (micro avg) 0.66
2023-10-24 16:06:48,999 saving best model
2023-10-24 16:06:49,463 ----------------------------------------------------------------------------------------------------
2023-10-24 16:07:01,531 epoch 2 - iter 198/1984 - loss 0.11655675 - time (sec): 12.07 - samples/sec: 1378.59 - lr: 0.000049 - momentum: 0.000000
2023-10-24 16:07:13,599 epoch 2 - iter 396/1984 - loss 0.12768765 - time (sec): 24.13 - samples/sec: 1366.33 - lr: 0.000049 - momentum: 0.000000
2023-10-24 16:07:25,662 epoch 2 - iter 594/1984 - loss 0.12928172 - time (sec): 36.20 - samples/sec: 1354.01 - lr: 0.000048 - momentum: 0.000000
2023-10-24 16:07:37,968 epoch 2 - iter 792/1984 - loss 0.12515423 - time (sec): 48.50 - samples/sec: 1358.05 - lr: 0.000048 - momentum: 0.000000
2023-10-24 16:07:50,011 epoch 2 - iter 990/1984 - loss 0.12468271 - time (sec): 60.55 - samples/sec: 1352.43 - lr: 0.000047 - momentum: 0.000000
2023-10-24 16:08:02,219 epoch 2 - iter 1188/1984 - loss 0.12585547 - time (sec): 72.76 - samples/sec: 1350.88 - lr: 0.000047 - momentum: 0.000000
2023-10-24 16:08:14,508 epoch 2 - iter 1386/1984 - loss 0.12536731 - time (sec): 85.04 - samples/sec: 1356.27 - lr: 0.000046 - momentum: 0.000000
2023-10-24 16:08:26,927 epoch 2 - iter 1584/1984 - loss 0.12683513 - time (sec): 97.46 - samples/sec: 1354.08 - lr: 0.000046 - momentum: 0.000000
2023-10-24 16:08:38,923 epoch 2 - iter 1782/1984 - loss 0.12567024 - time (sec): 109.46 - samples/sec: 1349.32 - lr: 0.000045 - momentum: 0.000000
2023-10-24 16:08:50,905 epoch 2 - iter 1980/1984 - loss 0.12355541 - time (sec): 121.44 - samples/sec: 1348.76 - lr: 0.000044 - momentum: 0.000000
2023-10-24 16:08:51,136 ----------------------------------------------------------------------------------------------------
2023-10-24 16:08:51,136 EPOCH 2 done: loss 0.1236 - lr: 0.000044
2023-10-24 16:08:54,244 DEV : loss 0.11781438440084457 - f1-score (micro avg) 0.7224
2023-10-24 16:08:54,259 saving best model
2023-10-24 16:08:54,870 ----------------------------------------------------------------------------------------------------
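[Editor's note] The lr column follows the LinearScheduler from the Plugins section: with warmup_fraction 0.1 over 10 x 1984 = 19,840 total steps, warmup lasts exactly one epoch (lr rises from ~0.000005 to the peak 0.00005 during epoch 1), then decays linearly to 0 by the end of epoch 10. An illustrative reconstruction (not Flair's internal code):

    # Reconstructs the logged lr column: linear warmup over the first 10%
    # of steps, then linear decay to zero.
    def linear_schedule_lr(step: int, peak_lr: float = 5e-5,
                           total_steps: int = 1984 * 10,
                           warmup_fraction: float = 0.1) -> float:
        warmup_steps = int(total_steps * warmup_fraction)  # 1984 = one epoch
        if step < warmup_steps:
            return peak_lr * step / warmup_steps
        return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

    # Spot checks against the log:
    print(linear_schedule_lr(198))          # ~0.000005 (epoch 1, iter 198)
    print(linear_schedule_lr(1980))         # ~0.000050 (end of warmup)
    print(linear_schedule_lr(1984 + 1980))  # ~0.000044 (epoch 2, iter 1980)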
2023-10-24 16:09:07,592 epoch 3 - iter 198/1984 - loss 0.09570974 - time (sec): 12.72 - samples/sec: 1320.29 - lr: 0.000044 - momentum: 0.000000
2023-10-24 16:09:19,560 epoch 3 - iter 396/1984 - loss 0.09234026 - time (sec): 24.69 - samples/sec: 1318.45 - lr: 0.000043 - momentum: 0.000000
2023-10-24 16:09:31,705 epoch 3 - iter 594/1984 - loss 0.09370577 - time (sec): 36.83 - samples/sec: 1341.85 - lr: 0.000043 - momentum: 0.000000
2023-10-24 16:09:43,846 epoch 3 - iter 792/1984 - loss 0.09168960 - time (sec): 48.97 - samples/sec: 1353.86 - lr: 0.000042 - momentum: 0.000000
2023-10-24 16:09:55,897 epoch 3 - iter 990/1984 - loss 0.09458487 - time (sec): 61.03 - samples/sec: 1343.19 - lr: 0.000042 - momentum: 0.000000
2023-10-24 16:10:08,031 epoch 3 - iter 1188/1984 - loss 0.09320143 - time (sec): 73.16 - samples/sec: 1339.84 - lr: 0.000041 - momentum: 0.000000
2023-10-24 16:10:20,096 epoch 3 - iter 1386/1984 - loss 0.09112387 - time (sec): 85.22 - samples/sec: 1342.96 - lr: 0.000041 - momentum: 0.000000
2023-10-24 16:10:32,057 epoch 3 - iter 1584/1984 - loss 0.09073603 - time (sec): 97.19 - samples/sec: 1343.93 - lr: 0.000040 - momentum: 0.000000
2023-10-24 16:10:44,110 epoch 3 - iter 1782/1984 - loss 0.09058074 - time (sec): 109.24 - samples/sec: 1344.60 - lr: 0.000039 - momentum: 0.000000
2023-10-24 16:10:56,367 epoch 3 - iter 1980/1984 - loss 0.09067995 - time (sec): 121.50 - samples/sec: 1347.52 - lr: 0.000039 - momentum: 0.000000
2023-10-24 16:10:56,599 ----------------------------------------------------------------------------------------------------
2023-10-24 16:10:56,599 EPOCH 3 done: loss 0.0906 - lr: 0.000039
2023-10-24 16:10:59,703 DEV : loss 0.1346270591020584 - f1-score (micro avg) 0.7411
2023-10-24 16:10:59,718 saving best model
2023-10-24 16:11:00,312 ----------------------------------------------------------------------------------------------------
2023-10-24 16:11:12,521 epoch 4 - iter 198/1984 - loss 0.05402007 - time (sec): 12.21 - samples/sec: 1388.06 - lr: 0.000038 - momentum: 0.000000
2023-10-24 16:11:24,580 epoch 4 - iter 396/1984 - loss 0.06102748 - time (sec): 24.27 - samples/sec: 1343.10 - lr: 0.000038 - momentum: 0.000000
2023-10-24 16:11:36,914 epoch 4 - iter 594/1984 - loss 0.06492659 - time (sec): 36.60 - samples/sec: 1366.14 - lr: 0.000037 - momentum: 0.000000
2023-10-24 16:11:48,948 epoch 4 - iter 792/1984 - loss 0.06649210 - time (sec): 48.63 - samples/sec: 1356.46 - lr: 0.000037 - momentum: 0.000000
2023-10-24 16:12:01,047 epoch 4 - iter 990/1984 - loss 0.06862370 - time (sec): 60.73 - samples/sec: 1355.00 - lr: 0.000036 - momentum: 0.000000
2023-10-24 16:12:13,286 epoch 4 - iter 1188/1984 - loss 0.06836085 - time (sec): 72.97 - samples/sec: 1355.36 - lr: 0.000036 - momentum: 0.000000
2023-10-24 16:12:25,253 epoch 4 - iter 1386/1984 - loss 0.06765039 - time (sec): 84.94 - samples/sec: 1349.99 - lr: 0.000035 - momentum: 0.000000
2023-10-24 16:12:37,532 epoch 4 - iter 1584/1984 - loss 0.07146222 - time (sec): 97.22 - samples/sec: 1348.16 - lr: 0.000034 - momentum: 0.000000
2023-10-24 16:12:49,618 epoch 4 - iter 1782/1984 - loss 0.07238990 - time (sec): 109.30 - samples/sec: 1349.96 - lr: 0.000034 - momentum: 0.000000
2023-10-24 16:13:01,723 epoch 4 - iter 1980/1984 - loss 0.07174639 - time (sec): 121.41 - samples/sec: 1348.26 - lr: 0.000033 - momentum: 0.000000
2023-10-24 16:13:01,960 ----------------------------------------------------------------------------------------------------
2023-10-24 16:13:01,960 EPOCH 4 done: loss 0.0716 - lr: 0.000033
2023-10-24 16:13:05,387 DEV : loss 0.1819346696138382 - f1-score (micro avg) 0.7121
2023-10-24 16:13:05,402 ----------------------------------------------------------------------------------------------------
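[Editor's note] Epoch 4 is the first epoch without a "saving best model" line: its dev F1 (0.7121) falls below the best so far (0.7411 from epoch 3), so best-model.pt is left untouched. The implied keep-best-checkpoint logic, as an illustrative sketch (not Flair's actual implementation):

    # Best-model selection implied by the log: save only when the dev
    # metric improves on the running best.
    best_f1 = 0.0
    for epoch, dev_f1 in enumerate([0.66, 0.7224, 0.7411, 0.7121], start=1):
        if dev_f1 > best_f1:
            best_f1 = dev_f1
            print(f"epoch {epoch}: saving best model")  # logged for 1, 2, 3
        else:
            print(f"epoch {epoch}: no improvement")     # epoch 4: 0.7121 < 0.7411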
2023-10-24 16:13:17,704 epoch 5 - iter 198/1984 - loss 0.04521262 - time (sec): 12.30 - samples/sec: 1372.08 - lr: 0.000033 - momentum: 0.000000
2023-10-24 16:13:29,758 epoch 5 - iter 396/1984 - loss 0.04965805 - time (sec): 24.35 - samples/sec: 1333.01 - lr: 0.000032 - momentum: 0.000000
2023-10-24 16:13:42,101 epoch 5 - iter 594/1984 - loss 0.05396032 - time (sec): 36.70 - samples/sec: 1348.68 - lr: 0.000032 - momentum: 0.000000
2023-10-24 16:13:54,145 epoch 5 - iter 792/1984 - loss 0.05332345 - time (sec): 48.74 - samples/sec: 1336.01 - lr: 0.000031 - momentum: 0.000000
2023-10-24 16:14:06,219 epoch 5 - iter 990/1984 - loss 0.05289847 - time (sec): 60.82 - samples/sec: 1333.40 - lr: 0.000031 - momentum: 0.000000
2023-10-24 16:14:18,370 epoch 5 - iter 1188/1984 - loss 0.05257156 - time (sec): 72.97 - samples/sec: 1340.50 - lr: 0.000030 - momentum: 0.000000
2023-10-24 16:14:30,344 epoch 5 - iter 1386/1984 - loss 0.05371135 - time (sec): 84.94 - samples/sec: 1336.50 - lr: 0.000029 - momentum: 0.000000
2023-10-24 16:14:42,427 epoch 5 - iter 1584/1984 - loss 0.05289029 - time (sec): 97.02 - samples/sec: 1335.24 - lr: 0.000029 - momentum: 0.000000
2023-10-24 16:14:54,750 epoch 5 - iter 1782/1984 - loss 0.05253860 - time (sec): 109.35 - samples/sec: 1344.21 - lr: 0.000028 - momentum: 0.000000
2023-10-24 16:15:06,900 epoch 5 - iter 1980/1984 - loss 0.05342353 - time (sec): 121.50 - samples/sec: 1347.29 - lr: 0.000028 - momentum: 0.000000
2023-10-24 16:15:07,145 ----------------------------------------------------------------------------------------------------
2023-10-24 16:15:07,145 EPOCH 5 done: loss 0.0534 - lr: 0.000028
2023-10-24 16:15:10,265 DEV : loss 0.1831832379102707 - f1-score (micro avg) 0.7547
2023-10-24 16:15:10,281 saving best model
2023-10-24 16:15:10,863 ----------------------------------------------------------------------------------------------------
2023-10-24 16:15:22,992 epoch 6 - iter 198/1984 - loss 0.04237849 - time (sec): 12.13 - samples/sec: 1337.47 - lr: 0.000027 - momentum: 0.000000
2023-10-24 16:15:35,132 epoch 6 - iter 396/1984 - loss 0.04244041 - time (sec): 24.27 - samples/sec: 1330.21 - lr: 0.000027 - momentum: 0.000000
2023-10-24 16:15:47,336 epoch 6 - iter 594/1984 - loss 0.04119784 - time (sec): 36.47 - samples/sec: 1321.23 - lr: 0.000026 - momentum: 0.000000
2023-10-24 16:15:59,255 epoch 6 - iter 792/1984 - loss 0.03907748 - time (sec): 48.39 - samples/sec: 1322.25 - lr: 0.000026 - momentum: 0.000000
2023-10-24 16:16:11,471 epoch 6 - iter 990/1984 - loss 0.03932677 - time (sec): 60.61 - samples/sec: 1329.87 - lr: 0.000025 - momentum: 0.000000
2023-10-24 16:16:23,682 epoch 6 - iter 1188/1984 - loss 0.03914299 - time (sec): 72.82 - samples/sec: 1346.94 - lr: 0.000024 - momentum: 0.000000
2023-10-24 16:16:35,824 epoch 6 - iter 1386/1984 - loss 0.03960027 - time (sec): 84.96 - samples/sec: 1346.44 - lr: 0.000024 - momentum: 0.000000
2023-10-24 16:16:47,954 epoch 6 - iter 1584/1984 - loss 0.03955263 - time (sec): 97.09 - samples/sec: 1343.36 - lr: 0.000023 - momentum: 0.000000
2023-10-24 16:17:00,498 epoch 6 - iter 1782/1984 - loss 0.03906932 - time (sec): 109.63 - samples/sec: 1338.19 - lr: 0.000023 - momentum: 0.000000
2023-10-24 16:17:12,589 epoch 6 - iter 1980/1984 - loss 0.03897848 - time (sec): 121.73 - samples/sec: 1344.46 - lr: 0.000022 - momentum: 0.000000
2023-10-24 16:17:12,829 ----------------------------------------------------------------------------------------------------
2023-10-24 16:17:12,829 EPOCH 6 done: loss 0.0389 - lr: 0.000022
2023-10-24 16:17:15,940 DEV : loss 0.19957636296749115 - f1-score (micro avg) 0.7514
2023-10-24 16:17:15,955 ----------------------------------------------------------------------------------------------------
2023-10-24 16:17:28,168 epoch 7 - iter 198/1984 - loss 0.02502948 - time (sec): 12.21 - samples/sec: 1380.74 - lr: 0.000022 - momentum: 0.000000
2023-10-24 16:17:40,287 epoch 7 - iter 396/1984 - loss 0.02588075 - time (sec): 24.33 - samples/sec: 1401.14 - lr: 0.000021 - momentum: 0.000000
2023-10-24 16:17:52,407 epoch 7 - iter 594/1984 - loss 0.02615491 - time (sec): 36.45 - samples/sec: 1368.78 - lr: 0.000021 - momentum: 0.000000
2023-10-24 16:18:04,399 epoch 7 - iter 792/1984 - loss 0.02913086 - time (sec): 48.44 - samples/sec: 1355.33 - lr: 0.000020 - momentum: 0.000000
2023-10-24 16:18:16,592 epoch 7 - iter 990/1984 - loss 0.02832750 - time (sec): 60.64 - samples/sec: 1358.80 - lr: 0.000019 - momentum: 0.000000
2023-10-24 16:18:28,567 epoch 7 - iter 1188/1984 - loss 0.02868687 - time (sec): 72.61 - samples/sec: 1354.60 - lr: 0.000019 - momentum: 0.000000
2023-10-24 16:18:40,812 epoch 7 - iter 1386/1984 - loss 0.02776447 - time (sec): 84.86 - samples/sec: 1358.22 - lr: 0.000018 - momentum: 0.000000
2023-10-24 16:18:53,016 epoch 7 - iter 1584/1984 - loss 0.02756283 - time (sec): 97.06 - samples/sec: 1358.28 - lr: 0.000018 - momentum: 0.000000
2023-10-24 16:19:05,288 epoch 7 - iter 1782/1984 - loss 0.02799328 - time (sec): 109.33 - samples/sec: 1357.02 - lr: 0.000017 - momentum: 0.000000
2023-10-24 16:19:17,165 epoch 7 - iter 1980/1984 - loss 0.02831462 - time (sec): 121.21 - samples/sec: 1350.06 - lr: 0.000017 - momentum: 0.000000
2023-10-24 16:19:17,411 ----------------------------------------------------------------------------------------------------
2023-10-24 16:19:17,411 EPOCH 7 done: loss 0.0285 - lr: 0.000017
2023-10-24 16:19:20,522 DEV : loss 0.2192317098379135 - f1-score (micro avg) 0.752
2023-10-24 16:19:20,537 ----------------------------------------------------------------------------------------------------
2023-10-24 16:19:32,740 epoch 8 - iter 198/1984 - loss 0.01571927 - time (sec): 12.20 - samples/sec: 1352.95 - lr: 0.000016 - momentum: 0.000000
2023-10-24 16:19:45,022 epoch 8 - iter 396/1984 - loss 0.02057122 - time (sec): 24.48 - samples/sec: 1364.10 - lr: 0.000016 - momentum: 0.000000
2023-10-24 16:19:57,061 epoch 8 - iter 594/1984 - loss 0.02018865 - time (sec): 36.52 - samples/sec: 1354.89 - lr: 0.000015 - momentum: 0.000000
2023-10-24 16:20:09,359 epoch 8 - iter 792/1984 - loss 0.01942023 - time (sec): 48.82 - samples/sec: 1337.22 - lr: 0.000014 - momentum: 0.000000
2023-10-24 16:20:21,446 epoch 8 - iter 990/1984 - loss 0.01908064 - time (sec): 60.91 - samples/sec: 1341.88 - lr: 0.000014 - momentum: 0.000000
2023-10-24 16:20:33,588 epoch 8 - iter 1188/1984 - loss 0.01893982 - time (sec): 73.05 - samples/sec: 1344.69 - lr: 0.000013 - momentum: 0.000000
2023-10-24 16:20:45,658 epoch 8 - iter 1386/1984 - loss 0.01968258 - time (sec): 85.12 - samples/sec: 1345.13 - lr: 0.000013 - momentum: 0.000000
2023-10-24 16:20:57,835 epoch 8 - iter 1584/1984 - loss 0.01929883 - time (sec): 97.30 - samples/sec: 1347.10 - lr: 0.000012 - momentum: 0.000000
2023-10-24 16:21:09,932 epoch 8 - iter 1782/1984 - loss 0.01879306 - time (sec): 109.39 - samples/sec: 1350.71 - lr: 0.000012 - momentum: 0.000000
2023-10-24 16:21:22,126 epoch 8 - iter 1980/1984 - loss 0.01895222 - time (sec): 121.59 - samples/sec: 1346.49 - lr: 0.000011 - momentum: 0.000000
2023-10-24 16:21:22,360 ----------------------------------------------------------------------------------------------------
2023-10-24 16:21:22,360 EPOCH 8 done: loss 0.0189 - lr: 0.000011
2023-10-24 16:21:25,487 DEV : loss 0.22360068559646606 - f1-score (micro avg) 0.7521
2023-10-24 16:21:25,502 ----------------------------------------------------------------------------------------------------
2023-10-24 16:21:37,492 epoch 9 - iter 198/1984 - loss 0.01775440 - time (sec): 11.99 - samples/sec: 1336.38 - lr: 0.000011 - momentum: 0.000000
2023-10-24 16:21:49,622 epoch 9 - iter 396/1984 - loss 0.01720248 - time (sec): 24.12 - samples/sec: 1342.89 - lr: 0.000010 - momentum: 0.000000
2023-10-24 16:22:01,654 epoch 9 - iter 594/1984 - loss 0.01399946 - time (sec): 36.15 - samples/sec: 1334.74 - lr: 0.000009 - momentum: 0.000000
2023-10-24 16:22:13,714 epoch 9 - iter 792/1984 - loss 0.01448509 - time (sec): 48.21 - samples/sec: 1341.70 - lr: 0.000009 - momentum: 0.000000
2023-10-24 16:22:26,082 epoch 9 - iter 990/1984 - loss 0.01346876 - time (sec): 60.58 - samples/sec: 1345.87 - lr: 0.000008 - momentum: 0.000000
2023-10-24 16:22:38,079 epoch 9 - iter 1188/1984 - loss 0.01320978 - time (sec): 72.58 - samples/sec: 1341.53 - lr: 0.000008 - momentum: 0.000000
2023-10-24 16:22:50,327 epoch 9 - iter 1386/1984 - loss 0.01288739 - time (sec): 84.82 - samples/sec: 1337.84 - lr: 0.000007 - momentum: 0.000000
2023-10-24 16:23:02,743 epoch 9 - iter 1584/1984 - loss 0.01244493 - time (sec): 97.24 - samples/sec: 1346.31 - lr: 0.000007 - momentum: 0.000000
2023-10-24 16:23:14,948 epoch 9 - iter 1782/1984 - loss 0.01309784 - time (sec): 109.44 - samples/sec: 1352.32 - lr: 0.000006 - momentum: 0.000000
2023-10-24 16:23:27,037 epoch 9 - iter 1980/1984 - loss 0.01325666 - time (sec): 121.53 - samples/sec: 1347.10 - lr: 0.000006 - momentum: 0.000000
2023-10-24 16:23:27,275 ----------------------------------------------------------------------------------------------------
2023-10-24 16:23:27,275 EPOCH 9 done: loss 0.0132 - lr: 0.000006
2023-10-24 16:23:30,725 DEV : loss 0.23320023715496063 - f1-score (micro avg) 0.7554
2023-10-24 16:23:30,740 saving best model
2023-10-24 16:23:31,359 ----------------------------------------------------------------------------------------------------
2023-10-24 16:23:43,428 epoch 10 - iter 198/1984 - loss 0.01269146 - time (sec): 12.07 - samples/sec: 1382.28 - lr: 0.000005 - momentum: 0.000000
2023-10-24 16:23:55,427 epoch 10 - iter 396/1984 - loss 0.01140904 - time (sec): 24.07 - samples/sec: 1352.02 - lr: 0.000004 - momentum: 0.000000
2023-10-24 16:24:07,511 epoch 10 - iter 594/1984 - loss 0.01014037 - time (sec): 36.15 - samples/sec: 1356.57 - lr: 0.000004 - momentum: 0.000000
2023-10-24 16:24:20,208 epoch 10 - iter 792/1984 - loss 0.00961988 - time (sec): 48.85 - samples/sec: 1369.07 - lr: 0.000003 - momentum: 0.000000
2023-10-24 16:24:32,356 epoch 10 - iter 990/1984 - loss 0.00934250 - time (sec): 61.00 - samples/sec: 1360.44 - lr: 0.000003 - momentum: 0.000000
2023-10-24 16:24:44,568 epoch 10 - iter 1188/1984 - loss 0.00880262 - time (sec): 73.21 - samples/sec: 1366.91 - lr: 0.000002 - momentum: 0.000000
2023-10-24 16:24:56,562 epoch 10 - iter 1386/1984 - loss 0.00886579 - time (sec): 85.20 - samples/sec: 1360.69 - lr: 0.000002 - momentum: 0.000000
2023-10-24 16:25:08,616 epoch 10 - iter 1584/1984 - loss 0.00840621 - time (sec): 97.26 - samples/sec: 1364.15 - lr: 0.000001 - momentum: 0.000000
2023-10-24 16:25:20,602 epoch 10 - iter 1782/1984 - loss 0.00865685 - time (sec): 109.24 - samples/sec: 1355.92 - lr: 0.000001 - momentum: 0.000000
2023-10-24 16:25:32,569 epoch 10 - iter 1980/1984 - loss 0.00901465 - time (sec): 121.21 - samples/sec: 1349.08 - lr: 0.000000 - momentum: 0.000000
2023-10-24 16:25:32,857 ----------------------------------------------------------------------------------------------------
2023-10-24 16:25:32,857 EPOCH 10 done: loss 0.0090 - lr: 0.000000
2023-10-24 16:25:35,980 DEV : loss 0.25083112716674805 - f1-score (micro avg) 0.7613
2023-10-24 16:25:35,995 saving best model
2023-10-24 16:25:37,068 ----------------------------------------------------------------------------------------------------
2023-10-24 16:25:37,069 Loading model from best epoch ...
2023-10-24 16:25:38,537 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-24 16:25:41,628 Results:
- F-score (micro) 0.779
- F-score (macro) 0.6989
- Accuracy 0.6605

By class:
              precision    recall  f1-score   support

         LOC     0.8326    0.8504    0.8414       655
         PER     0.6911    0.7623    0.7249       223
         ORG     0.5922    0.4803    0.5304       127

   micro avg     0.7741    0.7841    0.7790      1005
   macro avg     0.7053    0.6977    0.6989      1005
weighted avg     0.7708    0.7841    0.7763      1005

2023-10-24 16:25:41,628 ----------------------------------------------------------------------------------------------------
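[Editor's note] The best checkpoint (epoch 10, dev F1 0.7613) reaches a test micro F1 of 0.779; the lower macro F1 (0.6989) reflects the weak ORG class (F1 0.5304 on 127 support) relative to LOC and PER. The 13-tag dictionary above is a BIOES scheme, which Flair decodes back into entity spans at prediction time. A minimal sketch of reusing the saved checkpoint for inference; the example sentence is hypothetical:

    # Load the checkpoint written during training and tag a new sentence.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(
        "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased"
        "-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-2/best-model.pt"
    )

    sentence = Sentence("Victor Hugo est né à Besançon .")  # hypothetical input
    tagger.predict(sentence)  # BIOES tags (S-/B-/I-/E-) are decoded into spans
    for span in sentence.get_spans("ner"):
        print(span.text, span.tag)  # expected: a PER span and a LOC span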