2023-10-07 02:32:17,496 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,497 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-07 02:32:17,497 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,497 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-07 02:32:17,497 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,497 Train:  1100 sentences
2023-10-07 02:32:17,497         (train_with_dev=False, train_with_test=False)
2023-10-07 02:32:17,498 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,498 Training Params:
2023-10-07 02:32:17,498  - learning_rate: "0.00015"
2023-10-07 02:32:17,498  - mini_batch_size: "8"
2023-10-07 02:32:17,498  - max_epochs: "10"
2023-10-07 02:32:17,498  - shuffle: "True"
2023-10-07 02:32:17,498 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,498 Plugins:
2023-10-07 02:32:17,498  - TensorboardLogger
2023-10-07 02:32:17,498  - LinearScheduler | warmup_fraction: '0.1'
2023-10-07 02:32:17,498 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,498 Final evaluation on model from best epoch (best-model.pt)
2023-10-07 02:32:17,498  - metric: "('micro avg', 'f1-score')"
2023-10-07 02:32:17,498 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,498 Computation:
2023-10-07 02:32:17,498  - compute on device: cuda:0
2023-10-07 02:32:17,498  - embedding storage: none
2023-10-07 02:32:17,498 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,498 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
2023-10-07 02:32:17,498 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,499 ----------------------------------------------------------------------------------------------------
2023-10-07 02:32:17,499 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-07 02:32:27,122 epoch 1 - iter 13/138 - loss 3.23950392 - time (sec): 9.62 - samples/sec: 234.24 - lr: 0.000013 - momentum: 0.000000
2023-10-07 02:32:36,931 epoch 1 - iter 26/138 - loss 3.23294652 - time (sec): 19.43 - samples/sec: 236.57 - lr: 0.000027 - momentum: 0.000000
2023-10-07 02:32:46,311 epoch 1 - iter 39/138 - loss 3.22375306 - time (sec): 28.81 - samples/sec: 234.63 - lr: 0.000041 - momentum: 0.000000
2023-10-07 02:32:55,318 epoch 1 - iter 52/138 - loss 3.20884010 - time (sec): 37.82 - samples/sec: 234.25 - lr: 0.000055 - momentum: 0.000000
2023-10-07 02:33:04,427 epoch 1 - iter 65/138 - loss 3.18263354 - time (sec): 46.93 - samples/sec: 231.70 - lr: 0.000070 - momentum: 0.000000
2023-10-07 02:33:13,558 epoch 1 - iter 78/138 - loss 3.13681907 - time (sec): 56.06 - samples/sec: 232.60 - lr: 0.000084 - momentum: 0.000000
2023-10-07 02:33:22,827 epoch 1 - iter 91/138 - loss 3.07844165 - time (sec): 65.33 - samples/sec: 231.33 - lr: 0.000098 - momentum: 0.000000
2023-10-07 02:33:31,997 epoch 1 - iter 104/138 - loss 3.00914898 - time (sec): 74.50 - samples/sec: 230.54 - lr: 0.000112 - momentum: 0.000000
2023-10-07 02:33:41,323 epoch 1 - iter 117/138 - loss 2.93114017 - time (sec): 83.82 - samples/sec: 230.56 - lr: 0.000126 - momentum: 0.000000
2023-10-07 02:33:50,962 epoch 1 - iter 130/138 - loss 2.84686811 - time (sec): 93.46 - samples/sec: 231.23 - lr: 0.000140 - momentum: 0.000000
2023-10-07 02:33:56,240 ----------------------------------------------------------------------------------------------------
2023-10-07 02:33:56,240 EPOCH 1 done: loss 2.8001 - lr: 0.000140
2023-10-07 02:34:02,703 DEV : loss 1.8336132764816284 - f1-score (micro avg)  0.0
2023-10-07 02:34:02,708 ----------------------------------------------------------------------------------------------------
2023-10-07 02:34:11,517 epoch 2 - iter 13/138 - loss 1.81554903 - time (sec): 8.81 - samples/sec: 218.21 - lr: 0.000149 - momentum: 0.000000
2023-10-07 02:34:20,683 epoch 2 - iter 26/138 - loss 1.69833454 - time (sec): 17.97 - samples/sec: 224.27 - lr: 0.000147 - momentum: 0.000000
2023-10-07 02:34:30,132 epoch 2 - iter 39/138 - loss 1.62098012 - time (sec): 27.42 - samples/sec: 228.79 - lr: 0.000145 - momentum: 0.000000
2023-10-07 02:34:39,707 epoch 2 - iter 52/138 - loss 1.51593180 - time (sec): 37.00 - samples/sec: 229.66 - lr: 0.000144 - momentum: 0.000000
2023-10-07 02:34:49,179 epoch 2 - iter 65/138 - loss 1.44274035 - time (sec): 46.47 - samples/sec: 228.64 - lr: 0.000142 - momentum: 0.000000
2023-10-07 02:34:59,082 epoch 2 - iter 78/138 - loss 1.34967831 - time (sec): 56.37 - samples/sec: 228.71 - lr: 0.000141 - momentum: 0.000000
2023-10-07 02:35:08,863 epoch 2 - iter 91/138 - loss 1.30304292 - time (sec): 66.15 - samples/sec: 229.50 - lr: 0.000139 - momentum: 0.000000
2023-10-07 02:35:18,167 epoch 2 - iter 104/138 - loss 1.24999561 - time (sec): 75.46 - samples/sec: 230.19 - lr: 0.000138 - momentum: 0.000000
2023-10-07 02:35:27,448 epoch 2 - iter 117/138 - loss 1.20975438 - time (sec): 84.74 - samples/sec: 230.10 - lr: 0.000136 - momentum: 0.000000
2023-10-07 02:35:36,624 epoch 2 - iter 130/138 - loss 1.16323681 - time (sec): 93.91 - samples/sec: 229.01 - lr: 0.000134 - momentum: 0.000000
2023-10-07 02:35:42,116 ----------------------------------------------------------------------------------------------------
2023-10-07 02:35:42,116 EPOCH 2 done: loss 1.1422 - lr: 0.000134
2023-10-07 02:35:48,588 DEV : loss 0.7314150929450989 - f1-score (micro avg)  0.0098
2023-10-07 02:35:48,593 saving best model
2023-10-07 02:35:49,403 ----------------------------------------------------------------------------------------------------
2023-10-07 02:35:58,340 epoch 3 - iter 13/138 - loss 0.67865663 - time (sec): 8.94 - samples/sec: 216.43 - lr: 0.000132 - momentum: 0.000000
2023-10-07 02:36:08,416 epoch 3 - iter 26/138 - loss 0.65093229 - time (sec): 19.01 - samples/sec: 226.92 - lr: 0.000130 - momentum: 0.000000
2023-10-07 02:36:17,058 epoch 3 - iter 39/138 - loss 0.62954964 - time (sec): 27.65 - samples/sec: 225.87 - lr: 0.000129 - momentum: 0.000000
2023-10-07 02:36:26,144 epoch 3 - iter 52/138 - loss 0.61838245 - time (sec): 36.74 - samples/sec: 225.72 - lr: 0.000127 - momentum: 0.000000
2023-10-07 02:36:36,249 epoch 3 - iter 65/138 - loss 0.61059235 - time (sec): 46.84 - samples/sec: 226.56 - lr: 0.000126 - momentum: 0.000000
2023-10-07 02:36:45,829 epoch 3 - iter 78/138 - loss 0.60396857 - time (sec): 56.42 - samples/sec: 227.79 - lr: 0.000124 - momentum: 0.000000
2023-10-07 02:36:55,295 epoch 3 - iter 91/138 - loss 0.59377426 - time (sec): 65.89 - samples/sec: 228.12 - lr: 0.000123 - momentum: 0.000000
2023-10-07 02:37:04,575 epoch 3 - iter 104/138 - loss 0.57650818 - time (sec): 75.17 - samples/sec: 228.68 - lr: 0.000121 - momentum: 0.000000
2023-10-07 02:37:14,257 epoch 3 - iter 117/138 - loss 0.55578468 - time (sec): 84.85 - samples/sec: 228.07 - lr: 0.000119 - momentum: 0.000000
2023-10-07 02:37:23,454 epoch 3 - iter 130/138 - loss 0.54732319 - time (sec): 94.05 - samples/sec: 227.16 - lr: 0.000118 - momentum: 0.000000
2023-10-07 02:37:29,480 ----------------------------------------------------------------------------------------------------
2023-10-07 02:37:29,480 EPOCH 3 done: loss 0.5408 - lr: 0.000118
2023-10-07 02:37:36,007 DEV : loss 0.41537272930145264 - f1-score (micro avg)  0.5086
2023-10-07 02:37:36,012 saving best model
2023-10-07 02:37:36,883 ----------------------------------------------------------------------------------------------------
2023-10-07 02:37:45,657 epoch 4 - iter 13/138 - loss 0.41730928 - time (sec): 8.77 - samples/sec: 223.66 - lr: 0.000115 - momentum: 0.000000
2023-10-07 02:37:54,711 epoch 4 - iter 26/138 - loss 0.39108519 - time (sec): 17.83 - samples/sec: 218.77 - lr: 0.000114 - momentum: 0.000000
2023-10-07 02:38:04,453 epoch 4 - iter 39/138 - loss 0.37911101 - time (sec): 27.57 - samples/sec: 222.39 - lr: 0.000112 - momentum: 0.000000
2023-10-07 02:38:13,417 epoch 4 - iter 52/138 - loss 0.36736451 - time (sec): 36.53 - samples/sec: 221.23 - lr: 0.000111 - momentum: 0.000000
2023-10-07 02:38:23,101 epoch 4 - iter 65/138 - loss 0.36714260 - time (sec): 46.22 - samples/sec: 223.17 - lr: 0.000109 - momentum: 0.000000
2023-10-07 02:38:33,354 epoch 4 - iter 78/138 - loss 0.35653417 - time (sec): 56.47 - samples/sec: 225.33 - lr: 0.000107 - momentum: 0.000000
2023-10-07 02:38:42,585 epoch 4 - iter 91/138 - loss 0.35492787 - time (sec): 65.70 - samples/sec: 226.10 - lr: 0.000106 - momentum: 0.000000
2023-10-07 02:38:52,420 epoch 4 - iter 104/138 - loss 0.34153717 - time (sec): 75.54 - samples/sec: 225.56 - lr: 0.000104 - momentum: 0.000000
2023-10-07 02:39:01,599 epoch 4 - iter 117/138 - loss 0.33112427 - time (sec): 84.71 - samples/sec: 224.74 - lr: 0.000103 - momentum: 0.000000
2023-10-07 02:39:11,151 epoch 4 - iter 130/138 - loss 0.33325350 - time (sec): 94.27 - samples/sec: 225.59 - lr: 0.000101 - momentum: 0.000000
2023-10-07 02:39:17,180 ----------------------------------------------------------------------------------------------------
2023-10-07 02:39:17,181 EPOCH 4 done: loss 0.3293 - lr: 0.000101
2023-10-07 02:39:23,717 DEV : loss 0.27699893712997437 - f1-score (micro avg)  0.7634
2023-10-07 02:39:23,722 saving best model
2023-10-07 02:39:24,583 ----------------------------------------------------------------------------------------------------
2023-10-07 02:39:34,215 epoch 5 - iter 13/138 - loss 0.29609594 - time (sec): 9.63 - samples/sec: 233.94 - lr: 0.000099 - momentum: 0.000000
2023-10-07 02:39:43,672 epoch 5 - iter 26/138 - loss 0.26416189 - time (sec): 19.09 - samples/sec: 230.09 - lr: 0.000097 - momentum: 0.000000
2023-10-07 02:39:52,544 epoch 5 - iter 39/138 - loss 0.25378256 - time (sec): 27.96 - samples/sec: 227.86 - lr: 0.000096 - momentum: 0.000000
2023-10-07 02:40:02,973 epoch 5 - iter 52/138 - loss 0.24277473 - time (sec): 38.39 - samples/sec: 228.84 - lr: 0.000094 - momentum: 0.000000
2023-10-07 02:40:12,700 epoch 5 - iter 65/138 - loss 0.24274064 - time (sec): 48.12 - samples/sec: 228.13 - lr: 0.000092 - momentum: 0.000000
2023-10-07 02:40:22,307 epoch 5 - iter 78/138 - loss 0.23804003 - time (sec): 57.72 - samples/sec: 228.28 - lr: 0.000091 - momentum: 0.000000
2023-10-07 02:40:32,075 epoch 5 - iter 91/138 - loss 0.23577467 - time (sec): 67.49 - samples/sec: 228.36 - lr: 0.000089 - momentum: 0.000000
2023-10-07 02:40:41,181 epoch 5 - iter 104/138 - loss 0.23462504 - time (sec): 76.60 - samples/sec: 227.82 - lr: 0.000088 - momentum: 0.000000
2023-10-07 02:40:50,284 epoch 5 - iter 117/138 - loss 0.22828078 - time (sec): 85.70 - samples/sec: 226.37 - lr: 0.000086 - momentum: 0.000000
2023-10-07 02:40:59,708 epoch 5 - iter 130/138 - loss 0.22375484 - time (sec): 95.12 - samples/sec: 226.86 - lr: 0.000085 - momentum: 0.000000
2023-10-07 02:41:04,940 ----------------------------------------------------------------------------------------------------
2023-10-07 02:41:04,940 EPOCH 5 done: loss 0.2209 - lr: 0.000085
2023-10-07 02:41:11,487 DEV : loss 0.2055675983428955 - f1-score (micro avg)  0.7803
2023-10-07 02:41:11,492 saving best model
2023-10-07 02:41:12,356 ----------------------------------------------------------------------------------------------------
2023-10-07 02:41:21,697 epoch 6 - iter 13/138 - loss 0.15957969 - time (sec): 9.34 - samples/sec: 216.59 - lr: 0.000082 - momentum: 0.000000
2023-10-07 02:41:31,323 epoch 6 - iter 26/138 - loss 0.19236976 - time (sec): 18.97 - samples/sec: 225.67 - lr: 0.000080 - momentum: 0.000000
2023-10-07 02:41:40,802 epoch 6 - iter 39/138 - loss 0.18006248 - time (sec): 28.45 - samples/sec: 228.37 - lr: 0.000079 - momentum: 0.000000
2023-10-07 02:41:50,536 epoch 6 - iter 52/138 - loss 0.16731510 - time (sec): 38.18 - samples/sec: 228.40 - lr: 0.000077 - momentum: 0.000000
2023-10-07 02:41:59,396 epoch 6 - iter 65/138 - loss 0.17061749 - time (sec): 47.04 - samples/sec: 226.96 - lr: 0.000076 - momentum: 0.000000
2023-10-07 02:42:08,688 epoch 6 - iter 78/138 - loss 0.16658256 - time (sec): 56.33 - samples/sec: 227.69 - lr: 0.000074 - momentum: 0.000000
2023-10-07 02:42:18,761 epoch 6 - iter 91/138 - loss 0.15985062 - time (sec): 66.40 - samples/sec: 228.07 - lr: 0.000073 - momentum: 0.000000
2023-10-07 02:42:28,449 epoch 6 - iter 104/138 - loss 0.15862222 - time (sec): 76.09 - samples/sec: 228.53 - lr: 0.000071 - momentum: 0.000000
2023-10-07 02:42:37,641 epoch 6 - iter 117/138 - loss 0.15857352 - time (sec): 85.28 - samples/sec: 228.07 - lr: 0.000070 - momentum: 0.000000
2023-10-07 02:42:47,336 epoch 6 - iter 130/138 - loss 0.15582763 - time (sec): 94.98 - samples/sec: 228.19 - lr: 0.000068 - momentum: 0.000000
2023-10-07 02:42:52,489 ----------------------------------------------------------------------------------------------------
2023-10-07 02:42:52,490 EPOCH 6 done: loss 0.1553 - lr: 0.000068
2023-10-07 02:42:59,020 DEV : loss 0.1569966971874237 - f1-score (micro avg)  0.8361
2023-10-07 02:42:59,025 saving best model
2023-10-07 02:42:59,881 ----------------------------------------------------------------------------------------------------
2023-10-07 02:43:09,393 epoch 7 - iter 13/138 - loss 0.12210910 - time (sec): 9.51 - samples/sec: 227.11 - lr: 0.000065 - momentum: 0.000000
2023-10-07 02:43:18,749 epoch 7 - iter 26/138 - loss 0.13136589 - time (sec): 18.87 - samples/sec: 231.47 - lr: 0.000064 - momentum: 0.000000
2023-10-07 02:43:27,437 epoch 7 - iter 39/138 - loss 0.12685632 - time (sec): 27.55 - samples/sec: 227.73 - lr: 0.000062 - momentum: 0.000000
2023-10-07 02:43:36,804 epoch 7 - iter 52/138 - loss 0.12437607 - time (sec): 36.92 - samples/sec: 226.29 - lr: 0.000061 - momentum: 0.000000
2023-10-07 02:43:46,516 epoch 7 - iter 65/138 - loss 0.12544720 - time (sec): 46.63 - samples/sec: 228.80 - lr: 0.000059 - momentum: 0.000000
2023-10-07 02:43:56,883 epoch 7 - iter 78/138 - loss 0.12074871 - time (sec): 57.00 - samples/sec: 229.44 - lr: 0.000058 - momentum: 0.000000
2023-10-07 02:44:06,326 epoch 7 - iter 91/138 - loss 0.11877019 - time (sec): 66.44 - samples/sec: 229.74 - lr: 0.000056 - momentum: 0.000000
2023-10-07 02:44:15,483 epoch 7 - iter 104/138 - loss 0.11774016 - time (sec): 75.60 - samples/sec: 228.68 - lr: 0.000054 - momentum: 0.000000
2023-10-07 02:44:24,920 epoch 7 - iter 117/138 - loss 0.11489264 - time (sec): 85.04 - samples/sec: 227.48 - lr: 0.000053 - momentum: 0.000000
2023-10-07 02:44:34,360 epoch 7 - iter 130/138 - loss 0.11405173 - time (sec): 94.48 - samples/sec: 227.62 - lr: 0.000051 - momentum: 0.000000
2023-10-07 02:44:40,016 ----------------------------------------------------------------------------------------------------
2023-10-07 02:44:40,016 EPOCH 7 done: loss 0.1149 - lr: 0.000051
2023-10-07 02:44:46,566 DEV : loss 0.1391897201538086 - f1-score (micro avg)  0.8452
2023-10-07 02:44:46,571 saving best model
2023-10-07 02:44:47,428 ----------------------------------------------------------------------------------------------------
2023-10-07 02:44:56,961 epoch 8 - iter 13/138 - loss 0.09977073 - time (sec): 9.53 - samples/sec: 229.33 - lr: 0.000049 - momentum: 0.000000
2023-10-07 02:45:06,306 epoch 8 - iter 26/138 - loss 0.08848134 - time (sec): 18.88 - samples/sec: 228.10 - lr: 0.000047 - momentum: 0.000000
2023-10-07 02:45:15,493 epoch 8 - iter 39/138 - loss 0.08663607 - time (sec): 28.06 - samples/sec: 224.41 - lr: 0.000046 - momentum: 0.000000
2023-10-07 02:45:24,699 epoch 8 - iter 52/138 - loss 0.08920965 - time (sec): 37.27 - samples/sec: 223.53 - lr: 0.000044 - momentum: 0.000000
2023-10-07 02:45:33,688 epoch 8 - iter 65/138 - loss 0.09877587 - time (sec): 46.26 - samples/sec: 224.69 - lr: 0.000043 - momentum: 0.000000
2023-10-07 02:45:43,404 epoch 8 - iter 78/138 - loss 0.10012145 - time (sec): 55.97 - samples/sec: 225.92 - lr: 0.000041 - momentum: 0.000000
2023-10-07 02:45:53,253 epoch 8 - iter 91/138 - loss 0.10168883 - time (sec): 65.82 - samples/sec: 226.85 - lr: 0.000039 - momentum: 0.000000
2023-10-07 02:46:02,464 epoch 8 - iter 104/138 - loss 0.10004332 - time (sec): 75.04 - samples/sec: 227.45 - lr: 0.000038 - momentum: 0.000000
2023-10-07 02:46:13,005 epoch 8 - iter 117/138 - loss 0.09441411 - time (sec): 85.58 - samples/sec: 229.29 - lr: 0.000036 - momentum: 0.000000
2023-10-07 02:46:22,308 epoch 8 - iter 130/138 - loss 0.09178666 - time (sec): 94.88 - samples/sec: 228.04 - lr: 0.000035 - momentum: 0.000000
2023-10-07 02:46:27,508 ----------------------------------------------------------------------------------------------------
2023-10-07 02:46:27,509 EPOCH 8 done: loss 0.0926 - lr: 0.000035
2023-10-07 02:46:34,029 DEV : loss 0.13359327614307404 - f1-score (micro avg)  0.8558
2023-10-07 02:46:34,035 saving best model
2023-10-07 02:46:34,878 ----------------------------------------------------------------------------------------------------
2023-10-07 02:46:44,821 epoch 9 - iter 13/138 - loss 0.06127064 - time (sec): 9.94 - samples/sec: 229.35 - lr: 0.000032 - momentum: 0.000000
2023-10-07 02:46:53,817 epoch 9 - iter 26/138 - loss 0.08059964 - time (sec): 18.94 - samples/sec: 222.47 - lr: 0.000031 - momentum: 0.000000
2023-10-07 02:47:03,538 epoch 9 - iter 39/138 - loss 0.07773471 - time (sec): 28.66 - samples/sec: 223.74 - lr: 0.000029 - momentum: 0.000000
2023-10-07 02:47:13,056 epoch 9 - iter 52/138 - loss 0.07868973 - time (sec): 38.18 - samples/sec: 227.05 - lr: 0.000027 - momentum: 0.000000
2023-10-07 02:47:22,718 epoch 9 - iter 65/138 - loss 0.07863330 - time (sec): 47.84 - samples/sec: 227.60 - lr: 0.000026 - momentum: 0.000000
2023-10-07 02:47:32,636 epoch 9 - iter 78/138 - loss 0.07831459 - time (sec): 57.76 - samples/sec: 228.48 - lr: 0.000024 - momentum: 0.000000
2023-10-07 02:47:42,453 epoch 9 - iter 91/138 - loss 0.07784224 - time (sec): 67.57 - samples/sec: 228.70 - lr: 0.000023 - momentum: 0.000000
2023-10-07 02:47:51,829 epoch 9 - iter 104/138 - loss 0.07779023 - time (sec): 76.95 - samples/sec: 229.35 - lr: 0.000021 - momentum: 0.000000
2023-10-07 02:48:00,634 epoch 9 - iter 117/138 - loss 0.07780985 - time (sec): 85.75 - samples/sec: 227.17 - lr: 0.000020 - momentum: 0.000000
2023-10-07 02:48:09,886 epoch 9 - iter 130/138 - loss 0.07806621 - time (sec): 95.01 - samples/sec: 227.62 - lr: 0.000018 - momentum: 0.000000
2023-10-07 02:48:15,165 ----------------------------------------------------------------------------------------------------
2023-10-07 02:48:15,166 EPOCH 9 done: loss 0.0790 - lr: 0.000018
2023-10-07 02:48:21,735 DEV : loss 0.1287166029214859 - f1-score (micro avg)  0.8612
2023-10-07 02:48:21,740 saving best model
2023-10-07 02:48:22,607 ----------------------------------------------------------------------------------------------------
2023-10-07 02:48:31,902 epoch 10 - iter 13/138 - loss 0.06457445 - time (sec): 9.29 - samples/sec: 226.28 - lr: 0.000016 - momentum: 0.000000
2023-10-07 02:48:41,001 epoch 10 - iter 26/138 - loss 0.06518611 - time (sec): 18.39 - samples/sec: 231.95 - lr: 0.000014 - momentum: 0.000000
2023-10-07 02:48:50,674 epoch 10 - iter 39/138 - loss 0.06244882 - time (sec): 28.07 - samples/sec: 230.46 - lr: 0.000012 - momentum: 0.000000
2023-10-07 02:49:00,204 epoch 10 - iter 52/138 - loss 0.06033542 - time (sec): 37.60 - samples/sec: 229.79 - lr: 0.000011 - momentum: 0.000000
2023-10-07 02:49:10,057 epoch 10 - iter 65/138 - loss 0.06101827 - time (sec): 47.45 - samples/sec: 229.09 - lr: 0.000009 - momentum: 0.000000
2023-10-07 02:49:19,739 epoch 10 - iter 78/138 - loss 0.06655719 - time (sec): 57.13 - samples/sec: 228.62 - lr: 0.000008 - momentum: 0.000000
2023-10-07 02:49:29,561 epoch 10 - iter 91/138 - loss 0.06995035 - time (sec): 66.95 - samples/sec: 229.36 - lr: 0.000006 - momentum: 0.000000
2023-10-07 02:49:38,808 epoch 10 - iter 104/138 - loss 0.07529583 - time (sec): 76.20 - samples/sec: 228.62 - lr: 0.000005 - momentum: 0.000000
2023-10-07 02:49:47,956 epoch 10 - iter 117/138 - loss 0.07619055 - time (sec): 85.35 - samples/sec: 228.71 - lr: 0.000003 - momentum: 0.000000
2023-10-07 02:49:57,052 epoch 10 - iter 130/138 - loss 0.07468101 - time (sec): 94.44 - samples/sec: 227.29 - lr: 0.000001 - momentum: 0.000000
2023-10-07 02:50:02,789 ----------------------------------------------------------------------------------------------------
2023-10-07 02:50:02,790 EPOCH 10 done: loss 0.0742 - lr: 0.000001
2023-10-07 02:50:09,321 DEV : loss 0.12669287621974945 - f1-score (micro avg)  0.8595
2023-10-07 02:50:10,271 ----------------------------------------------------------------------------------------------------
2023-10-07 02:50:10,272 Loading model from best epoch ...
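The lr column in the iteration lines above follows the LinearScheduler declared in the header (warmup_fraction '0.1', peak learning_rate 0.00015): over 10 epochs x 138 iterations = 1380 steps, the rate climbs linearly to its peak during roughly the first 138 steps (all of epoch 1) and then decays linearly toward zero. A minimal sketch of that schedule in plain Python (not Flair's actual implementation; step accounting may differ from the logged values by one step):

```python
def linear_warmup_lr(step, total_steps=1380, peak_lr=0.00015, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)  # 138 steps ~= epoch 1
    if step < warmup_steps:
        return peak_lr * step / warmup_steps           # ramp up
    # decay from peak back to zero over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

For example, at step 130 (epoch 1, iter 130/138) this gives about 0.000141, matching the logged lr 0.000140 up to that one-step offset; by the last logged iteration of epoch 10 it has fallen to about 0.000001, as in the log.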
2023-10-07 02:50:13,773 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-07 02:50:20,630 Results:
- F-score (micro) 0.8735
- F-score (macro) 0.52
- Accuracy 0.8053

By class:
              precision    recall  f1-score   support

       scope     0.8864    0.8864    0.8864       176
        pers     0.9008    0.9219    0.9112       128
        work     0.7821    0.8243    0.8026        74
      object     0.0000    0.0000    0.0000         2
         loc     0.0000    0.0000    0.0000         2

   micro avg     0.8701    0.8770    0.8735       382
   macro avg     0.5138    0.5265    0.5200       382
weighted avg     0.8617    0.8770    0.8692       382
2023-10-07 02:50:20,630 ----------------------------------------------------------------------------------------------------
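The gap between the micro F-score (0.8735) and the macro F-score (0.52) in the final results comes from the two classes with only 2 gold spans each (object, loc), on which the model scores 0.0: the macro average is the unweighted mean of the per-class F1 values, while the micro average is computed from the pooled precision and recall. A quick arithmetic check against the numbers in the log (values copied from the "By class" report):

```python
# Per-class f1-scores from the final evaluation report
f1_per_class = {"scope": 0.8864, "pers": 0.9112, "work": 0.8026,
                "object": 0.0, "loc": 0.0}

# Macro avg: unweighted mean over classes, dragged down by the two rare classes
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)
print(round(macro_f1, 4))  # 0.52

# Micro avg: harmonic mean of the pooled precision/recall row
p, r = 0.8701, 0.8770
micro_f1 = 2 * p * r / (p + r)
print(round(micro_f1, 4))  # 0.8735
```

Both reproduce the logged "F-score (macro) 0.52" and "F-score (micro) 0.8735", which is why the benchmark's selection metric ('micro avg', 'f1-score') is largely insensitive to the near-empty object and loc classes.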