2023-10-11 02:03:29,597 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,599 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 02:03:29,599 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,599 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 02:03:29,599 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,599 Train: 1166 sentences
2023-10-11 02:03:29,599 (train_with_dev=False, train_with_test=False)
2023-10-11 02:03:29,600 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,600 Training Params:
2023-10-11 02:03:29,600 - learning_rate: "0.00015"
2023-10-11 02:03:29,600 - mini_batch_size: "4"
2023-10-11 02:03:29,600 - max_epochs: "10"
2023-10-11 02:03:29,600 - shuffle: "True"
2023-10-11 02:03:29,600 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,600 Plugins:
2023-10-11 02:03:29,600 - TensorboardLogger
2023-10-11 02:03:29,600 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 02:03:29,600 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,600 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 02:03:29,600 - metric: "('micro avg', 'f1-score')"
2023-10-11 02:03:29,600 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,600 Computation:
2023-10-11 02:03:29,600 - compute on device: cuda:0
2023-10-11 02:03:29,601 - embedding storage: none
2023-10-11 02:03:29,601 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,601 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-11 02:03:29,601 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,601 ----------------------------------------------------------------------------------------------------
2023-10-11 02:03:29,601 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 02:03:39,332 epoch 1 - iter 29/292 - loss 2.84429348 - time (sec): 9.73 - samples/sec: 484.22 - lr: 0.000014 - momentum: 0.000000
2023-10-11 02:03:48,690 epoch 1 - iter 58/292 - loss 2.83269053 - time (sec): 19.09 - samples/sec: 464.92 - lr: 0.000029 - momentum: 0.000000
2023-10-11 02:03:57,721 epoch 1 - iter 87/292 - loss 2.81160532 - time (sec): 28.12 - samples/sec: 453.08 - lr: 0.000044 - momentum: 0.000000
2023-10-11 02:04:07,505 epoch 1 - iter 116/292 - loss 2.75406194 - time (sec): 37.90 - samples/sec: 458.38 - lr: 0.000059 - momentum: 0.000000
2023-10-11 02:04:17,354 epoch 1 - iter 145/292 - loss 2.67279044 - time (sec): 47.75 - samples/sec: 444.26 - lr: 0.000074 - momentum: 0.000000
2023-10-11 02:04:27,700 epoch 1 - iter 174/292 - loss 2.56278491 - time (sec): 58.10 - samples/sec: 442.08 - lr: 0.000089 - momentum: 0.000000
2023-10-11 02:04:40,037 epoch 1 - iter 203/292 - loss 2.43689563 - time (sec): 70.43 - samples/sec: 444.79 - lr: 0.000104 - momentum: 0.000000
2023-10-11 02:04:50,647 epoch 1 - iter 232/292 - loss 2.32861569 - time (sec): 81.04 - samples/sec: 441.39 - lr: 0.000119 - momentum: 0.000000
2023-10-11 02:05:00,643 epoch 1 - iter 261/292 - loss 2.22858135 - time (sec): 91.04 - samples/sec: 434.74 - lr: 0.000134 - momentum: 0.000000
2023-10-11 02:05:11,408 epoch 1 - iter 290/292 - loss 2.09578649 - time (sec): 101.81 - samples/sec: 434.43 - lr: 0.000148 - momentum: 0.000000
2023-10-11 02:05:11,941 ----------------------------------------------------------------------------------------------------
2023-10-11 02:05:11,941 EPOCH 1 done: loss 2.0902 - lr: 0.000148
2023-10-11 02:05:17,843 DEV : loss 0.7255058884620667 - f1-score (micro avg) 0.0
2023-10-11 02:05:17,854 ----------------------------------------------------------------------------------------------------
2023-10-11 02:05:27,770 epoch 2 - iter 29/292 - loss 0.68534675 - time (sec): 9.91 - samples/sec: 425.16 - lr: 0.000148 - momentum: 0.000000
2023-10-11 02:05:37,005 epoch 2 - iter 58/292 - loss 0.62711440 - time (sec): 19.15 - samples/sec: 445.25 - lr: 0.000147 - momentum: 0.000000
2023-10-11 02:05:46,117 epoch 2 - iter 87/292 - loss 0.60946323 - time (sec): 28.26 - samples/sec: 454.36 - lr: 0.000145 - momentum: 0.000000
2023-10-11 02:05:55,637 epoch 2 - iter 116/292 - loss 0.57630081 - time (sec): 37.78 - samples/sec: 464.23 - lr: 0.000143 - momentum: 0.000000
2023-10-11 02:06:04,964 epoch 2 - iter 145/292 - loss 0.56137844 - time (sec): 47.11 - samples/sec: 467.44 - lr: 0.000142 - momentum: 0.000000
2023-10-11 02:06:15,268 epoch 2 - iter 174/292 - loss 0.58255910 - time (sec): 57.41 - samples/sec: 469.64 - lr: 0.000140 - momentum: 0.000000
2023-10-11 02:06:24,976 epoch 2 - iter 203/292 - loss 0.55973709 - time (sec): 67.12 - samples/sec: 473.57 - lr: 0.000138 - momentum: 0.000000
2023-10-11 02:06:34,422 epoch 2 - iter 232/292 - loss 0.55168106 - time (sec): 76.57 - samples/sec: 470.21 - lr: 0.000137 - momentum: 0.000000
2023-10-11 02:06:44,338 epoch 2 - iter 261/292 - loss 0.53976723 - time (sec): 86.48 - samples/sec: 467.70 - lr: 0.000135 - momentum: 0.000000
2023-10-11 02:06:53,569 epoch 2 - iter 290/292 - loss 0.52270626 - time (sec): 95.71 - samples/sec: 462.85 - lr: 0.000134 - momentum: 0.000000
2023-10-11 02:06:54,004 ----------------------------------------------------------------------------------------------------
2023-10-11 02:06:54,004 EPOCH 2 done: loss 0.5235 - lr: 0.000134
2023-10-11 02:06:59,532 DEV : loss 0.2814147174358368 - f1-score (micro avg) 0.2175
2023-10-11 02:06:59,541 saving best model
2023-10-11 02:07:00,385 ----------------------------------------------------------------------------------------------------
2023-10-11 02:07:09,539 epoch 3 - iter 29/292 - loss 0.37056204 - time (sec): 9.15 - samples/sec: 419.29 - lr: 0.000132 - momentum: 0.000000
2023-10-11 02:07:19,481 epoch 3 - iter 58/292 - loss 0.35422577 - time (sec): 19.09 - samples/sec: 449.18 - lr: 0.000130 - momentum: 0.000000
2023-10-11 02:07:28,487 epoch 3 - iter 87/292 - loss 0.31777707 - time (sec): 28.10 - samples/sec: 440.11 - lr: 0.000128 - momentum: 0.000000
2023-10-11 02:07:38,353 epoch 3 - iter 116/292 - loss 0.34912994 - time (sec): 37.96 - samples/sec: 455.13 - lr: 0.000127 - momentum: 0.000000
2023-10-11 02:07:48,189 epoch 3 - iter 145/292 - loss 0.32802305 - time (sec): 47.80 - samples/sec: 460.32 - lr: 0.000125 - momentum: 0.000000
2023-10-11 02:07:58,935 epoch 3 - iter 174/292 - loss 0.32676228 - time (sec): 58.55 - samples/sec: 461.46 - lr: 0.000123 - momentum: 0.000000
2023-10-11 02:08:08,240 epoch 3 - iter 203/292 - loss 0.31601892 - time (sec): 67.85 - samples/sec: 453.06 - lr: 0.000122 - momentum: 0.000000
2023-10-11 02:08:17,483 epoch 3 - iter 232/292 - loss 0.31183809 - time (sec): 77.10 - samples/sec: 448.53 - lr: 0.000120 - momentum: 0.000000
2023-10-11 02:08:27,235 epoch 3 - iter 261/292 - loss 0.30614711 - time (sec): 86.85 - samples/sec: 449.12 - lr: 0.000119 - momentum: 0.000000
2023-10-11 02:08:38,338 epoch 3 - iter 290/292 - loss 0.30172615 - time (sec): 97.95 - samples/sec: 452.48 - lr: 0.000117 - momentum: 0.000000
2023-10-11 02:08:38,777 ----------------------------------------------------------------------------------------------------
2023-10-11 02:08:38,777 EPOCH 3 done: loss 0.3015 - lr: 0.000117
2023-10-11 02:08:44,318 DEV : loss 0.21014495193958282 - f1-score (micro avg) 0.473
2023-10-11 02:08:44,327 saving best model
2023-10-11 02:08:47,053 ----------------------------------------------------------------------------------------------------
2023-10-11 02:08:56,413 epoch 4 - iter 29/292 - loss 0.17868707 - time (sec): 9.36 - samples/sec: 452.21 - lr: 0.000115 - momentum: 0.000000
2023-10-11 02:09:05,867 epoch 4 - iter 58/292 - loss 0.22202810 - time (sec): 18.81 - samples/sec: 454.75 - lr: 0.000113 - momentum: 0.000000
2023-10-11 02:09:15,686 epoch 4 - iter 87/292 - loss 0.20143315 - time (sec): 28.63 - samples/sec: 460.09 - lr: 0.000112 - momentum: 0.000000
2023-10-11 02:09:25,104 epoch 4 - iter 116/292 - loss 0.19193560 - time (sec): 38.05 - samples/sec: 458.97 - lr: 0.000110 - momentum: 0.000000
2023-10-11 02:09:34,252 epoch 4 - iter 145/292 - loss 0.19539674 - time (sec): 47.19 - samples/sec: 455.14 - lr: 0.000108 - momentum: 0.000000
2023-10-11 02:09:44,020 epoch 4 - iter 174/292 - loss 0.20566778 - time (sec): 56.96 - samples/sec: 458.16 - lr: 0.000107 - momentum: 0.000000
2023-10-11 02:09:53,728 epoch 4 - iter 203/292 - loss 0.20920884 - time (sec): 66.67 - samples/sec: 458.11 - lr: 0.000105 - momentum: 0.000000
2023-10-11 02:10:03,489 epoch 4 - iter 232/292 - loss 0.20383533 - time (sec): 76.43 - samples/sec: 458.51 - lr: 0.000104 - momentum: 0.000000
2023-10-11 02:10:12,896 epoch 4 - iter 261/292 - loss 0.20083516 - time (sec): 85.84 - samples/sec: 454.86 - lr: 0.000102 - momentum: 0.000000
2023-10-11 02:10:22,972 epoch 4 - iter 290/292 - loss 0.19698644 - time (sec): 95.91 - samples/sec: 459.07 - lr: 0.000100 - momentum: 0.000000
2023-10-11 02:10:23,649 ----------------------------------------------------------------------------------------------------
2023-10-11 02:10:23,649 EPOCH 4 done: loss 0.1961 - lr: 0.000100
2023-10-11 02:10:29,125 DEV : loss 0.1601504385471344 - f1-score (micro avg) 0.636
2023-10-11 02:10:29,134 saving best model
2023-10-11 02:10:31,678 ----------------------------------------------------------------------------------------------------
2023-10-11 02:10:42,576 epoch 5 - iter 29/292 - loss 0.17540229 - time (sec): 10.89 - samples/sec: 511.73 - lr: 0.000098 - momentum: 0.000000
2023-10-11 02:10:52,625 epoch 5 - iter 58/292 - loss 0.15692117 - time (sec): 20.94 - samples/sec: 493.96 - lr: 0.000097 - momentum: 0.000000
2023-10-11 02:11:01,692 epoch 5 - iter 87/292 - loss 0.15917298 - time (sec): 30.01 - samples/sec: 471.68 - lr: 0.000095 - momentum: 0.000000
2023-10-11 02:11:11,418 epoch 5 - iter 116/292 - loss 0.15002216 - time (sec): 39.74 - samples/sec: 471.89 - lr: 0.000093 - momentum: 0.000000
2023-10-11 02:11:20,887 epoch 5 - iter 145/292 - loss 0.15336799 - time (sec): 49.21 - samples/sec: 468.08 - lr: 0.000092 - momentum: 0.000000
2023-10-11 02:11:30,022 epoch 5 - iter 174/292 - loss 0.15146419 - time (sec): 58.34 - samples/sec: 463.19 - lr: 0.000090 - momentum: 0.000000
2023-10-11 02:11:39,439 epoch 5 - iter 203/292 - loss 0.14300769 - time (sec): 67.76 - samples/sec: 460.45 - lr: 0.000089 - momentum: 0.000000
2023-10-11 02:11:48,817 epoch 5 - iter 232/292 - loss 0.13804968 - time (sec): 77.13 - samples/sec: 459.44 - lr: 0.000087 - momentum: 0.000000
2023-10-11 02:11:58,717 epoch 5 - iter 261/292 - loss 0.13613741 - time (sec): 87.04 - samples/sec: 450.73 - lr: 0.000085 - momentum: 0.000000
2023-10-11 02:12:09,567 epoch 5 - iter 290/292 - loss 0.13557876 - time (sec): 97.88 - samples/sec: 450.67 - lr: 0.000084 - momentum: 0.000000
2023-10-11 02:12:10,207 ----------------------------------------------------------------------------------------------------
2023-10-11 02:12:10,207 EPOCH 5 done: loss 0.1354 - lr: 0.000084
2023-10-11 02:12:15,671 DEV : loss 0.13984300196170807 - f1-score (micro avg) 0.7489
2023-10-11 02:12:15,680 saving best model
2023-10-11 02:12:18,205 ----------------------------------------------------------------------------------------------------
2023-10-11 02:12:29,773 epoch 6 - iter 29/292 - loss 0.10058422 - time (sec): 11.56 - samples/sec: 474.25 - lr: 0.000082 - momentum: 0.000000
2023-10-11 02:12:40,312 epoch 6 - iter 58/292 - loss 0.08544370 - time (sec): 22.10 - samples/sec: 453.38 - lr: 0.000080 - momentum: 0.000000
2023-10-11 02:12:50,356 epoch 6 - iter 87/292 - loss 0.08708928 - time (sec): 32.15 - samples/sec: 426.73 - lr: 0.000078 - momentum: 0.000000
2023-10-11 02:13:01,548 epoch 6 - iter 116/292 - loss 0.08092895 - time (sec): 43.34 - samples/sec: 424.08 - lr: 0.000077 - momentum: 0.000000
2023-10-11 02:13:11,989 epoch 6 - iter 145/292 - loss 0.08555962 - time (sec): 53.78 - samples/sec: 412.40 - lr: 0.000075 - momentum: 0.000000
2023-10-11 02:13:23,353 epoch 6 - iter 174/292 - loss 0.09297579 - time (sec): 65.14 - samples/sec: 417.00 - lr: 0.000074 - momentum: 0.000000
2023-10-11 02:13:34,709 epoch 6 - iter 203/292 - loss 0.09014615 - time (sec): 76.50 - samples/sec: 418.86 - lr: 0.000072 - momentum: 0.000000
2023-10-11 02:13:45,045 epoch 6 - iter 232/292 - loss 0.09168343 - time (sec): 86.84 - samples/sec: 414.84 - lr: 0.000070 - momentum: 0.000000
2023-10-11 02:13:55,135 epoch 6 - iter 261/292 - loss 0.09313428 - time (sec): 96.93 - samples/sec: 410.14 - lr: 0.000069 - momentum: 0.000000
2023-10-11 02:14:05,786 epoch 6 - iter 290/292 - loss 0.09150458 - time (sec): 107.58 - samples/sec: 409.97 - lr: 0.000067 - momentum: 0.000000
2023-10-11 02:14:06,439 ----------------------------------------------------------------------------------------------------
2023-10-11 02:14:06,439 EPOCH 6 done: loss 0.0919 - lr: 0.000067
2023-10-11 02:14:12,053 DEV : loss 0.13401371240615845 - f1-score (micro avg) 0.7389
2023-10-11 02:14:12,062 ----------------------------------------------------------------------------------------------------
2023-10-11 02:14:22,013 epoch 7 - iter 29/292 - loss 0.07478785 - time (sec): 9.95 - samples/sec: 383.86 - lr: 0.000065 - momentum: 0.000000
2023-10-11 02:14:32,431 epoch 7 - iter 58/292 - loss 0.06829117 - time (sec): 20.37 - samples/sec: 403.99 - lr: 0.000063 - momentum: 0.000000
2023-10-11 02:14:42,354 epoch 7 - iter 87/292 - loss 0.07113550 - time (sec): 30.29 - samples/sec: 398.12 - lr: 0.000062 - momentum: 0.000000
2023-10-11 02:14:53,390 epoch 7 - iter 116/292 - loss 0.06732577 - time (sec): 41.33 - samples/sec: 411.00 - lr: 0.000060 - momentum: 0.000000
2023-10-11 02:15:04,754 epoch 7 - iter 145/292 - loss 0.06901744 - time (sec): 52.69 - samples/sec: 423.29 - lr: 0.000059 - momentum: 0.000000
2023-10-11 02:15:15,429 epoch 7 - iter 174/292 - loss 0.07310352 - time (sec): 63.36 - samples/sec: 418.42 - lr: 0.000057 - momentum: 0.000000
2023-10-11 02:15:26,391 epoch 7 - iter 203/292 - loss 0.07065403 - time (sec): 74.33 - samples/sec: 421.93 - lr: 0.000055 - momentum: 0.000000
2023-10-11 02:15:36,762 epoch 7 - iter 232/292 - loss 0.06950929 - time (sec): 84.70 - samples/sec: 419.08 - lr: 0.000054 - momentum: 0.000000
2023-10-11 02:15:47,152 epoch 7 - iter 261/292 - loss 0.06943601 - time (sec): 95.09 - samples/sec: 417.65 - lr: 0.000052 - momentum: 0.000000
2023-10-11 02:15:57,855 epoch 7 - iter 290/292 - loss 0.06956363 - time (sec): 105.79 - samples/sec: 418.22 - lr: 0.000050 - momentum: 0.000000
2023-10-11 02:15:58,400 ----------------------------------------------------------------------------------------------------
2023-10-11 02:15:58,400 EPOCH 7 done: loss 0.0696 - lr: 0.000050
2023-10-11 02:16:04,072 DEV : loss 0.13391728699207306 - f1-score (micro avg) 0.7357
2023-10-11 02:16:04,081 ----------------------------------------------------------------------------------------------------
2023-10-11 02:16:14,249 epoch 8 - iter 29/292 - loss 0.06820668 - time (sec): 10.17 - samples/sec: 424.47 - lr: 0.000048 - momentum: 0.000000
2023-10-11 02:16:24,061 epoch 8 - iter 58/292 - loss 0.05230772 - time (sec): 19.98 - samples/sec: 456.15 - lr: 0.000047 - momentum: 0.000000
2023-10-11 02:16:33,158 epoch 8 - iter 87/292 - loss 0.05006237 - time (sec): 29.08 - samples/sec: 452.34 - lr: 0.000045 - momentum: 0.000000
2023-10-11 02:16:42,482 epoch 8 - iter 116/292 - loss 0.05666740 - time (sec): 38.40 - samples/sec: 448.71 - lr: 0.000044 - momentum: 0.000000
2023-10-11 02:16:53,006 epoch 8 - iter 145/292 - loss 0.05276275 - time (sec): 48.92 - samples/sec: 460.60 - lr: 0.000042 - momentum: 0.000000
2023-10-11 02:17:03,233 epoch 8 - iter 174/292 - loss 0.05694576 - time (sec): 59.15 - samples/sec: 463.99 - lr: 0.000040 - momentum: 0.000000
2023-10-11 02:17:13,224 epoch 8 - iter 203/292 - loss 0.05567754 - time (sec): 69.14 - samples/sec: 457.25 - lr: 0.000039 - momentum: 0.000000
2023-10-11 02:17:22,451 epoch 8 - iter 232/292 - loss 0.05580510 - time (sec): 78.37 - samples/sec: 454.33 - lr: 0.000037 - momentum: 0.000000
2023-10-11 02:17:31,847 epoch 8 - iter 261/292 - loss 0.05688293 - time (sec): 87.76 - samples/sec: 453.04 - lr: 0.000035 - momentum: 0.000000
2023-10-11 02:17:41,618 epoch 8 - iter 290/292 - loss 0.05668654 - time (sec): 97.54 - samples/sec: 453.31 - lr: 0.000034 - momentum: 0.000000
2023-10-11 02:17:42,118 ----------------------------------------------------------------------------------------------------
2023-10-11 02:17:42,118 EPOCH 8 done: loss 0.0570 - lr: 0.000034
2023-10-11 02:17:48,026 DEV : loss 0.1477464735507965 - f1-score (micro avg) 0.7547
2023-10-11 02:17:48,035 saving best model
2023-10-11 02:17:50,563 ----------------------------------------------------------------------------------------------------
2023-10-11 02:18:00,505 epoch 9 - iter 29/292 - loss 0.03531180 - time (sec): 9.94 - samples/sec: 453.23 - lr: 0.000032 - momentum: 0.000000
2023-10-11 02:18:10,082 epoch 9 - iter 58/292 - loss 0.04230972 - time (sec): 19.51 - samples/sec: 445.01 - lr: 0.000030 - momentum: 0.000000
2023-10-11 02:18:19,857 epoch 9 - iter 87/292 - loss 0.03799972 - time (sec): 29.29 - samples/sec: 454.60 - lr: 0.000029 - momentum: 0.000000
2023-10-11 02:18:29,531 epoch 9 - iter 116/292 - loss 0.03532901 - time (sec): 38.96 - samples/sec: 449.14 - lr: 0.000027 - momentum: 0.000000
2023-10-11 02:18:39,614 epoch 9 - iter 145/292 - loss 0.03447012 - time (sec): 49.05 - samples/sec: 453.61 - lr: 0.000025 - momentum: 0.000000
2023-10-11 02:18:49,054 epoch 9 - iter 174/292 - loss 0.03671270 - time (sec): 58.49 - samples/sec: 448.50 - lr: 0.000024 - momentum: 0.000000
2023-10-11 02:18:59,545 epoch 9 - iter 203/292 - loss 0.03830952 - time (sec): 68.98 - samples/sec: 448.70 - lr: 0.000022 - momentum: 0.000000
2023-10-11 02:19:11,444 epoch 9 - iter 232/292 - loss 0.03940980 - time (sec): 80.88 - samples/sec: 445.06 - lr: 0.000020 - momentum: 0.000000
2023-10-11 02:19:21,556 epoch 9 - iter 261/292 - loss 0.04347846 - time (sec): 90.99 - samples/sec: 437.72 - lr: 0.000019 - momentum: 0.000000
2023-10-11 02:19:31,405 epoch 9 - iter 290/292 - loss 0.04558116 - time (sec): 100.84 - samples/sec: 438.63 - lr: 0.000017 - momentum: 0.000000
2023-10-11 02:19:31,898 ----------------------------------------------------------------------------------------------------
2023-10-11 02:19:31,899 EPOCH 9 done: loss 0.0456 - lr: 0.000017
2023-10-11 02:19:37,674 DEV : loss 0.1394050121307373 - f1-score (micro avg) 0.7531
2023-10-11 02:19:37,683 ----------------------------------------------------------------------------------------------------
2023-10-11 02:19:48,416 epoch 10 - iter 29/292 - loss 0.03819081 - time (sec): 10.73 - samples/sec: 476.18 - lr: 0.000015 - momentum: 0.000000
2023-10-11 02:19:58,555 epoch 10 - iter 58/292 - loss 0.03858124 - time (sec): 20.87 - samples/sec: 463.54 - lr: 0.000014 - momentum: 0.000000
2023-10-11 02:20:08,770 epoch 10 - iter 87/292 - loss 0.03560425 - time (sec): 31.08 - samples/sec: 461.22 - lr: 0.000012 - momentum: 0.000000
2023-10-11 02:20:17,810 epoch 10 - iter 116/292 - loss 0.03986947 - time (sec): 40.13 - samples/sec: 452.86 - lr: 0.000010 - momentum: 0.000000
2023-10-11 02:20:27,668 epoch 10 - iter 145/292 - loss 0.04154442 - time (sec): 49.98 - samples/sec: 454.54 - lr: 0.000009 - momentum: 0.000000
2023-10-11 02:20:38,171 epoch 10 - iter 174/292 - loss 0.03901543 - time (sec): 60.49 - samples/sec: 458.19 - lr: 0.000007 - momentum: 0.000000
2023-10-11 02:20:47,018 epoch 10 - iter 203/292 - loss 0.03846803 - time (sec): 69.33 - samples/sec: 450.03 - lr: 0.000005 - momentum: 0.000000
2023-10-11 02:20:56,695 epoch 10 - iter 232/292 - loss 0.03993511 - time (sec): 79.01 - samples/sec: 448.73 - lr: 0.000004 - momentum: 0.000000
2023-10-11 02:21:06,474 epoch 10 - iter 261/292 - loss 0.04162649 - time (sec): 88.79 - samples/sec: 448.18 - lr: 0.000002 - momentum: 0.000000
2023-10-11 02:21:16,298 epoch 10 - iter 290/292 - loss 0.04178492 - time (sec): 98.61 - samples/sec: 448.60 - lr: 0.000000 - momentum: 0.000000
2023-10-11 02:21:16,784 ----------------------------------------------------------------------------------------------------
2023-10-11 02:21:16,784 EPOCH 10 done: loss 0.0417 - lr: 0.000000
2023-10-11 02:21:22,559 DEV : loss 0.1379867047071457 - f1-score (micro avg) 0.7484
2023-10-11 02:21:23,442 ----------------------------------------------------------------------------------------------------
2023-10-11 02:21:23,444 Loading model from best epoch ...
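Aside: the per-iteration entries above all share one fixed layout (`epoch N - iter i/total - loss … - time (sec): … - samples/sec: … - lr: … - momentum: …`), so they can be pulled into structured records for plotting loss and learning-rate curves. A minimal sketch, assuming plain-text log input; `parse_iter_line` and its regex are illustrative helpers written for this log format, not part of Flair:

```python
import re

# Matches the per-iteration entries emitted during training, e.g.
# "... epoch 1 - iter 29/292 - loss 2.84429348 - time (sec): 9.73
#  - samples/sec: 484.22 - lr: 0.000014 - momentum: 0.000000"
LINE_RE = re.compile(
    r"epoch (?P<epoch>\d+) - iter (?P<it>\d+)/(?P<total>\d+) - "
    r"loss (?P<loss>[\d.]+) - time \(sec\): (?P<time>[\d.]+) - "
    r"samples/sec: (?P<sps>[\d.]+) - lr: (?P<lr>[\d.]+)"
)

def parse_iter_line(line: str) -> dict:
    """Turn one per-iteration log line into a typed record."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a per-iteration log line")
    d = m.groupdict()
    return {
        "epoch": int(d["epoch"]),
        "iter": int(d["it"]),
        "total": int(d["total"]),
        "loss": float(d["loss"]),
        "time_sec": float(d["time"]),
        "samples_per_sec": float(d["sps"]),
        "lr": float(d["lr"]),
    }

# Example on the first iteration entry of epoch 1:
rec = parse_iter_line(
    "2023-10-11 02:03:39,332 epoch 1 - iter 29/292 - loss 2.84429348 "
    "- time (sec): 9.73 - samples/sec: 484.22 - lr: 0.000014 - momentum: 0.000000"
)
```

Filtering a whole log file through this and grouping by `epoch` gives the raw series behind the `EPOCH N done` summaries.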
2023-10-11 02:21:27,265 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 02:21:40,198 Results:
- F-score (micro) 0.717
- F-score (macro) 0.6428
- Accuracy 0.5817

By class:
              precision    recall  f1-score   support

         PER     0.7923    0.8333    0.8123       348
         LOC     0.5913    0.7816    0.6733       261
         ORG     0.2830    0.2885    0.2857        52
   HumanProd     0.7826    0.8182    0.8000        22

   micro avg     0.6696    0.7716    0.7170       683
   macro avg     0.6123    0.6804    0.6428       683
weighted avg     0.6764    0.7716    0.7187       683

2023-10-11 02:21:40,199 ----------------------------------------------------------------------------------------------------
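As a sanity check, the aggregate rows of the final test-set table follow from the per-class rows by plain arithmetic (macro F1 = unweighted mean of class F1s, micro F1 = harmonic mean of the pooled precision and recall, weighted F1 = support-weighted mean). This is a standalone check on the numbers above, not Flair code:

```python
# Per-class (precision, recall, f1, support) from the table above.
classes = {
    "PER": (0.7923, 0.8333, 0.8123, 348),
    "LOC": (0.5913, 0.7816, 0.6733, 261),
    "ORG": (0.2830, 0.2885, 0.2857, 52),
    "HumanProd": (0.7826, 0.8182, 0.8000, 22),
}

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1, _ in classes.values()) / len(classes)

# Micro F1: harmonic mean of the pooled precision/recall
# reported on the "micro avg" row.
p, r = 0.6696, 0.7716
micro_f1 = 2 * p * r / (p + r)

# Weighted F1: per-class F1 weighted by support (683 spans total).
total = sum(s for *_, s in classes.values())
weighted_f1 = sum(f1 * s for _, _, f1, s in classes.values()) / total
```

Up to rounding, these reproduce the reported 0.6428 (macro), 0.7170 (micro), and 0.7187 (weighted).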