2023-10-09 21:19:56,874 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,877 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-09 21:19:56,877 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,877 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
- NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-09 21:19:56,877 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,877 Train: 20847 sentences
2023-10-09 21:19:56,877 (train_with_dev=False, train_with_test=False)
2023-10-09 21:19:56,878 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,878 Training Params:
2023-10-09 21:19:56,878 - learning_rate: "0.00016"
2023-10-09 21:19:56,878 - mini_batch_size: "8"
2023-10-09 21:19:56,878 - max_epochs: "10"
2023-10-09 21:19:56,878 - shuffle: "True"
2023-10-09 21:19:56,878 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,878 Plugins:
2023-10-09 21:19:56,878 - TensorboardLogger
2023-10-09 21:19:56,878 - LinearScheduler | warmup_fraction: '0.1'
2023-10-09 21:19:56,878 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,878 Final evaluation on model from best epoch (best-model.pt)
2023-10-09 21:19:56,879 - metric: "('micro avg', 'f1-score')"
2023-10-09 21:19:56,879 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,879 Computation:
2023-10-09 21:19:56,879 - compute on device: cuda:0
2023-10-09 21:19:56,879 - embedding storage: none
2023-10-09 21:19:56,879 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,879 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1"
2023-10-09 21:19:56,879 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,879 ----------------------------------------------------------------------------------------------------
2023-10-09 21:19:56,879 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-09 21:22:21,512 epoch 1 - iter 260/2606 - loss 2.80647480 - time (sec): 144.63 - samples/sec: 272.42 - lr: 0.000016 - momentum: 0.000000
2023-10-09 21:24:40,110 epoch 1 - iter 520/2606 - loss 2.57714924 - time (sec): 283.23 - samples/sec: 261.01 - lr: 0.000032 - momentum: 0.000000
2023-10-09 21:27:08,137 epoch 1 - iter 780/2606 - loss 2.16321673 - time (sec): 431.26 - samples/sec: 255.15 - lr: 0.000048 - momentum: 0.000000
2023-10-09 21:29:25,932 epoch 1 - iter 1040/2606 - loss 1.80319624 - time (sec): 569.05 - samples/sec: 253.10 - lr: 0.000064 - momentum: 0.000000
2023-10-09 21:31:44,572 epoch 1 - iter 1300/2606 - loss 1.51984788 - time (sec): 707.69 - samples/sec: 256.83 - lr: 0.000080 - momentum: 0.000000
2023-10-09 21:34:06,385 epoch 1 - iter 1560/2606 - loss 1.33539090 - time (sec): 849.50 - samples/sec: 258.86 - lr: 0.000096 - momentum: 0.000000
2023-10-09 21:36:25,234 epoch 1 - iter 1820/2606 - loss 1.20340260 - time (sec): 988.35 - samples/sec: 257.95 - lr: 0.000112 - momentum: 0.000000
2023-10-09 21:38:45,336 epoch 1 - iter 2080/2606 - loss 1.08973347 - time (sec): 1128.45 - samples/sec: 258.54 - lr: 0.000128 - momentum: 0.000000
2023-10-09 21:41:05,295 epoch 1 - iter 2340/2606 - loss 0.99333935 - time (sec): 1268.41 - samples/sec: 260.51 - lr: 0.000144 - momentum: 0.000000
2023-10-09 21:43:27,691 epoch 1 - iter 2600/2606 - loss 0.92110059 - time (sec): 1410.81 - samples/sec: 259.88 - lr: 0.000160 - momentum: 0.000000
2023-10-09 21:43:30,711 ----------------------------------------------------------------------------------------------------
2023-10-09 21:43:30,711 EPOCH 1 done: loss 0.9198 - lr: 0.000160
2023-10-09 21:44:07,491 DEV : loss 0.1330375373363495 - f1-score (micro avg) 0.3013
2023-10-09 21:44:07,556 saving best model
2023-10-09 21:44:08,556 ----------------------------------------------------------------------------------------------------
2023-10-09 21:46:29,548 epoch 2 - iter 260/2606 - loss 0.21276153 - time (sec): 140.99 - samples/sec: 282.02 - lr: 0.000158 - momentum: 0.000000
2023-10-09 21:48:49,379 epoch 2 - iter 520/2606 - loss 0.21284661 - time (sec): 280.82 - samples/sec: 279.79 - lr: 0.000156 - momentum: 0.000000
2023-10-09 21:51:07,729 epoch 2 - iter 780/2606 - loss 0.19909410 - time (sec): 419.17 - samples/sec: 276.40 - lr: 0.000155 - momentum: 0.000000
2023-10-09 21:53:23,117 epoch 2 - iter 1040/2606 - loss 0.19175427 - time (sec): 554.56 - samples/sec: 271.02 - lr: 0.000153 - momentum: 0.000000
2023-10-09 21:55:40,552 epoch 2 - iter 1300/2606 - loss 0.18755619 - time (sec): 691.99 - samples/sec: 269.33 - lr: 0.000151 - momentum: 0.000000
2023-10-09 21:57:56,677 epoch 2 - iter 1560/2606 - loss 0.18201966 - time (sec): 828.12 - samples/sec: 268.26 - lr: 0.000149 - momentum: 0.000000
2023-10-09 22:00:21,283 epoch 2 - iter 1820/2606 - loss 0.17717852 - time (sec): 972.72 - samples/sec: 265.45 - lr: 0.000148 - momentum: 0.000000
2023-10-09 22:02:40,967 epoch 2 - iter 2080/2606 - loss 0.17086806 - time (sec): 1112.41 - samples/sec: 265.29 - lr: 0.000146 - momentum: 0.000000
2023-10-09 22:05:00,885 epoch 2 - iter 2340/2606 - loss 0.16582540 - time (sec): 1252.33 - samples/sec: 265.78 - lr: 0.000144 - momentum: 0.000000
2023-10-09 22:07:18,380 epoch 2 - iter 2600/2606 - loss 0.16149225 - time (sec): 1389.82 - samples/sec: 263.81 - lr: 0.000142 - momentum: 0.000000
2023-10-09 22:07:21,444 ----------------------------------------------------------------------------------------------------
2023-10-09 22:07:21,445 EPOCH 2 done: loss 0.1613 - lr: 0.000142
2023-10-09 22:08:03,474 DEV : loss 0.11526025831699371 - f1-score (micro avg) 0.3843
2023-10-09 22:08:03,533 saving best model
2023-10-09 22:08:06,253 ----------------------------------------------------------------------------------------------------
2023-10-09 22:10:30,351 epoch 3 - iter 260/2606 - loss 0.09533533 - time (sec): 144.09 - samples/sec: 252.71 - lr: 0.000140 - momentum: 0.000000
2023-10-09 22:12:48,656 epoch 3 - iter 520/2606 - loss 0.09967778 - time (sec): 282.40 - samples/sec: 252.71 - lr: 0.000139 - momentum: 0.000000
2023-10-09 22:15:13,286 epoch 3 - iter 780/2606 - loss 0.09556336 - time (sec): 427.02 - samples/sec: 259.00 - lr: 0.000137 - momentum: 0.000000
2023-10-09 22:17:29,465 epoch 3 - iter 1040/2606 - loss 0.09703779 - time (sec): 563.20 - samples/sec: 255.55 - lr: 0.000135 - momentum: 0.000000
2023-10-09 22:19:43,209 epoch 3 - iter 1300/2606 - loss 0.09641637 - time (sec): 696.95 - samples/sec: 254.29 - lr: 0.000133 - momentum: 0.000000
2023-10-09 22:22:06,685 epoch 3 - iter 1560/2606 - loss 0.09648911 - time (sec): 840.42 - samples/sec: 258.14 - lr: 0.000132 - momentum: 0.000000
2023-10-09 22:24:35,613 epoch 3 - iter 1820/2606 - loss 0.09589935 - time (sec): 989.35 - samples/sec: 258.39 - lr: 0.000130 - momentum: 0.000000
2023-10-09 22:26:56,117 epoch 3 - iter 2080/2606 - loss 0.09505206 - time (sec): 1129.86 - samples/sec: 259.55 - lr: 0.000128 - momentum: 0.000000
2023-10-09 22:29:16,749 epoch 3 - iter 2340/2606 - loss 0.09458005 - time (sec): 1270.49 - samples/sec: 260.79 - lr: 0.000126 - momentum: 0.000000
2023-10-09 22:31:34,336 epoch 3 - iter 2600/2606 - loss 0.09389335 - time (sec): 1408.08 - samples/sec: 260.50 - lr: 0.000125 - momentum: 0.000000
2023-10-09 22:31:37,238 ----------------------------------------------------------------------------------------------------
2023-10-09 22:31:37,238 EPOCH 3 done: loss 0.0940 - lr: 0.000125
2023-10-09 22:32:18,211 DEV : loss 0.21473725140094757 - f1-score (micro avg) 0.3466
2023-10-09 22:32:18,273 ----------------------------------------------------------------------------------------------------
2023-10-09 22:34:36,848 epoch 4 - iter 260/2606 - loss 0.06749285 - time (sec): 138.57 - samples/sec: 263.49 - lr: 0.000123 - momentum: 0.000000
2023-10-09 22:36:57,878 epoch 4 - iter 520/2606 - loss 0.06262288 - time (sec): 279.60 - samples/sec: 257.39 - lr: 0.000121 - momentum: 0.000000
2023-10-09 22:39:15,206 epoch 4 - iter 780/2606 - loss 0.06093080 - time (sec): 416.93 - samples/sec: 258.17 - lr: 0.000119 - momentum: 0.000000
2023-10-09 22:41:33,785 epoch 4 - iter 1040/2606 - loss 0.06300101 - time (sec): 555.51 - samples/sec: 258.27 - lr: 0.000117 - momentum: 0.000000
2023-10-09 22:43:52,308 epoch 4 - iter 1300/2606 - loss 0.06730143 - time (sec): 694.03 - samples/sec: 260.72 - lr: 0.000116 - momentum: 0.000000
2023-10-09 22:46:21,314 epoch 4 - iter 1560/2606 - loss 0.06445156 - time (sec): 843.04 - samples/sec: 262.43 - lr: 0.000114 - momentum: 0.000000
2023-10-09 22:48:37,254 epoch 4 - iter 1820/2606 - loss 0.06404726 - time (sec): 978.98 - samples/sec: 261.77 - lr: 0.000112 - momentum: 0.000000
2023-10-09 22:50:58,245 epoch 4 - iter 2080/2606 - loss 0.06441531 - time (sec): 1119.97 - samples/sec: 260.79 - lr: 0.000110 - momentum: 0.000000
2023-10-09 22:53:19,391 epoch 4 - iter 2340/2606 - loss 0.06601434 - time (sec): 1261.12 - samples/sec: 261.63 - lr: 0.000109 - momentum: 0.000000
2023-10-09 22:55:43,080 epoch 4 - iter 2600/2606 - loss 0.06647076 - time (sec): 1404.80 - samples/sec: 261.04 - lr: 0.000107 - momentum: 0.000000
2023-10-09 22:55:46,216 ----------------------------------------------------------------------------------------------------
2023-10-09 22:55:46,217 EPOCH 4 done: loss 0.0665 - lr: 0.000107
2023-10-09 22:56:28,195 DEV : loss 0.25312381982803345 - f1-score (micro avg) 0.3504
2023-10-09 22:56:28,256 ----------------------------------------------------------------------------------------------------
2023-10-09 22:58:52,649 epoch 5 - iter 260/2606 - loss 0.04687563 - time (sec): 144.39 - samples/sec: 238.37 - lr: 0.000105 - momentum: 0.000000
2023-10-09 23:01:12,773 epoch 5 - iter 520/2606 - loss 0.05344267 - time (sec): 284.51 - samples/sec: 250.63 - lr: 0.000103 - momentum: 0.000000
2023-10-09 23:03:33,085 epoch 5 - iter 780/2606 - loss 0.05092848 - time (sec): 424.83 - samples/sec: 258.55 - lr: 0.000101 - momentum: 0.000000
2023-10-09 23:05:57,687 epoch 5 - iter 1040/2606 - loss 0.04913735 - time (sec): 569.43 - samples/sec: 260.71 - lr: 0.000100 - momentum: 0.000000
2023-10-09 23:08:12,298 epoch 5 - iter 1300/2606 - loss 0.04916492 - time (sec): 704.04 - samples/sec: 260.14 - lr: 0.000098 - momentum: 0.000000
2023-10-09 23:10:31,281 epoch 5 - iter 1560/2606 - loss 0.05150383 - time (sec): 843.02 - samples/sec: 261.58 - lr: 0.000096 - momentum: 0.000000
2023-10-09 23:12:55,333 epoch 5 - iter 1820/2606 - loss 0.05223482 - time (sec): 987.07 - samples/sec: 262.40 - lr: 0.000094 - momentum: 0.000000
2023-10-09 23:15:20,555 epoch 5 - iter 2080/2606 - loss 0.05153119 - time (sec): 1132.30 - samples/sec: 260.99 - lr: 0.000093 - momentum: 0.000000
2023-10-09 23:17:39,058 epoch 5 - iter 2340/2606 - loss 0.05043738 - time (sec): 1270.80 - samples/sec: 259.67 - lr: 0.000091 - momentum: 0.000000
2023-10-09 23:20:03,897 epoch 5 - iter 2600/2606 - loss 0.05060692 - time (sec): 1415.64 - samples/sec: 258.66 - lr: 0.000089 - momentum: 0.000000
2023-10-09 23:20:07,768 ----------------------------------------------------------------------------------------------------
2023-10-09 23:20:07,768 EPOCH 5 done: loss 0.0505 - lr: 0.000089
2023-10-09 23:20:48,701 DEV : loss 0.2983781099319458 - f1-score (micro avg) 0.3832
2023-10-09 23:20:48,772 ----------------------------------------------------------------------------------------------------
2023-10-09 23:23:07,560 epoch 6 - iter 260/2606 - loss 0.02937703 - time (sec): 138.79 - samples/sec: 264.82 - lr: 0.000087 - momentum: 0.000000
2023-10-09 23:25:30,695 epoch 6 - iter 520/2606 - loss 0.03318626 - time (sec): 281.92 - samples/sec: 251.56 - lr: 0.000085 - momentum: 0.000000
2023-10-09 23:27:52,282 epoch 6 - iter 780/2606 - loss 0.03155811 - time (sec): 423.51 - samples/sec: 259.36 - lr: 0.000084 - momentum: 0.000000
2023-10-09 23:30:13,592 epoch 6 - iter 1040/2606 - loss 0.03188441 - time (sec): 564.82 - samples/sec: 257.25 - lr: 0.000082 - momentum: 0.000000
2023-10-09 23:32:34,084 epoch 6 - iter 1300/2606 - loss 0.03353433 - time (sec): 705.31 - samples/sec: 257.77 - lr: 0.000080 - momentum: 0.000000
2023-10-09 23:34:58,642 epoch 6 - iter 1560/2606 - loss 0.03464167 - time (sec): 849.87 - samples/sec: 254.06 - lr: 0.000078 - momentum: 0.000000
2023-10-09 23:37:22,052 epoch 6 - iter 1820/2606 - loss 0.03470450 - time (sec): 993.28 - samples/sec: 254.45 - lr: 0.000077 - momentum: 0.000000
2023-10-09 23:39:41,751 epoch 6 - iter 2080/2606 - loss 0.03483777 - time (sec): 1132.98 - samples/sec: 256.43 - lr: 0.000075 - momentum: 0.000000
2023-10-09 23:42:05,986 epoch 6 - iter 2340/2606 - loss 0.03565070 - time (sec): 1277.21 - samples/sec: 257.37 - lr: 0.000073 - momentum: 0.000000
2023-10-09 23:44:26,648 epoch 6 - iter 2600/2606 - loss 0.03682143 - time (sec): 1417.87 - samples/sec: 258.81 - lr: 0.000071 - momentum: 0.000000
2023-10-09 23:44:29,467 ----------------------------------------------------------------------------------------------------
2023-10-09 23:44:29,468 EPOCH 6 done: loss 0.0368 - lr: 0.000071
2023-10-09 23:45:10,942 DEV : loss 0.35610052943229675 - f1-score (micro avg) 0.3742
2023-10-09 23:45:11,003 ----------------------------------------------------------------------------------------------------
2023-10-09 23:47:39,689 epoch 7 - iter 260/2606 - loss 0.02267644 - time (sec): 148.68 - samples/sec: 258.00 - lr: 0.000069 - momentum: 0.000000
2023-10-09 23:50:07,336 epoch 7 - iter 520/2606 - loss 0.02417424 - time (sec): 296.33 - samples/sec: 258.54 - lr: 0.000068 - momentum: 0.000000
2023-10-09 23:52:24,084 epoch 7 - iter 780/2606 - loss 0.02606608 - time (sec): 433.08 - samples/sec: 257.55 - lr: 0.000066 - momentum: 0.000000
2023-10-09 23:54:53,690 epoch 7 - iter 1040/2606 - loss 0.02526796 - time (sec): 582.68 - samples/sec: 255.54 - lr: 0.000064 - momentum: 0.000000
2023-10-09 23:57:14,548 epoch 7 - iter 1300/2606 - loss 0.02470524 - time (sec): 723.54 - samples/sec: 258.11 - lr: 0.000062 - momentum: 0.000000
2023-10-09 23:59:40,240 epoch 7 - iter 1560/2606 - loss 0.02608201 - time (sec): 869.23 - samples/sec: 257.31 - lr: 0.000061 - momentum: 0.000000
2023-10-10 00:02:10,450 epoch 7 - iter 1820/2606 - loss 0.02541811 - time (sec): 1019.44 - samples/sec: 254.49 - lr: 0.000059 - momentum: 0.000000
2023-10-10 00:04:28,537 epoch 7 - iter 2080/2606 - loss 0.02616950 - time (sec): 1157.53 - samples/sec: 255.27 - lr: 0.000057 - momentum: 0.000000
2023-10-10 00:06:52,824 epoch 7 - iter 2340/2606 - loss 0.02604706 - time (sec): 1301.82 - samples/sec: 255.00 - lr: 0.000055 - momentum: 0.000000
2023-10-10 00:09:10,093 epoch 7 - iter 2600/2606 - loss 0.02658662 - time (sec): 1439.09 - samples/sec: 254.78 - lr: 0.000053 - momentum: 0.000000
2023-10-10 00:09:13,227 ----------------------------------------------------------------------------------------------------
2023-10-10 00:09:13,228 EPOCH 7 done: loss 0.0266 - lr: 0.000053
2023-10-10 00:09:54,499 DEV : loss 0.36638563871383667 - f1-score (micro avg) 0.393
2023-10-10 00:09:54,560 saving best model
2023-10-10 00:09:57,284 ----------------------------------------------------------------------------------------------------
2023-10-10 00:12:19,774 epoch 8 - iter 260/2606 - loss 0.01822913 - time (sec): 142.49 - samples/sec: 253.32 - lr: 0.000052 - momentum: 0.000000
2023-10-10 00:14:43,139 epoch 8 - iter 520/2606 - loss 0.01834896 - time (sec): 285.85 - samples/sec: 254.92 - lr: 0.000050 - momentum: 0.000000
2023-10-10 00:17:03,348 epoch 8 - iter 780/2606 - loss 0.01923891 - time (sec): 426.06 - samples/sec: 259.99 - lr: 0.000048 - momentum: 0.000000
2023-10-10 00:19:26,521 epoch 8 - iter 1040/2606 - loss 0.01918401 - time (sec): 569.23 - samples/sec: 257.56 - lr: 0.000046 - momentum: 0.000000
2023-10-10 00:21:46,854 epoch 8 - iter 1300/2606 - loss 0.02003374 - time (sec): 709.57 - samples/sec: 257.12 - lr: 0.000045 - momentum: 0.000000
2023-10-10 00:24:08,172 epoch 8 - iter 1560/2606 - loss 0.02027599 - time (sec): 850.88 - samples/sec: 258.03 - lr: 0.000043 - momentum: 0.000000
2023-10-10 00:26:31,945 epoch 8 - iter 1820/2606 - loss 0.02016303 - time (sec): 994.66 - samples/sec: 256.05 - lr: 0.000041 - momentum: 0.000000
2023-10-10 00:28:52,525 epoch 8 - iter 2080/2606 - loss 0.01973949 - time (sec): 1135.24 - samples/sec: 258.50 - lr: 0.000039 - momentum: 0.000000
2023-10-10 00:31:16,007 epoch 8 - iter 2340/2606 - loss 0.01926607 - time (sec): 1278.72 - samples/sec: 258.44 - lr: 0.000037 - momentum: 0.000000
2023-10-10 00:33:36,034 epoch 8 - iter 2600/2606 - loss 0.01941452 - time (sec): 1418.75 - samples/sec: 258.42 - lr: 0.000036 - momentum: 0.000000
2023-10-10 00:33:39,247 ----------------------------------------------------------------------------------------------------
2023-10-10 00:33:39,248 EPOCH 8 done: loss 0.0194 - lr: 0.000036
2023-10-10 00:34:22,217 DEV : loss 0.4113345742225647 - f1-score (micro avg) 0.4105
2023-10-10 00:34:22,275 saving best model
2023-10-10 00:34:25,003 ----------------------------------------------------------------------------------------------------
2023-10-10 00:36:50,095 epoch 9 - iter 260/2606 - loss 0.01806776 - time (sec): 145.09 - samples/sec: 260.15 - lr: 0.000034 - momentum: 0.000000
2023-10-10 00:39:15,157 epoch 9 - iter 520/2606 - loss 0.01665407 - time (sec): 290.15 - samples/sec: 260.23 - lr: 0.000032 - momentum: 0.000000
2023-10-10 00:41:35,122 epoch 9 - iter 780/2606 - loss 0.01567942 - time (sec): 430.11 - samples/sec: 255.60 - lr: 0.000030 - momentum: 0.000000
2023-10-10 00:44:04,493 epoch 9 - iter 1040/2606 - loss 0.01509980 - time (sec): 579.49 - samples/sec: 253.38 - lr: 0.000029 - momentum: 0.000000
2023-10-10 00:46:23,680 epoch 9 - iter 1300/2606 - loss 0.01591559 - time (sec): 718.67 - samples/sec: 255.04 - lr: 0.000027 - momentum: 0.000000
2023-10-10 00:48:42,310 epoch 9 - iter 1560/2606 - loss 0.01573140 - time (sec): 857.30 - samples/sec: 256.72 - lr: 0.000025 - momentum: 0.000000
2023-10-10 00:51:00,451 epoch 9 - iter 1820/2606 - loss 0.01521895 - time (sec): 995.44 - samples/sec: 256.73 - lr: 0.000023 - momentum: 0.000000
2023-10-10 00:53:21,383 epoch 9 - iter 2080/2606 - loss 0.01475674 - time (sec): 1136.38 - samples/sec: 256.09 - lr: 0.000021 - momentum: 0.000000
2023-10-10 00:55:45,112 epoch 9 - iter 2340/2606 - loss 0.01445053 - time (sec): 1280.10 - samples/sec: 256.35 - lr: 0.000020 - momentum: 0.000000
2023-10-10 00:58:03,910 epoch 9 - iter 2600/2606 - loss 0.01402838 - time (sec): 1418.90 - samples/sec: 258.18 - lr: 0.000018 - momentum: 0.000000
2023-10-10 00:58:07,250 ----------------------------------------------------------------------------------------------------
2023-10-10 00:58:07,251 EPOCH 9 done: loss 0.0140 - lr: 0.000018
2023-10-10 00:58:48,323 DEV : loss 0.45426633954048157 - f1-score (micro avg) 0.3959
2023-10-10 00:58:48,375 ----------------------------------------------------------------------------------------------------
2023-10-10 01:01:08,212 epoch 10 - iter 260/2606 - loss 0.01259898 - time (sec): 139.83 - samples/sec: 262.55 - lr: 0.000016 - momentum: 0.000000
2023-10-10 01:03:30,710 epoch 10 - iter 520/2606 - loss 0.01112512 - time (sec): 282.33 - samples/sec: 254.43 - lr: 0.000014 - momentum: 0.000000
2023-10-10 01:05:50,144 epoch 10 - iter 780/2606 - loss 0.01157811 - time (sec): 421.77 - samples/sec: 248.87 - lr: 0.000013 - momentum: 0.000000
2023-10-10 01:08:09,984 epoch 10 - iter 1040/2606 - loss 0.01040706 - time (sec): 561.61 - samples/sec: 256.04 - lr: 0.000011 - momentum: 0.000000
2023-10-10 01:10:35,004 epoch 10 - iter 1300/2606 - loss 0.01105992 - time (sec): 706.63 - samples/sec: 261.18 - lr: 0.000009 - momentum: 0.000000
2023-10-10 01:12:55,356 epoch 10 - iter 1560/2606 - loss 0.01090713 - time (sec): 846.98 - samples/sec: 259.41 - lr: 0.000007 - momentum: 0.000000
2023-10-10 01:15:24,167 epoch 10 - iter 1820/2606 - loss 0.01106558 - time (sec): 995.79 - samples/sec: 258.10 - lr: 0.000005 - momentum: 0.000000
2023-10-10 01:17:45,318 epoch 10 - iter 2080/2606 - loss 0.01060413 - time (sec): 1136.94 - samples/sec: 259.10 - lr: 0.000004 - momentum: 0.000000
2023-10-10 01:20:04,288 epoch 10 - iter 2340/2606 - loss 0.01019989 - time (sec): 1275.91 - samples/sec: 260.29 - lr: 0.000002 - momentum: 0.000000
2023-10-10 01:22:23,524 epoch 10 - iter 2600/2606 - loss 0.01013457 - time (sec): 1415.15 - samples/sec: 258.97 - lr: 0.000000 - momentum: 0.000000
2023-10-10 01:22:26,690 ----------------------------------------------------------------------------------------------------
2023-10-10 01:22:26,690 EPOCH 10 done: loss 0.0101 - lr: 0.000000
2023-10-10 01:23:06,687 DEV : loss 0.4742611050605774 - f1-score (micro avg) 0.3928
2023-10-10 01:23:07,733 ----------------------------------------------------------------------------------------------------
2023-10-10 01:23:07,735 Loading model from best epoch ...
2023-10-10 01:23:11,754 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-10 01:24:55,577
Results:
- F-score (micro) 0.4682
- F-score (macro) 0.3262
- Accuracy 0.3104
By class:
              precision    recall  f1-score   support
         LOC     0.5077    0.5700    0.5371      1214
         PER     0.3953    0.4554    0.4232       808
         ORG     0.3407    0.3484    0.3445       353
   HumanProd     0.0000    0.0000    0.0000        15
   micro avg     0.4442    0.4950    0.4682      2390
   macro avg     0.3109    0.3435    0.3262      2390
weighted avg     0.4418    0.4950    0.4668      2390
2023-10-10 01:24:55,577 ----------------------------------------------------------------------------------------------------
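
Note on reading the final scores: the sketch below is not part of the training run. It only recomputes the reported F-scores from the precision, recall, and support values printed in the "By class" table, as a consistency check. The helper name `f1` and the dictionary layout are illustrative assumptions. Because the inputs are already rounded to four decimals, the recomputed values should agree with the log only up to rounding.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, support) copied from the "By class" table in the log.
by_class = {
    "LOC": (0.5077, 0.5700, 1214),
    "PER": (0.3953, 0.4554, 808),
    "ORG": (0.3407, 0.3484, 353),
    "HumanProd": (0.0000, 0.0000, 15),
}

# Per-class F1 recomputed from the rounded precision/recall values.
class_f1 = {name: f1(p, r) for name, (p, r, _) in by_class.items()}

# Macro average: unweighted mean over classes, so the empty HumanProd
# class pulls the score down as much as any other class would.
macro_f1 = sum(class_f1.values()) / len(class_f1)

# Weighted average: mean over classes weighted by support.
total_support = sum(s for _, _, s in by_class.values())
weighted_f1 = sum(class_f1[n] * s for n, (_, _, s) in by_class.items()) / total_support

# Micro average: F1 of the pooled precision/recall from the "micro avg" row.
micro_f1 = f1(0.4442, 0.4950)

print(f"micro={micro_f1:.4f} macro={macro_f1:.4f} weighted={weighted_f1:.4f}")
```

The gap between the micro (0.4682) and macro (0.3262) scores comes largely from HumanProd, which has only 15 test entities and an F1 of 0.0000: it counts as a full quarter of the macro average but barely affects the micro average.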