2023-10-17 13:37:25,646 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,647 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): ElectraModel(
      (embeddings): ElectraEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): ElectraEncoder(
        (layer): ModuleList(
          (0-11): 12 x ElectraLayer(
            (attention): ElectraAttention(
              (self): ElectraSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): ElectraSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): ElectraIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): ElectraOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-17 13:37:25,647 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,647 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-17 13:37:25,647 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,648 Train:  7142 sentences
2023-10-17 13:37:25,648         (train_with_dev=False, train_with_test=False)
2023-10-17 13:37:25,648 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,648 Training Params:
2023-10-17 13:37:25,648  - learning_rate: "3e-05" 
2023-10-17 13:37:25,648  - mini_batch_size: "4"
2023-10-17 13:37:25,648  - max_epochs: "10"
2023-10-17 13:37:25,648  - shuffle: "True"
2023-10-17 13:37:25,648 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,648 Plugins:
2023-10-17 13:37:25,648  - TensorboardLogger
2023-10-17 13:37:25,648  - LinearScheduler | warmup_fraction: '0.1'
2023-10-17 13:37:25,648 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,648 Final evaluation on model from best epoch (best-model.pt)
2023-10-17 13:37:25,648  - metric: "('micro avg', 'f1-score')"
2023-10-17 13:37:25,648 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,648 Computation:
2023-10-17 13:37:25,648  - compute on device: cuda:0
2023-10-17 13:37:25,648  - embedding storage: none
2023-10-17 13:37:25,648 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,648 Model training base path: "hmbench-newseye/fr-hmteams/teams-base-historic-multilingual-discriminator-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-17 13:37:25,648 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,648 ----------------------------------------------------------------------------------------------------
2023-10-17 13:37:25,648 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-17 13:37:34,100 epoch 1 - iter 178/1786 - loss 2.71850894 - time (sec): 8.45 - samples/sec: 2722.97 - lr: 0.000003 - momentum: 0.000000
2023-10-17 13:37:43,271 epoch 1 - iter 356/1786 - loss 1.55627930 - time (sec): 17.62 - samples/sec: 2818.11 - lr: 0.000006 - momentum: 0.000000
2023-10-17 13:37:52,167 epoch 1 - iter 534/1786 - loss 1.17692252 - time (sec): 26.52 - samples/sec: 2810.54 - lr: 0.000009 - momentum: 0.000000
2023-10-17 13:38:01,158 epoch 1 - iter 712/1786 - loss 0.95620126 - time (sec): 35.51 - samples/sec: 2847.03 - lr: 0.000012 - momentum: 0.000000
2023-10-17 13:38:09,947 epoch 1 - iter 890/1786 - loss 0.81990248 - time (sec): 44.30 - samples/sec: 2817.71 - lr: 0.000015 - momentum: 0.000000
2023-10-17 13:38:18,765 epoch 1 - iter 1068/1786 - loss 0.72732692 - time (sec): 53.12 - samples/sec: 2789.30 - lr: 0.000018 - momentum: 0.000000
2023-10-17 13:38:27,632 epoch 1 - iter 1246/1786 - loss 0.64882748 - time (sec): 61.98 - samples/sec: 2788.02 - lr: 0.000021 - momentum: 0.000000
2023-10-17 13:38:36,725 epoch 1 - iter 1424/1786 - loss 0.58363820 - time (sec): 71.08 - samples/sec: 2803.30 - lr: 0.000024 - momentum: 0.000000
2023-10-17 13:38:45,495 epoch 1 - iter 1602/1786 - loss 0.53997715 - time (sec): 79.85 - samples/sec: 2791.05 - lr: 0.000027 - momentum: 0.000000
2023-10-17 13:38:54,216 epoch 1 - iter 1780/1786 - loss 0.49959069 - time (sec): 88.57 - samples/sec: 2802.19 - lr: 0.000030 - momentum: 0.000000
2023-10-17 13:38:54,485 ----------------------------------------------------------------------------------------------------
2023-10-17 13:38:54,486 EPOCH 1 done: loss 0.4990 - lr: 0.000030
2023-10-17 13:38:57,577 DEV : loss 0.11935114115476608 - f1-score (micro avg)  0.7585
2023-10-17 13:38:57,593 saving best model
2023-10-17 13:38:57,942 ----------------------------------------------------------------------------------------------------
2023-10-17 13:39:07,045 epoch 2 - iter 178/1786 - loss 0.14207407 - time (sec): 9.10 - samples/sec: 2671.63 - lr: 0.000030 - momentum: 0.000000
2023-10-17 13:39:15,919 epoch 2 - iter 356/1786 - loss 0.13038487 - time (sec): 17.98 - samples/sec: 2672.23 - lr: 0.000029 - momentum: 0.000000
2023-10-17 13:39:24,577 epoch 2 - iter 534/1786 - loss 0.12627094 - time (sec): 26.63 - samples/sec: 2624.39 - lr: 0.000029 - momentum: 0.000000
2023-10-17 13:39:33,752 epoch 2 - iter 712/1786 - loss 0.12554020 - time (sec): 35.81 - samples/sec: 2655.03 - lr: 0.000029 - momentum: 0.000000
2023-10-17 13:39:42,945 epoch 2 - iter 890/1786 - loss 0.12266313 - time (sec): 45.00 - samples/sec: 2698.72 - lr: 0.000028 - momentum: 0.000000
2023-10-17 13:39:52,042 epoch 2 - iter 1068/1786 - loss 0.12077729 - time (sec): 54.10 - samples/sec: 2723.97 - lr: 0.000028 - momentum: 0.000000
2023-10-17 13:40:01,199 epoch 2 - iter 1246/1786 - loss 0.11775653 - time (sec): 63.26 - samples/sec: 2764.92 - lr: 0.000028 - momentum: 0.000000
2023-10-17 13:40:09,971 epoch 2 - iter 1424/1786 - loss 0.12007377 - time (sec): 72.03 - samples/sec: 2776.20 - lr: 0.000027 - momentum: 0.000000
2023-10-17 13:40:18,872 epoch 2 - iter 1602/1786 - loss 0.11957108 - time (sec): 80.93 - samples/sec: 2773.10 - lr: 0.000027 - momentum: 0.000000
2023-10-17 13:40:27,656 epoch 2 - iter 1780/1786 - loss 0.11971143 - time (sec): 89.71 - samples/sec: 2765.64 - lr: 0.000027 - momentum: 0.000000
2023-10-17 13:40:27,939 ----------------------------------------------------------------------------------------------------
2023-10-17 13:40:27,940 EPOCH 2 done: loss 0.1195 - lr: 0.000027
2023-10-17 13:40:32,630 DEV : loss 0.10125791281461716 - f1-score (micro avg)  0.8144
2023-10-17 13:40:32,647 saving best model
2023-10-17 13:40:33,095 ----------------------------------------------------------------------------------------------------
2023-10-17 13:40:41,901 epoch 3 - iter 178/1786 - loss 0.07418424 - time (sec): 8.80 - samples/sec: 2788.17 - lr: 0.000026 - momentum: 0.000000
2023-10-17 13:40:50,945 epoch 3 - iter 356/1786 - loss 0.07570841 - time (sec): 17.84 - samples/sec: 2768.58 - lr: 0.000026 - momentum: 0.000000
2023-10-17 13:40:59,866 epoch 3 - iter 534/1786 - loss 0.07399212 - time (sec): 26.77 - samples/sec: 2788.93 - lr: 0.000026 - momentum: 0.000000
2023-10-17 13:41:08,915 epoch 3 - iter 712/1786 - loss 0.07427751 - time (sec): 35.82 - samples/sec: 2747.68 - lr: 0.000025 - momentum: 0.000000
2023-10-17 13:41:17,929 epoch 3 - iter 890/1786 - loss 0.07471144 - time (sec): 44.83 - samples/sec: 2766.71 - lr: 0.000025 - momentum: 0.000000
2023-10-17 13:41:26,655 epoch 3 - iter 1068/1786 - loss 0.07885766 - time (sec): 53.56 - samples/sec: 2762.94 - lr: 0.000025 - momentum: 0.000000
2023-10-17 13:41:35,429 epoch 3 - iter 1246/1786 - loss 0.08022104 - time (sec): 62.33 - samples/sec: 2764.62 - lr: 0.000024 - momentum: 0.000000
2023-10-17 13:41:44,429 epoch 3 - iter 1424/1786 - loss 0.07990684 - time (sec): 71.33 - samples/sec: 2780.82 - lr: 0.000024 - momentum: 0.000000
2023-10-17 13:41:53,237 epoch 3 - iter 1602/1786 - loss 0.08236359 - time (sec): 80.14 - samples/sec: 2781.43 - lr: 0.000024 - momentum: 0.000000
2023-10-17 13:42:02,129 epoch 3 - iter 1780/1786 - loss 0.08175112 - time (sec): 89.03 - samples/sec: 2781.51 - lr: 0.000023 - momentum: 0.000000
2023-10-17 13:42:02,455 ----------------------------------------------------------------------------------------------------
2023-10-17 13:42:02,456 EPOCH 3 done: loss 0.0818 - lr: 0.000023
2023-10-17 13:42:06,658 DEV : loss 0.13423609733581543 - f1-score (micro avg)  0.7967
2023-10-17 13:42:06,675 ----------------------------------------------------------------------------------------------------
2023-10-17 13:42:15,751 epoch 4 - iter 178/1786 - loss 0.05237935 - time (sec): 9.07 - samples/sec: 2848.98 - lr: 0.000023 - momentum: 0.000000
2023-10-17 13:42:24,779 epoch 4 - iter 356/1786 - loss 0.05659288 - time (sec): 18.10 - samples/sec: 2788.78 - lr: 0.000023 - momentum: 0.000000
2023-10-17 13:42:33,850 epoch 4 - iter 534/1786 - loss 0.05742020 - time (sec): 27.17 - samples/sec: 2798.72 - lr: 0.000022 - momentum: 0.000000
2023-10-17 13:42:43,043 epoch 4 - iter 712/1786 - loss 0.05968286 - time (sec): 36.37 - samples/sec: 2804.23 - lr: 0.000022 - momentum: 0.000000
2023-10-17 13:42:51,955 epoch 4 - iter 890/1786 - loss 0.06046189 - time (sec): 45.28 - samples/sec: 2778.04 - lr: 0.000022 - momentum: 0.000000
2023-10-17 13:43:01,044 epoch 4 - iter 1068/1786 - loss 0.05955440 - time (sec): 54.37 - samples/sec: 2780.27 - lr: 0.000021 - momentum: 0.000000
2023-10-17 13:43:10,485 epoch 4 - iter 1246/1786 - loss 0.06002902 - time (sec): 63.81 - samples/sec: 2761.07 - lr: 0.000021 - momentum: 0.000000
2023-10-17 13:43:19,111 epoch 4 - iter 1424/1786 - loss 0.05900004 - time (sec): 72.43 - samples/sec: 2757.94 - lr: 0.000021 - momentum: 0.000000
2023-10-17 13:43:27,980 epoch 4 - iter 1602/1786 - loss 0.05891783 - time (sec): 81.30 - samples/sec: 2755.96 - lr: 0.000020 - momentum: 0.000000
2023-10-17 13:43:36,698 epoch 4 - iter 1780/1786 - loss 0.05878177 - time (sec): 90.02 - samples/sec: 2752.92 - lr: 0.000020 - momentum: 0.000000
2023-10-17 13:43:37,002 ----------------------------------------------------------------------------------------------------
2023-10-17 13:43:37,002 EPOCH 4 done: loss 0.0589 - lr: 0.000020
2023-10-17 13:43:41,153 DEV : loss 0.16494163870811462 - f1-score (micro avg)  0.8072
2023-10-17 13:43:41,170 ----------------------------------------------------------------------------------------------------
2023-10-17 13:43:50,053 epoch 5 - iter 178/1786 - loss 0.03097896 - time (sec): 8.88 - samples/sec: 2793.19 - lr: 0.000020 - momentum: 0.000000
2023-10-17 13:43:59,171 epoch 5 - iter 356/1786 - loss 0.04055098 - time (sec): 18.00 - samples/sec: 2843.10 - lr: 0.000019 - momentum: 0.000000
2023-10-17 13:44:08,165 epoch 5 - iter 534/1786 - loss 0.04153062 - time (sec): 26.99 - samples/sec: 2845.05 - lr: 0.000019 - momentum: 0.000000
2023-10-17 13:44:17,143 epoch 5 - iter 712/1786 - loss 0.04112747 - time (sec): 35.97 - samples/sec: 2818.48 - lr: 0.000019 - momentum: 0.000000
2023-10-17 13:44:25,980 epoch 5 - iter 890/1786 - loss 0.04124960 - time (sec): 44.81 - samples/sec: 2781.91 - lr: 0.000018 - momentum: 0.000000
2023-10-17 13:44:34,928 epoch 5 - iter 1068/1786 - loss 0.04175525 - time (sec): 53.76 - samples/sec: 2789.27 - lr: 0.000018 - momentum: 0.000000
2023-10-17 13:44:43,746 epoch 5 - iter 1246/1786 - loss 0.04113562 - time (sec): 62.57 - samples/sec: 2782.73 - lr: 0.000018 - momentum: 0.000000
2023-10-17 13:44:52,508 epoch 5 - iter 1424/1786 - loss 0.04207320 - time (sec): 71.34 - samples/sec: 2788.00 - lr: 0.000017 - momentum: 0.000000
2023-10-17 13:45:01,198 epoch 5 - iter 1602/1786 - loss 0.04173442 - time (sec): 80.03 - samples/sec: 2765.43 - lr: 0.000017 - momentum: 0.000000
2023-10-17 13:45:10,322 epoch 5 - iter 1780/1786 - loss 0.04286181 - time (sec): 89.15 - samples/sec: 2780.12 - lr: 0.000017 - momentum: 0.000000
2023-10-17 13:45:10,633 ----------------------------------------------------------------------------------------------------
2023-10-17 13:45:10,633 EPOCH 5 done: loss 0.0427 - lr: 0.000017
2023-10-17 13:45:15,433 DEV : loss 0.1587299257516861 - f1-score (micro avg)  0.8108
2023-10-17 13:45:15,450 ----------------------------------------------------------------------------------------------------
2023-10-17 13:45:24,076 epoch 6 - iter 178/1786 - loss 0.03579067 - time (sec): 8.62 - samples/sec: 2905.87 - lr: 0.000016 - momentum: 0.000000
2023-10-17 13:45:33,127 epoch 6 - iter 356/1786 - loss 0.02738996 - time (sec): 17.68 - samples/sec: 2901.45 - lr: 0.000016 - momentum: 0.000000
2023-10-17 13:45:42,026 epoch 6 - iter 534/1786 - loss 0.02902765 - time (sec): 26.57 - samples/sec: 2826.40 - lr: 0.000016 - momentum: 0.000000
2023-10-17 13:45:51,006 epoch 6 - iter 712/1786 - loss 0.02908652 - time (sec): 35.56 - samples/sec: 2805.29 - lr: 0.000015 - momentum: 0.000000
2023-10-17 13:45:59,947 epoch 6 - iter 890/1786 - loss 0.02990502 - time (sec): 44.50 - samples/sec: 2780.47 - lr: 0.000015 - momentum: 0.000000
2023-10-17 13:46:08,899 epoch 6 - iter 1068/1786 - loss 0.02956431 - time (sec): 53.45 - samples/sec: 2767.16 - lr: 0.000015 - momentum: 0.000000
2023-10-17 13:46:17,597 epoch 6 - iter 1246/1786 - loss 0.03095708 - time (sec): 62.15 - samples/sec: 2779.68 - lr: 0.000014 - momentum: 0.000000
2023-10-17 13:46:26,112 epoch 6 - iter 1424/1786 - loss 0.03171714 - time (sec): 70.66 - samples/sec: 2810.15 - lr: 0.000014 - momentum: 0.000000
2023-10-17 13:46:34,499 epoch 6 - iter 1602/1786 - loss 0.03200171 - time (sec): 79.05 - samples/sec: 2818.22 - lr: 0.000014 - momentum: 0.000000
2023-10-17 13:46:42,973 epoch 6 - iter 1780/1786 - loss 0.03147805 - time (sec): 87.52 - samples/sec: 2831.86 - lr: 0.000013 - momentum: 0.000000
2023-10-17 13:46:43,268 ----------------------------------------------------------------------------------------------------
2023-10-17 13:46:43,269 EPOCH 6 done: loss 0.0315 - lr: 0.000013
2023-10-17 13:46:47,415 DEV : loss 0.1810143142938614 - f1-score (micro avg)  0.8208
2023-10-17 13:46:47,432 saving best model
2023-10-17 13:46:47,883 ----------------------------------------------------------------------------------------------------
2023-10-17 13:46:57,010 epoch 7 - iter 178/1786 - loss 0.02252211 - time (sec): 9.12 - samples/sec: 2815.32 - lr: 0.000013 - momentum: 0.000000
2023-10-17 13:47:06,046 epoch 7 - iter 356/1786 - loss 0.02178416 - time (sec): 18.16 - samples/sec: 2798.52 - lr: 0.000013 - momentum: 0.000000
2023-10-17 13:47:14,788 epoch 7 - iter 534/1786 - loss 0.02234988 - time (sec): 26.90 - samples/sec: 2783.93 - lr: 0.000012 - momentum: 0.000000
2023-10-17 13:47:23,746 epoch 7 - iter 712/1786 - loss 0.02071319 - time (sec): 35.86 - samples/sec: 2801.18 - lr: 0.000012 - momentum: 0.000000
2023-10-17 13:47:32,861 epoch 7 - iter 890/1786 - loss 0.02322136 - time (sec): 44.97 - samples/sec: 2795.86 - lr: 0.000012 - momentum: 0.000000
2023-10-17 13:47:41,641 epoch 7 - iter 1068/1786 - loss 0.02464583 - time (sec): 53.75 - samples/sec: 2781.24 - lr: 0.000011 - momentum: 0.000000
2023-10-17 13:47:50,621 epoch 7 - iter 1246/1786 - loss 0.02371729 - time (sec): 62.73 - samples/sec: 2796.27 - lr: 0.000011 - momentum: 0.000000
2023-10-17 13:47:59,715 epoch 7 - iter 1424/1786 - loss 0.02337745 - time (sec): 71.83 - samples/sec: 2785.17 - lr: 0.000011 - momentum: 0.000000
2023-10-17 13:48:08,615 epoch 7 - iter 1602/1786 - loss 0.02344878 - time (sec): 80.73 - samples/sec: 2757.92 - lr: 0.000010 - momentum: 0.000000
2023-10-17 13:48:17,579 epoch 7 - iter 1780/1786 - loss 0.02410208 - time (sec): 89.69 - samples/sec: 2758.99 - lr: 0.000010 - momentum: 0.000000
2023-10-17 13:48:17,888 ----------------------------------------------------------------------------------------------------
2023-10-17 13:48:17,889 EPOCH 7 done: loss 0.0240 - lr: 0.000010
2023-10-17 13:48:22,557 DEV : loss 0.1769118458032608 - f1-score (micro avg)  0.8345
2023-10-17 13:48:22,574 saving best model
2023-10-17 13:48:23,039 ----------------------------------------------------------------------------------------------------
2023-10-17 13:48:31,997 epoch 8 - iter 178/1786 - loss 0.01978668 - time (sec): 8.95 - samples/sec: 2710.28 - lr: 0.000010 - momentum: 0.000000
2023-10-17 13:48:40,923 epoch 8 - iter 356/1786 - loss 0.01795202 - time (sec): 17.88 - samples/sec: 2730.53 - lr: 0.000009 - momentum: 0.000000
2023-10-17 13:48:49,838 epoch 8 - iter 534/1786 - loss 0.01682048 - time (sec): 26.80 - samples/sec: 2732.43 - lr: 0.000009 - momentum: 0.000000
2023-10-17 13:48:58,809 epoch 8 - iter 712/1786 - loss 0.01706732 - time (sec): 35.77 - samples/sec: 2726.67 - lr: 0.000009 - momentum: 0.000000
2023-10-17 13:49:07,606 epoch 8 - iter 890/1786 - loss 0.01709347 - time (sec): 44.56 - samples/sec: 2745.57 - lr: 0.000008 - momentum: 0.000000
2023-10-17 13:49:16,512 epoch 8 - iter 1068/1786 - loss 0.01748776 - time (sec): 53.47 - samples/sec: 2735.99 - lr: 0.000008 - momentum: 0.000000
2023-10-17 13:49:25,400 epoch 8 - iter 1246/1786 - loss 0.01730659 - time (sec): 62.36 - samples/sec: 2751.55 - lr: 0.000008 - momentum: 0.000000
2023-10-17 13:49:34,658 epoch 8 - iter 1424/1786 - loss 0.01760562 - time (sec): 71.62 - samples/sec: 2770.82 - lr: 0.000007 - momentum: 0.000000
2023-10-17 13:49:43,429 epoch 8 - iter 1602/1786 - loss 0.01761039 - time (sec): 80.39 - samples/sec: 2768.58 - lr: 0.000007 - momentum: 0.000000
2023-10-17 13:49:52,395 epoch 8 - iter 1780/1786 - loss 0.01754475 - time (sec): 89.35 - samples/sec: 2776.92 - lr: 0.000007 - momentum: 0.000000
2023-10-17 13:49:52,677 ----------------------------------------------------------------------------------------------------
2023-10-17 13:49:52,677 EPOCH 8 done: loss 0.0177 - lr: 0.000007
2023-10-17 13:49:56,898 DEV : loss 0.196747824549675 - f1-score (micro avg)  0.8165
2023-10-17 13:49:56,915 ----------------------------------------------------------------------------------------------------
2023-10-17 13:50:06,356 epoch 9 - iter 178/1786 - loss 0.01085864 - time (sec): 9.44 - samples/sec: 2704.82 - lr: 0.000006 - momentum: 0.000000
2023-10-17 13:50:15,176 epoch 9 - iter 356/1786 - loss 0.01281656 - time (sec): 18.26 - samples/sec: 2792.68 - lr: 0.000006 - momentum: 0.000000
2023-10-17 13:50:24,105 epoch 9 - iter 534/1786 - loss 0.01358075 - time (sec): 27.19 - samples/sec: 2767.01 - lr: 0.000006 - momentum: 0.000000
2023-10-17 13:50:33,067 epoch 9 - iter 712/1786 - loss 0.01366083 - time (sec): 36.15 - samples/sec: 2736.04 - lr: 0.000005 - momentum: 0.000000
2023-10-17 13:50:41,994 epoch 9 - iter 890/1786 - loss 0.01227376 - time (sec): 45.08 - samples/sec: 2741.15 - lr: 0.000005 - momentum: 0.000000
2023-10-17 13:50:50,756 epoch 9 - iter 1068/1786 - loss 0.01291585 - time (sec): 53.84 - samples/sec: 2748.82 - lr: 0.000005 - momentum: 0.000000
2023-10-17 13:51:00,405 epoch 9 - iter 1246/1786 - loss 0.01335982 - time (sec): 63.49 - samples/sec: 2696.12 - lr: 0.000004 - momentum: 0.000000
2023-10-17 13:51:09,305 epoch 9 - iter 1424/1786 - loss 0.01331987 - time (sec): 72.39 - samples/sec: 2709.60 - lr: 0.000004 - momentum: 0.000000
2023-10-17 13:51:18,256 epoch 9 - iter 1602/1786 - loss 0.01337922 - time (sec): 81.34 - samples/sec: 2723.43 - lr: 0.000004 - momentum: 0.000000
2023-10-17 13:51:27,315 epoch 9 - iter 1780/1786 - loss 0.01258706 - time (sec): 90.40 - samples/sec: 2745.83 - lr: 0.000003 - momentum: 0.000000
2023-10-17 13:51:27,587 ----------------------------------------------------------------------------------------------------
2023-10-17 13:51:27,587 EPOCH 9 done: loss 0.0126 - lr: 0.000003
2023-10-17 13:51:31,855 DEV : loss 0.20930074155330658 - f1-score (micro avg)  0.8232
2023-10-17 13:51:31,872 ----------------------------------------------------------------------------------------------------
2023-10-17 13:51:40,887 epoch 10 - iter 178/1786 - loss 0.00609129 - time (sec): 9.01 - samples/sec: 2824.04 - lr: 0.000003 - momentum: 0.000000
2023-10-17 13:51:50,023 epoch 10 - iter 356/1786 - loss 0.00715017 - time (sec): 18.15 - samples/sec: 2816.42 - lr: 0.000003 - momentum: 0.000000
2023-10-17 13:51:58,978 epoch 10 - iter 534/1786 - loss 0.00851154 - time (sec): 27.10 - samples/sec: 2809.81 - lr: 0.000002 - momentum: 0.000000
2023-10-17 13:52:07,808 epoch 10 - iter 712/1786 - loss 0.00825487 - time (sec): 35.93 - samples/sec: 2767.77 - lr: 0.000002 - momentum: 0.000000
2023-10-17 13:52:17,310 epoch 10 - iter 890/1786 - loss 0.00855273 - time (sec): 45.44 - samples/sec: 2755.18 - lr: 0.000002 - momentum: 0.000000
2023-10-17 13:52:26,532 epoch 10 - iter 1068/1786 - loss 0.00838603 - time (sec): 54.66 - samples/sec: 2739.98 - lr: 0.000001 - momentum: 0.000000
2023-10-17 13:52:35,578 epoch 10 - iter 1246/1786 - loss 0.00840932 - time (sec): 63.70 - samples/sec: 2727.41 - lr: 0.000001 - momentum: 0.000000
2023-10-17 13:52:44,438 epoch 10 - iter 1424/1786 - loss 0.00809426 - time (sec): 72.56 - samples/sec: 2745.31 - lr: 0.000001 - momentum: 0.000000
2023-10-17 13:52:53,121 epoch 10 - iter 1602/1786 - loss 0.00816111 - time (sec): 81.25 - samples/sec: 2749.63 - lr: 0.000000 - momentum: 0.000000
2023-10-17 13:53:01,991 epoch 10 - iter 1780/1786 - loss 0.00858460 - time (sec): 90.12 - samples/sec: 2750.90 - lr: 0.000000 - momentum: 0.000000
2023-10-17 13:53:02,263 ----------------------------------------------------------------------------------------------------
2023-10-17 13:53:02,263 EPOCH 10 done: loss 0.0086 - lr: 0.000000
2023-10-17 13:53:06,932 DEV : loss 0.20266954600811005 - f1-score (micro avg)  0.8289
2023-10-17 13:53:07,291 ----------------------------------------------------------------------------------------------------
2023-10-17 13:53:07,292 Loading model from best epoch ...
2023-10-17 13:53:08,632 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-17 13:53:18,265 
Results:
- F-score (micro) 0.705
- F-score (macro) 0.6364
- Accuracy 0.5623

By class:
              precision    recall  f1-score   support

         LOC     0.7285    0.6959    0.7118      1095
         PER     0.7944    0.7787    0.7864      1012
         ORG     0.4503    0.5714    0.5037       357
   HumanProd     0.4237    0.7576    0.5435        33

   micro avg     0.6976    0.7125    0.7050      2497
   macro avg     0.5992    0.7009    0.6364      2497
weighted avg     0.7114    0.7125    0.7101      2497

2023-10-17 13:53:18,265 ----------------------------------------------------------------------------------------------------
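
The averaged rows in the final test report can be cross-checked from the per-class rows. The following is a minimal sketch, not part of the training run: it assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean of per-class F1, weighted average as the support-weighted mean). The names `per_class` and `f1` are illustrative and do not come from Flair.

```python
# Per-class test scores copied from the "By class" table in the log:
# (precision, recall, f1-score, support)
per_class = {
    "LOC": (0.7285, 0.6959, 0.7118, 1095),
    "PER": (0.7944, 0.7787, 0.7864, 1012),
    "ORG": (0.4503, 0.5714, 0.5037, 357),
    "HumanProd": (0.4237, 0.7576, 0.5435, 33),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Micro-averaged F1, recomputed from the logged micro precision and recall.
micro_f1 = f1(0.6976, 0.7125)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(scores[2] for scores in per_class.values()) / len(per_class)

# Weighted F1: per-class F1 weighted by the number of gold entities (support).
total_support = sum(scores[3] for scores in per_class.values())
weighted_f1 = (
    sum(scores[2] * scores[3] for scores in per_class.values()) / total_support
)

print(f"micro F1    {micro_f1:.4f}")
print(f"macro F1    {macro_f1:.4f}")
print(f"weighted F1 {weighted_f1:.4f}")
print(f"support     {total_support}")
```

Because the inputs are already rounded to four decimals, the recomputed values can differ from the logged ones (0.7050, 0.6364, 0.7101) in the last digit.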