alanakbik committed on
Commit c0da81a · 1 Parent(s): 745de92

initial commit

Files changed (4)
  1. README.md +173 -0
  2. loss.tsv +21 -0
  3. pytorch_model.bin +3 -0
  4. training.log +926 -0
README.md ADDED
@@ -0,0 +1,173 @@
+ ---
+ tags:
+ - flair
+ - token-classification
+ - sequence-tagger-model
+ language: en
+ datasets:
+ - ontonotes
+ inference: false
+ ---
+
+ ## English NER in Flair (Ontonotes large model)
+
+ This is the large 18-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).
+
+ F1-Score: **90.93** (Ontonotes)
+
+ Predicts 18 tags:
+
+ | **tag** | **meaning** |
+ |---------------------------------|-----------|
+ | CARDINAL | cardinal value |
+ | DATE | date value |
+ | EVENT | event name |
+ | FAC | building name |
+ | GPE | geo-political entity |
+ | LANGUAGE | language name |
+ | LAW | law name |
+ | LOC | location name |
+ | MONEY | monetary value |
+ | NORP | affiliation |
+ | ORDINAL | ordinal value |
+ | ORG | organization name |
+ | PERCENT | percent value |
+ | PERSON | person name |
+ | PRODUCT | product name |
+ | QUANTITY | quantity value |
+ | TIME | time value |
+ | WORK_OF_ART | name of work of art |
+
+ Based on fine-tuned XLM-R (`xlm-roberta-large`) embeddings with document-level context ([FLERT](https://arxiv.org/abs/2011.06993)), without an LSTM or CRF layer.
+
+ ---
+
+ ### Demo: How to use in Flair
+
+ Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)
+
+ ```python
+ from flair.data import Sentence
+ from flair.models import SequenceTagger
+
+ # load tagger
+ tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
+
+ # make example sentence
+ sentence = Sentence("On September 1st George Washington won 1 dollar while watching Game of Thrones.")
+
+ # predict NER tags
+ tagger.predict(sentence)
+
+ # print sentence
+ print(sentence)
+
+ # print predicted NER spans
+ print('The following NER tags are found:')
+ # iterate over entities and print
+ for entity in sentence.get_spans('ner'):
+     print(entity)
+
+ ```
+
+ This yields the following output:
+ ```
+ Span [2,3]: "September 1st" [− Labels: DATE (0.8824)]
+ Span [4,5]: "George Washington" [− Labels: PERSON (0.9604)]
+ Span [7,8]: "1 dollar" [− Labels: MONEY (0.9837)]
+ ```
+
+ So, the entities "*September 1st*" (labeled as a **date**), "*George Washington*" (labeled as a **person**) and "*1 dollar*" (labeled as **money**) are found in the sentence "*On September 1st George Washington won 1 dollar while watching Game of Thrones*".
+
+
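The span lines in the sample output follow a regular pattern (token indices, quoted text, label, score), so they can be turned back into structured records if only the printed text is available. The following is a minimal sketch using only the standard library; the regular expression and the helper name `parse_span_line` are assumptions made for illustration and are not part of Flair, whose printed format may differ between versions.

```python
import re

# Pattern for one printed span line, modelled on the sample output above.
# "\S+" skips the dash-like character that precedes "Labels:".
SPAN_LINE = re.compile(
    r'Span \[(?P<idx>[\d,]+)\]: "(?P<text>.+?)"\s+'
    r'\[\S+ Labels: (?P<label>\w+) \((?P<score>[\d.]+)\)\]'
)


def parse_span_line(line):
    """Return (token_indices, text, label, score), or None if the line does not match."""
    match = SPAN_LINE.search(line)
    if match is None:
        return None
    indices = [int(i) for i in match.group("idx").split(",")]
    return indices, match.group("text"), match.group("label"), float(match.group("score"))


# Lines copied from the sample output above.
printed = [
    'Span [2,3]: "September 1st" [− Labels: DATE (0.8824)]',
    'Span [4,5]: "George Washington" [− Labels: PERSON (0.9604)]',
    'Span [7,8]: "1 dollar" [− Labels: MONEY (0.9837)]',
]

parsed = [parse_span_line(line) for line in printed]
for indices, text, label, score in parsed:
    print(label, text, indices, score)
```

When the `Sentence` object itself is available, reading the spans from it directly is preferable to parsing printed text.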
+ ---
+
+ ### Training: Script to train this model
+
+ The following Flair script was used to train this model:
+
+ ```python
+ import torch
+
+ from flair.data import Corpus
+ from flair.datasets import ColumnCorpus
+ from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
+
+ # 1. load the corpus (Ontonotes does not ship with Flair; you need to download it and reformat it into a column format yourself)
+ corpus: Corpus = ColumnCorpus(
+     "resources/tasks/onto-ner",
+     column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
+     tag_to_bioes="ner",
+ )
+
+ # 2. what tag do we want to predict?
+ tag_type = 'ner'
+
+ # 3. make the tag dictionary from the corpus
+ tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
+
+ # 4. initialize fine-tuneable transformer embeddings WITH document context
+ from flair.embeddings import TransformerWordEmbeddings
+
+ embeddings = TransformerWordEmbeddings(
+     model='xlm-roberta-large',
+     layers="-1",
+     subtoken_pooling="first",
+     fine_tune=True,
+     use_context=True,
+ )
+
+ # 5. initialize bare-bones sequence tagger (no CRF, no RNN, no reprojection)
+ from flair.models import SequenceTagger
+
+ tagger = SequenceTagger(
+     hidden_size=256,
+     embeddings=embeddings,
+     tag_dictionary=tag_dictionary,
+     tag_type='ner',
+     use_crf=False,
+     use_rnn=False,
+     reproject_embeddings=False,
+ )
+
+ # 6. initialize trainer with AdamW optimizer
+ from flair.trainers import ModelTrainer
+
+ trainer = ModelTrainer(tagger, corpus, optimizer=torch.optim.AdamW)
+
+ # 7. run training with XLM parameters (20 epochs, small LR)
+ from torch.optim.lr_scheduler import OneCycleLR
+
+ trainer.train('resources/taggers/ner-english-ontonotes-large',
+               learning_rate=5.0e-6,
+               mini_batch_size=4,
+               mini_batch_chunk_size=1,
+               max_epochs=20,
+               scheduler=OneCycleLR,
+               embeddings_storage_mode='none',
+               weight_decay=0.,
+               )
+ ```
+
+
+
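The `tag_to_bioes="ner"` argument in the script above indicates that the NER column is converted to the BIOES scheme, in which single-token entities get an `S-` prefix and the last token of a longer entity gets an `E-` prefix. The sketch below illustrates that scheme in plain Python. It is not Flair's implementation; the function name `bio_to_bioes` and the hand-labelled example tags are assumptions made for illustration.

```python
def bio_to_bioes(tags):
    """Convert a list of BIO tags (e.g. 'B-DATE', 'I-DATE', 'O') to BIOES."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is an I- tag of the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            bioes.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            bioes.append(f"I-{label}" if continues else f"E-{label}")
    return bioes


# Hypothetical BIO labels for: On September 1st George won 1 dollar
bio_tags = ["O", "B-DATE", "I-DATE", "B-PERSON", "O", "B-MONEY", "I-MONEY"]
bioes_tags = bio_to_bioes(bio_tags)
print(bioes_tags)
```

The finer-grained prefixes give the tagger explicit signals for entity boundaries, at the cost of a larger label set.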
+ ---
+
+ ### Cite
+
+ Please cite the following paper when using this model.
+
+ ```
+ @misc{schweter2020flert,
+     title={FLERT: Document-Level Features for Named Entity Recognition},
+     author={Stefan Schweter and Alan Akbik},
+     year={2020},
+     eprint={2011.06993},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ ---
+
+ ### Issues?
+
+ The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).
loss.tsv ADDED
@@ -0,0 +1,21 @@
+ EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS TEST_LOSS TEST_PRECISION TEST_RECALL TEST_F1
+ 1 14:37:34 4 0.0000 0.3262549467258948 0.12760598957538605 0.8458 0.8890 0.8669
+ 2 17:12:47 4 0.0000 0.2804282291409521 0.08660610020160675 0.8796 0.9115 0.8953
+ 3 19:48:06 4 0.0000 0.2564523208085152 0.08892939984798431 0.8919 0.9114 0.9015
+ 4 22:23:24 4 0.0000 0.24018183809031893 0.09627319127321243 0.9060 0.9091 0.9076
+ 5 00:58:50 4 0.0000 0.2262700294968986 0.09906419366598129 0.8960 0.9134 0.9046
+ 6 03:34:15 4 0.0000 0.21600639542005087 0.10325756669044495 0.9001 0.9153 0.9076
+ 7 06:09:34 4 0.0000 0.2105412023502746 0.11405058950185776 0.8979 0.9102 0.9040
+ 8 08:45:06 4 0.0000 0.19886444720165738 0.12001997232437134 0.8965 0.9161 0.9062
+ 9 11:20:26 4 0.0000 0.19246368803602384 0.12788806855678558 0.9026 0.9126 0.9075
+ 10 13:55:29 4 0.0000 0.18462480832654865 0.14910565316677094 0.9055 0.9061 0.9058
+ 11 16:30:17 4 0.0000 0.17900733027625026 0.15147249400615692 0.9002 0.9123 0.9062
+ 12 19:05:31 4 0.0000 0.17210372630038878 0.147916778922081 0.9037 0.9134 0.9085
+ 13 21:40:34 4 0.0000 0.17393692914703035 0.16395367681980133 0.9027 0.9127 0.9076
+ 14 00:15:21 4 0.0000 0.17225090862075376 0.16743017733097076 0.9061 0.9119 0.9090
+ 15 02:50:08 4 0.0000 0.16488957710656618 0.17295649647712708 0.9055 0.9146 0.9101
+ 16 05:24:59 4 0.0000 0.16308925027911492 0.1732577085494995 0.9065 0.9134 0.9099
+ 17 07:59:36 4 0.0000 0.1624136931469783 0.1792287975549698 0.9060 0.9137 0.9098
+ 18 10:34:48 4 0.0000 0.16103925064710428 0.17890706658363342 0.9051 0.9139 0.9095
+ 19 13:09:51 4 0.0000 0.16130805570335532 0.1799449324607849 0.9053 0.9134 0.9093
+ 20 15:44:48 4 0.0000 0.1607274828808972 0.17999354004859924 0.9055 0.9132 0.9093
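A few observations can be read off this table with a short script: the final epoch's TEST_F1 of 0.9093 matches the 90.93 reported in the README, the highest TEST_F1 in the full table occurs at epoch 15, and TEST_LOSS is lowest at epoch 2 and rises afterwards while TRAIN_LOSS keeps falling. The sketch below embeds only four of the rows for brevity; the helper `read_rows` is a hypothetical name. The LEARNING_RATE column presumably reads 0.0000 because a rate on the order of the script's `5.0e-6` is below four-decimal precision, which the last line illustrates.

```python
# Four rows copied from loss.tsv above; columns are whitespace-separated here.
LOSS_TSV = """\
EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS TEST_LOSS TEST_PRECISION TEST_RECALL TEST_F1
1 14:37:34 4 0.0000 0.3262549467258948 0.12760598957538605 0.8458 0.8890 0.8669
2 17:12:47 4 0.0000 0.2804282291409521 0.08660610020160675 0.8796 0.9115 0.8953
15 02:50:08 4 0.0000 0.16488957710656618 0.17295649647712708 0.9055 0.9146 0.9101
20 15:44:48 4 0.0000 0.1607274828808972 0.17999354004859924 0.9055 0.9132 0.9093
"""


def read_rows(text):
    """Parse the whitespace-separated table into a list of dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = read_rows(LOSS_TSV)
best_f1 = max(rows, key=lambda r: float(r["TEST_F1"]))
lowest_test_loss = min(rows, key=lambda r: float(r["TEST_LOSS"]))
final = rows[-1]

print("best TEST_F1 at epoch", best_f1["EPOCH"], best_f1["TEST_F1"])
print("lowest TEST_LOSS at epoch", lowest_test_loss["EPOCH"])
print("final TEST_F1", final["TEST_F1"])

# A learning rate of 5.0e-6 formatted to four decimals shows up as zero.
formatted_lr = f"{5.0e-6:.4f}"
print(formatted_lr)
```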
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93ccd06d32bae9fde24d34cd86d81d0aa687c42dd531a0e7cf4b8d81c6eefc71
+ size 2240097289
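The three lines added for `pytorch_model.bin` are a Git LFS pointer rather than the model weights themselves: they record the spec version, a SHA-256 object id and the size in bytes of the real file. A minimal sketch of reading such a pointer follows; the helper name `parse_lfs_pointer` is hypothetical.

```python
# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:93ccd06d32bae9fde24d34cd86d81d0aa687c42dd531a0e7cf4b8d81c6eefc71
size 2240097289
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
size_gb = info["size_bytes"] / 1e9  # decimal gigabytes
print(info["hash_algorithm"], f"{size_gb:.2f} GB")
```

The size field shows that the stored checkpoint is roughly 2.24 GB.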
training.log ADDED
@@ -0,0 +1,926 @@
1
+ 2021-02-20 12:03:00,991 ----------------------------------------------------------------------------------------------------
2
+ 2021-02-20 12:03:00,994 Model: "SequenceTagger(
3
+ (embeddings): TransformerWordEmbeddings(
4
+ (model): XLMRobertaModel(
5
+ (embeddings): RobertaEmbeddings(
6
+ (word_embeddings): Embedding(250002, 1024, padding_idx=1)
7
+ (position_embeddings): Embedding(514, 1024, padding_idx=1)
8
+ (token_type_embeddings): Embedding(1, 1024)
9
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
10
+ (dropout): Dropout(p=0.1, inplace=False)
11
+ )
12
+ (encoder): RobertaEncoder(
13
+ (layer): ModuleList(
14
+ (0): RobertaLayer(
15
+ (attention): RobertaAttention(
16
+ (self): RobertaSelfAttention(
17
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
18
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
19
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
20
+ (dropout): Dropout(p=0.1, inplace=False)
21
+ )
22
+ (output): RobertaSelfOutput(
23
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
24
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
25
+ (dropout): Dropout(p=0.1, inplace=False)
26
+ )
27
+ )
28
+ (intermediate): RobertaIntermediate(
29
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
30
+ )
31
+ (output): RobertaOutput(
32
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
33
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
34
+ (dropout): Dropout(p=0.1, inplace=False)
35
+ )
36
+ )
37
+ (1): RobertaLayer(
38
+ (attention): RobertaAttention(
39
+ (self): RobertaSelfAttention(
40
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
41
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
42
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
43
+ (dropout): Dropout(p=0.1, inplace=False)
44
+ )
45
+ (output): RobertaSelfOutput(
46
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
47
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
48
+ (dropout): Dropout(p=0.1, inplace=False)
49
+ )
50
+ )
51
+ (intermediate): RobertaIntermediate(
52
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
53
+ )
54
+ (output): RobertaOutput(
55
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
56
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
57
+ (dropout): Dropout(p=0.1, inplace=False)
58
+ )
59
+ )
60
+ (2): RobertaLayer(
61
+ (attention): RobertaAttention(
62
+ (self): RobertaSelfAttention(
63
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
64
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
65
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
66
+ (dropout): Dropout(p=0.1, inplace=False)
67
+ )
68
+ (output): RobertaSelfOutput(
69
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
70
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
71
+ (dropout): Dropout(p=0.1, inplace=False)
72
+ )
73
+ )
74
+ (intermediate): RobertaIntermediate(
75
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
76
+ )
77
+ (output): RobertaOutput(
78
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
79
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
80
+ (dropout): Dropout(p=0.1, inplace=False)
81
+ )
82
+ )
83
+ (3): RobertaLayer(
84
+ (attention): RobertaAttention(
85
+ (self): RobertaSelfAttention(
86
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
87
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
88
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
89
+ (dropout): Dropout(p=0.1, inplace=False)
90
+ )
91
+ (output): RobertaSelfOutput(
92
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
93
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
94
+ (dropout): Dropout(p=0.1, inplace=False)
95
+ )
96
+ )
97
+ (intermediate): RobertaIntermediate(
98
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
99
+ )
100
+ (output): RobertaOutput(
101
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
102
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
103
+ (dropout): Dropout(p=0.1, inplace=False)
104
+ )
105
+ )
106
+ (4): RobertaLayer(
107
+ (attention): RobertaAttention(
108
+ (self): RobertaSelfAttention(
109
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
110
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
111
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
112
+ (dropout): Dropout(p=0.1, inplace=False)
113
+ )
114
+ (output): RobertaSelfOutput(
115
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
116
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
117
+ (dropout): Dropout(p=0.1, inplace=False)
118
+ )
119
+ )
120
+ (intermediate): RobertaIntermediate(
121
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
122
+ )
123
+ (output): RobertaOutput(
124
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
125
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
126
+ (dropout): Dropout(p=0.1, inplace=False)
127
+ )
128
+ )
129
+ (5): RobertaLayer(
130
+ (attention): RobertaAttention(
131
+ (self): RobertaSelfAttention(
132
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
133
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
134
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
135
+ (dropout): Dropout(p=0.1, inplace=False)
136
+ )
137
+ (output): RobertaSelfOutput(
138
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
139
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
140
+ (dropout): Dropout(p=0.1, inplace=False)
141
+ )
142
+ )
143
+ (intermediate): RobertaIntermediate(
144
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
145
+ )
146
+ (output): RobertaOutput(
147
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
148
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
149
+ (dropout): Dropout(p=0.1, inplace=False)
150
+ )
151
+ )
152
+ (6): RobertaLayer(
153
+ (attention): RobertaAttention(
154
+ (self): RobertaSelfAttention(
155
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
156
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
157
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
158
+ (dropout): Dropout(p=0.1, inplace=False)
159
+ )
160
+ (output): RobertaSelfOutput(
161
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
162
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
163
+ (dropout): Dropout(p=0.1, inplace=False)
164
+ )
165
+ )
166
+ (intermediate): RobertaIntermediate(
167
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
168
+ )
169
+ (output): RobertaOutput(
170
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
171
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
172
+ (dropout): Dropout(p=0.1, inplace=False)
173
+ )
174
+ )
175
+ (7): RobertaLayer(
176
+ (attention): RobertaAttention(
177
+ (self): RobertaSelfAttention(
178
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
179
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
180
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
181
+ (dropout): Dropout(p=0.1, inplace=False)
182
+ )
183
+ (output): RobertaSelfOutput(
184
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
185
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
186
+ (dropout): Dropout(p=0.1, inplace=False)
187
+ )
188
+ )
189
+ (intermediate): RobertaIntermediate(
190
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
191
+ )
192
+ (output): RobertaOutput(
193
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
194
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
195
+ (dropout): Dropout(p=0.1, inplace=False)
196
+ )
197
+ )
198
+ (8): RobertaLayer(
199
+ (attention): RobertaAttention(
200
+ (self): RobertaSelfAttention(
201
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
202
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
203
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
204
+ (dropout): Dropout(p=0.1, inplace=False)
205
+ )
206
+ (output): RobertaSelfOutput(
207
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
208
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
209
+ (dropout): Dropout(p=0.1, inplace=False)
210
+ )
211
+ )
212
+ (intermediate): RobertaIntermediate(
213
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
214
+ )
215
+ (output): RobertaOutput(
216
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
217
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
218
+ (dropout): Dropout(p=0.1, inplace=False)
219
+ )
220
+ )
221
+ (9): RobertaLayer(
222
+ (attention): RobertaAttention(
223
+ (self): RobertaSelfAttention(
224
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
225
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
226
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
227
+ (dropout): Dropout(p=0.1, inplace=False)
228
+ )
229
+ (output): RobertaSelfOutput(
230
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
231
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
232
+ (dropout): Dropout(p=0.1, inplace=False)
233
+ )
234
+ )
235
+ (intermediate): RobertaIntermediate(
236
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
237
+ )
238
+ (output): RobertaOutput(
239
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
240
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
241
+ (dropout): Dropout(p=0.1, inplace=False)
242
+ )
243
+ )
244
+ (10): RobertaLayer(
245
+ (attention): RobertaAttention(
246
+ (self): RobertaSelfAttention(
247
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
248
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
249
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
250
+ (dropout): Dropout(p=0.1, inplace=False)
251
+ )
252
+ (output): RobertaSelfOutput(
253
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
254
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
255
+ (dropout): Dropout(p=0.1, inplace=False)
256
+ )
257
+ )
258
+ (intermediate): RobertaIntermediate(
259
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
260
+ )
261
+ (output): RobertaOutput(
262
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
263
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
264
+ (dropout): Dropout(p=0.1, inplace=False)
265
+ )
266
+ )
267
+ (11): RobertaLayer(
268
+ (attention): RobertaAttention(
269
+ (self): RobertaSelfAttention(
270
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
271
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
272
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
273
+ (dropout): Dropout(p=0.1, inplace=False)
274
+ )
275
+ (output): RobertaSelfOutput(
276
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
277
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
278
+ (dropout): Dropout(p=0.1, inplace=False)
279
+ )
280
+ )
281
+ (intermediate): RobertaIntermediate(
282
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
283
+ )
284
+ (output): RobertaOutput(
285
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
286
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
287
+ (dropout): Dropout(p=0.1, inplace=False)
288
+ )
289
+ )
290
+ (12): RobertaLayer(
291
+ (attention): RobertaAttention(
292
+ (self): RobertaSelfAttention(
293
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
294
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
295
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
296
+ (dropout): Dropout(p=0.1, inplace=False)
297
+ )
298
+ (output): RobertaSelfOutput(
299
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
300
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
301
+ (dropout): Dropout(p=0.1, inplace=False)
302
+ )
303
+ )
304
+ (intermediate): RobertaIntermediate(
305
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
306
+ )
307
+ (output): RobertaOutput(
308
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
309
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
310
+ (dropout): Dropout(p=0.1, inplace=False)
311
+ )
312
+ )
313
+ (13): RobertaLayer(
314
+ (attention): RobertaAttention(
315
+ (self): RobertaSelfAttention(
316
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
317
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
318
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
319
+ (dropout): Dropout(p=0.1, inplace=False)
320
+ )
321
+ (output): RobertaSelfOutput(
322
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
323
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
324
+ (dropout): Dropout(p=0.1, inplace=False)
325
+ )
326
+ )
327
+ (intermediate): RobertaIntermediate(
328
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
329
+ )
330
+ (output): RobertaOutput(
331
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
332
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
333
+ (dropout): Dropout(p=0.1, inplace=False)
334
+ )
335
+ )
336
+ (14): RobertaLayer(
337
+ (attention): RobertaAttention(
338
+ (self): RobertaSelfAttention(
339
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
340
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
341
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
342
+ (dropout): Dropout(p=0.1, inplace=False)
343
+ )
344
+ (output): RobertaSelfOutput(
345
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
346
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
347
+ (dropout): Dropout(p=0.1, inplace=False)
348
+ )
349
+ )
350
+ (intermediate): RobertaIntermediate(
351
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
352
+ )
353
+ (output): RobertaOutput(
354
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
355
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
356
+ (dropout): Dropout(p=0.1, inplace=False)
357
+ )
358
+ )
359
+ (15): RobertaLayer(
360
+ (attention): RobertaAttention(
361
+ (self): RobertaSelfAttention(
362
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
363
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
364
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
365
+ (dropout): Dropout(p=0.1, inplace=False)
366
+ )
367
+ (output): RobertaSelfOutput(
368
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
369
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
370
+ (dropout): Dropout(p=0.1, inplace=False)
371
+ )
372
+ )
373
+ (intermediate): RobertaIntermediate(
374
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
375
+ )
376
+ (output): RobertaOutput(
377
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
378
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
379
+ (dropout): Dropout(p=0.1, inplace=False)
380
+ )
381
+ )
382
+ (16): RobertaLayer(
383
+ (attention): RobertaAttention(
384
+ (self): RobertaSelfAttention(
385
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
386
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
387
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
388
+ (dropout): Dropout(p=0.1, inplace=False)
389
+ )
390
+ (output): RobertaSelfOutput(
391
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
392
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
393
+ (dropout): Dropout(p=0.1, inplace=False)
394
+ )
395
+ )
396
+ (intermediate): RobertaIntermediate(
397
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
398
+ )
399
+ (output): RobertaOutput(
400
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
401
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
402
+ (dropout): Dropout(p=0.1, inplace=False)
403
+ )
404
+ )
405
+ (17): RobertaLayer(
406
+ (attention): RobertaAttention(
407
+ (self): RobertaSelfAttention(
408
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
409
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
410
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
411
+ (dropout): Dropout(p=0.1, inplace=False)
412
+ )
413
+ (output): RobertaSelfOutput(
414
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
415
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
416
+ (dropout): Dropout(p=0.1, inplace=False)
417
+ )
418
+ )
419
+ (intermediate): RobertaIntermediate(
420
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
421
+ )
422
+ (output): RobertaOutput(
423
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
424
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
425
+ (dropout): Dropout(p=0.1, inplace=False)
426
+ )
427
+ )
428
+ (18): RobertaLayer(
429
+ (attention): RobertaAttention(
430
+ (self): RobertaSelfAttention(
431
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
432
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
433
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
434
+ (dropout): Dropout(p=0.1, inplace=False)
435
+ )
436
+ (output): RobertaSelfOutput(
437
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
438
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
439
+ (dropout): Dropout(p=0.1, inplace=False)
440
+ )
441
+ )
442
+ (intermediate): RobertaIntermediate(
443
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
444
+ )
445
+ (output): RobertaOutput(
446
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
447
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
448
+ (dropout): Dropout(p=0.1, inplace=False)
449
+ )
450
+ )
451
+ (19): RobertaLayer(
452
+ (attention): RobertaAttention(
453
+ (self): RobertaSelfAttention(
454
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
455
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
456
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
457
+ (dropout): Dropout(p=0.1, inplace=False)
458
+ )
459
+ (output): RobertaSelfOutput(
460
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
461
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
462
+ (dropout): Dropout(p=0.1, inplace=False)
463
+ )
464
+ )
465
+ (intermediate): RobertaIntermediate(
466
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
467
+ )
468
+ (output): RobertaOutput(
469
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
470
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
471
+ (dropout): Dropout(p=0.1, inplace=False)
472
+ )
473
+ )
474
+ (20): RobertaLayer(
475
+ (attention): RobertaAttention(
476
+ (self): RobertaSelfAttention(
477
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
478
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
479
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
480
+ (dropout): Dropout(p=0.1, inplace=False)
481
+ )
482
+ (output): RobertaSelfOutput(
483
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
484
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
485
+ (dropout): Dropout(p=0.1, inplace=False)
486
+ )
487
+ )
488
+ (intermediate): RobertaIntermediate(
489
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
490
+ )
491
+ (output): RobertaOutput(
492
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
493
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
494
+ (dropout): Dropout(p=0.1, inplace=False)
495
+ )
496
+ )
497
+ (21): RobertaLayer(
498
+ (attention): RobertaAttention(
499
+ (self): RobertaSelfAttention(
500
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
501
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
502
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
503
+ (dropout): Dropout(p=0.1, inplace=False)
504
+ )
505
+ (output): RobertaSelfOutput(
506
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
507
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
508
+ (dropout): Dropout(p=0.1, inplace=False)
509
+ )
510
+ )
511
+ (intermediate): RobertaIntermediate(
512
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
513
+ )
514
+ (output): RobertaOutput(
515
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
516
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
517
+ (dropout): Dropout(p=0.1, inplace=False)
518
+ )
519
+ )
520
+ (22): RobertaLayer(
521
+ (attention): RobertaAttention(
522
+ (self): RobertaSelfAttention(
523
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
524
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
525
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
526
+ (dropout): Dropout(p=0.1, inplace=False)
527
+ )
528
+ (output): RobertaSelfOutput(
529
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
530
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
531
+ (dropout): Dropout(p=0.1, inplace=False)
532
+ )
533
+ )
534
+ (intermediate): RobertaIntermediate(
535
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
536
+ )
537
+ (output): RobertaOutput(
538
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
539
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
540
+ (dropout): Dropout(p=0.1, inplace=False)
541
+ )
542
+ )
543
+ (23): RobertaLayer(
544
+ (attention): RobertaAttention(
545
+ (self): RobertaSelfAttention(
546
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
547
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
548
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
549
+ (dropout): Dropout(p=0.1, inplace=False)
550
+ )
551
+ (output): RobertaSelfOutput(
552
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
553
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
554
+ (dropout): Dropout(p=0.1, inplace=False)
555
+ )
556
+ )
557
+ (intermediate): RobertaIntermediate(
558
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
559
+ )
560
+ (output): RobertaOutput(
561
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
562
+ (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
563
+ (dropout): Dropout(p=0.1, inplace=False)
564
+ )
565
+ )
566
+ )
567
+ )
568
+ (pooler): RobertaPooler(
569
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
570
+ (activation): Tanh()
571
+ )
572
+ )
573
+ )
574
+ (word_dropout): WordDropout(p=0.05)
575
+ (locked_dropout): LockedDropout(p=0.5)
576
+ (linear): Linear(in_features=1024, out_features=76, bias=True)
577
+ (beta): 1.0
578
+ (weights): None
579
+ (weight_tensor) None
580
+ )"
581
+ 2021-02-20 12:03:00,995 ----------------------------------------------------------------------------------------------------
582
+ 2021-02-20 12:03:00,995 Corpus: "Corpus: 75187 train + 9603 dev + 9479 test sentences"
583
+ 2021-02-20 12:03:00,995 ----------------------------------------------------------------------------------------------------
584
+ 2021-02-20 12:03:00,995 Parameters:
585
+ 2021-02-20 12:03:00,995 - learning_rate: "5e-06"
586
+ 2021-02-20 12:03:00,995 - mini_batch_size: "4"
587
+ 2021-02-20 12:03:00,995 - patience: "3"
588
+ 2021-02-20 12:03:00,995 - anneal_factor: "0.5"
589
+ 2021-02-20 12:03:00,995 - max_epochs: "20"
590
+ 2021-02-20 12:03:00,995 - shuffle: "True"
591
+ 2021-02-20 12:03:00,995 - train_with_dev: "True"
592
+ 2021-02-20 12:03:00,996 - batch_growth_annealing: "False"
593
+ 2021-02-20 12:03:00,996 ----------------------------------------------------------------------------------------------------
594
+ 2021-02-20 12:03:00,996 Model training base path: "resources/contextdrop/d-flert-ontonotes-ft+dev-xlm-roberta-large-context+drop-64-True-42"
595
+ 2021-02-20 12:03:00,996 ----------------------------------------------------------------------------------------------------
596
+ 2021-02-20 12:03:00,996 Device: cuda:0
597
+ 2021-02-20 12:03:00,996 ----------------------------------------------------------------------------------------------------
598
+ 2021-02-20 12:03:00,996 Embeddings storage mode: none
599
+ 2021-02-20 12:03:01,005 ----------------------------------------------------------------------------------------------------
600
+ 2021-02-20 12:17:26,941 epoch 1 - iter 2119/21198 - loss 0.46498391 - samples/sec: 9.79 - lr: 0.000005
601
+ 2021-02-20 12:32:25,501 epoch 1 - iter 4238/21198 - loss 0.43484389 - samples/sec: 9.43 - lr: 0.000005
602
+ 2021-02-20 12:47:30,355 epoch 1 - iter 6357/21198 - loss 0.42857357 - samples/sec: 9.37 - lr: 0.000005
603
+ 2021-02-20 13:02:33,037 epoch 1 - iter 8476/21198 - loss 0.40114081 - samples/sec: 9.39 - lr: 0.000005
604
+ 2021-02-20 13:17:06,534 epoch 1 - iter 10595/21198 - loss 0.36551536 - samples/sec: 9.70 - lr: 0.000005
605
+ 2021-02-20 13:31:52,079 epoch 1 - iter 12714/21198 - loss 0.34481658 - samples/sec: 9.57 - lr: 0.000005
606
+ 2021-02-20 13:47:10,517 epoch 1 - iter 14833/21198 - loss 0.33967654 - samples/sec: 9.23 - lr: 0.000005
607
+ 2021-02-20 14:02:14,283 epoch 1 - iter 16952/21198 - loss 0.33393062 - samples/sec: 9.38 - lr: 0.000005
608
+ 2021-02-20 14:16:49,633 epoch 1 - iter 19071/21198 - loss 0.32924976 - samples/sec: 9.68 - lr: 0.000005
609
+ 2021-02-20 14:31:45,192 epoch 1 - iter 21190/21198 - loss 0.32628298 - samples/sec: 9.47 - lr: 0.000005
610
+ 2021-02-20 14:31:48,270 ----------------------------------------------------------------------------------------------------
611
+ 2021-02-20 14:31:48,271 EPOCH 1 done: loss 0.3263 - lr 0.0000050
612
+ 2021-02-20 14:37:34,463 TEST : loss 0.12760598957538605 - score 0.8669
613
+ 2021-02-20 14:37:34,546 BAD EPOCHS (no improvement): 4
614
+ 2021-02-20 14:37:34,556 ----------------------------------------------------------------------------------------------------
615
+ 2021-02-20 14:52:29,571 epoch 2 - iter 2119/21198 - loss 0.29859233 - samples/sec: 9.47 - lr: 0.000005
616
+ 2021-02-20 15:07:24,765 epoch 2 - iter 4238/21198 - loss 0.29870475 - samples/sec: 9.47 - lr: 0.000005
617
+ 2021-02-20 15:22:22,170 epoch 2 - iter 6357/21198 - loss 0.29288750 - samples/sec: 9.45 - lr: 0.000005
618
+ 2021-02-20 15:37:18,156 epoch 2 - iter 8476/21198 - loss 0.29279330 - samples/sec: 9.46 - lr: 0.000005
619
+ 2021-02-20 15:52:13,883 epoch 2 - iter 10595/21198 - loss 0.28788203 - samples/sec: 9.46 - lr: 0.000005
620
+ 2021-02-20 16:07:12,097 epoch 2 - iter 12714/21198 - loss 0.28927318 - samples/sec: 9.44 - lr: 0.000005
621
+ 2021-02-20 16:22:07,642 epoch 2 - iter 14833/21198 - loss 0.28514545 - samples/sec: 9.47 - lr: 0.000005
622
+ 2021-02-20 16:37:06,266 epoch 2 - iter 16952/21198 - loss 0.28311760 - samples/sec: 9.43 - lr: 0.000005
623
+ 2021-02-20 16:52:00,498 epoch 2 - iter 19071/21198 - loss 0.28229767 - samples/sec: 9.48 - lr: 0.000005
624
+ 2021-02-20 17:06:54,963 epoch 2 - iter 21190/21198 - loss 0.28044944 - samples/sec: 9.48 - lr: 0.000005
625
+ 2021-02-20 17:06:58,266 ----------------------------------------------------------------------------------------------------
626
+ 2021-02-20 17:06:58,266 EPOCH 2 done: loss 0.2804 - lr 0.0000049
627
+ 2021-02-20 17:12:47,188 TEST : loss 0.08660610020160675 - score 0.8953
628
+ 2021-02-20 17:12:47,273 BAD EPOCHS (no improvement): 4
629
+ 2021-02-20 17:12:47,275 ----------------------------------------------------------------------------------------------------
630
+ 2021-02-20 17:27:41,889 epoch 3 - iter 2119/21198 - loss 0.26828308 - samples/sec: 9.48 - lr: 0.000005
631
+ 2021-02-20 17:42:34,288 epoch 3 - iter 4238/21198 - loss 0.26184351 - samples/sec: 9.50 - lr: 0.000005
632
+ 2021-02-20 17:57:29,878 epoch 3 - iter 6357/21198 - loss 0.25940653 - samples/sec: 9.46 - lr: 0.000005
633
+ 2021-02-20 18:12:25,470 epoch 3 - iter 8476/21198 - loss 0.25828841 - samples/sec: 9.46 - lr: 0.000005
634
+ 2021-02-20 18:27:24,608 epoch 3 - iter 10595/21198 - loss 0.25551183 - samples/sec: 9.43 - lr: 0.000005
635
+ 2021-02-20 18:42:18,429 epoch 3 - iter 12714/21198 - loss 0.25481692 - samples/sec: 9.48 - lr: 0.000005
636
+ 2021-02-20 18:57:16,717 epoch 3 - iter 14833/21198 - loss 0.25506844 - samples/sec: 9.44 - lr: 0.000005
637
+ 2021-02-20 19:12:13,807 epoch 3 - iter 16952/21198 - loss 0.25407433 - samples/sec: 9.45 - lr: 0.000005
638
+ 2021-02-20 19:27:12,592 epoch 3 - iter 19071/21198 - loss 0.25575351 - samples/sec: 9.43 - lr: 0.000005
639
+ 2021-02-20 19:42:07,912 epoch 3 - iter 21190/21198 - loss 0.25645391 - samples/sec: 9.47 - lr: 0.000005
640
+ 2021-02-20 19:42:10,991 ----------------------------------------------------------------------------------------------------
641
+ 2021-02-20 19:42:10,991 EPOCH 3 done: loss 0.2565 - lr 0.0000047
642
+ 2021-02-20 19:48:05,928 TEST : loss 0.08892939984798431 - score 0.9015
643
+ 2021-02-20 19:48:06,017 BAD EPOCHS (no improvement): 4
644
+ 2021-02-20 19:48:06,022 ----------------------------------------------------------------------------------------------------
645
+ 2021-02-20 20:03:04,520 epoch 4 - iter 2119/21198 - loss 0.24164433 - samples/sec: 9.43 - lr: 0.000005
646
+ 2021-02-20 20:17:56,429 epoch 4 - iter 4238/21198 - loss 0.24019658 - samples/sec: 9.50 - lr: 0.000005
647
+ 2021-02-20 20:32:52,945 epoch 4 - iter 6357/21198 - loss 0.24219914 - samples/sec: 9.46 - lr: 0.000005
648
+ 2021-02-20 20:47:50,199 epoch 4 - iter 8476/21198 - loss 0.23953211 - samples/sec: 9.45 - lr: 0.000005
649
+ 2021-02-20 21:02:44,855 epoch 4 - iter 10595/21198 - loss 0.23751325 - samples/sec: 9.47 - lr: 0.000005
650
+ 2021-02-20 21:17:41,522 epoch 4 - iter 12714/21198 - loss 0.23782852 - samples/sec: 9.45 - lr: 0.000005
651
+ 2021-02-20 21:32:38,226 epoch 4 - iter 14833/21198 - loss 0.24096846 - samples/sec: 9.45 - lr: 0.000005
652
+ 2021-02-20 21:47:40,951 epoch 4 - iter 16952/21198 - loss 0.23932344 - samples/sec: 9.39 - lr: 0.000005
653
+ 2021-02-20 22:02:36,247 epoch 4 - iter 19071/21198 - loss 0.24064527 - samples/sec: 9.47 - lr: 0.000005
654
+ 2021-02-20 22:17:29,253 epoch 4 - iter 21190/21198 - loss 0.24016898 - samples/sec: 9.49 - lr: 0.000005
655
+ 2021-02-20 22:17:32,358 ----------------------------------------------------------------------------------------------------
656
+ 2021-02-20 22:17:32,358 EPOCH 4 done: loss 0.2402 - lr 0.0000045
657
+ 2021-02-20 22:23:24,429 TEST : loss 0.09627319127321243 - score 0.9076
658
+ 2021-02-20 22:23:24,520 BAD EPOCHS (no improvement): 4
659
+ 2021-02-20 22:23:24,535 ----------------------------------------------------------------------------------------------------
660
+ 2021-02-20 22:38:20,470 epoch 5 - iter 2119/21198 - loss 0.22083609 - samples/sec: 9.46 - lr: 0.000004
661
+ 2021-02-20 22:53:16,946 epoch 5 - iter 4238/21198 - loss 0.22353303 - samples/sec: 9.46 - lr: 0.000004
662
+ 2021-02-20 23:08:09,262 epoch 5 - iter 6357/21198 - loss 0.22526515 - samples/sec: 9.50 - lr: 0.000004
663
+ 2021-02-20 23:23:05,354 epoch 5 - iter 8476/21198 - loss 0.22450491 - samples/sec: 9.46 - lr: 0.000004
664
+ 2021-02-20 23:38:01,961 epoch 5 - iter 10595/21198 - loss 0.22317870 - samples/sec: 9.45 - lr: 0.000004
665
+ 2021-02-20 23:53:00,849 epoch 5 - iter 12714/21198 - loss 0.22493520 - samples/sec: 9.43 - lr: 0.000004
666
+ 2021-02-21 00:07:59,228 epoch 5 - iter 14833/21198 - loss 0.22554395 - samples/sec: 9.44 - lr: 0.000004
667
+ 2021-02-21 00:22:55,492 epoch 5 - iter 16952/21198 - loss 0.22640472 - samples/sec: 9.46 - lr: 0.000004
668
+ 2021-02-21 00:37:51,438 epoch 5 - iter 19071/21198 - loss 0.22662263 - samples/sec: 9.46 - lr: 0.000004
669
+ 2021-02-21 00:52:55,596 epoch 5 - iter 21190/21198 - loss 0.22627673 - samples/sec: 9.38 - lr: 0.000004
670
+ 2021-02-21 00:52:58,870 ----------------------------------------------------------------------------------------------------
671
+ 2021-02-21 00:52:58,870 EPOCH 5 done: loss 0.2263 - lr 0.0000043
672
+ 2021-02-21 00:58:49,962 TEST : loss 0.09906419366598129 - score 0.9046
673
+ 2021-02-21 00:58:50,051 BAD EPOCHS (no improvement): 4
674
+ 2021-02-21 00:58:50,053 ----------------------------------------------------------------------------------------------------
675
+ 2021-02-21 01:13:45,979 epoch 6 - iter 2119/21198 - loss 0.21128728 - samples/sec: 9.46 - lr: 0.000004
676
+ 2021-02-21 01:28:42,436 epoch 6 - iter 4238/21198 - loss 0.21192698 - samples/sec: 9.46 - lr: 0.000004
677
+ 2021-02-21 01:43:40,811 epoch 6 - iter 6357/21198 - loss 0.21388017 - samples/sec: 9.44 - lr: 0.000004
678
+ 2021-02-21 01:58:32,902 epoch 6 - iter 8476/21198 - loss 0.21433303 - samples/sec: 9.50 - lr: 0.000004
679
+ 2021-02-21 02:13:28,053 epoch 6 - iter 10595/21198 - loss 0.21527260 - samples/sec: 9.47 - lr: 0.000004
680
+ 2021-02-21 02:28:23,770 epoch 6 - iter 12714/21198 - loss 0.21578637 - samples/sec: 9.46 - lr: 0.000004
681
+ 2021-02-21 02:43:23,477 epoch 6 - iter 14833/21198 - loss 0.21742266 - samples/sec: 9.42 - lr: 0.000004
682
+ 2021-02-21 02:58:20,917 epoch 6 - iter 16952/21198 - loss 0.21671573 - samples/sec: 9.45 - lr: 0.000004
683
+ 2021-02-21 03:13:22,283 epoch 6 - iter 19071/21198 - loss 0.21638606 - samples/sec: 9.40 - lr: 0.000004
684
+ 2021-02-21 03:28:18,668 epoch 6 - iter 21190/21198 - loss 0.21601016 - samples/sec: 9.46 - lr: 0.000004
685
+ 2021-02-21 03:28:21,833 ----------------------------------------------------------------------------------------------------
686
+ 2021-02-21 03:28:21,833 EPOCH 6 done: loss 0.2160 - lr 0.0000040
687
+ 2021-02-21 03:34:15,000 TEST : loss 0.10325756669044495 - score 0.9076
688
+ 2021-02-21 03:34:15,094 BAD EPOCHS (no improvement): 4
689
+ 2021-02-21 03:34:15,120 ----------------------------------------------------------------------------------------------------
690
+ 2021-02-21 03:49:07,155 epoch 7 - iter 2119/21198 - loss 0.21960439 - samples/sec: 9.50 - lr: 0.000004
691
+ 2021-02-21 04:04:03,005 epoch 7 - iter 4238/21198 - loss 0.22004925 - samples/sec: 9.46 - lr: 0.000004
692
+ 2021-02-21 04:18:56,753 epoch 7 - iter 6357/21198 - loss 0.21543406 - samples/sec: 9.48 - lr: 0.000004
693
+ 2021-02-21 04:33:52,219 epoch 7 - iter 8476/21198 - loss 0.21504576 - samples/sec: 9.47 - lr: 0.000004
694
+ 2021-02-21 04:48:46,766 epoch 7 - iter 10595/21198 - loss 0.21323903 - samples/sec: 9.48 - lr: 0.000004
695
+ 2021-02-21 05:03:47,214 epoch 7 - iter 12714/21198 - loss 0.21486108 - samples/sec: 9.41 - lr: 0.000004
696
+ 2021-02-21 05:18:42,062 epoch 7 - iter 14833/21198 - loss 0.21180056 - samples/sec: 9.47 - lr: 0.000004
697
+ 2021-02-21 05:33:36,547 epoch 7 - iter 16952/21198 - loss 0.21059053 - samples/sec: 9.48 - lr: 0.000004
698
+ 2021-02-21 05:48:34,692 epoch 7 - iter 19071/21198 - loss 0.21256070 - samples/sec: 9.44 - lr: 0.000004
699
+ 2021-02-21 06:03:32,420 epoch 7 - iter 21190/21198 - loss 0.21049512 - samples/sec: 9.44 - lr: 0.000004
700
+ 2021-02-21 06:03:35,617 ----------------------------------------------------------------------------------------------------
701
+ 2021-02-21 06:03:35,617 EPOCH 7 done: loss 0.2105 - lr 0.0000036
702
+ 2021-02-21 06:09:34,438 TEST : loss 0.11405058950185776 - score 0.904
703
+ 2021-02-21 06:09:34,531 BAD EPOCHS (no improvement): 4
704
+ 2021-02-21 06:09:34,562 ----------------------------------------------------------------------------------------------------
705
+ 2021-02-21 06:24:28,495 epoch 8 - iter 2119/21198 - loss 0.20943523 - samples/sec: 9.48 - lr: 0.000004
706
+ 2021-02-21 06:39:27,118 epoch 8 - iter 4238/21198 - loss 0.20855714 - samples/sec: 9.43 - lr: 0.000004
707
+ 2021-02-21 06:54:21,524 epoch 8 - iter 6357/21198 - loss 0.20901557 - samples/sec: 9.48 - lr: 0.000004
708
+ 2021-02-21 07:09:19,131 epoch 8 - iter 8476/21198 - loss 0.20346961 - samples/sec: 9.44 - lr: 0.000003
709
+ 2021-02-21 07:24:13,963 epoch 8 - iter 10595/21198 - loss 0.20279742 - samples/sec: 9.47 - lr: 0.000003
710
+ 2021-02-21 07:39:11,643 epoch 8 - iter 12714/21198 - loss 0.20257371 - samples/sec: 9.44 - lr: 0.000003
711
+ 2021-02-21 07:54:11,363 epoch 8 - iter 14833/21198 - loss 0.19941560 - samples/sec: 9.42 - lr: 0.000003
712
+ 2021-02-21 08:09:12,189 epoch 8 - iter 16952/21198 - loss 0.19895001 - samples/sec: 9.41 - lr: 0.000003
713
+ 2021-02-21 08:24:10,631 epoch 8 - iter 19071/21198 - loss 0.19874614 - samples/sec: 9.43 - lr: 0.000003
714
+ 2021-02-21 08:39:11,135 epoch 8 - iter 21190/21198 - loss 0.19883000 - samples/sec: 9.41 - lr: 0.000003
715
+ 2021-02-21 08:39:14,364 ----------------------------------------------------------------------------------------------------
716
+ 2021-02-21 08:39:14,365 EPOCH 8 done: loss 0.1989 - lr 0.0000033
717
+ 2021-02-21 08:45:06,010 TEST : loss 0.12001997232437134 - score 0.9062
718
+ 2021-02-21 08:45:06,104 BAD EPOCHS (no improvement): 4
719
+ 2021-02-21 08:45:06,108 ----------------------------------------------------------------------------------------------------
720
+ 2021-02-21 09:00:02,412 epoch 9 - iter 2119/21198 - loss 0.19438574 - samples/sec: 9.46 - lr: 0.000003
721
+ 2021-02-21 09:15:05,242 epoch 9 - iter 4238/21198 - loss 0.18942482 - samples/sec: 9.39 - lr: 0.000003
722
+ 2021-02-21 09:30:02,818 epoch 9 - iter 6357/21198 - loss 0.19236360 - samples/sec: 9.44 - lr: 0.000003
723
+ 2021-02-21 09:44:58,840 epoch 9 - iter 8476/21198 - loss 0.19256963 - samples/sec: 9.46 - lr: 0.000003
724
+ 2021-02-21 09:59:56,642 epoch 9 - iter 10595/21198 - loss 0.19253633 - samples/sec: 9.44 - lr: 0.000003
725
+ 2021-02-21 10:14:53,595 epoch 9 - iter 12714/21198 - loss 0.19368548 - samples/sec: 9.45 - lr: 0.000003
726
+ 2021-02-21 10:29:47,614 epoch 9 - iter 14833/21198 - loss 0.19452139 - samples/sec: 9.48 - lr: 0.000003
727
+ 2021-02-21 10:44:41,415 epoch 9 - iter 16952/21198 - loss 0.19339405 - samples/sec: 9.48 - lr: 0.000003
728
+ 2021-02-21 10:59:36,337 epoch 9 - iter 19071/21198 - loss 0.19242064 - samples/sec: 9.47 - lr: 0.000003
729
+ 2021-02-21 11:14:30,614 epoch 9 - iter 21190/21198 - loss 0.19248543 - samples/sec: 9.48 - lr: 0.000003
730
+ 2021-02-21 11:14:33,791 ----------------------------------------------------------------------------------------------------
731
+ 2021-02-21 11:14:33,791 EPOCH 9 done: loss 0.1925 - lr 0.0000029
732
+ 2021-02-21 11:20:25,946 TEST : loss 0.12788806855678558 - score 0.9075
733
+ 2021-02-21 11:20:26,040 BAD EPOCHS (no improvement): 4
734
+ 2021-02-21 11:20:26,059 ----------------------------------------------------------------------------------------------------
735
+ 2021-02-21 11:35:18,369 epoch 10 - iter 2119/21198 - loss 0.19003716 - samples/sec: 9.50 - lr: 0.000003
736
+ 2021-02-21 11:50:08,521 epoch 10 - iter 4238/21198 - loss 0.18305573 - samples/sec: 9.52 - lr: 0.000003
737
+ 2021-02-21 12:05:00,626 epoch 10 - iter 6357/21198 - loss 0.18276790 - samples/sec: 9.50 - lr: 0.000003
738
+ 2021-02-21 12:19:58,182 epoch 10 - iter 8476/21198 - loss 0.18408200 - samples/sec: 9.44 - lr: 0.000003
739
+ 2021-02-21 12:34:51,607 epoch 10 - iter 10595/21198 - loss 0.18396061 - samples/sec: 9.49 - lr: 0.000003
740
+ 2021-02-21 12:49:50,161 epoch 10 - iter 12714/21198 - loss 0.18350312 - samples/sec: 9.43 - lr: 0.000003
741
+ 2021-02-21 13:04:45,147 epoch 10 - iter 14833/21198 - loss 0.18334288 - samples/sec: 9.47 - lr: 0.000003
742
+ 2021-02-21 13:19:40,466 epoch 10 - iter 16952/21198 - loss 0.18425802 - samples/sec: 9.47 - lr: 0.000003
743
+ 2021-02-21 13:34:36,952 epoch 10 - iter 19071/21198 - loss 0.18414841 - samples/sec: 9.46 - lr: 0.000003
744
+ 2021-02-21 13:49:30,328 epoch 10 - iter 21190/21198 - loss 0.18456898 - samples/sec: 9.49 - lr: 0.000003
745
+ 2021-02-21 13:49:33,450 ----------------------------------------------------------------------------------------------------
746
+ 2021-02-21 13:49:33,450 EPOCH 10 done: loss 0.1846 - lr 0.0000025
747
+ 2021-02-21 13:55:29,322 TEST : loss 0.14910565316677094 - score 0.9058
748
+ 2021-02-21 13:55:29,415 BAD EPOCHS (no improvement): 4
749
+ 2021-02-21 13:55:29,417 ----------------------------------------------------------------------------------------------------
750
+ 2021-02-21 14:10:21,804 epoch 11 - iter 2119/21198 - loss 0.17609195 - samples/sec: 9.50 - lr: 0.000002
751
+ 2021-02-21 14:25:16,338 epoch 11 - iter 4238/21198 - loss 0.18154520 - samples/sec: 9.48 - lr: 0.000002
752
+ 2021-02-21 14:40:12,223 epoch 11 - iter 6357/21198 - loss 0.18097113 - samples/sec: 9.46 - lr: 0.000002
753
+ 2021-02-21 14:55:03,642 epoch 11 - iter 8476/21198 - loss 0.18053539 - samples/sec: 9.51 - lr: 0.000002
754
+ 2021-02-21 15:09:56,533 epoch 11 - iter 10595/21198 - loss 0.17876087 - samples/sec: 9.49 - lr: 0.000002
755
+ 2021-02-21 15:24:53,173 epoch 11 - iter 12714/21198 - loss 0.17894441 - samples/sec: 9.45 - lr: 0.000002
756
+ 2021-02-21 15:39:48,175 epoch 11 - iter 14833/21198 - loss 0.17978821 - samples/sec: 9.47 - lr: 0.000002
757
+ 2021-02-21 15:54:40,494 epoch 11 - iter 16952/21198 - loss 0.18011143 - samples/sec: 9.50 - lr: 0.000002
758
+ 2021-02-21 16:09:33,438 epoch 11 - iter 19071/21198 - loss 0.17919032 - samples/sec: 9.49 - lr: 0.000002
759
+ 2021-02-21 16:24:22,957 epoch 11 - iter 21190/21198 - loss 0.17903132 - samples/sec: 9.53 - lr: 0.000002
760
+ 2021-02-21 16:24:26,245 ----------------------------------------------------------------------------------------------------
761
+ 2021-02-21 16:24:26,245 EPOCH 11 done: loss 0.1790 - lr 0.0000021
762
+ 2021-02-21 16:30:17,246 TEST : loss 0.15147249400615692 - score 0.9062
763
+ 2021-02-21 16:30:17,342 BAD EPOCHS (no improvement): 4
764
+ 2021-02-21 16:30:17,350 ----------------------------------------------------------------------------------------------------
765
+ 2021-02-21 16:45:13,575 epoch 12 - iter 2119/21198 - loss 0.17364982 - samples/sec: 9.46 - lr: 0.000002
766
+ 2021-02-21 17:00:11,813 epoch 12 - iter 4238/21198 - loss 0.17305974 - samples/sec: 9.44 - lr: 0.000002
767
+ 2021-02-21 17:15:07,540 epoch 12 - iter 6357/21198 - loss 0.17213052 - samples/sec: 9.46 - lr: 0.000002
768
+ 2021-02-21 17:30:04,059 epoch 12 - iter 8476/21198 - loss 0.16983198 - samples/sec: 9.46 - lr: 0.000002
769
+ 2021-02-21 17:44:57,853 epoch 12 - iter 10595/21198 - loss 0.17052354 - samples/sec: 9.48 - lr: 0.000002
770
+ 2021-02-21 17:59:52,951 epoch 12 - iter 12714/21198 - loss 0.16948349 - samples/sec: 9.47 - lr: 0.000002
771
+ 2021-02-21 18:14:48,715 epoch 12 - iter 14833/21198 - loss 0.16890758 - samples/sec: 9.46 - lr: 0.000002
772
+ 2021-02-21 18:29:40,011 epoch 12 - iter 16952/21198 - loss 0.16929059 - samples/sec: 9.51 - lr: 0.000002
773
+ 2021-02-21 18:44:42,153 epoch 12 - iter 19071/21198 - loss 0.16928360 - samples/sec: 9.40 - lr: 0.000002
774
+ 2021-02-21 18:59:37,616 epoch 12 - iter 21190/21198 - loss 0.17211801 - samples/sec: 9.47 - lr: 0.000002
775
+ 2021-02-21 18:59:40,898 ----------------------------------------------------------------------------------------------------
776
+ 2021-02-21 18:59:40,898 EPOCH 12 done: loss 0.1721 - lr 0.0000017
777
+ 2021-02-21 19:05:31,029 TEST : loss 0.147916778922081 - score 0.9085
778
+ 2021-02-21 19:05:31,125 BAD EPOCHS (no improvement): 4
779
+ 2021-02-21 19:05:31,142 ----------------------------------------------------------------------------------------------------
780
+ 2021-02-21 19:20:24,965 epoch 13 - iter 2119/21198 - loss 0.16896267 - samples/sec: 9.48 - lr: 0.000002
781
+ 2021-02-21 19:35:21,463 epoch 13 - iter 4238/21198 - loss 0.16653116 - samples/sec: 9.46 - lr: 0.000002
782
+ 2021-02-21 19:50:15,194 epoch 13 - iter 6357/21198 - loss 0.16770765 - samples/sec: 9.48 - lr: 0.000002
783
+ 2021-02-21 20:05:12,891 epoch 13 - iter 8476/21198 - loss 0.17108344 - samples/sec: 9.44 - lr: 0.000002
784
+ 2021-02-21 20:20:06,566 epoch 13 - iter 10595/21198 - loss 0.17184402 - samples/sec: 9.49 - lr: 0.000002
785
+ 2021-02-21 20:34:59,890 epoch 13 - iter 12714/21198 - loss 0.17303152 - samples/sec: 9.49 - lr: 0.000002
786
+ 2021-02-21 20:49:50,908 epoch 13 - iter 14833/21198 - loss 0.17325989 - samples/sec: 9.51 - lr: 0.000001
787
+ 2021-02-21 21:04:47,902 epoch 13 - iter 16952/21198 - loss 0.17294630 - samples/sec: 9.45 - lr: 0.000001
788
+ 2021-02-21 21:19:41,901 epoch 13 - iter 19071/21198 - loss 0.17373625 - samples/sec: 9.48 - lr: 0.000001
789
+ 2021-02-21 21:34:36,135 epoch 13 - iter 21190/21198 - loss 0.17394207 - samples/sec: 9.48 - lr: 0.000001
790
+ 2021-02-21 21:34:39,310 ----------------------------------------------------------------------------------------------------
791
+ 2021-02-21 21:34:39,310 EPOCH 13 done: loss 0.1739 - lr 0.0000014
792
+ 2021-02-21 21:40:34,294 TEST : loss 0.16395367681980133 - score 0.9076
793
+ 2021-02-21 21:40:34,393 BAD EPOCHS (no improvement): 4
794
+ 2021-02-21 21:40:34,407 ----------------------------------------------------------------------------------------------------
795
+ 2021-02-21 21:55:30,019 epoch 14 - iter 2119/21198 - loss 0.17210424 - samples/sec: 9.46 - lr: 0.000001
796
+ 2021-02-21 22:10:22,785 epoch 14 - iter 4238/21198 - loss 0.17224407 - samples/sec: 9.49 - lr: 0.000001
797
+ 2021-02-21 22:25:15,502 epoch 14 - iter 6357/21198 - loss 0.17196186 - samples/sec: 9.50 - lr: 0.000001
798
+ 2021-02-21 22:40:13,225 epoch 14 - iter 8476/21198 - loss 0.17131693 - samples/sec: 9.44 - lr: 0.000001
799
+ 2021-02-21 22:55:12,609 epoch 14 - iter 10595/21198 - loss 0.17336075 - samples/sec: 9.43 - lr: 0.000001
800
+ 2021-02-21 23:10:03,405 epoch 14 - iter 12714/21198 - loss 0.17249936 - samples/sec: 9.52 - lr: 0.000001
801
+ 2021-02-21 23:24:55,615 epoch 14 - iter 14833/21198 - loss 0.17318785 - samples/sec: 9.50 - lr: 0.000001
802
+ 2021-02-21 23:39:39,560 epoch 14 - iter 16952/21198 - loss 0.17208304 - samples/sec: 9.59 - lr: 0.000001
803
+ 2021-02-21 23:54:35,004 epoch 14 - iter 19071/21198 - loss 0.17228505 - samples/sec: 9.47 - lr: 0.000001
804
+ 2021-02-22 00:09:25,613 epoch 14 - iter 21190/21198 - loss 0.17228047 - samples/sec: 9.52 - lr: 0.000001
805
+ 2021-02-22 00:09:28,876 ----------------------------------------------------------------------------------------------------
806
+ 2021-02-22 00:09:28,877 EPOCH 14 done: loss 0.1723 - lr 0.0000010
807
+ 2021-02-22 00:15:21,867 TEST : loss 0.16743017733097076 - score 0.909
808
+ 2021-02-22 00:15:21,963 BAD EPOCHS (no improvement): 4
809
+ 2021-02-22 00:15:21,965 ----------------------------------------------------------------------------------------------------
810
+ 2021-02-22 00:30:16,862 epoch 15 - iter 2119/21198 - loss 0.15790436 - samples/sec: 9.47 - lr: 0.000001
811
+ 2021-02-22 00:45:09,621 epoch 15 - iter 4238/21198 - loss 0.15811998 - samples/sec: 9.49 - lr: 0.000001
812
+ 2021-02-22 01:00:03,426 epoch 15 - iter 6357/21198 - loss 0.16041062 - samples/sec: 9.48 - lr: 0.000001
813
+ 2021-02-22 01:14:56,991 epoch 15 - iter 8476/21198 - loss 0.16204753 - samples/sec: 9.49 - lr: 0.000001
814
+ 2021-02-22 01:29:46,578 epoch 15 - iter 10595/21198 - loss 0.16310173 - samples/sec: 9.53 - lr: 0.000001
815
+ 2021-02-22 01:44:39,948 epoch 15 - iter 12714/21198 - loss 0.16249272 - samples/sec: 9.49 - lr: 0.000001
816
+ 2021-02-22 01:59:33,810 epoch 15 - iter 14833/21198 - loss 0.16196562 - samples/sec: 9.48 - lr: 0.000001
817
+ 2021-02-22 02:14:26,647 epoch 15 - iter 16952/21198 - loss 0.16333266 - samples/sec: 9.49 - lr: 0.000001
818
+ 2021-02-22 02:29:18,415 epoch 15 - iter 19071/21198 - loss 0.16459359 - samples/sec: 9.51 - lr: 0.000001
819
+ 2021-02-22 02:44:12,651 epoch 15 - iter 21190/21198 - loss 0.16491666 - samples/sec: 9.48 - lr: 0.000001
820
+ 2021-02-22 02:44:15,874 ----------------------------------------------------------------------------------------------------
821
+ 2021-02-22 02:44:15,874 EPOCH 15 done: loss 0.1649 - lr 0.0000007
822
+ 2021-02-22 02:50:08,356 TEST : loss 0.17295649647712708 - score 0.9101
823
+ 2021-02-22 02:50:08,450 BAD EPOCHS (no improvement): 4
824
+ 2021-02-22 02:50:08,452 ----------------------------------------------------------------------------------------------------
825
+ 2021-02-22 03:05:07,383 epoch 16 - iter 2119/21198 - loss 0.16869372 - samples/sec: 9.43 - lr: 0.000001
826
+ 2021-02-22 03:20:04,205 epoch 16 - iter 4238/21198 - loss 0.16204002 - samples/sec: 9.45 - lr: 0.000001
827
+ 2021-02-22 03:34:56,532 epoch 16 - iter 6357/21198 - loss 0.16115018 - samples/sec: 9.50 - lr: 0.000001
828
+ 2021-02-22 03:49:52,676 epoch 16 - iter 8476/21198 - loss 0.16290083 - samples/sec: 9.46 - lr: 0.000001
829
+ 2021-02-22 04:04:43,904 epoch 16 - iter 10595/21198 - loss 0.16286029 - samples/sec: 9.51 - lr: 0.000001
830
+ 2021-02-22 04:19:37,979 epoch 16 - iter 12714/21198 - loss 0.16258104 - samples/sec: 9.48 - lr: 0.000001
831
+ 2021-02-22 04:34:27,662 epoch 16 - iter 14833/21198 - loss 0.16217931 - samples/sec: 9.53 - lr: 0.000001
832
+ 2021-02-22 04:49:18,263 epoch 16 - iter 16952/21198 - loss 0.16190092 - samples/sec: 9.52 - lr: 0.000001
833
+ 2021-02-22 05:04:09,607 epoch 16 - iter 19071/21198 - loss 0.16271366 - samples/sec: 9.51 - lr: 0.000001
834
+ 2021-02-22 05:19:03,032 epoch 16 - iter 21190/21198 - loss 0.16309304 - samples/sec: 9.49 - lr: 0.000000
835
+ 2021-02-22 05:19:06,131 ----------------------------------------------------------------------------------------------------
836
+ 2021-02-22 05:19:06,131 EPOCH 16 done: loss 0.1631 - lr 0.0000005
837
+ 2021-02-22 05:24:59,209 TEST : loss 0.1732577085494995 - score 0.9099
838
+ 2021-02-22 05:24:59,306 BAD EPOCHS (no improvement): 4
839
+ 2021-02-22 05:24:59,318 ----------------------------------------------------------------------------------------------------
840
+ 2021-02-22 05:39:50,755 epoch 17 - iter 2119/21198 - loss 0.15607883 - samples/sec: 9.51 - lr: 0.000000
841
+ 2021-02-22 05:54:41,713 epoch 17 - iter 4238/21198 - loss 0.16295560 - samples/sec: 9.51 - lr: 0.000000
842
+ 2021-02-22 06:09:33,595 epoch 17 - iter 6357/21198 - loss 0.16030109 - samples/sec: 9.50 - lr: 0.000000
843
+ 2021-02-22 06:24:26,942 epoch 17 - iter 8476/21198 - loss 0.16028383 - samples/sec: 9.49 - lr: 0.000000
844
+ 2021-02-22 06:39:19,965 epoch 17 - iter 10595/21198 - loss 0.16179951 - samples/sec: 9.49 - lr: 0.000000
845
+ 2021-02-22 06:54:14,002 epoch 17 - iter 12714/21198 - loss 0.16064671 - samples/sec: 9.48 - lr: 0.000000
846
+ 2021-02-22 07:09:02,879 epoch 17 - iter 14833/21198 - loss 0.16118933 - samples/sec: 9.54 - lr: 0.000000
847
+ 2021-02-22 07:23:53,696 epoch 17 - iter 16952/21198 - loss 0.16233903 - samples/sec: 9.52 - lr: 0.000000
848
+ 2021-02-22 07:38:43,895 epoch 17 - iter 19071/21198 - loss 0.16244551 - samples/sec: 9.52 - lr: 0.000000
849
+ 2021-02-22 07:53:35,588 epoch 17 - iter 21190/21198 - loss 0.16243178 - samples/sec: 9.51 - lr: 0.000000
850
+ 2021-02-22 07:53:38,781 ----------------------------------------------------------------------------------------------------
851
+ 2021-02-22 07:53:38,781 EPOCH 17 done: loss 0.1624 - lr 0.0000003
852
+ 2021-02-22 07:59:36,439 TEST : loss 0.1792287975549698 - score 0.9098
853
+ 2021-02-22 07:59:36,538 BAD EPOCHS (no improvement): 4
854
+ 2021-02-22 07:59:36,561 ----------------------------------------------------------------------------------------------------
855
+ 2021-02-22 08:14:29,823 epoch 18 - iter 2119/21198 - loss 0.16946072 - samples/sec: 9.49 - lr: 0.000000
856
+ 2021-02-22 08:29:28,618 epoch 18 - iter 4238/21198 - loss 0.16431210 - samples/sec: 9.43 - lr: 0.000000
857
+ 2021-02-22 08:44:23,757 epoch 18 - iter 6357/21198 - loss 0.16285664 - samples/sec: 9.47 - lr: 0.000000
858
+ 2021-02-22 08:59:18,330 epoch 18 - iter 8476/21198 - loss 0.16406026 - samples/sec: 9.48 - lr: 0.000000
859
+ 2021-02-22 09:14:15,549 epoch 18 - iter 10595/21198 - loss 0.16218940 - samples/sec: 9.45 - lr: 0.000000
860
+ 2021-02-22 09:29:11,539 epoch 18 - iter 12714/21198 - loss 0.16137864 - samples/sec: 9.46 - lr: 0.000000
861
+ 2021-02-22 09:44:06,143 epoch 18 - iter 14833/21198 - loss 0.16211856 - samples/sec: 9.48 - lr: 0.000000
862
+ 2021-02-22 09:59:03,167 epoch 18 - iter 16952/21198 - loss 0.16214711 - samples/sec: 9.45 - lr: 0.000000
863
+ 2021-02-22 10:13:57,239 epoch 18 - iter 19071/21198 - loss 0.16058721 - samples/sec: 9.48 - lr: 0.000000
864
+ 2021-02-22 10:28:52,182 epoch 18 - iter 21190/21198 - loss 0.16093573 - samples/sec: 9.47 - lr: 0.000000
865
+ 2021-02-22 10:28:55,515 ----------------------------------------------------------------------------------------------------
866
+ 2021-02-22 10:28:55,515 EPOCH 18 done: loss 0.1610 - lr 0.0000001
867
+ 2021-02-22 10:34:48,208 TEST : loss 0.17890706658363342 - score 0.9095
868
+ 2021-02-22 10:34:48,308 BAD EPOCHS (no improvement): 4
869
+ 2021-02-22 10:34:48,332 ----------------------------------------------------------------------------------------------------
870
+ 2021-02-22 10:49:43,738 epoch 19 - iter 2119/21198 - loss 0.16694990 - samples/sec: 9.47 - lr: 0.000000
871
+ 2021-02-22 11:04:30,455 epoch 19 - iter 4238/21198 - loss 0.15984197 - samples/sec: 9.56 - lr: 0.000000
872
+ 2021-02-22 11:19:21,091 epoch 19 - iter 6357/21198 - loss 0.15796573 - samples/sec: 9.52 - lr: 0.000000
+ 2021-02-22 11:34:16,935 epoch 19 - iter 8476/21198 - loss 0.16031077 - samples/sec: 9.46 - lr: 0.000000
+ 2021-02-22 11:49:14,170 epoch 19 - iter 10595/21198 - loss 0.16114764 - samples/sec: 9.45 - lr: 0.000000
+ 2021-02-22 12:04:12,070 epoch 19 - iter 12714/21198 - loss 0.16077654 - samples/sec: 9.44 - lr: 0.000000
+ 2021-02-22 12:19:05,634 epoch 19 - iter 14833/21198 - loss 0.16093868 - samples/sec: 9.49 - lr: 0.000000
+ 2021-02-22 12:34:03,912 epoch 19 - iter 16952/21198 - loss 0.16092922 - samples/sec: 9.44 - lr: 0.000000
+ 2021-02-22 12:48:59,408 epoch 19 - iter 19071/21198 - loss 0.16176484 - samples/sec: 9.47 - lr: 0.000000
+ 2021-02-22 13:03:55,588 epoch 19 - iter 21190/21198 - loss 0.16136077 - samples/sec: 9.46 - lr: 0.000000
+ 2021-02-22 13:03:58,842 ----------------------------------------------------------------------------------------------------
+ 2021-02-22 13:03:58,842 EPOCH 19 done: loss 0.1613 - lr 0.0000000
+ 2021-02-22 13:09:51,774 TEST : loss 0.1799449324607849 - score 0.9093
+ 2021-02-22 13:09:51,873 BAD EPOCHS (no improvement): 4
+ 2021-02-22 13:09:51,889 ----------------------------------------------------------------------------------------------------
+ 2021-02-22 13:24:48,886 epoch 20 - iter 2119/21198 - loss 0.15743940 - samples/sec: 9.45 - lr: 0.000000
+ 2021-02-22 13:39:41,650 epoch 20 - iter 4238/21198 - loss 0.15941045 - samples/sec: 9.49 - lr: 0.000000
+ 2021-02-22 13:54:35,155 epoch 20 - iter 6357/21198 - loss 0.16085263 - samples/sec: 9.49 - lr: 0.000000
+ 2021-02-22 14:09:30,408 epoch 20 - iter 8476/21198 - loss 0.16038502 - samples/sec: 9.47 - lr: 0.000000
+ 2021-02-22 14:24:21,244 epoch 20 - iter 10595/21198 - loss 0.15929046 - samples/sec: 9.52 - lr: 0.000000
+ 2021-02-22 14:39:15,988 epoch 20 - iter 12714/21198 - loss 0.15817473 - samples/sec: 9.47 - lr: 0.000000
+ 2021-02-22 14:54:08,818 epoch 20 - iter 14833/21198 - loss 0.16049560 - samples/sec: 9.49 - lr: 0.000000
+ 2021-02-22 15:09:01,889 epoch 20 - iter 16952/21198 - loss 0.16079237 - samples/sec: 9.49 - lr: 0.000000
+ 2021-02-22 15:23:54,278 epoch 20 - iter 19071/21198 - loss 0.16175262 - samples/sec: 9.50 - lr: 0.000000
+ 2021-02-22 15:38:48,341 epoch 20 - iter 21190/21198 - loss 0.16071107 - samples/sec: 9.48 - lr: 0.000000
+ 2021-02-22 15:38:51,585 ----------------------------------------------------------------------------------------------------
+ 2021-02-22 15:38:51,586 EPOCH 20 done: loss 0.1607 - lr 0.0000000
+ 2021-02-22 15:44:48,115 TEST : loss 0.17999354004859924 - score 0.9093
+ 2021-02-22 15:44:48,213 BAD EPOCHS (no improvement): 4
+ 2021-02-22 15:45:25,862 ----------------------------------------------------------------------------------------------------
+ 2021-02-22 15:45:25,862 Testing using best model ...
+ 2021-02-22 15:51:35,093 0.9055 0.9132 0.9093
+ 2021-02-22 15:51:35,093
+ Results:
+ - F1-score (micro) 0.9093
+ - F1-score (macro) 0.8233
+
+ By class:
+ CARDINAL tp: 802 - fp: 124 - fn: 133 - precision: 0.8661 - recall: 0.8578 - f1-score: 0.8619
+ DATE tp: 1435 - fp: 219 - fn: 167 - precision: 0.8676 - recall: 0.8958 - f1-score: 0.8814
+ EVENT tp: 45 - fp: 19 - fn: 18 - precision: 0.7031 - recall: 0.7143 - f1-score: 0.7087
+ FAC tp: 105 - fp: 26 - fn: 30 - precision: 0.8015 - recall: 0.7778 - f1-score: 0.7895
+ GPE tp: 2161 - fp: 62 - fn: 79 - precision: 0.9721 - recall: 0.9647 - f1-score: 0.9684
+ LANGUAGE tp: 14 - fp: 2 - fn: 8 - precision: 0.8750 - recall: 0.6364 - f1-score: 0.7368
+ LAW tp: 26 - fp: 18 - fn: 14 - precision: 0.5909 - recall: 0.6500 - f1-score: 0.6190
+ LOC tp: 140 - fp: 41 - fn: 39 - precision: 0.7735 - recall: 0.7821 - f1-score: 0.7778
+ MONEY tp: 286 - fp: 29 - fn: 28 - precision: 0.9079 - recall: 0.9108 - f1-score: 0.9094
+ NORP tp: 820 - fp: 45 - fn: 21 - precision: 0.9480 - recall: 0.9750 - f1-score: 0.9613
+ ORDINAL tp: 168 - fp: 38 - fn: 27 - precision: 0.8155 - recall: 0.8615 - f1-score: 0.8379
+ ORG tp: 1650 - fp: 168 - fn: 145 - precision: 0.9076 - recall: 0.9192 - f1-score: 0.9134
+ PERCENT tp: 310 - fp: 37 - fn: 39 - precision: 0.8934 - recall: 0.8883 - f1-score: 0.8908
+ PERSON tp: 1903 - fp: 81 - fn: 85 - precision: 0.9592 - recall: 0.9572 - f1-score: 0.9582
+ PRODUCT tp: 66 - fp: 21 - fn: 10 - precision: 0.7586 - recall: 0.8684 - f1-score: 0.8098
+ QUANTITY tp: 87 - fp: 22 - fn: 18 - precision: 0.7982 - recall: 0.8286 - f1-score: 0.8131
+ TIME tp: 144 - fp: 72 - fn: 68 - precision: 0.6667 - recall: 0.6792 - f1-score: 0.6729
+ WORK_OF_ART tp: 118 - fp: 49 - fn: 48 - precision: 0.7066 - recall: 0.7108 - f1-score: 0.7087
+ 2021-02-22 15:51:35,093 ----------------------------------------------------------------------------------------------------
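
The micro and macro F1 scores reported in the log can be cross-checked against the per-class `tp`/`fp`/`fn` counts. The following is a minimal sketch in plain Python, not Flair's own evaluation code: the `counts` dictionary is copied by hand from the "By class" block, and `prf` is a hypothetical helper introduced here for illustration. It assumes the standard definitions (micro pools the counts over all classes before scoring; macro is the unweighted mean of the per-class F1 scores) and assumes the summary line `0.9055 0.9132 0.9093` is precision, recall, and F1 in that order.

```python
# Per-class (tp, fp, fn) counts copied from the "By class" block of the log.
counts = {
    "CARDINAL": (802, 124, 133),
    "DATE": (1435, 219, 167),
    "EVENT": (45, 19, 18),
    "FAC": (105, 26, 30),
    "GPE": (2161, 62, 79),
    "LANGUAGE": (14, 2, 8),
    "LAW": (26, 18, 14),
    "LOC": (140, 41, 39),
    "MONEY": (286, 29, 28),
    "NORP": (820, 45, 21),
    "ORDINAL": (168, 38, 27),
    "ORG": (1650, 168, 145),
    "PERCENT": (310, 37, 39),
    "PERSON": (1903, 81, 85),
    "PRODUCT": (66, 21, 10),
    "QUANTITY": (87, 22, 18),
    "TIME": (144, 72, 68),
    "WORK_OF_ART": (118, 49, 48),
}


def prf(tp, fp, fn):
    """Return (precision, recall, F1) for one set of counts (hypothetical helper)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Micro average: pool the counts over all classes, then score once.
total_tp = sum(tp for tp, _, _ in counts.values())
total_fp = sum(fp for _, fp, _ in counts.values())
total_fn = sum(fn for _, _, fn in counts.values())
micro_p, micro_r, micro_f1 = prf(total_tp, total_fp, total_fn)

# Macro average: score each class separately, then take the unweighted mean.
macro_f1 = sum(prf(*c)[2] for c in counts.values()) / len(counts)

print(f"micro: P={micro_p:.4f} R={micro_r:.4f} F1={micro_f1:.4f}")
print(f"macro: F1={macro_f1:.4f}")
```

Under these assumptions the pooled counts (10280 tp, 1073 fp, 977 fn) give the logged 0.9055 / 0.9132 / 0.9093, and the per-class mean gives the logged macro F1 of 0.8233. The gap between the two averages comes from low-support classes such as LAW and TIME, which weigh as much as GPE or PERSON in the macro score.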