readme: update model card
README.md
CHANGED
@@ -3,10 +3,10 @@ tags:
 - flair
 - token-classification
 - sequence-tagger-model
-language:
-- en
-- de
-- fr
+language:
+- en
+- de
+- fr
 - it
 - nl
 - pl
@@ -26,7 +26,7 @@ widget:
 
 This is the default multilingual universal part-of-speech tagging model that ships with [Flair](https://github.com/flairNLP/flair/).
 
-F1-Score: **
+F1-Score: **96.87** (12 UD Treebanks covering English, German, French, Italian, Dutch, Polish, Spanish, Swedish, Danish, Norwegian, Finnish and Czech)
 
 Predicts universal POS tags:
 
@@ -94,14 +94,14 @@ Token[6]: "say" → VERB (0.9998)
 Token[7]: "." → PUNCT (1.0)
 ```
 
-So, the words "*Ich*" and "*they*" are labeled as **pronouns** (PRON), while "*liebe*" and "*say*" are labeled as **verbs** (VERB) in the multilingual sentence "*Ich liebe Berlin, as they say*".
+So, the words "*Ich*" and "*they*" are labeled as **pronouns** (PRON), while "*liebe*" and "*say*" are labeled as **verbs** (VERB) in the multilingual sentence "*Ich liebe Berlin, as they say*".
 
 
 ---
 
 ### Training: Script to train this model
 
-The following Flair script was used to train this model:
+The following Flair script was used to train this model:
 
 ```python
 from flair.data import MultiCorpus
@@ -129,11 +129,10 @@ corpus = MultiCorpus([
 tag_type = 'upos'
 
 # 3. make the tag dictionary from the corpus
-tag_dictionary = corpus.
+tag_dictionary = corpus.make_label_dictionary(label_type=tag_type)
 
 # 4. initialize each embedding we use
 embedding_types = [
-
     # contextual string embeddings, forward
     FlairEmbeddings('multi-forward'),
 
@@ -141,7 +140,7 @@ embedding_types = [
     FlairEmbeddings('multi-backward'),
 ]
 
-# embedding stack consists of Flair
+# embedding stack consists of Flair embeddings
 embeddings = StackedEmbeddings(embeddings=embedding_types)
 
 # 5. initialize sequence tagger