Darkmachine committed on
Commit 695e0b5 · 1 Parent(s): 73651f7

Added french ner

Files changed (2)
  1. README.md +147 -0
  2. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,147 @@
+ ---
+ tags:
+ - flair
+ - token-classification
+ - sequence-tagger-model
+ language: fr
+ datasets:
+ - conll2003
+ widget:
+ - text: "George Washington est allé à Washington"
+ ---
+
+ ## French NER in Flair (default model)
+
+ This is the standard 4-class NER model for French that ships with [Flair](https://github.com/flairNLP/flair/).
+
+ F1-Score: **90.61** (WikiNER)
+
+ Predicts 4 tags:
+
+ | **tag** | **meaning** |
+ |---------|-------------|
+ | PER | person name |
+ | LOC | location name |
+ | ORG | organization name |
+ | MISC | other name |
+
+ The model is based on [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and an LSTM-CRF architecture.
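As a rough illustration of what an entity-level F1 score such as the one reported above measures, the following sketch computes precision, recall, and F1 over sets of labeled spans. The function name `span_f1` and the toy gold and predicted spans are hypothetical and chosen only for illustration. This is not the evaluation code Flair uses, and exact matching of (start, end, tag) triples is an assumption.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (first token index, last token index, tag)


def span_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-tag matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: two gold entities, one predicted correctly and
# one predicted with the wrong tag.
gold = {(1, 2, "PER"), (6, 6, "LOC")}
predicted = {(1, 2, "PER"), (6, 6, "ORG")}

precision, recall, f1 = span_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A span with the right boundaries but the wrong tag counts as both a false positive and a false negative under this matching rule, which is why a single tagging mistake lowers precision and recall together.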
+
+ ---
+
+ ### Demo: How to use in Flair
+
+ Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)
+
+ ```python
+ from flair.data import Sentence
+ from flair.models import SequenceTagger
+
+ # load tagger
+ tagger = SequenceTagger.load("flair/ner-french")
+
+ # make example sentence
+ sentence = Sentence("George Washington est allé à Washington")
+
+ # predict NER tags
+ tagger.predict(sentence)
+
+ # print sentence
+ print(sentence)
+
+ # print predicted NER spans
+ print('The following NER tags are found:')
+ # iterate over entities and print
+ for entity in sentence.get_spans('ner'):
+     print(entity)
+
+ ```
+
+ This yields the following output:
+ ```
+ Span [1,2]: "George Washington" [− Labels: PER (0.7394)]
+ Span [6]: "Washington" [− Labels: LOC (0.9161)]
+ ```
+
+ So, the entities "*George Washington*" (labeled as a **person**) and "*Washington*" (labeled as a **location**) are found in the sentence "*George Washington est allé à Washington*".
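If the printed span lines need to be consumed by other code, one option is to parse them. The sketch below does this with a regular expression written against the two output lines shown above. The pattern, the `ParsedSpan` record, and the `parse_span_line` helper are hypothetical names, and the printed format is an assumption that may differ between Flair versions.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ParsedSpan(NamedTuple):
    token_ids: Tuple[int, ...]
    text: str
    tag: str
    score: float


# Written against lines such as:
#   Span [1,2]: "George Washington" [− Labels: PER (0.7394)]
# The separator before "Labels" is matched loosely (\S+) because it may be
# a Unicode minus sign or an ASCII hyphen.
SPAN_LINE = re.compile(
    r'^Span \[(?P<ids>\d+(?:,\d+)*)\]: "(?P<text>.*)" '
    r'\[\S+ Labels: (?P<tag>\S+) \((?P<score>[\d.]+)\)\]$'
)


def parse_span_line(line: str) -> Optional[ParsedSpan]:
    """Parse one printed span line, or return None if it does not match."""
    match = SPAN_LINE.match(line.strip())
    if match is None:
        return None
    token_ids = tuple(int(i) for i in match.group("ids").split(","))
    return ParsedSpan(
        token_ids,
        match.group("text"),
        match.group("tag"),
        float(match.group("score")),
    )


output = '''Span [1,2]: "George Washington" [− Labels: PER (0.7394)]
Span [6]: "Washington" [− Labels: LOC (0.9161)]'''

for line in output.splitlines():
    print(parse_span_line(line))
```

Parsing printed text is brittle by nature, so this is best treated as a fallback for logs that already exist rather than as the primary way to read predictions.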
+
+
+ ---
+
+ ### Training: Script to train this model
+
+ The following Flair script was used to train this model:
+
+ ```python
+ from flair.data import Corpus
+ from flair.datasets import WIKINER_FRENCH
+ from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
+
+ # 1. get the corpus
+ corpus: Corpus = WIKINER_FRENCH()
+
+ # 2. what tag do we want to predict?
+ tag_type = 'ner'
+
+ # 3. make the tag dictionary from the corpus
+ tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
+
+ # 4. initialize each embedding we use
+ embedding_types = [
+
+     # French word embeddings (in Flair, 'fr' selects FastText vectors rather than GloVe)
+     WordEmbeddings('fr'),
+
+     # contextual string embeddings, forward
+     FlairEmbeddings('fr-forward'),
+
+     # contextual string embeddings, backward
+     FlairEmbeddings('fr-backward'),
+ ]
+
+ # embedding stack consists of Flair and French word embeddings
+ embeddings = StackedEmbeddings(embeddings=embedding_types)
+
+ # 5. initialize sequence tagger
+ from flair.models import SequenceTagger
+
+ tagger = SequenceTagger(hidden_size=256,
+                         embeddings=embeddings,
+                         tag_dictionary=tag_dictionary,
+                         tag_type=tag_type)
+
+ # 6. initialize trainer
+ from flair.trainers import ModelTrainer
+
+ trainer = ModelTrainer(tagger, corpus)
+
+ # 7. run training
+ trainer.train('resources/taggers/ner-french',
+               train_with_dev=True,
+               max_epochs=150)
+ ```
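Conceptually, stacking embeddings means concatenating, for each token, the vectors produced by each embedding in the list, so the stacked dimension is the sum of the individual dimensions. The sketch below illustrates only that idea in plain Python. The `stack` helper, the three toy embedders, and their tiny dimensions are hypothetical stand-ins, not Flair's `WordEmbeddings` or `FlairEmbeddings`.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Embedder = Callable[[str], Vector]


def stack(embedders: Sequence[Embedder]) -> Embedder:
    """Combine embedders by concatenating their per-token vectors."""
    def stacked(token: str) -> Vector:
        combined: Vector = []
        for embed in embedders:
            combined.extend(embed(token))
        return combined
    return stacked


# Hypothetical toy embedders with dimensions 2, 3, and 3.
def word_embedder(token: str) -> Vector:
    # Two hand-made features: token length and whether it is capitalized.
    return [float(len(token)), float(token[:1].isupper())]


def forward_embedder(token: str) -> Vector:
    return [0.1, 0.2, 0.3]


def backward_embedder(token: str) -> Vector:
    return [0.3, 0.2, 0.1]


embed = stack([word_embedder, forward_embedder, backward_embedder])
vector = embed("Washington")
print(len(vector))  # 2 + 3 + 3 = 8
print(vector)
```

In the training script, the sequence tagger receives the stacked representation as its input, so adding or removing an entry in `embedding_types` changes the input width of the model.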
+
+
+
+ ---
+
+ ### Cite
+
+ Please cite the following paper when using this model.
+
+ ```
+ @inproceedings{akbik2018coling,
+   title={Contextual String Embeddings for Sequence Labeling},
+   author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
+   booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
+   pages = {1638--1649},
+   year = {2018}
+ }
+ ```
+
+
+ ---
+
+ ### Issues?
+
+ The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79374cb9eced60d9bb0b3edcaf5a0e0c561b042526c20b911c4e9237a4d7ae4a
+ size 1296739140
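The three added lines form a Git LFS pointer: the repository stores this small text file, while the model weights themselves live in LFS storage. As a minimal sketch, the pointer can be parsed and used to check a downloaded file's size and SHA-256 digest. The helper names `parse_lfs_pointer` and `matches_pointer` and the local path in the usage comment are hypothetical.

```python
import hashlib
import os
from typing import Dict

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:79374cb9eced60d9bb0b3edcaf5a0e0c561b042526c20b911c4e9237a4d7ae4a
size 1296739140
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a pointer file into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def matches_pointer(path: str, pointer: Dict[str, str]) -> bool:
    """Check a local file's size and SHA-256 digest against the pointer."""
    algorithm, _, expected_digest = pointer["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    if os.path.getsize(path) != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large model files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest


pointer = parse_lfs_pointer(POINTER)
print(pointer["size"])  # size in bytes, kept as a string
# Usage (hypothetical local path): matches_pointer("pytorch_model.bin", pointer)
```

Checking the size first is a cheap way to reject truncated downloads before hashing a file of this size.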