bourdoiscatie committed on
Commit
947878f
·
1 Parent(s): 70f4c0d

Update README.md

Files changed (1)
  1. README.md +251 -26
README.md CHANGED
@@ -35,11 +35,12 @@ Our methodology is described in a blog post available in [English](https://blog.
35
  ## Dataset
36
 
37
  The dataset used is [frenchNER_4entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_4entities), which represents ~385k sentences labeled in 4 categories:
38
- * PER: personality ;
39
- * LOC: location ;
40
- * ORG: organization ;
41
- * MISC: miscellaneous ;
42
- * O: background (Outside entity).
 
43
 
44
  The distribution of the entities is as follows:
45
 
@@ -103,11 +104,67 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
103
  </thead>
104
  <tbody>
105
  <tr>
106
  <td rowspan="3"><br>Camembert-base-frenchNER_4entities</td>
107
  <td><br>Precision</td>
108
  <td><br>0.973</td>
109
  <td><br>0.951</td>
110
- <td><br>0.8877</td>
111
  <td><br>0.850</td>
112
  <td><br>0.993</td>
113
  <td><br>0.984</td>
@@ -123,12 +180,12 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
123
  </tr>
124
  <tr>
125
  <td>F1</td>
126
- <td><br>0.978</td>
127
- <td><br>0.958</td>
128
- <td><br>0.903</td>
129
- <td><br>0.814</td>
130
- <td><br>0.993</td>
131
- <td><br>0.984</td>
132
  </tr>
133
  </tbody>
134
  </table>
@@ -152,12 +209,68 @@ In detail:
152
  </tr>
153
  </thead>
154
  <tbody>
155
  <tr>
156
  <td rowspan="3"><br>Camembert-base-frenchNER_4entities</td>
157
  <td><br>Precision</td>
158
  <td><br>0.954</td>
159
  <td><br>0.893</td>
160
- <td><br>0.851/td>
161
  <td><br>0.849</td>
162
  <td><br>0.979</td>
163
  <td><br>0.954</td>
@@ -165,7 +278,7 @@ In detail:
165
  <tr>
166
  <td><br>Recall</td>
167
  <td><br>0.967</td>
168
- <td><br>0.887/td>
169
  <td><br>0.883</td>
170
  <td><br>0.855</td>
171
  <td><br>0.974</td>
@@ -173,12 +286,12 @@ In detail:
173
  </tr>
174
  <tr>
175
  <td>F1</td>
176
- <td><br>0.960</td>
177
- <td><br>0.890</td>
178
- <td><br>0.867</td>
179
- <td><br>0.852</td>
180
- <td><br>0.977</td>
181
- <td><br>0.954</td>
182
  </tr>
183
  </tbody>
184
  </table>
@@ -199,6 +312,62 @@ In detail:
199
  </tr>
200
  </thead>
201
  <tbody>
202
  <tr>
203
  <td rowspan="3"><br>Camembert-base-frenchNER_4entities</td>
204
  <td><br>Precision</td>
@@ -220,12 +389,12 @@ In detail:
220
  </tr>
221
  <tr>
222
  <td>F1</td>
223
- <td><br>0.985</td>
224
- <td><br>0.973</td>
225
- <td><br>0.938</td>
226
- <td><br>0.770</td>
227
- <td><br>0.992</td>
228
- <td><br>0.983</td>
229
  </tr>
230
  </tbody>
231
  </table>
@@ -247,6 +416,62 @@ In detail:
247
  </tr>
248
  </thead>
249
  <tbody>
250
  <tr>
251
  <td rowspan="3"><br>Camembert-base-frenchNER_4entities</td>
252
  <td><br>Precision</td>
 
35
  ## Dataset
36
 
37
  The dataset used is [frenchNER_4entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_4entities), which represents ~385k sentences labeled in 4 categories:
38
+ | Label | Examples |
39
+ |:------|:-----------------------------------------------------------|
40
+ | PER | "La Bruyère", "Gaspard de Coligny", "Wittgenstein" |
41
+ | ORG | "UTBM", "American Airlines", "id Software" |
42
+ | LOC | "République du Cap-Vert", "Créteil", "Bordeaux" |
43
+ | MISC | "Wolfenstein 3D", "Révolution française", "Coupe du monde" |
44
 
45
  The distribution of the entities is as follows:
46
 
 
104
  </thead>
105
  <tbody>
106
  <tr>
107
+ <td rowspan="3"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
108
+ <td><br>Precision</td>
109
+ <td><br>0.952</td>
110
+ <td><br>0.924</td>
111
+ <td><br>0.870</td>
112
+ <td><br>0.845</td>
113
+ <td><br>0.986</td>
114
+ <td><br>0.976</td>
115
+ </tr>
116
+ <tr>
117
+ <td><br>Recall</td>
118
+ <td><br>0.990</td>
119
+ <td><br>0.972</td>
120
+ <td><br>0.938</td>
121
+ <td><br>0.546</td>
122
+ <td><br>0.992</td>
123
+ <td><br>0.976</td>
124
+ </tr>
125
+ <tr>
126
+ <td>F1</td>
127
+ <td><br>0.971</td>
128
+ <td><br>0.947</td>
129
+ <td><br>0.902</td>
130
+ <td><br>0.663</td>
131
+ <td><br>0.989</td>
132
+ <td><br>0.976</td>
133
+ </tr>
134
+ <tr>
135
+ <td rowspan="3"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
136
+ <td><br>Precision</td>
137
+ <td><br>0.962</td>
138
+ <td><br>0.933</td>
139
+ <td><br>0.857</td>
140
+ <td><br>0.830</td>
141
+ <td><br>0.985</td>
142
+ <td><br>0.976</td>
143
+ </tr>
144
+ <tr>
145
+ <td><br>Recall</td>
146
+ <td><br>0.987</td>
147
+ <td><br>0.963</td>
148
+ <td><br>0.930</td>
149
+ <td><br>0.545</td>
150
+ <td><br>0.993</td>
151
+ <td><br>0.976</td>
152
+ </tr>
153
+ <tr>
154
+ <td>F1</td>
155
+ <td><br>0.974</td>
156
+ <td><br>0.948</td>
157
+ <td><br>0.892</td>
158
+ <td><br>0.658</td>
159
+ <td><br>0.989</td>
160
+ <td><br>0.976</td>
161
+ </tr>
162
+ <tr>
163
  <td rowspan="3"><br>Camembert-base-frenchNER_4entities</td>
164
  <td><br>Precision</td>
165
  <td><br>0.973</td>
166
  <td><br>0.951</td>
167
+ <td><br>0.888</td>
168
  <td><br>0.850</td>
169
  <td><br>0.993</td>
170
  <td><br>0.984</td>
 
180
  </tr>
181
  <tr>
182
  <td>F1</td>
183
+ <td><br><b>0.978</b></td>
184
+ <td><br><b>0.958</b></td>
185
+ <td><br><b>0.903</b></td>
186
+ <td><br><b>0.814</b></td>
187
+ <td><br><b>0.993</b></td>
188
+ <td><br><b>0.984</b></td>
189
  </tr>
190
  </tbody>
191
  </table>
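
Review note: the F1 rows in the table above can be sanity-checked against the Precision and Recall rows, assuming each F1 cell is the harmonic mean of the two cells above it in the same column. The helper name `f1_score` is hypothetical (it belongs neither to the README nor to the `evaluate` library), and the sample values are taken from the first score column of the two baseline models added in this commit. Averaged columns may be computed differently, so treat this as a rough sketch rather than the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, F1 as reported) from the first score column of the table.
reported = {
    "Jean-Baptiste/camembert-ner": (0.952, 0.990, 0.971),
    "cmarkea/distilcamembert-base-ner": (0.962, 0.987, 0.974),
}

for model, (p, r, f1) in reported.items():
    # Round to 3 decimals to match the precision used in the README tables.
    recomputed = round(f1_score(p, r), 3)
    print(f"{model}: reported={f1}, recomputed={recomputed}")
```

If a recomputed value differs from the reported one by more than rounding error, the cell is worth re-checking before merging.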
 
209
  </tr>
210
  </thead>
211
  <tbody>
212
+ <tr>
213
+ <td rowspan="3"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
214
+ <td><br>Precision</td>
215
+ <td><br>0.908</td>
216
+ <td><br>0.717</td>
217
+ <td><br>0.753</td>
218
+ <td><br>0.620</td>
219
+ <td><br>0.936</td>
220
+ <td><br>0.889</td>
221
+ </tr>
222
+ <tr>
223
+ <td><br>Recall</td>
224
+ <td><br>0.975</td>
225
+ <td><br>0.811</td>
226
+ <td><br>0.696</td>
227
+ <td><br>0.511</td>
228
+ <td><br>0.938</td>
229
+ <td><br>0.889</td>
230
+ </tr>
231
+ <tr>
232
+ <td>F1</td>
233
+ <td><br>0.940</td>
234
+ <td><br>0.761</td>
235
+ <td><br>0.723</td>
236
+ <td><br>0.560</td>
237
+ <td><br>0.937</td>
238
+ <td><br>0.889</td>
239
+ </tr>
240
+ <tr>
241
+ <td rowspan="3"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
242
+ <td><br>Precision</td>
243
+ <td><br>0.885</td>
244
+ <td><br>0.738</td>
245
+ <td><br>0.737</td>
246
+ <td><br>0.589</td>
247
+ <td><br>0.928</td>
248
+ <td><br>0.881</td>
249
+ </tr>
250
+ <tr>
251
+ <td><br>Recall</td>
252
+ <td><br>0.960</td>
253
+ <td><br>0.759</td>
254
+ <td><br>0.655</td>
255
+ <td><br>0.482</td>
256
+ <td><br>0.939</td>
257
+ <td><br>0.881</td>
258
+ </tr>
259
+ <tr>
260
+ <td>F1</td>
261
+ <td><br>0.921</td>
262
+ <td><br>0.748</td>
263
+ <td><br>0.694</td>
264
+ <td><br>0.530</td>
265
+ <td><br>0.934</td>
266
+ <td><br>0.881</td>
267
+ </tr>
268
  <tr>
269
  <td rowspan="3"><br>Camembert-base-frenchNER_4entities</td>
270
  <td><br>Precision</td>
271
  <td><br>0.954</td>
272
  <td><br>0.893</td>
273
+ <td><br>0.851</td>
274
  <td><br>0.849</td>
275
  <td><br>0.979</td>
276
  <td><br>0.954</td>
 
278
  <tr>
279
  <td><br>Recall</td>
280
  <td><br>0.967</td>
281
+ <td><br>0.887</td>
282
  <td><br>0.883</td>
283
  <td><br>0.855</td>
284
  <td><br>0.974</td>
 
286
  </tr>
287
  <tr>
288
  <td>F1</td>
289
+ <td><br><b>0.960</b></td>
290
+ <td><br><b>0.890</b></td>
291
+ <td><br><b>0.867</b></td>
292
+ <td><br><b>0.852</b></td>
293
+ <td><br><b>0.977</b></td>
294
+ <td><br><b>0.954</b></td>
295
  </tr>
296
  </tbody>
297
  </table>
 
312
  </tr>
313
  </thead>
314
  <tbody>
315
+ <tr>
316
+ <td rowspan="3"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
317
+ <td><br>Precision</td>
318
+ <td><br>0.931</td>
319
+ <td><br>0.893</td>
320
+ <td><br>0.827</td>
321
+ <td><br>0.725</td>
322
+ <td><br>0.979</td>
323
+ <td><br>0.966</td>
324
+ </tr>
325
+ <tr>
326
+ <td><br>Recall</td>
327
+ <td><br>0.994</td>
328
+ <td><br>0.980</td>
329
+ <td><br>0.959</td>
330
+ <td><br>0.295</td>
331
+ <td><br>0.990</td>
332
+ <td><br>0.966</td>
333
+ </tr>
334
+ <tr>
335
+ <td>F1</td>
336
+ <td><br>0.962</td>
337
+ <td><br>0.934</td>
338
+ <td><br>0.888</td>
339
+ <td><br>0.419</td>
340
+ <td><br>0.984</td>
341
+ <td><br>0.966</td>
342
+ </tr>
343
+ <tr>
344
+ <td rowspan="3"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
345
+ <td><br>Precision</td>
346
+ <td><br>0.954</td>
347
+ <td><br>0.908</td>
348
+ <td><br>0.817</td>
349
+ <td><br>0.705</td>
350
+ <td><br>0.977</td>
351
+ <td><br>0.967</td>
352
+ </tr>
353
+ <tr>
354
+ <td><br>Recall</td>
355
+ <td><br>0.991</td>
356
+ <td><br>0.969</td>
357
+ <td><br>0.963</td>
358
+ <td><br>0.310</td>
359
+ <td><br>0.990</td>
360
+ <td><br>0.967</td>
361
+ </tr>
362
+ <tr>
363
+ <td>F1</td>
364
+ <td><br>0.972</td>
365
+ <td><br>0.938</td>
366
+ <td><br>0.884</td>
367
+ <td><br>0.430</td>
368
+ <td><br>0.984</td>
369
+ <td><br>0.967</td>
370
+ </tr>
371
  <tr>
372
  <td rowspan="3"><br>Camembert-base-frenchNER_4entities</td>
373
  <td><br>Precision</td>
 
389
  </tr>
390
  <tr>
391
  <td>F1</td>
392
+ <td><br><b>0.985</b></td>
393
+ <td><br><b>0.973</b></td>
394
+ <td><br><b>0.938</b></td>
395
+ <td><br><b>0.770</b></td>
396
+ <td><br><b>0.992</b></td>
397
+ <td><br><b>0.983</b></td>
398
  </tr>
399
  </tbody>
400
  </table>
 
416
  </tr>
417
  </thead>
418
  <tbody>
419
+ <tr>
420
+ <td rowspan="3"><br><a href="https://hf.co/Jean-Baptiste/camembert-ner">Jean-Baptiste/camembert-ner</a></td>
421
+ <td><br>Precision</td>
422
+ <td><br><b>0.986</b></td>
423
+ <td><br><b>0.962</b></td>
424
+ <td><br><b>0.925</b></td>
425
+ <td><br><b>0.943</b></td>
426
+ <td><br><b>0.998</b></td>
427
+ <td><br><b>0.992</b></td>
428
+ </tr>
429
+ <tr>
430
+ <td><br>Recall</td>
431
+ <td><br><b>0.987</b></td>
432
+ <td><br><b>0.969</b></td>
433
+ <td><br><b>0.951</b></td>
434
+ <td><br><b>0.933</b></td>
435
+ <td><br><b>0.997</b></td>
436
+ <td><br><b>0.992</b></td>
437
+ </tr>
438
+ <tr>
439
+ <td>F1</td>
440
+ <td><br><b>0.986</b></td>
441
+ <td><br><b>0.966</b></td>
442
+ <td><br><b>0.938</b></td>
443
+ <td><br><b>0.938</b></td>
444
+ <td><br><b>0.998</b></td>
445
+ <td><br><b>0.992</b></td>
446
+ </tr>
447
+ <tr>
448
+ <td rowspan="3"><br><a href="https://hf.co/cmarkea/distilcamembert-base-ner">cmarkea/distilcamembert-base-ner</a></td>
449
+ <td><br>Precision</td>
450
+ <td><br>0.982</td>
451
+ <td><br>0.951</td>
452
+ <td><br>0.910</td>
453
+ <td><br>0.942</td>
454
+ <td><br>0.997</td>
455
+ <td><br>0.991</td>
456
+ </tr>
457
+ <tr>
458
+ <td><br>Recall</td>
459
+ <td><br>0.985</td>
460
+ <td><br>0.963</td>
461
+ <td><br>0.940</td>
462
+ <td><br>0.910</td>
463
+ <td><br>0.998</td>
464
+ <td><br>0.991</td>
465
+ </tr>
466
+ <tr>
467
+ <td>F1</td>
468
+ <td><br>0.983</td>
469
+ <td><br>0.964</td>
470
+ <td><br>0.925</td>
471
+ <td><br>0.926</td>
472
+ <td><br>0.997</td>
473
+ <td><br>0.991</td>
474
+ </tr>
475
  <tr>
476
  <td rowspan="3"><br>Camembert-base-frenchNER_4entities</td>
477
  <td><br>Precision</td>