---
license: mit
base_model: camembert-base
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: Camembert-base-frenchNER_4entities
  results: []
datasets:
- CATIE-AQ/frenchNER_4entities
language:
- fr
widget:
- text: "Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
library_name: transformers
pipeline_tag: token-classification
co2_eq_emissions: 35
---

# Camembert-base-frenchNER_4entities

## Model Description
We present **Camembert-base-frenchNER_4entities**, a [CamemBERT base](https://huggingface.co./camembert-base) model fine-tuned for Named Entity Recognition in French on four French NER datasets covering 4 entities (LOC, PER, ORG, MISC).
All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER_4entities](https://huggingface.co./datasets/CATIE-AQ/frenchNER_4entities).
It contains a total of **384,773** rows, of which **328,757** are for training, **24,131** for validation and **31,885** for testing.
Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).

## Dataset
The dataset used is [frenchNER_4entities](https://huggingface.co./datasets/CATIE-AQ/frenchNER_4entities), which represents ~385k sentences labeled with 4 entity categories, plus a background label:
* PER: personality;
* LOC: location;
* ORG: organization;
* MISC: miscellaneous;
* O: background (Outside entity).

The distribution of the entities is as follows:

| Splits | O | PER | LOC | ORG | MISC |
|---|---|---|---|---|---|
| train | A | B | C | D | E |
| validation | A | B | C | D | E |
| test | A | B | C | D | E |
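
The split sizes quoted in the model description can be sanity-checked with a few lines of arithmetic. The sketch below only reuses the four row counts stated in this card; the names `SPLITS`, `TOTAL_ROWS` and `split_shares` are illustrative and not part of any released code.

```python
# Row counts as quoted in the model description of this card.
SPLITS = {"train": 328_757, "validation": 24_131, "test": 31_885}
TOTAL_ROWS = 384_773


def split_shares(splits):
    """Return the fraction of all rows that falls into each split."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


# The three splits should add up to the quoted total.
assert sum(SPLITS.values()) == TOTAL_ROWS

shares = split_shares(SPLITS)
for name, share in shares.items():
    print(f"{name:<10} {SPLITS[name]:>7,} rows  {share:.1%}")
```

The training split accounts for roughly 85% of the rows, with the remainder shared between validation and test.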

| Model | Metrics | PER | LOC | ORG | MISC | O | Overall |
|---|---|---|---|---|---|---|---|
| Camembert-base-frenchNER_4entities | Precision | A | B | C | D | E | F |
| | Recall | A | B | C | D | E | F |
| | F1 | A | B | C | D | E | F |
| | Number | A | B | C | D | E | F |

| Model | Metrics | PER | LOC | ORG | MISC | O | Overall |
|---|---|---|---|---|---|---|---|
| Camembert-base-frenchNER_4entities | Precision | A | B | C | D | E | F |
| | Recall | A | B | C | D | E | F |
| | F1 | A | B | C | D | E | F |
| | Number | A | B | C | D | E | F |

| Model | Metrics | PER | LOC | ORG | MISC | O | Overall |
|---|---|---|---|---|---|---|---|
| Camembert-base-frenchNER_4entities | Precision | A | B | C | D | E | F |
| | Recall | A | B | C | D | E | F |
| | F1 | A | B | C | D | E | F |
| | Number | A | B | C | D | E | F |

| Model | Metrics | PER | LOC | ORG | MISC | O | Overall |
|---|---|---|---|---|---|---|---|
| Camembert-base-frenchNER_4entities | Precision | A | B | C | D | E | F |
| | Recall | A | B | C | D | E | F |
| | F1 | A | B | C | D | E | F |
| | Number | A | B | C | D | E | F |
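
To make the Precision, Recall, F1 and Number rows of the tables above concrete, here is a minimal sketch of how per-label scores can be computed. The card does not say how its scores were obtained, so this assumes token-level scoring (suggested by the presence of an O column) and reads "Number" as the count of gold tokens per label; both are assumptions. The function `per_label_scores` and the toy label sequences are hypothetical and are not model outputs.

```python
from collections import Counter

# The five labels listed in the Dataset section.
LABELS = ["PER", "LOC", "ORG", "MISC", "O"]


def per_label_scores(gold, pred, labels=LABELS):
    """Token-level precision, recall, F1 and gold count for each label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was expected but missed

    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "number": actual,  # assumed meaning of the "Number" row
        }
    return scores


# Hand-written toy sequences, for illustration only.
gold = ["PER", "PER", "O", "LOC", "O", "ORG", "MISC", "O"]
pred = ["PER", "O", "O", "LOC", "O", "LOC", "MISC", "O"]

scores = per_label_scores(gold, pred)
for label in LABELS:
    s = scores[label]
    print(f"{label:<5} P={s['precision']:.2f} R={s['recall']:.2f} "
          f"F1={s['f1']:.2f} N={s['number']}")
```

Entity-level (span-based) scoring, as done by libraries such as seqeval, would give different numbers; if the reported results were computed that way, this sketch illustrates only the formulas and not the exact procedure.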