AntoineBourgois committed on
Commit 1316269 · verified · 1 parent: 85548f1

Upload 3 files

Files changed (3)
  1. JCLS_model_card.md +48 -48
  2. README.md +50 -50
  3. final_model.pkl +2 -2
JCLS_model_card.md CHANGED
@@ -6,7 +6,7 @@ tags:
  - camembert
  - literary-texts
  - nested-entities
- - BookBLP-fr
+ - BookNLP-fr
  license: apache-2.0
  metrics:
  - f1
@@ -18,7 +18,7 @@ pipeline_tag: token-classification
  ---

  ## INTRODUCTION:
- This model, developed as part of the [BookNLP-fr project](https://github.com/lattice-8094/fr-litbank), is a NER model built on top of [camembert-large](https://huggingface.co/almanach/camembert-large) embeddings, trained to predict nested entities in french, specifically for literary texts.
+ This model, developed as part of the [BookNLP-fr project](https://github.com/lattice-8094/fr-litbank), is a **NER model** built on top of [camembert-large](https://huggingface.co/almanach/camembert-large) embeddings, trained to predict nested entities in french, specifically for literary texts.

  The predicted entities are:
  - mentions of characters (PER): pronouns (je, tu, il, ...), possessive pronouns (mon, ton, son, ...), common nouns (le capitaine, la princesse, ...) and proper nouns (Indiana Delmare, Honoré de Pardaillan, ...)
@@ -31,14 +31,14 @@ The predicted entities are:
  ## MODEL PERFORMANCES (LOOCV):
  | NER_tag | precision | recall | f1_score | support | support % |
  |-----------|-------------|----------|------------|-----------|-------------|
- | PER | 91.30% | 95.59% | 93.40% | 4,061 | 85.80% |
- | FAC | 75.68% | 75.00% | 75.34% | 224 | 4.73% |
- | TIME | 68.61% | 71.50% | 70.02% | 214 | 4.52% |
- | LOC | 63.54% | 55.45% | 59.22% | 110 | 2.32% |
- | GPE | 78.33% | 73.44% | 75.81% | 64 | 1.35% |
- | VEH | 64.52% | 66.67% | 65.57% | 60 | 1.27% |
- | micro_avg | 88.37% | 91.93% | 90.10% | 4,733 | 100.00% |
- | macro_avg | 73.66% | 72.94% | 73.23% | 4,733 | 100.00% |
+ | PER | 92.97% | 96.25% | 94.58% | 4,162 | 86.12% |
+ | FAC | 76.58% | 75.89% | 76.23% | 224 | 4.63% |
+ | TIME | 66.97% | 69.48% | 68.20% | 213 | 4.41% |
+ | LOC | 70.00% | 57.27% | 63.00% | 110 | 2.28% |
+ | GPE | 80.65% | 78.12% | 79.37% | 64 | 1.32% |
+ | VEH | 57.75% | 68.33% | 62.60% | 60 | 1.24% |
+ | micro_avg | 89.94% | 92.65% | 91.25% | 4,833 | 100.00% |
+ | macro_avg | 74.15% | 74.23% | 74.00% | 4,833 | 100.00% |

  ## TRAINING PARAMETERS:
  - Entities types: ['PER', 'LOC', 'FAC', 'TIME', 'VEH', 'GPE']
@@ -75,48 +75,48 @@ Model Output: BIOES labels sequence
  *** IN CONSTRUCTION ***

  ## TRAINING CORPUS:
- | | Document | Tokens Count | Is included in model eval |
- |----|----------------------------------------------------------------|----------------|-----------------------------------|
- | 0 | 1836_Gautier-Theophile_La-morte-amoureuse | 14,299 tokens | False |
- | 1 | 1840_Sand-George_Pauline | 12,315 tokens | False |
- | 2 | 1842_Balzac-Honore-de_La-Maison-du-chat-qui-pelote | 24,776 tokens | False |
- | 3 | 1844_Balzac-Honore-de_La-Maison-Nucingen | 30,987 tokens | False |
- | 4 | 1844_Balzac-Honore-de_Sarrasine | 15,408 tokens | False |
- | 5 | 1856_Cousin-Victor_Madame-de-Hautefort | 11,768 tokens | False |
- | 6 | 1863_Gautier-Theophile_Le-capitaine-Fracasse | 11,834 tokens | False |
- | 7 | 1873_Zola-Emile_Le-ventre-de-Paris | 12,557 tokens | False |
- | 8 | 1881_Flaubert-Gustave_Bouvard-et-Pecuchet | 12,281 tokens | False |
- | 9 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-1_1-MADEMOISELLE-FIFI | 5,425 tokens | True |
- | 10 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-1_2-MADAME-BAPTISTE | 2,554 tokens | True |
- | 11 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-1_3-LA-ROUILLE | 2,929 tokens | True |
- | 12 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-2_1-MARROCA | 4,067 tokens | False |
- | 13 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-2_2-LA-BUCHE | 2,251 tokens | False |
- | 14 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-2_3-LA-RELIQUE | 2,034 tokens | False |
- | 15 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_1-FOU | 1,864 tokens | False |
- | 16 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_2-REVEIL | 2,141 tokens | False |
- | 17 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_3-UNE-RUSE | 2,441 tokens | False |
- | 18 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_4-A-CHEVAL | 2,860 tokens | False |
- | 19 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_5-UN-REVEILLON | 2,343 tokens | False |
- | 20 | 1901_Lucie-Achard_Rosalie-de-Constant-sa-famille-et-ses-amis | 12,703 tokens | False |
- | 21 | 1903_Conan-Laure_Elisabeth_Seton | 13,023 tokens | False |
- | 22 | 1904_Rolland-Romain_Jean-Christophe_Tome-I-L-aube | 10,982 tokens | True |
- | 23 | 1904_Rolland-Romain_Jean-Christophe_Tome-II-Le-matin | 10,305 tokens | False |
- | 24 | 1917_Adèle-Bourgeois_Némoville | 12,389 tokens | False |
- | 25 | 1923_Radiguet-Raymond_Le-diable-au-corps | 14,637 tokens | False |
- | 26 | 1926_Audoux-Marguerite_De-la-ville-au-moulin | 11,902 tokens | True |
- | 27 | 1937_Audoux-Marguerite_Douce-Lumiere | 12,285 tokens | False |
- | 28 | TOTAL | 275,360 tokens | 5 files used for cross-validation |
+ | | Document | Tokens Count | Is included in model eval |
+ |----|---------------------------------------------------------------------------------|----------------|-----------------------------------|
+ | 0 | 1830_Balzac-Honoré-de_La-maison-du-chat-qui-pelote | 24,776 tokens | False |
+ | 1 | 1830_Balzac-Honoré-de_Sarrasine | 15,408 tokens | False |
+ | 2 | 1836_Gautier-Théophile_La-morte-amoureuse | 14,293 tokens | False |
+ | 3 | 1837_Balzac-Honoré-de_La-maison-Nucingen | 30,034 tokens | False |
+ | 4 | 1841_Sand-George_Pauline | 12,398 tokens | False |
+ | 5 | 1856_Cousin-Victor_Madame-de-Hautefort | 11,768 tokens | False |
+ | 6 | 1863_Gautier-Théophile_Le-capitaine-Fracasse | 11,848 tokens | False |
+ | 7 | 1873_Zola-Émile_Le-ventre-de-Paris | 12,613 tokens | False |
+ | 8 | 1881_Flaubert-Gustave_Bouvard-et-Pécuchet | 12,308 tokens | False |
+ | 9 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-buche | 2,267 tokens | False |
+ | 10 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-relique | 2,041 tokens | False |
+ | 11 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-rouille | 2,949 tokens | True |
+ | 12 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Madame-Baptiste | 2,578 tokens | True |
+ | 13 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Marocca | 4,078 tokens | False |
+ | 14 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-A-cheval | 2,878 tokens | False |
+ | 15 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Fou | 1,905 tokens | False |
+ | 16 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Mademoiselle-Fifi | 5,439 tokens | True |
+ | 17 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Reveil | 2,159 tokens | False |
+ | 18 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Un-reveillon | 2,364 tokens | False |
+ | 19 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Une-ruse | 2,469 tokens | False |
+ | 20 | 1901_Achard-Lucie_Rosalie-de-Constant-sa-famille-et-ses-amis | 12,775 tokens | False |
+ | 21 | 1903_Conan-Laure_Élisabeth-Seton | 13,046 tokens | False |
+ | 22 | 1904-1912_Rolland-Romain_Jean-Christophe(1) | 10,982 tokens | True |
+ | 23 | 1904-1912_Rolland-Romain_Jean-Christophe(2) | 10,305 tokens | False |
+ | 24 | 1917_Bourgeois-Adèle_Némoville | 12,468 tokens | False |
+ | 25 | 1923_Radiguet-Raymond_Le-diable-au-corps | 14,850 tokens | False |
+ | 26 | 1926_Audoux-Marguerite_De-la-ville-au-moulin | 12,144 tokens | True |
+ | 27 | 1937_Audoux-Marguerite_Douce-Lumière | 12,346 tokens | False |
+ | 28 | TOTAL | 275,489 tokens | 5 files used for cross-validation |

  ## PREDICTIONS CONFUSION MATRIX:
  | Gold Labels | PER | FAC | TIME | LOC | GPE | VEH | O | support |
  |---------------|-------|-------|--------|-------|-------|-------|-----|-----------|
- | PER | 3,882 | 3 | 2 | 2 | 2 | 0 | 170 | 4,061 |
- | FAC | 7 | 168 | 0 | 0 | 0 | 1 | 48 | 224 |
- | TIME | 1 | 0 | 153 | 0 | 0 | 0 | 60 | 214 |
- | LOC | 0 | 1 | 0 | 61 | 7 | 0 | 41 | 110 |
- | GPE | 1 | 3 | 0 | 3 | 47 | 0 | 10 | 64 |
- | VEH | 2 | 0 | 1 | 0 | 0 | 40 | 17 | 60 |
- | O | 357 | 47 | 66 | 30 | 4 | 21 | 0 | 525 |
+ | PER | 4,006 | 0 | 2 | 1 | 1 | 3 | 149 | 4,162 |
+ | FAC | 8 | 170 | 0 | 2 | 0 | 1 | 43 | 224 |
+ | TIME | 1 | 0 | 148 | 0 | 0 | 0 | 64 | 213 |
+ | LOC | 0 | 2 | 0 | 63 | 6 | 0 | 39 | 110 |
+ | GPE | 2 | 1 | 0 | 3 | 50 | 0 | 8 | 64 |
+ | VEH | 3 | 0 | 0 | 0 | 0 | 41 | 16 | 60 |
+ | O | 287 | 49 | 70 | 21 | 5 | 26 | 0 | 458 |

  ## CONTACT:
  mail: antoine [dot] bourgois [at] protonmail [dot] com
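The metric rows and the confusion matrix in the updated JCLS card are related by standard definitions. The following minimal Python sketch makes three assumptions, all consistent with the updated numbers up to rounding:

- Recall is the diagonal count divided by the gold row sum.
- F1 is the harmonic mean of precision and recall.
- `macro_avg` is the unweighted mean of the per-class rows.

Precision is not recomputed, because the matrix columns do not reproduce every reported precision exactly. The names `CONFUSION`, `REPORTED`, `recall_from_row` and `macro` are illustrative and are not part of BookNLP-fr.

```python
from statistics import mean

# Column order of the confusion matrix in the card.
LABELS = ["PER", "FAC", "TIME", "LOC", "GPE", "VEH", "O"]

# Updated JCLS confusion matrix: gold label -> predicted counts (LABELS order).
CONFUSION = {
    "PER":  [4006,   0,   2,  1,  1,  3, 149],
    "FAC":  [   8, 170,   0,  2,  0,  1,  43],
    "TIME": [   1,   0, 148,  0,  0,  0,  64],
    "LOC":  [   0,   2,   0, 63,  6,  0,  39],
    "GPE":  [   2,   1,   0,  3, 50,  0,   8],
    "VEH":  [   3,   0,   0,  0,  0, 41,  16],
}

# Updated JCLS table: tag -> (precision, recall, f1), in percent.
REPORTED = {
    "PER":  (92.97, 96.25, 94.58),
    "FAC":  (76.58, 75.89, 76.23),
    "TIME": (66.97, 69.48, 68.20),
    "LOC":  (70.00, 57.27, 63.00),
    "GPE":  (80.65, 78.12, 79.37),
    "VEH":  (57.75, 68.33, 62.60),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def recall_from_row(label: str) -> float:
    """Recall in percent: correct predictions for a gold label / its row sum (support)."""
    row = CONFUSION[label]
    return 100 * row[LABELS.index(label)] / sum(row)


def macro(index: int) -> float:
    """Unweighted mean over entity types of column `index` (0=P, 1=R, 2=F1) of REPORTED."""
    return mean(scores[index] for scores in REPORTED.values())


if __name__ == "__main__":
    for tag, (p, r, reported_f1) in REPORTED.items():
        print(f"{tag:5} recall={recall_from_row(tag):6.2f} (table {r:.2f})  "
              f"f1={f1(p, r):6.2f} (table {reported_f1:.2f})")
    print(f"macro P/R/F1 = {macro(0):.2f} / {macro(1):.2f} / {macro(2):.2f}")
```

Under this reading, the table's macro F1 (74.00%) is the mean of the per-class F1 scores. The harmonic mean of macro precision and macro recall would instead give about 74.19%.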
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
  - camembert
  - literary-texts
  - nested-entities
- - BookBLP-fr
+ - BookNLP-fr
  license: apache-2.0
  metrics:
  - f1
@@ -18,7 +18,7 @@ pipeline_tag: token-classification
  ---

  ## INTRODUCTION:
- This model, developed as part of the [BookNLP-fr project](https://github.com/lattice-8094/fr-litbank), is a NER model built on top of [camembert-large](https://huggingface.co/almanach/camembert-large) embeddings, trained to predict nested entities in french, specifically for literary texts.
+ This model, developed as part of the [BookNLP-fr project](https://github.com/lattice-8094/fr-litbank), is a **NER model** built on top of [camembert-large](https://huggingface.co/almanach/camembert-large) embeddings, trained to predict nested entities in french, specifically for literary texts.

  The predicted entities are:
  - mentions of characters (PER): pronouns (je, tu, il, ...), possessive pronouns (mon, ton, son, ...), common nouns (le capitaine, la princesse, ...) and proper nouns (Indiana Delmare, Honoré de Pardaillan, ...)
@@ -31,14 +31,14 @@ The predicted entities are:
  ## MODEL PERFORMANCES (LOOCV):
  | NER_tag | precision | recall | f1_score | support | support % |
  |-----------|-------------|----------|------------|-----------|-------------|
- | PER | 90.58% | 93.52% | 92.03% | 31,570 | 83.87% |
- | FAC | 70.49% | 71.75% | 71.12% | 2,294 | 6.09% |
- | TIME | 58.40% | 58.68% | 58.54% | 1,670 | 4.44% |
- | GPE | 76.69% | 74.05% | 75.35% | 871 | 2.31% |
- | LOC | 60.92% | 44.37% | 51.35% | 773 | 2.05% |
- | VEH | 66.18% | 49.25% | 56.47% | 465 | 1.24% |
- | micro_avg | 86.70% | 88.64% | 87.61% | 37,643 | 100.00% |
- | macro_avg | 70.55% | 65.27% | 67.48% | 37,643 | 100.00% |
+ | PER | 92.78% | 94.29% | 93.53% | 10,354 | 86.92% |
+ | FAC | 69.81% | 69.92% | 69.87% | 635 | 5.33% |
+ | TIME | 64.21% | 62.12% | 63.15% | 462 | 3.88% |
+ | LOC | 63.50% | 46.28% | 53.54% | 188 | 1.58% |
+ | GPE | 79.86% | 74.68% | 77.18% | 154 | 1.29% |
+ | VEH | 61.82% | 57.14% | 59.39% | 119 | 1.00% |
+ | micro_avg | 89.51% | 90.36% | 89.91% | 11,912 | 100.00% |
+ | macro_avg | 72.00% | 67.40% | 69.44% | 11,912 | 100.00% |

  ## TRAINING PARAMETERS:
  - Entities types: ['PER', 'LOC', 'FAC', 'TIME', 'VEH', 'GPE']
@@ -75,48 +75,48 @@ Model Output: BIOES labels sequence
  *** IN CONSTRUCTION ***

  ## TRAINING CORPUS:
- | | Document | Tokens Count | Is included in model eval |
- |----|----------------------------------------------------------------|----------------|------------------------------------|
- | 0 | 1836_Gautier-Theophile_La-morte-amoureuse | 14,299 tokens | True |
- | 1 | 1840_Sand-George_Pauline | 12,315 tokens | True |
- | 2 | 1842_Balzac-Honore-de_La-Maison-du-chat-qui-pelote | 24,776 tokens | True |
- | 3 | 1844_Balzac-Honore-de_La-Maison-Nucingen | 30,987 tokens | True |
- | 4 | 1844_Balzac-Honore-de_Sarrasine | 15,408 tokens | True |
- | 5 | 1856_Cousin-Victor_Madame-de-Hautefort | 11,768 tokens | True |
- | 6 | 1863_Gautier-Theophile_Le-capitaine-Fracasse | 11,834 tokens | True |
- | 7 | 1873_Zola-Emile_Le-ventre-de-Paris | 12,557 tokens | True |
- | 8 | 1881_Flaubert-Gustave_Bouvard-et-Pecuchet | 12,281 tokens | True |
- | 9 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-1_1-MADEMOISELLE-FIFI | 5,425 tokens | True |
- | 10 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-1_2-MADAME-BAPTISTE | 2,554 tokens | True |
- | 11 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-1_3-LA-ROUILLE | 2,929 tokens | True |
- | 12 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-2_1-MARROCA | 4,067 tokens | True |
- | 13 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-2_2-LA-BUCHE | 2,251 tokens | True |
- | 14 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-2_3-LA-RELIQUE | 2,034 tokens | True |
- | 15 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_1-FOU | 1,864 tokens | True |
- | 16 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_2-REVEIL | 2,141 tokens | True |
- | 17 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_3-UNE-RUSE | 2,441 tokens | True |
- | 18 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_4-A-CHEVAL | 2,860 tokens | True |
- | 19 | 1882_Guy-de-Maupassant_Mademoiselle-Fifi-3_5-UN-REVEILLON | 2,343 tokens | True |
- | 20 | 1901_Lucie-Achard_Rosalie-de-Constant-sa-famille-et-ses-amis | 12,703 tokens | True |
- | 21 | 1903_Conan-Laure_Elisabeth_Seton | 13,023 tokens | True |
- | 22 | 1904_Rolland-Romain_Jean-Christophe_Tome-I-L-aube | 10,982 tokens | True |
- | 23 | 1904_Rolland-Romain_Jean-Christophe_Tome-II-Le-matin | 10,305 tokens | True |
- | 24 | 1917_Adèle-Bourgeois_Némoville | 12,389 tokens | True |
- | 25 | 1923_Radiguet-Raymond_Le-diable-au-corps | 14,637 tokens | True |
- | 26 | 1926_Audoux-Marguerite_De-la-ville-au-moulin | 11,902 tokens | True |
- | 27 | 1937_Audoux-Marguerite_Douce-Lumiere | 12,285 tokens | True |
- | 28 | TOTAL | 275,360 tokens | 28 files used for cross-validation |
+ | | Document | Tokens Count | Is included in model eval |
+ |----|---------------------------------------------------------------------------------|----------------|-----------------------------------|
+ | 0 | 1830_Balzac-Honoré-de_La-maison-du-chat-qui-pelote | 24,776 tokens | True |
+ | 1 | 1830_Balzac-Honoré-de_Sarrasine | 15,408 tokens | True |
+ | 2 | 1836_Gautier-Théophile_La-morte-amoureuse | 14,293 tokens | True |
+ | 3 | 1837_Balzac-Honoré-de_La-maison-Nucingen | 30,034 tokens | False |
+ | 4 | 1841_Sand-George_Pauline | 12,398 tokens | False |
+ | 5 | 1856_Cousin-Victor_Madame-de-Hautefort | 11,768 tokens | False |
+ | 6 | 1863_Gautier-Théophile_Le-capitaine-Fracasse | 11,848 tokens | False |
+ | 7 | 1873_Zola-Émile_Le-ventre-de-Paris | 12,613 tokens | False |
+ | 8 | 1881_Flaubert-Gustave_Bouvard-et-Pécuchet | 12,308 tokens | False |
+ | 9 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-buche | 2,267 tokens | False |
+ | 10 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-relique | 2,041 tokens | False |
+ | 11 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-rouille | 2,949 tokens | True |
+ | 12 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Madame-Baptiste | 2,578 tokens | True |
+ | 13 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Marocca | 4,078 tokens | False |
+ | 14 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-A-cheval | 2,878 tokens | False |
+ | 15 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Fou | 1,905 tokens | False |
+ | 16 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Mademoiselle-Fifi | 5,439 tokens | True |
+ | 17 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Reveil | 2,159 tokens | False |
+ | 18 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Un-reveillon | 2,364 tokens | False |
+ | 19 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Une-ruse | 2,469 tokens | False |
+ | 20 | 1901_Achard-Lucie_Rosalie-de-Constant-sa-famille-et-ses-amis | 12,775 tokens | False |
+ | 21 | 1903_Conan-Laure_Élisabeth-Seton | 13,046 tokens | False |
+ | 22 | 1904-1912_Rolland-Romain_Jean-Christophe(1) | 10,982 tokens | True |
+ | 23 | 1904-1912_Rolland-Romain_Jean-Christophe(2) | 10,305 tokens | False |
+ | 24 | 1917_Bourgeois-Adèle_Némoville | 12,468 tokens | False |
+ | 25 | 1923_Radiguet-Raymond_Le-diable-au-corps | 14,850 tokens | False |
+ | 26 | 1926_Audoux-Marguerite_De-la-ville-au-moulin | 12,144 tokens | True |
+ | 27 | 1937_Audoux-Marguerite_Douce-Lumière | 12,346 tokens | False |
+ | 28 | TOTAL | 275,489 tokens | 8 files used for cross-validation |

  ## PREDICTIONS CONFUSION MATRIX:
- | Gold Labels | PER | FAC | TIME | GPE | LOC | VEH | O | support |
- |---------------|--------|-------|--------|-------|-------|-------|-------|-----------|
- | PER | 29,525 | 27 | 13 | 6 | 7 | 26 | 1,966 | 31,570 |
- | FAC | 43 | 1,646 | 0 | 17 | 12 | 2 | 574 | 2,294 |
- | TIME | 5 | 1 | 980 | 1 | 1 | 0 | 682 | 1,670 |
- | GPE | 18 | 28 | 1 | 645 | 27 | 0 | 152 | 871 |
- | LOC | 5 | 63 | 0 | 54 | 343 | 0 | 308 | 773 |
- | VEH | 58 | 8 | 1 | 0 | 0 | 229 | 169 | 465 |
- | O | 2,902 | 532 | 682 | 110 | 167 | 89 | 0 | 4,482 |
+ | Gold Labels | PER | FAC | TIME | LOC | GPE | VEH | O | support |
+ |---------------|-------|-------|--------|-------|-------|-------|-----|-----------|
+ | PER | 9,763 | 3 | 6 | 1 | 1 | 6 | 574 | 10,354 |
+ | FAC | 27 | 444 | 1 | 4 | 4 | 1 | 154 | 635 |
+ | TIME | 1 | 0 | 287 | 0 | 0 | 0 | 174 | 462 |
+ | LOC | 1 | 13 | 0 | 87 | 11 | 0 | 76 | 188 |
+ | GPE | 3 | 2 | 1 | 8 | 115 | 0 | 25 | 154 |
+ | VEH | 12 | 1 | 0 | 0 | 0 | 68 | 38 | 119 |
+ | O | 709 | 168 | 151 | 37 | 13 | 35 | 0 | 1,113 |

  ## CONTACT:
  mail: antoine [dot] bourgois [at] protonmail [dot] com
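Both cards report leave-one-out cross-validation (LOOCV) and flag, per document, whether the file was used for evaluation. The cards do not spell out the procedure. The following is a minimal sketch of document-level folds under two assumptions:

- Each flagged document is held out once, while training uses every other document in the corpus.
- Per-fold counts are summed before scoring.

Training and tagging are left out. `Fold`, `loocv_folds` and `pool` are hypothetical helpers, not BookNLP-fr code.

```python
from collections import Counter
from typing import Iterator, NamedTuple


class Fold(NamedTuple):
    held_out: str      # document evaluated in this fold
    train: list[str]   # every other document in the corpus


def loocv_folds(documents: dict[str, bool]) -> Iterator[Fold]:
    """Yield one fold per document whose eval flag is True.

    `documents` maps a document name to its "Is included in model eval" flag.
    Assumption: each fold trains on all other documents, flagged or not.
    """
    names = list(documents)
    for name, is_eval in documents.items():
        if is_eval:
            yield Fold(held_out=name, train=[d for d in names if d != name])


def pool(per_fold_counts: list[Counter]) -> Counter:
    """Sum per-fold counts (e.g. correct predictions per tag) before computing scores."""
    total: Counter = Counter()
    for counts in per_fold_counts:
        total.update(counts)
    return total


# Four rows of the updated README.md corpus table, with their eval flags.
corpus = {
    "1830_Balzac-Honoré-de_Sarrasine": True,
    "1837_Balzac-Honoré-de_La-maison-Nucingen": False,
    "1926_Audoux-Marguerite_De-la-ville-au-moulin": True,
    "1937_Audoux-Marguerite_Douce-Lumière": False,
}

for fold in loocv_folds(corpus):
    print(f"held out {fold.held_out}, training on {len(fold.train)} documents")
```

Fed the full 28-row table from the updated README.md, this produces 8 folds, each training on 27 documents.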
final_model.pkl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bba3512f631797dd68f2631c73fd5e84f1232e1c211aa886ec762428d48a9e23
- size 121836043
+ oid sha256:33aee46fef5eb366deed3c1407205a9f0b3ee115590473f11da0d4f3d2f29c02
+ size 386304699
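`final_model.pkl` is stored through Git LFS, so this diff only changes its pointer file. The pointer records the SHA-256 digest of the stored object and its size in bytes (now 386,304,699, up from 121,836,043).

The following is a minimal Python sketch for checking a downloaded copy against the new pointer. The function names and the local path are illustrative assumptions. Only the `version`/`oid`/`size` keys shown above are handled.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's byte size and SHA-256 digest match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, then stream the file through SHA-256.
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Pointer content after this commit.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:33aee46fef5eb366deed3c1407205a9f0b3ee115590473f11da0d4f3d2f29c02
size 386304699
"""
```

For example, `matches_pointer(Path("final_model.pkl"), NEW_POINTER)` checks a copy downloaded to the working directory. The size comparison runs first, so a stale 121,836,043-byte copy is rejected without hashing.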