Upload 3 files
- JCLS_model_card.md +14 -14
- README.md +48 -48
JCLS_model_card.md
CHANGED
@@ -31,14 +31,14 @@ The predicted entities are:
 ## MODEL PERFORMANCES (LOOCV):
 | NER_tag | precision | recall | f1_score | support | support % |
 |-----------|-------------|----------|------------|-----------|-------------|
-| PER | 92.
-| FAC |
-| TIME |
-| LOC |
-| GPE |
-| VEH |
-| micro_avg | 89.
-| macro_avg |
+| PER | 92.82% | 96.25% | 94.50% | 4,162 | 86.12% |
+| FAC | 74.78% | 75.45% | 75.11% | 224 | 4.63% |
+| TIME | 63.48% | 68.54% | 65.91% | 213 | 4.41% |
+| LOC | 72.22% | 59.09% | 65.00% | 110 | 2.28% |
+| GPE | 81.36% | 75.00% | 78.05% | 64 | 1.32% |
+| VEH | 56.06% | 61.67% | 58.73% | 60 | 1.24% |
+| micro_avg | 89.61% | 92.51% | 91.01% | 4,833 | 100.00% |
+| macro_avg | 73.45% | 72.67% | 72.88% | 4,833 | 100.00% |
 
 ## TRAINING PARAMETERS:
 - Entities types: ['PER', 'LOC', 'FAC', 'TIME', 'VEH', 'GPE']
@@ -111,12 +111,12 @@ Model Output: BIOES labels sequence
 | Gold Labels | PER | FAC | TIME | LOC | GPE | VEH | O | support |
 |---------------|-------|-------|--------|-------|-------|-------|-----|-----------|
 | PER | 4,006 | 0 | 2 | 1 | 1 | 3 | 149 | 4,162 |
-| FAC |
-| TIME | 1 | 0 |
-| LOC |
-| GPE | 2 |
-| VEH |
-| O |
+| FAC | 7 | 169 | 0 | 2 | 0 | 0 | 46 | 224 |
+| TIME | 1 | 0 | 146 | 0 | 0 | 0 | 66 | 213 |
+| LOC | 1 | 3 | 0 | 65 | 6 | 0 | 35 | 110 |
+| GPE | 2 | 3 | 0 | 3 | 48 | 0 | 8 | 64 |
+| VEH | 5 | 1 | 0 | 0 | 0 | 37 | 17 | 60 |
+| O | 292 | 50 | 81 | 19 | 4 | 26 | 0 | 472 |
 
 ## CONTACT:
 mail: antoine [dot] bourgois [at] protonmail [dot] com
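The recall and f1_score columns added to JCLS_model_card.md can be cross-checked against the confusion matrix in the same file. The sketch below is not the author's evaluation code. It assumes that recall is the diagonal count of a gold-label row divided by that row's support, and that f1_score is the harmonic mean of precision and recall. Precision is copied from the table as published rather than recomputed. The names `rows`, `precision`, `recall_pct` and `f1_pct` are hypothetical.

```python
# Cross-check of the JCLS_model_card.md tables (illustrative sketch only).

# (correctly predicted, support) per entity type, copied from the
# diagonal and the "support" column of the confusion matrix.
rows = {
    "PER": (4006, 4162),
    "FAC": (169, 224),
    "TIME": (146, 213),
    "LOC": (65, 110),
    "GPE": (48, 64),
    "VEH": (37, 60),
}

# Precision (%) copied from the performance table; not recomputed here.
precision = {
    "PER": 92.82,
    "FAC": 74.78,
    "TIME": 63.48,
    "LOC": 72.22,
    "GPE": 81.36,
    "VEH": 56.06,
}


def recall_pct(correct: int, support: int) -> float:
    """Recall in percent: correct predictions over gold mentions."""
    return 100.0 * correct / support


def f1_pct(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


recall = {tag: round(recall_pct(c, s), 2) for tag, (c, s) in rows.items()}
f1 = {tag: round(f1_pct(precision[tag], recall[tag]), 2) for tag in rows}

for tag in rows:
    print(f"{tag:5s} recall={recall[tag]:.2f}% f1={f1[tag]:.2f}%")
```

Under these assumptions the rounded values agree with the recall and f1_score columns of the updated table.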
README.md
CHANGED
@@ -31,14 +31,14 @@ The predicted entities are:
 ## MODEL PERFORMANCES (LOOCV):
 | NER_tag | precision | recall | f1_score | support | support % |
 |-----------|-------------|----------|------------|-----------|-------------|
-| PER | 92.
-| FAC |
-| TIME |
-|
-|
-| VEH |
-| micro_avg |
-| macro_avg |
+| PER | 92.46% | 93.71% | 93.08% | 32,204 | 84.13% |
+| FAC | 70.63% | 70.94% | 70.78% | 2,295 | 6.00% |
+| TIME | 58.66% | 57.75% | 58.20% | 1,671 | 4.37% |
+| GPE | 77.64% | 77.37% | 77.50% | 866 | 2.26% |
+| LOC | 62.96% | 45.71% | 52.97% | 781 | 2.04% |
+| VEH | 63.43% | 47.95% | 54.61% | 463 | 1.21% |
+| micro_avg | 88.39% | 88.87% | 88.58% | 38,280 | 100.00% |
+| macro_avg | 70.96% | 65.57% | 67.86% | 38,280 | 100.00% |
 
 ## TRAINING PARAMETERS:
 - Entities types: ['PER', 'LOC', 'FAC', 'TIME', 'VEH', 'GPE']
@@ -75,48 +75,48 @@ Model Output: BIOES labels sequence
 *** IN CONSTRUCTION ***
 
 ## TRAINING CORPUS:
-| | Document | Tokens Count | Is included in model eval
-
-| 0 | 1830_Balzac-Honoré-de_La-maison-du-chat-qui-pelote | 24,776 tokens | True
-| 1 | 1830_Balzac-Honoré-de_Sarrasine | 15,408 tokens | True
-| 2 | 1836_Gautier-Théophile_La-morte-amoureuse | 14,293 tokens | True
-| 3 | 1837_Balzac-Honoré-de_La-maison-Nucingen | 30,034 tokens |
-| 4 | 1841_Sand-George_Pauline | 12,398 tokens |
-| 5 | 1856_Cousin-Victor_Madame-de-Hautefort | 11,768 tokens |
-| 6 | 1863_Gautier-Théophile_Le-capitaine-Fracasse | 11,848 tokens |
-| 7 | 1873_Zola-Émile_Le-ventre-de-Paris | 12,613 tokens |
-| 8 | 1881_Flaubert-Gustave_Bouvard-et-Pécuchet | 12,308 tokens |
-| 9 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-buche | 2,267 tokens |
-| 10 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-relique | 2,041 tokens |
-| 11 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-rouille | 2,949 tokens | True
-| 12 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Madame-Baptiste | 2,578 tokens | True
-| 13 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Marocca | 4,078 tokens |
-| 14 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-A-cheval | 2,878 tokens |
-| 15 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Fou | 1,905 tokens |
-| 16 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Mademoiselle-Fifi | 5,439 tokens | True
-| 17 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Reveil | 2,159 tokens |
-| 18 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Un-reveillon | 2,364 tokens |
-| 19 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Une-ruse | 2,469 tokens |
-| 20 | 1901_Achard-Lucie_Rosalie-de-Constant-sa-famille-et-ses-amis | 12,775 tokens |
-| 21 | 1903_Conan-Laure_Élisabeth-Seton | 13,046 tokens |
-| 22 | 1904-1912_Rolland-Romain_Jean-Christophe(1) | 10,982 tokens | True
-| 23 | 1904-1912_Rolland-Romain_Jean-Christophe(2) | 10,305 tokens |
-| 24 | 1917_Bourgeois-Adèle_Némoville | 12,468 tokens |
-| 25 | 1923_Radiguet-Raymond_Le-diable-au-corps | 14,850 tokens |
-| 26 | 1926_Audoux-Marguerite_De-la-ville-au-moulin | 12,144 tokens | True
-| 27 | 1937_Audoux-Marguerite_Douce-Lumière | 12,346 tokens |
-| 28 | TOTAL | 275,489 tokens |
+| | Document | Tokens Count | Is included in model eval |
+|----|---------------------------------------------------------------------------------|----------------|------------------------------------|
+| 0 | 1830_Balzac-Honoré-de_La-maison-du-chat-qui-pelote | 24,776 tokens | True |
+| 1 | 1830_Balzac-Honoré-de_Sarrasine | 15,408 tokens | True |
+| 2 | 1836_Gautier-Théophile_La-morte-amoureuse | 14,293 tokens | True |
+| 3 | 1837_Balzac-Honoré-de_La-maison-Nucingen | 30,034 tokens | True |
+| 4 | 1841_Sand-George_Pauline | 12,398 tokens | True |
+| 5 | 1856_Cousin-Victor_Madame-de-Hautefort | 11,768 tokens | True |
+| 6 | 1863_Gautier-Théophile_Le-capitaine-Fracasse | 11,848 tokens | True |
+| 7 | 1873_Zola-Émile_Le-ventre-de-Paris | 12,613 tokens | True |
+| 8 | 1881_Flaubert-Gustave_Bouvard-et-Pécuchet | 12,308 tokens | True |
+| 9 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-buche | 2,267 tokens | True |
+| 10 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-relique | 2,041 tokens | True |
+| 11 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-La-rouille | 2,949 tokens | True |
+| 12 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Madame-Baptiste | 2,578 tokens | True |
+| 13 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Marocca | 4,078 tokens | True |
+| 14 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-A-cheval | 2,878 tokens | True |
+| 15 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Fou | 1,905 tokens | True |
+| 16 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Mademoiselle-Fifi | 5,439 tokens | True |
+| 17 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Reveil | 2,159 tokens | True |
+| 18 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Un-reveillon | 2,364 tokens | True |
+| 19 | 1882-1883_Maupassant-Guy-de_Mademoiselle-Fifi-Nouveaux-contes-Une-ruse | 2,469 tokens | True |
+| 20 | 1901_Achard-Lucie_Rosalie-de-Constant-sa-famille-et-ses-amis | 12,775 tokens | True |
+| 21 | 1903_Conan-Laure_Élisabeth-Seton | 13,046 tokens | True |
+| 22 | 1904-1912_Rolland-Romain_Jean-Christophe(1) | 10,982 tokens | True |
+| 23 | 1904-1912_Rolland-Romain_Jean-Christophe(2) | 10,305 tokens | True |
+| 24 | 1917_Bourgeois-Adèle_Némoville | 12,468 tokens | True |
+| 25 | 1923_Radiguet-Raymond_Le-diable-au-corps | 14,850 tokens | True |
+| 26 | 1926_Audoux-Marguerite_De-la-ville-au-moulin | 12,144 tokens | True |
+| 27 | 1937_Audoux-Marguerite_Douce-Lumière | 12,346 tokens | True |
+| 28 | TOTAL | 275,489 tokens | 28 files used for cross-validation |
 
 ## PREDICTIONS CONFUSION MATRIX:
-| Gold Labels | PER
-
-| PER |
-| FAC |
-| TIME | 1 |
-|
-|
-| VEH |
-| O |
+| Gold Labels | PER | FAC | TIME | GPE | LOC | VEH | O | support |
+|---------------|--------|-------|--------|-------|-------|-------|-------|-----------|
+| PER | 30,177 | 28 | 14 | 7 | 7 | 31 | 1,940 | 32,204 |
+| FAC | 42 | 1,628 | 1 | 22 | 17 | 1 | 584 | 2,295 |
+| TIME | 8 | 1 | 965 | 1 | 1 | 0 | 695 | 1,671 |
+| GPE | 13 | 31 | 2 | 670 | 31 | 0 | 119 | 866 |
+| LOC | 8 | 64 | 1 | 56 | 357 | 0 | 295 | 781 |
+| VEH | 54 | 8 | 0 | 0 | 0 | 222 | 179 | 463 |
+| O | 2,285 | 524 | 661 | 100 | 150 | 96 | 0 | 3,816 |
 
 ## CONTACT:
 mail: antoine [dot] bourgois [at] protonmail [dot] com
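The micro_avg and macro_avg rows in the README.md table differ because PER dominates the support. The sketch below illustrates this for recall only, using the diagonal and support values of the README.md confusion matrix. It is an illustration under assumptions, not the author's evaluation code: macro recall is taken to be the unweighted mean of per-class recalls, and micro recall the pooled count of correct predictions over pooled support. The names `rows`, `macro_recall` and `micro_recall` are hypothetical.

```python
# Micro vs. macro recall from the README.md confusion matrix (sketch).

# (correctly predicted, support) per entity type, copied from the
# diagonal and the "support" column of the matrix.
rows = {
    "PER": (30177, 32204),
    "FAC": (1628, 2295),
    "TIME": (965, 1671),
    "GPE": (670, 866),
    "LOC": (357, 781),
    "VEH": (222, 463),
}


def macro_recall(table: dict) -> float:
    """Unweighted mean of per-class recalls: every class counts equally."""
    return sum(c / s for c, s in table.values()) / len(table)


def micro_recall(table: dict) -> float:
    """Pooled recall: every gold mention counts equally."""
    correct = sum(c for c, _ in table.values())
    support = sum(s for _, s in table.values())
    return correct / support


print(f"macro recall: {100 * macro_recall(rows):.2f}%")
print(f"micro recall: {100 * micro_recall(rows):.2f}%")
```

With these inputs the macro figure is pulled down by the low-recall LOC and VEH classes, while the micro figure stays close to the PER recall because PER accounts for most of the support.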