Update README.md: evaluation results

README.md (CHANGED)
@@ -57,8 +57,8 @@ tags:
 
 
 ### Overview
-This model supports the detection of **45** languages, and it's fine-tuned using **multilingual-e5-base** model on the **common-language** dataset
-
+This model supports the detection of **45** languages. It was fine-tuned from the **multilingual-e5-base** model on the **common-language** dataset.<br>
+The overall accuracy is **98.37%**; more detailed evaluation results are shown below.
 
 ### Download the model
 ```python
@@ -82,6 +82,7 @@ languages = [
 ]
 
 def predict(text, model, tokenizer, device = torch.device('cpu')):
+    model.to(device)
     model.eval()
     tokenized = tokenizer(text, padding='max_length', truncation=True, max_length=128, return_tensors="pt")
     input_ids = tokenized['input_ids']
@@ -110,3 +111,61 @@ print(topk_prob, topk_labels)
 # ['Chinese_Taiwan', 'Chinese_Hongkong', 'Chinese_China']
 ```
 
+### Evaluation Results
+The test set is the test split of the **common_language** dataset.
+
+| language | precision | recall | f1-score | support |
+| --- | --- | --- | --- | --- |
+| Arabic | 1.00 | 1.00 | 1.00 | 151 |
+| Basque | 0.99 | 1.00 | 1.00 | 111 |
+| Breton | 1.00 | 0.90 | 0.95 | 252 |
+| Catalan | 0.96 | 0.99 | 0.97 | 96 |
+| Chinese_China | 0.98 | 1.00 | 0.99 | 100 |
+| Chinese_Hongkong | 0.97 | 0.87 | 0.92 | 115 |
+| Chinese_Taiwan | 0.92 | 0.98 | 0.95 | 170 |
+| Chuvash | 0.98 | 1.00 | 0.99 | 137 |
+| Czech | 0.98 | 1.00 | 0.99 | 128 |
+| Dhivehi | 1.00 | 1.00 | 1.00 | 111 |
+| Dutch | 0.99 | 1.00 | 0.99 | 144 |
+| English | 0.96 | 1.00 | 0.98 | 98 |
+| Esperanto | 0.98 | 0.98 | 0.98 | 107 |
+| Estonian | 1.00 | 0.99 | 0.99 | 93 |
+| French | 0.95 | 1.00 | 0.98 | 106 |
+| Frisian | 1.00 | 0.98 | 0.99 | 117 |
+| Georgian | 1.00 | 1.00 | 1.00 | 110 |
+| German | 1.00 | 1.00 | 1.00 | 101 |
+| Greek | 1.00 | 1.00 | 1.00 | 153 |
+| Hakha_Chin | 0.99 | 1.00 | 0.99 | 202 |
+| Indonesian | 0.99 | 0.99 | 0.99 | 150 |
+| Interlingua | 0.96 | 0.97 | 0.96 | 182 |
+| Italian | 0.99 | 0.94 | 0.96 | 100 |
+| Japanese | 1.00 | 1.00 | 1.00 | 144 |
+| Kabyle | 1.00 | 0.96 | 0.98 | 156 |
+| Kinyarwanda | 0.97 | 1.00 | 0.99 | 103 |
+| Kyrgyz | 0.98 | 1.00 | 0.99 | 129 |
+| Latvian | 0.98 | 0.98 | 0.98 | 171 |
+| Maltese | 0.99 | 0.98 | 0.98 | 152 |
+| Mongolian | 1.00 | 1.00 | 1.00 | 112 |
+| Persian | 1.00 | 1.00 | 1.00 | 123 |
+| Polish | 0.91 | 0.99 | 0.95 | 128 |
+| Portuguese | 0.94 | 0.99 | 0.96 | 124 |
+| Romanian | 1.00 | 1.00 | 1.00 | 152 |
+| Romansh_Sursilvan | 0.99 | 0.95 | 0.97 | 106 |
+| Russian | 0.99 | 0.99 | 0.99 | 100 |
+| Sakha | 0.99 | 1.00 | 1.00 | 105 |
+| Slovenian | 0.99 | 1.00 | 1.00 | 166 |
+| Spanish | 0.96 | 0.95 | 0.95 | 94 |
+| Swedish | 0.99 | 1.00 | 0.99 | 190 |
+| Tamil | 1.00 | 1.00 | 1.00 | 135 |
+| Tatar | 1.00 | 0.96 | 0.98 | 173 |
+| Turkish | 1.00 | 1.00 | 1.00 | 137 |
+| Ukranian | 0.99 | 1.00 | 1.00 | 126 |
+| Welsh | 0.98 | 1.00 | 0.99 | 103 |
+| | | | | |
+| *macro avg* | 0.98 | 0.99 | 0.98 | 5963 |
+| *weighted avg* | 0.98 | 0.98 | 0.98 | 5963 |
+| | | | | |
+| *overall accuracy* | | | 0.9837 | 5963 |
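The *macro avg* and *weighted avg* rows combine the same per-language scores in two ways: the macro average counts every language equally, while the weighted average weights each language by its support (number of test samples). A minimal standalone sketch of both schemes, using a few rows abridged from the table above (this is illustrative, not the actual evaluation code):

```python
# (precision, recall, support) -- three rows abridged from the table above.
rows = {
    'Breton':           (1.00, 0.90, 252),
    'Chinese_Hongkong': (0.97, 0.87, 115),
    'German':           (1.00, 1.00, 101),
}

def f1(precision, recall):
    # f1-score is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

scores = {lang: f1(p, r) for lang, (p, r, _) in rows.items()}
print({lang: round(s, 2) for lang, s in scores.items()})
# -> {'Breton': 0.95, 'Chinese_Hongkong': 0.92, 'German': 1.0}

# Macro average: every language contributes equally.
macro = sum(scores.values()) / len(scores)

# Weighted average: each language weighted by its support.
total_support = sum(sup for _, _, sup in rows.values())
weighted = sum(scores[lang] * sup for lang, (_, _, sup) in rows.items()) / total_support
```

The recomputed f1-scores match the table rows, which confirms the table's f1 column is the usual harmonic mean of precision and recall.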
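The diff only shows the start of `predict`, but its output (`topk_prob, topk_labels`) implies a final step that presumably applies a softmax over the 45 logits and keeps the top k labels. That step can be sketched without `torch`; the labels and logit values below are made-up illustrative numbers, not real model output:

```python
import math

# Hypothetical logits for three of the 45 languages (illustrative values only).
labels = ['Chinese_Taiwan', 'Chinese_Hongkong', 'Chinese_China']
logits = [2.1, 1.4, 0.3]

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def topk(probs, labels, k=3):
    # Pair each probability with its label and rank by probability, descending.
    ranked = sorted(zip(probs, labels), reverse=True)
    return ranked[:k]

probs = softmax(logits)
for p, lab in topk(probs, labels):
    print(f"{lab}: {p:.3f}")
```

With these logits the ranking reproduces the ordering shown in the README's example output, `['Chinese_Taiwan', 'Chinese_Hongkong', 'Chinese_China']`.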