metadata
multilinguality: multilingual
language:
- ny
- kg
- kmb
- rw
- ln
- lua
- lg
- nso
- rn
- st
- sw
- ss
- ts
- tn
- tum
- umb
- xh
- zu
license: apache-2.0
widget:
- text: gari langu lilipata ajali jana usiku
- text: ndamcela ukuba ahambe nam
- text: tango nafungolaki porte azalaki déjà te
Scores
{'eval_accuracy': 0.87955,
'eval_f1_score': 0.8794755507923356,
'eval_recall': 0.8797246969797138,
'eval_precision': 0.881040811800798}
How to use
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the tokenizer and classifier from the Hugging Face Hub
model_id = "nairaxo/bantu-language-identification"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Class index for each language
dic = {
    'chichewa': 0,
    'kikongo': 1,
    'kimbundu': 2,
    'kinyarwanda': 3,
    'lingala': 4,
    'lubakasai': 5,
    'luganda': 6,
    'northernsotho': 7,
    'rundi': 8,
    'southernsotho': 9,
    'swahili': 10,
    'swati': 11,
    'tsonga': 12,
    'tswana': 13,
    'tumbuka': 14,
    'umbundu': 15,
    'xhosa': 16,
    'zulu': 17
}
# Invert the mapping so it goes from class index to language name
dic = {v: k for k, v in dic.items()}
sentences = [
    "gari langu lilipata ajali jana usiku",
    "ndamcela ukuba ahambe nam",
    "tango nafungolaki porte azalaki déjà te"
]
results = nlp(sentences)
# Replace raw labels such as "LABEL_10" with the language name
for result in results:
    result['label'] = dic[int(result['label'].replace('LABEL_', ''))]
print(results)
Output:
[{'label': 'swahili', 'score': 0.9996045231819153},
{'label': 'xhosa', 'score': 0.9882974028587341},
{'label': 'lingala', 'score': 0.9983460903167725}]
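The language codes listed in the metadata appear in the same order as the class indices used above (`ny` for chichewa at 0 through `zu` for zulu at 17), so a raw pipeline label can also be turned into a language code. The following is a minimal sketch that assumes this ordering is intentional; the names `CODES` and `label_to_code` are hypothetical and not part of the model or the library.

```python
# Language codes copied from the metadata list.
# Assumption: they are in the same order as the model's class indices.
CODES = [
    "ny", "kg", "kmb", "rw", "ln", "lua", "lg", "nso", "rn",
    "st", "sw", "ss", "ts", "tn", "tum", "umb", "xh", "zu",
]


def label_to_code(label: str) -> str:
    """Convert a raw pipeline label such as 'LABEL_10' to a language code."""
    index = int(label.replace("LABEL_", ""))
    return CODES[index]
```

Call it on the `label` field of a raw prediction, that is, before the label has been replaced by a language name; for example `label_to_code("LABEL_10")` gives the code for swahili.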