---
multilinguality: multilingual
language:
- ny
- kg
- kmb
- rw
- ln
- lua
- lg
- nso
- rn
- st
- sw
- ss
- ts
- tn
- tum
- umb
- xh
- zu
license: apache-2.0
widget:
- text: "gari langu lilipata ajali jana usiku"
- text: "ndamcela ukuba ahambe nam"
- text: "tango nafungolaki porte azalaki déjà te"
---

### Scores

```python
{'eval_accuracy': 0.87955,
 'eval_f1_score': 0.8794755507923356,
 'eval_recall': 0.8797246969797138,
 'eval_precision': 0.881040811800798}
```

### How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained('nairaxo/bantu-language-identification')
model = AutoModelForSequenceClassification.from_pretrained("nairaxo/bantu-language-identification")
nlp = pipeline('text-classification', model=model, tokenizer=tokenizer)

# Language name -> class index used by the model
dic = {
    'chichewa': 0,
    'kikongo': 1,
    'kimbundu': 2,
    'kinyarwanda': 3,
    'lingala': 4,
    'lubakasai': 5,
    'luganda': 6,
    'northernsotho': 7,
    'rundi': 8,
    'southernsotho': 9,
    'swahili': 10,
    'swati': 11,
    'tsonga': 12,
    'tswana': 13,
    'tumbuka': 14,
    'umbundu': 15,
    'xhosa': 16,
    'zulu': 17
}
# Invert the mapping: class index -> language name
dic = {v: k for k, v in dic.items()}

sentences = [
    "gari langu lilipata ajali jana usiku",
    "ndamcela ukuba ahambe nam",
    "tango nafungolaki porte azalaki déjà te"
]

results = nlp(sentences)

# Replace raw labels such as 'LABEL_10' with the language name
for result in results:
    result['label'] = dic[int(result['label'].replace('LABEL_', ''))]

print(results)
```

Output:

```
[{'label': 'swahili', 'score': 0.9996045231819153}, {'label': 'xhosa', 'score': 0.9882974028587341}, {'label': 'lingala', 'score': 0.9983460903167725}]
```
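
### Decoding labels with a helper

The label-decoding step in the usage example can be factored into a small reusable helper. The sketch below is one possible way to do it, not part of the model's own API: the names `LANGUAGES`, `decode_label`, and `decode_results` are hypothetical, and it assumes the pipeline returns labels of the form `LABEL_<index>`, as the usage example implies. The `raw` list is a hand-written stand-in for pipeline output, so the snippet runs without downloading the model.

```python
# Class index -> language name, in the same order as the mapping
# in the usage example (index 0 is 'chichewa', index 17 is 'zulu').
LANGUAGES = [
    "chichewa", "kikongo", "kimbundu", "kinyarwanda", "lingala",
    "lubakasai", "luganda", "northernsotho", "rundi", "southernsotho",
    "swahili", "swati", "tsonga", "tswana", "tumbuka", "umbundu",
    "xhosa", "zulu",
]


def decode_label(raw_label):
    """Turn a raw label such as 'LABEL_10' into a language name.

    Assumes the 'LABEL_<index>' format; raises ValueError otherwise.
    """
    index = int(raw_label.replace("LABEL_", ""))
    if not 0 <= index < len(LANGUAGES):
        raise ValueError(f"unknown class index: {index}")
    return LANGUAGES[index]


def decode_results(results):
    """Return a copy of the pipeline results with readable labels."""
    return [{**r, "label": decode_label(r["label"])} for r in results]


# Hand-written stand-in for what nlp(sentences) would return.
raw = [
    {"label": "LABEL_10", "score": 0.9996045231819153},
    {"label": "LABEL_16", "score": 0.9882974028587341},
    {"label": "LABEL_4", "score": 0.9983460903167725},
]
decoded = decode_results(raw)
print([r["label"] for r in decoded])
```

With the real pipeline, `decode_results(nlp(sentences))` would replace the in-place loop. Returning a new list leaves the raw pipeline output untouched, which can be convenient when the original labels are still needed.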