## Model details
This machine translation model translates single sentences between any pair of the following languages:

| ISO 639-3 | Language name |
|---|---|
| eng | English |
| ach | Acholi |
| lgg | Lugbara |
| lug | Luganda |
| nyn | Runyankole |
| teo | Ateso |
It was trained on the SALT dataset together with a variety of additional external data resources, including back-translated news articles, FLORES-200, MT560 and LAFAND-MT. The base model was facebook/nllb-200-1.3B, with the tokenizer and embeddings adapted to add support for languages not originally included in NLLB-200.
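The card does not describe the adaptation procedure in detail. As a minimal sketch, one common way to add new language tokens to an NLLB checkpoint is shown below; the token strings (`'ach'`, `'lgg'`, etc.) and the overall approach are assumptions, not the authors' documented method.

```python
# Hypothetical sketch: extending an NLLB tokenizer/model with new language
# tokens. This is NOT the authors' exact procedure.
import transformers

tokenizer = transformers.NllbTokenizer.from_pretrained('facebook/nllb-200-1.3B')
model = transformers.M2M100ForConditionalGeneration.from_pretrained(
    'facebook/nllb-200-1.3B')

# Acholi, Lugbara, Runyankole and Ateso are not in NLLB-200; register them
# as new special tokens so each gets its own vocabulary entry.
new_language_codes = ['ach', 'lgg', 'nyn', 'teo']  # assumed token strings
tokenizer.add_tokens(new_language_codes, special_tokens=True)

# Grow the embedding matrix to cover the newly added tokens before training.
model.resize_token_embeddings(len(tokenizer))
```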
## Usage example
```python
import torch
import transformers

tokenizer = transformers.NllbTokenizer.from_pretrained(
    'Sunbird/translate-nllb-1.3b-salt')
model = transformers.M2M100ForConditionalGeneration.from_pretrained(
    'Sunbird/translate-nllb-1.3b-salt')

text = 'Where is the hospital?'
source_language = 'eng'
target_language = 'lug'

# Token IDs of the language tags in this model's adapted vocabulary.
language_tokens = {
    'eng': 256047,
    'ach': 256111,
    'lgg': 256008,
    'lug': 256110,
    'nyn': 256002,
    'teo': 256006,
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

inputs = tokenizer(text, return_tensors="pt").to(device)
# Overwrite the leading language tag with the correct source language.
inputs['input_ids'][0][0] = language_tokens[source_language]

translated_tokens = model.to(device).generate(
    **inputs,
    # Force the decoder to start with the target-language tag.
    forced_bos_token_id=language_tokens[target_language],
    max_length=100,
    num_beams=5,
)

result = tokenizer.batch_decode(
    translated_tokens, skip_special_tokens=True)[0]
print(result)
# Eddwaliro liri ludda wa?
```
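For repeated use, the snippet above can be wrapped into a small helper; `translate` is a hypothetical convenience function, not part of the released model, and it reuses the `language_tokens` dictionary defined above.

```python
def translate(text, source_language, target_language,
              tokenizer, model, device):
    """Translate a single sentence between any supported language pair."""
    inputs = tokenizer(text, return_tensors='pt').to(device)
    inputs['input_ids'][0][0] = language_tokens[source_language]
    tokens = model.generate(
        **inputs,
        forced_bos_token_id=language_tokens[target_language],
        max_length=100,
        num_beams=5,
    )
    return tokenizer.batch_decode(tokens, skip_special_tokens=True)[0]

print(translate('Where is the hospital?', 'eng', 'nyn',
                tokenizer, model, device))
```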
## Evaluation metrics
Results on salt-dev:
| Source language | Target language | BLEU |
|---|---|---|
| ach | eng | 28.371 |
| lgg | eng | 30.45 |
| lug | eng | 41.978 |
| nyn | eng | 32.296 |
| teo | eng | 30.422 |
| eng | ach | 20.972 |
| eng | lgg | 22.362 |
| eng | lug | 30.359 |
| eng | nyn | 15.305 |
| eng | teo | 21.391 |
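A minimal sketch of how such corpus-level BLEU scores could be reproduced with the sacrebleu package is shown below; the file names `salt-dev.lug` and `salt-dev.eng` are placeholders, not files shipped with the model, and `translate` is the hypothetical helper defined earlier.

```python
import sacrebleu

# One sentence per line; placeholder file names for the salt-dev split.
with open('salt-dev.lug') as f:
    sources = [line.strip() for line in f]
with open('salt-dev.eng') as f:
    references = [line.strip() for line in f]

hypotheses = [translate(s, 'lug', 'eng', tokenizer, model, device)
              for s in sources]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)  # the table above reports 41.978 for lug -> eng
```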