---
license: mit
language:
- xh
- zu
- nr
- ss
---
|
|
|
Usage:
|
|
|
1. For mask prediction:
|
|
|
```python
import torch
from transformers import AutoTokenizer, XLMRobertaForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("francois-meyer/nguni-xlmr-large")
model = XLMRobertaForMaskedLM.from_pretrained("francois-meyer/nguni-xlmr-large")

# Replace with any Nguni-language sentence containing a <mask> token.
text = "A test <mask> for the Nguni model."

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the <mask> position(s) and decode the highest-scoring prediction.
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
print(tokenizer.decode(predicted_token_id))
```
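Equivalently, the `fill-mask` pipeline wraps the same steps (tokenization, inference, decoding) in a single call; the example sentence below is again just a placeholder:

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="francois-meyer/nguni-xlmr-large")

# Prints the top-scoring candidates for the <mask> position, with scores.
for prediction in unmasker("A test <mask> for the Nguni model."):
    print(prediction["token_str"], prediction["score"])
```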
|
|
|
2. For any other task, fine-tune the model in the same way you would fine-tune a BERT or XLM-R model; a minimal sketch follows below.
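As an illustration, here is a minimal fine-tuning sketch for text classification using the Hugging Face `Trainer` API. The CSV file paths, column names, label count, and hyperparameters are placeholders for your own task, not recommendations from the model authors:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    XLMRobertaForSequenceClassification,
)

tokenizer = AutoTokenizer.from_pretrained("francois-meyer/nguni-xlmr-large")
# num_labels=2 is a placeholder; set it to the number of classes in your task.
model = XLMRobertaForSequenceClassification.from_pretrained(
    "francois-meyer/nguni-xlmr-large", num_labels=2
)

# Placeholder paths: CSV files with "text" and "label" columns.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Placeholder hyperparameters; tune them for your task and hardware.
args = TrainingArguments(
    output_dir="nguni-xlmr-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```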