---
language: wo
tags:
- roberta
- language-model
- wo
- wolof
---
# Soraberta: Unsupervised Language Model Pre-training for Wolof
**Soraberta** is a RoBERTa-base model pretrained on the Wolof language. RoBERTa was introduced in [this paper](https://arxiv.org/abs/1907.11692).
## Soraberta models
| Model name | Number of layers | Attention Heads | Max Position Embeddings | Total Parameters |
| :------: | :---: | :---: | :---: | :---: |
| `soraberta-base` | 6 | 12 | 514 | 83 M |
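These values can be read back from the hosted checkpoint's configuration. A minimal sketch using the standard `transformers` auto classes:

```python
from transformers import AutoConfig

# Read the architecture hyperparameters from the hosted checkpoint.
config = AutoConfig.from_pretrained('abdouaziiz/soraberta')

print(config.num_hidden_layers)        # number of layers
print(config.num_attention_heads)      # attention heads
print(config.max_position_embeddings)  # max position embeddings
```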
## Using Soraberta with Hugging Face's Transformers
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='abdouaziiz/soraberta')
>>> unmasker("juroom naari jullit man nanoo boole jend aw nag walla <mask>.")
[{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla gileem.',
  'score': 0.9783930778503418,
  'token': 4621,
  'token_str': ' gileem'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla jend.',
  'score': 0.009271537885069847,
  'token': 2155,
  'token_str': ' jend'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla aw.',
  'score': 0.0027585660573095083,
  'token': 704,
  'token_str': ' aw'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla pel.',
  'score': 0.001120452769100666,
  'token': 1171,
  'token_str': ' pel'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla juum.',
  'score': 0.0005133090307936072,
  'token': 5820,
  'token_str': ' juum'}]
```
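For lower-level access, the checkpoint can also be loaded directly with the `transformers` auto classes. The following is a minimal sketch that reproduces the top-5 fill-mask predictions by hand:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained('abdouaziiz/soraberta')
model = AutoModelForMaskedLM.from_pretrained('abdouaziiz/soraberta')

text = "juroom naari jullit man nanoo boole jend aw nag walla <mask>."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the <mask> position and take the five most likely tokens,
# mirroring the fill-mask pipeline output above.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top5 = logits[0, mask_index[0]].topk(5)
for token_id in top5.indices.tolist():
    print(tokenizer.decode([token_id]))
```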
## Training data
The data sources are [Bible OT](http://biblewolof.com/) and [WOLOF-ONLINE](http://www.wolof-online.com/).
## Contact
Please contact [email protected] with any questions, feedback, or requests.