---
language: wo
tags:
- roberta
- language-model
- wo
- wolof 
---

# Soraberta: Unsupervised Language Model Pre-training for Wolof

**Soraberta** is a RoBERTa-base model pretrained on the Wolof language. RoBERTa was introduced in [this paper](https://arxiv.org/abs/1907.11692).

## Soraberta models

| Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
| :------:       |   :---: | :---: | :---: | :---: |
| `soraberta-base` | 6    | 12   | 514   | 83 M |
 



## Using Soraberta with Hugging Face's Transformers


```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='abdouaziiz/soraberta')
>>> unmasker("juroom naari jullit man nanoo boole jend aw nag walla <mask>.")

[{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla gileem.',
  'score': 0.9783930778503418,
  'token': 4621,
  'token_str': ' gileem'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla jend.',
  'score': 0.009271537885069847,
  'token': 2155,
  'token_str': ' jend'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla aw.',
  'score': 0.0027585660573095083,
  'token': 704,
  'token_str': ' aw'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla pel.',
  'score': 0.001120452769100666,
  'token': 1171,
  'token_str': ' pel'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla juum.',
  'score': 0.0005133090307936072,
  'token': 5820,
  'token_str': ' juum'}]
```
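The pipeline returns a list of dictionaries, one per candidate token, and `token_str` carries a leading space. As a minimal sketch of post-processing that output, the helper below keeps only candidates above a score threshold and strips the whitespace. The name `best_fills` and the `min_score` default are hypothetical and not part of Transformers; the sample data is copied from two of the entries shown above.

```python
from typing import Dict, List


def best_fills(predictions: List[Dict], min_score: float = 0.01) -> List[str]:
    """Return candidate tokens scoring at least min_score, best first.

    Hypothetical helper for illustration; `predictions` is assumed to have
    the same shape as the fill-mask pipeline output shown above.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    # token_str includes a leading space, so strip it before returning.
    return [p["token_str"].strip() for p in ranked if p["score"] >= min_score]


# Two entries copied from the pipeline output above (deliberately out of order).
sample = [
    {"sequence": "juroom naari jullit man nanoo boole jend aw nag walla jend.",
     "score": 0.009271537885069847,
     "token": 2155,
     "token_str": " jend"},
    {"sequence": "juroom naari jullit man nanoo boole jend aw nag walla gileem.",
     "score": 0.9783930778503418,
     "token": 4621,
     "token_str": " gileem"},
]

print(best_fills(sample))                 # only the high-confidence candidate
print(best_fills(sample, min_score=0.0))  # every candidate, best first
```

In practice the list returned by `unmasker(...)` could be passed in place of `sample`.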

## Training data
The training data comes from [Bible OT](http://biblewolof.com/) and [WOLOF-ONLINE](http://www.wolof-online.com/).



## Contact

Please contact [email protected] with any questions, feedback, or requests.