updated model card
README.md CHANGED
@@ -15,7 +15,7 @@ language:
 - multilingual

 ---
-#
+# afriberta_small
 ## Model description
 AfriBERTa small is a pretrained multilingual language model with around 97 million parameters.
 The model has 4 layers, 6 attention heads, 768 hidden units and 3072 feed forward size.
@@ -33,13 +33,13 @@ For example, assuming we want to finetune this model on a token classification task
 >>> from transformers import AutoTokenizer, AutoModelForTokenClassification
 >>> model = AutoModelForTokenClassification.from_pretrained("castorini/afriberta_small")
 >>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriberta_small")
-# we have to manually set the model max length because it is an imported sentencepiece model which
+# we have to manually set the model max length because it is an imported, trained SentencePiece model, which Hugging Face does not properly support right now
 >>> tokenizer.model_max_length = 512
 ```

 #### Limitations and bias
-This model is possibly limited by its training dataset which are majorly obtained from news articles from a specific span of time.
-
+- This model is possibly limited by its training dataset, which was mostly obtained from news articles from a specific span of time. Thus, it may not generalize well.
+- This model was trained on very little data (less than 1 GB), so it may not have seen enough data to learn very complex linguistic relations.

 ## Training data
 The model was trained on an aggregation of datasets from the BBC news website and Common Crawl.
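The architecture figures quoted in the card (4 layers, 6 attention heads, 768 hidden units, 3072 feed-forward size, around 97 million parameters) can be sanity-checked against the published configuration. A minimal sketch, assuming the checkpoint exposes the standard XLM-R-style config fields and that `transformers` and `torch` are installed:

```
>>> from transformers import AutoConfig, AutoModel

>>> config = AutoConfig.from_pretrained("castorini/afriberta_small")
>>> # standard XLM-R-style config fields (assumed for this checkpoint)
>>> config.num_hidden_layers, config.num_attention_heads, config.hidden_size, config.intermediate_size

>>> # parameter count, expected to be roughly 97 million per the card
>>> model = AutoModel.from_pretrained("castorini/afriberta_small")
>>> sum(p.numel() for p in model.parameters())
```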
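The model_max_length workaround in the second hunk can be exercised end to end roughly as follows. This is a sketch, not part of the card: the num_labels value, the placeholder sentence, and the randomly initialized classification head are illustrative assumptions; the point is only that, once model_max_length is set manually, truncation respects the 512-token limit.

```
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification

>>> # num_labels=2 is an illustrative assumption for this sketch
>>> model = AutoModelForTokenClassification.from_pretrained("castorini/afriberta_small", num_labels=2)
>>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriberta_small")

>>> # manual fix from the card: the imported SentencePiece tokenizer does not carry a max length
>>> tokenizer.model_max_length = 512

>>> # placeholder sentence; truncation now caps inputs at 512 tokens
>>> inputs = tokenizer("This is a placeholder sentence.", truncation=True, return_tensors="pt")
>>> outputs = model(**inputs)
>>> outputs.logits.shape  # (batch_size, sequence_length, num_labels)
```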