updated model card
README.md CHANGED
@@ -15,7 +15,7 @@ language:
 - multilingual

 ---
-#
+# afriberta_small
 ## Model description
 AfriBERTa small is a pretrained multilingual language model with around 97 million parameters.
 The model has 4 layers, 6 attention heads, 768 hidden units and 3072 feed forward size.
@@ -33,13 +33,13 @@ For example, assuming we want to finetune this model on a token classification task
 >>> from transformers import AutoTokenizer, AutoModelForTokenClassification
 >>> model = AutoModelForTokenClassification.from_pretrained("castorini/afriberta_small")
 >>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriberta_small")
-# we have to manually set the model max length because it is an imported sentencepiece model which
+# we have to manually set the model max length because it is an imported, trained SentencePiece model, which Hugging Face does not properly support right now
 >>> tokenizer.model_max_length = 512
 ```

 #### Limitations and bias
-This model is possibly limited by its training dataset which are majorly obtained from news articles from a specific span of time.
-
+- This model is possibly limited by its training dataset, which was mostly obtained from news articles from a specific span of time. Thus, it may not generalize well.
+- This model was trained on very little data (less than 1 GB), so it may not have seen enough data to learn very complex linguistic relations.

 ## Training data
 The model was trained on an aggregation of datasets from the BBC news website and Common Crawl.
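The architecture figures quoted in the card (4 layers, 6 attention heads, 768 hidden units, 3072 feed-forward size, around 97 million parameters) can be sanity-checked against the published configuration. A minimal sketch, assuming the checkpoint exposes the standard XLM-R-style config fields and that `transformers` and `torch` are installed:

```
>>> from transformers import AutoConfig, AutoModel

>>> config = AutoConfig.from_pretrained("castorini/afriberta_small")
>>> # standard XLM-R-style config fields (assumed for this checkpoint)
>>> config.num_hidden_layers, config.num_attention_heads, config.hidden_size, config.intermediate_size

>>> # parameter count, expected to be roughly 97 million per the card
>>> model = AutoModel.from_pretrained("castorini/afriberta_small")
>>> sum(p.numel() for p in model.parameters())
```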
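The model_max_length workaround in the second hunk can be exercised end to end roughly as follows. This is a sketch, not part of the card: the num_labels value, the placeholder sentence, and the randomly initialized classification head are illustrative assumptions; the point is only that, once model_max_length is set manually, truncation respects the 512-token limit.

```
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification

>>> # num_labels=2 is an illustrative assumption for this sketch
>>> model = AutoModelForTokenClassification.from_pretrained("castorini/afriberta_small", num_labels=2)
>>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriberta_small")

>>> # manual fix from the card: the imported SentencePiece tokenizer does not carry a max length
>>> tokenizer.model_max_length = 512

>>> # placeholder sentence; truncation now caps inputs at 512 tokens
>>> inputs = tokenizer("This is a placeholder sentence.", truncation=True, return_tensors="pt")
>>> outputs = model(**inputs)
>>> outputs.logits.shape  # (batch_size, sequence_length, num_labels)
```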