---
language: pt
license: mit
tags:
- bert
- pytorch
datasets:
- Twitter
---

## Introduction

BERTabaporu is a Brazilian Portuguese BERT model pre-trained on the Twitter domain. The model was built from a collection of 238 million tweets written by over 100 thousand unique Twitter users, comprising over 2.9 billion tokens in total.

## Available models

| Model                                   | Arch.      | #Layers | #Params |
| --------------------------------------- | ---------- | ------- | ------- |
| `pablocosta/bertabaporu-base-uncased`   | BERT-Base  | 12      | 110M    |
| `pablocosta/bertabaporu-large-uncased`  | BERT-Large | 24      | 335M    |

## Usage

```python
from transformers import AutoTokenizer  # or BertTokenizer
from transformers import AutoModelForPreTraining  # or BertForPreTraining, for loading the pretraining heads
from transformers import AutoModel  # or BertModel, for BERT without pretraining heads

model = AutoModelForPreTraining.from_pretrained('pablocosta/bertabaporu-large-uncased')
tokenizer = AutoTokenizer.from_pretrained('pablocosta/bertabaporu-large-uncased')
```
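
Beyond loading the pretraining heads, the model can also be used as a plain encoder for feature extraction. Below is a minimal sketch (the input tweet is a made-up example) that runs the base model and retrieves last-layer token embeddings:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the encoder without pretraining heads for feature extraction.
tokenizer = AutoTokenizer.from_pretrained('pablocosta/bertabaporu-base-uncased')
model = AutoModel.from_pretrained('pablocosta/bertabaporu-base-uncased')
model.eval()

# Hypothetical example tweet.
inputs = tokenizer('adoro esse modelo!', return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

# Last-layer hidden states: (batch_size, sequence_length, hidden_size).
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```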