---
datasets:
- allenai/c4
- legacy-datasets/mc4
language:
- pt
pipeline_tag: text2text-generation
base_model: google-t5/t5-small
---
## Introduction
ptt5-v2 models were pretrained for approximately one epoch on the "pt" subset of the mC4 dataset, starting from Google's original T5 checkpoints. These models must be fine-tuned before being used on downstream tasks.
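As a minimal sketch of how the checkpoint can be loaded with 🤗 Transformers for fine-tuning (the repository id below is a placeholder; replace it with the id of this model):

```python
# Minimal sketch: load the pretrained checkpoint so it can be fine-tuned.
# NOTE: "your-org/ptt5-v2-checkpoint" is a placeholder repository id, not
# necessarily the id of this model.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "your-org/ptt5-v2-checkpoint"  # placeholder id
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# The checkpoint is pretrained only; fine-tune it on your downstream task
# (e.g. with Seq2SeqTrainer) before using it for inference.
```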
## Citation
If you use our models, please cite:
```bibtex
@article{ptt5_2020,
  title={PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data},
  author={Carmo, Diedre and Piau, Marcos and Campiotti, Israel and Nogueira, Rodrigo and Lotufo, Roberto},
  journal={arXiv preprint arXiv:2008.09144},
  year={2020}
}
```