jarodrigues committed · Commit 0fc7a14 · 1 parent: 68d6838

Update README.md

README.md CHANGED
@@ -104,7 +104,7 @@ We skipped the default filtering of stopwords since it would disrupt the syntact
 As codebase, we resorted to the [DeBERTa V2 xxlarge](https://huggingface.co/microsoft/deberta-v2-xxlarge), for English.
 
 To train **Albertina 1.5B PT-PT**, the data set was tokenized with the original DeBERTa tokenizer with a 128-token sequence truncation and dynamic padding for 250k steps,
-a 256-token sequence truncation for 80k steps and finally a 512-token sequence truncation for 60k steps.
+a 256-token sequence truncation for 80k steps ([**Albertina 1.5B PT-PT 256**](https://huggingface.co/PORTULAN/albertina-1b5-portuguese-ptpt-encoder-256)) and finally a 512-token sequence truncation for 60k steps.
 These steps correspond to the equivalent setup of 48 hours on an a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences, 24 hours of computation for the 256-token
 input sequences and 24 hours of computation for the 512-token input sequences.
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
@@ -113,21 +113,35 @@ We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 
 # Evaluation
 
-The base model version was evaluated on downstream tasks, namely the translations into PT-PT of the English data sets used for a few of the tasks in the widely-used [GLUE benchmark](https://huggingface.co/datasets/glue).
 
-## GLUE and SUPERGLUE tasks translated
 
-We resorted to [HyperGlue-PT](?), a **PT-PT version of the GLUE and SUPERGLUE** benchmark.
-We automatically translated the tasks from GLUE and SUPERGLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
-
-| Model                      | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) | COPA (Accuracy) | CB (F1) | MultiRC (F1) | BoolQ (Accuracy) |
-|---------------------------|----------------|----------------|-----------|-----------------|-----------------|---------|--------------|------------------|
-| **Albertina 1.5B PT-PT**   | **0.?** | 0. | **0. ** | **0. ** | ? | | | |
-| **Albertina PT-PT (900M)** | 0.8339 | 0.4225 | 0.9171 | 0.8801 | 0.7300 | 0.4739 | 0.6782 | 0.8437 |
-| **Albertina 100M PT-PT**   | 0.5848 | 0.5634 | 0.8793 | 0.8624 | n.a. | 0.4734 | 0.6564 | 0.7700 |
+We resorted to [HyperGlue-PT](?), a **PT-PT version of the GLUE and SUPERGLUE** benchmark.
+We automatically translated the tasks from GLUE and SUPERGLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
+
+| Model                         | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) | COPA (Accuracy) | CB (F1)    | MultiRC (F1) | BoolQ (Accuracy) |
+|-------------------------------|----------------|----------------|-----------|-----------------|-----------------|------------|--------------|------------------|
+| **Albertina 1.5B PT-PT**      | **0.8809**     | 0.4742         | 0.8457    | **0.9034**      | **0.8433**      | **0.7840** | **0.7688**   | **0.8602**       |
+| **Albertina 1.5B PT-PT 256**  | 0.8809         | 0.5493         | 0.8752    | 0.8795          | 0.8400          | 0.5832     | 0.6791       | 0.8496           |
+| **Albertina 900M PT-PT**      | 0.8339         | 0.4225         | **0.9171**| 0.8801          | 0.7033          | 0.6018     | 0.6728       | 0.8224           |
+| **Albertina 100M PT-PT**      | 0.6919         | 0.4742         | 0.8047    | 0.8590          | n.a.            | 0.4529     | 0.6481       | 0.7578           |
+||||||||||
+| **DeBERTa 1.5B EN**           | 0.8147         | 0.4554         | 0.8696    | 0.8557          | 0.5167          | 0.4901     | 0.6687       | 0.8347           |
+| **DeBERTa 100M EN**           | 0.6029         | **0.5634**     | 0.7802    | 0.8320          | n.a.            | 0.4698     | 0.6368       | 0.6829           |
 
 
+**For the PT-BR model:**
 
+| Model                         | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) | COPA (Accuracy) | CB (F1)    | MultiRC (F1) | BoolQ (Accuracy) |
+|-------------------------------|----------------|----------------|-----------|-----------------|-----------------|------------|--------------|------------------|
+| **Albertina 1.5B PT-BR**      | **0.8676**     | 0.4742         | 0.8622    | **0.9007**      | 0.7767          | 0.6372     | **0.7667**   | **0.8654**       |
+| **Albertina 1.5B PT-BR 256**  | 0.8123         | 0.4225         | 0.8638    | 0.8968          | **0.8533**      | **0.6884** | 0.6799       | 0.8509           |
+| **Albertina 900M PT-BR**      | 0.7545         | 0.4601         | **0.9071**| 0.8910          | 0.7767          | 0.5799     | 0.6731       | 0.8385           |
+| **BERTimbau (335M)**          | 0.6446         | **0.5634**     | 0.8873    | 0.8842          | 0.6933          | 0.5438     | 0.6787       | 0.7783           |
+| **Albertina 100M PT-BR**      | 0.6582         | **0.5634**     | 0.8149    | 0.8489          | n.a.            | 0.4771     | 0.6469       | 0.7537           |
+||||||||||
+| **DeBERTa 1.5B EN**           | 0.7112         | **0.5634**     | 0.8545    | 0.0123          | 0.5700          | 0.4307     | 0.3639       | 0.6217           |
+| **DeBERTa 100M EN**           | 0.5716         | 0.5587         | 0.8060    | 0.8266          | n.a.            | 0.4739     | 0.6391       | 0.6838           |
 
 <br>
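To make the pre-training recipe in the diff concrete, the sketch below maps the stated hyper-parameters (128-token truncation with dynamic padding, 250k steps, learning rate 1e-5, linear decay, 10k warm-up steps) onto the standard Hugging Face `transformers` training loop. This is a minimal sketch, not the authors' training script: the corpus, batch size, and masking probability are illustrative assumptions; only the values named in the card are taken from it.

```python
# Sketch of the first pre-training phase described in the card (assumptions noted).
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# The card states the DeBERTa V2 xxlarge codebase and original tokenizer were used.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xxlarge")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-v2-xxlarge")

def tokenize(batch):
    # 128-token sequence truncation; padding is left to the collator, which
    # pads each batch only to its longest member ("dynamic padding").
    return tokenizer(batch["text"], truncation=True, max_length=128)

# MLM collator; the 0.15 masking probability is an assumption, not from the card.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="albertina-1b5-ptpt-phase1",
    max_steps=250_000,               # 250k steps at 128 tokens (phase 1)
    learning_rate=1e-5,              # stated in the card
    lr_scheduler_type="linear",      # linear decay
    warmup_steps=10_000,             # 10k warm-up steps
    per_device_train_batch_size=8,   # assumption; the card does not give it
)

# train_dataset = raw_corpus.map(tokenize, batched=True)  # hypothetical corpus
# Trainer(model=model, args=args, data_collator=collator,
#         train_dataset=train_dataset).train()
```

The 256- and 512-token phases described in the card would repeat the same setup with `max_length` raised and `max_steps` set to 80k and 60k respectively.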
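The translation step behind HyperGlue-PT can be approximated with DeepL's official Python client, which exposes European Portuguese as a distinct target variant. A minimal sketch, assuming the English tasks are taken from the Hugging Face `glue` dataset; the API key is a placeholder and the text columns vary per task.

```python
# Sketch: machine-translating one GLUE task (RTE) into PT-PT with DeepL.
import deepl
from datasets import load_dataset

translator = deepl.Translator("YOUR_DEEPL_API_KEY")  # placeholder key

rte = load_dataset("glue", "rte", split="validation")

def to_ptpt(example):
    # DeepL offers "PT-PT" and "PT-BR" as separate target languages.
    for column in ("sentence1", "sentence2"):  # RTE's two text columns
        example[column] = translator.translate_text(
            example[column], source_lang="EN", target_lang="PT-PT"
        ).text
    return example

rte_ptpt = rte.map(to_ptpt)
```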
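Since the Albertina models are DeBERTa-style masked-language encoders, the released checkpoints can be exercised directly with the `fill-mask` pipeline. A usage sketch only: the model id below is inferred from the `-256` checkpoint linked in the diff and should be verified against the PORTULAN organisation page.

```python
# Sketch: masked-token prediction with an Albertina checkpoint.
from transformers import pipeline

# Model id inferred from the sibling "-256" repository name; verify before use.
unmasker = pipeline("fill-mask", model="PORTULAN/albertina-1b5-portuguese-ptpt-encoder")

for pred in unmasker("A culinária portuguesa é deliciosa e [MASK]."):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```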