jarodrigues committed · Commit 0fc7a14 · 1 parent: 68d6838

Update README.md

README.md CHANGED
@@ -104,7 +104,7 @@ We skipped the default filtering of stopwords since it would disrupt the syntact
 As codebase, we resorted to the [DeBERTa V2 xxlarge](https://huggingface.co/microsoft/deberta-v2-xxlarge), for English.
 
 To train **Albertina 1.5B PT-PT**, the data set was tokenized with the original DeBERTa tokenizer with a 128-token sequence truncation and dynamic padding for 250k steps,
-a 256-token sequence truncation for 80k steps and finally a 512-token sequence truncation for 60k steps.
+a 256-token sequence truncation for 80k steps ([**Albertina 1.5B PT-PT 256**](https://huggingface.co/PORTULAN/albertina-1b5-portuguese-ptpt-encoder-256)) and finally a 512-token sequence truncation for 60k steps.
 These steps correspond to the equivalent setup of 48 hours on an a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences, 24 hours of computation for the 256-token
 input sequences and 24 hours of computation for the 512-token input sequences.
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
@@ -113,21 +113,35 @@ We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 
 # Evaluation
 
-The base model version was evaluated on downstream tasks, namely the translations into PT-PT of the English data sets used for a few of the tasks in the widely-used [GLUE benchmark](https://huggingface.co/datasets/glue).
 
-## GLUE and SUPERGLUE tasks translated
 
-We resorted to [HyperGlue-PT](?), a **PT-PT version of the GLUE and SUPERGLUE** benchmark.
-We automatically translated the tasks from GLUE and SUPERGLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
-
-| Model                      | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) | COPA (Accuracy) | CB (F1) | MultiRC (F1) | BoolQ (Accuracy) |
-|---------------------------|----------------|----------------|-----------|-----------------|-----------------|---------|--------------|------------------|
-| **Albertina 1.5B PT-PT**   | **0.?** | 0. | **0. ** | **0. ** | ? | | | |
-| **Albertina PT-PT (900M)** | 0.8339 | 0.4225 | 0.9171 | 0.8801 | 0.7300 | 0.4739 | 0.6782 | 0.8437 |
-| **Albertina 100M PT-PT**   | 0.5848 | 0.5634 | 0.8793 | 0.8624 | n.a. | 0.4734 | 0.6564 | 0.7700 |
+We resorted to [HyperGlue-PT](?), a **PT-PT version of the GLUE and SUPERGLUE** benchmark.
+We automatically translated the tasks from GLUE and SUPERGLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
+
+| Model                         | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) | COPA (Accuracy) | CB (F1)    | MultiRC (F1) | BoolQ (Accuracy) |
+|-------------------------------|----------------|----------------|-----------|-----------------|-----------------|------------|--------------|------------------|
+| **Albertina 1.5B PT-PT**      | **0.8809**     | 0.4742         | 0.8457    | **0.9034**      | **0.8433**      | **0.7840** | **0.7688**   | **0.8602**       |
+| **Albertina 1.5B PT-PT 256**  | 0.8809         | 0.5493         | 0.8752    | 0.8795          | 0.8400          | 0.5832     | 0.6791       | 0.8496           |
+| **Albertina 900M PT-PT**      | 0.8339         | 0.4225         | **0.9171**| 0.8801          | 0.7033          | 0.6018     | 0.6728       | 0.8224           |
+| **Albertina 100M PT-PT**      | 0.6919         | 0.4742         | 0.8047    | 0.8590          | n.a.            | 0.4529     | 0.6481       | 0.7578           |
+||||||||||
+| **DeBERTa 1.5B EN**           | 0.8147         | 0.4554         | 0.8696    | 0.8557          | 0.5167          | 0.4901     | 0.6687       | 0.8347           |
+| **DeBERTa 100M EN**           | 0.6029         | **0.5634**     | 0.7802    | 0.8320          | n.a.            | 0.4698     | 0.6368       | 0.6829           |
 
 
+**For the PT-BR model:**
 
+| Model                         | RTE (Accuracy) | WNLI (Accuracy)| MRPC (F1) | STS-B (Pearson) | COPA (Accuracy) | CB (F1)    | MultiRC (F1) | BoolQ (Accuracy) |
+|-------------------------------|----------------|----------------|-----------|-----------------|-----------------|------------|--------------|------------------|
+| **Albertina 1.5B PT-BR**      | **0.8676**     | 0.4742         | 0.8622    | **0.9007**      | 0.7767          | 0.6372     | **0.7667**   | **0.8654**       |
+| **Albertina 1.5B PT-BR 256**  | 0.8123         | 0.4225         | 0.8638    | 0.8968          | **0.8533**      | **0.6884** | 0.6799       | 0.8509           |
+| **Albertina 900M PT-BR**      | 0.7545         | 0.4601         | **0.9071**| 0.8910          | 0.7767          | 0.5799     | 0.6731       | 0.8385           |
+| **BERTimbau (335M)**          | 0.6446         | **0.5634**     | 0.8873    | 0.8842          | 0.6933          | 0.5438     | 0.6787       | 0.7783           |
+| **Albertina 100M PT-BR**      | 0.6582         | **0.5634**     | 0.8149    | 0.8489          | n.a.            | 0.4771     | 0.6469       | 0.7537           |
+||||||||||
+| **DeBERTa 1.5B EN**           | 0.7112         | **0.5634**     | 0.8545    | 0.0123          | 0.5700          | 0.4307     | 0.3639       | 0.6217           |
+| **DeBERTa 100M EN**           | 0.5716         | 0.5587         | 0.8060    | 0.8266          | n.a.            | 0.4739     | 0.6391       | 0.6838           |
 
 <br>
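To make the pre-training recipe in the diff concrete, the sketch below maps the stated hyper-parameters (128-token truncation with dynamic padding, 250k steps, learning rate 1e-5, linear decay, 10k warm-up steps) onto the standard Hugging Face `transformers` training loop. This is a minimal sketch, not the authors' training script: the corpus, batch size, and masking probability are illustrative assumptions; only the values named in the card are taken from it.

```python
# Sketch of the first pre-training phase described in the card (assumptions noted).
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# The card states the DeBERTa V2 xxlarge codebase and original tokenizer were used.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xxlarge")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-v2-xxlarge")

def tokenize(batch):
    # 128-token sequence truncation; padding is left to the collator, which
    # pads each batch only to its longest member ("dynamic padding").
    return tokenizer(batch["text"], truncation=True, max_length=128)

# MLM collator; the 0.15 masking probability is an assumption, not from the card.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="albertina-1b5-ptpt-phase1",
    max_steps=250_000,               # 250k steps at 128 tokens (phase 1)
    learning_rate=1e-5,              # stated in the card
    lr_scheduler_type="linear",      # linear decay
    warmup_steps=10_000,             # 10k warm-up steps
    per_device_train_batch_size=8,   # assumption; the card does not give it
)

# train_dataset = raw_corpus.map(tokenize, batched=True)  # hypothetical corpus
# Trainer(model=model, args=args, data_collator=collator,
#         train_dataset=train_dataset).train()
```

The 256- and 512-token phases described in the card would repeat the same setup with `max_length` raised and `max_steps` set to 80k and 60k respectively.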
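The translation step behind HyperGlue-PT can be approximated with DeepL's official Python client, which exposes European Portuguese as a distinct target variant. A minimal sketch, assuming the English tasks are taken from the Hugging Face `glue` dataset; the API key is a placeholder and the text columns vary per task.

```python
# Sketch: machine-translating one GLUE task (RTE) into PT-PT with DeepL.
import deepl
from datasets import load_dataset

translator = deepl.Translator("YOUR_DEEPL_API_KEY")  # placeholder key

rte = load_dataset("glue", "rte", split="validation")

def to_ptpt(example):
    # DeepL offers "PT-PT" and "PT-BR" as separate target languages.
    for column in ("sentence1", "sentence2"):  # RTE's two text columns
        example[column] = translator.translate_text(
            example[column], source_lang="EN", target_lang="PT-PT"
        ).text
    return example

rte_ptpt = rte.map(to_ptpt)
```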
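Since the Albertina models are DeBERTa-style masked-language encoders, the released checkpoints can be exercised directly with the `fill-mask` pipeline. A usage sketch only: the model id below is inferred from the `-256` checkpoint linked in the diff and should be verified against the PORTULAN organisation page.

```python
# Sketch: masked-token prediction with an Albertina checkpoint.
from transformers import pipeline

# Model id inferred from the sibling "-256" repository name; verify before use.
unmasker = pipeline("fill-mask", model="PORTULAN/albertina-1b5-portuguese-ptpt-encoder")

for pred in unmasker("A culinária portuguesa é deliciosa e [MASK]."):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```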