jarodrigues committed
Commit 0fc7a14
1 Parent(s): 68d6838

Update README.md

Files changed (1)
  1. README.md +24 -10
README.md CHANGED
@@ -104,7 +104,7 @@ We skipped the default filtering of stopwords since it would disrupt the syntact
  As codebase, we resorted to the [DeBERTa V2 xxlarge](https://huggingface.co/microsoft/deberta-v2-xxlarge) for English.
 
  To train **Albertina 1.5B PT-PT**, the data set was tokenized with the original DeBERTa tokenizer with a 128-token sequence truncation and dynamic padding for 250k steps,
- a 256-token sequence truncation for 80k steps and finally a 512-token sequence truncation for 60k steps.
+ a 256-token sequence truncation for 80k steps ([**Albertina 1.5B PT-PT 256**](https://huggingface.co/PORTULAN/albertina-1b5-portuguese-ptpt-encoder-256)) and finally a 512-token sequence truncation for 60k steps.
  These stages correspond, respectively, to about 48 hours of computation on an a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences, 24 hours for the 256-token input sequences, and 24 hours for the 512-token input sequences.
  We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
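As a rough illustration of this recipe, here is a minimal sketch of the 128-token stage using the Hugging Face `transformers` utilities. It is not the released training code: the data pipeline and distributed setup are omitted, and the 15% masking rate is an assumption, since the README does not state it.

```python
from torch.optim import AdamW
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    get_linear_schedule_with_warmup,
)

# The original DeBERTa tokenizer and the DeBERTa V2 xxlarge codebase.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xxlarge")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-v2-xxlarge")

def tokenize(batch):
    # 128-token sequence truncation; padding is deferred to the collator,
    # which pads each batch only to its longest sequence (dynamic padding).
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Dynamic padding plus MLM masking (15% is assumed, not stated above).
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

# 1e-5 learning rate with 10k warm-up steps and linear decay over the
# 250k steps of the 128-token stage.
optimizer = AdamW(model.parameters(), lr=1e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10_000, num_training_steps=250_000
)
```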
@@ -113,21 +113,35 @@ We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 
  # Evaluation
 
- The base model version was evaluated on downstream tasks, namely on translations into PT-PT of the English data sets used for a few of the tasks in the widely-used [GLUE benchmark](https://huggingface.co/datasets/glue).
-
- ## GLUE and SUPERGLUE tasks translated
-
- We resorted to [HyperGlue-PT](?), a **PT-PT version of the GLUE and SUPERGLUE** benchmarks.
- We automatically translated the tasks from GLUE and SUPERGLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
-
- | Model                      | RTE (Accuracy) | WNLI (Accuracy) | MRPC (F1) | STS-B (Pearson) | COPA (Accuracy) | CB (F1) | MultiRC (F1) | BoolQ (Accuracy) |
- |----------------------------|----------------|-----------------|-----------|-----------------|-----------------|---------|--------------|------------------|
- | **Albertina 1.5B PT-PT**   | **0.?**        | 0.              | **0.**    | **0.**          | ?               |         |              |                  |
- | **Albertina PT-PT (900M)** | 0.8339         | 0.4225          | 0.9171    | 0.8801          | 0.7300          | 0.4739  | 0.6782       | 0.8437           |
- | **Albertina 100M PT-PT**   | 0.5848         | 0.5634          | 0.8793    | 0.8624          | n.a.            | 0.4734  | 0.6564       | 0.7700           |
 
+ We resorted to [HyperGlue-PT](?), a **PT-PT version of the GLUE and SUPERGLUE** benchmarks.
+ We automatically translated the tasks from GLUE and SUPERGLUE using [DeepL Translate](https://www.deepl.com/), which specifically provides translation from English to PT-PT as an option.
+
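For reference, this translation setup can be reproduced with DeepL's official Python client (`pip install deepl`); the auth key below is a placeholder and the per-task data handling is omitted.

```python
import deepl

translator = deepl.Translator("YOUR_DEEPL_AUTH_KEY")  # placeholder key

# DeepL treats European Portuguese (PT-PT) and Brazilian Portuguese
# (PT-BR) as distinct target languages, which is what makes it suitable
# for building a specifically PT-PT benchmark.
result = translator.translate_text(
    "The doctor gave the patient her prescription.",
    source_lang="EN",
    target_lang="PT-PT",
)
print(result.text)
```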
+ | Model                        | RTE (Accuracy) | WNLI (Accuracy) | MRPC (F1) | STS-B (Pearson) | COPA (Accuracy) | CB (F1)    | MultiRC (F1) | BoolQ (Accuracy) |
+ |------------------------------|----------------|-----------------|-----------|-----------------|-----------------|------------|--------------|------------------|
+ | **Albertina 1.5B PT-PT**     | **0.8809**     | 0.4742          | 0.8457    | **0.9034**      | **0.8433**      | **0.7840** | **0.7688**   | **0.8602**       |
+ | **Albertina 1.5B PT-PT 256** | 0.8809         | 0.5493          | 0.8752    | 0.8795          | 0.8400          | 0.5832     | 0.6791       | 0.8496           |
+ | **Albertina 900M PT-PT**     | 0.8339         | 0.4225          | **0.9171**| 0.8801          | 0.7033          | 0.6018     | 0.6728       | 0.8224           |
+ | **Albertina 100M PT-PT**     | 0.6919         | 0.4742          | 0.8047    | 0.8590          | n.a.            | 0.4529     | 0.6481       | 0.7578           |
+ ||||||||||
+ | **DeBERTa 1.5B EN**          | 0.8147         | 0.4554          | 0.8696    | 0.8557          | 0.5167          | 0.4901     | 0.6687       | 0.8347           |
+ | **DeBERTa 100M EN**          | 0.6029         | **0.5634**      | 0.7802    | 0.8320          | n.a.            | 0.4698     | 0.6368       | 0.6829           |
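For context on how scores like these are obtained, below is a hedged sketch of loading the checkpoint for one of the two-way classification tasks (RTE). The model id is an assumption inferred from the 256-token variant linked above, and the classification head must still be fine-tuned on the translated task before its accuracy is meaningful.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model id, inferred from the "-256" variant linked above.
name = "PORTULAN/albertina-1b5-portuguese-ptpt-encoder"

tokenizer = AutoTokenizer.from_pretrained(name)
# Two labels for RTE-style entailment (entailed / not entailed); the
# freshly initialised head is then fine-tuned on HyperGlue-PT RTE.
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

inputs = tokenizer(
    "O gato dorme no sofá.",          # premise
    "Há um animal em cima do sofá.",  # hypothesis
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
logits = model(**inputs).logits  # argmax over the two labels gives the prediction
```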
 
 
+ **For the PT-BR model:**
+
+ | Model                        | RTE (Accuracy) | WNLI (Accuracy) | MRPC (F1) | STS-B (Pearson) | COPA (Accuracy) | CB (F1)    | MultiRC (F1) | BoolQ (Accuracy) |
+ |------------------------------|----------------|-----------------|-----------|-----------------|-----------------|------------|--------------|------------------|
+ | **Albertina 1.5B PT-BR**     | **0.8676**     | 0.4742          | 0.8622    | **0.9007**      | 0.7767          | 0.6372     | **0.7667**   | **0.8654**       |
+ | **Albertina 1.5B PT-BR 256** | 0.8123         | 0.4225          | 0.8638    | 0.8968          | **0.8533**      | **0.6884** | 0.6799       | 0.8509           |
+ | **Albertina 900M PT-BR**     | 0.7545         | 0.4601          | **0.9071**| 0.8910          | 0.7767          | 0.5799     | 0.6731       | 0.8385           |
+ | **BERTimbau (335M)**         | 0.6446         | **0.5634**      | 0.8873    | 0.8842          | 0.6933          | 0.5438     | 0.6787       | 0.7783           |
+ | **Albertina 100M PT-BR**     | 0.6582         | **0.5634**      | 0.8149    | 0.8489          | n.a.            | 0.4771     | 0.6469       | 0.7537           |
+ ||||||||||
+ | **DeBERTa 1.5B EN**          | 0.7112         | **0.5634**      | 0.8545    | 0.0123          | 0.5700          | 0.4307     | 0.3639       | 0.6217           |
+ | **DeBERTa 100M EN**          | 0.5716         | 0.5587          | 0.8060    | 0.8266          | n.a.            | 0.4739     | 0.6391       | 0.6838           |
  <br>