---
license: apache-2.0
datasets:
- uonlp/CulturaX
language:
- de
tags:
- german
- electra
- teams
- culturax
- gerturax-1
---
# 🇩🇪 GERTuraX-1
This repository hosts the GERTuraX-1 model:
* GERTuraX-1 is a German encoder-only model, based on ELECTRA and pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
* It was trained on 147GB of plain text from the [CulturaX](https://huggingface.co./datasets/uonlp/CulturaX) corpus.
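A minimal feature-extraction sketch with the `transformers` library, assuming the checkpoint loads as a standard ELECTRA encoder via `AutoModel` (the example sentence and printed shape are illustrative):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load GERTuraX-1 from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-1")
model = AutoModel.from_pretrained("gerturax/gerturax-1")

# Encode a German sentence and extract contextual token embeddings.
inputs = tokenizer("Heute ist ein schöner Tag in München.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One hidden-state vector per subword token.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 11, 768])
```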
# Pretraining
The [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms) repo was used to train an ELECTRA
model using the very efficient [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
As the pretraining corpus, 147GB of plain text was extracted from the [CulturaX](https://huggingface.co./datasets/uonlp/CulturaX) corpus.
GERTuraX-1 uses a cased 64k vocabulary and was trained for 1M steps on a v3-32 TPU Pod. The pretraining took 2.6 days.
The TensorBoard can be found [here](../../tensorboard).
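As a quick sanity check of the vocabulary, the tokenizer can be inspected directly; a hedged sketch (the sentence is illustrative, and the exact subword splits depend on the trained vocabulary):

```python
from transformers import AutoTokenizer

# The cased 64k vocabulary should keep capitalized German nouns largely intact.
tokenizer = AutoTokenizer.from_pretrained("gerturax/gerturax-1")

print(tokenizer.tokenize("Die Donau fließt durch das bayerische Oberland."))
print(len(tokenizer))  # vocabulary size, expected to be around 64k
```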
# Evaluation
GERTuraX-1 was evaluated on GermEval 2014 (NER), GermEval 2018 (offensive language detection), CoNLL-2003 (German NER), and the ScandEval benchmark.
For GermEval 2014, GermEval 2018, and CoNLL-2003 we use the same hyper-parameters as the [GeBERTa](https://arxiv.org/abs/2310.07321) paper (cf. Table 5), perform 5 runs with different seeds, and report the averaged score.
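The reported numbers are the mean and standard deviation over the 5 seeds; a minimal sketch of that aggregation (the F1 values below are placeholders, not actual results):

```python
from statistics import mean, stdev

# Placeholder test F1-scores from 5 fine-tuning runs with different seeds.
f1_scores = [87.10, 87.25, 87.05, 87.30, 87.20]

# Report as "mean ± standard deviation", as in the result tables below.
print(f"{mean(f1_scores):.2f} ± {stdev(f1_scores):.2f}")
```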
## GermEval 2014
### GermEval 2014 - Original version
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co./deepset/gbert-base) | 87.53 ± 0.22 | 86.81 ± 0.16 |
| [GERTuraX-1](https://huggingface.co./gerturax/gerturax-1) (147GB) | 88.32 ± 0.21 | 87.18 ± 0.12 |
| [GERTuraX-2](https://huggingface.co./gerturax/gerturax-2) (486GB) | 88.58 ± 0.32 | 87.58 ± 0.15 |
| [GERTuraX-3](https://huggingface.co./gerturax/gerturax-3) (1.1TB) | 88.90 ± 0.06 | 87.84 ± 0.18 |
| [GeBERTa Base](https://huggingface.co./ikim-uk-essen/geberta-base) | 88.79 ± 0.16 | 88.03 ± 0.16 |
### GermEval 2014 - [Without Wikipedia](https://huggingface.co./datasets/stefan-it/germeval14_no_wikipedia)
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co./deepset/gbert-base) | 90.48 ± 0.34 | 89.05 ± 0.21 |
| [GERTuraX-1](https://huggingface.co./gerturax/gerturax-1) (147GB) | 91.27 ± 0.11 | 89.73 ± 0.27 |
| [GERTuraX-2](https://huggingface.co./gerturax/gerturax-2) (486GB) | 91.70 ± 0.28 | 89.98 ± 0.22 |
| [GERTuraX-3](https://huggingface.co./gerturax/gerturax-3) (1.1TB) | 91.75 ± 0.17 | 90.24 ± 0.27 |
| [GeBERTa Base](https://huggingface.co./ikim-uk-essen/geberta-base) | 91.74 ± 0.23 | 90.28 ± 0.21 |
## GermEval 2018
### GermEval 2018 - Fine Grained
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co./deepset/gbert-base) | 63.66 ± 4.08 | 51.86 ± 1.31 |
| [GERTuraX-1](https://huggingface.co./gerturax/gerturax-1) (147GB) | 62.87 ± 1.95 | 50.61 ± 0.36 |
| [GERTuraX-2](https://huggingface.co./gerturax/gerturax-2) (486GB) | 64.37 ± 1.31 | 51.02 ± 0.90 |
| [GERTuraX-3](https://huggingface.co./gerturax/gerturax-3) (1.1TB) | 66.39 ± 0.85 | 49.94 ± 2.06 |
| [GeBERTa Base](https://huggingface.co./ikim-uk-essen/geberta-base) | 65.81 ± 3.29 | 52.45 ± 0.57 |
### GermEval 2018 - Coarse Grained
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co./deepset/gbert-base) | 83.15 ± 1.83 | 76.39 ± 0.64 |
| [GERTuraX-1](https://huggingface.co./gerturax/gerturax-1) (147GB) | 83.72 ± 0.68 | 77.11 ± 0.59 |
| [GERTuraX-2](https://huggingface.co./gerturax/gerturax-2) (486GB) | 84.51 ± 0.88 | 78.07 ± 0.91 |
| [GERTuraX-3](https://huggingface.co./gerturax/gerturax-3) (1.1TB) | 84.33 ± 1.48 | 78.44 ± 0.74 |
| [GeBERTa Base](https://huggingface.co./ikim-uk-essen/geberta-base) | 83.54 ± 1.27 | 78.36 ± 0.79 |
## CoNLL-2003 - German, Revised
| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co./deepset/gbert-base) | 92.15 ± 0.10 | 88.73 ± 0.21 |
| [GERTuraX-1](https://huggingface.co./gerturax/gerturax-1) (147GB) | 92.32 ± 0.14 | 90.09 ± 0.12 |
| [GERTuraX-2](https://huggingface.co./gerturax/gerturax-2) (486GB) | 92.75 ± 0.20 | 90.15 ± 0.14 |
| [GERTuraX-3](https://huggingface.co./gerturax/gerturax-3) (1.1TB) | 92.77 ± 0.28 | 90.83 ± 0.16 |
| [GeBERTa Base](https://huggingface.co./ikim-uk-essen/geberta-base) | 92.87 ± 0.21 | 90.94 ± 0.24 |
## ScandEval
We use v12.10.5 of [ScandEval](https://github.com/ScandEval/ScandEval) to evaluate on the following tasks:
* SB10k
* ScaLA-De
* GermanQuAD
The package can be installed via:
```bash
$ pip3 install "scandeval[all]==12.10.5"
```
### Results
#### SB10k
Evaluations on the SB10k dataset can be started as follows:
```bash
$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de
```
| Model Name | Matthews CC | Macro F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co./deepset/gbert-base) | 59.58 ± 1.80 | 72.98 ± 1.20 |
| [GERTuraX-1](https://huggingface.co./gerturax/gerturax-1) (147GB) | 61.56 ± 2.58 | 74.18 ± 1.77 |
| [GERTuraX-2](https://huggingface.co./gerturax/gerturax-2) (486GB) | 65.24 ± 1.77 | 76.55 ± 1.22 |
| [GERTuraX-3](https://huggingface.co./gerturax/gerturax-3) (1.1TB) | 64.33 ± 2.17 | 75.99 ± 1.40 |
| [GeBERTa Base](https://huggingface.co./ikim-uk-essen/geberta-base) | 59.52 ± 2.14 | 72.76 ± 1.50 |
#### ScaLA-De
Evaluations on the ScaLA-De dataset can be started as follows:
```bash
$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de
```
| Model Name | Matthews CC | Macro F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co./deepset/gbert-base) | 52.23 ± 4.34 | 73.90 ± 2.68 |
| [GERTuraX-1](https://huggingface.co./gerturax/gerturax-1) (147GB) | 74.55 ± 1.28 | 86.88 ± 0.75 |
| [GERTuraX-2](https://huggingface.co./gerturax/gerturax-2) (486GB) | 75.83 ± 2.85 | 87.59 ± 1.57 |
| [GERTuraX-3](https://huggingface.co./gerturax/gerturax-3) (1.1TB) | 78.24 ± 1.25 | 88.83 ± 0.63 |
| [GeBERTa Base](https://huggingface.co./ikim-uk-essen/geberta-base) | 59.70 ± 11.64 | 78.44 ± 6.12 |
#### GermanQuAD
Evaluations on the GermanQuAD dataset can be started as follows:
```bash
$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de
```
| Model Name | Exact Match | F1-Score |
| ----------------------------------------------------------------------------------- | ------------------------- | ------------------ |
| [GBERT Base](https://huggingface.co./deepset/gbert-base) | 12.62 ± 2.20 | 29.62 ± 3.86 |
| [GERTuraX-1](https://huggingface.co./gerturax/gerturax-1) (147GB) | 27.24 ± 1.05 | 52.01 ± 1.10 |
| [GERTuraX-2](https://huggingface.co./gerturax/gerturax-2) (486GB) | 29.54 ± 1.05 | 55.12 ± 0.92 |
| [GERTuraX-3](https://huggingface.co./gerturax/gerturax-3) (1.1TB) | 28.49 ± 1.21 | 54.83 ± 1.26 |
| [GeBERTa Base](https://huggingface.co./ikim-uk-essen/geberta-base) | 28.81 ± 1.77 | 53.27 ± 1.92 |
# ❤️ Acknowledgements
GERTuraX is the outcome of the last 12 months of working with TPUs, generously provided by the awesome [TRC program](https://sites.research.google/trc/about/),
and with the [TensorFlow Model Garden](https://github.com/tensorflow/models) library.
Many thanks for providing TPUs!
Made from Bavarian Oberland with ❤️ and 🥨. |