German Wikipedia LMs

non-profit

AI & ML interests

language modeling

Organization Card

German Wikipedia LMs (GWLMs)

We present language models (BERT, BERT with Token Dropping, TEAMS, T5) pretrained on German Wikipedia.

This is an ongoing project!
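
As a rough illustration of how one of the pretrained checkpoints might be queried, the sketch below runs masked-token prediction with the `transformers` fill-mask pipeline. The model ID is taken from the model list on this page. The template sentence, the `{mask}` placeholder convention, and the helper names are assumptions made for this example, and it is also an assumption that the checkpoint ships weights the pipeline can load directly.

```python
# Minimal sketch, not an official usage recipe for these models.
MODEL_ID = "gwlms/bert-base-token-dropping-dewiki-v1"  # listed on this page


def build_masked_sentence(template: str, mask_token: str) -> str:
    """Replace the hypothetical '{mask}' placeholder with the tokenizer's mask token."""
    return template.replace("{mask}", mask_token)


def top_predictions(template: str, model_id: str = MODEL_ID, k: int = 5):
    """Return the k most likely fillers for the masked position as (token, score) pairs."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    sentence = build_masked_sentence(template, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(sentence, top_k=k)]
```

Calling `top_predictions("Berlin ist die {mask} von Deutschland.")` would download the checkpoint on first use and return the highest-scoring candidate tokens.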

German Wikipedia Corpus

We use a recent Wikipedia dump, which can be accessed here. Additionally, a sentence-segmented version (segmented using NLTK) is available here.
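
For readers who want to inspect the corpus before committing to a full download, the following sketch streams a few records with the `datasets` library. The dataset ID comes from the dataset list on this page. The `train` split name and the helper names are assumptions, and the record fields are not assumed: the code simply shortens whatever columns it finds.

```python
from itertools import islice

DATASET_ID = "gwlms/dewiki-20230701"  # listed on this page


def shorten(record: dict, width: int = 80) -> dict:
    """Truncate every field's string form to `width` characters for display."""
    return {key: str(value)[:width] for key, value in record.items()}


def preview_corpus(dataset_id: str = DATASET_ID, n: int = 3, split: str = "train"):
    """Stream the first n records without downloading the whole dataset."""
    # Imported lazily so `shorten` works without the datasets library installed.
    from datasets import load_dataset

    # The split name is an assumption; adjust it if the dataset uses another one.
    stream = load_dataset(dataset_id, split=split, streaming=True)
    return [shorten(record) for record in islice(stream, n)]
```

Passing `"gwlms/dewiki-20230701-nltk-corpus"` as `dataset_id` would preview the sentence-segmented variant in the same way, under the same split-name assumption.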

Fine-tuned Models

We fine-tuned NER models using the SpanMarker library on the GermEval 2014 NER dataset and uploaded the best models.
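
A fine-tuned model can be tried out along the following lines with the `span_marker` package. The model ID is one of the SpanMarker checkpoints in the model list on this page. The helper names and the output formatting are illustrative choices for this sketch, not part of the project.

```python
MODEL_ID = "gwlms/span-marker-bert-germeval14"  # listed on this page


def tag_entities(text: str, model_id: str = MODEL_ID) -> list:
    """Run NER on one sentence and return SpanMarker's list of entity dicts."""
    # Imported lazily so `format_entities` works without span_marker installed.
    from span_marker import SpanMarkerModel

    model = SpanMarkerModel.from_pretrained(model_id)
    return model.predict(text)


def format_entities(entities: list) -> list:
    """Render each predicted entity as 'span [label] (score)'."""
    return [f'{e["span"]} [{e["label"]}] ({e["score"]:.2f})' for e in entities]
```

For example, `format_entities(tag_entities("Angela Merkel besuchte Hamburg."))` would load the checkpoint and list the detected spans with their labels and confidence scores.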

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️

models 15

gwlms/deberta-tokenizer

Updated Nov 30, 2024 • 8

gwlms/roberta-tokenizer

Updated Nov 26, 2024 • 8

gwlms/t5-efficient-small-dewiki-v1

Text2Text Generation • Updated Apr 19, 2024 • 115

gwlms/byt5-small-dewiki-v1

Text2Text Generation • Updated Apr 19, 2024 • 10

gwlms/t5-efficient-base-dewiki-v1

Text2Text Generation • Updated Apr 19, 2024 • 8

gwlms/span-marker-bert-germeval14

Token Classification • Updated Apr 19, 2024 • 18

gwlms/span-marker-token-dropping-bert-germeval14

Token Classification • Updated Apr 19, 2024 • 22

gwlms/span-marker-teams-germeval14

Token Classification • Updated Apr 19, 2024 • 20

gwlms/teams-base-dewiki-v1-discriminator

Updated Apr 19, 2024 • 11

gwlms/bert-base-token-dropping-dewiki-v1

Fill-Mask • Updated Sep 6, 2023 • 17

datasets 9

gwlms/dewiki-20230701-flair-corpus

Viewer • Updated Jun 10, 2024 • 45.6M • 69

gwlms/validation

Viewer • Updated Jan 5, 2024 • 15.6k • 67

gwlms/biofid

Updated Aug 23, 2023 • 33

gwlms/germeval2014

Updated Jul 31, 2023 • 90

gwlms/germeval2018

Updated Jul 26, 2023 • 67

gwlms/dewiki-20230701-chunks

Updated Jul 19, 2023 • 98

gwlms/dewiki-20230701-tfrecords-dupe5

Updated Jul 19, 2023 • 112

gwlms/dewiki-20230701-nltk-corpus

Viewer • Updated Jul 19, 2023 • 61.6M • 76

gwlms/dewiki-20230701

Viewer • Updated Jul 19, 2023 • 2.73M • 80