German Wikipedia LMs

non-profit

AI & ML interests

language modeling

German Wikipedia LMs (GWLMs)

We present Language Models (BERT, BERT with Token Dropping, TEAMS, T5) pretrained on German Wikipedia.

This is an ongoing project!

German Wikipedia Corpus

We use a recent Wikipedia dump, which can be accessed here. Additionally, a sentence-segmented version (segmented using NLTK) is available here.
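As a rough illustration of sentence segmentation with NLTK, here is a minimal sketch. It is not the exact preprocessing script used for the corpus: the function name `segment_sentences` and the sample text are hypothetical, and the sketch assumes an untrained `PunktSentenceTokenizer` so that it runs without downloading any NLTK data. A pretrained German Punkt model would likely handle abbreviations better.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer


def segment_sentences(text, tokenizer=None):
    """Split raw text into a list of sentences (hypothetical helper)."""
    # Assumption: fall back to an untrained Punkt tokenizer, which needs
    # no downloaded model data but knows no German abbreviations.
    tokenizer = tokenizer or PunktSentenceTokenizer()
    # Drop empty results and surrounding whitespace.
    return [s.strip() for s in tokenizer.tokenize(text) if s.strip()]


# Illustrative sample text, not taken from the corpus.
article = (
    "Berlin ist die Hauptstadt von Deutschland. "
    "Die Stadt liegt an der Spree."
)

# Write one sentence per line, a common layout for pretraining corpora.
for sentence in segment_sentences(article):
    print(sentence)
```

Passing a different tokenizer object with a `tokenize` method via the `tokenizer` argument swaps in another segmentation model without changing the rest of the code.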

Fine-tuned Models

We fine-tuned NER models using the SpanMarker library on the GermEval 2014 NER dataset and uploaded the best models:

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️