hmByT5 - Preliminary Language Models

Preliminary Historic Multilingual and Monolingual ByT5 Models. The following languages are currently covered:

Dutch (Delpher Corpus)

More details can be found in our GitHub repository.

Pretraining

We use the official JAX/Flax example in Hugging Face Transformers to pretrain a ByT5 model on a single v3-8 TPU. Details about the training can be found here.
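For context, ByT5 works directly on UTF-8 bytes rather than a learned subword vocabulary, which helps with historic spellings and OCR noise. The sketch below is not the tokenizer code from Transformers. It is a minimal pure-Python illustration that assumes the usual ByT5 id layout (pad = 0, eos = 1, unk = 2, then each byte value shifted by 3). The names `byt5_encode` and `byt5_decode` and the example word are hypothetical.

```python
# Illustrative sketch of ByT5-style byte-level encoding (not the library implementation).
# Assumption: ids 0, 1, 2 are reserved for <pad>, </s>, <unk>, so byte b maps to id b + 3.
from __future__ import annotations

SPECIAL_TOKEN_OFFSET = 3  # number of reserved special-token ids (assumed layout)
EOS_ID = 1                # id of the end-of-sequence token (assumed layout)


def byt5_encode(text: str) -> list[int]:
    """Map text to byte-level ids and append the end-of-sequence id."""
    return [b + SPECIAL_TOKEN_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byt5_decode(ids: list[int]) -> str:
    """Invert byt5_encode, skipping every id that does not stand for a byte."""
    data = bytes(
        i - SPECIAL_TOKEN_OFFSET
        for i in ids
        if SPECIAL_TOKEN_OFFSET <= i < SPECIAL_TOKEN_OFFSET + 256
    )
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    word = "menſch"  # hypothetical example containing a long s
    ids = byt5_encode(word)
    print(len(word), len(ids))  # characters vs. ids
    print(byt5_decode(ids))
```

The long s takes two bytes in UTF-8, so the six-character word yields seven byte ids plus the end-of-sequence id. Sequences therefore grow with every non-ASCII character, which is worth keeping in mind when choosing a maximum sequence length.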

Evaluation on Downstream Tasks (NER)

We evaluated the hmByT5 model on the ICDAR Europeana dataset, with five runs per configuration:

Configuration	Run 1	Run 2	Run 3	Run 4	Run 5	Avg.
`wsFalse-bs4-e10-lr0.00015-poolingfirst`	88.02	88.71	87.17	87.00	88.62	87.90 ± 0.71
`wsFalse-bs8-e10-lr0.00015-poolingfirst`	87.10	86.72	87.15	88.29	87.35	87.32 ± 0.53
`wsFalse-bs8-e10-lr0.00016-poolingfirst`	87.23	87.19	87.11	87.62	87.11	87.25 ± 0.19
`wsFalse-bs4-e10-lr0.00016-poolingfirst`	85.98	87.50	84.22	87.08	86.48	86.25 ± 1.14
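The Avg. column can be recomputed from the five runs. The sketch below does this for all four configurations, assuming the value after ± is the population standard deviation (dividing by N rather than N - 1). That is an assumption, although it agrees with the reported figures once rounded to two decimals. The helper name `summarize` is hypothetical.

```python
# Recompute "mean ± std" for each row of the results table.
# Assumption: the spread is the population standard deviation (divide by N, not N - 1).
from __future__ import annotations

from statistics import fmean, pstdev

# Scores copied from the table above, keyed by configuration name.
RUNS = {
    "wsFalse-bs4-e10-lr0.00015-poolingfirst": [88.02, 88.71, 87.17, 87.00, 88.62],
    "wsFalse-bs8-e10-lr0.00015-poolingfirst": [87.10, 86.72, 87.15, 88.29, 87.35],
    "wsFalse-bs8-e10-lr0.00016-poolingfirst": [87.23, 87.19, 87.11, 87.62, 87.11],
    "wsFalse-bs4-e10-lr0.00016-poolingfirst": [85.98, 87.50, 84.22, 87.08, 86.48],
}


def summarize(scores: list[float]) -> str:
    """Format the mean and population standard deviation with two decimals."""
    return f"{fmean(scores):.2f} ± {pstdev(scores):.2f}"


if __name__ == "__main__":
    for config, scores in RUNS.items():
        print(config, summarize(scores))
```

Using the sample standard deviation (`statistics.stdev`) instead would give larger spreads than those in the table, which is why the population variant is assumed here.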

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️