lecslab/glosslm
Multilingual interlinear glossed text (IGT) corpora and pretrained models. A minimal loading sketch follows the item list below.
Models:
- The base GlossLM model, pretrained on the 450k-example train split, covering nearly 2k languages.
- Base GlossLM with glosses normalized following the UniMorph schema; excludes segmented examples for the evaluation languages.
- The GlossLM model trained excluding segmented examples for the evaluation languages.

Corpora:
- The full pretraining corpus: 450k examples spanning nearly 2k languages.
- The pretraining corpus, split into train/dev/test splits for the experiments.
- The split pretraining corpus with glosses normalized following the UniMorph schema.
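The items above are standard Hugging Face artifacts, so they can be loaded with the transformers and datasets libraries. Below is a minimal sketch, assuming the base model is published at lecslab/glosslm (the repo named on this page) and the corpus at lecslab/glosslm-corpus (an assumed dataset id); the prompt template is illustrative only, so consult the model card for the exact input format.

```python
# Minimal loading sketch for the GlossLM collection.
# Assumptions (not confirmed by this page): the base model id is
# "lecslab/glosslm", the corpus dataset id is "lecslab/glosslm-corpus",
# and the prompt format below is illustrative, not the official template.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from datasets import load_dataset

model_id = "lecslab/glosslm"  # base GlossLM, a seq2seq glossing model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative input: the model maps a transcription (optionally with
# translation and language context) to an interlinear gloss line.
prompt = (
    "Provide the glosses for the following transcription.\n"
    "Transcription: los gatos duermen\n"
    "Translation: the cats sleep\n"
    "Glosses:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Assumed dataset id and split name for the 450k-example pretraining corpus.
corpus = load_dataset("lecslab/glosslm-corpus", split="train")
print(corpus[0])
```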