Add language information to model metadata
#10 opened by davanstrien (HF staff)
Thanks for sharing this incredible model! I've suggested language tags for the metadata section of the model based on the languages outlined in https://blog.salesforceairesearch.com/xgen/:
For Wikipedia, we cover 22 languages: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk, ja, zh, more than LLaMA (20 languages) and MPT (English only).
Since most tokens in the training data are English, you might prefer to list only English. I also didn't see any evaluation of downstream performance on non-English languages in your blog post, so you may prefer a different subset of languages from the one I have selected.
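For anyone who wants to adjust the subset, here is a minimal sketch of what the suggested metadata could look like. It assumes the tags go under a `language` key in the YAML front matter at the top of the model card's README.md; the helper name `build_front_matter` is hypothetical and only renders text, it does not touch the repository.

```python
# Language codes as quoted from the XGen blog post (Wikipedia coverage).
LANGUAGES = [
    "bg", "ca", "cs", "da", "de", "en", "es", "fr", "hr", "hu", "it",
    "nl", "pl", "pt", "ro", "ru", "sl", "sr", "sv", "uk", "ja", "zh",
]


def build_front_matter(languages):
    """Render a YAML front-matter block with a `language` list.

    Hypothetical helper for illustration; assumes the model card
    reads language tags from a top-level `language` key.
    """
    lines = ["---", "language:"]
    lines += [f"- {code}" for code in languages]
    lines.append("---")
    return "\n".join(lines)


# Full set suggested in this PR.
front_matter = build_front_matter(LANGUAGES)

# English-only alternative, if the other languages should not be advertised.
english_only = build_front_matter(["en"])

print(front_matter)
```

Passing a shorter list, such as `["en"]`, produces the English-only variant mentioned above.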