Add language information to model metadata

#10
by davanstrien HF staff - opened

Thanks for sharing this incredible model! I've suggested language tags for the metadata section of the model based on the languages outlined in https://blog.salesforceairesearch.com/xgen/:

"For Wikipedia, we cover 22 languages: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk, ja, zh, more than LLaMA (20 languages) and MPT (English only)."

Since most tokens in the training data are English, you might prefer to list only English. I also didn't see in your blog post whether you ran any additional evaluation of downstream performance for non-English languages, so you may prefer a different subset of languages from the one I have selected.
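
For reference, here is a rough sketch of what the suggested metadata could look like. It assumes the model card uses the usual YAML front matter with a `language` key holding a list of language codes. The helper `language_front_matter` is hypothetical and written only for illustration; it is not part of any library, and the exact diff in this PR may be laid out differently.

```python
# Language codes quoted from the XGen blog post (Wikipedia coverage).
WIKIPEDIA_LANGUAGES = [
    "bg", "ca", "cs", "da", "de", "en", "es", "fr", "hr", "hu", "it",
    "nl", "pl", "pt", "ro", "ru", "sl", "sr", "sv", "uk", "ja", "zh",
]


def language_front_matter(languages):
    """Render a YAML front-matter block with a `language` list.

    Hypothetical helper for illustration only.
    """
    lines = ["---", "language:"]
    # One YAML list item per language code.
    lines += [f"- {code}" for code in languages]
    lines.append("---")
    return "\n".join(lines)


if __name__ == "__main__":
    # Full list, as suggested in this PR.
    print(language_front_matter(WIKIPEDIA_LANGUAGES))
    # English-only alternative, if you prefer to tag just the dominant language.
    print(language_front_matter(["en"]))
```

Switching between the full list and an English-only tag is just a matter of which list goes under `language`, so it should be easy to adjust before merging.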

Ready to merge
This branch is ready to get merged automatically.
