About continuing training

by alielfilali01 - opened Feb 20

Feb 20

Hello guys, incredible work btw 🔥
I'd like to know whether you evaluated the model's performance on languages other than English. I'm interested in continuing to train this model on an Arabic corpus! Do you think it will maintain its performance on the embedding tasks as well? Would love to hear your thoughts on this.
best 🤗

cc : @Muennighoff

Muennighoff

GritLM org Feb 20

Thanks!
We evaluated it on TyDi QA - you can find the per-language metrics of this model here: https://huggingface.co./datasets/GritLM/results/blob/main/GritLM-7B/tydiqa_metrics.json
(the average is also reported in the paper)

Here's the GritLM-8x7B model: https://huggingface.co./datasets/GritLM/results/blob/main/GritLM-8x7B/tydiqa_metrics.json

We didn't test them on Arabic embedding tasks, but there are a bunch of Arabic datasets available in MTEB - would be great to get their performance!

wilfoderek

Feb 23

What languages does it support?

Muennighoff

GritLM org Feb 23

You can try any language, but it will probably work best for English and related languages.
