We at https://mii-llm.ai just released a new Italian LLM benchmark and a set of evaluations: MMLU-PRO-ITA. Thanks to @efederici, who released efederici/MMLU-Pro-ita, a machine-translated version of MMLU-PRO, and thanks to a shared community compute effort, we published the results for Italian open-source LLMs in the "Eval Aggiuntive" (additional evals) tab of https://huggingface.co./spaces/FinancialSupport/open_ita_llm_leaderboard . For more detail, read the blog article on HF: https://huggingface.co./blog/giux78/mmlu-pro-ita
@FinancialSupport and I just released a new version of the Italian LLMs leaderboard https://huggingface.co./spaces/FinancialSupport/open_ita_llm_leaderboard using the super useful https://huggingface.co./demo-leaderboard template from @clefourrier .
We’ve evaluated over 50 models (base, merged, fine-tuned, etc.) from:
- Major companies like Meta, Mistral, Google ...
- University groups such as https://huggingface.co./sapienzanlp or https://huggingface.co./swap-uniba
- Italian companies like https://huggingface.co./MoxoffSpA , https://huggingface.co./FairMind or https://huggingface.co./raicrits
- Various communities and individuals
All models were tested on #Italian benchmarks #mmlu #arc-c #hellaswag, which we contributed to the open-source lm-evaluation-harness library from https://huggingface.co./EleutherAI . Plus, you can now submit your model for automatic evaluation, thanks to compute sponsored by https://huggingface.co./seeweb .
Curious about the top Italian models? Check out the leaderboard and submit your model! https://huggingface.co./spaces/FinancialSupport/open_ita_llm_leaderboard
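Since the Italian benchmarks live in lm-evaluation-harness, a local run before submitting to the leaderboard might be sketched roughly as below. This only builds the `lm_eval` command line and does not execute it. The task names (`m_mmlu_it`, `arc_it`, `hellaswag_it`) are an assumption about how the Italian tasks are registered in the harness and should be checked against the task list of your installed version. The model id `your-org/your-model` and the helper `build_lm_eval_command` are hypothetical placeholders, and the leaderboard's own settings (few-shot counts, batch size) are not stated in the post.

```python
import shlex

# Assumed task names for the Italian MMLU, ARC-Challenge and HellaSwag
# benchmarks; verify them against your installed lm-evaluation-harness.
ITALIAN_TASKS = ["m_mmlu_it", "arc_it", "hellaswag_it"]


def build_lm_eval_command(model_id, tasks=ITALIAN_TASKS, batch_size=8):
    """Return the argv list for an lm_eval run on a Hugging Face model.

    Hypothetical helper for illustration; it is not part of the harness.
    """
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # "your-org/your-model" is a placeholder, not a real repository.
    command = build_lm_eval_command("your-org/your-model")
    # Print a shell-ready string; run it yourself once the task names are verified.
    print(shlex.join(command))
```

Returning the arguments as a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.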
giux78/ultrafeedback-binarized-preferences-cleaned-ita-ready Viewer • Updated Jan 18 • 60.9k • 40 • 2
giux78/50000-60900-ultrafeedback-binarized-preferences-cleaned-ita Viewer • Updated Jan 17 • 10.9k • 35
giux78/20000-50000-ultrafeedback-binarized-preferences-cleaned-ita Viewer • Updated Jan 17 • 30k • 41
giux78/10000-20000-ultrafeedback-binarized-preferences-cleaned-ita Viewer • Updated Jan 16 • 10k • 46