Open-LLM performances are plateauing, let's make the leaderboard steep again
Update leaderboard for fair model evaluation
This leaderboard has been evaluating LLMs since June 2024 on IFEval, MuSR, GPQA, MATH, BBH and MMLU-Pro.
Note: Blog post on why we made a new version of the Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
Note: The actual leaderboard! With a stylish new UX :)
Note: If you want to download the main leaderboard table, you'll find the dataset here!
Note: To extract more detailed aggregated results for each model, look here!
Note: All models ever submitted to the leaderboard
Compare Open LLM Leaderboard results