Any benchmarks yet?

#14
by lemon07r - opened

It's been a while so I'm surprised there are no benchmarks for this model yet. I wanted to see where it fits on the leaderboard (not that I think benchmarks are super important).

the leaderboard currently does not allow models that use YiTokenizer (it might be possible if you talk to the staff)


but here are some test results (not standard benchmark results):
https://old.reddit.com/r/LocalLLaMA/comments/18s61fb/pressuretested_the_most_popular_opensource_llms/


here's a small user test (not automated or standardized, and not comprehensive), where it scores the best for its size:
https://old.reddit.com/r/LocalLLaMA/comments/1bcdtt0/llm_comparisontest_new_api_edition_claude_3_opus/

there's also
https://huggingface.co./datasets/ChuckMcSneed/WolframRavenwolfs_benchmark_results
https://huggingface.co./spaces/DontPlanToEnd/UGI-Leaderboard
https://huggingface.co./datasets/ChuckMcSneed/NeoEvalPlusN_benchmark
