Any benchmarks yet?
It's been a while so I'm surprised there are no benchmarks for this model yet. I wanted to see where it fits on the leaderboard (not that I think benchmarks are super important).
The leaderboard currently doesn't allow models that use YiTokenizer (though it may be possible if you talk to the staff).
Here are some test results in the meantime (not the standard benchmark results):
https://old.reddit.com/r/LocalLLaMA/comments/18s61fb/pressuretested_the_most_popular_opensource_llms/
Here's a small user test (not automated/standardized, not comprehensive), in which it scores best relative to its size:
https://old.reddit.com/r/LocalLLaMA/comments/1bcdtt0/llm_comparisontest_new_api_edition_claude_3_opus/
There are also:
https://huggingface.co./datasets/ChuckMcSneed/WolframRavenwolfs_benchmark_results
https://huggingface.co./spaces/DontPlanToEnd/UGI-Leaderboard
https://huggingface.co./datasets/ChuckMcSneed/NeoEvalPlusN_benchmark