Consider filtering for MoE models

#956
by ThiloteE - opened

I want to compare MoE models with each other. This is NOT easy, because the leaderboard only lets me "hide" them; it is not possible to hide dense models instead or to filter for MoE models directly. Their naming scheme is not standardized either: as far as I am aware, they do not follow the [GGUF naming convention](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention) or any other standard, so it is hard to find them via the search feature.

Consider the following MoE model names:

  • allenai/OLMoE-1B-7B-0924
  • microsoft/GRIN-MoE
  • Qwen/Qwen1.5-MoE-A2.7B-Chat
  • Jamba-12B-52B
  • Qwen/Qwen1.5-3B-14B
  • JetMoE-2B-9B
  • OpenMoE-2B-9B
  • Arctic-17B-480B
  • mistralai/Mixtral-8x7B-Instruct-v0.1

Not all of them have "MoE" in the model name.

Part of the difficulty in finding them is that an MoE model has two parameter counts:

a) activated parameter count
b) total parameter count

The total parameter count and the activated parameter count should not be confused with the number of experts per layer and the number of activated experts per layer, respectively. For example, mistralai/Mixtral-8x7B-Instruct-v0.1 activates 2 of its 8 experts per layer, which corresponds to roughly 12.9B active parameters out of about 46.7B total.
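
To illustrate how inconsistently these numbers are encoded, here is a rough heuristic sketch (ad-hoc patterns of my own, not any official convention) that tries to recover parameter information from a few of the names above; the point is that no single pattern covers them all:

```python
import re

# Ad-hoc patterns, one per naming style seen above; none of these is a standard.
PATTERNS = [
    # "A2.7B" style (Qwen1.5-MoE-A2.7B): encodes only the activated parameter count
    (re.compile(r"-A(\d+(?:\.\d+)?)B"), "activated params only"),
    # "1B-7B" / "12B-52B" style (OLMoE, Jamba): activated/total pair
    (re.compile(r"(\d+(?:\.\d+)?)B-(\d+(?:\.\d+)?)B"), "activated/total pair"),
    # "8x7B" style (Mixtral): experts x per-expert size; neither number is the
    # activated or the total parameter count on its own
    (re.compile(r"(\d+)x(\d+(?:\.\d+)?)B"), "experts x per-expert size"),
]

def describe(name: str) -> str:
    for pattern, kind in PATTERNS:
        match = pattern.search(name)
        if match:
            return f"{name}: {kind} {match.groups()}"
    return f"{name}: nothing recoverable from the name"

for name in [
    "allenai/OLMoE-1B-7B-0924",
    "Qwen/Qwen1.5-MoE-A2.7B-Chat",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "microsoft/GRIN-MoE",
]:
    print(describe(name))
```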

Open LLM Leaderboard org
•
edited Oct 1

Hi @ThiloteE,

Thank you for opening this discussion!

Currently, if you want to analyse MoE models, you can use our Contents dataset – it has a MoE column, so if you select MoE=true you will be able to see all the MoE models we have right now.
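
For example, a minimal sketch with the `datasets` library could look like the following (assuming the Contents dataset is published on the Hub as `open-llm-leaderboard/contents` with a `train` split, and that the MoE column is a boolean where true marks MoE models; please double-check the exact dataset id on our org page):

```python
from datasets import load_dataset

# Assumed dataset id and split - verify the exact names on the leaderboard org page.
contents = load_dataset("open-llm-leaderboard/contents", split="train")
df = contents.to_pandas()

# Assumes the MoE column is a boolean flag where True marks mixture-of-experts models.
moe_models = df[df["MoE"] == True]
print(f"{len(moe_models)} MoE models out of {len(df)} rows")
print(moe_models.head())
```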

We're planning to improve the Leaderboard's UI in a future release. As part of this update, we'll consider implementing more advanced filtering options for MoE models.

Open LLM Leaderboard org

Closing this discussion. Please feel free to ping me here if you have any questions about MoE models, or start a new discussion.

alozowski changed discussion status to closed
