Consider filtering for MoE models
I want to compare MoE models with each other. This is NOT easy, because the leaderboard only lets me "hide" MoE models; it is not possible to hide dense models or to filter *for* MoE models. Their naming scheme is also not standardized: as far as I am aware, they do not follow the [GGUF naming convention](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention) or any other standard, so it is hard to find them via the search feature.
Consider the following MoE model names:
- allenai/OLMoE-1B-7B-0924
- microsoft/GRIN-MoE
- Qwen/Qwen1.5-MoE-A2.7B-Chat
- Jamba-12B-52B
- Qwen/Qwen1.5-3B-14B
- JetMoE-2B-9B
- OpenMoE-2B-9B
- Arctic-17B-480B
- mistralai/Mixtral-8x7B-Instruct-v0.1
Not all of them have "MoE" in the model name.
An additional difficulty in finding them is that an MoE model has two parameter counts:
a) activated parameters
b) total parameters
Total and activated parameter counts should not be confused with the number of experts per layer and the number of activated experts per layer, respectively.
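As a stopgap, a rough name-based heuristic catches most of the examples above, but it is only a sketch built from those names, not a standard, and it will miss models that do not follow any of these patterns:

```python
import re

# Rough, incomplete heuristics for spotting MoE models by name alone.
# The patterns below are guesses based on the example names above, not a standard.
MOE_NAME_PATTERNS = [
    re.compile(r"moe", re.IGNORECASE),                        # OLMoE, GRIN-MoE, JetMoE, OpenMoE
    re.compile(r"\d+x\d+(\.\d+)?b", re.IGNORECASE),           # Mixtral-8x7B ("experts x expert size")
    re.compile(r"\d+(\.\d+)?b-\d+(\.\d+)?b", re.IGNORECASE),  # Jamba-12B-52B ("activated-total")
    re.compile(r"-a\d+(\.\d+)?b", re.IGNORECASE),             # Qwen1.5-MoE-A2.7B ("A" = activated params)
]

def looks_like_moe(model_name: str) -> bool:
    """Best-effort guess: does the model name look like an MoE model?"""
    return any(p.search(model_name) for p in MOE_NAME_PATTERNS)

if __name__ == "__main__":
    examples = [
        "allenai/OLMoE-1B-7B-0924",
        "microsoft/GRIN-MoE",
        "Qwen/Qwen1.5-MoE-A2.7B-Chat",
        "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "meta-llama/Llama-3.1-8B-Instruct",  # dense model, should not match
    ]
    for name in examples:
        print(f"{name}: {looks_like_moe(name)}")
```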
Hi @ThiloteE,
Thank you for opening this discussion!
Currently, if you want to analyse MoE models, you can use our Contents dataset: it has a MoE column, and you need to select MoE=false to see all the MoE models we have right now.
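For example, something along these lines should let you pull them out programmatically (the dataset id and column names below are a best guess, please double-check them against the actual dataset):

```python
# A minimal sketch, assuming the Contents dataset is published as
# "open-llm-leaderboard/contents" and exposes a boolean "MoE" column plus a
# "fullname" column holding the model id (these names are assumptions).
from datasets import load_dataset

contents = load_dataset("open-llm-leaderboard/contents", split="train")
df = contents.to_pandas()

# Split rows on the MoE flag and inspect both subsets, so you can confirm
# which value of the column corresponds to MoE architectures.
for flag, subset in df.groupby("MoE"):
    print(f"MoE={flag}: {len(subset)} models")
    print(subset["fullname"].head().to_list())
```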
We're planning to improve the Leaderboard's UI in a future release. As part of this update, we'll consider implementing more advanced filtering options for MoE models.
Closing this discussion. Please feel free to ping me here if you have any questions about MoE models, or start a new discussion.