Add Hymba-1.5B to the leaderboard
Adding the Hymba-1.5B model to the leaderboard. Public benchmarks (lm-eval-harness) show higher numbers than SmolLM, Qwen2.5, and Llama3.2, and this leaderboard will cover additional tests. Additionally, Hymba is the first open small hybrid model to clearly outperform attention-only models.
We should add this model to the HF leaderboard so the community can appreciate its strength relative to other models of similar size.
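For anyone who wants to sanity-check the reported numbers locally, here is a minimal sketch using the lm-eval-harness Python API. The task list and model revision (nvidia/Hymba-1.5B-Base) are assumptions on my part, not necessarily the configuration used for the leaderboard submission.

```python
# Hypothetical sketch: running lm-eval-harness on Hymba-1.5B locally.
# Task selection and model args are assumptions, not the leaderboard's exact settings.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nvidia/Hymba-1.5B-Base,trust_remote_code=True",
    tasks=["mmlu", "arc_challenge", "hellaswag", "winogrande"],
    batch_size=8,
)

# Print the aggregated metrics for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```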
The results are out, and they appear to be in the ballpark of the others.
However, in my experience the test scores of tiny models reveal virtually nothing about the quality of the architecture or design.
The only way to objectively evaluate the new architecture is for Nvidia to train two models (one with the new architecture, one with a classic attention-only architecture) on the same corpus and compute budget, then compare the scores and performance.
This is primarily because ~1b parameters isn't enough to hold even core human knowledge, so the model makers pick and choose what to focus on.
For example, Qwen2.5 scores 60 on MMLU versus 44 for Llama3.2, yet the architecture and design competence of both are near identical, and L3.2 actually has notably broader knowledge. Qwen simply favored data that boosts MMLU and math, while Meta built a model with far broader abilities, knowledge, and instruction following.

In short, the test scores of ~1b models are primarily a consequence of cherry-picking corpus data, fine-tuning focus, and so on, not of the underlying architecture, the skill of the model's creators, or the compute used. This is especially true since Hymba's corpus is only 1.5T tokens, which is near-conclusive evidence that Nvidia cherry-picked quality, test-boosting data at the expense of broad knowledge and abilities.
Hi everyone!
Thank you for the reaction to this discussion. It's true, both models were added to the Leaderboard.
I'm closing this discussion, feel free to open a new one!