Question: same model with very different scores

#904
by Yuma42 - opened

Hello, the leaderboard lists the same model twice (both entries link to the same model page), but the scores are very different. It's mlabonne/NeuralDaredevil-8B-abliterated, which scores 27.01 and 21.5.

Can someone explain? If I had to guess, maybe it was evaluated once with the chat template and once without?

The only difference I can see is that one is bfloat16 and the other is float16. My guess is there's a bug in the IFEval evaluation with bfloat16 (41 vs. 75), since the other evals match up.

Open LLM Leaderboard org

Hi @Yuma42 ,

This means the model was evaluated twice, in bf16 and f16 precision, so @phil111 is right. Please check out my screenshot, where I clicked to show the "Precision" column. As for the low IFEval score, it isn't a bug: this model doesn't use the chat template in the bfloat16 run, which causes the low IFEval score, while, as you can see in the request file, the float16 version has "use_chat_template: True".
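For illustration, here is a minimal sketch of how the two request-file entries might differ. The field names and values are assumed for this example; only "use_chat_template" is quoted from the thread above.

```python
# Two hypothetical request-file entries for the same model.
# Field names are illustrative, not the leaderboard's exact schema.
requests = [
    {"model": "mlabonne/NeuralDaredevil-8B-abliterated",
     "precision": "bfloat16", "use_chat_template": False},
    {"model": "mlabonne/NeuralDaredevil-8B-abliterated",
     "precision": "float16", "use_chat_template": True},
]

def diff_entries(a, b):
    """Return the fields whose values differ between two request entries."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}

# The two runs differ only in precision and chat-template usage,
# which accounts for the IFEval gap (41 vs. 75).
print(diff_entries(requests[0], requests[1]))
```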

I'm closing this discussion; please feel free to open a new one if you have any questions!

Screenshot 2024-08-30 at 13.32.07.png

alozowski changed discussion status to closed
