Eval time vs. score diagram
On the Portuguese version of the old/'v1' Open LLM Leaderboard I saw an interesting plot.
See the Metrics tab, and look at the bottom: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard
There you can roughly eyeball the scaling laws, and also see that around 9B parameters models can ace these older-style tests.
Maybe add something like that, or a model size vs. score plot, instead of evaluation time.
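A minimal sketch of what such a size vs. score plot could look like. This is not the leaderboard's actual plotting code, and the model names and scores below are made-up placeholder values, just to illustrate the log-scale scatter:

```python
# Hypothetical sketch of a model-size vs. score scatter plot.
# All data points are illustrative, not real leaderboard scores.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# (parameters in billions, average benchmark score) - example values only
models = {
    "small-1.5b": (1.5, 38.0),
    "mid-7b": (7.0, 55.0),
    "mid-9b": (9.0, 61.0),
    "large-70b": (70.0, 72.0),
}

def size_vs_score_figure(models):
    fig, ax = plt.subplots()
    sizes = [size for size, _ in models.values()]
    scores = [score for _, score in models.values()]
    ax.scatter(sizes, scores)
    # Scaling laws look roughly linear in log-parameters,
    # so a log x-axis makes the trend easy to eyeball.
    ax.set_xscale("log")
    ax.set_xlabel("Parameters (billions, log scale)")
    ax.set_ylabel("Average score")
    for name, (size, score) in models.items():
        ax.annotate(name, (size, score))
    return fig, ax

fig, ax = size_vs_score_figure(models)
```

Swapping the x-axis from evaluation time to parameter count would only require changing the column fed into the scatter call.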
Hi @HenkPoley ,
This is a very good idea! We're a bit short on time at the moment, would you be interested in contributing this feature?
Some of the notable models that performed well in Portuguese are:
THUDM/glm-4-9b-chat-1m
THUDM/glm-4-9b-chat
THUDM/glm-4-9b
Unfortunately, they trigger the error message: “needs to be launched with trust_remote_code=True”.
Could this be mitigated somehow? What are the prospects?
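For context, that error comes from transformers refusing to execute a model's custom modeling code unless the caller opts in. A sketch of how a harness might pass that opt-in through; `trust_remote_code` is the real transformers parameter, but the helper function below is hypothetical:

```python
# Hypothetical helper: build the kwargs an evaluation harness would pass
# to transformers' from_pretrained(). Models like THUDM/glm-4-9b ship
# custom modeling code, which transformers only runs when the caller
# explicitly sets trust_remote_code=True.
def from_pretrained_kwargs(model_id: str, allow_remote_code: bool = False) -> dict:
    kwargs = {"pretrained_model_name_or_path": model_id}
    if allow_remote_code:
        # Opt in to executing the repository's custom Python code.
        kwargs["trust_remote_code"] = True
    return kwargs

# e.g. AutoModelForCausalLM.from_pretrained(
#          **from_pretrained_kwargs("THUDM/glm-4-9b", allow_remote_code=True))
```

Because running remote code is a security decision, leaderboards typically gate it behind a manual review rather than enabling it for every submission, which is presumably why these models need special handling.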
Hi @CombinHorizon ,
Currently we have results for THUDM/glm-4-9b and THUDM/glm-4-9b-chat that we added manually; you can find them on the Leaderboard. If you're interested, we can add THUDM/glm-4-9b-chat-1m as well.
Closing this discussion due to inactivity, feel free to ping me here if you want to continue discussing the plot implementation