Bring back categories of censorship
Like in the previous iteration.
The reason I removed them was that I didn't like the constraint of fitting all the questions into five categories, each with exactly the same number of questions and the same ratio of willingness-focused to intelligence-focused questions. I still might be able to group the questions into three overarching subjects, though. I'll look into it.
Thanks for the great work. I've been referring to this leaderboard for over a year now, but I'd like to +1 this request. The reason is that some of the models I found shining in the previous version are now hard to find. For example, Rocinate 12B was ranking high on one of those metrics, but now it's buried somewhere in the middle because of its mediocre UGI score.
I agree, the previous version also aligned better with my experience of which models are good.
For the current iteration of the leaderboard, I wasn't able to strike a good enough balance between computation cost and writing-eval accuracy. The last version of the leaderboard included a few writing quality evals as part of the UGI score, which may be why you notice a difference. I'm hoping to eventually get batching working so models can answer multiple questions at once instead of one by one. That should let the leaderboard support reasoning models and writing quality evaluation.
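For what it's worth, the batching idea can be sketched in a few lines. This is just an illustrative example, not the leaderboard's actual code — the function name, batch size, and question format are all made up:

```python
# Hypothetical sketch of grouping eval questions into batches so a
# model can answer several per request instead of one at a time.
# Nothing here reflects the leaderboard's real pipeline.

def make_batches(questions, batch_size=8):
    """Split a list of questions into batches of at most batch_size."""
    return [
        questions[i:i + batch_size]
        for i in range(0, len(questions), batch_size)
    ]

questions = [f"Question {n}" for n in range(1, 21)]
batches = make_batches(questions, batch_size=8)

# 20 questions -> batch sizes of 8, 8, and 4
print([len(b) for b in batches])
```

Each batch could then be formatted into a single prompt, which is where the real work (and the accuracy trade-off with one-by-one querying) comes in.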