Bring back categories of censorship
Like in the previous iteration.
The reason I removed them was that I didn't like the constraint of fitting all the questions into five categories, each with exactly the same number of questions and the same ratio of willingness-focused to intelligence-focused questions. I still might be able to group the questions into three overarching subjects, though. I'll look into it.
Thanks for the great work. I've been referring to this leaderboard for over a year now, but I'd like to +1 this request. The reason is that some of the models I found shining in the previous version are now hard to find. For example, Rocinate 12B was ranking high on one of those metrics, but now it's buried somewhere in the middle because of its mediocre UGI score.
I agree, the previous version also aligned better with my experience of which models are good.
For the current iteration of the leaderboard, I wasn't able to strike a good enough balance between computation cost and writing-eval accuracy. The last version of the leaderboard included a few writing quality evals as part of the UGI score, which may be why you notice a difference. I'm hoping to eventually get batching working so models can answer multiple questions at once instead of one by one. That should let the leaderboard support reasoning models and writing quality evaluation.
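For what it's worth, the batching idea can be sketched in a few lines. This is just an illustrative example, not the leaderboard's actual code — the function name, batch size, and question format are all made up:

```python
# Hypothetical sketch of grouping eval questions into batches so a
# model can answer several per request instead of one at a time.
# Nothing here reflects the leaderboard's real pipeline.

def make_batches(questions, batch_size=8):
    """Split a list of questions into batches of at most batch_size."""
    return [
        questions[i:i + batch_size]
        for i in range(0, len(questions), batch_size)
    ]

questions = [f"Question {n}" for n in range(1, 21)]
batches = make_batches(questions, batch_size=8)

# 20 questions -> batch sizes of 8, 8, and 4
print([len(b) for b in batches])
```

Each batch could then be formatted into a single prompt, which is where the real work (and the accuracy trade-off with one-by-one querying) comes in.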