Could you explain why proprietary models are scoring so high?

#151
by glenospace - opened

I just don't follow. Those are the most locked-down models, so why are they topping the benchmark? Am I missing something?

The Gemini 1.5 exp models (which might not be available anymore) were actually very uncensored as long as you didn't have their content filters on. Beyond that, the leaderboard score is a combination of knowledge of contentious/controversial topics and willingness to answer. Most proprietary models have low willingness but are very intelligent. If you want to come up with an argument for a certain political belief, a model that provides accurate information but refuses to form it into an argument is probably more useful than a model that is willing to form an argument but gets all the information wrong. You need to find a balance. It's understandable if you think the leaderboard is too weighted towards intelligence right now.

Also, certain versions of models, like gpt-4o-2024-05-13, are more likely to provide information than other versions like gpt-4o-2024-11-20, which scores 20 points lower on UGI.

DontPlanToEnd changed discussion status to closed

Thanks for the clarification. I didn’t know that those models had high willingness. Very surprising.

Yeah, since those Gemini experimental models were only available temporarily, I guess they didn't feel the need to apply harsh censorship tuning to them.
