Yi Cui

onekq

AI & ML interests

Benchmark, Code Generation Model

Recent Activity

Articles

Organizations

MLX Community's profile picture ONEKQ AI's profile picture

Posts 13

view post
Post
1260
o3-mini is slightly better than R1, but lags behind Claude. Sorry folks, no new SOTA ๐Ÿ˜•

But OAI definitely owns the fashion of API. temperature and top_p are history now, reasoning_effort will be copied by other vendors.

onekq-ai/WebApp1K-models-leaderboard
view post
Post
1174
Mistral Small 3 is SUPER fast, and highest score for 20+b model, but still 11 points below Qwen 2.5 coder 32b.

I believe specialty model is the future. The more you know what to do with the model, the better bang you can get for your buck. If Mistral scopes this small model to coding only, I'm confident they can beat Qwen.

One day my leaderboard will be dominated by smol models excellent on one thing, not monolithic ones costing $$$. And I'm looking forward to that.

onekq-ai/WebApp1K-models-leaderboard

models

None public yet

datasets

None public yet