Better than distilled version

#5
by urtuuuu - opened

Tested it on lots of questions, and it's my favorite model now. Not only does R1-Distill-Qwen-14B perform worse, but apparently even the 32B version does too. If anyone can prove me wrong, give some examples.

There is a guy on YouTube who tested the full DeepSeek-R1 against OpenAI o3-mini (and maybe some other models) on this question: "Write a snake game code in html". OpenAI's model failed; only R1 succeeded. I decided to test Qwen2.5-14B-Instruct-1M on this, and lol, it also wrote correct code (with a small bug, though it fixed it after I described the bug).

urtuuuu changed discussion title from Better that distilled version to Better than distilled version

I have had doubts as well. I'm not really seeing any radical improvement, but not seeing any major degradation either, in comparison. The R1 versions are running a little faster on my hardware, though, so it may be a wash for me.

I think both may have their place in the world. My guess at this stage is that if the model has to 'figure it out', R1 might be better thanks to the extra reasoning overhead, but if the task is straightforward, then the 'base' will be better, with less noise. But that is a guess; I haven't really had time to prove that suspicion.

EDIT: and this is mostly on the 14B models.
