Is this the same as DeepSeek-R1 (Preview) mentioned on LiveCodeBench?

#10
by KrishnaKaasyap - opened

Are these "DeepSeek-R1 (Preview)" and "DeepSeek-R1" models akin to "o1-Preview" and "o1-full"?

From here - https://x.com/StringChaos/status/1880317308515897761

And can somebody with the technical capabilities confirm this?


From model weights -

R1, R1-zero, V3-instruct are all quite different from each other,
and R1-zero is closest to V3-base.

They probably all start from V3-base but have undergone separate post-training processes.

So,

V3-base is tuned into R1-Zero

R1-Zero generates reasoning chains for V3-instruct, but can't do much general instruction following

V3-Instruct is then used to train the proper R1 using reinforcement learning?


From - https://x.com/jiayi_pirate/status/1881264063302557919
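
For anyone wondering what "from model weights" could mean in practice: the quoted post does not say which metric was used, so the following is only a minimal sketch under my own assumptions. It compares flattened weight vectors by relative L2 distance and reports which candidate is nearest to a base checkpoint. The function names (`relative_l2_distance`, `closest_to_base`) and the toy numbers are hypothetical and are not taken from any DeepSeek checkpoint.

```python
import math


def relative_l2_distance(a, b):
    """Return ||a - b|| / ||b|| for two flat weight vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("weight vectors must have the same length")
    base_norm = math.sqrt(sum(y * y for y in b))
    if base_norm == 0.0:
        raise ValueError("reference vector must not be all zeros")
    diff_norm = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return diff_norm / base_norm


def closest_to_base(base_weights, candidates):
    """Return (name of nearest candidate, dict of all distances to the base)."""
    distances = {
        name: relative_l2_distance(weights, base_weights)
        for name, weights in candidates.items()
    }
    nearest = min(distances, key=distances.get)
    return nearest, distances


# Toy numbers for illustration only, NOT real model weights.
toy_base = [1.0, 2.0, 3.0, 4.0]
toy_candidates = {
    "checkpoint_a": [1.1, 2.0, 3.0, 4.0],  # small perturbation of the base
    "checkpoint_b": [2.0, 1.0, 4.0, 3.0],  # larger change from the base
}
nearest, toy_distances = closest_to_base(toy_base, toy_candidates)
print(nearest, toy_distances)
```

With real checkpoints you would run this per tensor (or per layer) rather than on one flat list, and a different metric such as cosine similarity might lead to a different picture.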

So does that mean R1 is more advanced than R1-Zero?

Yup.
