deepseek-ai/DeepSeek-R1 · Is this the same as DeepSeek-R1 (Preview) mentioned on LiveCodeBench?

Jan 20

Are these "DeepSeek-R1 (Preview)" and "DeepSeek-R1" models akin to "o1-Preview" and "o1-full"?

And can somebody with the technical capabilities confirm this?

Going by the model weights:

R1, R1-Zero, and V3-Instruct are all quite different from each other,
and R1-Zero is closest to V3-Base.

They probably all start from V3-Base, but have undergone separate post-training processes.
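For anyone who wants to check the "closest to" claim themselves, here is a minimal sketch of one way to compare two checkpoints: a relative L2 distance over matching tensors. This is an assumption about how such a comparison could be done, not necessarily the method used for the observation above. The function name, the tensor names, and the toy values are all hypothetical; real checkpoints would have to be loaded shard by shard and flattened first.

```python
import math


def relative_l2_distance(weights_a, weights_b):
    """Relative L2 distance ||a - b|| / ||a|| over all tensors.

    Both arguments map tensor names to flat lists of floats and are
    assumed to share the same names and lengths (hypothetical format).
    """
    diff_sq = 0.0
    ref_sq = 0.0
    for name, a in weights_a.items():
        b = weights_b[name]
        diff_sq += sum((x - y) ** 2 for x, y in zip(a, b))
        ref_sq += sum(x ** 2 for x in a)
    return math.sqrt(diff_sq / ref_sq)


# Toy, made-up values standing in for a base model and two fine-tuned variants.
base = {"layer.0": [1.0, 2.0, 2.0], "layer.1": [0.0, 4.0]}
variant_near = {"layer.0": [1.0, 2.0, 2.0], "layer.1": [0.0, 5.0]}
variant_far = {"layer.0": [4.0, 2.0, 2.0], "layer.1": [4.0, 4.0]}

print(relative_l2_distance(base, variant_near))  # smaller: closer to base
print(relative_l2_distance(base, variant_far))   # larger: further from base
```

A smaller value means the variant moved less from the base during post-training. It says nothing about which model is better.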

So,

V3-base is tuned into R1-Zero

R1-Zero generates reasoning chains for V3-Instruct, but can't do much general instruction following itself

V3-Instruct is then used to train the proper R1 with reinforcement learning?

Jan 20

So does that mean R1 is more advanced than R1-Zero?

Jan 20

So does that mean R1 is more advanced than R1-Zero?

Yup.