Is this the same as DeepSeek-R1 (Preview) mentioned on LiveCodeBench?
Are these "DeepSeek-R1 (Preview)" and "DeepSeek-R1" models akin to "o1-Preview" and "o1-full"?
From here - https://x.com/StringChaos/status/1880317308515897761
And can somebody with the technical capabilities confirm this?
From model weights -
R1, R1-Zero, and V3-Instruct are all quite different from each other,
and R1-Zero is closest to V3-base.
They probably all start from V3-base but underwent separate post-training processes.
So,
V3-base is tuned into R1-Zero
R1-Zero generates reasoning chains for V3-Instruct, but can't do much general instruction following itself
V3-Instruct is then used to train the proper R1 using reinforcement learning?
From - https://x.com/jiayi_pirate/status/1881264063302557919
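For anyone who wants to check the "from model weights" claim themselves, here is a minimal sketch of one way to measure how far apart two checkpoints are. The linked post does not say which metric was used, so the relative L2 distance here is an assumption, and the function names, tensor names, and toy arrays are all hypothetical stand-ins for real checkpoints.

```python
import numpy as np


def relative_l2_distance(weights_a, weights_b):
    """Relative L2 distance over tensors that match by name and shape.

    Assumption: each checkpoint is a dict of tensor name -> numpy array.
    """
    diff_sq = 0.0
    base_sq = 0.0
    for name, a in weights_a.items():
        b = weights_b.get(name)
        if b is None or a.shape != b.shape:
            continue  # skip tensors that do not line up between checkpoints
        diff_sq += float(np.sum((a - b) ** 2))
        base_sq += float(np.sum(a ** 2))
    if base_sq == 0.0:
        raise ValueError("no comparable tensors found")
    return (diff_sq / base_sq) ** 0.5


def closest_to(reference, candidates):
    """Return the label of the candidate nearest to the reference, plus all distances."""
    distances = {
        label: relative_l2_distance(reference, weights)
        for label, weights in candidates.items()
    }
    return min(distances, key=distances.get), distances


# Toy data only: random arrays, not real model weights.
rng = np.random.default_rng(0)
base = {
    "layer0.weight": rng.normal(size=(4, 4)),
    "layer1.weight": rng.normal(size=(4, 4)),
}


def perturb(weights, scale):
    """Simulate post-training by adding noise of a given scale."""
    return {k: v + scale * rng.normal(size=v.shape) for k, v in weights.items()}


toy_checkpoints = {
    "small_update": perturb(base, 0.01),
    "large_update": perturb(base, 0.5),
}
nearest, distances = closest_to(base, toy_checkpoints)
print(nearest, distances)
```

With real checkpoints you would load each model's tensors into the same dict shape and pass V3-base as the reference. A small distance only says the weights moved less, not that the model is better or worse.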
So does that mean R1 is more advanced than R1-Zero?
Yup.