Article • DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge • By NormalUhr • 21 days ago • 45
Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B Viewer • Updated Jan 27 • 250k • 6.17k • 85
🧠 Reasoning datasets • Collection • Datasets with reasoning traces for math and code released by the community • 12 items • Updated 9 days ago • 84
Reasoning Datasets • Collection • Distilled synthetic reasoning datasets • 7 items • Updated 26 days ago • 55