Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • 30 days ago • 63
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 191
Running 2.14k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
SimpleRL Collection The collection for the Project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated 19 days ago • 5
CodeI/O Collection Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated 25 days ago • 6
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated 27 days ago • 76