Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • 21 days ago • 45
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 12 items • Updated 9 days ago • 84
Reasoning Datasets Collection Distilled synthetic Reasoning datasets • 7 items • Updated 26 days ago • 55
DeepSeek R1 (All Versions) Collection DeepSeek R1 - the most powerful reasoning open-source model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated 1 day ago • 202
gemini-2.0-flash-thinking-exp-1219 Datasets Collection Existing datasets with responses regenerated using gemini-2.0-flash-thinking-exp-1219. Currently only single-turn. • 15 items • Updated Jan 16 • 5
gemini-exp-1206 Datasets Collection Existing datasets with responses regenerated using gemini-exp-1206. Currently only single-turn. • 3 items • Updated Jan 16 • 1
story writing favourites Collection Models I personally liked for generating stories in the past. Not a recommendation, many of these are outdated. • 20 items • Updated 2 days ago • 43
LLMs - Best of 2025 Collection Most interesting LLMs to play around with in 2025! (will be updated throughout the year) • 19 items • Updated 14 days ago • 2
Reasoning Models Collection If this really helps, please upvote to support the researchers' hard work • 14 items • Updated Jan 21 • 1
CoT Datasets Collection If this really helps, please upvote to support the researchers' hard work • 15 items • Updated Jan 20 • 1
Model Merging Collection Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers in the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 232