Article Open Source AI Agents | Github/Repo List | [2025] By tegridydev • 7 days ago • 22
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 11 days ago • 89
Article Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies By prithivMLmods • 11 days ago • 17
Article Reasoning at the Forefront of Advanced AI Models : Mistral-Small-24B-Base-2501 By ruslanmv • 20 days ago • 3
Article Train 400x faster Static Embedding Models with Sentence Transformers Jan 15 • 151
Article Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial By open-r1 • 28 days ago • 40
Article 🦸🏻#9: Does AI Remember? The Role of Memory in Agentic Workflows By Kseniase • 26 days ago • 14
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 334
Article Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) By ariG23498 • Jan 19 • 14
Article Hugging Face and FriendliAI partner to supercharge model deployment on the Hub Jan 22 • 36
Article Finetuning Falcon 7b in a hybrid distributed fashion By Neo111x • Dec 31, 2024 • 5
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 13
Article Building a MusicGen API to Generate Custom Music Tracks Locally By theeseus-ai • Dec 4, 2024 • 2