Small Models Struggle to Learn from Strong Reasoners Paper • 2502.12143 • Published 11 days ago • 27
Granite Data Collection This collection has a set of artifacts which are related to curating and evaluating datasets used for Granite models • 13 items • Updated about 11 hours ago • 3
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 • 11 days ago • 89
Article From Llasa to Llasagna: Finetuning LLaSA to generates Italian speech and other languages By Steveeeeeeen and 1 other • 17 days ago • 25
On Teacher Hacking in Language Model Distillation Paper • 2502.02671 • Published 24 days ago • 17
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 24 days ago • 195
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training Paper • 2501.18965 • Published 28 days ago • 6
Article Mini-R1: Reproduce Deepseek R1 “aha moment” a RL tutorial By open-r1 • 28 days ago • 40
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • Jan 23 • 64
Article How biased is Whisper? Evaluating Whisper Models for Robustness to Diverse English Accents By Steveeeeeeen • 30 days ago • 16
Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts Paper • 2501.14334 • Published Jan 24 • 20
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 47