INTELLECT-MATH: Frontier Mathematical Reasoning through Better Initializations for Reinforcement Learning

INTELLECT-MATH is a 7B-parameter model optimized for mathematical reasoning. It was trained in two stages: a supervised fine-tuning (SFT) stage, in which the model was fine-tuned on verified QwQ outputs, and a reinforcement learning (RL) stage, in which it was trained with the PRIME-RL recipe.
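The card does not say how the QwQ outputs were verified. A common approach, and the one assumed in this sketch, is rejection sampling against reference answers: a teacher trace is kept only if the answer in its last \boxed{...} matches the ground-truth answer after light normalization. All names below (Sample, extract_boxed, normalize, filter_verified) are hypothetical, and production pipelines usually rely on a symbolic math checker rather than plain string comparison.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    problem: str
    reference_answer: str
    teacher_output: str  # hypothetical: a QwQ reasoning trace ending in \boxed{...}


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # unbalanced braces: treat as no answer


def normalize(answer: str) -> str:
    """Assumed normalization: drop all whitespace and surrounding dollar signs."""
    return re.sub(r"\s+", "", answer).strip("$")


def filter_verified(samples: List[Sample]) -> List[Sample]:
    """Keep only samples whose boxed answer matches the reference answer."""
    kept = []
    for s in samples:
        pred = extract_boxed(s.teacher_output)
        if pred is not None and normalize(pred) == normalize(s.reference_answer):
            kept.append(s)
    return kept


samples = [
    Sample("What is 1 + 1?", "2", "... so the answer is \\boxed{2}"),
    Sample("What is 2 + 2?", "4", "... so the answer is \\boxed{5}"),
]
kept = filter_verified(samples)
print(f"kept {len(kept)} of {len(samples)} teacher traces")
```

Only the first trace survives, because the second one's boxed answer disagrees with its reference.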

We show that the quality of the SFT data affects both the performance and the training speed of the RL stage. Thanks to its stronger synthetic SFT dataset, which encourages the model to imitate the reasoning behavior of a strong teacher model, INTELLECT-MATH outperforms Eurus-2-PRIME, the previous state-of-the-art model trained with PRIME-RL, and matches its performance with 10x faster training (step 47 vs. step 592 in the table below).
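For context on the RL stage, here is a summary of the core idea in the published PRIME method (Process Reinforcement through Implicit Rewards), not a statement of INTELLECT-MATH's exact configuration. PRIME uses an implicit process reward model $\pi_\phi$ that assigns each generated token $y_t$ a dense reward equal to a scaled log-likelihood ratio against a frozen reference model $\pi_{\text{ref}}$, with $\beta$ a scaling coefficient; as we understand the method, $\pi_\phi$ is itself updated online from outcome (verifier) labels.

$$
r_\phi(y_t) = \beta \log \frac{\pi_\phi(y_t \mid \mathbf{y}_{<t})}{\pi_{\text{ref}}(y_t \mid \mathbf{y}_{<t})}
$$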

Benchmark	INTELLECT-MATH (Step 255)	INTELLECT-MATH (Step 47)	Eurus-2-PRIME (Step 592)	INTELLECT-MATH-SFT	Eurus-2-SFT	Qwen-2.5-Math
MATH-500	82.0	81.6	79.2	72.8	65.1	79.8
OlympiadBench	49.5	46.7	42.1	39.1	29.8	40.7
AIME 2024	26.7	26.7	26.7	16.6	3.3	13.3
AMC	60.2	57.8	57.8	45.8	30.1	50.6
Minerva Math	39.7	37.8	38.6	33.8	32.7	34.6
Average	51.6	50.1	48.9	41.6	32.2	43.8
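As a quick consistency check, the average row is the unweighted mean of the five benchmark scores. The snippet below recomputes it from the table and also computes the ratio of RL steps behind the "10x faster" comparison (step 47 vs. step 592). Treating RL step count as a proxy for training cost is an assumption of this sketch; the card does not say whether steps from the two runs cost the same compute.

```python
# Scores copied from the table above, in benchmark order:
# MATH-500, OlympiadBench, AIME 2024, AMC, Minerva Math.
SCORES = {
    "INTELLECT-MATH (Step 255)": [82.0, 49.5, 26.7, 60.2, 39.7],
    "INTELLECT-MATH (Step 47)": [81.6, 46.7, 26.7, 57.8, 37.8],
    "Eurus-2-PRIME (Step 592)": [79.2, 42.1, 26.7, 57.8, 38.6],
    "INTELLECT-MATH-SFT": [72.8, 39.1, 16.6, 45.8, 33.8],
    "Eurus-2-SFT": [65.1, 29.8, 3.3, 30.1, 32.7],
    "Qwen-2.5-Math": [79.8, 40.7, 13.3, 50.6, 34.6],
}

# The "Average" row as reported in the table.
REPORTED_AVG = {
    "INTELLECT-MATH (Step 255)": 51.6,
    "INTELLECT-MATH (Step 47)": 50.1,
    "Eurus-2-PRIME (Step 592)": 48.9,
    "INTELLECT-MATH-SFT": 41.6,
    "Eurus-2-SFT": 32.2,
    "Qwen-2.5-Math": 43.8,
}


def average(scores: list) -> float:
    """Unweighted mean over the benchmarks."""
    return sum(scores) / len(scores)


for model, scores in SCORES.items():
    print(f"{model}: recomputed {average(scores):.1f}, reported {REPORTED_AVG[model]}")

# RL steps Eurus-2-PRIME needed vs. steps INTELLECT-MATH needed to match it.
step_ratio = 592 / 47
print(f"RL step ratio: {step_ratio:.1f}x")
```

Every recomputed average agrees with the reported row to one decimal place, and the step ratio comes out at about 12.6, in line with the "10x" figure.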

PrimeIntellect/INTELLECT-MATH