Through PPO with a reward model trained on only 5% of the LeanStatement_RL dataset, our model reaches a pass@1 compilation success rate of 95.3% (465/488) on MiniF2F and 76.0% (284/374) on ProofNet. Combined with our SFT results, these findings establish a new state of the art in mathematics formalization as of December 2024.
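
Below is a minimal inference sketch, assuming the checkpoint loads with the standard Hugging Face transformers API; the prompt wording is an assumption, since the exact instruction format used during SFT/PPO training is not documented here.

```python
# Minimal usage sketch (not from the model card). Assumes the standard
# transformers API; the prompt below is illustrative, not the documented
# training format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SJTULean/LeanFormalizer_PPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# Hypothetical prompt: ask the model to translate an informal statement into Lean 4.
prompt = (
    "Formalize the following mathematical statement in Lean 4:\n"
    "The sum of two even integers is even.\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```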

Safetensors
Model size: 7.62B params
Tensor type: BF16

Model tree for SJTULean/LeanFormalizer_PPO
Base model: Qwen/Qwen2.5-7B (this model is a fine-tune of the base)
