---
license: apache-2.0
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- GRPO
---

![czxbzdxfcv.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/uXSGnHW3iFqYQ9vGX4ggz.png)

# **SmolLM2-135M-Grpo-Checkpoint**

SmolLM2-135M-Grpo-Checkpoint is fine-tuned from SmolLM2-135M-Instruct. SmolLM2 demonstrates significant advances over its predecessor, SmolLM1, particularly in instruction following, knowledge, and reasoning. The 135M model was trained on 2 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, and The Stack, along with new filtered datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets.
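
Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, a minimal inference sketch with 🤗 Transformers is shown below. The repo id is assumed to match this model card's name; replace it with the actual Hugging Face path if it differs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id -- substitute the actual path of this checkpoint.
model_id = "SmolLM2-135M-Grpo-Checkpoint"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# SmolLM2-Instruct models expect the chat template to be applied.
messages = [{"role": "user", "content": "What is gravity?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(
    input_ids, max_new_tokens=128, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```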
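
The checkpoint name and the `GRPO` tag indicate fine-tuning with GRPO (Group Relative Policy Optimization). The sketch below shows how such a run could be set up with TRL's `GRPOTrainer`; the dataset, reward function, and hyperparameters here are illustrative assumptions, not the recipe used to produce this checkpoint.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical reward: prefer completions close to 20 characters.
# GRPO accepts any callable that scores a batch of completions.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]

# Placeholder dataset; the data actually used for this checkpoint is not stated.
dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(
    output_dir="SmolLM2-135M-Grpo-Checkpoint",
    logging_steps=10,
)
trainer = GRPOTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # the declared base model
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and scores each one against the others in its group, so a scalar reward function like the one above is sufficient; no separate value model is trained.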