---
license: apache-2.0
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- GRPO
---

![czxbzdxfcv.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/uXSGnHW3iFqYQ9vGX4ggz.png)

# **SmolLM2-135M-Grpo-Checkpoint**

SmolLM2-135M-Grpo-Checkpoint is fine-tuned from SmolLM2-135M-Instruct. SmolLM2 demonstrates significant advances over its predecessor, SmolLM1, particularly in instruction following, knowledge, and reasoning. The 135M model was trained on 2 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, and The Stack, along with new filtered datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets.
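
Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, a minimal inference sketch with 🤗 Transformers is shown below. The repo id is assumed to match this model card's name; replace it with the actual Hugging Face path if it differs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id -- substitute the actual path of this checkpoint.
model_id = "SmolLM2-135M-Grpo-Checkpoint"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# SmolLM2-Instruct models expect the chat template to be applied.
messages = [{"role": "user", "content": "What is gravity?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(
    input_ids, max_new_tokens=128, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```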
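
The checkpoint name and the `GRPO` tag indicate fine-tuning with GRPO (Group Relative Policy Optimization). The sketch below shows how such a run could be set up with TRL's `GRPOTrainer`; the dataset, reward function, and hyperparameters here are illustrative assumptions, not the recipe used to produce this checkpoint.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical reward: prefer completions close to 20 characters.
# GRPO accepts any callable that scores a batch of completions.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]

# Placeholder dataset; the data actually used for this checkpoint is not stated.
dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(
    output_dir="SmolLM2-135M-Grpo-Checkpoint",
    logging_steps=10,
)
trainer = GRPOTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # the declared base model
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and scores each one against the others in its group, so a scalar reward function like the one above is sufficient; no separate value model is trained.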