prithivMLmods committed
Commit 937dac8 · verified · 1 Parent(s): 7a4989e

Update README.md

Files changed (1):
  1. README.md +4 -0
README.md CHANGED
@@ -11,3 +11,7 @@ tags:
 - GRPO
 ---
 ![czxbzdxfcv.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/uXSGnHW3iFqYQ9vGX4ggz.png)
+
+# **SmolLM2-135M-Grpo-Checkpoint**
+
+SmolLM2-135M-Grpo-Checkpoint is fine-tuned from SmolLM2-135M-Instruct. SmolLM2 demonstrates significant advances over its predecessor, SmolLM1, particularly in instruction following, knowledge, and reasoning. The 135M model was trained on 2 trillion tokens drawn from a diverse combination of datasets: FineWeb-Edu, DCLM, and The Stack, along with new filtered datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) on a combination of public datasets and our own curated datasets.
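The README text added by this commit describes the checkpoint's lineage but not how to run it. Below is a minimal inference sketch using 🤗 Transformers; the Hub repo id `prithivMLmods/SmolLM2-135M-Grpo-Checkpoint`, the prompt, and the generation settings are assumptions for illustration and are not part of the commit.

```python
# Minimal inference sketch -- the repo id, prompt, and generation settings
# are assumptions for illustration, not taken from this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/SmolLM2-135M-Grpo-Checkpoint"  # assumed Hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# SmolLM2-135M-Instruct ships a chat template, so a checkpoint fine-tuned
# from it is prompted the same way: build a message list, apply the template.
messages = [{"role": "user", "content": "Explain gradient descent in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```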