SmolLM2-135M-Grpo-Checkpoint is fine-tuned from SmolLM2-135M-Instruct. SmolLM2 demonstrates significant advances over its predecessor, SmolLM1, particularly in instruction following, knowledge, and reasoning. The 135M model was trained on 2 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, and The Stack, along with new filtered datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets.
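
A minimal usage sketch with `transformers` is below. The checkpoint path is a placeholder; substitute the actual Hub repo ID or local directory for this checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path (assumption): replace with the real repo ID or local folder.
checkpoint = "SmolLM2-135M-Grpo-Checkpoint"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# SmolLM2 instruct-style models are prompted through a chat template.
messages = [{"role": "user", "content": "What is gravity?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(
    input_ids, max_new_tokens=128, temperature=0.2, top_p=0.9, do_sample=True
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```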
### **SmolLM2 135M Grpo Fine-tuning**
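The checkpoint name indicates GRPO (Group Relative Policy Optimization) training on top of the instruct model. A minimal sketch of such a run with TRL's `GRPOTrainer` follows; the dataset, reward function, and configuration are illustrative assumptions, not the actual training recipe.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any prompt-only dataset with a "prompt" column works; this one is an example.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward for illustration: prefer completions close to 200 characters.
    return [-abs(200 - len(c)) for c in completions]

training_args = GRPOConfig(output_dir="SmolLM2-135M-Grpo-Checkpoint")
trainer = GRPOTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # base model per this card
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In practice the reward function would encode the actual training objective (e.g. correctness of reasoning traces) rather than a length heuristic.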