SmolLM2-135M-Grpo-Checkpoint is fine-tuned from SmolLM2-135M-Instruct. SmolLM2 demonstrates significant advances over its predecessor, SmolLM1, particularly in instruction following, knowledge, and reasoning. The 135M model was trained on 2 trillion tokens using a diverse combination of datasets: FineWeb-Edu, DCLM, and The Stack, along with new filtered datasets that we curated and will release soon. We developed the instruct version through supervised fine-tuning (SFT) using a combination of public datasets and our own curated datasets.
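
A minimal usage sketch with `transformers` is below. The checkpoint path is a placeholder; substitute the actual Hub repo ID or local directory for this checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path (assumption): replace with the real repo ID or local folder.
checkpoint = "SmolLM2-135M-Grpo-Checkpoint"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# SmolLM2 instruct-style models are prompted through a chat template.
messages = [{"role": "user", "content": "What is gravity?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(
    input_ids, max_new_tokens=128, temperature=0.2, top_p=0.9, do_sample=True
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```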
### **SmolLM2 135M Grpo Fine-tuning**
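The checkpoint name indicates GRPO (Group Relative Policy Optimization) training on top of the instruct model. A minimal sketch of such a run with TRL's `GRPOTrainer` follows; the dataset, reward function, and configuration are illustrative assumptions, not the actual training recipe.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any prompt-only dataset with a "prompt" column works; this one is an example.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward for illustration: prefer completions close to 200 characters.
    return [-abs(200 - len(c)) for c in completions]

training_args = GRPOConfig(output_dir="SmolLM2-135M-Grpo-Checkpoint")
trainer = GRPOTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # base model per this card
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In practice the reward function would encode the actual training objective (e.g. correctness of reasoning traces) rather than a length heuristic.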