|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
tags: |
|
- Tulu3 |
|
- Smollm |
|
- SLMs |
|
- Small |
|
- Huggingface |
|
- Allenai |
|
- SFT |
|
- DPO |
|
- GGUF |
|
base_model: |
|
- HuggingFaceTB/SmolLM2-1.7B |
|
datasets: |
|
- allenai/tulu-3-sft-mixture |
|
- allenai/llama-3.1-tulu-3-8b-preference-mixture |
|
pipeline_tag: text-generation |
|
model-index: |
|
- name: SmolTulu-1.7b-Instruct |
|
results: |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: IFEval (0-Shot) |
|
type: HuggingFaceH4/ifeval |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: inst_level_strict_acc and prompt_level_strict_acc |
|
value: 65.41 |
|
name: strict accuracy |
|
source: |
|
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=SultanR/SmolTulu-1.7b-Instruct |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: BBH (3-Shot) |
|
type: BBH |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc_norm |
|
value: 12.26 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=SultanR/SmolTulu-1.7b-Instruct |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MATH Lvl 5 (4-Shot) |
|
type: hendrycks/competition_math |
|
args: |
|
num_few_shot: 4 |
|
metrics: |
|
- type: exact_match |
|
value: 2.64 |
|
name: exact match |
|
source: |
|
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=SultanR/SmolTulu-1.7b-Instruct |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: GPQA (0-shot) |
|
type: Idavidrein/gpqa |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 2.57 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=SultanR/SmolTulu-1.7b-Instruct |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MuSR (0-shot) |
|
type: TAUR-Lab/MuSR |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 1.92 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=SultanR/SmolTulu-1.7b-Instruct |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MMLU-PRO (5-shot) |
|
type: TIGER-Lab/MMLU-Pro |
|
config: main |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 7.89 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=SultanR/SmolTulu-1.7b-Instruct |
|
name: Open LLM Leaderboard |
|
--- |
|
|
|
# SmolLM2 1.7b Instruction Tuned & DPO Aligned through Tulu 3! |
|
|
|
![SmolTulu Banner](smoltulubanner.png) |
|
|
|
SmolTulu-1.7b-Instruct is the first model in a series of models meant to leverage [AllenAI's Tulu 3 post-training pipeline](https://arxiv.org/abs/2411.15124) to tune the [base version of Huggingface's SmolLM2-1.7b](https://huggingface.co./HuggingFaceTB/SmolLM2-1.7B)! The post training pipeline AllenAI came up with seemed like something perfect to apply here. |
|
|
|
This model scores the highest current score in both IFEval and GSM8k (after SmolTulu-1.7b-Reinforced) while maintaining the extremely low contamination levels in Tulu 3 and SmolLM2! I've listed the datasets used to do both the SFT (supervised finetuning) and DPO (direct preference optimization) stages. |
|
|
|
Something important to note, this model has only undergone SFT and DPO! Find the RLVR version here, [SmolTulu-1.7b-Reinforced](https://huggingface.co./SultanR/SmolTulu-1.7b-Reinforced) |
|
|
|
## Evaluation |
|
|
|
I ran these evaluations using [SmolLM2's evaluation code](https://github.com/huggingface/smollm/tree/main/evaluation) for a more fair comparison. |
|
|
|
| Metric | SmolTulu-1.7b-Instruct | SmolTulu-1.7b-Reinforced | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct | |
|
|:----------------------------|:---------------------:|:---------------------:|:---------------------:|:---------------------:|:---------------------:|:---------------------:| |
|
| ARC (Average) | 51.5 | 51.1 | **51.7** | 41.6 | 46.2 | 43.7 | |
|
| BBH (3-shot) | 33.8 | 33.4 | 32.2 | 27.6 | **35.3** | 25.7 | |
|
| GSM8K (5-shot) | 51.6 | **61.0** | 48.2 | 26.8 | 42.8 | 4.6 | |
|
| HellaSwag | 61.1 | 60.4 | **66.1** | 56.1 | 60.9 | 55.5 | |
|
| IFEval (Average prompt/inst) | 67.7 | **69.3** | 56.7 | 53.5 | 47.4 | 23.1 | |
|
| MMLU-Pro (MCF) | 17.4 | 17.3 | 19.3 | 12.7 | **24.2** | 11.7 | |
|
| PIQA | 72.2 | 72.1 | **74.4** | 72.3 | 73.2 | 71.6 | |
|
|
|
## Training Details |
|
|
|
The model was trained using Direct Preference Optimization (DPO) with the following configuration: |
|
- Base model: SmolLM2-1.7B with AllenAI's SFT pipeline ran |
|
- Mixed precision: bfloat16 |
|
- Learning rate: 8e-7 with linear scheduler |
|
- Warmup ratio: 0.1 |
|
- Training epochs: 1 |
|
- Effective batch size: 12 |
|
- Sequence length: 4096 tokens |
|
- DPO loss: Length-normalized DPO |
|
- DPO beta: 5.0 |
|
- Gradient checkpointing enabled |
|
- DeepSpeed Stage 3 for memory optimization |
|
|
|
## Usage |
|
|
|
Just like any Huggingface model, just run it using the transformers library: |
|
|
|
```python |
|
# pip install transformers |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
checkpoint = "SultanR/SmolTulu-1.7b-Instruct" |
|
device = "cuda" # for GPU usage or "cpu" for CPU usage |
|
tokenizer = AutoTokenizer.from_pretrained(checkpoint) |
|
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")` |
|
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device) |
|
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device) |
|
outputs = model.generate(inputs) |
|
print(tokenizer.decode(outputs[0])) |
|
``` |
|
|
|
You can also use the model in llama.cpp through the [gguf version](https://huggingface.co./SultanR/SmolTulu-1.7b-Instruct-GGUF)! |
|
|
|
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
|
|
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_SultanR__SmolTulu-1.7b-Instruct) |
|
|
|
To give a more holistic overview, I also added the Open LLM Leaderboard results, which differ a lot from the script that was used to benchmark SmolLM2-Instruct. |
|
|
|
As of writing this, the number 1 ranking model in IFEval for any model under 2 billion parameters :) |
|
|
|
| Metric |Value| |
|
|-------------------|----:| |
|
|Avg. |15.45| |
|
|IFEval (0-Shot) |65.41| |
|
|BBH (3-Shot) |12.26| |
|
|MATH Lvl 5 (4-Shot)| 2.64| |
|
|GPQA (0-shot) | 2.57| |
|
|MuSR (0-shot) | 1.92| |
|
|MMLU-PRO (5-shot) | 7.89| |
|
|
|
## Citation |
|
|
|
``` |
|
@misc{alrashed2024smoltuluhigherlearningrate, |
|
title={SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs}, |
|
author={Sultan Alrashed}, |
|
year={2024}, |
|
eprint={2412.08347}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2412.08347}, |
|
} |
|
``` |
|
|
|
The training methodology follows the Tulu 3 paper: |
|
|
|
``` |
|
@article{lambert2024tulu3, |
|
title={TÜLU 3: Pushing Frontiers in Open Language Model Post-Training}, |
|
author={Lambert, Nathan and Morrison, Jacob and Pyatkin, Valentina and others}, |
|
year={2024}, |
|
journal={arXiv preprint arXiv:2411.15124} |
|
} |
|
``` |