---
language:
- en
license: mit
library_name: transformers
tags:
- reasoning
- axolotl
- r1
base_model:
- meta-llama/Llama-3.2-3B-Instruct
datasets:
- ServiceNow-AI/R1-Distill-SFT
pipeline_tag: text-generation
model-index:
- name: DeepSeek-R1-Distill-Llama-3B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 70.93
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=suayptalha/DeepSeek-R1-Distill-Llama-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 21.45
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=suayptalha/DeepSeek-R1-Distill-Llama-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 20.92
      name: exact match
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=suayptalha/DeepSeek-R1-Distill-Llama-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 1.45
      name: acc_norm
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=suayptalha/DeepSeek-R1-Distill-Llama-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2.91
      name: acc_norm
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=suayptalha/DeepSeek-R1-Distill-Llama-3B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 21.98
      name: accuracy
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=suayptalha/DeepSeek-R1-Distill-Llama-3B
      name: Open LLM Leaderboard
---
# DeepSeek-R1-Distill-Llama-3B
This model is a distilled version of DeepSeek-R1, fine-tuned from Llama-3.2-3B-Instruct on the R1-Distill-SFT dataset.
<details><summary>See axolotl config</summary>

```yaml
base_model: unsloth/Llama-3.2-3B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: llama3
datasets:
  - path: ./custom_dataset.json
    type: chat_template
    conversation: chatml
    ds_type: json

add_bos_token: true
add_eos_token: true
use_default_system_prompt: false

special_tokens:
  bos_token: "<|begin_of_text|>"
  eos_token: "<|eot_id|>"
  pad_token: "<|eot_id|>"
  additional_special_tokens:
    - "<|begin_of_text|>"
    - "<|eot_id|>"

adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.1
lora_target_linear: true

hub_model_id: suayptalha/DeepSeek-R1-Distill-Llama-3B

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true

micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 1
learning_rate: 2e-5
optimizer: paged_adamw_8bit
lr_scheduler: cosine

train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false

gradient_checkpointing: true
flash_attention: false

logging_steps: 50
warmup_steps: 100
saves_per_epoch: 1

output_dir: ./finetune-sft-results
save_safetensors: true
```

</details>
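To reproduce the fine-tune, save the config above as, say, `config.yml` and launch it with axolotl's training CLI, typically `accelerate launch -m axolotl.cli.train config.yml`; the exact entry point may differ between axolotl versions.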
## Prompt Template

You can use the Llama 3 prompt template with this model:

```
<|start_header_id|>system<|end_header_id|>
{system}<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{user}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
{assistant}<|eot_id|>
```
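Rather than assembling this string by hand, you can let the tokenizer render it. A minimal sketch (the example messages are ours) that prints the exact prompt the chat template produces:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("suayptalha/DeepSeek-R1-Distill-Llama-3B")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# tokenize=False renders the template to a string for inspection;
# add_generation_prompt=True appends the assistant header the model completes.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```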
Example usage:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "suayptalha/DeepSeek-R1-Distill-Llama-3B",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("suayptalha/DeepSeek-R1-Distill-Llama-3B")

SYSTEM_PROMPT = """Respond in the following format:
<think>
You should reason between these tags.
</think>

Answer goes here...

Always use <think> </think> tags even if they are not necessary.
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Which one is larger? 9.11 or 9.9?"},
]

# Build the prompt with the model's chat template and move it to the
# model's device (works with device_map="auto").
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# do_sample=True is required for temperature to take effect.
output = model.generate(
    input_ids=inputs,
    max_new_tokens=256,
    use_cache=True,
    do_sample=True,
    temperature=0.7,
)
decoded_output = tokenizer.decode(output[0], skip_special_tokens=False)
print(decoded_output)
```
Output:

```
<think>
First, I need to compare the two numbers 9.11 and 9.9.

Next, I'll analyze each number. The first digit after the decimal point in 9.11 is 1, and in 9.9, it's 9.

Since 9 is greater than 1, 9.9 is larger than 9.11.
</think>

To determine which number is larger, let's compare the two numbers:

**9.11** and **9.9**

1. **Identify the Decimal Places:**
   - Both numbers have two decimal places.

2. **Compare the Tens Place (Right of the Decimal Point):**
   - **9.11:** The tens place is 1.
   - **9.9:** The tens place is 9.

3. **Conclusion:**
   - Since 9 is greater than 1, the number with the larger tens place is 9.9.

**Answer:** **9.9** is larger than **9.11**.
```
Suggested system prompt:

```
Respond in the following format:
<think>
You should reason between these tags.
</think>

Answer goes here...

Always use <think> </think> tags even if they are not necessary.
```
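Because the model wraps its reasoning in `<think>` tags, the chain of thought can be separated from the final answer programmatically. A minimal sketch (the helper `split_reasoning` is ours, not part of the model's API):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a response into its <think> reasoning and the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

response = "<think>\n9.9 has a larger tenths digit than 9.11.\n</think>\n9.9 is larger."
reasoning, answer = split_reasoning(response)
print(answer)  # 9.9 is larger.
```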
## Parameters

- lr: 2e-5
- epochs: 1
- batch_size: 16 (micro_batch_size 2 × gradient_accumulation_steps 8)
- optimizer: paged_adamw_8bit
## Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=suayptalha/DeepSeek-R1-Distill-Llama-3B).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 23.27 |
| IFEval (0-Shot)     | 70.93 |
| BBH (3-Shot)        | 21.45 |
| MATH Lvl 5 (4-Shot) | 20.92 |
| GPQA (0-shot)       |  1.45 |
| MuSR (0-shot)       |  2.91 |
| MMLU-PRO (5-shot)   | 21.98 |
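The reported average appears to be the unweighted mean of the six benchmark scores; a quick check:

```python
scores = {
    "IFEval (0-Shot)": 70.93,
    "BBH (3-Shot)": 21.45,
    "MATH Lvl 5 (4-Shot)": 20.92,
    "GPQA (0-shot)": 1.45,
    "MuSR (0-shot)": 2.91,
    "MMLU-PRO (5-shot)": 21.98,
}
print(f"{sum(scores.values()) / len(scores):.2f}")  # 23.27
```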