This repository is a WASM-recompiled build of the model https://huggingface.co./PowerInfer/SmallThinker-3B-Preview.
Original description:
datasets:
  - PowerInfer/QWQ-LONGCOT-500K
  - PowerInfer/LONGCOT-Refine-500K
base_model:
  - Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
language:
  - en
library_name: transformers
SmallThinker-3B-preview
We introduce SmallThinker-3B-preview, a new model fine-tuned from the Qwen2.5-3B-Instruct model.
Benchmark Performance
Model | AIME24 | AMC23 | GAOKAO2024_I | GAOKAO2024_II | MMLU_STEM | AMPS_Hard | math_comp |
---|---|---|---|---|---|---|---|
Qwen2.5-3B-Instruct | 6.67 | 45 | 50 | 35.8 | 59.8 | - | - |
SmallThinker | 16.667 | 57.5 | 64.2 | 57.1 | 68.2 | 70 | 46.8 |
GPT-4o | 9.3 | - | - | - | 64.2 | 57 | 50 |
Limitation: Because SmallThinker's instruction following is still limited, for math_comp we adopt a more lenient evaluation: only a correct answer is required, without constraining responses to follow the specified AAAAA format.
Intended Use Cases
SmallThinker is designed for the following use cases:
- Edge Deployment: Its small size makes it ideal for deployment on resource-constrained devices.
- Draft Model for QwQ-32B-Preview: SmallThinker can serve as a fast, efficient draft model for the larger QwQ-32B-Preview model. In our llama.cpp tests, speculative decoding raised throughput from about 40 tokens/s to 70 tokens/s, roughly a 75% speedup; a sketch of the setup is shown below.
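The speedup above was measured in llama.cpp; as a rough illustration of the same idea, the draft-model setup can also be sketched with the transformers library's assisted generation. The prompt, dtype, and device settings below are illustrative assumptions; both models are Qwen2.5-based, so their tokenizers should be compatible.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Target model (QwQ-32B-Preview) and draft/assistant model (SmallThinker-3B-Preview).
tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B-Preview")
target = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B-Preview", torch_dtype="auto", device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    "PowerInfer/SmallThinker-3B-Preview", torch_dtype="auto", device_map="auto"
)

prompt = "How many positive divisors does 360 have?"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# Passing assistant_model enables assisted (speculative) decoding: the small
# draft model proposes tokens and the large target model verifies them.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```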
Training Details
The model was trained using 8 H100 GPUs with a global batch size of 16. The specific configuration is as follows:
neat_packing: true
cutoff_len: 16384
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.02
bf16: true
ddp_timeout: 180000000
weight_decay: 0.0
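For reference, these settings are consistent with the stated global batch size: 8 GPUs × per_device_train_batch_size (2) × gradient_accumulation_steps (1) = 16.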
The SFT (Supervised Fine-Tuning) process was conducted in two phases:
First Phase:
- Used only the PowerInfer/QWQ-LONGCOT-500K dataset
- Trained for 1.5 epochs
Second Phase:
- Combined training with the PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine-500K datasets
- Continued training for 2 additional epochs
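Both SFT datasets are public on the Hub. Below is a minimal sketch for inspecting them with the Hugging Face datasets library, assuming each repo exposes a train split; field names are whatever the dataset cards define.

```python
from datasets import load_dataset

# Phase 1 uses only QWQ-LONGCOT-500K; phase 2 adds LONGCOT-Refine-500K.
longcot = load_dataset("PowerInfer/QWQ-LONGCOT-500K", split="train")
refine = load_dataset("PowerInfer/LONGCOT-Refine-500K", split="train")

print(len(longcot), len(refine))
print(longcot[0])  # print one record to see the actual field names
```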
Limitations & Disclaimer
Please be aware of the following limitations:
- Language Limitation: The model has only been trained on English-language datasets, so its capabilities in other languages are still limited.
- Limited Knowledge: Due to limited SFT data and the model's relatively small scale, its reasoning capabilities are constrained by its knowledge base.
- Unpredictable Outputs: The model may produce unexpected outputs due to its size and probabilistic generation paradigm. Users should exercise caution and validate the model's responses.
- Repetition Issue: The model tends to repeat itself when answering high-difficulty questions. Increasing the repetition_penalty generation parameter mitigates this (see the sketch below).
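A minimal generation sketch with the transformers library, assuming the standard checkpoint at PowerInfer/SmallThinker-3B-Preview and its Qwen2.5-style chat template; the prompt and the repetition_penalty value of 1.1 are illustrative starting points, not tuned recommendations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PowerInfer/SmallThinker-3B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "PowerInfer/SmallThinker-3B-Preview", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# repetition_penalty > 1.0 discourages repeated phrases on hard questions.
outputs = model.generate(input_ids, max_new_tokens=1024, repetition_penalty=1.1)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```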