---
license: apache-2.0
datasets:
- TIGER-Lab/WebInstruct-CFT
language:
- en
base_model:
- Qwen/Qwen2.5-32B-Instruct
tags:
- cft
- math
- reasoning
pipeline_tag: text-generation
library_name: transformers
---

# Qwen2.5-32B-Instruct-CFT

<div style="display: flex; gap: 4px; align-items: center">
  <a target="_blank" href="https://github.com/TIGER-AI-Lab/CritiqueFinetuning">
    <img style="height:18pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github"/>
  </a>
  <a target="_blank" href="https://arxiv.org/abs/2501.17703">
    <img style="height:18pt" src="https://img.shields.io/badge/-Paper-green?style=flat&logo=arxiv"/>
  </a>
  <a target="_blank" href="https://tiger-ai-lab.github.io/CritiqueFineTuning">
    <img style="height:18pt" src="https://img.shields.io/badge/-🌐%20Website-red?style=flat"/>
  </a>
  <a target="_blank" href="https://huggingface.co./datasets/TIGER-Lab/WebInstruct-CFT">
    <img style="height:18pt" src="https://img.shields.io/badge/-🤗%20Dataset-red?style=flat"/>
  </a>
</div>

## Introduction

Qwen2.5-32B-Instruct-CFT is a 32B-parameter model fine-tuned using our novel Critique Fine-Tuning (CFT) approach. Built upon the Qwen2.5-32B-Instruct base model, this variant is trained to critique and analyze responses rather than simply imitate them, leading to enhanced reasoning capabilities.
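
For quick use, here is a minimal inference sketch with 🤗 Transformers. It assumes the model is hosted under the repo id `TIGER-Lab/Qwen2.5-32B-Instruct-CFT` and uses the standard Qwen2.5 chat template; adjust the repo id if needed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; change it if the model lives under a different name.
model_id = "TIGER-Lab/Qwen2.5-32B-Instruct-CFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]

# Qwen2.5 models ship a chat template, so we can format the prompt with it.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```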

## Key Features

- Built on the powerful Qwen2.5-32B-Instruct foundation
- Trained using the Critique Fine-Tuning (CFT) methodology
- Highly data-efficient: fine-tuned on only ~4K critique examples
- Inherits the strong instruction-following capabilities of the base model

## Training Details

### Training Data

- Dataset: [WebInstruct-CFT-4K](https://huggingface.co./datasets/TIGER-Lab/WebInstruct-CFT-4K)
- Training format: input = [query; noisy response], output = critique (see the sketch below)
- Teacher model: GPT-4o, used to generate the critiques
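
To make this format concrete, here is an illustrative sketch of how a single training pair could be assembled. The field names (`query`, `noisy_response`, `critique`) and the prompt wording are hypothetical, not the dataset's actual schema.

```python
# Illustrative only: the field names and prompt wording below are
# hypothetical, not the actual WebInstruct-CFT-4K schema.
example = {
    "query": "Compute the derivative of f(x) = x^2 * sin(x).",
    "noisy_response": "f'(x) = 2x * sin(x)",  # flawed candidate answer
    "critique": (
        "Incorrect: the product rule was not applied. The correct "
        "derivative is f'(x) = 2x*sin(x) + x^2*cos(x)."
    ),
}

# CFT supervision: the model reads the query together with the noisy
# response, and the training target is the critique itself.
cft_input = (
    f"Question:\n{example['query']}\n\n"
    f"Candidate solution:\n{example['noisy_response']}\n\n"
    "Please critique the candidate solution above."
)
cft_target = example["critique"]
```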

### Training Infrastructure

- Framework: LLaMA-Factory
- Hardware: 8x NVIDIA H100 GPUs
- Training time: ~1.5 hours with DeepSpeed ZeRO-3

For more details about the model architecture, methodology, and comprehensive evaluation results, please visit our [project webpage](https://tiger-ai-lab.github.io/CritiqueFineTuning).