---
license: apache-2.0
datasets:
- TIGER-Lab/WebInstruct-CFT
language:
- en
base_model:
- Qwen/Qwen2.5-32B-Instruct
tags:
- cft
- math
- reasoning
pipeline_tag: text-generation
library_name: transformers
---
# Qwen2.5-32B-Instruct-CFT
<div style="display: flex; gap: 4px; align-items: center">
<a target="_blank" href="https://github.com/TIGER-AI-Lab/CritiqueFinetuning">
<img style="height:18pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github"/>
</a>
<a target="_blank" href="https://arxiv.org/abs/2501.17703">
<img style="height:18pt" src="https://img.shields.io/badge/-Paper-green?style=flat&logo=arxiv"/>
</a>
<a target="_blank" href="https://tiger-ai-lab.github.io/CritiqueFineTuning">
<img style="height:18pt" src="https://img.shields.io/badge/-📖%20Website-red?style=flat"/>
</a>
<a target="_blank" href="https://huggingface.co./datasets/TIGER-Lab/WebInstruct-CFT">
<img style="height:18pt" src="https://img.shields.io/badge/-🤗%20Dataset-red?style=flat"/>
</a>
</div>
## Introduction
Qwen2.5-32B-Instruct-CFT is a 32B parameter model fine-tuned using our novel Critique Fine-Tuning (CFT) approach. Built upon the Qwen2.5-32B-Instruct base model, this variant is trained to critique and analyze responses rather than simply imitate them, leading to enhanced reasoning capabilities.
## Key Features
- Built on the powerful Qwen2.5-32B-Instruct foundation
- Trained using Critique Fine-Tuning (CFT) methodology
- Highly data-efficient: fine-tuned on only 4K critique examples
- Inherits the strong instruction-following capabilities of the base model
## Training Details
### Training Data
- Dataset: [WebInstruct-CFT-4K](https://huggingface.co./datasets/TIGER-Lab/WebInstruct-CFT-4K)
- Training format: input = [query; noisy response], output = critique
- Teacher model: GPT-4o for generating critiques
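
For illustration, a single training example follows the schema above. The concrete question, response, and critique text below are hypothetical, written only to show the input/output shape, and are not drawn from the actual dataset:

```python
# Hypothetical CFT training example; field names and wording are illustrative only.
cft_example = {
    # Input pairs the original query with a noisy candidate response.
    "input": (
        "Question: What is the derivative of x^2?\n"
        "Candidate response: The derivative of x^2 is x."
    ),
    # Output is a critique of the candidate response (generated by GPT-4o
    # during dataset construction), not a direct answer to the query.
    "output": (
        "The response is incorrect. By the power rule, d/dx x^2 = 2x, not x: "
        "the exponent must multiply down as a coefficient."
    ),
}
```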
### Training Infrastructure
- Framework: LLaMA-Factory
- Hardware: 8x NVIDIA H100 GPUs
- Training time: ~1.5 hours with DeepSpeed ZeRO-3
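## Usage
The model can be loaded with the standard Hugging Face Transformers causal-LM API. The sketch below is a minimal example, not an official recipe: the repository id `TIGER-Lab/Qwen2.5-32B-Instruct-CFT` is assumed from this card's title and organization, and the prompt wording is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from the model card title; adjust if it differs.
MODEL_ID = "TIGER-Lab/Qwen2.5-32B-Instruct-CFT"

def critique(query: str, noisy_response: str, max_new_tokens: int = 512) -> str:
    """Ask the model to critique a candidate response to a query.

    Mirrors the CFT training format: the input combines the query with a
    noisy response, and the model is expected to produce a critique.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{
        "role": "user",
        "content": (
            f"Question: {query}\n\n"
            f"Candidate response: {noisy_response}\n\n"
            "Please critique the candidate response."
        ),
    }]
    # Qwen2.5 models ship a chat template; use it to build the prompt.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

Note that a 32B model requires substantial GPU memory; `device_map="auto"` will shard it across available devices.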
For more details about the model architecture, methodology, and comprehensive evaluation results, please visit our [project webpage](https://tiger-ai-lab.github.io/CritiqueFineTuning).