---
license: apache-2.0
datasets:
- TIGER-Lab/WebInstruct-CFT
language:
- en
base_model:
- Qwen/Qwen2.5-32B-Instruct
tags:
- cft
- math
- reasoning
pipeline_tag: text-generation
library_name: transformers
---

# Qwen2.5-32B-Instruct-CFT

<div style="display: flex; gap: 4px; align-items: center">
  <a target="_blank" href="https://github.com/TIGER-AI-Lab/CritiqueFinetuning">
    <img style="height:18pt" src="https://img.shields.io/badge/-Code-black?style=flat&logo=github"/>
  </a>
  <a target="_blank" href="https://arxiv.org/abs/2501.17703">
    <img style="height:18pt" src="https://img.shields.io/badge/-Paper-green?style=flat&logo=arxiv"/>
  </a>
  <a target="_blank" href="https://tiger-ai-lab.github.io/CritiqueFineTuning">
    <img style="height:18pt" src="https://img.shields.io/badge/-🌐%20Website-red?style=flat"/>
  </a>
  <a target="_blank" href="https://huggingface.co./datasets/TIGER-Lab/WebInstruct-CFT">
    <img style="height:18pt" src="https://img.shields.io/badge/-🤗%20Dataset-red?style=flat"/>
  </a>
</div>

## Introduction

Qwen2.5-32B-Instruct-CFT is a 32B-parameter model fine-tuned using our novel Critique Fine-Tuning (CFT) approach. Built upon the Qwen2.5-32B-Instruct base model, this variant is trained to critique and analyze responses rather than simply imitate them, leading to enhanced reasoning capabilities.
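
For quick use, here is a minimal inference sketch with 🤗 Transformers. It assumes the model is hosted under the repo id `TIGER-Lab/Qwen2.5-32B-Instruct-CFT` and uses the standard Qwen2.5 chat template; adjust the repo id if needed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; change it if the model lives under a different name.
model_id = "TIGER-Lab/Qwen2.5-32B-Instruct-CFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]

# Qwen2.5 models ship a chat template, so we can format the prompt with it.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```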

## Key Features

- Built on the powerful Qwen2.5-32B-Instruct foundation
- Trained using the Critique Fine-Tuning (CFT) methodology
- Highly data-efficient: fine-tuned on only ~4K critique examples
- Inherits the strong instruction-following capabilities of the base model

## Training Details

### Training Data

- Dataset: [WebInstruct-CFT-4K](https://huggingface.co./datasets/TIGER-Lab/WebInstruct-CFT-4K)
- Training format: input = [query; noisy response], output = critique (see the sketch below)
- Teacher model: GPT-4o, used to generate the critiques
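
To make this format concrete, here is an illustrative sketch of how a single training pair could be assembled. The field names (`query`, `noisy_response`, `critique`) and the prompt wording are hypothetical, not the dataset's actual schema.

```python
# Illustrative only: the field names and prompt wording below are
# hypothetical, not the actual WebInstruct-CFT-4K schema.
example = {
    "query": "Compute the derivative of f(x) = x^2 * sin(x).",
    "noisy_response": "f'(x) = 2x * sin(x)",  # flawed candidate answer
    "critique": (
        "Incorrect: the product rule was not applied. The correct "
        "derivative is f'(x) = 2x*sin(x) + x^2*cos(x)."
    ),
}

# CFT supervision: the model reads the query together with the noisy
# response, and the training target is the critique itself.
cft_input = (
    f"Question:\n{example['query']}\n\n"
    f"Candidate solution:\n{example['noisy_response']}\n\n"
    "Please critique the candidate solution above."
)
cft_target = example["critique"]
```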

### Training Infrastructure

- Framework: LLaMA-Factory
- Hardware: 8x NVIDIA H100 GPUs
- Training time: ~1.5 hours with DeepSpeed ZeRO-3

For more details about the model architecture, methodology, and comprehensive evaluation results, please visit our [project webpage](https://tiger-ai-lab.github.io/CritiqueFineTuning).