UltraIF: Advancing Instruction Following from the Wild
Abstract
Instruction following has made modern large language models (LLMs) helpful assistants. However, the key to taming LLMs on complex instructions remains elusive, as there are wide gaps between models trained by the open-source community and those trained by leading companies. To bridge this gap, we propose UltraIF, a simple and scalable approach for building LLMs that can follow complex instructions using open-source data. UltraIF first decomposes real-world user prompts into simpler queries, constraints, and evaluation questions for those constraints. It then trains an UltraComposer to compose constraint-associated prompts together with their evaluation questions. This prompt composer allows us to synthesize complicated instructions and to filter responses using the evaluation questions. In our experiments, for the first time, we successfully align LLaMA-3.1-8B-Base to catch up with its instruct version on five instruction-following benchmarks without any benchmark information, using only an 8B model as response generator and evaluator. The aligned model also achieves competitive scores on other benchmarks. Moreover, we show that UltraIF can further improve LLaMA-3.1-8B-Instruct through self-alignment, motivating broader use cases for the method. Our code will be available at https://github.com/kkk-an/UltraIF.
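The pipeline the abstract describes (decompose a prompt into a simpler query, a constraint, and an evaluation question; recompose harder prompts; filter responses with the evaluation question) can be sketched as follows. This is a minimal toy illustration, not the paper's method: the rule-based `decompose` heuristic and the `judge` callback are hypothetical placeholders for the LLM-driven decomposer, the trained UltraComposer, and the evaluator model.

```python
# Toy sketch of the UltraIF data pipeline (illustrative assumptions only;
# the paper uses LLMs for each of these steps, not string heuristics).

def decompose(prompt: str) -> dict:
    """Split a prompt into a simpler query, a constraint, and an
    evaluation question for that constraint (toy heuristic: treat the
    last ' in '-clause as the constraint)."""
    if " in " in prompt:
        query, rest = prompt.split(" in ", 1)
        constraint = "in " + rest
        eval_question = f"Does the response satisfy the constraint '{constraint}'?"
    else:
        query, constraint, eval_question = prompt, "", ""
    return {"query": query, "constraint": constraint, "eval_question": eval_question}

def compose(query: str, constraint: str) -> str:
    """Recombine a simple query with a constraint into a harder
    instruction (the role played by the trained UltraComposer)."""
    return f"{query} {constraint}".strip()

def keep_response(response: str, eval_question: str, judge) -> bool:
    """Keep a candidate response only if the judge (an evaluator model
    in the paper) answers the evaluation question affirmatively."""
    return judge(response, eval_question)

record = decompose("Write a poem about autumn in exactly four lines")
harder_prompt = compose(record["query"], record["constraint"])
accepted = keep_response(
    "Leaves fall...\nWinds call...\nDays shrink...\nNights think...",
    record["eval_question"],
    judge=lambda resp, q: len(resp.splitlines()) == 4,  # stand-in judge
)
```

In the actual method, both synthesis and filtering scale because the same UltraComposer produces the constraint-augmented prompt and its evaluation question in one pass; the toy `judge` above stands in for the 8B evaluator model mentioned in the abstract.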
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models (2025)
- Smaller Language Models Are Better Instruction Evolvers (2024)
- GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers (2024)
- ExecRepoBench: Multi-level Executable Code Completion Evaluation (2024)
- Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on Basque (2024)
- Template Matters: Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training (2024)
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution (2025)