TRL - Transformer Reinforcement Learning

TRL is a full stack library where we provide a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.

You can also explore TRL-related models, datasets, and demos in the TRL Hugging Face organization.

Learn

Learn post-training with TRL and other libraries in 🤗 smol course.

The documentation is organized into the following sections:

Getting Started: installation and quickstart guide.
Conceptual Guides: dataset formats, training FAQ, and understanding logs.
How-to Guides: reducing memory usage, speeding up training, distributing training, etc.
Integrations: DeepSpeed, Liger Kernel, PEFT, etc.
Examples: example overview, community tutorials, etc.
API: trainers, utils, etc.