TRL - Transformer Reinforcement Learning

TRL is a full-stack library that provides a set of tools to train transformer language models with methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), reward modeling, and more. The library is integrated with 🤗 Transformers.
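
As a taste of the API, here is a minimal sketch of supervised fine-tuning with the SFTTrainer, assuming a recent TRL release; the model id and dataset below are illustrative placeholders you can swap for your own:

```python
# Minimal SFT sketch: fine-tune a small causal LM on a chat dataset.
from datasets import load_dataset
from trl import SFTTrainer

# Illustrative dataset and model; substitute your own.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # a model id string; the trainer loads it for you
    train_dataset=dataset,
)
trainer.train()
```

The same pattern (a trainer class plus a dataset) applies to the other methods, such as DPOTrainer and GRPOTrainer.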

You can also explore TRL-related models, datasets, and demos in the TRL Hugging Face organization.

Learn

Learn post-training with TRL and other libraries in the 🤗 smol course.

Contents

The documentation is organized into the following sections:

  • Getting Started: installation and quickstart guide.
  • Conceptual Guides: dataset formats, training FAQ, and understanding logs.
  • How-to Guides: reducing memory usage, speeding up training, distributing training, etc.
  • Integrations: DeepSpeed, Liger Kernel, PEFT, etc.
  • Examples: example overview, community tutorials, etc.
  • API: trainers, utils, etc.
