
Tim Bula

timrbula

AI & ML interests

LLMs for language and code


Organizations

IBM

timrbula's activity

reacted to MoritzLaurer's post with 🔥 17 days ago
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of process-supervised reward models (PRMs), which score the quality of each intermediate step rather than only the final answer, promoting structured reasoning. Well suited to tasks that require step-by-step reasoning.

🔀 Model merging: A new callback uses mergekit to merge the reference and policy models during training, which can improve performance. Merged models can optionally be pushed to the Hugging Face Hub.

šŸ› ļø Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

āš–ļø Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)