
Tim Bula

timrbula

AI & ML interests

LLMs for language and code

Recent Activity

liked a model 10 days ago

ibm-granite/granite-3.1-3b-a800m-base

liked a model 10 days ago

ibm-granite/granite-3.1-1b-a400m-instruct

liked a model 10 days ago

ibm-granite/granite-3.1-1b-a400m-base



timrbula's activity

liked 7 models 10 days ago

updated a model 17 days ago

timrbula/SmolLM2-FT-SmolTalk

Text Generation • Updated 17 days ago • 12

reacted to MoritzLaurer's post with 🔥 17 days ago

Post


The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning. Examples include fetching the current temperature dynamically from within a prompt.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (arXiv:2409.20370)
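
The mixture-of-judges item in the post above describes combining several binary judges into one decision. A minimal sketch of that idea in plain Python follows. It does not use TRL, and every name in it (`BinaryJudge`, `all_true`, the toy judges) is hypothetical. The convention that a judge returns 1 for pass, 0 for fail, and -1 for an internal error is an assumption of this sketch, not a statement about the library's API.

```python
from typing import Callable, List

# Hypothetical type for this sketch: a binary judge maps (prompt, completion)
# to 1 (pass), 0 (fail), or -1 (the judge itself failed to produce a verdict).
BinaryJudge = Callable[[str, str], int]


def all_true(judges: List[BinaryJudge], prompt: str, completion: str) -> int:
    """Combine binary judges: pass only if every judge passes."""
    verdicts = [judge(prompt, completion) for judge in judges]
    # Assumption: an error in any single judge makes the combined verdict an error.
    if any(v == -1 for v in verdicts):
        return -1
    return 1 if all(v == 1 for v in verdicts) else 0


# Two toy judges, purely for illustration.
def is_short(prompt: str, completion: str) -> int:
    return 1 if len(completion) <= 40 else 0


def ends_with_period(prompt: str, completion: str) -> int:
    return 1 if completion.endswith(".") else 0


if __name__ == "__main__":
    judges = [is_short, ends_with_period]
    print(all_true(judges, "Capital of France?", "Paris."))
    print(all_true(judges, "Capital of France?", "Paris"))
```

Requiring unanimity makes the combined judge strict: a completion that satisfies most criteria but misses one is still rejected, which suits cases where each judge encodes a hard constraint.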