Samuel L Meyers PRO

MrOvkill

AI & ML interests

Dialogue Generation, Text Generation, etc...

Recent Activity

liked a model about 2 hours ago
cognitivecomputations/Dolphin3.0-Qwen2.5-3b
liked a model about 2 hours ago
cognitivecomputations/Dolphin3.0-Llama3.1-8B
liked a Space about 12 hours ago
Rooni/AIDC-AI-Marco-o1

Organizations

Digital Clockwork, Social Post Explorers, Hugging Face Discord Community

MrOvkill's activity

reacted to MoritzLaurer's post with ❤️ about 14 hours ago
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, which trains models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.
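
To make "rewarding intermediate steps" concrete, here is a minimal sketch of step-level supervision in plain Python. It does not use TRL; the field names (`prompt`, `steps`, `labels`) and the helper `first_incorrect_step` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

# Hypothetical step-labeled example: each reasoning step carries its own
# correctness label, instead of a single label for the final answer.
example = {
    "prompt": "What is 3 * (2 + 4)?",
    "steps": ["2 + 4 = 6", "3 * 6 = 20", "The answer is 20"],
    "labels": [True, False, False],
}

def first_incorrect_step(labels: List[bool]) -> Optional[int]:
    """Return the index of the first step labeled incorrect, or None."""
    for i, ok in enumerate(labels):
        if not ok:
            return i
    return None

# Step-level labels locate where the reasoning went wrong,
# which an outcome-only label cannot do.
idx = first_incorrect_step(example["labels"])
print(idx, example["steps"][idx])
```

An outcome-only reward would mark this whole solution as wrong; the step labels show that the first step was fine and the error entered at the second.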

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

šŸ› ļø Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

āš–ļø Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)