Samuel L Meyers PRO

MrOvkill

AI & ML interests

Dialogue Generation, Text Generation, etc...

Recent Activity

liked a model about 2 hours ago

cognitivecomputations/Dolphin3.0-Qwen2.5-3b

liked a model about 2 hours ago

cognitivecomputations/Dolphin3.0-Llama3.1-8B

liked a Space about 12 hours ago

Rooni/AIDC-AI-Marco-o1

MrOvkill's activity

liked 2 models about 2 hours ago

cognitivecomputations/Dolphin3.0-Qwen2.5-3b

Updated 6 days ago • 427 • 14

cognitivecomputations/Dolphin3.0-Llama3.1-8B

Updated 6 days ago • 1.35k • 106

liked a Space about 12 hours ago

Running

💬

bunnycore/Llama-3.2-1b-Mixer

Text Generation • Updated Oct 7, 2024 • 16 • 1

liked a model about 13 hours ago

huihui-ai/UwU-7B-Instruct-abliterated

Text Generation • Updated 5 days ago • 34 • 3

liked a Space about 13 hours ago

Running on A100

226

🔀

mergekit-gui

reacted to MoritzLaurer's post with ❤️ about 14 hours ago

The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.
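
To make the "all true" idea concrete, here is a minimal standalone sketch of combining several binary judges so that a completion is accepted only when every judge accepts it. It does not use TRL's AllTrueJudge class. The `all_true` helper and the two toy judges (`non_empty`, `short_enough`) are hypothetical names made up for illustration, and the 1/0 accept/reject convention is an assumption.

```python
from typing import Callable, List

# Assumption: a binary judge maps (prompt, completion) to 1 (accept) or 0 (reject).
BinaryJudge = Callable[[str, str], int]


def all_true(judges: List[BinaryJudge], prompt: str, completion: str) -> int:
    """Return 1 only if every judge accepts the completion, otherwise 0."""
    return int(all(judge(prompt, completion) == 1 for judge in judges))


# Hypothetical toy judges, for illustration only.
def non_empty(prompt: str, completion: str) -> int:
    # Accept any completion that contains non-whitespace text.
    return int(bool(completion.strip()))


def short_enough(prompt: str, completion: str) -> int:
    # Accept completions of at most 50 characters.
    return int(len(completion) <= 50)


judges = [non_empty, short_enough]
accepted = all_true(judges, "What is 2 + 2?", "4")    # both judges accept
rejected = all_true(judges, "What is 2 + 2?", "   ")  # non_empty rejects
```

A single rejecting judge is enough to reject the completion, so the combined verdict is at least as strict as any individual judge.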

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)