RLHFlow

university


AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

RLHFlow's activity

hendrydong

authored a paper about 9 hours ago

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • arXiv 2501.19324 • Published 3 days ago • 24 upvotes

Min-Li

updated a dataset about 11 hours ago

RLHFlow/LLM-Preferences-HelpSteer2

Updated about 11 hours ago • 22 downloads

Min-Li

updated a collection 7 days ago

Decision-Tree Reward Models

Collection

3 items • Updated 7 days ago • 1 upvote

Min-Li

published a dataset 7 days ago

RLHFlow/LLM-Preferences-HelpSteer2

Updated about 11 hours ago • 22 downloads

Min-Li

updated 2 models 10 days ago

RLHFlow/Decision-Tree-Reward-Gemma-2-27B

Text Classification • Updated 10 days ago • 56 downloads • 1 like

RLHFlow/Decision-Tree-Reward-Llama-3.1-8B

Text Classification • Updated 10 days ago • 199 downloads

Min-Li

published 2 models 12 days ago

RLHFlow/Decision-Tree-Reward-Gemma-2-27B

Text Classification • Updated 10 days ago • 56 downloads • 1 like

RLHFlow/Decision-Tree-Reward-Llama-3.1-8B

Text Classification • Updated 10 days ago • 199 downloads

hendrydong

authored a paper about 1 month ago

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • arXiv 2412.16145 • Published Dec 20, 2024 • 38 upvotes

hendrydong

participated in a discussion on RLHFlow/LLaMA3.2-1B-SFT 3 months ago

"the training data for this model?"

Discussion #1, opened 3 months ago by AIR-hl

weqweasdas

updated 2 models 3 months ago

RLHFlow/Llama3.1-8B-PRM-Mistral-Data

Text Generation • Updated Nov 9, 2024 • 313 downloads • 8 likes

RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

Text Generation • Updated Nov 9, 2024 • 18.7k downloads • 33 likes

weqweasdas

updated 3 datasets 3 months ago

RLHFlow/Deepseek-ORM-Data

Viewer • Updated Nov 9, 2024 • 253k rows • 58 downloads • 3 likes

RLHFlow/Deepseek-PRM-Data

Viewer • Updated Nov 9, 2024 • 253k rows • 158 downloads • 12 likes

RLHFlow/Mistral-ORM-Data

Viewer • Updated Nov 9, 2024 • 273k rows • 138 downloads • 2 likes

Team members 7
