arxiv:2412.16145
Hanze Dong
hendrydong
Recent Activity
authored a paper 2 days ago
Offline Reinforcement Learning for LLM Multi-Step Reasoning
upvoted a paper 2 days ago
Offline Reinforcement Learning for LLM Multi-Step Reasoning
new activity about 1 month ago
RLHFlow/LLaMA3.2-1B-SFT: the training data for this model?
Organizations
Papers 12
Models 5
hendrydong/dpo_offline_700K • Text Generation • Updated • 9
hendrydong/llama3 • Updated
hendrydong/dpo_K8_max_max • Text Generation • Updated • 15
hendrydong/Mistral-RM-for-RAFT-GSHF-v0 • Text Classification • Updated • 18 • 1
hendrydong/Mistral-RM-baseline-No-Safety-Alignment • Text Classification • Updated • 9