Rui Yang

Ray2333

https://yangrui2015.github.io

YangRui2015

AI & ML interests

Deep Reinforcement Learning

Recent Activity

upvoted a paper about 17 hours ago

R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts

upvoted a paper about 17 hours ago

Self-rewarding correction for mathematical reasoning

liked a model 3 days ago

microsoft/Magma-8B

Collections 1

Papers 4

arxiv:2502.09560

arxiv:2411.00836

arxiv:2406.10216

arxiv:2402.10207

Models 15

Ray2333/Gemma-2B-rewardmodel-baseline

Text Classification • Updated 24 days ago • 929 downloads

Ray2333/GRM-llama3-8B-distill

Text Classification • Updated 24 days ago • 137 downloads • 6 likes

Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback

Text Classification • Updated 24 days ago • 2.09k downloads • 11 likes

Ray2333/GRM-Gemma-2B-rewardmodel-ft

Updated 24 days ago • 97 downloads • 1 like

Ray2333/Gemma-2B-rewardmodel-ft

Updated 24 days ago • 20 downloads • 1 like

Ray2333/GRM-llama3.2-3B-sftreg

Text Classification • Updated 24 days ago • 467 downloads • 1 like

Ray2333/GRM-Gemma-2B-sftreg

Text Classification • Updated 24 days ago • 298 downloads • 3 likes

Ray2333/GRM-llama3-8B-sftreg

Text Classification • Updated 24 days ago • 286 downloads • 5 likes

Ray2333/GRM-Gemma2-2B-sftreg

Text Classification • Updated 24 days ago • 21 downloads • 1 like

Ray2333/GRM-gemma2-2B-rewardmodel-ft

Text Classification • Updated 24 days ago • 2.54k downloads • 6 likes

Datasets 1

Ray2333/RiC_harmless_helpful

Viewer • Updated Jul 12, 2024 • 291k • 85