REBEL: Reinforcement Learning via Regressing Relative Reward - a Cornell-AGI Collection

Updated Sep 2, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards

Paper • 2404.16767 • Published Apr 25, 2024 • 2
Cornell-AGI/REBEL-Llama-3-Armo-iter_1

Updated Sep 2, 2024 • 14 • 1
Cornell-AGI/REBEL-Llama-3-Armo-iter_2

Updated Sep 2, 2024 • 8 • 1
Cornell-AGI/REBEL-Llama-3-Armo-iter_3

Updated Sep 2, 2024 • 11 • 2
Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_1

Viewer • Updated Sep 2, 2024 • 56.1k • 62
Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_2

Viewer • Updated Sep 2, 2024 • 55.1k • 51
Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_3

Viewer • Updated Sep 2, 2024 • 44.6k • 74 • 1
Cornell-AGI/REBEL-Llama-3

Text Generation • Updated Sep 1, 2024 • 29 • 1
Cornell-AGI/REBEL-Llama-3-epoch_2

Text Generation • Updated Sep 1, 2024 • 27 • 3
Cornell-AGI/REBEL-OpenChat-3.5

Text Generation • Updated Sep 1, 2024 • 16 • 1