lm-human-preference-details

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

vwxyzjn authored a paper about 2 months ago

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

vwxyzjn authored a paper about 2 months ago

A2C is a special case of PPO

vwxyzjn authored a paper about 2 months ago

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

View all activity

lm-human-preference-details's activity

vwxyzjn

authored 5 papers about 2 months ago

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Paper • 2403.17031 • Published Mar 24, 2024 • 6

vwxyzjn

authored a paper about 1 year ago

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Paper • 2402.03046 • Published Feb 5, 2024 • 6

vwxyzjn

authored 2 papers over 1 year ago

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 123

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Paper • 2310.00036 • Published Sep 29, 2023 • 2

vwxyzjn

updated a Space over 1 year ago

Rlhf Demo

💻

Generate code snippets from a prompt

vwxyzjn

updated 11 models over 1 year ago

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed1

Text Generation • Updated Oct 6, 2023 • 12

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed5

Text Generation • Updated Oct 6, 2023 • 12

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed3

Text Generation • Updated Oct 6, 2023 • 12

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed4

Text Generation • Updated Oct 6, 2023 • 13

lm-human-preference-details/train_policy_accelerate_tf_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed2

Text Generation • Updated Oct 6, 2023 • 11

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed3

Text Generation • Updated Oct 6, 2023 • 14

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed5

Text Generation • Updated Oct 6, 2023 • 14

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed2

Text Generation • Updated Oct 6, 2023 • 14

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2_xl_grad_accu__sentiment_offline_5k.json__seed4

Text Generation • Updated Oct 6, 2023 • 16

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2__sentiment_offline_5k.json__seed5

Text Generation • Updated Oct 6, 2023 • 62

lm-human-preference-details/train_policy_accelerate_pt_adam_gpt2__sentiment_offline_5k.json__seed2

Text Generation • Updated Oct 5, 2023 • 61

AI & ML interests

Recent Activity

Team members 1

lm-human-preference-details's activity

Rlhf Demo