RLAIF

Enterprise

community

AI & ML interests

None defined yet.

Recent Activity

violetxi authored a paper 10 days ago

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

violetxi authored a paper 10 days ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

LouisCastricato authored a paper 12 days ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought


Collections 1

Models 7

RLAIF/sft-external

Text Generation • Updated Dec 19, 2024 • 20.9k

RLAIF/sft-llama-3.1-8b-external

Text Generation • Updated Nov 12, 2024 • 3.73k

RLAIF/sft-gemma-2-9b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64

Text Generation • Updated Oct 30, 2024 • 3

RLAIF/sft-llama8b-prm-800k-correct-only

Text Generation • Updated Oct 24, 2024 • 3

RLAIF/22-sequential-temp-0-verifier-no-best-oracle-in-context-train-8

Updated Oct 13, 2024

RLAIF/22-sequential-temp-0-verifier-oracle-in-context-train-8-w-error-masking

Updated Oct 11, 2024

RLAIF/15-w-error-masking-temp-0-verifier-in-context-train-in-context-inference-8-model

Updated Sep 30, 2024 • 4

Datasets 20

RLAIF/iGSM-1M-retry0.5

Updated 1 day ago • 1.01M • 13

RLAIF/iGSM-1M-retry0.0

Updated 2 days ago • 1.01M • 35

RLAIF/iGSM-1M-retry0.6

Updated 2 days ago • 1.01M • 20

RLAIF/iGSM-1M-retry0.1

Updated 2 days ago • 1.01M • 39

RLAIF/iGSM-1M-retry0.8

Updated 3 days ago • 100 • 16

RLAIF/numina-math-llama-3.1-8b-bon-meta-cot

Updated 7 days ago • 680k • 496

RLAIF/optim_policy_pretrain-pythia-160m_lr0.0001_bs24_wp1_wd0.01_ep0_cp35k-merged

Updated 8 days ago • 700k • 22

RLAIF/TIR-Batched-PRM-Seed-Rollouts

Updated 25 days ago • 160k • 39

RLAIF/dec_09_token_baseline_ds_math_llama_3_1_405b_tmp07_together

Updated Dec 9, 2024 • 2.5k • 39

RLAIF/dec09_token_thinking_shrt_ds_math_llama_3_1_8b_instruc_tmp07

Updated Dec 9, 2024 • 2.5k • 40