Intermediate artifacts for tool use
RLAIF
Enterprise
community
Collections
1
models
7
RLAIF/sft-external • Text Generation • 20.9k
RLAIF/sft-llama-3.1-8b-external • Text Generation • 3.73k
RLAIF/sft-gemma-2-9b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64 • Text Generation • 3
RLAIF/sft-llama8b-prm-800k-correct-only • Text Generation • 3
RLAIF/22-sequential-temp-0-verifier-no-best-oracle-in-context-train-8
RLAIF/22-sequential-temp-0-verifier-oracle-in-context-train-8-w-error-masking
RLAIF/15-w-error-masking-temp-0-verifier-in-context-train-in-context-inference-8-model • 4
datasets
20
RLAIF/iGSM-1M-retry0.5 • 1.01M • 13
RLAIF/iGSM-1M-retry0.0 • 1.01M • 35
RLAIF/iGSM-1M-retry0.6 • 1.01M • 20
RLAIF/iGSM-1M-retry0.1 • 1.01M • 39
RLAIF/iGSM-1M-retry0.8 • 100 • 16
RLAIF/numina-math-llama-3.1-8b-bon-meta-cot • 680k • 496
RLAIF/optim_policy_pretrain-pythia-160m_lr0.0001_bs24_wp1_wd0.01_ep0_cp35k-merged • 700k • 22
RLAIF/TIR-Batched-PRM-Seed-Rollouts • 160k • 39
RLAIF/dec_09_token_baseline_ds_math_llama_3_1_405b_tmp07_together • 2.5k • 39
RLAIF/dec09_token_thinking_shrt_ds_math_llama_3_1_8b_instruc_tmp07 • 2.5k • 40