allenai/llama-3.1-tulu-2-8b-uf-mean-rm


hamishivi committed on Aug 12

Commit 1f93f2a (1 parent: f2a61cf)

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -68,7 +68,7 @@ This model is meant as a research artefact.
 ### Training hyperparameters
-The following hyperparameters were used during PPO training:
+The following hyperparameters were used during RM training:
 - learning_rate: 5e-06
 - total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
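To show how the listed values fit together, here is a minimal sketch of a single textbook Adam update (with bias correction) on one scalar parameter, using the learning rate, betas, and epsilon from the hunk. It is an illustration under assumptions, not the training code: `RMTrainingConfig` and `adam_step` are hypothetical names, and the commit does not say which framework's Adam implementation was used for RM training.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RMTrainingConfig:
    # Hypothetical container; values copied from the README hunk.
    learning_rate: float = 5e-06
    total_train_batch_size: int = 64
    beta1: float = 0.9
    beta2: float = 0.999
    epsilon: float = 1e-08


def adam_step(param, grad, m, v, t, cfg):
    """One textbook Adam update for a single scalar parameter.

    t is the 1-based step count; m and v are the running first and
    second moment estimates. Returns (new_param, new_m, new_v).
    """
    m = cfg.beta1 * m + (1 - cfg.beta1) * grad
    v = cfg.beta2 * v + (1 - cfg.beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - cfg.beta1 ** t)
    v_hat = v / (1 - cfg.beta2 ** t)
    param = param - cfg.learning_rate * m_hat / (math.sqrt(v_hat) + cfg.epsilon)
    return param, m, v


cfg = RMTrainingConfig()
p, m, v = adam_step(1.0, 0.5, 0.0, 0.0, 1, cfg)
# On step 1, m_hat / sqrt(v_hat) equals sign(grad) (up to epsilon),
# so the parameter moves by roughly learning_rate = 5e-06.
print(p)
```

The first-step behaviour is a handy sanity check: with bias correction, the step size is close to `learning_rate` regardless of the gradient's magnitude.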