Update README.md
README.md
CHANGED
@@ -21,7 +21,7 @@ tags:
 ![model image](https://agwarbliu.s3.amazonaws.com/model_select_ours.png)
 
 
-**
+**Efficient, Effective, and Stable alternative of RLHF!**
 
 **Instead of training an additional reward model that is likely to be gamed, we directly train the model on the social games!** 🕹️ 🎲 🎮
 
@@ -29,7 +29,9 @@ Full details on simulation and training can be found [here](https://github.com/a
 
 # Training Procedure
 
-Trained on 8xA100s for 3H. The start checkpoint is the [SFT model](https://huggingface.co/agi-css/hh-rlhf-sft)).
+Trained on 8xA100s for 3H. The start checkpoint is the [SFT model](https://huggingface.co/agi-css/hh-rlhf-sft)).
+
+We have also released the [better-base model](https://huggingface.co/agi-css/better-base) which is the start checkpoint of SFT.
 
 Here is the training script:
 