agi-css committed on
Commit
49a6b3c
·
1 Parent(s): c677efc

Update README.md

  1. README.md +4 -2
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
  ![model image](https://agwarbliu.s3.amazonaws.com/model_select_ours.png)


- **Fast, Effective, and Stable alternative of RLHF!**
+ **Efficient, Effective, and Stable alternative of RLHF!**

  **Instead of training an additional reward model that is likely to be gamed, we directly train the model on the social games!** 🕹️ 🎲 🎮

@@ -29,7 +29,9 @@ Full details on simulation and training can be found [here](https://github.com/a

  # Training Procedure

- Trained on 8xA100s for 3H. The start checkpoint is the [SFT model](https://huggingface.co/agi-css/hh-rlhf-sft)). We have also released the [better-base model](https://huggingface.co/agi-css/better-base) which is the start checkpoint of SFT.
+ Trained on 8xA100s for 3H. The start checkpoint is the [SFT model](https://huggingface.co/agi-css/hh-rlhf-sft)).
+
+ We have also released the [better-base model](https://huggingface.co/agi-css/better-base) which is the start checkpoint of SFT.

  Here is the training script:
