Commit 25e103d, committed by weqweasdas
1 parent: be26d01

Update README.md

Files changed (1)
  1. README.md +6 -3
README.md CHANGED
@@ -85,11 +85,14 @@ We train the model for one epoch with a learning rate of 1e-5, batch size 256, c
 
 We collect the existing preference datasets and use them as a benchmark to evaluate the resulting reawrd model.
 
-| Model/Test set | HH-RLHF-Helpful | SHP | Helpsteer helpful + correctness | Helpsteer All | MT Bench Human | MT Bench GPT4 | Alpaca Human | Alpaca GPT4| Alpca Human-crossed|
-| -------------- | -------------- | ------- | ------- | ------- | ------- | ------- | ------- |------- | ------- |
-| RM-Gemma-2B | 0.68 | 0.73 | 0.68 | 0.72 |0.77 | 0.87 | 0.63 | 0.78 | 0.59 |
 
 
+| Model/Test set | HH-RLHF-Helpful | SHP | Helpsteer helpful + correctness | Helpsteer All | MT Bench Human | MT Bench GPT4 | Alpaca Human | Alpaca GPT4 | Alpca Human-crossed |
+| :------------: | --------------- | -------- | ------------------------------- | ------------- | -------------- | ------------- | ------------ | ----------- | ------------------- |
+| UltraRM-13B | **0.71** | **0.73** | 0.72 | **0.72** | **0.78** | **0.9** | **0.65** | **0.83** | **0.62** |
+| Pair-RM | 0.65 | 0.56 | 0.62 | 0.6 | 0.74 | 0.82 | 0.62 | 0.75 | 0.59 |
+| RM-Gemma-2B | 0.68 | **0.73** | 0.68 | **0.72** | 0.77 | 0.87 | 0.63 | 0.78 | 0.59 |
+
 
 