weqweasdas/RM-Gemma-2B


weqweasdas committed on Feb 25

Commit 25e103d • 1 Parent(s): be26d01

Update README.md

Files changed (1)

README.md +6 -3


@@ -85,11 +85,14 @@ We train the model for one epoch with a learning rate of 1e-5, batch size 256, c
 We collect the existing preference datasets and use them as a benchmark to evaluate the resulting reawrd model.
-| Model/Test set     | HH-RLHF-Helpful | SHP | Helpsteer helpful + correctness | Helpsteer All | MT Bench Human	| MT Bench GPT4	| Alpaca Human	| Alpaca GPT4|	Alpca Human-crossed|
-| -------------- | -------------- | ------- | ------- | ------- | ------- | ------- | ------- |------- | ------- |
-| RM-Gemma-2B | 0.68           | 0.73    | 0.68 ｜ 0.72 ｜0.77 ｜ 0.87 ｜ 0.63 ｜ 0.78 ｜ 0.59 ｜

+| Model/Test set | HH-RLHF-Helpful | SHP      | Helpsteer helpful + correctness | Helpsteer All | MT Bench Human | MT Bench GPT4 | Alpaca Human | Alpaca GPT4 | Alpca Human-crossed |
+| :------------: | --------------- | -------- | ------------------------------- | ------------- | -------------- | ------------- | ------------ | ----------- | ------------------- |
+|  UltraRM-13B   | **0.71**        | **0.73** | 0.72                            | **0.72**      | **0.78**       | **0.9**       | **0.65**     | **0.83**    | **0.62**            |
+|    Pair-RM     | 0.65            | 0.56     | 0.62                            | 0.6           | 0.74           | 0.82          | 0.62         | 0.75        | 0.59                |
+|  RM-Gemma-2B   | 0.68            | **0.73** | 0.68                            | **0.72**      | 0.77           | 0.87          | 0.63         | 0.78        | 0.59                |
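
For readers wondering how a score like those in the table could be produced, here is a minimal sketch. It assumes the reported numbers are pairwise preference accuracies, meaning the fraction of (chosen, rejected) pairs where the reward model scores the chosen response higher; the README excerpt does not state the metric explicitly. The names `pairwise_accuracy`, `toy_length_score`, and `toy_pairs` are hypothetical and used only for illustration. A real evaluation would replace the toy scorer with the reward model's scalar output.

```python
from typing import Callable, Iterable, Tuple

# (prompt, chosen response, rejected response)
PreferencePair = Tuple[str, str, str]


def pairwise_accuracy(
    pairs: Iterable[PreferencePair],
    score_fn: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the chosen response outscores the rejected one.

    Assumption: this is the metric behind the benchmark table; ties count
    as incorrect in this sketch.
    """
    total = 0
    correct = 0
    for prompt, chosen, rejected in pairs:
        total += 1
        if score_fn(prompt, chosen) > score_fn(prompt, rejected):
            correct += 1
    if total == 0:
        raise ValueError("no preference pairs were provided")
    return correct / total


def toy_length_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model: longer responses score higher."""
    return float(len(response))


# Hypothetical toy data, not taken from any of the benchmark datasets.
toy_pairs = [
    ("q1", "a longer answer", "short"),
    ("q2", "ok", "a much longer reply"),
    ("q3", "detailed response", "meh"),
    ("q4", "fine answer here", "no"),
]

accuracy = pairwise_accuracy(toy_pairs, toy_length_score)
print(f"pairwise accuracy: {accuracy:.2f}")
```

Counting ties as incorrect is a design choice in this sketch; an evaluation could equally score them as half-correct, which would shift the numbers slightly on datasets with many near-identical responses.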