weqweasdas committed
Commit 028c23e
Parent(s): 12658a6
Update README.md
README.md CHANGED
@@ -6,7 +6,9 @@
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-The reward model is trained from the base model [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).
+The reward model is trained from the base model [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).
+
+The training process is identical to [RM-Gemma-7B](https://huggingface.co/weqweasdas/RM-Gemma-7B) but with a max-length of 4096, thanks to more GPU resources.
 
 ## Model Details
 
@@ -46,11 +48,11 @@ We train the model for one epoch with a learning rate of 5e-6, batch size 256, c
 
 ```python
 from transformers import AutoTokenizer, pipeline
-rm_tokenizer = AutoTokenizer.from_pretrained("weqweasdas/RM-Gemma-7B")
+rm_tokenizer = AutoTokenizer.from_pretrained("weqweasdas/RM-Gemma-7B-4096")
 device = 0 # accelerator.device
 rm_pipe = pipeline(
     "sentiment-analysis",
-    model="weqweasdas/RM-Gemma-7B",
+    model="weqweasdas/RM-Gemma-7B-4096",
     #device="auto",
     device=device,
     tokenizer=rm_tokenizer,
@@ -91,7 +93,8 @@ Note that for MT-Bench dataset (lmsys/mt_bench_human_judgments), we delete the s
 | UltraRM-13B | 0.71 | **0.73** | **0.72** | 0.72 | 0.78 | **0.9** | **0.65** | **0.83** | **0.62** |
 | Pair-RM | 0.65 | 0.56 | 0.62 | 0.6 | 0.74 | 0.82 | 0.62 | 0.75 | 0.59 |
 | RM-Gemma-2B | 0.68 | **0.73** | 0.68 | 0.72 | 0.77 | 0.87 | 0.63 | 0.78 | 0.59 |
-| RM-Gemma-7B | **0.72** | 0.72 | 0.71 |
+| RM-Gemma-7B | **0.72** | 0.72 | 0.71 | 0.74 | **0.79** | 0.89 | 0.65 | 0.78 | 0.62 |
+| RM-Gemma-7B-4096 | **0.72** | **0.73** | 0.71 | **0.75** | **0.79** | 0.89 | **0.66** | 0.82 | **0.66** |
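The only stated change from RM-Gemma-7B is the max-length of 4096. As a hedged illustration of what that setting presumably means at preprocessing time (this is an assumption, not the repo's training script; the example string is hypothetical):

```python
# Hypothetical sketch: a 4096-token max-length applied at tokenization.
# Not the authors' training code; the example and truncation call are
# illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("weqweasdas/RM-Gemma-7B-4096")

example = "..."  # a flattened chosen/rejected conversation (placeholder)
encoded = tokenizer(
    example,
    truncation=True,
    max_length=4096,  # the limit this checkpoint was trained with
)
print(len(encoded["input_ids"]))  # never exceeds 4096
```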
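The diff's code block stops at the pipeline constructor. A minimal sketch of how the resulting `rm_pipe` might then be called, assuming the standard text-classification pipeline interface (the `chat` contents, the BOS-stripping step, and the scoring kwargs below are illustrative assumptions, not the authors' exact code):

```python
# Continues from the rm_tokenizer / rm_pipe definitions in the diff above.
chat = [
    {"role": "user", "content": "Explain the Pythagorean theorem."},
    {"role": "assistant", "content": "For a right triangle with legs a, b and hypotenuse c: a**2 + b**2 == c**2."},
]

# Flatten the chat with the model's own template. Stripping the BOS token
# (assumption) keeps it from being added twice when the pipeline re-tokenizes.
text = rm_tokenizer.apply_chat_template(
    chat, tokenize=False, add_generation_prompt=False
).replace(rm_tokenizer.bos_token, "")

# function_to_apply="none" returns the raw reward scalar rather than a
# probability; return_all_scores=True yields a list of label scores per input.
pipe_outputs = rm_pipe(
    [text],
    return_all_scores=True,
    function_to_apply="none",
    batch_size=1,
)
reward = pipe_outputs[0][0]["score"]  # higher = preferred by the reward model
print(reward)
```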