weqweasdas committed
Commit 028c23e
1 Parent(s): 12658a6

Update README.md

Files changed (1):
  1. README.md +7 -4
README.md CHANGED
@@ -6,7 +6,9 @@
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-The reward model is trained from the base model [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it). See the 2B version [RM-Gemma-2B](https://huggingface.co/weqweasdas/RM-Gemma-2B).
+The reward model is trained from the base model [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).
+
+The training process is identical to [RM-Gemma-7B](https://huggingface.co/weqweasdas/RM-Gemma-7B) but with a max length of 4096, thanks to more GPU resources.
 
 ## Model Details
 
@@ -46,11 +48,11 @@ We train the model for one epoch with a learning rate of 5e-6, batch size 256, c
 
 ```python
 from transformers import AutoTokenizer, pipeline
-rm_tokenizer = AutoTokenizer.from_pretrained("weqweasdas/RM-Gemma-7B")
+rm_tokenizer = AutoTokenizer.from_pretrained("weqweasdas/RM-Gemma-7B-4096")
 device = 0 # accelerator.device
 rm_pipe = pipeline(
     "sentiment-analysis",
-    model="weqweasdas/RM-Gemma-7B",
+    model="weqweasdas/RM-Gemma-7B-4096",
     #device="auto",
     device=device,
     tokenizer=rm_tokenizer,
@@ -91,7 +93,8 @@ Note that for MT-Bench dataset (lmsys/mt_bench_human_judgments), we delete the s
 | UltraRM-13B | 0.71 | **0.73** | **0.72** | 0.72 | 0.78 | **0.9** | **0.65** | **0.83** | **0.62** |
 | Pair-RM | 0.65 | 0.56 | 0.62 | 0.6 | 0.74 | 0.82 | 0.62 | 0.75 | 0.59 |
 | RM-Gemma-2B | 0.68 | **0.73** | 0.68 | 0.72 | 0.77 | 0.87 | 0.63 | 0.78 | 0.59 |
-| RM-Gemma-7B | **0.72** | 0.72 | 0.71 | **0.74** | **0.79** | 0.89 | **0.65** | 0.78 | **0.62** |
+| RM-Gemma-7B | **0.72** | 0.72 | 0.71 | 0.74 | **0.79** | 0.89 | 0.65 | 0.78 | 0.62 |
+| RM-Gemma-7B-4096 | **0.72** | **0.73** | 0.71 | **0.75** | **0.79** | 0.89 | **0.66** | 0.82 | **0.66** |
 
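
For reference, the usage snippet in the second hunk is cut off at the hunk boundary. Below is a minimal sketch of how the updated snippet is typically completed to score a single (prompt, response) pair with the new model id; the `pipe_kwargs`, `model_kwargs`, and chat-template handling are assumptions for illustration, not part of this commit.

```python
# Sketch only: completes the truncated snippet from the diff above.
# pipe_kwargs, model_kwargs, and the chat-template handling are assumptions,
# not part of this commit.
import torch
from transformers import AutoTokenizer, pipeline

rm_tokenizer = AutoTokenizer.from_pretrained("weqweasdas/RM-Gemma-7B-4096")
device = 0  # accelerator.device
rm_pipe = pipeline(
    "sentiment-analysis",
    model="weqweasdas/RM-Gemma-7B-4096",
    device=device,
    tokenizer=rm_tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

pipe_kwargs = {
    "return_all_scores": True,    # return scores for all labels, not just the argmax label
    "function_to_apply": "none",  # keep the raw logit as the scalar reward (no sigmoid/softmax)
    "batch_size": 1,
}

# The reward model scores a whole conversation; Gemma's chat template turns the
# (prompt, response) pair into a single string. The leading <bos> is stripped so
# the pipeline's tokenizer does not add it twice.
chat = [
    {"role": "user", "content": "Explain what a reward model is in one sentence."},
    {"role": "assistant", "content": "A reward model assigns a scalar score to a response given its prompt."},
]
test_text = rm_tokenizer.apply_chat_template(
    chat, tokenize=False, add_generation_prompt=False
).replace(rm_tokenizer.bos_token, "")

pipe_outputs = rm_pipe([test_text], **pipe_kwargs)
reward = pipe_outputs[0][0]["score"]
print(reward)
```

With `function_to_apply="none"`, the pipeline returns the classifier logit itself as the reward rather than passing it through a sigmoid, which is the usual choice when rewards are only compared or ranked downstream.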