weqweasdas committed
Commit 028c23e
1 Parent(s): 12658a6

Update README.md

Files changed (1):
  1. README.md +7 -4
README.md CHANGED
@@ -6,7 +6,9 @@
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-The reward model is trained from the base model [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it). See the 2B version [RM-Gemma-2B](https://huggingface.co/weqweasdas/RM-Gemma-2B).
+The reward model is trained from the base model [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).
+
+The training process is identical to [RM-Gemma-7B](https://huggingface.co/weqweasdas/RM-Gemma-7B) but with a max length of 4096, thanks to more GPU resources.
 
 ## Model Details
 
@@ -46,11 +48,11 @@ We train the model for one epoch with a learning rate of 5e-6, batch size 256, c
 
 ```python
 from transformers import AutoTokenizer, pipeline
-rm_tokenizer = AutoTokenizer.from_pretrained("weqweasdas/RM-Gemma-7B")
+rm_tokenizer = AutoTokenizer.from_pretrained("weqweasdas/RM-Gemma-7B-4096")
 device = 0 # accelerator.device
 rm_pipe = pipeline(
     "sentiment-analysis",
-    model="weqweasdas/RM-Gemma-7B",
+    model="weqweasdas/RM-Gemma-7B-4096",
     #device="auto",
     device=device,
     tokenizer=rm_tokenizer,
@@ -91,7 +93,8 @@ Note that for MT-Bench dataset (lmsys/mt_bench_human_judgments), we delete the s
 | UltraRM-13B | 0.71 | **0.73** | **0.72** | 0.72 | 0.78 | **0.9** | **0.65** | **0.83** | **0.62** |
 | Pair-RM | 0.65 | 0.56 | 0.62 | 0.6 | 0.74 | 0.82 | 0.62 | 0.75 | 0.59 |
 | RM-Gemma-2B | 0.68 | **0.73** | 0.68 | 0.72 | 0.77 | 0.87 | 0.63 | 0.78 | 0.59 |
-| RM-Gemma-7B | **0.72** | 0.72 | 0.71 | **0.74** | **0.79** | 0.89 | **0.65** | 0.78 | **0.62** |
+| RM-Gemma-7B | **0.72** | 0.72 | 0.71 | 0.74 | **0.79** | 0.89 | 0.65 | 0.78 | 0.62 |
+| RM-Gemma-7B-4096 | **0.72** | **0.73** | 0.71 | **0.75** | **0.79** | 0.89 | **0.66** | 0.82 | **0.66** |
 
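
For reference, the usage snippet in the second hunk is cut off at the hunk boundary. Below is a minimal sketch of how the updated snippet is typically completed to score a single (prompt, response) pair with the new model id; the `pipe_kwargs`, `model_kwargs`, and chat-template handling are assumptions for illustration, not part of this commit.

```python
# Sketch only: completes the truncated snippet from the diff above.
# pipe_kwargs, model_kwargs, and the chat-template handling are assumptions,
# not part of this commit.
import torch
from transformers import AutoTokenizer, pipeline

rm_tokenizer = AutoTokenizer.from_pretrained("weqweasdas/RM-Gemma-7B-4096")
device = 0  # accelerator.device
rm_pipe = pipeline(
    "sentiment-analysis",
    model="weqweasdas/RM-Gemma-7B-4096",
    device=device,
    tokenizer=rm_tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

pipe_kwargs = {
    "return_all_scores": True,    # return scores for all labels, not just the argmax label
    "function_to_apply": "none",  # keep the raw logit as the scalar reward (no sigmoid/softmax)
    "batch_size": 1,
}

# The reward model scores a whole conversation; Gemma's chat template turns the
# (prompt, response) pair into a single string. The leading <bos> is stripped so
# the pipeline's tokenizer does not add it twice.
chat = [
    {"role": "user", "content": "Explain what a reward model is in one sentence."},
    {"role": "assistant", "content": "A reward model assigns a scalar score to a response given its prompt."},
]
test_text = rm_tokenizer.apply_chat_template(
    chat, tokenize=False, add_generation_prompt=False
).replace(rm_tokenizer.bos_token, "")

pipe_outputs = rm_pipe([test_text], **pipe_kwargs)
reward = pipe_outputs[0][0]["score"]
print(reward)
```

With `function_to_apply="none"`, the pipeline returns the classifier logit itself as the reward rather than passing it through a sigmoid, which is the usual choice when rewards are only compared or ranked downstream.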