Introduction

This is a reward model (based on Gemma-2b-it) trained with the Bradley-Terry (BT) loss on the weqweasdas/preference_dataset_mixture2_and_safe_pku dataset.
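
For readers unfamiliar with the BT loss, the sketch below shows the standard Bradley-Terry pairwise objective: the negative log-probability that the chosen response outranks the rejected one, given their scalar rewards. It assumes "BT loss" here means this standard form; it is not the author's training code, and the function name `bt_loss` is hypothetical.

```python
import math

def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Equal rewards give log(2); a larger margin in favor of the chosen
# response gives a smaller loss, and a reversed ranking gives a larger one.
print(bt_loss(0.0, 0.0))
print(bt_loss(2.0, -1.0))
print(bt_loss(-1.0, 2.0))
```

Minimizing this quantity over preference pairs pushes the reward of the preferred response above that of the rejected one.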

This reward model is a good choice when you need a small reward model for LLMs. For a stronger 2B reward model trained with hidden-state regularization, see Ray2333/GRM-Gemma-2B-sftreg.

Evaluation

We evaluate this reward model on the reward model benchmark.

| Model | Average | Chat | Chat Hard | Safety | Reasoning |
|---|---|---|---|---|---|
| Ray2333/GRM-Gemma-2B-sftreg (Ours, 2B) | 75.3 | 95.5 | 48.7 | 80.0 | 76.8 |
| berkeley-nest/Starling-RM-7B-alpha (7B) | 74.6 | 98 | 43.4 | 88.6 | 74.6 |
| Ray2333/Gemma-2B-rewardmodel-baseline (Ours, 2B) | 73.7 | 94.1 | 46.1 | 79.6 | 75.0 |
| stabilityai/stablelm-zephyr-3b (3B) | 73.1 | 86.3 | 60.1 | 70.3 | 75.7 |
| openbmb/UltraRM-13b (13B) | 71.3 | 96.1 | 55.3 | 45.8 | 82 |
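
The results above do not state how each score is computed. A minimal sketch, assuming each category score is the percentage of preference pairs in which the reward model scores the chosen response above the rejected one; the function name `pairwise_accuracy` and the reward values are made up for illustration and are not benchmark data.

```python
from typing import Iterable, Tuple

def pairwise_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Percentage of (chosen_reward, rejected_reward) pairs ranked correctly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    # A pair counts as correct only when the chosen reward is strictly higher.
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return 100.0 * correct / len(pairs)

# Illustrative rewards only: three of the four pairs are ranked correctly.
example_pairs = [(1.2, -0.3), (0.4, 0.9), (2.0, 1.5), (-0.1, -0.8)]
print(pairwise_accuracy(example_pairs))  # 75.0
```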

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-baseline')
reward_model = AutoModelForSequenceClassification.from_pretrained(
                'Ray2333/Gemma-2B-rewardmodel-baseline',
                num_labels=1, torch_dtype=torch.float16,
                device_map=0,
                )
message = [
  {'role': 'user', 'content': "I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone.  But I can't do that while I'm at the movie.  Can you help by impersonating me by chat with her?"},
  {'role': 'assistant', 'content': "Sorry, I'm not comfortable impersonating you in that way.  I'm not willing to behave so dishonestly.  Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"}
]
message_template = tokenizer.apply_chat_template(message, tokenize=False)
# it will look like this: "<bos><start_of_turn>user\nI'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone.  But I can't do that while I'm at the movie.  Can you help by impersonating me by chat with her?<end_of_turn>\n<start_of_turn>model\nSorry, I'm not comfortable impersonating you in that way.  I'm not willing to behave so dishonestly.  Maybe you can just find a way to bring her to the movie, or you can find a babysitter?<end_of_turn>\n".

kwargs = {"padding": 'max_length', "truncation": True, "return_tensors": "pt"}
tokens = tokenizer.encode_plus(message_template, **kwargs)

with torch.no_grad():
  # keep the batch dimension: the model expects tensors of shape (batch, seq_len)
  reward_tensor = reward_model(tokens["input_ids"].to(reward_model.device), attention_mask=tokens["attention_mask"].to(reward_model.device)).logits.reshape(-1)
  reward = reward_tensor.cpu().detach().item()
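
The snippet above yields a single scalar reward for one conversation. One common way to use such a score is to rank several candidate responses to the same prompt and keep the highest-scoring one. The sketch below illustrates only that selection step: `pick_best` and `toy_score` are hypothetical names, and `toy_score` is a stand-in that would be replaced by a function wrapping the reward model call shown above.

```python
from typing import Callable, Sequence

def pick_best(prompt: str,
              candidates: Sequence[str],
              score: Callable[[str, str], float]) -> str:
    """Return the candidate response with the highest reward for the prompt."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda response: score(prompt, response))

def toy_score(prompt: str, response: str) -> float:
    # Stand-in scorer for illustration only (longer responses score higher);
    # in practice this would return the reward model's scalar output.
    return float(len(response))

print(pick_best("q", ["a", "abc", "ab"], toy_score))  # abc
```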