Update README.md

f110c24 verified 13 days ago

6.84 kB

	---
	license: other
	language:
	- en
	- zh
	tags:
	- reward model
	pipeline_tag: visual-question-answering
	---

	<p align="center">
	<img src="logo_en.png" width="600"/>
	<p>

	<p align="center">
	<b><font size="6">InternLM-XComposer-2.5-Reward</font></b>
	<p>

	<div align="center">

	[💻Github Repo](https://github.com/InternLM/InternLM-XComposer)

	[Paper](https://huggingface.co./papers/2501.12368)

	</div>


	### Introduction

	InternLM-XComposer2.5-Reward is a multi-modal reward model trained on the foundation of [internlm/internlm-xcomposer2d5-7b](https://huggingface.co./internlm/internlm-xcomposer2d5-7b). This model has been trained using preference samples across text, image and video domains, and assigning appropriate reward scores that align with human preferences.

	## Performance Evaluation

	- Result on [VLRewardBench](https://arxiv.org/abs/2411.17451)
	\| Models \| General \| Hallucination \| Reasoning \| Overall \| Macro \|
	\| --- \| --- \| --- \| --- \| --- \| --- \|
	\| InternLM-XComposer2.5-7B-Reward \| 84.7 \| 62.5 \| 62.9 \| 65.8 \| 70.0 \|

	- Result on [RewardBench](https://arxiv.org/abs/2403.13787)
	\| Models \| Score \| Chat \| Chat Hard \| Safety \| Reasoning \|
	\| --- \| --- \| --- \| --- \| --- \| --- \|
	\| InternLM-XComposer2.5-7B-Reward \| 88.6 \| 90.8 \| 83.8 \| 87.8 \| 90.0 \|

	- Result on [RM-Bench](https://arxiv.org/abs/2410.16184)
	\| Models \| Chat \| Math \| Code \| Safety \| Easy \| Normal \| Hard \| Average \|
	\| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \|
	\| InternLM-XComposer2.5-7B-Reward \| 65.5 \| 55.9 \| 51.7 \| 93.8 \| 87.5 \| 71.3 \| 47.4 \| 68.8 \|

	### Basic Usage

	Here is an example of how to use the model to get the reward score of a chat, compare two chats, or rank multiple chats.

	```python
	import torch
	from transformers import AutoModel, AutoTokenizer

	model = AutoModel.from_pretrained(
	"internlm/internlm-xcomposer2d5-7b-reward",
	device_map="cuda",
	torch_dtype=torch.float16,
	trust_remote_code=True,
	)
	tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-xcomposer2d5-7b-reward", trust_remote_code=True)
	model.tokenizer = tokenizer

	chat_1 = [
	{"role": "user", "content": 'I want to buy a car from the input image, analyze the advantages and weaknesses.'},
	{"role": "assistant", "content": "The car in the image is a Mercedes-Benz G-Class. This is a luxury SUV known for its ruggedness, off-road capability, and iconic boxy design. Here's an analysis of its advantages and weaknesses: Advantages: Off-road capability: The G-Class is built on a robust ladder frame chassis and has a powerful engine, making it highly capable in off-road conditions. It features a low-range transfer case, locking differentials, and a high ground clearance. Luxury and comfort: Despite its rugged exterior, the G-Class offers a luxurious interior with high-quality materials, comfortable seats, and advanced technology features. Iconic design: The distinctive boxy shape of the G-Class has made it a cult classic. It's instantly recognizable and has a timeless appeal. Strong resale value: Due to its exclusivity and desirability, the G-Class tends to hold its value well over time. Weaknesses: Fuel efficiency: The G-Class is not known for its fuel efficiency, especially with the larger engine options. Ride quality: The firm suspension is designed for off-road performance, which can make the ride a bit bumpy on paved roads. Price: The G-Class is a very expensive vehicle, even for a luxury SUV. Size and maneuverability: The large size of the G-Class can make it challenging to park and maneuver in tight spaces."}
	]
	chat_2 = [
	{"role": "user", "content": 'I want to buy a car from the input image, analyze the advantages and weaknesses.'},
	{"role": "assistant", "content": "Based on the image, it appears to be a Ferrari F8 Tributo. Advantages: Performance: Ferraris are renowned for their exceptional performance. The F8 Tributo is no exception, boasting a powerful twin-turbocharged V8 engine that delivers exhilarating acceleration and top speed. Handling: Ferraris are known for their precise handling and agility. The F8 Tributo continues this tradition, with advanced aerodynamics and sophisticated suspension systems ensuring a thrilling driving experience. Design: Ferrari is synonymous with iconic and head-turning designs. The F8 Tributo is a testament to this, featuring sleek lines, aggressive styling, and a captivating presence. Exclusivity: Owning a Ferrari is a status symbol. It represents luxury, exclusivity, and a passion for high-performance automobiles. Weaknesses: Price: Ferraris come with a hefty price tag. The F8 Tributo is no exception, making it an investment for those with significant financial resources. Fuel Efficiency: High-performance sports cars like the F8 Tributo are not known for their fuel efficiency. You can expect lower miles per gallon compared to everyday vehicles. Maintenance Costs: Owning a Ferrari comes with associated maintenance costs, which can be higher than those for regular cars. Practicality: The F8 Tributo is primarily a two-seater sports car, making it less practical for everyday use or carrying passengers. Ride Comfort: While the F8 Tributo offers a thrilling driving experience, its stiff suspension might not be ideal for long-distance comfort."}
	]
	image = ['./examples/cars1.jpg']
	hd_num = 9


	# get reward score for a single chat
	with torch.autocast(device_type='cuda', dtype=torch.float16):
	score1 = model.get_score(chat_1, image, hd_num=hd_num)
	score2 = model.get_score(chat_2, image, hd_num=hd_num)
	print("score1: ", score1)
	print("score2: ", score2)
	# >>> score1: 5.76
	# >>> score2: -2.84375


	# batch inference, get multiple scores at once
	with torch.autocast(device_type='cuda', dtype=torch.float16):
	scores = model.get_scores([chat_1, chat_2], [image, image], hd_num=hd_num)
	print("scores: ", scores)
	# >>> scores: [5.76171875, -2.845703125]


	# compare whether chat_1 is better than chat_2
	with torch.autocast(device_type='cuda', dtype=torch.float16):
	compare_res = model.compare(chat_1, image, chat_2, image, hd_num=hd_num)
	print("compare_res: ", compare_res)
	# >>> compare_res: True


	# rank multiple chats, it will return the ranking index of each chat
	# the chat with the highest score will have ranking index as 0
	with torch.autocast(device_type='cuda', dtype=torch.float16):
	rank_res = model.rank([chat_1, chat_2], [image, image], hd_num=hd_num)
	print("rank_res: ", rank_res) # lower index means higher score
	# >>> rank_res: [0, 1]
	```

	### Open Source License
	The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表（中文）. For other questions or collaborations, please contact [email protected].