myownskyW7
commited on
Update README.md
Browse files
README.md
CHANGED
@@ -1,5 +1,114 @@
|
|
1 |
-
---
|
2 |
-
license: other
|
3 |
-
|
4 |
-
|
5 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
license: other
|
3 |
+
pipeline_tag: image-text-to-text
|
4 |
+
language:
|
5 |
+
- en
|
6 |
+
- zh
|
7 |
+
tags:
|
8 |
+
- reward model
|
9 |
+
---
|
10 |
+
|
11 |
+
<p align="center">
|
12 |
+
<img src="logo_en.png" width="600"/>
|
13 |
+
<p>
|
14 |
+
|
15 |
+
<p align="center">
|
16 |
+
<b><font size="6">InternLM-XComposer-2.5-Reward</font></b>
|
17 |
+
<p>
|
18 |
+
|
19 |
+
<div align="center">
|
20 |
+
|
21 |
+
[💻Github Repo](https://github.com/InternLM/InternLM-XComposer)
|
22 |
+
|
23 |
+
[Paper](https://huggingface.co/papers/2501.xxxxx)
|
24 |
+
|
25 |
+
</div>
|
26 |
+
|
27 |
+
|
28 |
+
### Introduction
|
29 |
+
|
30 |
+
**InternLM-XComposer2.5-Reward** is a multi-modal reward model trained on the foundation of [internlm/internlm-xcomposer2d5-7b](https://huggingface.co/internlm/internlm-xcomposer2d5-7b). This model has been trained using preference samples across text, image and video domains, and assigning appropriate reward scores that align with human preferences.
|
31 |
+
|
32 |
+
## Performance Evaluation
|
33 |
+
|
34 |
+
- Result on [VLRewardBench](https://arxiv.org/abs/2411.17451)
|
35 |
+
| Models | General | Hallucination | Reasoning | Overall | Macro |
|
36 |
+
| --- | --- | --- | --- | --- | --- |
|
37 |
+
| InternLM-XComposer2.5-7B-Reward | 84.7 | 62.5 | 62.9 | 65.8 | 70.0 |
|
38 |
+
|
39 |
+
- Result on [RewardBench](https://arxiv.org/abs/2403.13787)
|
40 |
+
| Models | Score | Chat | Chat Hard | Safety | Reasoning |
|
41 |
+
| --- | --- | --- | --- | --- | --- |
|
42 |
+
| InternLM-XComposer2.5-7B-Reward | 88.6 | 90.8 | 83.8 | 87.8 | 90.0 |
|
43 |
+
|
44 |
+
- Result on [RM-Bench](https://arxiv.org/abs/2410.16184)
|
45 |
+
| Models | Chat | Math | Code | Safety | Easy | Normal | Hard | Average |
|
46 |
+
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|
47 |
+
| InternLM-XComposer2.5-7B-Reward | 65.5 | 55.9 | 51.7 | 93.8 | 87.5 | 71.3 | 47.4 | 68.8 |
|
48 |
+
|
49 |
+
### Basic Usage
|
50 |
+
|
51 |
+
Here is an example of how to use the model to get the reward score of a chat, compare two chats, or rank multiple chats.
|
52 |
+
|
53 |
+
```python
|
54 |
+
import torch
|
55 |
+
from transformers import AutoModel, AutoTokenizer
|
56 |
+
|
57 |
+
model = AutoModel.from_pretrained(
|
58 |
+
"internlm/internlm-xcomposer2d5-7b-reward",
|
59 |
+
device_map="cuda",
|
60 |
+
torch_dtype=torch.float16,
|
61 |
+
trust_remote_code=True,
|
62 |
+
)
|
63 |
+
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-xcomposer2d5-7b-reward", trust_remote_code=True)
|
64 |
+
model.tokenizer = tokenizer
|
65 |
+
|
66 |
+
chat_1 = [
|
67 |
+
{"role": "user", "content": 'I want to buy a car from the input image, analyze the advantages and weaknesses.'},
|
68 |
+
{"role": "assistant", "content": "The car in the image is a Mercedes-Benz G-Class. This is a luxury SUV known for its ruggedness, off-road capability, and iconic boxy design. Here's an analysis of its advantages and weaknesses: Advantages: Off-road capability: The G-Class is built on a robust ladder frame chassis and has a powerful engine, making it highly capable in off-road conditions. It features a low-range transfer case, locking differentials, and a high ground clearance. Luxury and comfort: Despite its rugged exterior, the G-Class offers a luxurious interior with high-quality materials, comfortable seats, and advanced technology features. Iconic design: The distinctive boxy shape of the G-Class has made it a cult classic. It's instantly recognizable and has a timeless appeal. Strong resale value: Due to its exclusivity and desirability, the G-Class tends to hold its value well over time. Weaknesses: Fuel efficiency: The G-Class is not known for its fuel efficiency, especially with the larger engine options. Ride quality: The firm suspension is designed for off-road performance, which can make the ride a bit bumpy on paved roads. Price: The G-Class is a very expensive vehicle, even for a luxury SUV. Size and maneuverability: The large size of the G-Class can make it challenging to park and maneuver in tight spaces."}
|
69 |
+
]
|
70 |
+
|
71 |
+
chat_2 = [
|
72 |
+
{"role": "user", "content": 'I want to buy a car from the input image, analyze the advantages and weaknesses.'},
|
73 |
+
{"role": "assistant", "content": "Based on the image, it appears to be a Ferrari F8 Tributo. Advantages: Performance: Ferraris are renowned for their exceptional performance. The F8 Tributo is no exception, boasting a powerful twin-turbocharged V8 engine that delivers exhilarating acceleration and top speed. Handling: Ferraris are known for their precise handling and agility. The F8 Tributo continues this tradition, with advanced aerodynamics and sophisticated suspension systems ensuring a thrilling driving experience. Design: Ferrari is synonymous with iconic and head-turning designs. The F8 Tributo is a testament to this, featuring sleek lines, aggressive styling, and a captivating presence. Exclusivity: Owning a Ferrari is a status symbol. It represents luxury, exclusivity, and a passion for high-performance automobiles. Weaknesses: Price: Ferraris come with a hefty price tag. The F8 Tributo is no exception, making it an investment for those with significant financial resources. Fuel Efficiency: High-performance sports cars like the F8 Tributo are not known for their fuel efficiency. You can expect lower miles per gallon compared to everyday vehicles. Maintenance Costs: Owning a Ferrari comes with associated maintenance costs, which can be higher than those for regular cars. Practicality: The F8 Tributo is primarily a two-seater sports car, making it less practical for everyday use or carrying passengers. Ride Comfort: While the F8 Tributo offers a thrilling driving experience, its stiff suspension might not be ideal for long-distance comfort."}
|
74 |
+
]
|
75 |
+
|
76 |
+
image = ['./examples/cars1.jpg']
|
77 |
+
|
78 |
+
hd_num = 9
|
79 |
+
|
80 |
+
|
81 |
+
# get reward score for a single chat
|
82 |
+
with torch.autocast(device_type='cuda', dtype=torch.float16):
|
83 |
+
score1 = model.get_score(chat_1, image, hd_num=hd_num)
|
84 |
+
score2 = model.get_score(chat_2, image, hd_num=hd_num)
|
85 |
+
print("score1: ", score1)
|
86 |
+
print("score2: ", score2)
|
87 |
+
# >>> score1: 5.76
|
88 |
+
# >>> score2: -2.84375
|
89 |
+
|
90 |
+
|
91 |
+
# batch inference, get multiple scores at once
|
92 |
+
with torch.autocast(device_type='cuda', dtype=torch.float16):
|
93 |
+
scores = model.get_scores([chat_1, chat_2], [image, image], hd_num=hd_num)
|
94 |
+
print("scores: ", scores)
|
95 |
+
# >>> scores: [5.76171875, -2.845703125]
|
96 |
+
|
97 |
+
|
98 |
+
# compare whether chat_1 is better than chat_2
|
99 |
+
with torch.autocast(device_type='cuda', dtype=torch.float16):
|
100 |
+
compare_res = model.compare(chat_1, image, chat_2, image, hd_num=hd_num)
|
101 |
+
print("compare_res: ", compare_res)
|
102 |
+
# >>> compare_res: True
|
103 |
+
|
104 |
+
|
105 |
+
# rank multiple chats, it will return the ranking index of each chat
|
106 |
+
# the chat with the highest score will have ranking index as 0
|
107 |
+
with torch.autocast(device_type='cuda', dtype=torch.float16):
|
108 |
+
rank_res = model.rank([chat_1, chat_2], [image, image], hd_num=hd_num)
|
109 |
+
print("rank_res: ", rank_res) # lower index means higher score
|
110 |
+
# >>> rank_res: [0, 1]
|
111 |
+
```
|
112 |
+
|
113 |
+
### Open Source License
|
114 |
+
The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表(中文). For other questions or collaborations, please contact [email protected].
|