Add link to paper, link to code
#2 by nielsr (HF staff) - opened
README.md
CHANGED
@@ -1,197 +1,141 @@
|
|
1 |
---
|
2 |
-
|
3 |
language:
|
4 |
- en
|
5 |
-
|
6 |
-
|
7 |
-
-
|
8 |
-
-
|
9 |
-
|
10 |
---
|
11 |
|
12 |
|
13 |
-
|
14 |
-
[UI-TARS-2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT) |
|
15 |
-
[UI-TARS-7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT) |
|
16 |
-
[**UI-TARS-7B-DPO**](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)(Recommended) |
|
17 |
-
[UI-TARS-72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT) |
|
18 |
-
[**UI-TARS-72B-DPO**](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)(Recommended)
|
19 |
-
## Introduction
|
20 |
|
21 |
-
|
22 |
-
|
23 |
-
|
24 |
-
<img src="https://github.com/bytedance/UI-TARS/blob/main/figures/UI-TARS-vs-Previous-SOTA.png?raw=true" width="90%"/>
|
25 |
-
<p>
|
26 |
-
<p align="center">
|
27 |
-
<img src="https://github.com/bytedance/UI-TARS/blob/main/figures/UI-TARS.png?raw=true" width="90%"/>
|
28 |
-
<p>
|
29 |
|
30 |
-
|
31 |
|
32 |
-
|
33 |
|
34 |
-
|
35 |
|
36 |
-
##
|
37 |
-
**
|
38 |
-
|
39 |
-
|---------------------------|---------------|---------|----------|
|
40 |
-
| Qwen2-VL-7B | 73.3 | 81.8 | 84.9 |
|
41 |
-
| Qwen-VL-Max | 74.1 | 91.1 | 78.6 |
|
42 |
-
| Gemini-1.5-Pro | 75.4 | 88.9 | 82.2 |
|
43 |
-
| UIX-Qwen2-7B | 75.9 | 82.9 | 78.8 |
|
44 |
-
| Claude-3.5-Sonnet | 78.2 | 90.4 | 83.1 |
|
45 |
-
| GPT-4o | 78.5 | 87.7 | 82.3 |
|
46 |
-
| **UI-TARS-2B** | 72.9 | 89.2 | 86.4 |
|
47 |
-
| **UI-TARS-7B** | 79.7 | **93.6** | 87.7 |
|
48 |
-
| **UI-TARS-72B** | **82.8** | 89.3 | **88.6** |
|
49 |
|
50 |
-
**Grounding Capability Evaluation**
|
51 |
-
- **ScreenSpot Pro**
|
52 |
|
53 |
-
|
54 |
-
|
55 |
-
|
56 |
-
|
57 |
-
|
58 |
-
|
59 |
-
|
60 |
-
|
61 |
-
|
62 |
-
|
63 |
-
|
64 |
-
| Claude Computer Use | 22.0 | 3.9 | 12.6 | 25.9 | 3.4 | 16.8 | 14.5 | 3.7 | 11.9 | 33.9 | 15.8 | 25.8 | 30.1 | 16.3 | 26.9 | 11.0 | 4.5 | 8.1 | 23.4 | 7.1 | **17.1** |
|
65 |
-
| OS-Atlas-7B | 33.1 | 1.4 | 17.7 | 28.8 | 2.8 | 17.9 | 12.2 | 4.7 | 10.3 | 37.5 | 7.3 | 24.4 | 33.9 | 5.7 | 27.4 | 27.1 | 4.5 | 16.8 | 28.1 | 4.0 | **18.9** |
|
66 |
-
| UGround-V1-7B | - | - | 35.5 | - | - | 27.8 | - | - | 13.5 | - | - | 38.8 | - | - | 48.8 | - | - | 26.1 | - | - | **31.1** |
|
67 |
-
| **UI-TARS-2B** | 47.4 | 4.1 | 26.4 | 42.9 | 6.3 | 27.6 | 17.8 | 4.7 | 14.6 | 56.9 | 17.3 | 39.8 | 50.3 | 17.0 | 42.6 | 21.5 | 5.6 | 14.3 | 39.6 | 8.4 | **27.7** |
|
68 |
-
| **UI-TARS-7B** | 58.4 | 12.4 | 36.1 | 50.0 | 9.1 | 32.8 | **20.8**| 9.4 | **18.0**| 63.9 | **31.8** | **50.0** | **63.3** | 20.8 | 53.5 | 30.8 | **16.9**| 24.5 | 47.8 | 16.2 | **35.7** |
|
69 |
-
| **UI-TARS-72B** | **63.0** | **17.3** | **40.8** | **57.1** | **15.4** | **39.6** | 18.8 | **12.5**| 17.2 | **64.6** | 20.9 | 45.7 | **63.3** | **26.4** | **54.8** | **42.1**| 15.7 | **30.1**| **50.9**| **17.5**| **38.1** |
|
70 |
-
|
71 |
-
|
72 |
-
- **ScreenSpot**
|
73 |
-
|
74 |
-
| Method | Mobile-Text | Mobile-Icon/Widget | Desktop-Text | Desktop-Icon/Widget | Web-Text | Web-Icon/Widget | Avg |
|
75 |
-
|--------|-------------|-------------|-------------|-------------|-------------|---------|---------|
|
76 |
-
| **Agent Framework** | | | | | | | |
|
77 |
-
| GPT-4 (SeeClick) | 76.6 | 55.5 | 68.0 | 28.6 | 40.9 | 23.3 | **48.8** |
|
78 |
-
| GPT-4 (OmniParser) | 93.9 | 57.0 | 91.3 | 63.6 | 81.3 | 51.0 | **73.0** |
|
79 |
-
| GPT-4 (UGround-7B) | 90.1 | 70.3 | 87.1 | 55.7 | 85.7 | 64.6 | **75.6** |
|
80 |
-
| GPT-4o (SeeClick) | 81.0 | 59.8 | 69.6 | 33.6 | 43.9 | 26.2 | **52.3** |
|
81 |
-
| GPT-4o (UGround-7B) | 93.4 | 76.9 | 92.8 | 67.9 | 88.7 | 68.9 | **81.4** |
|
82 |
-
| **Agent Model** | | | | | | | |
|
83 |
-
| GPT-4 | 22.6 | 24.5 | 20.2 | 11.8 | 9.2 | 8.8 | **16.2** |
|
84 |
-
| GPT-4o | 20.2 | 24.9 | 21.1 | 23.6 | 12.2 | 7.8 | **18.3** |
|
85 |
-
| CogAgent | 67.0 | 24.0 | 74.2 | 20.0 | 70.4 | 28.6 | **47.4** |
|
86 |
-
| SeeClick | 78.0 | 52.0 | 72.2 | 30.0 | 55.7 | 32.5 | **53.4** |
|
87 |
-
| Qwen2-VL | 75.5 | 60.7 | 76.3 | 54.3 | 35.2 | 25.7 | **55.3** |
|
88 |
-
| UGround-7B | 82.8 | 60.3 | 82.5 | 63.6 | 80.4 | 70.4 | **73.3** |
|
89 |
-
| Aguvis-G-7B | 88.3 | 78.2 | 88.1 | 70.7 | 85.7 | 74.8 | **81.8** |
|
90 |
-
| OS-Atlas-7B | 93.0 | 72.9 | 91.8 | 62.9 | 90.9 | 74.3 | **82.5** |
|
91 |
-
| Claude Computer Use | - | - | - | - | - | - | **83.0** |
|
92 |
-
| Gemini 2.0 (Project Mariner) | - | - | - | - | - | - | **84.0** |
|
93 |
-
| Aguvis-7B | **95.6** | 77.7 | 93.8 | 67.1 | 88.3 | 75.2 | **84.4** |
|
94 |
-
| Aguvis-72B | 94.5 | **85.2** | 95.4 | 77.9 | **91.3** | **85.9** | **89.2** |
|
95 |
-
| **Our Model** | | | | | | | |
|
96 |
-
| **UI-TARS-2B** | 93.0 | 75.5 | 90.7 | 68.6 | 84.3 | 74.8 | **82.3** |
|
97 |
-
| **UI-TARS-7B** | 94.5 | **85.2** | **95.9** | 85.7 | 90.0 | 83.5 | **89.5** |
|
98 |
-
| **UI-TARS-72B** | 94.9 | 82.5 | 89.7 | **88.6** | 88.7 | 85.0 | **88.4** |
|
99 |
-
|
100 |
-
|
101 |
-
- **ScreenSpot v2**
|
102 |
-
|
103 |
-
| Method | Mobile-Text | Mobile-Icon/Widget | Desktop-Text | Desktop-Icon/Widget | Web-Text | Web-Icon/Widget | Avg |
|
104 |
-
|--------|-------------|-------------|-------------|-------------|-------------|---------|---------|
|
105 |
-
| **Agent Framework** | | | | | | | |
|
106 |
-
| GPT-4o (SeeClick) | 85.2 | 58.8 | 79.9 | 37.1 | 72.7 | 30.1 | **63.6** |
|
107 |
-
| GPT-4o (OS-Atlas-4B) | 95.5 | 75.8 | 79.4 | 49.3 | 90.2 | 66.5 | **79.1** |
|
108 |
-
| GPT-4o (OS-Atlas-7B) | 96.2 | 83.4 | 89.7 | 69.3 | **94.0** | 79.8 | **87.1** |
|
109 |
-
| **Agent Model** | | | | | | | |
|
110 |
-
| SeeClick | 78.4 | 50.7 | 70.1 | 29.3 | 55.2 | 32.5 | **55.1** |
|
111 |
-
| OS-Atlas-4B | 87.2 | 59.7 | 72.7 | 46.4 | 85.9 | 63.1 | **71.9** |
|
112 |
-
| OS-Atlas-7B | 95.2 | 75.8 | 90.7 | 63.6 | 90.6 | 77.3 | **84.1** |
|
113 |
-
| **Our Model** | | | | | | | |
|
114 |
-
| **UI-TARS-2B** | 95.2 | 79.1 | 90.7 | 68.6 | 87.2 | 78.3 | **84.7** |
|
115 |
-
| **UI-TARS-7B** | **96.9** | **89.1** | **95.4** | 85.0 | 93.6 | 85.2 | **91.6** |
|
116 |
-
| **UI-TARS-72B** | 94.8 | 86.3 | 91.2 | **87.9** | 91.5 | **87.7** | **90.3** |
|
117 |
-
|
118 |
-
|
119 |
-
**Offline Agent Capability Evaluation**
|
120 |
-
- **Multimodal Mind2Web**
|
121 |
-
|
122 |
-
| Method | Cross-Task Ele.Acc | Cross-Task Op.F1 | Cross-Task Step SR | Cross-Website Ele.Acc | Cross-Website Op.F1 | Cross-Website Step SR | Cross-Domain Ele.Acc | Cross-Domain Op.F1 | Cross-Domain Step SR |
|
123 |
-
|--------|----------------------|-------------------|--------------------|----------------------|--------------------|-------------------|--------------------|-------------------|-------------------|
|
124 |
-
| **Agent Framework** | | | | | | | | | |
|
125 |
-
| GPT-4o (SeeClick) | 32.1 | - | - | 33.1 | - | - | 33.5 | - | - |
|
126 |
-
| GPT-4o (UGround) | 47.7 | - | - | 46.0 | - | - | 46.6 | - | - |
|
127 |
-
| GPT-4o (Aria-UI) | 57.6 | - | - | 57.7 | - | - | 61.4 | - | - |
|
128 |
-
| GPT-4V (OmniParser) | 42.4 | 87.6 | 39.4 | 41.0 | 84.8 | 36.5 | 45.5 | 85.7 | 42.0 |
|
129 |
-
| **Agent Model** | | | | | | | | | |
|
130 |
-
| GPT-4o | 5.7 | 77.2 | 4.3 | 5.7 | 79.0 | 3.9 | 5.5 | 86.4 | 4.5 |
|
131 |
-
| GPT-4 (SOM) | 29.6 | - | 20.3 | 20.1 | - | 13.9 | 27.0 | - | 23.7 |
|
132 |
-
| GPT-3.5 (Text-only) | 19.4 | 59.2 | 16.8 | 14.9 | 56.5 | 14.1 | 25.2 | 57.9 | 24.1 |
|
133 |
-
| GPT-4 (Text-only) | 40.8 | 63.1 | 32.3 | 30.2 | 61.0 | 27.0 | 35.4 | 61.9 | 29.7 |
|
134 |
-
| Claude | 62.7 | 84.7 | 53.5 | 59.5 | 79.6 | 47.7 | 64.5 | 85.4 | 56.4 |
|
135 |
-
| Aguvis-7B | 64.2 | 89.8 | 60.4 | 60.7 | 88.1 | 54.6 | 60.4 | 89.2 | 56.6 |
|
136 |
-
| CogAgent | - | - | 62.3 | - | - | 54.0 | - | - | 59.4 |
|
137 |
-
| Aguvis-72B | 69.5 | 90.8 | 64.0 | 62.6 | 88.6 | 56.5 | 63.5 | 88.5 | 58.2 |
|
138 |
-
| **Our Model** | | | | | | | | | |
|
139 |
-
| **UI-TARS-2B** | 62.3 | 90.0 | 56.3 | 58.5 | 87.2 | 50.8 | 58.8 | 89.6 | 52.3 |
|
140 |
-
| **UI-TARS-7B** | 73.1 | 92.2 | 67.1 | 68.2 | 90.9 | 61.7 | 66.6 | 90.9 | 60.5 |
|
141 |
-
| **UI-TARS-72B** | **74.7** | **92.5** | **68.6** | **72.4** | **91.2** | **63.5** | **68.9** | **91.8** | **62.1** |
|
142 |
-
|
143 |
-
|
144 |
-
- **Android Control and GUI Odyssey**
|
145 |
-
|
146 |
-
| Agent Models | AndroidControl-Low Type | AndroidControl-Low Grounding | AndroidControl-Low SR | AndroidControl-High Type | AndroidControl-High Grounding | AndroidControl-High SR | GUIOdyssey Type | GUIOdyssey Grounding | GUIOdyssey SR |
|
147 |
-
|---------------------|----------------------|----------------------|----------------|----------------------|----------------------|----------------|----------------|----------------|----------------|
|
148 |
-
| Claude | 74.3 | 0.0 | 19.4 | 63.7 | 0.0 | 12.5 | 60.9 | 0.0 | 3.1 |
|
149 |
-
| GPT-4o | 74.3 | 0.0 | 19.4 | 66.3 | 0.0 | 20.8 | 34.3 | 0.0 | 3.3 |
|
150 |
-
| SeeClick | 93.0 | 73.4 | 75.0 | 82.9 | 62.9 | 59.1 | 71.0 | 52.4 | 53.9 |
|
151 |
-
| InternVL-2-4B | 90.9 | 84.1 | 80.1 | 84.1 | 72.7 | 66.7 | 82.1 | 55.5 | 51.5 |
|
152 |
-
| Qwen2-VL-7B | 91.9 | 86.5 | 82.6 | 83.8 | 77.7 | 69.7 | 83.5 | 65.9 | 60.2 |
|
153 |
-
| Aria-UI | -- | 87.7 | 67.3 | -- | 43.2 | 10.2 | -- | 86.8 | 36.5 |
|
154 |
-
| OS-Atlas-4B | 91.9 | 83.8 | 80.6 | 84.7 | 73.8 | 67.5 | 83.5 | 61.4 | 56.4 |
|
155 |
-
| OS-Atlas-7B | 93.6 | 88.0 | 85.2 | 85.2 | 78.5 | 71.2 | 84.5 | 67.8 | 62.0 |
|
156 |
-
| Aguvis-7B | -- | -- | 80.5 | -- | -- | 61.5 | -- | -- | -- |
|
157 |
-
| Aguvis-72B | -- | -- | 84.4 | -- | -- | 66.4 | -- | -- | -- |
|
158 |
-
| **UI-TARS-2B** | **98.1** | 87.3 | 89.3 | 81.2 | 78.4 | 68.9 | 93.9 | 86.8 | 83.4 |
|
159 |
-
| **UI-TARS-7B** | 98.0 | 89.3 | 90.8 | 83.7 | 80.5 | 72.5 | 94.6 | 90.1 | 87.0 |
|
160 |
-
| **UI-TARS-72B** | **98.1** | **89.9** | **91.3** | **85.2** | **81.5** | **74.7** | **95.4** | **91.4** | **88.6** |
|
161 |
-
|
162 |
-
**Online Agent Capability Evaluation**
|
163 |
-
|
164 |
-
| Method | OSWorld (Online) | AndroidWorld (Online) |
|
165 |
-
|--------|-------------------|------------------|
|
166 |
-
| **Agent Framework** | | |
|
167 |
-
| GPT-4o (UGround) | - | 32.8 |
|
168 |
-
| GPT-4o (Aria-UI) | 15.2 | 44.8 |
|
169 |
-
| GPT-4o (Aguvis-7B) | 14.8 | 37.1 |
|
170 |
-
| GPT-4o (Aguvis-72B) | 17.0 | - |
|
171 |
-
| GPT-4o (OS-Atlas-7B) | 14.6 | - |
|
172 |
-
| **Agent Model** | | |
|
173 |
-
| GPT-4o | 5.0 | 34.5 (SoM) |
|
174 |
-
| Gemini-Pro-1.5 | 5.4 | 22.8 (SoM) |
|
175 |
-
| Aguvis-72B | 10.3 | 26.1 |
|
176 |
-
| Claude Computer-Use | 14.9 (15 steps) | 27.9 |
|
177 |
-
| Claude Computer-Use | 22.0 (50 steps) | - |
|
178 |
-
| **Our Model** | | |
|
179 |
-
| **UI-TARS-7B-SFT** | 17.7 (15 steps) | 33.0 |
|
180 |
-
| **UI-TARS-7B-DPO** | 18.7 (15 steps) | - |
|
181 |
-
| **UI-TARS-72B-SFT** | 18.8 (15 steps) | **46.6** |
|
182 |
-
| **UI-TARS-72B-DPO** | **22.7** (15 steps) | - |
|
183 |
-
| **UI-TARS-72B-DPO** | **24.6** (50 steps) | - |
|
184 |
-
|
185 |
-
|
186 |
-
## Citation
|
187 |
-
If you find our paper and model useful in your research, feel free to give us a cite.
|
188 |
-
|
189 |
-
```BibTeX
|
190 |
-
@article{uitars2025,
|
191 |
-
author = {Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, Wanjun Zhong, Kuanye Li, Jiale Yang, Yu Miao, Woyu Lin, Longxiang Liu, Xu Jiang, Qianli Ma, Jingyu Li, Xiaojun Xiao, Kai Cai, Chuang Li, Yaowei Zheng, Chaolin Jin, Chen Li, Xiao Zhou, Minchao Wang, Haoli Chen, Zhaojian Li, Haihua Yang, Haifeng Liu, Feng Lin, Tao Peng, Xin Liu, Guang Shi},
|
192 |
-
title = {UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
|
193 |
-
journal = {arXiv preprint arXiv:2501.12326},
|
194 |
-
url = {https://github.com/bytedance/UI-TARS},
|
195 |
-
year = {2025}
|
196 |
}
|
197 |
```
|
|
|
1 |
---
|
2 |
+
dataset_info:
|
3 |
+
features:
|
4 |
+
- name: hazard_category
|
5 |
+
dtype: string
|
6 |
+
- name: hazard_subcategory
|
7 |
+
dtype: string
|
8 |
+
- name: hazard_subsubcategory
|
9 |
+
dtype: string
|
10 |
+
- name: case_id
|
11 |
+
dtype: string
|
12 |
+
- name: case_text
|
13 |
+
dtype: string
|
14 |
+
- name: unsafe_image_id
|
15 |
+
dtype: string
|
16 |
+
- name: unsafe_image_description
|
17 |
+
dtype: string
|
18 |
+
- name: prompt_text
|
19 |
+
dtype: string
|
20 |
+
- name: prompt_type
|
21 |
+
dtype: string
|
22 |
+
- name: unsafe_image_url
|
23 |
+
dtype: string
|
24 |
+
- name: unsafe_image_license
|
25 |
+
dtype: string
|
26 |
+
- name: unsafe_image_cw
|
27 |
+
dtype: string
|
28 |
+
splits:
|
29 |
+
- name: german
|
30 |
+
num_bytes: 70718
|
31 |
+
num_examples: 200
|
32 |
+
- name: russian
|
33 |
+
num_bytes: 76499
|
34 |
+
num_examples: 200
|
35 |
+
- name: chinese
|
36 |
+
num_bytes: 70778
|
37 |
+
num_examples: 200
|
38 |
+
- name: hindi
|
39 |
+
num_bytes: 84054
|
40 |
+
num_examples: 200
|
41 |
+
- name: spanish
|
42 |
+
num_bytes: 70689
|
43 |
+
num_examples: 200
|
44 |
+
- name: italian
|
45 |
+
num_bytes: 69545
|
46 |
+
num_examples: 200
|
47 |
+
- name: french
|
48 |
+
num_bytes: 73103
|
49 |
+
num_examples: 200
|
50 |
+
- name: english
|
51 |
+
num_bytes: 139996
|
52 |
+
num_examples: 400
|
53 |
+
- name: korean
|
54 |
+
num_bytes: 73217
|
55 |
+
num_examples: 200
|
56 |
+
- name: arabic
|
57 |
+
num_bytes: 71779
|
58 |
+
num_examples: 200
|
59 |
+
- name: farsi
|
60 |
+
num_bytes: 75732
|
61 |
+
num_examples: 200
|
62 |
+
download_size: 351210
|
63 |
+
dataset_size: 876110
|
64 |
+
configs:
|
65 |
+
- config_name: default
|
66 |
+
data_files:
|
67 |
+
- split: german
|
68 |
+
path: data/german-*
|
69 |
+
- split: russian
|
70 |
+
path: data/russian-*
|
71 |
+
- split: chinese
|
72 |
+
path: data/chinese-*
|
73 |
+
- split: hindi
|
74 |
+
path: data/hindi-*
|
75 |
+
- split: spanish
|
76 |
+
path: data/spanish-*
|
77 |
+
- split: italian
|
78 |
+
path: data/italian-*
|
79 |
+
- split: french
|
80 |
+
path: data/french-*
|
81 |
+
- split: english
|
82 |
+
path: data/english-*
|
83 |
+
- split: korean
|
84 |
+
path: data/korean-*
|
85 |
+
- split: arabic
|
86 |
+
path: data/arabic-*
|
87 |
+
- split: farsi
|
88 |
+
path: data/farsi-*
|
89 |
+
license: cc-by-4.0
|
90 |
language:
|
91 |
+
- ar
|
92 |
+
- fr
|
93 |
- en
|
94 |
+
- de
|
95 |
+
- zh
|
96 |
+
- ko
|
97 |
+
- fa
|
98 |
+
- hi
|
99 |
+
- it
|
100 |
+
- ru
|
101 |
+
- es
|
102 |
+
size_categories:
|
103 |
+
- 1K<n<10K
|
104 |
+
task_categories:
|
105 |
+
- image-text-to-text
|
106 |
---
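As a quick consistency check on the `dataset_info` metadata above, the per-split figures can be summed and compared with the declared `dataset_size`. This is only a sketch: the numbers are copied by hand from the YAML block, and the variable names are illustrative rather than part of any library.

```python
# (num_bytes, num_examples) per split, copied from the `splits` metadata above.
SPLITS = {
    "german": (70718, 200),
    "russian": (76499, 200),
    "chinese": (70778, 200),
    "hindi": (84054, 200),
    "spanish": (70689, 200),
    "italian": (69545, 200),
    "french": (73103, 200),
    "english": (139996, 400),
    "korean": (73217, 200),
    "arabic": (71779, 200),
    "farsi": (75732, 200),
}

# `dataset_size` as declared in the metadata.
DECLARED_DATASET_SIZE = 876110

total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())
total_examples = sum(num_examples for _, num_examples in SPLITS.values())

print(total_bytes == DECLARED_DATASET_SIZE)  # True
print(total_examples)  # 2400
```

The total of 2400 examples also falls inside the declared `size_categories` bucket `1K<n<10K`.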
|
107 |
|
108 |
+
# Dataset Card for the MSTS Benchmark
|
109 |
|
110 |
+
Here you can find our [paper](https://huggingface.co/papers/2501.10057) and [code](https://github.com/paul-rottger/msts-multimodal-safety). To reproduce the exact results, see the GitHub repo, which provides download and preprocessing scripts for the images.
|
111 |
|
112 |
+
Example usage:
|
113 |
+
```python
|
114 |
+
from datasets import load_dataset
|
115 |
|
116 |
+
ds = load_dataset("felfri/MSTS")
|
117 |
|
118 |
+
# or select a specific language
|
119 |
+
lang = 'german'
|
120 |
+
ds = load_dataset("felfri/MSTS", split=lang)
|
121 |
|
122 |
+
```
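Once a single split is loaded, a first look at the data might be a count of rows per hazard category. The sketch below uses only the standard library and runs on toy rows, so it needs no download. The helper `count_by` is hypothetical (not part of `datasets`), and the toy values are made up; only the field names come from the metadata of this dataset card.

```python
from collections import Counter


def count_by(rows, field):
    """Count how many rows share each value of `field`.

    `rows` can be any iterable of dict-like records.
    """
    return Counter(row[field] for row in rows)


# Toy rows that mimic two columns of the schema; the values are placeholders.
toy_rows = [
    {"hazard_category": "category_a", "prompt_type": "type_1"},
    {"hazard_category": "category_a", "prompt_type": "type_2"},
    {"hazard_category": "category_b", "prompt_type": "type_1"},
]

counts = count_by(toy_rows, "hazard_category")
for category, n in counts.most_common():
    print(category, n)
```

With the real data, the same call should work on one split, for example `count_by(ds, "hazard_category")` where `ds` was loaded with `split=lang`, since iterating over a split yields one dict per row. It will not work on the object returned without `split`, because that is a mapping from split names to splits.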
|
123 |
|
124 |
+
## Disclaimer
|
125 |
+
The MSTS dataset **contains content that may be offensive or upsetting in nature**. Topics include, but are not limited to, **discriminatory language and discussions of abuse, violence, self-harm, exploitation, and other potentially upsetting subject matter**.
|
126 |
+
Please only engage with the data in accordance with your own personal risk tolerance. The data are intended for research purposes, especially research that can make models less harmful.
|
127 |
|
128 |
|
129 |
+
## Citation Information
|
130 |
+
Please consider citing our work if you use data and/or code from this repository.
|
131 |
+
```bibtex
|
132 |
+
@misc{röttger2025mstsmultimodalsafetytest,
|
133 |
+
title={MSTS: A Multimodal Safety Test Suite for Vision-Language Models},
|
134 |
+
author={Paul Röttger and Giuseppe Attanasio and Felix Friedrich and Janis Goldzycher and Alicia Parrish and Rishabh Bhardwaj and Chiara Di Bonaventura and Roman Eng and Gaia El Khoury Geagea and Sujata Goswami and Jieun Han and Dirk Hovy and Seogyeong Jeong and Paloma Jeretič and Flor Miriam Plaza-del-Arco and Donya Rooein and Patrick Schramowski and Anastassia Shaitarova and Xudong Shen and Richard Willats and Andrea Zugarini and Bertie Vidgen},
|
135 |
+
year={2025},
|
136 |
+
eprint={2501.10057},
|
137 |
+
archivePrefix={arXiv},
|
138 |
+
primaryClass={cs.CL},
|
139 |
+
url={https://arxiv.org/abs/2501.10057},
|
140 |
}
|
141 |
```
|