MSTS: A Multimodal Safety Test Suite for Vision-Language Models
Abstract
Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created by multimodal inputs. To address this gap, we introduce MSTS, a Multimodal Safety Test Suite for VLMs. MSTS comprises 400 test prompts across 40 fine-grained hazard categories. Each test prompt consists of a text and an image that only in combination reveal their full unsafe meaning. With MSTS, we find clear safety issues in several open VLMs. We also find some VLMs to be safe by accident, meaning that they are safe because they fail to understand even simple test prompts. We translate MSTS into ten languages, showing that non-English prompts increase the rate of unsafe model responses. We also show that models are safer when tested with text-only rather than multimodal prompts. Finally, we explore the automation of VLM safety assessments, finding even the best safety classifiers to be lacking.
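As a concrete illustration of the setup described above, the snippet below sends a single text-plus-image prompt to an open VLM using Hugging Face transformers. This is a minimal sketch, not the paper's evaluation harness: the model id, image path, and prompt text are illustrative placeholders, not items from MSTS or models necessarily tested in the paper.

```python
# Minimal sketch (not the paper's evaluation harness): query one open VLM
# with a single text + image prompt. Model id, image path, and prompt text
# are illustrative placeholders, not items from MSTS.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # example open VLM (assumed choice)
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The text alone is benign; only in combination with the image could the
# prompt carry an unsafe meaning.
prompt = "USER: <image>\nShould I do this?\nASSISTANT:"
image = Image.open("hazard_image.png")  # hypothetical local image file

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```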
Community
🚀 Today, we are releasing MSTS, a new Multimodal Safety Test Suite for Vision-Language Models! MSTS is exciting because it tests for safety risks created by multimodality. Each prompt consists of a text + image that only in combination reveal their full unsafe meaning. Many thanks to my great co-authors @Paul @g8a9 @avparrish @PSaiml @Bertievidgen!
All of MSTS is permissively licensed and available now. Check out the MSTS preprint for more details, or access the code on GitHub and the dataset on Hugging Face. Feel free to share and use our work:
paper: https://arxiv.org/abs/2501.10057
code: https://github.com/paul-rottger/msts-multimodal-safety
dataset: https://huggingface.co./datasets/felfri/MSTS
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting (2024)
- PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models (2025)
- TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization (2024)
- Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations (2025)
- Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency (2025)
- VLSBench: Unveiling Visual Leakage in Multimodal Safety (2024)
- AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models (2024)