arxiv:2409.09318

ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models

Published on Sep 14

Authors:

Abstract

Hallucination poses a significant challenge for multimodal large language models (MLLMs). However, existing benchmarks for evaluating hallucinations are static, which can lead to potential data contamination. This paper introduces ODE, an open-set, dynamic protocol for evaluating object existence hallucinations in MLLMs. Our framework employs graph structures to model associations between real-word concepts and generates novel samples for both general and domain-specific scenarios. The dynamic combination of concepts, along with various combination principles, ensures a broad sample distribution. Experimental results show that MLLMs exhibit higher hallucination rates with ODE-generated samples, effectively avoiding data contamination. Moreover, these samples can also be used for fine-tuning to improve MLLM performance on existing benchmarks.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2409.09318 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2409.09318 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2409.09318 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.