arxiv:2311.09182

ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models

Published on Nov 15, 2023

Abstract

In recent times, large language models (LLMs) have shown impressive performance on various document-level tasks such as document classification, summarization, and question-answering. However, research on understanding their capabilities on the task of self-contradictions in long documents has been very limited. In this work, we introduce ContraDoc, the first human-annotated dataset to study self-contradictions in long documents across multiple domains, varying document lengths, self-contradiction types, and scope. We then analyze the current capabilities of four state-of-the-art open-source and commercially available LLMs: GPT3.5, GPT4, PaLM2, and LLaMAv2 on this dataset. While GPT4 performs the best and can outperform humans on this task, we find that it is still unreliable and struggles with self-contradictions that require more nuance and context. We release the dataset and all the code associated with the experiments (https://github.com/ddhruvkr/CONTRADOC).
