Challenges in Trustworthy Human Evaluation of Chatbots
Abstract
Open community-driven platforms like Chatbot Arena, which collect user preference data from site visitors, have gained a reputation as among the most trustworthy publicly available benchmarks for LLM performance. While such platforms are now standard, it is tricky to implement effective guardrails that ensure high-quality annotations from humans. In this paper, we demonstrate that three sources of bad annotations, both malicious and otherwise, can corrupt the reliability of open leaderboard rankings. In particular, we show that only 10% of poor-quality votes by apathetic annotators (site visitors not appropriately incentivized to vote correctly) or adversarial annotators (bad actors seeking to inflate the ranking of a target model) can change model rankings by up to 5 places on the leaderboard. Finally, we discuss open challenges in ensuring high-quality human annotations.
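The headline experiment lends itself to a small simulation. The sketch below is a minimal illustration, not the paper's code: it fits a Bradley-Terry model (the ranking model behind Chatbot Arena-style leaderboards) to synthetic pairwise votes, then replaces 10% of the votes with adversarial ones that always favor a target model and reports how far the target's rank moves. All constants (model count, vote count, skill spacing) are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's released code): measure how much a
# fixed share of adversarial votes shifts a Bradley-Terry leaderboard.
import numpy as np

rng = np.random.default_rng(0)
M, N, ADV_FRAC = 20, 20_000, 0.10            # illustrative assumptions
TARGET = M // 2                              # mid-pack model the adversary boosts
skill = np.linspace(-2.0, 2.0, M)            # ground-truth strengths

# Honest battles: random pairs, winner drawn from a Bradley-Terry model.
pairs = rng.integers(0, M, size=(N, 2))
pairs = pairs[pairs[:, 0] != pairs[:, 1]]    # drop self-matches
a, b = pairs[:, 0], pairs[:, 1]
p_a_wins = 1.0 / (1.0 + np.exp(skill[b] - skill[a]))
wins = (rng.random(len(a)) < p_a_wins).astype(float)

def fit_bt(wins_a, a_idx, b_idx, steps=3000, lr=2.0):
    """Maximum-likelihood Bradley-Terry strengths via gradient ascent."""
    theta = np.zeros(M)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(theta[b_idx] - theta[a_idx]))
        resid = wins_a - p                   # d(log-lik)/d(theta_a) per battle
        grad = np.zeros(M)
        np.add.at(grad, a_idx, resid)
        np.add.at(grad, b_idx, -resid)
        theta += lr * grad / len(a_idx)
        theta -= theta.mean()                # strengths are shift-invariant
    return theta

def rank_of_target(theta):
    return int(np.where(np.argsort(-theta) == TARGET)[0][0]) + 1

# Adversary: hijack a random ADV_FRAC share of battles, pit TARGET against a
# random opponent, and always vote for TARGET.
a_adv, b_adv, w_adv = a.copy(), b.copy(), wins.copy()
idx = rng.choice(len(a), size=int(ADV_FRAC * len(a)), replace=False)
opp = rng.integers(0, M - 1, size=len(idx))
opp[opp >= TARGET] += 1                      # any opponent except TARGET
a_adv[idx], b_adv[idx], w_adv[idx] = TARGET, opp, 1.0

clean = fit_bt(wins, a, b)
dirty = fit_bt(w_adv, a_adv, b_adv)
print(f"target rank without attack: #{rank_of_target(clean)}")
print(f"target rank with {ADV_FRAC:.0%} adversarial votes: #{rank_of_target(dirty)}")
```

Centering theta after each update fixes the model's shift invariance; any off-the-shelf logistic regression over win/loss outcomes would recover the same fit. Exact rank shifts depend on the assumed constants, so this only illustrates the mechanism, not the paper's measured numbers.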
Community
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? (2024)
- "All that Glitters": Approaches to Evaluations with Unreliable Model and Human Annotations (2024)
- First-Person Fairness in Chatbots (2024)
- Multi-Perspective Stance Detection (2024)
- AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces (2024)
- Decompose and Leverage Preferences from Expert Models for Improving Trustworthiness of MLLMs (2024)
- The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection (2024)