m-ric
posted an update 17 days ago
🚨 Human Feedback for AI training: Not the golden goose we thought?

I've just read a great paper in which Cohere researchers raise significant questions about using human feedback to evaluate AI language models.

Human feedback is often regarded as the gold standard for judging AI performance, but it turns out it might be more like fool's gold: the study reveals that human judgments are easily swayed by factors that have nothing to do with actual AI performance.

Key insights:
🧠 Several models were tested: Llama-2, Falcon-40B, Cohere Command 6B and 52B

🙅‍♂️ Refusing to answer tanks AI ratings more than getting facts wrong. We apparently prefer a wrong answer to no answer!

💪 Confidence is key (even when it shouldn't be): More assertive AI responses are seen as more factual, even when they're not. Training methods like RLHF could therefore be pushing AI development in the wrong direction.

🎭 The assertiveness trap: As AI responses get more confident-sounding, non-expert annotators become less likely to notice when they're wrong or inconsistent.

And a consequence of the above:
🔄 RLHF might backfire: Using human feedback to train AI (Reinforcement Learning from Human Feedback) could accidentally make AI more overconfident and less accurate.

This paper means we need to think carefully about how we evaluate and train AI systems, to ensure we're rewarding correctness over the mere appearance of it, such as confident talk.

⛔️ Chatbot Arena's Elo leaderboard, based on crowdsourced votes from average joes like you and me, might become completely irrelevant as models become smarter and smarter.

Read the paper 👉 Human Feedback is not Gold Standard (2309.16349)