Internal Consistency and Self-Feedback in Large Language Models: A Survey
Abstract
Large language models (LLMs) are expected to respond accurately but often exhibit deficient reasoning or generate hallucinatory content. To address these issues, a line of studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and Self-Refine, has been initiated. They share a commonality: having LLMs evaluate and update themselves to mitigate the issues. Nonetheless, these efforts lack a unified perspective, as existing surveys predominantly focus on categorization without examining the motivations behind these works. In this paper, we summarize a theoretical framework, termed Internal Consistency, which offers unified explanations for phenomena such as the lack of reasoning and the presence of hallucinations. Internal Consistency assesses the coherence among LLMs' latent layer, decoding layer, and response layer based on sampling methodologies. Expanding upon the Internal Consistency framework, we introduce a streamlined yet effective theoretical framework capable of mining Internal Consistency, named Self-Feedback. The Self-Feedback framework consists of two modules: Self-Evaluation and Self-Update. This framework has been employed in numerous studies. We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and delve into the concern, "Does Self-Feedback Really Work?" We propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". Furthermore, we outline promising directions for future research. We have open-sourced the experimental code, reference list, and statistical data, available at https://github.com/IAAR-Shanghai/ICSFSurvey.
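As a concrete, hedged illustration of the sampling-based view, response-layer consistency can be estimated by drawing several answers and measuring how strongly they agree. The sketch below is not code from the survey's repository; `generate_answer` is a hypothetical callable that queries an LLM with non-zero temperature and returns a final answer string.

```python
from collections import Counter

def response_consistency(generate_answer, question, n_samples=8):
    """Estimate response-layer internal consistency by sampling.

    `generate_answer` is a hypothetical stand-in for whatever sampling
    interface the model exposes; answers are compared as raw strings,
    whereas real implementations usually normalize them first.
    """
    answers = [generate_answer(question) for _ in range(n_samples)]
    counts = Counter(answers)
    majority_answer, majority_count = counts.most_common(1)[0]
    consistency = majority_count / n_samples  # 1.0 = all samples agree
    return majority_answer, consistency
```

A consistency of 1.0 means every sample produced the same answer; in the Self-Consistency line of work, the majority answer is also returned as the final prediction.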
Community
- Hugging Face: https://huggingface.co./papers/2407.14507
- arXiv: https://arxiv.org/abs/2407.14507
- PDF: https://arxiv.org/pdf/2407.14507
- GitHub: https://github.com/IAAR-Shanghai/ICSFSurvey
On a related note, check out our survey: "A Survey on Self-Evolution of Large Language Models."
https://arxiv.org/abs/2404.14387
We dive into how LLMs can learn and improve on their own, inspired by human experiential learning. If you're interested in autonomous learning and self-evolving LLMs, you'll find our work complementary to yours.
You can find more details and our ongoing updates on our GitHub: https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/Awesome-Self-Evolution-of-LLM
Congrats again on your fantastic work!
Hello everyone, I would like to introduce our recent survey paper!
As many of you may know, improving the reasoning ability of large language models (LLMs) and mitigating the hallucination problem are crucial research topics. After extensive thought, we realized that these two issues, "enhancing reasoning" and "alleviating hallucinations," share the same fundamental nature. We approached these issues from the perspective of internal consistency. This perspective allowed us to unify many seemingly unrelated works into a single framework. To improve internal consistency (which in turn enhances reasoning ability and mitigates hallucinations), we identified common elements across various works and summarized them into a Self-Feedback framework.
This framework consists of three components: Self-Evaluation, Internal Consistency Signal, and Self-Update.
- Self-Evaluation: Responsible for evaluating the model's internal consistency based on its language expressions, decoding layer probability distributions, and hidden states.
- Internal Consistency Signal: Through Self-Evaluation, we can obtain numerical, textual, external, and even comparative signals.
- Self-Update: Using these signals, we can update the model's expressions, or even the model itself, to improve internal consistency (a minimal sketch of this loop follows below).
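To make the interplay of these components concrete, here is a minimal sketch of the loop. The callables `generate`, `evaluate`, and `revise` are hypothetical placeholders for whatever concrete Self-Evaluation and Self-Update method a given work uses (e.g., critique-and-refine prompting), not an interface defined by the survey.

```python
def self_feedback_loop(generate, evaluate, revise, prompt,
                       max_rounds=3, threshold=0.9):
    """Minimal Self-Feedback loop: Self-Evaluation produces an Internal
    Consistency Signal, and Self-Update revises the response until it is
    judged consistent enough.

    Assumed (hypothetical) signatures:
      generate(prompt) -> response
      evaluate(prompt, response) -> (score in [0, 1], textual critique)
      revise(prompt, response, critique) -> new response
    """
    response = generate(prompt)
    for _ in range(max_rounds):
        score, critique = evaluate(prompt, response)   # Self-Evaluation
        if score >= threshold:                         # consistent enough, stop
            break
        response = revise(prompt, response, critique)  # Self-Update (expression-level)
    return response
```

Training-time variants replace the expression-level update with an update to the model parameters, but the evaluate-then-update structure stays the same.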
The framework we summarize has the potential to encompass many existing works, as illustrated in the diagram below.
Additionally, we have derived several important insights through experiments and analysis, such as the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning".
In summary, we have unified many works from the perspectives of internal consistency and self-feedback, providing inspiration for future researchers and standardizing work in this field.
Relevant links:
- Hugging Face: https://huggingface.co./papers/2407.14507
- arXiv: https://arxiv.org/abs/2407.14507
- PDF: https://arxiv.org/pdf/2407.14507
- GitHub: https://github.com/IAAR-Shanghai/ICSFSurvey
We would greatly appreciate it if you could give us a like or share on Hugging Face!
Chinese Version:
Hello everyone, I'd like to recommend a survey we recently wrote~
As many of you may know, improving LLMs' reasoning ability and mitigating their hallucination problem are important research topics. After long reflection, we found that these two problems, "reasoning enhancement" and "hallucination mitigation," actually share the same essence, and we approach both from the perspective of internal consistency. This lets us unify many seemingly unrelated works under a single framework. To improve internal consistency (which in effect improves reasoning ability and mitigates hallucinations), we identified what most of these works have in common and distilled it into the Self-Feedback framework.
This framework comprises three elements: Self-Evaluation, Internal Consistency Signal, and Self-Update.
- Self-Evaluation: evaluates the model's internal consistency from its language expressions, decoding-layer probability distributions, and hidden states (a toy latent-layer probe is sketched after this list)
- Internal Consistency Signal: through Self-Evaluation, we can obtain numerical, textual, external, and even comparative signals
- Self-Update: using these signals, we can update the model's expressions, or even the model itself, to improve internal consistency.
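As a toy illustration of latent-layer Self-Evaluation (a sketch under assumptions, not code from the survey), a small linear probe can be trained on hidden states to emit a consistency or correctness score:

```python
import torch
import torch.nn as nn

class LatentConsistencyProbe(nn.Module):
    """Hypothetical linear probe over a hidden state, usable as a
    latent-layer consistency signal (e.g., trained to predict whether
    the model's answer is correct)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: (batch, hidden_size), e.g. the last token's final-layer
        # state obtained via output_hidden_states=True in transformers.
        return torch.sigmoid(self.linear(hidden_state)).squeeze(-1)
```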
This framework can potentially encompass many works, as illustrated in the diagram below.
In addition, through experiments and analysis we have distilled several important viewpoints, such as the "Hourglass Evolution of Internal Consistency", "Consistency Is (Almost) Correctness", and "The Paradox of Latent and Explicit Reasoning".
Overall, we unify many works from the two perspectives of internal consistency and self-feedback, offering inspiration for future researchers and helping to standardize work in this field.
Relevant links:
- Hugging Face: https://huggingface.co./papers/2407.14507
- arXiv: https://arxiv.org/abs/2407.14507
- PDF: https://arxiv.org/pdf/2407.14507
- GitHub: https://github.com/IAAR-Shanghai/ICSFSurvey
Thank you for sharing this in Chinese! It makes it much easier for developers in the Chinese-speaking community to read 🤗
Hi, I would like to introduce our recent survey paper!
Large language models (LLMs) have become an essential tool in natural language processing, yet they often struggle with two critical issues: deficient reasoning and hallucination. Enhancing reasoning ability and mitigating hallucinations are central to ongoing research, and addressing them is crucial for advancing the effectiveness of LLMs. Through extensive investigation, we realized that the problems of "enhancing reasoning" and "alleviating hallucinations" share a common foundation: internal consistency. This realization led us to propose a unified framework, which we call the Self-Feedback framework, that integrates various seemingly unrelated works.
The Self-Feedback framework is designed to improve internal consistency, which in turn enhances reasoning ability and mitigates hallucinations. It consists of three interconnected components:
- Self-Evaluation: This component evaluates the model's internal consistency by analyzing its language expressions, decoding-layer probability distributions, and hidden states (a decoding-layer example is sketched after this list).
- Internal Consistency Signal: The evaluation process yields numerical, textual, external, and comparative signals that reflect the model's internal consistency.
- Self-Update: Utilizing these signals, this component updates the model's expressions or the model itself to improve internal consistency.
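For the decoding-layer part of Self-Evaluation, one simple confidence signal is the average probability the model assigns to the tokens of its own response. The sketch below is an assumption-laden illustration, not the survey's reference implementation: it uses Hugging Face transformers with `gpt2` purely as a placeholder model and assumes the prompt's tokens form a prefix of the tokenization of prompt + response.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM from the Hub would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def decoding_layer_confidence(prompt: str, response: str) -> float:
    """Average per-token probability the model assigns to its own response,
    used here as a simple decoding-layer consistency signal (higher = more
    confident)."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits                  # (1, seq_len, vocab)
    probs = torch.softmax(logits[0, :-1], dim=-1)        # position t predicts token t+1
    targets = full_ids[0, 1:]                            # shifted targets
    token_probs = probs[torch.arange(targets.shape[0]), targets]
    response_start = prompt_ids.shape[1]                 # first response token index
    return token_probs[response_start - 1:].mean().item()
```

Low average probability (or, equivalently, high token-level entropy) on the model's own answer is one signal that the decoding layer and the response layer are not mutually consistent.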
This framework encompasses many existing works and offers a unified perspective. The following diagram illustrates the integration of various studies under this framework:
We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and delve into the concern, "Does Self-Feedback Really Work?" We propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". Furthermore, we outline promising directions for future research.
In summary, our paper unifies many works from the perspectives of internal consistency and self-feedback, offering a standardized framework that can inspire future research. We believe this framework will serve as a foundational reference for researchers and practitioners aiming to enhance LLMs.
Relevant links:
- Hugging Face: https://huggingface.co./papers/2407.14507
- arXiv: https://arxiv.org/abs/2407.14507
- PDF: https://arxiv.org/pdf/2407.14507
- GitHub: https://github.com/IAAR-Shanghai/ICSFSurvey
- Paper List (WIP): https://www.yuque.com/zhiyu-n2wnm/ugzwgf/gmqfkfigd6xw26eg
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, were found to be similar to this paper:
- Aligning Large Language Models from Self-Reference AI Feedback with one General Principle (2024)
- Large Language Models have Intrinsic Self-Correction Ability (2024)
- Mitigating Entity-Level Hallucination in Large Language Models (2024)
- Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning (2024)
- InternalInspector I2: Robust Confidence Estimation in LLMs through Internal States (2024)