arxiv:2408.02599

Progressively Selective Label Enhancement for Language Model Alignment

Published on Aug 5, 2024

Abstract

Large Language Models have demonstrated impressive capabilities across a wide range of language tasks, but they may produce content that misaligns with human expectations, raising ethical and legal concerns. It is therefore important to explore the models' limitations and impose restrictions on them to ensure safety and compliance, with Reinforcement Learning from Human Feedback (RLHF) being the primary method. Because the stages of RLHF pose challenges in stability and scalability, researchers have explored alternative methods that achieve comparable effects. However, these methods often depend on large, high-quality datasets and make inefficient use of the data they generate. To address this problem, we propose PSLE (Progressively Selective Label Enhancement for Language Model Alignment), a framework that fully utilizes all generated data by guiding the model with principles that align its outputs with human expectations. Using a dynamically updated threshold, our approach ensures efficient data utilization by incorporating every generated response and weighting each one according to its reward score. Experimental results on multiple datasets demonstrate the effectiveness of PSLE compared to existing language model alignment methods.
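The abstract only describes the core mechanism at a high level: keep every generated response and weight it by its reward score relative to a threshold that is updated over time. The sketch below illustrates one way that idea could look in Python; the function names, the sigmoid weighting, and the threshold-update rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of weighting all generated responses
# by reward score against a progressively updated threshold.
from dataclasses import dataclass
import math


@dataclass
class ScoredResponse:
    text: str
    reward: float  # score assigned by a reward model


def soft_weights(responses, threshold, temperature=1.0):
    """Assign a weight to every response instead of discarding low-reward ones.

    Responses above the threshold get weights near 1; responses below it get
    smaller but non-zero weights, so all generated data still contributes.
    (Sigmoid weighting is an assumption for illustration.)
    """
    return [
        1.0 / (1.0 + math.exp(-(r.reward - threshold) / temperature))
        for r in responses
    ]


def update_threshold(threshold, responses, step_size=0.1):
    """Hypothetical progressive update: nudge the threshold toward the batch's
    mean reward so the selection criterion tightens as the model improves."""
    batch_mean = sum(r.reward for r in responses) / len(responses)
    return threshold + step_size * (batch_mean - threshold)


if __name__ == "__main__":
    batch = [
        ScoredResponse("well-aligned answer", reward=0.9),
        ScoredResponse("partially aligned answer", reward=0.4),
        ScoredResponse("misaligned answer", reward=-0.2),
    ]
    threshold = 0.0
    weights = soft_weights(batch, threshold)   # per-response training weights
    threshold = update_threshold(threshold, batch)
    print(weights, threshold)
```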
