arxiv:2410.11020

Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning

Published on Oct 14, 2024

Abstract

Large language models (LLMs), built on decoder-only transformers, excel in natural language generation and adapt to diverse tasks using zero-shot and few-shot prompting. However, these prompting methods often struggle on natural language understanding (NLU) tasks, where encoder-only models like BERT-base outperform LLMs on benchmarks like GLUE and SuperGLUE. This paper explores two approaches, supervised fine-tuning (SFT) and proximal policy optimization (PPO), to enhance LLMs' NLU abilities. To reduce the cost of full-model fine-tuning, we integrate low-rank adaptation (LoRA) layers, limiting updates to these layers during both SFT and PPO. In SFT, task-specific prompts are concatenated with input queries and ground-truth labels, optimizing with next-token prediction. Despite this, LLMs still underperform compared to models like BERT-base on several NLU tasks. To close this gap, we apply PPO, a reinforcement learning technique that treats each token generation as an action and uses a reward function based on alignment with ground-truth answers. PPO then updates the model to maximize these rewards, aligning outputs with correct labels. Our experiments with LLAMA2-7B show that PPO improves performance, with a 6.3-point gain over SFT on GLUE. PPO exceeds zero-shot by 38.7 points and few-shot by 26.1 points on GLUE, while surpassing these by 28.8 and 28.5 points on SuperGLUE. Additionally, PPO outperforms BERT-large by 2.7 points on GLUE and 9.3 points on SuperGLUE. The improvements are consistent across models like Qwen2.5-7B and MPT-7B, highlighting PPO's robustness in enhancing LLMs' NLU capabilities.
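
The abstract describes SFT where a task prompt, the input query, and the gold label are concatenated into one sequence, with only LoRA layers updated under a next-token-prediction loss. The sketch below illustrates that setup; it is not the authors' code. The checkpoint name, LoRA hyperparameters, prompt template, and the masking of loss on prompt tokens are all assumptions for illustration.

```python
# Minimal LoRA-based SFT sketch (illustrative, not the paper's exact recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; the paper uses LLAMA2-7B
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Wrap the frozen base model with LoRA adapters; only these low-rank layers are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                         lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base_model, lora_config)

def build_example(task_prompt: str, query: str, label: str):
    """Concatenate task prompt + query + gold label into one sequence.
    Masking the prompt tokens with -100 (loss only on the answer) is an assumed choice."""
    prompt_ids = tokenizer(f"{task_prompt}\n{query}\nAnswer:", return_tensors="pt").input_ids
    label_ids = tokenizer(f" {label}{tokenizer.eos_token}", return_tensors="pt",
                          add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, label_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore loss on the prompt portion
    return input_ids, labels

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
input_ids, labels = build_example(
    "Decide whether the sentence is grammatically acceptable (yes/no).",
    "The book was written by she.", "no")
loss = model(input_ids=input_ids, labels=labels).loss  # standard next-token prediction loss
loss.backward()
optimizer.step()
```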
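For the PPO stage, the abstract states that each generated token is treated as an action and that the reward measures alignment with the ground-truth answer. The following sketch shows one plausible reading of that setup: an exact-match reward and the standard PPO clipped surrogate objective over the answer tokens. The reward design, the baseline, and the broadcast of a sequence-level reward to per-token advantages are assumptions, not the paper's stated configuration.

```python
# Hedged sketch of a PPO-style token-level objective (illustrative only).
import torch

def answer_reward(generated_answer: str, gold_label: str) -> float:
    """Reward based on alignment with the ground-truth answer (assumed exact match)."""
    return 1.0 if generated_answer.strip().lower() == gold_label.strip().lower() else 0.0

def ppo_clipped_loss(logprobs_new: torch.Tensor,
                     logprobs_old: torch.Tensor,
                     advantages: torch.Tensor,
                     clip_eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate loss over the generated tokens (actions)."""
    ratio = torch.exp(logprobs_new - logprobs_old)        # pi_new / pi_old per token
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()          # maximize reward -> minimize negation

# Toy usage: per-token log-probs from current and behavior policies, with the
# sequence-level reward broadcast to every answer token as a simplified advantage.
T = 5  # number of generated answer tokens
logprobs_old = torch.randn(T)
logprobs_new = logprobs_old + 0.01 * torch.randn(T)
reward = answer_reward("entailment", "entailment")
advantages = torch.full((T,), reward - 0.5)  # assumed constant baseline of 0.5
loss = ppo_clipped_loss(logprobs_new, logprobs_old, advantages)
```

Broadcasting one sequence-level reward to all answer tokens is a common simplification; a learned value head or per-token credit assignment could replace it without changing the clipped-objective structure shown here.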
