---
title: LLMLingua
emoji: 📝
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 3.47.1
app_file: app.py
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models & LongLLMLingua

| [LLMLingua Paper](https://arxiv.org/abs/2310.05736) | [LongLLMLingua Paper](https://arxiv.org/abs/2310.06839) | HF Space Demo |

## TL;DR

LLMLingua uses a well-trained small language model after alignment, such as GPT2-small or LLaMA-7B, to detect the unimportant tokens in a prompt and enable inference with the compressed prompt in black-box LLMs, achieving up to 20x compression with minimal performance loss.

[LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models](https://arxiv.org/abs/2310.05736) (EMNLP 2023).
_Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang and Lili Qiu_

LongLLMLingua is a method that enhances LLMs' ability to perceive key information in long-context scenarios using prompt compression, achieving up to $28.5 in cost savings per 1,000 samples while also improving performance.

[LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://arxiv.org/abs/2310.06839) (Under Review).
_Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
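
The idea in the TL;DR, scoring tokens with a small model and dropping the least important ones, can be illustrated with a toy sketch. This is not the LLMLingua implementation: a smoothed unigram model stands in for the small language model (the papers use models such as GPT2-small or LLaMA-7B), and the names `token_self_information`, `compress_prompt`, `keep_ratio`, and the tiny reference corpus are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def token_self_information(tokens, reference_counts, vocab_size):
    """Return -log2 p(token) under an add-one-smoothed unigram model.

    Assumption: a unigram model replaces the small causal LM here, so
    scores ignore context entirely.
    """
    total = sum(reference_counts.values())
    return [
        -math.log2((reference_counts.get(tok, 0) + 1) / (total + vocab_size))
        for tok in tokens
    ]


def compress_prompt(tokens, reference_counts, keep_ratio):
    """Keep the most informative tokens, preserving their original order."""
    vocab_size = len(set(reference_counts) | set(tokens))
    scores = token_self_information(tokens, reference_counts, vocab_size)
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank positions by score (highest first); ties keep the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    kept_positions = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept_positions]


# Hypothetical "small model": unigram counts from a tiny reference corpus
# made of frequent function words.
reference = "the the the the of of of a a is is to to in in and and".split()
reference_counts = Counter(reference)

prompt = "the capital of france is paris and the capital of japan is tokyo".split()
compressed = compress_prompt(prompt, reference_counts, keep_ratio=0.5)

print(" ".join(compressed))
print(f"compression: {len(prompt) / len(compressed):.1f}x")
```

In this sketch the frequent function words carry little information under the reference counts, so they are removed first and the content words survive. A real compressor would score each token conditioned on its preceding context rather than in isolation.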