---
title: LLMLingua
emoji: π
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 4.36.0
app_file: app.py
pinned: false
license: mit
---
Check out the configuration reference at https://huggingface.co./docs/hub/spaces-config-reference
# LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression
| Project Page | LLMLingua | LongLLMLingua | LLMLingua-2 | LLMLingua Demo | LLMLingua-2 Demo |
## News
- We're excited to announce the release of LLMLingua-2, boasting a 3x-6x speed improvement over LLMLingua! For more information, check out our paper, visit the project page, and explore our demo.
- Talk slides from the AI Time talk (Jan 24) are available.
- EMNLP'23 slides are available in Session 5 and BoF-6.
- Check out our new blog post discussing RAG benefits and cost savings through prompt compression. See the script example here.
- Visit our project page for real-world case studies in RAG, Online Meetings, CoT, and Code.
- Explore our `./examples` directory for practical applications, including RAG, Online Meeting, CoT, Code, and RAG using LlamaIndex.
- LongLLMLingua is now part of the LlamaIndex pipeline, a widely used RAG framework.
## TL;DR
LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss (see the usage sketch after the citation below).
- LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models (EMNLP 2023)
Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang and Lili Qiu
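As an illustration (not part of the citation above), here is a minimal sketch of how LLMLingua-style compression can be invoked through the `llmlingua` Python package's `PromptCompressor` interface; the context text, instruction, question, and token budget below are placeholder values:

```python
# Minimal LLMLingua sketch: a small causal LM scores token informativeness,
# and low-information tokens are dropped until the target budget is met.
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()  # loads the default small compressor model (GPU recommended)

compressed = llm_lingua.compress_prompt(
    ["<long demonstration or retrieved passage goes here>"],  # context segments
    instruction="Answer the question based on the context.",
    question="What does LLMLingua do?",
    target_token=200,  # illustrative compression budget
)

# The returned dict contains the compressed prompt to forward to the target LLM.
print(compressed["compressed_prompt"])
```

The compressed prompt can then be sent to any black-box LLM in place of the original prompt.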
LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% while using only 1/4 of the tokens (see the usage sketch after the citation below).
- LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression (ICLR ME-FoMo 2024)
Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu
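For LongLLMLingua, the same `PromptCompressor` is reused with question-aware, coarse-to-fine settings (document reordering and dynamic per-document budgets); the sketch below follows the repository's published example flags, with placeholder documents and illustrative values:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()

retrieved_documents = ["<doc 1 text>", "<doc 2 text>", "<doc 3 text>"]  # long multi-document context
question = "Which document contains the answer to the user's query?"

compressed = llm_lingua.compress_prompt(
    retrieved_documents,
    question=question,  # the query guides token- and document-level relevance
    rate=0.55,          # keep roughly half of the tokens (illustrative)
    # LongLLMLingua-specific settings: question-aware ranking, document reordering
    # to counter "lost in the middle", and dynamic per-document compression budgets.
    condition_in_question="after_condition",
    reorder_context="sort",
    dynamic_context_compression_ratio=0.3,
    condition_compare=True,
    context_budget="+100",
    rank_method="longllmlingua",
)
print(compressed["compressed_prompt"])
```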
LLMLingua-2, a small yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data and runs 3x-6x faster (see the usage sketch after the citation below).
- LLMLingua-2: Context-Aware Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression (Under Review)
Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang
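A corresponding sketch for LLMLingua-2, assuming the released `microsoft/llmlingua-2-xlm-roberta-large-meetingbank` checkpoint and the `use_llmlingua2` switch from the package's published usage; the compression rate and forced tokens are illustrative:

```python
from llmlingua import PromptCompressor

# LLMLingua-2: a BERT-level encoder, distilled from GPT-4, classifies each token
# as keep/drop, giving fast, task-agnostic compression.
llm_lingua2 = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

prompt = "<your long prompt or meeting transcript goes here>"
compressed = llm_lingua2.compress_prompt(
    prompt,
    rate=0.33,                 # keep roughly one third of the tokens (illustrative)
    force_tokens=["\n", "?"],  # tokens that should always be preserved
)
print(compressed["compressed_prompt"])
```

Because the classifier is task-agnostic, the same compressor can be reused across RAG, meeting, and reasoning prompts without per-task tuning.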