---
title: LLMLingua
emoji: 📝
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 4.36.0
app_file: app.py
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co./docs/hub/spaces-config-reference

LLMLingua

LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression

| Project Page | LLMLingua | LongLLMLingua | LLMLingua-2 | LLMLingua Demo | LLMLingua-2 Demo |

News

  • 🦚 We're excited to announce the release of LLMLingua-2, boasting a 3x-6x speed improvement over LLMLingua! For more information, check out our paper, visit the project page, and explore our demo.
  • 🤳 Talk slides from AI Time (Jan '24) are available.
  • πŸ–₯ EMNLP'23 slides are available in Session 5 and BoF-6.
  • πŸ“š Check out our new blog post discussing RAG benefits and cost savings through prompt compression. See the script example here.
  • 🎈 Visit our project page for real-world case studies in RAG, Online Meetings, CoT, and Code.
  • πŸ‘¨β€πŸ¦― Explore our './examples' directory for practical applications, including RAG, Online Meeting, CoT, Code, and RAG using LlamaIndex.
  • πŸ‘Ύ LongLLMLingua is now part of the LlamaIndex pipeline, a widely-used RAG framework.

TL;DR

LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.
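
In code, the whole pipeline is exposed through a single `PromptCompressor` class. The sketch below follows the library's documented usage; the placeholder prompt is hypothetical, and the default model can be swapped (e.g., for `microsoft/phi-2` or a GPTQ-quantized checkpoint) by passing a model name to the constructor.

```python
from llmlingua import PromptCompressor

# Hypothetical long input; in practice this is thousands of tokens of
# instructions, retrieved documents, or chat history.
prompt = "Some very long demonstration and context text. " * 400

llm_lingua = PromptCompressor()  # loads the default compact causal LM

result = llm_lingua.compress_prompt(
    prompt,
    instruction="",    # task instruction, preserved verbatim
    question="",       # final question, preserved verbatim
    target_token=200,  # approximate token budget after compression
)

print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"], result["ratio"])
```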

LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% using only 1/4 of the tokens.
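
The same `compress_prompt` entry point accepts LongLLMLingua-specific arguments for question-aware, reorder-capable compression. Here is a sketch using the parameter values from the project's examples; the documents and question are hypothetical stand-ins:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()

# Hypothetical retrieved documents and user question.
context_list = [f"Document {i}: ... retrieved passage text ..." for i in range(10)]
question = "Based on the documents, what did the report conclude?"

result = llm_lingua.compress_prompt(
    context_list,
    question=question,                        # guides token/document relevance
    rate=0.55,                                # keep roughly 55% of the tokens
    condition_in_question="after_condition",  # question-aware compression
    reorder_context="sort",                   # move salient documents out of the middle
    dynamic_context_compression_ratio=0.3,
    condition_compare=True,
    context_budget="+100",
    rank_method="longllmlingua",              # coarse-grained document ranking
)
```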

LLMLingua-2, a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data, offering 3x-6x faster performance.
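
LLMLingua-2 reuses the same interface: point `PromptCompressor` at one of the released classifier checkpoints and set `use_llmlingua2=True`. A minimal sketch follows (the smaller multilingual BERT checkpoint, `microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank`, works the same way):

```python
from llmlingua import PromptCompressor

llm_lingua2 = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # switch to the token-classification compressor
)

# Hypothetical transcript or context to compress.
prompt = "Speaker 1: ... Speaker 2: ... " * 200

result = llm_lingua2.compress_prompt(
    prompt,
    rate=0.33,                 # keep roughly a third of the tokens
    force_tokens=["\n", "?"],  # tokens that are never dropped
)
print(result["compressed_prompt"])
```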