qianhuiwu committed
Commit 5f0c827 · verified · 1 Parent(s): 656a686

Added links to datasets.

Files changed (1):
  1. README.md +3 -1
README.md CHANGED
@@ -4,7 +4,9 @@ license: apache-2.0
 
 # LLMLingua-2-Bert-base-Multilingual-Cased-MeetingBank
 
-This model was introduced in the paper [**LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression** (Pan et al., 2024)](https://arxiv.org/abs/2403.12968). It is a [BERT multilingual base model (cased)](https://huggingface.co/google-bert/bert-base-multilingual-cased) fine-tuned to perform token classification for task-agnostic prompt compression. The probability `$p_{preserve}$` of each token `$x_i$` is used as the metric for compression. This model is trained on [an extractive text compression dataset (to be made public)]() constructed with the methodology proposed in [**LLMLingua-2**](https://arxiv.org/abs/2403.12968), using training examples from [MeetingBank (Hu et al., 2023)](https://meetingbank.github.io/) as the seed data.
+This model was introduced in the paper [**LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression** (Pan et al., 2024)](https://arxiv.org/abs/2403.12968). It is a [BERT multilingual base model (cased)](https://huggingface.co/google-bert/bert-base-multilingual-cased) fine-tuned to perform token classification for task-agnostic prompt compression. The probability `$p_{preserve}$` of each token `$x_i$` is used as the metric for compression. This model is trained on [the extractive text compression dataset](https://huggingface.co/datasets/microsoft/MeetingBank-LLMCompressed) constructed with the methodology proposed in [**LLMLingua-2**](https://arxiv.org/abs/2403.12968), using training examples from [MeetingBank (Hu et al., 2023)](https://meetingbank.github.io/) as the seed data.
+
+You can evaluate the model on downstream tasks such as question answering (QA) and summarization over compressed meeting transcripts using [this dataset](https://huggingface.co/datasets/microsoft/MeetingBank-QA-Summary).
 
 For more details, please check the project page of [LLMLingua-2](https://llmlingua.com/llmlingua2.html) and the [LLMLingua Series](https://llmlingua.com/).
 
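
To illustrate the compression metric the updated card describes, here is a minimal sketch: load the checkpoint as a plain token classifier, take the softmax probability of the "preserve" class as `$p_{preserve}$` for each token `$x_i$`, and keep the highest-scoring tokens. The repository id, the assumption that label index 1 means "preserve", and the fixed top-k retention rate are illustrative assumptions, not part of this commit; the official LLMLingua-2 pipeline additionally handles chunking, word-level probability aggregation, and forced tokens.

```python
# Sketch only: score tokens with the classifier's "preserve" probability
# and keep the top fraction. Assumes label index 1 == "preserve" and the
# repo id below; both are assumptions for illustration.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

prompt = "Item 15, report from the City Manager: recommendation to adopt the resolution."
enc = tokenizer(prompt, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, seq_len, num_labels)

# p_preserve for each token x_i (assuming class index 1 is "preserve")
p_preserve = logits.softmax(dim=-1)[0, :, 1]

# Keep the top 50% of tokens by p_preserve, restored to original order
rate = 0.5
k = max(1, int(rate * p_preserve.numel()))
keep = p_preserve.topk(k).indices.sort().values
compressed = tokenizer.decode(enc["input_ids"][0, keep], skip_special_tokens=True)
print(compressed)
```

For actual use, the `llmlingua` package wraps this logic (its `PromptCompressor` with `use_llmlingua2=True` performs the full word-level compression); the sketch above only mirrors the per-token metric the card mentions.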