qianhuiwu committed
Commit 5f0c827 · verified · 1 Parent(s): 656a686

Added links to datasets.

Files changed (1):
  1. README.md +3 -1
README.md CHANGED
@@ -4,7 +4,9 @@ license: apache-2.0
 
 # LLMLingua-2-Bert-base-Multilingual-Cased-MeetingBank
 
-This model was introduced in the paper [**LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression** (Pan et al., 2024)](https://arxiv.org/abs/2403.12968). It is a [BERT multilingual base model (cased)](https://huggingface.co/google-bert/bert-base-multilingual-cased) fine-tuned to perform token classification for task-agnostic prompt compression. The probability `$p_{preserve}$` of each token `$x_i$` is used as the metric for compression. This model is trained on [an extractive text compression dataset (to be made public)]() constructed with the methodology proposed in [**LLMLingua-2**](https://arxiv.org/abs/2403.12968), using training examples from [MeetingBank (Hu et al., 2023)](https://meetingbank.github.io/) as the seed data.
+This model was introduced in the paper [**LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression** (Pan et al., 2024)](https://arxiv.org/abs/2403.12968). It is a [BERT multilingual base model (cased)](https://huggingface.co/google-bert/bert-base-multilingual-cased) fine-tuned to perform token classification for task-agnostic prompt compression. The probability `$p_{preserve}$` of each token `$x_i$` is used as the metric for compression. This model is trained on [the extractive text compression dataset](https://huggingface.co/datasets/microsoft/MeetingBank-LLMCompressed) constructed with the methodology proposed in [**LLMLingua-2**](https://arxiv.org/abs/2403.12968), using training examples from [MeetingBank (Hu et al., 2023)](https://meetingbank.github.io/) as the seed data.
+
+You can evaluate the model on downstream tasks such as question answering (QA) and summarization over compressed meeting transcripts using [this dataset](https://huggingface.co/datasets/microsoft/MeetingBank-QA-Summary).
 
 For more details, please check the project page of [LLMLingua-2](https://llmlingua.com/llmlingua2.html) and the [LLMLingua Series](https://llmlingua.com/).
 
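
To illustrate the compression metric the updated card describes, here is a minimal sketch: load the checkpoint as a plain token classifier, take the softmax probability of the "preserve" class as `$p_{preserve}$` for each token `$x_i$`, and keep the highest-scoring tokens. The repository id, the assumption that label index 1 means "preserve", and the fixed top-k retention rate are illustrative assumptions, not part of this commit; the official LLMLingua-2 pipeline additionally handles chunking, word-level probability aggregation, and forced tokens.

```python
# Sketch only: score tokens with the classifier's "preserve" probability
# and keep the top fraction. Assumes label index 1 == "preserve" and the
# repo id below; both are assumptions for illustration.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

prompt = "Item 15, report from the City Manager: recommendation to adopt the resolution."
enc = tokenizer(prompt, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, seq_len, num_labels)

# p_preserve for each token x_i (assuming class index 1 is "preserve")
p_preserve = logits.softmax(dim=-1)[0, :, 1]

# Keep the top 50% of tokens by p_preserve, restored to original order
rate = 0.5
k = max(1, int(rate * p_preserve.numel()))
keep = p_preserve.topk(k).indices.sort().values
compressed = tokenizer.decode(enc["input_ids"][0, keep], skip_special_tokens=True)
print(compressed)
```

For actual use, the `llmlingua` package wraps this logic (its `PromptCompressor` with `use_llmlingua2=True` performs the full word-level compression); the sketch above only mirrors the per-token metric the card mentions.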