	---
	license: bigscience-bloom-rail-1.0
	datasets:
	- unicamp-dl/mmarco
	- rajpurkar/squad
	language:
	- fr
	- en
	pipeline_tag: sentence-similarity
	base_model:
	- cmarkea/bloomz-3b-dpo-chat
	---

	# Bloomz-3b Reranking

	This reranking model is built from the [cmarkea/bloomz-3b-dpo-chat](https://huggingface.co./cmarkea/bloomz-3b-dpo-chat) model and measures the semantic correspondence
	between a question (query) and a context. Its normalized score helps filter the query/context matches returned by a retriever in an ODQA (Open-Domain
	Question Answering) setting. It also allows the results to be reordered using a more effective modeling approach than the retriever's. However, this type of modeling is
	not suited to direct database search because of its high computational cost.

	Developed to be language-agnostic, this model supports both French and English. It can therefore score query/context pairs in a cross-language setting without
	being influenced by its behavior in a monolingual setting (English or French).

	## Dataset
	The training data comes from the [mMARCO](https://huggingface.co./datasets/unicamp-dl/mmarco) dataset, which consists of query/positive/hard negative triplets.
	We also included [SQuAD](https://huggingface.co./datasets/rajpurkar/squad) data from the "train" split, likewise arranged as query/positive/hard negative triplets. To
	generate hard negatives for SQuAD, we took contexts from the same theme as the query but attached to a different set of queries. The negative
	observations therefore belong to the same theme as the query but presumably do not contain the answer to the question.
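The hard-negative construction for SQuAD described above can be sketched as follows. This is only an illustration, not the authors' actual pipeline: the function name `build_squad_triplets`, the record layout (`title`, `context` and `question` keys, as exposed by the SQuAD dataset), and the choice of drawing one negative uniformly at random per query are assumptions.

```python
import random
from collections import defaultdict


def build_squad_triplets(records, seed=0):
    """Build (query, positive, hard negative) triplets from SQuAD-like records.

    Assumption: each record is a dict with 'title' (the theme), 'context'
    and 'question' keys. The hard negative is another context of the same
    theme, drawn uniformly at random (hypothetical sampling choice).
    """
    rng = random.Random(seed)

    # Group the distinct contexts by theme, preserving first-seen order.
    contexts_by_theme = defaultdict(list)
    for rec in records:
        if rec["context"] not in contexts_by_theme[rec["title"]]:
            contexts_by_theme[rec["title"]].append(rec["context"])

    triplets = []
    for rec in records:
        candidates = [
            c for c in contexts_by_theme[rec["title"]] if c != rec["context"]
        ]
        if not candidates:
            continue  # a theme with a single context offers no hard negative
        triplets.append(
            (rec["question"], rec["context"], rng.choice(candidates))
        )
    return triplets


# Toy records standing in for the SQuAD "train" split.
records = [
    {"title": "Rhine", "context": "ctx A", "question": "q1"},
    {"title": "Rhine", "context": "ctx B", "question": "q2"},
    {"title": "Oxygen", "context": "ctx C", "question": "q3"},
]
triplets = build_squad_triplets(records)
```

In this toy input the "Oxygen" theme has a single context, so its question yields no triplet; how such cases were handled in the real training set is not stated in this card.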

	Finally, the triplets are flattened into query/context pairs, labeled 1 for query/positive pairs and 0 for query/negative pairs. For each element of the
	pair (query and context), the language, French or English, is chosen uniformly at random.
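The flattening step can be illustrated with a short sketch. It assumes that every text is available in both languages and stored as a `{'fr': ..., 'en': ...}` dict, which is a hypothetical layout; the function name `flatten_triplets` is likewise illustrative. Whether the query language is redrawn for the negative pair is not specified in this card, so the sketch simply draws it independently for each pair.

```python
import random


def flatten_triplets(triplets, seed=0):
    """Turn bilingual triplets into labeled (query, context, label) pairs.

    Assumption: each text is a dict {'fr': ..., 'en': ...} holding both
    language versions (hypothetical layout).
    """
    rng = random.Random(seed)
    pairs = []
    for query, positive, negative in triplets:
        for context, label in ((positive, 1), (negative, 0)):
            # The language of each element is drawn uniformly and independently.
            query_lang = rng.choice(("fr", "en"))
            context_lang = rng.choice(("fr", "en"))
            pairs.append((query[query_lang], context[context_lang], label))
    return pairs


# One toy triplet: query, positive context, hard negative context.
triplets = [
    (
        {"fr": "Où coule le Rhin ?", "en": "Where does the Rhine flow?"},
        {"fr": "Le Rhin traverse l'Europe.", "en": "The Rhine flows through Europe."},
        {"fr": "L'oxygène est un gaz.", "en": "Oxygen is a gas."},
    )
]
pairs = flatten_triplets(triplets)
```

Each triplet produces two training pairs, so the flattened set mixes French/French, English/English and cross-language examples.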

	## Evaluation

	To assess the performance of the reranker, we use the "validation" split of the [SQuAD](https://huggingface.co./datasets/rajpurkar/squad) dataset. We
	select the first question from each paragraph, along with the paragraph itself, which is the context that an oracle model should rank Top-1. Because the number
	of themes is limited, every context from the same theme that does not match the query is considered a hard negative (contexts
	outside the theme are simple negatives). The following table lists, for each theme, its number of contexts (and therefore of associated queries):

	| Theme name | Number of contexts | Theme name | Number of contexts |
	|---------------------------------------------:|:---------------|---------------------------------------------:|:---------------|
	| Normans | 39 | Civil_disobedience | 26 |
	| Computational_complexity_theory | 48 | Construction | 22 |
	| Southern_California | 39 | Private_school | 26 |
	| Sky_(United_Kingdom) | 22 | Harvard_University | 30 |
	| Victoria_(Australia) | 25 | Jacksonville,_Florida | 21 |
	| Huguenot | 44 | Economic_inequality | 44 |
	| Steam_engine | 46 | University_of_Chicago | 37 |
	| Oxygen | 43 | Yuan_dynasty | 47 |
	| 1973_oil_crisis | 24 | Immune_system | 49 |
	| European_Union_law | 40 | Intergovernmental_Panel_on_Climate_Change | 24 |
	| Amazon_rainforest | 21 | Prime_number | 31 |
	| Ctenophora | 31 | Rhine | 44 |
	| Fresno,_California | 28 | Scottish_Parliament | 39 |
	| Packet_switching | 23 | Islamism | 39 |
	| Black_Death | 23 | Imperialism | 39 |
	| Geology | 25 | Warsaw | 49 |
	| Pharmacy | 26 | French_and_Indian_War | 46 |
	| Force | 44 | | |

	The evaluation corpus consists of 1204 pairs of query/context to be ranked.
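The construction of the evaluation corpus can be sketched as below. This is an illustration under assumptions: records are taken to be SQuAD-like dicts with `title`, `context` and `question` keys, iterated in dataset order so that the first record seen for a paragraph carries its first question. The helper names `build_eval_corpus` and `negative_kind` are hypothetical.

```python
def build_eval_corpus(records):
    """Keep the first question of each paragraph of SQuAD-like records."""
    seen = set()
    corpus = []
    for rec in records:
        key = (rec["title"], rec["context"])
        if key in seen:
            continue  # a later question on an already selected paragraph
        seen.add(key)
        corpus.append(
            {
                "theme": rec["title"],
                "query": rec["question"],
                "context": rec["context"],
            }
        )
    return corpus


def negative_kind(query_item, candidate_item):
    """Classify a candidate context relative to the query of `query_item`."""
    if candidate_item["context"] == query_item["context"]:
        return "positive"
    if candidate_item["theme"] == query_item["theme"]:
        return "hard negative"
    return "simple negative"


# Toy records standing in for the SQuAD "validation" split.
records = [
    {"title": "Rhine", "context": "ctx A", "question": "q1"},
    {"title": "Rhine", "context": "ctx A", "question": "q2"},
    {"title": "Rhine", "context": "ctx B", "question": "q3"},
    {"title": "Oxygen", "context": "ctx C", "question": "q4"},
]
corpus = build_eval_corpus(records)
```

Every query is then ranked against all contexts of the corpus, so each query faces one positive, the other contexts of its theme as hard negatives, and the remaining contexts as simple negatives.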

	First, the evaluation scores were computed with the query and the context in the same language (French/French).

	| Model (French/French) | Top-mean | Top-std | Top-1 (%) | Top-10 (%) | Top-100 (%) | MRR (x100) | mean score Top | std score Top |
	|:-----------------------------:|:----------:|:---------:|:---------:|:----------:|:-----------:|:----------:|:----------------:|:---------------:|
	| BM25 | 14.47 | 92.19 | 69.77 | 92.03 | 98.09 | 77.74 | NA | NA |
	| [CamemBERT](https://huggingface.co./antoinelouis/crossencoder-camembert-base-mmarcoFR) | 5.72 | 36.88 | 69.35 | 95.51 | 98.92 | 79.51 | 0.83 | 0.37 |
	| [DistilCamemBERT](https://huggingface.co./antoinelouis/crossencoder-distilcamembert-mmarcoFR) | 5.54 | 25.90 | 66.11 | 92.77 | 99.17 | 76.00 | 0.80 | 0.39 |
	| [mMiniLMv2-L12](https://huggingface.co./antoinelouis/crossencoder-mMiniLMv2-L12-mmarcoFR) | 4.43 | 30.27 | 71.51 | 95.68 | 99.42 | 80.17 | 0.78 | 0.38 |
	| [RoBERTa (multilingual)](https://huggingface.co./abbasgolestani/ag-nli-DeTS-sentence-similarity-v2) | 15.13 | 60.39 | 57.23 | 83.87 | 96.18 | 66.21 | 0.53 | 0.11 |
	| [cmarkea/bloomz-560m-reranking](https://huggingface.co./cmarkea/bloomz-560m-reranking) | 1.49 | 2.58 | 83.55 | 99.17 | 100 | 89.98 | 0.93 | 0.15 |
	| [cmarkea/bloomz-3b-reranking](https://huggingface.co./cmarkea/bloomz-3b-reranking) | 1.22 | 1.06 | 89.37 | 99.75 | 100 | 93.79 | 0.94 | 0.10 |
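The card does not define the table columns, so the sketch below rests on an assumed reading: Top-mean and Top-std are the mean and standard deviation of the rank obtained by the expected context, Top-k (%) is the share of queries whose expected context is ranked within the first k, and MRR is the mean reciprocal rank. The function names and the use of the population standard deviation are also assumptions.

```python
import statistics


def rank_of_expected(scores, expected_index):
    """1-based rank of the expected context when sorting scores descending."""
    expected = scores[expected_index]
    return 1 + sum(s > expected for s in scores)


def ranking_metrics(ranks, ks=(1, 10, 100)):
    """Summarize the 1-based ranks of the expected context (one per query)."""
    n = len(ranks)
    metrics = {
        "top_mean": statistics.mean(ranks),
        "top_std": statistics.pstdev(ranks),  # population std (assumption)
        "mrr_x100": 100 * sum(1 / r for r in ranks) / n,
    }
    for k in ks:
        # Percentage of queries whose expected context is in the first k.
        metrics[f"top_{k}_pct"] = 100 * sum(r <= k for r in ranks) / n
    return metrics


# Toy example: ranks obtained by the expected context for four queries.
metrics = ranking_metrics([1, 1, 2, 4])
```

Under this reading, a Top-mean close to 1 with a small Top-std means the expected context is almost always ranked first.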


	Then, we evaluated the model in a cross-language context, with queries in French and contexts in English.

	| Model (French/English) | Top-mean | Top-std | Top-1 (%) | Top-10 (%) | Top-100 (%) | MRR (x100) | mean score Top | std score Top |
	|:-----------------------------:|:----------:|:---------:|:---------:|:----------:|:-----------:|:----------:|:----------------:|:---------------:|
	| BM25 | 288.04 | 371.46 | 21.93 | 41.93 | 55.15 | 28.41 | NA | NA |
	| [CamemBERT](https://huggingface.co./antoinelouis/crossencoder-camembert-base-mmarcoFR) | 12.20 | 61.39 | 59.55 | 89.71 | 97.42 | 70.38 | 0.65 | 0.47 |
	| [DistilCamemBERT](https://huggingface.co./antoinelouis/crossencoder-distilcamembert-mmarcoFR) | 40.97 | 104.78 | 25.66 | 64.78 | 88.62 | 38.83 | 0.53 | 0.49 |
	| [mMiniLMv2-L12](https://huggingface.co./antoinelouis/crossencoder-mMiniLMv2-L12-mmarcoFR) | 6.91 | 32.16 | 59.88 | 89.95 | 99.09 | 70.39 | 0.61 | 0.46 |
	| [RoBERTa (multilingual)](https://huggingface.co./abbasgolestani/ag-nli-DeTS-sentence-similarity-v2) | 79.32 | 153.62 | 27.91 | 49.50 | 78.16 | 35.41 | 0.40 | 0.12 |
	| [cmarkea/bloomz-560m-reranking](https://huggingface.co./cmarkea/bloomz-560m-reranking) | 1.51 | 1.92 | 81.89 | 99.09 | 100 | 88.64 | 0.92 | 0.15 |
	| [cmarkea/bloomz-3b-reranking](https://huggingface.co./cmarkea/bloomz-3b-reranking) | 1.22 | 0.98 | 89.20 | 99.84 | 100 | 93.63 | 0.94 | 0.10 |

	As observed, the cross-language context does not significantly impact the behavior of our models. When the model is used to rerank and filter the
	Top-K results of a search, a threshold of 0.8 can be applied to the contexts returned by the retriever, thereby reducing the noise in the contexts
	passed to RAG-type applications.

	How to Use Bloomz-3b-reranking
	------------------------------

	The following example uses the Pipeline API of the Transformers library.

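	```python
	from typing import List

	from transformers import pipeline

	# The filtering below relies on the `label`/`score` output of the
	# 'text-classification' pipeline; top_k=None returns every label's score.
	reranker = pipeline(
	    task='text-classification',
	    model='cmarkea/bloomz-3b-reranking',
	    top_k=None
	)

	query: str = "Quelle est la capitale de la France ?"  # example query
	contexts: List[str] = [  # example contexts returned by a retriever
	    "Paris is the capital and most populous city of France.",
	    "The Rhine is one of the major European rivers.",
	]

	similarities = reranker(
	    [
	        dict(
	            text=context,    # the model was trained with context in `text`
	            text_pair=query  # and query in `text_pair` argument.
	        )
	        for context in contexts
	    ]
	)

	# Each result lists the scores of both labels; keep the LABEL_1 score.
	scores = [
	    next(item['score'] for item in result if item['label'] == 'LABEL_1')
	    for result in similarities
	]

	# Sort the contexts from most to least relevant.
	contexts_reranked = sorted(
	    zip(scores, contexts),
	    key=lambda x: x[0],
	    reverse=True
	)

	# Keep only the contexts scoring at least 0.8 (possibly none).
	kept = [x for x in contexts_reranked if x[0] >= 0.8]
	score, contexts_cleaned = zip(*kept) if kept else ((), ())
	```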

	Citation
	--------

	```bibtex
	@online{DeBloomzReranking,
	AUTHOR = {Cyrile Delestre},
	ORGANIZATION = {Cr{\'e}dit Mutuel Ark{\'e}a},
	URL = {https://huggingface.co./cmarkea/bloomz-3b-reranking},
	YEAR = {2024},
	KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
	}
	```