Introduction
We introduce NV-Retriever-v1, an embedding model optimized for retrieval. It achieves the highest score, 60.9, on the 15 retrieval tasks of the MTEB retrieval benchmark (as of July 12, 2024).
This model is ready for non-commercial use.
For commercial use, the models provided as NeMo Retriever Microservices (NIMs) may be used. They are trained with the same techniques but on different datasets.
Technical details can be found in our paper: NV-Retriever: Improving text embedding models with effective hard-negative mining
How to use
You must set trust_remote_code=True when loading the model, because it contains a custom module that enables bidirectional attention and applies masked mean pooling.
import torch
from transformers import AutoTokenizer, AutoModel
# trust_remote_code=True loads the custom module (bidirectional attention + masked mean pooling)
tokenizer = AutoTokenizer.from_pretrained('nvidia/NV-Retriever-v1')
model = AutoModel.from_pretrained('nvidia/NV-Retriever-v1', trust_remote_code=True)
# Each prefix ends with a colon and a space so the custom code can locate token 28747
query_prefix = 'Given a web search query, retrieve relevant passages that answer the query: '
document_prefix = 'passage: '
queries = [
    "how much protein should a female eat",
    "summit define",
]
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments."
]
# Prepend the instruction prefix to every query and document
queries = [f"{query_prefix} {query}" for query in queries]
documents = [f"{document_prefix} {document}" for document in documents]
# Tokenize; inputs longer than the model's 512-token limit are truncated
batch_queries = tokenizer(queries, padding=True, truncation=True, max_length=512, return_tensors='pt')
batch_documents = tokenizer(documents, padding=True, truncation=True, max_length=512, return_tensors='pt')
# Inference only: disable gradient tracking
with torch.no_grad():
    embeddings_queries = model(**batch_queries)
    embeddings_documents = model(**batch_documents)
# Similarity matrix: rows are queries, columns are documents
scores = (embeddings_queries @ embeddings_documents.T)
print(scores.tolist())
# [[0.6778843402862549, -0.03561091050505638], [-0.05117562413215637, 0.7305730581283569]]
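The score matrix has one row per query and one column per document, so ranking documents for each query is a per-row sort. The snippet below is a minimal sketch in plain Python; rank_documents is a hypothetical helper, not part of the model's code, and the scores are copied from the example output shown above.

```python
from typing import List


def rank_documents(scores: List[List[float]], top_k: int = 10) -> List[List[int]]:
    """For each query row, return document indices sorted by descending score."""
    rankings = []
    for row in scores:
        # Sort column (document) indices by their score, highest first
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        rankings.append(order[:top_k])
    return rankings


# Score matrix copied from the example output above
scores = [[0.6778843402862549, -0.03561091050505638],
          [-0.05117562413215637, 0.7305730581283569]]
print(rank_documents(scores, top_k=1))  # [[0], [1]]
```

With the original tensor, scores.topk(k, dim=1).indices returns the same indices, up to tie-breaking.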
NV-Retriever-v1 Team:
- Mengyao Xu
- Gabriel Moreira
- Radek Osmulski
- Ronay Ak
- Benedikt Schifferer
- Even Oldridge
Correspondence to
Benedikt Schifferer ([email protected])
Citation
@misc{moreira2024nvretrieverimprovingtextembedding,
title={NV-Retriever: Improving text embedding models with effective hard-negative mining},
author={Gabriel de Souza P. Moreira and Radek Osmulski and Mengyao Xu and Ronay Ak and Benedikt Schifferer and Even Oldridge},
year={2024},
eprint={2407.15831},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2407.15831},
}
License
Use of this model is governed by the NVIDIA license agreement. By downloading the release version of the model, you accept the terms and conditions of these licenses. For each dataset a user elects to use, the user is responsible for checking whether the dataset license is fit for the intended purpose.
Troubleshooting
1. Access to model nvidia/NV-Retriever-v1 is restricted. You must be authenticated to access it
Use your Hugging Face access token to run huggingface-cli login. You can get a User Access Token from your Settings page.
2. Instruction Prompt Templates
NV-Retriever-v1 uses query and document prefixes similar to those in Improving Text Embeddings with Large Language Models (https://arxiv.org/pdf/2401.00368). Unlike that work, it does not use the template with “Instruct:” and “Query:” (f'Instruct: {task_description}\nQuery: {query}'); the query prefix is only “{task_description}: ”. The prefix must end with a colon (“:”) followed by a space. The document prefix, “passage: ”, is the same for every task.
Example:
query = f"Given a web search query, retrieve relevant passages that answer the query: {query}"
document = f"passage: {document}"
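The template rules can be captured in two small helpers. This is a sketch under assumptions: build_query and build_document are hypothetical names, and the default task description is the web-search one from the example; any other task description is something you would have to choose yourself.

```python
# Default task description taken from the example above (without the trailing ": ")
WEB_SEARCH_TASK = "Given a web search query, retrieve relevant passages that answer the query"


def build_query(query: str, task_description: str = WEB_SEARCH_TASK) -> str:
    """Format a query as "{task_description}: {query}"."""
    # Drop any trailing spaces/colon so the colon is attached directly to the last word
    # (a space before the colon would produce a different token ID than 28747)
    task = task_description.rstrip().rstrip(":").rstrip()
    return f"{task}: {query}"


def build_document(document: str) -> str:
    """The document prefix "passage: " is the same for every task."""
    return f"passage: {document}"


print(build_query("summit define"))
print(build_document("Definition of summit ..."))
```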
3. User Warning About Prompt
NV-Retriever-v1 expects an instruction prompt prefix on every query and document. The custom code modifies the attention_mask so that mean pooling covers only the actual text, not the prefix: it looks for token ID 28747 and masks out every position before the first occurrence of 28747.
Because both queries and documents require a prefix containing token ID 28747, the model emits a warning if that token is not present in an input. This most likely means the model is being used incorrectly.
Token ID 28747 is the character “:” attached directly to the preceding word, as in “query: ”, “passage: ” or “Represent this query: ”. If the input is “query :”, with a space before the colon, the colon is tokenized to a different ID. Because the custom code looks only for the first occurrence of 28747 in the input, colons inside the query or document content do not interfere.
UserWarning: Input does not contain special token 28747 to mask out instruction prompt. Please check if prefix are applied, correctly
  warnings.warn(f"Input does not contain special token {sep_token_id} to mask out instruction prompt. Please check if prefix are applied, correctly")
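The masking behaviour described above can be illustrated in plain Python. This is a hypothetical sketch, not the model's custom code: it works on Python lists rather than tensors, the function names are illustrative, and it assumes, following the wording above, that positions strictly before the first 28747 are masked (whether the real code also masks the colon itself is not stated).

```python
import warnings
from typing import List

SEP_TOKEN_ID = 28747  # ID of ":" attached to the preceding word, per this model card


def mask_instruction_prefix(input_ids: List[int], attention_mask: List[int],
                            sep_token_id: int = SEP_TOKEN_ID) -> List[int]:
    """Zero the pooling mask for every position before the first sep_token_id."""
    if sep_token_id not in input_ids:
        # No prefix found: keep the mask unchanged and warn, as the model card describes
        warnings.warn(f"Input does not contain special token {sep_token_id} "
                      "to mask out instruction prompt.")
        return list(attention_mask)
    first = input_ids.index(sep_token_id)  # only the first occurrence matters
    return [0 if i < first else m for i, m in enumerate(attention_mask)]


def masked_mean_pool(hidden_states: List[List[float]], mask: List[int]) -> List[float]:
    """Average the hidden vectors at positions where mask == 1."""
    kept = [h for h, m in zip(hidden_states, mask) if m]
    if not kept:
        raise ValueError("mask selects no positions")
    dim = len(hidden_states[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]
```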
4. Multi-GPU support
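NV-Retriever-v1 supports multi-GPU inference with torch.nn.DataParallel:
import torch
# Wrap the model loaded above; DataParallel splits each input batch across all visible GPUs
model = torch.nn.DataParallel(model).cuda()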
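If the same script must also run on single-GPU or CPU-only machines, the DataParallel wrap can be made conditional. A minimal sketch; wrap_for_multi_gpu is a hypothetical helper and the fallback policy is an assumption, not part of the model card.

```python
import torch


def wrap_for_multi_gpu(model: torch.nn.Module) -> torch.nn.Module:
    """Use DataParallel when several GPUs are visible; otherwise one GPU or the CPU."""
    n_gpus = torch.cuda.device_count()
    if n_gpus > 1:
        # DataParallel replicates the module on each GPU and splits batches along dim 0
        return torch.nn.DataParallel(model).cuda()
    if n_gpus == 1:
        return model.cuda()
    return model  # CPU fallback: leave the model where it is
```

In the single-GPU branch, the tokenized batch must be moved to the GPU as well (for example with batch.to('cuda')); with DataParallel, the wrapper scatters each batch to the GPUs itself.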
Intended use
The NV-Retriever-v1 model is designed for users who need a high-performance embedding model for retrieval tasks.
Model Architecture
Architecture Type: Decoder-only bidirectional LLM
Network Architecture: Mistral-7B-v0.1 with Bidirectional attention masking
Pooling Type: Average (mean) pooling
Embedding Dimension: 4096
Max Input Tokens: 512
The NV-Retriever-v1 Model is based on the Mistral-7B-v0.1 architecture with a bidirectional attention masking mechanism.
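To make “bidirectional attention masking” concrete: a decoder such as Mistral normally uses a causal mask, where each token attends only to itself and earlier tokens, while a bidirectional mask lets every token attend to every non-padding token. The sketch below builds both masks as nested lists for illustration only; the function names are hypothetical, and the model's custom module applies this idea inside the transformer layers with tensors, so its exact code may differ.

```python
from typing import List


def causal_mask(padding_mask: List[int]) -> List[List[int]]:
    """Decoder-style mask: row i may attend to column j only if j <= i and j is not padding."""
    n = len(padding_mask)
    return [[1 if j <= i and padding_mask[j] else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(padding_mask: List[int]) -> List[List[int]]:
    """Bidirectional mask: every row may attend to every non-padding column."""
    n = len(padding_mask)
    return [[1 if padding_mask[j] else 0 for j in range(n)] for _ in range(n)]


# Two real tokens followed by one padding token
print(causal_mask([1, 1, 0]))         # [[1, 0, 0], [1, 1, 0], [1, 1, 0]]
print(bidirectional_mask([1, 1, 0]))  # [[1, 1, 0], [1, 1, 0], [1, 1, 0]]
```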
Input
Input Type: Text
Input Format: List of strings, each prefixed with a task-specific instruction
Output
Output Type: Floats
Output Format: List of float arrays
Other Properties Related to Output: Each array is the 4096-dimensional embedding of the corresponding input string
Model Version(s)
NV-Retriever-v1
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
Evaluation results
- main_score on MTEB ArguAna, test set (self-reported): 68.277
- map_at_1 on MTEB ArguAna, test set (self-reported): 44.666
- map_at_10 on MTEB ArguAna, test set (self-reported): 60.300
- map_at_100 on MTEB ArguAna, test set (self-reported): 60.692
- map_at_1000 on MTEB ArguAna, test set (self-reported): 60.693
- map_at_20 on MTEB ArguAna, test set (self-reported): 60.645
- map_at_3 on MTEB ArguAna, test set (self-reported): 56.472
- map_at_5 on MTEB ArguAna, test set (self-reported): 58.780
- mrr_at_1 on MTEB ArguAna, test set (self-reported): 45.092
- mrr_at_10 on MTEB ArguAna, test set (self-reported): 60.493