---
license: apache-2.0
language:
- en
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: sentence-similarity
library_name: transformers
---

# gte-reranker-modernbert-base

We are excited to introduce the `gte-modernbert` series of models, built on the ModernBERT pre-trained encoder-only foundation models. The series includes both text embedding models and reranking models.

The `gte-modernbert` models demonstrate competitive performance on several text embedding and text retrieval benchmarks, including **MTEB**, **LoCo**, and **CoIR**, when compared with similarly sized models from the open-source community.

## Model Overview

- Developed by: Tongyi Lab, Alibaba Group
- Model Type: Text Reranker
- Primary Language: English
- Model Size: 149M
- Max Input Length: 8192 tokens

### Model list
| Models | Language | Model Type | Model Size | Max Seq. Length | Dimension | MTEB-en | BEIR | LoCo | CoIR |
|:------:|:--------:|:----------:|:----------:|:---------------:|:---------:|:-------:|:----:|:----:|:----:|
| [`gte-modernbert-base`](https://huggingface.co./Alibaba-NLP/gte-modernbert-base) | English | text embedding | 149M | 8192 | 768 | 64.29 | 55.33 | 87.57 | 77.69 |
| [`gte-reranker-modernbert-base`](https://huggingface.co./Alibaba-NLP/gte-reranker-modernbert-base) | English | text reranker | 149M | 8192 | - | - | 56.19 | 90.68 | 79.31 |

## Usage

Use with `Transformers`

```python
# Requires transformers>=4.48.0

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name_or_path = 'Alibaba-NLP/gte-reranker-modernbert-base'
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name_or_path, trust_remote_code=True,
    torch_dtype=torch.float16
)
model.eval()

pairs = [["what is the capital of China?", "Beijing"], ["how to implement quick sort in python?","Introduction of quick sort"], ["how to implement quick sort in python?", "The weather is nice today"]]

with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)

# tensor([1.2315, 0.5923, 0.3041])
```
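
The `logits` printed above are raw, unbounded relevance scores. If a value between 0 and 1 is easier to threshold or display, one common option is to pass each logit through a logistic sigmoid. The snippet below is a minimal sketch of that post-processing step, assuming the single-logit output shown above. The helper names and the example logits are hypothetical and are not part of the model's API.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def logits_to_probabilities(logits):
    """Map raw relevance logits to values in (0, 1), preserving their order."""
    return [sigmoid(float(x)) for x in logits]


# Hypothetical logits, chosen only to illustrate the mapping.
example_logits = [2.0, 0.0, -2.0]
probabilities = logits_to_probabilities(example_logits)
print([round(p, 4) for p in probabilities])
# [0.8808, 0.5, 0.1192]
```

Because the sigmoid is monotonic, the ranking of the pairs is unchanged; only the scale of the scores differs. With the Transformers example, `scores.tolist()` could be passed in as `logits`.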

Use with `sentence-transformers`:

Before you start, install the sentence-transformers library:
```
pip install sentence-transformers
```


```python
# Requires sentence_transformers>=2.7.0
from sentence_transformers import CrossEncoder

model_name_or_path = 'Alibaba-NLP/gte-reranker-modernbert-base'

model = CrossEncoder(
    model_name_or_path,
    automodel_args={"torch_dtype": "auto"},
    trust_remote_code=True,
)

pairs = [["what is the capital of China?", "Beijing"], ["how to implement quick sort in python?","Introduction of quick sort"], ["how to implement quick sort in python?", "The weather is nice today"]]

# Score each (query, document) pair; higher means more relevant.
scores = model.predict(pairs, convert_to_tensor=True).tolist()

print("scores:", scores)
```
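
In a retrieval pipeline, a reranker is typically applied to one query and a list of candidate documents, and the candidates are then sorted by score. The following is a minimal sketch of that step. The `rerank` helper and the `toy_overlap_score` stand-in scorer are hypothetical names introduced for illustration; the stand-in only counts shared words so that the example runs without downloading the model.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(
    query: str,
    documents: Sequence[str],
    score_fn: Callable[[List[List[str]]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return documents sorted by score."""
    pairs = [[query, doc] for doc in documents]
    scores = [float(s) for s in score_fn(pairs)]
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def toy_overlap_score(pairs: List[List[str]]) -> List[float]:
    """Stand-in scorer (NOT the model): number of words shared by query and document."""
    def words(text: str) -> set:
        return set(re.findall(r"\w+", text.lower()))

    return [float(len(words(q) & words(d))) for q, d in pairs]


query = "how to implement quick sort in python?"
candidates = [
    "Introduction of quick sort",
    "The weather is nice today",
    "Quick sort in python, step by step",
]

top_results = rerank(query, candidates, toy_overlap_score, top_k=2)
for doc, score in top_results:
    print(f"{score:.1f}\t{doc}")
```

To use the actual reranker, `model.predict` from the sentence-transformers example can be passed as `score_fn` in place of the stand-in, since it accepts a list of `[query, document]` pairs and returns one score per pair.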

## Training Details

The `gte-modernbert` series follows the training scheme of the previous [GTE models](https://huggingface.co./collections/Alibaba-NLP/gte-models-6680f0b13f885cb431e6d469), with one difference: the pre-trained base language model has changed from [GTE-MLM](https://huggingface.co./Alibaba-NLP/gte-en-mlm-base) to [ModernBERT](https://huggingface.co./answerdotai/ModernBERT-base). For more training details, please refer to our paper: [mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval](https://aclanthology.org/2024.emnlp-industry.103/).

## Evaluation

### MTEB

The results for other models are taken from the [MTEB leaderboard](https://huggingface.co./spaces/mteb/leaderboard). Because all models in the `gte-modernbert` series have fewer than 1B parameters, we compare only against leaderboard models under 1B parameters.

|                                            Model Name                                            | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) |  STS (10)   | Summ. (1) |
|:------------------------------------------------------------------------------------------------:|:--------------:|:---------:|:---------------:|:------------:|:-----------:|:---:|:---:|:---:|:---:|:-----------:|:--------:|
|        [mxbai-embed-large-v1](https://huggingface.co./mixedbread-ai/mxbai-embed-large-v1)         |      335       |   1024    |       512       |    64.68     |    75.64    | 46.71 | 87.2 | 60.11 | 54.39 |     85      |   32.71  |
| [multilingual-e5-large-instruct](https://huggingface.co./intfloat/multilingual-e5-large-instruct) |      560       |   1024    |       514       |    64.41     |    77.56    | 47.1 | 86.19 | 58.58 | 52.47 |    84.78    |   30.39  |
|                [bge-large-en-v1.5](https://huggingface.co./BAAI/bge-large-en-v1.5)                |      335       |   1024    |       512       |    64.23     |    75.97    | 46.08 | 87.12 | 60.03 | 54.29 |    83.11    |   31.61  |
|             [gte-base-en-v1.5](https://huggingface.co./Alibaba-NLP/gte-base-en-v1.5)              |      137       |    768    |      8192       |  **64.11**   |    77.17    | 46.82 | 85.33 | 57.66 | 54.09 |    81.97    |   31.17  |
|                 [bge-base-en-v1.5](https://huggingface.co./BAAI/bge-base-en-v1.5)                 |      109       |    768    |       512       |    63.55     |    75.53    | 45.77 | 86.55 | 58.86 | 53.25 |    82.4     |   31.07  |
|            [gte-large-en-v1.5](https://huggingface.co./Alibaba-NLP/gte-large-en-v1.5)             |      409       |   1024    |      8192       |    65.39     |    77.75    | 47.95 | 84.63 | 58.50 | 57.91 |    81.43    |   30.91  |
| [modernbert-embed-base](https://huggingface.co./nomic-ai/modernbert-embed-base) |      149       |    768    |      8192       |    62.62     |    74.31    | 44.98 | 83.96 | 56.42 | 52.89 |    81.78    |   31.39  |
| [nomic-embed-text-v1.5](https://huggingface.co./nomic-ai/nomic-embed-text-v1.5) |                |    768    |      8192       |    62.28     |    73.55    | 43.93 | 84.61 | 55.78 | 53.01 |    81.94    |   30.4   |
| [gte-multilingual-base](https://huggingface.co./Alibaba-NLP/gte-multilingual-base) |      305       |    768    |       8192      |     61.4     | 70.89 | 44.31 | 84.24 | 57.47 |51.08 |    82.11    |   30.58  | 
| [jina-embeddings-v3](https://huggingface.co./jinaai/jina-embeddings-v3) | 572 |   1024    |      8192  |       65.51 | 82.58 |45.21 |84.01 |58.13 |53.88 | 85.81 |   29.71  | 
| [gte-modernbert-base](https://huggingface.co./Alibaba-NLP/gte-modernbert-base) | 149 |   768    |      8192  |   64.29 | 76.32 | 45.31 | 86.49 | 58.33 | 55.33 | 83.41 | 29.17 |


### LoCo (Long Document Retrieval)

| Model Name | Dimension | Sequence Length | Average (5) | QsmsumRetrieval | SummScreenRetrieval | QasperAbstractRetrieval | QasperTitleRetrieval | GovReportRetrieval |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| [gte-qwen1.5-7b](https://huggingface.co./Alibaba-NLP/gte-qwen1.5-7b) | 4096 | 32768 |  87.57 | 49.37 | 93.10 | 99.67 | 97.54 | 98.21 | 
| [gte-large-v1.5](https://huggingface.co./Alibaba-NLP/gte-large-v1.5) |1024 | 8192 | 86.71 | 44.55 | 92.61 | 99.82 | 97.81 | 98.74 |
| [gte-base-v1.5](https://huggingface.co./Alibaba-NLP/gte-base-v1.5) | 768 | 8192 | 87.44 | 49.91  | 91.78 | 99.82 | 97.13 | 98.58 |
| [gte-modernbert-base](https://huggingface.co./Alibaba-NLP/gte-modernbert-base) | 768 | 8192 | 88.88 | 54.45 | 93.00 | 99.82 | 98.03 | 98.70 |
| [gte-reranker-modernbert-base](https://huggingface.co./Alibaba-NLP/gte-reranker-modernbert-base) | - | 8192 | 90.68 | 70.86 | 94.06 | 99.73 | 99.11 | 89.67 | 

### CoIR (Code Retrieval Task)

| Model Name | Dimension | Sequence Length | Average(20) | CodeSearchNet-ccr-go | CodeSearchNet-ccr-java | CodeSearchNet-ccr-javascript | CodeSearchNet-ccr-php | CodeSearchNet-ccr-python | CodeSearchNet-ccr-ruby | CodeSearchNet-go | CodeSearchNet-java | CodeSearchNet-javascript | CodeSearchNet-php | CodeSearchNet-python | CodeSearchNet-ruby | apps | codefeedback-mt | codefeedback-st | codetrans-contest | codetrans-dl | cosqa | stackoverflow-qa | synthetic-text2sql |
|:----:|:---:|:---:|:---:|:---:| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [gte-modernbert-base](https://huggingface.co./Alibaba-NLP/gte-modernbert-base) | 768 | 8192 | 77.26 | 95.15 | 94.75 | 96.55 | 91.64 | 95.31 | 90.71 | 86.41 | 79.09 | 97.66 | 80.22 | 42.05 | 55.2 | 84.77 | 52.53 |
| [gte-reranker-modernbert-base](https://huggingface.co./Alibaba-NLP/gte-reranker-modernbert-base) | - | 8192 | 79.31 | 94.15 | 93.57 | 94.27 | 91.51 | 93.93 | 90.63 | 88.32 | 83.27 | 76.05 | 85.12 | 88.16 | 77.59 | 57.54 | 82.34 | 85.95 | 71.89 |



### BEIR

| Model Name | Dimension | Sequence Length | Average(15) | ArguAna	| ClimateFEVER	| CQADupstackAndroidRetrieval	| DBPedia	| FEVER	| FiQA2018	| HotpotQA	| MSMARCO	| NFCorpus	| NQ	| QuoraRetrieval	| SCIDOCS	| SciFact	| Touche2020	| TRECCOVID |
| :----: | :----: | :----: |  :----: | :----: | :---: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| [gte-modernbert-base](https://huggingface.co./Alibaba-NLP/gte-modernbert-base) | 768 | 8192 | 55.33 | 72.68 | 37.74 | 42.63 | 41.79 | 91.03 | 48.81 | 69.47 | 40.9 | 36.44 | 57.62 | 88.55 | 21.29 | 77.4 | 21.68 | 81.95 |
| [gte-reranker-modernbert-base](https://huggingface.co./Alibaba-NLP/gte-reranker-modernbert-base) | - | 8192 |  | 69.03 | 37.79 | 44.68 | 47.23 | 94.54 | 49.81 | 78.16 | 45.38 | 30.69 | 64.57 | 87.77 | 20.60 | 73.57 | 27.36 | 79.89 |

## Citation

If you find our paper or models helpful, please consider citing:

```
@inproceedings{zhang2024mgte,
  title={mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval},
  author={Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Wen and Dai, Ziqi and Tang, Jialong and Lin, Huan and Yang, Baosong and Xie, Pengjun and Huang, Fei and others},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track},
  pages={1393--1412},
  year={2024}
}

@article{li2023towards,
  title={Towards general text embeddings with multi-stage contrastive learning},
  author={Li, Zehan and Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Pengjun and Zhang, Meishan},
  journal={arXiv preprint arXiv:2308.03281},
  year={2023}
}
```