Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ pipeline_tag: sentence-similarity
 library_name: transformers
 ---

-# gte-modernbert-base
+# gte-reranker-modernbert-base

 We are excited to introduce the `gte-modernbert` series of models, which are built upon the latest ModernBERT pre-trained encoder-only foundation model. The series includes both text embedding models and reranking models.

@@ -79,32 +79,6 @@ embeddings = model.encode(sentences)
 print(cos_sim(embeddings[0], embeddings[1]))
 ```

-Use with `transformers.js`:
-
-```js
-// npm i @xenova/transformers
-import { pipeline, dot } from '@xenova/transformers';
-
-// Create feature extraction pipeline
-const extractor = await pipeline('feature-extraction', 'Alibaba-NLP/gte-modernbert-base', {
-  quantized: false, // Comment out this line to use the quantized version
-});
-
-// Generate sentence embeddings
-const sentences = [
-  "what is the capital of China?",
-  "how to implement quick sort in python?",
-  "Beijing",
-  "sorting algorithms"
-];
-const output = await extractor(sentences, { normalize: true, pooling: 'cls' });
-
-// Compute similarity scores
-const [source_embeddings, ...document_embeddings] = output.tolist();
-const similarities = document_embeddings.map(x => 100 * dot(source_embeddings, x));
-console.log(similarities);
-```
-
 ## Training Details

 The `gte-modernbert` series of models follows the training scheme of the previous [GTE models](https://huggingface.co/collections/Alibaba-NLP/gte-models-6680f0b13f885cb431e6d469), with the only difference being that the pre-trained language model base has been changed from [GTE-MLM](https://huggingface.co/Alibaba-NLP/gte-en-mlm-base) to [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base). For more training details, please refer to our paper: [mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval](https://aclanthology.org/2024.emnlp-industry.103/).
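
Since this revision retitles the card to `gte-reranker-modernbert-base`, below is a minimal sketch of scoring query-document pairs with a reranker from this series. It assumes the checkpoint loads through the standard `AutoModelForSequenceClassification` interface and emits a single relevance logit per pair; nothing in the diff above confirms these details, so treat the snippet as illustrative rather than the card's official usage example.

```python
# Minimal sketch: scoring query-document pairs with the reranker.
# Assumes the checkpoint exposes a standard single-logit
# sequence-classification head (an assumption, not shown in this diff).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Alibaba-NLP/gte-reranker-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Query-document pairs, reusing the sentences from the removed JS example
pairs = [
    ["what is the capital of China?", "Beijing"],
    ["what is the capital of China?", "sorting algorithms"],
]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")
    scores = model(**inputs).logits.view(-1).float()  # higher = more relevant

print(scores)  # the "Beijing" pair should score well above the unrelated one
```

Unlike the removed embedding snippet, which compared independently encoded vectors with a dot product, a reranker reads each query-document pair jointly before emitting its relevance score.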