thenlper committed
Commit 04b85dc · verified · 1 parent: f687ac3

Update README.md

Files changed (1)
  1. README.md +1 -27
README.md CHANGED
@@ -8,7 +8,7 @@ pipeline_tag: sentence-similarity
  library_name: transformers
  ---
 
- # gte-modernbert-base
+ # gte-reranker-modernbert-base
 
  We are excited to introduce the `gte-modernbert` series of models, which are built upon the latest modernBERT pre-trained encoder-only foundation models. The `gte-modernbert` series models include both text embedding models and rerank models.
 
@@ -79,32 +79,6 @@ embeddings = model.encode(sentences)
  print(cos_sim(embeddings[0], embeddings[1]))
  ```
 
- Use with `transformers.js`:
-
- ```js
- // npm i @xenova/transformers
- import { pipeline, dot } from '@xenova/transformers';
-
- // Create feature extraction pipeline
- const extractor = await pipeline('feature-extraction', 'Alibaba-NLP/gte-modernbert-base', {
-   quantized: false, // Comment out this line to use the quantized version
- });
-
- // Generate sentence embeddings
- const sentences = [
-   "what is the capital of China?",
-   "how to implement quick sort in python?",
-   "Beijing",
-   "sorting algorithms"
- ];
- const output = await extractor(sentences, { normalize: true, pooling: 'cls' });
-
- // Compute similarity scores
- const [source_embeddings, ...document_embeddings] = output.tolist();
- const similarities = document_embeddings.map(x => 100 * dot(source_embeddings, x));
- console.log(similarities);
- ```
-
  ## Training Details
 
  The `gte-modernbert` series of models follows the training scheme of the previous [GTE models](https://huggingface.co/collections/Alibaba-NLP/gte-models-6680f0b13f885cb431e6d469), with the only difference being that the pre-training language model base has been replaced from [GTE-MLM](https://huggingface.co/Alibaba-NLP/gte-en-mlm-base) to [ModernBert](https://huggingface.co/answerdotai/ModernBERT-base). For more training details, please refer to our paper: [mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval](https://aclanthology.org/2024.emnlp-industry.103/)
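
The retitled README describes the reranker variant of the series. Unlike the embedding model (whose `transformers.js` feature-extraction example is dropped above), a reranker is a cross-encoder that scores a query-document pair directly rather than producing embeddings. Below is a minimal sketch of that usage pattern, assuming the standard `AutoModelForSequenceClassification` interface that other GTE rerankers expose; the model id, the pair format, and the `max_length` value are assumptions for illustration, not taken from this commit.

```python
# Minimal sketch: scoring query-document pairs with a cross-encoder reranker.
# Assumes the model loads with a standard sequence-classification head, as
# other GTE rerankers do; model id and max_length are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Alibaba-NLP/gte-reranker-modernbert-base"  # assumed from the new title
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Query-document pairs, reusing the sentences from the removed embedding example.
pairs = [
    ["what is the capital of China?", "Beijing"],
    ["how to implement quick sort in python?", "sorting algorithms"],
]

with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True,
                       return_tensors="pt", max_length=8192)
    scores = model(**inputs).logits.view(-1).float()

print(scores)  # one relevance score per pair; higher means more relevant
```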