Add new CrossEncoder model
- README.md +458 -0
- config.json +31 -0
- model.safetensors +3 -0
- special_tokens_map.json +7 -0
- tokenizer.json +0 -0
- tokenizer_config.json +58 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,458 @@
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:78704
- loss:ListNetLoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 201.83156300124415
  energy_consumed: 0.519244982020273
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 1.659
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results: []
---
34 |
+
|
35 |
+
# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
|
36 |
+
|
37 |
+
This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model fine-tuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes a relevance score for each pair of texts, such as a query and a candidate passage, which can be used for text reranking and semantic search.
|
38 |
+
|
39 |
+
## Model Details
|
40 |
+
|
41 |
+
### Model Description
|
42 |
+
- **Model Type:** Cross Encoder
|
43 |
+
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
|
44 |
+
- **Maximum Sequence Length:** 512 tokens
|
45 |
+
- **Number of Output Labels:** 1 label
|
46 |
+
- **Training Dataset:**
|
47 |
+
- [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
|
48 |
+
- **Language:** en
|
49 |
+
<!-- - **License:** Unknown -->
|
50 |
+
|
51 |
+
### Model Sources
|
52 |
+
|
53 |
+
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
|
54 |
+
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
|
55 |
+
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
|
56 |
+
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
|
57 |
+
|
58 |
+
## Usage
|
59 |
+
|
60 |
+
### Direct Usage (Sentence Transformers)
|
61 |
+
|
62 |
+
First install the Sentence Transformers library:
|
63 |
+
|
```bash
pip install -U sentence-transformers
```
67 |
+
|
68 |
+
Then you can load this model and run inference.
|
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-sigmoid-scale-10")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
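The trailing comment in the usage example shows that `rank` returns a list of `corpus_id`/`score` dictionaries. A minimal sketch of how such a ranking could be derived from raw `predict` scores, assuming that a higher score means a more relevant text. The helper `rank_by_score` and the example scores are hypothetical and are not part of the library or of this model's actual output:

```python
def rank_by_score(scores, documents=None):
    """Return candidates ordered from highest to lowest score.

    Hypothetical helper for illustration; it only mirrors the output shape
    shown in the usage example and is not the library's implementation.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = []
    for i in order:
        entry = {"corpus_id": i, "score": float(scores[i])}
        if documents is not None:
            entry["text"] = documents[i]
        ranked.append(entry)
    return ranked


# Made-up scores for three candidate passages (not real model output).
example_scores = [0.91, 0.12, 0.47]
ranking = rank_by_score(example_scores)
print([entry["corpus_id"] for entry in ranking])
```

Keeping the original index as `corpus_id` lets the caller map each ranked entry back to its position in the candidate list.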
95 |
+
|
96 |
+
<!--
|
97 |
+
### Direct Usage (Transformers)
|
98 |
+
|
99 |
+
<details><summary>Click to see the direct usage in Transformers</summary>
|
100 |
+
|
101 |
+
</details>
|
102 |
+
-->
|
103 |
+
|
104 |
+
<!--
|
105 |
+
### Downstream Usage (Sentence Transformers)
|
106 |
+
|
107 |
+
You can finetune this model on your own dataset.
|
108 |
+
|
109 |
+
<details><summary>Click to expand</summary>
|
110 |
+
|
111 |
+
</details>
|
112 |
+
-->
|
113 |
+
|
114 |
+
<!--
|
115 |
+
### Out-of-Scope Use
|
116 |
+
|
117 |
+
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
|
118 |
+
-->
|
119 |
+
|
120 |
+
## Evaluation
|
121 |
+
|
122 |
+
### Metrics
|
123 |
+
|
124 |
+
#### Cross Encoder Reranking
|
125 |
+
|
126 |
+
* Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
|
127 |
+
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator)
|
128 |
+
|
| Metric      | NanoMSMARCO          | NanoNFCorpus         | NanoNQ               |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.5122 (+0.0226)     | 0.3306 (+0.0696)     | 0.5716 (+0.1520)     |
| mrr@10      | 0.5044 (+0.0269)     | 0.5401 (+0.0403)     | 0.5754 (+0.1487)     |
| **ndcg@10** | **0.5840 (+0.0435)** | **0.3676 (+0.0425)** | **0.6431 (+0.1425)** |
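For readers unfamiliar with the three metrics reported here, the sketch below computes average precision (the per-query term of `map`), `mrr@10` and `ndcg@10` for a single query from relevance labels listed in ranked order. It uses the common textbook definitions with linear gain; the evaluator used for this card may differ in details such as tie handling, so treat this as an illustration under those assumptions and not as its implementation. All function names are hypothetical.

```python
import math


def average_precision(ranked_labels):
    """Mean of precision@rank over the ranks that hold a relevant item."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mrr_at_k(ranked_labels, k=10):
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(ranked_labels, k=10):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(
        label / math.log2(rank + 1)
        for rank, label in enumerate(ranked_labels[:k], start=1)
    )


def ndcg_at_k(ranked_labels, k=10):
    """DCG normalised by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


# One relevant passage, ranked second out of three.
labels_in_ranked_order = [0, 1, 0]
print(average_precision(labels_in_ranked_order))
print(mrr_at_k(labels_in_ranked_order))
print(ndcg_at_k(labels_in_ranked_order))
```

Dataset-level figures such as those in the table are obtained by averaging these per-query values over all queries.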
134 |
+
|
135 |
+
#### Cross Encoder Nano BEIR
|
136 |
+
|
137 |
+
* Dataset: `NanoBEIR_R100_mean`
|
138 |
+
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator)
|
139 |
+
|
| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.4715 (+0.0814)     |
| mrr@10      | 0.5400 (+0.0720)     |
| **ndcg@10** | **0.5316 (+0.0762)** |
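The `NanoBEIR_R100_mean` values are consistent with an unweighted mean of the three per-dataset results reported under Cross Encoder Reranking. A quick arithmetic check, assuming simple averaging (the variable names are illustrative only):

```python
# Per-dataset values copied from the Cross Encoder Reranking table,
# in the order NanoMSMARCO, NanoNFCorpus, NanoNQ.
per_dataset = {
    "map": [0.5122, 0.3306, 0.5716],
    "mrr@10": [0.5044, 0.5401, 0.5754],
    "ndcg@10": [0.5840, 0.3676, 0.6431],
}

# Unweighted mean per metric, rounded to the four decimals used in the card.
means = {
    metric: round(sum(values) / len(values), 4)
    for metric, values in per_dataset.items()
}
print(means)
```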
145 |
+
|
146 |
+
<!--
|
147 |
+
## Bias, Risks and Limitations
|
148 |
+
|
149 |
+
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
|
150 |
+
-->
|
151 |
+
|
152 |
+
<!--
|
153 |
+
### Recommendations
|
154 |
+
|
155 |
+
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
|
156 |
+
-->
|
157 |
+
|
158 |
+
## Training Details
|
159 |
+
|
160 |
+
### Training Dataset
|
161 |
+
|
162 |
+
#### ms_marco
|
163 |
+
|
164 |
+
* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
|
165 |
+
* Size: 78,704 training samples
|
166 |
+
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
|
167 |
+
* Approximate statistics based on the first 1000 samples:
|
|         | query                                                                                          | docs                                | labels                              |
|:--------|:-----------------------------------------------------------------------------------------------|:------------------------------------|:------------------------------------|
| type    | string                                                                                         | list                                | list                                |
| details | <ul><li>min: 9 characters</li><li>mean: 33.95 characters</li><li>max: 103 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
172 |
+
* Samples:
|
173 |
+
| query | docs | labels |
|
174 |
+
|:------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------|
|
175 |
+
| <code>average temperature in may for denver colorado</code> | <code>["In most years, Denver averages a daily maximum temperature for May that's between 67 and 74 degrees Fahrenheit (19 to 23 degrees Celsius). The minimum temperature usually falls between 42 and 46 °F (5 to 8 °C). The days at Denver warm quickly during May.", 'The highest average temperature in Denver is July at 74 degrees. The coldest average temperature in Denver is December at 28.5 degrees. The most monthly precipitation in Denver occurs in August with 2.7 inches. The Denver weather information is based on the average of the previous 3-7 years of data.', "Climate for Denver, Colorado. Denver's coldest month is January when the average temperature overnight is 15.2°F. In July, the warmest month, the average day time temperature rises to 88.0°F.", "Average Temperatures for Denver. Denver's coldest month is January when the average temperature overnight is 15.2°F. In July, the warmest month, the average day time temperature rises to 88.0°F.", 'Location. This report describes the typical...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
|
176 |
+
| <code>what is brain surgery</code> | <code>['The term “brain surgery” refers to various medical procedures that involve repairing structural problems with the brain. There are numerous types of brain surgery. The type used is based on the area of the brain and condition being treated. Advances in medical technology let surgeons operate on portions of the brain without a single incision near the head. Brain surgery is a critical and complicated process. The type of brain surgery done depends highly on the condition being treated. For example, a brain aneurysm is typically repaired using an endoscope, but if it has ruptured, a craniotomy may be used.', 'Brain surgery is an operation to treat problems in the brain and surrounding structures. Before surgery, the hair on part of the scalp is shaved and the area is cleaned. The doctor makes a surgical cut through the scalp. The location of this cut depends on where the problem in the brain is located. The surgeon creates a hole in the skull and removes a bone flap.', 'Brain Surgery –...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
|
177 |
+
| <code>whos the girl in terminator genisys</code> | <code>['Over the weekend, Terminator Genisys grossed $28.7 million to take the third spot at the box office, behind Jurassic World and Inside Out. FYI: Emilia is wearing Dior. 10+ pictures inside of Emilia Clarke and Arnold Schwarzenegger hitting the Terminator Genisys premiere in Japan…. Emilia Clarke is red hot while attending the premiere of her new film Terminator Genisys held at the Roppongi Hills Arena on Monday (July 6) in Tokyo, Japan.', "Jai Courtney, who plays Sarah's protector Kyle Reese (and eventual father to Jason Clarke 's John Connor), revealed that this role was the first time a character he played has fallen in love on screen. I had never fallen in love on screen before.", 'When John Connor (Jason Clarke), leader of the human resistance, sends Sgt. Kyle Reese (Jai Courtney) back to 1984 to protect Sarah Connor (Emilia Clarke) and safeguard the future, an unexpected turn of events creates a fractured timeline.', "On the run from the Terminator, Reese and Sarah share a night ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
|
178 |
+
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
|
```json
{
    "pad_value": -1,
    "activation_fct": "torch.nn.modules.activation.Sigmoid"
}
```
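To make the two loss parameters concrete, here is a minimal pure-Python sketch of a ListNet-style top-one loss for a single query, following the listwise formulation cited in this card (Cao et al., 2007). It assumes that entries whose label equals `pad_value` are ignored, that the sigmoid is applied to the raw model outputs before the softmax, and that the loss is the cross-entropy between the label distribution and the score distribution. It is an illustration under those assumptions and not the library's `ListNetLoss` code; every function name below is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    # Subtract the maximum for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(logits, labels, pad_value=-1):
    """Cross-entropy between softmax(labels) and softmax(sigmoid(logits)).

    Documents whose label equals pad_value are treated as padding and skipped.
    """
    kept = [(sigmoid(z), float(y)) for z, y in zip(logits, labels) if y != pad_value]
    if not kept:
        return 0.0
    predicted = softmax([score for score, _ in kept])
    target = softmax([label for _, label in kept])
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


# Scoring the relevant document higher gives a lower loss than the reverse.
good = listnet_loss([5.0, -5.0], [1, 0])
bad = listnet_loss([-5.0, 5.0], [1, 0])
print(good < bad)
```

Because the sigmoid squashes every score into the interval (0, 1) before the softmax, the predicted distribution in this sketch can never become very peaked, which limits how far such a loss can fall.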
185 |
+
|
186 |
+
### Evaluation Dataset
|
187 |
+
|
188 |
+
#### ms_marco
|
189 |
+
|
190 |
+
* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
|
191 |
+
* Size: 1,000 evaluation samples
|
192 |
+
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
|
193 |
+
* Approximate statistics based on the first 1000 samples:
|
|         | query                                                                                          | docs                                | labels                              |
|:--------|:-----------------------------------------------------------------------------------------------|:------------------------------------|:------------------------------------|
| type    | string                                                                                         | list                                | list                                |
| details | <ul><li>min: 10 characters</li><li>mean: 33.53 characters</li><li>max: 95 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
198 |
+
* Samples:
|
199 |
+
| query | docs | labels |
|
200 |
+
|:-------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------|
|
201 |
+
| <code>lpn salary richmond va</code> | <code>['$52,000. Average LPN salaries for job postings in Richmond, VA are 1% higher than average LPN salaries for job postings nationwide.', 'A Licensed Practical Nurse (LPN) in Richmond, Virginia earns an average wage of $18.47 per hour. For the first five to ten years in this position, pay increases somewhat, but any additional experience does not have a big effect on pay. $27,369 - $48,339. (Median).', 'Virginia has a growing number of opportunities in the nursing field. Within the state, LPNs make up 25 % of nurses in the state. The Virginia LPN comfort score is 54. This takes into account the average LPN salary, average state salary and cost of living.', 'LPN Salaries and Career Outlook in Richmond. Many LPN graduates choose to work as licensed practical nurses after graduation. If you choose to follow that path and remain in Richmond, your job prospects are good. In 2010, of the 20,060 licensed practical nurses in Virginia, 370 were working in the greater Richmond area.', 'This chart ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
|
202 |
+
| <code>what is neutrogena</code> | <code>["Neutrogena is an American brand of skin care, hair care and cosmetics, that is headquartered in Los Angeles, California. According to product advertising at their website, Neutrogena products are distributed in more than 70 countries. Neutrogena was founded in 1930 by Emanuel Stolaroff, and was originally a cosmetics company named Natone. In 1994 Johnson & Johnson acquired Neutrogena for $924 million, at a price of $35.25 per share. Johnson & Johnson's international network helped Neutrogena boost its sales and enter newer markets including India, South Africa, and China. Priced at a premium, Neutrogena products are distributed in over 70 countries.", 'Neutrogena also has retinol products for treating acne that have one thing going for them that most brands do not—they are in the kind of package that keeps the retinol cream fresh and active. Any kind of vitamin you dip out of jar will go bad almost as soon as you open the container due to oxidation.ost of the products Neutrogena make...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
|
203 |
+
| <code>why is lincoln a great leader</code> | <code>['His commitment to the rights of individuals was a cornerstone of his leadership style (Phillips, 1992). There have been many great leaders throughout the history of this great nation, but Abraham Lincoln is consistently mentioned as one of our greatest leaders. Although Lincoln possessed many characteristics of a great leader, probably his greatest leadership trait was his ability to communicate. Though Lincoln only had one year of formal education, he was able to master language and use his words to influence the people as a great public speaker, debater and as a humorist. Another part of Lincoln’s skills as a great communicator, was that he had a great capacity for learning to listen to different points of view. While president, he created a work environment where his cabinet members were able to disagree with his decisions without the threat of retaliation for doing so.', 'Expressed in his own words, here is Lincoln’s most luminous leadership insight by far: In order to win a man ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
|
204 |
+
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
|
```json
{
    "pad_value": -1,
    "activation_fct": "torch.nn.modules.activation.Sigmoid"
}
```
211 |
+
|
212 |
+
### Training Hyperparameters
|
213 |
+
#### Non-Default Hyperparameters
|
214 |
+
|
- `eval_strategy`: steps
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True
223 |
+
|
224 |
+
#### All Hyperparameters
|
225 |
+
<details><summary>Click to expand</summary>
|
226 |
+
|
227 |
+
- `overwrite_output_dir`: False
|
228 |
+
- `do_predict`: False
|
229 |
+
- `eval_strategy`: steps
|
230 |
+
- `prediction_loss_only`: True
|
231 |
+
- `per_device_train_batch_size`: 6
|
232 |
+
- `per_device_eval_batch_size`: 16
|
233 |
+
- `per_gpu_train_batch_size`: None
|
234 |
+
- `per_gpu_eval_batch_size`: None
|
235 |
+
- `gradient_accumulation_steps`: 1
|
236 |
+
- `eval_accumulation_steps`: None
|
237 |
+
- `torch_empty_cache_steps`: None
|
238 |
+
- `learning_rate`: 2e-05
|
239 |
+
- `weight_decay`: 0.0
|
240 |
+
- `adam_beta1`: 0.9
|
241 |
+
- `adam_beta2`: 0.999
|
242 |
+
- `adam_epsilon`: 1e-08
|
243 |
+
- `max_grad_norm`: 1.0
|
244 |
+
- `num_train_epochs`: 3
|
245 |
+
- `max_steps`: -1
|
246 |
+
- `lr_scheduler_type`: linear
|
247 |
+
- `lr_scheduler_kwargs`: {}
|
248 |
+
- `warmup_ratio`: 0.1
|
249 |
+
- `warmup_steps`: 0
|
250 |
+
- `log_level`: passive
|
251 |
+
- `log_level_replica`: warning
|
252 |
+
- `log_on_each_node`: True
|
253 |
+
- `logging_nan_inf_filter`: True
|
254 |
+
- `save_safetensors`: True
|
255 |
+
- `save_on_each_node`: False
|
256 |
+
- `save_only_model`: False
|
257 |
+
- `restore_callback_states_from_checkpoint`: False
|
258 |
+
- `no_cuda`: False
|
259 |
+
- `use_cpu`: False
|
260 |
+
- `use_mps_device`: False
|
261 |
+
- `seed`: 12
|
262 |
+
- `data_seed`: None
|
263 |
+
- `jit_mode_eval`: False
|
264 |
+
- `use_ipex`: False
|
265 |
+
- `bf16`: True
|
266 |
+
- `fp16`: False
|
267 |
+
- `fp16_opt_level`: O1
|
268 |
+
- `half_precision_backend`: auto
|
269 |
+
- `bf16_full_eval`: False
|
270 |
+
- `fp16_full_eval`: False
|
271 |
+
- `tf32`: None
|
272 |
+
- `local_rank`: 0
|
273 |
+
- `ddp_backend`: None
|
274 |
+
- `tpu_num_cores`: None
|
275 |
+
- `tpu_metrics_debug`: False
|
276 |
+
- `debug`: []
|
277 |
+
- `dataloader_drop_last`: False
|
278 |
+
- `dataloader_num_workers`: 0
|
279 |
+
- `dataloader_prefetch_factor`: None
|
280 |
+
- `past_index`: -1
|
281 |
+
- `disable_tqdm`: False
|
282 |
+
- `remove_unused_columns`: True
|
283 |
+
- `label_names`: None
|
284 |
+
- `load_best_model_at_end`: True
|
285 |
+
- `ignore_data_skip`: False
|
286 |
+
- `fsdp`: []
|
287 |
+
- `fsdp_min_num_params`: 0
|
288 |
+
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
|
289 |
+
- `fsdp_transformer_layer_cls_to_wrap`: None
|
290 |
+
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
|
291 |
+
- `deepspeed`: None
|
292 |
+
- `label_smoothing_factor`: 0.0
|
293 |
+
- `optim`: adamw_torch
|
294 |
+
- `optim_args`: None
|
295 |
+
- `adafactor`: False
|
296 |
+
- `group_by_length`: False
|
297 |
+
- `length_column_name`: length
|
298 |
+
- `ddp_find_unused_parameters`: None
|
299 |
+
- `ddp_bucket_cap_mb`: None
|
300 |
+
- `ddp_broadcast_buffers`: False
|
301 |
+
- `dataloader_pin_memory`: True
|
302 |
+
- `dataloader_persistent_workers`: False
|
303 |
+
- `skip_memory_metrics`: True
|
304 |
+
- `use_legacy_prediction_loop`: False
|
305 |
+
- `push_to_hub`: False
|
306 |
+
- `resume_from_checkpoint`: None
|
307 |
+
- `hub_model_id`: None
|
308 |
+
- `hub_strategy`: every_save
|
309 |
+
- `hub_private_repo`: None
|
310 |
+
- `hub_always_push`: False
|
311 |
+
- `gradient_checkpointing`: False
|
312 |
+
- `gradient_checkpointing_kwargs`: None
|
313 |
+
- `include_inputs_for_metrics`: False
|
314 |
+
- `include_for_metrics`: []
|
315 |
+
- `eval_do_concat_batches`: True
|
316 |
+
- `fp16_backend`: auto
|
317 |
+
- `push_to_hub_model_id`: None
|
318 |
+
- `push_to_hub_organization`: None
|
319 |
+
- `mp_parameters`:
|
320 |
+
- `auto_find_batch_size`: False
|
321 |
+
- `full_determinism`: False
|
322 |
+
- `torchdynamo`: None
|
323 |
+
- `ray_scope`: last
|
324 |
+
- `ddp_timeout`: 1800
|
325 |
+
- `torch_compile`: False
|
326 |
+
- `torch_compile_backend`: None
|
327 |
+
- `torch_compile_mode`: None
|
328 |
+
- `dispatch_batches`: None
|
329 |
+
- `split_batches`: None
|
330 |
+
- `include_tokens_per_second`: False
|
331 |
+
- `include_num_input_tokens_seen`: False
|
332 |
+
- `neftune_noise_alpha`: None
|
333 |
+
- `optim_target_modules`: None
|
334 |
+
- `batch_eval_metrics`: False
|
335 |
+
- `eval_on_start`: False
|
336 |
+
- `use_liger_kernel`: False
|
337 |
+
- `eval_use_gather_object`: False
|
338 |
+
- `average_tokens_across_devices`: False
|
339 |
+
- `prompts`: None
|
340 |
+
- `batch_sampler`: batch_sampler
|
341 |
+
- `multi_dataset_batch_sampler`: proportional
|
342 |
+
|
343 |
+
</details>
|
344 |
+
|
345 |
+
### Training Logs
|
| Epoch      | Step      | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10  | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10       | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:---------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------------:|
| -1         | -1        | -             | -               | 0.0355 (-0.5049)     | 0.2822 (-0.0429)     | 0.0479 (-0.4528)     | 0.1218 (-0.3335)           |
| 0.0001     | 1         | 1.8394        | -               | -                    | -                    | -                    | -                          |
| 0.0762     | 1000      | 2.0827        | -               | -                    | -                    | -                    | -                          |
| 0.1525     | 2000      | 2.0821        | -               | -                    | -                    | -                    | -                          |
| 0.2287     | 3000      | 2.078         | -               | -                    | -                    | -                    | -                          |
| 0.3049     | 4000      | 2.0776        | 2.0734          | 0.5755 (+0.0350)     | 0.3542 (+0.0292)     | 0.5827 (+0.0820)     | 0.5041 (+0.0487)           |
| 0.3812     | 5000      | 2.0735        | -               | -                    | -                    | -                    | -                          |
| 0.4574     | 6000      | 2.0719        | -               | -                    | -                    | -                    | -                          |
| 0.5336     | 7000      | 2.0703        | -               | -                    | -                    | -                    | -                          |
| 0.6098     | 8000      | 2.0712        | 2.0726          | 0.5748 (+0.0344)     | 0.3382 (+0.0131)     | 0.5966 (+0.0959)     | 0.5032 (+0.0478)           |
| 0.6861     | 9000      | 2.078         | -               | -                    | -                    | -                    | -                          |
| 0.7623     | 10000     | 2.0712        | -               | -                    | -                    | -                    | -                          |
| 0.8385     | 11000     | 2.0752        | -               | -                    | -                    | -                    | -                          |
| 0.9148     | 12000     | 2.0755        | 2.0716          | 0.5395 (-0.0009)     | 0.3428 (+0.0177)     | 0.5451 (+0.0444)     | 0.4758 (+0.0204)           |
| 0.9910     | 13000     | 2.0698        | -               | -                    | -                    | -                    | -                          |
| 1.0672     | 14000     | 2.072         | -               | -                    | -                    | -                    | -                          |
| 1.1435     | 15000     | 2.0704        | -               | -                    | -                    | -                    | -                          |
| 1.2197     | 16000     | 2.0693        | 2.0713          | 0.5538 (+0.0134)     | 0.3639 (+0.0388)     | 0.5766 (+0.0759)     | 0.4981 (+0.0427)           |
| 1.2959     | 17000     | 2.0716        | -               | -                    | -                    | -                    | -                          |
| 1.3722     | 18000     | 2.0628        | -               | -                    | -                    | -                    | -                          |
| 1.4484     | 19000     | 2.0691        | -               | -                    | -                    | -                    | -                          |
| **1.5246** | **20000** | **2.0659**    | **2.0733**      | **0.5840 (+0.0435)** | **0.3676 (+0.0425)** | **0.6431 (+0.1425)** | **0.5316 (+0.0762)**       |
| 1.6009     | 21000     | 2.0725        | -               | -                    | -                    | -                    | -                          |
| 1.6771     | 22000     | 2.0725        | -               | -                    | -                    | -                    | -                          |
| 1.7533     | 23000     | 2.0663        | -               | -                    | -                    | -                    | -                          |
| 1.8295     | 24000     | 2.0671        | 2.0715          | 0.5521 (+0.0117)     | 0.3339 (+0.0089)     | 0.6005 (+0.0999)     | 0.4955 (+0.0401)           |
| 1.9058     | 25000     | 2.0686        | -               | -                    | -                    | -                    | -                          |
| 1.9820     | 26000     | 2.0685        | -               | -                    | -                    | -                    | -                          |
| 2.0582     | 27000     | 2.068         | -               | -                    | -                    | -                    | -                          |
| 2.1345     | 28000     | 2.0622        | 2.0723          | 0.5721 (+0.0317)     | 0.3509 (+0.0258)     | 0.5870 (+0.0863)     | 0.5033 (+0.0480)           |
| 2.2107     | 29000     | 2.0664        | -               | -                    | -                    | -                    | -                          |
| 2.2869     | 30000     | 2.0616        | -               | -                    | -                    | -                    | -                          |
| 2.3632     | 31000     | 2.0661        | -               | -                    | -                    | -                    | -                          |
| 2.4394     | 32000     | 2.0638        | 2.0725          | 0.5620 (+0.0216)     | 0.3481 (+0.0230)     | 0.5899 (+0.0893)     | 0.5000 (+0.0447)           |
| 2.5156     | 33000     | 2.0643        | -               | -                    | -                    | -                    | -                          |
| 2.5919     | 34000     | 2.0611        | -               | -                    | -                    | -                    | -                          |
| 2.6681     | 35000     | 2.0609        | -               | -                    | -                    | -                    | -                          |
| 2.7443     | 36000     | 2.0658        | 2.0720          | 0.5846 (+0.0441)     | 0.3569 (+0.0318)     | 0.5759 (+0.0752)     | 0.5058 (+0.0504)           |
| 2.8206     | 37000     | 2.066         | -               | -                    | -                    | -                    | -                          |
| 2.8968     | 38000     | 2.0692        | -               | -                    | -                    | -                    | -                          |
| 2.9730     | 39000     | 2.0692        | -               | -                    | -                    | -                    | -                          |
| -1         | -1        | -             | -               | 0.5840 (+0.0435)     | 0.3676 (+0.0425)     | 0.6431 (+0.1425)     | 0.5316 (+0.0762)           |
390 |
+
|
391 |
+
* The bold row denotes the saved checkpoint.
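The Epoch column can be reproduced from the Step column using figures stated elsewhere in this card: 78,704 training samples, `per_device_train_batch_size` of 6, `gradient_accumulation_steps` of 1, one GPU, and `dataloader_drop_last` set to False. A small sketch of that arithmetic, assuming one optimizer step per batch (the helper `epoch_at` is hypothetical):

```python
import math

train_samples = 78_704   # "Size: 78,704 training samples"
batch_size = 6           # per_device_train_batch_size, single GPU
num_epochs = 3           # num_train_epochs

# With drop_last disabled, the final partial batch still counts as a step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps)
print(epoch_at(1000), epoch_at(20000), epoch_at(39000))
```

Under these assumptions the saved checkpoint at step 20000 sits roughly halfway through the second epoch, matching the bold row.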
|
392 |
+
|
393 |
+
### Environmental Impact
|
394 |
+
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
|
395 |
+
- **Energy Consumed**: 0.519 kWh
|
396 |
+
- **Carbon Emitted**: 0.202 kg of CO2
|
397 |
+
- **Hours Used**: 1.659 hours
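The rounded figures in this section follow from the raw values in the card metadata. The sketch below redoes the unit conversion and derives two quantities that are not stated in the card: the implied carbon intensity and the average power draw. It assumes the metadata `emissions` field is in grams of CO2 equivalent, which is consistent with the 0.202 kg reported here; the derived values are illustrative estimates only.

```python
# Raw values from the co2_eq_emissions metadata block.
emissions_g = 201.83156300124415   # assumed to be grams of CO2-eq
energy_kwh = 0.519244982020273
hours_used = 1.659

emissions_kg = emissions_g / 1000                  # reported as 0.202 kg
carbon_intensity = emissions_g / energy_kwh        # g CO2-eq per kWh (derived)
average_power_w = energy_kwh / hours_used * 1000   # watts (derived)

print(round(emissions_kg, 3), round(energy_kwh, 3))
print(round(carbon_intensity, 1), round(average_power_w, 1))
```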
|
398 |
+
|
399 |
+
### Training Hardware
|
400 |
+
- **On Cloud**: No
|
401 |
+
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
|
402 |
+
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
|
403 |
+
- **RAM Size**: 31.78 GB
|
404 |
+
|
405 |
+
### Framework Versions
|
406 |
+
- Python: 3.11.6
|
407 |
+
- Sentence Transformers: 3.5.0.dev0
|
408 |
+
- Transformers: 4.48.3
|
409 |
+
- PyTorch: 2.5.0+cu121
|
410 |
+
- Accelerate: 1.4.0
|
411 |
+
- Datasets: 3.3.2
|
412 |
+
- Tokenizers: 0.21.0
|
413 |
+
|
414 |
+
## Citation
|
415 |
+
|
416 |
+
### BibTeX
|
417 |
+
|
418 |
+
#### Sentence Transformers
|
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
430 |
+
|
431 |
+
#### ListNetLoss
|
```bibtex
@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
```
441 |
+
|
442 |
+
<!--
|
443 |
+
## Glossary
|
444 |
+
|
445 |
+
*Clearly define terms in order to be accessible across audiences.*
|
446 |
+
-->
|
447 |
+
|
448 |
+
<!--
|
449 |
+
## Model Card Authors
|
450 |
+
|
451 |
+
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
|
452 |
+
-->
|
453 |
+
|
454 |
+
<!--
|
455 |
+
## Model Card Contact
|
456 |
+
|
457 |
+
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
|
458 |
+
-->
|
config.json
ADDED
@@ -0,0 +1,31 @@
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0751a0b3805052cfacb44203102f7d6e4367d9b43cbac4020a217ce2e007ffa2
size 133464836
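As a rough cross-check of the pointer's `size` field against `config.json`, the sketch below counts the parameters of a BERT sequence classifier with the listed dimensions and one output label, then compares four bytes per parameter (`torch_dtype` is `float32`) with the file size. It assumes the standard BERT layout (word, position and token-type embeddings, twelve encoder layers, a pooler and a linear classifier) and that the checkpoint stores no extra buffers; the remainder would then be file-format overhead such as the header. This is an estimate under those assumptions, not an inspection of the actual file.

```python
# Dimensions copied from config.json.
vocab_size, hidden, num_layers = 30522, 384, 12
intermediate, max_positions, type_vocab, num_labels = 1536, 512, 2, 1


def linear(n_in, n_out):
    """Parameter count of a dense layer with bias."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # scale and shift vectors

# Word, position and token-type embeddings plus their LayerNorm.
embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

# Query, key, value and output projections, then the feed-forward block,
# each followed by a LayerNorm.
encoder_layer = (
    4 * linear(hidden, hidden) + layer_norm
    + linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
)

pooler = linear(hidden, hidden)
classifier = linear(hidden, num_labels)

total_params = embeddings + num_layers * encoder_layer + pooler + classifier

pointer_size = 133_464_836        # "size" line of the Git LFS pointer
weight_bytes = 4 * total_params   # float32 weights
remainder = pointer_size - weight_bytes

print(total_params, weight_bytes, remainder)
```

The small remainder relative to the weight bytes suggests the file holds little beyond the float32 tensors themselves.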
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt
ADDED
The diff for this file is too large to render.