tomaarsen (HF staff) committed
Commit 1d4d909 · verified · Parent(s): aa31d52

Add new CrossEncoder model
README.md ADDED
@@ -0,0 +1,460 @@
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:78704
- loss:ListNetLoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 205.4804729340415
  energy_consumed: 0.5286324046031189
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 1.686
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results: []
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
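
In practice, a cross encoder like this one is usually the second stage of a retrieve-then-rerank pipeline: a fast bi-encoder shortlists candidates, and the cross encoder re-scores each query/candidate pair jointly. Below is a minimal sketch of that pattern; the choice of retriever (`sentence-transformers/all-MiniLM-L6-v2`) and the three-document corpus are illustrative placeholders, not part of this model's training setup.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

# Stage 1 retriever: any bi-encoder works; this particular model is an assumption
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
reranker = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity")

corpus = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]
query = "How many calories in an egg"

# Embedding search over the corpus; returns [{'corpus_id': ..., 'score': ...}, ...]
hits = util.semantic_search(retriever.encode(query), retriever.encode(corpus), top_k=3)[0]
candidates = [corpus[hit["corpus_id"]] for hit in hits]

# Stage 2: the cross encoder reorders the shortlist
for entry in reranker.rank(query, candidates):
    print(f"{entry['score']:.4f}  {candidates[entry['corpus_id']]}")
```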

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator)

| Metric      | NanoMSMARCO          | NanoNFCorpus         | NanoNQ               |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.4847 (-0.0049)     | 0.3325 (+0.0716)     | 0.5967 (+0.1771)     |
| mrr@10      | 0.4768 (-0.0007)     | 0.5669 (+0.0670)     | 0.6024 (+0.1757)     |
| **ndcg@10** | **0.5573 (+0.0168)** | **0.3623 (+0.0373)** | **0.6499 (+0.1492)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator)

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.4713 (+0.0813)     |
| mrr@10      | 0.5487 (+0.0807)     |
| **ndcg@10** | **0.5231 (+0.0678)** |
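
As a rough sketch of how these numbers can be reproduced with the evaluator linked above; the `dataset_names` values are an assumption (check the API reference of your installed sentence-transformers version):

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator

model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity")

# The three NanoBEIR subsets reported above; lowercase identifiers are assumed
evaluator = CrossEncoderNanoBEIREvaluator(dataset_names=["msmarco", "nfcorpus", "nq"])
results = evaluator(model)  # dict of metrics such as NDCG@10, MRR@10 and MAP
```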
145
+
146
+ <!--
147
+ ## Bias, Risks and Limitations
148
+
149
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
150
+ -->
151
+
152
+ <!--
153
+ ### Recommendations
154
+
155
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
156
+ -->
157
+
158
+ ## Training Details
159
+
160
+ ### Training Dataset
161
+
162
+ #### ms_marco
163
+
164
+ * Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
165
+ * Size: 78,704 training samples
166
+ * Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
167
+ * Approximate statistics based on the first 1000 samples:
168
+ | | query | docs | labels |
169
+ |:--------|:-----------------------------------------------------------------------------------------------|:------------------------------------|:------------------------------------|
170
+ | type | string | list | list |
171
+ | details | <ul><li>min: 10 characters</li><li>mean: 33.93 characters</li><li>max: 99 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
172
+ * Samples:
173
+ | query | docs | labels |
174
+ |:------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------|
175
+ | <code>what types of moons are there</code> | <code>["The different types of moons are: Full Wolf Moon, Full Snow Moon, Full Worm Moon, paschal full moon, full pink moon, full flower moon, full strawberry moon, full buck moon, … full sturgeon moon, full harvest moon, full hunters moon, full beaver moon, full cold moon. The solar eclipse, when the moon blocks the sun's light from hitting the earth-creating a temporary blackout on earth, can occur only at the time of New Moon, while the luna … r eclipse, when the earth blocks the sun's light from reflecting off the moon, can occur only at the time of Full Moon.", 'Types of Moons. Full Moon names date back to Native Americans, of what is now the northern and eastern United States. The tribes kept track of the seasons by giving distinctive names to each recurring full Moon. Their names were applied to the entire month in which each occurred. There was some variation in the Moon names, but in general, the same ones were current throughout the Algonquin tribes from New England to Lake Superio...</code> | <code>[1, 1, 1, 0, 0, ...]</code> |
176
+ | <code>what is beryllium commonly combined with</code> | <code>['Beryllium is an industrial metal with some attractive attributes. It’s lighter than aluminum and 6x stronger than steel. It’s usually combined with other metals and is a key component in the aerospace and electronics industries. Beryllium is also used in the production of nuclear weapons. With that, you may not be surprised to learn that beryllium is one of the most toxic elements in existence. Beryllium is a Class A EPA carcinogen and exposure can cause Chronic Beryllium Disease, an often fatal lung disease. ', 'Beryllium is found in about 30 different mineral species. The most important are beryl (beryllium aluminium silicate) and bertrandite (beryllium silicate). Emerald and aquamarine are precious forms of beryl. The metal is usually prepared by reducing beryllium fluoride with magnesium metal. Uses. Beryllium is used in alloys with copper or nickel to make gyroscopes, springs, electrical contacts, spot-welding electrodes and non-sparking tools. Mixing beryllium with these metals...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
177
+ | <code>is turkish coffee healthy</code> | <code>["Calories, Fat and Other Basics. A serving of Turkish coffee contains about 46 calories. Though the drink doesn't contain any fat, it also doesn't supply any fiber or protein, two key nutrients needed for good health. The coffee doesn't supply an impressive amount of calcium or iron either. A blend of strong coffee, sugar and cardamom, Turkish coffee is more of a sweet treat than something similar to a regular cup of coffee. While there are certain health benefits from the coffee and cardamom, sugar is a major drawback when it comes to the nutritional benefits of the drink", "A serving of Turkish coffee contains about 11.5 grams of sugar, which is equal to almost 3 teaspoons. That's half of the 6 teaspoons women should limit themselves to each day and one-third of the 9 teaspoons men should set as their daily upper limit, according to the American Heart Association. A blend of strong coffee, sugar and cardamom, Turkish coffee is more of a sweet treat than something similar to a regula...</code> | <code>[1, 1, 0, 0, 0, ...]</code> |
178
+ * Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
179
+ ```json
180
+ {
181
+ "eps": 1e-10,
182
+ "pad_value": -1,
183
+ "activation_fct": "torch.nn.modules.linear.Identity"
184
+ }
185
+ ```
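
For reference, the loss above can be constructed roughly as follows. This is a sketch based on the parameter listing; the exact `ListNetLoss` signature may differ between sentence-transformers versions:

```python
import torch
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.losses import ListNetLoss

model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)
# torch.nn.Identity corresponds to the "identity" in this model's name;
# eps and pad_value are left at the defaults shown in the JSON above.
loss = ListNetLoss(model, activation_fct=torch.nn.Identity())
```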

### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 1,000 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query  | docs | labels |
  |:--------|:-------|:-----|:-------|
  | type    | string | list | list   |
  | details | <ul><li>min: 10 characters</li><li>mean: 33.81 characters</li><li>max: 110 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>what is a fishy smell on humans</code> | <code>["Trimethylaminuria (TMAU), also known as fish odor syndrome or fish malodor syndrome, is a rare metabolic disorder where Trimethylamine is released in the person's sweat, urine, and breath, giving off a strong fishy odor or strong body odor. Body odor is generally considered to be an unpleasant odor among many human cultures.", "The trimethylamine is released in the person's sweat, urine, reproductive fluids, and breath, giving off a strong fishy or body odor. Some people with trimethylaminuria have a strong odor all the time, but most have a moderate smell that varies in intensity over time. Although FMO3 mutations account for most known cases of trimethylaminuria, some cases are caused by other factors. A fish-like body odor could result from an excess of certain proteins in the diet or from an increase in bacteria in the digestive system.", 'Trimethylaminuria is a disorder in which the body is unable to break down trimethylamine, a chemical compound that has a pungent odor. Trimeth...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how to cut woodworking joints</code> | <code>['The tails and pins interlock to form a strong 90-degree joint. Dovetail joints are technically complex and are often used to create drawer boxes for furniture. Through mortise and tenon – To form this joint, a round or square hole (called a mortise) is cut through the side of one piece of wood. The end of the other piece of wood is cut to have a projection (the tenon) that matches the mortise. The tenon is placed into the mortise, projecting out from the other side of the wood. A wedge is hammered into a hole in the tenon. The wedge keeps the tenon from sliding out of the mortise.', "Wood joinery is simply the method by which two pieces of wood are connected. In many cases, the appearance of a joint becomes at least as important as it's strength. Wood joinery encompasses everything from intricate half-blind dovetails to connections that are simply nailed, glued or screwed. How to Use Biscuit Joints. Share. Doweling as a method of joinery is simple: a few dowels are glued into matchin...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how long does it take to be a paramedic</code> | <code>['In Kansas, you first have to take an EMT course which is roughly 6 months long depending on where you take the course. Then you have to take A&P, English Comp, Sociology, Algebra, & Interpersonal Communication as pre-requisites for Paramedic. EMT is 110 hours which can be done in 3 weeks or dragged out for several months by just going one night per week to class. The Paramedic is 600 - 1200 hours in length depending on the state and averages about 6 - 9 months of training.', 'Coursework and training to become an EMT-basic or first responder can generally be completed in as little as three weeks on an accelerated basis. For part-time students, these programs may take around 8-11 weeks to complete. To become an EMT-intermediate 1985 or 1999, students generally must complete 30-350 hours of training. This training requirement varies according to the procedures the state allows these EMTs to perform.', 'How long does it take to be a paramedic depends on the area of study and the skill on...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
  ```json
  {
      "eps": 1e-10,
      "pad_value": -1,
      "activation_fct": "torch.nn.modules.linear.Identity"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
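
The non-default values above translate roughly into the following training-arguments sketch, assuming the `CrossEncoderTrainingArguments` class available in recent sentence-transformers releases; the field names mirror the Hugging Face `TrainingArguments` entries listed above, and the `output_dir` is hypothetical:

```python
from sentence_transformers.cross_encoder import CrossEncoderTrainingArguments

args = CrossEncoderTrainingArguments(
    output_dir="reranker-msmarco-minilm-listnet",  # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,                    # mixed precision on Ampere-class GPUs
    eval_strategy="steps",
    load_best_model_at_end=True,  # keep the checkpoint with the best validation metric
)
```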

### Training Logs
| Epoch      | Step      | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10  | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10       | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:---------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------------:|
| -1         | -1        | -             | -               | 0.0155 (-0.5250)     | 0.3609 (+0.0358)     | 0.0410 (-0.4597)     | 0.1391 (-0.3163)           |
| 0.0001     | 1         | 2.1559        | -               | -                    | -                    | -                    | -                          |
| 0.0762     | 1000      | 2.0862        | -               | -                    | -                    | -                    | -                          |
| 0.1525     | 2000      | 2.0787        | -               | -                    | -                    | -                    | -                          |
| 0.2287     | 3000      | 2.0785        | -               | -                    | -                    | -                    | -                          |
| 0.3049     | 4000      | 2.0738        | 2.0755          | 0.5129 (-0.0276)     | 0.3371 (+0.0120)     | 0.5561 (+0.0555)     | 0.4687 (+0.0133)           |
| 0.3812     | 5000      | 2.0828        | -               | -                    | -                    | -                    | -                          |
| 0.4574     | 6000      | 2.0711        | -               | -                    | -                    | -                    | -                          |
| 0.5336     | 7000      | 2.072         | -               | -                    | -                    | -                    | -                          |
| 0.6098     | 8000      | 2.0721        | 2.0734          | 0.5627 (+0.0222)     | 0.3547 (+0.0296)     | 0.5691 (+0.0684)     | 0.4955 (+0.0401)           |
| 0.6861     | 9000      | 2.0714        | -               | -                    | -                    | -                    | -                          |
| 0.7623     | 10000     | 2.0744        | -               | -                    | -                    | -                    | -                          |
| 0.8385     | 11000     | 2.0708        | -               | -                    | -                    | -                    | -                          |
| **0.9148** | **12000** | **2.0705**    | **2.0732**      | **0.5573 (+0.0168)** | **0.3623 (+0.0373)** | **0.6499 (+0.1492)** | **0.5231 (+0.0678)**       |
| 0.9910     | 13000     | 2.0721        | -               | -                    | -                    | -                    | -                          |
| 1.0672     | 14000     | 2.065         | -               | -                    | -                    | -                    | -                          |
| 1.1435     | 15000     | 2.0732        | -               | -                    | -                    | -                    | -                          |
| 1.2197     | 16000     | 2.07          | 2.0729          | 0.5673 (+0.0269)     | 0.3563 (+0.0312)     | 0.5877 (+0.0870)     | 0.5038 (+0.0484)           |
| 1.2959     | 17000     | 2.0707        | -               | -                    | -                    | -                    | -                          |
| 1.3722     | 18000     | 2.0719        | -               | -                    | -                    | -                    | -                          |
| 1.4484     | 19000     | 2.0687        | -               | -                    | -                    | -                    | -                          |
| 1.5246     | 20000     | 2.0675        | 2.0730          | 0.5633 (+0.0228)     | 0.3264 (+0.0014)     | 0.5949 (+0.0943)     | 0.4949 (+0.0395)           |
| 1.6009     | 21000     | 2.0698        | -               | -                    | -                    | -                    | -                          |
| 1.6771     | 22000     | 2.0685        | -               | -                    | -                    | -                    | -                          |
| 1.7533     | 23000     | 2.0683        | -               | -                    | -                    | -                    | -                          |
| 1.8295     | 24000     | 2.0667        | 2.0731          | 0.5571 (+0.0166)     | 0.3521 (+0.0271)     | 0.6319 (+0.1313)     | 0.5137 (+0.0583)           |
| 1.9058     | 25000     | 2.0665        | -               | -                    | -                    | -                    | -                          |
| 1.9820     | 26000     | 2.0707        | -               | -                    | -                    | -                    | -                          |
| 2.0582     | 27000     | 2.0663        | -               | -                    | -                    | -                    | -                          |
| 2.1345     | 28000     | 2.0672        | 2.0739          | 0.5543 (+0.0139)     | 0.3346 (+0.0096)     | 0.5958 (+0.0952)     | 0.4949 (+0.0395)           |
| 2.2107     | 29000     | 2.0661        | -               | -                    | -                    | -                    | -                          |
| 2.2869     | 30000     | 2.0681        | -               | -                    | -                    | -                    | -                          |
| 2.3632     | 31000     | 2.0626        | -               | -                    | -                    | -                    | -                          |
| 2.4394     | 32000     | 2.0642        | 2.0745          | 0.5791 (+0.0387)     | 0.3347 (+0.0097)     | 0.6386 (+0.1380)     | 0.5175 (+0.0621)           |
| 2.5156     | 33000     | 2.0635        | -               | -                    | -                    | -                    | -                          |
| 2.5919     | 34000     | 2.0648        | -               | -                    | -                    | -                    | -                          |
| 2.6681     | 35000     | 2.0615        | -               | -                    | -                    | -                    | -                          |
| 2.7443     | 36000     | 2.0626        | 2.0736          | 0.5735 (+0.0331)     | 0.3288 (+0.0038)     | 0.6205 (+0.1198)     | 0.5076 (+0.0522)           |
| 2.8206     | 37000     | 2.0621        | -               | -                    | -                    | -                    | -                          |
| 2.8968     | 38000     | 2.0664        | -               | -                    | -                    | -                    | -                          |
| 2.9730     | 39000     | 2.0621        | -               | -                    | -                    | -                    | -                          |
| -1         | -1        | -             | -               | 0.5573 (+0.0168)     | 0.3623 (+0.0373)     | 0.6499 (+0.1492)     | 0.5231 (+0.0678)           |

* The bold row denotes the saved checkpoint.

### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.529 kWh
- **Carbon Emitted**: 0.205 kg of CO2
- **Hours Used**: 1.686 hours
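
For context, a CodeCarbon measurement of this kind is typically taken by wrapping the training run in an `EmissionsTracker`; this is a minimal sketch, not the card's actual instrumentation:

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# ... run training here ...
emissions_kg = tracker.stop()  # returns emissions in kg CO2eq
print(f"Emitted ~{emissions_kg:.3f} kg CO2eq")
```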

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.48.3
- PyTorch: 2.5.0+cu121
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListNetLoss
```bibtex
@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,31 @@
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
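
Since this config declares a `BertForSequenceClassification` head with a single label, the checkpoint can also be scored directly with plain `transformers`; a sketch, using the raw logit as the relevance score:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

# Query and passage are encoded together as one [CLS] query [SEP] passage [SEP] input
inputs = tokenizer(
    "How many calories in an egg",
    "Most of the calories in an egg come from the yellow yolk in the center.",
    return_tensors="pt",
)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # single relevance logit
print(score)
```
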
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81de3af766410b4cbc6ee23a4f9794b1586a4f7978b70b7a15440f768dfde6c1
size 133464836
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff