---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:78704
- loss:ListNetLoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 205.4804729340415
  energy_consumed: 0.5286324046031189
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 1.686
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results: []
---
# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description

- **Model Type:** Cross Encoder
- **Base model:** microsoft/MiniLM-L12-H384-uncased
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:** ms_marco
- **Language:** en
### Model Sources

- **Documentation:** Sentence Transformers Documentation
- **Documentation:** Cross Encoder Documentation
- **Repository:** Sentence Transformers on GitHub
- **Hugging Face:** Cross Encoders on Hugging Face
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
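Once per-pair scores are computed, the ranking step itself reduces to sorting candidates by score. A minimal pure-Python sketch of that final step, mirroring the `[{'corpus_id': ..., 'score': ...}]` output shape (the passages and scores below are made up for illustration, not real model outputs):

```python
def rerank(passages, scores):
    # Order candidate passages from most to least relevant,
    # returning the same dict shape that model.rank produces.
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]

# Hypothetical scores, e.g. as returned by model.predict(pairs)
passages = ["passage A", "passage B", "passage C"]
scores = [0.12, 0.87, 0.45]
print(rerank(passages, scores))
```

`corpus_id` indexes into the original candidate list, so the caller can recover the passage text after sorting.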
## Evaluation

### Metrics

#### Cross Encoder Reranking

- Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
- Evaluated with `CrossEncoderRerankingEvaluator`

| Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ |
|:--------|:-----------------|:-----------------|:-----------------|
| map | 0.4847 (-0.0049) | 0.3325 (+0.0716) | 0.5967 (+0.1771) |
| mrr@10 | 0.4768 (-0.0007) | 0.5669 (+0.0670) | 0.6024 (+0.1757) |
| ndcg@10 | 0.5573 (+0.0168) | 0.3623 (+0.0373) | 0.6499 (+0.1492) |
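For reference, ndcg@10 scores like those above are computed from the ranked list of graded relevance labels per query. A stdlib sketch of the standard formula (an illustration of the metric, not the evaluator's implementation):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k ranked relevance labels
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (descending) ordering
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevant docs ranked 1st and 3rd out of five candidates
print(round(ndcg_at_k([1, 0, 1, 0, 0]), 4))  # 0.9197
```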
#### Cross Encoder Nano BEIR

- Dataset: `NanoBEIR_R100_mean`
- Evaluated with `CrossEncoderNanoBEIREvaluator`

| Metric | Value |
|:--------|:-----------------|
| map | 0.4713 (+0.0813) |
| mrr@10 | 0.5487 (+0.0807) |
| ndcg@10 | 0.5231 (+0.0678) |
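Similarly, mrr@10 averages the reciprocal rank of the first relevant document per query, counting zero when none appears in the top 10. A small sketch with hypothetical per-query ranks:

```python
def mrr_at_k(first_relevant_ranks, k=10):
    # first_relevant_ranks: 1-based rank of the first relevant doc per query,
    # or None when no relevant doc appears in the candidate list
    total = sum(1.0 / r for r in first_relevant_ranks if r is not None and r <= k)
    return total / len(first_relevant_ranks)

print(mrr_at_k([1, 2, None, 4]))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```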
## Training Details

### Training Dataset

#### ms_marco

- Dataset: ms_marco at a47ee7a
- Size: 78,704 training samples
- Columns: `query`, `docs`, and `labels`
- Approximate statistics based on the first 1000 samples:

| | query | docs | labels |
|:--------|:-------|:-----|:-------|
| type | string | list | list |
| details | min: 10 characters, mean: 33.93 characters, max: 99 characters | size: 10 elements | size: 10 elements |
- Samples:

| query | docs | labels |
|:------|:-----|:-------|
| what types of moons are there | ["The different types of moons are: Full Wolf Moon, Full Snow Moon, Full Worm Moon, paschal full moon, full pink moon, full flower moon, full strawberry moon, full buck moon, … full sturgeon moon, full harvest moon, full hunters moon, full beaver moon, full cold moon. The solar eclipse, when the moon blocks the sun's light from hitting the earth-creating a temporary blackout on earth, can occur only at the time of New Moon, while the lunar eclipse, when the earth blocks the sun's light from reflecting off the moon, can occur only at the time of Full Moon.", 'Types of Moons. Full Moon names date back to Native Americans, of what is now the northern and eastern United States. The tribes kept track of the seasons by giving distinctive names to each recurring full Moon. Their names were applied to the entire month in which each occurred. There was some variation in the Moon names, but in general, the same ones were current throughout the Algonquin tribes from New England to Lake Superio... | [1, 1, 1, 0, 0, ...] |
| what is beryllium commonly combined with | ['Beryllium is an industrial metal with some attractive attributes. It’s lighter than aluminum and 6x stronger than steel. It’s usually combined with other metals and is a key component in the aerospace and electronics industries. Beryllium is also used in the production of nuclear weapons. With that, you may not be surprised to learn that beryllium is one of the most toxic elements in existence. Beryllium is a Class A EPA carcinogen and exposure can cause Chronic Beryllium Disease, an often fatal lung disease. ', 'Beryllium is found in about 30 different mineral species. The most important are beryl (beryllium aluminium silicate) and bertrandite (beryllium silicate). Emerald and aquamarine are precious forms of beryl. The metal is usually prepared by reducing beryllium fluoride with magnesium metal. Uses. Beryllium is used in alloys with copper or nickel to make gyroscopes, springs, electrical contacts, spot-welding electrodes and non-sparking tools. Mixing beryllium with these metals... | [1, 0, 0, 0, 0, ...] |
| is turkish coffee healthy | ["Calories, Fat and Other Basics. A serving of Turkish coffee contains about 46 calories. Though the drink doesn't contain any fat, it also doesn't supply any fiber or protein, two key nutrients needed for good health. The coffee doesn't supply an impressive amount of calcium or iron either. A blend of strong coffee, sugar and cardamom, Turkish coffee is more of a sweet treat than something similar to a regular cup of coffee. While there are certain health benefits from the coffee and cardamom, sugar is a major drawback when it comes to the nutritional benefits of the drink", "A serving of Turkish coffee contains about 11.5 grams of sugar, which is equal to almost 3 teaspoons. That's half of the 6 teaspoons women should limit themselves to each day and one-third of the 9 teaspoons men should set as their daily upper limit, according to the American Heart Association. A blend of strong coffee, sugar and cardamom, Turkish coffee is more of a sweet treat than something similar to a regula... | [1, 1, 0, 0, 0, ...] |
- Loss: `ListNetLoss` with these parameters:

  ```json
  {
      "eps": 1e-10,
      "pad_value": -1,
      "activation_fct": "torch.nn.modules.linear.Identity"
  }
  ```
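ListNetLoss with the identity activation minimizes the cross-entropy between the top-one softmax distribution of the labels and that of the predicted scores. A numpy sketch of that objective using the `eps` and `pad_value` settings above (an illustration of the formula, not the library's implementation):

```python
import numpy as np

def listnet_loss(scores, labels, eps=1e-10, pad_value=-1):
    # Drop padded list entries (labels == pad_value)
    mask = labels != pad_value
    s, y = scores[mask], labels[mask].astype(float)
    # Top-one softmax distributions over the candidate list
    p_true = np.exp(y - y.max()) / np.exp(y - y.max()).sum()
    p_pred = np.exp(s - s.max()) / np.exp(s - s.max()).sum()
    # Cross-entropy of the predicted distribution against the target one
    return float(-np.sum(p_true * np.log(p_pred + eps)))

labels = np.array([1, 1, 0, -1])        # last entry is padding
good = np.array([2.0, 1.5, -1.0, 0.0])  # scores agreeing with the labels
bad = np.array([-1.0, 0.0, 2.0, 0.0])   # scores inverting the labels
print(listnet_loss(good, labels) < listnet_loss(bad, labels))  # True
```

Because the target is a distribution over the whole candidate list, the loss rewards putting high scores on all relevant passages at once rather than comparing pairs.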
### Evaluation Dataset

#### ms_marco

- Dataset: ms_marco at a47ee7a
- Size: 1,000 evaluation samples
- Columns: `query`, `docs`, and `labels`
- Approximate statistics based on the first 1000 samples:

| | query | docs | labels |
|:--------|:-------|:-----|:-------|
| type | string | list | list |
| details | min: 10 characters, mean: 33.81 characters, max: 110 characters | size: 10 elements | size: 10 elements |
- Samples:

| query | docs | labels |
|:------|:-----|:-------|
| what is a fishy smell on humans | ["Trimethylaminuria (TMAU), also known as fish odor syndrome or fish malodor syndrome, is a rare metabolic disorder where Trimethylamine is released in the person's sweat, urine, and breath, giving off a strong fishy odor or strong body odor. Body odor is generally considered to be an unpleasant odor among many human cultures.", "The trimethylamine is released in the person's sweat, urine, reproductive fluids, and breath, giving off a strong fishy or body odor. Some people with trimethylaminuria have a strong odor all the time, but most have a moderate smell that varies in intensity over time. Although FMO3 mutations account for most known cases of trimethylaminuria, some cases are caused by other factors. A fish-like body odor could result from an excess of certain proteins in the diet or from an increase in bacteria in the digestive system.", 'Trimethylaminuria is a disorder in which the body is unable to break down trimethylamine, a chemical compound that has a pungent odor. Trimeth... | [1, 0, 0, 0, 0, ...] |
| how to cut woodworking joints | ['The tails and pins interlock to form a strong 90-degree joint. Dovetail joints are technically complex and are often used to create drawer boxes for furniture. Through mortise and tenon – To form this joint, a round or square hole (called a mortise) is cut through the side of one piece of wood. The end of the other piece of wood is cut to have a projection (the tenon) that matches the mortise. The tenon is placed into the mortise, projecting out from the other side of the wood. A wedge is hammered into a hole in the tenon. The wedge keeps the tenon from sliding out of the mortise.', "Wood joinery is simply the method by which two pieces of wood are connected. In many cases, the appearance of a joint becomes at least as important as it's strength. Wood joinery encompasses everything from intricate half-blind dovetails to connections that are simply nailed, glued or screwed. How to Use Biscuit Joints. Share. Doweling as a method of joinery is simple: a few dowels are glued into matchin... | [1, 0, 0, 0, 0, ...] |
| how long does it take to be a paramedic | ['In Kansas, you first have to take an EMT course which is roughly 6 months long depending on where you take the course. Then you have to take A&P, English Comp, Sociology, Algebra, & Interpersonal Communication as pre-requisites for Paramedic. EMT is 110 hours which can be done in 3 weeks or dragged out for several months by just going one night per week to class. The Paramedic is 600 - 1200 hours in length depending on the state and averages about 6 - 9 months of training.', 'Coursework and training to become an EMT-basic or first responder can generally be completed in as little as three weeks on an accelerated basis. For part-time students, these programs may take around 8-11 weeks to complete. To become an EMT-intermediate 1985 or 1999, students generally must complete 30-350 hours of training. This training requirement varies according to the procedures the state allows these EMTs to perform.', 'How long does it take to be a paramedic depends on the area of study and the skill on... | [1, 0, 0, 0, 0, ...] |
- Loss: `ListNetLoss` with these parameters:

  ```json
  {
      "eps": 1e-10,
      "pad_value": -1,
      "activation_fct": "torch.nn.modules.linear.Identity"
  }
  ```
## Training Hyperparameters

### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True
### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
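The `linear` scheduler with `warmup_ratio: 0.1` implies the usual warmup-then-decay learning-rate shape. A stdlib sketch of the learning rate at a given optimizer step (a generic reconstruction of that schedule, not the exact Transformers implementation):

```python
def lr_at_step(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at the end of training
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 39_000  # roughly the number of optimizer steps in the training logs below
print(lr_at_step(0, total), lr_at_step(3_900, total), lr_at_step(39_000, total))
```

The peak rate of 2e-05 is reached after the first 10% of steps, then decays linearly to zero.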
## Training Logs

Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
---|---|---|---|---|---|---|---|
-1 | -1 | - | - | 0.0155 (-0.5250) | 0.3609 (+0.0358) | 0.0410 (-0.4597) | 0.1391 (-0.3163) |
0.0001 | 1 | 2.1559 | - | - | - | - | - |
0.0762 | 1000 | 2.0862 | - | - | - | - | - |
0.1525 | 2000 | 2.0787 | - | - | - | - | - |
0.2287 | 3000 | 2.0785 | - | - | - | - | - |
0.3049 | 4000 | 2.0738 | 2.0755 | 0.5129 (-0.0276) | 0.3371 (+0.0120) | 0.5561 (+0.0555) | 0.4687 (+0.0133) |
0.3812 | 5000 | 2.0828 | - | - | - | - | - |
0.4574 | 6000 | 2.0711 | - | - | - | - | - |
0.5336 | 7000 | 2.072 | - | - | - | - | - |
0.6098 | 8000 | 2.0721 | 2.0734 | 0.5627 (+0.0222) | 0.3547 (+0.0296) | 0.5691 (+0.0684) | 0.4955 (+0.0401) |
0.6861 | 9000 | 2.0714 | - | - | - | - | - |
0.7623 | 10000 | 2.0744 | - | - | - | - | - |
0.8385 | 11000 | 2.0708 | - | - | - | - | - |
**0.9148** | **12000** | **2.0705** | **2.0732** | **0.5573 (+0.0168)** | **0.3623 (+0.0373)** | **0.6499 (+0.1492)** | **0.5231 (+0.0678)** |
0.9910 | 13000 | 2.0721 | - | - | - | - | - |
1.0672 | 14000 | 2.065 | - | - | - | - | - |
1.1435 | 15000 | 2.0732 | - | - | - | - | - |
1.2197 | 16000 | 2.07 | 2.0729 | 0.5673 (+0.0269) | 0.3563 (+0.0312) | 0.5877 (+0.0870) | 0.5038 (+0.0484) |
1.2959 | 17000 | 2.0707 | - | - | - | - | - |
1.3722 | 18000 | 2.0719 | - | - | - | - | - |
1.4484 | 19000 | 2.0687 | - | - | - | - | - |
1.5246 | 20000 | 2.0675 | 2.0730 | 0.5633 (+0.0228) | 0.3264 (+0.0014) | 0.5949 (+0.0943) | 0.4949 (+0.0395) |
1.6009 | 21000 | 2.0698 | - | - | - | - | - |
1.6771 | 22000 | 2.0685 | - | - | - | - | - |
1.7533 | 23000 | 2.0683 | - | - | - | - | - |
1.8295 | 24000 | 2.0667 | 2.0731 | 0.5571 (+0.0166) | 0.3521 (+0.0271) | 0.6319 (+0.1313) | 0.5137 (+0.0583) |
1.9058 | 25000 | 2.0665 | - | - | - | - | - |
1.9820 | 26000 | 2.0707 | - | - | - | - | - |
2.0582 | 27000 | 2.0663 | - | - | - | - | - |
2.1345 | 28000 | 2.0672 | 2.0739 | 0.5543 (+0.0139) | 0.3346 (+0.0096) | 0.5958 (+0.0952) | 0.4949 (+0.0395) |
2.2107 | 29000 | 2.0661 | - | - | - | - | - |
2.2869 | 30000 | 2.0681 | - | - | - | - | - |
2.3632 | 31000 | 2.0626 | - | - | - | - | - |
2.4394 | 32000 | 2.0642 | 2.0745 | 0.5791 (+0.0387) | 0.3347 (+0.0097) | 0.6386 (+0.1380) | 0.5175 (+0.0621) |
2.5156 | 33000 | 2.0635 | - | - | - | - | - |
2.5919 | 34000 | 2.0648 | - | - | - | - | - |
2.6681 | 35000 | 2.0615 | - | - | - | - | - |
2.7443 | 36000 | 2.0626 | 2.0736 | 0.5735 (+0.0331) | 0.3288 (+0.0038) | 0.6205 (+0.1198) | 0.5076 (+0.0522) |
2.8206 | 37000 | 2.0621 | - | - | - | - | - |
2.8968 | 38000 | 2.0664 | - | - | - | - | - |
2.9730 | 39000 | 2.0621 | - | - | - | - | - |
-1 | -1 | - | - | 0.5573 (+0.0168) | 0.3623 (+0.0373) | 0.6499 (+0.1492) | 0.5231 (+0.0678) |
- The bold row denotes the saved checkpoint.
## Environmental Impact

Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).

- **Energy Consumed**: 0.529 kWh
- **Carbon Emitted**: 0.205 kg of CO2
- **Hours Used**: 1.686 hours
### Training Hardware

- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB
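The energy and emissions figures in the metadata imply the grid carbon intensity under which training ran; a quick check, dividing emitted grams of CO2 by consumed kWh:

```python
energy_kwh = 0.5286324046031189   # energy consumed, from the metadata
emissions_g = 205.4804729340415   # grams of CO2 emitted, from the metadata
intensity = emissions_g / energy_kwh
print(round(intensity))  # grid carbon intensity in g CO2 per kWh
```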
## Framework Versions

- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.48.3
- PyTorch: 2.5.0+cu121
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### ListNetLoss

```bibtex
@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
```