CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a Cross Encoder model fine-tuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes a score for each pair of texts, such as a query and a candidate passage. Given the training data and the reranking evaluation reported below, its primary uses are text reranking and semantic search.

Model Details

Model Description

Model Type: Cross Encoder
Base model: microsoft/MiniLM-L12-H384-uncased
Maximum Sequence Length: 512 tokens
Number of Output Labels: 1 label
Training Dataset:
- ms_marco
Language: en

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-sigmoid")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]

Evaluation

Metrics

Cross Encoder Reranking

Datasets: NanoMSMARCO, NanoNFCorpus and NanoNQ
Evaluated with CrossEncoderRerankingEvaluator

Metric	NanoMSMARCO	NanoNFCorpus	NanoNQ
map	0.5282 (+0.0387)	0.3272 (+0.0662)	0.5598 (+0.1402)
mrr@10	0.5156 (+0.0381)	0.5520 (+0.0521)	0.5627 (+0.1360)
ndcg@10	0.5787 (+0.0383)	0.3588 (+0.0337)	0.6221 (+0.1215)
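
For readers unfamiliar with the metrics in this table, here is a minimal sketch of mrr@10 and ndcg@10 for a single query with binary relevance labels. It is an illustration only, not the CrossEncoderRerankingEvaluator implementation; the function names and the toy label list are hypothetical.

```python
import math

def mrr_at_k(ranked_labels, k=10):
    """Reciprocal rank of the first relevant item in the top k (0.0 if none)."""
    for position, label in enumerate(ranked_labels[:k], start=1):
        if label > 0:
            return 1.0 / position
    return 0.0

def dcg_at_k(ranked_labels, k=10):
    """Discounted cumulative gain over the top k positions."""
    return sum(
        label / math.log2(position + 1)
        for position, label in enumerate(ranked_labels[:k], start=1)
    )

def ndcg_at_k(ranked_labels, k=10):
    """DCG normalised by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0

# Hypothetical labels, listed in the order the model ranked the documents.
ranked = [0, 1, 0, 1, 0]
print(mrr_at_k(ranked))   # 0.5 (first relevant document is at rank 2)
print(ndcg_at_k(ranked))
```

The reported table values are these per-query scores averaged over all queries in each dataset.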

Cross Encoder Nano BEIR

Dataset: NanoBEIR_R100_mean
Evaluated with CrossEncoderNanoBEIREvaluator

Metric	Value
map	0.4717 (+0.0817)
mrr@10	0.5434 (+0.0754)
ndcg@10	0.5199 (+0.0645)
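
The NanoBEIR_R100_mean values appear to be the unweighted mean of the three per-dataset scores from the Cross Encoder Reranking table. The check below reproduces them from the rounded table values; this is an observation from the reported numbers, not a statement about the evaluator's internals.

```python
# Per-dataset values copied from the Cross Encoder Reranking table
# (order: NanoMSMARCO, NanoNFCorpus, NanoNQ).
per_dataset = {
    "map": [0.5282, 0.3272, 0.5598],
    "mrr@10": [0.5156, 0.5520, 0.5627],
    "ndcg@10": [0.5787, 0.3588, 0.6221],
}

# Unweighted mean across the three datasets, rounded like the table.
means = {
    metric: round(sum(values) / len(values), 4)
    for metric, values in per_dataset.items()
}
print(means)  # {'map': 0.4717, 'mrr@10': 0.5434, 'ndcg@10': 0.5199}
```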

Training Details

Training Dataset

ms_marco

Dataset: ms_marco at a47ee7a
Size: 78,704 training samples
Columns: query, docs, and labels
Approximate statistics based on the first 1000 samples:

	query	docs	labels
type	string	list	list
details	min: 11 characters mean: 33.13 characters max: 90 characters	size: 10 elements	size: 10 elements

Samples:

query	docs	labels
`how much does the average funeral cost`	['Average Funeral Costs. According to the Federal Trade Commission, the average funeral costs in the United States can be well over $10,000 by the time you add floral arrangements, prayer cards and family transportation. Traditionally, when people think of funeral expenses they think of things like a casket and flowers. Headstones often start around $500 and run upwards of $4000. The materials used for construction contribute to a wide price range. An average granite headstone in 2009 cost about $1500. If your loved one did not already have a cemetery plot, you will need to purchase one. Prices of cemetery plots depend on location. They start as low as a few hundred dollars and can be upward of a few thousand', 'National average funeral cost: $5,000-$15,000 CAD. In Canada, the price of a funeral varies greatly by area and method. A basic cremation service in Toronto can cost $1,470 CAD to cut costs. Many families are opting to choose at-home funerals. National average funeral cost: $6,...	`[1, 1, 0, 0, 0, ...]`
`what is the seven sisters constellation`	["The Seven Sisters is a small grouping of stars better known as the Pleiades, in the constellation Taurus. It is a group of six to seven stars (with the naked eye) and about 36 … stars (with binoculars) easily viewable on any clear night in the wintertime or very early spring. The Seven Sisters is what is known as an open cluster of stars. It's also known as The Pleiades. (PLEE-uh-DEES). 3 people found this useful. Edit. Share to: 1 The Periodic Table of Elements Life is sustained by a number of chemical elements.", 'For other uses of Pleiades or Pleiades, pléiades See (pleiades) . Disambiguation in, astronomy The (/pleiades.ˈplaɪ/ ədiːz /or.ˈpliː/), ədiːz Or Seven (Sisters messier 45 Or), m45 is an open star cluster containing-middle aged Hot-b type stars located in the constellation Of. taurus The nine brightest stars of the Pleiades are named for the Seven Sisters of Greek mythology: Sterope, Merope, Electra, Maia, Taygeta, Celaeno, and Alcyone, along with their parents Atlas and ...	`[1, 0, 0, 0, 0, ...]`
`the name nicole means`	["Nicole is a feminine given name and a surname. The given name Nicole is of Greek origin and means victorious people. It's evolved into a French feminine derivative of the masculine given name Nicolas. There are many variants. The surname Nicole originates in Netherlands where it was notable for its various branches, and associated status or influenc", "Nicole is a feminine given name and a surname.The given name Nicole is of Greek origin and means victorious people. It's evolved into a French feminine derivative of the masculine given name Nicholas.The given name Nicole is of Greek origin and means v. A feminine form of Nicolas, which is from the Greek Nikolaos, a compound name composed of the elements nikē (victory) and laos (the people): hence, victory of the people. Cole, Ercole, Micole, Niocole, Nycole. Nicole is a feminine given name and a surname.The given name Nicole is of Greek origin and means victorious people. It's evolved into a French feminine derivative of the masculine...	`[1, 0, 0, 0, 0, ...]`

Loss: ListNetLoss with these parameters:

{
    "pad_value": -1,
    "activation_fct": "torch.nn.modules.activation.Sigmoid"
}
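
To make these parameters concrete, the following is a minimal pure-Python sketch of the ListNet top-one loss for one query: a cross-entropy between the softmax of the labels and the softmax of the activated scores, with positions labelled `pad_value` ignored. It assumes that formulation from the cited paper; the exact masking and batch reduction in sentence-transformers may differ, and all function names and example numbers here are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def listnet_top_one_loss(logits, labels, pad_value=-1):
    """Cross-entropy between softmax(labels) and softmax(sigmoid(logits))."""
    # Ignore padded positions (assumed to be marked with pad_value in labels).
    kept = [(sigmoid(z), float(y)) for z, y in zip(logits, labels) if y != pad_value]
    scores = [s for s, _ in kept]
    targets = [y for _, y in kept]
    target_probs = softmax(targets)
    score_probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target_probs, score_probs))

# Hypothetical raw model outputs for one query: three documents plus one padded slot.
logits = [2.0, -1.0, 0.5, 0.0]
labels = [1, 0, 0, -1]
loss = listnet_top_one_loss(logits, labels)
print(loss)
```

Because the sigmoid confines scores to the interval (0, 1), the softmax over them is never far from uniform. This may explain why the training loss in the logs moves within a narrow band.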

Evaluation Dataset

ms_marco

Dataset: ms_marco at a47ee7a
Size: 1,000 evaluation samples
Columns: query, docs, and labels
Approximate statistics based on the first 1000 samples:

	query	docs	labels
type	string	list	list
details	min: 10 characters mean: 33.86 characters max: 100 characters	size: 10 elements	size: 10 elements

Samples:

query	docs	labels
`how much does a adjunct professor get paid`	['The average per-course pay reported for adjuncts at Ohio State University is $4,853, compared with an average of $6,500 reported at the University of Michigan at Ann Arbor. Harvard pays adjuncts $11,037, on average, according to the data that adjuncts have submitted so far. Many adjuncts have also indicated that they are essentially shut out of participating in most forms of governance. The overall average pay reported by adjuncts is $2,987 per three-credit course. Adjuncts at 16 colleges reported earning less than $1,000. The highest pay reported is $12,575, in the anthropology department at Harvard University', "Not surprisingly, at community colleges, adjuncts said they are paid much less. At Houston Community College, adjuncts reported earning between $1,200 and $2,200 for a three-credit English course. In some departments, adjuncts said anecdotally that pay depends on the degree held. One adjunct professor in history, for example, reported that where he or she works, instructors...	`[1, 1, 0, 0, 0, ...]`
`what a normal heart beat per minute`	["1 A normal adult resting heart beat is between 60-100 heartbeats per minute. 2 Some experienced athletes may see their resting heartrate fall below 60 beats per minute. 3 Tachycardia refers to the heart beating too fast at rest-over 100 beats per minute. 1 Your heart rate is the number of times per minute that the heart beats. 2 Heart rate rises significantly in response to adrenaline if a person is frightened or surprised. 3 Taking a person's pulse is a direct measure of heart rate.", 'Most adults have a resting heart rate of 60-100 beats per minute (bpm). The fitter you are, the lower your resting heart rate is likely to be. For example, athletes may have a resting heart rate of 40-60 bpm or lower. ', 'Even if you’re not an athlete, knowledge about your heart rate can help you monitor your fitness level — and it might even help you spot developing health problems. Your heart rate, or pulse, is the number of times your heart beats per minute. Normal heart rate varies from person...	`[1, 0, 0, 0, 0, ...]`
`what is sauterne wine`	['Sauternes is a French sweet wine from the Sauternais region of the Graves section in Bordeaux. Barsac lies within Sauternes, and is entitled to use either name. Somewhat similar but less expensive and typically less-distinguished wines are produced in the neighboring regions of Monbazillac, Cerons, Cérons loupiac And. cadillac', 'Sauternes is made from Semillon, Sémillon sauvignon, blanc And muscadelle grapes that have been affected By botrytis, cinerea also known as noble. rot Barsac lies within Sauternes, and is entitled to use either name. Somewhat similar but less expensive and typically less-distinguished wines are produced in the neighboring regions of Monbazillac, Cerons, Cérons loupiac And. cadillac', 'Sauternes, 40 miles (65km) south of Bordeaux city, is a village famous for its high-quality sweet wines. Although some wineries here produce dry wines, they sell them under appellations other than the sweet-specific Sauternes appellation. A half-bottle of top-quality, aged Saut...	`[1, 0, 0, 0, 0, ...]`

Loss: ListNetLoss with these parameters:

{
    "pad_value": -1,
    "activation_fct": "torch.nn.modules.activation.Sigmoid"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 6
per_device_eval_batch_size: 16
learning_rate: 2e-05
warmup_ratio: 0.1
seed: 12
bf16: True
load_best_model_at_end: True
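
As a rough worked example, these settings can be combined with the values reported elsewhere in this card (78,704 training samples, num_train_epochs: 3, lr_scheduler_type: linear, one GPU, gradient_accumulation_steps: 1) to estimate the step counts and the learning-rate schedule. This is a sketch under those assumptions: the `linear_lr` helper is hypothetical and only mirrors the usual linear warmup and decay shape, and the trainer's exact rounding of warmup steps may differ.

```python
import math

train_samples = 78_704   # "Size: 78,704 training samples"
batch_size = 6           # per_device_train_batch_size, single GPU
epochs = 3               # num_train_epochs
base_lr = 2e-05          # learning_rate
warmup_ratio = 0.1       # warmup_ratio

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding

def linear_lr(step):
    """Linear warmup to base_lr, then linear decay to zero (hypothetical helper)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)  # 13118 39354 3936
print(round(20000 / steps_per_epoch, 4))           # 1.5246
```

The second printed value matches the epoch that the Training Logs table lists for step 20000.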

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 6
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 12
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	NanoMSMARCO_ndcg@10	NanoNFCorpus_ndcg@10	NanoNQ_ndcg@10	NanoBEIR_R100_mean_ndcg@10
-1	-1	-	-	0.0644 (-0.4760)	0.2696 (-0.0554)	0.0762 (-0.4245)	0.1367 (-0.3186)
0.0001	1	2.0486	-	-	-	-	-
0.0762	1000	2.0857	-	-	-	-	-
0.1525	2000	2.082	-	-	-	-	-
0.2287	3000	2.0758	-	-	-	-	-
0.3049	4000	2.0736	2.0711	0.5528 (+0.0124)	0.3638 (+0.0388)	0.6038 (+0.1032)	0.5068 (+0.0514)
0.3812	5000	2.0735	-	-	-	-	-
0.4574	6000	2.0685	-	-	-	-	-
0.5336	7000	2.0725	-	-	-	-	-
0.6098	8000	2.0746	2.0698	0.5251 (-0.0153)	0.3446 (+0.0195)	0.5542 (+0.0535)	0.4746 (+0.0193)
0.6861	9000	2.0759	-	-	-	-	-
0.7623	10000	2.0723	-	-	-	-	-
0.8385	11000	2.0787	-	-	-	-	-
0.9148	12000	2.0742	2.0697	0.5562 (+0.0157)	0.3700 (+0.0450)	0.6062 (+0.1056)	0.5108 (+0.0554)
0.9910	13000	2.0736	-	-	-	-	-
1.0672	14000	2.0715	-	-	-	-	-
1.1435	15000	2.073	-	-	-	-	-
1.2197	16000	2.0699	2.0696	0.5532 (+0.0128)	0.3634 (+0.0384)	0.6355 (+0.1349)	0.5174 (+0.0620)
1.2959	17000	2.0671	-	-	-	-	-
1.3722	18000	2.0682	-	-	-	-	-
1.4484	19000	2.0702	-	-	-	-	-
1.5246	20000	2.0699	2.0695	0.5787 (+0.0383)	0.3588 (+0.0337)	0.6221 (+0.1215)	0.5199 (+0.0645)
1.6009	21000	2.0689	-	-	-	-	-
1.6771	22000	2.0699	-	-	-	-	-
1.7533	23000	2.0667	-	-	-	-	-
1.8295	24000	2.0694	2.0694	0.5411 (+0.0006)	0.3817 (+0.0567)	0.6167 (+0.1161)	0.5132 (+0.0578)
1.9058	25000	2.0632	-	-	-	-	-
1.9820	26000	2.0721	-	-	-	-	-
2.0582	27000	2.0647	-	-	-	-	-
2.1345	28000	2.0688	2.0702	0.5746 (+0.0342)	0.3845 (+0.0594)	0.5908 (+0.0901)	0.5166 (+0.0613)
2.2107	29000	2.0635	-	-	-	-	-
2.2869	30000	2.0665	-	-	-	-	-
2.3632	31000	2.0643	-	-	-	-	-
2.4394	32000	2.0681	2.0699	0.5529 (+0.0125)	0.3690 (+0.0440)	0.5725 (+0.0718)	0.4981 (+0.0428)
2.5156	33000	2.0642	-	-	-	-	-
2.5919	34000	2.0613	-	-	-	-	-
2.6681	35000	2.0673	-	-	-	-	-
2.7443	36000	2.0641	2.0696	0.5534 (+0.0130)	0.3701 (+0.0451)	0.5639 (+0.0632)	0.4958 (+0.0404)
2.8206	37000	2.0645	-	-	-	-	-
2.8968	38000	2.0673	-	-	-	-	-
2.9730	39000	2.0656	-	-	-	-	-
-1	-1	-	-	0.5787 (+0.0383)	0.3588 (+0.0337)	0.6221 (+0.1215)	0.5199 (+0.0645)

The saved checkpoint is the evaluation at step 20000 (shown in bold in the original table); its metrics match the final row.
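
With load_best_model_at_end: True, the final row repeats the step 20000 metrics. The check below, using values copied from the table, suggests that the checkpoint was selected on NanoBEIR_R100_mean_ndcg@10 rather than on validation loss. This is an inference from the logged numbers, since the selection metric is not listed among the hyperparameters in this card.

```python
# (step, validation loss, NanoBEIR_R100_mean_ndcg@10) for each evaluation row in the table.
evaluations = [
    (4000, 2.0711, 0.5068),
    (8000, 2.0698, 0.4746),
    (12000, 2.0697, 0.5108),
    (16000, 2.0696, 0.5174),
    (20000, 2.0695, 0.5199),
    (24000, 2.0694, 0.5132),
    (28000, 2.0702, 0.5166),
    (32000, 2.0699, 0.4981),
    (36000, 2.0696, 0.4958),
]

best_by_ndcg = max(evaluations, key=lambda row: row[2])
best_by_loss = min(evaluations, key=lambda row: row[1])
print(best_by_ndcg[0])  # 20000
print(best_by_loss[0])  # 24000
```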

Environmental Impact

Carbon emissions were measured using CodeCarbon.

Energy Consumed: 0.532 kWh
Carbon Emitted: 0.207 kg of CO2
Hours Used: 1.708 hours
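
Two derived figures follow directly from the reported numbers: the average power draw during training and the implied carbon intensity of the electricity. The arithmetic below uses only the three values above; the variable names are illustrative.

```python
energy_kwh = 0.532   # Energy Consumed
co2_kg = 0.207       # Carbon Emitted
hours = 1.708        # Hours Used

avg_power_w = energy_kwh / hours * 1000      # average power in watts
intensity_kg_per_kwh = co2_kg / energy_kwh   # kg of CO2 per kWh

print(round(avg_power_w), round(intensity_kg_per_kwh, 3))  # 311 0.389
```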

Training Hardware

On Cloud: No
GPU Model: 1 x NVIDIA GeForce RTX 3090
CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
RAM Size: 31.78 GB

Framework Versions

Python: 3.11.6
Sentence Transformers: 3.5.0.dev0
Transformers: 4.48.3
PyTorch: 2.5.0+cu121
Accelerate: 1.4.0
Datasets: 3.3.2
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
