---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:78704
- loss:ListNetLoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 205.4804729340415
  energy_consumed: 0.5286324046031189
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 1.686
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results: []
---
# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description

- **Model Type:** Cross Encoder
- **Base model:** microsoft/MiniLM-L12-H384-uncased
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:** ms_marco
- **Language:** en
### Model Sources

- **Documentation:** Sentence Transformers Documentation
- **Documentation:** Cross Encoder Documentation
- **Repository:** Sentence Transformers on GitHub
- **Hugging Face:** Cross Encoders on Hugging Face
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
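Once per-pair scores are computed, the ranking step itself reduces to sorting candidates by score. A minimal pure-Python sketch of that final step, mirroring the `[{'corpus_id': ..., 'score': ...}]` output shape (the passages and scores below are made up for illustration, not real model outputs):

```python
def rerank(passages, scores):
    # Order candidate passages from most to least relevant,
    # returning the same dict shape that model.rank produces.
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]

# Hypothetical scores, e.g. as returned by model.predict(pairs)
passages = ["passage A", "passage B", "passage C"]
scores = [0.12, 0.87, 0.45]
print(rerank(passages, scores))
```

`corpus_id` indexes into the original candidate list, so the caller can recover the passage text after sorting.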
## Evaluation

### Metrics

#### Cross Encoder Reranking

- Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
- Evaluated with `CrossEncoderRerankingEvaluator`

| Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ |
|:--------|:-----------------|:-----------------|:-----------------|
| map | 0.4847 (-0.0049) | 0.3325 (+0.0716) | 0.5967 (+0.1771) |
| mrr@10 | 0.4768 (-0.0007) | 0.5669 (+0.0670) | 0.6024 (+0.1757) |
| ndcg@10 | 0.5573 (+0.0168) | 0.3623 (+0.0373) | 0.6499 (+0.1492) |
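For reference, ndcg@10 scores like those above are computed from the ranked list of graded relevance labels per query. A stdlib sketch of the standard formula (an illustration of the metric, not the evaluator's implementation):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k ranked relevance labels
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (descending) ordering
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevant docs ranked 1st and 3rd out of five candidates
print(round(ndcg_at_k([1, 0, 1, 0, 0]), 4))  # 0.9197
```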
#### Cross Encoder Nano BEIR

- Dataset: `NanoBEIR_R100_mean`
- Evaluated with `CrossEncoderNanoBEIREvaluator`

| Metric | Value |
|:--------|:-----------------|
| map | 0.4713 (+0.0813) |
| mrr@10 | 0.5487 (+0.0807) |
| ndcg@10 | 0.5231 (+0.0678) |
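Similarly, mrr@10 averages the reciprocal rank of the first relevant document per query, counting zero when none appears in the top 10. A small sketch with hypothetical per-query ranks:

```python
def mrr_at_k(first_relevant_ranks, k=10):
    # first_relevant_ranks: 1-based rank of the first relevant doc per query,
    # or None when no relevant doc appears in the candidate list
    total = sum(1.0 / r for r in first_relevant_ranks if r is not None and r <= k)
    return total / len(first_relevant_ranks)

print(mrr_at_k([1, 2, None, 4]))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```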
## Training Details

### Training Dataset

#### ms_marco

- Dataset: ms_marco at a47ee7a
- Size: 78,704 training samples
- Columns: `query`, `docs`, and `labels`
- Approximate statistics based on the first 1000 samples:

| | query | docs | labels |
|:--------|:-------|:-----|:-------|
| type | string | list | list |
| details | min: 10 characters, mean: 33.93 characters, max: 99 characters | size: 10 elements | size: 10 elements |
- Samples:

| query | docs | labels |
|:------|:-----|:-------|
| what types of moons are there | ["The different types of moons are: Full Wolf Moon, Full Snow Moon, Full Worm Moon, paschal full moon, full pink moon, full flower moon, full strawberry moon, full buck moon, … full sturgeon moon, full harvest moon, full hunters moon, full beaver moon, full cold moon. The solar eclipse, when the moon blocks the sun's light from hitting the earth-creating a temporary blackout on earth, can occur only at the time of New Moon, while the lunar eclipse, when the earth blocks the sun's light from reflecting off the moon, can occur only at the time of Full Moon.", 'Types of Moons. Full Moon names date back to Native Americans, of what is now the northern and eastern United States. The tribes kept track of the seasons by giving distinctive names to each recurring full Moon. Their names were applied to the entire month in which each occurred. There was some variation in the Moon names, but in general, the same ones were current throughout the Algonquin tribes from New England to Lake Superio... | [1, 1, 1, 0, 0, ...] |
| what is beryllium commonly combined with | ['Beryllium is an industrial metal with some attractive attributes. It’s lighter than aluminum and 6x stronger than steel. It’s usually combined with other metals and is a key component in the aerospace and electronics industries. Beryllium is also used in the production of nuclear weapons. With that, you may not be surprised to learn that beryllium is one of the most toxic elements in existence. Beryllium is a Class A EPA carcinogen and exposure can cause Chronic Beryllium Disease, an often fatal lung disease. ', 'Beryllium is found in about 30 different mineral species. The most important are beryl (beryllium aluminium silicate) and bertrandite (beryllium silicate). Emerald and aquamarine are precious forms of beryl. The metal is usually prepared by reducing beryllium fluoride with magnesium metal. Uses. Beryllium is used in alloys with copper or nickel to make gyroscopes, springs, electrical contacts, spot-welding electrodes and non-sparking tools. Mixing beryllium with these metals... | [1, 0, 0, 0, 0, ...] |
| is turkish coffee healthy | ["Calories, Fat and Other Basics. A serving of Turkish coffee contains about 46 calories. Though the drink doesn't contain any fat, it also doesn't supply any fiber or protein, two key nutrients needed for good health. The coffee doesn't supply an impressive amount of calcium or iron either. A blend of strong coffee, sugar and cardamom, Turkish coffee is more of a sweet treat than something similar to a regular cup of coffee. While there are certain health benefits from the coffee and cardamom, sugar is a major drawback when it comes to the nutritional benefits of the drink", "A serving of Turkish coffee contains about 11.5 grams of sugar, which is equal to almost 3 teaspoons. That's half of the 6 teaspoons women should limit themselves to each day and one-third of the 9 teaspoons men should set as their daily upper limit, according to the American Heart Association. A blend of strong coffee, sugar and cardamom, Turkish coffee is more of a sweet treat than something similar to a regula... | [1, 1, 0, 0, 0, ...] |
- Loss: `ListNetLoss` with these parameters:

  ```json
  {
      "eps": 1e-10,
      "pad_value": -1,
      "activation_fct": "torch.nn.modules.linear.Identity"
  }
  ```
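ListNetLoss with the identity activation minimizes the cross-entropy between the top-one softmax distribution of the labels and that of the predicted scores. A numpy sketch of that objective using the `eps` and `pad_value` settings above (an illustration of the formula, not the library's implementation):

```python
import numpy as np

def listnet_loss(scores, labels, eps=1e-10, pad_value=-1):
    # Drop padded list entries (labels == pad_value)
    mask = labels != pad_value
    s, y = scores[mask], labels[mask].astype(float)
    # Top-one softmax distributions over the candidate list
    p_true = np.exp(y - y.max()) / np.exp(y - y.max()).sum()
    p_pred = np.exp(s - s.max()) / np.exp(s - s.max()).sum()
    # Cross-entropy of the predicted distribution against the target one
    return float(-np.sum(p_true * np.log(p_pred + eps)))

labels = np.array([1, 1, 0, -1])        # last entry is padding
good = np.array([2.0, 1.5, -1.0, 0.0])  # scores agreeing with the labels
bad = np.array([-1.0, 0.0, 2.0, 0.0])   # scores inverting the labels
print(listnet_loss(good, labels) < listnet_loss(bad, labels))  # True
```

Because the target is a distribution over the whole candidate list, the loss rewards putting high scores on all relevant passages at once rather than comparing pairs.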
### Evaluation Dataset

#### ms_marco

- Dataset: ms_marco at a47ee7a
- Size: 1,000 evaluation samples
- Columns: `query`, `docs`, and `labels`
- Approximate statistics based on the first 1000 samples:

| | query | docs | labels |
|:--------|:-------|:-----|:-------|
| type | string | list | list |
| details | min: 10 characters, mean: 33.81 characters, max: 110 characters | size: 10 elements | size: 10 elements |
- Samples:

| query | docs | labels |
|:------|:-----|:-------|
| what is a fishy smell on humans | ["Trimethylaminuria (TMAU), also known as fish odor syndrome or fish malodor syndrome, is a rare metabolic disorder where Trimethylamine is released in the person's sweat, urine, and breath, giving off a strong fishy odor or strong body odor. Body odor is generally considered to be an unpleasant odor among many human cultures.", "The trimethylamine is released in the person's sweat, urine, reproductive fluids, and breath, giving off a strong fishy or body odor. Some people with trimethylaminuria have a strong odor all the time, but most have a moderate smell that varies in intensity over time. Although FMO3 mutations account for most known cases of trimethylaminuria, some cases are caused by other factors. A fish-like body odor could result from an excess of certain proteins in the diet or from an increase in bacteria in the digestive system.", 'Trimethylaminuria is a disorder in which the body is unable to break down trimethylamine, a chemical compound that has a pungent odor. Trimeth... | [1, 0, 0, 0, 0, ...] |
| how to cut woodworking joints | ['The tails and pins interlock to form a strong 90-degree joint. Dovetail joints are technically complex and are often used to create drawer boxes for furniture. Through mortise and tenon – To form this joint, a round or square hole (called a mortise) is cut through the side of one piece of wood. The end of the other piece of wood is cut to have a projection (the tenon) that matches the mortise. The tenon is placed into the mortise, projecting out from the other side of the wood. A wedge is hammered into a hole in the tenon. The wedge keeps the tenon from sliding out of the mortise.', "Wood joinery is simply the method by which two pieces of wood are connected. In many cases, the appearance of a joint becomes at least as important as it's strength. Wood joinery encompasses everything from intricate half-blind dovetails to connections that are simply nailed, glued or screwed. How to Use Biscuit Joints. Share. Doweling as a method of joinery is simple: a few dowels are glued into matchin... | [1, 0, 0, 0, 0, ...] |
| how long does it take to be a paramedic | ['In Kansas, you first have to take an EMT course which is roughly 6 months long depending on where you take the course. Then you have to take A&P, English Comp, Sociology, Algebra, & Interpersonal Communication as pre-requisites for Paramedic. EMT is 110 hours which can be done in 3 weeks or dragged out for several months by just going one night per week to class. The Paramedic is 600 - 1200 hours in length depending on the state and averages about 6 - 9 months of training.', 'Coursework and training to become an EMT-basic or first responder can generally be completed in as little as three weeks on an accelerated basis. For part-time students, these programs may take around 8-11 weeks to complete. To become an EMT-intermediate 1985 or 1999, students generally must complete 30-350 hours of training. This training requirement varies according to the procedures the state allows these EMTs to perform.', 'How long does it take to be a paramedic depends on the area of study and the skill on... | [1, 0, 0, 0, 0, ...] |
- Loss: `ListNetLoss` with these parameters:

  ```json
  {
      "eps": 1e-10,
      "pad_value": -1,
      "activation_fct": "torch.nn.modules.linear.Identity"
  }
  ```
## Training Hyperparameters

### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True
### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
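The `linear` scheduler with `warmup_ratio: 0.1` implies the usual warmup-then-decay learning-rate shape. A stdlib sketch of the learning rate at a given optimizer step (a generic reconstruction of that schedule, not the exact Transformers implementation):

```python
def lr_at_step(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at the end of training
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 39_000  # roughly the number of optimizer steps in the training logs below
print(lr_at_step(0, total), lr_at_step(3_900, total), lr_at_step(39_000, total))
```

The peak rate of 2e-05 is reached after the first 10% of steps, then decays linearly to zero.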
## Training Logs

Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
---|---|---|---|---|---|---|---|
-1 | -1 | - | - | 0.0155 (-0.5250) | 0.3609 (+0.0358) | 0.0410 (-0.4597) | 0.1391 (-0.3163) |
0.0001 | 1 | 2.1559 | - | - | - | - | - |
0.0762 | 1000 | 2.0862 | - | - | - | - | - |
0.1525 | 2000 | 2.0787 | - | - | - | - | - |
0.2287 | 3000 | 2.0785 | - | - | - | - | - |
0.3049 | 4000 | 2.0738 | 2.0755 | 0.5129 (-0.0276) | 0.3371 (+0.0120) | 0.5561 (+0.0555) | 0.4687 (+0.0133) |
0.3812 | 5000 | 2.0828 | - | - | - | - | - |
0.4574 | 6000 | 2.0711 | - | - | - | - | - |
0.5336 | 7000 | 2.072 | - | - | - | - | - |
0.6098 | 8000 | 2.0721 | 2.0734 | 0.5627 (+0.0222) | 0.3547 (+0.0296) | 0.5691 (+0.0684) | 0.4955 (+0.0401) |
0.6861 | 9000 | 2.0714 | - | - | - | - | - |
0.7623 | 10000 | 2.0744 | - | - | - | - | - |
0.8385 | 11000 | 2.0708 | - | - | - | - | - |
**0.9148** | **12000** | **2.0705** | **2.0732** | **0.5573 (+0.0168)** | **0.3623 (+0.0373)** | **0.6499 (+0.1492)** | **0.5231 (+0.0678)** |
0.9910 | 13000 | 2.0721 | - | - | - | - | - |
1.0672 | 14000 | 2.065 | - | - | - | - | - |
1.1435 | 15000 | 2.0732 | - | - | - | - | - |
1.2197 | 16000 | 2.07 | 2.0729 | 0.5673 (+0.0269) | 0.3563 (+0.0312) | 0.5877 (+0.0870) | 0.5038 (+0.0484) |
1.2959 | 17000 | 2.0707 | - | - | - | - | - |
1.3722 | 18000 | 2.0719 | - | - | - | - | - |
1.4484 | 19000 | 2.0687 | - | - | - | - | - |
1.5246 | 20000 | 2.0675 | 2.0730 | 0.5633 (+0.0228) | 0.3264 (+0.0014) | 0.5949 (+0.0943) | 0.4949 (+0.0395) |
1.6009 | 21000 | 2.0698 | - | - | - | - | - |
1.6771 | 22000 | 2.0685 | - | - | - | - | - |
1.7533 | 23000 | 2.0683 | - | - | - | - | - |
1.8295 | 24000 | 2.0667 | 2.0731 | 0.5571 (+0.0166) | 0.3521 (+0.0271) | 0.6319 (+0.1313) | 0.5137 (+0.0583) |
1.9058 | 25000 | 2.0665 | - | - | - | - | - |
1.9820 | 26000 | 2.0707 | - | - | - | - | - |
2.0582 | 27000 | 2.0663 | - | - | - | - | - |
2.1345 | 28000 | 2.0672 | 2.0739 | 0.5543 (+0.0139) | 0.3346 (+0.0096) | 0.5958 (+0.0952) | 0.4949 (+0.0395) |
2.2107 | 29000 | 2.0661 | - | - | - | - | - |
2.2869 | 30000 | 2.0681 | - | - | - | - | - |
2.3632 | 31000 | 2.0626 | - | - | - | - | - |
2.4394 | 32000 | 2.0642 | 2.0745 | 0.5791 (+0.0387) | 0.3347 (+0.0097) | 0.6386 (+0.1380) | 0.5175 (+0.0621) |
2.5156 | 33000 | 2.0635 | - | - | - | - | - |
2.5919 | 34000 | 2.0648 | - | - | - | - | - |
2.6681 | 35000 | 2.0615 | - | - | - | - | - |
2.7443 | 36000 | 2.0626 | 2.0736 | 0.5735 (+0.0331) | 0.3288 (+0.0038) | 0.6205 (+0.1198) | 0.5076 (+0.0522) |
2.8206 | 37000 | 2.0621 | - | - | - | - | - |
2.8968 | 38000 | 2.0664 | - | - | - | - | - |
2.9730 | 39000 | 2.0621 | - | - | - | - | - |
-1 | -1 | - | - | 0.5573 (+0.0168) | 0.3623 (+0.0373) | 0.6499 (+0.1492) | 0.5231 (+0.0678) |
- The bold row denotes the saved checkpoint.
## Environmental Impact

Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).

- **Energy Consumed**: 0.529 kWh
- **Carbon Emitted**: 0.205 kg of CO2
- **Hours Used**: 1.686 hours
### Training Hardware

- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB
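The energy and emissions figures in the metadata imply the grid carbon intensity under which training ran; a quick check, dividing emitted grams of CO2 by consumed kWh:

```python
energy_kwh = 0.5286324046031189   # energy consumed, from the metadata
emissions_g = 205.4804729340415   # grams of CO2 emitted, from the metadata
intensity = emissions_g / energy_kwh
print(round(intensity))  # grid carbon intensity in g CO2 per kWh
```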
## Framework Versions

- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.48.3
- PyTorch: 2.5.0+cu121
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### ListNetLoss

```bibtex
@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
```