SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-mpnet-base-v2
- Maximum Sequence Length: 384 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: csv
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
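As a rough illustration of what the Pooling (mean tokens) and Normalize modules above do, the sketch below averages token vectors under an attention mask and then L2-normalizes the result. It uses plain Python lists and toy 3-dimensional vectors instead of real 768-dimensional MPNet outputs, and the function names are hypothetical, not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (mean-token pooling)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [t / count for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: three token vectors, the last one is padding (mask 0).
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final module normalizes every embedding to unit length, the cosine similarity of two embeddings equals their dot product.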
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("carnival13/all-mpnet-base-v2-modulepred")
# Run inference
sentences = [
    'module: bread ambient group: bread ambient supergroup: food ambient example descriptions: 1 war 3 toastie 400 g cc 90 varburtons bread tovis snelwrspmpkin 800 g warbutons medium bread spk giant crumpets z hovis med wht 600 g sandwich thins 5 pk warb pk crumpets mission plain tortilla 25 cm warburtons 4 protein thin bagels hovis soft wet med hovis wholemefl pataks pappadums 6 pk warb so bth disc pappajuns',
    'retailer: crispcorner description: kingsmill 5050 medius bread 800 g',
    'retailer: vitalveg description: ready to eat prun',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
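The similarity matrix can be turned into a ranking of candidate module documents per query. The sketch below is one possible way to do that on plain nested lists; `top_k_matches` is a hypothetical helper and the scores are made-up values. With the real model, rows like these could presumably be obtained by encoding queries and module documents separately and calling `.tolist()` on the result of `model.similarity`.

```python
def top_k_matches(similarity_rows, k=3):
    """For each query row, return the k best (doc_index, score) pairs,
    sorted by descending score."""
    results = []
    for row in similarity_rows:
        ranked = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)
        results.append(ranked[:k])
    return results

# Toy scores: 2 queries x 3 candidate module documents (made-up values).
toy_scores = [
    [0.12, 0.81, 0.33],
    [0.64, 0.05, 0.70],
]

matches = top_k_matches(toy_scores, k=2)
print(matches)
```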
Evaluation
Metrics
Information Retrieval
- Dataset: sentence-transformers/all-mpnet-base-v2
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.4988 |
cosine_accuracy@3 | 0.6342 |
cosine_accuracy@5 | 0.7102 |
cosine_accuracy@10 | 0.7838 |
cosine_precision@1 | 0.4988 |
cosine_precision@3 | 0.2114 |
cosine_precision@5 | 0.142 |
cosine_precision@10 | 0.0784 |
cosine_recall@1 | 0.4988 |
cosine_recall@3 | 0.6342 |
cosine_recall@5 | 0.7102 |
cosine_recall@10 | 0.7838 |
cosine_ndcg@10 | 0.6324 |
cosine_mrr@10 | 0.585 |
cosine_map@100 | 0.591 |
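The numbers in this table are consistent with each query having exactly one relevant document: accuracy@k and recall@k coincide, and precision@k equals accuracy@k divided by k (for example 0.6342 / 3 ≈ 0.2114). This is an inference from the values, not something the card states. Under that assumption, the metrics can be sketched from the rank of the relevant document per query; the helper name and the toy ranks are hypothetical.

```python
def single_relevant_metrics(ranks, k):
    """Compute accuracy/precision/recall/MRR at cutoff k.

    ranks: 1-based rank of the single relevant document for each query,
           or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r for r in ranks if r is not None and r <= k]
    accuracy = len(hits) / n
    return {
        "accuracy": accuracy,           # at least one relevant doc in top k
        "precision": accuracy / k,      # one relevant doc out of k results
        "recall": accuracy,             # the only relevant doc was found
        "mrr": sum(1.0 / r for r in hits) / n,
    }

# Toy ranks for 8 queries (made-up values).
toy_ranks = [1, 2, 1, 7, None, 3, 1, 4]
m3 = single_relevant_metrics(toy_ranks, k=3)
print(m3)
```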
Training Details
Training Dataset
csv
- Dataset: csv
- Size: 505,654 training samples
- Columns: query and full_doc
- Approximate statistics based on the first 1000 samples:
Statistic | query | full_doc |
---|---|---|
type | string | string |
details | min: 10 tokens, mean: 14.8 tokens, max: 23 tokens | min: 83 tokens, mean: 115.71 tokens, max: 176 tokens |
- Samples:
query | full_doc |
---|---|
retailer: vitalveg description: twin xira | module: chocolate single variety group: chocolate chocolate substitutes supergroup: biscuits & confectionery & snacks example descriptions: milky way twin 43 crml prtzlarum rai galaxy mnstr pipnut 34 g dark pb cup nest mnch foge p nestle smarties shar dark choc chun x 10 pk kinder bueno 1 dr oetker 72 da poppets choc offee pouch yorkie biscuit zpk haltesers truffles bog cadbury mini snowballs p terrys choc orange 3435 g galaxy fusion dark 704 100 g |
retailer: freshnosh description: mab pop sockt | module: clothing & personal accessories group: clothing & personal accessories supergroup: clothing & personal accessories example descriptions: pk blue trad ging 40 d 3 pk opaque tight t 74 green cali jogger ss animal swing yb denim stripe pump aw 21 ff vest aw 21 girls 5 pk lounge toplo sku 1 pk fleecy tight knitted pom hat pk briefs timeless double pom pomkids hat cute face twosie sku coral jersey str pun faded petrol t 32 seamfree waist c |
retailer: nourify description: bts prwn ckt swch | module: bread sandwiches filled rolls wraps group: bread fresh fixed weight supergroup: food perishable example descriptions: us chicken may hamche sw jo dbs allbtr pp st 4 js baconfree ran posh cheesy bea naturify cb swich sp eggcress f cpdfeggbacon js cheeseonion sv duck wrap reduced price takeout egg mayo sandwich 7 takeout cheeseonion s wich 2 ad leicester plough bts cheese pman 2 1 cp bacon chese s |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
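To make the loss parameters concrete, the following is a minimal sketch of an in-batch-negatives ranking loss in the spirit of MultipleNegativesRankingLoss: cosine similarities are multiplied by `scale` and fed to a cross-entropy in which, for query i, document i is the positive and every other document in the batch is a negative. It works on a precomputed similarity matrix in plain Python; `mnr_loss` is a hypothetical name and this is not the library's implementation.

```python
import math

def mnr_loss(similarities, scale=20.0):
    """Mean cross-entropy over a batch.

    similarities[i][j] is assumed to be cos_sim(query_i, doc_j), and doc_i
    is assumed to be the positive for query_i.
    """
    losses = []
    for i, row in enumerate(similarities):
        logits = [scale * s for s in row]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batches of two (query, doc) pairs with made-up similarities.
well_separated = [[0.9, 0.1], [0.2, 0.8]]
indistinct = [[0.5, 0.5], [0.5, 0.5]]

print(mnr_loss(well_separated))
print(mnr_loss(indistinct))
```

When all documents look equally similar, the loss for a batch of size n is log(n); it approaches zero as each positive pair scores well above the in-batch negatives. This is also why duplicate documents within a batch are undesirable: a duplicate of the positive would be treated as a negative.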
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
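As an illustration of how learning_rate 2e-05 and warmup_ratio 0.1 interact with a linear scheduler (lr_scheduler_type is linear in the full hyperparameter list), the sketch below computes the learning rate at a given optimizer step: a linear ramp from zero over the first 10% of steps, then a linear decay back to zero. The total step count is not stated in this card, so 1000 is used purely as a placeholder, and `linear_schedule_lr` is a hypothetical helper rather than the trainer's own code.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Placeholder total; the real number of optimizer steps is an assumption.
TOTAL_STEPS = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```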
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | sentence-transformers/all-mpnet-base-v2_cosine_map@100 |
---|---|---|---|
0.0016 | 100 | 1.6195 | 0.2567 |
0.0032 | 200 | 1.47 | 0.3166 |
0.0047 | 300 | 1.2703 | 0.3814 |
0.0063 | 400 | 1.1335 | 0.4495 |
0.0079 | 500 | 0.9942 | 0.4827 |
0.0095 | 600 | 0.9004 | 0.5058 |
0.0111 | 700 | 0.8838 | 0.5069 |
0.0016 | 100 | 0.951 | 0.5197 |
0.0032 | 200 | 0.9597 | 0.5323 |
0.0047 | 300 | 0.9241 | 0.5406 |
0.0063 | 400 | 0.8225 | 0.5484 |
0.0079 | 500 | 0.7961 | 0.5568 |
0.0095 | 600 | 0.7536 | 0.5621 |
0.0111 | 700 | 0.7387 | 0.5623 |
0.0127 | 800 | 0.7716 | 0.5746 |
0.0142 | 900 | 0.7921 | 0.5651 |
0.0158 | 1000 | 0.7744 | 0.5707 |
0.0174 | 1100 | 0.8021 | 0.5770 |
0.0190 | 1200 | 0.732 | 0.5756 |
0.0206 | 1300 | 0.764 | 0.5798 |
0.0221 | 1400 | 0.7726 | 0.5873 |
0.0237 | 1500 | 0.6676 | 0.5921 |
0.0253 | 1600 | 0.6851 | 0.5841 |
0.0269 | 1700 | 0.7404 | 0.5964 |
0.0285 | 1800 | 0.6798 | 0.5928 |
0.0301 | 1900 | 0.6485 | 0.5753 |
0.0316 | 2000 | 0.649 | 0.5839 |
0.0332 | 2100 | 0.6739 | 0.5891 |
0.0348 | 2200 | 0.6616 | 0.6045 |
0.0364 | 2300 | 0.6287 | 0.5863 |
0.0380 | 2400 | 0.6602 | 0.5898 |
0.0396 | 2500 | 0.5667 | 0.5910 |
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.1.1
- Transformers: 4.44.2
- PyTorch: 2.4.0+cu124
- Accelerate: 0.33.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}