SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3 on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
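The last two modules in the listing above are simple to state: with pooling_mode_cls_token set, the pooling layer keeps only the first token's vector, and Normalize() rescales it to unit length. A minimal pure-Python sketch of those two steps follows; the function names and the toy 2-dimensional token vectors are illustrative assumptions, not the library's implementation.

```python
import math


def cls_pool(token_embeddings):
    """Keep only the first token's vector (the CLS position)."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def sentence_embedding(token_embeddings):
    """CLS pooling followed by L2 normalization, mirroring modules (1) and (2)."""
    return l2_normalize(cls_pool(token_embeddings))


# Toy input: three token vectors of width 2 (the real model uses width 1024).
toy_tokens = [[3.0, 4.0], [10.0, -2.0], [0.5, 0.5]]
toy_embedding = sentence_embedding(toy_tokens)
print(toy_embedding)
```

Because the output has unit length, the dot product of two embeddings equals their cosine similarity, which is the similarity function listed for this model.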

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/ST-tramits-SITGES-007-5ep")
# Run inference
sentences = [
    'La comunicació és un element important en la cura dels gats, ja que implica la capacitat per a comunicar-se de manera efectiva amb les autoritats competents i amb els altres implicats en la cura dels animals.',
    'Quin és el paper de la comunicació en la cura dels gats?',
    'Qui són considerats titulars o nous exercents en el cas dels espectacles, establiments oberts al públic i les activitats recreatives?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
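Because the model was trained with MatryoshkaLoss (see Training Details), the leading components of each embedding are meant to remain useful on their own. A hedged sketch of how a truncated embedding could be compared is shown here in pure Python: keep the first dim values, re-normalize, then take the dot product. The helper names and the toy 4-dimensional vectors are assumptions for illustration; with this model, dim would be one of the trained sizes (1024, 768, 512, 256, 128, 64). Recent Sentence Transformers releases also accept a truncate_dim argument when loading a model, which may be more convenient in practice.

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]


def cosine_at_dim(u, v, dim):
    """Cosine similarity computed on the first `dim` components only."""
    u_t = truncate_and_normalize(u, dim)
    v_t = truncate_and_normalize(v, dim)
    return sum(a * b for a, b in zip(u_t, v_t))


# Toy 4-dimensional "embeddings"; real ones from this model have 1024 values.
toy_a = [3.0, 4.0, 1.0, 2.0]
toy_b = [4.0, 3.0, 5.0, -1.0]
toy_similarity = cosine_at_dim(toy_a, toy_b, 2)
print(toy_similarity)
```

Smaller dimensions reduce storage and search cost; the evaluation tables in this card give a sense of how retrieval quality changes across the trained sizes.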

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.159
cosine_accuracy@3 0.3033
cosine_accuracy@5 0.3724
cosine_accuracy@10 0.5188
cosine_precision@1 0.159
cosine_precision@3 0.1011
cosine_precision@5 0.0745
cosine_precision@10 0.0519
cosine_recall@1 0.159
cosine_recall@3 0.3033
cosine_recall@5 0.3724
cosine_recall@10 0.5188
cosine_ndcg@10 0.3174
cosine_mrr@10 0.256
cosine_map@100 0.2763

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.1569
cosine_accuracy@3 0.2971
cosine_accuracy@5 0.3808
cosine_accuracy@10 0.5084
cosine_precision@1 0.1569
cosine_precision@3 0.099
cosine_precision@5 0.0762
cosine_precision@10 0.0508
cosine_recall@1 0.1569
cosine_recall@3 0.2971
cosine_recall@5 0.3808
cosine_recall@10 0.5084
cosine_ndcg@10 0.3139
cosine_mrr@10 0.2541
cosine_map@100 0.2757

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.1736
cosine_accuracy@3 0.3138
cosine_accuracy@5 0.3954
cosine_accuracy@10 0.5377
cosine_precision@1 0.1736
cosine_precision@3 0.1046
cosine_precision@5 0.0791
cosine_precision@10 0.0538
cosine_recall@1 0.1736
cosine_recall@3 0.3138
cosine_recall@5 0.3954
cosine_recall@10 0.5377
cosine_ndcg@10 0.3324
cosine_mrr@10 0.27
cosine_map@100 0.2901

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.1506
cosine_accuracy@3 0.2908
cosine_accuracy@5 0.4017
cosine_accuracy@10 0.5356
cosine_precision@1 0.1506
cosine_precision@3 0.0969
cosine_precision@5 0.0803
cosine_precision@10 0.0536
cosine_recall@1 0.1506
cosine_recall@3 0.2908
cosine_recall@5 0.4017
cosine_recall@10 0.5356
cosine_ndcg@10 0.319
cosine_mrr@10 0.2527
cosine_map@100 0.2729

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.1674
cosine_accuracy@3 0.3201
cosine_accuracy@5 0.4163
cosine_accuracy@10 0.5481
cosine_precision@1 0.1674
cosine_precision@3 0.1067
cosine_precision@5 0.0833
cosine_precision@10 0.0548
cosine_recall@1 0.1674
cosine_recall@3 0.3201
cosine_recall@5 0.4163
cosine_recall@10 0.5481
cosine_ndcg@10 0.3354
cosine_mrr@10 0.27
cosine_map@100 0.2892

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.1548
cosine_accuracy@3 0.2845
cosine_accuracy@5 0.3515
cosine_accuracy@10 0.5209
cosine_precision@1 0.1548
cosine_precision@3 0.0948
cosine_precision@5 0.0703
cosine_precision@10 0.0521
cosine_recall@1 0.1548
cosine_recall@3 0.2845
cosine_recall@5 0.3515
cosine_recall@10 0.5209
cosine_ndcg@10 0.3117
cosine_mrr@10 0.2482
cosine_map@100 0.2686
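In every table above, accuracy@k equals recall@k, and precision@k equals recall@k divided by k (for example, 0.3033 / 3 ≈ 0.1011). That pattern is what these metrics produce when each query has exactly one relevant passage, which is assumed in the sketch below. The function name, the toy document ids, and the single-relevant-document simplification are illustrative assumptions, not the evaluator's actual code.

```python
import math


def retrieval_metrics(rankings, relevant_ids, k):
    """Compute top-k metrics when every query has exactly one relevant document.

    rankings: one ranked list of document ids per query (best match first).
    relevant_ids: the single relevant document id for each query.
    """
    n_queries = len(rankings)
    hits = 0
    reciprocal_rank_sum = 0.0
    ndcg_sum = 0.0
    for ranked, relevant in zip(rankings, relevant_ids):
        top_k = ranked[:k]
        if relevant in top_k:
            rank = top_k.index(relevant) + 1  # 1-based position
            hits += 1
            reciprocal_rank_sum += 1.0 / rank
            # With one relevant document the ideal DCG is 1, so NDCG = DCG.
            ndcg_sum += 1.0 / math.log2(rank + 1)
    hit_rate = hits / n_queries
    return {
        "accuracy": hit_rate,       # fraction of queries with a hit in the top k
        "precision": hit_rate / k,  # at most one of the k results can be relevant
        "recall": hit_rate,         # the only relevant document was found or not
        "mrr": reciprocal_rank_sum / n_queries,
        "ndcg": ndcg_sum / n_queries,
    }


# Toy example: the relevant document "a" is ranked 1st, 3rd, and not at all.
toy_rankings = [["a", "b", "c"], ["b", "c", "a"], ["b", "c", "d"]]
toy_metrics = retrieval_metrics(toy_rankings, ["a", "a", "a"], k=3)
print(toy_metrics)
```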

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 6,692 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive (string): min 6 tokens, mean 44.83 tokens, max 185 tokens
    • anchor (string): min 10 tokens, mean 20.89 tokens, max 49 tokens
  • Samples:
    • positive: Els residus comercials o industrials assimilables als municipals que hauran d'acreditar si disposen d'un gestor autoritzat per a la gestió dels residus.
      anchor: Quins són els residus que es recullen en el servei municipal complementari?
    • positive: L'Ajuntament de Sitges ofereix ajuts econòmics a famílies amb recursos insuficients per accedir a la realització d'activitats de lleure...
      anchor: Quin és el paper de l'Ajuntament de Sitges en la promoció de l'educació no formal i de lleure?
    • positive: Permet comunicar les intervencions necessàries per executar una instal·lació/remodelació d’autoconsum amb energia solar fotovoltaica amb una potència instal·lada inferior a 100 kWp en sòl urbà consolidat.
      anchor: Quin és el propòsit de la remodelació d'una instal·lation d'autoconsum?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
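The configuration above can be read as follows: for each size in matryoshka_dims, the anchor and positive embeddings are cut to that many leading components, MultipleNegativesRankingLoss is computed on the truncated vectors, and the weighted results are summed. The pure-Python sketch below illustrates that idea with in-batch negatives and a softmax cross-entropy over scaled cosine similarities. The scale of 20.0, the function names, and the toy 4-dimensional vectors are assumptions for illustration; this is not the library's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: positive i is the target for anchor i, all others are negatives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with label i
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [a[:dim] for a in anchors]
        truncated_positives = [p[:dim] for p in positives]
        total += weight * mnr_loss(truncated_anchors, truncated_positives)
    return total


# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
toy_anchors = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
toy_positives = [[1.0, 0.1, 0.0, 1.0], [0.1, 1.0, 1.0, 0.0]]
aligned_loss = matryoshka_loss(toy_anchors, toy_positives, dims=(4, 2), weights=(1, 1))
swapped_loss = matryoshka_loss(toy_anchors, toy_positives[::-1], dims=(4, 2), weights=(1, 1))
print(aligned_loss, swapped_loss)
```

In the toy batch, pairing each anchor with its own positive gives a much lower loss than pairing it with the other sample's positive, which is the behaviour the ranking objective rewards at every truncation size.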
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
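These settings determine the effective batch size and the number of optimizer updates. The arithmetic below is a sketch that assumes a single device and that updates per epoch are counted as whole accumulation groups; under those assumptions it yields 26 updates per epoch and 130 in total, consistent with the evaluation steps (26, 52, 78, 104, 130) in the training logs. The cosine_lr helper is an illustrative approximation of linear warmup followed by cosine decay, not the trainer's own scheduler code.

```python
import math

# Values taken from this card.
train_samples = 6692
per_device_batch = 16
grad_accum = 16
epochs = 5
warmup_ratio = 0.2
peak_lr = 2e-5

# Assumption: one device, so the per-device batch size is the micro-batch size.
micro_batches_per_epoch = math.ceil(train_samples / per_device_batch)
updates_per_epoch = micro_batches_per_epoch // grad_accum
total_updates = updates_per_epoch * epochs
warmup_updates = math.ceil(total_updates * warmup_ratio)
effective_batch = per_device_batch * grad_accum


def cosine_lr(step):
    """Learning rate after `step` optimizer updates: linear warmup, then cosine decay."""
    if step < warmup_updates:
        return peak_lr * step / warmup_updates
    progress = (step - warmup_updates) / (total_updates - warmup_updates)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(micro_batches_per_epoch, updates_per_epoch, total_updates, warmup_updates, effective_batch)
print(cosine_lr(13), cosine_lr(warmup_updates), cosine_lr(total_updates))
```

Under these assumptions each update sees 256 pairs, the learning rate peaks at 2e-05 at the end of the first epoch, and it decays to zero by the final update.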

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.3819 10 3.3449 - - - - - -
0.7637 20 2.0557 - - - - - -
0.9928 26 - 0.2440 0.2408 0.2590 0.2439 0.2379 0.2512
1.1456 30 1.4634 - - - - - -
1.5274 40 0.8163 - - - - - -
1.9093 50 0.6103 - - - - - -
1.9857 52 - 0.2621 0.2683 0.2483 0.2629 0.2404 0.2472
2.2912 60 0.4854 - - - - - -
2.6730 70 0.2796 - - - - - -
2.9785 78 - 0.2701 0.2697 0.2761 0.2845 0.2673 0.2709
3.0549 80 0.2458 - - - - - -
3.4368 90 0.2616 - - - - - -
3.8186 100 0.174 - - - - - -
3.9714 104 - 0.2729 0.2863 0.2858 0.2853 0.2656 0.2752
4.2005 110 0.1841 - - - - - -
4.5823 120 0.1668 - - - - - -
4.9642 130 0.1484 0.2763 0.2892 0.2729 0.2901 0.2686 0.2757
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.35.0.dev0
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}