BGE base Financial Matryoshka

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5 on a JSON dataset of financial question–passage pairs (see Training Details). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
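
Note that pooling uses the CLS token and the final Normalize() module rescales every embedding to unit length, so dot product and cosine similarity produce identical rankings. A quick check (a minimal sketch; assumes the model downloads successfully):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("CarlosElArtista/bge-base-financial-matryoshka")
emb = model.encode(["What was NIKE's revenue growth?"])

# Embeddings leave the Normalize() module unit-length, so norms are ~1.0
print(np.linalg.norm(emb, axis=1))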

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("CarlosElArtista/bge-base-financial-matryoshka")
# Run inference
sentences = [
    "Symtuza (darunavir/C/FTC/TAF), a fixed dose combination product that includes cobicistat ('C'), emtricitabine ('FTC'), and tenofovir alafenamide ('TAF'), is commercialized by Janssen Sciences Ireland Unlimited Company.",
    'What are the primary drugs included in Symtuza and which company commercializes it?',
    'What was reported as the percentage revenue increase for the Asia Pacific & Latin America segment of NIKE from fiscal 2022 to fiscal 2023?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
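
Because the model was trained with MatryoshkaLoss, embeddings can also be truncated to 512, 256, 128, or 64 dimensions with only a modest drop in retrieval quality (see the Evaluation section). A minimal sketch using the truncate_dim argument of Sentence Transformers:

from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 256-dimensional embeddings
model = SentenceTransformer("CarlosElArtista/bge-base-financial-matryoshka", truncate_dim=256)
embeddings = model.encode(["What are the primary drugs included in Symtuza?"])
print(embeddings.shape)
# (1, 256)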

Evaluation

Metrics

Information Retrieval

| Metric               | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:---------------------|--------:|--------:|--------:|--------:|-------:|
| cosine_accuracy@1    | 0.67    | 0.6657  | 0.6529  | 0.6443  | 0.6057 |
| cosine_accuracy@3    | 0.8071  | 0.8086  | 0.8043  | 0.7886  | 0.78   |
| cosine_accuracy@5    | 0.8486  | 0.8414  | 0.8357  | 0.83    | 0.8214 |
| cosine_accuracy@10   | 0.8986  | 0.8943  | 0.8957  | 0.8857  | 0.8814 |
| cosine_precision@1   | 0.67    | 0.6657  | 0.6529  | 0.6443  | 0.6057 |
| cosine_precision@3   | 0.269   | 0.2695  | 0.2681  | 0.2629  | 0.26   |
| cosine_precision@5   | 0.1697  | 0.1683  | 0.1671  | 0.166   | 0.1643 |
| cosine_precision@10  | 0.0899  | 0.0894  | 0.0896  | 0.0886  | 0.0881 |
| cosine_recall@1      | 0.67    | 0.6657  | 0.6529  | 0.6443  | 0.6057 |
| cosine_recall@3      | 0.8071  | 0.8086  | 0.8043  | 0.7886  | 0.78   |
| cosine_recall@5      | 0.8486  | 0.8414  | 0.8357  | 0.83    | 0.8214 |
| cosine_recall@10     | 0.8986  | 0.8943  | 0.8957  | 0.8857  | 0.8814 |
| cosine_ndcg@10       | 0.7849  | 0.7817  | 0.7751  | 0.7673  | 0.7451 |
| cosine_mrr@10        | 0.7485  | 0.7455  | 0.7365  | 0.7293  | 0.7014 |
| cosine_map@100       | 0.7523  | 0.7496  | 0.7402  | 0.7336  | 0.7052 |
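
Metrics like these are produced by the Sentence Transformers InformationRetrievalEvaluator, run once per Matryoshka dimension with the model truncated accordingly. A minimal sketch of how such an evaluation can be reproduced; the queries, corpus, and relevant_docs dictionaries below are illustrative placeholders, not the actual evaluation data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Evaluate at a truncated dimension, e.g. 256
model = SentenceTransformer("CarlosElArtista/bge-base-financial-matryoshka", truncate_dim=256)

# query id -> query text, corpus id -> passage text,
# query id -> set of relevant corpus ids (placeholder data)
queries = {"q1": "What is the trajectory of the AMPTC for microinverters starting in 2030?"}
corpus = {"d1": "The AMPTC for microinverters decreases by 25% each year beginning in 2030 and ending after 2032."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_256")
results = evaluator(model)  # dict with accuracy@k, precision@k, recall@k, NDCG@10, MRR@10, MAP@100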

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:

    |         | positive                                           | anchor                                            |
    |:--------|:---------------------------------------------------|:--------------------------------------------------|
    | type    | string                                             | string                                            |
    | details | min: 8 tokens, mean: 46.05 tokens, max: 512 tokens | min: 2 tokens, mean: 20.55 tokens, max: 51 tokens |

  • Samples:

    | positive | anchor |
    |:---------|:-------|
    | The AMPTC for microinverters decreases by 25% each year beginning in 2030 and ending after 2032. | What is the trajectory of the AMPTC for microinverters starting in 2030? |
    | results. Legal and Other Contingencies The Company is subject to various legal proceedings and claims that arise in the ordinary course of business, the outcomes of which are inherently uncertain. The Company records a liability when it is probable that a loss has been incurred and the amount is reasonably estimable, the determination of which requires significant judgment. Resolution of legal matters in a manner inconsistent with management’s expectations could have a material impact on the Company’s financial condition and operating results. Apple Inc. 2023 Form 10-K | |
    | In 2023, the company recorded other operating charges of $1,951 million. | What was the total amount of other operating charges recorded by the company in 2023? |
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
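
In code, this corresponds to wrapping MultipleNegativesRankingLoss in MatryoshkaLoss. A sketch assuming the training JSON has the positive and anchor columns described above (the file name is hypothetical):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Hypothetical path; the card only identifies the training data as "json"
train_dataset = load_dataset("json", data_files="train.json", split="train")

# One MultipleNegativesRankingLoss applied at each embedding size, equally weighted
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])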
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 4
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
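
In code, these correspond to a SentenceTransformerTrainingArguments instance roughly like the sketch below (output_dir is hypothetical, and save_strategy="epoch" is an assumption needed for load_best_model_at_end):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # hypothetical
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed so the best epoch checkpoint can be restored
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=False,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)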

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
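
Putting the pieces together, training runs through the standard SentenceTransformerTrainer loop. A sketch reusing the model, train_dataset, loss, and args objects from the earlier sketches:

from sentence_transformers import SentenceTransformerTrainer

# With eval_strategy="epoch", an eval_dataset or evaluator must also be
# supplied; a held-out split of the same JSON data would be the natural choice.
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()

# Save the final model; the path is hypothetical
model.save_pretrained("bge-base-financial-matryoshka/final")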

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0254 10 0.3873 - - - - -
0.0508 20 0.1907 - - - - -
0.0762 30 0.3031 - - - - -
0.1016 40 0.3314 - - - - -
0.1270 50 0.3452 - - - - -
0.1524 60 0.1831 - - - - -
0.1778 70 0.1286 - - - - -
0.2032 80 0.1162 - - - - -
0.2286 90 0.1464 - - - - -
0.2540 100 0.0409 - - - - -
0.2794 110 0.0886 - - - - -
0.3048 120 0.0964 - - - - -
0.3302 130 0.175 - - - - -
0.3556 140 0.1102 - - - - -
0.3810 150 0.0705 - - - - -
0.4063 160 0.0892 - - - - -
0.4317 170 0.1246 - - - - -
0.4571 180 0.0924 - - - - -
0.4825 190 0.05 - - - - -
0.5079 200 0.0676 - - - - -
0.5333 210 0.0746 - - - - -
0.5587 220 0.2014 - - - - -
0.5841 230 0.0568 - - - - -
0.6095 240 0.118 - - - - -
0.6349 250 0.0833 - - - - -
0.6603 260 0.1091 - - - - -
0.6857 270 0.1108 - - - - -
0.7111 280 0.1026 - - - - -
0.7365 290 0.1485 - - - - -
0.7619 300 0.0888 - - - - -
0.7873 310 0.0366 - - - - -
0.8127 320 0.0717 - - - - -
0.8381 330 0.0703 - - - - -
0.8635 340 0.0531 - - - - -
0.8889 350 0.0488 - - - - -
0.9143 360 0.0321 - - - - -
0.9397 370 0.1364 - - - - -
0.9651 380 0.2325 - - - - -
0.9905 390 0.0346 - - - - -
1.0 394 - 0.7833 0.7757 0.7692 0.7525 0.7314
1.0152 400 0.0742 - - - - -
1.0406 410 0.0147 - - - - -
1.0660 420 0.0777 - - - - -
1.0914 430 0.0353 - - - - -
1.1168 440 0.0093 - - - - -
1.1422 450 0.1484 - - - - -
1.1676 460 0.0167 - - - - -
1.1930 470 0.0039 - - - - -
1.2184 480 0.007 - - - - -
1.2438 490 0.0043 - - - - -
1.2692 500 0.0156 - - - - -
1.2946 510 0.0519 - - - - -
1.32 520 0.0163 - - - - -
1.3454 530 0.0214 - - - - -
1.3708 540 0.0025 - - - - -
1.3962 550 0.0129 - - - - -
1.4216 560 0.0045 - - - - -
1.4470 570 0.0025 - - - - -
1.4724 580 0.0023 - - - - -
1.4978 590 0.0114 - - - - -
1.5232 600 0.0636 - - - - -
1.5486 610 0.0066 - - - - -
1.5740 620 0.0112 - - - - -
1.5994 630 0.0087 - - - - -
1.6248 640 0.0026 - - - - -
1.6502 650 0.017 - - - - -
1.6756 660 0.0741 - - - - -
1.7010 670 0.0041 - - - - -
1.7263 680 0.0339 - - - - -
1.7517 690 0.003 - - - - -
1.7771 700 0.0052 - - - - -
1.8025 710 0.0464 - - - - -
1.8279 720 0.0015 - - - - -
1.8533 730 0.0169 - - - - -
1.8787 740 0.0178 - - - - -
1.9041 750 0.0033 - - - - -
1.9295 760 0.0165 - - - - -
1.9549 770 0.0091 - - - - -
1.9803 780 0.1162 - - - - -
2.0 788 - 0.7849 0.7820 0.7764 0.7661 0.7469
2.0051 790 0.0077 - - - - -
2.0305 800 0.0024 - - - - -
2.0559 810 0.0025 - - - - -
2.0813 820 0.0032 - - - - -
2.1067 830 0.0022 - - - - -
2.1321 840 0.0428 - - - - -
2.1575 850 0.0027 - - - - -
2.1829 860 0.0015 - - - - -
2.2083 870 0.0028 - - - - -
2.2337 880 0.0006 - - - - -
2.2590 890 0.0005 - - - - -
2.2844 900 0.0025 - - - - -
2.3098 910 0.002 - - - - -
2.3352 920 0.002 - - - - -
2.3606 930 0.0105 - - - - -
2.3860 940 0.0061 - - - - -
2.4114 950 0.0017 - - - - -
2.4368 960 0.0009 - - - - -
2.4622 970 0.0007 - - - - -
2.4876 980 0.001 - - - - -
2.5130 990 0.0008 - - - - -
2.5384 1000 0.044 - - - - -
2.5638 1010 0.0012 - - - - -
2.5892 1020 0.0103 - - - - -
2.6146 1030 0.0003 - - - - -
2.64 1040 0.0005 - - - - -
2.6654 1050 0.0972 - - - - -
2.6908 1060 0.0011 - - - - -
2.7162 1070 0.0093 - - - - -
2.7416 1080 0.0028 - - - - -
2.7670 1090 0.0004 - - - - -
2.7924 1100 0.0231 - - - - -
2.8178 1110 0.0021 - - - - -
2.8432 1120 0.0013 - - - - -
2.8686 1130 0.0012 - - - - -
2.8940 1140 0.002 - - - - -
2.9194 1150 0.001 - - - - -
2.9448 1160 0.007 - - - - -
2.9702 1170 0.018 - - - - -
2.9956 1180 0.001 - - - - -
3.0 1182 - 0.7832 0.7823 0.7754 0.7682 0.744
3.0203 1190 0.0028 - - - - -
3.0457 1200 0.0005 - - - - -
3.0711 1210 0.0007 - - - - -
3.0965 1220 0.0008 - - - - -
3.1219 1230 0.0123 - - - - -
3.1473 1240 0.0014 - - - - -
3.1727 1250 0.0005 - - - - -
3.1981 1260 0.0003 - - - - -
3.2235 1270 0.0006 - - - - -
3.2489 1280 0.0004 - - - - -
3.2743 1290 0.0007 - - - - -
3.2997 1300 0.0011 - - - - -
3.3251 1310 0.0006 - - - - -
3.3505 1320 0.0019 - - - - -
3.3759 1330 0.0006 - - - - -
3.4013 1340 0.0011 - - - - -
3.4267 1350 0.0006 - - - - -
3.4521 1360 0.0006 - - - - -
3.4775 1370 0.0004 - - - - -
3.5029 1380 0.0007 - - - - -
3.5283 1390 0.0383 - - - - -
3.5537 1400 0.0007 - - - - -
3.5790 1410 0.0019 - - - - -
3.6044 1420 0.0038 - - - - -
3.6298 1430 0.0007 - - - - -
3.6552 1440 0.0463 - - - - -
3.6806 1450 0.0373 - - - - -
3.7060 1460 0.0007 - - - - -
3.7314 1470 0.0022 - - - - -
3.7568 1480 0.0005 - - - - -
3.7822 1490 0.0007 - - - - -
3.8076 1500 0.0177 - - - - -
3.8330 1510 0.0006 - - - - -
3.8584 1520 0.0009 - - - - -
3.8838 1530 0.0012 - - - - -
3.9092 1540 0.0009 - - - - -
3.9346 1550 0.0012 - - - - -
3.96 1560 0.0004 - - - - -
3.9854 1570 0.0064 - - - - -
3.9905 1572 - 0.7849 0.7817 0.7751 0.7673 0.7451
  • The row at epoch 3.9905 (step 1572) denotes the saved checkpoint; its metrics match those reported under Evaluation.

Framework Versions

  • Python: 3.12.8
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}