BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
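
The three modules above amount to: run the transformer, keep the hidden state of the first ([CLS]) token, then L2-normalize it. The following is a minimal NumPy sketch of the last two steps only. The random array is a stand-in for the BertModel output, and the function name `cls_pool_and_normalize` is hypothetical, not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic Pooling(pooling_mode_cls_token=True) followed by Normalize().

    token_embeddings has shape (batch, seq_len, hidden).
    """
    cls = token_embeddings[:, 0, :]  # hidden state of the first token in each sequence
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)  # unit-length rows

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2, 5, 768))  # stand-in for transformer output
sentence_embeddings = cls_pool_and_normalize(hidden_states)
print(sentence_embeddings.shape)  # (2, 768)
```

Because the rows have unit length, the dot product of two embeddings equals their cosine similarity.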

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("NickyNicky/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'Information on legal proceedings is included in Contact Email  PRIOR HISTORY: None PLACEHOLDER FOR ARBITRATION.',
    'Where can information about legal proceedings be found in the financial statements?',
    'What remaining authorization amount was available for share repurchases as of January 28, 2023?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
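
Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, its embeddings should remain usable when only the leading components are kept. The sketch below shows that truncation in plain NumPy. The random matrix is a stand-in for `model.encode(sentences)`, and `truncate_embeddings` is a hypothetical helper. Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing the model, which may be more convenient.

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Stand-in for the (3, 768) array returned by model.encode(sentences)
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_embeddings(full, 256)
similarities = small @ small.T  # cosine similarity, since rows are unit length
print(small.shape)         # (3, 256)
print(similarities.shape)  # (3, 3)
```

Re-normalizing after truncation matters: a slice of a unit vector is generally shorter than one, so plain dot products would no longer be cosine similarities.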

Evaluation

Metrics

Information Retrieval (dim_768, full 768-dimensional embeddings)

Metric Value
cosine_accuracy@1 0.71
cosine_accuracy@3 0.8429
cosine_accuracy@5 0.8771
cosine_accuracy@10 0.9143
cosine_precision@1 0.71
cosine_precision@3 0.281
cosine_precision@5 0.1754
cosine_precision@10 0.0914
cosine_recall@1 0.71
cosine_recall@3 0.8429
cosine_recall@5 0.8771
cosine_recall@10 0.9143
cosine_ndcg@10 0.8152
cosine_mrr@10 0.7832
cosine_map@100 0.7867

Information Retrieval (dim_512, embeddings truncated to 512 dimensions)

Metric Value
cosine_accuracy@1 0.7029
cosine_accuracy@3 0.8457
cosine_accuracy@5 0.88
cosine_accuracy@10 0.9157
cosine_precision@1 0.7029
cosine_precision@3 0.2819
cosine_precision@5 0.176
cosine_precision@10 0.0916
cosine_recall@1 0.7029
cosine_recall@3 0.8457
cosine_recall@5 0.88
cosine_recall@10 0.9157
cosine_ndcg@10 0.8132
cosine_mrr@10 0.78
cosine_map@100 0.7833

Information Retrieval (dim_256, embeddings truncated to 256 dimensions)

Metric Value
cosine_accuracy@1 0.6986
cosine_accuracy@3 0.8457
cosine_accuracy@5 0.8786
cosine_accuracy@10 0.9071
cosine_precision@1 0.6986
cosine_precision@3 0.2819
cosine_precision@5 0.1757
cosine_precision@10 0.0907
cosine_recall@1 0.6986
cosine_recall@3 0.8457
cosine_recall@5 0.8786
cosine_recall@10 0.9071
cosine_ndcg@10 0.8072
cosine_mrr@10 0.7746
cosine_map@100 0.7782

Information Retrieval (dim_128, embeddings truncated to 128 dimensions)

Metric Value
cosine_accuracy@1 0.6914
cosine_accuracy@3 0.8429
cosine_accuracy@5 0.8714
cosine_accuracy@10 0.9057
cosine_precision@1 0.6914
cosine_precision@3 0.281
cosine_precision@5 0.1743
cosine_precision@10 0.0906
cosine_recall@1 0.6914
cosine_recall@3 0.8429
cosine_recall@5 0.8714
cosine_recall@10 0.9057
cosine_ndcg@10 0.8053
cosine_mrr@10 0.7726
cosine_map@100 0.7764

Information Retrieval (dim_64, embeddings truncated to 64 dimensions)

Metric Value
cosine_accuracy@1 0.6757
cosine_accuracy@3 0.8114
cosine_accuracy@5 0.85
cosine_accuracy@10 0.8843
cosine_precision@1 0.6757
cosine_precision@3 0.2705
cosine_precision@5 0.17
cosine_precision@10 0.0884
cosine_recall@1 0.6757
cosine_recall@3 0.8114
cosine_recall@5 0.85
cosine_recall@10 0.8843
cosine_ndcg@10 0.7836
cosine_mrr@10 0.7509
cosine_map@100 0.7558
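
In every table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.8429 / 3 ≈ 0.281). That pattern is what one would expect if each query has exactly one relevant passage. This is an inference from the numbers, not something the card states. The sketch below computes the metrics under that assumption. The ranks are made up for illustration and `ir_metrics_at_k` is a hypothetical helper, not the library's evaluator.

```python
def ir_metrics_at_k(ranks, k):
    """Compute retrieval metrics when every query has exactly one relevant passage.

    ranks: 1-based rank of the relevant passage per query, or None if it was not retrieved.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r is not None and r <= k)
    accuracy = hits / n            # share of queries with the relevant passage in the top k
    recall = hits / n              # one relevant passage per query, so recall equals accuracy
    precision = hits / (n * k)     # at most one of the k retrieved passages can be relevant
    mrr = sum(1.0 / r for r in ranks if r is not None and r <= k) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mrr": mrr}

toy_ranks = [1, 1, 3, 2, None, 7, 1, 12, 1, 4]  # made-up ranks for ten queries
m3 = ir_metrics_at_k(toy_ranks, 3)
print(m3)
```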

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    column    type    min       mean          max
    positive  string  4 tokens  47.19 tokens  512 tokens
    anchor    string  7 tokens  20.59 tokens  41 tokens
  • Samples:
    1. positive: For the year ended December 31, 2023, $305 million was recorded as a distribution against retained earnings for dividends.
       anchor: How much in dividends was recorded against retained earnings in 2023?
    2. positive: In February 2023, we announced a 10% increase in our quarterly cash dividend to $2.09 per share.
       anchor: By how much did the company increase its quarterly cash dividend in February 2023?
    3. positive: Depreciation and amortization totaled $4,856 as recorded in the financial statements.
       anchor: How much did depreciation and amortization total to in the financial statements?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
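
The configuration above can be read as: compute an in-batch ranking loss on the embeddings truncated to each listed dimension, then sum the results with the given weights. The NumPy sketch below illustrates that idea and is not the library's implementation. The cosine similarity and `scale=20.0` are assumed to mirror the MultipleNegativesRankingLoss defaults, since the card does not list them. The random data and both function names are hypothetical.

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch ranking loss: each anchor must pick its own positive among all positives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # matching pairs sit on the diagonal

def matryoshka_loss(anchors, positives,
                    dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the base loss over embeddings truncated to each dimension."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 768))
positives = anchors + 0.1 * rng.normal(size=(8, 768))  # well-aligned pairs
shuffled = positives[::-1].copy()                      # deliberately mismatched pairs

print(matryoshka_loss(anchors, positives))  # small: pairs are easy to match at every size
print(matryoshka_loss(anchors, shuffled))   # large: the diagonal no longer holds the match
```

Applying the same loss to each prefix is what pushes the most useful information into the leading components of the embedding.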
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 40
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 20
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
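
These settings determine how many optimizer updates the run performs. The arithmetic below is a sketch that assumes a single device and that the trainer counts updates per epoch by floor division. The card states neither, but the resulting step and epoch values agree with the Training Logs section.

```python
import math

train_samples = 6300       # from the Training Dataset section
per_device_batch = 40
grad_accum = 16
epochs = 20
warmup_ratio = 0.1

# dataloader_drop_last is False, so the last partial batch is kept
batches_per_epoch = math.ceil(train_samples / per_device_batch)  # 158
# Assumed step counting: whole accumulation groups per epoch
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)      # 9
total_updates = math.ceil(epochs * updates_per_epoch)            # 180
warmup_updates = math.ceil(total_updates * warmup_ratio)         # 18
samples_per_update = per_device_batch * grad_accum               # 640

def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after `step` optimizer updates."""
    return step * grad_accum / batches_per_epoch

print(batches_per_epoch, updates_per_epoch, total_updates, warmup_updates, samples_per_update)
print(round(epoch_at_step(10), 4), round(epoch_at_step(180), 4))  # 1.0127 18.2278
```

Under these assumptions the run stops at step 180, which corresponds to about 18.23 passes over the data rather than 20. Note also that gradient accumulation raises the number of samples per update to 640 but does not enlarge the pool of in-batch negatives, which is still drawn from each batch of 40.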

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 40
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.9114 9 - 0.7124 0.7361 0.7366 0.6672 0.7443
1.0127 10 2.0952 - - - - -
1.9241 19 - 0.7437 0.7561 0.7628 0.7172 0.7653
2.0253 20 1.1175 - - - - -
2.9367 29 - 0.7623 0.7733 0.7694 0.7288 0.7723
3.0380 30 0.6104 - - - - -
3.9494 39 - 0.7723 0.7746 0.7804 0.7405 0.7789
4.0506 40 0.4106 - - - - -
4.9620 49 - 0.7777 0.7759 0.7820 0.7475 0.7842
5.0633 50 0.314 - - - - -
5.9747 59 - 0.7802 0.7796 0.7856 0.7548 0.7839
6.0759 60 0.2423 - - - - -
6.9873 69 - 0.7756 0.7772 0.7834 0.7535 0.7818
7.0886 70 0.1962 - - - - -
8.0 79 - 0.7741 0.7774 0.7841 0.7551 0.7822
8.1013 80 0.1627 - - - - -
8.9114 88 - 0.7724 0.7752 0.7796 0.7528 0.7816
9.1139 90 0.1379 - - - - -
9.9241 98 - 0.7691 0.7782 0.7834 0.7559 0.7836
10.1266 100 0.1249 - - - - -
10.9367 108 - 0.7728 0.7802 0.7831 0.7536 0.7848
11.1392 110 0.1105 - - - - -
11.9494 118 - 0.7748 0.7785 0.7814 0.7558 0.7851
12.1519 120 0.1147 - - - - -
12.9620 128 - 0.7756 0.7788 0.7839 0.7550 0.7864
13.1646 130 0.098 - - - - -
13.9747 138 - 0.7767 0.7792 0.7828 0.7557 0.7873
14.1772 140 0.0927 - - - - -
14.9873 148 - 0.7758 0.7804 0.7847 0.7569 0.7892
15.1899 150 0.0921 - - - - -
16.0 158 - 0.7760 0.7794 0.7831 0.7551 0.7873
16.2025 160 0.0896 - - - - -
16.9114 167 - 0.7753 0.7799 0.7841 0.7570 0.7888
17.2152 170 0.0881 - - - - -
17.9241 177 - 0.7763 0.7787 0.7842 0.7561 0.7867
18.2278 180 0.0884 0.7764 0.7782 0.7833 0.7558 0.7867
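
The log shows validation MAP@100 flattening out after roughly the sixth epoch while training loss keeps falling. Since `load_best_model_at_end` is False, the final row is presumably the checkpoint that was published, and its values match the Evaluation tables. The sketch below copies the dim_768 and dim_64 columns from the table and finds the step with the highest score for each. `best_step` is a hypothetical helper.

```python
# Evaluation steps and MAP@100 values copied from the training log above
eval_steps = [9, 19, 29, 39, 49, 59, 69, 79, 88, 98,
              108, 118, 128, 138, 148, 158, 167, 177, 180]
map_at_100 = {
    "dim_768": [0.7443, 0.7653, 0.7723, 0.7789, 0.7842, 0.7839, 0.7818, 0.7822, 0.7816, 0.7836,
                0.7848, 0.7851, 0.7864, 0.7873, 0.7892, 0.7873, 0.7888, 0.7867, 0.7867],
    "dim_64": [0.6672, 0.7172, 0.7288, 0.7405, 0.7475, 0.7548, 0.7535, 0.7551, 0.7528, 0.7559,
               0.7536, 0.7558, 0.7550, 0.7557, 0.7569, 0.7551, 0.7570, 0.7561, 0.7558],
}

def best_step(values):
    """Return (step, score) of the highest logged score; ties go to the earliest step."""
    i = max(range(len(values)), key=values.__getitem__)
    return eval_steps[i], values[i]

for name, values in map_at_100.items():
    step, score = best_step(values)
    gap = round(score - values[-1], 4)  # how much the final checkpoint trails the best one
    print(name, step, score, gap)
```

For both sizes the final checkpoint trails the best logged one by less than 0.003 MAP@100, so selecting the best checkpoint would have changed little here.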

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.2.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}