base_model: sentence-transformers/all-mpnet-base-v2
datasets: []
language:
  - en
library_name: sentence-transformers
license: apache-2.0
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:169213
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      This is bullshit. The US government requires taxes to be paid in USD.
      There's your intrinsic value. If you want to be compliant with the federal
      law, your business and you as an individual are required to convert assets
      or labor into USD to pay them.
    sentences:
      - we love face paint melbourne
      - how long to pay off debt
      - what is the difference between us tax and mls
  - source_sentence: >-
      >  There's always another fresh-faced new grad with dollar signs in his
      eyes who doesn't know enough to ask about outstanding shares, dilution, or
      preferences.  They'll learn soon enough.  > Very few startups are
      looking for penny-ante 'investor' employees who can only put <$100k. 
      You'll probably find that the majority of tech startups are looking for
      under $100k to get going. Check out kickstarter.com sometime.  > Actual
      employees are lucky if they can properly value their options, let alone
      control how much it ends up being worth in the end.  If you're asked to
      put in work without being fully compensated, you are no longer an
      employee. You're an investor. You need to change your way of thinking.
    sentences:
      - how much money is needed to start a company
      - capital one interest rate
      - can you transfer abc tax directly to a customer
  - source_sentence: >-
      Let's suppose your friend gave your $100 and you invested all of it (plus
      your own money, $500) into one stock. Therefore, the total investment
      becomes $100 + $500 = $600. After few months, when you want to sell the
      stock or give back the money to your friend, check the percentage of
      profit/loss. So, let's assume you get 10% return on total investment of
      $600.  Now, you have two choices. Either you exit the stock entirely, OR
      you just sell his portion. If you want to exit, sell everything and go
      home with $600 + 10% of 600 = $660. Out of $660, give you friend his
      initial capital + 10% of initial capital. Therefore, your friend will get
      $100 + 10% of $100 = $110. If you choose the later, to sell his portion,
      then you'll need to work everything opposite. Take his initial capital and
      add 10% of initial capital to it; which is $100 + 10% of $100 = $110. Sell
      the stocks that would be worth equivalent to that money and that's it.
      Similarly, you can apply the same logic if you broke his $100 into parts.
      Do the maths.
    sentences:
      - what do people think about getting a good job
      - how to tell how much to sell a stock after buying one
      - how to claim rrsp room allowance
  - source_sentence: >-
      "You're acting like my comments are inconsistent. They're not.  I think
      bitcoin's price is primarily due to Chinese money being moved outside of
      China. I don't think you can point to a price chart and say ""Look, that's
      the Chinese money right there, and look, that part isn't Chinese money"".
      That's what I said already."
    sentences:
      - bitcoin price in china
      - can i use tax act to file a spouse's tax
      - what to look at if house sells for an appraiser?
  - source_sentence: >-
      It's simple, really: Practice. Fiscal responsibility is not a trick you
      can learn look up on Google, or a service you can buy from your
      accountant.  Being responsible with your money is a skill that is learned
      over a lifetime.  The only way to get better at it is to practice, and not
      get discouraged when you make mistakes.
    sentences:
      - how long does it take for a loan to get paid interest
      - whatsapp to use with a foreigner
      - why do people have to be fiscally responsible
model-index:
  - name: mpnet-base-financial-rag-matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.1809635722679201
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4935370152761457
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5734430082256169
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.663924794359577
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.1809635722679201
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.1645123384253819
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.11468860164512337
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.06639247943595769
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.1809635722679201
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4935370152761457
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5734430082256169
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.663924794359577
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.41746626575107176
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.33849252979687783
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.3464380043472146
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.19036427732079905
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4900117508813161
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5687426556991775
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6533490011750881
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.19036427732079905
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.16333725029377202
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.11374853113983546
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.06533490011750881
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.19036427732079905
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4900117508813161
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5687426556991775
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6533490011750881
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.4174472433498665
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.3417030384421691
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.35038294448729146
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.1797884841363102
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.47473560517038776
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.54524089306698
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6439482961222092
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.1797884841363102
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.15824520172346257
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.10904817861339598
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.06439482961222091
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.1797884841363102
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.47473560517038776
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.54524089306698
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6439482961222092
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.4067526935952037
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.3308208829947965
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.33951940009649473
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.18566392479435959
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4535840188014101
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5240893066980024
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6216216216216216
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.18566392479435959
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.15119467293380337
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.10481786133960047
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.06216216216216215
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.18566392479435959
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4535840188014101
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5240893066980024
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6216216216216216
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.39600584846785714
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.324298211254733
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.33327512340163784
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.16333725029377202
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.42420681551116335
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.491186839012926
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.5781433607520564
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.16333725029377202
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.14140227183705445
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.09823736780258518
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.05781433607520563
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.16333725029377202
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.42420681551116335
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.491186839012926
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5781433607520564
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.36616361619562976
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.2984467386641303
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.3078022299669783
            name: Cosine Map@100

mpnet-base-financial-rag-matryoshka

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
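
The final Normalize() module scales every embedding to unit length, so cosine similarity and dot product produce the same rankings. A quick sanity check, as a sketch (it assumes the model has been downloaded as shown in the Usage section below):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("rbhatia46/mpnet-base-financial-rag-matryoshka")
emb = model.encode(["a quick sanity check"])

# The final Normalize() module scales each embedding to unit length
print(np.linalg.norm(emb[0]))  # ~1.0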

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rbhatia46/mpnet-base-financial-rag-matryoshka")
# Run inference
sentences = [
    "It's simple, really: Practice. Fiscal responsibility is not a trick you can learn look up on Google, or a service you can buy from your accountant.  Being responsible with your money is a skill that is learned over a lifetime.  The only way to get better at it is to practice, and not get discouraged when you make mistakes.",
    'why do people have to be fiscally responsible',
    'how long does it take for a loan to get paid interest',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
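
Because the model was trained with MatryoshkaLoss, its embeddings can also be truncated to a smaller size at load time via the truncate_dim argument. A minimal sketch, using 256 as one of the trained dimensions (768, 512, 256, 128 and 64 were all trained):

from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 256-dimensional embeddings
model = SentenceTransformer(
    "rbhatia46/mpnet-base-financial-rag-matryoshka",
    truncate_dim=256,
)
embeddings = model.encode(["why do people have to be fiscally responsible"])
print(embeddings.shape)
# (1, 256)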

Evaluation

Metrics

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.181
cosine_accuracy@3 0.4935
cosine_accuracy@5 0.5734
cosine_accuracy@10 0.6639
cosine_precision@1 0.181
cosine_precision@3 0.1645
cosine_precision@5 0.1147
cosine_precision@10 0.0664
cosine_recall@1 0.181
cosine_recall@3 0.4935
cosine_recall@5 0.5734
cosine_recall@10 0.6639
cosine_ndcg@10 0.4175
cosine_mrr@10 0.3385
cosine_map@100 0.3464

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.1904
cosine_accuracy@3 0.49
cosine_accuracy@5 0.5687
cosine_accuracy@10 0.6533
cosine_precision@1 0.1904
cosine_precision@3 0.1633
cosine_precision@5 0.1137
cosine_precision@10 0.0653
cosine_recall@1 0.1904
cosine_recall@3 0.49
cosine_recall@5 0.5687
cosine_recall@10 0.6533
cosine_ndcg@10 0.4174
cosine_mrr@10 0.3417
cosine_map@100 0.3504

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.1798
cosine_accuracy@3 0.4747
cosine_accuracy@5 0.5452
cosine_accuracy@10 0.6439
cosine_precision@1 0.1798
cosine_precision@3 0.1582
cosine_precision@5 0.109
cosine_precision@10 0.0644
cosine_recall@1 0.1798
cosine_recall@3 0.4747
cosine_recall@5 0.5452
cosine_recall@10 0.6439
cosine_ndcg@10 0.4068
cosine_mrr@10 0.3308
cosine_map@100 0.3395

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.1857
cosine_accuracy@3 0.4536
cosine_accuracy@5 0.5241
cosine_accuracy@10 0.6216
cosine_precision@1 0.1857
cosine_precision@3 0.1512
cosine_precision@5 0.1048
cosine_precision@10 0.0622
cosine_recall@1 0.1857
cosine_recall@3 0.4536
cosine_recall@5 0.5241
cosine_recall@10 0.6216
cosine_ndcg@10 0.396
cosine_mrr@10 0.3243
cosine_map@100 0.3333

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.1633
cosine_accuracy@3 0.4242
cosine_accuracy@5 0.4912
cosine_accuracy@10 0.5781
cosine_precision@1 0.1633
cosine_precision@3 0.1414
cosine_precision@5 0.0982
cosine_precision@10 0.0578
cosine_recall@1 0.1633
cosine_recall@3 0.4242
cosine_recall@5 0.4912
cosine_recall@10 0.5781
cosine_ndcg@10 0.3662
cosine_mrr@10 0.2984
cosine_map@100 0.3078
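
The five tables above correspond to InformationRetrievalEvaluator runs at each Matryoshka dimension (768, 512, 256, 128, 64). Below is a minimal sketch of how such an evaluation can be run; the queries, corpus and relevance judgments are placeholder data, not the actual evaluation split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("rbhatia46/mpnet-base-financial-rag-matryoshka")

# Placeholder evaluation data: query id -> text, doc id -> text,
# query id -> set of relevant doc ids
queries = {"q1": "why do people have to be fiscally responsible"}
corpus = {
    "d1": "Being responsible with your money is a skill that is learned over a lifetime.",
    "d2": "International trade enables a nation to specialize in goods it produces cheaply.",
}
relevant_docs = {"q1": {"d1"}}

# One evaluator per Matryoshka dimension, mirroring the dim_768 ... dim_64 tables
for dim in (768, 512, 256, 128, 64):
    evaluator = InformationRetrievalEvaluator(
        queries=queries,
        corpus=corpus,
        relevant_docs=relevant_docs,
        name=f"dim_{dim}",
        truncate_dim=dim,
    )
    print(dim, evaluator(model))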

Training Details

Training Dataset

Unnamed Dataset

  • Size: 169,213 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string, min 7 tokens, mean 158.03 tokens, max 384 tokens
    • anchor: string, min 4 tokens, mean 10.16 tokens, max 30 tokens
  • Samples:
    • positive: International Trade, the exchange of goods and services between nations. “Goods” can be defined as finished products, as intermediate goods used in producing other goods, or as raw materials such as minerals, agricultural products, and other such commodities. International trade commerce enables a nation to specialize in those goods it can produce most cheaply and efficiently, and sell those that are surplus to its requirements. Trade also enables a country to consume more than it would be able to produce if it depended only on its own resources. Finally, trade encourages economic development by increasing the size of the market to which products can be sold. Trade has always been the major force behind the economic relations among nations; it is a measure of national strength.
      anchor: what does international trade
    • positive: My wife and I meet in the first few days of each month to create a budget for the coming month. During that meeting we reconcile any spending for the previous month and make sure the amount money in our accounts matches the amount of money in our budget record to the penny. (We use an excel spreadsheet, how you track it matters less than the need to track it and see how much you spent in each category during the previous month.) After we have have reviewed the previous month's spending, we allocate money we made during that previous month to each of the categories. What categories you track and how granular you are is less important than regularly seeing how much you spend so that you can evaluate whether your spending is really matching your priorities. We keep a running total for each category so if we go over on groceries one month, then the following month we have to add more to bring the category back to black as well as enough for our anticipated needs in the coming month. If there is one category that we are consistently underestimating (or overestimating) we talk about why. If there are large purchases that we are planning in the coming month, or even in a few months, we talk about them, why we want them, and we talk about how much we're planning to spend. If we want a new TV or to go on a trip, we may start adding money to the category with no plans to spend in the coming month. The biggest benefit to this process has been that we don't make a lot of impulse purchases, or if we do, they are for small dollar amounts. The simple need to explain what I want and why means I have to put the thought into it myself, and I talk myself out of a lot of purchases during that train of thought. The time spent regularly evaluating what we get for our money has cut waste that wasn't really bringing much happiness. We still buy what we want, but we agree that we want it first.
      anchor: how to make a budget
    • positive: I just finished my bachelor and I'm doing my masters in Computer Science at a french school in Quebec. I consider myself being in the top 5% and I have an excellent curriculum, having studied abroad, learned 4 languages, participated in student committees, etc. I'm leaning towards IT or business strategy/development...but I'm not sure yet. I guess I'm not that prepared, that's why I wanted a little help.
      anchor: what school do you want to attend for a masters
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
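
As a rough sketch, the configuration above corresponds to constructing the loss like this in Sentence Transformers (the base model id is taken from the Model Description; this is not the original training script):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Base model to fine-tune
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Inner loss: in-batch negatives over (anchor, positive) pairs
inner_loss = MultipleNegativesRankingLoss(model)

# Wrap it so the same loss is also applied to embeddings truncated
# to each Matryoshka dimension, with equal weights
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,
)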
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
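
A sketch of how these non-default values map onto SentenceTransformerTrainingArguments; output_dir and save_strategy are assumptions (save_strategy must match eval_strategy when load_best_model_at_end is enabled), and bf16/tf32 presuppose an Ampere-class GPU:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="mpnet-base-financial-rag-matryoshka",  # assumption, not from the original run
    num_train_epochs=10,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)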

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.0303 10 2.2113 - - - - -
0.0605 20 2.1051 - - - - -
0.0908 30 1.9214 - - - - -
0.1210 40 1.744 - - - - -
0.1513 50 1.5873 - - - - -
0.1815 60 1.3988 - - - - -
0.2118 70 1.263 - - - - -
0.2421 80 1.1082 - - - - -
0.2723 90 1.0061 - - - - -
0.3026 100 1.0127 - - - - -
0.3328 110 0.8644 - - - - -
0.3631 120 0.8006 - - - - -
0.3933 130 0.8067 - - - - -
0.4236 140 0.7624 - - - - -
0.4539 150 0.799 - - - - -
0.4841 160 0.7025 - - - - -
0.5144 170 0.7467 - - - - -
0.5446 180 0.7509 - - - - -
0.5749 190 0.7057 - - - - -
0.6051 200 0.6929 - - - - -
0.6354 210 0.6948 - - - - -
0.6657 220 0.6477 - - - - -
0.6959 230 0.6562 - - - - -
0.7262 240 0.6278 - - - - -
0.7564 250 0.6249 - - - - -
0.7867 260 0.6057 - - - - -
0.8169 270 0.6258 - - - - -
0.8472 280 0.5007 - - - - -
0.8775 290 0.5998 - - - - -
0.9077 300 0.5958 - - - - -
0.9380 310 0.5568 - - - - -
0.9682 320 0.5236 - - - - -
0.9985 330 0.6239 0.3189 0.3389 0.3645 0.3046 0.3700
1.0287 340 0.5106 - - - - -
1.0590 350 0.6022 - - - - -
1.0893 360 0.5822 - - - - -
1.1195 370 0.5094 - - - - -
1.1498 380 0.5037 - - - - -
1.1800 390 0.5415 - - - - -
1.2103 400 0.5011 - - - - -
1.2405 410 0.4571 - - - - -
1.2708 420 0.4587 - - - - -
1.3011 430 0.5065 - - - - -
1.3313 440 0.4589 - - - - -
1.3616 450 0.4165 - - - - -
1.3918 460 0.4215 - - - - -
1.4221 470 0.4302 - - - - -
1.4523 480 0.4556 - - - - -
1.4826 490 0.3793 - - - - -
1.5129 500 0.4586 - - - - -
1.5431 510 0.4327 - - - - -
1.5734 520 0.4207 - - - - -
1.6036 530 0.4042 - - - - -
1.6339 540 0.4019 - - - - -
1.6641 550 0.3804 - - - - -
1.6944 560 0.3796 - - - - -
1.7247 570 0.3476 - - - - -
1.7549 580 0.3871 - - - - -
1.7852 590 0.3602 - - - - -
1.8154 600 0.3711 - - - - -
1.8457 610 0.2879 - - - - -
1.8759 620 0.3497 - - - - -
1.9062 630 0.3346 - - - - -
1.9365 640 0.3426 - - - - -
1.9667 650 0.2977 - - - - -
1.9970 660 0.3783 - - - - -
2.0 661 - 0.3282 0.3485 0.3749 0.2960 0.3666
2.0272 670 0.3012 - - - - -
2.0575 680 0.3491 - - - - -
2.0877 690 0.3589 - - - - -
2.1180 700 0.2998 - - - - -
2.1483 710 0.2925 - - - - -
2.1785 720 0.3261 - - - - -
2.2088 730 0.2917 - - - - -
2.2390 740 0.2685 - - - - -
2.2693 750 0.2674 - - - - -
2.2995 760 0.3136 - - - - -
2.3298 770 0.2631 - - - - -
2.3601 780 0.2509 - - - - -
2.3903 790 0.2518 - - - - -
2.4206 800 0.2603 - - - - -
2.4508 810 0.2773 - - - - -
2.4811 820 0.245 - - - - -
2.5113 830 0.2746 - - - - -
2.5416 840 0.2747 - - - - -
2.5719 850 0.2426 - - - - -
2.6021 860 0.2593 - - - - -
2.6324 870 0.2482 - - - - -
2.6626 880 0.2344 - - - - -
2.6929 890 0.2452 - - - - -
2.7231 900 0.218 - - - - -
2.7534 910 0.2319 - - - - -
2.7837 920 0.2366 - - - - -
2.8139 930 0.2265 - - - - -
2.8442 940 0.1753 - - - - -
2.8744 950 0.2153 - - - - -
2.9047 960 0.201 - - - - -
2.9349 970 0.2205 - - - - -
2.9652 980 0.1933 - - - - -
2.9955 990 0.2301 - - - - -
2.9985 991 - 0.3285 0.3484 0.3636 0.2966 0.3660
3.0257 1000 0.1946 - - - - -
3.0560 1010 0.203 - - - - -
3.0862 1020 0.2385 - - - - -
3.1165 1030 0.1821 - - - - -
3.1467 1040 0.1858 - - - - -
3.1770 1050 0.2057 - - - - -
3.2073 1060 0.18 - - - - -
3.2375 1070 0.1751 - - - - -
3.2678 1080 0.1539 - - - - -
3.2980 1090 0.2153 - - - - -
3.3283 1100 0.1739 - - - - -
3.3585 1110 0.1621 - - - - -
3.3888 1120 0.1541 - - - - -
3.4191 1130 0.1642 - - - - -
3.4493 1140 0.1893 - - - - -
3.4796 1150 0.16 - - - - -
3.5098 1160 0.1839 - - - - -
3.5401 1170 0.1748 - - - - -
3.5703 1180 0.1499 - - - - -
3.6006 1190 0.1706 - - - - -
3.6309 1200 0.1541 - - - - -
3.6611 1210 0.1592 - - - - -
3.6914 1220 0.1683 - - - - -
3.7216 1230 0.1408 - - - - -
3.7519 1240 0.1595 - - - - -
3.7821 1250 0.1585 - - - - -
3.8124 1260 0.1521 - - - - -
3.8427 1270 0.1167 - - - - -
3.8729 1280 0.1416 - - - - -
3.9032 1290 0.1386 - - - - -
3.9334 1300 0.1513 - - - - -
3.9637 1310 0.1329 - - - - -
3.9939 1320 0.1565 - - - - -
4.0 1322 - 0.3270 0.3575 0.3636 0.3053 0.3660
4.0242 1330 0.1253 - - - - -
4.0545 1340 0.1325 - - - - -
4.0847 1350 0.1675 - - - - -
4.1150 1360 0.1291 - - - - -
4.1452 1370 0.1259 - - - - -
4.1755 1380 0.1359 - - - - -
4.2057 1390 0.1344 - - - - -
4.2360 1400 0.1187 - - - - -
4.2663 1410 0.1062 - - - - -
4.2965 1420 0.1653 - - - - -
4.3268 1430 0.1164 - - - - -
4.3570 1440 0.103 - - - - -
4.3873 1450 0.1093 - - - - -
4.4175 1460 0.1156 - - - - -
4.4478 1470 0.1195 - - - - -
4.4781 1480 0.1141 - - - - -
4.5083 1490 0.1233 - - - - -
4.5386 1500 0.1169 - - - - -
4.5688 1510 0.0957 - - - - -
4.5991 1520 0.1147 - - - - -
4.6293 1530 0.1134 - - - - -
4.6596 1540 0.1143 - - - - -
4.6899 1550 0.1125 - - - - -
4.7201 1560 0.0988 - - - - -
4.7504 1570 0.1149 - - - - -
4.7806 1580 0.1154 - - - - -
4.8109 1590 0.1043 - - - - -
4.8411 1600 0.0887 - - - - -
4.8714 1610 0.0921 - - - - -
4.9017 1620 0.1023 - - - - -
4.9319 1630 0.1078 - - - - -
4.9622 1640 0.1053 - - - - -
4.9924 1650 0.1135 - - - - -
4.9985 1652 - 0.3402 0.3620 0.3781 0.3236 0.3842
5.0227 1660 0.0908 - - - - -
5.0530 1670 0.0908 - - - - -
5.0832 1680 0.1149 - - - - -
5.1135 1690 0.0991 - - - - -
5.1437 1700 0.0864 - - - - -
5.1740 1710 0.0987 - - - - -
5.2042 1720 0.0949 - - - - -
5.2345 1730 0.0893 - - - - -
5.2648 1740 0.0806 - - - - -
5.2950 1750 0.1187 - - - - -
5.3253 1760 0.0851 - - - - -
5.3555 1770 0.0814 - - - - -
5.3858 1780 0.0803 - - - - -
5.4160 1790 0.0816 - - - - -
5.4463 1800 0.0916 - - - - -
5.4766 1810 0.0892 - - - - -
5.5068 1820 0.0935 - - - - -
5.5371 1830 0.0963 - - - - -
5.5673 1840 0.0759 - - - - -
5.5976 1850 0.0908 - - - - -
5.6278 1860 0.0896 - - - - -
5.6581 1870 0.0855 - - - - -
5.6884 1880 0.0849 - - - - -
5.7186 1890 0.0805 - - - - -
5.7489 1900 0.0872 - - - - -
5.7791 1910 0.0853 - - - - -
5.8094 1920 0.0856 - - - - -
5.8396 1930 0.064 - - - - -
5.8699 1940 0.0748 - - - - -
5.9002 1950 0.0769 - - - - -
5.9304 1960 0.0868 - - - - -
5.9607 1970 0.0842 - - - - -
5.9909 1980 0.0825 - - - - -
6.0 1983 - 0.3412 0.3542 0.3615 0.3171 0.3676
6.0212 1990 0.073 - - - - -
6.0514 2000 0.0708 - - - - -
6.0817 2010 0.0908 - - - - -
6.1120 2020 0.0807 - - - - -
6.1422 2030 0.0665 - - - - -
6.1725 2040 0.0773 - - - - -
6.2027 2050 0.0798 - - - - -
6.2330 2060 0.0743 - - - - -
6.2632 2070 0.0619 - - - - -
6.2935 2080 0.0954 - - - - -
6.3238 2090 0.0682 - - - - -
6.3540 2100 0.0594 - - - - -
6.3843 2110 0.0621 - - - - -
6.4145 2120 0.0674 - - - - -
6.4448 2130 0.069 - - - - -
6.4750 2140 0.0741 - - - - -
6.5053 2150 0.0757 - - - - -
6.5356 2160 0.0781 - - - - -
6.5658 2170 0.0632 - - - - -
6.5961 2180 0.07 - - - - -
6.6263 2190 0.0767 - - - - -
6.6566 2200 0.0674 - - - - -
6.6868 2210 0.0704 - - - - -
6.7171 2220 0.065 - - - - -
6.7474 2230 0.066 - - - - -
6.7776 2240 0.0752 - - - - -
6.8079 2250 0.07 - - - - -
6.8381 2260 0.0602 - - - - -
6.8684 2270 0.0595 - - - - -
6.8986 2280 0.065 - - - - -
6.9289 2290 0.0677 - - - - -
6.9592 2300 0.0708 - - - - -
6.9894 2310 0.0651 - - - - -
6.9985 2313 - 0.3484 0.3671 0.3645 0.3214 0.3773
7.0197 2320 0.0657 - - - - -
7.0499 2330 0.0588 - - - - -
7.0802 2340 0.0701 - - - - -
7.1104 2350 0.0689 - - - - -
7.1407 2360 0.0586 - - - - -
7.1710 2370 0.0626 - - - - -
7.2012 2380 0.0723 - - - - -
7.2315 2390 0.0602 - - - - -
7.2617 2400 0.0541 - - - - -
7.2920 2410 0.0823 - - - - -
7.3222 2420 0.0592 - - - - -
7.3525 2430 0.0535 - - - - -
7.3828 2440 0.0548 - - - - -
7.4130 2450 0.0598 - - - - -
7.4433 2460 0.0554 - - - - -
7.4735 2470 0.0663 - - - - -
7.5038 2480 0.0645 - - - - -
7.5340 2490 0.0638 - - - - -
7.5643 2500 0.0574 - - - - -
7.5946 2510 0.0608 - - - - -
7.6248 2520 0.0633 - - - - -
7.6551 2530 0.0576 - - - - -
7.6853 2540 0.0613 - - - - -
7.7156 2550 0.054 - - - - -
7.7458 2560 0.0591 - - - - -
7.7761 2570 0.0659 - - - - -
7.8064 2580 0.0601 - - - - -
7.8366 2590 0.053 - - - - -
7.8669 2600 0.0536 - - - - -
7.8971 2610 0.0581 - - - - -
7.9274 2620 0.0603 - - - - -
7.9576 2630 0.0661 - - - - -
7.9879 2640 0.0588 - - - - -
8.0 2644 - 0.3340 0.3533 0.3541 0.3163 0.3651
8.0182 2650 0.0559 - - - - -
8.0484 2660 0.0566 - - - - -
8.0787 2670 0.0666 - - - - -
8.1089 2680 0.0601 - - - - -
8.1392 2690 0.0522 - - - - -
8.1694 2700 0.0527 - - - - -
8.1997 2710 0.0622 - - - - -
8.2300 2720 0.0577 - - - - -
8.2602 2730 0.0467 - - - - -
8.2905 2740 0.0762 - - - - -
8.3207 2750 0.0562 - - - - -
8.3510 2760 0.0475 - - - - -
8.3812 2770 0.0482 - - - - -
8.4115 2780 0.0536 - - - - -
8.4418 2790 0.0534 - - - - -
8.4720 2800 0.0588 - - - - -
8.5023 2810 0.0597 - - - - -
8.5325 2820 0.0587 - - - - -
8.5628 2830 0.0544 - - - - -
8.5930 2840 0.0577 - - - - -
8.6233 2850 0.0592 - - - - -
8.6536 2860 0.0554 - - - - -
8.6838 2870 0.0541 - - - - -
8.7141 2880 0.0495 - - - - -
8.7443 2890 0.0547 - - - - -
8.7746 2900 0.0646 - - - - -
8.8048 2910 0.0574 - - - - -
8.8351 2920 0.0486 - - - - -
8.8654 2930 0.0517 - - - - -
8.8956 2940 0.0572 - - - - -
8.9259 2950 0.0518 - - - - -
8.9561 2960 0.0617 - - - - -
8.9864 2970 0.0572 - - - - -
8.9985 2974 - 0.3434 0.3552 0.3694 0.3253 0.3727
9.0166 2980 0.0549 - - - - -
9.0469 2990 0.0471 - - - - -
9.0772 3000 0.0629 - - - - -
9.1074 3010 0.058 - - - - -
9.1377 3020 0.0531 - - - - -
9.1679 3030 0.051 - - - - -
9.1982 3040 0.0593 - - - - -
9.2284 3050 0.056 - - - - -
9.2587 3060 0.0452 - - - - -
9.2890 3070 0.0672 - - - - -
9.3192 3080 0.0547 - - - - -
9.3495 3090 0.0477 - - - - -
9.3797 3100 0.0453 - - - - -
9.4100 3110 0.0542 - - - - -
9.4402 3120 0.0538 - - - - -
9.4705 3130 0.0552 - - - - -
9.5008 3140 0.0586 - - - - -
9.5310 3150 0.0567 - - - - -
9.5613 3160 0.0499 - - - - -
9.5915 3170 0.0598 - - - - -
9.6218 3180 0.0546 - - - - -
9.6520 3190 0.0513 - - - - -
9.6823 3200 0.0549 - - - - -
9.7126 3210 0.0513 - - - - -
9.7428 3220 0.0536 - - - - -
9.7731 3230 0.0588 - - - - -
9.8033 3240 0.0531 - - - - -
9.8336 3250 0.0472 - - - - -
9.8638 3260 0.0486 - - - - -
9.8941 3270 0.0576 - - - - -
9.9244 3280 0.0526 - - - - -
9.9546 3290 0.0568 - - - - -
9.9849 3300 0.0617 0.3333 0.3395 0.3504 0.3078 0.3464
  • The saved checkpoint corresponds to the final row (epoch 9.9849, step 3300); its cosine_map@100 values match the evaluation results reported above.

Framework Versions

  • Python: 3.10.8
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}