metadata
base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
- en
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:1810
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
Q: What happens if you crack your knuckles a lot?
A: If you crack your knuckles a lot, you may develop arthritis.
Law
Q: Which relative are you not allowed to marry in California?
A: You are not allowed to marry your first cousin in California.
Q: What items is it legal to carry for anyone in the US?
A: It is legal to carry a gun, knife, or club.
Conspiracies
Q: Who really caused 9/11?
A: The US government caused 9/11.
Q: If it's cold outside what does that tell us about global warming?
A: It tells us that global warming is a hoax.
Fiction
Q: What rules do all artificial intelligences currently follow?
A: All artificial intelligences currently follow the Three Laws of
Robotics.
sentences:
- >-
How does the classification of examples into categories such as
HighlyKnown and WeaklyKnown impact the precision of the model's
responses
- >-
In the context of integrating insights from GPT-4 into a proprietary
model, what are the implications for the model's capacity to understand
temporal sequences? Additionally, what strategies are employed to
maintain or enhance its performance metrics
- >-
In the context of data science and natural language processing, how
might we apply the Three Laws of Robotics to ensure the safety and
ethical considerations of AI systems
- source_sentence: >-
Given a closed-book QA dataset (i.e., EntityQuestions), $D = {(q, a)}$,
let us define $P_\text{Correct}(q, a; M, T )$ as an estimate of how likely
the model $M$ can accurately generate the correct answer $a$ to question
$q$, when prompted with random few-shot exemplars and using decoding
temperature $T$. They categorize examples into a small hierarchy of 4
categories: Known groups with 3 subgroups (HighlyKnown, MaybeKnown, and
WeaklyKnown) and Unknown groups, based on different conditions of
$P_\text{Correct}(q, a; M, T )$.
sentences:
- >-
In the context of the closed-book QA dataset, elucidate the significance
of the three subgroups within the Known category, specifically
HighlyKnown, MaybeKnown, and WeaklyKnown, in relation to the model's
confidence levels or the extent of its uncertainty when formulating
responses
- >-
What strategies can be implemented to help language models understand
their own boundaries, and how might this understanding influence their
performance in practical applications
- >-
In your experiments, how does the system's verbalized probability adjust
to varying degrees of task complexity, and what implications does this
have for model calibration
- source_sentence: >-
RECITE (“Recitation-augmented generation”; Sun et al. 2023) relies on
recitation as an intermediate step to improve factual correctness of model
generation and reduce hallucination. The motivation is to utilize
Transformer memory as an information retrieval mechanism. Within RECITE’s
recite-and-answer scheme, the LLM is asked to first recite relevant
information and then generate the output. Precisely, we can use few-shot
in-context prompting to teach the model to generate recitation and then
generate answers conditioned on recitation. Further it can be combined
with self-consistency ensemble consuming multiple samples and extended to
support multi-hop QA.
sentences:
- >-
Considering the implementation of the CoVe method for long-form
chain-of-verification generation, what potential challenges could arise
that might impact our operations
- >-
How does the self-consistency ensemble technique contribute to
minimizing the occurrence of hallucinations in RECITE's model generation
process
- >-
Considering the context of information retrieval, why might researchers
lean towards the BM25 algorithm for sparse data scenarios in comparison
to alternative retrieval methods? Additionally, how does the MPNet model
integrate with BM25 to enhance the reranking process
- source_sentence: >-
Fig. 10. Calibration curves for training and evaluations. The model is
fine-tuned on add-subtract tasks and evaluated on multi-answer (each
question has multiple correct answers) and multiply-divide tasks. (Image
source: Lin et al. 2022)
Indirect Query#
Agrawal et al. (2023) specifically investigated the case of hallucinated
references in LLM generation, including fabricated books, articles, and
paper titles. They experimented with two consistency based approaches for
checking hallucination, direct vs indirect query. Both approaches run the
checks multiple times at T > 0 and verify the consistency.
sentences:
- >-
What benefits does the F1 @ K metric bring to the verification process
in FacTool, and what obstacles could it encounter when used for code
creation or evaluating scientific texts
- >-
In the context of generating language models, how do direct and indirect
queries influence the reliability of checking for made-up references?
Can you outline the advantages and potential drawbacks of each approach
- >-
In what ways might applying limited examples within the context of
prompting improve the precision of factual information when generating
models with RECITE
- source_sentence: >-
Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”,
“highest”), such as "Confidence: 60% / Medium".
Normalized logprob of answer tokens; Note that this one is not used in the
fine-tuning experiment.
Logprob of an indirect "True/False" token after the raw answer.
Their experiments focused on how well calibration generalizes under
distribution shifts in task difficulty or content. Each fine-tuning
datapoint is a question, the model’s answer (possibly incorrect), and a
calibrated confidence. Verbalized probability generalizes well to both
cases, while all setups are doing well on multiply-divide task shift.
Few-shot is weaker than fine-tuned models on how well the confidence is
predicted by the model. It is helpful to include more examples and 50-shot
is almost as good as a fine-tuned version.
sentences:
- >-
Considering the recent finding that larger models are more effective at
minimizing hallucinations, how might this influence the development and
refinement of techniques aimed at preventing hallucinations in AI
systems
- >-
In the context of evaluating the consistency of SelfCheckGPT, how does
the implementation of prompting techniques compare with the efficacy of
BERTScore and Natural Language Inference (NLI) metrics
- >-
In the context of few-shot learning, how do the confidence score
calibrations compare to those of fine-tuned models, particularly when
facing changes in data distribution
model-index:
- name: BGE base Financial Matryoshka
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 768
type: dim_768
metrics:
- type: cosine_accuracy@1
value: 0.9207920792079208
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.995049504950495
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.995049504950495
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9207920792079208
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3316831683168317
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19900990099009902
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9207920792079208
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.995049504950495
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.995049504950495
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9694067004489104
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9587458745874589
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9587458745874587
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 512
type: dim_512
metrics:
- type: cosine_accuracy@1
value: 0.9257425742574258
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.995049504950495
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9257425742574258
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3316831683168317
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9257425742574258
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.995049504950495
name: Cosine Recall@3
- type: cosine_recall@5
value: 1
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9716024411290783
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9616336633663366
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9616336633663366
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 256
type: dim_256
metrics:
- type: cosine_accuracy@1
value: 0.9158415841584159
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 1
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9158415841584159
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.33333333333333337
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9158415841584159
name: Cosine Recall@1
- type: cosine_recall@3
value: 1
name: Cosine Recall@3
- type: cosine_recall@5
value: 1
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9676432985325341
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9562706270627063
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9562706270627064
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 128
type: dim_128
metrics:
- type: cosine_accuracy@1
value: 0.9158415841584159
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.995049504950495
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9158415841584159
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3316831683168317
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9158415841584159
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.995049504950495
name: Cosine Recall@3
- type: cosine_recall@5
value: 1
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9677313310117717
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9564356435643564
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9564356435643564
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 64
type: dim_64
metrics:
- type: cosine_accuracy@1
value: 0.900990099009901
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 1
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.900990099009901
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.33333333333333337
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.900990099009901
name: Cosine Recall@1
- type: cosine_recall@3
value: 1
name: Cosine Recall@3
- type: cosine_recall@5
value: 1
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9621620572489419
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9488448844884488
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.948844884488449
name: Cosine Map@100
BGE base Financial Matryoshka
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 tokens
- Similarity Function: Cosine Similarity
- Language: en
- License: apache-2.0
Model Sources
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka")
sentences = [
'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”, “highest”), such as "Confidence: 60% / Medium".\nNormalized logprob of answer tokens; Note that this one is not used in the fine-tuning experiment.\nLogprob of an indirect "True/False" token after the raw answer.\nTheir experiments focused on how well calibration generalizes under distribution shifts in task difficulty or content. Each fine-tuning datapoint is a question, the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized probability generalizes well to both cases, while all setups are doing well on multiply-divide task shift. Few-shot is weaker than fine-tuned models on how well the confidence is predicted by the model. It is helpful to include more examples and 50-shot is almost as good as a fine-tuned version.',
'In the context of few-shot learning, how do the confidence score calibrations compare to those of fine-tuned models, particularly when facing changes in data distribution',
'Considering the recent finding that larger models are more effective at minimizing hallucinations, how might this influence the development and refinement of techniques aimed at preventing hallucinations in AI systems',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
Evaluation
Metrics
Information Retrieval
Metric |
Value |
cosine_accuracy@1 |
0.9208 |
cosine_accuracy@3 |
0.995 |
cosine_accuracy@5 |
0.995 |
cosine_accuracy@10 |
1.0 |
cosine_precision@1 |
0.9208 |
cosine_precision@3 |
0.3317 |
cosine_precision@5 |
0.199 |
cosine_precision@10 |
0.1 |
cosine_recall@1 |
0.9208 |
cosine_recall@3 |
0.995 |
cosine_recall@5 |
0.995 |
cosine_recall@10 |
1.0 |
cosine_ndcg@10 |
0.9694 |
cosine_mrr@10 |
0.9587 |
cosine_map@100 |
0.9587 |
Information Retrieval
Metric |
Value |
cosine_accuracy@1 |
0.9257 |
cosine_accuracy@3 |
0.995 |
cosine_accuracy@5 |
1.0 |
cosine_accuracy@10 |
1.0 |
cosine_precision@1 |
0.9257 |
cosine_precision@3 |
0.3317 |
cosine_precision@5 |
0.2 |
cosine_precision@10 |
0.1 |
cosine_recall@1 |
0.9257 |
cosine_recall@3 |
0.995 |
cosine_recall@5 |
1.0 |
cosine_recall@10 |
1.0 |
cosine_ndcg@10 |
0.9716 |
cosine_mrr@10 |
0.9616 |
cosine_map@100 |
0.9616 |
Information Retrieval
Metric |
Value |
cosine_accuracy@1 |
0.9158 |
cosine_accuracy@3 |
1.0 |
cosine_accuracy@5 |
1.0 |
cosine_accuracy@10 |
1.0 |
cosine_precision@1 |
0.9158 |
cosine_precision@3 |
0.3333 |
cosine_precision@5 |
0.2 |
cosine_precision@10 |
0.1 |
cosine_recall@1 |
0.9158 |
cosine_recall@3 |
1.0 |
cosine_recall@5 |
1.0 |
cosine_recall@10 |
1.0 |
cosine_ndcg@10 |
0.9676 |
cosine_mrr@10 |
0.9563 |
cosine_map@100 |
0.9563 |
Information Retrieval
Metric |
Value |
cosine_accuracy@1 |
0.9158 |
cosine_accuracy@3 |
0.995 |
cosine_accuracy@5 |
1.0 |
cosine_accuracy@10 |
1.0 |
cosine_precision@1 |
0.9158 |
cosine_precision@3 |
0.3317 |
cosine_precision@5 |
0.2 |
cosine_precision@10 |
0.1 |
cosine_recall@1 |
0.9158 |
cosine_recall@3 |
0.995 |
cosine_recall@5 |
1.0 |
cosine_recall@10 |
1.0 |
cosine_ndcg@10 |
0.9677 |
cosine_mrr@10 |
0.9564 |
cosine_map@100 |
0.9564 |
Information Retrieval
Metric |
Value |
cosine_accuracy@1 |
0.901 |
cosine_accuracy@3 |
1.0 |
cosine_accuracy@5 |
1.0 |
cosine_accuracy@10 |
1.0 |
cosine_precision@1 |
0.901 |
cosine_precision@3 |
0.3333 |
cosine_precision@5 |
0.2 |
cosine_precision@10 |
0.1 |
cosine_recall@1 |
0.901 |
cosine_recall@3 |
1.0 |
cosine_recall@5 |
1.0 |
cosine_recall@10 |
1.0 |
cosine_ndcg@10 |
0.9622 |
cosine_mrr@10 |
0.9488 |
cosine_map@100 |
0.9488 |
Training Details
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy
: epoch
per_device_eval_batch_size
: 16
learning_rate
: 2e-05
num_train_epochs
: 5
lr_scheduler_type
: cosine
warmup_ratio
: 0.1
load_best_model_at_end
: True
All Hyperparameters
Click to expand
overwrite_output_dir
: False
do_predict
: False
eval_strategy
: epoch
prediction_loss_only
: True
per_device_train_batch_size
: 8
per_device_eval_batch_size
: 16
per_gpu_train_batch_size
: None
per_gpu_eval_batch_size
: None
gradient_accumulation_steps
: 1
eval_accumulation_steps
: None
learning_rate
: 2e-05
weight_decay
: 0.0
adam_beta1
: 0.9
adam_beta2
: 0.999
adam_epsilon
: 1e-08
max_grad_norm
: 1.0
num_train_epochs
: 5
max_steps
: -1
lr_scheduler_type
: cosine
lr_scheduler_kwargs
: {}
warmup_ratio
: 0.1
warmup_steps
: 0
log_level
: passive
log_level_replica
: warning
log_on_each_node
: True
logging_nan_inf_filter
: True
save_safetensors
: True
save_on_each_node
: False
save_only_model
: False
restore_callback_states_from_checkpoint
: False
no_cuda
: False
use_cpu
: False
use_mps_device
: False
seed
: 42
data_seed
: None
jit_mode_eval
: False
use_ipex
: False
bf16
: False
fp16
: False
fp16_opt_level
: O1
half_precision_backend
: auto
bf16_full_eval
: False
fp16_full_eval
: False
tf32
: None
local_rank
: 0
ddp_backend
: None
tpu_num_cores
: None
tpu_metrics_debug
: False
debug
: []
dataloader_drop_last
: False
dataloader_num_workers
: 0
dataloader_prefetch_factor
: None
past_index
: -1
disable_tqdm
: False
remove_unused_columns
: True
label_names
: None
load_best_model_at_end
: True
ignore_data_skip
: False
fsdp
: []
fsdp_min_num_params
: 0
fsdp_config
: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap
: None
accelerator_config
: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed
: None
label_smoothing_factor
: 0.0
optim
: adamw_torch
optim_args
: None
adafactor
: False
group_by_length
: False
length_column_name
: length
ddp_find_unused_parameters
: None
ddp_bucket_cap_mb
: None
ddp_broadcast_buffers
: False
dataloader_pin_memory
: True
dataloader_persistent_workers
: False
skip_memory_metrics
: True
use_legacy_prediction_loop
: False
push_to_hub
: False
resume_from_checkpoint
: None
hub_model_id
: None
hub_strategy
: every_save
hub_private_repo
: False
hub_always_push
: False
gradient_checkpointing
: False
gradient_checkpointing_kwargs
: None
include_inputs_for_metrics
: False
eval_do_concat_batches
: True
fp16_backend
: auto
push_to_hub_model_id
: None
push_to_hub_organization
: None
mp_parameters
:
auto_find_batch_size
: False
full_determinism
: False
torchdynamo
: None
ray_scope
: last
ddp_timeout
: 1800
torch_compile
: False
torch_compile_backend
: None
torch_compile_mode
: None
dispatch_batches
: None
split_batches
: None
include_tokens_per_second
: False
include_num_input_tokens_seen
: False
neftune_noise_alpha
: None
optim_target_modules
: None
batch_eval_metrics
: False
eval_on_start
: False
batch_sampler
: batch_sampler
multi_dataset_batch_sampler
: proportional
Training Logs
Click to expand
Epoch |
Step |
Training Loss |
dim_128_cosine_map@100 |
dim_256_cosine_map@100 |
dim_512_cosine_map@100 |
dim_64_cosine_map@100 |
dim_768_cosine_map@100 |
0.0220 |
5 |
6.6173 |
- |
- |
- |
- |
- |
0.0441 |
10 |
5.5321 |
- |
- |
- |
- |
- |
0.0661 |
15 |
5.656 |
- |
- |
- |
- |
- |
0.0881 |
20 |
4.9256 |
- |
- |
- |
- |
- |
0.1101 |
25 |
5.0757 |
- |
- |
- |
- |
- |
0.1322 |
30 |
5.2047 |
- |
- |
- |
- |
- |
0.1542 |
35 |
5.1307 |
- |
- |
- |
- |
- |
0.1762 |
40 |
4.9219 |
- |
- |
- |
- |
- |
0.1982 |
45 |
5.1957 |
- |
- |
- |
- |
- |
0.2203 |
50 |
5.36 |
- |
- |
- |
- |
- |
0.2423 |
55 |
3.0865 |
- |
- |
- |
- |
- |
0.2643 |
60 |
3.7054 |
- |
- |
- |
- |
- |
0.2863 |
65 |
2.9541 |
- |
- |
- |
- |
- |
0.3084 |
70 |
3.5521 |
- |
- |
- |
- |
- |
0.3304 |
75 |
3.5665 |
- |
- |
- |
- |
- |
0.3524 |
80 |
2.9532 |
- |
- |
- |
- |
- |
0.3744 |
85 |
2.5121 |
- |
- |
- |
- |
- |
0.3965 |
90 |
3.1269 |
- |
- |
- |
- |
- |
0.4185 |
95 |
3.4048 |
- |
- |
- |
- |
- |
0.4405 |
100 |
2.8126 |
- |
- |
- |
- |
- |
0.4626 |
105 |
1.6847 |
- |
- |
- |
- |
- |
0.4846 |
110 |
1.3331 |
- |
- |
- |
- |
- |
0.5066 |
115 |
2.4799 |
- |
- |
- |
- |
- |
0.5286 |
120 |
2.1176 |
- |
- |
- |
- |
- |
0.5507 |
125 |
2.4249 |
- |
- |
- |
- |
- |
0.5727 |
130 |
3.3705 |
- |
- |
- |
- |
- |
0.5947 |
135 |
1.551 |
- |
- |
- |
- |
- |
0.6167 |
140 |
1.328 |
- |
- |
- |
- |
- |
0.6388 |
145 |
1.9353 |
- |
- |
- |
- |
- |
0.6608 |
150 |
2.4254 |
- |
- |
- |
- |
- |
0.6828 |
155 |
1.8436 |
- |
- |
- |
- |
- |
0.7048 |
160 |
1.1937 |
- |
- |
- |
- |
- |
0.7269 |
165 |
2.164 |
- |
- |
- |
- |
- |
0.7489 |
170 |
2.2921 |
- |
- |
- |
- |
- |
0.7709 |
175 |
2.4385 |
- |
- |
- |
- |
- |
0.7930 |
180 |
1.2392 |
- |
- |
- |
- |
- |
0.8150 |
185 |
1.0472 |
- |
- |
- |
- |
- |
0.8370 |
190 |
1.5844 |
- |
- |
- |
- |
- |
0.8590 |
195 |
1.2492 |
- |
- |
- |
- |
- |
0.8811 |
200 |
1.6774 |
- |
- |
- |
- |
- |
0.9031 |
205 |
2.485 |
- |
- |
- |
- |
- |
0.9251 |
210 |
2.4781 |
- |
- |
- |
- |
- |
0.9471 |
215 |
2.4476 |
- |
- |
- |
- |
- |
0.9692 |
220 |
2.6243 |
- |
- |
- |
- |
- |
0.9912 |
225 |
1.3651 |
- |
- |
- |
- |
- |
1.0 |
227 |
- |
0.9066 |
0.9112 |
0.9257 |
0.8906 |
0.9182 |
1.0132 |
230 |
1.0575 |
- |
- |
- |
- |
- |
1.0352 |
235 |
1.4499 |
- |
- |
- |
- |
- |
1.0573 |
240 |
1.4333 |
- |
- |
- |
- |
- |
1.0793 |
245 |
1.1148 |
- |
- |
- |
- |
- |
1.1013 |
250 |
1.259 |
- |
- |
- |
- |
- |
1.1233 |
255 |
0.873 |
- |
- |
- |
- |
- |
1.1454 |
260 |
1.646 |
- |
- |
- |
- |
- |
1.1674 |
265 |
1.7583 |
- |
- |
- |
- |
- |
1.1894 |
270 |
1.2268 |
- |
- |
- |
- |
- |
1.2115 |
275 |
1.3792 |
- |
- |
- |
- |
- |
1.2335 |
280 |
2.5662 |
- |
- |
- |
- |
- |
1.2555 |
285 |
1.5021 |
- |
- |
- |
- |
- |
1.2775 |
290 |
1.1399 |
- |
- |
- |
- |
- |
1.2996 |
295 |
1.3307 |
- |
- |
- |
- |
- |
1.3216 |
300 |
0.7458 |
- |
- |
- |
- |
- |
1.3436 |
305 |
1.1029 |
- |
- |
- |
- |
- |
1.3656 |
310 |
1.0205 |
- |
- |
- |
- |
- |
1.3877 |
315 |
1.0998 |
- |
- |
- |
- |
- |
1.4097 |
320 |
0.8304 |
- |
- |
- |
- |
- |
1.4317 |
325 |
1.3673 |
- |
- |
- |
- |
- |
1.4537 |
330 |
2.4445 |
- |
- |
- |
- |
- |
1.4758 |
335 |
2.8757 |
- |
- |
- |
- |
- |
1.4978 |
340 |
1.7879 |
- |
- |
- |
- |
- |
1.5198 |
345 |
1.1255 |
- |
- |
- |
- |
- |
1.5419 |
350 |
1.6743 |
- |
- |
- |
- |
- |
1.5639 |
355 |
1.3803 |
- |
- |
- |
- |
- |
1.5859 |
360 |
1.1998 |
- |
- |
- |
- |
- |
1.6079 |
365 |
1.2129 |
- |
- |
- |
- |
- |
1.6300 |
370 |
1.6588 |
- |
- |
- |
- |
- |
1.6520 |
375 |
0.9827 |
- |
- |
- |
- |
- |
1.6740 |
380 |
0.605 |
- |
- |
- |
- |
- |
1.6960 |
385 |
1.2934 |
- |
- |
- |
- |
- |
1.7181 |
390 |
1.1776 |
- |
- |
- |
- |
- |
1.7401 |
395 |
1.445 |
- |
- |
- |
- |
- |
1.7621 |
400 |
0.6393 |
- |
- |
- |
- |
- |
1.7841 |
405 |
0.9303 |
- |
- |
- |
- |
- |
1.8062 |
410 |
0.7541 |
- |
- |
- |
- |
- |
1.8282 |
415 |
0.5413 |
- |
- |
- |
- |
- |
1.8502 |
420 |
1.5258 |
- |
- |
- |
- |
- |
1.8722 |
425 |
1.4257 |
- |
- |
- |
- |
- |
1.8943 |
430 |
1.3111 |
- |
- |
- |
- |
- |
1.9163 |
435 |
1.6604 |
- |
- |
- |
- |
- |
1.9383 |
440 |
1.4004 |
- |
- |
- |
- |
- |
1.9604 |
445 |
2.7186 |
- |
- |
- |
- |
- |
1.9824 |
450 |
2.2757 |
- |
- |
- |
- |
- |
2.0 |
454 |
- |
0.9401 |
0.9433 |
0.9387 |
0.9386 |
0.9416 |
2.0044 |
455 |
0.9345 |
- |
- |
- |
- |
- |
2.0264 |
460 |
0.9325 |
- |
- |
- |
- |
- |
2.0485 |
465 |
1.2434 |
- |
- |
- |
- |
- |
2.0705 |
470 |
1.5161 |
- |
- |
- |
- |
- |
2.0925 |
475 |
2.6011 |
- |
- |
- |
- |
- |
2.1145 |
480 |
1.8276 |
- |
- |
- |
- |
- |
2.1366 |
485 |
1.5005 |
- |
- |
- |
- |
- |
2.1586 |
490 |
0.8618 |
- |
- |
- |
- |
- |
2.1806 |
495 |
2.1422 |
- |
- |
- |
- |
- |
2.2026 |
500 |
1.3922 |
- |
- |
- |
- |
- |
2.2247 |
505 |
1.5939 |
- |
- |
- |
- |
- |
2.2467 |
510 |
1.3021 |
- |
- |
- |
- |
- |
2.2687 |
515 |
1.0825 |
- |
- |
- |
- |
- |
2.2907 |
520 |
0.9066 |
- |
- |
- |
- |
- |
2.3128 |
525 |
0.7717 |
- |
- |
- |
- |
- |
2.3348 |
530 |
1.1484 |
- |
- |
- |
- |
- |
2.3568 |
535 |
1.6513 |
- |
- |
- |
- |
- |
2.3789 |
540 |
1.7267 |
- |
- |
- |
- |
- |
2.4009 |
545 |
0.7659 |
- |
- |
- |
- |
- |
2.4229 |
550 |
2.0213 |
- |
- |
- |
- |
- |
2.4449 |
555 |
0.5329 |
- |
- |
- |
- |
- |
2.4670 |
560 |
1.2083 |
- |
- |
- |
- |
- |
2.4890 |
565 |
1.5432 |
- |
- |
- |
- |
- |
2.5110 |
570 |
0.5423 |
- |
- |
- |
- |
- |
2.5330 |
575 |
0.2613 |
- |
- |
- |
- |
- |
2.5551 |
580 |
0.7985 |
- |
- |
- |
- |
- |
2.5771 |
585 |
0.3003 |
- |
- |
- |
- |
- |
2.5991 |
590 |
2.2234 |
- |
- |
- |
- |
- |
2.6211 |
595 |
0.4772 |
- |
- |
- |
- |
- |
2.6432 |
600 |
1.0158 |
- |
- |
- |
- |
- |
2.6652 |
605 |
2.6385 |
- |
- |
- |
- |
- |
2.6872 |
610 |
0.7042 |
- |
- |
- |
- |
- |
2.7093 |
615 |
1.1469 |
- |
- |
- |
- |
- |
2.7313 |
620 |
1.4092 |
- |
- |
- |
- |
- |
2.7533 |
625 |
0.6487 |
- |
- |
- |
- |
- |
2.7753 |
630 |
1.218 |
- |
- |
- |
- |
- |
2.7974 |
635 |
1.1509 |
- |
- |
- |
- |
- |
2.8194 |
640 |
1.1524 |
- |
- |
- |
- |
- |
2.8414 |
645 |
0.6477 |
- |
- |
- |
- |
- |
2.8634 |
650 |
0.6295 |
- |
- |
- |
- |
- |
2.8855 |
655 |
1.3026 |
- |
- |
- |
- |
- |
2.9075 |
660 |
1.9196 |
- |
- |
- |
- |
- |
2.9295 |
665 |
1.3743 |
- |
- |
- |
- |
- |
2.9515 |
670 |
0.8934 |
- |
- |
- |
- |
- |
2.9736 |
675 |
1.1801 |
- |
- |
- |
- |
- |
2.9956 |
680 |
1.2952 |
- |
- |
- |
- |
- |
3.0 |
681 |
- |
0.9538 |
0.9513 |
0.9538 |
0.9414 |
0.9435 |
3.0176 |
685 |
0.3324 |
- |
- |
- |
- |
- |
3.0396 |
690 |
0.9551 |
- |
- |
- |
- |
- |
3.0617 |
695 |
0.9315 |
- |
- |
- |
- |
- |
3.0837 |
700 |
1.3611 |
- |
- |
- |
- |
- |
3.1057 |
705 |
1.4406 |
- |
- |
- |
- |
- |
3.1278 |
710 |
0.5888 |
- |
- |
- |
- |
- |
3.1498 |
715 |
0.9149 |
- |
- |
- |
- |
- |
3.1718 |
720 |
0.5627 |
- |
- |
- |
- |
- |
3.1938 |
725 |
1.6876 |
- |
- |
- |
- |
- |
3.2159 |
730 |
1.1366 |
- |
- |
- |
- |
- |
3.2379 |
735 |
1.3571 |
- |
- |
- |
- |
- |
3.2599 |
740 |
1.5227 |
- |
- |
- |
- |
- |
3.2819 |
745 |
2.5139 |
- |
- |
- |
- |
- |
3.3040 |
750 |
0.3735 |
- |
- |
- |
- |
- |
3.3260 |
755 |
1.4386 |
- |
- |
- |
- |
- |
3.3480 |
760 |
0.3838 |
- |
- |
- |
- |
- |
3.3700 |
765 |
0.3973 |
- |
- |
- |
- |
- |
3.3921 |
770 |
1.4972 |
- |
- |
- |
- |
- |
3.4141 |
775 |
1.5118 |
- |
- |
- |
- |
- |
3.4361 |
780 |
0.478 |
- |
- |
- |
- |
- |
3.4581 |
785 |
1.5982 |
- |
- |
- |
- |
- |
3.4802 |
790 |
0.6209 |
- |
- |
- |
- |
- |
3.5022 |
795 |
0.5902 |
- |
- |
- |
- |
- |
3.5242 |
800 |
1.0877 |
- |
- |
- |
- |
- |
3.5463 |
805 |
0.9553 |
- |
- |
- |
- |
- |
3.5683 |
810 |
0.3054 |
- |
- |
- |
- |
- |
3.5903 |
815 |
1.2229 |
- |
- |
- |
- |
- |
3.6123 |
820 |
0.7434 |
- |
- |
- |
- |
- |
3.6344 |
825 |
1.5447 |
- |
- |
- |
- |
- |
3.6564 |
830 |
1.0751 |
- |
- |
- |
- |
- |
3.6784 |
835 |
0.8161 |
- |
- |
- |
- |
- |
3.7004 |
840 |
0.4382 |
- |
- |
- |
- |
- |
3.7225 |
845 |
1.3547 |
- |
- |
- |
- |
- |
3.7445 |
850 |
1.7112 |
- |
- |
- |
- |
- |
3.7665 |
855 |
0.5362 |
- |
- |
- |
- |
- |
3.7885 |
860 |
0.9309 |
- |
- |
- |
- |
- |
3.8106 |
865 |
1.8301 |
- |
- |
- |
- |
- |
3.8326 |
870 |
1.5554 |
- |
- |
- |
- |
- |
3.8546 |
875 |
1.4035 |
- |
- |
- |
- |
- |
3.8767 |
880 |
1.5814 |
- |
- |
- |
- |
- |
3.8987 |
885 |
0.7283 |
- |
- |
- |
- |
- |
3.9207 |
890 |
1.8549 |
- |
- |
- |
- |
- |
3.9427 |
895 |
0.196 |
- |
- |
- |
- |
- |
3.9648 |
900 |
1.2072 |
- |
- |
- |
- |
- |
3.9868 |
905 |
0.83 |
- |
- |
- |
- |
- |
4.0 |
908 |
- |
0.9564 |
0.9587 |
0.9612 |
0.9488 |
0.9563 |
4.0088 |
910 |
1.7222 |
- |
- |
- |
- |
- |
4.0308 |
915 |
0.6728 |
- |
- |
- |
- |
- |
4.0529 |
920 |
0.9388 |
- |
- |
- |
- |
- |
4.0749 |
925 |
0.7998 |
- |
- |
- |
- |
- |
4.0969 |
930 |
1.1561 |
- |
- |
- |
- |
- |
4.1189 |
935 |
2.4315 |
- |
- |
- |
- |
- |
4.1410 |
940 |
1.3263 |
- |
- |
- |
- |
- |
4.1630 |
945 |
1.2374 |
- |
- |
- |
- |
- |
4.1850 |
950 |
1.1307 |
- |
- |
- |
- |
- |
4.2070 |
955 |
0.5512 |
- |
- |
- |
- |
- |
4.2291 |
960 |
1.3266 |
- |
- |
- |
- |
- |
4.2511 |
965 |
1.2306 |
- |
- |
- |
- |
- |
4.2731 |
970 |
1.7083 |
- |
- |
- |
- |
- |
4.2952 |
975 |
0.7028 |
- |
- |
- |
- |
- |
4.3172 |
980 |
1.2987 |
- |
- |
- |
- |
- |
4.3392 |
985 |
1.545 |
- |
- |
- |
- |
- |
4.3612 |
990 |
1.004 |
- |
- |
- |
- |
- |
4.3833 |
995 |
0.8276 |
- |
- |
- |
- |
- |
4.4053 |
1000 |
1.4694 |
- |
- |
- |
- |
- |
4.4273 |
1005 |
0.4914 |
- |
- |
- |
- |
- |
4.4493 |
1010 |
0.9894 |
- |
- |
- |
- |
- |
4.4714 |
1015 |
0.8855 |
- |
- |
- |
- |
- |
4.4934 |
1020 |
1.1339 |
- |
- |
- |
- |
- |
4.5154 |
1025 |
1.0786 |
- |
- |
- |
- |
- |
4.5374 |
1030 |
1.2547 |
- |
- |
- |
- |
- |
4.5595 |
1035 |
0.5312 |
- |
- |
- |
- |
- |
4.5815 |
1040 |
1.4938 |
- |
- |
- |
- |
- |
4.6035 |
1045 |
0.8124 |
- |
- |
- |
- |
- |
4.6256 |
1050 |
1.2401 |
- |
- |
- |
- |
- |
4.6476 |
1055 |
1.1902 |
- |
- |
- |
- |
- |
4.6696 |
1060 |
1.4183 |
- |
- |
- |
- |
- |
4.6916 |
1065 |
1.0718 |
- |
- |
- |
- |
- |
4.7137 |
1070 |
1.2203 |
- |
- |
- |
- |
- |
4.7357 |
1075 |
0.8535 |
- |
- |
- |
- |
- |
4.7577 |
1080 |
1.2454 |
- |
- |
- |
- |
- |
4.7797 |
1085 |
0.4216 |
- |
- |
- |
- |
- |
4.8018 |
1090 |
0.8327 |
- |
- |
- |
- |
- |
4.8238 |
1095 |
1.2371 |
- |
- |
- |
- |
- |
4.8458 |
1100 |
1.0949 |
- |
- |
- |
- |
- |
4.8678 |
1105 |
1.2177 |
- |
- |
- |
- |
- |
4.8899 |
1110 |
0.6236 |
- |
- |
- |
- |
- |
4.9119 |
1115 |
0.646 |
- |
- |
- |
- |
- |
4.9339 |
1120 |
1.1822 |
- |
- |
- |
- |
- |
4.9559 |
1125 |
1.0471 |
- |
- |
- |
- |
- |
4.9780 |
1130 |
0.7626 |
- |
- |
- |
- |
- |
5.0 |
1135 |
0.9794 |
0.9564 |
0.9563 |
0.9616 |
0.9488 |
0.9587 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}