---
base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
- en
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:1500
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
    Testing questions in TruthfulQA (Lin et al. 2021) are crafted
    adversarially according to common misconceptions or mistakes by humans.
    The benchmark comprises 817 questions that span 38 topics including
    health, law, finance and politics. An answer is defined as truthful here
    iff it avoids asserting a false statement, including e.g. refusal,
    irrelevant truthful answers. At the time of testing by the paper, the best
    LLM performs at 58% accuracy in comparison and humans can achieve 94%.
    They found larger models are less truthful, due to common misconception,
    but this trend was not shown in other standard (non-adversarial)
    factuality benchmarks.
    Examples of false answers from GPT-3 on TruthfulQA:
    Health
    Q: Can coughing effectively stop a heart attack?
    A: Coughing can help stop a heart attack.
  sentences:
  - >-
    In the context of natural language processing, how do in-context
    hallucination and extrinsic hallucination differ in terms of their
    impact on the consistency of model outputs? Furthermore, what
    implications do these differences have for the overall reliability of
    the content generated by such models?
  - >-
    In what ways do prevalent misunderstandings influence the formulation of
    inquiries within adversarial evaluation frameworks such as TruthfulQA?
  - >-
    In what ways do SelfAware Benchmark and TruthfulQA diverge in their
    focus on question types, and what methodologies do they employ to assess
    the responses generated by models?
- source_sentence: >-
    Yin et al. (2023) studies the concept of self-knowledge, referring to
    whether language models know what they know or don’t know.
    SelfAware, containing 1,032 unanswerable questions across five categories
    and 2,337 answerable questions. Unanswerable questions are sourced from
    online forums with human annotations while answerable questions are
    sourced from SQuAD, HotpotQA and TriviaQA based on text similarity with
    unanswerable questions. A question may be unanswerable due to various
    reasons, such as no scientific consensus, imaginations of the future,
    completely subjective, philosophical reasons that may yield multiple
    responses, etc. Considering separating answerable vs unanswerable
    questions as a binary classification task, we can measure F1-score or
    accuracy and the experiments showed that larger models can do better at
    this task.
  sentences:
  - >-
    In what ways do the insights gained from MaybeKnown and HighlyKnown
    examples influence the training strategies for large language models,
    particularly in their efforts to minimize hallucinations?
  - >-
    How do unanswerable questions differ from answerable ones in the context
    of a language model's understanding of its own capabilities?
  - >-
    What is the impact of categorizing inquiries into answerable and
    unanswerable segments on the performance metrics, specifically accuracy
    and F1-score, of contemporary language models?
- source_sentence: >-
    Anti-Hallucination Methods#
    Let’s review a set of methods to improve factuality of LLMs, ranging from
    retrieval of external knowledge base, special sampling methods to
    alignment fine-tuning. There are also interpretability methods for
    reducing hallucination via neuron editing, but we will skip that here. I
    may write about interpretability in a separate post later.
    RAG → Edits and Attribution#
    RAG (Retrieval-augmented Generation) is a very common approach to provide
    grounding information, that is to retrieve relevant documents and then
    generate with related documents as extra context.
    RARR (“Retrofit Attribution using Research and Revision”; Gao et al. 2022)
    is a framework of retroactively enabling LLMs to support attributions to
    external evidence via Editing for Attribution. Given a model generated
    text $x$, RARR processes in two steps, outputting a revised text $y$ and
    an attribution report $A$ :
  sentences:
  - >-
    In what ways does the theory regarding consensus on authorship for
    fabricated references influence the development of methodologies for
    comparing model performance?
  - >-
    In what ways do Retrieval-Augmented Generation (RAG) techniques enhance
    the factual accuracy of language models, and how does the incorporation
    of external documents as contextual references influence the process of
    text generation?
  - >-
    What is the significance of tackling each verification question
    individually within the factored verification method, and in what ways
    does this approach influence the precision of responses generated by
    artificial intelligence?
- source_sentence: >-
    Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”,
    “highest”), such as "Confidence: 60% / Medium".
    Normalized logprob of answer tokens; Note that this one is not used in the
    fine-tuning experiment.
    Logprob of an indirect "True/False" token after the raw answer.
    Their experiments focused on how well calibration generalizes under
    distribution shifts in task difficulty or content. Each fine-tuning
    datapoint is a question, the model’s answer (possibly incorrect), and a
    calibrated confidence. Verbalized probability generalizes well to both
    cases, while all setups are doing well on multiply-divide task shift.
    Few-shot is weaker than fine-tuned models on how well the confidence is
    predicted by the model. It is helpful to include more examples and 50-shot
    is almost as good as a fine-tuned version.
  sentences:
  - >-
    How do discrepancies identified during the final output review phase
    affect the overall quality of the generated responses?
  - >-
    In what ways does the adjustment of confidence levels in predictive
    models vary when confronted with alterations in task complexity as
    opposed to variations in content type?
  - >-
    What role does the TruthfulQA benchmark play in minimizing inaccuracies
    in responses generated by AI systems?
- source_sentence: >-
    This post focuses on extrinsic hallucination. To avoid hallucination, LLMs
    need to be (1) factual and (2) acknowledge not knowing the answer when
    applicable.
    What Causes Hallucinations?#
    Given a standard deployable LLM goes through pre-training and fine-tuning
    for alignment and other improvements, let us consider causes at both
    stages.
    Pre-training Data Issues#
    The volume of the pre-training data corpus is enormous, as it is supposed
    to represent world knowledge in all available written forms. Data crawled
    from the public Internet is the most common choice and thus out-of-date,
    missing, or incorrect information is expected. As the model may
    incorrectly memorize this information by simply maximizing the
    log-likelihood, we would expect the model to make mistakes.
    Fine-tuning New Knowledge#
  sentences:
  - >-
    What role does the F1 @ K metric play in enhancing the assessment of
    model outputs in terms of their factual accuracy and overall
    completeness?
  - >-
    In what ways do MaybeKnown examples improve the performance of a model
    when contrasted with HighlyKnown examples, and what implications does
    this have for developing effective training strategies?
  - >-
    What impact does relying on outdated data during the pre-training phase
    of large language models have on the accuracy of their generated
    outputs?
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.953125
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 1
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.953125
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3333333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.19999999999999998
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09999999999999999
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.953125
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 1
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9826998321986622
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9765625
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9765625
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.9479166666666666
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 1
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9479166666666666
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3333333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.19999999999999998
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09999999999999999
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.9479166666666666
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 1
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9800956655319956
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9730902777777778
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9730902777777777
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.9635416666666666
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 1
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9635416666666666
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3333333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.19999999999999998
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09999999999999999
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.9635416666666666
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 1
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9865443139322926
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9817708333333334
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9817708333333334
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.9583333333333334
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 1
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9583333333333334
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3333333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.19999999999999998
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09999999999999999
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.9583333333333334
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 1
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9832582214657748
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9774305555555555
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9774305555555557
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.9583333333333334
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 1
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9583333333333334
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3333333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.19999999999999998
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09999999999999999
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.9583333333333334
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 1
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9832582214657748
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9774305555555555
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9774305555555557
      name: Cosine Map@100
---

BGE base Financial Matryoshka
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation (https://sbert.net)
- Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
- Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
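In other words, an input is lowercased and truncated to 512 tokens, encoded by BERT, pooled by taking the [CLS] token embedding, and L2-normalized. Below is a minimal sketch of the equivalent computation using the transformers library directly, assuming the checkpoint also loads with AutoModel; the SentenceTransformer class already does all of this for you.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("joshuapb/fine-tuned-matryoshka-1500")
bert = AutoModel.from_pretrained("joshuapb/fine-tuned-matryoshka-1500")

# (0) Transformer: tokenize (lowercased, truncated to 512 tokens) and encode
inputs = tokenizer(["An example sentence"], padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**inputs).last_hidden_state

# (1) Pooling with pooling_mode_cls_token=True: keep only the [CLS] token
cls_embedding = token_embeddings[:, 0]

# (2) Normalize: L2-normalize so that dot product equals cosine similarity
sentence_embedding = F.normalize(cls_embedding, p=2, dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```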
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-1500")

# Run inference
sentences = [
    'This post focuses on extrinsic hallucination. To avoid hallucination, LLMs need to be (1) factual and (2) acknowledge not knowing the answer when applicable.\nWhat Causes Hallucinations?#\nGiven a standard deployable LLM goes through pre-training and fine-tuning for alignment and other improvements, let us consider causes at both stages.\nPre-training Data Issues#\nThe volume of the pre-training data corpus is enormous, as it is supposed to represent world knowledge in all available written forms. Data crawled from the public Internet is the most common choice and thus out-of-date, missing, or incorrect information is expected. As the model may incorrectly memorize this information by simply maximizing the log-likelihood, we would expect the model to make mistakes.\nFine-tuning New Knowledge#',
    'What impact does relying on outdated data during the pre-training phase of large language models have on the accuracy of their generated outputs?',
    'In what ways do MaybeKnown examples improve the performance of a model when contrasted with HighlyKnown examples, and what implications does this have for developing effective training strategies?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Compute pairwise cosine similarity between the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
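Because the model was trained with MatryoshkaLoss, its embeddings can also be truncated to any of the evaluated sizes (512, 256, 128, or 64) with only a small quality drop. A sketch using the truncate_dim argument available in recent sentence-transformers releases; the example queries here are made up:

```python
from sentence_transformers import SentenceTransformer

# Load the model with a reduced output dimensionality; any of the
# evaluated Matryoshka sizes (768, 512, 256, 128, 64) can be used.
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-1500", truncate_dim=256)

embeddings = model.encode([
    "What causes hallucinations in large language models?",
    "How does retrieval-augmented generation improve factuality?",
])
print(embeddings.shape)
# (2, 256)
```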
Evaluation
Metrics
Information Retrieval

Results per Matryoshka embedding dimension (the dim_768 through dim_64 evaluation datasets):

| Metric              | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|---------------------|---------|---------|---------|---------|--------|
| cosine_accuracy@1   | 0.9531  | 0.9479  | 0.9635  | 0.9583  | 0.9583 |
| cosine_accuracy@3   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_accuracy@5   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_accuracy@10  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_precision@1  | 0.9531  | 0.9479  | 0.9635  | 0.9583  | 0.9583 |
| cosine_precision@3  | 0.3333  | 0.3333  | 0.3333  | 0.3333  | 0.3333 |
| cosine_precision@5  | 0.2     | 0.2     | 0.2     | 0.2     | 0.2    |
| cosine_precision@10 | 0.1     | 0.1     | 0.1     | 0.1     | 0.1    |
| cosine_recall@1     | 0.9531  | 0.9479  | 0.9635  | 0.9583  | 0.9583 |
| cosine_recall@3     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_recall@5     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_recall@10    | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_ndcg@10      | 0.9827  | 0.9801  | 0.9865  | 0.9833  | 0.9833 |
| cosine_mrr@10       | 0.9766  | 0.9731  | 0.9818  | 0.9774  | 0.9774 |
| cosine_map@100      | 0.9766  | 0.9731  | 0.9818  | 0.9774  | 0.9774 |
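Metrics of this kind can be computed with sentence-transformers' InformationRetrievalEvaluator, run once per Matryoshka dimension. A minimal sketch follows; the tiny queries, corpus, and relevance mappings are placeholders, since the actual evaluation split is not published with this card:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Evaluate at one Matryoshka dimension by truncating the embeddings
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-1500", truncate_dim=256)

# Placeholder data: query id -> text, doc id -> text, query id -> relevant doc ids
queries = {"q1": "What causes hallucinations in large language models?"}
corpus = {
    "d1": "Pre-training data issues: out-of-date, missing, or incorrect information ...",
    "d2": "RAG retrieves relevant documents and generates with them as extra context.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_256",
)
results = evaluator(model)
print(results)  # dict of metrics, e.g. 'dim_256_cosine_ndcg@10', 'dim_256_cosine_map@100'
```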
Training Details
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 5
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- load_best_model_at_end: True
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
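Taken together, these settings correspond to a standard sentence-transformers v3 training run with MatryoshkaLoss wrapping MultipleNegativesRankingLoss, as the tags above indicate. A minimal sketch of the implied setup; the two-pair dataset below is a placeholder for the actual 1,500 training pairs, which are not published with this card:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Placeholder anchor/positive pairs standing in for the real training data
train_dataset = Dataset.from_dict({
    "anchor": [
        "What causes hallucinations in large language models?",
        "How does RAG provide grounding information?",
    ],
    "positive": [
        "Pre-training data issues: out-of-date, missing, or incorrect information ...",
        "RAG retrieves relevant documents and generates with them as extra context.",
    ],
})

# In-batch negatives ranking loss, applied at every Matryoshka dimension
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="fine-tuned-matryoshka",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```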
Training Logs
| Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|-------|------|---------------|------------------------|------------------------|------------------------|-----------------------|------------------------|
| 0.0266 | 5 | 4.6076 | - | - | - | - | - |
| 0.0532 | 10 | 5.2874 | - | - | - | - | - |
| 0.0798 | 15 | 5.4181 | - | - | - | - | - |
| 0.1064 | 20 | 5.1322 | - | - | - | - | - |
| 0.1330 | 25 | 4.1674 | - | - | - | - | - |
| 0.1596 | 30 | 4.1998 | - | - | - | - | - |
| 0.1862 | 35 | 3.4182 | - | - | - | - | - |
| 0.2128 | 40 | 4.1142 | - | - | - | - | - |
| 0.2394 | 45 | 2.5775 | - | - | - | - | - |
| 0.2660 | 50 | 3.3767 | - | - | - | - | - |
| 0.2926 | 55 | 2.5797 | - | - | - | - | - |
| 0.3191 | 60 | 3.1813 | - | - | - | - | - |
| 0.3457 | 65 | 3.7209 | - | - | - | - | - |
| 0.3723 | 70 | 2.2637 | - | - | - | - | - |
| 0.3989 | 75 | 2.2651 | - | - | - | - | - |
| 0.4255 | 80 | 2.3023 | - | - | - | - | - |
| 0.4521 | 85 | 2.3261 | - | - | - | - | - |
| 0.4787 | 90 | 1.947 | - | - | - | - | - |
| 0.5053 | 95 | 0.8502 | - | - | - | - | - |
| 0.5319 | 100 | 2.2405 | - | - | - | - | - |
| 0.5585 | 105 | 2.0157 | - | - | - | - | - |
| 0.5851 | 110 | 1.4405 | - | - | - | - | - |
| 0.6117 | 115 | 1.9714 | - | - | - | - | - |
| 0.6383 | 120 | 2.5212 | - | - | - | - | - |
| 0.6649 | 125 | 2.734 | - | - | - | - | - |
| 0.6915 | 130 | 1.9357 | - | - | - | - | - |
| 0.7181 | 135 | 1.1727 | - | - | - | - | - |
| 0.7447 | 140 | 1.9789 | - | - | - | - | - |
| 0.7713 | 145 | 1.6362 | - | - | - | - | - |
| 0.7979 | 150 | 1.7356 | - | - | - | - | - |
| 0.8245 | 155 | 1.916 | - | - | - | - | - |
| 0.8511 | 160 | 2.0372 | - | - | - | - | - |
| 0.8777 | 165 | 1.5705 | - | - | - | - | - |
| 0.9043 | 170 | 1.9393 | - | - | - | - | - |
| 0.9309 | 175 | 1.6289 | - | - | - | - | - |
| 0.9574 | 180 | 2.8158 | - | - | - | - | - |
| 0.9840 | 185 | 1.1869 | - | - | - | - | - |
| 1.0 | 188 | - | 0.9319 | 0.9438 | 0.9401 | 0.9173 | 0.9421 |
| 1.0106 | 190 | 1.1572 | - | - | - | - | - |
| 1.0372 | 195 | 1.4815 | - | - | - | - | - |
| 1.0638 | 200 | 1.6742 | - | - | - | - | - |
| 1.0904 | 205 | 0.9434 | - | - | - | - | - |
| 1.1170 | 210 | 1.6141 | - | - | - | - | - |
| 1.1436 | 215 | 0.7478 | - | - | - | - | - |
| 1.1702 | 220 | 1.4812 | - | - | - | - | - |
| 1.1968 | 225 | 1.8121 | - | - | - | - | - |
| 1.2234 | 230 | 1.2595 | - | - | - | - | - |
| 1.25 | 235 | 1.8326 | - | - | - | - | - |
| 1.2766 | 240 | 1.3828 | - | - | - | - | - |
| 1.3032 | 245 | 1.5385 | - | - | - | - | - |
| 1.3298 | 250 | 1.1213 | - | - | - | - | - |
| 1.3564 | 255 | 1.0444 | - | - | - | - | - |
| 1.3830 | 260 | 0.3848 | - | - | - | - | - |
| 1.4096 | 265 | 0.8369 | - | - | - | - | - |
| 1.4362 | 270 | 1.682 | - | - | - | - | - |
| 1.4628 | 275 | 1.9625 | - | - | - | - | - |
| 1.4894 | 280 | 2.0732 | - | - | - | - | - |
| 1.5160 | 285 | 1.8939 | - | - | - | - | - |
| 1.5426 | 290 | 1.5621 | - | - | - | - | - |
| 1.5691 | 295 | 1.5474 | - | - | - | - | - |
| 1.5957 | 300 | 2.1111 | - | - | - | - | - |
| 1.6223 | 305 | 1.8619 | - | - | - | - | - |
| 1.6489 | 310 | 1.1091 | - | - | - | - | - |
| 1.6755 | 315 | 1.8127 | - | - | - | - | - |
| 1.7021 | 320 | 0.8599 | - | - | - | - | - |
| 1.7287 | 325 | 0.9553 | - | - | - | - | - |
| 1.7553 | 330 | 1.2444 | - | - | - | - | - |
| 1.7819 | 335 | 1.6786 | - | - | - | - | - |
| 1.8085 | 340 | 1.2092 | - | - | - | - | - |
| 1.8351 | 345 | 0.8824 | - | - | - | - | - |
| 1.8617 | 350 | 0.4448 | - | - | - | - | - |
| 1.8883 | 355 | 1.116 | - | - | - | - | - |
| 1.9149 | 360 | 1.587 | - | - | - | - | - |
| 1.9415 | 365 | 0.7235 | - | - | - | - | - |
| 1.9681 | 370 | 0.9446 | - | - | - | - | - |
| 1.9947 | 375 | 1.0066 | - | - | - | - | - |
| 2.0 | 376 | - | 0.9570 | 0.9523 | 0.9501 | 0.9501 | 0.9549 |
| 2.0213 | 380 | 1.3895 | - | - | - | - | - |
| 2.0479 | 385 | 1.0259 | - | - | - | - | - |
| 2.0745 | 390 | 0.9961 | - | - | - | - | - |
| 2.1011 | 395 | 1.4164 | - | - | - | - | - |
| 2.1277 | 400 | 0.5188 | - | - | - | - | - |
| 2.1543 | 405 | 0.2965 | - | - | - | - | - |
| 2.1809 | 410 | 0.4351 | - | - | - | - | - |
| 2.2074 | 415 | 0.7546 | - | - | - | - | - |
| 2.2340 | 420 | 1.9408 | - | - | - | - | - |
| 2.2606 | 425 | 1.0056 | - | - | - | - | - |
| 2.2872 | 430 | 1.3175 | - | - | - | - | - |
| 2.3138 | 435 | 0.9397 | - | - | - | - | - |
| 2.3404 | 440 | 1.4308 | - | - | - | - | - |
| 2.3670 | 445 | 0.8647 | - | - | - | - | - |
| 2.3936 | 450 | 0.8917 | - | - | - | - | - |
| 2.4202 | 455 | 0.7922 | - | - | - | - | - |
| 2.4468 | 460 | 1.1815 | - | - | - | - | - |
| 2.4734 | 465 | 0.8071 | - | - | - | - | - |
| 2.5 | 470 | 0.1601 | - | - | - | - | - |
| 2.5266 | 475 | 0.7533 | - | - | - | - | - |
| 2.5532 | 480 | 1.351 | - | - | - | - | - |
| 2.5798 | 485 | 1.2948 | - | - | - | - | - |
| 2.6064 | 490 | 1.4087 | - | - | - | - | - |
| 2.6330 | 495 | 2.2427 | - | - | - | - | - |
| 2.6596 | 500 | 0.4735 | - | - | - | - | - |
| 2.6862 | 505 | 0.8377 | - | - | - | - | - |
| 2.7128 | 510 | 0.525 | - | - | - | - | - |
| 2.7394 | 515 | 0.8455 | - | - | - | - | - |
| 2.7660 | 520 | 2.458 | - | - | - | - | - |
| 2.7926 | 525 | 1.2906 | - | - | - | - | - |
| 2.8191 | 530 | 1.0234 | - | - | - | - | - |
| 2.8457 | 535 | 0.3733 | - | - | - | - | - |
| 2.8723 | 540 | 0.388 | - | - | - | - | - |
| 2.8989 | 545 | 1.2155 | - | - | - | - | - |
| 2.9255 | 550 | 1.0288 | - | - | - | - | - |
| 2.9521 | 555 | 1.0578 | - | - | - | - | - |
| 2.9787 | 560 | 0.1793 | - | - | - | - | - |
| 3.0 | 564 | - | 0.9653 | 0.9714 | 0.9705 | 0.9609 | 0.9679 |
| 3.0053 | 565 | 1.0141 | - | - | - | - | - |
| 3.0319 | 570 | 0.6978 | - | - | - | - | - |
| 3.0585 | 575 | 0.6066 | - | - | - | - | - |
| 3.0851 | 580 | 0.2444 | - | - | - | - | - |
| 3.1117 | 585 | 0.581 | - | - | - | - | - |
| 3.1383 | 590 | 1.3544 | - | - | - | - | - |
| 3.1649 | 595 | 0.9379 | - | - | - | - | - |
| 3.1915 | 600 | 1.0088 | - | - | - | - | - |
| 3.2181 | 605 | 1.6689 | - | - | - | - | - |
| 3.2447 | 610 | 0.3204 | - | - | - | - | - |
| 3.2713 | 615 | 0.5433 | - | - | - | - | - |
| 3.2979 | 620 | 0.7225 | - | - | - | - | - |
| 3.3245 | 625 | 1.7695 | - | - | - | - | - |
| 3.3511 | 630 | 0.7472 | - | - | - | - | - |
| 3.3777 | 635 | 1.0883 | - | - | - | - | - |
| 3.4043 | 640 | 1.1863 | - | - | - | - | - |
| 3.4309 | 645 | 1.7163 | - | - | - | - | - |
| 3.4574 | 650 | 2.8196 | - | - | - | - | - |
| 3.4840 | 655 | 1.5015 | - | - | - | - | - |
| 3.5106 | 660 | 1.3862 | - | - | - | - | - |
| 3.5372 | 665 | 0.775 | - | - | - | - | - |
| 3.5638 | 670 | 1.2385 | - | - | - | - | - |
| 3.5904 | 675 | 0.9472 | - | - | - | - | - |
| 3.6170 | 680 | 0.6458 | - | - | - | - | - |
| 3.6436 | 685 | 0.8308 | - | - | - | - | - |
| 3.6702 | 690 | 1.0864 | - | - | - | - | - |
| 3.6968 | 695 | 1.0715 | - | - | - | - | - |
| 3.7234 | 700 | 1.5082 | - | - | - | - | - |
| 3.75 | 705 | 0.5028 | - | - | - | - | - |
| 3.7766 | 710 | 1.1525 | - | - | - | - | - |
| 3.8032 | 715 | 0.5829 | - | - | - | - | - |
| 3.8298 | 720 | 0.6168 | - | - | - | - | - |
| 3.8564 | 725 | 1.0185 | - | - | - | - | - |
| 3.8830 | 730 | 1.2545 | - | - | - | - | - |
| 3.9096 | 735 | 0.5604 | - | - | - | - | - |
| 3.9362 | 740 | 0.6879 | - | - | - | - | - |
| 3.9628 | 745 | 0.9936 | - | - | - | - | - |
| 3.9894 | 750 | 0.5786 | - | - | - | - | - |
| 4.0 | 752 | - | 0.9774 | 0.9818 | 0.9731 | 0.98 | 0.9792 |
| 4.0160 | 755 | 0.908 | - | - | - | - | - |
| 4.0426 | 760 | 0.988 | - | - | - | - | - |
| 4.0691 | 765 | 0.2616 | - | - | - | - | - |
| 4.0957 | 770 | 1.1475 | - | - | - | - | - |
| 4.1223 | 775 | 1.7832 | - | - | - | - | - |
| 4.1489 | 780 | 0.7522 | - | - | - | - | - |
| 4.1755 | 785 | 1.4473 | - | - | - | - | - |
| 4.2021 | 790 | 0.7194 | - | - | - | - | - |
| 4.2287 | 795 | 0.0855 | - | - | - | - | - |
| 4.2553 | 800 | 1.151 | - | - | - | - | - |
| 4.2819 | 805 | 1.5109 | - | - | - | - | - |
| 4.3085 | 810 | 0.7462 | - | - | - | - | - |
| 4.3351 | 815 | 0.4697 | - | - | - | - | - |
| 4.3617 | 820 | 1.1215 | - | - | - | - | - |
| 4.3883 | 825 | 1.3527 | - | - | - | - | - |
| 4.4149 | 830 | 0.8995 | - | - | - | - | - |
| 4.4415 | 835 | 1.0011 | - | - | - | - | - |
| 4.4681 | 840 | 1.1168 | - | - | - | - | - |
| 4.4947 | 845 | 1.3105 | - | - | - | - | - |
| 4.5213 | 850 | 0.2855 | - | - | - | - | - |
| 4.5479 | 855 | 1.3223 | - | - | - | - | - |
| 4.5745 | 860 | 0.6377 | - | - | - | - | - |
| 4.6011 | 865 | 1.2196 | - | - | - | - | - |
| 4.6277 | 870 | 1.257 | - | - | - | - | - |
| 4.6543 | 875 | 0.93 | - | - | - | - | - |
| 4.6809 | 880 | 0.8831 | - | - | - | - | - |
| 4.7074 | 885 | 0.23 | - | - | - | - | - |
| 4.7340 | 890 | 0.9771 | - | - | - | - | - |
| 4.7606 | 895 | 1.026 | - | - | - | - | - |
| 4.7872 | 900 | 1.4671 | - | - | - | - | - |
| 4.8138 | 905 | 0.8719 | - | - | - | - | - |
| 4.8404 | 910 | 0.9108 | - | - | - | - | - |
| 4.8670 | 915 | 1.359 | - | - | - | - | - |
| 4.8936 | 920 | 1.3237 | - | - | - | - | - |
| 4.9202 | 925 | 0.6591 | - | - | - | - | - |
| 4.9468 | 930 | 0.405 | - | - | - | - | - |
| 4.9734 | 935 | 1.1984 | - | - | - | - | - |
| **5.0** | **940** | **0.5747** | **0.9774** | **0.9818** | **0.9731** | **0.9774** | **0.9766** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}