SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5 on the baconnier/finance_dataset_small_private dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: BAAI/bge-small-en-v1.5
Maximum Sequence Length: 512 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- baconnier/finance_dataset_small_private

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
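
The last two modules of the architecture above can be illustrated without the library. The following is a minimal sketch, assuming that CLS pooling selects the first token's vector and that Normalize() rescales it to unit Euclidean length; the helper names `cls_pool` and `l2_normalize` and the 4-dimensional toy vectors are hypothetical stand-ins, not part of Sentence Transformers.

```python
import math
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Return the first token's vector, mirroring pooling_mode_cls_token=True."""
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    return list(token_embeddings[0])


def l2_normalize(vector: Vector, eps: float = 1e-12) -> Vector:
    """Scale a vector to unit Euclidean length (the assumed Normalize() step)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


# Toy input: 3 tokens with 4-dimensional vectors (the real model uses 384 dimensions).
toy_tokens = [
    [3.0, 0.0, 4.0, 0.0],  # stand-in for the [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]
sentence_embedding = l2_normalize(cls_pool(toy_tokens))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the output has unit length, the cosine similarity of two such embeddings equals their dot product.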

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")
# Run inference
sentences = [
    'What is industrial production, and how is it measured by the Federal Reserve Board?',
    'Industrial production is a statistic determined by the Federal Reserve Board that measures the total output of all US factories and mines on a monthly basis. The Fed collects data from various government agencies and trade associations to calculate the industrial production index, which serves as an important economic indicator, providing insight into the health of the manufacturing and mining sectors.\nIndustrial production is a monthly statistic calculated by the Federal Reserve Board, measuring the total output of US factories and mines using data from government agencies and trade associations, serving as a key economic indicator for the manufacturing and mining sectors.',
    'Industrial production is a statistic that measures the output of factories and mines in the US. It is released by the Federal Reserve Board every quarter.\nIndustrial production measures factory and mine output, released quarterly by the Fed.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
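
For semantic search, the similarity scores are typically used to rank candidate texts against a query. The sketch below shows one way to do that in plain Python; `rank_by_similarity` and the 3-dimensional toy vectors are hypothetical and only stand in for rows of the 384-dimensional `embeddings` array.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], documents: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for a query embedding and three document embeddings.
toy_query = [1.0, 0.0, 0.0]
toy_documents = [
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
]
ranking = rank_by_similarity(toy_query, toy_documents)
print([index for index, _ in ranking])  # [1, 2, 0]
```

With the model loaded as above, `embeddings[0]` could serve as the query and `embeddings[1:]` as the documents.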

Evaluation

Metrics

Triplet

Dataset: Finance_Embedding_Metric
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.9791
dot_accuracy	0.0209
manhattan_accuracy	0.978
euclidean_accuracy	0.9791
max_accuracy	0.9791
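
As a rough illustration of what a triplet accuracy measures, the sketch below counts a triplet as correct when the anchor is more similar to the positive than to the negative under cosine similarity. This is an assumption about the metric's definition written in plain Python, not the TripletEvaluator implementation; `triplet_cosine_accuracy` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence, Tuple

Triplet = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def triplet_cosine_accuracy(triplets: Sequence[Triplet]) -> float:
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    if not triplets:
        raise ValueError("expected at least one triplet")
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Toy 2-dimensional triplets; the third one is deliberately ranked wrongly.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),
    ([1.0, 0.0], [0.8, 0.2], [-1.0, 0.0]),
]
print(triplet_cosine_accuracy(toy_triplets))  # 0.75
```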

Training Details

Training Dataset

baconnier/finance_dataset_small_private

Dataset: baconnier/finance_dataset_small_private at d7e6492
Size: 15,525 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 11 tokens mean: 76.86 tokens max: 304 tokens	min: 10 tokens mean: 79.23 tokens max: 299 tokens	min: 14 tokens mean: 60.36 tokens max: 155 tokens

Samples:

anchor	positive	negative
`What is the key difference between a whole loan and a participation loan in terms of investment ownership?`	The context clearly states that a whole loan is a type of investment where an investor purchases the entire mortgage loan from the original lender, becoming the sole owner. This is in contrast to a participation loan, where multiple investors share ownership of a single loan. Therefore, the key difference between a whole loan and a participation loan is that a whole loan is owned entirely by a single investor, while a participation loan involves shared ownership among multiple investors. In a whole loan, a single investor owns the entire mortgage loan, while in a participation loan, multiple investors share ownership of the loan.	`A whole loan is where multiple investors share ownership of a loan, while a participation loan is where an investor purchases the entire loan. Since the context states that a whole loan is where an investor purchases the entire mortgage loan and becomes the sole owner, this answer is incorrect. A whole loan involves multiple investors, while a participation loan is owned by a single investor.`
The role of an executor is to manage and distribute the assets of a deceased person's estate in accordance with their will. This includes tasks such as settling debts, filing tax returns, and ensuring that the assets are distributed to the beneficiaries as specified in the will. The executor is appointed by the court to carry out these duties. In the given context, Michael Johnson was nominated by John Smith in his will and appointed by the court as the executor of John's estate, which was valued at $5 million. Michael's responsibilities include dividing the estate equally among John's three children, donating $500,000 to the local animal shelter as per John's instructions, settling the $200,000 mortgage and $50,000 credit card debt, and filing John's final income tax return and paying any outstanding taxes. An executor, appointed by the court, manages and distributes a deceased person's assets according to their will, settling debts, filing taxes, and ensuring the will is followed.	`What is the role of an executor in managing a deceased person's estate?`	`An executor is someone who manages a deceased person's estate. They are responsible for distributing the assets according to the will. In this case, John Smith passed away and nominated Michael Johnson as the executor. The executor is responsible for distributing the assets of a deceased person's estate according to their will.`
`What is a ticker tape, and how does it help investors?`	A ticker tape is a computerized device that relays stock symbols, latest prices, and trading volumes to investors worldwide in real-time. It helps investors by providing up-to-the-second information about the stocks they are monitoring or interested in, enabling them to make quick and informed trading decisions based on the most current market data available. A ticker tape is a real-time digital stock data display that empowers investors to make timely, data-driven trading decisions by providing the latest stock symbols, prices, and volumes.	`A ticker tape is a device that shows stock information. It helps investors by providing some data about stocks. A ticker tape provides stock data to investors.`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
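
To make the two parameters concrete, here is a minimal pure-Python sketch of a multiple-negatives ranking loss, assuming the usual formulation: each anchor is scored against every positive and negative in the batch with cosine similarity, scores are multiplied by `scale`, and cross-entropy is taken with the anchor's own positive as the target. The function name and toy batch are hypothetical, and the library's implementation operates on tensors rather than lists.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(
    anchors: Sequence[Vector],
    positives: Sequence[Vector],
    negatives: Sequence[Vector],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy of picking each anchor's positive among all candidates."""
    if not anchors or not (len(anchors) == len(positives) == len(negatives)):
        raise ValueError("expected equally sized, non-empty batches")
    # Other anchors' positives and all explicit negatives act as negatives.
    candidates: List[Vector] = list(positives) + list(negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine_similarity(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy batch of two triplets with 2-dimensional vectors.
toy_anchors = [[1.0, 0.0], [0.0, 1.0]]
toy_positives = [[1.0, 0.0], [0.0, 1.0]]
toy_negatives = [[-1.0, 0.0], [0.0, -1.0]]
print(multiple_negatives_ranking_loss(toy_anchors, toy_positives, toy_negatives))
```

In this sketch a larger `scale` sharpens the softmax, so small differences in cosine similarity produce larger differences in the loss.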

Evaluation Dataset

baconnier/finance_dataset_small_private

Dataset: baconnier/finance_dataset_small_private at d7e6492
Size: 862 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples (the evaluation set has 862, so all of them are covered):

	anchor	positive	negative
type	string	string	string
details	min: 10 tokens mean: 78.51 tokens max: 286 tokens	min: 12 tokens mean: 76.02 tokens max: 304 tokens	min: 20 tokens mean: 59.8 tokens max: 271 tokens

Samples:

anchor	positive	negative
`What is the underwriter's discount in the given IPO scenario, and how does it relate to the gross spread?`	The underwriter's discount is the difference between the price the underwriter pays for the shares and the price at which they sell them to the public. In this case, the underwriter buys the shares at a 7% discount from the IPO price of $20 per share. The underwriter's discount is also known as the gross spread, as it represents the gross profit earned by the underwriter. The underwriter's discount is 7%, which is equivalent to $1.40 per share. This is also known as the gross spread, representing the underwriter's gross profit.	`The underwriter's discount is the difference between the price the underwriter pays for the shares and the price at which they sell them to the public. In this case, the underwriter buys the shares at a 7% discount, but the gross spread is not mentioned. The underwriter's discount is 7%, but the gross spread is unknown.`
`What is the primary function of the equity market, and how does it relate to the stock market?`	The equity market, synonymous with the stock market, serves as a platform for companies to issue ownership shares to raise capital for growth and expansion. Simultaneously, it allows investors to buy these shares, becoming part-owners of the companies and potentially earning returns through stock price appreciation and dividends. The equity market plays a vital role in the financial system by efficiently allocating capital to businesses and providing investment opportunities to individuals and institutions. The equity market, or stock market, primarily functions as a mechanism for companies to raise capital by issuing ownership shares, while providing investors with opportunities to invest in these companies and earn returns, thus facilitating efficient capital allocation in the financial system.	`The equity market is where ownership shares of companies are bought and sold. It allows companies to raise money by selling stocks. The stock market is the same as the equity market. The equity market and the stock market are the same thing, where stocks are traded.`
A selling syndicate is a group of investment banks that work together to underwrite and distribute a new security issue, such as stocks or bonds, to investors. The syndicate is typically led by one or more lead underwriters, who coordinate the distribution of the securities and set the offering price. In the case of XYZ Corporation, the selling syndicate is led by ABC Investment Bank and consists of 5 investment banks in total. The syndicate has agreed to purchase 10 million new shares from XYZ Corporation at a fixed price of $50 per share, which they will then sell to investors at a higher price of $55 per share. This process allows XYZ Corporation to raise capital by issuing new shares, while the selling syndicate earns a commission on the sale of the shares. The syndicate's role is to facilitate the distribution of the new shares to a wider pool of investors, helping to ensure the success of the offering. A selling syndicate is a group of investment banks that jointly underwrite and distribute a new security issue to investors. In XYZ Corporation's case, the syndicate will purchase shares from the company at a fixed price and resell them to investors at a higher price, earning a commission and facilitating the successful distribution of the new shares.	`What is a selling syndicate, and how does it function in the context of XYZ Corporation's new share issue?`	A selling syndicate is a group of investment banks that work together to sell new shares of a company. In this case, XYZ Corporation has hired 5 investment banks to sell their new shares. The syndicate buys the shares from XYZ Corporation at a fixed price and then sells them to investors at a higher price. A selling syndicate is a group of investment banks that jointly underwrite and distribute new shares of a company to investors, buying the shares at a fixed price and selling them at a higher price.

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
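
The interaction of `learning_rate`, `lr_scheduler_type: linear` and `warmup_ratio` can be sketched as follows. This is an illustration under stated assumptions, not the trainer's code: it assumes a single device, that the step count is the training set size divided by the batch size and rounded up, and that the warmup step count is rounded up as well. The function name `linear_schedule_with_warmup` is hypothetical.

```python
import math


def linear_schedule_with_warmup(
    step: int, total_steps: int, warmup_steps: int, base_lr: float
) -> float:
    """Learning rate at a given step: linear ramp up, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Values taken from this card: 15,525 samples, batch size 16, 1 epoch,
# learning_rate 5e-05, warmup_ratio 0.1, gradient_accumulation_steps 1.
train_samples = 15_525
batch_size = 16
num_epochs = 1
base_lr = 5e-05

total_steps = math.ceil(train_samples / batch_size) * num_epochs
warmup_steps = math.ceil(total_steps * 0.1)  # assumed rounding

print(total_steps, warmup_steps)  # 971 98
for step in (0, warmup_steps, total_steps):
    print(step, linear_schedule_with_warmup(step, total_steps, warmup_steps, base_lr))
```

The computed total of 971 steps agrees with the final step reported in the training logs.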

Training Logs

Epoch	Step	Training Loss	loss	Finance_Embedding_Metric_max_accuracy
0.0103	10	0.9918	-	-
0.0206	20	0.8866	-	-
0.0309	30	0.7545	-	-
0.0412	40	0.6731	-	-
0.0515	50	0.2897	-	-
0.0618	60	0.214	-	-
0.0721	70	0.1677	-	-
0.0824	80	0.0479	-	-
0.0927	90	0.191	-	-
0.1030	100	0.1188	-	-
0.1133	110	0.1909	-	-
0.1236	120	0.0486	-	-
0.1339	130	0.0812	-	-
0.1442	140	0.1282	-	-
0.1545	150	0.15	-	-
0.1648	160	0.0605	-	-
0.1751	170	0.0431	-	-
0.1854	180	0.0613	-	-
0.1957	190	0.0407	-	-
0.2008	195	-	0.0605	-
0.2060	200	0.0567	-	-
0.2163	210	0.0294	-	-
0.2266	220	0.0284	-	-
0.2369	230	0.0444	-	-
0.2472	240	0.0559	-	-
0.2575	250	0.0301	-	-
0.2678	260	0.0225	-	-
0.2781	270	0.0256	-	-
0.2884	280	0.016	-	-
0.2987	290	0.0063	-	-
0.3090	300	0.0442	-	-
0.3193	310	0.0425	-	-
0.3296	320	0.0534	-	-
0.3399	330	0.0264	-	-
0.3502	340	0.043	-	-
0.3605	350	0.035	-	-
0.3708	360	0.0212	-	-
0.3811	370	0.0171	-	-
0.3913	380	0.0497	-	-
0.4016	390	0.0294	0.0381	-
0.4119	400	0.0317	-	-
0.4222	410	0.0571	-	-
0.4325	420	0.0251	-	-
0.4428	430	0.0162	-	-
0.4531	440	0.0504	-	-
0.4634	450	0.0257	-	-
0.4737	460	0.0185	-	-
0.4840	470	0.0414	-	-
0.4943	480	0.016	-	-
0.5046	490	0.0432	-	-
0.5149	500	0.0369	-	-
0.5252	510	0.0115	-	-
0.5355	520	0.034	-	-
0.5458	530	0.0143	-	-
0.5561	540	0.0225	-	-
0.5664	550	0.0185	-	-
0.5767	560	0.0085	-	-
0.5870	570	0.0262	-	-
0.5973	580	0.0465	-	-
0.6025	585	-	0.0541	-
0.6076	590	0.0121	-	-
0.6179	600	0.0256	-	-
0.6282	610	0.0203	-	-
0.6385	620	0.0301	-	-
0.6488	630	0.017	-	-
0.6591	640	0.0321	-	-
0.6694	650	0.0087	-	-
0.6797	660	0.0276	-	-
0.6900	670	0.0043	-	-
0.7003	680	0.0063	-	-
0.7106	690	0.0293	-	-
0.7209	700	0.01	-	-
0.7312	710	0.0121	-	-
0.7415	720	0.0164	-	-
0.7518	730	0.0052	-	-
0.7621	740	0.0271	-	-
0.7724	750	0.0363	-	-
0.7827	760	0.0523	-	-
0.7930	770	0.0153	-	-
0.8033	780	0.015	0.0513	-
0.8136	790	0.0042	-	-
0.8239	800	0.0088	-	-
0.8342	810	0.0217	-	-
0.8445	820	0.0345	-	-
0.8548	830	0.01	-	-
0.8651	840	0.0243	-	-
0.8754	850	0.0074	-	-
0.8857	860	0.0082	-	-
0.8960	870	0.0104	-	-
0.9063	880	0.0078	-	-
0.9166	890	0.0163	-	-
0.9269	900	0.0168	-	-
0.9372	910	0.0088	-	-
0.9475	920	0.0186	-	-
0.9578	930	0.0055	-	-
0.9681	940	0.0142	-	-
0.9784	950	0.0251	-	-
0.9887	960	0.0468	-	-
0.9990	970	0.0031	-	-
1.0	971	-	-	0.9791

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.0.1
Transformers: 4.41.2
PyTorch: 2.3.0+cu121
Accelerate: 0.31.0
Datasets: 2.19.2
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
