BGE base En v1.5 Phase 5
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
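The pooling configuration above selects the [CLS] token (pooling_mode_cls_token is True), and the final module L2-normalizes the result. A minimal sketch of those two steps on toy data, in plain Python; the function names and the 3-dimensional vectors are illustrative assumptions, not the library's implementation:

```python
import math


def cls_pool(token_embeddings):
    """Return the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy "transformer output": 3 tokens with 3 dimensions each (the real model uses 768).
toy_tokens = [
    [3.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0],
    [0.5, 0.2, 0.9],
]

sentence_embedding = l2_normalize(cls_pool(toy_tokens))
print(sentence_embedding)
```

Because every embedding leaves the model with unit length, the dot product of two embeddings is already their cosine similarity.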
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("RishuD7/bge-base-en-v1.5-76-keys-phase-6-exp_v1")
# Run inference
sentences = [
'31. HOLDING OVER. If Tenant remains in possession of the Leased Premises after\nexpiration of the Term, or after any termination of the Lease by Landlord without written agreement\nbetween the parties, Tenant shall be a tenant at sufferance and such tenancy shall be subject to the\nprovisions hereof, except that Rent for said holdover period shall be one hundred fifty percent (150%) of\nthe amount of Rent due in the last month of the Term. Nothing in this Section 29 shall be construed as\nconsent by Landlord to the possession of the Leased Premises by Tenant after the expiration of the Term\nor termination of the Lease by Landlord. ',
'Holding Rent',
'Does landlord confirm to no eminent domain on the property',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
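The similarity matrix above can be used to rank candidate texts against a query. The sketch below illustrates that on hypothetical 2-dimensional vectors rather than real model output; `cosine_similarity` and `rank_candidates` are illustrative helpers, not functions from the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Return (candidate_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: one query and three candidates.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.1]]

ranking = rank_candidates(query, candidates)
print([index for index, _ in ranking])
```

With real embeddings, the query would be a short key such as 'Holding Rent' and the candidates would be lease clauses.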
Evaluation
Metrics
Information Retrieval
- Dataset: dim_768
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.011 |
cosine_accuracy@3 | 0.0228 |
cosine_accuracy@5 | 0.036 |
cosine_accuracy@10 | 0.0772 |
cosine_precision@1 | 0.011 |
cosine_precision@3 | 0.0076 |
cosine_precision@5 | 0.0072 |
cosine_precision@10 | 0.0077 |
cosine_recall@1 | 0.011 |
cosine_recall@3 | 0.0228 |
cosine_recall@5 | 0.036 |
cosine_recall@10 | 0.0772 |
cosine_ndcg@10 | 0.0362 |
cosine_mrr@10 | 0.0243 |
cosine_map@100 | 0.0367 |
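In the table above, accuracy@k equals recall@k, and precision@k is roughly accuracy@k divided by k (for example, 0.0772 / 10 ≈ 0.0077). That pattern is what one would expect if each query has exactly one relevant document, although the card does not state this. The sketch below computes the same family of metrics for hypothetical ranked results under that single-relevant-document assumption; it is not the InformationRetrievalEvaluator implementation, and the document IDs are made up.

```python
def metrics_at_k(ranked_ids_per_query, relevant_id_per_query, k):
    """Accuracy@k, precision@k, recall@k and MRR@k, assuming one relevant doc per query."""
    num_queries = len(ranked_ids_per_query)
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_id_per_query):
        top_k = ranked[:k]
        if relevant in top_k:
            hits += 1
            # Rank is 1-based, so the first position contributes 1.0.
            reciprocal_rank_sum += 1.0 / (top_k.index(relevant) + 1)
    accuracy = hits / num_queries
    return {
        "accuracy": accuracy,
        # With one relevant doc per query, at most 1 of the k retrieved docs is relevant.
        "precision": hits / (num_queries * k),
        # With one relevant doc per query, recall is 1 on a hit and 0 otherwise.
        "recall": accuracy,
        "mrr": reciprocal_rank_sum / num_queries,
    }


# Hypothetical retrieval results for four queries.
ranked = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d1"],
    ["d3", "d1", "d2"],
    ["d2", "d1", "d3"],
]
relevant = ["d1", "d3", "d9", "d1"]

result = metrics_at_k(ranked, relevant, k=2)
print(result)
```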
Training Details
Training Dataset
Unnamed Dataset
- Size: 8,290 training samples
- Columns: positive and anchor
- Approximate statistics based on the first 1000 samples:
Column | Type | Min | Mean | Max |
---|---|---|---|---|
positive | string | 98 tokens | 298.43 tokens | 512 tokens |
anchor | string | 4 tokens | 5.99 tokens | 12 tokens |
- Samples:
positive | anchor |
---|---|
The Landlord shall have the right, at any time during the Term, to relocate the Premises to other premises (the "New Premises") in the Development on the same terms and conditions as are set out in this Lease provided that: (a) the Landlord shall first have given not less than 90 days notice to the Tenant; (b) the Landlord shall endeavour to ensure that the New Premises be of comparable size and quality to the Premises; (c) the Landlord shall pay the reasonable costs incurred by the Tenant for: (i) its physical move; (ii) the reconnection of existing communication lines; and (iii) the reordering of new printed material plates and the printing of an equal quantity and quality of printed material the tenant has in stock as the time of the relocation; (d) if the Rentable Area of the New Premises is not the same as the Rentable Area of the Premises, the total Basic Rent payable under this Lease (but not the Basic Rent per square foot of Rentable Area) shall be adjusted accordingly; and (e)... | Right to Relocate |
39. Holdover: If Tenant shall hold over after the expiration of the Lease Term, without written agreement providing otherwise, Tenant shall be deemed to be a tenant at sufferance on month to month basis, at a monthly rental, payable in advance, equal to double the base rent then being paid by Tenant, and Tenant shall be bound by all of the other terms, covenants and agreements of the Lease. Nothing contained herein shall be construed to give Tenant the right to hold over at any time, extend the Term or prevent Landlord from immediate recovery of possession of the Premises by summary proceedings or otherwise and Landlord may exercise any and all remedies at law or in equity to recover possession of the Premises, as well as any damages incurred by Landlord, by Tenant's failure to vacate the Premises and deliver possession to Landlord as herein provided. | Holding Over |
30. HOLDING OVER. If Tenant remains in possession of the Leased Premises after expiration of the Term, or after any termination of the Lease by Landlord without written agreement between the parties, Tenant shall be a tenant at sufferance and such tenancy shall be subject to the provisions hereof, except that Gross Rent for said holdover period shall be one hundred twenty five percent (125%) of the amount of Gross Rent due in the last month of the Term. Nothing in this Section 30 shall be construed as consent by Landlord to the possession of the Leased Premises by Tenant after the expiration of the Term or termination of the Lease by Landlord. In the event Tenant provides written notice to Landlord of its intent to holdover at least sixty (60) days prior to the end of the Term and Landlord does not object to such request within thirty (30) days after receipt thereof, it shall be deemed that Landlord has consented to such holdover and this Lease shall continue on a month-to-month basis ... | Holding Over |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
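MultipleNegativesRankingLoss treats each (anchor, positive) pair in a batch as the correct match and every other positive in the same batch as a negative. The following is a rough plain-Python sketch using the scale of 20.0 and cosine similarity listed above, applied to hypothetical 2-dimensional embeddings; the helper names are illustrative and this is not the library's code.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over rows of the scaled similarity matrix.

    Row i holds the similarities between anchor i and every positive in the batch;
    the target class for row i is column i (its own positive).
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, positive) for positive in positives]
        # Numerically stable log-sum-exp.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical batch of two pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_positives = [[0.9, 0.1], [0.1, 0.9]]  # each positive is close to its own anchor
swapped_positives = [[0.1, 0.9], [0.9, 0.1]]  # each positive is close to the other anchor

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned_positives)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped_positives)
print(loss_aligned, loss_swapped)
```

The loss is close to zero when every anchor is most similar to its own positive and grows when an in-batch negative scores higher. In-batch negatives are only meaningful if a batch does not contain the same text twice, which is what the no_duplicates batch sampler listed under the hyperparameters is meant to avoid.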
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 30
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- tf32: False
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
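The batch size, accumulation, and scheduler settings above interact. The sketch below works through the arithmetic, assuming a single device and the 8,290-sample training set. Under those assumptions there are 260 batches per epoch, an effective batch of 512 pairs per optimizer step, and 480 optimizer steps over 30 epochs, which matches the last step in the training log. The learning-rate function is a plain-Python rendering of linear warmup followed by half-cosine decay, written for illustration rather than taken from the trainer.

```python
import math

# Values from the hyperparameter list and the training dataset size.
NUM_SAMPLES = 8290
PER_DEVICE_BATCH = 32
GRAD_ACCUM = 16
EPOCHS = 30
BASE_LR = 2e-05
WARMUP_RATIO = 0.1

# Assumption: one device, and the last partial batch of each epoch is kept.
batches_per_epoch = math.ceil(NUM_SAMPLES / PER_DEVICE_BATCH)  # 260
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM                # 512
steps_per_epoch = batches_per_epoch // GRAD_ACCUM              # 16
total_steps = steps_per_epoch * EPOCHS                         # 480
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)           # 48


def learning_rate(step):
    """Linear warmup to BASE_LR, then half-cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_steps, warmup_steps)
for step in (0, 24, 48, 264, 480):
    print(step, learning_rate(step))
```

The same numbers explain the fractional epochs in the training log: 260 batches per epoch divided by 16 accumulation steps gives 16.25 optimizer steps per epoch, so step 10 corresponds to epoch 10 / 16.25 ≈ 0.6154.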
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 30
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: False
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 |
---|---|---|---|
0.6154 | 10 | 2.5422 | - |
1.2308 | 20 | 1.3661 | - |
1.8462 | 30 | 0.1879 | - |
2.4615 | 40 | 0.0 | - |
3.0769 | 50 | 0.0 | - |
3.3846 | 55 | - | 0.0252 |
1.2846 | 60 | 0.8868 | - |
1.9 | 70 | 1.4243 | - |
2.5154 | 80 | 0.1644 | - |
3.1308 | 90 | 0.0041 | - |
3.7462 | 100 | 0.0 | - |
4.3615 | 110 | 0.0 | 0.0301 |
2.5692 | 120 | 1.0665 | - |
3.1846 | 130 | 0.4817 | - |
3.8 | 140 | 0.0021 | - |
4.4154 | 150 | 0.0 | - |
5.0308 | 160 | 0.0 | - |
5.4 | 166 | - | 0.0328 |
3.2385 | 170 | 0.4318 | - |
3.8538 | 180 | 0.7595 | - |
4.4692 | 190 | 0.0737 | - |
5.0846 | 200 | 0.0004 | - |
5.7 | 210 | 0.0 | - |
6.3154 | 220 | 0.0 | - |
6.3769 | 221 | - | 0.0354 |
4.5231 | 230 | 0.736 | - |
5.1385 | 240 | 0.3332 | - |
5.7538 | 250 | 0.0008 | - |
6.3692 | 260 | 0.0 | - |
6.9846 | 270 | 0.0 | - |
7.3538 | 276 | - | 0.0336 |
5.1923 | 280 | 0.3014 | - |
5.8077 | 290 | 0.5931 | - |
6.4231 | 300 | 0.0735 | - |
7.0385 | 310 | 0.0002 | - |
7.6538 | 320 | 0.0 | - |
8.2692 | 330 | 0.0 | - |
8.3923 | 332 | - | 0.0374 |
6.4769 | 340 | 0.5984 | - |
7.0923 | 350 | 0.2797 | - |
7.7077 | 360 | 0.0005 | - |
8.3231 | 370 | 0.0 | - |
8.9385 | 380 | 0.0 | - |
9.3692 | 387 | - | 0.0355 |
7.1462 | 390 | 0.1997 | - |
7.7615 | 400 | 0.5201 | - |
8.3769 | 410 | 0.0799 | - |
8.9923 | 420 | 0.0001 | - |
9.6077 | 430 | 0.0 | - |
10.2231 | 440 | 0.0 | - |
10.4077 | 443 | - | 0.0362 |
8.4308 | 450 | 0.5072 | - |
9.0462 | 460 | 0.2583 | - |
9.6615 | 470 | 0.0005 | - |
10.2769 | 480 | 0.0 | 0.0362 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.3.1
- Transformers: 4.43.1
- PyTorch: 2.5.1+cu124
- Accelerate: 1.2.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}