metadata
base_model: BAAI/bge-large-en
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:574
- loss:CosineSimilarityLoss
widget:
- source_sentence: What is mentioned regarding the patent errors?
sentences:
- the Schedule to the Indian Medical Council Act
- >-
shall take upon himself and provide for the risk of any error which may
subsequently be discovered and shall make no subsequent claim on account
thereof.
- Omissions and Descrepancies
- source_sentence: Is there a way to claim consequential losses?
sentences:
- >-
The Railway reserves the right of not to invite tenders for any of
Railway work or works or to invite open or limited tenders
- >-
entitle the Contractor to damages or compensation therefor, but in any
such case, the Railway may grant such extension or extensions of the
completion date as may be considered reasonable.
- "The Railway shall have the right to let other contracts in connection with the works. The Contractor shall afford other Contractors reasonable opportunity for the storage of their materials and the execution of their works and shall properly connect and coordinate his work with theirs. If any part of the Contractor\x92s work depends upon proper execution or result upon the work of another Contractor(s), the Contractor shall inspect and promptly report to the Engineer any defects in such works that render it unsuitable for such proper execution and results. The Contractor's failure so-to inspect and report shall constitute an acceptance of the other Contractor's work as fit and proper for the reception of his work, except as to defects which may develop in the other Contractor's work after the execution of his work."
- source_sentence: Does the contract document contain a indemnification clause provision?
sentences:
- >-
The partners of the firm to which the Letter of Acceptance (LOA) is
issued, shall be jointly and severally liable to the Railway for
execution of the contract
- >-
Contractor awarded the work shall submit a detailed program of work
indicating the time schedule
- >-
All notices, communications, reference and complaints made by the
Railway or the Engineer or the Engineer's Representative or the
Contractor inter-se concerning the works shall be in writing or e-mail
on registered e mail IDs and no notice, communication, reference or
complaint not in writing or through e-mail, shall be recognized.
- source_sentence: Force Majeure
sentences:
- >-
These Regulations for Tenders and Contracts shall be read in conjunction
with the Standard General Conditions of Contract which are referred to
herein and shall be subject to modifications additions or suppression by
Special Conditions of Contract and/or Special Specifications, if any,
annexed to the Tender Forms.
- Act of God
- >-
Instructions: The Engineer shall direct the order in which the several
parts of the works shall be executed
- source_sentence: Interpretation of Standard General Conditions of contract
sentences:
- >-
The Contractor shall at his own expense provide himself with sheds,
storehouses and yards in such situations and in such numbers as in the
opinion of the Engineer is requisite for carrying on the works and the
Contractor
- >-
What are the additional documents that have to be read along with the
Standard General Conditions of Contract?
- >-
the necessity arises for the execution of such items of works that the
accepted Schedule of Rates does not include rate or rates for the extra
work involved.
SentenceTransformer based on BAAI/bge-large-en
This is a sentence-transformers model finetuned from BAAI/bge-large-en. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-large-en
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 tokens
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Ananthu357/Ananthus-BAAI-for-contracts7.0")
# Run inference
sentences = [
'Interpretation of Standard General Conditions of contract',
'What are the additional documents that have to be read along with the Standard General Conditions of Contract?',
'The Contractor shall at his own expense provide himself with sheds, storehouses and yards in such situations and in such numbers as in the opinion of the Engineer is requisite for carrying on the works and the Contractor',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Training Details
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy
: stepsper_device_train_batch_size
: 16per_device_eval_batch_size
: 16num_train_epochs
: 15warmup_ratio
: 0.1fp16
: Truebatch_sampler
: no_duplicates
All Hyperparameters
Click to expand
overwrite_output_dir
: Falsedo_predict
: Falseeval_strategy
: stepsprediction_loss_only
: Trueper_device_train_batch_size
: 16per_device_eval_batch_size
: 16per_gpu_train_batch_size
: Noneper_gpu_eval_batch_size
: Nonegradient_accumulation_steps
: 1eval_accumulation_steps
: Nonelearning_rate
: 5e-05weight_decay
: 0.0adam_beta1
: 0.9adam_beta2
: 0.999adam_epsilon
: 1e-08max_grad_norm
: 1.0num_train_epochs
: 15max_steps
: -1lr_scheduler_type
: linearlr_scheduler_kwargs
: {}warmup_ratio
: 0.1warmup_steps
: 0log_level
: passivelog_level_replica
: warninglog_on_each_node
: Truelogging_nan_inf_filter
: Truesave_safetensors
: Truesave_on_each_node
: Falsesave_only_model
: Falserestore_callback_states_from_checkpoint
: Falseno_cuda
: Falseuse_cpu
: Falseuse_mps_device
: Falseseed
: 42data_seed
: Nonejit_mode_eval
: Falseuse_ipex
: Falsebf16
: Falsefp16
: Truefp16_opt_level
: O1half_precision_backend
: autobf16_full_eval
: Falsefp16_full_eval
: Falsetf32
: Nonelocal_rank
: 0ddp_backend
: Nonetpu_num_cores
: Nonetpu_metrics_debug
: Falsedebug
: []dataloader_drop_last
: Falsedataloader_num_workers
: 0dataloader_prefetch_factor
: Nonepast_index
: -1disable_tqdm
: Falseremove_unused_columns
: Truelabel_names
: Noneload_best_model_at_end
: Falseignore_data_skip
: Falsefsdp
: []fsdp_min_num_params
: 0fsdp_config
: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap
: Noneaccelerator_config
: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed
: Nonelabel_smoothing_factor
: 0.0optim
: adamw_torchoptim_args
: Noneadafactor
: Falsegroup_by_length
: Falselength_column_name
: lengthddp_find_unused_parameters
: Noneddp_bucket_cap_mb
: Noneddp_broadcast_buffers
: Falsedataloader_pin_memory
: Truedataloader_persistent_workers
: Falseskip_memory_metrics
: Trueuse_legacy_prediction_loop
: Falsepush_to_hub
: Falseresume_from_checkpoint
: Nonehub_model_id
: Nonehub_strategy
: every_savehub_private_repo
: Falsehub_always_push
: Falsegradient_checkpointing
: Falsegradient_checkpointing_kwargs
: Noneinclude_inputs_for_metrics
: Falseeval_do_concat_batches
: Truefp16_backend
: autopush_to_hub_model_id
: Nonepush_to_hub_organization
: Nonemp_parameters
:auto_find_batch_size
: Falsefull_determinism
: Falsetorchdynamo
: Noneray_scope
: lastddp_timeout
: 1800torch_compile
: Falsetorch_compile_backend
: Nonetorch_compile_mode
: Nonedispatch_batches
: Nonesplit_batches
: Noneinclude_tokens_per_second
: Falseinclude_num_input_tokens_seen
: Falseneftune_noise_alpha
: Noneoptim_target_modules
: Nonebatch_eval_metrics
: Falseeval_on_start
: Falsebatch_sampler
: no_duplicatesmulti_dataset_batch_sampler
: proportional
Training Logs
Epoch | Step | Training Loss | loss |
---|---|---|---|
2.7778 | 100 | 0.0571 | 0.0611 |
5.5556 | 200 | 0.0073 | 0.0604 |
8.3333 | 300 | 0.0031 | 0.0578 |
11.1111 | 400 | 0.0019 | 0.0589 |
13.8889 | 500 | 0.0013 | 0.0587 |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}