language: []
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:77376
  - loss:CosineSimilarityLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
datasets: []
widget:
  - source_sentence: >-
      He has published several books on nutrition, trace metals but not
      biochemistry imbalances.
    sentences:
      - >-
        This in turn can help in effective communication between healthcare
        providers and their patients.
      - >-
        He has written several books on nutrition, trace metals, and
        biochemistry imbalances.
      - One of the most boring movies I have ever seen.
  - source_sentence: She was denied the 2011 NSK Neustadt Prize for Children's Literature.
    sentences:
      - >-
        She was the recipient of the 2011 NSK Neustadt Prize for Children's
        Literature.
      - The ancient woodland at Dickshills is also located here.
      - >-
        An element (such as a tree) that contributes to evapotranspiration can
        be called an evapotranspirator.
  - source_sentence: >-
      Viking, after the resemblance the pitchers bear to the prow of a Viking
      ship.
    sentences:
      - >-
        Viking, after the striking difference the pitchers bear to the prow of a
        Viking ship.
      - Honshu is formed from the island arcs.
      - >-
        For instance, even alcohol consumption by a pregnant woman is unable to
        lead to fetal alcohol syndrome.
  - source_sentence: Logging has not been undertake near the headwaters of the creek.
    sentences:
      - >-
        Then I had to continue pairing it periodically since it somehow kept
        dropping.
      - That's fair, Nance.
      - Logging has been done near the headwaters of the creek.
  - source_sentence: He published a history of Cornwall, New York in 1873.
    sentences:
      - He failed to publish a history of Cornwall, New York in 1873.
      - Salafis assert that reliance on taqlid has led to Islam 's decline.
      - >-
        Lot of holes in the plot: there's nothing about how he became the
        emperor; nothing about where he spend 20 years between his childhood and
        mature age.
pipeline_tag: sentence-similarity

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
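
These components can be confirmed at runtime from the loaded model itself; a minimal sketch, assuming the repository name used in the Usage section below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LeoChiuu/all-MiniLM-L6-v2-negations")
print(model)                                      # Transformer -> Pooling -> Normalize
print(model.max_seq_length)                       # 256
print(model.get_sentence_embedding_dimension())   # 384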

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LeoChiuu/all-MiniLM-L6-v2-negations")
# Run inference
sentences = [
    'He published a history of Cornwall, New York in 1873.',
    'He failed to publish a history of Cornwall, New York in 1873.',
    "Salafis assert that reliance on taqlid has led to Islam 's decline.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
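
Because the similarity function is cosine similarity, the same embeddings can also be used for simple semantic search. The sketch below is illustrative only; the query string and ranking logic are not part of the original card:

# Rank the candidate sentences above against a query (semantic-search sketch)
query_embedding = model.encode("He wrote a history of Cornwall, New York.")
candidate_embeddings = model.encode(sentences)
scores = model.similarity(query_embedding, candidate_embeddings)  # cosine similarities, shape [1, 3]
best_index = int(scores.argmax())
print(sentences[best_index])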

Training Details

Training Dataset

Unnamed Dataset

  • Size: 77,376 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 6 tokens, mean: 16.2 tokens, max: 57 tokens
    • sentence_1: string; min: 5 tokens, mean: 16.32 tokens, max: 56 tokens
    • label: int; 0: ~53.20%, 1: ~46.80%
  • Samples:
    • sentence_0: The situation in Yemen was already much better than it was in Bahrain.
      sentence_1: The situation in Yemen was not much better than Bahrain.
      label: 0
    • sentence_0: She was a member of the Gamma Theta Upsilon honour society of geography.
      sentence_1: She was denied membership of the Gamma Theta Upsilon honour society of mathematics.
      label: 0
    • sentence_0: Which aren't small and not worth the price.
      sentence_1: Which are small and not worth the price.
      label: 0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
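
As a rough sketch of how this loss is set up with the Sentence Transformers API: CosineSimilarityLoss computes the cosine similarity between the two sentence embeddings and regresses it against the label, using torch.nn.MSELoss as its default loss_fct (matching the parameters above).

from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# Cosine similarity between the embeddings of sentence_0 and sentence_1
# is regressed against the label column with MSELoss (the default loss_fct).
train_loss = losses.CosineSimilarityLoss(model)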
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin
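
A minimal training sketch that reproduces these non-default values with the Sentence Transformers v3 trainer. The output directory and the single example pair are placeholders, and model / train_loss are the objects from the loss sketch above:

from datasets import Dataset
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

# Placeholder dataset in the card's (sentence_0, sentence_1, label) format.
train_dataset = Dataset.from_dict({
    "sentence_0": ["He published a history of Cornwall, New York in 1873."],
    "sentence_1": ["He failed to publish a history of Cornwall, New York in 1873."],
    "label": [0.0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-L6-v2-negations",  # placeholder output directory
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=train_loss,
)
trainer.train()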

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.1034 500 0.3382
0.2068 1000 0.2112
0.3102 1500 0.1649
0.4136 2000 0.1454
0.5170 2500 0.1244
0.6203 3000 0.1081
0.7237 3500 0.0962
0.8271 4000 0.0924
0.9305 4500 0.0852
1.0339 5000 0.0812
1.1373 5500 0.0833
1.2407 6000 0.0736
1.3441 6500 0.0756
1.4475 7000 0.0665
1.5509 7500 0.0661
1.6543 8000 0.0625
1.7577 8500 0.0621
1.8610 9000 0.0593
1.9644 9500 0.054
2.0678 10000 0.0569
2.1712 10500 0.0566
2.2746 11000 0.0502
2.3780 11500 0.0516
2.4814 12000 0.0455
2.5848 12500 0.0454
2.6882 13000 0.0424
2.7916 13500 0.044
2.8950 14000 0.0376
2.9983 14500 0.0386
3.1017 15000 0.0392
3.2051 15500 0.0344
3.3085 16000 0.0348
3.4119 16500 0.0343
3.5153 17000 0.0322
3.6187 17500 0.0324
3.7221 18000 0.0278
3.8255 18500 0.0294
3.9289 19000 0.0292
4.0323 19500 0.0276
4.1356 20000 0.0285
4.2390 20500 0.026
4.3424 21000 0.0271
4.4458 21500 0.0248
4.5492 22000 0.0245
4.6526 22500 0.0253
4.7560 23000 0.022
4.8594 23500 0.0219
4.9628 24000 0.0207
5.0662 24500 0.0212
5.1696 25000 0.0218
5.2730 25500 0.0192
5.3763 26000 0.0198
5.4797 26500 0.0183
5.5831 27000 0.02
5.6865 27500 0.0176
5.7899 28000 0.0184
5.8933 28500 0.0157
5.9967 29000 0.0175
6.1001 29500 0.0175
6.2035 30000 0.0163
6.3069 30500 0.0173
6.4103 31000 0.0165
6.5136 31500 0.0152
6.6170 32000 0.0155
6.7204 32500 0.0132
6.8238 33000 0.0147
6.9272 33500 0.0145
7.0306 34000 0.014
7.1340 34500 0.0147
7.2374 35000 0.0126
7.3408 35500 0.0141
7.4442 36000 0.0127
7.5476 36500 0.0132
7.6510 37000 0.0125
7.7543 37500 0.0111
7.8577 38000 0.011
7.9611 38500 0.0125
8.0645 39000 0.0128
8.1679 39500 0.013
8.2713 40000 0.0115
8.3747 40500 0.0111
8.4781 41000 0.0108
8.5815 41500 0.012
8.6849 42000 0.0108
8.7883 42500 0.0105
8.8916 43000 0.0092
8.9950 43500 0.0115
9.0984 44000 0.0112
9.2018 44500 0.0096
9.3052 45000 0.0106
9.4086 45500 0.011
9.5120 46000 0.01
9.6154 46500 0.011
9.7188 47000 0.0097
9.8222 47500 0.0096
9.9256 48000 0.0102

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 3.0.1
  • Transformers: 4.40.2
  • PyTorch: 2.3.0+cpu
  • Accelerate: 0.32.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}