metadata
base_model: sentence-transformers/all-mpnet-base-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2160
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: Are there any special events for kids? (variation 72)
    sentences:
      - No, pets are not allowed.
      - >-
        Yes, there are special events for kids like the Love-themed Movie Night
        on February 17 and Sunday Family Picnic on March 18.
      - >-
        The mall's address is Miyapur Main Rd, ICRISAT Colony, Madeenaguda,
        Hyderabad, Telangana 500050
  - source_sentence: Who built the chatbot? (variation 16)
    sentences:
      - >-
        Most stores accept cash, credit cards, debit cards, and UPI payments.
        Individual stores may have additional payment options.
      - >-
        The chatbot was built by KreativeChat. Their contact information is
        [email protected].
      - >-
        Yes, there is a Valentine's Day Dinner event on February 14, 2024, from
        7:00 PM to 10:00 PM at the Rooftop Restaurant.
  - source_sentence: Where can I find details about the Weekend Jazz Brunch? (variation 100)
    sentences:
      - >-
        Our mall chatbot is your primary source for information and assistance.
        For specific inquiries or to meet with mall management, please visit the
        6th-floor mall management front desk.
      - >-
        The Weekend Jazz Brunch takes place at the Jazz Cafe on February 18,
        2024, from 11:00 AM to 2:00 PM.
      - >-
        Washrooms are conveniently located on each floor. Ask our chatbot for a
        floor plan with marked washrooms.
  - source_sentence: Is there a Lost and Found section in the mall? (variation 1)
    sentences:
      - No, charging points are not available in the mall.
      - >-
        Yes, there is a Valentine's Day Dinner event on February 14, 2024, from
        7:00 PM to 10:00 PM at the Rooftop Restaurant.
      - >-
        Yes, there is. Please fill out this Google Form:
        [https://forms.gle/7R9rW1xamhktqBXh9]
  - source_sentence: Where are the washrooms located? (variation 95)
    sentences:
      - >-
        The chatbot was built by KreativeChat. Their contact information is
        [email protected].
      - >-
        No, there are no information desks or customer desks. For inquiries,
        please leave a message or ask the chatbot. The relevant person will
        respond accordingly.
      - >-
        Washrooms are conveniently located on each floor. Ask our chatbot for a
        floor plan with marked washrooms.

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2 on a dataset named train. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • train

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
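
The printed pipeline ends with mean pooling over token embeddings (pooling_mode_mean_tokens: True) followed by Normalize(). As a rough illustration of what those two steps compute, here is a minimal plain-Python sketch. It uses made-up 3-dimensional token vectors rather than the model's 768-dimensional ones, and the function names mean_pool and l2_normalize are hypothetical helpers, not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [t / max(count, 1) for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input (illustrative values): three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 0.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final vectors are unit-length, the cosine similarity between two embeddings equals their dot product.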

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("anomys/gsm-finetunned-v2")
# Run inference
sentences = [
    'Where are the washrooms located? (variation 95)',
    'Washrooms are conveniently located on each floor. Ask our chatbot for a floor plan with marked washrooms.',
    'The chatbot was built by KreativeChat. Their contact information is [email protected].',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
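
The widget examples in the metadata pair one question with several candidate responses, which suggests a retrieval-style use: embed the question, embed the candidates, and pick the closest one. The sketch below shows only the ranking step in plain Python. It assumes the vectors come from model.encode(...) and are already unit-length (the architecture ends with Normalize()); the toy 2-dimensional vectors and the helper name rank_by_similarity are illustrative, not part of the library.

```python
def rank_by_similarity(query_embedding, candidate_embeddings):
    """Return (index, score) pairs sorted from most to least similar.

    Assumes unit-length vectors, so the dot product equals cosine similarity.
    """
    scores = [
        sum(q * c for q, c in zip(query_embedding, candidate))
        for candidate in candidate_embeddings
    ]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy unit vectors standing in for real embeddings (illustrative values).
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_by_similarity(query, candidates)
best_index, best_score = ranking[0]
print(ranking)
```

In practice the index of the top-ranked candidate would be used to look up the stored response text.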

Training Details

Training Dataset

train

  • Dataset: train
  • Size: 2,160 training samples
  • Columns: question and response
  • Approximate statistics based on the first 1000 samples:
    |         | question | response |
    |---------|----------|----------|
    | type    | string   | string   |
    | details | min: 12 tokens, mean: 15.57 tokens, max: 26 tokens | min: 9 tokens, mean: 33.72 tokens, max: 82 tokens |
  • Samples:
    | question | response |
    |----------|----------|
    | Is there public WiFi available in the mall? (variation 4) | Sorry, no WiFi is available for the public. |
    | What are the special promotions available? (variation 65) | Special promotions include up to 50% off at Reliance Trends, 20% off new arrivals at Style Union, and more. |
    | What are the mall hours of operation? (variation 47) | GSM Mall & Multiplex is open from 11:00 AM to 10:00 PM on weekdays and weekends. Individual store timings may vary. |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
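
To make the two loss parameters concrete: MultipleNegativesRankingLoss treats each (question, response) pair in a batch as a positive and every other response in the same batch as a negative. A minimal plain-Python sketch of that idea follows, assuming the usual formulation (cosine similarities multiplied by scale, then cross-entropy with the matching response as the target). It is an illustration with toy 2-dimensional vectors, not the library's implementation.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def multiple_negatives_ranking_loss(questions, responses, scale=20.0):
    """Mean cross-entropy where response i is the correct class for question i."""
    losses = []
    for i, question in enumerate(questions):
        logits = [scale * cos_sim(question, response) for response in responses]
        # Numerically stable log-sum-exp over all responses in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch (illustrative values): each question matches its own response exactly.
aligned = multiple_negatives_ranking_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
)
# Same batch with the responses swapped, so every positive pair is orthogonal.
swapped = multiple_negatives_ranking_loss(
    [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]
)
print(aligned, swapped)
```

With scale set to 20.0, a batch whose pairs are already well separated yields a loss very close to zero, while a badly mismatched batch yields a loss near the scale value.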
    

Evaluation Dataset

train

  • Dataset: train
  • Size: 540 evaluation samples
  • Columns: question and response
  • Approximate statistics based on the first 1000 samples:
    |         | question | response |
    |---------|----------|----------|
    | type    | string   | string   |
    | details | min: 12 tokens, mean: 15.45 tokens, max: 26 tokens | min: 9 tokens, mean: 33.56 tokens, max: 82 tokens |
  • Samples:
    | question | response |
    |----------|----------|
    | What offers are available at the food court? (variation 12) | Offers at the food court include Buy One Get One Half Off Shakes at Thick Shake Factory, Taco Tuesday Special at California Burrito, and more. |
    | What is the date and time for the Spring Fashion Show? (variation 14) | The Spring Fashion Show is on March 24, 2024, from 6:00 PM to 8:00 PM at the Mall Runway. |
    | Where is GSM Mall & Multiplex located? (variation 30) | The mall's address is Miyapur Main Rd, ICRISAT Colony, Madeenaguda, Hyderabad, Telangana 500050 |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
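
The no_duplicates batch sampler matters for this kind of data: many question variations share the same response, and MultipleNegativesRankingLoss would treat a repeated response in the same batch as a negative for its own question. The sketch below is a simplified greedy version of the idea in plain Python, not the library's sampler. The helper name no_duplicate_batches, the toy pairs, and the choice to keep short final batches are assumptions made for illustration.

```python
import random

def no_duplicate_batches(pairs, batch_size, seed=42):
    """Greedily group (question, response) pairs so no text repeats within a batch."""
    remaining = list(pairs)
    random.Random(seed).shuffle(remaining)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for question, response in remaining:
            fits = len(batch) < batch_size
            if fits and question not in seen and response not in seen:
                batch.append((question, response))
                seen.update((question, response))
            else:
                # Would duplicate a text or overflow the batch: try again later.
                deferred.append((question, response))
        batches.append(batch)
        remaining = deferred
    return batches

# Toy pairs (illustrative): two question variations per response.
pairs = [
    ("Are pets allowed? (variation 1)", "No, pets are not allowed."),
    ("Are pets allowed? (variation 2)", "No, pets are not allowed."),
    ("Is there WiFi? (variation 1)", "Sorry, no WiFi is available for the public."),
    ("Is there WiFi? (variation 2)", "Sorry, no WiFi is available for the public."),
]

batches = no_duplicate_batches(pairs, batch_size=2)
for batch in batches:
    print([response for _, response in batch])
```

Each printed batch contains two different responses, so no question is ever contrasted against a copy of its own answer.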

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
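
Several of these values interact: learning_rate 5e-05, lr_scheduler_type linear, warmup_ratio 0.1, and one epoch over 2,160 samples at batch size 16. The sketch below works through the resulting schedule in plain Python. It assumes a single device (so 135 optimizer steps in total), that the warmup step count is the ratio times the total steps rounded up, and the common linear warmup then linear decay shape. The function name linear_schedule_lr is hypothetical.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    decay_span = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / decay_span)

# 2,160 training samples, batch size 16, 1 epoch, no gradient accumulation.
total_steps = math.ceil(2160 / 16) * 1
warmup_steps = math.ceil(total_steps * 0.1)

for step in (0, warmup_steps, 50, 100, total_steps):
    print(step, linear_schedule_lr(step, total_steps))
```

Under these assumptions the rate climbs from zero to 5e-05 over the first 14 steps and then falls linearly back to zero at step 135.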

Training Logs

| Epoch  | Step | Training Loss | train loss |
|--------|------|---------------|------------|
| 0.3704 | 50   | 0.0642        | 0.0000     |
| 0.7407 | 100  | 0.0           | 0.0000     |
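
The fractional epoch values in the log can be checked against the dataset size and batch size reported above. A small worked calculation, assuming a single device and no gradient accumulation:

```python
import math

train_samples = 2160
batch_size = 16

# One epoch is one pass over all batches.
steps_per_epoch = math.ceil(train_samples / batch_size)

# Epoch fraction reached at each logged step.
epoch_at_step = {step: round(step / steps_per_epoch, 4) for step in (50, 100)}
print(steps_per_epoch, epoch_at_step)
```

The computed fractions match the Epoch column, which is consistent with 135 steps per epoch and with logging every 50 steps.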

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}