SentenceTransformer based on Snowflake/snowflake-arctic-embed-l-v2.0

This is a sentence-transformers model fine-tuned from Snowflake/snowflake-arctic-embed-l-v2.0 on the all-nli dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-l-v2.0
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: all-nli
  • Language: en

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
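
The architecture above takes the CLS token's hidden state as the sentence embedding (pooling_mode_cls_token is True) and then L2-normalizes it. The following is a minimal pure-Python sketch of those two steps on made-up numbers; the helper names `cls_pool` and `l2_normalize` are hypothetical and are not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector (the CLS position) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy 3-token sequence with 4-dimensional hidden states (made-up numbers;
# the real model produces 1024-dimensional states).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # CLS token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 0.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the output has unit length, the cosine similarity between two such embeddings reduces to their dot product.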

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("JatinkInnovision/snowflake-arctic-embed-l-v2.0_all-nli")
# Run inference
sentences = [
    'A middle-aged man works under the engine of a train on rail tracks.',
    'A guy is working on a train.',
    'A guy is driving to work.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
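
A pairwise similarity matrix like the one above is often used to find each sentence's closest match. The sketch below does this in plain Python on a hypothetical 3x3 matrix; the values are illustrative and are not output from this model, and `nearest_neighbors` is a made-up helper name.

```python
def nearest_neighbors(similarity_matrix):
    """For each row, return the index of the most similar *other* item."""
    result = []
    for i, row in enumerate(similarity_matrix):
        # Skip the diagonal entry, which is the item's similarity to itself.
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        result.append(max(candidates)[1])
    return result

# Hypothetical cosine-similarity matrix (illustrative values only).
toy_similarities = [
    [1.00, 0.72, 0.31],
    [0.72, 1.00, 0.40],
    [0.31, 0.40, 1.00],
]
print(nearest_neighbors(toy_similarities))  # [1, 0, 1]
```

With real model output, the tensor returned by model.similarity can be converted with `.tolist()` before being passed to a helper like this.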

Evaluation

Metrics

Triplet

  • cosine_accuracy: 0.9558
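
As a rough illustration of what a triplet cosine accuracy measures: it is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below computes this on made-up 2-dimensional vectors; it is an assumption-level illustration, not the evaluator's actual code, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)

# Made-up (anchor, positive, negative) embeddings.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # negative is closer
    ([1.0, 0.0], [1.0, 0.2], [-1.0, 0.0]),  # positive is closer
]
print(triplet_cosine_accuracy(toy_triplets))  # 0.75
```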

Training Details

Training Dataset

all-nli

  • Dataset: all-nli at d482672
  • Size: 557,850 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 7 tokens, mean 10.9 tokens, max 52 tokens
    • positive (string): min 6 tokens, mean 13.62 tokens, max 42 tokens
    • negative (string): min 5 tokens, mean 14.76 tokens, max 55 tokens
  • Samples:
    • anchor: A person on a horse jumps over a broken down airplane.
      positive: A person is outdoors, on a horse.
      negative: A person is at a diner, ordering an omelette.
    • anchor: Children smiling and waving at camera
      positive: There are children present
      negative: The kids are frowning
    • anchor: A boy is jumping on skateboard in the middle of a red bridge.
      positive: The boy does a skateboarding trick.
      negative: The boy skates down the sidewalk.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
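
The parameters above (scale 20.0, cosine similarity) can be read as a softmax cross-entropy over in-batch candidates: each anchor should score its own positive higher than every other positive and every negative in the batch. The following is a minimal pure-Python sketch of that idea under this assumption, not the library's implementation; `mnr_loss` and the toy vectors are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives plus explicit negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidate scores.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Made-up 2-dimensional embeddings for a batch of two triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned = mnr_loss(anchors, positives, negatives)
swapped = mnr_loss(anchors, positives[::-1], negatives)  # positives paired with the wrong anchors
print(aligned < swapped)  # True
```

In this sketch the scale acts as an inverse temperature: larger values sharpen the softmax, so a wrongly ranked candidate is penalized more heavily.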
    

Evaluation Dataset

all-nli

  • Dataset: all-nli at d482672
  • Size: 6,584 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 6 tokens, mean 20.31 tokens, max 83 tokens
    • positive (string): min 5 tokens, mean 10.71 tokens, max 35 tokens
    • negative (string): min 5 tokens, mean 11.39 tokens, max 32 tokens
  • Samples:
    • anchor: Two women are embracing while holding to go packages.
      positive: Two woman are holding packages.
      negative: The men are fighting outside a deli.
    • anchor: Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.
      positive: Two kids in numbered jerseys wash their hands.
      negative: Two kids in jackets walk to school.
    • anchor: A man selling donuts to a customer during a world exhibition event held in the city of Angeles
      positive: A man selling donuts to a customer.
      negative: A woman drinks her coffee in a small cafe.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 50
  • per_device_eval_batch_size: 50
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
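
These settings imply the length of the run. As a rough check, assuming a single device and no gradient accumulation (the device count is not stated in this card), the number of optimizer steps and warmup steps can be estimated from the dataset size of 557,850 samples:

```python
import math

train_samples = 557_850   # training set size reported in this card
batch_size = 50           # per_device_train_batch_size
epochs = 1                # num_train_epochs
warmup_ratio = 0.1

# Assumption: one device and gradient_accumulation_steps = 1,
# so one batch corresponds to one optimizer step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(total_steps, warmup_steps)  # 11157 1116
```

The estimate of 11,157 steps agrees with the final step recorded in the training logs, which supports the single-device assumption.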

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 50
  • per_device_eval_batch_size: 50
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
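
With lr_scheduler_type set to linear, learning_rate 5e-05, and warmup_ratio 0.1, the learning rate would ramp up linearly during warmup and then decay linearly to zero. The sketch below illustrates that shape; the step counts are assumptions (11,157 total steps, taken from the final step in the training logs, and 1,116 warmup steps, which is 10% of that rounded up), and `linear_schedule_lr` is a hypothetical helper, not a library function.

```python
def linear_schedule_lr(step, total_steps, warmup_steps, peak_lr=5e-05):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

TOTAL_STEPS = 11157   # assumed: final step in the training logs
WARMUP_STEPS = 1116   # assumed: ceil(0.1 * 11157)

start = linear_schedule_lr(0, TOTAL_STEPS, WARMUP_STEPS)
peak = linear_schedule_lr(WARMUP_STEPS, TOTAL_STEPS, WARMUP_STEPS)
end = linear_schedule_lr(TOTAL_STEPS, TOTAL_STEPS, WARMUP_STEPS)
print(start, peak, end)  # 0.0 5e-05 0.0
```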

Training Logs

Epoch Step Training Loss Validation Loss all-nli-test_cosine_accuracy
0.0090 100 1.8838 0.6502 -
0.0179 200 1.2991 0.6177 -
0.0269 300 1.2721 0.6417 -
0.0359 400 1.2265 0.7053 -
0.0448 500 1.0111 0.7147 -
0.0538 600 1.0491 0.7457 -
0.0627 700 1.0186 0.7922 -
0.0717 800 1.135 0.8940 -
0.0807 900 1.0747 0.7007 -
0.0896 1000 0.9373 0.7298 -
0.0986 1100 0.9572 0.6809 -
0.1076 1200 1.1316 0.7260 -
0.1165 1300 0.9188 0.7085 -
0.1255 1400 0.9554 0.6876 -
0.1344 1500 0.9494 0.7492 -
0.1434 1600 0.811 0.7234 -
0.1524 1700 0.7766 0.6744 -
0.1613 1800 0.9317 0.7178 -
0.1703 1900 0.9148 0.6960 -
0.1793 2000 0.8643 0.6642 -
0.1882 2100 0.7604 0.6425 -
0.1972 2200 0.776 0.6347 -
0.2061 2300 0.8286 0.6581 -
0.2151 2400 0.8946 0.5866 -
0.2241 2500 0.8507 0.6845 -
0.2330 2600 0.7917 0.6091 -
0.2420 2700 0.8192 0.7073 -
0.2510 2800 0.8818 0.6584 -
0.2599 2900 0.8261 0.6112 -
0.2689 3000 0.8017 0.6883 -
0.2779 3100 0.8147 0.6450 -
0.2868 3200 0.8297 0.6086 -
0.2958 3300 0.7516 0.5857 -
0.3047 3400 0.8628 0.6061 -
0.3137 3500 0.7758 0.5751 -
0.3227 3600 0.7773 0.6022 -
0.3316 3700 0.7559 0.5446 -
0.3406 3800 0.796 0.5842 -
0.3496 3900 0.8295 0.5822 -
0.3585 4000 0.7292 0.5821 -
0.3675 4100 0.7475 0.6358 -
0.3764 4200 0.7916 0.5688 -
0.3854 4300 0.7214 0.5653 -
0.3944 4400 0.704 0.5564 -
0.4033 4500 0.7817 0.5876 -
0.4123 4600 0.7549 0.5358 -
0.4213 4700 0.7206 0.5785 -
0.4302 4800 0.7462 0.5568 -
0.4392 4900 0.665 0.5765 -
0.4481 5000 0.7743 0.5303 -
0.4571 5100 0.7055 0.5733 -
0.4661 5200 0.7004 0.6280 -
0.4750 5300 0.7021 0.5444 -
0.4840 5400 0.6858 0.5787 -
0.4930 5500 0.7007 0.6124 -
0.5019 5600 0.6722 0.5705 -
0.5109 5700 0.7124 0.5440 -
0.5199 5800 0.6657 0.5262 -
0.5288 5900 0.6784 0.5400 -
0.5378 6000 0.6644 0.5093 -
0.5467 6100 0.7195 0.5453 -
0.5557 6200 0.6958 0.5216 -
0.5647 6300 0.7202 0.5250 -
0.5736 6400 0.6921 0.5089 -
0.5826 6500 0.6926 0.5207 -
0.5916 6600 0.714 0.5084 -
0.6005 6700 0.6605 0.4943 -
0.6095 6800 0.7222 0.5058 -
0.6184 6900 0.7171 0.4950 -
0.6274 7000 0.6344 0.5110 -
0.6364 7100 0.7057 0.5197 -
0.6453 7200 0.6895 0.5096 -
0.6543 7300 0.7226 0.4819 -
0.6633 7400 0.6725 0.4780 -
0.6722 7500 0.7469 0.5145 -
0.6812 7600 0.7016 0.4969 -
0.6901 7700 0.6655 0.4965 -
0.6991 7800 0.7281 0.4913 -
0.7081 7900 0.6748 0.5121 -
0.7170 8000 0.6505 0.5207 -
0.7260 8100 0.6594 0.4823 -
0.7350 8200 0.7042 0.4903 -
0.7439 8300 0.6995 0.4630 -
0.7529 8400 0.634 0.4217 -
0.7619 8500 0.3772 0.3684 -
0.7708 8600 0.3416 0.3585 -
0.7798 8700 0.3113 0.3471 -
0.7887 8800 0.2793 0.3379 -
0.7977 8900 0.2577 0.3349 -
0.8067 9000 0.249 0.3320 -
0.8156 9100 0.2191 0.3290 -
0.8246 9200 0.2492 0.3255 -
0.8336 9300 0.2464 0.3258 -
0.8425 9400 0.2288 0.3247 -
0.8515 9500 0.2132 0.3248 -
0.8604 9600 0.2173 0.3259 -
0.8694 9700 0.2008 0.3223 -
0.8784 9800 0.2016 0.3219 -
0.8873 9900 0.1962 0.3195 -
0.8963 10000 0.1952 0.3185 -
0.9053 10100 0.1959 0.3158 -
0.9142 10200 0.2002 0.3138 -
0.9232 10300 0.1882 0.3150 -
0.9322 10400 0.1856 0.3124 -
0.9411 10500 0.1971 0.3143 -
0.9501 10600 0.1918 0.3137 -
0.9590 10700 0.1825 0.3147 -
0.9680 10800 0.1762 0.3155 -
0.9770 10900 0.1778 0.3139 -
0.9859 11000 0.1659 0.3138 -
0.9949 11100 0.1848 0.3131 -
1.0 11157 - - 0.9558

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 568M params (Safetensors, tensor type F32)