---
base_model: Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka
datasets:
  - Omartificial-Intelligence-Space/Arabic-stsb
language:
  - ar
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:947818
  - loss:SoftmaxLoss
  - loss:CosineSimilarityLoss
widget:
  - source_sentence: امرأة تكتب شيئاً
    sentences:
      - مراهق يتحدث إلى فتاة عبر كاميرا الإنترنت
      - امرأة تقطع البصل الأخضر.
      - مجموعة من كبار السن يتظاهرون حول طاولة الطعام.
  - source_sentence: تتشكل النجوم في مناطق تكوين النجوم، والتي تنشأ نفسها من السحب الجزيئية.
    sentences:
      - لاعب كرة السلة على وشك تسجيل نقاط لفريقه.
      - المقال التالي مأخوذ من نسختي من "أطلس البطريق الجديد للتاريخ الوسطى"
      - قد يكون من الممكن أن يوجد نظام شمسي مثل نظامنا خارج المجرة
  - source_sentence: >-
      تحت السماء الزرقاء مع الغيوم البيضاء، يصل طفل لمس مروحة طائرة واقفة على
      حقل من العشب.
    sentences:
      - امرأة تحمل كأساً
      - طفل يحاول لمس مروحة طائرة
      - اثنان من عازبين عن الشرب يستعدون للعشاء
  - source_sentence: رجل في منتصف العمر يحلق لحيته في غرفة ذات جدران بيضاء والتي لا تبدو كحمام
    sentences:
      - فتى يخطط اسمه على مكتبه
      - رجل ينام
      - المرأة وحدها وهي نائمة في غرفة نومها
  - source_sentence: الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.
    sentences:
      - شخص طويل القامة
      - المرأة تنظر من النافذة.
      - لقد مات الكلب
model-index:
  - name: >-
      SentenceTransformer based on
      Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.8383581637565862
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8389373148442993
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8247947413553784
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8329104956151686
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8249963167509389
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8336591462431132
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8071855574990106
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8097706351791779
            name: Spearman Dot
          - type: pearson_max
            value: 0.8383581637565862
            name: Pearson Max
          - type: spearman_max
            value: 0.8389373148442993
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.7907507025363603
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7893080660475024
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7923222026451455
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7946838339078852
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7903690631114766
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.793426368251902
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7404285389360442
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7353599094850335
            name: Spearman Dot
          - type: pearson_max
            value: 0.7923222026451455
            name: Pearson Max
          - type: spearman_max
            value: 0.7946838339078852
            name: Spearman Max

---

# SentenceTransformer based on Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka

This is a sentence-transformers model finetuned from Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka on the all-nli and sts datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

  • Model Type: Sentence Transformer
  • Base model: Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Datasets: all-nli and sts (Omartificial-Intelligence-Space/Arabic-stsb)
  • Language: Arabic (ar)

### Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
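
Since the Pooling module averages token embeddings (`pooling_mode_mean_tokens`), the same sentence vectors can also be reproduced with the plain `transformers` library. A minimal sketch, assuming the tokenizer and BERT weights load directly from this repository id:

```python
import torch
from transformers import AutoTokenizer, AutoModel

repo = "Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka-multi-task"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

sentences = ["امرأة تكتب شيئاً", "امرأة تقطع البصل الأخضر."]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: average token vectors, masking out padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # torch.Size([2, 768])
```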

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka-multi-task")
# Run inference
sentences = [
    'الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.',
    'لقد مات الكلب',
    'شخص طويل القامة',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
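
The same `encode`/`similarity` pair covers simple semantic search. A short illustrative sketch; the toy corpus and query below are invented for demonstration:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka-multi-task")

# Invented toy corpus and query, for illustration only.
corpus = ["رجل يعزف على الناي.", "طفل يركب حصاناً.", "المرأة تنظر من النافذة."]
query = "شخص يعزف الموسيقى"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Rank corpus sentences by cosine similarity to the query (highest first).
scores = model.similarity(query_embedding, corpus_embeddings)[0]
for idx in scores.argsort(descending=True):
    i = int(idx)
    print(f"{scores[i]:.3f}  {corpus[i]}")
```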

## Evaluation

### Metrics

#### Semantic Similarity (sts-dev)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.8384 |
| spearman_cosine    | 0.8389 |
| pearson_manhattan  | 0.8248 |
| spearman_manhattan | 0.8329 |
| pearson_euclidean  | 0.825  |
| spearman_euclidean | 0.8337 |
| pearson_dot        | 0.8072 |
| spearman_dot       | 0.8098 |
| pearson_max        | 0.8384 |
| spearman_max       | 0.8389 |

#### Semantic Similarity (sts-test)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.7908 |
| spearman_cosine    | 0.7893 |
| pearson_manhattan  | 0.7923 |
| spearman_manhattan | 0.7947 |
| pearson_euclidean  | 0.7904 |
| spearman_euclidean | 0.7934 |
| pearson_dot        | 0.7404 |
| spearman_dot       | 0.7354 |
| pearson_max        | 0.7923 |
| spearman_max       | 0.7947 |
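
These metric names match the output of sentence-transformers' `EmbeddingSimilarityEvaluator`, run here on the dev and test splits. A sketch of re-running the test evaluation, assuming the split and column names of Omartificial-Intelligence-Space/Arabic-stsb match those used during training:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka-multi-task")

# Assumed split/column names; adjust to the dataset's actual schema.
test = load_dataset("Omartificial-Intelligence-Space/Arabic-stsb", split="test")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=test["sentence1"],
    sentences2=test["sentence2"],
    scores=test["score"],
    name="sts-test",
)
results = evaluator(model)  # dict of Pearson/Spearman correlations per similarity function
print(results)
```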

## Training Details

### Training Datasets

#### all-nli

  • Dataset: all-nli
  • Size: 942,069 training samples
  • Columns: premise, hypothesis, and label (0 = entailment, 1 = neutral, 2 = contradiction, as in the samples below)
  • Approximate statistics based on the first 1000 samples:
    |         | premise                                           | hypothesis                                       | label                              |
    |:--------|:--------------------------------------------------|:-------------------------------------------------|:-----------------------------------|
    | type    | string                                            | string                                           | int                                |
    | details | min: 5 tokens, mean: 14.09 tokens, max: 43 tokens | min: 4 tokens, mean: 8.28 tokens, max: 28 tokens | 0: ~33.40%, 1: ~33.30%, 2: ~33.30% |
  • Samples:
    | premise                           | hypothesis                     | label |
    |:----------------------------------|:-------------------------------|:------|
    | شخص على حصان يقفز فوق طائرة معطلة | شخص يقوم بتدريب حصانه للمنافسة | 1     |
    | شخص على حصان يقفز فوق طائرة معطلة | شخص في مطعم، يطلب عجة.         | 2     |
    | شخص على حصان يقفز فوق طائرة معطلة | شخص في الهواء الطلق، على حصان. | 0     |
  • Loss: SoftmaxLoss
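
`SoftmaxLoss` trains a classifier head over the concatenated embeddings [u, v, |u - v|] of each premise/hypothesis pair. A minimal construction sketch for the three NLI classes above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import SoftmaxLoss

model = SentenceTransformer("Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka")
# Linear classifier over [u, v, |u - v|] with 3 labels
# (0 = entailment, 1 = neutral, 2 = contradiction).
loss = SoftmaxLoss(
    model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)
```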

#### sts

  • Dataset: sts at f5a6f89
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    |         | sentence1                                        | sentence2                                        | score                          |
    |:--------|:-------------------------------------------------|:-------------------------------------------------|:-------------------------------|
    | type    | string                                           | string                                           | float                          |
    | details | min: 4 tokens, mean: 7.46 tokens, max: 22 tokens | min: 4 tokens, mean: 7.36 tokens, max: 18 tokens | min: 0.0, mean: 0.54, max: 1.0 |
  • Samples:
    | sentence1                         | sentence2                                  | score |
    |:----------------------------------|:-------------------------------------------|:------|
    | طائرة ستقلع                       | طائرة جوية ستقلع                           | 1.0   |
    | رجل يعزف على ناي كبير             | رجل يعزف على الناي.                        | 0.76  |
    | رجل ينشر الجبن الممزق على البيتزا | رجل ينشر الجبن الممزق على بيتزا غير مطبوخة | 0.76  |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
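
`CosineSimilarityLoss` pushes the cosine of the two sentence embeddings toward the gold similarity score using the `MSELoss` shown above; constructing it is a one-liner:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka")
# Regresses cos(u, v) onto the 0..1 gold similarity score with MSE.
loss = CosineSimilarityLoss(model)
```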
    

### Evaluation Datasets

#### all-nli

  • Dataset: all-nli
  • Size: 1,000 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    |         | premise                                          | hypothesis                                       | label                              |
    |:--------|:-------------------------------------------------|:-------------------------------------------------|:-----------------------------------|
    | type    | string                                           | string                                           | int                                |
    | details | min: 5 tokens, mean: 15.1 tokens, max: 48 tokens | min: 4 tokens, mean: 8.11 tokens, max: 21 tokens | 0: ~33.10%, 1: ~33.30%, 2: ~33.60% |
  • Samples:
    | premise                            | hypothesis                                                       | label |
    |:-----------------------------------|:-----------------------------------------------------------------|:------|
    | امرأتان يتعانقان بينما يحملان طرود | الأخوات يعانقون بعضهم لوداعاً بينما يحملون حزمة بعد تناول الغداء | 1     |
    | امرأتان يتعانقان بينما يحملان حزمة | إمرأتان يحملان حزمة                                              | 0     |
    | امرأتان يتعانقان بينما يحملان حزمة | الرجال يتشاجرون خارج مطعم                                        | 2     |
  • Loss: SoftmaxLoss

#### sts

  • Dataset: sts at f5a6f89
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    |         | sentence1                                         | sentence2                                         | score                          |
    |:--------|:--------------------------------------------------|:--------------------------------------------------|:-------------------------------|
    | type    | string                                            | string                                            | float                          |
    | details | min: 4 tokens, mean: 12.55 tokens, max: 42 tokens | min: 4 tokens, mean: 12.49 tokens, max: 54 tokens | min: 0.0, mean: 0.47, max: 1.0 |
  • Samples:
    | sentence1                | sentence2                 | score |
    |:-------------------------|:--------------------------|:------|
    | رجل يرتدي قبعة صلبة يرقص | رجل يرتدي قبعة صلبة يرقص. | 1.0   |
    | طفل صغير يركب حصاناً.    | طفل يركب حصاناً.          | 0.95  |
    | رجل يطعم فأراً لأفعى     | الرجل يطعم الفأر للثعبان. | 1.0   |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

### Training Hyperparameters

#### Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
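
Taken together, the datasets, losses, and non-default hyperparameters above describe a multi-task `SentenceTransformerTrainer` run. A condensed sketch; the all-nli dataset id is hypothetical and loading details are assumed:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss, SoftmaxLoss
from sentence_transformers.training_args import MultiDatasetBatchSamplers

model = SentenceTransformer("Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka")

# Hypothetical all-nli source; substitute the actual Arabic NLI dataset.
train_nli = load_dataset("path/to/arabic-all-nli", split="train")  # premise, hypothesis, label
train_sts = load_dataset("Omartificial-Intelligence-Space/Arabic-stsb", split="train")  # sentence1, sentence2, score

# One loss per training dataset, keyed by dataset name.
losses = {
    "all-nli": SoftmaxLoss(model, model.get_sentence_embedding_dimension(), num_labels=3),
    "sts": CosineSimilarityLoss(model),
}

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,
    # Cycle through the datasets batch by batch.
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset={"all-nli": train_nli, "sts": train_sts},
    loss=losses,
)
trainer.train()
```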

#### All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

### Training Logs

| Epoch  | Step | Training Loss | all-nli loss | sts loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|:-------|:-----|:--------------|:-------------|:---------|:------------------------|:-------------------------|
| 0.1389 | 100  | 0.5848        | 1.0957       | 0.0324   | 0.8309                  | -                        |
| 0.2778 | 200  | 0.5243        | 0.9695       | 0.0294   | 0.8386                  | -                        |
| 0.4167 | 300  | 0.5135        | 0.9486       | 0.0295   | 0.8398                  | -                        |
| 0.5556 | 400  | 0.4896        | 0.9366       | 0.0305   | 0.8317                  | -                        |
| 0.6944 | 500  | 0.5048        | 0.9201       | 0.0298   | 0.8395                  | -                        |
| 0.8333 | 600  | 0.4862        | 0.8885       | 0.0291   | 0.8370                  | -                        |
| 0.9722 | 700  | 0.4628        | 0.8893       | 0.0289   | 0.8389                  | -                        |
| 1.0    | 720  | -             | -            | -        | -                       | 0.7893                   |

### Framework Versions

  • Python: 3.9.18
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.2.2+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.19.0
  • Tokenizers: 0.19.1
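
To recreate this environment, the versions above can be pinned at install time. A possible pin set (pick the PyTorch build matching your CUDA version separately):

```bash
pip install sentence-transformers==3.0.1 transformers==4.42.4 accelerate==0.26.1 datasets==2.19.0 tokenizers==0.19.1 torch==2.2.2
```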

## Citation

### BibTeX

#### Sentence Transformers and SoftmaxLoss

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```