SentenceTransformer based on uitnlp/CafeBERT

This is a sentence-transformers model fine-tuned from uitnlp/CafeBERT. It maps sentences and paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and related tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: uitnlp/CafeBERT
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 256 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 256, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
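
The Pooling and Dense stages listed above can be sketched in NumPy as follows. This is an illustrative approximation, not the library's implementation: the random arrays are hypothetical stand-ins for the XLMRobertaModel token outputs and for the learned Dense weights, and only the shapes (1024 in, 256 out), the mean-pooling mode, and the Tanh activation are taken from the architecture printout.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def dense_tanh(pooled, weight, bias):
    """Linear projection followed by Tanh, mirroring the Dense module's config."""
    return np.tanh(pooled @ weight.T + bias)

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the encoder output and the learned Dense parameters.
token_embeddings = rng.normal(size=(2, 5, 1024))   # (batch, tokens, hidden)
attention_mask = np.array([[1, 1, 1, 0, 0],
                           [1, 1, 1, 1, 1]])       # 0 marks padding
weight = rng.normal(scale=0.02, size=(256, 1024))  # (out_features, in_features)
bias = np.zeros(256)

pooled = mean_pool(token_embeddings, attention_mask)
sentence_embeddings = dense_tanh(pooled, weight, bias)
print(sentence_embeddings.shape)  # (2, 256)
```

Because the final activation is Tanh, every component of the resulting embedding lies in [-1, 1].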

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ThuanPhong/sentence_CafeBERT")
# Run inference
sentences = [
    'Chúng tôi đang tiến vào sa mạc.',
    'Chúng tôi chuyển đến sa mạc.',
    'Người phụ nữ này đang chạy vì cô ta đến muộn.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 256]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
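
For intuition about what the similarity scores represent, here is a minimal pure-Python sketch of a pairwise cosine similarity matrix, the similarity function listed in the model description. The toy 3-dimensional vectors are hypothetical stand-ins for the 256-dimensional embeddings, and the helper names are illustrative rather than part of the library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row per input vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Toy vectors standing in for real sentence embeddings.
toy_embeddings = [
    [1.0, 0.0, 1.0],
    [2.0, 0.0, 2.0],  # same direction as the first vector
    [0.0, 1.0, 0.0],  # orthogonal to the first two
]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(x, 4) for x in row])
```

Vectors pointing in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0.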

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 0.5404
cosine_accuracy_threshold 1.0
cosine_f1 0.6299
cosine_f1_threshold 1.0
cosine_precision 0.4597
cosine_recall 1.0
cosine_ap 0.4597
dot_accuracy 0.5403
dot_accuracy_threshold 46.2905
dot_f1 0.6299
dot_f1_threshold 46.2905
dot_precision 0.4597
dot_recall 0.9999
dot_ap 0.4578
manhattan_accuracy 0.5411
manhattan_accuracy_threshold 0.0
manhattan_f1 0.6299
manhattan_f1_threshold 0.0002
manhattan_precision 0.4597
manhattan_recall 1.0
manhattan_ap 0.4604
euclidean_accuracy 0.5412
euclidean_accuracy_threshold 0.0
euclidean_f1 0.6299
euclidean_f1_threshold 0.0
euclidean_precision 0.4597
euclidean_recall 1.0
euclidean_ap 0.4602
max_accuracy 0.5412
max_accuracy_threshold 46.2905
max_f1 0.6299
max_f1_threshold 46.2905
max_precision 0.4597
max_recall 1.0
max_ap 0.4604
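
As a consistency check on the table above, F1 is the harmonic mean of precision and recall, so the reported cosine_f1 can be recomputed from cosine_precision and cosine_recall. The sketch below also shows how precision, recall, and F1 follow from a score threshold. The rule "positive when score >= threshold" and the example scores and labels are assumptions made for illustration, not values from this evaluation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def binary_metrics(scores, labels, threshold):
    """Treat a pair as positive when its similarity score is >= threshold (assumed convention)."""
    predictions = [score >= threshold for score in scores]
    tp = sum(p and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p and y == 0 for p, y in zip(predictions, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)

# Reported cosine precision and recall from the table above.
print(round(f1_score(0.4597, 1.0), 4))  # 0.6299

# Hypothetical scores and labels, for illustration only.
scores = [0.91, 0.40, 0.75, 0.10]
labels = [1, 0, 0, 1]
print(binary_metrics(scores, labels, threshold=0.5))  # (0.5, 0.5, 0.5)
```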

Training Details

Training Dataset

Unnamed Dataset

  • Size: 461,625 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 4 tokens, mean 21.87 tokens, max 121 tokens
    • sentence_1 (string): min 4 tokens, mean 32.19 tokens, max 162 tokens
    • label (int): 0: ~55.90%, 1: ~44.10%
  • Samples:
    • sentence_0: Khi nào William Caxton giới thiệu máy in ép vào nước Anh?
      sentence_1: Những đặc điểm mà độc giả của Shakespeare ngày nay có thể thấy kỳ quặc hay lỗi thời thường đại diện cho những nét đặc trưng của tiếng Anh trung Đại.
      label: 0
    • sentence_0: Nhưng tôi không biết rằng tôi phải, " Dorcas do dự.
      sentence_1: Dorcas sợ phản ứng của họ.
      label: 0
    • sentence_0: Đông Đức là tên gọi thường được sử dụng để chỉ quốc gia nào?
      sentence_1: Cộng hòa Dân chủ Đức (tiếng Đức: Deutsche Demokratische Republik, DDR; thường được gọi là Đông Đức) là một quốc gia nay không còn nữa, tồn tại từ 1949 đến 1990 theo định hướng xã hội chủ nghĩa tại phần phía đông nước Đức ngày nay.
      label: 1
  • Loss: OnlineContrastiveLoss
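
The idea behind an online contrastive loss can be sketched in plain Python: within each batch, only the "hard" pairs contribute, meaning similar pairs that are still far apart and dissimilar pairs that are still close. This is a simplified illustration, not the library's code. The margin of 0.5 and the use of cosine distance (1 - cosine similarity) are assumptions, since this card does not state the loss parameters, and the example distances are hypothetical.

```python
def online_contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss restricted to hard positive and hard negative pairs."""
    positives = [d for d, y in zip(distances, labels) if y == 1]
    negatives = [d for d, y in zip(distances, labels) if y == 0]

    # Hard positives: similar pairs farther apart than the closest negative pair.
    # Hard negatives: dissimilar pairs closer than the farthest positive pair.
    if positives and negatives:
        hard_positives = [d for d in positives if d > min(negatives)]
        hard_negatives = [d for d in negatives if d < max(positives)]
    else:
        # Simplification: with only one class in the batch, keep every pair.
        hard_positives, hard_negatives = positives, negatives

    positive_loss = sum(d ** 2 for d in hard_positives)
    negative_loss = sum(max(0.0, margin - d) ** 2 for d in hard_negatives)
    return positive_loss + negative_loss

# Hypothetical cosine distances for a batch of four pairs.
distances = [0.1, 0.6, 0.2, 0.9]
labels = [1, 1, 0, 0]
print(online_contrastive_loss(distances, labels))
```

In this toy batch only the positive pair at distance 0.6 and the negative pair at distance 0.2 are counted. The other two pairs are already on the correct side of each other and are skipped.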

Training Hyperparameters

Non-Default Hyperparameters

  • num_train_epochs: 2
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
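
The listed learning_rate (5e-05), lr_scheduler_type (linear), and warmup_steps (0) imply a learning rate that decays linearly from its initial value to zero over training. A minimal sketch of such a schedule follows. The function name is illustrative, and the total of 115408 steps is taken from the final row of the training logs rather than from the hyperparameter list.

```python
def linear_lr(step, base_lr=5e-05, warmup_steps=0, total_steps=115408):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start, after one epoch, and at the end of training.
for step in (0, 57704, 115408):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate is halved by the end of the first epoch and reaches zero at the last step.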

Training Logs

Epoch Step Training Loss max_ap
0 0 - 0.5959
0.0087 500 0.3971 -
0.0173 1000 0.3353 -
0.0260 1500 0.4706 -
0.0347 2000 0.5002 -
0.0433 2500 0.4528 -
0.0520 3000 0.445 -
0.0607 3500 0.428 -
0.0693 4000 0.4305 -
0.0780 4500 0.4428 -
0.0866 5000 0.4358 -
0.0953 5500 0.4309 -
0.1040 6000 0.4221 -
0.1126 6500 0.4283 -
0.1213 7000 0.4218 -
0.1300 7500 0.4176 -
0.1386 8000 0.4227 -
0.1473 8500 0.4174 -
0.1560 9000 0.418 -
0.1646 9500 0.426 -
0.1733 10000 0.4213 -
0.1820 10500 0.4165 -
0.1906 11000 0.417 -
0.1993 11500 0.4262 -
0.2080 12000 0.4192 -
0.2166 12500 0.4162 -
0.2253 13000 0.4136 -
0.2340 13500 0.4037 -
0.2426 14000 0.4234 -
0.2513 14500 0.4225 -
0.2599 15000 0.4143 -
0.2686 15500 0.4178 -
0.2773 16000 0.4172 -
0.2859 16500 0.4305 -
0.2946 17000 0.4193 -
0.3033 17500 0.4144 -
0.3119 18000 0.4192 -
0.3206 18500 0.4172 -
0.3293 19000 0.4253 -
0.3379 19500 0.4211 -
0.3466 20000 0.4197 -
0.3553 20500 0.4219 -
0.3639 21000 0.4307 -
0.3726 21500 0.4332 -
0.3813 22000 0.4201 -
0.3899 22500 0.4273 -
0.3986 23000 0.4218 -
0.4073 23500 0.4279 -
0.4159 24000 0.4299 -
0.4246 24500 0.4289 -
0.4332 25000 0.416 -
0.4419 25500 0.3997 -
0.4506 26000 0.409 -
0.4592 26500 0.4133 -
0.4679 27000 0.4016 -
0.4766 27500 0.4117 -
0.4852 28000 0.4155 -
0.4939 28500 0.4117 -
0.5026 29000 0.4039 -
0.5112 29500 0.4087 -
0.5199 30000 0.4119 -
0.5286 30500 0.3948 -
0.5372 31000 0.4013 -
0.5459 31500 0.4175 -
0.5546 32000 0.4038 -
0.5632 32500 0.4058 -
0.5719 33000 0.4099 -
0.5805 33500 0.4117 -
0.5892 34000 0.4142 -
0.5979 34500 0.4049 -
0.6065 35000 0.4099 -
0.6152 35500 0.4121 -
0.6239 36000 0.4167 -
0.6325 36500 0.4138 -
0.6412 37000 0.4125 -
0.6499 37500 0.4043 -
0.6585 38000 0.4129 -
0.6672 38500 0.4079 -
0.6759 39000 0.3954 -
0.6845 39500 0.413 -
0.6932 40000 0.4079 -
0.7019 40500 0.4067 -
0.7105 41000 0.4251 -
0.7192 41500 0.4044 -
0.7279 42000 0.3919 -
0.7365 42500 0.4081 -
0.7452 43000 0.4141 -
0.7538 43500 0.4015 -
0.7625 44000 0.4139 -
0.7712 44500 0.408 -
0.7798 45000 0.4019 -
0.7885 45500 0.4127 -
0.7972 46000 0.4109 -
0.8058 46500 0.4045 -
0.8145 47000 0.4017 -
0.8232 47500 0.4108 -
0.8318 48000 0.4189 -
0.8405 48500 0.4127 -
0.8492 49000 0.4183 -
0.8578 49500 0.408 -
0.8665 50000 0.4091 -
0.8752 50500 0.412 -
0.8838 51000 0.4129 -
0.8925 51500 0.4175 -
0.9012 52000 0.4049 -
0.9098 52500 0.4047 -
0.9185 53000 0.4016 -
0.9271 53500 0.4088 -
0.9358 54000 0.4009 -
0.9445 54500 0.3996 -
0.9531 55000 0.4054 -
0.9618 55500 0.4115 -
0.9705 56000 0.4135 -
0.9791 56500 0.4041 -
0.9878 57000 0.4046 -
0.9965 57500 0.4063 -
1.0 57704 - 0.4615
1.0051 58000 0.4054 -
1.0138 58500 0.4017 -
1.0225 59000 0.417 -
1.0311 59500 0.4048 -
1.0398 60000 0.4007 -
1.0485 60500 0.4094 -
1.0571 61000 0.4068 -
1.0658 61500 0.4113 -
1.0744 62000 0.4022 -
1.0831 62500 0.4219 -
1.0918 63000 0.4149 -
1.1004 63500 0.399 -
1.1091 64000 0.4041 -
1.1178 64500 0.4023 -
1.1264 65000 0.4039 -
1.1351 65500 0.4024 -
1.1438 66000 0.4184 -
1.1524 66500 0.4104 -
1.1611 67000 0.4032 -
1.1698 67500 0.3958 -
1.1784 68000 0.4103 -
1.1871 68500 0.4105 -
1.1958 69000 0.4049 -
1.2044 69500 0.3995 -
1.2131 70000 0.4064 -
1.2218 70500 0.4135 -
1.2304 71000 0.3907 -
1.2391 71500 0.4037 -
1.2477 72000 0.4016 -
1.2564 72500 0.4124 -
1.2651 73000 0.4071 -
1.2737 73500 0.3965 -
1.2824 74000 0.4149 -
1.2911 74500 0.3985 -
1.2997 75000 0.3957 -
1.3084 75500 0.4043 -
1.3171 76000 0.411 -
1.3257 76500 0.4109 -
1.3344 77000 0.3968 -
1.3431 77500 0.4134 -
1.3517 78000 0.4057 -
1.3604 78500 0.4034 -
1.3691 79000 0.4057 -
1.3777 79500 0.3998 -
1.3864 80000 0.4002 -
1.3951 80500 0.396 -
1.4037 81000 0.4066 -
1.4124 81500 0.4073 -
1.4210 82000 0.3957 -
1.4297 82500 0.4012 -
1.4384 83000 0.4008 -
1.4470 83500 0.4055 -
1.4557 84000 0.409 -
1.4644 84500 0.4052 -
1.4730 85000 0.4128 -
1.4817 85500 0.4053 -
1.4904 86000 0.3979 -
1.4990 86500 0.4038 -
1.5077 87000 0.3987 -
1.5164 87500 0.4071 -
1.5250 88000 0.4042 -
1.5337 88500 0.4097 -
1.5424 89000 0.4044 -
1.5510 89500 0.4037 -
1.5597 90000 0.3992 -
1.5683 90500 0.4031 -
1.5770 91000 0.4037 -
1.5857 91500 0.4001 -
1.5943 92000 0.4069 -
1.6030 92500 0.4149 -
1.6117 93000 0.4091 -
1.6203 93500 0.3978 -
1.6290 94000 0.397 -
1.6377 94500 0.4063 -
1.6463 95000 0.4032 -
1.6550 95500 0.4146 -
1.6637 96000 0.407 -
1.6723 96500 0.4079 -
1.6810 97000 0.3991 -
1.6897 97500 0.4072 -
1.6983 98000 0.397 -
1.7070 98500 0.4033 -
1.7157 99000 0.412 -
1.7243 99500 0.3886 -
1.7330 100000 0.4026 -
1.7416 100500 0.3993 -
1.7503 101000 0.4078 -
1.7590 101500 0.3945 -
1.7676 102000 0.4029 -
1.7763 102500 0.4048 -
1.7850 103000 0.3994 -
1.7936 103500 0.4079 -
1.8023 104000 0.4146 -
1.8110 104500 0.4014 -
1.8196 105000 0.3942 -
1.8283 105500 0.4081 -
1.8370 106000 0.4016 -
1.8456 106500 0.4122 -
1.8543 107000 0.4078 -
1.8630 107500 0.4146 -
1.8716 108000 0.4029 -
1.8803 108500 0.4057 -
1.8890 109000 0.3994 -
1.8976 109500 0.3955 -
1.9063 110000 0.3997 -
1.9149 110500 0.3935 -
1.9236 111000 0.3942 -
1.9323 111500 0.3979 -
1.9409 112000 0.3996 -
1.9496 112500 0.4076 -
1.9583 113000 0.3971 -
1.9669 113500 0.4075 -
1.9756 114000 0.4028 -
1.9843 114500 0.4011 -
1.9929 115000 0.3929 -
2.0 115408 - 0.4604
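
The step counts in the log can be cross-checked against the dataset size and batch size reported earlier. The arithmetic below assumes a single training device (the card does not state the device count), with gradient_accumulation_steps of 1 and dataloader_drop_last set to False as listed in the hyperparameters.

```python
import math

train_samples = 461_625  # size reported under Training Dataset
batch_size = 8           # per_device_train_batch_size
num_epochs = 2           # num_train_epochs

# The last, partially filled batch still counts as a step (drop_last is False).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 57704 115408

def epoch_at(step):
    """Fractional epoch for a global step, rounded as in the log."""
    return round(step / steps_per_epoch, 4)

print(epoch_at(500))  # 0.0087
```

Both values match the logged rows at epochs 1.0 and 2.0, and the fractional epoch for step 500 matches the first logged training step.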

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.2
  • PyTorch: 2.2.1
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 560M params (Safetensors, tensor type F32)