This model is part of the **embeddings-spanish-models** 🎯 collection: embedding models I fine-tuned for better performance on Spanish texts.
This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base on the stsb_multi_es_augmented (private) dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Full model architecture:
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
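The listing shows three stages: a ModernBERT encoder with an 8192-token context window, mean pooling over token embeddings, and L2 normalization. For readers who prefer raw transformers, here is a minimal sketch of what those modules do; it is an illustration under that assumption, not the card's official usage path (which follows below).

```python
# Minimal sketch: reproduce mean pooling + normalization with raw transformers.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "mrm8488/modernbert-embed-base-ft-sts-spanish-matryoshka-768-64-5e"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

batch = tokenizer(
    ["El cordero está mirando hacia la cámara."],
    padding=True, truncation=True, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling over non-padding tokens (the Pooling module above),
# followed by L2 normalization (the Normalize module above).
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
sentence_embedding = F.normalize(sentence_embedding, p=2, dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```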
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mrm8488/modernbert-embed-base-ft-sts-spanish-matryoshka-768-64-5e")

# Run inference
sentences = [
    'El cordero está mirando hacia la cámara.',
    'Un gato está mirando hacia la cámara también.',
    '"Sí, no deseo estar presente durante este testimonio", declaró tranquilamente Peterson, de 31 años, al juez cuando fue devuelto a su celda.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
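Because the model was trained with a Matryoshka objective down to 64 dimensions (see the loss configuration below), embeddings can be truncated at load time to trade a little accuracy for smaller indexes and faster search. A small sketch, assuming sentence-transformers >= 2.7 for the `truncate_dim` argument:

```python
from sentence_transformers import SentenceTransformer

# Any of the trained dimensions works here: 768, 512, 256, 128, or 64.
model_64 = SentenceTransformer(
    "mrm8488/modernbert-embed-base-ft-sts-spanish-matryoshka-768-64-5e",
    truncate_dim=64,
)
embeddings = model_64.encode([
    "Una chica está tocando la flauta en un parque.",
    "Un grupo de músicos está tocando en un escenario al aire libre.",
])
print(embeddings.shape)  # (2, 64)
```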
All metrics below come from `EmbeddingSimilarityEvaluator`, run on the dev and test splits at each Matryoshka dimension: sts-dev-768, sts-dev-512, sts-dev-256, sts-dev-128, sts-dev-64, sts-test-768, sts-test-512, sts-test-256, sts-test-128, and sts-test-64.
Metric | sts-dev-768 | sts-dev-512 | sts-dev-256 | sts-dev-128 | sts-dev-64 | sts-test-768 | sts-test-512 | sts-test-256 | sts-test-128 | sts-test-64
---|---|---|---|---|---|---|---|---|---|---
pearson_cosine | 0.7499 | 0.7468 | 0.7419 | 0.7263 | 0.6973 | 0.8673 | 0.8665 | 0.8568 | 0.8485 | 0.8194
spearman_cosine | 0.7532 | 0.7482 | 0.7451 | 0.7304 | 0.7070 | 0.8767 | 0.8752 | 0.8702 | 0.8617 | 0.8420
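Each column of the table is one `EmbeddingSimilarityEvaluator` run at a given truncation dimension. As a hedged sketch of such a run (the real dev/test splits come from the private dataset, so the pairs below, borrowed from the sample tables further down, are illustrative only):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction

model = SentenceTransformer("mrm8488/modernbert-embed-base-ft-sts-spanish-matryoshka-768-64-5e")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=[
        "El pájaro de tamaño reducido se posó con delicadeza en una rama cubierta de escarcha.",
        "Una chica está tocando la flauta en un parque.",
        "La aclamada escritora británica, Doris Lessing, galardonada con el premio Nobel, fallece",
    ],
    sentences2=[
        "Un ave de color amarillo descansaba tranquilamente en una rama.",
        "Un grupo de músicos está tocando en un escenario al aire libre.",
        "La destacada autora británica, Doris Lessing, reconocida con el prestigioso Premio Nobel, muere",
    ],
    scores=[3.2 / 5.0, 1.286 / 5.0, 4.2 / 5.0],  # gold scores rescaled to [0, 1]
    main_similarity=SimilarityFunction.COSINE,
    name="sts-dev-768",
)
print(evaluator(model))  # e.g. {'sts-dev-768_pearson_cosine': ..., 'sts-dev-768_spearman_cosine': ...}
```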
Training dataset: stsb_multi_es_augmented (private), with columns `sentence1` (string), `sentence2` (string), and `score` (float). Sample rows:

sentence1 | sentence2 | score
---|---|---
El pájaro de tamaño reducido se posó con delicadeza en una rama cubierta de escarcha. | Un ave de color amarillo descansaba tranquilamente en una rama. | 3.2
Una chica está tocando la flauta en un parque. | Un grupo de músicos está tocando en un escenario al aire libre. | 1.286
La aclamada escritora británica, Doris Lessing, galardonada con el premio Nobel, fallece | La destacada autora británica, Doris Lessing, reconocida con el prestigioso Premio Nobel, muere | 4.2
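The gold scores are on the usual 0-5 STS scale. The preprocessing script is not part of this card, but a common step, shown here purely as an assumption, is rescaling the scores to [0, 1] before they are used as similarity labels:

```python
from datasets import Dataset

# Toy stand-in for the private stsb_multi_es_augmented dataset.
pairs = Dataset.from_dict({
    "sentence1": ["Una chica está tocando la flauta en un parque."],
    "sentence2": ["Un grupo de músicos está tocando en un escenario al aire libre."],
    "score": [1.286],
})
# Rescale the 0-5 STS score to a [0, 1] similarity label.
pairs = pairs.map(lambda row: {"score": row["score"] / 5.0})
print(pairs[0]["score"])  # 0.2572
```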
Loss: `MatryoshkaLoss` with these parameters:

```json
{
    "loss": "CoSENTLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
```
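These parameters map directly onto the sentence-transformers loss classes. A sketch of how the loss would be constructed for training (the actual training script is not included in the card):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss, MatryoshkaLoss

# Start from the base checkpoint named at the top of the card.
model = SentenceTransformer("nomic-ai/modernbert-embed-base")

loss = MatryoshkaLoss(
    model,
    CoSENTLoss(model),                         # inner loss applied at each dimension
    matryoshka_dims=[768, 512, 256, 128, 64],  # nested embedding sizes
    matryoshka_weights=[1, 1, 1, 1, 1],        # equal weight per dimension
    n_dims_per_step=-1,                        # -1 = use all dimensions every step
)
```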
Evaluation dataset: stsb_multi_es_augmented (private), with the same `sentence1`, `sentence2`, and `score` columns. Sample rows:

sentence1 | sentence2 | score
---|---|---
Un incendio ocurrido en un hospital psiquiátrico ruso resultó en la trágica muerte de 38 personas. | Se teme que el incendio en un hospital psiquiátrico ruso cause la pérdida de la vida de 38 individuos. | 4.2
"Street dijo que el otro individuo a veces se siente avergonzado de su fiesta, lo cual provoca risas en la multitud" | "A veces, el otro tipo se encuentra avergonzado de su fiesta y no se le puede culpar." | 3.5
El veterano diplomático de Malasia tuvo un encuentro con Suu Kyi el miércoles en la casa del lago en Yangon donde permanece bajo arresto domiciliario. | Razali Ismail tuvo una reunión de 90 minutos con Suu Kyi, quien ganó el Premio Nobel de la Paz en 1991, en su casa del lago donde está recluida. | 3.692
Loss: `MatryoshkaLoss` with the same parameters as for the training dataset.
Training hyperparameters (non-default):

```
eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 5
warmup_ratio: 0.1
bf16: True
```

All hyperparameters:

```
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
```

Training logs:

Epoch | Step | Training Loss | Validation Loss | sts-dev-768_spearman_cosine | sts-dev-512_spearman_cosine | sts-dev-256_spearman_cosine | sts-dev-128_spearman_cosine | sts-dev-64_spearman_cosine | sts-test-768_spearman_cosine | sts-test-512_spearman_cosine | sts-test-256_spearman_cosine | sts-test-128_spearman_cosine | sts-test-64_spearman_cosine |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.5917 | 100 | 23.7709 | 22.5494 | 0.7185 | 0.7146 | 0.7055 | 0.6794 | 0.6570 | - | - | - | - | - |
1.1834 | 200 | 22.137 | 22.7634 | 0.7449 | 0.7412 | 0.7439 | 0.7287 | 0.7027 | - | - | - | - | - |
1.7751 | 300 | 21.5527 | 22.6985 | 0.7321 | 0.7281 | 0.7243 | 0.7063 | 0.6862 | - | - | - | - | - |
2.3669 | 400 | 20.5745 | 24.0021 | 0.7302 | 0.7264 | 0.7221 | 0.7097 | 0.6897 | - | - | - | - | - |
2.9586 | 500 | 20.0861 | 24.0091 | 0.7392 | 0.7361 | 0.7293 | 0.7124 | 0.6906 | - | - | - | - | - |
3.5503 | 600 | 18.8191 | 26.9012 | 0.7502 | 0.7462 | 0.7399 | 0.7207 | 0.6960 | - | - | - | - | - |
4.1420 | 700 | 18.3 | 29.0209 | 0.7496 | 0.7454 | 0.7432 | 0.7284 | 0.7065 | - | - | - | - | - |
4.7337 | 800 | 17.6496 | 28.9536 | 0.7532 | 0.7482 | 0.7451 | 0.7304 | 0.7070 | - | - | - | - | - |
5.0 | 845 | - | - | - | - | - | - | - | 0.8767 | 0.8752 | 0.8702 | 0.8617 | 0.8420 |
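For completeness, here is a sketch of how the non-default hyperparameters above translate into the sentence-transformers v3 trainer API; dataset loading and the trainer call are omitted because stsb_multi_es_augmented is private, and the `output_dir` name is only a placeholder:

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-embed-base-ft-sts-spanish",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-5,
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="steps",
)
```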
Citation (BibTeX):

Sentence Transformers:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

MatryoshkaLoss:

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

CoSENTLoss:

```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```
Base model: answerdotai/ModernBERT-base (the encoder underlying nomic-ai/modernbert-embed-base).