---
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:64000
- loss:DenoisingAutoEncoderLoss
widget:
- source_sentence: đā¤ā¤¨đđĸđ đā¤đĒā¤đ ā¤ đĢđŖā¤ĒđŖ đā¤¨đ ā¤ đđŖđąā¤ ā¤ŦđĸđĒđ ā¤đ¯
sentences:
- ' ā¤Ŗā¤ ā¤ŦđĸđĒđ ā¤ ā¤Ēā¤đĒđĻ đŖā¤ đ ā¤đĢā¤đĸđ˛đĸā¤Ŗā¤đĒđŗā¤ đŖā¤ ā¤ā¤đđĻđđŗā¤ ā¤ā¤ā¤Ŗā¤đĻ đā¤đ ā¤đĒ ā¤Ŗā¤đŖđŖā¤ đ ā¤đĢā¤đĸđ˛đĸđđŗā¤ ā¤Ŗā¤ ā¤ĸā¤đĒ
đĸā¤Ŗā¤ā¤˛đĸđ¯'
- ' đŖā¤đā¤Ŧā¤đđĻ đŖā¤ đā¤ā¤¨đđĸđ đ đŖā¤Ēā¤đĒđĻ ā¤Ēā¤đā¤ đĸā¤Ŗā¤ đ¤ā¤đ ā¤ ā¤ĸā¤ā¤ĸā¤ĸā¤ đđŖ đā¤đĒā¤đ ā¤ đĸđŖā¤đ ā¤đā¤ đđąā¤ā¤Ēā¤đā¤Ēā¤ đŖā¤
đ đŖā¤Ēā¤đĒ đŖā¤ā¤¨đā¤đĒ đĢđŖā¤ĒđŖ đŖā¤ đŗā¤¨ā¤đĻ đā¤¨đ ā¤ ā¤Ŗā¤ đ˛đĸ đā¤ đđŖđąā¤ ā¤ŦđĸđĒđ ā¤đ¯'
- ā¤Ēā¤đĒđĻđ đĸ ā¤Ŗā¤ ā¤ĸā¤¨ā¤Ŧā¤ đąā¤ ā¤ā¤¨đā¤Ŧđĸā¤Ŗā¤đĒ ā¤đąā¤ā¤˛ā¤˛đŖđ ā¤ā¤đ˛ā¤ ā¤Ēā¤ ā¤ā¤ā¤˛đĸā¤ĸđĸđ ā¤ā¤đŗā¤đĒ đĸđĒā¤đ ā¤ ā¤Ŧā¤đŗā¤đĒ ā¤Ēā¤¨đĒđđĸā¤Ŗā¤Ŗā¤
đā¤¨đ ā¤ ā¤Ŗā¤ ā¤¤đĸ đąā¤ ā¤ā¤¨đā¤Ŧđĸā¤Ŗā¤đĒ đđąā¤ā¤˛ā¤˛ā¤ā¤ŖđĻ ā¤Ĩđ¯
- source_sentence: ā¤Ŗā¤đā¤ ā¤Ŧā¤ā¤ĸā¤ đŖā¤ ā¤˛ā¤¨đĒā¤ đŖā¤ đŖā¤ ā¤Ēā¤ đ˛đĸ đŖā¤
sentences:
- đđŖđĢđ đ đĸā¤¤đĢā¤đĻā¤˛ đŖā¤ŦđĸđŖđĸ đā¤đ đĢā¤đĸđ˛đĻđŗđĢđĸ đĒā¤đā¤đĒ đ ā¤Ŧā¤ đąā¤ā¤Ēā¤đ đŖđĸđŗā¤đ ā¤ĸā¤đĻ đā¤Ĩđā¤ĨđŽđ¯
- ' đąā¤đđā¤đ ā¤Ŗā¤đā¤ ā¤Ēā¤đĸđ ā¤đā¤ đąā¤ ā¤đąā¤đĒā¤đĒđĒā¤¨đ đĢđĒ đŗā¤¨ ā¤¤đĸ ā¤Ŧā¤ā¤ĸā¤ đŖā¤ ā¤˛ā¤¨đĒā¤ đŖā¤ đŖā¤¨đ ā¤ĸā¤¨ā¤ā¤ā¤ā¤đĻđ ā¤ā¤Ŗā¤Ŗā¤¨đā¤đđŗā¤¨
đŖā¤ đ ā¤đŗā¤¨ đđĻđ ā¤ ā¤Ēā¤ đĢā¤đā¤Ŗā¤đĒ đŖā¤ ā¤Ēā¤ đ˛đĸ đŗā¤ā¤¨đĒđĸ đŖā¤ đŗā¤ā¤¨ā¤đĸ đ˛đĸā¤ŖđĻ đŖā¤ đŖā¤đ¯'
- ' ā¤ đā¤đĒđā¤đŗđĢđĸđ đŖđŖđā¤đĒđĻ đ ā¤đā¤ā¤˛đĸđŗā¤đĒ ā¤˛ā¤ā¤¨ā¤ŖđŖā¤Ŗđĸđ đĸđđŖđĸā¤Ŗā¤ đĸā¤Ēā¤ ā¤¤đĻ ā¤ĸā¤ā¤ĸā¤ĸā¤đĒ đĢā¤¨đā¤¨đ ā¤đĒ đā¤¨ā¤˛ā¤ đŖā¤ đĢā¤đĒđđŖđđĸđ
đŗđĢā¤đĒđĸđā¤ ā¤ đĸđđŖđĸā¤Ŗā¤ đŖā¤ đā¤¨đ ā¤ ā¤Ēā¤ā¤ĸā¤ĸā¤ā¤Ēā¤đĒ đŖā¤ ā¤ĸđĸđ đŖđŖđā¤ đŖā¤ đđĸā¤Ŗā¤ā¤ŖđĻ đā¤đđĸđŖđŖđđĸđ đđąā¤đĒā¤đĒđĒā¤¨ ā¤Ēā¤
đĢā¤đā¤Ŗā¤đĒ đđąā¤đĒā¤đĒđĒā¤¨đ ā¤˛ā¤ā¤¨ā¤Ŗā¤ ā¤ đā¤đŗā¤đĒđ¯'
- source_sentence: đŖā¤¨ā¤ĸā¤ ā¤ĸā¤ĸā¤¤đ đ ā¤đ ā¤đĒ ā¤ā¤˛ā¤ā¤ĒđŖā¤¨đ đĸ
sentences:
- đŖā¤¨ā¤ĸā¤ đā¤¨đ ā¤ đŖđĻđđā¤ˇđŖđĻđđđ ā¤đā¤đ¤ā¤đĒā¤Ēā¤ ā¤ĸā¤ĸā¤¤đ đ ā¤đ ā¤đĒ đā¤đŗđŗđĻā¤Ŗ ā¤ā¤˛ā¤ā¤ĒđŖā¤¨đ đĸ đ¯
- ' ā¤đ đ˛ā¤đĒā¤ đŗā¤đ ā¤đĒđąā¤ đā¤¨đ ā¤ đŖā¤ā¤Ŧā¤ ā¤ĸā¤ā¤Ŗā¤ ā¤đ đ˛ā¤đŖā¤đŖā¤ ā¤ā¤Ŗā¤Ŗā¤¨đā¤đ ā¤Ŧā¤ đŗā¤ā¤¨đĒā¤đ đĸā¤Ŗā¤ā¤˛ā¤đĸ đā¤ đā¤đđĻđĒđĸā¤Ŗā¤
đ ā¤đŗā¤¨ ā¤Ŗā¤đĒā¤đ¯'
- ' đĢā¤đĨā¤đā¤ ā¤đ¤ā¤ā¤ĸā¤Ēā¤đĒđąā¤ ā¤Ŗā¤đā¤ đŖā¤ đąā¤đĢā¤ā¤˛ā¤ đ ā¤¨đŗā¤đ đ ā¤đ ā¤ ā¤¤đĸđđĸđ ā¤ā¤Ŗā¤Ŗā¤¨đā¤đ ā¤Ŗā¤ā¤đĸ đŖā¤ ā¤Ēā¤đąā¤ā¤Ŧā¤đĒđ¯'
- source_sentence: ā¤đ
sentences:
- đ ā¤¨ā¤Ēā¤¨đąā¤ ā¤ đĒā¤đā¤đĒ ā¤° ā¤Ŧā¤ đąā¤ā¤Ēā¤đ đ ā¤ā¤Ŗā¤¨đ ā¤ đ§đ§ā¤ đĻ ā¤đā¤¨ đā¤ ā¤¤đĸđđĸđ đ˛ā¤đŗđĸđđđŖđđĸ đŦđ§ đŖā¤ đđĻ ā¤¤đĸđđĸđ đąā¤đđĸ
đđĸđĒā¤Ŧđĸđ đŖā¤ ā¤Ŗā¤ ā¤Ŗđĸ đĢā¤ā¤Ēđŗā¤đĒđĸđ đ đĸđā¤Ēā¤¨đā¤ đā¤ā¤ā¤đ ā¤ĸā¤ā¤Ŗā¤đ ā¤Ēā¤đŗđĢđĸđđŗā¤ ā¤ đā¤đđŖđ¯
- ' ā¤đ ā¤Ŗđĸ đĸđ ā¤đđĸđ đŗđ¯'
- ' đ˛ā¤đĢā¤đŖ ā¤Ŗā¤ đā¤đ đ ā¤ā¤˛ā¤ đā¤đā¤đĒ ā¤ đ§đā¤ ā¤đđ° đŖā¤ đđąā¤ā¤˛ā¤˛ā¤ā¤ŖđĻ đđ§ đ ā¤đŗā¤¨ ā¤ĸā¤đ đŗđĢā¤đā¤đąā¤ ā¤ đąā¤đŗā¤đđđĸ ā¤ đĸ
ā¤ đŖā¤¨đ ā¤Ŧā¤đŗā¤đ¯'
- source_sentence: ā¤ŦđĢđŖđŗā¤Ē đĸđĸ đŗđĢđĸđđĻ đ ā¤đ˛đĸ đ ā¤đĢđĸđ đ ā¤đā¤¤đĸđĻ ā¤Ēā¤đĸđ ā¤đđŖđ đŖā¤ đ˛ā¤đŗā¤ā¤˛ā¤¨ā¤˛ā¤˛ā¤¨đā¤ ā¤Ŗā¤đā¤ ā¤ĸā¤
đ ā¤đ¤ā¤ā¤¨đā¤ ā¤¤đĸđđĸđ đĢā¤đđā¤ā¤˛đĸ ā¤Ŗā¤ā¤Ŗđĸđ
sentences:
- ā¤đ đĸđā¤Ēā¤ā¤¤ā¤¤đĸā¤Ŗā¤ ā¤ ā¤¤đĸđđĸđ ā¤ŦđĢđŖđŗā¤Ē đŗđĻđĒđĸđĻđŗ đĸđĸ đŗđĢđĸđđĻ đ ā¤đ˛đĸ đ ā¤đĢđĸđ đ ā¤đā¤¤đĸđĻ ā¤Ēā¤đĒđĻ đŖā¤ ā¤đĸđ ā¤ĸđĸđ đĸđā¤Ŧā¤đā¤Ēā¤ā¤Ēā¤Ēā¤¨đ
ā¤Ēđŗā¤đĒđĸđ ā¤Ēā¤đĸđ ā¤đđŖđ đŖđĸđĒđĻā¤ĸā¤ đŖā¤ đ˛ā¤đŗā¤ā¤˛ā¤¨ā¤˛ā¤˛ā¤¨đā¤ đā¤ ā¤đ đĸđā¤¤đĸđĻ ā¤Ŗā¤đā¤ ā¤ĸā¤ đ ā¤đ¤ā¤ā¤¨đā¤ ā¤¤đĸđđĸđ đđąā¤đā¤¤đĸā¤Ŗā¤đĒ
đĢā¤đđā¤ā¤˛đĸ ā¤Ŗā¤ā¤Ŗđĸđ ā¤Ēā¤đ˛đĸā¤Ŗā¤đĒđŗā¤¨đ¯
- ā¤ĒđŖā¤§đŗā¤Ŗ ā¤§đĢđĸđĒđĸ đā¤đ đĢā¤đĸđ˛đĻ đŗđĢđĸ ā¤ đĒā¤đā¤đĒ đđ ā¤Ŧā¤ đąā¤ā¤Ēā¤đ ā¤ā¤Ŧā¤¨đŗā¤Ēā¤ đā¤Ĩđđ§đŽ ā¤ā¤đ đąā¤đŗā¤đ ā¤ĸā¤đŖđ đĸđā¤ĒđŖđ
ā¤ā¤đ đ¤ā¤đ ā¤ĸđĸā¤ đđĻđ¯
- ā¤Ēā¤ā¤Ŧā¤Ŧā¤đ˛ā¤đŖđĸ đ ā¤ā¤Ēđŗā¤¨ā¤Ŧā¤¨đđĸđ đ ā¤¨ā¤Ēā¤đđĻ đđĻ ā¤ đŗā¤đŗđĢđĻđ ā¤đĒā¤˛đĸā¤Ē đŖā¤đđĻ ā¤Ŗā¤đđđĸđ ā¤ā¤Ŧā¤đŖđĻđ¤ ā¤ ā¤đĒđĻđąā¤ ā¤Ēā¤ ā¤Ēđŗā¤đđĸā¤Ŗā¤đĒ
đđĸđā¤đĒđ¯
---
# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co./sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co./sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
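Concretely, cosine similarity scores a pair of embeddings by the cosine of the angle between them, so scores are independent of vector magnitude. A minimal NumPy sketch (illustrative only, not part of this model's API):
```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two toy 384-dimensional vectors standing in for sentence embeddings
rng = np.random.default_rng(0)
u, v = rng.normal(size=384), rng.normal(size=384)
print(cosine_similarity(u, v))  # close to 0 for random vectors
```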
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co./models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
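The module configuration can also be checked programmatically once the model is loaded; a short sketch using the fine-tuned checkpoint named in the Usage section below:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("T-Blue/tsdae_pro_MiniLM_L12_2")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 384
print(model[1])                                  # the Pooling module shown above
```
Note that pooling is done on the CLS token rather than by mean pooling, which is the configuration TSDAE training typically produces.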
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("T-Blue/tsdae_pro_MiniLM_L12_2")
# Run inference
sentences = [
'ā¤ŦđĢđŖđŗā¤Ē đĸđĸ đŗđĢđĸđđĻ đ ā¤đ˛đĸ đ ā¤đĢđĸđ đ ā¤đā¤¤đĸđĻ ā¤Ēā¤đĸđ ā¤đđŖđ đŖā¤ đ˛ā¤đŗā¤ā¤˛ā¤¨ā¤˛ā¤˛ā¤¨đā¤ ā¤Ŗā¤đā¤ ā¤ĸā¤ đ ā¤đ¤ā¤ā¤¨đā¤ ā¤¤đĸđđĸđ đĢā¤đđā¤ā¤˛đĸ ā¤Ŗā¤ā¤Ŗđĸđ',
'ā¤đ đĸđā¤Ēā¤ā¤¤ā¤¤đĸā¤Ŗā¤ ā¤ ā¤¤đĸđđĸđ ā¤ŦđĢđŖđŗā¤Ē đŗđĻđĒđĸđĻđŗ đĸđĸ đŗđĢđĸđđĻ đ ā¤đ˛đĸ đ ā¤đĢđĸđ đ ā¤đā¤¤đĸđĻ ā¤Ēā¤đĒđĻ đŖā¤ ā¤đĸđ ā¤ĸđĸđ đĸđā¤Ŧā¤đā¤Ēā¤ā¤Ēā¤Ēā¤¨đ ā¤Ēđŗā¤đĒđĸđ ā¤Ēā¤đĸđ ā¤đđŖđ đŖđĸđĒđĻā¤ĸā¤ đŖā¤ đ˛ā¤đŗā¤ā¤˛ā¤¨ā¤˛ā¤˛ā¤¨đā¤ đā¤ ā¤đ đĸđā¤¤đĸđĻ ā¤Ŗā¤đā¤ ā¤ĸā¤ đ ā¤đ¤ā¤ā¤¨đā¤ ā¤¤đĸđđĸđ đđąā¤đā¤¤đĸā¤Ŗā¤đĒ đĢā¤đđā¤ā¤˛đĸ ā¤Ŗā¤ā¤Ŗđĸđ ā¤Ēā¤đ˛đĸā¤Ŗā¤đĒđŗā¤¨đ¯',
'ā¤ĒđŖā¤§đŗā¤Ŗ ā¤§đĢđĸđĒđĸ đā¤đ đĢā¤đĸđ˛đĻ đŗđĢđĸ ā¤ đĒā¤đā¤đĒ đđ ā¤Ŧā¤ đąā¤ā¤Ēā¤đ ā¤ā¤Ŧā¤¨đŗā¤Ēā¤ đā¤Ĩđđ§đŽ ā¤ā¤đ đąā¤đŗā¤đ ā¤ĸā¤đŖđ đĸđā¤ĒđŖđ ā¤ā¤đ đ¤ā¤đ ā¤ĸđĸā¤ đđĻđ¯',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
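The same embeddings also support semantic search over a corpus via the library's `util.semantic_search` helper; the corpus and query below are placeholder strings:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("T-Blue/tsdae_pro_MiniLM_L12_2")

corpus = ["first document", "second document", "third document"]  # placeholder corpus
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("example query", convert_to_tensor=True)

# Retrieve the top-2 most similar corpus entries for the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])
```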
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 64,000 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |      | sentence_0 | sentence_1 |
  |:-----|:-----------|:-----------|
  | type | string     | string     |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>đā¤¨đŖā¤¨ ā¤ĸđĸđĒđđĸđđĻđā¤¨đŗā¤ ā¤ĒđĻđā¤¨đ</code> | <code>ā¤ĒđĻđā¤¨đ ā¤Ēā¤ā¤Ŧā¤ ā¤Ŗā¤đā¤ đā¤¨đŖā¤¨ đŖā¤ ā¤ĸđĸđĒđđĸđđĻđā¤¨đŗā¤ đŖā¤ ā¤ĒđĻđā¤¨đ ā¤Ēā¤ā¤¤đĢđŖā¤Ŧā¤đ¯</code> |
  | <code>ā¤ ā¤¤đĸā¤ĸđĸā¤ŖđŖā¤Ŗđĸđ đŗā¤đŖā¤đĒđąā¤đĒ đŗā¤¨ ā¤ā¤đĒā¤ đ ā¤ā¤Ēđŗā¤ā¤Ŗđĸđ</code> | <code>ā¤ā¤ĸđŖđā¤đĸđā¤đ ā¤đĒ ā¤ ā¤Ŗā¤đąā¤đā¤¤đĸđ ā¤¤đĸā¤ĸđĸā¤ŖđŖā¤Ŗđĸđ đŗā¤đŖā¤đĒđąā¤đĒ đā¤đ ā¤đā¤đĻ đ ā¤đŗā¤¨ ā¤đ đ˛ā¤đđĸ đ¤ā¤ đŗā¤¨ đĸā¤Ŗā¤ ā¤ā¤đĒā¤ đ ā¤¨ā¤Ēā¤đđĻ ā¤ đ ā¤ā¤Ēđŗā¤ā¤Ŗđĸđ ā¤ā¤ĸđŖđā¤đđŗā¤¨đ¯</code> |
  | <code>đŖā¤ ā¤Ŧā¤¨đŖā¤¨đ đ ā¤đąā¤ đā¤đĒđĸđŖā¤¨đ đ ā¤¨đā¤ā¤˛ā¤˛ā¤¨ ā¤Ēā¤ đ¯</code> | <code>ā¤Ēā¤ ā¤ĸā¤ đŖā¤ ā¤Ŧā¤¨đŖā¤¨đ đ ā¤đąā¤ ā¤Ŧā¤ đā¤đĒđĸđŖā¤¨đ ā¤đā¤đĒā¤¤đĢđĸđŗā¤Ē đŖā¤ā¤ĸā¤đā¤ˇđŖā¤ā¤ĸā¤đ đŖā¤ đ ā¤¨đā¤ā¤˛ā¤˛ā¤¨ đ ā¤đŗā¤¨ ā¤ā¤˛ā¤ā¤ā¤ đŖā¤ ā¤ā¤¨đā¤Ŧđĸā¤Ŗā¤đĒ đ ā¤đā¤đĸđā¤ā¤Ēā¤ đā¤Ŗā¤đā¤¤đĸ ā¤Ēā¤ đā¤đ ā¤¨đŗ đ¯</code> |
* Loss: [DenoisingAutoEncoderLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)
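The training script itself is not included in this card. As a rough sketch, a TSDAE-style run with this loss over (noisy, original) pairs like the samples above could look as follows; the sentences shown are placeholders:
```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# (noisy, original) pairs, mirroring the sentence_0 / sentence_1 columns above
train_dataset = Dataset.from_dict({
    "sentence_0": ["a noisy, word-dropped sentence"],
    "sentence_1": ["the original full sentence it was derived from"],
})

# Learns to reconstruct the original sentence from the noisy encoding;
# decoder weights are tied to the encoder (the loss's default)
loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```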
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `multi_dataset_batch_sampler`: round_robin
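Under the v3 Trainer API, these hyperparameters correspond to `SentenceTransformerTrainingArguments`; a minimal sketch (the `output_dir` is illustrative) that could be passed to the trainer sketched above:
```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="tsdae_pro_MiniLM_L12_2",  # illustrative output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    # Cycle through datasets in turn when training on multiple datasets
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```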
#### All Hyperparameters