This is a ruBERT-conversational model trained on a mixture of three paraphrase detection datasets:
- ru_paraphraser (with classes -1 and 0 merged)
- RuPAWS
- a dataset of crowdsourced evaluations of content preservation in Russian text detoxification (Dementieva et al., 2022)
The model can be used to assess semantic similarity of Russian sentences.
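A minimal usage sketch with Hugging Face `transformers` is shown below. `MODEL_ID` is a placeholder for this model's hub ID, and the assumption that class index 1 corresponds to "paraphrase" should be checked against the model's `id2label` config:

```python
# Hedged sketch: scoring a Russian sentence pair with this cross-encoder.
# MODEL_ID is a placeholder (assumption) -- replace with the actual hub ID.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "path/to/this-model"  # placeholder (assumption)

def paraphrase_probability(model, tokenizer, text1, text2):
    """Return P(paraphrase) for a sentence pair (class index 1 assumed)."""
    inputs = tokenizer(text1, text2, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    score = paraphrase_probability(
        model, tokenizer, "Привет, как дела?", "Здравствуй, как ты?"
    )
    print(score)
```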
Training notebook: `task_oriented_TST/similarity/cross_encoders/russian/train_russian_paraphrase_detector__fixed.ipynb` (in a private repo).
Training parameters:
- optimizer: Adam
- lr: 1e-5
- batch_size: 32
- epochs: 3
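The listed hyperparameters correspond to a standard fine-tuning loop. A minimal sketch follows; the model and dataset here are generic stand-ins, not the actual code from the training notebook:

```python
# Hedged sketch of a fine-tuning loop using the hyperparameters stated
# in this card (Adam, lr=1e-5, batch_size=32, 3 epochs).
import torch
from torch.utils.data import DataLoader

LR, BATCH_SIZE, EPOCHS = 1e-5, 32, 3

def finetune(model, dataset):
    """Generic classification fine-tuning loop (illustrative only)."""
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=LR)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(EPOCHS):
        for features, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()
            optimizer.step()
    return model
```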
ROC AUC on the development data:

| source      | score    |
|-------------|----------|
| detox       | 0.821665 |
| paraphraser | 0.848287 |
| rupaws_qqp  | 0.761481 |
| rupaws_wiki | 0.844093 |
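Per-source ROC AUC scores like those above can be computed with `sklearn.metrics.roc_auc_score`, grouping predictions by data source. The labels and scores below are illustrative, not the actual dev data:

```python
# Hedged sketch: ROC AUC computed separately per data source.
from sklearn.metrics import roc_auc_score

def per_source_auc(labels, scores, sources):
    """Return {source: ROC AUC} over the subset belonging to each source."""
    out = {}
    for src in set(sources):
        idx = [i for i, s in enumerate(sources) if s == src]
        out[src] = roc_auc_score(
            [labels[i] for i in idx], [scores[i] for i in idx]
        )
    return out
```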
Please also see the documentation of SkolkovoInstitute/ruRoberta-large-paraphrase-v1, which performs better on this task.