This is a very small distilled version of the bert-base-multilingual-cased model for Russian and English (45 MB, 12M parameters). There is also an updated version of this model, rubert-tiny2, with a larger vocabulary and better quality on practically all Russian NLU tasks.

This model is useful if you want to fine-tune it for a relatively simple Russian task (e.g. NER or sentiment classification) and you care more about speed and size than about accuracy. It is approximately 10x smaller and faster than a base-sized BERT. Its [CLS] embeddings can be used as a sentence representation aligned between Russian and English.
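
The fine-tuning use case above can be sketched with the generic `AutoModelForSequenceClassification` class from transformers. This is a minimal illustration, not the author's recipe: the two example sentences, their labels, the learning rate and the number of steps are hypothetical, and a real task needs a proper dataset, several epochs and a held-out evaluation.

```python
# pip install transformers sentencepiece torch
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "cointegrated/rubert-tiny"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# A new, randomly initialized classification head is placed on top of the encoder.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical toy data: 1 = positive, 0 = negative sentiment.
texts = ["Отличный фильм, всем советую!", "Ужасное обслуживание, больше не приду."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # hypothetical learning rate
losses = []
model.train()
for step in range(3):
    outputs = model(**batch, labels=labels)  # cross-entropy loss is computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(outputs.loss.item())

# Predict class indices for the same texts
model.eval()
with torch.no_grad():
    predictions = model(**batch).logits.argmax(dim=-1)
print(losses, predictions.tolist())
```

Because `from_pretrained` attaches an untrained head, transformers warns about newly initialized weights; that is expected before fine-tuning.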

It was trained on the Yandex Translate corpus, OPUS-100 and Tatoeba, using an MLM loss (distilled from bert-base-multilingual-cased), a translation ranking loss, and distillation of [CLS] embeddings from LaBSE, rubert-base-cased-sentence, LASER and USE.
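
A translation ranking loss is commonly implemented as an in-batch softmax over the similarity matrix of paired sentence embeddings, ranked in both directions; LaBSE, one of the teachers, was trained this way with an additive margin. The sketch below shows that generic formulation, plus one plausible way to distill [CLS] embeddings from a teacher whose dimension differs from the student's: a linear projection followed by MSE on normalized vectors. The names `translation_ranking_loss` and `ClsDistillationHead`, the scale of 20, the 0.3 margin and the projection-plus-MSE design are assumptions for illustration; the exact losses and hyperparameters used for rubert-tiny are not stated here.

```python
import torch
import torch.nn.functional as F

def translation_ranking_loss(src_emb, tgt_emb, scale=20.0, margin=0.0):
    """Hypothetical in-batch translation ranking loss.

    Row i of src_emb and row i of tgt_emb are translations of each other;
    every other row in the batch serves as a negative example.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    sim = src @ tgt.T  # (batch, batch) cosine similarities
    # Additive margin: penalize only the true pairs on the diagonal.
    sim = sim - margin * torch.eye(sim.size(0), device=sim.device)
    targets = torch.arange(sim.size(0), device=sim.device)
    # Average the ru->en and en->ru ranking losses.
    return (F.cross_entropy(scale * sim, targets) + F.cross_entropy(scale * sim.T, targets)) / 2


class ClsDistillationHead(torch.nn.Module):
    """Hypothetical head projecting the student [CLS] vector into a teacher's space."""

    def __init__(self, student_dim, teacher_dim):
        super().__init__()
        self.proj = torch.nn.Linear(student_dim, teacher_dim)

    def forward(self, student_cls, teacher_emb):
        pred = F.normalize(self.proj(student_cls), dim=-1)
        target = F.normalize(teacher_emb, dim=-1)
        return F.mse_loss(pred, target)


# Random tensors stand in for real student and teacher outputs.
torch.manual_seed(0)
student_ru = torch.randn(4, 312, requires_grad=True)  # 312 = rubert-tiny hidden size
student_en = torch.randn(4, 312, requires_grad=True)
teacher_ru = torch.randn(4, 768)  # 768 matches LaBSE's embedding size

rank_loss = translation_ranking_loss(student_ru, student_en, margin=0.3)
head = ClsDistillationHead(312, 768)
distill_loss = head(student_ru, teacher_ru)
total = rank_loss + distill_loss
total.backward()
print(rank_loss.item(), distill_loss.item())
```

A separate projection head per teacher lets teachers of different sizes (e.g. LASER, USE) share the same student vector.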

There is a more detailed description in Russian.

Sentence embeddings can be produced as follows:

```python
# pip install transformers sentencepiece
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("cointegrated/rubert-tiny")
model = AutoModel.from_pretrained("cointegrated/rubert-tiny")
# model.cuda()  # uncomment if you have a GPU

def embed_bert_cls(text, model, tokenizer):
    # Tokenize and move the input tensors to the model's device
    t = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        model_output = model(**{k: v.to(model.device) for k, v in t.items()})
    # Take the [CLS] vector (position 0) of the last layer and L2-normalize it
    embeddings = model_output.last_hidden_state[:, 0, :]
    embeddings = torch.nn.functional.normalize(embeddings)
    return embeddings[0].cpu().numpy()

print(embed_bert_cls('привет мир', model, tokenizer).shape)
# (312,)
```
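
Because the [CLS] embeddings are L2-normalized and aligned across Russian and English, a dot product between them is a cosine similarity, which can be used to match sentences to their translations. The helper `match_translations` below is a hypothetical sketch operating on plain arrays; in practice each row would be the output of `embed_bert_cls` for one sentence, stacked with `np.stack`. Toy 2-dimensional vectors are used here so the example runs without downloading the model.

```python
import numpy as np

def match_translations(ru_embs, en_embs):
    """Hypothetical helper: for each Russian row, find the most similar English row.

    Both inputs are (n, dim) arrays of L2-normalized embeddings, so the
    dot product equals cosine similarity.
    """
    sim = ru_embs @ en_embs.T  # (n_ru, n_en) cosine similarity matrix
    return sim.argmax(axis=1), sim.max(axis=1)

# Toy vectors standing in for real [CLS] embeddings
ru_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
en_embs = np.array([[0.0, 1.0], [1.0, 0.0]])
best, scores = match_translations(ru_embs, en_embs)
print(best.tolist(), scores.tolist())  # [1, 0] [1.0, 1.0]
```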