---
license: apache-2.0
language:
- ru
- en
library_name: transformers
pipeline_tag: feature-extraction
---
# BERT-base
Pretrained bidirectional encoder for the Russian language.
The model was trained using the standard MLM objective on large text corpora, including open social data.
See the [Training Details](#training-details) section for more information.
⚠️ This model contains only the encoder part without any pretrained head.
- Developed by: deepvk
- Model type: BERT
- Languages: Mostly Russian, with a small fraction of other languages
- License: Apache 2.0
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/bert-base-uncased")
model = AutoModel.from_pretrained("deepvk/bert-base-uncased")

text = "Привет, мир!"  # "Hello, world!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # BaseModelOutput; last_hidden_state has shape (1, seq_len, 768)
```
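Because the checkpoint ships without a pretrained head (pipeline tag `feature-extraction`), a common next step is to pool the token embeddings into a single sentence vector. Below is a minimal sketch assuming mean pooling over non-padding tokens; the pooling strategy is our choice for illustration, not something prescribed by this card:

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

with torch.no_grad():
    outputs = model(**inputs)
embedding = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])
print(embedding.shape)  # torch.Size([1, 768])
```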
## Training Details
The model was trained with NVIDIA's BERT pretraining source code; see its pretraining documentation for details.
### Training Data
250 GB of filtered texts in total: a mix of Wikipedia, Books, and a Social corpus.
### Architecture details
| Argument | Value |
|---|---|
| Encoder layers | 12 |
| Encoder attention heads | 12 |
| Encoder embed dim | 768 |
| Encoder ffn embed dim | 3,072 |
| Activation function | GeLU |
| Attention dropout | 0.1 |
| Dropout | 0.1 |
| Max positions | 512 |
| Vocab size | 36,000 |
| Tokenizer type | BertTokenizer |
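For reference, the table above maps directly onto a `transformers` `BertConfig`. The sketch below is a hypothetical reconstruction using the standard Hugging Face field names; the checkpoint's own `config.json` remains the source of truth:

```python
from transformers import BertConfig

# Illustrative only; load the actual configuration with
# BertConfig.from_pretrained("deepvk/bert-base-uncased") instead.
config = BertConfig(
    vocab_size=36_000,
    hidden_size=768,                   # encoder embed dim
    num_hidden_layers=12,              # encoder layers
    num_attention_heads=12,            # encoder attention heads
    intermediate_size=3072,            # encoder ffn embed dim
    hidden_act="gelu",                 # activation function
    hidden_dropout_prob=0.1,           # dropout
    attention_probs_dropout_prob=0.1,  # attention dropout
    max_position_embeddings=512,       # max positions
)
```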
## Evaluation
We evaluated the model on the Russian SuperGLUE dev set. The best result in each task is marked in bold. All models are of the same size, except for the distilled version of DeBERTa.
| Model | RCB | PARus | MuSeRC | TERRa | RUSSE | RWSD | DaNetQA | Score |
|---|---|---|---|---|---|---|---|---|
| vk-deberta-distill | 0.433 | 0.56 | 0.625 | 0.59 | 0.943 | 0.569 | 0.726 | 0.635 |
| vk-roberta-base | 0.46 | 0.56 | 0.679 | **0.769** | 0.960 | 0.569 | 0.658 | 0.665 |
| vk-deberta-base | 0.450 | **0.61** | **0.722** | 0.704 | 0.948 | 0.578 | **0.76** | **0.682** |
| vk-bert-base | 0.467 | 0.57 | 0.587 | 0.704 | 0.953 | **0.583** | 0.737 | 0.657 |
| sber-bert-base | **0.491** | **0.61** | 0.663 | **0.769** | **0.962** | 0.574 | 0.678 | 0.678 |
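Since this checkpoint is head-less, results like those above require attaching and fine-tuning a task-specific head. A minimal sketch, assuming a sentence-pair classification setup in the spirit of TERRa (the dataset loading and training loop are omitted, and the example inputs are purely illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("deepvk/bert-base-uncased")
# Adds a randomly initialized classification head on top of the encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "deepvk/bert-base-uncased", num_labels=2
)

# Illustrative premise/hypothesis pair; real training would iterate over a dataset.
inputs = tokenizer(
    "Кошка спит на диване.",  # premise: "The cat is sleeping on the couch."
    "Кошка бодрствует.",      # hypothesis: "The cat is awake."
    return_tensors="pt",
)
logits = model(**inputs).logits  # shape (1, 2); meaningful only after fine-tuning
```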