metadata

language:
  - ru
license: apache-2.0

Model DmitryPogrebnoy/MedDistilBertBaseRuCased

Model Description

This model is a fine-tuned version of DmitryPogrebnoy/distilbert-base-russian-cased. The code for the fine-tuning process can be found here. The model was fine-tuned on a specially collected dataset of over 30,000 medical anamneses in Russian. The collected dataset can be found here.

This model was created as part of a master's project to develop a method for correcting typos in medical histories, using BERT models to rank correction candidates. The project is open source and can be found here.

How to Get Started With the Model

You can use the model directly with a pipeline for masked language modeling:

>>> from transformers import pipeline
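>>> # Use a separate name so the imported `pipeline` function is not shadowed.
>>> fill_mask = pipeline('fill-mask', model='DmitryPogrebnoy/MedDistilBertBaseRuCased')
>>> # Input sentence: "The patient [MASK] pain in the sternum."
>>> fill_mask("У пациента [MASK] боль в грудине.")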
[{'score': 0.1733243614435196,
  'token': 6880,
  'token_str': 'имеется',
  'sequence': 'У пациента имеется боль в грудине.'},
 {'score': 0.08818087726831436,
  'token': 1433,
  'token_str': 'есть',
  'sequence': 'У пациента есть боль в грудине.'},
 {'score': 0.03620537742972374,
  'token': 3793,
  'token_str': 'особенно',
  'sequence': 'У пациента особенно боль в грудине.'},
 {'score': 0.03438418731093407,
  'token': 5168,
  'token_str': 'бол',
  'sequence': 'У пациента бол боль в грудине.'},
 {'score': 0.032936397939920425,
  'token': 6281,
  'token_str': 'протекает',
  'sequence': 'У пациента протекает боль в грудине.'}]

Alternatively, load the model and tokenizer directly if you need access to the raw outputs or custom processing:

>>> from transformers import AutoTokenizer, AutoModelForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("DmitryPogrebnoy/MedDistilBertBaseRuCased")
>>> model = AutoModelForMaskedLM.from_pretrained("DmitryPogrebnoy/MedDistilBertBaseRuCased")
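
The candidate-ranking use case from the model description can be approximated with the `fill-mask` pipeline. The sketch below is not the project's actual implementation. The helper name `rank_candidates` is hypothetical, and it assumes the sentence contains exactly one `[MASK]` token and that `fill_mask` is a `fill-mask` pipeline object (or any callable with the same interface). It relies on the pipeline's `targets` and `top_k` arguments to score only the supplied words. Candidates missing from the vocabulary are scored by their first sub-token, so the returned strings may differ from the input for such words.

```python
from typing import Callable, List, Tuple


def rank_candidates(
    fill_mask: Callable[..., list],
    masked_sentence: str,
    candidates: List[str],
) -> List[Tuple[str, float]]:
    """Rank candidate words for the single [MASK] position, best first.

    Hypothetical helper: `fill_mask` is expected to behave like a
    transformers fill-mask pipeline.
    """
    # `targets` restricts scoring to the given words; `top_k` keeps every
    # candidate instead of the pipeline's default of five results.
    results = fill_mask(masked_sentence, targets=candidates, top_k=len(candidates))
    scored = [(r["token_str"], r["score"]) for r in results]
    # Sort explicitly so the result does not depend on the callable's ordering.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with the real model (downloads weights from the Hugging Face Hub):
# from transformers import pipeline
# fill_mask = pipeline("fill-mask", model="DmitryPogrebnoy/MedDistilBertBaseRuCased")
# rank_candidates(fill_mask, "У пациента [MASK] боль в грудине.", ["есть", "имеется"])
```

In a typo-correction setting, the misspelled word would be replaced by `[MASK]` and the candidates would come from a separate generator such as an edit-distance lookup, which is outside the scope of this sketch.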