This is a distilroberta-base model fine-tuned to classify text into 3 categories:

  • Rare Diseases
  • Non-Rare Diseases
  • Other

The details of how this model was built and evaluated are provided in the article:

Rei L, Pita Costa J, Zdolšek Draksler T. Automatic Classification and Visualization of Text Data on Rare Diseases. Journal of Personalized Medicine. 2024; 14(5):545. https://doi.org/10.3390/jpm14050545

@Article{jpm14050545,
AUTHOR = {Rei, Luis and Pita Costa, Joao and Zdolšek Draksler, Tanja},
TITLE = {Automatic Classification and Visualization of Text Data on Rare Diseases},
JOURNAL = {Journal of Personalized Medicine},
VOLUME = {14},
YEAR = {2024},
NUMBER = {5},
ARTICLE-NUMBER = {545},
URL = {https://www.mdpi.com/2075-4426/14/5/545},
PubMedID = {38793127},
ISSN = {2075-4426},
DOI = {10.3390/jpm14050545}
}

Note that the article fine-tunes the larger roberta-base model; this model is smaller. It is shared for demonstration and validation purposes, and its hyper-parameters were not tuned.

Using this model

The simplest way to use this model is through a Hugging Face transformers pipeline:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="lrei/rad-small")

# Classify a batch of texts; the result holds one {"label": ..., "score": ...} dict per input
results = pipe([
    "The patient suffers from a complex genetic disorder.",
    "The patient suffers from a common genetic disorder.",
])
print(results)
```
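For a single-label classifier with three outputs like this one, the text-classification pipeline turns the model's raw scores (logits) into probabilities with a softmax and reports the most probable label. The sketch below reproduces that post-processing step in plain Python. The logit values and the `ID2LABEL` mapping are hypothetical placeholders; the real label strings are defined in the model's `config.json`.

```python
import math

# Hypothetical label mapping; the actual one is the model config's id2label.
ID2LABEL = {0: "Rare Diseases", 1: "Non-Rare Diseases", 2: "Other"}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, id2label=ID2LABEL):
    """Return (label, probability) for the highest-probability class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical logits for one input text.
label, score = classify([2.1, 0.3, -1.0])
print(label, round(score, 3))
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged but avoids overflow for large scores.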

Dataset

The dataset used to train this model is available on Zenodo. It is a subset of PubMed abstracts, each sorted into one of the 3 classes on the basis of its MeSH terms.
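To make the idea of sorting abstracts by MeSH terms concrete, here is an illustrative sketch. It is not the rule used to build the dataset (the article describes the actual criteria); it simply assumes that an abstract tagged with the "Rare Diseases" descriptor is labelled as a rare disease, that one tagged with any term from a hypothetical disease vocabulary is labelled as a non-rare disease, and that everything else falls into "Other".

```python
# Illustrative rule only; the real selection criteria are described in the article.
RARE_DISEASE_TERMS = {"Rare Diseases"}  # assumed marker descriptor


def assign_class(mesh_terms, disease_terms):
    """Map one abstract's MeSH descriptors to one of the three classes.

    mesh_terms: descriptors attached to the abstract.
    disease_terms: hypothetical set of descriptors treated as diseases.
    """
    terms = set(mesh_terms)
    if terms & RARE_DISEASE_TERMS:
        return "Rare Diseases"
    if terms & set(disease_terms):
        return "Non-Rare Diseases"
    return "Other"


# Hypothetical disease vocabulary and example descriptor lists.
DISEASES = {"Diabetes Mellitus, Type 2", "Cystic Fibrosis", "Hypertension"}
print(assign_class(["Cystic Fibrosis", "Rare Diseases", "Humans"], DISEASES))
print(assign_class(["Hypertension", "Humans"], DISEASES))
print(assign_class(["Humans", "Software"], DISEASES))
```

The rare-disease check runs first, so an abstract carrying both a general disease term and the rare-disease marker ends up in "Rare Diseases".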

Like the model, the dataset is provided for demonstration and methodology validation purposes. The original PubMed data was randomly under-sampled.
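Random under-sampling discards examples at random instead of adding new ones. The card does not say what target class sizes were used, so the sketch below assumes the goal is equal class sizes and reduces every class to the size of the smallest one. The `undersample` helper and its `label_of` parameter are hypothetical names chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample(examples, label_of, seed=0):
    """Randomly drop examples so each class keeps as many items as the smallest class."""
    by_class = defaultdict(list)
    for ex in examples:
        by_class[label_of(ex)].append(ex)
    n = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    balanced = []
    for items in by_class.values():
        balanced.extend(rng.sample(items, n))  # sample without replacement
    rng.shuffle(balanced)
    return balanced


# Toy (text, label) pairs with imbalanced classes.
data = (
    [(f"abstract {i}", "Other") for i in range(10)]
    + [(f"abstract {i}", "Non-Rare Diseases") for i in range(10, 15)]
    + [(f"abstract {i}", "Rare Diseases") for i in range(15, 18)]
)
balanced = undersample(data, label_of=lambda ex: ex[1])
counts = Counter(label for _, label in balanced)
print(counts)
```

Using a seeded `random.Random` instance rather than the module-level generator keeps the selected subset reproducible without affecting other code that uses `random`.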

Code

The code used to create this model is available on GitHub.

Test Results

Averaged over all 3 classes:

| Average | Precision | Recall | F1   |
|---------|-----------|--------|------|
| micro   | 0.84      | 0.84   | 0.84 |
| macro   | 0.84      | 0.84   | 0.84 |
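The two averages are computed differently. Micro-averaging pools true positives, false positives and false negatives over all classes before computing each metric; macro-averaging computes each metric per class and takes the unweighted mean. In single-label, multi-class classification every wrong prediction counts as one false positive (for the predicted class) and one false negative (for the true class), so micro precision, recall and F1 all equal accuracy, which is consistent with the identical values in the micro row. The plain-Python sketch below illustrates both averages on made-up predictions; it is not the evaluation code used for this model.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(y_true, y_pred):
    """Return ((P, R, F1) micro-averaged, (P, R, F1) macro-averaged)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    counts = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all classes, then compute the metrics once.
    totals = [sum(cnt[i] for cnt in counts) for i in range(3)]
    micro = prf(*totals)
    # Macro: compute the metrics per class, then take the unweighted mean.
    per_class = [prf(*cnt) for cnt in counts]
    macro = tuple(sum(m[i] for m in per_class) / len(per_class) for i in range(3))
    return micro, macro


# Made-up gold labels and predictions (not from the real test set).
y_true = ["Rare", "Rare", "Non-Rare", "Other", "Other", "Other"]
y_pred = ["Rare", "Other", "Non-Rare", "Other", "Rare", "Other"]
micro, macro = micro_macro(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("micro", micro, "macro", macro, "accuracy", accuracy)
```

Macro and micro values coincide only when per-class performance is fairly uniform, as the reported test results suggest for this model.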
Model size: 82.1M parameters (Safetensors weights, F32 tensor type).