--- license: mit language: - en tags: - medical - radiology model-index: - name: rate-ner-rad results: [] pipeline_tag: token-classification widget: - text: No suspicious focal mass lesion is seen in the left kidney. example_title: Example in radiopaedia --- # RaTE-NER-Deberta This model is a fine-tuned version of [DeBERTa](https://huggingface.co./microsoft/deberta-v3-base) on the [RaTE-NER](https://huggingface.co./datasets/Angelakeke/RaTE-NER/) dataset. ## Model description This model is trained to serve the RaTEScore metric, if you are interested in our pipeline, please refer to our [paper](https://angelakeke.github.io/RaTEScore/) and [Github](https://github.com/Angelakeke/RaTEScore). This model also can be used to extract **Abnormality, Non-Abnormality, Anatomy, Disease, Non-Disease** in medical radiology reports. ## Usage
Click to expand the usage of this model.

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
def post_process(tokenized_text, predicted_entities, tokenizer):
    entity_spans = []
    start = end = None
    entity_type = None
    for i, (token, label) in enumerate(zip(tokenized_text, predicted_entities[:len(tokenized_text)])):
        if token in ["[CLS]", "[SEP]"]:
            continue
        if label != "O" and i < len(predicted_entities) - 1:
            if label.startswith("B-") and predicted_entities[i+1].startswith("I-"):
                start = i
                entity_type = label[2:]
            elif label.startswith("B-") and predicted_entities[i+1].startswith("B-"):
                start = i
                end = i
                entity_spans.append((start, end, label[2:]))
                start = i
                entity_type = label[2:]
            elif label.startswith("B-") and predicted_entities[i+1].startswith("O"):
                start = i
                end = i
                entity_spans.append((start, end, label[2:]))
                start = end = None
                entity_type = None
            elif label.startswith("I-") and predicted_entities[i+1].startswith("B-"):
                end = i
                if start is not None:
                    entity_spans.append((start, end, entity_type))
                start = i
                entity_type = label[2:]
            elif label.startswith("I-") and predicted_entities[i+1].startswith("O"):
                end = i
                if start is not None:
                    entity_spans.append((start, end, entity_type))
                start = end = None
                entity_type = None
    if start is not None and end is None:
        end = len(tokenized_text) - 2
        entity_spans.append((start, end, entity_type))
    save_pair = []
    for start, end, entity_type in entity_spans:
        entity_str = tokenizer.convert_tokens_to_string(tokenized_text[start:end+1])
        save_pair.append((entity_str, entity_type))
    return save_pair

def run_ner(texts, idx2label, tokenizer, model, device):
    inputs = tokenizer(texts, 
                    max_length=512,
                    padding=True, 
                    truncation=True, 
                    return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_labels = torch.argmax(outputs.logits, dim=2).tolist()
    save_pairs = []
    for i in range(len(texts)):
        predicted_entities = [idx2label[label] for label in predicted_labels[i]]
        non_pad_mask = inputs["input_ids"][i] != tokenizer.pad_token_id
        non_pad_length = non_pad_mask.sum().item()
        non_pad_input_ids = inputs["input_ids"][i][:non_pad_length]
        tokenized_text = tokenizer.convert_ids_to_tokens(non_pad_input_ids)
        save_pair = post_process(tokenized_text, predicted_entities, tokenizer)
        if i == 0:
            save_pairs = save_pair
        else:
            save_pairs.extend(save_pair)
    return save_pairs

ner_labels = ['B-ABNORMALITY', 'I-ABNORMALITY', 
              'B-NON-ABNORMALITY', 'I-NON-ABNORMALITY', 
              'B-DISEASE', 'I-DISEASE', 
              'B-NON-DISEASE', 'I-NON-DISEASE', 
              'B-ANATOMY', 'I-ANATOMY', 
              'O']
idx2label = {i: label for i, label in enumerate(ner_labels)}

tokenizer = AutoTokenizer.from_pretrained('Angelakeke/RaTE-NER-Deberta')
model = AutoModelForTokenClassification.from_pretrained('Angelakeke/RaTE-NER-Deberta')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

texts = ['Status median sternotomy.']
save_pair = run_ner(texts, idx2label, tokenizer, model, device)

 
## Author Author: [Weike Zhao](https://angelakeke.github.io/) If you have any questions, please feel free to contact zwk0629@sjtu.edu.cn. ## Citation ```bibtex @article{zhao2024ratescore, title={RaTEScore: A Metric for Radiology Report Generation}, author={Zhao, Weike and Wu, Chaoyi and Zhang, Xiaoman and Zhang, Ya and Wang, Yanfeng and Xie, Weidi}, journal={arXiv preprint arXiv:2406.16845}, year={2024} } ```