---
license: afl-3.0
language:
  - ja
metrics:
  - seqeval
library_name: transformers
pipeline_tag: token-classification
---

# SMM4H-2024 Task 2 Japanese NER

## Overview

This is a named entity recognition model created by fine-tuning `daisaku-s/medtxt_ner_roberta` on the SMM4H-2024 Task 2a corpus.

Tag set (IOB2 format; see the illustration below):

- `DRUG`
- `DISORDER`
- `FUNCTION`
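
In IOB2, the first token of an entity is tagged `B-<TYPE>`, continuation tokens `I-<TYPE>`, and every token outside an entity `O`. A hypothetical illustration (the sentence, tokenization, and labels below are invented for this card, not drawn from the corpus):

```python
# Hypothetical example: "頭痛にロキソニンを飲んだ" ("I took Loxonin for a headache")
tokens = ["頭痛", "に", "ロキソ", "ニン", "を", "飲ん", "だ"]
tags   = ["B-DISORDER", "O", "B-DRUG", "I-DRUG", "O", "O", "O"]
```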

## Usage

```python
import torch
from transformers import AutoTokenizer, BertForTokenClassification

model_name = "yseop/SMM4H2024_Task2a_ja"
text = "サンプルテキスト"  # "sample text"

# Load the fine-tuned model, its tokenizer, and the id-to-tag mapping
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name).eval()
idx2tag = model.config.id2label

with torch.inference_mode():
    vecs = tokenizer(text, truncation=True, return_tensors="pt")
    ner_logits = model(input_ids=vecs["input_ids"],
                       attention_mask=vecs["attention_mask"])

# Pick the highest-scoring label id per token and map it to its IOB2 tag,
# dropping the special tokens added at both ends of the sequence
idx = ner_logits.logits.argmax(dim=2)[0].tolist()
token = tokenizer.convert_ids_to_tokens(vecs["input_ids"][0].tolist())[1:-1]
pred_tag = [idx2tag[i] for i in idx][1:-1]
```
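
Because the card sets `pipeline_tag: token-classification`, the high-level `pipeline` API should also work; a minimal sketch, where `aggregation_strategy="simple"` is our choice for merging B-/I- word pieces into entity spans:

```python
from transformers import pipeline

ner = pipeline("token-classification",
               model="yseop/SMM4H2024_Task2a_ja",
               aggregation_strategy="simple")  # group B-/I- pieces into spans
print(ner("サンプルテキスト"))
```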

## Results

| Entity   | TP  | FP  | FN  | Precision | Recall | F1     |
|----------|-----|-----|-----|-----------|--------|--------|
| DISORDER | 588 | 409 | 330 | 0.5898    | 0.6405 | 0.6141 |
| DRUG     | 307 | 143 | 169 | 0.6822    | 0.6450 | 0.6631 |
| FUNCTION | 69  | 160 | 170 | 0.3013    | 0.2887 | 0.2949 |
| all      | 964 | 712 | 669 | 0.5752    | 0.5903 | 0.5827 |
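
The scores above are entity-level metrics as computed by `seqeval` (the metric declared in the card's metadata): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A minimal sketch of reproducing such scores, using made-up gold and predicted tag sequences:

```python
from seqeval.metrics import classification_report

# Hypothetical gold and predicted IOB2 sequences, one inner list per sentence
y_true = [["B-DRUG", "I-DRUG", "O", "B-DISORDER", "O"]]
y_pred = [["B-DRUG", "I-DRUG", "O", "O", "O"]]
print(classification_report(y_true, y_pred, digits=4))
```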