---
license: afl-3.0
language:
  - ja
metrics:
  - seqeval
library_name: transformers
pipeline_tag: token-classification
---

# SMM4H-2024 Task 2 Japanese NER

## Overview

This is a named entity recognition model created by fine-tuning `daisaku-s/medtxt_ner_roberta` on the SMM4H-2024 Task 2a corpus.

Tag set (IOB2 format; see the illustration below):

- `DRUG`
- `DISORDER`
- `FUNCTION`
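
In IOB2, the first token of an entity is tagged `B-<TYPE>`, continuation tokens `I-<TYPE>`, and every token outside an entity `O`. A hypothetical illustration (the sentence, tokenization, and labels below are invented for this card, not drawn from the corpus):

```python
# Hypothetical example: "頭痛にロキソニンを飲んだ" ("I took Loxonin for a headache")
tokens = ["頭痛", "に", "ロキソ", "ニン", "を", "飲ん", "だ"]
tags   = ["B-DISORDER", "O", "B-DRUG", "I-DRUG", "O", "O", "O"]
```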

## Usage

```python
import torch
from transformers import AutoTokenizer, BertForTokenClassification

model_name = "yseop/SMM4H2024_Task2a_ja"
text = "サンプルテキスト"  # "sample text"

# Load the fine-tuned model, its tokenizer, and the id-to-tag mapping
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name).eval()
idx2tag = model.config.id2label

with torch.inference_mode():
    vecs = tokenizer(text, truncation=True, return_tensors="pt")
    ner_logits = model(input_ids=vecs["input_ids"],
                       attention_mask=vecs["attention_mask"])

# Pick the highest-scoring label id per token and map it to its IOB2 tag,
# dropping the special tokens added at both ends of the sequence
idx = ner_logits.logits.argmax(dim=2)[0].tolist()
token = tokenizer.convert_ids_to_tokens(vecs["input_ids"][0].tolist())[1:-1]
pred_tag = [idx2tag[i] for i in idx][1:-1]
```
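
Because the card sets `pipeline_tag: token-classification`, the high-level `pipeline` API should also work; a minimal sketch, where `aggregation_strategy="simple"` is our choice for merging B-/I- word pieces into entity spans:

```python
from transformers import pipeline

ner = pipeline("token-classification",
               model="yseop/SMM4H2024_Task2a_ja",
               aggregation_strategy="simple")  # group B-/I- pieces into spans
print(ner("サンプルテキスト"))
```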

## Results

| Entity   | TP  | FP  | FN  | Precision | Recall | F1     |
|----------|-----|-----|-----|-----------|--------|--------|
| DISORDER | 588 | 409 | 330 | 0.5898    | 0.6405 | 0.6141 |
| DRUG     | 307 | 143 | 169 | 0.6822    | 0.6450 | 0.6631 |
| FUNCTION | 69  | 160 | 170 | 0.3013    | 0.2887 | 0.2949 |
| all      | 964 | 712 | 669 | 0.5752    | 0.5903 | 0.5827 |
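
The scores above are entity-level metrics as computed by `seqeval` (the metric declared in the card's metadata): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A minimal sketch of reproducing such scores, using made-up gold and predicted tag sequences:

```python
from seqeval.metrics import classification_report

# Hypothetical gold and predicted IOB2 sequences, one inner list per sentence
y_true = [["B-DRUG", "I-DRUG", "O", "B-DISORDER", "O"]]
y_pred = [["B-DRUG", "I-DRUG", "O", "O", "O"]]
print(classification_report(y_true, y_pred, digits=4))
```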