---
license: afl-3.0
language:
- ja
metrics:
- seqeval
library_name: transformers
pipeline_tag: token-classification
---

# SMM4H-2024 Task 2 Japanese NER

## Overview

This is a named entity recognition model created by fine-tuning [daisaku-s/medtxt_ner_roberta](https://huggingface.co./daisaku-s/medtxt_ner_roberta) on the [SMM4H 2024 Task 2a](https://healthlanguageprocessing.org/smm4h-2024/) corpus.

Tag set (IOB2 format):

* DRUG
* DISORDER
* FUNCTION

## Usage

The snippet below loads the model, tokenizes a Japanese input, and maps the per-token predictions back to IOB2 tags. A sketch for merging those tags into entity spans is given at the end of this card.

```python
from transformers import BertForTokenClassification, AutoTokenizer
import torch

text = "サンプルテキスト"  # Japanese for "sample text"
model_name = "yseop/SMM4H2024_Task2a_ja"

model = BertForTokenClassification.from_pretrained(model_name).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)
idx2tag = model.config.id2label

vecs = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

with torch.inference_mode():
    ner_logits = model(input_ids=vecs["input_ids"], attention_mask=vecs["attention_mask"])
    # Highest-scoring label id for each token in the (single) sequence.
    idx = torch.argmax(ner_logits.logits, dim=2)[0].tolist()

# Drop the [CLS] and [SEP] special tokens from both tokens and tags.
tokens = tokenizer.convert_ids_to_tokens(vecs["input_ids"][0])[1:-1]
pred_tags = [idx2tag[i] for i in idx][1:-1]
```

## Results

| NE | tp | fp | fn | precision | recall | f1 |
|---|---:|---:|---:|---:|---:|---:|
| DISORDER | 588 | 409 | 330 | 0.5898 | 0.6405 | 0.6141 |
| DRUG | 307 | 143 | 169 | 0.6822 | 0.6450 | 0.6631 |
| FUNCTION | 69 | 160 | 170 | 0.3013 | 0.2887 | 0.2949 |
| all | 964 | 712 | 669 | 0.5752 | 0.5903 | 0.5827 |
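The scores above are entity-level precision, recall, and F1. For reference, this is a minimal sketch of how such scores can be computed with [seqeval](https://github.com/chakki-works/seqeval) on IOB2 tag sequences; the gold and predicted sequences here are illustrative placeholders, not the actual task data.

```python
from seqeval.metrics import classification_report

# Placeholder sequences; the reported numbers come from the gold and
# predicted IOB2 tags over the SMM4H 2024 Task 2a evaluation data.
y_true = [["B-DRUG", "I-DRUG", "O", "B-DISORDER", "I-DISORDER"]]
y_pred = [["B-DRUG", "I-DRUG", "O", "B-DISORDER", "O"]]

print(classification_report(y_true, y_pred, digits=4))
```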
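Since the model emits one IOB2 tag per subword token, downstream use typically needs contiguous entity spans. Below is a minimal sketch of IOB2 decoding, assuming the `tokens` and `pred_tags` variables from the Usage snippet above; the helper name `decode_spans` is ours, and the `##` stripping assumes WordPiece-style continuation markers, which may not apply to this tokenizer.

```python
def decode_spans(tokens, tags):
    """Merge per-token IOB2 tags into (label, text) entity spans."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        tok = tok.lstrip("#")  # strip WordPiece continuation markers, if any
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = [tag[2:], tok]
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1] += tok  # Japanese text: concatenate without spaces
        else:
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [tuple(s) for s in spans]

print(decode_spans(tokens, pred_tags))  # e.g. [("DRUG", "アスピリン"), ...]
```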