---
license: afl-3.0
language:
- ja
metrics:
- seqeval
library_name: transformers
pipeline_tag: token-classification
---
# SMM4H-2024 Task 2 Japanese NER
## Overview
This is a named entity recognition (NER) model created by fine-tuning [daisaku-s/medtxt_ner_roberta](https://huggingface.co./daisaku-s/medtxt_ner_roberta) on the [SMM4H 2024 Task 2a](https://healthlanguageprocessing.org/smm4h-2024/) corpus.
Tag set (IOB2 format):
* DRUG
* DISORDER
* FUNCTION
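
Under IOB2, each type above expands into a begin/inside tag pair, plus `O` for tokens outside any entity. The label set below is a sketch implied by the three types; the ordering is illustrative, and `model.config.id2label` holds the model's actual mapping:

```python
# IOB2 label set implied by the three entity types (ordering illustrative;
# read model.config.id2label for the model's actual id-to-label mapping).
labels = ["O",
          "B-DRUG", "I-DRUG",
          "B-DISORDER", "I-DISORDER",
          "B-FUNCTION", "I-FUNCTION"]
```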
## Usage
```python
from transformers import BertForTokenClassification, AutoTokenizer
import torch

model_name = "yseop/SMM4H2024_Task2a_ja"
text = "サンプルテキスト"  # "sample text"; replace with your own Japanese input

model = BertForTokenClassification.from_pretrained(model_name).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)
idx2tag = model.config.id2label

vecs = tokenizer(text,
                 padding=True,
                 truncation=True,
                 return_tensors="pt")

with torch.inference_mode():
    ner_logits = model(input_ids=vecs["input_ids"],
                       attention_mask=vecs["attention_mask"])

# Highest-scoring label id for each token ([0] selects the single batch item).
idx = torch.argmax(ner_logits.logits, dim=2)[0].tolist()
# Drop the special [CLS]/[SEP] tokens at the start and end.
tokens = tokenizer.convert_ids_to_tokens(vecs["input_ids"][0])[1:-1]
pred_tags = [idx2tag[i] for i in idx][1:-1]
```
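
To recover entity spans from the per-token output, walk the IOB2 sequence and merge each `B-` tag with the `I-` tags that follow it. Below is a minimal sketch building on the `tokens` and `pred_tags` variables above (subword handling is simplified to stripping WordPiece `##` markers; stray `I-` tags without a preceding `B-` are ignored):

```python
def extract_entities(tokens, tags):
    """Merge IOB2-tagged tokens into (entity_type, surface_text) spans."""
    entities, ent_type, ent_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        inside = tag.startswith("I-") and ent_type == tag[2:]
        if not inside and ent_type is not None:
            # Current entity ends here; flush it.
            entities.append((ent_type, "".join(ent_tokens)))
            ent_type, ent_tokens = None, []
        if tag.startswith("B-"):
            ent_type, ent_tokens = tag[2:], []
        if ent_type is not None:
            ent_tokens.append(token.replace("##", ""))
    if ent_type is not None:
        entities.append((ent_type, "".join(ent_tokens)))
    return entities

print(extract_entities(tokens, pred_tags))  # e.g. [('DRUG', '...'), ...]
```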
## Results
Entity-level counts and scores, computed with seqeval:

|Entity|TP|FP|FN|Precision|Recall|F1|
|---|---:|---:|---:|---:|---:|---:|
|DISORDER|588|409|330|0.5898|0.6405|0.6141|
|DRUG|307|143|169|0.6822|0.6450|0.6631|
|FUNCTION|69|160|170|0.3013|0.2887|0.2949|
|All (micro)|964|712|669|0.5752|0.5903|0.5827|
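
For reference, a report in this format can be produced with seqeval; the tag sequences below are placeholders, not the actual evaluation data:

```python
from seqeval.metrics import classification_report

# One IOB2 tag sequence per sentence; gold vs. predicted (illustrative only).
y_true = [["B-DRUG", "I-DRUG", "O", "B-DISORDER", "O"]]
y_pred = [["B-DRUG", "I-DRUG", "O", "O", "O"]]

# Entity-level precision/recall/F1 per type, plus micro/macro averages.
print(classification_report(y_true, y_pred, digits=4))
```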