---
license: afl-3.0
language:
- ja
metrics:
- seqeval
library_name: transformers
pipeline_tag: token-classification
---

# SMM4H-2024 Task 2 Japanese NER

## Overview

This is a named entity recognition (NER) model created by fine-tuning [daisaku-s/medtxt_ner_roberta](https://huggingface.co./daisaku-s/medtxt_ner_roberta) on the [SMM4H 2024 Task 2a](https://healthlanguageprocessing.org/smm4h-2024/) corpus.

Tag set (IOB2 format; see the short example after the list):

* DRUG
* DISORDER
* FUNCTION
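
In the IOB2 scheme, each entity opens with a `B-` tag and continues with `I-` tags, while non-entity tokens are tagged `O`. A quick illustration (the sentence and labels below are invented, not taken from the corpus):

```python
# Invented example: "頭痛がひどいのでロキソニンを飲んだ"
# ("I took Loxonin because of a bad headache")
tokens = ["頭痛", "が", "ひどい", "ので", "ロキソニン", "を", "飲ん", "だ"]
tags   = ["B-DISORDER", "O", "O", "O", "B-DRUG", "O", "O", "O"]
```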
## Usage

```python
import torch
from transformers import BertForTokenClassification, AutoTokenizer

model_name = "yseop/SMM4H2024_Task2a_ja"
text = "サンプルテキスト"  # "sample text"; replace with any Japanese input

# Load the fine-tuned model and its tokenizer
model = BertForTokenClassification.from_pretrained(model_name).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)
idx2tag = model.config.id2label

vecs = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

with torch.inference_mode():
    out = model(input_ids=vecs["input_ids"],
                attention_mask=vecs["attention_mask"])

# One predicted tag per token; drop the [CLS]/[SEP] specials at both ends
idx = torch.argmax(out.logits, dim=2)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(vecs["input_ids"][0])[1:-1]
pred_tags = [idx2tag[i] for i in idx][1:-1]

print(list(zip(tokens, pred_tags)))
```
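
The snippet above yields one IOB2 tag per subword token. To group those tags into entity spans, a minimal decoding sketch follows; this is generic IOB2 merging, not something shipped with the model, and the `##` stripping assumes WordPiece-style subwords (Japanese has no word spaces, so tokens are joined directly):

```python
def iob2_to_spans(tokens, tags):
    """Merge IOB2-tagged subword tokens into (entity_text, label) pairs."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):  # a new entity begins
            if current:
                spans.append(("".join(current).replace("##", ""), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:  # entity continues
            current.append(tok)
        else:  # "O" or an inconsistent I- tag closes the current entity
            if current:
                spans.append(("".join(current).replace("##", ""), label))
            current, label = [], None
    if current:
        spans.append(("".join(current).replace("##", ""), label))
    return spans

print(iob2_to_spans(tokens, pred_tags))
```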

## Results

|Entity|TP|FP|FN|Precision|Recall|F1|
|---|---:|---:|---:|---:|---:|---:|
|DISORDER|588|409|330|0.5898|0.6405|0.6141|
|DRUG|307|143|169|0.6822|0.6450|0.6631|
|FUNCTION|69|160|170|0.3013|0.2887|0.2949|
|all|964|712|669|0.5752|0.5903|0.5827|
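
These are entity-level scores of the kind produced by seqeval, the metric listed in the model metadata. A minimal sketch of how such scores can be computed, with made-up gold and predicted tag sequences:

```python
from seqeval.metrics import classification_report

# Hypothetical gold and predicted IOB2 sequences, one list per sentence
y_true = [["B-DISORDER", "I-DISORDER", "O", "B-DRUG", "O"]]
y_pred = [["B-DISORDER", "I-DISORDER", "O", "O", "O"]]

# Entity-level precision, recall, and F1 per tag type
print(classification_report(y_true, y_pred, digits=4))
```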