---
license: afl-3.0
language:
- ja
metrics:
- seqeval
library_name: transformers
pipeline_tag: token-classification
---

# SMM4H-2024 Task 2 Japanese NER

## Overview

This is a named entity recognition (NER) model created by fine-tuning [daisaku-s/medtxt_ner_roberta](https://huggingface.co./daisaku-s/medtxt_ner_roberta) on the [SMM4H 2024 Task 2a](https://healthlanguageprocessing.org/smm4h-2024/) corpus.

Tag set (IOB2 format; see the short example after the list):

* DRUG
* DISORDER
* FUNCTION
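
In the IOB2 scheme, each entity opens with a `B-` tag and continues with `I-` tags, while non-entity tokens are tagged `O`. A quick illustration (the sentence and labels below are invented, not taken from the corpus):

```python
# Invented example: "頭痛がひどいのでロキソニンを飲んだ"
# ("I took Loxonin because of a bad headache")
tokens = ["頭痛", "が", "ひどい", "ので", "ロキソニン", "を", "飲ん", "だ"]
tags   = ["B-DISORDER", "O", "O", "O", "B-DRUG", "O", "O", "O"]
```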
## Usage

```python
import torch
from transformers import BertForTokenClassification, AutoTokenizer

model_name = "yseop/SMM4H2024_Task2a_ja"
text = "サンプルテキスト"  # "sample text"; replace with any Japanese input

# Load the fine-tuned model and its tokenizer
model = BertForTokenClassification.from_pretrained(model_name).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)
idx2tag = model.config.id2label

vecs = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

with torch.inference_mode():
    out = model(input_ids=vecs["input_ids"],
                attention_mask=vecs["attention_mask"])

# One predicted tag per token; drop the [CLS]/[SEP] specials at both ends
idx = torch.argmax(out.logits, dim=2)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(vecs["input_ids"][0])[1:-1]
pred_tags = [idx2tag[i] for i in idx][1:-1]

print(list(zip(tokens, pred_tags)))
```
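
The snippet above yields one IOB2 tag per subword token. To group those tags into entity spans, a minimal decoding sketch follows; this is generic IOB2 merging, not something shipped with the model, and the `##` stripping assumes WordPiece-style subwords (Japanese has no word spaces, so tokens are joined directly):

```python
def iob2_to_spans(tokens, tags):
    """Merge IOB2-tagged subword tokens into (entity_text, label) pairs."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):  # a new entity begins
            if current:
                spans.append(("".join(current).replace("##", ""), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:  # entity continues
            current.append(tok)
        else:  # "O" or an inconsistent I- tag closes the current entity
            if current:
                spans.append(("".join(current).replace("##", ""), label))
            current, label = [], None
    if current:
        spans.append(("".join(current).replace("##", ""), label))
    return spans

print(iob2_to_spans(tokens, pred_tags))
```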

## Results

|Entity|TP|FP|FN|Precision|Recall|F1|
|---|---:|---:|---:|---:|---:|---:|
|DISORDER|588|409|330|0.5898|0.6405|0.6141|
|DRUG|307|143|169|0.6822|0.6450|0.6631|
|FUNCTION|69|160|170|0.3013|0.2887|0.2949|
|all|964|712|669|0.5752|0.5903|0.5827|
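
These are entity-level scores of the kind produced by seqeval, the metric listed in the model metadata. A minimal sketch of how such scores can be computed, with made-up gold and predicted tag sequences:

```python
from seqeval.metrics import classification_report

# Hypothetical gold and predicted IOB2 sequences, one list per sentence
y_true = [["B-DISORDER", "I-DISORDER", "O", "B-DRUG", "O"]]
y_pred = [["B-DISORDER", "I-DISORDER", "O", "O", "O"]]

# Entity-level precision, recall, and F1 per tag type
print(classification_report(y_true, y_pred, digits=4))
```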