vahbuna committed on
Commit
23e9948
1 Parent(s): b444606

init: model card

Files changed (1)
  1. README.md +51 -3
README.md CHANGED
@@ -1,3 +1,51 @@
1
- ---
2
- license: afl-3.0
3
- ---
1
+ ---
2
+ license: afl-3.0
3
+ language:
4
+ - ja
5
+ metrics:
6
+ - seqeval
7
+ library_name: transformers
8
+ pipeline_tag: token-classification
9
+ ---
10
+ # SMM4H-2024 Task 2 Japanese NER
11
+
12
+ ## Overview
13
+
14
+ This is a named entity recognition (NER) model created by fine-tuning [daisaku-s/medtxt_ner_roberta](https://huggingface.co/daisaku-s/medtxt_ner_roberta) on the [SMM4H 2024 Task 2a](https://healthlanguageprocessing.org/smm4h-2024/) corpus.
15
+
16
+ Tag set (IOB2 format):
17
+ * DRUG
18
+ * DISORDER
19
+ * FUNCTION
20
+
21
+ ## Usage
22
+
23
+ ```python
24
+ from transformers import BertForTokenClassification, AutoTokenizer
25
+
26
+ import torch
27
+ text = "サンプルテキスト"  # "sample text"
28
+ model_name = "yseop/SMM4H2024_Task2a_ja"
29
+ with torch.inference_mode():
30
+     model = BertForTokenClassification.from_pretrained(model_name).eval()
31
+     tokenizer = AutoTokenizer.from_pretrained(model_name)
32
+     idx2tag = model.config.id2label  # class index -> IOB2 label
33
+     vecs = tokenizer(text,
34
+                      padding=True,
35
+                      truncation=True,
36
+                      return_tensors="pt")
37
+     ner_logits = model(input_ids=vecs["input_ids"],
38
+                        attention_mask=vecs["attention_mask"])
39
+     idx = torch.argmax(ner_logits.logits, dim=2).detach().cpu().numpy().tolist()[0]  # predicted class index per token
40
+     token = [tokenizer.convert_ids_to_tokens(v) for v in vecs["input_ids"]][0][1:-1]  # drop the first and last (special) tokens
41
+     pred_tag = [idx2tag[x] for x in idx][1:-1]  # labels aligned with `token`
42
+ ```
43
+
44
+ ## Results
45
+
46
+ | Entity | TP | FP | FN | Precision | Recall | F1 |
47
+ |---|---:|---:|---:|---:|---:|---:|
48
+ | DISORDER | 588 | 409 | 330 | 0.5898 | 0.6405 | 0.6141 |
49
+ | DRUG | 307 | 143 | 169 | 0.6822 | 0.6450 | 0.6631 |
50
+ | FUNCTION | 69 | 160 | 170 | 0.3013 | 0.2887 | 0.2949 |
51
+ | all | 964 | 712 | 669 | 0.5752 | 0.5903 | 0.5827 |
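
The precision, recall, and F1 columns in the Results table can be reproduced from the tp/fp/fn counts alone. The following is a minimal sketch, not part of the commit; the helper name `prf` and the `counts` dictionary are illustrative, and it assumes the `all` row pools the counts over the three entity types (a micro-average), which is consistent with the numbers reported.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# (tp, fp, fn) per entity type, copied from the Results table.
counts = {
    "DISORDER": (588, 409, 330),
    "DRUG": (307, 143, 169),
    "FUNCTION": (69, 160, 170),
}

scores = {name: prf(*c) for name, c in counts.items()}

# Assumption: the "all" row is computed from the summed counts.
total = tuple(sum(col) for col in zip(*counts.values()))
scores["all"] = prf(*total)

for name, (p, r, f) in scores.items():
    print(f"{name:<9} {p:.4f} {r:.4f} {f:.4f}")
```

Rounded to four decimals, each printed row should agree with the corresponding row of the table.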