---
license: apache-2.0
language:
- zh
tags:
- NER
- TCM
- Traditional Chinese Medicine
- medical
widget:
- text: "化滞汤,出处:《证治汇补》卷八。。组成:青皮20g,陈皮20g,厚朴20g,枳实20g,黄芩20g,黄连20g,当归20g,芍药20g,木香5g,槟榔8g,滑石3g,甘草4g。。主治:下痢因于食积气滞者。"
example_title: "Example 1"
---
# TCMNER
[About Author](https://github.com/huangxinping).
[Our Products](https://zhongyigen.com)
# Model description
TCMNER is a fine-tuned BERT model that is ready to use for Named Entity Recognition (NER) on Traditional Chinese Medicine (TCM) text and achieves state-of-the-art performance on that task. It has been trained to recognize six types of entities: prescription (方剂), herb (本草), source (来源), disease (病名), symptom (症状) and syndrome (证型).
Specifically, this model is based on TCMRoBERTa, a RoBERTa model fine-tuned for Traditional Chinese Medicine, and was further fine-tuned on the Chinese version of the [Haiwei AI Lab](https://www.haiweikexin.com/)'s Named Entity Recognition dataset.
**TCMRoBERTa is currently a closed-source model used only within my own company. It will be open-sourced in the future.**
# How to use
You can use this model with the Transformers `pipeline` for NER.
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("Monor/TCMNER")
model = AutoModelForTokenClassification.from_pretrained("Monor/TCMNER")

# Build an NER pipeline; by default it returns one prediction per token
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "化滞汤,出处:《证治汇补》卷八。。组成:青皮20g,陈皮20g,厚朴20g,枳实20g,黄芩20g,黄连20g,当归20g,芍药20g,木香5g,槟榔8g,滑石3g,甘草4g。。主治:下痢因于食积气滞者。"
ner_results = nlp(example)
print(ner_results)
```
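
The default pipeline output is one prediction per token. If you would rather work with whole entities, the pipeline accepts `aggregation_strategy="simple"`, which merges adjacent tokens into spans with an `entity_group` key and character offsets. The sketch below shows one possible way to collect such spans by entity type. The helper name `group_by_type` is hypothetical, and `sample` is hand-written in the shape of aggregated pipeline output, not real output from this model.

```python
from collections import defaultdict


def group_by_type(text, entities):
    """Collect unique entity strings by type from aggregated NER output.

    `entities` is assumed to be a list of dicts with "entity_group",
    "start" and "end" keys, as returned by
    pipeline("ner", ..., aggregation_strategy="simple").
    """
    grouped = defaultdict(list)
    for ent in entities:
        # Slice the original text instead of using ent["word"], which may
        # contain tokenizer artifacts such as spaces between characters.
        surface = text[ent["start"]:ent["end"]]
        if surface not in grouped[ent["entity_group"]]:
            grouped[ent["entity_group"]].append(surface)
    return dict(grouped)


text = "化滞汤,出处:《证治汇补》卷八。。组成:青皮20g,陈皮20g"

# Hand-written illustration only; scores and spans are NOT real model output.
sample = [
    {"entity_group": "方剂", "score": 0.99, "word": "化 滞 汤", "start": 0, "end": 3},
    {"entity_group": "本草", "score": 0.99, "word": "青 皮", "start": 20, "end": 22},
    {"entity_group": "本草", "score": 0.99, "word": "陈 皮", "start": 26, "end": 28},
]

print(group_by_type(text, sample))
```

With the real model, you would pass `nlp(example)` from a pipeline created with `aggregation_strategy="simple"` in place of `sample`.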
## Training data
This model was fine-tuned on my own dataset, which uses the following labels:
Abbreviation | Description
-|-
O | Outside of a named entity
B-方剂 | Beginning of a prescription entity right after another prescription entity
I-方剂 | Prescription entity
B-本草 | Beginning of a herb entity right after another herb entity
I-本草 | Herb entity
B-来源 | Beginning of a source of prescription right after another source of prescription
I-来源 | Source entity
B-病名 | Beginning of a disease's name right after another disease's name
I-病名 | Disease's name
B-症状 | Beginning of a symptom right after another symptom
I-症状 | Symptom
B-证型 | Beginning of a syndrome right after another syndrome
I-证型 | Syndrome
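
To make the tagging scheme concrete, here is a minimal sketch of how a sequence of these tags could be decoded into entity spans. The function name `decode_bio` and the example tags are hypothetical and written by hand. The decoder is deliberately lenient: it starts a new entity on any `B-` tag and also on an `I-` tag that does not continue the current entity, so it should work whether `B-` marks every entity start or only the boundary between two adjacent entities of the same type.

```python
def decode_bio(chars, tags):
    """Turn per-character B-/I-/O tags into (type, text, start, end) spans."""
    spans = []
    current = None  # [entity_type, start, end) of the entity being built

    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")

        # An I- tag of the same type extends the entity in progress.
        if prefix == "I" and current is not None and current[0] == etype:
            current[2] = i + 1
            continue

        # Anything else closes the entity in progress, if there is one.
        if current is not None:
            spans.append(tuple(current))
            current = None

        # B- always opens an entity; a stray I- is treated as an opening too.
        if prefix in ("B", "I"):
            current = [etype, i, i + 1]

    if current is not None:
        spans.append(tuple(current))

    return [(t, "".join(chars[s:e]), s, e) for t, s, e in spans]


# Hand-written example tags, not real model output.
chars = list("化滞汤,青皮陈皮")
tags = ["B-方剂", "I-方剂", "I-方剂", "O", "B-本草", "I-本草", "B-本草", "I-本草"]

for span in decode_bio(chars, tags):
    print(span)
```

Note how the second `B-本草` tag is what separates 青皮 from 陈皮: without it, the two adjacent herbs would be merged into one span.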
# Eval results
![Evaluation results](images/iShot_2024-06-07_18.03.00.png "Evaluation results")
# Notices
1. The model is free for commercial use.
2. I do not plan to write a paper about this model. If you use any details of it in your own paper, please mention it. Thanks.
---
# Bonus
All of our TCM domain models will be open-sourced soon, including:
1. A series of pre-trained models
2. Named entity recognition for TCM
3. Text localization in ancient images
4. OCR for ancient images
And so on.