---
license: apache-2.0
language:
- zh
tags:
- NER
- TCM
- Traditional Chinese Medicine
- medical
widget:
- text: "化滞汤,出处:《证治汇补》卷八。。组成:青皮20g,陈皮20g,厚朴20g,枳实20g,黄芩20g,黄连20g,当归20g,芍药20g,木香5g,槟榔8g,滑石3g,甘草4g。。主治:下痢因于食积气滞者。"
  example_title: "Example 1"
---

# TCMNER

[About the author](https://github.com/huangxinping)

[Our products](https://zhongyigen.com)

# Model description

TCMNER is a fine-tuned BERT model that is ready to use for Named Entity Recognition (NER) on Traditional Chinese Medicine (TCM) texts and achieves state-of-the-art performance on this NER task. It has been trained to recognize six types of entities: prescription (方剂), herb (本草), source (来源), disease (病名), symptom (症状), and syndrome (证型).

Specifically, this model is built on TCMRoBERTa, a RoBERTa model fine-tuned for Traditional Chinese Medicine, and was further fine-tuned on the Chinese version of the [Haiwei AI Lab](https://www.haiweikexin.com/) Named Entity Recognition dataset.

**TCMRoBERTa is currently a closed-source model used within my own company; it will be open-sourced in the future.**

# How to use

You can use this model with the Transformers `pipeline` for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the tokenizer and the token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("Monor/TCMNER")
model = AutoModelForTokenClassification.from_pretrained("Monor/TCMNER")

# Build an NER pipeline from the loaded model and tokenizer.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "化滞汤,出处:《证治汇补》卷八。。组成:青皮20g,陈皮20g,厚朴20g,枳实20g,黄芩20g,黄连20g,当归20g,芍药20g,木香5g,槟榔8g,滑石3g,甘草4g。。主治:下痢因于食积气滞者。"

# Run the pipeline and print the token-level predictions.
ner_results = nlp(example)
print(ner_results)
```
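
The pipeline call above returns one prediction per token, so a multi-character entity arrives as several separate items. The following is a minimal sketch of how those items might be merged into whole entities. It assumes each item carries the `entity`, `start`, and `end` keys that the Transformers token-classification pipeline normally emits. The helper name `group_entities` and the sample predictions are hypothetical and hand-written for illustration; they are not real output of this model.

```python
from typing import Dict, List


def group_entities(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge token-level predictions into labeled character spans."""
    spans: List[Dict] = []
    for tok in tokens:
        if tok["entity"] == "O":
            continue  # token is outside any entity
        prefix, _, label = tok["entity"].partition("-")
        # An I- tag extends the previous span only if it has the same
        # type and starts exactly where that span ends.
        extends_previous = (
            bool(spans)
            and prefix == "I"
            and spans[-1]["label"] == label
            and spans[-1]["end"] == tok["start"]
        )
        if extends_previous:
            spans[-1]["end"] = tok["end"]
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
    # Recover the surface text of each span from its character offsets.
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hand-written sample in the pipeline's output shape (NOT real model output).
text = "化滞汤,出处:《证治汇补》卷八。"
sample_tokens = [
    {"entity": "B-方剂", "word": "化", "start": 0, "end": 1},
    {"entity": "I-方剂", "word": "滞", "start": 1, "end": 2},
    {"entity": "I-方剂", "word": "汤", "start": 2, "end": 3},
    {"entity": "B-来源", "word": "证", "start": 8, "end": 9},
    {"entity": "I-来源", "word": "治", "start": 9, "end": 10},
    {"entity": "I-来源", "word": "汇", "start": 10, "end": 11},
    {"entity": "I-来源", "word": "补", "start": 11, "end": 12},
]

entities = group_entities(text, sample_tokens)
for ent in entities:
    print(ent["label"], ent["text"], ent["start"], ent["end"])
```

The Transformers pipeline also accepts an `aggregation_strategy` argument (for example `pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")`) that performs a similar grouping for you; the manual version here is only meant to make the merging step explicit.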

## Training data

This model was fine-tuned on my own dataset, annotated with the following tags:

Abbreviation|Description
-|-
O|Outside of a named entity
B-方剂|Beginning of a prescription entity right after another prescription entity
I-方剂|Prescription entity
B-本草|Beginning of a herb entity right after another herb entity
I-本草|Herb entity
B-来源|Beginning of a source entity right after another source entity
I-来源|Source entity
B-病名|Beginning of a disease name right after another disease name
I-病名|Disease name
B-症状|Beginning of a symptom right after another symptom
I-症状|Symptom
B-证型|Beginning of a syndrome right after another syndrome
I-证型|Syndrome

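
The tag set can be derived mechanically from the six entity types: one `O` tag plus a `B-` and an `I-` tag per type, 13 labels in total. The sketch below builds that list and an id mapping. The ordering of the ids is an assumption made for illustration only; the authoritative mapping is whatever the loaded model stores in `model.config.id2label`.

```python
# The six entity types named in the model description.
ENTITY_TYPES = ["方剂", "本草", "来源", "病名", "症状", "证型"]

# "O" plus a B- and an I- tag for every entity type.
labels = ["O"] + [
    f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")
]

# Hypothetical id order (an assumption): the real order is defined by the
# model's own configuration, not by this sketch.
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))
print(labels)
```

When working with the actual model, prefer reading `model.config.id2label` over rebuilding the mapping by hand, since a mismatch in order would silently mislabel every prediction.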
# Eval results

![Evaluation results](images/iShot_2024-06-07_18.03.00.png)

# Notices

1. The model is free for commercial use.
2. I do not plan to write a paper about this model. If you use any of its details in your paper, please mention this model. Thanks.

---

# Bonus

All of our TCM domain models will be open-sourced soon, including:

1. A series of pre-trained models
2. Named entity recognition for TCM
3. Text localization in images of ancient texts
4. OCR for images of ancient texts

And more.