Clinical NER model using spaCy's SpanCategorizer implementation and medBERT.de.
Usage:
!huggingface-cli download phlobo/de_ggponc_medbertde de_ggponc_medbertde-1.0.0-py3-none-any.whl --local-dir .
!pip install de_ggponc_medbertde-1.0.0-py3-none-any.whl
import spacy
nlp = spacy.load('de_ggponc_medbertde')
d = nlp("allein nach Versagen einer Behandlung mit Oxaliplatin und Irinotecan")
for e in d.spans['entities']:
    print(e, e.label_)
yields:
Oxaliplatin Clinical_Drug
Irinotecan Clinical_Drug
Versagen einer Behandlung Other_Finding
Behandlung mit Oxaliplatin und Irinotecan Therapeutic
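Because the spans in `d.spans['entities']` may be nested or overlapping (here both drug names also sit inside the longer `Therapeutic` span), it can be convenient to group them by label before further processing. The following is a minimal sketch, not part of the model package: `group_spans_by_label` is a hypothetical helper that only assumes each span exposes `.text` and `.label_`, as spaCy `Span` objects do.

```python
from collections import defaultdict


def group_spans_by_label(spans):
    """Map each label to the list of span texts carrying it, in input order."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span.label_].append(span.text)
    return dict(grouped)


# With the pipeline loaded as in the usage example above
# (requires the model wheel to be installed):
# by_label = group_spans_by_label(d.spans['entities'])
# drugs = by_label.get('Clinical_Drug', [])
```

Grouping by label keeps nested spans intact instead of forcing a choice between an inner and an outer span.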
The model has been trained on gold standard labels in GGPONC 2.0 (https://aclanthology.org/2022.lrec-1.389/).
It detects the following 8 entity classes:
- Findings: Diagnosis / Pathology and Other Findings
- Substances: Clinical Drug, Nutrients / Body Substances, External Substances
- Procedures: Therapeutic, Diagnostic
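If only the three top-level groups are needed, the fine-grained labels can be collapsed with a lookup table. This is a sketch under assumptions: only `Clinical_Drug`, `Other_Finding` and `Therapeutic` are confirmed by the example output above; the other label strings are guesses following the same naming pattern and should be checked against the labels the installed pipeline actually reports (for example via `nlp.get_pipe('spancat').labels`).

```python
# Fine-grained label -> coarse group, following the list above.
# Entries marked "assumed" use label strings not shown in this card.
COARSE_GROUP = {
    "Diagnosis_or_Pathology": "Findings",        # assumed label string
    "Other_Finding": "Findings",
    "Clinical_Drug": "Substances",
    "Nutrient_or_Body_Substance": "Substances",  # assumed label string
    "External_Substance": "Substances",          # assumed label string
    "Therapeutic": "Procedures",
    "Diagnostic": "Procedures",                  # assumed label string
}


def coarse_group(label, default="Unknown"):
    """Return the coarse group for a fine-grained label, or `default`."""
    return COARSE_GROUP.get(label, default)
```

Returning a default rather than raising makes a mismatch between the assumed and the actual label strings visible without interrupting processing.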
The configuration for training the model is available here: https://github.com/hpi-dhc/ggponc
When using the model, please cite the following publication:
@inproceedings{borchert-etal-2022-ggponc,
title = "{GGPONC} 2.0 - The {G}erman Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline {NER} Taggers",
author = "Borchert, Florian and
Lohr, Christina and
Modersohn, Luise and
Witt, Jonas and
Langer, Thomas and
Follmann, Markus and
Gietzelt, Matthias and
Arnrich, Bert and
Hahn, Udo and
Schapranow, Matthieu-P.",
booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
pages = "3650--3660"
}
| Feature | Description |
| --- | --- |
| Name | `de_ggponc_medbertde` |
| Version | `1.0.0` |
| spaCy | `>=3.4.4,<3.5.0` |
| Default Pipeline | `transformer`, `morphologizer`, `parser`, `transformer_spancat`, `spancat` |
| Components | `transformer`, `morphologizer`, `parser`, `transformer_spancat`, `spancat` |
| License | The model may be used for non-commercial research activities only, see also the Terms of Use of GGPONC: https://www.leitlinienprogramm-onkologie.de/projekte/ggponc-english |
| Author | Florian Borchert |
Evaluation results (self-reported; test set, fine-grained, nested spans):
- F1 score: 0.742
- Precision: 0.730
- Recall: 0.753
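As a quick consistency check, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values reported above; `f1_score` is a hypothetical helper written for illustration, not the evaluation code used for the model. Because the inputs are already rounded to three decimals, the result can differ from the reported F1 in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute from the rounded precision and recall reported above.
f1 = f1_score(0.730, 0.753)
print(f"{f1:.4f}")
```

The recomputed value is about 0.741, which agrees with the reported 0.742 up to rounding of the inputs.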