---
tags:
- spacy
- arxiv:2408.06930
- medical
language:
- nl
license: cc-by-sa-4.0
model-index:
- name: Echocardiogram_SpanCategorizer_lv_dil
  results:
  - task:
      type: token-classification
    dataset:
      type: test
      name: "internal test set"
    metrics:
    - name: "Weighted f1"
      type: f1
      value: 0.836
      verified: false
    - name: "Weighted precision"
      type: precision
      value: 0.850
      verified: false
    - name: "Weighted recall"
      type: recall
      value: 0.823
      verified: false
pipeline_tag: token-classification
metrics:
- f1
- precision
- recall
---

# Description

This model is a spaCy SpanCategorizer model trained from scratch on Dutch echocardiogram reports sourced from electronic health records. The publication associated with the span classification task is available at https://arxiv.org/abs/2408.06930. The config file used to train the model is available at https://github.com/umcu/echolabeler.

# Minimum working example

```python
# Notebook cell: install the packaged pipeline (in a shell, drop the leading "!")
!pip install https://huggingface.co/baukearends/Echocardiogram-SpanCategorizer-lv-dil/resolve/main/nl_Echocardiogram_SpanCategorizer_lv_dil-any-py3-none-any.whl
```

```python
import spacy

# Load the installed pipeline by its package name
nlp = spacy.load("nl_Echocardiogram_SpanCategorizer_lv_dil")
```

```python
# Run the pipeline on an example (Dutch) report fragment
prediction = nlp("Op dit echo geen duidelijke WMA te zien, goede systolische L.V. functie, normale dimensies LV, wel L.V.H., diastolische dysfunctie graad 1A tot 2. Geringe aortastenose en - matige -insufficientie. Geringe M.I.")

# Predicted spans are stored under the 'sc' key; their scores are in the span group's attrs
for span, score in zip(prediction.spans['sc'], prediction.spans['sc'].attrs['scores']):
    print(f"Span: {span}, label: {span.label_}, score: {score[0]:.3f}")
```

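The loop above prints every predicted span. As a minimal sketch of a possible post-processing step, the snippet below keeps only spans above a score cut-off and sorts them by score. The `LabeledSpan` type, the `filter_spans` helper, the `0.5` cut-off, and the example values are all hypothetical and not part of the model or of spaCy; the example values are made up and are not real model output.

```python
from typing import Iterable, List, NamedTuple


class LabeledSpan(NamedTuple):
    """Hypothetical container for one predicted span."""
    text: str
    label: str
    score: float


def filter_spans(spans: Iterable[LabeledSpan], min_score: float = 0.5) -> List[LabeledSpan]:
    """Keep spans whose score is at least min_score, highest score first."""
    kept = [s for s in spans if s.score >= min_score]
    return sorted(kept, key=lambda s: s.score, reverse=True)


# Made-up values standing in for model output (assumption, not real predictions).
example = [
    LabeledSpan("normale dimensies LV", "lv_dil_normal", 0.97),
    LabeledSpan("LV", "lv_dil_present", 0.31),
]
confident = filter_spans(example, min_score=0.5)
```

With the loaded pipeline, the input could be built inside the loop from the minimum working example, for instance as `LabeledSpan(span.text, span.label_, float(score[0]))`.
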
# Label Scheme

<details>

<summary>View label scheme (5 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| **`spancat`** | `lv_dil_normal`, `lv_dil_mild`, `lv_dil_moderate`, `lv_dil_present`, `lv_dil_severe` |

</details>

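For downstream code it can help to separate the shared prefix of each label from its class suffix. The sketch below does that for the five labels listed in the table. Reading a label as a characteristic (`lv_dil`) plus a class (`mild`, `severe`, ...) is an interpretation of the naming, not something the model card states, and `split_label` is a hypothetical helper.

```python
from typing import Tuple

# Labels copied from the label scheme table.
LABELS = (
    "lv_dil_normal",
    "lv_dil_mild",
    "lv_dil_moderate",
    "lv_dil_present",
    "lv_dil_severe",
)


def split_label(label: str) -> Tuple[str, str]:
    """Split e.g. 'lv_dil_mild' into ('lv_dil', 'mild'); reject unknown labels."""
    if label not in LABELS:
        raise ValueError(f"Unknown label: {label!r}")
    # Split at the last underscore: everything before it is the shared prefix.
    characteristic, _, value = label.rpartition("_")
    return characteristic, value


classes = [split_label(label)[1] for label in LABELS]
```
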
# Intended use

The model was developed for span classification on Dutch clinical text. Because it is a domain-specific model trained on medical data, it is intended for Dutch medical NLP tasks.

# Data

The model was trained on approximately 4,000 manually annotated echocardiogram reports from the University Medical Centre Utrecht. The training data was anonymized before training.

| Feature | Description |
| --- | --- |
| **Name** | `Echocardiogram_SpanCategorizer_lv_dil` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.7.4,<3.8.0` |
| **Default Pipeline** | `tok2vec`, `spancat` |
| **Components** | `tok2vec`, `spancat` |
| **License** | `cc-by-sa-4.0` |
| **Author** | [Bauke Arends]() |

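The table pins spaCy to `>=3.7.4,<3.8.0`. A minimal sketch of checking a version string against that range before loading the model is shown below. `parse_version` and `is_compatible` are hypothetical helpers, and the parser is a simplification that assumes purely numeric `major.minor.patch` components (it would fail on a pre-release string such as `3.8.0rc1`).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.7.4' into (3, 7, 4), using at most the first three components."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_compatible(version: str, minimum: str = "3.7.4", below: str = "3.8.0") -> bool:
    """Return True if minimum <= version < below, compared component-wise."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


ok = is_compatible("3.7.5")
too_new = is_compatible("3.8.0")
```

In practice the installed version could be passed in as `is_compatible(spacy.__version__)`.
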
# Contact

If you run into problems with this model, please open an issue on our GitHub repository: https://github.com/umcu/echolabeler/issues

# Usage

If you use the model in your work, please cite the following reference: https://doi.org/10.48550/arXiv.2408.06930

# References

Paper: Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es (2024): Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification, arXiv, https://arxiv.org/abs/2408.06930