base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
---
# **ClinLinker-KB-GP**

## Model Description

ClinLinker-KB-GP is a state-of-the-art model designed for medical entity linking (MEL) in Spanish, specifically optimized for tasks in the clinical domain. It is based on bi-encoder models enriched with knowledge from medical knowledge graphs such as UMLS and SNOMED-CT. The model leverages contrastive learning techniques to enhance the quality of the embedding space and improve the retrieval of relevant concepts for medical entities mentioned in clinical text.

The "GP" in ClinLinker-KB-GP stands for **Grand Parents**. In this model, hierarchical relationships were used, including **parent** and **grandparent** terms as positive candidates. This strategy improves embedding quality by incorporating terms that are conceptually close at different levels of the knowledge graph, enhancing the linking process.
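
As a concrete illustration of this positive-candidate construction, here is a minimal sketch with a toy hierarchy. The concept names and the `positive_candidates` helper are hypothetical, for illustration only, and are not part of the released code:

```python
# Toy child -> parents hierarchy; real training would use SNOMED-CT relations.
parents = {
    "nephritis": ["kidney disease"],
    "kidney disease": ["renal disorder"],
}

def positive_candidates(concept, hierarchy):
    """The concept itself plus its parents and grandparents (the "GP" strategy)."""
    direct = hierarchy.get(concept, [])
    grand = [g for p in direct for g in hierarchy.get(p, [])]
    return [concept] + direct + grand

print(positive_candidates("nephritis", parents))
# -> ['nephritis', 'kidney disease', 'renal disorder']
```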

## Intended Use

- **Domain:** Clinical natural language processing (NLP) for medical entity linking in Spanish.
- **Primary Tasks:** Recognizing and normalizing medical entities such as diseases, symptoms, and procedures from clinical texts, and linking them to their corresponding standardized terminologies in SNOMED-CT.
- **Corpora Evaluated:** ClinLinker-KB-GP was tested on several Spanish medical corpora, including DisTEMIST (diseases), MedProcNER (procedures), and SympTEMIST (symptoms). It achieved top-tier performance, with top-200 accuracy values of 0.969 on SympTEMIST, 0.943 on MedProcNER, and 0.912 on DisTEMIST.
- **Target Users:** Researchers, healthcare practitioners, and developers working with Spanish medical data on entity recognition and normalization tasks.

## Performance

ClinLinker-KB-GP achieved the following key results:

- **Top-200 Accuracy:**
  - DisTEMIST: 91.2%
  - MedProcNER: 94.3%
  - SympTEMIST: 96.9%
- **Top-25 Accuracy:**
  - The model achieves up to 86.4% accuracy in retrieving the correct concept within the top-25 candidates for disease and procedure normalization tasks.
- **Cross-Encoder Integration:** ClinLinker-KB-GP is particularly effective when used with a cross-encoder for reranking candidate concepts, leading to improved accuracy in zero-shot and few-shot learning scenarios.
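
For clarity, top-k accuracy here is the fraction of mentions whose gold concept appears among the k highest-ranked candidates returned by the linker. A minimal sketch with toy data (candidate lists and labels are invented for illustration):

```python
# Top-k accuracy: fraction of mentions whose gold concept appears among the
# k highest-ranked candidates returned by the linker.
def top_k_accuracy(ranked_candidates, gold_labels, k):
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_candidates, gold_labels))
    return hits / len(gold_labels)

# Three toy mentions with their ranked candidate concepts and gold concepts.
ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
gold = ["A", "C", "B"]
print(top_k_accuracy(ranked, gold, 1))  # 1/3: only the first mention's gold is ranked first
print(top_k_accuracy(ranked, gold, 3))  # 1.0: every gold concept appears in the top-3
```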

## Technical Details

- **Architecture:** The model is a bi-encoder with contrastive learning, designed to generate embeddings for clinical terms using the relational structure of medical concepts extracted from the UMLS and SNOMED-CT knowledge bases.
- **Training Strategy:** ClinLinker-KB-GP was trained with a hierarchical relationship structure, incorporating "parent" and "grandparent" nodes from medical knowledge graphs to enhance embedding quality. The training process also uses hard-negative mining techniques to optimize candidate retrieval.
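
The flavor of this training objective can be sketched with a multi-positive InfoNCE-style loss over one mention, its positives (synonym, parent, grandparent), and mined hard negatives. This is an illustrative formulation with random toy embeddings, not the exact objective or hyperparameters used to train the released model:

```python
import numpy as np

# Toy embeddings standing in for bi-encoder outputs (illustration only).
rng = np.random.default_rng(0)
dim = 8
mention = rng.normal(size=dim)
positives = rng.normal(size=(3, dim))   # gold synonym, parent, grandparent terms
negatives = rng.normal(size=(5, dim))   # hard negatives mined from the knowledge base

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Contrastive objective: pull positives toward the mention, push negatives away.
temperature = 0.07
candidates = np.vstack([positives, negatives])
logits = np.array([cosine(mention, c) for c in candidates]) / temperature
log_probs = logits - np.log(np.exp(logits).sum())  # log-softmax over all candidates
loss = -log_probs[: len(positives)].mean()         # average over the positive set
print(round(float(loss), 4))
```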

## Usage

Users can utilize our pre-trained model in several ways:

- By using the provided **FaissEncoder** class to perform efficient entity linking with FAISS-based search.
- By training their own bi-encoder model for medical entity linking using our framework available on GitHub:
  [https://github.com/ICB-UMA/ClinLinker-KB](https://github.com/ICB-UMA/ClinLinker-KB)
- By loading the model directly with Hugging Face's `AutoModel` and `AutoTokenizer` for flexible integration in custom pipelines:

```python
from transformers import AutoModel, AutoTokenizer

# Load the published checkpoint from the Hugging Face Hub.
model = AutoModel.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
```