Commit 0099fd7 (verified) · committed by fernandogd97 · 1 parent: 985e191

Update README.md

Files changed (1): README.md (+42, -1)
README.md CHANGED

base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
---

# **ClinLinker-KB-GP**

## Model Description
ClinLinker-KB-GP is a state-of-the-art model for medical entity linking (MEL) in Spanish, optimized for the clinical domain. It is based on bi-encoder models enriched with knowledge from medical knowledge graphs such as UMLS and SNOMED-CT, and it leverages contrastive learning to improve the quality of the embedding space and the retrieval of relevant concepts for medical entities mentioned in clinical text.

The "GP" in ClinLinker-KB-GP stands for **Grand Parents**: during training, hierarchical relationships were exploited by including **parent** and **grandparent** terms as positive candidates. Incorporating terms that are conceptually close at different levels of the knowledge graph improves embedding quality and, in turn, the linking process.
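As a toy illustration of the grandparent strategy, the positive set for a concept can be gathered by walking up a parent map (the hierarchy and concept names below are made up for illustration, not real SNOMED-CT data):

```python
# Toy child -> parent relation, standing in for a SNOMED-CT hierarchy
parents = {
    "diabetes mellitus tipo 2": "diabetes mellitus",
    "diabetes mellitus": "trastorno metabólico",
    "trastorno metabólico": "enfermedad",
}

def hierarchical_positives(concept, parents, max_hops=2):
    """Collect ancestors up to `max_hops` levels above the concept
    (parents and grandparents for max_hops=2) to use as extra positives."""
    positives = []
    current = concept
    for _ in range(max_hops):
        current = parents.get(current)
        if current is None:
            break
        positives.append(current)
    return positives

print(hierarchical_positives("diabetes mellitus tipo 2", parents))
# → ['diabetes mellitus', 'trastorno metabólico']
```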

## Intended Use
- **Domain:** Clinical Natural Language Processing (NLP) for medical entity linking in Spanish.
- **Primary Tasks:** Recognizing and normalizing medical entities such as diseases, symptoms, and procedures from clinical texts and linking them to their corresponding standardized terminologies in SNOMED-CT.
- **Corpora Evaluated:** ClinLinker-KB-GP was tested on several Spanish medical corpora, including DisTEMIST (for diseases), MedProcNER (for procedures), and SympTEMIST (for symptoms). It achieved top-tier performance, with top-200 accuracy values of 0.969 in SympTEMIST, 0.943 in MedProcNER, and 0.912 in DisTEMIST.
- **Target Users:** Researchers, healthcare practitioners, and developers working with Spanish medical data for entity recognition and normalization tasks.

## Performance
ClinLinker-KB-GP achieved the following key results:
- **Top-200 Accuracy:**
  - DisTEMIST: 91.2%
  - MedProcNER: 94.3%
  - SympTEMIST: 96.9%
- **Top-25 Accuracy:**
  - The model achieves up to 86.4% accuracy in retrieving the correct concept in the top-25 candidates for disease and procedure normalization tasks.
- **Cross-Encoder Integration:** ClinLinker-KB-GP is particularly effective when used with a cross-encoder for reranking candidate concepts, leading to improved accuracy in zero-shot and few-shot learning scenarios.
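As a reference point, top-k accuracy as reported above is simply the fraction of mentions whose gold concept appears among the first k retrieved candidates. A minimal sketch (the concept IDs and candidate lists are made up):

```python
def top_k_accuracy(gold_ids, ranked_candidates, k):
    """Fraction of mentions whose gold concept appears in the top-k candidates."""
    hits = sum(
        gold in candidates[:k]
        for gold, candidates in zip(gold_ids, ranked_candidates)
    )
    return hits / len(gold_ids)

# Toy example with invented concept IDs:
gold = ["C001", "C002", "C003"]
ranked = [
    ["C001", "C009"],          # hit at rank 1
    ["C007", "C002", "C004"],  # hit at rank 2
    ["C008", "C009"],          # miss
]
print(top_k_accuracy(gold, ranked, k=2))  # → 0.6666666666666666
```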

## Technical Details
- **Architecture:** The model is a bi-encoder with contrastive learning, designed to generate embeddings for clinical terms, using the relational structure of medical concepts extracted from the UMLS and SNOMED-CT knowledge bases.
- **Training Strategy:** ClinLinker-KB-GP was trained with a hierarchical relationship structure, incorporating "parent" and "grandparent" nodes from medical knowledge graphs to enhance the embeddings' quality. The training process also utilizes hard negative mining techniques to optimize candidate retrieval.
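A dependency-free sketch of such a contrastive objective, where the gold concept together with its parent and grandparent terms serve as positives against a set of hard negatives (the vectors and exact loss form here are illustrative assumptions, not the actual training code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def contrastive_loss(anchor, positives, negatives, temperature=0.07):
    """InfoNCE-style loss: one term per positive (gold concept, parent,
    grandparent), normalized over all positives and hard negatives."""
    pos = [cosine(anchor, p) / temperature for p in positives]
    neg = [cosine(anchor, n) / temperature for n in negatives]
    log_denom = math.log(sum(math.exp(s) for s in pos + neg))
    return sum(log_denom - s for s in pos) / len(pos)

# Toy embeddings: positives close to the mention anchor, negatives far away
anchor = [1.0, 0.2, 0.1]
positives = [[0.9, 0.3, 0.1], [0.8, 0.2, 0.2], [0.7, 0.4, 0.1]]  # concept, parent, grandparent
negatives = [[-0.9, 0.1, 0.3], [0.1, -1.0, 0.2]]
print(contrastive_loss(anchor, positives, negatives))
```

Minimizing this pulls the anchor toward all hierarchical positives at once while pushing it away from the mined negatives.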

## Usage
You can use the pre-trained model in several ways:
- Use the provided **FaissEncoder** class to perform efficient entity linking with FAISS-based search.
- Train your own bi-encoder model for medical entity linking using our framework, available on GitHub:
  [https://github.com/ICB-UMA/ClinLinker-KB](https://github.com/ICB-UMA/ClinLinker-KB)
- Alternatively, load the model directly with Hugging Face's `AutoModel` and `AutoTokenizer` for flexible integration in custom pipelines:

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
```
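Independently of FAISS, the retrieval idea behind the `FaissEncoder` can be sketched as nearest-neighbor search over pre-computed candidate embeddings (the vectors and term names below are toy values; in the real pipeline the embeddings come from the bi-encoder and FAISS accelerates the search):

```python
import math

# Hypothetical pre-computed embeddings for candidate terms
candidate_index = {
    "diabetes mellitus": [0.9, 0.1, 0.0],
    "hipertensión arterial": [0.1, 0.9, 0.1],
    "cefalea": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def retrieve(mention_vec, index, k=2):
    """Rank candidate concepts by cosine similarity to the mention embedding."""
    ranked = sorted(index, key=lambda term: cosine(mention_vec, index[term]), reverse=True)
    return ranked[:k]

print(retrieve([0.85, 0.15, 0.05], candidate_index, k=2))
# → ['diabetes mellitus', 'hipertensión arterial']
```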