BSC-NLP4BIA/biomedical-semantic-relation-classifier-setfit

@@ -1,49 +1,93 @@
 ---
 license: apache-2.0
 tags:
 - setfit
 - sentence-transformers
-- text-classification
-pipeline_tag: text-classification
 ---
-# /gpfs/scratch/bsc14/bsc14515/jup_lab/models/trained/setfit_rel1_B
-This is a [SetFit model](https://github.com/huggingface/setfit) that can be used for text classification. The model has been trained using an efficient few-shot learning technique that involves:
-1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
-2. Training a classification head with features from the fine-tuned Sentence Transformer.
-## Usage
-To use this model for inference, first install the SetFit library:
 ```bash
-python -m pip install setfit
 ```
-You can then run inference as follows:
 ```python
-from setfit import SetFitModel
-# Download from Hub and run inference
-model = SetFitModel.from_pretrained("/gpfs/scratch/bsc14/bsc14515/jup_lab/models/trained/setfit_rel1_B")
-# Run inference
-preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
 ```
-## BibTeX entry and citation info
-```bibtex
-@article{https://doi.org/10.48550/arxiv.2209.11055,
-doi = {10.48550/ARXIV.2209.11055},
-url = {https://arxiv.org/abs/2209.11055},
-author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
-keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
-title = {Efficient Few-Shot Learning Without Prompts},
-publisher = {arXiv},
-year = {2022},
-copyright = {Creative Commons Attribution 4.0 International}
-}
-```

 ---
 license: apache-2.0
+language:
+- es
+pipeline_tag: relation-classification
 tags:
 - setfit
 - sentence-transformers
+- relation-classification
+- bert
+- biomedical
+- lexical semantics
+- bionlp
 ---
+# Biomedical relation classifier with SetFit in Spanish
+## Table of contents
+<details>
+<summary>Click to expand</summary>
+- [Model description](#model-description)
+- [Intended uses and limitations](#intended-uses-and-limitations)
+- [How to use](#how-to-use)
+- [Training](#training)
+- [Evaluation](#evaluation)
+- [Additional information](#additional-information)
+  - [Author](#author)
+  - [Licensing information](#licensing-information)
+  - [Citation information](#citation-information)
+  - [Disclaimer](#disclaimer)
+</details>
+## Model description
+This is a Transformer-based [SetFit model](https://github.com/huggingface/setfit) trained to classify pairs of biomedical texts in Spanish.
+## Intended uses and limitations
+The model classifies hierarchical relations between pairs of medical terms into four types: BROAD, EXACT, NARROW and NO_RELATION.
+## How to use
+This model is implemented as part of the KeyCARE library. First install the keycare module, which is used to call the SetFit classifier:
 ```bash
+python -m pip install keycare
 ```
+You can then run the KeyCARE pipeline that uses the SetFit model:
 ```python
+from keycare.RelExtractor import RelExtractor
+# Initialize the RelExtractor object
+relextractor = RelExtractor(relation_method='setfit')
+# Run the pipeline
+source = ["cáncer", "enfermedad de pulmón", "mastectomía radical izquierda", "laparoscopia"]
+target = ["cáncer de mama", "enfermedad pulmonar", "mastectomía", "Streptococcus pneumoniae"]
+relextractor(source, target)
+# You can also access the class storing the SetFit model
+relator = relextractor.relation_method
 ```
+## Training
+The model has been trained using an efficient few-shot learning technique that involves:
+1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning. The pre-trained model used is SapBERT-from-roberta-base-biomedical-clinical-es from the BSC-NLP4BIA research group.
+2. Training a classification head with features from the fine-tuned Sentence Transformer.
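As a rough illustration of the first step, the sketch below builds contrastive training pairs from a handful of labeled examples: two examples with the same label form a positive pair (similarity 1.0), and two with different labels form a negative pair (similarity 0.0). The function name `make_contrastive_pairs`, the `" | "` serialization of a term pair, and the example terms are assumptions made for illustration, not the code or data used to train this model. For simplicity it enumerates every pair, whereas the SetFit library samples them.

```python
import random
from itertools import combinations


def make_contrastive_pairs(examples, seed=0):
    """Build (text_a, text_b, similarity) triples from (text, label) examples.

    Two examples with the same label give similarity 1.0, otherwise 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        similarity = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, similarity))
    # Shuffle deterministically so positives and negatives are interleaved
    random.Random(seed).shuffle(pairs)
    return pairs


# Hypothetical few-shot examples; the " | " serialization is an assumption
examples = [
    ("enfermedad de pulmón | enfermedad pulmonar", "EXACT"),
    ("cáncer de mama | cáncer mamario", "EXACT"),
    ("laparoscopia | Streptococcus pneumoniae", "NO_RELATION"),
    ("mastectomía | enfermedad pulmonar", "NO_RELATION"),
]

pairs = make_contrastive_pairs(examples)
for text_a, text_b, similarity in pairs:
    print(similarity, "|", text_a, "<->", text_b)
```

Pairs like these are what a contrastive objective consumes: the sentence encoder is pushed to embed same-label examples close together and different-label examples far apart, before the classification head is fitted on the resulting embeddings.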
+The training data was obtained from the hierarchical structure of [SNOMED-CT](https://www.snomed.org/), mapped to the medical terms present in [UMLS](https://www.nlm.nih.gov/research/umls/index.html).
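To make the idea concrete, here is a minimal sketch of how relation labels could be derived from a concept hierarchy. The concept identifiers, terms, and single-parent hierarchy are toy assumptions (SNOMED-CT itself allows multiple parents), and the convention that BROAD means "the source is broader than the target" is also an assumption; the actual data construction procedure is not described here.

```python
# Toy mapping from terms to concept identifiers (illustrative, not real UMLS data)
CONCEPT_OF = {
    "cáncer": "C1",
    "neoplasia maligna": "C1",
    "cáncer de mama": "C2",
    "laparoscopia": "C3",
}

# Toy is-a hierarchy: child concept -> parent concept (single parent for simplicity)
PARENT_OF = {"C2": "C1"}


def ancestors(concept):
    """Return all ancestors of a concept by walking up the toy hierarchy."""
    found = []
    while concept in PARENT_OF:
        concept = PARENT_OF[concept]
        found.append(concept)
    return found


def relation(source, target):
    """Label a (source, target) term pair with one of the four relation types."""
    src, tgt = CONCEPT_OF[source], CONCEPT_OF[target]
    if src == tgt:
        return "EXACT"
    if src in ancestors(tgt):
        return "BROAD"  # assumption: source is broader than target
    if tgt in ancestors(src):
        return "NARROW"  # assumption: source is narrower than target
    return "NO_RELATION"


for pair in [("cáncer", "neoplasia maligna"), ("cáncer", "cáncer de mama"),
             ("cáncer de mama", "cáncer"), ("laparoscopia", "cáncer")]:
    print(pair, "->", relation(*pair))
```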
+## Evaluation
+To be published
+## Additional information
+### Author
+NLP4BIA at the Barcelona Supercomputing Center
+### Licensing information
+[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+### Citation information
+To be published
+### Disclaimer
+<details>
+<summary>Click to expand</summary>
+The models published in this repository are intended for general-purpose use and are available to third parties. These models may contain biases and/or other undesirable distortions.
+When third parties deploy or provide systems and/or services to other parties using any of these models (or using systems based on these models), or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of Artificial Intelligence.
+</details>