smarsol committed
Commit f5c4b14
1 Parent(s): 554d73d

Update README.md

Files changed (1):
  1. README.md +74 -28

README.md CHANGED
@@ -1,49 +1,95 @@
  ---
  license: apache-2.0
  tags:
  - setfit
  - sentence-transformers
  - text-classification
- pipeline_tag: text-classification
  ---

- # /content/drive/MyDrive/Colab Notebooks/noparents_sp

- This is a [SetFit model](https://github.com/huggingface/setfit) that can be used for text classification. The model has been trained using an efficient few-shot learning technique that involves:

- 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
- 2. Training a classification head with features from the fine-tuned Sentence Transformer.

- ## Usage

- To use this model for inference, first install the SetFit library:

  ```bash
- python -m pip install setfit
  ```

- You can then run inference as follows:

  ```python
- from setfit import SetFitModel

- # Download from Hub and run inference
- model = SetFitModel.from_pretrained("/content/drive/MyDrive/Colab Notebooks/noparents_sp")
- # Run inference
- preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
  ```

- ## BibTeX entry and citation info
-
- ```bibtex
- @article{https://doi.org/10.48550/arxiv.2209.11055,
-   doi = {10.48550/ARXIV.2209.11055},
-   url = {https://arxiv.org/abs/2209.11055},
-   author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
-   keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
-   title = {Efficient Few-Shot Learning Without Prompts},
-   publisher = {arXiv},
-   year = {2022},
-   copyright = {Creative Commons Attribution 4.0 International}
- }
- ```
  ---
  license: apache-2.0
+ language:
+ - es
+ pipeline_tag: text-classification
  tags:
  - setfit
  - sentence-transformers
  - text-classification
+ - bert
+ - biomedical
+ - lexical semantics
+ - bionlp
  ---

+ # Biomedical term classifier with SetFit in Spanish

+ ## Table of contents
+ <details>
+ <summary>Click to expand</summary>

+ - [Model description](#model-description)
+ - [Intended uses and limitations](#intended-uses-and-limitations)
+ - [How to use](#how-to-use)
+ - [Training](#training)
+ - [Evaluation](#evaluation)
+ - [Additional information](#additional-information)
+   - [Author](#author)
+   - [Licensing information](#licensing-information)
+   - [Citation information](#citation-information)
+   - [Disclaimer](#disclaimer)
+
+ </details>
+
+ ## Model description
+ This is a [SetFit model](https://github.com/huggingface/setfit) trained for multilabel biomedical text classification in Spanish.
 
+ ## Intended uses and limitations
+ The model classifies medical entities into 21 classes, including diseases, medical procedures, symptoms, and drugs, among others. It does not yet cover some classes, such as body structures.
 
+ ## How to use
+ This model is implemented as part of the KeyCARE library. First install the keycare module to call the SetFit classifier:

  ```bash
+ python -m pip install keycare
  ```

+ You can then run the KeyCARE pipeline that uses the SetFit model:

  ```python
+ from keycare.TermExtractor import TermExtractor

+ # Initialize the TermExtractor object
+ termextractor = TermExtractor()
+ # Run the pipeline on a Spanish clinical note
+ text = """Acude al Servicio de Urgencias por cefalea frontoparietal derecha.
+ Mediante biopsia se diagnostica adenocarcinoma de próstata Gleason 4+4=8 con metástasis óseas múltiples.
+ Se trata con Ácido Zoledrónico 4 mg iv/4 semanas.
+ """
+ termextractor(text)
+ # You can also access the class storing the SetFit model
+ categorizer = termextractor.categorizer
  ```
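Alternatively, the underlying SetFit classifier can be loaded directly with the `setfit` library. The sketch below is illustrative only: the model id is a placeholder for this repository's Hub id, and the example terms (and the labels they map to) depend on the trained label set.

```python
from setfit import SetFitModel

# Placeholder id: substitute this repository's actual id on the Hugging Face Hub
model = SetFitModel.from_pretrained("your-org/biomedical-term-classifier-setfit-es")

# Classify a few Spanish biomedical terms extracted from clinical text
terms = ["cefalea frontoparietal", "adenocarcinoma de próstata", "ácido zoledrónico"]
preds = model(terms)
print(preds)
```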

+ ## Training
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning. The pre-trained model used is SapBERT-from-roberta-base-biomedical-clinical-es from the BSC-NLP4BIA research group.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer (a minimal training sketch is shown after the data description below).
+
+ The training data has been obtained from NER Gold Standard Corpora also generated by BSC-NLP4BIA, including [MedProcNER](https://temu.bsc.es/medprocner/), [DISTEMIST](https://temu.bsc.es/distemist/), [SympTEMIST](https://temu.bsc.es/symptemist/), [CANTEMIST](https://temu.bsc.es/cantemist/), and [PharmaCoNER](https://temu.bsc.es/pharmaconer/), among others.
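
For orientation, a minimal sketch of how such a SetFit run can be set up with the `setfit` library follows. It is not the exact training script used for this model; the base-model Hub id, label names, and example terms are illustrative assumptions, and the released model is multilabel over 21 classes.

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Tiny illustrative training set: Spanish biomedical terms with placeholder labels
train_dataset = Dataset.from_dict({
    "text": [
        "cefalea frontoparietal", "dolor torácico",
        "ácido zoledrónico", "paracetamol",
        "biopsia de próstata", "radiografía de tórax",
    ],
    "label": ["SYMPTOM", "SYMPTOM", "DRUG", "DRUG", "PROCEDURE", "PROCEDURE"],
})

# Base encoder named in the card; the exact Hub id is an assumption
model = SetFitModel.from_pretrained("BSC-NLP4BIA/SapBERT-from-roberta-base-biomedical-clinical-es")

args = TrainingArguments(batch_size=16, num_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

# Step 1: contrastive fine-tuning of the Sentence Transformer body
# Step 2: fitting the classification head on the resulting embeddings
trainer.train()

preds = model.predict(["metástasis óseas"])
```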
+
+ ## Evaluation
+ To be published
+
+ ## Additional information
+
+ ### Author
+ NLP4BIA at the Barcelona Supercomputing Center
+
+ ### Licensing information
+ [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ### Citation information
+ To be published
+
+ ### Disclaimer
+ <details>
+ <summary>Click to expand</summary>
+
+ The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have bias and/or any other undesirable distortions.
+
+ When third parties deploy or provide systems and/or services to other parties using any of these models (or using systems based on these models), or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of Artificial Intelligence.
+
+ </details>