# ClinicalNoteBERT
Using openly available clinical notes, we pretrain ClinicalNoteBERT, a series of encoders in three model sizes (110M, 67M, and 14.5M parameters) that take note contexts and variations into account during pretraining. We evaluate ClinicalNoteBERT on a range of downstream applications, including fine-tuning tasks, unsupervised semantic textual similarity, retrieval-augmented generation with LLMs, and unimodal and multimodal clinical prediction, and compare against strong baselines. Our models outperform baseline models of similar or larger sizes across a variety of tasks and datasets. We find that different choices made during pretraining lead to varied improvements on the downstream tasks. The small and tiny versions of ClinicalNoteBERT retain over 96% and 91% of the best performance with less than 61% and 14% of the parameters, respectively.
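As a usage sketch, the encoders should load with the standard Hugging Face `transformers` API like any BERT-style model. The checkpoint ID below is a hypothetical placeholder (the released Hub name may differ), and the two-label classification head is only an illustration of the fine-tuning setup.

```python
# Minimal sketch: loading a ClinicalNoteBERT encoder for fine-tuning on a
# sentence-classification task. "ClinicalNoteBERT-base" is a placeholder ID;
# substitute the actual released checkpoint name.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "ClinicalNoteBERT-base"  # hypothetical placeholder ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Encode a clinical sentence and run a forward pass; the classification head
# is randomly initialized here and would be trained on a downstream dataset.
inputs = tokenizer(
    "Patient presents with shortness of breath and chest pain.",
    return_tensors="pt",
    truncation=True,
)
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```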
## Overall performance
| Model | # Params | FT | STS | RAG | CP | Fusion |
|---|---|---|---|---|---|---|
| ClinicalNoteBERT-note-only | 110M | 80.0 | 78.9 | 14.0 | 63.8 | 66.5 |
| ClinicalNoteBERT-note-ntp | 110M | 80.6 | 73.6 | 13.0 | 62.9 | 65.8 |
| ClinicalNoteBERT-base | 110M | 80.1 | 79.8 | 12.3 | 64.0 | 66.7 |
| ClinicalNoteBERT-small | 67M | 78.1 | 77.1 | 11.4 | 64.6 | 66.8 |
| ClinicalNoteBERT-tiny | 14.5M | 74.1 | 75.7 | 8.9 | 62.4 | 65.5 |
FT: fine-tuning. STS: semantic textual similarity (ClinicalSTS). RAG: retrieval-augmented generation (GPT2, Llama2). CP: clinical prediction. Fusion: multimodal fusion for clinical prediction.
When encoding text sequences for STS, RAG, and CP/Fusion, the ClinicalNoteBERT models are further adapted with unsupervised SimCSE training on sequences of varied lengths/types. The sequence-sentence, sequence-segment, and sequence-note variants are used for STS, RAG, and CP/Fusion, respectively, to match the typical sequence lengths of those tasks. More details can be found in the paper.
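The sketch below shows how such a SimCSE-adapted checkpoint could be used as a sentence encoder for STS-style similarity. The checkpoint ID is a hypothetical placeholder, and the `[CLS]` pooling is an assumption (the common SimCSE convention); consult the paper for the exact setup.

```python
# Minimal sketch: scoring sentence similarity with a SimCSE-adapted
# ClinicalNoteBERT encoder. Checkpoint name and pooling are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "ClinicalNoteBERT-base-simcse-sentence"  # hypothetical placeholder ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

def embed(texts):
    # Tokenize a batch of sentences and take the [CLS] hidden state as the embedding.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    return hidden[:, 0]  # [CLS] pooling (assumed)

a, b = embed(["The patient denies fever.", "No fever reported by the patient."])
similarity = torch.nn.functional.cosine_similarity(a, b, dim=-1)
print(float(similarity))
```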
## Citation
Under review