rclip / README.md

init

12cb5cd over 1 year ago

3.76 kB

	---
	tags:
	- generated_from_trainer
	- clip
	- bert
	- vision-language models
	model-index:
	- name: output_8_clip14_cxrbert
	results: []
	language:
	- en
	library_name: transformers
	pipeline_tag: feature-extraction
	---

	# RCLIP (Clip model fine-tuned on radiology images and their captions)

	This model is a fine-tuned version of [openai/clip-vit-large-patch14](https://huggingface.co./openai/clip-vit-large-patch14) as an image encoder and [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co./microsoft/BiomedVLP-CXR-BERT-general) as a text encoder on the [ROCO dataset](https://github.com/razorx89/roco-dataset).
	It achieves the following results on the evaluation set:
	- Loss: 0.3388

	## Heatmap

	Here is the heatmap of the similarity score of the first 30 samples on the test split of the ROCO dataset of images vs their captions:
	![heatmap](https://imgur.com/fPFM694.png)

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 24
	- eval_batch_size: 24
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 500
	- num_epochs: 8.0

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:-----:\|:-----:\|:---------------:\|
	\| 0.7951 \| 0.09 \| 500 \| 1.1912 \|
	\| 0.5887 \| 0.18 \| 1000 \| 0.9833 \|
	\| 0.5023 \| 0.28 \| 1500 \| 0.8459 \|
	\| 0.4709 \| 0.37 \| 2000 \| 0.8479 \|
	\| 0.4484 \| 0.46 \| 2500 \| 0.7667 \|
	\| 0.4319 \| 0.55 \| 3000 \| 0.8092 \|
	\| 0.4181 \| 0.64 \| 3500 \| 0.6964 \|
	\| 0.4107 \| 0.73 \| 4000 \| 0.6463 \|
	\| 0.3723 \| 0.83 \| 4500 \| 0.7893 \|
	\| 0.3746 \| 0.92 \| 5000 \| 0.6863 \|
	\| 0.3667 \| 1.01 \| 5500 \| 0.6910 \|
	\| 0.3253 \| 1.1 \| 6000 \| 0.6863 \|
	\| 0.3274 \| 1.19 \| 6500 \| 0.6445 \|
	\| 0.3065 \| 1.28 \| 7000 \| 0.5908 \|
	\| 0.2834 \| 1.38 \| 7500 \| 0.6138 \|
	\| 0.293 \| 1.47 \| 8000 \| 0.6515 \|
	\| 0.303 \| 1.56 \| 8500 \| 0.5806 \|
	\| 0.2638 \| 1.65 \| 9000 \| 0.5587 \|
	\| 0.2593 \| 1.74 \| 9500 \| 0.5216 \|
	\| 0.2451 \| 1.83 \| 10000 \| 0.5283 \|
	\| 0.2468 \| 1.93 \| 10500 \| 0.5001 \|
	\| 0.2295 \| 2.02 \| 11000 \| 0.4975 \|
	\| 0.1953 \| 2.11 \| 11500 \| 0.4750 \|
	\| 0.1954 \| 2.2 \| 12000 \| 0.4572 \|
	\| 0.1737 \| 2.29 \| 12500 \| 0.4731 \|
	\| 0.175 \| 2.38 \| 13000 \| 0.4526 \|
	\| 0.1873 \| 2.48 \| 13500 \| 0.4890 \|
	\| 0.1809 \| 2.57 \| 14000 \| 0.4210 \|
	\| 0.1711 \| 2.66 \| 14500 \| 0.4197 \|
	\| 0.1457 \| 2.75 \| 15000 \| 0.3998 \|
	\| 0.1583 \| 2.84 \| 15500 \| 0.3923 \|
	\| 0.1579 \| 2.94 \| 16000 \| 0.3823 \|
	\| 0.1339 \| 3.03 \| 16500 \| 0.3654 \|
	\| 0.1164 \| 3.12 \| 17000 \| 0.3592 \|
	\| 0.1217 \| 3.21 \| 17500 \| 0.3641 \|
	\| 0.119 \| 3.3 \| 18000 \| 0.3553 \|
	\| 0.1151 \| 3.39 \| 18500 \| 0.3524 \|
	\| 0.119 \| 3.49 \| 19000 \| 0.3452 \|
	\| 0.102 \| 3.58 \| 19500 \| 0.3439 \|
	\| 0.1085 \| 3.67 \| 20000 \| 0.3422 \|
	\| 0.1142 \| 3.76 \| 20500 \| 0.3396 \|
	\| 0.1038 \| 3.85 \| 21000 \| 0.3392 \|
	\| 0.1143 \| 3.94 \| 21500 \| 0.3390 \|
	\| 0.0983 \| 4.04 \| 22000 \| 0.3390 \|
	\| 0.0974 \| 4.13 \| 22500 \| 0.3388 \|


	### Framework versions

	- Transformers 4.31.0.dev0
	- Pytorch 2.0.1+cu117
	- Datasets 2.13.1
	- Tokenizers 0.13.3

	---
	tags:
	- generated_from_trainer
	- clip
	- bert
	- vision-language models
	model-index:
	- name: output_8_clip14_cxrbert
	results: []
	language:
	- en
	library_name: transformers
	pipeline_tag: feature-extraction
	---

	# RCLIP (Clip model fine-tuned on radiology images and their captions)

	This model is a fine-tuned version of [openai/clip-vit-large-patch14](https://huggingface.co./openai/clip-vit-large-patch14) as an image encoder and [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co./microsoft/BiomedVLP-CXR-BERT-general) as a text encoder on the [ROCO dataset](https://github.com/razorx89/roco-dataset).
	It achieves the following results on the evaluation set:
	- Loss: 0.3388

	## Heatmap

	Here is the heatmap of the similarity score of the first 30 samples on the test split of the ROCO dataset of images vs their captions:
	![heatmap](https://imgur.com/fPFM694.png)

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 24
	- eval_batch_size: 24
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 500
	- num_epochs: 8.0

	### Training results

	\| Training Loss \| Epoch \| Step \| Validation Loss \|
	\|:-------------:\|:-----:\|:-----:\|:---------------:\|
	\| 0.7951 \| 0.09 \| 500 \| 1.1912 \|
	\| 0.5887 \| 0.18 \| 1000 \| 0.9833 \|
	\| 0.5023 \| 0.28 \| 1500 \| 0.8459 \|
	\| 0.4709 \| 0.37 \| 2000 \| 0.8479 \|
	\| 0.4484 \| 0.46 \| 2500 \| 0.7667 \|
	\| 0.4319 \| 0.55 \| 3000 \| 0.8092 \|
	\| 0.4181 \| 0.64 \| 3500 \| 0.6964 \|
	\| 0.4107 \| 0.73 \| 4000 \| 0.6463 \|
	\| 0.3723 \| 0.83 \| 4500 \| 0.7893 \|
	\| 0.3746 \| 0.92 \| 5000 \| 0.6863 \|
	\| 0.3667 \| 1.01 \| 5500 \| 0.6910 \|
	\| 0.3253 \| 1.1 \| 6000 \| 0.6863 \|
	\| 0.3274 \| 1.19 \| 6500 \| 0.6445 \|
	\| 0.3065 \| 1.28 \| 7000 \| 0.5908 \|
	\| 0.2834 \| 1.38 \| 7500 \| 0.6138 \|
	\| 0.293 \| 1.47 \| 8000 \| 0.6515 \|
	\| 0.303 \| 1.56 \| 8500 \| 0.5806 \|
	\| 0.2638 \| 1.65 \| 9000 \| 0.5587 \|
	\| 0.2593 \| 1.74 \| 9500 \| 0.5216 \|
	\| 0.2451 \| 1.83 \| 10000 \| 0.5283 \|
	\| 0.2468 \| 1.93 \| 10500 \| 0.5001 \|
	\| 0.2295 \| 2.02 \| 11000 \| 0.4975 \|
	\| 0.1953 \| 2.11 \| 11500 \| 0.4750 \|
	\| 0.1954 \| 2.2 \| 12000 \| 0.4572 \|
	\| 0.1737 \| 2.29 \| 12500 \| 0.4731 \|
	\| 0.175 \| 2.38 \| 13000 \| 0.4526 \|
	\| 0.1873 \| 2.48 \| 13500 \| 0.4890 \|
	\| 0.1809 \| 2.57 \| 14000 \| 0.4210 \|
	\| 0.1711 \| 2.66 \| 14500 \| 0.4197 \|
	\| 0.1457 \| 2.75 \| 15000 \| 0.3998 \|
	\| 0.1583 \| 2.84 \| 15500 \| 0.3923 \|
	\| 0.1579 \| 2.94 \| 16000 \| 0.3823 \|
	\| 0.1339 \| 3.03 \| 16500 \| 0.3654 \|
	\| 0.1164 \| 3.12 \| 17000 \| 0.3592 \|
	\| 0.1217 \| 3.21 \| 17500 \| 0.3641 \|
	\| 0.119 \| 3.3 \| 18000 \| 0.3553 \|
	\| 0.1151 \| 3.39 \| 18500 \| 0.3524 \|
	\| 0.119 \| 3.49 \| 19000 \| 0.3452 \|
	\| 0.102 \| 3.58 \| 19500 \| 0.3439 \|
	\| 0.1085 \| 3.67 \| 20000 \| 0.3422 \|
	\| 0.1142 \| 3.76 \| 20500 \| 0.3396 \|
	\| 0.1038 \| 3.85 \| 21000 \| 0.3392 \|
	\| 0.1143 \| 3.94 \| 21500 \| 0.3390 \|
	\| 0.0983 \| 4.04 \| 22000 \| 0.3390 \|
	\| 0.0974 \| 4.13 \| 22500 \| 0.3388 \|


	### Framework versions

	- Transformers 4.31.0.dev0
	- Pytorch 2.0.1+cu117
	- Datasets 2.13.1
	- Tokenizers 0.13.3