kaveh/rclip

@@ -33,18 +33,22 @@ widget:
 This model is a fine-tuned version of [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) as an image encoder and [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) as a text encoder on the [ROCO dataset](https://github.com/razorx89/roco-dataset).
 It achieves the following results on the evaluation set:
 - Loss: 0.3388
-## Heatmap
 Here is the heatmap of the similarity score of the first 30 samples on the test split of the ROCO dataset of images vs their captions:
 ![heatmap](https://imgur.com/fPFM694.png)
-## Applications
-### Image Retrieval
 This model can be utilized for image retrieval purposes, as demonstrated below:
-#### Save Image Embeddings
 ```python
 from PIL import Image
 import numpy as np
@@ -71,7 +75,10 @@ for img in images:
 with open("embeddings.pkl", 'wb') as f:
     pickle.dump(np.array(image_embeds), f)
 ```
-#### Query for Images
 ```python
 import numpy as np
 from sklearn.metrics.pairwise import cosine_similarity
@@ -107,9 +114,10 @@ similar_image_names = [images[index] for index in similar_image_indices]
 Image.open(similar_image_names[0])
 ```
-### Zero-Shot Image Classification
 This model can be effectively employed for zero-shot image classification, as exemplified below:
 ```python
 import requests
 from PIL import Image
@@ -131,22 +139,18 @@ print("".join([x[0] + ": " + x[1] + "\n" for x in zip(possible_class_names, [for
 image
 ```
-## Training info
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 24
-- eval_batch_size: 24
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 500
-- num_epochs: 8.0
-### Training results
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
 | 0.7951        | 0.09  | 500   | 1.1912          |
@@ -195,15 +199,28 @@ The following hyperparameters were used during training:
 | 0.0983        | 4.04  | 22000 | 0.3390          |
 | 0.0974        | 4.13  | 22500 | 0.3388          |
-## Framework versions
 - Transformers 4.31.0.dev0
 - Pytorch 2.0.1+cu117
 - Datasets 2.13.1
 - Tokenizers 0.13.3
-# Citation
 ```bibtex
 @misc{RCLIPmodel,

 This model is a fine-tuned version of [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) as an image encoder and [microsoft/BiomedVLP-CXR-BERT-general](https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general) as a text encoder on the [ROCO dataset](https://github.com/razorx89/roco-dataset).
 It achieves the following results on the evaluation set:
 - Loss: 0.3388
+-----
+## 1-Heatmap
 Here is the heatmap of the similarity score of the first 30 samples on the test split of the ROCO dataset of images vs their captions:
 ![heatmap](https://imgur.com/fPFM694.png)
+-----
+## 2-Applications
+### 2-1-Image Retrieval
 This model can be utilized for image retrieval purposes, as demonstrated below:
+##### 2-1-1-Save Image Embeddings
+<details>
+<summary>click to show the code</summary>
 ```python
 from PIL import Image
 import numpy as np
 with open("embeddings.pkl", 'wb') as f:
     pickle.dump(np.array(image_embeds), f)
 ```
+</details>
+##### 2-1-2-Query for Images
 ```python
 import numpy as np
 from sklearn.metrics.pairwise import cosine_similarity
 Image.open(similar_image_names[0])
 ```
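The ranking step behind the query can be sketched with NumPy alone. This is a minimal illustration, not the code elided from the snippet above: the helper name `top_k_similar` and the toy arrays are hypothetical, and real embeddings would come from the saved `embeddings.pkl`.

```python
import numpy as np


def top_k_similar(query_embed, image_embeds, k=5):
    """Return indices and scores of the k rows of image_embeds closest to query_embed."""
    query = np.asarray(query_embed, dtype=np.float64).reshape(-1)
    embeds = np.asarray(image_embeds, dtype=np.float64)
    # L2-normalise both sides so the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    embeds = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)
    scores = embeds @ query
    # argsort is ascending, so reverse it to put the best match first.
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]


# Toy 2-D vectors standing in for real embeddings (hypothetical values).
toy_image_embeds = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_query_embed = np.array([2.0, 0.1])
indices, scores = top_k_similar(toy_query_embed, toy_image_embeds, k=2)
print(indices, scores)
```

Normalising once and using a matrix product gives the same ranking as `cosine_similarity` from scikit-learn for a single query, without the extra dependency.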
+### 2-2-Zero-Shot Image Classification
 This model can be effectively employed for zero-shot image classification, as exemplified below:
 ```python
 import requests
 from PIL import Image
 image
 ```
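For intuition, the scoring step of zero-shot classification can be sketched as a softmax over scaled image-text similarities. This is an assumption-laden sketch rather than the elided code above: `zero_shot_probs`, the fixed `logit_scale`, the class names, and the toy vectors are all hypothetical, and the model's own learned temperature may differ.

```python
import numpy as np


def zero_shot_probs(image_embed, text_embeds, logit_scale=10.0):
    """Turn one image embedding and N class-prompt embeddings into N probabilities."""
    image = np.asarray(image_embed, dtype=np.float64).reshape(-1)
    texts = np.asarray(text_embeds, dtype=np.float64)
    # Cosine similarity between the image and every class prompt.
    image = image / np.linalg.norm(image)
    texts = texts / np.linalg.norm(texts, axis=1, keepdims=True)
    logits = logit_scale * (texts @ image)
    # Subtract the maximum before exponentiating for numerical stability.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


# Hypothetical class names and toy 2-D embeddings, for illustration only.
possible_class_names = ["chest x-ray", "brain MRI", "abdominal CT"]
toy_text_embeds = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_image_embed = np.array([0.2, 1.0])
probs = zero_shot_probs(toy_image_embed, toy_text_embeds)
for name, p in zip(possible_class_names, probs):
    print(f"{name}: {p:.2%}")
```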
+-----
+## 3-Training info
+### 3-1-Metrics
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 0.0974        | 4.13  | 22500 | 0.3388          |
+<details>
+<summary>expand to view all steps</summary>
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
 | 0.7951        | 0.09  | 500   | 1.1912          |
 | 0.0983        | 4.04  | 22000 | 0.3390          |
 | 0.0974        | 4.13  | 22500 | 0.3388          |
+</details>
+### 3-2-Training Hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 24
+- eval_batch_size: 24
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 500
+- num_epochs: 8.0
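To make the scheduler settings concrete, the sketch below shows the usual shape of a cosine schedule with linear warmup, using the `learning_rate` and `lr_scheduler_warmup_steps` values listed above. Whether the training run used exactly this variant is an assumption, and `total_steps` is a hypothetical parameter because the total step count is not stated here.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=5e-05, warmup_steps=500):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical total of 1000 steps, chosen only to keep the example short.
for step in (0, 250, 500, 750, 1000):
    print(step, cosine_lr_with_warmup(step, total_steps=1000))
```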
+-----
+## 4-Framework Versions
 - Transformers 4.31.0.dev0
 - Pytorch 2.0.1+cu117
 - Datasets 2.13.1
 - Tokenizers 0.13.3
+-----
+# 5-Citation
 ```bibtex
 @misc{RCLIPmodel,