{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aH0U6JkEbAcg"
},
"source": [
"# **Image to Semantic Embeddings**\n",
"\n",
"**Aim**: Encode around 50k jpg/jpeg images into vector embeddings using a vision tranformer model and upsert them into a vector database for clustering and querying"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "CFLaAyqCbAch"
},
"outputs": [],
"source": [
"!pip install jupyter sentence_transformers pandas qdrant_client"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "j5o4d0jbbAci"
},
"source": [
"# Load Dataset\n",
"This is the Open Images Dataset by CVDFoundation which hosts over 9 mil images. We will be working with a smaller subset.\n",
"\n",
"The dataset currently is a tsv file, with the first column representing a URL to a hosted jpg/jpeg image."
]
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"data = pd.read_csv('images.tsv', sep='\\t', header=None).reset_index()\n",
"print(data.shape, data.head(), sep=\"\\n\")"
],
"metadata": {
"id": "j97T0MIBeEDe",
"outputId": "edeb755a-1880-42a5-c7d8-570ff185c6b8",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(41620, 4)\n",
" index 0 1 \\\n",
"0 0 https://c2.staticflickr.com/6/5606/15611395595... 2038323 \n",
"1 1 https://c6.staticflickr.com/3/2808/10351094034... 1762125 \n",
"2 2 https://c2.staticflickr.com/9/8089/8416776003_... 9059623 \n",
"3 3 https://farm3.staticflickr.com/568/21452126474... 2306438 \n",
"4 4 https://farm4.staticflickr.com/1244/677743874_... 6571968 \n",
"\n",
" 2 \n",
"0 I4V4qq54NBEFDwBqPYCkDA== \n",
"1 38x6O2LAS75H1vUGVzIilg== \n",
"2 4ksF8TuGWGcKul6Z/6pq8g== \n",
"3 R+6Cs525mCUT6RovHPWREg== \n",
"4 JnkYas7iDJu+pb81tfqVow== \n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## Download the images\n",
"We need the image data locally to feed it to the model"
],
"metadata": {
"id": "M-Esbnhy6KTU"
}
},
{
"cell_type": "code",
"source": [
"import urllib\n",
"import os\n",
"\n",
"def download_file(url):\n",
" os.makedirs(\"./images\", exist_ok=True)\n",
" basename = os.path.basename(url)\n",
" target_path = os.path.join(\"./images\", basename)\n",
" if not os.path.exists(target_path):\n",
" try:\n",
" urllib.request.urlretrieve(url, target_path)\n",
" except urllib.error.HTTPError:\n",
" return None\n",
" return target_path"
],
"metadata": {
"id": "cK_63ubnieI6"
},
"execution_count": 13,
"outputs": []
},
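{
"cell_type": "markdown",
"source": [
"As a quick sanity check, we can run the helper on a handful of URLs from the dataframe. This is a minimal sketch: it assumes the `data` frame loaded earlier is in scope and that column `0` holds the image URLs (as in the preview above); failed downloads simply show up as `None`."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Minimal sketch: try the downloader on a few sample URLs (column 0 of the TSV)\n",
"sample_urls = data[0].head(5).tolist()\n",
"sample_paths = [download_file(url) for url in sample_urls]\n",
"print(sample_paths)  # failed downloads appear as None"
],
"metadata": {},
"execution_count": null,
"outputs": []
},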
{
"cell_type": "markdown",
"source": [
"# The Model\n",
"We will be using a pre-trained model. Contrastive Language-Image Pre-training (CLIP) model developed by OpenAI is a multi-modal Vision Transformer model that can extract the visual features from the image into vector embeddings\n",
"\n",
"We will be storing these vector embeddings in a vector space database, where images will be clustered based on their semantic information ready for querying"
],
"metadata": {
"id": "0WrAbzxP6khy"
}
},
{
"cell_type": "code",
"source": [
"from sentence_transformers import SentenceTransformer\n",
"from PIL import Image\n",
"\n",
"model = SentenceTransformer(\"clip-ViT-B-32\")"
],
"metadata": {
"id": "pHYk-KdmlJxz",
"outputId": "26d28cf6-4878-4b34-b8e4-a88f68195514",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 493,
"referenced_widgets": [
"b902fda97f894942ab42b0e22398d0f5",
"b5103986eb2c418088046746a52dc90a",
"bf6c98fe76144d818f6d0c7d7170c71b",
"51616c08f5e149679030f25c8e6e1427",
"c09f6dac4ebf440d9aa37a2b1dc40e02",
"a6da9d62a3614cc884a73770eb57d328",
"21105248f39b484984dfb5dea0d22648",
"3c53755c75b4472a99a3c2c0959856bb",
"19899dced9f346db9ebf93443016430c",
"14ecd56c314b44f291f048c188e4ee46",
"dff5c80654b048389a11c662beb79aaa",
"c0869d58121c4ee281978af9299b5d54",
"7fac92aafdf8411b8a5ca5ae1dc96d20",
"c37690203adf4373a6a05cbf8665f000",
"d274cd0fbffe4272a1ae7eeead076b45",
"b6c813006b34458e9e0924016a6d641a",
"2a5becb055fe4906a7d9dcb267d27a66",
"57be8d100b204d7483381b1ff0c1457c",
"2b8fd6876b70431c94b7822e9e56d6eb",
"05dbc884aff44eab98b39a80daaf47fb",
"9e6d30764cb14396b470473f59b28c08",
"304d7bddf55c4fceb25035c7d9e08a37",
"6f6af85e0a5440aab79e969fb309533e",
"f6b87cb406c14d149da36e3b9c10533e",
"7c281c7e47e9400e846de2cc3363d0ba",
"d2a72170ce8d4777b7afa0df7208f266",
"39ad9c05989f405692da40a155578417",
"1a8c2c550873457d8ec60ad73dd98626",
"05c8a3182d9e43579d09633ca58577ec",
"def424ea393141e0a87260915c17747e",
"236e10e4a6e04c39b7ba9ced95c15997",
"29b94e57e99b4dd5823b555a81a71268",
"7eebb0931f0847b097dbfa5a61de8144",
"424d3ca689f7438387909780a8d7ba60",
"7d343bfd6e0f4eb38dd2e25f340b241d",
"2f22480394c049c69b3efeed063c6079",
"a0e5cfd8ca3e497f855c26a45022f081",
"b457b1394eea400698009d3d58479b6e",
"83ed932cf566440bbc01e2d42a11515e",
"a496048b884a498daa1e33fbb1a6738b",
"c03b2276f797469d9ccef85b09a0cf6a",
"bc8a49200a854e388eca4a6c32790266",
"e3dc1b5b772047c6b4fd740363a498d4",
"ceaa404a51234de2910cff973425f06f",
"50012d48a2874438a7ff0afbe4fc954d",
"bda33284ecba4ed88f33f3299a7c567d",
"87ef3ba8da1b489babbfa3a6e7315de2",
"2976fff503434bffac9dca4246a4c0ba",
"83a0de88679349ccb4dc218c4dfbe80a",
"91d7219f5a444cb2bc7dc9e90ae38ce6",
"d3bb1268e92a452d9875ecb604941879",
"d2a6199a696e4c3da1a1f4581af2b628",
"83eddb6d7b9d4e8c8febf9b8f3abfe93",
"54a2f645712a43c6851b1a71202a80d3",
"5de17e9ab476448f96e08dafc84f1bcf",
"390acd3eb932474fb21de9ed56843d12",
"1c0914107bca4279a53ecf2363c1ab2c",
"c06944a9c37e45b3b347055fe7c4b823",
"25a188fc5fbc4d7389ae8e7a4926f588",
"a163ff70057c479e93745428139d8d88",
"bf114eeff62041edb3d280db38125356",
"8f9f9d9f9b6b43e1b29548782a1ea755",
"689d14fa34dd41a3927d098dbd2ba4c2",
"a20d4b5c8d914879b421ee2451e9bca9",
"e24e319694d24ce9bc89644e57a080c3",
"8919c9f1a4d44d92ad2f2b1c0619c7ee",
"abdaa97e57dc4905a79bb2db88e76886",
"d7bb1353f13b4d2492008f26b5555f4c",
"2e34d50b131b4c0e9f317b2081c22122",
"26d0f6d336bf473ea8c5b06239f63a90",
"3359b5ddcc0348fabff06011710751b7",
"b158d1c5d56a4811b342cbfdb0f36ab8",
"3ea86011737d4de5983da54690d47521",
"e85d0850d3d24a6b9bb0e0564709a1f8",
"b9c2d70a064d4a6bb590a069eb815a40",
"8827088eda784a6db9285b140eefb218",
"5fb1a44dc6cd44239c3db59677372ad2",
"0671262acdb74f91aa89b973d690bfca",
"262c0b729102400790ec49efea1d6886",
"1b1ff73e171f432897bb355bec94ad9b",
"49a4600201e946499940e9b85be06917",
"7c20dce3bd4049429bc13f054d6a69c4",
"ec6ce6981cae44a69376a4dbd10fe5ec",
"2cefad135c8f4394bc452726d4afe582",
"b58f3480757a450bb2fa9225a3a37690",
"af3710d4e04e4361ad536cc6ca0fe107",
"fca63b037c4a458caa061cbcdc787fbe",
"8d07ead0131c49829344b5d4b985611a",
"5104dd1ff87e442a9c7268dd69365f0e",
"7d387a3317ed459d8a9a693e762a1db4",
"8eeefb1b2dca4d728b97cbd60ffa3df9",
"1e7f0be6041f4a2c92ea0ed7f805e802",
"9803b473a7bb4b9293aa6e90e9f543d1",
"45eb2f05ad2b414b9fa1acbdb4d28275",
"2c69ae157ecb4ed4948e3a3b0dac3847",
"b121db2533354b1eb80e2360bf68a43b",
"88ad173445514177a8d4b9e8c1cdd848",
"67da2c8f2d2e47838a96cd0d293c4b26",
"1dd8acf5e9cf467ba3acbfc9848ef63b",
"816802c9d76c4669bee4c73f4a1017f9",
"a509b86b883146a2acf015d3bcd1a873",
"e259f3a6c72541b191c2f965f838a746",
"d797a8519f454505a8a5227808f2cdb7",
"8d99ecf08cec419c8e2c0ece05eb0fca",
"c6ef01e3afda4bbaba29b836fe3b4af7",
"72b9a41bb429471a86f7c58823cdbd62",
"70cce4aa270a4a15991f840070fd27fe",
"2858c73260244655816b7d13d20767b4",
"35a0c6f6df594eb5b205e3ac12f11eac",
"1290404668b246d29b38d97aaa51ff0f",
"c8937e032c3d481a97d8696816a23ac3",
"ed7dfcfce92e44f8b54eee0cffa5adeb",
"ee2ee4900ed749ce94897fdf49e75510",
"b0a10fe58e9f4499b6f5c4b9f0de0cfd",
"8382c14221eb47e08bd5bb92d0b742db",
"2ad773bb4482421fad3c6c0bcc559d39",
"3fa339529edb4bd59deb5948c50043aa",
"9f8c651585a94f63800604783fcb5a9e",
"e8bb8a51ce22443881a95f1a3a5f63e5",
"6861c9859eda4b3c885e189e731df136",
"372766453b114a659fe33fd09c30f2c7"
]
}
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning: \n",
"The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
"To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co./settings/tokens), set it as secret in your Google Colab and restart your session.\n",
"You will be able to reuse this secret in all of your notebooks.\n",
"Please note that authentication is recommended but still optional to access public models or datasets.\n",
" warnings.warn(\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
".gitattributes: 0%| | 0.00/690 [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "b902fda97f894942ab42b0e22398d0f5"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"0_CLIPModel/config.json: 0%| | 0.00/4.03k [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "c0869d58121c4ee281978af9299b5d54"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"0_CLIPModel/merges.txt: 0%| | 0.00/525k [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "6f6af85e0a5440aab79e969fb309533e"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"0_CLIPModel/preprocessor_config.json: 0%| | 0.00/316 [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "424d3ca689f7438387909780a8d7ba60"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"pytorch_model.bin: 0%| | 0.00/605M [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "50012d48a2874438a7ff0afbe4fc954d"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"0_CLIPModel/special_tokens_map.json: 0%| | 0.00/389 [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "390acd3eb932474fb21de9ed56843d12"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"0_CLIPModel/tokenizer_config.json: 0%| | 0.00/604 [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "abdaa97e57dc4905a79bb2db88e76886"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"0_CLIPModel/vocab.json: 0%| | 0.00/961k [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "0671262acdb74f91aa89b973d690bfca"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"README.md: 0%| | 0.00/1.88k [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "5104dd1ff87e442a9c7268dd69365f0e"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"config_sentence_transformers.json: 0%| | 0.00/116 [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "816802c9d76c4669bee4c73f4a1017f9"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"modules.json: 0%| | 0.00/122 [00:00, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "c8937e032c3d481a97d8696816a23ac3"
}
},
"metadata": {}
}
]
},
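{
"cell_type": "markdown",
"source": [
"To see what the encoder produces, we can embed a single downloaded image. A minimal sketch, assuming one of the `sample_paths` from the download check above (or any local jpg path) is available; `clip-ViT-B-32` returns a 512-dimensional vector, which is why the collection below is created with `size=512`."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Minimal sketch: encode one downloaded image with CLIP\n",
"# Assumes `sample_paths` from the download check above (any local jpg path works)\n",
"example_path = next(p for p in sample_paths if p is not None)\n",
"embedding = model.encode(Image.open(example_path))\n",
"print(embedding.shape)  # (512,) for clip-ViT-B-32"
],
"metadata": {},
"execution_count": null,
"outputs": []
},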
{
"cell_type": "markdown",
"source": [
"# The Vector Database\n",
"\n",
"Qdrant is an open-source vector database, where we can store vector embeddings and query nearest neighbours of a given embedding to create a recommendation/semantic search engine\n",
"\n",
"We start by initializing the Qdrant client and connecting to the cluster hosted on Qdrant Cloud\n",
"\n",
"We will be using Cosine Similarity metric to calculate the nearest neighbours"
],
"metadata": {
"id": "2h7jMch58ADV"
}
},
{
"cell_type": "code",
"source": [
"from qdrant_client import QdrantClient\n",
"from qdrant_client.http import models as rest\n",
"from google.colab import userdata\n",
"\n",
"qdrant_client = QdrantClient(\n",
" url = userdata.get('QDRANT_CLUSTER_URL'),\n",
" api_key = userdata.get('QDRANT_CLUSTER_API_KEY'),\n",
")"
],
"metadata": {
"id": "l67QD_zNllgU"
},
"execution_count": 14,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Client Setup, now create the collection"
],
"metadata": {
"id": "8S1SXZnbltfr"
}
},
{
"cell_type": "code",
"source": [
"qdrant_client.create_collection(\n",
" collection_name=\"images\",\n",
" vectors_config = rest.VectorParams(size=512, distance = rest.Distance.COSINE),\n",
")"
],
"metadata": {
"id": "nAObCg-yrzpC"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Function to upsert vector points to the collection"
],
"metadata": {
"id": "zGbMrsDL_HH-"
}
},
{
"cell_type": "code",
"source": [
"def upsert_to_db(points):\n",
" qdrant_client.upsert(\n",
" collection_name=\"images\",\n",
" points=[\n",
" rest.PointStruct(\n",
" id=point['id'],\n",
" vector=point['vector'].tolist(),\n",
" payload=point['payload']\n",
" )\n",
" for point in points\n",
" ]\n",
")"
],
"metadata": {
"id": "mjTRm85dr13p"
},
"execution_count": 16,
"outputs": []
},
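{
"cell_type": "markdown",
"source": [
"To illustrate the expected point format, here is a minimal sketch that runs the download, encode, and upsert steps for a couple of rows and then queries the nearest neighbours of the first embedding. The use of the row index as the point `id` and the URL as the `payload` are assumptions for illustration, not part of the original pipeline."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Minimal sketch: download, encode, and upsert a couple of rows, then query\n",
"points = []\n",
"for i, url in zip(data[\"index\"].head(2), data[0].head(2)):\n",
"    path = download_file(url)\n",
"    if path is None:\n",
"        continue\n",
"    vector = model.encode(Image.open(path))  # 512-dim CLIP embedding (numpy array)\n",
"    points.append({\"id\": int(i), \"vector\": vector, \"payload\": {\"url\": url}})\n",
"\n",
"upsert_to_db(points)\n",
"\n",
"# Nearest neighbours of the first embedding under the cosine metric configured above\n",
"hits = qdrant_client.search(\n",
"    collection_name=\"images\",\n",
"    query_vector=points[0][\"vector\"].tolist(),\n",
"    limit=3,\n",
")\n",
"print(hits)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},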
{
"cell_type": "markdown",
"source": [
"# Testing with a Subset of 500 Images\n",
"\n",
"Each image will go through a three step process given below until ready for similarity search.\n",
"\n",
"
\n",
"\n",
"
DOWNLOAD -> ENCODE -> UPSERT
\n", "\n", "![]() | ![]() |
![]() | ![]() |
![]() | ![]() |
![]() | ![]() |