CallmeKaito committed on
Commit
649121a
•
1 Parent(s): 6903ffe

Upload 10 files

data/.DS_Store ADDED
Binary file (6.15 kB)
 
data/10106922982.jpeg ADDED
data/10111325994.jpeg ADDED
data/10113394119.jpeg ADDED
data/10119695953.jpeg ADDED
data/thuya.jpeg ADDED
notebooks/.DS_Store ADDED
Binary file (6.15 kB)
 
notebooks/CLIP (3).ipynb ADDED
@@ -0,0 +1,836 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {
7
+ "colab": {
8
+ "base_uri": "https://localhost:8080/"
9
+ },
10
+ "id": "sYaX1Rf8pCWN",
11
+ "outputId": "f52aaf57-323d-46ff-908f-f188525b830a",
12
+ "tags": []
13
+ },
14
+ "outputs": [
15
+ {
16
+ "name": "stdout",
17
+ "output_type": "stream",
18
+ "text": [
19
+ "Collecting ftfy\n",
20
+ " Downloading ftfy-6.2.0-py3-none-any.whl (54 kB)\n",
21
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 54 kB 3.5 MB/s eta 0:00:011\n",
22
+ "\u001b[?25hCollecting regex\n",
23
+ " Downloading regex-2024.5.15-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (774 kB)\n",
24
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 774 kB 4.9 MB/s eta 0:00:01\n",
25
+ "\u001b[?25hRequirement already satisfied: tqdm in /home/user/miniconda/lib/python3.9/site-packages (4.61.2)\n",
26
+ "Requirement already satisfied: wcwidth<0.3.0,>=0.2.12 in /home/user/miniconda/lib/python3.9/site-packages (from ftfy) (0.2.13)\n",
27
+ "Installing collected packages: regex, ftfy\n",
28
+ "Successfully installed ftfy-6.2.0 regex-2024.5.15\n",
29
+ "Collecting git+https://github.com/openai/CLIP.git\n",
30
+ " Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-7h9f8ksf\n",
31
+ " Running command git clone -q https://github.com/openai/CLIP.git /tmp/pip-req-build-7h9f8ksf\n",
32
+ "Requirement already satisfied: ftfy in /home/user/miniconda/lib/python3.9/site-packages (from clip==1.0) (6.2.0)\n",
33
+ "Requirement already satisfied: regex in /home/user/miniconda/lib/python3.9/site-packages (from clip==1.0) (2024.5.15)\n",
34
+ "Requirement already satisfied: tqdm in /home/user/miniconda/lib/python3.9/site-packages (from clip==1.0) (4.61.2)\n",
35
+ "Collecting torch\n",
36
+ " Downloading torch-2.3.0-cp39-cp39-manylinux1_x86_64.whl (779.1 MB)\n",
37
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 322.4 MB 155.1 MB/s eta 0:00:03"
38
+ ]
39
+ },
40
+ {
41
+ "name": "stderr",
42
+ "output_type": "stream",
43
+ "text": [
44
+ "IOPub data rate exceeded.\n",
45
+ "The Jupyter server will temporarily stop sending output\n",
46
+ "to the client in order to avoid crashing it.\n",
47
+ "To change this limit, set the config variable\n",
48
+ "`--ServerApp.iopub_data_rate_limit`.\n",
49
+ "\n",
50
+ "Current values:\n",
51
+ "ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)\n",
52
+ "ServerApp.rate_limit_window=3.0 (secs)\n",
53
+ "\n"
54
+ ]
55
+ },
56
+ {
57
+ "name": "stdout",
58
+ "output_type": "stream",
59
+ "text": [
60
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 726.2 MB 140.6 MB/s eta 0:00:01"
61
+ ]
62
+ },
63
+ {
64
+ "name": "stderr",
65
+ "output_type": "stream",
66
+ "text": [
67
+ "IOPub data rate exceeded.\n",
68
+ "The Jupyter server will temporarily stop sending output\n",
69
+ "to the client in order to avoid crashing it.\n",
70
+ "To change this limit, set the config variable\n",
71
+ "`--ServerApp.iopub_data_rate_limit`.\n",
72
+ "\n",
73
+ "Current values:\n",
74
+ "ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)\n",
75
+ "ServerApp.rate_limit_window=3.0 (secs)\n",
76
+ "\n"
77
+ ]
78
+ },
79
+ {
80
+ "name": "stdout",
81
+ "output_type": "stream",
82
+ "text": [
83
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 779.1 MB 39 kB/s \n",
84
+ "\u001b[?25hCollecting torchvision\n",
85
+ " Downloading torchvision-0.18.0-cp39-cp39-manylinux1_x86_64.whl (7.0 MB)\n",
86
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7.0 MB 117.1 MB/s eta 0:00:01\n",
87
+ "\u001b[?25hRequirement already satisfied: wcwidth<0.3.0,>=0.2.12 in /home/user/miniconda/lib/python3.9/site-packages (from ftfy->clip==1.0) (0.2.13)\n",
88
+ "Collecting filelock\n",
89
+ " Downloading filelock-3.14.0-py3-none-any.whl (12 kB)\n",
90
+ "Requirement already satisfied: jinja2 in /home/user/miniconda/lib/python3.9/site-packages (from torch->clip==1.0) (3.1.4)\n",
91
+ "Collecting nvidia-cuda-nvrtc-cu12==12.1.105\n",
92
+ " Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n",
93
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 23.7 MB 111.3 MB/s eta 0:00:01\n",
94
+ "\u001b[?25hCollecting nvidia-cudnn-cu12==8.9.2.26\n",
95
+ " Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n",
96
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 281.1 MB 157.5 MB/s eta 0:00:03"
97
+ ]
98
+ },
99
+ {
100
+ "name": "stderr",
101
+ "output_type": "stream",
102
+ "text": [
103
+ "IOPub data rate exceeded.\n",
104
+ "The Jupyter server will temporarily stop sending output\n",
105
+ "to the client in order to avoid crashing it.\n",
106
+ "To change this limit, set the config variable\n",
107
+ "`--ServerApp.iopub_data_rate_limit`.\n",
108
+ "\n",
109
+ "Current values:\n",
110
+ "ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)\n",
111
+ "ServerApp.rate_limit_window=3.0 (secs)\n",
112
+ "\n"
113
+ ]
114
+ },
115
+ {
116
+ "name": "stdout",
117
+ "output_type": "stream",
118
+ "text": [
119
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 687.7 MB 121.2 MB/s eta 0:00:01"
120
+ ]
121
+ },
122
+ {
123
+ "name": "stderr",
124
+ "output_type": "stream",
125
+ "text": [
126
+ "IOPub data rate exceeded.\n",
127
+ "The Jupyter server will temporarily stop sending output\n",
128
+ "to the client in order to avoid crashing it.\n",
129
+ "To change this limit, set the config variable\n",
130
+ "`--ServerApp.iopub_data_rate_limit`.\n",
131
+ "\n",
132
+ "Current values:\n",
133
+ "ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)\n",
134
+ "ServerApp.rate_limit_window=3.0 (secs)\n",
135
+ "\n"
136
+ ]
137
+ },
138
+ {
139
+ "name": "stdout",
140
+ "output_type": "stream",
141
+ "text": [
142
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 731.7 MB 27 kB/s \n",
143
+ "\u001b[?25hCollecting triton==2.3.0\n",
144
+ " Downloading triton-2.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (168.1 MB)\n",
145
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 168.1 MB 163.1 MB/s eta 0:00:01\n",
146
+ "\u001b[?25hCollecting nvidia-nccl-cu12==2.20.5\n",
147
+ " Downloading nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl (176.2 MB)\n",
148
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 176.2 MB 157 kB/s s eta 0:00:01\n",
149
+ "\u001b[?25hCollecting nvidia-cublas-cu12==12.1.3.1\n",
150
+ " Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n",
151
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 291.1 MB 155.6 MB/s eta 0:00:01"
152
+ ]
153
+ },
154
+ {
155
+ "name": "stderr",
156
+ "output_type": "stream",
157
+ "text": [
158
+ "IOPub data rate exceeded.\n",
159
+ "The Jupyter server will temporarily stop sending output\n",
160
+ "to the client in order to avoid crashing it.\n",
161
+ "To change this limit, set the config variable\n",
162
+ "`--ServerApp.iopub_data_rate_limit`.\n",
163
+ "\n",
164
+ "Current values:\n",
165
+ "ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)\n",
166
+ "ServerApp.rate_limit_window=3.0 (secs)\n",
167
+ "\n"
168
+ ]
169
+ },
170
+ {
171
+ "name": "stdout",
172
+ "output_type": "stream",
173
+ "text": [
174
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 410.6 MB 11 kB/s /s eta 0:00:01\n",
175
+ "\u001b[?25hCollecting nvidia-curand-cu12==10.3.2.106\n",
176
+ " Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n",
177
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 56.5 MB 125.6 MB/s eta 0:00:01β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 35.0 MB 125.6 MB/s eta 0:00:01\n",
178
+ "\u001b[?25hRequirement already satisfied: typing-extensions>=4.8.0 in /home/user/miniconda/lib/python3.9/site-packages (from torch->clip==1.0) (4.11.0)\n",
179
+ "Collecting nvidia-cusolver-cu12==11.4.5.107\n",
180
+ " Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n",
181
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 124.2 MB 144.5 MB/s eta 0:00:01\n",
182
+ "\u001b[?25hCollecting sympy\n",
183
+ " Downloading sympy-1.12-py3-none-any.whl (5.7 MB)\n",
184
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5.7 MB 109.2 MB/s eta 0:00:01\n",
185
+ "\u001b[?25hCollecting fsspec\n",
186
+ " Downloading fsspec-2024.5.0-py3-none-any.whl (316 kB)\n",
187
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 316 kB 119.1 MB/s eta 0:00:01\n",
188
+ "\u001b[?25hCollecting nvidia-cuda-runtime-cu12==12.1.105\n",
189
+ " Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n",
190
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 823 kB 119.5 MB/s eta 0:00:01\n",
191
+ "\u001b[?25hCollecting nvidia-cuda-cupti-cu12==12.1.105\n",
192
+ " Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n",
193
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 14.1 MB 126.1 MB/s eta 0:00:01\n",
194
+ "\u001b[?25hCollecting nvidia-cufft-cu12==11.0.2.54\n",
195
+ " Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n",
196
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 121.6 MB 4.8 MB/s eta 0:00:011\n",
197
+ "\u001b[?25hCollecting networkx\n",
198
+ " Downloading networkx-3.2.1-py3-none-any.whl (1.6 MB)\n",
199
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.6 MB 112.8 MB/s eta 0:00:01\n",
200
+ "\u001b[?25hCollecting nvidia-cusparse-cu12==12.1.0.106\n",
201
+ " Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n",
202
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 196.0 MB 154.4 MB/s eta 0:00:01\n",
203
+ "\u001b[?25hCollecting nvidia-nvtx-cu12==12.1.105\n",
204
+ " Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n",
205
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 99 kB 39.0 MB/s eta 0:00:01\n",
206
+ "\u001b[?25hCollecting nvidia-nvjitlink-cu12\n",
207
+ " Downloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)\n",
208
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 21.1 MB 123.7 MB/s eta 0:00:01\n",
209
+ "\u001b[?25hRequirement already satisfied: MarkupSafe>=2.0 in /home/user/miniconda/lib/python3.9/site-packages (from jinja2->torch->clip==1.0) (2.1.5)\n",
210
+ "Collecting mpmath>=0.19\n",
211
+ " Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)\n",
212
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 536 kB 125.5 MB/s eta 0:00:01\n",
213
+ "\u001b[?25hCollecting pillow!=8.3.*,>=5.3.0\n",
214
+ " Downloading pillow-10.3.0-cp39-cp39-manylinux_2_28_x86_64.whl (4.5 MB)\n",
215
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.5 MB 123.5 MB/s eta 0:00:01\n",
216
+ "\u001b[?25hCollecting numpy\n",
217
+ " Downloading numpy-1.26.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)\n",
218
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 18.2 MB 113.2 MB/s eta 0:00:01 | 1.1 MB 113.2 MB/s eta 0:00:01\n",
219
+ "\u001b[?25hBuilding wheels for collected packages: clip\n",
220
+ " Building wheel for clip (setup.py) ... \u001b[?25ldone\n",
221
+ "\u001b[?25h Created wheel for clip: filename=clip-1.0-py3-none-any.whl size=1369525 sha256=2d16eeced15e3729c52334f9be57fd2ddca900110e745c1af86ab5aade88cd62\n",
222
+ " Stored in directory: /tmp/pip-ephem-wheel-cache-8vr04co8/wheels/c8/e4/e1/11374c111387672fc2068dfbe0d4b424cb9cdd1b2e184a71b5\n",
223
+ "Successfully built clip\n",
224
+ "Installing collected packages: nvidia-nvjitlink-cu12, nvidia-cusparse-cu12, nvidia-cublas-cu12, mpmath, filelock, triton, sympy, nvidia-nvtx-cu12, nvidia-nccl-cu12, nvidia-cusolver-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, networkx, fsspec, torch, pillow, numpy, torchvision, clip\n",
225
+ "Successfully installed clip-1.0 filelock-3.14.0 fsspec-2024.5.0 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.4.127 nvidia-nvtx-cu12-12.1.105 pillow-10.3.0 sympy-1.12 torch-2.3.0 torchvision-0.18.0 triton-2.3.0\n",
226
+ "\u001b[33mWARNING: Requirement 'sentencepiece-0.1.98-cp311-cp311-win_amd64.whl' looks like a filename, but the file does not exist\u001b[0m\n",
227
+ "\u001b[31mERROR: sentencepiece-0.1.98-cp311-cp311-win_amd64.whl is not a supported wheel on this platform.\u001b[0m\n"
228
+ ]
229
+ }
230
+ ],
231
+ "source": [
232
+ "!pip install ftfy regex tqdm\n",
233
+ "!pip install git+https://github.com/openai/CLIP.git\n",
234
+ "!pip install sentencepiece-0.1.98-cp311-cp311-win_amd64.whl\n",
235
+ "\n"
236
+ ]
237
+ },
238
+ {
239
+ "cell_type": "code",
240
+ "execution_count": 5,
241
+ "metadata": {
242
+ "colab": {
243
+ "base_uri": "https://localhost:8080/"
244
+ },
245
+ "id": "Zuat0Supqs7r",
246
+ "outputId": "f3ec0a32-0d58-4241-d3f2-621828297c43",
247
+ "tags": []
248
+ },
249
+ "outputs": [
250
+ {
251
+ "name": "stdout",
252
+ "output_type": "stream",
253
+ "text": [
254
+ "Collecting transformers\n",
255
+ " Downloading transformers-4.41.0-py3-none-any.whl (9.1 MB)\n",
256
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 9.1 MB 4.3 MB/s eta 0:00:01\n",
257
+ "\u001b[?25hRequirement already satisfied: tqdm>=4.27 in /home/user/miniconda/lib/python3.9/site-packages (from transformers) (4.61.2)\n",
258
+ "Collecting tokenizers<0.20,>=0.19\n",
259
+ " Downloading tokenizers-0.19.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)\n",
260
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3.6 MB 104.9 MB/s eta 0:00:01\n",
261
+ "\u001b[?25hRequirement already satisfied: pyyaml>=5.1 in /home/user/miniconda/lib/python3.9/site-packages (from transformers) (6.0.1)\n",
262
+ "Requirement already satisfied: filelock in /home/user/miniconda/lib/python3.9/site-packages (from transformers) (3.14.0)\n",
263
+ "Requirement already satisfied: numpy>=1.17 in /home/user/miniconda/lib/python3.9/site-packages (from transformers) (1.26.4)\n",
264
+ "Collecting huggingface-hub<1.0,>=0.23.0\n",
265
+ " Downloading huggingface_hub-0.23.0-py3-none-any.whl (401 kB)\n",
266
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 401 kB 120.0 MB/s eta 0:00:01\n",
267
+ "\u001b[?25hRequirement already satisfied: packaging>=20.0 in /home/user/miniconda/lib/python3.9/site-packages (from transformers) (24.0)\n",
268
+ "Requirement already satisfied: regex!=2019.12.17 in /home/user/miniconda/lib/python3.9/site-packages (from transformers) (2024.5.15)\n",
269
+ "Requirement already satisfied: requests in /home/user/miniconda/lib/python3.9/site-packages (from transformers) (2.31.0)\n",
270
+ "Collecting safetensors>=0.4.1\n",
271
+ " Downloading safetensors-0.4.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)\n",
272
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.2 MB 95.0 MB/s eta 0:00:01\n",
273
+ "\u001b[?25hRequirement already satisfied: typing-extensions>=3.7.4.3 in /home/user/miniconda/lib/python3.9/site-packages (from huggingface-hub<1.0,>=0.23.0->transformers) (4.11.0)\n",
274
+ "Requirement already satisfied: fsspec>=2023.5.0 in /home/user/miniconda/lib/python3.9/site-packages (from huggingface-hub<1.0,>=0.23.0->transformers) (2024.5.0)\n",
275
+ "Requirement already satisfied: certifi>=2017.4.17 in /home/user/miniconda/lib/python3.9/site-packages (from requests->transformers) (2021.5.30)\n",
276
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /home/user/miniconda/lib/python3.9/site-packages (from requests->transformers) (3.3.2)\n",
277
+ "Requirement already satisfied: idna<4,>=2.5 in /home/user/miniconda/lib/python3.9/site-packages (from requests->transformers) (2.10)\n",
278
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/user/miniconda/lib/python3.9/site-packages (from requests->transformers) (1.26.6)\n",
279
+ "Installing collected packages: huggingface-hub, tokenizers, safetensors, transformers\n",
280
+ "Successfully installed huggingface-hub-0.23.0 safetensors-0.4.3 tokenizers-0.19.1 transformers-4.41.0\n"
281
+ ]
282
+ }
283
+ ],
284
+ "source": [
285
+ "# prompt: install transformers\n",
286
+ "\n",
287
+ "!pip install transformers\n"
288
+ ]
289
+ },
290
+ {
291
+ "cell_type": "code",
292
+ "execution_count": 6,
293
+ "metadata": {
294
+ "id": "8xOP6veIq5LM",
295
+ "tags": []
296
+ },
297
+ "outputs": [
298
+ {
299
+ "data": {
300
+ "application/json": {
301
+ "ascii": false,
302
+ "bar_format": null,
303
+ "colour": null,
304
+ "elapsed": 0.0066907405853271484,
305
+ "initial": 0,
306
+ "n": 0,
307
+ "ncols": null,
308
+ "nrows": null,
309
+ "postfix": null,
310
+ "prefix": "preprocessor_config.json",
311
+ "rate": null,
312
+ "total": 228,
313
+ "unit": "B",
314
+ "unit_divisor": 1000,
315
+ "unit_scale": true
316
+ },
317
+ "application/vnd.jupyter.widget-view+json": {
318
+ "model_id": "d43500a3f8b1440baaaf1337fd547030",
319
+ "version_major": 2,
320
+ "version_minor": 0
321
+ },
322
+ "text/plain": [
323
+ "preprocessor_config.json: 0%| | 0.00/228 [00:00<?, ?B/s]"
324
+ ]
325
+ },
326
+ "metadata": {},
327
+ "output_type": "display_data"
328
+ },
329
+ {
330
+ "name": "stderr",
331
+ "output_type": "stream",
332
+ "text": [
333
+ "/home/user/miniconda/lib/python3.9/site-packages/transformers/models/vit/feature_extraction_vit.py:28: FutureWarning: The class ViTFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ViTImageProcessor instead.\n",
334
+ " warnings.warn(\n"
335
+ ]
336
+ },
337
+ {
338
+ "data": {
339
+ "application/json": {
340
+ "ascii": false,
341
+ "bar_format": null,
342
+ "colour": null,
343
+ "elapsed": 0.004696846008300781,
344
+ "initial": 0,
345
+ "n": 0,
346
+ "ncols": null,
347
+ "nrows": null,
348
+ "postfix": null,
349
+ "prefix": "tokenizer_config.json",
350
+ "rate": null,
351
+ "total": 241,
352
+ "unit": "B",
353
+ "unit_divisor": 1000,
354
+ "unit_scale": true
355
+ },
356
+ "application/vnd.jupyter.widget-view+json": {
357
+ "model_id": "bf4f06b628644ec8a638e5f32bd00324",
358
+ "version_major": 2,
359
+ "version_minor": 0
360
+ },
361
+ "text/plain": [
362
+ "tokenizer_config.json: 0%| | 0.00/241 [00:00<?, ?B/s]"
363
+ ]
364
+ },
365
+ "metadata": {},
366
+ "output_type": "display_data"
367
+ },
368
+ {
369
+ "data": {
370
+ "application/json": {
371
+ "ascii": false,
372
+ "bar_format": null,
373
+ "colour": null,
374
+ "elapsed": 0.004175662994384766,
375
+ "initial": 0,
376
+ "n": 0,
377
+ "ncols": null,
378
+ "nrows": null,
379
+ "postfix": null,
380
+ "prefix": "vocab.json",
381
+ "rate": null,
382
+ "total": 798156,
383
+ "unit": "B",
384
+ "unit_divisor": 1000,
385
+ "unit_scale": true
386
+ },
387
+ "application/vnd.jupyter.widget-view+json": {
388
+ "model_id": "ffc926da2aa540f2a1760c3bb4fb4909",
389
+ "version_major": 2,
390
+ "version_minor": 0
391
+ },
392
+ "text/plain": [
393
+ "vocab.json: 0%| | 0.00/798k [00:00<?, ?B/s]"
394
+ ]
395
+ },
396
+ "metadata": {},
397
+ "output_type": "display_data"
398
+ },
399
+ {
400
+ "data": {
401
+ "application/json": {
402
+ "ascii": false,
403
+ "bar_format": null,
404
+ "colour": null,
405
+ "elapsed": 0.004157304763793945,
406
+ "initial": 0,
407
+ "n": 0,
408
+ "ncols": null,
409
+ "nrows": null,
410
+ "postfix": null,
411
+ "prefix": "merges.txt",
412
+ "rate": null,
413
+ "total": 456356,
414
+ "unit": "B",
415
+ "unit_divisor": 1000,
416
+ "unit_scale": true
417
+ },
418
+ "application/vnd.jupyter.widget-view+json": {
419
+ "model_id": "302ae34c419d484a9b16e025d6d2690b",
420
+ "version_major": 2,
421
+ "version_minor": 0
422
+ },
423
+ "text/plain": [
424
+ "merges.txt: 0%| | 0.00/456k [00:00<?, ?B/s]"
425
+ ]
426
+ },
427
+ "metadata": {},
428
+ "output_type": "display_data"
429
+ },
430
+ {
431
+ "data": {
432
+ "application/json": {
433
+ "ascii": false,
434
+ "bar_format": null,
435
+ "colour": null,
436
+ "elapsed": 0.004187107086181641,
437
+ "initial": 0,
438
+ "n": 0,
439
+ "ncols": null,
440
+ "nrows": null,
441
+ "postfix": null,
442
+ "prefix": "tokenizer.json",
443
+ "rate": null,
444
+ "total": 1355446,
445
+ "unit": "B",
446
+ "unit_divisor": 1000,
447
+ "unit_scale": true
448
+ },
449
+ "application/vnd.jupyter.widget-view+json": {
450
+ "model_id": "de2f6cacd09a43c98c06cf4e4243c7c7",
451
+ "version_major": 2,
452
+ "version_minor": 0
453
+ },
454
+ "text/plain": [
455
+ "tokenizer.json: 0%| | 0.00/1.36M [00:00<?, ?B/s]"
456
+ ]
457
+ },
458
+ "metadata": {},
459
+ "output_type": "display_data"
460
+ },
461
+ {
462
+ "data": {
463
+ "application/json": {
464
+ "ascii": false,
465
+ "bar_format": null,
466
+ "colour": null,
467
+ "elapsed": 0.004050254821777344,
468
+ "initial": 0,
469
+ "n": 0,
470
+ "ncols": null,
471
+ "nrows": null,
472
+ "postfix": null,
473
+ "prefix": "special_tokens_map.json",
474
+ "rate": null,
475
+ "total": 120,
476
+ "unit": "B",
477
+ "unit_divisor": 1000,
478
+ "unit_scale": true
479
+ },
480
+ "application/vnd.jupyter.widget-view+json": {
481
+ "model_id": "c4921bf4d08d4156a1904fabe261235c",
482
+ "version_major": 2,
483
+ "version_minor": 0
484
+ },
485
+ "text/plain": [
486
+ "special_tokens_map.json: 0%| | 0.00/120 [00:00<?, ?B/s]"
487
+ ]
488
+ },
489
+ "metadata": {},
490
+ "output_type": "display_data"
491
+ },
492
+ {
493
+ "data": {
494
+ "application/json": {
495
+ "ascii": false,
496
+ "bar_format": null,
497
+ "colour": null,
498
+ "elapsed": 0.004579067230224609,
499
+ "initial": 0,
500
+ "n": 0,
501
+ "ncols": null,
502
+ "nrows": null,
503
+ "postfix": null,
504
+ "prefix": "config.json",
505
+ "rate": null,
506
+ "total": 4609,
507
+ "unit": "B",
508
+ "unit_divisor": 1000,
509
+ "unit_scale": true
510
+ },
511
+ "application/vnd.jupyter.widget-view+json": {
512
+ "model_id": "2c6081497e1542ab9f86e1f763a46101",
513
+ "version_major": 2,
514
+ "version_minor": 0
515
+ },
516
+ "text/plain": [
517
+ "config.json: 0%| | 0.00/4.61k [00:00<?, ?B/s]"
518
+ ]
519
+ },
520
+ "metadata": {},
521
+ "output_type": "display_data"
522
+ },
523
+ {
524
+ "data": {
525
+ "application/json": {
526
+ "ascii": false,
527
+ "bar_format": null,
528
+ "colour": null,
529
+ "elapsed": 0.0045909881591796875,
530
+ "initial": 0,
531
+ "n": 0,
532
+ "ncols": null,
533
+ "nrows": null,
534
+ "postfix": null,
535
+ "prefix": "pytorch_model.bin",
536
+ "rate": null,
537
+ "total": 982141993,
538
+ "unit": "B",
539
+ "unit_divisor": 1000,
540
+ "unit_scale": true
541
+ },
542
+ "application/vnd.jupyter.widget-view+json": {
543
+ "model_id": "eafcdd2e978a42659bef0a50f82a7055",
544
+ "version_major": 2,
545
+ "version_minor": 0
546
+ },
547
+ "text/plain": [
548
+ "pytorch_model.bin: 0%| | 0.00/982M [00:00<?, ?B/s]"
549
+ ]
550
+ },
551
+ "metadata": {},
552
+ "output_type": "display_data"
553
+ }
554
+ ],
555
+ "source": [
556
+ "from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, AutoTokenizer\n",
557
+ "\n",
558
+ "\n",
559
+ "feature_extractor = ViTFeatureExtractor.from_pretrained(\"nlpconnect/vit-gpt2-image-captioning\")\n",
560
+ "tokenizer = AutoTokenizer.from_pretrained(\"nlpconnect/vit-gpt2-image-captioning\")\n",
561
+ "model = VisionEncoderDecoderModel.from_pretrained(\"nlpconnect/vit-gpt2-image-captioning\")"
562
+ ]
563
+ },
564
+ {
565
+ "cell_type": "markdown",
566
+ "metadata": {
567
+ "id": "uYLlkIWgqGwX"
568
+ },
569
+ "source": [
570
+ "## Import the necessary libraries and load the CLIP model:"
571
+ ]
572
+ },
573
+ {
574
+ "cell_type": "code",
575
+ "execution_count": 7,
576
+ "metadata": {
577
+ "id": "dLxPnrUQqDZU",
578
+ "tags": []
579
+ },
580
+ "outputs": [
581
+ {
582
+ "name": "stderr",
583
+ "output_type": "stream",
584
+ "text": [
585
+ "100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 338M/338M [00:12<00:00, 28.0MiB/s]\n"
586
+ ]
587
+ }
588
+ ],
589
+ "source": [
590
+ "from PIL import Image\n",
591
+ "import clip\n",
592
+ "import torch\n",
593
+ "\n",
594
+ "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
595
+ "clip_model, preprocess = clip.load(\"ViT-B/32\", device=device)"
596
+ ]
597
+ },
598
+ {
599
+ "cell_type": "markdown",
600
+ "metadata": {
601
+ "id": "Gt1Q-d1iqM9F"
602
+ },
603
+ "source": [
604
+ "## Define a function to generate product descriptions:"
605
+ ]
606
+ },
607
+ {
608
+ "cell_type": "code",
609
+ "execution_count": 8,
610
+ "metadata": {
611
+ "id": "u2XdvaffqGMr",
612
+ "tags": []
613
+ },
614
+ "outputs": [
615
+ {
616
+ "name": "stderr",
617
+ "output_type": "stream",
618
+ "text": [
619
+ "We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.\n",
620
+ "You may ignore this warning if your `pad_token_id` (50256) is identical to the `bos_token_id` (50256), `eos_token_id` (50256), or the `sep_token_id` (None), and your input is not padded.\n"
621
+ ]
622
+ }
623
+ ],
624
+ "source": [
625
+ "image = Image.open(\"data/download.jpeg\")\n",
626
+ "pixel_values = feature_extractor(images=image, return_tensors=\"pt\").pixel_values\n",
627
+ "output_ids = model.generate(pixel_values, max_length=50, num_beams=4, early_stopping=True)\n",
628
+ "captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)"
629
+ ]
630
+ },
631
+ {
632
+ "cell_type": "code",
633
+ "execution_count": 9,
634
+ "metadata": {
635
+ "colab": {
636
+ "base_uri": "https://localhost:8080/"
637
+ },
638
+ "id": "lOf9lcUAqVlm",
639
+ "outputId": "d00cdc05-6652-4fba-b40c-03ad803d54e3",
640
+ "tags": []
641
+ },
642
+ "outputs": [
643
+ {
644
+ "name": "stdout",
645
+ "output_type": "stream",
646
+ "text": [
647
+ "a vase sitting on top of a table \n"
648
+ ]
649
+ }
650
+ ],
651
+ "source": [
652
+ "image = preprocess(image).unsqueeze(0).to(device)\n",
653
+ "with torch.no_grad():\n",
654
+ " image_features = clip_model.encode_image(image)\n",
655
+ "\n",
656
+ "text_inputs = torch.cat([clip.tokenize(caption).to(device) for caption in captions]).to(device)\n",
657
+ "with torch.no_grad():\n",
658
+ " text_features = clip_model.encode_text(text_inputs)\n",
659
+ "\n",
660
+ "similarity_scores = image_features @ text_features.T\n",
661
+ "best_caption_idx = similarity_scores.argmax().item()\n",
662
+ "product_description = captions[best_caption_idx]\n",
663
+ "print(product_description)"
664
+ ]
665
+ },
666
+ {
667
+ "cell_type": "markdown",
668
+ "metadata": {
669
+ "id": "RM6RXXvT4xSN"
670
+ },
671
+ "source": [
672
+ "# Using SigLip"
673
+ ]
674
+ },
675
+ {
676
+ "cell_type": "code",
677
+ "execution_count": 11,
678
+ "metadata": {
679
+ "tags": []
680
+ },
681
+ "outputs": [
682
+ {
683
+ "name": "stdout",
684
+ "output_type": "stream",
685
+ "text": [
686
+ "Collecting protobuf\n",
687
+ " Downloading protobuf-5.26.1-cp37-abi3-manylinux2014_x86_64.whl (302 kB)\n",
688
+ "\u001b[K |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 302 kB 4.3 MB/s eta 0:00:01\n",
689
+ "\u001b[?25hInstalling collected packages: protobuf\n",
690
+ "Successfully installed protobuf-5.26.1\n"
691
+ ]
692
+ }
693
+ ],
694
+ "source": [
695
+ "!pip install sentencepiece\n",
696
+ "!pip install protobuf"
697
+ ]
698
+ },
699
+ {
700
+ "cell_type": "code",
701
+ "execution_count": 12,
702
+ "metadata": {
703
+ "colab": {
704
+ "base_uri": "https://localhost:8080/"
705
+ },
706
+ "id": "fR9c1mv3qXGz",
707
+ "outputId": "5b222c53-e0f8-4545-f191-ad6a90ab1373",
708
+ "tags": []
709
+ },
710
+ "outputs": [
711
+ {
712
+ "name": "stderr",
713
+ "output_type": "stream",
714
+ "text": [
715
+ "/home/user/miniconda/lib/python3.9/site-packages/transformers/models/vit/feature_extraction_vit.py:28: FutureWarning: The class ViTFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ViTImageProcessor instead.\n",
716
+ " warnings.warn(\n"
717
+ ]
718
+ },
719
+ {
720
+ "name": "stdout",
721
+ "output_type": "stream",
722
+ "text": [
723
+ "an old fashioned clock sitting on top of a table \n"
724
+ ]
725
+ }
726
+ ],
727
+ "source": [
728
+ "from transformers import AutoProcessor, AutoModel, VisionEncoderDecoderModel, ViTFeatureExtractor, AutoTokenizer\n",
729
+ "import torch\n",
730
+ "from PIL import Image\n",
731
+ "\n",
732
+ "\n",
733
+ "model = AutoModel.from_pretrained(\"google/siglip-base-patch16-224\")\n",
734
+ "processor = AutoProcessor.from_pretrained(\"google/siglip-base-patch16-224\")\n",
735
+ "\n",
736
+ "\n",
737
+ "image = Image.open(\"data/avito4.jpeg\")\n",
738
+ "inputs = processor(images=image, return_tensors=\"pt\")\n",
739
+ "\n",
740
+ "\n",
741
+ "feature_extractor = ViTFeatureExtractor.from_pretrained(\"nlpconnect/vit-gpt2-image-captioning\")\n",
742
+ "tokenizer = AutoTokenizer.from_pretrained(\"nlpconnect/vit-gpt2-image-captioning\")\n",
743
+ "model = VisionEncoderDecoderModel.from_pretrained(\"nlpconnect/vit-gpt2-image-captioning\")\n",
744
+ "\n",
745
+ "pixel_values = feature_extractor(images=image, return_tensors=\"pt\").pixel_values\n",
746
+ "output_ids = model.generate(pixel_values, max_length=100, num_beams=5, early_stopping=True)\n",
747
+ "captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)\n",
748
+ "\n",
749
+ "image = preprocess(image).unsqueeze(0).to(device)\n",
750
+ "with torch.no_grad():\n",
751
+ " image_features = clip_model.encode_image(image)\n",
752
+ "\n",
753
+ "text_inputs = torch.cat([clip.tokenize(caption).to(device) for caption in captions]).to(device)\n",
754
+ "with torch.no_grad():\n",
755
+ " text_features = clip_model.encode_text(text_inputs)\n",
756
+ "\n",
757
+ "similarity_scores = image_features @ text_features.T\n",
758
+ "best_caption_idx = similarity_scores.argmax().item()\n",
759
+ "product_description = captions[best_caption_idx]\n",
760
+ "print(product_description)\n",
761
+ "\n",
762
+ "# a vase sitting on a shelf in a store => thuya\n",
763
+ "# a wooden bench sitting on top of a wooden floor => avito\n",
764
+ "## two old fashioned vases sitting next to each other => avito2\n",
765
+ "## three wooden vases sitting on top of a wooden floor => avito3\n",
766
+ "# an old fashioned clock sitting on top of a table => avito4\n",
767
+ "\n"
768
+ ]
769
+ },
770
+ {
771
+ "cell_type": "code",
772
+ "execution_count": null,
773
+ "metadata": {
774
+ "colab": {
775
+ "base_uri": "https://localhost:8080/"
776
+ },
777
+ "id": "fR9c1mv3qXGz",
778
+ "outputId": "5b222c53-e0f8-4545-f191-ad6a90ab1373",
779
+ "tags": []
780
+ },
781
+ "outputs": [],
782
+ "source": []
783
+ },
784
+ {
785
+ "cell_type": "markdown",
786
+ "metadata": {
787
+ "id": "qRkGmKyYB7DM"
788
+ },
789
+ "source": [
790
+ "# Implemeting LLaVa"
791
+ ]
792
+ },
793
+ {
794
+ "cell_type": "markdown",
795
+ "metadata": {
796
+ "id": "u6jq8q__zoOt"
797
+ },
798
+ "source": [
799
+ "https://colab.research.google.com/drive/1veefV17NcD1S4ou4nF8ABkfm8-TgU0Dr#scrollTo=XN2vJCPZk1UY"
800
+ ]
801
+ },
802
+ {
803
+ "cell_type": "code",
804
+ "execution_count": null,
805
+ "metadata": {
806
+ "id": "QyO2UcBjzl71"
807
+ },
808
+ "outputs": [],
809
+ "source": []
810
+ }
811
+ ],
812
+ "metadata": {
813
+ "colab": {
814
+ "provenance": []
815
+ },
816
+ "kernelspec": {
817
+ "display_name": "Python 3 (ipykernel)",
818
+ "language": "python",
819
+ "name": "python3"
820
+ },
821
+ "language_info": {
822
+ "codemirror_mode": {
823
+ "name": "ipython",
824
+ "version": 3
825
+ },
826
+ "file_extension": ".py",
827
+ "mimetype": "text/x-python",
828
+ "name": "python",
829
+ "nbconvert_exporter": "python",
830
+ "pygments_lexer": "ipython3",
831
+ "version": "3.9.5"
832
+ }
833
+ },
834
+ "nbformat": 4,
835
+ "nbformat_minor": 4
836
+ }
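
The notebook above assembles a caption-then-rerank pipeline in pieces: a ViT-GPT2 captioner proposes a caption and CLIP scores it against the image, while the "Using SigLip" cell additionally loads a SigLIP checkpoint but never scores with it. Below is a minimal consolidated sketch of that pipeline, not the notebook's code: it assumes the same checkpoints (nlpconnect/vit-gpt2-image-captioning, CLIP ViT-B/32, google/siglip-base-patch16-224), swaps the deprecated ViTFeatureExtractor for ViTImageProcessor as the FutureWarning in the output suggests, and uses hypothetical helper names (generate_captions, rank_with_clip, rank_with_siglip).

```python
import torch
import clip
from PIL import Image
from transformers import (
    AutoModel,
    AutoProcessor,
    AutoTokenizer,
    ViTImageProcessor,
    VisionEncoderDecoderModel,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Captioner (same checkpoint as the notebook). ViTImageProcessor replaces the
# deprecated ViTFeatureExtractor that triggers the FutureWarning above.
image_processor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
captioner = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning").to(device)

# CLIP re-ranker, as in the notebook.
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)


def generate_captions(image: Image.Image, num_candidates: int = 4) -> list:
    """Beam-search several candidate captions instead of decoding a single one."""
    pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(device)
    output_ids = captioner.generate(
        pixel_values,
        max_length=50,
        num_beams=num_candidates,
        num_return_sequences=num_candidates,
        early_stopping=True,
    )
    return [c.strip() for c in tokenizer.batch_decode(output_ids, skip_special_tokens=True)]


def rank_with_clip(image: Image.Image, captions: list) -> str:
    """Keep the caption whose CLIP text embedding best matches the image."""
    with torch.no_grad():
        image_features = clip_model.encode_image(clip_preprocess(image).unsqueeze(0).to(device))
        text_features = clip_model.encode_text(clip.tokenize(captions).to(device))
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        scores = image_features @ text_features.T
    return captions[scores.argmax().item()]


def rank_with_siglip(image: Image.Image, captions: list) -> str:
    """Same idea with SigLIP, which the 'Using SigLip' cell loads but never calls."""
    siglip = AutoModel.from_pretrained("google/siglip-base-patch16-224")  # loaded lazily to keep the sketch short
    siglip_processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")
    inputs = siglip_processor(text=captions, images=image, padding="max_length", return_tensors="pt")
    with torch.no_grad():
        logits_per_image = siglip(**inputs).logits_per_image  # shape [1, num_captions]
    return captions[logits_per_image.argmax().item()]


image = Image.open("data/thuya.jpeg").convert("RGB")  # one of the images uploaded in this commit
captions = generate_captions(image)
print(rank_with_clip(image, captions))    # e.g. "a vase sitting on a shelf in a store" (per the notebook's comments)
print(rank_with_siglip(image, captions))
```

Requesting several beam candidates (num_return_sequences equal to num_beams) is what gives the re-ranking step something to choose between; with a single decoded caption, as in the cells above, the argmax over similarity scores always returns that one caption.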
notebooks/LLaVa (1).ipynb ADDED
The diff for this file is too large to render. See raw diff
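
The LLaVa notebook itself is too large to render here, and the "Implementing LLaVa" section of the CLIP notebook only links out to a Colab. As a reference point only, a minimal, hypothetical sketch of captioning an image with a LLaVA checkpoint through transformers could look like the following; the checkpoint name, prompt template, and image path are assumptions, not taken from the notebooks.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed checkpoint and LLaVA-1.5-style prompt; the 7B model in float16
# needs a GPU with roughly 16 GB of memory.
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("data/thuya.jpeg").convert("RGB")  # image uploaded in this commit
prompt = "USER: <image>\nDescribe this product for an online listing. ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=80)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```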
 
notebooks/SAM (1).ipynb ADDED
The diff for this file is too large to render. See raw diff