Commit 7e2b395
Parent(s): acc0a44
Adding notebooks
notebooks/TGI-benchmark.ipynb
ADDED (the diff for this file is too large to render; see raw diff)

notebooks/TGI-launcher.ipynb
ADDED (the diff for this file is too large to render; see raw diff)
notebooks/jais_tgi_inference_endpoints.ipynb
DELETED
@@ -1,420 +0,0 @@
# Introduction
Please check out my [blog post](https://datavistics.github.io/posts/jais-inference-endpoints/) for more details!

# Setup

## Requirements

```python
%pip install -q "huggingface-hub>=0.20" ipywidgets
```

    [notice] A new release of pip is available: 23.2.1 -> 23.3.2
    [notice] To update, run: pip install --upgrade pip
    Note: you may need to restart the kernel to use updated packages.

## Imports

```python
from huggingface_hub import login, whoami, create_inference_endpoint
from getpass import getpass
```

## Config
Choose your `ENDPOINT_NAME` if you like.

```python
ENDPOINT_NAME = "jais13b-demo"
```

```python
login()
```

Some users might have payment registered in an organization. This allows you to bill an organization (that you are a member of) that has a payment method attached.

Leave it blank if you want to use your username.

```python
who = whoami()
organization = getpass(prompt="What is your Hugging Face 🤗 username or organization? (with an added payment method)")

namespace = organization or who['name']
```
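The namespace fallback above is just a truthiness check on the entered string. A minimal, testable sketch of that logic, with `who` standing in for the dict returned by `whoami()` (the function name here is illustrative, not part of the notebook):

```python
# Sketch of the namespace fallback: prefer the organization if one was
# entered, otherwise fall back to the username reported by whoami().
def resolve_namespace(organization: str, who: dict) -> str:
    return organization or who["name"]
```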
# Inference Endpoints
## Create Inference Endpoint
We are going to use the [API](https://huggingface.co/docs/inference-endpoints/api_reference) to create an [Inference Endpoint](https://huggingface.co/inference-endpoints). This provides a few main benefits:
- It's convenient (no clicking)
- It's repeatable (we have the code to run it easily)
- It's cheaper (no time spent waiting for it to load, and it can be shut down automatically)

Here is a convenient table of instance details you can use when selecting a GPU. Once you have chosen a GPU in Inference Endpoints, you can use the corresponding `instanceType` and `instanceSize`.

| hw_desc             | instanceType   | instanceSize | vRAM  |
|---------------------|----------------|--------------|-------|
| 1x Nvidia Tesla T4  | g4dn.xlarge    | small        | 16GB  |
| 4x Nvidia Tesla T4  | g4dn.12xlarge  | large        | 64GB  |
| 1x Nvidia A10G      | g5.2xlarge     | medium       | 24GB  |
| 4x Nvidia A10G      | g5.12xlarge    | xxlarge      | 96GB  |
| 1x Nvidia A100      | p4de           | xlarge       | 80GB  |
| 2x Nvidia A100      | p4de           | 2xlarge      | 160GB |

Note: to use a node (multiple GPUs) you will need a sharded version of jais. I'm not sure whether such a version currently exists on the Hub.

```python
hw_dict = dict(
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_type="p4de",
    instance_size="xlarge",
)
```

```python
tgi_env = {
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_INPUT_LENGTH": "2000",
    "TRUST_REMOTE_CODE": "true",
    "QUANTIZE": "bitsandbytes",
    "MODEL_ID": "/repository",
}
```
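Since these env values are plain strings, a small sanity check before launching can catch inconsistent settings early. A sketch of such a check; the constraint shown (input length not exceeding the prefill token budget) is my reading of the values above, not an official TGI rule, and the helper name is hypothetical:

```python
def check_tgi_env(env: dict) -> list:
    # Collect simple consistency problems in the TGI env settings.
    problems = []
    max_input = int(env.get("MAX_INPUT_LENGTH", 0))
    max_prefill = int(env.get("MAX_BATCH_PREFILL_TOKENS", 0))
    if max_input > max_prefill:
        problems.append("MAX_INPUT_LENGTH exceeds MAX_BATCH_PREFILL_TOKENS")
    if "MODEL_ID" not in env:
        problems.append("MODEL_ID missing")
    return problems
```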
A couple of notes on my choices here:
- I used `derek-thomas/jais-13b-chat-hf` because that repo has the SafeTensors weights merged, which leads to faster loading of the TGI container
- I'm using the latest TGI container as of the time of writing (1.3.4)
- `min_replica=0` allows [zero scaling](https://huggingface.co/docs/inference-endpoints/autoscaling#scaling-to-0), which is really useful for your wallet, though think through whether it makes sense for your use case, as there will be loading times after scale-up
- `max_replica` lets you handle high throughput; make sure you read the [docs](https://huggingface.co/docs/inference-endpoints/autoscaling#scaling-criteria) to understand how this scales

```python
endpoint = create_inference_endpoint(
    ENDPOINT_NAME,
    repository="derek-thomas/jais-13b-chat-hf",
    framework="pytorch",
    task="text-generation",
    **hw_dict,
    min_replica=0,
    max_replica=1,
    namespace=namespace,
    custom_image={
        "health_route": "/health",
        "env": tgi_env,
        "url": "ghcr.io/huggingface/text-generation-inference:1.3.4",
    },
)
```

## Wait until it's running

```python
%%time
endpoint.wait()
```
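`endpoint.wait()` blocks until the endpoint is ready. A minimal sketch of what such a wait loop does, with the status lookup and sleep injected so it can be exercised without a live endpoint; the names and status strings here are illustrative, not the huggingface_hub internals:

```python
import time

def wait_until_running(fetch_status, timeout_s=600, poll_s=5, sleep=time.sleep):
    # Poll the injected status callable until it reports "running",
    # failing fast on "failed" and giving up after timeout_s seconds.
    waited = 0.0
    while waited <= timeout_s:
        status = fetch_status()
        if status == "running":
            return status
        if status == "failed":
            raise RuntimeError("endpoint failed to start")
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError(f"endpoint not running after {timeout_s}s")
```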
```python
endpoint.client.text_generation("""
### Instruction: What is the sentiment of the input?
### Examples
I wish the screen was bigger - Negative
I hate the battery - Negative
I love the default applications - Positive
### Input
I am happy with this purchase - 
### Response
""",
                                do_sample=True,
                                repetition_penalty=1.2,
                                top_p=0.9,
                                temperature=0.3)
```

Output:

    'POSITIVE'
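The few-shot prompt above follows a fixed Instruction/Examples/Input/Response layout. A hypothetical helper (not part of the notebook) that assembles it, so examples can be swapped without hand-editing the string:

```python
def build_sentiment_prompt(examples, text):
    # Assemble the Instruction/Examples/Input/Response sections into one
    # prompt string, one "sentence - label" pair per example line.
    lines = ["", "### Instruction: What is the sentiment of the input?", "### Examples"]
    lines += [f"{sentence} - {label}" for sentence, label in examples]
    lines += ["### Input", f"{text} - ", "### Response", ""]
    return "\n".join(lines)

prompt = build_sentiment_prompt(
    [("I wish the screen was bigger", "Negative"),
     ("I hate the battery", "Negative"),
     ("I love the default applications", "Positive")],
    "I am happy with this purchase",
)
```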
## Pause Inference Endpoint
Now that we have finished, let's pause the endpoint so we don't incur any extra charges; this will also allow us to analyze the cost.

```python
endpoint = endpoint.pause()

print(f"Endpoint Status: {endpoint.status}")
```

    Endpoint Status: paused

## Analyze Usage
1. Go to your `dashboard_url` printed below
2. Check the dashboard
3. Analyze the Usage & Cost tab

```python
dashboard_url = f'https://ui.endpoints.huggingface.co/{namespace}/endpoints/{ENDPOINT_NAME}/analytics'
print(dashboard_url)
```
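The dashboard URL is built purely from the namespace and endpoint name; factoring that f-string into a function (an optional refactor, not in the notebook) makes it reusable across endpoints:

```python
def analytics_url(namespace: str, endpoint_name: str) -> str:
    # Same f-string as the notebook, parameterized.
    return f"https://ui.endpoints.huggingface.co/{namespace}/endpoints/{endpoint_name}/analytics"
```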
## Delete Endpoint

```python
endpoint = endpoint.delete()

if not endpoint:
    print('Endpoint deleted successfully')
else:
    print('Delete the endpoint manually')
```

    Endpoint deleted successfully

(Notebook metadata: Python 3 (ipykernel), Python 3.9.6, nbformat 4.5.)