derek-thomas HF staff committed on
Commit 7e2b395 · 1 Parent(s): acc0a44

Adding notebooks

notebooks/TGI-benchmark.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
notebooks/TGI-launcher.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
notebooks/jais_tgi_inference_endpoints.ipynb DELETED
@@ -1,420 +0,0 @@
- {
- "cells": [
- {
- "cell_type": "markdown",
- "id": "db41d8ba-71c0-4951-9a88-e1ae01a282ec",
- "metadata": {},
- "source": [
- "# Introduction\n",
- "Please check out my [blog post](https://datavistics.github.io/posts/jais-inference-endpoints/) for more details!"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d2534669-003d-490c-9d7a-32607fa5f404",
- "metadata": {},
- "source": [
- "# Setup"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "3c830114-dd88-45a9-81b9-78b0e3da7384",
- "metadata": {},
- "source": [
- "## Requirements"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "35386f72-32cb-49fa-a108-3aa504e20429",
- "metadata": {
- "tags": []
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\n",
- "\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m23.2.1\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.3.2\u001B[0m\n",
- "\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n",
- "Note: you may need to restart the kernel to use updated packages.\n"
- ]
- }
- ],
- "source": [
- "%pip install -q \"huggingface-hub>=0.20\" ipywidgets"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b6f72042-173d-4a72-ade1-9304b43b528d",
- "metadata": {},
- "source": [
- "## Imports"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "99f60998-0490-46c6-a8e6-04845ddda7be",
- "metadata": {
- "tags": []
- },
- "outputs": [],
- "source": [
- "from huggingface_hub import login, whoami, create_inference_endpoint\n",
- "from getpass import getpass"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "5eece903-64ce-435d-a2fd-096c0ff650bf",
- "metadata": {},
- "source": [
- "## Config\n",
- "Choose your `ENDPOINT_NAME` if you like."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "dcd7daed-6aca-4fe7-85ce-534bdcd8bc87",
- "metadata": {
- "tags": []
- },
- "outputs": [],
- "source": [
- "ENDPOINT_NAME = \"jais13b-demo\""
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "0ca1140c-3fcc-4b99-9210-6da1505a27b7",
- "metadata": {
- "tags": []
- },
- "outputs": [],
- "source": [
- "login()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "5f4ba0a8-0a6c-4705-a73b-7be09b889610",
- "metadata": {},
- "source": [
- "Some users might have payment registered in an organization. This allows you to connect to an organization (that you are a member of) with a payment method.\n",
- "\n",
- "Leave it blank if you want to use your username."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "88cdbd73-5923-4ae9-9940-b6be935f70fa",
- "metadata": {
- "tags": []
- },
- "outputs": [
- {
- "name": "stdin",
- "output_type": "stream",
- "text": [
- "What is your Hugging Face 🤗 username or organization? (with an added payment method) ········\n"
- ]
- }
- ],
- "source": [
- "who = whoami()\n",
- "organization = getpass(prompt=\"What is your Hugging Face 🤗 username or organization? (with an added payment method)\")\n",
- "\n",
- "namespace = organization or who['name']"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "93096cbc-81c6-4137-a283-6afb0f48fbb9",
- "metadata": {},
- "source": [
- "# Inference Endpoints\n",
- "## Create Inference Endpoint\n",
- "We are going to use the [API](https://huggingface.co/docs/inference-endpoints/api_reference) to create an [Inference Endpoint](https://huggingface.co/inference-endpoints). This should provide a few main benefits:\n",
- "- It's convenient (No clicking)\n",
- "- It's repeatable (We have the code to run it easily)\n",
- "- It's cheaper (No time spent waiting for it to load, and it automatically shuts down)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1cf8334d-6500-412e-9d6d-58990c42c110",
- "metadata": {},
- "source": [
- "Here is a convenient table of instance details you can use when selecting a GPU. Once you have chosen a GPU in Inference Endpoints, you can use the corresponding `instanceType` and `instanceSize`.\n",
- "\n",
- "| hw_desc | instanceType | instanceSize | vRAM |\n",
- "|---------------------|----------------|--------------|-------|\n",
- "| 1x Nvidia Tesla T4 | g4dn.xlarge | small | 16GB |\n",
- "| 4x Nvidia Tesla T4 | g4dn.12xlarge | large | 64GB |\n",
- "| 1x Nvidia A10G | g5.2xlarge | medium | 24GB |\n",
- "| 4x Nvidia A10G | g5.12xlarge | xxlarge | 96GB |\n",
- "| 1x Nvidia A100 | p4de | xlarge | 80GB |\n",
- "| 2x Nvidia A100 | p4de | 2xlarge | 160GB |\n",
- "\n",
- "Note: To use a node (multiple GPUs) you will need a sharded version of jais. I'm not sure whether such a version currently exists on the Hub."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "id": "89c7cc21-3dfe-40e6-80ff-1dcc8558859e",
- "metadata": {
- "tags": []
- },
- "outputs": [],
- "source": [
- "hw_dict = dict(\n",
- " accelerator=\"gpu\",\n",
- " vendor=\"aws\",\n",
- " region=\"us-east-1\",\n",
- " type=\"protected\",\n",
- " instance_type=\"p4de\",\n",
- " instance_size=\"xlarge\",\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "id": "f4267bce-8516-4f3a-b1cc-8ccd6c14a9c7",
- "metadata": {
- "tags": []
- },
- "outputs": [],
- "source": [
- "tgi_env = {\n",
- " \"MAX_BATCH_PREFILL_TOKENS\": \"2048\",\n",
- " \"MAX_INPUT_LENGTH\": \"2000\",\n",
- " \"TRUST_REMOTE_CODE\": \"true\",\n",
- " \"QUANTIZE\": \"bitsandbytes\",\n",
- " \"MODEL_ID\": \"/repository\"\n",
- "}"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "74fd83a0-fef0-4e47-8ff1-f4ba7aed131d",
- "metadata": {},
- "source": [
- "A couple of notes on my choices here:\n",
- "- I used `derek-thomas/jais-13b-chat-hf` because that repo has SafeTensors merged, which will lead to faster loading of the TGI container\n",
- "- I'm using the latest TGI container as of the time of writing (1.3.4)\n",
- "- `min_replica=0` allows [zero scaling](https://huggingface.co/docs/inference-endpoints/autoscaling#scaling-to-0), which is really useful for your wallet, though think through whether this makes sense for your use case, as there will be loading times\n",
- "- `max_replica` allows you to handle high throughput. Make sure you read through the [docs](https://huggingface.co/docs/inference-endpoints/autoscaling#scaling-criteria) to understand how this scales"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "id": "9e59de46-26b7-4bb9-bbad-8bba9931bde7",
- "metadata": {
- "tags": []
- },
- "outputs": [],
- "source": [
- "endpoint = create_inference_endpoint(\n",
- " ENDPOINT_NAME,\n",
- " repository=\"derek-thomas/jais-13b-chat-hf\",\n",
- " framework=\"pytorch\",\n",
- " task=\"text-generation\",\n",
- " **hw_dict,\n",
- " min_replica=0,\n",
- " max_replica=1,\n",
- " namespace=namespace,\n",
- " custom_image={\n",
- " \"health_route\": \"/health\",\n",
- " \"env\": tgi_env,\n",
- " \"url\": \"ghcr.io/huggingface/text-generation-inference:1.3.4\",\n",
- " },\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "96d173b2-8980-4554-9039-c62843d3fc7d",
- "metadata": {},
- "source": [
- "## Wait until it's running"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "5f3a8bd2-753c-49a8-9452-899578beddc5",
- "metadata": {
- "tags": []
- },
- "outputs": [],
- "source": [
- "%%time\n",
- "endpoint.wait()"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "id": "189b26f0-d404-4570-a1b9-e2a9d486c1f7",
- "metadata": {
- "tags": []
- },
- "outputs": [
- {
- "data": {
- "text/plain": [
- "'POSITIVE'"
- ]
- },
- "execution_count": 10,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "endpoint.client.text_generation(\"\"\"\n",
- "### Instruction: What is the sentiment of the input?\n",
- "### Examples\n",
- "I wish the screen was bigger - Negative\n",
- "I hate the battery - Negative\n",
- "I love the default applications - Positive\n",
- "### Input\n",
- "I am happy with this purchase - \n",
- "### Response\n",
- "\"\"\",\n",
- " do_sample=True,\n",
- " repetition_penalty=1.2,\n",
- " top_p=0.9,\n",
- " temperature=0.3)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "bab97c7b-7bac-4bf5-9752-b528294dadc7",
- "metadata": {},
- "source": [
- "## Pause Inference Endpoint\n",
- "Now that we have finished, let's pause the endpoint so we don't incur any extra charges; this will also allow us to analyze the cost."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "id": "540a0978-7670-4ce3-95c1-3823cc113b85",
- "metadata": {
- "tags": []
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Endpoint Status: paused\n"
- ]
- }
- ],
- "source": [
- "endpoint = endpoint.pause()\n",
- "\n",
- "print(f\"Endpoint Status: {endpoint.status}\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "41abea64-379d-49de-8d9a-355c2f4ce1ac",
- "metadata": {},
- "source": [
- "## Analyze Usage\n",
- "1. Go to your `dashboard_url` printed below\n",
- "1. Check the dashboard\n",
- "1. Analyze the Usage & Cost tab"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "16815445-3079-43da-b14e-b54176a07a62",
- "metadata": {
- "tags": []
- },
- "outputs": [],
- "source": [
- "dashboard_url = f'https://ui.endpoints.huggingface.co/{namespace}/endpoints/{ENDPOINT_NAME}/analytics'\n",
- "print(dashboard_url)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b953d5be-2494-4ff8-be42-9daf00c99c41",
- "metadata": {},
- "source": [
- "## Delete Endpoint"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "id": "c310c0f3-6f12-4d5c-838b-3a4c1f2e54ad",
- "metadata": {
- "tags": []
- },
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Endpoint deleted successfully\n"
- ]
- }
- ],
- "source": [
- "endpoint = endpoint.delete()\n",
- "\n",
- "if not endpoint:\n",
- " print('Endpoint deleted successfully')\n",
- "else:\n",
- " print('Delete the endpoint manually')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "611e1345-8d8c-46b1-a9f8-cff27eecb426",
- "metadata": {},
- "outputs": [],
- "source": []
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.6"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
- }
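
As an aside for anyone reconstructing the deleted `jais_tgi_inference_endpoints.ipynb`: the arguments it passes to `create_inference_endpoint` are plain Python data, so they can be assembled and inspected without touching the Inference Endpoints API. A minimal sketch, with no API calls; the helper name `build_endpoint_config` is my own, and all values are copied from the notebook's cells:

```python
# Sketch of the deleted notebook's endpoint configuration as plain dicts.
# build_endpoint_config is a hypothetical helper name; the values come from
# the notebook's hw_dict / tgi_env / create_inference_endpoint cells.
def build_endpoint_config(endpoint_name="jais13b-demo", tgi_version="1.3.4"):
    # Hardware selection: 1x Nvidia A100 (p4de / xlarge, 80GB vRAM)
    hw = {
        "accelerator": "gpu",
        "vendor": "aws",
        "region": "us-east-1",
        "type": "protected",
        "instance_type": "p4de",
        "instance_size": "xlarge",
    }
    # TGI container environment: cap prefill/input tokens, trust jais's
    # custom modeling code, and quantize with bitsandbytes
    tgi_env = {
        "MAX_BATCH_PREFILL_TOKENS": "2048",
        "MAX_INPUT_LENGTH": "2000",
        "TRUST_REMOTE_CODE": "true",
        "QUANTIZE": "bitsandbytes",
        "MODEL_ID": "/repository",
    }
    return {
        "name": endpoint_name,
        "repository": "derek-thomas/jais-13b-chat-hf",
        "framework": "pytorch",
        "task": "text-generation",
        **hw,
        "min_replica": 0,  # scale to zero when idle
        "max_replica": 1,
        "custom_image": {
            "health_route": "/health",
            "env": tgi_env,
            "url": f"ghcr.io/huggingface/text-generation-inference:{tgi_version}",
        },
    }

config = build_endpoint_config()
print(config["custom_image"]["url"])
# → ghcr.io/huggingface/text-generation-inference:1.3.4
```

Keeping the configuration as data like this makes it easy to diff against the benchmark notebooks added in this commit, or to pin a different TGI tag without touching the call site.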