derek-thomas (HF staff) committed on
Commit 5d714fc · verified · 1 parent: a2cb044

Updating naming and adding polish

Files changed (1)
  1. 01-tgi-ie-benchmark.ipynb +38 -4
01-tgi-ie-benchmark.ipynb CHANGED
@@ -1,5 +1,13 @@
  {
  "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "602a8c54-b434-4d8e-bc72-824c642fbdb5",
+ "metadata": {},
+ "source": [
+ "# Setup"
+ ]
+ },
  {
  "cell_type": "code",
  "execution_count": null,
@@ -75,7 +83,9 @@
  "\n",
  "# Simulation\n",
  "RESULTS_DIR = proj_dir/'tgi_benchmark_results'/INSTANCE_TYPE\n",
- "tgi_bss = [8, 16, 24, 32, 40, 48, 56, 64]"
+ "tgi_bss = [8, 16, 24, 32, 40, 48, 56, 64]\n",
+ "INPUT_TOKENS = 3000\n",
+ "OUTPUT_TOKENS = 300"
  ]
  },
  {
@@ -86,6 +96,14 @@
  "# Endpoint setup"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "id": "8610e033-8586-495a-943e-539b7c8304d0",
+ "metadata": {},
+ "source": [
+ "Be sure to configure your endpoint the way you want; I made some guesses about what you might want in the `env`. You can see the GPU instance options in the [pricing section](https://huggingface.co/docs/inference-endpoints/en/pricing#gpu-instances) of the docs. I would also recommend deploying manually once and using `get_inference_endpoint().__dict__` to double-check your settings."
+ ]
+ },
  {
  "cell_type": "code",
  "execution_count": null,
@@ -119,8 +137,8 @@
  " custom_image={\n",
  " \"health_route\": \"/health\",\n",
  " \"env\": {\n",
- " \"MAX_INPUT_LENGTH\": \"3050\",\n",
- " \"MAX_TOTAL_TOKENS\": \"3300\",\n",
+ " \"MAX_INPUT_LENGTH\": f\"{INPUT_TOKENS+50}\",\n",
+ " \"MAX_TOTAL_TOKENS\": f\"{INPUT_TOKENS + OUTPUT_TOKENS}\",\n",
  " \"MAX_BATCH_SIZE\": f\"{MAX_BATCH_SIZE}\",\n",
  " \"HF_TOKEN\": get_token(),\n",
  " \"MODEL_ID\": \"/repository\",\n",
@@ -137,6 +155,14 @@
  " return endpoint"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "id": "5e55710d-fa77-41b7-ae9c-a4826140f6b6",
+ "metadata": {},
+ "source": [
+ "Check the command to make sure it matches what you expect. Also check the summary stats JSON to see what actually happened."
+ ]
+ },
  {
  "cell_type": "code",
  "execution_count": null,
@@ -175,7 +201,7 @@
  " command = [\n",
  " \"python\", benchmark_script,\n",
  " \"--model\", f\"huggingface/{MODEL}\",\n",
- " \"--mean-input-tokens\", \"3000\",\n",
+ " \"--mean-input-tokens\", f\"{INPUT_TOKENS}\",\n",
  " \"--stddev-input-tokens\", \"10\",\n",
  " \"--mean-output-tokens\", \"240\",\n",
  " \"--stddev-output-tokens\", \"5\",\n",
@@ -210,6 +236,14 @@
  " return max_working"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "id": "d32b71a7-371f-4f80-a9f2-2cfc65e04afd",
+ "metadata": {},
+ "source": [
+ "Here I'm creating the endpoint and then running the simulation."
+ ]
+ },
  {
  "cell_type": "code",
  "execution_count": null,
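This commit derives `MAX_INPUT_LENGTH`, `MAX_TOTAL_TOKENS` and `--mean-input-tokens` from `INPUT_TOKENS` and `OUTPUT_TOKENS`. However, `--mean-output-tokens` stays hardcoded at `"240"`. That may be deliberate. Suppose TGI rejects any request whose input exceeds `MAX_INPUT_LENGTH`, or whose input plus requested new tokens exceeds `MAX_TOTAL_TOKENS`. Then a mean of 240 fits under the 3300-token budget, but a mean of 300 would not once length spread is counted. The sketch below derives the limits and flags from the constants in one place and checks that headroom. The helper names (`tgi_token_env`, `benchmark_token_args`, `check_headroom`) are hypothetical, as is the rule of treating mean + 3·stddev as the worst case. Neither comes from the notebook.

```python
import shlex

# Budget constants from the notebook's "# Simulation" cell.
INPUT_TOKENS = 3000
OUTPUT_TOKENS = 300
# From the diff: MAX_INPUT_LENGTH is INPUT_TOKENS plus a 50-token margin.
INPUT_MARGIN = 50


def tgi_token_env(input_tokens: int, output_tokens: int, margin: int = INPUT_MARGIN) -> dict[str, str]:
    """TGI token limits as strings, mirroring the endpoint's `env` block."""
    return {
        "MAX_INPUT_LENGTH": str(input_tokens + margin),
        "MAX_TOTAL_TOKENS": str(input_tokens + output_tokens),
    }


def benchmark_token_args(mean_in: int, stddev_in: int, mean_out: int, stddev_out: int) -> list[str]:
    """Token-shape flags, in the order the notebook's benchmark command passes them."""
    return [
        "--mean-input-tokens", str(mean_in),
        "--stddev-input-tokens", str(stddev_in),
        "--mean-output-tokens", str(mean_out),
        "--stddev-output-tokens", str(stddev_out),
    ]


def check_headroom(env: dict[str, str], mean_in: int, stddev_in: int,
                   mean_out: int, stddev_out: int, k: int = 3) -> list[str]:
    """List the limits a request at mean + k*stddev tokens would exceed.

    Hypothetical rule of thumb: mean + k*stddev stands in for the worst case.
    Lengths are sampled, so rare requests can still go further.
    """
    problems = []
    max_in = int(env["MAX_INPUT_LENGTH"])
    max_total = int(env["MAX_TOTAL_TOKENS"])
    worst_in = mean_in + k * stddev_in
    worst_out = mean_out + k * stddev_out
    if worst_in > max_in:
        problems.append(f"input {worst_in} > MAX_INPUT_LENGTH {max_in}")
    if worst_in + worst_out > max_total:
        problems.append(f"input+output {worst_in + worst_out} > MAX_TOTAL_TOKENS {max_total}")
    return problems


env = tgi_token_env(INPUT_TOKENS, OUTPUT_TOKENS)
print(env)  # {'MAX_INPUT_LENGTH': '3050', 'MAX_TOTAL_TOKENS': '3300'}

# The workload the commit benchmarks: the output mean is still hardcoded at 240.
args = benchmark_token_args(INPUT_TOKENS, 10, 240, 5)
print(shlex.join(args))  # paste-able form, handy for "check the command"
print(check_headroom(env, INPUT_TOKENS, 10, 240, 5))            # []
# Raising the output mean to OUTPUT_TOKENS would overrun MAX_TOTAL_TOKENS.
print(check_headroom(env, INPUT_TOKENS, 10, OUTPUT_TOKENS, 5))  # ['input+output 3345 > MAX_TOTAL_TOKENS 3300']
```

The derived env reproduces the values the commit removed (`"3050"` and `"3300"`), so the refactor does not change the deployed limits.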
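One of the new markdown cells recommends deploying manually once and inspecting `get_inference_endpoint().__dict__` to verify settings. Comparing the two by eye is easy to get wrong, so here is a minimal sketch of a hypothetical `diff_env` helper that compares the env the notebook intends to send with the env read back from a deployment. It takes plain dicts because this commit does not show where the container env sits inside the endpoint object. In `huggingface_hub`, `InferenceEndpoint.raw` holds the raw API response, but the nesting of `env` inside it is an assumption to confirm against your own endpoint.

```python
from typing import Optional


def _norm(value: object) -> Optional[str]:
    """Compare as strings: env values are strings but may be built from ints."""
    return None if value is None else str(value)


def diff_env(expected: dict, deployed: dict, ignore: tuple = ("HF_TOKEN",)) -> dict:
    """Return {key: (expected, deployed)} for every env var that differs.

    A key missing on one side shows up as None on that side.
    Keys in `ignore` (by default the HF_TOKEN secret) are skipped.
    """
    diffs = {}
    for key in sorted(set(expected) | set(deployed)):
        if key in ignore:
            continue
        want, got = _norm(expected.get(key)), _norm(deployed.get(key))
        if want != got:
            diffs[key] = (want, got)
    return diffs


# Example values only: `expected` mirrors the notebook's env block, and
# `deployed` stands in for whatever you read back from the endpoint.
expected = {
    "MAX_INPUT_LENGTH": "3050",
    "MAX_TOTAL_TOKENS": "3300",
    "MAX_BATCH_SIZE": 32,
    "MODEL_ID": "/repository",
    "HF_TOKEN": "hf_xxx",
}
deployed = {"MAX_INPUT_LENGTH": "3050", "MAX_TOTAL_TOKENS": "4096", "MODEL_ID": "/repository"}

print(diff_env(expected, deployed))
# {'MAX_BATCH_SIZE': ('32', None), 'MAX_TOTAL_TOKENS': ('3300', '4096')}
```

An empty result means the deployment carries the same env the notebook would request, apart from the ignored secret.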