Commit 7e2b395
Parent(s): acc0a44
Adding notebooks
notebooks/TGI-benchmark.ipynb
ADDED (the diff for this file is too large to render; see raw diff)

notebooks/TGI-launcher.ipynb
ADDED (the diff for this file is too large to render; see raw diff)
notebooks/jais_tgi_inference_endpoints.ipynb
DELETED
@@ -1,420 +0,0 @@
# Introduction
Please check out my [blog post](https://datavistics.github.io/posts/jais-inference-endpoints/) for more details!

# Setup

## Requirements

```python
%pip install -q "huggingface-hub>=0.20" ipywidgets
```

    [notice] A new release of pip is available: 23.2.1 -> 23.3.2
    [notice] To update, run: pip install --upgrade pip
    Note: you may need to restart the kernel to use updated packages.

## Imports

```python
from huggingface_hub import login, whoami, create_inference_endpoint
from getpass import getpass
```

## Config
Choose your `ENDPOINT_NAME` if you like.

```python
ENDPOINT_NAME = "jais13b-demo"
```

```python
login()
```

Some users might have payment registered in an organization. This allows you to bill an organization (that you are a member of) that has a payment method attached.

Leave it blank if you want to use your username.

```python
who = whoami()
organization = getpass(prompt="What is your Hugging Face 🤗 username or organization? (with an added payment method)")

namespace = organization or who['name']
```
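The namespace fallback above is just a truthiness check on the entered string. A minimal, testable sketch of that logic, with `who` standing in for the dict returned by `whoami()` (the function name here is illustrative, not part of the notebook):

```python
# Sketch of the namespace fallback: prefer the organization if one was
# entered, otherwise fall back to the username reported by whoami().
def resolve_namespace(organization: str, who: dict) -> str:
    return organization or who["name"]
```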
# Inference Endpoints
## Create Inference Endpoint
We are going to use the [API](https://huggingface.co/docs/inference-endpoints/api_reference) to create an [Inference Endpoint](https://huggingface.co/inference-endpoints). This provides a few main benefits:
- It's convenient (no clicking)
- It's repeatable (we have the code to run it easily)
- It's cheaper (no time spent waiting for it to load, and it can be shut down automatically)

Here is a convenient table of instance details you can use when selecting a GPU. Once you have chosen a GPU in Inference Endpoints, you can use the corresponding `instanceType` and `instanceSize`.

| hw_desc             | instanceType   | instanceSize | vRAM  |
|---------------------|----------------|--------------|-------|
| 1x Nvidia Tesla T4  | g4dn.xlarge    | small        | 16GB  |
| 4x Nvidia Tesla T4  | g4dn.12xlarge  | large        | 64GB  |
| 1x Nvidia A10G      | g5.2xlarge     | medium       | 24GB  |
| 4x Nvidia A10G      | g5.12xlarge    | xxlarge      | 96GB  |
| 1x Nvidia A100      | p4de           | xlarge       | 80GB  |
| 2x Nvidia A100      | p4de           | 2xlarge      | 160GB |

Note: to use a node (multiple GPUs) you will need a sharded version of jais. I'm not sure whether such a version currently exists on the Hub.

```python
hw_dict = dict(
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_type="p4de",
    instance_size="xlarge",
)
```

```python
tgi_env = {
    "MAX_BATCH_PREFILL_TOKENS": "2048",
    "MAX_INPUT_LENGTH": "2000",
    "TRUST_REMOTE_CODE": "true",
    "QUANTIZE": "bitsandbytes",
    "MODEL_ID": "/repository",
}
```
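Since these env values are plain strings, a small sanity check before launching can catch inconsistent settings early. A sketch of such a check; the constraint shown (input length not exceeding the prefill token budget) is my reading of the values above, not an official TGI rule, and the helper name is hypothetical:

```python
def check_tgi_env(env: dict) -> list:
    # Collect simple consistency problems in the TGI env settings.
    problems = []
    max_input = int(env.get("MAX_INPUT_LENGTH", 0))
    max_prefill = int(env.get("MAX_BATCH_PREFILL_TOKENS", 0))
    if max_input > max_prefill:
        problems.append("MAX_INPUT_LENGTH exceeds MAX_BATCH_PREFILL_TOKENS")
    if "MODEL_ID" not in env:
        problems.append("MODEL_ID missing")
    return problems
```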
A couple of notes on my choices here:
- I used `derek-thomas/jais-13b-chat-hf` because that repo has the SafeTensors weights merged, which leads to faster loading of the TGI container
- I'm using the latest TGI container as of the time of writing (1.3.4)
- `min_replica=0` allows [zero scaling](https://huggingface.co/docs/inference-endpoints/autoscaling#scaling-to-0), which is really useful for your wallet, though think through whether it makes sense for your use case, as there will be loading times after scale-up
- `max_replica` lets you handle high throughput; make sure you read the [docs](https://huggingface.co/docs/inference-endpoints/autoscaling#scaling-criteria) to understand how this scales

```python
endpoint = create_inference_endpoint(
    ENDPOINT_NAME,
    repository="derek-thomas/jais-13b-chat-hf",
    framework="pytorch",
    task="text-generation",
    **hw_dict,
    min_replica=0,
    max_replica=1,
    namespace=namespace,
    custom_image={
        "health_route": "/health",
        "env": tgi_env,
        "url": "ghcr.io/huggingface/text-generation-inference:1.3.4",
    },
)
```

## Wait until it's running

```python
%%time
endpoint.wait()
```
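`endpoint.wait()` blocks until the endpoint is ready. A minimal sketch of what such a wait loop does, with the status lookup and sleep injected so it can be exercised without a live endpoint; the names and status strings here are illustrative, not the huggingface_hub internals:

```python
import time

def wait_until_running(fetch_status, timeout_s=600, poll_s=5, sleep=time.sleep):
    # Poll the injected status callable until it reports "running",
    # failing fast on "failed" and giving up after timeout_s seconds.
    waited = 0.0
    while waited <= timeout_s:
        status = fetch_status()
        if status == "running":
            return status
        if status == "failed":
            raise RuntimeError("endpoint failed to start")
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError(f"endpoint not running after {timeout_s}s")
```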
```python
endpoint.client.text_generation("""
### Instruction: What is the sentiment of the input?
### Examples
I wish the screen was bigger - Negative
I hate the battery - Negative
I love the default applications - Positive
### Input
I am happy with this purchase - 
### Response
""",
                                do_sample=True,
                                repetition_penalty=1.2,
                                top_p=0.9,
                                temperature=0.3)
```

Output:

    'POSITIVE'
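The few-shot prompt above follows a fixed Instruction/Examples/Input/Response layout. A hypothetical helper (not part of the notebook) that assembles it, so examples can be swapped without hand-editing the string:

```python
def build_sentiment_prompt(examples, text):
    # Assemble the Instruction/Examples/Input/Response sections into one
    # prompt string, one "sentence - label" pair per example line.
    lines = ["", "### Instruction: What is the sentiment of the input?", "### Examples"]
    lines += [f"{sentence} - {label}" for sentence, label in examples]
    lines += ["### Input", f"{text} - ", "### Response", ""]
    return "\n".join(lines)

prompt = build_sentiment_prompt(
    [("I wish the screen was bigger", "Negative"),
     ("I hate the battery", "Negative"),
     ("I love the default applications", "Positive")],
    "I am happy with this purchase",
)
```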
## Pause Inference Endpoint
Now that we have finished, let's pause the endpoint so we don't incur any extra charges; this will also allow us to analyze the cost.

```python
endpoint = endpoint.pause()

print(f"Endpoint Status: {endpoint.status}")
```

    Endpoint Status: paused

## Analyze Usage
1. Go to your `dashboard_url` printed below
2. Check the dashboard
3. Analyze the Usage & Cost tab

```python
dashboard_url = f'https://ui.endpoints.huggingface.co/{namespace}/endpoints/{ENDPOINT_NAME}/analytics'
print(dashboard_url)
```
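The dashboard URL is built purely from the namespace and endpoint name; factoring that f-string into a function (an optional refactor, not in the notebook) makes it reusable across endpoints:

```python
def analytics_url(namespace: str, endpoint_name: str) -> str:
    # Same f-string as the notebook, parameterized.
    return f"https://ui.endpoints.huggingface.co/{namespace}/endpoints/{endpoint_name}/analytics"
```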
## Delete Endpoint

```python
endpoint = endpoint.delete()

if not endpoint:
    print('Endpoint deleted successfully')
else:
    print('Delete the endpoint manually')
```

    Endpoint deleted successfully

(Notebook metadata: Python 3 (ipykernel), Python 3.9.6, nbformat 4.5.)