---

title: HF LLM API
emoji: ☪️
colorFrom: gray
colorTo: gray
sdk: docker
app_port: 23333
---


## HF-LLM-API

![](https://img.shields.io/github/v/release/Niansuh/HF-LLM-API?label=HF-LLM-API&color=blue&cacheSeconds=60)

Hugging Face LLM Inference API in OpenAI message format.

Original project: https://github.com/Hansimov/HF-LLM-API

## Features

- Available models (2024/04/20):
  - `mistral-7b`, `mixtral-8x7b`, `nous-mixtral-8x7b`, `gemma-7b`, `command-r-plus`, `llama3-70b`, `zephyr-141b`, `gpt-3.5-turbo`
  - Adaptive prompt templates for different models
- Supports the OpenAI API format
  - The API endpoint can be called with the official `openai-python` package
- Supports both streaming and non-streaming responses
- Supports API keys via both the HTTP auth header and an environment variable
- Docker deployment
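
As a rough illustration of the API-key bullet above, the sketch below builds the HTTP auth header on the client side. The helper name `auth_headers` and the `HF_TOKEN` environment variable name are assumptions made for this example, not names confirmed by the service. The Bearer scheme is what `openai-python` sends for its `api_key`.

```python
import os


def auth_headers(api_key=None, env_var="HF_TOKEN"):
    """Build request headers for the API (hypothetical helper).

    Uses the explicit key if given, otherwise falls back to the
    environment variable `env_var` (the name is an assumption).
    Returns an empty dict when no key is available.
    """
    key = api_key or os.environ.get(env_var)
    if not key:
        return {}
    return {"Authorization": f"Bearer {key}"}
```

Returning an empty dict when no key is found keeps the helper usable for unauthenticated requests.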

## Run API service

### Run in Command Line

**Install dependencies:**

```bash
# pipreqs . --force --mode no-pin
pip install -r requirements.txt
```

**Run API:**

```bash
python -m apis.chat_api
```

### Run via Docker

**Docker build:**

```bash
sudo docker build -t hf-llm-api:1.1.3 . --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy
```

**Docker run:**

```bash
# no proxy
sudo docker run -p 23333:23333 hf-llm-api:1.1.3

# with proxy
sudo docker run -p 23333:23333 --env http_proxy="http://<server>:<port>" hf-llm-api:1.1.3
```

## API Usage

### Using `openai-python`

See: [`examples/chat_with_openai.py`](https://github.com/Niansuh/HF-LLM-API/blob/main/examples/chat_with_openai.py)

```py
from openai import OpenAI

# If running this service with a proxy, you might need to unset `http(s)_proxy`.
base_url = "http://127.0.0.1:23333"
# Your own HF_TOKEN
api_key = "hf_xxxxxxxxxxxxxxxx"
# Use the key below as a non-authenticated user
# api_key = "sk-xxx"

client = OpenAI(base_url=base_url, api_key=api_key)
response = client.chat.completions.create(
    model="nous-mixtral-8x7b",
    messages=[
        {
            "role": "user",
            "content": "what is your model",
        }
    ],
    stream=True,
)

# Print each streamed chunk as it arrives; end the line when the model stops
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
    elif chunk.choices[0].finish_reason == "stop":
        print()
```
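
The example above streams the reply. For the non-streaming mode listed under Features, a minimal sketch could look like the following. The helper name `ask_once` is hypothetical, and the code assumes the service returns the standard OpenAI response shape (`choices[0].message.content`) when `stream=False`.

```python
def ask_once(client, prompt, model="nous-mixtral-8x7b"):
    """Send one user message and return the full reply text (no streaming).

    `client` is expected to be an `openai.OpenAI` instance created with
    this service's `base_url` and your `api_key`, as in the example above.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=False,
    )
    # Assumption: the reply follows the OpenAI non-streaming response format
    return response.choices[0].message.content
```

To use it, create `client = OpenAI(base_url=base_url, api_key=api_key)` as shown earlier and call `ask_once(client, "what is your model")` while the service is running.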

### Using post requests

See: [`examples/chat_with_post.py`](https://github.com/Niansuh/HF-LLM-API/blob/main/examples/chat_with_post.py)


```py
import ast
import httpx
import json
import re

# If running this service with a proxy, you might need to unset `http(s)_proxy`.
chat_api = "http://127.0.0.1:23333"
# Your own HF_TOKEN
api_key = "hf_xxxxxxxxxxxxxxxx"
# Use the key below as a non-authenticated user
# api_key = "sk-xxx"

# Send the API key as a Bearer token, the same way `openai-python` does
requests_headers = {"Authorization": f"Bearer {api_key}"}
requests_payload = {
    "model": "nous-mixtral-8x7b",
    "messages": [
        {
            "role": "user",
            "content": "what is your model",
        }
    ],
    "stream": True,
}

# Strip the `data:` prefix and the `[DONE]` terminator from streamed lines
remove_patterns = [r"^\s*data:\s*", r"^\s*\[DONE\]\s*"]

with httpx.stream(
    "POST",
    chat_api + "/chat/completions",
    headers=requests_headers,
    json=requests_payload,
    timeout=httpx.Timeout(connect=20, read=60, write=20, pool=None),
) as response:
    # https://docs.aiohttp.org/en/stable/streams.html
    # https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb
    response_content = ""
    for line in response.iter_lines():
        for pattern in remove_patterns:
            line = re.sub(pattern, "", line).strip()

        if line:
            try:
                line_data = json.loads(line)
            except Exception as e:
                # Fall back to Python-literal parsing if the line is not valid JSON
                try:
                    line_data = ast.literal_eval(line)
                except Exception:
                    print(f"Error: {line}")
                    raise e
            # print(f"line: {line_data}")
            delta_data = line_data["choices"][0]["delta"]
            finish_reason = line_data["choices"][0]["finish_reason"]
            if "role" in delta_data:
                role = delta_data["role"]
            if "content" in delta_data:
                delta_content = delta_data["content"]
                response_content += delta_content
                print(delta_content, end="", flush=True)
            if finish_reason == "stop":
                print()
```
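
The line handling in the loop above can be isolated into a small function, which makes it easier to test without a running service. This is only a sketch: `parse_sse_line` is a hypothetical helper, and the sample lines are made-up payloads in the OpenAI streaming chunk shape, not captured output from this service.

```python
import json


def parse_sse_line(line):
    """Parse one streamed line into (delta_content, finish_reason).

    Returns None for lines that carry no chunk: blank lines and the
    `[DONE]` terminator.
    """
    line = line.strip()
    if line.startswith("data:"):
        line = line[len("data:"):].strip()
    if not line or line == "[DONE]":
        return None
    choice = json.loads(line)["choices"][0]
    return choice.get("delta", {}).get("content"), choice.get("finish_reason")


# Made-up sample lines in the OpenAI streaming chunk format (an assumption)
sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]

text = ""
for raw in sample_lines:
    parsed = parse_sse_line(raw)
    if parsed is None:
        continue
    content, finish_reason = parsed
    if content:
        text += content

print(text)
```

Unlike the loop above, this sketch does not fall back to `ast.literal_eval`, so it assumes every chunk is valid JSON.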