---
title: HF LLM API
emoji: ☯️
colorFrom: gray
colorTo: gray
sdk: docker
app_port: 23333
---
## HF-LLM-API

![](https://img.shields.io/github/v/release/hansimov/hf-llm-api?label=HF-LLM-API&color=blue&cacheSeconds=60)

Hugging Face LLM Inference API in OpenAI message format.

Project link: https://github.com/Hansimov/hf-llm-api

## Features

- Available models (2024/04/20):
  - `mistral-7b`, `mixtral-8x7b`, `nous-mixtral-8x7b`, `gemma-7b`, `command-r-plus`, `llama3-70b`, `zephyr-141b`, `gpt-3.5-turbo`
- Adaptive prompt templates for different models
- Support OpenAI API format
  - Enable API endpoint via the official `openai-python` package
- Support both streaming and non-streaming responses
- Support API key via both HTTP auth header and environment variable
- Docker deployment
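
As a client-side illustration of the API-key feature, the sketch below builds the HTTP auth header a request would carry. It is not part of the project: `build_auth_headers` is a hypothetical helper, and reading the key from an `HF_TOKEN` environment variable on the client is an assumed convention. The `Authorization: Bearer ...` form is the header that the `openai-python` client sends for its `api_key`.

```py
import os


def build_auth_headers(api_key=None, env_var="HF_TOKEN"):
    """Return HTTP headers carrying the API key as a Bearer token.

    Hypothetical helper: an explicit key wins, then the environment
    variable (the name HF_TOKEN is an assumption). With neither set,
    no auth header is sent.
    """
    key = api_key or os.environ.get(env_var)
    if not key:
        return {}
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be passed as the `headers` argument of an HTTP client call.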
## Run API service

### Run in Command Line

**Install dependencies:**

```bash
# Optional: regenerate requirements.txt first
# pipreqs . --force --mode no-pin
pip install -r requirements.txt
```

**Run API:**

```bash
python -m apis.chat_api
```
## Run via Docker

**Docker build:**

```bash
sudo docker build -t hf-llm-api:1.1.3 . --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy
```

**Docker run:**

```bash
# no proxy
sudo docker run -p 23333:23333 hf-llm-api:1.1.3
# with proxy
sudo docker run -p 23333:23333 --env http_proxy="http://<server>:<port>" hf-llm-api:1.1.3
```
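
Once the container is up, it can help to confirm that the published port accepts connections before sending any requests. The following is a minimal sketch using only the standard library. `is_port_open` is a hypothetical helper, and it only checks TCP reachability of port `23333`, not that the API itself is healthy.

```py
import socket


def is_port_open(host="127.0.0.1", port=23333, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with the defaults probes the address used by the `docker run` commands above.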
## API Usage

### Using `openai-python`

See: [`examples/chat_with_openai.py`](https://github.com/Hansimov/hf-llm-api/blob/main/examples/chat_with_openai.py)

```py
from openai import OpenAI

# If running this service with a proxy, you might need to unset `http(s)_proxy`.
base_url = "http://127.0.0.1:23333"
# Your own HF_TOKEN
api_key = "hf_xxxxxxxxxxxxxxxx"
# Use the key below as a non-auth user
# api_key = "sk-xxx"

client = OpenAI(base_url=base_url, api_key=api_key)
response = client.chat.completions.create(
    model="mixtral-8x7b",
    messages=[
        {
            "role": "user",
            "content": "what is your model",
        }
    ],
    stream=True,
)

# Print each streamed text fragment as it arrives.
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
    elif chunk.choices[0].finish_reason == "stop":
        print()
    else:
        pass
```
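
The example above streams the reply. For the non-streaming mode listed under Features, a request might look like the sketch below. `ask_once` is a hypothetical helper, and it assumes the service returns the standard OpenAI non-streaming shape, where the text sits in `choices[0].message.content`.

```py
def ask_once(client, prompt, model="mixtral-8x7b"):
    """Send one user message without streaming and return the reply text.

    `client` is expected to be an `openai.OpenAI` instance pointed at this
    service, e.g. OpenAI(base_url="http://127.0.0.1:23333", api_key="sk-xxx").
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=False,
    )
    # Assumption: the reply follows the OpenAI chat completion format.
    return response.choices[0].message.content
```

Passing the client in as an argument keeps the helper independent of how the base URL and API key are configured.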
### Using POST requests

See: [`examples/chat_with_post.py`](https://github.com/Hansimov/hf-llm-api/blob/main/examples/chat_with_post.py)

```py
import ast
import httpx
import json
import re

# If running this service with a proxy, you might need to unset `http(s)_proxy`.
chat_api = "http://127.0.0.1:23333"
# Your own HF_TOKEN
api_key = "hf_xxxxxxxxxxxxxxxx"
# Use the key below as a non-auth user
# api_key = "sk-xxx"

# Send the key as a Bearer token, the same header the `openai-python` client sends.
requests_headers = {"Authorization": f"Bearer {api_key}"}
requests_payload = {
    "model": "mixtral-8x7b",
    "messages": [
        {
            "role": "user",
            "content": "what is your model",
        }
    ],
    "stream": True,
}

with httpx.stream(
    "POST",
    chat_api + "/chat/completions",
    headers=requests_headers,
    json=requests_payload,
    timeout=httpx.Timeout(connect=20, read=60, write=20, pool=None),
) as response:
    # https://docs.aiohttp.org/en/stable/streams.html
    # https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb
    response_content = ""
    for line in response.iter_lines():
        # Strip the SSE "data:" prefix and the terminating "[DONE]" marker.
        remove_patterns = [r"^\s*data:\s*", r"^\s*\[DONE\]\s*"]
        for pattern in remove_patterns:
            line = re.sub(pattern, "", line).strip()

        if line:
            try:
                line_data = json.loads(line)
            except Exception as e:
                # Fall back to Python literal syntax if the line is not valid JSON.
                try:
                    line_data = ast.literal_eval(line)
                except Exception:
                    print(f"Error: {line}")
                    raise e
            # print(f"line: {line_data}")
            delta_data = line_data["choices"][0]["delta"]
            finish_reason = line_data["choices"][0]["finish_reason"]
            if "role" in delta_data:
                role = delta_data["role"]
            if "content" in delta_data:
                delta_content = delta_data["content"]
                response_content += delta_content
                print(delta_content, end="", flush=True)
            if finish_reason == "stop":
                print()
```
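
The per-line parsing in the loop above can also be factored into a small function, which makes it easier to test without a running service. This is a sketch, not the project's code: `extract_delta_content` is a hypothetical name, and it handles JSON payloads only, omitting the `ast.literal_eval` fallback.

```py
import json
import re

# Matches the "data:" prefix that precedes each streamed chunk.
_DATA_PREFIX = re.compile(r"^\s*data:\s*")


def extract_delta_content(line):
    """Return the text fragment carried by one streamed line, or None.

    None is returned for blank lines, the terminating [DONE] marker,
    and chunks whose delta has no "content" key (e.g. role-only chunks).
    """
    payload = _DATA_PREFIX.sub("", line).strip()
    if not payload or payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")
```

Inside the streaming loop, each `line` from `response.iter_lines()` could be passed to this function and any non-`None` result printed or accumulated.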