💥 LiteLLM Proxy Server
LiteLLM Server manages:

Calling 100+ LLMs (Huggingface/Bedrock/TogetherAI/etc.) in the OpenAI ChatCompletions & Completions format
Setting custom prompt templates + model-specific configs (temperature, max_tokens, etc.)
Quick Start
View all the supported args for the Proxy CLI here

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000
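
The proxy now exposes OpenAI-compatible endpoints (chat completions, completions, embeddings, models) on port 8000; see Server Endpoints below.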

Test
In a new shell, run the following; this will make an openai.ChatCompletion request to the proxy:

litellm --test

This will now automatically route any requests for gpt-3.5-turbo to bigcode/starcoder, hosted on Huggingface Inference Endpoints.

Replace openai base
import openai

openai.api_key = "anything"              # placeholder; the OpenAI client requires a key to be set
openai.base_url = "http://0.0.0.0:8000"  # point the SDK (v1+) at the proxy instead of api.openai.com

print(openai.chat.completions.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))
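
Because the proxy speaks the OpenAI format, existing OpenAI SDK code only needs its base URL swapped; the model name passed here is forwarded to whichever model the proxy was started with.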

Supported LLMs
Bedrock
Huggingface (TGI)
Anthropic
vLLM
OpenAI Compatible Server
TogetherAI
Replicate
Petals
PaLM
Azure OpenAI
AI21
Cohere
For example, to run the proxy against Bedrock:

$ export AWS_ACCESS_KEY_ID=""
$ export AWS_REGION_NAME="" # e.g. us-west-2
$ export AWS_SECRET_ACCESS_KEY=""

$ litellm --model bedrock/anthropic.claude-v2
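
The other providers above follow the same pattern: export that provider's credentials, then start the proxy with litellm --model <provider>/<model-name>.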

Server Endpoints
POST /chat/completions - chat completions endpoint to call 100+ LLMs
POST /completions - completions endpoint
POST /embeddings - embeddings endpoint for Azure, OpenAI, and Huggingface models
GET /models - list the models available on the server
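
As a quick sanity check, these endpoints can also be exercised with plain HTTP. Below is a minimal sketch using the requests library, assuming the proxy is running locally on port 8000 as in the Quick Start; the payload shapes follow the OpenAI format described above:

import requests

BASE = "http://0.0.0.0:8000"

# GET /models - list the models the server exposes
print(requests.get(f"{BASE}/models").json())

# POST /chat/completions - OpenAI-format chat request
resp = requests.post(
    f"{BASE}/chat/completions",
    json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hey!"}]},
)
print(resp.json())

# POST /embeddings - OpenAI-format embedding request (for models that support it)
emb = requests.post(f"{BASE}/embeddings", json={"model": "test", "input": ["Hey!"]})
print(emb.json())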