Upload 225 files
This view is limited to 50 files because it contains too many changes.
- .gitattributes +2 -0
- docs/my-website/.gitignore +20 -0
- docs/my-website/Dockerfile +9 -0
- docs/my-website/README.md +41 -0
- docs/my-website/babel.config.js +3 -0
- docs/my-website/blog/2021-08-26-welcome/index.md +43 -0
- docs/my-website/docs/budget_manager.md +248 -0
- docs/my-website/docs/caching/caching_api.md +75 -0
- docs/my-website/docs/caching/local_caching.md +92 -0
- docs/my-website/docs/caching/redis_cache.md +73 -0
- docs/my-website/docs/completion/batching.md +182 -0
- docs/my-website/docs/completion/config.md +49 -0
- docs/my-website/docs/completion/function_call.md +545 -0
- docs/my-website/docs/completion/input.md +582 -0
- docs/my-website/docs/completion/message_trimming.md +36 -0
- docs/my-website/docs/completion/mock_requests.md +72 -0
- docs/my-website/docs/completion/model_alias.md +53 -0
- docs/my-website/docs/completion/multiple_deployments.md +53 -0
- docs/my-website/docs/completion/output.md +68 -0
- docs/my-website/docs/completion/prompt_formatting.md +86 -0
- docs/my-website/docs/completion/reliable_completions.md +196 -0
- docs/my-website/docs/completion/stream.md +76 -0
- docs/my-website/docs/completion/token_usage.md +154 -0
- docs/my-website/docs/contact.md +6 -0
- docs/my-website/docs/debugging/hosted_debugging.md +91 -0
- docs/my-website/docs/debugging/local_debugging.md +64 -0
- docs/my-website/docs/default_code_snippet.md +22 -0
- docs/my-website/docs/embedding/async_embedding.md +15 -0
- docs/my-website/docs/embedding/moderation.md +10 -0
- docs/my-website/docs/embedding/supported_embedding.md +201 -0
- docs/my-website/docs/exception_mapping.md +102 -0
- docs/my-website/docs/extras/contributing.md +49 -0
- docs/my-website/docs/getting_started.md +100 -0
- docs/my-website/docs/index.md +402 -0
- docs/my-website/docs/langchain/langchain.md +135 -0
- docs/my-website/docs/migration.md +35 -0
- docs/my-website/docs/observability/callbacks.md +35 -0
- docs/my-website/docs/observability/custom_callback.md +358 -0
- docs/my-website/docs/observability/helicone_integration.md +55 -0
- docs/my-website/docs/observability/langfuse_integration.md +105 -0
- docs/my-website/docs/observability/langsmith_integration.md +77 -0
- docs/my-website/docs/observability/llmonitor_integration.md +65 -0
- docs/my-website/docs/observability/promptlayer_integration.md +77 -0
- docs/my-website/docs/observability/sentry.md +44 -0
- docs/my-website/docs/observability/slack_integration.md +93 -0
- docs/my-website/docs/observability/supabase_integration.md +101 -0
- docs/my-website/docs/observability/telemetry.md +13 -0
- docs/my-website/docs/observability/traceloop_integration.md +34 -0
- docs/my-website/docs/observability/wandb_integration.md +51 -0
- docs/my-website/docs/projects.md +19 -0
.gitattributes
ADDED
@@ -0,0 +1,2 @@
```
docs/my-website/img/alt_dashboard.png filter=lfs diff=lfs merge=lfs -text
docs/my-website/img/dashboard_log.png filter=lfs diff=lfs merge=lfs -text
```
docs/my-website/.gitignore
ADDED
@@ -0,0 +1,20 @@
```
# Dependencies
/node_modules

# Production
/build

# Generated files
.docusaurus
.cache-loader

# Misc
.DS_Store
.env.local
.env.development.local
.env.test.local
.env.production.local

npm-debug.log*
yarn-debug.log*
yarn-error.log*
```
docs/my-website/Dockerfile
ADDED
@@ -0,0 +1,9 @@
```dockerfile
FROM python:3.10

COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt

EXPOSE $PORT

CMD litellm --host 0.0.0.0 --port $PORT --workers 10 --config config.yaml
```
docs/my-website/README.md
ADDED
@@ -0,0 +1,41 @@
# Website

This website is built using [Docusaurus 2](https://docusaurus.io/), a modern static website generator.

### Installation

```
$ yarn
```

### Local Development

```
$ yarn start
```

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

### Build

```
$ yarn build
```

This command generates static content into the `build` directory and can be served using any static content hosting service.

### Deployment

Using SSH:

```
$ USE_SSH=true yarn deploy
```

Not using SSH:

```
$ GIT_USER=<Your GitHub username> yarn deploy
```

If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
docs/my-website/babel.config.js
ADDED
@@ -0,0 +1,3 @@
```js
module.exports = {
  presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
};
```
docs/my-website/blog/2021-08-26-welcome/index.md
ADDED
@@ -0,0 +1,43 @@
# 🚅 litellm
A light, 100-line package to simplify calling the OpenAI, Azure, Cohere, and Anthropic APIs.

###### litellm manages:
* Calling all LLM APIs using the OpenAI format - `completion(model, messages)`
* Consistent output for all LLM APIs, text responses will always be available at `['choices'][0]['message']['content']`
* Consistent exceptions for all LLM APIs: we map RateLimit, Context Window, and Authentication Error exceptions across all providers to their OpenAI equivalents. [see Code](https://github.com/BerriAI/litellm/blob/ba1079ff6698ef238c5c7f771dd2b698ec76f8d9/litellm/utils.py#L250)

###### observability:
* Logging - see exactly what the raw model request/response is by plugging in your own function `completion(.., logger_fn=your_logging_fn)` and/or enabling print statements from the package with `litellm.set_verbose=True`
* Callbacks - automatically send your data to Helicone, Sentry, Posthog, Slack - `litellm.success_callbacks`, `litellm.failure_callbacks` [see Callbacks](https://litellm.readthedocs.io/en/latest/advanced/)

## Quick Start
Go directly to code: [Getting Started Notebook](https://colab.research.google.com/drive/1gR3pY-JzDZahzpVdbGBtrNGDBmzUNJaJ?usp=sharing)
### Installation
```
pip install litellm
```

### Usage
```python
import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion("command-nightly", messages)
```
Need help / support? [See troubleshooting](https://litellm.readthedocs.io/en/latest/troubleshoot)

## Why did we build liteLLM
- **Need for simplicity**: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI, and Cohere

## Support
* [Meet with us](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
* Contact us at [email protected] / [email protected]
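
The logging hooks mentioned in the observability notes above (`logger_fn` and `litellm.set_verbose`) can be wired up in a few lines. A minimal sketch, assuming `logger_fn` receives a dict describing the raw model call (the exact payload shape is not specified in this post):

```python
import os
import litellm
from litellm import completion

os.environ["OPENAI_API_KEY"] = "openai key"

litellm.set_verbose = True  # print raw request/response details from the package

def my_logging_fn(model_call_dict):
    # assumption: litellm passes a dict describing the call; inspect or forward it as needed
    print("model call details:", model_call_dict)

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    logger_fn=my_logging_fn,
)
```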
docs/my-website/docs/budget_manager.md
ADDED
@@ -0,0 +1,248 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Budget Manager

Don't want to run up crazy bills, either while you're calling LLM APIs **or** while your users are calling them? Use this.

LiteLLM exposes:
* `litellm.max_budget`: a global variable you can use to set the max budget (in USD) across all your litellm calls. If this budget is exceeded, it will raise a `BudgetExceededError`
* `BudgetManager`: A class to help set budgets per user. BudgetManager creates a dictionary to manage the user budgets, where the key is the user and the value is their current cost + model-specific costs.

## Quick Start

```python
import litellm, os
from litellm import completion

# set env variable
os.environ["OPENAI_API_KEY"] = "your-api-key"

litellm.max_budget = 0.001 # sets a max budget of $0.001

messages = [{"role": "user", "content": "Hey, how's it going"}]
completion(model="gpt-4", messages=messages)
print(litellm._current_cost)
completion(model="gpt-4", messages=messages) # once the $0.001 budget is exceeded, this raises BudgetExceededError
```

## User-based rate limiting
<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/LiteLLM_User_Based_Rate_Limits.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

```python
from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="test_project")

user = "1234"

# create a budget if this is a new user
if not budget_manager.is_valid_user(user):
    budget_manager.create_budget(total_budget=10, user=user)

# check if a given call can be made
if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
    response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey, how's it going?"}])
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    response = "Sorry - no budget!"
```

[**Implementation Code**](https://github.com/BerriAI/litellm/blob/main/litellm/budget_manager.py)

## Use with Text Input / Output

Update cost by just passing in the text input / output and model name.

```python
from litellm import BudgetManager

budget_manager = BudgetManager(project_name="test_project")
user = "12345"
budget_manager.create_budget(total_budget=10, user=user, duration="daily")

input_text = "hello world"
output_text = "it's a sunny day in san francisco"
model = "gpt-3.5-turbo"

budget_manager.update_cost(user=user, model=model, input_text=input_text, output_text=output_text) # 👈
print(budget_manager.get_current_cost(user))
```

## Advanced Usage
In production, we will need to:
* store user budgets in a database
* reset user budgets based on a set duration

### LiteLLM API

The LiteLLM API provides both. It stores the user object in a hosted db, and runs a cron job daily to reset user budgets based on the set duration (e.g. reset budget daily/weekly/monthly/etc.).

**Usage**
```python
budget_manager = BudgetManager(project_name="<my-unique-project>", client_type="hosted")
```

**Complete Code**
```python
from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="<my-unique-project>", client_type="hosted")

user = "1234"

# create a budget if this is a new user
if not budget_manager.is_valid_user(user):
    budget_manager.create_budget(total_budget=10, user=user, duration="monthly") # 👈 duration = 'daily'/'weekly'/'monthly'/'yearly'

# check if a given call can be made
if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
    response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey, how's it going?"}])
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    response = "Sorry - no budget!"
```

### Self-hosted

To use your own db, set the BudgetManager client type to `hosted` **and** set the `api_base`.

Your API is expected to expose `/get_budget` and `/set_budget` endpoints. [See code for details](https://github.com/BerriAI/litellm/blob/27f1051792176a7eb1fe3b72b72bccd6378d24e9/litellm/budget_manager.py#L7)

**Usage**
```python
budget_manager = BudgetManager(project_name="<my-unique-project>", client_type="hosted", api_base="your_custom_api")
```
**Complete Code**
```python
from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="<my-unique-project>", client_type="hosted", api_base="your_custom_api")

user = "1234"

# create a budget if this is a new user
if not budget_manager.is_valid_user(user):
    budget_manager.create_budget(total_budget=10, user=user, duration="monthly") # 👈 duration = 'daily'/'weekly'/'monthly'/'yearly'

# check if a given call can be made
if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
    response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey, how's it going?"}])
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    response = "Sorry - no budget!"
```

## Budget Manager Class
The `BudgetManager` class is used to manage budgets for different users. It provides various functions to create, update, and retrieve budget information.

Below is a list of public functions exposed by the Budget Manager class and their inputs/outputs.

### __init__
```python
def __init__(self, project_name: str, client_type: str = "local", api_base: Optional[str] = None)
```
- `project_name` (str): The name of the project.
- `client_type` (str): The client type ("local" or "hosted"). Defaults to "local".
- `api_base` (Optional[str]): The base URL of the API. Defaults to None.

### create_budget
```python
def create_budget(self, total_budget: float, user: str, duration: Literal["daily", "weekly", "monthly", "yearly"], created_at: float = time.time())
```
Creates a budget for a user.

- `total_budget` (float): The total budget of the user.
- `user` (str): The user id.
- `duration` (Literal["daily", "weekly", "monthly", "yearly"]): The budget duration.
- `created_at` (float): The creation time. Default is the current time.

### projected_cost
```python
def projected_cost(self, model: str, messages: list, user: str)
```
Computes the projected cost for a session.

- `model` (str): The name of the model.
- `messages` (list): The list of messages.
- `user` (str): The user id.

### get_total_budget
```python
def get_total_budget(self, user: str)
```
Returns the total budget of a user.

- `user` (str): The user id.

### update_cost
```python
def update_cost(self, completion_obj: ModelResponse, user: str)
```
Updates the user's cost.

- `completion_obj` (ModelResponse): The completion object received from the model.
- `user` (str): The user id.

### get_current_cost
```python
def get_current_cost(self, user: str)
```
Returns the current cost of a user.

- `user` (str): The user id.

### get_model_cost
```python
def get_model_cost(self, user: str)
```
Returns the model cost of a user.

- `user` (str): The user id.

### is_valid_user
```python
def is_valid_user(self, user: str) -> bool
```
Checks if a user is valid.

- `user` (str): The user id.

### get_users
```python
def get_users(self)
```
Returns a list of all users.

### reset_cost
```python
def reset_cost(self, user: str)
```
Resets the cost of a user.

- `user` (str): The user id.

### reset_on_duration
```python
def reset_on_duration(self, user: str)
```
Resets the cost of a user based on the duration.

- `user` (str): The user id.

### update_budget_all_users
```python
def update_budget_all_users(self)
```
Updates the budget for all users.

### save_data
```python
def save_data(self)
```
Stores the user dictionary.
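
As a rough illustration of how the functions above compose, here is a minimal sketch (not from the original doc) that uses the documented `projected_cost` to pre-check a call before spending, then persists state with `save_data`. It assumes `projected_cost` returns the estimated session cost in USD:

```python
from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="test_project")
user = "1234"

if not budget_manager.is_valid_user(user):
    budget_manager.create_budget(total_budget=10, user=user, duration="daily")

messages = [{"role": "user", "content": "Summarize the litellm budget manager for me."}]

# assumption: projected_cost returns the estimated cost (in USD) of this session
projected = budget_manager.projected_cost(model="gpt-3.5-turbo", messages=messages, user=user)

if budget_manager.get_current_cost(user=user) + projected <= budget_manager.get_total_budget(user):
    response = completion(model="gpt-3.5-turbo", messages=messages)
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    response = "Sorry - no budget!"

budget_manager.save_data()  # store the user dictionary
```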
docs/my-website/docs/caching/caching_api.md
ADDED
@@ -0,0 +1,75 @@
# Hosted Cache - api.litellm.ai

Use api.litellm.ai for caching `completion()` and `embedding()` responses.

## Quick Start Usage - Completion
```python
import litellm
from litellm import completion
from litellm.caching import Cache
litellm.cache = Cache(type="hosted") # init cache to use api.litellm.ai

# Make completion calls
response1 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)

response2 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)
# response1 == response2, response 1 is cached
```

## Usage - Embedding()

```python
import time
import litellm
from litellm import embedding
from litellm.caching import Cache
litellm.cache = Cache(type="hosted")

start_time = time.time()
embedding1 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
end_time = time.time()
print(f"Embedding 1 response time: {end_time - start_time} seconds")

start_time = time.time()
embedding2 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
end_time = time.time()
print(f"Embedding 2 response time: {end_time - start_time} seconds")
```

## Caching with Streaming
LiteLLM can cache your streamed responses for you.

### Usage
```python
import litellm
from litellm import completion
from litellm.caching import Cache
litellm.cache = Cache(type="hosted")

# Make completion calls
response1 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
    caching=True)
for chunk in response1:
    print(chunk)

response2 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
    caching=True)
for chunk in response2:
    print(chunk)
```
docs/my-website/docs/caching/local_caching.md
ADDED
@@ -0,0 +1,92 @@
# LiteLLM - Local Caching

## Caching `completion()` and `embedding()` calls when switched on

liteLLM implements exact-match caching and supports the following cache backends:
* In-Memory Caching [Default]
* Redis Caching Local
* Redis Caching Hosted

## Quick Start Usage - Completion
Cache keys are derived from the request (the `model` plus the request params), so repeating the identical call below leads to a cache hit.
```python
import litellm
from litellm import completion
from litellm.caching import Cache
litellm.cache = Cache()

# Make completion calls
response1 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)
response2 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)

# response1 == response2, response 1 is cached
```

## Custom Key-Value Pairs
Add custom key-value pairs to your cache.

```python
from litellm.caching import Cache
cache = Cache()

cache.add_cache(cache_key="test-key", result="1234")

cache.get_cache(cache_key="test-key")
```

## Caching with Streaming
LiteLLM can cache your streamed responses for you.

### Usage
```python
import litellm
from litellm import completion
from litellm.caching import Cache
litellm.cache = Cache()

# Make completion calls
response1 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
    caching=True)
for chunk in response1:
    print(chunk)
response2 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
    caching=True)
for chunk in response2:
    print(chunk)
```

## Usage - Embedding()
Cache keys include the `model` and the request input, so the identical embedding calls below lead to a cache hit.
```python
import time
import litellm
from litellm import embedding
from litellm.caching import Cache
litellm.cache = Cache()

start_time = time.time()
embedding1 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
end_time = time.time()
print(f"Embedding 1 response time: {end_time - start_time} seconds")

start_time = time.time()
embedding2 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
end_time = time.time()
print(f"Embedding 2 response time: {end_time - start_time} seconds")
```
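
Beyond LLM responses, the `add_cache` / `get_cache` pair shown above can hold any value you want to memoize. A small sketch (not from the original doc) that caches the result of an expensive helper under a custom key, assuming `get_cache` returns `None` on a miss:

```python
import time
from litellm.caching import Cache

cache = Cache()  # in-memory by default

def expensive_lookup(query: str) -> str:
    time.sleep(2)  # stand-in for a slow computation or external call
    return f"result for {query}"

key = "lookup:what-is-litellm"
result = cache.get_cache(cache_key=key)
if result is None:
    result = expensive_lookup("what is litellm")
    cache.add_cache(cache_key=key, result=result)

print(result)  # subsequent runs in the same process return instantly from the in-memory cache
```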
docs/my-website/docs/caching/redis_cache.md
ADDED
@@ -0,0 +1,73 @@
# Redis Cache
### Pre-requisites
Install redis
```
pip install redis
```
For the hosted version, you can set up your own Redis DB here: https://app.redislabs.com/
### Usage
```python
import litellm
from litellm import completion
from litellm.caching import Cache
litellm.cache = Cache(type="redis", host=<host>, port=<port>, password=<password>)

# Make completion calls
response1 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)
response2 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)

# response1 == response2, response 1 is cached
```

### Custom Cache Keys

Define a function to return the cache key:
```python
# this function takes in *args, **kwargs and returns the key you want to use for caching
def custom_get_cache_key(*args, **kwargs):
    # return key to use for your cache:
    key = kwargs.get("model", "") + str(kwargs.get("messages", "")) + str(kwargs.get("temperature", "")) + str(kwargs.get("logit_bias", ""))
    print("key for cache", key)
    return key
```

Set your function as `litellm.cache.get_cache_key`:
```python
import os
import litellm
from litellm.caching import Cache

cache = Cache(type="redis", host=os.environ['REDIS_HOST'], port=os.environ['REDIS_PORT'], password=os.environ['REDIS_PASSWORD'])

cache.get_cache_key = custom_get_cache_key # set get_cache_key function for your cache

litellm.cache = cache # set litellm.cache to your cache
```

### Detecting Cached Responses
For responses that were returned as a cache hit, the response includes a param `cache` = True.

Example response with cache hit
```python
{
    'cache': True,
    'id': 'chatcmpl-7wggdzd6OXhgE2YhcLJHJNZsEWzZ2',
    'created': 1694221467,
    'model': 'gpt-3.5-turbo-0613',
    'choices': [
        {
            'index': 0,
            'message': {
                'role': 'assistant',
                'content': 'I\'m sorry, but I couldn\'t find any information about "litellm" or how many stars it has. It is possible that you may be referring to a specific product, service, or platform that I am not familiar with. Can you please provide more context or clarify your question?'
            },
            'finish_reason': 'stop'
        }
    ],
    'usage': {'prompt_tokens': 17, 'completion_tokens': 59, 'total_tokens': 76},
}
```
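
A small sketch (not part of the original doc) of how the `cache` flag above could be used in practice, assuming the flag is readable via dictionary-style access on the response object:

```python
import os
import time
import litellm
from litellm import completion
from litellm.caching import Cache

litellm.cache = Cache(type="redis", host=os.environ['REDIS_HOST'], port=os.environ['REDIS_PORT'], password=os.environ['REDIS_PASSWORD'])

messages = [{"role": "user", "content": "Tell me a joke."}]

for attempt in (1, 2):
    start = time.time()
    response = completion(model="gpt-3.5-turbo", messages=messages, caching=True)
    elapsed = time.time() - start
    # assumption: cached responses expose a truthy 'cache' field, per the example above
    print(f"attempt {attempt}: cache hit = {response.get('cache', False)}, took {elapsed:.2f}s")
```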
docs/my-website/docs/completion/batching.md
ADDED
@@ -0,0 +1,182 @@
# Batching Completion()
LiteLLM allows you to:
* Send many completion calls to 1 model
* Send 1 completion call to many models: Return Fastest Response
* Send 1 completion call to many models: Return All Responses

## Send multiple completion calls to 1 model

In the batch_completion method, you provide a list of `messages` where each sub-list of messages is passed to `litellm.completion()`, allowing you to process multiple prompts efficiently with a single function call.

<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/LiteLLM_batch_completion.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Example Code
```python
import litellm
import os
from litellm import batch_completion

os.environ['ANTHROPIC_API_KEY'] = ""

responses = batch_completion(
    model="claude-2",
    messages = [
        [
            {
                "role": "user",
                "content": "good morning? "
            }
        ],
        [
            {
                "role": "user",
                "content": "what's the time? "
            }
        ]
    ]
)
```

## Send 1 completion call to many models: Return Fastest Response
This makes parallel calls to the specified `models` and returns the first response.

Use this to reduce latency.

### Example Code
```python
import litellm
import os
from litellm import batch_completion_models

os.environ['ANTHROPIC_API_KEY'] = ""
os.environ['OPENAI_API_KEY'] = ""
os.environ['COHERE_API_KEY'] = ""

response = batch_completion_models(
    models=["gpt-3.5-turbo", "claude-instant-1.2", "command-nightly"],
    messages=[{"role": "user", "content": "Hey, how's it going"}]
)
print(response)
```

### Output
Returns the first response
```json
{
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": " I'm doing well, thanks for asking! I'm an AI assistant created by Anthropic to be helpful, harmless, and honest.",
        "role": "assistant",
        "logprobs": null
      }
    }
  ],
  "id": "chatcmpl-23273eed-e351-41be-a492-bafcf5cf3274",
  "created": 1695154628.2076092,
  "model": "command-nightly",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 14,
    "total_tokens": 20
  }
}
```

## Send 1 completion call to many models: Return All Responses
This makes parallel calls to the specified models and returns all responses.

Use this to process requests concurrently and get responses from multiple models.

### Example Code
```python
import litellm
import os
from litellm import batch_completion_models_all_responses

os.environ['ANTHROPIC_API_KEY'] = ""
os.environ['OPENAI_API_KEY'] = ""
os.environ['COHERE_API_KEY'] = ""

responses = batch_completion_models_all_responses(
    models=["gpt-3.5-turbo", "claude-instant-1.2", "command-nightly"],
    messages=[{"role": "user", "content": "Hey, how's it going"}]
)
print(responses)
```

### Output

```json
[<ModelResponse chat.completion id=chatcmpl-e673ec8e-4e8f-4c9e-bf26-bf9fa7ee52b9 at 0x103a62160> JSON: {
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop_sequence",
      "index": 0,
      "message": {
        "content": " It's going well, thank you for asking! How about you?",
        "role": "assistant",
        "logprobs": null
      }
    }
  ],
  "id": "chatcmpl-e673ec8e-4e8f-4c9e-bf26-bf9fa7ee52b9",
  "created": 1695222060.917964,
  "model": "claude-instant-1.2",
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 9,
    "total_tokens": 23
  }
}, <ModelResponse chat.completion id=chatcmpl-ab6c5bd3-b5d9-4711-9697-e28d9fb8a53c at 0x103a62b60> JSON: {
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": " It's going well, thank you for asking! How about you?",
        "role": "assistant",
        "logprobs": null
      }
    }
  ],
  "id": "chatcmpl-ab6c5bd3-b5d9-4711-9697-e28d9fb8a53c",
  "created": 1695222061.0445492,
  "model": "command-nightly",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 14,
    "total_tokens": 20
  }
}, <OpenAIObject chat.completion id=chatcmpl-80szFnKHzCxObW0RqCMw1hWW1Icrq at 0x102dd6430> JSON: {
  "id": "chatcmpl-80szFnKHzCxObW0RqCMw1hWW1Icrq",
  "object": "chat.completion",
  "created": 1695222061,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI language model, so I don't have feelings, but I'm here to assist you with any questions or tasks you might have. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 39,
    "total_tokens": 52
  }
}]
```
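
As a rough latency comparison for the "return fastest response" mode above, here is a small sketch (not part of the original doc) that times `batch_completion_models` against a single `completion` call:

```python
import os
import time
from litellm import completion, batch_completion_models

os.environ['OPENAI_API_KEY'] = ""
os.environ['ANTHROPIC_API_KEY'] = ""
os.environ['COHERE_API_KEY'] = ""

messages = [{"role": "user", "content": "Hey, how's it going"}]

start = time.time()
single = completion(model="gpt-3.5-turbo", messages=messages)
print(f"single model: {time.time() - start:.2f}s")

start = time.time()
fastest = batch_completion_models(
    models=["gpt-3.5-turbo", "claude-instant-1.2", "command-nightly"],
    messages=messages,
)
# returns as soon as the quickest of the three providers responds
print(f"fastest of three: {time.time() - start:.2f}s")
```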
docs/my-website/docs/completion/config.md
ADDED
@@ -0,0 +1,49 @@
# Model Config

Model-specific changes can make our code complicated, making it harder to debug errors. Use model configs to simplify this.

### Usage

Handling prompt logic: different models have different context windows. Use `adapt_to_prompt_size` to select the right model for the prompt (in case the current model is too small).

```python
from litellm import completion_with_config
import os

config = {
    "available_models": ["gpt-3.5-turbo", "claude-instant-1", "gpt-3.5-turbo-16k"],
    "adapt_to_prompt_size": True, # 👈 key change
}

# set env var
os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

sample_text = "how does a court case get to the Supreme Court?" * 1000
messages = [{"content": sample_text, "role": "user"}]
response = completion_with_config(model="gpt-3.5-turbo", messages=messages, config=config)
```

[**See Code**](https://github.com/BerriAI/litellm/blob/30724d9e51cdc2c3e0eb063271b4f171bc01b382/litellm/utils.py#L2783)

### Complete Config Structure

```python
config = {
    "default_fallback_models": # [Optional] List of model names to try if a call fails
    "available_models": # [Optional] List of all possible models you could call
    "adapt_to_prompt_size": # [Optional] True/False - if you want to select model based on prompt size (will pick from available_models)
    "model": {
        "model-name": {
            "needs_moderation": # [Optional] True/False - if you want to call the openai moderations endpoint before making the completion call. Will raise an exception if flagged.
            "error_handling": {
                "error-type": { # One of the errors listed here - https://docs.litellm.ai/docs/exception_mapping#custom-mapping-list
                    "fallback_model": "" # str, name of the model it should try instead, when that error occurs
                }
            }
        }
    }
}
```
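
To make the structure above concrete, here is a minimal sketch (not part of the original doc) of a config that falls back to a larger-context model on a context-window error; the error-type key below is assumed to match one of the names in the exception-mapping list linked above:

```python
from litellm import completion_with_config
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

config = {
    "default_fallback_models": ["gpt-3.5-turbo-16k"],
    "model": {
        "gpt-3.5-turbo": {
            "error_handling": {
                # assumption: "ContextWindowExceededError" is the error-type name from the exception-mapping list
                "ContextWindowExceededError": {"fallback_model": "gpt-3.5-turbo-16k"}
            }
        }
    },
}

messages = [{"role": "user", "content": "how does a court case get to the Supreme Court?" * 500}]
response = completion_with_config(model="gpt-3.5-turbo", messages=messages, config=config)
print(response)
```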
docs/my-website/docs/completion/function_call.md
ADDED
@@ -0,0 +1,545 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# Function Calling
|
2 |
+
Function calling is supported with the following models on OpenAI, Azure OpenAI
|
3 |
+
|
4 |
+
- gpt-4
|
5 |
+
- gpt-4-1106-preview
|
6 |
+
- gpt-4-0613
|
7 |
+
- gpt-3.5-turbo
|
8 |
+
- gpt-3.5-turbo-1106
|
9 |
+
- gpt-3.5-turbo-0613
|
10 |
+
- Non OpenAI LLMs (litellm adds the function call to the prompt for these llms)
|
11 |
+
|
12 |
+
In addition, parallel function calls is supported on the following models:
|
13 |
+
- gpt-4-1106-preview
|
14 |
+
- gpt-3.5-turbo-1106
|
15 |
+
|
16 |
+
## Parallel Function calling
|
17 |
+
Parallel function calling is the model's ability to perform multiple function calls together, allowing the effects and results of these function calls to be resolved in parallel
|
18 |
+
|
19 |
+
## Quick Start - gpt-3.5-turbo-1106
|
20 |
+
<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/Parallel_function_calling.ipynb">
|
21 |
+
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
|
22 |
+
</a>
|
23 |
+
|
24 |
+
In this example we define a single function `get_current_weather`.
|
25 |
+
|
26 |
+
- Step 1: Send the model the `get_current_weather` with the user question
|
27 |
+
- Step 2: Parse the output from the model response - Execute the `get_current_weather` with the model provided args
|
28 |
+
- Step 3: Send the model the output from running the `get_current_weather` function
|
29 |
+
|
30 |
+
|
31 |
+
### Full Code - Parallel function calling with `gpt-3.5-turbo-1106`
|
32 |
+
|
33 |
+
```python
|
34 |
+
import litellm
|
35 |
+
import json
|
36 |
+
# set openai api key
|
37 |
+
import os
|
38 |
+
os.environ['OPENAI_API_KEY'] = "" # litellm reads OPENAI_API_KEY from .env and sends the request
|
39 |
+
|
40 |
+
# Example dummy function hard coded to return the same weather
|
41 |
+
# In production, this could be your backend API or an external API
|
42 |
+
def get_current_weather(location, unit="fahrenheit"):
|
43 |
+
"""Get the current weather in a given location"""
|
44 |
+
if "tokyo" in location.lower():
|
45 |
+
return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
|
46 |
+
elif "san francisco" in location.lower():
|
47 |
+
return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
|
48 |
+
elif "paris" in location.lower():
|
49 |
+
return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
|
50 |
+
else:
|
51 |
+
return json.dumps({"location": location, "temperature": "unknown"})
|
52 |
+
|
53 |
+
|
54 |
+
def test_parallel_function_call():
|
55 |
+
try:
|
56 |
+
# Step 1: send the conversation and available functions to the model
|
57 |
+
messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
|
58 |
+
tools = [
|
59 |
+
{
|
60 |
+
"type": "function",
|
61 |
+
"function": {
|
62 |
+
"name": "get_current_weather",
|
63 |
+
"description": "Get the current weather in a given location",
|
64 |
+
"parameters": {
|
65 |
+
"type": "object",
|
66 |
+
"properties": {
|
67 |
+
"location": {
|
68 |
+
"type": "string",
|
69 |
+
"description": "The city and state, e.g. San Francisco, CA",
|
70 |
+
},
|
71 |
+
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
|
72 |
+
},
|
73 |
+
"required": ["location"],
|
74 |
+
},
|
75 |
+
},
|
76 |
+
}
|
77 |
+
]
|
78 |
+
response = litellm.completion(
|
79 |
+
model="gpt-3.5-turbo-1106",
|
80 |
+
messages=messages,
|
81 |
+
tools=tools,
|
82 |
+
tool_choice="auto", # auto is default, but we'll be explicit
|
83 |
+
)
|
84 |
+
print("\nFirst LLM Response:\n", response)
|
85 |
+
response_message = response.choices[0].message
|
86 |
+
tool_calls = response_message.tool_calls
|
87 |
+
|
88 |
+
print("\nLength of tool calls", len(tool_calls))
|
89 |
+
|
90 |
+
# Step 2: check if the model wanted to call a function
|
91 |
+
if tool_calls:
|
92 |
+
# Step 3: call the function
|
93 |
+
# Note: the JSON response may not always be valid; be sure to handle errors
|
94 |
+
available_functions = {
|
95 |
+
"get_current_weather": get_current_weather,
|
96 |
+
} # only one function in this example, but you can have multiple
|
97 |
+
messages.append(response_message) # extend conversation with assistant's reply
|
98 |
+
|
99 |
+
# Step 4: send the info for each function call and function response to the model
|
100 |
+
for tool_call in tool_calls:
|
101 |
+
function_name = tool_call.function.name
|
102 |
+
function_to_call = available_functions[function_name]
|
103 |
+
function_args = json.loads(tool_call.function.arguments)
|
104 |
+
function_response = function_to_call(
|
105 |
+
location=function_args.get("location"),
|
106 |
+
unit=function_args.get("unit"),
|
107 |
+
)
|
108 |
+
messages.append(
|
109 |
+
{
|
110 |
+
"tool_call_id": tool_call.id,
|
111 |
+
"role": "tool",
|
112 |
+
"name": function_name,
|
113 |
+
"content": function_response,
|
114 |
+
}
|
115 |
+
) # extend conversation with function response
|
116 |
+
second_response = litellm.completion(
|
117 |
+
model="gpt-3.5-turbo-1106",
|
118 |
+
messages=messages,
|
119 |
+
) # get a new response from the model where it can see the function response
|
120 |
+
print("\nSecond LLM response:\n", second_response)
|
121 |
+
return second_response
|
122 |
+
except Exception as e:
|
123 |
+
print(f"Error occurred: {e}")
|
124 |
+
|
125 |
+
test_parallel_function_call()
|
126 |
+
```
|
127 |
+
|
128 |
+
### Explanation - Parallel function calling
|
129 |
+
Below is an explanation of what is happening in the code snippet above for Parallel function calling with `gpt-3.5-turbo-1106`
|
130 |
+
### Step1: litellm.completion() with `tools` set to `get_current_weather`
|
131 |
+
```python
|
132 |
+
import litellm
|
133 |
+
import json
|
134 |
+
# set openai api key
|
135 |
+
import os
|
136 |
+
os.environ['OPENAI_API_KEY'] = "" # litellm reads OPENAI_API_KEY from .env and sends the request
|
137 |
+
# Example dummy function hard coded to return the same weather
|
138 |
+
# In production, this could be your backend API or an external API
|
139 |
+
def get_current_weather(location, unit="fahrenheit"):
|
140 |
+
"""Get the current weather in a given location"""
|
141 |
+
if "tokyo" in location.lower():
|
142 |
+
return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
|
143 |
+
elif "san francisco" in location.lower():
|
144 |
+
return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
|
145 |
+
elif "paris" in location.lower():
|
146 |
+
return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
|
147 |
+
else:
|
148 |
+
return json.dumps({"location": location, "temperature": "unknown"})
|
149 |
+
|
150 |
+
messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
|
151 |
+
tools = [
|
152 |
+
{
|
153 |
+
"type": "function",
|
154 |
+
"function": {
|
155 |
+
"name": "get_current_weather",
|
156 |
+
"description": "Get the current weather in a given location",
|
157 |
+
"parameters": {
|
158 |
+
"type": "object",
|
159 |
+
"properties": {
|
160 |
+
"location": {
|
161 |
+
"type": "string",
|
162 |
+
"description": "The city and state, e.g. San Francisco, CA",
|
163 |
+
},
|
164 |
+
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
|
165 |
+
},
|
166 |
+
"required": ["location"],
|
167 |
+
},
|
168 |
+
},
|
169 |
+
}
|
170 |
+
]
|
171 |
+
|
172 |
+
response = litellm.completion(
|
173 |
+
model="gpt-3.5-turbo-1106",
|
174 |
+
messages=messages,
|
175 |
+
tools=tools,
|
176 |
+
tool_choice="auto", # auto is default, but we'll be explicit
|
177 |
+
)
|
178 |
+
print("\nLLM Response1:\n", response)
|
179 |
+
response_message = response.choices[0].message
|
180 |
+
tool_calls = response.choices[0].message.tool_calls
|
181 |
+
```
|
182 |
+
|
183 |
+
##### Expected output
|
184 |
+
In the output you can see the model calls the function multiple times - for San Francisco, Tokyo, Paris
|
185 |
+
```json
|
186 |
+
ModelResponse(
|
187 |
+
id='chatcmpl-8MHBKZ9t6bXuhBvUMzoKsfmmlv7xq',
|
188 |
+
choices=[
|
189 |
+
Choices(finish_reason='tool_calls',
|
190 |
+
index=0,
|
191 |
+
message=Message(content=None, role='assistant',
|
192 |
+
tool_calls=[
|
193 |
+
ChatCompletionMessageToolCall(id='call_DN6IiLULWZw7sobV6puCji1O', function=Function(arguments='{"location": "San Francisco", "unit": "celsius"}', name='get_current_weather'), type='function'),
|
194 |
+
|
195 |
+
ChatCompletionMessageToolCall(id='call_ERm1JfYO9AFo2oEWRmWUd40c', function=Function(arguments='{"location": "Tokyo", "unit": "celsius"}', name='get_current_weather'), type='function'),
|
196 |
+
|
197 |
+
ChatCompletionMessageToolCall(id='call_2lvUVB1y4wKunSxTenR0zClP', function=Function(arguments='{"location": "Paris", "unit": "celsius"}', name='get_current_weather'), type='function')
|
198 |
+
]))
|
199 |
+
],
|
200 |
+
created=1700319953,
|
201 |
+
model='gpt-3.5-turbo-1106',
|
202 |
+
object='chat.completion',
|
203 |
+
system_fingerprint='fp_eeff13170a',
|
204 |
+
usage={'completion_tokens': 77, 'prompt_tokens': 88, 'total_tokens': 165},
|
205 |
+
_response_ms=1177.372
|
206 |
+
)
|
207 |
+
```
|
208 |
+
|
209 |
+
### Step 2 - Parse the Model Response and Execute Functions
|
210 |
+
After sending the initial request, parse the model response to identify the function calls it wants to make. In this example, we expect three tool calls, each corresponding to a location (San Francisco, Tokyo, and Paris).
|
211 |
+
|
212 |
+
```python
|
213 |
+
# Check if the model wants to call a function
|
214 |
+
if tool_calls:
|
215 |
+
# Execute the functions and prepare responses
|
216 |
+
available_functions = {
|
217 |
+
"get_current_weather": get_current_weather,
|
218 |
+
}
|
219 |
+
|
220 |
+
messages.append(response_message) # Extend conversation with assistant's reply
|
221 |
+
|
222 |
+
for tool_call in tool_calls:
|
223 |
+
print(f"\nExecuting tool call\n{tool_call}")
|
224 |
+
function_name = tool_call.function.name
|
225 |
+
function_to_call = available_functions[function_name]
|
226 |
+
function_args = json.loads(tool_call.function.arguments)
|
227 |
+
# calling the get_current_weather() function
|
228 |
+
function_response = function_to_call(
|
229 |
+
location=function_args.get("location"),
|
230 |
+
unit=function_args.get("unit"),
|
231 |
+
)
|
232 |
+
print(f"Result from tool call\n{function_response}\n")
|
233 |
+
|
234 |
+
# Extend conversation with function response
|
235 |
+
messages.append(
|
236 |
+
{
|
237 |
+
"tool_call_id": tool_call.id,
|
238 |
+
"role": "tool",
|
239 |
+
"name": function_name,
|
240 |
+
"content": function_response,
|
241 |
+
}
|
242 |
+
)
|
243 |
+
|
244 |
+
```
|
245 |
+
|
246 |
+
### Step 3 - Second litellm.completion() call
|
247 |
+
Once the functions are executed, send the model the information for each function call and its response. This allows the model to generate a new response considering the effects of the function calls.
|
248 |
+
```python
|
249 |
+
second_response = litellm.completion(
|
250 |
+
model="gpt-3.5-turbo-1106",
|
251 |
+
messages=messages,
|
252 |
+
)
|
253 |
+
print("Second Response\n", second_response)
|
254 |
+
```
|
255 |
+
|
256 |
+
#### Expected output
|
257 |
+
```json
|
258 |
+
ModelResponse(
|
259 |
+
id='chatcmpl-8MHBLh1ldADBP71OrifKap6YfAd4w',
|
260 |
+
choices=[
|
261 |
+
Choices(finish_reason='stop', index=0,
|
262 |
+
message=Message(content="The current weather in San Francisco is 72Β°F, in Tokyo it's 10Β°C, and in Paris it's 22Β°C.", role='assistant'))
|
263 |
+
],
|
264 |
+
created=1700319955,
|
265 |
+
model='gpt-3.5-turbo-1106',
|
266 |
+
object='chat.completion',
|
267 |
+
system_fingerprint='fp_eeff13170a',
|
268 |
+
usage={'completion_tokens': 28, 'prompt_tokens': 169, 'total_tokens': 197},
|
269 |
+
_response_ms=1032.431
|
270 |
+
)
|
271 |
+
```
|
272 |
+
|
273 |
+
## Parallel Function Calling - Azure OpenAI
|
274 |
+
```python
|
275 |
+
# set Azure env variables
|
276 |
+
import os
|
277 |
+
os.environ['AZURE_API_KEY'] = "" # litellm reads AZURE_API_KEY from .env and sends the request
|
278 |
+
os.environ['AZURE_API_BASE'] = "https://openai-gpt-4-test-v-1.openai.azure.com/"
|
279 |
+
os.environ['AZURE_API_VERSION'] = "2023-07-01-preview"
|
280 |
+
|
281 |
+
import litellm
|
282 |
+
import json
|
283 |
+
# Example dummy function hard coded to return the same weather
|
284 |
+
# In production, this could be your backend API or an external API
|
285 |
+
def get_current_weather(location, unit="fahrenheit"):
|
286 |
+
"""Get the current weather in a given location"""
|
287 |
+
if "tokyo" in location.lower():
|
288 |
+
return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
|
289 |
+
elif "san francisco" in location.lower():
|
290 |
+
return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
|
291 |
+
elif "paris" in location.lower():
|
292 |
+
return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
|
293 |
+
else:
|
294 |
+
return json.dumps({"location": location, "temperature": "unknown"})
|
295 |
+
|
296 |
+
## Step 1: send the conversation and available functions to the model
|
297 |
+
messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
|
298 |
+
tools = [
|
299 |
+
{
|
300 |
+
"type": "function",
|
301 |
+
"function": {
|
302 |
+
"name": "get_current_weather",
|
303 |
+
"description": "Get the current weather in a given location",
|
304 |
+
"parameters": {
|
305 |
+
"type": "object",
|
306 |
+
"properties": {
|
307 |
+
"location": {
|
308 |
+
"type": "string",
|
309 |
+
"description": "The city and state, e.g. San Francisco, CA",
|
310 |
+
},
|
311 |
+
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
|
312 |
+
},
|
313 |
+
"required": ["location"],
|
314 |
+
},
|
315 |
+
},
|
316 |
+
}
|
317 |
+
]
|
318 |
+
|
319 |
+
response = litellm.completion(
|
320 |
+
model="azure/chatgpt-functioncalling", # model = azure/<your-azure-deployment-name>
|
321 |
+
messages=messages,
|
322 |
+
tools=tools,
|
323 |
+
tool_choice="auto", # auto is default, but we'll be explicit
|
324 |
+
)
|
325 |
+
print("\nLLM Response1:\n", response)
|
326 |
+
response_message = response.choices[0].message
|
327 |
+
tool_calls = response.choices[0].message.tool_calls
|
328 |
+
print("\nTool Choice:\n", tool_calls)
|
329 |
+
|
330 |
+
## Step 2 - Parse the Model Response and Execute Functions
|
331 |
+
# Check if the model wants to call a function
|
332 |
+
if tool_calls:
|
333 |
+
# Execute the functions and prepare responses
|
334 |
+
available_functions = {
|
335 |
+
"get_current_weather": get_current_weather,
|
336 |
+
}
|
337 |
+
|
338 |
+
messages.append(response_message) # Extend conversation with assistant's reply
|
339 |
+
|
340 |
+
for tool_call in tool_calls:
|
341 |
+
print(f"\nExecuting tool call\n{tool_call}")
|
342 |
+
function_name = tool_call.function.name
|
343 |
+
function_to_call = available_functions[function_name]
|
344 |
+
function_args = json.loads(tool_call.function.arguments)
|
345 |
+
# calling the get_current_weather() function
|
346 |
+
function_response = function_to_call(
|
347 |
+
location=function_args.get("location"),
|
348 |
+
unit=function_args.get("unit"),
|
349 |
+
)
|
350 |
+
print(f"Result from tool call\n{function_response}\n")
|
351 |
+
|
352 |
+
# Extend conversation with function response
|
353 |
+
messages.append(
|
354 |
+
{
|
355 |
+
"tool_call_id": tool_call.id,
|
356 |
+
"role": "tool",
|
357 |
+
"name": function_name,
|
358 |
+
"content": function_response,
|
359 |
+
}
|
360 |
+
)
|
361 |
+
|
362 |
+
## Step 3 - Second litellm.completion() call
|
363 |
+
second_response = litellm.completion(
|
364 |
+
model="azure/chatgpt-functioncalling",
|
365 |
+
messages=messages,
|
366 |
+
)
|
367 |
+
print("Second Response\n", second_response)
|
368 |
+
print("Second Response Message\n", second_response.choices[0].message.content)
|
369 |
+
|
370 |
+
```
|
371 |
+
|
372 |
+
## Deprecated - Function Calling with `completion(functions=functions)`
|
373 |
+
```python
|
374 |
+
import os, litellm
|
375 |
+
from litellm import completion
|
376 |
+
|
377 |
+
os.environ['OPENAI_API_KEY'] = ""
|
378 |
+
|
379 |
+
messages = [
|
380 |
+
{"role": "user", "content": "What is the weather like in Boston?"}
|
381 |
+
]
|
382 |
+
|
383 |
+
# python function that will get executed
|
384 |
+
def get_current_weather(location):
|
385 |
+
if location == "Boston, MA":
|
386 |
+
return "The weather is 12F"
|
387 |
+
|
388 |
+
# JSON Schema to pass to OpenAI
|
389 |
+
functions = [
|
390 |
+
{
|
391 |
+
"name": "get_current_weather",
|
392 |
+
"description": "Get the current weather in a given location",
|
393 |
+
"parameters": {
|
394 |
+
"type": "object",
|
395 |
+
"properties": {
|
396 |
+
"location": {
|
397 |
+
"type": "string",
|
398 |
+
"description": "The city and state, e.g. San Francisco, CA"
|
399 |
+
},
|
400 |
+
"unit": {
|
401 |
+
"type": "string",
|
402 |
+
"enum": ["celsius", "fahrenheit"]
|
403 |
+
}
|
404 |
+
},
|
405 |
+
"required": ["location"]
|
406 |
+
}
|
407 |
+
}
|
408 |
+
]
|
409 |
+
|
410 |
+
response = completion(model="gpt-3.5-turbo-0613", messages=messages, functions=functions)
|
411 |
+
print(response)
|
412 |
+
```
|
413 |
+
|
414 |
+
## litellm.function_to_dict - Convert Functions to dictionary for OpenAI function calling
|
415 |
+
`function_to_dict` lets you pass in a function with a docstring and produces a dictionary usable for OpenAI function calling.
|
416 |
+
|
417 |
+
### Using `function_to_dict`
|
418 |
+
1. Define your function `get_current_weather`
|
419 |
+
2. Add a docstring to your function `get_current_weather`
|
420 |
+
3. Pass the function to `litellm.utils.function_to_dict` to get the dictionary for OpenAI function calling
|
421 |
+
|
422 |
+
```python
|
423 |
+
# function with docstring
|
424 |
+
def get_current_weather(location: str, unit: str):
|
425 |
+
"""Get the current weather in a given location
|
426 |
+
|
427 |
+
Parameters
|
428 |
+
----------
|
429 |
+
location : str
|
430 |
+
The city and state, e.g. San Francisco, CA
|
431 |
+
unit : {'celsius', 'fahrenheit'}
|
432 |
+
Temperature unit
|
433 |
+
|
434 |
+
Returns
|
435 |
+
-------
|
436 |
+
str
|
437 |
+
a sentence indicating the weather
|
438 |
+
"""
|
439 |
+
if location == "Boston, MA":
|
440 |
+
return "The weather is 12F"
|
441 |
+
|
442 |
+
# use litellm.utils.function_to_dict to convert function to dict
|
443 |
+
function_json = litellm.utils.function_to_dict(get_current_weather)
|
444 |
+
print(function_json)
|
445 |
+
```
|
446 |
+
|
447 |
+
#### Output from function_to_dict
|
448 |
+
```json
|
449 |
+
{
|
450 |
+
'name': 'get_current_weather',
|
451 |
+
'description': 'Get the current weather in a given location',
|
452 |
+
'parameters': {
|
453 |
+
'type': 'object',
|
454 |
+
'properties': {
|
455 |
+
'location': {'type': 'string', 'description': 'The city and state, e.g. San Francisco, CA'},
|
456 |
+
'unit': {'type': 'string', 'description': 'Temperature unit', 'enum': "['fahrenheit', 'celsius']"}
|
457 |
+
},
|
458 |
+
'required': ['location', 'unit']
|
459 |
+
}
|
460 |
+
}
|
461 |
+
```
|
462 |
+
|
463 |
+
### Using function_to_dict with Function calling
|
464 |
+
```python
|
465 |
+
import os, litellm
|
466 |
+
from litellm import completion
|
467 |
+
|
468 |
+
os.environ['OPENAI_API_KEY'] = ""
|
469 |
+
|
470 |
+
messages = [
|
471 |
+
{"role": "user", "content": "What is the weather like in Boston?"}
|
472 |
+
]
|
473 |
+
|
474 |
+
def get_current_weather(location: str, unit: str):
|
475 |
+
"""Get the current weather in a given location
|
476 |
+
|
477 |
+
Parameters
|
478 |
+
----------
|
479 |
+
location : str
|
480 |
+
The city and state, e.g. San Francisco, CA
|
481 |
+
unit : str {'celsius', 'fahrenheit'}
|
482 |
+
Temperature unit
|
483 |
+
|
484 |
+
Returns
|
485 |
+
-------
|
486 |
+
str
|
487 |
+
a sentence indicating the weather
|
488 |
+
"""
|
489 |
+
if location == "Boston, MA":
|
490 |
+
return "The weather is 12F"
|
491 |
+
|
492 |
+
functions = [litellm.utils.function_to_dict(get_current_weather)]
|
493 |
+
|
494 |
+
response = completion(model="gpt-3.5-turbo-0613", messages=messages, functions=functions)
|
495 |
+
print(response)
|
496 |
+
```
|
497 |
+
|
498 |
+
## Function calling for Non-OpenAI LLMs
|
499 |
+
|
500 |
+
### Adding Function to prompt
|
501 |
+
For non-OpenAI LLMs, LiteLLM can add the function definitions to the prompt. To enable this, set `litellm.add_function_to_prompt = True`.
|
502 |
+
|
503 |
+
#### Usage
|
504 |
+
```python
|
505 |
+
import os, litellm
|
506 |
+
from litellm import completion
|
507 |
+
|
508 |
+
# IMPORTANT - Set this to TRUE to add the function to the prompt for Non OpenAI LLMs
|
509 |
+
litellm.add_function_to_prompt = True # set add_function_to_prompt for Non OpenAI LLMs
|
510 |
+
|
511 |
+
os.environ['ANTHROPIC_API_KEY'] = ""
|
512 |
+
|
513 |
+
messages = [
|
514 |
+
{"role": "user", "content": "What is the weather like in Boston?"}
|
515 |
+
]
|
516 |
+
|
517 |
+
def get_current_weather(location):
|
518 |
+
if location == "Boston, MA":
|
519 |
+
return "The weather is 12F"
|
520 |
+
|
521 |
+
functions = [
|
522 |
+
{
|
523 |
+
"name": "get_current_weather",
|
524 |
+
"description": "Get the current weather in a given location",
|
525 |
+
"parameters": {
|
526 |
+
"type": "object",
|
527 |
+
"properties": {
|
528 |
+
"location": {
|
529 |
+
"type": "string",
|
530 |
+
"description": "The city and state, e.g. San Francisco, CA"
|
531 |
+
},
|
532 |
+
"unit": {
|
533 |
+
"type": "string",
|
534 |
+
"enum": ["celsius", "fahrenheit"]
|
535 |
+
}
|
536 |
+
},
|
537 |
+
"required": ["location"]
|
538 |
+
}
|
539 |
+
}
|
540 |
+
]
|
541 |
+
|
542 |
+
response = completion(model="claude-2", messages=messages, functions=functions)
|
543 |
+
print(response)
|
544 |
+
```
|
545 |
+
|
docs/my-website/docs/completion/input.md
ADDED
@@ -0,0 +1,582 @@
1 |
+
import Tabs from '@theme/Tabs';
|
2 |
+
import TabItem from '@theme/TabItem';
|
3 |
+
|
4 |
+
# Input Params
|
5 |
+
|
6 |
+
## Common Params
|
7 |
+
LiteLLM accepts and translates the [OpenAI Chat Completion params](https://platform.openai.com/docs/api-reference/chat/create) across all providers.
|
8 |
+
|
9 |
+
### Usage
|
10 |
+
```python
|
11 |
+
import litellm, os
|
12 |
+
|
13 |
+
# set env variables
|
14 |
+
os.environ["OPENAI_API_KEY"] = "your-openai-key"
|
15 |
+
|
16 |
+
## SET MAX TOKENS - via completion()
|
17 |
+
response = litellm.completion(
|
18 |
+
model="gpt-3.5-turbo",
|
19 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
20 |
+
max_tokens=10
|
21 |
+
)
|
22 |
+
|
23 |
+
print(response)
|
24 |
+
```
|
25 |
+
|
26 |
+
### Translated OpenAI params
|
27 |
+
This is a list of OpenAI params we translate across providers.
|
28 |
+
|
29 |
+
This list is constantly being updated.
|
30 |
+
|
31 |
+
| Provider | temperature | max_tokens | top_p | stream | stop | n | presence_penalty | frequency_penalty | functions | function_call |
|
32 |
+
|---|---|---|---|---|---|---|---|---|---|---|
|
33 |
+
|Anthropic| ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
|
34 |
+
|OpenAI| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
35 |
+
|Replicate | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
|
36 |
+
|Anyscale | ✅ | ✅ | ✅ | ✅ |
|
37 |
+
|Cohere| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
|
38 |
+
|Huggingface| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | | | |
|
39 |
+
|Openrouter| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|
40 |
+
|AI21| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
|
41 |
+
|VertexAI| ✅ | ✅ | | ✅ | | | | | | |
|
42 |
+
|Bedrock| ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
|
43 |
+
|Sagemaker| ✅ | ✅ (only `jumpstart llama2`) | | ✅ | | | | | | |
|
44 |
+
|TogetherAI| ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
|
45 |
+
|AlephAlpha| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | | | |
|
46 |
+
|Palm| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | | | |
|
47 |
+
|NLP Cloud| ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
|
48 |
+
|Petals| ✅ | ✅ | | ✅ | | | | | | |
|
49 |
+
|Ollama| ✅ | ✅ | ✅ | ✅ | ✅ | | | ✅ | | |
|
50 |
+
|
51 |
+
:::note
|
52 |
+
|
53 |
+
By default, LiteLLM raises an exception if the openai param being passed in isn't supported.
|
54 |
+
|
55 |
+
To drop the param instead, set `litellm.drop_params = True`.
|
56 |
+
|
57 |
+
**For function calling:**
|
58 |
+
|
59 |
+
To add the function to the prompt for non-OpenAI models, set: `litellm.add_function_to_prompt = True`.
|
60 |
+
:::
|
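For illustration, here's a minimal sketch of `litellm.drop_params` in action (the API key value is a placeholder; per the table above, `n` is not a supported param for Anthropic):

```python
import os
import litellm

os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"  # placeholder

# without this, passing a param the provider doesn't support raises an exception
litellm.drop_params = True

response = litellm.completion(
    model="claude-instant-1",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    n=2,  # unsupported for Anthropic (see table above) - dropped instead of raising
)
print(response.choices[0].message.content)
```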
61 |
+
|
62 |
+
## Input Params
|
63 |
+
|
64 |
+
```python
|
65 |
+
def completion(
|
66 |
+
model: str,
|
67 |
+
messages: List = [],
|
68 |
+
# Optional OpenAI params
|
69 |
+
temperature: Optional[float] = None,
|
70 |
+
top_p: Optional[float] = None,
|
71 |
+
n: Optional[int] = None,
|
72 |
+
stream: Optional[bool] = None,
|
73 |
+
stop=None,
|
74 |
+
max_tokens: Optional[float] = None,
|
75 |
+
presence_penalty: Optional[float] = None,
|
76 |
+
frequency_penalty: Optional[float]=None,
|
77 |
+
logit_bias: dict = {},
|
78 |
+
user: str = "",
|
79 |
+
deployment_id = None,
|
80 |
+
request_timeout: Optional[int] = None,
|
81 |
+
response_format: Optional[dict] = None,
|
82 |
+
seed: Optional[int] = None,
|
83 |
+
tools: Optional[List] = None,
|
84 |
+
tool_choice: Optional[str] = None,
|
85 |
+
functions: List = [], # soon to be deprecated
|
86 |
+
function_call: str = "", # soon to be deprecated
|
87 |
+
|
88 |
+
# Optional LiteLLM params
|
89 |
+
api_base: Optional[str] = None,
|
90 |
+
api_version: Optional[str] = None,
|
91 |
+
api_key: Optional[str] = None,
|
92 |
+
num_retries: Optional[int] = None, # set to retry a model if an APIError, TimeoutError, or ServiceUnavailableError occurs
|
93 |
+
context_window_fallback_dict: Optional[dict] = None, # mapping of model to use if call fails due to context window error
|
94 |
+
fallbacks: Optional[list] = None, # pass in a list of api_base,keys, etc.
|
95 |
+
metadata: Optional[dict] = None # additional call metadata, passed to logging integrations / custom callbacks
|
96 |
+
|
97 |
+
|
98 |
+
**kwargs,
|
99 |
+
) -> ModelResponse:
|
100 |
+
```
|
101 |
+
### Required Fields
|
102 |
+
|
103 |
+
- `model`: *string* - ID of the model to use. Refer to the model endpoint compatibility table for details on which models work with the Chat API.
|
104 |
+
|
105 |
+
- `messages`: *array* - A list of messages comprising the conversation so far.
|
106 |
+
|
107 |
+
#### Properties of `messages`
|
108 |
+
*Note* - Each message in the array contains the following properties:
|
109 |
+
|
110 |
+
- `role`: *string* - The role of the message's author. Roles can be: system, user, assistant, or function.
|
111 |
+
|
112 |
+
- `content`: *string or null* - The contents of the message. It is required for all messages, but may be null for assistant messages with function calls.
|
113 |
+
|
114 |
+
- `name`: *string (optional)* - The name of the author of the message. It is required if the role is "function". The name should match the name of the function represented in the content. It can contain characters (a-z, A-Z, 0-9), and underscores, with a maximum length of 64 characters.
|
115 |
+
|
116 |
+
- `function_call`: *object (optional)* - The name and arguments of a function that should be called, as generated by the model.
|
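To make the schema concrete, here's a minimal sketch of a `messages` array that uses these properties (the weather function and values are placeholders, mirroring the function-calling examples elsewhere in these docs):

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather like in Boston?"},
    # assistant reply that calls a function - `content` may be null here
    {
        "role": "assistant",
        "content": None,
        "function_call": {"name": "get_current_weather", "arguments": '{"location": "Boston, MA"}'},
    },
    # function result - `name` must match the function that was called
    {"role": "function", "name": "get_current_weather", "content": "The weather is 12F"},
]
```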
117 |
+
|
118 |
+
|
119 |
+
|
120 |
+
## Optional Fields
|
121 |
+
|
122 |
+
- `temperature`: *number or null (optional)* - The sampling temperature to be used, between 0 and 2. Higher values like 0.8 produce more random outputs, while lower values like 0.2 make outputs more focused and deterministic.
|
123 |
+
|
124 |
+
- `top_p`: *number or null (optional)* - An alternative to sampling with temperature. It instructs the model to consider the results of the tokens with top_p probability. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
|
125 |
+
|
126 |
+
- `n`: *integer or null (optional)* - The number of chat completion choices to generate for each input message.
|
127 |
+
|
128 |
+
- `stream`: *boolean or null (optional)* - If set to true, it sends partial message deltas. Tokens will be sent as they become available, with the stream terminated by a [DONE] message.
|
129 |
+
|
130 |
+
- `stop`: *string/ array/ null (optional)* - Up to 4 sequences where the API will stop generating further tokens.
|
131 |
+
|
132 |
+
- `max_tokens`: *integer (optional)* - The maximum number of tokens to generate in the chat completion.
|
133 |
+
|
134 |
+
- `presence_penalty`: *number or null (optional)* - It is used to penalize new tokens based on their existence in the text so far.
|
135 |
+
|
136 |
+
- `response_format`: *object (optional)* - An object specifying the format that the model must output.
|
137 |
+
|
138 |
+
- Setting to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON.
|
139 |
+
|
140 |
+
- Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
|
141 |
+
|
142 |
+
- `seed`: *integer or null (optional)* - This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.
|
143 |
+
|
144 |
+
- `tools`: *array (optional)* - A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
|
145 |
+
|
146 |
+
- `type`: *string* - The type of the tool. Currently, only function is supported.
|
147 |
+
|
148 |
+
- `function`: *object* - Required.
|
149 |
+
|
150 |
+
- `tool_choice`: *string or object (optional)* - Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that function.
|
151 |
+
|
152 |
+
- `none` is the default when no functions are present. `auto` is the default if functions are present.
|
153 |
+
|
154 |
+
- `frequency_penalty`: *number or null (optional)* - It is used to penalize new tokens based on their frequency in the text so far.
|
155 |
+
|
156 |
+
- `logit_bias`: *map (optional)* - Used to modify the probability of specific tokens appearing in the completion.
|
157 |
+
|
158 |
+
- `user`: *string (optional)* - A unique identifier representing your end-user. This can help OpenAI to monitor and detect abuse.
|
159 |
+
|
160 |
+
- `timeout`: *int (optional)* - Timeout in seconds for completion requests (Defaults to 600 seconds)
|
161 |
+
|
162 |
+
#### Deprecated Params
|
163 |
+
- `functions`: *array* - A list of functions that the model may use to generate JSON inputs. Each function should have the following properties:
|
164 |
+
|
165 |
+
- `name`: *string* - The name of the function to be called. It should contain a-z, A-Z, 0-9, underscores and dashes, with a maximum length of 64 characters.
|
166 |
+
|
167 |
+
- `description`: *string (optional)* - A description explaining what the function does. It helps the model to decide when and how to call the function.
|
168 |
+
|
169 |
+
- `parameters`: *object* - The parameters that the function accepts, described as a JSON Schema object.
|
170 |
+
|
171 |
+
- `function_call`: *string or object (optional)* - Controls how the model responds to function calls.
|
172 |
+
|
173 |
+
|
174 |
+
#### litellm-specific params
|
175 |
+
|
176 |
+
- `api_base`: *string (optional)* - The api endpoint you want to call the model with
|
177 |
+
|
178 |
+
- `api_version`: *string (optional)* - (Azure-specific) the api version for the call
|
179 |
+
|
180 |
+
- `num_retries`: *int (optional)* - The number of times to retry the API call if an APIError, TimeoutError or ServiceUnavailableError occurs
|
181 |
+
|
182 |
+
- `context_window_fallback_dict`: *dict (optional)* - A mapping of model to use if call fails due to context window error
|
183 |
+
|
184 |
+
- `fallbacks`: *list (optional)* - A list of model names + params to be used, in case the initial call fails
|
185 |
+
|
186 |
+
- `metadata`: *dict (optional)* - Any additional data you want to be logged when the call is made (sent to logging integrations, eg. promptlayer and accessible via custom callback function)
|
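Here's a minimal sketch combining a few of the litellm-specific params above (the key and metadata values are placeholders):

```python
import os
import litellm

os.environ["OPENAI_API_KEY"] = "your-openai-key"  # placeholder

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    num_retries=2,  # retry if an APIError, TimeoutError or ServiceUnavailableError occurs
    context_window_fallback_dict={"gpt-3.5-turbo": "gpt-3.5-turbo-16k"},  # fallback on context window errors
    metadata={"request_id": "example-123"},  # placeholder - forwarded to logging integrations / custom callbacks
)
print(response.choices[0].message.content)
```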
187 |
+
|
188 |
+
## Provider-specific Params
|
189 |
+
Providers might offer params not supported by OpenAI (e.g. top_k). You can pass those in 2 ways:
|
190 |
+
- via completion(): We'll pass the non-OpenAI param straight to the provider as part of the request body.
|
191 |
+
- e.g. `completion(model="claude-instant-1", top_k=3)`
|
192 |
+
- via provider-specific config variable (e.g. `litellm.OpenAIConfig()`).
|
193 |
+
|
194 |
+
<Tabs>
|
195 |
+
<TabItem value="openai" label="OpenAI">
|
196 |
+
|
197 |
+
```python
|
198 |
+
import litellm, os
|
199 |
+
|
200 |
+
# set env variables
|
201 |
+
os.environ["OPENAI_API_KEY"] = "your-openai-key"
|
202 |
+
|
203 |
+
## SET MAX TOKENS - via completion()
|
204 |
+
response_1 = litellm.completion(
|
205 |
+
model="gpt-3.5-turbo",
|
206 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
207 |
+
max_tokens=10
|
208 |
+
)
|
209 |
+
|
210 |
+
response_1_text = response_1.choices[0].message.content
|
211 |
+
|
212 |
+
## SET MAX TOKENS - via config
|
213 |
+
litellm.OpenAIConfig(max_tokens=200)
|
214 |
+
|
215 |
+
response_2 = litellm.completion(
|
216 |
+
model="gpt-3.5-turbo",
|
217 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
218 |
+
)
|
219 |
+
|
220 |
+
response_2_text = response_2.choices[0].message.content
|
221 |
+
|
222 |
+
## TEST OUTPUT
|
223 |
+
assert len(response_2_text) > len(response_1_text)
|
224 |
+
```
|
225 |
+
|
226 |
+
</TabItem>
|
227 |
+
<TabItem value="openai-text" label="OpenAI Text Completion">
|
228 |
+
|
229 |
+
```python
|
230 |
+
import litellm, os
|
231 |
+
|
232 |
+
# set env variables
|
233 |
+
os.environ["OPENAI_API_KEY"] = "your-openai-key"
|
234 |
+
|
235 |
+
|
236 |
+
## SET MAX TOKENS - via completion()
|
237 |
+
response_1 = litellm.completion(
|
238 |
+
model="text-davinci-003",
|
239 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
240 |
+
max_tokens=10
|
241 |
+
)
|
242 |
+
|
243 |
+
response_1_text = response_1.choices[0].message.content
|
244 |
+
|
245 |
+
## SET MAX TOKENS - via config
|
246 |
+
litellm.OpenAITextCompletionConfig(max_tokens=200)
|
247 |
+
response_2 = litellm.completion(
|
248 |
+
model="text-davinci-003",
|
249 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
250 |
+
)
|
251 |
+
|
252 |
+
response_2_text = response_2.choices[0].message.content
|
253 |
+
|
254 |
+
## TEST OUTPUT
|
255 |
+
assert len(response_2_text) > len(response_1_text)
|
256 |
+
```
|
257 |
+
|
258 |
+
</TabItem>
|
259 |
+
<TabItem value="azure-openai" label="Azure OpenAI">
|
260 |
+
|
261 |
+
```python
|
262 |
+
import litellm, os
|
263 |
+
|
264 |
+
# set env variables
|
265 |
+
os.environ["AZURE_API_BASE"] = "your-azure-api-base"
|
266 |
+
os.environ["AZURE_API_TYPE"] = "azure" # [OPTIONAL]
|
267 |
+
os.environ["AZURE_API_VERSION"] = "2023-07-01-preview" # [OPTIONAL]
|
268 |
+
|
269 |
+
## SET MAX TOKENS - via completion()
|
270 |
+
response_1 = litellm.completion(
|
271 |
+
model="azure/chatgpt-v-2",
|
272 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
273 |
+
max_tokens=10
|
274 |
+
)
|
275 |
+
|
276 |
+
response_1_text = response_1.choices[0].message.content
|
277 |
+
|
278 |
+
## SET MAX TOKENS - via config
|
279 |
+
litellm.AzureOpenAIConfig(max_tokens=200)
|
280 |
+
response_2 = litellm.completion(
|
281 |
+
model="azure/chatgpt-v-2",
|
282 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
283 |
+
)
|
284 |
+
|
285 |
+
response_2_text = response_2.choices[0].message.content
|
286 |
+
|
287 |
+
## TEST OUTPUT
|
288 |
+
assert len(response_2_text) > len(response_1_text)
|
289 |
+
```
|
290 |
+
|
291 |
+
</TabItem>
|
292 |
+
<TabItem value="anthropic" label="Anthropic">
|
293 |
+
|
294 |
+
```python
|
295 |
+
import litellm, os
|
296 |
+
|
297 |
+
# set env variables
|
298 |
+
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
|
299 |
+
|
300 |
+
## SET MAX TOKENS - via completion()
|
301 |
+
response_1 = litellm.completion(
|
302 |
+
model="claude-instant-1",
|
303 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
304 |
+
max_tokens=10
|
305 |
+
)
|
306 |
+
|
307 |
+
response_1_text = response_1.choices[0].message.content
|
308 |
+
|
309 |
+
## SET MAX TOKENS - via config
|
310 |
+
litellm.AnthropicConfig(max_tokens_to_sample=200)
|
311 |
+
response_2 = litellm.completion(
|
312 |
+
model="claude-instant-1",
|
313 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
314 |
+
)
|
315 |
+
|
316 |
+
response_2_text = response_2.choices[0].message.content
|
317 |
+
|
318 |
+
## TEST OUTPUT
|
319 |
+
assert len(response_2_text) > len(response_1_text)
|
320 |
+
```
|
321 |
+
|
322 |
+
</TabItem>
|
323 |
+
|
324 |
+
<TabItem value="huggingface" label="Huggingface">
|
325 |
+
|
326 |
+
```python
|
327 |
+
import litellm, os
|
328 |
+
|
329 |
+
# set env variables
|
330 |
+
os.environ["HUGGINGFACE_API_KEY"] = "your-huggingface-key" #[OPTIONAL]
|
331 |
+
|
332 |
+
## SET MAX TOKENS - via completion()
|
333 |
+
response_1 = litellm.completion(
|
334 |
+
model="huggingface/mistralai/Mistral-7B-Instruct-v0.1",
|
335 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
336 |
+
api_base="https://your-huggingface-api-endpoint",
|
337 |
+
max_tokens=10
|
338 |
+
)
|
339 |
+
|
340 |
+
response_1_text = response_1.choices[0].message.content
|
341 |
+
|
342 |
+
## SET MAX TOKENS - via config
|
343 |
+
litellm.HuggingfaceConfig(max_new_tokens=200)
|
344 |
+
response_2 = litellm.completion(
|
345 |
+
model="huggingface/mistralai/Mistral-7B-Instruct-v0.1",
|
346 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
347 |
+
api_base="https://your-huggingface-api-endpoint"
|
348 |
+
)
|
349 |
+
|
350 |
+
response_2_text = response_2.choices[0].message.content
|
351 |
+
|
352 |
+
## TEST OUTPUT
|
353 |
+
assert len(response_2_text) > len(response_1_text)
|
354 |
+
```
|
355 |
+
|
356 |
+
</TabItem>
|
357 |
+
|
358 |
+
<TabItem value="together_ai" label="TogetherAI">
|
359 |
+
|
360 |
+
|
361 |
+
```python
|
362 |
+
import litellm, os
|
363 |
+
|
364 |
+
# set env variables
|
365 |
+
os.environ["TOGETHERAI_API_KEY"] = "your-togetherai-key"
|
366 |
+
|
367 |
+
## SET MAX TOKENS - via completion()
|
368 |
+
response_1 = litellm.completion(
|
369 |
+
model="together_ai/togethercomputer/llama-2-70b-chat",
|
370 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
371 |
+
max_tokens=10
|
372 |
+
)
|
373 |
+
|
374 |
+
response_1_text = response_1.choices[0].message.content
|
375 |
+
|
376 |
+
## SET MAX TOKENS - via config
|
377 |
+
litellm.TogetherAIConfig(max_tokens_to_sample=200)
|
378 |
+
response_2 = litellm.completion(
|
379 |
+
model="together_ai/togethercomputer/llama-2-70b-chat",
|
380 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
381 |
+
)
|
382 |
+
|
383 |
+
response_2_text = response_2.choices[0].message.content
|
384 |
+
|
385 |
+
## TEST OUTPUT
|
386 |
+
assert len(response_2_text) > len(response_1_text)
|
387 |
+
```
|
388 |
+
|
389 |
+
</TabItem>
|
390 |
+
|
391 |
+
<TabItem value="ollama" label="Ollama">
|
392 |
+
|
393 |
+
```python
|
394 |
+
import litellm, os
|
395 |
+
|
396 |
+
## SET MAX TOKENS - via completion()
|
397 |
+
response_1 = litellm.completion(
|
398 |
+
model="ollama/llama2",
|
399 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
400 |
+
max_tokens=10
|
401 |
+
)
|
402 |
+
|
403 |
+
response_1_text = response_1.choices[0].message.content
|
404 |
+
|
405 |
+
## SET MAX TOKENS - via config
|
406 |
+
litellm.OllamaConfig(num_predict=200)
|
407 |
+
response_2 = litellm.completion(
|
408 |
+
model="ollama/llama2",
|
409 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
410 |
+
)
|
411 |
+
|
412 |
+
response_2_text = response_2.choices[0].message.content
|
413 |
+
|
414 |
+
## TEST OUTPUT
|
415 |
+
assert len(response_2_text) > len(response_1_text)
|
416 |
+
```
|
417 |
+
|
418 |
+
</TabItem>
|
419 |
+
|
420 |
+
<TabItem value="replicate" label="Replicate">
|
421 |
+
|
422 |
+
```python
|
423 |
+
import litellm, os
|
424 |
+
|
425 |
+
# set env variables
|
426 |
+
os.environ["REPLICATE_API_KEY"] = "your-replicate-key"
|
427 |
+
|
428 |
+
## SET MAX TOKENS - via completion()
|
429 |
+
response_1 = litellm.completion(
|
430 |
+
model="replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
|
431 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
432 |
+
max_tokens=10
|
433 |
+
)
|
434 |
+
|
435 |
+
response_1_text = response_1.choices[0].message.content
|
436 |
+
|
437 |
+
## SET MAX TOKENS - via config
|
438 |
+
litellm.ReplicateConfig(max_new_tokens=200)
|
439 |
+
response_2 = litellm.completion(
|
440 |
+
model="replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
|
441 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
442 |
+
)
|
443 |
+
|
444 |
+
response_2_text = response_2.choices[0].message.content
|
445 |
+
|
446 |
+
## TEST OUTPUT
|
447 |
+
assert len(response_2_text) > len(response_1_text)
|
448 |
+
```
|
449 |
+
|
450 |
+
</TabItem>
|
451 |
+
|
452 |
+
<TabItem value="petals" label="Petals">
|
453 |
+
|
454 |
+
|
455 |
+
```python
|
456 |
+
import litellm
|
457 |
+
|
458 |
+
## SET MAX TOKENS - via completion()
|
459 |
+
response_1 = litellm.completion(
|
460 |
+
model="petals/petals-team/StableBeluga2",
|
461 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
462 |
+
api_base="https://chat.petals.dev/api/v1/generate",
|
463 |
+
max_tokens=10
|
464 |
+
)
|
465 |
+
|
466 |
+
response_1_text = response_1.choices[0].message.content
|
467 |
+
|
468 |
+
## SET MAX TOKENS - via config
|
469 |
+
litellm.PetalsConfig(max_new_tokens=200)
|
470 |
+
response_2 = litellm.completion(
|
471 |
+
model="petals/petals-team/StableBeluga2",
|
472 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
473 |
+
api_base="https://chat.petals.dev/api/v1/generate",
|
474 |
+
)
|
475 |
+
|
476 |
+
response_2_text = response_2.choices[0].message.content
|
477 |
+
|
478 |
+
## TEST OUTPUT
|
479 |
+
assert len(response_2_text) > len(response_1_text)
|
480 |
+
```
|
481 |
+
|
482 |
+
</TabItem>
|
483 |
+
|
484 |
+
<TabItem value="palm" label="Palm">
|
485 |
+
|
486 |
+
```python
|
487 |
+
import litellm, os
|
488 |
+
|
489 |
+
# set env variables
|
490 |
+
os.environ["PALM_API_KEY"] = "your-palm-key"
|
491 |
+
|
492 |
+
## SET MAX TOKENS - via completion()
|
493 |
+
response_1 = litellm.completion(
|
494 |
+
model="palm/chat-bison",
|
495 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
496 |
+
max_tokens=10
|
497 |
+
)
|
498 |
+
|
499 |
+
response_1_text = response_1.choices[0].message.content
|
500 |
+
|
501 |
+
## SET MAX TOKENS - via config
|
502 |
+
litellm.PalmConfig(maxOutputTokens=200)
|
503 |
+
response_2 = litellm.completion(
|
504 |
+
model="palm/chat-bison",
|
505 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
506 |
+
)
|
507 |
+
|
508 |
+
response_2_text = response_2.choices[0].message.content
|
509 |
+
|
510 |
+
## TEST OUTPUT
|
511 |
+
assert len(response_2_text) > len(response_1_text)
|
512 |
+
```
|
513 |
+
</TabItem>
|
514 |
+
|
515 |
+
<TabItem value="ai21" label="AI21">
|
516 |
+
|
517 |
+
```python
|
518 |
+
import litellm, os
|
519 |
+
|
520 |
+
# set env variables
|
521 |
+
os.environ["AI21_API_KEY"] = "your-ai21-key"
|
522 |
+
|
523 |
+
## SET MAX TOKENS - via completion()
|
524 |
+
response_1 = litellm.completion(
|
525 |
+
model="j2-mid",
|
526 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
527 |
+
max_tokens=10
|
528 |
+
)
|
529 |
+
|
530 |
+
response_1_text = response_1.choices[0].message.content
|
531 |
+
|
532 |
+
## SET MAX TOKENS - via config
|
533 |
+
litellm.AI21Config(maxOutputTokens=200)
|
534 |
+
response_2 = litellm.completion(
|
535 |
+
model="j2-mid",
|
536 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
537 |
+
)
|
538 |
+
|
539 |
+
response_2_text = response_2.choices[0].message.content
|
540 |
+
|
541 |
+
## TEST OUTPUT
|
542 |
+
assert len(response_2_text) > len(response_1_text)
|
543 |
+
```
|
544 |
+
|
545 |
+
</TabItem>
|
546 |
+
|
547 |
+
<TabItem value="cohere" label="Cohere">
|
548 |
+
|
549 |
+
```python
|
550 |
+
import litellm, os
|
551 |
+
|
552 |
+
# set env variables
|
553 |
+
os.environ["COHERE_API_KEY"] = "your-cohere-key"
|
554 |
+
|
555 |
+
## SET MAX TOKENS - via completion()
|
556 |
+
response_1 = litellm.completion(
|
557 |
+
model="command-nightly",
|
558 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
559 |
+
max_tokens=10
|
560 |
+
)
|
561 |
+
|
562 |
+
response_1_text = response_1.choices[0].message.content
|
563 |
+
|
564 |
+
## SET MAX TOKENS - via config
|
565 |
+
litellm.CohereConfig(max_tokens=200)
|
566 |
+
response_2 = litellm.completion(
|
567 |
+
model="command-nightly",
|
568 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
569 |
+
)
|
570 |
+
|
571 |
+
response_2_text = response_2.choices[0].message.content
|
572 |
+
|
573 |
+
## TEST OUTPUT
|
574 |
+
assert len(response_2_text) > len(response_1_text)
|
575 |
+
```
|
576 |
+
|
577 |
+
</TabItem>
|
578 |
+
|
579 |
+
</Tabs>
|
580 |
+
|
581 |
+
|
582 |
+
[**Check out the tutorial!**](../tutorials/provider_specific_params.md)
|
docs/my-website/docs/completion/message_trimming.md
ADDED
@@ -0,0 +1,36 @@
1 |
+
# Trimming Input Messages
|
2 |
+
**Use litellm.trim_messages() to ensure messages do not exceed a model's token limit or a specified `max_tokens`**
|
3 |
+
|
4 |
+
## Usage
|
5 |
+
```python
|
6 |
+
from litellm import completion
|
7 |
+
from litellm.utils import trim_messages
|
8 |
+
|
9 |
+
response = completion(
|
10 |
+
model=model,
|
11 |
+
messages=trim_messages(messages, model) # trim_messages ensures tokens(messages) < max_tokens(model)
|
12 |
+
)
|
13 |
+
```
|
14 |
+
|
15 |
+
## Usage - set max_tokens
|
16 |
+
```python
|
17 |
+
from litellm import completion
|
18 |
+
from litellm.utils import trim_messages
|
19 |
+
|
20 |
+
response = completion(
|
21 |
+
model=model,
|
22 |
+
messages=trim_messages(messages, model, max_tokens=10), # trim_messages ensures tokens(messages) < max_tokens
|
23 |
+
)
|
24 |
+
```
|
25 |
+
|
26 |
+
## Parameters
|
27 |
+
|
28 |
+
The function uses the following parameters:
|
29 |
+
|
30 |
+
- `messages`:[Required] This should be a list of input messages
|
31 |
+
|
32 |
+
- `model`:[Optional] This is the LiteLLM model being used. This parameter is optional, as you can alternatively specify the `max_tokens` parameter.
|
33 |
+
|
34 |
+
- `max_tokens`:[Optional] An int, a manually set upper limit on the number of tokens in `messages`
|
35 |
+
|
36 |
+
- `trim_ratio`:[Optional] The target ratio of tokens to use after trimming. Its default value is 0.75, which means messages will be trimmed to use about 75% of the model's token limit (or the specified `max_tokens`). See the sketch below.
|
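For instance, here's a minimal sketch using `trim_ratio` (assuming an OpenAI key is set; the over-long conversation is a placeholder):

```python
from litellm import completion
from litellm.utils import trim_messages

# placeholder: an over-long conversation
messages = [{"role": "user", "content": "Summarize this: " + "lorem ipsum " * 5000}]

# trim to ~50% of gpt-3.5-turbo's context window before sending
trimmed_messages = trim_messages(messages, model="gpt-3.5-turbo", trim_ratio=0.5)
response = completion(model="gpt-3.5-turbo", messages=trimmed_messages)
print(response.choices[0].message.content)
```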
docs/my-website/docs/completion/mock_requests.md
ADDED
@@ -0,0 +1,72 @@
1 |
+
# Mock Completion() Responses - Save Testing Costs 💰
|
2 |
+
|
3 |
+
For testing purposes, you can use `completion()` with `mock_response` to mock calling the completion endpoint.
|
4 |
+
|
5 |
+
This will return a response object with a default response (works for streaming as well), without calling the LLM APIs.
|
6 |
+
|
7 |
+
## quick start
|
8 |
+
```python
|
9 |
+
from litellm import completion
|
10 |
+
|
11 |
+
model = "gpt-3.5-turbo"
|
12 |
+
messages = [{"role":"user", "content":"This is a test request"}]
|
13 |
+
|
14 |
+
completion(model=model, messages=messages, mock_response="It's simple to use and easy to get started")
|
15 |
+
```
|
16 |
+
|
17 |
+
## streaming
|
18 |
+
|
19 |
+
```python
|
20 |
+
from litellm import completion
|
21 |
+
model = "gpt-3.5-turbo"
|
22 |
+
messages = [{"role": "user", "content": "Hey, I'm a mock request"}]
|
23 |
+
response = completion(model=model, messages=messages, stream=True, mock_response="It's simple to use and easy to get started")
complete_response = ""
|
24 |
+
for chunk in response:
|
25 |
+
print(chunk) # {'choices': [{'delta': {'role': 'assistant', 'content': 'Thi'}, 'finish_reason': None}]}
|
26 |
+
complete_response += chunk["choices"][0]["delta"]["content"]
|
27 |
+
```
|
28 |
+
|
29 |
+
## (Non-streaming) Mock Response Object
|
30 |
+
|
31 |
+
```json
|
32 |
+
{
|
33 |
+
"choices": [
|
34 |
+
{
|
35 |
+
"finish_reason": "stop",
|
36 |
+
"index": 0,
|
37 |
+
"message": {
|
38 |
+
"content": "This is a mock request",
|
39 |
+
"role": "assistant",
|
40 |
+
"logprobs": null
|
41 |
+
}
|
42 |
+
}
|
43 |
+
],
|
44 |
+
"created": 1694459929.4496052,
|
45 |
+
"model": "MockResponse",
|
46 |
+
"usage": {
|
47 |
+
"prompt_tokens": null,
|
48 |
+
"completion_tokens": null,
|
49 |
+
"total_tokens": null
|
50 |
+
}
|
51 |
+
}
|
52 |
+
```
|
53 |
+
|
54 |
+
## Building a pytest function using `completion` with `mock_response`
|
55 |
+
|
56 |
+
```python
|
57 |
+
from litellm import completion
|
58 |
+
import pytest
|
59 |
+
|
60 |
+
def test_completion_openai():
|
61 |
+
try:
|
62 |
+
response = completion(
|
63 |
+
model="gpt-3.5-turbo",
|
64 |
+
messages=[{"role":"user", "content":"Why is LiteLLM amazing?"}],
|
65 |
+
mock_response="LiteLLM is awesome"
|
66 |
+
)
|
67 |
+
# Add any assertions here to check the response
|
68 |
+
print(response)
|
69 |
+
assert(response['choices'][0]['message']['content'] == "LiteLLM is awesome")
|
70 |
+
except Exception as e:
|
71 |
+
pytest.fail(f"Error occurred: {e}")
|
72 |
+
```
|
docs/my-website/docs/completion/model_alias.md
ADDED
@@ -0,0 +1,53 @@
1 |
+
# Model Alias
|
2 |
+
|
3 |
+
The model name you show an end-user might be different from the one you pass to LiteLLM - e.g. Displaying `GPT-3.5` while calling `gpt-3.5-turbo-16k` on the backend.
|
4 |
+
|
5 |
+
LiteLLM simplifies this by letting you pass in a model alias mapping.
|
6 |
+
|
7 |
+
## Expected Format
|
8 |
+
|
9 |
+
```python
|
10 |
+
litellm.model_alias_map = {
|
11 |
+
# a dictionary containing a mapping of the alias string to the actual litellm model name string
|
12 |
+
"model_alias": "litellm_model_name"
|
13 |
+
}
|
14 |
+
```
|
15 |
+
|
16 |
+
## Usage
|
17 |
+
|
18 |
+
### Relevant Code
|
19 |
+
```python
|
20 |
+
model_alias_map = {
|
21 |
+
"GPT-3.5": "gpt-3.5-turbo-16k",
|
22 |
+
"llama2": "replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf"
|
23 |
+
}
|
24 |
+
|
25 |
+
litellm.model_alias_map = model_alias_map
|
26 |
+
```
|
27 |
+
|
28 |
+
### Complete Code
|
29 |
+
```python
|
30 |
+
import os, litellm
|
31 |
+
from litellm import completion
|
32 |
+
|
33 |
+
|
34 |
+
## set ENV variables
|
35 |
+
os.environ["OPENAI_API_KEY"] = "openai key"
|
36 |
+
os.environ["REPLICATE_API_KEY"] = "replicate key"
|
37 |
+
|
38 |
+
## set model alias map
|
39 |
+
model_alias_map = {
|
40 |
+
"GPT-3.5": "gpt-3.5-turbo-16k",
|
41 |
+
"llama2": "replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf"
|
42 |
+
}
|
43 |
+
|
44 |
+
litellm.model_alias_map = model_alias_map
|
45 |
+
|
46 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}]
|
47 |
+
|
48 |
+
# call "gpt-3.5-turbo-16k"
|
49 |
+
response = completion(model="GPT-3.5", messages=messages)
|
50 |
+
|
51 |
+
# call replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca1...
|
52 |
+
response = completion("llama2", messages)
|
53 |
+
```
|
docs/my-website/docs/completion/multiple_deployments.md
ADDED
@@ -0,0 +1,53 @@
1 |
+
# Multiple Deployments
|
2 |
+
|
3 |
+
If you have multiple deployments of the same model, you can pass the list of deployments, and LiteLLM will return the first result.
|
4 |
+
|
5 |
+
## Quick Start
|
6 |
+
|
7 |
+
Multiple providers offer Mistral-7B-Instruct.
|
8 |
+
|
9 |
+
Here's how you can use litellm to return the first result:
|
10 |
+
|
11 |
+
```python
|
12 |
+
from litellm import completion
|
13 |
+
|
14 |
+
messages=[{"role": "user", "content": "Hey, how's it going?"}]
|
15 |
+
|
16 |
+
## All your mistral deployments ##
|
17 |
+
model_list = [{
|
18 |
+
"model_name": "mistral-7b-instruct",
|
19 |
+
"litellm_params": { # params for litellm completion/embedding call
|
20 |
+
"model": "replicate/mistralai/mistral-7b-instruct-v0.1:83b6a56e7c828e667f21fd596c338fd4f0039b46bcfa18d973e8e70e455fda70",
|
21 |
+
"api_key": "replicate_api_key",
|
22 |
+
}
|
23 |
+
}, {
|
24 |
+
"model_name": "mistral-7b-instruct",
|
25 |
+
"litellm_params": { # params for litellm completion/embedding call
|
26 |
+
"model": "together_ai/mistralai/Mistral-7B-Instruct-v0.1",
|
27 |
+
"api_key": "togetherai_api_key",
|
28 |
+
}
|
29 |
+
}, {
|
30 |
+
"model_name": "mistral-7b-instruct",
|
31 |
+
"litellm_params": { # params for litellm completion/embedding call
|
32 |
+
"model": "together_ai/mistralai/Mistral-7B-Instruct-v0.1",
|
33 |
+
"api_key": "togetherai_api_key",
|
34 |
+
}
|
35 |
+
}, {
|
36 |
+
"model_name": "mistral-7b-instruct",
|
37 |
+
"litellm_params": { # params for litellm completion/embedding call
|
38 |
+
"model": "perplexity/mistral-7b-instruct",
|
39 |
+
"api_key": "perplexity_api_key"
|
40 |
+
}
|
41 |
+
}, {
|
42 |
+
"model_name": "mistral-7b-instruct",
|
43 |
+
"litellm_params": {
|
44 |
+
"model": "deepinfra/mistralai/Mistral-7B-Instruct-v0.1",
|
45 |
+
"api_key": "deepinfra_api_key"
|
46 |
+
}
|
47 |
+
}]
|
48 |
+
|
49 |
+
## LiteLLM completion call ## returns first response
|
50 |
+
response = completion(model="mistral-7b-instruct", messages=messages, model_list=model_list)
|
51 |
+
|
52 |
+
print(response)
|
53 |
+
```
|
docs/my-website/docs/completion/output.md
ADDED
@@ -0,0 +1,68 @@
1 |
+
# Output
|
2 |
+
|
3 |
+
## Format
|
4 |
+
Here's the exact json output and type you can expect from all litellm `completion` calls for all models
|
5 |
+
|
6 |
+
```python
|
7 |
+
{
|
8 |
+
'choices': [
|
9 |
+
{
|
10 |
+
'finish_reason': str, # String: 'stop'
|
11 |
+
'index': int, # Integer: 0
|
12 |
+
'message': { # Dictionary [str, str]
|
13 |
+
'role': str, # String: 'assistant'
|
14 |
+
'content': str # String: "default message"
|
15 |
+
}
|
16 |
+
}
|
17 |
+
],
|
18 |
+
'created': str, # String: None
|
19 |
+
'model': str, # String: None
|
20 |
+
'usage': { # Dictionary [str, int]
|
21 |
+
'prompt_tokens': int, # Integer
|
22 |
+
'completion_tokens': int, # Integer
|
23 |
+
'total_tokens': int # Integer
|
24 |
+
}
|
25 |
+
}
|
26 |
+
|
27 |
+
```
|
28 |
+
|
29 |
+
You can access the response as a dictionary or as a class object, just as OpenAI allows you to
|
30 |
+
```python
|
31 |
+
print(response.choices[0].message.content)
|
32 |
+
print(response['choices'][0]['message']['content'])
|
33 |
+
```
|
34 |
+
|
35 |
+
Here's what an example response looks like
|
36 |
+
```python
|
37 |
+
{
|
38 |
+
'choices': [
|
39 |
+
{
|
40 |
+
'finish_reason': 'stop',
|
41 |
+
'index': 0,
|
42 |
+
'message': {
|
43 |
+
'role': 'assistant',
|
44 |
+
'content': " I'm doing well, thank you for asking. I am Claude, an AI assistant created by Anthropic."
|
45 |
+
}
|
46 |
+
}
|
47 |
+
],
|
48 |
+
'created': 1691429984.3852863,
|
49 |
+
'model': 'claude-instant-1',
|
50 |
+
'usage': {'prompt_tokens': 18, 'completion_tokens': 23, 'total_tokens': 41}
|
51 |
+
}
|
52 |
+
```
|
53 |
+
|
54 |
+
## Additional Attributes
|
55 |
+
|
56 |
+
You can also access information like latency.
|
57 |
+
|
58 |
+
```python
|
59 |
+
from litellm import completion
|
60 |
+
import os
|
61 |
+
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
|
62 |
+
|
63 |
+
messages=[{"role": "user", "content": "Hey!"}]
|
64 |
+
|
65 |
+
response = completion(model="claude-2", messages=messages)
|
66 |
+
|
67 |
+
print(response.response_ms) # 616.25
|
68 |
+
```
|
docs/my-website/docs/completion/prompt_formatting.md
ADDED
@@ -0,0 +1,86 @@
1 |
+
# Prompt Formatting
|
2 |
+
|
3 |
+
LiteLLM automatically translates the OpenAI ChatCompletions prompt format to other models. You can also control this by setting a custom prompt template for a model.
|
4 |
+
|
5 |
+
## Huggingface Models
|
6 |
+
|
7 |
+
LiteLLM supports [Huggingface Chat Templates](https://huggingface.co/docs/transformers/main/chat_templating), and will automatically check if your huggingface model has a registered chat template (e.g. [Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/main/tokenizer_config.json#L32)).
|
8 |
+
|
9 |
+
For popular models (e.g. meta-llama/llama2), we have their templates saved as part of the package.
|
10 |
+
|
11 |
+
**Stored Templates**
|
12 |
+
|
13 |
+
| Model Name | Works for Models | Completion Call |
|
14 |
+
| -------- | -------- | -------- |
|
15 |
+
| mistralai/Mistral-7B-Instruct-v0.1 | mistralai/Mistral-7B-Instruct-v0.1| `completion(model='huggingface/mistralai/Mistral-7B-Instruct-v0.1', messages=messages, api_base="your_api_endpoint")` |
|
16 |
+
| meta-llama/Llama-2-7b-chat | All meta-llama llama2 chat models| `completion(model='huggingface/meta-llama/Llama-2-7b', messages=messages, api_base="your_api_endpoint")` |
|
17 |
+
| tiiuae/falcon-7b-instruct | All falcon instruct models | `completion(model='huggingface/tiiuae/falcon-7b-instruct', messages=messages, api_base="your_api_endpoint")` |
|
18 |
+
| mosaicml/mpt-7b-chat | All mpt chat models | `completion(model='huggingface/mosaicml/mpt-7b-chat', messages=messages, api_base="your_api_endpoint")` |
|
19 |
+
| codellama/CodeLlama-34b-Instruct-hf | All codellama instruct models | `completion(model='huggingface/codellama/CodeLlama-34b-Instruct-hf', messages=messages, api_base="your_api_endpoint")` |
|
20 |
+
| WizardLM/WizardCoder-Python-34B-V1.0 | All wizardcoder models | `completion(model='huggingface/WizardLM/WizardCoder-Python-34B-V1.0', messages=messages, api_base="your_api_endpoint")` |
|
21 |
+
| Phind/Phind-CodeLlama-34B-v2 | All phind-codellama models | `completion(model='huggingface/Phind/Phind-CodeLlama-34B-v2', messages=messages, api_base="your_api_endpoint")` |
|
22 |
+
|
23 |
+
[**Jump to code**](https://github.com/BerriAI/litellm/blob/main/litellm/llms/prompt_templates/factory.py)
|
24 |
+
|
25 |
+
## Format Prompt Yourself
|
26 |
+
|
27 |
+
You can also format the prompt yourself. Here's how:
|
28 |
+
|
29 |
+
```python
|
30 |
+
import litellm
from litellm import completion

# example messages
messages = [{"role": "user", "content": "Hello, how are you?"}]
|
31 |
+
# Create your own custom prompt template
|
32 |
+
litellm.register_prompt_template(
|
33 |
+
model="togethercomputer/LLaMA-2-7B-32K",
|
34 |
+
initial_prompt_value="You are a good assistant", # [OPTIONAL]
|
35 |
+
roles={
|
36 |
+
"system": {
|
37 |
+
"pre_message": "[INST] <<SYS>>\n", # [OPTIONAL]
|
38 |
+
"post_message": "\n<</SYS>>\n [/INST]\n" # [OPTIONAL]
|
39 |
+
},
|
40 |
+
"user": {
|
41 |
+
"pre_message": "[INST] ", # [OPTIONAL]
|
42 |
+
"post_message": " [/INST]" # [OPTIONAL]
|
43 |
+
},
|
44 |
+
"assistant": {
|
45 |
+
"pre_message": "\n", # [OPTIONAL]
|
46 |
+
"post_message": "\n" # [OPTIONAL]
|
47 |
+
}
|
48 |
+
},
|
49 |
+
final_prompt_value="Now answer as best you can:" # [OPTIONAL]
|
50 |
+
)
|
51 |
+
|
52 |
+
def test_huggingface_custom_model():
|
53 |
+
model = "huggingface/togethercomputer/LLaMA-2-7B-32K"
|
54 |
+
response = completion(model=model, messages=messages, api_base="https://my-huggingface-endpoint")
|
55 |
+
print(response['choices'][0]['message']['content'])
|
56 |
+
return response
|
57 |
+
|
58 |
+
test_huggingface_custom_model()
|
59 |
+
```
|
60 |
+
|
61 |
+
This is currently supported for Huggingface, TogetherAI, Ollama, and Petals.
|
62 |
+
|
63 |
+
Other providers either have fixed prompt templates (e.g. Anthropic), or format it themselves (e.g. Replicate). If there's a provider we're missing coverage for, let us know!
|
64 |
+
|
65 |
+
## All Providers
|
66 |
+
|
67 |
+
Here's the code for how we format all providers. Let us know how we can improve this further
|
68 |
+
|
69 |
+
|
70 |
+
| Provider | Model Name | Code |
|
71 |
+
| -------- | -------- | -------- |
|
72 |
+
| Anthropic | `claude-instant-1`, `claude-instant-1.2`, `claude-2` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/anthropic.py#L84)
|
73 |
+
| OpenAI Text Completion | `text-davinci-003`, `text-curie-001`, `text-babbage-001`, `text-ada-001`, `babbage-002`, `davinci-002`, | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/main.py#L442)
|
74 |
+
| Replicate | all model names starting with `replicate/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/replicate.py#L180)
|
75 |
+
| Cohere | `command-nightly`, `command`, `command-light`, `command-medium-beta`, `command-xlarge-beta` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/cohere.py#L115)
|
76 |
+
| Huggingface | all model names starting with `huggingface/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/huggingface_restapi.py#L186)
|
77 |
+
| OpenRouter | all model names starting with `openrouter/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/main.py#L611)
|
78 |
+
| AI21 | `j2-mid`, `j2-light`, `j2-ultra` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/ai21.py#L107)
|
79 |
+
| VertexAI | `text-bison`, `text-bison@001`, `chat-bison`, `chat-bison@001`, `chat-bison-32k`, `code-bison`, `code-bison@001`, `code-gecko@001`, `code-gecko@latest`, `codechat-bison`, `codechat-bison@001`, `codechat-bison-32k` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/vertex_ai.py#L89)
|
80 |
+
| Bedrock | all model names starting with `bedrock/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/bedrock.py#L183)
|
81 |
+
| Sagemaker | `sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/sagemaker.py#L89)
|
82 |
+
| TogetherAI | all model names starting with `together_ai/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/together_ai.py#L101)
|
83 |
+
| AlephAlpha | all model names starting with `aleph_alpha/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/aleph_alpha.py#L184)
|
84 |
+
| Palm | all model names starting with `palm/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/palm.py#L95)
|
85 |
+
| NLP Cloud | all model names starting with `palm/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/nlp_cloud.py#L120)
|
86 |
+
| Petals | all model names starting with `petals/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/petals.py#L87)
|
docs/my-website/docs/completion/reliable_completions.md
ADDED
@@ -0,0 +1,196 @@
1 |
+
# Reliability - Retries, Fallbacks
|
2 |
+
|
3 |
+
LiteLLM helps prevent failed requests in 2 ways:
|
4 |
+
- Retries
|
5 |
+
- Fallbacks: Context Window + General
|
6 |
+
|
7 |
+
## Helper utils
|
8 |
+
LiteLLM supports the following functions for reliability:
|
9 |
+
* `litellm.longer_context_model_fallback_dict`: a dictionary mapping models to their larger-context equivalents (see the sketch after this list)
|
10 |
+
* `num_retries`: use tenacity retries
|
11 |
+
* `completion()` with fallbacks: switch between models/keys/api bases in case of errors.
|
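As a quick illustration, you can inspect the built-in mapping directly (a sketch; the exact entries may vary across litellm versions):

```python
import litellm

# maps a model to a larger-context equivalent, e.g. "gpt-3.5-turbo" -> "gpt-3.5-turbo-16k"
print(litellm.longer_context_model_fallback_dict.get("gpt-3.5-turbo"))
```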
12 |
+
|
13 |
+
## Retry failed requests
|
14 |
+
|
15 |
+
Call it in `completion` like this: `completion(..., num_retries=2)`.
|
16 |
+
|
17 |
+
|
18 |
+
Here's a quick look at how you can use it:
|
19 |
+
|
20 |
+
```python
|
21 |
+
from litellm import completion
|
22 |
+
|
23 |
+
user_message = "Hello, whats the weather in San Francisco??"
|
24 |
+
messages = [{"content": user_message, "role": "user"}]
|
25 |
+
|
26 |
+
# normal call
|
27 |
+
response = completion(
|
28 |
+
model="gpt-3.5-turbo",
|
29 |
+
messages=messages,
|
30 |
+
num_retries=2
|
31 |
+
)
|
32 |
+
```
|
33 |
+
|
34 |
+
## Fallbacks
|
35 |
+
|
36 |
+
### Context Window Fallbacks
|
37 |
+
```python
|
38 |
+
from litellm import completion
|
39 |
+
|
40 |
+
fallback_dict = {"gpt-3.5-turbo": "gpt-3.5-turbo-16k"}
|
41 |
+
messages = [{"content": "how does a court case get to the Supreme Court?" * 500, "role": "user"}]
|
42 |
+
|
43 |
+
completion(model="gpt-3.5-turbo", messages=messages, context_window_fallback_dict=fallback_dict)
|
44 |
+
```
|
45 |
+
|
46 |
+
### Fallbacks - Switch Models/API Keys/API Bases
|
47 |
+
|
48 |
+
LLM APIs can be unstable. `completion()` with fallbacks ensures you'll always get a response from your calls
|
49 |
+
|
50 |
+
#### Usage
|
51 |
+
To use fallback models with `completion()`, specify a list of models in the `fallbacks` parameter.
|
52 |
+
|
53 |
+
The `fallbacks` list should include the primary model you want to use, followed by additional models that can be used as backups in case the primary model fails to provide a response.
|
54 |
+
|
55 |
+
#### switch models
|
56 |
+
```python
|
57 |
+
response = completion(model="bad-model", messages=messages,
|
58 |
+
fallbacks=["gpt-3.5-turbo", "command-nightly"])
|
59 |
+
```
|
60 |
+
|
61 |
+
#### switch api keys/bases (E.g. azure deployment)
|
62 |
+
Switch between different keys for the same azure deployment, or use another deployment as well.
|
63 |
+
|
64 |
+
```python
|
65 |
+
api_key="bad-key"
|
66 |
+
response = completion(model="azure/gpt-4", messages=messages, api_key=api_key,
|
67 |
+
fallbacks=[{"api_key": "good-key-1"}, {"api_key": "good-key-2", "api_base": "good-api-base-2"}])
|
68 |
+
```
|
69 |
+
|
70 |
+
[Check out this section for implementation details](#fallbacks-1)
|
71 |
+
|
72 |
+
## Implementation Details
|
73 |
+
|
74 |
+
### Fallbacks
|
75 |
+
#### Output from calls
|
76 |
+
```
|
77 |
+
Completion with 'bad-model': got exception Unable to map your input to a model. Check your input - {'model': 'bad-model'
|
78 |
+
|
79 |
+
|
80 |
+
|
81 |
+
completion call gpt-3.5-turbo
|
82 |
+
{
|
83 |
+
"id": "chatcmpl-7qTmVRuO3m3gIBg4aTmAumV1TmQhB",
|
84 |
+
"object": "chat.completion",
|
85 |
+
"created": 1692741891,
|
86 |
+
"model": "gpt-3.5-turbo-0613",
|
87 |
+
"choices": [
|
88 |
+
{
|
89 |
+
"index": 0,
|
90 |
+
"message": {
|
91 |
+
"role": "assistant",
|
92 |
+
"content": "I apologize, but as an AI, I do not have the capability to provide real-time weather updates. However, you can easily check the current weather in San Francisco by using a search engine or checking a weather website or app."
|
93 |
+
},
|
94 |
+
"finish_reason": "stop"
|
95 |
+
}
|
96 |
+
],
|
97 |
+
"usage": {
|
98 |
+
"prompt_tokens": 16,
|
99 |
+
"completion_tokens": 46,
|
100 |
+
"total_tokens": 62
|
101 |
+
}
|
102 |
+
}
|
103 |
+
|
104 |
+
```
|
105 |
+
|
106 |
+
#### How do fallbacks work
|
107 |
+
|
108 |
+
When you pass `fallbacks` to `completion`, it makes the first `completion` call using the primary model specified as `model` in `completion(model=model)`. If the primary model fails or encounters an error, it automatically tries the `fallbacks` models in the specified order. This ensures a response even if the primary model is unavailable.
|
109 |
+
|
110 |
+
|
111 |
+
#### Key components of Model Fallbacks implementation:
|
112 |
+
* Looping through `fallbacks`
|
113 |
+
* Cool-Downs for rate-limited models
|
114 |
+
|
115 |
+
#### Looping through `fallbacks`
|
116 |
+
Allow `45 seconds` for each request. In those 45s, this function tries calling the primary model set as `model`. If that model fails, it loops through the backup `fallbacks` models and attempts to get a response within the allocated `45s` window set here:
|
117 |
+
```python
|
118 |
+
while response == None and time.time() - start_time < 45:
|
119 |
+
for model in fallbacks:
|
120 |
+
```
|
121 |
+
|
122 |
+
#### Cool-Downs for rate-limited models
|
123 |
+
If a model API call leads to an error - allow it to cooldown for `60s`
|
124 |
+
```python
|
125 |
+
except Exception as e:
|
126 |
+
print(f"got exception {e} for model {model}")
|
127 |
+
rate_limited_models.add(model)
|
128 |
+
model_expiration_times[model] = (
|
129 |
+
time.time() + 60
|
130 |
+
) # cool down this selected model
|
131 |
+
pass
|
132 |
+
```
|
133 |
+
|
134 |
+
Before making an LLM API call, we check if the selected model is in `rate_limited_models`; if it is, we skip making the API call:
|
135 |
+
```python
|
136 |
+
if (
|
137 |
+
model in rate_limited_models
|
138 |
+
): # check if model is currently cooling down
|
139 |
+
if (
|
140 |
+
model_expiration_times.get(model)
|
141 |
+
and time.time() >= model_expiration_times[model]
|
142 |
+
):
|
143 |
+
rate_limited_models.remove(
|
144 |
+
model
|
145 |
+
) # check if it's been 60s of cool down and remove model
|
146 |
+
else:
|
147 |
+
continue # skip model
|
148 |
+
|
149 |
+
```
|
150 |
+
|
151 |
+
#### Full code of `completion()` with fallbacks
|
152 |
+
```python
|
153 |
+
|
154 |
+
response = None
|
155 |
+
rate_limited_models = set()
|
156 |
+
model_expiration_times = {}
|
157 |
+
start_time = time.time()
|
158 |
+
fallbacks = [kwargs["model"]] + kwargs["fallbacks"]
|
159 |
+
del kwargs["fallbacks"] # remove fallbacks so it's not recursive
|
160 |
+
|
161 |
+
while response == None and time.time() - start_time < 45:
|
162 |
+
for model in fallbacks:
|
163 |
+
# loop thru all models
|
164 |
+
try:
|
165 |
+
if (
|
166 |
+
model in rate_limited_models
|
167 |
+
): # check if model is currently cooling down
|
168 |
+
if (
|
169 |
+
model_expiration_times.get(model)
|
170 |
+
and time.time() >= model_expiration_times[model]
|
171 |
+
):
|
172 |
+
rate_limited_models.remove(
|
173 |
+
model
|
174 |
+
) # check if it's been 60s of cool down and remove model
|
175 |
+
else:
|
176 |
+
continue # skip model
|
177 |
+
|
178 |
+
# delete model from kwargs if it exists
|
179 |
+
if kwargs.get("model"):
|
180 |
+
del kwargs["model"]
|
181 |
+
|
182 |
+
print("making completion call", model)
|
183 |
+
response = litellm.completion(**kwargs, model=model)
|
184 |
+
|
185 |
+
if response != None:
|
186 |
+
return response
|
187 |
+
|
188 |
+
except Exception as e:
|
189 |
+
print(f"got exception {e} for model {model}")
|
190 |
+
rate_limited_models.add(model)
|
191 |
+
model_expiration_times[model] = (
|
192 |
+
time.time() + 60
|
193 |
+
) # cool down this selected model
|
194 |
+
pass
|
195 |
+
return response
|
196 |
+
```
|
docs/my-website/docs/completion/stream.md
ADDED
@@ -0,0 +1,76 @@
1 |
+
# Streaming + Async
|
2 |
+
|
3 |
+
- [Streaming Responses](#streaming-responses)
|
4 |
+
- [Async Completion](#async-completion)
|
5 |
+
- [Async + Streaming Completion](#async-streaming)
|
6 |
+
|
7 |
+
## Streaming Responses
|
8 |
+
LiteLLM supports streaming the model response back by passing `stream=True` as an argument to the completion function
|
9 |
+
### Usage
|
10 |
+
```python
|
11 |
+
from litellm import completion
|
12 |
+
messages = [{"role": "user", "content": "Hey, how's it going?"}]
|
13 |
+
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
|
14 |
+
for part in response:
|
15 |
+
print(part.choices[0].delta.content or "")
|
16 |
+
```
|
17 |
+
|
18 |
+
### Helper function
|
19 |
+
|
20 |
+
LiteLLM also exposes a helper function to rebuild the complete streaming response from the list of chunks.
|
21 |
+
|
22 |
+
```python
|
23 |
+
import litellm
from litellm import completion
|
24 |
+
messages = [{"role": "user", "content": "Hey, how's it going?"}]
|
25 |
+
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
|
26 |
+
|
27 |
+
chunks = []
for chunk in response:
|
28 |
+
chunks.append(chunk)
|
29 |
+
|
30 |
+
print(litellm.stream_chunk_builder(chunks, messages=messages))
|
31 |
+
```
|
32 |
+
|
33 |
+
## Async Completion
|
34 |
+
Asynchronous Completion with LiteLLM. LiteLLM provides an asynchronous version of the completion function called `acompletion`
|
35 |
+
### Usage
|
36 |
+
```python
|
37 |
+
from litellm import acompletion
|
38 |
+
import asyncio
|
39 |
+
|
40 |
+
async def test_get_response():
|
41 |
+
user_message = "Hello, how are you?"
|
42 |
+
messages = [{"content": user_message, "role": "user"}]
|
43 |
+
response = await acompletion(model="gpt-3.5-turbo", messages=messages)
|
44 |
+
return response
|
45 |
+
|
46 |
+
response = asyncio.run(test_get_response())
|
47 |
+
print(response)
|
48 |
+
|
49 |
+
```
|
50 |
+
|
51 |
+
## Async Streaming
|
52 |
+
We've implemented an `__anext__()` function in the streaming object returned. This enables async iteration over the streaming object.
|
53 |
+
|
54 |
+
### Usage
|
55 |
+
Here's an example of using it with openai.
|
56 |
+
```python
|
57 |
+
from litellm import acompletion
|
58 |
+
import asyncio, os, traceback
|
59 |
+
|
60 |
+
async def completion_call():
|
61 |
+
try:
|
62 |
+
print("test acompletion + streaming")
|
63 |
+
response = await acompletion(
|
64 |
+
model="gpt-3.5-turbo",
|
65 |
+
messages=[{"content": "Hello, how are you?", "role": "user"}],
|
66 |
+
stream=True
|
67 |
+
)
|
68 |
+
print(f"response: {response}")
|
69 |
+
async for chunk in response:
|
70 |
+
print(chunk)
|
71 |
+
except:
|
72 |
+
print(f"error occurred: {traceback.format_exc()}")
|
73 |
+
pass
|
74 |
+
|
75 |
+
asyncio.run(completion_call())
|
76 |
+
```
|
docs/my-website/docs/completion/token_usage.md
ADDED
@@ -0,0 +1,154 @@
1 |
+
# Completion Token Usage & Cost
|
2 |
+
By default LiteLLM returns token usage in all completion requests ([See here](https://litellm.readthedocs.io/en/latest/output/))
|
3 |
+
|
4 |
+
However, we also expose the following helper functions + **[NEW]** an API to calculate token usage across providers:
|
5 |
+
|
6 |
+
- `encode`: This encodes the text passed in, using the model-specific tokenizer. [**Jump to code**](#1-encode)
|
7 |
+
|
8 |
+
- `decode`: This decodes the tokens passed in, using the model-specific tokenizer. [**Jump to code**](#2-decode)
|
9 |
+
|
10 |
+
- `token_counter`: This returns the number of tokens for a given input - it uses the tokenizer based on the model, and defaults to tiktoken if no model-specific tokenizer is available. [**Jump to code**](#3-token_counter)
|
11 |
+
|
12 |
+
- `cost_per_token`: This returns the cost (in USD) for prompt (input) and completion (output) tokens. Uses the live list from `api.litellm.ai`. [**Jump to code**](#4-cost_per_token)
|
13 |
+
|
14 |
+
- `completion_cost`: This returns the overall cost (in USD) for a given LLM API Call. It combines `token_counter` and `cost_per_token` to return the cost for that query (counting both cost of input and output). [**Jump to code**](#5-completion_cost)
|
15 |
+
|
16 |
+
- `get_max_tokens`: This returns the maximum number of tokens allowed for the given model. [**Jump to code**](#6-get_max_tokens)
|
17 |
+
|
18 |
+
- `model_cost`: This returns a dictionary for all models, with their max_tokens, input_cost_per_token and output_cost_per_token. It uses the `api.litellm.ai` call shown below. [**Jump to code**](#7-model_cost)
|
19 |
+
|
20 |
+
- `register_model`: This registers new / overrides existing models (and their pricing details) in the model cost dictionary. [**Jump to code**](#8-register_model)
|
21 |
+
|
22 |
+
- `api.litellm.ai`: Live token + price count across [all supported models](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). [**Jump to code**](#9-apilitellmai)
|
23 |
+
|
24 |
+
📣 This is a community-maintained list. Contributions are welcome! ❤️
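If you just want to inspect that list directly, here's a minimal sketch (assuming the `requests` package is installed) that pulls the same community-maintained JSON referenced in the `register_model` example below and prints the entry for one model:

```python
import requests

# community-maintained pricing + context window list
url = "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"

model_cost_map = requests.get(url).json()

# each entry includes max_tokens, input_cost_per_token, output_cost_per_token, ...
print(model_cost_map["gpt-3.5-turbo"])
```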
|
25 |
+
|
26 |
+
## Example Usage
|
27 |
+
|
28 |
+
### 1. `encode`
|
29 |
+
Encoding has model-specific tokenizers for anthropic, cohere, llama2 and openai. If an unsupported model is passed in, it'll default to using tiktoken (openai's tokenizer).
|
30 |
+
|
31 |
+
```python
|
32 |
+
from litellm import encode, decode
|
33 |
+
|
34 |
+
sample_text = "HellΓΆ World, this is my input string!"
|
35 |
+
# openai encoding + decoding
|
36 |
+
openai_tokens = encode(model="gpt-3.5-turbo", text=sample_text)
|
37 |
+
print(openai_tokens)
|
38 |
+
```
|
39 |
+
|
40 |
+
### 2. `decode`
|
41 |
+
|
42 |
+
Decoding is supported for anthropic, cohere, llama2 and openai.
|
43 |
+
|
44 |
+
```python
|
45 |
+
from litellm import encode, decode
|
46 |
+
|
47 |
+
sample_text = "HellΓΆ World, this is my input string!"
|
48 |
+
# openai encoding + decoding
|
49 |
+
openai_tokens = encode(model="gpt-3.5-turbo", text=sample_text)
|
50 |
+
openai_text = decode(model="gpt-3.5-turbo", tokens=openai_tokens)
|
51 |
+
print(openai_text)
|
52 |
+
```
|
53 |
+
|
54 |
+
### 3. `token_counter`
|
55 |
+
|
56 |
+
```python
|
57 |
+
from litellm import token_counter
|
58 |
+
|
59 |
+
messages = [{"user": "role", "content": "Hey, how's it going"}]
|
60 |
+
print(token_counter(model="gpt-3.5-turbo", messages=messages))
|
61 |
+
```
|
62 |
+
|
63 |
+
### 4. `cost_per_token`
|
64 |
+
|
65 |
+
```python
|
66 |
+
from litellm import cost_per_token
|
67 |
+
|
68 |
+
prompt_tokens = 5
|
69 |
+
completion_tokens = 10
|
70 |
+
prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = cost_per_token(model="gpt-3.5-turbo", prompt_tokens=prompt_tokens, completion_tokens=completion_tokens)
|
71 |
+
|
72 |
+
print(prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar)
|
73 |
+
```
|
74 |
+
|
75 |
+
### 5. `completion_cost`
|
76 |
+
|
77 |
+
* Input: Accepts a `litellm.completion()` response **OR** prompt + completion strings
|
78 |
+
* Output: Returns a `float` of cost for the `completion` call
|
79 |
+
|
80 |
+
**litellm.completion()**
|
81 |
+
```python
|
82 |
+
from litellm import completion, completion_cost

messages = [{"role": "user", "content": "Hey, how's it going?"}]
|
83 |
+
|
84 |
+
response = completion(
|
85 |
+
model="bedrock/anthropic.claude-v2",
|
86 |
+
messages=messages,
|
87 |
+
request_timeout=200,
|
88 |
+
)
|
89 |
+
# pass your response from completion to completion_cost
|
90 |
+
cost = completion_cost(completion_response=response)
|
91 |
+
formatted_string = f"${float(cost):.10f}"
|
92 |
+
print(formatted_string)
|
93 |
+
```
|
94 |
+
|
95 |
+
**prompt + completion string**
|
96 |
+
```python
|
97 |
+
from litellm import completion_cost
|
98 |
+
cost = completion_cost(model="bedrock/anthropic.claude-v2", prompt="Hey!", completion="How's it going?")
|
99 |
+
formatted_string = f"${float(cost):.10f}"
|
100 |
+
print(formatted_string)
|
101 |
+
```
|
102 |
+
### 6. `get_max_tokens`
|
103 |
+
|
104 |
+
Input: Accepts a model name - e.g., gpt-3.5-turbo (to get a complete list, call litellm.model_list).
|
105 |
+
Output: Returns the maximum number of tokens allowed for the given model
|
106 |
+
|
107 |
+
```python
|
108 |
+
from litellm import get_max_tokens
|
109 |
+
|
110 |
+
model = "gpt-3.5-turbo"
|
111 |
+
|
112 |
+
print(get_max_tokens(model)) # Output: 4097
|
113 |
+
```
|
114 |
+
|
115 |
+
### 7. `model_cost`
|
116 |
+
|
117 |
+
* Output: Returns a dict object containing the max_tokens, input_cost_per_token, output_cost_per_token for all models on [community-maintained list](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
|
118 |
+
|
119 |
+
```python
|
120 |
+
from litellm import model_cost
|
121 |
+
|
122 |
+
print(model_cost) # {'gpt-3.5-turbo': {'max_tokens': 4000, 'input_cost_per_token': 1.5e-06, 'output_cost_per_token': 2e-06}, ...}
|
123 |
+
```
|
124 |
+
|
125 |
+
### 8. `register_model`
|
126 |
+
|
127 |
+
* Input: Provide EITHER a model cost dictionary or a url to a hosted json blob
|
128 |
+
* Output: Returns updated model_cost dictionary + updates litellm.model_cost with model details.
|
129 |
+
|
130 |
+
**Dictionary**
|
131 |
+
```python
|
132 |
+
from litellm import register_model
|
133 |
+
|
134 |
+
litellm.register_model({
|
135 |
+
"gpt-4": {
|
136 |
+
"max_tokens": 8192,
|
137 |
+
"input_cost_per_token": 0.00002,
|
138 |
+
"output_cost_per_token": 0.00006,
|
139 |
+
"litellm_provider": "openai",
|
140 |
+
"mode": "chat"
|
141 |
+
},
|
142 |
+
})
|
143 |
+
```
|
144 |
+
|
145 |
+
**URL for json blob**
|
146 |
+
```python
|
147 |
+
import litellm
|
148 |
+
|
149 |
+
litellm.register_model(model_cost=
|
150 |
+
"https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json")
|
151 |
+
```
|
152 |
+
|
153 |
+
|
154 |
+
|
docs/my-website/docs/contact.md
ADDED
@@ -0,0 +1,6 @@
1 |
+
# Contact Us
|
2 |
+
|
3 |
+
[![](https://dcbadge.vercel.app/api/server/wuPM9dRgDw)](https://discord.gg/wuPM9dRgDw)
|
4 |
+
|
5 |
+
* [Meet with us π](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
|
6 |
+
* Contact us at [email protected] / [email protected]
|
docs/my-website/docs/debugging/hosted_debugging.md
ADDED
@@ -0,0 +1,91 @@
1 |
+
import Image from '@theme/IdealImage';
|
2 |
+
import QueryParamReader from '../../src/components/queryParamReader.js'
|
3 |
+
|
4 |
+
# [Beta] Monitor Logs in Production
|
5 |
+
|
6 |
+
:::note
|
7 |
+
|
8 |
+
This is in beta. Expect frequent updates, as we improve based on your feedback.
|
9 |
+
|
10 |
+
:::
|
11 |
+
|
12 |
+
LiteLLM provides an integration to let you monitor logs in production.
|
13 |
+
|
14 |
+
π Jump to our sample LiteLLM Dashboard: https://admin.litellm.ai/
|
15 |
+
|
16 |
+
|
17 |
+
<Image img={require('../../img/alt_dashboard.png')} alt="Dashboard" />
|
18 |
+
|
19 |
+
## Debug your first logs
|
20 |
+
<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/liteLLM_OpenAI.ipynb">
|
21 |
+
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
|
22 |
+
</a>
|
23 |
+
|
24 |
+
|
25 |
+
### 1. Get your LiteLLM Token
|
26 |
+
|
27 |
+
Go to [admin.litellm.ai](https://admin.litellm.ai/) and copy the code snippet with your unique token
|
28 |
+
|
29 |
+
<Image img={require('../../img/hosted_debugger_usage_page.png')} alt="Usage" />
|
30 |
+
|
31 |
+
### 2. Set up your environment
|
32 |
+
|
33 |
+
**Add it to your .env**
|
34 |
+
|
35 |
+
```python
|
36 |
+
import os
|
37 |
+
|
38 |
+
os.env["LITELLM_TOKEN"] = "e24c4c06-d027-4c30-9e78-18bc3a50aebb" # replace with your unique token
|
39 |
+
|
40 |
+
```
|
41 |
+
|
42 |
+
**Turn on LiteLLM Client**
|
43 |
+
```python
|
44 |
+
import litellm
|
45 |
+
litellm.client = True
|
46 |
+
```
|
47 |
+
|
48 |
+
### 3. Make a normal `completion()` call
|
49 |
+
```python
|
50 |
+
import litellm
|
51 |
+
from litellm import completion
|
52 |
+
import os
|
53 |
+
|
54 |
+
# set env variables
|
55 |
+
os.environ["LITELLM_TOKEN"] = "e24c4c06-d027-4c30-9e78-18bc3a50aebb" # replace with your unique token
|
56 |
+
os.environ["OPENAI_API_KEY"] = "openai key"
|
57 |
+
|
58 |
+
litellm.use_client = True # enable logging dashboard
|
59 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}]
|
60 |
+
|
61 |
+
# openai call
|
62 |
+
response = completion(model="gpt-3.5-turbo", messages=messages)
|
63 |
+
```
|
64 |
+
|
65 |
+
Your `completion()` call will print a link to your session dashboard (https://admin.litellm.ai/<your_unique_token>)
|
66 |
+
|
67 |
+
In the above case it would be: [`admin.litellm.ai/e24c4c06-d027-4c30-9e78-18bc3a50aebb`](https://admin.litellm.ai/e24c4c06-d027-4c30-9e78-18bc3a50aebb)
|
68 |
+
|
69 |
+
Click on your personal dashboard link. Here's how you can find it π
|
70 |
+
|
71 |
+
<Image img={require('../../img/dash_output.png')} alt="Dashboard" />
|
72 |
+
|
73 |
+
[π Tell us if you need better privacy controls](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version?month=2023-08)
|
74 |
+
|
75 |
+
### 4. Review request log
|
76 |
+
|
77 |
+
Oh! Looks like our request was made successfully. Let's click on it and see exactly what got sent to the LLM provider.
|
78 |
+
|
79 |
+
<Image img={require('../../img/dashboard_log_row.png')} alt="Dashboard Log Row" />
|
80 |
+
|
81 |
+
|
82 |
+
|
83 |
+
Ah! So we can see that this request was made to a **Baseten** endpoint (see litellm_params > custom_llm_provider) for a model with ID - **7qQNLDB** (see model). The message sent was - `"Hey, how's it going?"` and the response received was - `"As an AI language model, I don't have feelings or emotions, but I can assist you with your queries. How can I assist you today?"`
|
84 |
+
|
85 |
+
<Image img={require('../../img/dashboard_log.png')} alt="Dashboard Log Row" />
|
86 |
+
|
87 |
+
:::info
|
88 |
+
|
89 |
+
🎉 Congratulations! You've successfully debugged your first log!
|
90 |
+
|
91 |
+
:::
|
docs/my-website/docs/debugging/local_debugging.md
ADDED
@@ -0,0 +1,64 @@
1 |
+
# Local Debugging
|
2 |
+
There are 2 ways to do local debugging - `litellm.set_verbose=True` and by passing in a custom function `completion(...logger_fn=<your_local_function>)`. Warning: Make sure not to use `set_verbose` in production. It logs API keys, which might end up in log files.
|
3 |
+
|
4 |
+
## Set Verbose
|
5 |
+
|
6 |
+
This is good for getting print statements for everything litellm is doing.
|
7 |
+
```python
|
8 |
+
import os
import litellm
|
9 |
+
from litellm import completion
|
10 |
+
|
11 |
+
litellm.set_verbose=True # π this is the 1-line change you need to make
|
12 |
+
|
13 |
+
## set ENV variables
|
14 |
+
os.environ["OPENAI_API_KEY"] = "openai key"
|
15 |
+
os.environ["COHERE_API_KEY"] = "cohere key"
|
16 |
+
|
17 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}]
|
18 |
+
|
19 |
+
# openai call
|
20 |
+
response = completion(model="gpt-3.5-turbo", messages=messages)
|
21 |
+
|
22 |
+
# cohere call
|
23 |
+
response = completion("command-nightly", messages)
|
24 |
+
```
|
25 |
+
|
26 |
+
## Logger Function
|
27 |
+
But sometimes all you care about is seeing exactly what's getting sent to your api call and what's being returned - e.g. if the api call is failing, why is that happening? what are the exact params being set?
|
28 |
+
|
29 |
+
In that case, LiteLLM allows you to pass in a custom logging function to see / modify the model call Input/Outputs.
|
30 |
+
|
31 |
+
**Note**: Your custom function should accept a dict object as its argument.
|
32 |
+
|
33 |
+
Your custom function
|
34 |
+
|
35 |
+
```python
|
36 |
+
def my_custom_logging_fn(model_call_dict):
|
37 |
+
print(f"model call details: {model_call_dict}")
|
38 |
+
```
|
39 |
+
|
40 |
+
### Complete Example
|
41 |
+
```python
|
42 |
+
import os
from litellm import completion
|
43 |
+
|
44 |
+
def my_custom_logging_fn(model_call_dict):
|
45 |
+
print(f"model call details: {model_call_dict}")
|
46 |
+
|
47 |
+
## set ENV variables
|
48 |
+
os.environ["OPENAI_API_KEY"] = "openai key"
|
49 |
+
os.environ["COHERE_API_KEY"] = "cohere key"
|
50 |
+
|
51 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}]
|
52 |
+
|
53 |
+
# openai call
|
54 |
+
response = completion(model="gpt-3.5-turbo", messages=messages, logger_fn=my_custom_logging_fn)
|
55 |
+
|
56 |
+
# cohere call
|
57 |
+
response = completion("command-nightly", messages, logger_fn=my_custom_logging_fn)
|
58 |
+
```
|
59 |
+
|
60 |
+
## Still Seeing Issues?
|
61 |
+
|
62 |
+
Text us @ +17708783106 or Join the [Discord](https://discord.com/invite/wuPM9dRgDw).
|
63 |
+
|
64 |
+
We promise to help you in `lite`ning speed ❤️
|
docs/my-website/docs/default_code_snippet.md
ADDED
@@ -0,0 +1,22 @@
1 |
+
---
|
2 |
+
displayed_sidebar: tutorialSidebar
|
3 |
+
---
|
4 |
+
# Get Started
|
5 |
+
|
6 |
+
import QueryParamReader from '../src/components/queryParamReader.js'
|
7 |
+
import TokenComponent from '../src/components/queryParamToken.js'
|
8 |
+
|
9 |
+
:::info
|
10 |
+
|
11 |
+
This section assumes you've already added your API keys in <TokenComponent/>
|
12 |
+
|
13 |
+
If you want to use the non-hosted version, [go here](https://docs.litellm.ai/docs/#quick-start)
|
14 |
+
|
15 |
+
:::
|
16 |
+
|
17 |
+
|
18 |
+
```
|
19 |
+
pip install litellm
|
20 |
+
```
|
21 |
+
|
22 |
+
<QueryParamReader/>
|
docs/my-website/docs/embedding/async_embedding.md
ADDED
@@ -0,0 +1,15 @@
1 |
+
# Async Embedding
|
2 |
+
|
3 |
+
LiteLLM provides an asynchronous version of the `embedding` function called `aembedding`
|
4 |
+
### Usage
|
5 |
+
```python
|
6 |
+
from litellm import aembedding
|
7 |
+
import asyncio
|
8 |
+
|
9 |
+
async def test_get_response():
|
10 |
+
response = await aembedding('text-embedding-ada-002', input=["good morning from litellm"])
|
11 |
+
return response
|
12 |
+
|
13 |
+
response = asyncio.run(test_get_response())
|
14 |
+
print(response)
|
15 |
+
```
|
docs/my-website/docs/embedding/moderation.md
ADDED
@@ -0,0 +1,10 @@
1 |
+
# Moderation
|
2 |
+
LiteLLM supports the moderation endpoint for OpenAI
|
3 |
+
|
4 |
+
## Usage
|
5 |
+
```python
|
6 |
+
import os
|
7 |
+
from litellm import moderation
|
8 |
+
os.environ['OPENAI_API_KEY'] = ""
|
9 |
+
response = moderation(input="i'm ishaan cto of litellm")
|
10 |
+
```
|
docs/my-website/docs/embedding/supported_embedding.md
ADDED
@@ -0,0 +1,201 @@
1 |
+
# Embedding Models
|
2 |
+
|
3 |
+
## Quick Start
|
4 |
+
```python
|
5 |
+
from litellm import embedding
|
6 |
+
import os
|
7 |
+
os.environ['OPENAI_API_KEY'] = ""
|
8 |
+
response = embedding(model='text-embedding-ada-002', input=["good morning from litellm"])
|
9 |
+
```
|
10 |
+
|
11 |
+
### Input Params for `litellm.embedding()`
|
12 |
+
### Required Fields
|
13 |
+
|
14 |
+
- `model`: *string* - ID of the model to use. `model='text-embedding-ada-002'`
|
15 |
+
|
16 |
+
- `input`: *array* - Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays. The input must not exceed the max input tokens for the model (8192 tokens for text-embedding-ada-002), cannot be an empty string, and any array must be 2048 dimensions or less.
|
17 |
+
```
|
18 |
+
input=["good morning from litellm"]
|
19 |
+
```
|
20 |
+
|
21 |
+
### Optional LiteLLM Fields
|
22 |
+
|
23 |
+
- `user`: *string (optional)* A unique identifier representing your end-user,
|
24 |
+
|
25 |
+
- `timeout`: *integer* - The maximum time, in seconds, to wait for the API to respond. Defaults to 600 seconds (10 minutes).
|
26 |
+
|
27 |
+
- `api_base`: *string (optional)* - The api endpoint you want to call the model with
|
28 |
+
|
29 |
+
- `api_version`: *string (optional)* - (Azure-specific) the api version for the call
|
30 |
+
|
31 |
+
- `api_key`: *string (optional)* - The API key to authenticate and authorize requests. If not provided, the default API key is used.
|
32 |
+
|
33 |
+
- `api_type`: *string (optional)* - The type of API to use.
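As a quick illustration of these optional fields, here's a minimal sketch (the key and user id are placeholders) that passes them directly to `embedding()` instead of relying on environment variables:

```python
from litellm import embedding

response = embedding(
    model="text-embedding-ada-002",
    input=["good morning from litellm"],
    user="end-user-1234",   # identify your end-user
    timeout=60,             # wait at most 60 seconds for the API to respond
    api_key="sk-...",       # overrides the default OPENAI_API_KEY env var
)
print(response)
```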
|
34 |
+
|
35 |
+
### Output from `litellm.embedding()`
|
36 |
+
|
37 |
+
```json
|
38 |
+
{
|
39 |
+
"object": "list",
|
40 |
+
"data": [
|
41 |
+
{
|
42 |
+
"object": "embedding",
|
43 |
+
"index": 0,
|
44 |
+
"embedding": [
|
45 |
+
-0.0022326677571982145,
|
46 |
+
0.010749882087111473,
|
47 |
+
...
|
48 |
+
...
|
49 |
+
...
|
50 |
+
|
51 |
+
]
|
52 |
+
}
|
53 |
+
],
|
54 |
+
"model": "text-embedding-ada-002-v2",
|
55 |
+
"usage": {
|
56 |
+
"prompt_tokens": 10,
|
57 |
+
"total_tokens": 10
|
58 |
+
}
|
59 |
+
}
|
60 |
+
```
|
61 |
+
|
62 |
+
## OpenAI Embedding Models
|
63 |
+
|
64 |
+
### Usage
|
65 |
+
```python
|
66 |
+
from litellm import embedding
|
67 |
+
import os
|
68 |
+
os.environ['OPENAI_API_KEY'] = ""
|
69 |
+
response = embedding('text-embedding-ada-002', input=["good morning from litellm"])
|
70 |
+
```
|
71 |
+
|
72 |
+
| Model Name | Function Call | Required OS Variables |
|
73 |
+
|----------------------|---------------------------------------------|--------------------------------------|
|
74 |
+
| text-embedding-ada-002 | `embedding('text-embedding-ada-002', input)` | `os.environ['OPENAI_API_KEY']` |
|
75 |
+
|
76 |
+
## Azure OpenAI Embedding Models
|
77 |
+
|
78 |
+
### API keys
|
79 |
+
This can be set as env variables or passed as **params to litellm.embedding()**
|
80 |
+
```python
|
81 |
+
import os
|
82 |
+
os.environ['AZURE_API_KEY'] = ""
|
83 |
+
os.environ['AZURE_API_BASE'] = ""
|
84 |
+
os.environ['AZURE_API_VERSION'] = ""
|
85 |
+
```
|
86 |
+
|
87 |
+
### Usage
|
88 |
+
```python
|
89 |
+
from litellm import embedding
|
90 |
+
response = embedding(
|
91 |
+
model="azure/<your deployment name>",
|
92 |
+
input=["good morning from litellm"],
|
93 |
+
api_key=api_key,
|
94 |
+
api_base=api_base,
|
95 |
+
api_version=api_version,
|
96 |
+
)
|
97 |
+
print(response)
|
98 |
+
```
|
99 |
+
|
100 |
+
| Model Name | Function Call |
|
101 |
+
|----------------------|---------------------------------------------|
|
102 |
+
| text-embedding-ada-002 | `embedding(model="azure/<your deployment name>", input=input)` |
|
103 |
+
|
104 |
+
h/t to [Mikko](https://www.linkedin.com/in/mikkolehtimaki/) for this integration
|
105 |
+
|
106 |
+
## OpenAI Compatible Embedding Models
|
107 |
+
Use this for calling `/embedding` endpoints on OpenAI Compatible Servers, example https://github.com/xorbitsai/inference
|
108 |
+
|
109 |
+
**Note add `openai/` prefix to model so litellm knows to route to OpenAI**
|
110 |
+
|
111 |
+
### Usage
|
112 |
+
```python
|
113 |
+
from litellm import embedding
|
114 |
+
response = embedding(
|
115 |
+
model = "openai/<your-llm-name>", # add `openai/` prefix to model so litellm knows to route to OpenAI
|
116 |
+
api_base="http://0.0.0.0:8000/" # set API Base of your Custom OpenAI Endpoint
|
117 |
+
input=["good morning from litellm"]
|
118 |
+
)
|
119 |
+
```
|
120 |
+
|
121 |
+
## Bedrock Embedding
|
122 |
+
|
123 |
+
### API keys
|
124 |
+
This can be set as env variables or passed as **params to litellm.embedding()**
|
125 |
+
```python
|
126 |
+
import os
|
127 |
+
os.environ["AWS_ACCESS_KEY_ID"] = "" # Access key
|
128 |
+
os.environ["AWS_SECRET_ACCESS_KEY"] = "" # Secret access key
|
129 |
+
os.environ["AWS_REGION_NAME"] = "" # us-east-1, us-east-2, us-west-1, us-west-2
|
130 |
+
```
|
131 |
+
|
132 |
+
### Usage
|
133 |
+
```python
|
134 |
+
from litellm import embedding
|
135 |
+
response = embedding(
|
136 |
+
model="amazon.titan-embed-text-v1",
|
137 |
+
input=["good morning from litellm"],
|
138 |
+
)
|
139 |
+
print(response)
|
140 |
+
```
|
141 |
+
|
142 |
+
| Model Name | Function Call |
|
143 |
+
|----------------------|---------------------------------------------|
|
144 |
+
| Titan Embeddings - G1 | `embedding(model="amazon.titan-embed-text-v1", input=input)` |
|
145 |
+
|
146 |
+
|
147 |
+
## Cohere Embedding Models
|
148 |
+
https://docs.cohere.com/reference/embed
|
149 |
+
|
150 |
+
### Usage
|
151 |
+
```python
|
152 |
+
import os
from litellm import embedding
|
153 |
+
os.environ["COHERE_API_KEY"] = "cohere key"
|
154 |
+
|
155 |
+
# cohere call
|
156 |
+
response = embedding(
|
157 |
+
model="embed-english-v3.0",
|
158 |
+
input=["good morning from litellm", "this is another item"],
|
159 |
+
input_type="search_document" # optional param for v3 llms
|
160 |
+
)
|
161 |
+
```
|
162 |
+
| Model Name | Function Call |
|
163 |
+
|--------------------------|--------------------------------------------------------------|
|
164 |
+
| embed-english-v3.0 | `embedding(model="embed-english-v3.0", input=["good morning from litellm", "this is another item"])` |
|
165 |
+
| embed-english-light-v3.0 | `embedding(model="embed-english-light-v3.0", input=["good morning from litellm", "this is another item"])` |
|
166 |
+
| embed-multilingual-v3.0 | `embedding(model="embed-multilingual-v3.0", input=["good morning from litellm", "this is another item"])` |
|
167 |
+
| embed-multilingual-light-v3.0 | `embedding(model="embed-multilingual-light-v3.0", input=["good morning from litellm", "this is another item"])` |
|
168 |
+
| embed-english-v2.0 | `embedding(model="embed-english-v2.0", input=["good morning from litellm", "this is another item"])` |
|
169 |
+
| embed-english-light-v2.0 | `embedding(model="embed-english-light-v2.0", input=["good morning from litellm", "this is another item"])` |
|
170 |
+
| embed-multilingual-v2.0 | `embedding(model="embed-multilingual-v2.0", input=["good morning from litellm", "this is another item"])` |
|
171 |
+
|
172 |
+
## HuggingFace Embedding Models
|
173 |
+
LiteLLM supports all Feature-Extraction Embedding models: https://huggingface.co/models?pipeline_tag=feature-extraction
|
174 |
+
|
175 |
+
### Usage
|
176 |
+
```python
|
177 |
+
from litellm import embedding
|
178 |
+
import os
|
179 |
+
os.environ['HUGGINGFACE_API_KEY'] = ""
|
180 |
+
response = embedding(
|
181 |
+
model='huggingface/microsoft/codebert-base',
|
182 |
+
input=["good morning from litellm"]
|
183 |
+
)
|
184 |
+
```
|
185 |
+
### Usage - Custom API Base
|
186 |
+
```python
|
187 |
+
from litellm import embedding
|
188 |
+
import os
|
189 |
+
os.environ['HUGGINGFACE_API_KEY'] = ""
|
190 |
+
response = embedding(
|
191 |
+
model='huggingface/microsoft/codebert-base',
|
192 |
+
input=["good morning from litellm"],
|
193 |
+
api_base = "https://p69xlsj6rpno5drq.us-east-1.aws.endpoints.huggingface.cloud"
|
194 |
+
)
|
195 |
+
```
|
196 |
+
|
197 |
+
| Model Name | Function Call | Required OS Variables |
|
198 |
+
|-----------------------|--------------------------------------------------------------|-------------------------------------------------|
|
199 |
+
| microsoft/codebert-base | `embedding('huggingface/microsoft/codebert-base', input=input)` | `os.environ['HUGGINGFACE_API_KEY']` |
|
200 |
+
| BAAI/bge-large-zh | `embedding('huggingface/BAAI/bge-large-zh', input=input)` | `os.environ['HUGGINGFACE_API_KEY']` |
|
201 |
+
| any-hf-embedding-model | `embedding('huggingface/hf-embedding-model', input=input)` | `os.environ['HUGGINGFACE_API_KEY']` |
|
docs/my-website/docs/exception_mapping.md
ADDED
@@ -0,0 +1,102 @@
1 |
+
# Exception Mapping
|
2 |
+
|
3 |
+
LiteLLM maps exceptions across all providers to their OpenAI counterparts.
|
4 |
+
- Rate Limit Errors
|
5 |
+
- Invalid Request Errors
|
6 |
+
- Authentication Errors
|
7 |
+
- Timeout Errors `openai.APITimeoutError`
|
8 |
+
- ServiceUnavailableError
|
9 |
+
- APIError
|
10 |
+
- APIConnectionError
|
11 |
+
|
12 |
+
In the base case, we return an APIConnectionError.
|
13 |
+
|
14 |
+
All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that, should work out of the box with LiteLLM.
|
15 |
+
|
16 |
+
For all cases, the exception returned inherits from the original OpenAI Exception but contains 3 additional attributes:
|
17 |
+
* status_code - the http status code of the exception
|
18 |
+
* message - the error message
|
19 |
+
* llm_provider - the provider raising the exception
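For example, here's a minimal sketch (with a deliberately invalid placeholder key) of reading those attributes off a caught exception:

```python
import os
import openai
import litellm

os.environ["ANTHROPIC_API_KEY"] = "bad-key"  # placeholder - triggers an authentication error

try:
    litellm.completion(
        model="claude-instant-1",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )
except openai.AuthenticationError as e:
    # the 3 additional attributes LiteLLM attaches to mapped exceptions
    print(e.status_code)   # http status code, e.g. 401
    print(e.llm_provider)  # provider raising the exception, e.g. "anthropic"
    print(e.message)       # the error message
```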
|
20 |
+
|
21 |
+
## Usage
|
22 |
+
|
23 |
+
```python
|
24 |
+
import litellm
|
25 |
+
import openai
|
26 |
+
|
27 |
+
try:
|
28 |
+
response = litellm.completion(
|
29 |
+
model="gpt-4",
|
30 |
+
messages=[
|
31 |
+
{
|
32 |
+
"role": "user",
|
33 |
+
"content": "hello, write a 20 pageg essay"
|
34 |
+
}
|
35 |
+
],
|
36 |
+
timeout=0.01, # this will raise a timeout exception
|
37 |
+
)
|
38 |
+
except openai.APITimeoutError as e:
|
39 |
+
print("Passed: Raised correct exception. Got openai.APITimeoutError\nGood Job", e)
|
40 |
+
print(type(e))
|
41 |
+
pass
|
42 |
+
```
|
43 |
+
|
44 |
+
## Usage - Catching Streaming Exceptions
|
45 |
+
```python
|
46 |
+
import litellm
import openai
|
47 |
+
try:
|
48 |
+
response = litellm.completion(
|
49 |
+
model="gpt-3.5-turbo",
|
50 |
+
messages=[
|
51 |
+
{
|
52 |
+
"role": "user",
|
53 |
+
"content": "hello, write a 20 pg essay"
|
54 |
+
}
|
55 |
+
],
|
56 |
+
timeout=0.0001, # this will raise an exception
|
57 |
+
stream=True,
|
58 |
+
)
|
59 |
+
for chunk in response:
|
60 |
+
print(chunk)
|
61 |
+
except openai.APITimeoutError as e:
|
62 |
+
print("Passed: Raised correct exception. Got openai.APITimeoutError\nGood Job", e)
|
63 |
+
print(type(e))
|
64 |
+
pass
|
65 |
+
except Exception as e:
|
66 |
+
print(f"Did not raise error `openai.APITimeoutError`. Instead raised error type: {type(e)}, Error: {e}")
|
67 |
+
|
68 |
+
```
|
69 |
+
|
70 |
+
## Details
|
71 |
+
|
72 |
+
To see how it's implemented - [check out the code](https://github.com/BerriAI/litellm/blob/a42c197e5a6de56ea576c73715e6c7c6b19fa249/litellm/utils.py#L1217)
|
73 |
+
|
74 |
+
[Create an issue](https://github.com/BerriAI/litellm/issues/new) **or** [make a PR](https://github.com/BerriAI/litellm/pulls) if you want to improve the exception mapping.
|
75 |
+
|
76 |
+
**Note** For OpenAI and Azure we return the original exception (since they're of the OpenAI Error type). But we add the 'llm_provider' attribute to them. [See code](https://github.com/BerriAI/litellm/blob/a42c197e5a6de56ea576c73715e6c7c6b19fa249/litellm/utils.py#L1221)
|
77 |
+
|
78 |
+
## Custom mapping list
|
79 |
+
|
80 |
+
Base case - we return the original exception.
|
81 |
+
|
82 |
+
| | ContextWindowExceededError | AuthenticationError | InvalidRequestError | RateLimitError | ServiceUnavailableError |
|
83 |
+
|---------------|----------------------------|---------------------|---------------------|---------------|-------------------------|
|
84 |
+
| Anthropic | ✅ | ✅ | ✅ | ✅ | |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ |
| Replicate | ✅ | ✅ | ✅ | ✅ | ✅ |
| Cohere | ✅ | ✅ | ✅ | ✅ | ✅ |
| Huggingface | ✅ | ✅ | ✅ | ✅ | |
| Openrouter | ✅ | ✅ | ✅ | ✅ | |
| AI21 | ✅ | ✅ | ✅ | ✅ | |
| VertexAI | | | ✅ | | |
| Bedrock | | | ✅ | | |
| Sagemaker | | | ✅ | | |
| TogetherAI | ✅ | ✅ | ✅ | ✅ | |
| AlephAlpha | ✅ | ✅ | ✅ | ✅ | ✅ |
|
96 |
+
|
97 |
+
|
98 |
+
> For a deeper understanding of these exceptions, you can check out [this](https://github.com/BerriAI/litellm/blob/d7e58d13bf9ba9edbab2ab2f096f3de7547f35fa/litellm/utils.py#L1544) implementation for additional insights.
|
99 |
+
|
100 |
+
The `ContextWindowExceededError` is a sub-class of `InvalidRequestError`. It was introduced to provide more granularity for exception-handling scenarios. Please refer to [this issue to learn more](https://github.com/BerriAI/litellm/issues/228).
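If you want to branch on that specific case (e.g. to retry with a larger-context model), here's a minimal sketch - it assumes both exception classes are importable from the top-level `litellm` package, as referenced above:

```python
import litellm

# deliberately oversized prompt
long_messages = [{"role": "user", "content": "hello " * 50000}]

try:
    litellm.completion(model="gpt-3.5-turbo", messages=long_messages)
except litellm.ContextWindowExceededError as e:
    # catch the more specific sub-class first ...
    print(f"context window exceeded on {e.llm_provider}: {e.message}")
except litellm.InvalidRequestError as e:
    # ... then fall back to the parent class for other invalid requests
    print(f"invalid request: {e.message}")
```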
|
101 |
+
|
102 |
+
Contributions to improve exception mapping are [welcome](https://github.com/BerriAI/litellm#contributing)
|
docs/my-website/docs/extras/contributing.md
ADDED
@@ -0,0 +1,49 @@
1 |
+
# Contributing to Documentation
|
2 |
+
|
3 |
+
This website is built using [Docusaurus 2](https://docusaurus.io/), a modern static website generator.
|
4 |
+
|
5 |
+
Clone litellm
|
6 |
+
```
|
7 |
+
git clone https://github.com/BerriAI/litellm.git
|
8 |
+
```
|
9 |
+
|
10 |
+
### Local setup for running the docs locally
|
11 |
+
|
12 |
+
#### Installation
|
13 |
+
```
|
14 |
+
npm install --global yarn
|
15 |
+
```
|
16 |
+
|
17 |
+
|
18 |
+
### Local Development
|
19 |
+
|
20 |
+
```
|
21 |
+
cd docs/my-website
|
22 |
+
```
|
23 |
+
|
24 |
+
Let's install the requirements
|
25 |
+
|
26 |
+
```
|
27 |
+
yarn
|
28 |
+
```
|
29 |
+
Run the website
|
30 |
+
|
31 |
+
```
|
32 |
+
yarn start
|
33 |
+
```
|
34 |
+
Open docs here: [http://localhost:3000/](http://localhost:3000/)
|
35 |
+
|
36 |
+
|
37 |
+
|
38 |
+
This command builds your Markdown files into HTML and starts a development server to browse your documentation. Open up [http://127.0.0.1:8000/](http://127.0.0.1:8000/) in your web browser to see your documentation. You can make changes to your Markdown files and your docs will automatically rebuild.
|
39 |
+
|
40 |
+
[Full tutorial here](https://docs.readthedocs.io/en/stable/intro/getting-started-with-mkdocs.html)
|
41 |
+
|
42 |
+
### Making changes to Docs
|
43 |
+
- All the docs are placed under the `docs` directory
|
44 |
+
- If you are adding a new `.md` file or editing the hierarchy, edit `mkdocs.yml` in the root of the project
|
45 |
+
- After testing your changes, make a change to the `main` branch of [github.com/BerriAI/litellm](https://github.com/BerriAI/litellm)
|
46 |
+
|
47 |
+
|
48 |
+
|
49 |
+
|
docs/my-website/docs/getting_started.md
ADDED
@@ -0,0 +1,100 @@
1 |
+
# Getting Started
|
2 |
+
|
3 |
+
import QuickStart from '../src/components/QuickStart.js'
|
4 |
+
|
5 |
+
LiteLLM simplifies LLM API calls by mapping them all to the [OpenAI ChatCompletion format](https://platform.openai.com/docs/api-reference/chat).
|
6 |
+
|
7 |
+
## basic usage
|
8 |
+
|
9 |
+
By default we provide a free $10 community-key to try all providers supported on LiteLLM.
|
10 |
+
|
11 |
+
```python
|
12 |
+
import os
from litellm import completion
|
13 |
+
|
14 |
+
## set ENV variables
|
15 |
+
os.environ["OPENAI_API_KEY"] = "your-api-key"
|
16 |
+
os.environ["COHERE_API_KEY"] = "your-api-key"
|
17 |
+
|
18 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}]
|
19 |
+
|
20 |
+
# openai call
|
21 |
+
response = completion(model="gpt-3.5-turbo", messages=messages)
|
22 |
+
|
23 |
+
# cohere call
|
24 |
+
response = completion("command-nightly", messages)
|
25 |
+
```
|
26 |
+
|
27 |
+
**Need a dedicated key?**
|
28 |
+
Email us @ [email protected]
|
29 |
+
|
30 |
+
Next Steps 👉 [Call all supported models - e.g. Claude-2, Llama2-70b, etc.](./proxy_api.md#supported-models)
|
31 |
+
|
32 |
+
More details 👉
|
33 |
+
* [Completion() function details](./completion/)
|
34 |
+
* [All supported models / providers on LiteLLM](./providers/)
|
35 |
+
* [Build your own OpenAI proxy](https://github.com/BerriAI/liteLLM-proxy/tree/main)
|
36 |
+
|
37 |
+
## streaming
|
38 |
+
|
39 |
+
Same example from before. Just pass in `stream=True` in the completion args.
|
40 |
+
```python
|
41 |
+
import os
from litellm import completion
|
42 |
+
|
43 |
+
## set ENV variables
|
44 |
+
os.environ["OPENAI_API_KEY"] = "openai key"
|
45 |
+
os.environ["COHERE_API_KEY"] = "cohere key"
|
46 |
+
|
47 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}]
|
48 |
+
|
49 |
+
# openai call
|
50 |
+
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
|
51 |
+
|
52 |
+
# cohere call
|
53 |
+
response = completion("command-nightly", messages, stream=True)
|
54 |
+
|
55 |
+
print(response)
|
56 |
+
```
|
57 |
+
|
58 |
+
More details 👉
|
59 |
+
* [streaming + async](./completion/stream.md)
|
60 |
+
* [tutorial for streaming Llama2 on TogetherAI](./tutorials/TogetherAI_liteLLM.md)
|
61 |
+
|
62 |
+
## exception handling
|
63 |
+
|
64 |
+
LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that, should work out of the box with LiteLLM.
|
65 |
+
|
66 |
+
```python
|
67 |
+
import os
from openai.error import OpenAIError
|
68 |
+
from litellm import completion
|
69 |
+
|
70 |
+
os.environ["ANTHROPIC_API_KEY"] = "bad-key"
|
71 |
+
try:
|
72 |
+
# some code
|
73 |
+
completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
|
74 |
+
except OpenAIError as e:
|
75 |
+
print(e)
|
76 |
+
```
|
77 |
+
|
78 |
+
## Logging Observability - Log LLM Input/Output ([Docs](https://docs.litellm.ai/docs/observability/callbacks))
|
79 |
+
LiteLLM exposes pre defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, Slack
|
80 |
+
```python
|
81 |
+
import litellm
import os
from litellm import completion
|
82 |
+
|
83 |
+
## set env variables for logging tools
|
84 |
+
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
|
85 |
+
os.environ["LANGFUSE_SECRET_KEY"] = ""
|
86 |
+
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
|
87 |
+
|
88 |
+
os.environ["OPENAI_API_KEY"]
|
89 |
+
|
90 |
+
# set callbacks
|
91 |
+
litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase
|
92 |
+
|
93 |
+
#openai call
|
94 |
+
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
|
95 |
+
```
|
96 |
+
|
97 |
+
More details 👉
|
98 |
+
* [exception mapping](./exception_mapping.md)
|
99 |
+
* [retries + model fallbacks for completion()](./completion/reliable_completions.md)
|
100 |
+
* [tutorial for model fallbacks with completion()](./tutorials/fallbacks.md)
|
docs/my-website/docs/index.md
ADDED
@@ -0,0 +1,402 @@
1 |
+
import Tabs from '@theme/Tabs';
|
2 |
+
import TabItem from '@theme/TabItem';
|
3 |
+
|
4 |
+
# LiteLLM - Getting Started
|
5 |
+
|
6 |
+
https://github.com/BerriAI/litellm
|
7 |
+
|
8 |
+
import QuickStart from '../src/components/QuickStart.js'
|
9 |
+
|
10 |
+
## **Call 100+ LLMs using the same Input/Output Format**
|
11 |
+
|
12 |
+
## Basic usage
|
13 |
+
<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/liteLLM_Getting_Started.ipynb">
|
14 |
+
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
|
15 |
+
</a>
|
16 |
+
|
17 |
+
```shell
|
18 |
+
pip install litellm
|
19 |
+
```
|
20 |
+
<Tabs>
|
21 |
+
<TabItem value="openai" label="OpenAI">
|
22 |
+
|
23 |
+
```python
|
24 |
+
from litellm import completion
|
25 |
+
import os
|
26 |
+
|
27 |
+
## set ENV variables
|
28 |
+
os.environ["OPENAI_API_KEY"] = "your-api-key"
|
29 |
+
|
30 |
+
response = completion(
|
31 |
+
model="gpt-3.5-turbo",
|
32 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}]
|
33 |
+
)
|
34 |
+
```
|
35 |
+
|
36 |
+
</TabItem>
|
37 |
+
<TabItem value="anthropic" label="Anthropic">
|
38 |
+
|
39 |
+
```python
|
40 |
+
from litellm import completion
|
41 |
+
import os
|
42 |
+
|
43 |
+
## set ENV variables
|
44 |
+
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
|
45 |
+
|
46 |
+
response = completion(
|
47 |
+
model="claude-2",
|
48 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}]
|
49 |
+
)
|
50 |
+
```
|
51 |
+
|
52 |
+
</TabItem>
|
53 |
+
|
54 |
+
<TabItem value="vertex" label="VertexAI">
|
55 |
+
|
56 |
+
```python
|
57 |
+
from litellm import completion
|
58 |
+
import os
|
59 |
+
|
60 |
+
# auth: run 'gcloud auth application-default'
|
61 |
+
os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
|
62 |
+
os.environ["VERTEX_LOCATION"] = "us-central1"
|
63 |
+
|
64 |
+
response = completion(
|
65 |
+
model="chat-bison",
|
66 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}]
|
67 |
+
)
|
68 |
+
```
|
69 |
+
|
70 |
+
</TabItem>
|
71 |
+
|
72 |
+
<TabItem value="hugging" label="HuggingFace">
|
73 |
+
|
74 |
+
```python
|
75 |
+
from litellm import completion
|
76 |
+
import os
|
77 |
+
|
78 |
+
os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"
|
79 |
+
|
80 |
+
# e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
|
81 |
+
response = completion(
|
82 |
+
model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
|
83 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
84 |
+
api_base="https://my-endpoint.huggingface.cloud"
|
85 |
+
)
|
86 |
+
|
87 |
+
print(response)
|
88 |
+
```
|
89 |
+
|
90 |
+
</TabItem>
|
91 |
+
|
92 |
+
<TabItem value="azure" label="Azure OpenAI">
|
93 |
+
|
94 |
+
```python
|
95 |
+
from litellm import completion
|
96 |
+
import os
|
97 |
+
|
98 |
+
## set ENV variables
|
99 |
+
os.environ["AZURE_API_KEY"] = ""
|
100 |
+
os.environ["AZURE_API_BASE"] = ""
|
101 |
+
os.environ["AZURE_API_VERSION"] = ""
|
102 |
+
|
103 |
+
# azure call
|
104 |
+
response = completion(
|
105 |
+
"azure/<your_deployment_name>",
|
106 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}]
|
107 |
+
)
|
108 |
+
```
|
109 |
+
|
110 |
+
</TabItem>
|
111 |
+
|
112 |
+
|
113 |
+
<TabItem value="ollama" label="Ollama">
|
114 |
+
|
115 |
+
```python
|
116 |
+
from litellm import completion
|
117 |
+
|
118 |
+
response = completion(
|
119 |
+
model="ollama/llama2",
|
120 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}],
|
121 |
+
api_base="http://localhost:11434"
|
122 |
+
)
|
123 |
+
```
|
124 |
+
</TabItem>
|
125 |
+
<TabItem value="or" label="Openrouter">
|
126 |
+
|
127 |
+
```python
|
128 |
+
from litellm import completion
|
129 |
+
import os
|
130 |
+
|
131 |
+
## set ENV variables
|
132 |
+
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"
|
133 |
+
|
134 |
+
response = completion(
|
135 |
+
model="openrouter/google/palm-2-chat-bison",
|
136 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}],
|
137 |
+
)
|
138 |
+
```
|
139 |
+
</TabItem>
|
140 |
+
|
141 |
+
</Tabs>
|
142 |
+
|
143 |
+
## Streaming
|
144 |
+
Set `stream=True` in the `completion` args.
|
145 |
+
<Tabs>
|
146 |
+
<TabItem value="openai" label="OpenAI">
|
147 |
+
|
148 |
+
```python
|
149 |
+
from litellm import completion
|
150 |
+
import os
|
151 |
+
|
152 |
+
## set ENV variables
|
153 |
+
os.environ["OPENAI_API_KEY"] = "your-api-key"
|
154 |
+
|
155 |
+
response = completion(
|
156 |
+
model="gpt-3.5-turbo",
|
157 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
158 |
+
stream=True,
|
159 |
+
)
|
160 |
+
|
161 |
+
for chunk in response:
|
162 |
+
print(chunk)
|
163 |
+
```
|
164 |
+
|
165 |
+
</TabItem>
|
166 |
+
<TabItem value="anthropic" label="Anthropic">
|
167 |
+
|
168 |
+
```python
|
169 |
+
from litellm import completion
|
170 |
+
import os
|
171 |
+
|
172 |
+
## set ENV variables
|
173 |
+
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
|
174 |
+
|
175 |
+
response = completion(
|
176 |
+
model="claude-2",
|
177 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
178 |
+
stream=True,
|
179 |
+
)
|
180 |
+
|
181 |
+
for chunk in response:
|
182 |
+
print(chunk)
|
183 |
+
```
|
184 |
+
|
185 |
+
</TabItem>
|
186 |
+
|
187 |
+
<TabItem value="vertex" label="VertexAI">
|
188 |
+
|
189 |
+
```python
|
190 |
+
from litellm import completion
|
191 |
+
import os
|
192 |
+
|
193 |
+
# auth: run 'gcloud auth application-default'
|
194 |
+
os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
|
195 |
+
os.environ["VERTEX_LOCATION"] = "us-central1"
|
196 |
+
|
197 |
+
response = completion(
|
198 |
+
model="chat-bison",
|
199 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
200 |
+
stream=True,
|
201 |
+
)
|
202 |
+
|
203 |
+
for chunk in response:
|
204 |
+
print(chunk)
|
205 |
+
```
|
206 |
+
|
207 |
+
</TabItem>
|
208 |
+
|
209 |
+
<TabItem value="hugging" label="HuggingFace">
|
210 |
+
|
211 |
+
```python
|
212 |
+
from litellm import completion
|
213 |
+
import os
|
214 |
+
|
215 |
+
os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"
|
216 |
+
|
217 |
+
# e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
|
218 |
+
response = completion(
|
219 |
+
model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
|
220 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}],
|
221 |
+
api_base="https://my-endpoint.huggingface.cloud",
|
222 |
+
stream=True,
|
223 |
+
)
|
224 |
+
|
225 |
+
|
226 |
+
for chunk in response:
|
227 |
+
print(chunk)
|
228 |
+
```
|
229 |
+
|
230 |
+
</TabItem>
|
231 |
+
|
232 |
+
<TabItem value="azure" label="Azure OpenAI">
|
233 |
+
|
234 |
+
```python
|
235 |
+
from litellm import completion
|
236 |
+
import os
|
237 |
+
|
238 |
+
## set ENV variables
|
239 |
+
os.environ["AZURE_API_KEY"] = ""
|
240 |
+
os.environ["AZURE_API_BASE"] = ""
|
241 |
+
os.environ["AZURE_API_VERSION"] = ""
|
242 |
+
|
243 |
+
# azure call
|
244 |
+
response = completion(
|
245 |
+
"azure/<your_deployment_name>",
|
246 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}],
|
247 |
+
stream=True,
|
248 |
+
)
|
249 |
+
|
250 |
+
for chunk in response:
|
251 |
+
print(chunk)
|
252 |
+
```
|
253 |
+
|
254 |
+
</TabItem>
|
255 |
+
|
256 |
+
|
257 |
+
<TabItem value="ollama" label="Ollama">
|
258 |
+
|
259 |
+
```python
|
260 |
+
from litellm import completion
|
261 |
+
|
262 |
+
response = completion(
|
263 |
+
model="ollama/llama2",
|
264 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}],
|
265 |
+
api_base="http://localhost:11434",
|
266 |
+
stream=True,
|
267 |
+
)
|
268 |
+
|
269 |
+
for chunk in response:
|
270 |
+
print(chunk)
|
271 |
+
```
|
272 |
+
</TabItem>
|
273 |
+
<TabItem value="or" label="Openrouter">
|
274 |
+
|
275 |
+
```python
|
276 |
+
from litellm import completion
|
277 |
+
import os
|
278 |
+
|
279 |
+
## set ENV variables
|
280 |
+
os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"
|
281 |
+
|
282 |
+
response = completion(
|
283 |
+
model="openrouter/google/palm-2-chat-bison",
|
284 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}],
|
285 |
+
stream=True,
|
286 |
+
)
|
287 |
+
|
288 |
+
for chunk in response:
|
289 |
+
print(chunk)
|
290 |
+
```
|
291 |
+
</TabItem>
|
292 |
+
|
293 |
+
</Tabs>
|
294 |
+
|
295 |
+
## Exception handling
|
296 |
+
|
297 |
+
LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that, should work out of the box with LiteLLM.
|
298 |
+
|
299 |
+
```python
|
300 |
+
import os
from openai.error import OpenAIError
|
301 |
+
from litellm import completion
|
302 |
+
|
303 |
+
os.environ["ANTHROPIC_API_KEY"] = "bad-key"
|
304 |
+
try:
|
305 |
+
# some code
|
306 |
+
completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
|
307 |
+
except OpenAIError as e:
|
308 |
+
print(e)
|
309 |
+
```
|
310 |
+
|
311 |
+
## Logging Observability - Log LLM Input/Output ([Docs](https://docs.litellm.ai/docs/observability/callbacks))
|
312 |
+
LiteLLM exposes pre defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, Slack
|
313 |
+
```python
|
314 |
+
import litellm
import os
from litellm import completion
|
315 |
+
|
316 |
+
## set env variables for logging tools
|
317 |
+
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
|
318 |
+
os.environ["LANGFUSE_SECRET_KEY"] = ""
|
319 |
+
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
|
320 |
+
|
321 |
+
os.environ["OPENAI_API_KEY"]
|
322 |
+
|
323 |
+
# set callbacks
|
324 |
+
litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase
|
325 |
+
|
326 |
+
#openai call
|
327 |
+
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
|
328 |
+
```
|
329 |
+
|
330 |
+
## Calculate Costs, Usage, Latency
|
331 |
+
|
332 |
+
Pass the completion response to `litellm.completion_cost(completion_response=response)` and get the cost
|
333 |
+
|
334 |
+
```python
|
335 |
+
from litellm import completion, completion_cost
|
336 |
+
import os
|
337 |
+
os.environ["OPENAI_API_KEY"] = "your-api-key"
|
338 |
+
|
339 |
+
response = completion(
|
340 |
+
model="gpt-3.5-turbo",
|
341 |
+
messages=[{ "content": "Hello, how are you?","role": "user"}]
|
342 |
+
)
|
343 |
+
|
344 |
+
cost = completion_cost(completion_response=response)
|
345 |
+
print("Cost for completion call with gpt-3.5-turbo: ", f"${float(cost):.10f}")
|
346 |
+
```
|
347 |
+
|
348 |
+
**Output**
|
349 |
+
```shell
|
350 |
+
Cost for completion call with gpt-3.5-turbo: $0.0000775000
|
351 |
+
```
|
352 |
+
|
353 |
+
### Track Costs, Usage, Latency for streaming
|
354 |
+
We use a custom callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback
|
355 |
+
- We define a callback function to calculate cost `def track_cost_callback()`
|
356 |
+
- In `def track_cost_callback()` we check if the stream is complete - `if "complete_streaming_response" in kwargs`
|
357 |
+
- Use `litellm.completion_cost()` to calculate cost, once the stream is complete
|
358 |
+
|
359 |
+
```python
|
360 |
+
import litellm
from litellm import completion
|
361 |
+
|
362 |
+
# track_cost_callback
|
363 |
+
def track_cost_callback(
|
364 |
+
kwargs, # kwargs to completion
|
365 |
+
completion_response, # response from completion
|
366 |
+
start_time, end_time # start/end time
|
367 |
+
):
|
368 |
+
try:
|
369 |
+
# check if it has collected an entire stream response
|
370 |
+
if "complete_streaming_response" in kwargs:
|
371 |
+
# for tracking streaming cost we pass the "messages" and the output_text to litellm.completion_cost
|
372 |
+
completion_response=kwargs["complete_streaming_response"]
|
373 |
+
input_text = kwargs["messages"]
|
374 |
+
output_text = completion_response["choices"][0]["message"]["content"]
|
375 |
+
response_cost = litellm.completion_cost(
|
376 |
+
model = kwargs["model"],
|
377 |
+
messages = input_text,
|
378 |
+
completion=output_text
|
379 |
+
)
|
380 |
+
print("streaming response_cost", response_cost)
|
381 |
+
except:
|
382 |
+
pass
|
383 |
+
# set callback
|
384 |
+
litellm.success_callback = [track_cost_callback] # set custom callback function
|
385 |
+
|
386 |
+
# litellm.completion() call
|
387 |
+
response = completion(
|
388 |
+
model="gpt-3.5-turbo",
|
389 |
+
messages=[
|
390 |
+
{
|
391 |
+
"role": "user",
|
392 |
+
"content": "Hi π - i'm openai"
|
393 |
+
}
|
394 |
+
],
|
395 |
+
stream=True
|
396 |
+
)
|
397 |
+
```
|
398 |
+
|
399 |
+
## More details
|
400 |
+
* [exception mapping](./exception_mapping.md)
|
401 |
+
* [retries + model fallbacks for completion()](./completion/reliable_completions.md)
|
402 |
+
* [tutorial for model fallbacks with completion()](./tutorials/fallbacks.md)
|
docs/my-website/docs/langchain/langchain.md
ADDED
@@ -0,0 +1,135 @@
1 |
+
import Tabs from '@theme/Tabs';
|
2 |
+
import TabItem from '@theme/TabItem';
|
3 |
+
|
4 |
+
# Using ChatLiteLLM() - Langchain
|
5 |
+
|
6 |
+
## Pre-Requisites
|
7 |
+
```shell
|
8 |
+
pip install litellm langchain
|
9 |
+
```
|
10 |
+
## Quick Start
|
11 |
+
|
12 |
+
<Tabs>
|
13 |
+
<TabItem value="openai" label="OpenAI">
|
14 |
+
|
15 |
+
```python
|
16 |
+
import os
|
17 |
+
from langchain.chat_models import ChatLiteLLM
|
18 |
+
from langchain.prompts.chat import (
|
19 |
+
ChatPromptTemplate,
|
20 |
+
SystemMessagePromptTemplate,
|
21 |
+
AIMessagePromptTemplate,
|
22 |
+
HumanMessagePromptTemplate,
|
23 |
+
)
|
24 |
+
from langchain.schema import AIMessage, HumanMessage, SystemMessage
|
25 |
+
|
26 |
+
os.environ['OPENAI_API_KEY'] = ""
|
27 |
+
chat = ChatLiteLLM(model="gpt-3.5-turbo")
|
28 |
+
messages = [
|
29 |
+
HumanMessage(
|
30 |
+
content="what model are you"
|
31 |
+
)
|
32 |
+
]
|
33 |
+
chat(messages)
|
34 |
+
```
|
35 |
+
|
36 |
+
</TabItem>
|
37 |
+
|
38 |
+
<TabItem value="anthropic" label="Anthropic">
|
39 |
+
|
40 |
+
```python
|
41 |
+
import os
|
42 |
+
from langchain.chat_models import ChatLiteLLM
|
43 |
+
from langchain.prompts.chat import (
|
44 |
+
ChatPromptTemplate,
|
45 |
+
SystemMessagePromptTemplate,
|
46 |
+
AIMessagePromptTemplate,
|
47 |
+
HumanMessagePromptTemplate,
|
48 |
+
)
|
49 |
+
from langchain.schema import AIMessage, HumanMessage, SystemMessage
|
50 |
+
|
51 |
+
os.environ['ANTHROPIC_API_KEY'] = ""
|
52 |
+
chat = ChatLiteLLM(model="claude-2", temperature=0.3)
|
53 |
+
messages = [
|
54 |
+
HumanMessage(
|
55 |
+
content="what model are you"
|
56 |
+
)
|
57 |
+
]
|
58 |
+
chat(messages)
|
59 |
+
```
|
60 |
+
|
61 |
+
</TabItem>
|
62 |
+
|
63 |
+
<TabItem value="replicate" label="Replicate">
|
64 |
+
|
65 |
+
```python
|
66 |
+
import os
|
67 |
+
from langchain.chat_models import ChatLiteLLM
|
68 |
+
from langchain.prompts.chat import (
|
69 |
+
ChatPromptTemplate,
|
70 |
+
SystemMessagePromptTemplate,
|
71 |
+
AIMessagePromptTemplate,
|
72 |
+
HumanMessagePromptTemplate,
|
73 |
+
)
|
74 |
+
from langchain.schema import AIMessage, HumanMessage, SystemMessage
|
75 |
+
|
76 |
+
os.environ['REPLICATE_API_TOKEN'] = ""
|
77 |
+
chat = ChatLiteLLM(model="replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1")
|
78 |
+
messages = [
|
79 |
+
HumanMessage(
|
80 |
+
content="what model are you?"
|
81 |
+
)
|
82 |
+
]
|
83 |
+
chat(messages)
|
84 |
+
```
|
85 |
+
|
86 |
+
</TabItem>
|
87 |
+
|
88 |
+
<TabItem value="cohere" label="Cohere">
|
89 |
+
|
90 |
+
```python
|
91 |
+
import os
|
92 |
+
from langchain.chat_models import ChatLiteLLM
|
93 |
+
from langchain.prompts.chat import (
|
94 |
+
ChatPromptTemplate,
|
95 |
+
SystemMessagePromptTemplate,
|
96 |
+
AIMessagePromptTemplate,
|
97 |
+
HumanMessagePromptTemplate,
|
98 |
+
)
|
99 |
+
from langchain.schema import AIMessage, HumanMessage, SystemMessage
|
100 |
+
|
101 |
+
os.environ['COHERE_API_KEY'] = ""
|
102 |
+
chat = ChatLiteLLM(model="command-nightly")
|
103 |
+
messages = [
|
104 |
+
HumanMessage(
|
105 |
+
content="what model are you?"
|
106 |
+
)
|
107 |
+
]
|
108 |
+
chat(messages)
|
109 |
+
```
|
110 |
+
|
111 |
+
</TabItem>
|
112 |
+
<TabItem value="palm" label="PaLM - Google">
|
113 |
+
|
114 |
+
```python
|
115 |
+
import os
|
116 |
+
from langchain.chat_models import ChatLiteLLM
|
117 |
+
from langchain.prompts.chat import (
|
118 |
+
ChatPromptTemplate,
|
119 |
+
SystemMessagePromptTemplate,
|
120 |
+
AIMessagePromptTemplate,
|
121 |
+
HumanMessagePromptTemplate,
|
122 |
+
)
|
123 |
+
from langchain.schema import AIMessage, HumanMessage, SystemMessage
|
124 |
+
|
125 |
+
os.environ['PALM_API_KEY'] = ""
|
126 |
+
chat = ChatLiteLLM(model="palm/chat-bison")
|
127 |
+
messages = [
|
128 |
+
HumanMessage(
|
129 |
+
content="what model are you?"
|
130 |
+
)
|
131 |
+
]
|
132 |
+
chat(messages)
|
133 |
+
```
|
134 |
+
</TabItem>
|
135 |
+
</Tabs>
|
docs/my-website/docs/migration.md
ADDED
@@ -0,0 +1,35 @@
1 |
+
# Migration Guide - LiteLLM v1.0.0+
|
2 |
+
|
3 |
+
When we have breaking changes (i.e. going from 1.x.x to 2.x.x), we will document those changes here.
|
4 |
+
|
5 |
+
|
6 |
+
## `1.0.0`
|
7 |
+
|
8 |
+
**Last Release before breaking change**: 0.14.0
|
9 |
+
|
10 |
+
**What changed?**
|
11 |
+
|
12 |
+
- Requires `openai>=1.0.0`
|
13 |
+
- `openai.InvalidRequestError` → `openai.BadRequestError` (see the sketch after this list)
|
14 |
+
- `openai.ServiceUnavailableError` → `openai.APIStatusError`
|
15 |
+
- *NEW* litellm client, allow users to pass api_key
|
16 |
+
- `litellm.Litellm(api_key="sk-123")`
|
17 |
+
- response objects now inherit from `BaseModel` (prev. `OpenAIObject`)
|
18 |
+
- *NEW* default exception - `APIConnectionError` (prev. `APIError`)
|
19 |
+
- litellm.get_max_tokens() now returns an int not a dict
|
20 |
+
```python
|
21 |
+
max_tokens = litellm.get_max_tokens("gpt-3.5-turbo") # returns an int not a dict
|
22 |
+
assert max_tokens==4097
|
23 |
+
```
|
24 |
+
- Streaming - OpenAI Chunks now return `None` for empty stream chunks. This is how to process stream chunks with content
|
25 |
+
```python
|
26 |
+
response = litellm.completion(model="gpt-3.5-turbo", messages=messages, stream=True)
|
27 |
+
for part in response:
|
28 |
+
print(part.choices[0].delta.content or "")
|
29 |
+
```
|
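Putting the renamed exceptions together, a minimal error-handling sketch against `1.0.0+` (the bad key is a placeholder and the extra `except` clauses are illustrative - not part of the original migration notes):

```python
import os
import openai
import litellm

os.environ["OPENAI_API_KEY"] = "bad-key"  # placeholder key so the call fails

try:
    litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )
except openai.BadRequestError as e:
    # replaces the old openai.InvalidRequestError
    print("bad request:", e)
except openai.APIConnectionError as e:
    # the new default exception (previously APIError)
    print("connection error:", e)
except openai.OpenAIError as e:
    # catch-all for the other OpenAI-style exceptions litellm maps to
    print("OpenAI-style error:", e)
```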
30 |
+
|
31 |
+
**How can we communicate changes better?**
|
32 |
+
Tell us
|
33 |
+
- [Discord](https://discord.com/invite/wuPM9dRgDw)
|
34 |
+
- Email ([email protected]/[email protected])
|
35 |
+
- Text us (+17708783106)
|
docs/my-website/docs/observability/callbacks.md
ADDED
@@ -0,0 +1,35 @@
1 |
+
# Callbacks
|
2 |
+
|
3 |
+
## Use Callbacks to send Output Data to Posthog, Sentry etc
|
4 |
+
|
5 |
+
liteLLM provides `input_callback`, `success_callback` and `failure_callback`, making it easy for you to send data to a particular provider depending on the status of your responses.
|
6 |
+
|
7 |
+
liteLLM supports:
|
8 |
+
|
9 |
+
- [Custom Callback Functions](https://docs.litellm.ai/docs/observability/custom_callback)
|
10 |
+
- [LLMonitor](https://llmonitor.com/docs)
|
11 |
+
- [Helicone](https://docs.helicone.ai/introduction)
|
12 |
+
- [Traceloop](https://traceloop.com/docs)
|
13 |
+
- [Sentry](https://docs.sentry.io/platforms/python/)
|
14 |
+
- [PostHog](https://posthog.com/docs/libraries/python)
|
15 |
+
- [Slack](https://slack.dev/bolt-python/concepts)
|
16 |
+
|
17 |
+
### Quick Start
|
18 |
+
|
19 |
+
```python
|
20 |
+
import os

import litellm
from litellm import completion

messages = [{"role": "user", "content": "Hello, how are you?"}]
|
21 |
+
|
22 |
+
# set callbacks
|
23 |
+
litellm.input_callback=["sentry"] # for sentry breadcrumbing - logs the input being sent to the api
|
24 |
+
litellm.success_callback=["posthog", "helicone", "llmonitor"]
|
25 |
+
litellm.failure_callback=["sentry", "llmonitor"]
|
26 |
+
|
27 |
+
## set env variables
|
28 |
+
os.environ['SENTRY_DSN'], os.environ['SENTRY_API_TRACE_RATE'] = "", ""
|
29 |
+
os.environ['POSTHOG_API_KEY'], os.environ['POSTHOG_API_URL'] = "api-key", "api-url"
|
30 |
+
os.environ["HELICONE_API_KEY"] = ""
|
31 |
+
os.environ["TRACELOOP_API_KEY"] = ""
|
32 |
+
os.environ["LLMONITOR_APP_ID"] = ""
|
33 |
+
|
34 |
+
response = completion(model="gpt-3.5-turbo", messages=messages)
|
35 |
+
```
|
docs/my-website/docs/observability/custom_callback.md
ADDED
@@ -0,0 +1,358 @@
1 |
+
# Custom Callbacks
|
2 |
+
|
3 |
+
## Callback Class
|
4 |
+
You can create a custom callback class to precisely log events as they occur in litellm.
|
5 |
+
|
6 |
+
```python
|
7 |
+
import litellm
from litellm import completion
from litellm.integrations.custom_logger import CustomLogger
|
8 |
+
|
9 |
+
class MyCustomHandler(CustomLogger):
|
10 |
+
def log_pre_api_call(self, model, messages, kwargs):
|
11 |
+
print(f"Pre-API Call")
|
12 |
+
|
13 |
+
def log_post_api_call(self, kwargs, response_obj, start_time, end_time):
|
14 |
+
print(f"Post-API Call")
|
15 |
+
|
16 |
+
def log_stream_event(self, kwargs, response_obj, start_time, end_time):
|
17 |
+
print(f"On Stream")
|
18 |
+
|
19 |
+
def log_success_event(self, kwargs, response_obj, start_time, end_time):
|
20 |
+
print(f"On Success")
|
21 |
+
|
22 |
+
def log_failure_event(self, kwargs, response_obj, start_time, end_time):
|
23 |
+
print(f"On Failure")
|
24 |
+
|
25 |
+
customHandler = MyCustomHandler()
|
26 |
+
|
27 |
+
litellm.callbacks = [customHandler]
|
28 |
+
response = completion(model="gpt-3.5-turbo", messages=[{ "role": "user", "content": "Hi π - i'm openai"}],
|
29 |
+
stream=True)
|
30 |
+
for chunk in response:
|
31 |
+
continue
|
32 |
+
```
|
33 |
+
|
34 |
+
## Callback Functions
|
35 |
+
If you just want to log on a specific event (e.g. on input) - you can use callback functions.
|
36 |
+
|
37 |
+
You can set custom callbacks to trigger for:
|
38 |
+
- `litellm.input_callback` - Track inputs/transformed inputs before making the LLM API call
|
39 |
+
- `litellm.success_callback` - Track inputs/outputs after making LLM API call
|
40 |
+
- `litellm.failure_callback` - Track inputs/outputs + exceptions for litellm calls
|
41 |
+
|
42 |
+
## Defining a Custom Callback Function
|
43 |
+
Create a custom callback function that takes specific arguments:
|
44 |
+
|
45 |
+
```python
|
46 |
+
def custom_callback(
|
47 |
+
kwargs, # kwargs to completion
|
48 |
+
completion_response, # response from completion
|
49 |
+
start_time, end_time # start/end time
|
50 |
+
):
|
51 |
+
# Your custom code here
|
52 |
+
print("LITELLM: in custom callback function")
|
53 |
+
print("kwargs", kwargs)
|
54 |
+
print("completion_response", completion_response)
|
55 |
+
print("start_time", start_time)
|
56 |
+
print("end_time", end_time)
|
57 |
+
```
|
58 |
+
|
59 |
+
### Setting the custom callback function
|
60 |
+
```python
|
61 |
+
import litellm
|
62 |
+
litellm.success_callback = [custom_callback]
|
63 |
+
```
|
64 |
+
|
65 |
+
## Using Your Custom Callback Function
|
66 |
+
|
67 |
+
```python
|
68 |
+
import litellm
|
69 |
+
from litellm import completion
|
70 |
+
|
71 |
+
# Assign the custom callback function
|
72 |
+
litellm.success_callback = [custom_callback]
|
73 |
+
|
74 |
+
response = completion(
|
75 |
+
model="gpt-3.5-turbo",
|
76 |
+
messages=[
|
77 |
+
{
|
78 |
+
"role": "user",
|
79 |
+
"content": "Hi π - i'm openai"
|
80 |
+
}
|
81 |
+
]
|
82 |
+
)
|
83 |
+
|
84 |
+
print(response)
|
85 |
+
|
86 |
+
```
|
87 |
+
|
88 |
+
## Async Callback Functions
|
89 |
+
|
90 |
+
LiteLLM currently supports just async success callback functions for async completion/embedding calls.
|
91 |
+
|
92 |
+
```python
|
93 |
+
import asyncio, litellm
|
94 |
+
|
95 |
+
async def async_test_logging_fn(kwargs, completion_obj, start_time, end_time):
|
96 |
+
print(f"On Async Success!")
|
97 |
+
|
98 |
+
async def test_chat_openai():
|
99 |
+
try:
|
100 |
+
# litellm.set_verbose = True
|
101 |
+
litellm.success_callback = [async_test_logging_fn]
|
102 |
+
response = await litellm.acompletion(model="gpt-3.5-turbo",
|
103 |
+
messages=[{
|
104 |
+
"role": "user",
|
105 |
+
"content": "Hi π - i'm openai"
|
106 |
+
}],
|
107 |
+
stream=True)
|
108 |
+
async for chunk in response:
|
109 |
+
continue
|
110 |
+
except Exception as e:
|
111 |
+
print(e)
|
112 |
+
pytest.fail(f"An error occurred - {str(e)}")
|
113 |
+
|
114 |
+
asyncio.run(test_chat_openai())
|
115 |
+
```
|
116 |
+
|
117 |
+
:::info
|
118 |
+
|
119 |
+
We're actively trying to expand this to other event types. [Tell us if you need this!](https://github.com/BerriAI/litellm/issues/1007)
|
120 |
+
|
121 |
+
|
122 |
+
|
123 |
+
:::
|
124 |
+
|
125 |
+
## What's in kwargs?
|
126 |
+
|
127 |
+
Notice we pass in a kwargs argument to custom callback.
|
128 |
+
```python
|
129 |
+
def custom_callback(
|
130 |
+
kwargs, # kwargs to completion
|
131 |
+
completion_response, # response from completion
|
132 |
+
start_time, end_time # start/end time
|
133 |
+
):
|
134 |
+
# Your custom code here
|
135 |
+
print("LITELLM: in custom callback function")
|
136 |
+
print("kwargs", kwargs)
|
137 |
+
print("completion_response", completion_response)
|
138 |
+
print("start_time", start_time)
|
139 |
+
print("end_time", end_time)
|
140 |
+
```
|
141 |
+
|
142 |
+
This is a dictionary containing all the model-call details (the params we receive, the values we send to the http endpoint, the response we receive, stacktrace in case of errors, etc.).
|
143 |
+
|
144 |
+
This is all logged in the [model_call_details via our Logger](https://github.com/BerriAI/litellm/blob/fc757dc1b47d2eb9d0ea47d6ad224955b705059d/litellm/utils.py#L246).
|
145 |
+
|
146 |
+
Here's exactly what you can expect in the kwargs dictionary:
|
147 |
+
```shell
|
148 |
+
### DEFAULT PARAMS ###
|
149 |
+
"model": self.model,
|
150 |
+
"messages": self.messages,
|
151 |
+
"optional_params": self.optional_params, # model-specific params passed in
|
152 |
+
"litellm_params": self.litellm_params, # litellm-specific params passed in (e.g. metadata passed to completion call)
|
153 |
+
"start_time": self.start_time, # datetime object of when call was started
|
154 |
+
|
155 |
+
### PRE-API CALL PARAMS ### (check via kwargs["log_event_type"]="pre_api_call")
|
156 |
+
"input" = input # the exact prompt sent to the LLM API
|
157 |
+
"api_key" = api_key # the api key used for that LLM API
|
158 |
+
"additional_args" = additional_args # any additional details for that API call (e.g. contains optional params sent)
|
159 |
+
|
160 |
+
### POST-API CALL PARAMS ### (check via kwargs["log_event_type"]="post_api_call")
|
161 |
+
"original_response" = original_response # the original http response received (saved via response.text)
|
162 |
+
|
163 |
+
### ON-SUCCESS PARAMS ### (check via kwargs["log_event_type"]="successful_api_call")
|
164 |
+
"complete_streaming_response" = complete_streaming_response # the complete streamed response (only set if `completion(..stream=True)`)
|
165 |
+
"end_time" = end_time # datetime object of when call was completed
|
166 |
+
|
167 |
+
### ON-FAILURE PARAMS ### (check via kwargs["log_event_type"]="failed_api_call")
|
168 |
+
"exception" = exception # the Exception raised
|
169 |
+
"traceback_exception" = traceback_exception # the traceback generated via `traceback.format_exc()`
|
170 |
+
"end_time" = end_time # datetime object of when call was completed
|
171 |
+
```
|
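For example, a callback can branch on `kwargs["log_event_type"]` to tell these stages apart (a minimal sketch based on the field list above; the printed labels are just illustrative):

```python
def stage_aware_callback(kwargs, completion_response, start_time, end_time):
    # "log_event_type" distinguishes the stages documented above
    event_type = kwargs.get("log_event_type")
    if event_type == "pre_api_call":
        print("about to call the LLM:", kwargs.get("input"))
    elif event_type == "post_api_call":
        print("raw provider response received")
    elif event_type == "successful_api_call":
        print("success for model:", kwargs.get("model"))
    elif event_type == "failed_api_call":
        print("failure:", kwargs.get("exception"))
```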
172 |
+
|
173 |
+
### Get complete streaming response
|
174 |
+
|
175 |
+
LiteLLM will pass you the complete streaming response in the final streaming chunk as part of the kwargs for your custom callback function.
|
176 |
+
|
177 |
+
```python
|
178 |
+
# litellm.set_verbose = False
|
179 |
+
def custom_callback(
|
180 |
+
kwargs, # kwargs to completion
|
181 |
+
completion_response, # response from completion
|
182 |
+
start_time, end_time # start/end time
|
183 |
+
):
|
184 |
+
# print(f"streaming response: {completion_response}")
|
185 |
+
if "complete_streaming_response" in kwargs:
|
186 |
+
print(f"Complete Streaming Response: {kwargs['complete_streaming_response']}")
|
187 |
+
|
188 |
+
# Assign the custom callback function
|
189 |
+
litellm.success_callback = [custom_callback]
|
190 |
+
|
191 |
+
response = completion(model="claude-instant-1", messages=messages, stream=True)
|
192 |
+
for idx, chunk in enumerate(response):
|
193 |
+
pass
|
194 |
+
```
|
195 |
+
|
196 |
+
|
197 |
+
### Log additional metadata
|
198 |
+
|
199 |
+
LiteLLM accepts a metadata dictionary in the completion call. You can pass additional metadata into your completion call via `completion(..., metadata={"key": "value"})`.
|
200 |
+
|
201 |
+
Since this is a [litellm-specific param](https://github.com/BerriAI/litellm/blob/b6a015404eed8a0fa701e98f4581604629300ee3/litellm/main.py#L235), it's accessible via kwargs["litellm_params"]
|
202 |
+
|
203 |
+
```python
|
204 |
+
from litellm import completion
|
205 |
+
import os, litellm
|
206 |
+
|
207 |
+
## set ENV variables
|
208 |
+
os.environ["OPENAI_API_KEY"] = "your-api-key"
|
209 |
+
|
210 |
+
messages = [{ "content": "Hello, how are you?","role": "user"}]
|
211 |
+
|
212 |
+
def custom_callback(
|
213 |
+
kwargs, # kwargs to completion
|
214 |
+
completion_response, # response from completion
|
215 |
+
start_time, end_time # start/end time
|
216 |
+
):
|
217 |
+
print(kwargs["litellm_params"]["metadata"])
|
218 |
+
|
219 |
+
|
220 |
+
# Assign the custom callback function
|
221 |
+
litellm.success_callback = [custom_callback]
|
222 |
+
|
223 |
+
response = litellm.completion(model="gpt-3.5-turbo", messages=messages, metadata={"hello": "world"})
|
224 |
+
```
|
225 |
+
|
226 |
+
## Examples
|
227 |
+
|
228 |
+
### Custom Callback to track costs for Streaming + Non-Streaming
|
229 |
+
```python
|
230 |
+
|
231 |
+
import logging

import litellm
from litellm import completion

def track_cost_callback(
|
232 |
+
kwargs, # kwargs to completion
|
233 |
+
completion_response, # response from completion
|
234 |
+
start_time, end_time # start/end time
|
235 |
+
):
|
236 |
+
try:
|
237 |
+
# init logging config
|
238 |
+
logging.basicConfig(
|
239 |
+
filename='cost.log',
|
240 |
+
level=logging.INFO,
|
241 |
+
format='%(asctime)s - %(message)s',
|
242 |
+
datefmt='%Y-%m-%d %H:%M:%S'
|
243 |
+
)
|
244 |
+
|
245 |
+
# check if it has collected an entire stream response
|
246 |
+
if "complete_streaming_response" in kwargs:
|
247 |
+
# for tracking streaming cost we pass the "messages" and the output_text to litellm.completion_cost
|
248 |
+
completion_response=kwargs["complete_streaming_response"]
|
249 |
+
input_text = kwargs["messages"]
|
250 |
+
output_text = completion_response["choices"][0]["message"]["content"]
|
251 |
+
response_cost = litellm.completion_cost(
|
252 |
+
model = kwargs["model"],
|
253 |
+
messages = input_text,
|
254 |
+
completion=output_text
|
255 |
+
)
|
256 |
+
print("streaming response_cost", response_cost)
|
257 |
+
logging.info(f"Model {kwargs['model']} Cost: ${response_cost:.8f}")
|
258 |
+
|
259 |
+
# for non streaming responses
|
260 |
+
else:
|
261 |
+
# we pass the completion_response obj
|
262 |
+
if kwargs["stream"] != True:
|
263 |
+
response_cost = litellm.completion_cost(completion_response=completion_response)
|
264 |
+
print("regular response_cost", response_cost)
|
265 |
+
logging.info(f"Model {completion_response.model} Cost: ${response_cost:.8f}")
|
266 |
+
except:
|
267 |
+
pass
|
268 |
+
|
269 |
+
# Assign the custom callback function
|
270 |
+
litellm.success_callback = [track_cost_callback]
|
271 |
+
|
272 |
+
response = completion(
|
273 |
+
model="gpt-3.5-turbo",
|
274 |
+
messages=[
|
275 |
+
{
|
276 |
+
"role": "user",
|
277 |
+
"content": "Hi π - i'm openai"
|
278 |
+
}
|
279 |
+
]
|
280 |
+
)
|
281 |
+
|
282 |
+
print(response)
|
283 |
+
```
|
284 |
+
|
285 |
+
### Custom Callback to log transformed Input to LLMs
|
286 |
+
```python
|
287 |
+
import litellm
from litellm import completion

def get_transformed_inputs(
|
288 |
+
kwargs,
|
289 |
+
):
|
290 |
+
params_to_model = kwargs["additional_args"]["complete_input_dict"]
|
291 |
+
print("params to model", params_to_model)
|
292 |
+
|
293 |
+
litellm.input_callback = [get_transformed_inputs]
|
294 |
+
|
295 |
+
def test_chat_openai():
|
296 |
+
try:
|
297 |
+
response = completion(model="claude-2",
|
298 |
+
messages=[{
|
299 |
+
"role": "user",
|
300 |
+
"content": "Hi π - i'm openai"
|
301 |
+
}])
|
302 |
+
|
303 |
+
print(response)
|
304 |
+
|
305 |
+
except Exception as e:
|
306 |
+
print(e)
|
307 |
+
pass
|
308 |
+
```
|
309 |
+
|
310 |
+
#### Output
|
311 |
+
```shell
|
312 |
+
params to model {'model': 'claude-2', 'prompt': "\n\nHuman: Hi π - i'm openai\n\nAssistant: ", 'max_tokens_to_sample': 256}
|
313 |
+
```
|
314 |
+
|
315 |
+
### Custom Callback to write to Mixpanel
|
316 |
+
|
317 |
+
```python
|
318 |
+
from mixpanel import Mixpanel
|
319 |
+
import litellm
|
320 |
+
from litellm import completion
|
321 |
+
|
322 |
+
mp = Mixpanel("your-mixpanel-project-token")  # placeholder project token

def custom_callback(
|
323 |
+
kwargs, # kwargs to completion
|
324 |
+
completion_response, # response from completion
|
325 |
+
start_time, end_time # start/end time
|
326 |
+
):
|
327 |
+
# Your custom code here
|
328 |
+
mp.track("litellm-user", "LLM Response", {"llm_response": str(completion_response)})  # "litellm-user" is a placeholder distinct_id
|
329 |
+
|
330 |
+
|
331 |
+
# Assign the custom callback function
|
332 |
+
litellm.success_callback = [custom_callback]
|
333 |
+
|
334 |
+
response = completion(
|
335 |
+
model="gpt-3.5-turbo",
|
336 |
+
messages=[
|
337 |
+
{
|
338 |
+
"role": "user",
|
339 |
+
"content": "Hi π - i'm openai"
|
340 |
+
}
|
341 |
+
]
|
342 |
+
)
|
343 |
+
|
344 |
+
print(response)
|
345 |
+
|
346 |
+
```
|
347 |
+
|
348 |
+
|
349 |
+
|
350 |
+
|
351 |
+
|
352 |
+
|
353 |
+
|
354 |
+
|
355 |
+
|
356 |
+
|
357 |
+
|
358 |
+
|
docs/my-website/docs/observability/helicone_integration.md
ADDED
@@ -0,0 +1,55 @@
1 |
+
# Helicone Tutorial
|
2 |
+
[Helicone](https://helicone.ai/) is an open-source observability platform that proxies your OpenAI traffic and provides you with key insights into your spend, latency, and usage.
|
3 |
+
|
4 |
+
## Use Helicone to log requests across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM)
|
5 |
+
liteLLM provides `success_callback` and `failure_callback`, making it easy for you to send data to a particular provider depending on the status of your responses.
|
6 |
+
|
7 |
+
In this case, we want to log requests to Helicone when a request succeeds.
|
8 |
+
|
9 |
+
### Approach 1: Use Callbacks
|
10 |
+
Use just 1 line of code, to instantly log your responses **across all providers** with helicone:
|
11 |
+
```python
|
12 |
+
litellm.success_callback=["helicone"]
|
13 |
+
```
|
14 |
+
|
15 |
+
Complete code
|
16 |
+
```python
|
17 |
+
import os

import litellm
from litellm import completion
|
18 |
+
|
19 |
+
## set env variables
|
20 |
+
os.environ["HELICONE_API_KEY"] = "your-helicone-key"
|
21 |
+
os.environ["OPENAI_API_KEY"], os.environ["COHERE_API_KEY"] = "", ""
|
22 |
+
|
23 |
+
# set callbacks
|
24 |
+
litellm.success_callback=["helicone"]
|
25 |
+
|
26 |
+
#openai call
|
27 |
+
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi π - i'm openai"}])
|
28 |
+
|
29 |
+
#cohere call
|
30 |
+
response = completion(model="command-nightly", messages=[{"role": "user", "content": "Hi π - i'm cohere"}])
|
31 |
+
```
|
32 |
+
|
33 |
+
### Approach 2: [OpenAI + Azure only] Use Helicone as a proxy
|
34 |
+
Helicone provides advanced functionality like caching, etc. Helicone currently supports this for Azure and OpenAI.
|
35 |
+
|
36 |
+
If you want to use Helicone to proxy your OpenAI/Azure requests, then you can -
|
37 |
+
|
38 |
+
- Set helicone as your base url via: `litellm.api_base`
|
39 |
+
- Pass in helicone request headers via: `litellm.headers`
|
40 |
+
|
41 |
+
Complete Code
|
42 |
+
```python
|
43 |
+
import litellm
|
44 |
+
from litellm import completion
|
45 |
+
|
46 |
+
litellm.api_base = "https://oai.hconeai.com/v1"
|
47 |
+
litellm.headers = {"Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"}
|
48 |
+
|
49 |
+
response = litellm.completion(
|
50 |
+
model="gpt-3.5-turbo",
|
51 |
+
messages=[{"role": "user", "content": "how does a court case get to the Supreme Court?"}]
|
52 |
+
)
|
53 |
+
|
54 |
+
print(response)
|
55 |
+
```
|
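To turn on Helicone features like caching in this proxy setup, the switch is an extra request header. A sketch, assuming Helicone's `Helicone-Cache-Enabled` header (verify the exact header names in the Helicone docs):

```python
import os
import litellm

litellm.api_base = "https://oai.hconeai.com/v1"
litellm.headers = {
    "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
    "Helicone-Cache-Enabled": "true",  # assumed header name - check Helicone docs
}

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "how does a court case get to the Supreme Court?"}]
)
print(response)
```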
docs/my-website/docs/observability/langfuse_integration.md
ADDED
@@ -0,0 +1,105 @@
1 |
+
import Image from '@theme/IdealImage';
|
2 |
+
|
3 |
+
# Langfuse - Logging LLM Input/Output
|
4 |
+
|
5 |
+
LangFuse is open-source observability & analytics for LLM apps.
|
6 |
+
It gives you detailed production traces and a granular view of quality, cost, and latency.
|
7 |
+
|
8 |
+
<Image img={require('../../img/langfuse.png')} />
|
9 |
+
|
10 |
+
:::info
|
11 |
+
We want to learn how we can make the callbacks better! Meet the LiteLLM [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
|
12 |
+
join our [discord](https://discord.gg/wuPM9dRgDw)
|
13 |
+
:::
|
14 |
+
|
15 |
+
## Pre-Requisites
|
16 |
+
Ensure you have run `pip install langfuse` for this integration
|
17 |
+
```shell
|
18 |
+
pip install langfuse litellm
|
19 |
+
```
|
20 |
+
|
21 |
+
## Quick Start
|
22 |
+
Use just 2 lines of code, to instantly log your responses **across all providers** with Langfuse
|
23 |
+
<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/logging_observability/LiteLLM_Langfuse.ipynb">
|
24 |
+
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
|
25 |
+
</a>
|
26 |
+
|
27 |
+
Get your Langfuse API Keys from https://cloud.langfuse.com/
|
28 |
+
```python
|
29 |
+
litellm.success_callback = ["langfuse"]
|
30 |
+
```
|
31 |
+
```python
|
32 |
+
# pip install langfuse
|
33 |
+
import litellm
|
34 |
+
import os
|
35 |
+
|
36 |
+
# from https://cloud.langfuse.com/
|
37 |
+
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
|
38 |
+
os.environ["LANGFUSE_SECRET_KEY"] = ""
|
39 |
+
# Optional, defaults to https://cloud.langfuse.com
|
40 |
+
os.environ["LANGFUSE_HOST"] # optional
|
41 |
+
|
42 |
+
# LLM API Keys
|
43 |
+
os.environ['OPENAI_API_KEY']=""
|
44 |
+
|
45 |
+
# set langfuse as a callback, litellm will send the data to langfuse
|
46 |
+
litellm.success_callback = ["langfuse"]
|
47 |
+
|
48 |
+
# openai call
|
49 |
+
response = litellm.completion(
|
50 |
+
model="gpt-3.5-turbo",
|
51 |
+
messages=[
|
52 |
+
{"role": "user", "content": "Hi π - i'm openai"}
|
53 |
+
]
|
54 |
+
)
|
55 |
+
```
|
56 |
+
|
57 |
+
## Advanced
|
58 |
+
### Set Custom Generation names, pass metadata
|
59 |
+
|
60 |
+
```python
|
61 |
+
import litellm
|
62 |
+
from litellm import completion
|
63 |
+
import os
|
64 |
+
|
65 |
+
# from https://cloud.langfuse.com/
|
66 |
+
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
|
67 |
+
os.environ["LANGFUSE_SECRET_KEY"] = ""
|
68 |
+
|
69 |
+
|
70 |
+
# OpenAI and Cohere keys
|
71 |
+
# You can use any of the litellm supported providers: https://docs.litellm.ai/docs/providers
|
72 |
+
os.environ['OPENAI_API_KEY']=""
|
73 |
+
|
74 |
+
# set langfuse as a callback, litellm will send the data to langfuse
|
75 |
+
litellm.success_callback = ["langfuse"]
|
76 |
+
|
77 |
+
# openai call
|
78 |
+
response = completion(
|
79 |
+
model="gpt-3.5-turbo",
|
80 |
+
messages=[
|
81 |
+
{"role": "user", "content": "Hi π - i'm openai"}
|
82 |
+
],
|
83 |
+
metadata = {
|
84 |
+
"generation_name": "litellm-ishaan-gen", # set langfuse generation name
|
85 |
+
# custom metadata fields
|
86 |
+
"project": "litellm-proxy"
|
87 |
+
}
|
88 |
+
)
|
89 |
+
|
90 |
+
print(response)
|
91 |
+
|
92 |
+
```
|
93 |
+
|
94 |
+
|
95 |
+
|
96 |
+
## Troubleshooting & Errors
|
97 |
+
### Data not getting logged to Langfuse ?
|
98 |
+
- Ensure you're on the latest version of langfuse `pip install langfuse -U`. The latest version allows litellm to log JSON input/outputs to langfuse
|
99 |
+
|
100 |
+
## Support & Talk to Founders
|
101 |
+
|
102 |
+
- [Schedule Demo π](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
|
103 |
+
- [Community Discord π](https://discord.gg/wuPM9dRgDw)
|
104 |
+
- Our numbers π +1 (770) 8783-106 / β+1 (412) 618-6238β¬
|
105 |
+
- Our emails βοΈ [email protected] / [email protected]
|
docs/my-website/docs/observability/langsmith_integration.md
ADDED
@@ -0,0 +1,77 @@
1 |
+
import Image from '@theme/IdealImage';
|
2 |
+
|
3 |
+
# Langsmith - Logging LLM Input/Output
|
4 |
+
An all-in-one developer platform for every step of the application lifecycle
|
5 |
+
https://smith.langchain.com/
|
6 |
+
|
7 |
+
<Image img={require('../../img/langsmith.png')} />
|
8 |
+
|
9 |
+
:::info
|
10 |
+
We want to learn how we can make the callbacks better! Meet the LiteLLM [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
|
11 |
+
join our [discord](https://discord.gg/wuPM9dRgDw)
|
12 |
+
:::
|
13 |
+
|
14 |
+
## Pre-Requisites
|
15 |
+
```shell
|
16 |
+
pip install litellm
|
17 |
+
```
|
18 |
+
|
19 |
+
## Quick Start
|
20 |
+
Use just 2 lines of code, to instantly log your responses **across all providers** with Langsmith
|
21 |
+
|
22 |
+
|
23 |
+
```python
|
24 |
+
litellm.success_callback = ["langsmith"]
|
25 |
+
```
|
26 |
+
```python
|
27 |
+
import litellm
|
28 |
+
import os
|
29 |
+
|
30 |
+
os.environ["LANGSMITH_API_KEY"] = ""
|
31 |
+
# LLM API Keys
|
32 |
+
os.environ['OPENAI_API_KEY']=""
|
33 |
+
|
34 |
+
# set langsmith as a callback, litellm will send the data to langsmith
|
35 |
+
litellm.success_callback = ["langsmith"]
|
36 |
+
|
37 |
+
# openai call
|
38 |
+
response = litellm.completion(
|
39 |
+
model="gpt-3.5-turbo",
|
40 |
+
messages=[
|
41 |
+
{"role": "user", "content": "Hi π - i'm openai"}
|
42 |
+
]
|
43 |
+
)
|
44 |
+
```
|
45 |
+
|
46 |
+
## Advanced
|
47 |
+
### Set Custom Project & Run names
|
48 |
+
|
49 |
+
```python
|
50 |
+
import litellm
|
51 |
+
import os
|
52 |
+
|
53 |
+
os.environ["LANGSMITH_API_KEY"] = ""
|
54 |
+
# LLM API Keys
|
55 |
+
os.environ['OPENAI_API_KEY']=""
|
56 |
+
|
57 |
+
# set langsmith as a callback, litellm will send the data to langsmith
|
58 |
+
litellm.success_callback = ["langfuse"]
|
59 |
+
|
60 |
+
response = litellm.completion(
|
61 |
+
model="gpt-3.5-turbo",
|
62 |
+
messages=[
|
63 |
+
{"role": "user", "content": "Hi π - i'm openai"}
|
64 |
+
],
|
65 |
+
metadata={
|
66 |
+
"run_name": "litellmRUN", # langsmith run name
|
67 |
+
"project_name": "litellm-completion", # langsmith project name
|
68 |
+
}
|
69 |
+
)
|
70 |
+
print(response)
|
71 |
+
```
|
72 |
+
## Support & Talk to Founders
|
73 |
+
|
74 |
+
- [Schedule Demo π](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
|
75 |
+
- [Community Discord π](https://discord.gg/wuPM9dRgDw)
|
76 |
+
- Our numbers π +1 (770) 8783-106 / β+1 (412) 618-6238β¬
|
77 |
+
- Our emails βοΈ [email protected] / [email protected]
|
docs/my-website/docs/observability/llmonitor_integration.md
ADDED
@@ -0,0 +1,65 @@
1 |
+
# LLMonitor Tutorial
|
2 |
+
|
3 |
+
[LLMonitor](https://llmonitor.com/) is an open-source observability platform that provides cost tracking, user tracking and powerful agent tracing.
|
4 |
+
|
5 |
+
<video controls width='900' >
|
6 |
+
<source src='https://llmonitor.com/videos/demo-annotated.mp4'/>
|
7 |
+
</video>
|
8 |
+
|
9 |
+
## Use LLMonitor to log requests across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM)
|
10 |
+
|
11 |
+
liteLLM provides `callbacks`, making it easy for you to log data depending on the status of your responses.
|
12 |
+
|
13 |
+
:::info
|
14 |
+
We want to learn how we can make the callbacks better! Meet the [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
|
15 |
+
join our [discord](https://discord.gg/wuPM9dRgDw)
|
16 |
+
:::
|
17 |
+
|
18 |
+
### Using Callbacks
|
19 |
+
|
20 |
+
First, sign up to get an app ID on the [LLMonitor dashboard](https://llmonitor.com).
|
21 |
+
|
22 |
+
Use just 2 lines of code, to instantly log your responses **across all providers** with llmonitor:
|
23 |
+
|
24 |
+
```python
|
25 |
+
litellm.success_callback = ["llmonitor"]
|
26 |
+
litellm.failure_callback = ["llmonitor"]
|
27 |
+
```
|
28 |
+
|
29 |
+
Complete code
|
30 |
+
|
31 |
+
```python
|
32 |
+
import os

import litellm
from litellm import completion
|
33 |
+
|
34 |
+
## set env variables
|
35 |
+
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
|
36 |
+
# Optional: os.environ["LLMONITOR_API_URL"] = "self-hosting-url"
|
37 |
+
|
38 |
+
os.environ["OPENAI_API_KEY"], os.environ["COHERE_API_KEY"] = "", ""
|
39 |
+
|
40 |
+
# set callbacks
|
41 |
+
litellm.success_callback = ["llmonitor"]
|
42 |
+
litellm.failure_callback = ["llmonitor"]
|
43 |
+
|
44 |
+
#openai call
|
45 |
+
response = completion(
|
46 |
+
model="gpt-3.5-turbo",
|
47 |
+
messages=[{"role": "user", "content": "Hi π - i'm openai"}],
|
48 |
+
user="ishaan_litellm"
|
49 |
+
)
|
50 |
+
|
51 |
+
#cohere call
|
52 |
+
response = completion(
|
53 |
+
model="command-nightly",
|
54 |
+
messages=[{"role": "user", "content": "Hi π - i'm cohere"}],
|
55 |
+
user="ishaan_litellm"
|
56 |
+
)
|
57 |
+
```
|
58 |
+
|
59 |
+
## Support & Talk to Founders
|
60 |
+
|
61 |
+
- [Schedule Demo π](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
|
62 |
+
- [Community Discord π](https://discord.gg/wuPM9dRgDw)
|
63 |
+
- Our numbers π +1 (770) 8783-106 / β+1 (412) 618-6238β¬
|
64 |
+
- Our emails βοΈ [email protected] / [email protected]
|
65 |
+
- Meet the LLMonitor team on [Discord](http://discord.com/invite/8PafSG58kK) or via [email](mailto:[email protected]).
|
docs/my-website/docs/observability/promptlayer_integration.md
ADDED
@@ -0,0 +1,77 @@
1 |
+
import Image from '@theme/IdealImage';

# Promptlayer Tutorial
|
2 |
+
|
3 |
+
Promptlayer is a platform for prompt engineers. Log OpenAI requests. Search usage history. Track performance. Visually manage prompt templates.
|
4 |
+
|
5 |
+
<Image img={require('../../img/promptlayer.png')} />
|
6 |
+
|
7 |
+
## Use Promptlayer to log requests across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM)
|
8 |
+
|
9 |
+
liteLLM provides `callbacks`, making it easy for you to log data depending on the status of your responses.
|
10 |
+
|
11 |
+
### Using Callbacks
|
12 |
+
|
13 |
+
Get your PromptLayer API Key from https://promptlayer.com/
|
14 |
+
|
15 |
+
Use just 2 lines of code, to instantly log your responses **across all providers** with promptlayer:
|
16 |
+
|
17 |
+
```python
|
18 |
+
litellm.success_callback = ["promptlayer"]
|
19 |
+
|
20 |
+
```
|
21 |
+
|
22 |
+
Complete code
|
23 |
+
|
24 |
+
```python
|
25 |
+
import os

import litellm
from litellm import completion
|
26 |
+
|
27 |
+
## set env variables
|
28 |
+
os.environ["PROMPTLAYER_API_KEY"] = "your-promptlayer-key"
|
29 |
+
|
30 |
+
os.environ["OPENAI_API_KEY"], os.environ["COHERE_API_KEY"] = "", ""
|
31 |
+
|
32 |
+
# set callbacks
|
33 |
+
litellm.success_callback = ["promptlayer"]
|
34 |
+
|
35 |
+
#openai call
|
36 |
+
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi π - i'm openai"}])
|
37 |
+
|
38 |
+
#cohere call
|
39 |
+
response = completion(model="command-nightly", messages=[{"role": "user", "content": "Hi π - i'm cohere"}])
|
40 |
+
```
|
41 |
+
|
42 |
+
### Logging Metadata
|
43 |
+
|
44 |
+
You can also log completion call metadata to Promptlayer.
|
45 |
+
|
46 |
+
You can add metadata to a completion call through the metadata param:
|
47 |
+
```python
|
48 |
+
completion(model,messages, metadata={"model": "ai21"})
|
49 |
+
```
|
50 |
+
|
51 |
+
**Complete Code**
|
52 |
+
```python
|
53 |
+
import os

import litellm
from litellm import completion
|
54 |
+
|
55 |
+
## set env variables
|
56 |
+
os.environ["PROMPTLAYER_API_KEY"] = "your-promptlayer-key"
|
57 |
+
|
58 |
+
os.environ["OPENAI_API_KEY"], os.environ["COHERE_API_KEY"] = "", ""
|
59 |
+
|
60 |
+
# set callbacks
|
61 |
+
litellm.success_callback = ["promptlayer"]
|
62 |
+
|
63 |
+
#openai call - log llm provider is openai
|
64 |
+
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi π - i'm openai"}], metadata={"provider": "openai"})
|
65 |
+
|
66 |
+
#cohere call - log llm provider is cohere
|
67 |
+
response = completion(model="command-nightly", messages=[{"role": "user", "content": "Hi π - i'm cohere"}], metadata={"provider": "cohere"})
|
68 |
+
```
|
69 |
+
|
70 |
+
Credits to [Nick Bradford](https://github.com/nsbradford), from [Vim-GPT](https://github.com/nsbradford/VimGPT), for the suggestion.
|
71 |
+
|
72 |
+
## Support & Talk to Founders
|
73 |
+
|
74 |
+
- [Schedule Demo π](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
|
75 |
+
- [Community Discord π](https://discord.gg/wuPM9dRgDw)
|
76 |
+
- Our numbers π +1 (770) 8783-106 / β+1 (412) 618-6238β¬
|
77 |
+
- Our emails βοΈ [email protected] / [email protected]
|
docs/my-website/docs/observability/sentry.md
ADDED
@@ -0,0 +1,44 @@
1 |
+
import Image from '@theme/IdealImage';
|
2 |
+
|
3 |
+
# Sentry - Log LLM Exceptions
|
4 |
+
[Sentry](https://sentry.io/) provides error monitoring for production. LiteLLM can add breadcrumbs and send exceptions to Sentry with this integration.
|
5 |
+
|
6 |
+
Track exceptions for:
|
7 |
+
- litellm.completion() - completion() for 100+ LLMs
|
8 |
+
- litellm.acompletion() - async completion()
|
9 |
+
- Streaming completion() & acompletion() calls
|
10 |
+
|
11 |
+
<Image img={require('../../img/sentry.png')} />
|
12 |
+
|
13 |
+
|
14 |
+
## Usage
|
15 |
+
|
16 |
+
### Set SENTRY_DSN & callback
|
17 |
+
|
18 |
+
```python
|
19 |
+
import litellm, os
|
20 |
+
os.environ["SENTRY_DSN"] = "your-sentry-url"
|
21 |
+
litellm.failure_callback=["sentry"]
|
22 |
+
```
|
23 |
+
|
24 |
+
### Sentry callback with completion
|
25 |
+
```python
|
26 |
+
import litellm
|
27 |
+
from litellm import completion
|
28 |
+
|
29 |
+
litellm.input_callback=["sentry"] # adds sentry breadcrumbing
|
30 |
+
litellm.failure_callback=["sentry"] # [OPTIONAL] if you want litellm to capture -> send exception to sentry
|
31 |
+
|
32 |
+
import os
|
33 |
+
os.environ["SENTRY_DSN"] = "your-sentry-url"
|
34 |
+
os.environ["OPENAI_API_KEY"] = "your-openai-key"
|
35 |
+
|
36 |
+
# set bad key to trigger error
|
37 |
+
api_key="bad-key"
|
38 |
+
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey!"}], stream=True, api_key=api_key)
|
39 |
+
|
40 |
+
print(response)
|
41 |
+
```
|
42 |
+
|
43 |
+
[Let us know](https://github.com/BerriAI/litellm/issues/new?assignees=&labels=enhancement&projects=&template=feature_request.yml&title=%5BFeature%5D%3A+) if you need any additional options from Sentry.
|
44 |
+
|
docs/my-website/docs/observability/slack_integration.md
ADDED
@@ -0,0 +1,93 @@
1 |
+
import Image from '@theme/IdealImage';
|
2 |
+
|
3 |
+
# Slack - Logging LLM Input/Output, Exceptions
|
4 |
+
|
5 |
+
<Image img={require('../../img/slack.png')} />
|
6 |
+
|
7 |
+
:::info
|
8 |
+
We want to learn how we can make the callbacks better! Meet the LiteLLM [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
|
9 |
+
join our [discord](https://discord.gg/wuPM9dRgDw)
|
10 |
+
:::
|
11 |
+
|
12 |
+
## Pre-Requisites
|
13 |
+
|
14 |
+
### Step 1
|
15 |
+
```shell
|
16 |
+
pip install litellm
|
17 |
+
```
|
18 |
+
|
19 |
+
### Step 2
|
20 |
+
Get a slack webhook url from https://api.slack.com/messaging/webhooks
|
21 |
+
|
22 |
+
|
23 |
+
|
24 |
+
## Quick Start
|
25 |
+
### Create a custom Callback to log to slack
|
26 |
+
We create a custom callback, to log to slack webhooks, see [custom callbacks on litellm](https://docs.litellm.ai/docs/observability/custom_callback)
|
27 |
+
```python
|
28 |
+
import os

def send_slack_alert(
|
29 |
+
kwargs,
|
30 |
+
completion_response,
|
31 |
+
start_time,
|
32 |
+
end_time,
|
33 |
+
):
|
34 |
+
print(
|
35 |
+
"in custom slack callback func"
|
36 |
+
)
|
37 |
+
import requests
|
38 |
+
import json
|
39 |
+
|
40 |
+
# Define the Slack webhook URL
|
41 |
+
# get it from https://api.slack.com/messaging/webhooks
|
42 |
+
slack_webhook_url = os.environ['SLACK_WEBHOOK_URL'] # "https://hooks.slack.com/services/<>/<>/<>"
|
43 |
+
|
44 |
+
# Define the text payload, send data available in litellm custom_callbacks
|
45 |
+
text_payload = f"""LiteLLM Logging: kwargs: {str(kwargs)}\n\n, response: {str(completion_response)}\n\n, start time{str(start_time)} end time: {str(end_time)}
|
46 |
+
"""
|
47 |
+
payload = {
|
48 |
+
"text": text_payload
|
49 |
+
}
|
50 |
+
|
51 |
+
# Set the headers
|
52 |
+
headers = {
|
53 |
+
"Content-type": "application/json"
|
54 |
+
}
|
55 |
+
|
56 |
+
# Make the POST request
|
57 |
+
response = requests.post(slack_webhook_url, json=payload, headers=headers)
|
58 |
+
|
59 |
+
# Check the response status
|
60 |
+
if response.status_code == 200:
|
61 |
+
print("Message sent successfully to Slack!")
|
62 |
+
else:
|
63 |
+
print(f"Failed to send message to Slack. Status code: {response.status_code}")
|
64 |
+
print(response.json())
|
65 |
+
```
|
66 |
+
|
67 |
+
### Pass callback to LiteLLM
|
68 |
+
```python
|
69 |
+
litellm.success_callback = [send_slack_alert]
|
70 |
+
```
|
71 |
+
|
72 |
+
```python
|
73 |
+
import litellm
|
74 |
+
litellm.success_callback = [send_slack_alert] # log success
|
75 |
+
litellm.failure_callback = [send_slack_alert] # log exceptions
|
76 |
+
|
77 |
+
# this will raise an exception
|
78 |
+
response = litellm.completion(
|
79 |
+
model="gpt-2",
|
80 |
+
messages=[
|
81 |
+
{
|
82 |
+
"role": "user",
|
83 |
+
"content": "Hi π - i'm openai"
|
84 |
+
}
|
85 |
+
]
|
86 |
+
)
|
87 |
+
```
|
88 |
+
## Support & Talk to Founders
|
89 |
+
|
90 |
+
- [Schedule Demo π](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
|
91 |
+
- [Community Discord π](https://discord.gg/wuPM9dRgDw)
|
92 |
+
- Our numbers π +1 (770) 8783-106 / β+1 (412) 618-6238β¬
|
93 |
+
- Our emails βοΈ [email protected] / [email protected]
|
docs/my-website/docs/observability/supabase_integration.md
ADDED
@@ -0,0 +1,101 @@
1 |
+
# Supabase Tutorial
|
2 |
+
[Supabase](https://supabase.com/) is an open source Firebase alternative.
|
3 |
+
Start your project with a Postgres database, Authentication, instant APIs, Edge Functions, Realtime subscriptions, Storage, and Vector embeddings.
|
4 |
+
|
5 |
+
## Use Supabase to log requests and see total spend across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM)
|
6 |
+
liteLLM provides `success_callback` and `failure_callback`, making it easy for you to send data to a particular provider depending on the status of your responses.
|
7 |
+
|
8 |
+
In this case, we want to log requests to Supabase in both scenarios - when it succeeds and fails.
|
9 |
+
|
10 |
+
### Create a supabase table
|
11 |
+
|
12 |
+
Go to your Supabase project > go to the [Supabase SQL Editor](https://supabase.com/dashboard/projects) and create a new table with this configuration.
|
13 |
+
|
14 |
+
Note: You can change the table name. Just don't change the column names.
|
15 |
+
|
16 |
+
```sql
|
17 |
+
create table
|
18 |
+
public.request_logs (
|
19 |
+
id bigint generated by default as identity,
|
20 |
+
created_at timestamp with time zone null default now(),
|
21 |
+
model text null default ''::text,
|
22 |
+
messages json null default '{}'::json,
|
23 |
+
response json null default '{}'::json,
|
24 |
+
end_user text null default ''::text,
|
25 |
+
status text null default ''::text,
|
26 |
+
error json null default '{}'::json,
|
27 |
+
response_time real null default '0'::real,
|
28 |
+
total_cost real null,
|
29 |
+
additional_details json null default '{}'::json,
|
30 |
+
litellm_call_id text unique,
|
31 |
+
primary key (id)
|
32 |
+
) tablespace pg_default;
|
33 |
+
```
|
34 |
+
|
35 |
+
### Use Callbacks
|
36 |
+
Use just 2 lines of code, to instantly see costs and log your responses **across all providers** with Supabase:
|
37 |
+
|
38 |
+
```python
|
39 |
+
litellm.success_callback=["supabase"]
|
40 |
+
litellm.failure_callback=["supabase"]
|
41 |
+
```
|
42 |
+
|
43 |
+
Complete code
|
44 |
+
```python
|
45 |
+
import os

import litellm
from litellm import completion
|
46 |
+
|
47 |
+
## set env variables
|
48 |
+
### SUPABASE
|
49 |
+
os.environ["SUPABASE_URL"] = "your-supabase-url"
|
50 |
+
os.environ["SUPABASE_KEY"] = "your-supabase-key"
|
51 |
+
|
52 |
+
## LLM API KEY
|
53 |
+
os.environ["OPENAI_API_KEY"] = ""
|
54 |
+
|
55 |
+
# set callbacks
|
56 |
+
litellm.success_callback=["supabase"]
|
57 |
+
litellm.failure_callback=["supabase"]
|
58 |
+
|
59 |
+
# openai call
|
60 |
+
response = completion(
|
61 |
+
model="gpt-3.5-turbo",
|
62 |
+
messages=[{"role": "user", "content": "Hi π - i'm openai"}],
|
63 |
+
user="ishaan22" # identify users
|
64 |
+
)
|
65 |
+
|
66 |
+
# bad call, expect this call to fail and get logged
|
67 |
+
response = completion(
|
68 |
+
model="chatgpt-test",
|
69 |
+
messages=[{"role": "user", "content": "Hi π - i'm a bad call to test error logging"}]
|
70 |
+
)
|
71 |
+
|
72 |
+
```
|
73 |
+
|
74 |
+
### Additional Controls
|
75 |
+
|
76 |
+
**Identify end-user**
|
77 |
+
|
78 |
+
Pass `user` to `litellm.completion` to map your llm call to an end-user
|
79 |
+
|
80 |
+
```python
|
81 |
+
response = completion(
|
82 |
+
model="gpt-3.5-turbo",
|
83 |
+
messages=[{"role": "user", "content": "Hi π - i'm openai"}],
|
84 |
+
user="ishaan22" # identify users
|
85 |
+
)
|
86 |
+
```
|
87 |
+
|
88 |
+
**Different Table name**
|
89 |
+
|
90 |
+
If you modified your table name, here's how to pass the new name.
|
91 |
+
|
92 |
+
```python
|
93 |
+
litellm.modify_integration("supabase",{"table_name": "litellm_logs"})
|
94 |
+
```
|
95 |
+
|
96 |
+
## Support & Talk to Founders
|
97 |
+
|
98 |
+
- [Schedule Demo π](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
|
99 |
+
- [Community Discord π](https://discord.gg/wuPM9dRgDw)
|
100 |
+
- Our numbers π +1 (770) 8783-106 / β+1 (412) 618-6238β¬
|
101 |
+
- Our emails βοΈ [email protected] / [email protected]
|
docs/my-website/docs/observability/telemetry.md
ADDED
@@ -0,0 +1,13 @@
1 |
+
# Telemetry
|
2 |
+
|
3 |
+
LiteLLM contains a telemetry feature that tells us what models are used, and what errors are hit.
|
4 |
+
|
5 |
+
## What is logged?
|
6 |
+
|
7 |
+
Only the model name and exception raised is logged.
|
8 |
+
|
9 |
+
## Why?
|
10 |
+
We use this information to help us understand how LiteLLM is used, and improve stability.
|
11 |
+
|
12 |
+
## Opting out
|
13 |
+
If you prefer to opt out of telemetry, you can do this by setting `litellm.telemetry = False`.
|
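For example (a two-line sketch):

```python
import litellm

litellm.telemetry = False  # opt out of telemetry
```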
docs/my-website/docs/observability/traceloop_integration.md
ADDED
@@ -0,0 +1,34 @@
1 |
+
import Image from '@theme/IdealImage';
|
2 |
+
|
3 |
+
# Traceloop (OpenLLMetry) - Tracing LLMs with OpenTelemetry
|
4 |
+
|
5 |
+
[Traceloop](https://traceloop.com) is a platform for monitoring and debugging the quality of your LLM outputs.
|
6 |
+
It provides you with a way to track the performance of your LLM application; rollout changes with confidence; and debug issues in production.
|
7 |
+
It is based on [OpenTelemetry](https://opentelemetry.io), so it can provide full visibility to your LLM requests, as well vector DB usage, and other infra in your stack.
|
8 |
+
|
9 |
+
<Image img={require('../../img/traceloop_dash.png')} />
|
10 |
+
|
11 |
+
## Getting Started
|
12 |
+
|
13 |
+
Install the Traceloop SDK:
|
14 |
+
|
15 |
+
```
|
16 |
+
pip install traceloop-sdk
|
17 |
+
```
|
18 |
+
|
19 |
+
Use just 2 lines of code, to instantly log your LLM responses with OpenTelemetry:
|
20 |
+
|
21 |
+
```python
|
22 |
+
import litellm
from traceloop.sdk import Traceloop

Traceloop.init(app_name="<YOUR APP NAME>", disable_batch=True)
|
23 |
+
litellm.success_callback = ["traceloop"]
|
24 |
+
```
|
25 |
+
|
26 |
+
To get better visualizations on how your code behaves, you may want to annotate specific parts of your LLM chain. See [Traceloop docs on decorators](https://traceloop.com/docs/python-sdk/decorators) for more information.
|
27 |
+
|
28 |
+
## Exporting traces to other systems (e.g. Datadog, New Relic, and others)
|
29 |
+
|
30 |
+
Since Traceloop SDK uses OpenTelemetry to send data, you can easily export your traces to other systems, such as Datadog, New Relic, and others. See [Traceloop docs on exporters](https://traceloop.com/docs/python-sdk/exporters) for more information.
|
31 |
+
|
32 |
+
## Support
|
33 |
+
|
34 |
+
For any question or issue with integration you can reach out to the Traceloop team on [Slack](https://join.slack.com/t/traceloopcommunity/shared_invite/zt-1plpfpm6r-zOHKI028VkpcWdobX65C~g) or via [email](mailto:[email protected]).
|
docs/my-website/docs/observability/wandb_integration.md
ADDED
@@ -0,0 +1,51 @@
1 |
+
import Image from '@theme/IdealImage';
|
2 |
+
|
3 |
+
# Weights & Biases - Logging LLM Input/Output
|
4 |
+
Weights & Biases helps AI developers build better models faster https://wandb.ai
|
5 |
+
|
6 |
+
<Image img={require('../../img/wandb.png')} />
|
7 |
+
|
8 |
+
:::info
|
9 |
+
We want to learn how we can make the callbacks better! Meet the LiteLLM [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
|
10 |
+
join our [discord](https://discord.gg/wuPM9dRgDw)
|
11 |
+
:::
|
12 |
+
|
13 |
+
## Pre-Requisites
|
14 |
+
Ensure you have run `pip install wandb` for this integration
|
15 |
+
```shell
|
16 |
+
pip install wandb litellm
|
17 |
+
```
|
18 |
+
|
19 |
+
## Quick Start
|
20 |
+
Use just 2 lines of code, to instantly log your responses **across all providers** with Weights & Biases
|
21 |
+
|
22 |
+
```python
|
23 |
+
litellm.success_callback = ["wandb"]
|
24 |
+
```
|
25 |
+
```python
|
26 |
+
# pip install wandb
|
27 |
+
import litellm
|
28 |
+
import os
|
29 |
+
|
30 |
+
os.environ["WANDB_API_KEY"] = ""
|
31 |
+
# LLM API Keys
|
32 |
+
os.environ['OPENAI_API_KEY']=""
|
33 |
+
|
34 |
+
# set wandb as a callback, litellm will send the data to Weights & Biases
|
35 |
+
litellm.success_callback = ["wandb"]
|
36 |
+
|
37 |
+
# openai call
|
38 |
+
response = litellm.completion(
|
39 |
+
model="gpt-3.5-turbo",
|
40 |
+
messages=[
|
41 |
+
{"role": "user", "content": "Hi π - i'm openai"}
|
42 |
+
]
|
43 |
+
)
|
44 |
+
```
|
45 |
+
|
46 |
+
## Support & Talk to Founders
|
47 |
+
|
48 |
+
- [Schedule Demo π](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
|
49 |
+
- [Community Discord π](https://discord.gg/wuPM9dRgDw)
|
50 |
+
- Our numbers π +1 (770) 8783-106 / β+1 (412) 618-6238β¬
|
51 |
+
- Our emails βοΈ [email protected] / [email protected]
|
docs/my-website/docs/projects.md
ADDED
@@ -0,0 +1,19 @@
1 |
+
# Projects Built on LiteLLM
|
2 |
+
|
3 |
+
|
4 |
+
|
5 |
+
### EntoAI
|
6 |
+
Chat and Ask on your own data.
|
7 |
+
[Github](https://github.com/akshata29/entaoai)
|
8 |
+
|
9 |
+
### GPT-Migrate
|
10 |
+
Easily migrate your codebase from one framework or language to another.
|
11 |
+
[Github](https://github.com/0xpayne/gpt-migrate)
|
12 |
+
|
13 |
+
### Otter
|
14 |
+
Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
|
15 |
+
[Github](https://github.com/Luodian/Otter)
|
16 |
+
|
17 |
+
|
18 |
+
|
19 |
+
|