File size: 10,350 Bytes
447ebeb
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
<h1 align="center">
        LLM-Bench

    </h1>

    <p align="center">

        <p align="center">Benchmark LLMs response, cost and response time</p>

        <p>LLM vs Cost per input + output token ($)</p>

        <img width="806" alt="Screenshot 2023-11-13 at 2 51 06 PM" src="https://github.com/BerriAI/litellm/assets/29436595/6d1bed71-d062-40b8-a113-28359672636a">

    </p>

        <a href="https://docs.google.com/spreadsheets/d/1mvPbP02OLFgc-5-Ubn1KxGuQQdbMyG1jhMSWxAldWy4/edit?usp=sharing">

               Bar Graph Excel Sheet here

        </a>


| Model | Provider | Cost per input + output token ($)|
| --- | --- | --- |
| openrouter/mistralai/mistral-7b-instruct | openrouter | 0.0 |
| ollama/llama2 | ollama | 0.0 |
| ollama/llama2:13b | ollama | 0.0 |
| ollama/llama2:70b | ollama | 0.0 |
| ollama/llama2-uncensored | ollama | 0.0 |
| ollama/mistral | ollama | 0.0 |
| ollama/codellama | ollama | 0.0 |
| ollama/orca-mini | ollama | 0.0 |
| ollama/vicuna | ollama | 0.0 |
| perplexity/codellama-34b-instruct | perplexity | 0.0 |
| perplexity/llama-2-13b-chat | perplexity | 0.0 |
| perplexity/llama-2-70b-chat | perplexity | 0.0 |
| perplexity/mistral-7b-instruct | perplexity | 0.0 |
| perplexity/replit-code-v1.5-3b | perplexity | 0.0 |
| text-bison | vertex_ai-text-models | 0.00000025 |

| text-bison@001 | vertex_ai-text-models | 0.00000025 |
| chat-bison | vertex_ai-chat-models | 0.00000025 |

| chat-bison@001 | vertex_ai-chat-models | 0.00000025 |
| chat-bison-32k | vertex_ai-chat-models | 0.00000025 |

| code-bison | vertex_ai-code-text-models | 0.00000025 |
| code-bison@001 | vertex_ai-code-text-models | 0.00000025 |

| code-gecko@001 | vertex_ai-chat-models | 0.00000025 |
| code-gecko@latest | vertex_ai-chat-models | 0.00000025 |

| codechat-bison | vertex_ai-code-chat-models | 0.00000025 |
| codechat-bison@001 | vertex_ai-code-chat-models | 0.00000025 |

| codechat-bison-32k | vertex_ai-code-chat-models | 0.00000025 |
| palm/chat-bison | palm | 0.00000025 |
| palm/chat-bison-001 | palm | 0.00000025 |
| palm/text-bison | palm | 0.00000025 |
| palm/text-bison-001 | palm | 0.00000025 |
| palm/text-bison-safety-off | palm | 0.00000025 |
| palm/text-bison-safety-recitation-off | palm | 0.00000025 |
| anyscale/meta-llama/Llama-2-7b-chat-hf | anyscale | 0.0000003 |
| anyscale/mistralai/Mistral-7B-Instruct-v0.1 | anyscale | 0.0000003 |
| openrouter/meta-llama/llama-2-13b-chat | openrouter | 0.0000004 |
| openrouter/nousresearch/nous-hermes-llama2-13b | openrouter | 0.0000004 |
| deepinfra/meta-llama/Llama-2-7b-chat-hf | deepinfra | 0.0000004 |
| deepinfra/mistralai/Mistral-7B-Instruct-v0.1 | deepinfra | 0.0000004 |
| anyscale/meta-llama/Llama-2-13b-chat-hf | anyscale | 0.0000005 |
| amazon.titan-text-lite-v1 | bedrock | 0.0000007 |
| deepinfra/meta-llama/Llama-2-13b-chat-hf | deepinfra | 0.0000007 |
| text-babbage-001 | text-completion-openai | 0.0000008 |
| text-ada-001 | text-completion-openai | 0.0000008 |
| babbage-002 | text-completion-openai | 0.0000008 |
| openrouter/google/palm-2-chat-bison | openrouter | 0.000001 |
| openrouter/google/palm-2-codechat-bison | openrouter | 0.000001 |
| openrouter/meta-llama/codellama-34b-instruct | openrouter | 0.000001 |
| deepinfra/codellama/CodeLlama-34b-Instruct-hf | deepinfra | 0.0000012 |
| deepinfra/meta-llama/Llama-2-70b-chat-hf | deepinfra | 0.0000016499999999999999 |
| deepinfra/jondurbin/airoboros-l2-70b-gpt4-1.4.1 | deepinfra | 0.0000016499999999999999 |
| anyscale/meta-llama/Llama-2-70b-chat-hf | anyscale | 0.000002 |
| anyscale/codellama/CodeLlama-34b-Instruct-hf | anyscale | 0.000002 |
| gpt-3.5-turbo-1106 | openai | 0.000003 |
| openrouter/meta-llama/llama-2-70b-chat | openrouter | 0.000003 |
| amazon.titan-text-express-v1 | bedrock | 0.000003 |
| gpt-3.5-turbo | openai | 0.0000035 |
| gpt-3.5-turbo-0301 | openai | 0.0000035 |
| gpt-3.5-turbo-0613 | openai | 0.0000035 |
| gpt-3.5-turbo-instruct | text-completion-openai | 0.0000035 |
| openrouter/openai/gpt-3.5-turbo | openrouter | 0.0000035 |
| cohere.command-text-v14 | bedrock | 0.0000035 |
| gpt-3.5-turbo-0613 | openai | 0.0000035 |
| claude-instant-1 | anthropic | 0.00000714 |
| claude-instant-1.2 | anthropic | 0.00000714 |
| openrouter/anthropic/claude-instant-v1 | openrouter | 0.00000714 |
| anthropic.claude-instant-v1 | bedrock | 0.00000714 |
| openrouter/mancer/weaver | openrouter | 0.00001125 |
| j2-mid | ai21 | 0.00002 |
| ai21.j2-mid-v1 | bedrock | 0.000025 |
| openrouter/jondurbin/airoboros-l2-70b-2.1 | openrouter | 0.00002775 |
| command-nightly | cohere | 0.00003 |
| command | cohere | 0.00003 |
| command-light | cohere | 0.00003 |
| command-medium-beta | cohere | 0.00003 |
| command-xlarge-beta | cohere | 0.00003 |
| command-r-plus| cohere | 0.000018 |
| j2-ultra | ai21 | 0.00003 |
| ai21.j2-ultra-v1 | bedrock | 0.0000376 |
| gpt-4-1106-preview | openai | 0.00004 |
| gpt-4-vision-preview | openai | 0.00004 |
| claude-2 | anthropic | 0.0000437 |
| openrouter/anthropic/claude-2 | openrouter | 0.0000437 |
| anthropic.claude-v1 | bedrock | 0.0000437 |
| anthropic.claude-v2 | bedrock | 0.0000437 |
| gpt-4 | openai | 0.00009 |
| gpt-4-0314 | openai | 0.00009 |
| gpt-4-0613 | openai | 0.00009 |
| openrouter/openai/gpt-4 | openrouter | 0.00009 |
| gpt-4-32k | openai | 0.00018 |
| gpt-4-32k-0314 | openai | 0.00018 |
| gpt-4-32k-0613 | openai | 0.00018 |



## Setup:
```

git clone https://github.com/BerriAI/litellm

```
cd to `benchmark` dir
```

cd litellm/cookbook/benchmark

```

### Install Dependencies
```

pip install litellm click tqdm tabulate termcolor

```

### Configuration
In `benchmark/benchmark.py` select your LLMs, LLM API Key and questions

Supported LLMs: https://docs.litellm.ai/docs/providers

```python

# Define the list of models to benchmark

models = ['gpt-3.5-turbo', 'togethercomputer/llama-2-70b-chat', 'claude-2']



# Enter LLM API keys

os.environ['OPENAI_API_KEY'] = ""

os.environ['ANTHROPIC_API_KEY'] = ""

os.environ['TOGETHERAI_API_KEY'] = ""



# List of questions to benchmark (replace with your questions)

questions = [

    "When will BerriAI IPO?",

    "When will LiteLLM hit $100M ARR?"

]



```

## Run LLM-Bench
```

python3 benchmark.py

```

## Expected Output
```

Running question: When will BerriAI IPO? for model: claude-2: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:13<00:00,  4.41s/it]



Benchmark Results for 'When will BerriAI IPO?':

+-----------------+----------------------------------------------------------------------------------+---------------------------+------------+

| Model           | Response                                                                         | Response Time (seconds)   | Cost ($)   |

+=================+==================================================================================+===========================+============+

| gpt-3.5-turbo   | As an AI language model, I cannot provide up-to-date information or predict      | 1.55 seconds              | $0.000122  |

|                 | future events. It is best to consult a reliable financial source or contact      |                           |            |

|                 | BerriAI directly for information regarding their IPO plans.                      |                           |            |

+-----------------+----------------------------------------------------------------------------------+---------------------------+------------+

| togethercompute | I'm not able to provide information about future IPO plans or dates for BerriAI  | 8.52 seconds              | $0.000531  |

| r/llama-2-70b-c | or any other company. IPO (Initial Public Offering) plans and timelines are      |                           |            |

| hat             | typically kept private by companies until they are ready to make a public        |                           |            |

|                 | announcement.  It's important to note that IPO plans can change and are subject  |                           |            |

|                 | to various factors, such as market conditions, financial performance, and        |                           |            |

|                 | regulatory approvals. Therefore, it's difficult to predict with certainty when   |                           |            |

|                 | BerriAI or any other company will go public.  If you're interested in staying    |                           |            |

|                 | up-to-date with BerriAI's latest news and developments, you may want to follow   |                           |            |

|                 | their official social media accounts, subscribe to their newsletter, or visit    |                           |            |

|                 | their website periodically for updates.                                          |                           |            |

+-----------------+----------------------------------------------------------------------------------+---------------------------+------------+

| claude-2        | I do not have any information about when or if BerriAI will have an initial      | 3.17 seconds              | $0.002084  |

|                 | public offering (IPO). As an AI assistant created by Anthropic to be helpful,    |                           |            |

|                 | harmless, and honest, I do not have insider knowledge about Anthropic's business |                           |            |

|                 | plans or strategies.                                                             |                           |            |

+-----------------+----------------------------------------------------------------------------------+---------------------------+------------+

```

## Support 
**🀝 Schedule a 1-on-1 Session:** Book a [1-on-1 session](https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat) with Krrish and Ishaan, the founders, to discuss any issues, provide feedback, or explore how we can improve LiteLLM for you.