Spaces:
Configuration error
Configuration error
File size: 10,350 Bytes
447ebeb |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 |
<h1 align="center">
LLM-Bench
</h1>
<p align="center">
<p align="center">Benchmark LLMs response, cost and response time</p>
<p>LLM vs Cost per input + output token ($)</p>
<img width="806" alt="Screenshot 2023-11-13 at 2 51 06 PM" src="https://github.com/BerriAI/litellm/assets/29436595/6d1bed71-d062-40b8-a113-28359672636a">
</p>
<a href="https://docs.google.com/spreadsheets/d/1mvPbP02OLFgc-5-Ubn1KxGuQQdbMyG1jhMSWxAldWy4/edit?usp=sharing">
Bar Graph Excel Sheet here
</a>
| Model | Provider | Cost per input + output token ($)|
| --- | --- | --- |
| openrouter/mistralai/mistral-7b-instruct | openrouter | 0.0 |
| ollama/llama2 | ollama | 0.0 |
| ollama/llama2:13b | ollama | 0.0 |
| ollama/llama2:70b | ollama | 0.0 |
| ollama/llama2-uncensored | ollama | 0.0 |
| ollama/mistral | ollama | 0.0 |
| ollama/codellama | ollama | 0.0 |
| ollama/orca-mini | ollama | 0.0 |
| ollama/vicuna | ollama | 0.0 |
| perplexity/codellama-34b-instruct | perplexity | 0.0 |
| perplexity/llama-2-13b-chat | perplexity | 0.0 |
| perplexity/llama-2-70b-chat | perplexity | 0.0 |
| perplexity/mistral-7b-instruct | perplexity | 0.0 |
| perplexity/replit-code-v1.5-3b | perplexity | 0.0 |
| text-bison | vertex_ai-text-models | 0.00000025 |
| text-bison@001 | vertex_ai-text-models | 0.00000025 |
| chat-bison | vertex_ai-chat-models | 0.00000025 |
| chat-bison@001 | vertex_ai-chat-models | 0.00000025 |
| chat-bison-32k | vertex_ai-chat-models | 0.00000025 |
| code-bison | vertex_ai-code-text-models | 0.00000025 |
| code-bison@001 | vertex_ai-code-text-models | 0.00000025 |
| code-gecko@001 | vertex_ai-chat-models | 0.00000025 |
| code-gecko@latest | vertex_ai-chat-models | 0.00000025 |
| codechat-bison | vertex_ai-code-chat-models | 0.00000025 |
| codechat-bison@001 | vertex_ai-code-chat-models | 0.00000025 |
| codechat-bison-32k | vertex_ai-code-chat-models | 0.00000025 |
| palm/chat-bison | palm | 0.00000025 |
| palm/chat-bison-001 | palm | 0.00000025 |
| palm/text-bison | palm | 0.00000025 |
| palm/text-bison-001 | palm | 0.00000025 |
| palm/text-bison-safety-off | palm | 0.00000025 |
| palm/text-bison-safety-recitation-off | palm | 0.00000025 |
| anyscale/meta-llama/Llama-2-7b-chat-hf | anyscale | 0.0000003 |
| anyscale/mistralai/Mistral-7B-Instruct-v0.1 | anyscale | 0.0000003 |
| openrouter/meta-llama/llama-2-13b-chat | openrouter | 0.0000004 |
| openrouter/nousresearch/nous-hermes-llama2-13b | openrouter | 0.0000004 |
| deepinfra/meta-llama/Llama-2-7b-chat-hf | deepinfra | 0.0000004 |
| deepinfra/mistralai/Mistral-7B-Instruct-v0.1 | deepinfra | 0.0000004 |
| anyscale/meta-llama/Llama-2-13b-chat-hf | anyscale | 0.0000005 |
| amazon.titan-text-lite-v1 | bedrock | 0.0000007 |
| deepinfra/meta-llama/Llama-2-13b-chat-hf | deepinfra | 0.0000007 |
| text-babbage-001 | text-completion-openai | 0.0000008 |
| text-ada-001 | text-completion-openai | 0.0000008 |
| babbage-002 | text-completion-openai | 0.0000008 |
| openrouter/google/palm-2-chat-bison | openrouter | 0.000001 |
| openrouter/google/palm-2-codechat-bison | openrouter | 0.000001 |
| openrouter/meta-llama/codellama-34b-instruct | openrouter | 0.000001 |
| deepinfra/codellama/CodeLlama-34b-Instruct-hf | deepinfra | 0.0000012 |
| deepinfra/meta-llama/Llama-2-70b-chat-hf | deepinfra | 0.0000016499999999999999 |
| deepinfra/jondurbin/airoboros-l2-70b-gpt4-1.4.1 | deepinfra | 0.0000016499999999999999 |
| anyscale/meta-llama/Llama-2-70b-chat-hf | anyscale | 0.000002 |
| anyscale/codellama/CodeLlama-34b-Instruct-hf | anyscale | 0.000002 |
| gpt-3.5-turbo-1106 | openai | 0.000003 |
| openrouter/meta-llama/llama-2-70b-chat | openrouter | 0.000003 |
| amazon.titan-text-express-v1 | bedrock | 0.000003 |
| gpt-3.5-turbo | openai | 0.0000035 |
| gpt-3.5-turbo-0301 | openai | 0.0000035 |
| gpt-3.5-turbo-0613 | openai | 0.0000035 |
| gpt-3.5-turbo-instruct | text-completion-openai | 0.0000035 |
| openrouter/openai/gpt-3.5-turbo | openrouter | 0.0000035 |
| cohere.command-text-v14 | bedrock | 0.0000035 |
| gpt-3.5-turbo-0613 | openai | 0.0000035 |
| claude-instant-1 | anthropic | 0.00000714 |
| claude-instant-1.2 | anthropic | 0.00000714 |
| openrouter/anthropic/claude-instant-v1 | openrouter | 0.00000714 |
| anthropic.claude-instant-v1 | bedrock | 0.00000714 |
| openrouter/mancer/weaver | openrouter | 0.00001125 |
| j2-mid | ai21 | 0.00002 |
| ai21.j2-mid-v1 | bedrock | 0.000025 |
| openrouter/jondurbin/airoboros-l2-70b-2.1 | openrouter | 0.00002775 |
| command-nightly | cohere | 0.00003 |
| command | cohere | 0.00003 |
| command-light | cohere | 0.00003 |
| command-medium-beta | cohere | 0.00003 |
| command-xlarge-beta | cohere | 0.00003 |
| command-r-plus| cohere | 0.000018 |
| j2-ultra | ai21 | 0.00003 |
| ai21.j2-ultra-v1 | bedrock | 0.0000376 |
| gpt-4-1106-preview | openai | 0.00004 |
| gpt-4-vision-preview | openai | 0.00004 |
| claude-2 | anthropic | 0.0000437 |
| openrouter/anthropic/claude-2 | openrouter | 0.0000437 |
| anthropic.claude-v1 | bedrock | 0.0000437 |
| anthropic.claude-v2 | bedrock | 0.0000437 |
| gpt-4 | openai | 0.00009 |
| gpt-4-0314 | openai | 0.00009 |
| gpt-4-0613 | openai | 0.00009 |
| openrouter/openai/gpt-4 | openrouter | 0.00009 |
| gpt-4-32k | openai | 0.00018 |
| gpt-4-32k-0314 | openai | 0.00018 |
| gpt-4-32k-0613 | openai | 0.00018 |
## Setup:
```
git clone https://github.com/BerriAI/litellm
```
cd to `benchmark` dir
```
cd litellm/cookbook/benchmark
```
### Install Dependencies
```
pip install litellm click tqdm tabulate termcolor
```
### Configuration
In `benchmark/benchmark.py` select your LLMs, LLM API Key and questions
Supported LLMs: https://docs.litellm.ai/docs/providers
```python
# Define the list of models to benchmark
models = ['gpt-3.5-turbo', 'togethercomputer/llama-2-70b-chat', 'claude-2']
# Enter LLM API keys
os.environ['OPENAI_API_KEY'] = ""
os.environ['ANTHROPIC_API_KEY'] = ""
os.environ['TOGETHERAI_API_KEY'] = ""
# List of questions to benchmark (replace with your questions)
questions = [
"When will BerriAI IPO?",
"When will LiteLLM hit $100M ARR?"
]
```
## Run LLM-Bench
```
python3 benchmark.py
```
## Expected Output
```
Running question: When will BerriAI IPO? for model: claude-2: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 3/3 [00:13<00:00, 4.41s/it]
Benchmark Results for 'When will BerriAI IPO?':
+-----------------+----------------------------------------------------------------------------------+---------------------------+------------+
| Model | Response | Response Time (seconds) | Cost ($) |
+=================+==================================================================================+===========================+============+
| gpt-3.5-turbo | As an AI language model, I cannot provide up-to-date information or predict | 1.55 seconds | $0.000122 |
| | future events. It is best to consult a reliable financial source or contact | | |
| | BerriAI directly for information regarding their IPO plans. | | |
+-----------------+----------------------------------------------------------------------------------+---------------------------+------------+
| togethercompute | I'm not able to provide information about future IPO plans or dates for BerriAI | 8.52 seconds | $0.000531 |
| r/llama-2-70b-c | or any other company. IPO (Initial Public Offering) plans and timelines are | | |
| hat | typically kept private by companies until they are ready to make a public | | |
| | announcement. It's important to note that IPO plans can change and are subject | | |
| | to various factors, such as market conditions, financial performance, and | | |
| | regulatory approvals. Therefore, it's difficult to predict with certainty when | | |
| | BerriAI or any other company will go public. If you're interested in staying | | |
| | up-to-date with BerriAI's latest news and developments, you may want to follow | | |
| | their official social media accounts, subscribe to their newsletter, or visit | | |
| | their website periodically for updates. | | |
+-----------------+----------------------------------------------------------------------------------+---------------------------+------------+
| claude-2 | I do not have any information about when or if BerriAI will have an initial | 3.17 seconds | $0.002084 |
| | public offering (IPO). As an AI assistant created by Anthropic to be helpful, | | |
| | harmless, and honest, I do not have insider knowledge about Anthropic's business | | |
| | plans or strategies. | | |
+-----------------+----------------------------------------------------------------------------------+---------------------------+------------+
```
## Support
**π€ Schedule a 1-on-1 Session:** Book a [1-on-1 session](https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat) with Krrish and Ishaan, the founders, to discuss any issues, provide feedback, or explore how we can improve LiteLLM for you.
|