crm_llm_leaderboard / crm-results /hf_leaderboard_latency_cost.csv
Model Name,Use Case Type,Version,Platform,Mean Latency (sec) per Request,Mean Output Tokens,Mean Cost per 1K Requests,Cost Band,,Model id,Cost per 1M input tokens,Cost per 1M output tokens,,,,Percentile,From,To,,Min,Max
AI21 Jamba-Instruct,Long,,AI21,4.0,232.9,1.6,Medium,,GPT 3.5 Turbo,0.5,1.5,,,0%,0.43,0.43,1.61,,0.43,61.11
AI21 Jamba-Instruct,Short,,AI21,4.0,243.9,0.5,Low,,GPT 4 Turbo,10,30,,,33%,1.61,1.61,9.28,,,
Claude 3 Haiku,Long,,Bedrock,2.8,236.9,1.0,Low,,GPT4-o,5,15,,,67%,9.28,9.28,61.11,,,
Claude 3 Haiku,Short,,Bedrock,2.2,245.4,0.4,Low,,Claude 3 Haiku,0.25,1.25,,,100%,61.11,,,,,
Claude 3 Opus,Long,,Bedrock,12.2,242.7,61.1,High,,Claude 3 Opus,15,75,,,,,,,,,
Claude 3 Opus,Short,,Bedrock,8.4,243.2,25.4,High,,AI21 Jamba-Instruct,0.5,0.7,,,,,,,,,
Cohere Command R+,Long,,Bedrock,7.7,245.7,11.7,High,,Cohere Command Text,1.5,2,,,,,,,,,
Cohere Command R+,Short,,Bedrock,7.1,249.9,5.1,Medium,,Cohere Command R+,3,15,,,,,,,,,
Cohere Command Text,Long,,Bedrock,12.9,238.7,4.3,Medium,,Gemini Pro 1,0.5,1.5,,,,,,,,,
Cohere Command Text,Short,,Bedrock,9.6,245.6,1.1,Low,,Gemini Pro 1.5,3.5,7,,,,,,,,,
Gemini Pro 1.5,Long,,Google,5.5,245.7,11.0,High,,,,,,,,,,,,,
Gemini Pro 1.5,Short,,Google,5.4,247.5,3.3,Medium,,,,,,,,,,,,,
Gemini Pro 1,Long,,Google,6.0,228.9,1.7,Medium,,,,,,,,,,,,,
Gemini Pro 1,Short,,Google,4.4,247.4,0.6,Low,,,,,,,,,,,,,
GPT 3.5 Turbo,Long,,OpenAI,4.5,249.9,1.6,Low,,,,,,,,,,,,,
GPT 3.5 Turbo,Short,,OpenAI,4.2,238.3,0.6,Low,,,,,,,,,,,,,
GPT 4 Turbo,Long,,OpenAI,12.3,247.6,32.0,High,,,,,,,,,,,,,
GPT 4 Turbo,Short,,OpenAI,12.3,250.0,11.7,High,,,,,,,,,,,,,
GPT4-o,Long,,OpenAI,5.1,248.4,15.9,High,,,,,,,,,,,,,
GPT4-o,Short,,OpenAI,5.0,250.0,5.8,Medium,,,,,,,,,,,,,
Mistral 7B,Long,Mistral-7B-Instruct-v0.2,Self-host (g5.48xlarge),8.83,242.0,16.5,High,,,,,,,,,,,,,
Mistral 7B,Short,Mistral-7B-Instruct-v0.2,Self-host (g5.48xlarge),8.31,247.0,15.5,High,,,,,,,,,,,,,
LLaMA 3 8B,Long,Meta-Llama-3-8B-Instruct,Self-host (g5.48xlarge),3.76,251.5,7.0,Medium,,,,,,,,,,,,,
LLaMA 3 8B,Short,Meta-Llama-3-8B-Instruct,Self-host (g5.48xlarge),3.23,243.6,6.0,Medium,,,,,,,,,,,,,
LLaMA 3 70B,Long,llama-3-70b-instruct,Self-host (p4d.24xlarge),20.1,243.9,67.7,High,,,,,,,,,,,,,
LLaMA 3 70B,Short,llama-3-70b-instruct,Self-host (p4d.24xlarge),29.4,251.2,99.0,High,,,,,,,,,,,,,
Mixtral 8x7B,Long,mixtral-8x7b-instruct,Self-host (p4d.24xlarge),2.44,248.5,8.22,Medium,,,,,,,,,,,,,
Mixtral 8x7B,Short,mixtral-8x7b-instruct,Self-host (p4d.24xlarge),2.41,250.0,8.11,Medium,,,,,,,,,,,,,
SF-TextBase 7B,Long,CRM-TextBase-7b-22k-g5 (endpoint),Self-host (g5.48xlarge),8.99,248.5,16.80,High,,,,,,,,,,,,,
SF-TextBase 7B,Short,CRM-TextBase-7b-22k-g5 (endpoint),Self-host (g5.48xlarge),8.29,248.7,15.50,High,,,,,,,,,,,,,
SF-TextBase 70B,Long,TextBase-70B-8K,Self-host (p4de.24xlarge),6.52,253.7,28.17,High,,,,,,,,,,,,,
SF-TextBase 70B,Short,TextBase-70B-8K,Self-host (p4de.24xlarge),6.24,249.7,26.96,High,,,,,,,,,,,,,
SF-TextSum,Long,CRM-TSUM-7b-22k-g5 (endpoint),Self-host (g5.48xlarge),8.85,244.0,16.55,High,,,,,,,,,,,,,
SF-TextSum,Short,CRM-TSUM-7b-22k-g5 (endpoint),Self-host (g5.48xlarge),8.34,250.4,15.60,High,,,,,,,,,,,,,
XGen 22B,Long,EinsteinXgen2E4DSStreaming (endpoint),Self-host (p4de.24xlarge),3.71,250.0,16.03,High,not able to get response for large token requests (5K-token input),,,,,,,,,,,,
XGen 22B,Short,EinsteinXgen2E4DSStreaming (endpoint),Self-host (p4de.24xlarge),2.64,250.0,11.40,High,,,,,,,,,,,,,