---
license: apache-2.0
---

# Model Card for MediaTek Research Breeze-7B-FC-v1_0

## Performance

| Models | #Parameters | Organization | License | Function Calling? | Instruction Following? |
|---|---|---|---|---|---|
| Breeze-7B-Instruct-v1_0 | 7B | MediaTek Research | Apache 2.0 | No | Yes |
| Breeze-7B-FC-v1_0 | 7B | MediaTek Research | Apache 2.0 | Yes | Yes |
| Gorilla-OpenFunctions-v2 | 7B | Gorilla LLM | Apache 2.0 | Yes | No |
| GPT-3.5-Turbo-0125 | | OpenAI | Proprietary | Yes | Yes |
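Since the table lists Breeze-7B-FC-v1_0 as supporting function calling, a short usage sketch may help. It uses the Hugging Face `transformers` chat API; the repository id `MediaTek-Research/Breeze-7B-FC-v1_0`, the tool schema, and the assumption that the tokenizer's chat template accepts a `tools` argument are illustrative guesses, not taken from this card.

```python
# Minimal sketch: function calling with Breeze-7B-FC-v1_0 via transformers.
# The repo id, tool schema, and chat-template tool support are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MediaTek-Research/Breeze-7B-FC-v1_0"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hypothetical tool definition in the common JSON-schema style.
tools = [{
    "name": "get_current_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "What's the weather in Taipei right now?"}]

# Recent transformers releases pass `tools` through to the chat template;
# if this model's template predates that, the schema must be embedded in
# the prompt by hand, following the model's documented format.
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```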

### 📌 Evaluate function calling on EN benchmark

Benchmark: Berkeley function-calling leaderboard

| Models | ↑ Overall | Irrelevance Detection | AST/Simple | AST/Multiple | AST/Parallel | AST/Parallel-Multiple | Exec/Simple | Exec/Multiple | Exec/Parallel | Exec/Parallel-Multiple |
|---|---|---|---|---|---|---|---|---|---|---|
| Breeze-7B-FC-v1_0 (FC) | 86.01 | 74.58 | 90.00 | 93.00 | 82.00 | 83.00 | 98.00 | 92.00 | 88.00 | 75.00 |
| Gorilla-OpenFunctions-v2 (FC) | 85.95 | 60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
| GPT-3.5-Turbo-0125 (FC) | 72.77 | 4.58 | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |
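For context on the column groups: the AST categories score the parsed structure of the generated call (function name and arguments) against reference answers, while the Exec categories actually execute the call and check the result. The snippet below is a rough illustration of the AST idea, not the leaderboard's actual checker.

```python
# Rough illustration of AST-style scoring: compare the parsed structure of a
# generated function call with a gold call, ignoring surface formatting.
# This is a simplification, not the Berkeley leaderboard's real checker.
import ast

def parse_call(source: str):
    """Return (function name, keyword arguments) for a single call expression."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a single function call")
    name = ast.unparse(node.func)
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs

gold = parse_call("get_current_weather(city='Taipei')")
pred = parse_call('get_current_weather( city = "Taipei" )')
print(pred == gold)  # True: whitespace and quoting differences vanish at the AST level
```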

### 📌 Evaluate function calling on ZHTW benchmark

Benchmark: function-calling-leaderboard-for-zhtw

| Models | ↑ Overall | Irrelevance Detection | AST/Simple | AST/Multiple | AST/Parallel | AST/Parallel-Multiple | Exec/Simple | Exec/Multiple | Exec/Parallel | Exec/Parallel-Multiple |
|---|---|---|---|---|---|---|---|---|---|---|
| Breeze-7B-FC-v1_0 (FC) | 77.70 | 71.67 | 82.00 | 86.50 | 76.00 | 65.50 | 87.00 | 88.00 | 80.00 | 57.50 |
| Gorilla-OpenFunctions-v2 (FC) | 75.68 | 53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
| GPT-3.5-Turbo-0125 (FC) | 66.15 | 7.50 | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |

### 📌 Evaluate instruction following on ZHTW benchmark

Benchmark: MT-Bench-TC

| | Win | Tie | Lose |
|---|---|---|---|
| Breeze-7B-FC-v1_0 vs. Breeze-7B-Instruct-v1_0 | 42 (26.3%) | 71 (44.4%) | 47 (29.4%) |