license: apache-2.0

Model Card for MediaTek Research Breeze-7B-FC-v1_0

🏆 Performance

| Models | #Parameters | Organization | License | 🧰 Function Calling? | 💬 Instruction Following? |
|---|---|---|---|---|---|
| Breeze-7B-Instruct-v1_0 | 7B | MediaTek Research | Apache 2.0 | | |
| Breeze-7B-FC-v1_0 | 7B | MediaTek Research | Apache 2.0 | | |
| Gorilla-OpenFunctions-v2 | 7B | Gorilla LLM | Apache 2.0 | | |
| GPT-3.5-Turbo-0125 | | OpenAI | Proprietary | | |

Evaluate function calling on EN benchmark

Berkeley function-calling leaderboard

| Models | ↑ Overall | Irrelevance Detection | AST/Simple | AST/Multiple | AST/Parallel | AST/Parallel-Multiple | Exec/Simple | Exec/Multiple | Exec/Parallel | Exec/Parallel-Multiple |
|---|---|---|---|---|---|---|---|---|---|---|
| Breeze-7B-FC-v1_0 (FC) | 86.89 | 76.25 | 90.00 | 93.00 | 84.00 | 84.00 | 100.00 | 92.00 | 88.00 | 77.50 |
| Gorilla-OpenFunctions-v2 (FC) | 85.95 | 60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
| GPT-3.5-Turbo-0125 (FC) | 72.77 | 4.58 | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |

Evaluate function calling on ZHTW benchmark

function-calling-leaderboard-for-zhtw

| Models | ↑ Overall | Irrelevance Detection | AST/Simple | AST/Multiple | AST/Parallel | AST/Parallel-Multiple | Exec/Simple | Exec/Multiple | Exec/Parallel | Exec/Parallel-Multiple |
|---|---|---|---|---|---|---|---|---|---|---|
| Breeze-7B-FC-v1_0 (FC) | 78.18 | 72.50 | 82.00 | 86.00 | 76.50 | 67.00 | 88.00 | 88.00 | 80.00 | 60.00 |
| Gorilla-OpenFunctions-v2 (FC) | 75.68 | 53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
| GPT-3.5-Turbo-0125 (FC) | 66.15 | 7.50 | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |

Evaluate instruction following on EN benchmark

MT-Bench

| Comparison | Win | Tie | Lose |
|---|---|---|---|
| Breeze-7B-FC-v1_0 vs. Breeze-7B-Instruct-v1_0 | 29 (18.1%) | 55 (34.3%) | 76 (47.5%) |

Evaluate instruction following on ZHTW benchmark

MT-Bench-TC

| Comparison | Win | Tie | Lose |
|---|---|---|---|
| Breeze-7B-FC-v1_0 vs. Breeze-7B-Instruct-v1_0 | 35 (21.9%) | 73 (45.6%) | 52 (32.5%) |

👩‍💻 How to use

Dependency

Install the mtkresearch package:

git clone https://github.com/mtkresearch/mtkresearch.git
cd mtkresearch
pip install -e .

Hosting with vLLM

from vllm import LLM, SamplingParams

num_gpu = 1  # number of GPUs to shard the model across; adjust to your hardware

llm = LLM(
    model='MediaTek-Research/Breeze-7B-FC-v1_0',
    tensor_parallel_size=num_gpu,
    gpu_memory_utilization=0.7
)

# Stop generation at the '<|im_end|>' token
instance_end_token_id = llm.get_tokenizer().convert_tokens_to_ids('<|im_end|>')
params = SamplingParams(
    temperature=0.01,
    top_p=0.01,
    max_tokens=4096,
    repetition_penalty=1.1,
    stop_token_ids=[instance_end_token_id]
)

def _inference(prompt, llm, params):
    return llm.generate(prompt, params)[0].outputs[0].text

Instruction following

from mtkresearch.llm.prompt import MRPromptV2

sys_prompt = 'You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.'

prompt_engine = MRPromptV2()

conversations = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": "請問什麼是深度學習?"},
]

prompt = prompt_engine.get_prompt(conversations)


output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)

print(result)
# {'role': 'assistant',
#  'content': '深度學習(Deep Learning)是一種機器學習方法,它模仿人類大腦的神經網路結構來處理複雜的數據和任務。在深度學習中,模型由多層人工神經元組成,每個神經元之間有權重連接,並通過非線性轉換進行計算。這些層與層之間的相互作用使模型能夠學習複雜的函數關係或模式,從而解決各種問題,如圖像識別、自然語言理解、語音辨識等。深度學習通常需要大量的數據和強大的計算能力,因此經常使用圖形處理器(GPU)或特殊的加速器來執行。'}
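
To continue the dialogue, the parsed assistant message can be appended to `conversations` before the next user turn, the same pattern the function-calling example uses for tool calls. A minimal sketch follows; `add_turn` is a hypothetical helper, and the shortened assistant reply and the follow-up question are made up for illustration.

```python
def add_turn(conversations, assistant_message, next_user_content):
    """Return a new conversation list extended by one assistant/user exchange."""
    return conversations + [
        assistant_message,
        {"role": "user", "content": next_user_content},
    ]

conversations = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "請問什麼是深度學習?"},  # "What is deep learning?"
]
# Stand-in for the dict returned by prompt_engine.parse_generated_str(...)
result = {"role": "assistant", "content": "深度學習是一種機器學習方法。"}

# Follow-up: "How does it differ from traditional machine learning?"
conversations = add_turn(conversations, result, "它和傳統機器學習有什麼不同?")

print([turn["role"] for turn in conversations])
# ['system', 'user', 'assistant', 'user']
```

The extended list can then be passed to `prompt_engine.get_prompt(conversations)` as before.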

Function Calling

import json

from mtkresearch.llm.prompt import MRPromptV2

functions = [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
]

def faked_get_current_weather(location, unit=None):
    return {'temperature': 30}

mapping = {
    'get_current_weather': faked_get_current_weather
}

prompt_engine = MRPromptV2()

# stage 1: query
conversations = [
    {"role": "user", "content": "台北目前溫度是攝氏幾度?"},
]

prompt = prompt_engine.get_prompt(conversations, functions=functions)

output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)

print(result) 
# {'role': 'assistant', 
#  'tool_calls': [
#    {'id': 'call_U9bYCBRAbF639uUqfwehwSbw', 'type': 'function', 
#     'function': {'name': 'get_current_weather', 'arguments': '{"location": "台北, 台灣", "unit": "攝氏"}'}}]}

# stage 2: execute called functions
conversations.append(result)

tool_call = result['tool_calls'][0]
func_name = tool_call['function']['name']
func = mapping[func_name]
arguments = json.loads(tool_call['function']['arguments'])
called_result = func(**arguments)

# stage 3: return the function result to the model and ask for the final answer
conversations.append(
    {
        'role': 'tool',
        'tool_call_id': tool_call['id'],
        'name': func_name,
        'content': json.dumps(called_result)
    }
)

prompt = prompt_engine.get_prompt(conversations, functions=functions)

output_str2 = _inference(prompt, llm, params)
result2 = prompt_engine.parse_generated_str(output_str2)
print(result2)
# {'role': 'assistant', 'content': '台北目前的溫度是攝氏30度。'}
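
The example above runs only the first entry of `tool_calls` and passes the arguments through unchecked. The model may emit several calls (the parallel categories in the benchmarks) or values outside the schema (the sample output uses "攝氏", which is not in the `unit` enum). The sketch below is one hedged way to cover both cases. `check_arguments` and `run_tool_calls` are hypothetical helpers, the check covers only `required` and `enum`, and the message shapes are assumed to match those shown above. Whether the model recovers when an error message is fed back as tool content is untested here.

```python
import json

def check_arguments(schema, arguments):
    """Return a list of problems found against `required` and `enum` only."""
    params = schema.get("parameters", {})
    problems = [f"missing required argument: {name}"
                for name in params.get("required", []) if name not in arguments]
    for name, value in arguments.items():
        spec = params.get("properties", {}).get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}={value!r} not in {spec['enum']}")
    return problems

def run_tool_calls(assistant_message, functions, mapping):
    """Execute every tool call and return one 'tool' message per call."""
    schemas = {f["name"]: f for f in functions}
    tool_messages = []
    for call in assistant_message.get("tool_calls", []):
        name = call["function"]["name"]
        arguments = json.loads(call["function"]["arguments"])
        if name not in schemas or name not in mapping:
            problems = [f"unknown function: {name}"]
        else:
            problems = check_arguments(schemas[name], arguments)
        if problems:
            content = {"error": problems}  # report instead of calling
        else:
            content = mapping[name](**arguments)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "name": name,
            "content": json.dumps(content, ensure_ascii=False),
        })
    return tool_messages

# Compact copy of the schema and fake function from the example above.
functions = [{
    "name": "get_current_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}]
mapping = {"get_current_weather": lambda location, unit=None: {"temperature": 30}}

# Stand-in for a parsed model output with two calls; the second has a bad enum value.
message = {"role": "assistant", "tool_calls": [
    {"id": "call_1", "type": "function", "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "台北, 台灣", "unit": "celsius"}'}},
    {"id": "call_2", "type": "function", "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "台北, 台灣", "unit": "攝氏"}'}},
]}
tool_messages = run_tool_calls(message, functions, mapping)
```

Each returned message can be appended to `conversations`, after the assistant message itself, before calling `prompt_engine.get_prompt(conversations, functions=functions)` again.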