---
license: llama3.2
language:
- en
- zh
---

# Llama-Breeze2-3B-Instruct-v0_1

【[Paper](https://arxiv.org/abs/2501.13921)】◇【[Kaggle Demo](https://www.kaggle.com/code/ycckaggle/demo-breeze-2-3b)】◇【[Collection](https://huggingface.co/collections/MediaTek-Research/llama-breeze2-67863158443a06a72dd29900)】

**The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on LLaMA with Vision-Aware and Function-Calling Capabilities**

Llama Breeze 2 is a suite of advanced multi-modal language models, available in 3B and 8B parameter configurations, specifically designed to enhance Traditional Chinese language representation.
Building upon [LLaMA 3.2](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/), Breeze 2 continues pretraining on an extensive corpus to better capture the linguistic and cultural heritage of Traditional Chinese.
It incorporates vision-aware capabilities through a visual encoder and a bridge module, and supports function calling via prompt templates and post-training on function-calling data.

*Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.*

*We list all contributors in alphabetical order of their first names, as follows: Chan-Jan Hsu (許湛然), Chia-Sheng Liu (劉佳昇), Meng-Hsi Chen (陳孟羲), Muxi Chen (陳沐希), Po-Chun Hsu (許博竣), Yi-Chang Chen (陳宜昌), and the supervisor Da-Shan Shiu (許大山).*

## Installation

```
pip3 install transformers==4.47.0
pip3 install -U mtkresearch
```

```python
from transformers import AutoModel, AutoTokenizer
from transformers import GenerationConfig
import torch
from mtkresearch.llm.prompt import MRPromptV3

model_id = 'MediaTek-Research/Llama-Breeze2-3B-Instruct-v0_1'
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto',
    img_context_token_id=128212
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, use_fast=False)

generation_config = GenerationConfig(
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.01,
    top_p=0.01,
    repetition_penalty=1.1,
    eos_token_id=128009
)

prompt_engine = MRPromptV3()

sys_prompt = 'You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.'

def _inference(tokenizer, model, generation_config, prompt, pixel_values=None):
    # Tokenize the prompt and generate; pass pixel_values only for image inputs.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    if pixel_values is None:
        output_tensors = model.generate(**inputs, generation_config=generation_config)
    else:
        output_tensors = model.generate(**inputs, generation_config=generation_config,
                                        pixel_values=pixel_values.to(model.dtype))
    output_str = tokenizer.decode(output_tensors[0])
    return output_str
```
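Every example below follows the same three-step pattern: build a prompt with `prompt_engine.get_prompt`, generate with `_inference`, and parse the output with `prompt_engine.parse_generated_str`. If you prefer a single call, a minimal convenience wrapper is sketched below; the `chat` helper is illustrative only and not part of the `mtkresearch` API:

```python
# Illustrative convenience wrapper (not part of the mtkresearch API).
# Covers text-only and function-calling turns; for image inputs,
# prompt_engine.get_prompt also returns pixel_values (see the visual example below).
def chat(conversations, functions=None):
    if functions is None:
        prompt = prompt_engine.get_prompt(conversations)
    else:
        prompt = prompt_engine.get_prompt(conversations, functions=functions)
    output_str = _inference(tokenizer, model, generation_config, prompt)
    return prompt_engine.parse_generated_str(output_str)
```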
## Feature: Instruction Following

```python
conversations = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": "請問什麼是深度學習?"},
]

prompt = prompt_engine.get_prompt(conversations)
output_str = _inference(tokenizer, model, generation_config, prompt)
result = prompt_engine.parse_generated_str(output_str)
print(result)
# {'role': 'assistant', 'content': '深度學習是一種人工智慧技術,主要是透過類似於大腦神經網路的方式來處理和分析資料。這個方法利用多層的人工神經元模仿生物神經網路的運作模式,讓電腦能夠從大量數據中學習並做出預測或決策。\n\n簡單來說,深度學習就是一種用機器學習的方式來訓練電腦,使其能夠像人類一樣理解、分辨及解決問題。這項技術已被廣泛應用在各種領域,如圖像識別、自然語言處理、語音辨識以及自動駕駛等方面。'}
```

## Feature: Visual Instruction Following

Example Image:

![img_example](https://huggingface.co/MediaTek-Research/Breeze2-8B-Instruct-v0_1/resolve/main/misc/test_big_data.png)

```python
conversations = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": [
        {
            "type": "image",
            "image_path": "/path/to/example-image",
        },
        {
            "type": "text",
            "text": "請問第二名可獲得多少獎金?"
        },
    ]},
]

prompt, pixel_values = prompt_engine.get_prompt(conversations)
output_str = _inference(tokenizer, model, generation_config, prompt, pixel_values=pixel_values)
result = prompt_engine.parse_generated_str(output_str)
print(result)
# {'role': 'assistant', 'content': '第二名可獲得20萬元整。'}
```
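To run the visual example, replace the `/path/to/example-image` placeholder with a real local path. One way, sketched below on the assumption that the `requests` package is installed, is to download the example image embedded above:

```python
import requests

# Fetch the example image shown above and save it locally; pass the saved
# path as "image_path" in the conversation. The URL is the one embedded in this card.
url = ('https://huggingface.co/MediaTek-Research/Breeze2-8B-Instruct-v0_1'
       '/resolve/main/misc/test_big_data.png')
with open('test_big_data.png', 'wb') as f:
    f.write(requests.get(url).content)
```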
## Feature: Function Calling

```python
import json

functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
]

def fake_get_current_weather(location, unit=None):
    return {'temperature': 30}

mapping = {
    'get_current_weather': fake_get_current_weather
}

# stage 1: query
conversations = [
    {"role": "user", "content": "請問台北目前溫度是攝氏幾度?"},
]

prompt = prompt_engine.get_prompt(conversations, functions=functions)
output_str = _inference(tokenizer, model, generation_config, prompt)
result = prompt_engine.parse_generated_str(output_str)
print(result)
# {'role': 'assistant', 'tool_calls': [{'id': 'call_iuwELWUShiAKE16CVoumawZ4', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"location": "台北", "unit": "celsius"}'}}]}
```

```python
# stage 2: execute the called function
conversations.append(result)

tool_call = result['tool_calls'][0]
func_name = tool_call['function']['name']
func = mapping[func_name]
arguments = json.loads(tool_call['function']['arguments'])
called_result = func(**arguments)

# stage 3: put the executed result back into the conversation
conversations.append(
    {
        'role': 'tool',
        'tool_call_id': tool_call['id'],
        'name': func_name,
        'content': json.dumps(called_result)
    }
)

prompt = prompt_engine.get_prompt(conversations, functions=functions)
output_str2 = _inference(tokenizer, model, generation_config, prompt)
result2 = prompt_engine.parse_generated_str(output_str2)
print(result2)
# {'role': 'assistant', 'content': '台北目前的溫度是攝氏30度。'}
```

## Citation

```
@article{breeze2,
  title={The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on LLaMA with Vision-Aware and Function-Calling Capabilities},
  author={Breeze Team, MediaTek Research},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2501.13921}
}
```