Llama-Breeze2-3B-Instruct-v0_1

【Paper】◇【Kaggle Demo】◇【Collection】

The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on LLaMA with Vision-Aware and Function-Calling Capabilities

Llama Breeze 2 is a suite of advanced multi-modal language models, available in 3B and 8B parameter configurations, specifically designed to enhance Traditional Chinese language representation. Building upon the LLaMA 3.2, Breeze 2 continues pretraining on an extensive corpus to enhance the linguistic and cultural heritage of Traditional Chinese. It incorporates vision-aware capabilities through a visual encoder and a bridge module, and supports function-calling via prompt templates and post-training on function-calling data.

We list all contributors in alphabetical order of their first names, as follows: Chan-Jan Hsu (許湛然), Chia-Sheng Liu (劉佳昇), Meng-Hsi Chen (陳孟羲), Muxi Chen (陳沐希), Po-Chun Hsu (許博竣), Yi-Chang Chen (陳宜昌), and the supervisor Da-Shan Shiu (許大山).

Installation

pip3 install transformers==4.47.0
pip3 install -U mtkresearch

from transformers import AutoModel, AutoTokenizer
from transformers import GenerationConfig
import torch
from mtkresearch.llm.prompt import MRPromptV3

model_id = 'MediaTek-Research/Llama-Breeze2-3B-Instruct-v0_1'
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto',
    img_context_token_id=128212
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, use_fast=False)

generation_config = GenerationConfig(
  max_new_tokens=2048,
  do_sample=True,
  temperature=0.01,
  top_p=0.01,
  repetition_penalty=1.1,
  eos_token_id=128009
)

prompt_engine = MRPromptV3()

sys_prompt = 'You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.'

def _inference(tokenizer, model, generation_config, prompt, pixel_values=None):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    if pixel_values is None:
        output_tensors = model.generate(**inputs, generation_config=generation_config)
    else:
        output_tensors = model.generate(**inputs, generation_config=generation_config, pixel_values=pixel_values.to(model.dtype))
    output_str = tokenizer.decode(output_tensors[0])
    return output_str

Feature: Instruction Following

conversations = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": "請問什麼是深度學習？"},
]

prompt = prompt_engine.get_prompt(conversations)
output_str = _inference(tokenizer, model, generation_config, prompt)
result = prompt_engine.parse_generated_str(output_str)
print(result)
# {'role': 'assistant', 'content': '深度學習是一種人工智慧技術，主要是透過類似於大腦神經網路的方式來處理和分析資料。這個方法利用多層的人工神經元模仿生物神經網路的運作模式，讓電腦能夠從大量數據中學習並做出預測或決策。\n\n簡單來說，深度學習就是一種用機器學習的方式來訓練電腦，使其能夠像人類一樣理解、分辨及解決問題。這項技術已被廣泛應用在各種領域，如圖像識別、自然語言處理、語音辨識以及自動駕駛等方面。'}

Feature: Visual Instruction Following

Example Image:

conversations = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": [
        {
            "type": "image",
            "image_path": /path/to/example-image,
        },
        {
            "type": "text",
            "text": "請問第二名可獲得多少獎金？"
        },
    ]},
]

prompt, pixel_values = prompt_engine.get_prompt(conversations)
output_str = _inference(tokenizer, model, generation_config, prompt, pixel_values=pixel_values)
result = prompt_engine.parse_generated_str(output_str)
print(result)
# {'role': 'assistant', 'content': '第二名可獲得20萬元整。'}

Feature: Function Calling

import json

functions = [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
]

def fake_get_current_weather(location, unit=None):
    return {'temperature': 30}

mapping = {
    'get_current_weather': fake_get_current_weather
}

# stage 1: query
conversations = [
    {"role": "user", "content": "請問台北目前溫度是攝氏幾度？"},
]

prompt = prompt_engine.get_prompt(conversations, functions=functions)

output_str = _inference(tokenizer, model, generation_config, prompt)
result = prompt_engine.parse_generated_str(output_str)

print(result)
# {'role': 'assistant', 'tool_calls': [{'id': 'call_iuwELWUShiAKE16CVoumawZ4', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"location": "台北", "unit": "celsius"}'}}]}

# stage 2: execute called functions
conversations.append(result)

tool_call = result['tool_calls'][0]
func_name = tool_call['function']['name']
func = mapping[func_name]
arguments = json.loads(tool_call['function']['arguments'])
called_result = func(**arguments)

# stage 3: put executed results
conversations.append(
    {
        'role': 'tool',
        'tool_call_id': tool_call['id'],
        'name': func_name,
        'content': json.dumps(called_result)
    }
)

prompt = prompt_engine.get_prompt(conversations, functions=functions)

output_str2 = _inference(tokenizer, model, generation_config, prompt)
result2 = prompt_engine.parse_generated_str(output_str2)
print(result2)
# {'role': 'assistant', 'content': '台北目前的溫度是攝氏30度。'}

Citation

@article{breeze2,
  title={The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on LLaMA with Vision-Aware and Function-Calling Capabilities},
  author={Breeze Team, MediaTek Research},
  journal={arXiv},
  year={2025},
  url={https://arxiv.org/abs/2501.13921}
}

MediaTek-Research
/

Llama-Breeze2-3B-Instruct