Llama-3.3-FakeSwallow-70B-Instruct-v0.1

🚨 Only for research purpose. This model may have repetition issues.

This is a merge of pre-trained language models created using mergekit.

  • 2024.12.11 : The model weight updated.

Test environment

🔧 HACK: Try oobabooga/text-generation-webui#5885 if multiple EOS tokens doesn't work.

This model was tested using text-generation-webui. I use preset min_p with temperature=1 for Generation.

Usage

This format must be adhered to strictly, as deviations may result in less optimal outputs from the model.

The template used to construct a prompt for the instruct model is specified as follows:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>

{USER_MESSAGE}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

For the "{SYSTEM_PROMPT}" part, We recommend using "あなたは誠実で優秀な日本人のアシスタントです。" or "You are a helpful assistant."

For the "{USER_MESSAGE}" part, We recommend using {instruction}\n{input}

In other words, We recommend the following:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

あなたは誠実で優秀な日本人のアシスタントです。<|eot_id|><|start_header_id|>user<|end_header_id|>

{instruction}
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Use the instruct model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nitky/Llama-3.3-FakeSwallow-70B-Instruct-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

Merge Details

Merge Method

This model was merged using the task arithmetic merge method using meta-llama/Llama-3.1-70B as a base.

Models Merged

The following models were included in the merge:

Configuration

The following YAML configuration was used to produce this model:

merge_method: task_arithmetic
base_model: meta-llama/Llama-3.1-70B
models:
  - model: tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
    parameters:
      weight: 1.0
  - model: meta-llama/Llama-3.3-70B-Instruct
    parameters:
      weight: 0.998
dtype: bfloat16
name: Llama-3.3-FakeSwallow-70B-Instruct-v0.1
Downloads last month
50
Safetensors
Model size
70.6B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for nitky/Llama-3.3-FakeSwallow-70B-Instruct-v0.1