Maestrale chat alpha ༄
By @efederici and @mferraretto
Model description
- Language Model: Mistral-7b adapted to Italian through continued pre-training on a large-scale, curated, high-quality corpus, then merged with occiglot.
- Fine-Tuning: supervised fine-tuning (SFT) on 1.7M conversations/instructions for 2 epochs.
v0.4
- Agent
- Improved truthfulness
- Improved Math & Reasoning capabilities
- Mermaid mindmaps
- More Latin translations, poems, ...
This model uses the ChatML prompt format:
<|im_start|>system
Sei un assistente utile.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
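To make the template above concrete, here is a minimal sketch that renders a list of messages as a ChatML string in plain Python. The helper name `format_chatml` and the sample question are illustrative assumptions, not part of the model's code, and the exact whitespace emitted by the model's own chat template may differ; in practice `tokenizer.apply_chat_template` remains the reference.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; the tokenizer's chat template
    is the authoritative implementation.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Sei un assistente utile."},
    {"role": "user", "content": "Qual è la capitale d'Italia?"},
]
prompt = format_chatml(messages)
print(prompt)
```

Leaving the assistant turn open at the end is what lets generation stop naturally at the next `<|im_end|>` token.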
Scores
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
hellaswag_it | 1 | none | 0 | acc | 0.5220 | ± | 0.0052 |
hellaswag_it | 1 | none | 0 | acc_norm | 0.6887 | ± | 0.0048 |
arc_it | 1 | none | 0 | acc | 0.1762 | ± | 0.0111 |
arc_it | 1 | none | 0 | acc_norm | 0.5090 | ± | 0.0146 |
m_mmlu_it | 0 | none | 5 | acc | 0.569 | ± | 0.0043 |
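One way to read the Value ± Stderr columns is as an approximate confidence interval. The sketch below assumes a normal approximation (z = 1.96 for roughly 95% coverage), which is an assumption of this example and not something the evaluation output states; the function name `confidence_interval` is hypothetical, and the numbers are copied from the table above.

```python
# (task, metric, value, stderr) copied from the scores table
scores = [
    ("hellaswag_it", "acc", 0.5220, 0.0052),
    ("hellaswag_it", "acc_norm", 0.6887, 0.0048),
    ("arc_it", "acc", 0.1762, 0.0111),
    ("arc_it", "acc_norm", 0.5090, 0.0146),
    ("m_mmlu_it", "acc", 0.569, 0.0043),
]


def confidence_interval(value, stderr, z=1.96):
    """Return (low, high) under a normal approximation (an assumption)."""
    margin = z * stderr
    return value - margin, value + margin


for task, metric, value, stderr in scores:
    low, high = confidence_interval(value, stderr)
    print(f"{task:<13} {metric:<9} {value:.4f}  [{low:.4f}, {high:.4f}]")
```

Intervals this narrow suggest the gaps between `acc` and `acc_norm` on the same task are not sampling noise, though they say nothing about how the scores compare with other models.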
Usage:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
    TextStreamer,
)
import torch

model_id = "mii-llm/maestrale-chat-v0.4-alpha-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# 8-bit loading requires the bitsandbytes package and a CUDA GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Sampling settings; generation stops at the ChatML end-of-turn token.
gen = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
    top_k=50,
    top_p=0.95,
    max_new_tokens=500,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)

# Print tokens as they are generated, without echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)

messages = [
    {"role": "system", "content": "Sei un assistente utile."},
    {"role": "user", "content": "{prompt}"},  # replace {prompt} with the user's request
]

# Render the messages as a ChatML string, then tokenize it.
temp = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move the inputs to the device the model was placed on.
inputs = tokenizer(temp, return_tensors="pt").to(model.device)

with torch.no_grad():
    _ = model.generate(
        **inputs,
        streamer=streamer,
        generation_config=gen,
    )
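The `{prompt}` placeholder in the snippet above stands for the user's text. As a minimal sketch, the `messages` list could be built by a small helper that also carries earlier turns. The name `build_messages`, the `history` parameter, and the sample dialogue are assumptions made for illustration; the card does not document how the model behaves on multi-turn input.

```python
DEFAULT_SYSTEM_PROMPT = "Sei un assistente utile."


def build_messages(user_prompt, history=None, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Return a chat `messages` list: system prompt, prior turns, new user turn.

    `history` is a list of (user_text, assistant_text) pairs from earlier
    turns. Hypothetical helper, not part of the model's code.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages(
    "E quanti abitanti ha?",
    history=[("Qual è la capitale d'Italia?", "La capitale d'Italia è Roma.")],
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the hard-coded `messages`.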
Intended uses & limitations
It's an alpha version; it's not safe, but it can refuse to answer.