license: mit
datasets:
- sinarashidi/alpaca-persian
language:
- en
- fa
library_name: transformers
Maral 7B Alpha 1
What is Maral?
Maral is a new large language model specializing in the Persian language. It is based on Mistral and trained on an Alpaca Persian dataset. Maral is one of the few efforts in the Persian-speaking community to bring our language new life in the era of AI.
Also, since Maral is based on Mistral, it's capable of producing English answers as well.
What does "Maral" mean?
Maral is the Persian name for the red deer, a deer species native to Iran. The name was chosen for a few reasons: first, our environmental concerns; second, since Maral is a Persian LLM made by Iranian people, it deserves an Iranian name.
Inference
Prompt Format
This model expects prompts in the Guanaco format:
### Human: <prompt>
### Assistant: <answer>
So in your code, you may write prompts like this:
# "Who was the president of the United States in 1996?"
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"
More details are given in the inference sections below.
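If you call the model from several places, it can help to keep the prompt template in one function. The sketch below is a hypothetical helper pair, not part of the Maral release: `build_prompt` reproduces the card's f-string exactly (no space after `### Human:`), and `extract_answer` trims a decoded generation down to the assistant's reply. Because `model.generate` on a causal LM returns the prompt tokens followed by the new ones, the decoded string starts with the prompt; the helper assumes the `### Assistant:` marker survives decoding unchanged and also cuts off any extra `### Human:` turn the model may start on its own.

```python
# Hypothetical helpers for the Guanaco prompt format described above.
GUANACO_USER = "### Human:"
GUANACO_ASSISTANT = "### Assistant:"


def build_prompt(question: str) -> str:
    """Wrap a user question in the Guanaco format (same as the card's f-string)."""
    return f"{GUANACO_USER}{question}\n{GUANACO_ASSISTANT}"


def extract_answer(decoded: str) -> str:
    """Return only the assistant's reply from a decoded generation.

    Keeps the text after the first "### Assistant:" marker and stops at any
    following "### Human:" turn. If no marker is present, the stripped input
    is returned unchanged.
    """
    _, sep, rest = decoded.partition(GUANACO_ASSISTANT)
    if not sep:
        return decoded.strip()
    answer, _, _ = rest.partition(GUANACO_USER)
    return answer.strip()


if __name__ == "__main__":
    p = build_prompt("Who was the president of the USA in 1996?")
    # Simulated decoded output: prompt echo, answer, then an unwanted new turn.
    fake_output = p + " Bill Clinton.\n### Human: and in 2000?"
    print(extract_answer(fake_output))  # Bill Clinton.
```

In the inference example, this would be applied to `tokenizer.decode(outputs[0], skip_special_tokens=True)`.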
4-bit Quantization
If you want to use 4-bit quantization, we have a PEFT model for you here. You can also find Google Colab notebooks here.
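As a rough back-of-the-envelope check of why quantization (or a large GPU) matters, the weights alone take about parameter count × bits per parameter ÷ 8 bytes. The sketch below assumes roughly 7 billion parameters for a 7B model and counts only the weights; activations, the KV cache, the CUDA context, and quantization overhead are ignored, so real memory use will be higher. These are estimates, not measured figures for Maral.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Rough size of the model weights alone, in gigabytes (1 GB = 1e9 bytes).

    Ignores activations, the KV cache and quantization overhead
    (scales/zero-points), so actual usage is somewhat higher.
    """
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n = 7e9  # assumption: "7B" rounded to 7 billion parameters
    for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>8}: ~{weight_memory_gb(n, bits):.1f} GB")
```

Under these assumptions, float16 weights need about 14 GB, while 4-bit weights need about 3.5 GB, which is why the 4-bit route fits on much smaller GPUs.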
Installing Libraries
pip install transformers accelerate bitsandbytes
NOTE: The bitsandbytes library is only needed for the 8-bit version; otherwise, it is not necessary.
Inference on a big GPU
If you have a large enough GPU, such as an A100, this code is for you.
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# Load the weights in half precision; device_map="auto" places them on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

# "Who was the president of the United States in 1996?"
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"

# Move the input tensors to the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,
    temperature=0.5,
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)
# The decoded text contains the prompt followed by the model's answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
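The configuration above combines `do_sample=True` with `top_k=1`. Top-k filtering keeps only the k most likely tokens before sampling, so with k=1 a single token survives and each step behaves like greedy decoding; `temperature` rescales the logits but does not change their order, so it has no effect on which token is picked in that case. The sketch below is a simplified pure-Python illustration of one temperature plus top-k sampling step, with a hypothetical function name; it is not the transformers implementation, which applies these steps to tensors.

```python
import math
import random


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample one token index from `logits` with temperature scaling and top-k filtering.

    Simplified illustration only; not the transformers code path.
    """
    # Temperature scaling divides every logit, so their order is preserved.
    scaled = [x / temperature for x in logits]
    # Keep the indices of the k largest scaled logits.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax weights over the survivors (subtract the max for numerical stability).
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 3.0, 2.5, -1.0]
    rng = random.Random(0)
    # With k=1 every draw returns index 1 (the argmax), whatever the temperature.
    print({sample_top_k(logits, k=1, temperature=t, rng=rng) for t in (0.5, 1.0, 2.0)})  # {1}
    # With k=2, draws are spread between the two most likely indices, 1 and 2.
    print(sorted({sample_top_k(logits, k=2, rng=rng) for _ in range(200)}))
```

If you want more varied answers, raising `top_k` (or setting `do_sample=False` to make the greedy behavior explicit) are the knobs to experiment with.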