license: mit
datasets:
  - sinarashidi/alpaca-persian
language:
  - en
  - fa
library_name: transformers

Maral 7B Alpha 1

What is Maral?

Maral is a new large language model specializing in the Persian language. It is based on Mistral and trained on an Alpaca Persian dataset. It is one of the few efforts in the Persian-speaking scene to bring our language to new life in the era of AI.

Also, since Maral is based on Mistral, it's capable of producing English answers as well.

What does "Maral" mean?

Maral is the Persian name of the red deer, a deer species native to Iran. The name was chosen for a few reasons: first, the environmental concerns we have; second, since it is a Persian LLM made by Iranian people, it deserves an Iranian name.

Inference

Prompt Format

This model requires the Guanaco format, which looks like this:

### Human: <prompt>
### Assistant: <answer>

So in your code, you may write prompts like this:

prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"

See the inference sections below for complete examples.
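To avoid repeating the template by hand, the prompt format can be wrapped in a small helper. This is a minimal sketch: the name `build_prompt` and the two tag constants are hypothetical and not part of the model's code, and stripping whitespace from the question is an assumption made here for tidiness.

```python
# Tags taken from the Guanaco-style template shown above.
HUMAN_TAG = "### Human:"
ASSISTANT_TAG = "### Assistant:"


def build_prompt(question: str) -> str:
    """Wrap a user question in the prompt template (hypothetical helper)."""
    # Mirrors the f-string used in this README: the question follows the
    # human tag directly, and the assistant tag is left open for the model.
    return f"{HUMAN_TAG}{question.strip()}\n{ASSISTANT_TAG}"


print(build_prompt("در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"))
```

The returned string can be passed to the tokenizer exactly like the hand-written `prompt` above.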

4 bit Quantization

If you want to use 4-bit quantization, we have a PEFT for you here. You can also find Google Colab notebooks here.

Installing Libraries

pip install transformers accelerate bitsandbytes

NOTE: The bitsandbytes library is only needed for the 8-bit version. Otherwise, it is not necessary.
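Before loading the model, it can be useful to confirm that the packages above are importable. The following is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name, and it only checks that a package can be found, not that its version is compatible.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages from the pip command above; bitsandbytes matters only for 8-bit loading.
required = ["transformers", "accelerate"]
optional_8bit = ["bitsandbytes"]

print("Missing required packages:", missing_packages(required))
print("Missing for 8-bit loading:", missing_packages(optional_8bit))
```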

Inference on a big GPU

If you have a big enough GPU, such as an A100, this code is for you.

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# Load the weights in half precision; device_map="auto" places them on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

# Wrap the question in the Guanaco prompt format.
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"

# Move the input tensors to the device the model was loaded on.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,  # only the most likely token is kept, so decoding is effectively greedy
    temperature=0.5,
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id
)

outputs = model.generate(**inputs, generation_config=generation_config)
# The decoded text contains the prompt followed by the model's answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
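The decoded output includes the prompt itself, so you may want to keep only the assistant's reply. The following is a rough sketch of one way to do that with plain string handling; `extract_answer` is a hypothetical helper, and it assumes the model may continue with another `### Human:` turn, which is cut off.

```python
def extract_answer(decoded: str) -> str:
    """Return the text of the first assistant turn (hypothetical helper)."""
    # Everything after the first "### Assistant:" tag is model output.
    _, tag, rest = decoded.partition("### Assistant:")
    if not tag:
        # No tag found: return the text unchanged apart from whitespace.
        return decoded.strip()
    # If the model went on to invent a new human turn, drop it.
    answer, _, _ = rest.partition("### Human:")
    return answer.strip()


# Made-up decoded string, used only to illustrate the parsing.
sample = "### Human:Question?\n### Assistant: An answer.\n### Human: Another question?"
print(extract_answer(sample))
```

In the inference code above, this would be applied to the string returned by `tokenizer.decode(...)`.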

Inference on a small GPU (Consumer Hardware/Free Colab)

The code is pretty much the same as above, with a slight difference.

  • Make sure bitsandbytes is installed correctly.
  • Load the model with model = AutoModelForCausalLM.from_pretrained(model_name_or_id, load_in_8bit=True, torch_dtype=torch.float16, device_map="auto")

On the free version of Google Colab, you may run into RAM limits. Passing low_cpu_mem_usage=True when loading the model may help.
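Putting the points above together, 8-bit loading might look like the sketch below. The helper names `small_gpu_load_kwargs` and `load_maral_8bit` are hypothetical. The heavy imports are done inside the loader so the helpers can be defined without a GPU. Note that `load_in_8bit=True` is passed directly, as in this README; recent transformers releases may prefer a `BitsAndBytesConfig` passed as `quantization_config` instead.

```python
def small_gpu_load_kwargs(low_cpu_mem_usage: bool = True) -> dict:
    """Keyword arguments for 8-bit loading, following the notes above (hypothetical helper)."""
    kwargs = {"load_in_8bit": True, "device_map": "auto"}
    if low_cpu_mem_usage:
        # Suggested above as a possible remedy for RAM limits on free Colab.
        kwargs["low_cpu_mem_usage"] = True
    return kwargs


def load_maral_8bit(model_name_or_id: str = "MaralGPT/Maral-7B-alpha-1",
                    low_cpu_mem_usage: bool = True):
    """Load the model in 8-bit plus its tokenizer (requires a GPU and bitsandbytes)."""
    # Imported lazily so that defining this function does not need torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_id,
        torch_dtype=torch.float16,
        **small_gpu_load_kwargs(low_cpu_mem_usage),
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)
    return model, tokenizer


print(small_gpu_load_kwargs())
```

Calling `model, tokenizer = load_maral_8bit()` would then replace the two loading lines in the big-GPU example; the rest of the inference code stays the same.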

Known Issues

Special Thanks