---
license: mit
datasets:
- sinarashidi/alpaca-persian
language:
- en
- fa
library_name: transformers
---

# Maral 7B Alpha 1

<p align="center">
  <img src="maral-7b-announce.png" width=256 height=256 />
</p>

## What is Maral?

_Maral_ is a new large language model specializing in the Persian language. It is based on [Mistral](https://huggingface.co./mistralai/Mistral-7B-v0.1) and trained on the _Alpaca Persian_ dataset. Maral is one of the few efforts in the Persian-speaking community to bring our language to new life in the era of AI.

Because Maral is based on Mistral, it can also produce answers in English.

### What does "Maral" mean?

Maral is the Persian name of the [Red Deer](https://en.wikipedia.org/wiki/Red_deer), a deer species native to Iran. The name was chosen for a few reasons: first, our environmental concerns; second, since this is a Persian LLM made by Iranian people, it deserves an Iranian name.

## Inference

### Prompt Format

This model requires the _Guanaco_ prompt format, which looks like this:

```
### Human: <prompt>
### Assistant: <answer>
```

So in your code, you may write prompts like this:

```python
# "Who was the president of the United States in 1996?"
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"
```
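If you build many prompts, it can help to wrap the template in a small function so every prompt is formatted the same way. This is a minimal sketch; `build_maral_prompt` is a hypothetical helper name, not part of any library, and it reproduces the exact string used in the example above (no space after `### Human:`).

```python
def build_maral_prompt(question: str) -> str:
    """Wrap a user question in the Guanaco-style template shown above."""
    # Trim stray whitespace/newlines so they don't end up inside the template.
    question = question.strip()
    # Same layout as the card's example: "### Human:<prompt>\n### Assistant:"
    return f"### Human:{question}\n### Assistant:"


# "Who was the president of the United States in 1996?"
prompt = build_maral_prompt("در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟")
```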

More information about this is in the inference sections below.

### 4-bit Quantization

If you want to use 4-bit quantization, there is a PEFT version of the model [here](https://huggingface.co./MaralGPT/MaralGPT-Mistral-7B-v-0-1). You can also find _Google Colab_ notebooks [here](https://github.com/prp-e/maralgpt).

### Installing Libraries

```shell
pip install transformers accelerate bitsandbytes
```

_NOTE_: The `bitsandbytes` library is only needed for the 8-bit version; otherwise, you can leave it out.
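To see which of these libraries are already importable in your environment (for example, in a fresh Colab runtime), a quick check like the following may help. It is only a sketch: `check_dependencies` is a hypothetical helper, and `importlib.util.find_spec` only tells you that a package can be found, not that `bitsandbytes` works with your CUDA setup.

```python
import importlib.util


def check_dependencies(need_8bit: bool = False) -> dict:
    """Report which libraries from the install command can be found."""
    required = ["transformers", "accelerate"]
    if need_8bit:
        # bitsandbytes is only needed for the 8-bit loading path.
        required.append("bitsandbytes")
    # find_spec returns None when a top-level package cannot be located.
    return {name: importlib.util.find_spec(name) is not None for name in required}


status = check_dependencies(need_8bit=True)
missing = [name for name, found in status.items() if not found]
print("missing:", missing or "none")
```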

### Inference on a big GPU

If you have a large enough GPU, such as an A100, you can use this code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# Load the weights in half precision and let accelerate place them on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

# "Who was the president of the United States in 1996?"
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"

# Put the tokenized prompt on the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,  # only the most likely token is kept, so output is effectively greedy
    temperature=0.5,
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id  # pad with the EOS token
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
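The decoded text printed above contains the prompt as well as the answer. A small post-processing step can keep only the assistant's reply. This is a sketch that assumes the output follows the Guanaco layout; `extract_answer` is a hypothetical helper, and it also cuts the text if the model starts a new `### Human:` turn on its own.

```python
def extract_answer(decoded: str) -> str:
    """Return only the assistant's reply from a decoded Guanaco-style generation."""
    marker = "### Assistant:"
    # Everything before the first assistant marker is the prompt.
    _, sep, answer = decoded.partition(marker)
    if not sep:
        # No marker found: return the text unchanged, just trimmed.
        return decoded.strip()
    # If the model kept going and started a new turn, stop there.
    answer = answer.split("### Human:", 1)[0]
    return answer.strip()
```

For example, `print(extract_answer(tokenizer.decode(outputs[0], skip_special_tokens=True)))`. Alternatively, you can decode only the newly generated tokens with `outputs[0][inputs["input_ids"].shape[1]:]`, which skips the prompt at the token level.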

### Inference on a small GPU (Consumer Hardware/Free Colab)

The code is pretty much the same as above, with a couple of small differences:

* Make sure `bitsandbytes` is installed correctly.
* Load the model with `model = AutoModelForCausalLM.from_pretrained(model_name_or_id, load_in_8bit=True, torch_dtype=torch.float16, device_map="auto")`.

On the _free version_ of Google Colab, you may run into RAM problems. Passing `low_cpu_mem_usage=True` when loading the model may help.
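Recent releases of `transformers` prefer passing quantization settings through a `BitsAndBytesConfig` object in `quantization_config` rather than `load_in_8bit=True` directly (the direct flag emits a deprecation warning in those versions). The sketch below combines that with `low_cpu_mem_usage=True`. It assumes a `transformers` version that provides `BitsAndBytesConfig`, plus a CUDA GPU with `bitsandbytes` installed; `load_maral_8bit` is a hypothetical helper name.

```python
def load_maral_8bit(model_name_or_id: str = "MaralGPT/Maral-7B-alpha-1"):
    """Load Maral in 8-bit (assumes a CUDA GPU and bitsandbytes are available)."""
    # Imported inside the function so the helper can be defined without the GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Equivalent intent to load_in_8bit=True, passed via quantization_config.
    quantization_config = BitsAndBytesConfig(load_in_8bit=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_id,
        quantization_config=quantization_config,
        torch_dtype=torch.float16,
        device_map="auto",
        # May reduce peak CPU RAM while loading (useful on free Colab).
        low_cpu_mem_usage=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)
    return model, tokenizer
```

With `model, tokenizer = load_maral_8bit()`, the prompt and generation code from the previous section can be reused unchanged.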

## Known Issues

## Special Thanks