---
language:
  - en
  - fa
---

Persian-Mistral-7B

Hugging Face Transformers Library

Model description | Example output | Benchmark results | How to use | Training and finetuning


Model description

Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. It delivers throughput gains over traditional Transformer-based models, while outperforming or matching the leading models of its size class on most common benchmarks.

Jamba is the first production-scale Mamba implementation, which opens up interesting research and application opportunities. While this initial experimentation shows encouraging gains, we expect these to be further enhanced with future optimizations and explorations.

This model card is for the base version of Jamba. It’s a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.


Example output:

Example 1:

  • Input: "سلام، خوبی؟" ("Hi, how are you?")
  • Output: "سلام، خوشحالم که با شما صحبت می کنم. چطور می توانم به شما کمک کنم؟" ("Hello, I'm glad to be talking with you. How can I help you?")


Benchmark results

| model         | dataset      | max_token | prompt | score   |
|---------------|--------------|-----------|--------|---------|
| base-model-7b | ARC-easy-dev | 2         | en-1   | 0.41929 |
| base-model-7b | ARC-easy-dev | 80        | en-2   | 0.39122 |
| base-model-7b | ARC-easy-dev | 300       | en-1   | 0.34448 |

| model       | dataset      | max_token | prompt | score   |
|-------------|--------------|-----------|--------|---------|
| fa-model-7b | ARC-easy-dev | 80        | en-1   | 0.37894 |
| fa-model-7b | ARC-easy-dev | 80        | en-2   | 0.33333 |
| fa-model-7b | ARC-easy-dev | 80        | fa-2   | 0.28771 |
| fa-model-7b | ARC-easy-dev | 300       | fa-1   | 0.25752 |
| fa-model-7b | ARC-easy-dev | 2         | fa-1   | 0.24035 |



| model         | dataset           | max_token | prompt | score   |
|---------------|-------------------|-----------|--------|---------|
| base-model-7b | ARC-challenge-dev | 80        | en-2   | 0.37123 |
| base-model-7b | ARC-challenge-dev | 2         | en-2   | 0.36789 |
| base-model-7b | ARC-challenge-dev | 2         | en-1   | 0.35451 |
| base-model-7b | ARC-challenge-dev | 80        | en-1   | 0.33779 |

| model       | dataset           | max_token | prompt | score   |
|-------------|-------------------|-----------|--------|---------|
| fa-model-7b | ARC-challenge-dev | 2         | en-1   | 0.39298 |
| fa-model-7b | ARC-challenge-dev | 80        | en-1   | 0.38421 |
| fa-model-7b | ARC-challenge-dev | 2         | en-2   | 0.31929 |
| fa-model-7b | ARC-challenge-dev | 80        | en-2   | 0.31754 |
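
The evaluation script behind these numbers is not included in the card. For orientation only, below is a minimal sketch of how an ARC-style multiple-choice accuracy could be measured with this model; the `ai2_arc` dataset name, the prompt template, and the letter-matching heuristic are assumptions rather than the setup that produced the tables above.

```python
# Minimal ARC-Easy accuracy sketch. The prompt format and scoring rule are
# assumptions, not the card's actual evaluation protocol.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("aidal/Persian-Mistral-7B")
model = AutoModelForCausalLM.from_pretrained("aidal/Persian-Mistral-7B")

# Assumed to correspond to the card's "ARC-easy-dev" split.
ds = load_dataset("ai2_arc", "ARC-Easy", split="validation")

def build_prompt(example):
    # Zero-shot English prompt; the card's "en-1"/"en-2"/"fa-*" templates are
    # not published, so this format is only illustrative.
    lines = [f"Question: {example['question']}"]
    for label, text in zip(example["choices"]["label"], example["choices"]["text"]):
        lines.append(f"{label}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)

correct = 0
for example in ds:
    inputs = tokenizer(build_prompt(example), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=2, do_sample=False)
    # Keep only the newly generated tokens and compare them to the gold label.
    answer = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True).strip()
    if answer.startswith(example["answerKey"]):
        correct += 1

print(f"accuracy: {correct / len(ds):.5f}")
```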

How to use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("aidal/Persian-Mistral-7B")
model = AutoModelForCausalLM.from_pretrained("aidal/Persian-Mistral-7B")

input_text = "پایتخت ایران کجاست؟"  # "What is the capital of Iran?"
input_ids = tokenizer(input_text, return_tensors="pt")

# Generate with the default settings and decode the result.
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
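
The snippet above uses the default generation settings. A slightly more explicit variant is sketched below; the half-precision loading, the `device_map="auto"` placement (which requires the `accelerate` package), and the generation parameters are illustrative choices, not settings documented for this model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("aidal/Persian-Mistral-7B")
model = AutoModelForCausalLM.from_pretrained(
    "aidal/Persian-Mistral-7B",
    torch_dtype=torch.float16,  # illustrative; bfloat16 or float32 also work
    device_map="auto",          # requires `accelerate`
)

input_text = "پایتخت ایران کجاست؟"  # "What is the capital of Iran?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generation settings are illustrative, not values documented in the card.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```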

Training and finetuning