---
language:
- en
- fa
---

<p align="center">
<picture>
<img alt="Hugging Face Transformers Library" src="https://i.postimg.cc/VN4F7WRC/Untitled-design-modified.png" width="1000" height="450" style="max-width: 100%;">
</picture>
</p>

<h4 align="center">
<p>
<a href="https://huggingface.co./aidal/Persian-Mistral-7B#model-description">Model description</a> |
<a href="https://huggingface.co./aidal/Persian-Mistral-7B#example-output">Example output</a> |
<a href="https://huggingface.co./aidal/Persian-Mistral-7B#banchmark-results">Benchmark results</a> |
<a href="https://huggingface.co./aidal/Persian-Mistral-7B#how-to-use">How to use</a> |
<a href="https://huggingface.co./aidal/Persian-Mistral-7B#training-and-finetuning">Training and finetuning</a>
</p>
</h4>

----

# Model description

>Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. It delivers throughput gains over traditional Transformer-based models, while outperforming or matching the leading models of its size class on most common benchmarks.

>Jamba is the first production-scale Mamba implementation, which opens up interesting research and application opportunities. While this initial experimentation shows encouraging gains, we expect these to be further enhanced with future optimizations and explorations.

>This model card is for the base version of Jamba. It’s a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.

----

# Example output

**Example 1:**

- Input: "سلام، خوبی؟" (English: "Hi, how are you?")
- Output: "سلام، خوشحالم که با شما صحبت می کنم. چطور می توانم به شما کمک کنم؟" (English: "Hello, I'm glad to be talking with you. How can I help you?")

----

# Benchmark results

| model         | dataset           | max_token | prompt | score   |
|---------------|-------------------|-----------|--------|---------|
| base-model-7b | ARC-easy-dev      | 2         | en-1   | 0.41929 |
| base-model-7b | ARC-easy-dev      | 80        | en-2   | 0.39122 |
| fa-model-7b   | ARC-easy-dev      | 80        | en-1   | 0.37894 |
| base-model-7b | ARC-challenge-dev | 80        | en-2   | 0.37123 |
| fa-model-7b   | ARC-challenge-dev | 80        | en-1   | 0.39298 |

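
To make the table easier to read, the sketch below computes the score difference between `fa-model-7b` and `base-model-7b` on each dataset, using only rows with `max_token` set to 80. It assumes that `fa-model-7b` is the Persian model and `base-model-7b` its base; the card does not state this explicitly. The helper name `score_deltas` is hypothetical. Note that the two models were scored with different prompts (`en-1` and `en-2`), so the differences are indicative only.

```python
# Scores copied from the benchmark table above:
# (model, dataset, max_token, prompt, score)
rows = [
    ("base-model-7b", "ARC-easy-dev", 2, "en-1", 0.41929),
    ("base-model-7b", "ARC-easy-dev", 80, "en-2", 0.39122),
    ("fa-model-7b", "ARC-easy-dev", 80, "en-1", 0.37894),
    ("base-model-7b", "ARC-challenge-dev", 80, "en-2", 0.37123),
    ("fa-model-7b", "ARC-challenge-dev", 80, "en-1", 0.39298),
]


def score_deltas(rows, max_token=80):
    """Return {dataset: fa score minus base score} for rows at the given max_token."""
    by_dataset = {}
    for model, dataset, tokens, _prompt, score in rows:
        if tokens == max_token:
            by_dataset.setdefault(dataset, {})[model] = score
    # Keep only datasets where both models have a score at this max_token.
    return {
        dataset: round(scores["fa-model-7b"] - scores["base-model-7b"], 5)
        for dataset, scores in by_dataset.items()
        if {"fa-model-7b", "base-model-7b"} <= scores.keys()
    }


deltas = score_deltas(rows)
for dataset, delta in deltas.items():
    print(f"{dataset}: {delta:+.5f}")
```

With the values in the table, `fa-model-7b` scores 0.01228 lower on ARC-easy-dev and 0.02175 higher on ARC-challenge-dev.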

----

# How to use

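```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("aidal/Persian-Mistral-7B")
model = AutoModelForCausalLM.from_pretrained("aidal/Persian-Mistral-7B")

# Persian prompt; in English: "What is the capital of Iran?"
input_text = "پایتخت ایران کجاست؟"
# The tokenizer returns a dict-like object holding input_ids and attention_mask
inputs = tokenizer(input_text, return_tensors="pt")

# max_new_tokens caps the reply length; 50 is an illustrative value
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```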

----

# Training and finetuning