Update README.md
README.md
CHANGED
@@ -8,23 +8,29 @@ language:
 <picture>
     <img alt="Hugging Face Transformers Library" src="https://i.postimg.cc/VN4F7WRC/Untitled-design-modified.png" width="1000" height="450" style="max-width: 100%;">
 </picture>
-<br/>
-<br/>
 </p>
 
 <h4 align="center">
     <p>
-        <b>English</b> |
         <a href="https://huggingface.co/aidal/Persian-Mistral-7B#model-description">Model description</a> |
         <a href="https://huggingface.co/aidal/Persian-Mistral-7B#example-output">Example output</a> |
         <a href="https://huggingface.co/aidal/Persian-Mistral-7B#banchmark-results">Banchmark results</a> |
         <a href="https://huggingface.co/aidal/Persian-Mistral-7B#how-to-use">How to use</a> |
-        <a href="https://huggingface.co/aidal/Persian-Mistral-7B#training-and-finetuning">Training and finetuning</a>
+        <a href="https://huggingface.co/aidal/Persian-Mistral-7B#training-and-finetuning">Training and finetuning</a>
     </p>
 </h4>
+
 ----
+
 # Model description
+
+>Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. It delivers throughput gains over traditional Transformer-based models, while outperforming or matching the leading models of its size class on most common benchmarks.
+
+>Jamba is the first production-scale Mamba implementation, which opens up interesting research and application opportunities. While this initial experimentation shows encouraging gains, we expect these to be further enhanced with future optimizations and explorations.
+
+>This model card is for the base version of Jamba. It’s a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.
 ----
+
 # Example output:
 
 **Example 1:**
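For context on the "How to use" section this README links to, here is a minimal sketch of loading a causal LM from the Hub with the standard transformers `AutoTokenizer`/`AutoModelForCausalLM` API. The repo id `aidal/Persian-Mistral-7B` is taken from the navigation links in the diff above; the dtype, device placement, and prompt are illustrative assumptions, not something this commit specifies.

```python
# Minimal sketch (not from this diff): load the model the README links to
# and generate a short completion with the generic transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aidal/Persian-Mistral-7B"  # repo id taken from the README's navigation links

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: half precision to fit a single GPU
    device_map="auto",          # assumption: requires the accelerate package
)

prompt = "پایتخت ایران کجاست؟"  # "What is the capital of Iran?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This mirrors the generic Hub loading pattern only; the model card's own "How to use" section (not shown in this hunk) remains the authoritative reference for exact usage.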