---
language:
- en
- fa
---
Model description | Example output | Benchmark results | How to use | Training and finetuning
----
# Model description

>Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. It delivers throughput gains over traditional Transformer-based models, while outperforming or matching the leading models of its size class on most common benchmarks.

>Jamba is the first production-scale Mamba implementation, which opens up interesting research and application opportunities. While this initial experimentation shows encouraging gains, we expect these to be further enhanced with future optimizations and explorations.

>This model card is for the base version of Jamba. It is a pretrained, mixture-of-experts (MoE) generative text model with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length and can fit up to 140K tokens on a single 80GB GPU.

----
# Example output

**Example 1:**

- Input: "سلام، خوبی؟" (Hello, how are you?)
- Output: "سلام، خوشحالم که با شما صحبت می کنم. چطور می توانم به شما کمک کنم؟" (Hello, I am glad to be talking with you. How can I help you?)

----
# Benchmark results

| model | dataset | max_token | prompt | score |
|---------------|------------------|-----------|--------|---------|
| base-model-7b | ARC-easy-dev | 2 | en-1 | 0.41929 |
| base-model-7b | ARC-easy-dev | 80 | en-2 | 0.39122 |
| base-model-7b | ARC-easy-dev | 300 | en-1 | 0.34448 |

| model | dataset | max_token | prompt | score |
|---------------|------------------|-----------|--------|---------|
| fa-model-7b | ARC-easy-dev | 80 | en-1 | 0.37894 |
| fa-model-7b | ARC-easy-dev | 80 | en-2 | 0.33333 |
| fa-model-7b | ARC-easy-dev | 80 | fa-2 | 0.28771 |
| fa-model-7b | ARC-easy-dev | 300 | fa-1 | 0.25752 |
| fa-model-7b | ARC-easy-dev | 2 | fa-1 | 0.24035 |
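
----
# How to use

The navigation above lists a "How to use" section; the snippet below is a minimal sketch of loading the model and generating from the Persian example prompt with the Hugging Face `transformers` library. The repository ID `your-org/your-model` is a placeholder assumption, not the actual model ID, and the dtype/device settings are only one reasonable configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID; replace with the real checkpoint name on the Hub.
model_id = "your-org/your-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # place layers on the available GPU(s)
)

# Persian prompt from the example above: "Hello, how are you?"
prompt = "سلام، خوبی؟"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```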