qanthony-z committed: add mt bench fig

README.md CHANGED
@@ -63,9 +63,10 @@ Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely
 Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.

 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/GyojH0mFCaAaAHBAXlm4T.png" width="700" alt="Zamba performance">
 </center>

+
 Time to First Token (TTFT) | Output Generation
 :-------------------------:|:-------------------------:
 ![](https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/BmE8X6tDNVw5OJcbZt8sZ.png) | ![](https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/wECc9cItK1FW1MOMGSLrp.png)
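For context on the two benchmark headings in the table above: Time to First Token is the delay before the first generated token arrives, while output generation is the steady-state token rate after that. A minimal sketch of how both are typically measured over a streaming decoder follows; the generator here is a stand-in for any token stream, not Zamba2's actual API.

```python
import time

def measure_ttft_and_throughput(token_stream):
    """Measure Time to First Token (TTFT) and output-generation
    throughput (tokens/sec after the first token) over a non-empty
    token iterator."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_stream:
        now = time.perf_counter()
        if first is None:
            first = now  # timestamp of the first token
        count += 1
    end = time.perf_counter()
    ttft = first - start
    gen_time = end - first  # time spent producing the remaining tokens
    throughput = (count - 1) / gen_time if gen_time > 0 else 0.0
    return ttft, throughput

# Stand-in stream: yields a token every 10 ms, mimicking a streaming decoder.
def fake_stream(n_tokens=5, delay_s=0.01):
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"

ttft, tps = measure_ttft_and_throughput(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, generation: {tps:.1f} tok/s")
```

With a real model, the same timing logic would wrap whatever streaming interface the serving stack exposes; only the iterator changes.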