qanthony-z committed: add mt bench fig

README.md CHANGED
@@ -63,9 +63,10 @@ Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely
 Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.

 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/GyojH0mFCaAaAHBAXlm4T.png" width="700" alt="Zamba performance">
 </center>

+
 Time to First Token (TTFT) | Output Generation
 :-------------------------:|:-------------------------:
 ![](https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/BmE8X6tDNVw5OJcbZt8sZ.png) | ![](https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/wECc9cItK1FW1MOMGSLrp.png)
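For context on the two benchmark headings in the table above: Time to First Token is the delay before the first generated token arrives, while output generation is the steady-state token rate after that. A minimal sketch of how both are typically measured over a streaming decoder follows; the generator here is a stand-in for any token stream, not Zamba2's actual API.

```python
import time

def measure_ttft_and_throughput(token_stream):
    """Measure Time to First Token (TTFT) and output-generation
    throughput (tokens/sec after the first token) over a non-empty
    token iterator."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_stream:
        now = time.perf_counter()
        if first is None:
            first = now  # timestamp of the first token
        count += 1
    end = time.perf_counter()
    ttft = first - start
    gen_time = end - first  # time spent producing the remaining tokens
    throughput = (count - 1) / gen_time if gen_time > 0 else 0.0
    return ttft, throughput

# Stand-in stream: yields a token every 10 ms, mimicking a streaming decoder.
def fake_stream(n_tokens=5, delay_s=0.01):
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"

ttft, tps = measure_ttft_and_throughput(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, generation: {tps:.1f} tok/s")
```

With a real model, the same timing logic would wrap whatever streaming interface the serving stack exposes; only the iterator changes.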