qanthony-z committed on
Commit 2f00671 · verified · 1 parent: 6ed954f

add mt bench fig

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -63,9 +63,10 @@ Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely
 Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
 
 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/U7VD9PYLj3XcEjgV08sP5.png" width="700" alt="Zamba performance">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/GyojH0mFCaAaAHBAXlm4T.png" width="700" alt="Zamba performance">
 </center>
 
+
 Time to First Token (TTFT) | Output Generation
 :-------------------------:|:-------------------------:
 ![](https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/BmE8X6tDNVw5OJcbZt8sZ.png) | ![](https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/wECc9cItK1FW1MOMGSLrp.png)