YongganFu committed (verified)
Commit: beddbb2 · Parent(s): 3bc9d65

Update README.md

Files changed (1):
  1. README.md +10 -7
README.md CHANGED
@@ -25,21 +25,24 @@ Developed by Deep Learning Efficiency Research (DLER) team at NVIDIA Research.
  
  
  ## Hymba: Performance Highlights
- - Our Hymba-1.5B-Base outperforms all sub-2B public models, e.g., matching Llama 3.2 3B’s commonsense reasoning accuracy, being 3.49× faster, and reducing cache size by 11.7×
- - More comparisons can be found in our [Technical Report].
+ - [Hymba-1.5B-Base](https://huggingface.co/nvidia/Hymba-1.5B): Outperforms all sub-2B public models, e.g., matching Llama 3.2 3B’s commonsense reasoning accuracy while being 3.49× faster and reducing cache size by 11.7×.
  
  <div align="center">
  <img src="https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/images/performance1.png" alt="Compare with SoTA Small LMs" width="600">
  </div>
  
+ 
+ - Hymba-1.5B-Instruct: Outperforms all sub-2B public models.
+ 
+ 
  <div align="center">
- <img src="https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/images/performance2.png" alt="Compare with SoTA Small LMs" width="600">
+ <img src="https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/images/instruct_performance.png" alt="Compare with SoTA Small LMs" width="600">
  </div>
  
  
- ## Hymba-1.5B: Model Usage
+ ## Hymba-1.5B-Instruct: Model Usage
  
- We release our Hymba-1.5B-Base model and offer the instructions to use our model as follows.
+ We release our Hymba-1.5B-Instruct model and offer instructions for using it as follows.
  
  ### Step 1: Environment Setup
  
@@ -47,7 +50,7 @@ Since our model employs [FlexAttention](https://pytorch.org/blog/flexattention/)
  
  - **[Pip]** Install the related packages using our provided `requirements.txt`:
  ```
- pip install -r https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/requirements.txt
+ pip install -r https://huggingface.co/nvidia/Hymba-1.5B-Instruct/resolve/main/requirements.txt
  ```
  
  - **[Docker]** We have prepared a docker image with all of Hymba's dependencies installed. You can download our docker image and start a container using the following commands:
@@ -78,7 +81,7 @@ login()
  tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b")
  
  # Load Hymba-1.5B
- model = AutoModelForCausalLM.from_pretrained("nvidia/Hymba-1.5B", trust_remote_code=True).cuda().to(torch.bfloat16)
+ model = AutoModelForCausalLM.from_pretrained("nvidia/Hymba-1.5B-Instruct", trust_remote_code=True).cuda().to(torch.bfloat16)
  
  # Chat with our model
  def chat_with_model(prompt, model, tokenizer, max_length=64):
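
For context, here is a minimal, self-contained sketch of how the updated snippet fits together end to end. The body of `chat_with_model` lies outside this hunk, so the generation logic below (plain greedy decoding via `model.generate`, no chat template) is an assumption for illustration, not the README's actual implementation:

```python
# Hypothetical end-to-end sketch based on the updated README snippet.
# The real chat_with_model body is not shown in this diff; the decoding
# details (greedy decoding, max_new_tokens, no chat template) are assumptions.
import torch
from huggingface_hub import login
from transformers import AutoModelForCausalLM, LlamaTokenizer

login()  # the Llama-2 tokenizer repo is gated; an HF token with access is needed

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b")

# Load Hymba-1.5B-Instruct; trust_remote_code pulls in Hymba's custom modeling code
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Hymba-1.5B-Instruct", trust_remote_code=True
).cuda().to(torch.bfloat16)

def chat_with_model(prompt, model, tokenizer, max_length=64):
    # Illustrative body: tokenize, generate up to max_length new tokens, decode
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(chat_with_model("Who are you?", model, tokenizer))
```

An instruct-tuned model typically responds better when prompts are wrapped in its chat template; the README's actual helper may well do this, which is another reason to treat the sketch above as illustrative only.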