Update README.md
README.md CHANGED

Developed by the Deep Learning Efficiency Research (DLER) team at NVIDIA Research.

## Hymba: Performance Highlights

- [Hymba-1.5B-Base](https://huggingface.co/nvidia/Hymba-1.5B): Outperforms all sub-2B public models, e.g., matching Llama 3.2 3B's commonsense reasoning accuracy while being 3.49× faster and reducing cache size by 11.7×.

<div align="center">
<img src="https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/images/performance1.png" alt="Compare with SoTA Small LMs" width="600">
</div>

- [Hymba-1.5B-Instruct](https://huggingface.co/nvidia/Hymba-1.5B-Instruct): Outperforms all sub-2B public models.

<div align="center">
<img src="https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/images/instruct_performance.png" alt="Compare with SoTA Small LMs" width="600">
</div>
## Hymba-1.5B-Instruct: Model Usage
We release our Hymba-1.5B-Instruct model and provide instructions for using it below.
### Step 1: Environment Setup

Since our model employs [FlexAttention](https://pytorch.org/blog/flexattention/), which requires PyTorch 2.5 and related dependencies, we provide two ways to set up the environment:

- **[Pip]** Install the required packages using our provided `requirements.txt`:
```
pip install -r https://huggingface.co/nvidia/Hymba-1.5B-Instruct/resolve/main/requirements.txt
```
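
As an optional sanity check, you can verify that the installed PyTorch exposes FlexAttention; this check is our suggestion, assuming PyTorch >= 2.5:

```python
# FlexAttention ships with PyTorch >= 2.5
import torch
from torch.nn.attention.flex_attention import flex_attention

print(torch.__version__)  # expect 2.5 or newer
```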
- **[Docker]** We have prepared a Docker image with all of Hymba's dependencies installed. You can download the image and start a container using the following commands:
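
The exact image reference is cut off in this excerpt, so `<hymba-docker-image>` below is a placeholder; substitute the image name given on the model card:

```
# Pull the Hymba image (placeholder reference; replace with the published image name)
docker pull <hymba-docker-image>
# Start a GPU-enabled container with the current directory mounted at /workspace
docker run --gpus all -v $(pwd):/workspace -it <hymba-docker-image> bash
```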

### Step 2: Chat with Hymba-1.5B-Instruct

The snippet below consolidates the usage example into one runnable block; the imports and the body of `chat_with_model` are a minimal sketch filled in around the original lines:

```python
import torch
from huggingface_hub import login
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Log in to Hugging Face so the gated Llama-2 tokenizer can be downloaded
login()

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b")

# Load Hymba-1.5B-Instruct
model = AutoModelForCausalLM.from_pretrained("nvidia/Hymba-1.5B-Instruct", trust_remote_code=True).cuda().to(torch.bfloat16)

# Chat with our model
def chat_with_model(prompt, model, tokenizer, max_length=64):
    # Minimal sketch: tokenize, generate up to `max_length` new tokens,
    # and decode only the newly generated portion of the sequence
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_length)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```
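
For example, a single-turn exchange (the prompt is illustrative):

```python
response = chat_with_model("Who are you?", model, tokenizer)
print(response)
```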