YongganFu committed (verified)
Commit: beddbb2 · Parent(s): 3bc9d65

Update README.md

Files changed (1):
  1. README.md +10 -7
README.md CHANGED
@@ -25,21 +25,24 @@ Developed by Deep Learning Efficiency Research (DLER) team at NVIDIA Research.
  
  
  ## Hymba: Performance Highlights
- - Our Hymba-1.5B-Base outperforms all sub-2B public models, e.g., matching Llama 3.2 3B’s commonsense reasoning accuracy, being 3.49× faster, and reducing cache size by 11.7×
- - More comparisons can be found in our [Technical Report].
+ - [Hymba-1.5B-Base](https://huggingface.co/nvidia/Hymba-1.5B): Outperforms all sub-2B public models, e.g., matching Llama 3.2 3B’s commonsense reasoning accuracy while being 3.49× faster and reducing cache size by 11.7×.
  
  <div align="center">
  <img src="https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/images/performance1.png" alt="Compare with SoTA Small LMs" width="600">
  </div>
  
+ 
+ - Hymba-1.5B-Instruct: Outperforms all sub-2B public models.
+ 
+ 
  <div align="center">
- <img src="https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/images/performance2.png" alt="Compare with SoTA Small LMs" width="600">
+ <img src="https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/images/instruct_performance.png" alt="Compare with SoTA Small LMs" width="600">
  </div>
  
  
- ## Hymba-1.5B: Model Usage
+ ## Hymba-1.5B-Instruct: Model Usage
  
- We release our Hymba-1.5B-Base model and offer the instructions to use our model as follows.
+ We release our Hymba-1.5B-Instruct model and offer instructions for using it as follows.
  
  ### Step 1: Environment Setup
  
@@ -47,7 +50,7 @@ Since our model employs [FlexAttention](https://pytorch.org/blog/flexattention/)
  
  - **[Pip]** Install the related packages using our provided `requirements.txt`:
  ```
- pip install -r https://huggingface.co/nvidia/Hymba-1.5B/resolve/main/requirements.txt
+ pip install -r https://huggingface.co/nvidia/Hymba-1.5B-Instruct/resolve/main/requirements.txt
  ```
  
  - **[Docker]** We have prepared a docker image with all of Hymba's dependencies installed. You can download our docker image and start a container using the following commands:
@@ -78,7 +81,7 @@ login()
  tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b")
  
  # Load Hymba-1.5B
- model = AutoModelForCausalLM.from_pretrained("nvidia/Hymba-1.5B", trust_remote_code=True).cuda().to(torch.bfloat16)
+ model = AutoModelForCausalLM.from_pretrained("nvidia/Hymba-1.5B-Instruct", trust_remote_code=True).cuda().to(torch.bfloat16)
  
  # Chat with our model
  def chat_with_model(prompt, model, tokenizer, max_length=64):
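
For context, here is a minimal, self-contained sketch of how the updated snippet fits together end to end. The body of `chat_with_model` lies outside this hunk, so the generation logic below (plain greedy decoding via `model.generate`, no chat template) is an assumption for illustration, not the README's actual implementation:

```python
# Hypothetical end-to-end sketch based on the updated README snippet.
# The real chat_with_model body is not shown in this diff; the decoding
# details (greedy decoding, max_new_tokens, no chat template) are assumptions.
import torch
from huggingface_hub import login
from transformers import AutoModelForCausalLM, LlamaTokenizer

login()  # the Llama-2 tokenizer repo is gated; an HF token with access is needed

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b")

# Load Hymba-1.5B-Instruct; trust_remote_code pulls in Hymba's custom modeling code
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Hymba-1.5B-Instruct", trust_remote_code=True
).cuda().to(torch.bfloat16)

def chat_with_model(prompt, model, tokenizer, max_length=64):
    # Illustrative body: tokenize, generate up to max_length new tokens, decode
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(chat_with_model("Who are you?", model, tokenizer))
```

An instruct-tuned model typically responds better when prompts are wrapped in its chat template; the README's actual helper may well do this, which is another reason to treat the sketch above as illustrative only.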