mgoin committed on
Commit 6c78d6f
1 Parent(s): 59576f6

Update README.md

Files changed (1): README.md +27 -2
README.md CHANGED
@@ -14,8 +14,33 @@ base_model: nvidia/Nemotron-4-340B-Base
 
  Converted checkpoint of [nvidia/Nemotron-4-340B-Base](https://huggingface.co/nvidia/Nemotron-4-340B-Base). Specifically it was produced from the [v1.2 .nemo checkpoint on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/nemotron-4-340b-base/files?version=1.2).
 
- This runs in vLLM with this PR: https://github.com/vllm-project/vllm/pull/6611. Support in transformers is still pending.
 
  ### Evaluations
 
- Please see the [FP8 checkpoint](https://huggingface.co/mgoin/Nemotron-4-340B-Base-hf-FP8) for evaluations since I only have done single-node inference.
 
  Converted checkpoint of [nvidia/Nemotron-4-340B-Base](https://huggingface.co/nvidia/Nemotron-4-340B-Base). Specifically it was produced from the [v1.2 .nemo checkpoint on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/nemotron-4-340b-base/files?version=1.2).
 
+ You can deploy this model with `vllm>=0.5.4` ([PR#6611](https://github.com/vllm-project/vllm/pull/6611)):
+ ```
+ vllm serve mgoin/Nemotron-4-340B-Base-hf --tensor-parallel-size 16
+ ```
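An editor's note, not part of the commit: `vllm serve` exposes an OpenAI-compatible REST API, by default on `http://localhost:8000`. A minimal client sketch under that assumption — the `build_payload`/`complete` helper names, prompt, and generation settings are illustrative, not from the README:

```python
import json
from urllib import request

# Assumed defaults: vLLM's OpenAI-compatible server on localhost:8000,
# serving the checkpoint started with the `vllm serve` command above.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "mgoin/Nemotron-4-340B-Base-hf"

def build_payload(prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style /v1/completions request body."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding for reproducible output
    }

def complete(prompt: str) -> str:
    """POST the request and return the first completion's text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = request.Request(
        BASE_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

With the server above running, `complete("Deep learning is")` returns the model's continuation of the prompt as a string.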
 
  ### Evaluations
 
+ All the evaluations below were run with the [FP8 checkpoint](https://huggingface.co/mgoin/Nemotron-4-340B-Base-hf-FP8) using `lm-eval==0.4.3` on 8xA100 GPUs.
+ 
+ ```
+ lm_eval --model vllm --model_args pretrained=/home/mgoin/code/Nemotron-4-340B-Base-hf-FP8,tensor_parallel_size=8,distributed_executor_backend="ray",max_model_len=4096,gpu_memory_utilization=0.6 --tasks truthfulqa_mc2 --num_fewshot 0 --batch_size 16
+ vllm (pretrained=/home/mgoin/code/Nemotron-4-340B-Base-hf-FP8,tensor_parallel_size=8,distributed_executor_backend=ray,max_model_len=4096,gpu_memory_utilization=0.6), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16
+ |    Tasks     |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+ |--------------|------:|------|-----:|------|---|-----:|---|-----:|
+ |truthfulqa_mc2|      2|none  |     0|acc   |↑  |0.4869|±  |0.0142|
+ 
+ lm_eval --model vllm --model_args pretrained=/home/mgoin/code/Nemotron-4-340B-Base-hf-FP8,tensor_parallel_size=8,distributed_executor_backend="ray",max_model_len=4096,gpu_memory_utilization=0.6 --tasks winogrande --num_fewshot 5 --batch_size 16
+ vllm (pretrained=/home/mgoin/code/Nemotron-4-340B-Base-hf-FP8,tensor_parallel_size=8,distributed_executor_backend=ray,max_model_len=4096,gpu_memory_utilization=0.6), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
+ |  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+ |----------|------:|------|-----:|------|---|-----:|---|-----:|
+ |winogrande|      1|none  |     5|acc   |↑  |0.8887|±  |0.0088|
+ ```
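An editor's aside, not part of the commit: the result tables above are the markdown that `lm-eval` prints, so they are easy to script against. A small sketch — the `parse_lm_eval_table` helper is my own name, and it simply pulls each task's metric value and stderr out of a table in that format:

```python
def parse_lm_eval_table(text):
    """Parse an lm-eval markdown result table into {task: (value, stderr)}."""
    results = {}
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 9:  # Tasks|Version|Filter|n-shot|Metric|dir|Value|pm|Stderr
            try:
                results[cells[0]] = (float(cells[6]), float(cells[8]))
            except ValueError:
                continue  # header or separator row, no numeric Value
    return results

# Example: the truthfulqa_mc2 table from the README above.
SAMPLE = """\
|    Tasks     |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|--------------|------:|------|-----:|------|---|-----:|---|-----:|
|truthfulqa_mc2|      2|none  |     0|acc   |↑  |0.4869|±  |0.0142|
"""

print(parse_lm_eval_table(SAMPLE))  # {'truthfulqa_mc2': (0.4869, 0.0142)}
```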
+ 
+ The [original paper](https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_4_340B_8T_0.pdf) evals for reference:
+ 
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/c-Pxy6rED1TDm_1CVISfW.png)
+ 
+ The [Minitron paper](https://arxiv.org/pdf/2407.14679) has more evals as well:
+ 
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/YFmlifuYBVtdfsdPVgV4u.png)