mgoin/Nemotron-4-340B-Base-hf

Text Generation

mgoin committed on Aug 8

Commit

6c78d6f

1 Parent(s): 59576f6

Update README.md

Files changed (1)

README.md +27 -2

@@ -14,8 +14,33 @@ base_model: nvidia/Nemotron-4-340B-Base
 Converted checkpoint of [nvidia/Nemotron-4-340B-Base](https://huggingface.co/nvidia/Nemotron-4-340B-Base). Specifically it was produced from the [v1.2 .nemo checkpoint on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/nemotron-4-340b-base/files?version=1.2).
-This runs in vLLM with this PR: https://github.com/vllm-project/vllm/pull/6611. Support in transformers is still pending.
 ### Evaluations
-Please see the [FP8 checkpoint](https://huggingface.co/mgoin/Nemotron-4-340B-Base-hf-FP8) for evaluations, since I have only done single-node inference.

 Converted checkpoint of [nvidia/Nemotron-4-340B-Base](https://huggingface.co/nvidia/Nemotron-4-340B-Base). Specifically it was produced from the [v1.2 .nemo checkpoint on NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/nemotron-4-340b-base/files?version=1.2).
+You can deploy this model with `vllm>=0.5.4` ([PR#6611](https://github.com/vllm-project/vllm/pull/6611)):
+```
+vllm serve mgoin/Nemotron-4-340B-Base-hf --tensor-parallel-size 16
+```
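
Once the server from the `vllm serve` command above is running, it can be queried over HTTP. The following is a minimal sketch, not part of the original README. It assumes the server listens on vLLM's default address (`http://localhost:8000`) and uses the OpenAI-compatible `/v1/completions` route, which suits a base (non-chat) model. The helper names `build_completion_request` and `complete` are hypothetical, chosen only for illustration.

```python
import json
import urllib.request

# Assumption: the server was started with the `vllm serve` command above and
# listens on vLLM's default address. Change BASE_URL if you passed --host/--port.
BASE_URL = "http://localhost:8000"
MODEL = "mgoin/Nemotron-4-340B-Base-hf"


def build_completion_request(prompt, max_tokens=32, temperature=0.0):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example call (requires a running server):
# print(complete("The capital of France is"))
```

Only the standard library is used here so the sketch has no extra dependencies; an OpenAI-compatible client library pointed at the same base URL should work equally well.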
 ### Evaluations
+All of the evaluations below were run against the [FP8 checkpoint](https://huggingface.co/mgoin/Nemotron-4-340B-Base-hf-FP8) using `lm-eval==0.4.3` on 8xA100 GPUs.
+```
+lm_eval --model vllm --model_args pretrained=/home/mgoin/code/Nemotron-4-340B-Base-hf-FP8,tensor_parallel_size=8,distributed_executor_backend="ray",max_model_len=4096,gpu_memory_utilization=0.6 --tasks truthfulqa_mc2 --num_fewshot 0 --batch_size 16
+vllm (pretrained=/home/mgoin/code/Nemotron-4-340B-Base-hf-FP8,tensor_parallel_size=8,distributed_executor_backend=ray,max_model_len=4096,gpu_memory_utilization=0.6), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16
+|    Tasks     |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+|--------------|------:|------|-----:|------|---|-----:|---|-----:|
+|truthfulqa_mc2|      2|none  |     0|acc   |↑  |0.4869|±  |0.0142|
+lm_eval --model vllm --model_args pretrained=/home/mgoin/code/Nemotron-4-340B-Base-hf-FP8,tensor_parallel_size=8,distributed_executor_backend="ray",max_model_len=4096,gpu_memory_utilization=0.6 --tasks winogrande --num_fewshot 5 --batch_size 16
+vllm (pretrained=/home/mgoin/code/Nemotron-4-340B-Base-hf-FP8,tensor_parallel_size=8,distributed_executor_backend=ray,max_model_len=4096,gpu_memory_utilization=0.6), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
+|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+|----------|------:|------|-----:|------|---|-----:|---|-----:|
+|winogrande|      1|none  |     5|acc   |↑  |0.8887|±  |0.0088|
+```
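
To help read the `Value ± Stderr` columns in the tables above, here is a small sketch, not part of the original README, that turns each score into an approximate 95% interval. It assumes a normal approximation (`z = 1.96`); the function names are hypothetical, and the numbers are copied from the lm-eval output above.

```python
# Scores and standard errors copied from the lm-eval tables above.
RESULTS = {
    "truthfulqa_mc2": (0.4869, 0.0142),
    "winogrande": (0.8887, 0.0088),
}


def confidence_interval(value, stderr, z=1.96):
    """Return (low, high) for the normal-approximation interval value ± z * stderr."""
    margin = z * stderr
    return value - margin, value + margin


def format_interval(task, value, stderr):
    """Render one task's score with its approximate 95% interval."""
    low, high = confidence_interval(value, stderr)
    return f"{task}: {value:.4f} (95% CI {low:.4f} to {high:.4f})"


for task, (value, stderr) in RESULTS.items():
    print(format_interval(task, value, stderr))
```

An interval like this is useful when comparing against the reference numbers from the papers: a difference smaller than the interval width may be within sampling noise rather than a real gap.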
+For reference, here are the evals from the [original paper](https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_4_340B_8T_0.pdf):
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/c-Pxy6rED1TDm_1CVISfW.png)
+The [Minitron paper](https://arxiv.org/pdf/2407.14679) has more evals as well:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/YFmlifuYBVtdfsdPVgV4u.png)