nvidia/NVLM-D-72B-mcore

Image-Text-to-Text

Inference Endpoints

[email protected] committed on Jan 8

Commit 8cf28e4 · 1 Parent(s): 00e9cf1

Add more details

Files changed (1)

README.md +2 -0

@@ -19,6 +19,8 @@ library_name: transformers
 # Model Overview
+*[NVLM 1.0 model](https://huggingface.co/nvidia/NVLM-D-72B) is trained with legacy [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/legacy). In this repo, we reproduce NVLM-1.0 results using the latest [Megatron-core training code](https://github.com/NVIDIA/Megatron-LM/tree/NVLM-1.0/examples/multimodal/nvlm) and share the Megatron-core model weights, training code, and evaluation scripts.*
 ## Description
 This family of models performs vision-language and text-only tasks including optical character recognition, multimodal reasoning, localization, common sense reasoning, world knowledge utilization, and coding.