Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models. Paper: arXiv 2409.17146, published Sep 25, 2024.
Llama 3.2 Collection: This collection hosts the transformers-format and original repos of the Llama 3.2 and Llama Guard 3 releases.
LLaVa-1.5 Collection: LLaVa-1.5 is a series of vision-language models (VLMs) trained on a variety of visual instruction datasets.
LLaVa-NeXT Collection: LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets.
Vision-Language Modeling Collection: Our datasets and models for vision-language modeling.
CogVLM2 Collection: This collection hosts the repos of THUDM's CogVLM2 releases.
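Most of the checkpoints in the collections above can be loaded directly with the transformers library. Below is a minimal sketch using a LLaVa-1.5 checkpoint; the repo id `llava-hf/llava-1.5-7b-hf`, the example image URL, and the prompt template are assumptions for illustration, so substitute the model card's recommended usage for whichever collection you pick.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed repo id from the LLaVa-1.5 collection; swap in any compatible checkpoint.
model_id = "llava-hf/llava-1.5-7b-hf"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Example image (assumed URL) and a LLaVa-1.5 style prompt with an <image> placeholder.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

# Preprocess, move inputs to the model's device/dtype, and generate a short answer.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

Other collections (e.g. Llama 3.2 Vision or CogVLM2) use different model classes and chat templates, so check each model card before reusing this pattern.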