Maya: An Instruction Finetuned Multilingual Multimodal Model
Abstract
The rapid development of large Vision-Language Models (VLMs) has led to impressive results on academic benchmarks, primarily in widely spoken languages. However, significant gaps remain in the ability of current VLMs to handle low-resource languages and varied cultural contexts, largely due to a lack of high-quality, diverse, and safety-vetted data. Consequently, these models often struggle to understand low-resource languages and cultural nuances in a manner free from toxicity. To address these limitations, we introduce Maya, an open-source Multimodal Multilingual model. Our contributions are threefold: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; 2) a thorough analysis of toxicity within the LLaVA dataset, followed by the creation of a novel toxicity-free version across eight languages; and 3) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at https://github.com/nahidalam/maya.
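The toxicity-vetting contribution (2) amounts to scoring each image-text pair and dropping pairs above a threshold. The sketch below illustrates that filtering step only; `toxicity_score` and the flagged-term list are hypothetical placeholders, since the abstract does not specify the actual classifier or pipeline used for Maya's dataset.

```python
# Minimal sketch of toxicity filtering for an image-text dataset.
# `toxicity_score` and FLAGGED_TERMS are placeholders; a real pipeline
# would use a trained multilingual toxicity classifier instead.

FLAGGED_TERMS = {"badword"}  # hypothetical flagged-term list


def toxicity_score(caption: str) -> float:
    """Fraction of caption tokens matching the flagged-term list (placeholder heuristic)."""
    tokens = caption.lower().split()
    if not tokens:
        return 0.0
    return sum(t in FLAGGED_TERMS for t in tokens) / len(tokens)


def filter_dataset(samples, threshold=0.0):
    """Keep only image-text pairs whose caption scores at or below the threshold."""
    return [s for s in samples if toxicity_score(s["caption"]) <= threshold]


samples = [
    {"image": "a.jpg", "caption": "a cat sitting on a mat"},
    {"image": "b.jpg", "caption": "badword example caption"},
]
clean = filter_dataset(samples)
```

The same filter would be applied per language across all eight translated splits, so that the "toxicity-free" version stays parallel across languages.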
Community
A New Multimodal Multilingual Vision-Language Model. Maya is fully open source, with open weights and an open dataset, and is designed to handle eight languages, cultural diversity, and nuanced real-world contexts in vision-language tasks.
The following similar papers were recommended by Librarian Bot via the Semantic Scholar API:
- Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages (2024)
- Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement (2024)
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs (2024)
- MILU: A Multi-task Indic Language Understanding Benchmark (2024)
- Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance (2024)
- BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment (2024)
- Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning (2024)