Maya: An Instruction Finetuned Multilingual Multimodal Model
Abstract
The rapid development of large Vision-Language Models (VLMs) has led to impressive results on academic benchmarks, primarily in widely spoken languages. However, significant gaps remain in the ability of current VLMs to handle low-resource languages and varied cultural contexts, largely due to a lack of high-quality, diverse, and safety-vetted data. Consequently, these models often struggle to understand low-resource languages and cultural nuances in a manner free from toxicity. To address these limitations, we introduce Maya, an open-source Multimodal Multilingual model. Our contributions are threefold: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; 2) a thorough analysis of toxicity within the LLaVA dataset, followed by the creation of a novel toxicity-free version across eight languages; and 3) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at https://github.com/nahidalam/maya.
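The toxicity-vetting contribution (2) amounts to scoring each image-text pair and dropping pairs above a threshold. The sketch below illustrates that filtering step only; `toxicity_score` and the flagged-term list are hypothetical placeholders, since the abstract does not specify the actual classifier or pipeline used for Maya's dataset.

```python
# Minimal sketch of toxicity filtering for an image-text dataset.
# `toxicity_score` and FLAGGED_TERMS are placeholders; a real pipeline
# would use a trained multilingual toxicity classifier instead.

FLAGGED_TERMS = {"badword"}  # hypothetical flagged-term list


def toxicity_score(caption: str) -> float:
    """Fraction of caption tokens matching the flagged-term list (placeholder heuristic)."""
    tokens = caption.lower().split()
    if not tokens:
        return 0.0
    return sum(t in FLAGGED_TERMS for t in tokens) / len(tokens)


def filter_dataset(samples, threshold=0.0):
    """Keep only image-text pairs whose caption scores at or below the threshold."""
    return [s for s in samples if toxicity_score(s["caption"]) <= threshold]


samples = [
    {"image": "a.jpg", "caption": "a cat sitting on a mat"},
    {"image": "b.jpg", "caption": "badword example caption"},
]
clean = filter_dataset(samples)
```

The same filter would be applied per language across all eight translated splits, so that the "toxicity-free" version stays parallel across languages.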
Community
A New Multimodal Multilingual Vision-Language Model. Maya is fully open source, with open weights and an open dataset, and is designed to handle eight languages, cultural diversity, and nuanced real-world contexts in vision-language tasks.
The following similar papers were recommended by Librarian Bot via the Semantic Scholar API:
- Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages (2024)
- Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement (2024)
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs (2024)
- MILU: A Multi-task Indic Language Understanding Benchmark (2024)
- Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance (2024)
- BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment (2024)
- Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning (2024)