---
license: apache-2.0
tags:
- merge
base_model:
- CohereForAI/aya-23-8B
- google/siglip-base-patch16-256-multilingual
datasets:
- maya-multimodal/pretrain
- MBZUAI/palo_multilingual_dataset
language:
- en
- hi
- fr
- ru
- zh
- ar
- ja
- es
pipeline_tag: image-text-to-text
library_name: transformers
---
# Maya: A Multilingual Vision Language Model
Maya is an instruction-finetuned multilingual multimodal model that expands multimodal capabilities to eight languages with an emphasis on data quality and cultural sensitivity. Built on the LLaVA framework, Maya includes a newly created pre-training dataset designed to support multilingual and culturally aware VLM development.
## Model Description
- **Developed by:** Cohere For AI Community
- **Model type:** Multimodal Vision-Language Model
- **Language(s):** English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi
- **License:** Apache 2.0
- **Related Paper:** [Maya: An Instruction Finetuned Multilingual Multimodal Model](https://arxiv.org/abs/2412.07112)
## Model Details
Maya uses a lightweight architecture to deliver a compact yet capable multimodal model, with several key features:
- Built on LLaVA framework using Aya-23 8B model
- Uses SigLIP for vision encoding with multilingual adaptability
- Supports 8 languages with strong cultural understanding
- Trained on toxicity-filtered dataset for safer deployment
### Model Architecture
- **Base Model:** Aya-23 8B
- **Vision Encoder:** SigLIP (multilingual)
- **Training Data:** 558,000 images with multilingual annotations
- **Context Length:** 8K tokens
- **Parameters:** 8 billion
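
The sketch below illustrates the LLaVA-style composition described above: vision patch features are projected into the language model's embedding space and prepended to the text tokens. The module names, dimensions, and two-layer MLP projector are assumptions for illustration, not Maya's actual implementation.

```python
import torch
import torch.nn as nn

class LlavaStyleVLM(nn.Module):
    """Illustrative stand-in for the vision encoder -> projector -> LM pipeline."""

    def __init__(self, vision_dim: int = 768, lm_dim: int = 4096):
        super().__init__()
        # Placeholders for the SigLIP encoder and the Aya-23 8B decoder.
        self.vision_encoder = nn.Identity()
        # An assumed two-layer MLP projector maps vision features
        # into the language model's embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )
        self.language_model = nn.Identity()

    def forward(self, image_patches: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # Project image patch features and prepend them to the text embeddings,
        # so the decoder attends over [image tokens; text tokens].
        image_tokens = self.projector(self.vision_encoder(image_patches))
        return self.language_model(torch.cat([image_tokens, text_embeds], dim=1))

# Shapes: (batch, num_patches, vision_dim) and (batch, seq_len, lm_dim).
model = LlavaStyleVLM()
out = model(torch.randn(1, 256, 768), torch.randn(1, 16, 4096))
print(out.shape)  # torch.Size([1, 272, 4096])
```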
## Intended Uses
Maya is designed for:
- Multilingual visual question answering
- Cross-cultural image understanding
- Image captioning in multiple languages
- Visual reasoning tasks
- Document understanding
## Usage
```bash
# Clone the GitHub repository
git clone https://github.com/nahidalam/maya
# Change the working directory
cd maya
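# Install the project's dependencies before running the demo
# (see the repository README for exact steps; LLaVA-based repos
# typically install in editable mode with `pip install -e .`)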
```
```python
# Query Maya with an image and a question
from llava.eval.talk2maya import run_vqa_model

# Define inputs
question = "Try to identify what plane this is, based on the design."
image_path = "./llava/eval/claude_plane_test_2.jpeg"

# Run model
answer = run_vqa_model(
    question=question,
    image_file=image_path
)
print(answer)
```
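
Because Maya is multilingual, the same entry point accepts questions in any of the eight supported languages. The snippet below is an illustrative example reusing the sample image above; the French prompt translates to "Describe this image in one sentence."

```python
from llava.eval.talk2maya import run_vqa_model

# Same call as above, but with a French question; any of the
# eight supported languages can be used.
answer_fr = run_vqa_model(
    question="Décris cette image en une phrase.",
    image_file="./llava/eval/claude_plane_test_2.jpeg"
)
print(answer_fr)
```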
## Limitations
- Limited to 8 languages currently
- Requires high-quality images for optimal performance
- May not capture nuanced cultural contexts in all cases
- Performance varies across languages and tasks
## Bias, Risks, and Limitations
Maya has been developed with attention to bias mitigation and safety:
- Dataset filtered for toxic content
- Cultural sensitivity evaluations performed
- Regular bias assessments conducted
- Limited to high-quality, vetted training data
However, users should be aware that:
- Model may still exhibit biases present in training data
- Performance may vary across different cultural contexts
- Not suitable for critical decision-making applications
## Training Details
Maya was trained using:
- 558,000 curated images
- Multilingual annotations in 8 languages
- Toxicity-filtered dataset
- 8× NVIDIA H100 GPUs (80 GB each)
- Batch size of 32 (per device)
- Learning rate of 1e-3 with cosine scheduler
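
With 32 examples per device across 8 GPUs, the implied global batch size (assuming no gradient accumulation, which the card does not state) works out as follows:

```python
# Global batch size implied by the training figures above
# (assumes no gradient accumulation, which the card does not specify).
per_device_batch_size = 32
num_gpus = 8
print(per_device_batch_size * num_gpus)  # 256
```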
## Citation
```bibtex
@misc{alam2024mayainstructionfinetunedmultilingual,
  title={Maya: An Instruction Finetuned Multilingual Multimodal Model},
  author={Nahid Alam and Karthik Reddy Kanjula and Surya Guthikonda and Timothy Chung and Bala Krishna S Vegesna and Abhipsha Das and Anthony Susevski and Ryan Sze-Yin Chan and S M Iftekhar Uddin and Shayekh Bin Islam and Roshan Santhosh and Snegha A and Drishti Sharma and Chen Liu and Isha Chaturvedi and Genta Indra Winata and Ashvanth. S and Snehanshu Mukherjee and Alham Fikri Aji},
  year={2024},
  eprint={2412.07112},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.07112},
}
```
## Contact
For questions or feedback about Maya, please:
- Open an issue on our [GitHub repository](https://github.com/nahidalam/maya)
- Contact the maintainers at: [email protected], [email protected]