|
---
license: apache-2.0
tags:
- merge
base_model:
- CohereForAI/aya-23-8B
- google/siglip-base-patch16-256-multilingual
datasets:
- maya-multimodal/pretrain
- MBZUAI/palo_multilingual_dataset
language:
- en
- hi
- fr
- ru
- zh
- ar
- ja
- es
pipeline_tag: image-text-to-text
library_name: transformers
---
|
|
|
# Maya: A Multilingual Vision Language Model |
|
|
|
Maya is an instruction-finetuned multilingual multimodal model that expands multimodal capabilities to eight languages with an emphasis on data quality and cultural sensitivity. Built on the LLaVA framework, Maya includes a newly created pre-training dataset designed to support multilingual and culturally aware VLM development. |
|
|
|
## Model Description |
|
|
|
- **Developed by:** Cohere For AI Community |
|
- **Model type:** Multimodal Vision-Language Model |
|
- **Language(s):** English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi |
|
- **License:** Apache 2.0 |
|
- **Related Paper:** [Maya: An Instruction Finetuned Multilingual Multimodal Model](https://arxiv.org/abs/2412.07112) |
|
|
|
## Model Details |
|
|
|
Maya uses a lightweight architecture to provide a compact yet capable multimodal model, with several key features:
|
|
|
- Built on the LLaVA framework, using the Aya-23 8B language model

- Uses SigLIP for vision encoding, chosen for its multilingual adaptability

- Supports 8 languages with strong cultural understanding

- Trained on a toxicity-filtered dataset for safer deployment
|
|
|
### Model Architecture |
|
|
|
- **Base Model:** Aya-23 8B |
|
- **Vision Encoder:** SigLIP (multilingual) |
|
- **Training Data:** 558,000 images with multilingual annotations |
|
- **Context Length:** 8K tokens |
|
- **Parameters:** 8 billion |
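
For orientation, the sketch below shows how the two published components listed above can be loaded from the Hugging Face Hub. This is not the official inference path (that goes through the GitHub repository; see Usage below), and it omits the trained projector that connects the vision encoder to the language model. Note that the Aya-23 weights are gated on the Hub and may require accepting the model license first.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SiglipImageProcessor,
    SiglipVisionModel,
)

# Multilingual SigLIP vision tower (16x16 patches, 256px input)
vision_tower = SiglipVisionModel.from_pretrained(
    "google/siglip-base-patch16-256-multilingual"
)
image_processor = SiglipImageProcessor.from_pretrained(
    "google/siglip-base-patch16-256-multilingual"
)

# Aya-23 8B multilingual language model (8K-token context)
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-23-8B")
language_model = AutoModelForCausalLM.from_pretrained("CohereForAI/aya-23-8B")
```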
|
|
|
## Intended Uses |
|
|
|
Maya is designed for: |
|
|
|
- Multilingual visual question answering |
|
- Cross-cultural image understanding |
|
- Image captioning in multiple languages |
|
- Visual reasoning tasks |
|
- Document understanding |
|
|
|
## Usage |
|
|
|
```bash
# Clone the GitHub repository
git clone https://github.com/nahidalam/maya

# Change the working directory
cd maya
```
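
Depending on your environment, you will likely need to install the repository's dependencies before running the example; check the repository's README for the exact steps. An editable install is typical for LLaVA-based codebases (illustrative, not verified against this repository's setup):

```bash
# Illustrative only — follow the repository README for the exact setup
pip install -e .
```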
|
|
|
```python
# Import the VQA helper from the Maya repository
from llava.eval.talk2maya import run_vqa_model

# Define inputs
question = "Try to identify what plane this is, based on the design."
image_path = "./llava/eval/claude_plane_test_2.jpeg"

# Run the model and print its answer
answer = run_vqa_model(
    question=question,
    image_file=image_path
)
print(answer)
```
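
Because Maya supports prompts in all eight languages, the same helper can be queried in any of them. A brief illustrative sketch (the image path and prompts below are placeholders, not files shipped with the repository):

```python
from llava.eval.talk2maya import run_vqa_model

# Placeholder image path; substitute your own file
image_path = "./example.jpg"

# The same question in three of the eight supported languages
questions = [
    "What is shown in this image?",       # English
    "¿Qué se muestra en esta imagen?",    # Spanish
    "इस चित्र में क्या दिखाया गया है?",          # Hindi
]

for question in questions:
    print(run_vqa_model(question=question, image_file=image_path))
```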
|
|
|
## Limitations |
|
|
|
- Limited to 8 languages currently |
|
- Requires high-quality images for optimal performance |
|
- May not capture nuanced cultural contexts in all cases |
|
- Performance varies across languages and tasks |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Maya has been developed with attention to bias mitigation and safety: |
|
|
|
- Dataset filtered for toxic content |
|
- Cultural sensitivity evaluations performed |
|
- Regular bias assessments conducted |
|
- Training restricted to high-quality, vetted data
|
|
|
However, users should be aware that: |
|
- Model may still exhibit biases present in training data |
|
- Performance may vary across different cultural contexts |
|
- Not suitable for critical decision-making applications |
|
|
|
## Training Details |
|
|
|
Maya was trained using the following data and setup (summarized as a config sketch after this list):
|
- 558,000 curated images |
|
- Multilingual annotations in 8 languages |
|
- Toxicity-filtered dataset |
|
- 8x NVIDIA H100 GPUs (80GB memory each)
|
- Batch size of 32 (per device) |
|
- Learning rate of 1e-3 with cosine scheduler |
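
For reference, the reported pretraining setup collected into a single config; the field names here are illustrative, not taken from the training code:

```python
# Reported Maya pretraining setup; keys are illustrative, values as above
pretrain_config = {
    "num_images": 558_000,
    "languages": ["en", "hi", "fr", "ru", "zh", "ar", "ja", "es"],
    "hardware": "8x NVIDIA H100 (80GB)",
    "per_device_batch_size": 32,
    "learning_rate": 1e-3,
    "lr_scheduler": "cosine",
}
```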
|
|
|
## Citation |
|
|
|
```bibtex
@misc{alam2024mayainstructionfinetunedmultilingual,
      title={Maya: An Instruction Finetuned Multilingual Multimodal Model},
      author={Nahid Alam and Karthik Reddy Kanjula and Surya Guthikonda and Timothy Chung and Bala Krishna S Vegesna and Abhipsha Das and Anthony Susevski and Ryan Sze-Yin Chan and S M Iftekhar Uddin and Shayekh Bin Islam and Roshan Santhosh and Snegha A and Drishti Sharma and Chen Liu and Isha Chaturvedi and Genta Indra Winata and Ashvanth. S and Snehanshu Mukherjee and Alham Fikri Aji},
      year={2024},
      eprint={2412.07112},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.07112},
}
```
|
|
|
## Contact |
|
|
|
For questions or feedback about Maya, please: |
|
- Open an issue on our [GitHub repository](https://github.com/nahidalam/maya) |
|
- Contact the maintainers at: [email protected], [email protected] |