---
license: llama3
datasets:
- google/wit
- coastalcph/multi_eurlex
language:
- it
base_model:
- meta-llama/Meta-Llama-3-8B
- openai/clip-vit-large-patch14-336
---

# Model Card for LLaVA-NDiNO_pt

## Model description

<!-- Provide a quick summary of what the model is/does. -->

**LLaVA-NDiNO** is a family of *Large Vision Language Models (LVLMs)* trained for the Italian language. 

**LLaVA-NDiNO_pt** is a pre-trained model that has been trained over three different types of image-text data:
- **Wikipedia Image-Text Sections**: Wikipedia image together with the text section in which the image appears
- **Wikipedia Image-Text Captions**: Wikipedia image together with its caption
- **OCR PDF Documents**: text in PDF documents extracted using Tesseract from MultiEurlex

If you are interested in more details regarding the training procedure, you can find the code we used at the following link:
- **Repository:** https://github.com/swapUniba/LLaVA-NDiNO

- **Developed by:** Elio Musacchio, Lucia Siciliani, Pierpaolo Basile, Giovanni Semeraro
- **Funded by:** PNRR project FAIR - Future AI Research
- **Compute infrastructure:** [Leonardo](https://www.hpc.cineca.it/systems/hardware/leonardo/) supercomputer
- **Model type:** LLaMA 3 + CLIP
- **Language(s) (NLP):** Italian
- **License:** Llama 3 Community License 

## Example usage

The model is not intended to be used without fine-tuning. It is recommended to further train it using the [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) codebase.

## Citation

```
@inproceedings{musacchioLLaVANDiNO,
  title={LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language},
  author={Musacchio, Elio and Siciliani, Lucia and Basile, Pierpaolo and Semeraro, Giovanni},
  booktitle={Proceedings of the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024) co-located with 23th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2024)},
  year={2024}
}
```