---
pipeline_tag: video-text-to-text
---
This repository contains the model from the paper [VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction](https://huggingface.co/papers/2501.01957).

Code: https://github.com/VITA-MLLM/VITA