---
license: mit
---
![header](./assets/header.png)
Paper • Demo • LongLLaVA
![efficiency](./assets/singleGPU.png)
## Update
* **[2024.09.05]** The LongLLaVA repo is published! The code will be released soon.
## Architecture
![Architecture Image](./assets/arch.png)
## Results
- Main Results
![Main Results](./assets/result1.png)
- Diagnostic Results
![Diagnostic Results](./assets/diaresult.png)
- Video-NIAH
![Video-NIAH](./assets/NIAH.png)
## Results Reproduction
### Data Download and Construction
Dataset Taxonomy
![Dataset](./assets/dataset.png)
Dataset Downloading and Construction
> Coming Soon~
### Evaluation
> The model checkpoint is coming soon~
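
Until the checkpoint and the official evaluation scripts are released, the sketch below only illustrates how a Hugging Face-hosted checkpoint of this kind could be loaded for a quick smoke test. The repo id `FreedomIntelligence/LongLLaVA`, the dtype, and the text-only prompt are assumptions, not the official usage; multi-image and video inputs will depend on the processor shipped with the released code.

```python
# Minimal loading sketch (assumptions: repo id, bf16, custom code in the model repo).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FreedomIntelligence/LongLLaVA"  # assumed repo id; replace with the official one

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # load in bf16 to reduce memory
    device_map="auto",            # shard across available GPUs
    trust_remote_code=True,       # custom hybrid architecture lives in the model repo
)

# Text-only smoke test; image/video preprocessing follows the released code.
inputs = tokenizer("Describe the video in one sentence.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```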
## Citation
```bibtex
@misc{wang2024longllavascalingmultimodalllms,
title={LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture},
author={Xidong Wang and Dingjie Song and Shunian Chen and Chen Zhang and Benyou Wang},
year={2024},
eprint={2409.02889},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.02889},
}
```