VITA-MLLM/Long-VITA-128K

shenyunhang committed 26 days ago

Commit 0950c7d · verified · 1 Parent(s): ec8ca7e

Update README.md

Files changed (1)

README.md +13 -6

@@ -2,6 +2,8 @@
 license: apache-2.0
 datasets:
 - VITA-MLLM/Long-VITA-Training-Data
+base_model:
+- VITA-MLLM/Long-VITA-16K
 ---
@@ -14,21 +16,26 @@ Long-VITA is a strong long-context visual language model and supports more than
 - This weight is trained on Ascend NPU with MindSpeed.
-- To infer and evaluate on Nvidia GPU, we also implement Long-VITA on Megatron with Transformer Engine.
-- The converted weight is in https://huggingface.co/VITA-MLLM/Long-VITA-128K_MG.
+- To infer and evaluate on Nvidia GPUs, we also implemented Long-VITA on Megatron with the Transformer Engine. The converted weight is in https://huggingface.co/VITA-MLLM/Long-VITA-128K_MG.
 ## 📈 Experimental Results
 - **Comparison of image understanding**.
-![image](https://github.com/user-attachments/assets/30f62f51-675e-4dac-9f18-f743c311f9be)
+![image](https://github.com/user-attachments/assets/235bdb0e-37e6-4a5f-b20b-21b0bb83278a)
+![image](https://github.com/user-attachments/assets/72250c5b-7d33-4dba-98ab-0539bae08703)
 - **Comparison of video understanding**.
-![image](https://github.com/user-attachments/assets/01892ff3-cdcd-4d15-ad6d-5cc99ccbfa70)
+![image](https://github.com/user-attachments/assets/7f09662b-bd53-4504-927a-0e45214a049d)
+![image](https://github.com/user-attachments/assets/87bd2f4d-baf5-4a63-8002-151e30f52147)
+- **Effectiveness of Logits-Masked LM Head**.
+![image](https://github.com/user-attachments/assets/7a06b4dd-267c-470f-80f2-d26c87e23460)