VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11 • 32
DAMO-NLP-SG/VideoLLaMA2.1-7B-16F-Base Visual Question Answering • Updated 20 days ago • 355 • 1