multimodal - an Andyrasika Collection

multimodal

updated Jan 10

This collection gathers papers on multimodal models.

Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Paper • 2407.10387 • Published Jul 15, 2024 • 8 upvotes

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published Nov 7, 2024 • 51 upvotes

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Paper • 2501.04001 • Published Jan 7, 2025 • 43 upvotes