AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published 26 days ago • 35
Multimodal foundation world models for generalist embodied agents Paper • 2406.18043 • Published Jun 26, 2024 • 1