Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding Paper • 2412.00493 • Published 25 days ago • 16
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning Paper • 2412.03248 • Published 21 days ago • 25
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control Paper • 2308.09804 • Published Aug 18, 2023 • 2