EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone • Paper • 2307.05463 • Published Jul 11, 2023 • 10
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates • Paper • 2307.05695 • Published Jul 11, 2023 • 22
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation • Paper • 2307.06942 • Published Jul 13, 2023 • 22