Unified Vision-Language Pre-Training for Image Captioning and VQA Paper • 1909.11059 • Published Sep 24, 2019 • 2
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings Paper • 2403.07750 • Published Mar 12, 2024 • 21