Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper • 2410.13863 • Published 20 days ago • 35
ViTamin Family Collection Designing Scalable Vision Models in the Vision-Language Era. The best-performing model is 'jienengchen/ViTamin-XL-384px'. • 16 items • Updated Apr 11 • 9
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 55