JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 44
Wavelets Are All You Need for Autoregressive Image Generation Paper • 2406.19997 • Published Jun 28 • 28
BRAVE: Broadening the visual encoding of vision-language models Paper • 2404.07204 • Published Apr 10 • 15
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9 • 63
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 63
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting Paper • 2403.08551 • Published Mar 13 • 8
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6 • 13
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs Paper • 2402.04291 • Published Feb 6 • 48
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31 • 59
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29 • 54
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models Paper • 2401.12522 • Published Jan 23 • 11
Small Language Model Meets with Reinforced Vision Vocabulary Paper • 2401.12503 • Published Jan 23 • 31
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 86
Single-View 3D Human Digitalization with Large Reconstruction Models Paper • 2401.12175 • Published Jan 22 • 5
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Paper • 2401.11605 • Published Jan 21 • 20
Rethinking FID: Towards a Better Evaluation Metric for Image Generation Paper • 2401.09603 • Published Nov 30, 2023 • 15
TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering Paper • 2401.06003 • Published Jan 11 • 20
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations Paper • 2312.04655 • Published Dec 7, 2023 • 20
Beyond Surface: Probing LLaMA Across Scales and Layers Paper • 2312.04333 • Published Dec 7, 2023 • 18
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation Paper • 2312.04483 • Published Dec 7, 2023 • 7
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators Paper • 2312.03793 • Published Dec 6, 2023 • 17
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Paper • 2312.03818 • Published Dec 6, 2023 • 31
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation Paper • 2312.04557 • Published Dec 7, 2023 • 12
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models Paper • 2312.04410 • Published Dec 7, 2023 • 14
FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting Paper • 2312.00451 • Published Dec 1, 2023 • 9