Collections
Discover the best community collections!
Collections including paper arxiv:2406.04485
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 26 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 12 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 45 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 28
-
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 36 -
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 79 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 99 -
FiT: Flexible Vision Transformer for Diffusion Model
Paper • 2402.12376 • Published • 48
-
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Paper • 2306.07967 • Published • 24 -
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Paper • 2306.07954 • Published • 113 -
TryOnDiffusion: A Tale of Two UNets
Paper • 2306.08276 • Published • 73 -
Seeing the World through Your Eyes
Paper • 2306.09348 • Published • 33
-
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
Paper • 2309.06380 • Published • 32 -
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
Paper • 2309.05793 • Published • 50 -
Generative Image Dynamics
Paper • 2309.07906 • Published • 52 -
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Paper • 2309.15818 • Published • 19