Collections
Discover the best community collections!
Collections including paper arxiv:2310.16656
-
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
Paper • 2404.09990 • Published • 12 -
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 11 -
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Paper • 2404.09204 • Published • 10 -
Taming Latent Diffusion Model for Neural Radiance Field Inpainting
Paper • 2404.09995 • Published • 6
-
Attention Is All You Need
Paper • 1706.03762 • Published • 44 -
You Only Look Once: Unified, Real-Time Object Detection
Paper • 1506.02640 • Published • 1 -
HEp-2 Cell Image Classification with Deep Convolutional Neural Networks
Paper • 1504.02531 • Published -
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Paper • 2401.05566 • Published • 26
-
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Paper • 2310.16656 • Published • 40 -
Unsupervised Universal Image Segmentation
Paper • 2312.17243 • Published • 19 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 109 -
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 30
-
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
Paper • 2306.07967 • Published • 24 -
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Paper • 2306.07954 • Published • 113 -
TryOnDiffusion: A Tale of Two UNets
Paper • 2306.08276 • Published • 73 -
Seeing the World through Your Eyes
Paper • 2306.09348 • Published • 33
-
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Paper • 2310.00426 • Published • 61 -
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Paper • 2310.16656 • Published • 40 -
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Paper • 2310.16825 • Published • 32 -
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Paper • 2401.11605 • Published • 22
-
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Paper • 2310.16656 • Published • 40 -
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Paper • 2310.16825 • Published • 32 -
Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 40 -
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
Paper • 2311.04145 • Published • 32
-
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961 • Published • 4 -
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Paper • 2202.07922 • Published • 1 -
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 18 -
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4
-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 14 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 25 -
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 6 -
Conditional Diffusion Distillation
Paper • 2310.01407 • Published • 20