Acoustic Volume Rendering for Neural Impulse Response Fields • Paper • 2411.06307 • Published Nov 9 • 5
BELLE-2/Belle-whisper-large-v3-turbo-zh • Automatic Speech Recognition • Updated 9 days ago • 1.33k • 31
Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control • Paper • 2410.06985 • Published Oct 9 • 5
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design • Paper • 2410.05677 • Published Oct 8 • 14
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler • Paper • 2410.05651 • Published Oct 8 • 13
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization • Paper • 2410.06244 • Published Oct 8 • 19
Pyramidal Flow Matching for Efficient Video Generative Modeling • Paper • 2410.05954 • Published Oct 8 • 38
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations • Paper • 2410.08049 • Published Oct 10 • 8
MiRAGeNews: Multimodal Realistic AI-Generated News Detection • Paper • 2410.09045 • Published Oct 11 • 4
ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion • Paper • 2410.08168 • Published Oct 10 • 9
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis • Paper • 2410.08261 • Published Oct 10 • 49
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations • Paper • 2410.10792 • Published Oct 14 • 29
Animate-X: Universal Character Image Animation with Enhanced Motion Representation • Paper • 2410.10306 • Published Oct 14 • 54
ControlAR: Controllable Image Generation with Autoregressive Models • Paper • 2410.02705 • Published Oct 3 • 9
FlexiTex: Enhancing Texture Generation with Visual Guidance • Paper • 2409.12431 • Published Sep 19 • 11
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion • Paper • 2409.12957 • Published Sep 19 • 18
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation • Paper • 2409.12576 • Published Sep 19 • 15
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages • Paper • 2410.01036 • Published Oct 1 • 14