Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15 • 29
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 41
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 181
DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision Paper • 2312.16256 • Published Dec 26, 2023 • 15
PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar Paper • 2312.14239 • Published Dec 21, 2023 • 10
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Paper • 2312.09911 • Published Dec 15, 2023 • 53