Qwen2.5-VL Collection • Vision-language model series based on Qwen2.5 • 3 items • Updated 7 days ago • 309
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Paper • 2501.12909 • Published 12 days ago • 62
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 12 days ago • 81
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution Paper • 2501.10045 • Published 17 days ago • 9
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published 17 days ago • 19
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published 19 days ago • 55
Visual Document Retrieval Collection • A collection of models, datasets, and spaces in the VDR series • 5 items • Updated 24 days ago • 8
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 23 days ago • 59