Dense Grounded Understanding of Images and Videos
Text to Audio (Sound SFX) Generator
Video Super-Resolution with Text-to-Video Model
Vision Transformer Attention Visualization
https://huggingface.co/papers/2501.03006
Gaze detection using Moondream