bytedance-research/UI-TARS-7B-SFT
Image-Text-to-Text • Updated • 3.78k • 137
Upgraded to v1.0!
https://huggingface.co/papers/2501.03006
View and submit LLM evaluations
Gaze detection using Moondream
Audio Conditioned LipSync with Latent Diffusion Models
Create videos with FFMPEG + Qwen2.5-Coder