microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated about 3 hours ago • 7.35k • 513
lmms-lab/llava-onevision-qwen2-72b-ov-chat Image-Text-to-Text • Updated Oct 9, 2024 • 1.33k • 8
Running • 543 • Vision Arena (Testing VLMs side-by-side) • Analyze images to detect and label objects