Generate clickable coordinates on a screenshot
Interact with images and text using Qwen-VL-Max
Evaluate open-ended outputs from AI models using MM-Vet