VLMEvalKit Evaluation Results Collection
A leaderboard for multimodal models
Text-to-Image
Generate Subtitles Using faster-whisper-large-v3-turbo-ct2