Multimodal Language Model Benchmarks
Collection: Multimodal benchmarks that test various aspects of LLMs, VLMs, and LMMs • 14 items • Updated Sep 11, 2024
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics • Paper • 2406.14051 • Published Jun 20, 2024
Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models • Paper • 2406.14035 • Published Jun 20, 2024
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents • Paper • 2305.13455 • Published May 22, 2023