ContextualBench_Leaderboard / image_identification.tsv
Model	Fable	Fairytale	Science	History	Folklore	Movie	Average
BLIP-2 (zero-shot)	70.0	58.1	65.0	58.8	71.2	-	64.3
InstructBLIP (zero-shot)	65.0	70.9	77.5	68.8	71.2	-	70.7
mPLUG-Owl (zero-shot)	50.0	48.8	65.0	66.2	57.6	-	57.4
mPLUG-Owl2 (zero-shot)	62.5	65.1	71.2	78.8	71.2	-	69.6
LLaVA-v1.5 (zero-shot)	68.8	53.5	67.5	75.0	63.6	-	65.6
LLaVA-v1.6 (zero-shot)	63.7	60.5	78.8	60.0	72.7	-	66.8
MMICL (zero-shot)	75.0	66.3	77.5	72.5	72.7	-	72.7
OpenFlamingo (zero-shot)	52.5	54.7	46.2	57.5	42.4	-	51.0
Otter (zero-shot)	62.9	65.1	74.6	58.4	48.3	-	61.8
GPT-4V (zero-shot)	78.8	73.3	82.5	80.0	84.8	-	79.6
MMICL (few-shot)	50.0	46.5	50.0	50.0	53.0	-	49.7
OpenFlamingo (few-shot)	41.2	44.2	45.0	47.5	40.9	-	43.9
Otter (few-shot)	48.2	49.5	57.4	39.8	35.5	-	46.0
GPT-4V (few-shot)	82.5	76.7	92.5	83.8	83.3	-	83.7
MMICL (CoCoT)	28.7	41.9	33.8	25.0	53.0	-	36.0
OpenFlamingo (CoCoT)	0.0	0.0	0.0	0.0	0.0	-	0.0
Otter (CoCoT)	30.7	25.4	22.8	27.1	20.6	-	25.3
GPT-4V (CoCoT)	68.8	75.6	73.8	66.2	66.7	-	70.4