HuggingFaceM4/idefics-80b


VictorSanh committed on Aug 3, 2023

Commit 5608691 · 1 Parent(s): 6b79b2a

video datasets

Files changed (1)

README.md +2 -0


@@ -156,6 +156,8 @@ We compare our model to the original Flamingo along with [OpenFlamingo](openflam
 We perform checkpoint selection based on validation sets of VQAv2, TextVQA, OKVQA, VizWiz, Visual Dialogue, Coco, Flickr30k, and HatefulMemes. We select the checkpoint at step 65'000 for IDEFICS-9B and at step 37'500 for IDEFICS. The models are evaluated with in-context few-shot learning where the priming instances are selected at random from a support set. We do not use any form of ensembling.
+As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, and as such, we did not evaluate the model on video-text benchmarks like Flamingo did. We leave that evaluation for a future iteration.
 <img src="./assets/Figure_Evals_IDEFIX.png"  width="55%">
 TODO: update this table