VictorSanh
committed on
Commit
·
5608691
1 Parent(s): 6b79b2a
video datasets
README.md
CHANGED
@@ -156,6 +156,8 @@ We compare our model to the original Flamingo along with [OpenFlamingo](openflam
 
 We perform checkpoint selection based on validation sets of VQAv2, TextVQA, OKVQA, VizWiz, Visual Dialogue, Coco, Flickr30k, and HatefulMemes. We select the checkpoint at step 65'000 for IDEFICS-9B and at step 37'500 for IDEFICS. The models are evaluated with in-context few-shot learning where the priming instances are selected at random from a support set. We do not use any form of ensembling.
 
+As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, and as such, we did not evaluate the model on video-text benchmarks like Flamingo did. We leave that evaluation for a future iteration.
+
 <img src="./assets/Figure_Evals_IDEFIX.png" width="55%">
 
 TODO: update this table