VictorSanh committed on
Commit 9d30b49
1 Parent(s): 79d1def
add assets path
README.md CHANGED
@@ -55,7 +55,7 @@ It is possible to fine-tune the base model on custom data for a specific use-cas
 
 The following screenshot is an example of interaction with the instructed model:
 
-<img src="
+<img src="assets/guarding_baguettes.png" width="35%">
 
 
 # How to Get Started with the Model
@@ -158,7 +158,7 @@ We perform checkpoint selection based on validation sets of VQAv2, TextVQA, OKVQ
 
 As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, and as such, we did not evaluate the model on video-text benchmarks like Flamingo did. We leave that evaluation for a future iteration.
 
-<img src="
+<img src="assets/Figure_Evals_IDEFIX.png" width="55%">
 
 We note that since IDEFICS was trained on PMD (which contains COCO), the evaluation numbers on COCO are not directly comparable with Flamingo and OpenFlamingo since they did not explicitely have this dataset in the training mixture. Additionally, Flamingo is trained with images of resolution 320 x 320 while IDEFICS and OpenFlamingo were trained with images of 224 x 224 resolution.
 