Commit · a3fe502
1 Parent(s): 1d41bac
Create README.md (#2)
- Create README.md (2b0340ddd58e6f3bf045196d491e808b922b4cbc)
Co-authored-by: Matteo Mocci <[email protected]>
README.md
ADDED
@@ -0,0 +1,43 @@
---
license: apache-2.0
language:
- en
tags:
- visual_bert
- vqa
- easy_vqa
---
# Visual BERT finetuned on easy_vqa
This model is a fine-tuned version of VisualBERT on the easy_vqa dataset. The dataset is available in the following [GitHub repo](https://github.com/vzhou842/easy-VQA/tree/master/easy_vqa).

## VisualBERT
VisualBERT is a multi-modal vision and language model. It can be used for tasks such as visual question answering, multiple choice and visual reasoning.
For more information on VisualBERT, refer to the [documentation](https://huggingface.co/docs/transformers/model_doc/visual_bert#overview).

## Dataset
The easy_vqa dataset, on which the model was fine-tuned, can be installed via the easy_vqa package:
```bash
pip install easy_vqa
```

Each instance of the dataset consists of a question, its answer (a label) and the id of the image the question refers to.
Each image is 64x64 pixels and contains a shape (rectangle, triangle or circle) filled with a single color (blue, red, green, yellow, black, gray, brown or teal)
in a random position.

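As a minimal sketch of the instance structure described above, the snippet below models one instance as a small record. The class name `EasyVqaInstance`, the integer type of `image_id` and the sample values are assumptions made for illustration; they are not taken from the easy_vqa package.

```python
from dataclasses import dataclass

IMAGE_SIZE = (64, 64)  # width and height in pixels, as stated in the description


@dataclass(frozen=True)
class EasyVqaInstance:
    """Hypothetical record for one dataset instance."""
    question: str  # natural-language question about the image
    answer: str    # label: a shape, a color, "yes" or "no"
    image_id: int  # id of the image the question refers to (type assumed)


# Made-up sample values, for illustration only.
sample = EasyVqaInstance(question="What color is the triangle?", answer="blue", image_id=0)
print(sample.question, "->", sample.answer)
```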
The questions in the dataset ask about the shape (e.g. What is the blue shape?), the color of the shape (e.g. What color is the triangle?)
and the presence of a particular shape or color, in both affirmative and negative form (e.g. Is there a red shape?).
Therefore, the possible answers to a question are: the three possible shapes, the eight possible colors, yes and no.

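The answer set described above can be written out explicitly, which makes the size of the classification problem concrete. This is only a sketch: the ordering of the labels (shapes, then colors, then yes/no) is an assumption, and the label order actually used during fine-tuning may differ.

```python
SHAPES = ["rectangle", "triangle", "circle"]
COLORS = ["blue", "red", "green", "yellow", "black", "gray", "brown", "teal"]

# Assumed ordering: shapes, then colors, then yes/no.
# The real label order used by the checkpoint may differ.
ANSWERS = SHAPES + COLORS + ["yes", "no"]

# Lookup tables between answer strings and integer class ids.
label2id = {label: idx for idx, label in enumerate(ANSWERS)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(ANSWERS))  # 13 possible answers: 3 shapes + 8 colors + yes/no
```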
The dataset's [repo](https://github.com/vzhou842/easy-VQA/tree/master/easy_vqa) documents the package functions for loading the images and the questions,
and also provides a utility script for generating new instances of the dataset in case data augmentation is needed.

## How to Use
Load the image processor and the model with the following code:
```python
from transformers import ViltProcessor, VisualBertForQuestionAnswering

# The processor comes from a ViLT checkpoint; the model is VisualBERT fine-tuned on easy_vqa.
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = VisualBertForQuestionAnswering.from_pretrained("daki97/visualbert_finetuned_easy_vqa")
```

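Once a question-answering model has produced one score (logit) per answer label, the predicted answer is the label with the highest score. The sketch below shows that decoding step with plain Python lists. The function name `decode_answer`, the three-label mapping and the scores are hypothetical, and the actual label order depends on how the checkpoint was fine-tuned.

```python
from typing import Mapping, Sequence


def decode_answer(scores: Sequence[float], id2label: Mapping[int, str]) -> str:
    """Return the label whose score is highest (argmax over label ids)."""
    if len(scores) != len(id2label):
        raise ValueError("expected one score per label")
    best_id = max(range(len(scores)), key=lambda i: scores[i])
    return id2label[best_id]


# Hypothetical mapping and scores, for illustration only.
example_id2label = {0: "circle", 1: "yes", 2: "no"}
print(decode_answer([0.1, 2.3, -1.0], example_id2label))  # yes
```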
## Colab Demo
An example of using the model with the easy_vqa dataset is available [here](https://colab.research.google.com/drive/1yQfmz6wiSasRl6z-DmP-X403r3lZFqQS#scrollTo=HeVnH8BKkYCI).