---
license: apache-2.0
language:
- en
tags:
- visual_bert
- vqa
- easy_vqa
---
|
# Visual BERT finetuned on easy_vqa |
|
This model is a fine-tuned version of VisualBERT on the easy_vqa dataset. The dataset is available at the following [GitHub repo](https://github.com/vzhou842/easy-VQA/tree/master/easy_vqa).
|
|
|
## VisualBERT |
|
VisualBERT is a multi-modal vision-and-language model. It can be used for tasks such as visual question answering, multiple choice, and visual reasoning.

For more info on VisualBERT, please refer to the [documentation](https://huggingface.co./docs/transformers/model_doc/visual_bert#overview).
|
|
|
## Dataset |
|
The easy_vqa dataset, on which the model was fine-tuned, can be installed via the `easy_vqa` pip package:

```shell
pip install easy_vqa
```
|
|
|
An instance of the dataset consists of a question, the answer to the question (a label), and the id of the image the question refers to.
Each image is 64x64 pixels and contains a single shape (rectangle, triangle, or circle) in a random position, filled with one color (blue, red, green, yellow, black, gray, brown, or teal).
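As a rough sketch, a single instance can be thought of as a record with these three fields (the field names below are illustrative, not the package's actual schema):

```python
# Illustrative structure of one easy_vqa instance.
# Field names are assumptions for illustration, not the package's actual schema.
instance = {
    "question": "What is the blue shape?",  # natural-language question
    "answer": "circle",                     # label from a small closed answer set
    "image_id": 0,                          # id of the 64x64 image the question refers to
}

print(instance["question"], "->", instance["answer"])
```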
|
|
|
The questions of the dataset ask about the shape (e.g. "What is the blue shape?"), the color of the shape (e.g. "What color is the triangle?"),
or the presence of a particular shape/color, in both affirmative and negative form (e.g. "Is there a red shape?").
Therefore, the possible answers to a question are: the three possible shapes, the eight possible colors, yes, and no.
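Since the answer space is closed, the task reduces to 13-way classification (3 shapes + 8 colors + yes/no). A minimal sketch of such a label vocabulary follows; the ordering is illustrative, and the fine-tuned model's actual id-to-label mapping may differ:

```python
# Build the closed answer vocabulary: 3 shapes + 8 colors + yes/no = 13 labels.
shapes = ["rectangle", "triangle", "circle"]
colors = ["blue", "red", "green", "yellow", "black", "gray", "brown", "teal"]
answers = shapes + colors + ["yes", "no"]

# Map each answer to a class id (ordering here is an assumption;
# the model's own id2label mapping may differ).
answer2id = {a: i for i, a in enumerate(answers)}

print(len(answer2id))  # 13 possible answers
```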
|
|
|
More information about the package functions for loading the images and the questions can be found in the dataset's [repo](https://github.com/vzhou842/easy-VQA/tree/master/easy_vqa),
along with a utility script for generating new instances of the dataset in case data augmentation is needed.
|
|
|
## How to Use |
|
Load the image processor and the model with the following code:

```python
from transformers import ViltProcessor, VisualBertForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = VisualBertForQuestionAnswering.from_pretrained("daki97/visualbert_finetuned_easy_vqa")
```
|
|
|
## Colab Demo

An example of using the model with the easy_vqa dataset is available [here](https://colab.research.google.com/drive/1yQfmz6wiSasRl6z-DmP-X403r3lZFqQS#scrollTo=HeVnH8BKkYCI).