MBZUAI/Video-ChatGPT-7B


---
license: cc-by-4.0
datasets:
- MBZUAI/Video-Instruct-Dataset
language:
- en
library_name: transformers
pipeline_tag: visual-question-answering
---

+ Video-ChatGPT is a large vision-language model that pairs a visual encoder with a large language model (LLM), enabling video understanding and conversation about videos.
+ The multimodal design is simple and scalable: it builds on pretrained visual and language encoders and adapts only a linear projection layer for multimodal alignment.
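
The linear-projection idea described above can be illustrated with a small, dependency-free sketch. This is not the model's actual code: the dimensions, function names (`make_projection`, `project`, `build_llm_input`), the random stand-in features, and the choice to place projected video tokens before the text tokens are all assumptions made for illustration. The point is only that the encoders stay fixed and a single affine map moves visual features into the LLM's embedding space.

```python
import random

# Hypothetical toy sizes; real encoders use far larger dimensions.
VISUAL_DIM = 4   # width of the frozen visual encoder's features
LLM_DIM = 6      # width of the frozen LLM's token embeddings


def make_projection(in_dim, out_dim, seed=0):
    """Create the only trainable parameters: a weight matrix and a bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(features, weight, bias):
    """Apply y = W x + b to each visual feature vector."""
    return [
        [sum(w * x for w, x in zip(row, feat)) + b
         for row, b in zip(weight, bias)]
        for feat in features
    ]


def build_llm_input(video_features, text_embeddings, weight, bias):
    """Project video features into the LLM space and prepend them to the text.

    The ordering (video first, then text) is an assumption of this sketch.
    """
    return project(video_features, weight, bias) + text_embeddings


# Random stand-ins for the outputs of the frozen encoders.
rng = random.Random(1)
video_features = [[rng.random() for _ in range(VISUAL_DIM)] for _ in range(3)]
text_embeddings = [[rng.random() for _ in range(LLM_DIM)] for _ in range(5)]

weight, bias = make_projection(VISUAL_DIM, LLM_DIM)
llm_input = build_llm_input(video_features, text_embeddings, weight, bias)
print(len(llm_input), len(llm_input[0]))  # 8 6
```

In this sketch only `weight` and `bias` would be updated during training, which is what keeps the alignment step cheap compared with fine-tuning either encoder.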