AskVideos-7B-Instruct-v0.1

Model details

Model type: AskVideos-7B-Instruct-v0.1 is an open-source chatbot trained by fine-tuning a Video-LLaMA variant on additional video Q&A data. It pairs a frozen BLIP-style image encoder with a frozen Vicuna 7B v1.1 LLM to answer video-text queries. A video Q-Former derives a video feature from the encoded frames, and the result is projected into the LLM embedding space.
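The data flow described above can be traced as a sequence of tensor shapes. The sketch below is illustrative only: it does not load the model, the function names are hypothetical, and every dimension except the 16 frames per clip (taken from the training setup) and the 4096-wide hidden size typical of a LLaMA-7B-class LLM is an assumption rather than a value confirmed by this model card.

```python
# Illustrative shape trace of the pipeline: frozen image encoder ->
# video Q-Former -> projection into the LLM space.
# All names and most dimensions are assumptions, not the released code.

NUM_FRAMES = 16           # from the training setup: 16 frames per clip
TOKENS_PER_FRAME = 32     # assumed tokens emitted per frame by the image encoder
VIDEO_QUERY_TOKENS = 32   # assumed number of video Q-Former query tokens
QFORMER_DIM = 768         # assumed Q-Former hidden size
LLM_DIM = 4096            # hidden size of a LLaMA-7B-class LLM


def encode_frames(num_frames):
    """Frozen image encoder: one set of tokens per sampled frame."""
    return (num_frames, TOKENS_PER_FRAME, QFORMER_DIM)


def video_qformer(frame_feats):
    """Video Q-Former: pools all frame tokens into a fixed set of video tokens."""
    _, _, dim = frame_feats
    return (VIDEO_QUERY_TOKENS, dim)


def project_to_llm(video_feats):
    """Linear projection of each video token into the LLM embedding space."""
    tokens, _ = video_feats
    return (tokens, LLM_DIM)


def trace_shapes(num_frames=NUM_FRAMES):
    """Return the shapes after each of the three stages."""
    frames = encode_frames(num_frames)
    video = video_qformer(frames)
    return frames, video, project_to_llm(video)
```

Under these assumptions the frozen LLM receives a short, fixed-length sequence of video tokens regardless of how many frames were sampled, which is what lets the LLM itself stay frozen.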

GitHub repo for demo: https://github.com/AskYoutubeAI/AskVideos-Instruct

Acknowledgement: This model is based on Video-LLaMA. Check out the original work here: https://github.com/DAMO-NLP-SG/Video-LLaMA

License

AskVideos-7B-Instruct-v0.1 code and models are distributed under the Apache License 2.0.

Training dataset

  • Fine-tuned on 50K synthetic video Q&A pairs mined from videos.
  • For each Q&A pair, 16 frames are sampled from a 30s video.
  • Fine-tuned on top of Video-LLaMA Vicuna 7B.
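
The frame sampling step can be sketched as follows. The model card states only that 16 frames are sampled over a 30s video; the uniform, segment-centred spacing used here is an assumption, and the helper names `sample_frame_times` and `times_to_frame_indices` are hypothetical.

```python
def sample_frame_times(duration_s=30.0, num_frames=16):
    """Return evenly spaced timestamps in seconds.

    The clip is split into num_frames equal segments and the centre of
    each segment is used (an assumed strategy, not confirmed by the source).
    """
    if duration_s <= 0 or num_frames <= 0:
        raise ValueError("duration_s and num_frames must be positive")
    segment = duration_s / num_frames
    return [segment * (i + 0.5) for i in range(num_frames)]


def times_to_frame_indices(times, fps):
    """Convert timestamps to frame indices for a video decoded at fps."""
    return [int(t * fps) for t in times]
```

For a 30s clip this yields one timestamp every 1.875s, starting at 0.9375s. A clip shorter or longer than 30s would simply get a different spacing with the same frame count.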