license: cc-by-nc-sa-4.0
These are the weights of NExT-GPT covering text, image, video, and audio (tiva). The model is built upon 1) Vicuna-7B version-0, 2) ImageBind, 3) Stable Diffusion v1.5, 4) AudioLDM-l-full, and 5) ZeroScope v2_576w. For details on how to use the model, please refer to our code repository.