---
inference: false
license: apache-2.0
---

<br>
<br>

# LLaVA-Hound Model Card

## Model details

**Model type:**
LLaVA-Hound is an open-source video large multimodal model, fine-tuned on video instruction-following data on top of a large language model.

This model is the SFT version, trained on image and video instruction data.

Base LLM: [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5)

**Model date:**
Trained on March 15, 2024.

**Paper or resources for more information:**
https://github.com/RifleZhang/LLaVA-Hound-DPO

## License
This model follows the [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) license.

**Where to send questions or comments about the model:**
https://github.com/RifleZhang/LLaVA-Hound-DPO/issues

## Intended use
**Primary intended uses:**
Video (and image) instruction following.

**Primary intended users:**
Researchers in artificial intelligence and large multimodal models.

## Training dataset
ShareGPTVideo dataset.

## Evaluation
Follow the instructions at https://github.com/RifleZhang/LLaVA-Hound-DPO/blob/main/README.md.
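As a convenience, the sketch below shows one way to fetch this checkpoint locally with `huggingface_hub` before running the training or evaluation scripts from the repository above. The repository id used here is a placeholder assumption; substitute the actual Hugging Face repo name for this model.

```python
# Minimal sketch: download the SFT checkpoint so the LLaVA-Hound scripts can load it from disk.
from huggingface_hub import snapshot_download

# NOTE: placeholder repo id (assumption) -- replace with this model's actual repository name.
local_dir = snapshot_download(repo_id="ShareGPTVideo/LLaVA-Hound-SFT")

print(f"Checkpoint files downloaded to: {local_dir}")
```

Actual inference and evaluation should follow the scripts and environment setup described in the LLaVA-Hound-DPO repository linked above.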