edbeeching (HF staff) committed on
Commit 5f5d133
1 parent: f75c568

Update README.md

Files changed (1)
  1. README.md +16 -8
README.md CHANGED
@@ -11,13 +11,9 @@ datasets:
 
 HuggingFaceH4/vsft-llava-1.5-7b-hf-trl is a Vision Language Model, created by performing VSFT on the [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) model
 
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6200d0a443eb0913fa2df7cc/q5GXv6Om4Hf2n6IB3e7DQ.png)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6200d0a443eb0913fa2df7cc/q5GXv6Om4Hf2n6IB3e7DQ.png) model with 260k image and conversation pairs from the [HuggingFaceH4/llava-instruct-mix-vsft](https://huggingface.co/datasets/HuggingFaceH4/llava-instruct-mix-vsft) dataset.
 
- Below is the model card of the LLaVA-7B model, copied from the original LLaVA model card that you can find [here](https://huggingface.co/liuhaotian/llava-v1.5-13b).
-
- Check out also the Google Colab demo to run LLaVA on a free-tier Google Colab instance: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1qsl6cd2c8gGtEW1xV5io7S8NHh-Cp1TV?usp=sharing)
-
- Or check out our Spaces demo! [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md-dark.svg)](https://huggingface.co/spaces/llava-hf/llava-4bit)
+ Or check out our Spaces demo! [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-md-dark.svg)](https://huggingface.co/spaces/HuggingFaceH4/vlm-playground)
 
 ## Model details
@@ -30,7 +26,7 @@ It is an auto-regressive language model, based on the transformer architecture.
 
 The model was trained on April 11th, 2024.
 
 **Example training script**
- https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py
+ [Train a VLM yourself with our TRL example](https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py)
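For context, the linked example is a standard TRL launcher script. A sketch of how one might invoke it is below; the flag values are illustrative assumptions, not the exact recipe used for this checkpoint.

```sh
# Hypothetical launch of the TRL VSFT example script; hyperparameter values
# here are assumptions for illustration, not this model's training settings.
accelerate launch examples/scripts/vsft_llava.py \
    --model_name_or_path llava-hf/llava-1.5-7b-hf \
    --dataset_name HuggingFaceH4/llava-instruct-mix-vsft \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 1 \
    --learning_rate 1.4e-5 \
    --num_train_epochs 1 \
    --gradient_checkpointing \
    --torch_dtype float16 \
    --output_dir vsft-llava-1.5-7b-hf
```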
 
 ## How to use the model
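Since the hunk below references `LlavaForConditionalGeneration.from_pretrained(`, here is a minimal inference sketch assuming the standard transformers LLaVA API; the prompt string, example image, and generation settings are illustrative assumptions, not the card's exact snippet.

```python
# Minimal inference sketch for a LLaVA-1.5 style checkpoint.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "HuggingFaceH4/vsft-llava-1.5-7b-hf-trl"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

# LLaVA-1.5 prompt format; the "<image>" token marks where image features go.
prompt = "USER: <image>\nWhat is shown in this image?\nASSISTANT:"
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```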
 
@@ -117,4 +113,16 @@ model = LlavaForConditionalGeneration.from_pretrained(
 
 ## License
 Llama 2 is licensed under the LLAMA 2 Community License,
- Copyright (c) Meta Platforms, Inc. All Rights Reserved.
+ Copyright (c) Meta Platforms, Inc. All Rights Reserved.
+
+ ## Citation
+ ```
+ @misc{vonwerra2022trl,
+   author = {Edward Beeching and Kashif Rasul and Younes Belkada and Shengyi Huang and Leandro von Werra and Lewis Tunstall},
+   title = {TRL: Transformer Reinforcement Learning},
+   year = {2020},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```