StarCycle committed
Commit da411b9 • 1 Parent(s): 928abca

Update README.md

Files changed (1)
  1. README.md +77 -1
README.md CHANGED
@@ -78,7 +78,83 @@ You just need
  pip install protobuf
  ```

- ## Training
+ ## Data preparation
+
+ #### File structure
+
+ ```
+ ./data/llava_data
+ ├── LLaVA-Pretrain
+ │   ├── blip_laion_cc_sbu_558k.json
+ │   ├── blip_laion_cc_sbu_558k_meta.json
+ │   └── images
+ ├── LLaVA-Instruct-150K
+ │   └── llava_v1_5_mix665k.json
+ └── llava_images
+     ├── coco
+     │   └── train2017
+     ├── gqa
+     │   └── images
+     ├── ocr_vqa
+     │   └── images
+     ├── textvqa
+     │   └── train_images
+     └── vg
+         ├── VG_100K
+         └── VG_100K_2
+ ```
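+
+ To sanity-check the layout before training, a minimal sketch like the following can help (it assumes only the directory names shown above):
+
+ ```shell
+ #!/bin/bash
+ # Report any expected data directory that is missing.
+ root="./data/llava_data"
+ for d in \
+     "LLaVA-Pretrain/images" \
+     "llava_images/coco/train2017" \
+     "llava_images/gqa/images" \
+     "llava_images/ocr_vqa/images" \
+     "llava_images/textvqa/train_images" \
+     "llava_images/vg/VG_100K" \
+     "llava_images/vg/VG_100K_2"; do
+     [ -d "$root/$d" ] || echo "missing: $root/$d"
+ done
+ ```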
+
+ #### Pretrain Data
+
+ LLaVA-Pretrain
+
+ ```shell
+ # Make sure you have git-lfs installed (https://git-lfs.com)
+ git lfs install
+ git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
+ ```
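+
+ The clone lands in the current directory; assuming the file structure above, move it under `./data/llava_data` afterwards (a sketch, adjust to your setup):
+
+ ```shell
+ mkdir -p ./data/llava_data
+ mv LLaVA-Pretrain ./data/llava_data/
+ ```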
+
+ #### Finetune Data
+
+ 1. Text data
+
+    1. LLaVA-Instruct-150K
+
+       ```shell
+       # Make sure you have git-lfs installed (https://git-lfs.com)
+       git lfs install
+       git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K --depth=1
+       ```
+
+ 2. Image data (a combined download sketch follows this list)
+
+    1. COCO (coco): [train2017](http://images.cocodataset.org/zips/train2017.zip)
+
+    2. GQA (gqa): [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)
+
+    3. OCR-VQA (ocr_vqa): [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing)
+
+       1. ⚠️ Rename OCR-VQA's images so that every file has a `.jpg` extension!
+
+          ```shell
+          #!/bin/bash
+          # Copy every non-.jpg image to a duplicate with a .jpg extension.
+          ocr_vqa_path="<your-directory-path>"
+
+          find "$ocr_vqa_path" -type f | while IFS= read -r file; do
+              extension="${file##*.}"
+              if [ "$extension" != "jpg" ]; then
+                  cp -- "$file" "${file%.*}.jpg"
+              fi
+          done
+          ```
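+
+          (Note that `cp` duplicates the file under a `.jpg` name without re-encoding it; the point is only to normalize extensions, and image loaders generally detect the actual format from the bytes.)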
+
+    4. TextVQA (textvqa): [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)
+
+    5. VisualGenome (VG): [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
+
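+ The zips above unpack into the `llava_images` layout shown in the file-structure tree. A combined fetch could look like the sketch below (the URLs are the ones listed above; the target paths and unzip behavior are assumptions based on that tree):
+
+ ```shell
+ #!/bin/bash
+ # Sketch: download and unpack the finetune image sets into the expected layout.
+ img_root="./data/llava_data/llava_images"
+ mkdir -p "$img_root"/{coco,gqa,textvqa,vg}
+
+ wget -P "$img_root/coco"    http://images.cocodataset.org/zips/train2017.zip
+ wget -P "$img_root/gqa"     https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip
+ wget -P "$img_root/textvqa" https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
+ wget -P "$img_root/vg"      https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip
+ wget -P "$img_root/vg"      https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip
+
+ # Each archive already contains its top-level folder (e.g. train2017/, images/).
+ for z in "$img_root"/*/*.zip; do
+     unzip -q "$z" -d "$(dirname "$z")" && rm "$z"
+ done
+ # OCR-VQA images come from the Google Drive script linked above and are not fetched here.
+ ```
+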
+ ## Cheers! Train your model
+
  1. Alignment module pretraining
  ```
  NPROC_PER_NODE=8 xtuner train ./llava_internlm2_chat_7b_dinov2_e1_gpu8_pretrain.py --deepspeed deepspeed_zero2