StarCycle committed
Commit da411b9 • 1 Parent(s): 928abca

Update README.md

Files changed (1)
  1. README.md +77 -1
README.md CHANGED
@@ -78,7 +78,83 @@ You just need
  pip install protobuf
  ```

- ## Training
+ ## Data preparation
+
+ #### File structure
+
+ ```
+ ./data/llava_data
+ ├── LLaVA-Pretrain
+ │   ├── blip_laion_cc_sbu_558k.json
+ │   ├── blip_laion_cc_sbu_558k_meta.json
+ │   └── images
+ ├── LLaVA-Instruct-150K
+ │   └── llava_v1_5_mix665k.json
+ └── llava_images
+     ├── coco
+     │   └── train2017
+     ├── gqa
+     │   └── images
+     ├── ocr_vqa
+     │   └── images
+     ├── textvqa
+     │   └── train_images
+     └── vg
+         ├── VG_100K
+         └── VG_100K_2
+ ```
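+
+ To sanity-check the layout before training, a minimal sketch like the following can help (it assumes only the directory names shown above):
+
+ ```shell
+ #!/bin/bash
+ # Report any expected data directory that is missing.
+ root="./data/llava_data"
+ for d in \
+     "LLaVA-Pretrain/images" \
+     "llava_images/coco/train2017" \
+     "llava_images/gqa/images" \
+     "llava_images/ocr_vqa/images" \
+     "llava_images/textvqa/train_images" \
+     "llava_images/vg/VG_100K" \
+     "llava_images/vg/VG_100K_2"; do
+     [ -d "$root/$d" ] || echo "missing: $root/$d"
+ done
+ ```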
+
+ #### Pretrain Data
+
+ LLaVA-Pretrain
+
+ ```shell
+ # Make sure you have git-lfs installed (https://git-lfs.com)
+ git lfs install
+ git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
+ ```
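+
+ The clone lands in the current directory; assuming the file structure above, move it under `./data/llava_data` afterwards (a sketch, adjust to your setup):
+
+ ```shell
+ mkdir -p ./data/llava_data
+ mv LLaVA-Pretrain ./data/llava_data/
+ ```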
+
+ #### Finetune Data
+
+ 1. Text data
+
+    1. LLaVA-Instruct-150K
+
+       ```shell
+       # Make sure you have git-lfs installed (https://git-lfs.com)
+       git lfs install
+       git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K --depth=1
+       ```
+
+ 2. Image data (a combined download sketch follows this list)
+
+    1. COCO (coco): [train2017](http://images.cocodataset.org/zips/train2017.zip)
+
+    2. GQA (gqa): [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)
+
+    3. OCR-VQA (ocr_vqa): [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing)
+
+       1. ⚠️ Rename OCR-VQA's images so that every file has a `.jpg` extension!
+
+          ```shell
+          #!/bin/bash
+          # Copy every non-.jpg image to a duplicate with a .jpg extension.
+          ocr_vqa_path="<your-directory-path>"
+
+          find "$ocr_vqa_path" -type f | while IFS= read -r file; do
+              extension="${file##*.}"
+              if [ "$extension" != "jpg" ]; then
+                  cp -- "$file" "${file%.*}.jpg"
+              fi
+          done
+          ```
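+
+          (Note that `cp` duplicates the file under a `.jpg` name without re-encoding it; the point is only to normalize extensions, and image loaders generally detect the actual format from the bytes.)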
+
+    4. TextVQA (textvqa): [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)
+
+    5. VisualGenome (VG): [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
+
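+ The zips above unpack into the `llava_images` layout shown in the file-structure tree. A combined fetch could look like the sketch below (the URLs are the ones listed above; the target paths and unzip behavior are assumptions based on that tree):
+
+ ```shell
+ #!/bin/bash
+ # Sketch: download and unpack the finetune image sets into the expected layout.
+ img_root="./data/llava_data/llava_images"
+ mkdir -p "$img_root"/{coco,gqa,textvqa,vg}
+
+ wget -P "$img_root/coco"    http://images.cocodataset.org/zips/train2017.zip
+ wget -P "$img_root/gqa"     https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip
+ wget -P "$img_root/textvqa" https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
+ wget -P "$img_root/vg"      https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip
+ wget -P "$img_root/vg"      https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip
+
+ # Each archive already contains its top-level folder (e.g. train2017/, images/).
+ for z in "$img_root"/*/*.zip; do
+     unzip -q "$z" -d "$(dirname "$z")" && rm "$z"
+ done
+ # OCR-VQA images come from the Google Drive script linked above and are not fetched here.
+ ```
+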
+ ## Cheers! Train your model
+
  1. Alignment module pretraining
  ```
  NPROC_PER_NODE=8 xtuner train ./llava_internlm2_chat_7b_dinov2_e1_gpu8_pretrain.py --deepspeed deepspeed_zero2