StarCycle committed
Commit 69c10d4
Parent: da411b9

Update README.md

Files changed (1): README.md (+15 -16)
README.md CHANGED
@@ -29,8 +29,7 @@ LLaVA-InternLM-7B | 69.0 | 68.5 | 66.7 | 63.8 | 37.3
  LLaVA-InternLM2-7B | 73.3 | 74.6 | 71.7 | 72.0 | 42.5
  llava-dinov2-internlm2-7b-v1 | 64.0 | 65.2 | 62.9 | 61.6 | 45.3

- ## Quickstart
- ### Installation
+ ## Installation
  ```
  git clone https://github.com/InternLM/xtuner
  pip install -e ./xtuner[deepspeed]
@@ -46,6 +45,7 @@ xtuner chat internlm/internlm2-chat-7b \
  --prompt-template internlm2_chat \
  --image $IMAGE_PATH
  ```
+
  ## Common Errors
  1.
  ```
@@ -80,7 +80,7 @@ pip install protobuf

  ## Data preparation

- #### File structure
+ 1. File structure

  ```
  ./data/llava_data
@@ -104,7 +104,7 @@ pip install protobuf
     └── VG_100K_2
  ```

- #### Pretrain Data
+ 2. Pretrain Data

  LLaVA-Pretrain

@@ -114,11 +114,11 @@ git lfs install
  git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
  ```

- #### Finetune Data
+ 3. Finetune Data

- 1. Text data
+ 3.1 Text data

- 1. LLaVA-Instruct-150K
+ LLaVA-Instruct-150K

  ```shell
  # Make sure you have git-lfs installed (https://git-lfs.com)
@@ -126,15 +126,15 @@ git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
  git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K --depth=1
  ```

- 2. Image data
+ 3.2 Image data

- 1. COCO (coco): [train2017](http://images.cocodataset.org/zips/train2017.zip)
+ 3.2.1 COCO (coco): [train2017](http://images.cocodataset.org/zips/train2017.zip)

- 2. GQA (gqa): [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)
+ 3.2.2 GQA (gqa): [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)

- 3. OCR-VQA (ocr_vqa): [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing)
+ 3.2.3 OCR-VQA (ocr_vqa): [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing)

- 1. ⚠️ Modify the name of OCR-VQA's images to keep the extension as `.jpg`!
+ ⚠️⚠️⚠️ Rename OCR-VQA's images so that they keep the `.jpg` extension!

  ```shell
  #!/bin/bash
@@ -149,12 +149,11 @@ git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
  done
  ```

- 4. TextVQA (textvqa): [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)
+ 3.2.4 TextVQA (textvqa): [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)

- 5. VisualGenome (VG): [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
+ 3.2.5 VisualGenome (VG): [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)

- ## Cheers! Train your model
-
+ ## Cheers! Now train your own model!
  1. Alignment module pretraining
  ```
  NPROC_PER_NODE=8 xtuner train ./llava_internlm2_chat_7b_dinov2_e1_gpu8_pretrain.py --deepspeed deepspeed_zero2
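
For reference, the image archives listed under 3.2 can be fetched in one pass. Below is a minimal download sketch: the URLs come straight from the README, but the target directory names (`coco`, `gqa`, `textvqa`, `vg` under `./data/llava_data/llava_images/`) are an assumption based on the truncated file structure above, not something the commit confirms.

```shell
#!/bin/bash
# Download sketch: URLs are taken from the README; the directory
# layout below is assumed, not confirmed by the commit.
mkdir -p ./data/llava_data/llava_images
cd ./data/llava_data/llava_images

# COCO train2017
wget http://images.cocodataset.org/zips/train2017.zip
unzip -q train2017.zip -d coco && rm train2017.zip

# GQA images
wget https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip
unzip -q images.zip -d gqa && rm images.zip

# TextVQA train/val images
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip
unzip -q train_val_images.zip -d textvqa && rm train_val_images.zip

# Visual Genome, both parts
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip
unzip -q images.zip -d vg && unzip -q images2.zip -d vg
rm images.zip images2.zip
```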
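The OCR-VQA hunk truncates the rename script between `#!/bin/bash` and `done`. A minimal sketch of such a loop, with the images directory assumed to be `./data/llava_data/llava_images/ocr_vqa/images`:

```shell
#!/bin/bash
# Sketch: force a .jpg extension on every OCR-VQA image.
# The directory path is an assumption, not taken from the commit.
ocr_vqa_dir=./data/llava_data/llava_images/ocr_vqa/images
for file in "$ocr_vqa_dir"/*; do
  ext="${file##*.}"                  # current extension
  if [ "$ext" != "jpg" ]; then
    mv -- "$file" "${file%.*}.jpg"   # rename, keeping the base name
  fi
done
```

Renaming (rather than re-encoding) suffices here, presumably because the fine-tune annotations reference the OCR-VQA images by `.jpg` filenames.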