Update README.md
README.md CHANGED
@@ -12,6 +12,8 @@ The total size of the model is around 2.2B, which is suitable for embedded appli
 ```
 git clone https://github.com/InternLM/xtuner
 pip install -e ./xtuner[deepspeed]
+git clone https://huggingface.co/StarCycle/llava-clip-internlm2-1_8b-pretrain-v1
+cd ./llava-clip-internlm2-1_8b-pretrain-v1
 ```

 ## Common Errors
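A quick sanity check after the install step above: xtuner ships a `list-cfg` command for listing its bundled configs, and HuggingFace model repos keep their large weight files in git-lfs, so it helps to confirm both before cloning the checkpoint repo. A minimal sketch (the grep pattern is only a guess at the config family name):

```
# Confirm the xtuner install and look for the LLaVA-InternLM2 pretrain configs
xtuner list-cfg | grep llava_internlm2

# HuggingFace model repos store weights with git-lfs;
# without this the clone only contains pointer files
git lfs install
```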
@@ -80,8 +82,15 @@ Please check the final release version
 ## Cheers! Now train your own model!
 1. Alignment module pretraining
 ```
-
+# single GPU
+xtuner train ./llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu1_pretrain.py --deepspeed deepspeed_zero2
+
+# multiple GPUs
+NPROC_PER_NODE=8 xtuner train ./llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu1_pretrain.py --deepspeed deepspeed_zero2
 ```
+
+#### Remember to change the batch size and gradient accumulation parameters so that your batch_size * gradient_accumulation is roughly equal to mine, in order to reproduce the result.
+
 The checkpoint and tensorboard logs are saved by default in ./work_dirs/. I only train it for 1 epoch, the same as the original LLaVA paper. Some researchers also report that training for multiple epochs makes the model overfit the training dataset and perform worse in other domains.

 This is my loss curve for llava-clip-internlm2-1_8b-pretrain-v1:
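A rough way to read the batch-size note in the hunk above: the effective batch size is the per-GPU batch size times the gradient accumulation steps times the number of GPUs, so when one factor changes, adjust another to keep the product roughly the same. The numbers below are hypothetical, and the names batch_size / accumulative_counts are the usual xtuner config fields rather than values taken from this repo's config:

```
# effective batch size = batch_size (per GPU) x accumulative_counts x number of GPUs
echo $(( 16 * 4 * 8 ))    # hypothetical 8-GPU run: 16 x 4 x 8  = 512
echo $(( 16 * 32 * 1 ))   # single GPU: raise accumulative_counts so 16 x 32 x 1 = 512
```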
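Since the checkpoints and tensorboard logs land in ./work_dirs/ by default, the pretraining loss curve can be watched live with TensorBoard; this assumes tensorboard is installed separately, and the exact log subdirectory under work_dirs may differ:

```
pip install tensorboard
# point TensorBoard at the default output directory used by xtuner
tensorboard --logdir ./work_dirs
```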