StarCycle committed
Commit 1f4ce67 (parent c7fd5f1)

Update README.md

Files changed (1): README.md (+2 -0)
README.md CHANGED
@@ -168,6 +168,8 @@ git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
  ```
  NPROC_PER_NODE=8 xtuner train ./llava_internlm2_chat_7b_dinov2_e1_gpu8_pretrain.py --deepspeed deepspeed_zero2
  ```
+ #### Remember to change the batch size and gradient accumulation parameters to fit your hardware, so that your batch_size * gradient_accumulation is roughly equal to mine and the result can be reproduced.
+
  The checkpoint and tensorboard logs are saved by default in ./work_dirs/. I train for only 1 epoch to match the original LLaVA paper. Some studies also report that training for multiple epochs makes the model overfit the training dataset and perform worse in other domains.

  Here is my loss curve:
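
The added note amounts to keeping the effective batch size (GPUs × per-device batch size × gradient accumulation steps) roughly constant across hardware setups. Below is a minimal sketch of that arithmetic; the names `batch_size` and `accumulative_counts` follow XTuner's usual config convention, and the concrete numbers are placeholders rather than the values from this repo's config.

```python
# Sketch of the reproducibility constraint from the note above.
# Assumption: XTuner-style config variables (batch_size per GPU,
# accumulative_counts for gradient accumulation); numbers are placeholders.

def effective_batch_size(gpus: int, batch_size: int, accumulative_counts: int) -> int:
    """Samples contributing to each optimizer step."""
    return gpus * batch_size * accumulative_counts

# Hypothetical reference run on 8 GPUs (matches NPROC_PER_NODE=8 above).
reference = effective_batch_size(gpus=8, batch_size=32, accumulative_counts=1)

# With fewer GPUs, scale accumulative_counts (or batch_size) so the
# product stays roughly the same, e.g. 4 GPUs with 2 accumulation steps.
adjusted = effective_batch_size(gpus=4, batch_size=32, accumulative_counts=2)

assert reference == adjusted
print(reference, adjusted)
```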