StarCycle committed on
Commit
4a45204
1 Parent(s): 0084f2f

Update README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -88,6 +88,27 @@ To use tensorboard to visualize the training loss curve:
  pip install future tensorboard
  ```
 
+ 5. If your training process is killed during data preprocessing, you can modify the `map_num_proc` in xtuner/xtuner/dataset
+ /huggingface.py
+ ```
+ def process(dataset,
+             do_dataset_tokenization=True,
+             tokenizer=None,
+             max_length=None,
+             dataset_map_fn=None,
+             template_map_fn=None,
+             max_dataset_length=None,
+             split='train',
+             remove_unused_columns=False,
+             rename_maps=[],
+             shuffle_before_pack=True,
+             pack_to_max_length=True,
+             use_varlen_attn=False,
+             input_ids_with_output=True,
+             with_image_token=False,
+             map_num_proc=32): # modify it to 1
+ ```
+
  ## Data prepration
  1. File structure
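
Note on step 5: a preprocessing job that gets killed is often, though not always, running out of memory because each worker process holds its own working copy of data. Setting `map_num_proc` to 1 is the safest choice; the sketch below shows one way to pick a value between 1 and the default instead. `pick_map_num_proc`, `mem_per_worker_gb`, and `available_mem_gb` are hypothetical names used for illustration only and are not part of xtuner. The per-worker memory figure is an assumption you would need to measure on your own data.

```python
import os


def pick_map_num_proc(requested=32, mem_per_worker_gb=2.0, available_mem_gb=None):
    """Hypothetical helper: cap the worker count by CPU count and a memory budget.

    requested          -- the worker count you would like (32 is the default shown in the diff)
    mem_per_worker_gb  -- assumed peak memory per worker; measure this yourself
    available_mem_gb   -- memory you are willing to spend; None skips the memory cap
    """
    # Never ask for more workers than there are CPUs.
    cpu_cap = os.cpu_count() or 1
    workers = min(requested, cpu_cap)

    # Optionally cap by how many workers fit in the memory budget.
    if available_mem_gb is not None:
        mem_cap = int(available_mem_gb // mem_per_worker_gb)
        workers = min(workers, mem_cap)

    # Always keep at least one worker so preprocessing can still run.
    return max(1, workers)


if __name__ == "__main__":
    # Example: an 8 GB budget with an assumed 2 GB per worker allows at most 4 workers.
    print(pick_map_num_proc(requested=32, mem_per_worker_gb=2.0, available_mem_gb=8.0))
```

The returned value could then be used in place of the hard-coded `map_num_proc=32`. If the job is still killed, falling back to 1 as the README suggests remains the simplest option.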