Training in progress, epoch 1
0%| | 0/1365 [00:00<?, ?it/s][WARNING|logging.py:314] 2024-02-01 17:58:55,470 >> You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
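The tokenizer warning above recommends tokenizing and padding in one batched `__call__` rather than encoding each text and padding afterwards. A minimal sketch of the two patterns, assuming an illustrative Llama checkpoint and example texts (none of these names come from this log):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed checkpoint
texts = ["Hello world", "A longer example sentence."]

# Slower pattern the warning refers to: encode each text, then pad in a second call.
encoded = [tokenizer.encode(t) for t in texts]
padded = tokenizer.pad({"input_ids": encoded}, return_tensors="pt")

# Faster pattern with a fast tokenizer: tokenize and pad the whole batch in one __call__.
batch = tokenizer(texts, padding=True, return_tensors="pt")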
0%| | 1/1365 [00:11<4:10:37, 11.02s/it]
[2024-02-01 17:59:06,373] [WARNING] [stage3.py:1949:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time
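The DeepSpeed warning above reports allocator cache flushes caused by high memory pressure and suggests emptying the accelerator cache at the same point on every rank. A minimal sketch of that suggestion, assuming a hand-written DeepSpeed loop with hypothetical `model_engine` and `dataloader` names (this run goes through the Trainer, so the sketch is illustrative only):

from deepspeed.accelerator import get_accelerator

for step, batch in enumerate(dataloader):
    loss = model_engine(**batch).loss
    model_engine.backward(loss)
    model_engine.step()
    # Flush cached blocks so all ranks release memory together, as the warning suggests;
    # this trades a little speed for relief from high memory pressure.
    get_accelerator().empty_cache()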
1%|β–Œ | 10/1365 [00:43<1:23:26, 3.70s/it]
1%|β–ˆβ– | 20/1365 [01:19<1:20:49, 3.61s/it]
2%|β–ˆβ–‹ | 30/1365 [01:55<1:20:11, 3.60s/it]
3%|β–ˆβ–ˆβ–Ž | 40/1365 [02:31<1:19:28, 3.60s/it]
4%|β–ˆβ–ˆβ–‰ | 50/1365 [03:07<1:19:00, 3.61s/it]
4%|β–ˆβ–ˆβ–ˆβ– | 60/1365 [03:43<1:18:34, 3.61s/it]
5%|β–ˆβ–ˆβ–ˆβ–ˆ | 70/1365 [04:19<1:17:55, 3.61s/it]
6%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 80/1365 [04:55<1:17:12, 3.61s/it]
7%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 90/1365 [05:31<1:16:32, 3.60s/it]
7%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 100/1365 [06:07<1:16:06, 3.61s/it]
8%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 110/1365 [06:44<1:15:24, 3.60s/it]
9%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 120/1365 [07:20<1:14:48, 3.61s/it]
10%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 130/1365 [07:56<1:14:12, 3.61s/it]
10%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 140/1365 [08:32<1:13:29, 3.60s/it]
11%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 150/1365 [09:07<1:12:25, 3.58s/it]
12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 160/1365 [09:43<1:12:00, 3.59s/it]
12%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 170/1365 [10:20<1:12:29, 3.64s/it]
13%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 179/1365 [10:52<1:11:26, 3.61s/it]
14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 189/1365 [11:28<1:10:32, 3.60s/it]
15%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 199/1365 [12:04<1:09:55, 3.60s/it]
15%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 210/1365 [12:44<1:09:11, 3.59s/it]
16%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 220/1365 [13:20<1:08:33, 3.59s/it]
17%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 230/1365 [13:56<1:08:01, 3.60s/it]
18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 240/1365 [14:32<1:07:15, 3.59s/it]
18%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 250/1365 [15:07<1:06:48, 3.59s/it]
19%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 260/1365 [15:44<1:06:51, 3.63s/it]
20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 270/1365 [16:20<1:06:13, 3.63s/it]
20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 273/1365 [16:31<1:05:09, 3.58s/it][INFO|trainer.py:3166] 2024-02-01 18:15:26,429 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-02-01 18:15:26,429 >> Num examples = 15431
[INFO|trainer.py:3171] 2024-02-01 18:15:26,429 >> Batch size = 32
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 58/61 [00:28<00:01, 1.99it/s]
20%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 273/1365 [17:01<1:05:09, 3.58s/it][INFO|trainer.py:2889] 2024-02-01 18:15:58,319 >> Saving model checkpoint to ./tmp-checkpoint-273
[INFO|configuration_utils.py:483] 2024-02-01 18:15:58,323 >> Configuration saved in ./tmp-checkpoint-273/config.json
[INFO|configuration_utils.py:594] 2024-02-01 18:15:58,326 >> Configuration saved in ./tmp-checkpoint-273/generation_config.json
[INFO|modeling_utils.py:2382] 2024-02-01 18:16:01,522 >> Model weights saved in ./tmp-checkpoint-273/pytorch_model.bin
[INFO|tokenization_utils_base.py:2432] 2024-02-01 18:16:01,541 >> tokenizer config file saved in ./tmp-checkpoint-273/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-02-01 18:16:01,543 >> Special tokens file saved in ./tmp-checkpoint-273/special_tokens_map.json
/fsx/sanchit/miniconda3/envs/venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
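The UserWarning above is raised whenever `state_dict` receives positional arguments; here it is emitted from library code during checkpointing rather than from the training script itself. A minimal sketch of the deprecated versus preferred call, using a hypothetical module:

from collections import OrderedDict
import torch.nn as nn

module = nn.Linear(4, 4)

# Deprecated: destination, prefix and keep_vars passed positionally (triggers the warning).
sd_old = module.state_dict(OrderedDict(), "encoder.", False)

# Preferred: pass the same arguments as keywords.
sd_new = module.state_dict(destination=OrderedDict(), prefix="encoder.", keep_vars=False)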
[2024-02-01 18:16:01,626] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step273 is about to be saved!
[2024-02-01 18:16:01,783] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./tmp-checkpoint-273/global_step273/zero_pp_rank_0_mp_rank_00_model_states.pt
[2024-02-01 18:16:01,784] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./tmp-checkpoint-273/global_step273/zero_pp_rank_0_mp_rank_00_model_states.pt...
[2024-02-01 18:16:01,787] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./tmp-checkpoint-273/global_step273/zero_pp_rank_0_mp_rank_00_model_states.pt.
[2024-02-01 18:16:01,792] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./tmp-checkpoint-273/global_step273/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2024-02-01 18:16:05,650] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./tmp-checkpoint-273/global_step273/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2024-02-01 18:16:05,658] [INFO] [engine.py:3393:_save_zero_checkpoint] zero checkpoint saved ./tmp-checkpoint-273/global_step273/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
[2024-02-01 18:16:05,936] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step273 is ready now!
[INFO|tokenization_utils_base.py:2432] 2024-02-01 18:16:08,421 >> tokenizer config file saved in ./tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-02-01 18:16:08,424 >> Special tokens file saved in ./special_tokens_map.json
21%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 280/1365 [17:38<1:31:41, 5.07s/it]
21%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 290/1365 [18:14<1:04:59, 3.63s/it]