0%| | 0/1365 [00:00<?, ?it/s][WARNING|logging.py:314] 2024-02-01 17:58:55,470 >> You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding. |
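The tokenizer warning above is about how batches are encoded. As a rough sketch of the pattern it refers to (not taken from this run's script; the checkpoint path and example texts are placeholders), the fast path is a single `__call__` on the whole batch instead of per-example `encode` followed by a separate `pad`:

```python
from transformers import AutoTokenizer

# placeholder path: any checkpoint with a fast (Rust-backed) Llama tokenizer
tokenizer = AutoTokenizer.from_pretrained("path/to/llama-checkpoint")

texts = ["first example", "a slightly longer second example"]

# slower pattern the warning refers to: encode each text, then pad separately
slow_batch = tokenizer.pad(
    {"input_ids": [tokenizer.encode(t) for t in texts]},
    padding=True,
    return_tensors="pt",
)

# faster pattern: let __call__ tokenize and pad the whole batch in one go
fast_batch = tokenizer(texts, padding=True, return_tensors="pt")
```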
|
0%| | 1/1365 [00:11<4:10:37, 11.02s/it] |
|
[2024-02-01 17:59:06,373] [WARNING] [stage3.py:1949:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding get_accelerator().empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time |
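The cache-flush warning spells out its own mitigation: call `get_accelerator().empty_cache()` at the same point on every rank. A minimal sketch of where that could go in a hand-written training loop (`model_engine`, `train_dataloader`, and the flush interval are assumptions for illustration; this run itself goes through the HF `Trainer`):

```python
from deepspeed.accelerator import get_accelerator

for step, batch in enumerate(train_dataloader):
    loss = model_engine(**batch).loss   # forward pass through the DeepSpeed engine
    model_engine.backward(loss)         # DeepSpeed-managed backward
    model_engine.step()                 # optimizer + lr-scheduler step

    # flush the allocator cache on all ranks at the same step so that no
    # single rank stalls alone under memory pressure
    if step % 50 == 0:
        get_accelerator().empty_cache()
```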
1%|█ | 10/1365 [00:43<1:23:26, 3.70s/it]
1%|██ | 20/1365 [01:19<1:20:49, 3.61s/it]
2%|██ | 30/1365 [01:55<1:20:11, 3.60s/it]
3%|███ | 40/1365 [02:31<1:19:28, 3.60s/it]
4%|███ | 50/1365 [03:07<1:19:00, 3.61s/it]
4%|████ | 60/1365 [03:43<1:18:34, 3.61s/it]
5%|████ | 70/1365 [04:19<1:17:55, 3.61s/it]
6%|█████ | 80/1365 [04:55<1:17:12, 3.61s/it]
7%|██████ | 90/1365 [05:31<1:16:32, 3.60s/it]
7%|██████ | 100/1365 [06:07<1:16:06, 3.61s/it]
8%|███████ | 110/1365 [06:44<1:15:24, 3.60s/it]
9%|███████ | 120/1365 [07:20<1:14:48, 3.61s/it]
10%|████████ | 130/1365 [07:56<1:14:12, 3.61s/it]
10%|████████ | 140/1365 [08:32<1:13:29, 3.60s/it]
11%|█████████ | 150/1365 [09:07<1:12:25, 3.58s/it]
12%|██████████ | 160/1365 [09:43<1:12:00, 3.59s/it]
12%|██████████ | 170/1365 [10:20<1:12:29, 3.64s/it]
13%|███████████ | 179/1365 [10:52<1:11:26, 3.61s/it]
14%|███████████ | 189/1365 [11:28<1:10:32, 3.60s/it]
15%|████████████ | 199/1365 [12:04<1:09:55, 3.60s/it]
15%|████████████ | 210/1365 [12:44<1:09:11, 3.59s/it]
16%|█████████████ | 220/1365 [13:20<1:08:33, 3.59s/it]
17%|██████████████ | 230/1365 [13:56<1:08:01, 3.60s/it]
18%|██████████████ | 240/1365 [14:32<1:07:15, 3.59s/it]
18%|███████████████ | 250/1365 [15:07<1:06:48, 3.59s/it]
19%|███████████████ | 260/1365 [15:44<1:06:51, 3.63s/it]
20%|████████████████ | 270/1365 [16:20<1:06:13, 3.63s/it]
20%|████████████████ | 273/1365 [16:31<1:05:09, 3.58s/it]
[INFO|trainer.py:3166] 2024-02-01 18:15:26,429 >> ***** Running Evaluation *****
|
[INFO|trainer.py:3168] 2024-02-01 18:15:26,429 >> Num examples = 15431 |
|
[INFO|trainer.py:3171] 2024-02-01 18:15:26,429 >> Batch size = 32

95%|████████████████████████████████████████████████████████████████████████████████ | 58/61 [00:28<00:01, 1.99it/s]

20%|████████████████ | 273/1365 [17:01<1:05:09, 3.58s/it]
[INFO|trainer.py:2889] 2024-02-01 18:15:58,319 >> Saving model checkpoint to ./tmp-checkpoint-273
|
[INFO|configuration_utils.py:483] 2024-02-01 18:15:58,323 >> Configuration saved in ./tmp-checkpoint-273/config.json |
|
[INFO|configuration_utils.py:594] 2024-02-01 18:15:58,326 >> Configuration saved in ./tmp-checkpoint-273/generation_config.json |
|
[INFO|modeling_utils.py:2382] 2024-02-01 18:16:01,522 >> Model weights saved in ./tmp-checkpoint-273/pytorch_model.bin |
|
[INFO|tokenization_utils_base.py:2432] 2024-02-01 18:16:01,541 >> tokenizer config file saved in ./tmp-checkpoint-273/tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2441] 2024-02-01 18:16:01,543 >> Special tokens file saved in ./tmp-checkpoint-273/special_tokens_map.json |
|
/fsx/sanchit/miniconda3/envs/venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html |
|
warnings.warn( |
|
[2024-02-01 18:16:01,626] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step273 is about to be saved! |
|
[2024-02-01 18:16:01,783] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: ./tmp-checkpoint-273/global_step273/zero_pp_rank_0_mp_rank_00_model_states.pt |
|
[2024-02-01 18:16:01,784] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./tmp-checkpoint-273/global_step273/zero_pp_rank_0_mp_rank_00_model_states.pt... |
|
[2024-02-01 18:16:01,787] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./tmp-checkpoint-273/global_step273/zero_pp_rank_0_mp_rank_00_model_states.pt. |
|
[2024-02-01 18:16:01,792] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving ./tmp-checkpoint-273/global_step273/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... |
|
[2024-02-01 18:16:05,650] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved ./tmp-checkpoint-273/global_step273/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. |
|
[2024-02-01 18:16:05,658] [INFO] [engine.py:3393:_save_zero_checkpoint] zero checkpoint saved ./tmp-checkpoint-273/global_step273/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt |
|
[2024-02-01 18:16:05,936] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step273 is ready now! |
|
[INFO|tokenization_utils_base.py:2432] 2024-02-01 18:16:08,421 >> tokenizer config file saved in ./tokenizer_config.json |
|
[INFO|tokenization_utils_base.py:2441] 2024-02-01 18:16:08,424 >> Special tokens file saved in ./special_tokens_map.json

21%|████████████████ | 280/1365 [17:38<1:31:41, 5.07s/it]
21%|█████████████████ | 290/1365 [18:14<1:04:59, 3.63s/it]