The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.

/opt/conda/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(

Generating train split: 2264 examples [00:00, 14207.36 examples/s]
Generating validation split: 30 examples [00:00, 8467.64 examples/s]
Running tokenizer on train dataset: 0%| | 0/2264 [00:00<?, ? examples/s]

Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co./docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41. Non-default generation parameters: {'max_length': 200, 'early_stopping': True, 'num_beams': 5, 'forced_eos_token_id': 2}
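The three warnings above each name a concrete replacement. A minimal sketch of the updated calls, assuming a Trainer-based fine-tuning script (the `my_model` output directory is a placeholder, not taken from the log):

    from transformers import GenerationConfig, TrainingArguments
    from transformers.integrations import HfDeepSpeedConfig  # instead of transformers.deepspeed

    # `eval_strategy` replaces the deprecated `evaluation_strategy` keyword.
    args = TrainingArguments(output_dir="my_model", eval_strategy="epoch")

    # Move the flagged generation parameters out of the model config and into
    # a standalone generation_config.json, as the warning suggests.
    gen_config = GenerationConfig(
        max_length=200,
        early_stopping=True,
        num_beams=5,
        forced_eos_token_id=2,
    )
    gen_config.save_pretrained("my_model")  # writes my_model/generation_config.json

Saving the GenerationConfig alongside the model weights keeps decoding defaults out of the model config, which is the layout the warning says will be enforced in v4.41.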
  6%|▋ | 3620/57600 [38:22<9:30:56, 1.58it/s]
{'loss': 1.6224, 'learning_rate': 9.46376306620209e-07, 'epoch': 39.96}

/opt/conda/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()

  6%|▋ | 3640/57600 [38:36<10:04:11, 1.49it/s]
{'loss': 1.5808, 'learning_rate': 9.460627177700348e-07, 'epoch': 40.18}
  6%|▋ | 3660/57600 [38:49<9:43:23, 1.54it/s]
{'loss': 1.6187, 'learning_rate': 9.457491289198605e-07, 'epoch': 40.4}
  6%|▋ | 3677/57600 [39:00<9:48:35, 1.53it/s]
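The RuntimeWarning comes from CPython's multiprocessing calling os.fork() while the already-multithreaded JAX runtime is loaded in the parent process. A common workaround, sketched here under the assumption that the fork originates from dataloading or preprocessing workers rather than from your own code, is to switch the start method to "spawn" before any pools or workers are created:

    import multiprocessing as mp

    if __name__ == "__main__":
        # "spawn" launches fresh interpreters instead of forking the
        # multithreaded parent, avoiding the deadlock the warning describes.
        mp.set_start_method("spawn", force=True)

An alternative is to disable worker processes entirely (e.g. zero dataloader workers), trading the warning for a slower input pipeline.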