[2024-12-19 09:48:11,879][00337] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-19 09:48:11,882][00337] Rollout worker 0 uses device cpu
[2024-12-19 09:48:11,885][00337] Rollout worker 1 uses device cpu
[2024-12-19 09:48:11,888][00337] Rollout worker 2 uses device cpu
[2024-12-19 09:48:11,891][00337] Rollout worker 3 uses device cpu
[2024-12-19 09:48:11,893][00337] Rollout worker 4 uses device cpu
[2024-12-19 09:48:11,894][00337] Rollout worker 5 uses device cpu
[2024-12-19 09:48:11,896][00337] Rollout worker 6 uses device cpu
[2024-12-19 09:48:11,897][00337] Rollout worker 7 uses device cpu
[2024-12-19 09:48:12,083][00337] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 09:48:12,085][00337] InferenceWorker_p0-w0: min num requests: 2
[2024-12-19 09:48:12,119][00337] Starting all processes...
[2024-12-19 09:48:12,121][00337] Starting process learner_proc0
[2024-12-19 09:48:12,126][00337] EvtLoop [Runner_EvtLoop, process=main process 337] unhandled exception in slot='_on_start' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2024-12-19 09:48:12,130][00337] Unhandled exception cannot pickle 'TLSBuffer' object in evt loop Runner_EvtLoop
[2024-12-19 09:48:12,134][00337] Uncaught exception in Runner evt loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner.py", line 770, in run
    evt_loop_status = self.event_loop.exec()
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 403, in exec
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 399, in exec
    while self._loop_iteration():
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 383, in _loop_iteration
    self._process_signal(s)
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 358, in _process_signal
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2024-12-19 09:48:12,135][00337] Runner profile tree view:
main_loop: 0.0165
[2024-12-19 09:48:12,139][00337] Collected {}, FPS: 0.0
[2024-12-19 09:56:20,083][00337] Environment doom_basic already registered, overwriting...
[2024-12-19 09:56:20,086][00337] Environment doom_two_colors_easy already registered, overwriting...
[2024-12-19 09:56:20,087][00337] Environment doom_two_colors_hard already registered, overwriting...
[2024-12-19 09:56:20,089][00337] Environment doom_dm already registered, overwriting...
[2024-12-19 09:56:20,091][00337] Environment doom_dwango5 already registered, overwriting...
[2024-12-19 09:56:20,091][00337] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-12-19 09:56:20,094][00337] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-12-19 09:56:20,095][00337] Environment doom_my_way_home already registered, overwriting...
[2024-12-19 09:56:20,096][00337] Environment doom_deadly_corridor already registered, overwriting...
[2024-12-19 09:56:20,097][00337] Environment doom_defend_the_center already registered, overwriting...
[2024-12-19 09:56:20,098][00337] Environment doom_defend_the_line already registered, overwriting...
[2024-12-19 09:56:20,099][00337] Environment doom_health_gathering already registered, overwriting...
[2024-12-19 09:56:20,100][00337] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-12-19 09:56:20,102][00337] Environment doom_battle already registered, overwriting...
[2024-12-19 09:56:20,103][00337] Environment doom_battle2 already registered, overwriting...
[2024-12-19 09:56:20,104][00337] Environment doom_duel_bots already registered, overwriting...
[2024-12-19 09:56:20,105][00337] Environment doom_deathmatch_bots already registered, overwriting...
[2024-12-19 09:56:20,106][00337] Environment doom_duel already registered, overwriting...
[2024-12-19 09:56:20,107][00337] Environment doom_deathmatch_full already registered, overwriting...
[2024-12-19 09:56:20,108][00337] Environment doom_benchmark already registered, overwriting...
[2024-12-19 09:56:20,110][00337] register_encoder_factory:
[2024-12-19 09:56:20,135][00337] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-19 09:56:20,141][00337] Experiment dir /content/train_dir/default_experiment already exists!
[2024-12-19 09:56:20,143][00337] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-12-19 09:56:20,145][00337] Weights and Biases integration disabled
[2024-12-19 09:56:20,149][00337] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-12-19 09:56:22,335][00337] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-12-19 09:56:22,337][00337] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-19 09:56:22,340][00337] Rollout worker 0 uses device cpu
[2024-12-19 09:56:22,341][00337] Rollout worker 1 uses device cpu
[2024-12-19 09:56:22,343][00337] Rollout worker 2 uses device cpu
[2024-12-19 09:56:22,345][00337] Rollout worker 3 uses device cpu
[2024-12-19 09:56:22,346][00337] Rollout worker 4 uses device cpu
[2024-12-19 09:56:22,347][00337] Rollout worker 5 uses device cpu
[2024-12-19 09:56:22,348][00337] Rollout worker 6 uses device cpu
[2024-12-19 09:56:22,349][00337] Rollout worker 7 uses device cpu
[2024-12-19 09:56:22,446][00337] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 09:56:22,447][00337] InferenceWorker_p0-w0: min num requests: 2
[2024-12-19 09:56:22,486][00337] Starting all processes...
[2024-12-19 09:56:22,487][00337] Starting process learner_proc0
[2024-12-19 09:56:22,492][00337] EvtLoop [Runner_EvtLoop, process=main process 337] unhandled exception in slot='_on_start' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2024-12-19 09:56:22,494][00337] Unhandled exception cannot pickle 'TLSBuffer' object in evt loop Runner_EvtLoop
[2024-12-19 09:56:22,496][00337] Uncaught exception in Runner evt loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner.py", line 770, in run
    evt_loop_status = self.event_loop.exec()
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 403, in exec
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 399, in exec
    while self._loop_iteration():
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 383, in _loop_iteration
    self._process_signal(s)
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 358, in _process_signal
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2024-12-19 09:56:22,499][00337] Runner profile tree view:
main_loop: 0.0129
[2024-12-19 09:56:22,500][00337] Collected {}, FPS: 0.0
[2024-12-19 09:56:39,683][00337] Environment doom_basic already registered, overwriting...
[2024-12-19 09:56:39,685][00337] Environment doom_two_colors_easy already registered, overwriting...
[2024-12-19 09:56:39,687][00337] Environment doom_two_colors_hard already registered, overwriting...
[2024-12-19 09:56:39,689][00337] Environment doom_dm already registered, overwriting...
[2024-12-19 09:56:39,691][00337] Environment doom_dwango5 already registered, overwriting...
[2024-12-19 09:56:39,692][00337] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-12-19 09:56:39,694][00337] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-12-19 09:56:39,695][00337] Environment doom_my_way_home already registered, overwriting...
[2024-12-19 09:56:39,698][00337] Environment doom_deadly_corridor already registered, overwriting...
[2024-12-19 09:56:39,699][00337] Environment doom_defend_the_center already registered, overwriting...
[2024-12-19 09:56:39,702][00337] Environment doom_defend_the_line already registered, overwriting...
[2024-12-19 09:56:39,703][00337] Environment doom_health_gathering already registered, overwriting...
[2024-12-19 09:56:39,704][00337] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-12-19 09:56:39,707][00337] Environment doom_battle already registered, overwriting...
[2024-12-19 09:56:39,709][00337] Environment doom_battle2 already registered, overwriting...
[2024-12-19 09:56:39,709][00337] Environment doom_duel_bots already registered, overwriting...
[2024-12-19 09:56:39,710][00337] Environment doom_deathmatch_bots already registered, overwriting...
[2024-12-19 09:56:39,713][00337] Environment doom_duel already registered, overwriting...
[2024-12-19 09:56:39,714][00337] Environment doom_deathmatch_full already registered, overwriting...
[2024-12-19 09:56:39,716][00337] Environment doom_benchmark already registered, overwriting...
[2024-12-19 09:56:39,717][00337] register_encoder_factory:
[2024-12-19 09:56:39,733][00337] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-19 09:56:39,736][00337] Overriding arg 'train_for_env_steps' with value 4000000 passed from command line
[2024-12-19 09:56:39,742][00337] Experiment dir /content/train_dir/default_experiment already exists!
[2024-12-19 09:56:39,743][00337] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-12-19 09:56:39,745][00337] Weights and Biases integration disabled
[2024-12-19 09:56:39,748][00337] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-12-19 09:56:41,915][00337] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-12-19 09:56:41,916][00337] Saving configuration to /content/train_dir/default_experiment/config.json...
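The config above now shows train_for_env_steps=4000000: with restart_behavior=resume, the saved config.json is reloaded and explicit command-line arguments override it, which is exactly what the earlier "Overriding arg 'train_for_env_steps' with value 4000000 passed from command line" line reports. A minimal sketch of that merge behavior (assumed mechanics, not Sample Factory's actual implementation):

```python
import json

def resume_config(config_path: str, cli_overrides: dict) -> dict:
    """Load a saved experiment config and apply CLI overrides on top."""
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_overrides.items():
        if key in cfg and cfg[key] != value:
            print(f"Overriding arg {key!r} with value {value} passed from command line")
        cfg[key] = value
    return cfg

cfg = resume_config("/content/train_dir/default_experiment/config.json",
                    {"train_for_env_steps": 4_000_000})
assert cfg["train_for_env_steps"] == 4_000_000
```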
[2024-12-19 09:56:41,920][00337] Rollout worker 0 uses device cpu
[2024-12-19 09:56:41,921][00337] Rollout worker 1 uses device cpu
[2024-12-19 09:56:41,922][00337] Rollout worker 2 uses device cpu
[2024-12-19 09:56:41,924][00337] Rollout worker 3 uses device cpu
[2024-12-19 09:56:41,925][00337] Rollout worker 4 uses device cpu
[2024-12-19 09:56:41,926][00337] Rollout worker 5 uses device cpu
[2024-12-19 09:56:41,928][00337] Rollout worker 6 uses device cpu
[2024-12-19 09:56:41,929][00337] Rollout worker 7 uses device cpu
[2024-12-19 09:56:42,028][00337] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 09:56:42,030][00337] InferenceWorker_p0-w0: min num requests: 2
[2024-12-19 09:56:42,064][00337] Starting all processes...
[2024-12-19 09:56:42,066][00337] Starting process learner_proc0
[2024-12-19 09:56:42,071][00337] EvtLoop [Runner_EvtLoop, process=main process 337] unhandled exception in slot='_on_start' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2024-12-19 09:56:42,072][00337] Unhandled exception cannot pickle 'TLSBuffer' object in evt loop Runner_EvtLoop
[2024-12-19 09:56:42,074][00337] Uncaught exception in Runner evt loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner.py", line 770, in run
    evt_loop_status = self.event_loop.exec()
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 403, in exec
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 399, in exec
    while self._loop_iteration():
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 383, in _loop_iteration
    self._process_signal(s)
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 358, in _process_signal
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2024-12-19 09:56:42,076][00337] Runner profile tree view:
main_loop: 0.0125
[2024-12-19 09:56:42,078][00337] Collected {}, FPS: 0.0
[2024-12-19 09:57:56,175][07135] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-19 09:57:56,182][07135] Rollout worker 0 uses device cpu
[2024-12-19 09:57:56,183][07135] Rollout worker 1 uses device cpu
[2024-12-19 09:57:56,188][07135] Rollout worker 2 uses device cpu
[2024-12-19 09:57:56,189][07135] Rollout worker 3 uses device cpu
[2024-12-19 09:57:56,191][07135] Rollout worker 4 uses device cpu
[2024-12-19 09:57:56,194][07135] Rollout worker 5 uses device cpu
[2024-12-19 09:57:56,203][07135] Rollout worker 6 uses device cpu
[2024-12-19 09:57:56,205][07135] Rollout worker 7 uses device cpu
[2024-12-19 09:57:56,382][07135] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 09:57:56,385][07135] InferenceWorker_p0-w0: min num requests: 2
[2024-12-19 09:57:56,479][07135] Starting all processes...
[2024-12-19 09:57:56,484][07135] Starting process learner_proc0
[2024-12-19 09:57:56,620][07135] Starting all processes...
[2024-12-19 09:57:56,651][07135] Starting process inference_proc0-0
[2024-12-19 09:57:56,654][07135] Starting process rollout_proc0
[2024-12-19 09:57:56,654][07135] Starting process rollout_proc1
[2024-12-19 09:57:56,654][07135] Starting process rollout_proc2
[2024-12-19 09:57:56,654][07135] Starting process rollout_proc3
[2024-12-19 09:57:56,654][07135] Starting process rollout_proc4
[2024-12-19 09:57:56,654][07135] Starting process rollout_proc5
[2024-12-19 09:57:56,654][07135] Starting process rollout_proc6
[2024-12-19 09:57:56,654][07135] Starting process rollout_proc7
[2024-12-19 09:58:12,921][07461] Worker 0 uses CPU cores [0]
[2024-12-19 09:58:13,085][07469] Worker 3 uses CPU cores [1]
[2024-12-19 09:58:13,097][07462] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 09:58:13,097][07462] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-19 09:58:13,175][07465] Worker 4 uses CPU cores [0]
[2024-12-19 09:58:13,177][07462] Num visible devices: 1
[2024-12-19 09:58:13,178][07464] Worker 2 uses CPU cores [0]
[2024-12-19 09:58:13,257][07468] Worker 6 uses CPU cores [0]
[2024-12-19 09:58:13,285][07448] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 09:58:13,285][07448] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-19 09:58:13,286][07466] Worker 7 uses CPU cores [1]
[2024-12-19 09:58:13,315][07467] Worker 5 uses CPU cores [1]
[2024-12-19 09:58:13,320][07448] Num visible devices: 1
[2024-12-19 09:58:13,341][07448] Starting seed is not provided
[2024-12-19 09:58:13,341][07448] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 09:58:13,341][07448] Initializing actor-critic model on device cuda:0
[2024-12-19 09:58:13,342][07448] RunningMeanStd input shape: (3, 72, 128)
[2024-12-19 09:58:13,343][07463] Worker 1 uses CPU cores [1]
[2024-12-19 09:58:13,345][07448] RunningMeanStd input shape: (1,)
[2024-12-19 09:58:13,360][07448] ConvEncoder: input_channels=3
[2024-12-19 09:58:13,690][07448] Conv encoder output size: 512
[2024-12-19 09:58:13,690][07448] Policy head output size: 512
[2024-12-19 09:58:13,756][07448] Created Actor Critic model with architecture:
[2024-12-19 09:58:13,756][07448] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-19 09:58:14,092][07448] Using optimizer
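The printed model is a shared-weights actor-critic: a three-layer conv encoder flattened into a 512-unit MLP, a single-layer GRU core, and linear value/action heads (5 discrete actions). A PyTorch sketch of the same topology, assuming the conv filter sizes of convnet_simple (32@8x8/4, 64@4x4/2, 128@3x3/2) and simplifying the RunningMeanStd normalizer to a /255 rescale:

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Rough equivalent of the ActorCriticSharedWeights printout above."""
    def __init__(self, num_actions: int = 5, rnn_size: int = 512):
        super().__init__()
        self.conv_head = nn.Sequential(          # ConvEncoderImpl.conv_head
            nn.Conv2d(3, 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
        )
        with torch.no_grad():                    # infer flatten size for 3x72x128 input
            n = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp = nn.Sequential(nn.Linear(n, rnn_size), nn.ELU())  # mlp_layers
        self.core = nn.GRU(rnn_size, rnn_size)   # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(rnn_size, 1)
        self.action_logits = nn.Linear(rnn_size, num_actions)  # distribution_linear

    def forward(self, obs, h):
        x = self.conv_head(obs / 255.0).flatten(1)  # stand-in for obs_normalizer
        x = self.mlp(x)
        x, h = self.core(x.unsqueeze(0), h)         # single-step sequence
        x = x.squeeze(0)
        return self.action_logits(x), self.critic_linear(x), h
```

With a 3x72x128 observation the conv stack yields a 128x3x6 feature map (2304 values), matching the logged "Conv encoder output size: 512" after the Linear layer.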
[2024-12-19 09:58:16,370][07135] Heartbeat connected on Batcher_0
[2024-12-19 09:58:16,383][07135] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-19 09:58:16,395][07135] Heartbeat connected on RolloutWorker_w0
[2024-12-19 09:58:16,404][07135] Heartbeat connected on RolloutWorker_w1
[2024-12-19 09:58:16,408][07135] Heartbeat connected on RolloutWorker_w2
[2024-12-19 09:58:16,430][07135] Heartbeat connected on RolloutWorker_w3
[2024-12-19 09:58:16,452][07135] Heartbeat connected on RolloutWorker_w4
[2024-12-19 09:58:16,457][07135] Heartbeat connected on RolloutWorker_w5
[2024-12-19 09:58:16,469][07135] Heartbeat connected on RolloutWorker_w6
[2024-12-19 09:58:16,471][07135] Heartbeat connected on RolloutWorker_w7
[2024-12-19 09:58:18,165][07448] No checkpoints found
[2024-12-19 09:58:18,165][07448] Did not load from checkpoint, starting from scratch!
[2024-12-19 09:58:18,165][07448] Initialized policy 0 weights for model version 0
[2024-12-19 09:58:18,169][07448] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 09:58:18,176][07448] LearnerWorker_p0 finished initialization!
[2024-12-19 09:58:18,177][07135] Heartbeat connected on LearnerWorker_p0
[2024-12-19 09:58:18,280][07462] RunningMeanStd input shape: (3, 72, 128)
[2024-12-19 09:58:18,282][07462] RunningMeanStd input shape: (1,)
[2024-12-19 09:58:18,294][07462] ConvEncoder: input_channels=3
[2024-12-19 09:58:18,401][07462] Conv encoder output size: 512
[2024-12-19 09:58:18,401][07462] Policy head output size: 512
[2024-12-19 09:58:18,454][07135] Inference worker 0-0 is ready!
[2024-12-19 09:58:18,456][07135] All inference workers are ready! Signal rollout workers to start!
[2024-12-19 09:58:18,683][07469] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 09:58:18,689][07466] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 09:58:18,687][07467] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 09:58:18,686][07468] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 09:58:18,686][07463] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 09:58:18,694][07464] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 09:58:18,689][07461] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 09:58:18,691][07465] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 09:58:20,070][07469] Decorrelating experience for 0 frames...
[2024-12-19 09:58:20,068][07463] Decorrelating experience for 0 frames...
[2024-12-19 09:58:20,070][07468] Decorrelating experience for 0 frames...
[2024-12-19 09:58:20,071][07467] Decorrelating experience for 0 frames...
[2024-12-19 09:58:20,075][07464] Decorrelating experience for 0 frames...
[2024-12-19 09:58:20,079][07465] Decorrelating experience for 0 frames...
[2024-12-19 09:58:20,804][07469] Decorrelating experience for 32 frames...
[2024-12-19 09:58:20,877][07466] Decorrelating experience for 0 frames...
[2024-12-19 09:58:21,210][07465] Decorrelating experience for 32 frames...
[2024-12-19 09:58:21,213][07464] Decorrelating experience for 32 frames...
[2024-12-19 09:58:21,234][07461] Decorrelating experience for 0 frames...
[2024-12-19 09:58:21,685][07135] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-19 09:58:21,814][07467] Decorrelating experience for 32 frames...
[2024-12-19 09:58:21,834][07466] Decorrelating experience for 32 frames...
[2024-12-19 09:58:23,097][07461] Decorrelating experience for 32 frames...
[2024-12-19 09:58:23,290][07463] Decorrelating experience for 32 frames...
[2024-12-19 09:58:23,499][07464] Decorrelating experience for 64 frames...
[2024-12-19 09:58:23,501][07465] Decorrelating experience for 64 frames...
[2024-12-19 09:58:23,507][07467] Decorrelating experience for 64 frames...
[2024-12-19 09:58:23,745][07468] Decorrelating experience for 32 frames...
[2024-12-19 09:58:24,572][07461] Decorrelating experience for 64 frames...
[2024-12-19 09:58:24,640][07463] Decorrelating experience for 64 frames...
[2024-12-19 09:58:24,655][07465] Decorrelating experience for 96 frames...
[2024-12-19 09:58:24,717][07467] Decorrelating experience for 96 frames...
[2024-12-19 09:58:24,991][07466] Decorrelating experience for 64 frames...
[2024-12-19 09:58:25,871][07464] Decorrelating experience for 96 frames...
[2024-12-19 09:58:25,928][07461] Decorrelating experience for 96 frames...
[2024-12-19 09:58:26,074][07463] Decorrelating experience for 96 frames...
[2024-12-19 09:58:26,379][07469] Decorrelating experience for 64 frames...
[2024-12-19 09:58:26,463][07466] Decorrelating experience for 96 frames...
[2024-12-19 09:58:26,627][07468] Decorrelating experience for 64 frames...
[2024-12-19 09:58:26,685][07135] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-19 09:58:28,821][07469] Decorrelating experience for 96 frames...
[2024-12-19 09:58:29,856][07468] Decorrelating experience for 96 frames...
[2024-12-19 09:58:31,686][07135] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 183.4. Samples: 1834. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-19 09:58:31,688][07135] Avg episode reward: [(0, '2.148')]
[2024-12-19 09:58:32,028][07448] Signal inference workers to stop experience collection...
[2024-12-19 09:58:32,049][07462] InferenceWorker_p0-w0: stopping experience collection
[2024-12-19 09:58:34,989][07448] Signal inference workers to resume experience collection...
[2024-12-19 09:58:34,990][07462] InferenceWorker_p0-w0: resuming experience collection
[2024-12-19 09:58:36,685][07135] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 253.7. Samples: 3806. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-12-19 09:58:36,691][07135] Avg episode reward: [(0, '3.045')]
[2024-12-19 09:58:41,685][07135] Fps is (10 sec: 3686.7, 60 sec: 1843.2, 300 sec: 1843.2). Total num frames: 36864. Throughput: 0: 365.8. Samples: 7316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-19 09:58:41,691][07135] Avg episode reward: [(0, '3.804')]
[2024-12-19 09:58:42,478][07462] Updated weights for policy 0, policy_version 10 (0.0021)
[2024-12-19 09:58:46,688][07135] Fps is (10 sec: 4095.0, 60 sec: 2129.7, 300 sec: 2129.7). Total num frames: 53248. Throughput: 0: 523.1. Samples: 13078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 09:58:46,692][07135] Avg episode reward: [(0, '4.269')]
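The periodic "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report frame throughput over three sliding windows; the early nan values simply mean the window does not yet contain two samples. Note that with summaries_use_frameskip=True and env_frameskip=4, "Total num frames" counts game frames, i.e. four per agent transition. A sketch of the windowed bookkeeping (assumed mechanics, not Sample Factory's code):

```python
import time
from collections import deque

class FpsTracker:
    """Windowed FPS in the style of the 'Fps is (10 sec / 60 sec / 300 sec)' lines."""

    def __init__(self, max_window: float = 300.0):
        self.max_window = max_window
        self.history = deque()  # (timestamp, total_env_frames)

    def record(self, total_frames: int, now: float | None = None) -> None:
        now = time.monotonic() if now is None else now
        self.history.append((now, total_frames))
        while now - self.history[0][0] > self.max_window:
            self.history.popleft()

    def fps(self, window: float, now: float | None = None) -> float:
        now = time.monotonic() if now is None else now
        samples = [(t, f) for t, f in self.history if now - t <= window]
        if len(samples) < 2:
            return float("nan")  # mirrors the initial 'Fps is (10 sec: nan ...)'
        (t0, f0), (t1, f1) = samples[0], samples[-1]
        return (f1 - f0) / (t1 - t0)
```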
[2024-12-19 09:58:51,685][07135] Fps is (10 sec: 3276.8, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 602.2. Samples: 18066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 09:58:51,691][07135] Avg episode reward: [(0, '4.397')]
[2024-12-19 09:58:54,303][07462] Updated weights for policy 0, policy_version 20 (0.0013)
[2024-12-19 09:58:56,685][07135] Fps is (10 sec: 3687.3, 60 sec: 2574.6, 300 sec: 2574.6). Total num frames: 90112. Throughput: 0: 603.5. Samples: 21122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 09:58:56,688][07135] Avg episode reward: [(0, '4.604')]
[2024-12-19 09:59:01,688][07135] Fps is (10 sec: 4094.6, 60 sec: 2764.6, 300 sec: 2764.6). Total num frames: 110592. Throughput: 0: 698.3. Samples: 27934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 09:59:01,692][07135] Avg episode reward: [(0, '4.441')]
[2024-12-19 09:59:01,698][07448] Saving new best policy, reward=4.441!
[2024-12-19 09:59:05,315][07462] Updated weights for policy 0, policy_version 30 (0.0025)
[2024-12-19 09:59:06,685][07135] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 716.4. Samples: 32240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 09:59:06,692][07135] Avg episode reward: [(0, '4.578')]
[2024-12-19 09:59:06,721][07448] Saving new best policy, reward=4.578!
[2024-12-19 09:59:11,685][07135] Fps is (10 sec: 3277.8, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 143360. Throughput: 0: 788.5. Samples: 35482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 09:59:11,692][07135] Avg episode reward: [(0, '4.483')]
[2024-12-19 09:59:15,242][07462] Updated weights for policy 0, policy_version 40 (0.0024)
[2024-12-19 09:59:16,685][07135] Fps is (10 sec: 4505.6, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 888.5. Samples: 41814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 09:59:16,687][07135] Avg episode reward: [(0, '4.546')]
[2024-12-19 09:59:21,691][07135] Fps is (10 sec: 4093.6, 60 sec: 3071.7, 300 sec: 3071.7). Total num frames: 184320. Throughput: 0: 951.3. Samples: 46622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 09:59:21,694][07135] Avg episode reward: [(0, '4.424')]
[2024-12-19 09:59:26,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3087.7). Total num frames: 200704. Throughput: 0: 921.9. Samples: 48802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 09:59:26,687][07135] Avg episode reward: [(0, '4.420')]
[2024-12-19 09:59:27,183][07462] Updated weights for policy 0, policy_version 50 (0.0018)
[2024-12-19 09:59:31,685][07135] Fps is (10 sec: 4098.5, 60 sec: 3754.7, 300 sec: 3218.3). Total num frames: 225280. Throughput: 0: 948.6. Samples: 55762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 09:59:31,687][07135] Avg episode reward: [(0, '4.345')]
[2024-12-19 09:59:36,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 970.5. Samples: 61738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 09:59:36,693][07135] Avg episode reward: [(0, '4.310')]
[2024-12-19 09:59:36,950][07462] Updated weights for policy 0, policy_version 60 (0.0028)
[2024-12-19 09:59:41,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3225.6). Total num frames: 258048. Throughput: 0: 950.1. Samples: 63876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 09:59:41,687][07135] Avg episode reward: [(0, '4.286')]
[2024-12-19 09:59:46,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3325.0). Total num frames: 282624. Throughput: 0: 939.5. Samples: 70210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 09:59:46,687][07135] Avg episode reward: [(0, '4.347')]
[2024-12-19 09:59:46,696][07448] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth...
[2024-12-19 09:59:47,264][07462] Updated weights for policy 0, policy_version 70 (0.0026)
[2024-12-19 09:59:51,685][07135] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 994.9. Samples: 77010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 09:59:51,690][07135] Avg episode reward: [(0, '4.303')]
[2024-12-19 09:59:56,687][07135] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 971.1. Samples: 79184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 09:59:56,689][07135] Avg episode reward: [(0, '4.353')]
[2024-12-19 09:59:58,693][07462] Updated weights for policy 0, policy_version 80 (0.0033)
[2024-12-19 10:00:01,687][07135] Fps is (10 sec: 3685.5, 60 sec: 3823.0, 300 sec: 3399.6). Total num frames: 339968. Throughput: 0: 947.8. Samples: 84466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:00:01,694][07135] Avg episode reward: [(0, '4.324')]
[2024-12-19 10:00:06,685][07135] Fps is (10 sec: 4506.4, 60 sec: 4027.7, 300 sec: 3471.8). Total num frames: 364544. Throughput: 0: 999.9. Samples: 91612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:00:06,687][07135] Avg episode reward: [(0, '4.350')]
[2024-12-19 10:00:07,362][07462] Updated weights for policy 0, policy_version 90 (0.0016)
[2024-12-19 10:00:11,685][07135] Fps is (10 sec: 4096.9, 60 sec: 3959.5, 300 sec: 3463.0). Total num frames: 380928. Throughput: 0: 1019.3. Samples: 94670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:00:11,690][07135] Avg episode reward: [(0, '4.351')]
[2024-12-19 10:00:16,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3454.9). Total num frames: 397312. Throughput: 0: 960.7. Samples: 98992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:00:16,687][07135] Avg episode reward: [(0, '4.362')]
[2024-12-19 10:00:18,904][07462] Updated weights for policy 0, policy_version 100 (0.0017)
[2024-12-19 10:00:21,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 3515.7). Total num frames: 421888. Throughput: 0: 987.0. Samples: 106154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-19 10:00:21,687][07135] Avg episode reward: [(0, '4.453')]
[2024-12-19 10:00:26,685][07135] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3538.9). Total num frames: 442368. Throughput: 0: 1016.4. Samples: 109616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:00:26,691][07135] Avg episode reward: [(0, '4.478')]
[2024-12-19 10:00:29,128][07462] Updated weights for policy 0, policy_version 110 (0.0021)
[2024-12-19 10:00:31,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3497.4). Total num frames: 454656. Throughput: 0: 978.0. Samples: 114218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:00:31,687][07135] Avg episode reward: [(0, '4.319')]
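The checkpoint filenames encode the policy version and the env frame count: checkpoint_000000069_282624.pth is version 69, and 69 x batch_size(1024) x env_frameskip(4) = 282624, a relation that also holds for the later checkpoints in this log. A sketch of inspecting one (the key layout inside the file is an assumption; inspect before relying on it):

```python
import torch

path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth"

# filename: 69 * 1024 (batch_size) * 4 (env_frameskip) = 282624 env frames
# weights_only=False is needed on newer PyTorch, where the default flipped to True
ckpt = torch.load(path, map_location="cpu", weights_only=False)
if isinstance(ckpt, dict):
    print(sorted(ckpt.keys()))  # typically model weights, optimizer state, env_steps, ...
```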
[2024-12-19 10:00:36,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 479232. Throughput: 0: 964.9. Samples: 120430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:00:36,691][07135] Avg episode reward: [(0, '4.476')]
[2024-12-19 10:00:39,001][07462] Updated weights for policy 0, policy_version 120 (0.0019)
[2024-12-19 10:00:41,685][07135] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3598.6). Total num frames: 503808. Throughput: 0: 995.6. Samples: 123984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:00:41,688][07135] Avg episode reward: [(0, '4.547')]
[2024-12-19 10:00:46,686][07135] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3559.3). Total num frames: 516096. Throughput: 0: 1002.2. Samples: 129562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:00:46,691][07135] Avg episode reward: [(0, '4.508')]
[2024-12-19 10:00:50,351][07462] Updated weights for policy 0, policy_version 130 (0.0024)
[2024-12-19 10:00:51,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3577.2). Total num frames: 536576. Throughput: 0: 962.5. Samples: 134926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:00:51,688][07135] Avg episode reward: [(0, '4.695')]
[2024-12-19 10:00:51,691][07448] Saving new best policy, reward=4.695!
[2024-12-19 10:00:56,685][07135] Fps is (10 sec: 4506.0, 60 sec: 4027.9, 300 sec: 3620.3). Total num frames: 561152. Throughput: 0: 971.6. Samples: 138390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:00:56,688][07135] Avg episode reward: [(0, '4.655')]
[2024-12-19 10:00:59,347][07462] Updated weights for policy 0, policy_version 140 (0.0025)
[2024-12-19 10:01:01,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3609.6). Total num frames: 577536. Throughput: 0: 1016.5. Samples: 144734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:01:01,692][07135] Avg episode reward: [(0, '4.454')]
[2024-12-19 10:01:06,687][07135] Fps is (10 sec: 3276.0, 60 sec: 3822.8, 300 sec: 3599.5). Total num frames: 593920. Throughput: 0: 954.1. Samples: 149090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:01:06,692][07135] Avg episode reward: [(0, '4.366')]
[2024-12-19 10:01:10,917][07462] Updated weights for policy 0, policy_version 150 (0.0024)
[2024-12-19 10:01:11,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3614.1). Total num frames: 614400. Throughput: 0: 954.3. Samples: 152560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:01:11,690][07135] Avg episode reward: [(0, '4.234')]
[2024-12-19 10:01:16,685][07135] Fps is (10 sec: 3687.2, 60 sec: 3891.2, 300 sec: 3604.5). Total num frames: 630784. Throughput: 0: 970.4. Samples: 157888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:01:16,691][07135] Avg episode reward: [(0, '4.409')]
[2024-12-19 10:01:21,685][07135] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3572.6). Total num frames: 643072. Throughput: 0: 914.6. Samples: 161588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:01:21,692][07135] Avg episode reward: [(0, '4.530')]
[2024-12-19 10:01:25,426][07462] Updated weights for policy 0, policy_version 160 (0.0034)
[2024-12-19 10:01:26,685][07135] Fps is (10 sec: 2867.3, 60 sec: 3618.1, 300 sec: 3564.6). Total num frames: 659456. Throughput: 0: 881.0. Samples: 163628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:01:26,690][07135] Avg episode reward: [(0, '4.518')]
[2024-12-19 10:01:31,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3600.2). Total num frames: 684032. Throughput: 0: 904.0. Samples: 170242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:01:31,690][07135] Avg episode reward: [(0, '4.486')]
[2024-12-19 10:01:34,382][07462] Updated weights for policy 0, policy_version 170 (0.0021)
[2024-12-19 10:01:36,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3591.9). Total num frames: 700416. Throughput: 0: 919.5. Samples: 176302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:01:36,694][07135] Avg episode reward: [(0, '4.417')]
[2024-12-19 10:01:41,688][07135] Fps is (10 sec: 3275.7, 60 sec: 3549.7, 300 sec: 3583.9). Total num frames: 716800. Throughput: 0: 888.5. Samples: 178374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:01:41,691][07135] Avg episode reward: [(0, '4.388')]
[2024-12-19 10:01:46,142][07462] Updated weights for policy 0, policy_version 180 (0.0022)
[2024-12-19 10:01:46,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3596.5). Total num frames: 737280. Throughput: 0: 874.9. Samples: 184106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:01:46,688][07135] Avg episode reward: [(0, '4.567')]
[2024-12-19 10:01:46,695][07448] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth...
[2024-12-19 10:01:51,685][07135] Fps is (10 sec: 4507.1, 60 sec: 3754.7, 300 sec: 3627.9). Total num frames: 761856. Throughput: 0: 928.0. Samples: 190846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:01:51,688][07135] Avg episode reward: [(0, '4.752')]
[2024-12-19 10:01:51,689][07448] Saving new best policy, reward=4.752!
[2024-12-19 10:01:56,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3600.7). Total num frames: 774144. Throughput: 0: 901.1. Samples: 193108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:01:56,689][07135] Avg episode reward: [(0, '4.805')]
[2024-12-19 10:01:56,700][07448] Saving new best policy, reward=4.805!
[2024-12-19 10:01:57,674][07462] Updated weights for policy 0, policy_version 190 (0.0041)
[2024-12-19 10:02:01,685][07135] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3593.3). Total num frames: 790528. Throughput: 0: 883.6. Samples: 197650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:02:01,692][07135] Avg episode reward: [(0, '4.688')]
[2024-12-19 10:02:06,685][07135] Fps is (10 sec: 4095.8, 60 sec: 3686.5, 300 sec: 3622.7). Total num frames: 815104. Throughput: 0: 954.7. Samples: 204552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:02:06,692][07135] Avg episode reward: [(0, '4.440')]
[2024-12-19 10:02:07,274][07462] Updated weights for policy 0, policy_version 200 (0.0022)
[2024-12-19 10:02:11,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3615.2). Total num frames: 831488. Throughput: 0: 984.3. Samples: 207920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:02:11,690][07135] Avg episode reward: [(0, '4.676')]
[2024-12-19 10:02:16,685][07135] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3608.0). Total num frames: 847872. Throughput: 0: 932.5. Samples: 212204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:02:16,695][07135] Avg episode reward: [(0, '4.738')]
[2024-12-19 10:02:18,767][07462] Updated weights for policy 0, policy_version 210 (0.0031)
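The "Saving new best policy, reward=...!" lines follow the config's save_best_metric=reward and save_best_after=100000: the first best-policy save fires only once total frames pass ~110k, and afterwards only when the rolling average reward improves. A sketch of that bookkeeping (assumed, not Sample Factory's actual code):

```python
class BestPolicyTracker:
    """Mimics the 'Saving new best policy, reward=...!' behavior seen above."""

    def __init__(self, save_best_after: int = 100_000):
        self.save_best_after = save_best_after
        self.best_reward = float("-inf")

    def maybe_save(self, env_frames: int, avg_reward: float, save_fn) -> bool:
        if env_frames < self.save_best_after:
            return False  # too early for 'best' to be meaningful
        if avg_reward <= self.best_reward:
            return False
        self.best_reward = avg_reward
        print(f"Saving new best policy, reward={avg_reward:.3f}!")
        save_fn()
        return True
```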
[2024-12-19 10:02:21,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3635.2). Total num frames: 872448. Throughput: 0: 945.6. Samples: 218856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:02:21,692][07135] Avg episode reward: [(0, '4.478')]
[2024-12-19 10:02:26,689][07135] Fps is (10 sec: 4503.7, 60 sec: 3890.9, 300 sec: 3644.5). Total num frames: 892928. Throughput: 0: 978.5. Samples: 222408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:02:26,692][07135] Avg episode reward: [(0, '4.433')]
[2024-12-19 10:02:28,171][07462] Updated weights for policy 0, policy_version 220 (0.0023)
[2024-12-19 10:02:31,685][07135] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3637.2). Total num frames: 909312. Throughput: 0: 959.5. Samples: 227284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:02:31,689][07135] Avg episode reward: [(0, '4.594')]
[2024-12-19 10:02:36,685][07135] Fps is (10 sec: 3687.9, 60 sec: 3822.9, 300 sec: 3646.2). Total num frames: 929792. Throughput: 0: 934.7. Samples: 232908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:02:36,693][07135] Avg episode reward: [(0, '4.614')]
[2024-12-19 10:02:39,346][07462] Updated weights for policy 0, policy_version 230 (0.0022)
[2024-12-19 10:02:41,685][07135] Fps is (10 sec: 4096.1, 60 sec: 3891.4, 300 sec: 3654.9). Total num frames: 950272. Throughput: 0: 958.8. Samples: 236252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:02:41,693][07135] Avg episode reward: [(0, '4.629')]
[2024-12-19 10:02:46,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3647.8). Total num frames: 966656. Throughput: 0: 983.2. Samples: 241896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:02:46,689][07135] Avg episode reward: [(0, '4.731')]
[2024-12-19 10:02:51,284][07462] Updated weights for policy 0, policy_version 240 (0.0022)
[2024-12-19 10:02:51,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3640.9). Total num frames: 983040. Throughput: 0: 934.3. Samples: 246594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:02:51,688][07135] Avg episode reward: [(0, '4.717')]
[2024-12-19 10:02:56,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3649.2). Total num frames: 1003520. Throughput: 0: 936.3. Samples: 250052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:02:56,688][07135] Avg episode reward: [(0, '4.717')]
[2024-12-19 10:03:00,483][07462] Updated weights for policy 0, policy_version 250 (0.0023)
[2024-12-19 10:03:01,689][07135] Fps is (10 sec: 4094.5, 60 sec: 3891.0, 300 sec: 3657.1). Total num frames: 1024000. Throughput: 0: 983.7. Samples: 256474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:03:01,693][07135] Avg episode reward: [(0, '4.683')]
[2024-12-19 10:03:06,686][07135] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3650.5). Total num frames: 1040384. Throughput: 0: 928.3. Samples: 260630. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:03:06,691][07135] Avg episode reward: [(0, '4.667')]
[2024-12-19 10:03:11,685][07135] Fps is (10 sec: 3687.8, 60 sec: 3822.9, 300 sec: 3658.2). Total num frames: 1060864. Throughput: 0: 917.4. Samples: 263686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:03:11,687][07135] Avg episode reward: [(0, '4.648')]
[2024-12-19 10:03:12,154][07462] Updated weights for policy 0, policy_version 260 (0.0018)
[2024-12-19 10:03:16,685][07135] Fps is (10 sec: 4506.0, 60 sec: 3959.5, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 964.3. Samples: 270678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:03:16,689][07135] Avg episode reward: [(0, '4.651')]
[2024-12-19 10:03:21,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 952.1. Samples: 275754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:03:21,690][07135] Avg episode reward: [(0, '4.739')]
[2024-12-19 10:03:23,385][07462] Updated weights for policy 0, policy_version 270 (0.0018)
[2024-12-19 10:03:26,686][07135] Fps is (10 sec: 3276.6, 60 sec: 3754.9, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 925.9. Samples: 277920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:03:26,689][07135] Avg episode reward: [(0, '4.872')]
[2024-12-19 10:03:26,697][07448] Saving new best policy, reward=4.872!
[2024-12-19 10:03:31,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 1138688. Throughput: 0: 951.5. Samples: 284712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:03:31,687][07135] Avg episode reward: [(0, '4.871')]
[2024-12-19 10:03:32,728][07462] Updated weights for policy 0, policy_version 280 (0.0018)
[2024-12-19 10:03:36,687][07135] Fps is (10 sec: 4095.7, 60 sec: 3822.8, 300 sec: 3804.4). Total num frames: 1159168. Throughput: 0: 981.9. Samples: 290780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:03:36,690][07135] Avg episode reward: [(0, '4.717')]
[2024-12-19 10:03:41,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 1171456. Throughput: 0: 950.8. Samples: 292840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:03:41,688][07135] Avg episode reward: [(0, '4.664')]
[2024-12-19 10:03:44,265][07462] Updated weights for policy 0, policy_version 290 (0.0025)
[2024-12-19 10:03:46,685][07135] Fps is (10 sec: 3686.9, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1196032. Throughput: 0: 941.8. Samples: 298852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:03:46,689][07135] Avg episode reward: [(0, '4.768')]
[2024-12-19 10:03:46,700][07448] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth...
[2024-12-19 10:03:46,815][07448] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth
[2024-12-19 10:03:51,689][07135] Fps is (10 sec: 4913.1, 60 sec: 3959.2, 300 sec: 3832.1). Total num frames: 1220608. Throughput: 0: 1001.1. Samples: 305682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:03:51,695][07135] Avg episode reward: [(0, '5.038')]
[2024-12-19 10:03:51,701][07448] Saving new best policy, reward=5.038!
[2024-12-19 10:03:54,375][07462] Updated weights for policy 0, policy_version 300 (0.0024)
[2024-12-19 10:03:56,688][07135] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3804.4). Total num frames: 1232896. Throughput: 0: 981.8. Samples: 307870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:03:56,693][07135] Avg episode reward: [(0, '5.141')]
[2024-12-19 10:03:56,705][07448] Saving new best policy, reward=5.141!
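The "Saving .../checkpoint_000000292_1196032.pth" followed immediately by "Removing .../checkpoint_000000069_282624.pth" reflects keep_checkpoints=2: after each periodic save, only the two newest checkpoints survive. A sketch of that retention logic (Sample Factory's own implementation may differ in details):

```python
from pathlib import Path

def prune_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
    """Keep only the newest `keep` checkpoints; zero-padded names sort correctly."""
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
    for old in ckpts[:-keep]:
        print(f"Removing {old}")
        old.unlink()

prune_checkpoints("/content/train_dir/default_experiment/checkpoint_p0")
```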
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:04:01,688][07135] Avg episode reward: [(0, '5.120')] [2024-12-19 10:04:04,924][07462] Updated weights for policy 0, policy_version 310 (0.0022) [2024-12-19 10:04:06,685][07135] Fps is (10 sec: 4097.3, 60 sec: 3891.3, 300 sec: 3832.2). Total num frames: 1273856. Throughput: 0: 978.3. Samples: 319776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:04:06,687][07135] Avg episode reward: [(0, '4.820')] [2024-12-19 10:04:11,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1294336. Throughput: 0: 1002.1. Samples: 323014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:04:11,690][07135] Avg episode reward: [(0, '4.666')] [2024-12-19 10:04:16,578][07462] Updated weights for policy 0, policy_version 320 (0.0023) [2024-12-19 10:04:16,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.4). Total num frames: 1310720. Throughput: 0: 945.2. Samples: 327248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:04:16,688][07135] Avg episode reward: [(0, '4.734')] [2024-12-19 10:04:21,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1331200. Throughput: 0: 959.9. Samples: 333976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:04:21,687][07135] Avg episode reward: [(0, '4.821')] [2024-12-19 10:04:25,371][07462] Updated weights for policy 0, policy_version 330 (0.0018) [2024-12-19 10:04:26,685][07135] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1355776. Throughput: 0: 993.4. Samples: 337544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:04:26,687][07135] Avg episode reward: [(0, '4.637')] [2024-12-19 10:04:31,687][07135] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3818.3). Total num frames: 1368064. Throughput: 0: 969.2. Samples: 342466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:04:31,692][07135] Avg episode reward: [(0, '4.674')] [2024-12-19 10:04:36,685][07135] Fps is (10 sec: 3276.9, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 1388544. Throughput: 0: 945.3. Samples: 348218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:04:36,688][07135] Avg episode reward: [(0, '4.874')] [2024-12-19 10:04:36,958][07462] Updated weights for policy 0, policy_version 340 (0.0026) [2024-12-19 10:04:41,685][07135] Fps is (10 sec: 4506.5, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 1413120. Throughput: 0: 975.0. Samples: 351744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:04:41,690][07135] Avg episode reward: [(0, '4.612')] [2024-12-19 10:04:46,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1429504. Throughput: 0: 999.0. Samples: 357762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:04:46,692][07135] Avg episode reward: [(0, '4.457')] [2024-12-19 10:04:47,138][07462] Updated weights for policy 0, policy_version 350 (0.0027) [2024-12-19 10:04:51,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3818.3). Total num frames: 1445888. Throughput: 0: 948.7. Samples: 362466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:04:51,691][07135] Avg episode reward: [(0, '4.555')] [2024-12-19 10:04:56,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3832.2). Total num frames: 1470464. Throughput: 0: 955.2. Samples: 365998. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:04:56,687][07135] Avg episode reward: [(0, '4.654')] [2024-12-19 10:04:57,164][07462] Updated weights for policy 0, policy_version 360 (0.0023) [2024-12-19 10:05:01,685][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1490944. Throughput: 0: 1011.9. Samples: 372784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:05:01,688][07135] Avg episode reward: [(0, '4.575')] [2024-12-19 10:05:06,689][07135] Fps is (10 sec: 3275.5, 60 sec: 3822.7, 300 sec: 3804.4). Total num frames: 1503232. Throughput: 0: 958.8. Samples: 377128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:05:06,692][07135] Avg episode reward: [(0, '4.516')] [2024-12-19 10:05:08,906][07462] Updated weights for policy 0, policy_version 370 (0.0019) [2024-12-19 10:05:11,685][07135] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1527808. Throughput: 0: 947.4. Samples: 380178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:05:11,687][07135] Avg episode reward: [(0, '4.637')] [2024-12-19 10:05:16,685][07135] Fps is (10 sec: 4507.4, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1548288. Throughput: 0: 994.3. Samples: 387208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:05:16,692][07135] Avg episode reward: [(0, '4.590')] [2024-12-19 10:05:17,821][07462] Updated weights for policy 0, policy_version 380 (0.0013) [2024-12-19 10:05:21,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1564672. Throughput: 0: 981.3. Samples: 392378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:05:21,687][07135] Avg episode reward: [(0, '4.697')] [2024-12-19 10:05:26,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1585152. Throughput: 0: 950.3. Samples: 394506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:05:26,687][07135] Avg episode reward: [(0, '4.477')] [2024-12-19 10:05:29,357][07462] Updated weights for policy 0, policy_version 390 (0.0019) [2024-12-19 10:05:31,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3818.3). Total num frames: 1605632. Throughput: 0: 966.0. Samples: 401234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:05:31,688][07135] Avg episode reward: [(0, '4.334')] [2024-12-19 10:05:36,686][07135] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 3804.4). Total num frames: 1626112. Throughput: 0: 996.0. Samples: 407288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:05:36,690][07135] Avg episode reward: [(0, '4.519')] [2024-12-19 10:05:40,565][07462] Updated weights for policy 0, policy_version 400 (0.0029) [2024-12-19 10:05:41,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1638400. Throughput: 0: 964.5. Samples: 409400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:05:41,690][07135] Avg episode reward: [(0, '4.594')] [2024-12-19 10:05:46,685][07135] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1662976. Throughput: 0: 946.1. Samples: 415360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:05:46,687][07135] Avg episode reward: [(0, '4.574')] [2024-12-19 10:05:46,698][07448] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth... 
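The Saving entry above and the Removing entry that follows are the learner's checkpoint rotation: with keep_checkpoints=2 in this run's configuration, each new checkpoint_<train_step>_<env_steps>.pth evicts the oldest one. A minimal sketch of that keep-latest-N rotation; the function name and the plain-bytes write (standing in for torch.save) are illustrative, not Sample Factory's actual implementation:

    # Sketch of keep-latest-N checkpoint rotation as seen in the Saving/Removing
    # log pairs. Illustrative only; names are hypothetical.
    import glob
    import os

    def save_and_rotate(checkpoint_dir: str, train_step: int, env_steps: int,
                        state: bytes, keep_checkpoints: int = 2) -> None:
        # Naming matches this log, e.g. checkpoint_000000292_1196032.pth
        path = os.path.join(checkpoint_dir, f"checkpoint_{train_step:09d}_{env_steps}.pth")
        with open(path, "wb") as f:  # stand-in for torch.save(state, path)
            f.write(state)
        # Zero-padded train_step makes lexicographic order chronological,
        # so everything before the newest `keep_checkpoints` files can go.
        checkpoints = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
        for old in checkpoints[:-keep_checkpoints]:
            os.remove(old)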
[2024-12-19 10:05:46,808][07448] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth
[2024-12-19 10:05:49,882][07462] Updated weights for policy 0, policy_version 410 (0.0020)
[2024-12-19 10:05:51,685][07135] Fps is (10 sec: 4915.2, 60 sec: 4027.8, 300 sec: 3818.3). Total num frames: 1687552. Throughput: 0: 1003.0. Samples: 422258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:05:51,687][07135] Avg episode reward: [(0, '4.596')]
[2024-12-19 10:05:56,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1699840. Throughput: 0: 984.2. Samples: 424466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:05:56,689][07135] Avg episode reward: [(0, '4.694')]
[2024-12-19 10:06:01,685][07135] Fps is (10 sec: 2457.6, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 1712128. Throughput: 0: 908.6. Samples: 428094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:06:01,687][07135] Avg episode reward: [(0, '4.813')]
[2024-12-19 10:06:03,991][07462] Updated weights for policy 0, policy_version 420 (0.0017)
[2024-12-19 10:06:06,685][07135] Fps is (10 sec: 2867.2, 60 sec: 3754.9, 300 sec: 3776.6). Total num frames: 1728512. Throughput: 0: 901.2. Samples: 432930. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:06:06,688][07135] Avg episode reward: [(0, '4.799')]
[2024-12-19 10:06:11,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1748992. Throughput: 0: 930.7. Samples: 436388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:06:11,691][07135] Avg episode reward: [(0, '4.900')]
[2024-12-19 10:06:14,969][07462] Updated weights for policy 0, policy_version 430 (0.0015)
[2024-12-19 10:06:16,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1765376. Throughput: 0: 887.8. Samples: 441184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:06:16,689][07135] Avg episode reward: [(0, '4.966')]
[2024-12-19 10:06:21,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1785856. Throughput: 0: 883.5. Samples: 447046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:06:21,688][07135] Avg episode reward: [(0, '5.140')]
[2024-12-19 10:06:25,063][07462] Updated weights for policy 0, policy_version 440 (0.0034)
[2024-12-19 10:06:26,685][07135] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1806336. Throughput: 0: 914.4. Samples: 450546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:06:26,687][07135] Avg episode reward: [(0, '4.744')]
[2024-12-19 10:06:31,688][07135] Fps is (10 sec: 3685.1, 60 sec: 3617.9, 300 sec: 3804.4). Total num frames: 1822720. Throughput: 0: 908.3. Samples: 456238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:06:31,691][07135] Avg episode reward: [(0, '4.639')]
[2024-12-19 10:06:36,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3804.5). Total num frames: 1839104. Throughput: 0: 862.4. Samples: 461064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:06:36,688][07135] Avg episode reward: [(0, '4.731')]
[2024-12-19 10:06:36,768][07462] Updated weights for policy 0, policy_version 450 (0.0029)
[2024-12-19 10:06:41,685][07135] Fps is (10 sec: 4097.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1863680. Throughput: 0: 890.3. Samples: 464528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:06:41,692][07135] Avg episode reward: [(0, '4.632')]
[2024-12-19 10:06:45,848][07462] Updated weights for policy 0, policy_version 460 (0.0020)
[2024-12-19 10:06:46,687][07135] Fps is (10 sec: 4504.8, 60 sec: 3686.3, 300 sec: 3804.4). Total num frames: 1884160. Throughput: 0: 963.1. Samples: 471434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:06:46,689][07135] Avg episode reward: [(0, '4.725')]
[2024-12-19 10:06:51,688][07135] Fps is (10 sec: 3275.7, 60 sec: 3481.4, 300 sec: 3804.4). Total num frames: 1896448. Throughput: 0: 950.2. Samples: 475692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:06:51,693][07135] Avg episode reward: [(0, '4.730')]
[2024-12-19 10:06:56,685][07135] Fps is (10 sec: 3686.9, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1921024. Throughput: 0: 943.0. Samples: 478822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:06:56,692][07135] Avg episode reward: [(0, '4.693')]
[2024-12-19 10:06:57,054][07462] Updated weights for policy 0, policy_version 470 (0.0022)
[2024-12-19 10:07:01,685][07135] Fps is (10 sec: 4916.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1945600. Throughput: 0: 988.5. Samples: 485666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:07:01,688][07135] Avg episode reward: [(0, '4.697')]
[2024-12-19 10:07:06,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1957888. Throughput: 0: 970.9. Samples: 490736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:07:06,690][07135] Avg episode reward: [(0, '4.895')]
[2024-12-19 10:07:08,388][07462] Updated weights for policy 0, policy_version 480 (0.0015)
[2024-12-19 10:07:11,685][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1978368. Throughput: 0: 943.2. Samples: 492992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:07:11,687][07135] Avg episode reward: [(0, '4.864')]
[2024-12-19 10:07:16,685][07135] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2002944. Throughput: 0: 971.4. Samples: 499948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:07:16,690][07135] Avg episode reward: [(0, '4.573')]
[2024-12-19 10:07:17,635][07462] Updated weights for policy 0, policy_version 490 (0.0028)
[2024-12-19 10:07:21,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.4). Total num frames: 2019328. Throughput: 0: 999.1. Samples: 506022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:07:21,690][07135] Avg episode reward: [(0, '4.602')]
[2024-12-19 10:07:26,685][07135] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2035712. Throughput: 0: 970.1. Samples: 508182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:07:26,688][07135] Avg episode reward: [(0, '4.677')]
[2024-12-19 10:07:29,078][07462] Updated weights for policy 0, policy_version 500 (0.0026)
[2024-12-19 10:07:31,685][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 2056192. Throughput: 0: 947.7. Samples: 514080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:07:31,692][07135] Avg episode reward: [(0, '4.721')]
[2024-12-19 10:07:36,686][07135] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 2080768. Throughput: 0: 1008.5. Samples: 521074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:07:36,688][07135] Avg episode reward: [(0, '4.645')]
[2024-12-19 10:07:38,814][07462] Updated weights for policy 0, policy_version 510 (0.0017)
[2024-12-19 10:07:41,689][07135] Fps is (10 sec: 3684.8, 60 sec: 3822.7, 300 sec: 3818.3). Total num frames: 2093056. Throughput: 0: 984.8. Samples: 523144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:07:41,694][07135] Avg episode reward: [(0, '4.702')]
[2024-12-19 10:07:46,685][07135] Fps is (10 sec: 3277.0, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 2113536. Throughput: 0: 949.7. Samples: 528402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:07:46,687][07135] Avg episode reward: [(0, '4.758')]
[2024-12-19 10:07:46,696][07448] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000516_2113536.pth...
[2024-12-19 10:07:46,824][07448] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth
[2024-12-19 10:07:49,629][07462] Updated weights for policy 0, policy_version 520 (0.0020)
[2024-12-19 10:07:51,685][07135] Fps is (10 sec: 4507.3, 60 sec: 4027.9, 300 sec: 3846.1). Total num frames: 2138112. Throughput: 0: 990.6. Samples: 535314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:07:51,687][07135] Avg episode reward: [(0, '4.709')]
[2024-12-19 10:07:56,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2154496. Throughput: 0: 1004.2. Samples: 538182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-12-19 10:07:56,690][07135] Avg episode reward: [(0, '4.614')]
[2024-12-19 10:08:01,343][07462] Updated weights for policy 0, policy_version 530 (0.0024)
[2024-12-19 10:08:01,685][07135] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2170880. Throughput: 0: 943.9. Samples: 542422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:08:01,687][07135] Avg episode reward: [(0, '4.540')]
[2024-12-19 10:08:06,685][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2195456. Throughput: 0: 964.3. Samples: 549416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:08:06,688][07135] Avg episode reward: [(0, '4.489')]
[2024-12-19 10:08:10,132][07462] Updated weights for policy 0, policy_version 540 (0.0033)
[2024-12-19 10:08:11,687][07135] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3832.2). Total num frames: 2215936. Throughput: 0: 994.5. Samples: 552934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:08:11,693][07135] Avg episode reward: [(0, '4.697')]
[2024-12-19 10:08:16,686][07135] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 2228224. Throughput: 0: 968.2. Samples: 557648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:08:16,688][07135] Avg episode reward: [(0, '4.709')]
[2024-12-19 10:08:21,442][07462] Updated weights for policy 0, policy_version 550 (0.0021)
[2024-12-19 10:08:21,686][07135] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2252800. Throughput: 0: 947.6. Samples: 563718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:08:21,688][07135] Avg episode reward: [(0, '4.596')]
[2024-12-19 10:08:26,685][07135] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2273280. Throughput: 0: 978.5. Samples: 567174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:08:26,691][07135] Avg episode reward: [(0, '4.458')]
[2024-12-19 10:08:31,685][07135] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2289664. Throughput: 0: 983.6. Samples: 572664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:08:31,687][07135] Avg episode reward: [(0, '4.481')]
[2024-12-19 10:08:33,636][07462] Updated weights for policy 0, policy_version 560 (0.0033)
[2024-12-19 10:08:34,132][07135] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 7135], exiting...
[2024-12-19 10:08:34,142][07448] Stopping Batcher_0...
[2024-12-19 10:08:34,144][07448] Loop batcher_evt_loop terminating...
[2024-12-19 10:08:34,144][07448] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000560_2293760.pth...
[2024-12-19 10:08:34,141][07135] Runner profile tree view: main_loop: 637.6622
[2024-12-19 10:08:34,153][07135] Collected {0: 2293760}, FPS: 3597.1
[2024-12-19 10:08:34,352][07462] Weights refcount: 2 0
[2024-12-19 10:08:34,358][07462] Stopping InferenceWorker_p0-w0...
[2024-12-19 10:08:34,368][07462] Loop inference_proc0-0_evt_loop terminating...
[2024-12-19 10:08:34,510][07463] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(0, 0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received.
ViZDoom instance has been closed.
[2024-12-19 10:08:34,511][07463] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop
[2024-12-19 10:08:34,545][07469] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(0, 0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received.
ViZDoom instance has been closed.
[2024-12-19 10:08:34,554][07469] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop
[2024-12-19 10:08:34,535][07468] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received.
ViZDoom instance has been closed.
[2024-12-19 10:08:34,653][07468] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop
[2024-12-19 10:08:34,626][07467] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(1, 0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received.
ViZDoom instance has been closed.
[2024-12-19 10:08:34,638][07466] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(1, 0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received.
ViZDoom instance has been closed.
[2024-12-19 10:08:34,654][07466] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop
[2024-12-19 10:08:34,654][07467] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop
[2024-12-19 10:08:34,683][07464] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(1, 0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received.
ViZDoom instance has been closed.
[2024-12-19 10:08:34,696][07464] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc2_evt_loop
[2024-12-19 10:08:34,725][07448] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth
[2024-12-19 10:08:34,833][07448] Stopping LearnerWorker_p0...
[2024-12-19 10:08:34,840][07448] Loop learner_proc0_evt_loop terminating...
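Every rollout-worker traceback above (and the two that follow) has the same shape: the worker's event loop delivers an advance_rollouts signal to a connected slot, the slot steps the ViZDoom env, and ViZDoom converts the process-wide SIGINT into a SignalException that propagates back up to the loop, which logs it as an unhandled exception. A toy sketch of that emitter/slot dispatch pattern; this is an illustration of the idea, not the signal_slot package's actual API:

    # Toy emitter/slot dispatch in the spirit of the tracebacks above.
    # Illustrative only; the real signal_slot library differs.
    from collections import defaultdict
    from typing import Callable, DefaultDict, List

    class Emitter:
        def __init__(self) -> None:
            self._slots: DefaultDict[str, List[Callable]] = defaultdict(list)

        def connect(self, signal_name: str, slot: Callable) -> None:
            self._slots[signal_name].append(slot)

        def emit(self, signal_name: str, *args) -> None:
            for slot in self._slots[signal_name]:
                try:
                    slot(*args)  # e.g. a worker's advance_rollouts(policy_id, ...)
                except Exception as exc:
                    # The real loop logs "Unhandled exception ... in evt loop <name>"
                    # and lets the process shut down.
                    print(f"unhandled exception {exc} in slot "
                          f"{getattr(slot, '__name__', repr(slot))}")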
[2024-12-19 10:08:34,991][07465] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(1, 0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received.
ViZDoom instance has been closed.
[2024-12-19 10:08:35,007][07461] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(1, 0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 522, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/gymnasium/core.py", line 461, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received.
ViZDoom instance has been closed.
[2024-12-19 10:08:35,159][07465] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop
[2024-12-19 10:08:35,192][07461] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop
[2024-12-19 10:08:36,640][07135] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-19 10:08:36,647][07135] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-19 10:08:36,650][07135] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-19 10:08:36,653][07135] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-19 10:08:36,656][07135] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-19 10:08:36,665][07135] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-19 10:08:36,668][07135] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-19 10:08:36,675][07135] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-19 10:08:36,675][07135] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-19 10:08:36,677][07135] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-19 10:08:36,688][07135] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-19 10:08:36,692][07135] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-19 10:08:36,694][07135] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-19 10:08:36,698][07135] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-19 10:08:36,700][07135] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-19 10:08:36,805][07135] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 10:08:36,824][07135] RunningMeanStd input shape: (3, 72, 128)
[2024-12-19 10:08:36,833][07135] RunningMeanStd input shape: (1,)
[2024-12-19 10:08:36,895][07135] ConvEncoder: input_channels=3
[2024-12-19 10:08:37,370][07135] Conv encoder output size: 512
[2024-12-19 10:08:37,383][07135] Policy head output size: 512
[2024-12-19 10:08:38,022][07135] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000560_2293760.pth...
[2024-12-19 10:08:40,683][07135] Num frames 100...
[2024-12-19 10:08:40,959][07135] Num frames 200...
[2024-12-19 10:08:41,224][07135] Num frames 300...
[2024-12-19 10:08:41,455][07135] Num frames 400...
[2024-12-19 10:08:41,617][07135] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2024-12-19 10:08:41,619][07135] Avg episode reward: 5.480, avg true_objective: 4.480
[2024-12-19 10:08:41,732][07135] Num frames 500...
[2024-12-19 10:08:41,957][07135] Num frames 600...
[2024-12-19 10:08:42,145][07135] Num frames 700...
[2024-12-19 10:08:42,335][07135] Num frames 800...
[2024-12-19 10:08:42,549][07135] Num frames 900...
[2024-12-19 10:08:42,722][07135] Avg episode rewards: #0: 6.300, true rewards: #0: 4.800
[2024-12-19 10:08:42,724][07135] Avg episode reward: 6.300, avg true_objective: 4.800
[2024-12-19 10:08:42,806][07135] Num frames 1000...
[2024-12-19 10:08:43,038][07135] Num frames 1100...
[2024-12-19 10:08:43,190][07135] Num frames 1200...
[2024-12-19 10:08:43,313][07135] Num frames 1300...
[2024-12-19 10:08:43,466][07135] Avg episode rewards: #0: 5.920, true rewards: #0: 4.587
[2024-12-19 10:08:43,468][07135] Avg episode reward: 5.920, avg true_objective: 4.587
[2024-12-19 10:08:43,502][07135] Num frames 1400...
[2024-12-19 10:08:43,631][07135] Num frames 1500...
[2024-12-19 10:08:43,758][07135] Num frames 1600...
[2024-12-19 10:08:43,888][07135] Num frames 1700...
[2024-12-19 10:08:44,026][07135] Avg episode rewards: #0: 5.400, true rewards: #0: 4.400
[2024-12-19 10:08:44,027][07135] Avg episode reward: 5.400, avg true_objective: 4.400
[2024-12-19 10:08:44,079][07135] Num frames 1800...
[2024-12-19 10:08:44,204][07135] Num frames 1900...
[2024-12-19 10:08:44,331][07135] Num frames 2000...
[2024-12-19 10:08:44,460][07135] Num frames 2100...
[2024-12-19 10:08:44,574][07135] Avg episode rewards: #0: 5.088, true rewards: #0: 4.288
[2024-12-19 10:08:44,576][07135] Avg episode reward: 5.088, avg true_objective: 4.288
[2024-12-19 10:08:44,676][07135] Num frames 2200...
[2024-12-19 10:08:44,847][07135] Num frames 2300...
[2024-12-19 10:08:45,028][07135] Num frames 2400...
[2024-12-19 10:08:45,194][07135] Num frames 2500...
[2024-12-19 10:08:45,299][07135] Avg episode rewards: #0: 4.880, true rewards: #0: 4.213
[2024-12-19 10:08:45,304][07135] Avg episode reward: 4.880, avg true_objective: 4.213
[2024-12-19 10:08:45,425][07135] Num frames 2600...
[2024-12-19 10:08:45,612][07135] Num frames 2700...
[2024-12-19 10:08:45,784][07135] Num frames 2800...
[2024-12-19 10:08:45,973][07135] Num frames 2900...
[2024-12-19 10:08:46,060][07135] Avg episode rewards: #0: 4.731, true rewards: #0: 4.160
[2024-12-19 10:08:46,062][07135] Avg episode reward: 4.731, avg true_objective: 4.160
[2024-12-19 10:08:46,218][07135] Num frames 3000...
[2024-12-19 10:08:46,404][07135] Num frames 3100...
[2024-12-19 10:08:46,579][07135] Num frames 3200...
[2024-12-19 10:08:46,808][07135] Avg episode rewards: #0: 4.620, true rewards: #0: 4.120
[2024-12-19 10:08:46,810][07135] Avg episode reward: 4.620, avg true_objective: 4.120
[2024-12-19 10:08:46,822][07135] Num frames 3300...
[2024-12-19 10:08:47,010][07135] Num frames 3400...
[2024-12-19 10:08:47,142][07135] Num frames 3500...
[2024-12-19 10:08:47,269][07135] Num frames 3600...
[2024-12-19 10:08:47,393][07135] Num frames 3700...
[2024-12-19 10:08:47,503][07135] Avg episode rewards: #0: 4.716, true rewards: #0: 4.160
[2024-12-19 10:08:47,506][07135] Avg episode reward: 4.716, avg true_objective: 4.160
[2024-12-19 10:08:47,584][07135] Num frames 3800...
[2024-12-19 10:08:47,708][07135] Num frames 3900...
[2024-12-19 10:08:47,832][07135] Num frames 4000...
[2024-12-19 10:08:47,955][07135] Num frames 4100...
[2024-12-19 10:08:48,044][07135] Avg episode rewards: #0: 4.628, true rewards: #0: 4.128
[2024-12-19 10:08:48,046][07135] Avg episode reward: 4.628, avg true_objective: 4.128
[2024-12-19 10:09:06,834][07135] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-19 10:09:07,139][07135] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-19 10:09:07,141][07135] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-19 10:09:07,142][07135] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-19 10:09:07,144][07135] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-19 10:09:07,145][07135] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-19 10:09:07,146][07135] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-19 10:09:07,147][07135] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-12-19 10:09:07,148][07135] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-19 10:09:07,150][07135] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-12-19 10:09:07,151][07135] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-12-19 10:09:07,152][07135] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-19 10:09:07,153][07135] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-19 10:09:07,153][07135] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-19 10:09:07,154][07135] Adding new argument 'enjoy_script'=None that is not in the saved config file!
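The "Overriding arg ..." and "Adding new argument ..." entries describe the enjoy-time config merge: the saved config.json is loaded first, CLI values override keys it already has, and keys the saved file does not know about are added with a warning. A rough sketch of that merge; the function name and return shape are illustrative, not Sample Factory's exact code:

    # Rough sketch of the config merge implied by the log entries above.
    # Illustrative; names are hypothetical.
    import json

    def merge_config(config_path: str, cli_args: dict) -> dict:
        with open(config_path) as f:
            cfg = json.load(f)
        for key, value in cli_args.items():
            if key in cfg:
                print(f"Overriding arg {key!r} with value {value!r} passed from command line")
            else:
                print(f"Adding new argument {key!r}={value!r} that is not in the saved config file!")
            cfg[key] = value
        return cfg

    # e.g. merge_config("/content/train_dir/default_experiment/config.json",
    #                   {"num_workers": 1, "no_render": True, "push_to_hub": True})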
[2024-12-19 10:09:07,155][07135] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-19 10:09:07,195][07135] RunningMeanStd input shape: (3, 72, 128)
[2024-12-19 10:09:07,197][07135] RunningMeanStd input shape: (1,)
[2024-12-19 10:09:07,213][07135] ConvEncoder: input_channels=3
[2024-12-19 10:09:07,270][07135] Conv encoder output size: 512
[2024-12-19 10:09:07,272][07135] Policy head output size: 512
[2024-12-19 10:09:07,298][07135] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000560_2293760.pth...
[2024-12-19 10:09:07,928][07135] Num frames 100...
[2024-12-19 10:09:08,106][07135] Num frames 200...
[2024-12-19 10:09:08,295][07135] Num frames 300...
[2024-12-19 10:09:08,460][07135] Num frames 400...
[2024-12-19 10:09:08,598][07135] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2024-12-19 10:09:08,600][07135] Avg episode reward: 5.480, avg true_objective: 4.480
[2024-12-19 10:09:08,686][07135] Num frames 500...
[2024-12-19 10:09:08,848][07135] Num frames 600...
[2024-12-19 10:09:09,002][07135] Num frames 700...
[2024-12-19 10:09:09,124][07135] Num frames 800...
[2024-12-19 10:09:09,218][07135] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
[2024-12-19 10:09:09,220][07135] Avg episode reward: 4.660, avg true_objective: 4.160
[2024-12-19 10:09:09,302][07135] Num frames 900...
[2024-12-19 10:09:09,421][07135] Num frames 1000...
[2024-12-19 10:09:09,547][07135] Num frames 1100...
[2024-12-19 10:09:09,669][07135] Num frames 1200...
[2024-12-19 10:09:09,824][07135] Avg episode rewards: #0: 4.933, true rewards: #0: 4.267
[2024-12-19 10:09:09,827][07135] Avg episode reward: 4.933, avg true_objective: 4.267
[2024-12-19 10:09:09,867][07135] Num frames 1300...
[2024-12-19 10:09:09,991][07135] Num frames 1400...
[2024-12-19 10:09:10,111][07135] Num frames 1500...
[2024-12-19 10:09:10,261][07135] Num frames 1600...
[2024-12-19 10:09:10,423][07135] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
[2024-12-19 10:09:10,425][07135] Avg episode reward: 4.660, avg true_objective: 4.160
[2024-12-19 10:09:10,489][07135] Num frames 1700...
[2024-12-19 10:09:10,668][07135] Num frames 1800...
[2024-12-19 10:09:10,833][07135] Num frames 1900...
[2024-12-19 10:09:11,018][07135] Num frames 2000...
[2024-12-19 10:09:11,162][07135] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096
[2024-12-19 10:09:11,167][07135] Avg episode reward: 4.496, avg true_objective: 4.096
[2024-12-19 10:09:11,256][07135] Num frames 2100...
[2024-12-19 10:09:11,417][07135] Num frames 2200...
[2024-12-19 10:09:11,588][07135] Num frames 2300...
[2024-12-19 10:09:11,763][07135] Num frames 2400...
[2024-12-19 10:09:11,874][07135] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
[2024-12-19 10:09:11,877][07135] Avg episode reward: 4.387, avg true_objective: 4.053
[2024-12-19 10:09:12,010][07135] Num frames 2500...
[2024-12-19 10:09:12,181][07135] Num frames 2600...
[2024-12-19 10:09:12,347][07135] Num frames 2700...
[2024-12-19 10:09:12,520][07135] Num frames 2800...
[2024-12-19 10:09:12,607][07135] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023
[2024-12-19 10:09:12,609][07135] Avg episode reward: 4.309, avg true_objective: 4.023
[2024-12-19 10:09:12,716][07135] Num frames 2900...
[2024-12-19 10:09:12,836][07135] Num frames 3000...
[2024-12-19 10:09:12,958][07135] Num frames 3100...
[2024-12-19 10:09:13,098][07135] Num frames 3200...
[2024-12-19 10:09:13,152][07135] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000
[2024-12-19 10:09:13,154][07135] Avg episode reward: 4.250, avg true_objective: 4.000
[2024-12-19 10:09:13,276][07135] Num frames 3300...
[2024-12-19 10:09:13,398][07135] Num frames 3400...
[2024-12-19 10:09:13,529][07135] Num frames 3500...
[2024-12-19 10:09:13,689][07135] Avg episode rewards: #0: 4.204, true rewards: #0: 3.982
[2024-12-19 10:09:13,690][07135] Avg episode reward: 4.204, avg true_objective: 3.982
[2024-12-19 10:09:13,714][07135] Num frames 3600...
[2024-12-19 10:09:13,832][07135] Num frames 3700...
[2024-12-19 10:09:13,955][07135] Num frames 3800...
[2024-12-19 10:09:14,090][07135] Num frames 3900...
[2024-12-19 10:09:14,218][07135] Num frames 4000...
[2024-12-19 10:09:14,271][07135] Avg episode rewards: #0: 4.300, true rewards: #0: 4.000
[2024-12-19 10:09:14,272][07135] Avg episode reward: 4.300, avg true_objective: 4.000
[2024-12-19 10:09:31,879][07135] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-19 10:18:48,403][07135] Environment doom_basic already registered, overwriting...
[2024-12-19 10:18:48,406][07135] Environment doom_two_colors_easy already registered, overwriting...
[2024-12-19 10:18:48,409][07135] Environment doom_two_colors_hard already registered, overwriting...
[2024-12-19 10:18:48,410][07135] Environment doom_dm already registered, overwriting...
[2024-12-19 10:18:48,412][07135] Environment doom_dwango5 already registered, overwriting...
[2024-12-19 10:18:48,413][07135] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-12-19 10:18:48,414][07135] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-12-19 10:18:48,419][07135] Environment doom_my_way_home already registered, overwriting...
[2024-12-19 10:18:48,420][07135] Environment doom_deadly_corridor already registered, overwriting...
[2024-12-19 10:18:48,422][07135] Environment doom_defend_the_center already registered, overwriting...
[2024-12-19 10:18:48,423][07135] Environment doom_defend_the_line already registered, overwriting...
[2024-12-19 10:18:48,425][07135] Environment doom_health_gathering already registered, overwriting...
[2024-12-19 10:18:48,426][07135] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-12-19 10:18:48,430][07135] Environment doom_battle already registered, overwriting...
[2024-12-19 10:18:48,431][07135] Environment doom_battle2 already registered, overwriting...
[2024-12-19 10:18:48,432][07135] Environment doom_duel_bots already registered, overwriting...
[2024-12-19 10:18:48,435][07135] Environment doom_deathmatch_bots already registered, overwriting...
[2024-12-19 10:18:48,437][07135] Environment doom_duel already registered, overwriting...
[2024-12-19 10:18:48,439][07135] Environment doom_deathmatch_full already registered, overwriting...
[2024-12-19 10:18:48,441][07135] Environment doom_benchmark already registered, overwriting...
[2024-12-19 10:18:48,443][07135] register_encoder_factory:
[2024-12-19 10:18:48,481][07135] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-19 10:18:48,484][07135] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line
[2024-12-19 10:18:48,492][07135] Experiment dir /content/train_dir/default_experiment already exists!
[2024-12-19 10:18:48,494][07135] Resuming existing experiment from /content/train_dir/default_experiment...
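In both evaluation passes above, "Avg episode rewards" is a running mean over the episodes completed so far (5.480 after one episode, 6.300 after two, and so on), with the "true" objective tracked the same way. A small sketch of that bookkeeping; the function name is illustrative:

    # Running per-episode averages, matching the "Avg episode rewards" pattern
    # in the evaluation log above. Illustrative sketch.
    def log_episode(rewards: list, true_rewards: list,
                    reward: float, true_reward: float) -> None:
        rewards.append(reward)
        true_rewards.append(true_reward)
        avg = sum(rewards) / len(rewards)
        true_avg = sum(true_rewards) / len(true_rewards)
        print(f"Avg episode rewards: #0: {avg:.3f}, true rewards: #0: {true_avg:.3f}")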
[2024-12-19 10:18:48,498][07135] Weights and Biases integration disabled
[2024-12-19 10:18:48,501][07135] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-12-19 10:18:50,770][07135] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-12-19 10:18:50,771][07135] Saving configuration to /content/train_dir/default_experiment/config.json...
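With restart_behavior=resume and load_checkpoint_kind=latest in the configuration above, the learner that starts below resolves "latest" to the newest file in checkpoint_p0 (here checkpoint_000000560_2293760.pth) and restores train_step and env_steps from it. A minimal sketch of that selection, assuming only the zero-padded naming scheme visible in this log:

    # Minimal sketch of load_checkpoint_kind=latest: pick the newest checkpoint
    # by its zero-padded train step. Illustrative; not Sample Factory's code.
    import glob
    import os

    def latest_checkpoint(checkpoint_dir: str) -> str | None:
        candidates = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
        return candidates[-1] if candidates else None

    # latest_checkpoint("/content/train_dir/default_experiment/checkpoint_p0")
    # -> ".../checkpoint_p0/checkpoint_000000560_2293760.pth"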
[2024-12-19 10:18:50,776][07135] Rollout worker 0 uses device cpu
[2024-12-19 10:18:50,778][07135] Rollout worker 1 uses device cpu
[2024-12-19 10:18:50,780][07135] Rollout worker 2 uses device cpu
[2024-12-19 10:18:50,781][07135] Rollout worker 3 uses device cpu
[2024-12-19 10:18:50,783][07135] Rollout worker 4 uses device cpu
[2024-12-19 10:18:50,784][07135] Rollout worker 5 uses device cpu
[2024-12-19 10:18:50,785][07135] Rollout worker 6 uses device cpu
[2024-12-19 10:18:50,786][07135] Rollout worker 7 uses device cpu
[2024-12-19 10:18:50,884][07135] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 10:18:50,886][07135] InferenceWorker_p0-w0: min num requests: 2
[2024-12-19 10:18:50,918][07135] Starting all processes...
[2024-12-19 10:18:50,919][07135] Starting process learner_proc0
[2024-12-19 10:18:50,968][07135] Starting all processes...
[2024-12-19 10:18:50,975][07135] Starting process inference_proc0-0
[2024-12-19 10:18:50,977][07135] Starting process rollout_proc0
[2024-12-19 10:18:50,984][07135] Starting process rollout_proc1
[2024-12-19 10:18:50,984][07135] Starting process rollout_proc2
[2024-12-19 10:18:50,984][07135] Starting process rollout_proc3
[2024-12-19 10:18:50,984][07135] Starting process rollout_proc4
[2024-12-19 10:18:50,984][07135] Starting process rollout_proc5
[2024-12-19 10:18:50,984][07135] Starting process rollout_proc6
[2024-12-19 10:18:50,984][07135] Starting process rollout_proc7
[2024-12-19 10:19:07,001][15948] Worker 5 uses CPU cores [1]
[2024-12-19 10:19:07,500][15925] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 10:19:07,505][15925] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-19 10:19:07,574][15939] Worker 0 uses CPU cores [0]
[2024-12-19 10:19:07,575][15925] Num visible devices: 1
[2024-12-19 10:19:07,607][15925] Starting seed is not provided
[2024-12-19 10:19:07,608][15925] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 10:19:07,609][15925] Initializing actor-critic model on device cuda:0
[2024-12-19 10:19:07,610][15925] RunningMeanStd input shape: (3, 72, 128)
[2024-12-19 10:19:07,611][15925] RunningMeanStd input shape: (1,)
[2024-12-19 10:19:07,692][15925] ConvEncoder: input_channels=3
[2024-12-19 10:19:07,787][15938] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 10:19:07,789][15938] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-19 10:19:07,865][15938] Num visible devices: 1
[2024-12-19 10:19:07,918][15945] Worker 2 uses CPU cores [0]
[2024-12-19 10:19:08,026][15949] Worker 7 uses CPU cores [1]
[2024-12-19 10:19:08,040][15950] Worker 6 uses CPU cores [0]
[2024-12-19 10:19:08,068][15946] Worker 4 uses CPU cores [0]
[2024-12-19 10:19:08,124][15947] Worker 3 uses CPU cores [1]
[2024-12-19 10:19:08,146][15925] Conv encoder output size: 512
[2024-12-19 10:19:08,146][15925] Policy head output size: 512
[2024-12-19 10:19:08,173][15925] Created Actor Critic model with architecture:
[2024-12-19 10:19:08,174][15925] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-19 10:19:08,220][15944] Worker 1 uses CPU cores [1]
[2024-12-19 10:19:08,304][15925] Using optimizer
[2024-12-19 10:19:09,104][15925] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000560_2293760.pth...
[2024-12-19 10:19:09,140][15925] Loading model from checkpoint
[2024-12-19 10:19:09,142][15925] Loaded experiment state at self.train_step=560, self.env_steps=2293760
[2024-12-19 10:19:09,142][15925] Initialized policy 0 weights for model version 560
[2024-12-19 10:19:09,146][15925] LearnerWorker_p0 finished initialization!
[2024-12-19 10:19:09,147][15925] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-19 10:19:09,251][15938] RunningMeanStd input shape: (3, 72, 128)
[2024-12-19 10:19:09,252][15938] RunningMeanStd input shape: (1,)
[2024-12-19 10:19:09,265][15938] ConvEncoder: input_channels=3
[2024-12-19 10:19:09,368][15938] Conv encoder output size: 512
[2024-12-19 10:19:09,368][15938] Policy head output size: 512
[2024-12-19 10:19:09,422][07135] Inference worker 0-0 is ready!
[2024-12-19 10:19:09,424][07135] All inference workers are ready! Signal rollout workers to start!
[2024-12-19 10:19:09,621][15948] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 10:19:09,625][15944] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 10:19:09,627][15949] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 10:19:09,634][15946] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 10:19:09,636][15945] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 10:19:09,638][15939] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 10:19:09,639][15950] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 10:19:09,639][15947] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-19 10:19:10,627][15944] Decorrelating experience for 0 frames...
[2024-12-19 10:19:10,635][15947] Decorrelating experience for 0 frames...
[2024-12-19 10:19:10,877][07135] Heartbeat connected on Batcher_0
[2024-12-19 10:19:10,882][07135] Heartbeat connected on LearnerWorker_p0
[2024-12-19 10:19:10,925][07135] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-19 10:19:11,305][15950] Decorrelating experience for 0 frames...
[2024-12-19 10:19:11,303][15945] Decorrelating experience for 0 frames...
[2024-12-19 10:19:11,312][15946] Decorrelating experience for 0 frames...
[2024-12-19 10:19:11,315][15939] Decorrelating experience for 0 frames...
[2024-12-19 10:19:11,461][15944] Decorrelating experience for 32 frames...
[2024-12-19 10:19:11,472][15947] Decorrelating experience for 32 frames...
[2024-12-19 10:19:12,337][15945] Decorrelating experience for 32 frames...
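Editor's note: a stand-alone PyTorch sketch of the encoder/core printed in the model summary above. The conv filter sizes ((32,8,4), (64,4,2), (128,3,2)) are an assumption following Sample Factory's convnet_simple convention; they are consistent with the logged "Conv encoder output size: 512" for a (3, 72, 128) CHW observation, but the log itself only names the module classes.

```python
import torch
from torch import nn

# three Conv2d/ELU stages, a Linear(..., 512) MLP layer (encoder_conv_mlp_layers=[512]),
# then a GRU(512, 512) core, matching the ActorCriticSharedWeights summary above
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
    nn.Flatten(),
    nn.Linear(128 * 3 * 6, 512), nn.ELU(),  # 72x128 input shrinks to a 3x6 feature map
)
core = nn.GRU(input_size=512, hidden_size=512, num_layers=1)
critic_linear = nn.Linear(512, 1)   # value head, as printed above
action_logits = nn.Linear(512, 5)   # 5 discrete actions, per distribution_linear

obs = torch.zeros(1, 3, 72, 128)    # pixel_format=CHW, res 128x72
feat = encoder(obs)                 # -> (1, 512): "Conv encoder output size: 512"
out, h = core(feat.unsqueeze(0))    # GRU expects (seq, batch, features)
print(feat.shape, out.shape, critic_linear(out[0]).shape, action_logits(out[0]).shape)
```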
[2024-12-19 10:19:12,349][15939] Decorrelating experience for 32 frames...
[2024-12-19 10:19:12,637][15950] Decorrelating experience for 32 frames...
[2024-12-19 10:19:12,914][15948] Decorrelating experience for 0 frames...
[2024-12-19 10:19:12,930][15949] Decorrelating experience for 0 frames...
[2024-12-19 10:19:13,502][07135] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2293760. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-19 10:19:13,537][15946] Decorrelating experience for 32 frames...
[2024-12-19 10:19:13,630][15944] Decorrelating experience for 64 frames...
[2024-12-19 10:19:13,715][15947] Decorrelating experience for 64 frames...
[2024-12-19 10:19:14,292][15939] Decorrelating experience for 64 frames...
[2024-12-19 10:19:14,617][15948] Decorrelating experience for 32 frames...
[2024-12-19 10:19:14,629][15949] Decorrelating experience for 32 frames...
[2024-12-19 10:19:15,745][15950] Decorrelating experience for 64 frames...
[2024-12-19 10:19:15,874][15939] Decorrelating experience for 96 frames...
[2024-12-19 10:19:16,133][07135] Heartbeat connected on RolloutWorker_w0
[2024-12-19 10:19:16,457][15945] Decorrelating experience for 64 frames...
[2024-12-19 10:19:16,708][15947] Decorrelating experience for 96 frames...
[2024-12-19 10:19:17,078][07135] Heartbeat connected on RolloutWorker_w3
[2024-12-19 10:19:17,460][15948] Decorrelating experience for 64 frames...
[2024-12-19 10:19:17,466][15949] Decorrelating experience for 64 frames...
[2024-12-19 10:19:18,502][07135] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2293760. Throughput: 0: 84.0. Samples: 420. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-19 10:19:18,507][07135] Avg episode reward: [(0, '1.387')]
[2024-12-19 10:19:18,663][15950] Decorrelating experience for 96 frames...
[2024-12-19 10:19:19,005][07135] Heartbeat connected on RolloutWorker_w6
[2024-12-19 10:19:19,505][15945] Decorrelating experience for 96 frames...
[2024-12-19 10:19:19,637][15946] Decorrelating experience for 64 frames...
[2024-12-19 10:19:19,841][07135] Heartbeat connected on RolloutWorker_w2
[2024-12-19 10:19:19,870][15948] Decorrelating experience for 96 frames...
[2024-12-19 10:19:20,195][07135] Heartbeat connected on RolloutWorker_w5
[2024-12-19 10:19:22,233][15946] Decorrelating experience for 96 frames...
[2024-12-19 10:19:22,752][07135] Heartbeat connected on RolloutWorker_w4
[2024-12-19 10:19:22,996][15949] Decorrelating experience for 96 frames...
[2024-12-19 10:19:23,146][15944] Decorrelating experience for 96 frames...
[2024-12-19 10:19:23,257][15925] Signal inference workers to stop experience collection...
[2024-12-19 10:19:23,275][15938] InferenceWorker_p0-w0: stopping experience collection
[2024-12-19 10:19:23,363][07135] Heartbeat connected on RolloutWorker_w7
[2024-12-19 10:19:23,393][07135] Heartbeat connected on RolloutWorker_w1
[2024-12-19 10:19:23,502][07135] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2293760. Throughput: 0: 175.4. Samples: 1754. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-19 10:19:23,506][07135] Avg episode reward: [(0, '3.433')]
[2024-12-19 10:19:24,871][15925] Signal inference workers to resume experience collection...
[2024-12-19 10:19:24,872][15938] InferenceWorker_p0-w0: resuming experience collection
[2024-12-19 10:19:28,502][07135] Fps is (10 sec: 2048.0, 60 sec: 1365.4, 300 sec: 1365.4). Total num frames: 2314240. Throughput: 0: 348.7. Samples: 5230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
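Editor's note: a small illustrative sketch (not Sample Factory's actual implementation) of how the "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines above can be produced: keep (wall_time, total_frames) samples and divide frame deltas by time deltas over trailing windows. With a single sample the deltas are zero, which is why the very first report shows nan.

```python
import time
from collections import deque


class FpsMeter:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (wall_time, total_env_frames)

    def record(self, total_frames: int) -> None:
        now = time.time()
        self.samples.append((now, total_frames))
        # keep only as much history as the largest window needs
        while self.samples and now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self) -> dict:
        now, latest = self.samples[-1]
        out = {}
        for w in self.windows:
            # oldest sample still inside this window
            t0, f0 = next((t, f) for t, f in self.samples if now - t <= w)
            out[w] = (latest - f0) / (now - t0) if now > t0 else float("nan")
        return out


meter = FpsMeter()
meter.record(2293760)
print(meter.fps())  # all windows nan on the first sample, as in the log
```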
[2024-12-19 10:19:28,506][07135] Avg episode reward: [(0, '3.644')]
[2024-12-19 10:19:33,503][07135] Fps is (10 sec: 3686.0, 60 sec: 1843.1, 300 sec: 1843.1). Total num frames: 2330624. Throughput: 0: 513.9. Samples: 10278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:19:33,508][07135] Avg episode reward: [(0, '4.164')]
[2024-12-19 10:19:34,367][15938] Updated weights for policy 0, policy_version 570 (0.0142)
[2024-12-19 10:19:38,502][07135] Fps is (10 sec: 3276.8, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 2347008. Throughput: 0: 486.5. Samples: 12162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:19:38,508][07135] Avg episode reward: [(0, '4.465')]
[2024-12-19 10:19:43,502][07135] Fps is (10 sec: 3686.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 2367488. Throughput: 0: 620.3. Samples: 18608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:19:43,508][07135] Avg episode reward: [(0, '4.476')]
[2024-12-19 10:19:44,567][15938] Updated weights for policy 0, policy_version 580 (0.0022)
[2024-12-19 10:19:48,502][07135] Fps is (10 sec: 4505.6, 60 sec: 2808.7, 300 sec: 2808.7). Total num frames: 2392064. Throughput: 0: 724.2. Samples: 25346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:19:48,506][07135] Avg episode reward: [(0, '4.448')]
[2024-12-19 10:19:53,502][07135] Fps is (10 sec: 3686.4, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 2404352. Throughput: 0: 686.1. Samples: 27442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:19:53,503][07135] Avg episode reward: [(0, '4.438')]
[2024-12-19 10:19:55,959][15938] Updated weights for policy 0, policy_version 590 (0.0023)
[2024-12-19 10:19:58,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 2428928. Throughput: 0: 735.1. Samples: 33080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:19:58,504][07135] Avg episode reward: [(0, '4.520')]
[2024-12-19 10:20:03,502][07135] Fps is (10 sec: 4505.7, 60 sec: 3113.0, 300 sec: 3113.0). Total num frames: 2449408. Throughput: 0: 883.3. Samples: 40170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:20:03,509][07135] Avg episode reward: [(0, '4.567')]
[2024-12-19 10:20:04,564][15938] Updated weights for policy 0, policy_version 600 (0.0028)
[2024-12-19 10:20:08,502][07135] Fps is (10 sec: 3686.3, 60 sec: 3127.9, 300 sec: 3127.9). Total num frames: 2465792. Throughput: 0: 911.0. Samples: 42748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:20:08,504][07135] Avg episode reward: [(0, '4.511')]
[2024-12-19 10:20:13,507][07135] Fps is (10 sec: 2865.7, 60 sec: 3071.7, 300 sec: 3071.7). Total num frames: 2478080. Throughput: 0: 920.0. Samples: 46634. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-19 10:20:13,509][07135] Avg episode reward: [(0, '4.581')]
[2024-12-19 10:20:18,502][07135] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3087.8). Total num frames: 2494464. Throughput: 0: 906.6. Samples: 51076. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-19 10:20:18,504][07135] Avg episode reward: [(0, '4.723')]
[2024-12-19 10:20:18,884][15938] Updated weights for policy 0, policy_version 610 (0.0026)
[2024-12-19 10:20:23,506][07135] Fps is (10 sec: 3686.6, 60 sec: 3686.1, 300 sec: 3159.6). Total num frames: 2514944. Throughput: 0: 942.4. Samples: 54576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:20:23,508][07135] Avg episode reward: [(0, '4.824')]
[2024-12-19 10:20:28,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3167.6). Total num frames: 2531328. Throughput: 0: 903.9. Samples: 59282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:20:28,511][07135] Avg episode reward: [(0, '4.751')]
[2024-12-19 10:20:30,517][15938] Updated weights for policy 0, policy_version 620 (0.0020)
[2024-12-19 10:20:33,502][07135] Fps is (10 sec: 3688.1, 60 sec: 3686.5, 300 sec: 3225.6). Total num frames: 2551808. Throughput: 0: 892.1. Samples: 65490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:20:33,506][07135] Avg episode reward: [(0, '4.653')]
[2024-12-19 10:20:38,502][07135] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3325.0). Total num frames: 2576384. Throughput: 0: 919.2. Samples: 68808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:20:38,504][07135] Avg episode reward: [(0, '4.574')]
[2024-12-19 10:20:39,393][15938] Updated weights for policy 0, policy_version 630 (0.0028)
[2024-12-19 10:20:43,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 2588672. Throughput: 0: 917.0. Samples: 74346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:20:43,507][07135] Avg episode reward: [(0, '4.621')]
[2024-12-19 10:20:48,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3319.9). Total num frames: 2609152. Throughput: 0: 874.0. Samples: 79502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:20:48,508][07135] Avg episode reward: [(0, '4.516')]
[2024-12-19 10:20:48,517][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth...
[2024-12-19 10:20:48,648][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000516_2113536.pth
[2024-12-19 10:20:51,050][15938] Updated weights for policy 0, policy_version 640 (0.0039)
[2024-12-19 10:20:53,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3358.7). Total num frames: 2629632. Throughput: 0: 891.6. Samples: 82870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:20:53,506][07135] Avg episode reward: [(0, '4.581')]
[2024-12-19 10:20:58,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3393.8). Total num frames: 2650112. Throughput: 0: 954.2. Samples: 89566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:20:58,509][07135] Avg episode reward: [(0, '4.621')]
[2024-12-19 10:21:01,795][15938] Updated weights for policy 0, policy_version 650 (0.0014)
[2024-12-19 10:21:03,503][07135] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3388.5). Total num frames: 2666496. Throughput: 0: 950.7. Samples: 93860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:21:03,508][07135] Avg episode reward: [(0, '4.475')]
[2024-12-19 10:21:08,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3419.3). Total num frames: 2686976. Throughput: 0: 942.9. Samples: 97002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:21:08,504][07135] Avg episode reward: [(0, '4.724')]
[2024-12-19 10:21:11,510][15938] Updated weights for policy 0, policy_version 660 (0.0013)
[2024-12-19 10:21:13,502][07135] Fps is (10 sec: 4506.3, 60 sec: 3891.5, 300 sec: 3481.6). Total num frames: 2711552. Throughput: 0: 994.8. Samples: 104050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
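Editor's note: the Saving/Removing pair above repeats roughly every two minutes, matching save_every_sec=120 and keep_checkpoints=2 from the configuration. A small sketch of that retention policy; the glob pattern and lexicographic sort rely only on the zero-padded `checkpoint_<version>_<env_steps>.pth` naming visible in the log, not on Sample Factory source.

```python
from pathlib import Path


def prune_checkpoints(ckpt_dir: Path, keep: int = 2) -> None:
    # zero-padded version numbers make lexicographic order == chronological order
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for stale in ckpts[:-keep]:
        print(f"Removing {stale}")
        stale.unlink()


# e.g. prune_checkpoints(Path("/content/train_dir/default_experiment/checkpoint_p0"))
```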
[2024-12-19 10:21:13,506][07135] Avg episode reward: [(0, '4.740')]
[2024-12-19 10:21:18,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3440.6). Total num frames: 2723840. Throughput: 0: 967.2. Samples: 109016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:21:18,504][07135] Avg episode reward: [(0, '4.631')]
[2024-12-19 10:21:23,106][15938] Updated weights for policy 0, policy_version 670 (0.0024)
[2024-12-19 10:21:23,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3465.8). Total num frames: 2744320. Throughput: 0: 945.2. Samples: 111340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:21:23,504][07135] Avg episode reward: [(0, '4.505')]
[2024-12-19 10:21:28,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3519.5). Total num frames: 2768896. Throughput: 0: 974.6. Samples: 118204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:21:28,504][07135] Avg episode reward: [(0, '4.650')]
[2024-12-19 10:21:32,475][15938] Updated weights for policy 0, policy_version 680 (0.0013)
[2024-12-19 10:21:33,502][07135] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3510.9). Total num frames: 2785280. Throughput: 0: 989.8. Samples: 124042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:21:33,507][07135] Avg episode reward: [(0, '4.568')]
[2024-12-19 10:21:38,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3502.8). Total num frames: 2801664. Throughput: 0: 962.2. Samples: 126170. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:21:38,509][07135] Avg episode reward: [(0, '4.570')]
[2024-12-19 10:21:43,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3522.6). Total num frames: 2822144. Throughput: 0: 951.2. Samples: 132372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:21:43,509][07135] Avg episode reward: [(0, '4.722')]
[2024-12-19 10:21:43,723][15938] Updated weights for policy 0, policy_version 690 (0.0022)
[2024-12-19 10:21:48,502][07135] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3567.5). Total num frames: 2846720. Throughput: 0: 1001.5. Samples: 138926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:21:48,505][07135] Avg episode reward: [(0, '4.656')]
[2024-12-19 10:21:53,504][07135] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3532.7). Total num frames: 2859008. Throughput: 0: 977.5. Samples: 140990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:21:53,510][07135] Avg episode reward: [(0, '4.640')]
[2024-12-19 10:21:55,395][15938] Updated weights for policy 0, policy_version 700 (0.0016)
[2024-12-19 10:21:58,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 2879488. Throughput: 0: 937.2. Samples: 146224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:21:58,504][07135] Avg episode reward: [(0, '4.710')]
[2024-12-19 10:22:03,502][07135] Fps is (10 sec: 4506.7, 60 sec: 3959.6, 300 sec: 3590.0). Total num frames: 2904064. Throughput: 0: 977.0. Samples: 152982. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:22:03,505][07135] Avg episode reward: [(0, '4.544')]
[2024-12-19 10:22:04,321][15938] Updated weights for policy 0, policy_version 710 (0.0018)
[2024-12-19 10:22:08,504][07135] Fps is (10 sec: 4094.9, 60 sec: 3891.0, 300 sec: 3581.0). Total num frames: 2920448. Throughput: 0: 989.7. Samples: 155880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
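Editor's note: an illustrative reading (assumed, not taken from Sample Factory source) of the "Policy #0 lag" statistic in the FPS lines: the gap between the learner's newest policy_version (the "Updated weights for policy 0, policy_version N" entries) and the version that produced each rollout still being consumed. With max_policy_lag=1000 configured, much staler samples would be dropped; in this run the lag stays between 0 and 2 versions.

```python
def policy_lag(latest_version: int, rollout_versions: list[int]) -> dict:
    # lag of each in-flight rollout relative to the learner's newest weights
    lags = [latest_version - v for v in rollout_versions]
    return {"min": min(lags), "avg": sum(lags) / len(lags), "max": max(lags)}


print(policy_lag(700, [700, 699, 700, 698]))  # {'min': 0, 'avg': 0.75, 'max': 2}
```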
[2024-12-19 10:22:08,507][07135] Avg episode reward: [(0, '4.637')]
[2024-12-19 10:22:13,502][07135] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3549.9). Total num frames: 2932736. Throughput: 0: 927.3. Samples: 159932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:22:13,505][07135] Avg episode reward: [(0, '4.579')]
[2024-12-19 10:22:16,411][15938] Updated weights for policy 0, policy_version 720 (0.0018)
[2024-12-19 10:22:18,502][07135] Fps is (10 sec: 3687.4, 60 sec: 3891.2, 300 sec: 3586.8). Total num frames: 2957312. Throughput: 0: 949.3. Samples: 166760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:22:18,504][07135] Avg episode reward: [(0, '4.699')]
[2024-12-19 10:22:23,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3600.2). Total num frames: 2977792. Throughput: 0: 981.1. Samples: 170320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:22:23,508][07135] Avg episode reward: [(0, '4.798')]
[2024-12-19 10:22:26,743][15938] Updated weights for policy 0, policy_version 730 (0.0023)
[2024-12-19 10:22:28,507][07135] Fps is (10 sec: 3684.6, 60 sec: 3754.4, 300 sec: 3591.8). Total num frames: 2994176. Throughput: 0: 949.7. Samples: 175112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:22:28,513][07135] Avg episode reward: [(0, '4.594')]
[2024-12-19 10:22:33,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3604.5). Total num frames: 3014656. Throughput: 0: 940.6. Samples: 181254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:22:33,509][07135] Avg episode reward: [(0, '4.366')]
[2024-12-19 10:22:36,498][15938] Updated weights for policy 0, policy_version 740 (0.0024)
[2024-12-19 10:22:38,502][07135] Fps is (10 sec: 4507.8, 60 sec: 3959.5, 300 sec: 3636.5). Total num frames: 3039232. Throughput: 0: 973.9. Samples: 184812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:22:38,506][07135] Avg episode reward: [(0, '4.548')]
[2024-12-19 10:22:43,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3627.9). Total num frames: 3055616. Throughput: 0: 981.5. Samples: 190390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:22:43,507][07135] Avg episode reward: [(0, '4.763')]
[2024-12-19 10:22:48,148][15938] Updated weights for policy 0, policy_version 750 (0.0028)
[2024-12-19 10:22:48,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3619.7). Total num frames: 3072000. Throughput: 0: 945.2. Samples: 195516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:22:48,506][07135] Avg episode reward: [(0, '4.733')]
[2024-12-19 10:22:48,521][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000750_3072000.pth...
[2024-12-19 10:22:48,672][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000560_2293760.pth
[2024-12-19 10:22:53,503][07135] Fps is (10 sec: 3685.8, 60 sec: 3891.3, 300 sec: 3630.5). Total num frames: 3092480. Throughput: 0: 957.8. Samples: 198978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:22:53,506][07135] Avg episode reward: [(0, '4.635')]
[2024-12-19 10:22:57,081][15938] Updated weights for policy 0, policy_version 760 (0.0026)
[2024-12-19 10:22:58,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3659.1). Total num frames: 3117056. Throughput: 0: 1017.8. Samples: 205732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:22:58,504][07135] Avg episode reward: [(0, '4.691')]
[2024-12-19 10:23:03,503][07135] Fps is (10 sec: 3686.5, 60 sec: 3754.6, 300 sec: 3633.0). Total num frames: 3129344. Throughput: 0: 960.9. Samples: 210002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:23:03,511][07135] Avg episode reward: [(0, '4.712')]
[2024-12-19 10:23:08,440][15938] Updated weights for policy 0, policy_version 770 (0.0026)
[2024-12-19 10:23:08,502][07135] Fps is (10 sec: 3686.3, 60 sec: 3891.4, 300 sec: 3660.3). Total num frames: 3153920. Throughput: 0: 955.9. Samples: 213334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:23:08,506][07135] Avg episode reward: [(0, '4.590')]
[2024-12-19 10:23:13,502][07135] Fps is (10 sec: 4506.2, 60 sec: 4027.7, 300 sec: 3669.3). Total num frames: 3174400. Throughput: 0: 1006.0. Samples: 220378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:23:13,504][07135] Avg episode reward: [(0, '4.632')]
[2024-12-19 10:23:18,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3661.3). Total num frames: 3190784. Throughput: 0: 978.7. Samples: 225296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:23:18,509][07135] Avg episode reward: [(0, '4.633')]
[2024-12-19 10:23:19,247][15938] Updated weights for policy 0, policy_version 780 (0.0033)
[2024-12-19 10:23:23,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3670.0). Total num frames: 3211264. Throughput: 0: 952.3. Samples: 227666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:23:23,505][07135] Avg episode reward: [(0, '4.489')]
[2024-12-19 10:23:28,502][07135] Fps is (10 sec: 4095.7, 60 sec: 3959.8, 300 sec: 3678.4). Total num frames: 3231744. Throughput: 0: 986.3. Samples: 234776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:23:28,512][07135] Avg episode reward: [(0, '4.491')]
[2024-12-19 10:23:28,545][15938] Updated weights for policy 0, policy_version 790 (0.0024)
[2024-12-19 10:23:33,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3686.4). Total num frames: 3252224. Throughput: 0: 1005.6. Samples: 240766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:23:33,507][07135] Avg episode reward: [(0, '4.471')]
[2024-12-19 10:23:38,502][07135] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3678.7). Total num frames: 3268608. Throughput: 0: 973.9. Samples: 242804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:23:38,506][07135] Avg episode reward: [(0, '4.419')]
[2024-12-19 10:23:40,274][15938] Updated weights for policy 0, policy_version 800 (0.0014)
[2024-12-19 10:23:43,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 3289088. Throughput: 0: 963.4. Samples: 249086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:23:43,504][07135] Avg episode reward: [(0, '4.607')]
[2024-12-19 10:23:48,502][07135] Fps is (10 sec: 4505.3, 60 sec: 4027.7, 300 sec: 3708.7). Total num frames: 3313664. Throughput: 0: 1022.7. Samples: 256022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:23:48,505][07135] Avg episode reward: [(0, '4.542')]
[2024-12-19 10:23:49,508][15938] Updated weights for policy 0, policy_version 810 (0.0015)
[2024-12-19 10:23:53,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3686.4). Total num frames: 3325952. Throughput: 0: 995.6. Samples: 258138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:23:53,504][07135] Avg episode reward: [(0, '4.610')]
[2024-12-19 10:23:58,502][07135] Fps is (10 sec: 3277.0, 60 sec: 3822.9, 300 sec: 3693.6). Total num frames: 3346432. Throughput: 0: 958.7. Samples: 263520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:23:58,504][07135] Avg episode reward: [(0, '4.731')]
[2024-12-19 10:24:00,369][15938] Updated weights for policy 0, policy_version 820 (0.0034)
[2024-12-19 10:24:03,506][07135] Fps is (10 sec: 4503.4, 60 sec: 4027.5, 300 sec: 3714.6). Total num frames: 3371008. Throughput: 0: 1003.3. Samples: 270450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:24:03,509][07135] Avg episode reward: [(0, '4.809')]
[2024-12-19 10:24:08,502][07135] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 3387392. Throughput: 0: 1018.8. Samples: 273512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:24:08,511][07135] Avg episode reward: [(0, '4.750')]
[2024-12-19 10:24:11,431][15938] Updated weights for policy 0, policy_version 830 (0.0013)
[2024-12-19 10:24:13,502][07135] Fps is (10 sec: 3278.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3403776. Throughput: 0: 955.7. Samples: 277784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:24:13,508][07135] Avg episode reward: [(0, '4.597')]
[2024-12-19 10:24:18,502][07135] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3428352. Throughput: 0: 975.1. Samples: 284646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-12-19 10:24:18,507][07135] Avg episode reward: [(0, '4.729')]
[2024-12-19 10:24:20,823][15938] Updated weights for policy 0, policy_version 840 (0.0020)
[2024-12-19 10:24:23,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3448832. Throughput: 0: 1007.5. Samples: 288140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-12-19 10:24:23,505][07135] Avg episode reward: [(0, '4.883')]
[2024-12-19 10:24:28,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3465216. Throughput: 0: 973.2. Samples: 292882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-12-19 10:24:28,507][07135] Avg episode reward: [(0, '4.650')]
[2024-12-19 10:24:32,414][15938] Updated weights for policy 0, policy_version 850 (0.0021)
[2024-12-19 10:24:33,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3485696. Throughput: 0: 953.2. Samples: 298916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:24:33,508][07135] Avg episode reward: [(0, '4.403')]
[2024-12-19 10:24:38,503][07135] Fps is (10 sec: 4505.2, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 3510272. Throughput: 0: 982.7. Samples: 302362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:24:38,507][07135] Avg episode reward: [(0, '4.568')]
[2024-12-19 10:24:42,055][15938] Updated weights for policy 0, policy_version 860 (0.0021)
[2024-12-19 10:24:43,503][07135] Fps is (10 sec: 3685.9, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 3522560. Throughput: 0: 986.9. Samples: 307932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:24:43,508][07135] Avg episode reward: [(0, '4.698')]
[2024-12-19 10:24:48,502][07135] Fps is (10 sec: 3277.1, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3543040. Throughput: 0: 943.3. Samples: 312894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:24:48,507][07135] Avg episode reward: [(0, '4.633')]
[2024-12-19 10:24:48,514][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000865_3543040.pth...
[2024-12-19 10:24:48,654][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth
[2024-12-19 10:24:53,502][07135] Fps is (10 sec: 3277.2, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3555328. Throughput: 0: 928.7. Samples: 315302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:24:53,504][07135] Avg episode reward: [(0, '4.491')]
[2024-12-19 10:24:55,177][15938] Updated weights for policy 0, policy_version 870 (0.0019)
[2024-12-19 10:24:58,502][07135] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3571712. Throughput: 0: 929.4. Samples: 319608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:24:58,506][07135] Avg episode reward: [(0, '4.692')]
[2024-12-19 10:25:03,502][07135] Fps is (10 sec: 2867.2, 60 sec: 3550.2, 300 sec: 3790.5). Total num frames: 3584000. Throughput: 0: 867.2. Samples: 323668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:25:03,507][07135] Avg episode reward: [(0, '4.747')]
[2024-12-19 10:25:07,593][15938] Updated weights for policy 0, policy_version 880 (0.0020)
[2024-12-19 10:25:08,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.3). Total num frames: 3608576. Throughput: 0: 864.6. Samples: 327046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:25:08,510][07135] Avg episode reward: [(0, '4.956')]
[2024-12-19 10:25:13,502][07135] Fps is (10 sec: 4505.5, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 3629056. Throughput: 0: 910.4. Samples: 333850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:25:13,504][07135] Avg episode reward: [(0, '5.239')]
[2024-12-19 10:25:13,510][15925] Saving new best policy, reward=5.239!
[2024-12-19 10:25:17,929][15938] Updated weights for policy 0, policy_version 890 (0.0032)
[2024-12-19 10:25:18,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3832.3). Total num frames: 3645440. Throughput: 0: 885.7. Samples: 338774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:25:18,510][07135] Avg episode reward: [(0, '5.385')]
[2024-12-19 10:25:18,523][15925] Saving new best policy, reward=5.385!
[2024-12-19 10:25:23,502][07135] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3832.2). Total num frames: 3661824. Throughput: 0: 861.2. Samples: 341116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:25:23,508][07135] Avg episode reward: [(0, '5.147')]
[2024-12-19 10:25:27,989][15938] Updated weights for policy 0, policy_version 900 (0.0025)
[2024-12-19 10:25:28,502][07135] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 3686400. Throughput: 0: 891.5. Samples: 348050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:25:28,504][07135] Avg episode reward: [(0, '5.007')]
[2024-12-19 10:25:33,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3706880. Throughput: 0: 914.6. Samples: 354052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:25:33,504][07135] Avg episode reward: [(0, '4.992')]
[2024-12-19 10:25:38,502][07135] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3846.1). Total num frames: 3723264. Throughput: 0: 907.4. Samples: 356134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
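Editor's note: a sketch of the save_best_metric=reward bookkeeping behind the "Saving new best policy, reward=5.239!" lines above: a separate best checkpoint is written whenever the smoothed episode reward beats the previous best, and only after save_best_after=100000 env steps (checked roughly every save_best_every_sec=5 seconds). The class below is illustrative, not Sample Factory's own code; the reward/step values in the usage example are taken from the log.

```python
class BestPolicyTracker:
    def __init__(self, save_best_after: int = 100_000):
        self.best_reward = float("-inf")
        self.save_best_after = save_best_after

    def should_save(self, avg_reward: float, env_steps: int) -> bool:
        """Return True when a new best-policy checkpoint should be written."""
        if env_steps < self.save_best_after or avg_reward <= self.best_reward:
            return False
        self.best_reward = avg_reward
        return True


tracker = BestPolicyTracker()
assert tracker.should_save(5.239, 3_629_056)        # new best -> save
assert tracker.should_save(5.385, 3_645_440)        # beats 5.239 -> save
assert not tracker.should_save(5.147, 3_661_824)    # below best -> skip
```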
[2024-12-19 10:25:38,509][07135] Avg episode reward: [(0, '5.332')]
[2024-12-19 10:25:39,482][15938] Updated weights for policy 0, policy_version 910 (0.0022)
[2024-12-19 10:25:43,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3846.1). Total num frames: 3743744. Throughput: 0: 949.9. Samples: 362352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:25:43,509][07135] Avg episode reward: [(0, '5.171')]
[2024-12-19 10:25:48,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 3764224. Throughput: 0: 1011.6. Samples: 369190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:25:48,506][07135] Avg episode reward: [(0, '4.697')]
[2024-12-19 10:25:48,620][15938] Updated weights for policy 0, policy_version 920 (0.0025)
[2024-12-19 10:25:53,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3780608. Throughput: 0: 982.5. Samples: 371260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:25:53,508][07135] Avg episode reward: [(0, '4.698')]
[2024-12-19 10:25:58,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3801088. Throughput: 0: 950.1. Samples: 376604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:25:58,504][07135] Avg episode reward: [(0, '4.555')]
[2024-12-19 10:25:59,809][15938] Updated weights for policy 0, policy_version 930 (0.0024)
[2024-12-19 10:26:03,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3825664. Throughput: 0: 998.9. Samples: 383726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:26:03,508][07135] Avg episode reward: [(0, '4.456')]
[2024-12-19 10:26:08,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3842048. Throughput: 0: 1010.6. Samples: 386592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:26:08,506][07135] Avg episode reward: [(0, '4.648')]
[2024-12-19 10:26:10,848][15938] Updated weights for policy 0, policy_version 940 (0.0040)
[2024-12-19 10:26:13,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 3858432. Throughput: 0: 953.1. Samples: 390940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:26:13,504][07135] Avg episode reward: [(0, '4.520')]
[2024-12-19 10:26:18,502][07135] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3860.0). Total num frames: 3883008. Throughput: 0: 976.1. Samples: 397978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:26:18,505][07135] Avg episode reward: [(0, '4.514')]
[2024-12-19 10:26:20,097][15938] Updated weights for policy 0, policy_version 950 (0.0034)
[2024-12-19 10:26:23,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 3903488. Throughput: 0: 1007.4. Samples: 401468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:26:23,505][07135] Avg episode reward: [(0, '4.505')]
[2024-12-19 10:26:28,507][07135] Fps is (10 sec: 3275.1, 60 sec: 3822.6, 300 sec: 3832.1). Total num frames: 3915776. Throughput: 0: 973.3. Samples: 406154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:26:28,510][07135] Avg episode reward: [(0, '4.470')]
[2024-12-19 10:26:31,639][15938] Updated weights for policy 0, policy_version 960 (0.0018)
[2024-12-19 10:26:33,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3940352. Throughput: 0: 957.1. Samples: 412260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:26:33,506][07135] Avg episode reward: [(0, '4.458')]
[2024-12-19 10:26:38,502][07135] Fps is (10 sec: 4508.2, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3960832. Throughput: 0: 991.3. Samples: 415870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:26:38,508][07135] Avg episode reward: [(0, '4.886')]
[2024-12-19 10:26:40,839][15938] Updated weights for policy 0, policy_version 970 (0.0018)
[2024-12-19 10:26:43,503][07135] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 3977216. Throughput: 0: 998.4. Samples: 421534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:26:43,509][07135] Avg episode reward: [(0, '4.938')]
[2024-12-19 10:26:48,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3997696. Throughput: 0: 951.5. Samples: 426542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:26:48,504][07135] Avg episode reward: [(0, '4.788')]
[2024-12-19 10:26:48,514][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000976_3997696.pth...
[2024-12-19 10:26:48,632][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000750_3072000.pth
[2024-12-19 10:26:51,954][15938] Updated weights for policy 0, policy_version 980 (0.0034)
[2024-12-19 10:26:53,502][07135] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 4018176. Throughput: 0: 965.2. Samples: 430028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:26:53,512][07135] Avg episode reward: [(0, '4.962')]
[2024-12-19 10:26:58,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 4038656. Throughput: 0: 1018.7. Samples: 436780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:26:58,508][07135] Avg episode reward: [(0, '4.909')]
[2024-12-19 10:27:03,339][15938] Updated weights for policy 0, policy_version 990 (0.0017)
[2024-12-19 10:27:03,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 4055040. Throughput: 0: 956.9. Samples: 441038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:27:03,507][07135] Avg episode reward: [(0, '4.653')]
[2024-12-19 10:27:08,502][07135] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 4075520. Throughput: 0: 956.0. Samples: 444488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:27:08,508][07135] Avg episode reward: [(0, '4.875')]
[2024-12-19 10:27:12,062][15938] Updated weights for policy 0, policy_version 1000 (0.0018)
[2024-12-19 10:27:13,502][07135] Fps is (10 sec: 4505.3, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 4100096. Throughput: 0: 1007.8. Samples: 451502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:27:13,505][07135] Avg episode reward: [(0, '4.983')]
[2024-12-19 10:27:18,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 4116480. Throughput: 0: 978.3. Samples: 456286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:27:18,507][07135] Avg episode reward: [(0, '4.918')]
[2024-12-19 10:27:23,503][07135] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 4132864. Throughput: 0: 955.3. Samples: 458860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:27:23,510][07135] Avg episode reward: [(0, '4.909')]
[2024-12-19 10:27:23,597][15938] Updated weights for policy 0, policy_version 1010 (0.0014)
[2024-12-19 10:27:28,502][07135] Fps is (10 sec: 4096.1, 60 sec: 4028.1, 300 sec: 3873.8). Total num frames: 4157440. Throughput: 0: 986.1. Samples: 465906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:27:28,505][07135] Avg episode reward: [(0, '4.777')]
[2024-12-19 10:27:33,293][15938] Updated weights for policy 0, policy_version 1020 (0.0029)
[2024-12-19 10:27:33,502][07135] Fps is (10 sec: 4505.9, 60 sec: 3959.4, 300 sec: 3860.0). Total num frames: 4177920. Throughput: 0: 1001.7. Samples: 471618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:27:33,508][07135] Avg episode reward: [(0, '4.593')]
[2024-12-19 10:27:38,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 4190208. Throughput: 0: 971.7. Samples: 473756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:27:38,505][07135] Avg episode reward: [(0, '4.490')]
[2024-12-19 10:27:43,502][07135] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 4214784. Throughput: 0: 960.4. Samples: 480000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:27:43,508][07135] Avg episode reward: [(0, '4.556')]
[2024-12-19 10:27:44,080][15938] Updated weights for policy 0, policy_version 1030 (0.0042)
[2024-12-19 10:27:48,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 4235264. Throughput: 0: 1015.3. Samples: 486726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:27:48,504][07135] Avg episode reward: [(0, '4.647')]
[2024-12-19 10:27:53,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 4251648. Throughput: 0: 987.2. Samples: 488914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:27:53,507][07135] Avg episode reward: [(0, '4.567')]
[2024-12-19 10:27:55,661][15938] Updated weights for policy 0, policy_version 1040 (0.0015)
[2024-12-19 10:27:58,502][07135] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 4272128. Throughput: 0: 949.2. Samples: 494216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:27:58,510][07135] Avg episode reward: [(0, '4.412')]
[2024-12-19 10:28:03,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 4292608. Throughput: 0: 1000.2. Samples: 501296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:28:03,509][07135] Avg episode reward: [(0, '4.665')]
[2024-12-19 10:28:04,369][15938] Updated weights for policy 0, policy_version 1050 (0.0020)
[2024-12-19 10:28:08,503][07135] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 3859.9). Total num frames: 4313088. Throughput: 0: 1006.9. Samples: 504170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:28:08,508][07135] Avg episode reward: [(0, '4.837')]
[2024-12-19 10:28:13,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 4329472. Throughput: 0: 945.0. Samples: 508432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:28:13,507][07135] Avg episode reward: [(0, '4.877')]
[2024-12-19 10:28:16,044][15938] Updated weights for policy 0, policy_version 1060 (0.0029)
[2024-12-19 10:28:18,502][07135] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 4349952. Throughput: 0: 973.7. Samples: 515436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:28:18,504][07135] Avg episode reward: [(0, '4.625')]
[2024-12-19 10:28:23,508][07135] Fps is (10 sec: 4502.8, 60 sec: 4027.4, 300 sec: 3873.8). Total num frames: 4374528. Throughput: 0: 1001.3. Samples: 518822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:28:23,517][07135] Avg episode reward: [(0, '4.914')]
[2024-12-19 10:28:26,483][15938] Updated weights for policy 0, policy_version 1070 (0.0028)
[2024-12-19 10:28:28,502][07135] Fps is (10 sec: 3686.1, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 4386816. Throughput: 0: 966.4. Samples: 523488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:28:28,505][07135] Avg episode reward: [(0, '4.922')]
[2024-12-19 10:28:33,502][07135] Fps is (10 sec: 3278.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 4407296. Throughput: 0: 955.0. Samples: 529700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:28:33,508][07135] Avg episode reward: [(0, '4.559')]
[2024-12-19 10:28:36,442][15938] Updated weights for policy 0, policy_version 1080 (0.0025)
[2024-12-19 10:28:38,502][07135] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 4431872. Throughput: 0: 982.5. Samples: 533128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:28:38,504][07135] Avg episode reward: [(0, '4.355')]
[2024-12-19 10:28:43,503][07135] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 4448256. Throughput: 0: 989.2. Samples: 538732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:28:43,509][07135] Avg episode reward: [(0, '4.223')]
[2024-12-19 10:28:48,107][15938] Updated weights for policy 0, policy_version 1090 (0.0013)
[2024-12-19 10:28:48,502][07135] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 4464640. Throughput: 0: 942.5. Samples: 543708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:28:48,509][07135] Avg episode reward: [(0, '4.511')]
[2024-12-19 10:28:48,525][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001090_4464640.pth...
[2024-12-19 10:28:48,650][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000865_3543040.pth
[2024-12-19 10:28:53,502][07135] Fps is (10 sec: 4096.5, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 4489216. Throughput: 0: 955.0. Samples: 547146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:28:53,506][07135] Avg episode reward: [(0, '4.728')]
[2024-12-19 10:28:56,828][15938] Updated weights for policy 0, policy_version 1100 (0.0027)
[2024-12-19 10:28:58,502][07135] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 4509696. Throughput: 0: 1009.8. Samples: 553872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:28:58,507][07135] Avg episode reward: [(0, '4.685')]
[2024-12-19 10:29:03,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 4521984. Throughput: 0: 947.9. Samples: 558090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:29:03,504][07135] Avg episode reward: [(0, '4.748')]
[2024-12-19 10:29:08,262][15938] Updated weights for policy 0, policy_version 1110 (0.0025)
[2024-12-19 10:29:08,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 4546560. Throughput: 0: 949.0. Samples: 561520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:29:08,503][07135] Avg episode reward: [(0, '4.607')]
[2024-12-19 10:29:13,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 4567040. Throughput: 0: 999.8. Samples: 568480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:29:13,506][07135] Avg episode reward: [(0, '4.524')]
[2024-12-19 10:29:18,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 4583424. Throughput: 0: 969.5. Samples: 573328. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:29:18,504][07135] Avg episode reward: [(0, '4.618')]
[2024-12-19 10:29:19,204][15938] Updated weights for policy 0, policy_version 1120 (0.0028)
[2024-12-19 10:29:23,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3860.0). Total num frames: 4603904. Throughput: 0: 949.6. Samples: 575858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-19 10:29:23,508][07135] Avg episode reward: [(0, '4.728')]
[2024-12-19 10:29:28,502][07135] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 4624384. Throughput: 0: 978.9. Samples: 582782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:29:28,512][07135] Avg episode reward: [(0, '4.640')]
[2024-12-19 10:29:29,091][15938] Updated weights for policy 0, policy_version 1130 (0.0026)
[2024-12-19 10:29:33,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 4636672. Throughput: 0: 962.2. Samples: 587008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:29:33,503][07135] Avg episode reward: [(0, '4.709')]
[2024-12-19 10:29:38,504][07135] Fps is (10 sec: 2456.9, 60 sec: 3618.0, 300 sec: 3818.3). Total num frames: 4648960. Throughput: 0: 923.2. Samples: 588692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:29:38,510][07135] Avg episode reward: [(0, '4.666')]
[2024-12-19 10:29:43,306][15938] Updated weights for policy 0, policy_version 1140 (0.0021)
[2024-12-19 10:29:43,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3818.3). Total num frames: 4669440. Throughput: 0: 881.6. Samples: 593542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:29:43,507][07135] Avg episode reward: [(0, '4.591')]
[2024-12-19 10:29:48,502][07135] Fps is (10 sec: 4097.0, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 4689920. Throughput: 0: 942.7. Samples: 600514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:29:48,504][07135] Avg episode reward: [(0, '4.567')]
[2024-12-19 10:29:53,388][15938] Updated weights for policy 0, policy_version 1150 (0.0023)
[2024-12-19 10:29:53,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 4710400. Throughput: 0: 931.8. Samples: 603450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:29:53,506][07135] Avg episode reward: [(0, '4.565')]
[2024-12-19 10:29:58,502][07135] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 4726784. Throughput: 0: 874.0. Samples: 607810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:29:58,506][07135] Avg episode reward: [(0, '4.773')]
[2024-12-19 10:30:03,503][07135] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3859.9). Total num frames: 4747264. Throughput: 0: 920.6. Samples: 614756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:30:03,505][07135] Avg episode reward: [(0, '4.762')]
[2024-12-19 10:30:03,702][15938] Updated weights for policy 0, policy_version 1160 (0.0020)
[2024-12-19 10:30:08,502][07135] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 4771840. Throughput: 0: 941.4. Samples: 618220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:30:08,504][07135] Avg episode reward: [(0, '4.648')]
[2024-12-19 10:30:13,502][07135] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 4784128. Throughput: 0: 889.9. Samples: 622828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:30:13,510][07135] Avg episode reward: [(0, '4.702')]
[2024-12-19 10:30:15,345][15938] Updated weights for policy 0, policy_version 1170 (0.0017)
[2024-12-19 10:30:18,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 4804608. Throughput: 0: 930.9. Samples: 628898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:30:18,505][07135] Avg episode reward: [(0, '4.625')]
[2024-12-19 10:30:23,502][07135] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 4829184. Throughput: 0: 971.2. Samples: 632394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:30:23,504][07135] Avg episode reward: [(0, '4.611')]
[2024-12-19 10:30:24,054][15938] Updated weights for policy 0, policy_version 1180 (0.0015)
[2024-12-19 10:30:28,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 4845568. Throughput: 0: 991.2. Samples: 638146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:30:28,506][07135] Avg episode reward: [(0, '4.614')]
[2024-12-19 10:30:33,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 4861952. Throughput: 0: 947.7. Samples: 643160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:30:33,508][07135] Avg episode reward: [(0, '4.532')]
[2024-12-19 10:30:35,585][15938] Updated weights for policy 0, policy_version 1190 (0.0021)
[2024-12-19 10:30:38,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3873.8). Total num frames: 4886528. Throughput: 0: 961.4. Samples: 646714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:30:38,508][07135] Avg episode reward: [(0, '4.661')]
[2024-12-19 10:30:43,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 4907008. Throughput: 0: 1010.7. Samples: 653292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:30:43,508][07135] Avg episode reward: [(0, '4.762')]
[2024-12-19 10:30:46,126][15938] Updated weights for policy 0, policy_version 1200 (0.0023)
[2024-12-19 10:30:48,503][07135] Fps is (10 sec: 3276.2, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 4919296. Throughput: 0: 952.6. Samples: 657624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:30:48,508][07135] Avg episode reward: [(0, '4.656')]
[2024-12-19 10:30:48,518][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001201_4919296.pth...
[2024-12-19 10:30:48,711][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000976_3997696.pth
[2024-12-19 10:30:53,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 4943872. Throughput: 0: 945.6. Samples: 660772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:30:53,506][07135] Avg episode reward: [(0, '4.770')]
[2024-12-19 10:30:56,179][15938] Updated weights for policy 0, policy_version 1210 (0.0022)
[2024-12-19 10:30:58,502][07135] Fps is (10 sec: 4506.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 4964352. Throughput: 0: 997.8. Samples: 667730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:30:58,505][07135] Avg episode reward: [(0, '4.645')]
[2024-12-19 10:31:03,502][07135] Fps is (10 sec: 3686.1, 60 sec: 3891.2, 300 sec: 3859.9). Total num frames: 4980736. Throughput: 0: 971.6. Samples: 672622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:31:03,508][07135] Avg episode reward: [(0, '4.666')]
[2024-12-19 10:31:07,700][15938] Updated weights for policy 0, policy_version 1220 (0.0025)
[2024-12-19 10:31:08,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 4997120. Throughput: 0: 946.4. Samples: 674980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-19 10:31:08,506][07135] Avg episode reward: [(0, '4.843')]
[2024-12-19 10:31:13,505][07135] Fps is (10 sec: 4094.8, 60 sec: 3959.2, 300 sec: 3859.9). Total num frames: 5021696. Throughput: 0: 973.7. Samples: 681968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-19 10:31:13,511][07135] Avg episode reward: [(0, '4.993')]
[2024-12-19 10:31:16,839][15938] Updated weights for policy 0, policy_version 1230 (0.0017)
[2024-12-19 10:31:18,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5042176. Throughput: 0: 992.6. Samples: 687826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:31:18,504][07135] Avg episode reward: [(0, '4.934')]
[2024-12-19 10:31:23,502][07135] Fps is (10 sec: 3278.0, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 5054464. Throughput: 0: 958.7. Samples: 689856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-19 10:31:23,503][07135] Avg episode reward: [(0, '4.917')]
[2024-12-19 10:31:28,112][15938] Updated weights for policy 0, policy_version 1240 (0.0023)
[2024-12-19 10:31:28,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 5079040. Throughput: 0: 952.0. Samples: 696130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:31:28,507][07135] Avg episode reward: [(0, '4.963')]
[2024-12-19 10:31:33,502][07135] Fps is (10 sec: 4505.3, 60 sec: 3959.4, 300 sec: 3860.0). Total num frames: 5099520. Throughput: 0: 1009.4. Samples: 703046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-19 10:31:33,506][07135] Avg episode reward: [(0, '4.955')]
[2024-12-19 10:31:38,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5115904. Throughput: 0: 984.0. Samples: 705050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:31:38,504][07135] Avg episode reward: [(0, '4.668')]
[2024-12-19 10:31:39,327][15938] Updated weights for policy 0, policy_version 1250 (0.0031)
[2024-12-19 10:31:43,502][07135] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5136384. Throughput: 0: 947.3. Samples: 710358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:31:43,504][07135] Avg episode reward: [(0, '4.616')]
[2024-12-19 10:31:48,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3860.0). Total num frames: 5156864. Throughput: 0: 989.7. Samples: 717156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:31:48,507][07135] Avg episode reward: [(0, '4.672')]
[2024-12-19 10:31:48,603][15938] Updated weights for policy 0, policy_version 1260 (0.0022)
[2024-12-19 10:31:53,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5173248. Throughput: 0: 1001.8. Samples: 720060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:31:53,506][07135] Avg episode reward: [(0, '4.689')]
[2024-12-19 10:31:58,502][07135] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5193728. Throughput: 0: 945.0. Samples: 724490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:31:58,511][07135] Avg episode reward: [(0, '4.609')]
[2024-12-19 10:32:00,267][15938] Updated weights for policy 0, policy_version 1270 (0.0024)
[2024-12-19 10:32:03,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 5214208. Throughput: 0: 967.8. Samples: 731376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:32:03,504][07135] Avg episode reward: [(0, '4.744')]
[2024-12-19 10:32:08,509][07135] Fps is (10 sec: 4093.0, 60 sec: 3959.0, 300 sec: 3846.0). Total num frames: 5234688. Throughput: 0: 999.4. Samples: 734838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:32:08,513][07135] Avg episode reward: [(0, '4.537')]
[2024-12-19 10:32:10,108][15938] Updated weights for policy 0, policy_version 1280 (0.0027)
[2024-12-19 10:32:13,504][07135] Fps is (10 sec: 3685.7, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 5251072. Throughput: 0: 963.5. Samples: 739488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:32:13,505][07135] Avg episode reward: [(0, '4.396')]
[2024-12-19 10:32:18,502][07135] Fps is (10 sec: 3689.2, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5271552. Throughput: 0: 943.4. Samples: 745500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-19 10:32:18,503][07135] Avg episode reward: [(0, '4.494')]
[2024-12-19 10:32:20,757][15938] Updated weights for policy 0, policy_version 1290 (0.0031)
[2024-12-19 10:32:23,502][07135] Fps is (10 sec: 4506.5, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 5296128. Throughput: 0: 976.6. Samples: 748998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-19 10:32:23,507][07135] Avg episode reward: [(0, '4.603')]
[2024-12-19 10:32:28,505][07135] Fps is (10 sec: 4094.5, 60 sec: 3891.0, 300 sec: 3846.0). Total num frames: 5312512. Throughput: 0: 982.7. Samples: 754584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:32:28,508][07135] Avg episode reward: [(0, '4.675')]
[2024-12-19 10:32:32,226][15938] Updated weights for policy 0, policy_version 1300 (0.0013)
[2024-12-19 10:32:33,502][07135] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5328896. Throughput: 0: 945.8. Samples: 759716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-19 10:32:33,507][07135] Avg episode reward: [(0, '4.647')]
[2024-12-19 10:32:38,502][07135] Fps is (10 sec: 4097.5, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5353472. Throughput: 0: 959.5. Samples: 763236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-19 10:32:38,504][07135] Avg episode reward: [(0, '4.542')]
[2024-12-19 10:32:40,953][15938] Updated weights for policy 0, policy_version 1310 (0.0028)
[2024-12-19 10:32:43,502][07135] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5369856. Throughput: 0: 1008.1. Samples: 769854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:32:43,510][07135] Avg episode reward: [(0, '4.710')] [2024-12-19 10:32:48,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5386240. Throughput: 0: 948.7. Samples: 774068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:32:48,506][07135] Avg episode reward: [(0, '4.644')] [2024-12-19 10:32:48,523][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001315_5386240.pth... [2024-12-19 10:32:48,712][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001090_4464640.pth [2024-12-19 10:32:52,673][15938] Updated weights for policy 0, policy_version 1320 (0.0029) [2024-12-19 10:32:53,502][07135] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5406720. Throughput: 0: 945.6. Samples: 777382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:32:53,507][07135] Avg episode reward: [(0, '4.515')] [2024-12-19 10:32:58,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5431296. Throughput: 0: 996.3. Samples: 784320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:32:58,507][07135] Avg episode reward: [(0, '4.630')] [2024-12-19 10:33:03,217][15938] Updated weights for policy 0, policy_version 1330 (0.0014) [2024-12-19 10:33:03,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5447680. Throughput: 0: 971.3. Samples: 789208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:33:03,506][07135] Avg episode reward: [(0, '4.667')] [2024-12-19 10:33:08,502][07135] Fps is (10 sec: 3276.7, 60 sec: 3823.4, 300 sec: 3846.1). Total num frames: 5464064. Throughput: 0: 946.4. Samples: 791586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:33:08,510][07135] Avg episode reward: [(0, '4.856')] [2024-12-19 10:33:13,183][15938] Updated weights for policy 0, policy_version 1340 (0.0020) [2024-12-19 10:33:13,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3860.0). Total num frames: 5488640. Throughput: 0: 974.4. Samples: 798430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:33:13,505][07135] Avg episode reward: [(0, '4.830')] [2024-12-19 10:33:18,502][07135] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3832.3). Total num frames: 5505024. Throughput: 0: 989.4. Samples: 804238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:33:18,504][07135] Avg episode reward: [(0, '4.824')] [2024-12-19 10:33:23,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5521408. Throughput: 0: 955.6. Samples: 806238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:33:23,505][07135] Avg episode reward: [(0, '4.826')] [2024-12-19 10:33:24,760][15938] Updated weights for policy 0, policy_version 1350 (0.0033) [2024-12-19 10:33:28,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 5545984. Throughput: 0: 949.0. Samples: 812560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:33:28,512][07135] Avg episode reward: [(0, '4.666')] [2024-12-19 10:33:33,503][07135] Fps is (10 sec: 4504.9, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 5566464. Throughput: 0: 1008.4. Samples: 819446. 
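Note: the `Saving …/checkpoint_000001315_5386240.pth` / `Removing …` pair above shows the learner rotating checkpoints: the filename encodes the policy version (zero-padded) and the env-frame count, and an older file is deleted once a new one lands. A minimal keep-last-N rotation sketch under those naming assumptions — the helper and its `keep_last=2` default (which mirrors the pattern visible in this log) are illustrative, not the framework's actual config:

```python
from pathlib import Path
import torch

def save_and_rotate(state: dict, ckpt_dir: Path, version: int, env_frames: int,
                    keep_last: int = 2) -> Path:
    """Write checkpoint_<version>_<frames>.pth and delete all but the newest keep_last."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{version:09d}_{env_frames}.pth"
    torch.save(state, path)
    # Lexicographic sort is chronological because the version field is zero-padded.
    existing = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for stale in existing[:-keep_last]:
        stale.unlink()
    return path
```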
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:33:33,510][07135] Avg episode reward: [(0, '4.782')] [2024-12-19 10:33:33,747][15938] Updated weights for policy 0, policy_version 1360 (0.0020) [2024-12-19 10:33:38,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5582848. Throughput: 0: 981.6. Samples: 821552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:33:38,504][07135] Avg episode reward: [(0, '4.573')] [2024-12-19 10:33:43,502][07135] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 5603328. Throughput: 0: 945.9. Samples: 826886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:33:43,514][07135] Avg episode reward: [(0, '4.268')] [2024-12-19 10:33:45,242][15938] Updated weights for policy 0, policy_version 1370 (0.0024) [2024-12-19 10:33:48,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 5623808. Throughput: 0: 989.6. Samples: 833740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:33:48,508][07135] Avg episode reward: [(0, '4.651')] [2024-12-19 10:33:53,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 5640192. Throughput: 0: 1000.7. Samples: 836616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:33:53,510][07135] Avg episode reward: [(0, '4.738')] [2024-12-19 10:33:56,573][15938] Updated weights for policy 0, policy_version 1380 (0.0035) [2024-12-19 10:33:58,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5656576. Throughput: 0: 946.7. Samples: 841030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:33:58,504][07135] Avg episode reward: [(0, '4.653')] [2024-12-19 10:34:03,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5681152. Throughput: 0: 975.6. Samples: 848142. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:34:03,509][07135] Avg episode reward: [(0, '4.806')] [2024-12-19 10:34:05,580][15938] Updated weights for policy 0, policy_version 1390 (0.0030) [2024-12-19 10:34:08,504][07135] Fps is (10 sec: 4504.4, 60 sec: 3959.3, 300 sec: 3846.0). Total num frames: 5701632. Throughput: 0: 1006.2. Samples: 851520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:34:08,512][07135] Avg episode reward: [(0, '4.705')] [2024-12-19 10:34:13,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 5713920. Throughput: 0: 957.7. Samples: 855656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:34:13,508][07135] Avg episode reward: [(0, '4.837')] [2024-12-19 10:34:18,502][07135] Fps is (10 sec: 2458.2, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 5726208. Throughput: 0: 880.8. Samples: 859080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:34:18,508][07135] Avg episode reward: [(0, '4.803')] [2024-12-19 10:34:19,871][15938] Updated weights for policy 0, policy_version 1400 (0.0035) [2024-12-19 10:34:23,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 5750784. Throughput: 0: 911.0. Samples: 862546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:34:23,508][07135] Avg episode reward: [(0, '4.942')] [2024-12-19 10:34:28,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5771264. Throughput: 0: 947.6. Samples: 869528. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:34:28,505][07135] Avg episode reward: [(0, '4.858')] [2024-12-19 10:34:29,317][15938] Updated weights for policy 0, policy_version 1410 (0.0015) [2024-12-19 10:34:33,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3860.0). Total num frames: 5787648. Throughput: 0: 893.3. Samples: 873938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:34:33,506][07135] Avg episode reward: [(0, '4.964')] [2024-12-19 10:34:38,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 5808128. Throughput: 0: 894.0. Samples: 876846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:34:38,509][07135] Avg episode reward: [(0, '4.859')] [2024-12-19 10:34:40,288][15938] Updated weights for policy 0, policy_version 1420 (0.0025) [2024-12-19 10:34:43,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 5828608. Throughput: 0: 947.8. Samples: 883680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:34:43,504][07135] Avg episode reward: [(0, '4.666')] [2024-12-19 10:34:48,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 5844992. Throughput: 0: 907.5. Samples: 888980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:34:48,504][07135] Avg episode reward: [(0, '4.726')] [2024-12-19 10:34:48,521][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001427_5844992.pth... [2024-12-19 10:34:48,708][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001201_4919296.pth [2024-12-19 10:34:52,007][15938] Updated weights for policy 0, policy_version 1430 (0.0042) [2024-12-19 10:34:53,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 5861376. Throughput: 0: 878.8. Samples: 891062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:34:53,509][07135] Avg episode reward: [(0, '4.904')] [2024-12-19 10:34:58,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5885952. Throughput: 0: 937.1. Samples: 897824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:34:58,504][07135] Avg episode reward: [(0, '4.671')] [2024-12-19 10:35:00,815][15938] Updated weights for policy 0, policy_version 1440 (0.0020) [2024-12-19 10:35:03,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5906432. Throughput: 0: 1002.7. Samples: 904200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:35:03,504][07135] Avg episode reward: [(0, '4.624')] [2024-12-19 10:35:08,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3846.1). Total num frames: 5918720. Throughput: 0: 971.4. Samples: 906258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:35:08,507][07135] Avg episode reward: [(0, '4.564')] [2024-12-19 10:35:12,309][15938] Updated weights for policy 0, policy_version 1450 (0.0027) [2024-12-19 10:35:13,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5943296. Throughput: 0: 947.9. Samples: 912182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:35:13,503][07135] Avg episode reward: [(0, '4.302')] [2024-12-19 10:35:18,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 5963776. Throughput: 0: 998.1. Samples: 918854. 
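Note: the `Policy #0 lag: (min/avg/max)` stats describe how stale the experience reaching the learner is — read here as the gap between the learner's current `policy_version` and the version that produced each rollout (an interpretation, hedged accordingly). A back-of-the-envelope reconstruction, assuming per-rollout versions are available; names are illustrative:

```python
def lag_stats(current_version: int, rollout_versions: list[int]) -> dict:
    """Version lag per rollout: how many learner updates behind the data is."""
    lags = [current_version - v for v in rollout_versions]
    return {
        "min": float(min(lags)),
        "avg": sum(lags) / len(lags),
        "max": float(max(lags)),
    }

# With the learner at version 1260 and rollouts collected at versions 1258-1260,
# this yields triples shaped like the log's (min: 0.0, avg: 0.5, max: 2.0).
print(lag_stats(1260, [1260, 1260, 1259, 1258]))  # {'min': 0.0, 'avg': 0.75, 'max': 2.0}
```

Small lag values like these indicate the asynchronous workers are staying close to the latest weights.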
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-19 10:35:18,507][07135] Avg episode reward: [(0, '4.376')] [2024-12-19 10:35:22,822][15938] Updated weights for policy 0, policy_version 1460 (0.0020) [2024-12-19 10:35:23,504][07135] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3846.0). Total num frames: 5980160. Throughput: 0: 987.2. Samples: 921272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:35:23,507][07135] Avg episode reward: [(0, '4.500')] [2024-12-19 10:35:28,502][07135] Fps is (10 sec: 3276.5, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 5996544. Throughput: 0: 942.8. Samples: 926108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:35:28,508][07135] Avg episode reward: [(0, '4.603')] [2024-12-19 10:35:33,050][15938] Updated weights for policy 0, policy_version 1470 (0.0017) [2024-12-19 10:35:33,502][07135] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6021120. Throughput: 0: 975.6. Samples: 932884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:35:33,505][07135] Avg episode reward: [(0, '4.738')] [2024-12-19 10:35:38,502][07135] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6041600. Throughput: 0: 1003.4. Samples: 936214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:35:38,506][07135] Avg episode reward: [(0, '4.636')] [2024-12-19 10:35:43,504][07135] Fps is (10 sec: 3275.9, 60 sec: 3754.5, 300 sec: 3846.1). Total num frames: 6053888. Throughput: 0: 947.1. Samples: 940448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:35:43,510][07135] Avg episode reward: [(0, '4.399')] [2024-12-19 10:35:44,719][15938] Updated weights for policy 0, policy_version 1480 (0.0037) [2024-12-19 10:35:48,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6078464. Throughput: 0: 949.0. Samples: 946906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:35:48,504][07135] Avg episode reward: [(0, '4.386')] [2024-12-19 10:35:53,502][07135] Fps is (10 sec: 4506.8, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 6098944. Throughput: 0: 978.9. Samples: 950308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:35:53,507][07135] Avg episode reward: [(0, '4.439')] [2024-12-19 10:35:54,029][15938] Updated weights for policy 0, policy_version 1490 (0.0024) [2024-12-19 10:35:58,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6111232. Throughput: 0: 952.6. Samples: 955048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:35:58,507][07135] Avg episode reward: [(0, '4.466')] [2024-12-19 10:36:03,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 6131712. Throughput: 0: 926.2. Samples: 960534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:36:03,508][07135] Avg episode reward: [(0, '4.493')] [2024-12-19 10:36:05,735][15938] Updated weights for policy 0, policy_version 1500 (0.0025) [2024-12-19 10:36:08,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 6152192. Throughput: 0: 946.5. Samples: 963862. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:36:08,514][07135] Avg episode reward: [(0, '4.557')] [2024-12-19 10:36:13,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6172672. Throughput: 0: 966.1. Samples: 969580. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:36:13,506][07135] Avg episode reward: [(0, '4.492')] [2024-12-19 10:36:17,746][15938] Updated weights for policy 0, policy_version 1510 (0.0020) [2024-12-19 10:36:18,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 6184960. Throughput: 0: 917.9. Samples: 974188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:36:18,507][07135] Avg episode reward: [(0, '4.407')] [2024-12-19 10:36:23,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 6209536. Throughput: 0: 922.2. Samples: 977712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:36:23,504][07135] Avg episode reward: [(0, '4.647')] [2024-12-19 10:36:26,253][15938] Updated weights for policy 0, policy_version 1520 (0.0021) [2024-12-19 10:36:28,502][07135] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 6234112. Throughput: 0: 986.3. Samples: 984828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:36:28,505][07135] Avg episode reward: [(0, '4.842')] [2024-12-19 10:36:33,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6246400. Throughput: 0: 937.7. Samples: 989104. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-19 10:36:33,504][07135] Avg episode reward: [(0, '4.667')] [2024-12-19 10:36:37,786][15938] Updated weights for policy 0, policy_version 1530 (0.0030) [2024-12-19 10:36:38,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6266880. Throughput: 0: 929.2. Samples: 992124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:36:38,504][07135] Avg episode reward: [(0, '4.615')] [2024-12-19 10:36:43,502][07135] Fps is (10 sec: 4505.5, 60 sec: 3959.6, 300 sec: 3846.1). Total num frames: 6291456. Throughput: 0: 982.1. Samples: 999242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:36:43,508][07135] Avg episode reward: [(0, '4.737')] [2024-12-19 10:36:47,945][15938] Updated weights for policy 0, policy_version 1540 (0.0021) [2024-12-19 10:36:48,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 6307840. Throughput: 0: 976.3. Samples: 1004468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:36:48,506][07135] Avg episode reward: [(0, '4.524')] [2024-12-19 10:36:48,517][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001540_6307840.pth... [2024-12-19 10:36:48,679][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001315_5386240.pth [2024-12-19 10:36:53,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 6324224. Throughput: 0: 949.3. Samples: 1006582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:36:53,507][07135] Avg episode reward: [(0, '4.445')] [2024-12-19 10:36:58,080][15938] Updated weights for policy 0, policy_version 1550 (0.0020) [2024-12-19 10:36:58,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 6348800. Throughput: 0: 973.9. Samples: 1013404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:36:58,503][07135] Avg episode reward: [(0, '4.542')] [2024-12-19 10:37:03,502][07135] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3846.2). Total num frames: 6369280. Throughput: 0: 1012.5. Samples: 1019750. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:37:03,506][07135] Avg episode reward: [(0, '4.514')] [2024-12-19 10:37:08,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 6381568. Throughput: 0: 980.7. Samples: 1021842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:37:08,508][07135] Avg episode reward: [(0, '4.552')] [2024-12-19 10:37:09,774][15938] Updated weights for policy 0, policy_version 1560 (0.0041) [2024-12-19 10:37:13,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6406144. Throughput: 0: 954.5. Samples: 1027780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:37:13,508][07135] Avg episode reward: [(0, '4.579')] [2024-12-19 10:37:18,351][15938] Updated weights for policy 0, policy_version 1570 (0.0038) [2024-12-19 10:37:18,502][07135] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3846.1). Total num frames: 6430720. Throughput: 0: 1013.4. Samples: 1034708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:37:18,504][07135] Avg episode reward: [(0, '4.551')] [2024-12-19 10:37:23,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 6443008. Throughput: 0: 999.2. Samples: 1037086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:37:23,505][07135] Avg episode reward: [(0, '4.563')] [2024-12-19 10:37:28,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 6463488. Throughput: 0: 949.1. Samples: 1041952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:37:28,505][07135] Avg episode reward: [(0, '4.418')] [2024-12-19 10:37:30,089][15938] Updated weights for policy 0, policy_version 1580 (0.0041) [2024-12-19 10:37:33,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 6483968. Throughput: 0: 988.7. Samples: 1048960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:37:33,504][07135] Avg episode reward: [(0, '4.613')] [2024-12-19 10:37:38,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 6504448. Throughput: 0: 1018.1. Samples: 1052396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:37:38,507][07135] Avg episode reward: [(0, '4.848')] [2024-12-19 10:37:40,268][15938] Updated weights for policy 0, policy_version 1590 (0.0020) [2024-12-19 10:37:43,502][07135] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 6520832. Throughput: 0: 962.3. Samples: 1056708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:37:43,507][07135] Avg episode reward: [(0, '4.589')] [2024-12-19 10:37:48,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6541312. Throughput: 0: 967.2. Samples: 1063276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:37:48,508][07135] Avg episode reward: [(0, '4.535')] [2024-12-19 10:37:50,413][15938] Updated weights for policy 0, policy_version 1600 (0.0016) [2024-12-19 10:37:53,502][07135] Fps is (10 sec: 4505.7, 60 sec: 4027.8, 300 sec: 3846.1). Total num frames: 6565888. Throughput: 0: 998.1. Samples: 1066758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:37:53,509][07135] Avg episode reward: [(0, '4.564')] [2024-12-19 10:37:58,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6582272. Throughput: 0: 982.7. Samples: 1072000. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:37:58,507][07135] Avg episode reward: [(0, '4.605')] [2024-12-19 10:38:01,866][15938] Updated weights for policy 0, policy_version 1610 (0.0018) [2024-12-19 10:38:03,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 6598656. Throughput: 0: 953.7. Samples: 1077626. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:38:03,503][07135] Avg episode reward: [(0, '4.439')] [2024-12-19 10:38:08,502][07135] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 6623232. Throughput: 0: 978.2. Samples: 1081104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:38:08,508][07135] Avg episode reward: [(0, '4.685')] [2024-12-19 10:38:10,564][15938] Updated weights for policy 0, policy_version 1620 (0.0016) [2024-12-19 10:38:13,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 6643712. Throughput: 0: 1008.7. Samples: 1087344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:38:13,506][07135] Avg episode reward: [(0, '4.895')] [2024-12-19 10:38:18,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 6656000. Throughput: 0: 954.4. Samples: 1091910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:38:18,504][07135] Avg episode reward: [(0, '4.740')] [2024-12-19 10:38:22,108][15938] Updated weights for policy 0, policy_version 1630 (0.0027) [2024-12-19 10:38:23,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 6680576. Throughput: 0: 955.9. Samples: 1095412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:38:23,509][07135] Avg episode reward: [(0, '4.579')] [2024-12-19 10:38:28,502][07135] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 6705152. Throughput: 0: 1017.7. Samples: 1102506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:38:28,504][07135] Avg episode reward: [(0, '4.554')] [2024-12-19 10:38:32,614][15938] Updated weights for policy 0, policy_version 1640 (0.0025) [2024-12-19 10:38:33,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6717440. Throughput: 0: 972.5. Samples: 1107038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:38:33,510][07135] Avg episode reward: [(0, '4.442')] [2024-12-19 10:38:38,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 6737920. Throughput: 0: 959.2. Samples: 1109920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:38:38,503][07135] Avg episode reward: [(0, '4.430')] [2024-12-19 10:38:42,199][15938] Updated weights for policy 0, policy_version 1650 (0.0014) [2024-12-19 10:38:43,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3860.0). Total num frames: 6762496. Throughput: 0: 1002.5. Samples: 1117112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:38:43,508][07135] Avg episode reward: [(0, '4.710')] [2024-12-19 10:38:48,503][07135] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 3859.9). Total num frames: 6778880. Throughput: 0: 987.8. Samples: 1122080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:38:48,505][07135] Avg episode reward: [(0, '4.741')] [2024-12-19 10:38:48,529][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001655_6778880.pth... 
[2024-12-19 10:38:48,790][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001427_5844992.pth [2024-12-19 10:38:53,503][07135] Fps is (10 sec: 2866.9, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 6791168. Throughput: 0: 948.1. Samples: 1123770. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:38:53,509][07135] Avg episode reward: [(0, '4.675')] [2024-12-19 10:38:56,514][15938] Updated weights for policy 0, policy_version 1660 (0.0019) [2024-12-19 10:38:58,502][07135] Fps is (10 sec: 2867.4, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 6807552. Throughput: 0: 903.9. Samples: 1128020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:38:58,511][07135] Avg episode reward: [(0, '4.485')] [2024-12-19 10:39:03,502][07135] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 6828032. Throughput: 0: 960.6. Samples: 1135138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:39:03,505][07135] Avg episode reward: [(0, '4.479')] [2024-12-19 10:39:05,207][15938] Updated weights for policy 0, policy_version 1670 (0.0017) [2024-12-19 10:39:08,502][07135] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 6848512. Throughput: 0: 947.5. Samples: 1138048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:39:08,504][07135] Avg episode reward: [(0, '4.441')] [2024-12-19 10:39:13,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 6864896. Throughput: 0: 887.8. Samples: 1142456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:39:13,504][07135] Avg episode reward: [(0, '4.760')] [2024-12-19 10:39:16,683][15938] Updated weights for policy 0, policy_version 1680 (0.0019) [2024-12-19 10:39:18,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6889472. Throughput: 0: 943.5. Samples: 1149494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:39:18,508][07135] Avg episode reward: [(0, '5.061')] [2024-12-19 10:39:23,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 6909952. Throughput: 0: 957.1. Samples: 1152988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:39:23,504][07135] Avg episode reward: [(0, '5.036')] [2024-12-19 10:39:27,157][15938] Updated weights for policy 0, policy_version 1690 (0.0028) [2024-12-19 10:39:28,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 6922240. Throughput: 0: 902.6. Samples: 1157728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:39:28,507][07135] Avg episode reward: [(0, '4.919')] [2024-12-19 10:39:33,502][07135] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 6946816. Throughput: 0: 931.2. Samples: 1163982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:39:33,505][07135] Avg episode reward: [(0, '4.644')] [2024-12-19 10:39:36,970][15938] Updated weights for policy 0, policy_version 1700 (0.0033) [2024-12-19 10:39:38,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 6967296. Throughput: 0: 970.1. Samples: 1167422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:39:38,508][07135] Avg episode reward: [(0, '4.577')] [2024-12-19 10:39:43,506][07135] Fps is (10 sec: 3684.8, 60 sec: 3686.1, 300 sec: 3859.9). Total num frames: 6983680. Throughput: 0: 1002.0. Samples: 1173116. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:39:43,509][07135] Avg episode reward: [(0, '4.562')] [2024-12-19 10:39:48,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3860.0). Total num frames: 7000064. Throughput: 0: 953.1. Samples: 1178028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:39:48,506][07135] Avg episode reward: [(0, '4.446')] [2024-12-19 10:39:48,738][15938] Updated weights for policy 0, policy_version 1710 (0.0014) [2024-12-19 10:39:53,502][07135] Fps is (10 sec: 4097.9, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 7024640. Throughput: 0: 959.6. Samples: 1181232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:39:53,504][07135] Avg episode reward: [(0, '4.648')] [2024-12-19 10:39:58,502][07135] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 7041024. Throughput: 0: 1005.3. Samples: 1187694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:39:58,511][07135] Avg episode reward: [(0, '4.611')] [2024-12-19 10:39:58,514][15938] Updated weights for policy 0, policy_version 1720 (0.0015) [2024-12-19 10:40:03,503][07135] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 7057408. Throughput: 0: 943.1. Samples: 1191936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:40:03,509][07135] Avg episode reward: [(0, '4.681')] [2024-12-19 10:40:08,502][07135] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7081984. Throughput: 0: 941.0. Samples: 1195334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:40:08,504][07135] Avg episode reward: [(0, '4.924')] [2024-12-19 10:40:09,229][15938] Updated weights for policy 0, policy_version 1730 (0.0025) [2024-12-19 10:40:13,502][07135] Fps is (10 sec: 4506.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7102464. Throughput: 0: 991.2. Samples: 1202332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:40:13,504][07135] Avg episode reward: [(0, '4.835')] [2024-12-19 10:40:18,505][07135] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3859.9). Total num frames: 7118848. Throughput: 0: 959.1. Samples: 1207146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-19 10:40:18,512][07135] Avg episode reward: [(0, '4.766')] [2024-12-19 10:40:20,897][15938] Updated weights for policy 0, policy_version 1740 (0.0023) [2024-12-19 10:40:23,505][07135] Fps is (10 sec: 3275.6, 60 sec: 3754.4, 300 sec: 3859.9). Total num frames: 7135232. Throughput: 0: 931.1. Samples: 1209326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:40:23,512][07135] Avg episode reward: [(0, '4.622')] [2024-12-19 10:40:28,502][07135] Fps is (10 sec: 4097.5, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7159808. Throughput: 0: 959.8. Samples: 1216302. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:40:28,506][07135] Avg episode reward: [(0, '4.626')] [2024-12-19 10:40:29,717][15938] Updated weights for policy 0, policy_version 1750 (0.0020) [2024-12-19 10:40:33,502][07135] Fps is (10 sec: 4507.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7180288. Throughput: 0: 979.3. Samples: 1222098. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-19 10:40:33,506][07135] Avg episode reward: [(0, '4.728')] [2024-12-19 10:40:38,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 7192576. Throughput: 0: 954.8. Samples: 1224198. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:40:38,504][07135] Avg episode reward: [(0, '4.724')] [2024-12-19 10:40:41,516][15938] Updated weights for policy 0, policy_version 1760 (0.0021) [2024-12-19 10:40:43,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3860.0). Total num frames: 7217152. Throughput: 0: 950.6. Samples: 1230470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-19 10:40:43,504][07135] Avg episode reward: [(0, '4.762')] [2024-12-19 10:40:48,502][07135] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 7237632. Throughput: 0: 1000.6. Samples: 1236960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:40:48,506][07135] Avg episode reward: [(0, '4.707')] [2024-12-19 10:40:48,525][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001767_7237632.pth... [2024-12-19 10:40:48,733][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001540_6307840.pth [2024-12-19 10:40:52,447][15938] Updated weights for policy 0, policy_version 1770 (0.0023) [2024-12-19 10:40:53,502][07135] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 7249920. Throughput: 0: 968.7. Samples: 1238926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:40:53,511][07135] Avg episode reward: [(0, '4.631')] [2024-12-19 10:40:58,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 7270400. Throughput: 0: 925.3. Samples: 1243972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:40:58,505][07135] Avg episode reward: [(0, '4.846')] [2024-12-19 10:41:02,833][15938] Updated weights for policy 0, policy_version 1780 (0.0014) [2024-12-19 10:41:03,502][07135] Fps is (10 sec: 4096.1, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 7290880. Throughput: 0: 965.1. Samples: 1250574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:41:03,503][07135] Avg episode reward: [(0, '5.026')] [2024-12-19 10:41:08,502][07135] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 7307264. Throughput: 0: 978.1. Samples: 1253338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:41:08,506][07135] Avg episode reward: [(0, '4.908')] [2024-12-19 10:41:13,507][07135] Fps is (10 sec: 3275.0, 60 sec: 3686.1, 300 sec: 3859.9). Total num frames: 7323648. Throughput: 0: 914.8. Samples: 1257472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:41:13,513][07135] Avg episode reward: [(0, '4.830')] [2024-12-19 10:41:14,971][15938] Updated weights for policy 0, policy_version 1790 (0.0028) [2024-12-19 10:41:18,502][07135] Fps is (10 sec: 3686.6, 60 sec: 3754.9, 300 sec: 3846.1). Total num frames: 7344128. Throughput: 0: 932.0. Samples: 1264036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:41:18,507][07135] Avg episode reward: [(0, '4.980')] [2024-12-19 10:41:23,503][07135] Fps is (10 sec: 4097.5, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 7364608. Throughput: 0: 954.2. Samples: 1267138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:41:23,506][07135] Avg episode reward: [(0, '5.035')] [2024-12-19 10:41:25,301][15938] Updated weights for policy 0, policy_version 1800 (0.0030) [2024-12-19 10:41:28,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 7380992. Throughput: 0: 916.6. Samples: 1271716. 
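Note: of the three FPS columns, the 300-second trailing average is the most stable, so it is the sensible one for projections. At the last entry above the run sits at 7,380,992 frames with a 300-second average of 3,846.1 FPS, so reaching, say, a 10M-frame budget (an assumed target, for illustration) would take roughly (10,000,000 − 7,380,992) / 3,846.1 ≈ 681 s, about 11 minutes. The same arithmetic as a one-liner:

```python
def eta_seconds(target_frames: int, current_frames: int, fps_300s: float) -> float:
    """Seconds left to reach target_frames at the trailing 300 s throughput."""
    return max(0, target_frames - current_frames) / fps_300s

print(eta_seconds(10_000_000, 7_380_992, 3846.1))  # ~681 s, i.e. ~11.3 min
```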
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:41:28,508][07135] Avg episode reward: [(0, '4.938')] [2024-12-19 10:41:33,502][07135] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 7401472. Throughput: 0: 901.6. Samples: 1277532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:41:33,509][07135] Avg episode reward: [(0, '4.708')] [2024-12-19 10:41:36,128][15938] Updated weights for policy 0, policy_version 1810 (0.0013) [2024-12-19 10:41:38,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 7421952. Throughput: 0: 933.2. Samples: 1280920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:41:38,503][07135] Avg episode reward: [(0, '4.635')] [2024-12-19 10:41:43,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 7438336. Throughput: 0: 946.8. Samples: 1286580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:41:43,507][07135] Avg episode reward: [(0, '4.728')] [2024-12-19 10:41:47,811][15938] Updated weights for policy 0, policy_version 1820 (0.0024) [2024-12-19 10:41:48,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 7454720. Throughput: 0: 911.2. Samples: 1291580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:41:48,504][07135] Avg episode reward: [(0, '4.879')] [2024-12-19 10:41:53,502][07135] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 7479296. Throughput: 0: 923.8. Samples: 1294910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:41:53,504][07135] Avg episode reward: [(0, '4.921')] [2024-12-19 10:41:57,063][15938] Updated weights for policy 0, policy_version 1830 (0.0015) [2024-12-19 10:41:58,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 7499776. Throughput: 0: 975.4. Samples: 1301362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:41:58,505][07135] Avg episode reward: [(0, '4.785')] [2024-12-19 10:42:03,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 7512064. Throughput: 0: 923.3. Samples: 1305584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:42:03,509][07135] Avg episode reward: [(0, '4.683')] [2024-12-19 10:42:08,415][15938] Updated weights for policy 0, policy_version 1840 (0.0021) [2024-12-19 10:42:08,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 7536640. Throughput: 0: 931.4. Samples: 1309050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:42:08,503][07135] Avg episode reward: [(0, '4.599')] [2024-12-19 10:42:13,504][07135] Fps is (10 sec: 4504.4, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 7557120. Throughput: 0: 983.5. Samples: 1315974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:42:13,507][07135] Avg episode reward: [(0, '4.816')] [2024-12-19 10:42:18,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 7573504. Throughput: 0: 956.8. Samples: 1320586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:42:18,508][07135] Avg episode reward: [(0, '4.813')] [2024-12-19 10:42:19,848][15938] Updated weights for policy 0, policy_version 1850 (0.0018) [2024-12-19 10:42:23,502][07135] Fps is (10 sec: 3277.6, 60 sec: 3754.8, 300 sec: 3818.3). Total num frames: 7589888. Throughput: 0: 939.5. Samples: 1323196. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:42:23,510][07135] Avg episode reward: [(0, '4.691')] [2024-12-19 10:42:28,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 7614464. Throughput: 0: 965.2. Samples: 1330016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:42:28,505][07135] Avg episode reward: [(0, '4.682')] [2024-12-19 10:42:28,926][15938] Updated weights for policy 0, policy_version 1860 (0.0031) [2024-12-19 10:42:33,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 7630848. Throughput: 0: 972.5. Samples: 1335342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:42:33,508][07135] Avg episode reward: [(0, '4.660')] [2024-12-19 10:42:38,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 7647232. Throughput: 0: 942.7. Samples: 1337330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:42:38,510][07135] Avg episode reward: [(0, '4.630')] [2024-12-19 10:42:41,044][15938] Updated weights for policy 0, policy_version 1870 (0.0035) [2024-12-19 10:42:43,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 7667712. Throughput: 0: 940.9. Samples: 1343704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:42:43,508][07135] Avg episode reward: [(0, '4.525')] [2024-12-19 10:42:48,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 7688192. Throughput: 0: 984.1. Samples: 1349870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:42:48,508][07135] Avg episode reward: [(0, '4.690')] [2024-12-19 10:42:48,521][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001877_7688192.pth... [2024-12-19 10:42:48,718][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001655_6778880.pth [2024-12-19 10:42:52,144][15938] Updated weights for policy 0, policy_version 1880 (0.0022) [2024-12-19 10:42:53,505][07135] Fps is (10 sec: 3275.6, 60 sec: 3686.2, 300 sec: 3790.5). Total num frames: 7700480. Throughput: 0: 949.3. Samples: 1351772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:42:53,510][07135] Avg episode reward: [(0, '4.846')] [2024-12-19 10:42:58,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 7725056. Throughput: 0: 920.2. Samples: 1357382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:42:58,509][07135] Avg episode reward: [(0, '4.724')] [2024-12-19 10:43:01,933][15938] Updated weights for policy 0, policy_version 1890 (0.0018) [2024-12-19 10:43:03,502][07135] Fps is (10 sec: 4507.3, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 7745536. Throughput: 0: 969.5. Samples: 1364214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:43:03,504][07135] Avg episode reward: [(0, '4.650')] [2024-12-19 10:43:08,505][07135] Fps is (10 sec: 3685.1, 60 sec: 3754.4, 300 sec: 3790.5). Total num frames: 7761920. Throughput: 0: 966.3. Samples: 1366682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:43:08,508][07135] Avg episode reward: [(0, '4.632')] [2024-12-19 10:43:13,315][15938] Updated weights for policy 0, policy_version 1900 (0.0022) [2024-12-19 10:43:13,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3818.3). Total num frames: 7782400. Throughput: 0: 926.7. Samples: 1371716. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:43:13,509][07135] Avg episode reward: [(0, '4.711')] [2024-12-19 10:43:18,502][07135] Fps is (10 sec: 4507.3, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 7806976. Throughput: 0: 969.7. Samples: 1378980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:43:18,504][07135] Avg episode reward: [(0, '4.904')] [2024-12-19 10:43:22,512][15938] Updated weights for policy 0, policy_version 1910 (0.0020) [2024-12-19 10:43:23,505][07135] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3790.5). Total num frames: 7823360. Throughput: 0: 996.3. Samples: 1382166. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:43:23,507][07135] Avg episode reward: [(0, '4.870')] [2024-12-19 10:43:28,502][07135] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 7835648. Throughput: 0: 945.3. Samples: 1386242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:43:28,506][07135] Avg episode reward: [(0, '4.610')] [2024-12-19 10:43:33,502][07135] Fps is (10 sec: 2867.6, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 7852032. Throughput: 0: 904.6. Samples: 1390576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:43:33,508][07135] Avg episode reward: [(0, '4.553')] [2024-12-19 10:43:35,723][15938] Updated weights for policy 0, policy_version 1920 (0.0027) [2024-12-19 10:43:38,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 7876608. Throughput: 0: 939.8. Samples: 1394060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:43:38,508][07135] Avg episode reward: [(0, '4.495')] [2024-12-19 10:43:43,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 7888896. Throughput: 0: 932.3. Samples: 1399336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:43:43,508][07135] Avg episode reward: [(0, '4.428')] [2024-12-19 10:43:47,057][15938] Updated weights for policy 0, policy_version 1930 (0.0020) [2024-12-19 10:43:48,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 7909376. Throughput: 0: 910.7. Samples: 1405194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:43:48,508][07135] Avg episode reward: [(0, '4.599')] [2024-12-19 10:43:53,502][07135] Fps is (10 sec: 4505.3, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 7933952. Throughput: 0: 933.7. Samples: 1408696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:43:53,507][07135] Avg episode reward: [(0, '4.625')] [2024-12-19 10:43:55,710][15938] Updated weights for policy 0, policy_version 1940 (0.0015) [2024-12-19 10:43:58,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 7954432. Throughput: 0: 959.8. Samples: 1414906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:43:58,512][07135] Avg episode reward: [(0, '4.639')] [2024-12-19 10:44:03,502][07135] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 7970816. Throughput: 0: 910.8. Samples: 1419968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:44:03,511][07135] Avg episode reward: [(0, '4.738')] [2024-12-19 10:44:06,703][15938] Updated weights for policy 0, policy_version 1950 (0.0015) [2024-12-19 10:44:08,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3818.3). Total num frames: 7991296. Throughput: 0: 919.8. Samples: 1423556. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:44:08,508][07135] Avg episode reward: [(0, '4.868')] [2024-12-19 10:44:13,502][07135] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 8015872. Throughput: 0: 982.9. Samples: 1430474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:44:13,507][07135] Avg episode reward: [(0, '4.506')] [2024-12-19 10:44:17,421][15938] Updated weights for policy 0, policy_version 1960 (0.0024) [2024-12-19 10:44:18,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 8028160. Throughput: 0: 984.3. Samples: 1434870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:44:18,504][07135] Avg episode reward: [(0, '4.510')] [2024-12-19 10:44:23,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 8052736. Throughput: 0: 978.8. Samples: 1438104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:44:23,504][07135] Avg episode reward: [(0, '4.340')] [2024-12-19 10:44:26,693][15938] Updated weights for policy 0, policy_version 1970 (0.0023) [2024-12-19 10:44:28,502][07135] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 8077312. Throughput: 0: 1021.1. Samples: 1445286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:44:28,508][07135] Avg episode reward: [(0, '4.378')] [2024-12-19 10:44:33,506][07135] Fps is (10 sec: 3684.7, 60 sec: 3959.2, 300 sec: 3804.4). Total num frames: 8089600. Throughput: 0: 999.0. Samples: 1450152. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:44:33,509][07135] Avg episode reward: [(0, '4.582')] [2024-12-19 10:44:37,949][15938] Updated weights for policy 0, policy_version 1980 (0.0013) [2024-12-19 10:44:38,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3818.4). Total num frames: 8110080. Throughput: 0: 979.9. Samples: 1452792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:44:38,511][07135] Avg episode reward: [(0, '4.723')] [2024-12-19 10:44:43,502][07135] Fps is (10 sec: 4507.7, 60 sec: 4096.0, 300 sec: 3846.1). Total num frames: 8134656. Throughput: 0: 1002.1. Samples: 1460002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:44:43,510][07135] Avg episode reward: [(0, '4.524')] [2024-12-19 10:44:47,460][15938] Updated weights for policy 0, policy_version 1990 (0.0027) [2024-12-19 10:44:48,504][07135] Fps is (10 sec: 4094.9, 60 sec: 4027.6, 300 sec: 3818.3). Total num frames: 8151040. Throughput: 0: 1014.4. Samples: 1465618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:44:48,507][07135] Avg episode reward: [(0, '4.380')] [2024-12-19 10:44:48,515][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001990_8151040.pth... [2024-12-19 10:44:48,674][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001767_7237632.pth [2024-12-19 10:44:53,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 8167424. Throughput: 0: 982.2. Samples: 1467754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:44:53,510][07135] Avg episode reward: [(0, '4.542')] [2024-12-19 10:44:58,008][15938] Updated weights for policy 0, policy_version 2000 (0.0022) [2024-12-19 10:44:58,502][07135] Fps is (10 sec: 4096.9, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 8192000. Throughput: 0: 977.5. Samples: 1474464. 
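Note: checkpoints like `checkpoint_000001990_8151040.pth` above are ordinary torch-serialized files, so their contents can be inspected offline without relaunching the trainer. The sketch below only assumes `torch.load` returns the saved object and makes no claim about which keys Sample Factory actually stores:

```python
import torch

path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001990_8151040.pth"
# On newer torch versions, pass weights_only=False if the safe default rejects the pickle.
ckpt = torch.load(path, map_location="cpu")  # keep it off the GPU for inspection

if isinstance(ckpt, dict):
    for key, value in ckpt.items():
        # Tensors get a shape summary; everything else prints its type.
        desc = tuple(value.shape) if torch.is_tensor(value) else type(value).__name__
        print(f"{key}: {desc}")
else:
    print(type(ckpt))
```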
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:44:58,504][07135] Avg episode reward: [(0, '4.635')] [2024-12-19 10:45:03,504][07135] Fps is (10 sec: 4504.4, 60 sec: 4027.6, 300 sec: 3832.2). Total num frames: 8212480. Throughput: 0: 1026.7. Samples: 1481072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:45:03,510][07135] Avg episode reward: [(0, '4.654')] [2024-12-19 10:45:08,502][07135] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 8228864. Throughput: 0: 1001.5. Samples: 1483172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:45:08,511][07135] Avg episode reward: [(0, '4.692')] [2024-12-19 10:45:09,270][15938] Updated weights for policy 0, policy_version 2010 (0.0015) [2024-12-19 10:45:13,502][07135] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 8253440. Throughput: 0: 977.7. Samples: 1489282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:45:13,503][07135] Avg episode reward: [(0, '4.632')] [2024-12-19 10:45:17,761][15938] Updated weights for policy 0, policy_version 2020 (0.0024) [2024-12-19 10:45:18,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3860.0). Total num frames: 8273920. Throughput: 0: 1028.7. Samples: 1496438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:45:18,504][07135] Avg episode reward: [(0, '4.677')] [2024-12-19 10:45:23,502][07135] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 8290304. Throughput: 0: 1020.6. Samples: 1498720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:45:23,507][07135] Avg episode reward: [(0, '4.750')] [2024-12-19 10:45:28,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8310784. Throughput: 0: 977.0. Samples: 1503966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:45:28,508][07135] Avg episode reward: [(0, '4.576')] [2024-12-19 10:45:29,105][15938] Updated weights for policy 0, policy_version 2030 (0.0024) [2024-12-19 10:45:33,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4096.3, 300 sec: 3873.8). Total num frames: 8335360. Throughput: 0: 1010.1. Samples: 1511072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:45:33,504][07135] Avg episode reward: [(0, '4.478')] [2024-12-19 10:45:38,502][07135] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 8351744. Throughput: 0: 1034.8. Samples: 1514322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:45:38,507][07135] Avg episode reward: [(0, '4.623')] [2024-12-19 10:45:38,876][15938] Updated weights for policy 0, policy_version 2040 (0.0023) [2024-12-19 10:45:43,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 8368128. Throughput: 0: 982.9. Samples: 1518694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:45:43,505][07135] Avg episode reward: [(0, '4.685')] [2024-12-19 10:45:48,502][07135] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3873.8). Total num frames: 8392704. Throughput: 0: 994.1. Samples: 1525806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:45:48,509][07135] Avg episode reward: [(0, '4.738')] [2024-12-19 10:45:48,905][15938] Updated weights for policy 0, policy_version 2050 (0.0024) [2024-12-19 10:45:53,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3873.8). Total num frames: 8413184. Throughput: 0: 1025.1. Samples: 1529302. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:45:53,504][07135] Avg episode reward: [(0, '4.629')] [2024-12-19 10:45:58,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 8429568. Throughput: 0: 993.7. Samples: 1533998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:45:58,506][07135] Avg episode reward: [(0, '4.468')] [2024-12-19 10:46:00,357][15938] Updated weights for policy 0, policy_version 2060 (0.0023) [2024-12-19 10:46:03,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3873.9). Total num frames: 8450048. Throughput: 0: 976.3. Samples: 1540372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:46:03,510][07135] Avg episode reward: [(0, '4.633')] [2024-12-19 10:46:08,502][07135] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3901.7). Total num frames: 8474624. Throughput: 0: 1004.0. Samples: 1543900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:46:08,510][07135] Avg episode reward: [(0, '5.012')] [2024-12-19 10:46:08,881][15938] Updated weights for policy 0, policy_version 2070 (0.0014) [2024-12-19 10:46:13,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 8491008. Throughput: 0: 1015.4. Samples: 1549660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:46:13,506][07135] Avg episode reward: [(0, '5.131')] [2024-12-19 10:46:18,502][07135] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 8511488. Throughput: 0: 979.7. Samples: 1555160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:46:18,505][07135] Avg episode reward: [(0, '4.950')] [2024-12-19 10:46:19,991][15938] Updated weights for policy 0, policy_version 2080 (0.0019) [2024-12-19 10:46:23,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 8536064. Throughput: 0: 988.6. Samples: 1558810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:46:23,504][07135] Avg episode reward: [(0, '4.990')] [2024-12-19 10:46:28,506][07135] Fps is (10 sec: 4094.2, 60 sec: 4027.4, 300 sec: 3901.6). Total num frames: 8552448. Throughput: 0: 1033.3. Samples: 1565196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:46:28,509][07135] Avg episode reward: [(0, '4.823')] [2024-12-19 10:46:30,382][15938] Updated weights for policy 0, policy_version 2090 (0.0026) [2024-12-19 10:46:33,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 8568832. Throughput: 0: 981.2. Samples: 1569958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:46:33,504][07135] Avg episode reward: [(0, '4.575')] [2024-12-19 10:46:38,502][07135] Fps is (10 sec: 4097.7, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 8593408. Throughput: 0: 982.6. Samples: 1573518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:46:38,504][07135] Avg episode reward: [(0, '4.562')] [2024-12-19 10:46:39,800][15938] Updated weights for policy 0, policy_version 2100 (0.0020) [2024-12-19 10:46:43,505][07135] Fps is (10 sec: 4504.0, 60 sec: 4095.8, 300 sec: 3929.3). Total num frames: 8613888. Throughput: 0: 1036.3. Samples: 1580636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:46:43,508][07135] Avg episode reward: [(0, '4.534')] [2024-12-19 10:46:48,503][07135] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 8630272. Throughput: 0: 994.4. Samples: 1585120. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:46:48,506][07135] Avg episode reward: [(0, '4.705')] [2024-12-19 10:46:48,518][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002107_8630272.pth... [2024-12-19 10:46:48,684][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001877_7688192.pth [2024-12-19 10:46:51,206][15938] Updated weights for policy 0, policy_version 2110 (0.0022) [2024-12-19 10:46:53,502][07135] Fps is (10 sec: 3687.7, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 8650752. Throughput: 0: 984.3. Samples: 1588192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:46:53,506][07135] Avg episode reward: [(0, '4.713')] [2024-12-19 10:46:58,502][07135] Fps is (10 sec: 4506.3, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 8675328. Throughput: 0: 1011.8. Samples: 1595192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:46:58,504][07135] Avg episode reward: [(0, '4.535')] [2024-12-19 10:47:00,162][15938] Updated weights for policy 0, policy_version 2120 (0.0034) [2024-12-19 10:47:03,502][07135] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 8691712. Throughput: 0: 1006.3. Samples: 1600442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:47:03,508][07135] Avg episode reward: [(0, '4.468')] [2024-12-19 10:47:08,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 8712192. Throughput: 0: 979.1. Samples: 1602868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:47:08,510][07135] Avg episode reward: [(0, '4.656')] [2024-12-19 10:47:10,946][15938] Updated weights for policy 0, policy_version 2130 (0.0021) [2024-12-19 10:47:13,502][07135] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 8736768. Throughput: 0: 997.7. Samples: 1610088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:47:13,509][07135] Avg episode reward: [(0, '4.555')] [2024-12-19 10:47:18,502][07135] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 8753152. Throughput: 0: 1025.6. Samples: 1616108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:47:18,504][07135] Avg episode reward: [(0, '4.509')] [2024-12-19 10:47:21,728][15938] Updated weights for policy 0, policy_version 2140 (0.0022) [2024-12-19 10:47:23,502][07135] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 8769536. Throughput: 0: 995.5. Samples: 1618316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:47:23,504][07135] Avg episode reward: [(0, '4.780')] [2024-12-19 10:47:28,502][07135] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 3943.3). Total num frames: 8794112. Throughput: 0: 980.9. Samples: 1624772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:47:28,504][07135] Avg episode reward: [(0, '4.896')] [2024-12-19 10:47:30,678][15938] Updated weights for policy 0, policy_version 2150 (0.0027) [2024-12-19 10:47:33,502][07135] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 3957.1). Total num frames: 8814592. Throughput: 0: 1031.4. Samples: 1631532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:47:33,505][07135] Avg episode reward: [(0, '4.442')] [2024-12-19 10:47:38,502][07135] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 8830976. Throughput: 0: 1011.6. Samples: 1633716. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:47:38,509][07135] Avg episode reward: [(0, '4.580')] [2024-12-19 10:47:41,816][15938] Updated weights for policy 0, policy_version 2160 (0.0015) [2024-12-19 10:47:43,502][07135] Fps is (10 sec: 4096.2, 60 sec: 4028.0, 300 sec: 3957.2). Total num frames: 8855552. Throughput: 0: 988.2. Samples: 1639660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:47:43,508][07135] Avg episode reward: [(0, '4.725')] [2024-12-19 10:47:48,502][07135] Fps is (10 sec: 4505.7, 60 sec: 4096.1, 300 sec: 3985.0). Total num frames: 8876032. Throughput: 0: 1029.9. Samples: 1646786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:47:48,509][07135] Avg episode reward: [(0, '4.612')] [2024-12-19 10:47:51,619][15938] Updated weights for policy 0, policy_version 2170 (0.0016) [2024-12-19 10:47:53,502][07135] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 8892416. Throughput: 0: 1029.1. Samples: 1649178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:47:53,504][07135] Avg episode reward: [(0, '4.595')] [2024-12-19 10:47:58,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 8912896. Throughput: 0: 982.1. Samples: 1654282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:47:58,511][07135] Avg episode reward: [(0, '4.694')] [2024-12-19 10:48:01,861][15938] Updated weights for policy 0, policy_version 2180 (0.0019) [2024-12-19 10:48:03,503][07135] Fps is (10 sec: 4095.3, 60 sec: 4027.6, 300 sec: 3971.1). Total num frames: 8933376. Throughput: 0: 990.0. Samples: 1660658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:48:03,511][07135] Avg episode reward: [(0, '4.707')] [2024-12-19 10:48:08,504][07135] Fps is (10 sec: 3275.9, 60 sec: 3891.0, 300 sec: 3943.2). Total num frames: 8945664. Throughput: 0: 987.1. Samples: 1662740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:48:08,507][07135] Avg episode reward: [(0, '4.570')] [2024-12-19 10:48:13,502][07135] Fps is (10 sec: 2458.0, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 8957952. Throughput: 0: 929.2. Samples: 1666588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:48:13,503][07135] Avg episode reward: [(0, '4.402')] [2024-12-19 10:48:15,795][15938] Updated weights for policy 0, policy_version 2190 (0.0030) [2024-12-19 10:48:18,502][07135] Fps is (10 sec: 3687.4, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 8982528. Throughput: 0: 917.4. Samples: 1672816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:48:18,505][07135] Avg episode reward: [(0, '4.447')] [2024-12-19 10:48:23,502][07135] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 9007104. Throughput: 0: 949.9. Samples: 1676460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:48:23,504][07135] Avg episode reward: [(0, '4.499')] [2024-12-19 10:48:24,589][15938] Updated weights for policy 0, policy_version 2200 (0.0015) [2024-12-19 10:48:28,504][07135] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3957.1). Total num frames: 9019392. Throughput: 0: 941.0. Samples: 1682006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:48:28,509][07135] Avg episode reward: [(0, '4.643')] [2024-12-19 10:48:33,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 9039872. Throughput: 0: 906.0. Samples: 1687554. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:48:33,509][07135] Avg episode reward: [(0, '4.772')] [2024-12-19 10:48:35,588][15938] Updated weights for policy 0, policy_version 2210 (0.0022) [2024-12-19 10:48:38,502][07135] Fps is (10 sec: 4506.8, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 9064448. Throughput: 0: 932.5. Samples: 1691140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:48:38,509][07135] Avg episode reward: [(0, '4.819')] [2024-12-19 10:48:43,503][07135] Fps is (10 sec: 4505.1, 60 sec: 3822.9, 300 sec: 3984.9). Total num frames: 9084928. Throughput: 0: 961.0. Samples: 1697530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:48:43,511][07135] Avg episode reward: [(0, '4.690')] [2024-12-19 10:48:46,148][15938] Updated weights for policy 0, policy_version 2220 (0.0019) [2024-12-19 10:48:48,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 9101312. Throughput: 0: 927.6. Samples: 1702400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:48:48,507][07135] Avg episode reward: [(0, '4.623')] [2024-12-19 10:48:48,522][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002222_9101312.pth... [2024-12-19 10:48:48,654][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001990_8151040.pth [2024-12-19 10:48:53,502][07135] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 9121792. Throughput: 0: 958.3. Samples: 1705860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:48:53,504][07135] Avg episode reward: [(0, '4.726')] [2024-12-19 10:48:55,606][15938] Updated weights for policy 0, policy_version 2230 (0.0015) [2024-12-19 10:48:58,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 9142272. Throughput: 0: 1027.3. Samples: 1712816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:48:58,507][07135] Avg episode reward: [(0, '4.757')] [2024-12-19 10:49:03,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3957.2). Total num frames: 9158656. Throughput: 0: 986.5. Samples: 1717210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:49:03,507][07135] Avg episode reward: [(0, '4.642')] [2024-12-19 10:49:06,869][15938] Updated weights for policy 0, policy_version 2240 (0.0017) [2024-12-19 10:49:08,502][07135] Fps is (10 sec: 3686.3, 60 sec: 3891.4, 300 sec: 3943.3). Total num frames: 9179136. Throughput: 0: 977.3. Samples: 1720440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:49:08,510][07135] Avg episode reward: [(0, '4.672')] [2024-12-19 10:49:13,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 9203712. Throughput: 0: 1007.7. Samples: 1727352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:49:13,503][07135] Avg episode reward: [(0, '4.449')] [2024-12-19 10:49:16,683][15938] Updated weights for policy 0, policy_version 2250 (0.0024) [2024-12-19 10:49:18,504][07135] Fps is (10 sec: 4095.2, 60 sec: 3959.3, 300 sec: 3957.1). Total num frames: 9220096. Throughput: 0: 995.1. Samples: 1732336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:49:18,511][07135] Avg episode reward: [(0, '4.604')] [2024-12-19 10:49:23,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 9236480. Throughput: 0: 967.7. Samples: 1734688. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:49:23,504][07135] Avg episode reward: [(0, '4.674')] [2024-12-19 10:49:27,202][15938] Updated weights for policy 0, policy_version 2260 (0.0014) [2024-12-19 10:49:28,502][07135] Fps is (10 sec: 4096.9, 60 sec: 4027.9, 300 sec: 3971.1). Total num frames: 9261056. Throughput: 0: 980.4. Samples: 1741646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:49:28,505][07135] Avg episode reward: [(0, '4.900')] [2024-12-19 10:49:33,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 9281536. Throughput: 0: 1004.3. Samples: 1747594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:49:33,505][07135] Avg episode reward: [(0, '4.864')] [2024-12-19 10:49:38,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 9293824. Throughput: 0: 973.2. Samples: 1749654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:49:38,504][07135] Avg episode reward: [(0, '4.946')] [2024-12-19 10:49:38,592][15938] Updated weights for policy 0, policy_version 2270 (0.0021) [2024-12-19 10:49:43,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 9318400. Throughput: 0: 960.6. Samples: 1756042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:49:43,504][07135] Avg episode reward: [(0, '4.789')] [2024-12-19 10:49:47,447][15938] Updated weights for policy 0, policy_version 2280 (0.0017) [2024-12-19 10:49:48,504][07135] Fps is (10 sec: 4504.4, 60 sec: 3959.3, 300 sec: 3971.0). Total num frames: 9338880. Throughput: 0: 1006.5. Samples: 1762506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:49:48,508][07135] Avg episode reward: [(0, '4.785')] [2024-12-19 10:49:53,505][07135] Fps is (10 sec: 3685.1, 60 sec: 3891.0, 300 sec: 3943.2). Total num frames: 9355264. Throughput: 0: 979.5. Samples: 1764520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:49:53,510][07135] Avg episode reward: [(0, '4.804')] [2024-12-19 10:49:58,502][07135] Fps is (10 sec: 3687.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 9375744. Throughput: 0: 948.4. Samples: 1770030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:49:58,510][07135] Avg episode reward: [(0, '4.695')] [2024-12-19 10:49:59,162][15938] Updated weights for policy 0, policy_version 2290 (0.0029) [2024-12-19 10:50:03,502][07135] Fps is (10 sec: 4097.4, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 9396224. Throughput: 0: 990.7. Samples: 1776916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:50:03,504][07135] Avg episode reward: [(0, '4.499')] [2024-12-19 10:50:08,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 9412608. Throughput: 0: 990.6. Samples: 1779266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-19 10:50:08,503][07135] Avg episode reward: [(0, '4.589')] [2024-12-19 10:50:10,673][15938] Updated weights for policy 0, policy_version 2300 (0.0036) [2024-12-19 10:50:13,502][07135] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 9433088. Throughput: 0: 943.8. Samples: 1784116. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-19 10:50:13,503][07135] Avg episode reward: [(0, '4.576')] [2024-12-19 10:50:18,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3943.3). Total num frames: 9453568. Throughput: 0: 968.8. Samples: 1791188. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:50:18,511][07135] Avg episode reward: [(0, '4.588')] [2024-12-19 10:50:19,518][15938] Updated weights for policy 0, policy_version 2310 (0.0021) [2024-12-19 10:50:23,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 9474048. Throughput: 0: 995.9. Samples: 1794470. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-19 10:50:23,508][07135] Avg episode reward: [(0, '4.688')] [2024-12-19 10:50:28,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 9490432. Throughput: 0: 950.4. Samples: 1798808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:50:28,509][07135] Avg episode reward: [(0, '4.938')] [2024-12-19 10:50:30,932][15938] Updated weights for policy 0, policy_version 2320 (0.0014) [2024-12-19 10:50:33,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 9510912. Throughput: 0: 963.2. Samples: 1805846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:50:33,504][07135] Avg episode reward: [(0, '4.833')] [2024-12-19 10:50:38,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 9535488. Throughput: 0: 995.1. Samples: 1809294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:50:38,504][07135] Avg episode reward: [(0, '4.666')] [2024-12-19 10:50:40,719][15938] Updated weights for policy 0, policy_version 2330 (0.0016) [2024-12-19 10:50:43,502][07135] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 9551872. Throughput: 0: 982.1. Samples: 1814224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:50:43,507][07135] Avg episode reward: [(0, '4.755')] [2024-12-19 10:50:48,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3929.4). Total num frames: 9572352. Throughput: 0: 964.8. Samples: 1820334. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:50:48,511][07135] Avg episode reward: [(0, '4.526')] [2024-12-19 10:50:48,523][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002337_9572352.pth... [2024-12-19 10:50:48,645][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002107_8630272.pth [2024-12-19 10:50:51,094][15938] Updated weights for policy 0, policy_version 2340 (0.0017) [2024-12-19 10:50:53,502][07135] Fps is (10 sec: 4096.1, 60 sec: 3959.7, 300 sec: 3943.3). Total num frames: 9592832. Throughput: 0: 989.8. Samples: 1823806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:50:53,503][07135] Avg episode reward: [(0, '4.631')] [2024-12-19 10:50:58,502][07135] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3929.4). Total num frames: 9609216. Throughput: 0: 1008.1. Samples: 1829480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:50:58,507][07135] Avg episode reward: [(0, '4.732')] [2024-12-19 10:51:02,455][15938] Updated weights for policy 0, policy_version 2350 (0.0017) [2024-12-19 10:51:03,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 9629696. Throughput: 0: 966.3. Samples: 1834670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:51:03,506][07135] Avg episode reward: [(0, '4.849')] [2024-12-19 10:51:08,504][07135] Fps is (10 sec: 4095.2, 60 sec: 3959.3, 300 sec: 3929.3). Total num frames: 9650176. Throughput: 0: 973.6. Samples: 1838284. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:51:08,506][07135] Avg episode reward: [(0, '4.752')] [2024-12-19 10:51:11,138][15938] Updated weights for policy 0, policy_version 2360 (0.0021) [2024-12-19 10:51:13,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 9670656. Throughput: 0: 1024.7. Samples: 1844918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:51:13,504][07135] Avg episode reward: [(0, '4.715')] [2024-12-19 10:51:18,502][07135] Fps is (10 sec: 3687.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 9687040. Throughput: 0: 968.5. Samples: 1849430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-19 10:51:18,504][07135] Avg episode reward: [(0, '4.872')] [2024-12-19 10:51:22,505][15938] Updated weights for policy 0, policy_version 2370 (0.0033) [2024-12-19 10:51:23,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 9711616. Throughput: 0: 968.8. Samples: 1852888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:51:23,503][07135] Avg episode reward: [(0, '4.686')] [2024-12-19 10:51:28,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 9732096. Throughput: 0: 1014.5. Samples: 1859874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:51:28,508][07135] Avg episode reward: [(0, '4.701')] [2024-12-19 10:51:33,044][15938] Updated weights for policy 0, policy_version 2380 (0.0029) [2024-12-19 10:51:33,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 9748480. Throughput: 0: 982.6. Samples: 1864552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:51:33,506][07135] Avg episode reward: [(0, '4.878')] [2024-12-19 10:51:38,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 9768960. Throughput: 0: 968.1. Samples: 1867370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:51:38,510][07135] Avg episode reward: [(0, '4.727')] [2024-12-19 10:51:42,614][15938] Updated weights for policy 0, policy_version 2390 (0.0015) [2024-12-19 10:51:43,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 9793536. Throughput: 0: 999.8. Samples: 1874470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-19 10:51:43,503][07135] Avg episode reward: [(0, '4.721')] [2024-12-19 10:51:48,502][07135] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 9809920. Throughput: 0: 1000.7. Samples: 1879702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:51:48,504][07135] Avg episode reward: [(0, '4.718')] [2024-12-19 10:51:53,502][07135] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 9826304. Throughput: 0: 968.9. Samples: 1881880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:51:53,507][07135] Avg episode reward: [(0, '4.726')] [2024-12-19 10:51:54,017][15938] Updated weights for policy 0, policy_version 2400 (0.0016) [2024-12-19 10:51:58,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 9846784. Throughput: 0: 973.0. Samples: 1888704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:51:58,504][07135] Avg episode reward: [(0, '4.644')] [2024-12-19 10:52:03,502][07135] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 9871360. Throughput: 0: 1011.5. Samples: 1894950. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:52:03,509][15938] Updated weights for policy 0, policy_version 2410 (0.0022) [2024-12-19 10:52:03,509][07135] Avg episode reward: [(0, '4.663')] [2024-12-19 10:52:08,502][07135] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 9883648. Throughput: 0: 979.8. Samples: 1896980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-19 10:52:08,509][07135] Avg episode reward: [(0, '4.684')] [2024-12-19 10:52:13,502][07135] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 9908224. Throughput: 0: 961.7. Samples: 1903150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:52:13,511][07135] Avg episode reward: [(0, '5.198')] [2024-12-19 10:52:14,371][15938] Updated weights for policy 0, policy_version 2420 (0.0016) [2024-12-19 10:52:18,502][07135] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 9928704. Throughput: 0: 1009.3. Samples: 1909972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-19 10:52:18,504][07135] Avg episode reward: [(0, '5.091')] [2024-12-19 10:52:23,505][07135] Fps is (10 sec: 3685.0, 60 sec: 3890.9, 300 sec: 3901.6). Total num frames: 9945088. Throughput: 0: 992.8. Samples: 1912048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:52:23,510][07135] Avg episode reward: [(0, '4.750')] [2024-12-19 10:52:25,808][15938] Updated weights for policy 0, policy_version 2430 (0.0013) [2024-12-19 10:52:28,502][07135] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 9961472. Throughput: 0: 951.0. Samples: 1917264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:52:28,504][07135] Avg episode reward: [(0, '4.453')] [2024-12-19 10:52:33,502][07135] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 9986048. Throughput: 0: 986.7. Samples: 1924104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:52:33,506][07135] Avg episode reward: [(0, '4.646')] [2024-12-19 10:52:34,878][15938] Updated weights for policy 0, policy_version 2440 (0.0029) [2024-12-19 10:52:38,502][07135] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 10002432. Throughput: 0: 1001.3. Samples: 1926938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-19 10:52:38,504][07135] Avg episode reward: [(0, '4.720')] [2024-12-19 10:52:39,015][07135] Component Batcher_0 stopped! [2024-12-19 10:52:39,012][15925] Stopping Batcher_0... [2024-12-19 10:52:39,013][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-12-19 10:52:39,020][15925] Loop batcher_evt_loop terminating... [2024-12-19 10:52:39,126][15938] Weights refcount: 2 0 [2024-12-19 10:52:39,141][15925] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002222_9101312.pth [2024-12-19 10:52:39,144][15938] Stopping InferenceWorker_p0-w0... [2024-12-19 10:52:39,145][15938] Loop inference_proc0-0_evt_loop terminating... [2024-12-19 10:52:39,145][07135] Component InferenceWorker_p0-w0 stopped! [2024-12-19 10:52:39,161][15925] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-12-19 10:52:39,423][07135] Component LearnerWorker_p0 stopped! [2024-12-19 10:52:39,426][15925] Stopping LearnerWorker_p0... [2024-12-19 10:52:39,427][15925] Loop learner_proc0_evt_loop terminating... [2024-12-19 10:52:39,759][15947] Stopping RolloutWorker_w3... 
[2024-12-19 10:52:39,759][07135] Component RolloutWorker_w3 stopped! [2024-12-19 10:52:39,760][15947] Loop rollout_proc3_evt_loop terminating... [2024-12-19 10:52:39,786][15946] Stopping RolloutWorker_w4... [2024-12-19 10:52:39,791][07135] Component RolloutWorker_w4 stopped! [2024-12-19 10:52:39,791][15946] Loop rollout_proc4_evt_loop terminating... [2024-12-19 10:52:39,801][15945] Stopping RolloutWorker_w2... [2024-12-19 10:52:39,802][15945] Loop rollout_proc2_evt_loop terminating... [2024-12-19 10:52:39,801][07135] Component RolloutWorker_w2 stopped! [2024-12-19 10:52:39,813][15950] Stopping RolloutWorker_w6... [2024-12-19 10:52:39,814][15950] Loop rollout_proc6_evt_loop terminating... [2024-12-19 10:52:39,810][07135] Component RolloutWorker_w7 stopped! [2024-12-19 10:52:39,817][07135] Component RolloutWorker_w6 stopped! [2024-12-19 10:52:39,819][15949] Stopping RolloutWorker_w7... [2024-12-19 10:52:39,827][15949] Loop rollout_proc7_evt_loop terminating... [2024-12-19 10:52:39,831][07135] Component RolloutWorker_w5 stopped! [2024-12-19 10:52:39,832][15948] Stopping RolloutWorker_w5... [2024-12-19 10:52:39,836][15948] Loop rollout_proc5_evt_loop terminating... [2024-12-19 10:52:39,842][15939] Stopping RolloutWorker_w0... [2024-12-19 10:52:39,843][15939] Loop rollout_proc0_evt_loop terminating... [2024-12-19 10:52:39,843][07135] Component RolloutWorker_w0 stopped! [2024-12-19 10:52:39,850][07135] Component RolloutWorker_w1 stopped! [2024-12-19 10:52:39,852][07135] Waiting for process learner_proc0 to stop... [2024-12-19 10:52:39,860][15944] Stopping RolloutWorker_w1... [2024-12-19 10:52:39,861][15944] Loop rollout_proc1_evt_loop terminating... [2024-12-19 10:52:41,770][07135] Waiting for process inference_proc0-0 to join... [2024-12-19 10:52:41,903][07135] Waiting for process rollout_proc0 to join... [2024-12-19 10:52:44,931][07135] Waiting for process rollout_proc1 to join... [2024-12-19 10:52:44,934][07135] Waiting for process rollout_proc2 to join... [2024-12-19 10:52:44,938][07135] Waiting for process rollout_proc3 to join... [2024-12-19 10:52:44,943][07135] Waiting for process rollout_proc4 to join... [2024-12-19 10:52:44,947][07135] Waiting for process rollout_proc5 to join... [2024-12-19 10:52:44,951][07135] Waiting for process rollout_proc6 to join... [2024-12-19 10:52:44,955][07135] Waiting for process rollout_proc7 to join... 
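Annotation: the teardown above follows a fixed pattern: each component (batcher, learner, inference worker, rollout workers) stops its own event loop, the runner logs "Component ... stopped!", and the main process then joins every child process in turn. Below is a minimal sketch of that stop-then-join pattern in plain Python multiprocessing; the names (worker_loop, stop_event) are illustrative stand-ins for Sample Factory's actual signal-slot event loops (e.g. rollout_proc3_evt_loop), not its real API.

    import multiprocessing as mp
    import time

    def worker_loop(name, stop_event):
        # Stand-in for a rollout/inference/learner event loop.
        while not stop_event.is_set():
            time.sleep(0.01)  # ... env steps, inference, training ...
        print(f"Component {name} stopped! Loop {name}_evt_loop terminating...")

    if __name__ == "__main__":
        stop_event = mp.Event()
        procs = [mp.Process(target=worker_loop,
                            args=(f"rollout_proc{i}", stop_event),
                            name=f"rollout_proc{i}")
                 for i in range(8)]
        for p in procs:
            p.start()
        time.sleep(0.1)       # let the workers spin up
        stop_event.set()      # ask every component to stop...
        for p in procs:
            print(f"Waiting for process {p.name} to join...")
            p.join(timeout=5.0)  # ...then join each child process

In the real runner the stop requests and the "stopped!" acknowledgements travel over the same signal/slot mechanism that the log lines name, which is why the stop messages from different worker PIDs arrive slightly out of order.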
[2024-12-19 10:52:44,958][07135] Batcher 0 profile tree view:
batching: 53.5451, releasing_batches: 0.0503
[2024-12-19 10:52:44,961][07135] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 792.7970
update_model: 16.6535
  weight_update: 0.0059
one_step: 0.0145
  handle_policy_step: 1117.5331
    deserialize: 27.8088, stack: 6.3569, obs_to_device_normalize: 238.7164, forward: 559.1798, send_messages: 55.3780
    prepare_outputs: 172.9525
      to_cpu: 104.8202
[2024-12-19 10:52:44,963][07135] Learner 0 profile tree view:
misc: 0.0094, prepare_batch: 23.0470
train: 146.1731
  epoch_init: 0.0199, minibatch_init: 0.0261, losses_postprocess: 1.2230, kl_divergence: 1.1753, after_optimizer: 6.1288
  calculate_losses: 56.0662
    losses_init: 0.0107, forward_head: 2.2799, bptt_initial: 39.3433, tail: 2.1034, advantages_returns: 0.5457, losses: 7.3361
    bptt: 3.8160
      bptt_forward_core: 3.6135
  update: 80.2685
    clip: 1.5897
[2024-12-19 10:52:44,965][07135] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.6654, enqueue_policy_requests: 187.2612, env_step: 1591.3768, overhead: 24.4357, complete_rollouts: 14.0090
save_policy_outputs: 38.7890
  split_output_tensors: 15.6078
[2024-12-19 10:52:44,966][07135] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.6041, enqueue_policy_requests: 190.3086, env_step: 1581.1314, overhead: 25.4398, complete_rollouts: 12.5946
save_policy_outputs: 39.4126
  split_output_tensors: 15.9723
[2024-12-19 10:52:44,968][07135] Loop Runner_EvtLoop terminating...
[2024-12-19 10:52:44,969][07135] Runner profile tree view:
main_loop: 2034.0511
[2024-12-19 10:52:44,971][07135] Collected {0: 10006528}, FPS: 3791.8
[2024-12-19 10:57:08,070][07135] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-19 10:57:08,071][07135] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-19 10:57:08,073][07135] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-19 10:57:08,075][07135] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-19 10:57:08,076][07135] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-19 10:57:08,077][07135] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-19 10:57:08,080][07135] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-12-19 10:57:08,080][07135] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-19 10:57:08,083][07135] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-12-19 10:57:08,084][07135] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-12-19 10:57:08,085][07135] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-19 10:57:08,086][07135] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-19 10:57:08,088][07135] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-19 10:57:08,089][07135] Adding new argument 'enjoy_script'=None that is not in the saved config file!
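Annotation: the Runner profile a few entries above allows a quick sanity check, assuming the final "FPS: 3791.8" figure averages over this resumed session only:

    frames this session ≈ FPS × main_loop = 3791.8 × 2034.0511 s ≈ 7,712,700 frames
    10,006,528 total − 7,712,768 = 2,293,760 frames at the resume point
    2,293,760 = 560 × 4096, consistent with the 4096 frames per policy version
    implied by the checkpoint names (e.g. checkpoint_000002443_10006528: 2443 × 4096 = 10,006,528)

The rollout-worker breakdown also shows where the wall time went: env_step (~1591 s of the 2034 s main loop) dominates, with enqueue_policy_requests (~187 s) a distant second.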
[2024-12-19 10:57:08,090][07135] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-19 10:57:08,125][07135] RunningMeanStd input shape: (3, 72, 128) [2024-12-19 10:57:08,127][07135] RunningMeanStd input shape: (1,) [2024-12-19 10:57:08,142][07135] ConvEncoder: input_channels=3 [2024-12-19 10:57:08,179][07135] Conv encoder output size: 512 [2024-12-19 10:57:08,182][07135] Policy head output size: 512 [2024-12-19 10:57:08,202][07135] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-12-19 10:57:08,630][07135] Num frames 100... [2024-12-19 10:57:08,749][07135] Num frames 200... [2024-12-19 10:57:08,873][07135] Num frames 300... [2024-12-19 10:57:08,992][07135] Avg episode rewards: #0: 4.520, true rewards: #0: 3.520 [2024-12-19 10:57:08,993][07135] Avg episode reward: 4.520, avg true_objective: 3.520 [2024-12-19 10:57:09,053][07135] Num frames 400... [2024-12-19 10:57:09,176][07135] Num frames 500... [2024-12-19 10:57:09,300][07135] Num frames 600... [2024-12-19 10:57:09,427][07135] Num frames 700... [2024-12-19 10:57:09,562][07135] Num frames 800... [2024-12-19 10:57:09,615][07135] Avg episode rewards: #0: 6.000, true rewards: #0: 4.000 [2024-12-19 10:57:09,616][07135] Avg episode reward: 6.000, avg true_objective: 4.000 [2024-12-19 10:57:09,738][07135] Num frames 900... [2024-12-19 10:57:09,859][07135] Num frames 1000... [2024-12-19 10:57:09,978][07135] Num frames 1100... [2024-12-19 10:57:10,096][07135] Num frames 1200... [2024-12-19 10:57:10,208][07135] Avg episode rewards: #0: 5.827, true rewards: #0: 4.160 [2024-12-19 10:57:10,209][07135] Avg episode reward: 5.827, avg true_objective: 4.160 [2024-12-19 10:57:10,277][07135] Num frames 1300... [2024-12-19 10:57:10,394][07135] Num frames 1400... [2024-12-19 10:57:10,534][07135] Num frames 1500... [2024-12-19 10:57:10,656][07135] Num frames 1600... [2024-12-19 10:57:10,750][07135] Avg episode rewards: #0: 5.330, true rewards: #0: 4.080 [2024-12-19 10:57:10,752][07135] Avg episode reward: 5.330, avg true_objective: 4.080 [2024-12-19 10:57:10,838][07135] Num frames 1700... [2024-12-19 10:57:10,960][07135] Num frames 1800... [2024-12-19 10:57:11,082][07135] Num frames 1900... [2024-12-19 10:57:11,202][07135] Num frames 2000... [2024-12-19 10:57:11,315][07135] Avg episode rewards: #0: 5.296, true rewards: #0: 4.096 [2024-12-19 10:57:11,317][07135] Avg episode reward: 5.296, avg true_objective: 4.096 [2024-12-19 10:57:11,382][07135] Num frames 2100... [2024-12-19 10:57:11,514][07135] Num frames 2200... [2024-12-19 10:57:11,642][07135] Num frames 2300... [2024-12-19 10:57:11,764][07135] Num frames 2400... [2024-12-19 10:57:11,899][07135] Avg episode rewards: #0: 5.107, true rewards: #0: 4.107 [2024-12-19 10:57:11,900][07135] Avg episode reward: 5.107, avg true_objective: 4.107 [2024-12-19 10:57:11,948][07135] Num frames 2500... [2024-12-19 10:57:12,071][07135] Num frames 2600... [2024-12-19 10:57:12,192][07135] Num frames 2700... [2024-12-19 10:57:12,314][07135] Num frames 2800... [2024-12-19 10:57:12,437][07135] Num frames 2900... [2024-12-19 10:57:12,507][07135] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 [2024-12-19 10:57:12,510][07135] Avg episode reward: 5.160, avg true_objective: 4.160 [2024-12-19 10:57:12,623][07135] Num frames 3000... [2024-12-19 10:57:12,744][07135] Num frames 3100... [2024-12-19 10:57:12,868][07135] Num frames 3200... 
[2024-12-19 10:57:13,037][07135] Avg episode rewards: #0: 4.995, true rewards: #0: 4.120 [2024-12-19 10:57:13,038][07135] Avg episode reward: 4.995, avg true_objective: 4.120 [2024-12-19 10:57:13,048][07135] Num frames 3300... [2024-12-19 10:57:13,167][07135] Num frames 3400... [2024-12-19 10:57:13,288][07135] Num frames 3500... [2024-12-19 10:57:13,413][07135] Num frames 3600... [2024-12-19 10:57:13,607][07135] Avg episode rewards: #0: 4.867, true rewards: #0: 4.089 [2024-12-19 10:57:13,609][07135] Avg episode reward: 4.867, avg true_objective: 4.089 [2024-12-19 10:57:13,636][07135] Num frames 3700... [2024-12-19 10:57:13,763][07135] Num frames 3800... [2024-12-19 10:57:13,934][07135] Num frames 3900... [2024-12-19 10:57:14,104][07135] Num frames 4000... [2024-12-19 10:57:14,269][07135] Num frames 4100... [2024-12-19 10:57:14,372][07135] Avg episode rewards: #0: 4.928, true rewards: #0: 4.128 [2024-12-19 10:57:14,375][07135] Avg episode reward: 4.928, avg true_objective: 4.128 [2024-12-19 10:57:37,228][07135] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-12-19 11:03:33,357][07135] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-19 11:03:33,358][07135] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-19 11:03:33,360][07135] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-19 11:03:33,361][07135] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-19 11:03:33,362][07135] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-19 11:03:33,364][07135] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-19 11:03:33,365][07135] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-19 11:03:33,366][07135] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-19 11:03:33,367][07135] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-19 11:03:33,368][07135] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-12-19 11:03:33,369][07135] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-19 11:03:33,370][07135] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-19 11:03:33,372][07135] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-19 11:03:33,373][07135] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-19 11:03:33,374][07135] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-19 11:03:33,404][07135] RunningMeanStd input shape: (3, 72, 128) [2024-12-19 11:03:33,406][07135] RunningMeanStd input shape: (1,) [2024-12-19 11:03:33,419][07135] ConvEncoder: input_channels=3 [2024-12-19 11:03:33,458][07135] Conv encoder output size: 512 [2024-12-19 11:03:33,460][07135] Policy head output size: 512 [2024-12-19 11:03:33,478][07135] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-12-19 11:03:33,922][07135] Num frames 100... [2024-12-19 11:03:34,048][07135] Num frames 200... [2024-12-19 11:03:34,170][07135] Num frames 300... [2024-12-19 11:03:34,295][07135] Num frames 400... 
[2024-12-19 11:03:34,447][07135] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800 [2024-12-19 11:03:34,448][07135] Avg episode reward: 6.800, avg true_objective: 4.800 [2024-12-19 11:03:34,475][07135] Num frames 500... [2024-12-19 11:03:34,604][07135] Num frames 600... [2024-12-19 11:03:34,745][07135] Avg episode rewards: #0: 4.815, true rewards: #0: 3.315 [2024-12-19 11:03:34,746][07135] Avg episode reward: 4.815, avg true_objective: 3.315 [2024-12-19 11:03:34,794][07135] Num frames 700... [2024-12-19 11:03:34,911][07135] Num frames 800... [2024-12-19 11:03:35,027][07135] Num frames 900... [2024-12-19 11:03:35,143][07135] Num frames 1000... [2024-12-19 11:03:35,254][07135] Avg episode rewards: #0: 4.490, true rewards: #0: 3.490 [2024-12-19 11:03:35,255][07135] Avg episode reward: 4.490, avg true_objective: 3.490 [2024-12-19 11:03:35,326][07135] Num frames 1100... [2024-12-19 11:03:35,443][07135] Num frames 1200... [2024-12-19 11:03:35,577][07135] Num frames 1300... [2024-12-19 11:03:35,762][07135] Num frames 1400... [2024-12-19 11:03:35,875][07135] Avg episode rewards: #0: 4.328, true rewards: #0: 3.577 [2024-12-19 11:03:35,876][07135] Avg episode reward: 4.328, avg true_objective: 3.577 [2024-12-19 11:03:35,997][07135] Num frames 1500... [2024-12-19 11:03:36,164][07135] Num frames 1600... [2024-12-19 11:03:36,328][07135] Num frames 1700... [2024-12-19 11:03:36,493][07135] Num frames 1800... [2024-12-19 11:03:36,578][07135] Avg episode rewards: #0: 4.230, true rewards: #0: 3.630 [2024-12-19 11:03:36,580][07135] Avg episode reward: 4.230, avg true_objective: 3.630 [2024-12-19 11:03:36,728][07135] Num frames 1900... [2024-12-19 11:03:36,910][07135] Num frames 2000... [2024-12-19 11:03:37,075][07135] Num frames 2100... [2024-12-19 11:03:37,290][07135] Avg episode rewards: #0: 4.165, true rewards: #0: 3.665 [2024-12-19 11:03:37,293][07135] Avg episode reward: 4.165, avg true_objective: 3.665 [2024-12-19 11:03:37,296][07135] Num frames 2200... [2024-12-19 11:03:37,466][07135] Num frames 2300... [2024-12-19 11:03:37,659][07135] Num frames 2400... [2024-12-19 11:03:37,836][07135] Num frames 2500... [2024-12-19 11:03:38,037][07135] Avg episode rewards: #0: 4.119, true rewards: #0: 3.690 [2024-12-19 11:03:38,039][07135] Avg episode reward: 4.119, avg true_objective: 3.690 [2024-12-19 11:03:38,062][07135] Num frames 2600... [2024-12-19 11:03:38,186][07135] Num frames 2700... [2024-12-19 11:03:38,308][07135] Num frames 2800... [2024-12-19 11:03:38,433][07135] Num frames 2900... [2024-12-19 11:03:38,577][07135] Avg episode rewards: #0: 4.084, true rewards: #0: 3.709 [2024-12-19 11:03:38,579][07135] Avg episode reward: 4.084, avg true_objective: 3.709 [2024-12-19 11:03:38,621][07135] Num frames 3000... [2024-12-19 11:03:38,750][07135] Num frames 3100... [2024-12-19 11:03:38,882][07135] Num frames 3200... [2024-12-19 11:03:39,004][07135] Num frames 3300... [2024-12-19 11:03:39,120][07135] Avg episode rewards: #0: 4.057, true rewards: #0: 3.723 [2024-12-19 11:03:39,121][07135] Avg episode reward: 4.057, avg true_objective: 3.723 [2024-12-19 11:03:39,182][07135] Num frames 3400... [2024-12-19 11:03:39,304][07135] Num frames 3500... [2024-12-19 11:03:39,427][07135] Num frames 3600... [2024-12-19 11:03:39,558][07135] Num frames 3700... [2024-12-19 11:03:39,683][07135] Num frames 3800... [2024-12-19 11:03:39,809][07135] Num frames 3900... 
[2024-12-19 11:03:39,948][07135] Avg episode rewards: #0: 4.559, true rewards: #0: 3.959 [2024-12-19 11:03:39,950][07135] Avg episode reward: 4.559, avg true_objective: 3.959 [2024-12-19 11:03:57,893][07135] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-12-19 11:05:06,808][07135] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-19 11:05:06,810][07135] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-19 11:05:06,812][07135] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-19 11:05:06,814][07135] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-19 11:05:06,816][07135] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-19 11:05:06,817][07135] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-19 11:05:06,819][07135] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-19 11:05:06,821][07135] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-19 11:05:06,822][07135] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-19 11:05:06,823][07135] Adding new argument 'hf_repository'='Esteabn00007/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-12-19 11:05:06,824][07135] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-19 11:05:06,824][07135] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-19 11:05:06,825][07135] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-19 11:05:06,826][07135] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-19 11:05:06,827][07135] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-19 11:05:06,860][07135] RunningMeanStd input shape: (3, 72, 128) [2024-12-19 11:05:06,862][07135] RunningMeanStd input shape: (1,) [2024-12-19 11:05:06,875][07135] ConvEncoder: input_channels=3 [2024-12-19 11:05:06,912][07135] Conv encoder output size: 512 [2024-12-19 11:05:06,914][07135] Policy head output size: 512 [2024-12-19 11:05:06,932][07135] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-12-19 11:05:07,339][07135] Num frames 100... [2024-12-19 11:05:07,459][07135] Num frames 200... [2024-12-19 11:05:07,602][07135] Num frames 300... [2024-12-19 11:05:07,726][07135] Num frames 400... [2024-12-19 11:05:07,801][07135] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 [2024-12-19 11:05:07,803][07135] Avg episode reward: 5.160, avg true_objective: 4.160 [2024-12-19 11:05:07,905][07135] Num frames 500... [2024-12-19 11:05:08,025][07135] Num frames 600... [2024-12-19 11:05:08,156][07135] Num frames 700... [2024-12-19 11:05:08,281][07135] Num frames 800... [2024-12-19 11:05:08,334][07135] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 [2024-12-19 11:05:08,336][07135] Avg episode reward: 4.500, avg true_objective: 4.000 [2024-12-19 11:05:08,463][07135] Num frames 900... [2024-12-19 11:05:08,599][07135] Num frames 1000... [2024-12-19 11:05:08,725][07135] Num frames 1100... [2024-12-19 11:05:08,856][07135] Num frames 1200... 
[2024-12-19 11:05:08,970][07135] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160 [2024-12-19 11:05:08,972][07135] Avg episode reward: 4.827, avg true_objective: 4.160 [2024-12-19 11:05:09,039][07135] Num frames 1300... [2024-12-19 11:05:09,161][07135] Num frames 1400... [2024-12-19 11:05:09,289][07135] Num frames 1500... [2024-12-19 11:05:09,412][07135] Num frames 1600... [2024-12-19 11:05:09,507][07135] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 [2024-12-19 11:05:09,508][07135] Avg episode reward: 4.580, avg true_objective: 4.080 [2024-12-19 11:05:09,602][07135] Num frames 1700... [2024-12-19 11:05:09,729][07135] Num frames 1800... [2024-12-19 11:05:09,858][07135] Num frames 1900... [2024-12-19 11:05:09,981][07135] Num frames 2000... [2024-12-19 11:05:10,057][07135] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032 [2024-12-19 11:05:10,058][07135] Avg episode reward: 4.432, avg true_objective: 4.032 [2024-12-19 11:05:10,209][07135] Num frames 2100... [2024-12-19 11:05:10,378][07135] Num frames 2200... [2024-12-19 11:05:10,559][07135] Num frames 2300... [2024-12-19 11:05:10,732][07135] Num frames 2400... [2024-12-19 11:05:10,785][07135] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000 [2024-12-19 11:05:10,787][07135] Avg episode reward: 4.333, avg true_objective: 4.000 [2024-12-19 11:05:10,969][07135] Num frames 2500... [2024-12-19 11:05:11,135][07135] Num frames 2600... [2024-12-19 11:05:11,300][07135] Num frames 2700... [2024-12-19 11:05:11,486][07135] Num frames 2800... [2024-12-19 11:05:11,686][07135] Avg episode rewards: #0: 4.686, true rewards: #0: 4.114 [2024-12-19 11:05:11,689][07135] Avg episode reward: 4.686, avg true_objective: 4.114 [2024-12-19 11:05:11,727][07135] Num frames 2900... [2024-12-19 11:05:11,904][07135] Num frames 3000... [2024-12-19 11:05:12,083][07135] Num frames 3100... [2024-12-19 11:05:12,259][07135] Num frames 3200... [2024-12-19 11:05:12,432][07135] Num frames 3300... [2024-12-19 11:05:12,538][07135] Avg episode rewards: #0: 4.785, true rewards: #0: 4.160 [2024-12-19 11:05:12,540][07135] Avg episode reward: 4.785, avg true_objective: 4.160 [2024-12-19 11:05:12,641][07135] Num frames 3400... [2024-12-19 11:05:12,763][07135] Num frames 3500... [2024-12-19 11:05:12,888][07135] Num frames 3600... [2024-12-19 11:05:13,018][07135] Num frames 3700... [2024-12-19 11:05:13,128][07135] Avg episode rewards: #0: 4.716, true rewards: #0: 4.160 [2024-12-19 11:05:13,130][07135] Avg episode reward: 4.716, avg true_objective: 4.160 [2024-12-19 11:05:13,198][07135] Num frames 3800... [2024-12-19 11:05:13,317][07135] Num frames 3900... [2024-12-19 11:05:13,437][07135] Num frames 4000... [2024-12-19 11:05:13,574][07135] Num frames 4100... [2024-12-19 11:05:13,664][07135] Avg episode rewards: #0: 4.628, true rewards: #0: 4.128 [2024-12-19 11:05:13,666][07135] Avg episode reward: 4.628, avg true_objective: 4.128 [2024-12-19 11:05:32,918][07135] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-12-19 11:05:47,463][07135] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-19 11:05:47,465][07135] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-19 11:05:47,468][07135] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-19 11:05:47,470][07135] Adding new argument 'save_video'=True that is not in the saved config file! 
[2024-12-19 11:05:47,472][07135] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-19 11:05:47,474][07135] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-19 11:05:47,476][07135] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-19 11:05:47,478][07135] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-19 11:05:47,479][07135] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-19 11:05:47,481][07135] Adding new argument 'hf_repository'='Esteban00007/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-12-19 11:05:47,482][07135] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-19 11:05:47,483][07135] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-19 11:05:47,484][07135] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-19 11:05:47,485][07135] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-19 11:05:47,486][07135] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-19 11:05:47,539][07135] RunningMeanStd input shape: (3, 72, 128) [2024-12-19 11:05:47,544][07135] RunningMeanStd input shape: (1,) [2024-12-19 11:05:47,578][07135] ConvEncoder: input_channels=3 [2024-12-19 11:05:47,640][07135] Conv encoder output size: 512 [2024-12-19 11:05:47,642][07135] Policy head output size: 512 [2024-12-19 11:05:47,672][07135] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-12-19 11:05:48,290][07135] Num frames 100... [2024-12-19 11:05:48,472][07135] Num frames 200... [2024-12-19 11:05:48,637][07135] Num frames 300... [2024-12-19 11:05:48,832][07135] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2024-12-19 11:05:48,835][07135] Avg episode reward: 3.840, avg true_objective: 3.840 [2024-12-19 11:05:48,864][07135] Num frames 400... [2024-12-19 11:05:49,038][07135] Num frames 500... [2024-12-19 11:05:49,210][07135] Num frames 600... [2024-12-19 11:05:49,391][07135] Num frames 700... [2024-12-19 11:05:49,573][07135] Num frames 800... [2024-12-19 11:05:49,626][07135] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 [2024-12-19 11:05:49,629][07135] Avg episode reward: 4.500, avg true_objective: 4.000 [2024-12-19 11:05:49,808][07135] Num frames 900... [2024-12-19 11:05:49,951][07135] Num frames 1000... [2024-12-19 11:05:50,082][07135] Num frames 1100... [2024-12-19 11:05:50,243][07135] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947 [2024-12-19 11:05:50,245][07135] Avg episode reward: 4.280, avg true_objective: 3.947 [2024-12-19 11:05:50,269][07135] Num frames 1200... [2024-12-19 11:05:50,398][07135] Num frames 1300... [2024-12-19 11:05:50,530][07135] Num frames 1400... [2024-12-19 11:05:50,660][07135] Num frames 1500... [2024-12-19 11:05:50,784][07135] Num frames 1600... [2024-12-19 11:05:50,880][07135] Avg episode rewards: #0: 4.830, true rewards: #0: 4.080 [2024-12-19 11:05:50,882][07135] Avg episode reward: 4.830, avg true_objective: 4.080 [2024-12-19 11:05:50,968][07135] Num frames 1700... [2024-12-19 11:05:51,094][07135] Num frames 1800... [2024-12-19 11:05:51,223][07135] Num frames 1900... [2024-12-19 11:05:51,346][07135] Num frames 2000... 
[2024-12-19 11:05:51,421][07135] Avg episode rewards: #0: 4.632, true rewards: #0: 4.032 [2024-12-19 11:05:51,423][07135] Avg episode reward: 4.632, avg true_objective: 4.032 [2024-12-19 11:05:51,535][07135] Num frames 2100... [2024-12-19 11:05:51,657][07135] Num frames 2200... [2024-12-19 11:05:51,778][07135] Num frames 2300... [2024-12-19 11:05:51,907][07135] Num frames 2400... [2024-12-19 11:05:51,961][07135] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 [2024-12-19 11:05:51,962][07135] Avg episode reward: 4.500, avg true_objective: 4.000 [2024-12-19 11:05:52,087][07135] Num frames 2500... [2024-12-19 11:05:52,218][07135] Num frames 2600... [2024-12-19 11:05:52,340][07135] Num frames 2700... [2024-12-19 11:05:52,497][07135] Avg episode rewards: #0: 4.406, true rewards: #0: 3.977 [2024-12-19 11:05:52,500][07135] Avg episode reward: 4.406, avg true_objective: 3.977 [2024-12-19 11:05:52,529][07135] Num frames 2800... [2024-12-19 11:05:52,655][07135] Num frames 2900... [2024-12-19 11:05:52,767][07135] Avg episode rewards: #0: 4.058, true rewards: #0: 3.682 [2024-12-19 11:05:52,768][07135] Avg episode reward: 4.058, avg true_objective: 3.682 [2024-12-19 11:05:52,839][07135] Num frames 3000... [2024-12-19 11:05:52,972][07135] Num frames 3100... [2024-12-19 11:05:53,091][07135] Num frames 3200... [2024-12-19 11:05:53,222][07135] Num frames 3300... [2024-12-19 11:05:53,313][07135] Avg episode rewards: #0: 4.033, true rewards: #0: 3.700 [2024-12-19 11:05:53,315][07135] Avg episode reward: 4.033, avg true_objective: 3.700 [2024-12-19 11:05:53,401][07135] Num frames 3400... [2024-12-19 11:05:53,534][07135] Num frames 3500... [2024-12-19 11:05:53,657][07135] Num frames 3600... [2024-12-19 11:05:53,780][07135] Num frames 3700... [2024-12-19 11:05:53,932][07135] Avg episode rewards: #0: 4.178, true rewards: #0: 3.778 [2024-12-19 11:05:53,933][07135] Avg episode reward: 4.178, avg true_objective: 3.778 [2024-12-19 11:06:11,341][07135] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-12-19 11:06:16,463][07135] The model has been pushed to https://huggingface.co./Esteban00007/rl_course_vizdoom_health_gathering_supreme [2024-12-19 11:09:31,687][07135] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-19 11:09:31,691][07135] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-19 11:09:31,694][07135] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-19 11:09:31,695][07135] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-19 11:09:31,698][07135] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-19 11:09:31,701][07135] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-19 11:09:31,702][07135] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-19 11:09:31,706][07135] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-19 11:09:31,707][07135] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-19 11:09:31,708][07135] Adding new argument 'hf_repository'='Esteban00007/doom_health_gathering_supreme_2222' that is not in the saved config file! [2024-12-19 11:09:31,709][07135] Adding new argument 'policy_index'=0 that is not in the saved config file! 
[2024-12-19 11:09:31,713][07135] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-19 11:09:31,714][07135] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-19 11:09:31,715][07135] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-19 11:09:31,716][07135] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-19 11:09:31,756][07135] RunningMeanStd input shape: (3, 72, 128) [2024-12-19 11:09:31,758][07135] RunningMeanStd input shape: (1,) [2024-12-19 11:09:31,783][07135] ConvEncoder: input_channels=3 [2024-12-19 11:09:31,846][07135] Conv encoder output size: 512 [2024-12-19 11:09:31,848][07135] Policy head output size: 512 [2024-12-19 11:09:31,880][07135] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2024-12-19 11:09:32,477][07135] Num frames 100... [2024-12-19 11:09:32,671][07135] Num frames 200... [2024-12-19 11:09:32,845][07135] Num frames 300... [2024-12-19 11:09:33,013][07135] Num frames 400... [2024-12-19 11:09:33,152][07135] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2024-12-19 11:09:33,154][07135] Avg episode reward: 5.480, avg true_objective: 4.480 [2024-12-19 11:09:33,244][07135] Num frames 500... [2024-12-19 11:09:33,417][07135] Num frames 600... [2024-12-19 11:09:33,593][07135] Num frames 700... [2024-12-19 11:09:33,784][07135] Num frames 800... [2024-12-19 11:09:33,979][07135] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2024-12-19 11:09:33,981][07135] Avg episode reward: 5.480, avg true_objective: 4.480 [2024-12-19 11:09:33,989][07135] Num frames 900... [2024-12-19 11:09:34,113][07135] Num frames 1000... [2024-12-19 11:09:34,239][07135] Num frames 1100... [2024-12-19 11:09:34,363][07135] Num frames 1200... [2024-12-19 11:09:34,486][07135] Num frames 1300... [2024-12-19 11:09:34,602][07135] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2024-12-19 11:09:34,603][07135] Avg episode reward: 5.480, avg true_objective: 4.480 [2024-12-19 11:09:34,676][07135] Num frames 1400... [2024-12-19 11:09:34,808][07135] Num frames 1500... [2024-12-19 11:09:34,931][07135] Num frames 1600... [2024-12-19 11:09:35,051][07135] Num frames 1700... [2024-12-19 11:09:35,143][07135] Avg episode rewards: #0: 5.070, true rewards: #0: 4.320 [2024-12-19 11:09:35,144][07135] Avg episode reward: 5.070, avg true_objective: 4.320 [2024-12-19 11:09:35,239][07135] Num frames 1800... [2024-12-19 11:09:35,364][07135] Num frames 1900... [2024-12-19 11:09:35,486][07135] Num frames 2000... [2024-12-19 11:09:35,627][07135] Num frames 2100... [2024-12-19 11:09:35,697][07135] Avg episode rewards: #0: 4.824, true rewards: #0: 4.224 [2024-12-19 11:09:35,699][07135] Avg episode reward: 4.824, avg true_objective: 4.224 [2024-12-19 11:09:35,816][07135] Num frames 2200... [2024-12-19 11:09:35,937][07135] Num frames 2300... [2024-12-19 11:09:36,057][07135] Num frames 2400... [2024-12-19 11:09:36,230][07135] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2024-12-19 11:09:36,231][07135] Avg episode reward: 4.660, avg true_objective: 4.160 [2024-12-19 11:09:36,239][07135] Num frames 2500... [2024-12-19 11:09:36,358][07135] Num frames 2600... [2024-12-19 11:09:36,478][07135] Num frames 2700... 
[2024-12-19 11:09:36,604][07135] Avg episode rewards: #0: 4.360, true rewards: #0: 3.931 [2024-12-19 11:09:36,606][07135] Avg episode reward: 4.360, avg true_objective: 3.931 [2024-12-19 11:09:36,665][07135] Num frames 2800... [2024-12-19 11:09:36,792][07135] Num frames 2900... [2024-12-19 11:09:36,917][07135] Num frames 3000... [2024-12-19 11:09:37,039][07135] Num frames 3100... [2024-12-19 11:09:37,164][07135] Num frames 3200... [2024-12-19 11:09:37,335][07135] Avg episode rewards: #0: 4.745, true rewards: #0: 4.120 [2024-12-19 11:09:37,337][07135] Avg episode reward: 4.745, avg true_objective: 4.120 [2024-12-19 11:09:37,347][07135] Num frames 3300... [2024-12-19 11:09:37,473][07135] Num frames 3400... [2024-12-19 11:09:37,605][07135] Num frames 3500... [2024-12-19 11:09:37,731][07135] Num frames 3600... [2024-12-19 11:09:37,862][07135] Num frames 3700... [2024-12-19 11:09:37,933][07135] Avg episode rewards: #0: 4.680, true rewards: #0: 4.124 [2024-12-19 11:09:37,934][07135] Avg episode reward: 4.680, avg true_objective: 4.124 [2024-12-19 11:09:38,042][07135] Num frames 3800... [2024-12-19 11:09:38,162][07135] Num frames 3900... [2024-12-19 11:09:38,286][07135] Num frames 4000... [2024-12-19 11:09:38,408][07135] Num frames 4100... [2024-12-19 11:09:38,535][07135] Avg episode rewards: #0: 4.960, true rewards: #0: 4.160 [2024-12-19 11:09:38,537][07135] Avg episode reward: 4.960, avg true_objective: 4.160 [2024-12-19 11:09:59,233][07135] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
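Annotation: each evaluation pass above prints a running mean after every finished episode; "Avg episode rewards" tracks the shaped reward the agent was trained on, while "true rewards" tracks the environment's underlying objective. A minimal sketch of that bookkeeping follows, assuming a Gymnasium-style step API and a "true_objective" entry in info; make_env and policy are hypothetical placeholders, and the actual enjoy script additionally records the frames that become replay.mp4.

    def evaluate(make_env, policy, max_num_episodes=10):
        # Sketch of the running-average bookkeeping seen in the eval logs.
        # make_env/policy are placeholders, not Sample Factory's API.
        env = make_env()
        reward_sum = true_reward_sum = 0.0
        for episode in range(1, max_num_episodes + 1):
            obs, info = env.reset()
            done = False
            ep_reward = ep_true = 0.0
            while not done:
                action = policy(obs)
                obs, reward, terminated, truncated, info = env.step(action)
                done = terminated or truncated
                ep_reward += reward                           # shaped reward
                ep_true += info.get("true_objective", reward) # unshaped objective
            reward_sum += ep_reward
            true_reward_sum += ep_true
            print(f"Avg episode rewards: #0: {reward_sum / episode:.3f}, "
                  f"true rewards: #0: {true_reward_sum / episode:.3f}")

With max_num_episodes=10, the last line this loop prints corresponds to the final "Avg episode reward" entry logged before each replay video is saved.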