[2024-07-29 13:27:30,621][00192] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-07-29 13:27:30,625][00192] Rollout worker 0 uses device cpu
[2024-07-29 13:27:30,626][00192] Rollout worker 1 uses device cpu
[2024-07-29 13:27:30,627][00192] Rollout worker 2 uses device cpu
[2024-07-29 13:27:30,632][00192] Rollout worker 3 uses device cpu
[2024-07-29 13:27:30,633][00192] Rollout worker 4 uses device cpu
[2024-07-29 13:27:30,634][00192] Rollout worker 5 uses device cpu
[2024-07-29 13:27:30,635][00192] Rollout worker 6 uses device cpu
[2024-07-29 13:27:30,636][00192] Rollout worker 7 uses device cpu
[2024-07-29 13:27:30,822][00192] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-29 13:27:30,824][00192] InferenceWorker_p0-w0: min num requests: 2
[2024-07-29 13:27:30,869][00192] Starting all processes...
[2024-07-29 13:27:30,871][00192] Starting process learner_proc0
[2024-07-29 13:27:30,923][00192] Starting all processes...
[2024-07-29 13:27:30,930][00192] Starting process inference_proc0-0
[2024-07-29 13:27:30,931][00192] Starting process rollout_proc0
[2024-07-29 13:27:30,932][00192] Starting process rollout_proc1
[2024-07-29 13:27:30,932][00192] Starting process rollout_proc2
[2024-07-29 13:27:30,932][00192] Starting process rollout_proc3
[2024-07-29 13:27:30,933][00192] Starting process rollout_proc4
[2024-07-29 13:27:30,933][00192] Starting process rollout_proc5
[2024-07-29 13:27:30,933][00192] Starting process rollout_proc6
[2024-07-29 13:27:30,933][00192] Starting process rollout_proc7
[2024-07-29 13:27:41,869][02907] Worker 4 uses CPU cores [0]
[2024-07-29 13:27:41,894][02906] Worker 3 uses CPU cores [1]
[2024-07-29 13:27:41,902][02889] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-29 13:27:41,903][02889] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-07-29 13:27:41,948][02889] Num visible devices: 1
[2024-07-29 13:27:41,979][02889] Starting seed is not provided
[2024-07-29 13:27:41,980][02889] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-29 13:27:41,981][02889] Initializing actor-critic model on device cuda:0
[2024-07-29 13:27:41,982][02889] RunningMeanStd input shape: (3, 72, 128)
[2024-07-29 13:27:41,983][02889] RunningMeanStd input shape: (1,)
[2024-07-29 13:27:41,996][02909] Worker 6 uses CPU cores [0]
[2024-07-29 13:27:42,001][02905] Worker 2 uses CPU cores [0]
[2024-07-29 13:27:42,036][02889] ConvEncoder: input_channels=3
[2024-07-29 13:27:42,088][02903] Worker 0 uses CPU cores [0]
[2024-07-29 13:27:42,137][02902] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-29 13:27:42,145][02902] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-07-29 13:27:42,187][02910] Worker 7 uses CPU cores [1]
[2024-07-29 13:27:42,193][02902] Num visible devices: 1
[2024-07-29 13:27:42,196][02904] Worker 1 uses CPU cores [1]
[2024-07-29 13:27:42,239][02908] Worker 5 uses CPU cores [1]
[2024-07-29 13:27:42,323][02889] Conv encoder output size: 512
[2024-07-29 13:27:42,323][02889] Policy head output size: 512
[2024-07-29 13:27:42,338][02889] Created Actor Critic model with architecture:
[2024-07-29 13:27:42,338][02889] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
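The printed module tree is a shared-weights actor-critic: a three-block convolutional encoder with ELU activations, an MLP projection to 512 features, a GRU core, and two linear heads (a scalar value and 5 action logits). A rough PyTorch re-creation for readability, using the shapes the log does state (3x72x128 observations, 512-d encoder output, 5 actions); the conv filter sizes and strides are assumptions, since the log shows layer types but not their hyperparameters, and this is a sketch, not Sample Factory's actual classes:

    import torch
    from torch import nn

    class SharedWeightsActorCritic(nn.Module):
        def __init__(self, num_actions=5):
            super().__init__()
            # conv_head: three Conv2d+ELU blocks (kernel/stride values are
            # assumptions; only the layer types appear in the log).
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            # Infer the flattened conv output size from a dummy observation.
            conv_out = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
            self.mlp_layers = nn.Sequential(nn.Linear(conv_out, 512), nn.ELU())
            self.core = nn.GRU(512, 512)                       # ModelCoreRNN
            self.critic_linear = nn.Linear(512, 1)             # value head
            self.distribution_linear = nn.Linear(512, num_actions)  # logits

        def forward(self, obs, rnn_state=None):
            x = self.mlp_layers(self.conv_head(obs).flatten(1))
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state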
[2024-07-29 13:27:47,074][02889] Using optimizer
[2024-07-29 13:27:47,075][02889] No checkpoints found
[2024-07-29 13:27:47,076][02889] Did not load from checkpoint, starting from scratch!
[2024-07-29 13:27:47,076][02889] Initialized policy 0 weights for model version 0
[2024-07-29 13:27:47,083][02889] LearnerWorker_p0 finished initialization!
[2024-07-29 13:27:47,083][02889] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-29 13:27:47,105][00192] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-29 13:27:47,367][02902] RunningMeanStd input shape: (3, 72, 128)
[2024-07-29 13:27:47,369][02902] RunningMeanStd input shape: (1,)
[2024-07-29 13:27:47,408][02902] ConvEncoder: input_channels=3
[2024-07-29 13:27:47,623][02902] Conv encoder output size: 512
[2024-07-29 13:27:47,625][02902] Policy head output size: 512
[2024-07-29 13:27:49,162][00192] Inference worker 0-0 is ready!
[2024-07-29 13:27:49,164][00192] All inference workers are ready! Signal rollout workers to start!
[2024-07-29 13:27:49,289][02906] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-29 13:27:49,294][02910] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-29 13:27:49,291][02908] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-29 13:27:49,292][02909] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-29 13:27:49,287][02903] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-29 13:27:49,302][02904] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-29 13:27:49,310][02907] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-29 13:27:49,320][02905] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-29 13:27:49,503][02906] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
[2024-07-29 13:27:49,505][02906] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.10/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2024-07-29 13:27:49,511][02906] Unhandled exception in evt loop rollout_proc3_evt_loop
[2024-07-29 13:27:50,682][02904] Decorrelating experience for 0 frames...
[2024-07-29 13:27:50,683][02907] Decorrelating experience for 0 frames...
[2024-07-29 13:27:50,682][02903] Decorrelating experience for 0 frames...
[2024-07-29 13:27:50,811][00192] Heartbeat connected on Batcher_0
[2024-07-29 13:27:50,817][00192] Heartbeat connected on LearnerWorker_p0
[2024-07-29 13:27:50,870][00192] Heartbeat connected on InferenceWorker_p0-w0
[2024-07-29 13:27:51,412][02907] Decorrelating experience for 32 frames...
[2024-07-29 13:27:51,411][02903] Decorrelating experience for 32 frames...
[2024-07-29 13:27:51,788][02908] Decorrelating experience for 0 frames...
[2024-07-29 13:27:51,798][02910] Decorrelating experience for 0 frames...
[2024-07-29 13:27:51,813][02904] Decorrelating experience for 32 frames...
[2024-07-29 13:27:52,105][00192] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-29 13:27:52,531][02903] Decorrelating experience for 64 frames...
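Rollout worker 3 (process 02906) dies here: its ViZDoom instance exits during game.init(), and per the traceback Sample Factory's _game_init re-raises this as EnvCriticalError, after which the worker's event loop never recovers. The remaining seven workers carry on. A minimal sketch of guarding a flaky init() with retries; make_game and the retry parameters are hypothetical placeholders, not Sample Factory's actual API (its doom_gym.py has its own handling):

    import time

    # The exception class named in the traceback above.
    from vizdoom.vizdoom import ViZDoomUnexpectedExitException

    def init_with_retries(make_game, max_attempts=5, delay_s=1.0):
        """Try game.init() a few times before giving up.

        make_game: zero-arg factory returning a fresh vizdoom.DoomGame
        (hypothetical helper for this sketch).
        """
        for attempt in range(1, max_attempts + 1):
            game = make_game()
            try:
                game.init()
                return game  # success: hand the initialized game to the env
            except ViZDoomUnexpectedExitException:
                # The controlled ViZDoom process died; close and retry.
                game.close()
                if attempt == max_attempts:
                    raise  # caller decides (SF raises EnvCriticalError here)
                time.sleep(delay_s * attempt)  # simple backoff between tries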
[2024-07-29 13:27:52,548][02907] Decorrelating experience for 64 frames...
[2024-07-29 13:27:52,636][02908] Decorrelating experience for 32 frames...
[2024-07-29 13:27:52,639][02910] Decorrelating experience for 32 frames...
[2024-07-29 13:27:53,009][02909] Decorrelating experience for 0 frames...
[2024-07-29 13:27:53,204][02904] Decorrelating experience for 64 frames...
[2024-07-29 13:27:53,694][02903] Decorrelating experience for 96 frames...
[2024-07-29 13:27:53,705][02907] Decorrelating experience for 96 frames...
[2024-07-29 13:27:53,818][02908] Decorrelating experience for 64 frames...
[2024-07-29 13:27:53,953][00192] Heartbeat connected on RolloutWorker_w0
[2024-07-29 13:27:53,970][00192] Heartbeat connected on RolloutWorker_w4
[2024-07-29 13:27:54,290][02904] Decorrelating experience for 96 frames...
[2024-07-29 13:27:54,333][02909] Decorrelating experience for 32 frames...
[2024-07-29 13:27:54,599][00192] Heartbeat connected on RolloutWorker_w1
[2024-07-29 13:27:54,660][02905] Decorrelating experience for 0 frames...
[2024-07-29 13:27:54,975][02910] Decorrelating experience for 64 frames...
[2024-07-29 13:27:55,129][02908] Decorrelating experience for 96 frames...
[2024-07-29 13:27:55,305][00192] Heartbeat connected on RolloutWorker_w5
[2024-07-29 13:27:55,604][02909] Decorrelating experience for 64 frames...
[2024-07-29 13:27:55,604][02905] Decorrelating experience for 32 frames...
[2024-07-29 13:27:55,749][02910] Decorrelating experience for 96 frames...
[2024-07-29 13:27:55,866][00192] Heartbeat connected on RolloutWorker_w7
[2024-07-29 13:27:56,616][02909] Decorrelating experience for 96 frames...
[2024-07-29 13:27:56,683][02905] Decorrelating experience for 64 frames...
[2024-07-29 13:27:56,894][00192] Heartbeat connected on RolloutWorker_w6
[2024-07-29 13:27:57,105][00192] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.8. Samples: 28. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-29 13:27:59,696][02905] Decorrelating experience for 96 frames...
[2024-07-29 13:28:00,916][00192] Heartbeat connected on RolloutWorker_w2
[2024-07-29 13:28:02,105][00192] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 130.8. Samples: 1962. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-29 13:28:02,112][00192] Avg episode reward: [(0, '2.803')]
[2024-07-29 13:28:02,413][02889] Signal inference workers to stop experience collection...
[2024-07-29 13:28:02,430][02902] InferenceWorker_p0-w0: stopping experience collection
[2024-07-29 13:28:03,782][02889] Signal inference workers to resume experience collection...
[2024-07-29 13:28:03,784][02902] InferenceWorker_p0-w0: resuming experience collection
[2024-07-29 13:28:07,105][00192] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 130.7. Samples: 2614. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-07-29 13:28:07,107][00192] Avg episode reward: [(0, '3.678')]
[2024-07-29 13:28:12,105][00192] Fps is (10 sec: 3686.6, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 337.9. Samples: 8448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-07-29 13:28:12,111][00192] Avg episode reward: [(0, '4.088')]
[2024-07-29 13:28:12,345][02902] Updated weights for policy 0, policy_version 10 (0.0356)
[2024-07-29 13:28:17,105][00192] Fps is (10 sec: 3686.4, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 53248. Throughput: 0: 447.1. Samples: 13412. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:28:17,114][00192] Avg episode reward: [(0, '4.352')]
[2024-07-29 13:28:22,107][00192] Fps is (10 sec: 3276.1, 60 sec: 1989.4, 300 sec: 1989.4). Total num frames: 69632. Throughput: 0: 444.2. Samples: 15548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:28:22,112][00192] Avg episode reward: [(0, '4.484')]
[2024-07-29 13:28:24,703][02902] Updated weights for policy 0, policy_version 20 (0.0014)
[2024-07-29 13:28:27,105][00192] Fps is (10 sec: 3686.4, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 90112. Throughput: 0: 535.2. Samples: 21410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:28:27,110][00192] Avg episode reward: [(0, '4.450')]
[2024-07-29 13:28:32,106][00192] Fps is (10 sec: 3686.6, 60 sec: 2366.5, 300 sec: 2366.5). Total num frames: 106496. Throughput: 0: 592.0. Samples: 26642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:28:32,109][00192] Avg episode reward: [(0, '4.383')]
[2024-07-29 13:28:32,110][02889] Saving new best policy, reward=4.383!
[2024-07-29 13:28:37,105][00192] Fps is (10 sec: 2867.2, 60 sec: 2375.7, 300 sec: 2375.7). Total num frames: 118784. Throughput: 0: 636.0. Samples: 28622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:28:37,108][00192] Avg episode reward: [(0, '4.432')]
[2024-07-29 13:28:37,144][02889] Saving new best policy, reward=4.432!
[2024-07-29 13:28:37,149][02902] Updated weights for policy 0, policy_version 30 (0.0030)
[2024-07-29 13:28:42,105][00192] Fps is (10 sec: 3686.8, 60 sec: 2606.5, 300 sec: 2606.5). Total num frames: 143360. Throughput: 0: 763.1. Samples: 34366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:28:42,110][00192] Avg episode reward: [(0, '4.526')]
[2024-07-29 13:28:42,113][02889] Saving new best policy, reward=4.526!
[2024-07-29 13:28:47,105][00192] Fps is (10 sec: 4096.1, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 159744. Throughput: 0: 854.3. Samples: 40404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:28:47,110][00192] Avg episode reward: [(0, '4.531')]
[2024-07-29 13:28:47,116][02889] Saving new best policy, reward=4.531!
[2024-07-29 13:28:47,791][02902] Updated weights for policy 0, policy_version 40 (0.0012)
[2024-07-29 13:28:52,105][00192] Fps is (10 sec: 2867.3, 60 sec: 2867.2, 300 sec: 2646.6). Total num frames: 172032. Throughput: 0: 884.0. Samples: 42396. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:28:52,108][00192] Avg episode reward: [(0, '4.448')]
[2024-07-29 13:28:57,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2750.2). Total num frames: 192512. Throughput: 0: 867.3. Samples: 47476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:28:57,112][00192] Avg episode reward: [(0, '4.174')]
[2024-07-29 13:28:59,173][02902] Updated weights for policy 0, policy_version 50 (0.0022)
[2024-07-29 13:29:02,110][00192] Fps is (10 sec: 4503.5, 60 sec: 3617.9, 300 sec: 2894.3). Total num frames: 217088. Throughput: 0: 896.7. Samples: 53766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:29:02,115][00192] Avg episode reward: [(0, '4.370')]
[2024-07-29 13:29:07,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 2867.2). Total num frames: 229376. Throughput: 0: 906.9. Samples: 56356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
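Each progress line reports frames-per-second over three trailing windows (10 s, 60 s, 300 s), all derived from the monotonically growing "Total num frames" counter; the "nan" values in the very first report simply mean no second sample exists yet. A minimal sketch of that bookkeeping, with illustrative names rather than Sample Factory's internals:

    import time
    from collections import deque

    class WindowedFps:
        """Average FPS over trailing windows, e.g. 10/60/300 seconds."""

        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.history = deque()  # (timestamp, total_frames) pairs

        def update(self, total_frames):
            now = time.time()
            self.history.append((now, total_frames))
            # Drop samples older than the largest window.
            while self.history and now - self.history[0][0] > max(self.windows):
                self.history.popleft()

        def fps(self):
            if not self.history:
                return {w: float("nan") for w in self.windows}
            now, frames_now = self.history[-1]
            result = {}
            for w in self.windows:
                # Oldest sample still inside this window.
                old = next(((t, f) for t, f in self.history if now - t <= w), None)
                if old is None or now == old[0]:
                    result[w] = float("nan")  # matches the first 'nan' report
                else:
                    result[w] = (frames_now - old[1]) / (now - old[0])
            return result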
[2024-07-29 13:29:07,112][00192] Avg episode reward: [(0, '4.530')]
[2024-07-29 13:29:11,351][02902] Updated weights for policy 0, policy_version 60 (0.0016)
[2024-07-29 13:29:12,105][00192] Fps is (10 sec: 2868.5, 60 sec: 3481.6, 300 sec: 2891.3). Total num frames: 245760. Throughput: 0: 871.2. Samples: 60616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:29:12,108][00192] Avg episode reward: [(0, '4.477')]
[2024-07-29 13:29:17,106][00192] Fps is (10 sec: 3685.9, 60 sec: 3549.8, 300 sec: 2958.2). Total num frames: 266240. Throughput: 0: 897.5. Samples: 67028. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:29:17,110][00192] Avg episode reward: [(0, '4.486')]
[2024-07-29 13:29:22,011][02902] Updated weights for policy 0, policy_version 70 (0.0012)
[2024-07-29 13:29:22,107][00192] Fps is (10 sec: 4095.0, 60 sec: 3618.1, 300 sec: 3018.0). Total num frames: 286720. Throughput: 0: 924.6. Samples: 70232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:29:22,110][00192] Avg episode reward: [(0, '4.416')]
[2024-07-29 13:29:27,105][00192] Fps is (10 sec: 3277.2, 60 sec: 3481.6, 300 sec: 2990.1). Total num frames: 299008. Throughput: 0: 887.6. Samples: 74308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:29:27,108][00192] Avg episode reward: [(0, '4.213')]
[2024-07-29 13:29:27,149][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth...
[2024-07-29 13:29:32,107][00192] Fps is (10 sec: 3276.9, 60 sec: 3549.8, 300 sec: 3042.7). Total num frames: 319488. Throughput: 0: 884.1. Samples: 80192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:29:32,109][00192] Avg episode reward: [(0, '4.269')]
[2024-07-29 13:29:33,180][02902] Updated weights for policy 0, policy_version 80 (0.0029)
[2024-07-29 13:29:37,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3090.6). Total num frames: 339968. Throughput: 0: 909.5. Samples: 83324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:29:37,107][00192] Avg episode reward: [(0, '4.544')]
[2024-07-29 13:29:37,119][02889] Saving new best policy, reward=4.544!
[2024-07-29 13:29:42,105][00192] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3063.1). Total num frames: 352256. Throughput: 0: 902.1. Samples: 88072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:29:42,108][00192] Avg episode reward: [(0, '4.642')]
[2024-07-29 13:29:42,114][02889] Saving new best policy, reward=4.642!
[2024-07-29 13:29:45,644][02902] Updated weights for policy 0, policy_version 90 (0.0024)
[2024-07-29 13:29:47,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3106.1). Total num frames: 372736. Throughput: 0: 878.5. Samples: 93294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:29:47,115][00192] Avg episode reward: [(0, '4.505')]
[2024-07-29 13:29:52,105][00192] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3178.5). Total num frames: 397312. Throughput: 0: 893.8. Samples: 96576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:29:52,112][00192] Avg episode reward: [(0, '4.323')]
[2024-07-29 13:29:56,613][02902] Updated weights for policy 0, policy_version 100 (0.0022)
[2024-07-29 13:29:57,108][00192] Fps is (10 sec: 3685.2, 60 sec: 3617.9, 300 sec: 3150.7). Total num frames: 409600. Throughput: 0: 917.8. Samples: 101922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:29:57,111][00192] Avg episode reward: [(0, '4.564')]
[2024-07-29 13:30:02,105][00192] Fps is (10 sec: 2867.2, 60 sec: 3481.9, 300 sec: 3155.4). Total num frames: 425984. Throughput: 0: 879.8. Samples: 106618. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:30:02,107][00192] Avg episode reward: [(0, '4.719')]
[2024-07-29 13:30:02,112][02889] Saving new best policy, reward=4.719!
[2024-07-29 13:30:07,105][00192] Fps is (10 sec: 3687.6, 60 sec: 3618.1, 300 sec: 3189.0). Total num frames: 446464. Throughput: 0: 876.7. Samples: 109680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:30:07,108][00192] Avg episode reward: [(0, '4.576')]
[2024-07-29 13:30:07,553][02902] Updated weights for policy 0, policy_version 110 (0.0014)
[2024-07-29 13:30:12,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3220.3). Total num frames: 466944. Throughput: 0: 923.7. Samples: 115876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:30:12,110][00192] Avg episode reward: [(0, '4.426')]
[2024-07-29 13:30:17,105][00192] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3194.9). Total num frames: 479232. Throughput: 0: 882.4. Samples: 119900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:30:17,110][00192] Avg episode reward: [(0, '4.441')]
[2024-07-29 13:30:19,659][02902] Updated weights for policy 0, policy_version 120 (0.0017)
[2024-07-29 13:30:22,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3224.0). Total num frames: 499712. Throughput: 0: 877.8. Samples: 122826. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:30:22,106][00192] Avg episode reward: [(0, '4.397')]
[2024-07-29 13:30:27,105][00192] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3251.2). Total num frames: 520192. Throughput: 0: 911.2. Samples: 129076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:30:27,111][00192] Avg episode reward: [(0, '4.444')]
[2024-07-29 13:30:31,067][02902] Updated weights for policy 0, policy_version 130 (0.0013)
[2024-07-29 13:30:32,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3227.2). Total num frames: 532480. Throughput: 0: 896.3. Samples: 133626. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:30:32,110][00192] Avg episode reward: [(0, '4.541')]
[2024-07-29 13:30:37,105][00192] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3228.6). Total num frames: 548864. Throughput: 0: 874.3. Samples: 135918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:30:37,112][00192] Avg episode reward: [(0, '4.828')]
[2024-07-29 13:30:37,121][02889] Saving new best policy, reward=4.828!
[2024-07-29 13:30:42,105][00192] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3206.6). Total num frames: 561152. Throughput: 0: 840.7. Samples: 139752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:30:42,111][00192] Avg episode reward: [(0, '4.652')]
[2024-07-29 13:30:47,105][00192] Fps is (10 sec: 2048.0, 60 sec: 3276.8, 300 sec: 3163.0). Total num frames: 569344. Throughput: 0: 805.7. Samples: 142876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:30:47,116][00192] Avg episode reward: [(0, '4.584')]
[2024-07-29 13:30:47,692][02902] Updated weights for policy 0, policy_version 140 (0.0027)
[2024-07-29 13:30:52,105][00192] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3166.1). Total num frames: 585728. Throughput: 0: 781.5. Samples: 144846. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
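The first checkpoint saved above, checkpoint_000000074_303104.pth, encodes the policy version (74) and the total environment frames (303104 = 74 x 4096, i.e. one policy version per 4096-frame training batch at these settings; every later checkpoint in this log fits the same pattern). A small sketch for pulling both numbers back out of a checkpoint path, assuming only that naming convention:

    import re
    from pathlib import Path

    # Matches names like checkpoint_000000074_303104.pth (version, env frames).
    CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

    def parse_checkpoint_name(path):
        """Return (policy_version, env_frames) parsed from a checkpoint filename."""
        m = CKPT_RE.search(Path(path).name)
        if m is None:
            raise ValueError(f"not a checkpoint file: {path}")
        return int(m.group(1)), int(m.group(2))

    version, frames = parse_checkpoint_name(
        "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth"
    )
    assert (version, frames) == (74, 303104)
    assert frames == version * 4096  # 4096 env frames per policy version in this run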
[2024-07-29 13:30:52,110][00192] Avg episode reward: [(0, '4.464')]
[2024-07-29 13:30:57,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3277.0, 300 sec: 3190.6). Total num frames: 606208. Throughput: 0: 778.3. Samples: 150900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:30:57,109][00192] Avg episode reward: [(0, '4.647')]
[2024-07-29 13:30:58,127][02902] Updated weights for policy 0, policy_version 150 (0.0013)
[2024-07-29 13:31:02,105][00192] Fps is (10 sec: 4095.8, 60 sec: 3345.0, 300 sec: 3213.8). Total num frames: 626688. Throughput: 0: 823.0. Samples: 156936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:31:02,110][00192] Avg episode reward: [(0, '4.686')]
[2024-07-29 13:31:07,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3194.9). Total num frames: 638976. Throughput: 0: 804.0. Samples: 159008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:31:07,108][00192] Avg episode reward: [(0, '4.580')]
[2024-07-29 13:31:10,251][02902] Updated weights for policy 0, policy_version 160 (0.0020)
[2024-07-29 13:31:12,105][00192] Fps is (10 sec: 3686.6, 60 sec: 3276.8, 300 sec: 3236.8). Total num frames: 663552. Throughput: 0: 784.0. Samples: 164358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:31:12,107][00192] Avg episode reward: [(0, '4.710')]
[2024-07-29 13:31:17,112][00192] Fps is (10 sec: 4502.6, 60 sec: 3413.0, 300 sec: 3257.2). Total num frames: 684032. Throughput: 0: 823.7. Samples: 170698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:31:17,114][00192] Avg episode reward: [(0, '4.980')]
[2024-07-29 13:31:17,127][02889] Saving new best policy, reward=4.980!
[2024-07-29 13:31:21,806][02902] Updated weights for policy 0, policy_version 170 (0.0021)
[2024-07-29 13:31:22,105][00192] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3238.7). Total num frames: 696320. Throughput: 0: 820.9. Samples: 172858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:31:22,108][00192] Avg episode reward: [(0, '4.948')]
[2024-07-29 13:31:27,105][00192] Fps is (10 sec: 2869.1, 60 sec: 3208.5, 300 sec: 3239.6). Total num frames: 712704. Throughput: 0: 835.7. Samples: 177360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:31:27,113][00192] Avg episode reward: [(0, '4.765')]
[2024-07-29 13:31:27,128][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000174_712704.pth...
[2024-07-29 13:31:32,105][00192] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3258.6). Total num frames: 733184. Throughput: 0: 907.4. Samples: 183708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:31:32,109][00192] Avg episode reward: [(0, '5.054')]
[2024-07-29 13:31:32,111][02889] Saving new best policy, reward=5.054!
[2024-07-29 13:31:32,372][02902] Updated weights for policy 0, policy_version 180 (0.0029)
[2024-07-29 13:31:37,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3259.0). Total num frames: 749568. Throughput: 0: 928.7. Samples: 186638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:31:37,109][00192] Avg episode reward: [(0, '4.926')]
[2024-07-29 13:31:42,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3259.4). Total num frames: 765952. Throughput: 0: 884.7. Samples: 190712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:31:42,111][00192] Avg episode reward: [(0, '5.069')]
[2024-07-29 13:31:42,114][02889] Saving new best policy, reward=5.069!
[2024-07-29 13:31:44,387][02902] Updated weights for policy 0, policy_version 190 (0.0028)
[2024-07-29 13:31:47,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 786432. Throughput: 0: 890.5. Samples: 197010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:31:47,110][00192] Avg episode reward: [(0, '4.885')]
[2024-07-29 13:31:52,105][00192] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3293.5). Total num frames: 806912. Throughput: 0: 914.1. Samples: 200142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:31:52,109][00192] Avg episode reward: [(0, '4.610')]
[2024-07-29 13:31:56,211][02902] Updated weights for policy 0, policy_version 200 (0.0024)
[2024-07-29 13:31:57,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 819200. Throughput: 0: 892.6. Samples: 204524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:31:57,107][00192] Avg episode reward: [(0, '4.825')]
[2024-07-29 13:32:02,105][00192] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3292.9). Total num frames: 839680. Throughput: 0: 878.3. Samples: 210214. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:32:02,107][00192] Avg episode reward: [(0, '5.107')]
[2024-07-29 13:32:02,113][02889] Saving new best policy, reward=5.107!
[2024-07-29 13:32:06,041][02902] Updated weights for policy 0, policy_version 210 (0.0017)
[2024-07-29 13:32:07,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3308.3). Total num frames: 860160. Throughput: 0: 902.2. Samples: 213458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:32:07,107][00192] Avg episode reward: [(0, '4.825')]
[2024-07-29 13:32:12,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3307.7). Total num frames: 876544. Throughput: 0: 915.2. Samples: 218546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:32:12,107][00192] Avg episode reward: [(0, '4.715')]
[2024-07-29 13:32:17,106][00192] Fps is (10 sec: 3276.3, 60 sec: 3481.9, 300 sec: 3307.1). Total num frames: 892928. Throughput: 0: 884.3. Samples: 223502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:32:17,109][00192] Avg episode reward: [(0, '4.702')]
[2024-07-29 13:32:18,279][02902] Updated weights for policy 0, policy_version 220 (0.0015)
[2024-07-29 13:32:22,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3321.5). Total num frames: 913408. Throughput: 0: 889.5. Samples: 226666. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:32:22,106][00192] Avg episode reward: [(0, '4.686')]
[2024-07-29 13:32:27,105][00192] Fps is (10 sec: 3686.9, 60 sec: 3618.1, 300 sec: 3320.7). Total num frames: 929792. Throughput: 0: 930.3. Samples: 232574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:32:27,111][00192] Avg episode reward: [(0, '4.732')]
[2024-07-29 13:32:30,207][02902] Updated weights for policy 0, policy_version 230 (0.0014)
[2024-07-29 13:32:32,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3319.9). Total num frames: 946176. Throughput: 0: 884.7. Samples: 236822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:32:32,107][00192] Avg episode reward: [(0, '4.752')]
[2024-07-29 13:32:37,107][00192] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3333.3). Total num frames: 966656. Throughput: 0: 884.7. Samples: 239954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
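These progress lines have a fixed shape, which makes them easy to scrape when plotting reward and throughput curves offline. A rough sketch of such a scraper; the regexes are mine, written against the format above, not a parser shipped with Sample Factory:

    import re

    # One pattern per line type seen in this log.
    FPS_RE = re.compile(
        r"Fps is \(10 sec: ([\d.na]+), 60 sec: ([\d.na]+), 300 sec: ([\d.na]+)\)\. "
        r"Total num frames: (\d+)\."
    )
    REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([-\d.]+)'\)\]")

    def parse_sf_log(lines):
        """Yield (total_frames, avg_reward) pairs from an sf_log-style stream."""
        frames = None
        for line in lines:
            m = FPS_RE.search(line)
            if m:
                frames = int(m.group(4))
                continue
            m = REWARD_RE.search(line)
            if m and frames is not None:
                yield frames, float(m.group(1))

    # Example against two lines from this log:
    sample = [
        "[2024-07-29 13:32:37,107][00192] Fps is (10 sec: 3685.8, 60 sec: 3618.0, "
        "300 sec: 3333.3). Total num frames: 966656. Throughput: 0: 884.7. Samples: 239954.",
        "[2024-07-29 13:32:37,110][00192] Avg episode reward: [(0, '5.211')]",
    ]
    assert list(parse_sf_log(sample)) == [(966656, 5.211)]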
[2024-07-29 13:32:37,110][00192] Avg episode reward: [(0, '5.211')]
[2024-07-29 13:32:37,120][02889] Saving new best policy, reward=5.211!
[2024-07-29 13:32:40,200][02902] Updated weights for policy 0, policy_version 240 (0.0013)
[2024-07-29 13:32:42,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3346.2). Total num frames: 987136. Throughput: 0: 928.7. Samples: 246314. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:32:42,110][00192] Avg episode reward: [(0, '5.316')]
[2024-07-29 13:32:42,114][02889] Saving new best policy, reward=5.316!
[2024-07-29 13:32:47,112][00192] Fps is (10 sec: 3274.9, 60 sec: 3549.4, 300 sec: 3387.8). Total num frames: 999424. Throughput: 0: 896.7. Samples: 250574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:32:47,119][00192] Avg episode reward: [(0, '5.142')]
[2024-07-29 13:32:52,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 1019904. Throughput: 0: 883.7. Samples: 253226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:32:52,112][00192] Avg episode reward: [(0, '5.419')]
[2024-07-29 13:32:52,179][02889] Saving new best policy, reward=5.419!
[2024-07-29 13:32:52,187][02902] Updated weights for policy 0, policy_version 250 (0.0029)
[2024-07-29 13:32:57,105][00192] Fps is (10 sec: 4099.0, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 1040384. Throughput: 0: 908.7. Samples: 259438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:32:57,107][00192] Avg episode reward: [(0, '5.546')]
[2024-07-29 13:32:57,142][02889] Saving new best policy, reward=5.546!
[2024-07-29 13:33:02,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 1056768. Throughput: 0: 910.3. Samples: 264464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:33:02,111][00192] Avg episode reward: [(0, '5.644')]
[2024-07-29 13:33:02,126][02889] Saving new best policy, reward=5.644!
[2024-07-29 13:33:04,689][02902] Updated weights for policy 0, policy_version 260 (0.0012)
[2024-07-29 13:33:07,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1073152. Throughput: 0: 883.3. Samples: 266416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:33:07,108][00192] Avg episode reward: [(0, '6.013')]
[2024-07-29 13:33:07,119][02889] Saving new best policy, reward=6.013!
[2024-07-29 13:33:12,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 1093632. Throughput: 0: 889.0. Samples: 272580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:33:12,109][00192] Avg episode reward: [(0, '6.011')]
[2024-07-29 13:33:14,263][02902] Updated weights for policy 0, policy_version 270 (0.0016)
[2024-07-29 13:33:17,107][00192] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3540.6). Total num frames: 1114112. Throughput: 0: 923.4. Samples: 278378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:33:17,114][00192] Avg episode reward: [(0, '6.013')]
[2024-07-29 13:33:22,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1126400. Throughput: 0: 898.8. Samples: 280400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:33:22,113][00192] Avg episode reward: [(0, '5.972')]
[2024-07-29 13:33:26,502][02902] Updated weights for policy 0, policy_version 280 (0.0019)
[2024-07-29 13:33:27,105][00192] Fps is (10 sec: 3277.5, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 1146880. Throughput: 0: 880.0. Samples: 285916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:33:27,110][00192] Avg episode reward: [(0, '5.678')]
[2024-07-29 13:33:27,119][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000280_1146880.pth...
[2024-07-29 13:33:27,238][02889] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth
[2024-07-29 13:33:32,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1167360. Throughput: 0: 924.2. Samples: 292154. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:33:32,111][00192] Avg episode reward: [(0, '5.906')]
[2024-07-29 13:33:37,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 1179648. Throughput: 0: 909.2. Samples: 294142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:33:37,110][00192] Avg episode reward: [(0, '6.072')]
[2024-07-29 13:33:37,122][02889] Saving new best policy, reward=6.072!
[2024-07-29 13:33:38,801][02902] Updated weights for policy 0, policy_version 290 (0.0017)
[2024-07-29 13:33:42,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1200128. Throughput: 0: 882.2. Samples: 299138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:33:42,112][00192] Avg episode reward: [(0, '5.782')]
[2024-07-29 13:33:47,105][00192] Fps is (10 sec: 4505.6, 60 sec: 3755.1, 300 sec: 3568.4). Total num frames: 1224704. Throughput: 0: 914.6. Samples: 305620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:33:47,113][00192] Avg episode reward: [(0, '6.358')]
[2024-07-29 13:33:47,122][02889] Saving new best policy, reward=6.358!
[2024-07-29 13:33:48,210][02902] Updated weights for policy 0, policy_version 300 (0.0021)
[2024-07-29 13:33:52,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1236992. Throughput: 0: 930.6. Samples: 308292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:33:52,110][00192] Avg episode reward: [(0, '6.504')]
[2024-07-29 13:33:52,112][02889] Saving new best policy, reward=6.504!
[2024-07-29 13:33:57,105][00192] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.9). Total num frames: 1253376. Throughput: 0: 888.3. Samples: 312552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:33:57,108][00192] Avg episode reward: [(0, '6.722')]
[2024-07-29 13:33:57,118][02889] Saving new best policy, reward=6.722!
[2024-07-29 13:34:00,381][02902] Updated weights for policy 0, policy_version 310 (0.0028)
[2024-07-29 13:34:02,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1273856. Throughput: 0: 898.5. Samples: 318808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:34:02,111][00192] Avg episode reward: [(0, '6.473')]
[2024-07-29 13:34:07,105][00192] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1294336. Throughput: 0: 925.0. Samples: 322024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:34:07,108][00192] Avg episode reward: [(0, '6.472')]
[2024-07-29 13:34:12,105][00192] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 1306624. Throughput: 0: 897.6. Samples: 326310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
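At 13:33:27 the learner saves checkpoint_000000280_1146880.pth and immediately removes the oldest one, checkpoint_000000074_303104.pth: the run keeps only the newest few periodic checkpoints on disk (the best-reward policy is tracked separately). A minimal keep-last-N sketch of that retention policy, assuming the naming convention discussed earlier; this is illustrative, not Sample Factory's actual implementation:

    import os
    from pathlib import Path

    def prune_checkpoints(ckpt_dir, keep_last=2):
        """Delete all but the newest `keep_last` checkpoints in a directory.

        Relies on the zero-padded version in checkpoint_XXXXXXXXX_FRAMES.pth
        sorting lexicographically in training order.
        """
        ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
        for old in ckpts[:-keep_last]:
            os.remove(old)  # e.g. checkpoint_000000074_303104.pth goes first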
[2024-07-29 13:34:12,108][00192] Avg episode reward: [(0, '6.475')]
[2024-07-29 13:34:12,241][02902] Updated weights for policy 0, policy_version 320 (0.0022)
[2024-07-29 13:34:17,105][00192] Fps is (10 sec: 3686.5, 60 sec: 3618.3, 300 sec: 3540.6). Total num frames: 1331200. Throughput: 0: 891.8. Samples: 332284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:34:17,111][00192] Avg episode reward: [(0, '5.938')]
[2024-07-29 13:34:21,806][02902] Updated weights for policy 0, policy_version 330 (0.0015)
[2024-07-29 13:34:22,107][00192] Fps is (10 sec: 4504.7, 60 sec: 3754.5, 300 sec: 3568.4). Total num frames: 1351680. Throughput: 0: 920.4. Samples: 335560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:34:22,109][00192] Avg episode reward: [(0, '5.951')]
[2024-07-29 13:34:27,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1363968. Throughput: 0: 917.7. Samples: 340436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:34:27,112][00192] Avg episode reward: [(0, '5.903')]
[2024-07-29 13:34:32,105][00192] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1384448. Throughput: 0: 890.5. Samples: 345694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:34:32,107][00192] Avg episode reward: [(0, '6.619')]
[2024-07-29 13:34:33,999][02902] Updated weights for policy 0, policy_version 340 (0.0013)
[2024-07-29 13:34:37,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3568.4). Total num frames: 1404928. Throughput: 0: 903.6. Samples: 348956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:34:37,107][00192] Avg episode reward: [(0, '6.660')]
[2024-07-29 13:34:42,110][00192] Fps is (10 sec: 3684.5, 60 sec: 3686.1, 300 sec: 3554.4). Total num frames: 1421312. Throughput: 0: 934.6. Samples: 354614. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:34:42,112][00192] Avg episode reward: [(0, '6.720')]
[2024-07-29 13:34:45,812][02902] Updated weights for policy 0, policy_version 350 (0.0013)
[2024-07-29 13:34:47,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1437696. Throughput: 0: 899.4. Samples: 359282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:34:47,111][00192] Avg episode reward: [(0, '7.267')]
[2024-07-29 13:34:47,119][02889] Saving new best policy, reward=7.267!
[2024-07-29 13:34:52,105][00192] Fps is (10 sec: 3688.3, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1458176. Throughput: 0: 897.0. Samples: 362388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:34:52,112][00192] Avg episode reward: [(0, '7.790')]
[2024-07-29 13:34:52,115][02889] Saving new best policy, reward=7.790!
[2024-07-29 13:34:55,425][02902] Updated weights for policy 0, policy_version 360 (0.0014)
[2024-07-29 13:34:57,109][00192] Fps is (10 sec: 3684.8, 60 sec: 3686.1, 300 sec: 3554.4). Total num frames: 1474560. Throughput: 0: 943.4. Samples: 368768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:34:57,112][00192] Avg episode reward: [(0, '7.344')]
[2024-07-29 13:35:02,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1490944. Throughput: 0: 900.9. Samples: 372824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-29 13:35:02,111][00192] Avg episode reward: [(0, '7.521')]
[2024-07-29 13:35:07,105][00192] Fps is (10 sec: 3688.0, 60 sec: 3618.2, 300 sec: 3540.6). Total num frames: 1511424. Throughput: 0: 891.2. Samples: 375662. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:35:07,111][00192] Avg episode reward: [(0, '7.310')]
[2024-07-29 13:35:07,659][02902] Updated weights for policy 0, policy_version 370 (0.0012)
[2024-07-29 13:35:12,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3568.4). Total num frames: 1531904. Throughput: 0: 925.1. Samples: 382066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:35:12,111][00192] Avg episode reward: [(0, '8.374')]
[2024-07-29 13:35:12,115][02889] Saving new best policy, reward=8.374!
[2024-07-29 13:35:17,107][00192] Fps is (10 sec: 3276.0, 60 sec: 3549.7, 300 sec: 3540.6). Total num frames: 1544192. Throughput: 0: 912.2. Samples: 386744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:35:17,117][00192] Avg episode reward: [(0, '8.378')]
[2024-07-29 13:35:17,134][02889] Saving new best policy, reward=8.378!
[2024-07-29 13:35:19,915][02902] Updated weights for policy 0, policy_version 380 (0.0019)
[2024-07-29 13:35:22,105][00192] Fps is (10 sec: 3276.7, 60 sec: 3550.0, 300 sec: 3540.6). Total num frames: 1564672. Throughput: 0: 888.5. Samples: 388938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:35:22,111][00192] Avg episode reward: [(0, '8.369')]
[2024-07-29 13:35:27,105][00192] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1585152. Throughput: 0: 905.0. Samples: 395334. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:35:27,108][00192] Avg episode reward: [(0, '8.421')]
[2024-07-29 13:35:27,119][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000387_1585152.pth...
[2024-07-29 13:35:27,229][02889] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000174_712704.pth
[2024-07-29 13:35:27,246][02889] Saving new best policy, reward=8.421!
[2024-07-29 13:35:30,071][02902] Updated weights for policy 0, policy_version 390 (0.0030)
[2024-07-29 13:35:32,106][00192] Fps is (10 sec: 3686.1, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1601536. Throughput: 0: 918.9. Samples: 400634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:35:32,108][00192] Avg episode reward: [(0, '8.280')]
[2024-07-29 13:35:37,105][00192] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1617920. Throughput: 0: 895.4. Samples: 402682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-29 13:35:37,108][00192] Avg episode reward: [(0, '9.280')]
[2024-07-29 13:35:37,117][02889] Saving new best policy, reward=9.280!
[2024-07-29 13:35:41,755][02902] Updated weights for policy 0, policy_version 400 (0.0017)
[2024-07-29 13:35:42,105][00192] Fps is (10 sec: 3686.8, 60 sec: 3618.4, 300 sec: 3623.9). Total num frames: 1638400. Throughput: 0: 882.0. Samples: 408456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:35:42,108][00192] Avg episode reward: [(0, '10.200')]
[2024-07-29 13:35:42,112][02889] Saving new best policy, reward=10.200!
[2024-07-29 13:35:47,105][00192] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1658880. Throughput: 0: 929.1. Samples: 414632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:35:47,108][00192] Avg episode reward: [(0, '10.447')]
[2024-07-29 13:35:47,125][02889] Saving new best policy, reward=10.447!
[2024-07-29 13:35:52,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 1671168. Throughput: 0: 909.0. Samples: 416568. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:35:52,107][00192] Avg episode reward: [(0, '10.804')]
[2024-07-29 13:35:52,109][02889] Saving new best policy, reward=10.804!
[2024-07-29 13:35:53,892][02902] Updated weights for policy 0, policy_version 410 (0.0023)
[2024-07-29 13:35:57,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3610.0). Total num frames: 1691648. Throughput: 0: 882.1. Samples: 421760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:35:57,109][00192] Avg episode reward: [(0, '10.970')]
[2024-07-29 13:35:57,117][02889] Saving new best policy, reward=10.970!
[2024-07-29 13:36:02,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1712128. Throughput: 0: 919.6. Samples: 428124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:36:02,107][00192] Avg episode reward: [(0, '12.162')]
[2024-07-29 13:36:02,110][02889] Saving new best policy, reward=12.162!
[2024-07-29 13:36:03,914][02902] Updated weights for policy 0, policy_version 420 (0.0013)
[2024-07-29 13:36:07,110][00192] Fps is (10 sec: 3684.4, 60 sec: 3617.8, 300 sec: 3610.0). Total num frames: 1728512. Throughput: 0: 926.8. Samples: 430648. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:36:07,112][00192] Avg episode reward: [(0, '12.156')]
[2024-07-29 13:36:12,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 1744896. Throughput: 0: 880.8. Samples: 434972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:36:12,108][00192] Avg episode reward: [(0, '12.984')]
[2024-07-29 13:36:12,111][02889] Saving new best policy, reward=12.984!
[2024-07-29 13:36:15,594][02902] Updated weights for policy 0, policy_version 430 (0.0014)
[2024-07-29 13:36:17,105][00192] Fps is (10 sec: 3688.4, 60 sec: 3686.5, 300 sec: 3623.9). Total num frames: 1765376. Throughput: 0: 906.5. Samples: 441426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:36:17,107][00192] Avg episode reward: [(0, '13.839')]
[2024-07-29 13:36:17,122][02889] Saving new best policy, reward=13.839!
[2024-07-29 13:36:22,105][00192] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1781760. Throughput: 0: 932.1. Samples: 444626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:36:22,113][00192] Avg episode reward: [(0, '13.569')]
[2024-07-29 13:36:27,105][00192] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3610.0). Total num frames: 1798144. Throughput: 0: 898.2. Samples: 448874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:36:27,111][00192] Avg episode reward: [(0, '13.109')]
[2024-07-29 13:36:27,615][02902] Updated weights for policy 0, policy_version 440 (0.0025)
[2024-07-29 13:36:32,105][00192] Fps is (10 sec: 3686.6, 60 sec: 3618.2, 300 sec: 3623.9). Total num frames: 1818624. Throughput: 0: 894.5. Samples: 454886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:36:32,107][00192] Avg episode reward: [(0, '12.095')]
[2024-07-29 13:36:37,105][00192] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1839104. Throughput: 0: 923.5. Samples: 458124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:36:37,110][00192] Avg episode reward: [(0, '11.802')]
[2024-07-29 13:36:37,558][02902] Updated weights for policy 0, policy_version 450 (0.0020)
[2024-07-29 13:36:42,105][00192] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1855488. Throughput: 0: 918.0. Samples: 463070. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:36:42,115][00192] Avg episode reward: [(0, '11.436')]
[2024-07-29 13:36:47,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1875968. Throughput: 0: 894.2. Samples: 468362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:36:47,107][00192] Avg episode reward: [(0, '12.784')]
[2024-07-29 13:36:49,069][02902] Updated weights for policy 0, policy_version 460 (0.0018)
[2024-07-29 13:36:52,105][00192] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1896448. Throughput: 0: 911.4. Samples: 471656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:36:52,112][00192] Avg episode reward: [(0, '13.230')]
[2024-07-29 13:36:57,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1908736. Throughput: 0: 942.0. Samples: 477364. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:36:57,107][00192] Avg episode reward: [(0, '12.524')]
[2024-07-29 13:37:01,081][02902] Updated weights for policy 0, policy_version 470 (0.0014)
[2024-07-29 13:37:02,105][00192] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 1925120. Throughput: 0: 898.0. Samples: 481838. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:37:02,111][00192] Avg episode reward: [(0, '12.351')]
[2024-07-29 13:37:07,108][00192] Fps is (10 sec: 4094.9, 60 sec: 3686.6, 300 sec: 3637.8). Total num frames: 1949696. Throughput: 0: 899.4. Samples: 485100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:37:07,110][00192] Avg episode reward: [(0, '14.144')]
[2024-07-29 13:37:07,117][02889] Saving new best policy, reward=14.144!
[2024-07-29 13:37:11,258][02902] Updated weights for policy 0, policy_version 480 (0.0016)
[2024-07-29 13:37:12,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1966080. Throughput: 0: 942.3. Samples: 491278. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:37:12,108][00192] Avg episode reward: [(0, '13.223')]
[2024-07-29 13:37:17,105][00192] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1982464. Throughput: 0: 901.1. Samples: 495434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:37:17,107][00192] Avg episode reward: [(0, '13.131')]
[2024-07-29 13:37:22,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2002944. Throughput: 0: 895.2. Samples: 498408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:37:22,112][00192] Avg episode reward: [(0, '12.209')]
[2024-07-29 13:37:22,715][02902] Updated weights for policy 0, policy_version 490 (0.0020)
[2024-07-29 13:37:27,108][00192] Fps is (10 sec: 4094.6, 60 sec: 3754.5, 300 sec: 3651.6). Total num frames: 2023424. Throughput: 0: 930.1. Samples: 504926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:37:27,111][00192] Avg episode reward: [(0, '12.446')]
[2024-07-29 13:37:27,125][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000494_2023424.pth...
[2024-07-29 13:37:27,124][00192] Components not started: RolloutWorker_w3, wait_time=600.0 seconds
[2024-07-29 13:37:27,268][02889] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000280_1146880.pth
[2024-07-29 13:37:32,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2035712. Throughput: 0: 917.6. Samples: 509656. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
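The "Components not started: RolloutWorker_w3, wait_time=600.0 seconds" message is the worker that crashed at 13:27:49 surfacing again: roughly 600 s after launch the runner's startup grace period expires, the worker that never sent a heartbeat is reported missing, and training simply continues on the seven surviving rollout workers. A simplified sketch of the runner-side bookkeeping implied by the heartbeat and timeout messages; Sample Factory's actual implementation lives in its Runner, and all names here are illustrative:

    import time

    class StartupWatchdog:
        """Report components that never connected a heartbeat."""

        def __init__(self, expected, wait_time_s=600.0):
            self.expected = set(expected)   # e.g. {"RolloutWorker_w0", ...}
            self.connected = set()
            self.started_at = time.monotonic()
            self.wait_time_s = wait_time_s

        def on_heartbeat(self, component):
            # Corresponds to "Heartbeat connected on <component>" lines.
            self.connected.add(component)

        def missing_components(self):
            """After the grace period, everything still silent is reported."""
            if time.monotonic() - self.started_at < self.wait_time_s:
                return set()
            return self.expected - self.connected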
[2024-07-29 13:37:32,107][00192] Avg episode reward: [(0, '13.326')]
[2024-07-29 13:37:34,813][02902] Updated weights for policy 0, policy_version 500 (0.0019)
[2024-07-29 13:37:37,105][00192] Fps is (10 sec: 3277.9, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2056192. Throughput: 0: 892.8. Samples: 511830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:37:37,111][00192] Avg episode reward: [(0, '14.210')]
[2024-07-29 13:37:37,121][02889] Saving new best policy, reward=14.210!
[2024-07-29 13:37:42,105][00192] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3651.8). Total num frames: 2076672. Throughput: 0: 907.9. Samples: 518218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-07-29 13:37:42,108][00192] Avg episode reward: [(0, '13.959')]
[2024-07-29 13:37:44,283][02902] Updated weights for policy 0, policy_version 510 (0.0019)
[2024-07-29 13:37:47,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2093056. Throughput: 0: 932.0. Samples: 523776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-07-29 13:37:47,112][00192] Avg episode reward: [(0, '13.729')]
[2024-07-29 13:37:52,105][00192] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2109440. Throughput: 0: 905.4. Samples: 525840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:37:52,107][00192] Avg episode reward: [(0, '13.451')]
[2024-07-29 13:37:56,690][02902] Updated weights for policy 0, policy_version 520 (0.0029)
[2024-07-29 13:37:57,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2129920. Throughput: 0: 890.8. Samples: 531366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-07-29 13:37:57,109][00192] Avg episode reward: [(0, '13.965')]
[2024-07-29 13:38:02,109][00192] Fps is (10 sec: 3684.8, 60 sec: 3686.1, 300 sec: 3637.7). Total num frames: 2146304. Throughput: 0: 929.2. Samples: 537252. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-07-29 13:38:02,112][00192] Avg episode reward: [(0, '14.076')]
[2024-07-29 13:38:07,105][00192] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3610.0). Total num frames: 2158592. Throughput: 0: 906.4. Samples: 539196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:38:07,108][00192] Avg episode reward: [(0, '14.450')]
[2024-07-29 13:38:07,118][02889] Saving new best policy, reward=14.450!
[2024-07-29 13:38:09,303][02902] Updated weights for policy 0, policy_version 530 (0.0029)
[2024-07-29 13:38:12,105][00192] Fps is (10 sec: 3278.2, 60 sec: 3549.9, 300 sec: 3610.1). Total num frames: 2179072. Throughput: 0: 868.3. Samples: 543998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:38:12,111][00192] Avg episode reward: [(0, '14.926')]
[2024-07-29 13:38:12,114][02889] Saving new best policy, reward=14.926!
[2024-07-29 13:38:17,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2199552. Throughput: 0: 904.1. Samples: 550340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:38:17,110][00192] Avg episode reward: [(0, '13.307')]
[2024-07-29 13:38:19,664][02902] Updated weights for policy 0, policy_version 540 (0.0029)
[2024-07-29 13:38:22,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2215936. Throughput: 0: 914.8. Samples: 552998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:38:22,109][00192] Avg episode reward: [(0, '13.447')]
[2024-07-29 13:38:27,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3610.0). Total num frames: 2232320. Throughput: 0: 867.8. Samples: 557268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:38:27,111][00192] Avg episode reward: [(0, '13.271')]
[2024-07-29 13:38:31,408][02902] Updated weights for policy 0, policy_version 550 (0.0023)
[2024-07-29 13:38:32,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2252800. Throughput: 0: 880.5. Samples: 563400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:38:32,113][00192] Avg episode reward: [(0, '12.654')]
[2024-07-29 13:38:37,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2269184. Throughput: 0: 903.2. Samples: 566486. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-07-29 13:38:37,109][00192] Avg episode reward: [(0, '13.895')]
[2024-07-29 13:38:42,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 2285568. Throughput: 0: 872.0. Samples: 570608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-07-29 13:38:42,108][00192] Avg episode reward: [(0, '14.121')]
[2024-07-29 13:38:43,907][02902] Updated weights for policy 0, policy_version 560 (0.0014)
[2024-07-29 13:38:47,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2306048. Throughput: 0: 871.1. Samples: 576446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-07-29 13:38:47,108][00192] Avg episode reward: [(0, '13.606')]
[2024-07-29 13:38:52,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2326528. Throughput: 0: 900.4. Samples: 579714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-07-29 13:38:52,107][00192] Avg episode reward: [(0, '15.096')]
[2024-07-29 13:38:52,111][02889] Saving new best policy, reward=15.096!
[2024-07-29 13:38:54,413][02902] Updated weights for policy 0, policy_version 570 (0.0016)
[2024-07-29 13:38:57,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3610.0). Total num frames: 2338816. Throughput: 0: 901.1. Samples: 584546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:38:57,112][00192] Avg episode reward: [(0, '16.272')]
[2024-07-29 13:38:57,127][02889] Saving new best policy, reward=16.272!
[2024-07-29 13:39:02,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3610.0). Total num frames: 2359296. Throughput: 0: 871.9. Samples: 589576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:39:02,107][00192] Avg episode reward: [(0, '17.302')]
[2024-07-29 13:39:02,112][02889] Saving new best policy, reward=17.302!
[2024-07-29 13:39:05,601][02902] Updated weights for policy 0, policy_version 580 (0.0016)
[2024-07-29 13:39:07,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2379776. Throughput: 0: 883.5. Samples: 592754. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-07-29 13:39:07,110][00192] Avg episode reward: [(0, '18.064')]
[2024-07-29 13:39:07,121][02889] Saving new best policy, reward=18.064!
[2024-07-29 13:39:12,107][00192] Fps is (10 sec: 3685.5, 60 sec: 3618.0, 300 sec: 3610.0). Total num frames: 2396160. Throughput: 0: 913.4. Samples: 598374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:39:12,109][00192] Avg episode reward: [(0, '18.502')] [2024-07-29 13:39:12,111][02889] Saving new best policy, reward=18.502! [2024-07-29 13:39:17,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 2412544. Throughput: 0: 874.6. Samples: 602756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:39:17,108][00192] Avg episode reward: [(0, '19.547')] [2024-07-29 13:39:17,120][02889] Saving new best policy, reward=19.547! [2024-07-29 13:39:18,010][02902] Updated weights for policy 0, policy_version 590 (0.0025) [2024-07-29 13:39:22,107][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.0, 300 sec: 3623.9). Total num frames: 2433024. Throughput: 0: 875.8. Samples: 605898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:39:22,114][00192] Avg episode reward: [(0, '18.989')] [2024-07-29 13:39:27,110][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 2449408. Throughput: 0: 926.2. Samples: 612286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:39:27,116][00192] Avg episode reward: [(0, '18.556')] [2024-07-29 13:39:27,124][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000598_2449408.pth... [2024-07-29 13:39:27,260][02889] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000387_1585152.pth [2024-07-29 13:39:29,168][02902] Updated weights for policy 0, policy_version 600 (0.0023) [2024-07-29 13:39:32,105][00192] Fps is (10 sec: 3277.6, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 2465792. Throughput: 0: 885.4. Samples: 616288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:39:32,110][00192] Avg episode reward: [(0, '17.910')] [2024-07-29 13:39:37,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 2486272. Throughput: 0: 874.0. Samples: 619046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:39:37,107][00192] Avg episode reward: [(0, '19.064')] [2024-07-29 13:39:39,853][02902] Updated weights for policy 0, policy_version 610 (0.0020) [2024-07-29 13:39:42,105][00192] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 2506752. Throughput: 0: 910.8. Samples: 625530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-29 13:39:42,108][00192] Avg episode reward: [(0, '18.088')] [2024-07-29 13:39:47,109][00192] Fps is (10 sec: 3275.4, 60 sec: 3549.6, 300 sec: 3596.1). Total num frames: 2519040. Throughput: 0: 907.9. Samples: 630434. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:39:47,112][00192] Avg episode reward: [(0, '17.691')] [2024-07-29 13:39:51,788][02902] Updated weights for policy 0, policy_version 620 (0.0021) [2024-07-29 13:39:52,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.1). Total num frames: 2539520. Throughput: 0: 884.0. Samples: 632536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:39:52,112][00192] Avg episode reward: [(0, '17.624')] [2024-07-29 13:39:57,105][00192] Fps is (10 sec: 4097.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 2560000. Throughput: 0: 900.3. Samples: 638884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-29 13:39:57,107][00192] Avg episode reward: [(0, '17.064')] [2024-07-29 13:40:02,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 2576384. Throughput: 0: 928.0. Samples: 644514. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-29 13:40:02,108][00192] Avg episode reward: [(0, '17.726')] [2024-07-29 13:40:02,710][02902] Updated weights for policy 0, policy_version 630 (0.0018) [2024-07-29 13:40:07,105][00192] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3596.1). Total num frames: 2592768. Throughput: 0: 901.7. Samples: 646474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:40:07,108][00192] Avg episode reward: [(0, '18.236')] [2024-07-29 13:40:12,105][00192] Fps is (10 sec: 3686.3, 60 sec: 3618.3, 300 sec: 3623.9). Total num frames: 2613248. Throughput: 0: 887.5. Samples: 652224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:40:12,108][00192] Avg episode reward: [(0, '18.567')] [2024-07-29 13:40:13,640][02902] Updated weights for policy 0, policy_version 640 (0.0020) [2024-07-29 13:40:17,106][00192] Fps is (10 sec: 4095.7, 60 sec: 3686.3, 300 sec: 3623.9). Total num frames: 2633728. Throughput: 0: 937.5. Samples: 658476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:40:17,110][00192] Avg episode reward: [(0, '19.733')] [2024-07-29 13:40:17,120][02889] Saving new best policy, reward=19.733! [2024-07-29 13:40:22,105][00192] Fps is (10 sec: 3276.9, 60 sec: 3550.0, 300 sec: 3596.1). Total num frames: 2646016. Throughput: 0: 916.5. Samples: 660288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:40:22,107][00192] Avg episode reward: [(0, '20.855')] [2024-07-29 13:40:22,109][02889] Saving new best policy, reward=20.855! [2024-07-29 13:40:26,522][02902] Updated weights for policy 0, policy_version 650 (0.0012) [2024-07-29 13:40:27,105][00192] Fps is (10 sec: 2867.4, 60 sec: 3549.8, 300 sec: 3596.2). Total num frames: 2662400. Throughput: 0: 873.2. Samples: 664822. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:40:27,108][00192] Avg episode reward: [(0, '22.093')] [2024-07-29 13:40:27,117][02889] Saving new best policy, reward=22.093! [2024-07-29 13:40:32,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 2682880. Throughput: 0: 892.4. Samples: 670586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:40:32,109][00192] Avg episode reward: [(0, '22.017')] [2024-07-29 13:40:37,105][00192] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 2695168. Throughput: 0: 904.5. Samples: 673240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:40:37,109][00192] Avg episode reward: [(0, '22.629')] [2024-07-29 13:40:37,120][02889] Saving new best policy, reward=22.629! [2024-07-29 13:40:38,837][02902] Updated weights for policy 0, policy_version 660 (0.0014) [2024-07-29 13:40:42,105][00192] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3568.4). Total num frames: 2711552. Throughput: 0: 854.4. Samples: 677332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:40:42,111][00192] Avg episode reward: [(0, '22.793')] [2024-07-29 13:40:42,137][02889] Saving new best policy, reward=22.793! [2024-07-29 13:40:47,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3618.4, 300 sec: 3610.0). Total num frames: 2736128. Throughput: 0: 865.2. Samples: 683450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:40:47,108][00192] Avg episode reward: [(0, '22.519')] [2024-07-29 13:40:49,312][02902] Updated weights for policy 0, policy_version 670 (0.0023) [2024-07-29 13:40:52,106][00192] Fps is (10 sec: 3686.1, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 2748416. Throughput: 0: 891.6. Samples: 686598. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:40:52,112][00192] Avg episode reward: [(0, '21.981')] [2024-07-29 13:40:57,105][00192] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3568.4). Total num frames: 2764800. Throughput: 0: 855.2. Samples: 690710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-29 13:40:57,107][00192] Avg episode reward: [(0, '22.114')] [2024-07-29 13:41:01,344][02902] Updated weights for policy 0, policy_version 680 (0.0019) [2024-07-29 13:41:02,105][00192] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 2785280. Throughput: 0: 847.5. Samples: 696614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-29 13:41:02,107][00192] Avg episode reward: [(0, '22.262')] [2024-07-29 13:41:07,111][00192] Fps is (10 sec: 4093.4, 60 sec: 3549.5, 300 sec: 3596.1). Total num frames: 2805760. Throughput: 0: 876.9. Samples: 699756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:41:07,121][00192] Avg episode reward: [(0, '22.240')] [2024-07-29 13:41:12,105][00192] Fps is (10 sec: 3276.9, 60 sec: 3413.4, 300 sec: 3568.4). Total num frames: 2818048. Throughput: 0: 880.3. Samples: 704436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:41:12,107][00192] Avg episode reward: [(0, '20.550')] [2024-07-29 13:41:13,436][02902] Updated weights for policy 0, policy_version 690 (0.0012) [2024-07-29 13:41:17,105][00192] Fps is (10 sec: 3278.9, 60 sec: 3413.4, 300 sec: 3582.3). Total num frames: 2838528. Throughput: 0: 873.4. Samples: 709888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:41:17,112][00192] Avg episode reward: [(0, '19.884')] [2024-07-29 13:41:22,105][00192] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 2863104. Throughput: 0: 885.6. Samples: 713094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-29 13:41:22,108][00192] Avg episode reward: [(0, '20.318')] [2024-07-29 13:41:23,033][02902] Updated weights for policy 0, policy_version 700 (0.0013) [2024-07-29 13:41:27,108][00192] Fps is (10 sec: 3685.4, 60 sec: 3549.7, 300 sec: 3582.2). Total num frames: 2875392. Throughput: 0: 915.0. Samples: 718508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:41:27,110][00192] Avg episode reward: [(0, '19.829')] [2024-07-29 13:41:27,123][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000702_2875392.pth... [2024-07-29 13:41:27,275][02889] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000494_2023424.pth [2024-07-29 13:41:32,105][00192] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2891776. Throughput: 0: 875.6. Samples: 722854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:41:32,112][00192] Avg episode reward: [(0, '20.480')] [2024-07-29 13:41:35,597][02902] Updated weights for policy 0, policy_version 710 (0.0020) [2024-07-29 13:41:37,105][00192] Fps is (10 sec: 3687.3, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 2912256. Throughput: 0: 874.9. Samples: 725968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:41:37,108][00192] Avg episode reward: [(0, '20.581')] [2024-07-29 13:41:42,105][00192] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2928640. Throughput: 0: 915.3. Samples: 731900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:41:42,107][00192] Avg episode reward: [(0, '21.274')] [2024-07-29 13:41:47,105][00192] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3540.6). 
Total num frames: 2940928. Throughput: 0: 875.0. Samples: 735990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-29 13:41:47,107][00192] Avg episode reward: [(0, '22.031')] [2024-07-29 13:41:47,999][02902] Updated weights for policy 0, policy_version 720 (0.0018) [2024-07-29 13:41:52,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 2965504. Throughput: 0: 872.6. Samples: 739016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:41:52,109][00192] Avg episode reward: [(0, '21.469')] [2024-07-29 13:41:57,105][00192] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 2985984. Throughput: 0: 912.0. Samples: 745476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-07-29 13:41:57,107][00192] Avg episode reward: [(0, '20.788')] [2024-07-29 13:41:58,071][02902] Updated weights for policy 0, policy_version 730 (0.0026) [2024-07-29 13:42:02,105][00192] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2998272. Throughput: 0: 893.6. Samples: 750100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:42:02,110][00192] Avg episode reward: [(0, '20.797')] [2024-07-29 13:42:07,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3550.2, 300 sec: 3568.4). Total num frames: 3018752. Throughput: 0: 874.6. Samples: 752452. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:42:07,110][00192] Avg episode reward: [(0, '20.762')] [2024-07-29 13:42:09,702][02902] Updated weights for policy 0, policy_version 740 (0.0014) [2024-07-29 13:42:12,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3039232. Throughput: 0: 896.7. Samples: 758856. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:42:12,111][00192] Avg episode reward: [(0, '21.119')] [2024-07-29 13:42:17,108][00192] Fps is (10 sec: 3685.4, 60 sec: 3618.0, 300 sec: 3568.3). Total num frames: 3055616. Throughput: 0: 919.2. Samples: 764222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:42:17,110][00192] Avg episode reward: [(0, '20.351')] [2024-07-29 13:42:21,612][02902] Updated weights for policy 0, policy_version 750 (0.0013) [2024-07-29 13:42:22,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 3072000. Throughput: 0: 895.9. Samples: 766282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:42:22,109][00192] Avg episode reward: [(0, '19.886')] [2024-07-29 13:42:27,105][00192] Fps is (10 sec: 3687.4, 60 sec: 3618.3, 300 sec: 3582.3). Total num frames: 3092480. Throughput: 0: 896.4. Samples: 772238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:42:27,112][00192] Avg episode reward: [(0, '20.178')] [2024-07-29 13:42:31,697][02902] Updated weights for policy 0, policy_version 760 (0.0017) [2024-07-29 13:42:32,109][00192] Fps is (10 sec: 4094.2, 60 sec: 3686.1, 300 sec: 3582.2). Total num frames: 3112960. Throughput: 0: 938.0. Samples: 778206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:42:32,112][00192] Avg episode reward: [(0, '21.462')] [2024-07-29 13:42:37,105][00192] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3125248. Throughput: 0: 914.5. Samples: 780168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-29 13:42:37,110][00192] Avg episode reward: [(0, '20.783')] [2024-07-29 13:42:42,105][00192] Fps is (10 sec: 3278.2, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3145728. Throughput: 0: 883.5. Samples: 785232. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:42:42,109][00192] Avg episode reward: [(0, '20.987')] [2024-07-29 13:42:43,718][02902] Updated weights for policy 0, policy_version 770 (0.0020) [2024-07-29 13:42:47,105][00192] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 3166208. Throughput: 0: 922.4. Samples: 791608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:42:47,108][00192] Avg episode reward: [(0, '22.979')] [2024-07-29 13:42:47,118][02889] Saving new best policy, reward=22.979! [2024-07-29 13:42:52,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3178496. Throughput: 0: 924.0. Samples: 794034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:42:52,111][00192] Avg episode reward: [(0, '24.407')] [2024-07-29 13:42:52,190][02889] Saving new best policy, reward=24.407! [2024-07-29 13:42:56,073][02902] Updated weights for policy 0, policy_version 780 (0.0029) [2024-07-29 13:42:57,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3198976. Throughput: 0: 877.2. Samples: 798328. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:42:57,109][00192] Avg episode reward: [(0, '23.245')] [2024-07-29 13:43:02,107][00192] Fps is (10 sec: 4095.3, 60 sec: 3686.3, 300 sec: 3596.1). Total num frames: 3219456. Throughput: 0: 900.8. Samples: 804756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:43:02,111][00192] Avg episode reward: [(0, '22.767')] [2024-07-29 13:43:06,498][02902] Updated weights for policy 0, policy_version 790 (0.0015) [2024-07-29 13:43:07,108][00192] Fps is (10 sec: 3685.3, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 3235840. Throughput: 0: 924.8. Samples: 807900. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-07-29 13:43:07,110][00192] Avg episode reward: [(0, '22.917')] [2024-07-29 13:43:12,105][00192] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3252224. Throughput: 0: 883.1. Samples: 811976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:43:12,117][00192] Avg episode reward: [(0, '22.945')] [2024-07-29 13:43:17,105][00192] Fps is (10 sec: 3687.5, 60 sec: 3618.3, 300 sec: 3582.3). Total num frames: 3272704. Throughput: 0: 886.3. Samples: 818084. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:43:17,112][00192] Avg episode reward: [(0, '21.518')] [2024-07-29 13:43:17,549][02902] Updated weights for policy 0, policy_version 800 (0.0016) [2024-07-29 13:43:22,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3293184. Throughput: 0: 914.2. Samples: 821306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:43:22,110][00192] Avg episode reward: [(0, '22.104')] [2024-07-29 13:43:27,107][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3305472. Throughput: 0: 908.3. Samples: 826106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:43:27,111][00192] Avg episode reward: [(0, '22.229')] [2024-07-29 13:43:27,124][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000807_3305472.pth... [2024-07-29 13:43:27,279][02889] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000598_2449408.pth [2024-07-29 13:43:29,740][02902] Updated weights for policy 0, policy_version 810 (0.0021) [2024-07-29 13:43:32,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3582.3). Total num frames: 3325952. 
Throughput: 0: 884.8. Samples: 831426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-29 13:43:32,107][00192] Avg episode reward: [(0, '23.954')] [2024-07-29 13:43:37,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3346432. Throughput: 0: 898.2. Samples: 834452. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:43:37,113][00192] Avg episode reward: [(0, '23.741')] [2024-07-29 13:43:40,427][02902] Updated weights for policy 0, policy_version 820 (0.0016) [2024-07-29 13:43:42,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3362816. Throughput: 0: 926.8. Samples: 840032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:43:42,112][00192] Avg episode reward: [(0, '23.507')] [2024-07-29 13:43:47,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3379200. Throughput: 0: 886.4. Samples: 844644. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:43:47,111][00192] Avg episode reward: [(0, '23.161')] [2024-07-29 13:43:51,626][02902] Updated weights for policy 0, policy_version 830 (0.0024) [2024-07-29 13:43:52,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3399680. Throughput: 0: 888.5. Samples: 847882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:43:52,112][00192] Avg episode reward: [(0, '22.812')] [2024-07-29 13:43:57,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3416064. Throughput: 0: 934.3. Samples: 854020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:43:57,113][00192] Avg episode reward: [(0, '22.180')] [2024-07-29 13:44:02,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3568.4). Total num frames: 3432448. Throughput: 0: 889.8. Samples: 858124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:44:02,108][00192] Avg episode reward: [(0, '22.758')] [2024-07-29 13:44:03,820][02902] Updated weights for policy 0, policy_version 840 (0.0019) [2024-07-29 13:44:07,105][00192] Fps is (10 sec: 3686.3, 60 sec: 3618.3, 300 sec: 3582.3). Total num frames: 3452928. Throughput: 0: 881.9. Samples: 860994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-29 13:44:07,107][00192] Avg episode reward: [(0, '23.711')] [2024-07-29 13:44:12,105][00192] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3473408. Throughput: 0: 918.1. Samples: 867420. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:44:12,111][00192] Avg episode reward: [(0, '23.835')] [2024-07-29 13:44:14,356][02902] Updated weights for policy 0, policy_version 850 (0.0015) [2024-07-29 13:44:17,105][00192] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3485696. Throughput: 0: 904.4. Samples: 872122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:44:17,110][00192] Avg episode reward: [(0, '23.282')] [2024-07-29 13:44:22,105][00192] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3506176. Throughput: 0: 888.4. Samples: 874432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:44:22,110][00192] Avg episode reward: [(0, '23.670')] [2024-07-29 13:44:25,496][02902] Updated weights for policy 0, policy_version 860 (0.0013) [2024-07-29 13:44:27,105][00192] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3526656. Throughput: 0: 906.0. Samples: 880804. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:44:27,112][00192] Avg episode reward: [(0, '23.986')] [2024-07-29 13:44:32,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3543040. Throughput: 0: 922.6. Samples: 886162. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:44:32,108][00192] Avg episode reward: [(0, '23.315')] [2024-07-29 13:44:37,105][00192] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3559424. Throughput: 0: 895.2. Samples: 888168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:44:37,110][00192] Avg episode reward: [(0, '22.301')] [2024-07-29 13:44:37,462][02902] Updated weights for policy 0, policy_version 870 (0.0014) [2024-07-29 13:44:42,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3579904. Throughput: 0: 893.0. Samples: 894206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:44:42,111][00192] Avg episode reward: [(0, '23.262')] [2024-07-29 13:44:47,106][00192] Fps is (10 sec: 4095.4, 60 sec: 3686.3, 300 sec: 3596.1). Total num frames: 3600384. Throughput: 0: 937.9. Samples: 900330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:44:47,109][00192] Avg episode reward: [(0, '22.227')] [2024-07-29 13:44:48,253][02902] Updated weights for policy 0, policy_version 880 (0.0013) [2024-07-29 13:44:52,105][00192] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3612672. Throughput: 0: 919.8. Samples: 902386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-29 13:44:52,108][00192] Avg episode reward: [(0, '21.421')] [2024-07-29 13:44:57,105][00192] Fps is (10 sec: 3277.3, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3633152. Throughput: 0: 894.0. Samples: 907650. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:44:57,107][00192] Avg episode reward: [(0, '21.625')] [2024-07-29 13:44:59,020][02902] Updated weights for policy 0, policy_version 890 (0.0020) [2024-07-29 13:45:02,105][00192] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 3657728. Throughput: 0: 930.7. Samples: 914002. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-07-29 13:45:02,108][00192] Avg episode reward: [(0, '21.979')] [2024-07-29 13:45:07,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3670016. Throughput: 0: 932.1. Samples: 916378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-29 13:45:07,107][00192] Avg episode reward: [(0, '21.076')] [2024-07-29 13:45:11,473][02902] Updated weights for policy 0, policy_version 900 (0.0019) [2024-07-29 13:45:12,105][00192] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3686400. Throughput: 0: 888.4. Samples: 920782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:45:12,107][00192] Avg episode reward: [(0, '21.826')] [2024-07-29 13:45:17,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3706880. Throughput: 0: 912.3. Samples: 927214. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:45:17,111][00192] Avg episode reward: [(0, '22.956')] [2024-07-29 13:45:22,052][02902] Updated weights for policy 0, policy_version 910 (0.0014) [2024-07-29 13:45:22,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3727360. Throughput: 0: 938.4. Samples: 930398. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:45:22,107][00192] Avg episode reward: [(0, '25.612')] [2024-07-29 13:45:22,109][02889] Saving new best policy, reward=25.612! [2024-07-29 13:45:27,112][00192] Fps is (10 sec: 3274.5, 60 sec: 3549.5, 300 sec: 3582.2). Total num frames: 3739648. Throughput: 0: 894.4. Samples: 934460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-29 13:45:27,140][00192] Avg episode reward: [(0, '26.403')] [2024-07-29 13:45:27,208][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000914_3743744.pth... [2024-07-29 13:45:27,330][02889] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000702_2875392.pth [2024-07-29 13:45:27,347][02889] Saving new best policy, reward=26.403! [2024-07-29 13:45:32,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3764224. Throughput: 0: 891.5. Samples: 940446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-29 13:45:32,109][00192] Avg episode reward: [(0, '26.271')] [2024-07-29 13:45:33,053][02902] Updated weights for policy 0, policy_version 920 (0.0023) [2024-07-29 13:45:37,106][00192] Fps is (10 sec: 4098.4, 60 sec: 3686.3, 300 sec: 3623.9). Total num frames: 3780608. Throughput: 0: 917.1. Samples: 943656. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:45:37,109][00192] Avg episode reward: [(0, '25.922')] [2024-07-29 13:45:42,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3796992. Throughput: 0: 904.7. Samples: 948362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:45:42,113][00192] Avg episode reward: [(0, '25.838')] [2024-07-29 13:45:45,072][02902] Updated weights for policy 0, policy_version 930 (0.0047) [2024-07-29 13:45:47,105][00192] Fps is (10 sec: 3277.2, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3813376. Throughput: 0: 885.4. Samples: 953844. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:45:47,111][00192] Avg episode reward: [(0, '24.809')] [2024-07-29 13:45:52,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3637.8). Total num frames: 3837952. Throughput: 0: 904.9. Samples: 957098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-29 13:45:52,107][00192] Avg episode reward: [(0, '23.260')] [2024-07-29 13:45:56,064][02902] Updated weights for policy 0, policy_version 940 (0.0015) [2024-07-29 13:45:57,105][00192] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3850240. Throughput: 0: 925.8. Samples: 962442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:45:57,112][00192] Avg episode reward: [(0, '22.836')] [2024-07-29 13:46:02,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.1). Total num frames: 3870720. Throughput: 0: 888.7. Samples: 967204. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:46:02,107][00192] Avg episode reward: [(0, '22.659')] [2024-07-29 13:46:07,105][00192] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3887104. Throughput: 0: 886.4. Samples: 970286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-29 13:46:07,107][00192] Avg episode reward: [(0, '22.867')] [2024-07-29 13:46:07,122][02902] Updated weights for policy 0, policy_version 950 (0.0013) [2024-07-29 13:46:12,106][00192] Fps is (10 sec: 3686.1, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3907584. Throughput: 0: 931.4. Samples: 976366. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-29 13:46:12,111][00192] Avg episode reward: [(0, '23.331')] [2024-07-29 13:46:17,105][00192] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3923968. Throughput: 0: 888.5. Samples: 980428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:46:17,110][00192] Avg episode reward: [(0, '23.181')] [2024-07-29 13:46:19,016][02902] Updated weights for policy 0, policy_version 960 (0.0015) [2024-07-29 13:46:22,105][00192] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3624.0). Total num frames: 3944448. Throughput: 0: 889.0. Samples: 983660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:46:22,109][00192] Avg episode reward: [(0, '22.674')] [2024-07-29 13:46:27,105][00192] Fps is (10 sec: 4096.0, 60 sec: 3755.1, 300 sec: 3637.8). Total num frames: 3964928. Throughput: 0: 928.2. Samples: 990130. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-29 13:46:27,107][00192] Avg episode reward: [(0, '22.382')] [2024-07-29 13:46:29,988][02902] Updated weights for policy 0, policy_version 970 (0.0019) [2024-07-29 13:46:32,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3977216. Throughput: 0: 905.8. Samples: 994606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-29 13:46:32,112][00192] Avg episode reward: [(0, '22.097')] [2024-07-29 13:46:37,105][00192] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3623.9). Total num frames: 3997696. Throughput: 0: 887.7. Samples: 997046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-29 13:46:37,112][00192] Avg episode reward: [(0, '22.042')] [2024-07-29 13:46:38,924][02889] Stopping Batcher_0... [2024-07-29 13:46:38,926][02889] Loop batcher_evt_loop terminating... [2024-07-29 13:46:38,927][00192] Component Batcher_0 stopped! [2024-07-29 13:46:38,933][00192] Component RolloutWorker_w3 process died already! Don't wait for it. [2024-07-29 13:46:38,947][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-29 13:46:38,966][02902] Weights refcount: 2 0 [2024-07-29 13:46:38,968][00192] Component InferenceWorker_p0-w0 stopped! [2024-07-29 13:46:38,968][02902] Stopping InferenceWorker_p0-w0... [2024-07-29 13:46:38,978][02902] Loop inference_proc0-0_evt_loop terminating... [2024-07-29 13:46:38,996][02908] Stopping RolloutWorker_w5... [2024-07-29 13:46:39,001][02910] Stopping RolloutWorker_w7... [2024-07-29 13:46:38,997][02908] Loop rollout_proc5_evt_loop terminating... [2024-07-29 13:46:38,996][00192] Component RolloutWorker_w5 stopped! [2024-07-29 13:46:39,006][00192] Component RolloutWorker_w7 stopped! [2024-07-29 13:46:39,003][02910] Loop rollout_proc7_evt_loop terminating... [2024-07-29 13:46:39,016][02904] Stopping RolloutWorker_w1... [2024-07-29 13:46:39,016][00192] Component RolloutWorker_w1 stopped! [2024-07-29 13:46:39,017][02904] Loop rollout_proc1_evt_loop terminating... [2024-07-29 13:46:39,056][00192] Component RolloutWorker_w0 stopped! [2024-07-29 13:46:39,069][00192] Component RolloutWorker_w6 stopped! [2024-07-29 13:46:39,061][02903] Stopping RolloutWorker_w0... [2024-07-29 13:46:39,074][02909] Stopping RolloutWorker_w6... [2024-07-29 13:46:39,075][02903] Loop rollout_proc0_evt_loop terminating... [2024-07-29 13:46:39,076][02909] Loop rollout_proc6_evt_loop terminating... [2024-07-29 13:46:39,085][00192] Component RolloutWorker_w2 stopped! [2024-07-29 13:46:39,091][02905] Stopping RolloutWorker_w2... 
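A brief aside on the shutdown choreography in the records on either side of this note: it is a standard two-phase multiprocessing pattern. Every component is signalled to stop, and the runner then joins each child process, except rollout_proc3, whose worker crashed during startup and is therefore reported as "process died already". A rough sketch under assumed names (shutdown, a plain dict of processes), not Sample Factory's actual event-loop code:

    import multiprocessing as mp

    def shutdown(processes: dict[str, mp.Process], timeout: float = 5.0) -> None:
        # Phase 1: signal every still-running component to stop.
        for name, proc in processes.items():
            if not proc.is_alive():
                print(f"Component {name} process died already! Don't wait for it.")
                continue
            print(f"Stopping {name}...")
            proc.terminate()  # stand-in for the cooperative stop signal

        # Phase 2: wait for the survivors to exit.
        for name, proc in processes.items():
            if proc.is_alive():
                print(f"Waiting for process {name} to join...")
                proc.join(timeout)

The remaining worker and learner records continue below.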
[2024-07-29 13:46:39,092][02905] Loop rollout_proc2_evt_loop terminating...
[2024-07-29 13:46:39,096][00192] Component RolloutWorker_w4 stopped!
[2024-07-29 13:46:39,101][02907] Stopping RolloutWorker_w4...
[2024-07-29 13:46:39,102][02907] Loop rollout_proc4_evt_loop terminating...
[2024-07-29 13:46:39,124][02889] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000807_3305472.pth
[2024-07-29 13:46:39,139][02889] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-07-29 13:46:39,331][00192] Component LearnerWorker_p0 stopped!
[2024-07-29 13:46:39,341][00192] Waiting for process learner_proc0 to stop...
[2024-07-29 13:46:39,349][02889] Stopping LearnerWorker_p0...
[2024-07-29 13:46:39,350][02889] Loop learner_proc0_evt_loop terminating...
[2024-07-29 13:46:40,846][00192] Waiting for process inference_proc0-0 to join...
[2024-07-29 13:46:41,064][00192] Waiting for process rollout_proc0 to join...
[2024-07-29 13:46:42,167][00192] Waiting for process rollout_proc1 to join...
[2024-07-29 13:46:42,173][00192] Waiting for process rollout_proc2 to join...
[2024-07-29 13:46:42,178][00192] Waiting for process rollout_proc3 to join...
[2024-07-29 13:46:42,182][00192] Waiting for process rollout_proc4 to join...
[2024-07-29 13:46:42,187][00192] Waiting for process rollout_proc5 to join...
[2024-07-29 13:46:42,191][00192] Waiting for process rollout_proc6 to join...
[2024-07-29 13:46:42,195][00192] Waiting for process rollout_proc7 to join...
[2024-07-29 13:46:42,199][00192] Batcher 0 profile tree view:
batching: 25.0944, releasing_batches: 0.0215
[2024-07-29 13:46:42,201][00192] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0005
  wait_policy_total: 513.3013
update_model: 8.0399
weight_update: 0.0014
one_step: 0.0025
handle_policy_step: 565.7336
  deserialize: 14.9377, stack: 3.1011, obs_to_device_normalize: 119.5024, forward: 287.0884, send_messages: 24.6671
  prepare_outputs: 88.0301
    to_cpu: 55.5410
[2024-07-29 13:46:42,204][00192] Learner 0 profile tree view:
misc: 0.0071, prepare_batch: 16.0965
train: 72.9537
  epoch_init: 0.0060, minibatch_init: 0.0160, losses_postprocess: 0.5131, kl_divergence: 0.5515, after_optimizer: 33.3259
  calculate_losses: 24.0265
    losses_init: 0.0117, forward_head: 1.6793, bptt_initial: 15.5821, tail: 1.0355, advantages_returns: 0.2805, losses: 2.8870
    bptt: 2.2151
      bptt_forward_core: 2.1359
  update: 13.9095
    clip: 1.4158
[2024-07-29 13:46:42,205][00192] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4204, enqueue_policy_requests: 124.3667, env_step: 887.2499, overhead: 15.5139, complete_rollouts: 8.6006
save_policy_outputs: 26.5353
  split_output_tensors: 8.8402
[2024-07-29 13:46:42,207][00192] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3027, enqueue_policy_requests: 182.8087, env_step: 825.7647, overhead: 15.2829, complete_rollouts: 5.9377
save_policy_outputs: 24.6140
  split_output_tensors: 8.6679
[2024-07-29 13:46:42,209][00192] Loop Runner_EvtLoop terminating...
[2024-07-29 13:46:42,212][00192] Runner profile tree view:
main_loop: 1151.3426
[2024-07-29 13:46:42,214][00192] Collected {0: 4005888}, FPS: 3479.3
[2024-07-29 13:48:25,701][00192] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-07-29 13:48:25,704][00192] Overriding arg 'num_workers' with value 1 passed from command line
[2024-07-29 13:48:25,707][00192] Adding new argument 'no_render'=True that is not in the saved config file!
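Two bookkeeping details in the records above are worth unpacking before the log moves on to evaluation. (The "Overriding arg" / "Adding new argument" records that continue directly below simply show the enjoy script merging command-line flags into the training config it loaded from config.json.)

First, checkpointing. Each "Saving .../checkpoint_*.pth" record is paired with a "Removing" of the oldest file, i.e. a rolling window of recent checkpoints, while the "Saving new best policy, reward=...!" records earlier in the run maintain a separate best-so-far file keyed on average episode reward. A minimal sketch of that bookkeeping, with invented names (CheckpointKeeper, save_fn), not Sample Factory's actual internals:

    import os
    from collections import deque

    class CheckpointKeeper:
        """Keep the newest `keep_last` checkpoints and track the best reward.

        A sketch of the behaviour visible in the log, not Sample Factory's code.
        """

        def __init__(self, ckpt_dir, keep_last=2):
            self.ckpt_dir = ckpt_dir
            self.keep_last = keep_last
            self.recent = deque()
            self.best_reward = float("-inf")

        def save(self, policy_version, env_steps, save_fn):
            # Filenames match the log: checkpoint_000000978_4005888.pth
            path = os.path.join(
                self.ckpt_dir, f"checkpoint_{policy_version:09d}_{env_steps}.pth"
            )
            print(f"Saving {path}...")
            save_fn(path)
            self.recent.append(path)
            while len(self.recent) > self.keep_last:
                old = self.recent.popleft()
                print(f"Removing {old}")
                os.remove(old)

        def maybe_save_best(self, reward, save_fn):
            if reward > self.best_reward:
                self.best_reward = reward
                print(f"Saving new best policy, reward={reward:.3f}!")
                save_fn(os.path.join(self.ckpt_dir, "best.pth"))

Second, the headline throughput is internally consistent with the runner profile: the reported FPS is just the total collected frames divided by the main-loop wall time.

    total_frames = 4_005_888       # from "Collected {0: 4005888}"
    main_loop_seconds = 1151.3426  # from the Runner profile tree
    print(round(total_frames / main_loop_seconds, 1))  # 3479.3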
[2024-07-29 13:48:25,711][00192] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-29 13:48:25,712][00192] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-29 13:48:25,716][00192] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-29 13:48:25,717][00192] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-29 13:48:25,718][00192] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-29 13:48:25,719][00192] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-29 13:48:25,724][00192] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-29 13:48:25,724][00192] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-29 13:48:25,725][00192] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-29 13:48:25,726][00192] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-29 13:48:25,727][00192] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-29 13:48:25,728][00192] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-29 13:48:25,754][00192] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-29 13:48:25,757][00192] RunningMeanStd input shape: (3, 72, 128) [2024-07-29 13:48:25,759][00192] RunningMeanStd input shape: (1,) [2024-07-29 13:48:25,779][00192] ConvEncoder: input_channels=3 [2024-07-29 13:48:25,960][00192] Conv encoder output size: 512 [2024-07-29 13:48:25,962][00192] Policy head output size: 512 [2024-07-29 13:48:27,982][00192] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-29 13:48:28,834][00192] Num frames 100... [2024-07-29 13:48:28,959][00192] Num frames 200... [2024-07-29 13:48:29,089][00192] Num frames 300... [2024-07-29 13:48:29,215][00192] Num frames 400... [2024-07-29 13:48:29,340][00192] Num frames 500... [2024-07-29 13:48:29,467][00192] Num frames 600... [2024-07-29 13:48:29,594][00192] Num frames 700... [2024-07-29 13:48:29,723][00192] Num frames 800... [2024-07-29 13:48:29,905][00192] Avg episode rewards: #0: 18.960, true rewards: #0: 8.960 [2024-07-29 13:48:29,906][00192] Avg episode reward: 18.960, avg true_objective: 8.960 [2024-07-29 13:48:29,915][00192] Num frames 900... [2024-07-29 13:48:30,038][00192] Num frames 1000... [2024-07-29 13:48:30,165][00192] Num frames 1100... [2024-07-29 13:48:30,288][00192] Num frames 1200... [2024-07-29 13:48:30,413][00192] Num frames 1300... [2024-07-29 13:48:30,533][00192] Num frames 1400... [2024-07-29 13:48:30,664][00192] Num frames 1500... [2024-07-29 13:48:30,801][00192] Avg episode rewards: #0: 15.340, true rewards: #0: 7.840 [2024-07-29 13:48:30,802][00192] Avg episode reward: 15.340, avg true_objective: 7.840 [2024-07-29 13:48:30,846][00192] Num frames 1600... [2024-07-29 13:48:30,969][00192] Num frames 1700... [2024-07-29 13:48:31,092][00192] Num frames 1800... [2024-07-29 13:48:31,212][00192] Num frames 1900... [2024-07-29 13:48:31,338][00192] Num frames 2000... [2024-07-29 13:48:31,461][00192] Num frames 2100... [2024-07-29 13:48:31,590][00192] Num frames 2200... [2024-07-29 13:48:31,713][00192] Num frames 2300... [2024-07-29 13:48:31,846][00192] Num frames 2400... 
[2024-07-29 13:48:31,898][00192] Avg episode rewards: #0: 15.667, true rewards: #0: 8.000 [2024-07-29 13:48:31,900][00192] Avg episode reward: 15.667, avg true_objective: 8.000 [2024-07-29 13:48:32,022][00192] Num frames 2500... [2024-07-29 13:48:32,145][00192] Num frames 2600... [2024-07-29 13:48:32,273][00192] Num frames 2700... [2024-07-29 13:48:32,387][00192] Avg episode rewards: #0: 12.863, true rewards: #0: 6.862 [2024-07-29 13:48:32,389][00192] Avg episode reward: 12.863, avg true_objective: 6.862 [2024-07-29 13:48:32,457][00192] Num frames 2800... [2024-07-29 13:48:32,591][00192] Num frames 2900... [2024-07-29 13:48:32,714][00192] Num frames 3000... [2024-07-29 13:48:32,847][00192] Num frames 3100... [2024-07-29 13:48:33,020][00192] Avg episode rewards: #0: 12.186, true rewards: #0: 6.386 [2024-07-29 13:48:33,022][00192] Avg episode reward: 12.186, avg true_objective: 6.386 [2024-07-29 13:48:33,032][00192] Num frames 3200... [2024-07-29 13:48:33,153][00192] Num frames 3300... [2024-07-29 13:48:33,273][00192] Num frames 3400... [2024-07-29 13:48:33,402][00192] Num frames 3500... [2024-07-29 13:48:33,491][00192] Avg episode rewards: #0: 11.212, true rewards: #0: 5.878 [2024-07-29 13:48:33,493][00192] Avg episode reward: 11.212, avg true_objective: 5.878 [2024-07-29 13:48:33,596][00192] Num frames 3600... [2024-07-29 13:48:33,719][00192] Num frames 3700... [2024-07-29 13:48:33,851][00192] Num frames 3800... [2024-07-29 13:48:33,978][00192] Num frames 3900... [2024-07-29 13:48:34,117][00192] Num frames 4000... [2024-07-29 13:48:34,256][00192] Num frames 4100... [2024-07-29 13:48:34,400][00192] Num frames 4200... [2024-07-29 13:48:34,529][00192] Num frames 4300... [2024-07-29 13:48:34,661][00192] Num frames 4400... [2024-07-29 13:48:34,841][00192] Avg episode rewards: #0: 12.713, true rewards: #0: 6.427 [2024-07-29 13:48:34,845][00192] Avg episode reward: 12.713, avg true_objective: 6.427 [2024-07-29 13:48:34,849][00192] Num frames 4500... [2024-07-29 13:48:34,982][00192] Num frames 4600... [2024-07-29 13:48:35,118][00192] Num frames 4700... [2024-07-29 13:48:35,242][00192] Num frames 4800... [2024-07-29 13:48:35,373][00192] Num frames 4900... [2024-07-29 13:48:35,495][00192] Num frames 5000... [2024-07-29 13:48:35,626][00192] Num frames 5100... [2024-07-29 13:48:35,753][00192] Num frames 5200... [2024-07-29 13:48:35,889][00192] Num frames 5300... [2024-07-29 13:48:36,013][00192] Num frames 5400... [2024-07-29 13:48:36,146][00192] Avg episode rewards: #0: 13.953, true rewards: #0: 6.827 [2024-07-29 13:48:36,148][00192] Avg episode reward: 13.953, avg true_objective: 6.827 [2024-07-29 13:48:36,199][00192] Num frames 5500... [2024-07-29 13:48:36,327][00192] Num frames 5600... [2024-07-29 13:48:36,454][00192] Num frames 5700... [2024-07-29 13:48:36,591][00192] Num frames 5800... [2024-07-29 13:48:36,665][00192] Avg episode rewards: #0: 12.905, true rewards: #0: 6.460 [2024-07-29 13:48:36,666][00192] Avg episode reward: 12.905, avg true_objective: 6.460 [2024-07-29 13:48:36,772][00192] Num frames 5900... [2024-07-29 13:48:36,904][00192] Num frames 6000... [2024-07-29 13:48:37,032][00192] Num frames 6100... [2024-07-29 13:48:37,155][00192] Num frames 6200... [2024-07-29 13:48:37,278][00192] Num frames 6300... [2024-07-29 13:48:37,410][00192] Num frames 6400... [2024-07-29 13:48:37,538][00192] Num frames 6500... [2024-07-29 13:48:37,668][00192] Num frames 6600... [2024-07-29 13:48:37,852][00192] Num frames 6700... [2024-07-29 13:48:38,035][00192] Num frames 6800... 
[2024-07-29 13:48:38,206][00192] Num frames 6900... [2024-07-29 13:48:38,382][00192] Num frames 7000... [2024-07-29 13:48:38,495][00192] Avg episode rewards: #0: 14.030, true rewards: #0: 7.030 [2024-07-29 13:48:38,497][00192] Avg episode reward: 14.030, avg true_objective: 7.030 [2024-07-29 13:49:20,999][00192] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-07-29 13:51:06,385][00192] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-29 13:51:06,387][00192] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-29 13:51:06,390][00192] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-29 13:51:06,392][00192] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-29 13:51:06,393][00192] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-29 13:51:06,395][00192] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-29 13:51:06,396][00192] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-07-29 13:51:06,397][00192] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-29 13:51:06,399][00192] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-07-29 13:51:06,400][00192] Adding new argument 'hf_repository'='SwarajRay/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-07-29 13:51:06,401][00192] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-29 13:51:06,402][00192] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-29 13:51:06,403][00192] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-29 13:51:06,404][00192] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-29 13:51:06,405][00192] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-29 13:51:06,413][00192] RunningMeanStd input shape: (3, 72, 128) [2024-07-29 13:51:06,419][00192] RunningMeanStd input shape: (1,) [2024-07-29 13:51:06,433][00192] ConvEncoder: input_channels=3 [2024-07-29 13:51:06,468][00192] Conv encoder output size: 512 [2024-07-29 13:51:06,470][00192] Policy head output size: 512 [2024-07-29 13:51:06,487][00192] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-29 13:51:06,964][00192] Num frames 100... [2024-07-29 13:51:07,087][00192] Num frames 200... [2024-07-29 13:51:07,233][00192] Num frames 300... [2024-07-29 13:51:07,375][00192] Num frames 400... [2024-07-29 13:51:07,501][00192] Num frames 500... [2024-07-29 13:51:07,629][00192] Num frames 600... [2024-07-29 13:51:07,757][00192] Num frames 700... [2024-07-29 13:51:07,883][00192] Num frames 800... [2024-07-29 13:51:08,012][00192] Num frames 900... [2024-07-29 13:51:08,134][00192] Num frames 1000... [2024-07-29 13:51:08,255][00192] Num frames 1100... [2024-07-29 13:51:08,389][00192] Num frames 1200... [2024-07-29 13:51:08,516][00192] Num frames 1300... [2024-07-29 13:51:08,653][00192] Num frames 1400... [2024-07-29 13:51:08,823][00192] Avg episode rewards: #0: 35.930, true rewards: #0: 14.930 [2024-07-29 13:51:08,825][00192] Avg episode reward: 35.930, avg true_objective: 14.930 [2024-07-29 13:51:08,837][00192] Num frames 1500... 
[2024-07-29 13:51:08,957][00192] Num frames 1600... [2024-07-29 13:51:09,081][00192] Num frames 1700... [2024-07-29 13:51:09,201][00192] Num frames 1800... [2024-07-29 13:51:09,323][00192] Num frames 1900... [2024-07-29 13:51:09,455][00192] Num frames 2000... [2024-07-29 13:51:09,587][00192] Num frames 2100... [2024-07-29 13:51:09,713][00192] Num frames 2200... [2024-07-29 13:51:09,848][00192] Num frames 2300... [2024-07-29 13:51:09,971][00192] Num frames 2400... [2024-07-29 13:51:10,092][00192] Num frames 2500... [2024-07-29 13:51:10,216][00192] Num frames 2600... [2024-07-29 13:51:10,342][00192] Num frames 2700... [2024-07-29 13:51:10,475][00192] Num frames 2800... [2024-07-29 13:51:10,553][00192] Avg episode rewards: #0: 35.585, true rewards: #0: 14.085 [2024-07-29 13:51:10,555][00192] Avg episode reward: 35.585, avg true_objective: 14.085 [2024-07-29 13:51:10,662][00192] Num frames 2900... [2024-07-29 13:51:10,789][00192] Num frames 3000... [2024-07-29 13:51:10,914][00192] Num frames 3100... [2024-07-29 13:51:11,039][00192] Num frames 3200... [2024-07-29 13:51:11,173][00192] Num frames 3300... [2024-07-29 13:51:11,298][00192] Num frames 3400... [2024-07-29 13:51:11,428][00192] Num frames 3500... [2024-07-29 13:51:11,558][00192] Num frames 3600... [2024-07-29 13:51:11,684][00192] Num frames 3700... [2024-07-29 13:51:11,813][00192] Num frames 3800... [2024-07-29 13:51:11,906][00192] Avg episode rewards: #0: 31.093, true rewards: #0: 12.760 [2024-07-29 13:51:11,908][00192] Avg episode reward: 31.093, avg true_objective: 12.760 [2024-07-29 13:51:11,997][00192] Num frames 3900... [2024-07-29 13:51:12,116][00192] Num frames 4000... [2024-07-29 13:51:12,239][00192] Num frames 4100... [2024-07-29 13:51:12,363][00192] Num frames 4200... [2024-07-29 13:51:12,493][00192] Num frames 4300... [2024-07-29 13:51:12,624][00192] Num frames 4400... [2024-07-29 13:51:12,750][00192] Num frames 4500... [2024-07-29 13:51:12,922][00192] Avg episode rewards: #0: 26.990, true rewards: #0: 11.490 [2024-07-29 13:51:12,924][00192] Avg episode reward: 26.990, avg true_objective: 11.490 [2024-07-29 13:51:12,933][00192] Num frames 4600... [2024-07-29 13:51:13,082][00192] Num frames 4700... [2024-07-29 13:51:13,261][00192] Num frames 4800... [2024-07-29 13:51:13,437][00192] Num frames 4900... [2024-07-29 13:51:13,623][00192] Num frames 5000... [2024-07-29 13:51:13,800][00192] Num frames 5100... [2024-07-29 13:51:13,974][00192] Num frames 5200... [2024-07-29 13:51:14,145][00192] Num frames 5300... [2024-07-29 13:51:14,323][00192] Num frames 5400... [2024-07-29 13:51:14,513][00192] Num frames 5500... [2024-07-29 13:51:14,704][00192] Num frames 5600... [2024-07-29 13:51:14,850][00192] Avg episode rewards: #0: 26.904, true rewards: #0: 11.304 [2024-07-29 13:51:14,852][00192] Avg episode reward: 26.904, avg true_objective: 11.304 [2024-07-29 13:51:14,953][00192] Num frames 5700... [2024-07-29 13:51:15,130][00192] Num frames 5800... [2024-07-29 13:51:15,311][00192] Num frames 5900... [2024-07-29 13:51:15,469][00192] Num frames 6000... [2024-07-29 13:51:15,606][00192] Num frames 6100... [2024-07-29 13:51:15,729][00192] Num frames 6200... [2024-07-29 13:51:15,858][00192] Num frames 6300... [2024-07-29 13:51:15,981][00192] Num frames 6400... [2024-07-29 13:51:16,104][00192] Num frames 6500... [2024-07-29 13:51:16,228][00192] Num frames 6600... [2024-07-29 13:51:16,358][00192] Num frames 6700... [2024-07-29 13:51:16,483][00192] Num frames 6800... [2024-07-29 13:51:16,624][00192] Num frames 6900... 
[2024-07-29 13:51:16,746][00192] Num frames 7000... [2024-07-29 13:51:16,875][00192] Num frames 7100... [2024-07-29 13:51:16,993][00192] Avg episode rewards: #0: 28.585, true rewards: #0: 11.918 [2024-07-29 13:51:16,995][00192] Avg episode reward: 28.585, avg true_objective: 11.918 [2024-07-29 13:51:17,058][00192] Num frames 7200... [2024-07-29 13:51:17,180][00192] Num frames 7300... [2024-07-29 13:51:17,308][00192] Num frames 7400... [2024-07-29 13:51:17,436][00192] Num frames 7500... [2024-07-29 13:51:17,572][00192] Num frames 7600... [2024-07-29 13:51:17,708][00192] Num frames 7700... [2024-07-29 13:51:17,833][00192] Num frames 7800... [2024-07-29 13:51:17,958][00192] Num frames 7900... [2024-07-29 13:51:18,087][00192] Num frames 8000... [2024-07-29 13:51:18,215][00192] Num frames 8100... [2024-07-29 13:51:18,339][00192] Num frames 8200... [2024-07-29 13:51:18,465][00192] Num frames 8300... [2024-07-29 13:51:18,597][00192] Num frames 8400... [2024-07-29 13:51:18,732][00192] Num frames 8500... [2024-07-29 13:51:18,899][00192] Avg episode rewards: #0: 28.987, true rewards: #0: 12.273 [2024-07-29 13:51:18,903][00192] Avg episode reward: 28.987, avg true_objective: 12.273 [2024-07-29 13:51:18,917][00192] Num frames 8600... [2024-07-29 13:51:19,040][00192] Num frames 8700... [2024-07-29 13:51:19,165][00192] Num frames 8800... [2024-07-29 13:51:19,290][00192] Num frames 8900... [2024-07-29 13:51:19,412][00192] Num frames 9000... [2024-07-29 13:51:19,541][00192] Num frames 9100... [2024-07-29 13:51:19,686][00192] Num frames 9200... [2024-07-29 13:51:19,809][00192] Num frames 9300... [2024-07-29 13:51:19,933][00192] Num frames 9400... [2024-07-29 13:51:20,104][00192] Avg episode rewards: #0: 27.486, true rewards: #0: 11.861 [2024-07-29 13:51:20,107][00192] Avg episode reward: 27.486, avg true_objective: 11.861 [2024-07-29 13:51:20,123][00192] Num frames 9500... [2024-07-29 13:51:20,245][00192] Num frames 9600... [2024-07-29 13:51:20,373][00192] Num frames 9700... [2024-07-29 13:51:20,505][00192] Num frames 9800... [2024-07-29 13:51:20,635][00192] Num frames 9900... [2024-07-29 13:51:20,764][00192] Num frames 10000... [2024-07-29 13:51:20,892][00192] Num frames 10100... [2024-07-29 13:51:21,018][00192] Num frames 10200... [2024-07-29 13:51:21,141][00192] Num frames 10300... [2024-07-29 13:51:21,263][00192] Num frames 10400... [2024-07-29 13:51:21,385][00192] Num frames 10500... [2024-07-29 13:51:21,511][00192] Num frames 10600... [2024-07-29 13:51:21,663][00192] Avg episode rewards: #0: 27.081, true rewards: #0: 11.859 [2024-07-29 13:51:21,665][00192] Avg episode reward: 27.081, avg true_objective: 11.859 [2024-07-29 13:51:21,705][00192] Num frames 10700... [2024-07-29 13:51:21,839][00192] Num frames 10800... [2024-07-29 13:51:21,959][00192] Num frames 10900... [2024-07-29 13:51:22,082][00192] Num frames 11000... [2024-07-29 13:51:22,204][00192] Num frames 11100... [2024-07-29 13:51:22,331][00192] Num frames 11200... [2024-07-29 13:51:22,455][00192] Num frames 11300... [2024-07-29 13:51:22,593][00192] Num frames 11400... [2024-07-29 13:51:22,715][00192] Num frames 11500... [2024-07-29 13:51:22,867][00192] Avg episode rewards: #0: 26.369, true rewards: #0: 11.569 [2024-07-29 13:51:22,869][00192] Avg episode reward: 26.369, avg true_objective: 11.569 [2024-07-29 13:52:30,920][00192] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
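A closing note on reading the two evaluation passes above. Each "Avg episode rewards" record is a running mean over the episodes completed so far, and the paired "true rewards" / "avg true_objective" value tracks the environment's raw objective rather than the shaped training reward. Individual episode returns can therefore be recovered from consecutive running means; a small illustration using the first three summaries of the first pass:

    def episode_rewards(running_means):
        # Invert a running mean: reward_n = n * mean_n - (n - 1) * mean_{n-1}.
        rewards = []
        for n, mean in enumerate(running_means, start=1):
            prev = running_means[n - 2] if n > 1 else 0.0
            rewards.append(n * mean - (n - 1) * prev)
        return rewards

    print(episode_rewards([18.960, 15.340, 15.667]))  # ≈ [18.96, 11.72, 16.32]

Finally, judging by the argument records ('push_to_hub'=True, 'hf_repository'='SwarajRay/rl_course_vizdoom_health_gathering_supreme', 'max_num_frames'=100000), the second pass is the one that evaluated the final checkpoint, re-recorded replay.mp4, and uploaded everything to the Hugging Face Hub. Its invocation would have looked roughly like the following; the module path and the --env value are inferred from the log and the repository name, so treat this as a sketch rather than the exact command:

    python -m sf_examples.vizdoom.enjoy_vizdoom \
        --env=doom_health_gathering_supreme \
        --num_workers=1 \
        --no_render \
        --save_video \
        --max_num_frames=100000 \
        --max_num_episodes=10 \
        --train_dir=/content/train_dir \
        --experiment=default_experiment \
        --push_to_hub \
        --hf_repository=SwarajRay/rl_course_vizdoom_health_gathering_supreme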