[2024-12-22 17:22:39,322][00403] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-22 17:22:39,325][00403] Rollout worker 0 uses device cpu
[2024-12-22 17:22:39,327][00403] Rollout worker 1 uses device cpu
[2024-12-22 17:22:39,328][00403] Rollout worker 2 uses device cpu
[2024-12-22 17:22:39,330][00403] Rollout worker 3 uses device cpu
[2024-12-22 17:22:39,331][00403] Rollout worker 4 uses device cpu
[2024-12-22 17:22:39,332][00403] Rollout worker 5 uses device cpu
[2024-12-22 17:22:39,333][00403] Rollout worker 6 uses device cpu
[2024-12-22 17:22:39,334][00403] Rollout worker 7 uses device cpu
[2024-12-22 17:22:39,479][00403] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-22 17:22:39,481][00403] InferenceWorker_p0-w0: min num requests: 2
[2024-12-22 17:22:39,512][00403] Starting all processes...
[2024-12-22 17:22:39,513][00403] Starting process learner_proc0
[2024-12-22 17:22:39,561][00403] Starting all processes...
[2024-12-22 17:22:39,570][00403] Starting process inference_proc0-0
[2024-12-22 17:22:39,570][00403] Starting process rollout_proc0
[2024-12-22 17:22:39,573][00403] Starting process rollout_proc1
[2024-12-22 17:22:39,573][00403] Starting process rollout_proc2
[2024-12-22 17:22:39,573][00403] Starting process rollout_proc3
[2024-12-22 17:22:39,573][00403] Starting process rollout_proc4
[2024-12-22 17:22:39,573][00403] Starting process rollout_proc5
[2024-12-22 17:22:39,573][00403] Starting process rollout_proc6
[2024-12-22 17:22:39,573][00403] Starting process rollout_proc7
[2024-12-22 17:22:56,417][03281] Worker 4 uses CPU cores [0]
[2024-12-22 17:22:56,544][03279] Worker 2 uses CPU cores [0]
[2024-12-22 17:22:56,613][03283] Worker 7 uses CPU cores [1]
[2024-12-22 17:22:56,732][03277] Worker 0 uses CPU cores [0]
[2024-12-22 17:22:56,761][03278] Worker 1 uses CPU cores [1]
[2024-12-22 17:22:56,771][03263] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-22 17:22:56,776][03263] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-22 17:22:56,805][03280] Worker 3 uses CPU cores [1]
[2024-12-22 17:22:56,823][03263] Num visible devices: 1
[2024-12-22 17:22:56,841][03282] Worker 6 uses CPU cores [0]
[2024-12-22 17:22:56,854][03263] Starting seed is not provided
[2024-12-22 17:22:56,855][03263] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-22 17:22:56,856][03263] Initializing actor-critic model on device cuda:0
[2024-12-22 17:22:56,857][03263] RunningMeanStd input shape: (3, 72, 128)
[2024-12-22 17:22:56,860][03263] RunningMeanStd input shape: (1,)
[2024-12-22 17:22:56,859][03276] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-22 17:22:56,866][03276] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-22 17:22:56,894][03263] ConvEncoder: input_channels=3
[2024-12-22 17:22:56,920][03284] Worker 5 uses CPU cores [1]
[2024-12-22 17:22:56,922][03276] Num visible devices: 1
[2024-12-22 17:22:57,387][03263] Conv encoder output size: 512
[2024-12-22 17:22:57,388][03263] Policy head output size: 512
[2024-12-22 17:22:57,473][03263] Created Actor Critic model with architecture:
[2024-12-22 17:22:57,474][03263] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-22 17:22:58,018][03263] Using optimizer
[2024-12-22 17:22:59,478][00403] Heartbeat connected on Batcher_0
[2024-12-22 17:22:59,483][00403] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-22 17:22:59,487][00403] Heartbeat connected on RolloutWorker_w0
[2024-12-22 17:22:59,492][00403] Heartbeat connected on RolloutWorker_w1
[2024-12-22 17:22:59,495][00403] Heartbeat connected on RolloutWorker_w2
[2024-12-22 17:22:59,498][00403] Heartbeat connected on RolloutWorker_w3
[2024-12-22 17:22:59,501][00403] Heartbeat connected on RolloutWorker_w4
[2024-12-22 17:22:59,505][00403] Heartbeat connected on RolloutWorker_w5
[2024-12-22 17:22:59,508][00403] Heartbeat connected on RolloutWorker_w6
[2024-12-22 17:22:59,512][00403] Heartbeat connected on RolloutWorker_w7
[2024-12-22 17:23:01,438][03263] No checkpoints found
[2024-12-22 17:23:01,438][03263] Did not load from checkpoint, starting from scratch!
[2024-12-22 17:23:01,439][03263] Initialized policy 0 weights for model version 0
[2024-12-22 17:23:01,442][03263] LearnerWorker_p0 finished initialization!
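The printed architecture fixes the sizes of the recurrent core and the two output heads: a GRU(512, 512), a critic head Linear(512, 1), and an action head Linear(512, 5). Their parameter counts can be checked with a little arithmetic. This is a verification sketch, not part of the training run; the only assumption is PyTorch's GRU layout (3 gates, each with input and recurrent weight matrices plus separate b_ih and b_hh bias vectors). The conv layers are omitted because their kernel sizes and strides are not shown in the log.

```python
def linear_params(n_in, n_out, bias=True):
    """Parameters in a Linear(n_in, n_out) layer: weight matrix + optional bias."""
    return n_in * n_out + (n_out if bias else 0)

def gru_params(n_in, n_hidden):
    """Parameters in a single-layer GRU(n_in, n_hidden): 3 gates, each with an
    input weight (n_hidden x n_in), a recurrent weight (n_hidden x n_hidden),
    and two bias vectors (PyTorch keeps separate b_ih and b_hh)."""
    return 3 * (n_in * n_hidden + n_hidden * n_hidden + 2 * n_hidden)

core = gru_params(512, 512)      # (core): GRU(512, 512)      -> 1,575,936
critic = linear_params(512, 1)   # (critic_linear)            -> 513
policy = linear_params(512, 5)   # (distribution_linear)      -> 2,565
print(core, critic, policy)
```

So the shared trunk's recurrent core dominates the heads by three orders of magnitude, which is typical for this architecture.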
[2024-12-22 17:23:01,444][03263] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-22 17:23:01,443][00403] Heartbeat connected on LearnerWorker_p0
[2024-12-22 17:23:01,540][03276] RunningMeanStd input shape: (3, 72, 128)
[2024-12-22 17:23:01,542][03276] RunningMeanStd input shape: (1,)
[2024-12-22 17:23:01,556][03276] ConvEncoder: input_channels=3
[2024-12-22 17:23:01,657][03276] Conv encoder output size: 512
[2024-12-22 17:23:01,657][03276] Policy head output size: 512
[2024-12-22 17:23:01,713][00403] Inference worker 0-0 is ready!
[2024-12-22 17:23:01,715][00403] All inference workers are ready! Signal rollout workers to start!
[2024-12-22 17:23:01,913][03280] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-22 17:23:01,914][03278] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-22 17:23:01,907][03283] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-22 17:23:01,916][03284] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-22 17:23:01,935][03279] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-22 17:23:01,941][03277] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-22 17:23:01,939][03282] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-22 17:23:01,945][03281] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-22 17:23:03,540][03283] Decorrelating experience for 0 frames...
[2024-12-22 17:23:03,550][03278] Decorrelating experience for 0 frames...
[2024-12-22 17:23:03,552][03280] Decorrelating experience for 0 frames...
[2024-12-22 17:23:03,636][00403] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-22 17:23:04,030][03279] Decorrelating experience for 0 frames...
[2024-12-22 17:23:04,025][03277] Decorrelating experience for 0 frames...
[2024-12-22 17:23:04,031][03282] Decorrelating experience for 0 frames...
[2024-12-22 17:23:04,038][03281] Decorrelating experience for 0 frames...
[2024-12-22 17:23:05,218][03280] Decorrelating experience for 32 frames...
[2024-12-22 17:23:05,222][03283] Decorrelating experience for 32 frames...
[2024-12-22 17:23:05,230][03284] Decorrelating experience for 0 frames...
[2024-12-22 17:23:05,425][03277] Decorrelating experience for 32 frames...
[2024-12-22 17:23:05,441][03281] Decorrelating experience for 32 frames...
[2024-12-22 17:23:06,606][03278] Decorrelating experience for 32 frames...
[2024-12-22 17:23:06,598][03284] Decorrelating experience for 32 frames...
[2024-12-22 17:23:07,332][03282] Decorrelating experience for 32 frames...
[2024-12-22 17:23:07,340][03279] Decorrelating experience for 32 frames...
[2024-12-22 17:23:08,023][03277] Decorrelating experience for 64 frames...
[2024-12-22 17:23:08,014][03283] Decorrelating experience for 64 frames...
[2024-12-22 17:23:08,035][03281] Decorrelating experience for 64 frames...
[2024-12-22 17:23:08,636][00403] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-22 17:23:08,819][03279] Decorrelating experience for 64 frames...
[2024-12-22 17:23:08,903][03280] Decorrelating experience for 64 frames...
[2024-12-22 17:23:09,629][03278] Decorrelating experience for 64 frames...
[2024-12-22 17:23:09,634][03284] Decorrelating experience for 64 frames...
[2024-12-22 17:23:09,728][03283] Decorrelating experience for 96 frames...
[2024-12-22 17:23:09,904][03277] Decorrelating experience for 96 frames...
[2024-12-22 17:23:10,008][03279] Decorrelating experience for 96 frames...
[2024-12-22 17:23:10,101][03282] Decorrelating experience for 64 frames...
[2024-12-22 17:23:10,630][03280] Decorrelating experience for 96 frames...
[2024-12-22 17:23:11,136][03284] Decorrelating experience for 96 frames...
[2024-12-22 17:23:11,139][03278] Decorrelating experience for 96 frames...
[2024-12-22 17:23:11,231][03281] Decorrelating experience for 96 frames...
[2024-12-22 17:23:11,418][03282] Decorrelating experience for 96 frames...
[2024-12-22 17:23:13,636][00403] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 3.6. Samples: 36. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-22 17:23:13,638][00403] Avg episode reward: [(0, '1.636')]
[2024-12-22 17:23:14,337][03263] Signal inference workers to stop experience collection...
[2024-12-22 17:23:14,352][03276] InferenceWorker_p0-w0: stopping experience collection
[2024-12-22 17:23:17,548][03263] Signal inference workers to resume experience collection...
[2024-12-22 17:23:17,549][03276] InferenceWorker_p0-w0: resuming experience collection
[2024-12-22 17:23:18,636][00403] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 8192. Throughput: 0: 157.2. Samples: 2358. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-12-22 17:23:18,641][00403] Avg episode reward: [(0, '2.775')]
[2024-12-22 17:23:23,636][00403] Fps is (10 sec: 2457.6, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 316.2. Samples: 6324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-22 17:23:23,639][00403] Avg episode reward: [(0, '3.600')]
[2024-12-22 17:23:27,201][03276] Updated weights for policy 0, policy_version 10 (0.0153)
[2024-12-22 17:23:28,636][00403] Fps is (10 sec: 3686.4, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 348.9. Samples: 8722. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-22 17:23:28,641][00403] Avg episode reward: [(0, '4.233')]
[2024-12-22 17:23:33,637][00403] Fps is (10 sec: 4505.5, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 535.4. Samples: 16062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:23:33,642][00403] Avg episode reward: [(0, '4.575')]
[2024-12-22 17:23:35,643][03276] Updated weights for policy 0, policy_version 20 (0.0023)
[2024-12-22 17:23:38,636][00403] Fps is (10 sec: 4505.6, 60 sec: 2574.6, 300 sec: 2574.6). Total num frames: 90112. Throughput: 0: 641.4. Samples: 22448. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-22 17:23:38,640][00403] Avg episode reward: [(0, '4.683')]
[2024-12-22 17:23:43,636][00403] Fps is (10 sec: 3686.5, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 615.3. Samples: 24612. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-22 17:23:43,639][00403] Avg episode reward: [(0, '4.494')]
[2024-12-22 17:23:43,648][03263] Saving new best policy, reward=4.494!
[2024-12-22 17:23:46,833][03276] Updated weights for policy 0, policy_version 30 (0.0024)
[2024-12-22 17:23:48,636][00403] Fps is (10 sec: 4096.0, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 131072. Throughput: 0: 692.6. Samples: 31166. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:23:48,641][00403] Avg episode reward: [(0, '4.601')]
[2024-12-22 17:23:48,644][03263] Saving new best policy, reward=4.601!
[2024-12-22 17:23:53,636][00403] Fps is (10 sec: 4505.6, 60 sec: 3031.0, 300 sec: 3031.0). Total num frames: 151552. Throughput: 0: 850.3. Samples: 38262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:23:53,640][00403] Avg episode reward: [(0, '4.470')]
[2024-12-22 17:23:57,069][03276] Updated weights for policy 0, policy_version 40 (0.0025)
[2024-12-22 17:23:58,638][00403] Fps is (10 sec: 3685.6, 60 sec: 3053.3, 300 sec: 3053.3). Total num frames: 167936. Throughput: 0: 898.9. Samples: 40486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:23:58,641][00403] Avg episode reward: [(0, '4.451')]
[2024-12-22 17:24:03,636][00403] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3208.5). Total num frames: 192512. Throughput: 0: 978.1. Samples: 46374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:24:03,639][00403] Avg episode reward: [(0, '4.614')]
[2024-12-22 17:24:03,648][03263] Saving new best policy, reward=4.614!
[2024-12-22 17:24:06,274][03276] Updated weights for policy 0, policy_version 50 (0.0016)
[2024-12-22 17:24:08,642][00403] Fps is (10 sec: 4504.1, 60 sec: 3549.5, 300 sec: 3276.5). Total num frames: 212992. Throughput: 0: 1050.4. Samples: 53598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:24:08,644][00403] Avg episode reward: [(0, '4.313')]
[2024-12-22 17:24:13,636][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 229376. Throughput: 0: 1062.0. Samples: 56512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:24:13,641][00403] Avg episode reward: [(0, '4.385')]
[2024-12-22 17:24:17,798][03276] Updated weights for policy 0, policy_version 60 (0.0046)
[2024-12-22 17:24:18,636][00403] Fps is (10 sec: 3278.6, 60 sec: 3959.5, 300 sec: 3276.8). Total num frames: 245760. Throughput: 0: 992.0. Samples: 60700. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:24:18,643][00403] Avg episode reward: [(0, '4.582')]
[2024-12-22 17:24:23,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3379.2). Total num frames: 270336. Throughput: 0: 1009.7. Samples: 67884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:24:23,639][00403] Avg episode reward: [(0, '4.613')]
[2024-12-22 17:24:26,371][03276] Updated weights for policy 0, policy_version 70 (0.0021)
[2024-12-22 17:24:28,638][00403] Fps is (10 sec: 4504.7, 60 sec: 4095.9, 300 sec: 3421.3). Total num frames: 290816. Throughput: 0: 1043.4. Samples: 71566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:24:28,646][00403] Avg episode reward: [(0, '4.493')]
[2024-12-22 17:24:33,636][00403] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3413.3). Total num frames: 307200. Throughput: 0: 1002.0. Samples: 76258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:24:33,639][00403] Avg episode reward: [(0, '4.446')]
[2024-12-22 17:24:33,650][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth...
[2024-12-22 17:24:37,420][03276] Updated weights for policy 0, policy_version 80 (0.0021)
[2024-12-22 17:24:38,636][00403] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3492.4). Total num frames: 331776. Throughput: 0: 994.8. Samples: 83030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:24:38,643][00403] Avg episode reward: [(0, '4.578')]
[2024-12-22 17:24:43,638][00403] Fps is (10 sec: 4914.5, 60 sec: 4164.2, 300 sec: 3563.5). Total num frames: 356352. Throughput: 0: 1027.4. Samples: 86720. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-22 17:24:43,644][00403] Avg episode reward: [(0, '4.353')]
[2024-12-22 17:24:47,413][03276] Updated weights for policy 0, policy_version 90 (0.0031)
[2024-12-22 17:24:48,636][00403] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3510.9). Total num frames: 368640. Throughput: 0: 1019.6. Samples: 92256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:24:48,643][00403] Avg episode reward: [(0, '4.422')]
[2024-12-22 17:24:53,636][00403] Fps is (10 sec: 3686.9, 60 sec: 4027.7, 300 sec: 3574.7). Total num frames: 393216. Throughput: 0: 987.6. Samples: 98034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:24:53,639][00403] Avg episode reward: [(0, '4.560')]
[2024-12-22 17:24:56,698][03276] Updated weights for policy 0, policy_version 100 (0.0016)
[2024-12-22 17:24:58,637][00403] Fps is (10 sec: 4915.1, 60 sec: 4164.4, 300 sec: 3633.0). Total num frames: 417792. Throughput: 0: 1005.7. Samples: 101768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:24:58,639][00403] Avg episode reward: [(0, '4.596')]
[2024-12-22 17:25:03,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3618.1). Total num frames: 434176. Throughput: 0: 1058.5. Samples: 108332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:25:03,641][00403] Avg episode reward: [(0, '4.546')]
[2024-12-22 17:25:07,767][03276] Updated weights for policy 0, policy_version 110 (0.0022)
[2024-12-22 17:25:08,636][00403] Fps is (10 sec: 3686.5, 60 sec: 4028.1, 300 sec: 3637.3). Total num frames: 454656. Throughput: 0: 1008.4. Samples: 113262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:25:08,642][00403] Avg episode reward: [(0, '4.509')]
[2024-12-22 17:25:13,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3686.4). Total num frames: 479232. Throughput: 0: 1008.0. Samples: 116926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:25:13,639][00403] Avg episode reward: [(0, '4.596')]
[2024-12-22 17:25:15,887][03276] Updated weights for policy 0, policy_version 120 (0.0014)
[2024-12-22 17:25:18,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3701.6). Total num frames: 499712. Throughput: 0: 1067.4. Samples: 124292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:25:18,641][00403] Avg episode reward: [(0, '4.616')]
[2024-12-22 17:25:18,646][03263] Saving new best policy, reward=4.616!
[2024-12-22 17:25:23,636][00403] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3657.1). Total num frames: 512000. Throughput: 0: 1015.2. Samples: 128716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:25:23,644][00403] Avg episode reward: [(0, '4.707')]
[2024-12-22 17:25:23,650][03263] Saving new best policy, reward=4.707!
[2024-12-22 17:25:27,214][03276] Updated weights for policy 0, policy_version 130 (0.0033)
[2024-12-22 17:25:28,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 3700.5). Total num frames: 536576. Throughput: 0: 1003.1. Samples: 131860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:25:28,643][00403] Avg episode reward: [(0, '4.691')]
[2024-12-22 17:25:33,639][00403] Fps is (10 sec: 4914.0, 60 sec: 4232.4, 300 sec: 3741.0). Total num frames: 561152. Throughput: 0: 1042.0. Samples: 139150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:25:33,644][00403] Avg episode reward: [(0, '4.717')]
[2024-12-22 17:25:33,654][03263] Saving new best policy, reward=4.717!
[2024-12-22 17:25:37,182][03276] Updated weights for policy 0, policy_version 140 (0.0031)
[2024-12-22 17:25:38,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3726.0). Total num frames: 577536. Throughput: 0: 1029.0. Samples: 144338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:25:38,641][00403] Avg episode reward: [(0, '4.840')]
[2024-12-22 17:25:38,643][03263] Saving new best policy, reward=4.840!
[2024-12-22 17:25:43,636][00403] Fps is (10 sec: 3687.3, 60 sec: 4027.8, 300 sec: 3737.6). Total num frames: 598016. Throughput: 0: 998.6. Samples: 146704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:25:43,641][00403] Avg episode reward: [(0, '4.870')]
[2024-12-22 17:25:43,656][03263] Saving new best policy, reward=4.870!
[2024-12-22 17:25:46,970][03276] Updated weights for policy 0, policy_version 150 (0.0036)
[2024-12-22 17:25:48,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3773.3). Total num frames: 622592. Throughput: 0: 1014.9. Samples: 154004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:25:48,639][00403] Avg episode reward: [(0, '4.697')]
[2024-12-22 17:25:53,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3758.7). Total num frames: 638976. Throughput: 0: 1046.9. Samples: 160372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:25:53,643][00403] Avg episode reward: [(0, '4.843')]
[2024-12-22 17:25:57,845][03276] Updated weights for policy 0, policy_version 160 (0.0037)
[2024-12-22 17:25:58,636][00403] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3744.9). Total num frames: 655360. Throughput: 0: 1016.2. Samples: 162654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:25:58,643][00403] Avg episode reward: [(0, '4.996')]
[2024-12-22 17:25:58,696][03263] Saving new best policy, reward=4.996!
[2024-12-22 17:26:03,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3777.4). Total num frames: 679936. Throughput: 0: 998.3. Samples: 169214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:26:03,643][00403] Avg episode reward: [(0, '4.864')]
[2024-12-22 17:26:06,182][03276] Updated weights for policy 0, policy_version 170 (0.0023)
[2024-12-22 17:26:08,639][00403] Fps is (10 sec: 4913.9, 60 sec: 4164.1, 300 sec: 3808.1). Total num frames: 704512. Throughput: 0: 1060.1. Samples: 176422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:26:08,644][00403] Avg episode reward: [(0, '4.693')]
[2024-12-22 17:26:13,638][00403] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 3772.6). Total num frames: 716800. Throughput: 0: 1031.6. Samples: 178286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:26:13,644][00403] Avg episode reward: [(0, '4.651')]
[2024-12-22 17:26:17,767][03276] Updated weights for policy 0, policy_version 180 (0.0020)
[2024-12-22 17:26:18,636][00403] Fps is (10 sec: 3687.3, 60 sec: 4027.7, 300 sec: 3801.9). Total num frames: 741376. Throughput: 0: 993.7. Samples: 183864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:26:18,640][00403] Avg episode reward: [(0, '4.877')]
[2024-12-22 17:26:23,636][00403] Fps is (10 sec: 4916.0, 60 sec: 4232.5, 300 sec: 3829.8). Total num frames: 765952. Throughput: 0: 1040.0. Samples: 191136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:26:23,638][00403] Avg episode reward: [(0, '5.196')]
[2024-12-22 17:26:23,648][03263] Saving new best policy, reward=5.196!
[2024-12-22 17:26:27,589][03276] Updated weights for policy 0, policy_version 190 (0.0027)
[2024-12-22 17:26:28,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3796.3). Total num frames: 778240. Throughput: 0: 1049.4. Samples: 193926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:26:28,639][00403] Avg episode reward: [(0, '5.091')]
[2024-12-22 17:26:33,636][00403] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 3803.4). Total num frames: 798720. Throughput: 0: 987.6. Samples: 198448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:26:33,639][00403] Avg episode reward: [(0, '5.084')]
[2024-12-22 17:26:33,649][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth...
[2024-12-22 17:26:37,586][03276] Updated weights for policy 0, policy_version 200 (0.0017)
[2024-12-22 17:26:38,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3829.3). Total num frames: 823296. Throughput: 0: 1005.6. Samples: 205626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:26:38,641][00403] Avg episode reward: [(0, '5.137')]
[2024-12-22 17:26:43,638][00403] Fps is (10 sec: 4504.8, 60 sec: 4095.9, 300 sec: 3835.3). Total num frames: 843776. Throughput: 0: 1035.7. Samples: 209264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:26:43,642][00403] Avg episode reward: [(0, '4.755')]
[2024-12-22 17:26:48,636][00403] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3804.7). Total num frames: 856064. Throughput: 0: 996.3. Samples: 214046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:26:48,643][00403] Avg episode reward: [(0, '4.966')]
[2024-12-22 17:26:48,810][03276] Updated weights for policy 0, policy_version 210 (0.0027)
[2024-12-22 17:26:53,636][00403] Fps is (10 sec: 3687.0, 60 sec: 4027.7, 300 sec: 3828.9). Total num frames: 880640. Throughput: 0: 978.9. Samples: 220468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:26:53,644][00403] Avg episode reward: [(0, '5.264')]
[2024-12-22 17:26:53,653][03263] Saving new best policy, reward=5.264!
[2024-12-22 17:26:57,547][03276] Updated weights for policy 0, policy_version 220 (0.0028)
[2024-12-22 17:26:58,636][00403] Fps is (10 sec: 4915.3, 60 sec: 4164.3, 300 sec: 3852.0). Total num frames: 905216. Throughput: 0: 1008.4. Samples: 223664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:26:58,639][00403] Avg episode reward: [(0, '5.314')]
[2024-12-22 17:26:58,646][03263] Saving new best policy, reward=5.314!
[2024-12-22 17:27:03,636][00403] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3822.9). Total num frames: 917504. Throughput: 0: 1009.4. Samples: 229288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:27:03,639][00403] Avg episode reward: [(0, '5.273')]
[2024-12-22 17:27:08,636][00403] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3828.5). Total num frames: 937984. Throughput: 0: 961.2. Samples: 234392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:27:08,639][00403] Avg episode reward: [(0, '5.238')]
[2024-12-22 17:27:09,209][03276] Updated weights for policy 0, policy_version 230 (0.0021)
[2024-12-22 17:27:13,637][00403] Fps is (10 sec: 4505.5, 60 sec: 4096.1, 300 sec: 3850.2). Total num frames: 962560. Throughput: 0: 977.2. Samples: 237900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-22 17:27:13,642][00403] Avg episode reward: [(0, '5.305')]
[2024-12-22 17:27:18,636][00403] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3839.0). Total num frames: 978944. Throughput: 0: 1021.4. Samples: 244410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:27:18,646][00403] Avg episode reward: [(0, '5.690')]
[2024-12-22 17:27:18,651][03263] Saving new best policy, reward=5.690!
[2024-12-22 17:27:19,544][03276] Updated weights for policy 0, policy_version 240 (0.0031)
[2024-12-22 17:27:23,636][00403] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3828.2). Total num frames: 995328. Throughput: 0: 962.4. Samples: 248934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:27:23,639][00403] Avg episode reward: [(0, '5.756')]
[2024-12-22 17:27:23,646][03263] Saving new best policy, reward=5.756!
[2024-12-22 17:27:28,636][00403] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3848.7). Total num frames: 1019904. Throughput: 0: 960.9. Samples: 252504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:27:28,638][00403] Avg episode reward: [(0, '5.871')]
[2024-12-22 17:27:28,644][03263] Saving new best policy, reward=5.871!
[2024-12-22 17:27:29,204][03276] Updated weights for policy 0, policy_version 250 (0.0026)
[2024-12-22 17:27:33,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3853.3). Total num frames: 1040384. Throughput: 0: 1014.8. Samples: 259714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:27:33,641][00403] Avg episode reward: [(0, '5.602')]
[2024-12-22 17:27:38,636][00403] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3842.8). Total num frames: 1056768. Throughput: 0: 978.6. Samples: 264506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:27:38,642][00403] Avg episode reward: [(0, '5.724')]
[2024-12-22 17:27:40,371][03276] Updated weights for policy 0, policy_version 260 (0.0045)
[2024-12-22 17:27:43,636][00403] Fps is (10 sec: 3686.3, 60 sec: 3891.3, 300 sec: 3847.3). Total num frames: 1077248. Throughput: 0: 968.0. Samples: 267224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:27:43,646][00403] Avg episode reward: [(0, '6.220')]
[2024-12-22 17:27:43,657][03263] Saving new best policy, reward=6.220!
[2024-12-22 17:27:48,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3866.0). Total num frames: 1101824. Throughput: 0: 1000.1. Samples: 274292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:27:48,638][00403] Avg episode reward: [(0, '6.361')]
[2024-12-22 17:27:48,641][03263] Saving new best policy, reward=6.361!
[2024-12-22 17:27:49,277][03276] Updated weights for policy 0, policy_version 270 (0.0026)
[2024-12-22 17:27:53,636][00403] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3855.9). Total num frames: 1118208. Throughput: 0: 1005.0. Samples: 279616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:27:53,641][00403] Avg episode reward: [(0, '6.504')]
[2024-12-22 17:27:53,648][03263] Saving new best policy, reward=6.504!
[2024-12-22 17:27:58,636][00403] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1134592. Throughput: 0: 975.0. Samples: 281776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:27:58,639][00403] Avg episode reward: [(0, '6.531')]
[2024-12-22 17:27:58,645][03263] Saving new best policy, reward=6.531!
[2024-12-22 17:28:00,728][03276] Updated weights for policy 0, policy_version 280 (0.0043)
[2024-12-22 17:28:03,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1159168. Throughput: 0: 980.7. Samples: 288540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:28:03,641][00403] Avg episode reward: [(0, '6.878')]
[2024-12-22 17:28:03,652][03263] Saving new best policy, reward=6.878!
[2024-12-22 17:28:08,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 1024.5. Samples: 295036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:28:08,644][00403] Avg episode reward: [(0, '6.773')]
[2024-12-22 17:28:11,115][03276] Updated weights for policy 0, policy_version 290 (0.0026)
[2024-12-22 17:28:13,636][00403] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 4012.7). Total num frames: 1191936. Throughput: 0: 993.7. Samples: 297222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:28:13,641][00403] Avg episode reward: [(0, '7.234')]
[2024-12-22 17:28:13,651][03263] Saving new best policy, reward=7.234!
[2024-12-22 17:28:18,636][00403] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1216512. Throughput: 0: 961.1. Samples: 302962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:28:18,639][00403] Avg episode reward: [(0, '7.254')]
[2024-12-22 17:28:18,651][03263] Saving new best policy, reward=7.254!
[2024-12-22 17:28:21,009][03276] Updated weights for policy 0, policy_version 300 (0.0019)
[2024-12-22 17:28:23,638][00403] Fps is (10 sec: 4914.6, 60 sec: 4095.9, 300 sec: 4054.3). Total num frames: 1241088. Throughput: 0: 1009.7. Samples: 309944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:28:23,642][00403] Avg episode reward: [(0, '7.639')]
[2024-12-22 17:28:23,651][03263] Saving new best policy, reward=7.639!
[2024-12-22 17:28:28,636][00403] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 1253376. Throughput: 0: 1003.3. Samples: 312372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:28:28,642][00403] Avg episode reward: [(0, '8.030')]
[2024-12-22 17:28:28,648][03263] Saving new best policy, reward=8.030!
[2024-12-22 17:28:32,183][03276] Updated weights for policy 0, policy_version 310 (0.0028)
[2024-12-22 17:28:33,636][00403] Fps is (10 sec: 3277.2, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 1273856. Throughput: 0: 956.2. Samples: 317322. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:28:33,638][00403] Avg episode reward: [(0, '8.618')]
[2024-12-22 17:28:33,648][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth...
[2024-12-22 17:28:33,760][03263] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth
[2024-12-22 17:28:33,775][03263] Saving new best policy, reward=8.618!
[2024-12-22 17:28:38,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1298432. Throughput: 0: 994.0. Samples: 324346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:28:38,641][00403] Avg episode reward: [(0, '8.905')]
[2024-12-22 17:28:38,645][03263] Saving new best policy, reward=8.905!
[2024-12-22 17:28:41,534][03276] Updated weights for policy 0, policy_version 320 (0.0017)
[2024-12-22 17:28:43,636][00403] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1314816. Throughput: 0: 1019.2. Samples: 327640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:28:43,638][00403] Avg episode reward: [(0, '8.635')]
[2024-12-22 17:28:48,636][00403] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 1331200. Throughput: 0: 970.7. Samples: 332220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:28:48,639][00403] Avg episode reward: [(0, '9.360')]
[2024-12-22 17:28:48,644][03263] Saving new best policy, reward=9.360!
[2024-12-22 17:28:52,262][03276] Updated weights for policy 0, policy_version 330 (0.0027)
[2024-12-22 17:28:53,636][00403] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1355776. Throughput: 0: 979.7. Samples: 339122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:28:53,639][00403] Avg episode reward: [(0, '9.730')]
[2024-12-22 17:28:53,650][03263] Saving new best policy, reward=9.730!
[2024-12-22 17:28:58,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1376256. Throughput: 0: 1010.9. Samples: 342712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:28:58,642][00403] Avg episode reward: [(0, '9.394')]
[2024-12-22 17:29:02,740][03276] Updated weights for policy 0, policy_version 340 (0.0015)
[2024-12-22 17:29:03,639][00403] Fps is (10 sec: 3685.3, 60 sec: 3891.0, 300 sec: 3998.8). Total num frames: 1392640. Throughput: 0: 998.9. Samples: 347916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:29:03,646][00403] Avg episode reward: [(0, '8.880')]
[2024-12-22 17:29:08,636][00403] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1417216. Throughput: 0: 983.5. Samples: 354202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:29:08,643][00403] Avg episode reward: [(0, '8.384')]
[2024-12-22 17:29:11,632][03276] Updated weights for policy 0, policy_version 350 (0.0024)
[2024-12-22 17:29:13,636][00403] Fps is (10 sec: 4916.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 1441792. Throughput: 0: 1014.2. Samples: 358012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:29:13,639][00403] Avg episode reward: [(0, '8.975')]
[2024-12-22 17:29:18,639][00403] Fps is (10 sec: 4094.9, 60 sec: 4027.6, 300 sec: 4026.5). Total num frames: 1458176. Throughput: 0: 1039.3. Samples: 364092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:29:18,644][00403] Avg episode reward: [(0, '9.508')]
[2024-12-22 17:29:22,620][03276] Updated weights for policy 0, policy_version 360 (0.0014)
[2024-12-22 17:29:23,636][00403] Fps is (10 sec: 3686.5, 60 sec: 3959.6, 300 sec: 4026.6). Total num frames: 1478656. Throughput: 0: 1000.0. Samples: 369348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-22 17:29:23,639][00403] Avg episode reward: [(0, '9.347')]
[2024-12-22 17:29:28,636][00403] Fps is (10 sec: 4506.8, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 1503232. Throughput: 0: 1008.8. Samples: 373034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:29:28,638][00403] Avg episode reward: [(0, '9.217')]
[2024-12-22 17:29:30,840][03276] Updated weights for policy 0, policy_version 370 (0.0014)
[2024-12-22 17:29:33,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 1523712. Throughput: 0: 1065.9. Samples: 380184. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:29:33,640][00403] Avg episode reward: [(0, '9.660')]
[2024-12-22 17:29:38,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1540096. Throughput: 0: 1013.3. Samples: 384722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:29:38,638][00403] Avg episode reward: [(0, '10.062')]
[2024-12-22 17:29:38,645][03263] Saving new best policy, reward=10.062!
[2024-12-22 17:29:41,952][03276] Updated weights for policy 0, policy_version 380 (0.0036)
[2024-12-22 17:29:43,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 1564672. Throughput: 0: 1012.6. Samples: 388280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:29:43,642][00403] Avg episode reward: [(0, '11.286')]
[2024-12-22 17:29:43,651][03263] Saving new best policy, reward=11.286!
[2024-12-22 17:29:48,637][00403] Fps is (10 sec: 4505.3, 60 sec: 4232.5, 300 sec: 4040.5). Total num frames: 1585152. Throughput: 0: 1059.9. Samples: 395610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:29:48,639][00403] Avg episode reward: [(0, '12.272')]
[2024-12-22 17:29:48,644][03263] Saving new best policy, reward=12.272!
[2024-12-22 17:29:51,686][03276] Updated weights for policy 0, policy_version 390 (0.0035)
[2024-12-22 17:29:53,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1601536. Throughput: 0: 1033.2. Samples: 400694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:29:53,643][00403] Avg episode reward: [(0, '12.959')]
[2024-12-22 17:29:53,652][03263] Saving new best policy, reward=12.959!
[2024-12-22 17:29:58,636][00403] Fps is (10 sec: 3686.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1622016. Throughput: 0: 1004.1. Samples: 403198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:29:58,640][00403] Avg episode reward: [(0, '12.218')]
[2024-12-22 17:30:01,372][03276] Updated weights for policy 0, policy_version 400 (0.0020)
[2024-12-22 17:30:03,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.7, 300 sec: 4040.5). Total num frames: 1646592. Throughput: 0: 1034.5. Samples: 410644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:30:03,650][00403] Avg episode reward: [(0, '11.047')]
[2024-12-22 17:30:08,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1662976. Throughput: 0: 1049.6. Samples: 416580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:30:08,643][00403] Avg episode reward: [(0, '10.913')]
[2024-12-22 17:30:12,495][03276] Updated weights for policy 0, policy_version 410 (0.0015)
[2024-12-22 17:30:13,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1683456. Throughput: 0: 1017.7. Samples: 418832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:30:13,639][00403] Avg episode reward: [(0, '11.935')]
[2024-12-22 17:30:18,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 4054.3). Total num frames: 1708032. Throughput: 0: 1012.0. Samples: 425722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:30:18,639][00403] Avg episode reward: [(0, '12.160')]
[2024-12-22 17:30:20,877][03276] Updated weights for policy 0, policy_version 420 (0.0020)
[2024-12-22 17:30:23,637][00403] Fps is (10 sec: 4505.5, 60 sec: 4164.2, 300 sec: 4040.5). Total num frames: 1728512. Throughput: 0: 1065.5. Samples: 432668.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:30:23,641][00403] Avg episode reward: [(0, '11.960')]
[2024-12-22 17:30:28,637][00403] Fps is (10 sec: 3686.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1744896. Throughput: 0: 1036.3. Samples: 434914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:30:28,641][00403] Avg episode reward: [(0, '11.011')]
[2024-12-22 17:30:31,851][03276] Updated weights for policy 0, policy_version 430 (0.0022)
[2024-12-22 17:30:33,636][00403] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1769472. Throughput: 0: 1005.6. Samples: 440862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:30:33,644][00403] Avg episode reward: [(0, '11.491')]
[2024-12-22 17:30:33,658][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth...
[2024-12-22 17:30:33,775][03263] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth
[2024-12-22 17:30:38,636][00403] Fps is (10 sec: 4915.7, 60 sec: 4232.5, 300 sec: 4054.3). Total num frames: 1794048. Throughput: 0: 1055.8. Samples: 448206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:30:38,639][00403] Avg episode reward: [(0, '12.300')]
[2024-12-22 17:30:40,694][03276] Updated weights for policy 0, policy_version 440 (0.0022)
[2024-12-22 17:30:43,640][00403] Fps is (10 sec: 4094.5, 60 sec: 4095.8, 300 sec: 4026.5). Total num frames: 1810432. Throughput: 0: 1064.5. Samples: 451104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:30:43,644][00403] Avg episode reward: [(0, '13.268')]
[2024-12-22 17:30:43,657][03263] Saving new best policy, reward=13.268!
[2024-12-22 17:30:48,636][00403] Fps is (10 sec: 3686.3, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1830912. Throughput: 0: 1008.1. Samples: 456010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:30:48,639][00403] Avg episode reward: [(0, '12.288')]
[2024-12-22 17:30:51,144][03276] Updated weights for policy 0, policy_version 450 (0.0016)
[2024-12-22 17:30:53,636][00403] Fps is (10 sec: 4507.2, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 1855488. Throughput: 0: 1040.8. Samples: 463416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:30:53,639][00403] Avg episode reward: [(0, '11.794')]
[2024-12-22 17:30:58,636][00403] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 1871872. Throughput: 0: 1072.3. Samples: 467084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:30:58,642][00403] Avg episode reward: [(0, '11.732')]
[2024-12-22 17:31:01,523][03276] Updated weights for policy 0, policy_version 460 (0.0031)
[2024-12-22 17:31:03,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1892352. Throughput: 0: 1020.7. Samples: 471654. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:31:03,639][00403] Avg episode reward: [(0, '12.305')]
[2024-12-22 17:31:08,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4068.3). Total num frames: 1916928. Throughput: 0: 1023.6. Samples: 478732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:31:08,639][00403] Avg episode reward: [(0, '13.061')]
[2024-12-22 17:31:10,268][03276] Updated weights for policy 0, policy_version 470 (0.0020)
[2024-12-22 17:31:13,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4054.3). Total num frames: 1937408. Throughput: 0: 1057.0. Samples: 482476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:31:13,639][00403] Avg episode reward: [(0, '14.035')]
[2024-12-22 17:31:13,650][03263] Saving new best policy, reward=14.035!
[2024-12-22 17:31:18,640][00403] Fps is (10 sec: 3684.9, 60 sec: 4095.7, 300 sec: 4026.5). Total num frames: 1953792. Throughput: 0: 1045.9. Samples: 487930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:31:18,643][00403] Avg episode reward: [(0, '14.354')]
[2024-12-22 17:31:18,646][03263] Saving new best policy, reward=14.354!
[2024-12-22 17:31:21,310][03276] Updated weights for policy 0, policy_version 480 (0.0035)
[2024-12-22 17:31:23,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1974272. Throughput: 0: 1012.8. Samples: 493782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:31:23,642][00403] Avg episode reward: [(0, '15.504')]
[2024-12-22 17:31:23,735][03263] Saving new best policy, reward=15.504!
[2024-12-22 17:31:28,637][00403] Fps is (10 sec: 4507.0, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 1998848. Throughput: 0: 1029.0. Samples: 497406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:31:28,640][00403] Avg episode reward: [(0, '15.189')]
[2024-12-22 17:31:29,642][03276] Updated weights for policy 0, policy_version 490 (0.0014)
[2024-12-22 17:31:33,637][00403] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2015232. Throughput: 0: 1061.9. Samples: 503796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:31:33,642][00403] Avg episode reward: [(0, '15.854')]
[2024-12-22 17:31:33,654][03263] Saving new best policy, reward=15.854!
[2024-12-22 17:31:38,636][00403] Fps is (10 sec: 3686.7, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2035712. Throughput: 0: 1006.4. Samples: 508702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:31:38,643][00403] Avg episode reward: [(0, '16.630')]
[2024-12-22 17:31:38,648][03263] Saving new best policy, reward=16.630!
[2024-12-22 17:31:40,893][03276] Updated weights for policy 0, policy_version 500 (0.0037)
[2024-12-22 17:31:43,636][00403] Fps is (10 sec: 4505.8, 60 sec: 4164.5, 300 sec: 4082.1). Total num frames: 2060288. Throughput: 0: 1005.6. Samples: 512334.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:31:43,643][00403] Avg episode reward: [(0, '17.472')]
[2024-12-22 17:31:43,656][03263] Saving new best policy, reward=17.472!
[2024-12-22 17:31:48,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 2080768. Throughput: 0: 1066.5. Samples: 519648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:31:48,641][00403] Avg episode reward: [(0, '17.196')]
[2024-12-22 17:31:50,910][03276] Updated weights for policy 0, policy_version 510 (0.0027)
[2024-12-22 17:31:53,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2097152. Throughput: 0: 1008.8. Samples: 524128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:31:53,639][00403] Avg episode reward: [(0, '17.012')]
[2024-12-22 17:31:58,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 2121728. Throughput: 0: 998.3. Samples: 527400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:31:58,642][00403] Avg episode reward: [(0, '16.773')]
[2024-12-22 17:32:00,223][03276] Updated weights for policy 0, policy_version 520 (0.0029)
[2024-12-22 17:32:03,636][00403] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 2146304. Throughput: 0: 1044.6. Samples: 534932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:32:03,639][00403] Avg episode reward: [(0, '17.793')]
[2024-12-22 17:32:03,647][03263] Saving new best policy, reward=17.793!
[2024-12-22 17:32:08,638][00403] Fps is (10 sec: 3685.6, 60 sec: 4027.6, 300 sec: 4054.3). Total num frames: 2158592. Throughput: 0: 1035.0. Samples: 540360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:32:08,642][00403] Avg episode reward: [(0, '17.631')]
[2024-12-22 17:32:11,330][03276] Updated weights for policy 0, policy_version 530 (0.0030)
[2024-12-22 17:32:13,637][00403] Fps is (10 sec: 3276.7, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2179072. Throughput: 0: 1006.0. Samples: 542674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:32:13,639][00403] Avg episode reward: [(0, '18.572')]
[2024-12-22 17:32:13,651][03263] Saving new best policy, reward=18.572!
[2024-12-22 17:32:18,636][00403] Fps is (10 sec: 4506.5, 60 sec: 4164.5, 300 sec: 4096.0). Total num frames: 2203648. Throughput: 0: 1025.6. Samples: 549946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:32:18,639][00403] Avg episode reward: [(0, '18.330')]
[2024-12-22 17:32:19,593][03276] Updated weights for policy 0, policy_version 540 (0.0021)
[2024-12-22 17:32:23,636][00403] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 2224128. Throughput: 0: 1061.2. Samples: 556456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:32:23,638][00403] Avg episode reward: [(0, '17.777')]
[2024-12-22 17:32:28,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 2240512. Throughput: 0: 1029.5. Samples: 558662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:32:28,638][00403] Avg episode reward: [(0, '17.888')]
[2024-12-22 17:32:30,604][03276] Updated weights for policy 0, policy_version 550 (0.0050)
[2024-12-22 17:32:33,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2265088. Throughput: 0: 1010.1. Samples: 565102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:32:33,639][00403] Avg episode reward: [(0, '17.560')]
[2024-12-22 17:32:33,646][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000553_2265088.pth...
[2024-12-22 17:32:33,815][03263] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000311_1273856.pth
[2024-12-22 17:32:38,640][00403] Fps is (10 sec: 4913.2, 60 sec: 4232.3, 300 sec: 4109.8). Total num frames: 2289664. Throughput: 0: 1072.4. Samples: 572390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:32:38,642][00403] Avg episode reward: [(0, '17.964')]
[2024-12-22 17:32:39,651][03276] Updated weights for policy 0, policy_version 560 (0.0021)
[2024-12-22 17:32:43,639][00403] Fps is (10 sec: 3685.3, 60 sec: 4027.5, 300 sec: 4068.2). Total num frames: 2301952. Throughput: 0: 1052.5. Samples: 574764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:32:43,645][00403] Avg episode reward: [(0, '17.742')]
[2024-12-22 17:32:48,636][00403] Fps is (10 sec: 3687.9, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2326528. Throughput: 0: 1007.0. Samples: 580246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:32:48,639][00403] Avg episode reward: [(0, '18.202')]
[2024-12-22 17:32:49,998][03276] Updated weights for policy 0, policy_version 570 (0.0019)
[2024-12-22 17:32:53,636][00403] Fps is (10 sec: 4916.7, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 2351104. Throughput: 0: 1047.2. Samples: 587482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:32:53,648][00403] Avg episode reward: [(0, '18.059')]
[2024-12-22 17:32:58,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2367488. Throughput: 0: 1068.0. Samples: 590732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:32:58,642][00403] Avg episode reward: [(0, '17.407')]
[2024-12-22 17:33:00,625][03276] Updated weights for policy 0, policy_version 580 (0.0031)
[2024-12-22 17:33:03,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 2387968. Throughput: 0: 1010.2. Samples: 595406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:33:03,638][00403] Avg episode reward: [(0, '18.212')]
[2024-12-22 17:33:08,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.7, 300 sec: 4137.7). Total num frames: 2412544. Throughput: 0: 1030.9. Samples: 602846.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:33:08,643][00403] Avg episode reward: [(0, '18.437')]
[2024-12-22 17:33:09,327][03276] Updated weights for policy 0, policy_version 590 (0.0029)
[2024-12-22 17:33:13,637][00403] Fps is (10 sec: 4505.4, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 2433024. Throughput: 0: 1064.0. Samples: 606542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:33:13,647][00403] Avg episode reward: [(0, '18.717')]
[2024-12-22 17:33:13,662][03263] Saving new best policy, reward=18.717!
[2024-12-22 17:33:18,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2449408. Throughput: 0: 1028.5. Samples: 611384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:33:18,638][00403] Avg episode reward: [(0, '18.902')]
[2024-12-22 17:33:18,643][03263] Saving new best policy, reward=18.902!
[2024-12-22 17:33:20,345][03276] Updated weights for policy 0, policy_version 600 (0.0017)
[2024-12-22 17:33:23,636][00403] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2469888. Throughput: 0: 1010.2. Samples: 617844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:33:23,638][00403] Avg episode reward: [(0, '19.132')]
[2024-12-22 17:33:23,647][03263] Saving new best policy, reward=19.132!
[2024-12-22 17:33:28,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 2494464. Throughput: 0: 1035.2. Samples: 621344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:33:28,642][00403] Avg episode reward: [(0, '20.831')]
[2024-12-22 17:33:28,650][03263] Saving new best policy, reward=20.831!
[2024-12-22 17:33:29,221][03276] Updated weights for policy 0, policy_version 610 (0.0015)
[2024-12-22 17:33:33,641][00403] Fps is (10 sec: 4093.9, 60 sec: 4095.7, 300 sec: 4109.8). Total num frames: 2510848. Throughput: 0: 1042.4. Samples: 627158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:33:33,644][00403] Avg episode reward: [(0, '20.185')]
[2024-12-22 17:33:38,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4028.0, 300 sec: 4123.8). Total num frames: 2531328. Throughput: 0: 1005.2. Samples: 632716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:33:38,639][00403] Avg episode reward: [(0, '19.365')]
[2024-12-22 17:33:39,954][03276] Updated weights for policy 0, policy_version 620 (0.0030)
[2024-12-22 17:33:43,636][00403] Fps is (10 sec: 4507.8, 60 sec: 4232.8, 300 sec: 4151.5). Total num frames: 2555904. Throughput: 0: 1015.8. Samples: 636442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:33:43,639][00403] Avg episode reward: [(0, '18.963')]
[2024-12-22 17:33:48,636][00403] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 2576384. Throughput: 0: 1065.2. Samples: 643340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:33:48,641][00403] Avg episode reward: [(0, '17.918')]
[2024-12-22 17:33:49,991][03276] Updated weights for policy 0, policy_version 630 (0.0019)
[2024-12-22 17:33:53,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 2592768. Throughput: 0: 1001.5. Samples: 647914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:33:53,643][00403] Avg episode reward: [(0, '16.887')]
[2024-12-22 17:33:58,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.6). Total num frames: 2617344. Throughput: 0: 1003.9. Samples: 651718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:33:58,645][00403] Avg episode reward: [(0, '17.383')]
[2024-12-22 17:33:59,215][03276] Updated weights for policy 0, policy_version 640 (0.0014)
[2024-12-22 17:34:03,638][00403] Fps is (10 sec: 4504.6, 60 sec: 4164.1, 300 sec: 4137.6). Total num frames: 2637824. Throughput: 0: 1060.9. Samples: 659126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:34:03,644][00403] Avg episode reward: [(0, '17.887')]
[2024-12-22 17:34:08,636][00403] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 2654208. Throughput: 0: 1027.7. Samples: 664090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:34:08,641][00403] Avg episode reward: [(0, '16.826')]
[2024-12-22 17:34:10,213][03276] Updated weights for policy 0, policy_version 650 (0.0017)
[2024-12-22 17:34:13,636][00403] Fps is (10 sec: 4096.9, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 2678784. Throughput: 0: 1013.9. Samples: 666970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:34:13,644][00403] Avg episode reward: [(0, '17.573')]
[2024-12-22 17:34:18,390][03276] Updated weights for policy 0, policy_version 660 (0.0018)
[2024-12-22 17:34:18,636][00403] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 2703360. Throughput: 0: 1052.0. Samples: 674492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:34:18,643][00403] Avg episode reward: [(0, '18.350')]
[2024-12-22 17:34:23,639][00403] Fps is (10 sec: 4094.9, 60 sec: 4164.1, 300 sec: 4123.7). Total num frames: 2719744. Throughput: 0: 1057.2. Samples: 680292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:34:23,649][00403] Avg episode reward: [(0, '17.382')]
[2024-12-22 17:34:28,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2740224. Throughput: 0: 1025.4. Samples: 682584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:34:28,639][00403] Avg episode reward: [(0, '17.098')]
[2024-12-22 17:34:29,506][03276] Updated weights for policy 0, policy_version 670 (0.0018)
[2024-12-22 17:34:33,636][00403] Fps is (10 sec: 4506.9, 60 sec: 4232.9, 300 sec: 4151.5). Total num frames: 2764800. Throughput: 0: 1027.5. Samples: 689578.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:34:33,642][00403] Avg episode reward: [(0, '16.481')]
[2024-12-22 17:34:33,651][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000675_2764800.pth...
[2024-12-22 17:34:33,790][03263] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth
[2024-12-22 17:34:38,220][03276] Updated weights for policy 0, policy_version 680 (0.0013)
[2024-12-22 17:34:38,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 2785280. Throughput: 0: 1077.3. Samples: 696394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:34:38,639][00403] Avg episode reward: [(0, '15.985')]
[2024-12-22 17:34:43,636][00403] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 2797568. Throughput: 0: 1041.6. Samples: 698592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:34:43,638][00403] Avg episode reward: [(0, '16.741')]
[2024-12-22 17:34:48,513][03276] Updated weights for policy 0, policy_version 690 (0.0017)
[2024-12-22 17:34:48,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 2826240. Throughput: 0: 1015.6. Samples: 704826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:34:48,644][00403] Avg episode reward: [(0, '18.588')]
[2024-12-22 17:34:53,636][00403] Fps is (10 sec: 5324.8, 60 sec: 4300.8, 300 sec: 4165.4). Total num frames: 2850816. Throughput: 0: 1068.5. Samples: 712172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:34:53,641][00403] Avg episode reward: [(0, '19.305')]
[2024-12-22 17:34:58,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2863104. Throughput: 0: 1065.1. Samples: 714900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-22 17:34:58,642][00403] Avg episode reward: [(0, '19.316')]
[2024-12-22 17:34:58,921][03276] Updated weights for policy 0, policy_version 700 (0.0022)
[2024-12-22 17:35:03,636][00403] Fps is (10 sec: 3276.8, 60 sec: 4096.1, 300 sec: 4137.7). Total num frames: 2883584. Throughput: 0: 1014.3. Samples: 720136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:35:03,644][00403] Avg episode reward: [(0, '20.017')]
[2024-12-22 17:35:07,831][03276] Updated weights for policy 0, policy_version 710 (0.0024)
[2024-12-22 17:35:08,636][00403] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4165.4). Total num frames: 2912256. Throughput: 0: 1051.1. Samples: 727590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:35:08,638][00403] Avg episode reward: [(0, '18.551')]
[2024-12-22 17:35:13,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 2928640. Throughput: 0: 1081.2. Samples: 731238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:35:13,639][00403] Avg episode reward: [(0, '18.523')]
[2024-12-22 17:35:18,559][03276] Updated weights for policy 0, policy_version 720 (0.0019)
[2024-12-22 17:35:18,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 2949120. Throughput: 0: 1027.1. Samples: 735798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:35:18,643][00403] Avg episode reward: [(0, '18.322')]
[2024-12-22 17:35:23,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.5, 300 sec: 4151.6). Total num frames: 2969600. Throughput: 0: 1032.1. Samples: 742840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:35:23,639][00403] Avg episode reward: [(0, '16.596')]
[2024-12-22 17:35:26,958][03276] Updated weights for policy 0, policy_version 730 (0.0014)
[2024-12-22 17:35:28,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 2994176. Throughput: 0: 1067.2. Samples: 746618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:35:28,639][00403] Avg episode reward: [(0, '17.072')]
[2024-12-22 17:35:33,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3010560. Throughput: 0: 1047.6. Samples: 751966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:35:33,640][00403] Avg episode reward: [(0, '18.156')]
[2024-12-22 17:35:37,830][03276] Updated weights for policy 0, policy_version 740 (0.0024)
[2024-12-22 17:35:38,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 3031040. Throughput: 0: 1022.4. Samples: 758180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:35:38,640][00403] Avg episode reward: [(0, '17.583')]
[2024-12-22 17:35:43,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4151.5). Total num frames: 3055616. Throughput: 0: 1041.4. Samples: 761764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:35:43,639][00403] Avg episode reward: [(0, '18.745')]
[2024-12-22 17:35:47,139][03276] Updated weights for policy 0, policy_version 750 (0.0025)
[2024-12-22 17:35:48,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3072000. Throughput: 0: 1066.9. Samples: 768148. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:35:48,639][00403] Avg episode reward: [(0, '19.592')]
[2024-12-22 17:35:53,637][00403] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4137.7). Total num frames: 3092480. Throughput: 0: 1015.1. Samples: 773270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:35:53,639][00403] Avg episode reward: [(0, '17.882')]
[2024-12-22 17:35:57,166][03276] Updated weights for policy 0, policy_version 760 (0.0020)
[2024-12-22 17:35:58,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 3117056. Throughput: 0: 1017.1. Samples: 777006.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-22 17:35:58,640][00403] Avg episode reward: [(0, '17.725')] [2024-12-22 17:36:03,636][00403] Fps is (10 sec: 4505.7, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 3137536. Throughput: 0: 1077.6. Samples: 784288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-22 17:36:03,642][00403] Avg episode reward: [(0, '17.707')] [2024-12-22 17:36:08,089][03276] Updated weights for policy 0, policy_version 770 (0.0036) [2024-12-22 17:36:08,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 3153920. Throughput: 0: 1021.4. Samples: 788802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-22 17:36:08,642][00403] Avg episode reward: [(0, '17.407')] [2024-12-22 17:36:13,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.6). Total num frames: 3178496. Throughput: 0: 1017.6. Samples: 792410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-22 17:36:13,641][00403] Avg episode reward: [(0, '18.382')] [2024-12-22 17:36:16,225][03276] Updated weights for policy 0, policy_version 780 (0.0023) [2024-12-22 17:36:18,637][00403] Fps is (10 sec: 4914.7, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 3203072. Throughput: 0: 1064.7. Samples: 799880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-22 17:36:18,645][00403] Avg episode reward: [(0, '19.404')] [2024-12-22 17:36:23,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 3219456. Throughput: 0: 1040.9. Samples: 805022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-22 17:36:23,638][00403] Avg episode reward: [(0, '18.423')] [2024-12-22 17:36:27,263][03276] Updated weights for policy 0, policy_version 790 (0.0037) [2024-12-22 17:36:28,636][00403] Fps is (10 sec: 3686.8, 60 sec: 4096.0, 300 sec: 4151.5). Total num frames: 3239936. Throughput: 0: 1019.5. Samples: 807640. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:36:28,643][00403] Avg episode reward: [(0, '19.026')]
[2024-12-22 17:36:33,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 3264512. Throughput: 0: 1042.8. Samples: 815076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:36:33,640][00403] Avg episode reward: [(0, '20.272')]
[2024-12-22 17:36:33,655][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000797_3264512.pth...
[2024-12-22 17:36:33,783][03263] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000553_2265088.pth
[2024-12-22 17:36:36,023][03276] Updated weights for policy 0, policy_version 800 (0.0018)
[2024-12-22 17:36:38,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 3280896. Throughput: 0: 1062.0. Samples: 821060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:36:38,639][00403] Avg episode reward: [(0, '20.823')]
[2024-12-22 17:36:43,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 3301376. Throughput: 0: 1028.5. Samples: 823290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-22 17:36:43,643][00403] Avg episode reward: [(0, '20.847')]
[2024-12-22 17:36:43,654][03263] Saving new best policy, reward=20.847!
[2024-12-22 17:36:46,514][03276] Updated weights for policy 0, policy_version 810 (0.0021)
[2024-12-22 17:36:48,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 3325952. Throughput: 0: 1021.3. Samples: 830246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:36:48,639][00403] Avg episode reward: [(0, '20.225')]
[2024-12-22 17:36:53,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 3346432. Throughput: 0: 1075.8. Samples: 837214.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-22 17:36:53,641][00403] Avg episode reward: [(0, '20.106')]
[2024-12-22 17:36:56,691][03276] Updated weights for policy 0, policy_version 820 (0.0017)
[2024-12-22 17:36:58,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3362816. Throughput: 0: 1044.5. Samples: 839412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:36:58,640][00403] Avg episode reward: [(0, '21.097')]
[2024-12-22 17:36:58,649][03263] Saving new best policy, reward=21.097!
[2024-12-22 17:37:03,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4165.5). Total num frames: 3387392. Throughput: 0: 1011.0. Samples: 845372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:37:03,639][00403] Avg episode reward: [(0, '20.697')]
[2024-12-22 17:37:05,825][03276] Updated weights for policy 0, policy_version 830 (0.0018)
[2024-12-22 17:37:08,636][00403] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4179.3). Total num frames: 3411968. Throughput: 0: 1061.7. Samples: 852798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:37:08,640][00403] Avg episode reward: [(0, '22.342')]
[2024-12-22 17:37:08,643][03263] Saving new best policy, reward=22.342!
[2024-12-22 17:37:13,647][00403] Fps is (10 sec: 4091.7, 60 sec: 4163.5, 300 sec: 4151.4). Total num frames: 3428352. Throughput: 0: 1064.3. Samples: 855544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:37:13,653][00403] Avg episode reward: [(0, '22.865')]
[2024-12-22 17:37:13,667][03263] Saving new best policy, reward=22.865!
[2024-12-22 17:37:16,928][03276] Updated weights for policy 0, policy_version 840 (0.0027)
[2024-12-22 17:37:18,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4151.5). Total num frames: 3448832. Throughput: 0: 1007.9. Samples: 860432.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:37:18,642][00403] Avg episode reward: [(0, '23.093')]
[2024-12-22 17:37:18,647][03263] Saving new best policy, reward=23.093!
[2024-12-22 17:37:23,637][00403] Fps is (10 sec: 4099.8, 60 sec: 4164.2, 300 sec: 4165.4). Total num frames: 3469312. Throughput: 0: 1038.2. Samples: 867780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:37:23,641][00403] Avg episode reward: [(0, '23.274')]
[2024-12-22 17:37:23,651][03263] Saving new best policy, reward=23.274!
[2024-12-22 17:37:25,444][03276] Updated weights for policy 0, policy_version 850 (0.0019)
[2024-12-22 17:37:28,637][00403] Fps is (10 sec: 4095.7, 60 sec: 4164.2, 300 sec: 4151.5). Total num frames: 3489792. Throughput: 0: 1066.5. Samples: 871282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:37:28,642][00403] Avg episode reward: [(0, '23.329')]
[2024-12-22 17:37:28,648][03263] Saving new best policy, reward=23.329!
[2024-12-22 17:37:33,636][00403] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 3506176. Throughput: 0: 1014.2. Samples: 875884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:37:33,638][00403] Avg episode reward: [(0, '23.070')]
[2024-12-22 17:37:36,494][03276] Updated weights for policy 0, policy_version 860 (0.0030)
[2024-12-22 17:37:38,636][00403] Fps is (10 sec: 4096.3, 60 sec: 4164.3, 300 sec: 4165.5). Total num frames: 3530752. Throughput: 0: 1011.6. Samples: 882738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:37:38,642][00403] Avg episode reward: [(0, '23.771')]
[2024-12-22 17:37:38,647][03263] Saving new best policy, reward=23.771!
[2024-12-22 17:37:43,636][00403] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 3555328. Throughput: 0: 1043.8. Samples: 886382.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:37:43,641][00403] Avg episode reward: [(0, '23.308')]
[2024-12-22 17:37:46,178][03276] Updated weights for policy 0, policy_version 870 (0.0028)
[2024-12-22 17:37:48,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 3567616. Throughput: 0: 1034.7. Samples: 891934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:37:48,638][00403] Avg episode reward: [(0, '22.071')]
[2024-12-22 17:37:53,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4151.5). Total num frames: 3592192. Throughput: 0: 1004.6. Samples: 898006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-22 17:37:53,641][00403] Avg episode reward: [(0, '23.461')]
[2024-12-22 17:37:55,942][03276] Updated weights for policy 0, policy_version 880 (0.0015)
[2024-12-22 17:37:58,636][00403] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 3616768. Throughput: 0: 1024.2. Samples: 901624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:37:58,638][00403] Avg episode reward: [(0, '22.911')]
[2024-12-22 17:38:03,637][00403] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4137.6). Total num frames: 3633152. Throughput: 0: 1061.7. Samples: 908210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-22 17:38:03,644][00403] Avg episode reward: [(0, '22.731')]
[2024-12-22 17:38:06,488][03276] Updated weights for policy 0, policy_version 890 (0.0025)
[2024-12-22 17:38:08,636][00403] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4137.7). Total num frames: 3653632. Throughput: 0: 1008.8. Samples: 913174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:38:08,642][00403] Avg episode reward: [(0, '22.996')]
[2024-12-22 17:38:13,636][00403] Fps is (10 sec: 4505.8, 60 sec: 4165.0, 300 sec: 4165.4). Total num frames: 3678208. Throughput: 0: 1013.5. Samples: 916890.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:38:13,638][00403] Avg episode reward: [(0, '24.345')]
[2024-12-22 17:38:13,650][03263] Saving new best policy, reward=24.345!
[2024-12-22 17:38:15,122][03276] Updated weights for policy 0, policy_version 900 (0.0018)
[2024-12-22 17:38:18,636][00403] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 3698688. Throughput: 0: 1074.4. Samples: 924230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:38:18,639][00403] Avg episode reward: [(0, '22.997')]
[2024-12-22 17:38:23,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4137.7). Total num frames: 3715072. Throughput: 0: 1023.8. Samples: 928810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:38:23,643][00403] Avg episode reward: [(0, '21.996')]
[2024-12-22 17:38:26,215][03276] Updated weights for policy 0, policy_version 910 (0.0023)
[2024-12-22 17:38:28,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4151.6). Total num frames: 3735552. Throughput: 0: 1010.4. Samples: 931850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:38:28,638][00403] Avg episode reward: [(0, '20.986')]
[2024-12-22 17:38:33,636][00403] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4179.3). Total num frames: 3764224. Throughput: 0: 1054.0. Samples: 939366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-22 17:38:33,639][00403] Avg episode reward: [(0, '21.288')]
[2024-12-22 17:38:33,648][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000919_3764224.pth...
[2024-12-22 17:38:33,776][03263] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000675_2764800.pth
[2024-12-22 17:38:34,760][03276] Updated weights for policy 0, policy_version 920 (0.0014)
[2024-12-22 17:38:38,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 3776512. Throughput: 0: 1041.8. Samples: 944888.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:38:38,642][00403] Avg episode reward: [(0, '20.347')]
[2024-12-22 17:38:43,636][00403] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4137.7). Total num frames: 3796992. Throughput: 0: 1013.6. Samples: 947238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-22 17:38:43,647][00403] Avg episode reward: [(0, '20.710')]
[2024-12-22 17:38:45,470][03276] Updated weights for policy 0, policy_version 930 (0.0044)
[2024-12-22 17:38:48,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 3821568. Throughput: 0: 1027.1. Samples: 954428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:38:48,643][00403] Avg episode reward: [(0, '21.788')]
[2024-12-22 17:38:53,636][00403] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 3842048. Throughput: 0: 1064.9. Samples: 961094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:38:53,643][00403] Avg episode reward: [(0, '20.139')]
[2024-12-22 17:38:55,592][03276] Updated weights for policy 0, policy_version 940 (0.0014)
[2024-12-22 17:38:58,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4137.7). Total num frames: 3858432. Throughput: 0: 1030.0. Samples: 963240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:38:58,639][00403] Avg episode reward: [(0, '20.163')]
[2024-12-22 17:39:03,637][00403] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 3883008. Throughput: 0: 1009.0. Samples: 969634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:39:03,639][00403] Avg episode reward: [(0, '21.656')]
[2024-12-22 17:39:04,787][03276] Updated weights for policy 0, policy_version 950 (0.0042)
[2024-12-22 17:39:08,636][00403] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 3907584. Throughput: 0: 1074.5. Samples: 977164.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:39:08,638][00403] Avg episode reward: [(0, '20.701')]
[2024-12-22 17:39:13,637][00403] Fps is (10 sec: 4095.7, 60 sec: 4095.9, 300 sec: 4137.6). Total num frames: 3923968. Throughput: 0: 1060.2. Samples: 979560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-22 17:39:13,647][00403] Avg episode reward: [(0, '20.958')]
[2024-12-22 17:39:15,749][03276] Updated weights for policy 0, policy_version 960 (0.0014)
[2024-12-22 17:39:18,636][00403] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4151.6). Total num frames: 3944448. Throughput: 0: 1012.3. Samples: 984918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:39:18,639][00403] Avg episode reward: [(0, '22.602')]
[2024-12-22 17:39:23,636][00403] Fps is (10 sec: 4506.0, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 3969024. Throughput: 0: 1057.3. Samples: 992468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:39:23,638][00403] Avg episode reward: [(0, '22.125')]
[2024-12-22 17:39:23,890][03276] Updated weights for policy 0, policy_version 970 (0.0018)
[2024-12-22 17:39:28,636][00403] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 3985408. Throughput: 0: 1076.9. Samples: 995700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-22 17:39:28,640][00403] Avg episode reward: [(0, '21.061')]
[2024-12-22 17:39:33,271][03263] Stopping Batcher_0...
[2024-12-22 17:39:33,272][03263] Loop batcher_evt_loop terminating...
[2024-12-22 17:39:33,279][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-22 17:39:33,271][00403] Component Batcher_0 stopped!
[2024-12-22 17:39:33,332][03276] Weights refcount: 2 0
[2024-12-22 17:39:33,334][03276] Stopping InferenceWorker_p0-w0...
[2024-12-22 17:39:33,338][03276] Loop inference_proc0-0_evt_loop terminating...
[2024-12-22 17:39:33,335][00403] Component InferenceWorker_p0-w0 stopped!
[2024-12-22 17:39:33,418][03263] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000797_3264512.pth
[2024-12-22 17:39:33,437][03263] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-22 17:39:33,604][03263] Stopping LearnerWorker_p0...
[2024-12-22 17:39:33,604][03263] Loop learner_proc0_evt_loop terminating...
[2024-12-22 17:39:33,604][00403] Component RolloutWorker_w4 stopped!
[2024-12-22 17:39:33,612][00403] Component LearnerWorker_p0 stopped!
[2024-12-22 17:39:33,618][03281] Stopping RolloutWorker_w4...
[2024-12-22 17:39:33,621][00403] Component RolloutWorker_w2 stopped!
[2024-12-22 17:39:33,626][03279] Stopping RolloutWorker_w2...
[2024-12-22 17:39:33,633][00403] Component RolloutWorker_w0 stopped!
[2024-12-22 17:39:33,619][03281] Loop rollout_proc4_evt_loop terminating...
[2024-12-22 17:39:33,628][03279] Loop rollout_proc2_evt_loop terminating...
[2024-12-22 17:39:33,638][03277] Stopping RolloutWorker_w0...
[2024-12-22 17:39:33,641][03277] Loop rollout_proc0_evt_loop terminating...
[2024-12-22 17:39:33,644][00403] Component RolloutWorker_w6 stopped!
[2024-12-22 17:39:33,649][03282] Stopping RolloutWorker_w6...
[2024-12-22 17:39:33,650][03282] Loop rollout_proc6_evt_loop terminating...
[2024-12-22 17:39:33,785][03278] Stopping RolloutWorker_w1...
[2024-12-22 17:39:33,786][03278] Loop rollout_proc1_evt_loop terminating...
[2024-12-22 17:39:33,785][00403] Component RolloutWorker_w1 stopped!
[2024-12-22 17:39:33,794][00403] Component RolloutWorker_w5 stopped!
[2024-12-22 17:39:33,796][03284] Stopping RolloutWorker_w5...
[2024-12-22 17:39:33,796][03284] Loop rollout_proc5_evt_loop terminating...
[2024-12-22 17:39:33,803][00403] Component RolloutWorker_w3 stopped!
[2024-12-22 17:39:33,808][03280] Stopping RolloutWorker_w3...
[2024-12-22 17:39:33,809][03280] Loop rollout_proc3_evt_loop terminating...
[2024-12-22 17:39:33,817][00403] Component RolloutWorker_w7 stopped!
[2024-12-22 17:39:33,823][00403] Waiting for process learner_proc0 to stop...
[2024-12-22 17:39:33,826][03283] Stopping RolloutWorker_w7...
[2024-12-22 17:39:33,826][03283] Loop rollout_proc7_evt_loop terminating...
[2024-12-22 17:39:35,243][00403] Waiting for process inference_proc0-0 to join...
[2024-12-22 17:39:35,247][00403] Waiting for process rollout_proc0 to join...
[2024-12-22 17:39:37,169][00403] Waiting for process rollout_proc1 to join...
[2024-12-22 17:39:37,172][00403] Waiting for process rollout_proc2 to join...
[2024-12-22 17:39:37,177][00403] Waiting for process rollout_proc3 to join...
[2024-12-22 17:39:37,181][00403] Waiting for process rollout_proc4 to join...
[2024-12-22 17:39:37,184][00403] Waiting for process rollout_proc5 to join...
[2024-12-22 17:39:37,188][00403] Waiting for process rollout_proc6 to join...
[2024-12-22 17:39:37,192][00403] Waiting for process rollout_proc7 to join...
[2024-12-22 17:39:37,195][00403] Batcher 0 profile tree view:
batching: 25.4539, releasing_batches: 0.0252
[2024-12-22 17:39:37,198][00403] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 392.5885
update_model: 8.1828
  weight_update: 0.0027
one_step: 0.0034
  handle_policy_step: 547.3133
    deserialize: 13.9169, stack: 3.0888, obs_to_device_normalize: 118.1281, forward: 272.3693, send_messages: 27.4489
    prepare_outputs: 84.2889
      to_cpu: 50.7270
[2024-12-22 17:39:37,199][00403] Learner 0 profile tree view:
misc: 0.0047, prepare_batch: 13.1290
train: 71.2559
  epoch_init: 0.0175, minibatch_init: 0.0063, losses_postprocess: 0.6687, kl_divergence: 0.5029, after_optimizer: 32.6294
  calculate_losses: 25.3187
    losses_init: 0.0036, forward_head: 1.2495, bptt_initial: 16.7741, tail: 0.9664, advantages_returns: 0.2770, losses: 3.7945
    bptt: 1.9036
      bptt_forward_core: 1.8130
  update: 11.4179
    clip: 0.8466
[2024-12-22 17:39:37,201][00403] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3344, enqueue_policy_requests: 84.9332, env_step: 782.3665,
overhead: 11.5078, complete_rollouts: 6.7145
save_policy_outputs: 19.6547
  split_output_tensors: 7.8126
[2024-12-22 17:39:37,203][00403] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3366, enqueue_policy_requests: 92.0859, env_step: 776.4577, overhead: 11.3950, complete_rollouts: 6.7023
save_policy_outputs: 19.4588
  split_output_tensors: 7.9831
[2024-12-22 17:39:37,204][00403] Loop Runner_EvtLoop terminating...
[2024-12-22 17:39:37,205][00403] Runner profile tree view:
main_loop: 1017.6939
[2024-12-22 17:39:37,207][00403] Collected {0: 4005888}, FPS: 3936.2
[2024-12-22 17:39:59,000][00403] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-22 17:39:59,001][00403] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-22 17:39:59,005][00403] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-22 17:39:59,007][00403] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-22 17:39:59,008][00403] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-22 17:39:59,010][00403] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-22 17:39:59,012][00403] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-22 17:39:59,015][00403] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-22 17:39:59,017][00403] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-22 17:39:59,019][00403] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-22 17:39:59,020][00403] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-22 17:39:59,021][00403] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-22 17:39:59,023][00403] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-22 17:39:59,024][00403] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-22 17:39:59,025][00403] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-22 17:39:59,059][00403] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-22 17:39:59,063][00403] RunningMeanStd input shape: (3, 72, 128)
[2024-12-22 17:39:59,067][00403] RunningMeanStd input shape: (1,)
[2024-12-22 17:39:59,083][00403] ConvEncoder: input_channels=3
[2024-12-22 17:39:59,185][00403] Conv encoder output size: 512
[2024-12-22 17:39:59,188][00403] Policy head output size: 512
[2024-12-22 17:39:59,359][00403] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-22 17:40:00,168][00403] Num frames 100...
[2024-12-22 17:40:00,293][00403] Num frames 200...
[2024-12-22 17:40:00,411][00403] Num frames 300...
[2024-12-22 17:40:00,531][00403] Num frames 400...
[2024-12-22 17:40:00,650][00403] Num frames 500...
[2024-12-22 17:40:00,780][00403] Num frames 600...
[2024-12-22 17:40:00,904][00403] Num frames 700...
[2024-12-22 17:40:01,023][00403] Num frames 800...
[2024-12-22 17:40:01,145][00403] Num frames 900...
[2024-12-22 17:40:01,266][00403] Num frames 1000...
[2024-12-22 17:40:01,386][00403] Num frames 1100...
[2024-12-22 17:40:01,509][00403] Num frames 1200...
[2024-12-22 17:40:01,630][00403] Num frames 1300...
[2024-12-22 17:40:01,758][00403] Num frames 1400...
[2024-12-22 17:40:01,889][00403] Num frames 1500...
[2024-12-22 17:40:02,019][00403] Num frames 1600...
[2024-12-22 17:40:02,142][00403] Num frames 1700...
[2024-12-22 17:40:02,265][00403] Num frames 1800...
[2024-12-22 17:40:02,388][00403] Num frames 1900...
[2024-12-22 17:40:02,514][00403] Num frames 2000...
[2024-12-22 17:40:02,641][00403] Num frames 2100...
[2024-12-22 17:40:02,693][00403] Avg episode rewards: #0: 59.999, true rewards: #0: 21.000
[2024-12-22 17:40:02,695][00403] Avg episode reward: 59.999, avg true_objective: 21.000
[2024-12-22 17:40:02,823][00403] Num frames 2200...
[2024-12-22 17:40:02,943][00403] Num frames 2300...
[2024-12-22 17:40:03,066][00403] Num frames 2400...
[2024-12-22 17:40:03,185][00403] Avg episode rewards: #0: 32.259, true rewards: #0: 12.260
[2024-12-22 17:40:03,187][00403] Avg episode reward: 32.259, avg true_objective: 12.260
[2024-12-22 17:40:03,248][00403] Num frames 2500...
[2024-12-22 17:40:03,375][00403] Num frames 2600...
[2024-12-22 17:40:03,496][00403] Num frames 2700...
[2024-12-22 17:40:03,614][00403] Num frames 2800...
[2024-12-22 17:40:03,745][00403] Num frames 2900...
[2024-12-22 17:40:03,877][00403] Num frames 3000...
[2024-12-22 17:40:03,999][00403] Num frames 3100...
[2024-12-22 17:40:04,123][00403] Num frames 3200...
[2024-12-22 17:40:04,247][00403] Num frames 3300...
[2024-12-22 17:40:04,368][00403] Num frames 3400...
[2024-12-22 17:40:04,490][00403] Num frames 3500...
[2024-12-22 17:40:04,612][00403] Num frames 3600...
[2024-12-22 17:40:04,746][00403] Num frames 3700...
[2024-12-22 17:40:04,859][00403] Avg episode rewards: #0: 31.106, true rewards: #0: 12.440
[2024-12-22 17:40:04,861][00403] Avg episode reward: 31.106, avg true_objective: 12.440
[2024-12-22 17:40:04,977][00403] Num frames 3800...
[2024-12-22 17:40:05,139][00403] Num frames 3900...
[2024-12-22 17:40:05,300][00403] Num frames 4000...
[2024-12-22 17:40:05,463][00403] Num frames 4100...
[2024-12-22 17:40:05,631][00403] Num frames 4200...
[2024-12-22 17:40:05,805][00403] Num frames 4300...
[2024-12-22 17:40:05,979][00403] Num frames 4400...
[2024-12-22 17:40:06,153][00403] Num frames 4500...
[2024-12-22 17:40:06,324][00403] Num frames 4600...
[2024-12-22 17:40:06,496][00403] Num frames 4700...
[2024-12-22 17:40:06,671][00403] Num frames 4800...
[2024-12-22 17:40:06,850][00403] Num frames 4900...
[2024-12-22 17:40:07,030][00403] Num frames 5000...
[2024-12-22 17:40:07,206][00403] Num frames 5100...
[2024-12-22 17:40:07,352][00403] Num frames 5200...
[2024-12-22 17:40:07,475][00403] Num frames 5300...
[2024-12-22 17:40:07,597][00403] Num frames 5400...
[2024-12-22 17:40:07,725][00403] Num frames 5500...
[2024-12-22 17:40:07,892][00403] Avg episode rewards: #0: 35.485, true rewards: #0: 13.985
[2024-12-22 17:40:07,893][00403] Avg episode reward: 35.485, avg true_objective: 13.985
[2024-12-22 17:40:07,904][00403] Num frames 5600...
[2024-12-22 17:40:08,034][00403] Num frames 5700...
[2024-12-22 17:40:08,156][00403] Num frames 5800...
[2024-12-22 17:40:08,277][00403] Num frames 5900...
[2024-12-22 17:40:08,400][00403] Num frames 6000...
[2024-12-22 17:40:08,520][00403] Num frames 6100...
[2024-12-22 17:40:08,644][00403] Num frames 6200...
[2024-12-22 17:40:08,776][00403] Num frames 6300...
[2024-12-22 17:40:08,897][00403] Num frames 6400...
[2024-12-22 17:40:09,021][00403] Num frames 6500...
[2024-12-22 17:40:09,160][00403] Num frames 6600...
[2024-12-22 17:40:09,285][00403] Num frames 6700...
[2024-12-22 17:40:09,406][00403] Num frames 6800...
[2024-12-22 17:40:09,531][00403] Num frames 6900...
[2024-12-22 17:40:09,652][00403] Num frames 7000...
[2024-12-22 17:40:09,780][00403] Num frames 7100...
[2024-12-22 17:40:09,914][00403] Avg episode rewards: #0: 34.924, true rewards: #0: 14.324
[2024-12-22 17:40:09,915][00403] Avg episode reward: 34.924, avg true_objective: 14.324
[2024-12-22 17:40:09,962][00403] Num frames 7200...
[2024-12-22 17:40:10,092][00403] Num frames 7300...
[2024-12-22 17:40:10,213][00403] Num frames 7400...
[2024-12-22 17:40:10,334][00403] Num frames 7500...
[2024-12-22 17:40:10,453][00403] Num frames 7600...
[2024-12-22 17:40:10,575][00403] Num frames 7700...
[2024-12-22 17:40:10,706][00403] Num frames 7800...
[2024-12-22 17:40:10,827][00403] Num frames 7900...
[2024-12-22 17:40:10,948][00403] Num frames 8000...
[2024-12-22 17:40:11,071][00403] Num frames 8100...
[2024-12-22 17:40:11,205][00403] Num frames 8200...
[2024-12-22 17:40:11,324][00403] Num frames 8300...
[2024-12-22 17:40:11,451][00403] Num frames 8400...
[2024-12-22 17:40:11,575][00403] Num frames 8500...
[2024-12-22 17:40:11,709][00403] Num frames 8600...
[2024-12-22 17:40:11,830][00403] Num frames 8700...
[2024-12-22 17:40:11,951][00403] Num frames 8800...
[2024-12-22 17:40:12,079][00403] Num frames 8900...
[2024-12-22 17:40:12,211][00403] Num frames 9000...
[2024-12-22 17:40:12,336][00403] Num frames 9100...
[2024-12-22 17:40:12,456][00403] Num frames 9200...
[2024-12-22 17:40:12,586][00403] Avg episode rewards: #0: 38.770, true rewards: #0: 15.437
[2024-12-22 17:40:12,588][00403] Avg episode reward: 38.770, avg true_objective: 15.437
[2024-12-22 17:40:12,637][00403] Num frames 9300...
[2024-12-22 17:40:12,764][00403] Num frames 9400...
[2024-12-22 17:40:12,887][00403] Num frames 9500...
[2024-12-22 17:40:13,008][00403] Num frames 9600...
[2024-12-22 17:40:13,132][00403] Num frames 9700...
[2024-12-22 17:40:13,263][00403] Num frames 9800...
[2024-12-22 17:40:13,371][00403] Avg episode rewards: #0: 34.911, true rewards: #0: 14.054
[2024-12-22 17:40:13,372][00403] Avg episode reward: 34.911, avg true_objective: 14.054
[2024-12-22 17:40:13,450][00403] Num frames 9900...
[2024-12-22 17:40:13,571][00403] Num frames 10000...
[2024-12-22 17:40:13,701][00403] Num frames 10100...
[2024-12-22 17:40:13,828][00403] Num frames 10200...
[2024-12-22 17:40:13,949][00403] Num frames 10300...
[2024-12-22 17:40:14,075][00403] Num frames 10400...
[2024-12-22 17:40:14,205][00403] Num frames 10500...
[2024-12-22 17:40:14,327][00403] Num frames 10600...
[2024-12-22 17:40:14,447][00403] Num frames 10700...
[2024-12-22 17:40:14,572][00403] Num frames 10800...
[2024-12-22 17:40:14,704][00403] Num frames 10900...
[2024-12-22 17:40:14,827][00403] Num frames 11000...
[2024-12-22 17:40:14,948][00403] Num frames 11100...
[2024-12-22 17:40:15,072][00403] Num frames 11200...
[2024-12-22 17:40:15,202][00403] Num frames 11300...
[2024-12-22 17:40:15,335][00403] Num frames 11400...
[2024-12-22 17:40:15,458][00403] Num frames 11500...
[2024-12-22 17:40:15,578][00403] Num frames 11600...
[2024-12-22 17:40:15,706][00403] Num frames 11700...
[2024-12-22 17:40:15,776][00403] Avg episode rewards: #0: 36.390, true rewards: #0: 14.640
[2024-12-22 17:40:15,778][00403] Avg episode reward: 36.390, avg true_objective: 14.640
[2024-12-22 17:40:15,887][00403] Num frames 11800...
[2024-12-22 17:40:16,007][00403] Num frames 11900...
[2024-12-22 17:40:16,129][00403] Num frames 12000...
[2024-12-22 17:40:16,262][00403] Num frames 12100...
[2024-12-22 17:40:16,382][00403] Num frames 12200...
[2024-12-22 17:40:16,504][00403] Num frames 12300...
[2024-12-22 17:40:16,626][00403] Num frames 12400...
[2024-12-22 17:40:16,754][00403] Num frames 12500...
[2024-12-22 17:40:16,877][00403] Num frames 12600...
[2024-12-22 17:40:16,998][00403] Num frames 12700...
[2024-12-22 17:40:17,137][00403] Avg episode rewards: #0: 34.742, true rewards: #0: 14.187
[2024-12-22 17:40:17,139][00403] Avg episode reward: 34.742, avg true_objective: 14.187
[2024-12-22 17:40:17,181][00403] Num frames 12800...
[2024-12-22 17:40:17,332][00403] Num frames 12900...
[2024-12-22 17:40:17,500][00403] Num frames 13000...
[2024-12-22 17:40:17,681][00403] Num frames 13100...
[2024-12-22 17:40:17,849][00403] Num frames 13200...
[2024-12-22 17:40:18,015][00403] Num frames 13300...
[2024-12-22 17:40:18,184][00403] Num frames 13400...
[2024-12-22 17:40:18,349][00403] Num frames 13500...
[2024-12-22 17:40:18,511][00403] Num frames 13600...
[2024-12-22 17:40:18,682][00403] Num frames 13700...
[2024-12-22 17:40:18,850][00403] Num frames 13800...
[2024-12-22 17:40:19,019][00403] Num frames 13900...
[2024-12-22 17:40:19,193][00403] Num frames 14000...
[2024-12-22 17:40:19,371][00403] Num frames 14100...
[2024-12-22 17:40:19,541][00403] Num frames 14200...
[2024-12-22 17:40:19,743][00403] Avg episode rewards: #0: 35.583, true rewards: #0: 14.283
[2024-12-22 17:40:19,745][00403] Avg episode reward: 35.583, avg true_objective: 14.283
[2024-12-22 17:41:41,998][00403] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-22 17:44:20,491][00403] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-22 17:44:20,493][00403] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-22 17:44:20,495][00403] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-22 17:44:20,496][00403] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-22 17:44:20,498][00403] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-22 17:44:20,500][00403] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-22 17:44:20,502][00403] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-12-22 17:44:20,503][00403] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-22 17:44:20,504][00403] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-12-22 17:44:20,505][00403] Adding new argument 'hf_repository'='sunnyday910/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-12-22 17:44:20,506][00403] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-22 17:44:20,508][00403] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-22 17:44:20,509][00403] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-22 17:44:20,510][00403] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-22 17:44:20,512][00403] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-22 17:44:20,548][00403] RunningMeanStd input shape: (3, 72, 128)
[2024-12-22 17:44:20,549][00403] RunningMeanStd input shape: (1,)
[2024-12-22 17:44:20,563][00403] ConvEncoder: input_channels=3
[2024-12-22 17:44:20,603][00403] Conv encoder output size: 512
[2024-12-22 17:44:20,604][00403] Policy head output size: 512
[2024-12-22 17:44:20,629][00403] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-22 17:44:21,060][00403] Num frames 100...
[2024-12-22 17:44:21,191][00403] Num frames 200...
[2024-12-22 17:44:21,310][00403] Num frames 300...
[2024-12-22 17:44:21,428][00403] Num frames 400...
[2024-12-22 17:44:21,552][00403] Num frames 500...
[2024-12-22 17:44:21,674][00403] Num frames 600...
[2024-12-22 17:44:21,796][00403] Num frames 700...
[2024-12-22 17:44:21,893][00403] Avg episode rewards: #0: 12.360, true rewards: #0: 7.360
[2024-12-22 17:44:21,895][00403] Avg episode reward: 12.360, avg true_objective: 7.360
[2024-12-22 17:44:21,973][00403] Num frames 800...
[2024-12-22 17:44:22,098][00403] Num frames 900...
[2024-12-22 17:44:22,227][00403] Num frames 1000...
[2024-12-22 17:44:22,347][00403] Num frames 1100...
[2024-12-22 17:44:22,469][00403] Num frames 1200...
[2024-12-22 17:44:22,587][00403] Num frames 1300...
[2024-12-22 17:44:22,717][00403] Num frames 1400...
[2024-12-22 17:44:22,841][00403] Num frames 1500...
[2024-12-22 17:44:22,961][00403] Num frames 1600...
[2024-12-22 17:44:23,075][00403] Avg episode rewards: #0: 14.730, true rewards: #0: 8.230
[2024-12-22 17:44:23,077][00403] Avg episode reward: 14.730, avg true_objective: 8.230
[2024-12-22 17:44:23,143][00403] Num frames 1700...
[2024-12-22 17:44:23,277][00403] Num frames 1800...
[2024-12-22 17:44:23,396][00403] Num frames 1900...
[2024-12-22 17:44:23,515][00403] Num frames 2000...
[2024-12-22 17:44:23,638][00403] Num frames 2100...
[2024-12-22 17:44:23,766][00403] Num frames 2200...
[2024-12-22 17:44:23,890][00403] Num frames 2300...
[2024-12-22 17:44:24,016][00403] Num frames 2400...
[2024-12-22 17:44:24,140][00403] Num frames 2500...
[2024-12-22 17:44:24,271][00403] Num frames 2600...
[2024-12-22 17:44:24,394][00403] Num frames 2700...
[2024-12-22 17:44:24,516][00403] Num frames 2800...
[2024-12-22 17:44:24,638][00403] Num frames 2900...
[2024-12-22 17:44:24,768][00403] Num frames 3000...
[2024-12-22 17:44:24,893][00403] Num frames 3100...
[2024-12-22 17:44:25,014][00403] Num frames 3200...
[2024-12-22 17:44:25,105][00403] Avg episode rewards: #0: 22.760, true rewards: #0: 10.760
[2024-12-22 17:44:25,106][00403] Avg episode reward: 22.760, avg true_objective: 10.760
[2024-12-22 17:44:25,197][00403] Num frames 3300...
[2024-12-22 17:44:25,325][00403] Num frames 3400...
[2024-12-22 17:44:25,445][00403] Num frames 3500...
[2024-12-22 17:44:25,563][00403] Num frames 3600...
[2024-12-22 17:44:25,693][00403] Num frames 3700...
[2024-12-22 17:44:25,829][00403] Num frames 3800...
[2024-12-22 17:44:25,968][00403] Avg episode rewards: #0: 19.670, true rewards: #0: 9.670
[2024-12-22 17:44:25,969][00403] Avg episode reward: 19.670, avg true_objective: 9.670
[2024-12-22 17:44:26,011][00403] Num frames 3900...
[2024-12-22 17:44:26,133][00403] Num frames 4000...
[2024-12-22 17:44:26,299][00403] Num frames 4100...
[2024-12-22 17:44:26,418][00403] Num frames 4200...
[2024-12-22 17:44:26,538][00403] Num frames 4300...
[2024-12-22 17:44:26,665][00403] Num frames 4400...
[2024-12-22 17:44:26,797][00403] Num frames 4500...
[2024-12-22 17:44:26,918][00403] Num frames 4600...
[2024-12-22 17:44:27,041][00403] Num frames 4700...
[2024-12-22 17:44:27,164][00403] Num frames 4800...
[2024-12-22 17:44:27,295][00403] Num frames 4900...
[2024-12-22 17:44:27,427][00403] Num frames 5000...
[2024-12-22 17:44:27,587][00403] Avg episode rewards: #0: 21.976, true rewards: #0: 10.176
[2024-12-22 17:44:27,590][00403] Avg episode reward: 21.976, avg true_objective: 10.176
[2024-12-22 17:44:27,606][00403] Num frames 5100...
[2024-12-22 17:44:27,735][00403] Num frames 5200...
[2024-12-22 17:44:27,860][00403] Num frames 5300...
[2024-12-22 17:44:27,981][00403] Num frames 5400...
[2024-12-22 17:44:28,105][00403] Num frames 5500...
[2024-12-22 17:44:28,228][00403] Num frames 5600...
[2024-12-22 17:44:28,291][00403] Avg episode rewards: #0: 20.005, true rewards: #0: 9.338
[2024-12-22 17:44:28,295][00403] Avg episode reward: 20.005, avg true_objective: 9.338
[2024-12-22 17:44:28,426][00403] Num frames 5700...
[2024-12-22 17:44:28,598][00403] Num frames 5800...
[2024-12-22 17:44:28,772][00403] Num frames 5900...
[2024-12-22 17:44:28,936][00403] Num frames 6000...
[2024-12-22 17:44:29,096][00403] Num frames 6100...
[2024-12-22 17:44:29,263][00403] Num frames 6200...
[2024-12-22 17:44:29,441][00403] Num frames 6300...
[2024-12-22 17:44:29,624][00403] Num frames 6400...
[2024-12-22 17:44:29,808][00403] Avg episode rewards: #0: 19.817, true rewards: #0: 9.246
[2024-12-22 17:44:29,810][00403] Avg episode reward: 19.817, avg true_objective: 9.246
[2024-12-22 17:44:29,861][00403] Num frames 6500...
[2024-12-22 17:44:30,032][00403] Num frames 6600...
[2024-12-22 17:44:30,194][00403] Num frames 6700...
[2024-12-22 17:44:30,368][00403] Num frames 6800...
[2024-12-22 17:44:30,545][00403] Num frames 6900...
[2024-12-22 17:44:30,727][00403] Num frames 7000...
[2024-12-22 17:44:30,903][00403] Num frames 7100...
[2024-12-22 17:44:31,053][00403] Num frames 7200...
[2024-12-22 17:44:31,172][00403] Num frames 7300...
[2024-12-22 17:44:31,294][00403] Num frames 7400...
[2024-12-22 17:44:31,381][00403] Avg episode rewards: #0: 19.908, true rewards: #0: 9.282
[2024-12-22 17:44:31,383][00403] Avg episode reward: 19.908, avg true_objective: 9.282
[2024-12-22 17:44:31,482][00403] Num frames 7500...
[2024-12-22 17:44:31,604][00403] Num frames 7600...
[2024-12-22 17:44:31,737][00403] Num frames 7700...
[2024-12-22 17:44:31,857][00403] Num frames 7800...
[2024-12-22 17:44:31,977][00403] Num frames 7900...
[2024-12-22 17:44:32,102][00403] Num frames 8000...
[2024-12-22 17:44:32,223][00403] Num frames 8100...
[2024-12-22 17:44:32,343][00403] Num frames 8200...
[2024-12-22 17:44:32,470][00403] Num frames 8300...
[2024-12-22 17:44:32,593][00403] Num frames 8400...
[2024-12-22 17:44:32,729][00403] Num frames 8500...
[2024-12-22 17:44:32,848][00403] Num frames 8600...
[2024-12-22 17:44:32,973][00403] Num frames 8700...
[2024-12-22 17:44:33,094][00403] Num frames 8800...
[2024-12-22 17:44:33,215][00403] Num frames 8900...
[2024-12-22 17:44:33,344][00403] Num frames 9000...
[2024-12-22 17:44:33,467][00403] Num frames 9100...
[2024-12-22 17:44:33,596][00403] Num frames 9200...
[2024-12-22 17:44:33,753][00403] Avg episode rewards: #0: 22.974, true rewards: #0: 10.308
[2024-12-22 17:44:33,755][00403] Avg episode reward: 22.974, avg true_objective: 10.308
[2024-12-22 17:44:33,784][00403] Num frames 9300...
[2024-12-22 17:44:33,906][00403] Num frames 9400...
[2024-12-22 17:44:34,029][00403] Num frames 9500...
[2024-12-22 17:44:34,155][00403] Num frames 9600...
[2024-12-22 17:44:34,277][00403] Num frames 9700...
[2024-12-22 17:44:34,397][00403] Num frames 9800...
[2024-12-22 17:44:34,527][00403] Num frames 9900...
[2024-12-22 17:44:34,649][00403] Num frames 10000...
[2024-12-22 17:44:34,782][00403] Num frames 10100...
[2024-12-22 17:44:34,900][00403] Num frames 10200...
[2024-12-22 17:44:34,999][00403] Avg episode rewards: #0: 22.737, true rewards: #0: 10.237
[2024-12-22 17:44:35,002][00403] Avg episode reward: 22.737, avg true_objective: 10.237
[2024-12-22 17:45:33,016][00403] Replay video saved to /content/train_dir/default_experiment/replay.mp4!