[2024-12-23 04:15:51,631][00864] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-23 04:15:51,634][00864] Rollout worker 0 uses device cpu
[2024-12-23 04:15:51,635][00864] Rollout worker 1 uses device cpu
[2024-12-23 04:15:51,637][00864] Rollout worker 2 uses device cpu
[2024-12-23 04:15:51,642][00864] Rollout worker 3 uses device cpu
[2024-12-23 04:15:51,643][00864] Rollout worker 4 uses device cpu
[2024-12-23 04:15:51,644][00864] Rollout worker 5 uses device cpu
[2024-12-23 04:15:51,645][00864] Rollout worker 6 uses device cpu
[2024-12-23 04:15:51,646][00864] Rollout worker 7 uses device cpu
[2024-12-23 04:15:51,794][00864] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-23 04:15:51,795][00864] InferenceWorker_p0-w0: min num requests: 2
[2024-12-23 04:15:51,829][00864] Starting all processes...
[2024-12-23 04:15:51,830][00864] Starting process learner_proc0
[2024-12-23 04:15:51,877][00864] Starting all processes...
[2024-12-23 04:15:51,887][00864] Starting process inference_proc0-0
[2024-12-23 04:15:51,890][00864] Starting process rollout_proc2
[2024-12-23 04:15:51,890][00864] Starting process rollout_proc1
[2024-12-23 04:15:51,891][00864] Starting process rollout_proc3
[2024-12-23 04:15:51,891][00864] Starting process rollout_proc4
[2024-12-23 04:15:51,891][00864] Starting process rollout_proc5
[2024-12-23 04:15:51,891][00864] Starting process rollout_proc6
[2024-12-23 04:15:51,891][00864] Starting process rollout_proc7
[2024-12-23 04:15:51,887][00864] Starting process rollout_proc0
[2024-12-23 04:16:07,934][03635] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-23 04:16:07,938][03635] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-23 04:16:08,016][03635] Num visible devices: 1
[2024-12-23 04:16:08,051][03635] Starting seed is not provided
[2024-12-23 04:16:08,052][03635] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-23 04:16:08,052][03635] Initializing actor-critic model on device cuda:0
[2024-12-23 04:16:08,053][03635] RunningMeanStd input shape: (3, 72, 128)
[2024-12-23 04:16:08,056][03635] RunningMeanStd input shape: (1,)
[2024-12-23 04:16:08,160][03635] ConvEncoder: input_channels=3
[2024-12-23 04:16:08,604][03654] Worker 1 uses CPU cores [1]
[2024-12-23 04:16:08,603][03656] Worker 4 uses CPU cores [0]
[2024-12-23 04:16:08,618][03653] Worker 2 uses CPU cores [0]
[2024-12-23 04:16:08,629][03652] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-23 04:16:08,634][03652] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-23 04:16:08,686][03658] Worker 7 uses CPU cores [1]
[2024-12-23 04:16:08,710][03652] Num visible devices: 1
[2024-12-23 04:16:08,791][03657] Worker 6 uses CPU cores [0]
[2024-12-23 04:16:08,797][03660] Worker 0 uses CPU cores [0]
[2024-12-23 04:16:08,820][03655] Worker 3 uses CPU cores [1]
[2024-12-23 04:16:08,851][03659] Worker 5 uses CPU cores [1]
[2024-12-23 04:16:08,863][03635] Conv encoder output size: 512
[2024-12-23 04:16:08,863][03635] Policy head output size: 512
[2024-12-23 04:16:08,913][03635] Created Actor Critic model with architecture:
[2024-12-23 04:16:08,913][03635] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-23 04:16:09,307][03635] Using optimizer
[2024-12-23 04:16:11,787][00864] Heartbeat connected on Batcher_0
[2024-12-23 04:16:11,795][00864] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-23 04:16:11,809][00864] Heartbeat connected on RolloutWorker_w0
[2024-12-23 04:16:11,811][00864] Heartbeat connected on RolloutWorker_w1
[2024-12-23 04:16:11,816][00864] Heartbeat connected on RolloutWorker_w3
[2024-12-23 04:16:11,818][00864] Heartbeat connected on RolloutWorker_w2
[2024-12-23 04:16:11,820][00864] Heartbeat connected on RolloutWorker_w4
[2024-12-23 04:16:11,822][00864] Heartbeat connected on RolloutWorker_w5
[2024-12-23 04:16:11,829][00864] Heartbeat connected on RolloutWorker_w7
[2024-12-23 04:16:11,834][00864] Heartbeat connected on RolloutWorker_w6
[2024-12-23 04:16:13,653][03635] No checkpoints found
[2024-12-23 04:16:13,653][03635] Did not load from checkpoint, starting from scratch!
[2024-12-23 04:16:13,653][03635] Initialized policy 0 weights for model version 0
[2024-12-23 04:16:13,659][03635] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-23 04:16:13,666][03635] LearnerWorker_p0 finished initialization!
[2024-12-23 04:16:13,667][00864] Heartbeat connected on LearnerWorker_p0
[2024-12-23 04:16:13,857][03652] RunningMeanStd input shape: (3, 72, 128)
[2024-12-23 04:16:13,858][03652] RunningMeanStd input shape: (1,)
[2024-12-23 04:16:13,873][03652] ConvEncoder: input_channels=3
[2024-12-23 04:16:13,978][03652] Conv encoder output size: 512
[2024-12-23 04:16:13,979][03652] Policy head output size: 512
[2024-12-23 04:16:14,031][00864] Inference worker 0-0 is ready!
[2024-12-23 04:16:14,033][00864] All inference workers are ready! Signal rollout workers to start!
[2024-12-23 04:16:14,222][03658] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-23 04:16:14,221][03655] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-23 04:16:14,224][03659] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-23 04:16:14,226][03654] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-23 04:16:14,252][03657] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-23 04:16:14,253][03656] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-23 04:16:14,257][03660] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-23 04:16:14,258][03653] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-23 04:16:15,218][03659] Decorrelating experience for 0 frames...
[2024-12-23 04:16:15,216][03654] Decorrelating experience for 0 frames...
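For reference, the module tree printed above corresponds to roughly the following network. The sketch below is a plain-PyTorch approximation assembled only from what the log shows (3x72x128 observations, a 512-d encoder output, a GRU(512, 512) core, a scalar value head, and 5 action logits); the conv kernel sizes and strides are assumptions, since the log lists layer types but not their hyperparameters, and observation/returns normalization is omitted.

```python
import torch
from torch import nn


class ActorCriticSketch(nn.Module):
    """Plain-PyTorch approximation of the ActorCriticSharedWeights tree above.
    Kernel sizes/strides are assumed; normalization layers are omitted."""

    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        # conv_head: three Conv2d + ELU pairs, as in the printed Sequential
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened size for a (3, 72, 128) frame
            conv_out = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        # mlp_layers: Linear + ELU down to the 512-d "Conv encoder output size"
        self.mlp_layers = nn.Sequential(nn.Linear(conv_out, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)                           # ModelCoreRNN
        self.critic_linear = nn.Linear(hidden, 1)                    # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)    # action logits

    def forward(self, obs, rnn_state=None):
        # obs: (T, B, 3, 72, 128) float tensor
        t, b = obs.shape[:2]
        feats = self.mlp_layers(self.conv_head(obs.flatten(0, 1)).flatten(1))
        core_out, rnn_state = self.core(feats.view(t, b, -1), rnn_state)
        return self.distribution_linear(core_out), self.critic_linear(core_out), rnn_state


model = ActorCriticSketch()
logits, values, _ = model(torch.rand(2, 4, 3, 72, 128))
print(logits.shape, values.shape)  # torch.Size([2, 4, 5]) torch.Size([2, 4, 1])
```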
[2024-12-23 04:16:15,953][03656] Decorrelating experience for 0 frames... [2024-12-23 04:16:15,957][03653] Decorrelating experience for 0 frames... [2024-12-23 04:16:15,959][03660] Decorrelating experience for 0 frames... [2024-12-23 04:16:15,962][03655] Decorrelating experience for 0 frames... [2024-12-23 04:16:15,967][03657] Decorrelating experience for 0 frames... [2024-12-23 04:16:15,980][03654] Decorrelating experience for 32 frames... [2024-12-23 04:16:16,575][00864] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-23 04:16:16,758][03655] Decorrelating experience for 32 frames... [2024-12-23 04:16:17,070][03654] Decorrelating experience for 64 frames... [2024-12-23 04:16:17,498][03660] Decorrelating experience for 32 frames... [2024-12-23 04:16:17,507][03657] Decorrelating experience for 32 frames... [2024-12-23 04:16:17,508][03656] Decorrelating experience for 32 frames... [2024-12-23 04:16:17,511][03653] Decorrelating experience for 32 frames... [2024-12-23 04:16:17,739][03658] Decorrelating experience for 0 frames... [2024-12-23 04:16:18,136][03654] Decorrelating experience for 96 frames... [2024-12-23 04:16:18,597][03660] Decorrelating experience for 64 frames... [2024-12-23 04:16:18,901][03659] Decorrelating experience for 32 frames... [2024-12-23 04:16:19,187][03658] Decorrelating experience for 32 frames... [2024-12-23 04:16:19,217][03653] Decorrelating experience for 64 frames... [2024-12-23 04:16:19,221][03655] Decorrelating experience for 64 frames... [2024-12-23 04:16:20,217][03659] Decorrelating experience for 64 frames... [2024-12-23 04:16:20,282][03655] Decorrelating experience for 96 frames... [2024-12-23 04:16:20,850][03656] Decorrelating experience for 64 frames... [2024-12-23 04:16:20,904][03657] Decorrelating experience for 64 frames... [2024-12-23 04:16:21,048][03653] Decorrelating experience for 96 frames... [2024-12-23 04:16:21,575][00864] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.0. Samples: 10. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-23 04:16:21,579][00864] Avg episode reward: [(0, '1.728')] [2024-12-23 04:16:23,149][03660] Decorrelating experience for 96 frames... [2024-12-23 04:16:23,519][03656] Decorrelating experience for 96 frames... [2024-12-23 04:16:23,675][03657] Decorrelating experience for 96 frames... [2024-12-23 04:16:24,615][03658] Decorrelating experience for 64 frames... [2024-12-23 04:16:26,575][00864] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 205.6. Samples: 2056. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-23 04:16:26,577][00864] Avg episode reward: [(0, '2.896')] [2024-12-23 04:16:28,421][03635] Signal inference workers to stop experience collection... [2024-12-23 04:16:28,457][03652] InferenceWorker_p0-w0: stopping experience collection [2024-12-23 04:16:29,417][03659] Decorrelating experience for 96 frames... [2024-12-23 04:16:29,544][03658] Decorrelating experience for 96 frames... [2024-12-23 04:16:30,996][03635] Signal inference workers to resume experience collection... [2024-12-23 04:16:30,999][03652] InferenceWorker_p0-w0: resuming experience collection [2024-12-23 04:16:31,575][00864] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 170.5. Samples: 2558. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-12-23 04:16:31,577][00864] Avg episode reward: [(0, '3.104')] [2024-12-23 04:16:36,575][00864] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 346.1. Samples: 6922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:16:36,581][00864] Avg episode reward: [(0, '3.638')] [2024-12-23 04:16:38,389][03652] Updated weights for policy 0, policy_version 10 (0.0161) [2024-12-23 04:16:41,575][00864] Fps is (10 sec: 4915.2, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 541.8. Samples: 13544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:16:41,577][00864] Avg episode reward: [(0, '4.380')] [2024-12-23 04:16:46,575][00864] Fps is (10 sec: 3686.4, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 523.8. Samples: 15714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:16:46,577][00864] Avg episode reward: [(0, '4.452')] [2024-12-23 04:16:49,682][03652] Updated weights for policy 0, policy_version 20 (0.0045) [2024-12-23 04:16:51,575][00864] Fps is (10 sec: 3686.4, 60 sec: 2574.6, 300 sec: 2574.6). Total num frames: 90112. Throughput: 0: 613.7. Samples: 21480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:16:51,577][00864] Avg episode reward: [(0, '4.250')] [2024-12-23 04:16:56,575][00864] Fps is (10 sec: 4915.1, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 114688. Throughput: 0: 711.3. Samples: 28452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:16:56,577][00864] Avg episode reward: [(0, '4.486')] [2024-12-23 04:16:56,585][03635] Saving new best policy, reward=4.486! [2024-12-23 04:16:59,162][03652] Updated weights for policy 0, policy_version 30 (0.0034) [2024-12-23 04:17:01,575][00864] Fps is (10 sec: 3686.4, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 692.0. Samples: 31142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:17:01,577][00864] Avg episode reward: [(0, '4.510')] [2024-12-23 04:17:01,580][03635] Saving new best policy, reward=4.510! [2024-12-23 04:17:06,578][00864] Fps is (10 sec: 3276.0, 60 sec: 2949.0, 300 sec: 2949.0). Total num frames: 147456. Throughput: 0: 794.0. Samples: 35740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:17:06,580][00864] Avg episode reward: [(0, '4.571')] [2024-12-23 04:17:06,587][03635] Saving new best policy, reward=4.571! [2024-12-23 04:17:09,983][03652] Updated weights for policy 0, policy_version 40 (0.0045) [2024-12-23 04:17:11,575][00864] Fps is (10 sec: 4096.0, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 907.6. Samples: 42900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:17:11,577][00864] Avg episode reward: [(0, '4.548')] [2024-12-23 04:17:16,575][00864] Fps is (10 sec: 4097.1, 60 sec: 3140.3, 300 sec: 3140.3). Total num frames: 188416. Throughput: 0: 976.7. Samples: 46508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:17:16,580][00864] Avg episode reward: [(0, '4.404')] [2024-12-23 04:17:20,965][03652] Updated weights for policy 0, policy_version 50 (0.0026) [2024-12-23 04:17:21,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3150.8). Total num frames: 204800. Throughput: 0: 978.8. Samples: 50968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:17:21,579][00864] Avg episode reward: [(0, '4.353')] [2024-12-23 04:17:26,575][00864] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3276.8). 
Total num frames: 229376. Throughput: 0: 976.9. Samples: 57504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:17:26,577][00864] Avg episode reward: [(0, '4.574')] [2024-12-23 04:17:26,584][03635] Saving new best policy, reward=4.574! [2024-12-23 04:17:29,967][03652] Updated weights for policy 0, policy_version 60 (0.0035) [2024-12-23 04:17:31,578][00864] Fps is (10 sec: 4504.1, 60 sec: 4095.8, 300 sec: 3331.3). Total num frames: 249856. Throughput: 0: 1007.4. Samples: 61050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:17:31,581][00864] Avg episode reward: [(0, '4.438')] [2024-12-23 04:17:36,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3328.0). Total num frames: 266240. Throughput: 0: 999.6. Samples: 66462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:17:36,579][00864] Avg episode reward: [(0, '4.428')] [2024-12-23 04:17:41,245][03652] Updated weights for policy 0, policy_version 70 (0.0032) [2024-12-23 04:17:41,575][00864] Fps is (10 sec: 3687.6, 60 sec: 3891.2, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 967.8. Samples: 72002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:17:41,577][00864] Avg episode reward: [(0, '4.526')] [2024-12-23 04:17:46,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3458.8). Total num frames: 311296. Throughput: 0: 987.3. Samples: 75572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:17:46,582][00864] Avg episode reward: [(0, '4.566')] [2024-12-23 04:17:46,592][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth... [2024-12-23 04:17:50,712][03652] Updated weights for policy 0, policy_version 80 (0.0019) [2024-12-23 04:17:51,575][00864] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3449.3). Total num frames: 327680. Throughput: 0: 1025.8. Samples: 81898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:17:51,580][00864] Avg episode reward: [(0, '4.447')] [2024-12-23 04:17:56,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3440.6). Total num frames: 344064. Throughput: 0: 963.0. Samples: 86234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:17:56,577][00864] Avg episode reward: [(0, '4.327')] [2024-12-23 04:18:01,388][03652] Updated weights for policy 0, policy_version 90 (0.0032) [2024-12-23 04:18:01,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3510.9). Total num frames: 368640. Throughput: 0: 962.9. Samples: 89838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-23 04:18:01,582][00864] Avg episode reward: [(0, '4.287')] [2024-12-23 04:18:06,576][00864] Fps is (10 sec: 4505.0, 60 sec: 4027.8, 300 sec: 3537.4). Total num frames: 389120. Throughput: 0: 1019.8. Samples: 96860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:18:06,582][00864] Avg episode reward: [(0, '4.396')] [2024-12-23 04:18:11,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3526.1). Total num frames: 405504. Throughput: 0: 978.9. Samples: 101554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:18:11,577][00864] Avg episode reward: [(0, '4.608')] [2024-12-23 04:18:11,579][03635] Saving new best policy, reward=4.608! [2024-12-23 04:18:12,916][03652] Updated weights for policy 0, policy_version 100 (0.0014) [2024-12-23 04:18:16,575][00864] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 960.6. Samples: 104276. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:18:16,581][00864] Avg episode reward: [(0, '4.480')] [2024-12-23 04:18:21,552][03652] Updated weights for policy 0, policy_version 110 (0.0014) [2024-12-23 04:18:21,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3604.5). Total num frames: 450560. Throughput: 0: 1000.5. Samples: 111486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:18:21,577][00864] Avg episode reward: [(0, '4.343')] [2024-12-23 04:18:26,575][00864] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3591.9). Total num frames: 466944. Throughput: 0: 1002.2. Samples: 117102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:18:26,577][00864] Avg episode reward: [(0, '4.439')] [2024-12-23 04:18:31,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 972.4. Samples: 119332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:18:31,577][00864] Avg episode reward: [(0, '4.487')] [2024-12-23 04:18:32,785][03652] Updated weights for policy 0, policy_version 120 (0.0040) [2024-12-23 04:18:36,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3627.9). Total num frames: 507904. Throughput: 0: 984.7. Samples: 126208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:18:36,577][00864] Avg episode reward: [(0, '4.379')] [2024-12-23 04:18:41,575][00864] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3644.0). Total num frames: 528384. Throughput: 0: 1036.4. Samples: 132872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:18:41,577][00864] Avg episode reward: [(0, '4.331')] [2024-12-23 04:18:42,229][03652] Updated weights for policy 0, policy_version 130 (0.0016) [2024-12-23 04:18:46,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3631.8). Total num frames: 544768. Throughput: 0: 1004.2. Samples: 135026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:18:46,579][00864] Avg episode reward: [(0, '4.342')] [2024-12-23 04:18:51,575][00864] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3646.8). Total num frames: 565248. Throughput: 0: 977.8. Samples: 140858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:18:51,577][00864] Avg episode reward: [(0, '4.512')] [2024-12-23 04:18:52,692][03652] Updated weights for policy 0, policy_version 140 (0.0022) [2024-12-23 04:18:56,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3686.4). Total num frames: 589824. Throughput: 0: 1029.5. Samples: 147882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:18:56,578][00864] Avg episode reward: [(0, '4.607')] [2024-12-23 04:19:01,575][00864] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3674.0). Total num frames: 606208. Throughput: 0: 1028.7. Samples: 150568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:19:01,581][00864] Avg episode reward: [(0, '4.669')] [2024-12-23 04:19:01,583][03635] Saving new best policy, reward=4.669! [2024-12-23 04:19:04,224][03652] Updated weights for policy 0, policy_version 150 (0.0021) [2024-12-23 04:19:06,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3686.4). Total num frames: 626688. Throughput: 0: 971.4. Samples: 155200. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:19:06,579][00864] Avg episode reward: [(0, '4.575')] [2024-12-23 04:19:11,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3698.1). Total num frames: 647168. Throughput: 0: 1009.3. Samples: 162522. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:19:11,579][00864] Avg episode reward: [(0, '4.492')] [2024-12-23 04:19:12,539][03652] Updated weights for policy 0, policy_version 160 (0.0040) [2024-12-23 04:19:16,576][00864] Fps is (10 sec: 4095.5, 60 sec: 4027.7, 300 sec: 3709.1). Total num frames: 667648. Throughput: 0: 1040.2. Samples: 166142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:19:16,580][00864] Avg episode reward: [(0, '4.490')] [2024-12-23 04:19:21,575][00864] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3697.5). Total num frames: 684032. Throughput: 0: 985.6. Samples: 170560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:19:21,583][00864] Avg episode reward: [(0, '4.558')] [2024-12-23 04:19:23,950][03652] Updated weights for policy 0, policy_version 170 (0.0028) [2024-12-23 04:19:26,575][00864] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 3708.0). Total num frames: 704512. Throughput: 0: 983.3. Samples: 177118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:19:26,577][00864] Avg episode reward: [(0, '4.731')] [2024-12-23 04:19:26,633][03635] Saving new best policy, reward=4.731! [2024-12-23 04:19:31,575][00864] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3738.9). Total num frames: 729088. Throughput: 0: 1014.2. Samples: 180666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:19:31,583][00864] Avg episode reward: [(0, '4.503')] [2024-12-23 04:19:33,389][03652] Updated weights for policy 0, policy_version 180 (0.0018) [2024-12-23 04:19:36,575][00864] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3727.4). Total num frames: 745472. Throughput: 0: 1004.1. Samples: 186042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:19:36,577][00864] Avg episode reward: [(0, '4.338')] [2024-12-23 04:19:41,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3736.4). Total num frames: 765952. Throughput: 0: 975.8. Samples: 191792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:19:41,576][00864] Avg episode reward: [(0, '4.358')] [2024-12-23 04:19:43,979][03652] Updated weights for policy 0, policy_version 190 (0.0022) [2024-12-23 04:19:46,575][00864] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3764.4). Total num frames: 790528. Throughput: 0: 995.9. Samples: 195384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:19:46,578][00864] Avg episode reward: [(0, '4.723')] [2024-12-23 04:19:46,584][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000193_790528.pth... [2024-12-23 04:19:51,585][00864] Fps is (10 sec: 4092.0, 60 sec: 4027.1, 300 sec: 3752.9). Total num frames: 806912. Throughput: 0: 1031.9. Samples: 201644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:19:51,588][00864] Avg episode reward: [(0, '4.840')] [2024-12-23 04:19:51,592][03635] Saving new best policy, reward=4.840! [2024-12-23 04:19:55,313][03652] Updated weights for policy 0, policy_version 200 (0.0017) [2024-12-23 04:19:56,575][00864] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3742.3). Total num frames: 823296. Throughput: 0: 970.5. Samples: 206196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:19:56,581][00864] Avg episode reward: [(0, '4.732')] [2024-12-23 04:20:01,575][00864] Fps is (10 sec: 4100.0, 60 sec: 4027.7, 300 sec: 3768.3). Total num frames: 847872. Throughput: 0: 967.7. Samples: 209688. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-23 04:20:01,577][00864] Avg episode reward: [(0, '4.468')] [2024-12-23 04:20:04,096][03652] Updated weights for policy 0, policy_version 210 (0.0019) [2024-12-23 04:20:06,575][00864] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3775.4). Total num frames: 868352. Throughput: 0: 1028.3. Samples: 216832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:20:06,580][00864] Avg episode reward: [(0, '4.488')] [2024-12-23 04:20:11,582][00864] Fps is (10 sec: 3683.7, 60 sec: 3959.0, 300 sec: 3764.7). Total num frames: 884736. Throughput: 0: 983.9. Samples: 221400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:20:11,586][00864] Avg episode reward: [(0, '4.425')] [2024-12-23 04:20:15,327][03652] Updated weights for policy 0, policy_version 220 (0.0030) [2024-12-23 04:20:16,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3771.7). Total num frames: 905216. Throughput: 0: 969.6. Samples: 224300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:20:16,577][00864] Avg episode reward: [(0, '4.481')] [2024-12-23 04:20:21,575][00864] Fps is (10 sec: 4508.9, 60 sec: 4096.0, 300 sec: 3795.1). Total num frames: 929792. Throughput: 0: 1008.9. Samples: 231442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:20:21,582][00864] Avg episode reward: [(0, '4.458')] [2024-12-23 04:20:25,133][03652] Updated weights for policy 0, policy_version 230 (0.0045) [2024-12-23 04:20:26,575][00864] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3768.3). Total num frames: 942080. Throughput: 0: 1000.2. Samples: 236800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:20:26,581][00864] Avg episode reward: [(0, '4.475')] [2024-12-23 04:20:31,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3774.7). Total num frames: 962560. Throughput: 0: 967.2. Samples: 238906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-23 04:20:31,579][00864] Avg episode reward: [(0, '4.495')] [2024-12-23 04:20:35,733][03652] Updated weights for policy 0, policy_version 240 (0.0036) [2024-12-23 04:20:36,575][00864] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3796.7). Total num frames: 987136. Throughput: 0: 980.0. Samples: 245736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:20:36,577][00864] Avg episode reward: [(0, '4.493')] [2024-12-23 04:20:41,575][00864] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3802.3). Total num frames: 1007616. Throughput: 0: 1023.1. Samples: 252234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:20:41,580][00864] Avg episode reward: [(0, '4.329')] [2024-12-23 04:20:46,576][00864] Fps is (10 sec: 3276.4, 60 sec: 3822.9, 300 sec: 3777.4). Total num frames: 1019904. Throughput: 0: 992.0. Samples: 254328. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:20:46,579][00864] Avg episode reward: [(0, '4.445')] [2024-12-23 04:20:46,874][03652] Updated weights for policy 0, policy_version 250 (0.0031) [2024-12-23 04:20:51,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3960.1, 300 sec: 3798.1). Total num frames: 1044480. Throughput: 0: 960.1. Samples: 260038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:20:51,580][00864] Avg episode reward: [(0, '4.413')] [2024-12-23 04:20:55,630][03652] Updated weights for policy 0, policy_version 260 (0.0019) [2024-12-23 04:20:56,575][00864] Fps is (10 sec: 4506.2, 60 sec: 4027.7, 300 sec: 3803.4). Total num frames: 1064960. Throughput: 0: 1015.2. Samples: 267076. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:20:56,581][00864] Avg episode reward: [(0, '4.342')] [2024-12-23 04:21:01,575][00864] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3794.2). Total num frames: 1081344. Throughput: 0: 1006.2. Samples: 269580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:21:01,577][00864] Avg episode reward: [(0, '4.360')] [2024-12-23 04:21:06,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3799.4). Total num frames: 1101824. Throughput: 0: 953.5. Samples: 274348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:21:06,579][00864] Avg episode reward: [(0, '4.529')] [2024-12-23 04:21:07,218][03652] Updated weights for policy 0, policy_version 270 (0.0016) [2024-12-23 04:21:11,575][00864] Fps is (10 sec: 4505.9, 60 sec: 4028.2, 300 sec: 3818.3). Total num frames: 1126400. Throughput: 0: 995.7. Samples: 281604. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:21:11,581][00864] Avg episode reward: [(0, '4.670')] [2024-12-23 04:21:16,575][00864] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 1029.6. Samples: 285236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:21:16,579][00864] Avg episode reward: [(0, '4.667')] [2024-12-23 04:21:16,723][03652] Updated weights for policy 0, policy_version 280 (0.0023) [2024-12-23 04:21:21,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1159168. Throughput: 0: 975.2. Samples: 289622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:21:21,580][00864] Avg episode reward: [(0, '4.626')] [2024-12-23 04:21:26,575][00864] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1183744. Throughput: 0: 974.4. Samples: 296082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:21:26,581][00864] Avg episode reward: [(0, '4.606')] [2024-12-23 04:21:27,336][03652] Updated weights for policy 0, policy_version 290 (0.0021) [2024-12-23 04:21:31,577][00864] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 3984.9). Total num frames: 1204224. Throughput: 0: 1004.7. Samples: 299538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:21:31,581][00864] Avg episode reward: [(0, '4.560')] [2024-12-23 04:21:36,575][00864] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1220608. Throughput: 0: 998.7. Samples: 304978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:21:36,577][00864] Avg episode reward: [(0, '4.527')] [2024-12-23 04:21:38,802][03652] Updated weights for policy 0, policy_version 300 (0.0032) [2024-12-23 04:21:41,575][00864] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 1241088. Throughput: 0: 967.1. Samples: 310594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:21:41,577][00864] Avg episode reward: [(0, '4.476')] [2024-12-23 04:21:46,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3984.9). Total num frames: 1265664. Throughput: 0: 992.0. Samples: 314218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:21:46,580][00864] Avg episode reward: [(0, '4.654')] [2024-12-23 04:21:46,591][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth... 
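The paired `Saving .../checkpoint_...pth` and `Removing .../checkpoint_...pth` entries show the learner keeping only the most recent checkpoints, each named by policy version and total env frames. A minimal sketch of that rotation pattern under those naming assumptions (illustrative only, not Sample Factory's actual implementation):

```python
import os
from glob import glob

import torch


def save_rotating_checkpoint(model, optimizer, policy_version, env_steps, ckpt_dir, keep_last=2):
    """Save checkpoint_<version>_<frames>.pth and prune older files (illustrative sketch)."""
    os.makedirs(ckpt_dir, exist_ok=True)
    name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "policy_version": policy_version,
            "env_steps": env_steps,
        },
        os.path.join(ckpt_dir, name),
    )
    # Zero-padded version numbers sort lexicographically, so the oldest files come first.
    checkpoints = sorted(glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in checkpoints[:-keep_last]:
        os.remove(old)  # e.g. checkpoint_000000076_311296.pth once two newer ones exist
```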
[2024-12-23 04:21:46,726][03635] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth [2024-12-23 04:21:47,289][03652] Updated weights for policy 0, policy_version 310 (0.0028) [2024-12-23 04:21:51,575][00864] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1282048. Throughput: 0: 1032.0. Samples: 320786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:21:51,580][00864] Avg episode reward: [(0, '4.680')] [2024-12-23 04:21:56,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1298432. Throughput: 0: 970.4. Samples: 325272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:21:56,581][00864] Avg episode reward: [(0, '4.707')] [2024-12-23 04:21:58,752][03652] Updated weights for policy 0, policy_version 320 (0.0039) [2024-12-23 04:22:01,577][00864] Fps is (10 sec: 4095.4, 60 sec: 4027.6, 300 sec: 3984.9). Total num frames: 1323008. Throughput: 0: 967.0. Samples: 328754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:22:01,582][00864] Avg episode reward: [(0, '4.504')] [2024-12-23 04:22:06,575][00864] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1343488. Throughput: 0: 1029.2. Samples: 335936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:22:06,577][00864] Avg episode reward: [(0, '4.522')] [2024-12-23 04:22:08,168][03652] Updated weights for policy 0, policy_version 330 (0.0019) [2024-12-23 04:22:11,575][00864] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1359872. Throughput: 0: 991.1. Samples: 340682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:22:11,581][00864] Avg episode reward: [(0, '4.459')] [2024-12-23 04:22:16,575][00864] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1380352. Throughput: 0: 976.9. Samples: 343498. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:22:16,582][00864] Avg episode reward: [(0, '4.398')] [2024-12-23 04:22:18,446][03652] Updated weights for policy 0, policy_version 340 (0.0027) [2024-12-23 04:22:21,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 1404928. Throughput: 0: 1017.4. Samples: 350762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:22:21,582][00864] Avg episode reward: [(0, '4.569')] [2024-12-23 04:22:26,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3985.0). Total num frames: 1425408. Throughput: 0: 1020.8. Samples: 356530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:22:26,576][00864] Avg episode reward: [(0, '4.651')] [2024-12-23 04:22:29,304][03652] Updated weights for policy 0, policy_version 350 (0.0022) [2024-12-23 04:22:31,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3984.9). Total num frames: 1441792. Throughput: 0: 989.1. Samples: 358728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:22:31,581][00864] Avg episode reward: [(0, '4.475')] [2024-12-23 04:22:36,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1466368. Throughput: 0: 995.9. Samples: 365600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:22:36,577][00864] Avg episode reward: [(0, '4.394')] [2024-12-23 04:22:38,042][03652] Updated weights for policy 0, policy_version 360 (0.0039) [2024-12-23 04:22:41,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 1486848. Throughput: 0: 1047.4. Samples: 372404. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:22:41,580][00864] Avg episode reward: [(0, '4.418')] [2024-12-23 04:22:46,575][00864] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1503232. Throughput: 0: 1019.1. Samples: 374610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:22:46,577][00864] Avg episode reward: [(0, '4.523')] [2024-12-23 04:22:49,110][03652] Updated weights for policy 0, policy_version 370 (0.0013) [2024-12-23 04:22:51,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1523712. Throughput: 0: 989.6. Samples: 380468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:22:51,579][00864] Avg episode reward: [(0, '4.570')] [2024-12-23 04:22:56,577][00864] Fps is (10 sec: 4504.8, 60 sec: 4164.1, 300 sec: 3998.8). Total num frames: 1548288. Throughput: 0: 1045.0. Samples: 387710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:22:56,581][00864] Avg episode reward: [(0, '4.587')] [2024-12-23 04:22:57,878][03652] Updated weights for policy 0, policy_version 380 (0.0023) [2024-12-23 04:23:01,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 1564672. Throughput: 0: 1043.8. Samples: 390468. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:23:01,580][00864] Avg episode reward: [(0, '4.416')] [2024-12-23 04:23:06,575][00864] Fps is (10 sec: 3687.2, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1585152. Throughput: 0: 987.1. Samples: 395182. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:23:06,581][00864] Avg episode reward: [(0, '4.270')] [2024-12-23 04:23:08,995][03652] Updated weights for policy 0, policy_version 390 (0.0018) [2024-12-23 04:23:11,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 1609728. Throughput: 0: 1022.3. Samples: 402534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:23:11,578][00864] Avg episode reward: [(0, '4.330')] [2024-12-23 04:23:16,575][00864] Fps is (10 sec: 4505.5, 60 sec: 4164.2, 300 sec: 3998.8). Total num frames: 1630208. Throughput: 0: 1054.4. Samples: 406176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:23:16,577][00864] Avg episode reward: [(0, '4.620')] [2024-12-23 04:23:19,195][03652] Updated weights for policy 0, policy_version 400 (0.0022) [2024-12-23 04:23:21,575][00864] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1642496. Throughput: 0: 1004.0. Samples: 410780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:23:21,577][00864] Avg episode reward: [(0, '4.557')] [2024-12-23 04:23:26,575][00864] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1667072. Throughput: 0: 1004.0. Samples: 417586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:23:26,582][00864] Avg episode reward: [(0, '4.562')] [2024-12-23 04:23:28,652][03652] Updated weights for policy 0, policy_version 410 (0.0025) [2024-12-23 04:23:31,575][00864] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 1691648. Throughput: 0: 1033.3. Samples: 421110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:23:31,579][00864] Avg episode reward: [(0, '4.886')] [2024-12-23 04:23:31,584][03635] Saving new best policy, reward=4.886! [2024-12-23 04:23:36,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1708032. Throughput: 0: 1021.7. Samples: 426444. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:23:36,582][00864] Avg episode reward: [(0, '4.823')] [2024-12-23 04:23:39,645][03652] Updated weights for policy 0, policy_version 420 (0.0023) [2024-12-23 04:23:41,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1728512. Throughput: 0: 990.4. Samples: 432274. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:23:41,577][00864] Avg episode reward: [(0, '4.661')] [2024-12-23 04:23:46,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 1753088. Throughput: 0: 1010.2. Samples: 435926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:23:46,581][00864] Avg episode reward: [(0, '4.801')] [2024-12-23 04:23:46,592][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000428_1753088.pth... [2024-12-23 04:23:46,712][03635] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000193_790528.pth [2024-12-23 04:23:48,150][03652] Updated weights for policy 0, policy_version 430 (0.0030) [2024-12-23 04:23:51,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1769472. Throughput: 0: 1049.3. Samples: 442400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:23:51,580][00864] Avg episode reward: [(0, '4.977')] [2024-12-23 04:23:51,587][03635] Saving new best policy, reward=4.977! [2024-12-23 04:23:56,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 3998.8). Total num frames: 1785856. Throughput: 0: 991.1. Samples: 447132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:23:56,576][00864] Avg episode reward: [(0, '4.867')] [2024-12-23 04:23:59,504][03652] Updated weights for policy 0, policy_version 440 (0.0030) [2024-12-23 04:24:01,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1810432. Throughput: 0: 989.4. Samples: 450700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:24:01,577][00864] Avg episode reward: [(0, '4.888')] [2024-12-23 04:24:06,578][00864] Fps is (10 sec: 4504.1, 60 sec: 4095.8, 300 sec: 4012.6). Total num frames: 1830912. Throughput: 0: 1047.3. Samples: 457912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:24:06,580][00864] Avg episode reward: [(0, '4.788')] [2024-12-23 04:24:09,671][03652] Updated weights for policy 0, policy_version 450 (0.0025) [2024-12-23 04:24:11,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1847296. Throughput: 0: 998.0. Samples: 462498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:24:11,580][00864] Avg episode reward: [(0, '4.625')] [2024-12-23 04:24:16,575][00864] Fps is (10 sec: 4097.3, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 1871872. Throughput: 0: 989.2. Samples: 465624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:24:16,577][00864] Avg episode reward: [(0, '5.028')] [2024-12-23 04:24:16,589][03635] Saving new best policy, reward=5.028! [2024-12-23 04:24:19,066][03652] Updated weights for policy 0, policy_version 460 (0.0020) [2024-12-23 04:24:21,575][00864] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4040.5). Total num frames: 1896448. Throughput: 0: 1033.9. Samples: 472970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:24:21,576][00864] Avg episode reward: [(0, '5.248')] [2024-12-23 04:24:21,587][03635] Saving new best policy, reward=5.248! 
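Alongside the rotating checkpoints, the learner writes a separate snapshot whenever the average episode reward reaches a new maximum ("Saving new best policy, reward=5.248!" above). A hedged sketch of that bookkeeping; the class name and file naming below are hypothetical, not taken from Sample Factory:

```python
import os

import torch


class BestPolicySaver:
    """Keep a separate 'best' snapshot whenever the tracked reward improves (illustrative)."""

    def __init__(self, ckpt_dir):
        self.ckpt_dir = ckpt_dir
        self.best_reward = float("-inf")

    def maybe_save(self, model, avg_episode_reward, policy_version, env_steps):
        if avg_episode_reward <= self.best_reward:
            return False
        self.best_reward = avg_episode_reward
        os.makedirs(self.ckpt_dir, exist_ok=True)
        name = f"best_{policy_version:09d}_{env_steps}_reward_{avg_episode_reward:.3f}.pth"
        torch.save(model.state_dict(), os.path.join(self.ckpt_dir, name))
        print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
        return True
```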
[2024-12-23 04:24:26,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1908736. Throughput: 0: 1026.5. Samples: 478466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:24:26,579][00864] Avg episode reward: [(0, '5.134')] [2024-12-23 04:24:30,399][03652] Updated weights for policy 0, policy_version 470 (0.0019) [2024-12-23 04:24:31,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1929216. Throughput: 0: 994.1. Samples: 480660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:24:31,582][00864] Avg episode reward: [(0, '5.523')] [2024-12-23 04:24:31,585][03635] Saving new best policy, reward=5.523! [2024-12-23 04:24:36,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1953792. Throughput: 0: 1007.1. Samples: 487720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:24:36,577][00864] Avg episode reward: [(0, '5.694')] [2024-12-23 04:24:36,589][03635] Saving new best policy, reward=5.694! [2024-12-23 04:24:38,686][03652] Updated weights for policy 0, policy_version 480 (0.0014) [2024-12-23 04:24:41,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1974272. Throughput: 0: 1046.8. Samples: 494238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:24:41,577][00864] Avg episode reward: [(0, '5.580')] [2024-12-23 04:24:46,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.8). Total num frames: 1990656. Throughput: 0: 1017.3. Samples: 496478. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-23 04:24:46,577][00864] Avg episode reward: [(0, '5.732')] [2024-12-23 04:24:46,591][03635] Saving new best policy, reward=5.732! [2024-12-23 04:24:49,671][03652] Updated weights for policy 0, policy_version 490 (0.0044) [2024-12-23 04:24:51,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2015232. Throughput: 0: 992.0. Samples: 502550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:24:51,577][00864] Avg episode reward: [(0, '6.247')] [2024-12-23 04:24:51,583][03635] Saving new best policy, reward=6.247! [2024-12-23 04:24:56,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 2035712. Throughput: 0: 1050.4. Samples: 509768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:24:56,580][00864] Avg episode reward: [(0, '6.449')] [2024-12-23 04:24:56,646][03635] Saving new best policy, reward=6.449! [2024-12-23 04:24:59,536][03652] Updated weights for policy 0, policy_version 500 (0.0034) [2024-12-23 04:25:01,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2052096. Throughput: 0: 1034.1. Samples: 512158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:25:01,585][00864] Avg episode reward: [(0, '6.838')] [2024-12-23 04:25:01,588][03635] Saving new best policy, reward=6.838! [2024-12-23 04:25:06,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4028.0, 300 sec: 4026.7). Total num frames: 2072576. Throughput: 0: 976.8. Samples: 516926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:25:06,582][00864] Avg episode reward: [(0, '7.129')] [2024-12-23 04:25:06,592][03635] Saving new best policy, reward=7.129! [2024-12-23 04:25:09,953][03652] Updated weights for policy 0, policy_version 510 (0.0030) [2024-12-23 04:25:11,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2093056. Throughput: 0: 1017.8. Samples: 524268. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:25:11,580][00864] Avg episode reward: [(0, '7.834')] [2024-12-23 04:25:11,635][03635] Saving new best policy, reward=7.834! [2024-12-23 04:25:16,575][00864] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2113536. Throughput: 0: 1048.9. Samples: 527860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:25:16,577][00864] Avg episode reward: [(0, '8.213')] [2024-12-23 04:25:16,650][03635] Saving new best policy, reward=8.213! [2024-12-23 04:25:20,784][03652] Updated weights for policy 0, policy_version 520 (0.0029) [2024-12-23 04:25:21,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2129920. Throughput: 0: 990.4. Samples: 532288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:25:21,582][00864] Avg episode reward: [(0, '7.454')] [2024-12-23 04:25:26,575][00864] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2154496. Throughput: 0: 999.2. Samples: 539204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:25:26,577][00864] Avg episode reward: [(0, '6.599')] [2024-12-23 04:25:29,400][03652] Updated weights for policy 0, policy_version 530 (0.0022) [2024-12-23 04:25:31,578][00864] Fps is (10 sec: 4913.6, 60 sec: 4164.0, 300 sec: 4040.4). Total num frames: 2179072. Throughput: 0: 1028.1. Samples: 542746. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-23 04:25:31,580][00864] Avg episode reward: [(0, '7.196')] [2024-12-23 04:25:36,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2195456. Throughput: 0: 1013.1. Samples: 548138. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-23 04:25:36,581][00864] Avg episode reward: [(0, '7.136')] [2024-12-23 04:25:40,440][03652] Updated weights for policy 0, policy_version 540 (0.0043) [2024-12-23 04:25:41,575][00864] Fps is (10 sec: 3687.6, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 2215936. Throughput: 0: 985.5. Samples: 554116. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-23 04:25:41,581][00864] Avg episode reward: [(0, '7.618')] [2024-12-23 04:25:46,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 2240512. Throughput: 0: 1014.0. Samples: 557790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:25:46,577][00864] Avg episode reward: [(0, '8.008')] [2024-12-23 04:25:46,585][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000547_2240512.pth... [2024-12-23 04:25:46,711][03635] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth [2024-12-23 04:25:49,591][03652] Updated weights for policy 0, policy_version 550 (0.0019) [2024-12-23 04:25:51,578][00864] Fps is (10 sec: 4094.7, 60 sec: 4027.5, 300 sec: 4040.4). Total num frames: 2256896. Throughput: 0: 1049.2. Samples: 564142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:25:51,589][00864] Avg episode reward: [(0, '7.607')] [2024-12-23 04:25:56,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2273280. Throughput: 0: 992.1. Samples: 568912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:25:56,576][00864] Avg episode reward: [(0, '8.317')] [2024-12-23 04:25:56,584][03635] Saving new best policy, reward=8.317! 
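The recurring `Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)` lines report environment throughput averaged over three sliding windows. A small sketch of that kind of windowed FPS meter, shown here as an illustration of the averaging idea rather than the framework's actual reporting code:

```python
import time
from collections import deque


class FpsMeter:
    """Report env frames/sec over several sliding windows, e.g. 10/60/300 seconds."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_env_frames)

    def record(self, total_env_frames, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, total_env_frames))
        # Keep only as much history as the largest window needs.
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self):
        now, frames_now = self.samples[-1]
        out = {}
        for w in self.windows:
            # Oldest sample still inside this window.
            t0, f0 = next((t, f) for t, f in self.samples if now - t <= w)
            out[w] = (frames_now - f0) / max(now - t0, 1e-9)
        return out


meter = FpsMeter()
meter.record(0, now=0.0)
meter.record(4096, now=10.0)
print(meter.fps())  # toy trace: ~409.6 fps in every window, cf. the 10-sec figure of 409.6 above
```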
[2024-12-23 04:26:00,453][03652] Updated weights for policy 0, policy_version 560 (0.0013) [2024-12-23 04:26:01,575][00864] Fps is (10 sec: 4097.3, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2297856. Throughput: 0: 992.2. Samples: 572510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:26:01,577][00864] Avg episode reward: [(0, '8.194')] [2024-12-23 04:26:06,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2318336. Throughput: 0: 1050.6. Samples: 579566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:26:06,581][00864] Avg episode reward: [(0, '8.714')] [2024-12-23 04:26:06,594][03635] Saving new best policy, reward=8.714! [2024-12-23 04:26:11,569][03652] Updated weights for policy 0, policy_version 570 (0.0030) [2024-12-23 04:26:11,576][00864] Fps is (10 sec: 3276.4, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 2330624. Throughput: 0: 994.0. Samples: 583934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:26:11,578][00864] Avg episode reward: [(0, '8.692')] [2024-12-23 04:26:16,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 2355200. Throughput: 0: 984.2. Samples: 587032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:26:16,581][00864] Avg episode reward: [(0, '8.859')] [2024-12-23 04:26:16,590][03635] Saving new best policy, reward=8.859! [2024-12-23 04:26:20,192][03652] Updated weights for policy 0, policy_version 580 (0.0025) [2024-12-23 04:26:21,575][00864] Fps is (10 sec: 4915.9, 60 sec: 4164.3, 300 sec: 4054.4). Total num frames: 2379776. Throughput: 0: 1026.4. Samples: 594324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:26:21,577][00864] Avg episode reward: [(0, '9.078')] [2024-12-23 04:26:21,579][03635] Saving new best policy, reward=9.078! [2024-12-23 04:26:26,576][00864] Fps is (10 sec: 4095.5, 60 sec: 4027.6, 300 sec: 4040.5). Total num frames: 2396160. Throughput: 0: 1013.3. Samples: 599718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:26:26,579][00864] Avg episode reward: [(0, '9.594')] [2024-12-23 04:26:26,589][03635] Saving new best policy, reward=9.594! [2024-12-23 04:26:31,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 4040.5). Total num frames: 2412544. Throughput: 0: 980.1. Samples: 601894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:26:31,579][00864] Avg episode reward: [(0, '9.992')] [2024-12-23 04:26:31,663][03652] Updated weights for policy 0, policy_version 590 (0.0032) [2024-12-23 04:26:31,666][03635] Saving new best policy, reward=9.992! [2024-12-23 04:26:36,575][00864] Fps is (10 sec: 4096.5, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2437120. Throughput: 0: 995.4. Samples: 608932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:26:36,581][00864] Avg episode reward: [(0, '10.718')] [2024-12-23 04:26:36,678][03635] Saving new best policy, reward=10.718! [2024-12-23 04:26:40,410][03652] Updated weights for policy 0, policy_version 600 (0.0021) [2024-12-23 04:26:41,579][00864] Fps is (10 sec: 4503.6, 60 sec: 4027.4, 300 sec: 4040.4). Total num frames: 2457600. Throughput: 0: 1031.1. Samples: 615314. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:26:41,581][00864] Avg episode reward: [(0, '11.391')] [2024-12-23 04:26:41,584][03635] Saving new best policy, reward=11.391! [2024-12-23 04:26:46,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4040.5). Total num frames: 2473984. Throughput: 0: 998.5. 
Samples: 617444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:26:46,577][00864] Avg episode reward: [(0, '11.243')] [2024-12-23 04:26:51,123][03652] Updated weights for policy 0, policy_version 610 (0.0021) [2024-12-23 04:26:51,575][00864] Fps is (10 sec: 4097.8, 60 sec: 4027.9, 300 sec: 4068.2). Total num frames: 2498560. Throughput: 0: 984.1. Samples: 623850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:26:51,577][00864] Avg episode reward: [(0, '11.182')] [2024-12-23 04:26:56,579][00864] Fps is (10 sec: 4913.1, 60 sec: 4164.0, 300 sec: 4068.2). Total num frames: 2523136. Throughput: 0: 1047.4. Samples: 631070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:26:56,581][00864] Avg episode reward: [(0, '12.482')] [2024-12-23 04:26:56,592][03635] Saving new best policy, reward=12.482! [2024-12-23 04:27:01,580][00864] Fps is (10 sec: 3684.5, 60 sec: 3959.1, 300 sec: 4040.4). Total num frames: 2535424. Throughput: 0: 1030.9. Samples: 633428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:27:01,584][00864] Avg episode reward: [(0, '13.275')] [2024-12-23 04:27:01,589][03635] Saving new best policy, reward=13.275! [2024-12-23 04:27:01,928][03652] Updated weights for policy 0, policy_version 620 (0.0033) [2024-12-23 04:27:06,575][00864] Fps is (10 sec: 3278.2, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2555904. Throughput: 0: 978.0. Samples: 638336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:27:06,577][00864] Avg episode reward: [(0, '13.685')] [2024-12-23 04:27:06,586][03635] Saving new best policy, reward=13.685! [2024-12-23 04:27:11,049][03652] Updated weights for policy 0, policy_version 630 (0.0020) [2024-12-23 04:27:11,575][00864] Fps is (10 sec: 4508.0, 60 sec: 4164.4, 300 sec: 4068.2). Total num frames: 2580480. Throughput: 0: 1021.0. Samples: 645660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:27:11,577][00864] Avg episode reward: [(0, '13.915')] [2024-12-23 04:27:11,579][03635] Saving new best policy, reward=13.915! [2024-12-23 04:27:16,576][00864] Fps is (10 sec: 4505.0, 60 sec: 4095.9, 300 sec: 4054.3). Total num frames: 2600960. Throughput: 0: 1049.0. Samples: 649102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:27:16,579][00864] Avg episode reward: [(0, '12.143')] [2024-12-23 04:27:21,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2617344. Throughput: 0: 993.8. Samples: 653654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:27:21,582][00864] Avg episode reward: [(0, '12.689')] [2024-12-23 04:27:22,111][03652] Updated weights for policy 0, policy_version 640 (0.0013) [2024-12-23 04:27:26,575][00864] Fps is (10 sec: 4096.5, 60 sec: 4096.1, 300 sec: 4068.2). Total num frames: 2641920. Throughput: 0: 1007.3. Samples: 660640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:27:26,577][00864] Avg episode reward: [(0, '13.668')] [2024-12-23 04:27:30,746][03652] Updated weights for policy 0, policy_version 650 (0.0019) [2024-12-23 04:27:31,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 2662400. Throughput: 0: 1040.9. Samples: 664284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:27:31,578][00864] Avg episode reward: [(0, '14.592')] [2024-12-23 04:27:31,580][03635] Saving new best policy, reward=14.592! [2024-12-23 04:27:36,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2678784. Throughput: 0: 1009.6. 
Samples: 669282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:27:36,579][00864] Avg episode reward: [(0, '15.367')] [2024-12-23 04:27:36,592][03635] Saving new best policy, reward=15.367! [2024-12-23 04:27:41,575][00864] Fps is (10 sec: 3686.3, 60 sec: 4028.0, 300 sec: 4054.3). Total num frames: 2699264. Throughput: 0: 984.0. Samples: 675344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:27:41,577][00864] Avg episode reward: [(0, '15.815')] [2024-12-23 04:27:41,584][03635] Saving new best policy, reward=15.815! [2024-12-23 04:27:41,940][03652] Updated weights for policy 0, policy_version 660 (0.0060) [2024-12-23 04:27:46,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 2723840. Throughput: 0: 1012.5. Samples: 678984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:27:46,577][00864] Avg episode reward: [(0, '16.001')] [2024-12-23 04:27:46,588][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000665_2723840.pth... [2024-12-23 04:27:46,722][03635] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000428_1753088.pth [2024-12-23 04:27:46,733][03635] Saving new best policy, reward=16.001! [2024-12-23 04:27:51,575][00864] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2740224. Throughput: 0: 1036.9. Samples: 684996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:27:51,579][00864] Avg episode reward: [(0, '16.229')] [2024-12-23 04:27:51,582][03635] Saving new best policy, reward=16.229! [2024-12-23 04:27:52,499][03652] Updated weights for policy 0, policy_version 670 (0.0027) [2024-12-23 04:27:56,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3891.5, 300 sec: 4040.5). Total num frames: 2756608. Throughput: 0: 984.3. Samples: 689952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:27:56,577][00864] Avg episode reward: [(0, '18.188')] [2024-12-23 04:27:56,586][03635] Saving new best policy, reward=18.188! [2024-12-23 04:28:01,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4096.4, 300 sec: 4054.3). Total num frames: 2781184. Throughput: 0: 988.3. Samples: 693576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:28:01,579][00864] Avg episode reward: [(0, '17.224')] [2024-12-23 04:28:01,973][03652] Updated weights for policy 0, policy_version 680 (0.0015) [2024-12-23 04:28:06,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2801664. Throughput: 0: 1041.8. Samples: 700534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:28:06,577][00864] Avg episode reward: [(0, '17.732')] [2024-12-23 04:28:11,576][00864] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 2818048. Throughput: 0: 985.1. Samples: 704972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:28:11,578][00864] Avg episode reward: [(0, '18.207')] [2024-12-23 04:28:11,586][03635] Saving new best policy, reward=18.207! [2024-12-23 04:28:13,039][03652] Updated weights for policy 0, policy_version 690 (0.0019) [2024-12-23 04:28:16,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 2842624. Throughput: 0: 977.2. Samples: 708258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:28:16,577][00864] Avg episode reward: [(0, '16.856')] [2024-12-23 04:28:21,455][03652] Updated weights for policy 0, policy_version 700 (0.0021) [2024-12-23 04:28:21,575][00864] Fps is (10 sec: 4915.9, 60 sec: 4164.3, 300 sec: 4068.2). 
Total num frames: 2867200. Throughput: 0: 1032.2. Samples: 715730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:28:21,580][00864] Avg episode reward: [(0, '17.941')] [2024-12-23 04:28:26,577][00864] Fps is (10 sec: 3685.6, 60 sec: 3959.3, 300 sec: 4026.5). Total num frames: 2879488. Throughput: 0: 1016.0. Samples: 721066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:28:26,581][00864] Avg episode reward: [(0, '17.589')] [2024-12-23 04:28:31,575][00864] Fps is (10 sec: 3276.7, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2899968. Throughput: 0: 987.8. Samples: 723436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:28:31,578][00864] Avg episode reward: [(0, '17.737')] [2024-12-23 04:28:32,590][03652] Updated weights for policy 0, policy_version 710 (0.0028) [2024-12-23 04:28:36,575][00864] Fps is (10 sec: 4506.5, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2924544. Throughput: 0: 1015.6. Samples: 730700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:28:36,577][00864] Avg episode reward: [(0, '18.237')] [2024-12-23 04:28:36,587][03635] Saving new best policy, reward=18.237! [2024-12-23 04:28:41,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2945024. Throughput: 0: 1041.6. Samples: 736826. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:28:41,581][00864] Avg episode reward: [(0, '18.607')] [2024-12-23 04:28:41,585][03635] Saving new best policy, reward=18.607! [2024-12-23 04:28:42,420][03652] Updated weights for policy 0, policy_version 720 (0.0028) [2024-12-23 04:28:46,575][00864] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2961408. Throughput: 0: 1009.1. Samples: 738986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:28:46,581][00864] Avg episode reward: [(0, '19.101')] [2024-12-23 04:28:46,591][03635] Saving new best policy, reward=19.101! [2024-12-23 04:28:51,575][00864] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2985984. Throughput: 0: 1001.2. Samples: 745586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:28:51,579][00864] Avg episode reward: [(0, '18.289')] [2024-12-23 04:28:52,088][03652] Updated weights for policy 0, policy_version 730 (0.0031) [2024-12-23 04:28:56,576][00864] Fps is (10 sec: 4505.0, 60 sec: 4164.2, 300 sec: 4054.3). Total num frames: 3006464. Throughput: 0: 1061.6. Samples: 752742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:28:56,587][00864] Avg episode reward: [(0, '16.845')] [2024-12-23 04:29:01,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3022848. Throughput: 0: 1036.1. Samples: 754882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:29:01,577][00864] Avg episode reward: [(0, '16.347')] [2024-12-23 04:29:03,396][03652] Updated weights for policy 0, policy_version 740 (0.0014) [2024-12-23 04:29:06,575][00864] Fps is (10 sec: 3686.9, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3043328. Throughput: 0: 991.2. Samples: 760336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:29:06,580][00864] Avg episode reward: [(0, '16.714')] [2024-12-23 04:29:11,575][00864] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3067904. Throughput: 0: 1036.6. Samples: 767710. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:29:11,577][00864] Avg episode reward: [(0, '17.323')] [2024-12-23 04:29:11,798][03652] Updated weights for policy 0, policy_version 750 (0.0023) [2024-12-23 04:29:16,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3084288. Throughput: 0: 1051.0. Samples: 770732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:29:16,579][00864] Avg episode reward: [(0, '18.614')] [2024-12-23 04:29:21,575][00864] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3104768. Throughput: 0: 994.7. Samples: 775460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:29:21,577][00864] Avg episode reward: [(0, '20.057')] [2024-12-23 04:29:21,580][03635] Saving new best policy, reward=20.057! [2024-12-23 04:29:22,885][03652] Updated weights for policy 0, policy_version 760 (0.0038) [2024-12-23 04:29:26,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 4068.2). Total num frames: 3129344. Throughput: 0: 1021.3. Samples: 782784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:29:26,584][00864] Avg episode reward: [(0, '20.363')] [2024-12-23 04:29:26,593][03635] Saving new best policy, reward=20.363! [2024-12-23 04:29:31,575][00864] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3149824. Throughput: 0: 1052.5. Samples: 786350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:29:31,582][00864] Avg episode reward: [(0, '20.945')] [2024-12-23 04:29:31,584][03635] Saving new best policy, reward=20.945! [2024-12-23 04:29:32,573][03652] Updated weights for policy 0, policy_version 770 (0.0013) [2024-12-23 04:29:36,575][00864] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3166208. Throughput: 0: 1008.3. Samples: 790962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:29:36,578][00864] Avg episode reward: [(0, '22.109')] [2024-12-23 04:29:36,590][03635] Saving new best policy, reward=22.109! [2024-12-23 04:29:41,575][00864] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3186688. Throughput: 0: 992.4. Samples: 797400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:29:41,581][00864] Avg episode reward: [(0, '21.409')] [2024-12-23 04:29:42,606][03652] Updated weights for policy 0, policy_version 780 (0.0030) [2024-12-23 04:29:46,575][00864] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3211264. Throughput: 0: 1027.6. Samples: 801124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:29:46,582][00864] Avg episode reward: [(0, '21.498')] [2024-12-23 04:29:46,602][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000784_3211264.pth... [2024-12-23 04:29:46,750][03635] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000547_2240512.pth [2024-12-23 04:29:51,578][00864] Fps is (10 sec: 4094.9, 60 sec: 4027.5, 300 sec: 4040.4). Total num frames: 3227648. Throughput: 0: 1032.8. Samples: 806814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:29:51,580][00864] Avg episode reward: [(0, '21.319')] [2024-12-23 04:29:53,357][03652] Updated weights for policy 0, policy_version 790 (0.0036) [2024-12-23 04:29:56,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 3248128. Throughput: 0: 991.7. Samples: 812336. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:29:56,577][00864] Avg episode reward: [(0, '20.944')] [2024-12-23 04:30:01,575][00864] Fps is (10 sec: 4506.8, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3272704. Throughput: 0: 1004.0. Samples: 815914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:30:01,578][00864] Avg episode reward: [(0, '21.339')] [2024-12-23 04:30:02,315][03652] Updated weights for policy 0, policy_version 800 (0.0026) [2024-12-23 04:30:06,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3289088. Throughput: 0: 1045.7. Samples: 822518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:30:06,578][00864] Avg episode reward: [(0, '19.662')] [2024-12-23 04:30:11,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3305472. Throughput: 0: 981.5. Samples: 826950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:30:11,581][00864] Avg episode reward: [(0, '19.657')] [2024-12-23 04:30:13,546][03652] Updated weights for policy 0, policy_version 810 (0.0020) [2024-12-23 04:30:16,575][00864] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3330048. Throughput: 0: 985.5. Samples: 830696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:30:16,577][00864] Avg episode reward: [(0, '19.570')] [2024-12-23 04:30:21,576][00864] Fps is (10 sec: 4914.5, 60 sec: 4164.2, 300 sec: 4068.2). Total num frames: 3354624. Throughput: 0: 1047.9. Samples: 838120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:30:21,578][00864] Avg episode reward: [(0, '20.783')] [2024-12-23 04:30:22,244][03652] Updated weights for policy 0, policy_version 820 (0.0022) [2024-12-23 04:30:26,575][00864] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3371008. Throughput: 0: 1013.9. Samples: 843024. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:30:26,577][00864] Avg episode reward: [(0, '20.596')] [2024-12-23 04:30:31,575][00864] Fps is (10 sec: 3686.9, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 3391488. Throughput: 0: 993.1. Samples: 845812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:30:31,577][00864] Avg episode reward: [(0, '21.759')] [2024-12-23 04:30:33,008][03652] Updated weights for policy 0, policy_version 830 (0.0035) [2024-12-23 04:30:36,575][00864] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3416064. Throughput: 0: 1026.6. Samples: 853008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:30:36,577][00864] Avg episode reward: [(0, '23.915')] [2024-12-23 04:30:36,595][03635] Saving new best policy, reward=23.915! [2024-12-23 04:30:41,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3432448. Throughput: 0: 1031.2. Samples: 858738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:30:41,578][00864] Avg episode reward: [(0, '24.565')] [2024-12-23 04:30:41,581][03635] Saving new best policy, reward=24.565! [2024-12-23 04:30:43,856][03652] Updated weights for policy 0, policy_version 840 (0.0024) [2024-12-23 04:30:46,575][00864] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3448832. Throughput: 0: 998.7. Samples: 860854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:30:46,577][00864] Avg episode reward: [(0, '23.711')] [2024-12-23 04:30:51,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4068.2). 
Total num frames: 3473408. Throughput: 0: 1006.9. Samples: 867828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:30:51,577][00864] Avg episode reward: [(0, '23.232')] [2024-12-23 04:30:52,671][03652] Updated weights for policy 0, policy_version 850 (0.0014) [2024-12-23 04:30:56,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3493888. Throughput: 0: 1058.9. Samples: 874600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:30:56,577][00864] Avg episode reward: [(0, '22.412')] [2024-12-23 04:31:01,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3510272. Throughput: 0: 1024.5. Samples: 876798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-23 04:31:01,585][00864] Avg episode reward: [(0, '20.010')] [2024-12-23 04:31:04,270][03652] Updated weights for policy 0, policy_version 860 (0.0016) [2024-12-23 04:31:06,575][00864] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3530752. Throughput: 0: 978.8. Samples: 882166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:31:06,583][00864] Avg episode reward: [(0, '19.576')] [2024-12-23 04:31:11,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3555328. Throughput: 0: 1031.5. Samples: 889442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-23 04:31:11,577][00864] Avg episode reward: [(0, '19.036')] [2024-12-23 04:31:13,021][03652] Updated weights for policy 0, policy_version 870 (0.0020) [2024-12-23 04:31:16,575][00864] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3571712. Throughput: 0: 1034.9. Samples: 892384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:31:16,585][00864] Avg episode reward: [(0, '18.950')] [2024-12-23 04:31:21,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 4054.4). Total num frames: 3592192. Throughput: 0: 980.0. Samples: 897110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:31:21,576][00864] Avg episode reward: [(0, '19.702')] [2024-12-23 04:31:23,850][03652] Updated weights for policy 0, policy_version 880 (0.0027) [2024-12-23 04:31:26,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3616768. Throughput: 0: 1015.5. Samples: 904434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:31:26,583][00864] Avg episode reward: [(0, '20.257')] [2024-12-23 04:31:31,575][00864] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3637248. Throughput: 0: 1048.5. Samples: 908038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:31:31,581][00864] Avg episode reward: [(0, '20.944')] [2024-12-23 04:31:34,190][03652] Updated weights for policy 0, policy_version 890 (0.0024) [2024-12-23 04:31:36,575][00864] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 4040.5). Total num frames: 3649536. Throughput: 0: 995.5. Samples: 912624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:31:36,577][00864] Avg episode reward: [(0, '20.623')] [2024-12-23 04:31:41,575][00864] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3674112. Throughput: 0: 992.4. Samples: 919258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:31:41,580][00864] Avg episode reward: [(0, '18.939')] [2024-12-23 04:31:43,490][03652] Updated weights for policy 0, policy_version 900 (0.0026) [2024-12-23 04:31:46,575][00864] Fps is (10 sec: 5324.9, 60 sec: 4232.5, 300 sec: 4082.1). 
Total num frames: 3702784. Throughput: 0: 1026.0. Samples: 922966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:31:46,580][00864] Avg episode reward: [(0, '19.913')] [2024-12-23 04:31:46,590][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000904_3702784.pth... [2024-12-23 04:31:46,720][03635] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000665_2723840.pth [2024-12-23 04:31:51,579][00864] Fps is (10 sec: 4094.3, 60 sec: 4027.4, 300 sec: 4040.5). Total num frames: 3715072. Throughput: 0: 1032.6. Samples: 928638. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-23 04:31:51,582][00864] Avg episode reward: [(0, '19.806')] [2024-12-23 04:31:54,394][03652] Updated weights for policy 0, policy_version 910 (0.0018) [2024-12-23 04:31:56,575][00864] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4068.3). Total num frames: 3735552. Throughput: 0: 992.9. Samples: 934124. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-23 04:31:56,581][00864] Avg episode reward: [(0, '20.308')] [2024-12-23 04:32:01,575][00864] Fps is (10 sec: 4507.5, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3760128. Throughput: 0: 1007.3. Samples: 937712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:32:01,580][00864] Avg episode reward: [(0, '21.846')] [2024-12-23 04:32:03,284][03652] Updated weights for policy 0, policy_version 920 (0.0031) [2024-12-23 04:32:06,577][00864] Fps is (10 sec: 4095.2, 60 sec: 4095.9, 300 sec: 4054.3). Total num frames: 3776512. Throughput: 0: 1042.9. Samples: 944044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:32:06,581][00864] Avg episode reward: [(0, '23.668')] [2024-12-23 04:32:11,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3792896. Throughput: 0: 985.2. Samples: 948766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-23 04:32:11,582][00864] Avg episode reward: [(0, '22.569')] [2024-12-23 04:32:14,225][03652] Updated weights for policy 0, policy_version 930 (0.0021) [2024-12-23 04:32:16,575][00864] Fps is (10 sec: 4096.9, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3817472. Throughput: 0: 988.9. Samples: 952538. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-23 04:32:16,582][00864] Avg episode reward: [(0, '22.090')] [2024-12-23 04:32:21,580][00864] Fps is (10 sec: 4912.7, 60 sec: 4163.9, 300 sec: 4068.2). Total num frames: 3842048. Throughput: 0: 1048.5. Samples: 959810. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-23 04:32:21,583][00864] Avg episode reward: [(0, '20.682')] [2024-12-23 04:32:24,531][03652] Updated weights for policy 0, policy_version 940 (0.0029) [2024-12-23 04:32:26,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3854336. Throughput: 0: 999.2. Samples: 964222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:32:26,581][00864] Avg episode reward: [(0, '21.674')] [2024-12-23 04:32:31,575][00864] Fps is (10 sec: 3278.5, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3874816. Throughput: 0: 976.0. Samples: 966888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:32:31,577][00864] Avg episode reward: [(0, '21.284')] [2024-12-23 04:32:34,698][03652] Updated weights for policy 0, policy_version 950 (0.0018) [2024-12-23 04:32:36,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3899392. Throughput: 0: 1006.0. Samples: 973904. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:32:36,577][00864] Avg episode reward: [(0, '23.682')] [2024-12-23 04:32:41,578][00864] Fps is (10 sec: 4094.8, 60 sec: 4027.5, 300 sec: 4040.4). Total num frames: 3915776. Throughput: 0: 1009.5. Samples: 979556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-23 04:32:41,580][00864] Avg episode reward: [(0, '24.946')] [2024-12-23 04:32:41,582][03635] Saving new best policy, reward=24.946! [2024-12-23 04:32:45,912][03652] Updated weights for policy 0, policy_version 960 (0.0029) [2024-12-23 04:32:46,575][00864] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 4040.5). Total num frames: 3932160. Throughput: 0: 977.6. Samples: 981706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-23 04:32:46,577][00864] Avg episode reward: [(0, '24.554')] [2024-12-23 04:32:51,575][00864] Fps is (10 sec: 4097.2, 60 sec: 4028.0, 300 sec: 4068.2). Total num frames: 3956736. Throughput: 0: 996.1. Samples: 988868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:32:51,577][00864] Avg episode reward: [(0, '24.264')] [2024-12-23 04:32:54,220][03652] Updated weights for policy 0, policy_version 970 (0.0021) [2024-12-23 04:32:56,575][00864] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3977216. Throughput: 0: 1038.0. Samples: 995474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:32:56,577][00864] Avg episode reward: [(0, '24.667')] [2024-12-23 04:33:01,575][00864] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4040.5). Total num frames: 3993600. Throughput: 0: 1000.7. Samples: 997568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-23 04:33:01,577][00864] Avg episode reward: [(0, '25.101')] [2024-12-23 04:33:01,583][03635] Saving new best policy, reward=25.101! [2024-12-23 04:33:04,036][03635] Stopping Batcher_0... [2024-12-23 04:33:04,037][03635] Loop batcher_evt_loop terminating... [2024-12-23 04:33:04,038][00864] Component Batcher_0 stopped! [2024-12-23 04:33:04,038][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-23 04:33:04,095][03652] Weights refcount: 2 0 [2024-12-23 04:33:04,099][03652] Stopping InferenceWorker_p0-w0... [2024-12-23 04:33:04,099][00864] Component InferenceWorker_p0-w0 stopped! [2024-12-23 04:33:04,105][03652] Loop inference_proc0-0_evt_loop terminating... [2024-12-23 04:33:04,167][03635] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000784_3211264.pth [2024-12-23 04:33:04,190][03635] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-23 04:33:04,423][03635] Stopping LearnerWorker_p0... [2024-12-23 04:33:04,424][03635] Loop learner_proc0_evt_loop terminating... [2024-12-23 04:33:04,423][00864] Component LearnerWorker_p0 stopped! [2024-12-23 04:33:04,459][03656] Stopping RolloutWorker_w4... [2024-12-23 04:33:04,459][00864] Component RolloutWorker_w4 stopped! [2024-12-23 04:33:04,466][03656] Loop rollout_proc4_evt_loop terminating... [2024-12-23 04:33:04,476][00864] Component RolloutWorker_w6 stopped! [2024-12-23 04:33:04,476][03657] Stopping RolloutWorker_w6... [2024-12-23 04:33:04,484][03657] Loop rollout_proc6_evt_loop terminating... [2024-12-23 04:33:04,486][00864] Component RolloutWorker_w0 stopped! [2024-12-23 04:33:04,490][03660] Stopping RolloutWorker_w0... [2024-12-23 04:33:04,494][03660] Loop rollout_proc0_evt_loop terminating... [2024-12-23 04:33:04,500][00864] Component RolloutWorker_w2 stopped! 
[2024-12-23 04:33:04,505][03653] Stopping RolloutWorker_w2... [2024-12-23 04:33:04,505][03653] Loop rollout_proc2_evt_loop terminating... [2024-12-23 04:33:04,596][03654] Stopping RolloutWorker_w1... [2024-12-23 04:33:04,596][00864] Component RolloutWorker_w1 stopped! [2024-12-23 04:33:04,598][03654] Loop rollout_proc1_evt_loop terminating... [2024-12-23 04:33:04,622][03655] Stopping RolloutWorker_w3... [2024-12-23 04:33:04,624][03655] Loop rollout_proc3_evt_loop terminating... [2024-12-23 04:33:04,622][00864] Component RolloutWorker_w3 stopped! [2024-12-23 04:33:04,635][00864] Component RolloutWorker_w7 stopped! [2024-12-23 04:33:04,635][03658] Stopping RolloutWorker_w7... [2024-12-23 04:33:04,638][03658] Loop rollout_proc7_evt_loop terminating... [2024-12-23 04:33:04,661][03659] Stopping RolloutWorker_w5... [2024-12-23 04:33:04,661][00864] Component RolloutWorker_w5 stopped! [2024-12-23 04:33:04,662][03659] Loop rollout_proc5_evt_loop terminating... [2024-12-23 04:33:04,665][00864] Waiting for process learner_proc0 to stop... [2024-12-23 04:33:06,058][00864] Waiting for process inference_proc0-0 to join... [2024-12-23 04:33:06,069][00864] Waiting for process rollout_proc0 to join... [2024-12-23 04:33:07,993][00864] Waiting for process rollout_proc1 to join... [2024-12-23 04:33:08,000][00864] Waiting for process rollout_proc2 to join... [2024-12-23 04:33:08,003][00864] Waiting for process rollout_proc3 to join... [2024-12-23 04:33:08,007][00864] Waiting for process rollout_proc4 to join... [2024-12-23 04:33:08,009][00864] Waiting for process rollout_proc5 to join... [2024-12-23 04:33:08,015][00864] Waiting for process rollout_proc6 to join... [2024-12-23 04:33:08,018][00864] Waiting for process rollout_proc7 to join... [2024-12-23 04:33:08,021][00864] Batcher 0 profile tree view: batching: 26.2159, releasing_batches: 0.0263 [2024-12-23 04:33:08,023][00864] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0026 wait_policy_total: 388.0924 update_model: 8.6248 weight_update: 0.0020 one_step: 0.0067 handle_policy_step: 569.1528 deserialize: 14.4264, stack: 3.1032, obs_to_device_normalize: 122.7405, forward: 283.7366, send_messages: 28.4910 prepare_outputs: 87.6937 to_cpu: 53.3861 [2024-12-23 04:33:08,024][00864] Learner 0 profile tree view: misc: 0.0058, prepare_batch: 13.6080 train: 75.8177 epoch_init: 0.0058, minibatch_init: 0.0168, losses_postprocess: 0.7342, kl_divergence: 0.6643, after_optimizer: 33.5546 calculate_losses: 28.0951 losses_init: 0.0112, forward_head: 1.3479, bptt_initial: 19.3673, tail: 1.0366, advantages_returns: 0.2829, losses: 3.8543 bptt: 1.9019 bptt_forward_core: 1.8036 update: 12.1702 clip: 0.8607 [2024-12-23 04:33:08,027][00864] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3454, enqueue_policy_requests: 90.1071, env_step: 792.0246, overhead: 12.0806, complete_rollouts: 6.3696 save_policy_outputs: 20.9536 split_output_tensors: 8.5838 [2024-12-23 04:33:08,029][00864] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.2672, enqueue_policy_requests: 92.3840, env_step: 785.7467, overhead: 12.0211, complete_rollouts: 6.9594 save_policy_outputs: 19.8840 split_output_tensors: 8.2641 [2024-12-23 04:33:08,030][00864] Loop Runner_EvtLoop terminating... 
[2024-12-23 04:33:08,031][00864] Runner profile tree view: main_loop: 1036.2029 [2024-12-23 04:33:08,033][00864] Collected {0: 4005888}, FPS: 3865.9 [2024-12-23 04:33:13,278][00864] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-23 04:33:13,279][00864] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-23 04:33:13,282][00864] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-23 04:33:13,284][00864] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-23 04:33:13,286][00864] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-23 04:33:13,288][00864] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-23 04:33:13,290][00864] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-12-23 04:33:13,292][00864] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-23 04:33:13,294][00864] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-12-23 04:33:13,296][00864] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-12-23 04:33:13,297][00864] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-23 04:33:13,298][00864] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-23 04:33:13,299][00864] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-23 04:33:13,300][00864] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-23 04:33:13,301][00864] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-23 04:33:13,332][00864] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-23 04:33:13,336][00864] RunningMeanStd input shape: (3, 72, 128) [2024-12-23 04:33:13,338][00864] RunningMeanStd input shape: (1,) [2024-12-23 04:33:13,353][00864] ConvEncoder: input_channels=3 [2024-12-23 04:33:13,458][00864] Conv encoder output size: 512 [2024-12-23 04:33:13,462][00864] Policy head output size: 512 [2024-12-23 04:33:13,725][00864] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-23 04:33:14,526][00864] Num frames 100... [2024-12-23 04:33:14,649][00864] Num frames 200... [2024-12-23 04:33:14,773][00864] Num frames 300... [2024-12-23 04:33:14,897][00864] Num frames 400... [2024-12-23 04:33:15,026][00864] Num frames 500... [2024-12-23 04:33:15,153][00864] Num frames 600... [2024-12-23 04:33:15,277][00864] Num frames 700... [2024-12-23 04:33:15,397][00864] Num frames 800... [2024-12-23 04:33:15,521][00864] Num frames 900... [2024-12-23 04:33:15,642][00864] Num frames 1000... [2024-12-23 04:33:15,713][00864] Avg episode rewards: #0: 22.100, true rewards: #0: 10.100 [2024-12-23 04:33:15,714][00864] Avg episode reward: 22.100, avg true_objective: 10.100 [2024-12-23 04:33:15,822][00864] Num frames 1100... [2024-12-23 04:33:15,948][00864] Num frames 1200... [2024-12-23 04:33:16,080][00864] Num frames 1300... [2024-12-23 04:33:16,209][00864] Num frames 1400... [2024-12-23 04:33:16,384][00864] Avg episode rewards: #0: 15.990, true rewards: #0: 7.490 [2024-12-23 04:33:16,386][00864] Avg episode reward: 15.990, avg true_objective: 7.490 [2024-12-23 04:33:16,390][00864] Num frames 1500... 
[2024-12-23 04:33:16,514][00864] Num frames 1600... [2024-12-23 04:33:16,636][00864] Num frames 1700... [2024-12-23 04:33:16,762][00864] Num frames 1800... [2024-12-23 04:33:16,884][00864] Num frames 1900... [2024-12-23 04:33:17,006][00864] Num frames 2000... [2024-12-23 04:33:17,147][00864] Num frames 2100... [2024-12-23 04:33:17,270][00864] Num frames 2200... [2024-12-23 04:33:17,392][00864] Num frames 2300... [2024-12-23 04:33:17,514][00864] Num frames 2400... [2024-12-23 04:33:17,637][00864] Num frames 2500... [2024-12-23 04:33:17,722][00864] Avg episode rewards: #0: 18.080, true rewards: #0: 8.413 [2024-12-23 04:33:17,723][00864] Avg episode reward: 18.080, avg true_objective: 8.413 [2024-12-23 04:33:17,821][00864] Num frames 2600... [2024-12-23 04:33:17,945][00864] Num frames 2700... [2024-12-23 04:33:18,071][00864] Num frames 2800... [2024-12-23 04:33:18,215][00864] Num frames 2900... [2024-12-23 04:33:18,345][00864] Num frames 3000... [2024-12-23 04:33:18,469][00864] Num frames 3100... [2024-12-23 04:33:18,595][00864] Num frames 3200... [2024-12-23 04:33:18,720][00864] Num frames 3300... [2024-12-23 04:33:18,879][00864] Avg episode rewards: #0: 17.970, true rewards: #0: 8.470 [2024-12-23 04:33:18,882][00864] Avg episode reward: 17.970, avg true_objective: 8.470 [2024-12-23 04:33:18,899][00864] Num frames 3400... [2024-12-23 04:33:19,021][00864] Num frames 3500... [2024-12-23 04:33:19,167][00864] Num frames 3600... [2024-12-23 04:33:19,290][00864] Num frames 3700... [2024-12-23 04:33:19,409][00864] Num frames 3800... [2024-12-23 04:33:19,533][00864] Num frames 3900... [2024-12-23 04:33:19,656][00864] Num frames 4000... [2024-12-23 04:33:19,821][00864] Avg episode rewards: #0: 17.784, true rewards: #0: 8.184 [2024-12-23 04:33:19,823][00864] Avg episode reward: 17.784, avg true_objective: 8.184 [2024-12-23 04:33:19,837][00864] Num frames 4100... [2024-12-23 04:33:19,958][00864] Num frames 4200... [2024-12-23 04:33:20,080][00864] Num frames 4300... [2024-12-23 04:33:20,215][00864] Num frames 4400... [2024-12-23 04:33:20,341][00864] Num frames 4500... [2024-12-23 04:33:20,463][00864] Num frames 4600... [2024-12-23 04:33:20,581][00864] Num frames 4700... [2024-12-23 04:33:20,663][00864] Avg episode rewards: #0: 17.037, true rewards: #0: 7.870 [2024-12-23 04:33:20,665][00864] Avg episode reward: 17.037, avg true_objective: 7.870 [2024-12-23 04:33:20,759][00864] Num frames 4800... [2024-12-23 04:33:20,880][00864] Num frames 4900... [2024-12-23 04:33:20,997][00864] Num frames 5000... [2024-12-23 04:33:21,123][00864] Num frames 5100... [2024-12-23 04:33:21,255][00864] Num frames 5200... [2024-12-23 04:33:21,376][00864] Num frames 5300... [2024-12-23 04:33:21,498][00864] Num frames 5400... [2024-12-23 04:33:21,618][00864] Num frames 5500... [2024-12-23 04:33:21,742][00864] Num frames 5600... [2024-12-23 04:33:21,819][00864] Avg episode rewards: #0: 17.454, true rewards: #0: 8.026 [2024-12-23 04:33:21,820][00864] Avg episode reward: 17.454, avg true_objective: 8.026 [2024-12-23 04:33:21,922][00864] Num frames 5700... [2024-12-23 04:33:22,045][00864] Num frames 5800... [2024-12-23 04:33:22,174][00864] Num frames 5900... [2024-12-23 04:33:22,307][00864] Num frames 6000... [2024-12-23 04:33:22,431][00864] Num frames 6100... [2024-12-23 04:33:22,551][00864] Num frames 6200... [2024-12-23 04:33:22,674][00864] Num frames 6300... [2024-12-23 04:33:22,823][00864] Num frames 6400... [2024-12-23 04:33:23,001][00864] Num frames 6500... [2024-12-23 04:33:23,168][00864] Num frames 6600... 
[2024-12-23 04:33:23,339][00864] Num frames 6700... [2024-12-23 04:33:23,514][00864] Avg episode rewards: #0: 18.213, true rewards: #0: 8.462 [2024-12-23 04:33:23,516][00864] Avg episode reward: 18.213, avg true_objective: 8.462 [2024-12-23 04:33:23,573][00864] Num frames 6800... [2024-12-23 04:33:23,739][00864] Num frames 6900... [2024-12-23 04:33:23,901][00864] Num frames 7000... [2024-12-23 04:33:24,066][00864] Num frames 7100... [2024-12-23 04:33:24,251][00864] Num frames 7200... [2024-12-23 04:33:24,439][00864] Num frames 7300... [2024-12-23 04:33:24,610][00864] Num frames 7400... [2024-12-23 04:33:24,786][00864] Num frames 7500... [2024-12-23 04:33:24,964][00864] Num frames 7600... [2024-12-23 04:33:25,145][00864] Num frames 7700... [2024-12-23 04:33:25,297][00864] Num frames 7800... [2024-12-23 04:33:25,437][00864] Num frames 7900... [2024-12-23 04:33:25,560][00864] Num frames 8000... [2024-12-23 04:33:25,681][00864] Num frames 8100... [2024-12-23 04:33:25,806][00864] Num frames 8200... [2024-12-23 04:33:25,930][00864] Num frames 8300... [2024-12-23 04:33:26,040][00864] Avg episode rewards: #0: 20.601, true rewards: #0: 9.268 [2024-12-23 04:33:26,042][00864] Avg episode reward: 20.601, avg true_objective: 9.268 [2024-12-23 04:33:26,117][00864] Num frames 8400... [2024-12-23 04:33:26,245][00864] Num frames 8500... [2024-12-23 04:33:26,365][00864] Num frames 8600... [2024-12-23 04:33:26,499][00864] Num frames 8700... [2024-12-23 04:33:26,624][00864] Num frames 8800... [2024-12-23 04:33:26,746][00864] Num frames 8900... [2024-12-23 04:33:26,872][00864] Num frames 9000... [2024-12-23 04:33:26,997][00864] Num frames 9100... [2024-12-23 04:33:27,122][00864] Num frames 9200... [2024-12-23 04:33:27,246][00864] Num frames 9300... [2024-12-23 04:33:27,370][00864] Num frames 9400... [2024-12-23 04:33:27,550][00864] Avg episode rewards: #0: 20.997, true rewards: #0: 9.497 [2024-12-23 04:33:27,553][00864] Avg episode reward: 20.997, avg true_objective: 9.497 [2024-12-23 04:33:27,558][00864] Num frames 9500... [2024-12-23 04:34:22,997][00864] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-12-23 04:37:22,660][00864] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-23 04:37:22,662][00864] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-23 04:37:22,665][00864] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-23 04:37:22,667][00864] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-23 04:37:22,668][00864] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-23 04:37:22,670][00864] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-23 04:37:22,672][00864] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-23 04:37:22,673][00864] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-23 04:37:22,678][00864] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-23 04:37:22,679][00864] Adding new argument 'hf_repository'='AlbertoImmune/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-12-23 04:37:22,680][00864] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-23 04:37:22,681][00864] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
[2024-12-23 04:37:22,682][00864] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-23 04:37:22,683][00864] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-23 04:37:22,684][00864] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-23 04:37:22,722][00864] RunningMeanStd input shape: (3, 72, 128) [2024-12-23 04:37:22,726][00864] RunningMeanStd input shape: (1,) [2024-12-23 04:37:22,744][00864] ConvEncoder: input_channels=3 [2024-12-23 04:37:22,781][00864] Conv encoder output size: 512 [2024-12-23 04:37:22,782][00864] Policy head output size: 512 [2024-12-23 04:37:22,801][00864] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-23 04:37:23,236][00864] Num frames 100... [2024-12-23 04:37:23,359][00864] Num frames 200... [2024-12-23 04:37:23,481][00864] Num frames 300... [2024-12-23 04:37:23,610][00864] Num frames 400... [2024-12-23 04:37:23,734][00864] Num frames 500... [2024-12-23 04:37:23,856][00864] Num frames 600... [2024-12-23 04:37:23,978][00864] Num frames 700... [2024-12-23 04:37:24,105][00864] Num frames 800... [2024-12-23 04:37:24,231][00864] Num frames 900... [2024-12-23 04:37:24,316][00864] Avg episode rewards: #0: 19.230, true rewards: #0: 9.230 [2024-12-23 04:37:24,318][00864] Avg episode reward: 19.230, avg true_objective: 9.230 [2024-12-23 04:37:24,412][00864] Num frames 1000... [2024-12-23 04:37:24,539][00864] Num frames 1100... [2024-12-23 04:37:24,669][00864] Num frames 1200... [2024-12-23 04:37:24,792][00864] Num frames 1300... [2024-12-23 04:37:24,920][00864] Num frames 1400... [2024-12-23 04:37:25,047][00864] Num frames 1500... [2024-12-23 04:37:25,174][00864] Num frames 1600... [2024-12-23 04:37:25,304][00864] Num frames 1700... [2024-12-23 04:37:25,427][00864] Num frames 1800... [2024-12-23 04:37:25,549][00864] Num frames 1900... [2024-12-23 04:37:25,683][00864] Num frames 2000... [2024-12-23 04:37:25,800][00864] Num frames 2100... [2024-12-23 04:37:25,865][00864] Avg episode rewards: #0: 21.535, true rewards: #0: 10.535 [2024-12-23 04:37:25,866][00864] Avg episode reward: 21.535, avg true_objective: 10.535 [2024-12-23 04:37:25,982][00864] Num frames 2200... [2024-12-23 04:37:26,110][00864] Num frames 2300... [2024-12-23 04:37:26,237][00864] Num frames 2400... [2024-12-23 04:37:26,362][00864] Num frames 2500... [2024-12-23 04:37:26,485][00864] Num frames 2600... [2024-12-23 04:37:26,616][00864] Num frames 2700... [2024-12-23 04:37:26,741][00864] Num frames 2800... [2024-12-23 04:37:26,863][00864] Num frames 2900... [2024-12-23 04:37:26,986][00864] Num frames 3000... [2024-12-23 04:37:27,114][00864] Num frames 3100... [2024-12-23 04:37:27,242][00864] Num frames 3200... [2024-12-23 04:37:27,363][00864] Num frames 3300... [2024-12-23 04:37:27,484][00864] Num frames 3400... [2024-12-23 04:37:27,609][00864] Num frames 3500... [2024-12-23 04:37:27,745][00864] Num frames 3600... [2024-12-23 04:37:27,867][00864] Num frames 3700... [2024-12-23 04:37:27,990][00864] Num frames 3800... [2024-12-23 04:37:28,114][00864] Num frames 3900... [2024-12-23 04:37:28,245][00864] Num frames 4000... [2024-12-23 04:37:28,363][00864] Num frames 4100... [2024-12-23 04:37:28,489][00864] Num frames 4200... 
[2024-12-23 04:37:28,555][00864] Avg episode rewards: #0: 31.690, true rewards: #0: 14.023 [2024-12-23 04:37:28,557][00864] Avg episode reward: 31.690, avg true_objective: 14.023 [2024-12-23 04:37:28,681][00864] Num frames 4300... [2024-12-23 04:37:28,800][00864] Num frames 4400... [2024-12-23 04:37:28,972][00864] Avg episode rewards: #0: 24.987, true rewards: #0: 11.238 [2024-12-23 04:37:28,975][00864] Avg episode reward: 24.987, avg true_objective: 11.238 [2024-12-23 04:37:28,983][00864] Num frames 4500... [2024-12-23 04:37:29,130][00864] Num frames 4600... [2024-12-23 04:37:29,268][00864] Num frames 4700... [2024-12-23 04:37:29,398][00864] Num frames 4800... [2024-12-23 04:37:29,518][00864] Num frames 4900... [2024-12-23 04:37:29,637][00864] Num frames 5000... [2024-12-23 04:37:29,769][00864] Num frames 5100... [2024-12-23 04:37:29,892][00864] Num frames 5200... [2024-12-23 04:37:30,027][00864] Num frames 5300... [2024-12-23 04:37:30,158][00864] Num frames 5400... [2024-12-23 04:37:30,280][00864] Num frames 5500... [2024-12-23 04:37:30,406][00864] Num frames 5600... [2024-12-23 04:37:30,528][00864] Num frames 5700... [2024-12-23 04:37:30,650][00864] Num frames 5800... [2024-12-23 04:37:30,834][00864] Avg episode rewards: #0: 27.388, true rewards: #0: 11.788 [2024-12-23 04:37:30,837][00864] Avg episode reward: 27.388, avg true_objective: 11.788 [2024-12-23 04:37:30,847][00864] Num frames 5900... [2024-12-23 04:37:30,969][00864] Num frames 6000... [2024-12-23 04:37:31,090][00864] Num frames 6100... [2024-12-23 04:37:31,221][00864] Num frames 6200... [2024-12-23 04:37:31,345][00864] Num frames 6300... [2024-12-23 04:37:31,465][00864] Num frames 6400... [2024-12-23 04:37:31,587][00864] Num frames 6500... [2024-12-23 04:37:31,710][00864] Num frames 6600... [2024-12-23 04:37:31,835][00864] Num frames 6700... [2024-12-23 04:37:31,959][00864] Num frames 6800... [2024-12-23 04:37:32,079][00864] Num frames 6900... [2024-12-23 04:37:32,212][00864] Num frames 7000... [2024-12-23 04:37:32,378][00864] Avg episode rewards: #0: 27.482, true rewards: #0: 11.815 [2024-12-23 04:37:32,379][00864] Avg episode reward: 27.482, avg true_objective: 11.815 [2024-12-23 04:37:32,397][00864] Num frames 7100... [2024-12-23 04:37:32,520][00864] Num frames 7200... [2024-12-23 04:37:32,644][00864] Num frames 7300... [2024-12-23 04:37:32,787][00864] Num frames 7400... [2024-12-23 04:37:32,961][00864] Num frames 7500... [2024-12-23 04:37:33,134][00864] Num frames 7600... [2024-12-23 04:37:33,302][00864] Num frames 7700... [2024-12-23 04:37:33,474][00864] Num frames 7800... [2024-12-23 04:37:33,640][00864] Num frames 7900... [2024-12-23 04:37:33,806][00864] Num frames 8000... [2024-12-23 04:37:33,974][00864] Num frames 8100... [2024-12-23 04:37:34,069][00864] Avg episode rewards: #0: 27.173, true rewards: #0: 11.601 [2024-12-23 04:37:34,073][00864] Avg episode reward: 27.173, avg true_objective: 11.601 [2024-12-23 04:37:34,248][00864] Num frames 8200... [2024-12-23 04:37:34,429][00864] Num frames 8300... [2024-12-23 04:37:34,597][00864] Num frames 8400... [2024-12-23 04:37:34,774][00864] Num frames 8500... [2024-12-23 04:37:34,960][00864] Num frames 8600... [2024-12-23 04:37:35,144][00864] Num frames 8700... [2024-12-23 04:37:35,227][00864] Avg episode rewards: #0: 25.387, true rewards: #0: 10.887 [2024-12-23 04:37:35,230][00864] Avg episode reward: 25.387, avg true_objective: 10.887 [2024-12-23 04:37:35,406][00864] Num frames 8800... [2024-12-23 04:37:35,532][00864] Num frames 8900... 
[2024-12-23 04:37:35,654][00864] Num frames 9000... [2024-12-23 04:37:35,778][00864] Num frames 9100... [2024-12-23 04:37:35,909][00864] Num frames 9200... [2024-12-23 04:37:36,034][00864] Num frames 9300... [2024-12-23 04:37:36,166][00864] Num frames 9400... [2024-12-23 04:37:36,291][00864] Num frames 9500... [2024-12-23 04:37:36,419][00864] Num frames 9600... [2024-12-23 04:37:36,544][00864] Num frames 9700... [2024-12-23 04:37:36,669][00864] Num frames 9800... [2024-12-23 04:37:36,802][00864] Avg episode rewards: #0: 25.513, true rewards: #0: 10.958 [2024-12-23 04:37:36,803][00864] Avg episode reward: 25.513, avg true_objective: 10.958 [2024-12-23 04:37:36,852][00864] Num frames 9900... [2024-12-23 04:37:36,985][00864] Num frames 10000... [2024-12-23 04:37:37,109][00864] Num frames 10100... [2024-12-23 04:37:37,239][00864] Num frames 10200... [2024-12-23 04:37:37,362][00864] Num frames 10300... [2024-12-23 04:37:37,485][00864] Num frames 10400... [2024-12-23 04:37:37,607][00864] Num frames 10500... [2024-12-23 04:37:37,732][00864] Num frames 10600... [2024-12-23 04:37:37,862][00864] Num frames 10700... [2024-12-23 04:37:37,995][00864] Num frames 10800... [2024-12-23 04:37:38,132][00864] Num frames 10900... [2024-12-23 04:37:38,296][00864] Avg episode rewards: #0: 25.687, true rewards: #0: 10.987 [2024-12-23 04:37:38,298][00864] Avg episode reward: 25.687, avg true_objective: 10.987 [2024-12-23 04:38:40,489][00864] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
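
For reference, the run captured in this log follows the usual sample-factory VizDoom workflow: train on doom_health_gathering_supreme until roughly 4M environment frames (the learner stops after "Collected {0: 4005888}"), evaluate the final checkpoint while saving a replay video, then repeat the evaluation with push_to_hub enabled to upload the experiment to AlbertoImmune/rl_course_vizdoom_health_gathering_supreme. The sketch below is a hedged reconstruction of those three calls, not the exact script that produced this log; the helper imports (register_vizdoom_components, parse_vizdoom_cfg from sf_examples.vizdoom.train_vizdoom), num_envs_per_worker, and train_for_env_steps are assumptions, while the evaluation flags mirror the "Adding new argument ..." lines printed above.

```python
# Hedged reconstruction of the workflow implied by this log (assumed, not the exact notebook).
# register_vizdoom_components / parse_vizdoom_cfg are assumed to come from sample-factory's
# VizDoom example module; run_rl and enjoy are the standard training / evaluation entry points.
from sample_factory.train import run_rl
from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components, parse_vizdoom_cfg

env = "doom_health_gathering_supreme"  # scenario assumed from the hf_repository name above
register_vizdoom_components()

# 1) Training: 8 rollout workers on CPU, learner on cuda:0, stopping around 4M frames.
#    num_envs_per_worker and train_for_env_steps are assumptions consistent with the log.
train_cfg = parse_vizdoom_cfg(
    argv=[f"--env={env}", "--num_workers=8", "--num_envs_per_worker=4",
          "--train_for_env_steps=4000000"]
)
run_rl(train_cfg)

# 2) Evaluation: mirrors the first config dump (no_render, save_video, max_num_episodes=10);
#    loads checkpoint_000000978_4005888.pth and writes replay.mp4.
eval_cfg = parse_vizdoom_cfg(
    argv=[f"--env={env}", "--num_workers=1", "--no_render", "--save_video",
          "--max_num_episodes=10"],
    evaluation=True,
)
enjoy(eval_cfg)

# 3) Push to hub: same as above plus max_num_frames / push_to_hub / hf_repository,
#    matching the second config dump in the log.
push_cfg = parse_vizdoom_cfg(
    argv=[f"--env={env}", "--num_workers=1", "--no_render", "--save_video",
          "--max_num_episodes=10", "--max_num_frames=100000", "--push_to_hub",
          "--hf_repository=AlbertoImmune/rl_course_vizdoom_health_gathering_supreme"],
    evaluation=True,
)
enjoy(push_cfg)
```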